



#### Registers

- Recall that memory is slow compared to a processor
- Registers provide fast, temporary storage locations within the processor



**USC**Viterbi

#### What if we didn't have registers?

- Example w/o registers: F = (X+Y) - (X\*Y)
  - Requires an ADD instruction, a MULtiply instruction, and a SUBtract instruction
  - w/o registers
    - ADD: Load X and Y from memory, store result to memory
    - MUL: Load X and Y again from memory, store result to memory
    - SUB: Load results from ADD and MUL, store result to memory
    - 9 memory accesses in total




## **General Purpose Registers**

- Registers available to software instructions for use by the programmer/compiler
- Programmer/compiler is in charge of using these registers as inputs (source locations) and outputs (destination locations)




# What if we have registers?

- Example w/ registers: F = (X+Y) - (X\*Y)
  - Load X and Y into registers (R0 and R1)
  - ADD: R0 + R1 and store result in R2
  - MUL: R0 \* R1 and store result in R3
  - SUB: R2 - R3 and store result in R4
  - Store R4 back to memory
  - 3 memory accesses in total
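
The difference in memory traffic can be checked with a small counting model. This is only a sketch: the `mem` array, the `load`/`store` helpers, and the sample values X = 5 and Y = 3 are assumptions made for illustration, not part of the slides.

```c
#include <stdio.h>

/* Hypothetical memory: mem[0] = X, mem[1] = Y, mem[2] and mem[3] hold results. */
static int mem[4] = { 5, 3, 0, 0 };
static int accesses = 0;   /* counts every load and store */

static int  load(int addr)           { accesses++; return mem[addr]; }
static void store(int addr, int val) { accesses++; mem[addr] = val; }

/* Without registers: every operation reads its inputs from memory
   and writes its result back to memory. */
int count_without_registers(void) {
    accesses = 0;
    store(2, load(0) + load(1));   /* ADD: 2 loads + 1 store */
    store(3, load(0) * load(1));   /* MUL: 2 loads + 1 store */
    store(2, load(2) - load(3));   /* SUB: 2 loads + 1 store */
    return accesses;
}

/* With registers: X and Y are loaded once, intermediate results stay
   in local variables that stand in for R0..R4. */
int count_with_registers(void) {
    accesses = 0;
    int r0 = load(0);
    int r1 = load(1);
    int r2 = r0 + r1;
    int r3 = r0 * r1;
    int r4 = r2 - r3;
    store(2, r4);
    return accesses;
}
```

Calling `count_without_registers()` and then `count_with_registers()` yields 9 and 3 accesses respectively, and both leave the same F in `mem[2]`.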



#### Other Registers

- Some bookkeeping information is needed to make the processor operate correctly
- Example: Program Counter / Instruction Pointer (PC/IP) register
  - Recall that the processor must fetch instructions from memory before decoding and executing them
  - PC/IP register holds the address of the instruction to fetch




# Fetching an Instruction

- To fetch an instruction
  - PC/IP contains the address of the instruction
  - The value in the PC/IP is placed on the address bus and the memory is told to read
  - The PC/IP is incremented, and the process is repeated for the next instruction
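
The fetch step can be sketched in a few lines of C. The instruction memory contents and the one-instruction-per-address layout are assumptions chosen for simplicity; on x86 instructions vary in length, so the real PC/IP advances by the length of each instruction rather than by 1.

```c
#include <stdint.h>

/* Hypothetical instruction memory: one 16-bit instruction per address. */
static const uint16_t memory[] = { 0x0201, 0x1302, 0x2403 };

/* Fetch: use the PC as the address for a memory read,
   then increment the PC so it points at the next instruction. */
uint16_t fetch(unsigned *pc) {
    uint16_t inst = memory[*pc];   /* value on the "data bus" for address PC */
    *pc += 1;                      /* PC/IP is incremented */
    return inst;
}
```

Starting with `unsigned pc = 0;`, repeated calls to `fetch(&pc)` return the stored words in order while `pc` counts up 1, 2, 3.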






## **Control Circuitry**

- Control circuitry is used to decode the instruction and then generate the necessary signals to complete its execution
- Controls the ALU
- Selects the registers to be used as source and destination locations
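
A rough model of what the control logic does with the instruction word 0x0201 used in this deck (ADD with R2 = R0 + R1) is sketched below. The bit layout is an assumption invented for this example (four 4-bit fields: opcode, destination, source 1, source 2, with opcode 0 meaning ADD); the slides do not specify the encoding.

```c
#include <stdint.h>

enum { OP_ADD = 0x0 };   /* assumed opcode value */

typedef struct {
    unsigned op;     /* which ALU operation */
    unsigned dst;    /* destination register number */
    unsigned src1;   /* first source register number */
    unsigned src2;   /* second source register number */
} Decoded;

/* Decode: split the 16-bit instruction into its (assumed) 4-bit fields. */
Decoded decode(uint16_t inst) {
    Decoded d;
    d.op   = (inst >> 12) & 0xF;
    d.dst  = (inst >> 8)  & 0xF;
    d.src1 = (inst >> 4)  & 0xF;
    d.src2 = inst & 0xF;
    return d;
}

/* Execute: select the source registers, tell the "ALU" to add,
   and write the result to the selected destination register. */
void execute(Decoded d, uint16_t regs[16]) {
    if (d.op == OP_ADD) {
        regs[d.dst] = (uint16_t)(regs[d.src1] + regs[d.src2]);
    }
}
```

With R0 = 0x0123 and R1 = 0x0456, `execute(decode(0x0201), regs)` leaves 0x0579 in R2, matching the values shown in the slide's figure.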



#### Control Circuitry

- Assume 0x0201 is machine code for an ADD instruction of R2 = R0 + R1
- Control logic will...
  - select the registers (R0 and R1)
  - tell the ALU to add
  - select the destination register (R2)

[Figure: memory holding instructions (including 0201, the ADD inst.) connected by address, data, and control lines to a processor containing the PC/IP, control logic, registers R0 to Rn-1, and an ALU that adds 0x0123 and 0x0456 to produce 0x0579]

#### Summary

- Registers are used for fast, temporary storage in the processor
  - Data (usually) must be moved into registers
- The PC or IP register stores the address of the next instruction to be executed
  - Maintains the current execution location in the program

#### **UNDERSTANDING MEMORY**

#### Memory and Addresses

- Set of cells that each store a group of bits
  - Usually, 1 byte (8 bits) per cell
- Unique address (number) assigned to each cell
  - Used to reference the value in that location
- Data and instructions are both stored in memory and are always represented as a string of 1's and 0's

[Figure: memory device with address inputs A[0] to A[n-1] and data inputs/outputs D[0] to D[7]; addresses 0 through FFFF each hold one byte, e.g. address 0 holds 11010010]











# **Immediate Examples**

Initial conditions:

| Location       | Contents            |
|----------------|---------------------|
| Memory 0x00204 | 7654 3210           |
| Memory 0x00200 | fedc ba98           |
| Register rax   | ffff ffff 1234 5678 |

| Instruction                    | Location updated         | Result |
|--------------------------------|--------------------------|--------|
| movl \$0xfe1234, %eax          | rax                      |        |
| movw \$0xaa55, %ax             | rax                      |        |
| movb \$20, %al                 | rax                      |        |
| movq \$-1, %rax                | rax                      |        |
| movabsq \$0x123456789ab, %rax  | rax                      |        |
| movq \$-1, 0x4e0               | Memory 0x004e0 to 0x004e7 |        |

Rules:

- Immediates must be the source operand
- Indicate with '\$'; can be specified in decimal (default) or hex (start with 0x)
- movq can only support a 32-bit immediate (and will then sign-extend that value to fill the upper 32 bits)
  - Use movabsq for a full 64-bit immediate value
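
The register-targeting moves in this example can be modeled in C to check a hand-worked answer. This is a sketch under assumptions: the helper names are made up for illustration, each helper takes the current 64-bit rax value and returns the new one, and the instructions are assumed to execute one after another starting from rax = ffff ffff 1234 5678.

```c
#include <stdint.h>

/* movb imm,%al : only the low 8 bits change. */
uint64_t movb_al(uint64_t rax, uint8_t imm)   { return (rax & ~0xFFULL) | imm; }

/* movw imm,%ax : only the low 16 bits change. */
uint64_t movw_ax(uint64_t rax, uint16_t imm)  { return (rax & ~0xFFFFULL) | imm; }

/* movl imm,%eax : low 32 bits are written and the upper 32 bits are zeroed. */
uint64_t movl_eax(uint64_t rax, uint32_t imm) { (void)rax; return imm; }

/* movq imm32,%rax : the 32-bit immediate is sign-extended to 64 bits. */
uint64_t movq_rax(int32_t imm32)              { return (uint64_t)(int64_t)imm32; }

/* movabsq imm64,%rax : the full 64-bit immediate is used as is. */
uint64_t movabsq_rax(uint64_t imm64)          { return imm64; }
```

For instance, `movl_eax(0xffffffff12345678, 0xfe1234)` gives 0x0000000000fe1234, and a following `movw_ax(..., 0xaa55)` changes only the low 16 bits.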

# **Move Variations**

- There are several variations when the destination of a mov instruction is a register
  - This only applies when the destination is a register
- Normal mov does not affect the upper portions of registers (with the exception of movl, which zeroes the upper 32 bits)
- movzxy will zero-extend the upper portion
  - movzbw (move a byte from the source but zero-extend it to a word in the dest. register)
  - movzbw, movzbl, movzbq, movzwl, movzwq
- movsxy will sign-extend the upper portion
  - movsbw (move a byte from the source but sign-extend it to a word in the dest. register)
  - movsbw, movsbl, movsbq, movswl, movswq, movslq


# Zero/Signed Move Variations

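Initial conditions:

| Location       | Contents            |
|----------------|---------------------|
| Memory 0x00204 | 7654 3210           |
| Memory 0x00200 | fedc ba98           |
| Register rdx   | 0123 4567 89ab cdef |

| Instruction        | Register updated | Result |
|--------------------|------------------|--------|
| movslq 0x200, %rax | rax              |        |
| movzwl 0x202, %eax | rax              |        |
| movsbw 0x201, %ax  | rax              |        |
| movsbl 0x206, %eax | rax              |        |
| movzbq %dl, %rax   | rax              |        |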
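
The zero- and sign-extending moves above can be traced with a small C model. Two assumptions are baked in and are not stated on the slide: each memory row is read as a 32-bit value stored little-endian (so byte 0x200 holds 0x98), and the helper names are invented for this sketch. The narrowing casts rely on the usual two's-complement wraparound that mainstream compilers provide.

```c
#include <stdint.h>

/* Bytes at addresses 0x200..0x207 (assumed little-endian layout). */
static const uint8_t mem[8] = {
    0x98, 0xba, 0xdc, 0xfe,   /* 0x200: fedc ba98 */
    0x10, 0x32, 0x54, 0x76    /* 0x204: 7654 3210 */
};

/* Assemble nbytes starting at addr into one little-endian value. */
static uint64_t load_le(unsigned addr, unsigned nbytes) {
    uint64_t v = 0;
    for (unsigned i = 0; i < nbytes; i++) {
        v |= (uint64_t)mem[addr - 0x200 + i] << (8 * i);
    }
    return v;
}

/* movslq addr,%rax : load 32 bits, sign-extend to 64. */
uint64_t movslq(unsigned addr) {
    return (uint64_t)(int64_t)(int32_t)(uint32_t)load_le(addr, 4);
}

/* movzwl addr,%eax : load 16 bits, zero-extend; a 32-bit write also clears bits 63..32. */
uint64_t movzwl(unsigned addr) {
    return (uint16_t)load_le(addr, 2);
}

/* movsbw addr,%ax : load 8 bits, sign-extend to 16; bits 63..16 of rax are unchanged. */
uint64_t movsbw(uint64_t rax, unsigned addr) {
    uint16_t w = (uint16_t)(int16_t)(int8_t)(uint8_t)load_le(addr, 1);
    return (rax & ~0xFFFFULL) | w;
}

/* movsbl addr,%eax : load 8 bits, sign-extend to 32; bits 63..32 are cleared. */
uint64_t movsbl(unsigned addr) {
    return (uint32_t)(int32_t)(int8_t)(uint8_t)load_le(addr, 1);
}

/* movzbq %dl,%rax : take the low byte of the source register, zero-extend to 64. */
uint64_t movzbq(uint64_t src_reg) {
    return (uint8_t)src_reg;
}
```

Under these assumptions `movslq(0x200)` produces ffff ffff fedc ba98, because the loaded value 0xfedcba98 has its top bit set, while `movzwl(0x202)` produces 0000 0000 0000 fedc.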

#### Why So Many Oddities & Variations

- The x86 instruction set has been around for nearly 40 years and each new processor has had to maintain backward compatibility (support the old instruction set) while adding new functionality
- If you wore one clothing article from each decade you'd look funny too and have a lot of oddities




## Summary

- Accessing different-size portions of a register requires different names in x86 (e.g. AL, AX, EAX, RAX)
- Moving to a register may involve zero- or sign-extending since registers are 64 bits
  - Long (dword) operations always zero-extend the upper 32 bits
- Moving to memory never involves zero- or sign-extending since memory is broken into finer granularities

#### **ADDRESSING MODES**


# What Are Addressing Modes

- Recall an operand must be:
  - A register value (e.g. %rax)
  - A value in a memory location
  - An immediate
- To access a memory location we must supply an address
  - However, there can be many ways to compute an address, each useful in particular contexts [e.g. accessing an array element, a[i] vs. object member, obj.member]

#### Common x86-64 Addressing Modes

CS:APP 3.4.1

| Name                         | Form                                            | Example                   | Description                                |
|------------------------------|-------------------------------------------------|---------------------------|--------------------------------------------|
| Immediate                    | \$imm                                           | movq \$-500,%rax          | R[rax] = imm                               |
| Register                     | r<sub>a</sub>                                   | movq %rdx,%rax            | R[rax] = R[rdx]                            |
| Direct Addressing            | imm                                             | movq 2000,%rax            | R[rax] = M[2000]                           |
| Indirect Addressing          | (r<sub>a</sub>)                                 | movq (%rdx),%rax          | $R[rax] = M[R[r_a]]$                       |
| Base w/ Displacement         | imm(r<sub>b</sub>)                              | movq 40(%rdx),%rax        | $R[rax] = M[R[r_b] + 40]$                  |
| Scaled Index                 | (r<sub>b</sub>,r<sub>i</sub>,s<sup>†</sup>)     | movq (%rdx,%rcx,4),%rax   | $R[rax] = M[R[r_b] + R[r_i] \cdot s]$      |
| Scaled Index w/ Displacement | imm(r<sub>b</sub>,r<sub>i</sub>,s<sup>†</sup>)  | movq 80(%rdx,%rcx,2),%rax | $R[rax] = M[80 + R[r_b] + R[r_i] \cdot s]$ |

†Known as the scale factor and can be {1,2,4, or 8}
 Imm = Constant, R[x] = Content of register x, M[addr] = Content of memory @ addr.
 Purple values = effective address (EA) = Actual address used to get the operand

- The ways to specify the operand location are known as addressing modes
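
Every memory addressing mode in the table is a special case of one formula, displacement + base + index × scale. The function below is a minimal sketch of that calculation; the name `effective_address` and the sample register values (rdx = 0x1000, rcx = 3) are assumptions chosen for illustration.

```c
#include <stdint.h>

/* Effective address for the general form imm(rb, ri, s).
   Simpler modes pass 0 for the parts they do not use. */
uint64_t effective_address(int64_t imm, uint64_t base, uint64_t index, unsigned scale) {
    /* scale is expected to be 1, 2, 4, or 8 */
    return (uint64_t)imm + base + index * scale;
}
```

With rdx = 0x1000 and rcx = 3, `40(%rdx)` corresponds to `effective_address(40, 0x1000, 0, 1)` = 0x1028, `(%rdx,%rcx,4)` to 0x100c, and `80(%rdx,%rcx,2)` to 0x1056.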



#### Base/Indirect with Displacement Addressing Mode

- Form: d(%reg)
- Adds a constant displacement to the value in a register and uses the sum as the effective address of the actual operand in memory



#### Base/Indirect with Displacement Example

[Figure: memory / RAM and assembly panels; register rbx = 0000 0000 0000 0200]

- Useful for accessing members of a struct or object

<pre>
struct mystruct {
  int x;
  int y;
};
</pre>
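
The link between struct members and displacements can be made concrete in C. This is a sketch: the helper `read_y_by_displacement` is hypothetical, and the comparison to `movl 4(%rbx),%eax` assumes rbx holds the struct's address and that `int` is 4 bytes, which is typical on x86-64 but not guaranteed by C.

```c
#include <stddef.h>
#include <string.h>

struct mystruct {
    int x;
    int y;
};

/* Read member y the way base + displacement addressing would:
   start from the struct's base address and add the constant offset of y. */
int read_y_by_displacement(const struct mystruct *base) {
    int value;
    const char *addr = (const char *)base + offsetof(struct mystruct, y);
    memcpy(&value, addr, sizeof value);   /* the "load" from memory */
    return value;
}
```

Here `offsetof(struct mystruct, y)` plays the role of the displacement d in `d(%reg)`, and the pointer argument plays the role of the base register.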





- If a 64-bit immediate is needed, use movabsq to place the immediate in a register and then add two regs

### lea Instruction

CS:APP 3.5.1

- Recall the exotic addressing modes supported by x86

| Name                         | Form                               | Example                   | Description                                |
|------------------------------|------------------------------------|---------------------------|--------------------------------------------|
| Scaled Index w/ Displacement | imm(r<sub>b</sub>,r<sub>i</sub>,s) | movq 80(%rdx,%rcx,2),%rax | $R[rax] = M[80 + R[r_b] + R[r_i] \cdot s]$ |

- The hardware has to support the calculation of the effective address (i.e. 2 adds + 1 mul [by 2, 4, or 8])
- Meanwhile, normal add and mul instructions can only do 1 operation at a time
- Idea: Create an instruction that can use the address calculation hardware but for arithmetic ops
- lea = Load Effective Address
  - lea 80(%rdx,%rcx,2),%rax   // rax = 80 + rdx + 2\*rcx
  - Computes the "address" and just puts it in the destination (doesn't load anything from memory)
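
In C terms, lea evaluates the address expression as plain integer arithmetic and never touches memory. The two functions below are illustrative sketches with invented names. The second shows a pattern that compilers such as GCC and Clang often translate into a single lea of the form `(%rdi,%rdi,8)`; that is a common outcome, not a guarantee.

```c
#include <stdint.h>

/* What lea 80(%rdx,%rcx,2),%rax computes: pure arithmetic, no memory access. */
uint64_t lea_80_rdx_rcx_2(uint64_t rdx, uint64_t rcx) {
    return 80 + rdx + rcx * 2;
}

/* x * 9 written as x + x*8, which fits the base + index*scale form. */
long times9(long x) {
    return x + x * 8;
}
```

For example, with rdx = 0x1000 and rcx = 3 the first function returns 0x1056, the same number the addressing hardware would have used as an address for a mov.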



x86 Convention: The return value of a function is expected in %eax / %rax

# Arithmetic and Logic Instructions

| C operator                            | Assembly                                                 | Notes                                                 |
|---------------------------------------|----------------------------------------------------------|-------------------------------------------------------|
| +                                     | add[b,w,l,q] src1,src2/dst                               | src2/dst += src1                                      |
| -                                     | sub[b,w,l,q] src1,src2/dst                               | src2/dst -= src1                                      |
| &                                     | and[b,w,l,q] src1,src2/dst                               | src2/dst &amp;= src1                                  |
| \|                                    | or[b,w,l,q] src1,src2/dst                                | src2/dst \|= src1                                     |
| ^                                     | xor[b,w,l,q] src1,src2/dst                               | src2/dst ^= src1                                      |
| ~                                     | not[b,w,l,q] src/dst                                     | src/dst = ~src/dst                                    |
| - (unary)                             | neg[b,w,l,q] src/dst                                     | src/dst = (~src/dst) + 1                              |
| ++                                    | inc[b,w,l,q] src/dst                                     | src/dst += 1                                          |
| \-\-                                  | dec[b,w,l,q] src/dst                                     | src/dst -= 1                                          |
| \* (signed)                           | imul[b,w,l,q] src1,src2/dst                              | src2/dst \*= src1                                     |
| &lt;&lt; (signed)                     | sal cnt, src/dst                                         | src/dst = src/dst &lt;&lt; cnt                        |
| &lt;&lt; (unsigned)                   | shl cnt, src/dst                                         | src/dst = src/dst &lt;&lt; cnt                        |
| &gt;&gt; (signed)                     | sar cnt, src/dst                                         | src/dst = src/dst &gt;&gt; cnt                        |
| &gt;&gt; (unsigned)                   | shr cnt, src/dst                                         | src/dst = src/dst &gt;&gt; cnt                        |
| ==, <, >, <=, >=, !=<br>(src2 ? src1) | cmp[b,w,l,q] src1, src2<br>test[b,w,l,q] src1, src2      | cmp performs: src2-src1<br>test performs: src1 & src2 |
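
A few rows of the table are easy to misread, so the sketch below spells them out in C: the AT&T operand order for sub (the second operand is both source and destination), the identity behind neg, and the difference between the arithmetic (sar) and logical (shr) right shifts. The function names are invented for this example, shift counts are assumed to be less than 32, and the final casts assume two's-complement wraparound.

```c
#include <stdint.h>

/* subl src,dst : dst = dst - src (note the operand order). */
void subl(int32_t src, int32_t *dst) {
    *dst = (int32_t)((uint32_t)*dst - (uint32_t)src);
}

/* negl : two's-complement negation, (~v) + 1. */
int32_t negl(int32_t v) {
    return (int32_t)(~(uint32_t)v + 1u);
}

/* sarl : arithmetic right shift, copies of the sign bit fill the top. */
int32_t sarl(int32_t v, unsigned cnt) {
    uint32_t u = (uint32_t)v;
    uint32_t r = (v < 0) ? ~(~u >> cnt) : (u >> cnt);
    return (int32_t)r;
}

/* shrl : logical right shift, zeros fill the top. */
uint32_t shrl(uint32_t v, unsigned cnt) {
    return v >> cnt;
}
```

Shifting the bit pattern 0xfffffff8 right by 1 gives -4 with `sarl` (the value -8 halved) but 0x7ffffffc with `shrl`, which is why the signed and unsigned C `>>` map to different instructions.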


| Initial Conditions                   | Processor Registers | 0000 | 0089 | 0000<br>1234<br>ff00 | 4000 | rcx<br>rdx<br>rbx |
|--------------------------------------|---------------------|------|------|----------------------|------|-------------------|
| leal (%rdx,%rcx),%eax                |                     |      |      |                      |      | rax               |
| leaq -8(%rbx),%rax                   |                     |      |      |                      |      | rax               |
| leaq 12(%rdx,%rcx,2),%rax            |                     |      |      |                      |      | rax               |

Rules: leal zeroes out the upper 32-bits

# mov and add/sub Examples

| Instruction                | M[0x7000] | M[0x7004] | %rax                |
|----------------------------|-----------|-----------|---------------------|
| (initial conditions)       | 5A13 F87C | 2933 ABC0 | 0000 0000 0000 0000 |
| movl \$0x26CE071B, 0x7000  |           |           |                     |
| movsbw 0x7002,%ax          |           |           |                     |
| movzwq 0x7004,%rax         |           |           |                     |
| movw \$0xFE44,0x7006       |           |           |                     |
| addl 0x7000,%eax           |           |           |                     |
| subb %al,0x7007            |           |           |                     |

# Compiler Example 1

| Original Code | Compiler Output |
|---------------|-----------------|
| // data = %edi<br>// val = %esi<br>// i = %edx<br>int f1(int data[], int\* val, int i)<br>{<br>&nbsp;&nbsp;int sum = \*val;<br>&nbsp;&nbsp;sum += data[i];<br>&nbsp;&nbsp;return sum;<br>} | f1:<br>&nbsp;&nbsp;(body left blank on the slide)<br>&nbsp;&nbsp;ret |

# Compiler Output 2

| Original Code | Compiler Output |
|---------------|-----------------|
| struct Data {<br>&nbsp;&nbsp;char c;<br>&nbsp;&nbsp;int d;<br>};<br>// ptr = %edi<br>// x = %esi<br>int f1(struct Data\* ptr, int x)<br>{<br>&nbsp;&nbsp;ptr->c++;<br>&nbsp;&nbsp;ptr->d -= x;<br>} | f1:<br>&nbsp;&nbsp;(body left blank on the slide)<br>&nbsp;&nbsp;ret |

#### **ASSEMBLY TRANSLATION EXAMPLE**

# Translation to Assembly

- We will now see some C code and its assembly translation
- A few things to remember:
  - Data variables live in memory
  - Data must be brought into registers before being processed
  - You often need an address/pointer in a register to load/store data to/from memory
- Generally, you will need 4 steps to translate C to assembly:
  - Set up a pointer in a register
  - Load data from memory to a register (mov)
  - Process data (add, sub, and, or, shift, etc.)
  - Store data back to memory (mov)

# Translating HLL to Assembly

- Variables are simply locations in memory
  - A variable name really translates to an address in assembly

| C code                       | Assembly                                                                             | Notes                                                         |
|------------------------------|--------------------------------------------------------------------------------------|---------------------------------------------------------------|
| int x,y,z;<br><br>z = x + y; | movl \$0x10000004,%ecx<br>movl (%ecx),%eax<br>addl 4(%ecx),%eax<br>movl %eax,8(%ecx) | Assume x @ 0x10000004<br>& y @ 0x10000008<br>& z @ 0x1000000C |
| char a[100];<br><br>a[1]\-\-; | movl \$0x1000000c,%ecx<br>decb 1(%ecx)                                              | Assume array 'a' starts @<br>0x1000000C                       |


# Translating HLL to Assembly

| C code                                                   | Assembly                                                                                                                                          | Notes                                                                                                                        |
|----------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
| int dat[4],x;<br><br>x = dat[0];<br>x += dat[1];         | movl \$0x10000010,%ecx<br>movl (%ecx),%eax<br>movl %eax,16(%ecx)<br>movl 16(%ecx),%eax<br>addl 4(%ecx),%eax<br>movl %eax,16(%ecx)                 | Assume dat @ 0x10000010<br>& x @ 0x10000020<br>Steps: pointer init, read data from mem., ALU op, write data to mem.          |
| unsigned int y;<br>short z;<br>y = y / 4;<br>z = z << 3; | movl \$0x10000010,%ecx<br>movl (%ecx),%eax<br>shrl \$2,%eax<br>movl %eax,(%ecx)<br>movw 4(%ecx),%ax<br>salw \$3,%ax<br>movw %ax,4(%ecx)           | Assume y @ 0x10000010 &<br>z @ 0x10000014                                                                                    |
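
The four translation steps for `y = y / 4; z = z << 3;` can be mirrored in C using a byte array as memory. Everything here is a sketch under assumptions: `ram` stands in for the 8 bytes starting at 0x10000010, the local variables `eax` and `ax` stand in for registers, and `run_example` is a hypothetical wrapper that sets up memory and reads the results back.

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t y; int16_t z; } Result;

/* ecx plays the role of the pointer register (step 1 is done by the caller). */
static void update_y_and_z(uint8_t *ecx) {
    uint32_t eax;
    memcpy(&eax, ecx, sizeof eax);       /* step 2: load y, like movl (%ecx),%eax */
    eax >>= 2;                           /* step 3: unsigned y / 4 is a logical shift right by 2 */
    memcpy(ecx, &eax, sizeof eax);       /* step 4: store y, like movl %eax,(%ecx) */

    uint16_t ax;
    memcpy(&ax, ecx + 4, sizeof ax);     /* step 2: load z, like movw 4(%ecx),%ax */
    ax = (uint16_t)(ax << 3);            /* step 3: shift left by 3 */
    memcpy(ecx + 4, &ax, sizeof ax);     /* step 4: store z, like movw %ax,4(%ecx) */
}

/* Place y at offset 0 and z at offset 4, run the steps, read both back. */
Result run_example(uint32_t y, int16_t z) {
    uint8_t ram[8] = { 0 };
    Result r;
    memcpy(ram, &y, sizeof y);
    memcpy(ram + 4, &z, sizeof z);
    update_y_and_z(ram);                 /* step 1: pointer to the data */
    memcpy(&r.y, ram, sizeof r.y);
    memcpy(&r.z, ram + 4, sizeof r.z);
    return r;
}
```

For example, `run_example(100, 5)` returns y = 25 and z = 40.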


How instruction sets differ

#### **INSTRUCTION SET ARCHITECTURE**

# Instruction Set Architecture (ISA)

- Defines the software interface of the processor and memory system
- Instruction set is the vocabulary the HW can understand and the SW is composed with
- 2 approaches
  - CISC = Complex instruction set computer
    - Large, rich vocabulary
    - More work per instruction but slower HW
  - RISC = Reduced instruction set computer
    - Small, basic, but sufficient vocabulary
    - Less work per instruction but faster HW

# Components of an ISA

- Data and Address Size
  - 8-, 16-, 32-, 64-bit
- Which instructions does the processor support
  - SUBtract instruc. vs. NEGate + ADD instrucs.
- Registers accessible to the instructions
  - How many and expected usage
- Addressing modes
  - How instructions can specify location of data operands
- \_\_\_\_\_ and \_\_\_\_\_ of instructions
  - How the operation and operands are represented with 1's and 0's


# **General Instruction Format Issues**

- Different instruction sets specify these differently
  - 3 operand instruction set (ARM, PPC)
    - Similar to example on previous page
    - Format: ADD DST, SRC1, SRC2 (DST = SRC1 + SRC2)
  - 2 operand instructions (Intel)
    - Second operand doubles as source and destination
    - Format: ADD SRC1, S2/D (S2/D = SRC1 + S2/D)
  - 1 operand instructions (Old Intel FP, Low-End Embedded)
    - Implicit operand to every instruction usually known as the Accumulator (or ACC) register
    - Format: ADD SRC1 (ACC = ACC + SRC1)

# **General Instruction Format Issues** (continued)

- Consider the pros and cons of each format when performing the set of operations
  - F = X + Y - Z
  - G = A + B
- Simple embedded computers often use single operand format
  - Smaller data size (8-bit or 16-bit machines) means limited instruc. size
- Modern, high performance processors use 2- and 3-operand formats

| Single-Operand | Two-Operand | Three-Operand |
|----------------|-------------|---------------|
|                | MOVE F,X<br>ADD F,Y<br>SUB F,Z<br>MOVE G,A<br>ADD G,B | ADD F,X,Y<br>SUB F,F,Z<br>ADD G,A,B |
| (+) Smaller size to encode each instruction<br>(-) Higher instruction count to load and store ACC value | Compromise of two extremes | (+) More natural program style<br>(+) Smaller instruction count<br>(-) Larger size to encode each instruction |
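
The instruction-count trade-off can be checked with a toy machine that executes F = X + Y - Z and G = A + B in each format and counts instructions. This is a sketch: the `Machine` type and all helper names are invented, and the single-operand sequence (LOAD, ADD, SUB, STORE through an accumulator) is an assumed program, since that column is empty in the table.

```c
enum { X, Y, Z, A, B, F, G, NUM_VARS };

typedef struct {
    int mem[NUM_VARS];
    int acc;     /* accumulator, used only by the single-operand format */
    int count;   /* number of instructions executed */
} Machine;

/* Single-operand: ACC is the implicit second source and the destination. */
static void load1(Machine *m, int s)  { m->acc = m->mem[s];  m->count++; }
static void add1(Machine *m, int s)   { m->acc += m->mem[s]; m->count++; }
static void sub1(Machine *m, int s)   { m->acc -= m->mem[s]; m->count++; }
static void store1(Machine *m, int d) { m->mem[d] = m->acc;  m->count++; }

/* Two-operand: first operand doubles as source and destination (as in the table). */
static void move2(Machine *m, int d, int s) { m->mem[d] = m->mem[s];  m->count++; }
static void add2(Machine *m, int d, int s)  { m->mem[d] += m->mem[s]; m->count++; }
static void sub2(Machine *m, int d, int s)  { m->mem[d] -= m->mem[s]; m->count++; }

/* Three-operand: destination and both sources are named explicitly. */
static void add3(Machine *m, int d, int s1, int s2) { m->mem[d] = m->mem[s1] + m->mem[s2]; m->count++; }
static void sub3(Machine *m, int d, int s1, int s2) { m->mem[d] = m->mem[s1] - m->mem[s2]; m->count++; }

int run_single_operand(Machine *m) {
    m->count = 0;
    load1(m, X); add1(m, Y); sub1(m, Z); store1(m, F);
    load1(m, A); add1(m, B); store1(m, G);
    return m->count;
}

int run_two_operand(Machine *m) {
    m->count = 0;
    move2(m, F, X); add2(m, F, Y); sub2(m, F, Z);
    move2(m, G, A); add2(m, G, B);
    return m->count;
}

int run_three_operand(Machine *m) {
    m->count = 0;
    add3(m, F, X, Y); sub3(m, F, F, Z);
    add3(m, G, A, B);
    return m->count;
}
```

All three programs compute the same F and G, but they take 7, 5, and 3 instructions respectively, which is the count side of the trade-off against encoding size.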


### Instruction Format

Load/Store architecture

- Load (read) data values from memory into a register
- Perform operations on registers
- Store (write) data values back to memory
- Different load/store instructions for different operand sizes (i.e. byte, half, word)

#### Load/Store Architecture



1.) Load operands to proc. registers

2.) Perform operations on registers

3.) Store results back to memory