



#### **Data Hazards**

- Consider the data dependencies in the following sequence
  - The last four are all dependent on register \$2
- But because of pipelining the instructions and, or, add could \_\_\_\_\_ before the sub writes its new result
- This is called a \_\_\_\_\_ more specifically a RAW (\_\_\_\_\_) Hazard
  - If the RAW hazards is not handled, incorrect program execution may result

| SUB | (\$2) | \$1, \$3  |
|-----|-------|-----------|
| AND | \$12, | \$2, \$5  |
| OR  | \$13, | \$6, \$2  |
| ADD | \$14, | \$2, \$2  |
| SW  | \$15, | 100 (\$2) |
|     |       |           |

# An Opening Example An Opening Example

EE 457 Unit 6b

Data Hazards



• Can the compiler solve this problem w/o hardware help?

### An Opening Example

- The compiler's solution is to insert nop (\_\_\_\_\_) instructions
- The effect is to push the dependency later in time





#### **Control for Data Hazards**

| • | two nardware solutions |
|---|------------------------|
|   | _                      |
|   |                        |

Stall Strategy:

- Detect the hazard and stall the dependent instructions in the pipeline until the
- Stalling is achieved by sending \_\_\_\_\_\_ forward into the pipe and not updating the stalled stage registers

## USC Viterbi 6b.6

## **Stalling Strategy**

 Since we must be careful not to read a \_\_\_\_\_ register value from the register file, we should detect hazards in the ID stage and stall the instruction there!

If an instruction stalls, all instructions behind it

- All instructions in front of it are

Insert "bubbles" into the
 \_\_\_\_\_ (set all control signals to 0 so no incorrect behavior takes place)

LW \$t1,4(\$s0) ADD \$t5,\$t1,\$t4

|    | Fetch | Decode | Exec. | Mem.  | WB    |
|----|-------|--------|-------|-------|-------|
| C1 | LW    |        |       |       |       |
| C2 | ADD   | LW     |       |       |       |
| СЗ | i     | ADD    | LW    |       |       |
| C4 | i     | ADD    | (nop) | LW    |       |
| C5 | i     | ADD    | (nop) | (nop) | LW    |
| C6 | i     | ADD    | (nop) | (nop) | (dou  |
| C7 | i+1   | i      | ADD   | (nop) | (nop) |
| C8 | i+2   | i+1    | i     | ADD   | (nop) |

Using Stalls to Handle
Dependencies (Data Hazards)



### **Detecting Data Hazards**

- Need to stall if an instruction in the last 3 stages is going to write a register the currently decoding instruction wants to read (i.e. READ-AFTER-WRITE)
- How would we know if an instruction in the pipe is going to write a register than an instruction in ID wants to read?
  - By comparing register ID values!!

| Cases for Detecting Data Dependecies |                                                   |  |  |
|--------------------------------------|---------------------------------------------------|--|--|
| 1a. ID/EX                            | and ID/EX.WriteRegister == IF/ID.ReadRegister1    |  |  |
| 1b. ID/EX                            | and ID/EX.WriteRegister == IF/ID.ReadRegister2    |  |  |
| 2a. EX/MEM                           | _ and EX/MEM.WriteRegister == IF/ID.ReadRegister1 |  |  |
| 2b. EX/MEM                           | _ and EX/MEM.WriteRegister == IF/ID.ReadRegister2 |  |  |
| 3a. MEM/WB                           | and MEM/WB.WriteRegister == IF/ID.ReadRegister1   |  |  |
| 3b. MEM/WB                           | and MEM/WB.WriteRegister == IF/ID.ReadRegister2   |  |  |



• Only stall if a Write register in one of the last 3 stages matches one of the read registers in the ID stage





## **HDU Operation**

| Hazard     | Detection                                                                                                          |
|------------|--------------------------------------------------------------------------------------------------------------------|
| EX Hazard  | ID/EX RegWrite and ((ID/EX.WriteRegister = IF/ID.ReadRegister1) or (ID/EX.WriteRegister = IF/ID.ReadRegister2))    |
| MEM Hazard | EX/MEM RegWrite and ((EX/MEM.WriteRegister = IF/ID.ReadRegister1) or (EX/MEM.WriteRegister = IF/ID.ReadRegister2)) |
| WB Hazard  | MEM/WB RegWrite and ((MEM/WB.WriteRegister = IF/ID.ReadRegister1) or (MEM/WB.WriteRegister = IF/ID.ReadRegister2)) |



## **HDU Logic**

- Detection logic requires \_\_\_\_\_\_ \_\_\_\_-bit comparators along with some AND and OR gates
- Upon detection, HDU inserts a bubble into the ID/EX stage register
  - Bubble = HW generated NOP = Turn all control signals to zeros



## **HDU** Implementation

How long do we stall

(wait \_\_\_\_\_ cycle)

| _ | If the hazard | exists in the EX stage, we need to insert $\_\_$ b | ubbles |
|---|---------------|----------------------------------------------------|--------|
|   | (wait         | ) before restarting the pipeline                   |        |
| _ | If the hazard | exists in the WB stage we only need to insert      | bubb   |

• So since the delay is time dependent does the HDU require a counter or state machine?

| _ | The producer instruction will keep                         | and         |
|---|------------------------------------------------------------|-------------|
|   | eventually clear. The HDU works by simply checking if      | hazard      |
|   | exists in the forward stages and inserts a bubble into the | ID/EX stage |
|   | register                                                   |             |

 If an EX hazard exists it will take 3 cycles to clear and thus the HDU will detect an EX hazard in one clock, a MEM hazard in the next, and a WB hazard in the third inserting a bubble for each of these cycle = 3 bubbles



## **HDU** Implementation

- What if two hazards exist at the same time
  - Again, any hazard should cause a bubble
  - The producing instructions will continue to move forward and

eventually clear

| SUB | \$2   | \$1,         | \$3   |
|-----|-------|--------------|-------|
| AND | (\$4) | \$2)         | \$5   |
| OR  | \$8,  | <b>\$2</b> , | \$6   |
| ADD | \$9,  | (\$4)        | (\$2) |
| SLT | \$1,  | \$6,         | \$7   |
|     |       |              |       |

|    | Fetch | Decode | Exec. | Mem. | WB |
|----|-------|--------|-------|------|----|
| C1 | SUB   |        |       |      |    |
| C2 | AND   | SUB    |       |      |    |
| C3 | OR    | AND    | SUB   |      |    |
| C4 |       |        |       |      |    |
| C5 |       |        |       |      |    |
| C6 |       |        |       |      |    |
| C7 |       |        |       |      |    |
| C8 |       |        |       |      |    |
| C9 |       |        |       |      |    |



USC Viterbi 6b.14

## Key Idea

While \$2 is not written until WB stage, the subtraction result is available at the end of the \_\_\_ stage (beginning of the \_\_\_ stage) and can be \_\_\_\_ to dependent instructions

SUB \$2, \$1, \$3 IM Reg ALU DM Reg

AND \$12, \$2, \$5 IM Reg ALU DM Reg

OR \$13, \$6, \$2 IM Reg ALU DM Reg

ADD \$14, \$2, \$2 Register file can be designed such that the

value being written can

immediately be forwarded to read ports

SW \$15, 100(\$2)

Register Forwarding/Bypassing

#### **REDUCING DATA HAZARDS**



#### Register File Internal Forwarding

- Internal Forwarding:
  - Value read = Value being written







#### Forwarding Unit Addition

- Remove the old HDU in the ID stage
- Add a new Forwarding Unit (FU) in the EX stage
  - Like HDU it services dependent instructions
  - Compares write register ID's in later stages to read register ID's in earlier stages



#### Forwarding Unit vs. HDU

- Since the HDU stalled instructions in the ID stage it needed to compare 2 source ID's with 3 destination ID's
- Because we let instructions fetch stale register values and just replace them in the EX (or MEM) stage, the forwarding Unit compares 2 source ID's with 2 destination ID's
- HDU had 6 comparators while the FU requires 4



### ReadReg1 Forwarding

ALUSelA mux



## ReadReg2 Forwarding

ALUSelB mux













#### Back to the HDU

- Re-introduce the HDU to handle the case of a LW immediately followed by a dependent instruction
- EX Hazard:

If (ID/EX.RegWrite and ID/EX.MemRead = 1 and
 (ID/EX.WriteRegRt = IF/ID.ReadReg1
 or
 ID/EX.WriteRegRt = IF/ID.ReadReg2))
Then

Stall the Pipeline

Note: We use MemRead = 1 to indicate the instruction in ID/EX is an LW. We could also use \_\_\_\_\_\_ or even \_\_\_\_\_

LW \$2, 100(\$1) AND \$12, \$2, \$5 OR \$13, \$6, \$2 ADD \$14, \$2, \$2 SW \$15, 100(\$2)

|            | Fetc<br>h | Decod<br>e | Exec. | Mem. | WB |
|------------|-----------|------------|-------|------|----|
| C1         | LW        |            |       |      |    |
| C2         | AND       | LW         |       |      |    |
| C3         | OR        | AND        | LW    |      |    |
| C4         |           |            |       |      |    |
| C5         |           |            |       |      |    |
| C6         |           |            |       |      |    |
| <b>C</b> 7 |           |            |       |      |    |

## Back to the HDU



## USC Viterbi 66.27

#### **One More Consideration**

- Consider the sequence shown to the right
- Is there a dependency?
- Do we have the forwarding paths to handle this dependency?
  - At first glance

| But we can _ |             |
|--------------|-------------|
|              | and use our |
|              |             |

SUB <mark>\$2</mark>, \$1, \$3 SW **\$2**, 40(\$6)

|    | Fetch | Decode | Exec. | Mem. | WB  |
|----|-------|--------|-------|------|-----|
| C1 | SUB   |        |       |      |     |
| C2 | SW    | SUB    |       |      |     |
| СЗ | i     | SW     | SUB   |      |     |
| C4 | i+1   | i      | SW    | SUB  |     |
| C5 | i+2   | i+1    | i     | SW   | SUB |

|    | Fetch | Decode | Exec. | Mem. | WB  |
|----|-------|--------|-------|------|-----|
| C1 | SUB   |        |       |      |     |
| C2 | SW    | SUB    |       |      |     |
| СЗ | i     | SW     | SUB   |      |     |
| C4 | i+1   | i      | SW    | SUB  |     |
| C5 | i+2   | i+1    | i     | SW   | SUB |

## Dealing with Memory Dependency





## **Calculating Stall Cycles**

• To find the number of bubbles (stall cycles) that the HDU will need to insert:





## **Forwarding Unit Complexity**

 Consider how many muxes and pathways must be added to support forwarding in the worst case? For n stages, forward logic complexity =

