

# USC Viterbi School of Engineering

# **Learning Outcomes**

- I understand the control inputs to counters
- I can design logic to control the inputs of counters to create a desired count sequence
- I understand how smaller adder blocks can be combined to form larger ones
- I can build larger arithmetic circuits from smaller building blocks
- I understand the timing and control input differences between asynchronous and synchronous memories

# Spiral 2-2

Arithmetic Components and Their Efficient Implementations





- Control (CU) and Datapath Unit (DPU) paradigm
  - Separate logic into datapath elements that operate on data and control elements that generate control signals for datapath elements
  - Datapath: Adders, muxes, comparators, counters, registers (shift, with enables, etc.), memories, FIFO's
  - Control Unit: State machines/sequencers



#### **DATAPATH COMPONENTS**





#### Overflow

- Overflow occurs when the result of an arithmetic operation is \_\_\_\_\_\_ to be represented with the given number of bits
  - Unsigned overflow occurs when adding or subtracting unsigned numbers
  - Signed (2's complement) overflow occurs when adding or subtracting 2's complement numbers

#### Detecting Overflow Helps Us Perform Comparison

#### **OVERFLOW & COMPARISON**



# **Unsigned Overflow**





# 2's Complement Overflow





# **Testing for Overflow**

- Most fundamental test
  - Check if answer is \_\_\_\_\_ (i.e. Positive + Positive yields a negative)
- Unsigned overflow test [Different for add or sub]
  - Addition: If carry-out of final position equals \_\_\_\_\_
  - Subtraction: If carry-out of final addition equals \_\_\_\_\_
- Signed (2's complement) overflow test [Same for add or sub]
  - Only occurs if \_\_\_\_\_
  - Alternate test: if \_\_\_\_\_\_ of final column are different



# **Testing for Unsigned Overflow**

- Unsigned Overflow has occurred if...
  - Unsigned Addition: If final carry-out = \_\_\_\_
  - Unsigned Subtraction: If final carry-out = \_\_\_\_



# Testing for 2's Comp. Overflow

- 2's Complement Overflow Occurs If...
  - Test 1: If pos. + pos. = neg. or neg. + neg. = pos.
  - Test 2: If carry-in to MSB position and carry-out of MSB position are different



## **Checking for Overflow**

 Produce additional outputs to indicate if unsigned (UOV) or signed (SOV) overflow has occurred
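The two overflow tests can be sketched in Python as a small model of an n-bit adder/subtractor that produces UOV and SOV alongside the result. This is only an illustrative sketch: the function name `add_sub` and its parameters are assumptions, not part of the slides, and subtraction is modeled as A + (~B) + 1.

```python
def add_sub(a, b, n=4, subtract=False):
    """Add or subtract two n-bit patterns; return (result, uov, sov).

    Hypothetical helper for illustration, not taken from the slides.
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    # Subtraction is performed as A + (~B) + 1.
    operand = (~b & mask) if subtract else b
    carry_in = 1 if subtract else 0

    # Carry into the MSB column: add only the low n-1 bits.
    low_mask = mask >> 1
    c_msb_in = ((a & low_mask) + (operand & low_mask) + carry_in) >> (n - 1)

    total = a + operand + carry_in
    c_out = total >> n          # carry-out of the MSB column
    result = total & mask

    # Unsigned overflow: carry-out of 1 on add, carry-out of 0 on subtract.
    uov = (c_out == 0) if subtract else (c_out == 1)
    # Signed overflow (Test 2): carry-in and carry-out of the MSB differ.
    sov = c_msb_in != c_out
    return result, uov, sov


print(add_sub(0b0111, 0b0001))                  # 7 + 1: signed overflow only
print(add_sub(0b1111, 0b0001))                  # 15 + 1: unsigned overflow only
print(add_sub(0b0011, 0b0101, subtract=True))   # 3 - 5: unsigned overflow only
```

Note that the same bit pattern can overflow under one interpretation and not the other, which is why the two flags are produced separately.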





#### **COMPARISON**


## Computing A<B from "Negative" Result

#### Unsigned

- Perform A-B
- If A-B would yield a negative result, this will appear as \_\_\_\_\_in an unsigned subtraction
- And we know unsigned subtraction overflow occurs if \_\_\_\_\_\_
- So just check if \_\_\_\_\_

#### Signed

- Perform A-B
- If A-B would yield a negative result, this will appear as \_\_\_\_\_
  - If there is no overflow (V=0), simply check if \_\_\_\_\_\_
  - But if there is overflow??
    - Recall overflow has the effect of flipping the sign of the result to the opposite of what it should be.
  - So if *there is overflow (V=1)*, check if \_\_\_\_\_ (i.e. positive)
  - Summary: A-B is "truly" negative if:



# **Comparison Via Subtraction**

- Suppose we want to compare two numbers: A & B
- Suppose we let DIFF = A-B...what could the result tell us
  - If DIFF < 0, then \_\_\_\_\_
  - If DIFF = 0, then \_\_\_\_\_
  - If DIFF > 0, then \_\_\_\_\_
- How would we know DIFF == 0?
  - If all bits of our answer are \_\_\_\_\_
- How would we know DIFF < 0 (i.e. negative)?
  - Signed: \_\_\_\_\_! (but what about overflow)
  - Unsigned: Huh? In unsigned there are no negative results
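As a rough sketch of comparison via subtraction, the Python model below computes A - B as A + (~B) + 1 and derives equality, unsigned less-than, and signed less-than from the result bits, the carry-out, and the signed overflow flag V. The function name `compare` and the dictionary keys are assumptions made for this example.

```python
def compare(a, b, n=4):
    """Compare two n-bit patterns via A - B = A + ~B + 1.

    Hypothetical helper for illustration, not taken from the slides.
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    not_b = ~b & mask
    total = a + not_b + 1
    diff = total & mask
    c_out = total >> n
    # Carry into the MSB column, needed for the signed overflow test.
    low = mask >> 1
    c_msb_in = ((a & low) + (not_b & low) + 1) >> (n - 1)

    neg = (diff >> (n - 1)) & 1   # sign bit of the n-bit result
    v = c_msb_in ^ c_out          # signed overflow flag
    return {
        "eq": diff == 0,          # all result bits are 0
        "ult": c_out == 0,        # unsigned A < B: the subtraction overflowed
        "slt": bool(neg ^ v),     # signed A < B: sign bit, flipped if V = 1
    }


print(compare(0b0011, 0b0101))   # 3 vs 5: less-than in both interpretations
print(compare(0b0111, 0b1000))   # unsigned 7 < 8, but signed 7 > -8
```

The second call shows why the signed test cannot look at the sign bit alone: the raw result is negative, but overflow has flipped its sign.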



# **Unsigned Comparator**

• A comparator can be built by using a subtractor





# **Signed Comparator**

• A comparator can be built by using a subtractor





# **Summary**

- You should now be able to build:
  - Fast Adders
  - Comparators




# Addition - Full Adders

• Be sure to connect first $C_{in}$ to 0

$$\begin{array}{r} 0110 = X \\ +\ 0111 = Y \end{array}$$
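A chain of full adders can be modeled in a few lines of Python. The sketch below is illustrative only (the names `full_adder` and `ripple_carry_add` are assumptions); it ties the first carry-in to 0 and works through the example X = 0110, Y = 0111.

```python
def full_adder(x, y, c_in):
    """One-bit full adder: returns (sum, carry_out)."""
    s = x ^ y ^ c_in
    c_out = (x & y) | (x & c_in) | (y & c_in)
    return s, c_out


def ripple_carry_add(x, y, n=4, c_in=0):
    """Chain n full adders, LSB first; returns (n-bit sum, final carry-out)."""
    carry = c_in              # the first C_in is tied to 0 by default
    result = 0
    for i in range(n):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


total, c_out = ripple_carry_add(0b0110, 0b0111)
print(format(total, "04b"), c_out)   # 1101 0
```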



#### **ADDER TIMING**



# **Timing**

- A chain of full adders presents an interesting timing analysis problem
- To correctly compute its own Sum and Carry-out, each full adder requires the carry-out bit from the \_\_\_\_\_ full adder
- Because hardware works in parallel, the full adders further down the chain may \_\_\_\_\_\_ produce the \_\_\_\_\_ outputs because the carry has not had time to \_\_\_\_\_ to them




## **Timing Example**

 Assume that we were adding one set of inputs and then change to a new set of inputs:





# **Timing**

 At the time just before we enter the new input values, all carries are 0's





# **Timing**

 Now we enter the new inputs and all the FA's start adding their respective inputs





# **Timing**

 Each adder computes from the current inputs (notice the sum of 1110 is incorrect at this point)



Now the carries are all based off the new inputs



# **Timing**

• The carry is "rippling" through each adder










# **Timing**

 Only after the carry propagates through all the adders is the sum valid and correct
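The rippling behavior can be imitated with a simple step-by-step simulation. The sketch below assumes every full adder updates its carry-out once per time step from the carry-in it saw in the previous step, with all carries starting at 0. The inputs 1111 + 0001 are an assumption (the slide figures are not reproduced here), chosen because they show 1110 before any carry has moved.

```python
def ripple_snapshots(x, y, n=4):
    """Return the sum displayed by the adder after each carry-delay step.

    Hypothetical unit-delay model: all full adders recompute in parallel,
    each using the carry-in value from the previous step.
    """
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> i) & 1 for i in range(n)]
    carries = [0] * (n + 1)          # carries[i] is the carry into column i
    snapshots = []
    for _ in range(n + 1):
        total = 0
        for i in range(n):
            total |= (xb[i] ^ yb[i] ^ carries[i]) << i
        snapshots.append(format(total, f"0{n}b"))
        # Every full adder produces a new carry-out from its current inputs.
        new = [0] * (n + 1)
        for i in range(n):
            new[i + 1] = (xb[i] & yb[i]) | (xb[i] & carries[i]) | (yb[i] & carries[i])
        carries = new
    return snapshots


print(ripple_snapshots(0b1111, 0b0001))
# ['1110', '1100', '1000', '0000', '0000']
```

Each intermediate value is a momentarily wrong sum; only the last entries match the true result.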





# "Ripple-Carry" Adder

- The longest path through a chain of full adders is the carry path
- We say that the carry "\_\_\_\_\_" through the adder







# Ripple Carry Adder Delay

 An n-bit ripple carry adder has a worst case delay proportional to \_\_\_\_\_





# **Glitches**

\_\_\_\_\_\_ output values due to \_\_\_\_\_ arrival times of gate inputs



# **Output Glitches**

- Delay of the carry causes glitches on the sum bits
- Glitch = momentarily incorrect output value







#### **Critical Path**

• Critical Path = \_\_\_\_\_ possible delay path

Assume $t_{sum} = 5 \text{ ns}$, $t_{carry} = 4 \text{ ns}$

[Figure legend: the dotted line marks the critical path]
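Under the assumed delays above, the arrival time of every output of a ripple-carry adder can be tabulated with a short Python sketch. It assumes all inputs are applied at time 0 and that each full adder takes $t_{sum}$ from any input to its sum and $t_{carry}$ from any input to its carry-out; the function name is hypothetical.

```python
def ripple_carry_delays(n, t_sum=5, t_carry=4):
    """Worst-case arrival time (ns) of each output of an n-bit ripple-carry adder.

    Hypothetical model: inputs arrive at t = 0, carries ripple LSB to MSB.
    """
    # Sum bit i waits for i carries to ripple in, then one sum delay.
    sums = [i * t_carry + t_sum for i in range(n)]
    # The final carry-out has passed through all n carry stages.
    carry_out = n * t_carry
    return sums, carry_out


sums, carry_out = ripple_carry_delays(4)
print(sums)        # [5, 9, 13, 17]
print(carry_out)   # 16
```

Both numbers grow linearly with the number of bits, which is the weakness the fast adders later in this unit address.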


#### **MULTIPLIERS**



# **Unsigned Multiplication Review**

- Same rules as decimal multiplication
- Multiply each bit of Q by M shifting as you go
- An m-bit \* n-bit mult. produces an \_\_\_\_\_ bit result (i.e. n-bit \* n-bit produces \_\_\_\_\_ bit result)
- Notice each partial product is a shifted copy of M or 0 (zero)
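The shifted-copy view of partial products can be sketched in Python as follows. The function names are assumptions made for this example; M is the multiplicand and Q the multiplier, as in the bullets above.

```python
def partial_products(m, q, n=4):
    """One partial product per multiplier bit: a shifted copy of M, or 0."""
    return [(m << i) if (q >> i) & 1 else 0 for i in range(n)]


def multiply_unsigned(m, q, n=4):
    """Unsigned product as the sum of the shifted partial products."""
    return sum(partial_products(m, q, n))


pps = partial_products(0b1010, 0b0101)
print([format(pp, "08b") for pp in pps])
print(multiply_unsigned(0b1010, 0b0101))   # 10 * 5 = 50
```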



# Signed Multiplication Techniques

- When adding signed (2's comp.) numbers, some new issues arise
- Must \_\_\_\_\_



# **Signed Multiplication Techniques**

- Also, must worry about negative multiplier
  - MSB of multiplier has negative weight
  - If MSB=1, \_\_\_\_\_

$$\begin{array}{r} 1100 = -4 \\ \times\ 1010 = -6 \end{array}$$
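One common way to handle both issues is to sign-extend the multiplicand to the full product width and to subtract, rather than add, the partial product that belongs to the multiplier's MSB. The Python sketch below follows that approach for the example 1100 * 1010; the helper names are assumptions, and this is only one possible formulation.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def multiply_signed(m, q, n=4):
    """2's complement multiply of two n-bit patterns (hypothetical sketch)."""
    width = 2 * n
    mask = (1 << width) - 1
    m_ext = to_signed(m, n) & mask      # multiplicand sign-extended to 2n bits
    product = 0
    for i in range(n):
        if (q >> i) & 1:
            pp = (m_ext << i) & mask    # shifted, sign-extended partial product
            if i == n - 1:
                # The multiplier's MSB has negative weight: subtract.
                product = (product - pp) & mask
            else:
                product = (product + pp) & mask
    return to_signed(product, width)


print(multiply_signed(0b1100, 0b1010))   # (-4) * (-6) = 24
```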


# **Combinational Multiplier**

- Partial Product (PP<sub>i</sub>) Generation
  - Multiply Q[i] \* M
    - if Q[i]=0 => PP<sub>i</sub> = \_\_\_\_
    - if Q[i]=1 => PP<sub>i</sub> = \_\_\_\_
  - \_\_\_\_ gates can be used to generate each partial product








# **Multiplication Overview**

- Multiplication approaches:
  - Sequential: Shift-and-Add produces one product bit per clock cycle (usually slow)
  - Combinational: Array multiplier uses an array of adders
    - Can be as simple as N-1 ripple-carry adders for an NxN multiplication







# **Combinational Multiplier**

- Partial Products must be added together
- Combinational multipliers require long propagation delay through the adders
  - propagation delay is proportional to the number of partial products (i.e. number of bits of input) and the width of each adder
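A behavioral sketch of the combinational approach is given below: each partial product is formed by ANDing every bit of M with one bit of Q, and each additional partial product costs one adder row. The function names are assumptions, and Python's `+` stands in for each row of full adders rather than modeling them gate by gate.

```python
def and_partial_product(m, q_bit, n):
    """AND every bit of m with q_bit, as a row of n AND gates would."""
    return sum((((m >> j) & 1) & q_bit) << j for j in range(n))


def array_multiply(m, q, n=4):
    """Return (product, adder_rows_used) for an n x n unsigned multiply."""
    acc = and_partial_product(m, q & 1, n)   # PP0 needs no adder
    adder_rows = 0
    for i in range(1, n):
        pp = and_partial_product(m, (q >> i) & 1, n) << i
        acc = acc + pp                       # one adder row per extra partial product
        adder_rows += 1
    return acc, adder_rows


print(array_multiply(0b1101, 0b1011))   # (143, 3): 13 * 11 using 3 adder rows
```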



# **Array Multiplier**



- Maximum delay =
  - Do you look for the longest path or the shortest path between any input and output?
  - Compare with the delay of a shift-and-add method



# **Adder Propagation Delay**





# **Critical Path**

Critical Path = Longest possible delay path

Assume $t_{sum} = 5 \text{ ns}$, $t_{carry} = 4 \text{ ns}$

[Figure: chain of full adders (Co, FA, Ci, S) with each carry stage labeled $t_{carry} = 4 \text{ ns}$; the critical path runs along the carry chain and is labeled 16 ns]



# **Combinational Multiplier**





#### **Critical Paths**



[Figure: combinational multiplier with two marked paths, Critical Path 1 and Critical Path 2]



# **Combinational Multiplier Analysis**

- Large Area due to \_\_\_\_\_\_-bit adders
  - n-1 because the first adder adds the first two partial products and then each adder afterwards adds one more partial product
- Propagation delay is in two dimensions
  - proportional to \_\_\_\_\_



# Pipelined Multiplier

• Now try to pipeline the previous design



Determine the maximum stage delay to decide the pipeline clock rate.

Assume zero-delay for stage latches. How does the latency of the pipeline compare with the simple combinational array of the previous stage?



# Carry-Save Multiplier

Instead of propagating the carries to the left in the same row, carries are now sent down to the next stage to reduce stage delay and facilitate pipelining





# **Carry Save Adders**

Consider the decimal addition of

- One way is to add \_\_\_\_\_\_ to get \_\_\_\_ and \_\_\_\_\_
- Here the \_\_\_\_ column cannot be added until the \_\_\_\_ is produced
- In the carry-save style, we add the \_\_\_\_ column and \_\_\_\_ column simultaneously







# Carry-Save (3,2) Adders

• A carry save adder is also called a (3,2) adder or a (3,2) counter (refer to Computer Arithmetic Algorithms by Israel Koren) as it takes three vectors, adds them up, and reduces them to two vectors, namely a sum vector and a carry vector



• CSA's are based on the principle that carries do not have to be added immediately; they can be saved and combined later

• An n-bit CSA consists of n disjoint full adders
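The (3,2) reduction can be sketched in Python with bitwise operations, where each bit position behaves like an independent full adder. The function name `carry_save_add` is an assumption for this example.

```python
def carry_save_add(a, b, c):
    """(3,2) reduction: three operands in, (sum vector, carry vector) out.

    Each bit position acts as its own full adder; no carry moves sideways.
    """
    sum_vec = a ^ b ^ c
    # Each column's carry has twice the weight, hence the shift by one.
    carry_vec = ((a & b) | (a & c) | (b & c)) << 1
    return sum_vec, carry_vec


s, cy = carry_save_add(9, 5, 14)
print(s, cy, s + cy)   # 2 26 28
```

A conventional carry-propagating adder is still needed once at the end to combine the sum and carry vectors into a single result.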





Carry-Lookahead Adders

#### **FAST ADDERS**



# **Ripple Carry Adders**

- Ripple-carry adders (RCA) are slow due to carry propagation
  - At least 2 levels of logic per full adder





#### **Fast Adders**

- Rather than calculating one carry at a time and passing it down the chain, can we compute a group of carries at the same time
- To do this, let us define some new signals for each column of addition:

- p<sub>i</sub> = \_\_\_\_\_: This column will propagate a carry-in (if there is one) to the carry-out.
  - $p_i$ is true when $A_i$ or $B_i$ is 1 => $p_i$ = \_\_\_\_\_
- g<sub>i</sub> = \_\_\_\_\_: This column will generate a carry-out whether or not the carry-in is '1'
  - $g_i$ is true when $A_i$ and $B_i$ is 1 => $g_i$ = \_\_\_\_\_

• Using these signals, we can define the carry-out (c<sub>i+1</sub>) as:



# Carry Lookahead Logic

- Define each carry in terms of p<sub>i</sub>, g<sub>i</sub> and the initial carry-in (c<sub>0</sub>) and not in terms of carry chain (intermediate carries: c1,c2,c3,...)
- c1 =
- c2 =
- c3 =
- c4 =
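One way to check an expanded set of carry equations is to code them directly and compare against ordinary addition. The Python sketch below assumes the usual definitions $p_i = A_i + B_i$ and $g_i = A_i \cdot B_i$ and expands $c_{i+1} = g_i + p_i \cdot c_i$ so that every carry depends only on p, g, and c0. The function names are hypothetical.

```python
def cla4_carries(a, b, c0=0):
    """Return (c1, c2, c3, c4) computed without using intermediate carries."""
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]   # propagate: A_i OR B_i
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]   # generate:  A_i AND B_i
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return c1, c2, c3, c4


def cla4_add(a, b, c0=0):
    """4-bit add using the lookahead carries; returns (sum, carry_out)."""
    carries = (c0,) + cla4_carries(a, b, c0)
    total = 0
    for i in range(4):
        total |= (((a >> i) ^ (b >> i) ^ carries[i]) & 1) << i
    return total, carries[4]


print(cla4_carries(0b0110, 0b0111))   # (0, 1, 1, 0)
print(cla4_add(0b0110, 0b0111))       # (13, 0)
```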



# Carry Lookahead Analogy

- Consider the carry-chain like a long tube broken into segments. Each segment is controlled by a valve (propagate signal) and can insert a fluid into that segment (generate signal)
- The carry-out of the diagram below will be true if g1 is true, or p1 and g0 are true, or p1, p0 and c0 are all true









# Carry Lookahead Adder

- Use carry-lookahead logic to generate all the carries in one shot and then create the sum
- Example 4-bit CLA shown below
- How many levels of logic does the adder have?





### 4-bit Adders

• 74LS283 chip implements a 4-bit adder using **CLA** methodology



#### 16-Bit CLA

- But how would we make a 16-bit adder?
- Should we really just chain these fast 4-bit adders together?
  - Or can we do better?



- Generate/propagate signals for a set of 4 bits:
  - $G = g3 + p3 \cdot g2 + p3 \cdot p2 \cdot g1 + p3 \cdot p2 \cdot p1 \cdot g0$
- What's the difference between the equation for G here and C4 on the previous slides?



# REVIEW ON YOUR OWN FOR CLA LAB



# 16-bit CLA Closer Look

- Each 4-bit CLA only propagates its overall carry-in if each of the 4 columns propagates:
  - P0 = p3• p2 •p1 •p0
  - P1 = p7• p6 •p5 •p4
  - P2 = p11• p10 •p9 •p8
  - P3 = p15• p14 •p13 •p12
- Each 4-bit CLA generates a carry if any column generates and the more significant columns propagate
  - G0 = g3 + (p3 • g2) + (p3 • p2 • g1) + (p3 • p2 • p1 • g0)
  - ...
  - G3 = g15 + (p15 • g14) + (p15 • p14 • g13) + (p15 • p14 • p13 • g12)
- The higher order CLL logic (producing C4,C8,C12,C16) then is realized as:
  - (C4) => C1 = G0 + (P0 • c0)
  - ...
  - (C16) => C4 = G3 + (P3 • G2) + (P3 • P2 • G1) + (P3 • P2 • P1 • G0) + (P3 • P2 • P1 • P0 • c0)
- These equations are exactly the same CLL logic we derived earlier

[Figure legend: \_\_\_ = Delay in producing S0; \_\_\_ = Delay in producing S2]
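The two-level structure can be sketched in Python: a group function produces P and G for each 4-bit slice, and the same lookahead equations are then applied to those group signals to obtain C4, C8, C12, and C16. The function names are assumptions, and $p_i = A_i + B_i$, $g_i = A_i \cdot B_i$ are assumed as the column-level definitions.

```python
def group_pg(a, b):
    """Return (P, G) for one 4-bit group of operands a and b."""
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]
    P = p[3] & p[2] & p[1] & p[0]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return P, G


def block_carries(a, b, c0=0):
    """Return [C4, C8, C12, C16] for 16-bit operands from group P/G signals."""
    pg = [group_pg((a >> (4 * k)) & 0xF, (b >> (4 * k)) & 0xF) for k in range(4)]
    P = [x[0] for x in pg]
    G = [x[1] for x in pg]
    # Same lookahead equations as before, applied one level up.
    c4 = G[0] | (P[0] & c0)
    c8 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    c12 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    c16 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
           | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))
    return [c4, c8, c12, c16]


print(block_carries(0xFFFF, 0x0001))   # [1, 1, 1, 1]
```

Each block carry then serves as the carry-in of the next 4-bit CLA, which computes its own internal carries and sum bits.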



#### 16-Bit CLA

Understanding 16-bit CLA hierarchy...





#### 64-Bit CLA

• We can reuse the same CLL logic to build a 64-bit CLA

[Figure: 64-bit CLA built from a hierarchy of CLL and PG blocks, showing block carries C8 through C60 and bit-level signals p0..p3, g0..g3, c0..c3. Legend, each entry with a blank to fill in: delay in producing pi\*, gi\*; delay in producing Pi\*\*, Gi\*\*; delay in producing C48; delay in producing C60; delay in producing C63; delay in producing S63; total delay]

Is the delay in producing s63 the same as in s35?