

School of Engineering

# Spiral 2-2

Arithmetic Components and Their Efficient Implementations

# Learning Outcomes


- I know how to combine overflow and subtraction results to determine comparison results of both signed and unsigned numbers
- I understand how combinational multipliers can be built
- I understand how hierarchical carry lookahead logic can be used to produce logarithmic time delay for an adder

#### **DATAPATH COMPONENTS**




#### USC Viterbi School of Engineering

# **Digital System Design**

- Control (CU) and Datapath Unit (DPU) paradigm
  - Separate logic into datapath elements that operate on data and control elements that generate control signals for datapath elements
  - Datapath: Adders, muxes, comparators, counters, registers (shift, with enables, etc.), memories, FIFO's
  - Control Unit: State machines/sequencers





Detecting Overflow Helps Us Perform Comparison

#### **OVERFLOW & COMPARISON**

### Overflow


- Overflow occurs when the result of an arithmetic operation falls outside the range that can be represented with the given number of bits
  - Unsigned overflow occurs when adding or subtracting unsigned numbers
  - Signed (2's complement) overflow occurs when adding or subtracting 2's complement numbers




## **Unsigned Overflow**



10 + 7 = 17

With 4-bit *unsigned* numbers we can only represent 0 to 15, so the 4-bit result wraps around to 1 (17 - 16). Thus, we say overflow has occurred.



School of Engineering

## 2's Complement Overflow



Overflow occurs when you cross this discontinuity

# **Testing for Overflow**


- Most fundamental test
  - Check if answer is wrong (e.g. Positive + Positive yields a negative)
- Unsigned overflow test [Different for add or sub]
  - Addition: If carry-out of final position equals '1'
  - Subtraction (performed as A + NOT(B) + 1): If carry-out of final position equals '0'
- Signed (2's complement) overflow test [Same for add or sub]
  - Only occurs if two positives are added and result is negative or two negatives are added and result is positive
  - Alternate test: if carry-in and carry-out of final position are different



# **Testing for Unsigned Overflow**

- Unsigned Overflow has occurred if...
  - Unsigned Addition: If final carry-out = 1
  - Unsigned Subtraction: If final carry-out = 0

| 1011<br>+ 0110 | 1011<br>+ 0011 |
|----------------|----------------|
| 1011<br>- 0110 | 0110<br>- 1011 |
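The four examples in the table can be checked with a small model of an n-bit adder. This is a minimal sketch, assuming subtraction is performed as A + NOT(B) + 1; the function names `unsigned_add` and `unsigned_sub` are illustrative and not from the slides.

```python
def unsigned_add(a, b, n=4):
    """Add two n-bit unsigned values; return (result, carry_out, overflow)."""
    mask = (1 << n) - 1
    total = (a & mask) + (b & mask)
    carry_out = total >> n
    # Unsigned addition overflows when the final carry-out is 1.
    return total & mask, carry_out, carry_out == 1


def unsigned_sub(a, b, n=4):
    """Compute a - b as a + NOT(b) + 1; return (result, carry_out, overflow)."""
    mask = (1 << n) - 1
    total = (a & mask) + (~b & mask) + 1
    carry_out = total >> n
    # Unsigned subtraction overflows when the final carry-out is 0.
    return total & mask, carry_out, carry_out == 0


print(unsigned_add(0b1011, 0b0110))  # 11 + 6
print(unsigned_add(0b1011, 0b0011))  # 11 + 3
print(unsigned_sub(0b1011, 0b0110))  # 11 - 6
print(unsigned_sub(0b0110, 0b1011))  # 6 - 11
```

Only the first and last cases report overflow: 11 + 6 does not fit in 4 bits, and 6 - 11 would be negative.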






# Testing for 2's Comp. Overflow

- 2's Complement Overflow Occurs If...
  - Test 1: If pos. + pos. = neg. or neg. + neg. = pos.
  - Test 2: If carry-in to MSB position and carry-out of MSB position are different

$$\begin{array}{rlcrl} 0101 & (5) & \qquad & 1100 & (-4) \\ +\ 0110 & (6) & & +\ 1001 & (-7) \end{array}$$

$$\begin{array}{rlcrl} 0011 & (3) & \qquad & 1110 & (-2) \\ +\ 0010 & (2) & & +\ 1010 & (-6) \end{array}$$
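Both tests can be applied to the four example sums above. The sketch below is illustrative only (the name `signed_add_overflow` is an assumption); it works on raw n-bit patterns and returns the result of Test 1 (sign comparison) and Test 2 (carry-in versus carry-out of the MSB position) side by side.

```python
def signed_add_overflow(a, b, n=4):
    """Add two n-bit 2's complement bit patterns.

    Returns (result, test1, test2), where test1 is the sign-based
    overflow test and test2 is the carry-in/carry-out test.
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    result = (a + b) & mask

    # Test 1: operands share a sign but the result has the other sign.
    sign_a = a >> (n - 1)
    sign_b = b >> (n - 1)
    sign_r = result >> (n - 1)
    test1 = (sign_a == sign_b) and (sign_r != sign_a)

    # Test 2: carry into the MSB position differs from carry out of it.
    low_mask = (1 << (n - 1)) - 1
    carry_in = ((a & low_mask) + (b & low_mask)) >> (n - 1)
    carry_out = (a + b) >> n
    test2 = carry_in != carry_out

    return result, test1, test2


for a, b in [(0b0101, 0b0110), (0b1100, 0b1001),
             (0b0011, 0b0010), (0b1110, 0b1010)]:
    print(format(a, "04b"), format(b, "04b"), signed_add_overflow(a, b))
```

The two tests agree for every pair of 4-bit inputs, which is why either one can drive the SOV output.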








### **Checking for Overflow**

• Produce additional outputs to indicate if unsigned (UOV) or signed (SOV) overflow has occurred





#### **COMPARISON**



# **Comparison Via Subtraction**

- Suppose we want to compare two numbers: A & B
- Suppose we let DIFF = A-B. What could the result tell us?
  - If DIFF < 0, then A < B
  - If DIFF = 0, then A = B
  - If DIFF > 0, then A > B
- How would we know DIFF == 0?
  - If all bits of our answer are 0...check with a NOR gate.
- How would we know DIFF < 0 (i.e. negative)?
  - Signed: Check MSB! (But what about overflow?)
  - Unsigned: Huh? In unsigned there are no negative results




### Computing A<B from "Negative" Result

#### Unsigned

- Perform A-B
- If A-B would yield a negative result, this will appear as "overflow" in an unsigned subtraction
- And we know unsigned subtraction overflow occurs if Cout = 0
- So just check if Cout=0

#### Signed

- Perform A-B
- If there is no overflow (V=0), simply check if MSB = 1
- But if there is overflow??
  - Recall overflow has the effect of flipping the sign of the result to the opposite of what it should be.
- So if *there is overflow (V=1)* check if MSB = 0 (i.e. positive)
- Summary: A-B is "truly" negative if V=0 & MSB=1 or V=1 & MSB=0 (i.e. V XOR MSB = 1)
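The unsigned and signed rules above can be tried out on a single subtraction. This is a rough sketch, assuming A-B is computed as A + NOT(B) + 1 and that V is taken as carry-in XOR carry-out of the MSB position; the function name `compare_less_than` is hypothetical.

```python
def compare_less_than(a, b, n=4):
    """Return (unsigned_lt, signed_lt, equal) for n-bit patterns a and b."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    not_b = ~b & mask

    total = a + not_b + 1          # A - B performed by the adder
    diff = total & mask
    cout = total >> n
    msb = diff >> (n - 1)

    # Signed overflow V: carry into the MSB differs from carry out of it.
    low_mask = (1 << (n - 1)) - 1
    cin_msb = ((a & low_mask) + (not_b & low_mask) + 1) >> (n - 1)
    v = cin_msb ^ cout

    unsigned_lt = cout == 0        # unsigned subtraction "overflow"
    signed_lt = (v ^ msb) == 1     # truly negative result
    equal = diff == 0              # NOR of all result bits
    return unsigned_lt, signed_lt, equal


# 1100 vs 0011: 12 < 3 is false as unsigned, -4 < 3 is true as signed
print(compare_less_than(0b1100, 0b0011))
```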



# **Unsigned Comparator**

• A comparator can be built by using a subtractor





# **Signed Comparator**

• A comparator can be built by using a subtractor






#### **ADDER TIMING**



#### Addition – Full Adders

• Be sure to connect first C<sub>in</sub> to 0

Example: 0110 (X) + 0111 (Y)




- A chain of full adders presents an interesting timing analysis problem
- To correctly compute its own Sum and Carry-out, each full adder requires the carry-out bit from the previous full adder
- Because hardware works in parallel, the full adders further down the chain may momentarily produce the wrong outputs because the carry has not had time to propagate to them
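To make the chain concrete, here is a minimal functional sketch of full adders connected carry-out to carry-in, with the first carry-in tied to 0. It models only the final, settled values, not the timing; `full_adder` and `ripple_carry_add` are illustrative names.

```python
def full_adder(x, y, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = x ^ y ^ cin
    cout = (x & y) | (x & cin) | (y & cin)
    return s, cout


def ripple_carry_add(x, y, n=4):
    """Add two n-bit values with a chain of n full adders."""
    carry = 0  # first Cin is connected to 0
    result = 0
    for i in range(n):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


total, cout = ripple_carry_add(0b0110, 0b0111)
print(format(total, "04b"), cout)  # 6 + 7
```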





# **Timing Example**

• Assume that we were adding one set of inputs and then change to a new set of inputs:



• At the time just before we enter the new input values, all carries are 0's

• Now we enter the new inputs and all the FA's start adding their respective inputs

• Each adder computes from the current inputs (notice the sum of 1110 is incorrect at this point)

• Now the carries are all based off the new inputs

• The carry is "rippling" through each adder


• Only after the carry propagates through all the adders is the sum valid and correct
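The rippling behavior can be imitated with a unit-delay model in which every full adder updates once per time step using the carries from the previous step. This is a sketch under stated assumptions: the operands 1111 and 0001 are borrowed from the propagation-delay figures later in the deck (the original figure's inputs are not recoverable), all carries start at 0, and `ripple_timeline` is a hypothetical helper.

```python
def ripple_timeline(x, y, n=4):
    """Return the distinct sum words seen as the carry ripples, in order."""
    carries = [0] * (n + 1)  # carries[0] is the first Cin, fixed at 0
    timeline = []
    while True:
        sums = []
        next_carries = [0] * (n + 1)
        for i in range(n):
            xi = (x >> i) & 1
            yi = (y >> i) & 1
            ci = carries[i]  # carry value from the previous time step
            sums.append(xi ^ yi ^ ci)
            next_carries[i + 1] = (xi & yi) | (xi & ci) | (yi & ci)
        word = "".join(str(bit) for bit in reversed(sums))
        if not timeline or timeline[-1] != word:
            timeline.append(word)
        if next_carries == carries:  # nothing changed: outputs are stable
            return timeline
        carries = next_carries


print(ripple_timeline(0b1111, 0b0001))
```

With these operands the first value in the list is the incorrect 1110, and the sum bits settle to 0000 (with a carry-out of 1) only after the carry has passed through every stage.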




# "Ripple-Carry" Adder

- The longest path through a chain of full adders is the carry path
- We say that the carry "ripples" through the adder







# **Ripple Carry Adder Delay**

• An n-bit ripple carry adder has a worst case delay proportional to n (i.e. n-bits => n columns of addition => n-full adders)





### Glitches

• Transient, incorrect output values due to differing arrival times of gate inputs

# **Output Glitches**

- Delay of the carry causes glitches on the sum bits
- Glitch = momentary, incorrect output value

[Figure: chained full adders in which one input arrives early and the carry-in arrives late (0→1), so the sum output S briefly takes an incorrect value before settling]




#### **Critical Path**

• Critical Path = Longest possible delay path

Assume  $t_{sum} = 5$  ns,

t<sub>carry</sub>= 4 ns
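As a worked example with these numbers, the sketch below estimates when each output of a ripple-carry adder settles. It assumes a 4-bit adder (the original figure is not available), that all X, Y and the first carry-in arrive at time 0, and that each full adder takes t_sum to produce S and t_carry to produce Cout after its carry-in arrives; `ripple_delays` is an illustrative name.

```python
def ripple_delays(n, t_sum=5, t_carry=4):
    """Return (sum_ready_times, final_carry_time) in ns for an n-bit RCA."""
    carry_in_time = 0          # first carry-in is available at t = 0
    sum_ready = []
    for _ in range(n):
        sum_ready.append(carry_in_time + t_sum)
        carry_in_time += t_carry   # carry-out feeds the next stage
    return sum_ready, carry_in_time


sums, cout = ripple_delays(4)
print(sums, cout)              # settle time of S0..S3 and of the final Cout
print(max(max(sums), cout))    # critical-path delay
```

Under these assumptions the last sum bit is the slowest output, at 3 × 4 + 5 = 17 ns, slightly later than the final carry-out at 16 ns.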






# **Unsigned Multiplication Review**

- Same rules as decimal multiplication
- Multiply each bit of Q by M shifting as you go
- An m-bit \* n-bit mult. produces an m+n bit result (i.e. n-bit \* n-bit produces 2\*n bit result)
- Notice each partial product is a shifted copy of M or 0 (zero)




| Value      | Role                  |
|------------|-----------------------|
| 1010       | M (Multiplicand)      |
| * 1011     | Q (Multiplier)        |
| 1010       | PP (Partial Products) |
| 1010_      |                       |
| 0000__     |                       |
| + 1010___  |                       |
| 01101110   | P (Product)           |
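The example above can be reproduced by generating one partial product per multiplier bit and summing them. This is a minimal sketch; `partial_products` and `multiply_unsigned` are illustrative names, and the bitwise AND stands in for the row of AND gates that would gate M with Q[i] in hardware.

```python
def partial_products(m, q, n=4):
    """Return the n shifted partial products of m * q (unsigned, n-bit)."""
    mask = (1 << n) - 1
    pps = []
    for i in range(n):
        qi = (q >> i) & 1
        # AND every bit of M with Q[i]: gives M when Q[i] = 1, else 0.
        row = m & (-qi & mask)
        pps.append(row << i)   # shift left by the bit position
    return pps


def multiply_unsigned(m, q, n=4):
    """Sum the partial products to form the 2n-bit product."""
    return sum(partial_products(m, q, n))


for pp in partial_products(0b1010, 0b1011):
    print(format(pp, "08b"))
print(format(multiply_unsigned(0b1010, 0b1011), "08b"))
```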



# Signed Multiplication Techniques

- When adding the partial products of signed (2's comp.) numbers, some new issues arise
- Must sign extend partial products (out to 2n bits)

| Without Sign Extension<br>Wrong Answer! | With Sign Extension<br>Correct Answer! |
|-----------------------------------------|----------------------------------------|
| 1001 = -7                               | 1001 = -7                              |
| <b>* 0110</b> = +6                      | <b>* 0110</b> = +6                     |
| 0000                                    | 00000000                               |
| 1001_                                   | 1111001_                               |
| 1001__                                  | 111001__                               |
| + 0000___                               | + 00000___                             |
| 00110110 = +54                          | 11010110 = -42                         |



### Signed Multiplication Techniques

- Also, must worry about negative multiplier
  - MSB of multiplier has negative weight
  - If MSB=1, multiply by -1 (i.e. take 2's comp. of multiplicand)

| With Sign Extension but w/o<br>consideration of MSB<br>Wrong Answer! | With Sign Extension and w/<br>consideration of MSB<br>Correct Answer! |
|----------------------------------------------------------------------|-----------------------------------------------------------------------|
| 1100 = -4                                                            | 1100 = -4                                                             |
| <b>* 1010</b> = -6                                                   | <b>* 1010</b> = -6 (MSB has place value -8)                           |
| 00000000                                                             | 00000000                                                              |
| 1111100_                                                             | 1111100_                                                              |
| 000000__                                                             | 000000__                                                              |
| + 11100___                                                           | + 00100___ (multiply by -1)                                           |
| 11011000 = -40                                                       | 00011000 = +24                                                        |
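Both signed examples can be reproduced with a short model that sign-extends the multiplicand to 2n bits and negates the partial product belonging to the multiplier's MSB. This is a sketch of the technique described above, not a hardware design; `to_signed` and `multiply_signed` are hypothetical helper names.

```python
def to_signed(value, bits):
    """Interpret a bit pattern as a 2's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def multiply_signed(m, q, n=4):
    """Multiply two n-bit 2's complement patterns; return the 2n-bit product."""
    width = 2 * n
    mask = (1 << width) - 1
    m_ext = to_signed(m, n) & mask        # sign-extend M out to 2n bits
    product = 0
    for i in range(n):
        if (q >> i) & 1:
            pp = (m_ext << i) & mask
            if i == n - 1:
                # MSB of Q has negative weight: use the 2's complement of M.
                pp = (-pp) & mask
            product = (product + pp) & mask
    return product


print(format(multiply_signed(0b1001, 0b0110), "08b"))  # -7 * +6
print(format(multiply_signed(0b1100, 0b1010), "08b"))  # -4 * -6
```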





- Partial Product (PP<sub>i</sub>) Generation
  - Multiply Q[i] \* M
    - if Q[i]=0 => PP<sub>i</sub> = 0
    - if Q[i]=1 => PP<sub>i</sub> = M
  - AND gates can be used to generate each partial product





# **Multiplication Overview**

- Multiplication approaches:
  - Sequential: Shift-and-Add produces one product bit per clock cycle time (usually slow)
  - Combinational: Array multiplier uses an array of adders
    - Can be as simple as N-1 ripple-carry adders for an NxN multiplication





- Partial Products must be added together
- Combinational multipliers require long propagation delay through the adders
  - propagation delay is proportional to the number of partial products (i.e. number of bits of input) and the width of each adder



# **Array Multiplier**



- Maximum delay = ?
  - Do you look for the longest path or the shortest path between any input and output?
  - Compare with the delay of a shift-and-add method



# **Pipelined Multiplier**

• Now try to pipeline the previous design



Determine the maximum stage delay to decide the pipeline clock rate. Assume zero-delay for stage latches. How does the latency of the pipeline compare with the simple combinational array of the previous stage?

# **Carry-Save Multiplier**

• Instead of propagating the carries to the left in the same row, carries are now sent down to the next stage to reduce stage delay and facilitate pipelining





# **Carry Save Adders**

• Consider the decimal addition of

47 + 96 + 58 = 201

- One way is to add 47 to 96 to get 143 and then add 58
- Here the ten's column cannot be added until the carry is produced
- In the carry-save style, we add the one's column and ten's column simultaneously

$$\begin{array}{r}
47 \\
+\ 96 \\
\hline
143 \\
+\ 58 \\
\hline
201
\end{array}$$





# Carry-Save (3,2) Adders

- A carry save adder is also called a (3,2) adder or a (3,2) counter (refer to Computer Arithmetic Algorithms by Israel Koren) as it takes three vectors, adds them up, and reduces them to two vectors, namely a sum vector and a carry vector
- CSA's are based on the principle that carries do not have to be added as soon as possible, but can be combined in a later step
- An n-bit CSA consists of n disjoint full adders
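A (3,2) reduction is easy to model on whole words, since each bit position is an independent full adder. The sketch below is illustrative (`carry_save_add` is an assumed name) and reuses the operands 47, 96 and 58 from the decimal example, here processed in binary.

```python
def carry_save_add(a, b, c):
    """Reduce three operands to a (sum_vector, carry_vector) pair.

    Every bit position is an independent full adder: no carry moves
    sideways inside this step.
    """
    sum_vec = a ^ b ^ c
    # Each column's carry-out has the weight of the next column up.
    carry_vec = ((a & b) | (a & c) | (b & c)) << 1
    return sum_vec, carry_vec


s, c = carry_save_add(47, 96, 58)
# One ordinary (carry-propagating) addition finishes the job.
print(s + c)
```

Only the final addition of the two vectors needs a carry-propagating adder, which is why the intermediate stages of a multiplier can each be kept to a single full-adder delay.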







#### **Adder Propagation Delay**

[Figure sequence: a 4-bit ripple-carry adder built from four full adders (FA) adding 1111 + 0001, with the carry propagating from one stage to the next]


#### **Critical Paths**



[Figure: Critical Path 1 and Critical Path 2]



# **Combinational Multiplier Analysis**

- Large Area due to (n-1) m-bit adders
  - n-1 because the first adder adds the first two partial products and then each adder afterwards adds one more partial product
- Propagation delay is in two dimensions
  - proportional to m+n



Carry-Lookahead Adders

#### **FAST ADDERS**



# **Ripple Carry Adders**

- Ripple-carry adders (RCA) are slow due to carry propagation
  - At least 2 levels of logic per full adder



#### **Fast Adders**


- Rather than calculating one carry at a time and passing it down the chain, can we compute a group of carries at the same time?
- To do this, let us define some new signals for each column of addition:
  - p<sub>i</sub> = Propagate: This column will propagate a carry-in (if there is one) to the carry-out.
    - $p_i$ is true when $A_i$ or $B_i$ is 1 $\Rightarrow p_i = A_i + B_i$
  - g<sub>i</sub> = Generate: This column will generate a carry-out whether or not the carry-in is '1'
    - $g_i$ is true when $A_i$ and $B_i$ are 1 $\Rightarrow g_i = A_i \bullet B_i$
- Using these signals, we can define the carry-out (c<sub>i+1</sub>) as follows (here + denotes OR and • denotes AND):

$$c_{i+1} = g_i + p_i c_i$$






#### Carry Lookahead Logic

- Define each carry in terms of p<sub>i</sub>, g<sub>i</sub> and the initial carry-in (c<sub>0</sub>) and not in terms of carry chain (intermediate carries: c1,c2,c3,...)
- $c1 = g_0 + p_0 c_0$
- $c2 = g_1 + p_1c_1 = g_1 + p_1g_0 + p_1p_0c_0$
- c3 = ...
- c4 = ...
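One way to check an expansion of this kind is to compare it against the carry chain it replaces. In the sketch below, the expressions for c3 and c4 are obtained by repeating the same substitution used for c2; they are a derived assumption rather than text from the slide, and the function names are illustrative.

```python
def pg_signals(a, b):
    """Per-column propagate (A OR B) and generate (A AND B) for 4 bits."""
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]
    return p, g


def lookahead_carries(a, b, c0=0):
    """Compute c1..c4 directly from p, g and c0 (no intermediate carries)."""
    p, g = pg_signals(a, b)
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c0, c1, c2, c3, c4]


def chained_carries(a, b, c0=0):
    """Compute the same carries one at a time with c[i+1] = g[i] + p[i]c[i]."""
    p, g = pg_signals(a, b)
    carries = [c0]
    for i in range(4):
        carries.append(g[i] | (p[i] & carries[-1]))
    return carries


print(lookahead_carries(0b1111, 0b0001))
print(chained_carries(0b1111, 0b0001))
```

Both functions produce the same values; the difference in hardware is that every lookahead expression is a two-level AND-OR function of the inputs, so no carry has to wait for another.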



# Carry Lookahead Analogy

- Consider the carry-chain like a long tube broken into segments. Each segment is controlled by a valve (propagate signal) and can insert a fluid into that segment (generate signal)
- The carry-out of the diagram below will be true if g1 is true, or p1 and g0 are true, or p1, p0 and c0 are all true








# Carry Lookahead Adder

- Use carry-lookahead logic to generate all the carries in one shot and then create the sum
- Example 4-bit CLA shown below










#### **4-bit Adders**

• 74LS283 chip implements a 4-bit adder using CLA methodology



# 16-Bit CLA


- But how would we make a 16-bit adder?
- Should we really just chain these fast 4-bit adders together?
  - Or can we do better?



Define P and G as the overall Propagate and Generate signals for a set of 4 bits:

$$P = p3 \cdot p2 \cdot p1 \cdot p0$$

$$G = g3 + p3 \cdot g2 + p3 \cdot p2 \cdot g1 + p3 \cdot p2 \cdot p1 \cdot g0$$

What's the difference between the equation for G here and C4 on the previous slides?







# 16-bit CLA Closer Look

- Each 4-bit CLA only propagates its overall carry-in if each of the 4 columns propagates:
  - $P0 = p3 \bullet p2 \bullet p1 \bullet p0$
  - $P1 = p7 \bullet p6 \bullet p5 \bullet p4$
  - $P2 = p11 \bullet p10 \bullet p9 \bullet p8$
  - $P3 = p15 \bullet p14 \bullet p13 \bullet p12$
- Each 4-bit CLA generates a carry if any column generates and the more significant columns propagate
  - $G0 = g3 + (p3 \bullet g2) + (p3 \bullet p2 \bullet g1) + (p3 \bullet p2 \bullet p1 \bullet g0)$
  - ...
  - $G3 = g15 + (p15 \bullet g14) + (p15 \bullet p14 \bullet g13) + (p15 \bullet p14 \bullet p13 \bullet g12)$
- The higher order CLL logic (producing C4,C8,C12,C16) then is realized as:
  - (C4) => $C1 = G0 + (P0 \bullet c0)$
  - (C16) => $C4 = G3 + (P3 \bullet G2) + (P3 \bullet P2 \bullet G1) + (P3 \bullet P2 \bullet P1 \bullet G0) + (P3 \bullet P2 \bullet P1 \bullet P0 \bullet c0)$
- These equations are exactly the same CLL logic we derived earlier
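The two-level scheme can be sketched as follows: each 4-bit slice reports its group P and G, and a second copy of the lookahead logic turns those into C4, C8, C12 and C16. The expressions for C8 and C12 follow the same pattern as the C4 and C16 lines above and are filled in here as an assumption; `group_pg` and `cla16_group_carries` are hypothetical names.

```python
def group_pg(a, b):
    """Return the overall (P, G) for one 4-bit slice of the operands."""
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]
    big_p = p[3] & p[2] & p[1] & p[0]
    big_g = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
             | (p[3] & p[2] & p[1] & g[0]))
    return big_p, big_g


def cla16_group_carries(a, b, c0=0):
    """Return [c0, C4, C8, C12, C16] for a 16-bit addition."""
    slices = [group_pg((a >> (4 * j)) & 0xF, (b >> (4 * j)) & 0xF)
              for j in range(4)]
    (p0, g0), (p1, g1), (p2, g2), (p3, g3) = slices
    c4 = g0 | (p0 & c0)
    c8 = g1 | (p1 & g0) | (p1 & p0 & c0)
    c12 = g2 | (p2 & g1) | (p2 & p1 & g0) | (p2 & p1 & p0 & c0)
    c16 = (g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0)
           | (p3 & p2 & p1 & p0 & c0))
    return [c0, c4, c8, c12, c16]


print(cla16_group_carries(0x0FFF, 0x0001))
```

Each 4-bit block would then use its incoming group carry together with its own p and g signals to form its internal carries and sum bits.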

#### 16-Bit CLA


• Understanding 16-bit CLA hierarchy...



#### 64-Bit CLA

• We can reuse the same CLL logic to build a 64-bit CLA



Is the delay in producing s63 the same as in s35?

- = \_\_\_\_ = Delay in producing S2
- = \_\_\_\_ = Delay in producing S0
- = \_\_\_\_ = Delay in producing pi\*,gi\*
- = \_\_\_\_ = Delay in producing Pj\*\*,Gj\*\*
- = \_\_\_\_ = Delay in producing C48
- = \_\_\_\_ = Delay in producing C60
- = \_\_\_\_ = Delay in producing C63
- = \_\_\_\_ = Delay in producing S63
- \_\_\_\_\_ Total Delay
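The hierarchy is what makes the delay grow logarithmically with the adder width. As a rough sketch, assuming every lookahead block combines four inputs from the level below (the grouping used in the 4-, 16- and 64-bit examples), the number of lookahead levels can be counted as below; `lookahead_levels` is an illustrative name and the function does not fill in the gate delays asked for above.

```python
def lookahead_levels(n_bits, group=4):
    """Count the levels of lookahead logic needed to span n_bits."""
    levels = 0
    span = 1
    while span < n_bits:
        span *= group   # each level covers 'group' times more bits
        levels += 1
    return levels


for width in (4, 16, 64):
    print(width, lookahead_levels(width))
```

Quadrupling the width adds only one more level, whereas a ripple-carry adder's delay would quadruple.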


#### Summary

- You should now be able to build:
  - Fast Adders
  - Comparators