# DD1: A QDI, Radiation-Hard-by-Design, Near-Threshold 18µW/MIPS Microcontroller in 40nm Bulk CMOS

#### Sean Keller, Alain J. Martin, Chris Moore

California Institute of Technology & Situs Logic

May 5, 2015

Outline Overview Radiation Tapeout Results



- Overview
- Design and Test Results
- Reliability Analysis
- Conclusion

# Designed, Built, Taped-Out, and Tested the World's Lowest Power Radiation-Hardened MCU

- 200 times lower power than any other design built at the time
  - Similar speed
  - Similar radiation tolerance
- 40nm low-power bulk CMOS (TSMC 40LP)
- Atmel AVR reduced-core ISA compatible (e.g. ATtiny)
- QDI logic
- 95% yield on first silicon
- Power and reliability



# Tools, Architecture, and Layout

- QDI: primarily PCHBs
- CHPSIM: CHP → structural Verilog
- Magic: custom 8-track standard cells
- Magic: full-custom memories & regfile
- Mentor Graphics: P&R, DRC, LVS, SPICE, & Fast-SPICE
- 2KB IMEM, 256B DMEM, 16 I/Os
- 32x8b registers
- Supply range: 550mV to 1.1V
- 550mV: 18µW at 1 MIPS & 800nW at idle
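The 550mV figures can be turned into energy per instruction with a short calculation. The sketch below uses only the numbers on this slide; the 10% duty cycle is a hypothetical workload chosen for illustration, not a measured operating point.

```python
def energy_per_instruction_pj(power_w: float, mips: float) -> float:
    """Energy per instruction in picojoules for a given power (W) and throughput (MIPS)."""
    instructions_per_second = mips * 1e6
    return power_w / instructions_per_second * 1e12


def average_power_w(active_w: float, idle_w: float, duty_cycle: float) -> float:
    """Average power when the core is active for a fraction `duty_cycle` of the time."""
    return duty_cycle * active_w + (1.0 - duty_cycle) * idle_w


ACTIVE_W = 18e-6   # 18 uW at 1 MIPS and 550 mV (from the slide)
IDLE_W = 800e-9    # 800 nW at idle and 550 mV (from the slide)
DUTY_CYCLE = 0.10  # hypothetical: active 10% of the time

# 18 uW at 1 MIPS corresponds to roughly 18 pJ per instruction.
print(energy_per_instruction_pj(ACTIVE_W, 1.0))
# Average power for the hypothetical duty-cycled workload, in watts.
print(average_power_w(ACTIVE_W, IDLE_W, DUTY_CYCLE))
```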



# Why Radiation Hard?

- Error rates increase as technology scales
- Error rates increase with altitude (aircraft & satellites)
- 14nm CMOS: 1 error every four days in 100Mbit SRAM (sea level)<sup>1</sup>
- 14nm CMOS: 1 error every hour in 100Mbit SRAM (10km)<sup>1</sup>

<sup>1</sup>Hubert, G. et al.: Integration, Jan. 2015.
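To compare the two quoted error rates on a common scale, they can be converted to FIT per Mbit (failures per 10<sup>9</sup> device-hours per Mbit). This is a unit conversion of the slide's numbers only; the function name is illustrative.

```python
HOURS_PER_DAY = 24.0


def fit_per_mbit(errors: float, hours: float, mbits: float) -> float:
    """Failures in time (errors per 1e9 hours) normalized per Mbit of SRAM."""
    return errors / hours / mbits * 1e9


# 1 error every four days in 100 Mbit at sea level.
sea_level = fit_per_mbit(errors=1, hours=4 * HOURS_PER_DAY, mbits=100)
# 1 error every hour in 100 Mbit at 10 km.
altitude = fit_per_mbit(errors=1, hours=1, mbits=100)

print(f"sea level: {sea_level:.0f} FIT/Mbit")
print(f"10 km:     {altitude:.0f} FIT/Mbit")
print(f"ratio:     {altitude / sea_level:.0f}x")
```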

## Effects of Radiation on CMOS Devices

- SEU (single event upset)
  - Problem: bit-flips in memory and logic
  - Mitigation: DD for logic, DICE for memories, physical separation of cells
- SEL (single event latchup)
  - Problem: transient or permanent well-based latchup
  - Mitigation: near-threshold operation, well separation
- TID (total ionizing dose)
  - Problem: gradual shift in  $V_t$  resulting in timing variation
  - Mitigation: body biasing, QDI robustness to delay variation

## SEU Mitigation (Random Logic)



- QDI
  - Input persistence
  - Acknowledgment of errant change on output of operators *F<sub>a</sub>* and *F<sub>b</sub>* blocked by C-element *fence*
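A behavioral sketch of the fence idea, assuming *F<sub>a</sub>* and *F<sub>b</sub>* are redundant copies that compute the same value and feed a two-input C-element. The class and the event sequence are illustrative, not taken from the DD1 netlist.

```python
class CElement:
    """Muller C-element: the output follows the inputs only when they agree."""

    def __init__(self, initial: int = 0):
        self.out = initial

    def update(self, a: int, b: int) -> int:
        if a == b:          # both copies agree: pass the value through
            self.out = a
        return self.out     # otherwise hold the previous output


fence = CElement(initial=0)
trace = [
    fence.update(0, 0),  # both copies idle
    fence.update(1, 0),  # upset flips F_a only: the fence holds 0
    fence.update(0, 0),  # inputs persisted, so F_a settles back to 0
    fence.update(1, 1),  # legitimate transition on both copies
    fence.update(1, 0),  # upset flips F_b only: the fence holds 1
]
print(trace)
```

Because the errant value never passes the fence, it is never acknowledged, and input persistence lets the upset copy recompute the correct output.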

## SEU Mitigation (Memories)

DICE

# SEL & TID Mitigation

- No latchup: reducing V<sub>DD</sub> to near or below the nominal device threshold voltage, V<sub>t</sub>, disables the n-p-n-p positive feedback path<sup>1</sup>
- Near-threshold operation
  - The minimum energy operating point occurs near V<sub>t</sub>
  - Reduced reliability and robustness
- TID
  - increased timing variation
  - (adaptive) body biasing and distinct voltage domains: shift operation back towards the TT corner

<sup>1</sup>Harris et al.: CMOS VLSI Design, 2010

#### DD1 Cobalt-60 Test Results (TID)



#### Heavy Ion Test Results (SEU/SEL)



<sup>\*</sup> Test Facility - UC Berkeley 88-inch cyclotron

#### Measured Energy/Instruction vs Supply Voltage



\* Test Facility - Caltech lab

#### Measured MIPS vs Supply Voltage



\* Test Facility - Caltech lab

Near-Threshold Hurdles 1/3 (New MOSFET Model) 2/3 (Timing Assumptions) 3/3 (Quantifying Robustness) DD1 Analysis

## Exactly Why Does QDI Fail Subthreshold?

- Reducing power and increasing robustness are in direct competition
- Fabricating and testing a microprocessor/ASIC is not sufficient
  - At what supply voltage does it fail, and why?
  - How do we optimize, *i.e.* reduce power and increase robustness?
  - Will it work in a future process?
- Primary difficulty: Parameter Variation and Noise

# Parameter Variation

- Unavoidable
- Global variation (inter-die)
  - Chemical mechanical planarization
  - Mask alignment
  - Ion implantation and annealing
- Local variation (intra-die)
  - Line edge roughness
  - Metal/Poly granularity
  - Random dopant fluctuation (RDF) <sup>a</sup>
    - Dominates
    - Uncorrelated
    - V<sub>t</sub> is a normal RV

<sup>a</sup>Drago et al.: IEEE TSM, May 2009

<sup>b</sup>Bernstein et al.: IBM JRD, July 2006

#### (NFET Dopant Concentration)<sup>b</sup>



# Noise

- Unavoidable
- Physical noise
- Switching noise
  - Crosstalk
    - Capacitive coupling
    - Inductive coupling
  - Charge sharing
  - Power supply noise
- Noise tends to be proportional to V<sub>DD</sub>
- Noise can be modeled as a DC voltage source between nodes

#### (Coupling Noise (40nm))



## Noise and Parameter Variation Problems

- Both noise and variation can cause circuit failures
- Timing failures
  - Relative path delays (isochronic fork)
- Functional failures
  - Memories fail to hold state
  - Gates switch erroneously or do not switch at all
- Need to analyze and quantify these failure rates

# Near-Threshold Model

$$I_{on} = I_1 k_0 \, e^{k_1 \frac{V_{DD} - V_t}{n\phi_t} + k_2 \left(\frac{V_{DD} - V_t}{n\phi_t}\right)^2}$$

- Physically derived<sup>a</sup>
- Transregional
- Valid for NFET and PFET
- Validated across four different process technologies
- k<sub>0</sub>, k<sub>1</sub>, and k<sub>2</sub> are process independent
- New fundamental model

<sup>a</sup>Keller et al.: IEEE TVLSI, 2014
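A direct transcription of the model into code can help when exploring its shape. The parameter values below (`I1`, `K0`, `K1`, `K2`, `N_SLOPE`, `VT`) are hypothetical placeholders, not the fitted constants from the cited paper; only the functional form comes from the slide.

```python
import math

PHI_T = 0.02585  # thermal voltage kT/q near 300 K, in volts

# Hypothetical placeholder parameters (NOT fitted values from the paper).
I1 = 1e-6        # current scale, in amperes
K0, K1, K2 = 1.0, 1.0, -0.02
N_SLOPE = 1.5    # subthreshold slope factor n
VT = 0.45        # threshold voltage, in volts


def i_on(vdd: float, vt: float = VT) -> float:
    """On-current from the transregional model: I1*k0*exp(k1*x + k2*x^2)."""
    x = (vdd - vt) / (N_SLOPE * PHI_T)  # normalized gate overdrive
    return I1 * K0 * math.exp(K1 * x + K2 * x * x)


for vdd in (0.35, 0.45, 0.55, 0.80, 1.10):
    print(f"VDD = {vdd:.2f} V  ->  I_on = {i_on(vdd):.3e} A")
```

At V<sub>DD</sub> = V<sub>t</sub> the exponent vanishes and the model reduces to I<sub>1</sub>k<sub>0</sub>, which gives a quick sanity check for any parameter set.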

#### (Near-Threshold Model vs Simulation)



## Near-Threshold Statistical Delay

$$I_{on} = I_1 k_0 \, e^{k_1 \frac{V_{DD} - V_t}{n\phi_t} + k_2 \left(\frac{V_{DD} - V_t}{n\phi_t}\right)^2}$$

- $t_{pd} \propto \frac{V_{DD}}{I_{on}}$
- V<sub>t</sub> normally distributed
- $t_{pd}$ log-non-central $\chi^2$
- $t_d = t_{pd} L_{dp}$  log-normal
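The effect of normally distributed V<sub>t</sub> on delay can be explored with a small Monte Carlo sketch. It assumes that path delay is the sum of independently sampled gate delays, which is one reading of the slide rather than the authors' stated method, and it uses hypothetical model parameters and V<sub>t</sub> statistics.

```python
import math
import random
import statistics

PHI_T = 0.02585                        # thermal voltage near 300 K, in volts
I1, K0, K1, K2 = 1e-6, 1.0, 1.0, -0.02  # hypothetical model parameters
N_SLOPE = 1.5                          # hypothetical slope factor


def i_on(vdd: float, vt: float) -> float:
    x = (vdd - vt) / (N_SLOPE * PHI_T)
    return I1 * K0 * math.exp(K1 * x + K2 * x * x)


def gate_delay(vdd: float, vt: float) -> float:
    """t_pd proportional to VDD / I_on (arbitrary units)."""
    return vdd / i_on(vdd, vt)


def sample_path_delays(vdd, depth, samples, vt_mean=0.45, vt_sigma=0.03, seed=1):
    """Sum `depth` gate delays, each with an independent normal V_t sample."""
    rng = random.Random(seed)
    return [
        sum(gate_delay(vdd, rng.gauss(vt_mean, vt_sigma)) for _ in range(depth))
        for _ in range(samples)
    ]


def relative_spread(delays):
    """Coefficient of variation: standard deviation over mean."""
    return statistics.stdev(delays) / statistics.mean(delays)


near = relative_spread(sample_path_delays(vdd=0.55, depth=5, samples=2000))
nominal = relative_spread(sample_path_delays(vdd=1.10, depth=5, samples=2000))
print(f"relative delay spread at 0.55 V: {near:.3f}")
print(f"relative delay spread at 1.10 V: {nominal:.3f}")
```

With these placeholder values the relative spread is larger at 0.55V than at 1.1V, consistent with the claim that variation matters most near threshold.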

#### (Path Delay Distribution)



# Asynchronous Circuit Timing Failures

- Variation significantly alters path delay near-threshold
- Timing violations in QDI: exactly why and how do they occur?
  - Adversary path timing assumption is necessary and sufficient for correct QDI circuit operation
  - Formal proof<sup>1</sup>
- Statistical timing model used to estimate probability of QDI timing violations

## Parameter Variation Reduces Robustness

- Gates interpret voltages at their input
- Parameter variation makes a gate more susceptible to noise, which can cause a *misinterpretation*
- G<sub>1</sub> more robust than G<sub>2</sub> iff G<sub>1</sub> can tolerate more noise than G<sub>2</sub>
- Quantify robustness and treat as first-order metric

#### (VTC Parameter Variation)



# Static Noise Margin

- DC sweep of the gate input → VTC parameters
- A gate-pair: $G_x$ driving $G_y$
- $NM_H = V_{OH}(G_x) - V_{IH}(G_y)$
- $NM_L = V_{IL}(G_y) - V_{OL}(G_x)$
- $SNM = \min(NM_H, NM_L)$
- An SNM at or below zero implies failure
- Statistical notion of SNM
  - $V_{IL}$ and $V_{IH}$ vary with $V_t$
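The gate-pair definitions translate directly into code. In this sketch the `VTC` container and the example voltages are hypothetical; only the noise-margin formulas follow the slide, with $G_x$ driving $G_y$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VTC:
    """VTC parameters of one gate, in volts (as extracted from a DC sweep)."""
    v_ol: float
    v_oh: float
    v_il: float
    v_ih: float


def noise_margins(gx: VTC, gy: VTC) -> tuple[float, float]:
    """Return (NM_H, NM_L) for driver gx and receiver gy."""
    nm_h = gx.v_oh - gy.v_ih
    nm_l = gy.v_il - gx.v_ol
    return nm_h, nm_l


def snm(gx: VTC, gy: VTC) -> float:
    """Static noise margin: the smaller of the high and low margins."""
    return min(noise_margins(gx, gy))


# Hypothetical VTC parameters at VDD = 0.55 V.
driver = VTC(v_ol=0.02, v_oh=0.53, v_il=0.22, v_ih=0.33)
receiver = VTC(v_ol=0.02, v_oh=0.53, v_il=0.18, v_ih=0.30)

margin = snm(driver, receiver)
print(f"SNM = {margin:.3f} V, fails: {margin <= 0.0}")
```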

#### (VTC Parameter Extraction)



# The Robustness Metric

- The probability that any gate-pair in a system has an NM less than a target (NM<sub>T</sub>) (e.g. 10% V<sub>DD</sub>)
- Consider (*INV<sub>x</sub>*, *INV<sub>y</sub>*), a cross-coupled inverter-pair

$$\begin{split} P(FAIL) &= P(FAIL(INV_x, NM_T) \cup FAIL(INV_y, NM_T)) \\ &= P(SNM(INV_x, INV_y) \leq NM_T \cup \\ &SNM(INV_y, INV_x) \leq NM_T) \end{split}$$

- A constructive and composable definition
  - 1-Pair of inverters → N-Pairs of inverters → Chains of inverters → Chains of inverting gates → Any set of inverting gates
- Push through statistics and method of computation<sup>2</sup>

<sup>2</sup>Keller et al.: CSTR, 2015
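A Monte Carlo sketch of the metric for a cross-coupled inverter pair. The toy VTC model (switching point shifted one-for-one by the V<sub>t</sub> offset), the V<sub>t</sub> sigma, and the independence assumption used to scale from one pair to N pairs are all illustrative assumptions, not the computation method of the cited report.

```python
import random


def inverter_vtc(vdd: float, dvt: float) -> dict:
    """Toy inverter VTC: the switching point moves with the V_t offset `dvt`."""
    mid = vdd / 2 + dvt
    return {"v_ol": 0.0, "v_oh": vdd,
            "v_il": mid - 0.1 * vdd, "v_ih": mid + 0.1 * vdd}


def snm(gx: dict, gy: dict) -> float:
    """SNM of gx driving gy: min(NM_H, NM_L)."""
    return min(gx["v_oh"] - gy["v_ih"], gy["v_il"] - gx["v_ol"])


def pair_fail_probability(vdd, vt_sigma=0.04, nm_target_frac=0.10,
                          samples=20000, seed=1):
    """Estimate P(SNM(x, y) <= NM_T or SNM(y, x) <= NM_T)."""
    rng = random.Random(seed)
    nm_t = nm_target_frac * vdd
    fails = 0
    for _ in range(samples):
        x = inverter_vtc(vdd, rng.gauss(0.0, vt_sigma))
        y = inverter_vtc(vdd, rng.gauss(0.0, vt_sigma))
        if snm(x, y) <= nm_t or snm(y, x) <= nm_t:
            fails += 1
    return fails / samples


def system_fail_probability(p_pair: float, n_pairs: int) -> float:
    """Probability that any of n_pairs fails, ASSUMING independent pairs."""
    return 1.0 - (1.0 - p_pair) ** n_pairs


p_high = pair_fail_probability(vdd=0.55)
p_low = pair_fail_probability(vdd=0.30)
print(f"pair failure estimate at 0.55 V: {p_high:.5f}")
print(f"pair failure estimate at 0.30 V: {p_low:.5f}")
print(f"1000 pairs at 0.30 V: {system_fail_probability(p_low, 1000):.5f}")
```

Plain Monte Carlo cannot resolve very small failure probabilities with this few samples, so an estimate of zero means only that the rate is below the resolution of the run.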

# **DD1** Timing Failures

- 20K isochronic forks
  - 100 length-five adversary paths
  - 19.9K length-seven adversary paths
- Statistical delay model
- Failure: probability that an isochronic path delay is greater than the delay of its length-five or length-seven adversary path
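Per-fork failure probabilities can be combined into a chip-level figure using the fork counts above. This sketch assumes forks fail independently, and the per-fork probabilities `P5` and `P7` are hypothetical placeholders rather than DD1 results.

```python
import math


def any_fork_fails(groups) -> float:
    """Probability that at least one fork fails.

    `groups` is an iterable of (fork_count, per_fork_failure_probability).
    Forks are ASSUMED independent; log1p/expm1 keep precision for tiny values.
    """
    log_all_ok = sum(count * math.log1p(-p) for count, p in groups)
    return -math.expm1(log_all_ok)


P5 = 1e-6  # hypothetical failure probability for a length-five adversary path
P7 = 1e-9  # hypothetical failure probability for a length-seven adversary path

# Fork counts from the slide: 100 length-five and 19.9K length-seven paths.
p_chip = any_fork_fails([(100, P5), (19_900, P7)])
print(f"chip-level timing failure probability: {p_chip:.3e}")
```

Shorter adversary paths leave less timing slack, so with placeholder values like these the 100 length-five forks dominate the total even though they are a small fraction of the 20K forks.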



# **DD1 Robustness Failures**

- 120K equivalent gate pairs
- (NAND3, NOR3) worst-case upper bound
- Timing failure more likely than NM failure
  - No ratioed logic (staticizers)
  - Maximum stack of 4 FETs





- Designed, built, tested, & analyzed a full-custom radiation-hard by design QDI microcontroller
- Optimized for near-threshold operation
- Developed new models and methods of analysis to understand failure in the subthreshold & near-threshold operating regimes

#### **Thank You!**