# **Chapter 1 Introduction**

#### Jin-Fu Li

Advanced Reliable Systems (ARES) Laboratory Department of Electrical Engineering National Central University Jungli, Taiwan

### Outline

- VLSI Realization
- Role of Testing
- Defects, Faults, and Errors
- VLSI Testing Concepts
- Testing Economics
- Test Quality Measure

### **VLSI Realization Process**



### Definitions

#### Design synthesis

- Given an I/O function, develop a procedure to manufacture a device using known materials and processes

#### Verification

- Predictive analysis to ensure that the synthesized design, when manufactured, will perform the given I/O function

#### Test

- A manufacturing step that ensures that the physical device, manufactured from the synthesized design, has no manufacturing defect

## **Role of Testing**

- If you design a product, fabricate it, and test it, and it fails the test, then there must be a cause for the failure:
  - The test was wrong
  - The fabrication process was faulty
  - The design was incorrect
  - The specification had a problem
- The role of *testing* is to detect whether something went wrong, and the role of *diagnosis* is to determine exactly what went wrong
- Correctness and effectiveness of testing are most important for quality products

## **Benefits of Testing**

- *Quality* and *economy* are the two major benefits of testing
- The two attributes are interdependent; neither can be defined without the other
- Quality means satisfying the user's needs at a minimum cost
- The purpose of testing is to weed out all bad products before they reach the user
  - The number of bad products heavily affects the price of good products
- A profound understanding of the principles of manufacturing and test is essential for an engineer to design a quality product

### **Trends of Testing**

- Two key factors are changing the way VLSI ICs are tested
  - The manufacturing test cost has not been scaling down
  - The effort to generate tests has been growing geometrically along with product complexity



### Test Knowledge is Important

- *Testing* is becoming a factor in design optimization
- Designers customarily strive for an optimal design
  - A high-speed, low-power design occupying the smallest possible area
- Conventionally, designers optimize one of three attributes: *speed* (or delay), *area*, and *power*
- At present, a fourth attribute is also considered: *testability*
- Nowadays, the *testability cycle* should *parallel* the *design cycle*

#### As Technology Scales Continuously

- Die size, chip yield, and design productivity have so far limited transistor integration in a VLSI design
- Now the focus has shifted to energy consumption, power dissipation, and power delivery
- As technology scales further, we will face new challenges such as variability, single-event upsets (soft errors), and device (transistor performance) degradation. These effects manifest as inherent *unreliability* of the components, posing design and test challenges

Source: S. Borkar (Intel Corp.), *IEEE Micro*, 2005

#### **Possible Solution to Conquer Unreliability**

We could distribute test functionality as a part of the hardware to dynamically detect errors, or to correct and isolate aging and faulty hardware. Alternatively, a subset of cores in the multicore design could perform this work. This microarchitecture strategy, with multicores to assist in redundancy, is called *resilient microarchitecture*. It continuously detects errors, isolates faults, confines faults, reconfigures the hardware, and thus adapts. If we can make such a strategy work, there is no need for one-time factory testing or burn-in, since the system is capable of testing and reconfiguring itself to work reliably throughout its lifetime.

Source: S. Borkar (Intel Corp.), IEEE Micro, 2005

#### Itanium (JSSC, Jan. 2006)



#### Cell Processor (JSSC, Jan. 2006)




## **Defect, Fault, and Error**

#### Defect

- A defect is the unintended difference between the implemented hardware and its intended design
- Defects occur either during manufacture or during the use of devices

#### Fault

- A representation of a *defect* at the abstracted function level

#### Error

- A wrong output signal produced by a defective system
- An error is caused by a *fault* or a design error

# **Typical Types of Defects**

#### Extra and missing material

- Primarily caused by dust particles on the mask or wafer surface, or in the processing chemicals

#### Oxide breakdown

- Primarily caused by insufficient oxygen at the interface of silicon (Si) and silicon dioxide (SiO<sub>2</sub>), chemical contamination, and crystal defects

#### Electromigration

- Primarily caused by the transport of metal atoms when a current flows through the wire
  - Because of its low melting point, aluminum has large self-diffusion properties, which increase its electromigration liability



- Error: for inputs a=1 and b=1, the output is c=0 (the correct output is c=1)
- Note that the error is not permanent: as long as at least one input is 0, there is no error in the output
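The circuit behind this example is not shown. Assuming, hypothetically, a two-input AND gate c = a AND b whose input b is stuck at 0 (one fault consistent with the error described above), a minimal sketch that compares the good and faulty gates over all inputs shows that the error appears only for a=1, b=1:

```python
from itertools import product


def and_good(a: int, b: int) -> int:
    """Fault-free two-input AND gate."""
    return a & b


def and_b_stuck_at_0(a: int, b: int) -> int:
    """Hypothetical faulty gate: input b is stuck at logic 0."""
    return a & 0


# Input combinations where the faulty output differs from the good output
errors = [(a, b) for a, b in product((0, 1), repeat=2)
          if and_good(a, b) != and_b_stuck_at_0(a, b)]
print(errors)  # [(1, 1)]
```

Any input vector with a 0 masks the fault, which is why the error is not permanent.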


## **Defect, Fault, and Error**

Different types of defects may cause the same fault




Different types of faults may cause the same error







| C | D | Y | Y (C stuck-at-1) |
|---|---|---|------------------|
| 0 | 0 | 0 | 1           |
| 0 | 1 | 1 | 1           |
| 1 | 0 | 1 | 1           |
| 1 | 1 | 1 | 1           |
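The Y column of the table corresponds to a two-input OR gate, Y = C OR D. The sketch below (fault labels "C/1", "D/1", "Y/1" are hypothetical names for stuck-at-1 faults on C, D, and Y) checks that three different faults produce exactly the same error, which is the point of the slide: the faulty output differs from the good one only for C=0, D=0.

```python
from itertools import product
from typing import Optional


def or_gate(c: int, d: int, fault: Optional[str] = None) -> int:
    """Two-input OR gate Y = C OR D with an optional stuck-at-1 fault.

    fault is None or one of the hypothetical labels "C/1", "D/1", "Y/1".
    """
    if fault == "C/1":
        c = 1  # input C stuck-at-1
    elif fault == "D/1":
        d = 1  # input D stuck-at-1
    y = c | d
    if fault == "Y/1":
        y = 1  # output Y stuck-at-1
    return y


inputs = list(product((0, 1), repeat=2))
failing_inputs = {}
for fault in ("C/1", "D/1", "Y/1"):
    # Record every input pair whose faulty output differs from the good output
    failing_inputs[fault] = [(c, d) for c, d in inputs
                             if or_gate(c, d, fault) != or_gate(c, d)]
    print(fault, failing_inputs[fault])  # each fault fails only on (0, 0)
```

Because the three faults are indistinguishable at the output of this gate, a test can detect them but cannot tell them apart; that distinction is the job of diagnosis.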

## **Ideal Tests & Real Tests**

#### The problems of ideal tests

- Ideal tests detect all defects produced in the manufacturing process
- Ideal tests pass all functionally good devices
- Very large numbers and varieties of possible defects need to be tested
- It is difficult to generate tests for some real defects

#### Real tests

- Based on analyzable fault models, which may not map onto real defects
- Incomplete coverage of modeled faults due to high complexity
- Some good chips are rejected. The fraction (or percentage) of such chips is called the *yield loss*
- Some bad chips pass tests. The fraction (or percentage) of bad chips among all passing chips is called the *defect level*

### **How to Test Chips?**




### **Cost of Test**

- Design for testability (DFT)
  - Chip area overhead and yield reduction
  - Performance overhead
- Software processes of test
  - Test generation and fault simulation
  - Test programming and debugging
- Manufacturing test
  - Automatic test equipment (ATE) capital cost
  - Test center operational cost

# ADVANTEST Model T6682 ATE

#### Consists of

- Powerful computer
- Powerful 32-bit digital signal processor (DSP) for analog testing
- Probe head: actually touches the bare dies or packaged chips to perform fault detection experiments
- Probe card: contains electronics to measure signals on chip pins or pads



### **Internal Structure of the ATE**



Source: H.-J. Huang, CIC

#### **ATE Test Operation**



Source: H.-J. Huang, CIC

#### Characterization testing

- A.k.a. design debug or verification testing
- Performed on a new design before it is sent to production
- Verify whether the design is correct and the device will meet all specifications
- Functional tests and comprehensive AC and DC measurements are made
- A *characterization test* determines the exact limits of device operation values

#### DC parametric tests

- Measure steady-state electrical characteristics
- For example, the threshold test:
  - $0 < V_{OL} < V_{IL}$ and $V_{IH} < V_{OH} < V_{CC}$
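The threshold test reduces to a pass/fail check on measured output levels. A minimal sketch; the function name `threshold_test` and the voltages used below (a 3.3 V supply with $V_{IL}$ = 0.8 V and $V_{IH}$ = 2.0 V) are hypothetical values chosen only for illustration, not taken from any device specification in this chapter.

```python
def threshold_test(v_ol: float, v_oh: float,
                   v_il: float, v_ih: float, v_cc: float) -> bool:
    """Pass only if 0 < V_OL < V_IL and V_IH < V_OH < V_CC (all in volts)."""
    return 0 < v_ol < v_il and v_ih < v_oh < v_cc


# Hypothetical measurements against hypothetical input thresholds
print(threshold_test(v_ol=0.3, v_oh=2.9, v_il=0.8, v_ih=2.0, v_cc=3.3))  # True
print(threshold_test(v_ol=1.1, v_oh=2.9, v_il=0.8, v_ih=2.0, v_cc=3.3))  # False: V_OL too high
```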


#### AC parametric tests

- Measure transient electronic characteristics
- For example:
  - Rise-time and fall-time tests



#### Production testing

- Every fabricated chip is subjected to production tests
- The test patterns may not cover all possible functions and data patterns but must have a high fault coverage of modeled faults
- The main driver is cost, since every device must be tested. Test time must be absolutely minimized
- Only a go/no-go decision is made
- Test whether selected device-under-test (DUT) parameters meet the device specifications under normal operating conditions
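A production test ends in a single go/no-go decision. The sketch below is hypothetical: the DUT is modeled as a Python callable, `go_no_go` is an illustrative name, and stopping at the first mismatching response is an assumed way of keeping test time short, not a procedure prescribed above.

```python
from typing import Callable, Sequence, Tuple

Pattern = Tuple[int, ...]


def go_no_go(dut: Callable[[Pattern], int],
             patterns: Sequence[Pattern],
             expected: Sequence[int]) -> bool:
    """Apply patterns in order; reject the device at the first wrong response."""
    if len(patterns) != len(expected):
        raise ValueError("each pattern needs exactly one expected response")
    for pattern, want in zip(patterns, expected):
        if dut(pattern) != want:
            return False  # no-go: skip the remaining patterns to save test time
    return True  # go: every response matched


# Hypothetical DUTs: a good two-input AND gate and one whose output is stuck at 0
patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
expected = [0, 0, 0, 1]
good_dut = lambda p: p[0] & p[1]
faulty_dut = lambda p: 0

print(go_no_go(good_dut, patterns, expected))    # True
print(go_no_go(faulty_dut, patterns, expected))  # False
```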


#### Burn-in testing

- Ensures the reliability of tested devices through additional testing
- Detects devices with potential failures
- The potential failures can be accelerated at elevated temperatures
- Devices with *infant mortality failures* may be screened out by a short-term burn-in test under accelerated conditions
- Failure rate versus product lifetime (*bathtub curve*)



### **Testing Economics**

- Chips must be tested before they are assembled onto PCBs, which, in turn, must be tested before they are assembled into systems
- The rule of ten
  - If a chip fault is not detected by chip testing, then finding the fault costs 10 times as much at the PCB level as at the chip level
  - Similarly, if a board fault is not found by PCB testing, then finding the fault costs 10 times as much at the system level as at the board level
- Some claim that the rule of ten should be renamed the rule of twenty, because chips, boards, and systems have become more complex
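The rule of ten is a geometric cost escalation across test levels. A minimal sketch, assuming a hypothetical cost of 1.0 (arbitrary units) to find a fault at the chip level; `fault_finding_cost` is an illustrative name:

```python
# Test levels, ordered from cheapest to most expensive place to find a fault
LEVELS = ("chip", "board", "system")


def fault_finding_cost(chip_cost: float, factor: float = 10.0) -> dict:
    """Cost of finding one fault at each level, multiplying by `factor` per level."""
    return {level: chip_cost * factor ** i for i, level in enumerate(LEVELS)}


print(fault_finding_cost(1.0))               # {'chip': 1.0, 'board': 10.0, 'system': 100.0}
print(fault_finding_cost(1.0, factor=20.0))  # {'chip': 1.0, 'board': 20.0, 'system': 400.0}
```

Under the "rule of twenty", a fault that escapes to the system level costs 400 times as much to find as at the chip level, rather than 100 times.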

# **VLSI Chip Yield**

- A manufacturing defect is a finite chip area with electrically malfunctioning circuitry caused by errors in the fabrication process
- A chip with no manufacturing defect is called a good chip
- Fraction (or percentage) of good chips produced in a manufacturing process is called the *yield*. Yield is denoted by symbol Y
- Cost of a chip:

$$\text{Cost of a chip} = \frac{\text{Cost of fabricating and testing a wafer}}{\text{Yield} \times \text{Number of chip sites on the wafer}}$$

## **VLSI Chip Yield**



Wafer yield = 12/22 = 0.55

Wafer yield = 17/22 = 0.77
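Combining the cost formula with the two wafer examples (22 chip sites, 12 or 17 good chips) gives a direct comparison. A minimal sketch; the wafer cost of 1200.0 is a hypothetical value in arbitrary units, and the function names are illustrative:

```python
def wafer_yield(good_chips: int, chip_sites: int) -> float:
    """Fraction of good chips among all chip sites on the wafer."""
    return good_chips / chip_sites


def cost_per_good_chip(wafer_cost: float, y: float, chip_sites: int) -> float:
    """Cost of a chip = wafer fabrication-and-test cost / (Y * number of chip sites)."""
    return wafer_cost / (y * chip_sites)


CHIP_SITES = 22
WAFER_COST = 1200.0  # hypothetical cost of fabricating and testing one wafer

for good in (12, 17):
    y = wafer_yield(good, CHIP_SITES)
    print(f"Y = {y:.2f}, cost per good chip = "
          f"{cost_per_good_chip(WAFER_COST, y, CHIP_SITES):.2f}")
```

With these assumed numbers, raising the yield from 12/22 to 17/22 lowers the cost of each good chip from 100.00 to about 70.59, since the same wafer cost is spread over more good chips.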

## Fault Coverage & Defect Level

#### Fault coverage (FC)

- The measure of the ability of a test (a collection of test patterns) to detect the given faults that may occur on the device under test
- $FC = \#(\text{detected faults}) \,/\, \#(\text{possible faults})$

#### Defect level (DL)

- The ratio of faulty chips among the chips that pass tests
- DL is measured in *defects per million* (DPM)
- DL is a measure of the effectiveness of tests
- DL is a quantitative measure of manufactured product quality. For commercial VLSI chips, a DL greater than 500 DPM is considered unacceptable

$$DL = 1 - Y^{(1-FC)}, \qquad 0 \le DL \le 1 - Y$$
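The two formulas can be evaluated directly. A minimal sketch; the fault counts (9,900 of 10,000 faults detected) and the 50% yield are hypothetical inputs, not data from this chapter:

```python
def fault_coverage(detected: int, possible: int) -> float:
    """FC = #(detected faults) / #(possible faults)."""
    return detected / possible


def defect_level(y: float, fc: float) -> float:
    """DL = 1 - Y**(1 - FC): fraction of faulty chips among chips that pass."""
    return 1 - y ** (1 - fc)


fc = fault_coverage(detected=9_900, possible=10_000)  # hypothetical fault-simulation counts
dl = defect_level(y=0.5, fc=fc)                        # hypothetical 50% yield
print(f"FC = {fc:.2%}, DL = {dl * 1e6:.0f} DPM")
```

Even with 99% fault coverage, a 50% yield gives a DL of about 6,900 DPM, far above the 500 DPM limit. At the extremes, FC = 1 gives DL = 0 and FC = 0 gives DL = 1 - Y, matching the bounds on DL.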

## **Defect Level & Quality Level**

For example, the required FC for DL = 200 DPM:

| Y(%)  | 10     | 50    | 90   | 95   | 99 |
|-------|--------|-------|------|------|----|
| FC(%) | 99.991 | 99.97 | 99.8 | 99.6 | 98 |
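The table follows from inverting the defect-level formula: $Y^{(1-FC)} = 1 - DL$ gives $(1 - FC)\ln Y = \ln(1 - DL)$, so $FC = 1 - \ln(1 - DL)/\ln Y$. A minimal sketch that recomputes the table (`required_fc` is an illustrative name):

```python
import math


def required_fc(y: float, dl: float) -> float:
    """Solve DL = 1 - Y**(1 - FC) for FC, i.e. FC = 1 - ln(1 - DL) / ln(Y).

    Valid for 0 < Y < 1; at Y = 1, ln(Y) = 0 and any FC gives DL = 0.
    """
    return 1 - math.log(1 - dl) / math.log(y)


TARGET_DL = 200e-6  # 200 DPM
for y in (0.10, 0.50, 0.90, 0.95, 0.99):
    print(f"Y = {y:.0%}: required FC = {required_fc(y, TARGET_DL):.3%}")
```

The computed values agree with the table after rounding; the lower the yield, the closer FC must be to 100% to reach the same defect level.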

#### Quality level (QL)

- The fraction of good parts among the parts that pass all the tests and are shipped

$$QL = 1 - DL = Y^{(1-FC)}, \qquad 0 \le QL \le 1$$

Consequently, fault coverage affects the quality level

### Summary

- Efficient test strategies can substantially reduce the cost of a chip
- Fault modeling makes testing analyzable and reduces its complexity
- The quality of tests affects the quality of products