Recent Research Interests
Currently about 20% of newly manufactured complex chips are faulty; the yield is thus 80%. This is due primarily to the manufacturing process, which introduces defects and process variations, and to noise in the circuit itself. Each chip must be thoroughly tested before being sold to a customer to ensure it meets all specifications. For about 20 years Dr. Breuer has focused on how to develop tests for new chip designs, including automatic test pattern generation, design-for-test, and built-in self-test. More recently he has turned his attention to two radically new topics: (1) the defective chips themselves, where he is addressing the problem of salvaging defective chips that can still do useful computation, and (2) the field of design-for-yield. Below we expand on these two new areas.
Dr. Breuer and his PhD candidates continue their research on characterizing the functional severity of defects in defective semiconductor chips and on extending the domain of applications for the commercial use of such chips. Successful commercial use of defective chips could save the semiconductor industry millions of dollars, provided industry is willing to accept or tolerate some performance degradation or an acceptable error rate. One domain they have studied and found to be very suitable is the use of defective random access memories (RAMs) in digital answering machines: about 1% of the RAM can be defective with no impact on the quality of the recorded message. Similar results were obtained when MPEG and JPEG modules were studied. They have successfully extended their built-in self-test methodology for estimating error rate via signature analysis to include the use of “ones counting,” and have combined error significance and error rate into a new measure. Dr. Breuer refers to this field of study as “error tolerance.”
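The ones-counting idea above can be illustrated with a minimal sketch. This is not the published methodology, only a toy model: a 1-bit-output circuit is modeled as a Python function, the response stream is compacted into a ones count, and the count difference gives a lower bound on the true error rate (flips in opposite directions cancel in the compacted signature).

```python
import random

def error_rate_exact(golden, faulty, patterns):
    """Reference error rate: fraction of patterns whose outputs mismatch."""
    return sum(golden(x) != faulty(x) for x in patterns) / len(patterns)

def error_rate_ones_count(golden, faulty, patterns):
    """BIST-style estimate: compact each response stream to a ones count
    and compare the counts.  A 0->1 flip can cancel a 1->0 flip in the
    count, so this is a lower bound on the true error rate."""
    g = sum(golden(x) for x in patterns)   # ones count of fault-free responses
    f = sum(faulty(x) for x in patterns)   # ones count of faulty responses
    return abs(g - f) / len(patterns)

# Toy circuits (illustrative only): an 8-input parity circuit, and the
# same circuit with input bit 0 stuck-at-1.
golden = lambda x: bin(x).count("1") % 2
faulty = lambda x: bin(x | 1).count("1") % 2

rng = random.Random(1)
patterns = [rng.getrandbits(8) for _ in range(10_000)]
exact = error_rate_exact(golden, faulty, patterns)
bound = error_rate_ones_count(golden, faulty, patterns)
```

The gap between `bound` and `exact` shows why compaction-based estimates must be interpreted carefully: for this parity example the opposing flips largely cancel in the ones count even though roughly half the patterns are erroneous.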
The result of these studies has led to the conclusion that a significant fraction of so-called “bad” chips can produce acceptable performance in some applications. His research team has focused on two related problems. First, they have identified and quantified the extent to which this notion holds for many multimedia applications. Second, they have developed test methodologies to support this new paradigm: rather than simply classifying a chip as good or bad, they take the bad chips and characterize attributes such as error rate and error significance. The net result of this work is intended to significantly increase the effective yield of an industry that produces several billion dollars’ worth of chips per year.
Dr. Breuer’s current research has recently evolved from the field of error-tolerance, which he developed and refined with the aim of identifying acceptable yet defective chips, to the newly developed field of performance-degrading faults (PDF), wherein he is developing testing methodologies to improve the yield of complex CMOS VLSI chips. Historically, there is an area called fault-tolerance, where designers use redundant components to ensure 100% correct results during a mission, even in light of some components failing; this is needed for applications such as the space shuttle. A second recognized area is defect-tolerance, where engineers use redundancy so that a newly manufactured part will still operate properly even if some of its components are defective. Error-tolerance, on the other hand, acknowledges that a device is faulty and may occasionally produce errors or reduced performance, yet its performance remains acceptable to the end user. This is analogous to saying that you need not send your automobile to the junk pile just because the upholstery is torn or a light bulb is burned out. Research on PDF moves beyond error-tolerance by determining how chips that suffer from defects, noise, and/or process variations might still be used to do useful computation.
For many years industry did not accept the notions of error-tolerance or performance degradation. But after Dr. Breuer pursued this issue for about eight years, industry is beginning to see the possibilities, particularly the economic benefits of improved yield in the semiconductor manufacturing process. He has received funding from both the National Science Foundation and the Semiconductor Research Corporation to continue this work, and has given five keynote addresses at conferences on this topic.
In the study of PDF, Dr. Breuer notes that traditionally a fault in a chip is assumed to create errors, and such chips are usually discarded. Using a rich set of test data and special clocking techniques, most faults can be detected, and to some extent their location can be isolated to within a few gates. As chip complexity reaches billions of transistors, most chips will contain numerous faults, resulting in a high level of rejects. However, some faults produce internal errors but no external errors, though they may still cause some form of performance degradation. If the degree of degradation is not too large, these chips can still be used for low-end applications, resulting in a higher apparent yield in the chip manufacturing process.
Dr. Breuer defines PDF with respect to a circuit as a fault that:
· Can produce steady-state errors (in storage cells);
· Does not produce any errors in the normal functional operation (outputs) of the circuit, such as when executing a user’s C code;
· Is not compensated for by traditional techniques such as reconfiguration, masking, coding, or modifying how the circuit is operated (e.g., by increasing the supply voltage);
· But does (or might) reduce some aspect of system performance, such as throughput, latency, or power.
A PDF is said to pass a test if the degradation it introduces is below a threshold level; otherwise it fails the test. The threshold is determined by engineers and marketing personnel familiar with the product being produced and the applications in which it will be used.
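The pass/fail decision described above amounts to checking each degraded metric against an application-specific threshold. The sketch below makes this concrete; the metric names and threshold values are illustrative assumptions, not taken from Dr. Breuer’s work.

```python
from dataclasses import dataclass

@dataclass
class PerfMeasurement:
    """Hypothetical measured performance of a chip with a PDF."""
    throughput_gops: float
    latency_ns: float
    power_mw: float

@dataclass
class Threshold:
    """Application-specific acceptability limits, set by engineers and
    marketing personnel familiar with the product and its applications."""
    min_throughput_gops: float
    max_latency_ns: float
    max_power_mw: float

def pdf_passes(m: PerfMeasurement, t: Threshold) -> bool:
    """A chip passes iff every degraded metric stays within its threshold."""
    return (m.throughput_gops >= t.min_throughput_gops
            and m.latency_ns <= t.max_latency_ns
            and m.power_mw <= t.max_power_mw)

# Illustrative numbers only: a chip whose PDF costs ~5% throughput still
# clears a spec that tolerates 10% degradation.
spec = Threshold(min_throughput_gops=0.9, max_latency_ns=120.0, max_power_mw=500.0)
chip = PerfMeasurement(throughput_gops=0.95, latency_ns=110.0, power_mw=480.0)
```

In practice each product grade would carry its own `Threshold`, so the same defective die can fail a high-end grade yet pass a low-end one.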
Thus, the net result of Dr. Breuer’s research will allow very complex digital systems to be manufactured using the most advanced technology and scaling factors. While all such chips will face spot defects and process variations, his team’s test and evaluation techniques will identify those chips that provide acceptable performance for specific applications. This will have a tremendous economic effect on the entire microelectronics industry.
Dr. Breuer’s overall research goal is to determine the practical feasibility and applications of using chips with PDFs. If this notion is indeed a viable computing paradigm for the future, then much remains to be done. Application engineers need to develop new ways to specify their needs in terms of degrees of acceptability. Design engineers need to learn how to design for useful computation; current and past techniques have focused on design for performance, low power, or yield. And clearly, industry must establish a new business model that focuses on usability rather than total correctness.
More recently, Dr. Breuer’s group has focused on adding redundancy to chips to enhance yield per area. This is done at the core or module level: the test process tests each module and its spares, and a subset of good modules is then selected that can be configured into a working chip. The configuration is done using fork and join circuits added to the chip. One unique aspect of this work is the partitioning and clustering algorithms used to define the modules.
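The yield-versus-area trade-off behind this redundancy work can be sketched with a simple probabilistic model. This is a toy calculation under stated assumptions (independent defects, uniform module yield, a fixed per-module switch overhead), not the group’s actual algorithms.

```python
def chip_yield(p_module: float, n_modules: int, copies: int) -> float:
    """Probability that every module position has at least one working copy,
    assuming independent defects and uniform module yield p_module."""
    per_position = 1 - (1 - p_module) ** copies
    return per_position ** n_modules

def yield_per_area(p_module: float, n_modules: int, copies: int,
                   module_area: float = 1.0, switch_area: float = 0.05) -> float:
    """Yield divided by total area.  Spares raise yield but also area
    (plus fork/join switch overhead), so yield/area peaks at some
    finite number of copies.  Overhead value is an assumption."""
    area = n_modules * (copies * module_area
                        + (switch_area if copies > 1 else 0.0))
    return chip_yield(p_module, n_modules, copies) / area

# With 10 modules at 80% module yield, sweep 1..5 copies per module
# to find the copy count that maximizes yield/area.
best = max(range(1, 6), key=lambda c: yield_per_area(0.8, 10, c))
```

Under these numbers, duplicating each module beats both no redundancy (yield collapses to 0.8^10) and heavy redundancy (area grows faster than yield improves), which is the basic tension the partitioning and redundancy-insertion algorithms navigate.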
Some papers that deal with the issues mentioned above are listed below.
Algorithms to maximize yield and enhance yield/area of pipeline circuitry by insertion of switches and redundant modules, Design Automation and Test in Europe (DATE), Dresden, Germany, pp. 8-12, March 8-12, 2010.
Hardware that produces bounded rather than exact results, Design Automation Conf. (DAC), invited paper, pp. 871-876, June 13-18, 2010.
HYPER: a Heuristic for Yield/area imProvEment using Redundancy in SoC, Asian Test Symp., Shanghai, China, pp. 249-254, Dec. 1-4, 2010.
Theory of logical partitioning for yield/area maximization using redundancy, IEEE Int’l. Workshop on Design for Manufacturing and Yield (DFM&Y 2011), San Diego, CA, June 6, 2011, with M. M. Aghatabar and S. K. Gupta.
Yield/area maximization of logic circuits: From theorem to implementation, IEEE Int’l. Workshop on Defect and Adaptive Test Analysis (DATA-2011), Anaheim CA, Sept. 22-23, 2011, with M. M. Aghatabar and S. K. Gupta.
DACS: Data aware component salvaging in presence of microprocessor integer functional unit delay faults, IEEE Int’l. Workshop on Defect and Adaptive Test Analysis (DATA-2011), Anaheim CA, Sept. 22-23, 2011, with Y. Gao.
Theory of redundancy for logic circuits to maximize yield/area, Proc. Int’l. Symp. on Quality Electronic Design (ISQED), Santa Clara, CA, March 19-21, 2012, with M. M. Aghatabar, S. K. Gupta, and S. Nazarian.
A design flow to maximize yield/area of physical devices via redundancy, IEEE Int’l. Test Conf. (ITC), 2012, with M. M. Aghatabar and S. K. Gupta.
Trading off area, yield and performance via hybrid redundancy in multi-core architectures, IEEE VLSI Test Symposium (VTS’13), 2013, with Y. Gao, Y. Zhang, and da Cheng.
An error-tolerance-based test methodology to support product grading for yield enhancement, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, Vol. 30, No. 6, pp. 930-934, June 2011, with T.-Y. Hsieh and K.-J. Lee.
Efficient over-detection elimination of acceptable faults for yield improvement, IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, Vol. 31, No. 5, pp. 754-764, May 2012, with T.-Y. Hsieh and K.-J. Lee.
Error rate estimation for defective circuits via ones counting, ACM Trans. on Design Automation of Electronic Systems, Vol. 27, No. 1, Article #8, January 2012, with Z. Pan.