Part of the  

Chip Design Magazine


About  |  Contact

Posts Tagged ‘test’

Increasing Power Density of Electric Motors Challenges IGBT Makers

Tuesday, August 23rd, 2016

Mentor Graphics answers questions about failure modes and simulation-testing for IGBT and MOSFET power electronics in electronic and hybrid-electronic vehicles (EV/HEV).

By John Blyler, Editorial Director

Most news about electric and hybrid vehicles (EV/HEV) electronics focuses on the processor-based engine control and the passenger infotainment systems.  Of equal importance is the power electronics that support and control the actual vehicle motors. On-road EVs and HEVs operate on either AC induction or permanent magnet (PM) motors. These high-torque motors must operate over a wide range of temperatures and in often electrically noisy environments. The motors are driven by converters that generally contain a main IGBT or power MOSFET inverter.

The constant power cycling that occurs during the operation of the vehicle significantly affects the reliability of these inverters. Design and reliability engineers must simulate and test the power electronics for thermal reliability and lifecycle performance.

To understand more about the causes of inverter failures and the test that reveal these failures, I presented the following questions to Andras Vass-Varnai, Senior Product Manager for the MicReD Power Tester 600A , Mentor Graphic’s Mechanical Analysis Division. What follows is a portion of his responses. – JB


Blyler: What are some of the root causes of failures for power devices in EV/HEV devices today, namely, for insulated gate bipolar transistors (IGBTs), MOSFETs, transistors, and chargers?

Vass-Varnai: As the chip and module sizes of power devices show a shrinking tendency, while the required power dissipation stays the same or even increases, the power density in power devices increases, too. The increasing power densities require careful thermal design and management. The majority of failures is thermal related, the temperature difference between the material layers within an IGBT or MOSFET structure, plus the differences in the coefficient of thermal expansion of the same layers lead to thermo-mechanical stress.

The failure will develop ultimately at these layer boundaries or interconnects, such as the bond wires, die attach, base plate solder, etc. (see Figure 1). Our technology can induce the failure mechanisms using active power cycling and can track the failure while it develops using high resolution electric tests, from which we derive thermal and structural information.

Figure 1: Cross-section of an IGBT module.

Blyler: Reliability testing during power cycling improves the reliability of these devices. How was this testing done in the past? What new technology is Mentor bringing to the testing approach?

Vass-Varnai: The way we see it, traditionally the tests were done in a very simplified way, companies used tools to stress the devices by power cycles, however these technologies were not combined with in-progress characterization. They started the tests, then stopped to see if any failure happened (using X-ray microscopy, ultrasonic microscopy, sometimes dissection), then continued the power cycling. Testing this way took much more time and more user interaction, and there was a chance that the device fails before one had the chance to take a closer look at the failure. In some more sophisticated cases companies tried to combine the tests with some basic electrical characterization, however none of these were as sophisticated and complete as offered by today’s power testers. One major advantage of today’s technology is the high resolution (about 0.01C) temperature measurement and the structure function technology, which helps users to precisely identify in which structural layer the failure develops and what is its effect on the thermal resistance, all of these embedded in the power cycling process.

The combination with simulation is also unique. In order to calculate the lifetime of the car, one needs to simulate very precisely the temperature changes in an IGBT for a given mission profile. In order to do this, the simulation model has to behave exactly as the real device both for steady state and transient excitations. The thermal simulation and testing system must be capable of taking real measurement data and calibrating the simulation model for precise behavior.

Blyler: Can this tester be used for both (non-destructive) power-cycle stress screening as well as (destructive) testing the device all the way to failure? I assume the former is the wider application in EV/HEV reliability testing.

Vass-Varnai: The system can be used for non-destructive thermal metrics measurements (junction temperature, thermal resistance) and also for active power cycling (which is a stress test), and can track automatically the development of the failure (see Figure 2).

Figure 2: Device voltage change during power cycling for three tested devices in Mentor Graphics MicReD Power Tester 1500A

Blyler: How do you make IGBT thermal lifetime failure estimations?

Vass-Varnai: We use a combination of thermal software simulation and hardware testing solution specifically for the EV/HEV market. Thermal models are created using computational fluid dynamics based on the material properties of the IGBT under test. These models accurately simulate the real temperature response of the EV/HEV’s dynamic power input.

Blyler: Thank you.

For more information, see the following: “Mentor Graphics Launches Unique MicReD Power Tester 600A Solution for Electric and Hybrid Vehicle IGBT Thermal Reliability

Bio: Andras Vass-Varnai obtained his MSc degree in electrical engineering in 2007 at the Budapest University of Technology and Economics. He started his professional career at the MicReD group of Mentor Graphics as an application engineer. Currently, he works as a product manager responsible for the Mentor Graphics thermal transient testing hardware solutions, including the T3Ster product. His main topics of interest include thermal management of electric systems, advanced applications of thermal transient testing, characterization of TIM materials, and reliability testing of high power semiconductor devices.


EDA Tool Reduces Chip Test Time With Same Die Size

Thursday, February 4th, 2016

Cadence combines physically-aware scan logic with elastic decompression in new test solution. What does that really mean?

By John Blyler, Editorial Director

Cadence recently announced the Modus Test Solution suite that the company claims will enable up to 3X reduction in test time and up to 2.6X reduction in compression logic wirelength. This improvement is made possible, in part, by a patent-pending, physically aware 2D Elastic Compression architecture that enables compression ratios beyond 400X without impacting design size or routing. The press release can be found on the company’s website.

What does all the technical market-ese mean? My talk with Paul Cunningham, vice president of R&D at Cadence, helps clarify the engineering behind the announcement. What follows are portions of that conversation. – JB


Blyler:  Reducing test times saves companies a lot of money. What common methods are used today?

Cunningham: Test compression is the technique of reducing the test data volume and test application time while retaining test coverage. XOR-based compression has been widely used to reduce test time and cost. Shorter scan chains mean fewer clock cycles are needed to shift in each test pattern, reducing test time. Compression reduces test time by partitioning registers in a design into more scan chains than there are scan pins.

But there is an upper limit to test time. If the compression ratio is too high, then the test coverage is lost. Even if test coverage is not lost, test time savings eventually dry up. In other words, as you shrink the test time you also shrink the data you can put into the compression system for fault coverage.

As I change the compression ratio, I’m making the scan chains shorter. But I’ve got more chains while the scan in pin numbers are constant. So every time I shrink the chain, each pattern that I’m shifting in has less and less bits because the width of the pattern coming in is the number of scan pins. The length of the pattern coming in is the length of the scan chain. So if you keep shrinking the chain, the amount of information in each pattern decreases. At some point, there just isn’t enough information in the pattern to allow us to control the circuits to detect the faults.

Blyler: Where is the cross-over point?

Cunningham: The situation is analogous to general relativity. You know that you can never go faster than the speed of light but as you approach the speed of light it takes exponentially more energy. The same thing is going on here. At some point, if the length of the chain is too short and our coverage drops. But, as we approach that cliff moment, the number of patterns that it takes to achieve the coverage – even if we can maintain it – increases exponentially. So, you can get into the situation where, for example, you half the length of the chain but you need twice as many patterns. At that point, your test time hasn’t actually dropped because test time it the number of patterns times the length of the chain. So the product of those two starts to cancel out. At some point you’ll never go beyond a certain level but your coverage will drop. But as you get close to it, you start losing any benefit because you need more and more patterns to achieve the same result.

Blyler: What is the second limit to testing a chip with compression circuitry?

Cunningham: The other limit doesn’t come from the mathematics of fault detection but is related to physical implementation. In other words, the chip size limit is due to physical implementation, not mathematics (like coverage).

Most of the test community has been focused on the upper limit of test time. But even a breakthrough there wouldn’t address the physical implementation challenge. In the diagram below, you can see that the big blue spot in the middle is the XOR circuit wiring. All that wiring in the red is wiring to and from the chains. It is quite scary in size.

Blyler: So the second limit is related to the die size and wire length for the XOR circuit?

Cunningham:  Yes - There are the algorithm limits related to coverage and pattern count (mentioned earlier) and then there are the physical limits related to wire length. The industry has been stuck because of these two things. Now for the solution. Let’s talk about the things in reverse order, i.e., the issue of the physical limits first.

What is the most efficient way to span two dimensions (2D) with Manhattan routing? The answer is by using a grid or lattice. [Editor’s Note: The Manhattan Distance is the distance measured between two points by following a grid pattern instead of the straight line between the points.]

So the lattice is the best way to get across two dimensions while giving you the best possible way to control circuit behavior at all points. We’ve come up with a special XOR Circuit structure that unfolds beautifully into a grid in 2D. So when Modus inserts compress it doesn’t just create an XOR circuit, rather, it actually places it. It takes the X-Y coordinates for those XOR gates. Thus, using 2D at 400X has the same wire length as 1D at 100X.

Blyler: This seems like a marriage with place & route technology.

Cunningham:  For a long time people did logic synthesis only based on the connectivity of the gates. Then we realized that we really had to do physical synthesis. Similarly, for a long time, the industry has realized that the way we connect up the scan chains need to be physically aware. That’s been done. But nobody made the actual compression logic physically aware. That is a key innovation in our product offering.

And it is the compression logic that is filling the chip – all that red and blue nasty stuff. That is not scan chain but compression logic.

Blyler: It seems that you’ve address the wire length problem. How do you handle the mathematics of the fault coverage issue?

Cunningham: The industry got stuck on the idea that, as you shrink the chains you have shorter patterns or a reduction in the amount of information that can be input. But why don’t we play the same game with the data we shift in. Most of the time, I do want really short scan chains because that typically means I can pump data into the chip faster than before. But in so doing, there will be a few cases where I lose the capability to detect faults because some faults really require precise control of values in the circuit. For those few cases, why don’t I shift in more clock cycles than I shift out?

In those cases, I really need more bit of information coming in. But that could be done by making the scan deeper, that is, by adding more clock cycles. In practice, that means we need to put sequential elements inside the decompressor portion of the XOR Compressor system.  Thus, where necessary, I can read in more information. For example, I might scan in for 10 clock cycles but I’ll scan out (shift out) for only five clock cycles. I’m read in more information than I’ve read out.

In every sense of the word, it is an elastic decompressor. When we need to, we can stretch that pattern to contain more information. That stretched pattern it then transposed by 90 degrees into a very wide pattern that we then shove into those scan chains.

Blyler: So you’ve combined this elastic decompressor with the 2D concept.

Cunningham: Yes – and now you have changed the testing game with 400x compression ratios and achieving up to 3X reduction in test time without impacting the wire length (chip size). We have several endorsements from key customers, too.

In summary:

  • 2D compression: Scan compression logic forms a physically aware two-dimensional grid across the chip floorplan, enabling higher compression ratios with reduced wirelength. At 100X compression ratios, wirelength for 2D compression can be up to 2.6X smaller than current industry scan compression architectures.
  • Elastic compression: Registers embedded in the decompression logic enable fault coverage to be maintained at compression ratios beyond 400X by controlling care bits sequentially across

Blyler: Thank you.