Server system-on-chips pack up to 48 64-bit ARM cores

June 18th, 2014

Targeting secure cloud servers, storage servers, compute servers, and data-plane applications, the ThunderX series of multicore SoCs delivers power-efficient computing solutions.

Dave Bursky
Semiconductor Technology Editor

Multicore processors based on x86 cores are a very common choice for servers and for handling packets in data-networking applications. Although x86-based servers command most of the IT market, other processors such as MIPS and PowerPC are key players in deeply embedded applications such as network switches and routers, handling both data-plane and control-plane functions. ARM processors have started to make inroads in the server market, and with the release of the A57 64-bit core, they are poised to make significant gains in all the applications that currently employ x86, MIPS, and PowerPC cores.

One example of that opportunity takes aim at low-power servers and secure network communications: the just-released ThunderX series of multicore processors from Cavium. This family includes versions containing from 8 to 48 customized ARM 64-bit processor cores that can operate at up to 2.5 GHz. There will actually be four families of processors in the ThunderX series, each optimized for a different type of workload. The ThunderX_SC is targeted at security applications, the ThunderX_ST at storage control and management, the ThunderX_NT at networking systems, and the ThunderX_CP at computational applications.

Implemented in a low-power 28-nm process, the basic ThunderX architecture brings together up to 48 full custom 64-bit processor cores that are fully compliant with the ARMv8 architecture specification and ARM’s Server Base System Architecture (SBSA). Included on each multi-core chip are a cache subsystem (each processor has level 1 instruction and data caches, and all processors share an L2 cache), Ethernet interfaces capable of 10/40/100 Gbit/s data rates, multiple PCIe gen3 and SATA v3 interfaces, up to four DDR3/4 memory controllers, additional I/O ports, and various accelerators depending on the market segment the processor is optimized to tackle (see the figure).

Figure:
Members of the ThunderX family from Cavium contain up to 48 ARM64 processor cores, application-specific hardware accelerators, high-speed Ethernet ports, both PCIe gen3 and SATA v3 ports and many other system support features to support Compute, Storage, Networking, and Secure Computing applications.

For example, the ThunderX_SC family is optimized for secure Web front-end, security-appliance, and Cloud RAN workloads. It includes specialized hardware accelerators consisting of Cavium’s 4th generation NITROX and TurboDPI technology with acceleration for IPSec, SSL, anti-virus, anti-malware, firewall, and DPI. The NITROX engine can deliver 50 Mbps to 40 Gbps of encryption bandwidth with 1K to 200K RSA/DH operations per second. Additionally, the TurboDPI block employs the company’s Uniscan technology that simultaneously blocks malicious or inappropriate URLs, identifies hundreds of widely used protocols and applications, helps block thousands of different intrusion attempts, and locates over a hundred thousand varieties of virus and malware threats, all with just a single scan of the data stream.

Also integrated on the ThunderX_SC are multiple 10/40 Gbit/s Ethernet ports, multiple PCIe Gen3 and SATA 3 ports, up to four high-memory-bandwidth DDR3 or DDR4 72-bit memory controllers able to support 2400 MHz memories, a cache-coherent interconnect across dual sockets thanks to the Cavium Coherent Processor Interconnect, and a scalable fabric for east-west as well as north-south traffic connectivity. Most of these features are also available on the other ThunderX families, along with accelerators for each target application segment. The ST series includes storage accelerators for data protection, data integrity, security, and compression, as well as efficient user-to-user data movement; the CP series includes core-to-I/O virtualization in hardware; and the NT series includes full virtualization support and network accelerators for QoS, traffic shaping, tunnel termination, high packet-throughput processing, network virtualization, and data monitoring.

Interesting product developments at DAC

June 4th, 2014

Dave Bursky

Interesting IP and design verification announcements were among the key topics running through this year’s Design Automation Conference. Several IP announcements from CAST Inc., for example, offer solutions in video decoding, graphics acceleration, and image decoding. Although developed by the Fraunhofer Heinrich Hertz Institute, an H.265 HEVC decoder core is now available from CAST, and it is the first in a series of high-efficiency video coding cores that CAST will offer. The core implements the MPI-D main profile intra HEVC decoding and will be available in the third quarter of this year. The decoder design makes clever use of internal and external memory, and its application-specific internal memory architecture enables the core to reuse already fetched data, thus reducing the number of memory fetches. Fewer fetches give more memory bus bandwidth back to the CPU, while at the same time reducing the power needed by the core.

Another core offered by CAST, developed by IP partner Think Silicon, saves energy in graphics applications by offloading a GPU or a CPU that does not include GPU support. The Think2.5D graphics accelerator is a rendering engine that accelerates two-dimensional graphics functions and pseudo three-dimensional effects such as reflected and shadowed icons. The engine significantly offloads a system’s GPU, performing the calculations at a reduced power level. And for systems without a GPU, the core can offload the host CPU and accelerate the calculations, providing a “snappier” feel to the screen operations, and at lower power consumption levels. Also available from CAST is a graphics processing unit created by Think Silicon. The ThinkVG core supports the Khronos Group OpenVG 1.1 standard, and CAST claims it is one of the smallest and lowest-power GPU cores available. Inside the core is a floating-point SIMD streaming engine specifically designed for graphics applications (Vshader) plus graphics accelerators for the blending, rasterization, and texture-mapping functions.

For still images, Alma Technologies, another CAST partner, developed a 12-bit extended-resolution JPEG decoder, the JPEG-D-X, that CAST supports. The core supports applications requiring images with greater dynamic range, such as medical imaging and machine vision. It can decode static images or motion JPEG streams compressed in Baseline or Extended JPEG formats with 8 or 12 bits per sample of precision. The decoder complements the company’s previously released 12-bit JPEG encoder, and provides efficient, low-latency decompression of deep-color images and video with a tiny silicon footprint and low power consumption.

It’s not often tool vendors will offer a free version of one of their new tools, but Agnisys has done just that with free versions of DVinsight, a correct-by-construction tool for design and verification applications. The tool is an integrated development environment for the development of Universal Verification Methodology (UVM) based SystemVerilog (SV) design verification (DV) code. DVinsight ensures compliance with best practices in using UVM while adhering to established standards. The tool provides on-the-fly checks and guides for creating SV/UVM code, provides auto code completion and context-based hints, and includes many built-in rules to ensure correct-by-construction DV code development.

Another newcomer in the DV space is SmartDV North America, the U.S. arm of SmartDV Technologies India Private Ltd. The company provides well-supported verification IP blocks that include compliance test suites and complete functional coverage models that help accelerate time to market. The verification models are generated by the company’s internally developed compiler technology, which allows the company to generate the verification IP rapidly and to tweak it in days rather than weeks if customers need customization or a bug must be corrected. Also offering verification IP, TrueChip provides support for USB 3 and various versions of ARM’s AMBA bus, and will shortly have the new USB 3.1 verification IP.

Additional DAC product updates will appear in the next column.

From 3-D transistors to 2.5D or 3D systems

December 31st, 2013

From the ultra-small 3D transistors described in papers at this month’s International Electron Devices Meeting (IEDM) in Washington, D.C., to the 2.5D and 3D multichip structures described at the 3D Architectures for Semiconductor Integration and Packaging (ASIP) conference held in Burlingame, Calif., designers are finding more ways to pack more transistors on a chip and to pack more functions into a limited area on a printed-circuit board. For instance, at IEDM, TSMC’s Shien-Yang Wu and his team of researchers described a 16-nm FinFET process in paper 9.1 that they feel is one of the world’s most advanced semiconductor technologies.

The process is the first integrated technology platform to be announced below the 20 nm node, with key capabilities that include a 48-nm fin pitch and the smallest SRAM cell ever incorporated into an integrated process: a 128-Mb SRAM with a cell area of just 0.07 µm² per bit. The process’s short-channel effects were well controlled, with DIBL <30 mV/V, saturation current of 520/525 µA/µm at 0.75 V (NMOS and PMOS, respectively), and an off-current of 30 pA/µm. Depending on the designer’s goal, the process delivers either a 35% speed gain or a 55% power reduction in comparison with TSMC’s existing 28-nm high-k/metal-gate planar process, and with twice the transistor density (Figure 1).

Figure 1: The 16 nm process platform developed by TSMC allows designers to get 55% reduction in operating power or a 35% improvement in operating speed vs the company’s established 28 nm high-K/metal-gate process.
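
As a rough check on what that cell size means in silicon, the short Python sketch below multiplies the quoted 0.07 µm² bit-cell area by the 128-Mb array size. It assumes binary megabits and counts only the bit cells; sense amplifiers, decoders, and routing are ignored, so the result is a lower bound on the macro area rather than a figure taken from the paper.

```python
# Figures quoted in the text for the 16-nm FinFET SRAM.
SRAM_BITS = 128 * 2**20      # 128 Mb, assuming binary megabits
CELL_AREA_UM2 = 0.07         # bit-cell area in square micrometres


def cell_array_area_mm2(bits: int, cell_area_um2: float) -> float:
    """Area of the bit cells alone (no periphery), in square millimetres."""
    return bits * cell_area_um2 / 1e6   # 1 mm^2 = 1e6 um^2


area_mm2 = cell_array_area_mm2(SRAM_BITS, CELL_AREA_UM2)
print(f"Bit cells alone occupy about {area_mm2:.2f} mm^2")
```

Under these assumptions the cells come to a little under 10 mm².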

Creation of a “superchip” was the goal of researchers at the New Industry Creation Hatchery Center at Tohoku University in Sendai, Japan. The heterogeneous 3D integration described by Professor Mitsumasa Koyanagi in a plenary presentation at IEDM allows various kinds of device chips with different sizes, different functions, and different materials to be stacked to form the superchip. A key technology developed to achieve this consists of self-assembly and electrostatic (SAE) temporary bonding. To demonstrate the technology, the university fabricated several prototype superchips. Examples include stacking MEMS chips, spin memory chips, and a photonic device chip on a CMOS logic chip; a 3D back-illuminated image sensor with through-silicon vias stacked on top of an image processing chip; and a 3D microprocessor with self-test and self-repair functions.

The assembly process to create superchips starts with known good die (KGD) that are sorted from several device wafers and simultaneously bonded as a batch onto a carrier wafer (Figure 2, left). High alignment accuracy is achieved using the self-assembly and electrostatic bonding. The process repeats with additional carrier wafers. Multiple carrier wafers with the KGDs are then stacked onto a target interposer wafer. This allows multiple superchips to simultaneously be fabricated. The surface tension of liquid is used in the self-assembly scheme to simultaneously align many dies in parallel. Hydrophilic areas and hydrophobic areas are formed on the surface of the wafer or chip to obtain high alignment accuracy. As many as 500 chips have been simultaneously aligned with an average alignment accuracy of 0.05 µm within 0.1 seconds.

Figure 2: The assembly process to create superchips starts with known good die (KGD) that are sorted from several device wafers and simultaneously bonded as a batch onto a carrier wafer (left). The process repeats with additional carrier wafers, and multiple carrier wafers with the KGDs are then stacked onto a target interposer wafer using electrostatic bonding and debonding (right).

The electrostatic temporary-bonding and de-bonding method for assembly of the multiple carrier wafers allows the stacking integration of multiple chips (Figure 2, right). Many chips are simultaneously bonded onto the electrostatic carrier wafer (e-carrier) by the electrostatic force after the simultaneous alignment by self-assembly. The electrostatic force for temporary bonding is generated by applying a high voltage to the electrodes embedded in the e-carrier wafer. A high voltage with opposite polarity is applied to the electrodes for de-bonding the chips.

These two presentations are just the proverbial tip of the iceberg representing several hundred paper presentations at IEDM that covered process and manufacturing, memory technology, nano-device technology, power and compound semiconductors, advanced CMOS technology, and many other subjects. For more information, go to www.ieee-iedm.org.

Running concurrently with IEDM but on the opposite coast, the 3D-ASIP Conference in Burlingame delved into many aspects of 2.5D and 3D integration, ranging from basic integration to various interposer technologies to wafer handling and thermal challenges, to name a few. Many of the presentations examined the evolution of assembly techniques to move from 2D to 2.5D to true 3D implementations. Doug Yu, the Director of the Integrated Interconnect and Packaging Division at TSMC, provided an overview of wafer-level system integration technology, while Robert Patti, the CTO of Tezzaron Semiconductor, described a combination of dis-integration and then integration to create a high-density and high-performance memory stack (see “Advances in DRAM and non-volatile memories keep upping system performance”, Aug. 26, 2013). The architecture of the memory array provides 256 independent channels, each containing 256 Mbits of storage and capable of transferring data at 64 Gbits/s with a latency of just 9 ns.

Another memory presentation, by Eric Beyne, the Program Director for 3D System Design at IMEC, examined high-bandwidth memory-logic 3D integration by either direct stacking or the use of interposers. One of the key aspects of leveraging 3D integration is to reduce the power consumption of the chip-to-chip interconnects by lowering the voltage swing, widening the I/O to lower the transfer frequency, and using vertical interconnects in a chip stack to reduce the wiring length (Figure 3). Using 3D through-silicon vias and microbump interconnects, designers at IMEC were able to assemble high-density chip stacks, but encountered issues with the increased power density. The high power density can result in thermal issues (increased temperatures), and higher temperatures could affect DRAM data retention since retention time decreases as temperature increases.

Figure 3: Multiple factors such as the I/O width, the load capacitance, the transfer frequency, and the operating voltage must be taken into account when estimating the power consumed in chip-to-chip interconnects. (Source: IMEC).
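
The factors named in the caption map onto the familiar switching-power estimate, P ≈ α · N · C · V² · f. The Python sketch below applies that textbook formula to two links that carry the same raw bandwidth: a narrow, fast, board-level link and a wide, slow, in-stack link. Every number in it (pin counts, capacitances, swings, and the 0.5 activity factor) is a hypothetical value chosen for illustration; none comes from the IMEC presentation.

```python
from dataclasses import dataclass


@dataclass
class Link:
    pins: int               # I/O width (number of data pins)
    freq_hz: float          # transfer frequency per pin
    cap_f: float            # load capacitance per pin, in farads
    swing_v: float          # signal voltage swing, in volts
    activity: float = 0.5   # assumed fraction of cycles in which a pin toggles

    def bandwidth_bps(self) -> float:
        # Assumes one bit per pin per cycle.
        return self.pins * self.freq_hz

    def power_w(self) -> float:
        # Textbook switching-power estimate: alpha * N * C * V^2 * f
        return (self.activity * self.pins * self.cap_f
                * self.swing_v ** 2 * self.freq_hz)


# Hypothetical values, picked so that both links move 102.4 Gbit/s.
board_link = Link(pins=64, freq_hz=1.6e9, cap_f=4e-12, swing_v=1.5)
stack_link = Link(pins=1024, freq_hz=100e6, cap_f=0.5e-12, swing_v=1.0)

for name, link in (("board", board_link), ("stack", stack_link)):
    print(f"{name}: {link.bandwidth_bps() / 1e9:.1f} Gbit/s, "
          f"{link.power_w() * 1e3:.1f} mW")
```

Note that widening the bus while lowering the frequency leaves N · f unchanged, so the saving in this model comes entirely from the smaller capacitance of the short vertical wires and the reduced voltage swing.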

Just such thermal issues were discussed by Joseph Maurer, a support contractor to DARPA, and by Muhannad Bakir, Associate Professor in the School of Electrical and Computer Engineering at the Georgia Institute of Technology. At DARPA, Maurer described multiple projects aimed at pulling out the heat and improving thermal conductivity. Techniques such as the use of copper nanosprings; near-junction thermal transport with liquid cooling and high-thermal-conductivity diamond substrates; the use of a 3D vapor chamber with vibrating elements; the use of thin-film superlattice materials; and still other approaches are all being explored. Examining the use of microfluidic cooling on 3D ICs, Bakir showed a potential solution using coolant cycled through a multichip stack composed of two processor layers and a memory stack (Figure 4). With such a stack there are concerns about the reliability of the circulation system used for the microfluidic cooling, as well as the endurance of the TSVs since they are under pressure from the liquid flowing between the layers.

Figure 4: Microfluidic cooling between layers of chips can pull out the heat, but there are concerns about the reliability of the microfluidic I/O technology as well as power-supply noise and the durability of TSVs due to the pressure of the liquid coolant as it flows through the package. (Diagram courtesy of Georgia Tech).

These papers were just a few of the presentations at the 3D-ASIP conference. For more details, go to www.3dasip.org to view the program or purchase the proceedings.

Dave Bursky
Semiconductor Technology Editor

Advances in CPUs, System Architecture, Heat Up the Performance Race at Hot Chips

September 14th, 2013

Celebrating its 25th year, the annual Hot Chips Conference held at Stanford University last month lived up to its reputation for highlighting high-performance processor solutions as well as advances in low-power designs for mobile applications. One of the highest performance processors unveiled at the conference was the Power8, the next-generation Power series processor developed by IBM. The processor chip was fabricated using a 22 nm silicon-on-insulator process that allows designers to pack 12 CPU cores, lots of level-2 cache (512 kbytes 8-way per core), a large embedded-DRAM level-3 shared cache (96 Mbytes, 12 x 8 Mbytes 8-way bank), PCIe Gen 3 ports capable of 8 Gbits/s transfers, eight DRAM ports, and many other sub-processors to offload the CPU and manage power (Figure 1). All these functions add up to about 5 billion transistors, all squeezed into a 650 mm² chip with 15 layers of metal that can be clocked at 4 GHz and consumes over 300 W.

Figure 1: Containing about 5 billion transistors, the Power8 CPU developed by IBM combines 12 processor cores, 96 Mbytes of embedded DRAM for a level 3 on-chip cache, dual gen 3 PCIe ports, and eight memory ports. A separate memory buffer chip that connects to each memory port handles four channels of DDR3 memory.

The high throughput of the multicore processor posed another challenge: transferring data from or to the memory at a rate that won’t starve the processor cores. To solve that problem, along with the processor chip, IBM designers crafted a memory buffer chip dubbed Centaur that packs 16 Mbytes of cache and provides a 9.6 Gbyte/s high-speed interface to the processor. The Power8 processor can support up to eight of the Centaur memory buffers to achieve a sustained transfer rate of 230 Gbytes/s between the buffers and the processor. Each buffer chip has four DDR3 memory channels, which yield a peak throughput of 410 Gbytes/s when all 32 memory channels are transferring data. A fully configured processor socket can address up to 1 Tbyte.
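
The memory figures above fit together with simple arithmetic, sketched here in Python using only the numbers quoted in this column. The per-channel and per-buffer rates are derived values, not numbers IBM stated. The roughly 12.8 Gbytes/s per channel happens to match the textbook peak of a 64-bit DDR3-1600 channel, but the memory speed grade is an inference and is not given in the source.

```python
# Quoted Power8 / Centaur figures.
BUFFERS_PER_SOCKET = 8       # Centaur chips per Power8 socket
CHANNELS_PER_BUFFER = 4      # DDR3 channels per Centaur
CACHE_PER_BUFFER_MB = 16     # cache on each Centaur
PEAK_DDR_GBYTES_S = 410      # peak across all DDR3 channels
SUSTAINED_GBYTES_S = 230     # sustained between buffers and processor

channels = BUFFERS_PER_SOCKET * CHANNELS_PER_BUFFER
buffer_cache_mb = BUFFERS_PER_SOCKET * CACHE_PER_BUFFER_MB
peak_per_channel = PEAK_DDR_GBYTES_S / channels              # derived
sustained_per_buffer = SUSTAINED_GBYTES_S / BUFFERS_PER_SOCKET  # derived

print(f"DDR3 channels per socket:      {channels}")
print(f"Total buffer cache per socket: {buffer_cache_mb} Mbytes")
print(f"Peak per DDR3 channel:         {peak_per_channel:.2f} Gbytes/s")
print(f"Sustained per buffer link:     {sustained_per_buffer:.2f} Gbytes/s")
```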

Additional high-performance CPUs based on the SPARC architecture included the SPARC 64X+, a next-generation processor targeted at UNIX servers developed by Fujitsu, and the M6 SPARC processor for enterprise systems developed by Oracle. A companion ASIC to the M6, the Sixby, is a scalability and coherency directory chip to support the company’s highly-scalable enterprise systems.

Fujitsu’s design packs 16 processor cores, each capable of running two threads, a shared L2 cache of 24 Mbytes, dual DDR3 memory interface controllers, several hardware-based software accelerators for cipher, database, and decimal calculations, dual PCIe gen 3 I/O controllers, and many other enhancements. The SPARC 64X+ processor was fabricated on a 28 nm CMOS process, contains almost 3 billion transistors, has 1500 signal pins, and can clock at over 3.5 GHz. A single processor chip can deliver a peak performance of over 448 GFLOPS and has a memory throughput of 102 Gbytes/s. Designed for use in systems with from 1 to 64 CPU sockets, the processors incorporate a high-speed interconnect that can transfer data at up to 25 Gbits/s per lane directly between CPU sockets (about 70% faster than the company’s previous SPARC 64X CPU). A basic system building block comprises four CPU sockets and two crossbar chips, and a full 64-socket system would contain 16 building-block modules capable of executing 2048 program threads.
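
The scaling in that last sentence can be reproduced in a few lines of Python. The sketch uses only the per-chip and per-block counts quoted above; the function name and the assumption that crossbar chips simply add up across building blocks are illustrative and are not taken from Fujitsu’s presentation.

```python
CORES_PER_SOCKET = 16
THREADS_PER_CORE = 2
SOCKETS_PER_BLOCK = 4      # one building block holds four CPU sockets
CROSSBARS_PER_BLOCK = 2    # and two crossbar chips


def system_totals(sockets: int) -> dict:
    """Totals for a system assembled from whole building blocks."""
    if sockets % SOCKETS_PER_BLOCK:
        raise ValueError("this sketch only handles whole building blocks")
    blocks = sockets // SOCKETS_PER_BLOCK
    cores = sockets * CORES_PER_SOCKET
    return {
        "building_blocks": blocks,
        "crossbar_chips": blocks * CROSSBARS_PER_BLOCK,
        "cores": cores,
        "threads": cores * THREADS_PER_CORE,
    }


full = system_totals(64)
print(full)
```

For 64 sockets this gives 16 building blocks and 2048 threads, matching the figures in the text.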

Similar in some features to the Fujitsu processor, the Oracle M6 is also fabricated in a 28 nm process, packs 12 processor cores, and integrates two 8-lane PCIe gen 3 ports. However, the M6 will be capable of executing eight program threads per core vs the two threads per core of the SPARC 64X+, and incorporates 48 Mbytes of L3 cache vs no L3 cache on the Fujitsu processor. Four DDR3 memory schedulers, each capable of handling four memory channels, provide a total of 16 DDR channels that can address a total of 1 Tbyte per CPU socket (Figure 2).

Figure 2: The Oracle M6 processor packs 12 processor cores, each capable of executing eight program threads, 48 Mbytes of L3 cache, a pair of 8-lane PCIe gen 3 ports, and four DDR3 memory schedulers that together can address a total of 1 Tbyte per CPU socket.

In the large systems targeted by the Power8 and the SPARC processors, the CPUs are only a small part of the overall system. At the other extreme, however, in handheld, laptop, and desktop computers, a system-on-a-chip solution is typically the main component. One example of that, the Kabini processor developed by AMD, packs four Jaguar CPU cores, a high-performance graphics and multimedia engine with DisplayPort, HDMI, and VGA outputs, SATA, USB, and PCIe interfaces, advanced power management, and still other functions (Figure 3). Implemented in a 28 nm process, the chip is only 105 mm², one-sixth the area and transistor count of the IBM Power8 and close to 1/30th the power. The four processor cores share a 2-Mbyte level-2 cache and are fed through a 64-bit DDR3 memory interface that can transfer data at up to 10.3 Gbytes/s with DDR3-1600 memory DIMMs. Additionally, the graphics/multimedia core is based on the company’s Radeon HD8000 Graphics Core Next (GCN) architecture that can handle 4k by 2k resolution. The core includes a video codec engine that can encode H.264 streams, and a universal video decoder that handles over half a dozen codec formats.

Figure 3: A highly-integrated system-on-a-chip solution for notebook and other portable computing systems, the Kabini processor packs four Jaguar CPU cores that share a 2 Mbyte level-2 cache and a 64-bit DDR3 memory interface. Included on the chip is a high-performance graphics/multimedia processor based on the HD8000 Radeon graphics core.
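
Two of the Kabini comparisons above are easy to sanity-check. The Python sketch below computes the theoretical peak of a 64-bit DDR3-1600 interface (1600 million transfers per second times 8 bytes) and the die-area ratio against the 650 mm² Power8. Reading the quoted 10.3 Gbytes/s as a fraction of the theoretical peak is an interpretation made here, not something AMD stated in this summary.

```python
# Memory interface: 64-bit bus with DDR3-1600 DIMMs.
BUS_BITS = 64
TRANSFERS_PER_S = 1600e6      # DDR3-1600 performs 1600 million transfers/s
QUOTED_GBYTES_S = 10.3        # figure quoted in the column

peak_gbytes_s = (BUS_BITS / 8) * TRANSFERS_PER_S / 1e9
fraction_of_peak = QUOTED_GBYTES_S / peak_gbytes_s

# Die areas quoted in the column.
POWER8_AREA_MM2 = 650
KABINI_AREA_MM2 = 105
area_ratio = POWER8_AREA_MM2 / KABINI_AREA_MM2

print(f"Theoretical peak: {peak_gbytes_s:.1f} Gbytes/s")
print(f"Quoted rate is {fraction_of_peak:.0%} of that peak")
print(f"Power8 die is {area_ratio:.1f}x the Kabini die area")
```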

Making its move into the low-power handheld systems arena, Intel showed off its Clovertrail+, an SoC targeted at smartphones that is a significant upgrade over Intel’s Medfield-based smartphone solutions. Implemented in a 32-nm high-k metal-gate process, the Clovertrail+ (Atom Z2580) contains dual Atom CPU cores, a dual-core 2D/3D graphics processor, a multi-standard video decoder capable of handling 1080p/60 Hz video, a video encoder able to encode 1080p/30 Hz video, a camera/imaging subsystem based on a programmable very-long-instruction-word SIMD vector processor, a security engine, and many other control, power management, and system interface support functions (Figure 4).

Figure 4: Taking aim at next-generation smartphones, the Clovertrail+ system-on-a-chip platform developed by Intel employs two dual-thread Atom processor cores that can run at up to 2 GHz. The chip contains a 2D/3D dual-core graphics engine as well as dedicated video decode and encode blocks, an image signal processor, a crypto engine, and many other interface and control blocks.

Each Atom core has a 512-kbyte L2 cache, can execute two program threads, and can run at a top speed of 2 GHz. The processor’s memory interface can support up to 2 Gbytes of low-power DDR2 (533 MHz) and address up to 256 Gbytes over an eMMC 4.41 interface. The enhancements in the Clovertrail+ processor yield a doubling of overall performance vs the Medfield processor and up to a 3X improvement in graphics performance.

Also focusing on portable system solutions, the just-unveiled Richland processor detailed by AMD incorporates the company’s Turbo-Core temperature-smart technology to manage core performance and power consumption (Figure 5). The Turbo-core technology is designed to more effectively exploit temperature margins by detecting favorable thermal conditions in real time and adjusting operating voltage and frequency.

Figure 5: The Turbo-core temperature-smart technology developed by AMD measures core temperature and performs various calculations to determine the optimum operating frequency and voltage for the CPU and graphics cores on the Richland processor chip.
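
The general idea of trading thermal headroom for frequency and voltage can be pictured with a simple threshold controller. The Python sketch below is a generic illustration only: this column does not describe AMD’s actual algorithm, and the operating points, temperature limit, boost margin, and temperature trace are all hypothetical values.

```python
# Hypothetical operating points (frequency in MHz, voltage in volts), lowest first.
P_STATES = [(1400, 0.90), (2100, 1.00), (2800, 1.10), (3500, 1.25)]
TEMP_LIMIT_C = 90.0      # assumed maximum allowed core temperature
BOOST_MARGIN_C = 10.0    # assumed headroom required before stepping up


def next_state(index: int, core_temp_c: float) -> int:
    """Return the index of the operating point to use for the next interval."""
    if core_temp_c >= TEMP_LIMIT_C and index > 0:
        return index - 1     # at or over the limit: back off one step
    if core_temp_c <= TEMP_LIMIT_C - BOOST_MARGIN_C and index < len(P_STATES) - 1:
        return index + 1     # comfortable headroom: step up
    return index             # otherwise hold the current point


# Walk the controller through a made-up temperature trace.
state = 1
history = []
for temp_c in [60.0, 70.0, 85.0, 92.0, 88.0]:
    state = next_state(state, temp_c)
    history.append(P_STATES[state][0])

print(history)
```

With this trace the controller climbs to the top operating point while the core is cool, holds it as the temperature approaches the limit, and drops one step once the limit is crossed.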

The processor contains two dual-core CPU modules, each based on AMD’s recently released Piledriver CPU core. Each module has a 2 Mbyte L2 cache that is shared across the two CPU cores in the module. Like the Kabini processor, the Richland also incorporates an HD8000 series graphics processing unit and multimedia accelerators to offload the CPUs. The chip delivers up to 29% higher CPU performance and 41% higher GPU performance than the company’s previous generation solution while keeping power consumption down, allowing systems to deliver 10 or more hours of idle operation, or over five hours of video playback. Also supported by the processor chip is AMD’s wireless display for Windows 8.1, a low-latency wireless interface that can stream HD video at 1080p/60 Hz along with rich audio playback.

With low power consumption on nearly every processor designer’s mind, a new body bias technology used in conjunction with deeply-depleted-channel (DDC) transistors developed by SuVolta promises to reduce processor power consumption by as much as 50%. Implemented in an ARM Cortex M0 processor to validate the concepts, the body bias network requires minimal routing resources and the DDC transistor performance can be optimized to reduce leakage current for high-performance designs, or reduce active power for low-voltage threshold devices, thus permitting the processor designers to optimize the performance and power. The M0 processor implemented with 65 nm design rules achieved comparable performance at half the power vs a standard 65 nm process, or delivered 35% better performance at comparable power levels.

These are only a few of the many presentations at the Hot Chips conference. To view the full conference program, go to www.hotchips.org.

Dave Bursky
Semiconductor Technology Editor
Chip Design Magazine

Advances in DRAM and non-volatile memories keep upping system performance

August 26th, 2013

In the drive to improve system performance, faster processors often end up spotlighting system bottlenecks, especially in the memory subsystem. To reduce those bottlenecks, designers are developing faster-accessing memories, faster interfaces with reduced overheads, and even new memory architectures and technologies. At this month’s MemCon conference in Santa Clara, Calif., presentations highlighted many of the developments in storage subsystems and devices that promise to improve memory subsystem performance. Memory interfaces such as DDR3 will soon give way to DDR4 and the low-power DDR3 will give way to LPDDR4, while new interfaces such as HMC (hybrid memory cube), Wide I/O 2, eMMC 5.0, and NVMe are gaining acceptance for future system designs.

When designers start implementing systems based on these new standards, having functional models that can link into the memory subsystem helps verify system designs before any hardware gets built. Released at the conference, new verification IP models developed by Cadence Design Systems for DDR4, LPDDR4, Wide I/O 2, eMMC 5.0, and HMC allow designers to check out their designs, while providing them with trace debugging, address scrambling, and backdoor memory access. The models support all leading third-party simulators, verification languages, and methodologies, thus enabling SoC designers to verify the correctness of the interfaces to the specialized memories.

New memory architectures, such as the Dis-integrated 3D RAM developed by Tezzaron Semiconductor (Figure 1a) and the Hybrid Memory Cube developed by Micron Semiconductor (Figure 1b) in conjunction with Samsung, SK Hynix, Open Silicon, IBM, ARM, Altera, and Xilinx, promise to provide much higher bandwidth. The HMC module offers a bandwidth of 160 Gbytes/s, which is a 15X boost in memory bandwidth over a DDR3 memory module, while Tezzaron is projecting a data bandwidth of 16 Tbits/s for its novel memory structure.

Figure 1a: The high-density memory subsystem proposed by Tezzaron consists of multiple layers of DRAM memory cells and access transistors. These layers sit on top of a layer of sense amplifiers and control logic, which, in turn, sits on top of another chip that contains the I/O circuits.

Figure 1b: The hybrid memory cube developed at Micron also consists of multiple thinned chips that each contain multiple blocks of DRAM cells. The multiple chips are stacked using through-silicon vias and sit on top of a logic chip that controls access to all the memories and the I/O operations.

Both the Tezzaron and Micron solutions take a somewhat similar approach: multiple layers of thinned DRAM storage chips are interconnected and then connected to a lower layer or two that contains the control logic and I/O. In Tezzaron’s concept, there are 256 independent memory channels with each channel containing 256 Mbits of storage and delivering a 64 Gbit/s bandwidth. This gives the DiRAM4 stack the capability of delivering 21 billion transactions per second. Micron’s hybrid memory cube consists of a single package containing multiple memory die and one logic die, stacked together and interconnected using through-silicon via (TSV) technology. Within an HMC, memory is organized into vaults. Each vault is functionally and operationally independent. Each vault has a memory controller in the logic base (called a vault controller) that manages all memory reference operations within that vault. Each vault controller determines its own timing requirements. Refresh operations are controlled by the vault controller, eliminating this function from the host memory controller.
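
The headline numbers for both stacks follow from the per-channel figures. The Python sketch below multiplies out Tezzaron’s 256 channels and backs out the DDR3 module bandwidth implied by the 160 Gbytes/s and 15X figures quoted earlier in this column for HMC. The capacity assumes binary prefixes, and the implied DDR3 baseline is a derived value rather than one stated by either company.

```python
# Tezzaron stack, per the figures quoted above.
CHANNELS = 256
MBITS_PER_CHANNEL = 256
GBITS_S_PER_CHANNEL = 64

capacity_gbits = CHANNELS * MBITS_PER_CHANNEL / 1024     # binary prefixes assumed
capacity_gbytes = capacity_gbits / 8
aggregate_tbits_s = CHANNELS * GBITS_S_PER_CHANNEL / 1000

# Hybrid Memory Cube, per the figures quoted earlier in the column.
HMC_GBYTES_S = 160
HMC_SPEEDUP_OVER_DDR3 = 15
implied_ddr3_gbytes_s = HMC_GBYTES_S / HMC_SPEEDUP_OVER_DDR3

print(f"Tezzaron capacity:   {capacity_gbits:.0f} Gbits ({capacity_gbytes:.0f} Gbytes)")
print(f"Tezzaron bandwidth:  {aggregate_tbits_s:.1f} Tbits/s aggregate")
print(f"Implied DDR3 module: {implied_ddr3_gbytes_s:.1f} Gbytes/s")
```

The aggregate of roughly 16.4 Tbits/s is consistent with the rounded 16 Tbits/s projection.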

One additional presentation at MemCon discussed the future of ReRAM (resistive RAM) as a possible DRAM replacement. A new material developed by 4DS, dubbed MOHJO (metal oxide heterojunction operation), allows the company to develop ReRAMs with a high cycle life, low power dissipation, good data retention, and reduced manufacturing time and cost, and it also solves the word-line drop problem that occurs with other ReRAM solutions. The MOHJO material is deposited in the back end of the process flow, on top of a standard CMOS manufacturing flow. The material has a low-current reset state that permits large blocks of memory to be erased, and MOHJO-based memories can be especially useful in solid-state drive systems by lowering the energy consumption by almost 100X. Compared with flash, spin-torque technology (STT), and phase-change memories (PCM), the MOHJO technology has about the same endurance as PCM storage; its endurance is significantly lower than that of STT memories yet higher than that of flash. Read and write performance of the MOHJO memories is symmetrical and ranges from 10 to 50 ns, which is about 200X faster than flash and competitive with STT and PCM memories (see the table). The company expects this technology to be used both in flash replacement applications and as a nonvolatile cache in hybrid DRAM systems.

Performance comparison of Flash, STT, PCM and MOHJO technologies

Dave Bursky

Semiconductor Technology Editor

The Big and Small Come Together at Semicon and Intersolar

July 25th, 2013

The recently held Semicon West and Intersolar Conferences in San Francisco were interesting examples of technology extremes. At Semicon, for example, major efforts are underway to define the equipment and fabrication facilities needed to produce chips based on 450 mm diameter wafers, while at the other extreme at Semicon, device and equipment designers were challenging each other to define and design ultra-small transistors and the lithography and other systems capable of fabricating devices with gate dimensions of 14 nm, 10 nm, and even smaller features. Intersolar also had its extremes, with presentations ranging from the energy efficiency of photovoltaic cells measuring a few square inches to the performance aspects of multi-square-meter PV panels and the implementation of large multi-acre commercial PV arrays.

Large research consortiums such as IMEC (formerly the Interuniversity Microelectronics Centre) in Leuven, Belgium, LETI (Laboratoire D’Electronique et de Technologies de L’Information) in Grenoble, France, Sematech in Albany, NY, as well as foundries such as Global Foundries, TSMC, and others are all working hard to define and qualify the processes needed for future-generation chips. Over the past few decades, scaling has lowered the cost of transistors by integrating more and more devices on a chip, even as the cost to fabricate the chips continued to increase.

However, Kurt Ronse, director of advanced patterning at IMEC, explains that the extremely high cost of fabrication tools and facilities to implement features of 14 nm and smaller has led to an increasing cost per transistor. The higher cost comes as a result of the use of triple or quadruple patterning with 193 nm immersion lithography. Such patterning techniques require many more masks to create the ultra-small features, and the higher number of masks adds considerably to the fabrication cost. Subi Kengeri, the Vice President of Advanced Technology Architecture at Global Foundries, confirmed the rising cost of lithography in comparison to other factors in a TechXpot presentation. In the graph he presented, various steps in the manufacturing process (etch, CMP, doping, metrology, metal deposition, dry etch, diffusion and dielectric deposition, and lithography) were compared for relative costs (Figure 1). In most cases only moderate cost increases were observed. However, lithography costs escalated the most, particularly for 193i at nodes below 20 nm.

Figure 1: A comparison of costs of different aspects of the manufacturing flow was done by Global Foundries across four process nodes to show the cost increases as the process nodes shrink from 28 nm to 20 nm, from 20 nm to N+1 nm using 193i immersion lithography, and alternately from 20 nm to N+1 using extreme ultraviolet lithography. As the graph shows, lithography costs skyrocket for the N+1 node using immersion lithography, but when EUV lithography is used the lithography cost drops considerably.

 

According to Ronse, it will still be a few years before EUV systems can be used in mass production, but only EUV systems can enable the 50% scaling needed to reach the 10 nm node. Current EUV research has led to UV power sources capable of delivering about 55 W. However, for a tool capable of commercial production, UV sources capable of 250 W will be needed. Such sources are not expected until 2015 at the earliest. Additionally, Ronse is hopeful that at the 10 nm node, EUV lithography can reverse the cost escalation trend since double or triple patterning would not be needed to create the 10 nm features (Figure 2).

Figure 2: Researchers at IMEC also agree that lithography costs have become a significant portion of the fabrication flow. In this graph the 28 nm node is used as the relative reference, with the 20 nm node costing almost 50% more and the N+1 193i showing an almost 80% cost increase over the 28 nm node, while the use of EUV promises to reduce the cost increase to just 20% vs the 28 nm node.
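
The relative figures in the caption can be put side by side with a few lines of arithmetic. This is only a sketch: the values are the rounded percentages quoted above ("almost 50%", "almost 80%", "just 20%"), and the dictionary and function names are illustrative, not anything published by IMEC.

```python
# Approximate cost relative to the 28 nm node, using the rounded
# percentages quoted in the Figure 2 caption (assumed, not exact data).
relative_cost = {
    "28 nm": 1.00,
    "20 nm": 1.50,
    "N+1 (193i)": 1.80,
    "N+1 (EUV)": 1.20,
}

def percent_change(new: float, old: float) -> float:
    """Percentage change going from `old` to `new`."""
    return (new - old) / old * 100.0

# How EUV at N+1 compares with staying on 193i immersion at N+1.
euv_vs_193i = percent_change(relative_cost["N+1 (EUV)"], relative_cost["N+1 (193i)"])

# How EUV at N+1 compares with the 20 nm node.
euv_vs_20nm = percent_change(relative_cost["N+1 (EUV)"], relative_cost["20 nm"])

print(f"EUV vs 193i at N+1: {euv_vs_193i:.1f}%")
print(f"EUV at N+1 vs 20 nm: {euv_vs_20nm:.1f}%")
```

With these rounded inputs, the EUV option comes out roughly one third cheaper than 193i at the same node, and it would also undercut the 20 nm node, which is the reversal of the cost trend that Ronse is hoping for.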

 

At the device level there has been much written about the three-dimensional FinFET structures and the high performance that such transistors can deliver. However, there is a competition brewing between FinFET advocates and the supporters of planar fully-depleted silicon-on-insulator (FDSOI) device structures. Additionally, to further boost device performance, researchers are looking beyond silicon for the channel material in future transistor structures — options being researched include III-V materials, silicon-germanium, germanium, and carbon nanotubes.

As Maud Vinet, the FDSOI Manager at LETI, explained, the planar FDSOI structure requires fewer masks and is easier to scale than the 3D structure of the FinFET. Additionally, with FDSOI, designers can take one of two directions: they can opt for lower power consumption at comparable performance to current designs, or they can design for higher performance at comparable power levels. Vinet expects LETI’s development partner, STMicro, to start releasing FDSOI 14 nm design kits this quarter, while in early 2014, device and process models for 10 nm FDSOI designs should be ready for STMicro to develop design kits for release in the third quarter of 2014.

Dave Bursky, Chip Design Magazine

Microfluidics – an interesting blend of MEMS, IC technologies, and paper

July 3rd, 2013

At a recently held MEMS Technology and Business Symposium hosted by MEPTEC (the Microelectronics Packaging and Test Engineering Council) in San Jose, Calif., many advances in MEMS technology focusing on health care demonstrated the implementation in silicon of pumps, valves, chemical sensing, and still other functions. Additional research on the use of paper rather than silicon as the substrate shows a lot of promise since paper is very inexpensive, is compatible with many chemical/biochemical/medical applications, and it transports liquids using capillary forces, thus eliminating the need for a MEMS-based pump.

This combination of microscopic mechanical functions, silicon control circuits, and paper-based sensors is making possible a wide range of products for the medical eHealth market and for industrial and military applications. As demonstrated in a presentation by Dr. Gisela Lin from the University of California at Irvine, silicon technology can now implement all the functions needed to form a “lab-on-a-chip”: bubble pumps, fluid channels, a mixing chamber, a polysilicon heater, and valves, all interconnected and controlled by an off-chip processor (Figure 1). The technology is similar to that used in ink-jet printer print heads.

Figure 1: Implemented in silicon, this lab-on-a-chip can pump liquid through fluid channels, warm the liquid using polysilicon heaters and control the liquid flow into mixing chambers via electrically controlled valves.

 

 

And the innovation doesn’t stop there, as Dr. Janusz Bryzek, the Vice President of Development for MEMS and Sensing Solutions at Fairchild Semiconductor, pointed out in the conference’s opening presentation. Driving that development is the growth in the wearable health monitoring market. According to ABI Research, a market research company, just 10 million monitoring devices were deployed in 2010, mostly for sports and fitness applications. However, by 2014, ABI analysts expect the market to grow to 420 million wearable health monitors, with about 59 million used at home.

Ongoing research at several universities is examining the feasibility of printing sensors directly on skin, allowing direct-contact measurements. For example, at the University of Illinois at Urbana-Champaign, researchers have succeeded in printing a triple-function sensor that senses the skin’s temperature, strain, and hydration state, all of which are useful to track general health and wellness, as well as for monitoring wound healing (Figure 2). An even more complex sensor circuit developed at the University of California at San Diego combines ECG and EMG sensors, temperature sensors, strain gauges, photodetectors, a wireless antenna, a wireless communications oscillator, a power pick-up coil to capture transmitted power, and an LED, all in a thin layer of rubbery polyester that allows the sensors to stretch, bend, or wrinkle. Such a solution can provide a means to monitor premature babies to detect the onset of seizures, which could lead to epilepsy or brain development problems (Figure 3).

Figure 2: Sensors directly printed on the skin by researchers from the University of Illinois at Urbana-Champaign can sense temperature, strain, and hydration state.

 

 

Figure 3: Multiple sensors as well as a wireless power pick-up coil and simple transmitter and antenna allow this sensing solution in a thin flexible polymer from the University of California at San Diego to be used for various patient monitoring applications. One such application could be to monitor premature babies to detect the onset of seizures, which could affect the baby’s development.

 

In addition to these advanced research prototypes, there are many real examples of Appcessories, the application software and peripherals that link to and run on smartphones such as the Apple iPhone. Bryzek highlighted just a few. Proteus offers ingestible sensors that send a wireless signal through the body to a receiver. The sensors measure heart rate, activity, and respiratory rate. GeneZ offers a low-cost DNA chip containing up to 64 reactions, each less than 1 microliter in volume; assay time is 10 to 30 minutes and the cost is less than $1000 (the chip cost is just $5 to $10). Uchek from MIT uses the smartphone’s image sensor and a software application available on the Apple App Store to read test strips, and it can detect up to 25 diseases such as diabetes, urinary tract infections, and pre-eclampsia. The test strips can also measure the levels of glucose, proteins, ketones, and still other health factors.

Putting a doctor in a pocket, Scanadu released three home diagnostic tools that leverage the sensors and processing capability in a smartphone to perform imaging, sound analysis, molecular diagnostics, and data analytics, and to run a suite of algorithms that can create a comprehensive, real-time picture of your health. A “Lab on a Chip” developed by STMicroelectronics is employed by Veredus Laboratories to detect the current subtype of H7N9 (Avian Flu) along with other human subtypes of Influenza A. The lab on a chip combines two powerful molecular biological applications, polymerase chain reaction and microarray, and can detect the infection with high accuracy and sensitivity within two hours while providing genetic information on the infection that traditionally would take days to weeks to learn. One last example provided by Bryzek is a device from Nanobiosim that performs DNA and RNA sensing: an engine that integrates physics, biomedicine, and nanotechnology and can rapidly and accurately detect genetic fingerprints from any biological organism.

Dave Bursky, Technology Editor

For conference program details, go to  https://www.meptec.org/meptec11thannual.html

New Processor Core Options Try Some ARM Wrestling

May 13th, 2013

When designing a system on a chip (SoC) that employs one or more embedded processor cores, the choice of available processors continues to expand. At last month’s Design West conference in San Jose, Calif., designers were presented with many processor options. Leading the pack, ARM, with its broad array of cores, offers a wide range of performance choices, from the Cortex-M0 at the low end of the performance spectrum to the 64-bit Cortex-A57 at the high end. Although ARM’s cores dominate some SoC market segments, they aren’t the only game in town. EDA tool suppliers Synopsys and Cadence have acquired core suppliers ARC and Tensilica, respectively, and recently, Imagination Technologies acquired MIPS. Thus the number of independent processor core IP providers dropped considerably, but not for long.

One newcomer to the U.S. market, Andes Technology, has crafted multiple synthesizable processor-core families (the N7, N8, N9, N10, N12, and N13) that offer 32-bit cores with gate counts that start at just 12K gates (for the N7). For applications that don’t require legacy compatibility, these cores can challenge ARM and other vendors for embedded applications. Based on a proprietary instruction-set architecture (ISA), the N7 family cores can deliver about 1.19 DMIPS/MHz, which is about 20 percent higher than the ARM Cortex-M0. Additionally, the cores consume about 30 percent less power at the same performance level as the M0. The low-gate-count core, referred to as the Hummingbird, also requires a small amount of chip real estate: less than 0.04 mm² when fabricated using a 90 nm process. With optional features such as a prefetch buffer that can serve as a small instruction cache, the core can deliver up to 1.45 DMIPS/MHz, but to get the higher performance the gate count would increase to close to 30K gates.

Figure 1: One of the higher-end processor cores from Andes Technology is
the N12. It contains an eight-stage pipeline with dynamic branch prediction, a
memory-management unit and instruction and data caches.

 

The ISA consists of a mix of 16- and 32-bit instructions that execute on the N7, which has a simple two-stage pipelined architecture. On the high end, the N12 and N13 series implement the ISA on an eight-stage pipeline and pack a memory-management unit, instruction and data caches, and dynamic branch prediction (Figure 1). Programming tools and a good compiler make the proprietary ISA a non-issue and allow designers to program using tools like GCC/Linux. The Hummingbird core is targeted at applications such as Bluetooth, the Internet of Things (IoT)/machine-to-machine communications, touchscreen controllers, and other embedded applications. Its licensing fees are considerably lower than what ARM charges for its M0 core, thus keeping down the cost of the SoC. The higher-performance cores take on performance-sensitive applications such as embedded Linux systems.

Figure 2: Between the commercial CPUs and a dedicated fixed-function solution is the ASIP (application-specific
instruction-set processor), a block of customer-defined intellectual property (left). Tools from Target Compiler Technologies allow designers to craft the IP block and incorporate the block in an ASIC, thus enabling the designers to significantly improve the power efficiency as well as the performance of their ASIC solution (right).

 

 

Taking a different approach to crafting an embedded processor core, Target Compiler Technologies offers tools that let designers define everything from their own optimized processor cores to a complete multicore application-specific SoC. By allowing designers to craft their own application-specific instruction-set processor (ASIP), the company’s IP Designer tools allow architectural exploration, SDK generation (C compiler, instruction set simulator, debugger, etc.), and RTL generation. Once the IP blocks are defined, the MP Designer tools for multicore ASIC design perform code parallelization, communication and synchronization, and multicore platform generation (Figure 2).

Figure 3: A single-tile xCore processor SoC platform from XMOS can
emulate up to eight “logical” processors and has areas set aside that designers
can use to customize the I/O and bus interface/communications channel. The
platform chips from XMOS can contain 1, 2, or 4 physical processor tiles (up to
32 logical processors) and can clock at up to 500 MHz.

 

Somewhere between a dedicated processor core and a fully-definable multicore platform sits the configurable processor SoC platform developed by XMOS. The company offers a partially-predefined multiple processor platform that contains 1, 2, or 4 processor “tiles”, with each tile able to run up to eight threads (or eight logical processors) and basic support blocks such as SRAM, PLLs, timing (schedulers, timers, clocks), security (one-time-programmable ROM), and a JTAG debug port (Figure 3). The remainder of the platform consists of configurable sections into which designers can drop special IP blocks from the XMOS library or their own proprietary interface/special function logic IP that connects to the platform’s I/O ports and X-Connect interface channels/links.

Each processor tile can deliver up to 500 MIPS of compute power when running at 500 MHz. Each logical processor (a thread) shares processing resources and memory in the tile, but each logical processor has its own register files and gets a guaranteed slice of the tile processor’s compute power (125 MIPS at 500 MHz). The high performance of the processor tiles allows the xCore to take on many applications in consumer and audio systems, automotive systems, industrial control, and display/imaging systems.
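
The two figures in the paragraph above (500 MIPS per tile, 125 MIPS per logical processor) fit a simple round-robin issue model, sketched below. The model is an assumption made for illustration and is not taken from XMOS documentation or from the article: the tile issues one instruction per clock, threads take turns, and a single thread can issue at most once every four clocks (the hypothetical `min_issue_interval` parameter).

```python
def mips_per_thread(clock_mhz: float, active_threads: int,
                    min_issue_interval: int = 4) -> float:
    """Per-thread MIPS under an assumed round-robin issue model.

    Assumptions (illustrative only): the tile issues one instruction per
    clock, active threads are served in turn, and one thread cannot issue
    more often than once every `min_issue_interval` clocks.
    """
    if not 1 <= active_threads <= 8:
        raise ValueError("a tile runs between 1 and 8 logical processors")
    return clock_mhz / max(min_issue_interval, active_threads)


def tile_mips(clock_mhz: float, active_threads: int) -> float:
    """Aggregate MIPS for the tile under the same assumed model."""
    return active_threads * mips_per_thread(clock_mhz, active_threads)


for n in (1, 4, 8):
    print(n, mips_per_thread(500, n), tile_mips(500, n))
```

Under these assumptions a 500 MHz tile gives each of up to four threads 125 MIPS, and the tile reaches its 500 MIPS aggregate once four or more threads are active; with all eight logical processors running, each thread's share would drop to 62.5 MIPS.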

Dave Bursky
Semiconductor Technology Editor

Highly-Integrated Solutions for IEEE 802.11ac Deliver Gigabit Wireless Networks

May 8th, 2013

The demand for higher and higher bandwidths on wireless networks has pushed data rates from a paltry 10 Mbits/s for early wireless networks that employed hardware based on the IEEE 802.11b standard to data rates peaking at 1.3 Gbit/s by leveraging the latest IEEE 802.11ac standard. This recently approved standard leverages advances in silicon integration to pack copious amounts of signal processing, multiple radios to set up multiple-input/multiple-output (MIMO) subsystems that employ as many as four transmit and four receive channels, and still more features. Although there is no relationship to cellular radio standards, many people refer to 11ac systems as 5G wireless since the 11ac standard is basically the fifth major standard for wireless networks. Previous “generation” standards started with 802.11b, then 802.11g, followed by 802.11a, and then 802.11n, with each generation offering higher data transfer rates. With 11a and 11n, the operating carrier moved from the 2.4 GHz band up to the 5 GHz band, and 802.11 devices typically offer dual-band capability (2.4/5 GHz).

Although backward compatible with the 802.11a and 11n 5-GHz frequencies, the 802.11ac standard does not have a “legacy” mode to connect with 802.11b and 11g wireless interfaces. By eliminating the lower frequency radio, designers opened up some area on the chip to add many new features such as beamforming and enhancements to deliver better quality of service (QoS). Additionally, other wireless interfaces have been added by some vendors: Bluetooth and NFC (near-field communication) interfaces have been integrated by a few of the 802.11ac chip suppliers. The QoS on a wireless network has become a key issue since many of the networks now stream extensive amounts of video and audio content, and no one enjoys video or audio content that breaks up or starts and stops.

The various chip suppliers have each taken different integration approaches for their system-on-a-chip (SoC) solutions, with the differences showing up in the number of MIMO channels and the inclusion of Bluetooth, NFC, and even an FM radio receiver. Currently there are only a handful of chip suppliers that provide 802.11ac solutions: Broadcom, Marvell, Redpine Signals, Qualcomm-Atheros, and Quantenna. Broadcom, for example, offers the BCM4335, which it calls a complete single-stream 5G WiFi system. Since this chip includes only one transmit and one receive channel, its maximum data rate is limited to 433.3 Mbits/s.

To build the chip, designers employed a 40 nm CMOS process and integrated the media-access controller (MAC), the physical interface (PHY), RF circuits for both 2.4 and 5 GHz operation (legacy compatibility with 802.11a/b/g/n), an FM radio, and a Bluetooth radio capable of handling both the 4.0 low-energy protocol and the high-speed standard. The chip is platform-agnostic and can be added to any handset, tablet, or other platform. To ensure reliable connectivity and good area coverage, the chip also incorporates advanced beamforming to optimize the antenna radiation pattern, and both low-density parity check (LDPC) and space-time block coding (STBC) to reduce transmission and reception errors.

Building on that basic chip, Broadcom has multiple variants of the circuit that include 2×2 and 3×3 MIMO radios to achieve higher data throughputs. The BCM4360 and BCM43460 have three spatially multiplexed channels and can achieve data rates of up to 1.3 Gbits/s, while the BCM4352 and BCM43526 have two channels and max out their data rates at 866.6 Mbits/s.
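
The per-stream figures quoted for these chips can be reproduced from the 802.11ac physical-layer parameters. The sketch below assumes the top single-user configuration for an 80 MHz channel (234 data subcarriers, 256-QAM at 8 bits per subcarrier, rate-5/6 coding, and a 3.6 µs symbol with the short guard interval); those parameters come from the 802.11ac standard rather than from the vendors' descriptions above, and the function name is illustrative.

```python
def vht_phy_rate_mbps(streams: int,
                      data_subcarriers: int = 234,    # 80 MHz channel
                      bits_per_subcarrier: int = 8,   # 256-QAM
                      coding_rate: float = 5 / 6,
                      symbol_us: float = 3.6) -> float:  # 3.2 us symbol + 0.4 us short guard interval
    """Peak PHY data rate in Mbit/s for a given number of spatial streams."""
    bits_per_symbol = streams * data_subcarriers * bits_per_subcarrier * coding_rate
    return bits_per_symbol / symbol_us  # bits per microsecond equals Mbit/s


for n in (1, 2, 3):
    print(f"{n} stream(s): {vht_phy_rate_mbps(n):.1f} Mbit/s")
```

One stream works out to about 433.3 Mbit/s, two streams to about 866.7 Mbit/s (commonly quoted truncated as 866.6), and three streams to 1.3 Gbit/s, matching the single-, dual-, and triple-stream parts described above. These are peak PHY rates; actual throughput is lower once protocol overhead is included.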

Going full-bore with four MIMO channels, the Marvell Avastar 88W8864 delivers a 1.3 Gbit/s peak data rate and leverages both beamforming and LDPC to ensure signal quality (Figure 1).

Figure 1: Providing a 2×2 MIMO capability that delivers data at rates of up to 866.6
Mbits/s, the Avastar 88W8897 from Marvell includes an NFC interface and
supports the Miracast point-to-point streaming interface for HD video.

With a top data rate of 866.6 Mbits/s, the Avastar 88W8897 offers a lower-cost alternative to the 4×4 channel chip. Unlike the Broadcom chips, neither the 88W8864 nor the 88W8897 includes an FM radio, but both add an NFC capability and support for point-to-point HD video streaming using the Miracast specification.

Qualcomm-Atheros has 1-, 2- and 3-stream solutions in its VIVE family that deliver data rates ranging from 433.3 Mbits/s to 1.3 Gbits/s. The chips also include a Bluetooth 4.0 low-energy radio that can also operate in a high-speed mode. For tablets, the WCN3680 mobile 802.11ac solution features integrated Bluetooth 4.0 and FM capabilities, while for notebooks the QCA9862 and QCA9860 are 2- and 3-stream, dual-band 802.11ac solutions with integrated Bluetooth 4.0 connectivity. The company has also developed a triband chip in conjunction with Wilocity, the QCA9005, that co-integrates the 60-GHz 802.11ad standard referred to as WiGig. The WiGig interface provides multi-gigabit networking, data syncing, and video and audio streaming, while maintaining its wireless bus extension docking capabilities.

A two-chip solution, the QAC2300 from Quantenna offers a full 4×4 MIMO transceiver that combines both 802.11ac and 802.11n channels. By using both the 802.11ac and 802.11n channels in parallel the chipset can achieve transfer speeds of up to 2 Gbits/s. The two chips consist of a digital baseband with the 4×4 MIMO channels, and an RF chip that supports the 5 GHz 802.11ac standard. Also dividing their solution into two chips, Redpine Signals has crafted both a single-channel and a triple-channel baseband chip, the RS9117 and RS9333, respectively. Both incorporate Bluetooth 4.0 radios and a ZigBee interface (Figure 2).

 

 

Figure 2: Supporting a single channel, the RS9117 from Redpine Signals handles data
transfers of up to 433.3 Mbits/s and can also simultaneously transfer data over
the 802.11n interface, thus increasing the overall data transfer to over 500
Mbits/s.

Complementing the baseband chips are several RF transceiver options – the RS8221, 8331, and 8112. The RS8221 is a CMOS dual-band (2.4 and 5 GHz) RF and power amplifier that supports 1×1 or 2×2 channel configurations, while the RS8331 can handle 1×1, 2×2, or 3×3 MIMO configurations (Figure 3) and the RS8112 has a single-channel output that can simultaneously operate on both the 2.4 and 5 GHz bands.

Figure 3: A triple-channel (3×3 MIMO) dual-band RF front end, the RS8331 from Redpine
Signals, can simultaneously transfer data at 5 GHz for 802.11ac and at 2.4 GHz
for legacy 802.11 compatibility.

For someone who started out using sneaker-net and migrated to each generation of networking interface, the ability to deliver data at gigabit speeds is impressive. And it won’t stop there. Future process and integration advances will allow yet higher data rates and improved QoS for better media streaming, which is especially important now that Ultra HD video systems (4K resolution) are already starting to appear.

Circuit and Process Advances at ISQED Deliver Improved Performance

March 10th, 2013
 

Last week’s International Symposium on Quality Electronic Design (ISQED), held in Santa Clara, aimed to bridge the gap between, and promote the integration of, electronic design tools and processes, integrated circuit technologies, processes and manufacturing to achieve the best possible design quality. At the conference, keynotes by Chenming Hu, TSMC Distinguished Professor at the University of California at Berkeley Graduate School, Brad Brech, a member of the IBM Academy of Technology, and Bill Swift, the Vice President of Engineering at Cisco Systems, presented a look at the evolving device technology, sustaining innovation for smarter computing in data centers, and a system-level perspective on semiconductors for intelligent networks, respectively.

At the keynote luncheon, Ed Petrus, the Director of Custom Architecture for the Deep Submicron Division of Mentor Graphics, examined the trends and issues with analog/mixed-signal design tools. Additional keynotes on the second day included presentations by Sanjiv Taneja, the Vice President of Product Engineering at Cadence Design Systems and Perry Goldstein, the Director of Sales and Marketing for Marshall Electronics. Taneja discussed advances in physically aware, high-capacity RTL synthesis for advanced nanometer designs, while Goldstein gave an end-systems look at the lifecycle issues of audio products for both consumer and professional applications. 

Professor Hu kicked off the plenary session with a look at changes to transistor structures ranging from vertical FinFET devices to the latest developments in ultra-thin-body (UTB) devices. Future devices such as the UTB transistors, pillar structure devices, and tunneling transistors are all on the horizon as feature sizes shrink to 10 nm and below. However, as features shrink, the operating voltages also have to shrink. Thus, much of Hu’s research is studying ultra-low operating voltages – as low as 0.1 V Vdd – and the creation of tunneling transistors that can turn on or off with less than a 0.1 V change in gate voltage.

Driving this need for smaller and higher-performance devices are the ever-increasing performance demands of data centers and intelligent networks, as Brech and Swift respectively detailed in their keynote presentations. According to Brech, smarter computing overcomes the challenges of new analytics, cloud, big data, and security requirements through the use of appropriate technologies. Thus, doing things smarter and faster are the driving factors for the next generation of data centers. Intelligent networks go hand-in-hand with smarter computing in the data centers, and Swift examined the technology and business trends, as well as innovation drivers for advances in semiconductor technology that enable products and solutions for intelligent networks.

In the previous discussions, the main focus has been on digital systems. However, many systems require interfaces to the analog world and thus analog/mixed-signal circuit integration with digital systems becomes an essential aspect of systems design. Toward that goal, Petrus discussed the unique challenges of designing custom ICs that leverage the smaller geometries such as Hu discussed in his keynote. These ICs are often assembled using multiple resources and various design methodologies that include IP reuse, top-down design, and bottom-up design. However, as Petrus pointed out, analog circuits are sensitive to layout, matching and proximity, and have demanding interface requirements, with advanced process nodes amplifying the design challenges.

Figure: There are many design aspects that can cause a design re-spin, with logic or functional errors leading the way by at least 2X any of the other causes. However, analog circuit tuning and mixed-signal interface issues also account for a significant percentage of re-spins.

Today’s mixed-signal SoCs often integrate many mixed-signal blocks that each contain a hierarchy of tightly-integrated analog and digital circuits. However, functionally verifying the mixed-signal designs remains a tough challenge, with simple simulation no longer able to do an adequate job. In fact, as Petrus explained using a graph supplied by Off-Chip Communications LLC, about 20% of all design re-spins were due to errors at the mixed-signal interface (see the figure), but 50% of design re-spins at 65 nm and below were due to issues with mixed-signal functionality. Although the chart’s numbers only show designs through 2007, the continuing shift to smaller features just exacerbates the re-spin challenges.

Thus, designers need a new discipline referred to as mixed-signal verification that combines skills in mixed-mode simulation, behavioral model generation, as well as characterization and test-bench development. The tools needed must achieve better verification of analog and digital interfaces, create a higher-level of abstraction for analog and mixed-signal blocks, verify the increased amount of digital logic in analog designs, automate the current manual custom design steps (especially in placement and routing), and adopt circuit analytics that tell the designer something about why and where the circuit is failing to perform.

RTL synthesis tools that are physically-aware and can deal with new physics effects, new device structures, interconnect stacks with vastly varying resistance characteristics, and process variations are the next challenge according to Taneja. Such RTL synthesis tools are needed to handle gigascale and gigahertz SoC designs and perform accurate and predictive modeling of the interconnect stack, vias, and other physical effects.

Dave Bursky
Chip Design Magazine


©2014 Extension Media. All Rights Reserved.