Paperlinks

ISSCC 2011

Posted on March 11, 2011

Website: http://isscc.org
20-24 February 2011, San Francisco, CA

ISSCC – the International Solid-State Circuits Conference – is the flagship conference of the Solid-State Circuits Society. It is widely considered the premier forum for presenting advances in solid-state circuits and systems-on-a-chip.

Here is a round-up of this year’s major SOI-based papers.

Session 4: Enterprise Processors & Components

#4.1:  A 5.2GHz Microprocessor Chip for the IBM zEnterprise™ System

J. Warnock, Y. Chan, W. Huott, S. Carey, M. Fee, H. Wen, M. Saccamango, F. Malgioglio, P. Meaney, D. Plass, Y-H. Chan, M. Mayo, G. Mayer, L. Sigal, D. Rude, R. Averill, M. Wood, T. Strach, H. Smith, B. Curran, E. Schwarz, L. Eisen, D. Malone, S. Weitzel, P-K. Mak, T. McPherson, C. Webb (IBM)

With this paper, IBM demonstrated the first commercial processor to break the 5GHz speed barrier. The microprocessor chip for the IBM zEnterprise 196 system is implemented in 45nm SOI. It contains four processor cores running at 5.2GHz and includes an on-chip high-speed 24MB shared DRAM L3 cache. To achieve this speed within the available power envelope, the IBM team combined detailed power modeling and reduction techniques with programmable timing control and a number of high-performance process-technology features.

#4.2: Dynamic Hit Logic With Embedded 8Kb SRAM in 45nm SOI for the zEnterprise™ Processor

A. R. Pelella, Y. H. Chan, B. Balakrishnan, P. Patel, D. Rodko, R. E. Serton (IBM)

This paper describes dynamic hit logic with an embedded 8Kbit SRAM. The 14b hit logic uses a search-for-a-hit scheme with programmable launch and reset clocks. Array BIST provides both the hit logic and the SRAM with full at-speed test coverage. The SRAM (1R/1W) uses a 45nm SOI 6T cell with domino hierarchical dual-read bitlines.

#4.5:  Design Solutions for the Bulldozer 32nm SOI 2-core Processor Module in an 8-Core CPU

T. Fischer, S. Arekapudi, E. Busta, C. Dietz, M. Golden, S. Hilker, A. Horiuchi, K. A. Hurd, D. Johnson, H. McIntyre, S. Naffziger, J. Vinh, J. White, K. Wilcox (AMD)

This paper describes the new “Bulldozer” 2-core CPU module, which contains 213M transistors in an 11-metal-layer 32nm high-k metal-gate SOI CMOS process. In addition to the micro-architecture improvements, components such as the L1 and L2 caches, the integer unit and the floating-point unit are designed to achieve higher frequency, lower power consumption, and lower gate counts per cycle than the 45nm AMD core while maintaining IPC (instructions per cycle). It achieves over 3.5GHz in an area (including 2MB L2 cache) of 30.9mm².

#4.6:  40-Entry Unified Out-of-Order Scheduler and Integer Execution Unit for the AMD Bulldozer x86-64 Core

M. Golden, S. Arekapudi, J. Vinh (AMD)

This paper presents a 40-instruction out-of-order scheduler that issues four operations per cycle and supports single-cycle operation wakeup. The integer execution unit supports single-cycle bypass between four functional units. Critical paths are implemented without exotic circuit techniques or heavy reliance on full-custom design. Architectural choices minimize power consumption.
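
For readers unfamiliar with the wakeup/select loop that such a scheduler performs, here is a minimal behavioral sketch in Python. Only the 40-entry capacity and the issue width of four come from the summary above. The `Op` record, the `schedule` function, the oldest-first selection policy and the assumption that every operation completes in one cycle are hypothetical simplifications for illustration, not a description of AMD's design.

```python
from dataclasses import dataclass

NUM_ENTRIES = 40   # scheduler capacity (from the paper summary)
ISSUE_WIDTH = 4    # operations issued per cycle (from the paper summary)


@dataclass
class Op:
    name: str      # label used in the returned schedule
    dest: str      # register written by this op
    srcs: tuple    # registers this op reads


def schedule(ops, ready_regs):
    """Return a list of cycles; each cycle is the list of op names issued.

    Assumptions (illustrative only): ops are given oldest first, selection
    is oldest-first, and every op has single-cycle latency so its result
    wakes dependents in time for the next cycle.
    """
    if len(ops) > NUM_ENTRIES:
        raise ValueError("more ops than scheduler entries")
    waiting = list(ops)
    ready = set(ready_regs)
    cycles = []
    while waiting:
        # Select: up to ISSUE_WIDTH oldest ops whose sources are all ready.
        issued = [op for op in waiting
                  if all(src in ready for src in op.srcs)][:ISSUE_WIDTH]
        if not issued:
            raise ValueError("no op can issue: a source is never produced")
        # Wakeup: broadcast destinations so dependents can issue next cycle.
        ready.update(op.dest for op in issued)
        issued_ids = {id(op) for op in issued}
        waiting = [op for op in waiting if id(op) not in issued_ids]
        cycles.append([op.name for op in issued])
    return cycles


demo_ops = [
    Op("a", "r1", ("r0",)),
    Op("b", "r2", ("r1",)),       # depends on a
    Op("c", "r3", ("r0",)),
    Op("d", "r4", ("r2", "r3")),  # depends on b and c
]
demo_schedule = schedule(demo_ops, {"r0"})
```

In this toy run, `a` and `c` issue together, `b` follows one cycle later once `a` has woken it, and `d` issues last.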


Session 8: Wireline Architectures and Circuits for Next Generation Wireline Transceivers

#8.8: A 14Gb/s High-Swing Thin-Oxide Device SST TX in 45nm CMOS SOI

C. Menolfi, T. Toifl, M. Rueegg, M. Braendli, P. Buchmann, M. Kossel, T. Morf (IBM, Miromico)

The IBM/Zurich R&D team presented a 14Gb/s high-swing source-series-terminated (SST) TX that features up to twice the signaling amplitude of a conventional SST design. The 4-tap FFE TX is based on a high-voltage, thin-oxide device SST output driver stage whose pull-up and pull-down switches are driven from separate, split supply pre-drivers. Implemented in 45nm CMOS SOI, the circuit consumes 85.5mW at 14Gb/s from a nominal supply of 1V and an output driver supply of 2V.


Session 12: Design In Emerging Technologies

#12.4:  A 3.9ns 8.9mW 4×4 Silicon Photonic Switch Hybrid Integrated with CMOS Driver

A. Rylyakov, C. Schow, B. Lee, W. Green, J. Van Campenhout, M. Yang, F. Doany, S. Assefa, C. Jahnes, J. Kash, Y. Vlasov (IBM)

A monolithic 4×4 silicon photonic router, composed of six 2×2 Mach-Zehnder interferometer switches (2mW, 3.9ns, 300×50µm² each), is flip-chip bonded with a custom 90nm bulk CMOS driver, routing 3×40Gb/s WDM data with BER < 10⁻¹², less than -10dB crosstalk and 7dB loss. The size of the micro-assembly is 1×2×2mm³.


Session 14:  High-Performance Embedded Memory

#14.1: A 64Mb SRAM in 32nm High-k Metal-Gate SOI Technology with 0.7V Operation Enabled by Stability, Write-Ability and Read-Ability Enhancements

H. Pilo, I. Arsovski, K. Batson, G. Braceras, J. Gabric, R. Houle, S. Lamphier, F. Pavlik, A. Seferagic, L-Y. Chen, S-B. Ko, C. Radens (IBM)

This paper describes the first 32nm embedded SRAM SOI implementation that enables low-power operation down to 0.7V. The SRAM features a 0.154µm² bitcell. Operation at a VDDMIN of 0.7V is enabled by three assist features. Stability is improved by a bitline regulation scheme that reduces charge injection into the bitcell. Enhancements to the write path include a 40% increase in bitline boost voltage. Finally, a bitcell-tracking delay circuit improves both performance and yield across the process space.

#14.2: A 4R2W Register File for a 2.3GHz Wire-Speed POWER™ Processor With Double-Pumped Write Operation

G. S. Ditlow, R. K. Montoye, S. N. Storino, S. M. Dance, S. Ehrenreich, B. M. Fleischer, T. W. Fox, K. M. Holmes, J. Mihara, Y. Nakamura, S. Onishi, R. Shearer, D. Wendel, L. Chang (IBM)

IBM introduces architectural techniques to significantly improve the area, power, and performance of multi-ported register file arrays. A 144×78b macro for a 45nm SOI-CMOS 2.3GHz POWER™ processor is presented with double-pumped write ports that are operated twice in a single cycle and replicated read ports that are combined from duplicate data copies. A compact 2R1W memory cell is thus leveraged to perform a 4R2W function with near 2× area and read power reduction, low 190ps read latency, and fast error correction. The macro operates at up to 2.76GHz at a supply voltage of 0.9V.
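
The way a 2R1W cell can stand in for a 4R2W register file is easier to see in a small behavioral model. The Python sketch below is an illustration under stated assumptions, not IBM's implementation. The 144×78b size comes from the summary, while the class names, the `cycle` method and the choice that reads see the state from before the same cycle's writes are hypothetical.

```python
ENTRIES = 144                   # macro depth (from the paper summary)
WIDTH = 78                      # word width in bits (from the paper summary)
WORD_MASK = (1 << WIDTH) - 1


class Array2R1W:
    """One physical array copy: two read ports, one write port."""

    def __init__(self):
        self.cells = [0] * ENTRIES

    def read2(self, addr_a, addr_b):
        return self.cells[addr_a], self.cells[addr_b]

    def write(self, addr, value):
        self.cells[addr] = value & WORD_MASK


class RegFile4R2W:
    """4R2W behavior built from two 2R1W copies.

    Replicated read ports: each copy serves two of the four reads.
    Double-pumped write: the single write port of each copy is used
    twice per cycle, and every write goes to both copies so that the
    duplicates stay identical.
    """

    def __init__(self):
        self.copies = (Array2R1W(), Array2R1W())

    def cycle(self, read_addrs, writes):
        if len(read_addrs) != 4 or len(writes) > 2:
            raise ValueError("expected 4 read addresses and at most 2 writes")
        # Assumption: reads return the contents from before this cycle's writes.
        r0, r1 = self.copies[0].read2(read_addrs[0], read_addrs[1])
        r2, r3 = self.copies[1].read2(read_addrs[2], read_addrs[3])
        # First pump, then second pump, through the same write port.
        for addr, value in writes:
            for copy in self.copies:
                copy.write(addr, value)
        return [r0, r1, r2, r3]


rf = RegFile4R2W()
first = rf.cycle([0, 1, 2, 3], [(5, 123), (7, 456)])
second = rf.cycle([5, 7, 5, 7], [])
```

Here `second` shows both written values visible on all four read ports, even though each physical copy has only two read ports and one write port.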

#14.3: An 8MB Level-3 Cache in 32nm SOI With Column-Select Aliasing

D. Weiss, M. Dreesen, M. Ciraula, C. Henrion, C. Helt, R. Freese, T. Miles, A. Karegar, R. Schreiber, B. Schneller, J. Wuu (AMD)

This paper presents the design of the 8MB level-3 cache in 32nm SOI-CMOS for AMD’s next-generation Bulldozer architecture that operates above 2.4GHz at 1.1V. Area efficiency is improved by the use of a column-select aliasing technique, in which column select wires are shared between odd and even pairs for reads and writes, while leakage power is minimized by supply gating and floating bitlines. An efficient redundancy scheme is also implemented using centralized redundancy blocks instead of storing all redundant data in the data macro itself.


Session 19:  Energy-Efficient Digital Low-Power Digital Techniques

#19.3: Comparison of 65nm LP Bulk and LP PD-SOI With Adaptive Power Gate Body Bias for an LDPC Codec

J. Le Coz, P. Flatresse, S. Engels, A. Valentian, M. Belleville, C. Raynaud, D. Croain, P. Urard (STMicroelectronics, CEA-LETI-MINATEC)

This paper compares a 65nm LP PD-SOI technology, combined with an enhanced power-gate device that uses automatic adaptive body bias, with a standard LP bulk CMOS implementation, using an 802.11n LDPC codec as the demonstrator. The authors show how the low resistivity produced by forward body bias of the power switch, combined with PD-SOI, can reduce leakage current by 52.4% vs. bulk and increase the frequency by 20% at 1.2V, while decreasing power by 30% at 360MHz.


Session 25:  Wireline CDRs and Equalization Techniques

#25.6: A 15Gb/s 0.5mW/Gb/s 2-Tap DFE Receiver With Far-End Crosstalk Cancellation

M. Honarvar Nazari, A. Emami-Neyestanak (Caltech)

In this paper, a 2-tap DFE receiver is implemented in a 45nm SOI technology. High data rate and low power dissipation are achieved using a switched-capacitor S/H/summer front-end, which enables FEXT cancellation with 33µW/Gb/s/lane power overhead. It equalizes 15Gb/s data over a link with >14dB loss and dissipates 7.5mW from a 1.2V supply.
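
The efficiency figure in the title follows directly from the numbers quoted in the summary, as the short Python check below shows. The function and variable names are illustrative. The one fact relied on beyond the summary is the unit identity that 1 mW per Gb/s equals 1 pJ per bit.

```python
def energy_per_bit_pj(power_mw, rate_gbps):
    """Energy per bit in pJ; numerically equal to mW per Gb/s."""
    return power_mw / rate_gbps


DATA_RATE_GBPS = 15.0        # from the summary
RECEIVER_POWER_MW = 7.5      # from the summary
FEXT_OVERHEAD_UW_PER_GBPS = 33.0   # per lane, from the summary

# 7.5 mW / 15 Gb/s = 0.5 mW/Gb/s, i.e. 0.5 pJ per bit.
receiver_pj_per_bit = energy_per_bit_pj(RECEIVER_POWER_MW, DATA_RATE_GBPS)

# 33 uW/Gb/s x 15 Gb/s = 495 uW, about 0.5 mW per lane for FEXT cancellation.
fext_overhead_mw = FEXT_OVERHEAD_UW_PER_GBPS * DATA_RATE_GBPS / 1000.0

print(f"receiver: {receiver_pj_per_bit:.2f} pJ/bit")
print(f"FEXT cancellation overhead: {fext_overhead_mw:.3f} mW per lane")
```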
