Low-Power High-Speed Serial Link Design


Material Information

Title:
Low-Power High-Speed Serial Link Design
Physical Description:
1 online resource (165 p.)
Language:
English
Creator:
Chen, Jikai
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:
2013
Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Electrical and Computer Engineering
Committee Chair:
Bashirullah, Rizwan
Committee Members:
Lin, Jenshan
Fox, Robert M
Ranka, Sanjay

Subjects

Subjects / Keywords:
adc -- background -- calibration -- circuit -- high -- link -- low -- mpu -- power -- serial -- speed
Electrical and Computer Engineering -- Dissertations, Academic -- UF
Genre:
Electrical and Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
With ever-increasing integrated functionality and on-chip clock frequency on a processor, the off-chip bandwidth is increasing at even higher rates. The ITRS predicts that the aggregate off-chip bandwidth of future processors will reach 100 Tb/s in the next ten years, delivered by multiple high-speed serial links in parallel, each running at multi-Gb/s. At the same time, the total power budget of a processor is practically flat due to package and cooling technology limitations. To accommodate the increase of off-chip bandwidth, the power efficiency of high-speed interconnects must be dramatically improved over the next decade. Various factors come into play when improving the energy efficiency of high-speed serial links. For multi-Gb/s off-chip signaling, the electrical channel presents the most difficult challenge with its latency and frequency-dependent attenuation. As a result, clock and data recovery (CDR) and channel equalization have become essential functions in all high-speed off-chip serial links. To truly optimize the link energy efficiency, the impact of channel condition, CDR and equalization on the link power must be well understood, in addition to that of such design choices as signaling mode and termination topology. This Dissertation is the result of such an effort. The Dissertation starts with an overview of the high-speed serial link. The channel loss mechanisms are first reviewed and dielectric loss is shown to be the dominant factor in future high-speed channels. The dependence of the signaling power on signaling modes, termination topologies and equalization techniques is analyzed to identify power-efficient solutions. CDR is also briefly reviewed, revealing the need for a better baud-rate scheme than existing ones.
To reduce the dielectric loss, Chapter 3 presents a low-power active link with an air-cavity transmission line, which reduces the channel latency and the dielectric loss by replacing the dielectric material between the signal lines and the ground plane with air. Other techniques include the use of DFE, a current-sharing frontend, and the removal of back termination for better power efficiency. The link works up to 6.25 Gb/s with an energy efficiency of 0.6 pJ/bit. Clock recovery is addressed in Chapter 4. A novel digital baud-rate CDR scheme is proposed which automatically tracks the maximum eye opening. Chapter 4 also proposes replacing the selectors in a traditional speculative DFE with majority voters, which are faster and more power-efficient. A receiver that incorporates the proposed baud-rate CDR and majority-voting DFE works at 4.5 Gb/s while consuming 12.4 mW, yielding an energy efficiency of 2.8 pJ/bit. Building upon the results of Chapters 3 and 4, Chapter 5 presents a complete 5-Gb/s transceiver which dissipates only 3.7 mW. To improve the energy efficiency, the transceiver uses exclusively static CMOS logic gates instead of the CML gates in Chapters 3 and 4, and employs injection-locking based clock generation. Heavy parallelism and speculation in the DFE selection tree further reduce the power consumption. The measured 0.75-pJ/bit energy efficiency is among the best reported to date. While most serial links currently still rely on some analog signal processing, the continuous scaling of CMOS technology has recently made attractive an ADC-based serial link, in which equalization and timing recovery are all carried out in the digital domain. One of the key challenges in this ADC-based architecture is the power consumption of the high-speed ADC. Chapter 6 presents a novel digital background calibration scheme suitable for high-speed ADCs which features negligible hardware and power overhead.
The efficacy of the proposed calibration scheme is experimentally confirmed with a 50-mW 2.5-GS/s 5-bit full-flash ADC. All the test chips in this Dissertation are in a 0.13-µm bulk CMOS technology. However, the proposed techniques are readily applicable to more advanced technologies. It is therefore expected that techniques proposed in this Dissertation should help enable future off-chip serial links with high aggregate bandwidth and low power consumption.
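The energy-efficiency figures quoted in the abstract follow from dividing power by data rate, since 1 mW / 1 Gb/s = 1 pJ/bit. The short Python sketch below illustrates that arithmetic using only the power and data-rate numbers stated above; the function name `energy_per_bit_pj` is a hypothetical helper introduced here for illustration and does not come from the Dissertation.

```python
def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """Return energy per bit in pJ/bit.

    mW / (Gb/s) = (1e-3 J/s) / (1e9 bit/s) = 1e-12 J/bit = 1 pJ/bit,
    so the ratio of the two numbers is already in pJ/bit.
    """
    if rate_gbps <= 0:
        raise ValueError("data rate must be positive")
    return power_mw / rate_gbps


# Power and data-rate figures as quoted in the abstract.
rx_ch4 = energy_per_bit_pj(12.4, 4.5)   # Chapter 4 receiver
trx_ch5 = energy_per_bit_pj(3.7, 5.0)   # Chapter 5 transceiver

print(f"Chapter 4 receiver:    {rx_ch4:.2f} pJ/bit")
print(f"Chapter 5 transceiver: {trx_ch5:.2f} pJ/bit")
```

The first ratio rounds to the 2.8 pJ/bit quoted for the receiver. The second gives 0.74 pJ/bit from the rounded 3.7 mW figure, close to the 0.75 pJ/bit reported for the transceiver; the small difference is presumably due to rounding of the quoted power.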
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Jikai Chen.
Thesis:
Thesis (Ph.D.)--University of Florida, 2013.
Local:
Adviser: Bashirullah, Rizwan.
Electronic Access:
RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2014-05-31

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2013
System ID:
UFE0045277:00001




Full Text


LOW-POWER HIGH-SPEED SERIAL LINK DESIGN

By

JIKAI CHEN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2013


© 2013 Jikai Chen


To my wife and my parents


ACKNOWLEDGEMENTS

During the seven years as a PhD student at the University of Florida, I received much help from many people. Although there is only one person listed as the author, the work presented in this Dissertation would not have been possible without them. To each one of them I owe many thanks.

I want to thank my advisor, Dr. Rizwan Bashirullah, for his encouragement when things might go wrong, his tolerance and patience when things did go wrong, and his high standard, which I will carry through the rest of my life. I want to thank Dr. Jenshan Lin, Dr. Robert Fox, and Dr. Sanjay Ranka for being on my committee and spending their precious time on this Dissertation.

My special thanks go to my friends at ICR. Walker Turner, Qiuzhong Wu, Hang Yu, Chris Dougherty, Chun-ming Tang, Lin Xue, Zhiming Xiao, Chun-chin Peng, Yan Hu, Pawan Sabharwal, Deepak Bhatia, Lawrence Fomundam and Felipe Garay offered me help when I needed it the most, and brought fun to my supposedly dull PhD life. I will miss the basketball games that we played in those hot summer days. I want to thank Professor Paul Kohl and his group at Georgia Institute of Technology for their wonderful cooperation, especially Brad Chen and Todd Spencer.

I feel blessed to have such wonderful friends outside ICR, including Shuo Cheng, Mingqi Chen, Changzhi Li, Xiaogang Yu and Yan Yan. There is no doubt I enjoyed and will always cherish our friendship.

I am grateful to my manager, Yanli Fan, and my colleagues, Karl Muth, Archie Hu and Huawen Jin, at Texas Instruments. Yanli has been very supportive when I needed to take time off for my defense. I learnt a lot from each one of them, and look forward to making my own contribution to the team.

I want to thank my parents, my parents-in-law, and my sister. Throughout the ups and downs in the past years, they supported me with their love without condition. If there is only one thing that I want to achieve in my life, I want to make them proud.

Finally, I want to thank my dear wife, Yuan Rao, the most caring and lovely woman in my life. I cannot thank her enough for her love, encouragement, patience and everything she has done for me. Marrying her is by far the best thing that ever happened to me. my wife and dedicating this Dissertation to her is the least I can do.


TABLE OF CONTENTS

ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Research Motivation
  1.2 Dissertation Organization

2 HIGH-SPEED SERIAL LINK OVERVIEW
  2.1 Chapter Overview
  2.2 The Channel
  2.3 Equalization
    2.3.1 FFE
    2.3.2 CTLE
    2.3.3 DFE
  2.4 Clocking
    2.4.1 Clock Generation
    2.4.2 Clock Recovery
  2.5 Signaling
    2.5.1 Signaling Efficiency
    2.5.2 Effects of Channel Loss
    2.5.3 Effects of FFE and DFE
    2.5.4 Effects of Back Termination
    2.5.5 Effects of Signaling and Termination Modes
  2.6 Summary

3 AN ACTIVE LINK WITH AIR-CAVITY TRANSMISSION LINES
  3.1 Chapter Overview
  3.2 Transmission Line Design
  3.3 Fabrication
  3.4 Link Implementation
    3.4.1 Link Architecture
    3.4.2 TX Design
    3.4.3 RX Design
      3.4.3.1 Preamp design
      3.4.3.2 DFE design
  3.5 Experimental Results
    3.5.1 Air-Cavity Transmission Line Measurement
    3.5.2 Link Measurement
  3.6 Summary

4 A 4.5-Gb/s 12.4-mW RX WITH BAUD-RATE CDR
  4.1 Chapter Overview
  4.2 Baud-Rate CDR
  4.3 Majority-Voting DFE
  4.4 Chip Implementation
    4.4.1 Architecture
    4.4.2 Slicer
    4.4.3 DMUX
    4.4.4 Clocking
  4.5 Experimental Results
  4.6 Summary

5 A 5-Gb/s 0.75-pJ/BIT VOLTAGE-MODE TRANSCEIVER
  5.1 Chapter Overview
  5.2 TX Implementation
    5.2.1 TX Architecture
    5.2.2 PRBS Generator
    5.2.3 LDO
    5.2.4 TX Driver
  5.3 RX Implementation
    5.3.1 RX Architecture
    5.3.2 Slicer Design
    5.3.3 Level Shifting and DFE Tap Generation
    5.3.4 DFE with Look-Ahead Selection Tree
    5.3.5 Decimated Baud-Rate CDR
  5.4 Injection-Locking-Based Clock Generation
    5.4.1 Clock Generation Overview
    5.4.2 ILRO Core
    5.4.3 Delay Line
  5.5 Experimental Results
    5.5.1 TX Measurement
    5.5.2 Clocking Measurement
    5.5.3 RX Measurement
    5.5.4 Transceiver Measurement
  5.6 Summary

6 A DIGITAL BACKGROUND ADC CALIBRATION TECHNIQUE
  6.1 Chapter Overview
  6.2 Background Calibration
    6.2.1 Review of Prior Art
    6.2.2 Proposed Background Calibration Scheme
      6.2.2.1 Calibration accuracy
      6.2.2.2 Convergence speed
      6.2.2.3 Calibration overhead and performance considerations
  6.3 Chip Implementation
    6.3.1 ADC Architecture
    6.3.2 Resistor Ladder
    6.3.3 T/H
    6.3.4 Comparator
    6.3.5 Digital Backend
    6.3.6 Reference ADC
    6.3.7 Calibration Engine and Supporting Circuitry
    6.3.8 Clock and Power Distribution
  6.4 Experimental Results
  6.5 Summary

7 CONCLUSIONS

LIST OF REFERENCES
BIOGRAPHICAL SKETCH


LIST OF TABLES

Table
2-1 Summary of signaling and termination modes
3-1 Final air-cavity microstrip dimensions
3-2 Performance summary
4-1 CDR truth table
4-2 update
4-3 Clock phase update
4-4 Selector truth table
4-5 Majority voter truth table
4-6 Performance summary
5-1 Performance summary of the receiver
5-2 Performance summary of the transceiver
6-1 Comparison of proposed and existing background calibration schemes
6-2 Comparison with recently published work


10 LIST OF FIGURES Figure P age 1 1 Evolution of Intel Micropro cessors. ................................ ................................ ....... 22 1 2 ITRS predictions for transistor count and on chip clock frequency for the next decade. ................................ ................................ ................................ ............... 22 1 3 ITRS predictions of I/O and power for the next decade ................................ .......... 23 1 4 Power efficiency of high speed links vs. year ................................ ......................... 23 2 1 A typical high speed serial lin k ................................ ................................ ............... 27 2 2 Conductor loss. ................................ ................................ ................................ ..... 29 2 3 Physical mechanism of dielectric loss ................................ ................................ ..... 30 2 4 Channel loss ................................ ................................ ................................ .......... 31 2 5 A sample SBR ................................ ................................ ................................ ........ 32 2 6 Main cursor vs. Nyquist loss ................................ ................................ .................. 32 2 7 Eye degradation due to channel loss ................................ ................................ ..... 32 2 8 FFE. ................................ ................................ ................................ ....................... 33 2 9 CTLE. ................................ ................................ ................................ ..................... 34 2 10 DFE block diagrams. ................................ ................................ ............................ 36 2 11 Block diagrams of a PLL and a DLL. ................................ ................................ .... 
37 2 12 Block diagrams of an inject ion locked 5 stage ring oscillator ............................... 38 2 13 Simulated phase noise suppression with injection locking ................................ ... 39 2 14 CDR block diagram ................................ ................................ .............................. 39 2 15 Block diagram and principle of Alexander PD ................................ ...................... 40 2 16 Simulated performances of an inverter in a 0.13 ............. 41 2 17 A typical link frontend ................................ ................................ ........................... 42 2 18 Main cursor amplitude and signaling power penalty vs. channel loss .................. 43

PAGE 11

11 2 19 Post cursor amplitudes vs. channel loss ................................ .............................. 44 2 20 The effects of channel loss and equalization on ................................ .......... 45 2 21 Effects of FFE and DFE in frequency domain ................................ ...................... 46 2 22 Lattice diagram for reflection calculation ................................ .............................. 48 2 23 Eye opening vs. RX mismatch ................................ ................................ ............. 48 2 24 CM signaling. ................................ ................................ ................................ ....... 50 2 25 VM signaling. ................................ ................................ ................................ ........ 51 3 1 Cross sections of microstrips. ................................ ................................ ................ 55 3 2 Simulated of conventional and air cavity microstrip ................................ ..... 56 3 3 Simulated of conventional and air cavity microstrip ................................ .......... 56 3 4 Simu lated dielectric loss of conventional and air cavity microstrip .......................... 56 3 5 Picture of the 3D model and simulated loss at various line widths ......................... 58 3 6 Simulated dielectric loss of air cavity and conventional transmission lines ............ 58 3 7 Improvement with air cavity transmission line ................................ ........................ 59 3 8 Signaling power reduction with air cavity. ................................ .............................. 59 3 9 Fabrication process for the air cavity structure ................................ ....................... 61 3 10 Picture and cross section of the fabricated air cavity structure ............................. 61 3 11 Link block diagram ................................ ................................ ................................ 
62 3 12 Schematics of the latch and m ultiplexer. ................................ ............................... 63 3 13 Schematic of the 5 b DAC ................................ ................................ ..................... 63 3 14 Preamp model for gain optimization ................................ ................................ ...... 64 3 15 Preamp design. ................................ ................................ ................................ ..... 65 3 16 Input impedance tuning. ................................ ................................ ........................ 67 3 17 Simulated RX eye diagrams. ................................ ................................ ................. 67 3 19 Layout of the test board with the air cavity active link ................................ ........... 69

PAGE 12

12 3 20 Measured performances of a 5 cm air cavity microstrip. ................................ ...... 70 3 21 Loss of the air cavity line ................................ ................................ ....................... 71 3 22 Chip micrographs of the TX and the RX ................................ ................................ 71 3 23 Picture of the populated test board ................................ ................................ ....... 72 3 24 Test setup ................................ ................................ ................................ ............. 72 3 25 Measured waveforms ................................ ................................ ............................ 73 3 26 Measured link performances. ................................ ................................ ................ 74 4 1 Different ISI seen by the edge and data samples ................................ ................... 76 4 2 CDR block diag rams. ................................ ................................ .............................. 78 4 3 Operation principle of the proposed baud rate CDR ................................ ............... 80 4 4 Block diagram of a 1 tap speculative DFE ................................ .............................. 82 4 6 Proposed majority voter schematic ................................ ................................ ......... 83 4 7 Simulated delay. ................................ ................................ ................................ ...... 85 4 8 Simulated sele ctor and majority voter performances. ................................ ............. 86 4 9 Block diagram of the RX ................................ ................................ ......................... 87 4 10 Schematic of the slicer with threshold control ................................ ....................... 88 4 11 Simulated slicer performances. ................................ ................................ ............. 
89 4 12 Schematics of the CML and CMOS DMUX cells ................................ ................... 90 4 13 Schematic of the divider for I/Q generation ................................ ........................... 90 4 14 Principle of PI ................................ ................................ ................................ ........ 91 4 15 Schematic of the phase interpo lator ................................ ................................ ...... 91 4 16 Level converter schematic. ................................ ................................ ................... 92 4 17 Die micrograph and board picture ................................ ................................ ......... 92 4 18 Test setup ................................ ................................ ................................ ............. 93 4 ................................ ................................ ... 94

PAGE 13

13 4 20 Measured DFE performances. ................................ ................................ .............. 95 4 21 CDR measurement results. ................................ ................................ ................... 95 4 22 Measured CDR jitter tolerance ................................ ................................ .............. 96 5 1 TX bloc k diagram ................................ ................................ ................................ .. 100 5 2 PRBS block diagram ................................ ................................ ............................. 100 5 3 All zero detector ................................ ................................ ................................ .... 102 5 4 Schematic of the self biased comparator with offset ................................ ............. 102 5 5 Simulated waveforms confirming the function of the all zero detector .................. 102 5 6 Stability of the LDO ................................ ................................ ............................... 103 5 7 RX block diagram ................................ ................................ ................................ .. 104 5 8 Schematic of the slicer ................................ ................................ .......................... 105 5 9 Level shifters. ................................ ................................ ................................ ........ 106 5 10 Detailed schematic of the level shifter ................................ ................................ 107 5 11 Simulated frequency respon se of the level shifter at different gain settings ........ 107 5 12 Simulated pre layout selector delay vs. power supply ................................ ......... 108 5 13 DFE sele ction tree. ................................ ................................ .............................. 109 5 14 Block diagram of the injection locking based clock generation ........................... 
110 5 15 Schematic of the ILRO core ................................ ................................ ................ 111 5 16 Start up issue of the pseudo differential oscillator ................................ .............. 111 5 17 Schematic of the current starved delay line ................................ ........................ 112 5 18 Simulated delay line tuning curve ................................ ................................ ........ 112 5 19 Chip micrograph and transceiver layout ................................ .............................. 113 5 20 TX measurement results at 6.25 Gb/s. ................................ ................................ 113 5 21 ILRO measurement results. ................................ ................................ ................ 114 5 22 Measured phase noise with and without inje ction locking ................................ ... 115

5-23 Measured CDR delay line tuning curve showing >2 UI tuning range
5-24 Measured loss characteristics of t ...
5-25 Measured 4 ...
5-26 RX bathtubs with and without DFE
5-27 Jitter histogram of the recovered clock
5-28 Measured 5 Gb/s TX eye diagrams
5-29 Measured CDR waveforms
5-30 RX bathtubs with and without DFE
6-1 An ADC-based serial link
6-2 Schematic of a preamp
6-3 Correlation-based calibration
6-4 Redundancy-based calibration
6-5 Reference-ADC-based calibration
6-6 Principle of reference-ADC-based calibration
6-7 Proposed reconfigurable-comparator-based calibration
6-9 Mechanism of noise-induced calibration error
6-10 Required conversions for convergence with different resolutions
6-11 Block diagram of the ADC
6-12 T/H design
6-13 T/H bandwidth vs. switch width
6-14 Comparator block diagram
6-15 Schematics of the first two stages of the preamplifier
6-16 Effects of M3
6-19 Current-steering DAC and the DAC bias generator. The bias generator is shared by all the comparators.
6-20 Simulated comparator performances

6-21 Block diagram of the digital backend
6-22 FSM flow chart. N is the calibration index, which is also the SRAM address.
6-23 Chip micrograph
6-24 Measured ADC linearity
6-25 Test setup for dynamic performance evaluation
6-26 Output spectrums
6-27 ENOB w/ and w/o calibration

LIST OF ABBREVIATIONS

Term    Definition
ADC     Analog-to-digital converter
CDR     Clock and data recovery
CG      Common gate
CM      Current mode
CML     Current-mode logic
CTLE    Continuous-time linear equalization
DFE     Decision feedback equalization
DLL     Delay-locked loop
DMUX    De-multiplexer
DNL     Differential non-linearity
DSP     Digital signal processor
ENOB    Effective number of bits
FFE     Feedforward equalization
FSM     Finite state machine
ILRO    Injection-locked ring oscillator
INL     Integral non-linearity
ISI     Inter-symbol interference
ITRS    International Technology Roadmap for Semiconductors
I/O     Input/output
LFSR    Linear feedback shift register
LPF     Low-pass filter
LSB     Least significant bit
MUX     Multiplexer
NRZ     Non-return-to-zero
PD      Phase detector
PFD     Phase and frequency detector
PI      Phase interpolator
PLL     Phase-locked loop
PM      Phase modulation
PRBS    Pseudo-random bit sequence
RX      Receiver
SAFF    Sense-amplifier flip-flop
SBR     Single-bit response
SNR     Signal-to-noise ratio
TX      Transmitter
UI      Unit interval
VCDL    Voltage-controlled delay line
VCO     Voltage-controlled oscillator
VM      Voltage mode

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

LOW-POWER HIGH-SPEED SERIAL LINK DESIGN

By
Jikai Chen
May 2013

Chair: Rizwan Bashirullah
Major: Electrical and Computer Engineering

With ever-increasing integrated functionality and on-chip clock frequency on a processor, the off-chip bandwidth is increasing at even higher rates. The ITRS predicts that the aggregate off-chip bandwidth of future processors will reach 100 Tb/s in the next ten years, delivered by multiple high-speed serial links in parallel, each running at multi-Gb/s. At the same time, the total power budget of a processor is practically flat due to package and cooling technology limitations. To accommodate the increase in off-chip bandwidth, the power efficiency of high-speed interconnects must be dramatically improved over the next decade.

Various factors come into play when improving the power efficiency of high-speed serial links. For multi-Gb/s off-chip signaling, the electrical channel presents the most difficult challenge with its latency and frequency-dependent attenuation. As a result, clock and data recovery (CDR) and channel equalization have become essential functions in all high-speed off-chip serial links. To truly optimize the link power efficiency, the impact of channel condition, CDR, and equalization on the link power must be well understood, in addition to that of such design choices as signaling mode and termination topology. This Dissertation is the result of such an effort.

The Dissertation starts with an overview of the high-speed serial link. The channel loss mechanisms are first reviewed, and dielectric loss is shown to be the dominant factor in future high-speed channels. The dependence of the signaling power on signaling modes, termination topologies, and equalization techniques is analyzed to identify power-efficient solutions. CDR is also briefly reviewed, revealing the need for a better baud-rate scheme than existing ones.

To reduce the dielectric loss, a low-power active link is presented in Chapter 3 with an air-cavity transmission line, which reduces the channel latency and the dielectric loss by replacing the dielectric material between the signal lines and the ground plane with air. Other techniques include the use of DFE, a current-sharing frontend, and the removal of back termination for better power efficiency. The link works up to 6.25 Gb/s with a power efficiency of 0.6 pJ/bit.

Clock recovery is addressed in Chapter 4. A novel digital baud-rate CDR scheme is proposed that automatically tracks the maximum eye opening. Chapter 4 also proposes replacing the selectors in a traditional speculative DFE with majority voters, which are faster and more power efficient. A receiver that incorporates the proposed baud-rate CDR and majority-voting DFE works at 4.5 Gb/s while consuming 12.4 mW, yielding a power efficiency of 2.8 pJ/bit.

Building upon the results of Chapters 3 and 4, Chapter 5 presents a complete 5 Gb/s transceiver that dissipates only 3.7 mW. To improve the power efficiency, the transceiver uses exclusively static CMOS logic gates instead of the CML gates in Chapters 3 and 4, and employs injection-locking-based clock generation. Heavy parallelism and speculation in the DFE selection tree further reduce the power consumption. The measured 0.75 pJ/bit power efficiency is among the best reported to date.

While most serial links currently still rely on some analog signal processing, the continuous scaling of CMOS technology has recently made attractive an ADC-based serial link, in which equalization and timing recovery are all carried out in the digital domain. One of the key challenges in this ADC-based architecture is the power consumption of the high-speed ADC. Chapter 6 presents a novel digital background calibration scheme suitable for high-speed ADCs, which features negligible hardware and power overhead. The efficacy of the proposed calibration scheme is experimentally confirmed with a 50 mW 2.5 GS/s 5-bit full flash ADC.

All the test chips in this Dissertation are in a 0.13 µm bulk CMOS technology. However, the proposed techniques are readily applicable to more advanced technologies. It is therefore expected that the techniques proposed in this Dissertation should help enable future off-chip serial links with high aggregate bandwidth and low power consumption.

CHAPTER 1
INTRODUCTION

1.1 Research Motivation

The past few decades have witnessed tremendous advancement [1] [2]: the functionality (represented by the number of transistors) integrated on a single chip and the on-chip clock frequency both grew exponentially, as can be observed in Figure 1-1, which shows the transistor count and on-chip clock frequency of Intel microprocessors over the past 40 years. Consequently, higher and higher I/O bandwidth is needed for the communication between microprocessors, accelerators, and memories [3]. Recently the aggregate off-chip bandwidth has entered the Tb/s range, necessitating the integration of multiple (tens or even hundreds of) high-speed serial link transceivers on the same chip, each operating at multi-Gb/s. For example, in [4] a 16-core SPARC processor has 1.1 Tb/s aggregate I/O bandwidth provided by 112 transmitters and 176 receivers with a peak signaling rate of 4.08 Gb/s each.

Such exponential growth of functionality and clock frequency is expected to continue in the coming decade, as predicted by the ITRS [5] and shown in Figure 1-2(A) and (B), giving rise to an even faster increase of the I/O bandwidth over the same period. Figure 1-3(A) and Figure 1-3(B) show the predicted off-chip clock frequency and the total number of pads, while the resulting aggregate off-chip bandwidth is plotted in Figure 1-3(C), assuming that differential NRZ signaling is used and that 50% of the pads are dedicated to off-chip signaling. It can be seen that within 10 years, the total bandwidth will extend to the hundred-Tb/s range.

Figure 1-1. Evolution of Intel microprocessors. A) Transistor count. B) On-chip clock frequency.

Figure 1-2. ITRS predictions for transistor count and on-chip clock frequency for the next decade. A) Transistor count. B) On-chip clock frequency.

However, due to packaging and cooling limitations, it is also predicted that the total power consumption of a processor will stay practically flat at about 140 W over the same period, as shown in Figure 1-3(D) [5]. State-of-the-art power efficiency of high-speed serial link transceivers is around 1 pJ/bit (1 mW/Gb/s), which means 100 W of I/O power consumption if 100 Tb/s aggregate bandwidth is desired. Clearly, the power efficiency of high-speed transceivers must be greatly improved in order to sustain such a growth of I/O bandwidth. For example, if the I/O power is to be kept around 20% of the whole chip, the power efficiency should improve to approximately 0.2 pJ/bit in 2022.
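The arithmetic behind these numbers can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the 100 Tb/s, 140 W, and 20% inputs are the round figures quoted in the text. With exactly 100 Tb/s the budget works out to 0.28 pJ/bit; the roughly 0.2 pJ/bit target quoted above would correspond to a somewhat higher 2022 bandwidth, which is an inference rather than a number taken from the source.

```python
def io_power_w(efficiency_pj_per_bit: float, bandwidth_tbps: float) -> float:
    """I/O power in watts: (pJ/bit) x (Tb/s) = 1e-12 J/bit x 1e12 bit/s = W."""
    return efficiency_pj_per_bit * bandwidth_tbps


def required_efficiency_pj_per_bit(chip_power_w: float,
                                   io_fraction: float,
                                   bandwidth_tbps: float) -> float:
    """Energy per bit that fits the I/O share of the chip power budget."""
    io_budget_w = chip_power_w * io_fraction
    return io_budget_w / bandwidth_tbps


if __name__ == "__main__":
    # 1 pJ/bit at 100 Tb/s aggregate bandwidth
    print(io_power_w(1.0, 100.0), "W of I/O power")
    # 20% of a 140 W budget spread over 100 Tb/s (illustrative inputs)
    print(required_efficiency_pj_per_bit(140.0, 0.2, 100.0), "pJ/bit")
```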

Figure 1-3. ITRS predictions of I/O and power for the next decade.

Figure 1-4. Power efficiency of high-speed links vs. year.

In response, the power efficiency of high-speed serial links has been steadily improving at about 20% each year [6] [7], driven jointly by technology scaling and design innovations. Figure 1-4 shows the power efficiency of the high-speed serial links published in ISSCC and the VLSI Symposium since 2000.

Extrapolating this trend to 2022 gives about 0.7 pJ/bit, which is about 3× the 0.2 pJ/bit goal. This clearly indicates that more drastic improvement is needed in the future, and it is the motivation behind the research work presented in this Dissertation.

1.2 Dissertation Organization

A high-speed serial link involves functions such as equalization, clocking, and signaling. To improve the power efficiency of the whole link, it is vital to understand each of these components and their interdependencies, which is the topic of Chapter 2. Chapter 2 starts with the channel, with special emphasis on the intrinsic loss of transmission lines. It then introduces a few popular equalization techniques to compensate for channel loss. The important topic of clock generation and recovery follows, revealing the attractiveness of injection-locking-based clock generation and baud-rate CDR. After that, the signaling power is related to channel loss, equalization, termination, and signaling modes. The advantages of DFE and voltage-mode signaling with differential termination are demonstrated.

Chapter 3 focuses on reducing the signaling power by joint channel and circuit optimization. An air-cavity transmission line structure is proposed to reduce the dielectric loss, which dominates at high frequencies. To further reduce the power dissipation, the link also features speculative DFE and a current-sharing frontend without back termination. The active link dissipates 3.7 mW at 6.25 Gb/s, which translates to a power efficiency of 0.6 pJ/bit.

A digital eye-tracking baud-rate CDR scheme is proposed in Chapter 4. The baud-rate CDR automatically tracks the maximum eye opening while reducing the clocking power by more than 50% compared to a conventional oversampling-based CDR. A majority-voting 1-tap speculative DFE is also proposed, which is more amenable to low-power and high-speed designs than the selectors in conventional speculative DFEs. Implemented with CML gates, a receiver with the proposed baud-rate CDR and majority-voting DFE consumes 12.4 mW at 4.5 Gb/s, including the clocking circuitry.

To further improve the power efficiency, Chapter 5 presents a complete transceiver built exclusively from static CMOS gates. The RX employs heavy parallelism to reduce the power supply from the nominal 1.2 V to 1.0 V. Other design features include a speculative DFE with a look-ahead selection tree, a decimated baud-rate eye-tracking CDR, and an injection-locked ring oscillator for multi-phase clock generation. The TX uses a voltage-mode driver with differential termination to reduce the signaling power. The transceiver consumes 3.7 mW at 5 Gb/s. At 0.75 pJ/bit, the power efficiency is among the best to date.

With advanced CMOS technologies offering transistors with cut-off frequencies above 100 GHz and gate delays of around 10 ps, it is now possible for the RX to directly digitize the incoming signal and perform equalization and timing recovery in the digital domain [8]. One of the key challenges, however, is the power consumption With prevents the use of small transistors. In response, Chapter 6 describes a novel background ADC calibration scheme that is suitable for high-speed ADCs and incurs negligible hardware and power overhead. The proposed calibration scheme is implemented in a 50 mW 2.5 GS/s 5-bit flash ADC, and its effectiveness is demonstrated with experimental results.

All the reported results are in 0.13 µm CMOS technology. It is expected that the migration to more advanced technologies will lead to even better performance. The proposed techniques should therefore help pave the way toward low-power high-speed serial links that meet the requirements of future high-performance electronic systems.

CHAPTER 2
HIGH-SPEED SERIAL LINK OVERVIEW

2.1 Chapter Overview

Figure 2-1 shows a typical high-speed serial link, which consists of a TX, a channel, and an RX. The TX multiplexes a low-speed parallel bus into a high-speed serial stream and drives it toward the channel. The RX resolves the stream into digital bits with a slicer and de-multiplexes them back to a parallel format. The equalizer (EQ) compensates for the frequency-dependent loss of the channel, and the clock and data recovery (CDR) unit adaptively adjusts the RX clock phase so that the slicer digitizes the incoming stream with enough timing margin.

Figure 2-1. A typical high-speed serial link.

To improve the power efficiency of a serial link, the various parts of the link must be well understood. We first examine the channel, with emphasis on transmission line loss because it plays a vital role in determining the link performance. We then introduce some popular equalization techniques to compensate for the channel loss, including FFE, CTLE, and DFE. Clocking, including clock generation and clock recovery, is presented next. We show in this part that injection locking is an attractive clock generation technique, and that baud-rate CDR schemes are generally preferred over their oversampling counterparts. Finally, we relate the signaling power to channel loss, equalization, impedance mismatch, signaling modes, and termination schemes. We demonstrate that DFE usually gives better signaling efficiency than FFE, and that voltage-mode signaling with differential termination reduces the signaling power significantly.

2.2 The Channel

At multi-Gb/s, the channel delay is comparable to or even larger than the bit time, rendering the signaling sensitive to reflections due to impedance mismatch. For this reason, the channel is usually a transmission line with a controlled 50 Ω impedance (to accommodate measurement equipment) that is properly terminated at both the TX and the RX. Discontinuities along the channel, such as vias, packages, and connectors, should all be carefully evaluated and controlled.

However, even a perfectly uniform transmission line with proper termination presents challenges to high-speed signaling. At multi-Gb/s, the channel suffers from two frequency-dependent loss mechanisms, and it is the channel rather than the transistors that limits the total signaling bandwidth. For example, it is shown in [9] that, in theory, an NMOS in 0.8 µm technology is able to resolve a 48 Gb/s binary bit stream. However, the experimental results fall far short of the theoretical prediction due to the channel bottleneck (including the pads and packages).

The first loss mechanism is the conductor resistance. At low frequencies, the current flows evenly through the conductor cross-sectional area. At high frequencies, however, the current tends to follow the path with the least inductance, flowing only in a shallow band underneath the conductor surface, a phenomenon known as the skin effect, as shown in Figure 2-2(A). The skin depth, the depth at which the current density decays to $e^{-1}$ of that at the surface, is given by [10]

$$\delta = \frac{1}{\sqrt{\pi f \mu \sigma}}$$

where $\delta$ is the skin depth, $f$ is the frequency, $\mu$ is the permeability, and $\sigma$ is the conductivity. Figure 2-2(B) plots the skin depth in copper as a function of frequency. In the GHz range, the skin depth is only on the order of micrometers.

Figure 2-2. Conductor loss. A) Skin effect. B) Skin depth vs. frequency in copper.

The crowding of current toward the conductor surface increases the effective resistance at high frequencies. Since the skin depth is inversely proportional to $\sqrt{f}$, the conductor loss (in dB) increases proportionally to $\sqrt{f}$.
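As a quick numerical check of the skin-depth relation, the following minimal sketch evaluates the standard formula $\delta = 1/\sqrt{\pi f \mu \sigma}$ for copper. The conductivity value (5.8e7 S/m) and the function names are assumptions chosen for illustration; they are not taken from the dissertation.

```python
import math

MU_0 = 4e-7 * math.pi      # permeability of free space, H/m
SIGMA_COPPER = 5.8e7       # assumed textbook conductivity of copper, S/m


def skin_depth_m(freq_hz: float,
                 sigma: float = SIGMA_COPPER,
                 mu: float = MU_0) -> float:
    """Skin depth in meters: delta = 1 / sqrt(pi * f * mu * sigma)."""
    return 1.0 / math.sqrt(math.pi * freq_hz * mu * sigma)


def conductor_loss_scale(freq_hz: float, ref_hz: float) -> float:
    """Factor by which conductor loss (in dB) grows from ref_hz to freq_hz,
    assuming loss proportional to 1/delta, i.e. to sqrt(f)."""
    return skin_depth_m(ref_hz) / skin_depth_m(freq_hz)


if __name__ == "__main__":
    for f in (1e8, 1e9, 1e10):
        print(f"{f:.0e} Hz: skin depth = {skin_depth_m(f) * 1e6:.2f} um")
    print("loss scale from 1 GHz to 4 GHz:", conductor_loss_scale(4e9, 1e9))
```

At 1 GHz this gives a skin depth of roughly 2 µm, and quadrupling the frequency doubles the conductor loss in dB.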

The second loss mechanism is the dielectric dissipation, which originates from the polarization of the molecules in the dielectric material. As illustrated in Figure 2-3, when an alternating electric field is applied to a dielectric material, the molecules rotate to align with the external field and, in doing so, rub against each other and convert some of the electric energy into heat [11]. Because the molecules rotate every time the field polarity changes, the dielectric loss (in dB) is proportional to frequency and to the loss tangent $\tan\delta$ of the dielectric material [12].

Figure 2-3. Physical mechanism of dielectric loss.

The total loss is the combined effect of the conductor loss and the dielectric loss and can be expressed as

$$\alpha(f) = k_s \sqrt{f} + k_d f \quad \text{(in dB)}$$

where $k_s$ and $k_d$ are constants determined by the transmission line construction. Since both terms increase with frequency, the channel displays a low-pass profile. Figure 2-4 shows an example channel loss, where $f_b$ is the data rate. The loss at half the data rate, $f_b/2$, is also known as the Nyquist loss. $f_c$ denotes the frequency at which the two loss mechanisms contribute equally and is given by

$$f_c = \left( \frac{k_s}{k_d} \right)^2$$
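The two-term loss model can be explored numerically. The sketch below assumes the common form in which conductor loss grows as $\sqrt{f}$ and dielectric loss grows as $f$; the symbols `k_s` and `k_d`, the function names, and the numeric constants are all hypothetical and were picked only so that the crossover lands at 2 GHz.

```python
import math


def channel_loss_db(f_ghz: float, k_s: float, k_d: float) -> float:
    """Total loss in dB: conductor term (sqrt(f)) plus dielectric term (f)."""
    return k_s * math.sqrt(f_ghz) + k_d * f_ghz


def crossover_ghz(k_s: float, k_d: float) -> float:
    """Frequency where both terms are equal: k_s*sqrt(f) = k_d*f."""
    return (k_s / k_d) ** 2


def nyquist_loss_db(data_rate_gbps: float, k_s: float, k_d: float) -> float:
    """Loss at half the data rate (the Nyquist frequency for NRZ)."""
    return channel_loss_db(data_rate_gbps / 2.0, k_s, k_d)


if __name__ == "__main__":
    # Hypothetical constants: dB/sqrt(GHz) and dB/GHz for some fixed length
    k_s, k_d = 2.0, 2.0 / math.sqrt(2.0)
    print("crossover (GHz):", crossover_ghz(k_s, k_d))
    for rate in (2.5, 5.0, 10.0):
        print(rate, "Gb/s -> Nyquist loss (dB):",
              round(nyquist_loss_db(rate, k_s, k_d), 2))
```

Below the crossover the conductor term dominates; above it the dielectric term takes over, which is why dielectric loss matters more as data rates rise.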

For a differential 10 0 mil 0.5 oz microstrip line on FR4, the crossover frequency $f_c$ (at which the two loss mechanisms contribute equally) is around 2 GHz. For high-quality cables, $f_c$ may be much higher. For example, a 5 0 58 cable with polyethylene dielectric material may have an $f_c$ around 100 GHz.

Figure 2-4. Channel loss.

In the time domain, this low-pass characteristic shows up in the single-bit response (SBR). Figure 2-5 shows a sample SBR, where the sample with index 0 is the main cursor, those with negative index are pre-cursors, and those with positive index are post-cursors. It can be seen that, due to the limited channel bandwidth, a single bit spans more than one UI and interferes with neighboring bits, a phenomenon known as inter-symbol interference (ISI).

To evaluate the impact of channel loss on the link performance, it is desirable to establish a relationship between the Nyquist loss and the SBR. However, since the Nyquist loss does not completely characterize the channel, an exact mapping between the Nyquist loss and the SBR is not possible. Figure 2-6 shows the main cursor amplitude at different Nyquist losses. Depending on the relationship between the data rate and $f_c$, the conductor loss and the dielectric loss may have varying significance, and channels with the same Nyquist loss may have different SBRs. Without loss of generality, the discussion in this chapter considers the case

Figure 2-5. A sample SBR.

Figure 2-6. Main cursor vs. Nyquist loss.

2.3 Equalization

Figure 2-7 shows the simulated eye diagrams for channels with 6, 12, and 18 dB Nyquist losses. The channel loss degrades both the voltage and timing margins seen by the RX. When the Nyquist loss is about 12 dB, the eye completely closes. To extend the bandwidth of the channel, equalization is often employed in high-speed serial links. This section reviews some of the most popular techniques.

Figure 2-7. Eye degradation due to channel loss.

2.3.1 FFE

Given the channel's low-pass characteristic, it is possible to reverse it with a linear high-pass filter. One way of doing this is through a discrete-time FIR filter [13] [14] at the TX or RX, of which TX feedforward equalization (FFE) is the most popular, as Figure 2-8(A) shows. By adjusting the tap weights, a relatively flat composite frequency response can be obtained, as shown in Figure 2-8(B).

Figure 2-8. FFE. A) Block diagram. B) Working principle.

Although more drivers are used for FFE, their total size is the same as that of the driver without FFE if the same peak gain is maintained. The electronic power overhead of FFE stems mainly from the additional flip-flops and the associated wiring.
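The effect of an FIR pre-filter on the pulse response can be illustrated with a small sketch. Everything here is hypothetical: the single-bit response samples, the two tap weights, and the function names are made up for illustration, and the constraint that the tap magnitudes sum to one is used as a stand-in for a fixed peak driver output. The worst-case eye is taken as the main cursor minus the sum of the ISI magnitudes.

```python
def convolve(x, h):
    """Plain discrete convolution of two sequences."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out


def worst_case_eye(pulse, main_index=0):
    """Main cursor minus the sum of all other cursor magnitudes."""
    isi = sum(abs(v) for k, v in enumerate(pulse) if k != main_index)
    return pulse[main_index] - isi


if __name__ == "__main__":
    sbr = [0.6, 0.3, 0.1]     # hypothetical main cursor and two post-cursors
    taps = [0.75, -0.25]      # hypothetical 2-tap FFE, |w0| + |w1| = 1
    equalized = convolve(sbr, taps)
    print("equalized pulse:", [round(v, 3) for v in equalized])
    print("eye without FFE:", round(worst_case_eye(sbr), 3))
    print("eye with FFE:   ", round(worst_case_eye(equalized), 3))
```

In this example the FFE lowers the main cursor from 0.6 to 0.45, yet the worst-case eye grows because the post-cursor ISI shrinks faster. This illustrates the trade that linear transmit equalization makes under a peak-swing limit.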

2.3.2 CTLE

Figure 2-9. CTLE. A) Circuit detail. B) Frequency response.

Another linear equalization technique is the continuous-time linear equalizer (CTLE) [6] [7]. Figure 2-9(A) and (B) show the schematic and transfer function of such a CTLE. The transfer function has two poles and one zero. The product of the gain, the peaking factor, and the bandwidth satisfies a constraint [15], which means the performance of the CTLE is limited by the cut-off frequency of the technology.

Due to the high bandwidth and linearity requirements, a CTLE tends to be power hungry. For example, implemented in 90 nm CMOS, the CTLE in [6] provides 8.7 dB peaking and accounts for 27% of the total RX power at 6.25 Gb/s. For a 12.5 Gb/s link implemented in 65 nm CMOS, the CTLE provides 7.5 dB peaking and represents 38% of the RX power [7].

2.3.3 DFE

Besides the linear equalizers discussed above, a non-linear equalization technique known as decision feedback equalization (DFE) has found interest in recent high-speed serial links [16] [17] [18]. A 1-tap DFE is depicted in Figure 2-10(A). It works by directly removing the ISI of the previous bit from the current analog sample. Another way of viewing it is that the DFE adjusts the slicer threshold depending on the previous bit. The power overhead of the DFE shown in Figure 2-10(A) consists mainly of the summer.

The feedback path in Figure 2-10(A) must settle within one UI, a difficult design challenge at high data rates. To relax this stringent timing requirement, speculative DFE can be used, where the possible results are pre-computed and then selected by the previous bits [19], as shown in Figure 2-10(B). The power overhead of speculative DFE consists of the additional slicers.
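A behavioral sketch may help contrast the two 1-tap DFE styles described above. It models bits as ±1 and a channel with a main cursor `h0` and a single post-cursor `h1`; these names, the values, and the assumption that the bit before the first one is -1 are all illustrative. The direct version subtracts the previous decision's ISI before slicing, while the speculative version slices against both possible thresholds and then selects one using the previous decision.

```python
def sign(x: float) -> int:
    return 1 if x >= 0 else -1


def channel(bits, h0, h1, first_prev=-1):
    """Received samples y[n] = h0*b[n] + h1*b[n-1] (noise-free)."""
    prev, out = first_prev, []
    for b in bits:
        out.append(h0 * b + h1 * prev)
        prev = b
    return out


def direct_dfe(samples, h1, first_prev=-1):
    """Subtract the ISI of the previous decision, then slice."""
    prev, decisions = first_prev, []
    for y in samples:
        prev = sign(y - h1 * prev)
        decisions.append(prev)
    return decisions


def speculative_dfe(samples, h1, first_prev=-1):
    """Slice for both possible previous bits, then select the valid result."""
    prev, decisions = first_prev, []
    for y in samples:
        if_prev_was_one = sign(y - h1)    # slicer with threshold +h1
        if_prev_was_zero = sign(y + h1)   # slicer with threshold -h1
        prev = if_prev_was_one if prev == 1 else if_prev_was_zero
        decisions.append(prev)
    return decisions


if __name__ == "__main__":
    bits = [1, -1, -1, 1, 1, 1, -1, 1]
    rx = channel(bits, h0=0.5, h1=0.6)   # post-cursor larger than main cursor
    print("no equalization:", [sign(y) for y in rx])
    print("direct DFE:     ", direct_dfe(rx, h1=0.6))
    print("speculative DFE:", speculative_dfe(rx, h1=0.6))
```

Both versions return the same decisions. The speculative form moves the subtraction out of the feedback loop at the cost of a second slicer, which matches the overhead noted in the text.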

Figure 2-10. DFE block diagrams. A) Conventional DFE. B) Speculative DFE.

2.4 Clocking

At multi-Gb/s, both the timing offset and the timing uncertainty must be well controlled, and clocking, including clock generation and clock recovery, may constitute a significant or even dominant portion of the total link power [6] [20]. This section looks at both clock generation and clock recovery, and identifies ways to reduce the clocking power.

2.4.1 Clock Generation

Clock generation in high-speed serial links is usually done with a PLL or a DLL. Figure 2-11(A) depicts a PLL block diagram, which consists of a phase detector (PD), a low-pass loop filter (LPF), a voltage-controlled oscillator (VCO), and an optional divider. At steady state, the negative feedback loop ensures that the VCO output phase is aligned with that of the reference clock. A DLL block diagram is shown in Figure 2-11(B), where the VCO in a PLL is replaced with a voltage-controlled delay line (VCDL). Under locked condition, the delay of the VCDL is equal to one reference clock cycle. Compared to a PLL, a DLL is usually easier to design because the loop is of first order.

While the cores of a PLL and a DLL are the VCO and the VCDL, the other loop components may consume significant power. For example, in [6] the VCO consumes only 12% of the total PLL power. In addition, the PD and loop filter also occupy considerable area.

Figure 2-11. Block diagrams of a PLL and a DLL. A) PLL. B) DLL.

Another clock generation technique that is found in some recent serial links is the injection-locked oscillator [21] [22]. Figure 2-12 depicts the block diagram of an injection-locked 5-stage ring oscillator. In the absence of an injection signal, each stage of the oscillator contributes a delay of $t_d$, resulting in a free-running frequency of $1/(10\,t_d)$. When a clock is injected into one of the nodes, the delay of the injected stage changes at the rising and falling edges.

Under locked condition, the oscillation is sustained at the frequency of the injected clock.

Injection locking a ring oscillator to a clean reference clock can dramatically improve its noise performance because periodic correction by the injected clock prevents jitter from accumulating indefinitely [23]. This can be observed in the frequency domain as a reduction in the phase noise with injection locking, as illustrated in Figure 2-13. Compared to a PLL or a DLL, an injection-locked oscillator avoids the power and area overhead of the PD, the LPF, and the dividers while still offering good jitter performance [24] [23] [25]. In addition, since no feedback loop is involved, injection-locking-based clock generation does not have the stability issue of a PLL or DLL.

Figure 2-12. Block diagram of an injection-locked 5-stage ring oscillator.

Figure 2-13. Simulated phase noise suppression with injection locking.

2.4.2 Clock Recovery

A clock recovery unit is essentially a feedback system consisting of three basic blocks, namely a phase detector (PD), a phase shifter or rotator, and a loop filter, as shown in Figure 2-14. The PD determines whether the sampling clock is too early or too late. The early/late information, after being processed by a loop filter, is used to control the phase shifter or rotator toward the desired position.

Figure 2-14. CDR block diagram.

Various architectures exist for clock recovery [26]. The PD can be either linear [27] or non-linear [28], with the former giving both the direction and magnitude of the phase deviation, while the latter gives only the direction. In high-speed serial links, the non-linear PD is more popular because it does not require processing of narrow pulses [29]. The loop filter can be analog [30], digital [31], or hybrid [32]. The phase shifter or rotator can be implemented with an oscillator, a delay line, a phase interpolator (PI), etc.
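The direction-only behavior of a non-linear PD can be sketched as a small truth-table function. The sketch below follows the usual description of an Alexander (bang-bang) detector, which compares an edge sample with the data samples on either side of it. The function names, the 0/1 bit encoding, and the sign convention (-1 for early, +1 for late) are assumptions made for illustration.

```python
def alexander_pd(prev_data: int, edge: int, next_data: int) -> int:
    """Early/late decision from two data samples and the edge sample between.

    Returns -1 if the clock is early, +1 if it is late, and 0 when there is
    no data transition (no timing information available).
    """
    if prev_data == next_data:
        return 0
    # If the edge sample still equals the old bit, it was taken before the
    # transition, so the sampling clock is early; otherwise it is late.
    return -1 if edge == prev_data else 1


def net_vote(triples) -> int:
    """Sum of early/late votes, a crude stand-in for a digital loop filter."""
    return sum(alexander_pd(p, e, n) for p, e, n in triples)


if __name__ == "__main__":
    observations = [(0, 0, 1), (1, 1, 0), (1, 1, 1), (0, 1, 1)]
    print("votes:", [alexander_pd(*t) for t in observations])
    print("net vote:", net_vote(observations))
```

A negative net vote would push the phase shifter later and a positive one earlier; the actual loop filter and phase-control mechanism vary by design.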

Non-linear phase detection is usually achieved via oversampling. Figure 2-15(A) shows the block diagram of an Alexander PD [28]. The input signal is sliced twice in each UI, once at the eye center (data) and once at the eye boundary (edge). Whenever a data transition is detected, the edge sample in between is compared with the two data samples to determine whether the sampling clock is too early or too late, as illustrated in Figure 2-15(B). Assuming the clock phases are evenly spaced, under locked condition the data sampling phase is automatically placed at the eye center.

Figure 2-15. Block diagram and principle of the Alexander PD.

The power overhead of an oversampling CDR consists of the additional slicers and clocking circuitry. While the additional slicers may be disabled to reduce their power consumption if a low CDR bandwidth is acceptable [6], it is still necessary to generate the extra clock phases. Moreover, since oversampling requires timing resolution better than the bit time, the clocking power overhead is larger than it appears, because doubling the timing resolution requires more than doubling the clocking power. This can be observed in Figure 2-16, which shows the simulated delay and energy of an inverter in a 0.13 µm CMOS technology. For this reason, baud-rate CDR is preferred to reduce the clocking power.

Figure 2-16. Simulated performances of an inverter in a 0.13 µm CMOS technology. A) Delay. B) Energy.

2.5 Signaling

In a high-speed serial link, the TX driver needs to produce a large enough voltage swing over the low channel impedance. The power consumed by the TX driver, also known as the signaling power, may constitute a significant portion of the total link power. For instance, in [7] nearly 40% of the link power is consumed by the TX driver.

To improve the power efficiency of the whole link, it is imperative to gain insight into the various factors that affect the signaling power.

2.5.1 Signaling Efficiency

Figure 2-17 shows a typical frontend found in high-speed links [17]. The analysis in this section assumes that the DC loss of the channel is negligible. Without DC loss, the signal swings at the TX and the RX are the same, as shown in Figure 2-17. For the ideal case with a lossless channel and perfect termination, the eye opening is the same as the signal swing and the signaling power is

Figure 2-17. A typical link frontend.

Factors such as channel loss, equalization, termination, and signaling modes cause the eye opening to deviate from the signal swing. If we define the signaling efficiency as the signaling power now becomes

43 By studying the relationship between and the various factors such as channel loss, equalization termination, and signaling mode, their impacts on the signaling power can be understood. 2.5.2 Effects of Channel Loss With the SBR given, the worst case eye opening can be found using the peak distortion technique [33] and is calculated to be For a uniform channel with perfect matching, all the cursors are positive. Since the DC loss is negligible, i.e. Equation 2 9 can be simplified to Figure 2 18 Main cursor amplitude and signaling power penalty vs. channel loss Figure 2 18 shows the simulated amplitudes of the main cursor as a function of the channel Nyquist loss. Assuming the post cursors are completely removed by DFE, the main cursor amplitude equals the RX eye opening. The signaling power penalty of the channel loss is therefore calculated accordingly and is plotted also in Figure 2 18 It

can be seen that when the Nyquist loss exceeds about 9 dB, 50% more signaling power is needed to restore the eye opening seen by the RX slicers. Besides mandating more signaling power, higher channel loss also necessitates more equalization and induces a power penalty for the associated signal processing. This is explained with the help of Figure 2-19, which shows the amplitudes of the first three post-cursors normalized to the main cursor. Generally speaking, with increasing channel loss, the post-cursors become more and more significant compared to the main cursor. Specifically, when the Nyquist loss is 9 dB, the second post-cursor is around 10% of the main cursor. While a 1-tap DFE may be enough when the Nyquist loss is less than about 6~9 dB, extra DFE taps are desired beyond that, incurring a power penalty for the extra latches, etc.

Figure 2-19. Post-cursor amplitudes vs. channel loss.

Figure 2-20 shows the effect of channel loss. When the Nyquist loss goes beyond 9 dB, the eye opening quickly degrades, and error-free signaling without equalization becomes impractical or even impossible near 12 dB.
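The peak distortion reasoning used in this section can be illustrated with a short sketch. It is not the dissertation's own formulation: it assumes the common textbook form in which the worst-case eye equals the main cursor minus the sum of the magnitudes of all other cursors, models an ideal DFE as exact cancellation of the first few post-cursors, and assumes that the signaling power scales inversely with the normalized eye opening. The function names and the example SBR values are hypothetical.

```python
def worst_case_eye(sbr, main_idx):
    """Peak-distortion eye opening, normalized to the TX swing.

    Assumed form: main cursor minus the sum of |h_k| over all other cursors.
    """
    isi = sum(abs(h) for k, h in enumerate(sbr) if k != main_idx)
    return sbr[main_idx] - isi


def eye_with_ideal_dfe(sbr, main_idx, n_taps):
    """Eye opening when an ideal DFE removes the first n_taps post-cursors."""
    equalized = list(sbr)
    for k in range(main_idx + 1, min(main_idx + 1 + n_taps, len(equalized))):
        equalized[k] = 0.0  # ideal cancellation, no error propagation
    return worst_case_eye(equalized, main_idx)


def relative_signaling_power(eye):
    """Swing (and, by assumption, power) needed to restore a unit eye."""
    return float("inf") if eye <= 0.0 else 1.0 / eye


if __name__ == "__main__":
    # Hypothetical SBR: one pre-cursor, main cursor, three post-cursors.
    sbr = [0.05, 0.65, 0.20, 0.07, 0.03]
    for taps in (0, 1, 3):
        eye = eye_with_ideal_dfe(sbr, main_idx=1, n_taps=taps)
        print(taps, round(eye, 3), round(relative_signaling_power(eye), 2))
```

With more DFE taps the remaining ISI shrinks and the required swing drops, which mirrors the trade-off between signaling power and equalization power described above.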

Figure 2-20. The effects of channel loss and equalization.

2.5.3 Effects of FFE and DFE

To facilitate signaling over lossy channels, equalization is often employed in high-speed serial links. The impact on the signaling power depends on the specific equalization scheme. The FFE operates with an FIR filter in cascade with the channel. With proper tap weights, the FIR filter inverts the channel response so that the composite frequency response is flat up to the Nyquist frequency, i.e.

The peak gain of the FIR filter occurs at the Nyquist frequency, and is kept at unity for fair comparison, i.e.

Equation 2-12 can then be simplified to

The signaling efficiency with FFE is then given by [34]

The DFE, on the other hand, directly removes the ISI of the previous bits and is better understood in the time domain. In the absence of detection errors (no error propagation), the DFE can be analyzed in a linear fashion and the composite SBR is

The signaling efficiency with DFE is then given by

The normalized signaling power with FFE and DFE is also plotted in Figure 2-20. While both FFE and DFE extend the achievable data rate, DFE always yields the lower signaling power. For example, when the Nyquist loss is 9 dB, the signaling power with DFE is 40% lower than that with FFE.

Figure 2-21. Effects of FFE and DFE in frequency domain.

Intuitively, this benefit of DFE stems from the fact that DFE boosts the high-frequency component [16]. This is in contrast to FFE, which merely attenuates the low-frequency component of the signal so that the high- and low-frequency components have the same amplitude when arriving at the RX. This is shown in Figure 2-21, which compares the composite frequency responses with FFE and DFE of a hypothetical channel which has an SBR of [0.8, 0.2].

2.5.4 Effects of Back Termination

As shown in Figure 2-17, a typical link has termination at both the TX and RX. Although the TX back termination helps mitigate reflections, it reduces the signal swing by 50%, which must be compensated for by doubling the signaling power. Note,

however, that this back termination is not necessary if the channel is relatively uniform and good impedance matching is ensured at the RX. With the back termination removed and assuming perfect RX matching, the signaling power now becomes

Comparing Equation 2-17 to Equation 2-7, removing the back termination reduces the signaling power by half because it doubles the impedance seen by the TX driver [35]. However, without the damping of the back termination, reflections due to RX impedance mismatch may make multiple trips along the channel before dying out. The resulting degradation of the eye opening must be evaluated.

The effect of RX impedance mismatch can be studied with the help of the lattice diagram [36] shown in Figure 2-22, which defines the reflection coefficients at the TX and RX. When a pulse first arrives at the RX, the transmitted pulse is given by

The reflected pulse travels back and gets fully reflected at the TX. When it arrives again at the RX, the transmitted pulse is

where $*$ denotes convolution. Since the channel DC loss is negligible, the worst-case eye opening degradation due to the first reflected pulse is

Similarly, the degradation due to the nth reflection is

The total effect is obtained by taking the sum and is

The signaling efficiency without back termination is therefore

where the factor 2 accounts for the amplitude doubling due to the removal of the back termination.

Figure 2-22. Lattice diagram for reflection calculation.

Figure 2-23 depicts the eye opening improvement with the back termination removed as a function of RX impedance mismatch. With 9 dB Nyquist loss and 10% impedance mismatch, the signaling power is reduced by nearly 40%.

Figure 2-23. Eye opening vs. RX mismatch.

Also plotted in Figure 2-23 is the effect of RX mismatch when the Nyquist loss is 12 dB. The eye degradation becomes more sensitive to RX mismatch without back

termination when the channel loss increases. Intuitively, this is because the main cursor decreases with increasing channel loss, while the reflection remains the same as long as the DC loss is negligible. Note that the above discussion assumes negligible DC loss of the channel. If the channel has substantial DC loss, the reflections may be heavily attenuated and good termination may not be required at either the TX or the RX [37].

2.5.5 Effects of Signaling and Termination Modes

The above discussion considers exclusively current-mode signaling. However, both current-mode (CM) [16] [20] and voltage-mode (VM) [6] [38] [39] signaling have been used for high-speed serial links. Besides, the termination may be single-ended or differential. Their signaling powers are analyzed below.

Figure 2-24 (A) shows the schematic of a current-mode frontend with single-ended termination. The differential pair works in the saturation region and steers the tail current to either branch according to the bit being transmitted. The voltage levels at the TX outputs are

The voltage swing and the signaling power are therefore

When the termination is differential, as shown in Figure 2-24 (B), the voltage levels become

while the single-ended voltage swing and the signaling power are the same as with single-ended termination.

Figure 2-24. CM signaling. A) Single-ended termination. B) Differential termination.

Figure 2-25 shows the schematic for VM signaling. The transistors work in the linear region and connect the outputs to either voltage rail according to the bit being transmitted. Termination is provided by series resistors, either by the on-resistance of the transistors or by explicit resistors in series with the transistors. With single-ended termination, the voltage levels at the TX outputs are

The single-ended voltage swing and the signaling power are

For the case of differential termination, the voltage levels become

The single-ended voltage swing and the signaling power now are

It can be seen that using differential termination reduces the signaling power by 50% for VM signaling.

Figure 2-25. VM signaling. A) Single-ended termination. B) Differential termination.
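The comparison between CM and VM drivers can be checked with a small DC calculation. The sketch below is an illustration under stated assumptions rather than the dissertation's derivation: the CM driver is doubly terminated so the tail current sees Z0/2, the VM driver has a series resistance equal to Z0, all drivers are compared at the same single-ended swing, and every driver current is ultimately drawn from the same supply (a VM driver fed by an ideal linear regulator). The function names and the 50 Ω default are hypothetical.

```python
def cm_current(v_swing, z0=50.0):
    """CM driver: tail current flows into back termination in parallel with the line."""
    return v_swing / (z0 / 2.0)


def vm_se_current(v_swing, z0=50.0):
    """VM driver, single-ended termination.

    The high output divides V_DRV across Z0 (driver) and Z0 (RX), so the
    single-ended swing is V_DRV / 2 and the supply current is V_DRV / (2 * Z0).
    """
    v_drv = 2.0 * v_swing
    return v_drv / (2.0 * z0)


def vm_diff_current(v_swing, z0=50.0):
    """VM driver, differential termination.

    The current loop is Z0 (pull-up) + 2 * Z0 (RX) + Z0 (pull-down); the
    outputs sit at 3/4 and 1/4 of V_DRV, so the swing is again V_DRV / 2.
    """
    v_drv = 2.0 * v_swing
    return v_drv / (4.0 * z0)


def power_from_common_supply(current, v_dd=1.2):
    """Power when the driver current is drawn from V_DD (linear regulator case)."""
    return v_dd * current


if __name__ == "__main__":
    swing = 0.125  # hypothetical single-ended swing in volts
    p_cm = power_from_common_supply(cm_current(swing))
    for name, fn in (("VM SE", vm_se_current), ("VM diff", vm_diff_current)):
        print(name, power_from_common_supply(fn(swing)) / p_cm)
```

Under these assumptions the VM driver needs half the CM current with single-ended termination and a quarter with differential termination, and the ratio is independent of the chosen swing.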

Table 2-1 summarizes the performance of current-mode and voltage-mode drivers with single-ended and differential terminations. It can be seen that even with a linear regulator to generate V_DRV, VM signaling with differential termination consumes only 25% of the CM signaling power.

Table 2-1. Summary of signaling and termination modes
Mode    CM    CM      VM    VM
Term.   SE    Diff.   SE    Diff.

2.6 Summary

Various factors come into play when one tries to improve the power efficiency of a high-speed serial link, with the channel posing the most difficult challenge. At multi-Gb/s, conductor loss and dielectric loss limit the channel bandwidth and cause temporal spreading of the transmitted pulses. To compensate for the resulting ISI, high-speed serial links usually employ equalization such as FFE, CTLE and DFE, each involving a different level of complexity. Clocking, including clock generation and clock recovery, is challenging at high data rates and may sometimes dominate the total link power budget. Conventional

solutions such as PLL and DLL entail considerable area and power overhead due to the PD and LPF. Injection-locking-based clock generation, on the other hand, is a promising technique because it avoids such overhead while still featuring low jitter. To reduce the clocking power, baud-rate CDR is preferred over its oversampling counterpart, such as the Alexander-type CDR, which has found popular use in recent high-speed serial links. Due to the low channel impedance, the signaling power, i.e. the power dissipated by the TX driver, accounts for a considerable percentage of the link power. Using the peak distortion technique and the concept of signaling efficiency, this chapter shows the attractiveness of DFE and of VM signaling with differential termination. It is also shown that with moderate channel loss and reasonable termination tolerance, back termination can be removed to further reduce the signaling power. The rest of this Dissertation reports a few TX and RX implementations that embody the analysis results presented in this chapter. Their usefulness is demonstrated with experimental results.

CHAPTER 3
AN ACTIVE LINK WITH AIR CAVITY TRANSMISSION LINES

3.1 Chapter Overview

As discussed in Chapter 2, the bandwidth of transmission lines is limited primarily by conductor loss and dielectric loss. Because the conductor loss is proportional to $\sqrt{f}$ while the dielectric loss is proportional to $f$ [36], the latter mechanism dominates at high frequencies. For conventional dielectric materials such as FR4, the dielectric loss significantly degrades the channel bandwidth for multi-Gb/s signaling. While resorting to materials with low loss tangents or even optics is possible, such solutions incur significant cost overhead.

Figure 3-1 (A) shows the cross section of a conventional microstrip on FR4. Since the field of a microstrip transmission line resides in both the air and the FR4, the effective dielectric constant lies somewhere between those of air and FR4. The extent to which the FR4 dominates is characterized by a so-called filling factor [40], which satisfies

The effective loss tangent can also be related to the filling factor by [40]

Because the dielectric loss is determined by both the effective dielectric constant and the effective loss tangent through [12]

reduction of the filling factor will reduce the dielectric loss. Intuitively, since most of the field energy is confined between the signal lines and the ground plane, if we can somehow fill the space between them with air, the filling

factor will be reduced. This can be done by employing the air-cavity microstrip structure (also known as inverted microstrip [41]) as shown in Figure 3-1 (B). Air-cavity microstrips can be formed by selectively post-processing the FR4 boards for high-speed interconnects. This avoids the cost overhead associated with expensive substrate materials for non-critical signals.

Figure 3-1. Cross sections of microstrips. A) Conventional. B) Air cavity.

Figure 3-2 shows the simulated characteristics of conventional and air-cavity differential microstrips, with the conductor thickness kept at 5 µm. The calculated filling factor is shown in Figure 3-3. It can be seen that the air-cavity microstrip yields lower values in both figures, with a reduction of 30% obtained by employing the air cavity. According to Equation 3-7, such reductions translate to a 36% improvement in dielectric loss, as shown in Figure 3-4. It should be noted that not only is the air-cavity structure attractive for low loss, it also features lower latency for the same channel length because the effective dielectric constant is reduced.

Encouraged by these results, this Chapter presents the design and fabrication of air-cavity transmission lines, and their use in an active link. The active link features a

current-sharing frontend and speculative DFE to reduce the signaling power. Back termination at the TX is also removed for further power saving. Experimental results confirm the dielectric loss is reduced by 26% by the air-cavity structure. Operating at 6.25 Gb/s, the link consumes 3.7 mW, yielding a 0.6 pJ/bit power efficiency.

Figure 3-2. Simulated of conventional and air-cavity microstrip.

Figure 3-3. Simulated of conventional and air-cavity microstrip.

Figure 3-4. Simulated dielectric loss of conventional and air-cavity microstrip.

3.2 Transmission Line Design

The main design parameters of the proposed air-cavity structure include the signal line width W and spacing S, the conductor thickness t, and the height of the air cavity H. Considering the process capability, the conductor thickness is chosen to be 5 µm. The line width W and spacing S are assumed to be the same. A meandered transmission line length of 20 cm is used as a representative channel length for chip-to-chip interconnects [42]. The channel loss is evaluated at 5 GHz with a target of 10 dB or less, or equivalently an attenuation constant of 0.5 dB/cm at this frequency.

The transmission line is simulated in a 3D electromagnetic simulator. Figure 3-5 (A) shows the picture of the 3D model. To reduce the requirement on computation resources, a short line of 1 cm is simulated. The obtained S-parameters are then cascaded to get the characteristics of longer lines. Figure 3-5 (B) shows the simulated air-cavity loss performance at 5 GHz at various signal line widths. While the conductor loss decreases with increasing conductor size due to the larger effective conducting surface area, the dielectric loss stays relatively constant since it is primarily determined by the material properties. From a loss reduction perspective, it is desirable to use as large a W as possible. However, to achieve the desired impedance, a proper ratio of W to H must be maintained. The fabrication process limits the air-cavity height to about 20 µm. Accordingly, the final W is 40 µm, which gives an 8 dB total loss for a 20 cm channel at 5 GHz. The transmission line dimensions are listed in Table 3-1.
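The step of cascading the S-parameters of a short simulated segment to obtain a longer line can be sketched as follows. This is a generic illustration, not the tool flow used in this work: it converts a two-port S-matrix to transfer (T) parameters, multiplies the T-matrices, and converts back, assuming identical reference impedances at every junction. The tuple layout (s11, s12, s21, s22), the function names and the example segment are hypothetical.

```python
import cmath
import math


def s_to_t(s):
    """Convert (s11, s12, s21, s22) to T-parameters with [b1, a1] = T [a2, b2]."""
    s11, s12, s21, s22 = s
    det = s11 * s22 - s12 * s21
    return (-det / s21, s11 / s21, -s22 / s21, 1.0 / s21)


def t_to_s(t):
    """Inverse of s_to_t."""
    t11, t12, t21, t22 = t
    det = t11 * t22 - t12 * t21
    return (t12 / t22, det / t22, 1.0 / t22, -t21 / t22)


def t_mul(a, b):
    """2x2 matrix product of two T-parameter tuples."""
    a11, a12, a21, a22 = a
    b11, b12, b21, b22 = b
    return (a11 * b11 + a12 * b21, a11 * b12 + a12 * b22,
            a21 * b11 + a22 * b21, a21 * b12 + a22 * b22)


def cascade(s_unit, n):
    """S-parameters of n identical segments connected in series."""
    t_unit = s_to_t(s_unit)
    t_total = (1.0, 0.0, 0.0, 1.0)  # identity: a zero-length through
    for _ in range(n):
        t_total = t_mul(t_total, t_unit)
    return t_to_s(t_total)


def insertion_loss_db(s):
    """Insertion loss in dB from |s21|."""
    return -20.0 * math.log10(abs(s[2]))


def matched_segment(loss_db, phase_deg):
    """Hypothetical perfectly matched segment with the given loss and phase."""
    s21 = 10.0 ** (-loss_db / 20.0) * cmath.exp(-1j * math.radians(phase_deg))
    return (0.0, s21, s21, 0.0)


if __name__ == "__main__":
    one_cm = matched_segment(loss_db=0.4, phase_deg=30.0)
    print(insertion_loss_db(cascade(one_cm, 20)))  # loss of a 20 cm line in dB
```

For a matched segment the losses simply add in dB, so 0.4 dB/cm over 20 cm corresponds to the 8 dB total quoted above; for a segment with non-zero reflection the same routine also captures the ripple caused by multiple reflections.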

Figure 3-5. Picture of the 3D model and simulated loss at various line widths.

Table 3-1. Final air-cavity microstrip dimensions
W        S        t       H
40 µm    40 µm    5 µm    19 µm

Figure 3-6 compares the dielectric loss (in dB/cm) of the proposed air-cavity transmission line and the conventional FR4-based microstrip transmission line with the same conductor width and spacing. The air-cavity structure reduces the dielectric loss by around 26%.

Figure 3-6. Simulated dielectric loss of air-cavity and conventional transmission lines.

The effective dielectric constants are calculated from the simulated phase characteristic. The air-cavity structure reduces the effective dielectric constant by 25%, from 2.75 to 2.07.
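As a rough cross-check of how a lower effective dielectric constant translates into lower dielectric loss, the sketch below uses the standard textbook quasi-static microstrip relations: a filling factor q = (eps_eff - 1)/(eps_r - 1), an effective loss tangent q * eps_r / eps_eff * tan_delta, and a dielectric attenuation of pi * f * sqrt(eps_eff) / c times the effective loss tangent. These are assumed to correspond to the relations cited in Section 3.1, but they are not copied from it. The substrate values (eps_r = 4.4, tan_delta = 0.02) are assumptions, and a single-dielectric model ignores the overcoat layer, so only the ratio between the two structures is meaningful.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def filling_factor(eps_eff, eps_r):
    """Fraction of the field in the substrate (quasi-static approximation)."""
    return (eps_eff - 1.0) / (eps_r - 1.0)


def effective_loss_tangent(eps_eff, eps_r, tan_delta):
    """Loss tangent of the equivalent homogeneous medium."""
    return filling_factor(eps_eff, eps_r) * eps_r / eps_eff * tan_delta


def dielectric_loss_db_per_cm(f_hz, eps_eff, eps_r, tan_delta):
    """Dielectric attenuation in dB/cm."""
    alpha_np_per_m = (math.pi * f_hz * math.sqrt(eps_eff) / C0
                      * effective_loss_tangent(eps_eff, eps_r, tan_delta))
    return alpha_np_per_m * 20.0 / math.log(10.0) / 100.0  # Np/m to dB/cm


if __name__ == "__main__":
    eps_r, tan_delta, f = 4.4, 0.02, 5e9  # assumed FR4 properties
    conventional = dielectric_loss_db_per_cm(f, 2.75, eps_r, tan_delta)
    air_cavity = dielectric_loss_db_per_cm(f, 2.07, eps_r, tan_delta)
    print(1.0 - air_cavity / conventional)  # fractional reduction
```

With the simulated effective dielectric constants of 2.75 and 2.07, this simplified model predicts a reduction of roughly 30%, in the same range as the 26% obtained from the full-wave simulation.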

Figure 3-7 compares the simulated losses of conventional and air-cavity transmission lines. The loss of the air-cavity transmission line is 0.25 dB/cm at 3.125 GHz, which is 8% less than that of the conventional structure. Figure 3-8 shows the signaling power reduction with the air-cavity structure assuming FFE and DFE respectively. The improvement of the air-cavity topology becomes more pronounced at higher frequencies as the dielectric loss becomes more significant. For example, at 10 GHz, the loss improvement is nearly 15%, and for a 20 cm channel the signaling power is reduced by more than 10% with DFE and 16% with FFE. It is therefore expected that the air-cavity structure is especially attractive for future high-speed interconnects.

Figure 3-7. Improvement with air-cavity transmission line.

Figure 3-8. Signaling power reduction with air cavity. A) With FFE. B) With DFE.

3.3 Fabrication

Figure 3-9 illustrates the process flow for fabricating the proposed air-cavity interconnects. The process begins with electroplating the first copper pattern, representing the differential signal lines, on an FR4 substrate (Figure 3-9 (A)). Following this step, a sacrificial polymer layer is spin-coated to the desired thickness and patterned to act as a temporary placeholder in the formation of the air cavity (Figure 3-9 (B)). The sacrificial polymer contains polypropylene carbonate (PPC) (Novomer Inc., Ithaca, NY). A photoacid generator is added in order to obtain a photosensitive polymer mixture, and γ-butyrolactone (GBL) serves as the solvent. A similar formulation is available as Unity 2203P from Promerus LLC, Brecksville, OH. Two different approaches for patterning the PPC layer are studied: photo-patterning and self-patterning [43]. Photo-patterning uses a photomask. The PPC self-patterning process needs no photomask, and the slightly sloped sidewalls of the PPC patterns give the subsequent layers better step coverage. The copper ground layer is then patterned on top of the PPC patterns. The entire surface is then overcoated with Avatrel 8000P (functionalized polynorbornene) to hermetically seal the transmission line and provide mechanical support for the top ground copper layer (Figure 3-9 (C)). Unzipping of the PPC polymer backbone occurs upon heating up to 220 °C during the Avatrel overcoat curing, during which the solid PPC is converted to gaseous products. The gaseous products gradually permeate through the overcoat sidewalls and the openings in the ground layer patterns, leaving an air-cavity region of the same physical shape as the patterned PPC with little residue (Figure 3-9 (D)); thus the air-cavity transmission line structure is formed. The overcoat also serves as a solder mask for later die and cable attachments.

Figure 3-9. Fabrication process for the air-cavity structure.

Figure 3-10. Picture and cross section of the fabricated air-cavity structure.

Figure 3-10 (A) shows the picture of the finished air-cavity differential transmission lines. The ground plane is patterned in a grid style, with holes for gas release during PPC evaporation. Figure 3-10 (B) shows a cross section of the finished air-cavity structure.

3.4 Link Implementation

3.4.1 Link Architecture

Figure 3-11 shows the block diagram of the link. The RX has a common-gate (CG) preamp and a half-rate 1-tap speculative DFE. The TX consists of a half-rate $2^7-1$ PRBS core, a MUX, and an open-drain driver. To reduce signaling power, the back termination at the TX output usually found in high-speed serial links is removed in this design. For the same voltage swing seen by the RX, removing the back termination reduces the required signaling power by 50% because it doubles the impedance seen by the TX.

Figure 3-11. Link block diagram.

Channel equalization is primarily done by the DFE for better power efficiency, as discussed above. However, because DFE only cancels post-cursors, a 2-tap FFE is still

built in the TX driver for pre-cursor cancellation. Note that this TX FFE can also be configured for post-cursor cancellation, which facilitates the comparison between FFE and DFE in terms of power efficiency.

3.4.2 TX Design

The latches, multiplexers and drivers in the TX are all implemented in current-mode logic (CML) for fast operation and good power-supply noise immunity, as shown in Figure 3-12. Considering the fact that the pre-cursor is usually only a fraction of the main cursor, the pre-cursor driver is sized at half of the main cursor driver. The multiplexers are sized in such a manner that the signal path comprising the latch, the multiplexer and the driver has a uniform fan-out.

Figure 3-12. Schematics of the latch and multiplexer. A) Latch. B) Multiplexer.

Figure 3-13. Schematic of the 5-b DAC.

To facilitate debugging and testing, a serial interface is integrated on chip. The bias currents of all the gates are controlled with 5-b DACs, the schematic of which is shown in Figure 3-13.

3.4.3 RX Design

3.4.3.1 Preamp design

The RX consists of the CG preamp and the DFE. The CG frontend at the RX side serves multiple purposes. First, it provides low-to-high impedance transformation and increases the voltage swing seen by the following DFE stage. This accommodates a smaller input voltage swing, which is important for high power efficiency as discussed before. Second, it accomplishes level shifting of the input signal so that NMOS input stages can be used in the DFE. Third, the input impedance looking into the source of the CG amplifier provides partial impedance matching for the channel.

The most important design metrics of the CG preamp are bandwidth and gain, which are both closely related to power. With the bandwidth design target set to 67% of the data rate, or 4.2 GHz for 6.25 Gb/s NRZ signaling, the gain of the CG preamp is optimized for minimum link power. A higher preamp gain yields better RX sensitivity and lower signaling power, but requires more power for the preamp. For a given channel condition and technology, an optimum gain therefore exists that minimizes the total frontend power P_FE.

Figure 3-14. Preamp model for gain optimization.

Figure 3-14 shows a preamp model for gain optimization. For a given load capacitance, gain A, and 3 dB bandwidth, the following equations hold:

where W is the transistor width, R is the load resistance, and the remaining parameters are the transistor transconductance per unit width and the transistor drain capacitance per unit width. For each transistor current density, W and R can be solved and the amplifier current is found to be

Figure 3-15. Preamp design. A) Amplifier current vs. current density. B) Frontend power vs. preamp gain.

Figure 3-15 (A) plots the amplifier current as a function of current density at different gains in the target 0.13 µm CMOS technology when driving the four slicers of the DFE. For each

gain, there exists an optimum current density, and the optimum current density increases with increasing gain. Figure 3-15 (B) shows the signaling power, the preamp power and the frontend power at different gains with optimum current density over a channel with 9 dB Nyquist loss. The slicer sensitivity is 100 mV, and it is assumed that DFE is used and that back termination is removed. The minimum frontend power is attained when the preamp gain is around 4, and is about 50% lower than in the case without the preamp.

The frontend power is further reduced with a current-sharing frontend, as shown in Figure 3-11. By stacking the CG preamp and the open-drain TX driver, the tail current of the TX driver is reused by the RX amplifier. According to Figure 3-15 (B), this reduces the frontend power by nearly 50%. The fact that the TX driver is powered from the RX supply also helps to suppress the noise coupling from the TX supply.

Back termination is removed in this work to reduce signaling power. The downside of this practice is the risk of potential reflections due to TX impedance mismatch. To mitigate the effect of reflections, good impedance matching at the RX side must be maintained. Since the input impedance of the CG frontend is bias-dependent and non-linear, a programmable resistor is connected across the RX inputs to provide better matching, as shown in Figure 3-16 (A). The programmable range of the resistor is chosen so that differential impedance matching is maintained over a wide bias range between 0.5 mA and 5 mA, as shown in Figure 3-16 (B).

Figure 3-17 compares the RX eye diagrams with and without back termination. It can be seen that, as expected, removing the back termination nearly doubles the RX eye opening without

any noticeable degradation of the eye quality. Given the same RX sensitivity, this means the signaling power is reduced by nearly 50%.

Figure 3-16. Input impedance tuning. A) Schematic. B) Simulated result.

Figure 3-17. Simulated RX eye diagrams. A) With back termination. B) Without back termination.

To prevent RX sensitivity degradation due to small transistor sizes, offset cancellation is also built into the CG amplifier, as shown in Figure 3-11. The polarity and magnitude of the offset cancellation are both adjustable via digital control.

3.4.3.2 DFE design

The DFE employs a speculative architecture and half-rate clocking to ease the timing requirement. The slicers are implemented as CML latches with adjustable built-in offset, as shown in Figure 3-18. When the latch is in its amplification phase (CKP is HIGH), an auxiliary differential amplifier injects static current into the output nodes to introduce a desired offset. This is in contrast to [44], where the offset is introduced during the regeneration phase. This leads to more robust latch operation since the regenerative gain is not affected by the offset-injecting differential pair. Another highlight of the DFE design is that a single latch stage is employed before the selector, unlike [44] where a complete flip-flop is used. To account for different channel profiles, both the polarity and the magnitude of the offset-injecting current are programmable via an on-chip serial interface. The programmable range of the slicer threshold is simulated to be 140 mV, which is large enough to account for the different DFE tap weights required by different channel profiles.

Figure 3-18. Slicer schematic.
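The behavior of a 1-tap speculative DFE can be illustrated with a bit-level model. The sketch below is only a behavioral abstraction, not the circuit described here: it runs at full rate, treats the two offset slicers as ideal comparators with thresholds of plus and minus the tap weight, and lets a selector pick the result that matches the previous decision. The channel model, the function names and the assumed initial decision of -1 are hypothetical.

```python
def speculative_dfe(samples, alpha, initial=-1):
    """Behavioral 1-tap speculative DFE on +/-1 signaling.

    Two decisions are precomputed per sample, one for each possible previous
    bit, and the selector picks the one matching the actual previous decision.
    """
    decisions = []
    prev = initial
    for y in samples:
        if_prev_high = 1 if y > alpha else -1    # slicer with +alpha offset
        if_prev_low = 1 if y > -alpha else -1    # slicer with -alpha offset
        prev = if_prev_high if prev == 1 else if_prev_low
        decisions.append(prev)
    return decisions


def simple_slicer(samples):
    """Zero-threshold slicer without equalization, for comparison."""
    return [1 if y > 0 else -1 for y in samples]


def two_tap_channel(bits, main, post, initial=-1):
    """Hypothetical channel: y[n] = main * b[n] + post * b[n-1]."""
    out = []
    prev = initial
    for b in bits:
        out.append(main * b + post * prev)
        prev = b
    return out


if __name__ == "__main__":
    bits = [1, -1, -1, 1, 1, -1, 1, -1, -1, -1, 1]
    rx = two_tap_channel(bits, main=0.4, post=0.6)  # eye closed without DFE
    print(speculative_dfe(rx, alpha=0.6) == bits)
    print(simple_slicer(rx) == bits)
```

Because both candidate decisions are available before the previous bit is resolved, the feedback loop reduces to a selector, which is the timing relief that motivates the speculative architecture.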

The designs of the CML latches and multiplexers in the DFE are the same as in the TX except for sizing. Unlike the multiplexers in the TX, which see the large input capacitances of the pre-cursor and main cursor drivers, the multiplexers in the RX only see the CML latch inputs. Accordingly, they are sized the same as the latches to save power.

3.5 Experimental Results

To evaluate the performance of the proposed air-cavity structure, a test board is designed. The layout of the test board is shown in Figure 3-19. The center area is occupied by the active link, which includes footprints for a TX chip and an RX chip, and the air-cavity transmission lines. The rectangular board for the active link is cut using a dicing saw and interfaced with test equipment to evaluate overall link performance. CPW lines are used to connect the SMA connectors to the chip footprint.

Figure 3-19. Layout of the test board with the air-cavity active link.

The top and bottom areas of the test board are used to implement air-cavity test structures of various lengths. To improve measurement accuracy, open-short-thru de-embedding structures are also implemented. To facilitate processing, custom alignment marks are placed at multiple locations. The entire board footprint is designed to fit the in-house fabrication capabilities.

3.5.1 Air Cavity Transmission Line Measurement

The performance of the air-cavity transmission line was obtained by measuring a 5 cm test structure using a vector network analyzer with high-frequency probes. Figure 3-20 shows the measured loss and phase responses. The effective dielectric constant is calculated to be 1.7 from the measured phase, which is lower than predicted before. This is probably because the dielectric constant of the base material is lower (~3.9) than the 4.4 used in previous simulations. The lower dielectric constant also leads to higher line impedance, which causes ripples in the measured loss due to impedance mismatch [12].

Figure 3-20. Measured performances of a 5 cm air-cavity microstrip. A) Loss. B) Phase.
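The extraction of the effective dielectric constant from a measured phase response can be sketched as below. It assumes the usual relation for a matched line, in which the unwrapped transmission phase equals 2 * pi * f * L * sqrt(eps_eff) / c, and it ignores mismatch ripple; the function names and the example frequency are hypothetical and are not taken from the measurement.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def eps_eff_from_phase(phase_deg, f_hz, length_m):
    """Effective dielectric constant from the unwrapped S21 phase (degrees)."""
    phi = math.radians(abs(phase_deg))
    return (C0 * phi / (2.0 * math.pi * f_hz * length_m)) ** 2


def phase_deg_for(eps_eff, f_hz, length_m):
    """Forward model: unwrapped S21 phase (degrees) of a matched line."""
    return -360.0 * f_hz * length_m * math.sqrt(eps_eff) / C0


if __name__ == "__main__":
    # A 5 cm line evaluated at a hypothetical 5 GHz measurement point.
    phase = phase_deg_for(1.7, 5e9, 0.05)
    print(phase, eps_eff_from_phase(phase, 5e9, 0.05))
```

Since the phase must be unwrapped before use, in practice the slope of phase versus frequency over a band is a more robust input than a single frequency point.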

The true loss of the line (excluding the effects of impedance mismatch) is calculated from the extracted propagation constant using the technique in [45], and the result is shown in Figure 3-21. The loss is 0.28 dB/cm at 3.125 GHz, which readily meets our design goal. The simulation result is also overlaid for comparison, demonstrating good agreement between measurement and simulation.

Figure 3-21. Loss of the air-cavity line.

3.5.2 Link Measurement

The TX and RX test chips are fabricated in a 0.13 µm CMOS process. Figure 3-22 shows the chip micrographs. The TX and RX cores occupy 0.03 mm² and 0.02 mm² respectively. The test chips are wire-bonded to QFN packages and mounted on the test board with a 20 cm air-cavity interconnect. Figure 3-23 shows the picture of the populated test board with air-cavity lines in the center of the board.

Figure 3-22. Chip micrographs of the TX and the RX.

Figure 3-23. Picture of the populated test board.

Figure 3-24. Test setup.

The test setup is depicted in Figure 3-24. The TX and RX work mesochronously, deriving their clocks from the same signal generator, with their phase relationship adjusted by a mechanically tunable delay line. The full link operates successfully at 6.25 Gb/s with a half-rate input clock of 3.125 GHz. Figure 3-25 (A) and Figure 3-25 (B) show the measured single-ended eye diagrams at the outputs of the RX CG amplifier (driven off chip for testing purposes) before and after enabling the TX FFE respectively. The closed eye diagram is

successfully opened by enabling the TX FFE. Figure 3-25 (C) shows the eye diagram at the output of the DFE for a $2^7-1$ PRBS pattern, with the corresponding transient waveform shown in Figure 3-25 (D). The correct $2^7-1$ PRBS sequence is verified with both visual inspection and BER measurements.

Figure 3-25. Measured waveforms.

Figure 3-26 shows the measured RX bathtub curves and energy-per-bit performance with different equalization settings. At 6.25 Gb/s and a BER of $10^{-12}$, with only the TX FFE enabled, the eye opening is 30% UI. Enabling the RX DFE and disabling the TX FFE improves the eye opening to 37% UI, while the overall power efficiency improves from 0.9 to 0.6 mW/(Gb/s). Enabling both FFE and DFE further improves the horizontal eye opening to 56% UI but decreases the power efficiency. When the link is operated at 6.25 Gb/s with only the DFE enabled, the TX core, the current-sharing frontend, and the DFE dissipate 1.44 mW, 1.2 mW and 1.06 mW, respectively.
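The energy-per-bit figure follows directly from the power breakdown above, since 1 mW/(Gb/s) equals 1 pJ/bit. The short check below uses only the numbers quoted in this section; the function and variable names are hypothetical.

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Energy per bit in pJ; numerically equal to mW divided by Gb/s."""
    return power_mw / data_rate_gbps


if __name__ == "__main__":
    # Power breakdown with only the DFE enabled, in mW.
    tx_core, frontend, dfe = 1.44, 1.2, 1.06
    total_mw = tx_core + frontend + dfe
    print(total_mw, energy_per_bit_pj(total_mw, 6.25))
```

The three blocks add up to 3.7 mW, which at 6.25 Gb/s is slightly below 0.6 pJ/bit, consistent with the rounded value reported for the link.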

Figure 3-26. Measured link performances. A) RX bathtub curves. B) Power efficiency.

Table 3-2 summarizes the link performance in relation to a recently published paper. Compared to previously published results, a large portion of the TX and RX power is saved by using the current-sharing frontend.

Table 3-2. Performance summary
                   This work              [7]
Technology         0.13 µm                65 nm
Supply voltage     1.2 V                  1.0 V
Data rate          6.25 Gb/s              12.5 Gb/s
Front-end swing    125 mV                 100 mV
BER                1e-12                  1e-12
Horizontal eye     56% UI @ 6.25 Gb/s
Power              3.7 mW                 12 mW
Energy per bit     0.6 pJ/bit             0.98 pJ/bit
TX/RX core area    0.03 mm² / 0.02 mm²    0.24 mm² / 0.24 mm²

3.6 Summary

The bandwidth of the channel poses difficult challenges for high-speed serial links. At high frequencies, dielectric loss dominates over conductor loss. The design and

fabrication of the air-cavity transmission line structure is presented in this Chapter to reduce the dielectric loss. The measured effective dielectric constant is 1.73 and the loss is about 0.4 dB/cm. The air-cavity transmission lines are used in an active link. The active link features a low-power current-sharing frontend with a 1-tap speculative DFE. To further reduce power consumption, the back termination is also removed. The active link achieves successful 6.25 Gb/s operation and consumes 3.7 mW off a 1.2 V power supply, demonstrating the potential of the techniques for future low-power high-speed interconnects.

CHAPTER 4
A 4.5 Gb/s 12.4 mW RX WITH BAUD-RATE CDR

4.1 Chapter Overview

The receiver presented in Chapter 3 does not include CDR, an essential function in high-speed receivers as discussed in Chapter 2. CDR in high-speed serial links is usually achieved with oversampling. However, oversampling CDRs have a few issues. The first, explained in Chapter 2, is the requirement for power-hungry clock generation and distribution with sub-bit-time resolution. The second issue with oversampling CDR lies in its assumption that the maximum voltage margin occurs at the eye center [31]. When the input eye is horizontally asymmetric, locking to the eye center may lead to sub-optimal voltage margin. The third issue with oversampling CDR is that it tightens the already challenging settling-time requirement for the DFE [17] [46]. Because the input signal is oversampled, the time allowed for the DFE to settle is now less than one UI. Sampling at the eye edges may also require dedicated edge equalization, since the edge samples experience different ISI than the data samples, as shown in Figure 4-1. For low-power high-speed serial link design, a baud-rate CDR that circumvents these issues is therefore of interest.

Figure 4-1. Different ISI seen by the edge and data samples.

PAGE 77

In this Chapter we present an RX with a novel digital baud-rate eye-tracking CDR, which employs an auxiliary slicer (the CDR slicer) with an adjustable threshold voltage. By jointly updating the sampling phase and the threshold voltage of the CDR slicer, the CDR loop drives the decision point of the CDR slicer to the peak of the eye opening, and thus automatically locks to the maximum voltage margin point. Because the CDR slicer samples at exactly the same instant as the main data slicers, it does not interfere with DFE operation. We also present a majority-voting DFE architecture that replaces the selectors in a traditional speculative DFE with majority voters. Compared to a selector, a majority voter is more amenable to low-power and high-speed designs because it reduces the number of stacked transistor levels and presents equal delay to all data inputs. A majority voter also eliminates the need for a level shifter in bipolar designs. A receiver was implemented with the proposed CDR scheme and the majority-voting DFE. Details of the RX implementation are given in this Chapter, together with measurement results that confirm the correct function of both techniques.

4.2 Baud-Rate CDR

A few baud-rate CDR schemes have been proposed in the past. The Mueller-Muller CDR [47], used in several recently published serial link receivers [8] [46], operates by adjusting the clock phase so that the sampled pulse response satisfies a predefined timing criterion. However, this type of CDR does not necessarily ensure maximum voltage margin of the sampled eye at lock. The CDR in [48] improves the voltage sampling margin but is only suitable for integrating-type RX frontends. The baud-rate CDR in [7] relies on auxiliary slicers that have a larger sampling window than the main data slicers to keep the sampling phase away from the eye edges, but it does

PAGE 78

not take into account the voltage margin. Another baud-rate CDR, reported in [49], locks to the maximum voltage margin point, but requires analog slope detection circuitry and is therefore not as amenable to technology scaling and migration as digital solutions.

Figure 4-2. CDR block diagrams. A) Alexander CDR. B) Proposed baud-rate CDR.

Figure 4-2 shows the block diagrams of the Alexander CDR and the proposed baud-rate CDR. The Alexander CDR employs two slicers sampling half a UI apart, hence 2x oversampling. The PD in an Alexander CDR only produces information for updating the clock phase. The proposed CDR also employs two slicers (the main and CDR slicers). However, unlike in the Alexander CDR, these two slicers sample the input signal at the same time, so no oversampling is involved. The PD in the proposed CDR controls not only the clock phase but also the offset of the CDR slicer.
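For comparison with the proposed scheme, the textbook decision rule of an Alexander (bang-bang) phase detector can be sketched as follows. This is a minimal illustration of the conventional 2x-oversampling detector only, not of the proposed CDR; the function name and the early/late sign convention are assumptions made for this sketch.

```python
def alexander_pd(d_prev, edge, d_cur):
    """Textbook Alexander (bang-bang) phase detector decision.

    d_prev, d_cur: two consecutive data samples (0 or 1)
    edge:          sample taken half a UI between them (0 or 1)
    Returns -1 if the clock is early, +1 if it is late, and 0 when there
    is no data transition (sign convention assumed for illustration).
    """
    if d_prev == d_cur:
        return 0       # no transition, hence no timing information
    if edge == d_prev:
        return -1      # edge sample still shows the old bit: clock is early
    return +1          # edge sample already shows the new bit: clock is late
```

The extra edge sample is what requires the additional clock phase; the detector output carries phase information only, which is the contrast drawn in the text.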

PAGE 79

The algorithm of the proposed CDR drives the sampling point of the CDR slicer to the position with the maximum vertical eye opening. Since the CDR slicer and the main slicer are triggered by the same clock phase, this automatically locks the clock phase to the point with maximum voltage margin. The operation principle of the proposed CDR is explained with the help of the CDR truth table shown in Table 4-1. The table involves three consecutive outputs of the data slicer, the output of the CDR slicer sampled at the same time as the middle data bit, and the threshold voltage of the CDR slicer. The CDR takes action whenever the middle data bit is 1, tracking only the upper part of the eye. The discussion below therefore considers this case exclusively. If higher CDR bandwidth is desired, the lower portion of the eye can also be utilized using an additional CDR slicer.

Table 4-1. CDR truth table
Data bits    CDR slicer    Threshold update    Phase update
0 1 0        0                                 -
0 1 0        1                                 -
0 1 1        0
0 1 1        1
1 1 0        0
1 1 0        1
1 1 1        0             -                   -
1 1 1        1             -                   -
x 0 x        x             -                   -

Figure 4-3 (A) illustrates an example eye diagram. The upper portion of the eye is divided into five numbered regions by the waveform trajectories corresponding to input patterns (010), (011), (110) and (111). According to Table 4-1, the CDR updates only when the data pattern equals (010), (011) or (110), since pattern (111)

PAGE 80

does not contain any timing information. Assuming equal probability for pattern occurrences, the CDR behavior is summarized in Table 4-2 and Table 4-3 and is graphically depicted in Figure 4-3 (B), where the circles indicate possible decision points of the CDR slicer, the vertical arrows indicate the threshold updating direction, and the horizontal arrows indicate the clock phase updating direction. By inspecting Figure 4-3 (B), it can be seen that the CDR keeps the CDR slicer around the maximum eye opening position (denoted by a star). Since the CDR slicer and the DFE are clocked at the same phase, this automatically locks the DFE to the maximum voltage margin point.

The proposed CDR has a few noteworthy advantages. First, baud-rate operation saves clocking power by eliminating the need to generate extra clock phases for oversampling. Second, the CDR automatically locks to the point with maximum voltage margin without using any eye-opening monitor circuits. Third, the proposed CDR does not constrain the frontend interface to any particular architecture. Moreover, decimation of the CDR slicer output is easily accommodated in this CDR, whereas in some other schemes this may be constrained because they require consecutive CDR slicer results [46]. It should also be noted that the CDR slicer can be reused for equalization adaptation to reduce hardware and power overhead.

Figure 4-3. Operation principle of the proposed baud-rate CDR

PAGE 81

Table 4-2. Threshold update
Region    (010)    (011)    (110)    (111)    Total
1                                    -
2                                    -
3                                    -
4                                    -
5                                    -

Table 4-3. Clock phase update
Region    (010)    (011)    (110)    (111)    Total
1         -                          -        -
2         -                          -        -
3         -                          -
4         -                          -
5         -                          -        -

4.3 Majority-Voting DFE

DFE has been used extensively in high-speed links to compensate for inter-symbol interference (ISI) in band-limited electrical channels [12] [17] [16] due to its noise immunity and high signaling power efficiency, as explained in Chapter 2. To relax the stringent timing requirement, a speculative DFE architecture [19] [50] is often used. As shown in Figure 4-4, a 1-tap speculative DFE makes two tentative decisions, assuming the previous bit is +1 and -1 respectively, and then the correct decision is selected by the previous bit. The timing requirement for the DFE loop can be written as a constraint on the selector delay and on the delay and setup time of the CML DFF.
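The speculative decision process described above can be illustrated with a short behavioral model. This is a minimal sketch under stated assumptions: symbols are +1/-1, the received sample is modeled as x[n] = h0*d[n] + h1*d[n-1], and the names and the values h0 = 1.0 and h1 = 0.4 are chosen for illustration rather than taken from the design.

```python
def sign(v):
    """Return +1 for non-negative input, -1 otherwise."""
    return 1 if v >= 0 else -1

def channel(bits, h0=1.0, h1=0.4, d_init=-1):
    """Two-tap channel model: main cursor h0 plus one post-cursor h1 (assumed)."""
    prev, out = d_init, []
    for d in bits:
        out.append(h0 * d + h1 * prev)
        prev = d
    return out

def direct_dfe(samples, h1, d_init=-1):
    """Direct-feedback DFE: subtract h1 times the previous decision, then slice."""
    prev, out = d_init, []
    for x in samples:
        prev = sign(x - h1 * prev)
        out.append(prev)
    return out

def speculative_dfe(samples, h1, d_init=-1):
    """Speculative DFE: slice against both thresholds, then select."""
    prev, out = d_init, []
    for x in samples:
        d_plus = sign(x - h1)    # tentative decision assuming previous bit = +1
        d_minus = sign(x + h1)   # tentative decision assuming previous bit = -1
        prev = d_plus if prev == 1 else d_minus   # selection by the previous bit
        out.append(prev)
    return out
```

In this model both structures return the same decisions; the speculative form moves the subtraction out of the feedback loop, so only the selection remains in the critical path.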

PAGE 82

Figure 4-4. Block diagram of a 1-tap speculative DFE

From Equation 4-2, the selector and flip-flop delays in the critical timing path determine the maximum operating speed of the 1-tap speculative DFE. While significant work has been published on CML latches/FFs [51] [52], the following observations can be made regarding the operation of the CML selector shown in Figure 4-5. First, because the selection of the current bit decision is made by series-connecting the previous bit, the CML selector employs three transistors in the stack (including the tail current source), and is therefore not optimal for low-voltage/low-power designs. Second, to maximize the timing margin of the critical DFE feedback loop, it is desirable to minimize the delay from the previous-bit input to the selector output, yet in Figure 4-5 the previous-bit input experiences the largest delay among the three inputs. The third issue concerns the common-mode level of the previous-bit input: since it is supplied from a CML latch, its common-mode level is close to VDD, and this may necessitate an explicit level-shifting stage, which incurs power and speed overhead (especially in bipolar implementations [53]).

Figure 4-5. Schematic of a CML selector

PAGE 83

Table 4-4. Selector truth table
1 1 1 +1 1 1 1 +1 1 1 1 +1 1 +1 +1 1 +1 +1 1 1 +1 1 1 +1 1 +1 1 +1 1 +1 +1 +1 1 +1 +1 +1 +1 +1 1 +1

Table 4-4 shows the truth table of the CML selector in a speculative DFE. Note, however, that in low-pass electrical channels the two pulse-response coefficients are both positive, and thus the feedback tap weight in the DFE always tends negative. This implies that one combination of the two tentative decisions in the truth table does not occur (indicated in gray), and inverting the corresponding row outputs therefore does not affect the DFE function. Thus the truth table can be rewritten as shown in Table 4-5, and the output can be expressed as the sign of a sum over the three inputs (Equation 4-2), where sgn(·) denotes the sign of the operand.

Figure 4-6. Proposed majority voter schematic

Equation 4-2 can be readily implemented with a majority voter, as shown in Figure 4-6. Compared to the selector in Figure 4-5, the majority voter obviates the disadvantages mentioned previously. The number of transistors in the stack is reduced from

PAGE 84

three to two, making the majority voter more amenable to low-voltage designs. The majority voter is fully symmetric with respect to the three inputs, and as a result, the critical input-to-output delay is identical for all inputs. Moreover, no level shifting is required for the previous-bit input.

Table 4-5. Majority voter truth table
1 1 1 +1 1 1 1 +1 1 1 1 +1 1 +1 +1 1 +1 +1 1 1 +1 1 1 +1 +1 +1 1 +1 1 1 +1 +1 1 +1 +1 +1 +1 +1 1 +1

Figure 4-7 (A) compares the simulated delay from the previous-bit input to the output for a selector and a majority voter. The input transistors are of the same size, the single-ended input swing is 300 mV, the fan-out is assumed to be two, and the supply is set to 1.2 V. The load resistors are adjusted so that both the selector and the majority voter have a small-signal gain of one. The delay of both the selector and the majority voter decreases with larger current densities and higher transistor fT, and saturates as fT reaches its maximum. For equal current densities the majority voter exhibits ~50% less delay. Figure 4-7 (B) shows the overall DFE loop delay using the proposed majority voter and the traditional selector. In this comparison, the latches in the DFF are biased with equal current density in both cases. The majority-voter-based DFE shows >10% improvement in delay over a wide range of current densities. Further improvement can be achieved by increasing the current density bias point and the speed of the CML DFFs.
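The replacement of the selector by a majority voter can be checked exhaustively in a small model. The sketch below assumes a positive post-cursor coefficient h1 and tentative decisions sliced at +h1 and -h1; under this assumed convention the vote takes the inverted previous bit, which in a differential circuit amounts to swapping the signal polarity. The polarity convention of the original expression may differ, and all names here are illustrative.

```python
def sign(v):
    """Return +1 for non-negative input, -1 otherwise."""
    return 1 if v >= 0 else -1

def tentative(x, h1):
    """Tentative decisions assuming the previous bit was +1 and -1 (h1 > 0)."""
    return sign(x - h1), sign(x + h1)

def selector(d_plus, d_minus, d_prev):
    """Conventional selection by the previous bit."""
    return d_plus if d_prev == 1 else d_minus

def majority(a, b, c):
    """Three-input majority vote; the sum of three +/-1 values is never zero."""
    return sign(a + b + c)

def check_equivalence(h1=0.4, steps=201):
    """Sweep the input level and compare selector and majority voter outputs."""
    for i in range(steps):
        x = -2.0 + 4.0 * i / (steps - 1)
        d_plus, d_minus = tentative(x, h1)
        # With h1 > 0 the pair (d_plus=+1, d_minus=-1) can never occur.
        assert not (d_plus == 1 and d_minus == -1)
        for d_prev in (-1, 1):
            assert selector(d_plus, d_minus, d_prev) == majority(d_plus, d_minus, -d_prev)
    return True
```

Because one input combination never occurs, the two truth tables differ only in rows that are never exercised, which is why the simpler symmetric gate can stand in for the selector.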

PAGE 85

Figure 4-7. Simulated delay. A) Selector and majority voter. B) Overall DFE loop.

Figure 4-8 (A) shows the selector and the majority voter delay as a function of bias current. Although the majority voter has three static tail current paths compared to the single current bias leg of the selector, the overall current consumption to achieve the same delay is comparable. This is because the majority voter requires a lower current density than the selector to achieve the same speed. That is, the majority voter has a lower effort delay [54] and thus exhibits higher power efficiency. This can be related to the majority voter having one transistor less in the stack, which also enables operation at lower supply voltages, as shown in Figure 4-8 (B). A comparison of the selector and majority voter delay normalized to their respective delays at the

PAGE 86

nominal supply voltage of 1.2 V shows that 1) the majority voter is significantly less sensitive to supply voltage variation and 2) it can operate at a lower supply voltage. For instance, the selector delay quickly degrades below 0.8 V, while the majority voter exhibits a more gradual degradation below 0.6 V.

Figure 4-8. Simulated selector and majority voter performance. A) Delay vs. total bias current. B) Normalized delay variation with supply voltage (VDD) for a current density of 100 µA/µm.

4.4 Chip Implementation

4.4.1 Architecture

Figure 4-9 shows the block diagram of the RX core. The input data is sampled by a half-rate 1-tap speculative DFE and a CDR slicer. The DFE output is then de-multiplexed by 8, whereas the CDR slicer output is decimated by 8. A CDR logic block

PAGE 87

processes the outputs of the DFE and the CDR slicer according to the CDR algorithm described above, and updates both the threshold of the CDR slicer through a 6 b DAC and the clock phase through a phase interpolator (PI). The I/Q inputs to the PI are generated by dividing down a full-rate external clock. To minimize power, the RX employs high-speed CML circuits only in the first two stages and static CMOS logic for the later stages, as shown in Figure 4-9. In addition, the data output of the CDR slicer is decimated by 8 instead of being fully de-multiplexed. Although this decimation reduces the CDR bandwidth, experimental results reported in the following sections confirm that the CDR bandwidth is sufficiently large for plesiochronous chip-to-chip interconnects. All blocks are built with custom layout except the CDR logic block, which is synthesized with standard cells.

Figure 4-9. Block diagram of the RX

PAGE 88

4.4.2 Slicer

The slicer is implemented as a CML latch with digital offset control, as shown in Figure 4-10, where all transistors without length annotation are of minimum channel length. During pre-amplification mode, a current is injected into the output nodes to introduce a desired offset. To reduce power supply noise, the offset injection current is kept active even when the slicer is in regeneration mode. Both the polarity and the magnitude of the injected current are controlled through the serial interface. An important design parameter of the slicer is the offset tuning range, which must be large enough to override the intrinsic slicer offset while generating the desired DFE tap weight. Figure 4-11 (A) shows the simulated offset of the slicer, while the simulated offset tuning characteristic of the slicer is shown in Figure 4-11 (B) when the sign of the offset is set to 1. The slicer offset is 34 mV, and the offset tuning range is 220 mV. With 6 b digital control, this gives a maximum DFE tap weight of nearly 200 mV with a nominal step of 3 mV.

Figure 4-10. Schematic of the slicer with threshold control

PAGE 89

Figure 4-11. Simulated slicer performance. A) Slicer offset. B) Offset tuning.

4.4.3 DMUX

The DMUX is constructed by cascading 1:2 DMUX cells. Figure 4-12 illustrates the schematics of the latch-based CML and CMOS 1:2 DMUX cells together with their transistor-level details. The CML latch has the same topology as the slicer, except that it does not have the offset adjustment. Also note that the bias current and the transistor sizes are reduced by 50% since offset is not critical. The CMOS latches are implemented as sense-amplifier flip-flops (SAFFs).
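The cascading of 1:2 cells can be pictured with a behavioral model. This is an illustrative sketch only: the function names are hypothetical, the model ignores latch timing and retiming, and it simply shows how each stage halves the rate and which input bits end up on which output lane.

```python
def demux2(stream):
    """One 1:2 DMUX cell: route alternating bits to two half-rate outputs."""
    return stream[0::2], stream[1::2]

def demux_tree(stream, stages):
    """Cascade 1:2 cells; returns 2**stages lanes, each at 1/2**stages rate."""
    lanes = [list(stream)]
    for _ in range(stages):
        next_lanes = []
        for lane in lanes:
            even, odd = demux2(lane)
            next_lanes.extend([even, odd])
        lanes = next_lanes
    return lanes

# Three cascaded stages give a 1:8 DMUX. Lane k of this simple tree starts
# with the input bit whose index is k with its three bits reversed.
lanes = demux_tree(range(16), 3)
first_bits = [lane[0] for lane in lanes]
```

A practical consequence of a plain binary tree is that the output lanes appear in bit-reversed order, so the lanes must be relabeled (or rewired) to restore the natural bit order.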

PAGE 90

Figure 4-12. Schematics of the CML and CMOS DMUX cells

4.4.4 Clocking

The clocking circuitry generates clocks for the DFE and the DMUX. A full-rate external clock is first divided down by a CML divider to obtain I/Q clocks, as shown in Figure 4-13. Since phase inversion is simply a swap of the differential signal polarity, I and IB are obtained simultaneously. The same is true for Q and QB.

Figure 4-13. Schematic of the divider for I/Q generation

A phase interpolator (PI) combines the I/Q clocks with digitally controlled weights to adjust the receiver sampling phase. The principle of the PI is depicted in Figure 4-14. Phase interpolation is achieved by combining the I/Q clock phases with different

PAGE 91

weightings. Figure 4-15 shows the schematic of the PI, which consists of four differential pairs. Phase tuning is achieved by adjusting the tail currents of the four differential pairs. To guarantee monotonicity, the tail current in each differential pair is split into eight identical current sources, and the binary phase control word PI[5:0] is converted to thermometer code W[0:31] to control the 32 current sources. With this half-rate architecture, the phase resolution of the PI is 1/16 UI.

Figure 4-14. Principle of PI

Figure 4-15. Schematic of the phase interpolator
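The effect of linear I/Q weighting on phase step uniformity can be estimated with an idealized model. The sketch below is an assumption-laden illustration, not the implemented circuit: it assumes perfectly sinusoidal clocks, eight equal weight steps per quadrant (32 codes per clock period, i.e. per 2 UI in a half-rate receiver), and a hypothetical mapping of the code to a quadrant and a step within it.

```python
import math

STEPS_PER_QUADRANT = 8
N_CODES = 4 * STEPS_PER_QUADRANT      # 32 codes per clock period
UI_PER_CLOCK_PERIOD = 2               # half-rate clock: one period spans 2 UI

def pi_phase_deg(code):
    """Output phase of an ideal linear-weight I/Q interpolator (degrees)."""
    q, s = divmod(code % N_CODES, STEPS_PER_QUADRANT)
    w_b = s / STEPS_PER_QUADRANT      # weight of the later clock phase
    w_a = 1.0 - w_b                   # weight of the earlier clock phase
    th_a = math.radians(90 * q)
    th_b = math.radians(90 * (q + 1))
    x = w_a * math.cos(th_a) + w_b * math.cos(th_b)
    y = w_a * math.sin(th_a) + w_b * math.sin(th_b)
    return math.degrees(math.atan2(y, x)) % 360.0

def dnl_lsb():
    """Differential nonlinearity of each code step, in LSB."""
    lsb = 360.0 / N_CODES
    phases = [pi_phase_deg(c) for c in range(N_CODES)]
    steps = [(phases[(c + 1) % N_CODES] - phases[c]) % 360.0
             for c in range(N_CODES)]
    return [st / lsb - 1.0 for st in steps]

nominal_step_ui = UI_PER_CLOCK_PERIOD / N_CODES   # 0.0625 UI = 1/16 UI
```

In this ideal model the steps are smallest near the quadrant boundaries and largest mid-quadrant, and the pattern repeats every eight codes. A measured characteristic will also include device mismatch and non-sinusoidal clock shapes, which this sketch does not capture.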

PAGE 92

The output of the PI is further divided down to clock the DMUX. Figure 4-16 shows the level converter schematic used to convert CML logic levels to full-swing CMOS for clocking the SAFFs. The CML clock is AC-coupled to inverters with resistive feedback. The feedback resistor and coupling capacitor values are chosen so that the lower cut-off frequency is well below the target clock frequency.

Figure 4-16. Level converter schematic

4.5 Experimental Results

The receiver chip was implemented in 0.13 µm CMOS, mounted in a QFN package and assembled on an FR4 test board. Figure 4-17 shows the die micrograph along with a picture of the test board. The receiver core occupies an area of 0.14 mm^2.

Figure 4-17. Die micrograph and board picture

PAGE 93

Figure 4-18 depicts the measurement setup. A PRBS generator and a 20-inch differential microstrip FR4 channel were used to validate the receiver. The PRBS generator and the RX were clocked by two different RF sources. When evaluating the DFE, the two RF sources were synchronized and the RX CDR was disabled; otherwise they ran independently with the CDR loop enabled. Phase modulation (PM) was added for the jitter tolerance measurement. The recovered data was monitored using a BERT and a high-speed sampling oscilloscope. Measurements were performed up to 4.5 Gb/s with a 2^7-1 PRBS pattern, limited at higher data rates by equipment capability.

Figure 4-18. Test setup

Figure 4-19 shows the measured channel insertion loss and the resulting eye diagram at 4.5 Gb/s, showing complete eye closure due to severe ISI. The loss at the Nyquist frequency is 22 dB. The measured bathtub curves at different DFE settings are shown in Figure 4-20; they were obtained by sweeping the PI control code while monitoring the receiver BER. Without DFE, error-free operation was not possible. The eye opening enlarges with increasing DFE settings, and decreases due to over-equalization after reaching the maximum eye opening. The peak eye opening is 0.5 UI. Figure 4-21 (A) shows the measured PI linearity. The minimum DNL of -0.64 LSB indicates monotonic operation, as guaranteed by the thermometer coding. The maximum DNL is 1.5 LSB, giving a maximum phase step of 0.09 (=1.5/16) UI. The

PAGE 94

repetitive DNL and INL patterns are due to the use of a simple I/Q interpolation scheme [55].

Figure 4-19. Measured channel performance. A) Loss. B) Eye diagram.

The CDR function was evaluated by setting the frequency of the PRBS generator slightly different from that of the RX clock source. The CDR lock range was measured to be 100 ppm, confirming plesiochronous operation even though the CDR bandwidth is low due to decimation. The histogram of the recovered clock at the limit of the lock range is shown in Figure 4-21 (B). The RMS jitter is 13 ps. The jitter is relatively high because the clock output buffer chain shares the same power domain with the noisy digital circuitry.
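To put the measured lock range in perspective, the average phase-update rate needed to follow a frequency offset can be estimated with simple arithmetic. The sketch below assumes a nominal PI step of 1/16 UI (as implied by the phase-step arithmetic in the text) and ignores loop latency and dithering; the function name is hypothetical.

```python
def bits_per_phase_step(offset_ppm, step_ui):
    """Average number of bits between net phase steps needed to track a
    frequency offset, given that the phase drifts offset_ppm * 1e-6 UI per bit."""
    drift_ui_per_bit = offset_ppm * 1e-6
    return step_ui / drift_ui_per_bit

# A 100 ppm offset with 1/16 UI steps needs one net step every ~625 bits.
n_bits = bits_per_phase_step(100, 1 / 16)
```

On average only one net phase step per several hundred bits is needed at this offset, which is consistent with the observation that a decimated, low-bandwidth loop can still track it.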

PAGE 95

Figure 4-20. Measured DFE performance. A) Bathtub curves. B) Eye openings.

Figure 4-21. CDR measurement results. A) PI linearity. B) Recovered clock.

Jitter tolerance of the CDR was measured by phase modulating the clock of the PRBS generator and recording the modulation depth at which bit errors occurred. The measured jitter tolerance is shown in Figure 4-22. Below 30 kHz jitter frequency, the jitter tolerance is larger than 1 UI.

PAGE 96

Figure 4-22. Measured CDR jitter tolerance

The RX core consumes 12.4 mW from a 1.2 V supply, which translates to an FoM of 2.75 pJ/bit. Table 4-6 shows the performance summary.

Table 4-6. Performance summary
Input Data Rate     4.5 Gb/s
De-multiplexing     1:16
Equalization        1-tap speculative DFE
Clock Recovery      Baud-rate eye tracking
Power Supply        1.2 V
Power               12.4 mW
Process
Area
FoM                 2.8 pJ/bit

4.6 Summary

Traditional oversampling CDR involves a few design issues, including the requirement for power-hungry generation and distribution of clocks with sub-bit-time resolution, the stringent constraint on the settling time of the DFE, and the possibility of sub-optimal equalization of edge samples. It also locks to the center of the eye regardless of the specific eye shape, potentially leading to degraded voltage margins. Various baud-rate CDRs have been proposed over the years. However, they either do not take into account the voltage margin, still require sampling at instants other than the data

PAGE 97

sampling instants, entail analog circuitry for slope detection, or are only suitable for integrating-type frontends. In this Chapter, we propose a novel digital baud-rate eye-tracking CDR scheme that obviates the above disadvantages. It employs a CDR slicer in parallel with the main slicers, and the CDR algorithm controls both the clock phase and the threshold voltage of the CDR slicer to drive the decision point of the CDR slicer to the peak of the eye opening. Since the CDR slicer shares the same clock phase as the main slicer, this automatically locks the RX to the point with the maximum eye opening.

A majority-voting DFE architecture is also presented in this Chapter, wherein the selectors in a speculative DFE are replaced with majority voters. The majority voter has one less level of transistors in the stack, and is therefore more amenable to low-power and high-speed designs than a selector. It also reduces the DFE loop delay due to its structural simplicity. Furthermore, the majority-voting DFE obviates the need for a level shifter in bipolar designs.

Experimental results confirm the effectiveness of the proposed CDR scheme and the majority-voting DFE. Implemented in 0.13 µm CMOS, the RX works reliably at 4.5 Gb/s while consuming 12.4 mW. Higher data rates are limited by the measurement equipment. The CDR displays a lock range of 100 ppm, and the DFE is able to equalize a channel with 22 dB Nyquist loss while producing a 50% UI equalized eye opening.
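The energy-per-bit figures quoted in this and the previous Chapter follow directly from power divided by data rate. The short sketch below reproduces that arithmetic; the function name is illustrative.

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Energy per bit in pJ/bit. Dividing mW by Gb/s gives pJ/bit directly,
    since 1e-3 W / 1e9 bit/s = 1e-12 J/bit."""
    return power_mw / data_rate_gbps

rx_ch4 = energy_per_bit_pj(12.4, 4.5)    # about 2.76 pJ/bit (4.5 Gb/s RX)
link_ch3 = energy_per_bit_pj(3.7, 6.25)  # about 0.59 pJ/bit (6.25 Gb/s link)
```

The two results round to the 2.8 pJ/bit and 0.6 pJ/bit values reported in the performance summaries.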

PAGE 98

CHAPTER 5
A 5 Gb/s 0.75 pJ/BIT VOLTAGE-MODE TRANSCEIVER

5.1 Chapter Overview

Chapter 3 and Chapter 4 apply some of the results from Chapter 2 to improve the link power efficiency at the architecture level, namely the removal of back termination, the channel loss reduction with air-cavity transmission lines, and the use of DFE and baud-rate CDR. A few circuit techniques are also employed in Chapters 3 and 4, such as the current-sharing frontend and the majority-voting speculative DFE. The 6.25 Gb/s transceiver in Chapter 3 achieves 0.6 pJ/bit power efficiency without CDR, whereas the 4.5 Gb/s RX in Chapter 4 achieves 2.8 pJ/bit including CDR and clocking circuitry. Based upon these results, this Chapter attempts to build a complete transceiver with better power efficiency in the same technology.

To attain this goal, the transceiver employs a combination of architectural improvements and circuit techniques. One major improvement is the signaling mode. The transceiver uses voltage-mode signaling with differential termination in place of the current-mode signaling used in the air-cavity active link in Chapter 3. According to Chapter 2, this reduces the signaling power by 75%. The other major improvement is the exclusive use of static CMOS logic gates instead of the CML logic gates in Chapters 3 and 4. This avoids the static current consumption of the CML gates, since the CMOS gates only consume power during state transitions. To further improve the power efficiency, the RX operates from a 1 V power supply instead of the nominal 1.2 V power supply. To cope with the resulting speed degradation of the gates, the slicers are heavily parallelized and a look-ahead selection tree

PAGE 99

is used in the DFE. Heavy parallelism in the frontend also saves power by eliminating the need for an explicit DMUX. The RX in this Chapter uses the same baud-rate CDR algorithm as presented in Chapter 4. However, further decimation is applied to reduce the power consumption. An injection-locked ring oscillator is used for clock generation to avoid the power overhead of a PLL or DLL. In place of the PI used for phase rotation in Chapter 4, a delay line is used to adjust the injection clock phase so that the RX clock phases can be moved simultaneously. The result is a complete 5 Gb/s transceiver in a 0.13 µm bulk CMOS process with 3.7 mW power consumption. This translates to a power efficiency of 0.75 pJ/bit, which is among the best reported to date.

5.2 TX Implementation

5.2.1 TX Architecture

Figure 5-1 shows the TX block diagram. A full-swing restorer (FSR) converts the output from a CML PRBS generator (reused from a previous design) to full-swing CMOS logic levels. A tapered inverter chain acts as a pre-driver between the FSR and the VM driver. To preserve high speed, the fan-out of the pre-driver is designed to be two. An on-chip LDO generates the supply V_DRV for the VM driver from the unregulated chip supply.

PAGE 100

Figure 5-1. TX block diagram

5.2.2 PRBS Generator

Figure 5-2 shows the block diagram of the PRBS generator. It consists of a clock buffer, a PRBS core, a buffer, and an all-zero detector. This PRBS generator is reused from a previous design, and all the buffers and gates are implemented in fully differential CML, although the drawing is single-ended for simplicity. The PRBS core is a linear feedback shift register (LFSR) comprising 14 D latches clocked at 2.5 GHz. The linear feedback through the XOR gates implements the polynomial X^7 + X^6 + 1 to generate a 2^7-1 maximum-length sequence. A half-rate architecture is chosen for easier clock distribution [56]. The two 2.5 Gb/s PRBS streams with proper phase shift are multiplexed to obtain the 5 Gb/s PRBS.

Figure 5-2. PRBS block diagram

PAGE 101

One well-known design issue in a PRBS generator is the all-zero state of the LFSR, which will circulate indefinitely once the LFSR falls into it. To prevent this from happening, [57] and [58] use a reset signal to manually insert a one into the LFSR. This solution will not work if the LFSR accidentally falls into the all-zero state during normal operation (for instance due to a power supply disturbance). A better solution is to monitor the LFSR and automatically reset it if an all-zero state is detected. [59] uses logic gates to detect the all-zero state, which is complex and timing-critical. [60] instead detects the average DC level of the LFSR outputs. Although this solution is not timing-critical, it still needs additional routing for all the LFSR outputs and thereby incurs extra loading and complicates the layout. It is not necessary, however, to monitor all the LFSR outputs to detect the all-zero state. Instead, monitoring the final generator output suffices. This avoids the loading and layout complication.

Figure 5-3 shows the all-zero detection used in this work. The RC filter has a cut-off frequency of 2 MHz and filters out the high-frequency component. Since a PRBS is nearly DC-balanced, P and N should have nearly the same DC voltages. When the LFSR falls into the all-zero state, however, P will have a lower DC voltage than N, and the comparator senses this condition and resets the LFSR. Figure 5-4 shows the schematic of the self-biased comparator. For robust operation, the comparator has a built-in offset of roughly 60 mV so that it will not activate the reset during normal operation. Figure 5-5 shows the simulated waveforms of the all-zero detector. At start-up, the PRBS is stuck at the all-zero state. The detector senses this condition and a reset is initiated.
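The lock-up behavior and the DC-balance property exploited by the detector can be reproduced with a small software model. This is a sketch under stated assumptions: a 7-bit Fibonacci LFSR with feedback from stages 7 and 6 (one common arrangement for a 2^7-1 sequence), a full-rate serial output rather than the half-rate multiplexed structure, and a crude averaging check standing in for the RC filter and offset comparator. The threshold value and all names are hypothetical.

```python
def lfsr7_step(state):
    """Advance a 7-bit Fibonacci LFSR with feedback from stages 7 and 6."""
    new_bit = ((state >> 6) ^ (state >> 5)) & 1
    return ((state << 1) | new_bit) & 0x7F

def prbs7(seed=0x7F, nbits=127):
    """Return nbits of output (MSB of the register) and the final state."""
    state, out = seed & 0x7F, []
    for _ in range(nbits):
        out.append((state >> 6) & 1)
        state = lfsr7_step(state)
    return out, state

def needs_reset(bits, threshold=0.25):
    """Stand-in for the filter and comparator: flag a stream whose average
    level is far below the ~0.5 average of a DC-balanced PRBS."""
    return sum(bits) / len(bits) < threshold

good_bits, good_state = prbs7(seed=0x7F)   # normal operation
stuck_bits, stuck_state = prbs7(seed=0)    # all-zero state never leaves zero
```

One period of the sequence contains 64 ones and 63 zeros, so its average sits close to one half, while the stuck stream averages zero; a margin between the two (the comparator offset in the circuit, the threshold here) keeps normal operation from triggering a reset.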

PAGE 102

Figure 5-3. All-zero detector

Figure 5-4. Schematic of the self-biased comparator with offset

Figure 5-5. Simulated waveforms confirming the function of the all-zero detector

5.2.3 LDO

The LDO powers the TX driver for better supply noise rejection and also provides a convenient means of adjusting the TX output swing. For a single-ended output swing of 100 mV, the driver current consumption is 1 mA with differential RX termination. With

PAGE 103

its chosen width, the pass element is large enough to source 10 mA to support larger swings in measurement. The error amplifier is a simple two-stage opamp. The dominant pole is located at the V_DRV node due to the large decoupling capacitor. Figure 5-6 shows the stability simulation results. The phase margin is 72 degrees.

Figure 5-6. Stability of the LDO

5.2.4 TX Driver

Since the targeted TX swing is less than 100 mV, the TX employs an N-over-N VM driver [6] [39], as shown in Figure 5-1. Exclusive use of NMOS in the driver reduces the input capacitance, and therefore the pre-driver power consumption, compared to an inverter driver [61]. The transistors are sized for 50 Ω on-resistance for proper channel back

PAGE 104

termination. Note that the top NMOS is sized slightly larger than the bottom one since it sees less overdrive voltage.

5.3 RX Implementation

5.3.1 RX Architecture

Figure 5-7 depicts the receiver block diagram with differential termination. Because the TX output has a low common-mode voltage, the input signals V_P and V_N are first shifted up to enable the use of NMOS transistors at the input of the slicers. Equalization is done with a 1-tap speculative DFE for its high signaling efficiency compared to TX FFE [16]. A bank of 32 slicers performs digitization and direct 1:16 de-multiplexing. The slicer outputs are then synchronized, and 17 of them are selected to accomplish DFE. The ILRO, locked to a 312.5 MHz external source, generates 16 clock phases CK[0:15] for the slicer bank. The CDR logic extracts timing information from the 17 bits and adjusts the phase of the injection clock to track the maximum eye opening.

Figure 5-7. RX block diagram

PAGE 105

5.3.2 Slicer Design

The most important design goals of the slicer include power, speed and sensitivity. The slicers are implemented as SAFFs to avoid static power consumption, as shown in Figure 5-8. With 16-way interleaving, the speed requirement on the slicer is much relaxed, leaving its sensitivity as the focus of design optimization. One factor that impacts the slicer sensitivity is transistor mismatch. To reduce the input capacitance and power consumption, the slicers are sized to near minimum. As a result, the simulated 1σ offset is 38 mV. To improve RX sensitivity, all the slicers have 8 b offset trimming. The trimming range is designed to be ±160 mV, yielding a trimming resolution of 1.25 mV.

Figure 5-8. Schematic of the slicer

Another factor that impacts the slicer sensitivity is hysteresis, including the hysteresis due to incomplete resetting of the SA core and the hysteresis due to the imbalanced input capacitances of the RS latch that follows the SA core [62]. With heavy front-end parallelism, the SA core has enough time to completely reset, and no hysteresis is observed due to the SA core. To remove the hysteresis due to the imbalanced RS latch input capacitance, a buffer stage is inserted between the SA core and the RS latch, as shown in Figure 5-8. Simulation indicates that without this buffer

PAGE 106

stage, the slicer has a hysteresis of 30 mV, whereas inserting the buffer stage makes the hysteresis negligible.

5.3.3 Level Shifting and DFE Tap Generation

The slicers use NMOS input transistors for faster operation. However, the RX input has a common-mode level close to ground due to the use of VM signaling. A level shifter is therefore required before the slicers to shift up the input signals by V_LS. Level shifting can be accomplished with an AC coupling capacitor [63] or a common-gate (CG) amplifier [64], as shown in Figure 5-9 (A) and (B). AC coupling does not consume power but cuts off the low-frequency component of the input signal. On the other hand, a CG amplifier provides DC coverage but dissipates excessive power due to the stringent bandwidth requirement. This is especially true when driving the large input capacitance of the heavily parallelized slicer bank. Figure 5-9 (C) shows the basic idea of the proposed level shifter, which combines the advantages of both: a capacitor provides a high-frequency signal path while a source follower enables DC coverage.

Figure 5-9. Level shifters. A) Capacitor-based. B) CG-amp-based. C) Proposed.

Figure 5-10 shows the detailed schematic of the level shifter. The AC coupling capacitor is implemented as an NMOS transistor with source and drain shorted to the input. The shifting voltage is adjusted by tuning VB. To control the low-frequency gain, the source follower is broken into 4 identical segments, with the input of each segment

PAGE 107

switchable between the input and the common-mode voltage by GAIN[3:0]. When all four inputs are switched to the common-mode voltage (GAIN=0), the DC path of the level shifter is disabled. Figure 5-11 shows the simulated frequency response of the level shifter at different gain settings. When the DC path is disabled, the level shifter has a low cut-off frequency of 3 MHz. Because of its much relaxed bandwidth requirement, the source follower consumes little power.

Figure 5-10. Detailed schematic of the level shifter

Figure 5-11. Simulated frequency response of the level shifter at different gain settings

The level shifters also provide a convenient means of generating the DFE tap. This is achieved by introducing an offset in the shifting voltages of V_P and V_N, as shown in Figure 5-7. Generating the tap with the slicer offset trimming instead would have required too large a slicer trimming range when the input swing is high.

5.3.4 DFE with Look-Ahead Selection Tree

The slicer bank is implemented using a 16-way parallel architecture to relax the speed requirement and avoid the added power consumption of an explicit demultiplexer. A critical issue in the speculative DFE is the stringent timing constraint, which arises when decisions are selected based on previously received bits. For a straightforward implementation of the DFE selection tree, shown in Figure 5-13 (A), the previous bits must ripple through all 16 selectors under worst-case conditions. The resulting timing constraint involves the delay and set-up times of the D flip-flop, the selector delay, and the bit time. Figure 5-12 shows the simulated selector delay as a function of VDD before layout extraction. At 1.0 V, the delay is about 120 ps. Considering the parasitics due to wiring, such a delay is marginal for 5 Gb/s operation.

Figure 5-12. Simulated pre-layout selector delay vs. power supply

This work uses a look-ahead selection tree to expedite the selection process. Two possible sets of decisions for Q[8:15] are pre-computed and then selected, as shown in Figure 5-13 (B). The timing constraint now becomes

which is relaxed by nearly 50% compared to the straightforward implementation.

Figure 5-13. DFE selection tree. A) Conventional. B) Look-ahead.

5.3.5 Decimated Baud-Rate CDR

The RX employs the same baud-rate CDR scheme as in Chapter 4 to reduce the clocking power compared to Alexander-type CDRs [64]. Monitoring all of CK[0:15] would require 32 more slicers, leading to considerable power and area overhead. To further reduce power consumption, only CK[8] is monitored in this work. This reduces the number of CDR slicers by more than 90%, from 32 to 2. Although this decimation reduces the CDR bandwidth, it is generally acceptable for mesochronous chip-to-chip links [64]. Note that because of the heavy parallelism, the reduction in input capacitance and area is more pronounced than for the decimation in [64].

5.4 Injection-Locking-Based Clock Generation

5.4.1 Clock Generation Overview

Despite a 50% reduction in the number of clock phases by the baud-rate CDR, generating the required 16 phases for the slicer bank is still non-trivial. Injection-locking-based clock generation is chosen in place of PLL- or DLL-based schemes for its low power and superior jitter performance. Figure 5-14 shows the block diagram of the clock

generation circuitry. At the core lie two cascaded (master and slave) low-power injection-locked ring oscillators (ILROs). Both ILROs are digitally trimmed to ensure locking. Phase mismatch and duty-cycle distortion due to injection locking [65] are also a concern, and a bank of current-starved delay lines facilitates further phase calibration. Phase tuning of an ILRO is usually done by adjusting its free-run frequency [66] [22] [67]. However, tuning the free-run frequency may change the phase relationship between the ILRO outputs and degrade the RX timing margin. In this work, the phases of the ILRO outputs are tuned by adjusting the injection clock phase with an additional delay line controlled by the CDR logic, as shown in Figure 5-14.

Figure 5-14. Block diagram of the injection-locking-based clock generation

5.4.2 ILRO Core

The master and slave ILROs are of the same design. Figure 5-15 shows the ILRO core schematic. Eight pseudo-differential delay cells constructed from inverters are used instead of CML delay cells to avoid static current consumption. The input clock phases are injected through NMOS transistors. To ensure locking, the free-run frequency of the oscillator is digitally trimmed.
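As a quick consistency check on the clocking numbers, the sketch below works out the per-slicer clock frequency and the phase spacing implied by a 16-way parallel front end at 5 Gb/s. It assumes, as an idealization, that the phases are uniformly spaced over one clock period and that each of the eight pseudo-differential stages contributes two complementary phases. The function name `clock_plan` is illustrative and does not come from the design.

```python
def clock_plan(data_rate_bps: float, ways: int, ring_stages: int):
    """Return (clock frequency in Hz, phase spacing in s, number of phases).

    Assumes uniform phase spacing and two complementary outputs per
    pseudo-differential ring stage (idealized).
    """
    f_clk = data_rate_bps / ways        # each slicer samples every `ways`-th bit
    phases = 2 * ring_stages            # true and complement output of every stage
    spacing = 1.0 / (f_clk * phases)    # uniform spacing across one clock period
    return f_clk, spacing, phases


f_clk, spacing, phases = clock_plan(5e9, ways=16, ring_stages=8)
unit_interval = 1.0 / 5e9
print(f"clock: {f_clk / 1e6:.1f} MHz, phases: {phases}")
print(f"phase spacing: {spacing * 1e12:.0f} ps, UI: {unit_interval * 1e12:.0f} ps")
```

Under these assumptions the clock frequency comes out at 312.5 MHz, the same value quoted for the ILRO free-run frequency in the clocking measurements, and adjacent phases are one unit interval apart.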

Figure 5-15. Schematic of the ILRO core

One design issue of the pseudo-differential oscillator is its start-up. Because there is an even number of delay stages, a stable DC solution exists in which the whole ring behaves like a latch, as shown in Figure 5-16. To prevent that from happening, the cross-coupled inverters must be sized large enough relative to the main inverters. In this design, the size of the cross-coupled inverters relative to the main inverters is chosen for reliable start-up, as annotated in Figure 5-15.

Figure 5-16. Start-up issue of the pseudo-differential oscillator

5.4.3 Delay Line

The delay lines are constructed by cascading current-starved delay cells, the schematic of which is shown in Figure 5-17, where a 4-b digitally controlled current sets the bias current of the inverters. Figure 5-18 shows the simulated tuning curve of one delay cell. The tuning range is 30 ps. The CDR delay line consists of 8 delay cells. The

total tuning range of 240 ps is larger than 1 UI for reliable CDR operation, especially when the extra delay caused by parasitics is considered.

Figure 5-17. Schematic of the current-starved delay line

Figure 5-18. Simulated delay line tuning curve

5.5 Experimental Results

The transceiver was fabricated in a 0.13 µm bulk CMOS technology using nominal-VT devices. The test chip was assembled in a 32-pin QFN package and mounted on an FR4 board. Figure 5-19 shows the chip micrograph. The TX and RX areas are listed in Table 5-2.

5.5.1 TX Measurement

The TX is measured at different supply voltages. With a 1.5 V supply the TX works up to 6.25 Gb/s, whereas at 1.2 V it works at 5 Gb/s. Below 1.2 V the TX does not work properly, probably limited by the CML PRBS core. Figure 5-20 (A) shows the measured TX eye diagram at 6.25 Gb/s. The RMS jitter is 11 ps.

Figure 5-20 (B) shows the captured transient of the TX output, which confirms correct 2^7-1 pattern generation.

Figure 5-19. Chip micrograph and transceiver layout

Figure 5-20. TX measurement results at 6.25 Gb/s. A) Output eye diagram. B) TX transient showing correct 2^7-1 PRBS pattern.
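A 2^7-1 pattern of the kind checked above can be reproduced in software for comparison against a captured waveform. The sketch below is a minimal software LFSR that assumes the common PRBS7 polynomial x^7 + x^6 + 1. The text does not state which polynomial or seed the on-chip PRBS core uses, so both are assumptions, as is the function name `prbs7`.

```python
def prbs7(nbits: int, seed: int = 0x7F) -> list:
    """Generate `nbits` of a PRBS7 sequence (assumed polynomial x^7 + x^6 + 1)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, or the LFSR stays stuck at zero")
    bits = []
    for _ in range(nbits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1   # XOR of taps 7 and 6
        state = ((state << 1) | new_bit) & 0x7F       # shift the new bit in
        bits.append(new_bit)
    return bits


sequence = prbs7(254)
period_ok = sequence[:127] == sequence[127:]
print("repeats after 127 bits:", period_ok)
print("ones per period:", sum(sequence[:127]))
```

Matching a captured transient against such a reference requires trying all 127 cyclic shifts, and possibly the inverted sequence, since the capture starts at an arbitrary point in the pattern.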

5.5.2 Clocking Measurement

Figure 5-21 shows the measured tuning curve and locking range of the ILRO. The ILRO has a tuning range of more than 500 MHz, and the locking range is larger than 10% when the free-run frequency is 312.5 MHz.

Figure 5-21. ILRO measurement results. A) Frequency tuning. B) Locking range.

Figure 5-22 shows the measured phase noise with and without injection. At 100 kHz offset, injection locking suppresses the phase noise by more than 70 dB.

Figure 5-22. Measured phase noise with and without injection locking

The measured CDR delay line tuning curve is shown in Figure 5-23. The tuning range is 400 ps, which covers 2 UI when the data rate is 5 Gb/s. The measured tuning range is more than 60% larger than the simulated 240 ps, indicating heavy parasitics due to routing.

Figure 5-23. Measured CDR delay line tuning curve showing >2 UI tuning range

5.5.3 RX Measurement

Standalone RX measurement is done up to 4 Gb/s due to equipment limits. Figure 5-24 and Figure 5-25 accompany this measurement; Figure 5-25 shows the 4 Gb/s eye diagrams before and after the channel. Due to severe channel loss, the eye is completely closed after the channel.

Figure 5-24

Figure 5-25. Measured 4 Gb/s eye diagrams

Figure 5-26 shows the measured bathtubs with and without DFE. Error-free operation cannot be attained without DFE, while the eye opening is 30% when DFE is enabled. Figure 5-27 shows the jitter histogram of the recovered clock. The RMS jitter is 4.85 ps, while the peak-to-peak jitter is 42 ps.

Figure 5-26. RX bathtubs with and without DFE

Figure 5-27. Jitter histogram of the recovered clock

The receiver core is powered from a 1 V supply and dissipates 1.1 mW, which translates to a power efficiency of 0.28 pJ/bit. Table 5-1 compares the performance to some recently published work. The power efficiency is nearly a 2x improvement over the best result of previously published complete receivers.

Table 5-1. Performance summary of the receiver
Parameter | [6] | [7] | [22] | This work
Data rate (Gb/s) | 6.25 | 12.5 | 8 | 4
Equalization | CTLE | CTLE | CTLE | DFE
Nyquist loss (dB) | 15 | 12 | 9.7 | 19
Sub-rate | 1/2 | 1/2 | 1/10 | 1/16
Clock generation | PLL | PLL | ILRO | ILRO
CDR | Alexander | Baud rate | NA | Baud rate eye tracking
J rms (ps) | NA | 2.2 | 4 | 4.85
Technology | 90 nm | 65 nm | 65 nm | 0.13 µm
VDD (V) | 1.0 | 1.0 | 0.6/1.0 | 1.0
Power (mW) | 8.22 | 6.6 | 1.3-1.98 | 1.1
Area (mm^2) | 0.15 | 0.24 | 0.014-0.018 | 0.15
FoM (pJ/bit) | 1.31 | 0.53 | 0.16-0.25 | 0.28

5.5.4 Transceiver Measurement

The complete transceiver is measured at 5 Gb/s, although the TX is capable of operating at 6.25 Gb/s.

Figure 5-28 shows the measured 5 Gb/s TX eye diagrams before and after the channel. Although the Nyquist channel loss is lower than in the standalone RX measurement, the eye is still completely closed due to the bandwidth and jitter of the TX. The near-end TX RMS jitter is 13 ps.

Figure 5-28. Measured 5 Gb/s TX eye diagrams. A) Before the channel. B) After the channel.

Figure 5-29 shows the recovered data and clock of the RX. The recovered clock has an RMS jitter of 6.9 ps. Figure 5-30 shows the RX bathtubs before and after enabling the DFE. The eye opening with DFE enabled is 18%.

Figure 5-29. Measured CDR waveforms. A) Recovered 312.5 Mb/s data. B) Recovered 312.5 MHz clock.

Figure 5-30. RX bathtubs with and without DFE

The TX works from a 1.2 V supply and consumes 2.1 mW, while the RX consumes 1.6 mW from a 1 V supply. The total power consumption of the transceiver is 3.7 mW, and the power efficiency is 0.75 pJ/bit. Table 5-2 compares the transceiver

performance with some recent publications. Even though we use a relatively less advanced technology, the power efficiency is among the best.

Table 5-2. Performance summary of the transceiver
Parameter | [42] | [6] | [7] | [68] | This work
Technology | 65 nm | 90 nm | 65 nm | 45 nm | 0.13 µm
TX VDD (V) | 0.68 | 1.0 | 1.0 | 0.8 | 1.2
RX VDD (V) | 0.68 | 1.0 | 1.0 | 0.8 | 1.0
Data rate (Gb/s) | 5 | 6.25 | 12.5 | 10 | 5
Nyquist loss (dB) | 4 | 15 | 12 | 8 | 12
TX swing (mVpp) | 100 | 200 | 150 | 150 | 160
BER | 1e-12 | 1e-15 | 1e-12 | 1e-14 | 1e-12
Eye opening (UI): 30%, 43%, 18%
Power (mW) | 13.5 | 14 | 12 | 14 | 3.7
Energy efficiency (pJ/bit) | 2.7 | 2.24 | 0.98 | 1.4 | 0.75
TX/RX area (mm^2) | 0.03/0.06 | 0.31/0.31 | 0.24/0.24 | 0.07/0.07 | 0.15/0.12

5.6 Summary

Building on the results in Chapter 3 and Chapter 4, this chapter presents a 5 Gb/s, 0.75 pJ/bit transceiver in 0.13 µm bulk CMOS technology. Various design techniques are combined to attain this high power efficiency: VM signaling with differential termination, which reduces the signaling power by 75% compared to CM signaling; the exclusive use of static CMOS gates to avoid the static power consumption of CML gates; injection-locking-based clock generation; decimation in the CDR circuitry; and low-voltage RX operation enabled by the heavy front-end parallelism and the look-ahead DFE selection tree. The heavy parallelism also eliminates the need for an explicit DMUX, leading to further power reduction.

Even though the transceiver is implemented in a less advanced 0.13 µm CMOS technology, the achieved power efficiency of 0.75 pJ/bit is among the best reported to date. It is therefore believed that the techniques presented in this chapter will help enable the Tb/s aggregate off-chip signaling of future electronic systems.
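The energy-efficiency figures quoted in this chapter follow directly from power divided by data rate. The short sketch below recomputes a few of them from the power and data-rate entries of Tables 5-1 and 5-2. The helper name `energy_per_bit_pj` is illustrative, and the script is only a cross-check of the arithmetic, not part of the measurement flow.

```python
def energy_per_bit_pj(power_mw: float, data_rate_gbps: float) -> float:
    """Energy per bit in pJ/bit; mW divided by Gb/s is numerically pJ/bit."""
    return power_mw / data_rate_gbps


# (power in mW, data rate in Gb/s), taken from Tables 5-1 and 5-2
entries = {
    "RX only, this work": (1.1, 4.0),
    "Transceiver, this work": (2.1 + 1.6, 5.0),  # TX power plus RX power
    "[42]": (13.5, 5.0),
    "[6]": (14.0, 6.25),
    "[68]": (14.0, 10.0),
}

for name, (power_mw, rate_gbps) in entries.items():
    print(f"{name:24s} {energy_per_bit_pj(power_mw, rate_gbps):.3f} pJ/bit")
```

The transceiver ratio evaluates to 0.74 pJ/bit, which agrees with the quoted 0.75 pJ/bit to within the rounding of the reported power figures.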

CHAPTER 6
A DIGITAL BACKGROUND ADC CALIBRATION TECHNIQUE

6.1 Chapter Overview

The continuous scaling of CMOS technology has made digital signal processing more powerful and affordable. Compared to analog signal processing, digital solutions have the advantages of greater flexibility and better scalability. As a result, there is a trend of moving more and more signal processing into the digital domain. This trend is also reflected in high-speed serial links [8] [69] [70], where an ADC digitizes the distorted incoming bit stream and a DSP carries out signal processing such as equalization and timing recovery in the digital domain, as shown in Figure 6-1.

Figure 6-1. An ADC-based serial link

One of the key challenges in such ADC-based serial links is the design of a high-speed, low-power ADC. Because of its high speed, a flash ADC is often the architecture of choice. For low power consumption, it is desirable to use small transistors in the flash ADC. However, the mismatch between transistors becomes worse with small transistor sizes, which will degrade the linearity of the ADC if left unaddressed. For example, consider the preamp in Figure 6-2, often found in flash ADCs. Around the balanced condition, the input and output are related by

$V_{out} = A\,(V_{in} - V_{ref} + V_{OS})$ (6-1)

where $A$ is the preamp gain, and $V_{out}$, $V_{in}$ and $V_{ref}$ are the differential output, input and reference voltages respectively. The last term, $V_{OS}$, is the offset voltage of the preamp due to device mismatches. With proper design and

layout, the offset $V_{OS}$ has a zero mean (no systematic offset) and a certain spread $\sigma_{V_{OS}}$ determined by circuit details and the fabrication technology. For typical bias conditions, $\sigma_{V_{OS}}$ is dominated by transistor threshold voltage mismatch [71] and can be expressed as

$\sigma_{V_{OS}} = A_{VT} / \sqrt{WL}$ (6-2)

where $A_{VT}$ is a parameter determined by the technology, and $WL$ is the gate area of the transistors. To satisfy the linearity requirement, the transistors must be sized large enough that $\sigma_{V_{OS}}$ is kept within a fraction of the ADC step size. With the transistor length and current density largely determined by the speed requirement, W is the only design variable that can be exploited to reduce $\sigma_{V_{OS}}$. According to Equation 6-2, to decrease $\sigma_{V_{OS}}$ by half, the transistor width and therefore the current consumption must be increased by a factor of four, a very unfavorable tradeoff for low-power designs. As technology scales down, this tradeoff is expected to become more and more challenging due to effects such as random dopant fluctuation (RDF) and line edge roughness (LER) [72].

Figure 6-2. Schematic of a preamp

Since offset changes slowly over time with environmental (supply voltage and temperature) variations and device aging, it can be cancelled effectively with some form of calibration. Various calibration schemes have been proposed in the past for

flash ADCs, all of which fall into either the foreground [73] [74] [75] or the background [76] [77] category. A foreground calibration scheme requires temporarily interrupting normal ADC operation and is therefore usually run at power-up or during idle times when the system allows. However, as the supply voltage and temperature change over time, the calibration results may no longer be optimum, leading to degraded performance [78]. In contrast, a background calibration scheme does not require interrupting the ADC operation and can run continuously to track environmental variations and device aging. Thus, background calibration schemes are generally preferred.

Some of the critical challenges in background calibration for high-speed ADCs are accuracy, convergence speed, area/power overhead, and performance penalty. Despite the many background calibration techniques proposed in the past, a quick literature review demonstrates the need for an improved background calibration scheme that is suitable for high-speed ADCs. In response, this chapter describes a novel background calibration scheme for ADCs which features negligible hardware and power overhead. The proposed calibration scheme is implemented in a 50 mW, 2.5 GS/s, 5-bit flash ADC and its effectiveness is verified with experimental results.

6.2 Background Calibration

6.2.1 Review of Prior Art

Several background calibration schemes for flash ADCs have been reported in the literature and are briefly reviewed here.

Correlation-based calibration operates by modulating the analog input signal with pseudo-random sequences to extract offset information from the resulting statistics of the digital output, and has been proposed in [79] [80] [81] [82]. In [79] and [80] the analog input is

converted to a white signal with little energy at DC by chopping it with a pseudo-random binary sequence. The DC component in the resulting signal stems mainly from the ADC offset. By forcing this DC component to zero, the comparator offset can be effectively removed. A more general approach is proposed in [81], where the offset of a comparator is detected by chopping the analog input with a sequence from an on-chip random number generator (RNG) and observing the code distribution of the digital outputs, as illustrated in Figure 6-3 (drawn single-ended for simplicity). The chopping operation degrades the ADC sample rate because it needs finite time to settle. Moreover, because the scheme relies on the statistics of the input and an on-chip generated random sequence, the calibration results are prone to fluctuation, which can only be minimized at the cost of convergence speed [81]. Furthermore, correlation-based calibration invariably introduces a performance penalty because it interferes with the analog signal path through chopping or noise injection. For fast and robust calibration, deterministic schemes are generally preferred.

Figure 6-3. Correlation-based calibration
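The chopping idea behind correlation-based calibration can be illustrated with a short behavioral model. The sketch below is a simplified reading of the approach described for [79] and [80], not a reproduction of those designs: the ADC is modeled as an ideal adder of a fixed offset after the chopper, quantization is ignored, and the input waveform, sample counts and the function name `estimate_offset` are all assumptions chosen for illustration.

```python
import math
import random


def estimate_offset(n_samples: int, offset: float, seed: int = 0) -> float:
    """Estimate the ADC offset as the mean of the chopped, offset-corrupted output."""
    rng = random.Random(seed)
    acc = 0.0
    for n in range(n_samples):
        # Hypothetical input that carries its own DC content (0.1) plus a sine wave.
        x = 0.1 + 0.3 * math.sin(2.0 * math.pi * 0.0123 * n)
        chop = 1.0 if rng.random() < 0.5 else -1.0   # pseudo-random +/-1 sequence
        adc_out = chop * x + offset                  # offset enters after the chopper
        acc += adc_out
    return acc / n_samples


true_offset = 0.02
for n in (100, 10_000, 100_000):
    print(n, "samples ->", round(estimate_offset(n, true_offset), 4))
```

Because the chopped input averages toward zero, the mean of the output approaches the offset. The estimate from a short window fluctuates noticeably, and only long averaging suppresses that fluctuation, which is the accuracy versus convergence-speed tradeoff noted above.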

Redundancy-based calibration [83] [77] [84] achieves deterministic operation by employing redundant elements to enable uninterrupted ADC operation while some of the elements undergo calibration. Figure 6-4 shows the 6-b ADC block diagram with background calibration as reported in [76], where 64 instead of 63 comparators (C1 to C64) are employed in parallel. When C1 is being calibrated, the other 63 comparators (C2 to C64) work as a normal ADC. The array is then re-configured so that C1 and C3 to C64 work together as a normal ADC and C2 undergoes calibration, with the ADC operation uninterrupted. This process repeats continuously, and in the end all the comparators are calibrated. The advantage of this technique is its low hardware overhead. However, it still incurs a speed penalty because it needs to reconfigure the ADC during normal operation.

Figure 6-4. Redundancy-based calibration

Reference-ADC-based calibration schemes, proposed in [85] [86] [87], employ a slow but accurate reference ADC to improve the linearity of the fast but inaccurate main ADC. Figure 6-5 shows a simplified block diagram of the reference-ADC-based calibration scheme, while Figure 6-6 shows its working principle. For simplicity, we assume that the main ADC has 3-b resolution. In Figure 6-6 the transfer curves of the main ADC and the ideal reference ADC are overlaid. Denoting the transition levels of

the main and reference ADCs separately, any offset will cause the main ADC transition levels to differ from those of the reference ADC. These differences are marked by gray bars in Figure 6-6 and are referred to as calibration windows hereafter. Whenever the input falls within a calibration window, a discrepancy occurs between the reference and main ADC outputs. The calibration engine then examines such discrepancies and drives the main ADC transition levels toward the ideal ones.

Figure 6-5. Reference-ADC-based calibration

Figure 6-6. Principle of reference-ADC-based calibration

Although reference-ADC-based calibration is deterministic and incurs negligible performance penalty, there is considerable design overhead when the reference and main ADCs are entirely different designs, as with the pipeline ADC in [87]. Furthermore, because the main and reference ADCs operate from different sampling clocks, mismatch in their track-and-hold (T/H) circuits can degrade the calibration accuracy. To alleviate this problem, one has to resort to either power-

hungry T/H circuits to drive both ADCs [86] or dedicated timing calibration for the two sampling clocks [88], both of which are very challenging at high speeds. These disadvantages can be avoided with the so-called split ADC architecture, in which the reference ADC is simply a replica of the main ADC and operates at the same speed [78] [89]. The replica ADC, however, incurs significant area, input capacitance and power overhead.

6.2.2 Proposed Background Calibration Scheme

In the reference-ADC-based calibration scheme, all the transition levels are calibrated simultaneously. This necessitates a reference ADC with at least the same resolution as the main ADC, and thus high overhead seems inevitable. However, because offset varies slowly over time, the transition levels can be calibrated sequentially instead of simultaneously. The benefit of this sequential calibration is the greatly reduced complexity of the reference ADC. In the extreme case, as in our proposed calibration scheme, 1-b resolution is sufficient, and the reference ADC degenerates to a single comparator.

Figure 6-7 shows a block diagram of the proposed calibration scheme. The reference ADC is now replaced with a single comparator, whose threshold voltage is reconfigurable through a digital-to-analog converter (DAC). At the beginning, the comparator threshold is set to the ideal value of the first transition level, as shown in Figure 6-8 (A). By monitoring the outputs of the ADC and the comparator, the calibration engine adjusts that transition level of the main ADC until it matches the comparator threshold. After this level is calibrated, the threshold voltage is set to the ideal value of the next transition level and its calibration begins, as shown in Figure 6-8 (B). By iterating the same process, all the transition levels of the main ADC can be calibrated.

The resulting fully calibrated transfer curve of the ADC is shown in Figure 6-8 (H). The performance metrics of the proposed calibration scheme are discussed below.

Figure 6-7. Proposed reconfigurable-comparator-based calibration

Figure 6-8. Principle of the proposed calibration scheme. The transition levels are calibrated sequentially in A) to G), and the resulting transfer curve is shown in H).
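The sequential procedure described above can be sketched as a behavioral simulation. The model below is a simplified illustration and not the calibration engine implemented on the chip: it assumes an ideal reference DAC and comparator, no noise, a uniformly distributed input, an unlimited trim range, and a fixed number of conversions per level. The function name `calibrate` and all numeric values are hypothetical.

```python
import random


def calibrate(offsets, lsb, step, conversions_per_level, v_fs=1.0, seed=1):
    """Trim each main-ADC transition level toward its ideal value, one level at a time.

    offsets[k-1] is the initial offset of transition level k. Returns the
    residual error of every level after calibration.
    """
    rng = random.Random(seed)
    trims = [0.0] * len(offsets)
    for k in range(1, len(offsets) + 1):
        t_ref = k * lsb                         # DAC sets the reference threshold
        for _ in range(conversions_per_level):
            v_in = rng.uniform(0.0, v_fs)       # normal ADC input, no interruption
            t_main = t_ref + offsets[k - 1] + trims[k - 1]
            main_bit = v_in > t_main
            ref_bit = v_in > t_ref
            if main_bit and not ref_bit:        # main threshold too low: raise it
                trims[k - 1] += step
            elif ref_bit and not main_bit:      # main threshold too high: lower it
                trims[k - 1] -= step
    return [o + t for o, t in zip(offsets, trims)]


lsb = 1.0 / 32                                   # 5-bit ADC with unit full scale
step = lsb / 8                                   # hypothetical calibration step
offset_rng = random.Random(7)
initial = [offset_rng.gauss(0.0, 0.9 * lsb) for _ in range(31)]
residual = calibrate(initial, lsb, step, conversions_per_level=20000)
print("worst initial offset (LSB):", round(max(abs(e) for e in initial) / lsb, 3))
print("worst residual error (LSB):", round(max(abs(e) for e in residual) / lsb, 3))
```

Only conversions whose input falls between the main and reference thresholds produce a discrepancy, so the adjustment rate slows as the offset shrinks. Once the error is within one step it merely toggles around zero, which is why the residual error in this idealized model is bounded by the calibration step size.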

6.2.2.1 Calibration accuracy

The calibration accuracy is determined by a few factors, including the reference ADC accuracy, the calibration step size, and noise. The discussion above assumes an ideal reference ADC. In reality, however, both the DAC and the comparator in the reference ADC introduce errors and ultimately limit the calibration accuracy. Moreover, due to the digital nature of the calibration scheme, the main ADC can only be adjusted in discrete steps. The reference ADC accuracy, together with the finite calibration step size, limits the overall calibration accuracy. Once the ADC is calibrated, the residual error in the transition level is bounded by an expression involving the DAC error, the offset of the comparator in the reference ADC, and the calibration step size. The calibrated INL and DNL are bounded correspondingly. Notice that the offset of the reference comparator does not impact the calibrated DNL. This is because it appears in all the calibrated transition levels and merely causes a DC shift in the calibrated transfer curve.

The effect of noise on calibration accuracy is shown in Figure 6-9 for a case in which one transition level is, on average, lower than the other. For convenience, the noise is lumped into one of the two levels in Figure 6-9. Ideally, every discrepancy should indicate the true sign of the offset so that the correct adjustment is made. However, due to noise, the lower level may temporarily exceed the higher one, as indicated by the dashed line in Figure 6-9,

and this may cause incorrect calibration to occur. To improve immunity to noise, the calibration engine can average multiple discrepancies before making a decision.

Figure 6-9. Mechanism of noise-induced calibration error

Because the reference ADC shares the same T/H and sampling clock as the main ADC, the calibration accuracy of the proposed scheme does not suffer from the T/H mismatch issue as the conventional reference-ADC-based approach does. Nor is it sensitive to the statistics of the input signal, since it does not rely on the correlation between the input signal and an on-chip pseudo-random sequence.

6.2.2.2 Convergence speed

To calculate the convergence speed, we assume the input is uniformly distributed within the full-scale input range $V_{FS}$. Similar calculations can be carried out for other input distributions, such as those of sine waves. Suppose the initial offset of a certain transition level is $V_{OS}$. The probability that the input produces a discrepancy is $|V_{OS}|/V_{FS}$, and on average $\lceil V_{FS}/|V_{OS}| \rceil$ conversions are needed to reduce the offset by one step, where $\lceil x \rceil$ denotes $x$ rounded up to an integer. Therefore, the number of conversions to calibrate the offset is

If we assume the offset follows a normal distribution with a mean of zero, the expected number of conversions to calibrate a particular transition level is obtained by integrating over that distribution. Exploiting the symmetry of the integrand and assuming the offset lies within a bounded range, the integral can be approximated in closed form. For an N-bit ADC, there are 2^N - 1 transition levels, and the total number of conversions for the calibration to converge follows by accounting for all of them. Equations 6-8 and 6-9 are combined to yield the final expression, which Figure 6-10 plots for different resolutions, including the 5-bit case. Note that while the required number of conversions grows at a rate of 2^(2N), it is a relatively

weak function of the offset spread: the increase in offset spread considered here raises the required number of conversions by only 37%. This is because calibrating small offsets takes more conversions, as the input has a lower chance of producing a discrepancy when the offset is small.

Figure 6-10. Required conversions for convergence with different resolutions

6.2.2.3 Calibration overhead and performance considerations

The calibration overhead consists mainly of the reference ADC, the calibration engine, the memory to store the offset control words, and the circuitry to adjust the main ADC offset. With the calibration engine, the memory and the adjustment circuitry being common to all digital calibration schemes, the major overhead advantage of the proposed scheme lies in the simplicity of the reference ADC. The comparator in the reference ADC can reuse the design available in the main ADC and entails no extra design effort. The DAC in the reference ADC is only used to set the threshold voltage, and its speed requirement is much relaxed compared to the main ADC sample rate. The power, area, and design overhead of the reference ADC is therefore trivial.

The proposed calibration scheme does not require noise injection or chopping as seen in correlation-based calibrations. While redundancy-based calibration reconfigures the main ADC during normal operation, the calibration scheme herein does not.

Moreover, it does not insert extra conversion cycles, thereby avoiding any speed penalty. Although the reference ADC does increase the input capacitance, this penalty is minimal because only a single comparator is used. For example, calibrating a 5-b flash ADC with the proposed scheme increases the input capacitance by less than 4%. This is in stark contrast to the split ADC architecture, which, by adding a replica of the main ADC, roughly doubles the input capacitance.

Table 6-1 shows a comparison of various background calibration schemes. The proposed calibration engine achieves deterministic operation, introduces little performance penalty, and incurs low hardware and design overhead. Because the calibration is sequential, its convergence is slower than that of the split ADC architecture. This usually is not detrimental since environmental variations are slow. When fast convergence is desired (for example, to reduce the test time during mass production), foreground calibration can be performed at power-up before the background calibration is enabled.

Table 6-1. Comparison of proposed and existing background calibration schemes
Scheme | Deterministic | Performance penalty | Hardware overhead | Design effort | Convergence speed
Correlation based | No | Yes | Medium | Medium | Low
Redundancy based | Yes | Yes | Low | Low | High
Ref. ADC based | Yes | No | High | High | Medium
Split ADC | Yes | Yes | High | Low | High
This work | Yes | No | Low | Low | Medium

6.3 Chip Implementation

6.3.1 ADC Architecture

Figure 6-11 depicts a block diagram of the implemented 5-bit flash ADC with the calibration circuitry (drawn single-ended for simplicity, though the real implementation is

differential). The main ADC consists of a track-and-hold (T/H), a resistor ladder, a comparator array, and a digital backend. The comparator array is composed of comparators C[1:31], which digitize the sampled analog input against 31 evenly spaced reference voltages VR[1:31] from the resistor ladder. The resulting thermometer codes are then converted to binary format by the digital backend, which also corrects first-order bubble errors.

Figure 6-11. Block diagram of the ADC

The calibration circuitry consists of the resistor ladder and the shaded blocks in Figure 6-11. The switch bank SR, the resistor ladder and the comparator C[0] make up the reference ADC. The SRAM stores the offset control words W[1:31] for C[1:31]. The finite state machine (FSM) communicates with the SRAM through the address decoder and serves as the calibration engine. The chip also houses a serial interface. This facilitates digital control of the bias generator and allows clearing the SRAM content to disable calibration.
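The thermometer-to-binary conversion with first-order bubble correction mentioned above can be illustrated in software. The text does not describe how the digital backend implements the correction, so the sketch below assumes one common approach: a three-input majority vote over neighboring comparator outputs, followed by a ones count. The function names are illustrative.

```python
def correct_bubbles(thermo):
    """Remove isolated (first-order) bubbles with a 3-input majority vote.

    thermo[0] is the output of the lowest comparator. The code is padded with
    a 1 below and a 0 above so that the end comparators also get three inputs.
    """
    padded = [1] + list(thermo) + [0]
    return [1 if padded[i] + padded[i + 1] + padded[i + 2] >= 2 else 0
            for i in range(len(thermo))]


def thermo_to_binary(thermo):
    """Convert a (possibly bubbled) thermometer code to an integer output code."""
    return sum(correct_bubbles(thermo))


clean = [1] * 5 + [0] * 26                 # ideal thermometer code for output 5
bubbled = [1, 1, 0, 0, 1] + [0] * 26       # stray 1 above the transition
print(thermo_to_binary(clean), thermo_to_binary(bubbled))
```

A majority vote leaves every clean thermometer code unchanged and removes a single out-of-place bit near the transition, which is the sense in which the correction is first order. Wider bubbles would need a longer voting window or a different encoder.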

6.3.2 Resistor Ladder

Since the resistor ladder generates the reference voltages for the reference ADC, its linearity ultimately determines the achievable calibration accuracy. For an N-bit ADC, this places a requirement on the matching of the resistors used in the ladder [90], expressed in terms of the nominal resistance R and its variance. The resistor ladder consists of identical unit resistors, whose matching is better than 8-bit accuracy [91]. To stabilize the reference voltages and suppress input feedthrough, decoupling PMOS capacitors are connected to all resistor ladder output taps [92]. The resistor ladder consumes 0.21 mW.

6.3.3 T/H

A passive T/H, the schematic of which is shown in Figure 6-12 (A), precedes the comparator array. By presenting a static signal to the comparator array during quantization, the T/H helps minimize linearity degradation due to signal-dependent comparator delays and the clock and signal skew between comparators. Since the input voltage swing is from VDD - 0.4 V to VDD, PMOS transistors are used. This also eliminates the need for a buffer to shift the input common-mode level [93] [92].

The bandwidth of the T/H is determined by the on-resistance of the switch and the sampling capacitor. Figure 6-12 (B) shows the small-signal model of the T/H, where CPAD is the pad parasitic capacitance and Csample is the sampling capacitance. The source resistance is the parallel combination of the channel impedance and the on-chip termination. The switch is characterized by two per-unit-width parameters, these being the channel resistance and the gate capacitance of a unit-width transistor

respectively. A larger transistor has a lower on-resistance and thus tends to give a higher bandwidth. However, once the on-resistance is small enough, the bandwidth drops with increasing transistor width because the parasitic capacitance begins to dominate. An optimum transistor size therefore exists which maximizes the total T/H bandwidth. Figure 6-13 plots the T/H bandwidth as a function of the transistor width. It can be seen that a width of 28 µm gives the highest bandwidth. However, the optimum is not a very sharp one. A transistor width of 14 µm is chosen instead, with only a 10% drop in bandwidth, while saving about 0.2 mW on clocking.

Figure 6-12. T/H design. A) Schematic. B) Its small-signal model.

Figure 6-13. T/H bandwidth vs. switch width
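The existence of a shallow bandwidth optimum can be illustrated with a first-order model. The sketch below is an assumption-laden approximation, not the simulation behind Figure 6-13: it lumps the switch parasitic capacitance at the source side, uses a single Elmore time constant in place of the true two-pole response, and all component values are hypothetical round numbers that do not reproduce the 28 µm optimum reported above.

```python
import math

# All values below are hypothetical and chosen only for illustration.
R_SOURCE = 25.0      # ohm, channel impedance in parallel with on-chip termination
C_PAD = 500e-15      # F, pad parasitic capacitance
C_SAMPLE = 200e-15   # F, sampling capacitance
R_UNIT = 1000.0      # ohm*um, on-resistance of a 1 um wide switch
C_UNIT = 2e-15       # F/um, switch parasitic capacitance per um of width


def bandwidth_hz(width_um: float) -> float:
    """Approximate -3 dB bandwidth from the Elmore time constant."""
    r_on = R_UNIT / width_um
    c_par = C_UNIT * width_um
    tau = R_SOURCE * (C_PAD + c_par + C_SAMPLE) + r_on * C_SAMPLE
    return 1.0 / (2.0 * math.pi * tau)


def optimum_width_um() -> float:
    """Width that minimizes the time constant (set d(tau)/dW to zero)."""
    return math.sqrt(R_UNIT * C_SAMPLE / (R_SOURCE * C_UNIT))


w_opt = optimum_width_um()
for width in (w_opt / 2.0, w_opt, 2.0 * w_opt):
    print(f"W = {width:6.1f} um -> bandwidth = {bandwidth_hz(width) / 1e9:.2f} GHz")
```

In this model the time constant has one term that grows with width (source resistance charging the switch parasitics) and one that shrinks with it (on-resistance charging the sampling capacitor). Their sum is flat near the minimum, so halving the width costs only a few percent of bandwidth, which mirrors the reasoning for choosing a switch smaller than the optimum to save clocking power.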

A few mechanisms limit the T/H linearity, including signal-dependent charge injection, clock feedthrough, and nonlinear channel resistance during track mode [94]. Dummy switches driven by a delayed complementary clock are used at both sides of the sampling switch to cancel the charge injection [92]. With second-order distortion largely removed by differential signaling, the third-order term dominates the distortion performance. Simulation shows that, when sampling a 1.4 GHz full-scale sine wave at 2.5 GS/s, the T/H achieves -45 dBc third-order harmonic distortion, with a 1.5 dB improvement from the dummy switches.

6.3.4 Comparator

Figure 6-14 shows the block diagram of the comparator. A three-stage preamplifier followed by a regenerative latch digitizes the difference between the input and reference voltages. Another two latch stages reduce metastability and convert current-mode logic (CML) levels to full-swing CMOS logic levels. A current-steering DAC accepts the control word from the SRAM and injects static current into the output of the first preamplifier stage to cancel the offset of the whole comparator.

Figure 6-14. Comparator block diagram

Compared to a dynamic comparator [74], the preamplifier expedites the regeneration in the latch [95], suppresses charge kickback, and provides better power-supply and common-mode rejection. The preamplifier consists of three stages (P1 to P3) for fast overdrive recovery [90] [93]. Figure 6-15 shows the schematics of P1, P2, and

the DAC. Resistor loads are used instead of diode-connected transistors to avoid the voltage headroom penalty due to the transistor VT [93].

Figure 6-15. Schematics of the first two stages of the preamplifier

For high-speed operation, the bandwidth of the preamplifiers must be maximized. One way to do so is to bias the transistors at a high current density, but this practice is limited by two factors. First, the transit frequency of a transistor increases slowly at high current densities, as shown in Figure 6-16 (A), which means the current efficiency drops at high current densities, even without considering the drop caused by velocity saturation. Second, the highest current density is limited by the supply voltage due to voltage headroom issues. For P1, ignoring the currents through M3, the gain is determined by the transconductance of M1 and M2, the current through M1 and M2, and the voltage drop on R1 when the differential pair is balanced. A factor of one half appears because half of the bias current flows through M1B and M2B and does not produce any gain. Since VINP, VINN, VRP and VRN all vary between VDD - 0.4 V and VDD, the voltage drop on R1 must be kept at or below about 0.25

PAGE 140

V considering the body effect. Figure 6-16(B) plots the required voltage drop on the load resistor as a function of the current density, assuming a moderate gain of 2. It can be seen that the headroom constraint limits the achievable speed and gain. To solve this problem, two transistors biased in the saturation region (M3A and M3B) are used to bypass half of the current, which reduces the voltage headroom on R1A and R1B by half [96], as also shown in Figure 6-16(B). The chosen current density is 5 [...]. Since P2 has less self-loading, it can achieve a larger GBW than P1 given the same bias condition and fanout. The gain of P2 is therefore designed to be 70% higher than that of P1, while the bandwidths of P1 and P2 are kept the same. No inductive peaking is used, to save area.

Figure 6-16. Effects of M3. A) Transit frequency vs. current density. B) Required voltage drop on the load resistor vs. current density.

PAGE 141

Figure 6-17. Schematic of the CML latches.

A CML flip-flop and a sense-amplifier flip-flop (SAFF) complete the comparator. Figure 6-17 shows the CML flip-flop, which is constructed with the conventional master-slave topology. Figure 6-18 shows the SAFF schematic. It consists of a sense amplifier (SA) and a set-reset (SR) latch. The SAFF provides additional gain to suppress metastability errors and converts CML levels to full-swing CMOS levels. With the [...] to be better than [...] [97].

Figure 6-18. Schematic of the SAFF.

Figure 6-19 shows the current-steering DAC. A bias generator shared by all the comparators generates three bias voltages. The offset control word W[N] selects from these three bias voltages and VSS to inject an appropriate current into comparator C[N] and cancel its offset.

PAGE 142

Figure 6-19. Current-steering DAC and the DAC bias generator. The bias generator is shared by all the comparators.

One important design parameter of the current-steering DAC is its calibration range. This range is selected based on the comparator offset and the yield target. To reduce area and power consumption, the transistors in the comparators are sized close to the minimum. Figure 6-20(A) shows the simulated comparator offset, which is 22.5 mV (0.9 LSB) and is dominated by the preamplifier. For a certain calibration range, the yield [...] range and, assuming a Gaussian distribution for the comparator offset, is given by [...]. Figure 6-20(B) shows the yield as a function of the normalized calibration range. To achieve a yield higher than 90%, the normalized calibration range should be higher than 6. In this prototype, the maximum I_DAC is programmable through the serial interface, and the simulated [...] can cover up to [...], as Figure 6-20(C) shows.

The other key parameter of the current-steering DAC is its resolution, which determines the calibration step and the achievable calibration accuracy, as discussed previously. In this prototype, 5 b resolution is chosen. When the calibration range is

PAGE 143

programmed to 5.4 LSB, the calibration step is 0.19 LSB. With the resistor ladder providing better than 8 b linearity, this guarantees a calibration accuracy of 0.5 LSB according to Equation 6-3.

Figure 6-20. Simulated comparator performance. A) Offset. B) Yield vs. normalized calibration range. C) Calibration range.
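
As a rough check on the yield figures quoted above, the following sketch evaluates the yield under an assumed model: each comparator offset is an independent zero-mean Gaussian with standard deviation sigma, the calibration range R is symmetric about zero (an offset is correctable when its magnitude is at most R/2), and all 31 main comparators must be correctable. The model, the function name `comparator_yield`, and its parameters are illustrative assumptions rather than the expression used in this work.

```python
import math


def comparator_yield(k: float, n_comparators: int = 31) -> float:
    """Probability that every comparator offset lies within +/- (k * sigma) / 2.

    k is the calibration range normalized to the offset standard deviation.
    Assumes independent zero-mean Gaussian offsets (illustrative model only).
    """
    # P(|x| <= a) for a Gaussian is erf(a / (sigma * sqrt(2))), with a = k * sigma / 2.
    p_single = math.erf(k / (2.0 * math.sqrt(2.0)))
    return p_single ** n_comparators


if __name__ == "__main__":
    for k in (4.0, 5.0, 6.0, 7.0):
        print(f"normalized range {k:.0f}: yield = {comparator_yield(k):.3f}")
```

Under these assumptions a normalized range of 6 gives a yield of about 92%, in line with the 90% threshold stated in the text. If the 0.9 LSB offset is read as a standard deviation (also an assumption), the 5.4 LSB programmed range corresponds to a normalized range of 6.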

PAGE 144

6.3.5 Digital Backend

A digital backend converts the output thermometer codes of the comparator array to binary format. It also provides the capability of correcting or minimizing errors due to bubbles or metastability. Figure 6-21 shows the block diagram of the digital backend. The three-input AND gate array converts the thermometer codes to one-hot codes and provides first-order bubble error correction. The one-hot codes are then used to address a quasi-Gray-code ROM encoder [98]. Simple XOR gates convert the quasi-Gray code to binary codes. The binary codes are then decimated by 64 to accommodate the limited bandwidth of the test equipment.

Figure 6-21. Block diagram of the digital backend.

6.3.6 Reference ADC

The reference ADC comprises the resistor ladder, the switch bank S_R, and the comparator C[0]. The resistor ladder is reused from the main ADC to reduce the calibration overhead. The switch bank S_R is built with CMOS transmission gates and is controlled by the one-hot code S[1:31] to select the desired reference voltage for C[0] from the resistor ladder. The switch bank S_R is implemented with simple CMOS

PAGE 145

transmission gates. C[0] shares the same design as C[1:31] and does not involve any extra design effort.

6.3.7 Calibration Engine and Supporting Circuitry

The other calibration circuitry includes the FSM as the calibration engine, the SRAM to store the offset control words, the address decoder to facilitate the communication between the FSM and the SRAM, and the switch bank S_Q. The FSM, the SRAM, and the address decoder are all built with standard cells, while the switch bank S_Q is implemented with CMOS transmission gates, the same as S_R.

Figure 6-22. FSM flow chart. N is the calibration index, which is also the SRAM address.

Figure 6-22 shows the flow chart of the FSM operation. At the beginning, the FSM sets N to 1. This sets S[1] [...] are connected to V_R[1]. Meanwhile [...] immunity, the FSM then accumulates the results of 128 comparisons between C[0] and [...] sets N to 2 and calibrates C[2]. This process repeats cyclically for C[1:31] so that the comparators are all continuously calibrated in the background.
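
The sequential loop described above can be mimicked with a small behavioral model. This is a minimal sketch under stated assumptions, not the FSM of this work: the reference comparator C[0] is treated as offset-free, the input is assumed to be uniformly distributed over the full scale, the control word is assumed to be a signed 5-bit value, and the update rule (step the word by one in the direction indicated by the sign of the 128 accumulated comparisons) is an assumption made for illustration. Only the comparator count, the 128 comparisons, and the 0.19 LSB step come from the text; all names are hypothetical.

```python
import random

N_COMP = 31                    # main comparators C[1..31] (from the text)
N_ACC = 128                    # comparisons accumulated per decision (from the text)
STEP_LSB = 0.19                # calibration step in LSB (from the text)
WORD_MIN, WORD_MAX = -16, 15   # assumed signed 5-bit control word


def calibrate(offsets_lsb, n_rounds=200, seed=0):
    """Return one offset control word per comparator after n_rounds passes."""
    rng = random.Random(seed)
    words = [0] * N_COMP
    for _ in range(n_rounds):                  # background loop, cycling N = 1..31
        for n in range(N_COMP):
            v_ref = float(n + 1)               # V_R[N] in LSB; C[0] assumed ideal
            v_thr = v_ref + offsets_lsb[n] - words[n] * STEP_LSB
            acc = 0
            for _ in range(N_ACC):
                v_in = rng.uniform(0.0, 32.0)  # assumed busy, uniform input
                acc += int(v_in > v_thr) - int(v_in > v_ref)
            if acc < 0:                        # C[N] trips too rarely: threshold too high
                words[n] = min(words[n] + 1, WORD_MAX)
            elif acc > 0:                      # C[N] trips too often: threshold too low
                words[n] = max(words[n] - 1, WORD_MIN)
    return words


def residual_offsets(offsets_lsb, words):
    """Offset left on each comparator after applying its control word."""
    return [o - w * STEP_LSB for o, w in zip(offsets_lsb, words)]


if __name__ == "__main__":
    rng = random.Random(1)
    offsets = [max(-2.5, min(2.5, rng.gauss(0.0, 0.9))) for _ in range(N_COMP)]
    words = calibrate(offsets)
    worst = max(abs(r) for r in residual_offsets(offsets, words))
    print(f"worst residual offset: {worst:.3f} LSB")
```

In this model the residual offset of each comparator settles to within one calibration step and then dithers around zero, because only one comparator is updated at a time and the others keep their stored words.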

PAGE 146

Note that [...] thermometer output instead of its decoded binary output. This eliminates the need for a 5 b digital comparator and bypasses the possible complication introduced by bubble error correction.

6.3.8 Clock and Power Distribution

Clock distribution is of crucial importance in high-speed ADC design. The clock buffers are sized for the same fan-out. Dummy loads are inserted in the clock tree to compensate for unbalanced loads. To account for the finite delay through the preamplifier, the clock of the T/H leads that of the comparators by one inverter delay. Since the clock of the FSM and the decimator is divided down from the full-speed clock and its phase relationship with the full-speed clock is unknown, multiple phases are generated for selection through the on-chip serial interface.

The power supply is split into analog and digital domains. Decoupling capacitors are inserted wherever there is spare area. To prevent noise coupling through the substrate, a guard ring is inserted between the analog part and the digital part. The guard ring is connected to a dedicated ground pad, separate from the analog and digital ground pads [99].

6.4 Experimental Results

The prototype 5-bit flash ADC was fabricated in a 0.13 µm [...]-poly 8-metal bulk CMOS process and was measured in a QFN package. Figure 6-23 shows the chip micrograph. The ADC core occupies an active area of 0.24 mm². Even without any layout optimization, the calibration circuitry takes less than 10% of the core area.
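
Looking back at the digital backend of Section 6.3.5, the thermometer-to-binary path can be illustrated in software. The sketch below is a simplified model with several assumptions: the three-input AND is taken as "this comparator high, next two low" (one common form of first-order bubble suppression), a standard reflected Gray code stands in for the quasi-Gray code of the ROM encoder, and the ROM is modeled as a wired-OR of the rows selected by the asserted one-hot lines. All function names are hypothetical.

```python
def thermometer_to_one_hot(thermo):
    """Three-input AND per line: thermo[i] AND NOT thermo[i+1] AND NOT thermo[i+2]."""
    padded = list(thermo) + [0, 0]   # comparators above the top read as 0
    return [padded[i] & (1 - padded[i + 1]) & (1 - padded[i + 2])
            for i in range(len(thermo))]


def gray_encode(n):
    """Reflected binary Gray code (stand-in for the quasi-Gray ROM contents)."""
    return n ^ (n >> 1)


def rom_encode(one_hot):
    """Wired-OR of the Gray words addressed by the asserted one-hot lines."""
    word = 0
    for i, line in enumerate(one_hot):
        if line:
            word |= gray_encode(i + 1)
    return word


def gray_to_binary(gray):
    """XOR chain: each binary bit is the XOR of all higher-order Gray bits."""
    binary = 0
    while gray:
        binary ^= gray
        gray >>= 1
    return binary


def decode(thermo):
    """31-bit thermometer code to 5-bit binary code."""
    return gray_to_binary(rom_encode(thermometer_to_one_hot(thermo)))


def decimate(codes, factor=64):
    """Keep every factor-th output code."""
    return codes[::factor]


if __name__ == "__main__":
    clean = [1] * 12 + [0] * 19
    bubbled = [1, 1, 0, 1] + [0] * 27
    print(decode(clean), thermometer_to_one_hot(bubbled).count(1), decode(bubbled))
```

With a two-input AND the bubbled pattern above would assert two one-hot lines at once; the three-input form asserts only one, so the decoded value stays within one code of the number of ones in the thermometer word.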

PAGE 147

Figure 6-23. Chip micrograph.

The ADC was powered from a 1.2 V supply. The reference voltages V_RP and V_RN were set to 1.2 V and 0.8 V, respectively, giving a differential full-scale input range of 0.8 V. [...] was captured by a mixed-signal oscilloscope and post-processed in Matlab. [...] the ADC and recording the levels at which the output toggles. The peak-to-peak noise observed during DC measurement is 2.5 mV, or roughly 0.1 LSB. To remove the effect of noise during the DC measurement, the output codes were averaged to find the transition levels.

Figure 6-24 shows the measured INL and DNL with and without calibration. When calibration is disabled, i.e., when all the SRAM bits are cleared to 0 through the serial interface, the ADC has an INL of 1.85/1.48 LSB and a DNL of 1.00/2.75 LSB. Enabling calibration improves the INL to 0.21/0.17 LSB and the DNL to 0.07/0.04 LSB. The low calibrated DNL and INL clearly demonstrate the efficacy of the proposed calibration scheme.
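
The static measurement described above (averaged codes, transition levels, then INL and DNL) can be sketched as follows. This is a minimal illustration with assumed conventions: a transition level is taken as the input at which the averaged code crosses k - 0.5, found by linear interpolation, and INL is referenced to ideal levels spaced one LSB apart starting from a caller-supplied first level. The actual post-processing used in this work is not given in the text, and the function names are hypothetical. The LSB follows from the quoted 0.8 V full scale and 5-bit resolution.

```python
def transition_level(v_in, mean_code, k):
    """Input level where the averaged output code crosses k - 0.5."""
    target = k - 0.5
    points = list(zip(v_in, mean_code))
    for (v0, c0), (v1, c1) in zip(points, points[1:]):
        if c0 < target <= c1:
            return v0 + (target - c0) * (v1 - v0) / (c1 - c0)
    raise ValueError(f"averaged code never crosses {target}")


def linearity(transitions, lsb, first_ideal):
    """Return (INL, DNL) in LSB from measured transition levels in volts."""
    inl = [(t - (first_ideal + k * lsb)) / lsb for k, t in enumerate(transitions)]
    dnl = [(transitions[k + 1] - transitions[k]) / lsb - 1.0
           for k in range(len(transitions) - 1)]
    return inl, dnl


if __name__ == "__main__":
    lsb = 0.8 / 2 ** 5                     # 25 mV for a 0.8 V, 5-bit converter
    print(f"LSB = {lsb * 1e3:.1f} mV, 2.5 mV noise = {2.5e-3 / lsb:.2f} LSB")
    ideal = [-0.4 + k * lsb for k in range(1, 32)]
    inl, dnl = linearity(ideal, lsb, first_ideal=-0.4 + lsb)
    print(max(abs(x) for x in inl), max(abs(x) for x in dnl))
```

Averaging the codes before locating each transition is what lets a measurement with roughly 0.1 LSB of noise resolve INL and DNL values well below that level.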

PAGE 148

Figure 6-24. Measured ADC linearity. A) INL. B) DNL.

Figure 6-25 shows the test setup for dynamic performance evaluation. The single-ended input signal from a signal generator is first converted to differential by a passive balun before being fed to the ADC. Figure 6-26 shows the output spectra before and after enabling the calibration. The input signal is a full-scale 1.172 GHz sine wave, and the sample rate is 2.5 GS/s. Note that, due to the decimation, the fundamental tone is aliased to 0.3 MHz and the frequency spans from DC to 19.53125 MHz. The SFDR improves by nearly 12 dB, from 27.3 dB to 39.2 dB, with calibration.
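
The effect of the decimation on the observed spectrum can be checked with a few lines of arithmetic. The sketch below folds an input tone into the first Nyquist zone of the decimated output; the function name and argument names are illustrative, and the numbers are the ones quoted above.

```python
def aliased_frequency(f_in_hz, f_s_hz, decimation):
    """Return (alias frequency, Nyquist frequency) after decimation, in Hz."""
    f_dec = f_s_hz / decimation            # effective output sample rate
    folded = f_in_hz % f_dec               # fold into [0, f_dec)
    alias = min(folded, f_dec - folded)    # reflect into [0, f_dec / 2]
    return alias, f_dec / 2.0


if __name__ == "__main__":
    alias, nyquist = aliased_frequency(1.172e9, 2.5e9, 64)
    print(f"span: DC to {nyquist / 1e6:.5f} MHz, tone at {alias / 1e6:.3f} MHz")
```

The span of 19.53125 MHz is reproduced exactly. The alias position is very sensitive to the input frequency: the rounded value of 1.172 GHz lands at 0.125 MHz, so the quoted 0.3 MHz suggests that the input frequency was slightly above the rounded figure.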

PAGE 149

Figure 6-25. Test setup for dynamic performance evaluation.

Figure 6-26. Output spectra. A) With calibration. B) Without calibration.

Figure 6-27. ENOB with and without calibration.

Figure 6-27 shows the measured ENOB at various sample rates with the input frequency kept at around 1.2 GHz. Without calibration, the highest ENOB is below 3.5

PAGE 150

bits. With calibration, the ENOB improves to 4.7 bits below 2 GS/s and remains above 4.4 bits until 2.5 GS/s. For all sample rates, the calibration improves the ENOB by more than 1.2 bits.

The ADC core (excluding peripheral I/O and termination) consumes 50 mW, of which about 34 mW is consumed by the digital backend and the clocking circuitry. Even without resorting to power-saving architectures such as interpolation and folding, our design achieves a competitive figure of merit (FoM) of 0.95 pJ/conversion. Table 6-2 compares this work with recently published ADCs. Note that designs with similar or better FoM all employ interpolating or folding techniques, except [74], which uses fully dynamic comparators and a more advanced technology.

Table 6-2. Comparison with recently published work

Reference         [77]     [78]     [100]    [74]     [92]     [101]    [102]    [103]    This work
Interpolating     Yes      Yes      No       No       Yes      Yes      No       Yes      No
Folding           No       Yes      No       No       No       No       No       No       No
Resolution (bit)  6        6        4        5        6        6        6        6        5
Fs (GS/s)         3        2.7      4        1.75     3.5      1.6      5        1.2      2.5
INL (LSB)         0.2      0.73     0.24     0.39     1        0.42     0.7      0.6      0.21/0.17
DNL (LSB)         0.2      0.53     0.15     0.38     0.5      0.49     0.6      0.4      0.07/0.04
ENOB              5.8 1)   5.3      3.5      4.7      4.9      5.4      5.0      5.7      4.4
Process (nm)      90       90       180      90       90       130      65       130      130
VDD (V)           1.2      1        1.8/2.5  1        0.9      1.5      1.3      1.5      1.2
Power (mW)        90       50       608      7.6      98       180      320      90       50
Calibration       BG 2)    BG       FG 3)    FG       No       No       No       No       BG
Area (mm²)        0.28     0.36     0.88     0.03     0.15     0.42     0.3      0.12     0.24
FoM (pJ/conv.)    2.3      0.47     13.6     0.17     0.95     2.6      1.97     1.4      0.95

1) With 10 MHz input. 2) Background. 3) Foreground.
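
The FoM column can be cross-checked numerically. The sketch below assumes the commonly used definition FoM = P / (2^ENOB x Fs), which this section does not spell out; the function name is hypothetical. With the values in the "This work" column it reproduces the quoted 0.95 pJ/conversion.

```python
def fom_pj_per_conversion(power_w, enob_bits, fs_hz):
    """Energy per conversion step in pJ, assuming FoM = P / (2**ENOB * Fs)."""
    return power_w / (2.0 ** enob_bits * fs_hz) * 1e12


if __name__ == "__main__":
    # Values taken from the table above.
    print(f"this work: {fom_pj_per_conversion(50e-3, 4.4, 2.5e9):.2f} pJ/conv.")
    print(f"[74]:      {fom_pj_per_conversion(7.6e-3, 4.7, 1.75e9):.2f} pJ/conv.")
```

Not every column can be reproduced this way from the tabulated numbers alone. The ENOB listed for [77], for example, is for a 10 MHz input, so its FoM was presumably computed from a different ENOB value.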

PAGE 151

6.5 Summary

As technology scales, ADC-based serial links are becoming attractive for their flexibility and scalability, and a flash ADC architecture is usually used for its high-speed capability. One of the key challenges in ADC-based serial links is the power consumption of the high-speed ADCs, the reduction of which is limited by the mismatch between components. By compensating for the offset due to mismatch, calibration allows the use of small components and enables low-power designs. Running the calibration in the background provides the additional benefit of tracking environmental changes and device aging. Key metrics for background calibration techniques include accuracy, convergence speed, area/power overhead, and performance penalty. A brief survey of currently available background calibration techniques against these metrics suggests the need for improvement, especially for high-speed ADCs.

A novel digital background ADC calibration scheme has been proposed in this Chapter. By employing a single reference comparator and reconfiguring its threshold voltage, the proposed scheme calibrates the transition levels of the main ADC sequentially. Compared to the simultaneous calibration of existing solutions, this sequential operation leads to extremely low hardware and design overhead. Its impact on the ADC performance is also minimal. The effectiveness of the proposed calibration scheme is experimentally demonstrated by the significant improvements in the static and dynamic performance of a 50 mW 2.5 GS/s 5-bit full-flash ADC in 0.13 µm CMOS. Although a flash ADC is used as a prototype in this work, the concept can be readily extended to other

PAGE 152

architectures. This technique should help pave the way for future low-power ADC-based serial links.

PAGE 153

CHAPTER 7 CONCLUSIONS

The exponential increase of functionality integrated on a single microprocessor requires ever higher aggregate I/O bandwidth. Meanwhile, the whole-chip power budget has been kept practically flat at around 140 W due to packaging and thermal management limitations. As a result, the power efficiency of off-chip signaling must be greatly improved to maintain the scaling of microprocessors.

At multi-Gb/s data rates, the channel imposes a challenging bandwidth bottleneck because of its frequency-dependent loss induced by the skin effect and dielectric dissipation. As a result, high-speed signaling usually resorts to sophisticated equalization such as FFE and DFE to compensate for the channel loss. Besides equalization, other essential functions in a high-speed link include clocking and signaling. To improve the link power efficiency, the implementation options for each function must be carefully evaluated in terms of their impact on the total link power so that informed tradeoffs can be made. This Dissertation represents such an effort from both the circuit and channel perspectives. On the circuit side, different schemes for equalization, clock generation and recovery, and signaling modes are compared. The advantages of DFE, injection-locking-based clock generation, baud-rate CDR, and voltage-mode signaling with differential termination are identified. On the channel side, air-cavity transmission lines are proposed to reduce the dielectric loss of electrical channels at high frequencies. The results of this effort include a 6.25 Gb/s 0.6 pJ/bit active [...] with a current-sharing frontend and an air-cavity channel, a 4.5 Gb/s 3.2 pJ/bit receiver with baud-rate eye-tracking

PAGE 154

CDR and majority-voting DFE, and a 5 Gb/s 0.75 pJ/bit transceiver implemented exclusively in static CMOS logic, which is among the best reported to date.

As semiconductor technology scales, digital signal processing has become more and more power-efficient compared to its analog counterpart. In the field of high-speed off-chip signaling, this has recently led to interest in ADC-based links. One critical challenge in the ADC-based link architecture is to reduce the power consumption of the high-speed ADC, which is limited by component mismatch, among other factors. This Dissertation presents a digital background calibration technique that features minimal overhead and performance penalty. The efficacy of the calibration scheme is experimentally confirmed with a 50 mW 2.5 GS/s 5 b full-flash ADC.

All the silicon results in this Dissertation are based on a 0.13 µm bulk CMOS technology. However, there are no fundamental reasons that prevent the presented techniques from being extended to more advanced technologies. The work in this Dissertation should therefore help pave the way toward more power-efficient off-chip signaling in future electronic systems.

PAGE 155

LIST OF REFERENCES

[1] G. E. Moore, "Cramming more components onto integrated circuits," Electronics, vol. 38, no. 8, pp. 114-117, April 1965.
[2] G. Moore, "Progress in Digital Electronics," in Electron Devices Meeting, 1975.
[3] B. Casper, G. Balamurugan, J. Jaussi, J. Kennedy and M. Mansuri, "Future microprocessor interfaces: analysis, design and optimization," in IEEE Custom Integrated Circuit Conf., 2007.
[4] J. Nasrullah, A. Amin, W. Ahmad, Z. Qin, Z. Mushtaq, O. Javed, J. Yoon, L. Chua, D. Huang, B. Huang, M. Vichare, K. Ho and M. Rashid, "A terabit/s throughput; SerDes based interface for a third generation 16 Core 32 thread chip multithreading SPARC processor," in IEEE Symp. VLSI Circuits, 2008.
[5] "The International Technology Roadmap for Semiconductors (ITRS)," 2011. [Online]. Available: http://public.itrs.net/. [Accessed 2011].
[6] J. Poulton, R. Palmer, A. M. Fuller, T. Greer, J. Eyles, W. J. Dally and M. Horowitz, "A 14 mW 6.25 Gb/s Transceiver in 90 nm CMOS," IEEE J. Solid-State Circuits, vol. 42, no. 12, pp. 2745-2757, December 2007.
[7] K. Fukuda, H. Yamashita, G. Ono, R. Nemoto, E. Suzuki, T. Takemoto, F. Yuki and T. Saito, "A 12.3 mW 12.5 Gb/s complete transceiver in 65nm CMOS," in ISSCC Dig. Tech. Papers, San Francisco, 2010.
[8] M. Harwood, N. Warke, R. Simpson, T. Leslie, A. Amerasekera, S. Batty, D. Colman, E. Carr, V. Gopinathan, S. Hubbins, P. Hunt, A. Joy, P. Khandelwal, B. Killips, T. Krause, S. Lytollis, A. Pickering, M. Saxton, D. Sebastio and G. Swanson, "A 12.5Gb/s SerDes in 65nm CMOS Using a Baud Rate ADC with Digital receiver Equalization and Clock Recovery," in IEEE ISSCC Dig. Tech. Papers, San Francisco, 2007.
[9] H. Johansson and C. Svensson, "Time resolution of NMOS sampling switches used on low swing signals," IEEE J. Solid-State Circuits, vol. 33, no. 2, pp. 237-245, February 1998.
[10] H. Johnson and M. Graham, High speed digital design: a handbook of black magic, New Jersey: Prentice Hall, 1993.
[11] E. Bogatin, "Essential principles of signal integrity," IEEE Microwave Magazine, vol. 12, no. 5, pp. 34-41, August 2011.

PAGE 156

[12] E. Bogatin, Signal integrity: simplified, New Jersey: Prentice Hall, 2003.
[13] W. J. Dally and J. Poulton, "Transmitter equalization for 4 Gbps signaling," Micro, vol. 17, no. 1, pp. 48-56, 1997.
[14] J. Jaussi, G. Balamurugan, D. Johnson, B. Casper, A. Martin, J. Kennedy, N. Shanbhag and R. Mooney, "8 Gb/s source synchronous I/O link with adaptive receiver equalization, offset cancellation, and clock de-skew," IEEE J. Solid-State Circuits, vol. 40, no. 1, pp. 80-88, January 2005.
[15] S. Gondi and B. Razavi, "Equalization and clock and data recovery techniques for 10 Gb/s CMOS serial link receivers," IEEE J. Solid-State Circuits, vol. 42, no. 9, pp. 1999-2011, 2007.
[16] T. Beukema, M. Sorna, K. Selander, S. Zier, B. Ji, P. Murfet, J. Mason, W. Rhee, H. Ainspan, B. Parker and M. Beakes, "A 6.4Gb/s CMOS SerDes core with feed forward and decision feedback equalization," IEEE J. Solid-State Circuits, vol. 40, no. 12, pp. 2633-2645, 2005.
[17] R. Payne, P. Landman, B. Bhakta, S. Ramaswamy, S. Wu, J. D. Powers, M. U. Erdogan, A. Yee, R. Gu, L. Wu, Y. Xie, B. Parthasarathy, K. Brouse, W. Mohammed, K. Heragu, V. Gupta, L. Dyson and W. Lee, "A 6.25 Gb/s binary transceiver in 0.13 um CMOS for serial data transmission across high loss legacy backplane channels," IEEE J. Solid-State Circuits, vol. 40, no. 12, pp. 2646-2657, December 2005.
[18] A. Emami-Neyestanak, A. Varzaghani, J. Bulzacchelli, A. Rylyakov, C. K. Yang and D. Friedman, "A 6.0 mW 10.0Gb/s receiver with switched capacitor summation DFE," IEEE J. Solid-State Circuits, vol. 42, no. 4, pp. 889-896, 2007.
[19] S. Kasturia and J. H. Winters, "Techniques for high speed implementation of nonlinear cancellation," IEEE J. Sel. Areas Commun., vol. 9, no. 5, pp. 711-717, June 1991.
[20] G. Balamurugan, J. Kennedy, G. Banerjee, J. Jaussi, M. Mansuri, F. O'Mahony, B. Casper and R. Mooney, "A scalable 5-15Gbps, 14-75mW low power I/O transceiver in 65nm CMOS," in IEEE Symp. VLSI Circuits, 2007.
[21] F. O'Mahony, S. Shekhar, M. Mansuri, G. Balamurugan, J. E. Jaussi, J. Kennedy, B. Casper, D. J. Allstot and R. Mooney, "A 27Gb/s forwarded clock I/O receiver using an injection locked LC DCO in 45nm CMOS," in IEEE ISSCC Dig. Tech. Papers, San Francisco, 2008.
[22] K. Hu, R. Bai, T. Jiang, C. Ma, A. Ragab, S. Palermo and P. Y. Chiang, "0.16-0.25 pJ/bit, 8 Gb/s near threshold serial link receiver with super harmonic injection locking," IEEE J. Solid-State Circuits, vol. 47, no. 8, pp. 1842-1853, 2012.

PAGE 157

[23] B. Razavi, "A study of injection locking and pulling in oscillators," IEEE J. Solid-State Circuits, vol. 39, no. 9, pp. 1415-1424, 2004.
[24] J. Lee and H. Wang, "Study of subharmonically injection locked PLLs," IEEE J. Solid-State Circuits, vol. 44, no. 5, pp. 1539-1553, 2009.
[25] J. Chen, A. Hu, Y. Fan and R. Bashirullah, "Noise suppression in injection locked ring oscillators," Electronics Letters, vol. 48, no. 6, pp. 323-324, 2012.
[26] M. Hsieh and G. Sobelman, "Architectures for multi gigabit wire linked clock and data recovery," IEEE Circuits and Systems Magazine, vol. 8, no. 4, pp. 45-57, 2008.
[27] C. R. Hogge, "A self correcting clock recovery circuit," IEEE J. Lightwave Tech., vol. 3, no. 12, pp. 1312-1314, 1985.
[28] J. D. H. Alexander, "Clock recovery from binary signals," Electronics Letters, vol. 11, no. 22, pp. 541-542, 30 October 1975.
[29] Y. M. Greshishchev, P. Schvan, J. L. Showell, M. Xu, J. J. Ojha and J. E. Rogers, "A fully integrated SiGe receiver IC for 10 Gb/s data rate," IEEE J. Solid-State Circuits, vol. 35, no. 12, pp. 1949-1957, 2000.
[30] J. Lee and B. Razavi, "A 40 Gb/s clock and data recovery circuit in 0.18um CMOS technology," in IEEE ISSCC Dig. Tech. Papers, San Francisco, 2003.
[31] T. Toifl, C. Menolfi, P. Buchmann, C. Hagleitner, M. Kossel, T. Morf, J. Weiss and M. Schmatz, "A 72mW 0.03mm2 inductorless 40Gb/s CDR in 65nm SOI CMOS," in IEEE ISSCC Dig. Tech. Papers, San Francisco, 2007.
[32] C. Kromer, G. Sialm, C. Menolfi, M. Schmatz, F. Ellinger and H. Jackel, "A 25 Gb/s CDR in 90 nm CMOS for high density interconnects," IEEE J. Solid-State Circuits, vol. 41, no. 12, pp. 2921-2929, December 2006.
[33] B. K. Casper, M. Haycock and R. Mooney, "An accurate and efficient analysis method for multi Gb/s chip to chip signaling schemes," in IEEE Symp. VLSI Circuits, 2002.
[34] H. Hatamkhani and C. K. K. Yang, "A study of the optimal data rate for minimum power of I/Os," IEEE Trans. Circuits and Syst. II, vol. 53, no. 11, pp. 1230-1234, 2006.
[35] M. S. Chen, Y. N. Shih, C. L. Lin, H. W. Hung and J. Lee, "A Fully Integrated 40 Gb/s Transceiver in 65 nm," vol. 47, no. 3, pp. 627-640, March 2012.

PAGE 158

[36] S. Hall and H. Heck, Advanced signal integrity for high speed digital designs, New Jersey: John Wiley & Sons, 2009.
[37] B. Kim, Y. Liu, T. Dickson, J. Bulzacchelli and D. Friedman, "A 10 Gb/s Compact Low Power Serial I/O With DFE IIR Equalization in 65 nm CMOS," IEEE J. Solid-State Circuits, vol. 44, no. 12, pp. 3526-3538, 2009.
[38] T. Tanahashi, M. Kurisu, H. Yamaguchi, T. Nedachi, M. Arai, S. Tomari, T. Matsuzaki, K. Nakamura, M. Fukaishi, S. Naramoto and T. Sato, "A 2 Gb/s 21 CH low latency transceiver circuit for inter processor communication," in IEEE ISSCC Dig. Tech. Papers, San Francisco, 2001.
[39] K. L. Wong, H. Hatamkhani, M. Mansuri and C. K. Yang, "A 27 mW 3.6 Gb/s I/O transceiver," IEEE J. Solid-State Circuits, vol. 39, no. 4, p. 2004, April 2003.
[40] D. M. Pozar, Microwave engineering, New Jersey: John Wiley & Sons, 1998.
[41] M. V. Schneider, "Microstrip lines for microwave integrated circuits," Bell Syst. Tech. Journal, vol. 48, no. 5, pp. 1421-1444, 1969.
[42] G. Balamurugan, J. Kennedy, G. Banerjee, J. Jaussi, M. Mansuri, F. O'Mahony, B. Casper and R. Mooney, "A scalable 5-15 Gbps, 14-75 mW low power I/O transceiver in 65 nm CMOS," IEEE J. Solid-State Circuits, vol. 43, no. 4, pp. 1010-1019, 2008.
[43] T. Spencer, Y. Chen, R. Saha and P. Kohl, "Stabilization of the thermal decomposition of poly(propylene carbonate) through Copper ion incorporation and use in self patterning," Journal of Electronic Materials, pp. 1350-1363, 2011.
[44] D. Z. Turker, A. Rylyakov, D. Friedman, S. Gowda and E. Sanchez Sinencio, "A 19Gb/s 38mW 1 tap speculative DFE receiver in 90nm CMOS," in IEEE Symp. VLSI Circuits, 2009.
[45] W. R. Eisenstadt and Y. Eo, "S parameter based IC interconnect transmission line characterization," IEEE Trans. Components, Hybrids, and Manufacturing Technology, vol. 15, no. 4, pp. 483-490, 1992.
[46] V. Balan, J. Caroselli, J. G. Chern, C. Chow, R. Dadi, C. Desai, L. Fang, D. Hsu, P. Joshi, H. Kimura, C. Liu, T. W. Pan, R. Park, C. You, Y. Zeng, E. Zhang and F. Zhong, "A 4.8-6.4 Gb/s serial link for backplane applications using decision feedback equalization," IEEE J. Solid-State Circuits, vol. 40, no. 9, pp. 1957-1967, 2005.
[47] K. H. Mueller and M. Muller, "Timing recovery in digital synchronous data receivers," IEEE Trans. on Communications, vol. 24, no. 5, pp. 516-531, May 1976.

PAGE 159

[48] A. Emami-Neyestanak, S. Palermo, H. C. Lee and M. Horowitz, "CMOS transceiver with baud rate clock recovery for optical interconnects," in IEEE Symp. VLSI Circuits, 2004.
[49] F. Musa and A. C. Carusone, "A baud rate timing recovery scheme with a dual function analog filter," IEEE Trans. Circuits Syst. II, vol. 53, no. 12, pp. 1393-1397, December 2006.
[50] R. S. Kajley, P. Hurst and J. E. C. Brown, "A mixed signal decision feedback equalizer that uses a look ahead architecture," IEEE J. Solid-State Circuits, vol. 32, no. 3, pp. 450-459, 1997.
[51] W. Fang, "Accurate analytical delay expressions for ECL and CML circuits and their applications to optimizing high speed bipolar circuits," IEEE J. Solid-State Circuits, vol. 25, no. 2, pp. 572-583, 1990.
[52] T. E. Collins, V. Manan and S. I. Long, "Design analysis and circuit enhancements for high speed bipolar flip flops," IEEE J. Solid-State Circuits, vol. 40, no. 5, pp. 1166-1174, 2005.
[53] A. Garg, A. C. Carusone and S. P. Voinigescu, "A 1 tap 40 Gb/s look ahead decision feedback equalizer in 0.18 um SiGe BiCMOS technology," IEEE J. Solid-State Circuits, vol. 41, no. 10, pp. 2224-2232, October 2006.
[54] A. Kapoor, Y. Hu and R. Bashirullah, "Design and optimization of high speed CML gates using a current centric LE model," to appear in IEEE Trans. Circuits & Syst. I.
[55] C. Kromer, G. Sialm, C. Menolfi, M. Schmatz, F. Ellinger and H. Jackel, "A 25 Gb/s CDR in 90 nm CMOS for high density interconnects," IEEE J. Solid-State Circuits, vol. 41, no. 12, pp. 2921-2929, December 2006.
[56] M. G. Chen and J. K. Notthoff, "A 3.3 V 21 Gb/s PRBS generator in AlGaAs/GaAs HBT technology," IEEE J. Solid-State Circuits, vol. 35, no. 9, pp. 1266-1270, 2000.
[57] E. Laskin and S. P. Voinigescu, "A 60 mW per lane, 4X23 Gb/s 2^7-1 PRBS generator," IEEE J. Solid-State Circuits, vol. 41, no. 10, pp. 2198-2208, 2006.
[58] T. O. Dickson, E. Laskin, I. Khalid, R. Beerkens, J. Xie, B. Karajica and S. P. Voinigescu, "An 80 Gb/s 2^31-1 pseudorandom binary sequence generator in SiGe BiCMOS technology," IEEE J. Solid-State Circuits, vol. 41, no. 12, pp. 2735-2745, 2005.
[59] H. Knapp, M. Wurzer, T. F. Meister, J. Bock and K. Aufinger, "40Gbitps 2^7-1 PRBS generator IC in SiGe bipolar technology," in Proc. Bipolar/BiCMOS Circuits and Technology Meeting, Monterey, CA, 2002.

PAGE 160

[60] H. Knapp, M. Wurzer, W. Perndl, K. Aufinger, J. Bock and T. F. Meister, "100 Gb/s 2^7-1 and 54 Gb/s 2^11-1 PRBS generators in SiGe bipolar technology," IEEE J. Solid-State Circuits, vol. 40, no. 10, pp. 2118-2125, 2005.
[61] K. Fukuda, H. Yamashita, F. Yuki, M. Yagyu, R. Nemoto, T. Takemoto, T. Saito, N. Chujo, K. Yamamoto, H. Yanai and A. Hayashi, "An 8Gb/s transceiver with 3X oversampling 2 threshold eye tracking CDR circuit for 36.8dB loss backplane," in IEEE ISSCC Dig. Tech. Papers, San Francisco, 2008.
[62] M. J. E. Lee, W. J. Dally and P. Chiang, "Low power area efficient high speed I/O circuit techniques," IEEE J. Solid-State Circuits, vol. 35, no. 11, pp. 1591-1599, 2000.
[63] S. Quan, F. Zhong, W. L. et al., "A 1.0625 to 14.025Gb/s multimedia transceiver with full rate source series terminated transmit driver and floating tap decision feedback equalizer in 40nm CMOS," in ISSCC Dig. Tech. Papers, San Francisco, 2011.
[64] R. Palmer, J. Poulton, W. J. Dally, J. Eyles, A. M. Fuller, T. Greer, M. Horowitz, M. Kellam, F. Quan and F. Zarkeshvari, "A 14mW 6.25Gb/s transceiver in 90nm CMOS for serial chip to chip communications," in ISSCC Dig. Tech. Papers, San Francisco, 2007.
[65] R. Farjad-Rad, A. Nguyen, J. M. Tran, T. Greer, J. Poulton, W. J. Dally, J. H. Edmondson, R. Senthinathan, R. Rathi, M. J. E. Lee and H. Ng, "A 33 mW 8 Gb/s CMOS clock multiplier and CDR for highly integrated I/Os," IEEE J. Solid-State Circuits, vol. 39, no. 9, pp. 1553-1561, 2004.
[66] K. Hu, T. Jiang, J. Wang, F. O'Mahony and P. Y. Chiang, "A 0.6 mW/Gb/s, 6.4-7.2 Gb/s serial link receiver using local injection locked ring oscillators in 90 nm CMOS," IEEE J. Solid-State Circuits, vol. 45, no. 4, pp. 899-908, 2010.
[67] S. Shekhar, M. Mansuri, F. O'Mahony, G. Balamurugan, J. E. Jaussi, J. Kennedy, D. J. Allstot, R. Mooney and B. Casper, "Strong injection locking in low Q LC oscillators: modeling and application in a forwarded clock I/O receiver," IEEE Trans. Circuits and Syst. I: Regular Papers, vol. 56, no. 8, pp. 1818-1829, 2009.
[68] F. O'Mahony, J. E. Jaussi, J. Kennedy, G. Balamurugan, M. Mansuri, C. Roberts, S. Shekhar, R. Mooney and B. Casper, "A 14X10 Gb/s 1.4mW/Gb/s parallel interface in 45 nm CMOS," IEEE J. Solid-State Circuits, vol. 45, no. 12, pp. 2828-2837, 2010.
[69] J. Cao, B. Zhang, U. Singh, D. Cui, A. Vasani, A. Garg, W. Zhang, N. Kocaman, D. Pi, B. Raghavan, H. Pan, I. Fujimori and A. Momtaz, "A 500mW digitally calibrated AFE in 65nm CMOS for 10Gb/s serial links over backplane and multimode fiber," in IEEE ISSCC Dig. Tech. Papers, San Francisco, 2009.

PAGE 161

[70] H. Yamaguchi, H. Tamura, Y. Doi, Y. Tomita, T. Hamada, M. Kibune, S. Ohmoto, K. Tateishi, O. Tyshchenko, A. Sheikholeslami, T. Higuchi, J. Ogawa, T. Saito, H. Ishida and K. Gotoh, "A 5Gb/s transceiver with an ADC based feedforward CDR and CMA adaptive equalizer in 65nm CMOS," in IEEE ISSCC Dig. Tech. Papers, San Francisco, 2010.
[71] P. Kinget, "Device mismatch and tradeoffs in the design of analog circuits," IEEE J. Solid-State Circuits, vol. 40, no. 6, pp. 1212-1224, June 2005.
[72] I. Young, "Analog mixed signal circuits in advanced nano scale CMOS technology for microprocessors and SoCs," in Proceedings of the ESSCIRC, 2010.
[73] C. Chen, M. Le and K. Kim, "A low power 6 bit flash ADC with reference voltage and common mode calibration," IEEE J. Solid-State Circuits, vol. 44, no. 4, pp. 1041-1046, 2009.
[74] B. Verbruggen, P. Wambacq, M. Kuijk and G. Van der Plas, "A 7.6 mW 1.75 GS/s 5 bit flash A/D converter in 90 nm digital CMOS," in IEEE Symp. VLSI Circuits, 2008.
[75] M. Flynn, C. Donovan and L. Sattler, "Digital calibration incorporating redundancy of flash ADCs," IEEE Trans. Circuits Syst. II, vol. 50, no. 5, pp. 205-213, May 2003.
[76] S. Tsukamoto, I. Dedic, T. Endo, K. Kikuta, K. Goto and O. Kobayashi, "A CMOS 6 b; 200 MSample/s; 3 V supply A/D converter for a PRML read channel LSI," IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1831-1836, 1996.
[77] M. Kijima, K. Ito, K. Kamei and S. Tsukamoto, "A 6b 3GS/s Flash ADC with Background Calibration," in IEEE Custom Integrated Circuits Conf., 2009.
[78] Y. Nakajima, A. Sakaguchi, T. Ohkido, N. Kato, T. Matsumoto and M. Yotsuyanagi, "A background self calibrated 6b 2.7 GS/s ADC with cascade calibrated folding interpolating architecture," IEEE J. Solid-State Circuits, vol. 45, no. 4, pp. 707-718, April 2010.
[79] H. Ploeg, G. Hoogzaad, H. Termeer, M. Vertregt and R. Roovers, "A 2.5 V 12 b 54 Msample/s 0.25 um CMOS ADC in 1 mm2 with mixed signal chopping and calibration," IEEE J. Solid-State Circuits, vol. 36, no. 12, pp. 1859-1867, December 2001.
[80] S. Jamal, D. Fu, N. Chang, P. Hurst and S. Lewis, "A 10 b 120 Msample/s time interleaved analog to digital converter with digital background calibration," IEEE J. Solid-State Circuits, vol. 37, no. 12, pp. 1618-1627, December 2002.

PAGE 162

[81] C. Huang and J. Wu, "A background comparator calibration technique for flash analog to digital converters," IEEE Trans. Circuits Syst., vol. 52, no. 9, pp. 1732-1740, September 2005.
[82] D. Fu, K. C. Dyer, S. H. Lewis and P. J. Hurst, "A digital background calibration technique for time interleaved analog to digital converters," IEEE J. Solid-State Circuits, vol. 33, no. 12, pp. 1904-1911, 1998.
[83] S. Tsukamoto, I. Dedic, T. Endo, K. Kikuta, K. Goto and O. Kobayashi, "A CMOS 6 b, 200 MSample/s, 3 V supply A/D converter for a PRML read channel LSI," IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1831-1836, 1996.
[84] J. Ingino and B. Wooley, "A continuously calibrated 12 b, 10 MS/s, 3.3 V A/D converter," IEEE J. Solid-State Circuits, vol. 33, no. 12, pp. 1920-1931, 1998.
[85] Y. Chiu, C. Tsang, B. Nikolic and P. Gray, "Least mean square adaptive digital background calibration of pipelined analog to digital converters," IEEE Trans. Circuits Syst., vol. 51, no. 1, pp. 38-46, 2004.
[86] X. Wang, P. J. Hurst and S. H. Lewis, "A 12 bit 20 MSample/s pipelined analog to digital converter with nested digital background calibration," IEEE J. Solid-State Circuits, vol. 39, no. 11, pp. 1799-1808, November 2004.
[87] C. Tsang, Y. Chiu, J. Vanderhaegen, S. Hoyos, C. Chen, R. Brodersen and B. Nikolic, "Background ADC calibration in digital domain," in IEEE Custom Integrated Circuits Conf., 2008.
[88] H. Wang, X. Wang, P. J. Hurst and S. H. Lewis, "Nested digital background calibration of a 12 bit pipelined ADC without an input SHA," IEEE J. Solid-State Circuits, vol. 44, no. 10, pp. 2780-2789, 2009.
[89] J. McNeill, M. C. W. Coln and B. J. Larivee, "'Split ADC' architecture for deterministic digital background calibration of a 16 bit 1 MS/s ADC," IEEE J. Solid-State Circuits, vol. 40, no. 12, pp. 2437-2445, 2005.
[90] J. Doernberg, P. Gray and D. Hodges, "A 10 bit 5 Msample/s CMOS two step flash ADC," IEEE J. Solid-State Circuits, vol. 24, no. 4, pp. 241-249, 1989.
[91] K. Uyttenhove and M. Steyaert, "A 1.8 V 6 bit 1.3 GHz flash ADC in 0.25 um CMOS," IEEE J. Solid-State Circuits, vol. 38, no. 7, pp. 1115-1122, July 2003.
[92] K. Deguchi, N. Suwa, M. Ito, T. Kumamoto and T. Miki, "A 6b 3.5GS/s 0.9V 98mW flash ADC in 90nm CMOS," IEEE J. Solid-State Circuits, vol. 43, no. 10, pp. 2303-2310, 2008.






BIOGRAPHICAL SKETCH

Jikai Chen received his BSEE and MSEE degrees from East China Normal University, Shanghai, China, and Zhejiang University, Hangzhou, China, respectively. He received his PhD from the University of Florida, Gainesville, FL, in 2013. From 2003 to 2004, he was an analog IC design engineer with Realsil Microelectronics, working on PLL-based clock buffers. From 2004 to 2006, he was a senior analog IC design engineer with Philips Semiconductors (now NXP), designing high-voltage LCD drivers. From 2006 to 2012, he was a research assistant with the Integrated Circuit Research lab of the University of Florida, where his research focused on low-power circuit design for high-speed serial links. Since 2012, he has been with Texas Instruments as an analog circuit designer, working on high-speed circuit design for optical communications.