Citation
On-Chip Structures for Reliability Management of System-On-Chips

Material Information

Title:
On-Chip Structures for Reliability Management of System-On-Chips
Creator:
Sadi, Mehdi Zahid
Place of Publication:
[Gainesville, Fla.]
Florida
Publisher:
University of Florida
Publication Date:
Language:
english
Physical Description:
1 online resource (129 p.)

Thesis/Dissertation Information

Degree:
Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Electrical and Computer Engineering
Committee Chair:
TEHRANIPOOR,MARK M
Committee Co-Chair:
FORTE,DOMENIC J
Committee Members:
BHUNIA,SWARUP
BUTLER,KEVIN

Subjects

Subjects / Keywords:
reliability -- system-on-chip -- vlsi
Electrical and Computer Engineering -- Dissertations, Academic -- UF
Genre:
bibliography ( marcgt )
theses ( marcgt )
government publication (state, provincial, territorial, dependent) ( marcgt )
born-digital ( sobekcm )
Electronic Thesis or Dissertation
Electrical and Computer Engineering thesis, Ph.D.

Notes

Abstract:
With aggressive technology scaling in the FinFET era, the transistor density per unit chip area has increased significantly over the past decade. As a result, complex billion-transistor Systems-on-Chip (SoCs) that integrate multicore processors, memory, and ASICs on the same die are becoming the trend. The semiconductor design houses actively conform to this trend to remain competitive in the global market. Although the complex design of emerging SoCs offers a significant performance boost with innovative features, maintaining lifetime reliable operation of these SoCs is becoming a challenge. To develop an integrated framework for continuous monitoring of key hardware reliability parameters, we explore the design of novel embedded on-chip structures that can collect key reliability data from critical components of the SoC in-field. Moreover, we explore the use of software level machine learning techniques to process the data collected by these on-chip macros for the reliability and integrity management of SoCs. In this presentation the primary focus will be on the reliability challenges stemming from power supply noise, timing delay of critical paths, and in-field aging degradation. Increased functional density with shrinking technology could result in escalating Power Supply Noise (PSN) induced failures in the field. To address these issues, a fully digital on-chip distributed sensor network is presented to continuously monitor the PSN profile across the chip, and generate a trace for diagnosis of any noise-induced failure at silicon validation, structural test, system test and functional operation phases of SoCs. The sensors capture PSN at a fine granularity and store the SoC's critical status bits. The sensor offers easy access and control with the aid of scan chains. The performance of the sensor network has been demonstrated in the physical design of a multicore processor SoC benchmark. The sensor module itself has been fabricated with GlobalFoundries' 14 nm FinFET technology.
Because of process variation induced device and interconnect parametric shifts, the post-silicon critical or near-critical paths differ from those identified in the pre-silicon stage. As a result, the operating speed or Fmax varies from sample to sample of the same chip. Functional workload-based speed binning techniques incur high test cost in terms of long test time and complexity in functional test generation, and require high-end automatic test equipment. We propose a novel speed binning flow that uses path timing slacks, extracted with robust digital embedded sensor IPs, of selected critical/near-critical paths. We apply machine learning techniques to model a predictor considering the extracted slacks and the Fmax values from a set of randomly tested die during wafer sort. The trained predictor is used to obtain the Fmax for the remaining chips. For a sufficient number of training samples, Fmax is correctly predicted for 99% of the prediction samples.
A novel methodology is presented to accurately predict the degradation due to aging mechanisms in a SoC at run-time by utilizing the existing Logic Built-in Self-Test (LBIST) hardware and a software implemented machine learning classifier. Using an innovative method, we convert ATPG-generated transition delay patterns into LBIST patterns, and the corresponding responses are utilized in developing the predictor. A gate-overlap and path delay-aware pattern selection algorithm selects the features for the classifier. Using clock sweeping, LBIST is able to capture the aging effect on targeted paths. The result of software level machine learning is then utilized to activate countermeasures at the hardware level to remedy the degradation in the field. We implemented our proposed flow on SoC benchmark circuits, and the results demonstrated worst-case prediction accuracy of 94% to 97%. ( en )
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Thesis:
Thesis (Ph.D.)--University of Florida, 2017.
Local:
Adviser: TEHRANIPOOR,MARK M.
Local:
Co-adviser: FORTE,DOMENIC J.
Statement of Responsibility:
by Mehdi Zahid Sadi.

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
LD1780 2017 ( lcc )

Downloads

This item has the following downloads:


Full Text

PAGE 1

ON-CHIP STRUCTURES FOR RELIABILITY MANAGEMENT OF SYSTEM-ON-CHIPS
By
MEHDI SADI
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2017

PAGE 2

© 2017 Mehdi Sadi

PAGE 3

I dedicate this dissertation to my family and friends.

PAGE 4

ACKNOWLEDGMENTS
I would like to express my gratitude to my PhD advisor Prof. Mark Tehranipoor, for giving me the opportunity to be a member of his elite research group. Along the way Prof. Mark Tehranipoor has consistently offered guidance, and provided opportunities that not only made the completion of my dissertation possible, but also helped me develop important career, research and leadership skills. Without a question, Prof. Mark is one of the kindest and most generous people I have ever known, and a graduate student would be lucky to have such a wonderful and resourceful advisor.
I want to thank my dissertation committee - Prof. Swarup Bhunia, Prof. Domenic Forte and Prof. Kevin Butler - for their valuable suggestions that helped me complete my dissertation. I am very grateful to my industry mentors Dr. Sukeshwar Kannan and Luke England from GlobalFoundries for giving me the opportunity to do research related to my dissertation at their facility, and for the silicon tape-outs with GlobalFoundries' advanced technology. I would also like to thank LeRoy Winemberg, Dr. Jifeng Chen and Dat Tran from NXP Semiconductors for their valuable feedback related to my research. I would like to thank Dr. Ujjwal Guin for his valuable mentorship and guidance from the beginning of my PhD studies. Special gratitude to my friends and colleagues - Adib, Tauhid, Fahim, Mahbub, Miao - for their constant support over the years. Finally, special thanks to Semiconductor Research Corporation (SRC) and National Science Foundation (NSF) for funding the research that led to this dissertation.

PAGE 5

TABLE OF CONTENTS
page
ACKNOWLEDGMENTS ... 4
LIST OF TABLES ... 8
LIST OF FIGURES ... 9
ABSTRACT ... 11
CHAPTER
1 INTRODUCTION ... 13
  1.1 Motivation ... 13
    1.1.1 Reliability Impacts of Power Supply Noise ... 16
    1.1.2 Impact of Process Variations on Chip Speed and Timing Reliability ... 18
    1.1.3 Wear-out and Chip Lifetime Reliability ... 19
  1.2 Problem Statement ... 20
    1.2.1 Task 1. SoC Reliability Against PSN ... 20
    1.2.2 Task 2. Path Timing Slack Extraction and F_MAX Analysis ... 21
    1.2.3 Task 3. SoC Lifetime Reliability Management ... 21
  1.3 Related Work ... 21
    1.3.1 Task 1. SoC Reliability Against PSN ... 21
    1.3.2 Task 2. Path Timing Slack Extraction and F_MAX Analysis ... 23
    1.3.3 Task 3. SoC Lifetime Reliability Management ... 25
  1.4 Proposed Solutions ... 27
    1.4.1 Task 1. SoC Reliability Against PSN ... 27
    1.4.2 Task 2. Path Timing Slack Extraction and F_MAX Analysis ... 27
    1.4.3 Task 3. SoC Lifetime Reliability Management ... 28
  1.5 Contributions ... 28
    1.5.1 Task 1. SoC Reliability Against PSN ... 28
    1.5.2 Task 2. Path Timing Slack Extraction and F_MAX Analysis ... 28
    1.5.3 Task 3. SoC Lifetime Reliability Management ... 29
  1.6 Thesis Organization ... 29
2 DESIGN OF A NETWORK OF DIGITAL SENSOR MACROS FOR EXTRACTING POWER SUPPLY NOISE PROFILE IN SOCS ... 31
  2.1 Power Supply Noise Model ... 31
  2.2 The Sensor Network Architecture ... 33
    2.2.1 Local Sensor Block ... 34
      2.2.1.1 Transition Generation Unit (TGU) ... 35
      2.2.1.2 Reconfigurable Delay Line (RDL) ... 35
      2.2.1.3 Fixed Delay Stage (FDS) ... 35
      2.2.1.4 Control Vector Unit ... 35

PAGE 6

      2.2.1.5 Capture Unit (CU) ... 35
      2.2.1.6 Storage Unit (SU) ... 36
      2.2.1.7 Trigger Pulse Generator ... 37
    2.2.2 Global Control and Debug Unit ... 37
    2.2.3 Sensor Access and Control ... 37
  2.3 Working Principle: Tracking Average VDD per clock cycle ... 38
  2.4 Debug Unit ... 40
  2.5 Design Methodology ... 44
  2.6 Simulation Results and Analysis ... 46
  2.7 Test Chip Implementation ... 57
3 SOC SPEED BINNING USING MACHINE LEARNING AND ON-CHIP SLACK SENSORS ... 58
  3.1 Proposed Speed Binning Flow ... 58
  3.2 The Path Slack Sensor ... 61
    3.2.1 Clock Gating Unit ... 61
    3.2.2 Delay Line ... 61
    3.2.3 Monitor Unit ... 62
    3.2.4 Capture and Result Storage Unit ... 63
  3.3 Sensor Insertion Flow ... 64
  3.4 Machine Learning ... 67
  3.5 Simulation Results and Analysis ... 67
  3.6 Test Chip Implementation ... 81
4 DESIGN OF RELIABLE SOCS WITH BIST HARDWARE AND MACHINE LEARNING ... 82
  4.1 Hardware Level Implementation Details ... 82
    4.1.1 Identification of High Usage Critical or Near-Critical Paths ... 84
    4.1.2 LBIST-excitable Path Identification ... 85
      4.1.2.1 ATPG Pattern Generation ... 85
      4.1.2.2 LBIST Seed Extraction ... 86
      4.1.2.3 Detailed Path Database Creation ... 88
    4.1.3 Gate-overlap Aware LBIST Pattern Selection for In-field Application ... 90
    4.1.4 Response Storage Flip-flop Insertion ... 92
  4.2 Countermeasures Against Aging Degradation ... 93
    4.2.1 Clock Frequency Adjustment ... 93
    4.2.2 Adaptive Voltage Scaling (AVS) ... 94
    4.2.3 Adaptive Body Biasing (ABB) ... 96
  4.3 Software Level Predictor ... 96
    4.3.1 Impact of Process Variations on Predictor Selection ... 97
    4.3.2 Feature Selection for Machine Learning Predictor ... 97
    4.3.3 Machine Learning Framework ... 98
  4.4 Simulation Results and Analysis ... 102

PAGE 7

5 SUMMARY AND CONCLUSIONS ... 111
  5.1 Sensor Network Design for Power Supply Noise Profile Extraction ... 111
  5.2 Speed Binning with Sensors and Machine Learning ... 111
  5.3 BIST Hardware and Machine Learning in Reliable SoC Design ... 112
  5.4 Future Work ... 112
APPENDIX: MULTICLASS MACHINE LEARNING ALGORITHMS ... 113
REFERENCES ... 119
BIOGRAPHICAL SKETCH ... 129

PAGE 8

LIST OF TABLES
Table page
2-1 Calibration results at 25 °C for 28 nm standard cell library ... 47
2-2 Sensor readings from SPARC core ... 55
2-3 Post layout statistics for SPARC core with sensors ... 57
3-1 Sensor calibration results for 28 nm standard cell library (… °C to 80 °C) ... 70
3-2 Sensor data extraction from actual paths of FGU ... 71
3-3 Layout statistics and sensor area overhead for FGU ... 71
3-4 Sensor loading effect on path delay ... 73
3-5 Sensor distribution in FGU of OpenSPARC T2 ... 74
3-6 Process variation profile for Monte Carlo simulations ... 75
4-1 Response codes at different test clock cycles for different aging stages ... 99
4-2 Interpretation of the response codes ... 100
4-3 Results from benchmark circuits at 28 nm standard cell library ... 103
4-4 Detection resolution (percent increase in path delay) at different aging stages ... 104

PAGE 9

LIST OF FIGURES
Figure page
1-1 Different types of variability impacting SoCs ... 14
1-2 Performance degradation from variability ... 14
1-3 Discrepancy in path delay in pre-silicon and post-silicon stages ... 15
1-4 Impact of process variations on F_MAX ... 15
1-5 Power supply noise profile ... 16
1-6 Illustration of chip degradation with usage (source IEEE) ... 16
2-1 Power supply noise profile at chip level ... 32
2-2 The sensor network architecture ... 33
2-3 Circuit diagram of the Power Supply Noise (PSN) sensor IP ... 34
2-4 Global control and debug unit ... 37
2-5 Sensor access with JTAG ... 38
2-6 VDD over a clock cycle and VDD observed by each gate ... 39
2-7 Relation between mean VDD over a clock cycle and fixed VDD ... 40
2-8 The Chip Status Register (CSR) unit ... 41
2-9 The clock counter module ... 41
2-10 Counting clock within an instruction window ... 41
2-11 The design space exploration ... 47
2-12 Post-layout SPICE simulation results ... 50
2-13 Effect of process variation on Control Vector ... 51
2-14 The sensor macro layout ... 52
2-15 The complete layout of SPARC T1 with sensor network ... 53
2-16 The CSR of SPARC ... 55
2-17 Snippet from the GDSII view from GlobalFoundries test chip ... 57
3-1 F_MAX prediction ... 59
3-2 The slack sensor architecture for rising transition ... 62

PAGE 10

3-3 Sensor insertion flow ... 64
3-4 Wave-shapes of sensor response from SPICE simulation ... 68
3-5 Monte Carlo process variation simulation results on sensor IP at nominal VDD ... 70
3-6 Sensor loading effect on path delay ... 72
3-7 The complete layout with sensors ... 75
3-8 F_MAX distribution in Monte Carlo samples ... 77
3-9 Importance of each feature (Sensor) in developing the Random Forest predictor ... 77
3-10 Prediction results for each of the 100 validation samples ... 78
3-11 Improvement of prediction accuracy with increasing number of training samples ... 79
3-12 Application of slack sensor in TSV delay measurement ... 80
4-1 Hardware level implementation details of BIST-RM ... 82
4-2 Delay increase with time for a path in 28 nm technology ... 83
4-3 Overview of BIST-RM system ... 84
4-4 Paths most likely to be critical or near-critical ... 84
4-5 Toggle rate analysis ... 85
4-6 Gate-overlap analysis ... 90
4-7 Response storage flip-flops connected with capture flip-flops ... 92
4-8 Adaptive voltage scaling to counteract aging degradation ... 94
4-9 Look Up Table (LUT) for voltage selection for ABB/AVS ... 94
4-10 Use of BIST-RM with AVS technique ... 95
4-11 Use of BIST-RM with ABB technique ... 97
4-12 Aging effect on the delay distribution of paths excited by the LBIST hardware ... 98
4-13 Final layouts of benchmarks ... 105
4-14 Prediction with ECOC SVM classifier (results for 3% aging grid resolution) ... 105
4-15 Prediction accuracy and error range for different classifiers ... 106
4-16 Prediction accuracy with varying sizes of training samples for different classifiers ... 108
4-17 Normalized execution time in software for prediction ... 109

PAGE 11

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
ON-CHIP STRUCTURES FOR RELIABILITY MANAGEMENT OF SYSTEM-ON-CHIPS
By
Mehdi Sadi
May 2017
Chair: Mark M. Tehranipoor
Major: Electrical and Computer Engineering
With aggressive technology scaling in the FinFET era, the transistor density per unit chip area has increased significantly over the past decade. As a result, billion-transistor complex System on Chips (SoC) that integrate multicore processors, memory, and ASICs on the same die are becoming the trend. The semiconductor design houses actively conform to this trend to remain competitive in the global market. Although the complex design of emerging SoCs offers a significant performance boost with innovative features, maintaining lifetime reliable operation of these SoCs is becoming a challenge. To develop an integrated framework for continuous monitoring of key hardware reliability parameters, we explore the design of novel embedded on-chip structures that can collect key reliability data from critical components of the SoC in-field. Moreover, we explore the use of software level machine learning techniques to process the data collected by these on-chip macros for the reliability and integrity management of SoCs.
In this dissertation the primary focus will be on the reliability challenges stemming from Power Supply Noise (PSN), timing delay of critical paths, and in-field aging degradation. Increased functional density with shrinking technology could result in escalating PSN induced failures in the field. To address these issues, a fully digital on-chip distributed sensor network is presented to continuously monitor the PSN profile across the chip, and generate a trace for diagnosis of any noise-induced failure at silicon validation, structural test, system test and functional operation phases of SoCs. The sensors capture PSN at

PAGE 12

a fine granularity and store the SoC's critical status bits. The sensor offers easy access and control with the aid of scan chains. The performance of the sensor network has been demonstrated in the physical design of a multicore processor SoC benchmark. The sensor module itself has been fabricated with GlobalFoundries' 14 nm FinFET technology.
Because of process variation induced device and interconnect parametric shifts, the post-silicon critical or near-critical paths differ from those identified in the pre-silicon stage. As a result the operating speed or F_MAX varies from sample to sample of the same chip. Functional workload-based speed binning techniques incur high test-cost in terms of long test-time and complexity in functional test generation, and require high-end automatic test equipment. We propose a novel speed binning flow that uses path timing slacks, extracted with robust digital embedded sensor IPs, of selected critical or near-critical paths. We apply machine learning techniques to model a predictor considering the extracted slacks and the F_MAX values from a set of randomly tested die during wafer sort. The trained predictor is used to obtain the F_MAX for the remaining chips. For a sufficient number of training samples, F_MAX is correctly predicted for 99% of the prediction samples.
In this paper, a novel framework is presented for designing lifetime-reliable SoCs with self-adaptation capability against aging induced degradation. The proposed flow utilizes the existing Logic Built In Self Test (LBIST) hardware, and a software implemented Machine Learning predictor to activate appropriate countermeasures to remedy the wear-out in the field. Using an innovative method, we convert ATPG-generated transition delay test patterns into LBIST patterns to activate high-usage critical/near-critical paths in-field, and the corresponding responses are utilized in developing the predictor. A gate-overlap and path delay-aware algorithm selects the optimum set of patterns. The area and test time overhead for the framework are very low. We implemented our proposed flow on SoC benchmark designs, and the results demonstrated its efficacy.

PAGE 13

CHAPTER 1
INTRODUCTION
This chapter begins with the description of the motivation behind this thesis research. Next, the problem statement, related prior research, and proposed solutions are discussed. Finally, this chapter concludes with the thesis contributions, and thesis organization sections.
1.1 Motivation
With the advent of FinFETs and continued technology scaling in the nanometer regime, the trend of increased transistor density per unit chip area has sustained over the years. Although this paved the way for highly integrated System-on-Chips (SoCs) and many-core processors [1], the increased variability in transistor parameters in the sub-32 nm process and the workload dependent fluctuations in chip operating conditions have made harnessing the full benefits of scaling a challenging task [2]. Variations - a major concern for designers - can be grouped into three major categories. The first category includes the one-time static process variations due to manufacturing imperfections that cause transistor and interconnect parameters to drift from their designed values. Run-time dynamic variations - power supply noise and temperature fluctuations [2][3] - causing shifts in chip operating conditions are put in the second category. Finally, the third category consists of workload-dependent aging variations that are induced by stress from prolonged operations and cause parametric degradation over time [2][4].
In the case of the first subgroup, the cumulative effect of static process variations - fluctuations in transistor length, width, threshold voltage, oxide thickness etc. - manifests itself in path delay variations. As a result, a certain path may show significant discrepancy in the delay between pre- and post-fabrication stages [5][6] and the actual speed limiting paths might be masked in the simulation phase [7][8]. A direct result of path delay fluctuations is unequal chip operating frequency (F_MAX) for different samples of the same chip as observed during the functional F_MAX testing phase.
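The link between static path delay variation and unequal F_MAX across samples can be illustrated with a small Monte Carlo sketch. This is only an illustration under stated assumptions, not the dissertation's flow: the function names, the nominal path delays, and the independent Gaussian variation model are all hypothetical, and real process variation is spatially correlated.

```python
import random


def sample_fmax_ghz(nominal_delays_ns, sigma_fraction, rng):
    """Return the F_MAX (GHz) of one simulated die.

    Assumption: each path delay is scaled by an independent Gaussian factor
    with mean 1.0; the factor is floored at 0.5 so a delay never becomes
    non-positive.
    """
    varied = [d * max(0.5, rng.gauss(1.0, sigma_fraction)) for d in nominal_delays_ns]
    slowest = max(varied)      # the speed-limiting path of this particular die
    return 1.0 / slowest       # 1 / ns = GHz


def simulate_dies(nominal_delays_ns, n_dies, sigma_fraction=0.05, seed=0):
    """Simulate n_dies samples of the same design and return their F_MAX values."""
    rng = random.Random(seed)
    return [sample_fmax_ghz(nominal_delays_ns, sigma_fraction, rng) for _ in range(n_dies)]


if __name__ == "__main__":
    paths_ns = [0.95, 0.97, 1.00, 0.90]   # hypothetical nominal path delays
    for die, fmax in enumerate(simulate_dies(paths_ns, n_dies=5)):
        print(f"die {die}: F_MAX = {fmax:.3f} GHz")
```

With zero variation every die reports the same F_MAX, set by the nominal slowest path; with variation, the speed-limiting path and the resulting F_MAX can differ from die to die.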

PAGE 14

Figure 1-1. Different types of variability impacting SoCs.
Figure 1-2. Performance degradation from variability.
For the second category of variations - which include temperature and PSN - advanced packaging techniques and software level workload allocation are exploited for thermal management of a chip to keep temperature induced parametric shift at minimum [9]. Besides that, because of temperature inversion effect (i.e., circuit delay decrease with increasing temperature in modern CMOS nodes) [10] the circuit timing failure with temperature rise is highly improbable. On the other hand, PSN may cause timing failure

PAGE 15

Figure 1-3. Discrepancy in path delay in pre-silicon and post-silicon stages.
Figure 1-4. Different samples of the same SoC/processor exhibit dissimilar operating F_MAX because of process variations.
if not kept under control by adopting droop/power supply noise-aware voltage regulation and decoupling capacitive scheme [11][12]. Moreover, adverse impacts of PSN on circuit timing might demonstrate dissimilar patterns in structural test and functional modes [3][13].

PAGE 16

Figure 1-5. Power supply noise profile.
Figure 1-6. Illustration of chip degradation with usage (source IEEE).
Since the third category of variations results from events that persist over the lifetime of a SoC - Bias Temperature Instability (BTI), Hot Carrier Injection (HCI) and Time Dependent Dielectric Breakdown (TDDB) - no straightforward design time static solutions are available [14][15]. As a result, a significant amount of research effort is being spent on ensuring that the flip-flops/latches located at circuit path ends capture the correct data over the chip lifetime despite fluctuations in timing margins due to dynamic aging effects [16][17].
1.1.1 Reliability Impacts of Power Supply Noise
The variations in the power/ground supply voltages resulting in PSN are caused by considerable switching activity in the circuit. Two main contributing factors for PSN are IR-drop caused by the parasitic resistive network and L di/dt drop caused by package and parasitic inductors [3]. Models of supply network and PSN were developed and incorporated into CAD tools to give chip, package, and system designers the ability to find

PAGE 17

potential supply integrity issues in simulation [18]. Verification of the results of the CAD tools by actual measurements of PSN as it is seen on the die is a key step to signing off on the supply integrity of the design. A crucial part of a reliable SoC design is to ensure a robust power delivery network at all the Process, Voltage and Temperature (PVT) corners. The ability to measure on-die PSN is important for both first-silicon prototype chips as well as mass-production ready customer chips. For a prototype chip, the information about spatial PSN profile allows redistribution of the de-coupling capacitor [19], adjustment of power bumps and modifications of the on-chip voltage regulators [20] to mitigate the impact of noise on circuit timing closure. For a chip that has already been shipped to the customer, the information about PSN allows software techniques such as redistribution of workload among different cores or blocks to mitigate the excessive switching in a certain die location.
The process variations are mainly fabrication dependent and do not vary dynamically at run time, but PSN varies dynamically with workload and test conditions. As a result modeling, detecting and characterizing the PSN has been extensively studied in literature [13][3][18],[21]. The power supply and simultaneous switching noise follow a spatial variation pattern [3][21]. The noise profile depends on the actual workload being executed, and as a result, it can demonstrate dissimilar patterns among production test, system test and actual functional operation [22][23]. The fabricated chips are speed binned during the silicon validation and structural test steps by running different test patterns at different frequencies. Due to the differences between structural test in production test environment and in-system functional operation with varying workloads, the noise profile of a SoC may deviate significantly at functional mode from the corresponding test mode, resulting in different performance. If the PSN profile can be extracted from these two different phases, a model to correlate the differences between those modes may be developed and used to predict speed mismatch during functional operation [22][24].
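The two PSN contributors named in Section 1.1.1, the resistive IR drop and the inductive L di/dt drop, can be sketched numerically. The snippet below is a minimal illustration that assumes a single lumped resistance and a single lumped inductance; the function name and all component values are hypothetical, and a real power delivery network is a distributed RLC grid.

```python
def supply_droop(currents_a, dt_s, r_ohm, l_henry):
    """Return the per-sample supply droop in volts: I*R + L*dI/dt.

    Assumptions: one lumped resistance r_ohm, one lumped inductance l_henry,
    and a backward-difference estimate of dI/dt between consecutive samples
    (the first sample is given dI/dt = 0).
    """
    droop = []
    previous = currents_a[0]
    for current in currents_a:
        di_dt = (current - previous) / dt_s
        droop.append(current * r_ohm + l_henry * di_dt)
        previous = current
    return droop


if __name__ == "__main__":
    # Hypothetical load current stepping from 1 A to 2 A within 1 ns.
    trace = supply_droop([1.0, 1.0, 2.0, 2.0], dt_s=1e-9, r_ohm=0.01, l_henry=1e-11)
    for cycle, volts in enumerate(trace):
        print(f"sample {cycle}: droop = {volts * 1e3:.1f} mV")
```

In this toy trace the inductive term appears only on the sample where the current changes, which is why abrupt switching activity can produce a larger momentary droop than the steady-state IR drop alone.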

PAGE 18

Although modeling of PSN, temperature, and process variations can help provide an understanding of their impact on the circuit performance, the real workload applied during system test and in-the-field operation of the SoCs may provide a completely different picture as to what the noise-induced failures would be. The challenge, however, is the lack of applying the actual workload to the chip under test in production test as such workload may not be available. In-field and in-production test monitoring mechanisms using on-chip sensors would be a potential solution to address the problem. Noise monitors can be used to collect information from the chip in production test, as well as the in-field and system test operations. Then the data collected from the sensors in chips that have failed system-test and validation under real workload could be correlated to the data collected from the same chip that have passed the test in production test. Such analysis would improve effectiveness of structural tests and reduce escapes.
1.1.2 Impact of Process Variations on Chip Speed and Timing Reliability
As a result of path delay variations, the maximum operating frequency of a chip (e.g. microprocessors, DSPs, micro-controllers and ASICs) or the F_MAX varies from chip-to-chip and wafer-to-wafer. In order to sustain an acceptable yield rate without compromising performance, at the production test phase chips are binned according to the maximum functional or operating frequency. Chips in the higher bin are faster and sold at a higher profit. To accurately identify the F_MAX it is necessary to execute functional workloads or benchmarks and excite all the possible critical paths at increasingly higher clock frequencies until any of the capture flip-flops fail [25]. Execution of functional workload consumes a significant portion of tester time and memory. Moreover, this flow is required to be repeated at multiple frequencies to identify the actual F_MAX. Additionally, an expensive Automated Test Equipment (ATE) is required that is capable of running the test workloads at the comparatively higher F_MAX frequency range. In [26] experimental data on test time for server processors were presented to corroborate the fact that a major portion of total test time is associated with speed binning. Although functional F_MAX

PAGE 19

testing is the most appropriate method to identify the chip operating frequency or F_MAX, the high cost and memory requirement of the ATE and the long test time associated with workload execution at multiple frequencies have motivated researchers to investigate alternative low-cost and faster techniques to identify chip F_MAX [27]-[30].
1.1.3 Wear-out and Chip Lifetime Reliability
Reliable, resilient and predictable operation of the chip over its projected lifetime is an absolute necessity [2]. Especially for automotive, medical etc. applications, any deviation from expected behavior can be catastrophic. The primary reasons of silicon degradation after the chip has been shipped to the customer are Electro-Migration (EM), Bias Temperature Instability (BTI), Hot Carrier Injection (HCI), and Time Dependent Dielectric Breakdown (TDDB). EM and TDDB are responsible for hard failures, and occur at the end of chip lifetime [15]. The BTI and HCI manifest as workload-dependent aging variations induced by stress from prolonged operations, and cause parametric degradation over time [15][31]. For high-k metal gate processes, BTI of PMOS (NBTI) is attributed to traps at the silicon/oxide interface, and BTI of NMOS (PBTI) are attributed to traps in the interface of the high-k/interfacial layer [15][14]. As reported in [15][14] for recent Intel 14 nm process, the relative location of the traps with respect to the channel surface causes NBTI to be more impactful on device degradation than PBTI. Although still a source of degradation, NBTI in 14 nm technology is nearly 2x lower than what was observed in 22 nm [15][14]. Hot carrier degradation for 14 nm increases over what was observed in 22 nm due to the combination of both gate length scaling and fin width scaling. Comparing 14 nm to 22 nm, the increase in hot carrier degradation is relatively small compared to the reduction in BTI, and the overall circuit degradation is expected to be low [15][14]. Although the magnitude of aging itself is smaller in advanced technology nodes, because of lower supply voltage at advanced nodes the circuit delay is much more sensitive to aging with time [31][32].
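The closing remark of Section 1.1.3, that circuit delay becomes more sensitive to aging at lower supply voltage, can be made concrete with a rough calculation. The sketch below uses the alpha-power-law delay model (delay proportional to VDD / (VDD - Vth)^alpha) purely as an illustrative assumption; the model choice, the function name, and every numeric value are hypothetical and are not taken from the dissertation.

```python
def relative_delay_increase(vdd, vth, delta_vth, alpha=1.3):
    """Return the fractional gate-delay increase caused by a threshold shift.

    Assumption: alpha-power-law delay model, delay ~ VDD / (VDD - Vth)**alpha,
    with aging represented only as an increase delta_vth of the threshold
    voltage. All voltages are in volts.
    """
    if vdd <= vth + delta_vth:
        raise ValueError("supply voltage must exceed the aged threshold voltage")
    fresh = vdd / (vdd - vth) ** alpha
    aged = vdd / (vdd - vth - delta_vth) ** alpha
    return aged / fresh - 1.0


if __name__ == "__main__":
    # The same hypothetical 20 mV threshold shift at two supply voltages.
    for vdd in (1.0, 0.7):
        increase = relative_delay_increase(vdd, vth=0.3, delta_vth=0.02)
        print(f"VDD = {vdd:.1f} V: delay increase = {increase * 100:.1f} %")
```

Under these assumed numbers the same threshold shift costs proportionally more delay at the lower supply voltage, because the gate overdrive (VDD - Vth) that the shift eats into is smaller.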

PAGE 20

A significant amount of research effort is being spent on ensuring that the flip-flops/latches located at the end of circuit paths capture the correct data over the chip lifetime despite fluctuations in timing margins due to dynamic aging effects [16][17]. Two general classes of techniques are explored to ensure the lifetime timing reliability of SoCs. In the first approach, circuits operate at near zero guard-band and use the method of error correction by architectural replay at lower clock frequency to recover from timing failures [33][34]. Since timing error correction comes at the expense of greater cycles per instruction (CPI) overhead (i.e., from replaying the failing instruction) and requires proprietary circuits [33][34], these techniques are not applicable to general purpose SoCs. The second method assigns a conservative timing guard-band with the nominal path delay as a safety margin against aging degradation [35]. The guard-band assignment is done at the pre-silicon design phase based on the target clock frequency and worst-case operating conditions. Adding a conservative voltage guard-band instead of the timing guard-band is also a practice to counteract aging induced functional clock frequency degradation [36]. These guard-bands are used to guarantee that the system operates correctly from the time the manufacturer ships it to the customer to the projected end of its lifespan. However, such static and pessimistic guard-bands, which are calculated at design time based on worst-case conditions, lead to significant performance loss and energy inefficiency that tax the system. Further, applying the guard-bands to all parts in high volume manufacturing incurs significant overheads. Moreover, workload-dependent aging complicates optimal pre-silicon fixed guard-band assignment [37]. As a result timing reliability of SoCs is still an active field of research in the Design-for-Reliability (DfR) community [37][38].
1.2 Problem Statement
1.2.1 Task 1. SoC Reliability Against PSN
The existing hardware monitors for PSN are mostly analog and incur high area-overhead. Although the analog monitors offer high precision, they are not capable of monitoring and recording PSN in a digital SoC on a clock cycle-by-cycle basis. Moreover,

PAGE 21

a holistic worst-case PSN event detection feature that can capture (i) the noise magnitude, (ii) noise event location in SoC layout, and (iii) noise generating software segment (i.e., test pattern or instruction code), has not been developed previously.
1.2.2 Task 2. Path Timing Slack Extraction and F_MAX Analysis
Light-weight digital sensors that can capture delay or timing slack from actual critical or near-critical paths in-situ for pre-silicon and post-silicon timing analysis have not been explored in detail previously. Another question that needs an answer is how to select a set of critical or near-critical paths across the chip for sensor placement such that the extracted path delays can be utilized to develop an accurate chip F_MAX predictor.
1.2.3 Task 3. SoC Lifetime Reliability Management
It has been demonstrated in numerous previous studies that a one-time static guard-band against wear-out or aging induced timing degradation is a pessimistic solution. The current approach to tackle this issue is to use pre-characterized look-up tables that map deterministic workload activity to required guard-band against aging. The problem, however, is the lack of research on how to account for random workload executions and process variations in accurate prediction of aging degradation, and how to adaptively adjust guard-band throughout the SoC's lifetime.
1.3 Related Work
1.3.1 Task 1. SoC Reliability Against PSN
The PSN measurement sensors can be broadly classified into analog/mixed [39]-[43] and fully digital domains [44]-[46]. As digital structures are less susceptible to variations, they are preferable over their analog counterparts at lower technology nodes. To exactly capture the high bandwidth PSN waveform, a high precision Analog-to-Digital Converter (ADC) is required. But the use of an ADC is impractical for on-chip measurement because of high power dissipation and area requirements. In [44] a long inverter-chain based sensor has been proposed to measure the timing uncertainty from combined noise sources. [42] devised a methodology to obtain information about the distribution of supply noise by


repeated measurement at different threshold levels and tracking the percentage of time VDD crosses each threshold. For the case of a noise waveform that is periodic and synchronized to a single frequency, the method of equivalent-time sampling or subsampling may be applicable [42][47]. This flow operates by using a time-base generator to sweep a sample point from the beginning to the end of the noise cycle, thus tracing out the entire noise waveform. But because of the random nature of chip workload and environmental conditions, during normal operation of a chip the noise variation profile may not be fully periodic. As a result, the dynamics of any non-repetitive or asynchronous portion of the noise are lost, and only the distribution of noise at each equivalent time point is known in these types of measurements [41]. To overcome these issues, [41][43] modeled noise as a cyclostationary random process and extracted the autocorrelation for estimation of the Power Spectral Density (PSD) from the Fourier transform of the extracted autocorrelation sequence. This flow works on the theory that a time-averaged PSD can be measured simply by taking the autocorrelation samples uniformly within the noise period. The measurement hardware consists of two on-chip low-throughput samplers with a VCO-based ADC, a comparator-based sampler, etc. A sufficient number of samples are averaged to get the autocorrelation. In [45] a digital PSN analyzer is described that incorporates a digital random phase-noise accumulator to measure the power spectrum without requiring a separate, uncorrelated time-base clock. It requires a low-resolution VCO-based ADC. To obtain 1 mV accuracy, 32 million cycles are averaged. Although it achieves a high bandwidth and detection resolution, the substantial implementation complexity and area overhead, as detailed in [45], would deter it from multiple deployments across the chip layout to observe the spatial noise profile. To test and debug the effects of voltage transients, the authors in [48] enable voltage transient detection, as well as a capability to induce voltage transients in a controlled manner.
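The autocorrelation-to-PSD relation that the flow of [41][43] relies on can be illustrated in software. The following is a minimal sketch, not the measurement hardware of those works: it assumes a purely periodic, noise-free test tone, and the names `circular_autocorrelation`, `psd_from_autocorrelation`, `N`, `K0` and the 0.05 V amplitude are hypothetical choices made only for this illustration.

```python
import cmath
import math


def circular_autocorrelation(x):
    """Time-averaged circular autocorrelation r[m] = mean(x[i] * x[i + m])."""
    n = len(x)
    return [sum(x[i] * x[(i + m) % n] for i in range(n)) / n for m in range(n)]


def psd_from_autocorrelation(r):
    """Discrete Fourier transform of the autocorrelation sequence (real part)."""
    n = len(r)
    return [
        sum(r[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n)).real
        for k in range(n)
    ]


N = 64    # samples per noise period (illustrative value)
K0 = 5    # frequency bin of the synthetic supply-noise tone (illustrative value)
noise = [0.05 * math.cos(2 * math.pi * K0 * i / N) for i in range(N)]

r = circular_autocorrelation(noise)
psd = psd_from_autocorrelation(r)

# The spectrum obtained from the autocorrelation peaks at the tone's bin.
peak_bin = max(range(1, N // 2), key=lambda k: psd[k])
print(peak_bin)
```

The sketch only shows why uniformly spaced autocorrelation samples are enough to recover a power spectrum; the averaging over many noise periods and the two-sampler arrangement described above are not modeled.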


In [46] a gated ring oscillator structure is proposed where the RO is enabled for short durations. The oscillation frequency during this short period is counted to obtain the average supply voltage in that interval. The gated oscillator must be enabled repeatedly at the same timing and the noise waveform is assumed periodic, but the periodicity assumption may not hold for real workloads. In [49] the authors presented a tunable critical path replica based dynamic variation monitor that can capture voltage or frequency fluctuations. In [50] an all-digital sensor IP is proposed for detecting noise in test and functional modes. The shortcomings of the existing on-chip monitors are as follows. Often the monitors incur high area overhead due to the presence of analog blocks or customized operations. The sensors monitor the noise events dynamically at each clock cycle; as a result the worst-case scenario is often masked. It would be an over-design to extract the PSN distribution or the exact waveform, since circuit performance is mainly impacted by the average noise over a clock cycle. The sensors cannot map the noise events to the SoC's architectural states or the executing workload. In light of the above discussions, in this paper we present an all-digital PSN sensor IP and a distributed sensor network based framework for continuous power supply noise monitoring and timing failure analysis in SoCs.

1.3.2 Task 2. Path Timing Slack Extraction and F_MAX Analysis

Structural test patterns (generally Transition Delay Fault (TDF) and Path Delay Fault (PDF) patterns) are widely used to test the integrity of circuit paths against defects. Since structural patterns are less expensive to generate and offer much higher fault coverage, there has been a significant amount of research on utilizing structural test results in estimating the actual chip F_MAX [27][28]. In [27], the authors studied the correlation between the structural and functional F_MAX using sample chips, where a linear relation was established. The authors in [51] demonstrated that structural F_MAX based on complex path delay fault patterns exhibited correlation to the actual functional F_MAX. In [52][28]


data learning techniques were used to build an F_MAX predictor from structural test results. However, in order to apply structural delay fault test patterns, the path under test must be testable. To make all the paths testable by structural test patterns, extra test points may need to be inserted, which incurs area overhead [53]. Also, the structural test's correlation (i.e., direct linear correlation) to the functional-test F_MAX cannot be guaranteed for advanced technology nodes because of process variations [28]-[30]. Researchers proposed the concepts of built-in speed-grading [29] and built-in delay-binning [54], where the Built-In Self-Test (BIST) hardware and an on-chip programmable clock generator were used to apply BIST patterns at different frequencies by the chip itself, thus eliminating the need for expensive ATE and also reducing the test time. However, because of the pseudo-random nature of BIST patterns, these patterns may not excite the critical/near-critical paths [55]. As a result, the accuracy of this method in correlating to the chip F_MAX cannot be proven [30]. In [56][57] on-chip circuits were used to measure the propagation delay of critical paths, and the longest measured delay then dictated the chip F_MAX. Although direct measurement of critical path delays would eliminate the costly procedure of repeated application of the functional test at different frequencies, the large number of possible critical paths in a modern SoC makes it almost impossible to include that many path delay monitoring circuits. Also, the pre-selected critical paths may not remain critical in each fabricated chip due to the increasing levels of process variations [6]. In [30][58] F_MAX obtained from functional test and data collected from on-chip Ring Oscillator (RO) based process monitors were utilized in training machine learning predictors. Next, for the rest of the samples, the RO data were fed to the trained predictors to obtain the F_MAX. The reported speed-binning accuracy ranged from 90% to 93% in [58] and 87% to 93% in [30], based on silicon data collected from chips fabricated in 55 nm and 28 nm technology, respectively.


For chip timing data extraction, in [59] a systematic method to synthesize multiple Design-Dependent Ring Oscillator (DDRO) circuits using standard cell gates is presented. Because of the design-dependent aspect of the proposed RO, and the use of critical path cluster information in the design, the DDRO offers significant accuracy improvement over a conventional inverter-based RO in delay estimation [59]. Since the DDRO is a replica-type monitor, it offers lower area overhead and design cost compared to in-situ monitors. However, as discussed in [59], replica monitors such as the DDRO can only capture global variations, whereas in-situ monitors are better suited for capturing local variations in addition to the global variations, as they operate on actual circuit paths. Since our research goal requires the capture of timing slacks directly from a set of selected critical paths, instead of replica monitors we designed in-situ monitors that can record the slack values. In [60] an in-situ technique called SlackProbe was proposed that probes the intermediate nodes of critical paths to detect any impending timing failure within a monitoring window. SlackProbe was designed to detect possible timing failures (without recording the actual slack value) on the critical paths for the purpose of activating remedial countermeasures, whereas the goal of our slack sensor is to measure and record the timing slack of critical/near-critical paths for F_MAX estimation.

1.3.3 Task 3. SoC Lifetime Reliability Management

In [98] it is shown that the precise effect of aging depends on workload switching activity, power states and temperature. Unique standard-cell instances were created by aging individual transistors according to the static and workload-dependent dynamic factors (switching activity, sleep mode, process, voltage and temperature). The presented analysis demonstrated that paths that are not critical in the fresh design can become critical after aging, and critical paths in the fresh design may not remain critical after aging. These observations underscore the importance of dynamic and real-time monitoring in the field.


In [61] an assumption is made that the chip will operate at the same constant condition (i.e., workload and switching activity), and based on this uniform-state scenario a predictor is trained that can predict the delay of critical paths at runtime. But in reality any SoC or high-performance processor executes time-varying workloads with fluctuating switching activity. In [38] a proactive approach is proposed that monitors workload-induced stress and applies machine learning techniques to predict aging-induced path delays. The main objective is to track recent circuit activity and then extrapolate the trend to activate mitigation measures. The key assumption is that the chip will execute the same type of workload regularly in a predictable pattern. Because of this assumption, this procedure is not suitable for a high-performance processor/SoC (i.e., a server CPU) where any type of workload can be executed without a predictable usage pattern. Another shortcoming of the proposed method is that the switching probability at the capture flip-flop is considered regardless of the whole path, but many paths, both long and short, can terminate at the same capture flip-flop. A high switching activity at the capture FF might result from the shorter path, not necessarily from longer or critical paths. Self-tuning compensates gradual aging-induced degradation by adjusting system parameters progressively and dynamically over the lifetime according to performance demands, which may be time-varying and workload-dependent [31]. In [32] joint optimization of self-tuning parameters such as Adaptive Body Biasing (ABB) and Adaptive Voltage Scaling (AVS) is presented, assuming the system is always in active mode under worst-case workload. Based on the logged timeline (i.e., how long the chip is on), appropriate self-tuning parameters are set. Required tuning parameters at different time stages are stored on-chip. However, logging the chip usage duration is a challenge. The authors in [62] assumed that the cumulative time of BTI stress is available, and based on this data ABB/AVS values are selected from a look-up table. The implementation of the proposed software-level technique to log the total time the chip is on since t=0 is very complicated because it would require the chip to be always running


an internal counter, which may not be practical. In [17] an AVS-aware design-for-reliability technique is proposed that uses an aging-derated standard cell library. It is shown that the flat timing margin method is more pessimistic for lifetime reliability, and this pessimism can be mitigated by AVS. The concept of storing and subsequently applying test patterns for reliability monitoring was proposed in [63]. The signature of the response was utilized for failure analysis. The DART system in [64] repeatedly measures the maximum delay of a circuit to identify the possibility of impending failure. In addition to seed-based pattern application, temperature and voltage sensors are used to quantify the delay degradation. A scheme to generate a warning alarm when the delay exceeds a threshold value is presented. However, the methodology of gradual degradation detection and consequent adaptation over the chip lifetime has not been presented.

1.4 Proposed Solutions

To address the performance challenges stemming from variations in modern SoCs, the following research tasks are proposed in this thesis.

1.4.1 Task 1. SoC Reliability Against PSN

As Power Supply Noise (PSN) is the most serious dynamic variability factor, in this thesis the PSN challenges are addressed at the layout level using fully digital sensor IP blocks. The proposed sensor IP is capable of extracting the power supply noise profile in SoCs as well as debugging any PSN-induced failure.

1.4.2 Task 2. Path Timing Slack Extraction and F_MAX Analysis

Since static process variations cause discrepancies in the delay of critical/near-critical paths from chip to chip, the design of a compact timing slack sensor that can capture the timing delay/slack of selected critical/near-critical paths in-situ is proposed. Moreover, an innovative application of the timing slack sensor in speed binning is proposed.


1.4.3 Task 3. SoC Lifetime Reliability Management

To address the workload-dependent gradual aging problem in SoCs, an innovative solution of reusing the existing Built-In Self-Test (BIST) hardware for lifetime reliability management is proposed in this thesis.

1.5 Contributions

This thesis addresses and presents solutions to the major reliability issues of modern SoCs.

1.5.1 Task 1. SoC Reliability Against PSN

A novel sensor macro has been designed that continuously measures the average PSN per clock cycle. An on-chip network of distributed PSN sensors has been designed, and the distributed nature of the network offers the feature of obtaining the noise map throughout the SoC's layout and the timing uncertainties in the production test and system test modes. A debug architecture has been designed that includes a temporal clock counter and a shadow register chain called the Chip Status Register (CSR). The debug circuitry allows taking a snapshot of SoC activity in different modes of operation at the onset of possible failures for future reliability analysis. The designed sensor network has been deployed and its performance verified in the physical design of the OpenSPARC T1 multicore processor SoC.

1.5.2 Task 2. Path Timing Slack Extraction and F_MAX Analysis

A novel F_MAX prediction framework is presented to eliminate the long test time and cost associated with functional F_MAX testing. For the development of accurate machine learning predictors, the timing slacks extracted from actual critical/near-critical paths (instead of data from path replica or ring oscillator monitors) are used as features. To identify the most accurate predictor for this framework, we tested 5 different machine learning techniques.


For the purpose of in-situ timing slack data extraction, a novel timing slack sensor IP has been designed that can monitor and record the timing slack of critical/near-critical capture flip-flops. In order to ensure that the features of the machine learning tools (the path slacks) capture spatial variation and workload profile, a novel layout- and gate-overlap-aware algorithm has been proposed for the selection of the optimum set of critical/near-critical flip-flops. The algorithm ensures that the selected capture flip-flops and the corresponding sensors are physically spread across the layout. Moreover, to minimize the extra design effort, a netlist-level automated sensor insertion technique has been developed to insert the sensor IPs in the synthesized gate-level netlist of the SoC.

1.5.3 Task 3. SoC Lifetime Reliability Management

ATPG-tool-generated delay fault patterns, targeting the high-usage critical/near-critical paths, are converted to equivalent LBIST patterns by finding the required seeds for the Linear Feedback Shift Register (LFSR). A gate-overlap- and path-delay-aware algorithm selects a minimum set of patterns which activate the target paths used as features of the machine learning predictor. The seeds of the selected patterns are stored in on-chip memory and applied at the LBIST hardware at multiple test clock frequencies when required. The corresponding responses of these patterns at the targeted capture flip-flops are collected in a separate response storage flip-flop chain. A software-implemented machine learning classifier is trained with the collected multiple-frequency responses. Later, in the field, the trained predictor accurately predicts the state of aging degradation in real time. The PLL adjusts the system clock accordingly or activates the adaptive countermeasures.

1.6 Thesis Organization

In Chapter 2 the sensor macro and debug architecture to address challenges related to PSN are discussed in detail. In Chapter 3 the path delay extraction sensor and


its application in the development of a machine learning based speed-binning predictor are presented. The application of stored test patterns from BIST hardware for lifetime reliability management is reported in Chapter 4. Finally, conclusions and future work are presented in Chapter 5.


CHAPTER 2
DESIGN OF A NETWORK OF DIGITAL SENSOR MACROS FOR EXTRACTING POWER SUPPLY NOISE PROFILE IN SOCS

In this chapter the problem of Power Supply Noise (PSN) in a System on Chip (SoC) is addressed in detail. The design of a digital sensor IP and a corresponding network of sensors is described in this chapter.

2.1 Power Supply Noise Model

The typical power supply noise observed at the chip level can be characterized by three different resonant droop events [68], as shown in Fig. 2-1 and explained below. These voltage droops are determined by the supply impedance of the power delivery network at different frequencies.

First Droop: This is determined by the package inductance and on-die capacitance, and its duration is a few nanoseconds. Although this is the shortest droop, it is also usually the deepest and therefore can severely impact microprocessor performance when a critical path is accessed in conjunction with the droop. This dominant supply noise component resonates in the mid-frequency range between 100 and 400 MHz [69][70]. It can be seen in Fig. 2-1 that the resonant supply noise, once excited, can become significantly larger than the noise at other frequencies, causing severe impact on the circuit performance.

Second Droop: This droop is a function of the board and package decoupling capacitors. This resonant droop event persists for a relatively longer duration of a few hundred nanoseconds and impacts a significant number of critical paths. This droop can be mitigated by introducing high-quality decoupling capacitors at the package and motherboard level.

The contents of this paper were previously published in [66].


Figure 2-1. Power supply noise profile at chip level.

Third Droop: The third droop has the lowest magnitude compared to the other droops, and its intensity is determined by the board-level bulk capacitors. Although the third droop can impact virtually all critical paths since its duration is a few microseconds, this droop can be effectively minimized by optimum use of bulk capacitors at the board level. As a result, the adverse consequences of this droop on circuit performance are minimal.

In addition to the localized mid-frequency first droop noise event, localized high-frequency (5 GHz) effects excited by on-die inductive effects are observed. These high-frequency droops are highly localized, with a radius of a few microns around the switching device, insignificant in magnitude and transient in nature [71]. The second and third


droops can be handled traditionally through improving motherboard and package routing and increasing the amount of high-quality decoupling capacitors. In contrast, the first droop cannot be easily remedied through external measures [69]. In this work, we focus on dynamically detecting the worst case of all the noise events, the resonant first droop, across the chip layout or die. For this purpose we model the first droop noise event as a damped sinusoid [68] with a resonant frequency of 250 MHz, as shown in Fig. 2-1.

2.2 The Sensor Network Architecture

Figure 2-2. The sensor network architecture. It includes K sensor blocks distributed in a chip layout.

The sensor circuits are designed to accommodate fast response to PSN and improved immunity to process variations. The sensor network would be designed in such a way that each local sensor node would monitor its neighboring power supply and store the detected worst-case PSN event. At each worst-case observation, the respective sensor IP would generate a pulse trigger to initiate the recording of a snapshot of the chip's key status bits or count the clock cycle number where the event occurred. Fig. 2-2 shows the block diagram of our


target sensor network. The operation of the sensor network and the sensor IP is explained as follows.

Figure 2-3. Circuit diagram of the Power Supply Noise (PSN) sensor IP.

2.2.1 Local Sensor Block

The circuit diagram of the local sensor block is shown in Fig. 2-3. The input and output ports of the sensor are annotated on the left and right boundaries, respectively. The psn_ext_set_b input acts as the power-on-reset pin and sets all the flip-flops in the sensor to `1'. The system clock is gated with the PSN activation pin, psn_ON, to generate the clock for PSN, psn_clk. The incoming scan enable, psn_se, is synchronized inside the sensor module to generate the signal psn_se_sync. For psn_se=`1' the sensor is in data loading/extraction mode, and for psn_se=`0' the sensor is in active/functional mode to measure the PSN events. The psn_extract_b/psn_load input decides if the sensor is loading control data or extracting the PSN results by scan. The functions of the other components of the local sensor IP are described below.


2.2.1.1 Transition Generation Unit (TGU)

The Transition Generation Unit (TGU) launches a rising transition into the subsequent delay line. While the sensor is in `scan mode' (loading/unloading the control vectors/sensor results), `1' is supplied to the delay line. In the functional mode, the active mode where the sensor records noise data, the clock signal is supplied from the TGU to the delay line. This clock is later captured in the Capture Unit flip-flops. The synchronized scan enable signal, psn_se_sync, is used as the selector of the mux to ensure that a rising transition synchronized with the clock signal is launched into the delay line.

2.2.1.2 Reconfigurable Delay Line (RDL)

The Reconfigurable Delay Line (RDL) acts as a variable delay source made of buffers and muxes. The RDL is arranged into multiple stages of buffers where each stage can supply a delay twice that of its previous stage, as shown in Fig. 2-3. The delay of each stage is adjusted by a mux. The muxes are controlled by an externally loaded control vector such that the RDL provides a minimum delay of half a clock cycle.

2.2.1.3 Fixed Delay Stage (FDS)

The Fixed Delay Stage (FDS) consists of a series of buffers. The exact number of buffers in the FDS depends on the technology library and the PSN detection range. The cumulative delay of the RDL and FDS is adjusted to supply a minimum of one clock cycle.

2.2.1.4 Control Vector Unit

The bits to control the muxes of the RDL are loaded into the scan flip-flops of the Control Vector Unit. The clock to this unit is gated and is active only during the scan loading/extraction phase.

2.2.1.5 Capture Unit (CU)

The Capture Unit (CU) consists of a series of D flip-flops, where the flip-flops tap the outputs of the buffers in the FDS. The clock launched from the TGU passes through the RDL, and at a certain buffer position in the FDS the launched rising transition experiences a delay that exceeds the difference between the clock period and the setup time


of the flip-flops. The capture flip-flop corresponding to this buffer will record a `0'. Prior to this specific flip-flop, all units will record a `1', and all later flip-flops will capture `0'. The position of the first `0' in the capture flip-flop chain depends on the average supply voltage per clock cycle, shifting to the right with decreasing VDD or increased supply noise.

2.2.1.6 Storage Unit (SU)

The Storage Unit (SU) stores the worst-case PSN event observed by the local sensor IP. This is accomplished with the aid of a specially modified scan flip-flop cell named `StickyCell/StickyCell_T', as shown in Fig. 2-3. The first half of the SU cells are `StickyCell_T' and the rest are `StickyCell'. The `StickyCell_T' can generate a pulse at its T pin when the cell changes its state from `1' to `0', indicating a greater noise event. Only half of the SU cells are made `StickyCell_T' so that trigger pulses are generated when the PSN event is significant; this also reduces area overhead. At the initialization phase the scan flip-flop inside the `StickyCell/StickyCell_T' is set to `1'. The AND gate inside these cells ensures that if the incoming D input is ever a `0', the cell will hold on to `0'. To explain this with an example, let us assume the CU observed three cases of noise events: 111110, 110000 and 111100. Here the worst-case PSN event is the middle one, 110000. As a result of the sticky property, the SU scan flip-flops will record 110000, and this can later be extracted by scan. Metastability may occur in at most one of the flip-flops of the CU if the output from the buffer of the delay line arrives late in the flip-flop's setup time window. Any such metastability is automatically resolved in our architecture as we use two flip-flops in series, one in the CU and the other in the SU. Thus a synchronizer circuit [72] is formed with the output of each buffer in the FDS. Even if metastability occurs in the CU block, the flip-flops of the SU will almost always be metastability-resolved because of the high Mean Time Between Failure (MTBF) of a synchronizer circuit [72].
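The sticky behavior of the Storage Unit can be mimicked in software to check the three-event example above. This is a minimal behavioral sketch, not the sensor's circuit: the function names are hypothetical, and it assumes that the "first half" of the cells (the `StickyCell_T' cells) are the ones nearest the start of the code string.

```python
def sticky_update(stored, captured):
    """One clock cycle of the SU: each cell keeps (stored AND incoming D)."""
    return [s & d for s, d in zip(stored, captured)]


def trigger_pulse(before, after):
    """True if any cell in the first half of the chain fell from 1 to 0.

    Assumption: the first half of the code string corresponds to the
    'StickyCell_T' cells that can raise a pulse on their T pin.
    """
    half = len(before) // 2
    return any(b == 1 and a == 0 for b, a in zip(before[:half], after[:half]))


def track_worst_case(observations):
    """Feed a sequence of Capture Unit codes (strings of 0/1) through the SU."""
    width = len(observations[0])
    stored = [1] * width              # all cells are initialised to 1
    triggers = []
    for code in observations:
        captured = [int(bit) for bit in code]
        updated = sticky_update(stored, captured)
        triggers.append(trigger_pulse(stored, updated))
        stored = updated
    return "".join(str(bit) for bit in stored), triggers


worst, pulses = track_worst_case(["111110", "110000", "111100"])
print(worst)    # 110000, the worst-case code from the example in the text
print(pulses)   # [False, True, False]
```

Under these assumptions only the second event changes a cell in the first half of the chain, so it is the only one that would raise a trigger pulse; the later, milder event 111100 leaves the stored code untouched.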


2.2.1.7 Trigger Pulse Generator

The individual T pulses generated from the `StickyCell_T' cells are combined in this unit. The combined trigger pulse, T_P, indicates the moments when the sensor experiences local worst-case noise. This pulse is later used for debugging purposes.

2.2.2 Global Control and Debug Unit

The task of the Global Control Unit is to accept the trigger pulses from all the local sensor modules corresponding to the respective worst droops and generate the global debug pulse T_Debug. We can add some specially designed debugging blocks to this module, as shown in Fig. 2-4 and discussed in Section 2.4.

Figure 2-4. Global control and debug unit.

2.2.3 Sensor Access and Control

For each sensor the control vectors and the PSN results are stored in registers and connected in a scan chain. The scan chains of the individual sensors are connected in a dedicated global scan chain for the sensor network. From an SoC point of view, this PSN sensor network's scan chain can be accessed with the aid of JTAG features [73]. In Fig. 2-5 the use of existing JTAG in accessing our PSN sensor network is explained. The PSN Storage Chain is modeled as the custom user data register defined in IEEE 1149 [73]. The TDI pin of JTAG is connected to the scan-in pin of the PSN sensor network. The Test Access Port (TAP) Controller can be programmed to generate the appropriate control


signals and instructions to dictate the operation of the PSN sensor IPs. The custom sensor access mechanisms proposed in [74] can also be used.

Figure 2-5. Sensor access with JTAG.

2.3 Working Principle: Tracking Average VDD per Clock Cycle

In [75] the relationship between the propagation delay of a buffer and the power/ground noise has been studied in detail. It was demonstrated that the incremental change of buffer delay is linear with respect to the noise-induced power and ground variations. Our sensor works on this principle. The sensor response to PSN can be explained with the aid of Fig. 2-6, where the worst-case noise-induced droop (the first droop, as discussed in Section 2.1) is targeted. In our sensor the delay line (RDL and FDS combined) is adjusted such that the total delay is one clock cycle. This is depicted in Fig. 2-6, where $D_1$ to $D_k$ are the delay elements (mostly buffers and a few muxes). The propagation delay of $D_1$ is $t_1 - t_0$ and that of $D_k$ is $t_n - t_{n-1}$. The total delay, $T_{total} = \sum_{j=1}^{n} (t_j - t_{j-1})$, is adjusted to one clock cycle. When the launched transition passes through the delay line, the first buffer $D_1$ observes the VDD waveform within the interval $t_1 - t_0$. As discussed in [75][76], the delay $t_1 - t_0$ is essentially a linear function of the average VDD in the interval $t_1 - t_0$. Similarly, the delay of the


delay element $D_k$ is a linear function of the VDD waveform in the interval $t_n - t_{n-1}$. Extending this concept, it can be inferred that the total delay, $\sum_{j=1}^{n} (t_j - t_{j-1})$, would be a linear function of the average voltage over the clock cycle.

Figure 2-6. VDD over a clock cycle and VDD observed by each gate.

We verified this by simulation of a reconfigurable delay line consisting of buffers and muxes using a 28 nm standard cell library [119]. The delay line was powered by different test samples of noisy VDD, and for each sample the total propagation delay was adjusted to be 1 clock cycle. For each noisy VDD sample we calculated the average VDD over a clock cycle. Next, for each test case, we applied a fixed VDD to obtain the same delay of 1 clock cycle. In Fig. 2-7, the X-axis shows the average VDD over a clock cycle for each test case and the Y-axis shows the corresponding fixed VDD that obtained the same delay of a clock cycle for that particular test case. From Fig. 2-7 and the discussions above, it can be observed that the delay is indeed linearly proportional to the average supply voltage over a clock cycle within the noise detection range. The lowest detection point of 0.85 V was well above the threshold


voltage of the devices. As far as the noise detection bandwidth of the sensor is concerned, the sensor responds to all the mid- to high-frequency noise fluctuations, but it only records the average noise per clock cycle irrespective of the noise frequency. In summary, our sensor IP tracks the average voltage per clock cycle with the aid of a delay line and stores the worst-case noise in a digital format.

Figure 2-7. Relation between mean VDD over a clock cycle and fixed VDD.

We need to obtain a calibration table to map the sensor codes to the actual average VDD on the rail that the sensor observed. For calibrating PSN sensors placed inside the SoC, at first the SoC would be started in controlled conditions without any workload, which would ensure that the supply rails are free from PSN events. Then, by varying the supply voltage externally in discrete levels resembling the average VDD per clock cycle, the sensor responses at different voltage levels would be extracted to obtain the calibration or mapping table for that particular SoC. Later the SoC would be operated in the functional mode with workloads, and any PSN event generated inside the SoC would be recorded by the sensor. The recorded sensor codes can be decoded from the calibration table to get the worst-case average supply voltage per clock cycle.

2.4 Debug Unit

The design-for-debug (DfD) feature has become an indispensable part of modern SoCs, as the first silicon prototype is rarely bug-free. The embedded DfD modules identify


Figure 2-8. The Chip Status Register (CSR) unit.
Figure 2-9. The clock counter module.
Figure 2-10. Counting clock within an instruction window.


any electrical or functional bug that may exist in the prototype product and allow the designers to resolve those bugs in the next commercial release [77][78]. Besides aiding in perfecting the first silicon, modern DfD techniques also ensure continuous lifetime reliability monitoring after the product has been shipped to the customer [78]. An efficient DfD package incorporates mechanisms to pinpoint a failure both temporally (when) and spatially (where) [79]. A typical DfD module consists of triggering circuitry to detect particular conditions and storage units to record traced signals of importance at the moment where a potential bug influences the SoC [80][81]. From the traced signals the bug is localized to a smaller spatial region (i.e., the specific functional module that causes it), as well as to the moment (i.e., certain lines of code or the specific clock cycle) that initiated the bug.

In our proposed DfD architecture of the PSN sensor network, each sensor generates a trigger pulse at each local worst-case PSN event (the resonant first droop). The trigger pulses from all of the distributed sensors are combined in a global trigger pulse, T_Debug. The delay, $\delta$, between the actual PSN event and the corresponding trigger pulse at T_Debug can be modeled with Equation 2-1.

$$\delta = t_{CLK} + t_{local\_OR\_AND} + t_{global\_OR} + t_{routing} \tag{2-1}$$

In Equation 2-1, $t_{CLK}$ is the clock period, and $t_{local\_OR\_AND}$ is the aggregated delay from the OR/AND gates used in combining the trigger pulses in each sensor macro to generate the local trigger pulse, as shown in Fig. 2-3. $t_{global\_OR}$ is the combined delay deriving from the OR gates used in merging the local trigger pulses from all the individual sensor macros into T_Debug, as shown in Fig. 2-4. Finally, the routing delay, $t_{routing}$, is estimated from the physical design tool. In our architecture the spatial localization of a PSN bug is easily accomplished since the sensor macros are distributed across the layout and each of them is equipped with a scan chain to store the local worst-case PSN event. To temporally localize the bug within a time frame or a block of instruction sequence, we propose two features. In the


first feature, as shown in Fig. 2-8, a set of important signals which reflect the internal micro-architectural and instruction status are continuously traced. At each global trigger pulse the traced signals are stored as footprints in a register file called the Chip Status Register (CSR). In Fig. 2-8, at each clock cycle the signals are traced successively by `Storage Register 1' and `Storage Register 2'. The second register, `Storage Register 2', is used to account for the delay, $\delta$, between the actual PSN event and the generation of a pulse at T_Debug. Generally, $\delta$ is expected to be more than one but less than two clock cycles. For a delay (rounded to the next integer clock cycle) of two cycles, we must keep track of the traced bits at the current and the last clock cycles at all times. In this way, when the trigger pulse arrives on T_Debug, the status of the traced signals at the moment of the actual PSN event can be captured in the CSR. In the layout of our benchmark, the delay, $\delta$, was within two clock cycles and hence we used two storage registers. In other SoCs, if after the final physical design the delay, $\delta$, exceeds two clock cycles, more than two storage registers would be necessary. Upon detection of a PSN bug by any of the sensor macros, the recorded footprints in the CSR are scanned out through the JTAG interface. By decoding the status of these signals at the moment of the PSN event, the root cause of the bug can be identified as in [79].

The second debug feature consists of a temporal clock counter unit which allows identification of the clock cycle where a severe PSN event occurred. The clock counter module is shown in Fig. 2-9. Because of storage limitations it is not practical to count all the clock cycles from the beginning. In Fig. 2-9 the clock counter is activated within a certain window by asserting the Debug_enable signal. By selectively activating the counter at different execution windows of the workload or the test program, as shown in Fig. 2-10, it will be possible to narrow down the bug to particular lines of code or instruction sequences. The counted clock cycle number is sampled and stored in the Critical Clock Cycle Recorder Register at the trigger pulse on the T_Debug pin. By subtracting the delay, $\delta$, rounded to the next integer clock cycle, from this stored value, the actual critical clock cycle would be obtained. For example, if the trigger generation


delay, $\delta$, is two clock cycles and the latched clock cycle number is 1500, the actual clock cycle number within the window where the PSN event occurred would be 1498.

2.5 Design Methodology

In the design of the PSN sensor, the first step is to identify the required number of stages in the RDL and the number of buffers in the FDS. These numbers depend mainly on three factors: (i) the delay of buffers and muxes in the selected technology library, (ii) the clock frequency of the chip, and (iii) the desired noise detection range. As mentioned in Section 2.2, a rising clock edge is launched into the RDL unit and subsequently captured by the capture flip-flops in the CU. The first capture flip-flop must observe a `1' for proper operation of the PSN sensor. To meet this condition, the cumulative delay from the launch point in the TGU to the first capture point in the CU must be at least half the clock cycle at all voltage levels within the noise detection range, as expressed in Equation 2-2.

$$T_{TGU\text{-}to\text{-}RDLout} \geq T_{CLK}/2 \tag{2-2}$$

If there are $K$ stages in the RDL, the total number of buffers and muxes in the RDL would be $K - 1$ and $K$, respectively. There is also an additional mux in the TGU. We need to select the minimum required number of stages, $K_{min}$, that satisfies Equation 2-2 in all the process and temperature corners as well as at all the possible voltage levels within the PSN detection range. If the minimum buffer and mux delays over all the corners are $t_B^{min}$ and $t_M^{min}$, respectively, then Equation 2-2 can be written as

$$t_M^{min} + (K_{min} - 1)\, t_B^{min} + K_{min}\, t_M^{min} = T_{CLK}/2 \tag{2-3}$$

After finding the required number of stages, $K_{min}$, in the RDL, we have to identify the minimum required number of buffers in the FDS. For the rated VDD without any PSN, the total delay from the launch point in the TGU to the second-last buffer in the FDS must be one clock period minus the setup time of the capture flip-flop. This relation is expressed by Equation 2-4, where $t_M^{RVDD}$ and $t_B^{RVDD}$ are the mux and buffer delays, respectively, at


rated supply voltage. $N_{TB}^{RVDD}$ is the required total number of buffers in the RDL and FDS combined at the rated VDD (RVDD), and $t_{su}$ is the flip-flop setup time.

$$t_M^{RVDD} + K_{min}\, t_M^{RVDD} + t_B^{RVDD}\, N_{TB}^{RVDD} = T_{CLK} - t_{su} \tag{2-4}$$

Using similar relationships for the case where the worst-case PSN drops the average supply voltage per clock cycle to the lowest level (LVDD) of the detection range, we can obtain Equation 2-5.

$$t_M^{LVDD} + K_{min}\, t_M^{LVDD} + t_B^{LVDD}\, N_{TB}^{LVDD} = T_{CLK} - t_{su} \tag{2-5}$$

At the PSN-induced lowest average supply voltage level, LVDD, the PSN sensor code is set to 100...0, which implies that none of the buffers of the FDS is required to achieve a delay of $T_{CLK} - t_{su}$ at LVDD. This relationship can be expressed by Equation 2-6. Here $C_{RDL}$ is the control vector which determines how many of the buffers in the RDL would be activated at the beginning for the selected sensor macro.

$$C_{RDL} = N_{TB}^{LVDD} \tag{2-6}$$

Similarly, at the rated supply voltage without any PSN (the highest point of the detection range), the sensor result is set to 11...10, implying that up to the second-last buffer of the FDS is required to achieve the delay $T_{CLK} - t_{su}$. This condition is expressed by Equation 2-7.

$$C_{RDL} + N_{FDL} - 1 = N_{TB}^{RVDD} \tag{2-7}$$

Based on the above analysis, the design steps are summarized as follows.


Step 1: From the target technology library and the desired PSN detection range, the buffer ($t_B^{min}$, $t_B^{RVDD}$, $t_B^{LVDD}$) and mux ($t_M^{min}$, $t_M^{RVDD}$, $t_M^{LVDD}$) delays are obtained considering all corners of operation.
Step 2: Equation 2-3 is solved to find $K_{min}$.
Step 3: Equation 2-4 is solved at different process corners to find $N_{TB}^{RVDD}$.
Step 4: Using the value of $N_{TB}^{LVDD}$ obtained from solving Equation 2-5, the control vector $C_{RDL}$ at the respective process corners is obtained from Equation 2-6.
Step 5: Finally, Equation 2-7 is solved to find the optimum length of the FDS, $N_{FDL}$, that satisfies all the process corners.

2.6 Simulation Results and Analysis

The starting phase of the development of the sensor network is to design the PSN sensor macro from standard cells. In designing the sensor macro, we first need to find the required number of stages in the RDL and FDS for the chosen technology library. We selected the 28nm standard cell library from SYNOPSYS [119] and performed a design space exploration. The buffer and mux delay distributions at different temperatures and process corners (slow-slow (SS), fast-fast (FF), typical-typical (TT)) are depicted in Fig. 2-11(a) and Fig. 2-11(b), respectively. In the design space exploration, we targeted a detection range of 20% VDD drop. In other words, the most severe PSN event that the sensor can respond to is the case where the sudden PSN event causes the average voltage per clock cycle to drop down to 80% of the rated VDD (RVDD = nominal VDD or 100% VDD, and LVDD = 80% of nominal VDD). The lower bound of the detection range, LVDD, must be well above the threshold voltage of the transistors; otherwise, the standard cells of the PSN sensor, especially the flip-flops, would become significantly slower. For our 28nm library [119], RVDD = 1.05 V, LVDD = 0.84 V, and the typical threshold voltage is 0.44 V and -0.3 V for NMOS and PMOS devices, respectively. After deciding the lower bound of the detection range, we followed the five design steps as mentioned in Section 2.5 and obtained a $K_{min}$


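Figure 2-11. The design space exploration. (a) Buffer delay profile. (b) Mux delay profile.

Table 2-1. Calibration results at 25 °C for 28nm standard cell library

| Average VDD per clock cycle (V) | C_RDL, TT corner | NoiseID, TT corner | C_RDL, FF corner | NoiseID, FF corner | C_RDL, SS corner | NoiseID, SS corner |
|---|---|---|---|---|---|---|
| 1.05 | 9 | 14 | 26 | 14 | 0 | 12 |
| 1.03 | 9 | 13 | 26 | 13 | 0 | 11 |
| 1.01 | 9 | 12 | 26 | 12 | 0 | 10 |
| 0.99 | 9 | 11 | 26 | 10 | 0 | 8 |
| 0.97 | 9 | 10 | 26 | 9 | 0 | 7 |
| 0.95 | 9 | 9 | 26 | 8 | 0 | 6 |
| 0.93 | 9 | 7 | 26 | 7 | 0 | 5 |
| 0.91 | 9 | 5 | 26 | 6 | 0 | 4 |
| 0.89 | 9 | 4 | 26 | 4 | 0 | 3 |
| 0.87 | 9 | 3 | 26 | 3 | 0 | 2 |
| 0.85 | 9 | 2 | 26 | 1 | 0 | 1 |
| 0.84 | 9 | 1 | 26 | 1 | 0 | 0 |

of 5 considering all process and temperature corners. The $C_{RDL}$ values obtained at the TT, FF and SS corners were 9, 26 and 0, respectively. The required minimum number of buffers in the FDS was 14, which satisfied all process and temperature corners. In addition to the PSN-induced VDD variation, temperature fluctuations will also impact the delay of a buffer. In our simulation with the 28nm standard cell library [119], the buffer delay changed by 8.7% as the temperature was varied from -25 °C to 125 °C. Accurate decoupling of the effect of power supply noise from that of temperature on the sensor reading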


will require information on the local temperature of the SoC. We assume that the chip temperature would be available from the typical thermal sensors used in SoCs [82][83]. The PSN sensor response can be accurately calibrated with this temperature information.

After finalizing the design parameters, we developed a structural RTL of the sensor IP in Verilog. The RTL was synthesized with Design Compiler [119], followed by physical design with IC Compiler [119]. A post-layout SPICE netlist was extracted and a detailed transistor-level simulation was performed with HSPICE [119] to verify its operation. The sensor was calibrated at different process corners in order to obtain the mapping between the average voltage over one clock cycle, the control vector and the NoiseID. The calibration results at 25 °C are tabulated in Table 2-1. The noise results stored in the Storage Unit are represented in a NoiseID format, where NoiseID represents the number of `1's in the 15-bit noise result vector. A noise vector of 111110000000000 translates to a NoiseID of 5 and 110000000000000 translates to 2. The 5-bit control vector, $C_{RDL}$, is shown in its equivalent decimal value in Table 2-1. In our design, for the selected 28nm technology library [119], the average noise detection resolution was 20 mV. This also implies that the maximum detection error is 20 mV.

Next we applied two different PSN events on the VDD supply rail and observed the sensor response. The detailed HSPICE [119] simulation results are depicted in Fig. 2-12. All the flip-flops in the Capture and Storage Units are set to `1' at the beginning. The scan enable signal of the sensor is kept at logic high for the first 6 clock cycles and the control bits, $C_{RDL}$, are scan-loaded into the flip-flops of the Control Unit. After loading the control vectors, the PSN sensor's scan enable signal is set to `0' and the sensor enters the functional mode, where the sensor is in the active state to record PSN fluctuations on the VDD supply rail. In our simulation, in Fig. 2-12, the functional mode persists from clock cycles 7 to 24. In this mode a rising transition from the tg_out pin (Fig. 2-12) is launched into the delay line and later captured after a delay of one clock cycle. The first droop PSN event is induced on the VDD waveform at clock cycle 14. The worst droop occurs between


clock cycles 15 and 16. Corresponding to this worst-case noise event, a trigger pulse was generated at $T_P$ after a delay of one clock cycle. No trigger pulse is generated for the subsequent droops, since our sensor monitors all the noise events and responds only to the worst-case noise. The worst-case noise result is recorded in digital format in the bits noise[0] to noise[14] of the Storage Unit, and the code in this case is 111100000000000. This code is equivalent to a NoiseID of 4 and, from Table 2-1, it can be translated to an average voltage of 0.89 V per clock cycle. The actual average VDD over that clock cycle from direct calculation on the waveform is 0.884 V. It can be observed that the noise vector did not change at the subsequent droops; this is because the first event was the worst case. In addition to assigning dedicated output ports for the noise bits, the noise vector can also be extracted via the scan output pin of the sensor. Starting from clock cycle 24, the scan enable signal is again set to `1'; the sensor leaves the functional mode and enters the data extraction mode, where the noise vector as well as the control vector can be scan-extracted. During scan-loading of the control vectors, the psn_xtract_b pin (Fig. 2-12) is kept at `1' and during scan-extraction it is kept at `0' for the reasons explained in Section 2.2.

A salient feature of our sensor IP is the option to adjust it against chip-to-chip and spatial process variations, which are common in current CMOS technologies. For example, at the typical-typical corner we expect to start the sensor with $C_{RDL}$ = 9, such that at rated VDD the NoiseID is 14 (equivalent noise code of 11...110), as shown in Table 2-1. But, because of spatial and chip-to-chip process variations, for some sensor macros the delay line might be slower or faster and, as a result, the NoiseID at rated VDD would deviate from the expected value of 14 even with $C_{RDL}$ = 9. For this type of sensor module, $C_{RDL}$ would be adjusted externally via scan such that at rated VDD the NoiseID is set to 14. To simulate this scenario we performed a 100-point Monte Carlo process variation simulation with HSPICE [119]. The parameters selected to vary from the nominal were transistor length (L), width (W) and threshold voltage ($V_{th}$), each by 15%


Figure 2-12. Post-layout SPICE simulation results.


Figure 2-13. Effect of process variation on Control Vector, obtained from 100 point Monte Carlo simulation.

and oxide thickness ($T_{ox}$) by 3%. All simulations were done within the statistical range of 3 standard deviations. The results are shown in Fig. 2-13. From Fig. 2-13 it is observed that for most of the cases a $C_{RDL}$ of 8 or 9 is sufficient. For the rest of the cases, where the delay line gets faster or slower because of process variations, $C_{RDL}$ would be increased or decreased, respectively, to compensate for the process-variation-induced delay fluctuations. The adjustment of $C_{RDL}$ for the extreme cases of process variations (FF and SS corners) is already listed in Table 2-1.

Because of transistor aging from Bias Temperature Instability (BTI) and Hot Carrier Injection (HCI) effects, the cumulative delay of the delay line is expected to increase over time. Similar to the method of compensating for process variations, we can reduce $C_{RDL}$ to adjust the sensor against aging degradation. We performed HSPICE MOSRA [119] aging (BTI and HCI) simulations and observed that, to compensate for the aging effect of 3 years, $C_{RDL}$ must be reduced from 9 to 7 for the typical process corner.

Our next objective was to implement a network of PSN sensor macros and deploy the network in a large SoC representative of industry-level complexity, application and size. For this purpose, after successful post-layout formal verification, the layout of the PSN sensor IP was saved as a hard macro. Decoupling capacitors were inserted inside the layout of the sensor macro to ensure that the power supply noise from the internal switching activities of the PSN sensor module is minimized and the macro essentially responds to


Figure 2-14. The macro layout. (a) PSN sensor. (b) Cells from different units are highlighted; dark regions are decoupling capacitors. (c) Distribution of internal cells by color.

the fluctuations on the local power rails of the SoC. The post-layout macro area of the sensor IP was confined to a 28.5 µm x 28.5 µm block, as shown in Fig. 2-14. In Fig. 2-14(b) the cells from the different internal modules (Fig. 2-14(c)) are highlighted.

To analyze the spatial power supply noise profile of a large SoC using our sensor IP, we chose the OpenSPARC T1 multi-core processor SoC [117]. The OpenSPARC T1 open source release contains the full RTL of the 64-bit UltraSPARC T1 multicore processor architecture, ASIC synthesis scripts and real-workload-based testbenches written in SPARC assembly language. The full OpenSPARC T1 SoC includes 8 SPARC cores with dedicated L1 data and instruction caches per core, a single Floating Point Unit (FPU), a shared L2 cache, a CPU-cache crossbar (CCX), an I/O bridge (IOBDG), a JBus interface (JBI), DRAM controllers and a Clock and Test Unit (CTU) [117]. We started with the synthesis and full physical design of a single SPARC core. The cache macro blocks, which are SRAMs, were not available with the OpenSPARC T1 toolset. We used the memory compiler available from SYNOPSYS [119] to generate the layouts of the cache units.


Figure 2-15. The complete layout. (a) Single SPARC T1 core with embedded PSN sensor network. (b) Hierarchy distribution in the layout of the SPARC core with embedded sensor network. The sensor modules are annotated on the layout. (c) Hierarchical module names and cell count of the SPARC core.

Some modifications of the given RTL code of the cache peripheral blocks were required to accommodate these custom-made SRAM modules into the design. The processor's register files were generated from the given RTL using standard cells. The complete gate-level netlist of the SPARC core was synthesized with Design Compiler [119] using the same 28nm standard cell library as in the previous SPICE-level simulations. An estimate of the layout area of the SPARC core was obtained from an initial full physical design of the core. This area information was later used to decide the number of local IP blocks in the distributed sensor network. A structural Verilog netlist of the PSN sensor chain was prepared and added to the netlist of the SPARC core. During the physical design phase of the SPARC core with sensors, the SRAM cache modules were placed first. After assigning the cache modules their dedicated spaces, and before placing the core standard cells, the remaining area of the layout was divided into equal-sized regions and a sensor IP was placed and fixed in each block. Finally, the core standard cells were placed along with the CSR module and all the remaining steps of physical design were completed. Fig. 2-15 depicts the layout and hierarchy distribution in the physical design


as available from IC Compiler [119]. The sensor locations with their assigned IDs and the CSR for the SPARC core are annotated on the image in Fig. 2-15(b).

The spatially varying switching activity depends on the type of functional unit and the executed workload. To extract the workload-dependent switching activity inside the SPARC core, we ran 68 different testbench programs available from the OpenSPARC T1 vendors in the core1 regression package [117]. A Value Change Dump (VCD) file was generated from each of the benchmarks. The switching activity information was extracted from these VCD files and annotated in the post-physical-design power rail analysis. For the power rail analysis we used SYNOPSYS PrimeRail [119]. PrimeRail takes the annotated switching activity information and the value of the nominal VDD as its input, extracts the parasitic RC of the power grid network and layout, and then performs a dynamic or static power rail analysis to report the voltage drop at each standard and macro cell's Power and Ground (PG) pins. Table 2-2 reports the worst-case average voltage per clock cycle measured by each sensor IP and the corresponding percentage drop. The nominal VDD was 1.05 V. From Table 2-2, it is observed that the sensors captured the local power supply variation. The recorded voltage drops at S6, S10 and S14 are comparatively higher than at the other sensors. This can be explained by the fact that these sensors are placed close to the SRAM cache blocks, where large switching events occur due to the presence of address decoders and long word and bit lines. Each SPARC core is equipped with a single ALU, located in the execution unit (exu). In addition to performing the typical arithmetic and logic instructions, this ALU is also reused in branch and virtual address calculations. As a result, the exu unit experiences larger switching activity compared to the rest of the modules of the SPARC core. This explanation is reflected in the larger voltage drops one can observe at sensors S5, S8, S9, S12 and S16, which are placed inside the exu unit.
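The translation from a stored noise vector to an average supply voltage, as used for the sensor readings above, can be illustrated with a short sketch. The Python below is only an illustration and not part of the original design flow: the dictionary transcribes the TT-corner columns of Table 2-1 (25 °C, C_RDL = 9), and the names `TT_NOISE_ID_TO_VDD`, `noise_id` and `average_vdd` are hypothetical.

```python
# Illustrative sketch (assumed names): decode a 15-bit PSN noise vector.
# NoiseID -> average VDD per clock cycle (V), transcribed from the TT-corner
# columns of Table 2-1. NoiseIDs 6 and 8 do not occur in that column.
TT_NOISE_ID_TO_VDD = {
    14: 1.05, 13: 1.03, 12: 1.01, 11: 0.99, 10: 0.97, 9: 0.95,
    7: 0.93, 5: 0.91, 4: 0.89, 3: 0.87, 2: 0.85, 1: 0.84,
}


def noise_id(noise_vector: str) -> int:
    """Return the NoiseID, i.e. the number of '1's in the noise vector."""
    if len(noise_vector) != 15 or set(noise_vector) - {"0", "1"}:
        raise ValueError("expected a 15-bit binary string")
    return noise_vector.count("1")


def average_vdd(noise_vector: str, table=TT_NOISE_ID_TO_VDD):
    """Look up the calibrated average VDD; None if the NoiseID is not tabulated."""
    return table.get(noise_id(noise_vector))


if __name__ == "__main__":
    vec = "111100000000000"  # worst-case code from the Fig. 2-12 simulation
    print(noise_id(vec), average_vdd(vec))
```

For the code 111100000000000 recorded in the Fig. 2-12 simulation, the lookup gives a NoiseID of 4 and 0.89 V, which is the value the text reads from Table 2-1. A sensor at another process corner would need its own calibration column.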


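Table 2-2. Worst-case average voltage per clock cycle detected on each sensor and corresponding VDD drop for SPARC core

| Sensor ID | Recorded average voltage (V) | Percentage drop (%) |
|---|---|---|
| S0 | 0.93 | 11.4 |
| S1 | 0.97 | 7.6 |
| S2 | 0.98 | 6.6 |
| S3 | 0.94 | 10.4 |
| S4 | 0.94 | 10.4 |
| S5 | 0.95 | 9.5 |
| S6 | 0.90 | 14.3 |
| S7 | 0.96 | 8.5 |
| S8 | 0.94 | 10.4 |
| S9 | 0.94 | 10.4 |
| S10 | 0.91 | 13.3 |
| S11 | 0.94 | 10.4 |
| S12 | 0.92 | 12.3 |
| S13 | 0.93 | 11.4 |
| S14 | 0.90 | 14.3 |
| S15 | 0.94 | 10.4 |
| S16 | 0.94 | 10.4 |
| S17 | 0.96 | 8.6 |
| S18 | 0.97 | 7.6 |
| S19 | 0.98 | 6.6 |

Figure 2-16. The CSR of SPARC.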
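The percentage-drop column of Table 2-2 follows from the recorded voltage and the 1.05 V nominal VDD. The sketch below recomputes it and ranks the sensors by recorded voltage. It is only an illustration: the names `RECORDED_V`, `percentage_drop` and `worst_sensors` are hypothetical, and because the tabulated voltages are rounded to two decimals, the recomputed drops can differ from the table by about 0.1 percentage points.

```python
# Illustrative sketch (assumed names): recompute the VDD drop of Table 2-2.
NOMINAL_VDD = 1.05  # V, nominal supply stated in the text

# Recorded worst-case average voltage per clock cycle (V), from Table 2-2.
RECORDED_V = {
    "S0": 0.93, "S1": 0.97, "S2": 0.98, "S3": 0.94, "S4": 0.94,
    "S5": 0.95, "S6": 0.90, "S7": 0.96, "S8": 0.94, "S9": 0.94,
    "S10": 0.91, "S11": 0.94, "S12": 0.92, "S13": 0.93, "S14": 0.90,
    "S15": 0.94, "S16": 0.94, "S17": 0.96, "S18": 0.97, "S19": 0.98,
}


def percentage_drop(voltage: float, nominal: float = NOMINAL_VDD) -> float:
    """Drop of the recorded voltage relative to nominal VDD, in percent."""
    return 100.0 * (nominal - voltage) / nominal


def worst_sensors(recorded: dict, count: int = 3) -> list:
    """Sensor IDs with the lowest recorded voltage (largest drop) first."""
    return sorted(recorded, key=recorded.get)[:count]


if __name__ == "__main__":
    for sensor in worst_sensors(RECORDED_V):
        drop = percentage_drop(RECORDED_V[sensor])
        print(f"{sensor}: {RECORDED_V[sensor]:.2f} V, {drop:.1f}% drop")
```

The three sensors with the lowest recorded voltage are S6, S14 and S10, the ones the text associates with the SRAM cache blocks.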


In Section 2.4 we proposed two debug infrastructures, recording of critical instruction footprints in dedicated storage and clock cycle tracking, to isolate the cause of the PSN bug. In our hardware design we implemented the critical bit tracking scheme with a dedicated Chip Status Register (CSR). For the case of the SPARC core, the designers have selected 94 bits that reflect several important architectural states and activities of the core [117]. These bits are continuously monitored and, on a debug signal asserted from the CTU, these states are saved or updated in a shadow scan chain. We took advantage of this existing architecture in designing the CSR of our sensor network inside the SPARC core. The trigger pulses, $T_P$, from individual sensor macros are combined into a global debug trigger pulse, $T_{Debug}$, as shown in Fig. 2-3. The cumulative delay from the first three terms of Equation 2-1, which models the pulse generation delay, was 1.216 ns. The clock cycle was 0.833 ns. For the last term in Equation 2-1, the range of the values of routing delay for the layout was estimated from an analysis of the preliminary floorplan in IC Compiler [119]. From the estimates, it was observed that the total pulse generation delay would not exceed 2 clock cycles minus the setup time of the storage flip-flops ($2T_{CLK} - t_{su}$). Thus, we need to track the critical bits for two successive clock cycles before latching those bits in the CSR at the initiation of the $T_{Debug}$ pulse. In Fig. 2-16, the left block is the existing part of the SPARC core and the right module shows our CSR unit. The FF1 and Storage Register track the critical bits for two consecutive clock cycles before recording them in the CSR. The CSR is connected by scan chain with the rest of the sensor network. The physical position of the implemented CSR is shown in the layout of Fig. 2-15. The post-layout area and power data are presented in Table 2-3.

In comparison with the existing power supply noise sensors in the literature [46][47], our sensor macro belongs to a special class where we capture the average noise per clock cycle in the typical first droop, oblivious of the exact frequency of the noise. By contrast, the sensors reported in the literature [46][47] strive to reproduce the exact noise waveform within an accuracy of a few mV at the cost of area-intensive sub-blocks


Table 2-3. Post-layout statistics for SPARC core with sensors

| Area of each sensor IP | Area of CSR unit of SPARC core | SPARC core layout area | Area overhead for 20 sensors and CSR | Power dissipation per sensor |
|---|---|---|---|---|
| 813 µm² | 1625 µm² | 6002500 µm² | 0.30% | 526 µW |

such as VCO, ADC, DAC etc. In our design we trade off accuracy for simplicity and the potential of multiple deployment across the SoC layout.

2.7 Test Chip Implementation

The PSN sensor IP was implemented in GlobalFoundries' advanced 14nm finFET technology. As of writing this thesis, the test chip is still under fabrication at GlobalFoundries in Malta, NY. Fig. 2-17 shows a snippet of the GDSII view of the test chip.

Figure 2-17. Snippet from the GDSII view from GlobalFoundries test chip.
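As a quick consistency check on Table 2-3, the area overhead can be recomputed from the other entries of the table. The sketch below does only that arithmetic; the function name `area_overhead_percent` and the constant names are hypothetical and do not come from the original flow.

```python
# Illustrative sketch (assumed names): recompute the area overhead of Table 2-3.
SENSOR_AREA_UM2 = 813.0      # area of each sensor IP
CSR_AREA_UM2 = 1625.0        # area of the CSR unit of the SPARC core
CORE_AREA_UM2 = 6002500.0    # SPARC core layout area
NUM_SENSORS = 20             # sensors S0 to S19


def area_overhead_percent(num_sensors: int, sensor_area: float,
                          csr_area: float, core_area: float) -> float:
    """Total sensor-network area as a percentage of the core layout area."""
    network_area = num_sensors * sensor_area + csr_area
    return 100.0 * network_area / core_area


if __name__ == "__main__":
    overhead = area_overhead_percent(
        NUM_SENSORS, SENSOR_AREA_UM2, CSR_AREA_UM2, CORE_AREA_UM2)
    print(f"area overhead: {overhead:.2f}%")
```

With 20 sensors and one CSR the network occupies 17885 µm², which rounds to the 0.30% overhead reported in the table.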


CHAPTER 3
SOC SPEED BINNING USING MACHINE LEARNING AND ON-CHIP SLACK SENSORS

In this chapter the problem of path delay variation due to device parametric variation and its impact on chip $F_{MAX}$ is addressed in detail.

3.1 Proposed Speed Binning Flow

The chip $F_{MAX}$ is limited by the path delay of the most critical path. Because of process variations, the exact critical path cannot be identified at the pre-fabrication stage [8]. As a result, any of the critical or near-critical paths identified from the Static Timing Analysis (STA) or Statistical STA (SSTA) can be the speed-limiting most critical path for a certain chip, depending on the chip-to-chip process variations. Without loss of generality, path slack and $F_{MAX}$ distributions are correlated [27][51]. As a result, if the impact of process variations on the path slack distribution is known, then using the slack's correlation with $F_{MAX}$ one would be able to predict the distribution of $F_{MAX}$ with process variations at the production test stage. In order for this approach to be practical, we would require a simpler method to extract path slack in-situ from fabricated chips. Towards this goal, we have designed an all-digital sensor IP which is connected to a capture flip-flop and records the worst-case slack of all the paths that terminate at that capture flip-flop. The use of slack sensors eliminates the frequency sweep step associated with speed binning. Structural test patterns are used to excite those critical/near-critical paths monitored by the slack sensors attached to the respective path-ending capture flip-flops. Using modern ATPG tools (e.g. Synopsys Tetramax), those specific paths or capture flip-flops can be easily excited by PDF or TDF patterns, without the requirement of executing exhaustive functional patterns. Application of these test patterns and

The contents of this paper were previously published in [85].


Figure 3-1. $F_{MAX}$ prediction. (a) $F_{MAX}$ predictor development from CTTs. (b) $F_{MAX}$ prediction of CUTs using the developed predictor.

corresponding response extraction can be accomplished by the ATE tool through the chip's JTAG ports.

Before proceeding with the proposed speed binning flow, it is necessary to make sure all the chip samples are free from all possible manufacturing defects such as stuck-at faults, path/transition delay faults, etc. [55]. This is to ensure that no defective chip escapes from the initial stage of chip test [55]. A highly unreliable trained predictor may result if gross timing defects are not screened out in advance. Also, the predicted $F_{MAX}$ response from defective samples will be erroneous. Our proposed $F_{MAX}$ prediction methodology is depicted in Fig. 3-1. The essence of this approach is to train a machine learning classifier with sufficient training data obtained from Chips under Test and Training (CTT) (Fig. 3-1(a)) and later use this trained predictor to estimate $F_{MAX}$ for new Chips Under Test (CUT) (Fig. 3-1(b)). The required number of CTTs (marked as black dots on the wafer in Fig. 3-1(a)) depends on


the characteristics of the data and generally n<

learning - reported 2780X reduction in $F_{MAX}$ test time for a four-bin search space over the conventional functional $F_{MAX}$ testing for ARM-CA9 SoCs fabricated in 28nm technology.

3.2 The Path Slack Sensor

An important component of the flow presented in Fig. 3-1 is a low-cost embedded sensor to accurately measure the slack of the selected critical/near-critical paths. The circuit diagram of the slack sensor IP is shown in Fig. 3-2. When activated, the sensor monitors and records the worst-case timing slack at the capture flip-flop for all the paths terminating at that particular capture flip-flop. The sensor probes the path-ending capture flip-flop's D port through a minimum-size clock gating cell, ensuring that only a small amount of load capacitance is added to the main circuit path. Fig. 3-2 shows the architecture of the rising transition detector sensor. To make a falling transition detection sensor, the M0 to M6 bits simply need to be connected to the complementary (Q-bar) outputs instead of the Q outputs of the flip-flops of the Monitor Unit. The different modules of the sensor and their functions are briefly described below.

3.2.1 Clock Gating Unit

The sensor IP is clock gated with the Enable input to turn it on only during the timing slack measurement interval, as shown in Fig. 3-2. The gated clock signal CLK_En controls all the local flip-flops in the sensor. To preserve the relative timing relationship between the actual clock CLK and the input data D_in, the clock gating cell was placed on both the incoming data D_in and the clock CLK inputs. As a result, CLK_En and D_En were synchronized similarly to their non-clock-gated counterparts. The clock gating mechanism also minimizes power dissipation during the sensor OFF stage by cutting the dynamic power dissipation.

3.2.2 Delay Line

The delay line consists of a chain of unit-size buffers from the standard cell library. The buffers quantize the total slack according to the propagation delay of the buffer used. The number of required buffer stages in the delay line depends on the desired slack


Figure 3-2. The slack sensor architecture for rising transition.

detection range and the delay of a unit-size buffer. Use of the unit-size buffer from the standard cell library ensures that the slack detection resolution is optimal.

3.2.3 Monitor Unit

A flip-flop is attached at the end of each buffer to capture the states in bits M0 to M6 at the active clock edge. The combination of the delay line and the flip-flops in the Monitor Unit (MU) converts the timing slack into corresponding digital data. The first bit, M0, of the monitor unit is inverted to get the State signal. This State signal indicates whether the desired transition (rising or falling, depending on the sensor configuration) is occurring in the incoming data D_in.
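The way the delay line and the Monitor Unit quantize slack into a digital code can be sketched with a simple behavioral model. The Python below is an idealized illustration, not the sensor's RTL: it assumes identical buffer delays, ignores setup time and meta-stability, and uses the 7 buffers and 20 ps nominal buffer delay that Section 3.5 reports for the 28nm library. The function names `monitor_bits` and `slack_from_bits` are hypothetical.

```python
# Illustrative behavioral model (assumed names) of the delay line plus
# Monitor Unit for a rising transition. Bit Mi is '1' when the transition has
# passed through i + 1 buffers before the active clock edge.
def monitor_bits(slack_ps: float, buffer_delay_ps: float = 20.0,
                 n_buffers: int = 7) -> str:
    """Return the code M0..M6 captured for a given timing slack."""
    return "".join(
        "1" if (i + 1) * buffer_delay_ps <= slack_ps else "0"
        for i in range(n_buffers)
    )


def slack_from_bits(bits: str, buffer_delay_ps: float = 20.0) -> float:
    """Quantized slack implied by a code: number of '1's times buffer delay."""
    return bits.count("1") * buffer_delay_ps


if __name__ == "__main__":
    for slack in (140, 100, 65, 20, 5):
        code = monitor_bits(slack)
        print(f"slack {slack:>3} ps -> {code} -> {slack_from_bits(code):.0f} ps")
```

Under these assumptions a slack of 100 ps yields 1111100 and a slack of 140 ps or more saturates at 1111111, while the quantization error stays below one buffer delay.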


3.2.4 Capture and Result Storage Unit

The Capture and Result Storage Unit (CRSU) is constructed of storage cells connected in a scan chain. Each storage cell consists of an OR-AND gate and a scan flip-flop, as shown in Fig. 3-2. Each flip-flop in the scan chain samples the corresponding MU flip-flop's output. The AND gate connected to the scan flip-flop inside the storage cell makes it sticky, in the sense that if the flip-flop ever records a zero, it will hold on to the zero until it is reset or scan-loaded with a logic one. The State signal makes the OR gate transparent to the outputs of the MU flip-flops only for the desired transitions. At the initialization stage all the scan flip-flops are set to logic one. During the slack evaluation phase, SE is set to zero and the flip-flops inside the CRSU record the worst-case slack observed at the monitored capture flip-flop. The sensor data extraction process is initiated by setting the SE pin to logic high, which connects the flip-flops in a scan chain clocked by the main clock, and finally the stored slack data is extracted through the Sensor_SO pin. For multiple sensors, their CRSUs are connected by a scan chain for data extraction.

A meta-stability issue may occur in at most one of the flip-flops of the MU block if the output from a buffer of the delay line arrives late in the flip-flop's setup time window. Any such meta-stability is automatically resolved in our architecture, as we are using two flip-flops in series, forming a synchronizer circuit [72], with the output of each buffer in the delay line. Since the flip-flops in both the MU and CRSU are operated by the same clock, the values that were latched by the MU flip-flops are sampled again in the next clock cycle by the CRSU flip-flops. This latency of one clock cycle between successive samplings is utilized to resolve the meta-stability in the output of the MU flip-flops, if any. As a result, even if meta-stability occurs in the MU block, the flip-flops of the CRSU will practically be meta-stability resolved because of the high Mean Time Between Failure (MTBF) of a synchronizer circuit [72] and the designed sticky feature of the CRSU flip-flops.

In addition to its application in speed binning, the sensor IP can also be used for in-field reliability observation. When necessary, instances of this sensor embedded


Figure 3-3. Sensor insertion flow.

into the SoC can be activated to monitor and record the delay degradation on the critical/near-critical paths because of noise, aging, and other wear-out. For this work, we only analyze the application of the sensor IP in speed binning at time 0.

3.3 Sensor Insertion Flow

In this section we develop a gate-netlist-level and layout-aware algorithm to select path-ending capture flip-flops to be monitored by sensors. The sensors act as features in our machine learning tool. The candidate capture flip-flops for sensor placement should be selected based on two main criteria: (i) the sensors target the critical/near-critical paths, and (ii) the sensors (or features of the machine learning tool) should be a good representative of the trend of the modeling data to incorporate the chip-to-chip PVT variations [86]. Our proposed sensor insertion flow along with the capture flip-flop selection algorithm, shown in Fig. 3-3, addresses these criteria. For the first condition, it is not practical to place sensors at the end of each of the critical/near-critical paths, as the number of such paths is extensive in modern SoCs. Hence, the number of sensors to place is


Algorithm 1 Proposed Capture Flip-flop Selection Algorithm
 1: procedure Netlist Parser
 2:   Input: Gate-level netlist of the main design
 3:   Input: Number of logical modules in design netlist, N_module
 4:   Input: Static timing analysis report with capture flip-flops and slack data
 5:   Input: Cut-off timing slack, S_cut
 6:   Output: For each logical module k, a list of unique capture flip-flops sorted in ascending order of slack, D_k = {d_1k, d_2k, d_3k, ..., d_nk}
 7:   C_FF ← set of all unique capture flip-flops in design netlist with slack below S_cut
 8:   for k = 1 to number of logical modules N_module do
 9:     D_k ← list of capture flip-flops (D_k ⊆ C_FF) in logical module k
10:   end for
11: end procedure
12:
13: procedure Gate-overlap Aware Flip-flop Sorter
14:   Input: Logical module based capture flip-flop list D_k from previous procedure
15:   Input: Gate-overlap threshold, OV
16:   Input: Number of logical modules in design netlist, N_module
17:   Output: Gate-overlap aware list of capture flip-flops sorted in ascending order of slack for each logical module k, L_k
18:
19:   for k = 1 to number of logical modules N_module do
20:     for i = 1 to number of flip-flops in D_k do
21:       G_ik ← list of on-path critical/near-critical gates for flip-flop d_ik ∈ D_k
22:     end for
23:   end for
24:   Initialization: T_k = {G_1k} for k = 1 to N_module
25:   Initialization: L_k = {d_1k} for k = 1 to N_module
26:   for k = 1 to number of logical modules N_module do
27:     for i = 2 to number of flip-flops in D_k do
28:       overlap = percentage gate-overlap between T_k and G_ik
29:       if overlap < OV then
30:         T_k = T_k ∪ G_ik
31:         L_k = L_k ∪ d_ik
32:       end if
33:     end for
34:   end for
35: end procedure
36:
37: procedure Identification of Sensor Insertion Points
38:   Input: Area of the sensor module, Area_Sensor
39:   Input: Estimated total area of the main design, Area_main.design
40:   Input: Area-overhead budget, Area_Overhead
41:   Input: Optimized list of unique capture flip-flops in each logical module k, L_k
42:   Output: Set of target capture flip-flops for sensor insertion in each module k, FF_k
43:   Total_Sensor = (Area_main.design × Area_Overhead) / Area_Sensor
44:   for k = 1 to number of logical modules N_module do
45:     N_k = number of capture flip-flops in list L_k
46:   end for
47:   Σ N_k = total number of capture flip-flops in all of the k modules
48:   for k = 1 to number of logical modules N_module do
49:     P_k = N_k / Σ N_k; module k's contribution to the total number of candidate flip-flops
50:     S_k = P_k × Total_Sensor; allotted number of sensors for logical module k
51:     FF_k ← select top S_k capture flip-flops from list L_k
52:   end for
53: end procedure

decided by the area-overhead budget. For the second criterion, the proposed algorithm ensures that the sensors cover a wide range of diverse or distinct paths and that those paths are spatially distributed across the layout. Spatial distribution of the sensors allows monitoring of PVT variation effects on the timing-critical paths. The flow begins with the synthesis of


hardware from RTL to the gate-level netlist, as depicted in Fig. 3-3. At this stage, based on the standard cell library used in synthesis and the gate-level netlist, an estimate of the layout area is obtained from the synthesis CAD tool. After that, static timing analysis is performed on the synthesized netlist, and critical/near-critical paths (considering both single and multi-cycle paths) are sorted in order of their respective path slacks. Next the capture flip-flop selection algorithm presented in Algorithm 1 is executed. The algorithm is composed of three main procedures, as described in the following.

The first procedure, Netlist Parser (lines 1 to 11), takes as input the gate-level netlist of the design, the list of logical modules inside the design, the timing slack data for each capture flip-flop obtained from static timing analysis, and a cut-off slack margin. This procedure outputs the sorted list $D_k$ of capture flip-flops grouped according to the logical modules to which they belong. The logical module and slack based sorting of capture flip-flops is accomplished by looping over all of the capture flip-flop instances below the target cut-off slack, followed by parsing of the gate-level netlist to identify each flip-flop's location in the logical hierarchy and putting it in the appropriate list, as shown in lines 8 to 10.

The second procedure, Gate-overlap Aware Flip-flop Sorter (lines 13 to 35), outputs a reduced list $L_k$ of capture flip-flops obtained by narrowing down the selection to include paths with diverse or distinct logic gates. The distinctness of the selected capture flip-flops is achieved by reducing the number of overlapping logic gates among the critical/near-critical paths terminating at those capture flip-flops. The inputs to this procedure are the flip-flop list $D_k$ from the previous procedure, a target gate-overlap threshold OV and the list of logical modules. In the first stage of this procedure, the lists $G_{ik}$ of all on-path logic gates for the critical/near-critical paths terminating at each capture flip-flop i of each logical module k are generated, as shown in lines 19 to 23. Next, for each hierarchical logical module k, we create two arrays $T_k$ and $L_k$ to hold the on-path logic gates and capture flip-flops, respectively. We initialize $L_k$ with the first capture flip-flop $d_{1k}$ from the list $D_k$ and $T_k$ with the corresponding on-path logic gates


as demonstrated in lines 24 and 25. Next, for each logical module, we iterate over the rest of the capture flip-flops and include a capture flip-flop in the list $L_k$ if the gate overlap between this particular flip-flop's on-path logic gates and the existing gates in the list $T_k$ is less than the pre-selected threshold OV (lines 26 to 34).

The final procedure, Identification of Sensor Insertion Points, generates the list $FF_k$ of target capture flip-flops in module k for sensor insertion. The inputs to this procedure are the area of the main design, the area of each sensor, the area-overhead budget and the optimized flip-flop list $L_k$ from the previous procedure. In line 43, the total number of available sensors is calculated from the area-overhead budget. Next, in lines 44 to 46, the number $N_k$ of flip-flops in list $L_k$ is identified for each module k. In lines 49 to 50, the available sensor quota $S_k$ for each logical module k is calculated. This quota for a module is decided by the module's share in the cumulative number of total capture flip-flops. Lastly, in line 51 the final capture flip-flop list $FF_k$ for each module k is reported by selecting the top $S_k$ flip-flops from the list $L_k$ for sensor insertion. After finalizing the sensor insertion points (the capture flip-flops), the sensor instances are added inside the synthesized netlist of the SoC at those identified insertion points using custom scripts and the physical design is completed.

3.4 Machine Learning

In developing our speed binning model, we explored five major multiclass machine learning classifiers [87]: (i) Multinomial Logistic Regression, (ii) Multiclass Support Vector Machine, (iii) Bootstrap Aggregation (Bagging) decision tree, (iv) Random Forest decision tree, and (v) Adaptive Boosting (AdaBoost). Brief descriptions of each of these algorithms are given in the APPENDIX section.

3.5 Simulation Results and Analysis

The first step in extracting the path slacks for speed binning using embedded sensors is to design the sensor IP with an appropriate slack detection range. The detection resolution is limited by the delay of a unit buffer. The nominal resolution at the typical-typical corner


Figure 3-4. Wave-shapes of sensor response from SPICE simulation.

for our selected 28nm standard cell library is 20 ps [119]. The expected resolutions at the 14nm and 22nm nodes are 10 ps and 6 ps, respectively, considering minimum-size buffers at nominal conditions [115]. The resolution also sets the maximum round-off error limit of the slack measurement. The slack detection range is set to 15% of the nominal clock period of the SoC in which the sensors would be embedded. With a nominal clock period of 833 ps (based on path-delay analysis of possible critical paths for our benchmark circuit) and the nominal sensor resolution of 20 ps for the selected standard cell library, 7 buffers are required to cover a 15% slack margin, which is about 140 ps. After deciding the number of required buffers in the delay line, the slack sensor IP was written in gate-level Verilog and synthesized with Design Compiler using the Synopsys 28nm standard cell library [119]. During creation of the layout of the sensor soft macro, the buffers were placed in the same row adjacent to one another to ensure delay


consistency among the buffers. The flip-flops inside a sensor were also placed adjacently in a row. Each sensor occupied a layout area of 164 µm² and dissipated 26 µW of power while activated. A post-layout SPICE netlist was extracted and simulated with HSPICE [119]. Timing diagrams from post-layout SPICE simulations are given in Fig. 3-4, where the simulated timing slack is 100 ps. The active-zero State signal transitions to zero after the rising clock edge at 2 ns. The flip-flops of the monitor unit record this slack data into bits M0 to M6, as shown in the respective wave-shapes in Fig. 3-4. Wave-shapes S0 to S6 capture the worst-case slack information for the case of a rising transition of the Data_En signal. In Fig. 3-4 the timing slack corresponding to the clock edge at 2 ns is recorded in storage bits S0 to S6 after a latency of one clock cycle, at the clock edge at 3 ns, to avoid meta-stability as discussed in Section 3.2.

The sensor calibration results for the 28nm standard cell library are given in Table 3-1, where Column 1 reports the slack obtained by subtracting the flip-flop setup time from the delay between the data arrival time and the active clock edge. Column 2 shows the effects of temperature variations on the sensor responses at nominal VDD. Since only 7 buffers were used in the delay line, it was observed, for the selected 28nm library, that the sensor responses were invariant to any shift in temperature within 20 °C to 80 °C, the typical temperature range in an IC test environment. A similar calibration table can be obtained for other temperatures. We assume that the chip temperature would be available from the typical thermal sensors used in SoCs [83].

As mentioned in Section 3.2, for rising transition detection, the sensors are initialized with a sensor reading of 1111111 before the path slack measurements commence. At the upper detection limit the sensor code is 1111111 and, as the slack decreases, the sensor code changes from 1111111 to 1111110, then to 1111100 and eventually to the lower detection limit at 1000000, where each zero represents a slack reduction by an amount equal to the delay of a buffer. Hence, the sensor reading of 1111111 indicates that the slack is at least 140 ps and 0000000 indicates that the monitored slack was negative. Other states not mentioned in the calibration table indicate

that the current slack is beyond the detection range of the sensor.

Figure 3-5. Monte Carlo process variation simulation results on sensor IP at nominal VDD.

Since the speed of the buffers used in the delay line is sensitive to power supply noise and voltage droop, we analyzed the power supply variation sensitivity of the sensor IP and tabulated the results in Table 3-1. The effect of voltage droop (slowing down of the sensor, as shown for the 90% VDD case in Column 3) can be easily decoupled from slack data with the aid of power supply noise sensors [66].

Table 3-1. Sensor calibration results for 28 nm standard cell library (20 °C to 80 °C)

Slack (ps)   100% VDD (1.05 V)   95% VDD (0.99 V)   90% VDD (0.94 V)
140          1111111             111110             1111100
120          1111110             111110             1111000
100          1111100             111100             1110000
80           1111000             111000             1110000
60           1110000             110000             1100000
40           1100000             110000             1100000
20           1000000             100000             0000000

To assess the impact of process variations on the sensitivity of the sensor, we conducted a 150 sample Monte Carlo simulation on the sensor IP. The selected parameters to vary from nominal were transistor length L, width W and threshold voltage V_th

each by 15%, and oxide thickness t_ox by 4%. All simulations were done within the statistical range of 3 standard deviations. The results are depicted in the pie charts of Fig. 3-5 where it can be observed that in all the cases about 95% of the responses matched the expected calibration results of Table 3-1; in the rest of the cases the sensor response was either slower or faster by a unit buffer delay. These results imply that worst-case observation error is limited to one buffer delay.

Table 3-2. Sensor data extraction from actual paths of FGU

          t = 0 years                                        t = 3 years
Path ID   Actual       Sensor    Slack from    Measurement   Actual       Sensor    Slack from    Measurement
          slack (ps)   reading   sensor (ps)   error (ps)    slack (ps)   reading   sensor (ps)   error (ps)
1         69           110000    60            9             29           000000    20            9
2         82           111000    80            2             43           100000    40            7
3         83           110000    60            17            47           100000    40            7
4         94           111000    80            14            53           110000    60            7
5         91           111000    80            11            52           110000    60            8
6         98           111000    80            18            57           100000    60            3
7         111          111100    100           11            51           111000    40            11
8         112          111100    100           12            73           111000    80            7
9         114          111100    100           14            78           111000    80            2
10        125          111110    120           5             82           111000    80            2

Table 3-3. Layout statistics and sensor area overhead for FGU

Number     Number of    Estimated layout area       Each sensor   Allotted sensors for 2% std. cell
of cells   flip-flops   without sensor (std. cell)  area          or 0.39% overall area overhead
58802      7514         204304 µm²                  164 µm²       25

Next we implemented our proposed sensor insertion flow and the capture flip-flop selection algorithm described in Section 3.3 for speed binning. We selected the Floating Point and Graphics Unit (FGU) circuit from the OpenSPARC T2 SoC's SPARC core [117]. The circuits were synthesized to gate-level netlist with 28 nm standard cell library using Design Compiler [119]. For the FGU circuit, from an initial post synthesis timing analysis with PrimeTime [119], the list of all the timing paths sorted in the descending order of nominal delay were obtained. The layout area of FGU was estimated from the synthesis tool Design Compiler [119]. The statistics from the sensor insertion flow are shown in

Table 3-3. For an area-overhead budget of 2%, the corresponding number of capture flip-flops to be monitored by sensors were around 25 as reported in Column 5 in Table 3-3. Here the area-overhead of 2% is a conservative estimate as we have considered only the standard cell area. If we also include the area of the register file macro in calculating the overhead, the overall area-overhead is only 0.39%.

Figure 3-6. Sensor loading effect on path delay.

Fig. 3-6 shows the diagram of an actual critical/near-critical path from FGU, which was used to analyze the sensor's loading impact on the critical/near-critical path's delay. In Table 3-4 the pin capacitance values and cell's load drive capacity factor are reported from the NLDM/CCS liberty [119] library file and design manual, respectively, available with the 28 nm standard cell library. We also performed a detailed SPICE simulation on the path with and without the sensor. As shown in rows 4 and 5 of Table 3-4, because of the sensor insertion the path delay increased to 769 ps from 762 ps, which is less than 1% increase.

Column 1 in Table 3-5 reports the different logical modules of the FGU circuit and Column 2 reports the number of unique capture flip-flops in each logical module. The slack cut-off range for identifying critical/near-critical capture flip-flops was set as 15% of the nominal clock period. While selecting the critical/near-critical paths to

monitor with sensors, we analyzed if the path's output node was rare.

Table 3-4. Sensor loading effect on path delay

Capacitance on D pin of flip-flop for cell strength X1               0.461 fF
Input pin capacitance of clock gating cell for cell strength X2      0.668 fF
Driver cell's load drive capability for cell strength X1             4 fF
Delay of a critical path before sensor insertion (SPICE simulation)  762 ps
Delay of a critical path after sensor insertion (SPICE simulation)   769 ps

A rare node is defined as a node where the switching activity is very low. To obtain the switching activity, it is required to run dedicated test programs that represent the actual workload, and then monitor the switching profile at the node of interest. Since, for FGU circuit module we did not have dedicated workload programs, we have used the SCOAP (Sandia Controllability/Observability Analysis Program) based technique [96] to identify the switching profile at a node. This technique is based on the observation that the probability of a node being rare is directly correlated to the node's controllability [96]. A node with very low controllability implies the logic transition on that node is a scarce event. The SCOAP [97] technique can calculate `controllability to 0' and `controllability to 1' for each node in the circuit, where higher values indicate lower controllability. For our switching analysis, we take the ratio,

\[ R_T = \frac{\min(\text{controllability to 0},\ \text{controllability to 1})}{\max(\text{controllability to 0},\ \text{controllability to 1})} \]
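As a minimal sketch of the ratio R_T defined above, the Python function below computes it from the two SCOAP controllability values of one node. The function name and the example values are illustrative assumptions and are not taken from the SCOAP tool or from the flow used in this work; how R_T is then thresholded to flag a rare node is not shown.

```python
def transition_ratio(cc0, cc1):
    """Return R_T = min(CC0, CC1) / max(CC0, CC1) for one circuit node.

    cc0, cc1: SCOAP 'controllability to 0' and 'controllability to 1'
    (positive numbers; a higher value means the node is harder to control).
    """
    if cc0 <= 0 or cc1 <= 0:
        raise ValueError("SCOAP controllability values must be positive")
    return min(cc0, cc1) / max(cc0, cc1)


# Hypothetical node that is easy to drive to 0 but hard to drive to 1.
print(transition_ratio(3, 12))  # 0.25
```

By construction the ratio lies in (0, 1], and it equals 1 when both controllability values are the same.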

Table 3-5. Sensor distribution in FGU of OpenSPARC T2

Logical   Number of unique   Flip-flops with     Number of flip-flops   Number of flip-flops   Number of
module    capture            worst-case slack    after gate-overlap     as a percent of        allotted
          flip-flops         below cutoff        analysis               total                  slack sensors
FAC       495                6                   5                      1.6                    1
FAD       693                6                   5                      1.9                    1
FDC       76                 6                   4                      1.6                    1
FDD       1040               213                 97                     38.5                   8
FEC       84                 5                   4                      1.6                    1
FGD       1102               41                  23                     9.1                    2
FIC       65                 5                   4                      1.6                    1
FPC       699                13                  7                      2.8                    1
FPE       97                 6                   4                      1.6                    1
FPF       731                52                  16                     6.4                    1
FPY       1931               157                 83                     32.9                   7

connected in-series in a long path, as a result output node of a critical/near-critical path is hardly rare (i.e., low switching event).

To identify the critical/near-critical paths of FGU, and to sort them according to the logical modules where they terminate, the Static Timing Analysis (STA) was performed in three steps: (i) First, the STA was run on the full FGU module, and the list of critical/near-critical capture flip-flops obtained; (ii) Next, those capture flip-flops along with their corresponding paths were grouped according to their logical module location; (iii) Finally, the STA was again run individually for each logical module with only the path group of that module. The results of STA are shown in Columns 3 and 4 of Table 3-5. The gate-overlap aware flip-flop sorting algorithm (Algorithm 1 in Section 3.3) was executed with a gate-overlap factor of 30%. The flip-flop reduction results from gate-overlap analysis are given in Column 4 in Table 3-5. Each logical module's percentage share in the candidate flip-flop list is given in Column 5. According to each module's respective share, the slack sensors were allotted into those logical modules. An effort was made to include at least one sensor in each logical module. After finalizing the candidate capture flip-flops for sensor insertion following the proposed flow, the sensor

instance was added to those points inside the synthesized gate-level netlist of the FGU. The different steps of our proposed sensor insertion methodology were completed by using Synopsys CAD tools [119], and custom scripts written in Perl, Shell and Tcl. Finally, the full physical design of the FGU, with embedded sensor network, was completed with IC Compiler [119]. Fig. 3-7(a) shows layout of FGU with 25 embedded sensors. From Fig. 3-7(b), it can be observed that the sensor blocks are distributed across the layout as those were placed inside the different logical modules of the main netlist according to our proposed flow.

Figure 3-7. The complete layout with sensors. (a) FGU with embedded sensor network. (b) Cells of sensor modules are highlighted.

Table 3-6. Process variation profile for Monte Carlo simulations

Variation   Channel length   Channel width   Threshold voltage   Gate-oxide thickness
profile     L (%)            W (%)           V_th (%)            t_ox (%)
Case 1      10               10              10                  3
Case 2      15               15              15                  4

The detailed transistor level circuit netlist of the 25 sensors along with their corresponding monitored paths were extracted from the layout using Synopsys tools [119]. Also, the transistor level netlist of the top 10% critical/near-critical paths, identified from

post layout timing analysis, were extracted for F_MAX identification. Any of these top 10% paths can be the most critical path, deciding the F_MAX. While identifying the possible critical paths, both single cycle and multi-cycle paths were considered. For multi-cycle paths, the slacks were normalized according to the cycle count. Because of post-fabrication process variations, it is expected that the delays of these identified critical paths will vary from chip-to-chip.

To simulate the effect of post-fabrication process variations, we performed a 300 point Monte Carlo simulation on the extracted critical paths as well as the sensor-path pairs using HSPICE [119]. For Monte Carlo simulation two process variation scenarios were considered as reported in Table 3-6. For Case 1, the selected parameters to vary from nominal were transistor length L, width W and threshold voltage V_th each by 10%, and oxide thickness t_ox by 3%. For Case 2, these parameters were varied by 15% for L, W, V_th, and 4% for t_ox. All simulations were done within the statistical range of 3 standard deviations. The simulation time was approximately 25 hours on a system with an Intel Xeon processor with 8 cores and 96 GB RAM. The path slacks were extracted from the simulations of the sensor-path pairs by exciting them with appropriate stimulus functions generated by Synopsys tools [119].

From detailed SPICE simulations of all of the top 10% paths of the circuit, the longest path delay was identified. After adding 10% margin with this delay as a guard-band against aging and noise, the minimum clock cycle and the F_MAX was estimated for each sample. In order to quantize the continuous F_MAX values we chose 50 MHz as the grid size or bin width. The distribution of F_MAX in Monte Carlo samples for the two variation scenarios are shown in Fig. 3-8. For comparatively lower process variation scenario of Case 1 (Fig. 3-8(a)), the F_MAX distribution spreads from 1100 MHz to 1300 MHz with a std. deviation of 46 MHz. On the other hand, for comparatively higher process variation scenario of Case 2 (Fig. 3-8(b)), F_MAX distribution ranges from 1100 MHz to 1350 MHz with std. deviation of 62 MHz. Comparison of these two figures reveal the degree of F_MAX spread with the magnitude of process variations.
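The F_MAX estimation and quantization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the 762 ps delay is just an example input, and rounding down to the lower bin edge is an assumed convention, since the text gives the 50 MHz bin width but not the rounding direction.

```python
import math


def fmax_mhz(longest_delay_ps, guard_band=0.10):
    """Estimate F_MAX in MHz from the longest path delay plus a guard-band."""
    min_period_ps = longest_delay_ps * (1.0 + guard_band)
    return 1.0e6 / min_period_ps  # 1 / ps = 1e12 Hz = 1e6 MHz


def fmax_bin_mhz(fmax, bin_width_mhz=50):
    """Quantize a continuous F_MAX to the lower edge of its bin (assumed)."""
    return int(math.floor(fmax / bin_width_mhz)) * bin_width_mhz


# Example input: a 762 ps longest path with the 10% guard-band.
f = fmax_mhz(762.0)
print(round(f, 1))       # 1193.0
print(fmax_bin_mhz(f))   # 1150
```

Rounding down is the conservative choice here, because a chip is never labeled with a frequency above its estimated F_MAX.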

Figure 3-8. F_MAX distribution in Monte Carlo samples. (a) Case 1. (b) Case 2.

Figure 3-9. Importance of each feature (Sensor) in developing the Random Forest predictor.

The 300 data samples obtained from Monte Carlo simulations were partitioned into a training set and a validation set. The validation dataset contained 100 samples and three cases of training set, consisting of 100, 150 and 200 samples, respectively, were considered. Each of the 25 slacks obtained by the sensors act as a feature in our machine learning based speed binning flow. The machine learning capabilities of MATLAB [120] were used to implement and train the five classifiers (ECOC SVM, Logistic Regression, Bagging, Random Forest and AdaBoost.M2) described in Section 3.4. The relative importance of the 25 features in training the Random Forest predictor is shown in Fig. 3-9.

Figure 3-10. Prediction results for each of the 100 validation samples for 100 sample training data. Case 1. (a) Bagging, (b) Random Forest, (c) AdaBoost.M2, (d) ECOC SVM, and (e) Logistic Regression. Case 2. (f) Bagging, (g) Random Forest, (h) AdaBoost.M2, (i) ECOC SVM, and (j) Logistic Regression.

After the classifiers were all trained with the training dataset, the 100 sample validation dataset were applied to the trained classifiers and the corresponding predicted F_MAX were obtained. These predicted F_MAX values were later compared with actual F_MAX to identify the mis-binning rate. In Fig. 3-10 prediction mismatch for each of the 100 validation samples are shown for the different machine learning algorithms and variation profiles for 100 sample training dataset. Fig. 3-10(a)-(e) are for the variation profile of Case 1 and Fig. 3-10(f)-(j) are for variation profile of Case 2. It can be observed that for the comparatively lower process variation of Case 1, 96%, 98%, 90%, 98%, and 84% prediction accuracy are obtained for Bagging, Random Forest, AdaBoost.M2, ECOC SVM, and Logistic Regression, respectively. On the other hand, for Case 2, the prediction accuracy deteriorated to 91%, 93%, 89%, 94%, and 78%, respectively. In all cases except logistic regression the worst-case mis-prediction is 50 MHz or one bin. The degraded prediction

accuracy for Case 2 can be explained with the F_MAX distribution in the training dataset as shown in Fig. 3-8, where Case 2 has 6 bins compared to 5 for Case 1.

Figure 3-11. Improvement of prediction accuracy with increasing number of training samples. Variation profile of, (a) Case 1, (b) Case 2.

The effect of number of training samples on prediction accuracy is shown in Fig. 3-11. For the variation profile of Case 1, as we increased the training sample size, the prediction accuracy improved significantly reaching 99% for Random Forest, ECOC SVM and Bagging for training set sizes of 150 and beyond. For Case 2 (Fig. 3-11(b)), 200 training samples were required to achieve 99% accuracy for Random Forest, ECOC SVM and Bagging.

From a comparative analysis of the results from 5 different machine learning techniques in Fig. 3-10 and Fig. 3-11, it is observed that Random Forest and ECOC multiclass SVM offer the best prediction accuracy. On the other hand, linear logistic regression performed poorly, indicating a weak direct linear correlation between sensor responses and functional F_MAX. The lower accuracy of AdaBoost.M2 can be explained by the fact that this algorithm emphasizes on the hard to predict samples, and as a result might suffer from over-emphasizing if a small number of samples are included in a particular class in the training dataset. This is indeed the case as is evident in the F_MAX distribution plots in Fig. 3-8 where only 5% and 4% samples are in the 1100 MHz

and 1300 MHz bins, respectively for Case 1 (Fig. 3-8(a)). For Case 2 (Fig. 3-8(b)), only 5% samples are in the 1350 MHz bin. The ranking of the observed accuracy of these 5 machine learning techniques in developing a F_MAX predictor is consistent with the findings of [87].

Figure 3-12. Application of slack sensor in TSV delay measurement.

In the scenario that the chips are sold with the F_MAX labels predicted by the trained classifier, there is a chance that even for a classifier that was trained with sufficient number of training samples, 1% of the predicted chips might exhibit mis-predicted F_MAX. The worst-case mis-prediction is one bin faster or slower than the actual F_MAX. For a faster chip binned in the slower bin, the customer will not complain. However, a slower chip shipped as a faster one may cause customer return. Depending on the trade-off between benefits gained (from test time reduction and tester cost savings from F_MAX prediction) and the cost that will be incurred from dealing with the customer returns of the worst-case 1% shipped samples, the chip vendor can decide if they will sell the chips with the predicted F_MAX labels or perform a second stage of expedited functional F_MAX testing at the predicted F_MAX frequency to eliminate customer returns. If the chip passes this functional test, the correct F_MAX was predicted. If it fails, then the functional F_MAX test is done again at the frequency of the immediate lower bin, where a match is expected.
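The optional second-stage screening described above can be sketched as follows. This is a hedged illustration only: `confirm_fmax_label` and the `passes_functional_test` callback are hypothetical names standing in for the tester's functional F_MAX test, and the 50 MHz bin width is taken from the binning described earlier.

```python
def confirm_fmax_label(predicted_mhz, passes_functional_test, bin_width_mhz=50):
    """Return (confirmed F_MAX label in MHz, number of functional tests run).

    passes_functional_test: hypothetical callable(freq_mhz) -> bool that
    runs the functional test at the given frequency.
    """
    # First test at the frequency predicted by the classifier.
    if passes_functional_test(predicted_mhz):
        return predicted_mhz, 1
    # On failure, retest once at the immediately lower bin.
    lower_bin_mhz = predicted_mhz - bin_width_mhz
    if passes_functional_test(lower_bin_mhz):
        return lower_bin_mhz, 2
    raise RuntimeError("chip failed at the predicted bin and at the next lower bin")


def example_chip(freq_mhz):
    """Hypothetical chip that works up to 1200 MHz."""
    return freq_mhz <= 1200


print(confirm_fmax_label(1250, example_chip))  # (1200, 2)
print(confirm_fmax_label(1200, example_chip))  # (1200, 1)
```

The error branch is an added safeguard for a mis-prediction larger than one bin, which the results above did not show for the better-performing classifiers.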

In this way there are at most two functional tests that will give the correct F_MAX for all chips shipped to the customer, as opposed to the test with frequency sweep performed over the possible F_MAX labels in the conventional flow.

3.6 Test Chip Implementation

The slack sensor IP was implemented in GlobalFoundries advanced 14 nm finFET technology. The application of the sensor was in estimating the propagation delay of the Through-Silicon Via (TSV) of 3D ICs. As of writing this thesis, the test chip is still under fabrication at GlobalFoundries at Malta, NY. Fig. 3-12 shows a snippet of the GDSII view of the test chip.

CHAPTER 4
DESIGN OF RELIABLE SOCS WITH BIST HARDWARE AND MACHINE LEARNING

The contents of this paper were previously published in [65].

In this chapter the challenges of path delay degradation from combined effects of aging and its impact on chip lifetime reliability are addressed.

4.1 Hardware Level Implementation Details

Figure 4-1. Hardware level implementation details of BIST-RM.

The extent of aging induced delay degradation of critical or near-critical paths varies non-linearly with time. The aging evolution with time of a critical path simulated in 28 nm technology is shown in Fig. 4-2. Aging occurs at a faster rate during the initial stages of chip lifespan, and the rate slows down with time. In Fig. 4-2, X-axis represents the aging time stages in the lifespan, and the Y-axis shows the percentage degradation of the

path delay at each aging stage with respect to the `time 0' stage. Remedial procedures are required to keep the chip functioning reliably despite the degradation.

Figure 4-2. Delay increase with time for a path in 28 nm technology.

A high-level overview of BIST-RM is shown in Fig. 4-3. Using pre-defined and stored LFSR seeds, LBIST patterns are applied with clock sweeping that excite a set of specially selected critical or near-critical paths. The response obtained from these patterns are fed to a trained predictor to obtain the reliable clock frequency considering the aging. If this clock frequency is lower than the target operating frequency, countermeasures are activated.

BIST-RM will be executed at scheduled intervals in any multi-core or single-core processor. For a multi-core processor, first an idle core will be selected and then the test patterns will be applied from that core's LBIST hardware by loading the LFSR seeds from non-volatile memory. The test responses will be sent to another core running the trained predictor in software. The predictor will identify the aging state and generate the necessary control signals for the adaptive controller hardware. For a single core processor, a dedicated microcontroller will be used to run the predictor in software.

Hardware level implementation procedure of the BIST-RM can be divided into multiple segments as shown in Fig. 4-1. The different stages are described below in details.
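The control flow of one scheduled BIST-RM run, as outlined above, can be sketched in Python. Everything named here is a hypothetical stand-in: `apply_lbist` represents the LBIST hardware applying one seed at one clock period and returning a captured response, and `predict_frequency_mhz` represents the trained predictor running in software. The sketch shows only the order of operations, not the actual hardware interface.

```python
def bist_rm_run(seeds, clock_periods_ps, apply_lbist, predict_frequency_mhz, target_mhz):
    """Run one BIST-RM check; return (reliable frequency, countermeasure needed)."""
    responses = []
    for seed in seeds:                    # stored LFSR seeds
        for period in clock_periods_ps:   # clock sweeping
            responses.append(apply_lbist(seed, period))
    reliable_mhz = predict_frequency_mhz(responses)
    return reliable_mhz, reliable_mhz < target_mhz


def fake_lbist(seed, period_ps):
    """Stand-in for hardware: the excited path passes (1) at 800 ps or slower."""
    return 1 if period_ps >= 800 else 0


def fake_predictor(responses):
    """Stand-in for the trained predictor: any failing response lowers the result."""
    return 1200 if all(responses) else 1150


print(bist_rm_run([0xA5, 0x3C], [760, 800, 840], fake_lbist, fake_predictor, 1200))
# (1150, True)
```

In this toy run the 760 ps sweep point fails, so the stand-in predictor reports 1150 MHz, which is below the 1200 MHz target and would trigger a countermeasure.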

Figure 4-3. Overview of BIST-RM system.

4.1.1 Identification of High Usage Critical or Near-Critical Paths

Figure 4-4. Paths most likely to be critical or near-critical.

As depicted in Fig. 4-1 in Stage 1, first the gate-level netlist is synthesized from RTL. Next, a detailed timing analysis is performed to sort all the circuit paths according to their delays. In order to find out the paths that are most likely to be used and are most likely to become speed-limiting critical paths, two criteria are followed. Firstly, we identify the spectrum of paths that have the potential to be critical at any stage over

the chip's lifespan irrespective of the workload activity pattern. From simulation with 28 nm technology it was observed that paths degrade by 12.5% over the targeted life-span (Fig. 4-2). Based on this information, we select those paths, such that, if the delays of these paths are increased by 12.5% from aging, they will be the critical paths. This criterion can be visualized with the aid of Fig. 4-4 where all paths with delays more than the cutoff delay, d_cutoff, are considered.

Figure 4-5. Toggle rate analysis.

Secondly, the `high-usage' criterion ensures that the selected paths are most likely to be used during any random workload execution. The high-usage criterion can be explained with Fig. 4-5. The usage factor of a path is defined as,

\[ UF = \sum_{n=1}^{K} T_n \]

where T_n is the toggle rate at the output of the n-th on-path gate. The toggle rate can be identified by running random or selected workloads during the design stage. The output of this stage is the path list P_1 that contains all the paths satisfying the above mentioned two criteria.

4.1.2 LBIST-excitable Path Identification

In this step the high-usage path list P_1 identified in the previous step is further narrowed down to contain only those paths that are excitable by LBIST hardware. The necessary tasks in this step are,

4.1.2.1 ATPG Pattern Generation

Traditional Launch-on-Capture (LoC) and Launch-on-Shift (LoS) Transition Delay Fault (TDF) ATPG patterns target the easily sensitizable paths. As a result, the excited paths are most often shorter paths, and do not target the longest paths, i.e., the

speed limiting critical or near-critical paths which are more susceptible to in-field aging degradation and better representative of chip speed. To overcome this issue, we used the small-delay defect (SDD) transition fault detection test approach that systematically targets longer paths [105][106]. In SDD flow, detailed timing slack information is first obtained from the Static Timing Analysis (STA) tool and then fed to the modern ATPG tools, e.g. Synopsys TetraMAX [119], that utilize slack-conscious algorithms to generate highly efficient SDD patterns. The effectiveness of SDD patterns in exciting the set of longest paths is observed, and, if required, test points might be inserted to make sure that all of the critical or near-critical paths of interest are covered by the ATPG generated SDD patterns. In our case, all paths with delay more than d_cutoff are considered for SDD pattern generation as shown in Fig. 4-1. The next step is to convert the SDD ATPG patterns into equivalent LBIST applicable seeds.

4.1.2.2 LBIST Seed Extraction

Different methods have been proposed for extracting seeds from ATPG patterns. The seed extraction approach (Stage 2 in Fig. 4-1) used in this work is based on the seed decoding implementation described in [109]. This approach assumes the SoC has an existing LBIST architecture used for in-field test. The seed decoding algorithm in [109] was designed and optimized to work with large benchmarks. It takes test compression into consideration and optimizes the algorithm to reduce the computational effort added by compression logic. These features allow this algorithm to be implemented on large benchmarks which are more likely to incur timing violations due to aging, making [109] suitable for our purpose. Additionally, this seed extraction method only uses computational approach without requiring any design modifications to decode the ATPG seeds.

For this work, ATPG patterns are generated for rising transition-delay faults, rather than stuck-at faults as it is done in [109]. Each TDF ATPG pattern has two vectors; however, a seed only has to be calculated for the first vector of each pattern since the

second vector is the captured response from the first vector (launch-off-capture) or the shifted version of the first vector (launch-off-shift).

To generate seeds for TDF patterns, an initial set of TDF ATPG patterns is generated for the entire fault list. In addition, [109] generates a mathematical model of the LBIST LFSR and, if it is present, the test compression structure. The model is then used to generate linear XOR equations for each care-bit in all ATPG patterns; thus, each ATPG pattern is described by a set of linear equations. Constraint programming tools and parallel programming algorithms are used to solve these linear XOR equations to generate seeds for the ATPG patterns. A binary tree search algorithm is used to find the solution which meets all the ATPG care-bit constraints for the XOR equations. This process depends on the number of care-bits in each pattern, thus some patterns take longer to decode than others. Due to the large number of care-bits in some patterns or conflicting dependencies within the pattern (i.e. dependent linear equations), a solution does not always exist for a given pattern. If a solution does not exist, the constraint tool runs an exhaustive search which has an exponential run time relative to LFSR size. To limit run time, a time limit is set; once this limit is reached, the algorithm flags the current pattern as "undecoded" and stores faults detected by that pattern for the next iteration.

To further speed up the seed generation, parallel computing techniques can be used to process the next set of equations in a different computing thread rather than wait for the previous one to finish. To achieve this parallelism, a shared queue is used which contains all sets of equations. Each available computing thread requests a set of equations and works independently to decode it. Once finished, the thread stores the solution into a solution queue and picks up the next available set of equations. This reduces the bottleneck effect caused by equations which cannot be decoded since only few computing threads are occupied decoding these, while all other threads work on the remaining sets. Mutex locks are also used to allow each thread to reserve the solution queue while writing a solution; this ensures that threads do not overwrite each other's solutions. Barriers are

used to ensure all computation threads have finished writing all possible solutions before moving on to the next pattern set.

Large benchmarks and complex designs typically have a large number of care-bits in each ATPG pattern. In these cases, [109] uses partial ATPG patterns to generate multiple sets of ATPG patterns, each with fewer detected faults. More patterns are generated for the same set of faults; thereby, each pattern has fewer care-bits and a smaller set of linear equations. The detected faults from the original set of ATPG patterns are used to ensure that all faults covered by the original ATPG patterns are also covered by the partial ATPG patterns. In this work, TDF faults are used to activate selected paths; to maintain this property, modifications are made to [109] to generate partial patterns while maintaining groups of TDF faults along paths. The algorithm iterates and generates new ATPG patterns for the remaining faults of patterns flagged as "undecoded" and the same decoding process is repeated.

Although this approach is computationally intensive and could incur large run times, these computations only need to be done once during the design stage for a limited number of test patterns. All extracted seeds can then be stored on the chip and used at any time. During in-field test, these seeds will be able to generate the desired deterministic TDF patterns when loaded onto LBIST from memory.

4.1.2.3 Detailed Path Database Creation

In this phase, for the paths excited by the LBIST convertible SDD patterns, we extract the detailed list of all the on-path logic gates along with the launch and capture flip-flops. To accomplish this we developed an in-house tool, implementing the steps in Algorithm 2, that takes the gate-level Verilog netlist of the design, netlist of the standard cell library, the fault list for each LBIST-convertible SDD pattern from ATPG tool and the path delay information from static timing analysis as input, and generates, (i) the sorted list of all capture flip-flops activated by the test patterns, (ii) the complete database of all on-path logic gates for the critical or near-critical paths.

Algorithm 2 Detailed Gate Database Generation for LBIST-excitable High-usage Critical or Near-critical Paths
 1: procedure Pattern to Path Detail Generation Flow
 2:   Input: Gate-level netlist of design
 3:   Input: Fault list for the test patterns
 4:   Input: Sorted list of LBIST-excitable test patterns, {Pattern_1, Pattern_2, .., Pattern_NP}
 5:   Input: Cutoff path delay, d_cutoff
 6:   Output: Sorted list of all the critical or near-critical capture flip-flops, All_FF_Sorted
 7:   Output: Database P_2 of on-path gates for each critical or near-critical capture flip-flop
 8:   for k = 1 to number of patterns N_P do
 9:     FF_k <- list of critical or near-critical capture flip-flops with corresponding path delay more than d_cutoff excited by pattern k
10:   end for
11:   All_FF = FF_1 U FF_2 U ... U FF_k = {f_1, f_2, f_3, ..., f_Ntotal}
12:   All_FF_Sorted <- sort flip-flops in All_FF in ascending order of timing slack
13:   N_F <- number of flip-flops in the list All_FF_Sorted
14:   for j = 1 to N_F do
15:     G_j <- list of on-path critical or near-critical paths logic gates for capture flip-flop f_j in All_FF_Sorted
16:   end for
17:   P_2 <- database containing pairs of flip-flops and corresponding gates, (f_j, G_j)
18: end procedure

As shown in Lines 8 to 10 in Algorithm 2, for each test pattern Pattern_k, the list FF_k is generated that contains the critical or near-critical flip-flops, with corresponding path delay above the cutoff value, activated by that pattern. Next, the capture flip-flop lists for all the test patterns are combined into the list All_FF as shown in Line 11. Finally, the capture flip-flops are sorted in order of their timing slack margins to obtain the sorted flip-flop list All_FF_Sorted as depicted in Line 12.

Next, we generate the complete list of all on-path logic gates for the critical or near-critical paths for each flip-flop in the list All_FF_Sorted. As shown in Lines 14 to 16, for each flip-flop f_j in list All_FF_Sorted, the corresponding list G_j is generated which contain the on-path logic gates for the critical or near-critical paths terminating at that particular flip-flop. Finally, in Line 17 the complete database P_2 that contains the pairs of capture flip-flops and corresponding gates are created.
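A compact Python sketch of the steps of Algorithm 2 is given below. It is not the in-house tool: the `ExcitedPath` record and the `build_gate_database` function are hypothetical, and the sketch assumes that the netlist, fault list and timing reports have already been parsed into one list of excited paths per test pattern. When a flip-flop terminates several qualifying paths, the smallest slack is used for sorting, which is an assumption.

```python
from collections import namedtuple

# Hypothetical record for one path excited by a test pattern.
ExcitedPath = namedtuple("ExcitedPath", ["capture_ff", "delay_ps", "slack_ps", "gates"])


def build_gate_database(paths_per_pattern, d_cutoff_ps):
    """Return (All_FF_Sorted, P_2) following the steps of Algorithm 2."""
    worst_slack = {}   # capture flip-flop -> smallest slack seen
    gates_by_ff = {}   # capture flip-flop -> set of on-path gates
    # Lines 8-11: collect flip-flops with path delay above the cutoff.
    for paths in paths_per_pattern:
        for path in paths:
            if path.delay_ps <= d_cutoff_ps:
                continue
            ff = path.capture_ff
            worst_slack[ff] = min(path.slack_ps, worst_slack.get(ff, path.slack_ps))
            gates_by_ff.setdefault(ff, set()).update(path.gates)
    # Line 12: sort by ascending timing slack (name breaks ties).
    all_ff_sorted = sorted(worst_slack, key=lambda ff: (worst_slack[ff], ff))
    # Lines 14-17: pair each flip-flop with its on-path gates.
    p2 = [(ff, sorted(gates_by_ff[ff])) for ff in all_ff_sorted]
    return all_ff_sorted, p2


pattern_1 = [ExcitedPath("ff_a", 760, 73, ["g1", "g2"]),
             ExcitedPath("ff_b", 500, 333, ["g9"])]
pattern_2 = [ExcitedPath("ff_c", 800, 33, ["g2", "g3"]),
             ExcitedPath("ff_a", 780, 53, ["g4"])]
ffs, p2 = build_gate_database([pattern_1, pattern_2], d_cutoff_ps=700)
print(ffs)  # ['ff_c', 'ff_a']
print(p2)   # [('ff_c', ['g2', 'g3']), ('ff_a', ['g1', 'g2', 'g4'])]
```

In the example, ff_b is dropped because its only path is below the cutoff, and ff_a accumulates the gates of both of its qualifying paths.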

This list of paths in P_2 is cross-matched with the initial path list P_1; only paths that are common to both the lists are taken into the updated path database P_3. As a result the database P_3 contains only those paths that are potentially timing critical at any stage of chip lifetime as well as of high-usage rate for any workload. This P_3 database is taken to the next step for gate-overlap analysis.

Figure 4-6. Gate-overlap analysis.

4.1.3 Gate-overlap Aware LBIST Pattern Selection for In-field Application

The goal of this subsection (Stage 3 in Fig. 4-1) is to select a set of minimum number of capture flip-flops and the corresponding test patterns meeting the following two criteria, (i) a certain fraction of all the high-usage critical or near-critical paths are activated by the selected patterns, (ii) the selected critical or near-critical capture flip-flops are spread across the design layout. Fulfilling these conditions ensure that the paths activated by the selected test patterns would better represent the process variations and workload profile of the SoC. We accomplished these goals by developing a gate-overlap aware optimum pattern selection algorithm. The concept of gate-overlapping can be explained with the Fig. 4-6. For the two paths in Fig. 4-6, three gates are common in both the paths. If Path

Algorithm 3 Gate-overlap Aware Optimum Test Pattern Set Selection
 1: procedure Gate-overlap Aware Optimum Capture Flip-flop Set and Pattern List Generation
 2:   Input: Target coverage for critical or near-critical flip-flops, Target_coverage
 3:   Input: Sorted list of test patterns, {Pattern_1, Pattern_2, .., Pattern_NP}
 4:   Input: Gate-overlap threshold, OV
 5:   Input: List of on-path logic gates for each path from path database P_3
 6:   Input: Sorted list of all the critical or near-critical capture flip-flops, All_FF_Sorted
 7:   Output: List of selected critical or near-critical capture flip-flops, Selected_FF
 8:   Output: List of selected test patterns, Selected_Patterns
 9:   Initialization: OL = OV
10:   Initialization: Total_FF = 1
11:   Initialization: Temp = {G_1}
12:   Initialization: Selected_FF = {f_1}
13:   while Total_FF

initialized with the most critical or near-critical flip-flop f_1. Next, as shown in Lines 13 to 19, a capture flip-flop is taken from the list All_FF_Sorted, and the percentage logic gate-overlap between this flip-flop and the already selected flip-flops is computed. If this overlap value is less than the initialized gate-overlap threshold, the flip-flop is included in the Selected_FF list. If this iteration fails to achieve the target coverage for critical or near-critical capture flip-flops for the user given gate-overlap threshold, the threshold is increased, and the above gate-overlap aware capture flip-flop selection procedure is repeated as stated in Lines 20 to 26 in Algorithm 3. After getting the optimum list of capture flip-flops Selected_FF, the corresponding test patterns are stored in the list Selected_Patterns as mentioned in Line 27. The corresponding LBIST seeds for the selected test patterns are readily available from the results of Section 4.1.2.2. Finally, these seeds are stored in an on-chip memory and utilized by the LFSR during in-field LBIST run.

4.1.4 Response Storage Flip-flop Insertion

Figure 4-7. Response storage flip-flops connected with capture flip-flops.

From execution of Algorithm 3 in previous subsection, the set of optimum test patterns and the list of corresponding path-ending capture flip-flops are readily available. MISR, the typical BIST hardware for signature generation, works by compaction where aliasing might occur and information is lost. To overcome this issue, we have inserted

additional storage flip-flops with each of the selected critical or near-critical capture flip-flops (in the list Selected_FF in subsection 4.1.2.3) activated by the test patterns (Stage 4 in Fig. 4-1). To keep the storage flip-flop data uniform for both rising and falling transitions, the inverted response is recorded by the storage flip-flop in Fig. 4-7 for falling transitions. We select unit size flip-flops from the standard cell library such that minimum amount of load capacitance is introduced to the existing netlist as shown in Fig. 4-7. All these storage flip-flops are connected in a scan chain, and the data is made available to the machine learning software. After inserting the observation flip-flops inside the original gate-level netlist, the final netlist is taken to the physical design tool.

4.2 Countermeasures Against Aging Degradation

Three possible remedies against aging that are applicable with BIST-RM are discussed in this section.

4.2.1 Clock Frequency Adjustment

After a chip has experienced sufficient aging effect, the critical path delays will deteriorate, and the chip will fail to correctly function at its time zero clock frequency. One possible way to cloak this degradation is to add a fixed timing guard-band, considering the cumulative aging factor at the projected end of chip lifespan, with the critical path delays at design time. A fixed timing guard-band is pessimistic, since it is more than the amount required at the initial and middle stages of chip lifecycle. An alternative approach is to avoid adding this pessimistic guard-band and set the clock frequency based only on the critical path delays. In BIST-RM, the predicted aging state information can be used to adapt the PLL clock frequency. Although adjusting the clock frequency marginally with aging is simple and straightforward, a chip that gradually slows down may or may not be acceptable to the customer depending on desired functionality.

Figure 4-8. Adaptive voltage scaling to counteract aging degradation.

Figure 4-9. Look Up Table (LUT) for voltage selection for ABB/AVS.

4.2.2 Adaptive Voltage Scaling (AVS)

The transistor threshold voltage increases because of aging, and consequently the propagation delay of logic gates increases. The propagation delay of a logic gate can be restored to its time 0 value by enhancing the supply voltage marginally. In conventional

technique, a fixed voltage guard-band V_GB is added at design time to the nominal supply voltage to make sure the chip functional clock frequency does not degrade below the target frequency at any stage of chip lifetime as shown in Fig. 4-8 with the straight line. A fixed voltage guard-band V_GB introduces two challenges: (i) the fixed value of V_GB is selected on worst-case scenario according to the margin required at the end of chip lifetime. But during the initial and middle stages of the chip lifecycle the fixed V_GB would exceed the required voltage causing extra stress on the device, further deteriorating the aging effect. (ii) Moreover, a fixed guard-band voltage will dissipate unnecessary power at initial and middle stages of chip lifecycle.

Figure 4-10. Use of BIST-RM with AVS technique.

The Adaptive Voltage Scaling (AVS) with regards to aging works by gradually increasing the supply voltage over the chip lifespan to offset the path delay degradation. Aging degradation during the time interval between consecutive adaptation steps is determined and the supply voltage is updated accordingly to meet the target clock frequency considering the accumulated aging. In [36] Tunable Replica Circuits (TRC) are programmed to emulate the critical paths, and the aging status monitored by the TRC are used in controlling the voltage regulator module to adjust the supply voltage. In BIST-RM scheme, we use the delays of actual circuit paths (unlike the critical path replicas in TRC) extracted at runtime using LBIST hardware. The path delay information is used to accurately identify the aging state of a chip from a trained predictor running in the software. The acquired aging state information is fed to the AVS controller


with Look-Up Table (LUT). In Fig. 4-9 a pre-programmed Fixed LUT (F-LUT) stores the appropriate supply voltage necessary for each aging state. Based on the prediction result from software, a Programmable LUT (P-LUT) is configured that holds the voltage value necessary for different processor cores based on their respective aging state. The hardware architecture for AVS is shown in Fig. 4-10, where the voltage regulator module adjusts the required supply voltage based on the aging status of the chip. The aging degradation occurs at a faster rate during the initial stages of chip lifetime, and hence the AVS adjustment rate is higher at that stage, as shown in Fig. 4-8. The BIST-RM with AVS approach effectively solves the drawbacks associated with a one-time fixed guard-band voltage as discussed earlier.

4.2.3 Adaptive Body Biasing (ABB)

The Adaptive Body Biasing (ABB) technique is being utilized in the industry [112], primarily to recover the performance loss from process variation effects. The absolute value of threshold voltage decreases in forward biasing of the body terminal of a MOSFET device. ABB works as an effective knob to decrease the propagation delay of a circuit at the expense of increased leakage current. In BIST-RM, the ABB technique can be applied to recover from the path delay deterioration caused by aging effects. As shown in Fig. 4-11, the aging status results from BIST-RM can be fed to an adaptive body bias controller and voltage generator module. The controller interprets the results from BIST-RM with the aid of pre-programmed look-up tables similar to Fig. 4-9, and supplies the necessary configuration bits to the variable voltage generator. The variable voltage generator can be implemented using a voltage divider or DC-DC converter [112].

4.3 Software Level Predictor

In this section we develop the framework of the in-field predictor to be executed in software. The software can be running in a separate idle core (other than the core where BIST-RM will be applying the test patterns) of a multi-core SoC, or in a dedicated


microcontroller unit. The software collects the responses from LBIST hardware, processes the data, and feeds the results to adaptive controller hardware.

Figure 4-11. Use of BIST-RM with ABB technique.

4.3.1 Impact of Process Variations on Predictor Selection

As discussed elaborately in Section 4.1, the paths to be monitored by LBIST hardware are selected at design time based on timing analysis. It is well known that the delay of any given path varies significantly between its simulated value at the pre-silicon stage and the actual value after fabrication [2]. Moreover, for any specific path, different samples of the same chip exhibit dissimilar values because of process variations [2]. As a result, a simple linear regression based predictor - based on data from a single chip sample - to map the BIST responses to aging states would be inaccurate. Machine learning based non-linear predictors are the best option in this scenario.

4.3.2 Feature Selection for Machine Learning Predictor

In the machine learning context, feature selection refers to the selection of some key attributes from the dataset that accurately capture the trend of the data. The accuracy of the trained predictor is largely dictated by the quality of the feature set. Feature selection is based on the basic principle that the selected feature subset should contain features that are highly correlated with the target value, yet uncorrelated with each other [94]. Hence, the objective of feature selection criteria is to identify and remove irrelevant, unnecessary and redundant attributes from data that do not contribute to the accuracy of a predictive model or may in fact decrease the accuracy of the model. A good feature set


contains features that spread across the data space and present a better understanding of the underlying process that generated the data [94]. For our machine learning predictor, we select the delays of LBIST-excitable high-usage critical or near-critical paths as features. Thus, we ensure that these features would strongly track the general run-time aging degradation profile of the chip. Moreover, the gate-overlap aware algorithm in Section 4.1.2.3 ensures that the feature paths are spread across the whole layout, and thus best represent the trend of aging degradation on the design.

4.3.3 Machine Learning Framework

Figure 4-12. Aging effect on the delay distribution of paths excited by the LBIST hardware.


The selected patterns - to be executed by the LBIST hardware - excite a set of critical or near-critical paths for transition detection along with a few other non-critical paths. The distribution of propagation delays of these monitored paths from Monte-Carlo process variation simulations of 100 chip samples is shown in Fig. 4-12 for a benchmark circuit [118]. The different aging states - $time_0$ to $time_4$ - in Fig. 4-12 correspond to the aging stages described in Section 4.1 (Fig. 4-2). The path delays degrade by 3% between successive states (i.e., $time_k$ to $time_{k+1}$) in Fig. 4-12. For each aging state, based on the observed distribution of its critical or near-critical path delays, a test clock cycle $T_k$ is chosen ensuring that $T_k > t_{su} + \text{max-path-delay}(time_k)$, where $t_{su}$ is the flip-flop set-up time. As depicted by the vertical lines in Fig. 4-12, the chosen test clocks are $\{T_0, T_1, T_2, T_3, T_4\}$ for states $\{time_0, time_1, time_2, time_3, time_4\}$, respectively, and $T_0$

Table 4-2. Interpretation of the response codes

Response code | Characteristics of the response code | Relative comparison of number of 'ones'
A | All bits are 'one' | $ones_A > ones_{\text{other codes}}$
B | Almost all bits are 'one' | $ones_B \le ones_A$
C | Most bits are 'one' | $ones_C < ones_B$

… $> t_{su} + \text{max-path-delay}(time_k)$ plus a positive margin. For example, as evident from Fig. 4-12, test clock cycles $T_1, T_2, T_3, T_4$ are always larger than the maximum possible path delay for $time_0$ at all process variation scenarios, and hence the corresponding responses resulting from the LBIST pattern executions at these test clocks are of code A. Since the clock cycle lengths in these cases are always larger than the path delays, the response storage flip-flops will always record 'one', and as a result all bits in code A are 'one'.

The response code B results when the chip is in $time_k$ and it is tested at test clock cycle $T_k$. Since the $T_k$ clock cycle is chosen for $time_k$ based on the distribution of the critical or near-critical paths observed from a finite number of chip samples, it is expected that almost all path delays in $time_k$ would be smaller than $T_k$. But because of process variations and the estimation of $T_k$ from a finite number of observation samples, a few path delays may exceed $T_k$ in some other chip samples. As a result, some storage flip-flops may record 'zero' when the rising transition LBIST patterns are executed at test clock $T_k$ for a chip in $time_k$, hence a few bits in code B might be 'zero'. This will cause the number of 'ones' in code B to be less than or equal to the number of 'ones' in code A, as depicted in column 3 of row 3 in Table 4-2.

The response code C results when $time_k$ is tested at clock cycle length $T_{k-1}$. For this scenario, as graphically shown in Fig. 4-12, the clock cycle length $T_{k-1}$ would be larger than most of the path delays in $time_k$. Hence, most response storage flip-flops will record 'one' and a few will record 'zero' when the LBIST patterns are executed at this clock cycle. The number of 'ones' in code C would be smaller than the number of


'ones' in code B and A, as reported in column 3 of row 4 in Table 4-2. Based on similar observations for response codes D and E, we find that these codes will contain both 'ones' and 'zeros'. Also, the number of 'ones' in code E will be less than that in code D, as shown in column 3 of Table 4-2. Finally, for code F, which results when $time_k$ is tested at clock $T_{k-4}$, most of the bits will be 'zero'. The reason behind most bits being 'zero' can be clearly interpreted from the location of $T_0$ relative to $time_4$ in Fig. 4-12.

From analysis of response codes for aging state and test clock pairs in Table 4-1, and the relative number of 'ones' in each code from the last column of Table 4-2, two general patterns can be identified for any aging degradation state $time_k$: (i) the relative differences in the number of 'ones' between the responses at adjacent test clocks follow a distinctive pattern; (ii) the position of the first test clock cycle from the ordered set $\{T_0, T_1, T_2, T_3, T_4\}$ at which the number of 'ones' in the response code reaches its maximum has strong correlation with the aging state's ID $k$. Using these general observations we formulate the features - sum of 'ones' - for our machine learning framework.

The required data for the machine learning classifier is represented in a stimulus-response matrix form. Column $k$ of matrix $X_{Tr}$ holds the sum of 'ones' in the response codes obtained for test clock $T_k$ in $\{T_0, T_1, T_2, T_3, T_4\}$. The aging states corresponding to the responses in each row of $X_{Tr}$ are stored in the equivalent row of the matrix $y_{Tr}$.

$$X_{Tr} = \begin{bmatrix} S_A & S_A & S_A & S_A & S_B \\ S_A & S_A & S_A & S_B & S_C \\ S_A & S_A & S_B & S_C & S_D \\ S_A & S_B & S_C & S_D & S_E \\ S_B & S_C & S_D & S_E & S_F \end{bmatrix}, \qquad y_{Tr} = \begin{bmatrix} time_0 \\ time_1 \\ time_2 \\ time_3 \\ time_4 \end{bmatrix}$$

In this paper we have used different machine learning classifiers - Multinomial Logistic Regression (MLR) [88], Bootstrap aggregation (Bagging) [92], Random Forest [93], Error Correcting Output Code Support Vector Machine (ECOC SVM) [90], AdaBoost [95] - and


compared their prediction accuracy. Brief descriptions of each of these classification algorithms are given in the APPENDIX section.

Steps in data generation for training of the predictor:

Step 1: A sufficient number of test chip samples, i.e., 100 samples, are subjected to gradual accelerated aging in a controlled environment such that the chips pass through the five different aging stages $time_0$, $time_1$, $time_2$, $time_3$ and $time_4$.

Step 2: For each training chip sample, at each of the 5 aging stages, the rising transition detection LBIST patterns are applied at test clocks $\{T_0, T_1, T_2, T_3, T_4\}$, and the responses are collected. The responses are from the set of codes $\{A, B, C, D, E, F\}$ as explained earlier.

Step 3: The responses obtained from LBIST pattern executions are passed to the software. In software, the response codes are converted to the equivalent sum format $\{S_A, S_B, S_C, S_D, S_E, S_F\}$ to prepare the training data matrix $X_{Tr}$ and the corresponding aging time states vector $y_{Tr}$. For 100 test chip samples with each having 5 aging stages, the training data $\{X_{Tr}, y_{Tr}\}$ will contain 500 different training cases. Next, the training procedure of the machine learning algorithm is executed to complete the training phase.

Steps in data generation for prediction:

Step 1: During in-field operation, the rising transition LBIST patterns are executed with clock sweeping from the PLL at test clocks $\{T_0, T_1, T_2, T_3, T_4\}$. The corresponding responses for each of the clocks are passed to the software.

Step 2: In software, the trained machine learning classifier executes the prediction phase, and accurately predicts the aging state the chip is in.

4.4 Simulation Results and Analysis

To demonstrate the effectiveness of BIST-RM in aging prediction, we selected two benchmark circuits - a processor circuit benchmark B15 from the ITC99 benchmark suite [116], and the OpenRISC 1200 (OR1200) RISC CPU core from OpenCores [118]. The benchmark circuits were synthesized with a 28 nm standard cell library from Synopsys


[119]. The design statistics are reported in Table 4-3.

Table 4-3. Results from benchmark circuits at 28 nm standard cell library

Design property | OR1200 | B15
Number of logic gates | 13815 | 4283
Layout area (µm²) | 80829 | 18496
Number of flip-flops | 2998 | 417
Number of near-critical/critical flip-flops | 161 | 68
Target coverage of near-critical/critical flip-flops | 50% | 50%
Gate-overlap threshold | 10% | 5%
Number of LBIST test patterns | 15 | 6
Implementation area overhead | 0.97% | 1.79%

For both the benchmarks, from a detailed timing analysis with the PrimeTime [119] STA tool, all potentially critical or near-critical paths (using the 12.5% margin discussed in Section 4.1.1) were identified. Next, the cumulative toggle rate of each of these potentially critical or near-critical paths (as described in Section 4.1.1) was calculated using Synopsys VCS and Power Compiler tools [119] and custom scripts to identify if it was a high-usage path. Next, the Small Delay Defect (SDD) launch-on-capture transition delay fault pattern generation flow was invoked with the ATPG tool TetraMAX [119]. The cut-off slack was chosen such that all the potentially critical or near-critical paths were targeted by the SDD flow. The SDD flow aimed to generate test patterns in the order of the longest paths. When a single SDD pattern was generated targeting the most critical path, the pattern automatically excited some other critical or near-critical paths along with the path it was aimed for. The individual SDD patterns and the fault lists were written out by the ATPG tool for the two benchmark circuits. The SDD patterns were sorted in order of the number of critical or near-critical flip-flops activated, largest first. The sorted SDD patterns were later tested for availability of equivalent seeds for the LFSR of the LBIST hardware using the procedure described in Section 4.1.2.2. Those SDD patterns for which no LFSR seeds were reported by the algorithm within the set iteration time limit were discarded [109]. In the next step we utilized our in-house tool, developed in C++, to extract the detailed information - the on-path standard cell logic gates, launch and capture flip-flops


- of each of the paths excited by the SDD patterns. We aimed to select an optimum number of SDD patterns such that 50% of the high-usage and potentially critical or near-critical flip-flops would be activated by the selected patterns. The gate-overlap aware pattern selection algorithm described in Section 4.1.2.3 was utilized to identify the set of optimum patterns. When selecting 50% of the high-usage potentially critical or near-critical flip-flops, the gate-overlap aware algorithm ensured that the selected critical or near-critical capture flip-flops were spread across the chip layout.

Table 4-4. Detection resolution (percent increase in path delay) at different aging stages

Case | $time_0$ | $time_1$ | $time_2$ | $time_3$ | $time_4$
Case 1 | 0 | 3 | 6 | 9 | 12
Case 2 | 0 | 3 | 6.5 | 10 | 13.5

After finalizing the test patterns and the selected critical or near-critical flip-flops to be monitored by LBIST hardware, result storage scan flip-flops - as described in Section 4.1.4 - were attached to the respective capture flip-flops using automated Perl scripts, and finally the layouts were completed with IC Compiler [119]. The complete layouts of the two benchmark circuits are given in Fig. 4-13. The yellow dots on the layout mark the locations of the selected capture flip-flops, which clearly indicate that the gate-overlap aware algorithm ensured the monitored critical or near-critical paths are spread across the layout. The targeted 50% flip-flop coverage was achieved with 10% and 5% gate-overlap thresholds for B15 and OR1200 benchmarks, respectively, as shown in row 7 of Table 4-3. 15 and 6 SDD patterns with their corresponding LFSR seeds were selected for OR1200 and B15 benchmarks, respectively, as demonstrated in row 8 of Table 4-3. The area overhead from inserting response storage flip-flops was 0.97% and 1.79% of the main designs for benchmarks OR1200 and B15, respectively. The area overheads result from the chosen 50% margin for critical or near-critical flip-flop selection. The area overhead can be further reduced by decreasing the number of critical or near-critical flip-flops to be monitored, if required. When the BIST-RM is activated, the LBIST test time or clock cycle overhead depends on the number of test patterns applied, number of


Figure 4-13. Layouts of benchmarks. Yellow dots show the locations of the monitored critical or near-critical capture flip-flops. (a) OR1200; (b) B15.

Figure 4-14. Prediction with ECOC SVM classifier results for 3% aging grid resolution. (a) to (e) B15 benchmark; (f) to (j) OR1200 benchmark.

applied test clocks and the number of monitored flip-flops. Mathematically, $t_{overhead} \propto$ (number of test patterns) $\times$ (number of test clocks) $\times$ (number of monitored flip-flops). The


test time overhead is insignificant with respect to the typical runtime as well as the lifetime of the chip, since the BIST-RM technique is activated only when required.

Figure 4-15. Prediction accuracy and error range for different classifiers at 3% aging grid resolution. (a) B15 benchmark; (b) OR1200 benchmark.

After getting the design ready by completing the above-mentioned procedures, the next step was to identify the operating clock frequency or $F_{MAX}$ for each of the silicon chip samples. In simulation, to mimic the behavior of different silicon samples of the same chip, we performed Monte-Carlo process variation simulations. For this purpose we extracted the post-layout transistor-level SPICE netlist of the top 10% critical or near-critical paths for the two benchmarks using Synopsys tools [119]. The chip $F_{MAX}$ will be determined by any of these extracted paths depending on the chip-to-chip process variations. We varied transistor length ($L$), width ($W$) and threshold voltage ($V_{th}$) by 10%, 10% and 3%, respectively, all within $3\sigma$ of the mean or typical value, to generate 200 chip samples. Of these 200 samples, 100 samples were for training a machine learning


predictor, and the rest were for validation purposes. For each of the 200 samples, the maximum path delay was obtained from these top 10% extracted paths by transistor-level HSPICE [119] simulations. For each sample, using the obtained maximum path delay, the chip operating clock frequency or $F_{MAX_0}$ at $time_0$ was estimated without adding any aging guard-band. Next, the 200 chip samples were aged successively at four time stages - $\{time_1, time_2, time_3, time_4\}$. Aging simulations considering both Bias Temperature Instability (BTI) and Hot Carrier Injection (HCI) were performed using HSPICE MOSRA [119]. Two cases of aging resolutions - 3% and 3.5% - were considered, as reported in Table 4-4. In our aging simulations we assume the worst-case scenario where all the critical or near-critical paths are aged at the same worst-case rate as a function of the aggregated time the chip has been powered on since time zero.

The data generation procedures described in the rest of this paragraph were done separately for the two cases of aging resolutions in Table 4-4. The same method of finding $F_{MAX_0}$ at $time_0$ - i.e., obtaining the maximum path delay from the top 10% critical paths - is repeated at times $\{time_1, time_2, time_3, time_4\}$ for the 200 chip samples. From the distribution of $F_{MAX_k}$ at $time_k$ over all the chip samples, a test clock cycle $T_k$ is estimated, where $k = 0$ to $4$. Fig. 4-12 in Section 4.3 demonstrated the relationship between test clocks $T_k$ and aging states $time_k$. Next, to find the behavior of the selected critical or near-critical flip-flops at each of the five aging instances, the LBIST patterns were applied at test clocks $\{T_0, T_1, T_2, T_3, T_4\}$ at each aging state $time_k$ for the 200 chip samples. The responses, $R_k = \{r_0, r_1, r_2, r_3, r_4\}$, corresponding to the test clocks were collected in the storage flip-flops, where $k = 0$ to $4$ and $r_0, \ldots, r_4 \in \{S_A, S_B, S_C, S_D, S_E, S_F\}$. In this way, the behavior of the machine learning tool's features with aging time states was obtained. For 200 chip samples, with each subjected to 5 gradual stages of aging, 1000 sets of the pair $\{features_k, time_k\}$ were generated. The 500 $\{features_k, time_k\}$ pairs obtained from the first 100 chip samples were used to train Bagging, Random Forest, ECOC SVM, AdaBoost.M2 and MLR classifiers from


MATLAB Machine Learning Toolbox [120]. After completing the training step, the remaining 100 samples were fed to the trained predictor, and the obtained results were used to validate the accuracy of the trained predictor. The validation results for 3% aging resolution (Case 1 in Table 4-4) and the ECOC SVM classifier are shown in Fig. 4-14 (a) to (e) for the B15 benchmark at aging stages $time_0$ to $time_4$. In the figures, the X-axis represents the sample ID and the Y-axis shows the extent of misprediction, i.e., 0 refers to no misprediction, 1 indicates $time_k$ is mispredicted to be at $time_{k+1}$, and -1 implies $time_k$ is mispredicted to be at $time_{k-1}$. For example, in Fig. 4-14(a), for all the 100 samples the predicted time state matched the actual time state, which was $time_0$. In Fig. 4-14(b), the actual aging state was $time_1$. Out of 100 samples, samples with ID numbers 19, 40, 51, 60, 81 and 88 were mispredicted to be of $time_0$. The prediction accuracy for this case is 94%. Similarly, in Fig. 4-14(c) and (d), a total of 3 and 6 samples were mispredicted, causing 98% and 94% accuracy, respectively. The prediction results for the OR1200 benchmark are shown in Fig. 4-14 (f) to (j). For the OR1200 benchmark, the worst-case prediction accuracy of 95% occurred for $time_3$, as shown in Fig. 4-14(i). The worst-case prediction accuracy improves to 97% for both the benchmarks for an aging resolution of 3.5% (Case 2 in Table 4-4). A trade-off exists between aging prediction resolution and accuracy.

Figure 4-16. Prediction accuracy with varying sizes of training samples for different classifiers at 3% aging grid resolution. (a) B15 benchmark; (b) OR1200 benchmark.
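The bookkeeping behind these validation plots can be sketched in a few lines. The example below is an illustration only: the function names are hypothetical, and it assumes sample IDs run from 1 to 100. It encodes each sample's misprediction as the predicted state index minus the actual one, and reproduces the $time_1$ case in which six samples were predicted one state too low.

```python
def prediction_accuracy(errors):
    """Percentage of samples whose predicted state equals the actual state.

    errors[i] is (predicted state index - actual state index) for sample i,
    so 0 means a correct prediction, +1 one state too high, -1 one too low.
    """
    correct = sum(1 for e in errors if e == 0)
    return 100.0 * correct / len(errors)


def max_misprediction(errors):
    """Largest distance, in aging states, between prediction and truth."""
    return max(abs(e) for e in errors)


# Actual state time_1; these six sample IDs were predicted as time_0.
mispredicted_ids = {19, 40, 51, 60, 81, 88}
errors = [-1 if sample_id in mispredicted_ids else 0
          for sample_id in range(1, 101)]  # assumes IDs 1..100
```

Here `prediction_accuracy(errors)` evaluates to 94.0 and `max_misprediction(errors)` to 1, matching the one-level misprediction range discussed for this case.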


The worst-case prediction accuracy and misprediction range for five different machine learning classifiers are shown in Fig. 4-15 for the design benchmarks. For the B15 and OR1200 benchmarks, the ECOC SVM classifier yielded the best prediction accuracy of 94% and 95%, respectively. The Random Forest classifier also exhibited similar accuracy of 94% for the two benchmarks. The MLR algorithm resulted in the poorest accuracy. This observation can be explained by the fact that, unlike other non-linear classifiers such as ECOC SVM, Bagging, AdaBoost.M2 or Random Forest, MLR is inherently a linear classifier, and a linear classification algorithm may not be suitable for our predictor, which has to account for chip-to-chip process variations. The maximum extent of misprediction is two levels for the MLR classifier, and one level for the other algorithms. Prediction accuracy improves marginally as the number of training samples is increased, as shown in Fig. 4-16. Based on these observations, ECOC SVM or Random Forest is the most accurate predictor for our task.

Figure 4-17. Normalized execution time in software for prediction.

Since the predictor will be executed in software in-field, the runtime of the predictor is an important metric. We evaluated the execution time of the five different predictors in software (e.g., MATLAB [120]). As shown in the normalized execution time chart in Fig. 4-17, MLR is the fastest predictor, followed by ECOC SVM. Considering prediction


accuracy and execution time, ECOC SVM is the best machine learning algorithm for our task.

There are, however, two challenges in ensuring the integrity of the software-level predictor that need to be addressed. Firstly, since the maximum prediction error is limited to within one level, during in-field test the actual aging state $time_k$ might be predicted to be of state $time_{k+1}$ or $time_{k-1}$ in the worst case. Secondly, if the chip is at any aging stage that is intermediate between $time_k$ and $time_{k+1}$, there might be a quantization or rounding error in the prediction, because the predictor was trained at discrete states - $time_k$, $k = 1$ to $4$. Both of these issues can be circumvented by using a conservative countermeasure strategy in which, if the software-level prediction is aging state $time_k$, the hardware will select the countermeasure (as discussed in Section 3) corresponding to the state $time_{k+1}$. In this way, it will be ensured that appropriate adaptive measures are taken, and the SoC operates reliably within its design specifications over its lifetime.
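A minimal sketch of this conservative selection, combined with a look-up table of the kind used for AVS, is given below. The voltage values, the table name `F_LUT`, the function names, and the choice to saturate at the last trained state are all assumptions made for illustration; none of them comes from the actual BIST-RM controller.

```python
# Hypothetical supply voltage (volts) for each aging state index.
# The numbers are placeholders, not values from the design.
F_LUT = {0: 0.90, 1: 0.92, 2: 0.94, 3: 0.96, 4: 0.98}
MAX_STATE = max(F_LUT)


def conservative_state(predicted_state):
    """Return the state whose countermeasure is applied: one level above the
    prediction, saturating at the last state in the table (an assumption)."""
    if predicted_state not in F_LUT:
        raise ValueError("unknown aging state: %r" % (predicted_state,))
    return min(predicted_state + 1, MAX_STATE)


def select_supply_voltage(predicted_state):
    """Look up the supply voltage for the conservatively chosen state."""
    return F_LUT[conservative_state(predicted_state)]
```

With this policy a chip predicted to be in state 2 receives the setting for state 3, so a one-level under-prediction or an intermediate aging stage is still covered.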


CHAPTER 5
SUMMARY AND CONCLUSIONS

In this dissertation, major reliability challenges of System on Chips (SoC) fabricated with modern CMOS technologies are analyzed elaborately. On-chip structures such as embedded sensor or monitor IPs are proposed as key components that can capture important data at run-time for chip reliability and resilience analysis. The concept of reusing existing BIST hardware for chip lifetime reliability management has been put forward in this thesis. Moreover, in this dissertation software-level machine learning techniques are presented as a tool to ensure reliability of hardware from the software side.

5.1 Sensor Network Design for Power Supply Noise Profile Extraction

A design has been presented to continuously monitor the PSN fluctuations in modern SoCs and record the worst-case event per clock cycle. Our proposed special debug feature to pinpoint the critical status bits or clock cycle number at the moment of degraded performance offers the opportunity to trace the root causes of performance degradation in the field, and allows a comparison between workload activities in the field and in the production test conditions. The sensors in the network are able to provide an accurate spatial profile of the noise in the circuit, and offer a low area overhead as shown by our simulation.

5.2 Speed Binning with Sensors and Machine Learning

An innovative speed-binning flow has been proposed that utilizes machine learning and the path slacks extracted with on-chip sensors. A novel layout-aware and gate-netlist level sensor insertion algorithm places the sensors uniformly in the layout across the critical/near-critical capture flip-flops. The sensor-extracted path slacks are used as features in modeling and training machine learning software running in the ATE. For a sufficient number of training samples, the worst-case mismatch between predicted and actual $F_{MAX}$ is one bin and occurs for 1% of the predicted samples. The proposed


framework has the potential to eliminate the high cost associated with conventional functional test based speed-binning flows.

5.3 BIST Hardware and Machine Learning in Reliable SoC Design

A methodology has been demonstrated that reuses the existing LBIST hardware to predict the state of aging degradation of an SoC in-field using a software-level machine learning technique. The proposed approach offers the opportunity of timing reliability management of chips in-field. This dissertation has shown, using simulation results and analysis, that the proposed technique allows accurate and fine-grained in-field aging prediction. Moreover, the predicted results can be further utilized to activate adaptive techniques to remedy the timing degradation.

5.4 Future Work

The ideas and solutions presented in this thesis can be used as a stepping stone for future research on variability tolerance and reliability for complex SoCs at next generation CMOS technology nodes (e.g., 7 nm or 5 nm). The proposed software-level machine learning technique to manage hardware reliability from the software side at run-time will lead to a wide range of research and development activities in future. Moreover, the sensors designed in this thesis can find applications in the emerging field of hardware security [113][114].


APPENDIX
MULTICLASS MACHINE LEARNING ALGORITHMS

The different multiclass machine learning algorithms used in chapters 3 and 4 are described in this section.

A.1 Multinomial Logistic Regression

Multinomial logistic regression is a supervised machine learning technique where a linear combination of the observed features is used to predict the class label [88]. In this regression model the log-odds of the outcomes are modeled as a linear combination of the predictor variables. Contrary to linear regression, where the dependent variable or the outcome is continuous, in logistic regression it is categorical, i.e., discrete. The training set of $m$ training samples is presented in the form $\{(x_1, y_1), \ldots, (x_m, y_m)\}$. The input data sample $x_i \in \mathbb{R}^N$, where $N$ is the total number of features. For $K$ possible classes, the class labels $y_i \in \{1, 2, \ldots, K\}$. The weight parameters, corresponding to the class labels, are $\theta_1, \theta_2, \ldots, \theta_K \in \mathbb{R}^N$. By concatenating the columns $\theta_1, \theta_2, \ldots, \theta_K$, an $N$-by-$K$ matrix $\theta = [\theta_1\ \theta_2\ \ldots\ \theta_K]$ is formed [88]. During the supervised training stage, using the training data, the matrix $\theta$ is obtained for which the following cost function $J(\theta)$ is minimum. An iterative optimization algorithm is used for obtaining the solution [88].

$$J(\theta) = -\sum_{i=1}^{m} \sum_{k=1}^{K} 1\{y_i = k\} \log \frac{\exp(\theta_k^T x_i)}{\sum_{j=1}^{K} \exp(\theta_j^T x_i)}$$

During prediction, for any test input $x$, a hypothesis is required to estimate the probability $P(y = k \mid x)$ for each $k \in \{1, 2, \ldots, K\}$. This hypothesis outputs a vector of length $K$ representing the probabilities. The hypothesis $h_\theta(x)$ is [88]:

$$h_\theta(x) = \begin{bmatrix} P(y = 1 \mid x; \theta) \\ P(y = 2 \mid x; \theta) \\ \vdots \\ P(y = K \mid x; \theta) \end{bmatrix} = \frac{1}{\sum_{j=1}^{K} \exp(\theta_j^T x)} \begin{bmatrix} \exp(\theta_1^T x) \\ \exp(\theta_2^T x) \\ \vdots \\ \exp(\theta_K^T x) \end{bmatrix}$$
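The hypothesis and cost function above can be written out directly. The sketch below uses only the Python standard library and is meant as an illustration of the formulas, not as the toolbox implementation used in the experiments. The weight matrix is passed as a list of $K$ per-class weight vectors (an assumed layout), the function names are hypothetical, and subtracting the largest score before exponentiating is a standard numerical-stability step that leaves the probabilities unchanged.

```python
import math


def hypothesis(theta, x):
    """Class probabilities P(y = k | x) for k = 1..K (softmax of theta_k . x).

    theta: list of K weight vectors, one per class; x: feature vector.
    """
    scores = [sum(w * v for w, v in zip(theta_k, x)) for theta_k in theta]
    shift = max(scores)  # stabilizes exp(); cancels in the ratio
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cost(theta, samples):
    """Cost J: negative log-probability of the true class, summed over samples.

    samples: list of (x, y) pairs with class labels y in 1..K.
    """
    return -sum(math.log(hypothesis(theta, x)[y - 1]) for x, y in samples)


def predict(theta, x):
    """Class label (1..K) with the largest estimated probability."""
    probs = hypothesis(theta, x)
    return probs.index(max(probs)) + 1
```

For example, with all-zero weights every class receives probability $1/K$, so a single training sample contributes $\log K$ to the cost.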


In $h_\theta(x)$ the class label for which the probability is maximum is taken as the predicted class for the test data $x$.

A.2 Multiclass Support Vector Machine

The Support Vector Machine (SVM) classifier is inherently a binary classifier that distinguishes between two classes [89]. To apply the SVM technique to multiclass classification, the multiclass problem is first reduced to multiple binary classification problems that can be solved separately [90][91]. For a multiclass classification scenario with $K$ classes, one-against-one or all-pairs comparison would require $\binom{K}{2}$ or $K(K-1)/2$ binary classifiers. The $\binom{K}{2}$ hypotheses that are generated by this process are next combined to get the final predicted class [90]. To combine multiple binary SVMs for multiclass classification problems, the Error Correcting Output Code (ECOC) multiclass model has been proposed to achieve a very high prediction accuracy [90]. The ECOC model reduces the problem of classification with three or more classes to a set of binary SVM classifiers by using a coding matrix $M_C$, where the elements of $M_C$ are from the set $\{-1, 0, +1\}$ [91]. For the all-pairs approach and $K$ classes, the coding matrix $M_C$ has $K$ rows and $\binom{K}{2}$ columns. Each row of the coding matrix corresponds to a distinct class. The SVM of each column of $M_C$ corresponds to a distinct pair $(r_p, r_q)$, where $(r_p, r_q)$ is one of the $\binom{K}{2}$ possible pairs. In the column corresponding to the SVM for the pair $(r_p, r_q)$, $+1$ (to indicate positive class) is assigned for the row $r_p$, $-1$ (to indicate negative class) is assigned for row $r_q$, and zeros (ignore) are assigned for all other rows [91]. During the training phase, all the $\binom{K}{2}$ binary SVMs - corresponding to the columns of $M_C$ - are trained using conventional SVM methods [89]-[91]. After training is complete, for a new observation $X$ all the trained $N = \binom{K}{2}$ binary SVMs are executed. The vector of predictions by the individual SVMs is,


$f(X) = [f_1(X), \ldots, f_N(X)]$. As the final predicted class label, $\hat{y}$ is chosen such that,

$$\hat{y} = \arg\min_{r \in Y} \sum_{s=1}^{N} L\big(M_C(r, s),\, f_s(X)\big)$$

where $L$ is the loss function and $Y$ is the set of $K$ class labels [91].

A.3 Bootstrap Aggregating

Algorithm 4 Bootstrap aggregating (Bagging) machine learning classification
1: procedure TrainingPhase
2:   Input: Training dataset $Z = \{X_{Tr}, y_{Tr}\}$; matrix $X_{Tr}$ is of the form $[x_1; x_2; \ldots; x_m]^T$ where each $x_i$ is a row vector ($i = 1$ to $m$) and $y_{Tr}$ is an $m$-element column vector consisting of class labels for each row of matrix $X_{Tr}$
3:   Input: Number of classifiers to train, $N$
4:   Output: A set of trained classifiers $T = \{D_1, D_2, D_3, \ldots, D_N\}$
5:   Initialization: $T = \{\}$
6:   for $k = 1$ to number of classifiers $N$ do
7:     $S_k \leftarrow$ generated bootstrap sample (with replacement) from training dataset $Z$
8:     $D_k \leftarrow$ built and trained decision-tree classifier using $S_k$ as the training data
9:     $T = T \cup D_k$
10:   end for
11: end procedure
12:
13: procedure PredictionPhase
14:   Input: A set of trained classifiers, $T = \{D_1, D_2, D_3, \ldots, D_N\}$
15:   Input: New data for classification, $x_C$
16:   Output: Predicted class or label, $y_C$, corresponding to the input data $x_C$
17:
18:   for $k = 1$ to number of classifiers $N$ do
19:     $r_k \leftarrow$ response obtained from execution of trained classifier $D_k \in T$ on $x_C$
20:   end for
21:   $y_C \leftarrow$ Majority vote $\{r_1, r_2, \ldots, r_N\}$
22: end procedure

Bootstrap aggregating (bagging) is an ensemble meta-algorithm that improves the stability and accuracy of machine learning classifiers [92]. The essence of bagging is to fit many large decision trees to bootstrap-resampled versions of the training data, and classify the final label by majority vote. Bagging averages a given procedure over many samples, to reduce its variance. As the bagging algorithm performs averaging on bootstrap samples, the error from variance can be suppressed even for the regression or classification procedures


that are not very stable [92]. In situations for which the predictor has a relatively large variance, bagging can appreciably reduce the mean squared prediction error, provided that the learning sample is sufficiently large. The bagging machine learning flow [92] is presented in Algorithm A.3. In the training phase (lines 1 to 11), the inputs to the algorithm are the training samples along with their respective class labels and the targeted number of classifiers to train. The $m$ by $f$ matrix $X_{Tr}$ consists of $m$ training samples with $f$ features. The corresponding class labels are included in the column vector $y_{Tr}$. The output $T$ is a set of classifiers trained on the bootstrap samples. As depicted in lines 6 to 10, on each iteration a bootstrap sample $S_k$ is drawn from the training dataset, and a decision-tree classifier $D_k$ is obtained with this sample and included in the classifier ensemble $T$. In the subsequent prediction phase (lines 13 to 22), the trained ensemble $T$ and new data for classification $x_C$ are fed in, and the predicted class label $y_C$ is obtained. Each classifier $D_k$ in the ensemble $T$ reports a predicted class label $r_k$ as shown in line 19. Finally in line 21, a majority vote is taken on the individually predicted labels and the winner is the final class label $y_C$.

A.4 Random Forest

The Random Forest algorithm strives to improve the efficacy of the bootstrap aggregating technique by de-correlating the trees [93]. The Random Forest technique is similar to the bagging flow presented in Algorithm A.3, with the only exception that at each tree split a random sample of $q$ features is drawn, and only those features are considered for splitting in the decision tree. Generally $q = \sqrt{f}$, where $f$ is the total number of features. The overall effect is that the variance is reduced when we average the trees. In other words, Random Forest curbs the over-fitting of the training dataset [92][93].

A.5 Adaptive Boosting

Adaptive Boosting (AdaBoost) is a machine learning meta-algorithm [95], where the training flow starts with the execution of the first classifier supplied with equally weighted training data. Based on the accuracy of the first classifier, weights on misclassified data


are increased for the second classifier. This procedure of giving emphasis to misclassified data is repeated until all the classifiers are trained. The AdaBoost.M2 algorithm [95], presented in Algorithm 5, takes as input the set of unique class labels $Y$, the training dataset $(X_{Tr}, y_{Tr})$, a choice of a basic classifier called WeakLearn, and the number of iterations $N$, and outputs the final optimized classifier $h_{fin}$. The mislabel weight distribution function $D_1$ is initialized to a uniform distribution. Next the algorithm calls the basic classifier WeakLearn - generally a tree classifier - repeatedly in a series of iterations.

Algorithm 5 AdaBoost.M2 machine learning classification
1: procedure TrainingPhase
2:   Input: Set of unique class labels $Y$, where the number of class labels is $L$
3:   Input: Training dataset $Z = \{X_{Tr}, y_{Tr}\}$; matrix $X_{Tr}$ is of the form $[x_1; x_2; \ldots; x_m]^T$ where each $x_i$ is a row vector ($i = 1$ to $m$) and $y_{Tr}$ consists of class labels for each row of matrix $X_{Tr}$; $y_{Tr}$ is of the form $[y_1\ y_2\ \ldots\ y_m]^T$ where $y_i \in Y$ ($i = 1$ to $m$)
4:   Input: Weak learning algorithm WeakLearn
5:   Input: Number of iterations or boosting rounds, $N$
6:   Output: The final classifier $h_{fin}$
7:   Let $B = \{(i, y) : i \in \{1, \ldots, m\},\ y \neq y_i\}$, so the length of $B$ is $|B| = m(L - 1)$
8:   Initialization: $D_1(i, y) = \frac{1}{|B|} = \frac{1}{m(L-1)}$ for $(i, y) \in B$
9:   for $k = 1$ to number of iterations $N$ do
10:     $h_k \leftarrow$ hypothesis returned by WeakLearn called with mislabel distribution $D_k$
11:     $\epsilon_k = \frac{1}{2} \sum_{(i,y) \in B} D_k(i, y)\,[1 - h_k(x_i, y_i) + h_k(x_i, y)]$ = pseudo-loss of $h_k$
12:     $\beta_k = \epsilon_k / (1 - \epsilon_k)$
13:     $D_{k+1}(i, y) = \frac{D_k(i, y)}{Z_k}\, \beta_k^{\,0.5\,[1 + h_k(x_i, y_i) - h_k(x_i, y)]}$; $Z_k$ is a normalization constant
14:   end for
15:   $h_{fin}(x) = \arg\max_{y \in Y} \sum_{k=1}^{N} \log\!\left(\frac{1}{\beta_k}\right) h_k(x, y)$
16: end procedure
17:
18: procedure PredictionPhase
19:   Input: The final classifier $h_{fin}$
20:   Input: New data for classification, $x_C$
21:   Output: Predicted class or label, $y_C$, corresponding to the input data $x_C$
22:   $y_C = h_{fin}(x_C) = \arg\max_{y \in Y} \sum_{k=1}^{N} \log\!\left(\frac{1}{\beta_k}\right) h_k(x_C, y)$
23: end procedure
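Lines 7 and 8 of Algorithm 5 (building the set of mislabel pairs and the uniform initial distribution over it) can be illustrated with a short sketch. The function names are hypothetical, and sample indices are 0-based here, whereas the algorithm listing counts from 1.

```python
def mislabel_pairs(labels, classes):
    """All (i, y) pairs where y is an incorrect label for sample i."""
    return [(i, y) for i, y_i in enumerate(labels) for y in classes if y != y_i]


def initial_distribution(labels, classes):
    """Uniform mislabel distribution D_1 over the mislabel pairs."""
    pairs = mislabel_pairs(labels, classes)
    weight = 1.0 / len(pairs)
    return {pair: weight for pair in pairs}
```

For four samples and three class labels, `mislabel_pairs` returns $4 \times (3 - 1) = 8$ pairs, so each receives an initial weight of $1/8$.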

PAGE 118

On iteration $k$, the booster provides WeakLearn with a mislabel distribution $D_k$ over the training dataset. As shown in line 10, the response of WeakLearn is a classifier or hypothesis $h_k$, which is able to correctly classify a portion of the training set that has higher probability with respect to the distribution $D_k$. The weak learner's goal is to find a hypothesis which minimizes the training error $\epsilon_k$ (line 11) with respect to the distribution $D_k$ that was provided to the weak learner. To indicate the degree of credibility, each weak hypothesis $h_k$ outputs a vector with values in the range of 0 to 1 corresponding to the class labels, where labels with values close to 0 or 1 are considered to be implausible or plausible, respectively. In line 11 of Algorithm 5, if $h_k(x_i, y_i) = 1$ and $h_k(x_i, y) = 0$, then $h_k$ has predicted that $x_i$'s class label is $y_i$, not $y$, which is the correct prediction. On the other hand, if $h_k(x_i, y_i) = 0$ and $h_k(x_i, y) = 1$, then $h_k$ has incorrectly made the opposite prediction. The case $h_k(x_i, y_i) = h_k(x_i, y)$ implies a random guess [95]. The quality of a trained hypothesis $h_k$ is judged with the pseudo-loss factor $\epsilon_k$. Pseudo-loss is used to instruct the algorithm to concentrate on the labels that are hardest to differentiate. Pseudo-loss $\epsilon_k$ is computed with respect to a distribution $D_k$ over the set of all pairs of examples and incorrect labels, as shown in line 11. In line 13, the mislabel weight distribution for the next iteration is updated according to the training error, and this enables the boosting algorithm to force WeakLearn to focus not only on harder-to-classify samples but, more particularly, on the incorrect labels that are hardest to differentiate. Finally, after $N$ iterations, the booster combines the weak hypotheses $h_1, h_2, \ldots, h_N$ into a single final hypothesis $h_{fin}$ as reported in line 15. Later, in the prediction stage, new test data $x_C$ is supplied to the trained classifier $h_{fin}$ to get the predicted class label $y_C$, as shown in line 22 of Algorithm 5.
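A single boosting round (lines 11 to 13) and the final vote (line 15) can be sketched in plain Python as follows. This is a sketch under stated assumptions rather than the implementation used in this work: labels are integers 0 to L-1, the distribution $D_k$ and the weak hypothesis outputs $h_k(x_i, y)$ are stored as m-by-L nested lists, and the plausibility values in H_k are made up for illustration instead of coming from a trained WeakLearn. The function names are hypothetical.

```python
import math


def pseudo_loss(D, H, y_tr):
    """Line 11: 0.5 * sum over (i, y) in B of D[i][y] * (1 - H[i][y_i] + H[i][y])."""
    total = 0.0
    for i, y_i in enumerate(y_tr):
        for y in range(len(H[i])):
            if y != y_i:  # only mislabel pairs belong to B
                total += D[i][y] * (1.0 - H[i][y_i] + H[i][y])
    return 0.5 * total


def update_distribution(D, H, y_tr, beta):
    """Line 13: scale each weight by beta ** (0.5 * (1 + h(x_i, y_i) - h(x_i, y)))."""
    new_D = [[0.0] * len(row) for row in D]
    for i, y_i in enumerate(y_tr):
        for y in range(len(H[i])):
            if y != y_i:
                exponent = 0.5 * (1.0 + H[i][y_i] - H[i][y])
                new_D[i][y] = D[i][y] * beta ** exponent
    z = sum(sum(row) for row in new_D)  # normalization constant Z_k
    return [[w / z for w in row] for row in new_D]


def final_hypothesis(H_rounds, betas, i):
    """Line 15 for sample i: argmax over y of sum_k log(1 / beta_k) * h_k(x_i, y)."""
    num_labels = len(H_rounds[0][i])
    scores = [
        sum(math.log(1.0 / b) * H[i][y] for H, b in zip(H_rounds, betas))
        for y in range(num_labels)
    ]
    return max(range(num_labels), key=scores.__getitem__)


# Toy round: two samples, three classes, true labels 0 and 1.
y_tr = [0, 1]
D_k = [[0.0, 0.25, 0.25], [0.25, 0.0, 0.25]]  # uniform over the 4 mislabel pairs
H_k = [[0.8, 0.1, 0.1], [0.3, 0.6, 0.1]]      # made-up plausibilities h_k(x_i, y)

eps_k = pseudo_loss(D_k, H_k, y_tr)
beta_k = eps_k / (1.0 - eps_k)                 # line 12
D_next = update_distribution(D_k, H_k, y_tr, beta_k)
```

In this toy round the pair (sample 1, label 0) receives the largest weight in D_next, because label 0 is the incorrect label that the hypothesis separated least well from the true label. A practical implementation would also need to guard against a pseudo-loss of exactly 0, which makes $\beta_k = 0$ and the logarithm in line 15 undefined.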

PAGE 119

REFERENCES

[1] M. B. Taylor, "A Landscape of the New Dark Silicon Design Regime," IEEE Micro, vol. 33, pp. 8-19, 2013.
[2] S. S. Sapatnekar, "Overcoming variations in nanometer-scale technologies," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 1, pp. 5-18, 2011.
[3] M. Tehranipoor and K. M. Butler, "Power Supply Noise: A Survey on Effects and Research," Design & Test of Computers, IEEE, vol. 27, pp. 51-67, 2010.
[4] S. Ghosh and K. Roy, "Parameter variation tolerance and error resiliency: New design paradigm for the nanoscale era," Proc. IEEE, vol. 98, pp. 1718-1751, 2010.
[5] P. Das and S. K. Gupta, "Extending pre-silicon delay models for post-silicon tasks: Validation, diagnosis, delay testing, and speed binning," in Proceedings of 31st IEEE VLSI Test Symposium (VTS), pp. 1-6, 2013.
[6] J. Zeng, R. Guo, W. Cheng, M. Mateja and J. Wang, "Scan-based Speed-path Debug for a Microprocessor," IEEE Design & Test of Computers, 2011.
[7] E. J. Jang, A. Gattiker, S. Nassif and J. A. Abraham, "Efficient and product-representative timing model validation," in Proceedings of 29th IEEE VLSI Test Symposium (VTS), pp. 90-95, 2011.
[8] J. Zeng, J. Wang, C. Chen, M. Mateja and L. Wang, "On evaluating speed path detection of structural tests," in Proceedings of 11th International Symposium on Quality Electronic Design (ISQED), pp. 570-576, 2010.
[9] Joonho Kong, Sung Woo Chung, and Kevin Skadron, "Recent thermal management techniques for microprocessors," ACM Comput. Surv. 44, 3, Article 13 (June 2012).
[10] W. Lee, Y. Wang, T. Cui, S. Nazarian and M. Pedram, "Dynamic thermal management for FinFET-based circuits exploiting the temperature effect inversion phenomenon," in 2014 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), pp. 105-110, 2014.
[11] E. J. Fluhr et al., "The 12-Core POWER8™ Processor With 7.6 Tb/s IO Bandwidth, Integrated Voltage Regulation, and Resonant Clocking," Solid-State Circuits, IEEE Journal of, vol. 50, pp. 10-23, 2015.
[12] M. Saint-Laurent and M. Swaminathan, "Impact of power-supply noise on timing in high-frequency microprocessors," Advanced Packaging, IEEE Transactions on, vol. 27, pp. 135-144, 2004.

PAGE 120

[13] Hu Xu, V. F. Pavlidis, W. Burleson and G. De Micheli, "The combined effect of process variations and power supply noise on clock skew and jitter," in Quality Electronic Design (ISQED), 2012 13th International Symposium on, pp. 320-327, 2012.
[14] S. Novak et al., "Transistor aging and reliability in 14nm tri-gate technology," in 2015 IEEE International Reliability Physics Symposium, pp. 2F.2.1-2F.2.5, 2015.
[15] C. Prasad et al., "Transistor reliability characterization and comparisons for a 14nm tri-gate technology optimized for System-on-Chip and foundry platforms," in 2016 IEEE International Reliability Physics Symposium (IRPS), pp. 4B-5-1-4B-5-8, 2016.
[16] F. Oboril and M. B. Tahoori, "Aging-Aware Design of Microprocessor Instruction Pipelines," Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 33, pp. 704-716, 2014.
[17] T. B. Chan, W. T. J. Chan and A. B. Kahng, "On Aging-Aware Signoff for Circuits With Adaptive Voltage Scaling," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 61, pp. 2920-2930, 2014.
[18] S. K. Rao, R. Robucci and C. Patel, "Scalable dynamic technique for accurately predicting power-supply noise and path delay," in VLSI Test Symposium (VTS), 2013 IEEE 31st, pp. 1-6, 2013.
[19] T. Charania, A. Opal and M. Sachdev, "Analysis and Design of On-Chip Decoupling Capacitors," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 21, pp. 648-658, 2013.
[20] S. Kose, S. Tam, S. Pinzon, B. McDermott and E. G. Friedman, "Active Filter-Based Hybrid On-Chip DC-DC Converter for Point-of-Load Voltage Regulation," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 21, pp. 680-691, 2013.
[21] K. Arabi, R. Saleh and Meng Xiongfei, "Power Supply Noise in SoCs: Metrics, Management, and Measurement," Design & Test of Computers, IEEE, vol. 24, pp. 236-244, 2007.
[22] S. Sde-Paz and E. Salomon, "Frequency and Power Correlation between At-Speed Scan and Functional Tests," in Test Conference, 2008. ITC 2008. IEEE International, pp. 1-9, 2008.
[23] L-C. Wang, et al., "Power Supply Noise in Delay Testing," in Test Conference, 2006. ITC '06. IEEE International, pp. 1-10, 2006.
[24] P. Pant and J. Zelman, "Understanding Power Supply Droop during At-Speed Scan Testing," in VLSI Test Symposium, 2009. VTS '09. 27th IEEE, pp. 227-232, 2009.

PAGE 121

[25] C. K. H. Suresh, E. Yilmaz, S. Ozev and O. Sinanoglu, "Adaptive reduction of the frequency search space for multi-vdd digital circuits," in Proceedings of Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 292-295, 2013.
[26] E. Dimaandal and M. Padilla, "Test-time reduction methodology: Innovative ways to reduce test time for server products," in Proc. of 15th IEEE Electronics Packaging Technology Conference, pp. 718-722, 2013.
[27] B. D. Cory, R. Kapur and B. Underwood, "Speed binning with path delay test in 150nm technology," IEEE Design & Test of Computers, vol. 20, pp. 41-45, 2003.
[28] J. Chen, J. Zeng, L. Wang, J. Rearick and M. Mateja, "Selecting the most relevant structural Fmax for system Fmax correlation," in Proceedings of 28th IEEE VLSI Test Symposium (VTS), pp. 99-104, 2010.
[29] H. Hsu, C. Tu and S. Huang, "Built-In Speed Grading with a Process-Tolerant ADPLL," in Proceedings of 16th Asian Test Symposium (ATS), pp. 384-392, 2007.
[30] S. Mu, M. C. Chao, S. Chen and Y. Wang, "Statistical Framework and Built-In Self-Speed-Binning System for Speed Binning Using On-Chip Ring Oscillators," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. PP, pp. 1-13, 2015.
[31] E. Mintarno, V. Chandra, D. Pietromonaco, R. Aitken and R. W. Dutton, "Workload dependent NBTI and PBTI analysis for a sub-45nm commercial microprocessor," in Reliability Physics Symposium (IRPS), IEEE International, pp. 3A.1.1-3A.1.6, 2013.
[32] E. Mintarno, J. Skaf, R. Zheng, J. B. Velamala, Y. Cao, S. Boyd, R. W. Dutton and S. Mitra, "Self-Tuning for Maximized Lifetime Energy-Efficiency in the Presence of Circuit Aging," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, pp. 760-773, 2011.
[33] M. Fojtik et al., "Bubble Razor: Eliminating Timing Margins in an ARM Cortex-M3 Processor in 45nm CMOS Using Architecturally Independent Error Detection and Correction," Solid-State Circuits, IEEE Journal of, vol. 48, pp. 66-81, 2013.
[34] C. Tokunaga, J. F. Ryan, T. Karnik and J. W. Tschanz, "Resilient and adaptive circuits for voltage, temperature, and reliability guardband reduction," IEEE International Reliability Physics Symposium, 2014.
[35] C. R. Lefurgy et al., "Active Guardband Management in Power7+ to Save Energy and Maintain Reliability," Micro, IEEE, vol. 33, 2013.
[36] M. Cho et al., "Post-silicon voltage-guard-band reduction in a 22nm graphics execution core using adaptive voltage scaling and dynamic power gating," 2016 IEEE International Solid-State Circuits Conference (ISSCC).

PAGE 122

[37] H. Hong, J. Lim, H. Lim, and S. Kang, "Lifetime Reliability Enhancement of Microprocessors: Mitigating the Impact of Negative Bias Temperature Instability," ACM Comput. Surv. 48, 1, Article 9 (September 2015).
[38] A. Vijayan, A. Koneru, S. Kiamehr, K. Chakrabarty and M. B. Tahoori, "Fine-Grained Aging-Induced Delay Prediction Based on the Monitoring of Run-Time Stress," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. PP, pp. 1-1, 2016.
[39] A. Sehgal, P. Song and K. A. Jenkins, "On-chip Real-Time Power Supply Noise Detector," in Solid-State Circuits Conference, 2006. ESSCIRC 2006. Proceedings of the 32nd European, pp. 380-383, 2006.
[40] Makoto Nagata, T. Okumoto and K. Taki, "A built-in technique for probing power supply and ground noise distribution within large-scale digital integrated circuits," Solid-State Circuits, IEEE Journal of, vol. 40, pp. 813-819, 2005.
[41] E. Alon, V. Abramzon, B. Nezamfar and M. Horowitz, "On-Die Power Supply Noise Measurement Techniques," Advanced Packaging, IEEE Transactions on, vol. 32, pp. 248-259, 2009.
[42] A. Muhtaroglu, G. Taylor and T. Rahal-Arabi, "On-die droop detector for analog sensing of power supply noise," Solid-State Circuits, IEEE Journal of, vol. 39, pp. 651-660, 2004.
[43] E. Alon, V. Stojanovic and M. A. Horowitz, "Circuits and techniques for high-resolution measurement of on-chip power supply noise," Solid-State Circuits, IEEE Journal of, vol. 40, pp. 820-828, 2005.
[44] R. Franch, et al., "On-chip timing uncertainty measurements on IBM microprocessors," in Test Conference, 2007. ITC 2007. IEEE International, pp. 1-7, 2007.
[45] Tzu-Chien Hsueh, F. O'Mahony, M. Mansuri and B. Casper, "An On-Die All-Digital Power Supply Noise Analyzer With Enhanced Spectrum Measurements," Solid-State Circuits, IEEE Journal of, vol. 50, pp. 1711-1721, 2015.
[46] Y. Ogasahara, M. Hashimoto and T. Onoye, "All-Digital Ring-Oscillator-Based Macro for Sensing Dynamic Supply Noise Waveform," Solid-State Circuits, IEEE Journal of, vol. 44, pp. 1745-1755, 2009.
[47] Y. Zheng and K. L. Shepard, "On-chip oscilloscopes for noninvasive time-domain measurement of waveforms in digital integrated circuits," IEEE Trans. Very Large Scale Integration (VLSI) Syst., vol. 11, no. 3, pp. 336-344, Jun. 2003.

PAGE 123

[48] R. Petersen, P. Pant, P. Lopez, A. Barton, J. Ignowski and D. Josephson, "Voltage transient detection and induction for debug and test," in Test Conference, 2009. ITC 2009. International, pp. 1-10, 2009.
[49] K. A. Bowman, et al., "All-Digital Circuit-Level Dynamic Variation Monitor for Silicon Debug and Adaptive Clock Control," Circuits and Systems I: Regular Papers, IEEE Transactions on, vol. 58, no. 9, pp. 2017-2025, Sept. 2011.
[50] S. Wang and M. Tehranipoor, "Light-Weight On-Chip Structure for Measuring Timing Uncertainty Induced by Noise in Integrated Circuits," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, pp. 1030-1041, 2014.
[51] K. A. Brand, S. Mitra, E. Volkerink and E. J. McCluskey, "Speed clustering of integrated circuits," in Proceedings of International Test Conference (ITC), pp. 1128-1137, 2004.
[52] J. Chen, L. Wang, P., Jing Zeng, S. Yu and M. Mateja, "Data learning techniques and methodology for Fmax prediction," in Proceedings of International Test Conference (ITC), pp. 1-10, 2009.
[53] J. Rearick, "Too much delay fault coverage is a bad thing," in Proceedings of International Test Conference (ITC), pp. 624-633, 2001.
[54] C. Chung, J. Jhou, C. Cheng and S. Li, "Functional Built-In Delay Binning and Calibration Mechanism for On-Chip at-Speed Self Test," in Proceedings of 18th Asian Test Symposium (ATS), pp. 163-168, 2009.
[55] M. Bushnell and V. Agrawal, "Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits," Springer, 2006.
[56] A. Raychowdhury, S. Ghosh and K. Roy, "A novel on-chip delay measurement hardware for efficient speed-binning," in Proceedings of On-Line Testing Symposium (IOLTS), pp. 287-292, 2005.
[57] X. Wang, M. Tehranipoor, S. George, D. Tran and L. Winemberg, "Design and Analysis of a Delay Sensor Applicable to Process/Environmental Variations and Aging Measurements," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, pp. 1405-1418, 2012.
[58] Q. Shi, M. Tehranipoor, X. Wang, and L. Winemberg, "On-chip sensor selection for effective speed-binning," in Proceedings of 57th IEEE Int. Midwest Symp. Circuits Syst., pp. 1073-1076, Aug. 2014.
[59] T. B. Chan, P. Gupta, A. B. Kahng and L. Lai, "Synthesis and Analysis of Design-Dependent Ring Oscillator (DDRO) Performance Monitors," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, pp. 2117-2130, 2014.

PAGE 124

[60] L. Lai, V. Chandra, R. C. Aitken and P. Gupta, "SlackProbe: A Flexible and Efficient In Situ Timing Slack Monitoring Methodology," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 33, pp. 1168-1179, 2014.
[61] N. Karimi and K. Huang, "Prognosis of NBTI aging using a machine learning scheme," in 2016 IEEE International Symposium on Defect and Fault Tolerance (DFT), pp. 7-10, 2016.
[62] S. V. Kumar, C. H. Kim and S. S. Sapatnekar, "Adaptive Techniques for Overcoming Performance Degradation Due to Aging in CMOS Circuits," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, pp. 603-614, 2011.
[63] Y. Li, Y. M. Kim, E. Mintarno, D. S. Gardner and S. Mitra, "Overcoming Early-Life Failure and Aging for Robust Systems," IEEE Design & Test of Computers, vol. 26, pp. 28-39, 2009.
[64] Y. Sato et al., "DART: Dependable VLSI test architecture and its implementation," in 2012 IEEE International Test Conference, pp. 1-10, 2012.
[65] M. Sadi, G. Contreras, D. Tran, J. Chen, L. Winemberg and M. Tehranipoor, "BIST-RM: BIST-Assisted Reliability Management of SoCs Using On-Chip Clock Sweeping and Machine Learning," in Proc. IEEE International Test Conference (ITC), 2016.
[66] M. Sadi and M. Tehranipoor, "Design of a Network of Digital Sensor Macros for Extracting Power Supply Noise Profile in SoCs," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 24, pp. 1702-1714, 2016.
[67] M. Sadi, Z. Conroy, B. Eklow, M. Kamm, N. Bidokhti and M. M. Tehranipoor, "An All Digital Distributed Sensor Network Based Framework for Continuous Noise Monitoring and Timing Failure Analysis in SoCs," in Test Symposium (ATS), 2014 IEEE 23rd Asian, pp. 269-274, 2014.
[68] Hu Xu, V. F. Pavlidis, Xifan Tang, W. Burleson and G. De Micheli, "Timing Uncertainty in 3-D Clock Trees Due to Process Variations and Power Supply Noise," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 21, pp. 2226-2239, 2013.
[69] Jie Gu, Hanyong Eom and C. H. Kim, "On-Chip Supply Noise Regulation Using a Low-Power Digital Switched Decoupling Capacitor Circuit," Solid-State Circuits, IEEE Journal of, vol. 44, pp. 1765-1775, 2009.
[70] K. A. Bowman, C. Tokunaga, T. Karnik, V. K. De and J. W. Tschanz, "A 22nm All-Digital Dynamically Adaptive Clock Distribution for Supply Voltage Droop Tolerance," Solid-State Circuits, IEEE Journal of, vol. 48, pp. 907-916, 2013.

PAGE 125

[71] S. Pant, E. Chiprout and D. Blaauw, "Power Grid Physics and Implications for CAD," Design & Test of Computers, IEEE, vol. 24, pp. 246-254, 2007.
[72] R. Ginosar, "Metastability and Synchronizers: A Tutorial," Design & Test of Computers, IEEE, vol. 28, pp. 23-35, 2011.
[73] IEEE Assoc., "IEEE Std. 1149.1-2001, IEEE Standard Test Access Port and Boundary-Scan Architecture," 2001.
[74] M. T. He and M. Tehranipoor, "SAM: A comprehensive mechanism for accessing embedded sensors in modern SoCs," in Defect and Fault Tolerance (DFT), IEEE International Symposium on, pp. 240-245, 2014.
[75] L. H. Chen, M. Marek-Sadowska and F. Brewer, "Buffer delay change in the presence of power and ground noise," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 11, pp. 461-473, 2003.
[76] R. Saleh, S. Z. Hussain, S. Rochel and D. Overhauser, "Clock skew verification in the presence of IR-drop in the power distribution network," Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 19, pp. 635-644, 2000.
[77] D. Josephson, "The good, the bad, and the ugly of silicon debug," in Design Automation Conference, 2006 43rd ACM/IEEE, pp. 3-6, 2006.
[78] S. Mitra, S. A. Seshia and N. Nicolici, "Post-silicon validation opportunities, challenges and recent advances," in Design Automation Conference (DAC), 2010 47th ACM/IEEE, pp. 12-17, 2010.
[79] S. Park, T. Hong and S. Mitra, "Post-Silicon Bug Localization in Processors Using Instruction Footprint Recording and Analysis (IFRA)," Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 28, pp. 1545-1558, 2009.
[80] S. Deutsch and K. Chakrabarty, "Massive signal tracing using on-chip DRAM for in-system silicon debug," in Test Conference (ITC), 2014 IEEE International, pp. 1-10, 2014.
[81] Qiang Xu and Xiao Liu, "On signal tracing in post-silicon validation," in Design Automation Conference (ASP-DAC), 2010 15th Asia and South Pacific, pp. 262-267, 2010.
[82] J. Shor, K. Luria and D. Zilberman, "Ratiometric BJT-based thermal sensor in 32nm and 22nm technologies," IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, pp. 210-212, 2012.
[83] S. Paek, W. Shin, J. Lee, Hyo-Eun Kim, Jun-Seok Park and Lee-Sup Kim, "Hybrid Temperature Sensor Network for Area-Efficient On-Chip Thermal Map Sensing," Solid-State Circuits, IEEE Journal of, vol. 50, pp. 610-618, 2015.

PAGE 126

[84] M. Sadi, M. Tehranipoor, X. Wang and L. Winemberg, "Speed Binning Using Machine Learning And On-chip Slack Sensors," in Proceedings of the 25th edition on ACM Great Lakes Symposium on VLSI (GLSVLSI), pp. 155-160, May 2015.
[85] M. Sadi, S. Kannan, L. Winemberg and M. Tehranipoor, "SoC Speed Binning Using Machine Learning and On-chip Slack Sensors," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. PP, pp. 1-1, 2016.
[86] Isabelle Guyon and Andre Elisseeff, "An introduction to variable and feature selection," The Journal of Machine Learning Research, 3, 3/1/2003.
[87] M. F. Delgado, E. Cernadas, S. Barro and D. Amorim, "Do we need hundreds of classifiers to solve real world classification problems?" The Journal of Machine Learning Research, vol. 15, pp. 3133-3181, 2014.
[88] http://ufldl.stanford.edu/tutorial/supervised/SoftmaxRegression/ Accessed: May 2016.
[89] C. Hsu, C. Chang and C. Lin, "A practical guide to support vector classification," pp. 1-16.
[90] T. Dietterich and G. Bakiri, "Solving multiclass learning problems via error-correcting output codes," Journal of Artificial Intelligence Research, pp. 263-286, 1995.
[91] E. Allwein, R. Schapire, and Y. Singer, "Reducing multiclass to binary: A unifying approach for margin classifiers," The Journal of Machine Learning Research 1, pp. 113-141, 2001.
[92] L. Breiman, "Bagging predictors," Mach. Learning, vol. 24, pp. 123-140, 1996.
[93] L. Breiman, "Random forests," Mach. Learning, vol. 45, pp. 5-32, 2001.
[94] Isabelle Guyon, Andre Elisseeff, "An introduction to variable and feature selection," The Journal of Machine Learning Research, 3, 3/1/2003.
[95] Y. Freund and R. E. Schapire, "Experiments with a new boosting algorithm," in Proceedings of the Thirteenth International Conference on Machine Learning, pp. 148-156, 1996.
[96] M. Tehranipoor and C. Wang, "Introduction to Hardware Security and Trust," Springer Publishing Company, 2011.
[97] L. H. Goldstein and E. L. Thigpen, "SCOAP: Sandia Controllability/Observability Analysis Program," in Design Automation, 1980. 17th Conference on, pp. 190-196, 1980.

PAGE 127

[98] V. Chandra, "Monitoring reliability in embedded processors - A multi-layer view," in 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 1-6, 2014.
[99] J. Henkel et al., "Reliable on-chip systems in the nano-era: lessons learnt and future trends," in Proceedings of the 50th Annual Design Automation Conference (DAC '13).
[100] K. A. Bowman et al., "Energy-efficient and metastability-immune resilient circuits for dynamic variation tolerance," Solid-State Circuits, IEEE Journal of, vol. 44, pp. 49-63, 2009.
[101] A. Rahimi, L. Benini and R. Gupta, "Application-adaptive guard-banding to mitigate static and dynamic variability," Computers, IEEE Transactions on, vol. 63, pp. 2160-2173, 2014.
[102] Abhishek Koneru, Arunkumar Vijayan, Krishnendu Chakrabarty, and Mehdi B. Tahoori, "Fine-Grained Aging Prediction Based on the Monitoring of Run-Time Stress Using DfT Infrastructure," in Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD '15), IEEE Press, Piscataway, NJ, USA, 51-58.
[103] F. Firouzi, Fangming Ye, A. Vijayan, A. Koneru, K. Chakrabarty and M. B. Tahoori, "Re-using BIST for circuit aging monitoring," in Test Symposium (ETS), 2015 20th IEEE European, pp. 1-2, 2015.
[104] Ronald D. Blanton et al., "Statistical Learning in Chip (SLIC)," in Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD '15).
[105] K. Peng, M. Yilmaz, K. Chakrabarty and M. Tehranipoor, "Crosstalk- and Process Variations-Aware High-Quality Tests for Small-Delay Defects," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 21, pp. 1129-1142, 2013.
[106] M. Yilmaz, K. Chakrabarty, and M. Tehranipoor, "Test-Pattern Selection Small-Delay Defects in Very-Deep Submicron Integrated Circuits," IEEE Transactions on CAD, 2010.
[107] M. Yilmaz, K. Chakrabarty and M. Tehranipoor, "Interconnect-Aware and Layout-Oriented Test-Pattern Selection for Small-Delay Defects," in Proc. IEEE International Test Conference (ITC), Oct. 2008.
[108] N. Ahmed and M. Tehranipoor, "A Novel Faster-than-at-speed Transition Delay Test Method Considering IR-drop Effects," IEEE Transactions on CAD, 2010.
[109] G. Contreras, N. Ahmed, L. Winemberg, M. Tehranipoor, "Predictive LBIST Model and Partial ATPG for Seed Extraction," Defect and Fault Tolerance in VLSI Systems, DFTS 2015.

PAGE 128

[110] M. Cho, C. Tokunaga, M. M. Khellah, J. W. Tschanz and V. De, "Aging-aware Adaptive Voltage Scaling in 22nm high-K/metal-gate tri-gate CMOS," IEEE Custom Integrated Circuits Conference (CICC), 2015.
[111] J. Tschanz et al., "Adaptive Frequency and Biasing Techniques for Tolerance to Dynamic Temperature-Voltage Variations and Aging," IEEE International Solid-State Circuits Conference (ISSCC), 2007, pp. 292-604.
[112] Sang-Soo Lee, E. Boling, A. Kuo and R. Rogenmoser, "A slew-rate based process monitor and bi-directional body bias circuit for adaptive body biasing in SoC applications," IEEE Custom Integrated Circuits Conference (CICC), 2013.
[113] M. T. Rahman, D. Forte, J. Fahrny and M. Tehranipoor, "ARO-PUF: An aging-resistant ring oscillator PUF design," 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, 2014, pp. 1-6.
[114] M. T. Rahman, K. Xiao, D. Forte, X. Zhang, J. Shi and M. Tehranipoor, "TI-TRNG: Technology independent true random number generator," 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC), San Francisco, CA, 2014, pp. 1-6.
[115] http://ptm.asu.edu/; Accessed: November 2016.
[116] http://www.cad.polito.it/downloads/tools/itc99.html; Accessed: November 2016.
[117] http://www.oracle.com/technetwork/systems/opensparc/index.html; Accessed: November 2016.
[118] http://www.opencores.org/; Accessed: November 2016.
[119] http://www.synopsys.com/; Accessed: November 2016.
[120] http://www.mathworks.com/products/matlab/; Accessed: December 2015.

PAGE 129

BIOGRAPHICAL SKETCH

Mehdi Sadi obtained his BS with honors in electrical and electronic engineering from Bangladesh University of Engineering and Technology (BUET) in 2009, his MS with a dean's fellowship in electrical engineering from the University of California at Riverside, USA in 2011, and his Ph.D. in electrical and computer engineering at the University of Florida, Gainesville, USA in 2017. His Ph.D. studies were funded by the Semiconductor Research Corporation (SRC) and NSF. During his Ph.D. studies he worked as a research intern with GlobalFoundries at Malta, NY and with NXP at Austin, TX. After completing his Ph.D. he joined Intel Corp. at Hillsboro, OR. His research expertise is in the areas of reliable hardware/SoC design, Design for Test (DFT), back-end design, and computer architecture.