Record for a UF thesis. Title & abstract won't display until thesis is accessible after 2014-08-31.

Material Information

Title:
Record for a UF thesis. Title & abstract won't display until thesis is accessible after 2014-08-31.
Physical Description:
Book
Language:
English
Creator:
Basu, Kanad
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Computer Engineering, Computer and Information Science and Engineering
Committee Chair:
Mishra, Prabhat
Committee Members:
Ranka, Sanjay
Stitt, Greg
Sahni, Sartaj
Gordon-Ross, Ann

Subjects

Subjects / Keywords:
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre:
Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Statement of Responsibility:
by Kanad Basu.
Thesis:
Thesis (Ph.D.)--University of Florida, 2012.
Local:
Adviser: Mishra, Prabhat.
Electronic Access:
INACCESSIBLE UNTIL 2014-08-31

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2012
System ID:
UFE0044429:00001


Full Text

EFFICIENT OBSERVABILITY ENHANCEMENT TECHNIQUES FOR POST-SILICON VALIDATION AND DEBUG

By

KANAD BASU

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2012

© 2012 Kanad Basu

I dedicate this to my family.

ACKNOWLEDGMENTS

I would like to sincerely thank my Ph.D. advisor Prof. Prabhat Mishra, without whose guidance this dissertation would not have been possible. I would also like to thank my Ph.D. committee members, Prof. Sartaj Sahni, Prof. Sanjay Ranka, Prof. Greg Stitt and Prof. Ann Gordon-Ross, for their valuable suggestions. I am also thankful to my labmates Weixun Wang, Xiaoke Qin, Mingsong Chen, Kartik Srivastava, Chetan Murthy, Hadi Hajimiri and Kamran Rahmani for their help and support. I would also like to thank my mentors during my two internships at Intel Corporation, Dr. Dhurbajyoti Kalita and Dr. Priyadarsan Patra.

I would also like to take this opportunity to thank those who helped me during different stages of my research. I sincerely thank Dr. Henry Ko from McMaster University and Dr. Xiao Liu from the Chinese University of Hong Kong for helping me understand various aspects of trace signal selection and signal restoration. I am grateful to Dr. Ilya Wagner from Intel Corporation for explaining to me the constraints and objectives associated with test generation for post-silicon validation. I would like to thank Prof. Krishnendu Chakraborty and Zhanglei Wang from Duke University, and Dr. Mehrdad Reshadi from the University of California, Irvine, for helpful suggestions.

I would also like to extend my gratitude to my family for helping me reach this stage. Finally, I would like to thank Mr. Sachin Tendulkar, Mr. Akira Kurosawa, Mr. Paolo Coelho and Mr. John Denver, who through their lives and immortal creations have always encouraged me to move forward, even in times of despair.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Faults and Defects in Integrated Circuits
  1.2 Restoration of Unknown Signals
  1.3 Signal Restoration Versus Error Detection
  1.4 Challenges
  1.5 Research Contributions
  1.6 Dissertation Organization

2 BACKGROUND AND RELATED APPROACHES
  2.1 Trace Signal Selection
  2.2 Dynamic Signal Selection
  2.3 Trace Data Compression
  2.4 Observability-Aware Test Generation

3 RESTORATION-AWARE TRACE SIGNAL SELECTION TECHNIQUES
  3.1 Gate-level Signal Selection (GSS)
    3.1.1 Computation of Edge Values
      3.1.1.1 Independent signals
      3.1.1.2 Dependent signals
      3.1.1.3 Example
    3.1.2 Initial Value Computation for State Elements
    3.1.3 Initial Region Creation
    3.1.4 Recomputation of Node Values
    3.1.5 Region Growth
    3.1.6 Complexity Analysis
  3.2 Motivational Example
  3.3 RTL-level Signal Selection (RSS)
    3.3.1 CDFG Generation
    3.3.2 Relationship Computation
      3.3.2.1 Direct relationship
      3.3.2.2 Conditional relationship
    3.3.3 Signal Selection
  3.4 Experiments
    3.4.1 Experimental Setup
    3.4.2 Results on Gate-level Signal Selection (GSS)
    3.4.3 Results on RTL-level Signal Selection (RSS)
  3.5 Summary

4 EFFICIENT COMBINATION OF TRACE AND SCAN SIGNALS
  4.1 Background and Motivation
  4.2 Trace and Scan Signal Selection
    4.2.1 Trace+Scan Debug Architecture
    4.2.2 Trace Signal Selection Algorithm
    4.2.3 Scan Signal Selection Algorithm
      4.2.3.1 Creation of minimal node set
      4.2.3.2 Illustrative example
  4.3 Experimental Results
  4.4 Summary

5 ERROR DETECTION AWARE TRACE SIGNAL SELECTION
  5.1 Trace Signal Selection for Error Detection
    5.1.1 Graph based Modeling of Circuits
    5.1.2 Edge Value Computation
    5.1.3 Node Value Computation
    5.1.4 Signal Selection
  5.2 Experiments
    5.2.1 Experimental Setup
    5.2.2 Error Model
    5.2.3 Results
  5.3 Summary

6 DYNAMIC SIGNAL SELECTION
  6.1 Problem Formulation
  6.2 Region-based Signal Selection (RSS)
    6.2.1 Graph Based Modeling of Circuits
    6.2.2 Error Propagation Probability Computation
    6.2.3 Signal Selection Based on Node Values
  6.3 Dynamic Signal Tracing (DST)
  6.4 Experiments
    6.4.1 Experimental Setup
    6.4.2 Results for Two Regions
    6.4.3 Results for Three Regions
    6.4.4 Results for Four Regions
    6.4.5 Hardware Overhead
  6.5 Summary

7 TRACE DATA COMPRESSION USING STATICALLY SELECTED DICTIONARY
  7.1 Trace Data Compression
    7.1.1 Dictionary Selection Algorithms
      7.1.1.1 Dictionary-based compression (DC)
      7.1.1.2 Bitmask-based compression (BMC)
      7.1.1.3 Fixed dictionary MBSTW compression (fMBSTW)
    7.1.2 Dynamic Trace Data Compression
    7.1.3 Performance Analysis with Erroneous Trace Data
      7.1.3.1 Compression penalty for DC and BMC
      7.1.3.2 Compression penalty for fMBSTW
  7.2 Experiments
    7.2.1 Compression Performance
    7.2.2 BRAM Requirement (Hardware Overhead)
    7.2.3 Compression Performance with Erroneous Trace Data
  7.3 Summary

8 OBSERVABILITY-AWARE DIRECTED TEST GENERATION
  8.1 Test-aware Signal Selection
    8.1.1 Fault Simulation
    8.1.2 Error Detection Ability Computation
    8.1.3 Overlap Removal
  8.2 Trace Signal Aware Test Generation
    8.2.1 Soft Errors and Faults
    8.2.2 Crosstalk Faults
  8.3 Experiments
  8.4 Summary

9 CONCLUSIONS AND FUTURE WORK
  9.1 Conclusions
  9.2 Future Research Directions

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

Table
1-1 Restored signals using two selected signals
3-1 Restored signals using our method
3-2 Comparison with Ko et al.
3-3 Comparison with Liu et al. with deterministic inputs
3-4 Comparison of GSS with Prabhakar et al.
3-5 RTL-level versus gate-level signal selection
4-1 Restored signals using trace and scan
4-2 Comparison with existing technique
5-1 Detectable Errors for the ISCAS'89 benchmarks
6-1 Selected Signals for each MUX for n = 4 and m = 4
6-2 Table for n = 2 and m = 2
8-1 Memory requirement for test
8-2 Fault coverage in case of soft errors
8-3 Coverage of crosstalk faults

LIST OF FIGURES

Figure
1-1 Validation and testing phases in IC design flow
1-2 Overview of post-silicon validation
1-3 Signal restoration
1-4 Example circuit
1-5 Example circuit with 12 flip-flops
1-6 Error Propagation for the example circuit in Figure 1-5
1-7 Research contributions
3-1 Example circuit with n gates
3-2 Example circuit
3-3 Graphical representation of example circuit
3-4 Region creation and growth
3-5 Verilog code and CDFG
3-6 A portion of the CDFG in Figure 3-5B
3-7 Simplified version of Figure 3-6
3-8 Overview of our experiments to verify RSS
3-9 Comparison of Signal Selection Time
3-10 Comparison of Restoration Performance
4-1 Example circuit with both scan and trace signals
4-2 Proposed Architecture: The width w of the trace buffer is shared by m trace signals and n subchains of the scan chain
4-3 Graphical representation of example circuit
4-4 Comparison with Ko et al. and Basu et al.
5-1 Example circuit with labeled signals
5-2 Graphical representation of Figure 5-1
5-3 Example using AND gate
5-4 Example using OR gate
5-5 D flip-flop and NOT gate
5-6 Edge values for the graph in Figure 5-2
5-7 Node values for the graph in Figure 5-6
5-8 Signal Selection based on removal of overlap
5-9 Comparison with Restoration aware signal selection
5-10 Variation of error detection with number of trace signals
5-11 Variation of error detection with number of flip-flops traced
5-12 Variation of error detection with number of outputs traced
6-1 Illustrative example showing regions and error zones
6-2 Example circuit with 2 regions and 12 flip-flops
6-3 Graphical representation of Figure 1-5 with two regions
6-4 Examples using AND and OR gates
6-5 Node values for region R1 in Figure 6-3
6-6 Datapath and controller design for m = 3 and n = 3
6-7 Proposed Design
6-8 GSS
6-9 EZ-GSS
6-10 RSS+DST
6-11 Comparison of EDR performance when both regions are active
6-12 Comparison of EDR performance when only one region is active
6-13 Comparison of EDR performance on the Opencores circuits
6-14 Comparison of EDR performance when one region is active
6-15 Comparison of EDR performance when two regions are active
6-16 Comparison of EDR performance when one region is active
6-17 Comparison of EDR performance when two regions are active
6-18 Comparison of EDR performance when 3 regions are active
7-1 Overview of our trace compression procedure
7-2 Example of dictionary selection in fMBSTW
7-3 Actual Trace Data Compression
7-4 Comparison of compression performance
7-5 Compression performance with dictionary entries
7-6 BRAM requirements
7-7 Comparison of compression penalty for DC
7-8 Comparison of compression penalty for BMC
7-9 Comparison of compression penalty for fMBSTW
7-10 Comparison of compression penalty
8-1 Outline of proposed technique
8-2 Observability-aware test generation flow
8-3 Example circuit
8-4 Example circuit illustrating test generation for soft errors
8-5 Example circuit illustrating crosstalk faults
8-6 Positive glitch on c
8-7 Positive delay on c
8-8 Non-simultaneous transitions
8-9 Duplicated circuit
8-10 Modified circuit
8-11 Comparison of signal selection methods for soft errors
8-12 Variation of EDR with trace buffer width for soft error detection
8-13 Comparison of signal selection methods for crosstalk faults

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

EFFICIENT OBSERVABILITY ENHANCEMENT TECHNIQUES FOR POST-SILICON VALIDATION AND DEBUG

By

Kanad Basu

August 2012

Chair: Prabhat Mishra
Major: Computer Engineering

Post-silicon validation is widely acknowledged as a major bottleneck for complex integrated circuits. Due to increasing design complexity coupled with shrinking time-to-market constraints, it is not possible to detect all design flaws (errors) during pre-silicon verification. Post-silicon validation needs to capture these escaped functional errors as well as electrical faults. A major concern during post-silicon debug is the observability of internal signals, since the chip has already been manufactured. Design overhead considerations limit the number of signal states that can be traced or stored in a trace buffer.

This dissertation proposes novel techniques to enhance observability during post-silicon debug. My research has three major contributions: profitable signal selection, efficient trace compression and observability-aware test generation. It proposes efficient signal selection techniques to enhance the observability of the circuit. Various signal selection constraints are explored, including static versus dynamic, trace versus scan and general-purpose versus architecture-specific features. To improve the observability further, an efficient trace compression approach has been proposed. Extensive experimental results demonstrate significant improvement in overall signal observability. This dissertation also proposes observation-aware directed test generation techniques to drastically reduce the overall post-silicon validation effort.

CHAPTER 1
INTRODUCTION

Design complexity is increasing rapidly, keeping pace with the two-fold increase in the number of transistors every technology cycle. This drastic increase in design complexity has led to a significant increase in validation complexity. The report from the International Technology Roadmap for Semiconductors (ITRS) [1], as well as reports from other reputed agencies, indicates that it is critical to develop efficient design verification techniques to secure design productivity scaling at a pace consistent with process technology cycles. There has been a plethora of research efforts in both industry and academia to develop scalable design validation approaches using a combination of simulation-based techniques and formal methods. In spite of extensive efforts, it is not always possible to detect all the functional errors and electrical faults during pre-silicon validation. Post-silicon validation is used to detect design flaws, including the escaped functional errors as well as electrical faults.

Post-silicon validation is widely acknowledged as a major bottleneck for complex integrated circuits, including modern microprocessors as well as complex System-on-Chip (SoC) designs. Various industrial studies indicate that the post-silicon validation effort consumes more than 50% of an SoC's overall design effort (measured in total cost) at 65 nm technology [2]. These studies also emphasize that the problem gets worse as the industry continues to move to even smaller geometries.

Figure 1-1 shows an overview of three important validation and testing phases in a typical SoC design methodology. The pre-silicon validation effort includes validation of various functional as well as timing requirements across abstraction levels, including specification and implementation. Manufacturing testing is primarily used to detect physical (structural) defects in each of the manufactured ICs. On the other hand, the focus of post-silicon validation is to detect design flaws that have escaped pre-silicon validation. In reality, the vast majority of functional errors are captured in the pre-silicon stage. While only a small percentage of functional errors remain, finding and fixing them is still very expensive. It is important to note that the majority of the electrical faults (including crosstalk, delay and transient faults) are captured during post-silicon validation, since it is difficult to model electrical errors during the pre-silicon verification phase.

Figure 1-1. Validation and testing phases in IC design flow

Figure 1-2 presents an overview of the post-silicon validation and debug methodology. Input tests are applied to the Device Under Test (DUT). Depending on the test generation technique, the input tests can be random, constrained-random or directed in nature. During execution (runtime), the states of some selected signals are traced and stored in an on-chip trace buffer. Note that the signals whose states are to be stored are decided during the pre-silicon (design) phase using suitable signal selection algorithms. To increase the effective trace buffer size, various trace compression techniques can be employed. When a failure is detected, the contents of the trace buffer are dumped out to aid in post-silicon debug using an offline debugger.

Figure 1-2. Overview of post-silicon validation

The remainder of this chapter is organized as follows. First, we discuss potential faults and defects in a System-on-Chip (SoC). Next, we describe the basics of signal restoration and compare signal restoration with error detection. Finally, we discuss various challenges associated with post-silicon validation and outline our contributions to address these challenges.

1.1 Faults and Defects in Integrated Circuits

Defects and errors may be introduced at different phases of the SoC design cycle: design time, synthesis, manufacturing, etc. Efficient fault modeling is necessary to effectively analyze, detect and fix them. Functional errors are introduced during development of the specification as well as the implementation. It is extremely important to detect and fix these functional errors. Some of the common defects in a chip [3] include processing defects (parasitic transistors, oxide breakdown, etc.), material defects (surface impurities), time-dependent failures (electromigration, dielectric breakdown, etc.) and packaging failures (seal break). Studies [4] reveal that of all the defects observed in a Printed Circuit Board (PCB), 51% are due to shorts, thus making electrical errors also important in any SoC validation methodology. Common fault models used to model defects and errors are:

- Single Stuck-at Fault (represented as s-a-0 or s-a-1)
- Transistor Open and Short Fault
- Memory Faults
- Glitch Faults
- PLA Faults
- Functional Errors
- Delay Faults
- Analog Faults

Post-silicon validation deals with errors and defects corresponding to all these fault models. In manufacturing testing, however, primarily errors related to manufacturing defects (such as stuck-at faults and transistor opens/shorts) are considered.

Electrical defects manifest as important errors in modern SoCs. Soft errors and crosstalk faults are two important defects that can adversely affect the correct functionality of the chip. While soft errors are caused by radioactive effects on design impurities, crosstalk faults occur due to imperfect coupling capacitance between two lines in the chip. Soft errors can be modeled using single stuck-at faults. Effects of crosstalk can be represented as glitches and delay faults. Effective directed test generation strategies need to be employed in order to detect these faults. The tests should be able to activate the faults and propagate them towards the observation points, e.g., primary outputs or internal trace signals.

We now discuss two important concepts related to signal selection: signal restoration and error detection. Since only a few signals can be observed during post-silicon validation, it is important to restore as many unknown signal states as possible to enhance the observability of the chip. On the other hand, it is equally important to actually observe the errors through the trace signals.

1.2 Restoration of Unknown Signals

In post-silicon debug, unknown signal states can be reconstructed from the traced signal states in two ways: forward and backward restoration. Forward restoration deals with the restoration of signals from input to output, that is, knowledge of the input states can help in restoring the value of the output. Backward restoration, on the other hand, deals with reconstructing the input from the output.

Forward and backward restoration can be illustrated using the example in Figure 1-3. We use a 2-input AND gate to explain the restoration process. Forward restoration is shown in Figure 1-3(a). When one input is 0 or both inputs are 1, the output can be reconstructed. Figure 1-3(b) shows backward restoration. When the output is 1, or the output is 0 and one input is 1, the other input can be reconstructed. It is easier to reconstruct signals using forward rather than backward restoration. If all signal values except the unknown one are known, forward restoration can definitely determine the unknown, while backward restoration might fail to do so in specific cases. For example, in Figure 1-3(c), when the output is 0 and one of the inputs is 0, there is no way to determine the state of the other input. Although we have illustrated signal reconstruction using a 2-input AND gate, the restoration procedure can be described in a similar manner for other types of logic gates as well as for gates with more inputs.

Figure 1-3. Signal restoration
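The restoration rules for the 2-input AND gate described above can be written down directly. The following is a minimal sketch, assuming a three-valued encoding in which `None` stands for an unknown state (X); the function names `and_forward` and `and_backward` are illustrative and do not come from the dissertation.

```python
# Three-valued restoration rules for a 2-input AND gate.
# Assumption: 0 and 1 are known states, None represents an unknown state (X).
X = None


def and_forward(a, b):
    """Forward restoration: derive the output from the (possibly unknown) inputs."""
    if a == 0 or b == 0:
        return 0          # a single 0 input forces the output to 0
    if a == 1 and b == 1:
        return 1          # both inputs 1 force the output to 1
    return X              # otherwise the output cannot be restored


def and_backward(out, other_input):
    """Backward restoration: derive one input from the output and the other input."""
    if out == 1:
        return 1          # output 1 implies every input is 1
    if out == 0 and other_input == 1:
        return 0          # the 0 at the output must come from this input
    return X              # e.g. output 0 with the other input 0: undetermined
```

For instance, `and_backward(0, 0)` returns `None`, which corresponds to the case of Figure 1-3(c) where the remaining input cannot be determined.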

Figure 1-4. Example circuit

The signal reconstruction procedure is illustrated using the simple circuit shown in Figure 1-4. Let us assume that the trace buffer width is 2, that is, the states of two signals can be recorded. We try to restore the other signal states by applying the existing signal selection methods presented in [5] and [6]. The results are shown in Table 1-1. An 'X' represents a state that cannot be determined. The selected (traced) signals are highlighted in the table. For both [5] and [6], the signals selected are C and F, in that order. Restoration ratio, a popular metric for signal restorability, is defined as:

\[
\text{Restoration Ratio} = \frac{\text{number of states restored} + \text{number of states traced}}{\text{number of states traced}} \tag{1}
\]

Let us calculate the number of restored states in Table 1-1. In the row corresponding to signal A, two entries have value 0, while the rest have value X (non-restored state). Thus, two states are known. Similarly, two states are known for the row corresponding to signal B. Since signal C is traced, all of its states are known (no X in the row). For signal D, three entries in the row have value 0, hence three states are reconstructed. Computing in this manner, a total of 26 states are known (restored or traced). Of these, 10 entries (corresponding to signals C and F) are traced states. Therefore, the Restoration Ratio in this case is 26/10 = 2.6.
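The arithmetic behind Equation (1) can be checked mechanically. The sketch below encodes the entries of Table 1-1 as strings (one character per cycle, with `X` for a non-restored state) and recomputes the ratio; the names `TABLE_1_1`, `TRACED` and `restoration_ratio` are illustrative assumptions, not identifiers from the dissertation.

```python
# Entries of Table 1-1, one string per signal, one character per cycle (1..5).
# 'X' marks a state that could not be restored.
TABLE_1_1 = {
    "A": "X0X0X",
    "B": "X0X0X",
    "C": "11010",  # traced
    "D": "XX000",
    "E": "XX000",
    "F": "01100",  # traced
    "G": "X00X0",
    "H": "X00X0",
}
TRACED = {"C", "F"}


def restoration_ratio(table, traced):
    """(states restored + states traced) / states traced, as in Equation (1)."""
    # Every non-X entry is a known state, whether restored or traced.
    known = sum(ch != "X" for row in table.values() for ch in row)
    traced_states = sum(len(table[s]) for s in traced)
    return known / traced_states


print(restoration_ratio(TABLE_1_1, TRACED))  # 26 known states / 10 traced states
```

The numerator counts all known entries, so it already includes the 10 traced states of C and F together with the 16 restored ones.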

Table 1-1. Restored signals using two selected signals

Signal  Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
A       X        0        X        0        X
B       X        0        X        0        X
C*      1        1        0        1        0
D       X        X        0        0        0
E       X        X        0        0        0
F*      0        1        1        0        0
G       X        0        0        X        0
H       X        0        0        X        0

* Selected (traced) signal, shaded in the original table.

1.3 Signal Restoration Versus Error Detection

This section compares two widely used metrics, state restoration and error detection. The majority of the existing signal selection approaches [6, 7] focus on restoration of unknown signals using the knowledge of known signal states. Let us consider the example circuit in Figure 1-5, comprised of 12 flip-flops. In Figure 1-5, tracing the states of flip-flops A and B in cycle t helps to restore the state of flip-flop D in cycle t + 1, since the input to flip-flop D is an OR of the outputs of flip-flops A and B. Similarly, since flip-flops C and H are connected by a NOT gate, tracing H in cycle t provides the state of C in cycle t - 1. Note that signal restoration can proceed in both the forward (input to output) and backward (output to input) direction. For example, the restoration of D from A and B was in the forward direction, while the restoration of C from H is in the backward direction.

In case of error detection, only backward restoration is meaningful. To explain this scenario, a part of the illustrative example of Figure 1-5 has been redrawn in Figure 1-6. When flip-flop D is in error, the bug can only propagate in the forward direction, through its fan-out cone, towards flip-flops E and G. Therefore, in order to detect the error in D, we have to trace either of these two flip-flops. Tracing the inputs of D, or those flip-flops that are in its fan-in cone (A and C), does not help. Given that a designer is interested in detecting an error, it is meaningful to focus directly on an error detection metric instead of restoration performance.
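The fan-out cone argument can be sketched as a reachability check on a directed graph of flip-flops. The edge list below contains only the connections that the text mentions (A and B feed D, C feeds H, D feeds E and G); the rest of Figure 1-5 is not reproduced, so the graph is a partial, assumed model, and the names `EDGES`, `fan_out_cone` and `can_detect` are hypothetical.

```python
from collections import deque

# Partial connectivity taken from the text only (assumption: the remaining
# flip-flops and edges of Figure 1-5 are omitted).
EDGES = {
    "A": ["D"],
    "B": ["D"],
    "C": ["H"],
    "D": ["E", "G"],
}


def fan_out_cone(edges, source):
    """Return every flip-flop reachable from `source` along forward edges."""
    seen, queue = set(), deque([source])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def can_detect(edges, faulty, traced):
    """An error is observable if the faulty flip-flop itself, or any
    flip-flop in its fan-out cone, is traced (assumed detection model)."""
    traced = set(traced)
    return faulty in traced or bool(fan_out_cone(edges, faulty) & traced)
```

With this model, `can_detect(EDGES, "D", {"A", "B"})` is false, because A and B lie in the fan-in cone of D, while `can_detect(EDGES, "D", {"G"})` is true.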

Figure 1-5. Example circuit with 12 flip-flops

1.4 Challenges

There are many challenges in developing efficient techniques for post-silicon validation and debug. This section describes six important challenges that I have addressed in this dissertation.

Challenge 1: The signals to be traced should be selected carefully in order to maximize restoration. Signal selection techniques based on partial restoration (described in Chapter 3) were proposed by Ko et al. [5] and Liu et al. [6]. If the trace buffer width is n, both of these approaches select the n signals with the highest partial restoration abilities for tracing. However, signal selection based on partial restoration does not provide the best reconstruction. These methods are also computationally inefficient, since they require long signal selection times when operating on gate-level netlists. An efficient algorithm is needed that can select trace signals providing high restoration of untraced signals with fast signal selection time.

Figure 1-6. Error Propagation for the example circuit in Figure 1-5

Challenge 2: The time requirement of gate-level signal selection algorithms is high because of the excessive number of variables used to represent flip-flops and other internal signals. One promising alternative for reducing signal selection time is to work at a higher abstraction level such as the Register Transfer Level (RTL). Ko et al. [8] developed a signal selection approach which selects some signals from the RTL description and some from the gate-level netlist description of the circuit, that is, their signal selection does not depend on the RTL design alone. It is desirable to develop a signal selection technique that relies solely on the RTL implementation of the design, thus reducing the memory and time requirements associated with a gate-level netlist. The primary challenge in developing an RTL-level signal selection algorithm is to ensure that the approach does not incur any restoration penalty compared to gate-level signal selection algorithms.

Challenge 3: Scan-based debugging is popular in the manufacturing test domain, where scan chains are primarily used to identify fabrication defects. The use of scan chains for improving signal observability during post-silicon debug has been extensively studied [9-11]. Several approaches [12, 13] have studied the combination of scan and trace signals for post-silicon debug. A major challenge is that the number of trace signals, the number of scan signals and the scan dump frequency are inter-dependent. For example, selecting more trace signals implies less space for scan signals, and vice versa. Even when the space for scan signals is reserved, choosing a large scan chain (too many scan signals) implies a longer interval between scan dumps. In other words, there needs to be a critical balance between how many signals to observe and how many signal states can be obtained in a specific clock cycle.

Challenge 4: Until now, we have considered how tracing some signals helps us to reconstruct the untraced signals. Debugging involves detection of errors through the traced signals. Thus, it is necessary to trace signals with the aim of maximizing the number of errors detected. During debug, some parts of the circuit may be less important than the rest during particular cycles. For example, the debug engineer might be interested in detecting errors in the processor block of an SoC instead of the memory block. Therefore, it is important to develop a signal selection algorithm that can dynamically select signals based on the currently active regions to enhance error detection.

Challenge 5: Circuit observability can be further enhanced by compressing the trace buffer contents. This allows more signal states to be stored in the trace buffer without increasing its size. Since the trace buffer serves only debug and does not contribute to the actual chip functionality, its size should be as small as possible. The trace buffer has two parameters, width and depth. Width refers to the total amount of debug data that can be stored per cycle, while depth refers to the total number of cycles over which debug data can be stored. In order to keep the trace buffer size constant while increasing the amount of trace data that can be stored, either the depth or the width has to be compressed. Different techniques for trace data compression, either by depth [14, 15] or by width [16], have been proposed. Depth compression approaches select the cycles in which the data are likely to be erroneous and store the data for only those cycles. In width compression, on the other hand, the trace data obtained every cycle is first compressed and then stored in the trace buffer. An efficient lossless trace data compression technique is needed that can provide both fast compression and high compression efficiency without introducing significant hardware overhead.

Challenge 6: Efficient signal selection and trace compression enhance observability during post-silicon debug. However, it is equally important to improve the controllability aspect of post-silicon validation. The input tests applied to the circuit should be carefully designed in order to maximize error detection. This involves estimating the corner-case scenarios and designing tests that can activate those scenarios and allow the errors to propagate to the traced signals. Therefore, it is important for the test designer to have knowledge of the signals that are being traced. Conversely, if the tests are known a priori, the trace signals can be selected efficiently to enhance error detection capability.

Figure 1-7. Research contributions
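To make the width-compression idea of Challenge 5 more concrete, the following is a minimal sketch of generic dictionary-based coding of per-cycle trace words. It is illustrative only and is not the scheme developed in this dissertation: the one-bit flag encoding, the function names `compress` and `decompress`, and the sample data used below are all assumptions.

```python
import math


def _index_bits(dictionary):
    # Number of bits needed to address a dictionary entry (at least 1).
    return max(1, math.ceil(math.log2(len(dictionary))))


def compress(words, dictionary):
    """Encode each per-cycle trace word (a bit string).

    A word found in the dictionary becomes flag '1' + dictionary index;
    any other word is stored uncompressed as flag '0' + the raw bits.
    """
    bits = _index_bits(dictionary)
    lookup = {w: i for i, w in enumerate(dictionary)}
    codes = []
    for w in words:
        if w in lookup:
            codes.append("1" + format(lookup[w], f"0{bits}b"))
        else:
            codes.append("0" + w)
    return codes


def decompress(codes, dictionary):
    """Invert compress(); the coding is lossless."""
    return [dictionary[int(c[1:], 2)] if c[0] == "1" else c[1:] for c in codes]
```

As a usage example, five 4-bit trace words `["0000", "0000", "1111", "0101", "0000"]` with the dictionary `["0000", "1111"]` occupy 13 bits after coding instead of 20, and `decompress` recovers the original words exactly. The gain depends entirely on how often the traced words match dictionary entries, which is why the choice of dictionary matters.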

PAGE 24

1.5ResearchContributionsMyresearchproposesnoveltechniquestoaddresschallengesinenhancingthesignalobservabilityforpost-siliconvalidationanddebug.Theobjectiveofmyresearchistodevelopefcientsignalselectionandtestgenerationaswellasadeptapproachesfortracecompression.Thefourmajorcontributionsofmyresearcharesummarizedasfollows.Figure 1-7 highlightsthesecontributionsinICdesignmethodology.Contribution1[TraceSignalSelection]:ThiscontributionaddressestherstthreechallengesoutlinedinSection 1.4 .Existingsignalselectionalgorithmsusedpartialrestorability,whichisnotoptimalforsignalrestoration.Atotalrestorability1basedsignalselectionalgorithmhasbeenproposedwhichiscomputationallymoreefcientandproducessignicantlybetterrestorationperformancecomparedtotheexistingapproaches.WehaveextendedthesignalselectionapproachtoRTLlevel,toreducethesignalselectiontimeaswellasthememoryrequirements,withoutsacricingsignicantlyontherestorationperformance.Wehavealsoproposedanefcienttechniquetodeterminetheprotablecombinationoftraceandscansignalsforpost-silicondebug.Theentiretracebufferwidthisdividedtoaccommodatebothtraceandscansignals.Ourapproachusesagraphbasedrepresentationtoselectthreeimportantaspects:(i)efcienttracesignalstobestoredeverycycle,(ii)themostprotablescansignalstobeincludedintheshadowscanchain,and(iii)thescandumpfrequencybasedonthetracebufferwidthconstraints.Contribution2:[DynamicSignalSelection]:ThiscontributionaddressesChallenge4.Existingtracesignalselectionalgorithmsdealwithimprovingtherestorationofuntracedsignalsanddoesnotfocusonerrordetection.Wehaveproposedasignalselectionalgorithmwhichselectsprotablesignalsforefcienterrordetection.Ouralgorithmlaysemphasisonhowerrors,whichpropagatefromerrororigintowardssignalsin 1PartialandtotalrestorabilityhavebeendiscussedinChapter 3 24

PAGE 25

the fan-out cone can be detected. We have also proposed a region-aware signal selection algorithm (RSS) that selects profitable signals during design time based on the knowledge of functional regions and associated error zones. We have developed a low-overhead dynamic signal tracing (DST) hardware to enable designers to trace different sets of signals during execution based on active (relevant) functional regions. This lays emphasis on the active error zones in the circuit that can be detected using a specifically selected set of trace signals. To the best of our knowledge, this is the first attempt at developing an efficient spatio-temporal solution for dynamic trace signal selection.

Contribution 3 [Trace Data Compression]: This contribution addresses Challenge 5. Existing trace compression algorithms choose the dictionary online (and thus include all the unique entries in the dictionary), which results in poor compression performance as well as increased hardware overhead. Studies [14, 15] reveal that the difference between the actual trace data and the ideal (error-less) trace data is very small for post-silicon debug (2-5%). This motivated us to develop a lossless dictionary based width compression scheme that operates in real time to compress the trace data using a statically selected dictionary. This provides better compression performance since only profitable entries are stored in the dictionary. This also provides a huge reduction in compression hardware overhead. Three different compression algorithms have been proposed to trade off between compression performance and hardware overhead.

Contribution 4 [Observability-Aware Test Generation]: This contribution addresses Challenge 6. The tests that are used during post-silicon validation are produced using random or constrained-random test generation techniques. However, error detection performance can be enhanced if the tests are designed keeping the detection objective in mind. The tests should be designed to excite the errors (especially corner case ones) and propagate them to the observation points, that is, to the traced signals. Moreover, the trace signals are chosen assuming the input tests are random in nature. However, if

PAGE 26

the test sets are known a priori, adequate trace signals can be selected to enhance error detection. We propose efficient techniques to determine the trace signals as well as the test sets that would help in detecting errors in the design.

1.6 Dissertation Organization

The dissertation is organized as follows. Chapter 2 presents related approaches for signal selection, trace compression and test generation. Chapter 3 describes our trace signal selection algorithm. Chapter 4 explores an efficient combination of trace and scan signals. Chapter 5 describes our signal selection technique that focuses on error detection. Chapter 6 presents our dynamic signal selection algorithm. A trace compression technique based on static dictionary selection is presented in Chapter 7. Chapter 8 describes observability-aware test generation techniques. Finally, Chapter 9 concludes the dissertation.

PAGE 27

CHAPTER 2
BACKGROUND AND RELATED APPROACHES

The focus of this dissertation is to reduce the overall effort of post-silicon validation, which is one of the most critical stages in System-on-Chip (SoC) design methodology. Post-silicon validation and debug comprises signal observation and analysis as described in Section 1.2. A primary problem for post-silicon debug is the limited observability of internal signal states since the chip has already been fabricated. Once the signal states are known, they can be analyzed using algorithms like failure propagation tracing [17] to identify the errors in the circuit. Koushanfar et al. [18] proposed a method to obtain the internal states of a system using a golden cut. However, their method is not applicable for post-silicon debug since it is difficult to stop execution of a process running on a chip and obtain the knowledge of all the current signal states. Formal analysis for post-silicon debug proposed by De Paula [19] is not applicable to circuits with a large number of gates. Physical probing techniques were proposed by Nataraj et al. [20]. Decrease in feature size and growing complexity of IC designs have made it difficult to implement these techniques in practice. A method for validation of the memory subsystem in CMPs was proposed by DeOrio [21], which only focuses on the memory subsystem. Scan based debugging techniques such as [11] are not appropriate since they require the circuit functionality to be stopped while the scan data is being written. This is particularly not beneficial in cases where the functional errors are drastically apart. Double buffering [22] of scan elements helps to mitigate this problem, but with a large area penalty.

Design-for-Debug (DfD) techniques have been used extensively to increase the observability of internal signals of the silicon. Generally this is performed by sampling the data which is stored in on-chip trace buffers. Various DfD techniques like embedded logic analyzer (ELA) [23] and shadow flip-flops [22] have been proposed over the years for post-silicon debug. ELA can be used to probe into the chip and record some internal

PAGE 28

logic states. The trace is then recorded in an on-chip trace buffer. During debug, the contents of the trace buffer are transferred to an offline debugger via a Joint Test Action Group (JTAG) interface. In the following subsections, we discuss related approaches in the area of signal selection, trace compression and test generation.

2.1 Trace Signal Selection

Since ELA allows only a few signals to be traced, they should be carefully selected in order to enhance the overall observability during post-silicon validation. A logic implication based trace signal selection method was proposed by Prabhakar et al. [24]. The authors used the primary inputs, in addition to the traced signals, for restoration purposes. Ko et al. [5] and Liu et al. [6] have proposed generic trace signal selection algorithms in which a few important signals can be traced and others can be reconstructed from them. All these approaches use a gate-level netlist model of a design for signal selection purposes. Both space and time complexity can be reduced if the same operation is performed at higher abstraction levels like RTL. However, care should be taken to avoid degradation of restoration performance.

Considerable research has been performed over the years involving both gate and RTL-level designs in the fields of testing and validation. An RTL fault grading approach was used to ameliorate the gate-level fault coverage by Mao et al. [25]. RTL-level tests were generated and reused for detecting gate-level stuck-at faults by Yogi et al. [26]. Recently, RTL-level signal selection algorithms for post-silicon validation were proposed by Ko et al. [8]. However, [8] selects some signals from the RTL-level and then the rest from the gate-level description of the circuit. Thus, both RTL-level and gate-level models of the design are necessary in [8] to select signals, which adds to the memory and time overhead of signal selection.

Scan chains have been used for improving the signal observability during post-silicon debug [9 11]. A combination of scan and trace signals for post-silicon debug was first proposed by [12]. In their approach, the trace buffer is used to determine

PAGE 29

the time window over which the bug might have occurred. The experiment is then re-run with the scan data concentrating on that particular time window. A combination of trace and scan data was also used by [13] for silicon debug. They used multiple runs of the same experiment to obtain the trace data. The scan data were used to select a known state. Both of these approaches assume repeatable experiments, i.e., the circuit response is uniform for multiple debug runs. Ko et al. [27] proposed an approach of combining scan and trace data that works well even in non-repeatable experiments. However, their method used an exhaustive exploration of all possible trace-scan combinations to determine the best result for a particular circuit. This exhaustive exploration may not be suitable for practical purposes.

2.2 Dynamic Signal Selection

Existing trace signal selection algorithms statically select a set of signals that are traced every cycle. Prabhakar et al. [28] proposed an approach to alternate between two sets of signals in alternate cycles. As a result, it is a very specific case of temporal distribution of errors without any consideration for spatial distribution. A multiplexed signal selection for error detection was proposed by Liu et al. [29]. Their approach is an ad-hoc signal selection heuristic based on an error visibility metric. It does not consider the challenges associated with dynamic signal selection in the presence of spatial and temporal distribution of errors.

2.3 Trace Data Compression

To increase the amount of data that can be stored in a trace buffer while keeping the trace buffer size constant, trace compression techniques have been proposed [14 16], which compress the trace data before storing them into the trace buffer. This enables us to observe more trace data while keeping the trace buffer size constant. The trace buffer has two parameters: width and depth. Width refers to the number of signals whose states can be stored every cycle, while depth refers to the number of cycles over which the trace is stored. Existing trace compression approaches differ in terms of

PAGE 30

compression objectives: while [16] compresses the width of the trace buffer, [14, 15] compress the depth.

2.4 Observability-Aware Test Generation

Soft errors and crosstalk faults are two major electrical errors found in a fabricated SoC. The effect of soft errors on memory devices was studied as early as 1979 by May et al. [30]. Over the years, researchers [31 34] have studied the various aspects of soft errors. Sanyal et al. [35, 36] have proposed different methods for directed test generation for soft errors. However, these approaches are not designed for post-silicon validation purposes, that is, they assume all the output signals of a logic block are visible. During post-silicon validation, since the chip is fabricated, observing the output signals of every component may not be feasible since these components can be embedded in an SoC. The only observable points would be the trace signals. The test generation algorithms need to be modified to take this into account.

Crosstalk faults occur when two lines in a circuit are so near that their mutual capacitance affects their state. Effects of crosstalk faults on digital circuits [37 40] have been studied extensively. Existing test generation algorithms for crosstalk faults [41 43] suffer from the same problem as the corresponding test generation algorithms for soft errors, that is, they are not suitable for application in post-silicon validation due to limited observability through trace signals.

PAGE 31

CHAPTER 3
RESTORATION-AWARE TRACE SIGNAL SELECTION TECHNIQUES

During post-silicon debug, the signal states are stored in an on-chip trace buffer. Limited size of the trace buffer allows only some signals to be traced. The rest of the signals have to be reconstructed from them. Therefore, the signals to be traced should be carefully selected in order to enhance the overall signal restoration across the circuit. Existing signal selection approaches [5, 6], which utilize partial restorability¹, are not able to provide the best possible signal reconstruction. We propose total restorability² based signal selection algorithms that can outperform existing approaches in both signal selection time and the quality of the selected signals.

Signal selection at higher abstraction levels is promising to reduce the overall signal selection time. Since the number of variables used to represent the registers and other signals is less at RTL-level, the overall complexity is much reduced. We have proposed an efficient signal selection approach in RTL level. Our RTL-level signal selection reduces the signal selection time as well as the memory requirements significantly without significant penalty in restoration performance. Our proposed method has similarity with the signal selection approach developed by [8] since both use a control data flow graph. However, there is a basic difference between the two approaches. While [8] selects some signals from the RTL-level and the rest from the gate-level description, our proposed approach operates only on the RTL-level description. Hence, unlike [8], it does not require the gate-level model of the circuit for signal selection.

¹ Partial Restorability of a signal refers to the probability that the signal value can be reconstructed using known values of some other traced signals.
² Total Restorability measures whether a group of signals can definitely reconstruct a set of signal states.
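The distinction drawn in the two footnotes can be illustrated with a small sketch of forward and backward restoration across a single 2-input AND gate in three-valued logic (0, 1, unknown). This is only an illustration under stated assumptions, not the simulator used in this dissertation: the function names `and_forward` and `and_backward`, and the use of `None` for an unknown state, are hypothetical choices.

```python
X = None  # unknown (unrestored) state; an illustrative convention


def and_forward(a, b):
    """Forward restoration: infer the AND output from its inputs."""
    if a == 0 or b == 0:
        return 0  # a single controlling input (0) fixes the output
    if a == 1 and b == 1:
        return 1  # all inputs are known and non-controlling
    return X      # not enough information to restore the output


def and_backward(out, a, b):
    """Backward restoration: infer the AND inputs from its output."""
    if out == 1:
        return 1, 1  # an output of 1 forces every input to 1
    if out == 0 and a == 1 and b is X:
        return 1, 0  # the other input must be the controlling 0
    if out == 0 and b == 1 and a is X:
        return 0, 1
    return a, b      # an output of 0 alone cannot pin down the inputs


print(and_forward(0, X))      # one known input suffices only if it is 0
print(and_forward(1, X))      # otherwise the output stays unknown
print(and_backward(1, X, X))  # both inputs restored
print(and_backward(0, X, X))  # nothing restored
```

In this sketch, knowing one input restores the output only when that input happens to be the controlling value, which is the probabilistic situation captured by partial restorability. Knowing both inputs always restores the output, which corresponds to the guarantee that total restorability looks for.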

PAGE 32

3.1 Gate-level Signal Selection (GSS)

Algorithm 1 shows our gate-level signal selection procedure (GSS) that has five important steps. Edge and node values are calculated in the first two steps. Total restorability computation is then used to create the region and recompute node values, accompanied by signal selection. The remainder of this section describes each of the steps in detail.

Algorithm 1: Gate-level Signal Selection
  Input: Circuit, Trace Buffer
  Output: List of selected signals S (initially empty)
  1: Compute the node values.
  2: Find the state element with highest value and add to S.
  3: Create Initial Region.
  while trace buffer is not full do
    4: Recompute the node values of state elements.
    5: Compute region growth by finding the state element with highest value not in S and add to S.
  end
  return S

3.1.1 Computation of Edge Values

An edge between two state elements is the path taken to reach an element from another, while passing through a number of combinational gates between them, that is, there cannot be any state elements in between them. The edge may be in the forward or backward direction. In Figure 1-4, an edge between the two flip-flops A and C passes through an OR gate. In a general case, there can be any number and type of combinational gates in an edge. To find the probability that C is influenced by the value at A (which is the value of the edge AC), there can be two cases (independent and dependent) as discussed below:³

³ We are showing calculations for forward restorabilities; however, those for backward restorabilities can be derived along similar lines.

PAGE 33

3.1.1.1 Independent signals

Consider two edges AC and BC in Figure 1-4. Here, the two input signals of the OR gate in front of flip-flop C are driven by flip-flops A and B, which are independent. Hence, the edges AC and BC are independent. To calculate the edge values for an independent scenario, we use a generic example in Figure 3-1. Later, we will show how the calculation works for the specific case in Figure 1-4.

Figure 3-1. Example circuit with n gates

Figure 3-1 has two flip-flops K and L. We want to find how the input of L is sensitized by the output of K. The input of L corresponds to the output of the gate $G_n$. The path from K to L is independent of any other paths through which the output of K propagates. Let's consider the gate $G_1$. We define four probabilities: $P_{I0,N}$, $P_{I1,N}$, $P_{O0,N}$ and $P_{O1,N}$. Here, $P_{I0,N}$ indicates the probability that a node N (gate or flip-flop) has an input value of '0' when another node is controlling it. Similarly, $P_{I1,N}$, $P_{O0,N}$ and $P_{O1,N}$ indicate the cases for input value of '1', output value of '0' and '1', respectively. The output of flip-flop K can influence the output of $G_1$ in two cases: i) output of K is a controlling value, and ii) all the inputs to $G_1$ are complement of the controlling value. Let us consider $G_1$ to be a 2-input AND gate. We define $P_{G1}$ as the overall probability of K controlling $G_1$. According to [44],

$$P_{G1} = P_{O1,G1} + P_{O0,G1} \qquad (3)$$

Now, let's define $P_{O0,G1}$ and $P_{O1,G1}$. Let $P_{cond0,G1}$ and $P_{cond1,G1}$ be the probability that the output of $G_1$ follows the output of K, i.e., the output of $G_1$ is 0 (1), when the output of K is 0 (1). For simplicity of calculation, in this example, we assume $P_{I0,G1} = P_{I1,G1} = 0.5$

PAGE 34

(that is, occurrence of 0 or 1 follows equal probability at the input).

$$P_{O0/1,G1} = P_{cond0/1,G1} \times P_{I0/1,G1} \qquad (3)$$

Now, for a 2-input AND gate, $P_{cond0,G1}$ is 1, since 0 is the controlling input. Therefore, we obtain $P_{O0,G1} = 0.5$. Similarly, since 1 is the non-controlling input, $P_{cond1,G1}$ is 0.5, which gives $P_{O1,G1} = 0.25$. From Equation 3 , it can be seen that $P_{G1} = 0.75$.

Now, we return to our main goal, that is, to determine how K controls L. We first find the effect of the output from K as it propagates to the next gate $G_2$ and then extrapolate along the entire path to L. We use the same set of Equations 3 and 3 again, except that the input is $G_1$ here and the output is $G_2$. Obviously, the values of $P_{I0,G2}$ and $P_{I1,G2}$ would be $P_{O0,G1}$ and $P_{O1,G1}$ obtained from Equation 3 . For example, if $G_2$ is also a 2-input AND gate, applying Equation 3 , we obtain $P_{O0,G2} = 0.5$ and $P_{O1,G2} = 0.125$. Therefore, we get $P_{G2} = 0.625$, where $P_{G2}$ is the probability for the gate $G_2$ defined in Equation 3 . In this way, the calculation continues until we reach L, to obtain the value of the edge KL. If there are n combinational gates between K and L, we get

$$P_{O0/1,Gn} = \prod_{1 \le i \le n} \left( P_{cond0/1,Gi} \right) \times P_{I0/1,G1} \qquad (3)$$

Finally, Equation 3 is used to compute the probability $P_{Gn}$, which corresponds to the value of the edge KL.

We use these computations to show how an edge value is computed in case of the circuit in Figure 1-4. Let's compute the value of edge AC. We name the OR gate in between the two as gate G and we assume that $P_{I0,G} = P_{I1,G} = 0.5$. Since it is an OR gate, $P_{cond0,G} = 0.5$ and $P_{cond1,G} = 1$. Therefore, Equation 3 can be used to obtain $P_{O0,G} = 0.25$ and $P_{O1,G} = 0.5$. Equation 3 can now be used to obtain $P_G = 0.75$, which represents the value of the edge AC.
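The edge-value calculation for independent signals can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: the names `propagate` and `edge_value` are hypothetical, only AND and OR gates are handled, and the side inputs of each gate are assumed to be 0 or 1 with equal probability, so that the non-controlling conditional probability of an n-input gate is taken as $0.5^{n-1}$ (which matches the value 0.5 used above for 2-input gates).

```python
# Controlling input value for each supported gate type.
CONTROLLING = {"AND": 0, "OR": 1}


def propagate(gate, n_inputs, p_in0, p_in1):
    """Apply the product rule P_O = P_cond * P_I across one gate."""
    c = CONTROLLING[gate]
    # A controlling value always passes through (P_cond = 1). A
    # non-controlling value passes only if every other input is also
    # non-controlling (assumed equiprobable side inputs).
    side = 0.5 ** (n_inputs - 1)
    p_cond = {c: 1.0, 1 - c: side}
    return p_cond[0] * p_in0, p_cond[1] * p_in1


def edge_value(gates, p0=0.5, p1=0.5):
    """Edge value P_G = P_O0 + P_O1 at the end of a chain of gates.

    `gates` lists (gate type, number of inputs) from source to sink.
    """
    for gate, n_inputs in gates:
        p0, p1 = propagate(gate, n_inputs, p0, p1)
    return p0 + p1


print(edge_value([("AND", 2)]))              # 0.75  (P_G1 above)
print(edge_value([("AND", 2), ("AND", 2)]))  # 0.625 (P_G2 above)
print(edge_value([("OR", 2)]))               # 0.75  (edge AC)
```

The three calls reproduce the worked values from the text: 0.75 for a single 2-input AND gate, 0.625 after a second AND gate, and 0.75 for the OR gate on edge AC.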

PAGE 35

3.1.1.2 Dependent signals

In case of dependent signals, we need to determine the probability of a state element output influencing an m-input gate, when the output of the state element affects l inputs ($l \ge 2$) of the gate. We have used a generic example in Figure 3-2 to calculate the edge value in case of dependent signals. It should be noted that dependent signals were not considered by [5] or [6].

Figure 3-2. Example circuit

Let's consider Figure 3-2. It can be seen that two inputs (x, y) of the m-input gate $G_n$ are affected by flip-flop K. For this, our goal would be to combine the dependent edges so that the edge will have independent signals. We can then easily utilize the formula used in Section 3.1.1.1 to compute the edge value. We desire to find $P_{O1,Gn}$ and $P_{O0,Gn}$, in line with the parameter $P_{I/O\,0/1,N}$ defined in Section 3.1.1.1. Let us assume that $G_n$ is an AND gate. For an AND gate, since 0 is the controlling value, having either of the inputs as 0 will ensure a 0 being propagated into the gate $G_n$. Therefore

$$P_{I0,Gn} = P_{O0,x} + P_{O0,y} - P_{O0,x\&y} \qquad (3)$$

$P_{O0,x\&y}$ subtracts the probability when both are 0, since it is being computed twice. Similarly, since 1 is the non-controlling input, we get

$$P_{I1,Gn} = P_{O1,x\&y} \qquad (3)$$

where $P_{O1,x\&y}$ is the probability when both x and y are '1'. Let's evaluate the terms $P_{O0,x\&y}$ and $P_{O1,x\&y}$. Let $P_{cond0/1,x/y}$ be the probabilities that x (y) is 0 (1) when the

PAGE 36

output of K is 0 (1). $P_{O0/1,x\&y}$ can be defined as

$$P_{O0/1,x\&y} = \left( P_{cond0/1,x} \times P_{cond0/1,y} \right) \times P_{O0/1,K}$$

With the help of Equation 3 , this can be reduced to

$$P_{O0/1,x\&y} = \frac{P_{O0/1,x} \times P_{O0/1,y}}{P_{O0/1,K}} \qquad (3)$$

Since the paths from K to x and from K to y are assumed to be independent⁴, Equation 3 can be used to obtain the values $P_{O0/1,x/y}$. Application of Equations 3 and 3 provides the values of $P_{I0/1,Gn}$. The final $P_{Gn}$ can be obtained using Equations 3 and 3 , and the information on the number of inputs to the gate $G_n$. This corresponds to the value of the edge KL.

3.1.1.3 Example

We now proceed to show how the calculations described in Section 3.1.1.1 and Section 3.1.1.2 can be used to determine the edge values for the circuit in Figure 1-4. A graphical representation of the circuit is shown in Figure 3-3.

Figure 3-3. Graphical representation of example circuit

The state elements are represented by nodes and an edge between two state elements is represented by a straight line. It should be noted that there are no

⁴ If any one of these paths consists of dependent signals, the above procedure can be applied in a recursive manner until it becomes an equivalent independent path.

PAGE 37

dependent edges in this example. All the edges have one two-input gate in between them. As a result, all the edge values are 3/4 (obtained from Section 3.1.1.1). We will use this graph to explain our signal selection algorithm.

3.1.2 Initial Value Computation for State Elements

We define the value of a state element as the sum of all the edges attached to it, in both forward and backward direction. For example, in Figure 3-3, the value of flip-flop C is the sum of the weights of all edges connected with it, that is, CA, CB, CD and CE. It is important to note that we have used a threshold in order to prevent combinational loops inside the circuit during edge value computation. This parameter was used by [5] as well. Our computation of the state element values is independent of the sequential loops in the circuit. In a sequential loop, the output of a state element depends on another in both the previous and the next cycle. However, both cannot be true at the same clock cycle; that is, the same state element cannot determine the output of another in the same cycle by both forward and backward restoration. While forward restoration can determine the state in at least the next cycle, backward restoration can determine it in at most the previous cycle.

3.1.3 Initial Region Creation

A region is a collection of state elements attached together. It is not necessary that all the state elements have an edge with each other in the region. However, each state element in the region must have at least one edge with another state element in the region. In Figure 3-3, the flip-flops A, B, C, D and E form a region. The first state element to be chosen is the one with the highest value, based on the calculations in Section 3.1.2. It is added to a list called known. Now, all state elements which have an edge with the recently selected element are added to the region. We show by an example in Figure 3-4A how this portion of our algorithm is used to perform the selection of the profitable signals. The edge values are shown along each

PAGE 38

edge. The values of the flip-flops (addition of all their edge values) are shown in bold alongside each flip-flop. For example, A has 3 edges AC, AD and AG, each having a value 3/4. Therefore, the value for A is 3/4 + 3/4 + 3/4 = 9/4. The flip-flop with the highest value in Figure 3-4A is C. All the nodes which have an edge from C are included in the region. The region is represented by the spline in Figure 3-4A.

Figure 3-4. Region creation and growth. (A) Initial region creation. (B) Region growth.

3.1.4 Recomputation of Node Values

The first state element in Figure 3-3 to be traced is already known (C in the previous example). However, there are other state elements that need to be traced as well. To select the subsequent state elements, their values are recomputed. The state element whose value is being computed may have an edge to an element inside the region as well as one outside the region. Edges to state elements inside the region are given higher weight. As discussed before, many restorability computations require knowledge of more than one signal of the input/output⁵. Therefore, it is better to gain more knowledge of the signals that are already in the region, thus increasing their restorability values and therefore aiming for total restorability of those signals. Existing approaches [5] and [6] recompute the restorability values after each iteration,

⁵ For example, when all the inputs to a gate are complement of the controlling value.

PAGE 39

which when translated to the graph in Figure 3-3, would correspond to edge value recomputation. Clearly, this is more computationally intensive.

3.1.5 Region Growth

The state element with the highest restorability and not in the list known is determined. If two state elements have the same value, the one with the higher forward restoration is traced. This is because backward restoration fails in some cases whereas forward restoration does not when all the inputs are known. For example, in Figure 3-4A, the next state element to be traced is A. It is included in the list known. If the trace buffer is already full, calculations will stop; otherwise the region continues to grow. All state elements having an edge to the recently selected node are added to the region. As shown in Figure 3-4B, in this case G is added since G is the only node connected to A and not in the region. The dotted line indicates the original region. Next, the recomputation of state element values described in Section 3.1.4 is repeated, and this process is iterated until the trace buffer is full.

3.1.6 Complexity Analysis

In this section, we compute the complexity of our algorithm. Let V be the number of state elements in the circuit and E be the number of edges in the circuit. Let N be the number of signals to be traced, that is, the size of the trace buffer is N. The first step, that is, edge value computation, takes O(E) time, while the flip-flop value computation performed each time a signal is selected takes O(V) time. To select N signals, the time required is O(NV). Therefore, the overall time complexity of our algorithm is O(E + NV). On the other hand, the time complexity of existing algorithms is O(NE). Since E >> V, the time complexity of our proposed algorithm is less. The overall space complexity of our algorithm is O(E + N + V). Since E >> N + V, the space complexity reduces to O(E).

3.2 Motivational Example

We now employ our proposed method (described in Section 3.1) for selecting signals in the circuit in Figure 1-4. The first signal that we trace is C. Note that this

PAGE 40

was the same signal that was chosen by [5] and [6]. The second signal that we choose is A, based on total restorability computations. Tracing A along with C guarantees to reconstruct D every cycle. As indicated earlier, existing methods select F as the second traced signal. Clearly, C and F together do not provide any such guarantees. The results are shown in Table 3-1. It can be seen that our method provides a restoration ratio of 3.2, which is better than the one provided by [5] and [6].

Table 3-1. Restored signals using our method

Signal   Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
A        0         0         0         0         1
B        1         0         1         0         X
C        1         1         0         1         0
D        X         0         0         0         0
E        X         1         0         0         0
F        X         X         1         0         0
G        X         0         0         0         0
H        X         X         0         0         0

3.3 RTL-level Signal Selection (RSS)

To show how signal reconstruction can be efficiently performed in RTL level, let's consider the Verilog design in Figure 3-5A. The design consists of three register variables, namely a, b and c (each corresponding to a set of flip-flops), as well as two input signals d and e. There are also three other signals m1, m2 and m3. In the example, a and b are 8 bits long, c and e are of 7 bits, while d is just a one-bit signal. To show how reconstruction is performed, let's observe how each of these flip-flops is assigned: a is the concatenated value of d and c; b is the result of logical operations between a, m1, m2 and m3; while c attains the sum of an arithmetic operation between e and a constant number. Let's assume that we trace the state of a. We now explain how tracing of a in cycle k helps us to reconstruct the other states. The assignment of b shows that the state of b in cycle k+1 can be reconstructed from the state of a by forward restoration. From the assignment of a, the states of c and d in cycle k-1 can be reconstructed

PAGE 41

from state of a by backward restoration. Finally, from the last statement, that is, the assignment of c, state of e can be restored in cycle k-2 by backward restoration. Thus, we see that tracing of only one state of a can reconstruct the states of 4 other variables in different cycles.

Figure 3-5. Verilog code and CDFG. (A) RTL Verilog example. (B) CDFG of Verilog code.

Algorithm 2 shows our signal selection procedure that has six important steps. In the first step, a control data flow graph (CDFG) is generated to model the entire system. Since in the RTL-level each register variable represents multiple state elements, we use the register variables for signal selection purposes. However, the trace buffer width refers to the total number of state elements represented by these register variables. For example, the register variable [7:0] a will represent 8 state elements, and therefore selection of variable a implies that 8 trace buffer locations are needed. The relationship between the different register variables is obtained from the CDFG. These relations are used to produce the total restorability values for the variables. The register variable with the highest value is chosen for tracing. Once a variable is chosen for tracing, all the other variable values are recomputed in the same manner as in Algorithm 1. The steps

PAGE 42

4-6 are continued until the trace buffer is full. The remainder of this section describes each of the steps in detail.

Algorithm 2: RTL-level Signal Selection
  Input: RTL description of design, No. of trace entries
  Output: List of selected signals S (initially empty)
  1: Develop the CDFG of the RTL description.
  2: Find the relationship between the register variables.
  3: Find the initial values of the register variables.
  while trace buffer is not full do
    4: Find the register variable with the highest value.
    5: Add all corresponding state elements to the list S.
    6: Recompute values for all the register variables.
  end
  return S

3.3.1 CDFG Generation

The first step of RTL level signal selection is to generate the Control Data Flow Graph (CDFG) from the RTL model. CDFG can be generated using any standard HDL parser. For our use, we have generated the CDFG by modifying the open source Icarus Verilog parser [45] for the Verilog circuits. Although our studies are based on Verilog benchmarks, our approach is also applicable for VHDL designs. The format of our CDFG representation is similar to Mohanty et al. [46]. Figure 3-5B shows the CDFG representation of the Verilog code in Figure 3-5A. The CDFG can represent both the movement of control signals as well as data values. The dotted arrows indicate the control-flow (transitions) in the CDFG, while the bold arrows represent the data flow (computations). For example, in the right hand side of Figure 3-5B, there is a bold arrow from a to the AND gate. This is because a is an input of the AND gate. The circles in the CDFG represent operational and control nodes, while the boxes represent storage nodes. For example, the circle in the top represents an OR assignment for the conditional statement always, while the square at the bottom in right hand side of Figure 3-5B represents the storage in the node b. It should be noted that direct assignments like a <= 7'b0 are just represented as a bold arrow with value 0

PAGE 43

entering a box for storage of value a. In this case, since three variables a, b and c are all being assigned 0 together, they are grouped in a single box. This basic representation can be further extended to represent the CDFG of a complex design. This CDFG representation is used as input to the next step, relationship computation.

3.3.2 Relationship Computation

The relationship of a signal with others can be obtained from the CDFG. The relationship computation for a single signal in the circuit provides the effect of that signal on others. To compute the relationship of the signals, we first note that there can be two main relationship types, namely direct relationship and conditional relationship. These two classes and their respective relationship computations are explained as follows.

3.3.2.1 Direct relationship

Two signals are said to be directly dependent when they occur on the same line of a signal assignment. For example, in the RTL description shown in Figure 3-5A, the signal pairs (a, b) and (a, c) have direct relationship. This is because both the variable assignments occur inside the 'if' block. Direct relationship can be of two types, namely forward and backward relationship. Forward relationship deals with the propagation of values in the forward direction, that is, from the right hand side of the assignment to the left hand side. Backward relationship, on the other hand, deals with the reverse, that is, from the left hand side to the right hand side of the assignment. For example, in the RTL description of the example in Figure 3-5B, a has a forward relationship on b, while b has a backward relationship on a. We will use a simple generic example to show how the direct relationship computation is performed. Later, we will consider a specific example shown in Figure 3-5B. A typical signal assignment statement looks like

  y <= x1 OP1 x2 OP2 x3 OP3 ....... xn

PAGE 44

where OP represents any operation (e.g., AND, OR, etc.). We can see that there are n signals on the right hand side of the assignment statement. We want to find out the relationship of each of these signals on y. Let us assume that each of these signals is k bits long and all the xi's are independent. We also assume that each of the OPi's is an AND gate. Therefore, the assignment statement can be rewritten as

  y <= x1 & x2 & ........... & xn

The same computations can be extended to other operations, as well as different operations for each OPi. Let's compute the relationship of xi (1
PAGE 45

possible cases of value assignments to the x's. These calculations, as stated above, can be extended for other operations as well. For example, if OP was an OR operation, Equations 3 and 3 will be modified as

$$P^{y}_{1,x_i} = \frac{1}{2^{kn}} \qquad (3)$$

$$P^{y}_{0,x_i} = \frac{2^{k(n-1)}}{2^{kn}} \qquad (3)$$

Figure 3-6. A portion of the CDFG in Figure 3-5B

We now apply these computations to the specific example in Figure 3-5B. Figure 3-6 shows a portion of Figure 3-5B that shows dependency of b and c on a. We assume that all the signals are independent here. Two new variables g1 and g2 are introduced for the ease of illustration. Clearly, we have

  g1 = a & m1
  g2 = m2 & m3
  b = g1 | g2

PAGE 46

To find the relationship of a on b, we first find the relationship of a on g1, and then g1 on b. It can be seen from Equation 3 and 3 ,

$$P^{g1}_{0,a} = \frac{2^{8}}{2^{16}} \qquad P^{g1}_{1,a} = \frac{1}{2^{16}}$$

To find the second part, that is, the relationship of g1 on b, we use Equations 3 and 3 ,

$$P^{b}_{1,g1} = \frac{1}{2^{16}} \qquad P^{b}_{0,g1} = \frac{2^{8}}{2^{16}}$$

Combining these two sets of equations, we get the relationship of a on b as

$$P^{b}_{1,a} = \frac{1}{2^{16}} \times \frac{2^{8}}{2^{16}} \qquad P^{b}_{0,a} = \frac{1}{2^{16}} \times \frac{2^{8}}{2^{16}}$$

Finally, using Equation 3 , we get

$$P^{b}_{a} = \frac{1}{2^{23}} \approx 0.0000001$$

Since c and a have a direct concatenation relationship, the value of $P^{a}_{c}$ is obtained as 1.0. Since there is no direct assignment relationship between b and c, there is no edge between them. Thus, we obtain the values of edges ba and ac of Figure 3-6. Figure 3-7 shows a simplified version of Figure 3-6 with the edge values (shown below the edges). The node values are computed by adding the edge values. For example, the node c is connected to only one edge with value 1.0; therefore, the node value of c is 1.0. Similarly, the node value for a is 1.0 + 0.0000001 = 1.0000001. Likewise, the node value for b is 0.0000001. These node values are shown in the figure (on top of nodes).

In this section, we have described the direct relationship when the signals are independent. However, similar to Section 3.1.1.2, we can have dependent signals as

PAGE 47

well. The nature of dependent signals is derived from multiple branches of the CDFG. Computations for dependent signals are similar to the computations in Section 3.1.1.2.

Figure 3-7. Simplified version of Figure 3-6

3.3.2.2 Conditional relationship

Conditional relationship corresponds to the non-assignment dependencies. For example, in the RTL code corresponding to Figure 3-5B, the signals a and b have conditional relationship on reset⁶. We generally do not consider backward conditional relationship, since these are not in direct assignment statements. Conditional relationships are computed in the same way as in Equation 3 ; however, the operations are checked inside the conditional block. For example, we consider the following code:

  if (m or n) x <= y;

Here, the symbol x has a conditional dependence on m and n. Since there are only two variables m and n in the conditional dependency, the dependency value is 3/4, as obtained from Equations 3 and 3 in Section 3.1.1.1. The conditional relationships are computed in this manner for all the signals in the circuit.

3.3.3 Signal Selection

Once the values of the variables are computed, the next step is to select the best one for tracing. The signal selection procedure is similar to the gate-level signal selection. The variable with the highest value is selected and the rest of the values are recomputed using region growth. This part is the same as in Algorithm 1 and hence not

⁶ It should be noted that we do not consider conditional relationship of general control signals like clock (clk) or reset.

PAGE 48

discussed here. In Figure 3-7, register variable a, having the highest value, is chosen for tracing. The process continues until the trace buffer is full.

3.4 Experiments

In this section, we first compare our approach with existing gate-level signal selection techniques [5, 6]. Next, we demonstrate how our proposed RTL-level signal selection can further improve signal selection time.

3.4.1 Experimental Setup

We applied our gate-level signal selection approach (GSS) on the ISCAS'89 benchmarks used by [5] and [6] to compare with their methods and hence show the effectiveness of our algorithm. We have designed a simulator along the lines of the one described by [6] for our purpose. We have implemented the simulator as an iterative process which terminates when it is not possible to restore any more states. We have fed the simulator with 10 sets of random values and noted the average restoration ratio.

Figure 3-8. Overview of our experiments to verify RSS
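As a concrete illustration of the metric reported by the simulator, the sketch below computes a restoration ratio from a table of signal states. It assumes that the restoration ratio is the number of known states (traced plus restored) divided by the number of traced states; this definition is an assumption here, chosen because it reproduces the value 3.2 quoted for Table 3-1 in Section 3.2. The function names and the string encoding of the table are hypothetical.

```python
def restoration_ratio(states, traced):
    """(traced + restored states) / traced states; 'X' marks an unknown state."""
    known = sum(value != "X" for row in states.values() for value in row)
    traced_states = sum(len(states[name]) for name in traced)
    return known / traced_states


def average_ratio(ratios):
    """Average over several runs, e.g. one per set of random input values."""
    return sum(ratios) / len(ratios)


# Table 3-1: one string per signal, one character per cycle.
TABLE_3_1 = {
    "A": "00001",
    "B": "1010X",
    "C": "11010",
    "D": "X0000",
    "E": "X1000",
    "F": "XX100",
    "G": "X0000",
    "H": "XX000",
}

print(restoration_ratio(TABLE_3_1, traced=["A", "C"]))  # 3.2
```

With A and C traced for five cycles there are 10 traced states and 32 known states in total, which gives 32 / 10 = 3.2.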

PAGE 49

Figure 3-8 gives an overview of our experimental setup to verify the RTL-level signal selection algorithm (RSS). For this purpose, we have used Verilog circuits obtained from the Opencores website [47]. It should be noted that we have not used the ISCAS'89 benchmarks since RTL descriptions of these were not available. We have modified the Icarus Verilog parser [45] to generate the Control Data Flow Graph (CDFG). The CDFG is then parsed by another program, which provides the list of selected signals using Algorithm 2. As can be seen in Figure 3-8, we applied GSS and RSS to the same circuits to compare the restoration performance of each approach. The signals selected using RSS are mapped to gate-level and the restoration performance is noted. Simultaneously, the RTL design is synthesized to a gate-level netlist (footnote 7), and GSS is applied to the netlist. A comparison of the restoration performance using the two algorithms reveals that they are almost the same, as discussed in Section 3.4.3. Thus, our RTL-level signal selection algorithm does not incur any significant restoration penalty compared to the gate-level signal selection algorithm.

3.4.2 Results on Gate-level Signal Selection (GSS)

Table 3-2. Comparison with Ko et al.

            Restoration ratio with random inputs      Restoration ratio with deterministic inputs
Circuit     Ko et al.   Our approach   Improvement    Ko et al.   Our approach   Improvement
s38584      38          42             1.1            6           20             3.33
s38417      9           16             1.8            9           16             1.8
s35932      48          50             1.04           25          35             1.4

We would like to compare our signal selection approach with the other closely related methods. Table 3-2 compares the performance of our approach with the one proposed by Ko et al. [5] using the three largest ISCAS'89 benchmark circuits. All the

Footnote 7: Any synthesis tool can be used for this purpose; we have used Synopsys Design Compiler.

PAGE 50

experiments have been performed with a trace buffer of width 32. Table 3-2 is divided into three distinct parts. The first column indicates the circuit name. The next three columns compare the performance when random sets of inputs are used to drive the circuits. In this case, even the control signals are driven using random inputs. As a result, the circuit might fall into one of the reset states. The improvement is defined as the ratio between the restoration ratio using our approach and that of [5]. The third part of Table 3-2 compares our approach with [5] when the gates of the circuit are driven deterministically. This means that the control signals are driven using values that prevent the circuit from going to a reset state, while the other signals are driven with random inputs. From Table 3-2, it can be seen that the improvement obtained using random inputs is moderate (31% on average). On the other hand, considerable gain (117% on average) is obtained when we use our algorithm with deterministic inputs. As discussed earlier, random inputs to control signals might lead to reset states, which are responsible for high restoration for both approaches. Therefore, the improvement obtained is smaller in this case. As stated in [6], deterministic inputs are actually used in circuits during real-life applications. Hence, the gain obtained with them is more significant.

Table 3-3 compares the restoration ratio of our proposed approach with the one proposed by Liu et al. [6] for the three largest ISCAS'89 benchmarks. As before, a trace buffer width of 32 is used. In this case, the inputs are deterministic in nature. An average improvement of 65% is observed. It can be seen that the improvement here is less than the one obtained in Table 3-2. This can be attributed to the fact that the algorithm proposed by [6] is more efficient than [5].

Table 3-3. Comparison with Liu et al. with deterministic inputs

            Restoration Ratio
Circuit     Liu et al.   Our approach   Improvement
s38584      9            20             2.22
s38417      14           16             1.14
s35932      22           35             1.6
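The average gains quoted above can be recomputed from the table entries. The short sketch below is only a sanity check of that arithmetic: the dictionaries restate the restoration ratios of Tables 3-2 and 3-3 as (baseline, our approach) pairs, and the helper name `average_gain_percent` is an assumption introduced for this example.

```python
# (baseline restoration ratio, our restoration ratio) per circuit
TABLE_3_2_RANDOM = {"s38584": (38, 42), "s38417": (9, 16), "s35932": (48, 50)}
TABLE_3_2_DETERMINISTIC = {"s38584": (6, 20), "s38417": (9, 16), "s35932": (25, 35)}
TABLE_3_3_DETERMINISTIC = {"s38584": (9, 20), "s38417": (14, 16), "s35932": (22, 35)}


def average_gain_percent(rows):
    """Mean of (ours / baseline) over all circuits, expressed as a percentage gain."""
    ratios = [ours / baseline for baseline, ours in rows.values()]
    return (sum(ratios) / len(ratios) - 1.0) * 100.0


gain_ko_random = average_gain_percent(TABLE_3_2_RANDOM)
gain_ko_deterministic = average_gain_percent(TABLE_3_2_DETERMINISTIC)
gain_liu_deterministic = average_gain_percent(TABLE_3_3_DETERMINISTIC)
```

Rounded to whole percentages, the three values agree with the 31%, 117% and 65% average improvements stated in the text.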

PAGE 51

We now compare our approach with the one proposed by Prabhakar et al. [24]. Prabhakar et al. [24] used the primary inputs along with the traced signals for signal restoration. Until now, we used only the trace signals to restore the rest of the signals on the chip. However, to enable a fair comparison, we have included the primary inputs in our signal restoration. The results are shown in Table 3-4 using a trace buffer of width 32. It should be noted that the improvements are moderate (10% on average) in this case. When we use the primary inputs for restoration, most of the states at later clock cycles can be recovered. On the other hand, the states that the input test vectors cannot reach in early cycles due to state depth can be restored using the traced data. As reported in [24], about 90-95% of the states were restored using their method. Hence, the scope for improvement is limited.

Table 3-4. Comparison of GSS with Prabhakar et al.

            Restoration Ratio
Circuit     Prabhakar et al.   Our approach   Improvement
s5378       4.84               5.0            1.03
s9234       5.2                6.0            1.15
s15850      13.9               15.8           1.14
s38584      34.8               40.5           1.16
s35932      52.4               53.3           1.02

Figure 3-9 compares our signal selection time against the time taken by Ko et al. [5] and Liu et al. [6] for the three largest ISCAS'89 benchmark circuits. The X-axis denotes the different trace buffer widths. It can be seen that our approach takes significantly less time (up to 90% less) compared to them. This is primarily due to the fact that [5] and [6] recompute edge values in every iteration whereas we only recompute the node values.

Although our gate-level signal selection algorithm provides significant improvement over the existing approaches ([5] and [6]), it does not guarantee the maximum restoration possible using the same trace buffer size. For example, if we consider a trace buffer of width 32, GSS does not guarantee to choose the best 32 signals that can

PAGE 52

Figure 3-9. Comparison of Signal Selection Time

provide the maximum restoration possible. A detailed analysis is needed to determine the maximum restoration possible using a particular trace buffer size and the signals to be traced in order to obtain that restoration performance.

3.4.3 Results on RTL-level Signal Selection (RSS)

In this section, we discuss how our RTL-level signal selection algorithm can further improve the signal selection time without compromising the restoration ratio significantly. As discussed before, we have applied our approach to the designs obtained from the Opencores benchmarks. We have compared our RSS approach with the gate-level signal selection procedure (GSS). The results are shown in Table 3-5. Similar to the previous experiments, we have assumed a trace buffer of width 32.

Table 3-5. RTL-level versus gate-level signal selection

Circuit                    Memory Size Reduction   Speedup
Total CPU                  8.16                    97
Wishbone LCD controller    22.81                   1923
dmx512 transceiver         191.24                  733
OPB onewire                3.22                    3600
Simple RS232 UART          3.8                     500

PAGE 53

The first column in Table 3-5 provides the circuit name. The second column shows the memory size reduction, which is the ratio of the memory size at gate-level to that at RTL-level. The last column gives the speedup obtained using RSS compared to GSS. Speedup is defined as the ratio of gate-level signal selection time to RTL-level signal selection time. As can be seen, RSS is up to 3600 times faster and requires up to 191 times less memory compared to GSS.

Figure 3-10. Comparison of Restoration Performance

Finally, we compare the restoration performance of RSS and GSS using three Opencores benchmarks (OPB onewire, dmx512 transceiver and Wishbone LCD controller). The results are shown in Figure 3-10 when the benchmarks are driven using deterministic inputs. As we can see in Figure 3-10, the restoration performance for gate-level and RTL-level is similar. The gate-level restoration performance is found to be slightly better than the RTL-level in some cases. The primary reason for this is the representation of state elements as arrays at RTL-level. Whenever we select a signal for tracing, we are actually tracing all the elements in the array. However, all of the signals

PAGE 54

in the array may not be equally beneficial. Some other signals could have been selected for better restoration performance.

It should be noted that our proposed trace signal selection algorithm has a high temporal observability (since the signals are traced every cycle) but a low spatial observability. If the circuit does not have many dominating signals, or if the overall restoration capacity of the trace signals is low, tracing a small set of signals would not help in restoring many of the untraced signal states. A possible option to improve observability is to increase the set of trace signals. However, this would mean an increase in trace buffer size, which adds to the debug overhead. In order to trace more signals while keeping the trace buffer size the same, a possible alternative is to compromise on temporal observability but improve spatial observability. This alternative is explained in Chapter 4.

3.5 Summary

Efficient signal selection is important to enhance observability during post-silicon debug. We developed techniques that employ total restorability for selecting the most profitable signals, which provide better restoration than signals selected using partial restorability equations. We observed the performance of our gate-level and RTL-level signal selection algorithms using ISCAS'89 and Opencores benchmarks. Our experimental results demonstrated two major advantages: our approach can provide faster (up to 90%) signal selection as well as a significantly better (up to 3 times) restoration ratio compared to existing approaches. Our RTL-level signal selection approach can further improve signal selection time by several orders of magnitude and also requires less memory compared to the gate-level signal selection algorithms.

PAGE 55

CHAPTER 4
EFFICIENT COMBINATION OF TRACE AND SCAN SIGNALS

The trace signal selection algorithms described in Chapter 3 lack spatial observability since only a small set of signals is traced every cycle. To improve the spatial observability during post-silicon debug while keeping the trace buffer length fixed, scan data can be combined with trace signals. Recently, Ko et al. [27] have shown the importance of combining scan chains and trace signals. They use a part of the trace buffer input bandwidth to store selected trace signals every cycle. The remaining input bandwidth is used to dump the scan signals at a certain frequency. Although this approach produced promising results, there are several challenges to make it useful in practice. One major issue is that it used exhaustive exploration to determine the profitable combination of trace and scan signals. Such an exhaustive exploration can be infeasible for real designs. Another major concern is that the selected scan signals include almost all the flip-flops. Such an approach is neither practical nor profitable in many real scenarios, since a huge number of shadow flip-flops will be necessary. Also, the time for a scan dump will increase, thus effectively decreasing the number of scan dumps (only about 20 over 1000 cycles for the ISCAS'89 benchmarks).

We have proposed an efficient technique to determine the profitable combination of trace and scan signals. Our approach uses a graph-based representation to select three important aspects: (i) efficient trace signals to be stored every cycle, (ii) the most profitable scan signals to be included in the shadow scan chain, and (iii) the scan dump frequency based on the trace buffer width constraints. It is important to note that trace signal states are stored every cycle whereas scan signal states of a specific clock cycle are stored based on the dump frequency (footnote 1). A major challenge is that these three

Footnote 1: For example, if the trace buffer width is 32, and 8 trace signals are used, we have space left for only 24 scan signals. If we choose a scan chain of 48 flip-flops, the scan dump should be in every two (48/24 = 2) cycles. In other words, in clock cycle 0, states

PAGE 56

aspects are inter-dependent. For example, selecting more trace signals implies less space for scan signals, and vice versa. Even when the space for the scan signals is reserved, choosing a large scan chain (too many scan signals) implies a longer interval between scan dumps. In other words, there is a critical balance between how many signals to observe versus how many signal states can be obtained for a specific clock cycle. Our proposed approach addresses these challenges. Our experimental results show that our method can significantly improve restoration performance compared to existing methods.

4.1 Background and Motivation

To explain how a combination of trace and scan signals can be used to improve signal observability, we refer back to Figure 1-4 in Chapter 1. Using only trace signals, A and C, we were able to reconstruct 32 signal states, including 10 traced ones and 22 newly reconstructed ones. We now show how a combination of trace and scan signals can help in signal reconstruction using the same circuit. In the previous example, the trace buffer stored a total of 10 states (width 2 and depth 5). In this case, we use a trace buffer that can store 11 states. Signal C is selected for tracing every cycle. The other two important signals, A and F, are used as scan signals. The scan dump is performed in alternate cycles. The modified circuit is shown in Figure 4-1 (footnote 2). Table 4-1 shows the traced, scanned and restored signals using [27]. The state values for signal C are traced every cycle whereas the state values for scan signals (A

Footnote 1 (continued): of 8 trace signals are stored, whereas only the states of the first 24 scan signals are stored. Similarly, in the next cycle, 8 trace signal states and the last 24 scan signals (with states of cycle 0) are stored.

Footnote 2: Our method uses partial scan. Recent research by Alawadhi et al. [48] has shown that partial scan can be used without incurring additional penalty compared to full scan.

PAGE 57

Figure 4-1. Example circuit with both scan and trace signals

and F) are dumped in alternate cycles. The scanned signal states are shown in bold. Although scan signals are dumped in alternate cycles, the table shows states for both A and F in cycle 1, cycle 3, and so on. This is because in cycle 1 the state of signal A is dumped whereas in cycle 2 the state of signal F in cycle 1 is dumped. However, the scan chain (i.e., A and F using shadow flip-flops) holds the state for the same cycle, although different parts were dumped in different cycles. In other words, the signal state of F captured at cycle 1 is dumped in cycle 2. As described by [27], the scan chains need not consist of flip-flops that are physically connected. For example, the scan chain here consists of flip-flops A and F that are connected via flip-flop D, which is not part of the scan chain. In other words, a virtual scan chain can be developed comprising only the two flip-flops A and F.

Although the restoration ratio obtained here is 3.1 (less than the trace-only method), the number of states restored is 34, which is higher than obtained earlier (32 in the case of Table 1-1, as seen in Section 1.2). Thus, more signal states provide a more detailed view of the internal state of the circuit.

The primary problem of combining scan and trace together is to determine which signals to select for tracing, and which ones to incorporate in the scan chain. Trace signals should be chosen such that they comprise the important signals in the circuit

PAGE 58

Table 4-1. Restored signals using trace and scan

Signal   Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
A        0         0         0         0         1
B        1         0         1         0         X
C        1         1         0         1         0
D        X         0         0         0         0
E        X         1         0         0         0
F        1         X         1         0         0
G        X         0         0         0         0
H        X         1         0         0         0

that can control significant parts of the circuit. Scan chains, on the other hand, should be distributed around the circuit so that a snapshot of the circuit at a particular clock cycle can be obtained during debug. Since the trace buffer is divided between the trace signals and the scan chains, it is also important to know how this division is done. Clearly, an equal division might not be beneficial since it would mean less importance to the scan signals and a decrease in the number of scan dumps. Ko et al. [27] explored all combinations of number of trace signals and scan dump frequency to obtain a beneficial combination. We have developed an algorithm to select an efficient combination of trace and scan signals to maximize the overall signal restoration.

4.2 Trace and Scan Signal Selection

Similar to trace-only debug approaches, both the trace and scan signals are chosen during the design phase of a particular circuit. The states of the trace signals are monitored every cycle, while the scan signals are dumped at certain time intervals in a repeated fashion. We first introduce our debug architecture. Next, we describe our trace and scan signal selection algorithms.

4.2.1 Trace + Scan Debug Architecture

Our trace-scan combined architecture is motivated by the design of Ko et al. [27]. The entire space of the trace buffer is divided into two parts: one for the trace data and the other for the scan dump. The states of trace signals are offloaded into the

PAGE 59

trace buffer at every clock cycle. The trace buffer width determines the number of scan signals dumped as well as the scan dump frequency. However, since the trace buffer size is constant, the total amount of data that can be stored remains fixed. With an increase in the number of flip-flops in the scan chain, the amount of data produced in each dump increases. As a result, the number of scan dumps has to be decreased in order to maintain the trace buffer constraints. The scan chain is divided into small subchains to allow complete utilization of the total trace buffer width in the same way as [49]. These are represented as n subchains in Figure 4-2. The partitions are numbered from 1 to n. Each of these n subchains utilizes the trace buffer inputs for dumping. To facilitate the tradeoff between scan and trace data, [27] proposed the introduction of multiplexers in front of the trace buffer inputs. This helps in dynamically reconfiguring the inputs for the trace or the scan signals. In our case, the inputs to the trace buffer are predetermined for a particular circuit. Hence, multiplexers are not needed, and this reduces the hardware overhead as well as the delay associated with the dynamic reconfiguration mechanism. The trace buffer has a width w and depth D. Therefore, the total number of bits that can be stored in the trace buffer is $w \times D$. Here, m of the inputs are dedicated to trace signals, while n sub-scan chains dump their values in the trace buffer. Clearly, $w = n + m$.

Our proposed algorithm comprises two parts. First, we determine which trace signals are beneficial. Next, we determine a profitable set of scan signals and the scan dump frequency.

4.2.2 Trace Signal Selection Algorithm

In this section, we determine the signals that need to be traced during debug. The main problem that we face here is twofold. First of all, the trace signals need to be chosen efficiently in order to incorporate the advantages of using the scan signals during debug. Also, unlike the trace-only approaches ([7] and [5]), the number of signals to be traced is not fixed. Although the maximum number of signals to be traced is equal to the

PAGE 60

Figure 4-2. Proposed Architecture: The width w of the trace buffer is shared by m trace signals and n subchains of the scan chain

trace buffer width, the actual number of trace signals can be less to accommodate the scan signals. We use two terms, connectivity and threshold, in our algorithm. The connectivity of a state element is defined as the number of state elements connected with it through other combinational gates (only) in both forward and backward directions, as explained in Section 4.1. The threshold is a minimum limit on the connectivity of a state element for it to be selected for tracing. We now explain these terms using the example in Figure 1-4. The connectivity of the flip-flops can be determined using the circuit diagram. For example, in Figure 1-4, the connectivity of C is 4, since flip-flops A, B, D and E are connected to it. Similarly, the connectivity of flip-flop A is 2 since only C and G are connected to it.

Algorithm 3 outlines the major steps in our trace signal selection algorithm. First we create a graph from the circuit, with each node representing a state element. The edges between the nodes represent the path taken to reach from one state element to

PAGE 61

the other. This graph construction follows the same methodology described in Figure 3-3. The graph is redrawn in Figure 4-3.

Figure 4-3. Graphical representation of example circuit

Once the graph is constructed, the node with the highest connectivity is selected as the most profitable trace signal. The node itself and all its adjacent nodes are deleted from the graph. The next node with the highest connectivity is then chosen. If the connectivity of the node is less than the threshold, the computation stops; otherwise the signal selection procedure goes on until the trace buffer width is reached.

Algorithm 3: Trace signal selection algorithm
Input: Circuit, threshold
Output: List of trace signals S (initially empty)
1: Create a graph GP from the circuit.
while trace buffer is not full do
    2: Find node with highest connectivity in GP.
    3: If connectivity is less than threshold return S.
    4: Otherwise, add the new node to the list S.
    5: Delete it and its adjoining nodes from GP.
    6: Re-compute the connectivities of all nodes.
end
return S

Let GP denote the graph model of Figure 4-3. In this example, we use 40% of the total number of flip-flops in the circuit (i.e., 3.2) as threshold. The node with the highest connectivity is C, with a connectivity of 4, which is more than the threshold. Therefore, C is selected for tracing. Let $R\{C\}$, defined as the relations of C, be the set of nodes connected with

PAGE 62

C, including C, i.e., $R\{C\} = \{A, B, C, D, E\}$. Step 5 of Algorithm 3 recalculates $GP = GP - R\{C\}$. In other words, after deletion of C and its adjoining nodes, the modified GP consists of only three nodes (F, G and H), where F is connected to both G and H. The node with the next highest connectivity in GP is F, with a connectivity of 2. Since this is less than the threshold (40% of the flip-flops, i.e., 3.2), F is not considered a profitable trace signal. The algorithm returns C as the selected trace signal.

4.2.3 Scan Signal Selection Algorithm

In this section, we describe our proposed algorithm for selection of both the scan chain and the scan dump frequency. The procedure to determine the scan chain is shown in Algorithm 4. First, we create a graph from the circuit, in the same way described in Section 4.2.2. Once the graph is constructed, all the nodes that are part of the trace signals or are connected to those trace signals are removed from the graph. Then, a minimal node set is obtained from the graph. A minimal node set has two requirements. First, it is a group of nodes such that each and every other node in the graph is connected to at least one node in the set. Also, it should be minimal, i.e., the set should have the least number of nodes. The procedure to obtain the minimal node set is shown in Algorithm 5. The flip-flops corresponding to the nodes in the node set constitute the scan chain. We first describe how the minimal node set is created. Next, we use an illustrative example to describe how the algorithm works.

Algorithm 4: Scan signal selection algorithm
Input: Circuit, already selected trace signals
Output: List of scan signals S (initially empty)
1: Create a graph from the circuit.
2: Remove the nodes related to trace signals and their immediate neighbors.
3: Compute the node values.
4: Find the minimal node set S.
return S
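The trace signal selection walk-through of Section 4.2.2 (Algorithm 3) can be replayed with a small sketch. This is an illustration under stated assumptions rather than the implementation used in the experiments: the edge list `EDGES` is a guess reconstructed from the connectivities quoted in the text (the actual graph is the one in Figure 4-3), the function name is hypothetical, and ties between equally connected nodes are broken alphabetically so that C is preferred over F, as in the example.

```python
def select_trace_signals(edges, threshold, buffer_width):
    """Greedy connectivity-based selection in the spirit of Algorithm 3."""
    # Undirected adjacency: connectivity counts forward and backward neighbours.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    selected = []
    while adj and len(selected) < buffer_width:
        # Node with the highest connectivity; alphabetical tie-break (assumption).
        best = max(sorted(adj), key=lambda n: len(adj[n]))
        if len(adj[best]) < threshold:
            break
        selected.append(best)
        # Delete the node and its adjoining nodes, then connectivities are
        # implicitly recomputed from the reduced adjacency sets.
        removed = adj[best] | {best}
        adj = {n: nbrs - removed for n, nbrs in adj.items() if n not in removed}
    return selected


# Assumed edges, chosen so that C has connectivity 4, A has connectivity 2,
# and F is connected to G and H once C and its neighbours are removed.
EDGES = [("A", "C"), ("B", "C"), ("C", "D"), ("C", "E"), ("A", "G"),
         ("D", "F"), ("E", "F"), ("F", "G"), ("F", "H")]

with_threshold = select_trace_signals(EDGES, 0.4 * 8, 32)   # 40% of 8 flip-flops
without_threshold = select_trace_signals(EDGES, 0, 32)
```

With the 3.2 threshold the sketch stops after C because F has connectivity 2 in the reduced graph; with the threshold disabled it continues and also picks F.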

PAGE 63

4.2.3.1 Creation of minimal node set

The first step constructs a graph model of the circuit. Once the graph has been created, the minimal node set has to be determined. During the creation of the minimal node set, care must also be taken to ensure that the nodes having higher connectivity are selected for scanning. The algorithm for minimal node set construction is shown in Algorithm 5.

Algorithm 5: Minimal signal set creation
Input: Circuit as graph, Node values
Output: Minimal Node Set S (initially empty)
1: Put all the nodes in a list GPS.
while GPS is not empty do
    2: Find the node with the highest connectivity.
    3: Add the node to S and remove it from GPS.
    4: Remove all nodes associated with that node from GPS along with their associated edges.
    5: Recompute connectivity values.
end
return S

4.2.3.2 Illustrative example

We now explain each of the steps in the algorithm using the graph in Figure 4-3. It should be noted that since the circuit in Figure 1-4 is small, we have not taken into consideration the effect of nodes that have been selected for tracing; that is, we have shown the scan signal approach independently of the algorithm described in Section 4.2.2. In other words, we assumed that there are no trace signals in this case. As can be seen from Figure 4-3, the nodes C and F have the highest connectivity. We choose node C as the initial node. The nodes associated with it (i.e., A, B, D and E) are also removed along with their corresponding edges. The node connectivity information is then recomputed. After deletion of C and its adjoining nodes, the modified GP consists of only three nodes (F, G and H), where F is connected to both G and H. In other words, the connectivity values for F, G and H are 2, 1 and 1, respectively. The node with the next highest connectivity is F. Once F is selected and the adjoining nodes (G and H) are

PAGE 64

deleted, the graph becomes empty. Therefore, the computation stops. The scan chain obtained comprises the two flip-flops C and F.

The basic idea behind this form of scan cell selection is that each node in the entire circuit is either in the minimal set or connected to at least one node in the set. Therefore, when the scan dumps (of flip-flops in the minimal set) are performed, the signal states of the nodes that are not in the minimal set can be reconstructed based on the scan dumps. For example, in Figure 4-3, if the state of C is dumped in cycle i, the states of A and B may be obtained in cycle $i - 1$, while the states of D and E may be obtained in cycle $i + 1$.

Now we describe how to compute the scan dump frequency for a set of scan signals (scan chain) based on a particular trace buffer size and number of inputs dedicated to the trace signals. Let the trace buffer depth and width be D and w, respectively. Let m be the number of inputs dedicated to the trace signals. Therefore, the number of inputs of the trace buffer dedicated to the scan chain per cycle is $w - m$. Let the scan chain length be l. Therefore, the number of cycles it takes to dump the entire scan chain into the trace buffer is $l/(w - m)$. This determines the scan dump frequency, since the scan chain will be dumped after each $l/(w - m)$ cycles. Since the depth of the trace buffer is D, the number of scan dumps would be $D(w - m)/l$.

4.3 Experimental Results

Table 4-2 shows the results of comparison with the existing technique [27]. We implemented both approaches for the 3 largest ISCAS'89 benchmarks. The trace buffer is chosen with a width of 32 and a depth of 1024. In the case of our approach, a threshold value of 10% is chosen for selecting the trace signals. To maintain fairness, in our implementation of the method proposed by [27], we have used the same number of trace signals as our approach and driven the inputs of the benchmarks with the same set of random values. The first column in Table 4-2 shows the circuit name. The second and third columns represent the number of states restored using our approach and the one proposed by [27]. Finally, the last column gives the improvement, which is the ratio

PAGE 65

of the number of extra states restored by our approach compared to the states restored using [27]. Our approach performed consistently better than [27] and produced up to 17.3% (footnote 3) improvement in restoration performance.

Table 4-2. Comparison with existing technique

            Restored States
Circuit     Our Approach   Existing Technique   % Improvement
s38584      332854         283792               17.3%
s38417      601878         540603               11.3%
s35932      338090         326602               3.5%

Figure 4-4. Comparison with Ko et al. and Basu et al.

We now compare our proposed approach with the existing trace-only approaches presented by Ko et al. [5] and Basu et al. [7]. Figure 4-4 shows the comparison of

Footnote 3: The maximum improvement we obtained is 44% in the case of s9234. Since [27] did not report it in their paper, we also omitted it.

PAGE 66

restoration ratio using the 3 largest ISCAS'89 benchmarks. As expected, our proposed approach outperforms the other two techniques for s38417. For the remaining benchmarks, our proposed approach outperforms [5] but produces results comparable with [7] (footnote 4). The main reason is that these benchmarks have a large number of dominating signals, which need to be traced every cycle. Tracing only a few of them and performing scan dumps at regular intervals is not helpful. On the other hand, the number of such dominating signals in s38417 is low, which requires a high spatial observability of trace signals for improved restoration. Hence, a better performance is obtained with the trace-scan combined approach.

4.4 Summary

Combining trace (non-scan) and scan signals is a promising approach to enhance signal reconstruction during post-silicon debug. We developed efficient algorithms to select profitable trace signals and scan chains to maximize the restoration ratio. Our experimental results using ISCAS'89 benchmarks demonstrated that our method provides up to 17% higher restoration compared to existing approaches. We observed that it is profitable to select only trace signals if a design has a large number of dominating signals; otherwise, selection of both trace and scan signals is beneficial.

Footnote 4: In these cases, our approach can be considered as a trace-only approach, that is, a trace-scan combined approach with zero scan dumps, which selects the best trace signals using the method in [7].
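As a supplementary illustration of the scan dump arithmetic from Section 4.2.3, the sketch below evaluates the two expressions $l/(w - m)$ and $D(w - m)/l$ for one configuration. The function name `scan_dump_schedule` is hypothetical, and rounding up the cycles per dump (and rounding down the dump count) when the division is not exact is an assumption not stated in the text. The sample values combine the footnote example of this chapter (width 32, 8 trace signals, 48 scan flip-flops) with the buffer depth of 1024 used in Section 4.3; the resulting dump count is illustrative and not a reported result.

```python
from math import ceil


def scan_dump_schedule(width, depth, n_trace, chain_len):
    """Return (cycles needed per scan dump, number of dumps that fit in the buffer)."""
    scan_inputs = width - n_trace              # w - m inputs left for the scan chain
    if scan_inputs <= 0:
        raise ValueError("no trace buffer inputs left for scan data")
    # l / (w - m), rounded up when the chain does not divide evenly (assumption).
    cycles_per_dump = ceil(chain_len / scan_inputs)
    # D (w - m) / l, rounded down to whole dumps (assumption).
    n_dumps = depth // cycles_per_dump
    return cycles_per_dump, n_dumps


cycles_per_dump, n_dumps = scan_dump_schedule(width=32, depth=1024,
                                              n_trace=8, chain_len=48)
```

For this configuration the chain is dumped every two cycles, matching the 48/24 = 2 example, and 512 such dumps fit in a buffer of depth 1024.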

PAGE 67

CHAPTER 5
ERROR DETECTION AWARE TRACE SIGNAL SELECTION

Existing trace signal techniques are based on the primary objective of maximizing the restoration of untraced signals, and not detection of errors in the circuit. Error detection in a circuit can be illustrated using the example circuit in Figure 1-4. Errors in a signal are only propagated along the forward propagating path towards the output. Therefore, only a signal in the fan-out cone of the erroneous signal can be affected by the error. For example, in Figure 1-4, when flip-flop F is in error, tracing any flip-flop in its fan-in cone (A, B, C, D or E) would not help to reconstruct it. Instead, tracing any flip-flop in its fan-out cone (G or H) has a possibility of detecting the error in F. Existing signal selection algorithms rely on both forward and backward restoration (footnote 1) for reconstructing an untraced signal. Since an error in the fan-out cone of a signal cannot be detected, forward restoration is not meaningful for error detection. Therefore, restoration ratio, the metric used to measure the efficiency of existing signal selection algorithms, is not appropriate for error detection. Section 1.3 motivates the need for a new metric that can provide an estimate of the percentage of total errors detected in the circuit.

Yang et al. [50] proposed a signal selection algorithm to facilitate early detection of errors. Their approach is focused more on latency and does not deal with the total number of errors detected. Shojaei et al. [51] developed a technique which is dedicated to detection of timing errors only. The authors considered how switching activity and power droop can be used to estimate the errors in a portion of the circuit. However, the authors did not consider the importance of functional or logical errors that might be propagating to the traced signals. We have proposed a signal selection

Footnote 1: Forward restoration deals with restoration of the output signal state when one or more of the input signal states are known. On the other hand, backward restoration obtains the input signal states when the output signal state is known. This is discussed in detail in Section 1.2.

PAGE 68

algorithm which selects profitable signals for efficient error detection. Our algorithm lays emphasis on how errors, which propagate from an error origin towards its fan-out cone, can be detected. Compared to the existing signal selection algorithms, our approach is more efficient (up to 2X) in detecting errors across the entire circuit.

Algorithm 6: Signal Selection Algorithm
Input: Circuit, Trace buffer width
Output: List of signals to be traced, TS
TS = $\emptyset$ /* Initialize to NULL */
1: Create a graphical representation of the circuit.
2: /* Compute Edge Values */
   For each edge $e_{s,d}$ between two nodes s and d, compute the edge value $p_{s,d}$.
3: /* Compute Node Value */
   Create a set for each node i: $S_i = \{(e_{j_1,i}, p_{j_1,i}), (e_{j_2,i}, p_{j_2,i}), \ldots, (e_{j_n,i}, p_{j_n,i})\}$, where $e_{j_k,i}$ represents an edge between nodes $j_k$ and i, $j_k$ is in the fan-in cone of i, and $p_{j_k,i}$ is the value of the edge.
   Value for node i: $v_i = \sum_{k=1}^{n} p_{j_k,i}$, where $n = |S_i|$.
4: /* Select Trace Signals */
while trace buffer is not full do
    Select the node with the largest node value. Let this node be i.
    /* Add the node to the list */
    TS = TS $\cup$ {i}
    /* Remove Overlaps */
    for each element $t_i$ in $S_i$ do
        for each element $t_l$ in $S_l$ do
            if they have a common source m and $p_{m,i} > p_{m,l}$ then
                $S_l = S_l - (e_{m,l}, p_{m,l})$
            end
        end
    end
end
Return the list of selected signals, TS.

5.1 Trace Signal Selection for Error Detection

In this section, we propose a signal selection algorithm that is dedicated to error detection in the circuit. Algorithm 6 describes our signal selection algorithm, which has

PAGE 69

four important steps. The remainder of this section describes each of these steps in detail.

5.1.1 Graph-based Modeling of Circuits

Figure 5-1 shows a modified version of Figure 1-4 with labeled signals. Figure 5-2 shows the graphical representation of Figure 5-1. Here, each node corresponds to a signal in the circuit. An edge represents the connectivity/flow between two nodes (signals). For example, the presence of the OR gate between signals a and p is represented as an edge between the corresponding nodes. The edges are shown using directional arrows, which indicate the propagation of an error from a source to its fan-out cone. The flow of an error from one node to another that is not directly connected will pass through several intermediate nodes. For example, in Figure 5-2, an error from a to c will pass through a, p, c.

Figure 5-1. Example circuit with labeled signals

5.1.2 Edge Value Computation

We now describe how to compute individual edge values depending on the type of gate. Next, we explain the computation of compound edge values. A compound edge is one which passes through multiple nodes, that is, a compound edge comprises two or more individual edges.

PAGE 70

Figure 5-2. Graphical representation of Figure 5-1

AND Gate. To compute the probability of error propagation, we consider the example in Figure 5-3A, which shows a multi-input AND gate. The graphical representation is shown in Figure 5-3B. Let the inputs be named $i_1, i_2, \ldots, i_n$ and the output $o_1$. Let us consider the error propagation from $i_1$ to $o_1$. In order for any error (0/1 or 1/0) to propagate to $o_1$, it is necessary that all the other inputs be tied at 1. If any of the other inputs is at state 0, the error will not propagate: the output will always be at a state of 0, irrespective of $i_1$ being correct or erroneous, and hence the error goes undetected. Here, we assume all the other inputs to the AND gate are independent. We consider dependent edges later. Therefore, the probability that all of them are 1 simultaneously is the product of each of the individual probabilities. Let $P^{1}_{i_n}$ be the probability that input $i_n$ is at state 1. Therefore, the probability that all the other inputs are at 1 is

    $\mathrm{Prob}_{i_1} = \prod_{2 \le k \le n} P^{1}_{i_k}$    (5)

which is the probability that an error at $i_1$ will get propagated to $o_1$. Similar computations can be performed if the gate had been a NAND gate.

OR Gate. The computations for an OR gate follow an approach similar to that of an AND gate. Let us consider a multi-input OR gate as shown in Figure 5-4A, and the corresponding graph in Figure 5-4B. As before, let the inputs be named $i_1, i_2, \ldots, i_n$ and the output $o_2$. Let us consider the error propagation from $i_1$ to $o_2$. Since in an

PAGE 71

Figure 5-3. Example using AND gate. A) AND gate. B) Graph of AND gate.

OR gate, 0 is the non-dominating input; in order to propagate an error in $i_1$ to $o_2$, all the other inputs must be held at a state of 0. The probability that an input $i_k$ is held at 0 is $P^{0}_{i_k}$. The joint probability that all the inputs other than $i_1$ are held at 0 is

    $\mathrm{Prob}_{i_1} = \prod_{2 \le k \le n} P^{0}_{i_k}$    (5)

which is the probability that an error at $i_1$ gets propagated to $o_2$. Similar computations can be performed for any of the n inputs and also for a NOR gate.

Figure 5-4. Example using OR gate. A) OR gate. B) Graph of OR gate.

Flip-flop and NOT Gate. To show how the error propagates through flip-flops and NOT gates, let us consider the examples in Figure 5-5. Figure 5-5(a) shows a D-type flip-flop whose input is $i_1$ and output is $o_3$. Any error in $i_1$ will be transmitted to $o_3$ in the next cycle. Since there is no other signal dependency between $i_1$ and $o_3$, there is no hindrance to error propagation. Therefore, the probability of error propagation is 1 and hence, the value of the edge between the nodes $i_1$ and $o_3$ is 1. Figure 5-5(b) shows a NOT gate whose input is $i_1$ and output is $o_4$. Similar to the D-type flip-flop, there is just one input, and hence, any error in $i_1$ will get propagated to $o_4$. Thus, the edge value between the nodes $i_1$ and $o_4$ is 1.
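The per-gate edge values described above reduce to a few lines of arithmetic. The following sketch assumes, as the text does, that the other inputs of a gate are independent; the function name `edge_value` and the string labels for the gate kinds are illustrative choices, not part of the original work.

```python
from math import prod


def edge_value(gate, p_one_of_other_inputs=()):
    """Probability that an error on one gate input propagates to the output.

    p_one_of_other_inputs holds, for every *other* input of the gate, the
    probability that it is at state 1 (inputs are assumed independent).
    """
    if gate in ("AND", "NAND"):
        # All other inputs must be 1 for the error to pass through.
        return prod(p_one_of_other_inputs)
    if gate in ("OR", "NOR"):
        # All other inputs must be 0 for the error to pass through.
        return prod(1.0 - p for p in p_one_of_other_inputs)
    if gate in ("DFF", "NOT"):
        # Single-input elements always propagate the error.
        return 1.0
    raise ValueError(f"unsupported gate type: {gate}")


three_input_and = edge_value("AND", [0.5, 0.5])   # two other inputs at 50%
two_input_or = edge_value("OR", [0.5])            # one other input at 50%
flip_flop = edge_value("DFF")
```

With every signal equally likely to be 0 or 1, a two-input AND or OR gate gives an edge value of 0.5 and a three-input gate gives 0.25.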

PAGE 72

Figure 5-5. D flip-flop and NOT gate

Figure 5-6 shows the values of the edges in Figure 5-2. For simplicity, we assume that all the nodes have a 50% probability of having a state of 0 or 1 (footnote 2).

Figure 5-6. Edge values for the graph in Figure 5-2

Now we discuss how the probability of error transmission changes across multiple edges. To calculate the edge value across multiple gates, that is, the probability of error propagation from one node to the others in its fan-out cone, we have to consider both independent and dependent edges.

Compound Independent Edge. An independent edge is one which passes from one node to another with each internal node along the path visited at most once. We explain the independent edge scenario using the graphical representation in Figure 5-2. The edge from q to s is an independent edge, since there exists only one path from q to s, via d. In other words, $e_{q,s}$ is the traversal of $e_{q,d}$ followed by $e_{d,s}$. Since there is a

Footnote 2: In our experiments, we use the profiling information to determine the probability of inputs (nodes) staying at a particular state.

PAGE 73

flip-flop between q and d, the value of e_{q,d} is 1. This is written as p_{q,d} = 1. To compute the value of the edge between d and s, it can be seen that the other input to s is node e. The value of the edge e_{d,s} is the probability that the node e is at state 0, which in this case is 0.5. Therefore, p_{d,s} = 0.5. The value of the edge between q and s is the product of p_{q,d} and p_{d,s}, that is, p_{q,s} = 0.5. This is intuitive because both gates are independent, and hence the final probability will be a product of how the error gets propagated through each of them. Thus, for an independent edge, the edge value is the product of the values of all the edges which are components of the independent edge. In general, if there are n edges e_1, e_2, ..., e_n comprising an independent edge e, the value of e is

p_e = \prod_{1 \le k \le n} p_{e_k}    (5)

Compound Dependent Edge. A dependent edge is one which starts from a node, branches off, and finally combines to reach another node. As before, we explain the dependent edge computation using the graphical representation in Figure 5-2. There exist two edges between nodes n and r. One edge is (n, b, r) while the other is (n, b, p, c, r). To compute the value of the compound edge e_{n,r}, we need to compute these edge values separately. The value of the edge (n, b, r) is the product of p_{n,b} and p_{b,r} (this follows from the independent edge value computation). Thus, the value of edge (n, b, r), that is, p_{n,b,r}, is 0.5. On the other hand, p_{n,b,p,c,r} is 0.25. p_{n,r} is defined as p_{n,r} = max(p_{n,b,r}, p_{n,b,p,c,r}). This is because, when the effects of two different edges are already taken into consideration, the edge with the higher error propagation probability will always dominate, and thus can be considered the edge value from one node to the other. In general, if there are n edges e_1, e_2, ..., e_n between two nodes s and d, then the value of the edge between s and d is

p_{s,d} = \max(p_{e_1}, p_{e_2}, \ldots, p_{e_n})

5.1.3 Node Value Computation

We are now ready to compute the node values for Figure 5-6. For each node, we obtain the set of nodes in its fan-in cone and the corresponding edge values. Let us

PAGE 74

consider the node p. As can be seen, it falls in the fan-out cone of four nodes: a, b, m and n. The corresponding edge values are computed and the set S_p is obtained as S_p = {(e_{m,p}, 0.5), (e_{a,p}, 0.5), (e_{n,p}, 0.5), (e_{b,p}, 0.5)} (footnote 3). The node value at p is defined as the sum of all these edge values, that is, v_p = 2. In this way, the node values are computed for every node in the graph. Figure 5-7 shows the node values for the nodes in Figure 5-6.

Figure 5-7. Node values for the graph in Figure 5-6

5.1.4 Signal Selection

In this section, we describe the final step in our signal selection algorithm. Once the node values are computed, the node with the highest value is selected for tracing, which is g (or h) in Figure 5-7. The subsequent signals should be carefully selected so as to enhance the error detection. We should delete contributions from signals whose errors already have a high probability of being detected by the already traced signals. Step 4 of Algorithm 1 is used for this purpose. For each signal in the circuit, we check how much it contributes to the already selected signal as well as to the others. If its contribution to the selected signal is at least as large as its contribution to some other signal, its contribution to the latter is removed. To illustrate this feature, we refer to Figure 5-8A. The edge values as well as the (edge, value) pair sets are shown for each node. After g is selected for tracing, the node values are recomputed by removing the overlap.

Footnote 3: Each entry in the set is an (edge, value) pair.

PAGE 75

As can be seen in Figure 5-8A, p_{s,t} = 0.5 and p_{s,g} = 0.5. Since the probabilities of an error at s getting detected at t and at g are the same and g has already been selected for tracing, the contribution of s is removed when recalculating the node value of t. By similar arguments, the contribution from f to t, i.e., (e_{f,t}, 0.5), will also be deleted from S_t. On the other hand, since the value of e_{s,f} is greater than that of e_{s,g}, it is not deleted from S_f. The recomputed sets for each of the nodes are shown in Figure 5-8B. Since f has the highest node value, it will be selected next.

Figure 5-8. Signal selection based on removal of overlap. A) Initial sets. B) Recomputed sets.

5.2 Experiments

5.2.1 Experimental Setup

In this section, we discuss the performance of our algorithm using the ISCAS'89 benchmarks. For each of the benchmarks, 50 different error sites are chosen randomly. The error model is described in Section 5.2.2. A complete simulation of 1000 cycles is performed for both the correct and erroneous scenarios. The signals to be traced are monitored at each clock cycle during simulation. Their states are compared with the states obtained from the error-free simulation. Any discrepancy is reported as an error. We define Error Detection Ratio (EDR) as a metric for measuring the effectiveness of a signal selection algorithm. EDR is defined as:

EDR = \frac{\text{Number of Errors Detected}}{\text{Number of Detectable Errors}}    (5)
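As a rough illustration of how EDR could be evaluated from simulation traces, the sketch below compares an error-free run against one run per injected error. The data layout (dictionaries mapping signal names to per-cycle state lists) and the names `differs` and `error_detection_ratio` are assumptions introduced for this example only; they do not describe the simulator actually used.

```python
def differs(golden, faulty, signals):
    """True if any of the given signals deviates from the error-free run."""
    return any(golden[s] != faulty[s] for s in signals)

def error_detection_ratio(golden, faulty_runs, traced, state_elements):
    """EDR = errors detected on traced signals / errors detectable at all.

    golden          -- dict: signal -> list of states, error-free simulation
    faulty_runs     -- dict: error id -> dict of the same shape, one
                       simulation per injected error
    traced          -- signals stored in the trace buffer
    state_elements  -- all flip-flops and primary outputs
    """
    # An error counts as detectable only if it reaches some state element.
    detectable = [e for e, run in faulty_runs.items()
                  if differs(golden, run, state_elements)]
    if not detectable:
        return 0.0
    detected = [e for e in detectable
                if differs(golden, faulty_runs[e], traced)]
    return len(detected) / len(detectable)

# Hypothetical three-cycle traces for two flip-flops and one output.
golden = {"ff1": [0, 1, 0], "ff2": [1, 1, 0], "out1": [0, 0, 1]}
runs = {
    "e1": {"ff1": [0, 0, 0], "ff2": [1, 1, 0], "out1": [0, 0, 1]},
    "e2": {"ff1": [0, 1, 0], "ff2": [1, 1, 0], "out1": [0, 0, 0]},
    "e3": dict(golden),  # suppressed before reaching any state element
}
print(error_detection_ratio(golden, runs, ["ff1"], ["ff1", "ff2", "out1"]))
```

In this toy data the third error never reaches a state element, so it is excluded from the denominator rather than counted as a miss.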

PAGE 76

In a circuit, not all errors may be detected by the state elements. Some of the errors get suppressed before reaching a state element (flip-flops and primary outputs). Hence, it is important to consider the number of errors that can be detected using state elements and not the total number of errors introduced in the circuit. The second and third columns in Table 5-1 show the number of flip-flops and primary outputs, respectively, for the four ISCAS'89 benchmarks on which we have performed our experiment. The next two columns indicate the number of errors that are detected using flip-flops or outputs. Note that the total number of detectable errors (last column) is not the sum of the values in the fourth and fifth columns, since the sets of detectable errors overlap.

Table 5-1. Detectable errors for the ISCAS'89 benchmarks

Circuit   #FFs   #outputs   Detectable by FFs   Detectable by outputs   Detectable (total)
s5378     179    49         30                  33                      38
s9234     228    22         26                  2                       26
s13207    669    121        38                  19                      38
s15850    597    87         29                  9                       29

5.2.2 Error Model

We have assumed a periodically recurring error model which tries to represent a real scenario. Initially, we select a set of random nodes as potentially erroneous ones. A random function generator is used for this purpose. We consider each of the errors independently, that is, we consider one error at a time. Once the nodes are selected, a random time stamp is chosen which should be significantly lower than the trace buffer depth (footnote 4). After each occurrence of this time stamp, the erroneous node is assumed to malfunction. We have chosen a simple bit-flip model for our purpose. When the node

Footnote 4: We have chosen a time stamp of 100 cycles, which is less than the trace buffer depth of 1000 cycles.

PAGE 77

is supposed to malfunction, it just flips its correct state. The erroneous state is then propagated along its fan-out cone.

5.2.3 Results

In this section, we compare our signal selection performance with the restoration-aware signal selection approach [7]. Though there are other signal selection methods such as [5] and [6], we compare our approach with [7], which provides the best results. We have used the same set of signals used by [7] and compared the EDR using both approaches. The results are shown in Figure 5-9. As can be seen, our method provides up to a twofold improvement compared to [7] when 32 state elements are traced. This is because the scheme proposed in [7] considers both forward and backward restoration when selecting signals for tracing, whereas only forward error propagation (equivalent to backward restoration) is useful for error detection.

Figure 5-9. Comparison with restoration-aware signal selection

In the next set of experiments, we explore how EDR changes when we increase the number of signals (flip-flops and outputs) to be traced. We consider 32, 64 and 128 trace signals and note the changes in error detection performance.

PAGE 78

The results are shown in Figure 5-10. For s5378, an EDR of 84% is obtained with a trace buffer width of 32. A further increase of the trace buffer width increases the number of errors detected by a small amount (up to 95%). In the case of s9234, an EDR value of 46% is obtained using a trace buffer width of 32. As expected, an increase in trace buffer width increases the EDR value. Almost 77% EDR can be obtained using a trace buffer width of 128. For s13207, the initial error detection is small (18% EDR with a trace buffer width of 32); however, a sharp increase in error detection performance is obtained when the trace buffer width is increased. In fact, the EDR triples when the trace buffer width is 128. Although s15850 is a large benchmark like s13207, we see a considerably large EDR when the trace buffer width is 32 (38%). Since there are 673 state elements, a further increase in trace buffer width to 64 and 128 does not produce a sharp increase in the EDR value.

Figure 5-10. Variation of error detection with number of trace signals

Our next experiment is to see how EDR changes when we select only flip-flops for tracing. The results are shown in Figure 5-11. The variation trend is almost similar to that in Figure 5-10, since the flip-flops form a major portion of the number of

PAGE 79

state elements. As before, s5378 has most of the errors detected when the trace buffer width is 32, and hence any further increase has minor impact. On the other hand, s9234 and s13207 show a sharp increase in EDR when the trace buffer width is increased.

Figure 5-11. Variation of error detection with number of flip-flops traced

Finally, we would like to see how the error detection varies with the number of output signals used for tracing. In this case, we trace only the output signals. Since the number of output signals is relatively small, the trace buffer width is varied in steps of 4, 8, 16 and 32. Note that the denominator of the EDR computation uses the values of column 5 in Table 5-1. The results are shown in Figure 5-12. For s5378 and s13207, variation of the trace buffer width produces a sharp increase in the value of EDR. For s9234, the EDR becomes 100% with a width of 4. This is because, for s9234, only 2 errors can be detected using all the outputs, and these are detected when only 4 outputs are traced. Except for s13207, which has 121 outputs, a 16-bit trace buffer achieves 80-100% EDR for all other benchmarks.

5.3 Summary

In order to detect an error, the traced signals should remain in the fan-out cone of the error signal. We have proposed an algorithm which takes this fact into account

PAGE 80

Figure 5-12. Variation of error detection with number of outputs traced

and selects signals with the objective of detecting errors. We have analyzed several cases where any signal, only flip-flops, and only output signals are used for tracing. Our proposed approach is significantly better (up to 2X) in error detection compared to the state-of-the-art existing signal selection algorithms.

PAGE 81

CHAPTER 6
DYNAMIC SIGNAL SELECTION

To improve the observability during post-silicon debug, existing techniques [5, 7, 52] select a small set of profitable signals during design time. The applicability of the existing methods is limited for various reasons. First, these methods treat each component (functional region) of the design as equally important from a debug perspective and therefore select signals that are globally beneficial based on restoration capability. In other words, they assume a uniform spatial and temporal distribution of errors. In reality, certain regions may not be relevant during some cycles of operation for various reasons. For example, a set of cores in a multicore architecture may be in power-saving mode (using clock gating) during a certain time frame. Therefore, no error is possible in those cores during that time frame. Similarly, certain regions (such as a well-verified datapath) are less likely to have errors compared to other control-intensive regions. In general, only a small set of regions may be relevant during a particular time frame for debugging an error. Therefore, a verification engineer would like to have knobs that allow him to trace a different set of signals at different time frames.

Prabhakar et al. [28] proposed an approach to select between two sets of signals in alternate cycles. As a result, it is a very specific case of temporal distribution of errors without any consideration for spatial distribution. A multiplexed signal selection for error detection was proposed by Liu et al. [29]. Their approach is an ad-hoc signal selection heuristic based on an error visibility metric. There is no discussion of how such a selection is beneficial for their targeted debug scenario. In other words, their approach does not consider the challenges associated with dynamic signal selection in the presence of spatial and temporal distribution of errors.

We propose an efficient signal selection algorithm and an associated trace controller design that would enable verification engineers to dynamically trace different sets of signals for improved error detection. We propose a region-aware signal selection

PAGE 82

algorithm (RSS) that selects profitable signals during design time (static analysis) based on the knowledge of functional regions and associated error zones. We also develop a low-overhead dynamic signal tracing (DST) hardware to enable designers to trace different sets of signals during execution based on active (relevant) functional regions. This lays emphasis on the errors in active zones in the circuit that can be detected using a specifically selected set of trace signals. Although our work might seem similar to traditional test-point insertion and observability analysis [53], it is fundamentally different in two aspects. First, our approach is designed specifically for post-silicon validation and debug. Also, in the case of [53], the observation points are determined by the number of signals in the fan-in cone, and not by the error propagation probability from the signals to the observation points. To the best of our knowledge, this is the first attempt at developing an efficient spatio-temporal solution for dynamic trace signal selection. Our experimental results using both ISCAS'89 benchmarks and Opencores circuits demonstrate that our approach is able to detect up to 3 times more errors compared to existing state-of-the-art techniques.

6.1 Problem Formulation

The goal of this chapter is to develop an efficient dynamic signal selection technique to maximize detection of currently active errors (footnote 1) in a circuit. Various industrial studies highlight the fact that error locations are not uniformly distributed across the circuit; instead, they are clustered in multiple small zones. We call them error-prone zones (or error zones, in short). We assume that error zones during post-silicon validation closely resemble those in the pre-silicon phase. We divide the circuit into multiple parts where each part contains one or more error zones. We call these parts functional regions (or regions, in short). A natural boundary for a region would be the component boundary of an SoC. For example, each core in a multicore SoC can form a region.

Footnote 1: Those located in active regions, explained later in this section.

PAGE 83

If one component has multiple error zones, we may even divide that component into multiple regions following some functional boundary. For example, a processor core can be divided into two regions, one covering the fetch and decode units and the other covering the rest. In our construction, an error zone is completely contained inside a region of the circuit, that is, we assume no overlap of an error zone between multiple regions. There is a trade-off between the number of regions and the number of error zones. One region per error zone may create too many regions (partitions) and lead to unacceptable computational complexity and hardware overhead. On the other hand, a large region with many disjoint error zones may reduce the effectiveness of dynamic signal selection.

Let us consider a circuit represented by the rectangle in Figure 6-1. The entire circuit is divided into m regions named R_1 to R_m. Each region can have one or more disjoint error zones. For ease of illustration, we assume one error zone per region. This does not lose any generality, since one error zone can be viewed as a composition of multiple disjoint error clusters. For example, the error zone Z_{R1} for region R_1 consists of two disjoint error clusters in Figure 6-1. These two clusters together form the error zone Z_{R1}.

Figure 6-1. Illustrative example showing regions and error zones

We consider a trace buffer of width n, that is, n signal states can be stored in the trace buffer per cycle. During any particular cycle, some of the functional regions remain active (relevant). A region is considered active if the gates in the particular region

PAGE 84

function normally and are not dormant due to power-saving mode (using clock gating) or any other reason. Regions which do not have any signal transition during a certain time frame are considered inactive and hence are not relevant. There are two extreme scenarios. When all the regions are active, our signal tracing algorithm gives proportional emphasis to every region and the associated error zones. However, when only one region is active, beneficial signals from that region need to be traced. We select n signals from each of the m regions, forming a total set of m × n signals. During execution, depending on the currently active regions, the n best signals will be chosen out of the m × n signals. It must be noted that the n signals from region R_i (where 1 ≤ i ≤ m) used to detect errors in Z_{Ri} can be from anywhere in R_i (inside as well as outside of Z_{Ri}). Our region-based signal selection algorithm (RSS) in Section 6.2 describes how these signals are selected, while an efficient hardware implementation for dynamic signal tracing (DST) is described in Section 6.3.

6.2 Region-based Signal Selection (RSS)

Algorithm 7 describes our region-based signal selection algorithm (RSS) for selecting profitable signals during design time. The first step creates a graph-based model of the circuit. Next, for each region it computes the error propagation probability (defined in Section 6.2.2) from each node in the error zone to the other nodes in the entire region. Finally, for each region the most profitable n signals are selected. The remainder of this section describes these steps in detail.

6.2.1 Graph Based Modeling of Circuits

The first step of Algorithm 7 is to construct a graphical representation of the circuit. We explain this step using our example circuit in Figure 1-5. We redraw the circuit in Figure 6-2, where the 2 regions of the circuit are shown clearly. The graphical representation is shown in Figure 6-3. Each signal in the circuit is represented by a node and any data flow between two nodes is represented by an edge. The edge is irrespective of the type of gate between two nodes. For example, flip-flops

PAGE 85

Algorithm 7: Region-based Signal Selection (RSS)
  Input: Circuit, trace buffer width n, error zones Z_1, ..., Z_m
  Output: m lists of selected signals, SS_1, ..., SS_m
  SS_i = {}   /* Initialize all lists to NULL */
  1: Create a graphical representation of the circuit. Divide the circuit into m regions; R_i contains the error zone Z_i.
  2: /* Compute error propagation probability for each region R_i */
     For each node s in Z_i, compute the probability of an error at s getting propagated to any node d in region R_i.
  4: /* Select n trace signals for each region R_i */
     while SS_i has fewer than n signals and R_i is not empty do
       4.1 For each node d in R_i, compute the summation of the error propagation probability from each node s in Z_i. This is the total error detection probability (EDP) at node d.
       4.2 Select the node j with the largest EDP value.
       4.3 Add the node to the list: SS_i = SS_i ∪ {j}
       4.4 Remove node j and its overlap from R_i
     end
  Return the lists (SS_1, ..., SS_m) with selected signals

C and H are connected by a NOT gate; hence, the two nodes representing them have an edge connecting them. Directed arrows signify the error propagation direction. R_1 and R_2 represent two different functional regions of the circuit with respective error zones Z_{R1} and Z_{R2}. Let us consider the region R_1. The probable sources of error are the nodes A and B. Any errors in these nodes can propagate to the other nodes in R_1 which are in their respective fan-out cones. Therefore, the error at A can propagate to F, D, E and G. We would like to compute the probability of an error at either of these two probable erroneous nodes (A and B) propagating to the other nodes. We call this probability the error propagation probability, as described next.

6.2.2 Error Propagation Probability Computation

We first describe how to compute the error propagation probability through single gates. Error propagation probability is defined as the probability of an error present at an input of a gate being propagated to its output. Error propagation probability over multiple

PAGE 86

Figure 6-2. Example circuit with 2 regions and 12 flip-flops

gates will be explained later. We consider each of the single gates and compute the probability of an error at one of the inputs getting propagated to the output.

Individual Gates. To compute the probability of error propagation, we first consider a multi-input AND gate in Figure 6-4A. Let the inputs be named i_1, i_2, ..., i_n and the output o_1. Let us assume an error occurs at one of the inputs, say, i_1. We want to compute the probability of the error being propagated to o_1. In order for any error (0/1 or 1/0) to propagate to o_1, all the other inputs of the AND gate must be held at 1. If any of the other inputs is at state 0, the output will always be at state 0, irrespective of i_1; hence, the error goes undetected. Here, we assume all the other inputs to the AND gate are independent. Therefore, the probability that all of them are 1 simultaneously is the product of the individual probabilities. Let p^1_{i_k} be the probability that input i_k is at state 1. Therefore, the probability that all the other inputs are

PAGE 87

Figure 6-3. Graphical representation of Figure 1-5 with two regions

at 1 is

P^{1}_{i_1} = \prod_{2 \le k \le n} p^{1}_{i_k}

which is the probability that an error at i_1 will get propagated to o_1, that is, the error propagation probability through the AND gate. Similar computations can be performed for a NAND gate.

Figure 6-4. Examples using AND and OR gates. A) AND gate. B) OR gate.

The computation for an OR gate follows an approach similar to that of an AND gate. A multi-input OR gate is shown in Figure 6-4B. Let us consider the error propagation from i_1 to o_2. In order to propagate an error in i_1 to o_2, all the other inputs of the OR gate must be held at a state of 0. The probability that an input i_k is held at 0 is p^0_{i_k}. The joint

PAGE 88

probability that all the inputs other than i_1 are held at 0 is

P^{0}_{i_1} = \prod_{2 \le k \le n} p^{0}_{i_k}

which is the error propagation probability from i_1 to o_2. Similar computations can be performed for a NOR gate. For any one-input, one-output node (such as a flip-flop or NOT gate), the error propagation probability is always 1.

Now we discuss how the probability of error propagation changes across multiple gates. Since more than one gate is involved, we need to consider both independent and dependent paths. A path is defined as the series of logic gates which are placed in between source (s) and destination (d) nodes. In other words, it signifies the path traversed by a potential error at node s to reach node d.

Independent Paths through Multiple Gates. An independent path is one which passes across a set of logic gates with each gate being visited at most once. We explain the independent path scenario using Figure 6-3. To keep things simple, we assume each internal signal is in a state of 0 or 1 with a probability of 50% (footnote 2). The edge from A to E is an independent path, since there exists only one path from A to E, via D. Since there is only an OR gate between A and D, the probability of an error at A getting propagated to D is the probability of L (the other input to the OR gate) being in a state of 0, which is 0.5 in this case. Hence, the error propagation probability between A and D is 0.5. Similarly, since there is only a 2-input AND gate between D and E, the error propagation probability between D and E is 0.5. Since none of the signals is visited more than once, the overall error propagation probability between A and E is the product of these two, which is 0.25. In general, if there are n+1 signals in an independent path between nodes s and d, with their intermediate error propagation probabilities being p_1, p_2, ..., p_n, the overall error propagation probability across the path is

P(s,d) = \prod_{1 \le k \le n} p_k

Footnote 2: This assumption is for explanation purposes only; in real experiments, we gather profiling information to determine the state probability. It is explained in detail in Section 6.4.
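The product rule for a path can be sketched as a small graph traversal. The code below is a hedged illustration, not the implementation used in the experiments: the function name `path_probability`, the edge dictionary, and the exhaustive path enumeration (fine for toy graphs, exponential in general) are all assumptions of this sketch. When several paths join the same pair of nodes, it keeps the maximum over them, which is the rule Chapter 5 uses for compound dependent edges. Only the A-D and D-E values come from the example above; the remaining edge values are placeholders chosen for illustration.

```python
def path_probability(edges, s, d):
    """Best error propagation probability from s to d.

    edges -- dict mapping (u, v) to the single-gate propagation
             probability of the edge u -> v
    Each simple path contributes the product of its edge values; the
    result is the maximum over all simple paths (0.0 if none exists).
    """
    successors = {}
    for (u, v), p in edges.items():
        successors.setdefault(u, []).append((v, p))

    best = 0.0
    stack = [(s, 1.0, frozenset([s]))]  # (node, product so far, visited)
    while stack:
        node, prob, seen = stack.pop()
        if node == d and node != s:
            best = max(best, prob)
            continue
        for nxt, p in successors.get(node, []):
            if nxt not in seen:  # visit each node at most once per path
                stack.append((nxt, prob * p, seen | {nxt}))
    return best

# A-D and D-E are from the text; the other edges are hypothetical.
edges = {("A", "D"): 0.5, ("D", "E"): 0.5,
         ("A", "F"): 0.5, ("F", "G"): 0.5, ("E", "G"): 0.5}
print(path_probability(edges, "A", "E"))
```

For the single path A, D, E the sketch returns 0.5 × 0.5 = 0.25, matching the value derived above.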

PAGE 89

Dependent Paths through Multiple Gates. A dependent path is one in which, while moving from a source node to a destination node, at least one of the internal nodes is visited more than once. We explain the error propagation probability computation using Figure 6-3. There exist two independent paths between nodes A and G. One path is (A, F, G) while the other is (A, D, E, G), both branching out at A and combining at G. In order to compute the error propagation probability across the path between A and G, we need to compute these independent path values separately. For the path (A, F, G), the error propagation probability is the product of the probabilities of the paths (A, F) and (F, G), both of which are 0.5 under the 50% state assumption. Thus, the error propagation probability of path (A, F, G) is 0.25. On the other hand, since the path (A, D, E, G) passes through 3 independent two-input gates, the eventual error propagation probability is 0.125. The error propagation probability through path (A, G) can be computed as p(A,G) = max(p(A,F,G), p(A,D,E,G)). This is because, during computation, the effects of the two different paths are already taken into account, and the path with the higher probability of detecting an error will always dominate. In general, if there are n independent paths e_1, e_2, ..., e_n between two nodes s and d, then the error propagation probability of the paths between s and d is

p(s,d) = \max(p_{e_1}, p_{e_2}, \ldots, p_{e_n})

6.2.3 Signal Selection Based on Node Values

In this section, we describe the final step in our signal selection algorithm. The first node chosen for tracing in a region is the node with the highest value. The value of a node is the sum of the error propagation probabilities of all paths in which the node is the destination. For example, in Figure 6-5, if we concentrate on region R_1, the node value of E will depend on paths (A, D, E), (L, D, E) and (D, E). Since in this example only A and B are possible error locations for R_1, the relevant path would be (A, D, E); the paths (L, D, E) and (D, E) are not relevant because D and L are not in the error zone. Therefore, the node value of E will be the error propagation probability across the path (A, D, E), that is, 0.25. We can have similar computations for other regions. The node

PAGE 90

values of all the nodes in R_1 are shown in Figure 6-5. Each node value is represented by a number beside it.

Figure 6-5. Node values for region R_1 in Figure 6-3

In Figure 6-5, three nodes (A, B and F) have the highest node value of 1. A and B are not valid choices since neither of them can detect an error in the other node, whereas F can detect an error in both A and B with 50% probability. Therefore, we choose F as the first node to trace. The subsequent signals should be carefully selected to enhance the error detection in the region. Contributions from signals whose errors have a high detection probability from the already selected signals should be deleted. Step 4.4 of Algorithm 7 is used for this purpose. The basic idea is that if an already selected node (e.g., F) can detect an error (e.g., in A) with equal or higher probability than another node (e.g., D), then the overlap from the other node should be removed. For example, since F can detect an error in A with 50% probability, the contribution claimed by D (also 50% for A) should be deleted from D during the next iteration. This process continues until the n best signals are selected for each region or there are no more signals to be selected.
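The greedy selection with overlap removal can be sketched as follows for a single region. This is a minimal sketch under stated assumptions: the function name `rss_select`, the input layout (a dictionary keyed by (error source, node) pairs), and the tie-breaking rule (prefer the node that covers more error sources, then alphabetical order) are choices made for this illustration; the text only states that F is preferred over A and B because it observes both error sources. The probability values are toy numbers loosely modeled on Figure 6-5.

```python
def rss_select(prob, n):
    """Greedy trace signal selection for one region.

    prob -- dict mapping (source, node) to the error propagation
            probability from error-zone node `source` to region node `node`
    n    -- trace buffer width
    Returns the selected nodes in priority order.
    """
    # contrib[d][s]: probability still credited to node d for errors at s.
    contrib = {}
    for (s, d), p in prob.items():
        if p > 0:
            contrib.setdefault(d, {})[s] = p

    selected = []
    while len(selected) < n and contrib:
        # EDP of a node is the sum of its remaining contributions.
        # Tie-break (assumption): more error sources covered wins.
        best = max(sorted(contrib),
                   key=lambda d: (sum(contrib[d].values()), len(contrib[d])))
        selected.append(best)
        covered = contrib.pop(best)
        # Overlap removal: drop a contribution when the selected node
        # detects the same error with equal or higher probability.
        for d in list(contrib):
            for s, p_sel in covered.items():
                if s in contrib[d] and p_sel >= contrib[d][s]:
                    del contrib[d][s]
            if not contrib[d]:
                del contrib[d]
    return selected

# Toy values: errors may originate at A or B.
prob = {("A", "A"): 1.0, ("B", "B"): 1.0,
        ("A", "F"): 0.5, ("B", "F"): 0.5,
        ("A", "D"): 0.5, ("A", "E"): 0.25, ("A", "G"): 0.25}
print(rss_select(prob, 2))
```

With these numbers F is chosen first; D, E and G then lose their only contribution (an error at A is already observed at F with at least the same probability) and drop out of consideration.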

PAGE 91

6.3 Dynamic Signal Tracing (DST)

Algorithm 8 describes our dynamic signal tracing (DST) procedure for improved error detection. The inputs to the algorithm are the chip design, trace buffer size, active regions, and associated relevance and signal lists. The relevance of a region indicates how important the region is in error detection, or the possibility of finding an error in that region compared to other regions. The relevance information is provided by the pre-silicon verification engineer based on the percentage of errors found in the error zone in that region (compared to other error zones) during pre-silicon validation. If no information is available, we can consider the size of the error zone in that region as its relevance. If the trace buffer size is n, and there are m active regions in the circuit, during design time our RSS procedure (Algorithm 7) will select m × n signals. During execution, our DST procedure needs to choose n signals from these m × n signals that are most profitable at a certain duration depending on the k (1 ≤ k ≤ m) active regions.

Algorithm 8: Dynamic Signal Tracing (DST)
  Input: Circuit, trace buffer size n, k active regions (R_i), and respective relevance (r_i) and selected signal lists (SS_i)
  Output: List of n signals to be traced, TS
  TS = {}   /* Initialize to NULL */
  1: Here, r_i denotes the relevance of region R_i, and SS_i is the list of the most profitable n signals selected for region R_i, where 1 ≤ i ≤ m. Let r = \sum_{i=1}^{m} r_i
  2: Find the contribution from R_i, C_i = n × r_i / r
  3: Select the best C_i signals from SS_i
  4: Put the selected signals in TS.
  5: Repeat steps 3-4 for all k regions, 1 ≤ i ≤ k.
  Return the selected signals TS

Since we have to select n out of m × n signals for tracing, it is reasonable to adopt n multiplexers, each of which will provide a signal corresponding to the trace buffer output. The main problem is to divide the m × n signals among the n multiplexers so that all possible combinations of trace signals can be achieved. An obvious but expensive

PAGE 92

Table 6-1. Selected signals for each MUX for n = 4 and m = 4

Mux name   Input signals for each MUX
MUX1       A1, B2, B3, B4, C2, C3, C4, D2, D3, D4
MUX2       B1, A2, A3, A4, C2, C3, C4, D2, D3, D4
MUX3       C1, A2, A3, A4, B2, B3, B4, D2, D3, D4
MUX4       D1, A2, A3, A4, B2, B3, B4, C2, C3, C4

solution would be to use n multiplexers, each having all the m × n signals as inputs and 1 output. One optimization can be achieved by the following observation. Let us consider a circuit with 4 regions, R_A, R_B, R_C and R_D. Suppose the signals responsible for detecting errors in region R_A are named A1, A2, ..., An, in the order of priority. If signal A1 is not selected for tracing, subsequent signals, that is, A2, A3, ..., An, will not be selected for tracing. Thus, it is not necessary to keep these signals under the same multiplexer input as A1. The number of initial signals selected from each region to feed into the n multiplexers is n/m. A total of n signals will fill in the first stage of each multiplexer. Now, the number of signals remaining for each region is given by

n_{remain} = n - \frac{n}{m}

Under each multiplexer, all these signals except the ones from the same region will be stored. For example, if multiplexer 1 has signal A1, the signals A2, A3 and A4 need not be part of its input. Therefore, the size of each multiplexer is size × 1, where

size = 1 + (m - 1)\left(n - \frac{n}{m}\right)

Thus the savings obtained is \frac{mn}{size}, which reduces to \frac{mn}{1 + mn - 2n + \frac{n}{m}}. For this example, m = 4 and n = 4; therefore the value for size is 10. Our initial multiplexer design was of size (m × n) × 1, which in this case means 16 × 1. Thus, we could reduce the multiplexer size by 16/10, that is, by a factor of 1.6. Table 6-1 shows the configuration of each multiplexer, indicating the signals entering each of them.
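The proportional split of Algorithm 8 and the multiplexer sizing above can both be checked numerically. The sketch below is illustrative only: the names `dst_select` and `mux_size` are introduced here, relevance is summed over the currently active regions, and each share C_i is rounded down; the algorithm does not specify either detail, so both are assumptions of this sketch.

```python
from fractions import Fraction

def dst_select(n, relevance, signal_lists):
    """Pick trace signals in proportion to region relevance.

    n            -- trace buffer width
    relevance    -- dict: active region -> relevance r_i
    signal_lists -- dict: region -> priority-ordered list SS_i from RSS
    """
    total = sum(relevance.values())  # r, over active regions (assumption)
    traced = []
    for region, r_i in relevance.items():
        share = int(n * r_i / total)  # C_i = n * r_i / r, rounded down
        traced.extend(signal_lists[region][:share])
    return traced

def mux_size(m, n):
    """Inputs per multiplexer: 1 + (m - 1) * (n - n/m)."""
    return 1 + (m - 1) * (n - Fraction(n, m))

lists = {"RA": ["A0", "A1"], "RB": ["B0", "B1"]}
print(dst_select(2, {"RA": 1, "RB": 1}, lists))  # both regions active
print(dst_select(2, {"RA": 1}, lists))           # only RA active
print(mux_size(4, 4), float(Fraction(4 * 4) / mux_size(4, 4)))
```

For m = 4 and n = 4 the sizing function gives 10 inputs per multiplexer and a saving factor of 16/10 = 1.6, in agreement with the figures quoted above.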

PAGE 93

Table 6-2. Table for n = 2 and m = 2

Current state (R_A R_B)   Selected signals
0 1                       (B0, B1)
1 0                       (A0, A1)
1 1                       (A0, B0)

Now, we would like to explore the design for our dynamic signal tracing algorithm. The total possible number of states is 2^m - 1, since at least one of the m regions will be active at a time. This is independent of n, that is, the trace buffer width. However, each of the states will be defined by n signals, signifying the n signals to be traced at that time. Table 6-2 shows a simple example controller illustrating different signal selections depending on the state of currently active regions when m = 2 and n = 2. Let the two regions be R_A and R_B, and let the two signals selected from each region be A0, A1 and B0, B1, respectively. At any point, only two of the signals are chosen for tracing. When only region R_A is active, the signals to be traced are the two signals from region R_A, indicated by A0, A1. Similarly, when only region R_B is active, the two signals to be traced are B0, B1. When both regions are active, the trace signals to be selected are A0, B0.

Figure 6-6. Datapath and controller design for m = 3 and n = 3

Similarly, when m = 3 and n = 3, it is evident that there will be 7 different states for this configuration. Let the 3 error regions be R_A, R_B and R_C. Signals in R_A are named

PAGE 94

Figure 6-7. Proposed design

as A0, A1, A2, and similarly for all other regions. Figure 6-6 shows the controller and datapath design for such a configuration. The overall structure of our proposed design is shown in Figure 6-7. Here we consider a design with n multiplexers that produce n trace signals. The outputs of the multiplexers are fed to a trace buffer. The trace controller provides the control signals to the multiplexers based on the logic mentioned above. The trace controller operates under the same clock as the Design Under Test (DUT). An external knob is applied on the trace controller (generally by the validation engineer) which contains information on the currently active error zones in the circuit.

6.4 Experiments

6.4.1 Experimental Setup

We verified the effectiveness of our region-based signal selection (RSS) and dynamic signal tracing (DST) algorithms using some of the largest ISCAS'89 benchmarks as well as Opencores circuits. In each of the subsequent experiments, we consider a number of regions with each region having one error zone. Each error zone comprises

PAGE 95

of about 5% of the respective region. We inserted 50 random errors in the error zones of the active regions, with the error density proportional to the region size. We assume a simple bit-flip model for errors, that is, at a particular cycle, the error signal will just flip its state. Profiling information is obtained by running an ideal simulation of 1000 cycles with random input vectors and noting the percentage of each signal state. We perform two simulations, one for the ideal case, when all the signals are assumed to be error free, and one with the erroneous signals included. It should be noted that we consider the errors individually, in order to prevent each error's effect from suppressing another. The error model is assumed to be sporadic, that is, errors do not occur every cycle, but after certain intervals. For our case, we assume the errors to be manifested after a hiatus of 100 cycles. The simulation performed is of 1000 cycles in total, that is, a total of 50000 cycles for the 50 errors. Any discrepancy in the traced signal states is reported as an error. The metric used to measure error detection performance is Error Detection Ratio (EDR), as defined in Equation 5.

We have applied our algorithms using a wide variety of total regions (m) and active regions (k, k ≤ m). In this section, we summarize the results for three scenarios (each having several subcases): 2 regions (both active and only one active), 3 regions (all active, two active, and only one active), and 4 regions (all active, three active, two active, and only one active). In each of these subcases, we present the average of all possible scenarios. For example, in the case of k = 1 and m = 2, the results show the average of two possible scenarios: R_1 is active or R_2 is active. We compare the following three approaches:

GSS. This approach represents the existing techniques that focus on global signal selection (GSS) without any knowledge of error zones or active regions, and assumes that the errors are uniformly distributed across the circuit. The signals are selected

PAGE 96

in an approach similar to [7] (footnote 3), the only difference being that we have considered error detection and not restoration during signal selection.

Figure 6-8. GSS

EZ-GSS. We extend the existing methods with the knowledge of the error zones to evaluate their effectiveness in handling error zones. We call this approach error-zone aware global signal selection (EZ-GSS). This is a static signal selection assuming all zones are active.

Figure 6-9. EZ-GSS

RSS+DST. Our approach is essentially a combination of region-aware signal selection (RSS) and dynamic signal tracing (DST).

6.4.2 Results for Two Regions

For each of our experimental circuits, we created two regions, each having one error zone. In the first set of experiments, we assume both zones are active. In this case,

Footnote 3: Although there are other similar techniques like [6, 24, 52] which can be considered as GSS, none of them considers error detection (they focus instead on restoration ratio). Moreover, these approaches do not take into account the presence of error zones and the ones currently active. Therefore, none of these approaches is expected to provide significant performance. We have chosen one ([7]) from all these approaches as a representative of GSS.

PAGE 97

Figure 6-10. RSS+DST

EZ-GSS and RSS+DST are the same, since we have to consider both zones for signal selection even during DST. The results are shown in Figure 6-11. As expected, our approach performs better than GSS, with the maximum improvement being 1.75 times, since our approach lays more emphasis on the error zones.

Figure 6-11. Comparison of EDR performance when both regions are active

In the next experiment, we assume one of the two error zones is active at a particular time. Now, we would like to compare the EDR performance of our three approaches: GSS, EZ-GSS and RSS+DST. The results are shown in Figure 6-12. GSS performs the worst among the three since it has no knowledge of where the error is located or which region is active. EZ-GSS performs better than GSS but has no knowledge of active regions. RSS+DST performs the best since it dynamically selects signals with complete knowledge of the currently active error zones. The maximum improvement obtained by our approach against GSS is almost 3 times.

PAGE 98

Figure 6-12. Comparison of EDR performance when only one region is active

We now observe the performance of our approach on some real circuits obtained from the Opencores [47] website. We choose three circuits for this purpose, namely RS232 Uart, OPB Onewire and i2c slave. These will be referred to as uart, one and slave, respectively, in the following discussion. We synthesized them using Synopsys Design Compiler to obtain the gate-level netlists from the RTL descriptions. For each of these circuits, we consider two error regions, of which one is active at a time. The results are shown in Figure 6-13. As expected, for all three benchmarks, our proposed methods EZ-GSS and RSS+DST perform much better than GSS. RSS+DST performs best in all cases; however, for one, the performance of EZ-GSS and RSS+DST is similar. This is because, of all the signals selected using RSS+DST, the ones that can detect most of the errors are selected by EZ-GSS as well.

6.4.3 Results for Three Regions

In these experiments, we created three regions for each circuit. In the first experiment, we assume only one of the three regions is active. The EDR performance comparison is shown in Figure 6-14. The RSS+DST numbers are the average of the three possible scenarios of one active region in the circuit. Up to 3 times improvement is obtained by our approach compared to GSS, while compared to EZ-GSS, RSS+DST has a maximum improvement of 1.56 times.

PAGE 99

Figure 6-13. Comparison of EDR performance on the Opencores circuits

Figure 6-14. Comparison of EDR performance when one region is active

In the next set of experiments, we compare the approaches when two of the three regions are active. The results are shown in Figure 6-15. The RSS+DST numbers are the average of the three possible scenarios of two active regions in the circuit. RSS+DST performs the best among the three approaches, with a maximum improvement of 2 times (compared to GSS) and 1.3 times (compared to EZ-GSS).

6.4.4 Results for Four Regions

In this case, we create four regions in each circuit. In the first experiment, we compare the approaches when one of the four zones is active. The results are shown in

PAGE 100

Figure 6-15. Comparison of EDR performance when two regions are active

Figure 6-16. As expected, RSS+DST performs best, with a maximum improvement of 3.2 times compared to GSS.

Figure 6-16. Comparison of EDR performance when one region is active

In the next experiment, we observe the EDR performance when 2 of the 4 zones are active. The results are shown in Figure 6-17. RSS+DST performs up to 2 times better in error detection compared to GSS and 1.4 times better compared to EZ-GSS. Finally, we assume that three of the four zones are active and observe the error detection performance. The results, shown in Figure 6-18, reveal that RSS+DST detects errors better than either of the other two approaches. Note that the improvement is not as significant as in the other scenarios. This is because when more

PAGE 101

Figure 6-17. Comparison of EDR performance when two regions are active

zones are active, GSS and EZ-GSS can deliver better results relative to RSS+DST than when only a few regions are active.

Figure 6-18. Comparison of EDR performance when 3 regions are active

6.4.5 Hardware Overhead

We have developed a Verilog module that is parameterizable for m regions per circuit and n trace signals. We have synthesized both our controller (which generates the select signals for the MUXes) and the datapath (MUX structure) described in Section 6.3 using Synopsys Design Compiler with the lsi_10k technology library. The controller area corresponding to n = 32 and m = 4 (a reasonably realistic scenario) is 239. The corresponding datapath area, consisting of 32 multiplexers, is 185. Therefore the total

PAGE 102

area for our design is 239 + 185 = 424. The trace buffer, which is an integral part of the post-silicon debug methodology, occupies much more area than the controller. A typical trace buffer of 32 × 1024 bits, when synthesized using the same library, is found to occupy an area of almost 60000, which is about 141 times the total area of our controller and datapath. We believe that the trace controller has an acceptable (negligible) area overhead, considering that our approach can detect up to 3 times more errors compared to state-of-the-art existing methods.

6.5 Summary

Existing trace signal selection techniques assume that errors are uniformly distributed across the circuit. This assumption may not be valid in many practical scenarios. At design time, our region-aware signal selection approach selects beneficial signals for each region based on information regarding error zones. During execution, our dynamic signal tracing controller enables the designer to trace a different set of signals based on the regions that are relevant (active) during a certain duration. Our experimental results demonstrate that our approach can detect significantly more (up to 3 times) errors compared to existing approaches.

PAGE 103

CHAPTER 7
TRACE DATA COMPRESSION USING STATICALLY SELECTED DICTIONARY

During post-silicon debug, the traced signal states are stored in an on-chip trace buffer. The trace buffer size dictates the amount of data that can be stored, and hence directly affects the observability of the design. Since the trace buffer is used only for debugging, it is better to keep its size as small as possible to reduce the overall cost, area and energy requirements. One option to enhance observability without increasing the debug overhead is to compress the trace data before storing it in the trace buffer. We have proposed a lossless dictionary-based width compression scheme that operates in real time to compress the trace data. Unlike [16], our method chooses the dictionary offline, which provides better compression performance as well as a large reduction in compression architecture overhead. Three different compression algorithms have been proposed to trade off compression performance against architecture overhead. We use the Compression Ratio, defined in Equation 7, as the metric to measure the efficiency of a compression algorithm. A higher compression ratio implies better compression.

    \text{Compression Ratio} = \frac{\text{Uncompressed Data Size}}{\text{Compressed Data Size}}    (7)

7.1 Trace Data Compression

The existing compression techniques compress the trace data by selecting a dictionary dynamically during execution. This not only results in inferior compression performance (due to non-optimal dictionary selection), but also increases the architecture overhead. This section describes our trace data compression techniques. The overview is shown in Figure 7-1. Our approach is based on an important observation. In any post-silicon debug environment, after the trace data is collected from the chip, it is validated by checking

PAGE 104

Figure 7-1. Overview of our trace compression procedure

with a set of ideal trace data that is obtained from a golden model. Since very few (2-5%) bugs actually remain to be tracked during the post-silicon debug phase, only a few cycles produce erroneous values [14, 15] that differ from the ideal ones. We utilize this information to design our approach. Since the difference between the ideal and the actual trace data is very small, the same dictionary that is applicable for ideal trace data compression can be reused to compress the actual trace data. This addresses both problems by providing better compression performance and reducing the architecture overhead (see footnote 1). The compressed data is then read out through a channel to a debugger, where it is checked against the ideal trace data. Any discrepancy in the trace data is reported as an error. As can be seen from our analysis in Section 7.1.3, introducing 2-5% error in the trace data results in a 2-6% penalty in compression performance, which is acceptable. As can be seen from the discussion in Section 7.2, even with the introduction of errors, our technique incurs a smaller compression penalty than the existing trace compression methods [16]. The remainder of this section describes our dictionary selection algorithms and presents a theoretical analysis of the maximum penalty possible when the dictionary from the ideal trace data is used to compress the actual (potentially erroneous) trace data.

Footnote 1: There is no need to implement a dynamic dictionary selection algorithm.
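The central idea above, selecting the dictionary once from golden trace data and reusing it on the actual trace, can be illustrated with a small sketch. It is not the hardware compressor of this chapter: the function names and the toy trace are hypothetical, dictionary storage is not counted, and the per-word encoding (one flag bit plus d index bits for a dictionary hit, one flag bit plus the raw 32-bit word otherwise) is taken from the bit counts used in the analysis of Section 7.1.3.

```python
import math
from collections import Counter
from typing import List, Sequence

WORD_BITS = 32  # trace word width assumed in this chapter's analysis


def select_dictionary(golden: Sequence[int], n_entries: int) -> List[int]:
    """Keep the n_entries most frequently repeated words of the golden trace."""
    return [word for word, _ in Counter(golden).most_common(n_entries)]


def compressed_bits(trace: Sequence[int], dictionary: Sequence[int]) -> int:
    """Flag bit + index for dictionary hits, flag bit + raw word for misses."""
    index_bits = max(1, math.ceil(math.log2(len(dictionary)))) if dictionary else 0
    hits = set(dictionary)
    return sum(1 + (index_bits if w in hits else WORD_BITS) for w in trace)


def compression_ratio(trace: Sequence[int], dictionary: Sequence[int]) -> float:
    return len(trace) * WORD_BITS / compressed_bits(trace, dictionary)


# Hypothetical golden trace of 10 words and an actual trace with one corrupted word.
golden = [0xA] * 6 + [0xB] * 3 + [0xC]
actual = list(golden)
actual[0] = 0xFF  # a single erroneous word (10% error rate)

dictionary = select_dictionary(golden, n_entries=2)
extra = compressed_bits(actual, dictionary) - compressed_bits(golden, dictionary)
print(compression_ratio(golden, dictionary), compression_ratio(actual, dictionary))
print("penalty:", extra / (len(golden) * WORD_BITS))
```

In this toy case the corrupted word was previously a dictionary hit, so it costs 33 bits instead of 2, and the penalty stays below the 10% error rate.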

PAGE 105

7.1.1 Dictionary Selection Algorithms

We have explored three algorithms for compressing the trace data, namely dictionary-based compression (DC), bitmask-based compression (BMC) and fixed dictionary MBSTW (fMBSTW) based compression. All three techniques use a dictionary for compression. The dictionary selection is vital since the dictionary is reused to compress the actual trace data. We now describe how the dictionaries are selected in order to achieve the maximum compression performance.

7.1.1.1 Dictionary-based compression (DC)

Algorithm 9 outlines the dictionary selection method. In dictionary-based compression, the main aim is to include in the dictionary the unique entries that have the most repetitions in the data set. Therefore, the first step determines all the unique entries in the data set. We then find the number of repetitions for each entry. The unique entries are sorted in descending order of the number of repetitions. The entries with the highest number of repetitions are included in the dictionary. Details of dictionary selection for bitmask-based compression are explained in [54, 55].

Algorithm 9: Dictionary selection algorithm for DC
    M = Number of unique entries
    N = Number of dictionary entries
    DIC = Dictionary
    for each entry in M do
        Calculate the number of repetitions in the entire data set
    end for
    Sort the M entries in decreasing order of repetition count
    Include the first N entries in DIC

7.1.1.2 Bitmask-based compression (BMC)

The dictionary selection for bitmask-based compression follows the same principle as dictionary-based compression, that is, select the dictionary entries that give the maximum savings. However, there is a minor difference between the two. While savings for DC correspond to just the repetitions, for BMC they include those due to bitmask-based

PAGE 106

matches as well. Hence, the savings for each unique entry should be calculated based on the direct as well as the bitmask-based matches. The entries are then sorted in order of savings and included in the dictionary. The dictionary selection algorithm is shown in Algorithm 10.

Algorithm 10: Dictionary selection algorithm for BMC
    M = Number of unique entries
    N = Number of dictionary entries
    DIC = Dictionary
    for each entry in M do
        Calculate the savings due to repetition and bitmask-based matching in the entire data set
    end for
    Sort the M entries in decreasing order of total savings
    Include the first N entries in DIC

7.1.1.3 Fixed dictionary MBSTW compression (fMBSTW)

The fMBSTW algorithm follows the same compression technique as MBSTW compression [16]. The difference from MBSTW is that the dictionary is selected statically and the number of dictionary entries is limited. The dictionary selection steps for fMBSTW are given in Algorithm 11. The algorithm is shown for 2-fMBSTW (2 strings are encoded together, similar to 2-MBSTW). It can be further extended to 3-fMBSTW, where 3 strings are encoded together.

Figure 7-2 shows an illustrative example of dictionary selection using Algorithm 11. In this example, the strings in the trace data are represented using p, q, r, s, t. The amount of savings for each 2-tuple is shown in Figure 7-2. We want to have a dictionary of size 4. As can be seen in the figure, the highest savings is obtained from a 2-tuple whose last entry is s. Both entries of this tuple are now included in the dictionary, and last_entry is s. Next, we look for the 2-tuple with first entry s that has the maximum savings; this 2-tuple is selected and included in the dictionary. When searching for the next 2-tuple, the one giving the highest savings (see Figure 7-2) involves r. However, r is already present in the dictionary.

PAGE 107

Algorithm 11: Dictionary selection algorithm for fMBSTW
    M = No. of unique entries
    N = No. of dictionary entries
    DIC = Set of dictionaries
    first_entry = last_entry = NULL
    Create a 2-tuple for each pair of entries in M
    for each 2-tuple do
        Calculate the savings across the entire data set assuming only this tuple is in the dictionary
    end for
    Find the 2-tuple with the highest savings and add it to DIC
    first_entry = first entry of 2-tuple
    last_entry = last entry of 2-tuple
    size = 2
    while size is less than N do
        Find the 2-tuple that starts with last_entry and produces maximum savings
        if such a 2-tuple exists then
            Include the 2-tuple in DIC, size = size + 1
        else
            Find any 2-tuple (not containing an entry already in DIC) which has the highest savings and include it in DIC
            last_entry = last entry of the 2-tuple, size = size + 2
        end if
    end while

Hence, that 2-tuple is avoided. The 2-tuple having the next highest savings (see Figure 7-2) contributes t. Therefore, t is selected for the dictionary. In this way, the dictionary is built up.

7.1.2 Dynamic Trace Data Compression

Our final goal is to debug the DUT, for which we need the trace data from it. Applying a set of tests produces the trace data from the DUT, which is compressed to reduce the size of the trace buffer. The overview of the compression architecture is shown in Figure 7-3. As can be seen, the compression architecture consists of two parts, the dictionary and the actual compression engine. Depending on the design and the associated constraints, a specific compression algorithm and its corresponding dictionary are used. For example, when BMC is most suitable for a design, the compression engine will implement BMC and the dictionary will be the one selected

PAGE 108

Figure 7-2. Example of dictionary selection in fMBSTW

Figure 7-3. Actual Trace Data Compression

PAGE 109

for BMC. It should be noted that the dictionary size is fixed here and not variable as in the case of dynamic dictionary selection [16]. In fact, [16] tried to include every single unique string in the dictionary. This increases the dictionary size, thereby introducing significant architecture overhead, and also degrades the compression performance (since the number of bits used to index the dictionary increases with the dictionary size). Our approach eliminates these disadvantages by keeping a limited number of profitable entries in the dictionary.

7.1.3 Performance Analysis with Erroneous Trace Data

Our approach is promising due to its use of a statically selected dictionary. However, this dictionary will be used to compress actual (potentially erroneous) trace data. This section analyzes our procedure and determines the performance degradation that may occur when the dictionary obtained from ideal trace data is used to compress the actual trace data. We have kept the trace data length constant at 32 bits. We introduce the term compression penalty, which is the ratio of the number of extra bits needed for compression when error is introduced to the original trace data size. A lower compression penalty means fewer bits are needed to accommodate the error, and hence better compression performance.

    \text{Compression Penalty (CP)} = \frac{\text{Number of extra bits needed}}{\text{Size of original trace data}}    (7)

We first analyze the compression penalty for DC and BMC. Next, a similar analysis is performed for fMBSTW.

7.1.3.1 Compression penalty for DC and BMC

We derive the compression penalties for the two methods DC and BMC. In this section, we make two important observations.

Theorem 1. When a statically selected dictionary (based on golden trace data) is used, the compression penalty is bounded by the percentage of error introduced in the actual trace data.

PAGE 110

Proof. Let there be x strings in the original trace data. Let the percentage of error in the actual trace data be l, expressed as a fraction (l < 1). The introduction of error changes lx strings. In the worst case, all of these lx strings will be among the strings originally compressed, and they will now be uncompressed due to contamination. Let the number of bits required to compress the rest (that is, the (1 - l)x strings) in the data set be M (see footnote 2). It should be noted that these strings are not affected by error injection and hence, the value of M remains constant in both cases. Let the number of dictionary entries be 2^d, so that d bits are needed to represent the dictionary index. The lx strings were compressed in the ideal case using (1 + d) bits each. If y_{ideal} is the number of bits after compression for the ideal trace data, it can be written as

    y_{ideal} = M + lx(1 + d)    (7)

Now, let us analyze the actual trace data. In the worst case, all of the lx strings remain uncompressed. Each of these strings will require 33 bits (see footnote 3) to be represented. The M bits required to represent the (1 - l)x strings will remain the same. If y_{faulty} is the number of bits needed to represent the strings now, it can be written as

    y_{faulty} = M + lx(33)    (7)

which implies

    y_{faulty} = y_{ideal} + lx(32 - d)    (7)

Therefore, the number of extra bits needed, represented as y_{extra}, is

    y_{extra} = y_{faulty} - y_{ideal} = lx(32 - d)    (7)

Footnote 2: Some of the strings may be compressed, while the rest are uncompressed.
Footnote 3: 32 bits (original size), plus one bit to indicate that the string is not compressed.

PAGE 111

If CP_{DC} is the compression penalty for DC, then, since the original trace data occupies 32x bits, the definition gives

    CP_{DC} = \frac{l(32 - d)}{32}    (7)

As can be seen, CP_{DC} is always less than l, and hence is bounded by it.

For example, with 8 dictionary entries, we have d = 3, and assuming the error rate is 5% (which is the maximum error rate in these scenarios [14, 15]), we get CP_{DC} of approximately 4.5%. Thus, only a slight compression penalty is introduced in DC even in the worst case. It can be seen from Equation 7 that an increase in dictionary size lessens this degradation.

Theorem 2. Compared to the ideal case (if the dictionary were selected using the erroneous trace data), the compression penalty using the statically selected dictionary (based on golden trace data) will be bounded by twice the percentage of error.

Proof. We would like to see how much compression would be obtained if the actual trace data were compressed without the help of the ideal dictionary. In this case, the dictionary entries might differ from those of the ideal dictionary. If n is the number of extra strings that can be compressed in the actual case and m is the number of strings that were compressed in the ideal case, then the total number of strings compressed is m + n - lx. The maximum value of n is lx; otherwise, these new strings would have been compressed in the ideal trace compression, that is, they would have been represented in the ideal dictionary. Therefore, the maximum number of strings compressed is m, which is the same as for the golden trace data. As an example, consider a hypothetical trace data set of 20 entries. Suppose we choose the best 2 entries for the dictionary, each of which can compress a total of 5 entries. Therefore the total number of compressed entries will be 10. Corresponding to

PAGE 112

the symbols described above, x = 20, m = 10 and d = 2. Let the error rate be 10%, that is, l = 0.1. When error is introduced, the number of strings contaminated is lx, that is, 2. In the worst case, both of these strings were among the m compressed strings and are now left uncompressed due to errors. The number of compressed strings is now m - lx, which is equal to 8. Now, if we try to compress the erroneous data with a different dictionary, let the number of extra strings being compressed be n. If n were greater than 2, the new dictionary would have been selected in the first place, so that the value of m would be different. So, the maximum value of n is bounded by lx. However, in the best case, the contaminated strings can all be compressed using some other entry, which is not part of the current dictionary. Let us revisit the previous example to explain this. Suppose all of the lx contaminated entries can be compressed using some other entry. If that entry has a high enough frequency, it will be included in the dictionary. In this example, the maximum frequency (original, without contamination) that such an entry can have is 5; otherwise, it would have been included in the original dictionary. Therefore, the maximum number of strings that can be compressed with the new dictionary is m + lx, that is, 12 in this case. Hence, the maximum number of extra strings that can be compressed using the dynamically selected dictionary is (m + lx) - (m - lx), that is, 2lx, which means the difference in compression ratio is at most 2l. Therefore, the difference in compression efficiency between the dictionary based on golden data and the dictionary based on actual data will be bounded by twice the error rate in the data.

The analysis for BMC is similar to that for DC. This is because, even for BMC, the worst case occurs when some strings that were completely compressed (not using bitmasks) become uncompressed due to the introduction of error.

7.1.3.2 Compression penalty for fMBSTW

To find the compression penalty, we analyze the worst-case condition for 2-fMBSTW here. The worst-case scenario can be divided into two parts. The first part is similar to

PAGE 113

that of DC and BMC, that is, the worst case occurs when some completely compressed strings become uncompressed. The second part of the condition is explained as follows. Suppose two consecutive strings correspond to two consecutive dictionary entries a, b. Then both strings will be compressed using the 11 prefix, followed by the dictionary index corresponding to a. However, if either a or b gets contaminated by error, in the worst case one of them is left uncompressed and the other one is compressed separately, which requires more bits to compress the trace data. We now investigate the compression penalty in this case. Let the error rate and the number of strings be l and x as before. Let d be the number of bits needed to represent the dictionary index. The number of strings changed is lx. Each of these strings corresponds to a tuple that is broken due to perturbation. Before the introduction of error, the number of bits required to compress these is given as y_{ideal} in Equation (7).

    y_{ideal} = lx(2 + d)    (7)

Here, 2 bits are needed to represent the prefix 11 and d bits for the dictionary index of a. After perturbation (of b), in the worst case, a is independently compressed as a single string using the prefix 01 (see footnote 4). Therefore, the number of bits needed to represent the tuple is (36 + d) (see footnote 5). There will be lx such occurrences. Therefore, the total number of bits needed to represent the erroneous tuples is given by y_{faulty} as

    y_{faulty} = lx(36 + d)    (7)

Footnote 4: As discussed in Section 7.1.1.3.
Footnote 5: (2 + d) + (2 + 32), where 2 + d bits are needed to compress a and 2 + 32 bits are needed to represent the uncompressed string b.

PAGE 114

As before, let M be the number of bits required to compress the other (1 - l)x strings. Since M is unchanged in either case, the number of extra bits needed is given by

    y_{extra} = y_{faulty} - y_{ideal} = 34lx    (7)

Therefore, the compression penalty is given by

    CP_{fMBSTW} = \frac{34l}{32}    (7)

With a 5% error rate, we get CP_{fMBSTW} = 5.31%. Thus, even with the introduction of 5% error, the compression penalty is small. This analysis is verified with experimental results in Section 7.2.3.

7.2 Experiments

We have compared the compression performance of our approach with the algorithms proposed by Anis et al. [16] (MBSTW and WDLZW). We have also investigated our compression performance when the number of dictionary entries is varied. We show that our methods require much less compression architecture overhead than those in [16]. Finally, in Section 7.2.3, we analyze the effect of the introduction of errors on the compression ratio and validate the equations developed in Section 7.1.3. We have applied all the algorithms to the 5 largest ISCAS89 benchmarks.

7.2.1 Compression Performance

First, we compare the compression performance of our algorithms with the algorithms in [16] using traces obtained from the ISCAS89 benchmark circuits. The traces were obtained by following the approach outlined in [7]. The results are reported in Figure 7-4. We have fixed the number of dictionary entries at 8 in each of the two compression

PAGE 115

algorithms, DC and BMC. For MBSTW, we have used the 2-MBSTW algorithm (see footnote 6). For the fMBSTW algorithm, the number of dictionary entries is floored to the nearest integer that is a power of 2. It can be seen that the fMBSTW approach works best in all cases except s38584. This is because the traces of s38584 have very few unique entries. As a result, even with 8 dictionary entries, a large portion of the trace can be compressed using DC. DC works better than MBSTW in most cases and worse only in some cases (s9234 and s35932). The reason is the large number of unique entries in those trace data, which are effectively captured by MBSTW but not by the 8-entry dictionary used in DC. If the number of bits needed to represent the compressed data is analyzed, it can be seen that fMBSTW provides up to 60% reduction in compressed data size compared to MBSTW and 70% compared to WDLZW. WDLZW provides the worst performance for almost all the benchmarks. The high redundancy in the trace data set is responsible for its somewhat good performance on s38584 and s38417.

Next, we vary the dictionary size of DC to see the effect on the compression ratio. The results are shown in Figure 7-5. We have varied the number of dictionary entries over 8, 16, 32 and 64. As can be seen from Figure 7-5, the variation is not uniform across the benchmarks. For s9234, s13207 and s35932, the compression ratio increases with the number of dictionary entries. On the other hand, for s38584 and s38417, increasing the number of dictionary entries worsens the compression ratio, and the optimal compression is achieved at 8 dictionary entries. Once we reach an optimal compression ratio, any increase in the number of dictionary entries adds to the total compressed data size, due both to the increased number of entries in the dictionary and to the increase in the number of bits representing the dictionary index.

Footnote 6: Provides better performance than the 3-MBSTW algorithm.
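The non-monotonic effect of dictionary size described above can be reproduced with a small model. The sketch below is only an illustration under stated assumptions: it counts the dictionary storage (32 bits per entry) together with the compressed payload, uses one flag bit per word as in Section 7.1.3, and runs on a made-up trace; `dc_total_bits` is a hypothetical helper, not code from these experiments.

```python
import math
from collections import Counter
from typing import Sequence

WORD_BITS = 32  # trace word width


def dc_total_bits(trace: Sequence[int], n_entries: int) -> int:
    """Payload bits plus dictionary storage for an n_entries-word DC dictionary."""
    index_bits = max(1, math.ceil(math.log2(n_entries)))
    dictionary = {w for w, _ in Counter(trace).most_common(n_entries)}
    # 1 flag bit + index for hits, 1 flag bit + raw word for misses.
    payload = sum(1 + (index_bits if w in dictionary else WORD_BITS) for w in trace)
    storage = n_entries * WORD_BITS
    return payload + storage


# Made-up trace with only four distinct words (few unique entries).
trace = [1] * 40 + [2] * 30 + [3] * 20 + [4] * 10
sizes = {n: dc_total_bits(trace, n) for n in (2, 4, 8, 16)}
best = min(sizes, key=sizes.get)
print(sizes)
print("best dictionary size:", best)
```

Once every distinct word fits in the dictionary, larger dictionaries only add storage and wider indices, which mirrors the behaviour reported for the traces with few unique entries.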

PAGE 116

Figure 7-4. Comparison of compression performance

Figure 7-5. Compression performance with dictionary entries
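As a reference point for the penalty experiments in Section 7.2.3, the worst-case bounds derived in Section 7.1.3 can be evaluated directly. The helper names below are hypothetical; the formulas are the closed forms CP_DC = l(32 - d)/32 and CP_fMBSTW = 34l/32 from that section, with l given as a fraction.

```python
WORD_BITS = 32  # trace word width used in the analysis


def cp_dc(error_rate: float, index_bits: int) -> float:
    """Worst-case penalty for DC (and BMC): each hit turns into a 33-bit miss."""
    return error_rate * (WORD_BITS - index_bits) / WORD_BITS


def cp_fmbstw(error_rate: float) -> float:
    """Worst-case penalty for 2-fMBSTW: (36 + d) - (2 + d) = 34 extra bits per broken tuple."""
    return error_rate * 34 / WORD_BITS


for rate in (0.02, 0.04, 0.06, 0.08, 0.10):
    print(f"l={rate:.2f}  CP_DC(d=3)={cp_dc(rate, 3):.4f}  CP_fMBSTW={cp_fmbstw(rate):.4f}")
```

For every error rate the DC bound stays below l while the fMBSTW bound is slightly above it, consistent with the sharper growth reported for fMBSTW.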

PAGE 117

7.2.2 BRAM Requirement (Hardware Overhead)

The dictionary for compression has to be stored in on-chip 32-bit BRAMs. We have computed the total size of BRAMs needed for compression using each of these algorithms. Figure 7-6 compares the requirements of the approaches. Since the dictionary size is always fixed (8 entries) for DC and BMC, the number of BRAMs required by these two algorithms is significantly less than for any of the other approaches. WDLZW has the highest BRAM requirement since it captures all the double-symbol repetitions (which also worsens the compression performance). For fMBSTW, the number of BRAMs is set to the nearest higher power of 2 for the number of unique entries in the stream (see footnote 7). From Figure 7-6, it can be seen that our two methods (DC and BMC) incur almost 96% less compression architecture overhead than MBSTW and almost 99% less than WDLZW. There is a trade-off between better compression ratio and lower architecture overhead. As can be seen from Figure 7-4 and Figure 7-6, either of the two techniques, BMC or fMBSTW, can be applied depending on priority: BMC can be used for the least area overhead (up to 96% reduction) with reasonable compression improvement (10%) compared to MBSTW, whereas fMBSTW should be used for the best possible compression (up to 60%) while providing a reasonable (up to 84%) reduction in BRAM requirement.

7.2.3 Compression Performance with Erroneous Trace Data

We now validate the analysis in Section 7.1.3. Errors have been inserted randomly in the trace data at rates of 2% to 10% in steps of 2%, and the resulting data is compressed using DC, BMC and fMBSTW. Figure 7-7 shows the compression penalty in DC for varying percentages of error. It can be seen that the change in compression penalty complies with Equation 7 in Section 7.1.3. For

Footnote 7: Results in Figure 7-4 are also reported using this configuration.

PAGE 118

Figure 7-6. BRAM requirements

example, putting a value of l = 2% in Equation 7 results in a compression penalty of less than 2%, which matches the figure for all the benchmarks. We have conducted similar experiments for the BMC-based compression technique as well. The results in Figure 7-8 show that the compression penalty also follows Equation 7. We now verify the last part of the discussion in Section 7.1.3, that is, the change in compression penalty with error rate for fMBSTW. We have conducted similar experiments, and the results are shown in Figure 7-9. An important observation here is that the change in penalty is sharper than in Figure 7-7 or Figure 7-8. This is expected from the discussion in Section 7.1.3, since in fMBSTW 2 strings are affected when an error is introduced, whereas in DC or BMC only 1 string is affected. Finally, we compare how the introduction of errors affects the compression performance of MBSTW and fMBSTW. The results are shown in Figure

PAGE 119

Figure 7-7. Comparison of compression penalty for DC

Figure 7-8. Comparison of compression penalty for BMC

PAGE 120

Figure 7-9. Comparison of compression penalty for fMBSTW

7-10. We have introduced 2% error into every benchmark's trace data. It can be seen that the compression penalty obtained using fMBSTW is always less than that of MBSTW, with a maximum difference of 4% for s38417. The reason for the higher penalty in MBSTW is that if an error is introduced early, MBSTW cannot benefit from a profitable sequence. In summary, our approach (fMBSTW) performs significantly better irrespective of the percentage of errors in the data set.

7.3 Summary

The trace buffer size is limited due to area and cost constraints. Trace data compression schemes based on dynamic dictionary selection have been popular because they allow more signal states to be stored in a fixed-size trace buffer. We have proposed a trace data compression technique that employs a statically computed dictionary. We have used three compression algorithms for compressing the trace data. Our approaches can produce up to 60% better compression performance, and

PAGE 121

Figure 7-10. Comparison of compression penalty

reduce the compression hardware overhead by up to 84% compared to the best-known existing approaches.

PAGE 122

CHAPTER 8
OBSERVABILITY-AWARE DIRECTED TEST GENERATION

Two major factors governing efficient error detection are the quality of the input tests and the selected trace signals. Chapters 3 to 6 primarily focused on improving signal observability. However, in order to detect errors, it is equally important to consider the controllability aspect of tests. During post-silicon validation, not all the primary outputs of a design block are visible (since they may be internally connected to other components of the SoC design). Also, the number of primary outputs of a circuit is typically larger than the trace buffer width, which determines the number of signal states that can be stored per cycle. Hence, the primary outputs cannot be used as observation points. Existing methods [5, 7] on signal selection assume that the input tests are always random in nature. However, once the trace signals are known, Automatic Test Pattern Generation (ATPG) tools can be used to generate efficient tests for error detection if the probable error locations are available. In modern SoC design methodology, it is found that regions where errors are detected during pre-silicon verification are more likely to be erroneous during post-silicon validation. Therefore, the pre-silicon engineer can provide information about the probable erroneous locations for post-silicon validation. This information is essential in determining the input tests as well as the trace signals. The inter-dependence of input tests and trace signals is shown in Figure 8-1.

Figure 8-1. Outline of proposed technique

PAGE 123

Our proposed approach takes as input the circuit and the trace buffer width. If the trace buffer width is n, our goal is to select the n best trace signals and determine their corresponding test cases, so that error detection is maximized. As an additional input, we consider the probable erroneous locations in the circuit. This information is provided by the pre-silicon engineer. In the first step, we run the ATPG tool with the primary outputs as observation points to obtain a set of directed tests for error detection. Next, we use these tests to determine the profitable trace signals. The ATPG tool is then used to generate tests using the new trace signals as observation points and to obtain the fault coverage. This process is repeated, that is, the generated tests are re-used to determine a new set of observation points (trace signals), and the ATPG tool is used to determine the fault coverage and produce a new set of tests. If the fault coverage does not improve, the old set of observation points is selected for tracing and the corresponding tests are used as input tests. On the other hand, if the improvement is significant, the entire process is repeated. This process continues until the fault coverage reaches 100% or does not improve in subsequent runs. The framework of our proposed approach is shown in Figure 8-2.

Figure 8-2. Observability-aware test generation flow
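The iteration between test generation and signal selection described above can be outlined as a simple loop. This is a sketch of the control flow only: `run_atpg` and `select_signals` are hypothetical callbacks standing in for the ATPG tool and the signal selection step, the stopping rule treats any non-increase in coverage as "no improvement", and the stub functions in `demo` exist purely to make the example executable.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Tests = List[str]
Signals = List[str]
Result = Tuple[Signals, Tests, float]


def co_optimize(run_atpg: Callable[[Sequence[str]], Tuple[Tests, float]],
                select_signals: Callable[[Tests, int], Signals],
                primary_outputs: Signals,
                width: int,
                max_rounds: int = 10) -> Optional[Result]:
    """Alternate signal selection and ATPG until fault coverage stops improving."""
    # First step: tests generated with the primary outputs as observation points.
    tests, _ = run_atpg(primary_outputs)
    best: Optional[Result] = None
    for _ in range(max_rounds):
        signals = select_signals(tests, width)   # test-aware signal selection
        new_tests, coverage = run_atpg(signals)  # trace-signal-aware test generation
        if best is not None and coverage <= best[2]:
            break                                # no improvement: keep the old set
        best = (signals, new_tests, coverage)
        tests = new_tests
        if coverage >= 100.0:
            break
    return best


def demo() -> Optional[Result]:
    """Run the loop with stub tools that report coverages 70, 85, 85 (percent)."""
    coverages = iter([60.0, 70.0, 85.0, 85.0])
    calls: List[List[str]] = []

    def fake_atpg(observation_points: Sequence[str]) -> Tuple[Tests, float]:
        calls.append(list(observation_points))
        return [f"t{len(calls)}"], next(coverages)

    def fake_select(tests: Tests, width: int) -> Signals:
        return [f"{tests[0]}_s{i}" for i in range(width)]

    return co_optimize(fake_atpg, fake_select, ["po0", "po1"], width=2)


print(demo())
```

With the stub coverages, the loop keeps the signal set and tests from the second round, since the third round brings no further improvement.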

PAGE 124

In this chapter, we consider test generation in the presence of electrical errors, covering both soft errors and crosstalk faults. Section 8.1 describes our test-aware signal selection technique. Section 8.2 presents our trace signal aware test generation approach.

8.1 Test-aware Signal Selection

Once the tests are determined using the ATPG tool, the next set of trace signals needs to be determined to improve the error detection performance. In general, during the selection of trace signals, the input tests are assumed to be random. We look at the special case in which the input test sets are known prior to signal selection. Knowledge of the input tests can be used to determine the signals very efficiently, especially when the main focus is error detection. Our signal selection procedure is presented in Algorithm 12. The remainder of this section describes the three important steps of the signal selection algorithm in detail.

Algorithm 12: Test-aware signal selection
    Input: Circuit, Trace Buffer Width, Test Set T
    Output: Trace Signals
    for each test vector do
        1: Simulate each fault in the circuit.
        2: For each signal in the circuit, determine whether it can detect the fault.
    end
    3: Compute the error detection ability of each signal.
    while trace buffer width is not reached do
        4: Find the signal with the highest Error Detection Ability and select it.
        5: Remove overlap.
    end
    Return selected trace signals.

8.1.1 Fault Simulation

The best way to know whether a fault can be detected using a particular observation point and a test vector is to simulate the fault and note the state of the observation point. Since we already have the set of test vectors, the fault simulation is straightforward. For each test vector, we first run a simulation without any error and observe the correct states of the various signals. Then, we perform a simulation for every fault with the same

PAGE 125

test vector. For each fault, the signal states of the circuit are observed. If the state of a signal differs from the ideal simulation, the fault has propagated to that signal. This process is repeated for each test case and each fault. For example, if there are m test vectors and n faults, there will be a total of m × n simulations. For each signal, we note the faults that it can detect. This is recorded as a binary variable EPP. For example, if in Figure 8-3, c can detect an error in a using any of the test vectors, EPP_{c,a} = 1. On the other hand, since d can never detect any error in a, EPP_{d,a} = 0.

Figure 8-3. Example circuit

8.1.2 Error Detection Ability Computation

The Error Detection Ability (EDA) of a node (signal) is a measure of the errors that the node can detect. A node can only detect errors in its fan-in cone. For example, in Figure 8-3, any error in c can only propagate to e and not to a, b or d. Therefore, the only nodes whose errors c can detect are a and b. The EDA of a node is the sum of all the errors that it detects during fault simulation.

    EDA_c = EPP_{c,a} + EPP_{c,b}    (8)

It should be noted that a node can detect an error using multiple test cases; however, the error should be counted only once. We enforce this by ensuring that EPP is a Boolean value. For example, if, by simulating 2 test cases, c can detect errors in a and b in both cases, EDA_c would be 2 and not 4. Once the EDA value of each node is computed, the node with the highest EDA value is selected for tracing. The next section describes how to remove the overlap of already selected signals before determining the next profitable signal.
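The bookkeeping behind EPP and EDA, together with the greedy selection loop of Algorithm 12, can be sketched as follows. The sketch assumes that fault simulation has already produced, for every test and fault, the set of signals whose state differs from the fault-free run; the table `detections`, the signal and fault names, and the alphabetical tie-break are all illustrative assumptions. Overlap removal is modelled by discounting faults that an already selected signal detects.

```python
from typing import Dict, List, Set

# Assumed input: detections[test][fault] = signals that differ from the golden run.
Detections = Dict[str, Dict[str, Set[str]]]


def build_epp(detections: Detections) -> Dict[str, Set[str]]:
    """epp[signal] holds each fault the signal detects under at least one test."""
    epp: Dict[str, Set[str]] = {}
    for per_fault in detections.values():
        for fault, signals in per_fault.items():
            for sig in signals:
                epp.setdefault(sig, set()).add(fault)  # a set counts each fault once
    return epp


def select_trace_signals(detections: Detections, width: int) -> List[str]:
    """Greedy selection: highest remaining EDA first, then remove the overlap."""
    epp = build_epp(detections)
    covered: Set[str] = set()
    chosen: List[str] = []
    while len(chosen) < width:
        candidates = sorted(s for s in epp if s not in chosen)
        if not candidates:
            break
        best = max(candidates, key=lambda s: len(epp[s] - covered))
        chosen.append(best)
        covered |= epp[best]  # faults already observable no longer count
    return chosen


# Hypothetical fault-simulation results for two tests.
detections = {
    "t1": {"f_a": {"c", "e"}, "f_b": {"c"}},
    "t2": {"f_a": {"c"}, "f_b": {"c", "e"}, "f_d": {"e"}, "f_g": {"h"}},
}
print({sig: len(faults) for sig, faults in build_epp(detections).items()})
print(select_trace_signals(detections, width=2))
```

Signal `c` sees both `f_a` and `f_b` under two tests but its EDA is 2, not 4. After `e` is chosen, `c` adds nothing new, so `h` is the more profitable second signal.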

PAGE 126

8.1.3 Overlap Removal

This part of the signal selection algorithm removes the effects of already selected signals and thus selects appropriate signals for improved error detection. To explain this, let us return to Figure 8-3. Consider node c, which is the first node to be selected for tracing. If EPP_{c,a} = 1 and EPP_{c,b} = 1, that is, the errors in a and b can propagate to c, the contributions of EPP_{e,a} and EPP_{e,b} should not be included when computing EDA_e. In this case,

    EDA_e = EPP_{e,c} + EPP_{e,d}    (8)

Thus, overlapping nodes, whose contributions have already been accounted for, should not be taken into account when computing the EDA value of a node. The process of overlap removal and signal selection continues until the trace buffer is full.

8.2 Trace Signal Aware Test Generation

Once the set of selected signals is known, the next step is to generate another set of tests based on these signals that can maximize the error detection ability. The ATPG tool is used for this purpose. We used ATALANTA as the ATPG tool; it generates tests from the fault list, considering the output nodes as observation points. In order to generate tests based on the selected trace signals, we modify the netlist to make the trace signals the observation points. The ATPG engine then generates the tests assuming the trace signals are the observation points. The fault coverage obtained with this new set of tests is computed. If it does not improve, the set of selected signals is reported as the trace signals, and the tests generated using the ATPG tool are used as directed input tests. Otherwise, the process in Section 8.1 is repeated.

8.2.1 Soft Errors and Faults

Soft errors are caused by ionizing radiation from radioactive impurities introduced into a chip during manufacture. These impurities may emit ionizing radiation such as alpha particles. When these alpha particles come in contact with a semiconductor, their kinetic energy

PAGE 127

getsconvertedtoelectricalenergy[ 56 ],whichresultsincreationofalargenumberoffreeelectronsandholes.Thisleadstoacreationofaninversionlayeraswellasavoltageglitchontheaffectedtransistor.Iftheglitchisofsufcientmagnitude,afaultylogicvalueisintroducedtemporarilyonanodeinthecircuit.ThisisknownasSingleEventTransient(SET).Ifthefaultyvalueispropagatedtoaprimaryoutput,theeventisknownasSingleEventUpset(SEU).WetrytogeneratedirectedteststodetectallSETsresultinginpossibleSEUs.Traditionally,softerrorsareassumedtoaffectmemoryelementssincetheycontainthemaximumdensityofbitssusceptibletosofterrors.VariousmechanismshavebeendevelopedtoprotectthememoryelementsusingErrorCorrectingCodes(ECC).However,withdecreaseinfeaturesizeandincreaseindesigncomplexity,combinationalcircuitsareequallyvulnerabletosoft-errors[ 31 ].Protectionofcombinationalelementsismoreexpensiveintermsofchiparea,powerandperformanceissuescomparedtomemoryelements.Hence,itisimportantthatfaultsincombinationalcircuitsduetosofterrorsaredetectedearly.Asoft-errormaybemaskedinherently(thatis,notpropagatedalongthecircuit)duetothefollowingfactors. LogicalMasking.Thisoccurswhenaparticlehitsaninputofagate,andoneoftheotherinputshaveacontrollingvalue.Inthiscase,thecontrollinginputwilldominatethepropagatedvalueandhence,theerroneousvalueintroducedbythesoft-errorwillneverbepropagated.LetusconsidertheexamplecircuitinFigure 8-3 .Letasoft-erroraffectthenodeaofthecircuit.Here,aisaninputtoanANDgatewhoseotherinputisbandoutputisc.Now,whentheerrorisintroducedata,ifthevalueofbis0,thatis,ifbisthecontrollinginputoftheANDgate,theerrorvaluewillnevergetpropagatedtotheoutputc.Thus,theerrorwillbemasked. ElectricalMasking.Iftheerrorintroducedduetoaparticlestrikeissuppressedbyelectricalpropertiesofsubsequentlogicgates,itwillneverreachanoutputandhenceismasked. Latching-WindowMasking.Iftheerrorreachesalatchatacyclewhenthelatchisnotacceptingitsinputvalue,itwillneverpropagatethroughthelatchandhencewillbemasked. 127
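
The logical-masking case can be checked by exhaustive simulation. The sketch below is an illustration only: it models just the AND gate with inputs a and b and output c described above, treats the error on a as a stuck-at value, and uses c as the observation point. The function names are hypothetical.

```python
from itertools import product

def and_gate(a, b):
    """Output c of the two-input AND gate."""
    return a & b

def detecting_tests(stuck_value):
    """Input vectors (a, b) for which a stuck-at fault on a is visible at c."""
    tests = []
    for a, b in product((0, 1), repeat=2):
        good = and_gate(a, b)              # fault-free value of c
        faulty = and_gate(stuck_value, b)  # value of c with a stuck
        if good != faulty:
            tests.append((a, b))
    return tests

sa0_tests = detecting_tests(0)  # tests for a stuck-at-0
sa1_tests = detecting_tests(1)  # tests for a stuck-at-1
print(sa0_tests, sa1_tests)
```

Every detecting vector has b = 1. With b = 0, the controlling input masks the error on a, so no test with b = 0 can expose it at c.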

These masking effects result in lower soft-error rates in combinational circuits compared to memory elements. However, with decreasing feature size, transistors become faster and hence electrical masking is reduced. Also, with deeper pipelines, processor clock rates increase, with a subsequent increase in the sampling rate of latches. As a result, the effect of latching-window masking also decreases. Therefore, the effect of soft errors on combinational circuits has become more prominent.

The error model used for soft errors is a simple stuck-at fault model. The nodes affected by radiation get stuck at certain fixed values depending on the amount of free electrons or holes created. The effect of soft errors on a node value depends on the following factors:

Output Capacitance. A smaller node capacitance stores less charge and hence discharges faster. Therefore, the node is more susceptible to soft errors.

Pull-up network. If a node has a weak pull-up network, its logic 1 value can easily be changed to logic 0 by an SET, thus leading to a stuck-at-0 fault.

Pull-down network. If a node has a weak pull-down network, its logic 0 value can be flipped to logic 1 by an SET, thus leading to a stuck-at-1 fault.

Detection of soft errors requires generation of test cases that activate the particular errors and propagate them to the primary outputs of the circuit. As discussed before, during post-silicon validation, not all the primary outputs of a block may be visible, since only some of the internal signals of the chip are traced. Therefore, in order to properly detect the faults, the erroneous values should propagate towards the traced signals and not to the primary outputs. The test generation problem should focus on generating a set of test cases that activate and propagate a maximal number of soft errors (if possible, all of them) to the observation points, that is, the trace signals.

Let us consider the example in Figure 8-4 to explain the test generation problem for soft errors. The two error points are P and Q, where the errors can be represented as s-a-0 and s-a-1, respectively. We would like to generate tests that activate them as well as propagate them to the primary outputs. The 5 input signals are named a, b, c, d and e. To detect the s-a-0 fault at P, the input test should be <1,1,1,1,X>, whereas to detect the s-a-1 fault at Q, the input test should be <1,1,1,1,0>. Therefore, the test that can detect both faults is <1,1,1,1,0>. ATPG algorithms can be designed to generate tests that detect these faults.

Figure 8-4. Example circuit illustrating test generation for soft errors

Algorithm 13 describes our test generation procedure for soft errors. In the first step, we identify all the internal signals (gate signals and fan-out branches) in the zone. For each of these signals, we perform test generation for s-a-0 and s-a-1 faults using ATPG.

Algorithm 13: Test generation for soft error detection
Input: Circuit, Trace signals, Soft-error affected zone Z
Output: Test set to detect the faults
1: Find the signals corresponding to Z, that is, the signals corresponding to nodes as well as fan-out signals.
2: Create a fault list with stuck-at-0 and stuck-at-1 at each node.
3: Use ATPG to generate tests for these faults. Use the trace signals as observation points.
Return the set of tests.

We developed a simulator to check whether the generated tests could actually excite and propagate the errors. As an ATPG tool, we use ATALANTA, which was developed at the CAD lab at Virginia Tech. ATALANTA takes as input the circuit as a netlist and the fault list, and generates test sets that help detect the errors. It also gives an estimate of the percentage of faults that are covered. If the signals to be traced are already known and we want to detect the faults at only those points and not at the primary outputs, we have to modify the netlist. ATALANTA always forces the errors to propagate towards the primary outputs. Hence, it is necessary to replace the primary outputs in the netlist with the trace signals. ATALANTA is then able to generate tests that propagate the errors towards the trace signals.

8.2.2 Crosstalk Faults

Crosstalk faults are caused by parasitic coupling capacitances between adjacent lines in a chip [40]. With decreasing feature size, the effect of coupling capacitances, and hence of crosstalk faults, becomes more prominent, leading to signal integrity problems [43]. Crosstalk faults are caused when the coupling capacitance between two lines exceeds a certain threshold. In such a case, if there are transitions on either or both of the lines, the transition on one influences the other and hence the voltage levels change, causing either a delay or a glitch. The line whose voltage level changes is known as the victim, while the line that causes the change is called the aggressor. We will explain crosstalk glitches and delays using the example circuit in Figure 8-5, which has 5 lines (signals), namely a, b, c, d and e. Let us assume the coupling capacitance between lines c and d exceeds the threshold, so that they can act as a probable aggressor-victim pair.

Figure 8-5. Example circuit illustrating crosstalk faults

During a crosstalk glitch, the victim line stays in a static state, while the aggressor undergoes a transition. If the transition effect is opposite to the state of the victim, a glitch is created. For example, if the victim is at a state of 0 while the aggressor has a positive transition, a positive glitch is formed on the victim line. Similarly, if the victim line is in a state of 1 and a negative transition occurs on the aggressor line, a negative glitch is created. Figure 8-5 has been redrawn in Figure 8-6A to show that when line c is in a steady state and line d makes a transition, a positive glitch is formed on line c, as shown in Figure 8-6B.

Figure 8-6. Positive glitch on c. A) Source of crosstalk glitch. B) Crosstalk glitch.

On the other hand, delays are created when both the aggressor and the victim undergo transitions. If the transitions are in the same direction, the overall delay is reduced. If the transitions are in opposite directions, the signal propagation delay is increased. Figure 8-7B shows a positive delay on line c due to transitions on both lines c and d (Figure 8-7A). It should be noted that both transitions need to be simultaneous in order for the delay to take effect. As can be seen in Figure 8-8, if the two transitions are not simultaneous, there will not be any delay.

Figure 8-7. Positive delay on c. A) Source of crosstalk delay. B) Crosstalk delay.

Figure 8-8. Non-simultaneous transitions

The effect of a crosstalk fault, that is, a delay or glitch, is propagated to the fan-out gates. In the case of sequential circuits, if the glitch duration or delay is shorter than the clock period, it gets suppressed. However, for combinational circuits, the effects get propagated to the outputs.

The test generation algorithm for crosstalk faults is shown in Algorithm 14. The first step of the algorithm is to find all the aggressor-victim pairs by observing their coupling capacitances. In this case, we consider single aggressor-single victim pairs only. However, the algorithm can be extended to multiple aggressors as well. Information on coupling capacitances is obtained from the layout information of the chip. Once we have identified all the pairs, the next step is to generate tests that provide transitions on either or both of the lines, depending on the desired type of crosstalk effect.

Algorithm 14: Test generation for crosstalk fault detection
Input: Circuit, list of coupling capacitances, threshold
Output: Test set to detect the faults
1: Find all the pairs of lines that contribute to crosstalk faults.
2: Duplicate the circuit.
3: Use ATPG to generate tests for these faults.
Return the test set.

We now explain our algorithm using crosstalk delay, that is, transitions should be present on both lines. Crosstalk glitches can be explained in a similar way. Duplication of the circuit is needed to create transitions on both aggressor and victim. For a combinational circuit, which does not have a clock signal, in order to emulate a transition, we need to make sure that the signal on a particular line changes between two adjacent time units. Let us consider the example circuit in Figure 8-5. Suppose lines c and d have been identified as a crosstalk pair. We would like to generate two sets of tests such that they fire transitions on both these lines. If we want to observe the effect of a crosstalk glitch, a transition should be enabled on only one line. In order to generate the transitions, we have duplicated the circuit in Figure 8-9. For each signal in Figure 8-5, there is a corresponding signal in Figure 8-9. For example, signal a in Figure 8-5 is duplicated as a' in Figure 8-9. Thus all the inputs are duplicated as well. The ATPG is used to generate the tests for this duplicated circuit; hence, it generates 2 tests for the original circuit, one corresponding to each set of inputs. In this example, the inputs to a, b, c correspond to the test in the first time frame, while the inputs to a', b', c' correspond to the test in the second time frame. Thus, in order to generate a transition at line c in Figure 8-5, the inputs to c and c' should be different in Figure 8-9. This can be forced by connecting an exclusive-or gate whose two inputs are c and c'. Since the output of an exclusive-or gate is 1 only when its two inputs are different, this ensures a transition on line c in Figure 8-5. Similarly, d and d' in Figure 8-9 are inputs to another exclusive-or gate, thus forcing a transition on d in Figure 8-5. We want to generate test cases that provide transitions on both lines. This is ensured by connecting an AND gate at the output of the two XOR gates. The ATPG is then used to generate tests so that the output o of the AND gate is 1. This ensures a transition on both lines. The ATPG tool can be run assuming the point o is s-a-0. In this case, the ATPG tool generates a test that forces o to be 1, and hence ensures a transition on both lines.

Figure 8-9. Duplicated circuit

We want the delay at the victim to be propagated to the output. In order to ensure that, o is connected to the fan-out branch of the victim and thus propagated to an observation point, or primary output. For example, if d (or d') is the victim line in Figure 8-9, which incurs some delay, we add o to the fan-out cone of d', in order to ensure that the delay in d in Figure 8-5 actually gets propagated to a primary output e (e' in this case). The modified circuit is shown in Figure 8-10. If we have trace signals, the observation points (trace signals) are enabled such that the ATPG generates tests which propagate the delay to these trace signals. A similar test-generation procedure can be applied for crosstalk glitches, in which case the transition should be only along the aggressors.
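
The duplication-based construction can be sketched in a few lines. Since the gate-level structure of Figure 8-5 is not reproduced here, the fragment below assumes a hypothetical circuit with inputs a, b, c, an internal line d = a AND b, and an output e = c OR d; the function names are likewise illustrative. Exhaustive enumeration stands in for the ATPG tool: it searches for pairs of input vectors (one per time frame) for which o, the AND of the two XOR gates, is 1.

```python
from itertools import product

def simulate(a, b, c):
    """Hypothetical circuit (assumed, not Figure 8-5): d = a AND b, e = c OR d."""
    d = a & b
    e = c | d
    return {"a": a, "b": b, "c": c, "d": d, "e": e}

def transition_check(frame1, frame2, victim="c", aggressor="d"):
    """Output o of the added logic: AND of the XORs over the two time frames."""
    s1 = simulate(*frame1)  # original circuit: first time frame
    s2 = simulate(*frame2)  # duplicated circuit: second time frame
    xor_victim = s1[victim] ^ s2[victim]
    xor_aggressor = s1[aggressor] ^ s2[aggressor]
    return xor_victim & xor_aggressor

def crosstalk_delay_tests():
    """All two-vector tests that force o = 1 (transitions on both lines)."""
    vectors = list(product((0, 1), repeat=3))
    return [(v1, v2) for v1 in vectors for v2 in vectors
            if transition_check(v1, v2) == 1]

tests = crosstalk_delay_tests()
print(len(tests), tests[0])
```

Each returned pair is a candidate two-pattern test for crosstalk delay on the assumed circuit. A real flow would additionally require the effect to propagate to a trace signal, which this sketch does not model.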

Figure 8-10. Modified circuit

8.3 Experiments

We have applied our proposed approach on the ISCAS'85 combinational benchmarks. For the first set of experiments, soft errors are the only electrical errors considered present in the circuits. For each experiment, we applied 250 errors to each circuit. Random nodes are selected as error points. The ATPG tool, ATALANTA, is used to generate directed tests in order to detect those faults. An analysis of the memory requirement for our test generation algorithm described in Section 8.2 is shown in Table 8-1. The first column gives the name of the benchmark, while the second column provides the memory required to generate the directed tests in KB. The third column indicates the size of each benchmark in KB. The last column presents the number of reduced test sets.

Table 8-1. Memory requirement for test
Circuit   Memory Requirement (KB)   Size (KB)   Reduced Tests
c7552     271368131729
c6288     186405515510
c5315     158945529826
c3540     236833663245
c2670     339472960133

We compare the performance of our test-aware signal selection algorithm with the standard profile-based signal selection algorithm, where the inputs are all assumed to be random. A set of 250 points is selected in each circuit as potential erroneous regions. A simulation of 1000 cycles is run assuming no error is present. The circuit inputs are fed with the test sets generated by ATALANTA. Then, for each error, another set of 1000 simulations is performed assuming the error is present in the circuit. If any of the traced signal states during this simulation differs from the error-free simulation, the error is said to be detected. The Error Detection Ratio (EDR), defined earlier in Equation 5, is chosen for comparing error detection performance. The comparison of EDR for 5 of the largest ISCAS'85 benchmarks is shown in Figure 8-11. The "proposed method" column refers to our proposed test-aware signal selection algorithm. The "profile-based" column refers to the standard profile-based trace signal selection algorithm. It should be noted that to make a fair comparison, we have used the same set of tests in both scenarios. In the profile-based case, the primary outputs are used as observation points. As can be seen in Figure 8-11, our proposed method performs consistently better than the profile-based signal selection algorithm, with the maximum improvement being 57% for c5315. This is as expected, since our algorithm uses tests as inputs to select signals, which gives better insight during error propagation probability computation and hence during the subsequent selection of trace signals to detect errors.

Figure 8-11. Comparison of signal selection methods for soft errors

We would like to observe the variation of EDR with trace buffer width for soft error detection. The trace buffer width is set to 16, 32 and 64, respectively. The results are shown in Figure 8-12. For most cases, except c5315, 100% EDR is reached when the trace buffer width is 32.

Figure 8-12. Variation of EDR with trace buffer width for soft error detection

Next, we investigate how our proposed approach described in Figure 8-2 converges for the 5 benchmarks. The results are presented in Table 8-2. For each benchmark, we choose a trace buffer of size 32. The error set remains the same (the 250 soft errors considered in this section). The first column gives the circuit name. The second column presents the number of runs needed to converge, that is, the number of times the signal selection and test generation procedures are executed in Figure 8-1. The last column refers to the final coverage obtained using directed tests. It can be seen that for most benchmarks, two runs are sufficient to reach a steady state for both tests and selected signals. This demonstrates the efficiency of our test-aware signal selection algorithm, which selects the trace signals so that the fault coverage reaches 100% within two iterations for four of the five benchmarks. The ATPG tool generates directed tests that enable the errors to propagate towards the trace signals. On the other hand, existing trace signal selection algorithms rely on random inputs as tests. Hence, they provide significantly lower fault coverage, as shown in Figure 8-11.

Table 8-2. Fault coverage in case of soft errors
Circuit   Number of Runs   Final Coverage
c7552     2                100%
c6288     2                100%
c5315     3                99.2%
c3540     2                100%
c2670     2                100%

Now, we would like to observe the performance of our test-generation algorithm for crosstalk faults described in Section 8.2.2. Similar to Figure 8-11, we have used the 5 largest ISCAS'85 benchmarks. The test generation algorithm for crosstalk faults described in Algorithm 14 is extremely time-consuming, as it requires manual modification of the circuit in order to insert the additional AND and XOR gates described in Figure 8-10. Hence, we reduced the number of faults from 250 to 10. Similarly, the trace buffer width is reduced to 4 instead of 32, that is, in this case, 4 signals are stored every cycle. In order to make a fair comparison, the same experiment is repeated for the profile-based signal selection technique using the same set of parameters. The results are shown in Figure 8-13. As can be seen, our proposed method provides significant improvement over the profile-based technique, with the maximum improvement being 100% for c3540. The performance of the profile-based method is seen to be degraded compared to the soft error results in Figure 8-11. The reason behind this is the limited number of errors chosen (10). These comprise less than 1% of the total number of signals in the design. Hence, using the profile-based method with a limited number of trace signals (4), it is extremely difficult to capture the faults.

Figure 8-13. Comparison of signal selection methods for crosstalk faults

Similar to Table 8-2 for soft errors, we want to see how fast the loop in Figure 8-2 converges in the case of crosstalk faults. The parameters remain the same as in Figure 8-13, that is, 10 error points with a trace buffer of width 4. The results are shown in Table 8-3. Similar to soft errors, it can be seen that 2-3 runs are sufficient for obtaining the maximum coverage possible. This demonstrates the efficiency of our test generation techniques and test-aware signal selection algorithms for crosstalk faults.

Table 8-3. Coverage of crosstalk faults
Circuit   Number of Runs   Final Coverage
c7552     2                60%
c6288     3                100%
c5315     2                60%
c3540     2                100%
c2670     3                90%

8.4 Summary

Limited observability is a major bottleneck in detecting errors during post-silicon validation and debug. In this chapter, we have proposed an efficient observability-aware directed test generation technique that selects efficient observation points and generates corresponding test sets to improve detection of electrical errors. We have implemented our approach using the ATALANTA ATPG tool and applied it on the ISCAS'85 benchmarks. Up to two times improvement in error detection has been observed compared to the existing signal selection approaches.

CHAPTER 9
CONCLUSIONS AND FUTURE WORK

Post-silicon validation is an important component of modern chip design methodology. Limited observability is a major concern during post-silicon debug. This dissertation proposed efficient techniques to enhance observability during post-silicon debug and thereby reduce the overall validation effort. This chapter concludes the dissertation and outlines future research directions.

9.1 Conclusions

Due to the dramatic increase in design complexity and the shrinking time-to-market window, many bugs escape the pre-silicon verification phase and manifest during the normal operation of the chip. Post-silicon validation is used to detect these bugs before a chip is delivered to the customer. A major challenge during post-silicon validation is the limited observability of on-chip signals. Some recent techniques trace some of the internal signal states and store them in an on-chip trace buffer for later debug. The size of the trace buffer determines the signal states that can be stored and hence constrains signal observability. To improve observability, this dissertation made several important contributions, as summarized below.

In Chapter 3, we developed efficient techniques to select trace signals in order to improve the restoration of the untraced signals. Existing approaches use partial restorability based signal selection techniques, which are inferior both in terms of restoration ratio and signal selection time. We have proposed a total restorability based signal selection algorithm that provides up to 3 times better restoration performance while reducing the signal selection time by an order of magnitude. We have further proposed an RTL-level signal selection algorithm that reduces the memory and time overhead considerably with minor impact on the restoration performance.

In Chapter 4, we proposed a combined trace and scan based approach to improve signal restoration. Scan based debug mechanisms have been used in the manufacturing-testing domain for a long time. While trace signals provide good temporal visibility, scan signals can improve the spatial observability. We have provided an efficient combination of both in order to improve the overall observability along both directions. Our proposed technique provides up to 17% better restoration compared to existing trace-scan combined approaches.

The existing signal selection techniques focus on improving the overall signal restoration in the circuit. However, restoration may not be directly related to detection of errors in the circuit, since errors move only along the fan-out cone of a signal, while restoration can proceed in both directions. In Chapter 5, we have proposed a signal selection technique that helps in detecting errors across the circuit.

While previous approaches primarily rely on statically selected signals, Chapter 6 presents a dynamic signal selection technique that is beneficial in a wide variety of scenarios, including when only a set of regions in a design is important from a debug perspective. Our proposed technique selects signals dynamically depending on the currently active regions in the circuit as well as the probable error locations. Experimental results demonstrate that our proposed method can detect up to 3 times more errors compared to static signal selection approaches.

The trace buffer size constrains the number of signals that can be stored during post-silicon debug. If the trace data are compressed before being stored in the trace buffer, a larger number of signal states can be stored using the same trace buffer size. Existing trace signal compression algorithms rely on dynamic dictionary generation, which may not select the best dictionary entries and hence may provide poor compression performance. In Chapter 7, we have proposed a dynamic compression algorithm based on a static dictionary, which improves the restoration performance and reduces the hardware overhead associated with it. Our proposed technique can provide up to 60% improvement in restoration performance, while reducing the hardware overhead by 84% compared to existing techniques.

In Chapter 8, we have proposed an efficient observability-aware test generation framework to improve both the controllability and observability during post-silicon validation. We have developed a test generation technique based on the observation points (trace signals). Based on the generated tests, we refine our selected set of observation points in order to enhance the overall error detection capability. We made two important contributions: i) trace signal selection based on input tests, and ii) test generation based on the selected signals. Our proposed method is found to provide significant improvement in error detection performance.

9.2 Future Research Directions

Post-silicon validation has emerged as an important concern in any chip design methodology. It is expected that various aspects of post-silicon validation and debug will continue to be challenging research problems in the development of future SoC designs. The research proposed in this dissertation can be extended in the following directions:

The proposed signal selection approaches considered designs with a single clock domain and no gated clocks. However, actual circuits may have multiple clock domains, with some of the clocks being derived from signals in the circuit. The signal selection algorithms presented in this dissertation can be extended to accommodate these factors. Specifically, in the case of circuits with gated clocks, some signals should be selected in order to reconstruct the derived clocks.

Another underlying assumption of the proposed approaches is that there are only flip-flops and no latches. Flip-flops change their states only on clock transitions, while latches can change whenever their enabling condition holds. In other words, flip-flops are edge-triggered while latches are level-triggered. The proposed approaches can be extended to include the effect of latches and hence select the signals appropriately.

We have seen that signal selection at the RTL level reduces the time and memory overhead compared to gate-level signal selection. The overhead can possibly be further reduced by performing the signal selection at a higher abstraction level, for example, at the TLM (Transaction Level Model) level [57]. However, signal selection at a higher abstraction level comes with the additional penalty of possible degradation in restoration performance. The signal selection algorithms should be modified in order to incur minimum penalty in overall restoration.

The signal selection algorithms presented in this dissertation, whether for restoration or error detection, assumed an empty trace buffer; that is, even the first signal to be traced has to be determined. It may be possible that the design engineer provides the validation engineer with a set of signals that need to be traced every cycle. These can be some important control signals, or signals which are highly prone to errors. The signal selection algorithms need to be modified to utilize the information about the signals already provided in order to select the subsequent signals for the trace buffer.

The observation-aware test generation method described in Chapter 8 deals with electrical errors. However, logical errors are equally important during post-silicon validation. It is therefore necessary to develop similar test generation strategies for logical and functional errors. The approach proposed in Chapter 8 can be modified to include logical errors. A generic high-level test-generation framework needs to be developed that can take into account the different types of errors and faults described in Section 1.1.

The trace data in Figure 1-2 is transferred from the logic to the trace buffer via a set of interconnection fabrics. These fabrics are extremely expensive. Since they contribute only to validation and not to the functionality of the chip, their cost should be kept at a minimum. Efficient signal selection algorithms can be used to reduce the cost of these fabrics. Knowledge of the trace signals as well as the layout information of the chip can help us develop efficient routing algorithms to minimize the cost of data transfer to the trace buffer.

REFERENCES

[1] http://www.itrs.net. International Technology Roadmap for Semiconductors (ITRS).
[2] A. Nahir, A. Ziv, R. Galivanche, A. Hu, M. Abramovici, A. Camilleri, B. Bentley, H. Foster, V. Bertacco and S. Kapoor, "Bridging pre-silicon verification and post-silicon validation," in DAC, 2010, pp. 94.
[3] M. J. Howes and D. V. Morgan, "Reliability and degradation: semiconductor devices and circuits," in Wiley-Interscience Journal, vol. 1, pp. 454, 1981.
[4] J. Bateson, "In-circuit testing," Van Nostrand Reinhold, 1985.
[5] H. F. Ko and N. Nicolici, "Algorithms for state restoration and trace-signal selection for data acquisition in silicon debug," IEEE TCAD, vol. 28, no. 2, 2009, pp. 285.
[6] X. Liu and Q. Xu, "Trace signal selection for visibility enhancement in post-silicon validation," in DATE, 2009, pp. 1338.
[7] K. Basu and P. Mishra, "Efficient Trace Signal Selection for Post Silicon Validation and Debug," in International Conference on VLSI Design, 2011, pp. 352.
[8] H. F. Ko and N. Nicolici, "Automated trace signals selection using the RTL descriptions," in ITC, 2011, pp. 1.
[9] R. Datta, A. Sebastine, and J. Abraham, "Delay fault testing and silicon debug using scan chains," in ETS, 2004, pp. 46.
[10] X. Gu, W. Wang, K. Li, H. Kim, and S. Chung, "Re-using DFT logic for functional and silicon debugging test," in ITC, 2002, pp. 648.
[11] G. J. Van Rootselaar and B. Vermeulen, "Silicon debug: scan chains alone are not enough," in ITC, 1999, pp. 892.
[12] J. Gao, Y. Han, and X. Li, "A New Post-Silicon Debug Approach Based on Suspect Window," in VTS, 2009, pp. 85.
[13] Y. Yang, N. Nicolici, and A. Veneris, "Automated data analysis solutions to silicon debug," in DATE, 2009, pp. 982.
[14] J. Yang and N. Touba, "Expanding trace buffer observation window for in-system silicon debug through selective capture," in VTS, 2008, pp. 345.
[15] E. Anis and N. Nicolici, "Low cost debug architecture using lossy compression for silicon debug," in DATE, 2007, pp. 225.
[16] E. Anis and N. Nicolici, "On using lossless compression of debug data in embedded logic analysis," in ITC, 2007, pp. 1.

[17] O. Caty, P. Dahlgren, and I. Bayraktaroglu, "Microprocessor silicon debug based on failure propagation tracing," in ITC, 2005, pp. 284.
[18] F. Koushanfar, D. Kirovski, and M. Potkonjak, "Symbolic debugging scheme for optimized hardware and software," in ICCAD, 2000, pp. 40.
[19] F. M. De Paula, M. Gort, A. J. Hu, S. Wilton, and J. Yang, "Backspace: Formal analysis for post-silicon debug," in FMCAD, 2008, pp. 1.
[20] N. Nataraj, T. Lundquist, K. Shah, "Fault localization using time resolved photon emission and stil waveforms," in ITC, 2003, pp. 254.
[21] A. DeOrio, I. Wagner and V. Bertacco, "Dacota: Post-silicon validation of the memory subsystem in multi-core designs," in HPCA, 2009, pp. 405.
[22] D. Josephson and B. Gottlieb, "The crazy mixed up world of silicon debug [IC validation]," in CICC, 2004, pp. 665.
[23] M. Abramovici, P. Bradley, K. Dwarkanath, P. Levin, G. Memmi and D. Miller, "A reconfigurable design-for-debug infrastructure for SoCs," in DAC, 2006, pp. 7.
[24] S. Prabhakar and M. Hsiao, "Using Non-Trivial Logic Implications for Trace Buffer-based Silicon Debug," in ATS, 2009, pp. 131.
[25] W. Mao and R. K. Gulati, "Improving gate level fault coverage by RTL fault grading," in ITC, 1996, p. 150.
[26] N. Yogi and V. Agrawal, "Spectral RTL test generation for gate-level stuck-at faults," in ATS, 2006, pp. 83.
[27] H. Ko and N. Nicolici, "Combining scan and trace buffers for enhancing real-time observability in post-silicon debugging," in ETS, 2010, pp. 62.
[28] S. Prabhakar and M. Hsiao, "Multiplexed trace signal selection using non-trivial implication-based correlation," in ISQED, 2010, pp. 697-704.
[29] X. Liu and Q. Xu, "On multiplexed signal tracing for post-silicon debug," in DATE, 2011, pp. 16.
[30] T. C. May and M. H. Woods, "Alpha-particle-induced soft errors in dynamic memories," in IEEE Transactions on Electron Devices, vol. 26, no. 1, 1979, pp. 29.
[31] P. Shivakumar, M. Kistler, S. W. Keckler, D. Burger and L. Alvisi, "Modeling the effect of technology trends on the soft error rate of combinational logic," in DSN, 2002, pp. 389-398.
[32] S. Mitra, N. Seifert, M. Zhang, Q. Shi and K. S. Kim, "Robust system design with built-in soft-error resilience," in IEEE Computer, vol. 38, no. 2, 2005, pp. 43-52.

[33] M. Nicolaidis, "Time redundancy based soft-error tolerance to rescue nanometer technologies," in VTS, 1999, pp. 86-94.
[34] P. Hazucha and C. Svensson, "Impact of CMOS technology scaling on the atmospheric neutron soft error rate," in IEEE Transactions on Nuclear Science, vol. 47, no. 6, 2000, pp. 2586-2594.
[35] A. Sanyal, S. M. Alam and S. Kundu, "A built-in self-test scheme for soft error rate characterization," in IOLTS, 2008, pp. 65-70.
[36] A. Sanyal, K. Ganeshpure and S. Kundu, "On Accelerating Soft-Error Detection by Targeted Pattern Generation," in ISQED, 2007, pp. 723-728.
[37] R. Anglada and A. Rubio, "Brief communication. Logic fault model for crosstalk interferences in digital circuits," in International Journal of Electronics Theoretical and Experimental, vol. 67, no. 3, 1989, pp. 423-425.
[38] A. Rubio, J. Pons and R. Anglada, "A crosstalk tolerant latch circuit design," in Midwest Symposium on Circuits and Systems, 1990, pp. 653-656.
[39] S. Kundu, S. T. Zachariah, Y. S. Chang and C. Tirumurti, "On modeling crosstalk faults," in IEEE TCAD, vol. 24, no. 12, 2005, pp. 1909-1915.
[40] H. Takahashi, K. J. Keller, K. T. Le, K. K. Saluja and Y. Takamatsu, "A Method for Reducing the Target Fault List of Crosstalk Faults in Synchronous Sequential Circuits," in IEEE TCAD, vol. 24, no. 2, 2005, pp. 252.
[41] W. Y. Chen, S. K. Gupta and M. A. Breuer, "Test generation for crosstalk-induced faults: framework and computational results," in Journal of Electronic Testing, vol. 18, no. 1, 2002, pp. 17-28.
[42] A. Sanyal, K. Ganeshpure and S. Kundu, "Test Pattern Generation for Multiple Aggressor Crosstalk Effects Considering Gate Leakage Loading in Presence of Gate Delays," in IEEE TVLSI, vol. 20, no. 3, 2012, pp. 424-436.
[43] S. Chun, T. Kim and S. Kang, "ATPG-XP: Test Generation for Maximal Crosstalk-Induced Faults," in IEEE TCAD, vol. 28, no. 9, 2009, pp. 1401-1413.
[44] E. Taylor, H. Jan and J. Fortes, "Towards Accurate and Efficient Reliability Modeling of Nanoelectronic Circuits," in IEEE-NANO, 2006, pp. 395.
[45] http://www.icarus.com/eda/verilog/.
[46] S. P. Mohanty, N. Ranganathan, E. Kougianos and P. Patra, "Low-Power High-Level Synthesis for Nanoscale CMOS Circuits," Springer Verlag, 2008.
[47] www.opencores.org
[48] N. Alawadhi and O. Sinanoglu, "Revival of Partial Scan: Test Cube Analysis Driven Conversion of Flip-Flops," in VTS, 2011, pp. 260.

[49] J. Saxena, K. Butler, and L. Whetsel, "An analysis of power reduction techniques in scan testing," in ITC, 2002, pp. 670.
[50] J. S. Yang and N. A. Touba, "Automated selection of signals to observe for efficient silicon debug," in IEEE VTS, 2009, pp. 79-84.
[51] H. Shojaei and A. Davoodi, "Trace signal selection to enhance timing and logic visibility in post-silicon validation," in ICCAD, 2010, pp. 68-72.
[52] D. Chatterjee, C. McCarter and V. Bertacco, "Simulation-based signal selection for state restoration in silicon debug," in ICCAD, 2011, pp. 595.
[53] N. Tamarapalli and J. Rajski, "Constructive multi-phase test point insertion for scan-based BIST," in ITC, 1996, pp. 649.
[54] K. Basu and P. Mishra, "Test Data Compression Using Efficient Bitmask and Dictionary Selection Methods," in IEEE TVLSI, vol. 18, no. 9, 2010, pp. 1277.
[55] K. Basu and P. Mishra, "A novel test-data compression technique using application-aware bitmask and dictionary selection methods," in ACM GLSVLSI, 2008, pp. 83.
[56] P. Dodd and L. W. Massengill, "Basic mechanisms and modeling of single-event upset in digital microelectronics," in IEEE Trans. on Nuclear Science, vol. 49, no. 6, 2002, pp. 3100-3106.
[57] http://www.accellera.org/home/

BIOGRAPHICAL SKETCH

Kanad Basu has received his Ph.D. from the Department of Computer and Information Science and Engineering, University of Florida in 2012. He received his Bachelor of Engineering degree from Jadavpur University, Kolkata, India in 2003. His research interests include system level design, testing and verification. He has published several articles in peer reviewed journals and conferences. He has received the Best Paper Award at the International Conference on VLSI Design 2011. He is a recipient of the CISE departmental travel grant as well as the University of Florida International Center Outstanding Achievement award.