EFFICIENT OBSERVABILITY ENHANCEMENT TECHNIQUES FOR POST-SILICON VALIDATION AND DEBUG

By KANAD BASU

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2012

© 2012 Kanad Basu

I dedicate this to my family.

ACKNOWLEDGMENTS

I would like to sincerely thank my Ph.D. advisor Prof. Prabhat Mishra, without whose guidance this dissertation would not have been possible. I would also like to thank my Ph.D. committee members: Prof. Sartaj Sahni, Prof. Sanjay Ranka, Prof. Greg Stitt and Prof. Ann Gordon-Ross for their valuable suggestions. I am also thankful to my labmates Weixun Wang, Xiaoke Qin, Mingsong Chen, Kartik Srivastava, Chetan Murthy, Hadi Hajimiri and Kamran Rahmani for their help and support. I would also like to thank my mentors during my two internships at Intel Corporation, Dr. Dhurbajyoti Kalita and Dr. Priyadarsan Patra. I would also like to take this opportunity to thank those who helped me during different stages of my research. I sincerely thank Dr. Henry Ko from McMaster University and Dr. Xiao Liu from the Chinese University of Hong Kong for helping me understand various aspects of trace signal selection and signal restoration. I am grateful to Dr. Ilya Wagner from Intel Corporation for explaining to me the constraints and objectives associated with test generation for post-silicon validation. I would like to thank Prof. Krishnendu Chakraborty and Zhanglei Wang from Duke University, and Dr. Mehrdad Reshadi from University of California, Irvine for helpful suggestions. I would also like to extend my gratitude towards my family for helping me reach this stage. Finally, I would like to thank Mr. Sachin Tendulkar, Mr. Akira Kurosawa, Mr. Paolo Coelho and Mr. John Denver, who through their lives and immortal creations have always encouraged me to move forward, even in times of despair.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Faults and Defects in Integrated Circuits
  1.2 Restoration of Unknown Signals
  1.3 Signal Restoration Versus Error Detection
  1.4 Challenges
  1.5 Research Contributions
  1.6 Dissertation Organization
2 BACKGROUND AND RELATED APPROACHES
  2.1 Trace Signal Selection
  2.2 Dynamic Signal Selection
  2.3 Trace Data Compression
  2.4 Observability-Aware Test Generation
3 RESTORATION-AWARE TRACE SIGNAL SELECTION TECHNIQUES
  3.1 Gate-level Signal Selection (GSS)
    3.1.1 Computation of Edge Values
      3.1.1.1 Independent signals
      3.1.1.2 Dependent signals
      3.1.1.3 Example
    3.1.2 Initial Value Computation for State Elements
    3.1.3 Initial Region Creation
    3.1.4 Recomputation of Node Values
    3.1.5 Region Growth
    3.1.6 Complexity Analysis
  3.2 Motivational Example
  3.3 RTL-level Signal Selection (RSS)
    3.3.1 CDFG Generation
    3.3.2 Relationship Computation
      3.3.2.1 Direct relationship
      3.3.2.2 Conditional relationship
    3.3.3 Signal Selection
  3.4 Experiments
    3.4.1 Experimental Setup
    3.4.2 Results on Gate-level Signal Selection (GSS)
    3.4.3 Results on RTL-level Signal Selection (RSS)
  3.5 Summary
4 EFFICIENT COMBINATION OF TRACE AND SCAN SIGNALS
  4.1 Background and Motivation
  4.2 Trace and Scan Signal Selection
    4.2.1 Trace+Scan Debug Architecture
    4.2.2 Trace Signal Selection Algorithm
    4.2.3 Scan Signal Selection Algorithm
      4.2.3.1 Creation of minimal node set
      4.2.3.2 Illustrative example
  4.3 Experimental Results
  4.4 Summary
5 ERROR DETECTION AWARE TRACE SIGNAL SELECTION
  5.1 Trace Signal Selection for Error Detection
    5.1.1 Graph based Modeling of Circuits
    5.1.2 Edge Value Computation
    5.1.3 Node Value Computation
    5.1.4 Signal Selection
  5.2 Experiments
    5.2.1 Experimental Setup
    5.2.2 Error Model
    5.2.3 Results
  5.3 Summary
6 DYNAMIC SIGNAL SELECTION
  6.1 Problem Formulation
  6.2 Region-based Signal Selection (RSS)
    6.2.1 Graph Based Modeling of Circuits
    6.2.2 Error Propagation Probability Computation
    6.2.3 Signal Selection Based on Node Values
  6.3 Dynamic Signal Tracing (DST)
  6.4 Experiments
    6.4.1 Experimental Setup
    6.4.2 Results for Two Regions
    6.4.3 Results for Three Regions
    6.4.4 Results for Four Regions
    6.4.5 Hardware Overhead
  6.5 Summary
7 TRACE DATA COMPRESSION USING STATICALLY SELECTED DICTIONARY
  7.1 Trace Data Compression
    7.1.1 Dictionary Selection Algorithms
      7.1.1.1 Dictionary-based compression (DC)
      7.1.1.2 Bitmask-based compression (BMC)
      7.1.1.3 Fixed dictionary MBSTW compression (fMBSTW)
    7.1.2 Dynamic Trace Data Compression
    7.1.3 Performance Analysis with Erroneous Trace Data
      7.1.3.1 Compression penalty for DC and BMC
      7.1.3.2 Compression penalty for fMBSTW
  7.2 Experiments
    7.2.1 Compression Performance
    7.2.2 BRAM Requirement (Hardware Overhead)
    7.2.3 Compression Performance with Erroneous Trace Data
  7.3 Summary
8 OBSERVABILITY-AWARE DIRECTED TEST GENERATION
  8.1 Test-aware Signal Selection
    8.1.1 Fault Simulation
    8.1.2 Error Detection Ability Computation
    8.1.3 Overlap Removal
  8.2 Trace Signal Aware Test Generation
    8.2.1 Soft Errors and Faults
    8.2.2 Crosstalk Faults
  8.3 Experiments
  8.4 Summary
9 CONCLUSIONS AND FUTURE WORK
  9.1 Conclusions
  9.2 Future Research Directions

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

Table
1-1 Restored signals using two selected signals
3-1 Restored signals using our method
3-2 Comparison with Ko et al.
3-3 Comparison with Liu et al. with deterministic inputs
3-4 Comparison of GSS with Prabhakar et al.
3-5 RTL-level versus gate-level signal selection
4-1 Restored signals using trace and scan
4-2 Comparison with existing technique
5-1 Detectable Errors for the ISCAS'89 benchmarks
6-1 Selected Signals for each MUX for n = 4 and m = 4
6-2 Table for n = 2 and m = 2
8-1 Memory requirement for test
8-2 Fault coverage in case of soft errors
8-3 Coverage of crosstalk faults

LIST OF FIGURES

Figure
1-1 Validation and testing phases in IC design flow
1-2 Overview of post-silicon validation
1-3 Signal restoration
1-4 Example circuit
1-5 Example circuit with 12 flip-flops
1-6 Error Propagation for the example circuit in Figure 1-5
1-7 Research contributions
3-1 Example circuit with n gates
3-2 Example circuit
3-3 Graphical representation of example circuit
3-4 Region creation and growth
3-5 Verilog code and CDFG
3-6 A portion of the CDFG in Figure 3-5B
3-7 Simplified version of Figure 3-6
3-8 Overview of our experiments to verify RSS
3-9 Comparison of Signal Selection Time
3-10 Comparison of Restoration Performance
4-1 Example circuit with both scan and trace signals
4-2 Proposed Architecture: The width w of the trace buffer is shared by m trace signals and n subchains of the scan chain
4-3 Graphical representation of example circuit
4-4 Comparison with Ko et al. and Basu et al.
5-1 Example circuit with labeled signals
5-2 Graphical representation of Figure 5-1
5-3 Example using AND gate
5-4 Example using OR gate
5-5 D flip-flop and NOT gate
5-6 Edge values for the graph in Figure 5-2
5-7 Node values for the graph in Figure 5-6
5-8 Signal Selection based on removal of overlap
5-9 Comparison with Restoration aware signal selection
5-10 Variation of error detection with number of trace signals
5-11 Variation of error detection with number of flip-flops traced
5-12 Variation of error detection with number of outputs traced
6-1 Illustrative example showing regions and error zones
6-2 Example circuit with 2 regions and 12 flip-flops
6-3 Graphical representation of Figure 1-5 with two regions
6-4 Examples using AND and OR gates
6-5 Node values for region R1 in Figure 6-3
6-6 Datapath and controller design for m = 3 and n = 3
6-7 Proposed Design
6-8 GSS
6-9 EZ-GSS
6-10 RSS+DST
6-11 Comparison of EDR performance when both regions are active
6-12 Comparison of EDR performance when only one region is active
6-13 Comparison of EDR performance on the Opencores circuits
6-14 Comparison of EDR performance when one region is active
6-15 Comparison of EDR performance when two regions are active
6-16 Comparison of EDR performance when one region is active
6-17 Comparison of EDR performance when two regions are active
6-18 Comparison of EDR performance when 3 regions are active
7-1 Overview of our trace compression procedure
7-2 Example of dictionary selection in fMBSTW
7-3 Actual Trace Data Compression
7-4 Comparison of compression performance
7-5 Compression performance with dictionary entries
7-6 BRAM requirements
7-7 Comparison of compression penalty for DC
7-8 Comparison of compression penalty for BMC
7-9 Comparison of compression penalty for fMBSTW
7-10 Comparison of compression penalty
8-1 Outline of proposed technique
8-2 Observability-aware test generation flow
8-3 Example circuit
8-4 Example circuit illustrating test generation for soft errors
8-5 Example circuit illustrating crosstalk faults
8-6 Positive glitch on c
8-7 Positive delay on c
8-8 Non-simultaneous transitions
8-9 Duplicated circuit
8-10 Modified circuit
8-11 Comparison of signal selection methods for soft errors
8-12 Variation of EDR with trace buffer width for soft error detection
8-13 Comparison of signal selection methods for crosstalk faults
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

EFFICIENT OBSERVABILITY ENHANCEMENT TECHNIQUES FOR POST-SILICON VALIDATION AND DEBUG

By Kanad Basu
August 2012

Chair: Prabhat Mishra
Major: Computer Engineering

Post-silicon validation is widely acknowledged as a major bottleneck for complex integrated circuits. Due to increasing design complexity coupled with shrinking time-to-market constraints, it is not possible to detect all design flaws (errors) during pre-silicon verification. Post-silicon validation needs to capture these escaped functional errors as well as electrical faults. A major concern during post-silicon debug is the observability of internal signals, since the chip has already been manufactured. Design overhead considerations limit the number of signal states that can be traced or stored in a trace buffer. This dissertation proposes novel techniques to enhance the observability during post-silicon debug. My research has three major contributions: profitable signal selection, efficient trace compression and observability-aware test generation. It proposes efficient signal selection techniques to enhance the observability of the circuit. Various signal selection constraints are explored, including static versus dynamic, trace versus scan and general-purpose versus architecture-specific features. To improve the observability further, an efficient trace compression approach has been proposed. Extensive experimental results demonstrate significant improvement in overall signal observability. This dissertation also proposes observation-aware directed test generation techniques to drastically reduce the overall post-silicon validation effort.
CHAPTER 1
INTRODUCTION

Design complexity is increasing rapidly, keeping pace with the two-fold increase in the number of transistors every technology cycle. This drastic increase in design complexity has led to a significant increase in validation complexity. The report from the International Technology Roadmap for Semiconductors (ITRS) [1], as well as other reputed agencies, indicates that it is critical to develop efficient design verification techniques to secure design productivity scaling at a pace consistent with process technology cycles. There has been a plethora of research efforts in both industry and academia to develop scalable design validation approaches using a combination of simulation based techniques and formal methods. In spite of extensive efforts, it is not always possible to detect all the functional errors and electrical faults during pre-silicon validation. Post-silicon validation is used to detect design flaws, including the escaped functional errors as well as electrical faults.

Post-silicon validation is widely acknowledged as a major bottleneck for complex integrated circuits, including modern microprocessors as well as complex System-on-Chip (SoC) designs. Various industrial studies indicate that the post-silicon validation effort consumes more than 50% of an SoC's overall design effort (measured in total cost) at 65nm technology [2]. These studies also emphasize the fact that the problem gets worse as the industry continues to move to even smaller geometries.

Figure 1-1 shows the overview of three important validation and testing phases in a typical SoC design methodology. Pre-silicon validation effort includes validation of various functional as well as timing requirements across abstraction levels, including specification and implementation. Manufacturing testing is primarily used to detect physical (structural) defects in each of the manufactured ICs. On the other hand, the focus of post-silicon validation is to detect design flaws that have escaped pre-silicon validation. In reality, the vast majority of functional errors are captured in the pre-silicon stage. While only a small percentage of functional errors remain, the time required to find and fix them is still very expensive. It is important to note that the majority of the electrical faults (including crosstalk, delay and transient faults) are captured during post-silicon validation, since it is difficult to model electrical errors during the pre-silicon verification phase.

Figure 1-1. Validation and testing phases in IC design flow

Figure 1-2 presents an overview of the post-silicon validation and debug methodology. Input tests are applied to the Device Under Test (DUT). Depending on the test generation techniques, the input tests can be random, constrained-random or directed in nature. During execution (runtime), states of some selected signals are traced and stored in an on-chip trace buffer. Note that the signals whose states are to be stored are decided during the pre-silicon (design) phase using suitable signal selection algorithms. To increase the effective trace buffer size, various trace compression techniques can be employed. When a failure is detected, the contents of the trace buffer are dumped out to aid in post-silicon debug using an offline debugger.

Figure 1-2. Overview of post-silicon validation

The remainder of this chapter is organized as follows. First, we discuss potential faults and defects in a System-on-Chip (SoC). Next, we describe the basics of signal restoration and compare signal restoration and error detection. Finally, we discuss various challenges associated with post-silicon validation and outline our contributions to address these challenges.
1.1 Faults and Defects in Integrated Circuits

Defects and errors may get introduced at different phases of the SoC design cycle: design time, synthesis, manufacturing, etc. Efficient fault modeling of these is necessary to effectively analyze, detect and fix them. Functional errors are introduced during development of the specification as well as the implementation. It is extremely important to detect and fix these functional errors. Some of the common defects in a chip [3] include processing defects (parasitic transistors, oxide breakdown, etc.), material defects (surface impurities), time-dependent failures (electromigration, dielectric breakdown, etc.) and packaging failures (seal break). Studies [4] reveal that of all the defects observed in a Printed Circuit Board (PCB), 51% are due to shorts, thus making electrical errors also important in any SoC validation methodology. Common fault models used to model defects and errors are:

- Single Stuck-at-Fault (represented as s-a-0 or s-a-1)
- Transistor Open and Short Fault
- Memory Faults
- Glitch Faults
- PLA Faults
- Functional Errors
- Delay Faults
- Analog Faults

Post-silicon validation deals with errors and defects corresponding to all these fault models. However, in case of manufacturing testing, errors primarily related to manufacturing defects (such as stuck-at faults and transistor opens/shorts) are considered.

Electrical defects manifest as important errors in modern SoCs. Soft errors and crosstalk faults are two important defects that can adversely affect the correct functionality of the chip. While soft errors are caused by radioactive effects on design impurities, crosstalk faults occur due to imperfect coupling capacitance between two lines in the chip. Soft errors can be modeled using single stuck-at-faults. Effects of crosstalk can be represented as glitches and delay faults. Effective directed test generation strategies need to be employed in order to detect these faults. The tests should be able to activate the faults and propagate them towards the observation points, e.g., primary outputs or internal trace signals.

Now we discuss two important concepts related to signal selection: signal restoration and error detection. Since only a few signals can be observed during post-silicon validation, it is important to restore the unknown signal states as much as possible to enhance the observability of the chip. On the other hand, it is equally important to actually observe the errors through the trace signals.
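As a minimal illustration of the requirement stated in this section that a test must both activate a fault and propagate it to an observation point, the following Python sketch enumerates the input vectors that detect a single stuck-at fault on an internal node. The three-input circuit y = (a AND b) OR c, the internal node name n, and the function names are assumptions made only for this illustration; they are not taken from the dissertation or its benchmarks.

```python
from itertools import product


def circuit(a, b, c, stuck_n=None):
    """Evaluate y = (a AND b) OR c (a hypothetical example circuit).

    If stuck_n is 0 or 1, the internal node n = a AND b is forced to that
    value, which models a single stuck-at fault (s-a-0 or s-a-1) on n.
    """
    n = a & b
    if stuck_n is not None:
        n = stuck_n
    return n | c


def detecting_tests(stuck_value):
    """Return all input vectors whose output differs between the fault-free
    circuit and the circuit with node n stuck at stuck_value."""
    tests = []
    for a, b, c in product((0, 1), repeat=3):
        good = circuit(a, b, c)
        faulty = circuit(a, b, c, stuck_n=stuck_value)
        if good != faulty:  # fault activated AND propagated to the output
            tests.append((a, b, c))
    return tests


sa0_tests = detecting_tests(0)
sa1_tests = detecting_tests(1)
print("s-a-0 on n detected by:", sa0_tests)
print("s-a-1 on n detected by:", sa1_tests)
```

In this sketch every detecting vector has c = 0, because c = 1 forces the OR output to 1 and masks the fault. This is the propagation condition; activation requires the fault-free value of n to differ from the stuck value.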
1.2 Restoration of Unknown Signals

In post-silicon debug, unknown signal states can be reconstructed from the traced signal states in two ways: forward and backward restoration. Forward restoration deals with the restoration of signals from input to output, that is, knowledge of input states can help in restoring the value of the output. Backward restoration, on the other hand, deals with reconstructing the input from the output.

Forward and backward restoration can be illustrated using the example in Figure 1-3. We use a 2-input AND gate to explain the restoration process. Forward restoration is shown in Figure 1-3(a). When one input is 0 or both inputs are 1, the output can be reconstructed. Figure 1-3(b) shows backward restoration. When the output is 1, or the output is 0 with one input 1, the other input can be reconstructed. It is easier to reconstruct signals using forward rather than backward restoration. If all but the unknown signal values are known, forward restoration can definitely determine the unknown, while backward restoration might fail to do so in specific cases. For example, in Figure 1-3(c), when the output is 0 and one of the inputs is 0, there is no way to determine the state of the other input. Although we have illustrated the signal reconstruction using a 2-input AND gate, the restoration procedure can be described in a similar manner for other types of logic gates, as well as for gates with more inputs.

Figure 1-3. Signal restoration
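The restoration rules for the 2-input AND gate described in this section can be sketched in Python as follows. The symbol X for an unknown state and the function names and_forward and and_backward are assumptions for illustration only, and the backward rule assumes that the known values are mutually consistent.

```python
X = "X"  # assumed marker for an unknown (non-restored) signal state


def and_forward(a, b):
    """Forward restoration: derive the AND output from its inputs.

    The output is known when any input is 0 (output 0) or when both
    inputs are 1 (output 1); otherwise it stays unknown.
    """
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return X


def and_backward(out, other_in):
    """Backward restoration: derive one AND input from the output and the
    other input (which may itself be unknown).

    Output 1 forces every input to 1. Output 0 with the other input at 1
    forces this input to 0. Output 0 with the other input at 0 (or unknown)
    leaves this input undetermined.
    """
    if out == 1:
        return 1
    if out == 0 and other_in == 1:
        return 0
    return X


# Forward cases
print(and_forward(0, X), and_forward(1, 1), and_forward(1, X))
# Backward cases, the last one being the undeterminable case from the text
print(and_backward(1, X), and_backward(0, 1), and_backward(0, 0))
```

The last backward call corresponds to the case in which the output is 0 and one input is 0, where the other input cannot be determined.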
Figure 1-4. Example circuit

The signal reconstruction procedure is illustrated using a simple circuit shown in Figure 1-4. Let us assume that the trace buffer width is 2, that is, states of two signals can be recorded. We try to restore the other signal states by application of the existing signal selection methods presented in [5] and [6]. The results are shown in Table 1-1. The 'X's represent those states which cannot be determined. The selected signals are shown in shades. For both [5] and [6], the signals selected are C and F, in that order. Restoration ratio, which is a popular metric for calculation of signal restorability, is defined as:

\[
\text{Restoration Ratio} = \frac{\text{number of states restored} + \text{number of states traced}}{\text{number of states traced}} \tag{1}
\]

Let us calculate the number of restored states in Table 1-1. If we consider the row corresponding to signal A, two entries have value 0, while the rest have value X (non-restored state). Thus, two states are known. Similarly, two states are known for the row corresponding to signal B. Since signal C is traced, all the states are known (no X in the row). For signal D, three entries in the row have value 0, hence three states are reconstructed. Computing in this manner, a total of 26 states are known. Out of them, 10 entries (corresponding to signals C and F) are traced states, so the remaining 16 are restored states. Therefore, the Restoration Ratio in this case is (16 + 10)/10 = 2.6.
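The calculation above can be checked with a short Python sketch. The per-cycle states are transcribed from Table 1-1; the dictionary layout, the use of the character X for a non-restored state, and the function name restoration_ratio are assumptions for illustration.

```python
# States of each signal over cycles 1 to 5, transcribed from Table 1-1.
# "X" marks a state that could not be restored.
table_1_1 = {
    "A": "X0X0X",
    "B": "X0X0X",
    "C": "11010",  # traced
    "D": "XX000",
    "E": "XX000",
    "F": "01100",  # traced
    "G": "X00X0",
    "H": "X00X0",
}
traced_signals = {"C", "F"}


def restoration_ratio(table, traced):
    """Compute (states restored + states traced) / states traced."""
    traced_states = sum(len(row) for sig, row in table.items() if sig in traced)
    restored_states = sum(
        1
        for sig, row in table.items()
        if sig not in traced
        for state in row
        if state != "X"
    )
    return (restored_states + traced_states) / traced_states


ratio = restoration_ratio(table_1_1, traced_signals)
print(ratio)
```

With 10 traced states and 16 restored states, the sketch evaluates (16 + 10)/10, which agrees with the value of 2.6 stated in the text.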
Table 1-1. Restored signals using two selected signals

Signal  Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
A       X        0        X        0        X
B       X        0        X        0        X
C*      1        1        0        1        0
D       X        X        0        0        0
E       X        X        0        0        0
F*      0        1        1        0        0
G       X        0        0        X        0
H       X        0        0        X        0

* Selected (traced) signal, shaded in the original table.

1.3 Signal Restoration Versus Error Detection

This section compares two widely used metrics, state restoration and error detection. The majority of the existing signal selection approaches [6, 7] focus on restoration of unknown signals using the knowledge of known signal states. Let us consider the example circuit in Figure 1-5, comprised of 12 flip-flops. In Figure 1-5, tracing the states of flip-flops A and B in cycle t helps to restore the state of flip-flop D in cycle t+1, since the input to flip-flop D is an OR of the outputs of flip-flops A and B. Similarly, since flip-flops C and H are connected by a NOT gate, tracing H in cycle t provides the state of C in cycle t-1. Note that signal restoration can proceed in both the forward (input to output) and backward (output to input) directions. For example, the restoration of D from A and B was in the forward direction, while the restoration of C from H is in the backward direction.

In case of error detection, only backward restoration is meaningful. To explain this scenario, a part of the illustrative example of Figure 1-5 has been redrawn in Figure 1-6. When flip-flop D is in error, the bug can only propagate along the forward direction in its fan-out cone, towards flip-flops E and G. Therefore, in order to detect the error in D, we have to trace either of these two flip-flops. Tracing the inputs of D, or those flip-flops that are in its fan-in cone (A and C), does not help. Given that a designer is interested in detecting an error, it is meaningful to focus directly on the error detection metric instead of restoration performance.
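The observation that an error can only be seen by tracing flip-flops in the fan-out cone of the error site can be sketched as a graph reachability check. The adjacency list below contains only the connections stated in the text (A and B feed D, C feeds H, D feeds E and G); the rest of the 12 flip-flop circuit of Figure 1-5 is not reproduced, and the function names are assumptions for illustration. The sketch is purely structural: it ignores logical masking, so reachability is a necessary rather than a sufficient condition for detection.

```python
from collections import deque

# Partial connectivity taken from the text: driver -> flip-flops it feeds.
# This is NOT the complete circuit of Figure 1-5.
fanout = {
    "A": ["D"],
    "B": ["D"],
    "C": ["H"],
    "D": ["E", "G"],
}


def fanout_cone(graph, source):
    """Return every flip-flop reachable from source along forward edges."""
    seen = set()
    queue = deque(graph.get(source, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


def can_detect(graph, error_site, traced):
    """An error is structurally observable if the error site itself or any
    flip-flop in its fan-out cone is traced (assumption: no masking)."""
    return error_site in traced or bool(fanout_cone(graph, error_site) & set(traced))


print(sorted(fanout_cone(fanout, "D")))
print(can_detect(fanout, "D", {"A", "C"}))  # tracing the fan-in side
print(can_detect(fanout, "D", {"G"}))       # tracing the fan-out side
```

Tracing flip-flops on the input side of D leaves the error unobservable in this model, while tracing either E or G makes it observable, which matches the argument made for Figure 1-6.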
Figure 1-5. Example circuit with 12 flip-flops

1.4 Challenges

There are many challenges in developing efficient techniques for post-silicon validation and debug. This section describes six important challenges that I have addressed in this dissertation.

Challenge 1: The signals to be traced should be selected carefully in order to maximize the restoration. Signal selection techniques based on partial restoration (described in Chapter 3) were proposed by Ko et al. [5] and Liu et al. [6]. If the trace buffer width is n, both these approaches select the n signals with the highest partial restoration abilities for tracing. However, signal selection based on partial restoration does not provide the best reconstruction. Also, these methods are computationally inefficient, since they require long signal selection time operating on gate-level netlists. An efficient algorithm is needed which can select trace signals providing high restoration of untraced signals with fast signal selection time.

Figure 1-6. Error propagation for the example circuit in Figure 1-5

Challenge 2: The time requirement for gate-level signal selection algorithms is high because of the excessive number of variables used to represent flip-flops and other internal signals. One promising alternative to reduce signal selection time is to operate at a higher abstraction level such as Register Transfer Level (RTL). Ko et al. [8] developed a signal selection approach which selects some signals from the RTL-level and some from the gate-level netlist description of the circuit, that is, their signal selection does not depend on the RTL-level design alone. It is desirable to develop a signal selection technique which relies solely on the RTL-level implementation of the design, thus reducing the memory and time requirements associated with any gate-level netlist. The primary challenge in developing an RTL-level signal selection algorithm is to ensure that this approach does not incur any restoration penalty compared to gate-level signal selection algorithms.

Challenge 3: Scan based debugging is popular in the manufacturing test domain, where scan chains are primarily used to identify fabrication defects. The use of scan chains for improving signal observability during post-silicon debug has been extensively studied [9 11]. Several approaches [12, 13] have studied the combination of scan and trace signals for post-silicon debug. A major challenge is that the number of trace signals, the number of scan signals and the scan dump frequency are inter-dependent. For example, selecting more trace signals implies less space for scan signals, and vice versa. Even when the space for scan signals is reserved, choosing a large scan chain (too many scan signals) implies a longer scan dump period, that is, less frequent dumps. In other words, there needs to be a critical balance between how many signals to observe versus how many signal states can be obtained in a specific clock cycle.

Challenge 4: Till now, we have considered how tracing some signals helps us to reconstruct the untraced signals. Debugging involves detection of errors by the traced signals. Thus, it is necessary to trace signals with an aim of maximizing the number of errors detected. During debug, some parts of the circuit may be less important for debug purposes than the rest during particular cycles. For example, the debug engineer might be interested in detecting errors in the processor block of an SoC instead of the memory block. Therefore, it is important to develop a signal selection algorithm that can dynamically select signals based on the currently active regions to enhance error detection.

Challenge 5: The circuit observability can be further enhanced by compressing the trace buffer contents. This allows more signal states to be stored in the trace buffer without increasing its size. Since the trace buffer does not contribute to the actual chip functionality except debug, its size should be as small as possible. The trace buffer has two parameters, width and depth. Width refers to the total amount of debug data that can be stored per cycle, while depth refers to the total number of cycles over which debug data can be stored. In order to keep the trace buffer size constant while increasing the amount of trace data that can be stored, either the depth or the width has to be compressed. Different techniques of trace data compression, either by depth [14, 15] or by width [16], have been proposed. Depth compression approaches select the cycles where the data are likely to be erroneous, and store the data for only those cycles. On the other hand, in width compression, the trace data obtained every cycle is first compressed and then stored in the trace buffer. An efficient lossless trace data compression technique is necessary which can provide both fast compression and high compression efficiency without introducing significant hardware overhead.

Challenge 6: Efficient signal selection and trace compression enhance the observability during post-silicon debug. However, it is equally important to improve the controllability aspect during post-silicon validation. The input tests applied to the circuit should be carefully designed in order to maximize error detection. This involves estimating the corner case scenarios and designing tests that can activate those scenarios to allow the errors to propagate to the traced signals. Therefore, it is important for the test designer to have knowledge of the signals that are being traced. Conversely, if the tests are known a priori, the trace signals can be selected efficiently to enhance error detection capability.

Figure 1-7. Research contributions

1.5 Research Contributions

My research proposes novel techniques to address challenges in enhancing the signal observability for post-silicon validation and debug. The objective of my research is to develop efficient signal selection and test generation as well as adept approaches for trace compression. The four major contributions of my research are summarized as follows. Figure 1-7 highlights these contributions in the IC design methodology.

Contribution 1 [Trace Signal Selection]: This contribution addresses the first three challenges outlined in Section 1.4.
.Existingsignalselectionalgorithmsusedpartialrestorability,whichisnotoptimalforsignalrestoration.Atotalrestorability1basedsignalselectionalgorithmhasbeenproposedwhichiscomputationallymoreefcientandproducessignicantlybetterrestorationperformancecomparedtotheexistingapproaches.WehaveextendedthesignalselectionapproachtoRTLlevel,toreducethesignalselectiontimeaswellasthememoryrequirements,withoutsacricingsignicantlyontherestorationperformance.Wehavealsoproposedanefcienttechniquetodeterminetheprotablecombinationoftraceandscansignalsforpost-silicondebug.Theentiretracebufferwidthisdividedtoaccommodatebothtraceandscansignals.Ourapproachusesagraphbasedrepresentationtoselectthreeimportantaspects:(i)efcienttracesignalstobestoredeverycycle,(ii)themostprotablescansignalstobeincludedintheshadowscanchain,and(iii)thescandumpfrequencybasedonthetracebufferwidthconstraints.Contribution2:[DynamicSignalSelection]:ThiscontributionaddressesChallenge4.Existingtracesignalselectionalgorithmsdealwithimprovingtherestorationofuntracedsignalsanddoesnotfocusonerrordetection.Wehaveproposedasignalselectionalgorithmwhichselectsprotablesignalsforefcienterrordetection.Ouralgorithmlaysemphasisonhowerrors,whichpropagatefromerrororigintowardssignalsin 1PartialandtotalrestorabilityhavebeendiscussedinChapter 3 24 PAGE 25 
thefan-outconecanbedetected.Wehavealsoproposedaregion-awaresignalselectionalgorithm(RSS)thatselectsprotablesignalsduringdesigntimebasedontheknowledgeoffunctionalregionsandassociatederrorzones.Wehavedevelopedalow-overheaddynamicsignaltracing(DST)hardwaretoenabledesignerstotracedifferentsetofsignalsduringexecutionbasedonactive(relevant)functionalregions.Thislaysemphasisontheactiveerrorzonesinthecircuitthatcanbedetectedusingaspecicallyselectedsetoftracesignals.Tothebestofourknowledge,thisistherstattemptindevelopinganefcientspatio-temporalsolutionfordynamictracesignalselection.Contribution3:[TraceDataCompression]:ThiscontributionaddressesChallenge5.Existingtracecompressionalgorithmschoosethedictionaryonline(thusincludesalltheuniqueentriesinthedictionary),whichresultsinpoorcompressionperformanceaswellasincreasedhardwareoverhead.Studies[ 14 15 ]revealthatthedifferencebetweentheactualtracedataandtheideal(error-less)tracedataisverysmallforpost-silicondebug(2-5%).Thismotivatedustodevelopalosslessdictionarybasedwidthcompressionschemethatoperatesonreal-timetocompressthetracedatausingastaticallyselecteddictionary.Thisprovidesabettercompressionperformancesinceonlyprotableentriesarestoredinthedictionary.Thisalsoprovideshugereductionincompressionhardwareoverhead.Threedifferentcompressionalgorithmshavebeenproposedtotrade-offbetweencompressionperformanceandhardwareoverhead.Contribution4:[Observability-AwareTestGeneration]:ThiscontributionaddressesChallenge6.Theteststhatareusedduringpost-siliconvalidationareproducedusingrandomorconstrained-randomtestgenerationtechniques.However,errordetectionperformancecanbeenhancedifthetestsaredesignedkeepingthedetectionobjectiveinmind.Thetestsshouldbedesignedtoexcitetheerrors(speciallycornercaseones)andpropagatethemtotheobservationpoints,thatis,tothetracedsignals.Moreover,thetracesignalsarechosenassumingtheinputtestsarerandominnature.However,if 25 PAGE 26 
the test sets are known a priori, adequate trace signals can be selected to enhance error detection. We propose efficient techniques to determine the trace signals as well as the test sets that would help in detecting errors in the design.

1.6 Dissertation Organization

The dissertation is organized as follows. Chapter 2 presents related approaches for signal selection, trace compression and test generation. Chapter 3 describes our trace signal selection algorithm. Chapter 4 explores an efficient combination of trace and scan signals. Chapter 5 describes our signal selection technique that focuses on error detection. Chapter 6 presents our dynamic signal selection algorithm. A trace compression technique based on static dictionary selection is presented in Chapter 7. Chapter 8 describes observability-aware test generation techniques. Finally, Chapter 9 concludes the dissertation.

CHAPTER 2
BACKGROUND AND RELATED APPROACHES

The focus of this dissertation is to reduce the overall effort of post-silicon validation, which is one of the most critical stages in System-on-Chip (SoC) design methodology. Post-silicon validation and debug comprises signal observation and analysis, as described in Section 1.2. A primary problem for post-silicon debug is the limited observability of internal signal states since the chip has already been fabricated. Once the signal states are known, they can be analyzed using algorithms such as failure propagation tracing [17] to identify the errors in the circuit. Koushanfar et al. [18] proposed a method to obtain the internal states of a system using a golden cut. However, their method is not applicable to post-silicon debug since it is difficult to stop execution of a process running on a chip and obtain the knowledge of all the current signal states. Formal analysis for post-silicon debug proposed by De Paula [19] is not applicable to circuits with a large number of gates. Physical probing techniques were proposed by Nataraj et al. [20]. The decrease in feature size and the growing complexity of IC designs have made it difficult to implement these techniques in practice. A method for validation of the memory subsystem in CMPs was proposed by DeOrio [21], which only focuses on the memory subsystem. Scan based debugging techniques such as [11
] are not appropriate since they require stopping the circuit functionality while the scan data is being written. This is particularly unhelpful in cases where the functional errors are widely separated. Double buffering [22] of scan elements helps to mitigate this problem, but with a large area penalty. Design-for-Debug (DfD) techniques have been used extensively to increase the observability of internal signals of the silicon. Generally this is performed by sampling the data, which is stored in on-chip trace buffers. Various DfD techniques like the embedded logic analyzer (ELA) [23] and shadow flip-flops [22] have been proposed over the years for post-silicon debug. An ELA can be used to probe into the chip and record some internal logic states. The trace is then recorded in an on-chip trace buffer. During debug, the contents of the trace buffer are transferred to an offline debugger via a Joint Test Action Group (JTAG) interface. In the following subsections, we discuss related approaches in the area of signal selection, trace compression and test generation.

2.1 Trace Signal Selection

Since an ELA allows only a few signals to be traced, they should be carefully selected in order to enhance the overall observability during post-silicon validation. A logic implication based trace signal selection method was proposed by Prabhakar et al. [24]. The authors used the primary inputs, in addition to the traced signals, for restoration purposes. Ko et al. [5] and Liu et al. [6] have proposed generic trace signal selection algorithms in which a few important signals can be traced and others can be reconstructed from them. All these approaches use the gate-level netlist model of a design for signal selection purposes. Both space and time complexity can be reduced if the same operation is performed at higher abstraction levels like RTL. However, care should be taken to avoid degradation of restoration performance. Considerable research has been performed over the years involving both gate-level and RTL designs in the fields of testing and validation. An RTL fault grading approach was used to improve the gate-level fault coverage by Mao et al. [25]. RTL tests were generated and reused for detecting gate-level stuck-at faults by Yogi et al. [26
]. Recently, RTL-level signal selection algorithms for post-silicon validation were proposed by Ko et al. [8]. However, [8] selects some signals from the RTL level and then the rest from the gate-level description of the circuit. Thus, both RTL and gate-level models of the design are necessary in [8] to select signals, which adds to the memory and time overhead of signal selection. Scan chains have been used for improving the signal observability during post-silicon debug [9 11]. A combination of scan and trace signals for post-silicon debug was first proposed by [12]. In their approach, the trace buffer is used to determine the time window over which the bug might have occurred. The experiment is then re-run with the scan data concentrating on that particular time window. A combination of trace and scan data was also used by [13] for silicon debug. They used multiple runs of the same experiment to obtain the trace data. The scan data were used to select a known state. Both of these approaches assume repeatable experiments, i.e., the circuit response is uniform across multiple debug runs. Ko et al. [27] proposed an approach of combining scan and trace data that works well even in non-repeatable experiments. However, their method used an exhaustive exploration of all possible trace-scan combinations to determine the best result for a particular circuit. This exhaustive exploration may not be suitable for practical purposes.

2.2 Dynamic Signal Selection

Existing trace signal selection algorithms statically select a set of signals that are traced every cycle. Prabhakar et al. [28] proposed an approach to alternate between two sets of signals in alternate cycles. As a result, it addresses only a very specific case of temporal distribution of errors, without any consideration for spatial distribution. A multiplexed signal selection for error detection was proposed by Liu et al. [29]. Their approach is an ad-hoc signal selection heuristic based on an error visibility metric. It does not consider the challenges associated with dynamic signal selection in the presence of spatial and temporal distribution of errors.
2.3 Trace Data Compression

To increase the amount of data that can be stored in a trace buffer while keeping the trace buffer size constant, trace compression techniques have been proposed [14-16], which compress the trace data before storing them in the trace buffer. The trace buffer has two parameters: width and depth. Width refers to the number of signals whose states can be stored every cycle, while depth refers to the number of cycles over which the trace is stored. Existing trace compression approaches differ in terms of compression objectives: while [16] compresses the width of the trace buffer, [14, 15] compress the depth.

2.4 Observability-Aware Test Generation

Soft errors and crosstalk faults are two major electrical errors found in a fabricated SoC. The effect of soft errors on memory devices was studied as early as 1979 by May et al. [30]. Over the years, researchers [31-34] have studied the various aspects of soft errors. Sanyal et al. [35, 36] have proposed different methods for directed test generation for soft errors. However, these approaches are not designed for post-silicon validation purposes, that is, they assume all the output signals of a logic block are visible. During post-silicon validation, since the chip is fabricated, observing the output signals of every component may not be feasible since these components can be embedded in an SoC. The only observable points would be the trace signals. The test generation algorithms need to be modified to take this into account. Crosstalk faults occur when two lines in a circuit are so near that their mutual capacitance affects their state. Effects of crosstalk faults on digital circuits [37-40] have been studied extensively. Existing test generation algorithms for crosstalk faults [41-43] suffer from the same problem as the corresponding test generation algorithms for soft errors, that is, they are not suitable for application in post-silicon validation due to limited observability through trace signals.
CHAPTER 3
RESTORATION-AWARE TRACE SIGNAL SELECTION TECHNIQUES

During post-silicon debug, the signal states are stored in an on-chip trace buffer. The limited size of the trace buffer allows only some signals to be traced. The rest of the signals have to be reconstructed from them. Therefore, the signals to be traced should be carefully selected in order to enhance the overall signal restoration across the circuit. Existing signal selection approaches [5, 6], which utilize partial restorability (the probability that a signal value can be reconstructed using known values of some other traced signals), are not able to provide the best possible signal reconstruction. We propose total restorability based signal selection algorithms, where total restorability measures whether a group of signals can definitely reconstruct a set of signal states. These algorithms can outperform existing approaches in both signal selection time and the quality of the selected signals. Signal selection at higher abstraction levels is a promising way to reduce the overall signal selection time. Since fewer variables are needed to represent the registers and other signals at the RTL level, the overall complexity is much reduced. We have proposed an efficient signal selection approach at the RTL level. Our RTL-level signal selection significantly reduces the signal selection time as well as the memory requirements, without a significant penalty in restoration performance. Our proposed method is similar to the signal selection approach developed by [8] since both use a control data flow graph. However, there is a basic difference between the two approaches. While [8] selects some signals from the RTL level and the rest from the gate-level description, our proposed approach operates only on the RTL description. Hence, unlike [8], it does not require the gate-level model of the circuit for signal selection.

3.1 Gate-level Signal Selection (GSS)

Algorithm 1 shows our gate-level signal selection procedure (GSS), which has five important steps. Edge and node values are calculated in the first two steps. Total restorability computation is then used to create a region and recompute node values, accompanied by signal selection. The remainder of this section describes each of the steps in detail.
Algorithm 1: Gate-level Signal Selection
Input: Circuit, Trace Buffer
Output: List of selected signals S (initially empty)
1: Compute the node values.
2: Find the state element with highest value and add to S.
3: Create Initial Region.
while trace buffer is not full do
    4: Recompute the node values of state elements.
    5: Compute region growth by finding the state element with highest value not in S and add to S.
end
return S

3.1.1 Computation of Edge Values

An edge between two state elements is the path taken to reach one element from another while passing through a number of combinational gates, that is, there cannot be any state elements in between them. The edge may be in the forward or backward direction. In Figure 1-4, an edge between the two flip-flops A and C passes through an OR gate. In the general case, there can be any number and type of combinational gates in an edge. To find the probability that C is influenced by the value at A (which is the value of the edge AC), there are two cases (independent and dependent), as discussed below. (We show calculations for forward restorabilities; those for backward restorabilities can be derived along similar lines.)

3.1.1.1 Independent signals

Consider the two edges AC and BC in Figure 1-4. Here, the two input signals of the OR gate in front of flip-flop C are driven by flip-flops A and B, which are independent. Hence, the edges AC and BC are independent. To calculate the edge values for an independent scenario, we use a generic example in Figure 3-1. Later, we will show how the calculation works for the specific case in Figure 1-4.

Figure 3-1. 
Example circuit with n gates

Figure 3-1 has two flip-flops K and L. We want to find how the input of L is sensitized by the output of K. The input of L corresponds to the output of the gate $G_n$. The path from K to L is independent of any other paths through which the output of K propagates. Let us consider the gate $G_1$. We define four probabilities: $P_{I0,N}$, $P_{I1,N}$, $P_{O0,N}$ and $P_{O1,N}$. Here, $P_{I0,N}$ indicates the probability that a node N (gate or flip-flop) has an input value of '0' when another node is controlling it. Similarly, $P_{I1,N}$, $P_{O0,N}$ and $P_{O1,N}$ indicate the cases for an input value of '1', an output value of '0' and an output value of '1', respectively. The output of flip-flop K can influence the output of $G_1$ in two cases: i) the output of K is a controlling value, and ii) all the inputs to $G_1$ are the complement of the controlling value. Let us consider $G_1$ to be a 2-input AND gate. We define $P_{G_1}$ as the overall probability of K controlling $G_1$. According to [44],

$$P_{G_1} = P_{O1,G_1} + P_{O0,G_1} \tag{3-1}$$

Now, let us define $P_{O0,G_1}$ and $P_{O1,G_1}$. Let $P_{cond0,G_1}$ and $P_{cond1,G_1}$ be the probabilities that the output of $G_1$ follows the output of K, i.e., the output of $G_1$ is 0 (1) when the output of K is 0 (1). For simplicity of calculation, in this example we assume $P_{I0,G_1} = P_{I1,G_1} = 0.5$ (that is, 0 and 1 occur with equal probability at the input). 
$$P_{O0/1,G_1} = P_{cond0/1,G_1} \times P_{I0/1,G_1} \tag{3-2}$$

Now, for a 2-input AND gate, $P_{cond0,G_1}$ is 1, since 0 is the controlling input. Therefore, we obtain $P_{O0,G_1} = 0.5$. Similarly, since 1 is the non-controlling input, $P_{cond1,G_1}$ is 0.5, which gives $P_{O1,G_1} = 0.25$. From Equation 3-1, it can be seen that $P_{G_1} = 0.75$. Now, we return to our main goal, that is, to determine how K controls L. We first find the effect of the output from K as it propagates to the next gate $G_2$ and then extrapolate along the entire path to L. We use the same set of Equations 3-1 and 3-2 again, except that the input is $G_1$ here and the output is $G_2$. The values of $P_{I0,G_2}$ and $P_{I1,G_2}$ are $P_{O0,G_1}$ and $P_{O1,G_1}$ obtained from Equation 3-2. For example, if $G_2$ is also a 2-input AND gate, applying Equation 3-2, we obtain $P_{O0,G_2} = 0.5$ and $P_{O1,G_2} = 0.125$. Therefore, we get $P_{G_2} = 0.625$, where $P_{G_2}$ is the probability for the gate $G_2$ defined in Equation 3-1. In this way, the calculation continues until we reach L, to obtain the value of the edge KL. If there are n combinational gates between K and L, we get

$$P_{O0/1,G_n} = \prod_{1 \le i \le n} \left(P_{cond0/1,G_i}\right) \times P_{I0/1,G_1} \tag{3-3}$$

Finally, Equation 3-1 is used to compute the probability $P_{G_n}$, which corresponds to the value of the edge KL. We use these computations to show how an edge value is computed in the case of the circuit in Figure 1-4. Let us compute the value of edge AC. We name the OR gate in between the two gate G and we assume that $P_{I0,G} = P_{I1,G} = 0.5$. Since it is an OR gate, $P_{cond0,G} = 0.5$ and $P_{cond1,G} = 1$. Therefore, Equation 3-2 can be used to obtain $P_{O0,G} = 0.25$ and $P_{O1,G} = 0.5$. Equation 3-1 can now be used to obtain $P_G = 0.75$, which represents the value of the edge AC.

3.1.1.2 Dependent signals

In the case of dependent signals, we need to determine the probability of a state element output influencing an m-input gate when the output of the state element affects l inputs (l ≥ 2) of the gate. We have used a generic example in Figure 3-2 to calculate the edge value in the case of dependent signals. It should be noted that dependent signals were not considered by [5] or [6].

Figure 3-2. 
Example circuit

Let us consider Figure 3-2. It can be seen that two inputs (x, y) of the m-input gate $G_n$ are affected by flip-flop K. Our goal is to combine the dependent edges so that the resulting edge has independent signals. We can then utilize the formula used in Section 3.1.1.1 to compute the edge value. We wish to find $P_{O1,G_n}$ and $P_{O0,G_n}$, in line with the parameters $P_{I/O\,0/1,N}$ defined in Section 3.1.1.1. Let us assume that $G_n$ is an AND gate. For an AND gate, since 0 is the controlling value, having either of the inputs as 0 will ensure a 0 being propagated into the gate $G_n$. Therefore

$$P_{I0,G_n} = P_{O0,x} + P_{O0,y} - P_{O0,x\&y} \tag{3-4}$$

The term $P_{O0,x\&y}$ subtracts the probability that both are 0, since it is otherwise counted twice. Similarly, since 1 is the non-controlling input, we get

$$P_{I1,G_n} = P_{O1,x\&y} \tag{3-5}$$

where $P_{O1,x\&y}$ is the probability that both x and y are '1'. Let us evaluate the terms $P_{O0,x\&y}$ and $P_{O1,x\&y}$. Let $P_{cond0/1,x/y}$ be the probabilities that x (y) is 0 (1) when the output of K is 0 (1). $P_{O0/1,x\&y}$ can be defined as

$$P_{O0/1,x\&y} = \left(P_{cond0/1,x} \times P_{cond0/1,y}\right) \times P_{O0/1,K}$$

With the help of Equation 3-2, this can be reduced to

$$P_{O0/1,x\&y} = \frac{P_{O0/1,x} \times P_{O0/1,y}}{P_{O0/1,K}} \tag{3-6}$$

Since the paths from K to x and from K to y are assumed to be independent (if any one of these paths consists of dependent signals, the above procedure can be applied recursively until it becomes an equivalent independent path), Equation 3-3 can be used to obtain the values $P_{O0/1,x/y}$. Application of Equations 3-4 and 3-5 provides the values of $P_{I0/1,G_n}$. The final $P_{G_n}$ can be obtained using Equations 3-1 and 3-2, and the information on the number of inputs to the gate $G_n$. This corresponds to the value of the edge KL.

3.1.1.3 Example

We now proceed to show how the calculations described in Section 3.1.1.1 and Section 3.1.1.2 can be used to determine the edge values for the circuit in Figure 1-4. A graphical representation of the circuit is shown in Figure 3-3.

Figure 3-3. Graphical representation of example circuit

The state elements are represented by nodes and an edge between two state elements is represented by a straight line. It should be noted that there are no 
dependent edges in this example. All the edges have one two-input gate in between them. As a result, all the edge values are 3/4 (obtained from Section 3.1.1.1). We will use this graph to explain our signal selection algorithm.

3.1.2 Initial Value Computation for State Elements

We define the value of a state element as the sum of all the edges attached to it, in both the forward and backward directions. For example, in Figure 3-3, the value of flip-flop C is the sum of the weights of all edges connected with it, that is, CA, CB, CD and CE. It is important to note that we have used a threshold in order to prevent combinational loops inside the circuit during edge value computation. This parameter was used by [5] as well. Our computation of the state element values is independent of the sequential loops in the circuit. In a sequential loop, the output of a state element depends on another in both the previous and the next cycle. However, both cannot be true in the same clock cycle; that is, the same state element cannot determine the output of another in the same cycle by both forward and backward restoration. While forward restoration can determine the state in the next cycle at the earliest, backward restoration can determine it in the previous cycle at the latest.

3.1.3 Initial Region Creation

A region is a collection of state elements attached together. It is not necessary that every state element in the region has an edge to every other. However, each state element in the region must have at least one edge to another state element in the region. In Figure 3-3, the flip-flops A, B, C, D and E form a region. The first state element to be chosen is the one with the highest value, based on the calculations in Section 3.1.2. It is added to a list called known. Now, all state elements which have an edge to the recently selected element are added to the region. We show by an example in Figure 3-4A how this portion of our algorithm is used to perform the selection of the profitable signals. The edge values are shown along each

A) Initial region creation B) Region growth
Figure 3-4. 
Region creation and growth

edge. The values of the flip-flops (the sum of all their edge values) are shown in bold alongside each flip-flop. For example, A has 3 edges, AC, AD and AG, each having a value of 3/4. Therefore, the value for A is 3/4 + 3/4 + 3/4 = 9/4. The flip-flop with the highest value in Figure 3-4A is C. All the nodes which have an edge from C are included in the region. The region is represented by the spline in Figure 3-4A.

3.1.4 Recomputation of Node Values

The first state element in Figure 3-3 to be traced is already known (C in the previous example). However, there are other state elements that need to be traced as well. To select the subsequent state elements, their values are recomputed. The state element whose value is being computed may have an edge to an element inside the region as well as one outside the region. Edges to state elements inside the region are given higher weight. As discussed before, many restorability computations require knowledge of more than one signal at the input/output (for example, when all the inputs to a gate are the complement of the controlling value). Therefore, it is better to gain more knowledge of the signals that are already in the region, thus increasing their restorability values and aiming for total restorability of those signals. Existing approaches [5] and [6] recompute the restorability values after each iteration, which, when translated to the graph in Figure 3-3, would correspond to edge value recomputation. Clearly, this is more computationally intensive.
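For reference, the one-time edge value computation of Section 3.1.1.1 (independent signals, chain of 2-input gates) can be sketched as follows. This is only an illustrative sketch: the function name `edge_value` and the table `P_COND` are hypothetical, and the conditional probabilities are the 2-input AND/OR values quoted in the text, not a general gate library.

```python
# Illustrative sketch of the independent-signal edge value computation.
# P_COND maps a 2-input gate type to (P_cond0, P_cond1): the probability that
# the gate output follows the controlling flip-flop when that value is 0 or 1.
# Only the two gate types worked out in the text are listed (assumption).
P_COND = {
    "AND": (1.0, 0.5),  # 0 is the controlling value of an AND gate
    "OR": (0.5, 1.0),   # 1 is the controlling value of an OR gate
}


def edge_value(gates, p_in0=0.5, p_in1=0.5):
    """Return the value of an edge that passes through `gates` in order.

    Each gate scales the probabilities by its conditional probabilities
    (P_O = P_cond * P_I), and the outputs of one gate become the inputs of
    the next. The edge value is the sum of the final P_O0 and P_O1.
    """
    p0, p1 = p_in0, p_in1
    for gate in gates:
        cond0, cond1 = P_COND[gate]
        p0, p1 = cond0 * p0, cond1 * p1
    return p0 + p1


if __name__ == "__main__":
    print(edge_value(["AND"]))         # single AND gate
    print(edge_value(["AND", "AND"]))  # two AND gates in series
    print(edge_value(["OR"]))          # edge AC of the example circuit
```

Under these assumptions the three calls reproduce the values 0.75, 0.625 and 0.75 derived in Section 3.1.1.1.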
3.1.5 Region Growth

The state element with the highest restorability that is not in the list known is determined. If two state elements have the same value, the one with the higher forward restoration is traced. This is because backward restoration fails in some cases, whereas forward restoration does not when all the inputs are known. For example, in Figure 3-4A, the next state element to be traced is A. It is included in the list known. If the trace buffer is already full, calculations stop; otherwise the region continues to grow. All state elements having an edge to the recently selected node are added to the region. As shown in Figure 3-4B, in this case G is added since G is the only node connected to A and not in the region. The dotted line indicates the original region. Next, the state element values are recomputed as in Section 3.1.4, and this process is iterated until the trace buffer is full.

3.1.6 Complexity Analysis

In this section, we compute the complexity of our algorithm. Let V be the number of state elements in the circuit and E be the number of edges in the circuit. Let N be the number of signals to be traced, that is, the size of the trace buffer is N. The first step, that is, edge value computation, takes O(E) time, while the flip-flop value computation performed each time a signal is selected takes O(V) time. To select N signals, the time required is O(NV). Therefore, the overall time complexity of our algorithm is O(E + NV). On the other hand, the time complexity of existing algorithms is O(NE). Since E >> V, the time complexity of our proposed algorithm is lower. The overall space complexity of our algorithm is O(E + N + V). Since E >> N + V, the space complexity reduces to O(E).
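The selection loop described in Sections 3.1.2 to 3.1.5 can be sketched as follows. This is a minimal sketch under stated assumptions, not the dissertation's implementation: the text only says that edges into the region are "given higher weight", so the multiplier `region_weight=2.0` is a hypothetical choice; ties are broken alphabetically rather than by forward restorability; and the graph `EDGES` is an illustrative one that reuses only the edges named in the text (AC, AD, AG, CB, CD, CE) plus two assumed edges (EF, GH).

```python
from collections import defaultdict

# Hypothetical example graph: every edge has the value 3/4 used in the text.
# Edges E-F and G-H are assumptions added only to complete the illustration.
EDGES = {
    ("A", "C"): 0.75, ("A", "D"): 0.75, ("A", "G"): 0.75,
    ("B", "C"): 0.75, ("C", "D"): 0.75, ("C", "E"): 0.75,
    ("E", "F"): 0.75, ("G", "H"): 0.75,
}


def gss_select(edges, trace_width, region_weight=2.0):
    """Sketch of gate-level signal selection with region growth."""
    adj = defaultdict(dict)
    for (u, v), value in edges.items():
        adj[u][v] = value
        adj[v][u] = value

    # Initial node value: sum of all edge values attached to the node.
    node_value = {n: sum(nbrs.values()) for n, nbrs in adj.items()}
    first = max(sorted(node_value), key=node_value.get)
    known = [first]
    region = {first} | set(adj[first])  # initial region

    while len(known) < min(trace_width, len(adj)):
        # Recompute node values only (edge values are never recomputed);
        # edges that lead into the region get the assumed higher weight.
        node_value = {
            n: sum(w * (region_weight if m in region else 1.0)
                   for m, w in adj[n].items())
            for n in adj if n not in known
        }
        chosen = max(sorted(node_value), key=node_value.get)
        known.append(chosen)
        region |= {chosen} | set(adj[chosen])  # region growth
    return known


if __name__ == "__main__":
    print(gss_select(EDGES, trace_width=2))
```

With this illustrative graph and weight, the first two selected elements are C and then A, matching the order described for Figure 3-4. Each iteration touches node values only, which is where the O(E + NV) bound of Section 3.1.6 comes from when the adjacency sums are maintained incrementally; this simple sketch recomputes the sums from scratch for clarity.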
3.2 Motivational Example

We now employ our proposed method (described in Section 3.1) for selecting signals in the circuit in Figure 1-4. The first signal that we trace is C. Note that this was the same signal that was chosen by [5] and [6]. The second signal that we choose is A, based on total restorability computations. Tracing A along with C guarantees that D can be reconstructed every cycle. As indicated earlier, existing methods select F as the second traced signal. Clearly, C and F together do not provide any such guarantee. The results are shown in Table 3-1. It can be seen that our method provides a restoration ratio of 3.2, which is better than the one provided by [5] and [6].

Table 3-1. Restored signals using our method
Signal  Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
A       0        0        0        0        1
B       1        0        1        0        X
C       1        1        0        1        0
D       X        0        0        0        0
E       X        1        0        0        0
F       X        X        1        0        0
G       X        0        0        0        0
H       X        X        0        0        0

3.3 RTL-level Signal Selection (RSS)

To show how signal reconstruction can be efficiently performed at the RTL level, let us consider the Verilog design in Figure 3-5A. The design consists of three register variables, namely a, b and c (each corresponding to a set of flip-flops), as well as two input signals d and e. There are also three other signals m1, m2 and m3. In the example, a and b are 8 bits long, c and e are 7 bits long, while d is a one-bit signal. To show how reconstruction is performed, let us observe how each of these flip-flops is assigned: a is the concatenated value of d and c; b is the result of logical operations between a, m1, m2 and m3; and c is the result of an arithmetic operation between e and a constant number. Let us assume that we trace the state of a. We now explain how tracing a in cycle k helps us to reconstruct the other states. The assignment of b shows that the state of b in cycle k+1 can be reconstructed from the state of a by forward restoration. From the assignment of a, the states of c and d in cycle k-1 can be reconstructed from the state of a by backward restoration. Finally, from the last statement, that is, the assignment of c, the state of e can be restored in cycle k-2 by backward restoration. Thus, we see that tracing only one state of a can reconstruct the states of 4 other variables in different cycles.

A) RTL Verilog example B) CDFG of Verilog code
Figure 3-5. 
Verilog code and CDFG

Algorithm 2 shows our signal selection procedure, which has six important steps. In the first step, a control data flow graph (CDFG) is generated to model the entire system. Since at the RTL level each register variable represents multiple state elements, we use the register variables for signal selection purposes. However, the trace buffer width refers to the total number of state elements represented by these register variables. For example, the register variable [7:0] a represents 8 state elements, and therefore selection of variable a implies that 8 trace buffer locations are needed. The relationship between the different register variables is obtained from the CDFG. These relations are used to produce the total restorability values for the variables. The register variable with the highest value is chosen for tracing. Once a variable is chosen for tracing, all the other variable values are recomputed in the same manner as in Algorithm 1. Steps 4-6 are repeated until the trace buffer is full. The remainder of this section describes each of the steps in detail.

Algorithm 2: RTL-level Signal Selection
Input: RTL description of design, No. of trace entries
Output: List of selected signals S (initially empty)
1: Develop the CDFG of the RTL description.
2: Find the relationship between the register variables.
3: Find the initial values of the register variables.
while trace buffer is not full do
    4: Find the register variable with the highest value.
    5: Add all corresponding state elements to the list S.
    6: Recompute values for all the register variables.
end
return S

3.3.1 CDFG Generation

The first step of RTL-level signal selection is to generate the Control Data Flow Graph (CDFG) from the RTL model. The CDFG can be generated using any standard HDL parser. For our use, we have generated the CDFG by modifying the open source Icarus Verilog parser [45] for the Verilog circuits. Although our studies are based on Verilog benchmarks, our approach is also applicable to VHDL designs. The format of our CDFG representation is similar to Mohanty et al. [46]. Figure 3-5B shows the CDFG representation of the Verilog code in Figure 3-5A 
. The CDFG can represent both the movement of control signals and data values. The dotted arrows indicate the control flow (transitions) in the CDFG, while the bold arrows represent the data flow (computations). For example, on the right-hand side of Figure 3-5B, there is a bold arrow from a to the AND gate. This is because a is an input of the AND gate. The circles in the CDFG represent operational and control nodes, while the boxes represent storage nodes. For example, the circle at the top represents an OR assignment for the conditional statement always, while the square at the bottom right of Figure 3-5B represents the storage in the node b. It should be noted that direct assignments like a <= 7'b0 are just represented as a bold arrow with value 0 entering a box for storage of value a. In this case, since the three variables a, b and c are all being assigned 0 together, they are grouped in a single box. This basic representation can be further extended to represent the CDFG of a complex design. This CDFG representation is used as input to the next step, relationship computation.

3.3.2 Relationship Computation

The relationship of a signal with others can be obtained from the CDFG. The relationship computation for a single signal in the circuit provides the effect of that signal on others. To compute the relationship of the signals, we first note that there can be two main relationship types, namely direct relationship and conditional relationship. These two classes and their respective relationship computations are explained as follows.
3.3.2.1 Direct relationship

Two signals are said to be directly dependent when they occur in the same signal assignment statement. For example, in the RTL description shown in Figure 3-5A, the signal pairs (a, b) and (a, c) have a direct relationship. This is because both variable assignments occur inside the 'if' block. Direct relationships can be of two types, namely forward and backward. A forward relationship deals with the propagation of values in the forward direction, that is, from the right-hand side of the assignment to the left-hand side. A backward relationship, on the other hand, deals with the reverse, that is, from the left-hand side to the right-hand side of the assignment. For example, in the RTL description of the example in Figure 3-5B, a has a forward relationship on b, while b has a backward relationship on a. We will use a simple generic example to show how the direct relationship computation is performed. Later, we will consider the specific example shown in Figure 3-5B. A typical signal assignment statement looks like

y <= x1 OP1 x2 OP2 x3 OP3 ... xn

where OP represents any operation (e.g., AND, OR, etc.). We can see that there are n signals on the right-hand side of the assignment statement. We want to find out the relationship of each of these signals on y. Let us assume that each of these signals is k bits long and that all the $x_i$'s are independent. We also assume that each of the $OP_i$'s is an AND gate. Therefore, the assignment statement can be rewritten as

y <= x1 & x2 & ... & xn

The same computations can be extended to other operations, as well as to different operations for each $OP_i$. Let us compute the relationship of $x_i$ (1 [...] possible cases of value assignments to the x's. These calculations, as stated above, can be extended to other operations as well. For example, if OP was an OR operation, the two equations above would be modified as

$$P_{y1,x_i} = \frac{1}{2^{kn}}$$

$$P_{y0,x_i} = \frac{2^{k(n-1)}}{2^{kn}}$$

Figure 3-6. 
A portion of the CDFG in Figure 3-5B

We now apply these computations to the specific example in Figure 3-5B. Figure 3-6 shows a portion of Figure 3-5B that shows the dependency of b and c on a. We assume that all the signals are independent here. Two new variables g1 and g2 are introduced for ease of illustration. Clearly, we have

g1 = a & m1
g2 = m2 & m3
b = g1 | g2

To find the relationship of a on b, we first find the relationship of a on g1, and then that of g1 on b. From the equations for the AND operation (with k = 8 and n = 2), it can be seen that

$$P_{g_1 0,a} = \frac{2^{8}}{2^{16}}, \qquad P_{g_1 1,a} = \frac{1}{2^{16}}$$

To find the second part, that is, the relationship of g1 on b, we use the equations for the OR operation:

$$P_{b1,g_1} = \frac{1}{2^{16}}, \qquad P_{b0,g_1} = \frac{2^{8}}{2^{16}}$$

Combining these two sets of equations, we get the relationship of a on b as

$$P_{b1,a} = \frac{1}{2^{16}} \times \frac{2^{8}}{2^{16}}, \qquad P_{b0,a} = \frac{1}{2^{16}} \times \frac{2^{8}}{2^{16}}$$

Finally, adding the two probabilities, we get

$$P_{ba} = \frac{1}{2^{23}} \approx 0.0000001$$

Since c and a have a direct concatenation relationship, the value of $P_{ac}$ is obtained as 1.0. Since there is no direct assignment relationship between b and c, there is no edge between them. Thus, we obtain the values of edges ba and ac of Figure 3-6. Figure 3-7 shows a simplified version of Figure 3-6 with the edge values (shown below the edges). The node values are computed by adding the edge values. For example, the node c is connected to only one edge with value 1.0; therefore, the node value of c is 1.0. Similarly, the node value for a is 1.0 + 0.0000001 = 1.0000001. Likewise, the node value for b is 0.0000001. These node values are shown in the figure (on top of the nodes). In this section, we have described the direct relationship when the signals are independent. However, similar to Section 3.1.1.2, we can have dependent signals as

Figure 3-7. 
Simplified version of Figure 3-6

well. Dependent signals arise from multiple branches of the CDFG. Computations for dependent signals are similar to the computations in Section 3.1.1.2.

3.3.2.2 Conditional relationship

A conditional relationship corresponds to the non-assignment dependencies. For example, in the RTL code corresponding to Figure 3-5B, the signals a and b have a conditional relationship on reset. (It should be noted that we do not consider the conditional relationship of general control signals like clock (clk) or reset.) We generally do not consider backward conditional relationships, since these are not in direct assignment statements. Conditional relationships are computed in the same way as the edge values of Section 3.1.1.1, except that the operations are taken from inside the conditional block. For example, consider the following code:

if (m or n) x <= y;

Here, the symbol x has a conditional dependence on m and n. Since there are only two variables m and n in the conditional dependency, the dependency value is 3/4, as obtained from Equations 3-1 and 3-2 in Section 3.1.1.1. The conditional relationships are computed in this manner for all the signals in the circuit.

3.3.3 Signal Selection

Once the values of the variables are computed, the next step is to select the best one for tracing. The signal selection procedure is similar to gate-level signal selection. The variable with the highest value is selected and the rest of the values are recomputed using region growth. This part is the same as in Algorithm 1 and hence is not discussed here. In Figure 3-7, register variable a, having the highest value, is chosen for tracing. The process continues until the trace buffer is full.

3.4 Experiments

In this section, we first compare our approach with existing gate-level signal selection techniques [5, 6]. Next, we demonstrate how our proposed RTL-level signal selection can further improve signal selection time.
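The comparisons in this section are reported as restoration ratios. As a small worked sketch, assuming the usual definition (number of known states, traced plus restored, divided by the number of traced states), the ratio of 3.2 quoted for Table 3-1 can be reproduced as follows. The names `TABLE_3_1` and `restoration_ratio` are illustrative, and the state strings are transcribed from Table 3-1 with 'X' marking a state that could not be restored.

```python
# Signal states over five cycles, transcribed from Table 3-1.
# 'X' marks a state that is neither traced nor restored.
TABLE_3_1 = {
    "A": "00001", "B": "1010X", "C": "11010", "D": "X0000",
    "E": "X1000", "F": "XX100", "G": "X0000", "H": "XX000",
}


def restoration_ratio(states, traced):
    """Known states (traced + restored) divided by traced states.

    This definition is an assumption stated in the lead-in; it is the
    conventional one, but it is not spelled out in this section.
    """
    known = sum(ch != "X" for row in states.values() for ch in row)
    traced_states = sum(len(states[name]) for name in traced)
    return known / traced_states


if __name__ == "__main__":
    # A and C are the two traced signals in the motivational example.
    print(restoration_ratio(TABLE_3_1, traced=["A", "C"]))
```

Here 32 of the 40 states are known and 10 of them are traced directly, which gives 32 / 10 = 3.2.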
3.4.1 Experimental Setup

We applied our gate-level signal selection approach (GSS) to the ISCAS'89 benchmarks used by [5] and [6] to compare with their methods and hence show the effectiveness of our algorithm. We designed a simulator along the lines of the one described by [6] for our purpose. The simulator is implemented as an iterative process which terminates when it is not possible to restore any more states. We fed the simulator with 10 sets of random values and noted the average restoration ratio.

Figure 3-8. Overview of our experiments to verify RSS

Figure 3-8 gives an overview of our experimental setup to verify the RTL-level signal selection algorithm (RSS). For this purpose, we have used Verilog circuits obtained from the Opencores website [47]. It should be noted that we have not used the ISCAS'89 benchmarks since RTL descriptions of these were not available. We have modified the Icarus Verilog parser [45] to generate the Control Data Flow Graph (CDFG). The CDFG is then parsed by another program, which provides the list of selected signals using Algorithm 2. As can be seen in Figure 3-8, we have applied GSS and RSS to the same circuits to compare the restoration performance of each approach. The signals selected using RSS are mapped to gate level and the restoration performance is noted. Simultaneously, the RTL design is synthesized to a gate-level netlist (footnote 7), and GSS is applied to the netlist. A comparison of the restoration performance using the two algorithms reveals that they are almost the same, as discussed in Section 3.4.3. Thus, our RTL-level signal selection algorithm does not incur any significant restoration penalty compared to the gate-level signal selection algorithm.

3.4.2 Results on Gate-level Signal Selection (GSS)

Table 3-2. Comparison with Ko et al. 
          Restoration ratio with random inputs      Restoration ratio with deterministic inputs
Circuit   Ko et al.  Our approach  Improvement      Ko et al.  Our approach  Improvement
s38584    38         42            1.1              6          20            3.33
s38417    9          16            1.8              9          16            1.8
s35932    48         50            1.04             25         35            1.4

Footnote 7: Any synthesis tool can be used for this purpose; we have used Synopsys Design Compiler.

We would like to compare our signal selection approach with other closely related methods. Table 3-2 compares the performance of our approach with the one proposed by Ko et al. [5] using the three largest ISCAS'89 benchmark circuits. All the experiments have been performed with a trace buffer of width 32. Table 3-2 is divided into three distinct parts. The first column indicates the circuit name. The next three columns compare the performance when random sets of inputs are used to drive the circuits. In this case, even the control signals are driven using random inputs. As a result, the circuit might fall into one of the reset states. The improvement is defined as the ratio between the restoration ratio using our approach and that of [5]. The third part of Table 3-2 compares our approach with [5] when the inputs of the circuit are driven deterministically. This means that the control signals are driven using values that prevent the circuit from going to a reset state, while the other signals are driven with random inputs. From Table 3-2, it can be seen that the improvement obtained using random inputs is moderate (31% on average). On the other hand, considerable gain (117% on average) is obtained when we use our algorithm with deterministic inputs. As discussed earlier, random inputs to control signals might lead to reset states, which are responsible for high restoration for both approaches. Therefore, the improvement obtained is smaller in this case. As stated in [6], deterministic inputs are what circuits actually receive in real-life applications. Hence, the gain obtained with them is more significant. Table 3-3 compares the restoration ratio of our proposed approach with the one proposed by Liu et al. [6] for the three largest ISCAS'89 benchmarks. As before, a trace buffer width of 32 is used. In this case, the inputs are deterministic in nature. An average improvement of 65% is observed. It can be seen that the improvement here is less than the one obtained in Table 3-2 
. This can be attributed to the fact that the algorithm proposed by [6] is more efficient than that of [5].

Table 3-3. Comparison with Liu et al. with deterministic inputs

           Restoration ratio
Circuit    Liu et al.   Our approach   Improvement
s38584     9            20             2.22
s38417     14           16             1.14
s35932     22           35             1.6

We now compare our approach with the one proposed by Prabhakar et al. [24]. The authors of [24] use the primary inputs along with the traced signals for signal restoration. Until now, we used only the trace signals to restore the rest of the signals on the chip. However, to enable a fair comparison, we include the primary inputs in our signal restoration. The results are shown in Table 3-4 using a trace buffer of width 32. Note that the improvements are moderate (10% on average) in this case. When the primary inputs are used for restoration, most of the states at later clock cycles can be recovered. On the other hand, the states that the input test vectors cannot reach in early cycles, due to state depth, can be restored using the traced data. As reported in [24], about 90-95% of the states were restored using their method. Hence, the scope for improvement is limited.

Table 3-4. Comparison of GSS with Prabhakar et al.

           Restoration ratio
Circuit    Prabhakar et al.   Our approach   Improvement
s5378      4.84               5.0            1.03
s9234      5.2                6.0            1.15
s15850     13.9               15.8           1.14
s38584     34.8               40.5           1.16
s35932     52.4               53.3           1.02

Figure 3-9 compares our signal selection time against the time taken by Ko et al. [5] and Liu et al. [6] for the three largest ISCAS'89 benchmark circuits. The X-axis denotes the different trace buffer widths. It can be seen that our approach takes significantly less time (up to 90% less) than both. This is primarily because [5] and [6] recompute edge values in every iteration, whereas we only recompute the node values.

Figure 3-9. Comparison of Signal Selection Time

Although our gate-level signal selection algorithm provides significant improvement over the existing approaches ([5] and [6]), it does not guarantee the maximum restoration possible using the same trace buffer size. For example, if we consider a trace buffer of width 32, GSS does not guarantee that it chooses the best 32 signals, that is, those that provide the maximum restoration possible. A detailed analysis is needed to determine the maximum restoration possible using a particular trace buffer size and the signals to be traced in order to obtain that restoration performance.

3.4.3 Results on RTL-level Signal Selection (RSS)

In this section, we discuss how our RTL-level signal selection algorithm can further improve the signal selection time without compromising the restoration ratio significantly. As discussed before, we applied our approach to the designs obtained from the OpenCores benchmarks. We compared our RSS approach with the gate-level signal selection procedure (GSS). The results are shown in Table 3-5. As in the previous experiments, we assumed a trace buffer of width 32.

Table 3-5. RTL-level versus gate-level signal selection

Circuit                    Memory size reduction   Speedup
Total CPU                  8.1697 [the two column values are fused in the source and the split is not recoverable]
Wishbone LCD controller    22.81                   1923
dmx512 transceiver         191.24                  733
OPB onewire                3.22                    3600
Simple RS232 Uart          3.8                     500

The first column in Table 3-5 provides the circuit name. The second column shows the memory size reduction, which is the ratio of the memory size at gate level to that at RTL level. The last column gives the speedup obtained using RSS compared to GSS. Speedup is defined as the ratio of gate-level signal selection time to RTL-level signal selection time. As can be seen, RSS is up to 3600 times faster and requires up to 191 times less memory compared to GSS.

Figure 3-10.
Comparison of Restoration Performance

Finally, we compare the restoration performance of RSS and GSS using three OpenCores benchmarks (OPB onewire, dmx512 transceiver and Wishbone LCD controller). The results are shown in Figure 3-10 for benchmarks driven using deterministic inputs. As we can see in Figure 3-10, the restoration performance for gate level and RTL level is similar. The gate-level restoration performance is slightly better than the RTL-level one in some cases. The primary reason is the representation of state elements as arrays at RTL level. Whenever we select a signal for tracing, we are actually tracing all the elements in the array. However, not all of the signals in the array may be equally beneficial, and some other signals could have been selected for better restoration performance.

Note that our proposed trace signal selection algorithm has a high temporal observability (since the signals are traced every cycle) but a low spatial observability. If the circuit does not have many dominating signals, or if the overall restoration capacity of the trace signals is low, tracing a small set of signals does not help in restoring many of the untraced signal states. A possible option to improve observability is to increase the set of trace signals. However, this would mean an increase in trace buffer size, which adds to the debug overhead. In order to trace more signals while keeping the trace buffer size the same, a possible alternative is to compromise on temporal observability but improve spatial observability. This alternative is explained in Chapter 4.
3.5 Summary

Efficient signal selection is important to enhance observability during post-silicon debug. We developed techniques that employ total restorability to select the most profitable signals, which provide better restoration than signals selected using partial restorability equations. We observed the performance of our gate-level and RTL-level signal selection algorithms using ISCAS'89 and OpenCores benchmarks. Our experimental results demonstrated two major advantages: our approach provides faster (up to 90%) signal selection as well as a significantly better (up to 3 times) restoration ratio compared to existing approaches. Our RTL-level signal selection approach can further improve signal selection time by several orders of magnitude and also requires less memory compared to the gate-level signal selection algorithms.

CHAPTER 4
EFFICIENT COMBINATION OF TRACE AND SCAN SIGNALS

The trace signal selection algorithms described in Chapter 3 lack spatial observability, since only a small set of signals is traced every cycle. To improve the spatial observability during post-silicon debug while keeping the trace buffer length fixed, scan data can be combined with trace signals. Recently, Ko et al. [27
] have shown the importance of combining scan chains and trace signals. They use a part of the trace buffer input bandwidth to store selected trace signals every cycle. The remaining input bandwidth is used to dump the scan signals at a certain frequency. Although this approach produced promising results, several challenges must be addressed to make it useful in practice. One major issue is that it uses exhaustive exploration to determine the profitable combination of trace and scan signals. Such an exhaustive exploration can be infeasible for real designs. Another major concern is that the selected scan signals include almost all the flip-flops. Such an approach is neither practical nor profitable in many real scenarios, since a huge number of shadow flip-flops will be necessary. Also, the time for a scan dump will increase, thus effectively decreasing the number of scan dumps (only about 20 over 1000 cycles for the ISCAS'89 benchmarks).

We have proposed an efficient technique to determine the profitable combination of trace and scan signals. Our approach uses a graph-based representation to select three important aspects: (i) efficient trace signals to be stored every cycle, (ii) the most profitable scan signals to be included in the shadow scan chain, and (iii) the scan dump frequency based on the trace buffer width constraints. It is important to note that trace signal states are stored every cycle, whereas scan signal states of a specific clock cycle are stored based on the dump frequency (see footnote 1). A major challenge is that these three aspects are inter-dependent. For example, selecting more trace signals implies less space for scan signals, and vice versa. Even when the space for the scan signals is reserved, choosing a large scan chain (too many scan signals) implies a longer interval between scan dumps. In other words, there is a critical balance between how many signals to observe and how many signal states can be obtained for a specific clock cycle. Our proposed approach addresses these challenges. Our experimental results show that our method can significantly improve restoration performance compared to existing methods.

Footnote 1: For example, if the trace buffer width is 32 and 8 trace signals are used, we have space left for only 24 scan signals. If we choose a scan chain of 48 flip-flops, the scan dump should happen every two (48/24 = 2) cycles. In other words, in clock cycle 0, the states of the 8 trace signals are stored, whereas only the states of the first 24 scan signals are stored. Similarly, in the next cycle, 8 trace signal states and the last 24 scan signals (with states of cycle 0) are stored.

4.1 Background and Motivation

To explain how a combination of trace and scan signals can be used to improve signal observability, we refer back to Figure 1-4 in Chapter 1. Using only trace signals, A and C, we were able to reconstruct 32 signal states, including 10 traced ones and 22 newly reconstructed ones. We now show how a combination of trace and scan signals can help in signal reconstruction using the same circuit. In the previous example, the trace buffer stored a total of 10 states (width 2 and depth 5). In this case, we use a trace buffer that can store 11 states. Signal C is selected for tracing every cycle. The other two important signals, A and F, are used as scan signals. The scan dump is performed in alternate cycles. The modified circuit is shown in Figure 4-1 (see footnote 2). Table 4-1 shows the traced, scanned and restored signals using [27]. The state values for signal C are traced every cycle, whereas the state values for the scan signals (A and F) are dumped in alternate cycles. The scanned signal states are shown in bold. Although scan signals are dumped in alternate cycles, the table shows states for both A and F in cycle 1, cycle 3, and so on. This is because in cycle 1 the state of signal A is dumped, whereas in cycle 2 the state of signal F in cycle 1 is dumped. However, the scan chain (i.e., A and F using shadow flip-flops) holds the state for the same cycle, although different parts were dumped in different cycles. In other words, the signal state of F captured at cycle 1 is dumped in cycle 2.

Footnote 2: Our method uses partial scan. Recent research by Alawadhi et al. [48] has shown that partial scan can be used without incurring additional penalty compared to full scan.

Figure 4-1. Example circuit with both scan and trace signals

As described by [27], the scan chains need not consist of flip-flops that are physically connected. For example, the scan chain here consists of flip-flops A and F, which are connected via flip-flop D, which is not part of the scan chain. In other words, a virtual scan chain can be developed that comprises only the two flip-flops A and F. Although the restoration ratio obtained here is 3.1 (less than with the trace-only method), the number of states restored is 34, which is higher than that obtained earlier (32 in the case of Table 1-1, as seen in Section 1.2). Thus, more signal states provide a more detailed view of the internal state of the circuit.

The primary problem in combining scan and trace is to determine which signals to select for tracing and which ones to incorporate in the scan chain. Trace signals should be chosen such that they comprise the important signals in the circuit that can control significant parts of the circuit. Scan chains, on the other hand, should be distributed around the circuit so that a snapshot of the circuit at a particular clock cycle can be obtained during debug. Since the trace buffer is divided between the trace signals and the scan chains, it is also important to know how this division is done. Clearly, an equal division might not be beneficial, since it would mean less importance to the scan signals and a decrease in the number of scan dumps. Ko et al. [27] explored all combinations of the number of trace signals and the scan dump frequency to obtain a beneficial combination. We have developed an algorithm to select an efficient combination of trace and scan signals to maximize the overall signal restoration.

Table 4-1. Restored signals using trace and scan

Signal   Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
A        0         0         0         0         1
B        1         0         1         0         X
C        1         1         0         1         0
D        X         0         0         0         0
E        X         1         0         0         0
F        1         X         1         0         0
G        X         0         0         0         0
H        X         1         0         0         0

4.2 Trace and Scan Signal Selection

Similar to trace-only debug approaches, both the trace and scan signals are chosen during the design phase of a particular circuit. The states of the trace signals are monitored every cycle, while the scan signals are dumped at certain time intervals in a repeated fashion. We first introduce our debug architecture. Next, we describe our trace and scan signal selection algorithms.
4.2.1 Trace + Scan Debug Architecture

Our trace-scan combined architecture is motivated by the design of Ko et al. [27]. The entire space of the trace buffer is divided into two parts, one for the trace data and the other for the scan dump. The states of trace signals are offloaded into the trace buffer at every clock cycle. The trace buffer width determines the number of scan signals dumped as well as the scan dump frequency. However, since the trace buffer size is constant, the total amount of data that can be stored remains fixed. With an increase in the number of flip-flops in the scan chain, the amount of data produced in each dump increases. As a result, the number of scan dumps has to be decreased in order to maintain the trace buffer constraints. The scan chain is divided into small subchains to allow complete utilization of the total trace buffer width, in the same way as [49]. These are represented as n subchains in Figure 4-2. The partitions are numbered from 1 to n. Each of these n subchains utilizes the trace buffer inputs for dumping. To facilitate the tradeoff between scan and trace data, [27] proposed the introduction of multiplexers in front of the trace buffer inputs. This helps in dynamically reconfiguring the inputs for the trace or the scan signals. In our case, the inputs to the trace buffer are predetermined for a particular circuit. Hence, multiplexers are not needed, which reduces the hardware overhead as well as the delay associated with a dynamic reconfiguration mechanism.

The trace buffer has a width w and depth D. Therefore, the total number of bits that can be stored in the trace buffer is w × D. Here, m of the inputs are dedicated to trace signals, while n sub-scan chains dump their values into the trace buffer. Clearly, w = n + m. Our proposed algorithm comprises two parts. First, we determine which trace signals are beneficial. Next, we determine a profitable set of scan signals and the scan dump frequency.
4.2.2 Trace Signal Selection Algorithm

In this section, we determine the signals that need to be traced during debug. The main problem that we face here is twofold. First of all, the trace signals need to be chosen efficiently in order to incorporate the advantages of using the scan signals during debug. Also, unlike the trace-only approaches ([7] and [5]), the number of signals to be traced is not fixed. Although the maximum number of signals to be traced is equal to the trace buffer width, the actual number of trace signals can be less, to accommodate the scan signals.

Figure 4-2. Proposed Architecture: The width w of the trace buffer is shared by m trace signals and n subchains of the scan chain

We use two terms, connectivity and threshold, in our algorithm. The connectivity of a state element is defined as the number of state elements connected with it through (only) combinational gates in both forward and backward directions, as explained in Section 4.1. The threshold is a minimum limit on the connectivity of a state element for it to be selected for tracing. We now explain these terms using the example in Figure 1-4. The connectivity of the flip-flops can be determined using the circuit diagram. For example, in Figure 1-4, the connectivity of C is 4, since flip-flops A, B, D and E are connected to it. Similarly, the connectivity of flip-flop A is 2, since only C and G are connected to it.

Algorithm 3 outlines the major steps in our trace signal selection algorithm. First we create a graph from the circuit, with each node representing a state element. The edges between the nodes represent the path taken to reach from one state element to the other. This graph construction follows the same methodology described in Figure 3-3. The graph is redrawn in Figure 4-3.

Figure 4-3. Graphical representation of example circuit

Once the graph is constructed, the node with the highest connectivity is selected as the most profitable trace signal. This node and all its adjacent nodes are deleted from the graph. The next node with the highest connectivity is then chosen. If the connectivity of that node is less than the threshold, the computation stops; otherwise the signal selection procedure continues until the trace buffer width is reached.
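The greedy procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function name `select_trace_signals`, the alphabetical tie-breaking rule, and the adjacency list `EXAMPLE_GRAPH` are all hypothetical. In particular, the adjacency list is chosen only to be consistent with the connectivities quoted in the text (C has four neighbours, A has two, and F is adjacent to G and H); the real Figure 4-3 may differ.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # state element -> directly connected state elements


def select_trace_signals(graph: Graph, threshold: float, buffer_width: int) -> List[str]:
    """Greedily pick the best-connected remaining node until a stop condition holds."""
    # Work on a copy so the caller's graph is left untouched.
    remaining: Graph = {node: set(nbrs) for node, nbrs in graph.items()}
    selected: List[str] = []
    while remaining and len(selected) < buffer_width:
        # Highest connectivity first; ties are broken alphabetically (an assumption).
        best = min(remaining, key=lambda n: (-len(remaining[n]), n))
        if len(remaining[best]) < threshold:
            break  # no remaining node is profitable enough to trace
        selected.append(best)
        removed = remaining[best] | {best}
        # Delete the node and its neighbours; connectivities of the
        # surviving nodes are recomputed by dropping the removed nodes.
        remaining = {n: nbrs - removed for n, nbrs in remaining.items() if n not in removed}
    return selected


# Hypothetical adjacency list, consistent with the connectivities quoted in the text.
EXAMPLE_GRAPH: Graph = {
    "A": {"C", "G"},
    "B": {"C", "H"},
    "C": {"A", "B", "D", "E"},
    "D": {"C", "F"},
    "E": {"C", "F"},
    "F": {"D", "E", "G", "H"},
    "G": {"A", "F"},
    "H": {"B", "F"},
}

if __name__ == "__main__":
    print(select_trace_signals(EXAMPLE_GRAPH, threshold=3.2, buffer_width=32))
```

With a threshold of 3.2 (40% of the eight flip-flops), the sketch selects C, removes C together with A, B, D and E, and then stops because F is left with a connectivity of only 2.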
Algorithm 3: Trace signal selection algorithm
Input: Circuit, threshold
Output: List of trace signals S (initially empty)
1: Create a graph GP from the circuit.
while trace buffer is not full do
    2: Find the node with the highest connectivity in GP.
    3: If its connectivity is less than the threshold, return S.
    4: Otherwise, add the new node to the list S.
    5: Delete it and its adjoining nodes from GP.
    6: Re-compute the connectivities of all nodes.
end
return S

Let GP denote the graph model of Figure 4-3. In this example, we use 40% of the total number of flip-flops in the circuit (i.e., 3.2) as the threshold. The node with the highest connectivity is C, with a connectivity of 4, which is more than the threshold. Therefore, C is selected for tracing. Let $R\{C\}$, defined as the relations of C, be the set of nodes connected with C, including C itself, i.e., $R\{C\} = \{A, B, C, D, E\}$. Step 5 of Algorithm 3 recalculates $GP = GP - R\{C\}$. In other words, after deletion of C and its adjoining nodes, the modified GP consists of only three nodes (F, G and H), where F is connected to both G and H. The node with the next highest connectivity in GP is F, with a connectivity of 2. Since this is less than the threshold of 3.2, F is not considered a profitable trace signal. The algorithm returns C as the selected trace signal.

4.2.3 Scan Signal Selection Algorithm

In this section, we describe our proposed algorithm for selection of both the scan chain and the scan dump frequency. The procedure to determine the scan chain is shown in Algorithm 4. First, we create a graph from the circuit, in the same way as described in Section 4.2.2. Once the graph is constructed, all the nodes that are trace signals or are connected to trace signals are removed from the graph. Then, a minimal node set is obtained from the graph. A minimal node set has two requirements. First, it is a group of nodes such that every other node in the graph is connected to at least one node in the set. Second, it should be minimal, i.e., the set should have the least number of nodes. The procedure to obtain the minimal node set is shown in Algorithm 5. The flip-flops corresponding to the nodes in the node set constitute the scan chain. We first describe how the minimal node set is created. Next, we use an illustrative example to describe how the algorithm works.
Algorithm 4: Scan signal selection algorithm
Input: Circuit, already selected trace signals
Output: List of scan signals S (initially empty)
1: Create a graph from the circuit.
2: Remove the nodes related to trace signals and their immediate neighbors.
3: Compute the node values.
4: Find the minimal node set S.
return S

4.2.3.1 Creation of minimal node set

The first step constructs a graph model of the circuit. Once the graph has been created, the minimal node set has to be determined. During the creation of the minimal node set, care must also be taken to ensure that the nodes having higher connectivity are selected for scanning. The algorithm for minimal node set construction is shown in Algorithm 5.

Algorithm 5: Minimal signal set creation
Input: Circuit as graph, node values
Output: Minimal node set S (initially empty)
1: Put all the nodes in a list GPS.
while GPS is not empty do
    2: Find the node with the highest connectivity.
    3: Add the node to S and remove it from GPS.
    4: Remove all nodes associated with that node from GPS, along with their associated edges.
    5: Recompute connectivity values.
end
return S

4.2.3.2 Illustrative example

We now explain each of the steps in the algorithm using the graph in Figure 4-3. Note that since the circuit in Figure 1-4 is small, we have not taken into consideration the effect of nodes that have been selected for tracing; that is, we show the scan signal approach independently of the algorithm described in Section 4.2.2. In other words, we assume that there are no trace signals in this case. As can be seen from Figure 4-3, nodes C and F have the highest connectivity. We choose node C as the initial node. The nodes associated with it (i.e., A, B, D and E) are removed along with their corresponding edges. The node connectivity information is then recomputed. After deletion of C and its adjoining nodes, the modified GP consists of only three nodes (F, G and H), where F is connected to both G and H. In other words, the connectivity values for F, G and H are 2, 1 and 1, respectively. The node with the next highest connectivity is F. Once F is selected and the adjoining nodes (G and H) are
deleted, the graph becomes empty. Therefore, the computation stops. The scan chain obtained comprises the two flip-flops C and F.

The basic idea behind this form of scan cell selection is that each node in the entire circuit is either in the minimal set or connected to at least one node in the set. Therefore, when the scan dumps (of flip-flops in the minimal set) are performed, the signal states of the nodes that are not in the minimal set can be reconstructed based on the scan dumps. For example, in Figure 4-3, if the state of C is dumped in cycle $i$, the states of A and B may be obtained in cycle $i-1$, while the states of D and E may be obtained in cycle $i+1$.

Now we describe how to compute the scan dump frequency for a set of scan signals (scan chain) based on a particular trace buffer size and the number of inputs dedicated to the trace signals. Let the trace buffer depth and width be D and w, respectively. Let m be the number of inputs dedicated to the trace signals. Therefore, the number of trace buffer inputs dedicated to the scan chain per cycle is $w - m$. Let the scan chain length be l. Therefore, the number of cycles it takes to dump the entire scan chain into the trace buffer is $\frac{l}{w-m}$. This determines the scan dump frequency, since the scan chain will be dumped after every $\frac{l}{w-m}$ cycles. Since the depth of the trace buffer is D, the number of scan dumps is $\frac{D(w-m)}{l}$.
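The scan dump arithmetic above can be checked with a short Python sketch. It is illustrative only: the function name `scan_dump_schedule` is hypothetical, and rounding the dump period up and the dump count down when l is not a multiple of w - m is an assumption, since the text only gives the exact-division formulas. The sample numbers combine the width-32, 8-trace-signal, 48-flip-flop example from the chapter introduction with a depth of 1024, which is also an assumption for illustration.

```python
import math
from typing import Tuple


def scan_dump_schedule(width: int, depth: int, trace_inputs: int, chain_length: int) -> Tuple[int, int]:
    """Return (cycles per scan dump, number of scan dumps that fit in the buffer)."""
    scan_inputs = width - trace_inputs  # w - m inputs left for the scan subchains
    if scan_inputs <= 0:
        raise ValueError("no trace buffer inputs are left for the scan chain")
    # l / (w - m), rounded up when the chain does not divide evenly (assumption).
    cycles_per_dump = math.ceil(chain_length / scan_inputs)
    # D (w - m) / l when the division is exact; rounded down otherwise (assumption).
    num_dumps = depth // cycles_per_dump
    return cycles_per_dump, num_dumps


if __name__ == "__main__":
    period, dumps = scan_dump_schedule(width=32, depth=1024, trace_inputs=8, chain_length=48)
    print(f"one scan dump every {period} cycles, {dumps} dumps in total")
```

For w = 32, m = 8 and l = 48, the chain is dumped every 48/24 = 2 cycles; doubling the chain length to 96 halves the number of snapshots that fit in the same buffer, which is the trade-off the selection algorithm has to balance.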
4.3 Experimental Results

Table 4-2 shows the results of the comparison with the existing technique [27]. We implemented both approaches for the 3 largest ISCAS'89 benchmarks. The trace buffer has a width of 32 and a depth of 1024. For our approach, a threshold value of 10% is chosen for selecting the trace signals. To maintain fairness, in our implementation of the method proposed by [27], we used the same number of trace signals as in our approach and drove the inputs of the benchmarks with the same set of random values. The first column in Table 4-2 shows the circuit name. The second and third columns give the number of states restored using our approach and the one proposed by [27]. Finally, the last column gives the improvement, which is the ratio of the number of extra states restored by our approach to the number of states restored using [27]. Our approach performed consistently better than [27] and produced up to 17.3% (see footnote 3) improvement in restoration performance.

Table 4-2. Comparison with existing technique

           Restored states
Circuit    Our approach   Existing technique   % Improvement
s38584     332854         283792               17.3%
s38417     601878         540603               11.3%
s35932     338090         326602               3.5%

Footnote 3: The maximum improvement we obtained is 44%, in the case of s9234. Since [27] did not report it in their paper, we also omitted it.

Figure 4-4. Comparison with Ko et al. and Basu et al.

We now compare our proposed approach with the existing trace-only approaches presented by Ko et al. [5] and Basu et al. [7]. Figure 4-4 shows the comparison of restoration ratio using the 3 largest ISCAS'89 benchmarks. As expected, our proposed approach outperforms the other two techniques for s38417. For the remaining benchmarks, however, our proposed approach outperforms [5] but produces results comparable with [7] (see footnote 4). The main reason is that these benchmarks have a large number of dominating signals, which need to be traced every cycle. Tracing only a few of them and performing scan dumps at regular intervals is not helpful. On the other hand, the number of such dominating signals in s38417 is low, so improved restoration requires high spatial observability. Hence, better performance is obtained with the trace-scan combined approach.
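As a quick arithmetic check on the improvement column of Table 4-2, the definition given above (extra states restored divided by the states restored using [27]) can be evaluated directly. This is only a sketch; the function name `improvement_percent` and the dictionary layout are illustrative choices, and the counts are the restored-state values read from Table 4-2.

```python
# Restored-state counts from Table 4-2: (our approach, existing technique [27]).
restored_states = {
    "s38584": (332854, 283792),
    "s38417": (601878, 540603),
    "s35932": (338090, 326602),
}


def improvement_percent(ours: int, existing: int) -> float:
    """Extra states restored, as a percentage of the existing technique's count."""
    return 100.0 * (ours - existing) / existing


if __name__ == "__main__":
    for circuit, (ours, existing) in restored_states.items():
        print(f"{circuit}: {improvement_percent(ours, existing):.1f}%")
```

Rounded to one decimal place, the three values agree with the percentages reported in the table.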
4.4 Summary

Combining trace (non-scan) and scan signals is a promising approach to enhance signal reconstruction during post-silicon debug. We developed efficient algorithms to select profitable trace signals and scan chains to maximize the restoration ratio. Our experimental results using ISCAS'89 benchmarks demonstrated that our method provides up to 17% higher restoration compared to existing approaches. We observed that it is profitable to select only trace signals if a design has a large number of dominating signals; otherwise, selection of both trace and scan signals is beneficial.

Footnote 4: In these cases, our approach can be considered a trace-only approach, that is, a trace-scan combined approach with zero scan dumps, which selects the best trace signals using the method in [7].

CHAPTER 5
ERROR DETECTION AWARE TRACE SIGNAL SELECTION

Existing trace signal selection techniques are based on the primary objective of maximizing the restoration of untraced signals, and not on the detection of errors in the circuit. Error detection in a circuit can be illustrated using the example circuit in Figure 1-4. Errors in a signal are only propagated along the forward path towards the output. Therefore, only a signal in the fan-out cone of the erroneous signal can be affected by the error. For example, in Figure 1-4, when flip-flop F is in error, tracing any flip-flop in its fan-in cone (A, B, C, D or E) would not help to reconstruct it. Instead, tracing any flip-flop in its fan-out cone (G or H) has a possibility of detecting the error in F. Existing signal selection algorithms rely on both forward and backward restoration (see footnote 1) for reconstructing an untraced signal. Since an error in the fan-out cone of a signal cannot be detected by tracing that signal, forward restoration is not meaningful for error detection. Therefore, restoration ratio, the metric used to measure the efficiency of existing signal selection algorithms, is not appropriate for error detection. Section 1.3 motivates the need for a new metric that can provide an estimate of the percentage of total errors detected in the circuit.

Yang et al. [50] proposed a signal selection algorithm to facilitate early detection of errors. Their approach is focused more on latency and does not deal with the total number of errors detected. Shojaei et al. [51
] developed a technique that is dedicated to the detection of timing errors only. The authors considered how switching activity and power droop can be used to estimate the errors in a portion of the circuit. However, they did not consider functional or logical errors that might be propagating to the traced signals. We have proposed a signal selection algorithm which selects profitable signals for efficient error detection. Our algorithm emphasizes how errors, which propagate from an error origin towards its fan-out cone, can be detected. Compared to the existing signal selection algorithms, our approach is more efficient (up to 2X) in detecting errors across the entire circuit.

Footnote 1: Forward restoration deals with restoration of the output signal state when one or more of the input signal states are known. On the other hand, backward restoration obtains the input signal states when the output signal state is known. This is discussed in detail in Section 1.2.

Algorithm 6: Signal Selection Algorithm
Input: Circuit, trace buffer width
Output: List of signals to be traced, TS
$TS = \emptyset$ /* Initialize to NULL */
1: Create a graphical representation of the circuit.
2: /* Compute Edge Values */
   For each edge $e_{s,d}$ between two nodes s and d, compute the edge value $p_{s,d}$.
3: /* Compute Node Value */
   Create a set for each node i: $S_i = \{(e_{j_1,i}, p_{j_1,i}), (e_{j_2,i}, p_{j_2,i}), \ldots, (e_{j_n,i}, p_{j_n,i})\}$, where $e_{j_k,i}$ represents an edge between nodes $j_k$ and i, $j_k$ is in the fan-in cone of i, and $p_{j_k,i}$ is the value of the edge.
   Value for node i: $v_i = \sum_{k=1}^{n} p_{j_k,i}$, where $n = |S_i|$
4: /* Select Trace Signals */
   while trace buffer is not full do
       Select the node with the largest node value. Let this node be i.
       /* Add the node to the list */
       $TS = TS \cup \{i\}$
       /* Remove Overlaps */
       for each element $t_i$ in $S_i$ do
           for each element $t_l$ in $S_l$ do
               if they have a common source m and $p_{m,i} \ge p_{m,l}$ then
                   $S_l = S_l - (e_{m,l}, p_{m,l})$
               end
           end
       end
   end
Return the list of selected signals, TS.

5.1 Trace Signal Selection for Error Detection

In this section, we propose a signal selection algorithm that is dedicated to error detection in the circuit. Algorithm 6 describes our signal selection algorithm, which has four important steps. The remainder of this section describes each of these steps in detail.
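Steps 3 and 4 of Algorithm 6 can be sketched in Python as below. This is a hedged illustration rather than the dissertation's implementation: edge values (step 2) are taken as given, the function name `select_for_error_detection` and the alphabetical tie-breaking are assumptions, and the edge values in `EXAMPLE_EDGES` are hypothetical numbers, not data from any figure. The overlap test uses "greater than or equal", so equal contributions are also removed.

```python
from typing import Dict, List, Tuple

# (source, destination) -> probability that an error at source reaches destination
EdgeValues = Dict[Tuple[str, str], float]


def select_for_error_detection(edge_values: EdgeValues, buffer_width: int) -> List[str]:
    """Greedy selection by node value, removing overlapping contributions."""
    # Step 3: S[i] maps every fan-in source of node i to its (compound) edge value.
    fan_in: Dict[str, Dict[str, float]] = {}
    for (src, dst), p in edge_values.items():
        fan_in.setdefault(dst, {})[src] = p
        fan_in.setdefault(src, {})  # pure sources still appear as nodes

    selected: List[str] = []
    # Step 4: pick nodes until the trace buffer is full or no node is left.
    while len(selected) < buffer_width and len(selected) < len(fan_in):
        candidates = [n for n in fan_in if n not in selected]
        # Node value = sum of incoming edge values; ties broken alphabetically (assumption).
        best = min(candidates, key=lambda n: (-sum(fan_in[n].values()), n))
        selected.append(best)
        # Remove overlaps: a source already covered at least as well by `best`
        # no longer contributes to the value of other candidates.
        for src, p_best in fan_in[best].items():
            for other in candidates:
                if other != best and src in fan_in[other] and p_best >= fan_in[other][src]:
                    del fan_in[other][src]
    return selected


# Hypothetical edge values for illustration only.
EXAMPLE_EDGES: EdgeValues = {
    ("s", "g"): 0.5, ("f", "g"): 0.5, ("a", "g"): 0.5,
    ("s", "t"): 0.5, ("f", "t"): 0.5,
    ("s", "f"): 1.0, ("a", "f"): 0.25,
}

if __name__ == "__main__":
    print(select_for_error_detection(EXAMPLE_EDGES, buffer_width=2))
```

In this hypothetical example, g is selected first with a node value of 1.5. The contributions of s and f to t are then dropped because g covers them equally well, while the stronger edge from s to f survives, so f becomes the second trace signal.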
5.1.1 Graph-based Modeling of Circuits

Figure 5-1 shows a modified version of Figure 1-4 with labeled signals. Figure 5-2 shows the graphical representation of Figure 5-1. Here, each node corresponds to a signal in the circuit. An edge represents the connectivity/flow between two nodes (signals). For example, the presence of the OR gate between signals a and p is represented as an edge between the corresponding nodes. The edges are shown using directional arrows, which indicate the propagation of an error from a source to its fan-out cone. The flow of an error from one node to another that is not directly connected passes through several intermediate nodes. For example, in Figure 5-2, an error from a to c passes through a, p, c.

Figure 5-1. Example circuit with labeled signals

Figure 5-2. Graphical representation of Figure 5-1

5.1.2 Edge Value Computation

We now describe how to compute individual edge values depending on the type of gate. Next, we explain the computation of compound edge values. A compound edge is one which passes through multiple nodes, that is, a compound edge comprises two or more individual edges.

AND Gate. To compute the probability of error propagation, we consider the example in Figure 5-3A, which shows a multi-input AND gate. The graphical representation is shown in Figure 5-3B. Let the inputs be named $i_1, i_2, \ldots, i_n$ and the output $o_1$. Consider the error propagation from $i_1$ to $o_1$. In order for any error (0/1 or 1/0) to propagate to $o_1$, it is necessary that all the other inputs be tied at 1. If any of the other inputs is at state 0, the error will not propagate: the output will always be at state 0, irrespective of $i_1$ being correct or erroneous, and hence the error goes undetected. Here, we assume all the other inputs to the AND gate are independent. We consider dependent edges later. Therefore, the probability that all of them are 1 simultaneously is the product of the individual probabilities. Let $P1_{i_n}$ be the probability that input $i_n$ is at state 1. Therefore, the probability that all the other inputs are at 1 is
$$\mathrm{Prob}_{i_1} = \prod_{2 \le k \le n} P1_{i_k} \qquad (5\text{-}1)$$

which is the probability that an error at $i_1$ will get propagated to $o_1$. Similar computations can be performed if the gate is a NAND gate.

A) AND gate. B) Graph of AND gate.
Figure 5-3. Example using AND gate

OR Gate. The computation for an OR gate follows an approach similar to that for an AND gate. Consider a multi-input OR gate as shown in Figure 5-4A, and the corresponding graph in Figure 5-4B. As before, let the inputs be named $i_1, i_2, \ldots, i_n$ and the output $o_2$. Consider the error propagation from $i_1$ to $o_2$. Since in an OR gate 0 is the non-dominating input, in order to propagate an error in $i_1$ to $o_2$, all the other inputs must be held at state 0. The probability that an input $i_k$ is held at 0 is $P0_{i_k}$. The joint probability that all the inputs other than $i_1$ are held at 0 is

$$\mathrm{Prob}_{i_1} = \prod_{2 \le k \le n} P0_{i_k} \qquad (5\text{-}2)$$

which is the probability that an error at $i_1$ gets propagated to $o_2$. Similar computations can be performed for any of the n inputs and also for a NOR gate.

A) OR gate. B) Graph of OR gate.
Figure 5-4. Example using OR gate

Flip-flop and NOT Gate. To show how the error propagates through flip-flops and NOT gates, consider the examples in Figure 5-5. Figure 5-5(a) shows a D-type flip-flop whose input is $i_1$ and output is $o_3$. Any error in $i_1$ will be transmitted to $o_3$ in the next cycle. Since there is no other signal dependency between $i_1$ and $o_3$, there is no hindrance to error propagation. Therefore, the probability of error propagation is 1 and hence the value of the edge between the nodes $i_1$ and $o_3$ is 1. Figure 5-5(b) shows a NOT gate whose input is $i_1$ and output is $o_4$. As with the D-type flip-flop, there is just one input, and hence any error in $i_1$ will get propagated to $o_4$. Thus, the edge value between the nodes $i_1$ and $o_4$ is 1.

Figure 5-5. D flip-flop and NOT gate

Figure 5-6 shows the values of the edges in Figure 5-2. For simplicity, we assume that every node has a 50% probability of being at state 0 or 1 (see footnote 2).

Figure 5-6.
Edge values for the graph in Figure 5-2

Footnote 2: In our experiments, we use profiling information to determine the probability of inputs (nodes) staying at a particular state.

Now we discuss how the probability of error transmission changes across multiple edges. To calculate the edge value across multiple gates, that is, the probability of error propagation from one node to the others in its fan-out cone, we have to consider both independent and dependent edges.

Compound Independent Edge. An independent edge is one which passes from one node to another with each internal node along the path visited at most once. We explain the independent edge scenario using the graphical representation in Figure 5-2. The edge from q to s is an independent edge, since there exists only one path from q to s, via d. In other words, $e_{q,s}$ is a traversal of $e_{q,d}$ followed by $e_{d,s}$. Since there is a flip-flop between q and d, the value of $e_{q,d}$ is 1. This is written as $p_{q,d} = 1$. To compute the value of the edge between d and s, note that the other input to s is node e. The value of the edge $e_{d,s}$ is the probability that node e is at state 0, which in this case is 0.5. Therefore, $p_{d,s} = 0.5$. The value of the edge between q and s is the product of $p_{q,d}$ and $p_{d,s}$, that is, $p_{q,s} = 0.5$. This is intuitive because both gates are independent, and hence the final probability is the product of the probabilities that the error propagates through each of them. Thus, for an independent edge, the edge value is the product of the values of all its component edges. In general, if there are n edges $e_1, e_2, \ldots, e_n$ comprising an independent edge e, the value of e is

$$p_e = \prod_{1 \le k \le n} p_{e_k} \qquad (5\text{-}3)$$

Compound Dependent Edge. A dependent edge is one which starts from a node, branches off, and finally recombines to reach another node. As before, we explain the dependent edge computation using the graphical representation in Figure 5-2
. There exist two edges between nodes n and r. One edge is n, b, r while the other is n, b, p, c, r. To compute the value of the compound edge $e_{n,r}$, we need to compute these edge values separately. The value of the edge n, b, r is the product of $p_{n,b}$ and $p_{b,r}$ (this follows from the independent edge value computation). Thus, the value of edge n, b, r, that is, $p_{n,b,r}$, is 0.5. On the other hand, $p_{n,b,p,c,r}$ is 0.25. $p_{n,r}$ is defined as $p_{n,r} = \max(p_{n,b,r}, p_{n,b,p,c,r})$. This is because, when the effects of two different edges are taken into consideration, the edge with the higher error propagation probability always dominates and can thus be taken as the edge value from one node to the other. In general, if there are n edges $e_1, e_2, \ldots, e_n$ between two nodes s and d, then the value of the edge between s and d is $p_{s,d} = \max(p_{e_1}, p_{e_2}, \ldots, p_{e_n})$.

5.1.3 Node Value Computation

We are now ready to compute the node values for Figure 5-6. For each node, we obtain the set of nodes in its fan-in cone and the corresponding edge values. Consider the node p. As can be seen, it falls in the fan-out cone of four nodes: a, b, m and n. The corresponding edge values are computed and the set $S_p$ is obtained as $S_p = \{(e_{m,p}, 0.5), (e_{a,p}, 0.5), (e_{n,p}, 0.5), (e_{b,p}, 0.5)\}$ (see footnote 3). The node value at p is defined as the sum of all these edge values, that is, $v_p = 2$. In this way, the node values are computed for every node in the graph. Figure 5-7 shows the node values for the nodes in Figure 5-6.

Figure 5-7.
Node values for the graph in Figure 5-6

Footnote 3: Each entry in the set is an (edge, value) pair.

5.1.4 Signal Selection

In this section, we describe the final step in our signal selection algorithm. Once the node values are computed, the node with the highest value is selected for tracing, which is g (or h) in Figure 5-7. The subsequent signals should be carefully selected so as to enhance the error detection. We should delete contributions from signals whose errors already have a high probability of being detected by the already traced signals. Step 4 of Algorithm 6 is used for this purpose. For each signal in the circuit, we check how much it contributes to the already selected signal as well as to the others. If its contribution to the selected signal is at least as large as its contribution to some other signal, its contribution to the latter is removed. To illustrate this feature, we refer to Figure 5-8A. The edge values as well as the (edge, value) pair sets are shown for each node. After g is selected for tracing, the node values are recomputed by removing the overlap. As can be seen in Figure 5-8A, $p_{s,t} = 0.5$ and $p_{s,g} = 0.5$. Since the probabilities of an error at s being detected at t and at g are the same, and g has already been selected for tracing, the contribution of s is removed when recalculating the node value of t. By similar arguments, the contribution from f to t, i.e., $(e_{f,t}, 0.5)$, is also deleted from $S_t$. On the other hand, since the value of $e_{s,f}$ is greater than that of $e_{s,g}$, it is not deleted from $S_f$. The recomputed sets for each of the nodes are shown in Figure 5-8B. Since f has the highest node value, it is selected next.

A) Initial sets. B) Recomputed sets.
Figure 5-8.
Signal Selection based on removal of overlap

5.2 Experiments

5.2.1 Experimental Setup

In this section, we discuss the performance of our algorithm using the ISCAS'89 benchmarks. For each of the benchmarks, 50 different error sites are chosen randomly. The error model is described in Section 5.2.2. A complete simulation of 1000 cycles is performed for both the correct and erroneous scenarios. The signals to be traced are monitored at each clock cycle during simulation. Their states are compared with the states obtained from the perfect (error-free) simulation. Any discrepancy is reported as an error. We define Error Detection Ratio (EDR) as a metric for measuring the effectiveness of a signal selection algorithm. EDR is defined as:

$$\mathrm{EDR} = \frac{\text{Number of Errors Detected}}{\text{Number of Detectable Errors}} \qquad (5)$$

In a circuit, not all errors can be detected by the state elements. Some of the errors get suppressed before reaching a state element (flip-flops and primary outputs). Hence, it is important to consider the number of errors that can be detected using state elements and not the total number of errors introduced in the circuit. The second and third columns in Table 5-1 show the number of flip-flops and primary outputs, respectively, for the four ISCAS'89 benchmarks on which we have performed our experiments. The next two columns indicate the number of errors that are detectable using flip-flops or outputs. Note that the total number of detectable errors (last column) is not the sum of the values in the fourth and fifth columns, since the two sets of detectable errors overlap.

Table 5-1.
Detectable Errors for the ISCAS'89 benchmarks

Circuit   #FFs   #outputs   Detectable by FFs   Detectable by outputs   Detectable (total)
s5378     179    49         30                  33                      38
s9234     228    22         26                  2                       26
s13207    669    121        38                  19                      38
s15850    597    87         29                  9                       29

5.2.2 Error Model

We have assumed a periodically recurring error model which tries to represent a real scenario. Initially, we select a set of random nodes as potentially erroneous ones. A random function generator is used for this purpose. We consider each of the errors independently, that is, we consider one error at a time. Once the nodes are selected, a random timestamp is chosen, which should be significantly lower than the trace buffer depth⁴. After each occurrence of this timestamp, the erroneous node is assumed to malfunction. We have chosen a simple bit-flip model for our purpose. When the node is supposed to malfunction, it just flips its correct state. The erroneous state is then propagated along its fan-out cone.

⁴ We have chosen a timestamp of 100 cycles, which is less than the trace buffer depth of 1000 cycles.

5.2.3 Results

In this section, we compare our signal selection performance with the restoration-aware signal selection approach [7]. Though there are other signal selection methods such as [5] and [6], we compare our approach with [7], which provides the best results. We have used the same set of signals used by [7] and compared the EDR using both approaches. The results are shown in Figure 5-9. As can be seen, our method provides up to a twofold improvement compared to [7] when 32 state elements are traced. This is because the scheme proposed in [7] considers both forward and backward restoration when selecting signals for tracing, whereas only forward error propagation (equivalent to backward restoration) is useful for error detection.

Figure 5-9. Comparison with Restoration aware signal selection

In the next set of experiments, we explore how EDR changes when we increase the number of signals (flip-flops and outputs) to be traced. We consider 32, 64 and 128 trace signals and note the changes in error detection performance.
The results are shown in Figure 5-10. For s5378, an EDR of 84% is obtained with a trace buffer width of 32. Further increase of the trace buffer width increases the number of errors detected by a small amount (up to 95%). In the case of s9234, an EDR value of 46% is obtained using a trace buffer width of 32. As expected, an increase in trace buffer width increases the EDR value. Almost 77% EDR can be obtained using a trace buffer width of 128. For s13207, the initial error detection is small (18% EDR with a trace buffer width of 32); however, a sharp increase in error detection performance is obtained when the trace buffer width is increased. In fact, the EDR triples when the trace buffer width is 128. Although s15850 is a large benchmark like s13207, we see a considerably larger EDR when the trace buffer width is 32 (38%). Since there are 673 state elements, a further increase in trace buffer width to 64 and 128 does not reflect a sharp increase in EDR value.

Figure 5-10. Variation of error detection with number of trace signals

Our next experiment is to see how EDR changes when we select only flip-flops for tracing. The results are shown in Figure 5-11. The variation trend is almost the same as that in Figure 5-10, since the flip-flops form a major portion of the state elements. As before, s5378 has most of its errors detected when the trace buffer width is 32, and hence any further increase has minor impact. On the other hand, s9234 and s13207 show a sharp increase in EDR when the trace buffer width is increased.

Figure 5-11.
Variation of error detection with number of flip-flops traced

Finally, we would like to see how the error detection varies with the number of output signals traced. In this case, we trace only the output signals. Since the number of output signals is relatively small, the trace buffer width is varied over 4, 8, 16 and 32. Note that the denominator of the EDR computation uses the values of column 5 in Table 5-1. The results are shown in Figure 5-12. For s5378 and s13207, variation of the trace buffer width produces a sharp increase in the value of EDR. For s9234, the EDR becomes 100% with a width of 4. This is because for s9234, only 2 errors can be detected using all the outputs, and these are detected when only 4 outputs are traced. Except for s13207, which has 121 outputs, a 16-bit trace buffer achieves 80-100% EDR for all other benchmarks.

Figure 5-12. Variation of error detection with number of outputs traced

5.3 Summary

In order to detect an error, the traced signals should lie in the fan-out cone of the erroneous signal. We have proposed an algorithm which takes this fact into account and selects signals with the objective of detecting errors. We have analyzed several cases in which any signal, only flip-flops, or only output signals are used for tracing. Our proposed approach is significantly better (up to 2X) in error detection compared to the state-of-the-art existing signal selection algorithms.
CHAPTER 6
DYNAMIC SIGNAL SELECTION

To improve the observability during post-silicon debug, existing techniques [5, 7, 52] select a small set of profitable signals during design time. The applicability of the existing methods is limited for various reasons. First, these methods treat each component (functional region) of the design as equally important from a debug perspective and therefore select signals that are globally beneficial based on restoration capability. In other words, they assume a uniform spatial and temporal distribution of errors. In reality, certain regions may not be relevant during some cycles of operation for various reasons. For example, a set of cores in a multicore architecture may be in power-saving mode (using clock gating) during a certain time frame. Therefore, no error is possible in those cores during that time frame. Similarly, certain regions (such as a well-verified datapath) are less likely to have errors compared to other control-intensive regions. In general, only a small set of regions may be relevant during a particular time frame for debugging an error. Therefore, a verification engineer would like to have knobs that allow him to trace a different set of signals in different time frames.

Prabhakar et al. [28] proposed an approach to select between two sets of signals in alternate cycles. As a result, it is a very specific case of temporal distribution of errors without any consideration for spatial distribution. A multiplexed signal selection for error detection was proposed by Liu et al. [29]. Their approach is an ad-hoc signal selection heuristic based on an error visibility metric. There is no discussion of how such a selection is beneficial for their targeted debug scenario. In other words, their approach does not consider the challenges associated with dynamic signal selection in the presence of spatial and temporal distribution of errors.

We propose an efficient signal selection algorithm and associated trace controller design that would enable verification engineers to dynamically trace different sets of signals for improved error detection. We propose a region-aware signal selection
algorithm (RSS) that selects profitable signals during design time (static analysis) based on the knowledge of functional regions and associated error zones. We also develop low-overhead dynamic signal tracing (DST) hardware to enable designers to trace different sets of signals during execution based on the active (relevant) functional regions. This lays emphasis on the errors in active zones in the circuit that can be detected using a specifically selected set of trace signals. Although our work might seem similar to traditional test-point insertion and observability analysis [53], it is fundamentally different in two aspects. First, our approach is designed specifically for post-silicon validation and debug. Second, in [53] the observation points are determined by the number of signals in the fan-in cone, and not by the error propagation probability from the signals to the observation points. To the best of our knowledge, this is the first attempt at developing an efficient spatio-temporal solution for dynamic trace signal selection. Our experimental results using both ISCAS'89 benchmarks and Opencores circuits demonstrate that our approach is able to detect up to 3 times more errors compared to existing state-of-the-art techniques.

6.1 Problem Formulation

The goal of this chapter is to develop an efficient dynamic signal selection technique to maximize detection of currently active errors¹ in a circuit. Various industrial studies highlight the fact that error locations are not uniformly distributed across the circuit; instead, they are clustered in multiple small zones. We call them error-prone zones (or error zones, in short). We assume that error zones during post-silicon validation closely resemble those in the pre-silicon phase. We divide the circuit into multiple parts where each part contains one or more error zones. We call these parts functional regions (or regions, in short). A natural boundary for a region would be the component boundary of an SoC. For example, each core in a multicore SoC can form a region.

¹ Those located in active regions, explained later in this section.
If one component has multiple error zones, we may even divide that component into multiple regions following some functional boundary. For example, a processor core can be divided into two regions, one covering the fetch and decode units and the other covering the rest. In our construction, an error zone is completely contained inside a region of the circuit, that is, we assume no overlap of an error zone between multiple regions. There is a trade-off between the number of regions and the number of error zones. One region per error zone may create too many regions (partitions) and lead to unacceptable computational complexity and hardware overhead. On the other hand, a large region with many disjoint error zones may reduce the effectiveness of dynamic signal selection.

Let us consider a circuit represented by the rectangle in Figure 6-1. The entire circuit is divided into m regions named $R_1$ to $R_m$. Each region can have one or more disjoint error zones. For ease of illustration, we assume one error zone per region. This does not lose any generality, since one error zone can be viewed as a composition of multiple disjoint error clusters. For example, the error zone $Z_{R_1}$ for region $R_1$ consists of two disjoint error clusters in Figure 6-1. These two clusters together form the error zone $Z_{R_1}$.

Figure 6-1.
Illustrative example showing regions and error zones

We consider a trace buffer of width n, that is, n signal states can be stored in the trace buffer per cycle. During any particular cycle, some of the functional regions remain active (relevant). A region is considered active if the gates in that region function normally and are not dormant due to power-saving mode (using clock gating) or any other reason. Regions which do not have any signal transition during a certain time frame are considered inactive and hence are not relevant. There are two extreme scenarios. When all the regions are active, our signal tracing algorithm gives proportional emphasis to every region and the associated error zones. However, when only one region is active, beneficial signals from that region need to be traced. We select n signals from each of the m regions, forming a total set of $mn$ signals. During execution, depending on the currently active regions, the n best signals will be chosen out of the $mn$ signals. It must be noted that the n signals from region $R_i$ (where $1 \le i \le m$) used to detect errors in $Z_{R_i}$ can be from anywhere in $R_i$ (inside as well as outside of $Z_{R_i}$). Our region-based signal selection algorithm (RSS) in Section 6.2 describes how these signals are selected, while an efficient hardware implementation for dynamic signal tracing (DST) is described in Section 6.3.

6.2 Region-based Signal Selection (RSS)

Algorithm 7 describes our region-based signal selection algorithm (RSS) for selecting profitable signals during design time. The first step creates a graph-based model of the circuit. Next, for each region it computes the error propagation probability (defined in Section 6.2.2) from each node in the error zone to the other nodes in the entire region. Finally, for each region the n most profitable signals are selected. The remainder of this section describes these steps in detail.
6.2.1 Graph Based Modeling of Circuits

The first step of Algorithm 7 is to construct a graphical representation of the circuit. We explain this step using our example circuit in Figure 1-5. We redraw the circuit in Figure 6-2, where the 2 regions of the circuit are shown clearly. The graphical representation is shown in Figure 6-3. Each signal in the circuit is represented by a node, and any data flow between two nodes is represented by an edge. An edge is drawn irrespective of the type of gate between the two nodes. For example, flip-flops C and H are connected by a NOT gate; hence, the two nodes representing them have an edge connecting them. Directed arrows signify the error propagation direction. $R_1$ and $R_2$ represent two different functional regions of the circuit with respective error zones $Z_{R_1}$ and $Z_{R_2}$.

Algorithm 7: Region-based Signal Selection (RSS)
Input: Circuit, trace buffer width n, error zones $Z_1, \ldots, Z_m$
Output: m lists of selected signals, $SS_1, \ldots, SS_m$
$SS_i = \emptyset$ /* Initialize all lists to NULL */
1: Create a graphical representation of the circuit. Divide the circuit into m regions; $R_i$ contains the error zone $Z_i$.
2: /* Compute error propagation probability for each region $R_i$ */
   For each node s in $Z_i$, compute the probability of an error at s getting propagated to any node d in region $R_i$.
4: /* Select n trace signals for each region $R_i$ */
   while $SS_i$ does not have n signals and $R_i$ is not empty do
     4.1 For each node d in $R_i$, compute the summation of the error propagation probabilities from each node s in $Z_i$. This is the total error detection probability (EDP) at node d.
     4.2 Select the node j with the largest EDP value.
     4.3 Add the node to the list: $SS_i = SS_i \cup \{j\}$
     4.4 Remove node j and its overlap from $R_i$
   end
Return the lists ($SS_1, \ldots, SS_m$) with selected signals

Let us consider the region $R_1$. The probable sources of error are the nodes A and B. Any error in these nodes can propagate to the other nodes in $R_1$ which are in their respective fan-out cones. Therefore, an error at A can propagate to F, D, E and G. We would like to compute the probability of an error at either of these two potentially erroneous nodes (A and B) propagating to the other nodes. We call this probability the error propagation probability, as described next.
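As a rough illustration of the graph model in this subsection, the following Python sketch stores the circuit as a list of directed edges and collects the fan-out cone of an error-zone node inside its region. Only edges mentioned in the text are included: the edge from B to F is inferred from the later remark that F can detect errors in B, and the membership of $R_1$ and the helper name `fan_out_cone` are assumptions made for this example. The rest of Figure 6-3 is not reproduced.

```python
from collections import deque

# Directed edges (driver -> driven signal). Only edges named in the text are
# listed; this is a partial, illustrative copy of Figure 6-3.
EDGES = [
    ("A", "D"), ("L", "D"),   # OR gate with inputs A and L drives D
    ("D", "E"),               # 2-input AND gate between D and E
    ("A", "F"), ("B", "F"),   # B -> F is inferred, not stated as an edge
    ("F", "G"), ("E", "G"),
    ("C", "H"),               # NOT gate between flip-flops C and H
]

# Assumed membership of region R1 (hypothetical, based on the discussion).
R1 = {"A", "B", "D", "E", "F", "G", "L"}


def fan_out_cone(edges, source, region):
    """Return the nodes of `region` that an error at `source` can reach."""
    successors = {}
    for driver, driven in edges:
        successors.setdefault(driver, []).append(driven)

    reached, queue = set(), deque([source])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, []):
            if nxt in region and nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached


print(sorted(fan_out_cone(EDGES, "A", R1)))  # ['D', 'E', 'F', 'G']
```

The result for A agrees with the statement that an error at A can propagate to F, D, E and G.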
6.2.2 Error Propagation Probability Computation

We first describe how to compute the error propagation probability through single gates. Error propagation probability is defined as the probability of an error present at an input of a gate being propagated to its output. Error propagation probability over multiple gates will be explained later. We consider each of the single gates and compute the probability of an error at one of the inputs getting propagated to the output.

Figure 6-2. Example circuit with 2 regions and 12 flip-flops

Individual Gates. To compute the probability of error propagation, we first consider a multi-input AND gate in Figure 6-4A. Let the inputs be named $i_1, i_2, \ldots, i_n$ and the output $o_1$. Let us assume an error occurs at one of the inputs, say, $i_1$. We want to compute the probability of the error being propagated to $o_1$. In order for any error (0/1 or 1/0) to propagate to $o_1$, it is necessary that all the other inputs of the AND gate be held at 1. If any of the other inputs is at state 0, the output will always be at state 0, irrespective of $i_1$; hence, the error goes undetected. Here, we assume all the other inputs to the AND gate are independent. Therefore, the probability that all of them are 1 simultaneously is the product of the individual probabilities. Let $p^1_{i_k}$ be the probability that input $i_k$ is at state 1. Therefore, the probability that all the other inputs are at 1 is

$$P^1_{i_1} = \prod_{2 \le k \le n} p^1_{i_k}$$

which is the probability that an error at $i_1$ will get propagated to $o_1$, that is, the error propagation probability through the AND gate. Similar computations can be performed for a NAND gate.

Figure 6-3. Graphical representation of Figure 1-5 with two regions

A: AND gate. B: OR gate.
Figure 6-4.
Examples using AND and OR gates

The computation for an OR gate follows an approach similar to that for an AND gate. A multi-input OR gate is shown in Figure 6-4B. Let us consider the error propagation from $i_1$ to $o_2$. In order to propagate an error in $i_1$ to $o_2$, all the other inputs of the OR gate must be held at a state of 0. The probability that an input $i_k$ is held at 0 is $p^0_{i_k}$. The joint probability that all the inputs other than $i_1$ are held at 0 is

$$P^0_{i_1} = \prod_{2 \le k \le n} p^0_{i_k}$$

which is the error propagation probability from $i_1$ to $o_2$. Similar computations can be performed for a NOR gate. For any element with one input and one output (such as a flip-flop or a NOT gate), the error propagation probability is always 1.

Now we discuss how the probability of error propagation changes across multiple gates. Since more than one gate is involved, we need to consider both independent and dependent paths. A path is defined as the series of logic gates which are placed between the source (s) and destination (d) nodes. In other words, it signifies the route traversed by a potential error at node s to reach node d.

Independent Paths through Multiple Gates. An independent path is one which passes across a set of logic gates with each gate being visited at most once. We explain the independent path scenario using Figure 6-3. To keep things simple, we assume each internal signal is in a state of 0 or 1 with a probability of 50%². The edge from A to E is an independent path, since there exists only one path from A to E, via D. Since there is only an OR gate between A and D, the probability of an error at A getting propagated to D is the probability of L (the other input to the OR gate) being in a state of 0, which is 0.5 in this case. Hence, the error propagation probability between A and D is 0.5. Similarly, since there is only a 2-input AND gate between D and E, the error propagation probability between D and E is 0.5. Since none of the signals is visited more than once, the overall error propagation probability between A and E is the product of these two, which is 0.25. In general, if there are n + 1 signals in an independent path between nodes s and d, with their intermediate error propagation probabilities being $p_1, p_2, \ldots, p_n$, the overall error propagation probability across the path is

$$P(s, d) = \prod_{1 \le k \le n} p_k$$
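The single-gate and independent-path formulas above can be checked with a few lines of Python. This is only a sketch: the function names are made up for this example, and the 0.5 state probabilities are the simplifying assumption used in the text rather than profiled values.

```python
from math import prod


def and_gate_propagation(p_one_other_inputs):
    """Error on one AND/NAND input propagates only if every other input is 1."""
    return prod(p_one_other_inputs)


def or_gate_propagation(p_zero_other_inputs):
    """Error on one OR/NOR input propagates only if every other input is 0."""
    return prod(p_zero_other_inputs)


def independent_path_propagation(stage_probabilities):
    """Product of the per-gate propagation probabilities along one path."""
    return prod(stage_probabilities)


# Worked example from the text: every signal is 0 or 1 with probability 0.5.
p_a_d = or_gate_propagation([0.5])    # A -> D: the other OR input L must be 0
p_d_e = and_gate_propagation([0.5])   # D -> E: the other AND input must be 1
p_a_e = independent_path_propagation([p_a_d, p_d_e])
print(p_a_d, p_d_e, p_a_e)            # 0.5 0.5 0.25

# A NOT gate or flip-flop has no other inputs, so the empty product gives 1.
print(and_gate_propagation([]))       # 1
```

The empty-product case matches the rule that one-input, one-output elements always propagate an error.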
² This assumption is for explanation purposes only; in real experiments, we gather profiling information to determine the state probability. It is explained in detail in Section 6.4.

Dependent Paths through Multiple Gates. A dependent path is one in which, while moving from a source node to a destination node, at least one of the internal nodes is visited more than once. We explain the error propagation probability computation using Figure 6-3. There exist two independent paths between nodes A and G. One path is (A, F, G) while the other is (A, D, E, G), both branching out at A and recombining at G. In order to compute the error propagation probability across the path between A and G, we need to compute these independent path values separately. For the path (A, F, G), the error propagation probability is the product of the probabilities of the paths (A, F) and (F, G), both of which are 0.5 under the 50% state assumption. Thus, the error propagation probability of path (A, F, G) is 0.25. On the other hand, since the path (A, D, E, G) passes through 3 independent two-input gates, its error propagation probability is 0.125. The error propagation probability through path (A, G) can be computed as $p(A,G) = \max(p(A,F,G), p(A,D,E,G))$. This is because, during computation, the effects of the two different paths are already taken into account, and the path with the higher probability of detecting an error always dominates. In general, if there are n independent paths $e_1, e_2, \ldots, e_n$ between two nodes s and d, then the error propagation probability of the paths between s and d is $p(s,d) = \max(p_{e_1}, p_{e_2}, \ldots, p_{e_n})$.

6.2.3 Signal Selection Based on Node Values

In this section, we describe the final step in our signal selection algorithm. The first node chosen for tracing in a region is the node with the highest value. The value of a node is the sum of the error propagation probabilities of all paths in which the node is the destination. For example, in Figure 6-5
, if we concentrate on region $R_1$, the node value of E will depend on paths (A, D, E), (L, D, E) and (D, E). Since in this example only A and B are possible error locations for $R_1$, the only relevant path is (A, D, E); the paths (L, D, E) and (D, E) are not relevant because D and L are not in the error zone. Therefore, the node value of E is the error propagation probability across the path (A, D, E), that is, 0.25. We can have similar computations for other regions. The node values of all the nodes in $R_1$ are shown in Figure 6-5. Each node value is represented by a number beside it.

Figure 6-5. Node values for region $R_1$ in Figure 6-3

In Figure 6-5, three nodes (A, B and F) have the highest node value of 1. A and B are not valid choices since neither of them can detect an error in the other node, whereas F can detect an error in both A and B with 50% probability. Therefore, we choose F as the first node to trace. The subsequent signals should be carefully selected to enhance the error detection in the region. Contributions from signals whose errors have a high detection probability at the already selected signals should be deleted. Step 4.4 of Algorithm 7 is used for this purpose. The basic idea is that if an already selected node (e.g., F) can detect an error (e.g., in A) with equal or higher probability than another node (e.g., D), then the overlap from the other node should be removed. For example, since F can detect an error in A with 50% probability, the contribution claimed by D (also 50% for A) should be deleted from D during the next iteration. This process continues until the n best signals are selected for each region or there are no more signals to be selected.
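The selection loop of this section can be sketched in Python as follows. This is one possible reading of steps 4.1 to 4.4, not the authors' implementation. The contribution table uses the probabilities quoted in the text; the value 0.25 for the path from B to G, the self-contributions of 1.0 for A and B (chosen so that their node values equal 1 as in Figure 6-5), and the tie-break that prefers the node covering more error-zone sources are assumptions made for this example.

```python
def rss_select(contributions, n):
    """Greedy selection: contributions maps node -> {error source: probability}."""
    pool = {node: dict(c) for node, c in contributions.items()}  # working copy
    selected = []
    while pool and n > len(selected):
        # Steps 4.1-4.2: largest EDP; ties go to the node covering more sources.
        best = max(sorted(pool),
                   key=lambda d: (sum(pool[d].values()), len(pool[d])))
        covered = pool.pop(best)          # steps 4.3 and 4.4: take node j out
        selected.append(best)
        # Step 4.4: drop contributions already matched or beaten by `best`.
        for other in pool.values():
            for source, p_best in covered.items():
                if source in other and p_best >= other[source]:
                    del other[source]
    return selected


# Region R1 of the running example (error zone = {A, B}).
R1_CONTRIBUTIONS = {
    "A": {"A": 1.0},
    "B": {"B": 1.0},
    "F": {"A": 0.5, "B": 0.5},
    "D": {"A": 0.5},
    "E": {"A": 0.25},
    "G": {"A": 0.25, "B": 0.25},   # p(B, G) = 0.25 is assumed
}

print(rss_select(R1_CONTRIBUTIONS, 2))  # ['F', 'A']
```

With these numbers F is chosen first, as in the text, and the contributions of D, E and G for errors at A and B are removed before the second pick.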
6.3 Dynamic Signal Tracing (DST)

Algorithm 8 describes our dynamic signal tracing (DST) procedure for improved error detection. The inputs to the algorithm are the chip design, trace buffer size, active regions and associated relevance and signal lists. The relevance of a region indicates how important the region is in error detection, or the possibility of finding an error in that region compared to other regions. The relevance information is provided by the pre-silicon verification engineer based on the percentage of errors found in the error zone in that region (compared to other error zones) during pre-silicon validation. If no information is available, we can consider the size of the error zone in that region as its relevance. If the trace buffer size is n, and there are m regions in the circuit, during design time our RSS procedure (Algorithm 7) will select $mn$ signals. During execution, our DST procedure needs to choose, from these $mn$ signals, the n signals that are most profitable during a certain period depending on the k ($1 \le k \le m$) active regions.

Algorithm 8: Dynamic Signal Tracing (DST)
Input: Circuit, trace buffer size n, k active regions ($R_i$), and respective relevance ($r_i$) and selected signal lists ($SS_i$)
Output: List of n signals to be traced, TS
$TS = \emptyset$ /* Initialize to NULL */
1: Here, $r_i$ denotes the relevance of region $R_i$, and $SS_i$ is the list of the n most profitable signals selected for region $R_i$, where $1 \le i \le m$. Let $r = \sum_{i=1}^{m} r_i$
2: Find the contribution from $R_i$, $C_i = \frac{n \, r_i}{r}$
3: Select the best $C_i$ signals from $SS_i$
4: Put the selected signals in TS.
5: Repeat steps 3-4 for all k regions, $1 \le i \le k$.
Return the selected signals TS

Since we have to select n out of $mn$ signals for tracing, it is reasonable to adopt n multiplexers, each of which will provide one of the signals stored in the trace buffer. The main problem is to divide the $mn$ signals among the n multiplexers so that all possible combinations of trace signals can be achieved. An obvious but expensive solution would be to use n multiplexers, each having all the $mn$ signals as inputs and 1 output. One optimization can be achieved by the following observation. Let us consider a circuit with 4 regions, $R_A$, $R_B$, $R_C$, $R_D$. Suppose the signals responsible for detecting errors in region $R_A$ are named $A_1, A_2, \ldots, A_n$, in order of priority. If signal $A_1$ is not selected for tracing, the subsequent signals, that is, $A_2, A_3, \ldots, A_n$, will not be selected for tracing. Thus, it is not necessary to keep these signals as inputs of the same multiplexer as $A_1$. The number of initial signals selected from each region to feed into the n multiplexers is $\frac{n}{m}$. A total of n signals will fill in the first stage of the multiplexers. Now, the number of signals remaining for each region is given by

$$n_{remain} = n - \frac{n}{m}$$

Under each multiplexer, all these signals except the ones from the same region will be stored. For example, if multiplexer 1 has signal $A_1$, the signals $A_2$, $A_3$ and $A_4$ need not be part of its input. Therefore, the size of each multiplexer is $size \times 1$, where

$$size = 1 + (m - 1)\left(n - \frac{n}{m}\right)$$

Thus, the savings obtained is $\frac{mn}{size}$, which reduces to

$$\frac{mn}{1 + mn - 2n + \frac{n}{m}}$$

For this example, m = 4 and n = 4; therefore, the value of size is 10. Our initial multiplexer design was of size $(mn) \times 1$, which in this case means $16 \times 1$. Thus, we could reduce the multiplexer size by a factor of $\frac{16}{10}$, that is, 1.6. Table 6-1 shows the configuration of each multiplexer, indicating the signals entering each of them.

Table 6-1. Selected Signals for each MUX for n = 4 and m = 4

Mux name   Input Signals for each MUX
MUX1       A1, B2, B3, B4, C2, C3, C4, D2, D3, D4
MUX2       B1, A2, A3, A4, C2, C3, C4, D2, D3, D4
MUX3       C1, A2, A3, A4, B2, B3, B4, D2, D3, D4
MUX4       D1, A2, A3, A4, B2, B3, B4, C2, C3, C4

Table 6-2.
Table for n = 2 and m = 2

Current State      Selected
R_A    R_B         Signals
0      1           (B0, B1)
1      0           (A0, A1)
1      1           (A0, B0)

Now, we would like to explore the design for our dynamic signal tracing algorithm. The total possible number of states is $2^m - 1$, since at least one of the m regions will be active at a time. This is independent of n, that is, the trace buffer width. However, each of the states will be defined by n signals, signifying the n signals to be traced at that time. Table 6-2 shows a simple example controller illustrating different signal selections depending on the state of the currently active regions when m = 2 and n = 2. Let the two regions be $R_A$ and $R_B$, and let the two signals selected from each region be $A_0, A_1$ and $B_0, B_1$, respectively. At any point, only two of the signals are chosen for tracing. When only region $R_A$ is active, the signals to be traced are the two signals from region $R_A$, indicated by $A_0, A_1$. Similarly, when only region $R_B$ is active, the two signals to be traced are $B_0, B_1$. When both regions are active, the trace signals to be selected are $A_0, B_0$.

Figure 6-6. Datapath and controller design for m = 3 and n = 3

Similarly, when m = 3 and n = 3, it is evident that there will be 7 different states for this configuration. Let the 3 error regions be $R_A$, $R_B$ and $R_C$. Signals in $R_A$ are named $A_0, A_1, A_2$, and similarly for all other regions. Figure 6-6 shows the controller and datapath design for such a configuration. The overall structure of our proposed design is shown in Figure 6-7. Here we consider a design with n multiplexers that would produce n trace signals. The outputs of the multiplexers are fed to a trace buffer. The trace controller provides the control signals to the multiplexers based on the logic mentioned above. The trace controller operates under the same clock as the Design Under Test (DUT). An external knob is applied to the trace controller (generally by the validation engineer), which contains information on the currently active error zones in the circuit.

Figure 6-7. Proposed Design
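The behaviour described in this section can be mimicked in software with the short Python sketch below. It is an illustration under stated assumptions, not the hardware design: the relevance total is taken over the currently active regions only (this is what reproduces Table 6-2, where a single active region receives all n slots), fractional shares are rounded down, `mux_inputs` assumes that m divides n, and all function and variable names are invented for this example.

```python
from itertools import product


def dst_select(n, relevance, signal_lists, active):
    """Pick the signals to trace for the given list of active regions."""
    total = sum(relevance[region] for region in active)
    traced = []
    for region in active:
        share = int(n * relevance[region] / total)   # C_i = n * r_i / r, rounded down
        traced.extend(signal_lists[region][:share])  # best C_i signals of SS_i
    return traced


def mux_inputs(m, n):
    """Inputs per multiplexer: 1 + (m - 1)(n - n/m), assuming m divides n."""
    return 1 + (m - 1) * (n - n // m)


def controller_table(n, relevance, signal_lists):
    """Map each of the 2^m - 1 non-empty activity patterns to its traced signals."""
    regions = list(signal_lists)
    table = {}
    for pattern in product([0, 1], repeat=len(regions)):
        active = [r for r, bit in zip(regions, pattern) if bit]
        if active:
            table[pattern] = dst_select(n, relevance, signal_lists, active)
    return table


# Reproduce Table 6-2 (m = 2, n = 2, equal relevance assumed).
SIGNALS = {"RA": ["A0", "A1"], "RB": ["B0", "B1"]}
RELEVANCE = {"RA": 1, "RB": 1}
TABLE = controller_table(2, RELEVANCE, SIGNALS)
for pattern, traced in TABLE.items():
    print(pattern, traced)
# (0, 1) ['B0', 'B1']
# (1, 0) ['A0', 'A1']
# (1, 1) ['A0', 'B0']

print(mux_inputs(4, 4))   # 10, versus 16 inputs for the unoptimized design
```

For m = 3 the same enumeration yields 7 states, matching the count given for Figure 6-6.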
6.4 Experiments

6.4.1 Experimental Setup

We verified the effectiveness of our region-based signal selection (RSS) and dynamic signal tracing (DST) algorithms using some of the largest ISCAS'89 benchmarks as well as Opencores circuits. In each of the subsequent experiments, we consider a number of regions with each region having one error zone. Each error zone comprises about 5% of the respective region. We inserted 50 random errors in the error zones of the active regions, with the error density proportional to the region size. We assume a simple bit-flip model for errors, that is, at a particular cycle, the erroneous signal will just flip its state. Profiling information is obtained by running an ideal simulation of 1000 cycles with random input vectors and noting the percentage of cycles each signal spends in each state. We perform two simulations, one for the ideal case, when all the signals are assumed to be error free, and one with the erroneous signals included. It should be noted that we consider the errors individually, in order to prevent each error's effect from suppressing another. The error model is assumed to be sporadic, that is, errors do not occur every cycle, but after certain intervals. In our case, we assume the errors to be manifested after a gap of 100 cycles. Each simulation runs for a total of 1000 cycles, that is, a total of 50000 cycles for the 50 errors. Any discrepancy in the traced signal states is reported as an error. The metric used to measure error detection performance is Error Detection Ratio (EDR), as defined in Equation 5. We have applied our algorithms using a wide variety of total regions (m) and active regions (k, $k \le m$). In this section, we summarize the results for three scenarios (each having several subcases): 2 regions (both active and only one active), 3 regions (all active, two active, and only one active), 4 regions (all active, three active, two active, and only one active). In each of these subcases, we present the average of all possible scenarios. For example, in the case of k = 1 and m = 2, the results show the average of two possible scenarios: $R_1$ is active or $R_2$ is active. We compare the following three approaches:
GSS. This approach represents the existing techniques that focus on global signal selection (GSS) without any knowledge of error zones or active regions, and assumes that the errors are uniformly distributed across the circuit. The signals are selected using an approach similar to [7]³, the only difference being that we have considered error detection and not restoration during signal selection.

Figure 6-8. GSS

EZ-GSS. We extend the existing methods with the knowledge of the error zones to evaluate their effectiveness in handling error zones. We call this approach error-zone aware global signal selection (EZ-GSS). This is a static signal selection assuming all zones are active.

Figure 6-9. EZ-GSS

RSS+DST. Our approach is essentially a combination of region-aware signal selection (RSS) and dynamic signal tracing (DST).

Figure 6-10. RSS+DST

³ Although there are other similar techniques such as [6, 24, 52] which can be considered GSS, none of them considers error detection (they focus instead on restoration ratio). Moreover, these approaches do not take into account the presence of error zones and which of them are currently active. Therefore, none of these approaches is expected to perform significantly better. We have chosen one ([7]) of these approaches as a representative of GSS.

6.4.2 Results for Two Regions

For each of our experimental circuits, we created two regions, each having one error zone. In the first set of experiments, we assume both zones are active. In this case, EZ-GSS and RSS+DST are the same, since we have to consider both zones for signal selection even during DST. The results are shown in Figure 6-11. As expected, our approach performs better than GSS, with the maximum improvement being 1.75 times, since our approach lays more emphasis on the error zones.

Figure 6-11.
Comparison of EDR performance when both regions are active

In the next experiment, we assume that one of the two error zones is active at a particular time. Now, we would like to compare the EDR performance of our three approaches, GSS, EZ-GSS and RSS+DST. The results are shown in Figure 6-12. GSS performs the worst among the three since it has no knowledge of where the error is located or which region is active. EZ-GSS performs better than GSS but has no knowledge of active regions. RSS+DST performs the best since it dynamically selects signals with complete knowledge of the currently active error zones. The maximum improvement obtained by our approach over GSS is almost 3 times.

Figure 6-12. Comparison of EDR performance when only one region is active

We would now like to observe the performance of our approach on some real circuits obtained from the Opencores [47] website. We choose three circuits for our purpose, namely RS232 Uart, OPB Onewire and i2c slave. These will be referred to as uart, one and slave, respectively, in the following discussion. We synthesized these using Synopsys Design Compiler to obtain the gate-level netlist from the RTL descriptions. For each of these circuits, we consider two error regions of which one is active at a time. The results are shown in Figure 6-13. As expected, for all three benchmarks, our proposed methods EZ-GSS and RSS+DST perform much better than GSS. RSS+DST performs best in all cases; however, for one, the performance of EZ-GSS and RSS+DST is similar. This is because, of all the signals selected using RSS+DST, the ones which can detect most of the errors are selected using EZ-GSS as well.

Figure 6-13. Comparison of EDR performance on the Opencores circuits

6.4.3 Results for Three Regions

In these experiments, we created three regions for each circuit. In the first experiment, we assume only one of the three regions is active. The EDR performance comparison is shown in Figure 6-14. The RSS+DST numbers are the average of the three possible scenarios of one active region in the circuit. Up to 3 times improvement is obtained by our approach compared to GSS, while compared to EZ-GSS, RSS+DST has a maximum improvement of 1.56 times.

Figure 6-14.
Comparison of EDR performance when one region is active

In the next set of experiments, we compare the approaches when two of the three regions are active. The results are shown in Figure 6-15. The RSS+DST numbers are the average of the three possible scenarios of two active regions in the circuit. RSS+DST performs the best among the three approaches, with a maximum improvement of 2 times (compared to GSS) and 1.3 times (compared to EZ-GSS).

Figure 6-15. Comparison of EDR performance when two regions are active

6.4.4 Results for Four Regions

In this case, we create four regions in each circuit. In the first experiment, we would like to compare the approaches when one of the four zones is active. The results are shown in Figure 6-16. As expected, RSS+DST performs best, with a maximum improvement of 3.2 times compared to GSS.

Figure 6-16. Comparison of EDR performance when one region is active

In the next experiment, we observe the EDR performance when 2 of the 4 zones are active. The results are shown in Figure 6-17. RSS+DST performs up to 2 times better in error detection compared to GSS and 1.4 times compared to EZ-GSS.

Figure 6-17. Comparison of EDR performance when two regions are active

Finally, we assume that three out of the four zones are active and observe the error detection performance. The results, shown in Figure 6-18, reveal that RSS+DST achieves better error detection than either of the other two approaches. Note that the improvement is not as significant as in the other scenarios. This is because, when more zones are active, GSS and EZ-GSS deliver results closer to those of RSS+DST than when only a few regions are active.

Figure 6-18.
Comparison of EDR performance when 3 regions are active

6.4.5 Hardware Overhead

We have developed a Verilog module that is parameterizable for m regions per circuit and n trace signals. We have synthesized both our controller (which generates the select signals for the MUXes) and the datapath (MUX structure) described in Section 6.3 using Synopsys Design Compiler with the lsi_10k technology library. The controller area corresponding to n = 32 and m = 4 (a reasonably realistic scenario) is 239. The corresponding datapath area consisting of 32 multiplexers is 185. Therefore, the total area for our design is 239 + 185 = 424. The trace buffer, which is an integral part of the post-silicon debug methodology, would occupy much more area compared to the controller. A typical trace buffer of 32 × 1024 bits, when synthesized using the same library, is found to occupy an area of almost 60000, which is about 141 times the total area of our design (424). We believe that the trace controller has acceptable (negligible) area overhead considering that our approach can detect up to 3 times more errors compared to state-of-the-art existing methods.

6.5 Summary

Existing trace signal selection techniques assume that errors are uniformly distributed across the circuit. This assumption may not be valid in many practical scenarios. During design time, our region-aware signal selection approach selects beneficial signals for each region based on information regarding error zones. During execution, our dynamic signal tracing controller enables the designer to trace a different set of signals based on the regions that are relevant (active) during a certain period. Our experimental results demonstrate that our approach can detect significantly more (up to 3 times) errors compared to existing approaches.
CHAPTER 7
TRACE DATA COMPRESSION USING STATICALLY SELECTED DICTIONARY

During post-silicon debug, the traced signal states are stored in an on-chip trace buffer. The trace buffer size dictates the amount of data that can be stored, and hence directly affects the observability of the design. Since the trace buffer is used only for debugging, it is better to keep its size as small as possible to reduce the overall cost, area and energy requirements. An option to enhance the observability without compromising on the debug overhead is to compress the trace data before storing it in the trace buffer. We have proposed a lossless dictionary-based width compression scheme that operates in real time to compress the trace data. Unlike [16], our method chooses the dictionary offline, which provides better compression performance as well as a large reduction in compression architecture overhead. Three different compression algorithms have been proposed to trade off between compression performance and architecture overhead. We have used Compression Ratio, defined in Equation 7, as a metric to measure the efficiency of a compression algorithm. A higher compression ratio implies better compression.

$$\text{Compression Ratio} = \frac{\text{Uncompressed Data Size}}{\text{Compressed Data Size}} \qquad (7)$$

7.1 Trace Data Compression

The existing compression techniques compress the trace data by selecting a dictionary dynamically during execution. This not only results in inferior compression performance (due to non-optimal dictionary selection), but also increases the architecture overhead. This section describes our trace data compression techniques. The overview is shown in Figure 7-1.

Figure 7-1. Overview of our trace compression procedure

Our approach is based on an important observation. In any post-silicon debug environment, after the trace data is collected from the chip, it is validated by checking it against a set of ideal trace data obtained from a golden model. Since very few (2-5%) bugs actually remain to be tracked during the post-silicon debug phase, only a few cycles produce erroneous values [14, 15] that differ from the ideal ones. We utilize this information to design our approach. Since the difference between the ideal and the actual trace data is very small, the same dictionary applicable for ideal trace data compression can be reused for compression of the actual trace data. This takes care of the two problems by providing better compression performance and reducing the architecture overhead as well (there is no need to implement a dynamic dictionary selection algorithm). The compressed data are then read out through a channel to a debugger, where they are checked against the ideal trace data. Any discrepancy in the trace data is reported as an error. As can be seen from our analysis in Section 7.1.3, the introduction of 2-5% error in the trace data results in a 2-6% penalty in compression performance, which is acceptable. As the discussion in Section 7.2 shows, even with the introduction of errors, our technique incurs a smaller compression penalty than the existing trace compression methods [16]. The remainder of this section describes our dictionary selection algorithms and also performs a theoretical analysis of the maximum penalty possible when the dictionary from the ideal trace data is used to compress the actual (potentially erroneous) trace data.

7.1.1 Dictionary Selection Algorithms

We have explored three compression algorithms for compression of the trace data, namely Dictionary-based compression (DC), Bitmask-based compression (BMC) and fixed Dictionary MBSTW (fMBSTW) based compression. All three techniques use a dictionary for compression. The dictionary selection is vital since the dictionary would be reused to compress the actual trace data. We will now describe how the dictionaries are selected in order to achieve the maximum compression performance.
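The dictionary reuse argument of Section 7.1 can be illustrated with a small sketch. It assumes a toy cost model that is not taken from this dissertation: every word is stored as one flag bit followed by either a dictionary index (on a match) or the raw word (on a miss). The function names, the 4-bit words and the sample traces are all hypothetical. The dictionary is simply the most frequent words of the golden trace, and the same dictionary is then applied to a trace containing one erroneous word.

```python
from collections import Counter
from math import ceil, log2


def select_dictionary(words, n_entries):
    """Pick the n_entries most frequent words of the (golden) trace."""
    return [word for word, _ in Counter(words).most_common(n_entries)]


def compressed_bits(words, dictionary, word_width):
    """Toy cost model (an assumption): 1 flag bit per word, plus a
    dictionary index on a match or the raw word on a miss."""
    index_bits = max(1, ceil(log2(len(dictionary))))
    entries = set(dictionary)
    total = 0
    for word in words:
        total += 1 + (index_bits if word in entries else word_width)
    return total


def compression_ratio(words, dictionary, word_width):
    """Uncompressed data size divided by compressed data size."""
    uncompressed = len(words) * word_width
    return uncompressed / compressed_bits(words, dictionary, word_width)


# Hypothetical 4-bit golden trace and an actual trace with one wrong word.
golden = ["0000", "0000", "1111", "0000", "1010", "1111", "0000", "1111"]
actual = list(golden)
actual[1] = "0100"  # single erroneous cycle

dictionary = select_dictionary(golden, 2)  # selected offline, from golden data only
golden_ratio = compression_ratio(golden, dictionary, 4)
actual_ratio = compression_ratio(actual, dictionary, 4)
print(f"golden: {golden_ratio:.2f}, actual: {actual_ratio:.2f}")
```

Under this assumed cost model the golden trace shrinks from 32 to 19 bits, and the erroneous trace to 22 bits, so the statically selected dictionary still compresses the actual data, with a penalty for the word that no longer matches.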
7.1.1.1 Dictionary-based compression (DC)

Algorithm 9 outlines the dictionary selection method. In dictionary-based compression, the main aim is to include in the dictionary the unique entries which have the maximum repetitions in the data set. Therefore, the first step determines all the unique entries in the data set. We then find the number of repetitions for each entry. The unique entries are sorted in descending order of the number of repetitions. The entries with the highest number of repetitions are included in the dictionary. Details on dictionary selection for bitmask-based compression have been explained in [54, 55].

Algorithm 9: Dictionary selection algorithm for DC
  M = Number of unique entries
  N = Number of dictionary entries
  DIC = Dictionary
  for each entry in M do
    Calculate the number of repetitions in the entire data set
  end for
  Sort the M entries in decreasing order of repetition count
  Include the first N entries in DIC

7.1.1.2 Bitmask-based compression (BMC)

The dictionary selection for bitmask-based compression follows the same trend as dictionary-based compression, that is, select the dictionary entries giving the maximum savings. However, there is a minor difference between the two. While the savings for DC correspond to just the repetitions, for BMC they include those due to bitmask-based matchings as well. Hence, the savings for each unique entry should be calculated based on the direct as well as the bitmask-based matches. The entries are then sorted in order of savings and included in the dictionary. The dictionary selection algorithm is shown in Algorithm 10.

Algorithm 10: Dictionary selection algorithm for BMC
  M = Number of unique entries
  N = Number of dictionary entries
  DIC = Dictionary
  for each entry in M do
    Calculate the savings due to repetition and bitmask-based matching in the entire data set
  end for
  Sort the M entries in decreasing order of total savings
  Include the first N entries in DIC

7.1.1.3 Fixed dictionary MBSTW compression (fMBSTW)

The compression technique for the fMBSTW algorithm follows the same technique as MBSTW compression [16]. The difference from MBSTW is that the dictionary is selected statically and the number of dictionary entries is limited. We now explain the dictionary selection steps for fMBSTW in Algorithm 11. This algorithm is shown for 2-fMBSTW (2 strings are encoded together, similar to 2-MBSTW). It can be further extended to 3-fMBSTW, where 3 strings are encoded together. Figure 7-2 shows an illustrative example of dictionary selection using Algorithm 11. In this example, the strings in the trace data are represented using p, q, r, s, t. The amount of savings for each 2-tuple is shown in Figure 7-2. We want to have a dictionary of size 4. As can be seen, the highest savings is obtained from the 2-tuple [...] gives the highest savings. However, r is already present in the dictionary. [...] is avoided. The 2-tuple having the next highest savings is [...]. Therefore, t is selected for the dictionary. In this way, the dictionary is built up.

7.1.2 Dynamic Trace Data Compression

Our final goal is to debug the DUT, for which we need the trace data from it. Application of a set of tests produces the trace data from the DUT, which is compressed to reduce the size of the trace buffer. The overview of the compression architecture is shown in Figure 7-3. As can be seen, the compression architecture consists of two parts, the dictionary and the actual compression engine. Depending on the design and the associated constraints, a specific compression algorithm and its corresponding dictionary are used. For example, when BMC is most suitable for a design, the compression engine will have BMC in it and the dictionary will be the one selected