
Record for a UF thesis. Title & abstract won't display until thesis is accessible after 2014-05-31.

DARK ITEM
Permanent Link: http://ufdc.ufl.edu/UFE0043971/00001

Material Information

Title: Record for a UF thesis. Title & abstract won't display until thesis is accessible after 2014-05-31.
Physical Description: Book
Language: English
Creator: Qin, Xiaoke
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2012

Subjects

Subjects / Keywords: Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre: Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Statement of Responsibility: by Xiaoke Qin.
Thesis: Thesis (Ph.D.)--University of Florida, 2012.
Local: Adviser: Mishra, Prabhat.
Electronic Access: INACCESSIBLE UNTIL 2014-05-31

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2012
System ID: UFE0043971:00001



Full Text

SYSTEM-LEVEL VALIDATION OF MULTICORE ARCHITECTURES

By
XIAOKE QIN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2012

© 2012 Xiaoke Qin

I dedicate this to my family.

ACKNOWLEDGMENTS

First of all, I truly appreciate the effort of my Ph.D. adviser Prof. Prabhat Mishra. He not only guided me to overcome challenging problems, but also taught me how to explore new directions. More importantly, he is always considerate to me and has helped me build my career. He is the person who made this dissertation come true.

I would like to thank my other Ph.D. committee members: Prof. Sartaj Sahni, Prof. Jih-Kwon Peir, Prof. Greg Stitt and Prof. Ann Gordon-Ross for their valuable comments and suggestions. I also thank my lab-mates, Mingsong Chen, Kanad Basu, Weixun Wang, Chetan Murthy, Kartik Shrivastava, Hadi Hajimiri and Kamran Rahmani. It was my great pleasure to work with them. I really enjoyed our friendship and I hope it will last forever.

Last but not least, I sincerely thank my family for their love and support. They encouraged me to pursue my dreams and become a good person. I would like to give the most special thanks to my girlfriend, Jie. Her love and devotion paved the road to my doctoral degree.

This work was partially supported by grants from National Science Foundation (NSF CAREER Award 0746261).

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
    1.1 Functional Validation of Multicore Architectures
    1.2 Validation of Non-functional Requirements
    1.3 Research Contributions
    1.4 Dissertation Organization
2 RELATED WORK
    2.1 Test Generation for Architecture Validation
    2.2 Validation of Cache Coherence Protocols
    2.3 Task Schedulability under Constraints
3 SYNCHRONIZED GENERATION OF DIRECTED TESTS
    3.1 Background
        3.1.1 Conflict clause forwarding
        3.1.2 Property clustering
    3.2 Synchronized Test Generation
        3.2.1 Correctness of the Proposed Approach
        3.2.2 Implementation Details
    3.3 Experiments
        3.3.1 A Stock Exchange System
        3.3.2 A VLIW MIPS Processor
        3.3.3 Circuit Test Generation
    3.4 Summary
4 EFFICIENT TEST GENERATION FOR MULTICORE ARCHITECTURES
    4.1 Test Generation for Multicore Architectures
        4.1.1 Correctness of Our Proposed Approach
        4.1.2 Implementation Details
        4.1.3 Heterogeneous Multicore Architectures
    4.2 Experiments
        4.2.1 Experimental Setup
        4.2.2 Results
    4.3 Summary
5 VALIDATION OF CACHE COHERENCE PROTOCOLS
    5.1 Background and Motivation
    5.2 Test Generation for Transition Coverage
        5.2.1 SI Protocol
        5.2.2 MSI Protocol
        5.2.3 MESI Protocol
        5.2.4 MOSI Protocol
    5.3 Experiments
    5.4 Summary
6 SCALABLE DIRECTED TEST GENERATION
    6.1 Directed Test Generation by Interleaving Concrete and Symbolic Execution
        6.1.1 Illustrative Example
        6.1.2 System Model
        6.1.3 Instrumentation
        6.1.4 Concrete Simulation
        6.1.5 Path Constraint Generation
        6.1.6 Test Generation
        6.1.7 Constraint Solving Optimization
    6.2 Implementation Details
        6.2.1 Design Flattening
        6.2.2 Clock Cycle Population
        6.2.3 Dynamic Array Reference Disambiguation
    6.3 Experiments
        6.3.1 Designs without Dynamic Array References
        6.3.2 Designs with Dynamic Array References
        6.3.3 SAT-based BMC versus Our Approach
    6.4 Summary
7 TEMPERATURE- AND ENERGY-CONSTRAINED SCHEDULING IN REAL-TIME SYSTEMS
    7.1 Background and Problem Formulation
        7.1.1 Thermal Model
        7.1.2 Energy Model
        7.1.3 System Model
        7.1.4 TCEC Problem
    7.2 Overview
    7.3 Approximation Algorithm for TCEC Scheduling
        7.3.1 Notations
        7.3.2 TCEC as MCP
        7.3.3 An Exact Algorithm for MCP
        7.3.4 Approximation Algorithm
    7.4 Problem Variants
    7.5 Experiments
        7.5.1 Experimental Setup
        7.5.2 TCEC versus TC or EC
        7.5.3 TCEC using Approximation Algorithm
    7.6 Summary
8 SCHEDULABILITY VALIDATION FOR MULTICORE ARCHITECTURES
    8.1 Background and Problem Formulation
        8.1.1 Processor Thermal Model
        8.1.2 Energy Model
        8.1.3 System Model
        8.1.4 Multicore DVS Schedule
        8.1.5 Problem Formulation
    8.2 Optimal Algorithm for TECS
    8.3 Approximation Algorithm
    8.4 Problem Variants
        8.4.1 Task Set with Dependence
        8.4.2 Hard Energy Constraint
    8.5 Experiments
    8.6 Summary
9 CONCLUSIONS AND FUTURE WORK
    9.1 Conclusions
    9.2 Future Research Directions

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

Table

3-1 Test generation time comparison for OSES
3-2 Test generation time comparison for MIPS
3-3 Test generation time comparison for circuits
4-1 Test generation time for 8 core system
4-2 Detailed test generation information
5-1 Statistics of our test generation algorithm for different protocols
6-1 Verilog instrumentation code
6-2 Comparison with HYBRO [56]
6-3 Comparison with random testing
6-4 Comparison with BMC [22]
7-1 Running time comparison on different task sets

LIST OF FIGURES

Figure

1-1 Simulation effort growth with design complexity
1-2 Design validation models, requirements and techniques
1-3 Dissertation outline
2-1 Directed test generation flow
3-1 Synchronized Test Generation
3-2 Different incremental SAT solving techniques
3-3 Synchronized test generation for multiple properties
3-4 Test generation for OSES using different cluster size
3-5 Test generation for MIPS using different cluster size
3-6 Test generation for circuits using different cluster size
4-1 Abstracted architecture of a two core system
4-2 Incremental SAT solving technique [79]
4-3 Test generation for multicore architectures
4-4 FSM representation of Figure 4-1 at time step i
4-5 Test generation for multicore architectures
4-6 Multicore system with different types of cores
4-7 Multicore system with different types of execution units
4-8 Test generation time with different number of cores
4-9 Test generation time with different interactions
4-10 Test generation time with heterogeneous cores
5-1 State transitions for a cache block in MSI protocol
5-2 Global FSM state space of SI protocol with 3 cores
5-3 State space of MSI protocol with 3 cores. For the clarity of presentation, the transitions to global modified states IIM, IMI, MII are omitted, if the transition in the opposite direction does not exist
5-4 State space of MESI protocol with 3 cores
5-5 State space of MOSI protocol with 3 cores
5-6 Transition coverage vs. cost for different test generation methods on MESI protocol with 8 cores
5-7 Transition coverage vs. cost for different test generation methods on MOSI protocol with 8 cores
6-1 The workflow of our approach
6-2 Counter.v
6-3 Sample Trace
6-4 Sample Path Constraints
6-5 Chronological Back Tracking
6-6 Path constraint file structure
6-7 The MIPS architecture [22]
7-1 Overview of our TCEC schedulability framework
7-2 Job execution graph
7-3 JEG of TCEC. The values next to each edge are corresponding time and energy consumption
7-4 Possible false negative region. = 0.1
7-5 EC vs TCEC. EC finishes at A. TCEC (< 80 °C) finishes at B. Both TCEC (< 78 °C) and TCEC (< 76 °C) finish at C
7-6 EC vs TCEC. Both TC and TCEC (< 14000 mJ) finish at A. TCEC (< 13700 mJ) finishes at B. TCEC (< 12500 mJ) finishes at C
7-7 Running time with different job set size and
7-8 Accuracy of EBF
8-1 State exploration in Algorithm 9
8-2 Precedence relations among tasks
8-3 Temperature and energy constrained scheduling
8-4 Actual time consumption of DPRA
8-5 Actual energy consumption of DPRA
8-6 Running time with different job set size and

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

SYSTEM-LEVEL VALIDATION OF MULTICORE ARCHITECTURES

By
Xiaoke Qin
May 2012

Chair: Prabhat Mishra
Major: Computer Engineering

Multicore processors are widely used in today's server, desktop, and embedded systems. It is a major challenge to verify functional correctness as well as non-functional requirements of multicore architectures. Direct application of existing functional validation approaches usually consumes too much time to reach the coverage goal due to the complexity of multicore designs. Escaped bugs can lead to serious consequences in many scenarios. Due to parallel execution of task sets, existing approaches are also insufficient to validate whether applications in such systems can be scheduled within the given temperature, energy, and timing constraints. If these constraints are violated, it can lead to performance degradation or even catastrophic consequences in safety-critical systems. This dissertation presents novel techniques to address validation challenges of both functional and non-functional requirements in modern multicore architectures. My research has made four major contributions: (i) it proposes efficient directed test generation techniques that exploit symmetry in multicore designs; (ii) it proposes a novel test generation approach for state and transition coverage in a wide variety of cache coherence protocols; (iii) it proposes a scalable directed test generation technique based on interleaved concrete and symbolic execution; and (iv) it proposes schedulability validation approaches for task sets in multicore architectures under temperature and energy constraints. Extensive experimental results demonstrate significant improvement in overall validation effort.

CHAPTER 1
INTRODUCTION

Multicore architectures are widely used in today's desktop, server, and embedded systems. Due to the existence of the power wall, conventional single-core architectures can no longer deliver the required performance improvement by increasing frequency. Instead, architects integrate more and more cores into the same chip to boost the throughput. By operating multiple cores at a lower frequency, multicore architectures can achieve the same performance with significantly less power consumption compared with a monolithic core running at a high clock rate. For desktop-based systems and servers, multicore architectures deliver the required throughput, keeping pace with today's applications of increasing computational complexity. Due to the successful deployment of dual-core and quad-core processors, the next generation of processors will have 32, 64 or even hundreds of cores. For embedded systems, the energy efficiency of multicore architectures allows devices to operate for a longer time with the same battery capacity. Besides, since multiple cores share the same die, the Printed Circuit Board (PCB) size is also reduced. With the growing demand for green data centers, long-life computers and handheld devices, multicore architectures will continue to dominate the design of next-generation System-on-Chip (SoC) architectures.

Successful multicore designs must satisfy both functional and non-functional requirements. Functional requirements ensure that the processor performs all logical functions as specified by the design specification. Non-functional requirements are imposed to make the design satisfy various design constraints such as area, power, energy, temperature, and performance. Clearly, functional requirements are important, because a buggy (erroneous) design leads to unreliable systems. Depending on the application domain, unreliable systems can cause loss of vital information or even disaster. Non-functional requirements are equally important, because their violation can also lead to serious consequences. For example,
due to uneven activities on different cores, the die temperature of busy cores can easily reach 120 °C [16]. If the high die temperature is not well controlled, transient errors occur more frequently and the device is less reliable. Also, devices that always operate at high temperature usually have a much shorter lifespan, as shown in industrial studies [82]. To avoid these unwanted scenarios, both functional and non-functional validation must be performed to ensure the success of modern multicore designs.

The rest of this chapter is organized as follows. Section 1.1 and Section 1.2 describe existing validation techniques and the associated challenges for validation of functional and non-functional requirements, respectively. Section 1.3 summarizes the contributions of this dissertation. Finally, Section 1.4 outlines the organization of this dissertation.

1.1 Functional Validation of Multicore Architectures

While multicore architectures are very successful at boosting throughput, their increasing complexity also introduces significant validation challenges. The most widely used functional validation techniques are based on simulation using random and constrained-random tests [93][1][83]. The multicore design is placed within a simulation environment and a test generator feeds random tests into the design. The behavior of the design under test is compared with the golden reference model to detect any functional errors.

As illustrated in Figure 1-1 [77], the verification complexity has grown tremendously in the last two decades. For example, in 2007 a typical SoC design with 100 million gates used one trillion test vectors for simulation. Due to the increasing complexity of multicore architectures, even trillions of simulation vectors may not be adequate to achieve the required coverage goal within an ever-decreasing time-to-market window. Since simulation vectors are generated randomly, it is quite difficult for random tests to activate coverage holes. Directed tests [22] are promising to address this problem. By analyzing the logical structure of the design, a small number of directed tests can activate the
desired behavior of the system. They can be applied in addition to the random tests to reach the coverage goal in much less time. Unfortunately, most directed tests are manually written, which is time-consuming and error-prone. Fully automatic directed test generation schemes are desired to accelerate the verification process of multicore architectures. There are two major objectives in directed test generation. First, the overall validation effort should be minimized by reducing the total number of tests required to achieve the coverage goal. Second, the test generation time should also be small.

Figure 1-1. Simulation effort growth with design complexity

Model checking [13, 28] is promising for automated generation of directed tests. To activate a particular scenario, we can feed the negated version of a property to the model checker, and use the resultant counterexample as a directed test. Due to the state space explosion problem, such a process is usually very time-consuming. Since different cores in a multicore design usually contain similar structure, their formal descriptions (such as CNF in SAT-based model checking) also exhibit significant symmetry. We believe such symmetry can be exploited to accelerate the model checking process, because the information we learn from one core may be applied directly to other cores. Unfortunately, this intuitive reasoning is hard to implement
because it is very difficult to reconstruct the symmetry from the CNF formula. The high-level information is lost during CNF synthesis, and it is inefficient as well as computationally expensive to recover through reverse engineering methods.

An important requirement of functional validation is to achieve certain state or transition coverage of the state space of the design. Simulation using random tests is widely used in industry to fulfill this goal. However, due to the symmetric nature of multicore architectures, their state space contains some unique features, which can be utilized to reduce the test length or testing time required to reach the coverage goal. Although the FSM of each cache controller is easy to understand, the product FSM for modern cache coherence protocols usually has an obscure structure that is hard to analyze. Besides, modern processors usually contain multiple cache levels, which greatly complicates the global state space. Even if the global state space can be described, it is still difficult to find an efficient way to traverse it. In other words, the test generation algorithm must activate all states and transitions with a limited number of unnecessary transitions. Moreover, since the state space is very large, the tests usually introduce a large storage overhead. Therefore, it is desirable that the tests can be generated on the fly.

1.2 Validation of Non-functional Requirements

So far we have described the importance of ensuring functional correctness and the challenges associated with verifying multicore architectures. It is equally important to ensure that all the non-functional requirements are met. One of the key challenges is to find whether a given task set can be scheduled on the processors without violating the required temperature and energy constraints. This kind of validation is important to ensure the reliability of multicore designs, because high die temperature leads to more frequent transient errors as well as shorter processor lifespan [82]. Besides, the management of overall energy consumption is also crucial to the success of embedded systems. Since many handheld devices are equipped with multicore processors but still
battery-powered, we need to validate that all important tasks are finished with limited energy consumption.

It is usually very costly to perform such validation, because the manufacturer needs to build the full system and test the design by executing real task sets. Detection of failures at this stage is expensive, since it will lead to re-design of the system. Since the worst-case behavior of real-time systems can usually be obtained by offline analysis, we believe it is possible to predict the system behavior based on the information collected via static analysis of task sets and the execution environment. In other words, in various cases, non-functional validation can be performed without running the actual system in real environments. The major challenge in this field comes from the NP-hard nature [103][100][86] of the schedulability problem. In fact, it is NP-hard even to verify the schedulability of a task set under temperature and energy constraints in a single-core processor. The problem is more complex when the system contains multiple cores.

1.3 Research Contributions

My research proposes novel techniques to address challenges in both functional and non-functional validation of multicore systems. The objective of my research is to develop efficient test generation approaches and validation algorithms for modern multicore architectures.

Figure 1-2 presents the scope of this dissertation. The proposed research develops efficient validation techniques to address different functional and non-functional requirements using a wide variety of design models including system-level models, formal models as well as RTL models.

Figure 1-3 outlines the four major research contributions of this dissertation, which are summarized as follows. The first three are related to verifying functional correctness, whereas the last one ensures that the non-functional requirements are satisfied.

Directed test generation for multicore architectures: This work proposes a novel technique that exploits temporal, structural, and spatial symmetry in multicore designs
at the same time. Our proposed technique enables the reuse of the knowledge learned from one core to the remaining cores in multicore architectures (structural symmetry), from one bound to the next for a given property (temporal symmetry), as well as from one property to other properties (spatial symmetry). Our experimental results on both hardware and software designs demonstrate an order-of-magnitude reduction in overall test generation time.

Figure 1-2. Design validation models, requirements and techniques

Figure 1-3. Dissertation outline

Efficient test generation for state and transition coverage in cache coherence protocols: This work proposes an efficient test generation approach for a wide variety
of cache coherence protocols. Based on detailed analysis of the space structure, our approach creates efficient test sequences for different parts of the global FSM state space to achieve 100% state and transition coverage for each cache coherence protocol. We develop a graphical description of the state space structure of several commonly used cache coherence protocols and present an on-the-fly directed test generation algorithm based on the Euler tour of hypercubes. The experimental results on different cache coherence protocols show the effectiveness of our approach on systems with many cores.

Scalable directed test generation for real HDL designs: This work develops a scalable technique to enable directed test generation of HDL models by incorporating static analysis and simulation-based validation. By performing interleaved concrete and symbolic execution, our approach avoids the error-prone design translation process and enables directed test generation for real designs. Compared with existing approaches based on combined concrete and symbolic execution, our approach is capable of analyzing real processor designs with dynamic array references. The experimental results illustrate that our proposed technique is scalable and enables directed test generation for real designs.

Temperature- and energy-constrained scheduling for multicore architectures: This work explores the DVS scheduling problem on multicore systems under both temperature and energy constraints. We show that this problem is NP-hard even when the steady-state temperature is considered. We also present an exact algorithm and a polynomial time approximation scheme for the problem. When the original problem is schedulable, our approximation algorithm is guaranteed to generate a solution that does not violate the temperature constraint and consumes no more time or energy than a specified approximation bound, e.g., within 1% of the optimal time consumption and energy constraints. The experimental results demonstrate that our technique is able
to produce schedules close to the optimal solution with reasonable execution time on real benchmarks.

1.4 Dissertation Organization

This dissertation is organized as follows. Chapter 2 introduces relevant existing research. Chapter 3 and Chapter 4 describe the proposed directed test generation for the functional validation of multicore architectures. Chapter 5 discusses the proposed test generation approaches for transition coverage in cache coherence protocols. Chapter 6 describes our scalable directed test generation approach for HDL designs. Chapter 7 describes our schedulability validation approaches under energy and temperature constraints. Chapter 8 presents our schedulability validation technique for multicore processors. Chapter 9 concludes this dissertation.

CHAPTER 2
RELATED WORK

This chapter surveys existing system-level validation techniques. For ease of presentation, we have divided the existing approaches into three categories. First, we describe the test generation approaches for architecture validation. Next, we discuss existing techniques for validation of cache coherence protocols. Finally, we present techniques for validation of non-functional requirements.

2.1 Test Generation for Architecture Validation

Model checking techniques are promising for functional verification and test generation of complex systems [39, 50, 51, 64]. Figure 2-1 shows the general framework for directed test generation using model checking.

Figure 2-1. Directed test generation flow

In order to create directed tests, the formal model of the design specification and a suitable fault model are provided as input. Then a set of properties is generated for the desired behaviors (faults) that should be activated in the simulation-based validation stage. For example, when a graph model of the design and a functional coverage fault model are provided,
a coverage-driven property generation can be used. Similarly, in the case of circuits with a stuck-at fault model, the property will be in the form of $G(a = 1)$ or $G(a = 0)$. Next, a model checker is employed to check whether there exist states that violate the negated version of the property. If the model checker finds a violation, it reports a counterexample. This counterexample contains a sequence of inputs that will drive the system from an initial state to a state that does not satisfy the negated version of the property, or in other words, one that satisfies the original property. Therefore, we can use it as a test to activate the corresponding property or behavior during simulation-based validation.

Although model checking is effective for directed test generation, the capacity of conventional symbolic model checking is usually limited. Bounded model checking (BMC) was proposed to address this problem by checking whether there is a counterexample for the property within a given bound [13][28]. Given a design $M$, a safety property $p$, and a bound $k$, BMC will unroll the design $k$ times and encode it using the following formula:

\[
BMC(M, p, k) = I(s_0) \wedge \bigwedge_{i=0}^{k-1} R(s_i, s_{i+1}) \wedge \bigvee_{i=0}^{k} \neg p(s_i)
\]

where $I(s_0)$ is the initial state of the system, $R(s_i, s_{i+1})$ represents the state transition from state $s_i$ to state $s_{i+1}$, and $p(s_i)$ checks whether property $p$ holds on state $s_i$. The formula is then transformed to CNF and checked by a SAT solver. If the SAT solver finds an assignment that makes the CNF true, it implies that the property does not hold at bound $k$, i.e., $M \not\models_k p$. Otherwise, if no such assignment is found, we conclude that the property holds up to $k$, or $M \models_k p$.

BMC cannot prove that a safety property holds globally when no counterexample is found within a specific bound, but it is quite effective at falsifying a design when the bound is not large. The reason is that SAT solvers usually require less space and time than conventional Binary Decision Diagram (BDD) based model
checkers [65]. Therefore, SAT-based BMC is suitable for directed test generation [64], where a counterexample typically exists within a relatively small bound. To generate the directed test, the negated version of the property is checked by BMC. The SAT solver will find an assignment of all input and state variables that satisfies Equation 2. As a result, we can extract the assignment sequence of input variables and use it as a test to activate the desired property in the system.

A great deal of work has been done to reduce the SAT solving time during BMC [22-25, 43, 52, 79, 91]. The basic idea is to exploit the regularity of the SAT instances between different bounds. For example, incremental SAT solvers [43, 91] reduce the solving time by employing the previously learned conflict clauses. Generated conflict clauses are kept in the database as long as the clauses which led to the conflicts are not removed. Strichman [79] proposed that if a conflict clause is deduced only from the transition part of a SAT instance, it can be safely forwarded to all instances with larger bounds, because the transition part of the design will still be in the SAT instance when we unroll the design more times. Besides, the learned conflict clauses can also be replicated across different time steps. However, the existing approaches did not exploit the symmetric structure within the same time step. In directed test generation for multicore architectures, the same knowledge about the core structure needs to be rediscovered for each core independently, which can lead to significant waste of computational power.

When BMC is applied to circuits, Kuehlmann [53] proposed that the unfolded transition relation can be simplified by merging vertices that are functionally equivalent under given input constraints. In this way, the complexity of the transition relation is greatly reduced. Since this technique is based on the AIG representation of logic designs, it is difficult to use for accelerating the solving process of CNF instances, which are directly created from high-level specifications. Functional validation based on high-level specification is very effective in many scenarios. For example, Bhadra et al. [45] used
executable specification to validate multiprocessor systems-on-chip designs. Chen et al. [22] proposed directed test generation based on high-level specification. To accelerate the test generation process, conflict clauses learned during checking of one property are forwarded to speed up the SAT solving process of other related properties, although the bound is required as an input. Similarly, the simultaneous SAT solver [49] enabled the learned clauses to be reused by properties. Decision ordering was also studied in [23] to reduce the SAT solving time. These approaches did not take advantage of structural symmetry in multicore architectures.

When a SAT instance contains symmetric structure, symmetry breaking predicates [3, 5, 30, 62, 80] can be used to speed up SAT solving by confining the search to non-symmetric regions of the space. By adding symmetry breaking predicates to the SAT instance, the SAT solver is restricted to finding the satisfying assignments of only one representative member in a symmetric set. However, this approach cannot effectively accelerate directed test generation for multicore processors, because the properties for test generation are usually not symmetric with respect to each core. Thus, the symmetric regions in the entire space are usually small despite the fact that the structure of each core is identical. Biere et al. [14] proposed that each component can be solved individually to accelerate the solving process. However, the symmetric structure is not used at the same time for further speedup.

During the validation process, it is also very important to generate assertions effectively. One important work in this direction is GoldMine [81], which automatically uses data mining and formal verification to generate assertions for real hardware designs. Using the simulation trace of RTL designs, GoldMine employs decision tree based supervised learning algorithms to mine potential assertions from the simulation data. Liu et al. [54] also proposed a methodology that utilizes GoldMine to achieve coverage closure during design validation. Once the assertion is generated, automatic test generation approaches can be employed to generate the tests, which can be used

to activate the desired behavior of the system. For example, test generation tools based on interleaved concrete and symbolic execution, such as DART [40], CUTE [72], and Apollo [7], are promising in capturing important bugs in large software systems. STAR [55] and HYBRO [56] are proposed to generate tests by combining static and dynamic analysis for hardware validation. Due to the effective utilization of the CFG, HYBRO [56] demonstrated remarkable improvement over the previous path-based test generation technique [55]. However, HYBRO cannot be applied to real-life designs containing dynamic array references.

2.2 Validation of Cache Coherence Protocols

Verification of cache coherence protocols for multicore and multiprocessor systems has been widely studied in both academia and industry. Existing studies can be broadly grouped into two categories: formal verification [27, 33, 36] and simulation-based validation [2, 83, 93]. Formal methods using model checking can prove mathematically whether the description of a certain cache coherence protocol violates the required property. For example, Murφ [33] was designed and used to verify various cache coherence protocols based on explicit model checking. Counterexample-guided refinement [27] is employed to verify complex protocols with multilevel caches. Besides, symbolic model checking tools have also been developed for coherence verification. For example, Emerson et al. [36] investigated the verification problem with parameterized cache coherence protocols using BDDs. Although formal methods can guarantee the correctness of a design, they usually require that the design be described in certain input languages. As a result, model checking usually cannot be applied to implementations directly.

Simulation-based approaches, on the other hand, are able to handle designs at different abstraction levels and are therefore more widely used in practice. For example, Wood et al. [93] used random tests to verify the memory subsystem of the SPUR machine. Successive loads and stores to the same location are employed as a test template to


expose possible errors. The Genesys-Pro test generator [2] from IBM extended this direction with more complex and sophisticated test templates. To reduce the search space, Abts et al. [1] introduced a space pruning technique during their verification of the Cray processor. Wagner et al. [83] designed the MCjammer tool, which can achieve higher state coverage than normal constrained random tests. Existing random test generation tools are proven to be effective at discovering potential bugs. However, due to their random nature, it is very hard to achieve full state and transition coverage in a reasonable time. Since an uncovered transition can only be visited by taking a unique action at a particular state, it may not be feasible for a random test generator to eventually cover all possible states and transitions. To address this problem, some random testers are equipped with a small amount of memory, so that the future search can be guided to the uncovered regions. Unfortunately, unless the memory is large enough to hold the entire state space, it is still quite hard to achieve full coverage by such guided random testing.

2.3 Task Schedulability under Constraints

Energy-aware scheduling techniques for real-time systems have been widely studied to reduce energy consumption. While several works employed dynamic cache reconfiguration [87][85], most of them are based on Dynamic Voltage Scaling (DVS). Aydin et al. [9] addressed both static and dynamic slack allocation problems for periodic task sets, while Shin et al. [73] also considered aperiodic tasks. Jejurikar et al. focused on energy-aware scheduling for non-preemptive task sets [47] and leakage power minimization [48]. Zhong et al. [103] solved a system-wide energy minimization problem with consideration of other components. Wang et al. [85] proposed a leakage-aware energy saving technique based on DVS as well as cache reconfiguration. As shown in [100], applying DVS in real-time systems is an NP-hard problem. Optimal and approximation algorithms are given in [103][100][86], while other works proposed heuristics. A survey of recent works can be found in [21]. However, these techniques are not aware of controlling the operating temperature.


Temperature-aware scheduling in real-time systems has drawn significant research interest in recent years. Wang et al. [84] introduced a simple reactive DVS scheme aiming at meeting task timing constraints and maintaining a safe processor temperature. Zhang et al. [101] proved the NP-hardness of the temperature-constrained performance optimization problem in real-time systems and proposed an approximation algorithm. Yuan et al. [97] considered both temperature and leakage power impact in the DVS problem for soft real-time systems. Chen et al. [20] explored temperature-aware scheduling for periodic tasks on both uniprocessor and homogeneous multiprocessor DVS-enabled platforms. Liu et al. [57] proposed a design-time thermal optimization framework which is able to solve the energy-aware (EA), temperature-aware (TA) and temperature-constrained energy-aware (TCEA) scheduling problem variants in embedded systems with task timing constraints. Jayaseelan et al. [46] exploited different task execution orders, in which each task has a distinct power profile, to minimize peak temperature. However, none of these techniques solves the temperature-constrained and energy-constrained (TCEC) problem. Moreover, they all make certain assumptions about system characteristics that limit their applicability.

Existing research formulated the voltage/frequency assignment problems in different models. For example, Integer Linear Programming (ILP) has been widely applied to many voltage/frequency assignment problems without the temperature constraint [94, 102]. Chantem et al. [19] also used ILP to model the scheduling problem with steady-state temperature constraints. Unfortunately, when transient temperature is considered, the full expansion of the temperature constraint introduces a large number of product terms, which prevents us from solving the problem efficiently using ILP solvers. Coskun et al. [29] circumvented this problem using an iterative ILP and thermal simulation approach, although convergence to the optimal solution is not guaranteed.


Another important modeling technique is timed automata [6]. Norström et al. [66] first extended timed automata with the notion of real-time tasks and showed that the traditional schedulability analysis can be transformed into a decidable reachability problem in timed automata, which can be solved using model checking tools. Fersman et al. [37] further generalized this approach with asynchronous processes and preemptive tasks in a continuous-time model. However, none of these techniques considered energy- or temperature-related issues.

There are several studies on Dynamic Power Management (DPM) using formal verification methods for embedded systems [74] and multiprocessor platforms [58]. Shukla et al. [74] provided a preliminary study on evaluating DPM schemes using an off-the-shelf model checker. Lungo et al. [58] tried to incorporate verification of DPM schemes in the early design stage. They showed that tradeoffs can be made between design quality and verification effort. None of these approaches considers temperature management in such systems. Moreover, they did not account for energy and timing constraints, which are important in real-time embedded systems. Wang et al. [88] discussed the application of timed automata to the schedulability problem with both energy and temperature constraints. Nevertheless, due to the capacity limit of the model checker, the proposed technique can only be applied to small task sets.

Temperature- or energy-constrained scheduling problems are also related to the multi-constrained path (MCP) problem for Quality of Service (QoS). MCP has been extensively studied by the networking community. For example, Chen et al. [26] designed an approximation algorithm for MCP with two constraints. Efficient heuristics for MCP problems were studied in [76] and [98]. Xue et al. [96] proposed polynomial-time approximation algorithms, which can be applied to more than two constraints. However, since the QoS costs are usually modeled as additive constants, these existing methods cannot be applied directly to the TCEC problem, because the computation of the temperature is not additive.


CHAPTER 3
SYNCHRONIZED GENERATION OF DIRECTED TESTS

Model checking is promising for automatic generation of directed tests [39, 64], because the counterexample of the negated version of a property can be used as a test to activate the property. Existing test generation techniques using SAT-based bounded model checking (BMC) [67] can be divided into two categories based on whether they address one property or multiple properties. The first category is applicable to test generation for one design and one property with varying bounds [78, 79]. However, the knowledge obtained is not shared when solving for other properties on the same design. In contrast, the methods in the second category try to accelerate test generation for multiple properties with known bounds [63]. They first group similar properties into clusters. Then, the knowledge is shared by all properties in the same cluster. This approach exploits the fact that although each test generation instance is created for a different property, these instances still have a large overlap, because the design remains unchanged. The major drawback of this solution is that it assumes that the bound is known. In general, it is very difficult to determine the bound up front without actually solving the SAT instance, which limits the applicability of this solution.

In this work [68], we combine the advantages of both approaches by developing a novel BMC-based test generation technique for multiple properties of the same design, which enables the reuse of learned knowledge across different bounds as well as across properties in the same cluster. The basic idea of our approach is to synchronize the solving process of multiple properties for different bounds, so that the utilization of learned knowledge can be maximized. One may think that solving many SAT instances together is dramatically more complex than solving one instance, and therefore may be impractical. On the contrary, since all these instances are generated by unrolling the same design several times, we successfully developed a simple but effective approach to significantly reduce the overall SAT solving time by forwarding


knowledge among different solving processes. Our experimental results demonstrate an order-of-magnitude reduction in overall test generation time.

The rest of the chapter is organized as follows. Section 3.1 briefly discusses the background on SAT-based BMC. Section 3.2 describes our test generation methodology for multiple properties and bounds. Section 3.3 presents our experimental results. Finally, Section 3.4 concludes this chapter.

3.1 Background

This section briefly describes the basic concepts of existing acceleration techniques for directed test generation using BMC.

3.1.1 Conflict clause forwarding

Many techniques and heuristics are employed in SAT solvers to accelerate the solving process. Modern SAT solvers such as zChaff [38] and GRASP [59] adopt the Davis-Putnam-Logemann-Loveland (DPLL) [31, 32] algorithm and conflict-driven non-chronological backtracking. The basic idea behind these techniques is to save the knowledge learned while resolving the current conflict to avoid the same conflict in the future [99]. A conflict occurs when the current assignment of some variables, through a set of clauses, implies that one variable must be true and false at the same time. In this case, conflict analysis will trace back along the implication relations and find the closest assignment of variables that led to the conflict. We can forbid such an assignment from occurring again by adding a carefully designed clause, i.e., a conflict clause, to the original CNF. Generally, conflict clauses are only meaningful within the same SAT instance. However, when the set of clauses that led to the conflict clause is shared by multiple SAT instances, we can also forward conflict clauses across instances.

3.1.2 Property clustering

Property clustering is another important technique to reduce the total test generation time with BMC. As indicated in Figure 2-1, for a given design and fault model, we first generate a set of properties, which can be used to activate all the faults.


Then, different SAT instances for these properties are solved to obtain the tests. Since sharing of knowledge among similar properties usually reduces the overall solving time, we can cluster properties into different groups and solve all the properties in the same group together.

Although the intention behind property clustering is intuitive, the challenge here is to determine the proper number of clusters and which properties should be in the same cluster. On one extreme, one can group all properties into a single cluster and solve them together. Although this approach maximizes the sharing of knowledge among properties, it also increases the possibility that too many conflict clauses are accumulated in the solver's database, which hampers the overall performance of the SAT solver. On the other extreme, we can also let each cluster have only one property, which is actually the approach adopted by Strichman et al. [79]. In this way, only knowledge relevant to the property will be explored. However, it is also possible that the same knowledge will be discovered again and again for different properties, which is a significant overhead when the number of similar properties is large. Therefore, it is desirable to strike a balance between knowledge sharing and the overhead introduced by irrelevant conflict clauses.

Our property clustering approach for designs given in graph models is similar to [63]. The properties are grouped together by their similarity in structural or textual overlap. The properties in the same cluster describe behaviors of the same functional unit or component. In this way, it is likely that the knowledge or conflict clauses that we obtained while solving one property will be helpful to other properties in the same cluster. For circuits with the stuck-at fault model, we perform the clustering of properties based on the cone of influence (COI). Output signals with large overlap in their COIs are grouped into the same cluster.
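
The COI-based grouping described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the text does not say how "large overlap" is measured, so the Jaccard ratio, the 0.5 threshold, the single-link merging, and the names `jaccard`, `cluster_by_coi` and `example_coi` are all hypothetical choices, not the procedure used in this work or in [63].

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap ratio |a & b| / |a | b| between two sets of signal names."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_by_coi(coi, threshold=0.5):
    """Group outputs whose COIs overlap by at least `threshold`.

    `coi` maps an output signal name to the set of signals in its cone of
    influence. The Jaccard measure and the threshold are assumptions.
    """
    parent = {name: name for name in coi}

    def find(x):
        # Follow parent links to the cluster representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in combinations(coi, 2):
        if jaccard(coi[u], coi[v]) >= threshold:
            parent[find(u)] = find(v)  # merge the two clusters

    clusters = {}
    for name in coi:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical cones of influence for four output signals.
example_coi = {
    "out1": {"a", "b", "c", "d"},
    "out2": {"b", "c", "d", "e"},
    "out3": {"x", "y"},
    "out4": {"x", "y", "z"},
}
print(cluster_by_coi(example_coi))
```

Raising the threshold produces more, smaller clusters, and lowering it produces fewer, larger ones, which is the knob behind the trade-off between knowledge sharing and clause accumulation discussed above.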


3.2 Synchronized Test Generation

Figure 3-1 shows the framework of our synchronized test generation approach. In order to create directed tests, the formal model of the design, a set of properties for the desired behaviors (faults) that should be activated, and the corresponding cluster information are accepted as input. Next, the SAT instances for each property are grouped into different clusters based on their similarity and then solved simultaneously to create the test suite, which can be used to trigger the desired behaviors during simulation-based validation. Algorithm 1 outlines the key steps in our directed test generation framework. The contribution is the synchronized test generation for properties in a cluster, which will be explained in detail in Algorithm 2.

Figure 3-1. Synchronized Test Generation

To highlight the contribution of our work, Figure 3-2 compares our approach with two closely related techniques: (i) incremental SAT for a single property with unknown bound [79] and (ii) test generation for multiple properties with known bounds [63]. In this example, there are three properties $p_1$, $p_2$, and $p_3$ with bounds 3, 2, and 1 respectively. We use solid dots to represent different SAT instances and lines to indicate the conflict clause forwarding paths. Strichman et al. [79] solved each property separately, and


Algorithm 1: Test generation framework
  Input:  (i) Design D; (ii) Properties P for fault activation
  Output: Tests for corresponding faults
  Cluster similar properties into groups;
  TestSuite = ∅;
  for each property cluster PC do
      Perform Synchronized Test Generation on PC;
      Add generated tests into TestSuite;
  end
  return TestSuite

Figure 3-2. Different incremental SAT solving techniques. A) Strichman [79]. B) Mishra and Chen [63]. C) A naive combination of [79] and [63]. D) Tentative assignment of variables during checking $p_1$ at $k = 3$


passed the knowledge (deduced conflict clauses) horizontally within instances for the same property (Figure 3-2A). In contrast, Mishra et al. [63] solved one base property first, e.g., $p_2$ in this case, then forwarded the learned clauses vertically between other SAT instances for different properties, as shown in Figure 3-2B.

Clearly, it should be profitable if we can appropriately forward conflict clauses vertically between properties while solving for each property horizontally. In this way, the knowledge learned while checking a property for a specific bound can benefit the same property with larger bounds as well as other properties. One intuitive way to combine the two approaches, as shown in Figure 3-2C, is to choose some property as the base property ($p_2$ in Figure 3-2C), check this property for different bounds, and then forward the learned conflict clauses to other SAT instances for other properties. Unfortunately, this naive combination has three problems. First, it is very hard to choose the base property, which should yield a large number of conflict clauses that can be shared by other properties. Unlike [63], where each property has only one SAT instance, we do not know how many SAT instances we have to solve. As a result, it is impossible to apply the clustering technique proposed in [63] to determine the base property. Secondly, even if we correctly find the optimal base property, it is still difficult to choose the suitable bound of the receiving property to forward clauses to, because SAT instances with inappropriate bounds may be solved trivially. Moreover, the learning during checking of non-base properties is wasted. For example, in Figure 3-2D, suppose $(\neg a_i \lor b_i \lor c_{i+1})$, $(a_i \lor \neg d_{i+1})$ and $(a_i \lor \neg e_{i+1})$ are clauses within the transition constraint of the system at time step $i+1$. In the SAT solving process of $p_2$ with bound $k = 2$, a conflict clause $(b_0 \lor c_1 \lor \neg d_1)$ is deduced based on $(\neg a_0 \lor b_0 \lor c_1)$ and $(a_0 \lor \neg d_1)$ to prevent the assignment $\{b_0, c_1, d_1\} = \{0, 0, 1\}$, which will result in a conflict on $a_0$. During the solving process of $p_1$ with bound $k = 2$, the SAT solver may explore the assignment $\{b_0, c_1, d_1\} = \{0, 0, 1\}$ if Strichman's approach [79] is employed. Such an assignment can be avoided by using [63] as shown


in Figure 3-2B and Figure 3-2C, because the learned conflict clause $(b_0 \lor c_1 \lor \neg d_1)$ is forwarded to $p_1$.

However, learned clauses are only allowed to be forwarded from the base property ($p_2$ in this case). The knowledge learned while solving non-base properties will not be reused. As indicated in Figure 3-2D, conflict clause $(b_0 \lor c_1 \lor \neg e_1)$ is deduced based on $(\neg a_0 \lor b_0 \lor c_1)$ and $(a_0 \lor \neg e_1)$ during the solving process of $p_3$ with bound $k = 1$. Since $p_3$ is not a base property, this information will not be reused by $p_1$. Therefore, during the solving process of $p_1$ with bound $k = 2$, the SAT solver will still try to make the assignment $\{b_0, c_1, e_1\} = \{0, 0, 1\}$. When the number of properties is large, this may cause a great waste of computational power, because we have to explore the same search space many times if the space is not visited during the solving process of the base property.

Our approach to solving this problem is based on the effective identification of conflict clauses that can be shared by other SAT instances across properties and bounds. In fact, for any bound $k' \ge 0$, all SAT instances generated during BMC (Equation 2) with $k \ge k'$ clearly share the transition clauses $I(s_0) \land \bigwedge_{i=0}^{k'-1} R(s_i, s_{i+1})$, although their property terms $\bigvee_{i=0}^{k} \neg p(s_i)$ are different. This observation implies that all conflict clauses deduced based on these common clauses during the solving process of any SAT instance can be forwarded to any other SAT instance with $k \ge k'$, because all of them have the same set of clauses that led to the conflict clause. Therefore, if we check all properties together for $k = 0, 1, 2, \ldots$, i.e., synchronously, all conflict clauses can be safely shared by all subsequent SAT instances.

Algorithm 2 outlines our synchronized test generation method for clustered properties. It accepts each property cluster and the design of the system as input and produces corresponding tests. As indicated before, this algorithm will check all properties synchronously for each bound. In each iteration, we first generate the transition clause set $CS_T^k$ corresponding to $I(s_0) \land \bigwedge_{i=0}^{k-1} R(s_i, s_{i+1})$ using BMC(D, true, k),


Algorithm 2: Synchronized Test Generation For Properties in a Cluster
  Input:  (i) Design D; (ii) Properties P; (iii) Maximum bound K_max
  Output: Test Set TS
  Bound k ← 0;
  Common Conflict Clause Set CCS ← ∅;
  TS ← ∅;
  while P ≠ ∅ and k ≤ K_max do
      Clause Set CS_T^k ← BMC(D, true, k);
      for p ∈ P do
          Clause Set CS_p^k ← BMC(D, p, k);
          Step 1: In CS_p^k, mark all clauses that also exist in CS_T^k;
          Step 2: (ConflictC, test_p) ← SAT(CCS ∪ CS_p^k);
          Step 3: CCS ← CCS ∪ CheckMark(ConflictC);
          if test_p ≠ null then
              remove p from P;
              TS ← TS ∪ test_p;
          end
      end
      k ← k + 1;
  end
  return TS

then randomly choose a property $p$ from the property set $P$, and create its own clause set $CS_p^k$ corresponding to $I(s_0) \land \bigwedge_{i=0}^{k-1} R(s_i, s_{i+1}) \land \bigvee_{i=0}^{k} \neg p(s_i)$. Next, we perform the following 3 steps.

1. Mark all clauses in $CS_p^k$ which are also in $CS_T^k$. Since $CS_T^k$ remains the same for all properties at $k$, this step can be implemented efficiently by table lookup, as described in Section 3.2.2.
2. Use a SAT solver to solve the CNF formula $CCS \cup CS_p^k$, which contains not only $CS_p^k$, but also all previously learned conflict clauses in $CCS$.
3. For new conflict clauses ConflictC learned by the SAT solver, merge the clauses deduced purely from marked clauses into $CCS$. This step is similar to the isolation technique proposed in [78] and [63].

If a satisfying assignment, i.e., a counterexample $test_p$, is found in step 2, we record it in test set $TS$ and remove $p$ from $P$. This process repeats until tests for all properties


are found or the maximum bound $K_{max}$ is reached. Finally, the algorithm returns all generated tests.

Figure 3-3. Synchronized test generation for multiple properties

We use the same example in Figure 3-2 to illustrate the flow of Algorithm 2. The clause forwarding paths are shown in Figure 3-3. In the first iteration for $k = 0$, suppose we randomly pick $p_2$ from the property set. At the beginning, the common conflict clause set $CCS$ is empty. Thus, $p_2$ is solved directly. Since the bound of $p_2$ is 2, the SAT instance is not satisfiable and no test is generated. However, all conflict clauses deduced based on clauses in $CS_T^0$ are now recorded in $CCS$, and will be used to accelerate the solving process of both $p_1$ and $p_3$ at bound 0. Similarly, the conflict clauses generated during solving $p_1$ at $k = 0$ will be used to speed up $p_3$ at $k = 0$ (assuming $p_3$ is solved last). In the next iteration, all instances will be solved with the help of conflict clauses learned by all three SAT instances at $k = 0$, because all conflict clauses are recorded in $CCS$. Eventually, three tests will be generated at bounds 3, 2, and 1 for $p_1$, $p_2$ and $p_3$ respectively. In the case of Figure 3-2D, since $(\neg a_0 \lor b_0 \lor c_1)$, $(a_0 \lor \neg d_1)$ and $(a_0 \lor \neg e_1)$ are all clauses from the transition constraint of the system, both $(b_0 \lor c_1 \lor \neg d_1)$ and $(b_0 \lor c_1 \lor \neg e_1)$ will be recorded in $CCS$ based on Algorithm 2. Therefore, during the solving process of $p_1$ with bound $k = 2$, the SAT solver will skip the assignments $\{b_0, c_1, d_1\} = \{0, 0, 1\}$ and $\{b_0, c_1, e_1\} = \{0, 0, 1\}$. In this way, the unnecessary waste of time is avoided.
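
The Figure 3-2D example can be checked mechanically. The sketch below is not part of the proposed tool: it uses a brute-force model enumerator rather than a real SAT solver, and the helper names (`clause`, `resolve`, `models`) are hypothetical. It derives the two conflict clauses by resolution on $a_0$ and confirms that adding them to the transition clauses does not change the set of satisfying assignments, which is why forwarding them is safe.

```python
from itertools import product


# A literal is (variable, polarity); a clause is a frozenset of literals.
def clause(*literals):
    return frozenset(literals)


def resolve(neg_side, pos_side, var):
    """Resolve on `var`: neg_side contains not-var, pos_side contains var."""
    assert (var, False) in neg_side and (var, True) in pos_side
    return (neg_side - {(var, False)}) | (pos_side - {(var, True)})


def models(clauses, variables):
    """Return all satisfying assignments, found by brute force."""
    found = set()
    for values in product([False, True], repeat=len(variables)):
        assign = dict(zip(variables, values))
        if all(any(assign[v] == pol for v, pol in c) for c in clauses):
            found.add(values)
    return found


VARS = ["a0", "b0", "c1", "d1", "e1"]
transition = [
    clause(("a0", False), ("b0", True), ("c1", True)),  # not a0 or b0 or c1
    clause(("a0", True), ("d1", False)),                # a0 or not d1
    clause(("a0", True), ("e1", False)),                # a0 or not e1
]

cc_d = resolve(transition[0], transition[1], "a0")  # b0 or c1 or not d1
cc_e = resolve(transition[0], transition[2], "a0")  # b0 or c1 or not e1

# Adding the derived clauses must not remove any model of the transition part.
same_models = models(transition, VARS) == models(transition + [cc_d, cc_e], VARS)
print(sorted(cc_d), sorted(cc_e), same_models)
```

Because both derived clauses depend only on transition clauses, the same check would pass for any instance that contains those three clauses, regardless of the property term attached to it.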


Note that our algorithm does not require the SAT instances to be preprocessed using cone-of-influence (COI) optimization as in [79] and [63], because the original SAT instances have more overlapping clauses, which are effectively exploited by our approach to accelerate the overall solving process. Our experimental results in Section 3.3 show that our approach without COI outperforms [79] and [63], which use COI optimization.

In the remainder of this section, we prove the correctness of our approach and discuss the implementation details of our synchronized test generation algorithm.

3.2.1 Correctness of the Proposed Approach

To show the correctness of our test generation approach, we need to show that in Algorithm 2, solving $CCS \cup CS_p^k$ is equivalent to solving $CS_p^k$. Formally, let $\varphi_p^k$ and $\psi$ be the CNF formulae formed by clause sets $CS_p^k$ and $CCS$ respectively. We need to prove that $\varphi_p^k$ is satisfiable iff $\varphi_p^k \land \psi$ is satisfiable, using the following lemma.

Lemma 1. $\varphi_p^k \vdash \psi$ for all $p \in P$ and $k \ge 0$.

Proof. Let $\varphi_T^k$ be the CNF formula formed by $CS_T^k$. We first show that $\varphi_T^k \vdash \psi$ for $k \ge 0$ (formula 3) by induction on the size of $\psi$. In the basis step, formula 3 obviously holds because $\psi$ is empty.

Consider the moment before a new conflict clause $\gamma$ is added to $\psi$ in some iteration when the bound $k' \le k$. $\gamma$ must be deduced from $\varphi_T^{k'} \land \psi$, i.e., $\varphi_T^{k'} \land \psi \vdash \gamma$. By the induction hypothesis, $\varphi_T^k \vdash \psi$ before $\gamma$ is added into $\psi$. We also know that $\varphi_T^k \vdash \varphi_T^{k'}$ because their original forms satisfy

$$I(s_0) \land \bigwedge_{i=0}^{k-1} R(s_i, s_{i+1}) \vdash I(s_0) \land \bigwedge_{i=0}^{k'-1} R(s_i, s_{i+1})$$


Hence, $\varphi_T^k \vdash \varphi_T^{k'} \land \psi$. As a result, we have $\varphi_T^k \vdash \gamma$ and $\varphi_T^k \vdash \psi \land \gamma$, which means formula 3 ($\varphi_T^k \vdash \psi$) still holds after any new clause $\gamma$ is added to $\psi$, as long as $k' \le k$.

On the other hand, we notice that

$$I(s_0) \land \bigwedge_{i=0}^{k-1} R(s_i, s_{i+1}) \land \bigvee_{i=0}^{k} \neg p(s_i) \vdash I(s_0) \land \bigwedge_{i=0}^{k-1} R(s_i, s_{i+1})$$

or $\varphi_p^k \vdash \varphi_T^k$. Therefore, we conclude that $\varphi_p^k \vdash \psi$ for all $p \in P$ and $k \ge 0$.

Since $\varphi_p^k \vdash \psi$, we have $\varphi_p^k \leftrightarrow \varphi_p^k \land \psi$. In other words,

Theorem 3.1. $\forall p \in P$, $\varphi_p^k$ is satisfiable iff $\varphi_p^k \land \psi$ is satisfiable.

The correctness of our approach is therefore justified.

3.2.2 Implementation Details

Our synchronized test generation algorithm is built around the zChaff SAT solver [38], which provides a clause management scheme to support incremental SAT solving. zChaff maintains all input clauses and generated conflict clauses within an internal clause database $DB$. When invoked, it will solve the CNF formed by all clauses currently in $DB$. The management of clauses within database $DB$ is based on groups. For each clause, zChaff assigns a 32-bit group ID. Each bit identifies whether that clause belongs to a certain group or not. When a conflict clause is deduced from clauses in multiple groups, its group ID is the bitwise OR of the group IDs of all its parent clauses, i.e., this clause belongs to multiple groups. zChaff also allows the user to add or remove clauses by group ID between successive solving processes. If one clause belongs to multiple groups, it is removed when any of these groups is removed.
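
The group-based bookkeeping described above can be mimicked with a small Python model. This is a hypothetical stand-in and not zChaff's actual API: the class `ClauseDB`, its method names, and the choice to treat a group number as a bit position are assumptions made only for illustration. Clauses are written as lists of DIMACS-style integer literals.

```python
class ClauseDB:
    """Toy clause database with bitmask group IDs (not zChaff's real API)."""

    def __init__(self):
        self.entries = []  # list of (clause, group_mask) pairs

    def add(self, literals, group):
        """Add an input clause to a single group (0..31); return its mask."""
        assert 0 <= group < 32
        mask = 1 << group
        self.entries.append((frozenset(literals), mask))
        return mask

    def add_conflict(self, literals, parent_masks):
        """Add a conflict clause whose mask is the OR of its parents' masks."""
        mask = 0
        for parent in parent_masks:
            mask |= parent
        self.entries.append((frozenset(literals), mask))
        return mask

    def remove_group(self, group):
        """Drop every clause belonging to `group`, even if it has other groups."""
        bit = 1 << group
        self.entries = [(c, m) for c, m in self.entries if not (m & bit)]

    def clauses(self):
        return {c for c, _ in self.entries}


db = ClauseDB()
t1 = db.add([-1, 2, 3], group=1)   # transition clause
t2 = db.add([1, -4], group=1)      # transition clause
p1 = db.add([4, 5], group=2)       # property-specific clause
pure = db.add_conflict([2, 3, -4], [t1, t2])    # parents only in group 1
mixed = db.add_conflict([2, 3, 5], [pure, p1])  # one parent in group 2
db.remove_group(2)
print(sorted(sorted(c) for c in db.clauses()))
```

After `remove_group(2)`, the property clause and the mixed conflict clause are gone, while the conflict clause derived purely from group 1 survives. This is the behavior that lets transition-only conflict clauses persist between solving runs.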


With these utilities, steps 1 and 3 in Algorithm 2 can be implemented efficiently as follows:

1. In the clause marking step, add all clauses in $CS_T^k \cap CS_p^k$ into $DB$ with group ID 1.
2. Add the other clauses in $CS_p^k$ into $DB$ with group ID 2.
3. After solving all clauses in $DB$ with zChaff, remove clauses with group ID 2.

In this way, $CCS$ is implicitly maintained within $DB$, because only conflict clauses generated purely based on clauses in $CCS \cup CS_T^k$ are kept after each iteration.

There is another potential overhead in step 1. Before we mark a clause in $CS_p^k$, we have to identify whether it is in $CS_T^k$. Since $CS_T^k$ remains the same for all properties at $k$, we build a hash table to record all clauses in $CS_T^k$. It takes $O(1)$ time to determine whether a clause from $CS_p^k$ is in $CS_T^k$. Therefore, the overall time consumption of steps 1 and 3 in Algorithm 2 is negligible compared to the SAT solving time.

3.3 Experiments

We have evaluated our test generation approach using different software and hardware designs. In this section, we compare our approach with existing methods [79] and [63] in three scenarios: a stock exchange system, a VLIW implementation of the MIPS architecture, and ISCAS'89 benchmark circuits. In the first two scenarios, the systems and properties are described in the SMV language and converted to CNF clauses (DIMACS files) using NuSMV [18]. We used zChaff [38] as our SAT solver to implement our test generation algorithm. The experiments were performed on a PC with a 3.0 GHz AMD64 CPU and 4 GB RAM.

3.3.1 A Stock Exchange System

The design in our first case study simulates the behavior of a common online stock exchange system (OSES). It can accept, check and execute the customers' orders (market orders and limit orders). The system is specified using a UML activity diagram and implemented in Java. Its UML behavior specification has 27 activities, 29 transitions and 18 key paths. The specification is translated into NuSMV input to


generate the corresponding SAT instances. Then we apply our synchronized SAT solving approach to find the satisfiable assignments, which can be used as tests. We compared our approach with Strichman's approach [79] and a naive combination of [79] and [63] on different properties with unknown bounds. For Strichman's approach [79], we use it to solve a sequence of SAT instances for the same property with varying bounds until a satisfiable instance is found. The naive combination of [79] and [63] is developed as described in Section 3.2. After SAT instance generation, we applied cone of influence (COI) to speed up Strichman's approach. When our approach was applied, we did not use COI, as indicated in Section 3.2.

Table 3-1. Test generation time comparison for OSES

Prop.  Bound  Our approach  [79]      [79] vs ours  [79]+[63]*  [79]+[63]* vs ours
              Time (s)      Time (s)  Speedup       Time (s)    Speedup
1      15       2.94         180.31   61.24         67.58       22.96
2      14       2.55         150.49   59.06         57.70       22.64
3      14       3.12         149.89   48.04         61.11       19.59
4      15      10.54         139.56   13.25         42.53        4.04
5      14      19.38         130.58    6.74         55.74        2.88
6      14       2.97         107.13   36.09         61.66       20.77
7      16       6.61         101.67   15.39         35.86        5.43
8      16       3.54          89.31   25.20          3.76        1.06
9      15       1.73          84.19   48.72         38.97       22.55
10     12       1.96          84.07   42.80          5.51        2.80
11     13       1.21          83.94   69.48         22.54       18.66
12     15       2.83          83.80   29.59         39.77       14.04
13     15       5.60          83.01   14.81         23.49        4.19
14     14       1.34          80.25   59.88         22.60       16.86
15     14      11.16          79.79    7.15         22.53        2.02
16     15       0.85          78.72   92.39         10.94       12.85
17     15       0.88          78.28   88.95         14.51       16.49
18     15       0.86          78.19   90.49         12.60       14.58
19     12      79.40          74.96    0.94         75.10        0.95
20     12       1.38          73.46   53.23          5.43        3.93
Total  -      160.87        2011.62   12.50        679.93        4.23

* This is an intuitive combination of [79] and [63] (Figure 3-2C). We have shown these results to demonstrate how our approach is superior to any naive combination of existing methods [79] and [63].
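
As a quick sanity check on the Total row of Table 3-1, the overall speedups can be recomputed by dividing each baseline total by the total time of our approach. The function name `speedup` and the variable names are illustrative only, and the per-row speedups in the table may differ slightly from such a recomputation because the printed times are rounded.

```python
def speedup(baseline_seconds, ours_seconds):
    """Speedup of our approach: baseline time divided by our time."""
    return baseline_seconds / ours_seconds


# Totals taken from the last row of Table 3-1 (seconds).
ours_total = 160.87
strichman_total = 2011.62   # [79]
naive_total = 679.93        # naive combination of [79] and [63]

print(f"{speedup(strichman_total, ours_total):.2f}")
print(f"{speedup(naive_total, ours_total):.2f}")
```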


Table 3-1 shows the results for the 20 most time-consuming properties under Strichman's approach [79]. The first column shows the properties used for test generation. The second column indicates the corresponding bound of each property. The third column shows the test generation time (in seconds) for each property using our approach. The time consumed by steps 1 and 3 in Algorithm 2 is also counted in this column. The fourth column indicates the time required by Strichman's approach [79] to generate the test for the same property. The time is calculated as the sum of the time to solve all the SAT instances from $k = 0$ to the bound of the property. The fifth column shows the speedup of our approach over [79], calculated as the previous column divided by the third column. The last two columns present the test generation time using the naive combination of [79] and [63] and the speedup of our approach. It can be seen that our approach produces more than a 10-fold improvement over [79], because many more conflict clauses are reused by subsequent iterations. This is especially important for hard SAT instances, which have to explore a potentially large assignment space. For example, the hardest property $p_1$ for [79] actually consumes less than 3 seconds in our approach. Clearly, the time consumption for solving multiple SAT instances using our approach is significantly smaller than the sum of the time to solve each instance independently. The overall time consumption is reduced by knowledge sharing while solving all properties synchronously.

One interesting observation is that the most time-consuming property $p_{19}$ in our approach has a bound of only 12. The reason for this is that the clauses learned during the solving process of easier properties like $p_{19}$ eliminated some useless search attempts for the solution of harder properties like $p_1$. More importantly, these clauses are more effective than the conflict clauses learned while solving SAT instances of the same property with smaller bounds. Although $p_{19}$ itself, which was solved first, did not


benefit from other properties, the overall time consumption was dramatically reduced. As a result, our approach outperforms [79], which only forwards clauses within SAT instances of the same property.

For the naive combination of [79] and [63], we chose $p_{19}$ as the base property and forwarded the clauses learned while solving it to other properties at bound 11. These parameters are selected to illustrate the best possible performance of the combination. It is remarkably faster than Strichman's approach [79], although it is still 4 times slower than our approach. It should be noted that in reality, it is impossible to choose the optimal parameters for this combination because the bounds are unknown for all properties. In other words, the performance of the naive combination of [79] and [63] will be much worse than what we illustrated here. Thus, our approach will outperform it more significantly in practical scenarios.

We also investigated the impact of cluster size on the overall solving time. The 135 properties in total are clustered into groups of different sizes. The results are shown in Figure 3-4. Figure 3-4A presents the overall solving time with respect to different average cluster sizes. Figure 3-4B shows the corresponding average number of forwarded clauses per cluster (solid curve) and the total number of conflicts encountered for different cluster sizes (dotted curve). Their values can be found on the left and right Y-axis respectively. The results suggest that larger clusters are generally helpful in reducing the overall solving time. The reason is that the number of forwarded clauses usually increases with the average cluster size, which can effectively reduce the total number of conflicts encountered during the solving process.

3.3.2 A VLIW MIPS Processor

We also applied our test generation approach to a single-issue MIPS processor [42], [64]. There are five pipeline stages: fetch, decode, execute, memory access, and writeback. The execute stage has four parallel execution paths: integer ALU, 7


Figure 3-4. Test generation for OSES using different cluster sizes. A) Time consumption. B) Forwarded clauses and encountered conflicts.

stage multiplier (MUL1-MUL7), four-stage floating-point adder (FADD1-FADD4), and multi-cycle divider (DIV).

We translated the design into NuSMV input and used the three approaches to solve the generated SAT instances for different properties and bounds. For the combination of [79] and [63], we chose $p_{17}$ as the base property and forwarded learned clauses to bound 7. The results are given in Table 3-2. We only show the results for the 20 most time-consuming properties under Strichman's approach. It can be seen that our


approach outperforms both Strichman's approach [79] and the naive combination of [79] and [63], by factors of 15.43 and 3.98 respectively (Total row of Table 3-2).

Table 3-2. Test generation time comparison for MIPS

Prop.  Bound  Our approach  [79]      [79] vs ours  [79]+[63]*  [79]+[63]* vs ours
              Time (s)      Time (s)  Speedup       Time (s)    Speedup
1      8        0.78         139.29   179.48        18.66       24.04
2      8        0.74         132.07   178.46        19.45       26.29
3      8        0.76         125.18   164.70        18.18       23.93
4      8        0.76         120.02   158.74        18.45       24.40
5      8        0.76         115.84   151.61        27.14       35.53
6      9        0.86         111.13   129.81        58.26       68.06
7      8        0.81         108.09   133.76        26.63       32.95
8      9        0.95         104.56   110.29        53.59       56.52
9      8        0.75          96.25   128.67        16.77       22.41
10     8        0.77          87.24   113.00        16.47       21.33
11     8        0.76          87.23   114.77        17.37       22.85
12     8        0.77          84.98   110.64        16.45       21.42
13     7        0.65          81.08   125.11        13.35       20.60
14     9       32.31          80.25     2.48        31.61        0.98
15     8        0.76          75.47    99.30         7.25        9.54
16     8        0.76          72.05    94.30        20.63       26.99
17     7       76.54          71.72     0.94        72.30        0.94
18     8        1.00          70.05    70.33        19.46       19.53
19     8        0.76          69.85    91.90         6.98        9.19
20     8        0.76          65.80    87.03        11.08       14.65
Total  -      122.99        1898.13    15.43       490.06        3.98

The impact of cluster size on the overall solving time is shown in Figure 3-5. There are 170 properties in total. It can be observed that the overall solving time becomes constant once the average cluster size exceeds 50 (Figure 3-5A). At the same time, the number of forwarded clauses per cluster stops increasing, as indicated by the dotted curve in Figure 3-5B. This phenomenon can be explained by the fact that once the clusters are large enough to include all the similar properties, the overall solving time will not be further improved and the number of forwarded clauses becomes stable.

Figure 3-5. Test generation for MIPS using different cluster size. A) Time consumption. B) Forwarded clauses and encountered conflicts.

3.3.3 Circuit Test Generation

We applied our test generation approach to activate stuck-at faults using the ISCAS'89 benchmark. For each circuit, we search for input sequences which can generate 0 and 1 on each output port. Benchmark circuits are translated into CNF using standard formulae in zChaff. The results are given in Table 3-3. The solving time limit for each property is 100 seconds. We only show the results on 5 circuits with maximum total

test generation time using Strichman's approach. It can be seen that our approach outperforms Strichman's approach [79], especially for complex circuits like s38584.

Table 3-3. Test generation time comparison for circuits

Circuit   #Prop.   Our approach   [79]       [79] vs ours
                   Time (s)       Time (s)   Speedup
s13207    304      481            568        1.18
s15850    300      241            270        1.13
s35932    640      220            232        1.05
s38417    212      167            210        1.25
s38584    606      2543           3377       1.32
Total     -        3652           4657       1.28

In order to investigate the impact of cluster size on the overall solving time, we also applied our test generation technique on circuit s38584 with different cluster sizes. There are 606 properties to be checked on the design. The results are shown in Figure 3-6. It can be seen that although the solving time still decreases at the beginning when larger cluster size is used, it might not always be optimal to cluster all properties into a single group. The reason is that too large a cluster size may cause many forwarded clauses to be accumulated in the SAT solver's database, as indicated by the solid curve in Figure 3-6B. Too many forwarded clauses can mislead the searching process of the SAT solver, which will eventually increase the overall solving time.

3.4 Summary

Automatic generation of directed tests is promising for simulation based functional validation because it requires fewer test vectors to achieve the same coverage requirement. However, its applicability is limited due to the capacity restriction of current model checking tools. Existing incremental SAT approaches are suitable only for a single property with unknown bound or for multiple properties with known bounds. We presented an efficient technique for test generation by reusing learned knowledge across multiple properties and different bounds. To enable knowledge sharing among properties as well as bounds, we presented a synchronized test generation technique

for multiple properties with different bounds. SAT instances for different properties are solved together, so that the discovery and utilization of the common conflict clauses can be maximized. The overall time consumption of checking multiple properties using our approach is remarkably smaller than the summation of time to check each property independently. Our experimental results on both hardware and software designs demonstrated an order-of-magnitude reduction in overall test generation time.

Figure 3-6. Test generation for circuits using different cluster size. A) Time consumption. B) Forwarded clauses and encountered conflicts.

CHAPTER 4
EFFICIENT TEST GENERATION FOR MULTICORE ARCHITECTURES

Chapter 3 explored how to reuse the knowledge during test generation in single-core designs. When SAT-based BMC is applied to generate directed tests for multicore architectures, there are two different categories of symmetry in the corresponding SAT instances. The first category is the temporal symmetry. It occurs because the SAT instance is encoded by unrolling the same architecture multiple times. This regularity has already been exploited by existing research [79] to accelerate the SAT solving process. On the other hand, the structural similarity of multiple cores also introduces a second category of symmetry, or spatial symmetry. This symmetry appears among the CNF clauses for different cores at the same time step. Intuitively, we can also exploit spatial symmetry by reusing the knowledge obtained from one core to other cores. Unfortunately, this intuitive reasoning is hard to implement because it is very difficult to reconstruct the symmetry from the CNF formula. The high level information is lost during CNF synthesis, and it is inefficient as well as computationally expensive to recover through reverse engineering methods.

In this work [69], we address the directed test generation for multicore architectures by developing a novel BMC based test generation technique, which enables the reuse of learned knowledge from one core to the remaining cores in the multicore architecture. Instead of direct synthesis of the CNF for the multicore design, we compose the CNF description of the entire design using CNF formulae for cores and the memory subsystem. Since the CNF representations of cores are generated by performing variable substitution of the CNF for one of them, the correct mapping information is easily obtained. In this way, we are able to translate and reuse the conflict clauses learned on any core to other cores. We prove that the CNF description generated by our approach has the same satisfiability as original methods. Our experimental results demonstrate that our approach can remarkably reduce the overall test generation time.

The rest of the chapter is organized as follows. Section 4.1 describes our test generation methodology for multicore architectures. Section 4.2 presents our experimental results. Finally, Section 4.3 summarizes the chapter.

4.1 Test Generation for Multicore Architectures

Our work is motivated by previous works on incremental SAT-based BMC [79]. Based on the temporal symmetry between different bounds, these methods accelerate the SAT solving process by passing the knowledge (deduced conflict clauses) in the temporal direction. Nevertheless, the SAT instances generated by multicore designs also exhibit remarkable spatial symmetry. Figure 4-1 depicts the high level structure of a system with 2 cores. Both cores are identical¹ and connected to the memory subsystem with a bus. Figure 4-2 shows the SAT solving process when we perform BMC for bounds 0, 1, 2, and 3 on this multicore architecture using the technique proposed in [79]. We use solid dots to represent different SAT instances and lines to indicate the conflict clause forwarding paths. Although different cores have identical structures, this spatial symmetry is not exploited.

¹ We first discuss our approach in the context of homogeneous cores. The application of our approach on heterogeneous cores will be presented in Section 4.1.3.

Figure 4-1. Abstracted architecture of a two core system

Intuitively, it should be beneficial if the knowledge (or conflict clauses) can also be shared vertically among different cores as shown in Figure 4-3, because the

solving effort spent on a single core can be reused by other cores to save overall time consumption. Unfortunately, the spatial symmetry is difficult to recover from the CNF representation of the SAT instance. The reason is that most clauses contain auxiliary variables introduced during the CNF encoding process. Since these auxiliary variables are unlabeled, the correspondence between clauses from different cores cannot be established directly. Although the spatial symmetry can be partially recovered by solving a graph automorphism problem [3, 5, 30], it may require impractical time for large designs, because no polynomial time solution is known for the graph automorphism problem. The underlying reason for this dilemma is that the high level information is lost after the CNF encoding. In other words, a single flattened CNF SAT instance is not suitable to exploit the spatial symmetry.

Figure 4-2. Incremental SAT solving technique [79]

Figure 4-3. Test generation for multicore architectures

Instead of using a monolithic CNF as input, our approach solves this problem by composing the CNF description of the system using CNF formulae for one core, bus and the memory subsystem. Since the cores are identical, their CNF representations are identical as well. We just need to perform variable name substitution to obtain the CNF for all other cores. As shown in Theorem 4.1, when the state variables are substituted by the correct names, the system CNF composed by these replicated CNF for cores, bus as well as memory subsystem will have the same satisfiability behavior as the original monolithic CNF representation. Since both the state variables and auxiliary variables in replicated cores are assigned by our algorithm, it is easy to obtain the correct mapping between variables and clauses in different cores. The spatial symmetry can then be effectively exploited during the SAT solving process. Before we describe our algorithm in details, we first introduce some notations.

Definition 1. Symmetric Component (SC) is a set of identical finite state machines (FSM). For the $j$-th FSM within a SC, we denote its initial condition and transitional constraints as $I(s^{is}_{0,j})$ and $R(s^{is}_{i,j}, s^{in}_{i,j}, s^{is}_{i+1,j}, s^{out}_{i+1,j})$ $(0 \le i \le k-1)$, where $s^{in}_{i,j}$, $s^{out}_{i+1,j}$, $s^{is}_{i,j}$ are its input variables, output variables, and internal state variables at the $i$-th ($(i+1)$-th) time step. It should be noted that a symmetric component itself can also be viewed as FSM, whose input and output variables are the collection of all the input and output variables of FSMs within it.

Figure 4-4. FSM representation of Figure 4-1 at time step $i$
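The variable name substitution described above can be illustrated with a small sketch. It is not the thesis implementation: it assumes DIMACS-style integer literals, a single time step, and that core 1 owns variables 1..V while core c owns the next block of V indices. The names `replicate_core_cnf`, `map_clause` and `vars_per_core` are hypothetical.

```python
from typing import List

Clause = List[int]  # DIMACS-style literals: +v is variable v, -v is its negation


def replicate_core_cnf(core1_cnf: List[Clause], vars_per_core: int,
                       target_core: int) -> List[Clause]:
    """Rename the CNF of core 1 into the CNF of `target_core` (1-based).

    Assumption: core 1 uses variables 1..vars_per_core and core c uses the
    block shifted by (c - 1) * vars_per_core.
    """
    if target_core < 1:
        raise ValueError("cores are numbered from 1")
    offset = (target_core - 1) * vars_per_core
    replicated = []
    for clause in core1_cnf:
        for lit in clause:
            if lit == 0 or abs(lit) > vars_per_core:
                raise ValueError("literal outside the variable block of core 1")
        # Shift the variable index and keep the sign of the literal.
        replicated.append([lit + offset if lit > 0 else lit - offset
                           for lit in clause])
    return replicated


def map_clause(clause: Clause, vars_per_core: int,
               src_core: int, dst_core: int) -> Clause:
    """Translate a clause over the variables of src_core to dst_core."""
    shift = (dst_core - src_core) * vars_per_core
    return [lit + shift if lit > 0 else lit - shift for lit in clause]


# Two clauses of core 1 with a=1, b=2, c=3, d=4: (-a v b v c) and (a v -d).
core1 = [[-1, 2, 3], [1, -4]]
core2 = replicate_core_cnf(core1, vars_per_core=4, target_core=2)
# A clause learned on core 1, (b v c v -d), translated for core 2.
forwarded = map_clause([2, 3, -4], vars_per_core=4, src_core=1, dst_core=2)
```

Because the renaming is a fixed offset, the correspondence between a clause and its copies never has to be searched for, which is the property the composed CNF is meant to preserve.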

In a multicore system with $N_S$ identical cores, we model the set of all cores as a symmetric component $F_S$. Other asymmetric components, such as bus and memory subsystem, are modeled as a single finite state machine $F_A$. We also map the input and output of $F_A$ to the output and input of $F_S$ so that different cores can communicate through the bus and memory subsystem. Formally, we denote the initial condition and transition constraints of $F_A$ as $I(s^{A}_{0})$ and $R(s^{A}_{i}, s^{Sout}_{i}, s^{A}_{i+1}, s^{Sin}_{i+1})$ $(0 \le i \le k-1)$, where $s^{A}_{i}$ represents the internal state variables in the bus and memory subsystem at the $i$-th time step. Moreover, $s^{Sin}_{i} = \{ s^{in}_{i,j} \mid 1 \le j \le N_S \}$ and $s^{Sout}_{i} = \{ s^{out}_{i,j} \mid 1 \le j \le N_S \}$ are the input and output variables of the symmetric component $F_S$, which is the combination of the inputs and outputs of all cores. For example, Figure 4-4 shows the FSM representation of the system in Figure 4-1. The symmetric component $F_S$ is composed of core 1 and core 2. The rest of the system is represented by $F_A$. In the $i$-th time step, the internal state variables of $F_S$ and $F_A$ are $\{ s^{is}_{i,1}, s^{is}_{i,2} \}$ and $s^{A}_{i}$, respectively. The input and output variables of $F_S$ (also the output and input variables of $F_A$) are $s^{Sin}_{i} = \{ s^{in}_{i,1}, s^{in}_{i,2} \}$ and $s^{Sout}_{i} = \{ s^{out}_{i,1}, s^{out}_{i,2} \}$, respectively.

The BMC formula of the multicore system can be expressed as

\[
\begin{aligned}
BMC(M,p,k) &= I(s_0) \wedge \bigwedge_{i=0}^{k-1} R(s_i, s_{i+1}) \wedge \bigvee_{i=0}^{k} \neg p(s_i) \\
&= I(s^{A}_{0}) \wedge \bigwedge_{j=1}^{N_S} I(s^{is}_{0,j}) \wedge \bigwedge_{i=0}^{k-1} \Big( R(s^{A}_{i}, s^{Sout}_{i}, s^{A}_{i+1}, s^{Sin}_{i+1}) \wedge \bigwedge_{j=1}^{N_S} R(s^{is}_{i,j}, s^{in}_{i,j}, s^{is}_{i+1,j}, s^{out}_{i+1,j}) \Big) \wedge \bigvee_{i=0}^{k} \neg p(s_i)
\end{aligned}
\]

The basic idea of our approach is to generate the CNF formula

\[
BMC'(M,p,k) = CNF^{A}_{I} \wedge \bigwedge_{j=1}^{N_S} CNF^{S}_{I}(j) \wedge \bigwedge_{i=0}^{k-1} \Big( CNF^{A}_{R}(i) \wedge \bigwedge_{j=1}^{N_S} CNF^{S}_{R}(i,j) \Big) \wedge CNF_{p}(k)
\]

Algorithm 3: Test Generation for Multicore Architectures
Input: CNF formulae $CNF^{A}_{I}$, $CNF^{S}_{I}$, $CNF^{A}_{R}(i)$, $CNF^{S}_{R}(i,1)$, $CNF_{p}(k)$; number of cores $N_S$; maximum bound $K_{max}$
Output: Test $test_p$
  Bound $k \leftarrow 0$;
  Initialize variable mapping table $T$;
  CommonClauseSet $CCS \leftarrow \emptyset$;
  Generate $CNF^{S}_{I}(j)$ using $CNF^{S}_{I}$ for 1

table $T$² to record the symmetric set of variables for both state variables and auxiliary variables. After that, we invoke the SAT solving process on the conjunction of clauses in CCS and $CNF_{p}(k)$, which is equivalent to $BMC'(M,p,k)$ defined above. Next, we perform the following 2 steps.

1. During SAT solving, analyze any conflict clause $cls$ found by the SAT solver. If $cls$ is purely deduced by the clauses which belong to a single FSM, replicate and forward $cls$ to all other FSMs. This is implemented by substituting the variables in $cls$ by their counterparts for each FSM in $F_S$ based on table $T$. At the same time, we also replicate the $cls$ in temporal direction, as discussed in [79].

2. After the solving process, only keep new conflict clauses that are deduced independent of $CNF_{p}(k)$, and merge them into $CCS$.

If the satisfied assignment, or a counterexample $test_p$, is found in step 1, the algorithm returns it as a test. Otherwise, the algorithm repeats for each bound $k$ until the maximum bound is reached.

² As discussed in Section 4.1.2, a physical table is not required; instead, a mapping function is used in our framework.

Figure 4-5. Test generation for multicore architectures

We use the same example in Figure 4-1 to illustrate the flow of Algorithm 3. The two different clause forwarding paths employed in our approach are shown in Figure 4-5. Suppose $(\neg a_i \vee b_i \vee c_{i+1})$ and $(a_i \vee \neg d_{i+1})$ are two clauses within $CNF^{S}_{R}(i,1)$ (transition

constraint of Core 1), in the first iteration for $k = 0$, two clauses $(\neg a'_i \vee b'_i \vee c'_{i+1})$ and $(a'_i \vee \neg d'_{i+1})$ will be produced during the generation of $CNF^{S}_{R}(i,2)$ (transition constraint of Core 2). In the subsequent SAT solving process, suppose a conflict clause $(b_i \vee c_{i+1} \vee \neg d_{i+1})$ is deduced based on $(\neg a_i \vee b_i \vee c_{i+1})$ and $(a_i \vee \neg d_{i+1})$, it will be forwarded to Core 2, because its two parent clauses are all from the CNF formula for Core 1. Therefore, $(b'_i \vee c'_{i+1} \vee \neg d'_{i+1})$ can now be used by Core 2 to prevent the partial assignment $\{ b'_i, c'_{i+1}, d'_{i+1} \} = \{ 0, 0, 1 \}$, which will result in a conflict on $a'_i$. Such forwarding of conflict clauses is not possible using Strichman's approach [79], which only considers temporal symmetry but not spatial symmetry.

In the remainder of this section, we prove the correctness of our approach and discuss the implementation details of our directed test generation algorithm for multicore architectures.

4.1.1 Correctness of Our Proposed Approach

To prove the correctness of our test generation approach, we need to ensure that the produced CNF formula $BMC'(M,p,k)$ in Algorithm 3 has the same satisfiability as $BMC(M,p,k)$.

Theorem 4.1. $BMC(M,p,k)$ and $BMC'(M,p,k)$ have the same satisfiability.

Proof. Clearly, we have

\[
\begin{aligned}
BMC(M,p,k) &= I(s_0) \wedge \bigwedge_{i=0}^{k-1} R(s_i, s_{i+1}) \wedge \bigvee_{i=0}^{k} \neg p(s_i) \\
&= I(s^{A}_{0}) \wedge \bigwedge_{j=1}^{N_S} I(s^{is}_{0,j}) \wedge \bigwedge_{i=0}^{k-1} \Big( R(s^{A}_{i}, s^{Sout}_{i}, s^{A}_{i+1}, s^{Sin}_{i+1}) \wedge \bigwedge_{j=1}^{N_S} R(s^{is}_{i,j}, s^{in}_{i,j}, s^{is}_{i+1,j}, s^{out}_{i+1,j}) \Big) \wedge \bigvee_{i=0}^{k} \neg p(s_i)
\end{aligned}
\]

By their definitions, CNF formulae $CNF^{A}_{I}$, $CNF^{S}_{I}(j)$, $CNF^{A}_{R}(i)$, $CNF^{S}_{R}(i,j)$ and $CNF_{p}(k)$ are CNF representations of propositional formulae $I(s^{A}_{0})$, $I(s^{is}_{0,j})$, $R(s^{A}_{i}, s^{Sout}_{i}, s^{A}_{i+1}, s^{Sin}_{i+1})$, $R(s^{is}_{i,j}, s^{in}_{i,j}, s^{is}_{i+1,j}, s^{out}_{i+1,j})$ and $\bigvee_{i=0}^{k} \neg p(s_i)$, where $0 \le i \le k-1$ and $1 \le j \le N_S$.

Therefore, $BMC(M,p,k)$ has the same satisfiability as

\[
BMC'(M,p,k) = CNF^{A}_{I} \wedge \bigwedge_{j=1}^{N_S} CNF^{S}_{I}(j) \wedge \bigwedge_{i=0}^{k-1} \Big( CNF^{A}_{R}(i) \wedge \bigwedge_{j=1}^{N_S} CNF^{S}_{R}(i,j) \Big) \wedge CNF_{p}(k)
\]

because the auxiliary variables introduced during CNF conversion do not change the satisfiability. In other words, $BMC(M,p,k)$ and $BMC'(M,p,k)$ have the same satisfiability.

In fact, the values of state variables in a satisfying assignment of $BMC'(M,p,k)$ also satisfy $BMC(M,p,k)$ and therefore can be used as a counterexample of the property $p$. The reason is that the values of the variables in a satisfying assignment of $BMC'(M,p,k)$ will also satisfy all CNF formulae $CNF^{A}_{I}$, $CNF^{S}_{I}(j)$, $CNF^{A}_{R}(i)$, $CNF^{S}_{R}(i,j)$ and $CNF_{p}(k)$. Thus, the values of the state variables will satisfy the corresponding propositional formulae $I(s^{A}_{0})$, $I(s^{is}_{0,j})$, $R(s^{A}_{i}, s^{Sout}_{i}, s^{A}_{i+1}, s^{Sin}_{i+1})$, $R(s^{is}_{i,j}, s^{in}_{i,j}, s^{is}_{i+1,j}, s^{out}_{i+1,j})$ and $\bigvee_{i=0}^{k} \neg p(s_i)$. Hence, they together will satisfy $BMC(M,p,k)$, which is a conjunction of the above propositional formulae. Therefore, the correctness of our algorithm is justified.

4.1.2 Implementation Details

Our test generation algorithm for multicore architectures is built around the NuSMV model checker [18] and the zChaff SAT solver [38]. We first model the system using the SMV language, then use NuSMV to generate the CNF formulae $CNF^{A}_{I}$, $CNF^{S}_{I}$, $CNF^{A}_{R}(i)$, $CNF^{S}_{R}(i,1)$ and $CNF_{p}(k)$ in DIMACS format as the input of Algorithm 3. zChaff is employed as the internal SAT solver. In this section, we briefly explain the CNF generation process and the implementation of Step 1 and Step 2 in Algorithm 3.

The generation of CNF descriptions for a single core, bus and memory subsystem using NuSMV is straightforward. The only practical consideration is that all variables are represented by their indices in CNF clauses. As a result, it is important to avoid the same index being used by two different variables. Since NuSMV does not offer any

external interface to control the index assignment, we modified the source code to make the index space suitable for our purpose. The basic idea is to make the assignment of indices satisfy the following two constraints: 1) the indices of variables from the same core at the same time step are assigned contiguously; 2) the indices of variables of the same time step across cores are assigned contiguously as well. For example, in a 2-core system with each core having 100 variables, in time step 1 for core 1 we can use indices from 1-100 (controlled by the first constraint) whereas the second constraint indicates that the variables for core 2 at time step 1 should be 101-200. Therefore, 201-300 can be used to represent variables of core 1 in time step 2, and so on. Based on these two constraints, the computation of the indices of symmetric variables can be efficiently implemented as increasing or decreasing by a certain offset.

During SAT solving, we also need to track the dependency of generated conflict clauses to determine whether they can be forwarded to other cores. This can be easily implemented within zChaff, which provides a clause management scheme to support incremental SAT solving. For each clause in its clause database $DB$, zChaff uses a 32-bit group ID to track the dependency. Each bit identifies whether that clause belongs to a certain group. When a conflict clause is deduced based on clauses from multiple groups, its group ID is the bitwise OR of the group IDs of all its parent clauses, i.e., this clause belongs to multiple groups. zChaff also allows the user to add or remove clauses by group ID between successive solving processes. If one clause belongs to multiple groups, it is removed when any of these groups are removed.

With these mechanisms, steps 1 and 2 in Algorithm 3 can be implemented efficiently as follows:

1. Add clauses in $CNF^{S}_{I}(j)$ and $CNF^{S}_{R}(i,j)$ with group ID $j$ $(1 \le j \le N_S)$.

2. Add clauses in $CNF^{A}_{I}$, $CNF^{A}_{R}(i)$ with group ID $N_S + 1$.

3. Add clauses in $CNF_{p}(k)$ with group ID $N_S + 2$.

4. When a new conflict clause is obtained during SAT solving, if it only belongs to a single group with ID smaller than $N_S + 1$, replicate this clause to all other cores with proper group ID.

5. After solving all clauses in $DB$ with zChaff, remove clauses with group ID $N_S + 2$.

The overhead introduced by dependency identification and tracking in our algorithm is negligible compared to the improvement in SAT solving time. At the same time, since the indices of variables in symmetric cores are carefully assigned, the mapping table $T$ is not maintained explicitly, but implemented as a simple mapping function, which is used to generate forwarding clauses for different cores. In that way, we avoid the potential caching overhead which may deteriorate the performance of the SAT solver.

4.1.3 Heterogeneous Multicore Architectures

So far, we discussed our algorithm using homogeneous cores. This section describes the application of our approach in the presence of heterogeneous cores. In a heterogeneous multicore system, if any two cores are completely different, it is not possible to reduce the test generation time by exploiting the symmetry. However, most real systems usually employ a cluster of identical cores for the same computational purpose. In this case, we can first group them into symmetric components based on their types, then apply our algorithm to each symmetric component. For example, in the 5-core system shown in Figure 4-6, core 5 is used for monitoring and cores 1-4 are identical cores for computation. We can define cores 1-4 as the symmetric component and apply our algorithm on them. In general, we can apply our algorithm on each cluster of identical cores in a system.

However, when the heterogeneous cores are not completely different, i.e., only some functional units in them are different, our proposed algorithm can be employed in a more efficient way. Recall that the FSMs in a symmetric component are not restricted to cores. We can actually define the symmetric component in such a way that it includes only the identical functional units in different cores. For example, Figure 4-7 shows a system with heterogeneous cores. Both of the cores are pipelined with five stages:

fetch, decode, execute, memory access, and writeback. The only difference is that they have different implementations in the execute stage (EX). In this case, we define our symmetric component $F_S$ as the set of all functional units in the two cores except EX. These two execution stages as well as the bus and memory subsystem are modeled in the asymmetric part $F_A$. Of course, the input and output of $F_S$ here will include not only the input and output variables of the cores, but also all the interface variables between EX and other stages. In this way, the information learned on all other stages of one core can still be shared by the other core. Clearly, the correctness of our approach is still guaranteed, because the selection of the symmetric component satisfies its definition.

Figure 4-6. Multicore system with different types of cores

Figure 4-7. Multicore system with different types of execution units
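The grouping of identical cores into symmetric components described in this section can be sketched as follows. This is only an illustration under assumptions: each core is tagged with a type label, and learned clauses are forwarded only between cores that share a label. The function names and the labels "compute" and "monitor" are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def symmetric_components(core_types: Dict[int, str]) -> Dict[str, List[int]]:
    """Group core IDs by type; only groups with at least two cores are useful."""
    groups = defaultdict(list)
    for core_id in sorted(core_types):
        groups[core_types[core_id]].append(core_id)
    # A type with a single core offers no spatial symmetry to exploit.
    return {kind: ids for kind, ids in groups.items() if len(ids) > 1}


def forwarding_targets(core_id: int, core_types: Dict[int, str]) -> List[int]:
    """Cores that may receive a clause learned purely on `core_id`."""
    kind = core_types[core_id]
    return [c for c in sorted(core_types)
            if c != core_id and core_types[c] == kind]


# The 5-core example: cores 1-4 compute, core 5 monitors.
types = {1: "compute", 2: "compute", 3: "compute", 4: "compute", 5: "monitor"}
components = symmetric_components(types)
targets_of_core1 = forwarding_targets(1, types)
targets_of_core5 = forwarding_targets(5, types)
```

The same grouping applies when the labels identify shared functional units (for instance, every stage except EX) rather than whole cores.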

4.2 Experiments

We have evaluated the applicability and usefulness of our test generation technique on different multicore architectures.

4.2.1 Experimental Setup

As described in Section 4.1.2, the designs and properties are described in the SMV language and converted to the required CNF formulae (DIMACS files) using modified NuSMV [18]. We used zChaff [38] as our SAT solver to implement our test generation algorithm. Experiments were performed on a PC with a 3.0 GHz AMD64 CPU and 4 GB RAM.

First, we present results of our approach using a multicore design that is composed of different numbers of identical cores, one bus, and a memory subsystem. The pipeline inside each core has five stages: fetch, decode, execute, memory access, and writeback. Besides, each core has its own cache, which is connected with the memory through the bus. Next, we will present in Figure 4-10 the applicability of our approach on heterogeneous multicore architectures.

In order to activate the desired system behaviors, we used different numbers of properties on designs with different complexity. For instance, we used 375 properties in the case of the 16 core design that trigger two simultaneous activities between cores. We have also used several properties that involve multicore interactions. For example, one test will activate the following scenario: if the value in a memory location, which is initialized as one by core 1, is increased by one by all other cores, it should be equal to the number of cores when it is read back by core 2. It should be noted that the corresponding property is not symmetric with respect to all cores.

4.2.2 Results

We compared our approach with Strichman's approach [79] and original BMC [28]. Each approach was used to solve a sequence of SAT instances for the same property with varying bounds until a satisfiable instance is found. The input SAT instances for

Strichman's approach and the original BMC were directly synthesized from $BMC(M,p,k)$ to improve their performance. When our approach was applied, we performed the SAT solving on $BMC'(M,p,k)$ as indicated in Section 4.1. We also tried to compare with [3]. Unfortunately, the implementation [4] failed to produce the symmetry breaking predicates due to the large size of our input CNF (more than 600k clauses).

Figure 4-8. Test generation time with different number of cores

Figure 4-8 presents the average test generation time for different numbers of cores. The original BMC failed to produce results within 3000 seconds on several properties for the 16 core system. Therefore, its time is omitted. As expected, the time consumption increases with the number of cores. Both our approach and Strichman's approach [79] are remarkably faster than original BMC [28]. By effective utilization of both spatial and temporal symmetry, our approach outperforms [79] (which only considers temporal symmetry) by nearly 2 times.

Table 4-1 shows a more detailed comparison of different approaches on the 8 core system for the 10 most time consuming properties. The first column represents the names of properties used. The second column shows the corresponding bounds (or time steps) to activate each property. The next three columns present the test generation time (in seconds) for each property using the original BMC [28], Strichman's approach [79],

and our approach, respectively. The time is calculated as the summation of the time to solve all the SAT instances from $k = 0$ to the bound of the property. The time calculation also includes the time consumed by non-SAT-solving steps in Algorithm 3. The last two columns indicate the speedup of our approach over [28] and [79]. It can be seen that our approach outperforms [79] by two times and [28] by an order of magnitude.

Table 4-1. Test generation time for 8 core system

Prop.   Bound   [28]       [79]       Our approach   Speedup     Speedup
                Time (s)   Time (s)   Time (s)       over [28]   over [79]
1       28      79         56         25             3.16        2.24
2       22      67         44         21             3.19        2.10
3       32      93         62         30             3.10        2.07
4       28      208        94         17             12.24       5.53
5       33      *          342        148            -           2.31
6       20      413        124        47             8.79        2.64
7       20      *          125        48             -           2.60
8       23      883        140        63             14.02       2.22
9       25      2106       157        128            16.45       1.23
10      25      1991       106        101            19.71       1.05
Total   -       5840       1250       628            9.30        1.99

* represents run times exceeding 3000 sec.

Table 4-2. Detailed test generation information

        [79]                                            Our approach
k       #Cls in DB   #Decision   #Fwd Cls   Time (s)    #Cls in DB   #Decision   #Fwd Cls   Time (s)
19      721427       40045       25608      2.4         756149       21231       4441       1.2
20      762855       71854       27329      3.6         857103       30049       26685      2.7
21      827272       56692       22824      3.4         900428       35687       24534      3.1
22      893382       203112      102202     15.4        965925       30873       6834       1.9
23      954998       2652411     142585     97.3        1029266      1228603     261989     52.8
Total   -            3024114     320548     122.1       -            1346443     324483     61.7

To inspect the reason for our improvement over [79], we analyze the behavior of the SAT solver. Table 4-2 shows details of the last five SAT instances immediately before the bound was found during the BMC of property 8 on the 8-core system (highlighted entry in Table 4-1). The first column in Table 4-2 is the time step of each SAT instance. The next four columns contain the real size of the clause database before the solving process, the number of decisions made by zChaff, the number of forwarded conflict

clauses and the time consumption in [79]. Similar information of our approach is represented in the last four columns. Compared to [79], the total number of decisions made by the SAT solver is much smaller when our approach is applied. At the same time, the numbers of forwarded clauses are comparable. In other words, our approach saves the time to rediscover the same knowledge for each core, without the overhead of forwarding too many conflict clauses.

Figure 4-9. Test generation time with different interactions

We also investigated the impact of the number of cores involved in the interaction on the test generation time. In this experiment, we use a processor with eight 3-stage cores. They are connected to the memory subsystem using a snoopy protocol. The desired test should trigger all cores to perform read and write operations on the same shared memory variable in a certain order. The results are given in Figure 4-9. When the interaction involves only a small number of cores, the difference in test generation time of [28], [79], and our approach is quite small. However, when more and more cores are involved, our approach outperforms both [28] and [79] remarkably, due to the usage of symmetry information.

Figure 4-10. Test generation time with heterogeneous cores

Finally, to illustrate the effectiveness of our approach in a more general scenario, we measure the test generation time on a system with heterogeneous cores. We use cores with different implementations in their fetch, issue, and execution stages, and repeat the previous test generation experiment. As discussed in Section 4.1.3, we only replicate learned conflict clauses within the symmetric components. Figure 4-10 shows the result. The "Fetch" curve corresponds to a system where the 8 cores are identical except for their fetch stages. Similarly, curves marked as "Issue" and "Execution" represent cores with different issue and execution stages, respectively. We also show the test generation time for homogeneous cores using our approach ("None") and [79] as reference. It can be observed that due to less scope of knowledge reuse, the time consumption of our approach for heterogeneous cores is generally larger than for homogeneous cores. Nevertheless, our approach still outperforms [79], especially for complicated interactions involving many cores.

4.3 Summary

Functional verification of multicore architectures is challenging due to the increased design complexity and reduced time-to-market. Existing incremental SAT approaches

have only exploited the symmetry in BMC across different time steps. We presented a novel approach for directed test generation of multicore architectures that exploits both spatial and temporal symmetry in SAT-based BMC. The CNF description of the design is synthesized using CNF for cores, bus and memory subsystem to preserve the mapping information between different cores. As a result, the symmetric high level structure is well preserved and the knowledge learned from a single core can be effectively shared by other cores during the SAT solving process. The experimental results using homogeneous as well as heterogeneous multicore architectures demonstrated that the test generation time using our approach is remarkably smaller (up to 10 times) compared to existing methods.

CHAPTER 5
VALIDATION OF CACHE COHERENCE PROTOCOLS

Caching has been the most effective approach to reduce the memory access time for several decades. When the same data is cached by different processors, cache coherence protocols are employed to coordinate the accesses and guarantee that the most recently written data is returned. As the protocols are growing more and more complex, the verification teams are facing significant challenges to achieve the required coverage within a tight time-to-market window.

¹ In this chapter, we use the term core to refer to each single processing unit in multicore or multiprocessor systems.

Since all possible behaviors of the cache blocks in a system with $n$ cores¹ can be defined by a global finite state machine (FSM), the entire state space is the product of $n$ cache block level FSMs. Intuitively, full state or transition coverage can be achieved by performing a breadth first search (BFS) on this product FSM. The path that leads to each distinct state from the initial state can be used as a test case for that state. Unfortunately, since each test is used to activate only one transition, a large number of transitions may be unnecessarily repeated, if they are on the shortest path to many other transitions. Therefore, it is desirable to replace BFS with another efficient algorithm, which creates an input sequence that covers all transitions with minimum transition overhead. Since the number of directed tests can be quite large in many practical scenarios, it may be beneficial to generate the directed tests on-the-fly, so that the created tests can be directly fed to the simulator or the device under test without extra storage requirement. Clearly, the development of such algorithms requires a clear understanding of the state space of the complex global FSM. Although the FSM of each cache controller is easy to understand, the structure of the product FSM for

modern cache coherence protocols can be quite obscure and hard to analyze.

In work [70], we propose an on-the-fly test generation technique for cache coherence protocols by analyzing the state space structure of their corresponding global FSMs. Instead of using structure-independent BFS to obtain the directed tests, we show that the entire complex state space can be decomposed into several components with simple structures. Since the activation of states and transitions can be viewed as a path searching problem in the state space, these decomposed components with known structures can be exploited for efficient test generation. Our contributions are:

1. We develop a graphical description of the state space structure of several commonly used cache coherence protocols that can be viewed as a composition of simple structures.

2. We present an on-the-fly directed test generation algorithm based on the Euler tour of hypercubes. Our approach only requires space linear in the number of cores. The generated test forms a tour in the state space of the corresponding global FSM, which activates all possible transitions of the global FSM with small overhead.

The rest of this chapter is organized as follows. Section 5.1 provides related background information. Section 5.2 describes our contributions in detail. Experimental results are presented in Section 5.3. Finally, Section 5.4 concludes the chapter.

5.1 Background and Motivation

In modern computer systems, since the latency to transfer data from the main memory to processing units is much larger than the time consumption for computation, each processing unit usually maintains its local copy of the main memory, or cache, for fast access. One major problem of caching is that when the same data (memory block) is cached in two or more different places, any future modification to it should be propagated to all the cached copies. Otherwise, it can lead to incorrect functional behaviors. Cache coherence protocols are therefore proposed to define the correct

behavior of each cache controller, when different processing units issue loads and stores to the same memory location.

One of the simplest cache coherence protocols is the MSI snoopy protocol [42]. The behavior of the cache controller in a processing unit is modeled as a finite state machine (FSM). Figure 5-1 shows the state transition diagram of the MSI protocol. The state of a cache block (line) can be either Invalid (I), Modified (M), or Shared (S). At the beginning, all cache blocks are in the invalid state. When a load request arrives from the core side (Self LD), the cache controller will request the data from the main memory and switch to shared state. When the core issues a store request (Self ST), the cache controller will first broadcast an invalidate request on the bus and then change to modified state. Such an invalidate request will inform all other cache controllers that are in shared or modified states to change to invalid state. A cache block may also change to invalid state when it is evicted by another cache block which is mapped to the same location in the cache, or when other cores issue store requests (Other ST).

Figure 5-1. State transitions for a cache block in MSI protocol (states I, S, M; transitions labeled Self LD, Self ST, Other LD, Other ST, Eviction; ST = Store, LD = Load)

Although the MSI protocol is enough to guarantee the coherence of the cache system, it causes some unnecessary delay and traffic on the communication channels. Many variants of the MSI protocol have been invented to further improve its performance. For example, the Exclusive (E) state is introduced in the MESI protocol to avoid the traffic when a

cache block is only used by one core. The Owned (O) state is used in the MOSI and MOESI protocols to reduce the delay when a modified block is loaded by other cores.

As cache coherence protocols are becoming more and more complex, it is getting harder to verify their implementations. From the validation perspective, it is always desirable to activate all possible state transitions of the entire multicore cache system. In other words, it is necessary to have a high state and transition coverage (100%, if possible) in the global FSM of the entire memory cache subsystem.

5.2 Test Generation for Transition Coverage

Our approach is motivated by the basic Breadth First Search (BFS) in the state space of a global FSM. Given the FSM description of any cache coherence protocol, it is possible to compose a test suite which can activate all states and transitions using two steps: 1) for each state, we find out the instruction sequences to reach it by performing a BFS on the global FSM; and 2) for each transition, we create the test by appending the required instructions after the instruction sequences to reach the initial state of this transition. However, such a naive approach has two problems. First, transitions close to the initial state are visited many times. Thus, a large portion of the overall test time is wasted. Secondly, it is difficult to generate tests on-the-fly, because the memory requirement to run the BFS routine is quite large. Since we have to remember all visited states in the BFS process, its run time memory requirement also grows exponentially.

To address these challenges, our approach needs to satisfy two requirements: 1) we should reduce the number of transitions as much as possible without sacrificing the coverage goal; and 2) the space requirement for the test generation algorithm should be small. Fortunately, due to the highly symmetric and regular structure of the state space, it is possible to design a deterministic test generation algorithm, which can efficiently activate all states and transitions of popular cache coherence protocols. The basic idea is to divide the complex state space into several large hypercubes and other


small components. Since hypercubes can be traversed with no extra overhead, a large number of unnecessary transitions can be avoided while activating all transitions.

In the rest of this section, we first describe how to generate tests to activate all transitions of a simplified cache coherence protocol: the SI protocol. Next, we discuss our test generation techniques for a variety of popular protocols including MSI, MESI, MOSI, MOESI, and MESIF. In this work, we focus on the transitions between two stable states. We assume that the transitions between stable states and transient states are correct.

5.2.1 SI Protocol

The SI protocol is a trimmed version of the MSI protocol, in which we do not allow cores to issue store operations. For a system with $n$ cores, a valid global state of the system allows the cache blocks in any $m$ cores to be in I state and the cache blocks in the other $n - m$ cores to be in S state. Thus, there are $2^n$ valid global states. Besides, since any core in I or S state can be converted to S or I state within one transition, each global state has $n$ outgoing and $n$ incoming edges. It is easy to see that the entire state space of the SI protocol with $n$ cores is an $n$-dimensional hypercube.² Figure 5-2 shows such a state space with three cores. Since all edges are bidirectional for state transitions, we do not show transition directions explicitly. For example, state III can be transformed to IIS when the first core loads the cache block. Similarly, state IIS can also be transformed to III, when the first core evicts this cache block.

² There are many transitions that start and end in the same state. For example, the global state will not change if a core in S state issues a load operation. Usually, these transitions are easier to cover, because they can be activated by appending one more operation at the end of existing tests, which are used to activate the corresponding initial states. As a result, we omit them in the state space structure description in this section. However, all possible transitions are considered in the actual implementation of our test generation approach as well as in the experimental results.

To achieve full state and transition coverage of the state space, we need to traverse each edge of the hypercube at least once in both directions. Since each global state has


the same number of incoming and outgoing edges, it is possible to form an Euler tour [35] of the state space, which visits each edge exactly once in both directions.

Figure 5-2. Global FSM state space of SI protocol with 3 cores (states III, IIS, ISI, ISS, SII, SIS, SSI, SSS)

Algorithm 4: Test generation for SI protocol with n cores

CreateTestsSI(n)
 1: for i = 0 to n − 1 do
 2:   Output load(i)
 3:   VisitHypercube( , n − 1, i)
 4:   Output evict(i)
 5: endfor

VisitHypercube(id, m, shift)
 1: for i = 1 to m do
 2:   newid = id + 2^i
 3:   p = (i + shift) mod n
 4:   Output load(p)
 5:   if i > 1 then
 6:     VisitHypercube(newid, i − 1, shift)
 7:   endif
 8:   Output evict(p)
 9: endfor
10: return

Algorithm 4 shows our test generation algorithm for the SI protocol, which performs an Euler tour on an $n$-dimensional hypercube. Here, load(p)/evict(p) means that the p-th core performs a load/evict operation in a particular cycle, while all other cores remain idle. We use the state space in Figure 5-2 to show the execution of Algorithm 4. The algorithm starts by calling CreateTestsSI(n). All cores are in I state at the beginning.


In the first round of the for loop in line 1, the system first performs transition III-IIS by executing load(0). During VisitHypercube, we first visit transitions IIS-ISS and ISS-IIS for i = 1 and IIS-SIS for i = 2. Since i > 1, we invoke VisitHypercube at line 6, which activates two transitions: SIS-SSS and SSS-SIS. Next, transition SIS-IIS is covered by executing evict(2) in line 8 of VisitHypercube. Finally, the global state goes back to III via transition IIS-III after evict(0) in line 4 of CreateTestsSI. In the next two rounds of the for loop in CreateTestsSI, we are essentially performing a rotated version of the previous traversal, which covers all transitions in paths III-ISI-SSI-ISI-ISS-SSS-ISS-ISI-III and III-SII-SIS-SII-SSI-SSS-SSI-SII-III. Eventually, all transitions in the hypercube are covered by the generated test sequences.

Although the execution of Algorithm 4 seems to be complicated for larger $n$, the basic idea of this algorithm is quite easy: the hypercube is actually partitioned into $n$ isomorphic trees with no overlapping edges. Once the hypercube is correctly partitioned, an Euler tour is performed on the trees, because all edges are bidirectional. The correctness of our algorithm can be proved as follows.

First, we show that the total number of transitions in an SI protocol with $n$ cores is $n \cdot 2^n$. Notice that an $n$-dimensional hypercube has $n \cdot 2^{n-1}$ edges. Since each edge corresponds to two transitions in the SI protocol, the total number of transitions becomes $n \cdot 2^n$. Next, we prove that Algorithm 4 traverses all these transitions by constructing a test sequence of length $n \cdot 2^n$.

Lemma 2. The length of the test sequence generated by Algorithm 4 is $n \cdot 2^n$.

Proof. Clearly, VisitHypercube is invoked in CreateTestsSI $n$ times. Since each different value of id is associated with 2 transitions and id grows from 1 to $2^{n-1} - 1$, we can conclude that each invocation of VisitHypercube in CreateTestsSI will produce $2^n - 2$ transitions. Therefore, the total number of transitions is $n(2^n - 2 + 2) = n \cdot 2^n$.
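The walkthrough and the counting argument above can be checked mechanically. The following is a minimal Python sketch of Algorithm 4, not the author's implementation. It assumes core 0 is the rightmost letter of a state string (as in the III-IIS example) and omits the `id` argument, which only serves as bookkeeping and does not influence the emitted operations. The helper names `replay` and `show` are hypothetical.

```python
def create_tests_si(n):
    """Return the (operation, core) pairs emitted by Algorithm 4 for n cores."""
    ops = []

    def visit_hypercube(m, shift):
        for i in range(1, m + 1):
            p = (i + shift) % n
            ops.append(("load", p))
            if i > 1:
                visit_hypercube(i - 1, shift)
            ops.append(("evict", p))

    for i in range(n):
        ops.append(("load", i))
        visit_hypercube(n - 1, i)
        ops.append(("evict", i))
    return ops


def replay(n, ops):
    """Replay ops from the all-invalid state and return the visited transitions."""
    state = 0  # bit p set means core p holds the block in S state
    transitions = []
    for op, p in ops:
        bit = 1 << p
        if op == "load":
            assert (state & bit) == 0, "load must hit a core in I state"
        else:
            assert (state & bit) != 0, "evict must hit a core in S state"
        nxt = state ^ bit
        transitions.append((state, nxt))
        state = nxt
    assert state == 0, "the tour must end in the global invalid state"
    return transitions


def show(state, n):
    """Render a state with core 0 as the rightmost letter, e.g. 1 -> 'IIS'."""
    return "".join("S" if (state >> p) & 1 else "I" for p in reversed(range(n)))


if __name__ == "__main__":
    n = 3
    transitions = replay(n, create_tests_si(n))
    path = [transitions[0][0]] + [dst for _, dst in transitions]
    print("-".join(show(s, n) for s in path))
    print(len(transitions), "transitions,", len(set(transitions)), "distinct")
```

For small $n$, the sequence length equals $n \cdot 2^n$ and no directed transition appears twice, which is what Lemma 2 and the Euler tour claim require.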


Finally, we show that no transition is covered twice. Due to the shift operation and the structure of id, it can be verified that the global state will never repeat before the execution of the test sequence produced by each VisitHypercube with the same $m$. Therefore, it is guaranteed that every load or evict operation in Algorithm 4 always drives the system through an uncovered transition. In other words, the test sequence constructed by Algorithm 4 does perform an Euler tour of the entire state space.

The space complexity of Algorithm 4 is linear in the number of cores $n$. The reason is that the function VisitHypercube(id, m, shift) can be recursively called at most $n - 1$ times. The algorithm therefore requires a stack with at most $n - 1$ levels. As a result, the space complexity is $O(n)$.

5.2.2 MSI Protocol

The difference between the MSI protocol and the SI protocol is that a cache block can be changed to the modified (M) state when it receives a store request. For ease of discussion, we define the following terms.

Definition 2. A global shared state is a global state within which cores are in either shared or invalid states (e.g., IIS, ISI, ISS, SII, SIS, SSI, and SSS in Figure 5-3).

Definition 3. The global invalid state is a global state within which all cores are in the invalid state (e.g., III in Figure 5-3).

Definition 4. A global modified state is a global state within which one core is in the modified state (e.g., IIM, IMI, and MII in Figure 5-3).

Figure 5-3 shows the state space of the MSI protocol with three cores. Since only one core can be in the modified state for the MSI protocol, there are $n$ global modified states in the state space of a system with $n$ cores. Global modified states are reachable from any other global state by store requests from the corresponding cores. Besides, a global modified state can also be converted to the global invalid state or to global shared states. For example, global modified state IMI can be converted to global invalid state III by evict(1), or to global shared states ISS and SSI by load(0) or load(2), respectively.
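To make the structure of the MSI state space concrete, the sketch below models the global FSM directly from the per-block rules of Figure 5-1 and enumerates it. It is an illustration under stated assumptions rather than the simulator used in this work: a global state is a tuple indexed by core, core 0 is printed as the rightmost letter, and a transition is counted for every load and store request plus every eviction issued by a core that currently holds the block. That counting convention is an assumption; it happens to reproduce the MSI state and transition counts reported in Table 5-1 for 8 cores.

```python
from collections import deque


def step(state, op, core):
    """Apply `op` issued by `core` to a global MSI state (tuple of 'M'/'S'/'I')."""
    cur = list(state)
    if op == "load":
        if cur[core] == "I":
            # A modified copy held elsewhere is downgraded to shared (OtherLD).
            cur = ["S" if s == "M" else s for s in cur]
            cur[core] = "S"
        # A load that hits an S or M copy leaves the global state unchanged.
    elif op == "store":
        # Every other copy is invalidated (OtherST); the writer becomes M.
        cur = ["I"] * len(cur)
        cur[core] = "M"
    elif op == "evict":
        cur[core] = "I"
    else:
        raise ValueError(op)
    return tuple(cur)


def explore(n):
    """Breadth-first enumeration; returns (#reachable states, #transitions)."""
    start = ("I",) * n
    seen = {start}
    queue = deque([start])
    transitions = 0
    while queue:
        state = queue.popleft()
        for core in range(n):
            ops = ["load", "store"]
            if state[core] != "I":
                ops.append("evict")  # only a core holding the block can evict it
            for op in ops:
                transitions += 1
                nxt = step(state, op, core)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return len(seen), transitions


def show(state):
    """Render a state with core 0 as the rightmost letter."""
    return "".join(reversed(state))


if __name__ == "__main__":
    imi = ("I", "M", "I")
    print(show(step(imi, "evict", 1)), show(step(imi, "load", 0)), show(step(imi, "load", 2)))
    print(explore(3), explore(8))
```

With three cores the enumeration finds the $2^3 + 3 = 11$ states of Figure 5-3, and the three conversions of IMI described above come out as III, ISS and SSI.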


Figure 5-3. State space of MSI protocol with 3 cores. For clarity of presentation, the transitions to global modified states (IIM, IMI, MII) are omitted if the transition in the opposite direction does not exist.

Clearly, all $n$ global modified states form a clique, because there are two transitions with opposite directions between each pair of them. As a result, these transitions can be covered with an Euler tour. Unfortunately, it is not possible to cover all transitions in the state space of MSI by a single Euler tour. The reason is that for some global shared states like IIS, there are only outgoing transitions to global modified states, but no incoming transitions from them. Therefore, such a state has twice as many outgoing transitions as incoming transitions. A similar scenario can also be observed for global modified states, which have more incoming transitions than outgoing transitions. To cover all transitions, some of them must be reused. In fact, the problem of minimizing the number of reused transitions is called the Chinese Postman Problem (CPP) [35], which can be solved by calculating the min-cost max-flow. Since we need to perform the test generation on-the-fly, we decided not to obtain the optimal solution by solving CPP, because the state space can be too large to fit into memory when there are many cores in the system. Instead, we visit the uncovered transitions to global modified states one by


one and use the shortest path to link the end state of the previous transition and the start state of the next transition.

Algorithm 5: Test generation for MSI protocol with n cores

CreateTestsMSI(n)
 1: CreateTestsSI(n) /* Invoke Algorithm 4 */
 2: VisitClique(0)
 3: for each global shared state s do
 4:   for i = 0 to n − 1 do
 5:     Output store(i)
 6:     Output the shortest path from current state to s
 7:   endfor
 8: endfor

VisitClique(p)
 1: Output store(p)
 2: Output operations to visit all bidirectionally reachable global shared states
 3: for i = p + 1 to n − 1 do
 4:   Output store(i)
 5:   if i = p + 1 then
 6:     VisitClique(i)
 7:   endif
 8:   Output store(p)
 9: endfor
10: return

Algorithm 5 presents the test generation algorithm for the MSI protocol. We first invoke CreateTestsSI(n) in Algorithm 4 to cover all transitions that also exist in the SI protocol. Next, VisitClique will recursively perform an Euler tour in the clique of all global modified states. For example, when we execute VisitClique(0) in the state space shown in Figure 5-3, we first cover transition IIM-IMI. In the recursive call of VisitClique in line 6, transitions IMI-MII and MII-IMI are visited. After that, transition IMI-IIM is covered by the execution of line 8. In the next round of iteration, IIM-MII and MII-IIM are visited. To improve the efficiency, we also traverse all global shared states that are bidirectionally reachable from the current global modified state. Finally, in lines 3-6 of CreateTestsMSI(n) we visit all uncovered transitions from global shared states


to global modified states. Notice that we do not need to run Dijkstra's algorithm to find the shortest path in line 6, because we must be in a global modified state after executing the store operation in line 5. The target global shared state can be reached by issuing load and evict requests based on the positions of S in its state vector.

5.2.3 MESI Protocol

In the MESI protocol, a cache block goes to the exclusive (E) state when it is the first one to load a memory address. In a system with $n$ cores, there are $n$ global exclusive states.³ Figure 5-4 shows the state space with three cores. Unlike global modified states, global exclusive states cannot be converted to each other directly. Therefore, the test generation algorithm CreateTestsMSI for the MSI protocol needs to be modified to create tests for the MESI protocol. We can add $n$ groups of operations to cover transitions from the global invalid state to global exclusive states as well as transitions from global exclusive states to global modified states. Notice that the CreateTestsSI routine, which is used to visit all transitions between global shared states, also needs to be modified slightly. The reason is that in the MESI protocol, the global invalid state will be converted to a global exclusive state after any load request (III goes to IIE instead of IIS when the first core issues a load request).

³ A global exclusive state is a global state with a cache block in the exclusive state (e.g., IIE, IEI, and EII in Figure 5-4).

5.2.4 MOSI Protocol

The MOSI protocol contains a new state, owned (O), which can be used to avoid unnecessary write-back to memory. A cache block in the modified state is converted to the owned state when other cores are trying to load the same cache block. The owned state can coexist with shared and invalid states. As a result, for a system with


$n$ cores, there are $n \cdot 2^{n-1}$ global owned states.⁴ Considering the fact that there are only $n + 2^n$ global states in the MSI protocol with $n$ cores, the state space of MOSI is much larger. Despite the large number of states, the state space structure of the MOSI protocol is not complex. The entire space can be divided into three components. The first and second parts are the hypercube of global shared states and the clique of global modified states, respectively. They are identical to the corresponding structures in the MSI protocol. The third part is a set of $n$ hypercubes with dimension $n - 1$. Each of the $(n-1)$-dimensional hypercubes consists of $2^{n-1}$ global owned states, whose state vectors have O in the same position. For example, Figure 5-5 shows the state space with three cores. It is easy to see that states (IOI, IOS, SOS, SOI), (IIO, SIO, SSO, ISO) and (OII, OSI, OSS, OIS) compose three 2-d hypercubes (squares).

Figure 5-4. State space of MESI protocol with 3 cores

⁴ A global owned state is a global state with a cache block in the owned state (e.g., IOI, IOS, ..., OSS in Figure 5-5).

One nice property of this state space structure is that there is no transition between the $n$ hypercubes of global owned states. Therefore, a large number of transitions


between global owned states can be efficiently covered. We can perform an Euler tour in each $(n-1)$-dimensional hypercube by invoking routine CreateTestsSI on global owned states like IIO, IOI and OII, where all but one core are in the invalid state. In order to cover transitions from global owned states to global shared states, like IOS-IIS, we have to use a technique similar to the one used in CreateTestsMSI(n) to cover the store transitions.

Figure 5-5. State space of MOSI protocol with 3 cores

5.3 Experiments

To analyze the performance of our proposed test generation framework, we conducted a number of experiments using the M5 simulator [15]. M5 is a full system simulator, which implements a MOESI cache coherence protocol. In order to verify that our generated tests can activate all transitions, we modified the cache subsystem in M5 slightly to allow different processes to access the same physical block. The load and store operations in the generated tests are translated into corresponding ALPHA instructions, while the evict operation is achieved by loading a different memory address which is also mapped to the same location in the cache as the cache block under test. We use load-linked and store-conditional instruction pairs to ensure the execution order of instructions in different cores.


Table 5-1. Statistics of our test generation algorithm for different protocols

                                           BFS                       Our approach
Protocol           #States  #Transitions   Total cost     Avg. cost  Total cost     Avg. cost  Improv.  Test generation
                                           (transitions)  per trans. (transitions)  per trans. factor   time (sec)
MSI (8 cores)          264          5256          36896        7.0          14664        2.8    60.3%     < 1
MESI (8 cores)         272          5392          37712        7.0          15312        2.8    59.4%     < 1
MOSI (8 cores)        1288         26248         196400        7.5         100807        3.8    48.7%     6.2
MOESI (8 cores)       1296         26384         197216        7.5         101455        3.8    48.6%     6.2
MSI (16 cores)       65552       2621968       29100096       11.1       11567888        4.4    60.2%    54.4
MESI (16 cores)      65568       2622496       29103264       11.1       11570464        4.4    60.2%    54.5
MOSI (16 cores)     589840      23855632      275254368       11.5      131122063        5.5    52.4%     586

Since M5 only supports the MOESI cache coherence protocol, we also developed a protocol simulator, which can be configured to simulate the state transitions of a multicore system using MSI, MESI and MOSI protocols. We used this simulator to validate the performance of our test generation approach on other protocols.

In the first experiment, we compared the efficiency of our test generation method with the tests generated by performing breadth first search (BFS) directly on the global FSM, on different cache coherence protocols with various numbers of cores. Since tests generated by BFS are the shortest tests to drive the system from the global invalid state to the required transition, we use additional operations to reset the global state after the execution of each test. Table 5-1 gives the results. Column "Total cost" presents the total number of transitions traversed to activate all transitions. Column "Average cost per transition" gives the average number of transitions we need to traverse in order to activate an uncovered transition. It can be observed that the total size of the tests generated by our approach is 50%-60% smaller than the ones generated directly by BFS. This result can be explained by the fact that the Euler tour exploited in our algorithm typically covers load and evict transitions on global shared states. The store transitions, on the other hand, are covered in a similar way as in the BFS approach. Since the numbers of allowed load and evict transitions for any global state are equal, we can save around half of the tests by exploiting the space structure.
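The derived columns of Table 5-1 follow from the raw counts by simple arithmetic, which the short Python sketch below recomputes. The data are copied from the table. The interpretation of the improvement factor as $1 - \text{our total cost} / \text{BFS total cost}$ is an assumption, although it matches every reported percentage after rounding. The `state_count` helper encodes the state counts given in Section 5.2 ($2^n + n$ for MSI, $n$ additional exclusive states for MESI, $n \cdot 2^{n-1}$ additional owned states for MOSI). All function names are hypothetical.

```python
# name: (cores, states, transitions, bfs_total, ours_total, reported_improvement_%)
TABLE_5_1 = {
    "MSI-8":   (8,  264,    5256,     36896,     14664,     60.3),
    "MESI-8":  (8,  272,    5392,     37712,     15312,     59.4),
    "MOSI-8":  (8,  1288,   26248,    196400,    100807,    48.7),
    "MOESI-8": (8,  1296,   26384,    197216,    101455,    48.6),
    "MSI-16":  (16, 65552,  2621968,  29100096,  11567888,  60.2),
    "MESI-16": (16, 65568,  2622496,  29103264,  11570464,  60.2),
    "MOSI-16": (16, 589840, 23855632, 275254368, 131122063, 52.4),
}


def summarize(transitions, bfs_total, ours_total):
    """Return (BFS avg cost, our avg cost, improvement in percent)."""
    return (
        bfs_total / transitions,
        ours_total / transitions,
        100.0 * (1.0 - ours_total / bfs_total),
    )


def state_count(protocol, n):
    """Number of global states for the protocols whose counts are given in the text."""
    msi = 2 ** n + n  # shared/invalid hypercube plus n global modified states
    if protocol == "MSI":
        return msi
    if protocol == "MESI":
        return msi + n  # n global exclusive states
    if protocol == "MOSI":
        return msi + n * 2 ** (n - 1)  # n hypercubes of global owned states
    raise ValueError(f"no closed form given in the text for {protocol}")


if __name__ == "__main__":
    for name, (cores, states, trans, bfs, ours, reported) in TABLE_5_1.items():
        avg_bfs, avg_ours, improvement = summarize(trans, bfs, ours)
        print(f"{name:8s} {avg_bfs:5.1f} {avg_ours:4.1f} {improvement:5.1f}% (reported {reported}%)")
```

Running it shows, for instance, that MSI with 8 cores needs about 7.0 traversed transitions per covered transition under BFS versus about 2.8 with the proposed tour.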
We also compared the state and transition coverage of our test generation approach with a directed random test generator, MCjammer [83]. Figure 5-6 and


Figure 5-6. Transition coverage vs. cost for different test generation methods on MESI protocol with 8 cores (x-axis: cost in total transitions; y-axis: transition coverage; series: our approach, BFS, MCjammer, Random)

Figure 5-7. Transition coverage vs. cost for different test generation methods on MOSI protocol with 8 cores (x-axis: cost in total transitions; y-axis: transition coverage; series: our approach, BFS, MCjammer, Random)


Figure 5-7 show the relation between transition coverage and testing cost on the same system. It can be seen that MCjammer is very efficient at the beginning. Actually, it is more efficient than BFS in reaching 70% coverage. However, it becomes much slower in covering all transitions. The reason is that it is very unlikely for an algorithm with randomness to hit the remaining uncovered transitions among all allowed transitions. On the other hand, our proposed test generation approach can always achieve 100% state and transition coverage with a stable and higher coverage speed than the BFS based tests.

Based on our experimental results, we can also estimate the overhead of our approach. Although we described our algorithms in recursive form to simplify the presentation, they can also be implemented as iterative routines. As discussed in Section 5.2.1, our algorithms have linear space complexity in the number of cores. Since our tests can be generated on-the-fly, the overall space requirement is very small. The test generation time in Table 5-1 suggests that the runtime of our algorithms is reasonable. For the MOSI protocol with 23 million transitions, we can create all the tests within 10 minutes, which indicates that our algorithm is quite lightweight for the entire simulation based verification phase.

5.4 Summary

We proposed an efficient test generation approach for a wide variety of cache coherence protocols. Based on detailed analysis of the space structure, our approach creates efficient test sequences for different parts of the global FSM state space to achieve 100% state and transition coverage for each cache coherence protocol. Compared with existing approaches based on directed random test generation, our approach significantly increases the transition coverage metric with a linear memory requirement. Our experimental results on different cache coherence protocols demonstrated the effectiveness of our approach on systems with many cores, making it suitable for future multicore architectures.


CHAPTER 6
SCALABLE DIRECTED TEST GENERATION

Model checking is a promising technique to automatically generate directed tests. Model checkers usually accept models presented in special verification languages. It is not easy to apply them to real implementations. For example, while the Register Transfer Level (RTL) models of real processors are commonly designed in Verilog or VHDL, model checking tools like NuSMV only take SMV models as input. Thus, real designs must be translated first, which itself can be an error-prone process. Since model checking is based on static analysis, the complexity of real world designs usually exceeds the capacity of model checking tools. The solving process may run out of memory before producing any useful results for real life designs. On the other hand, random or constrained-random test generation techniques are suitable for real designs, because they usually perform little reasoning on internal design logic. A large amount of random stimuli can be generated easily and simulated on real designs. However, random tests are inefficient at activating specific behaviors. It is therefore desirable to have a test generation approach which can handle real-life designs, but is still able to activate any required system behavior with a small number of tests.

To bridge the gap between model checking based directed tests and random tests, various techniques such as STAR [55] and HYBRO [56] combine static and dynamic analysis. STAR generates tests to activate all control paths of an RTL design, although it suffers from the path explosion problem. HYBRO addresses this problem using the branch coverage metric in the RTL Control-Flow-Graph (CFG). The CFG is sequentially unrolled during the concrete/symbolic simulation to obtain the path constraints and measure the branch coverage. However, since the CFG and Use-Define chain are obtained using static analysis, this technique cannot be applied when dynamic array references [11] are involved. For example, a processor design may write to location ram[wb_addr] and read from ram[ld_addr] in the following cycle, where ram is an


array corresponding to the main memory and wb_addr and ld_addr are two variables. If ram[wb_addr] and ram[ld_addr] refer to the same element during execution and ram[wb_addr] is involved in the control path, all the assignments to ram[ld_addr] must also be considered. However, it is difficult to detect such a dependency by static analysis. As a result, the applicability of [56] is limited, since dynamic array references are widely used in modern RTL designs to implement register files, buffers, caches and memory.

In this chapter, we address the dynamic array reference problem by making the array reference concrete. Instead of performing static analysis of the entire design to get the variable dependency, we compose an instrumented version of the original design and execute the instrumented design on a Hardware Description Language (HDL) simulator. During the simulation, the instrumented code will produce a trace file, which records all the logical operations performed by the design. All dynamic references to array elements are replaced by their concrete indices in the trace file during the concrete simulation. Next, the trace is analyzed using a constraint solver. In this way, our approach is able to analyze real hardware designs with dynamic array references and detect data dependency through array elements. We also propose several optimization techniques, which give our proposed algorithm efficiency comparable to the state-of-the-art techniques [55][56]. Note that the existing techniques [55] and [56] cannot handle designs with dynamic array references. Our experimental results demonstrate that our approach is capable of generating directed tests efficiently on a variety of hardware designs. To the best of our knowledge, our approach is the first attempt to create directed tests for HDL designs with dynamic array references by interleaving concrete and symbolic simulation.

The rest of the chapter is organized as follows. Section 6.1 describes our test generation methodology for real HDL designs. Section 6.2 discusses the implementation details of our approach. Section 6.3 presents our experimental results. Finally, Section 6.4 concludes the chapter.


6.1 Directed Test Generation by Interleaving Concrete and Symbolic Execution

The basic idea of our work is to obtain the logic operations performed by the design on a single concrete execution path, and perform reasoning on top of it to obtain new test input. Figure 6-1 shows the key steps in our proposed approach. To explore different execution behaviors of the design, we first instrument the design with trace generation code. We also define the input variable set $I$, which represents the input to the design under test (DUT). Next, we repeatedly simulate the instrumented design as follows:

1. Use $I$ as input to the DUT.
2. Simulate the DUT on a simulator. Collect all the operations performed by the design and the activated path constraints from the trace output.
3. Invoke the constraint solver to check whether the desired behavior $p$ is on the current execution path. If this is the case, record the assignment of $I$ as a test of $p$. Otherwise, negate one of the path constraints and use the constraint solver to obtain the assignment $I'$, which forces the design to exercise a different execution path.

We first explain our test generation workflow using a simple example. Next, we describe the system model of our target design and several key steps in our workflow. Finally, we discuss some important optimization techniques to reduce the overall test generation time.

6.1.1 Illustrative Example

In this section, we use a simple example to show the basic workflow of our approach. The design is a simple counter module written in Verilog (Figure 6-2). The test input is the initial value on line 15. Our goal is to let the module execute the code on line 11 at clock cycle 2.

We first instrument the code and simulate the module for 3 cycles using a random input value, e.g., out = 0. The output trace is shown in Figure 6-3. The trace is produced by the instrumented code, which performs the same operations as the original code. In addition, the instrumented code also prints the performed operations during simulation as


a trace file. We use (out,0), (out,1), (out,2) and (out,3) to represent out in different cycles. Notice that the "IF (out,0) == 40 not taken" statement indicates that the if statement on line 10 is evaluated to be false. Clearly, line 11 is not executed when the initial value of out is 0.

Figure 6-1. The workflow of our approach (design under test, instrumentation, instrumented design, simulation with Test n, trace generation, path constraint generation, test generation of Test n+1 from Test n)

Since our goal is to let line 11 be executed at cycle 2, the variable out must have value 40 at cycle 2. Similarly, (out,0), (out,1), (out,2) and (out,3) must satisfy the constraints in Figure 6-4. Therefore, we can use constraint solvers like Yices [34] to solve these constraints, and produce satisfying assignments to all variables. In this case, the solver determines that (out,0), (out,1), and (out,2) should be 38, 39, and 40, respectively. In other words, the initial value (out,0) should be 38 in order to activate line 11 at cycle 2. This is the intended directed test.
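The constraint system of this example is small enough to solve by exhaustive search, which makes the expected answer easy to confirm. The Python sketch below is only an illustration: it enumerates all 8-bit initial values instead of calling a constraint solver such as Yices, and the function names are hypothetical.

```python
WIDTH = 8                   # matches "parameter WIDTH = 8" in the counter
MASK = (1 << WIDTH) - 1


def path_values(out0, cycles=3):
    """Return [(out,0), ..., (out,cycles)] for the counter, wrapping at WIDTH bits."""
    values = [out0 & MASK]
    for _ in range(cycles):
        values.append((values[-1] + 1) & MASK)
    return values


def reaches_line_11_at_cycle_2(values):
    """Path constraints: branch not taken in cycles 0 and 1, taken in cycle 2."""
    return values[0] != 40 and values[1] != 40 and values[2] == 40


def solve():
    """Brute-force stand-in for the constraint solver: all satisfying inputs."""
    return [v for v in range(1 << WIDTH) if reaches_line_11_at_cycle_2(path_values(v))]


if __name__ == "__main__":
    for v in solve():
        print("initial value", v, "gives", path_values(v))
```

The search confirms that 38 is the only initial value that activates line 11 exactly at cycle 2.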


 1  module counter(out, clk, reset);
 2    parameter WIDTH = 8;
 3    output [WIDTH-1:0] out;
 4    input clk, reset;
 5    reg [WIDTH-1:0] out;
 6    wire clk, reset;
 7    always @(posedge clk)
 8    begin
 9      out <= out + 1;
10      if (out == 40)
11        $display("Activated");
12    end
13    always @reset
14      if (reset)
15        out = 0; // initial value
16  endmodule

Figure 6-2. Counter.v

    (out,0) = 0
    (out,1) = (out,0) + 1
    IF (out,0) == 40 not taken
    (out,2) = (out,1) + 1
    IF (out,1) == 40 not taken
    (out,3) = (out,2) + 1
    IF (out,2) == 40 not taken

Figure 6-3. Sample Trace

    (out,1) = (out,0) + 1
    (out,0) != 40
    (out,2) = (out,1) + 1
    (out,1) != 40
    (out,3) = (out,2) + 1
    (out,2) = 40

Figure 6-4. Sample Path Constraints

If we compare the path constraints (Figure 6-4) with the trace obtained during concrete simulation (Figure 6-3), it is easy to see that we are essentially performing chronological backtracking in the space of execution paths. By negating the topmost


constraint¹ in the trace file ((out,2) != 40), we force the design to switch to a different execution path (transition a in Figure 6-5). Sometimes, it is also possible that our desired path constraints (Figure 6-4) are not satisfiable, i.e., the branch (out,2) == 40 in Figure 6-5 cannot be taken. In this case, we can negate the next topmost constraint ((out,1) != 40) and use the constraint solver to check whether the branch at node (out,1) == 40 can be taken. The process is repeated until the desired test is found, or all branches are activated.

Figure 6-5. Chronological backtracking (decision nodes (out,0) == 40, (out,1) == 40 and (out,2) == 40 with Taken / Not Taken branches over the execution paths; transitions a and b lead to previously unreached paths)

¹ Due to the use of a stack in our implementation, the last path constraint is the topmost constraint.

Although this test generation example is performed on a simple Verilog design, it illustrates the basic idea of our proposed approach. In the rest of the section, we discuss how to automate every step of this process, and generate the entire directed test suite automatically.

6.1.2 System Model

Our approach takes a Verilog HDL program as input. Our current implementation supports most common features of Verilog, such as always @(... sensitivity list ...), continuous assignment, conditional branches (if, case), and different variable types (reg,


wire). Although our implementation is based on Verilog, the same working principle can also be applied to VHDL designs, since VHDL also describes concurrent finite state systems.

Our current implementation supports common fault models such as the path activation fault and the stuck-at fault. These fault models describe possible faults that can occur during the execution of the system. The path activation fault model can be used to check whether there is any unreachable code in the design. The stuck-at fault model can be used to check whether a given variable always has the same value. Based on the given fault model, our test generation technique will generate the test suite, which can activate all possible faults of the system under the fault model. It is important to note that these fault models are by no means golden models; rather, they are representative models. Various graph-based fault models including node fault, edge fault, sequence fault and interaction fault are explored in Section 6.3.3.

Without loss of generality, we discuss our approach in the context of a single clock domain. We use the tuple ⟨name, clk⟩ to index each variable in every cycle. When multiple clock domains are used, clk should be the cycle number in the corresponding clock domain.

6.1.3 Instrumentation

The primary purpose of the instrumentation is to use the simulator to produce a trace file during concrete simulation of the RTL design. The resultant trace file is crucial to our test generation framework for two reasons. First, the trace file records all logic operations performed during the concrete simulation, which enables us to perform symbolic simulation and directed test generation. Besides, the trace file also provides information about different concrete execution paths. To ensure that each variable is unique, we need to flatten all module instances before instrumentation. The details are described in Section 6.2.1.

Table 6-1 shows the instrumentation rules. For ease of presentation, we use Verilog syntax for illustration. We use variable cc to denote the number of clock cycles from the


Table 6-1. Verilog instrumentation code

// Continuous assignment
  Original:      assign v = e;
  Instrumented:  always #clkwidth $display("v,cc = e");
                 assign v = e;

// Blocking assignment
  Original:      v = e;
  Instrumented:  $display("v,cc = e");
                 v = e;

// Assignment within always @(pos/negedge ...)
  Original:      v <= e;
  Instrumented:  $display("v,cc+1 = e");
                 v <= e;

// Assignment within other always blocks
  Original:      v <= e;
  Instrumented:  $display("v,cc = e");
                 v <= e;

// If
  Original:      if (p) s;
                 else s';
  Instrumented:  if (p) begin $display("IF p taken"); s; end
                 else begin $display("IF p not taken"); s'; end

// Case
  Original:      case (e)
                   x: s;
                   y: s';
                   default: s'';
  Instrumented:  case (e)
                   x: begin $display("CASE e = x"); s; end
                   y: begin $display("CASE e = y"); s'; end
                   default: begin $display("CASE e != x,y"); s''; end

// Array index (b[e] is an array reference in a statement)
  Original:      ... b[e] ...;
  Instrumented:  $display("... b_eval(e) ...");
                 ... b[e] ...;

// Beginning of a cycle
  Instrumented:  $display("New cycle");
                 cc = cc + 1;

beginning of the simulation. We use the display statement to print the syntactic objects into the trace file during the simulation of the instrumented code. For normal arithmetic operations, the instrumented code just records the exact operation that is performed by


the design. For example, for continuous assignment (first row in Table 6-1), the original code is

    assign x = y + z;

The instrumented code is

    always #cycle $display("x,cc = y,cc + z,cc");
    assign x = y + z;

which has the same functionality as the original code and prints "x,cc = y,cc + z,cc" in every cycle with the corresponding cycle number cc. In fact, the value of cc is populated automatically during concrete simulation. The details can be found in Section 6.2.2. For other assignment statements, the instrumented code also marks whether the assignment is made within an always @(pos/negedge ...) block. In this way, the trace file records whether the left hand side variable receives the value of the right hand side expression in the same clock cycle.

Our framework enables natural analysis of arrays. To reason about dynamic array references, we replace the index expression of each array element with its concrete value, and treat each array element as an independent variable. During the concrete simulation, the index expression e is evaluated. The corresponding array element is referred to by concatenating the concrete result eval(e) to the array name in the trace file. We discuss the details in Section 6.2.3.

6.1.4 Concrete Simulation

Once the design is flattened and instrumented, we interleave concrete and symbolic simulation of the design. In each iteration, we perform the concrete simulation of the instrumented design using a simulator for the desired number of cycles. Since the instrumentation process does not affect the functionality of the design, the behavior of the instrumented design is identical to that of the original design. At the same time, the


instrumented design produces a trace file, which records every operation performed by the design in the correct order. This trace file will be used for the symbolic simulation of the concrete execution path.

6.1.5 Path Constraint Generation

In this step, we convert the trace file into a path constraint file. This step is required for two reasons. First, the continuous assignments are simulated using always blocks. As a result, the constraint corresponding to a continuous assignment may be printed after the trace produced by the real always blocks in the same cycle. To simplify the solving process, we re-arrange the trace file so that all constraints produced by continuous assignments are placed before the constraints corresponding to normal always blocks.

Second, the semantics of a register variable requires that if a variable is not updated, it should keep its value from the previous cycle. However, this property is not enforced by the constraints in the trace file. Thus, we have to examine all assignments made during a cycle, and add additional constraints to ensure that all registers maintain their values if they are not updated. The structure of a valid path constraint file is shown in Figure 6-6.

    ......
    Cycle k      Continuous Assignments
    Cycle k      Additional Constraints
    Cycle k      Always blocks
    Cycle k+1    Continuous Assignments
    Cycle k+1    Additional Constraints
    Cycle k+1    Always blocks
    ......

Figure 6-6. Path constraint file structure
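The two rewriting steps described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the format used by the actual implementation. It assumes each trace entry is a tuple `(origin, target, text)`, where `origin` is `"continuous"` or `"always"`, `target` is the written variable as `(name, cycle)` or `None` for branch records, and a register left unwritten in cycle k receives the hold constraint `r,k+1 = r,k`.

```python
def build_path_constraints(trace, registers):
    """Reorder one trace (a list of per-cycle entry lists) into the layout of Figure 6-6.

    For every cycle the result contains, in order: constraints from continuous
    assignments, additional hold constraints for registers that were not
    written, and constraints from always blocks (in their original order).
    """
    result = []
    for k, entries in enumerate(trace):
        continuous = [text for origin, _, text in entries if origin == "continuous"]
        always = [text for origin, _, text in entries if origin == "always"]
        written = {target for _, target, _ in entries if target is not None}
        holds = [
            f"{reg},{k + 1} = {reg},{k}"
            for reg in registers
            if (reg, k + 1) not in written
        ]
        result.append(continuous + holds + always)
    return result


if __name__ == "__main__":
    # One cycle of a hypothetical design with registers "out" and "flag" and a
    # wire "w"; the continuous assignment was printed last by the simulator.
    trace = [[
        ("always", ("out", 1), "out,1 = out,0 + 1"),
        ("always", None, "IF out,0 == 40 not taken"),
        ("continuous", ("w", 0), "w,0 = out,0 + 2"),
    ]]
    for line in build_path_constraints(trace, ["out", "flag"])[0]:
        print(line)
```

In the example, the continuous assignment to `w` moves to the front and the untouched register `flag` receives a hold constraint, while the always-block records keep their relative order.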


6.1.6 Test Generation

First, we discuss the test generation for the path activation fault. Since the goal is to explore unreached execution paths, we can negate a path constraint and use the constraint solver to create a new input assignment, which will guide the design to a different path. Currently, we negate the topmost path constraint. As a result, we are essentially performing a depth first search.

Algorithm 6: Test Generation Algorithm

test_gen(constr[0, ..., top])
 1: for i = top to 0 do
 2:   if constr[i] is a branch constraint then
 3:     c = find_next(constr[i])
 4:     while c ≠ null do
 5:       I' = satisfy(constr[0, ..., i − 1] ∧ c)
 6:       if I' ≠ null then
 7:         return I'
 8:       endif
 9:       c = find_next(constr[i])
10:     endwhile
11:   endif
12: endfor
13: return null

find_next(branch)
 1: Add branch into covered
 2: if branch is an IF statement then
 3:   if ¬branch ∉ covered then
 4:     return ¬branch
 5:   endif
 6: endif
 7: if branch is a CASE statement then
 8:   find and return next uncovered case, if any.
 9: endif
10: return null

Algorithm 6 presents our test generation algorithm test_gen for the path activation fault in detail. The algorithm takes the path constraint file constr[0, ..., top] as input, where constr[0] and constr[top] are the first and last constraints in the file, respectively.

PAGE 93

Function test_gen examines all constraints produced by branch conditions in reverse order. For every branch constraint, we first mark it as covered, then try to find the next uncovered branch constraint. For an IF statement, we just need to check the negated version of the branch constraint. For a CASE statement, we have to search for the next uncovered case. After that, the new branch constraint c is added to all previous path constraints constr[0, ..., i-1] to form the constraints for the next test. If it is satisfiable, the assignment $I'$ returned from the constraint solver will be returned as the next test input. Otherwise, we examine the next uncovered branch, until all branches are checked. In this way, it is guaranteed that $I'$ will force the design to exercise a different execution path during the next round of simulation. Recall that the design is simulated for a fixed number of cycles. Our algorithm eventually terminates once all reachable branches within the given number of cycles are explored.

Each branch is uniquely identified by its line number, flattened instance name, and cycle number. To avoid the path explosion problem, a covered branch is marked and not explored again in the following test generation process.

For other fault models, including the stuck-at, node, edge, sequence and interaction fault models, the desired behavior can be checked during the exploration of different execution paths. Once we obtain a new execution path, the constraint solver is employed to check whether the desired behavior is possible on the path.

6.1.7 Constraint Solving Optimization

In our current implementation, we employed Yices [34] as our constraint solver. Since the path constraint usually contains a very large number of constraints, it is very important to reduce the time spent in constraint solving. Currently, we use three optimization techniques.

1. Cone-of-influence (COI) reduction: In many designs, a large number of variables are used for data transfer and are not involved in the control path. In other words, they are not in the cone-of-influence of any branch constraint in the current execution path. It is therefore safe to remove the constraints involving these variables from the path constraint file without changing its satisfiability. This optimization is

PAGE 94

similar to the CFG unrolling and UD chain slicing technique proposed in [56]. It should be noticed that since the variable indices in arrays are replaced by their concrete values in the trace file, we are able to detect the data dependency through dynamic array references.

2. Early unsatisfiable detection: Some variables, like the reset signal, are used widely across the entire design as switch variables. As a result, they appear in the path constraint several times in every clock cycle. It is enough to negate the first occurrence of a recurring path constraint, because the negation of its other occurrences must be unsatisfiable.

3. Unsatisfiable core detection: Some constraint solvers are capable of returning the unsatisfiable core of an unsatisfiable model. Clearly, if all constraints in the unsatisfiable core remain in the path constraint file, the model must still be unsatisfiable. This information can be utilized to reduce the number of expensive constraint solver calls by skipping the negation of some path constraints.

6.2 Implementation Details

6.2.1 Design Flattening

Ideally, we can use an HDL parser to produce a flattened version of the original design. Unfortunately, modern HDL parsers usually perform some optimizations during the flattening process. For example, some arithmetic operations may be replaced by synthesizable components. As a result, it is not easy to map the operations performed by the flattened design back to the original design. We use a different approach to solve this problem. Instead of flattening the real design and then performing instrumentation, we solve the problem from the simulator side.

In modern Verilog simulators, the input Verilog file is usually compiled into a simulation file before the real simulation execution. For some simulators like Icarus Verilog [92], the compiled simulation file is presented as an assembly-code-like program. Suppose the instrumented display statement is

    $display("(assert (= need_off 0b0))", need_off);

where (assert (= need_off 0b0)) means that need_off equals zero in the Yices input language. In the simulation file, the display statement is presented as

    %vpi_call 2 6077 "$display", "(assert (= need_off 0b0));", v0x130e570_0;

PAGE 95

Here, the argument list on the second line is a list of addresses, each of which corresponds to a register or wire variable in the Verilog file. For example, v0x130e570_0 is the address of variable need_off. We postfix the variable names with their addresses to remove the ambiguity caused by instantiation of the same module. For example, the above statement is written as

    %vpi_call 2 6077 "$display", ";(assert (= need_off130e570 0b0));";

In this way, different instantiations of the same variable are disambiguated within the trace file. For example, if there is another instantiation of variable need_off, it must have an address other than v0x130e570_0 in the compiled simulation file.

6.2.2 Clock Cycle Population

To differentiate the same variable in different cycles, we concatenate each variable name with the concrete value of the current clock cycle number. During the concrete simulation, the value is populated automatically. To accomplish this, we postfix the variable name with %0d. For example,

    %vpi_call 2 6077 "$display", ";(assert (= need_off130e570c%0d 0b0));", v0x13233b0_0;

where v0x13233b0_0 is the address of cc. During the concrete simulation, the trace file automatically receives the correct cycle count, i.e., the trace output in the second cycle becomes

    (assert (= need_off130e570c2 0b0));

6.2.3 Dynamic Array Reference Disambiguation

As discussed in Section 6, we address variable-indexed elements in the array using the concrete value of the indices, so that we can reason about dynamic data without alias analysis. We implement this feature as follows. Suppose the following display statement is used to assign array element ram[wb_adr_i] the value 0b0.

    %vpi_call 2 42 "$display", ";(assert (= ram[wb_adr_i] 0b0));", v0x1322030, v0x13221b0_0;

PAGE 96

Since wb_adr_i is the index of the element in the array ram, we rewrite the above statement as

    %vpi_call 2 42 "$display", ";(assert (= ram1322030_%d 0b0));", v0x13221b0_0;

Assume ram[wb_adr_i] refers to the element ram[65535] when the corresponding assignment is made during the concrete simulation, i.e., wb_adr_i is 65535. Since we replace wb_adr_i with its concrete value using %d, the resultant trace output becomes

    (assert (= ram1322030_65535 0b0));

Clearly, all references to ram[65535] can be easily identified by checking whether they refer to variable ram1322030_65535.

6.3 Experiments

We developed a prototype of our directed test generation framework. Our test generation tool takes a Verilog design as input and iteratively produces new tests. We have modified Icarus Verilog [92] for instrumentation with approximately 500 lines of C++ code. We also implemented a test generation engine (approximately 2000 lines of C++ code) to perform concrete simulation on the HDL simulator, analyze the trace file, generate path constraints and invoke the SMT solver. Our framework is fully automated and there is no need for manual intervention at any stage.

In this section, we present the experimental results of our case studies. We compared our approach with existing methods including HYBRO [56], the random test technique, and a model checking based approach [22]. The experiments are performed using RTL models from ITC99 and two processor designs. As discussed in Section 6.2.1, we used Icarus Verilog as the Verilog parser and simulator. Yices [34] was employed for constraint solving. All experiments were performed on a 3 GHz AMD Opteron processor with 10 GB memory.

PAGE 97

6.3.1 Designs without Dynamic Array References

In this section, we compare the performance of our approach with HYBRO [56]. To make a fair comparison, we choose the same ITC99 RTL models as [56], with the same number of unrolled cycles and the same SMT solver. We only compare the branch coverage in our experiments, because the assertions used for functional coverage in [56] are not available to the public.

Table 6-2. Comparison with HYBRO [56]

    Benchmark  Unroll    HYBRO [56]              Our approach
               Cycles    Branch Cov.  Time       Branch Cov.  Time
    b01        10        94.44%       0.07s      96.30%       2.24s
    b06        10        94.12%       0.10s      96.30%       2.36s
    b10        30        96.77%       52.14s     96.67%       24.61s
    b11        10        78.26%       0.28s      82.35%       3.75s
    b11        50        91.30%       326.85s    94.44%       270.28s
    b14        15        83.50%       301.69s    98.95%       257.59s

Table 6-2 presents the experimental results. The first two columns indicate the design name and the number of unrolled cycles. The next four columns show the branch coverage rate and the time consumption of HYBRO [56] and our approach, respectively. The branch coverage rate is calculated using the same convention as in [56], where unreachable default branches in case statements are also included. The results suggest that our approach has comparable performance with HYBRO [56] on these benchmarks. Comparable performance is expected because the cone-of-influence reduction employed in our approach is essentially equivalent to the CFG unrolling and UD chain slicing optimization in HYBRO [56]. Note that there are 8 ITC99 benchmarks that have arrays. Since [56] is not applicable to dynamic array references, we do not present those results.

6.3.2 Designs with Dynamic Array References

This experiment was performed on the Zet processor, which is an open source implementation of the 16-bit x86 instruction set architecture. When synthesized in

PAGE 98

configurable devices like FPGAs, the Zet processor can boot MS-DOS 6.22 and run Microsoft Windows 3.0. The processor is implemented using 5K+ lines of Verilog code, 289 continuous assignments, 53 always blocks, 324 register variables and 666 wire variables. Both the main memory and the register file are modeled as arrays and addressed with variables.

Our goal in this experiment is to achieve high branch coverage at the source code level. This is important because there are a large number of conditional branches in the opcode decode stage. Besides, since the x86 instruction set has a variable-length binary encoding, it is not easy to invoke all branches in the design. The primary input of the design is the lowest 4 bytes of the memory space (0x00000-0x00003). Before executing the test, the processor only executes a jump instruction to 0x00000 after reset. The design is simulated for 10 cycles. We compared with random tests since random testing is the only test generation technique that supports HDL designs with dynamic array references.

Table 6-3. Comparison with random testing

    Method         #Tests   Explored   Branch     Time
                            Branches   Coverage
    Random         1000     197        89.95%     366.45s
    Random         5000     204        93.15%     1981.73s
    Random         10000    208        94.98%     3785.49s
    Random         20000    212        96.80%     7386.92s
    Random         40000    213        97.26%     14585.83s
    Our approach   140      218        99.54%     1320.58s

Table 6-3 shows the experimental results. The first five rows depict the results of using 1000, 5000, 10000, 20000, and 40000 random tests, respectively. The performance of our approach is shown in the last row. It can be seen that, due to the random nature, it is very time consuming to reach 100% branch coverage even using thousands of random tests. On the other hand, our directed test generation scheme effectively explored execution paths by avoiding covered branches. With fewer than 200 tests, our framework achieves higher coverage than 40000 random tests.

PAGE 99

Figure 6-7. The MIPS architecture [22]

6.3.3 SAT-based BMC versus Our Approach

To evaluate the effectiveness of our proposed approach on other fault models, we compare our approach with the BMC-based test generation technique [22]. The experiment is performed on the model of a single-issue MIPS processor. Figure 6-7 shows its brief structure. It has five pipeline stages: fetch, decode, execute, memory (MEM), and writeback. The execute stage has four parallel execution paths: integer ALU, seven-stage multiplier (MUL1-MUL7), four-stage floating-point adder (FADD1-FADD4), and multi-cycle divider (DIV). The SAT-based BMC approach accepts the SMV description of MIPS as input, whereas our approach accepts an equivalent Verilog description. We use four different fault models:

1. Node Fault, i.e., a node cannot be activated.
2. Edge Fault, i.e., successive states of a node cannot be activated in a certain order.

PAGE 100

3. Sequence Fault, i.e., the associated nodes and edges cannot be activated in the correct order.
4. Interaction Fault, i.e., a set of nodes cannot be activated at the same time.

Since the faults are described using LTL properties, we manually translated all properties into Verilog assertions regarding the corresponding registers in different cycles. The design is simulated for 10 cycles.

Table 6-4. Comparison with BMC [22]

    Types               #Faults   BMC [22]   Our approach
                                  Time       Time
    Node Fault          20        17.53s     22.56s
    Edge Fault          25        37.51s     24.27s
    Sequence Fault      16        21.10s     20.29s
    Interaction Fault   110       375.22s    340.56s

Table 6-4 shows the experimental results. The first two columns present the fault types and the number of properties to be activated for each fault type. The next two columns depict the time consumption of BMC [22] and our proposed technique, respectively. Although the test generation processes are quite different, the total time consumption of BMC is close to the time consumption of our approach. Actually, the trace file generated by our approach can be viewed as an unrolled version of the design on the current execution path, while the BMC-based approach unrolls the SAT representation of the transition relation of the design. Therefore, our approach has comparable performance with SAT-based BMC when a simple design is involved. However, it should be noticed that it is not always possible to use model checking on real designs directly. When the design contains many branches and dynamic array references, it becomes difficult to apply the BMC-based approach without translation or abstraction, while our proposed approach can still be applied as illustrated in Section 6.3.2.

PAGE 101

6.4 Summary

Functional verification of modern SoC designs is challenging due to increased design complexity and reduced time-to-market. Directed tests are promising because they require significantly fewer tests to achieve the same coverage goal compared to random tests. Unfortunately, model checkers usually do not accept real hardware designs or support features such as arrays. Moreover, real designs usually exceed the capacity of model checkers due to the complexity of static analysis. In this chapter, we presented a novel test generation approach that addresses both of these problems using interleaved concrete and symbolic execution. The design is first simulated to generate an execution trace. The constraint solver is then applied to find the test inputs which can force the real design to exercise the desired behavior. Compared with existing approaches based on combined concrete and symbolic execution, our approach is capable of analyzing real processor designs with dynamic array references. The experimental results demonstrate that our proposed technique is scalable, and enables directed test generation for real designs.

PAGE 102

CHAPTER 7
TEMPERATURE- AND ENERGY-CONSTRAINED SCHEDULING IN REAL-TIME SYSTEMS

Since high on-chip thermal dissipation has a severe detrimental impact, we have to control the instantaneous temperature so that it does not go beyond a certain threshold. Dynamic voltage scaling (DVS) is acknowledged as one of the most efficient techniques used in both energy optimization [21] and temperature management [101]. In the existing literature, temperature (energy)-constrained means that there is a temperature threshold (energy budget) which cannot be exceeded, while temperature (energy)-aware means that there is no constraint but the maximum instantaneous temperature (total energy consumption) needs to be minimized. In this chapter, we propose a formal method based on model checking for temperature- and energy-constrained (TCEC) scheduling problems in multitasking systems. In this work [71], we present an approximation algorithm, which effectively addresses the state space explosion problem encountered by model checkers [88]. The approximation scheme gives no false positive answers, while its probability of reporting a false negative answer can be made small enough for practical usage.

The rest of the chapter is organized as follows. Section 7.1 provides related background information. Section 7.2 provides an overview of our framework. Section 7.3 describes our contribution in detail. Experimental results are presented in Section 7.5. Finally, Section 7.6 summarizes the chapter.

7.1 Background and Problem Formulation

This section provides the formal description of the TCEC scheduling problem. Since many aspects of real-time systems are involved, we first provide some background information.

7.1.1 Thermal Model

A thermal RC circuit is normally utilized to model the temperature variation behavior of a microprocessor [101]. We adopt the RC circuit model proposed in [75], which is

PAGE 103

widely used in recent research [101][46], to capture the heat transfer phenomena in the processor. If $P$ denotes the power consumption during a time interval, $R$ denotes the thermal resistance, $C$ represents the thermal capacitance, and $T_{amb}$ and $T_{init}$ are the ambient and initial temperature, respectively, the temperature at the end of the time interval can be calculated as:

\[ T = P \cdot R + T_{amb} - (P \cdot R + T_{amb} - T_{init})\, e^{-\frac{t}{RC}} = \left(1 - e^{-\frac{t}{RC}}\right) T_s + e^{-\frac{t}{RC}}\, T_{init} \]

where $t$ is the length of the time interval and $T_s = P \cdot R + T_{amb}$ is the steady-state temperature.

7.1.2 Energy Model

We adapt the energy model proposed in [60]. The processor's dynamic power can be represented as

\[ P_{dyn} = \alpha \cdot C \cdot V_{dd}^2 \cdot f \]

Here $V_{dd}$ is the supply voltage and $f$ is the operation frequency. $C$ is the total capacitance and $\alpha$ is the actual switching activity, which varies for different applications [8]. In other words, the power profiles of tasks can differ from each other. Static power is given by

\[ P_{sta} = V_{dd} \cdot I_{subth} + |V_{bs}| \cdot I_j \]

where $V_{bs}$, $I_{subth}$ and $I_j$ denote the body bias voltage, subthreshold current and reverse bias junction current, respectively. Hence, we have $P = P_{dyn} + P_{sta}$. Our technique is, however, independent of the power model and the thermal model.

7.1.3 System Model

The system we consider can be modeled as:

1. A voltage-scalable processor which supports $l$ discrete voltage levels $\{v_1, v_2, \ldots, v_l\}$.
2. A set of $m$ independent tasks $\{\tau_1, \tau_2, \ldots, \tau_m\}$.
3. Each task $\tau_i \in \{\tau_1, \tau_2, \ldots, \tau_m\}$ has known attributes including worst-case workload, arrival time, deadline, period (if it is periodic) or inter-arrival time (if it is aperiodic/sporadic).
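The thermal and power equations above translate directly into code. The following Python sketch evaluates them; the function and parameter names are chosen here for illustration only, and the numeric values in the usage note are arbitrary rather than taken from the thesis.

```python
import math

def next_temperature(t_init, power, r, c, t_amb, dt):
    """Temperature after running at constant power for dt (RC thermal model)."""
    t_steady = power * r + t_amb          # T_s = P*R + T_amb
    a = math.exp(-dt / (r * c))           # e^{-t/RC}
    return (1.0 - a) * t_steady + a * t_init

def dynamic_power(alpha, cap, vdd, freq):
    """P_dyn = alpha * C * Vdd^2 * f."""
    return alpha * cap * vdd ** 2 * freq

def static_power(vdd, i_subth, v_bs, i_j):
    """P_sta = Vdd * I_subth + |V_bs| * I_j."""
    return vdd * i_subth + abs(v_bs) * i_j
```

For instance, `next_temperature(65, 10, 0.5, 2, 45, 0)` returns the initial temperature unchanged, and for a very long interval the result approaches the steady-state value `10 * 0.5 + 45`.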

PAGE 104

The runtime overhead of voltage scaling is variable and depends on the original and new voltage levels. The context switching overhead is assumed to be constant. For ease of discussion, the terms task, job and execution block refer to the same entity in the rest of this chapter.

7.1.4 TCEC Problem

The proposed methodology can be applied to both scenarios, in which the task set has a common deadline or each task has its own deadline. For ease of discussion, the following definition of the TCEC problem is constructed for task sets with a common deadline. The second case will be discussed in Section 7.4.

Given a trace of $m$ jobs $\{\tau_1, \tau_2, \ldots, \tau_m\}$, where task $\tau_{i+1}$ is executed after $\tau_i$, $1 \le i$
PAGE 105

Definition 5. TCEC instance: Is there a voltage assignment $\{l_1, l_2, \ldots, l_m\}$ (see footnote 1) such that:

\[ \sum_{i=1}^{m} \left( t_{i,l_i} + (\text{time overhead})_{l_{i-1},l_i} \right) \le D \]

\[ \sum_{i=1}^{m} \left( w_{i,l_i} + (\text{energy overhead})_{l_{i-1},l_i} \right) \le W \]

\[ T_i \le T_{max}, \quad \forall i \in 1, \ldots, m \]

$T_i$ is calculated with the thermal model of Section 7.1.1 for each $i$, i.e.,

\[ T_i = \left(1 - e^{-t_{i,l_i}/RC}\right) T_s^{l_i} + e^{-t_{i,l_i}/RC}\, T_{i-1} \]

Recall that $t_{i,l_i}$ is the worst-case execution time of task $\tau_i$ under voltage level $l_i$, $T_0 = T_{init}$, and $T_s^{l_i}$ is the steady-state temperature of the system when $l_i$ is applied. The three inequalities above denote the common deadline, energy and temperature constraints, respectively.

When the workload is periodic, we also require the temperature at the end of the hyper-period to be less than or equal to the initial temperature. In this way, the resultant schedule is guaranteed not to exceed the temperature constraint irrespective of the length of the execution time (i.e., the hyper-period may repeat many times). Formally, suppose there are $m$ tasks within the hyper-period, and the last task is $\tau_m$. In addition to the temperature constraints above, we also require

\[ T_m \le T_{init} \]

More detailed discussion about periodic workloads can be found in Section 7.4.

Footnote 1: $l_i$ denotes the index of the processor voltage level assigned to $\tau_i$.
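To make Definition 5 concrete, the sketch below checks one voltage assignment against the three constraints and, as a naive baseline, enumerates all $l^m$ assignments. It is an illustration under assumptions: the data layout (`t[i][level]`, `w[i][level]`, optional switching-overhead matrices `sw_t` and `sw_w`), the choice of charging no switching overhead before the first job, and all numbers in the usage note are hypothetical.

```python
import math
from itertools import product

def check_assignment(levels, t, w, t_steady, rc, t_init, D, W, t_max,
                     sw_t=None, sw_w=None, periodic=False):
    """Return True if the assignment meets deadline, energy and temperature limits."""
    total_t = total_w = 0.0
    temp = t_init
    prev = None
    for i, lv in enumerate(levels):
        total_t += t[i][lv]
        total_w += w[i][lv]
        # assumption: overhead is only charged when a previous level exists
        if prev is not None and sw_t is not None:
            total_t += sw_t[prev][lv]
        if prev is not None and sw_w is not None:
            total_w += sw_w[prev][lv]
        a = math.exp(-t[i][lv] / rc)
        temp = (1.0 - a) * t_steady[lv] + a * temp   # T_i from T_{i-1}
        if temp > t_max:
            return False
        prev = lv
    if periodic and temp > t_init:                   # T_m <= T_init
        return False
    return total_t <= D and total_w <= W

def find_assignment(m, l, **params):
    """Brute-force search over all l**m assignments (exponential baseline)."""
    for levels in product(range(l), repeat=m):
        if check_assignment(levels, **params):
            return levels
    return None
```

With two jobs, two levels, `t=[[21, 15], [12, 9]]`, `w=[[31, 40], [18, 24]]`, steady-state temperatures `[70.0, 80.0]`, `rc=30.0`, `t_init=65.0`, `D=32`, `W=55` and `t_max=75.0`, the search returns `(0, 1)`. The exhaustive loop is what the later sections aim to avoid.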

PAGE 106

7.2 Overview

Figure 7-1 illustrates the workflow of our approach, which accepts a task execution trace as input. The task execution trace can be produced by a scheduler with a certain scheduling policy. The scheduler executes the task set under the highest voltage level and produces a trace of execution blocks. An execution block is defined as a piece of task execution in a continuous period of time under a single processor voltage/frequency level. Each execution block is essentially a whole task instance in non-preemptive systems. However, in preemptive scheduling, tasks could be preempted during execution, hence one block can be a segment of one task. The scheduler records runtime information for each block including its corresponding task, required workload, arrival time and deadline, if applicable.

The task execution trace, along with the system specification (processor voltage/frequency levels, temperature constraints and/or energy budget) and thermal/power models, is fed into the timed automata generator (TAG) that we have developed. TAG generates two important outputs. One is the corresponding timed automata model [88], and the other is the set of properties reflecting the temperature/energy/deadline constraints defined in the system specification. After that, a suitable solver (e.g., a model checker) is applied to find a feasible schedule of the tasks, or confirm that the required constraints cannot be met. This methodology is flexible and completely automatic. It is based on formal techniques and is suitable for early design stages.

As discussed in [88], model checkers like UPPAAL can be used to verify the generated model directly. However, when the number of jobs is large, it can be time consuming to check the properties on the timed automata directly. The reason is that the underlying symbolic model checker sometimes cannot handle large problems due to the state space explosion problem. To address the state space explosion problem in model checking, we propose an approximation algorithm for TCEC scheduling in Section 7.3.

PAGE 107

We also demonstrate the applicability of our approach to solve other problem variants including TC, TA, TAEC and TCEA in Section 7.4.

Figure 7-1. Overview of our TCEC schedulability framework.

7.3 Approximation Algorithm for TCEC Scheduling

To alleviate the state explosion problem in TCEC scheduling, we can formulate our model checking problem as a Multi-Constrained Path (MCP) problem. Although MCP is NP-Complete for more than one constraint, we are able to design a polynomial time approximation scheme which can be tuned to sufficient accuracy for practical design usage. In this section, we first explain how to model the TCEC problem as MCP. Next, we discuss the pseudo-polynomial time model checking algorithm based on the Bellman-Ford algorithm. Finally, we present our polynomial time approximation algorithm for TCEC.

7.3.1 Notations

Given a directed graph $G = (V, E)$, a path $p = s \to n_1 \to \cdots \to n_i$ and an edge $e_i = (n_i, n_{i+1}) \in E$, where $s, n_1, \ldots, n_i \in V$, the notation $p \| e_i$ denotes the path $s \to n_1 \to \cdots \to n_i \to n_{i+1}$. In other words, $p$ can also be expressed as $e_0 \| e_1 \| \cdots \| e_i$ where $e_0 = (s, n_1)$, $e_1 = (n_1, n_2)$, $e_i = (n_i, n_{i+1})$.

PAGE 108

Given vectors $\mathbf{a}, \mathbf{b} \in R^N$, we say that $\mathbf{a}$ is dominated by $\mathbf{b}$, or $\mathbf{a} \preceq \mathbf{b}$, iff each component of $\mathbf{a}$ is smaller than or equal to the corresponding component in $\mathbf{b}$. For a vector $\mathbf{a}$ we use $a_1, a_2, a_3$ to denote the first, second and third component of $\mathbf{a}$.

7.3.2 TCEC as MCP

Figure 7-2. Job execution graph

An instance of TCEC can be reduced to an instance of MCP, if we view the execution of jobs at different voltage levels as a path in the job execution graph (JEG). As shown in Figure 7-2, a JEG contains a source node $s$, a destination node $d$, and $m$ layers of job (task) nodes. In each layer, there are $l$ nodes, one for each voltage level. Edges only exist between different layers of job nodes, or between job nodes and the source/destination nodes. Formally, we define the JEG as follows.

Definition 6. Job execution graph (JEG) is an acyclic directed graph $G = (V, E)$ with the following properties:

$V = \{s, d\} \cup \{n_{i,j} \mid 1 \le i \le m, 1 \le j \le l\}$;

$E = \{(s, n_{1,j}) \mid 1 \le j \le l\} \cup \{(n_{m,j}, d) \mid 1 \le j \le l\} \cup \{(n_{i,j}, n_{i+1,j'}) \mid 1 \le i$
PAGE 109

\[ f_t^p(t_0) = f_t^q(t_0) + t_{i,j} + (\text{time overhead})_{j',j} \]

\[ f_w^p(w_0) = f_w^q(w_0) + w_{i,j} + (\text{energy overhead})_{j',j} \]

\[ f_T^p(T_0) = e^{-t_{i,j}/RC}\, f_T^q(T_0) + \left(1 - e^{-t_{i,j}/RC}\right) T_s \]

where $q = e_0 \| e_1 \| \cdots \| e_{i-1}$ is a prefix of $p$, which starts from $s$ and ends at $n_{i-1,j'}$. For $p = e_0 = (s, n_{1,j})$,

\[ f_t^p(t_0) = t_0, \qquad f_w^p(w_0) = w_0, \qquad f_T^p(T_0) = T_0 \]

where $t_0$, $w_0$ and $T_0$ are the time, energy consumption and temperature before the execution of the task set. Normally, we have $t_0 = w_0 = 0$ and $T_0 = T_{init}$. We can also write the path transfer functions in vector form

\[ \mathbf{f}^p(\mathbf{I}) = \begin{bmatrix} f_t^p(t_0) \\ f_w^p(w_0) \\ f_T^p(T_0) \end{bmatrix} \]

where $\mathbf{I} = [t_0 \; w_0 \; T_0]^T$.

Using the above definition, the values of time, energy consumption and temperature of the first $i$ jobs with voltage assignment $\{j_1, j_2, \ldots, j_i\}$ can be expressed as $\mathbf{f}^p(\mathbf{I})$ where $p = s \to n_{1,j_1} \to \cdots \to n_{i,j_i}$. We use the example in Figure 7-3 to illustrate such computation in practice. In this case, we have $m = 2$ jobs and $l = 2$ voltage levels. Suppose that the initial temperature is $T_0 = 65\,^{\circ}$C and the constant $RC = 30$ us. The design constraints are deadline $D = 32$ us, energy budget $W = 55$ mJ and

PAGE 110

Figure 7-3. JEG of TCEC. The values next to each edge are the corresponding time and energy consumption. Node labels (steady-state temperature): $n_{1,1}$ [70 °C], $n_{2,1}$ [70 °C], $n_{1,2}$ [80 °C], $n_{2,2}$ [80 °C]. Edge labels: $e_1$ [0 us, 0 mJ], $e_2$ [0 us, 0 mJ], $e_3$ [20 us, 30 mJ], $e_4$ [21 us, 31 mJ], $e_5$ [20 us, 41 mJ], $e_6$ [15 us, 40 mJ], $e_7$ [12 us, 18 mJ], $e_8$ [9 us, 24 mJ].

maximum temperature $T_{max} = 75\,^{\circ}$C. Assume that we decide to use voltage levels 1 and 2 to execute jobs 1 and 2, respectively. Based on the definition of JEG, this voltage assignment corresponds to the $s$-$d$ path $p = e_1 \| e_4 \| e_8$ (highlighted). The time consumption after the execution of all jobs can therefore be computed as

\[ f_t^{e_1 \| e_4 \| e_8} = f_t^{e_1 \| e_4} + 9\,\text{us} = f_t^{e_1} + 21\,\text{us} + 9\,\text{us} = 0\,\text{us} + 21\,\text{us} + 9\,\text{us} = 30\,\text{us} \]

Similarly, we can compute the energy consumption of $p$ as

\[ f_w^{e_1 \| e_4 \| e_8} = 0\,\text{mJ} + 31\,\text{mJ} + 24\,\text{mJ} = 55\,\text{mJ} \]

and the final temperature of $p$ as

\[ f_T^{e_1 \| e_4 \| e_8} = e^{-\frac{9}{30}} \left( e^{-\frac{21}{30}} \cdot 65\,^{\circ}\text{C} + \left(1 - e^{-\frac{21}{30}}\right) \cdot 70\,^{\circ}\text{C} \right) + \left(1 - e^{-\frac{9}{30}}\right) \cdot 80\,^{\circ}\text{C} = 70.8\,^{\circ}\text{C} \]

PAGE 111

In other words, our schedule (or path) $p$ satisfies the constraints $D = 32$ us, $W = 55$ mJ and $T_{max} = 75\,^{\circ}$C.

Clearly, the model checking problem discussed in [88] can be answered by checking whether there exists a path $p$ such that $\mathbf{f}^p(\mathbf{I}) \preceq \mathbf{C}$, where $\mathbf{C} = [D \; W \; T_{max}]^T$. The formal definition of our MCP problem is as follows.

Definition 7. MCP($G$, $\mathbf{I}$, $\mathbf{C}$) instance: Given a job execution graph $G$, an initial state vector $\mathbf{I} = [t_0, w_0, T_0]^T$, and a constraint vector $\mathbf{C} = [D, W, T_{max}]^T$, is there an $s$-$d$ path $p = e_0 \| \cdots \| e_m$ such that for all $0 \le i \le m$

\[ \mathbf{f}^{e_0 \| \cdots \| e_i}(\mathbf{I}) \preceq \mathbf{C} \]

The definition above seems to be tighter than the definition of TCEC given in Section 7.1.4, because all constraints are enforced after each job, while the deadline and energy constraints are enforced only after the last job in TCEC. However, they are essentially equivalent due to the monotonic nature of execution time and energy consumption.

In the rest of the chapter, we will use MCP to represent MCP($G$, $\mathbf{I}$, $\mathbf{C}$) for ease of illustration. Our definition of MCP differs from Quality of Service (QoS) MCP problems [26, 76, 96, 98] in networking, because the computation of the temperature is not additive. As a result, the existing techniques cannot be applied directly to solve our problem.

7.3.3 An Exact Algorithm for MCP

We have developed Algorithm 7, which is an extended Bellman-Ford (EBF) algorithm used for computing the exact answer to the MCP problem. It is developed based on the EBF algorithms in [26, 76, 98], which were used to solve MCP with constant additive constraints. This algorithm accepts an MCP instance, including the JEG $G$, initial state vector $\mathbf{I}$, and constraint vector $\mathbf{C}$, and returns the answer to the MCP problem. The basic idea of this algorithm is to keep updating a path set Path($v$) for each vertex $v$

PAGE 112

Algorithm 7: Extended Bellman-Ford (EBF) Algorithm

    EBF($G$, $\mathbf{I}$, $\mathbf{C}$)
     1: for each $v \in V$ do
     2:     Path($v$) = $\emptyset$
     3: end for
     4: for $j = 1$ to $|l|$ do
     5:     Path($n_{1,j}$) = $\{(s, n_{1,j})\}$
     6: end for
     7: for $i = 2$ to $|m|$ do
     8:     for $j = 1$ to $|l|$ do
     9:         for each edge $(u, n_{i,j}) \in E$ do
    10:             Relax($u$, $n_{i,j}$)
    11:         end for
    12:     end for
    13: end for
    14: for each edge $(u, d) \in E$ do
    15:     if Relax($u$, $d$) then
    16:         return TRUE
    17:     end if
    18: end for
    19: return FALSE

    Relax($u$, $v$)
     1: for each $p \in$ Path($u$) such that $\mathbf{f}^{p \| (u,v)}(\mathbf{I}) \preceq \mathbf{C}$ do
     2:     Skip = FALSE
     3:     for each $q \in$ Path($v$) do
     4:         if $\mathbf{f}^{q}(\mathbf{I}) \preceq \mathbf{f}^{p \| (u,v)}(\mathbf{I})$ then
     5:             Skip = TRUE
     6:             Break
     7:         end if
     8:     end for
     9:     if Skip = FALSE then
    10:         Insert $p \| (u,v)$ into Path($v$)
    11:     end if
    12:     if $v = d$ then
    13:         return TRUE
    14:     end if
    15: end for
    16: return FALSE

PAGE 113

which is a subset of all possible $s$-$v$ paths. By implicitly enumerating all possible paths between $s$ and $d$, we just need to check whether there is any path $p \in$ Path($d$) that satisfies the constraint vector $\mathbf{C}$. This enumeration is accomplished by calling function Relax on all edges for $|V|$ times (lines 4-7 in EBF). All paths are examined implicitly because the longest path in the acyclic graph $G$ contains at most $|V|$ edges. To improve the efficiency, Relax adds a new path $p \| (u,v)$ only when it does not dominate any existing path in Path($v$).

Example 2: We use the same example in Figure 7-3 to demonstrate the execution of EBF. First, we initialize Path($n_{1,1}$) = $\{e_1\}$ and Path($n_{1,2}$) = $\{e_2\}$. Then, we perform Relax on edges $e_3$ and $e_4$, which start from $n_{1,1}$ (lines 7-8 in EBF). In Relax for $e_3$, we attempt to create new paths from $s$ to $n_{2,1}$ by appending $e_3$ to the known paths from $s$ to $n_{1,1}$ and $n_{1,2}$, which are stored in Path($n_{1,1}$) and Path($n_{1,2}$). Since Path($n_{1,1}$) = $\{e_1\}$, we just need to check path $e_1 \| e_3$. It is easy to see that $\mathbf{f}^{e_1 \| e_3}(\mathbf{I}) = [20 \; 30 \; 67.4]^T$ is dominated by the constraints $\mathbf{C} = [D \; W \; T_{max}]^T = [32 \; 55 \; 75]^T$, i.e., the constraints are not violated. Therefore, path $e_1 \| e_3$ is inserted into Path($n_{2,1}$), which was empty. On the other hand, path $e_2 \| e_5$ will not be added into Path($n_{2,1}$) during Relax for $e_5$, because $\mathbf{f}^{e_2 \| e_5}(\mathbf{I}) = [20 \; 41 \; 72.3]^T$ dominates $\mathbf{f}^{e_1 \| e_3}(\mathbf{I}) = [20 \; 30 \; 67.4]^T$. The reason is that if there exists a path in Path($n_{2,1}$) like $e_1 \| e_3$ which has lower time, energy and temperature values than the new path $e_2 \| e_5$, the new path cannot be a prefix of the optimal path. We repeat the above process until we reach node $d$. If Path($d$) contains a path which satisfies all the constraints, like $e_1 \| e_4 \| e_8$, EBF finds the required schedule and returns true. Otherwise, we conclude that such a schedule does not exist.
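The walk-through in Example 2 can be reproduced with a compact Python sketch of the path-set idea behind EBF. It is a simplified illustration, not the thesis implementation: edge endpoints are inferred from the text and the labels of Figure 7-3 and should be treated as assumptions, edges are processed once in layer order instead of through the nested loops of Algorithm 7, and the function returns every surviving $s$-$d$ path rather than stopping at the first one.

```python
import math

RC, T_INIT = 30.0, 65.0
T_STEADY = {"n11": 70.0, "n21": 70.0, "n12": 80.0, "n22": 80.0}
# (name, from, to, time_us, energy_mJ); endpoints are an assumption
EDGES = [
    ("e1", "s", "n11", 0, 0), ("e2", "s", "n12", 0, 0),
    ("e3", "n11", "n21", 20, 30), ("e4", "n11", "n22", 21, 31),
    ("e5", "n12", "n21", 20, 41), ("e6", "n12", "n22", 15, 40),
    ("e7", "n21", "d", 12, 18), ("e8", "n22", "d", 9, 24),
]

def extend(state, edge):
    """Apply the path transfer functions for one more edge."""
    _name, u, _v, dt, dw = edge
    t, w, temp = state
    if u != "s":  # edges leaving s cost nothing and keep the temperature
        a = math.exp(-dt / RC)
        temp = a * temp + (1.0 - a) * T_STEADY[u]
    return (t + dt, w + dw, temp)

def dominated(a, b):
    """True if every component of a is <= the matching component of b."""
    return all(x <= y for x, y in zip(a, b))

def ebf(C, I=(0.0, 0.0, T_INIT)):
    paths = {"s": [(I, [])]}
    for edge in EDGES:  # listed in layer order, so one pass suffices
        name, u, v = edge[:3]
        for state, names in paths.get(u, []):
            new = extend(state, edge)
            if not dominated(new, C):
                continue  # violates a constraint
            if any(dominated(q, new) for q, _ in paths.get(v, [])):
                continue  # an existing path is at least as good
            paths.setdefault(v, []).append((new, names + [name]))
    return [names for _, names in paths.get("d", [])]
```

With `C = (32, 55, 75)`, the path `["e1", "e4", "e8"]` is among the results, and `e2 || e5` is pruned at `n21` exactly as described above.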
Although EBF is guaranteed to find the exact answer to MCP, its time complexity is quite high. As shown in Algorithm 7, Relax is executed $m \cdot l$ times, while the time complexity of Relax can be $O(|Path_{max}|^2)$, where $|Path_{max}| = \max_{v \in V} |Path(v)|$. Therefore, the overall complexity of EBF is $O(m \cdot l \cdot |Path_{max}|^2)$, which is only pseudo-polynomial, because in the worst case $|Path_{max}|$ can be $O(l^m)$. Unfortunately,

PAGE 114

we may not be able to find a solution to MCP in polynomial time. As indicated in [89], we can reduce the Partition problem to an MCP instance by properly setting $t_{i,j}$ and $w_{i,j}$. In other words, MCP is NP-Complete.

7.3.4 Approximation Algorithm

Before we introduce our approximation scheme for MCP, we first present another problem, MCP$_\epsilon$, which is closely related to MCP.

Definition 8. MCP$_\epsilon$($G$, $\mathbf{I}$, $\mathbf{C}$) instance: Given a positive constant $\epsilon > 0$, a job execution graph $G$, an initial state vector $\mathbf{I} = [t_0, w_0, T_0]^T$, and a constraint vector $\mathbf{C} = [D, W, T_{max}]^T$, is there an $s$-$d$ path $p = e_0 \| \cdots \| e_m$ such that for all $0 \le i \le m$

\[ f_t^{e_0 \| \cdots \| e_i}(t_0) \le D \]

\[ f_w^{e_0 \| \cdots \| e_i}(w_0) \le (1 - \epsilon)\, W \]

\[ f_T^{e_0 \| \cdots \| e_i}(T_0) \le (1 - \epsilon)\, T_{max} \]

MCP$_\epsilon$ is tighter than MCP. Any $s$-$d$ path that satisfies the constraints in MCP$_\epsilon$ also satisfies the constraints in MCP, but not vice versa. In this section, we are going to develop an approximation algorithm EBF$_\epsilon$ for MCP, such that (1) EBF$_\epsilon$ is true implies MCP is true, and (2) EBF$_\epsilon$ is false implies MCP$_\epsilon$ is false. In other words, EBF$_\epsilon$ gives no false positive answer to MCP. It may give a false negative answer when the exact answers to MCP and MCP$_\epsilon$ are true and false, respectively (i.e., there are feasible paths for MCP, but no feasible path for MCP$_\epsilon$). Since MCP$_\epsilon$ becomes MCP when $\epsilon = 0$, EBF$_\epsilon$ becomes more and more accurate as $\epsilon \to 0$.

In a JEG $G = (V, E)$, we define functions $h_1, h_2, h_3$ on each edge to simplify the description of our approximation scheme. Here, $h_1$, $h_2$, and $h_3$ correspond to the transfer functions of time, energy and temperature, respectively. For

PAGE 115

$e = (s, n_{1,j}) \in E$, $j \le l$, we define

\[ h_1^e(x_1) = x_1, \qquad h_2^e(x_2) = x_2, \qquad h_3^e(x_3) = x_3 \]

For other $e \in E$,

\[ h_1^e(x_1) = x_1 + t_{i,j} + (\text{time overhead})_{j',j} \]

\[ h_2^e(x_2) = x_2 + w_{i,j} + (\text{energy overhead})_{j',j} \]

\[ h_3^e(x_3) = e^{-t_{i,j}/RC}\, x_3 + \left(1 - e^{-t_{i,j}/RC}\right) T_s \]

Based on the definition of path transfer functions, it is easy to see that for path $p = e_0 \| \cdots \| e_i$

\[ f_t^p(t_0) = h_1^{e_i} \circ \cdots \circ h_1^{e_0}(t_0) \]

\[ f_w^p(w_0) = h_2^{e_i} \circ \cdots \circ h_2^{e_0}(w_0) \]

\[ f_T^p(T_0) = h_3^{e_i} \circ \cdots \circ h_3^{e_0}(T_0) \]

where $\circ$ is the composition operation for successive invocation of functions.

The basic idea of our approximation scheme is to build a table $Z_n$ for each node $n$. Each cell in this table holds the least value of time consumption among all execution paths which have the same energy and temperature values after scaling. In other words, each cell represents an optimal execution path. Dynamic programming is then applied to fill each $Z_n$. The approximated solution can be obtained by checking $Z_d$, which holds the approximated least time consumption of all possible execution paths.

Algorithm 8 shows the details of our approximation algorithm EBF$_\epsilon$. Initially, we compute the table size $M$ and the step size $\delta_k$ for each constraint based on the value of $\epsilon$ (lines 1 and 2 of EBF$_\epsilon$), and then initialize $M \times M$ tables $Z_n$ for each node in $G$. Here,

PAGE 116

Algorithm 8: EBF$_\epsilon$($G$, $\mathbf{I}$, $\mathbf{C}$)

     1: $M = \lceil (m+1)/\epsilon \rceil$
     2: $\delta_k = C_k\,\epsilon/(m+1)$, $k = 2, 3$
     3: for each $v \in G$ do
     4:     for each $(c_2, c_3) \in \{0, 1, \ldots, M\}^2$ do
     5:         $Z_v[c_2, c_3] = \infty$
     6:         $\pi_v[c_2, c_3]$ = null
     7:     end for
     8: end for
     9: $Z_s[\lceil I_2/\delta_2 \rceil, \lceil I_3/\delta_3 \rceil] = 0$
    10: for $i = 1$ to $|m|$ do
    11:     for $j = 0$ to $|l|$ do
    12:         for each edge $(u, n_{i,j}) \in E$ do
    13:             Relax$_\epsilon$($u$, $n_{i,j}$)
    14:         end for
    15:     end for
    16: end for
    17: for each edge $(u, d) \in E$ do
    18:     if Relax$_\epsilon$($u$, $d$) then
    19:         return TRUE
    20:     end if
    21: end for
    22: return FALSE

    Relax$_\epsilon$($u$, $v$)
     1: for each $(c_2, c_3) \in \{0, 1, \ldots, M\} \times \{0, 1, \ldots, M\}$ do
     2:     if $h_k^{(u,v)}(c_k \delta_k) \le C_k$ for $k = 2, 3$ then
     3:         $b_k = \lceil h_k^{(u,v)}(c_k \delta_k)/\delta_k \rceil$, $k = 2, 3$
     4:         $Z_{new} = h_1^{(u,v)}(Z_u[c_2, c_3])$
     5:         if $Z_{new}$
PAGE 117

the step size $\delta_k$ is used to scale the energy and temperature values as indices in the table. For example, cell $[\lceil I_2/\delta_2 \rceil, \lceil I_3/\delta_3 \rceil]$ in $Z^s$ holds the time consumption before we execute any jobs, which is initialized as 0 in line 9. The rest of EBF$_\epsilon$ is similar to EBF. We use dynamic programming to fill each $Z^n$ by calling Relax$_\epsilon$, which can be viewed as a scaled version of Relax. In Relax$_\epsilon(u, v)$, we traverse $Z^u$ to fill $Z^v$ by extending paths in $Z^u$. Since $Z^u$ is an $M$ by $M$ table, we use $c_2$ and $c_3 \in \{0, 1, .., M\}$ as index variables (line 1). As we have discussed previously, each cell $Z^u[c_2, c_3]$ represents an execution path from $s$ to $u$ with time consumption $Z^u[c_2, c_3]$, energy consumption $c_2 \delta_2$ and temperature $c_3 \delta_3$ [2]. In line 2 of Relax$_\epsilon$, we first check whether the energy and temperature constraints are violated if the job is executed based on edge $(u, v)$. If no violation occurs, we calculate the scaled version of the new energy and temperature values $(b_2, b_3)$. After that, we compare the new time consumption $Z_{new} = h^{(u,v)}_1(Z^u[c_2, c_3])$ with the current value in $Z^v[b_2, b_3]$ and update $Z^v$ when necessary. If we have already reached destination $d$ and the time consumption $Z_{new}$ is still less than the required value $C_1$ [3], we have found the required schedule. Compared with Relax, Relax$_\epsilon$ does not store the paths explicitly as $Path_v$ in Relax, but implicitly in different cells within each table.

EBF$_\epsilon$ is a polynomial time algorithm for a given $\epsilon$, because the complexity of Relax$_\epsilon$ is $M^2$, or $(m/\epsilon)^2$, and Relax$_\epsilon$ is executed $m \cdot l$ times. Therefore, the overall time complexity is $O(m \cdot l \cdot (m/\epsilon)^2)$. Now, we show that EBF$_\epsilon$ is a polynomial time algorithm with the approximation properties as claimed by the following two theorems.

Theorem 7.1. Given an instance of MCP$(G, I, C)$, if EBF$_\epsilon(G, I, C)$ returns TRUE, MCP$(G, I, C)$ is true.

[2] Recall that indices in the table are scaled versions of energy and temperature values. We can obtain the actual energy and temperature values by multiplying table indices by $\delta_2$ and $\delta_3$.
[3] $C_1$, $C_2$, and $C_3$ are the constraints for time, energy, and temperature, respectively.
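The table-filling idea behind EBF$_\epsilon$ and Relax$_\epsilon$ can be illustrated with a minimal Python sketch. It is not the dissertation's C++ implementation. It assumes a layered job graph with no transition overhead, so all nodes of one layer can share a single table, and it stores the table sparsely in a dictionary instead of an $M \times M$ array. The names `ebf_eps`, `exact`, `jobs` and `rc` are hypothetical, and `exact` is only a brute-force reference used to check that the sketch never answers TRUE on an infeasible instance.

```python
import math
from itertools import product


def ebf_eps(jobs, init, cons, eps, rc):
    """Scaled dynamic-programming feasibility test (illustrative sketch).

    jobs[i] lists the options for job i as (time, energy, steady_temp).
    init = (t0, w0, T0); cons = (D, W, Tmax); rc is the thermal constant R*C.
    """
    t0, w0, temp0 = init
    deadline, w_max, temp_max = cons
    m = len(jobs)
    d_w = eps * w_max / (m + 1)      # energy step size (delta_2)
    d_t = eps * temp_max / (m + 1)   # temperature step size (delta_3)

    def up(x, d):
        # Scaled index ceil(x / d); the small tolerance absorbs float noise.
        return math.ceil(x / d - 1e-9)

    # cell (c2, c3) -> least time over all paths that round to this cell
    table = {(up(w0, d_w), up(temp0, d_t)): t0}
    for options in jobs:
        nxt = {}
        for (c2, c3), z in table.items():
            for (dt, dw, steady) in options:
                a = math.exp(-dt / rc)
                new_w = c2 * d_w + dw                       # h_2 on scaled value
                new_t = a * c3 * d_t + (1 - a) * steady     # h_3 on scaled value
                z_new = z + dt                              # h_1 on exact time
                if new_w > w_max or new_t > temp_max or z_new > deadline:
                    continue
                cell = (up(new_w, d_w), up(new_t, d_t))
                if z_new < nxt.get(cell, math.inf):
                    nxt[cell] = z_new
        table = nxt
    return bool(table)


def exact(jobs, init, cons, rc):
    """Brute-force reference: try every combination of options."""
    deadline, w_max, temp_max = cons
    for choice in product(*jobs):
        t, w, temp = init
        ok = True
        for (dt, dw, steady) in choice:
            a = math.exp(-dt / rc)
            t, w, temp = t + dt, w + dw, a * temp + (1 - a) * steady
            if t > deadline or w > w_max or temp > temp_max:
                ok = False
                break
        if ok:
            return True
    return False


# Two jobs, each with a fast/hot and a slow/cool option (made-up numbers).
JOBS = [[(1.0, 4.0, 90.0), (2.0, 1.0, 50.0)]] * 2
INIT = (0.0, 0.0, 40.0)
print(ebf_eps(JOBS, INIT, (10.0, 10.0, 100.0), 0.1, 1.0))
print(ebf_eps(JOBS, INIT, (1.5, 10.0, 100.0), 0.1, 1.0))
```

Because every stored energy and temperature value is rounded up, the sketch can only overestimate resource use, which is why a TRUE answer is safe while a FALSE answer may be a false negative.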

PAGE 118

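Proof. When EBF$_\epsilon$ returns TRUE, let the path $p = e_0 \| \ldots \| e_m$ be the path constructed by tracing back using table $\pi$. Clearly, $p$ is an $s$-$d$ path. We need to show that for all $0 \le i \le m$,

$h^{e_i}_k \circ \ldots \circ h^{e_0}_k(I_k) \le C_k, \quad k = 1, 2, 3$

Clearly, $p$ satisfies Equation 7 for $k = 1$, because the condition on line 5 of Relax$_\epsilon$ guarantees that

$h^{e_i}_1 \circ \ldots \circ h^{e_0}_1(I_1) \le C_1, \quad 0 \le i \le m$

Since $p$ is constructed by $\pi$, it is easy to see that

$C_k \ge h^{e_0}_k(\lceil I_k/\delta_k \rceil \delta_k), \quad k = 2, 3$

Otherwise, the condition on line 2 of Relax$_\epsilon$ would not be satisfied during Relax$_\epsilon(e_0)$, and line 7 of Relax$_\epsilon$ would not be executed. This contradicts the fact that $e_0$ is recorded in $\pi$. Similarly, for $0 \le i \le m$ and $k = 2, 3$, we have

$C_k \ge h^{e_1}_k(\lceil h^{e_0}_k(\lceil I_k/\delta_k \rceil \delta_k)/\delta_k \rceil \delta_k)$
$\ldots$
$C_k \ge h^{e_i}_k(\lceil \ldots \lceil h^{e_0}_k(\lceil I_k/\delta_k \rceil \delta_k)/\delta_k \rceil \delta_k \ldots /\delta_k \rceil \delta_k)$

Or

$C_k \ge h^{e_i}_k \circ g_k \circ \ldots \circ h^{e_1}_k \circ g_k \circ h^{e_0}_k \circ g_k(I_k)$

where $g_k$ is a ceiling function

$g_k(x) = \lceil x/\delta_k \rceil \delta_k$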
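To make the rounding function $g_k$ concrete, the following Python sketch evaluates it with exact rational arithmetic. The step size, the starting value and the increments are made-up numbers, and `g` and `chain` are hypothetical helper names. `chain` alternates rounding with a simple additive update of the form $x \mapsto x + w$, which is the shape of the energy function $h_2$ when overheads are ignored.

```python
import math
from fractions import Fraction


def g(x, delta):
    """Round x up to the next multiple of delta: ceil(x / delta) * delta."""
    return math.ceil(x / delta) * delta


def chain(x, increments, delta):
    """Apply 'round up, then add w' for each increment w (h composed with g)."""
    for w in increments:
        x = g(x, delta) + w
    return x


delta = Fraction(1, 4)
start = Fraction(1, 10)
steps = [Fraction(3, 10), Fraction(7, 20), Fraction(1, 5)]

rounded = chain(start, steps, delta)   # value tracked in the scaled tables
plain = start + sum(steps)             # value along the real path
print(rounded, plain, rounded - plain)
```

In this instance the rounded chain never falls below the unrounded value, and each of the three rounding steps adds less than one step size, so the total overshoot stays below $3 \delta$.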

PAGE 119

Since $h^{e_i}_k$ and $g_k$ are monotonically increasing functions and $g_k(x) \ge x$, we have the following relations:

$\lceil I_k/\delta_k \rceil \delta_k = g_k(I_k) \ge I_k$
$h^{e_0}_k \circ g_k(I_k) \ge h^{e_0}_k(I_k)$
$h^{e_1}_k \circ g_k \circ h^{e_0}_k \circ g_k(I_k) \ge h^{e_1}_k \circ h^{e_0}_k(I_k)$
$\ldots$
$h^{e_m}_k \circ g_k \circ \ldots \circ h^{e_0}_k \circ g_k(I_k) \ge h^{e_m}_k \circ \ldots \circ h^{e_0}_k(I_k)$

Thus, for $0 \le i \le m$,

$C_k \ge h^{e_i}_k \circ g_k \circ \ldots \circ h^{e_1}_k \circ g_k \circ h^{e_0}_k \circ g_k(I_k) \ge h^{e_i}_k \circ \ldots \circ h^{e_0}_k(I_k)$

Therefore, Equation 7 also holds on $p$ for $k = 2, 3$. By the definition of MCP, MCP$(G, I, C)$ is true.

Lemma 3. Given an instance of MCP$(G, I, C)$, if there is an $s$-$d$ path $p = e_0 \| \ldots \| e_{m-1} \| e_m$ such that

$h^{e_i}_1 \circ \ldots \circ h^{e_0}_1(I_1) \le C_1$
$h^{e_i}_k \circ g_k \circ \ldots \circ h^{e_0}_k \circ g_k(I_k) \le C_k, \quad k = 2, 3$

holds for $0 \le i \le m$, EBF$_\epsilon$ will return TRUE.

Lemma 3 can be proven by considering the following fact: if we only perform Relax$_\epsilon$ on edges that are in $p$, Equation 7 and Equation 7 guarantee that the conditions on lines 2 and 5 in Relax$_\epsilon$ are satisfied and line 6 will be executed in each round. Eventually, EBF$_\epsilon$ will return true. If we perform Relax$_\epsilon$ on more edges, the minimal value in $Z^d$ will not increase. As a result, EBF$_\epsilon$ still returns true.

Theorem 7.2. Given an instance of MCP$(G, I, C)$, MCP$_\epsilon(G, I, C)$ is true implies EBF$_\epsilon(G, I, C)$ returns TRUE.

PAGE 120

Proof. We just need to show that if there is an $s$-$d$ path $p = s \to n_{1,j_1} \to \cdots \to n_{m,j_m} \to d = e_0 \| \ldots \| e_{m-1} \| e_m$ such that for all $0 \le i \le m$,

$h^{e_i}_1 \circ \ldots \circ h^{e_0}_1(I_1) \le C_1$
$h^{e_i}_k \circ \ldots \circ h^{e_0}_k(I_k) \le (1 - \epsilon) C_k, \quad k = 2, 3$

it also satisfies Equation 7 and 7.

Clearly, for any edge $e \in E$,

$h^e_2(c + \delta) \le h^e_2(c) + \delta$

For $h^e_3$, which represents the temperature constraints, we have

$h^e_3(c + \delta) = h^e_3(c) + \alpha \delta \le h^e_3(c) + \delta, \quad \alpha = e^{-t/RC}$

because $\alpha \le 1$. Using ceiling functions $g_k(x) = \lceil x/\delta_k \rceil \delta_k$, $k = 2, 3$, it is easy to verify

$g_k(I_k) \le I_k + \delta_k$

By applying $h^{e_0}_k$ on both sides, we have

$h^{e_0}_k \circ g_k(I_k) \le h^{e_0}_k(I_k + \delta_k) \le h^{e_0}_k(I_k) + \delta_k, \quad k = 2, 3$

PAGE 121

because $h^{e_0}_k$ is a monotonic function. Therefore,

$h^{e_0}_k \circ g_k(I_k) \le h^{e_0}_k(I_k) + \delta_k$
$g_k \circ h^{e_0}_k \circ g_k(I_k) \le h^{e_0}_k(I_k) + 2\delta_k$
$h^{e_1}_k \circ g_k \circ h^{e_0}_k \circ g_k(I_k) \le h^{e_1}_k \circ h^{e_0}_k(I_k) + 2\delta_k$
$\ldots$
$h^{e_m}_k \circ g_k \circ \ldots \circ h^{e_0}_k \circ g_k(I_k) \le h^{e_m}_k \circ \ldots \circ h^{e_0}_k(I_k) + (m+1)\delta_k$

From Equation 7, we know that

$h^{e_i}_k \circ \ldots \circ h^{e_0}_k(I_k) \le (1 - \epsilon) C_k, \quad k = 2, 3$

Thus, for $0 \le i \le m$, $k = 2, 3$, we have

$h^{e_i}_k \circ g_k \circ \ldots \circ h^{e_0}_k \circ g_k(I_k) \le (1 - \epsilon) C_k + (m+1)\delta_k \le (1 - \epsilon) C_k + \epsilon C_k = C_k$

Therefore, $p$ satisfies Equation 7 and Equation 7. Using Lemma 3, EBF$_\epsilon$ will return true.

Now we use Theorem 7.2 to investigate under what constraints EBF$_\epsilon$ yields false negative answers. Considering the fact that MCP$(G, I, C')$ with $C' = [D, W, T_{max}]^T$ and MCP$_\epsilon(G, I, C'')$ with $C'' = [D, W/(1-\epsilon), T_{max}/(1-\epsilon)]^T \approx [D, (1+\epsilon)W, (1+\epsilon)T_{max}]^T$ (when $\epsilon$ is small) are identical, Theorem 7.2 can also be interpreted as follows.

Corollary 1. For any small $\epsilon < 1$, MCP$(G, I, C')$ with $C' = [D, W, T_{max}]^T$ is true implies EBF$_\epsilon(G, I, C'')$ with $C'' = [D, (1+\epsilon)W, (1+\epsilon)T_{max}]^T$ returns TRUE.

In other words, EBF$_\epsilon(G, I, C)$ will not produce a false negative answer when $C$ dominates $C''$. For example, Figure 7-4 shows the region where EBF$_\epsilon$ may produce false negative answers for different $(W, T_{max})$ pairs. In this example, there are two feasible constraints $[D, W_1, T_{max1}]^T$ and $[D, W_2, T_{max2}]^T$, which dominate no other feasible constraints except themselves. It is easy to see that EBF$_\epsilon$ will generate false negative

PAGE 122

answers only in the cross-marked region based on Corollary 1. Clearly, the area of the false negative region depends linearly on $\epsilon$. Therefore, when $\epsilon$ is small enough, EBF$_\epsilon$ produces false negative answers only in rare cases.

[Figure 7-4: axes $W$ and $T$ with origin $O$; regions marked "Feasible" and "False negative"; labeled points $(D, W_1, T_{max1})$, $(D, W_2, T_{max2})$, $(D, 1.1W_1, 1.1T_{max1})$ and $(D, 1.1W_2, 1.1T_{max2})$]

Figure 7-4. Possible false negative region ($\epsilon = 0.1$).

7.4 Problem Variants

Our approach is also applicable to other problem variants by modifying the property and making suitable changes to the invocation of the problem solving driver in Figure 7-1 (model checker or approximation algorithm).

Task set with individual deadlines: In the scenario where each task has its own deadline, we have to make sure that the execution blocks finish no later than their corresponding task's deadline. Suppose that the deadline of the $i$-th execution block is $D[i]$. Equation 7 is replaced by the following constraints, for all $1 \le i \le m$:

$\sum_{i=1}^{m} (t_{i,l_i} + \tau_{l_{i-1},l_i}) \le D[i], \quad \forall D[i] > 0$

The approximation algorithm EBF$_\epsilon$ can also be modified slightly for the individual deadline case. We only need to replace $C_1$ (line 5 in Relax$_\epsilon$) with $D[i]$ when node $u$ represents job $i$. Since the approximation is applied to the energy and temperature constraints, all the properties and related proofs of EBF$_\epsilon$ still hold.

PAGE 123

Periodic Tasks: We solve the TCEC scheduling of periodic tasks by considering the scheduling of tasks within a hyper-period. We have to make sure that (a) all tasks meet their corresponding deadlines in every hyper-period, and (b) the temperature constraints are not violated after execution of any hyper-period. Clearly, the first requirement can be achieved by adding the deadline constraints as discussed for task sets with individual deadlines.

The second requirement is satisfied by choosing only the schedules whose temperature at the end of the hyper-period is less than or equal to the initial temperature. As discussed in Section 7.1.4, we enforce this requirement by adding constraint 7. The approximation algorithm EBF$_\epsilon$ needs to be modified as follows. When Relax$_\epsilon$ is applied to the node corresponding to the last task, we need to ensure that $h^{(u,v)}_3(c_3 \delta_3) \le T_{init}$ (line 2 in Relax$_\epsilon$). In addition, the step size of temperature should be calculated based on $T_{init}$, i.e., $\delta_3 = \epsilon T_{init}/(m+1)$ (line 2 of EBF$_\epsilon$). We verified that all the properties and related proofs of EBF$_\epsilon$ still hold.

TC: The temperature-constrained scheduling problem is a simplified version of TCEC. It only needs to ensure that the maximum instantaneous temperature is always below the threshold $temp_{max}$.

TA: To find a schedule so that the maximum temperature is minimized, we can employ a binary search over the temperature value range. Each iteration invokes the problem solving driver to test the current temperature constraint $T_{max}$. Initially, $T_{max}$ is set to the mid-value of the range. If the property is unsatisfied, we search in the range of values larger than $T_{max}$ in the next iteration. If the property is satisfied, we continue to search in the range of values lower than $T_{max}$ to further explore better results. This process continues until the lower bound is larger than the upper bound. The minimum $T_{max}$ that makes the property satisfiable during the search, together with its associated schedule, is the result. Note that the temperature value range for microprocessors is small in practice, e.g., $[30°C, 120°C]$. Hence, the number of iterations is typically no more than 7.
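The binary search described for the TA variant can be sketched in Python as follows. This is a minimal illustration under two assumptions that the text does not state explicitly: temperatures are searched at a granularity of one degree, and feasibility is monotone (a schedule that is feasible for some $T_{max}$ stays feasible for any larger value). The callable `is_feasible` is a hypothetical stand-in for one invocation of the problem solving driver.

```python
def min_feasible_tmax(is_feasible, lo=30, hi=120):
    """Binary search for the smallest integer T_max in [lo, hi] that is feasible.

    Returns (best, iterations); best is None if no value in the range works.
    """
    best = None
    iterations = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        iterations += 1
        if is_feasible(mid):
            best = mid        # remember it, then look for a lower constraint
            hi = mid - 1
        else:
            lo = mid + 1      # constraint too tight, search higher values
    return best, iterations


# Toy oracle: pretend schedules exist exactly when T_max is at least 77.
best, used = min_feasible_tmax(lambda t_max: t_max >= 77)
print(best, used)
```

With 91 integer candidates in $[30, 120]$, the loop halves the range on every call, so it needs at most 7 driver invocations, which matches the bound quoted above.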

PAGE 124

To adopt our approximation scheme in the above cases, we can ignore the energy constraint.

7.5 Experiments

7.5.1 Experimental Setup

In this section, we describe the experimental setup for evaluation of our approach. A DVS-capable processor, StrongARM [61], is modeled with four voltage/frequency levels (1.5V-206MHz, 1.4V-192MHz, 1.2V-162MHz and 1.1V-133MHz). We use synthetic task sets which are randomly generated, with each task having an execution time in the range of 100-500 milliseconds. These are suitable and practical sizes to reflect variations in temperature, and a millisecond is a reasonable time unit granularity [101]. We adopt the thermal resistance ($R$) and thermal capacitance ($C$) values from [46], which are 1.83 °C/Watt and 112.2 mJoules/°C, respectively. The ambient temperature of the processor is 32°C. The scheduler and TAG shown in Figure 7-1 are implemented in C++. The exact algorithm EBF and the approximation algorithm EBF$_\epsilon$ are also implemented in C++. All experiments are performed on a computer with an AMD64 2GHz CPU and 16G RAM.

7.5.2 TCEC versus TC or EC

This section demonstrates that existing solutions based on TC or EC are not sufficient to find TCEC schedules. We compared the schedule generated by the energy constrained scheduling algorithm [95] and our TCEC scheduling for the same set of jobs under the same energy constraint. We also require that the system temperature after the execution of the task set does not exceed the initial temperature $T_{init}$, so that the temperature constraint is not violated even if the task set is executed repeatedly. The results are shown in Figure 7-5.

It can be seen that the schedule generated by [95], which considers only the energy constraint, takes less execution time. However, it violates the temperature constraint. On the other hand, the schedules generated by our TCEC approach will not exceed the

PAGE 125

respective temperature constraints (80°C, 78°C and 76°C), although they take a little longer execution time. Therefore, scheduling algorithms that consider only the energy constraint are not suitable when we want to control the maximum temperature of the processor during job execution.

Figure 7-5. EC vs TCEC. EC finishes at A. TCEC (< 80°C) finishes at B. Both TCEC (< 78°C) and TCEC (< 76°C) finish at C.

We also compared our TCEC scheduling with the temperature constrained scheduling algorithm [101]. The experiments were performed on the same job set with the same temperature constraints. For TCEC, we applied three different energy constraints. We also require that the system temperature after the execution of the task set does not exceed the initial temperature $T_{init}$. Figure 7-6 presents the results. Since TC has no constraint on energy consumption, it always tries to execute jobs with high voltage, which may lead to peak temperature several times. As a result, TC has the shortest execution time. However, once we consider the energy constraint, it may not be possible to execute some jobs at high voltage. When the energy budget is very tight, we may not be able to reach

PAGE 126

the maximum temperature during the entire execution, like curve TCEC (< 12500 mJ) in Figure 7-6. In this case, TC will clearly violate the energy constraint, while our TCEC obtains a schedule within the energy budget.

Figure 7-6. TC vs TCEC. Both TC and TCEC (< 14000 mJ) finish at A. TCEC (< 13700 mJ) finishes at B. TCEC (< 12500 mJ) finishes at C.

7.5.3 TCEC using Approximation Algorithm

We compared the efficiency of a conventional symbolic model checker (UPPAAL) with our approximation algorithm EBF$_\epsilon$ on task sets with different numbers of blocks. Since the TCEC problem can also be modeled using ILP, we include the corresponding results of the ILP formulation. In Table 7-1, the first and the second columns are the index and number of blocks in each task set, respectively. The next three columns present the temperature constraint (TC, in °C), energy constraint (EC, in mJ), and deadlines (DL, in ms) to be checked on the model. The sixth column indicates whether there exists a schedule which satisfies all the constraints. The last three columns of Table 7-1 show the results (running time in seconds) of UPPAAL, the ILP formulation solved with lpsolve [10], and our

PAGE 127

approach EBF$_\epsilon$. Since UPPAAL failed to produce results for task sets 4 and 5, we only report the running time of EBF$_\epsilon$. It can be seen that EBF$_\epsilon$ outperforms UPPAAL by more than 10 times on average. Moreover, EBF$_\epsilon$ can solve much larger problems in reasonable running time.

Table 7-1. Running time comparison on different task sets

TS  #Blk  TC  EC      DL     Found?  UPPAAL  lpsolve  EBF_ε
1   10    85  180000  7000   Y       9.6     30.5     0.2
          85  150000  8000   Y       9.9     32.1     0.2
          80  140000  8000   N       9.4     29.4     0.2
2   12    85  70000   2500   Y       18.5    190.3    0.3
          85  60000   2700   Y       106.6   621.2    0.3
          80  60000   2500   N       17.5    282.4    0.3
3   14    90  90000   2600   Y       65.1    -        0.3
          85  80000   2800   Y       648.3   -        0.3
          90  80000   2700   N       208.6   -        0.3
4   50    85  380000  39500  Y       -       -        20.2
5   100   85  720000  83800  Y       -       -        102.5

[Figure 7-7: run time (seconds, 0 to 600) versus task number (0 to 100), with curves for $\epsilon$ = 0.02, 0.04, 0.06, 0.10, 0.20 and Exact]

Figure 7-7. Running time with different job set size and $\epsilon$

We also evaluated the running time of our approximation scheme with different $\epsilon$. The results are shown in Figure 7-7. Curve Exact represents the execution time of the

PAGE 128

exact algorithm EBF. Other curves present the running time of EBF$_\epsilon$ with different $\epsilon$. As expected, EBF$_\epsilon$ requires more time for smaller $\epsilon$ or larger job set size. But its time consumption is still much smaller than that of the exact algorithm EBF.

[Figure 7-8: false negative ratio (0% to 100%) versus range (0.125 to 1), with curves for $\epsilon$ = 0.02, 0.03, 0.04, 0.06 and 0.1]

Figure 7-8. Accuracy of EBF$_\epsilon$

To investigate the accuracy of our proposed approximation scheme, we evaluated the distribution of the false negative ratio along different constraint values. In this experiment, we generated 1500 instances of the TCEC problem as the test set. They share the same deadline and energy budget, while the temperature constraints are uniformly distributed within 1°C above the lowest feasible temperature. For each instance, the exact algorithm EBF is applied first to determine whether a feasible schedule exists. Then we run EBF$_\epsilon$ on each instance and check the correctness of the return value. The experimental results are presented in Figure 7-8. Each point represents the false negative ratio of TCEC instances in each 0.0625°C interval. For example, the false negative ratio is 30% for instances within interval $[0.1875, 0.25]$ when $\epsilon = 0.06$. As we discussed in Section 7.3.4, the false negative ratio curves behave as step functions, which fall to zero when the temperature constraint is slightly larger (0.125°C for $\epsilon = 0.02$) than the

PAGE 129

lowest feasible temperature. In other words, EBF$_\epsilon$ produces false negative answers only in rare cases.

7.6 Summary

In this chapter, we proposed a flexible and automatic framework to solve the temperature- and energy-constrained scheduling problem in multitasking systems with different voltage levels. We modeled the problem using extended timed automata and translated the energy/temperature constraints into CTL specifications. The user can employ a suitable model checker to determine whether there exists a schedule that satisfies the constraints. Due to the capacity limitations of symbolic model checkers like UPPAAL, we also proposed a polynomial time approximation scheme that is guaranteed to generate results close to the optimal value with reasonable running time. We proved mathematically that our approximation algorithm will give no false positive answer, while the false negative ratio can be negligibly small in practical scenarios. Our framework is also applicable to other scheduling problems with different energy/temperature requirements. Extensive experimental results demonstrated the effectiveness of our approach. In our future work, we plan to develop approximation algorithms to efficiently solve both task sequencing and voltage assignment together.

PAGE 130

CHAPTER 8
SCHEDULABILITY VALIDATION FOR MULTICORE ARCHITECTURES

Chapter 7 described our energy- and temperature-aware scheduling framework for a single-core processor. In this chapter, we study the DVS scheduling problem on multicore processors under energy and temperature constraints. Since task mapping and sequencing are already discussed in many existing works, we focus on how to assign clock rate/voltage levels to tasks that are already mapped and sequenced on different cores, so that the total time consumption is minimized under both temperature and energy constraints. Our goal is to develop Temperature and Energy Constrained Scheduling (TECS) for multicore systems. Due to the NP-hard nature of the TECS problem, it has no polynomial time solution unless P=NP. To avoid the state explosion problem, we propose an approximation scheme with polynomial time/space complexity based on a detailed analysis of the problem. To the best of our knowledge, there are no prior works that consider both energy and temperature constraints in multicore systems and are guaranteed to produce schedules close to the optimal solution with reasonable execution time.

The rest of the chapter is organized as follows. Section 8.1 describes related background information and the MCTCEC problem. Section 8.2 and Section 8.3 discuss the optimal algorithm and our approximation scheme for the MCTCEC problem in detail. Experimental results are presented in Section 8.5. Finally, Section 8.6 concludes the chapter.

8.1 Background and Problem Formulation

8.1.1 Processor Thermal Model

When the execution time of each task is long enough for the processor to reach the steady state temperature, we can use the matrix model [90] to calculate the steady state temperature on each core as

$T(t) = T_{amb} I(t) + C\,P(t)$

PAGE 131

Here, $T_{amb}$ is the ambient temperature, $C$ is an $n \times n$ constant coefficient matrix, and $P(t)$ is the power dissipation by each core under the clock rate assignment at time $t$. Since we assume that each time slice is large enough for the system to reach steady-state temperature, $T(t)$ is only determined by $T_{amb}$, $C$ and $P(t)$. For a given system, we derive $C$ using HotSpot [44] as proposed in [90].

8.1.2 Energy Model

We adopt the energy model proposed in [60]. The processor's dynamic power can be represented as $P_{dyn} = \alpha C V_{dd}^2 f$. Here $V_{dd}$ is the supply voltage and $f$ is the operation frequency. $C$ is the total capacitance and $\alpha$ is the actual switching activity, which varies for different applications [8]. In other words, the power profiles of different tasks can differ from each other. Static power is given by $P_{sta} = V_{dd} I_{subth} + |V_{bs}| I_j$, where $V_{bs}$, $I_{subth}$ and $I_j$ denote the body bias voltage, subthreshold current and reverse bias junction current, respectively. Hence, the overall power is $P = P_{dyn} + P_{sta}$.

8.1.3 System Model

The system we consider can be modeled as:

1. A multicore processor with $M$ cores. Each core supports $L$ discrete clock rate/voltage levels $\{f_1/v_1, f_2/v_2, ..., f_L/v_L\}$, where $f_{min}$ is the lowest clock rate, and $f_{max}$ is the highest.
2. A set of $n$ tasks, which has already been mapped and sequenced on different cores. We use $\tau_{ij}$ to denote the $j$-th task on core $i$. Let $c_{ij}$ be the worst-case workload of $\tau_{ij}$, and $k_i$ be the total number of tasks mapped on core $i$. We also denote the total workload on core $i$ by $w_i = \sum_{j=1}^{k_i} c_{ij}$.

For ease of discussion, the terms task and job refer to the same entity in the rest of this chapter.

8.1.4 Multicore DVS Schedule

Since all tasks are already mapped and sequenced, a DVS schedule on a multicore system with task set $\{\tau_{ij} \mid 1 \le i \le M, 1 \le j \le k_i\}$ can be represented as a set of tuples $\{(r_{ij}, [t_{ij}, t'_{ij}]) \mid 1 \le i \le M, 1 \le j \le k_i\}$, where $(r_{ij}, [t_{ij}, t'_{ij}])$ means we execute $\tau_{ij}$ using clock rate $r_{ij}$ during time interval $[t_{ij}, t'_{ij}]$. It is easy to see that clock rate switches always

PAGE 132

happen when some task finishes. When all tasks mapped to a core are finished, the core is turned off.

8.1.5 Problem Formulation

Given a set of $n$ independent tasks $\{\tau_{ij} \mid 1 \le i \le M, 1 \le j \le k_i\}$, if the safe temperature threshold is $C_T$ and the energy budget is $C_E$, the TECS scheduling problem can be defined as follows.

Definition 9. TECS is formally defined as finding a multicore DVS schedule, $R_{opt}$, which minimizes the total execution time, i.e.,

$\min \max_{1 \le i \le M} t'_{ik_i}$

subject to

$c_{ij}/r_{ij} \le t'_{ij} - t_{ij}$
$\sum_{0 \le i \le n} P(r_{ij})(t'_{ij} - t_{ij}) \le C_E$
$T(t) \le C_T, \quad \forall t \ge 0$
$t'_{ij} \le t_{i,j+1}, \quad \forall j$
PAGE 133

system state, we first identify the next task that is ready to execute. Next, we compute the system states at the next switching point by executing this task with all possible clock rates. After that, we mark the estimated state as a valid new state if it does not violate the temperature or energy constraints.

Formally, given a task sequence on core $i$, at any time instant $t$, we define the progress of this task sequence as $p_i = w/w_i$, where $w_i = \sum_j c_{ij}$ is the total workload mapped on core $i$ and $w$ ($w \le w_i$) is the completed workload on this core. The system status can be represented as a tuple $s = (\langle p_1, r_1 \rangle, ..., \langle p_M, r_M \rangle, E, t)$, where $p_i$ and $r_i$ are the current progress and clock rate of core $i$, and $E$ and $t$ are the total energy and time consumption, respectively. The temperature of each core is not explicitly included in the system state tuple, because it can be calculated from the power of each core $P_m$ and the ambient temperature $T_{amb}$ using Equation 8.

When some cores in the system are about to begin the execution of the next job in their task sequences, we encounter a potential clock rate switching point, or switching point for short. Since multiple cores can change clock rate at the same time, e.g., at $t = 0$, all possible clock rate assignments for $M$ cores can be represented by a set of $M$-dimensional vectors. Formally, we define the set of possible clock Rate Assignments $RA(s)$ for system state $s$ as the direct product

$RA(s) = \bigotimes_{i=1}^{M} \begin{cases} \{0\} & \text{if } s.p_i = 1 \\ \{f_1, ..., f_L\} & \text{else if } R_i(s.p_i) = 0 \\ \{s.r_i\} & \text{otherwise} \end{cases}$

PAGE 134

where

$R_i(p_i) = \begin{cases} 0 & \text{if } p_i = 0 \\ \sum_{j=1}^{1} c_{ij}/w_i - p_i & \text{else if } \sum_{j=1}^{1} c_{ij}/w_i \ge p_i \\ \sum_{j=1}^{2} c_{ij}/w_i - p_i & \text{else if } \sum_{j=1}^{2} c_{ij}/w_i \ge p_i \\ \ldots \\ \sum_{j=1}^{k_i} c_{ij}/w_i - p_i & \text{else if } \sum_{j=1}^{k_i} c_{ij}/w_i \ge p_i \end{cases}$

$R_i(p_i)$ is the remaining progress until the beginning of the next task on core $i$. $RA(s)$ returns a set of possible clock rate choices, which allows the core to choose from $L$ voltage levels if it is about to start the next task, i.e., $R_i(p_i) = 0$. If all tasks on the same core are finished, i.e., $p_i = 1$, we shut down the core by assigning clock rate 0. A core does not consume any more energy at clock rate 0.

In order to calculate the system state at the next switching point, we define the state transition function $s' = F(s, r)$ as

$s'.p_i = s.p_i + r_i \Delta / w_i$
$s'.r_i = r_i, \quad 1 \le i \le M$
$s'.E = s.E + \sum_{i=1}^{M} P(r_i) \Delta$
$s'.t = s.t + \Delta$

where $\Delta = \min_{1 \le i \le M,\, p_i < 1} R_i(s.p_i + \xi)\, w_i / r_i$ and $\xi$ is a very small positive number close to 0. The state transition function $F$ takes the system state at a switching point $s$ and one clock rate assignment vector $r$ as inputs and produces the system state at the next switching point.

Algorithm 9 shows the Dynamic Programming (DP) algorithm for clock Rate Assignment (DPRA) to obtain the optimal solution to the TECS problem. Initially, the set of system states $S$ only contains a single state $s_1 = (\langle 0, 0 \rangle, ..., \langle 0, 0 \rangle, 0, 0)$. During the DP process, we first pick $s \in S$, which contains at least one incomplete task sequence with the least progress among all states in $S$. Suppose that there are $m$ task

PAGE 135

sequences that are about to start new tasks. We try all possible combinations of clock rate assignments on these $m$ cores, while keeping the clock rate unchanged on the remaining $M - m$ cores. This will yield a set of assignments $RA(s)$, which contains $L^m$ elements. Next, we calculate a system state $s'$ based on $s$ and clock rate assignment $r \in RA(s)$. If $s'$ does not violate any constraints, we add it to $S$. The above process repeats until all states in $S$ containing incomplete tasks are explored. Now, we need to find the state which has the least time consumption in $S$.

Algorithm 9: Exact solution to TECS (DPRA)
1: $S = \{s_1\} = \{(\langle 0, 0 \rangle, ..., \langle 0, 0 \rangle, 0, 0)\}$
2: while not all states in $S$ are explored do
3:   Pick an unexplored state $s$ from $S$ such that $s$ contains at least one incomplete task sequence with the least progress among all states in $S$
4:   for each $r \in RA(s)$ do
5:     $s' = F(s, r)$
6:     if $r$ violates temperature constraint $C_T$ or $s'.E > C_E$ then
7:       continue
8:     end if
9:     if $\exists s'' \in S$ s.t. $s''$ and $s'$ agree on all values but time then
10:      if $s''.t \le s'.t$ then
11:        continue
12:      else
13:        $S = S - \{s''\}$ /* Remove $s''$ */
14:      end if
15:    end if
16:    $S = S \cup \{s'\}$ /* Add $s'$ */
17:  end for
18: end while
19: Find the state $s_{opt}$ in $S$ with the least time consumption, such that all tasks are finished. Construct the corresponding schedule $R_{opt}$ by backtracking from $s_{opt}$ to $s_1$

EXAMPLE 1: This example illustrates the flow of Algorithm 9 using a processor with $M = 2$ cores. Each of them has $L = 2$ different clock rate levels, $f_1 = 100$ MHz and $f_2 = 200$ MHz. Their power consumptions are $P(f_1) = 1$ W and $P(f_2) = 4$ W. There are three tasks $\tau_{1,1}$, $\tau_{1,2}$ and $\tau_{2,1}$ with workloads of $10^6$, $10^6$, and $2 \times 10^6$ cycles, respectively. $\tau_{1,1}$ and $\tau_{1,2}$ are mapped to core 1, while $\tau_{2,1}$ is mapped to core 2. Therefore, we have

PAGE 136

$c_{1,1} = c_{1,2} = 10^6$, $c_{2,1} = 2 \times 10^6$, $w_1 = c_{1,1} + c_{1,2} = 2 \times 10^6$ and $w_2 = c_{2,1} = 2 \times 10^6$. We choose the temperature constraint such that only one core can run at 200 MHz. We also choose $C_E = 10$ J. When we apply Algorithm 9 to such a TECS instance, $S$ contains only one element $s_1 = (\langle 0, 0 \rangle, \langle 0, 0 \rangle, 0, 0)$ at the beginning. Thus, $s_1$ is picked by line 3. Since we have $R_1(s_1.p_1) = R_1(0) = 0$ and $R_2(s_1.p_2) = 0$, the clock rates for both cores can be changed, i.e., $RA(s_1) = \{f_1, f_2\} \times \{f_1, f_2\} = \{(f_1, f_1), (f_1, f_2), (f_2, f_1), (f_2, f_2)\}$ contains $L^M = 4$ elements, which represent four possible clock rate assignments. Next, we compute new states $s'$ based on these assignments except $(f_2, f_2)$, which violates the temperature constraint. If we pick $r = (f_1, f_1)$, the new state $s_2 = F(s_1, r)$ can be computed as follows. First, we have $R_1(s_1.p_1 + \xi) = 0.5$, which means core 1 is $0.5 w_1$ cycles away from the beginning of the next task $\tau_{1,2}$. Similarly, $R_2(s_1.p_2 + \xi) = 1$. Therefore, if we use clock rate $r = (f_1, f_1)$, which makes both cores run at $f_1 = 100$ MHz, $\Delta = \min(0.5 w_1/f_1, w_2/f_1) = 1$ sec. In other words, the next switching point will happen after 1 sec. At that time, the progress values of core 1 and core 2 will be $s_2.p_1 = 0 + f_1 \times 1/w_1 = 0.5$ and $s_2.p_2 = 0 + f_1 \times 1/w_2 = 0.5$, respectively. We also compute the energy consumption $s_2.E = 0 + P(f_1) \times 1 + P(f_1) \times 1 = 2$ J and time consumption $s_2.t = 1$ sec. Therefore, the new state is $s_2 = (\langle 0.5, f_1 \rangle, \langle 0.5, f_1 \rangle, 2, 1)$. Since $s_2$ and $r = (f_1, f_1)$ do not violate any constraint, we add $s_2$ into $S$.

[Figure 8-1: state exploration tree rooted at $(\langle 0, 0 \rangle, \langle 0, 0 \rangle, 0, 0)$, with edges labeled by clock rate assignments such as $(f_1, f_1)$, $(f_1, f_2)$ and $(f_2, f_1)$; states shown include $(\langle 0.5, f_1 \rangle, \langle 0.5, f_1 \rangle, 2, 1)$, $(\langle 1, f_1 \rangle, \langle 1, f_1 \rangle, 4, 2)$, $(\langle 0.5, f_1 \rangle, \langle 1, f_2 \rangle, 5, 1)$, $(\langle 0.5, f_2 \rangle, \langle 0.25, f_1 \rangle, 2.5, 0.5)$ and $(\langle 1, f_2 \rangle, \langle 0.75, f_1 \rangle, 5.5, 1.5)$]

Figure 8-1. State exploration in Algorithm 9

We repeat the above procedure for the other two clock rate assignments and mark $s_1$ as explored. In the next round, we pick $s_2$ from $S$ on line 3 of Algorithm 9. We have

PAGE 137

$R_1(s_2.p_1) = R_1(0.5) = 0$, which indicates that we can change the clock rate of core 1, because the previous task just finished. However, $R_2(s_2.p_2) = R_2(0.5) = 0.5$, which means the current task on core 2 has not finished yet. Therefore, $RA(s_2) = \{f_1, f_2\} \times \{s_2.r_2\} = \{(f_1, f_1), (f_2, f_1)\}$ only contains two possible clock rate assignments, because the clock rate of core 2 cannot be changed. These assignments are used to produce new states and update $S$. We repeat the above procedure until we find a state in $S$ within which all tasks are finished with minimum total time consumption. Through backtracking, we can find the path that generates it: $(\langle 0, 0 \rangle, \langle 0, 0 \rangle, 0, 0) \to (\langle 0.5, f_1 \rangle, \langle 1, f_2 \rangle, 5, 1) \to (\langle 1, f_2 \rangle, \langle 1, f_1 \rangle, 7, 1.5)$. The corresponding schedule $R_{opt}$ is $\langle f_1, 0, 1 \rangle$, $\langle f_2, 1, 1.5 \rangle$, $\langle f_2, 0, 1 \rangle$, which means $\tau_{1,1}$, $\tau_{1,2}$ and $\tau_{2,1}$ should be executed using $f_1$, $f_2$, and $f_2$, respectively.

Clearly, if two system states agree on all values except the time consumption, we only need to record the one with smaller time consumption, because the one with larger time consumption will not be a part of the optimal solution. This fact is exploited by line 9 in Algorithm 9 to accelerate the computation. However, it should be noticed that we must explore the states in a certain order, so that an explored state will not be updated in the future. Our algorithm satisfies such a requirement, because the state that we pick contains at least one incomplete task sequence, say the $i$-th, which has the least progress among all states in $S$. There is no way it can be dominated by any new states, because any new state will have a larger progress on the $i$-th task sequence. Therefore, when we pick an unexplored state $s$ on line 3, it is guaranteed that $s.t$ will not be updated in the future.

The time and space complexity of the exact algorithm is $O(L^n)$: because each of the $n$ tasks can be executed at $L$ different voltage levels, the system has $L^n$ different execution paths in the worst case. Therefore, $S$ will contain up to $O(L^n)$ states, because we are performing a breadth-first search in the state space. As a result, the overall time

PAGE 138

and space complexity becomes $O(L^n)$, which is natural considering the NP-hard nature of the TECS problem.

8.3 Approximation Algorithm

Like many previous works, the basic idea of our approximation algorithm is built on discretization of the state space. The space size is reduced by rounding up all values in the state vector, and by merging states that agree on all values after rounding. Unfortunately, in the TECS problem, this method cannot be applied directly to progress values. Recall that we define the progress of a task sequence on each core to represent how many instructions or how much workload has already been completed. Rounding up progress values introduces two problems. First, the switching points, which are defined based on progress values, may be skipped, because they usually do not coincide with the discretized progress values. Second, the rounding operation essentially means we skip some instructions without executing them. Therefore, if we apply the obtained schedule in reality, the actual progress will not match the values we calculated in dynamic programming. As a result, the temperature or energy constraints may be violated.

We solve both problems as follows. First, we view a state $s \in S$ not as a real system state, but as a pessimistic approximation of a real system state. Second, we insert a suitable idle time at each switching point, so that the difference between the real execution and the estimated value in dynamic programming can be bounded. In this way, we can obtain an approximated estimation of the actual execution under any clock rate selections. Before we introduce our approximation scheme, we first introduce the modified versions of the state transition function and clock rate assignment function, which are used to build the approximation algorithm. The modified state transition

PAGE 139

function $s' = F_t(s, r)$ is defined as

$s'.p_i = \begin{cases} s.p_i & \text{if } r_i = f_I \\ s.p_i + r_i \Delta / w_i & \text{otherwise} \end{cases}$
$s'.r_i = \begin{cases} s.r_i & \text{if } r_i = f_I \\ r_i & \text{otherwise} \end{cases}$
$s'.E = s.E + \sum_{i=1}^{M} P(r_i)(\Delta + 2\delta_t)$
$s'.t = s.t + \Delta + 2\delta_t$

where $\Delta = \min_{1 \le i \le M,\, p_i < 1} R_i(s.p_i + \xi)\, w_i / r_i$ and $\xi$ is a very small positive number close to 0. An extra increment $2\delta_t$ is added, which represents the idle time. $RA_\epsilon(s)$ is the modified version of $RA(s)$, which is defined as

$RA_\epsilon(s) = \bigotimes_{i=1}^{M} \begin{cases} \{0\} & \text{if } s.p_i = 1 \\ \{f_1, ..., f_L\} & \text{else if } R_i(s.p_i) \le \delta_P \\ \{s.r_i\} & \text{otherwise} \end{cases}$

In line 9, $h$ is a partial rounding up function $s' = h(s)$, which is defined as

$s'.p_i = \lceil s.p_i/\delta_P \rceil \delta_P$
$s'.r_i = s.r_i, \quad i = 1, ..., M$
$s'.E = \lceil s.E/\delta_E \rceil \delta_E$
$s'.t = s.t$

Algorithm 10 shows the details of our approximation algorithm DPRA$_\epsilon$. We first compute the step sizes $\delta_E$, $\delta_P$ and $\delta_t$ for each constraint based on the value of $\epsilon$. After that, DPRA$_\epsilon$ parallels the exact algorithm DPRA except that the progress and energy values in each new system state $s$ are rounded up to the next available discretized value. This is achieved by applying function $h$, which forces the progress and energy values of the resultant state to be an integer multiple of $\delta_P$ or $\delta_E$. For example, suppose we have $\delta_P = 0.1$ and $\delta_E = 0.2$; a new state $F_t(s, r) = (\langle 0.5, f_2 \rangle, \langle 0.25, f_1 \rangle, 2.5, 0.5)$ will be

PAGE 140

recorded as $h(F_t(s, r)) = (\langle \lceil 0.5/0.1 \rceil \times 0.1, f_2 \rangle, \langle \lceil 0.25/0.1 \rceil \times 0.1, f_1 \rangle, \lceil 2.5/0.2 \rceil \times 0.2, 0.5) = (\langle 0.5, f_2 \rangle, \langle 0.3, f_1 \rangle, 2.6, 0.5)$.

Algorithm 10: Approximation algorithm of TECS (DPRA$_\epsilon$)
1: $\delta_E = \epsilon C_E / 4n$
2: $\Gamma = \max_{1 \le i \le M} w_i / f_{min}$
3: $\delta_P = \min(\delta_E / (P_{max} \Gamma),\ \epsilon f_{min} / (f_{max}\, 2n))$, where $P_{max}$ is the maximum power dissipation of the entire processor.
4: $\delta_t = \delta_P \Gamma$
5: $S = \{s_1\} = \{(\langle 0, 0 \rangle, ..., \langle 0, 0 \rangle, 0, 0)\}$
6: while not all states in $S$ are explored do
7:   Pick an unexplored state $s$ from $S$ such that $s$ contains at least one incomplete task sequence with the least progress among all states in $S$
8:   for each $r \in RA_\epsilon(s)$ do
9:     $s' = h(F_t(s, r))$
10:    if $r$ violates temperature constraint $C_T$ or $s'.E > (1 + \epsilon) C_E$ then
11:      continue
12:    end if
13:    if $\exists s'' \in S$ s.t. $s''$ and $s'$ agree on all values but time then
14:      if $s''.t \le s'.t$ then
15:        continue
16:      else
17:        $S = S - \{s''\}$
18:      end if
19:    end if
20:    $S = S \cup \{s'\}$
21:  end for
22: end while
23: Find the state $s_{apx}$ in $S$ with the least time consumption $OPT_{apx}$, such that all tasks are finished. Construct the corresponding schedule $R_{apx}$ by backtracking from $s_{apx}$ to $s_1$. If a task is skipped due to rounding, it is scheduled as a part of the previous task on the same core.

If the optimal schedule of MCTCEC exists with time consumption $OPT$, our approximation algorithm DPRA$_\epsilon$ will find a schedule such that it will not violate the temperature constraint and has at most $(1 + \epsilon) C_E$ energy and $(1 + \epsilon) OPT$ time


consumption, respectively. In the rest of this section, we will show that DPRA' is an approximation algorithm with the claimed properties.

Lemma 4. Given an MCTCEC instance $I$, if DPRA'($I$) finds a schedule $R_{apx}$ of $I$ with estimated time consumption $OPT_{apx}$, then $R_{apx}$ is a feasible schedule of $I$ whose actual time consumption does not exceed $OPT_{apx}$.

Proof. Since $R_{apx}$ is found by DPRA'($I$), there must be a state $s_{apx}$ in $S$ and a path with $K$ states $s_1 \to s_2 \to \cdots \to s_{K-1} \to s_{apx}$. When $R_{apx}$ is applied in reality, we apply the clock rate assignment $r_i$ at time $t_i$ for $1 \le i \le K$. When the current job on a core is finished, we keep the core running an idle job until the next switch point. To prove this lemma, we need to show that (1) all jobs are indeed finished and (2) all constraints are met.

The first statement can be proved by showing that each job has enough time to run. Suppose a task $\tau$ on core $j$ starts from the $i$-th switch point. If the next task on the same core starts from the $i'$-th switch point, the time allocated for this task is $s_{i'}.t - s_i.t$. Since we perform $i' - i$ rounds of computation to obtain $s_{i'}$ from $s_i$, there can be at most $i' - i$ roundings up during the calculation from $s_i.p_j$ to $s_{i'}.p_j$. Therefore,

$$s_{i'}.t \ge s_i.t + \left(s_{i'}.p_j - s_i.p_j - (i' - i)\,\delta_P\right) / r_i + 2 (i' - i)\, \delta_t$$

$$s_{i'}.t - s_i.t \ge \left(s_{i'}.p_j - s_i.p_j + \delta_P\right) / r_i$$

On the other hand, the total progress of $\tau$ can be at most $s_{i'}.p_j - s_i.p_j + \delta_P$. Therefore, all tasks will have enough time for execution when $R_{apx}$ is applied.

Now we prove the second statement by considering the following relations among successive states on the path $s_1 \to s_2 \to \cdots \to s_{K-1} \to s_{apx}$:

$$s_2 = h(F_t(s_1, r_1))$$
$$s_3 = h(F_t(s_2, r_2))$$
$$\cdots$$
$$s_{apx} = h(F_t(s_{K-1}, r_{K-1}))$$
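The rounding step $h$ applied in each transition of the chain above can be illustrated with a minimal sketch. The tuple layout of the state (per-core progress and rate pairs, then energy, then time) and the helper names `round_up` and `h` are assumptions made for illustration; exact rational arithmetic is used so that the rounded values are not disturbed by floating-point error. The numbers are those of the worked example with step sizes 0.1 for progress and 0.2 for energy.

```python
from fractions import Fraction as F
from math import ceil


def round_up(value, step):
    """Round value up to the nearest integer multiple of step."""
    return ceil(value / step) * step


def h(state, delta_p, delta_e):
    """Partial rounding-up: progress and energy are rounded up,
    clock rates and time are left untouched (hypothetical state layout)."""
    cores, energy, time = state
    rounded_cores = tuple((round_up(p, delta_p), rate) for p, rate in cores)
    return (rounded_cores, round_up(energy, delta_e), time)


# State from the worked example: two cores, energy 2.5, time 0.5.
state = (((F("0.5"), "f2"), (F("0.25"), "f1")), F("2.5"), F("0.5"))
rounded = h(state, F("0.1"), F("0.2"))
```

Because only progress and energy are rounded up, the recorded state never underestimates either quantity, which is the monotonicity property the proof relies on.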


Based on the logic of DPRA'($I$), the following relation holds for $1 \le i \le M$, $1 \le k \le K$:

$$(1+\epsilon)\, C_E \ge s_k.E$$

Let the real state transition path produced by $R_{apx}$ be $s_1 \to s'_2 \to \cdots \to s'_{K-1} \to s'_K$. Clearly, we have

$$s'_2 = F_t(s'_1, r_1)$$
$$s'_3 = F_t(s'_2, r_2)$$
$$\cdots$$
$$s'_K = F_t(s'_{K-1}, r_{K-1})$$

Since the components of the vector functions $h$ and $F_t$ are all increasing functions, i.e., $s_1 \ge s_2 \Rightarrow h(s_1) \ge h(s_2)$ and $F_t(s_1) \ge F_t(s_2)$, it is easy to see that $s_2 \ge s'_2$, ..., $s_{K-1} \ge s'_{K-1}$ and $s_{apx} \ge s'_K$. Therefore,

$$(1+\epsilon)\, C_E \ge s_k.E \ge s'_k.E$$
$$OPT_{apx} = s_{apx}.t \ge s'_K.t$$

Notice that temperature and energy values change monotonically between $s'_k$ and $s'_{k+1}$ during real execution. Equation 8 ensures that all constraints are met and therefore concludes the proof.

In order to show the second property of DPRA'($I$), we first define the precedence relation between different tasks.

Figure 8-2. Precedence relations among tasks. (The figure places tasks a to e on Core 1 through Core M.)

Definition 10. Given a feasible schedule of an MCTCEC instance $I$, we define the precedence relation $\tau_i \prec \tau_j$ on tasks, which holds if and only if the finishing time of $\tau_i$


is less than the start time of $\tau_j$ under the given schedule. For example, in the schedule shown in Figure 8-2, we have $\tau_a \prec \tau_d$, $\tau_b \prec \tau_c$, and $\tau_c \prec \tau_d$ for $\tau_d$, because $\tau_a$, $\tau_b$ and $\tau_c$ finish before $\tau_d$ starts.

A precedence relation $PR$ can also be represented as a graph $G_{PR}$, in which each task is denoted by a vertex and each precedence relation is shown as a directed edge. Since all edges point from tasks with larger start time to tasks with smaller finishing time, there is no cycle in the graph, i.e., $G_{PR}$ is a DAG.

Clearly, for any tasks $\tau_1, \ldots, \tau_M$ that execute at the same time on $M$ different cores under some feasible schedule, there is no precedence relation between any two of them. Besides, it is also easy to see that if there is no precedence relation between any two tasks in $\{\tau_1, \ldots, \tau_M\}$, they must execute simultaneously at some time $t'$. Since the schedule is feasible, the clock rate assignment at $t'$ does not violate the temperature constraints on any core. In other words, under the steady temperature model, the combination of the clock rate assignments to $\tau_1, \ldots, \tau_M$ will not violate the temperature constraints on any core.

Now we show that our approximation scheme DPRA' does find a schedule when the MCTCEC instance is schedulable.

Lemma 5. Given an MCTCEC instance $I$, if $I$ is schedulable with optimal time consumption $OPT$, DPRA'($I$) returns a schedule $R_{apx}$ with estimated time consumption $OPT_{apx} \le (1+\epsilon)\, OPT$.

Proof. Let $S$ be the state set constructed by DPRA'($I$). We are going to show that there exists a path $p = s_1 \to s_2 \to \cdots \to s_{K-1} \to s_K$ such that

$$s_K.t \le (1+\epsilon)\, OPT$$
$$s_K.E \le (1+\epsilon)\, C_E$$
$$s_K.p_j = 1, \quad 1 \le j \le M$$
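The precedence relation of Definition 10 can be sketched in a few lines. The schedule below (task names and start/finish times) is a hypothetical example chosen only to be consistent with the relations quoted in the text; it is not read off Figure 8-2. The helper names `precedence_edges` and `is_antichain` are likewise illustrative.

```python
from itertools import combinations


def precedence_edges(schedule):
    """Return all pairs (a, b) such that a finishes strictly before b starts.

    schedule maps a task name to a (start, finish) tuple.
    """
    return {
        (a, b)
        for a, (_, finish_a) in schedule.items()
        for b, (start_b, _) in schedule.items()
        if a != b and finish_a < start_b
    }


def is_antichain(tasks, edges):
    """True if no two of the given tasks are related by precedence."""
    return all(
        (a, b) not in edges and (b, a) not in edges
        for a, b in combinations(tasks, 2)
    )


# Hypothetical start/finish times (assumed values for illustration).
schedule = {"a": (0, 5), "b": (0, 3), "c": (4, 6), "d": (7, 9), "e": (7, 8)}
edges = precedence_edges(schedule)
```

In this example the tasks a and c overlap in time, so they form a set with no precedence relation between them, which is exactly the kind of set whose joint clock rate assignment is known to respect the temperature constraint.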


First, we construct a path $p$ with the desired properties. Since $I$ is schedulable, let the optimal schedule of $I$ be $R_{opt}$. We use $R_{opt}$ to define the precedence relation $PR$ on all tasks and construct the corresponding graph $G_{PR}$. We construct the path $p = s_1 \to s_2 \to \cdots \to s_{K-1} \to s_K$ as follows:

1. For a task $\tau$, if its corresponding node in $G_{PR}$ has no precedence node, use its clock rate in $R_{opt}$; otherwise, use clock rate $f_0$.
2. Compute the next state $s_{i+1} = h(F_t(s_i, r))$, where $r$ is the clock rate assignment of the corresponding tasks in $R_{opt}$. If a task is finished based on the progress in $s_{i+1}$, remove its corresponding node from $G_{PR}$.
3. Repeat the above steps until all tasks are finished.

Since $s_0$ is the initial state, and the clock rate assignment used to produce $s_1$ is identical to the one in $R_{opt}$ at the beginning, $s_1$ will not violate the temperature constraints. Thus, $s_1$ is in $S$. Suppose we know $s_i \in S$ as the induction hypothesis. Let the clock rate assignment used to produce $s_{i+1}$ be $r$. By our construction rules, $r$ is applied to a set of tasks that do not have any precedence relation between any two of them in $R_{opt}$. In other words, $r$ will not violate the temperature constraint on any core. Such reasoning holds for any $s_i$, $1 \le i \le K$. Therefore, no state in $p$ violates the temperature constraint.

Next, we show that $p$ also meets the time requirement, i.e.,

$$s_K.t \le (1+\epsilon)\, OPT$$

Let task $\tau_{i_k}$ be the last task completed in $p$. Suppose $\tau_{i_k}$ starts execution at some state $s_{i_k}$ and finishes at state $s_K$. Since $\tau_{i_k}$ is assigned a non-idle clock rate in $s_{i_k}$, there must exist a task $\tau_{i_{k-1}} \prec \tau_{i_k}$ which has just finished in $s_{i_k}$. Otherwise, if all of $\tau_{i_k}$'s precedence tasks were finished in $s_{i_{k-1}}$, $\tau_{i_k}$ should start in $s_{i_{k-1}}$ instead of $s_{i_k}$ based on our construction.
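Step 1 of the construction above (run a task at its optimal rate only once every predecessor in the precedence graph has finished, otherwise idle) can be sketched as follows. The data layout (a dictionary of predecessor sets, a set of finished tasks, one current task per core) and the function name `assign_rates` are assumptions for illustration, and the string "f0" merely stands in for the idle clock rate.

```python
def assign_rates(current_task, preds, finished, opt_rate, idle="f0"):
    """Pick a clock rate for every core.

    current_task: core -> task waiting or running on that core (None if done)
    preds:        task -> set of tasks that must finish first
    finished:     set of tasks already completed
    opt_rate:     task -> clock rate used for it in the optimal schedule
    """
    rates = {}
    for core, task in current_task.items():
        # A task is ready when all of its predecessors are finished,
        # i.e. its node has no remaining precedence node in the graph.
        ready = task is not None and preds[task] <= finished
        rates[core] = opt_rate[task] if ready else idle
    return rates


# Hypothetical instance: c waits for b, d waits for a and c.
preds = {"a": set(), "b": set(), "c": {"b"}, "d": {"a", "c"}}
opt_rate = {"a": "f3", "b": "f2", "c": "f1", "d": "f3"}
step = assign_rates({1: "c", 2: "d"}, preds, {"a", "b"}, opt_rate)
```

Here core 1 runs c at its optimal rate while core 2 idles, because d still has the unfinished predecessor c.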
For the same reason, we can find $\tau_{i_{k-2}}$ such that $\tau_{i_{k-2}} \prec \tau_{i_{k-1}}$ and $\tau_{i_{k-2}}$ finishes in the same state from which $\tau_{i_{k-1}}$ starts execution. Eventually, we can determine a chain of tasks $\tau_{i_1} \prec \tau_{i_2} \prec \cdots \prec \tau_{i_k}$, where $\tau_{i_1}$ is the first task on some core. Let the time


consumption of each task in the chain under $R_{opt}$ be $t_{i_1}, \ldots, t_{i_k}$, respectively. It is easy to see that

$$s_K.t \le s_{i_k}.t + t_{i_k} + 2 (K - i_k)\, \delta_t$$

because DPRA'($I$) adds $2\delta_t$ to the time consumption for each intermediate state, and we require at most $t_{i_k}$ time to finish its progress, since the time consumption of $\tau_{i_k}$ is estimated using the same clock rate as in $R_{opt}$. Similarly, we have

$$s_{i_k}.t \le s_{i_{k-1}}.t + t_{i_{k-1}} + 2 (i_k - i_{k-1})\, \delta_t$$
$$\cdots$$
$$s_{i_2}.t \le s_{i_1}.t + t_{i_1} + 2 (i_2 - i_1)\, \delta_t = t_{i_1} + 2\, i_2\, \delta_t$$

By taking the sum of both sides, we have

$$s_K.t \le t_{i_1} + \cdots + t_{i_k} + 2 K \delta_t$$

On the other hand, since $\tau_{i_1} \prec \tau_{i_2} \prec \cdots \prec \tau_{i_k}$, their executions have no overlap under $R_{opt}$. Thus

$$t_{i_1} + \cdots + t_{i_k} \le OPT$$

Therefore,

$$s_K.t \le OPT + 2 K \delta_t \le OPT + 2 n \delta_t \le OPT + 2 n \Theta \delta_P \le OPT + 2n\, \frac{\epsilon f_{min}}{f_{max}\, 2n}\, \max_{1 \le i \le M} \frac{w_i}{f_{min}} = OPT + \epsilon \max_{1 \le i \le M} \frac{w_i}{f_{max}} \le (1+\epsilon)\, OPT$$

where $\Theta = \max_{1 \le i \le M} w_i / f_{min}$ and $\delta_t = \Theta\, \delta_P$. Finally, we prove that $p$ meets the energy requirement, i.e.,

$$s_K.E \le (1+\epsilon)\, C_E$$
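The time bound above hinges on how the step sizes are chosen in lines 1 to 4 of Algorithm 10. The sketch below computes them and checks the key inequality, namely that 2n times the time step is at most the fraction epsilon of the largest workload divided by the maximum frequency. All numeric inputs (power budget, workloads, number of tasks) are made-up values for illustration, and the name `theta` for the quantity on line 2 of the algorithm is an assumption. Exact fractions are used so the comparison is not affected by rounding.

```python
from fractions import Fraction


def step_sizes(eps, n, c_e, p_max, f_min, f_max, workloads):
    """Step sizes for energy, progress and time (lines 1-4 of Algorithm 10)."""
    delta_e = eps * c_e / (4 * n)
    theta = max(workloads) / f_min  # longest possible run time of a core
    delta_p = min(delta_e / p_max, eps * f_min / (f_max * 2 * n))
    delta_t = theta * delta_p
    return delta_e, delta_p, delta_t


# Hypothetical instance: 6 tasks, eps = 0.1, two cores' total workloads.
eps, n = Fraction(1, 10), 6
c_e, p_max = Fraction(23000), Fraction(2000)
f_min, f_max = Fraction(103 * 10**6), Fraction(206 * 10**6)
workloads = [Fraction(5 * 10**8), Fraction(6 * 10**8)]

delta_e, delta_p, delta_t = step_sizes(eps, n, c_e, p_max, f_min, f_max, workloads)

# Extra time allowed by the bound minus the extra time actually added.
slack = eps * max(workloads) / f_max - 2 * n * delta_t
```

A non-negative `slack` confirms, for this instance, that the accumulated idle increments stay within the epsilon fraction used in the derivation.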


Since $s_K = h(F_t(s_{K-1}, r))$, we have

$$s_K.E = \left\lceil \frac{s_{K-1}.E + \sum_{i=1}^{M} P(s_K.r_i)\,(\Delta_{K-1} + 2\delta_t)}{\delta_E} \right\rceil \delta_E \le s_{K-1}.E + \sum_{i=1}^{M} P(s_K.r_i)\,(\Delta_{K-1} + 2\delta_t) + \delta_E$$

We can derive similar relations for $s_{K-1}.E$, ..., $s_1.E$ and plug all of them into 8. Noticing that $s_1.E = 0$, we have

$$s_K.E \le \sum_{i=1}^{M} \sum_{j=2}^{K} P(s_j.r_i)\,(\Delta_{j-1} + 2\delta_t) + (K-1)\, \delta_E$$
$$\le \sum_{i=1}^{M} \sum_{j=2}^{K} P(s_j.r_i)\, \Delta_{j-1} + (K-1)\,(2 P_{max} \delta_t + \delta_E)$$
$$= \sum_{i=1}^{M} \sum_{j=2}^{K} P(s_j.r_i)\, \Delta_{j-1} (1 - Idle_{i,j}) + \sum_{i=1}^{M} \sum_{j=2}^{K} P(s_j.r_i)\, \Delta_{j-1}\, Idle_{i,j} + (K-1)\,(2 P_{max} \delta_t + \delta_E)$$

where

$$Idle_{i,j} = \begin{cases} 1 & \text{if core } i \text{ receives } f_0 \text{ in } s_j \\ 0 & \text{otherwise} \end{cases}$$

Intuitively,

$$E_A = \sum_{i=1}^{M} \sum_{j=2}^{K} P(s_j.r_i)\, \Delta_{j-1} (1 - Idle_{i,j})$$

is the total energy consumption when cores receive clock rates other than $f_0$ and make progress, while

$$E_I = \sum_{i=1}^{M} \sum_{j=2}^{K} P(s_j.r_i)\, \Delta_{j-1}\, Idle_{i,j}$$

is the total energy consumption when cores receive $f_0$. Due to the rounding up of progress, $E_A$ should be no more than the real energy consumption in $R_{opt}$, because we apply the same clock rate to execute at most the same


amount of progress as $R_{opt}$. Since $R_{opt}$ is a feasible schedule, its energy consumption is bounded by $C_E$, i.e.,

$$E_A \le C_E$$

For the second term $E_I$, we claim that for any core $i$, the total time during which tasks receive clock rate $f_0$,

$$t^{idle}_i = \sum_{j=2}^{K} \Delta_{j-1}\, Idle_{i,j}$$

is not more than $(K-1)\, \delta_P$. To see this, let the time consumption of all tasks on core $i$ be $OPT_i$. Using the same technique we used in the time consumption analysis, we have $\sum_{j=2}^{K} \Delta_{j-1} \le OPT_i$. On the other hand, the total progress skipped due to rounding up is at most $(K-1)\, \delta_P$. The time consumption for execution using clock rates other than $f_0$, i.e., $\sum_{j=2}^{K} \Delta_{j-1} (1 - Idle_{i,j})$, should be at least $OPT_i - (K-1)\, \delta_P$. Therefore,

$$t^{idle}_i \le OPT_i - \sum_{j=2}^{K} \Delta_{j-1} (1 - Idle_{i,j}) \le (K-1)\, \delta_P$$

Thus,

$$E_I = \sum_{i=1}^{M} \sum_{j=2}^{K} P(s_j.r_i)\, \Delta_{j-1}\, Idle_{i,j} \le \frac{P_{max}}{M} \sum_{i=1}^{M} t^{idle}_i \le P_{max} (K-1)\, \delta_P$$

Noticing that $K \le n+1$, we have

$$s_K.E \le E_A + E_I + (K-1)\,(2 P_{max} \delta_t + \delta_E) \le C_E + P_{max}\, n\, \delta_P + 2 n P_{max} \delta_t + n \delta_E \le C_E + \epsilon C_E / 4 + \epsilon C_E / 2 + \epsilon C_E / 4 = (1+\epsilon)\, C_E$$

Now we have finally proved that path $p$ satisfies all constraints. Let $S$ be the state set produced during the execution of DPRA'($I$). Since $p$ is constructed using the same transition functions $h$ and $F_t$, it is also a valid path in $S$, unless some states in it have the same progress and energy values as other states in $S$ but more time consumption,


and are therefore replaced. In either case, DPRA'($I$) will yield a schedule with time consumption at most $(1+\epsilon)\, OPT$.

Lemma 6. DPRA' is a polynomial time algorithm in $n$, i.e., the number of tasks.

Proof. We first show that the number of states in $S$ is $O((n/\epsilon)^{M+1})$. It is easy to see that the energy value is discretized into $4n/\epsilon$ different values. For progress values, there are $1/\delta_P$ different values allowed for each core. If $\delta_E / P_{max} < \epsilon f_{min} / (f_{max}\, 2n)$,

$$\frac{1}{\delta_P} = \frac{P_{max}}{\delta_E} = \max_{1 \le i \le M} \sum_{j=1}^{k} c_{ij}\; \frac{P_{max}\, 4n}{\epsilon\, C_E\, f_{min}} \le \frac{4 P_{max}\, n}{\epsilon\, P_{min}}$$

Otherwise, if $\delta_E / P_{max} \ge \epsilon f_{min} / (f_{max}\, 2n)$,

$$\frac{1}{\delta_P} = \frac{2n\, f_{max}}{\epsilon\, f_{min}}$$

In either case, $1/\delta_P$ is no more than $n/\epsilon$ times a constant, because both $P_{max}/P_{min}$ and $f_{max}/f_{min}$ are normally less than 10. Therefore, there are at most $O((n/\epsilon)^{M+1})$ states in $S$. At the same time, the number of different voltage assignments we can choose, i.e., the size of $RA'(s)$, is no more than $(L+1)^M$, which is a constant. Therefore, the overall complexity of DPRA' is $O((n/\epsilon)^{M+1})$.

As a direct result of Lemma 4, Lemma 5 and Lemma 6, we have

Theorem 8.1. Given an MCTCEC instance $I$, if $I$ is schedulable with optimal time consumption $OPT$, DPRA'($I$) will return a schedule in polynomial time which does not violate the temperature constraint, while its energy and time consumption are at most $(1+\epsilon)\, C_E$ and $(1+\epsilon)\, OPT$, respectively.

8.4 Problem Variants

8.4.1 Task Set with Dependence

So far, we have only discussed the application of our approach to independent task sets. When some task depends on other tasks and cannot start before the completion of


other tasks, we can naturally add such constraints to the constraint checking statements (line 6 of Algorithm 9 and line 10 of Algorithm 10), because we allow the dummy clock rate $f_I$. If the dependence constraint is not met based on the progress on different cores, we just drop the new state.

8.4.2 Hard Energy Constraint

When the energy constraint is tight, $s'.E > (1+\epsilon)\, C_E$ on line 10 of Algorithm 10 should be replaced by $s'.E > C_E$. In this case, our approximation algorithm will find a schedule that does not violate the energy and temperature constraints and has at most $(1+\epsilon)\, OPT$ time consumption. However, the approximation algorithm is only guaranteed to find a schedule when the original TECS problem is schedulable with energy consumption of $(1-\epsilon)\, C_E$ and optimal time consumption $OPT$.

8.5 Experiments

The experiments were conducted on 2-core, 4-core, and 6-core processors. Each core is abstracted as an 8 mm × 8 mm square. The cores are arranged in 2 × 1, 2 × 2 and 3 × 2 meshes, respectively. We model each core as a DVS-capable processing unit with three voltage/frequency levels (1.5V-206MHz, 1.1V-133MHz, and 0.8V-103MHz) like StrongARM [61]. We choose some tasks from MiBench and obtain the workload (worst case cycle numbers) from the M5 simulator. We also use synthetic task sets which are randomly generated, with each task having an execution time in the range of 500-5000 milliseconds. We adopt the approach in [90] to compute the steady state temperature. The ambient temperature and initial temperature of the processor are set to 32°C and 40°C, respectively. The exact and approximation algorithms are implemented in C++. All experiments were performed on a 3GHz workstation with 20GB RAM.

We choose 6 jobs from MiBench [41], including algorithms from communication (FFT, CRC32), security (sha), sound compression (untoast), and automotive (basicmath, qsort). The workloads of these jobs were in the range of $5 \times 10^7$ to $3 \times 10^8$ cycles. We


use the exact algorithm DPRA to schedule these tasks on a 2-core processor. CRC32 ($\tau_{1,1}$), qsort ($\tau_{1,2}$), and untoast ($\tau_{1,3}$) are mapped to core 1. sha ($\tau_{2,1}$), FFT ($\tau_{2,2}$), and basicmath ($\tau_{2,3}$) are mapped to core 2. We depict the temperature curves of each core in Figure 8-3, when different temperature and energy constraints are applied.

In Figure 8-3(a), the temperature constraint is not violated when both cores run at 1.5V. DPRA schedules jobs on different cores to execute using the maximum voltage level at the same time, i.e., tasks $\tau_{1,1}$ and $\tau_{2,2}$, to minimize the time consumption. When the energy budget is reduced, tasks with large workloads are executed using a lower voltage level to save energy, as shown in Figure 8-3(b). As we can see, $\tau_{2,2}$ is executed using 1.1V instead of 1.5V when the energy budget is reduced to 22000 mJ. Similarly, when the temperature constraint becomes tighter, fewer tasks are executed using the maximum voltage level, in order to decrease the peak temperature. As shown in Figure 8-3(c), the two cores no longer run at 1.5V at the same time. Although the energy budget is still sufficient, the time consumption increases slightly compared to Figure 8-3(a).

We evaluated the performance of our approximation scheme using task sets of different sizes. Figure 8-4 and Figure 8-5 show the actual ratio between the approximation results and the optimal solution for time and energy consumption, respectively. It can be seen that the actual ratio is usually smaller than the expected ratio $1+\epsilon$. For example, for $\epsilon = 0.02$, the scheme is expected to produce results within 2% of the optimal values. The actual gap between the optimal solution and the approximate schedule is significantly less than 2%.

We also evaluated the running time of our approximation scheme with different values of $\epsilon$ and numbers of tasks. The results on 2-core and 4-core systems are shown in Figure 8-6. Curve DPRA represents the execution time of the exact algorithm DPRA. As expected, DPRA' requires more time for smaller $\epsilon$ or larger job set sizes. Its time consumption is still


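[Figure 8-3: temperature curves of both cores with per-task voltage annotations.
Panel A: $\tau_{2,3}$ at 1.5V, $\tau_{2,1}$ at 0.8V, $\tau_{1,1}$ at 1.5V, $\tau_{1,2}$ at 1.1V, $\tau_{2,2}$ at 1.5V, $\tau_{1,3}$ at 1.5V.
Panel B: $\tau_{1,3}$ at 1.1V, $\tau_{1,1}$ at 1.5V, $\tau_{2,3}$ at 1.5V, $\tau_{2,1}$ at 1.5V, $\tau_{2,2}$ at 1.1V, $\tau_{1,2}$ at 1.1V.
Panel C: $\tau_{2,3}$ at 1.5V, $\tau_{1,1}$ at 1.5V, $\tau_{1,2}$ at 1.1V, $\tau_{1,3}$ at 1.1V, $\tau_{2,1}$ at 1.1V, $\tau_{2,2}$ at 1.1V.]

Figure 8-3. Temperature and energy constrained scheduling. A) $C_T = 95$°C and $C_E = 23000$ mJ. B) $C_T = 95$°C and $C_E = 22000$ mJ. C) $C_T = 85$°C and $C_E = 23000$ mJ.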


[Plot: time consumption (APX vs. OPT) against task number; series OPT and $\epsilon$ = 0.05, 0.1, 0.15, 0.2, 0.5.]
Figure 8-4. Actual time consumption of DPRA'.

[Plot: energy consumption (APX vs. OPT) against task number; series OPT and $\epsilon$ = 0.05, 0.1, 0.15, 0.2, 0.5.]
Figure 8-5. Actual energy consumption of DPRA'.

[Two plots: run time (seconds) against task number; series DPRA and $\epsilon$ = 0.04, 0.06, 0.10, 0.20.]
Figure 8-6. Running time with different job set size and $\epsilon$.


significantly smaller than that of the exact algorithm DPRA, which grows exponentially with the number of tasks.

8.6 Summary

In this chapter, we studied the task scheduling problem on a multicore processor with DVS capability under both temperature and energy constraints. We presented a polynomial time approximation scheme. When the original problem is schedulable, our approximation algorithm is guaranteed to generate a solution which does not violate the temperature constraint and consumes no more than a designer-specified bound on time and energy compared with the optimal solution. We evaluated our approach using both real and synthetic benchmarks mapped on real multicore processors. The experimental results demonstrate that our technique is able to produce schedules close to the optimal solution with reasonable execution time.


CHAPTER 9
CONCLUSIONS AND FUTURE WORK

Multicore architectures are widely used in today's desktop, server and embedded systems. The increasing complexity of modern multicore architectures introduces unique validation challenges. This dissertation described a set of novel techniques and methodologies for system level validation of multicore architectures. This chapter concludes the dissertation and outlines possible future research directions.

9.1 Conclusions

To design reliable multicore systems, it is crucial to satisfy both functional and non-functional requirements. The functional requirements ensure that the design performs all the logical operations as specified. The non-functional requirements guarantee that the system does not violate various design constraints such as area, power, energy, and temperature. The increasing complexity of multicore architectures introduces significant challenges during validation of both functional behavior and non-functional requirements. This dissertation developed novel techniques to address these validation challenges. Its contributions are summarized as follows.

In Chapter 3 and Chapter 4, we presented directed test generation techniques for the functional validation of multicore architectures. Although simulation using directed tests requires significantly fewer tests to achieve the same coverage goal compared to random tests, it is very time consuming to generate the directed tests automatically due to the limitations of current model checking tools. While existing works have exploited the temporal symmetry in bounded model checking (BMC) across different time steps, we presented a novel approach for directed test generation of multicore architectures that exploits temporal, structural, as well as spatial symmetry in SAT-based BMC. The CNF description of the design is synthesized using CNF for cores, bus and memory subsystem to preserve the mapping information between different cores. As a result, the symmetric high level structure, i.e., structural symmetry, is well


preserved and the knowledge learned from a single core was effectively shared by other cores during the SAT solving process. The experimental results using homogeneous as well as heterogeneous multicore architectures demonstrated that our approach is remarkably faster (up to 10 times) compared to existing methods.

Chapter 5 described an efficient test generation approach for a wide variety of cache coherence protocols. We performed a detailed analysis of the space structure of several popular protocols, and developed novel techniques to generate efficient test sequences that achieve 100% state and transition coverage for each cache coherence protocol. Our approach outperformed existing approaches based on constrained random tests, providing higher transition coverage with a linear memory requirement. We also conducted experiments using a wide variety of cache coherence protocols to demonstrate the effectiveness of our approach on systems with different numbers of cores, making it suitable for future multicore architectures.

In Chapter 6, we addressed a major obstacle in applying directed test generation techniques to real world designs. Since model checkers do not directly accept designs written in a hardware description language, or do not support all the features, real designs must be translated or abstracted before test generation. We presented a novel approach for directed test generation using interleaved concrete and symbolic execution that accepts Verilog designs. The design is first simulated to generate an execution trace. The constraint solver is then applied to find a suitable test pattern which can force the real design to exercise the desired behavior. Our approach alleviates the design translation problem by directly recording the logical operations performed during the concrete simulation. The constraint solving complexity is also reduced, because we only apply the solver to one execution trace at a time.

Chapter 7 presented a flexible and automatic framework to address the temperature- and energy-constrained schedulability validation problem in multitasking systems with different voltage levels. The problem is modeled by extended timed automata, while the


energy/temperature constraints are translated into CTL specifications. A model checker was employed to determine whether the given task set is schedulable. In addition, we also proposed a polynomial time approximation scheme to circumvent the capacity limitations of symbolic model checkers. Our approximation algorithm is guaranteed to generate results close to the optimal value with reasonable running time. We proved mathematically that our approximation algorithm gives no false positive answer, while the false negative ratio can be negligibly small in practical scenarios. Extensive experimental results demonstrated the effectiveness of our approach.

Chapter 8 studied task schedulability validation of DVS-enabled multicore processors. We designed a polynomial time approximation scheme such that, when the original problem is schedulable, our approximation algorithm is guaranteed to generate a solution which does not violate the temperature constraint and consumes no more than a specified amount of time and energy compared with the optimal solution. Both real and synthetic benchmarks were used to evaluate our approach. The experimental results suggest that our technique is able to produce results close to the optimal solution with reasonable execution time.

In conclusion, this dissertation presented a comprehensive study of the system-level validation of multicore architectures. We developed a set of efficient validation techniques and evaluated them on a variety of multicore systems. Our research will enable designers and validation engineers to significantly improve the quality of future multicore designs.

9.2 Future Research Directions

The validation of multicore architectures will continue to be one of the most important challenges in the development of future desktop, server, and embedded systems. The research described in this dissertation can be extended in the following directions:


The capacity of the underlying SAT solvers is an important bottleneck for directed test generation of multicore architectures. Although our proposed techniques have shown a significant reduction of the overall solving time, the time consumption needs to be further reduced to make them applicable to complex industrial designs. Further studies are required to analyze the SAT solving process and devise more efficient learning techniques to increase the capacity of existing approaches.

We have shown that the state space of many cache coherence protocols in modern multicore architectures has a quite regular structure. We believe that our proposed techniques can be further extended to effectively analyze protocol implementations with a large number of cores. Although full transition coverage may become infeasible for too many cores, the knowledge about the space structure can be used to effectively distribute the test vectors within the state space, so that complex bugs can be detected.

Our work in the direction of scalable directed test generation has demonstrated the effectiveness of integrating concrete simulation and static analysis. Further studies can investigate how to support more HDL features and employ more constraint solving optimizations. The proposed technique can also be combined with random test generation to reduce the overall validation time.

Our validation techniques for task schedulability analysis can be further extended to support more complex thermal models. Different validation techniques can be developed for task sets with relatively small execution time. Although polynomial-time approximation algorithms can provide results with bounded errors, their computational complexity can be quite high when the system contains many cores. It is therefore desirable to have fast and efficient heuristic algorithms for schedulability validation in future multicore and manycore systems.


REFERENCES

[1] D. Abts, S. Scott, and D. Lilja. So many states, so little time: verifying memory coherence in the Cray X1. In Proceedings of International Parallel and Distributed Processing Symposium, 2003.
[2] A. Adir, E. Almog, L. Fournier, E. Marcus, M. Rimon, M. Vinov, and A. Ziv. Genesys-pro: innovations in test program generation for functional processor verification. IEEE Design & Test of Computers, 21:8493, 2004.
[3] F. A. Aloul, I. L. Markov, and K. Sakallah. Shatter: efficient symmetry-breaking for boolean satisfiability. In Proceedings of Design Automation Conference, pages 836, 2003.
[4] F. A. Aloul, I. L. Markov, and K. A. Sakallah. Shatter. University of Michigan, 2003. Available from: http://www.aloul.net/Tools/shatter/
[5] F. A. Aloul, A. Ramani, I. L. Markov, and K. Sakallah. Solving difficult SAT instances in the presence of symmetry. In Proceedings of Design Automation Conference, pages 731, 2002.
[6] R. Alur and D. L. Dill. A theory of timed automata. Theoretical Computer Science, 126:183, 1994.
[7] S. Artzi, A. Kiezun, J. Dolby, F. Tip, D. Dig, A. Paradkar, and M. Ernst. Finding bugs in web applications using dynamic test generation and explicit-state model checking. IEEE Transactions on Software Engineering, 36:474, 2010.
[8] H. Aydin, R. Melhem, D. Mosse, and P. Mejia-Alvarez. Determining optimal processor speeds for periodic real-time tasks with different power characteristics. In Proceedings of Euromicro Conference on Real-Time Systems, pages 225, 2001.
[9] H. Aydin, R. Melhem, D. Mosse, and P. Mejia-Alvarez. Power-aware scheduling for periodic real-time tasks. IEEE Transactions on Computers, 53:584, 2004.
[10] M. Berkelaar, K. Eikland, and P. Notebaert. lpsolve. Eindhoven University of Technology, 2010. Available from: http://lpsolve.sourceforge.net/
[11] D. Bernstein, D. Cohen, and D. Maydan. Dynamic memory disambiguation for array references. In Proceedings of International Symposium on Microarchitecture, pages 105111, 1994.
[12] A. Biere, A. Cimatti, E. M. Clarke, O. Strichman, and Y. Zhu. Bounded model checking. Advances in Computers, 58:118, 2003.
[13] A. Biere, A. Cimatti, E. M. Clarke, and Y. Zhu. Symbolic model checking without BDDs. In Proceedings of International Conference on Tools and Algorithms for Construction and Analysis of Systems, pages 193, 1999.


[14] A. Biere and C. Sinz. Decomposing SAT problems into connected components. Journal on Satisfiability, Boolean Modeling and Computation, 2:191, 2006.
[15] N. Binkert, R. Dreslinski, L. Hsu, K. Lim, A. Saidi, and S. Reinhardt. The M5 Simulator: Modeling Networked Systems. IEEE Micro, 26:52, 2006.
[16] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De. Parameter variations and impact on circuits and microarchitecture. In Proceedings of Design Automation Conference, pages 338, 2003.
[17] R. E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, 35:677, 1986.
[18] R. Cavada, A. Cimatti, C. A. Jochim, G. Keighren, E. Olivetti, M. Pistore, M. Roveri, and A. Tchaltsev. NuSMV. ITC-Irst, 2010. Available from: http://nusmv.irst.itc.it/
[19] T. Chantem, R. P. Dick, and X. S. Hu. Temperature-aware scheduling and assignment for hard real-time applications on mpsocs. In Proceedings of the Conference on Design, Automation and Test in Europe, pages 288, 2008.
[20] J.-J. Chen, C.-M. Hung, and T.-W. Kuo. On the minimization of the instantaneous temperature for periodic real-time tasks. In Proceedings of IEEE Real Time and Embedded Technology and Applications Symposium, pages 236, 2007.
[21] J.-J. Chen and C.-F. Kuo. Energy-efficient scheduling for real-time systems on dynamic voltage scaling (dvs) platforms. In Proceedings of IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, pages 28, 2007.
[22] M. Chen and P. Mishra. Functional test generation using efficient property clustering and learning techniques. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 29:396, 2010.
[23] M. Chen and P. Mishra. Decision ordering based property decomposition for functional test generation. In Proceedings of the Conference on Design, Automation and Test in Europe, pages 167, 2011.
[24] M. Chen and P. Mishra. Property learning techniques for efficient generation of directed tests. IEEE Transactions on Computers, 60:852, 2011.
[25] M. Chen, X. Qin, and P. Mishra. Efficient decision ordering techniques for sat-based test generation. In Proceedings of the Conference on Design, Automation and Test in Europe, pages 490, 2010.
[26] S. Chen and K. Nahrstedt. On finding multi-constrained paths. In Proceedings of IEEE International Conference on Communications, volume 2, pages 874, 1998.


[27] X. Chen, Y. Yang, M. Delisi, G. Gopalakrishnan, and C.-T. Chou. Hierarchical cache coherence protocol verification one level at a time through assume guarantee. In Proceedings of IEEE International High Level Design Validation and Test Workshop, pages 107, 2007.
[28] E. Clarke, A. Biere, R. Raimi, and Y. Zhu. Bounded model checking using satisfiability solving. Formal Methods in System Design, 19:7, 2001.
[29] A. Coskun, T. Rosing, K. Whisnant, and K. Gross. Static and dynamic temperature-aware scheduling for multiprocessor socs. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 16:1127, 2008.
[30] P. T. Darga, M. H. Liffiton, K. A. Sakallah, and I. L. Markov. Exploiting structure in symmetry detection for cnf. In Proceedings of Design Automation Conference, pages 530, 2004.
[31] M. Davis, G. Logemann, and D. Loveland. A machine program for theorem-proving. Communication of ACM, 5:394, 1962.
[32] M. Davis and H. Putnam. A computing procedure for quantification theory. Journal of ACM, 7:201, 1960.
[33] D. Dill, A. Drexler, A. Hu, and C. Yang. Protocol verification as a hardware design aid. In Proceedings of International Conference on Computer Design, pages 522, 1992.
[34] B. Dutertre and L. M. de Moura. A fast linear-arithmetic solver for DPLL(T). In Proceedings of International Conference on Computer Aided Verification, pages 81, 2006.
[35] J. Edmonds and E. L. Johnson. Matching, Euler Tours, and the Chinese Postman. Mathematical Programming, 5:88, 1973.
[36] E. Emerson and V. Kahlon. Exact and efficient verification of parameterized cache coherence protocols. In Proceedings of IFIP WG 10.5 Advanced Research Working Conference on Correct Hardware Design and Verification Methods, volume 2860, pages 247, 2003.
[37] E. Fersman, P. Pettersson, and W. Yi. Timed automata with asynchronous processes: Schedulability and decidability. In Proceedings of International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 67, 2002.
[38] Z. Fu, Y. Mahajan, and S. Malik. zChaff. Princeton University, 2001. Available from: http://www.princeton.edu/~chaff/zchaff.html
[39] A. Gargantini and C. Heitmeyer. Using model checking to generate tests from requirements specifications. In Proceedings of the 7th European Software Engineering Conference held jointly with the 7th ACM SIGSOFT International


Symposium on Foundations of Software Engineering, volume 24, pages 146, 1999.
[40] P. Godefroid, N. Klarlund, and K. Sen. Dart: directed automated random testing. In Proceedings of the ACM SIGPLAN conference on Programming language design and implementation, pages 213, 2005.
[41] M. Guthaus, J. Ringenberg, D. Ernst, T. Austin, T. Mudge, and R. Brown. Mibench: A free, commercially representative embedded benchmark suite. In Proceedings of IEEE International Workshop on Workload Characterization, pages 3, 2001.
[42] J. Hennessy and D. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, 2003.
[43] J. N. Hooker. Solving the incremental satisfiability problem. Journal of Logic Programming, 15-2:177, 1993.
[44] W. Huang, K. Sankaranarayanan, K. Skadron, R. J. Ribando, and M. R. Stan. Accurate, pre-rtl temperature-aware design using a parameterized, geometric thermal model. IEEE Transactions on Computers, 57:1277, 2008.
[45] M. A. J. Bhadra, E. Trofimova. Validating power architecture technology-based mpsocs through executable specifications. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 16:388, 2008.
[46] R. Jayaseelan and T. Mitra. Temperature aware task sequencing and voltage scaling. In Proceedings of IEEE/ACM International Conference on Computer-Aided Design, pages 618, 2008.
[47] R. Jejurikar and R. Gupta. Energy aware non-preemptive scheduling for hard real-time systems. In Proceedings of Euromicro Conference on Real-Time Systems, pages 21, 2005.
[48] R. Jejurikar, C. Pereira, and R. K. Gupta. Leakage aware dynamic voltage scaling for real-time embedded systems. In Proceedings of Design Automation Conference, pages 275, 2004.
[49] Z. Khasidashvili, A. Nadel, A. Palti, and Z. Hanna. Simultaneous SAT-based model checking of safety properties. In Proceedings of Haifa Verification Conference, pages 56, 2005.
[50] H.-M. Koo and P. Mishra. Functional test generation using property decompositions for validation of pipelined processors. In Proceedings of the Conference on Design, Automation and Test in Europe, pages 1240, 2006.
[51] H.-M. Koo and P. Mishra. Test generation using SAT-based bounded model checking for validation of pipelined processor. In Proceedings of ACM Great Lakes Symposium on VLSI, pages 362, 2006.

PAGE 162

[52]H.-M.KooandP.Mishra.Functionaltestgenerationusingdesignandproperty decompositiontechniques. ACMTransactionsonEmbeddedComputingSystems 8:32:1:33,2009. [53]A.Kuehlmann.Dynamictransitionrelationsimplicationforboundedproperty checking.In ProceedingsofIEEE/ACMInternationalConferenceonComputerAidedDesign ,pages50,2004. [54]L.Liu,D.Sheridan,W.Tuohy,andS.Vasudevan.Towardscoverageclosure: Usinggoldmineassertionsforgeneratingdesignvalidationstimulus.In ProceedingsoftheConferenceonDesign,AutomationandTestinEurope ,pages 173,2011. [55]L.LiuandS.Vasudevan.Star:Generatinginputvectorsfordesignvalidationby staticanalysisofRTL.In ProceedingsofIEEEHLDVTWorkshop ,2009. [56]L.LiuandS.Vasudevan.Efcientvalidationinputgenerationinrtlbyhybridized sourcecodeanalysis.In ProceedingsoftheConferenceonDesign,Automation andTestinEurope ,pages1,2011. [57]Y.Liu,H.Yang,R.P.Dick,H.Wang,andL.Shang.Thermalvsenergy optimizationfordvfs-enabledprocessorsinembeddedsystems.In ProceedingsofInternationalSymposiumonQualityElectronicDesign ,pages204, 2007. [58]A.Lungu,P.Bose,D.J.Sorin,S.German,andG.Janssen.Multicorepower management:Ensuringrobustnessviaearly-stageformalverication.In ProceedingsofIEEE/ACMInternationalConferenceonFormalMethodsandModelsfor Co-Design ,pages78,2009. [59]J.P.Marques-SilvaandK.A.Sakallah.GRASP:ASearchAlgorithmfor PropositionalSatisability. IEEETransactionsonComputers ,48:506, 1999. [60]S.M.Martin,K.Flautner,T.Mudge,andD.Blaauw.Combineddynamicvoltage scalingandadaptivebodybiasingforlowerpowermicroprocessorsunder dynamicworkloads.In ProceedingsofIEEE/ACMInternationalConference onComputer-AidedDesign ,pages721,2002. [61]Marvell. MarvellStrongARM1100processor .MarvellTechnologyGroupLtd., 2004. [62]A.Miller,A.Donaldson,andM.Calder.Symmetryintemporallogicmodel checking. ACMComputerSurvey ,38:8,2006. [63]P.MishraandM.Chen.Efcienttechniquesfordirectedtestgenerationusing incrementalsatisability.In ProceedingsofInternationalConferenceonVLSI Design ,pages65,2009. 162

PAGE 163

[64]P.MishraandN.Dutt.Graph-basedfunctionaltestprogramgenerationfor pipelinedprocessors.In ProceedingsoftheConferenceonDesign,Automation andTestinEurope ,pages182,2004. [65]M.W.Moskewicz,C.F.Madigan,Y.Zhao,L.Zhang,andS.Malik.Chaff: engineeringanefcientSATsolver.In ProceedingsofDesignAutomation Conference ,pages530,2001. [66]C.Norstr om,A.Wall,andW.Yi.Timedautomataastaskmodelsforevent-driven systems.In ProceedingsofInternationalConferenceonReal-TimeComputing SystemsandApplications ,page182,1999. [67]M.Prasad,A.Biere,andA.Gupta.AsurveyofrecentadvancesinSAT-based formalverication. InternationalJournalonSoftwareToolsforTechnologyTransfer STTT ,7:156,2005. [68]X.Qin,M.Chen,andP.Mishra.Synchronizedgenerationofdirectedtestsusing satisabilitysolving.In ProceedingsofInternationalConferenceonVLSIDesign pages351,2010. [69]X.QinandP.Mishra.Efcientdirectedtestgenerationforvalidationofmulticore architectures.In ProceedingsofInternationalSymposiumonQualityElectronic Design ,pages276,2011. [70]X.QinandP.Mishra.Automatedgenerationofdirectedtestsfortransition coverageincachecoherenceprotocols.In ProceedingsoftheConferenceon Design,AutomationandTestinEurope ,pages3,2012. [71]X.Qin,W.Wang,andP.Mishra.Tcec:Temperature-andenergy-constrained schedulinginreal-timemultitaskingsystems. ToappearinIEEETransactionson Computer-AidedDesignofIntegratedCircuitsandSystems ,2012. [72]K.SenandG.Agha.Cuteandjcute:Concolicunittestingandexplicitpath model-checkingtools.In ProceedingsofInternationalConferenceonComputer AidedVercation ,pages419,2006. [73]D.ShinandJ.Kim.Dynamicvoltagescalingofperiodicandaperiodictasks inpriority-drivensystems.In ProceedingsofAsiaandSouthPacicDesign AutomationConference ,pages653,2004. [74]S.ShuklaandR.Gupta.Amodelcheckingapproachtoevaluatingsystemlevel dynamicpowermanagementpoliciesforembeddedsystems.In Proceedings ofIEEEInternationalHigh-LevelDesignValidationandTestWorkshop ,pages 53,2001. 
[75]K.Skadron,M.R.Stan,K.Sankaranarayanan,W.Huang,S.Velusamy,and D.Tarjan.Temperature-awaremicroarchitecture:Modelingandimplementation. ACMTransactionsonArchitectureandCodeOptimization ,1:94,2004. 163

PAGE 164

[76]M.SongandS.Sahni.Approximationalgorithmsformulticonstrained quality-of-servicerouting. IEEETransactionsonComputers ,55:603617, 2006. [77]G.S.Spirakis.Designingfor65nmandbeyond.In KeynoteAddressatthe ConferenceonDesign,AutomationandTestinEurope ,2004. [78]O.Strichman.PruningtechniquesfortheSAT-basedboundedmodelchecking problem.In ProceedingsofIFIPWG10.5AdvancedResearchWorkingConferenceonCorrectHardwareDesignandVericationMethods ,pages58, 2001. [79]O.Strichman.Acceleratingboundedmodelcheckingofsafetyproperties. Formal MethodsinSystemDesign ,24:5,2004. [80]D.Tang,S.Malik,A.Gupta,andC.N.Ip.SymmetryreductioninSAT-based modelchecking.In ProceedingsofInternationalConferenceonComputerAided Vercation ,pages125,2005. [81]S.Vasudevan,D.Sheridan,D.Tcheng,S.Patel,W.Tuohy,andD.Johnson. Goldmine:Automaticassertiongenerationusingdataminingandstaticanalysis. In ProceedingsoftheConferenceonDesign,AutomationandTestinEurope pages626,2010. [82]R.Viswanath,V.Wakharkar,A.Watwe,andV.Lebonheur.Thermalperformance challengesfromsilicontosystems. IntelTechnologyJournal ,4:1,2000. [83]I.WagnerandV.Bertacco.Mcjammer:adaptivevericationformulti-coredesigns. In ProceedingsoftheConferenceonDesign,AutomationandTestinEurope pages670,2008. [84]S.WangandR.Bettati.Reactivespeedcontrolintemperature-constrained real-timesystems.In ProceedingsofEuromicroConferenceonReal-Time Systems ,pages10pp.,2006. [85]W.WangandP.Mishra.Leakage-awareenergyminimizationusingdynamic voltagescalingandcacherecongurationinreal-timesystems.In Proceedingsof InternationalConferenceonVLSIDesign ,pages357,2010. [86]W.WangandP.Mishra.Predvs:Preemptivedynamicvoltagescalingforreal-time systemsusingapproximationscheme.In ProceedingsofDesignAutomation Conference ,pages705,2010. [87]W.Wang,P.Mishra,andA.Gordon-Ross.Sacr:Scheduling-awarecache recongurationforreal-timeembeddedsystems.In ProceedingsofInternational ConferenceonVLSIDesign ,pages547,2009. 
[88]W.Wang,X.Qin,andP.Mishra.Temperature-andenergy-constrainedscheduling inmultitaskingsystems:amodelcheckingapproach.In Proceedingsof 164

PAGE 165

ACM/IEEEInternationalSymposiumonLowPowerElectronicsandDesign pages85,2010. [89]Z.WangandJ.Crowcroft.Quality-of-serviceroutingforsupportingmultimedia applications. IEEEJournalonSelectedAreasinCommunications ,14:1228 ,1996. [90]Z.WangandS.Ranka.Asimplethermalmodelformulti-coreprocessorsandits applicationtoslackallocation.In ProceedingsofIEEEInternationalSymposium onParallelandDistributedProcessing ,pages1,2010. [91]J.Whittemore,J.Kim,andK.Sakallah.SATIRE:Anewincrementalsatisability engine.In ProceedingsofDesignAutomationConference ,pages542,2001. [92]S.Williams. IcarusVerilog .IcarusVerilog,2012.Availablefrom: http:// iverilog.icarus.com/ [93]D.Wood,G.Gibson,andR.Katz.Verifyingamultiprocessorcachecontroller usingrandomtestgeneration. IEEEDesignTestofComputers ,7:13,1990. [94]F.Xie,M.Martonosi,andS.Malik.Compile-timedynamicvoltagescalingsettings: opportunitiesandlimits.In ProceedingsoftheACMSIGPLANConferenceon ProgrammingLanguageDesignandImplementation ,pages49,2003. [95]F.Xie,M.Martonosi,andS.Malik.Boundsonpowersavingsusingruntime dynamicvoltagescaling:anexactalgorithmandalinear-timeheuristic approximation.In ProceedingsofInternationalSymposiumonLowPower ElectronicsandDesign ,pages287,2005. [96]G.Xue,W.Zhang,J.Tang,andK.Thulasiraman.Polynomialtimeapproximation algorithmsformulti-constrainedqosrouting. IEEE/ACMTransactionsonNetworking ,16:656,2008. [97]L.YuanandG.Qu.Alt-dvs:Dynamicvoltagescalingwithawarenessofleakage andtemperatureforreal-timesystems.In ProceedingsofNASA/ESAConference onAdaptiveHardwareandSystems ,pages660,2007. [98]X.Yuan.Heuristicalgorithmsformulticonstrainedquality-of-servicerouting. IEEE/ACMTransactionsonNetworking ,10:244,2002. [99]L.Zhang,C.F.Madigan,M.H.Moskewicz,andS.Malik.Efcientconict drivenlearninginabooleansatisabilitysolver.In ProceedingsofIEEE/ACM InternationalConferenceonComputer-AidedDesign ,pages279,2001. 
[100]S.Zhang,K.Chatha,andG.Konjevod.Approximationalgorithmsforpower minimizationofearliestdeadlinerstandratemonotonicschedules.In ProceedingsofInternationalSymposiumonLowPowerElectronicsandDesign ,pages 225,2007. 165

PAGE 166

[101]S.ZhangandK.S.Chatha.Approximationalgorithmforthetemperatureaware schedulingproblem.In ProceedingsofInternationalConferenceonComputerAidedDesign ,pages281,2007. [102]Y.Zhang,X.Hu,andD.Z.Chen.Taskschedulingandvoltageselectionforenergy minimization.In ProceedingsofDesignAutomationConference ,pages183, 2002. [103]X.ZhongandC.Xu.System-wideenergyminimizationforreal-timetasks: Lowerboundandapproximation.In ProceedingsofInternationalConferenceon Computer-AidedDesign ,pages516,2006. 166

PAGE 167

BIOGRAPHICAL SKETCH

Xiaoke Qin received the B.S. and M.S. degrees from the Department of Automation, Tsinghua University, Beijing, China, in 2004 and 2007, respectively. He received his Ph.D. from the Department of Computer and Information Science and Engineering, University of Florida, USA, in 2012. His research interests are in the area of code compression, model checking and system verification.