Citation
Framework for Optimizing FPGA-Based Space Systems

Material Information

Title:
Framework for Optimizing FPGA-Based Space Systems
Creator:
Wulf, Nicholas L
Publisher:
University of Florida
Publication Date:
2016
Language:
English

Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Electrical and Computer Engineering
Committee Chair:
George, Alan Dale
Committee Co-Chair:
Gordon-Ross, Ann M
Committee Members:
Lam, Herman
Telesco, Charles Michael

Subjects

Subjects / Keywords:
computing
dependability
fpga
optimization
power
space

Notes

General Note:
On-board processing systems are often deployed in harsh aerospace environments and must therefore adhere to stringent constraints such as low power, small size, and high dependability in the presence of faults. Field-programmable gate arrays (FPGAs) are often an attractive option for designers seeking low-power, high-performance devices. However, unlike non-reconfigurable devices, radiation effects can alter an FPGA's functionality instead of just the device's data, requiring designers to consider fault-tolerant strategies to mitigate these effects. This research presents three major contributions toward developing a framework to aid designers in considering a broad range of devices and fault-tolerant strategies for on-board FPGA-based space processing, highlighting the most promising options and tradeoffs early in the design process. First, we introduce our framework's basic methodology, which leverages the computational density (CD) metric to estimate the power and dependability of FPGA-based designs. Our framework uses these estimates to compare various designs and identify the Pareto-optimal set of devices and fault-tolerant strategies for a particular mission environment and application. Using a hyperspectral-imaging (HSI) mission case study, our framework demonstrates effectiveness by pruning a design space of over 1,000 designs to a reduced trade-off space of fifteen Pareto-optimal designs, a 98.7% reduction. Second, we generalize the CD metric using linear programming (LP) to expand the range of applications our framework can consider. Our LP method also increases the accuracy of our framework's power and dependability estimations and provides recommendations for allocating FPGA resources to optimize power or dependability. Experimental results using dot product and distance calculation applications validate our LP method's capabilities.
Third, we leverage our LP method to enable our framework's memory extension, which improves the accuracy of our framework's analysis by extending our models to include internal-memory capacity and internal/external memory-bandwidth. We use the HSI mission case study to show that memory resources may limit the performance of FPGA-based on-board processing designs and contribute significantly towards power and dependability results. Combined, these three contributions establish our framework's capability as an early-design tool for FPGA-based on-board processing systems and demonstrate our framework's potential for further extensions to improve the accuracy and breadth of analysis.

Record Information

Source Institution:
UFRGP
Rights Management:
All applicable rights reserved by the source institution and holding location.
Embargo Date:
5/31/2018

Full Text

PAGE 1

FRAMEWORK FOR OPTIMIZING FPGA-BASED SPACE SYSTEMS

By
NICHOLAS L. WULF

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2016

PAGE 2

© 2016 Nicholas L. Wulf

PAGE 3

To Mom and Garth for all their support, Calli for bringing the good memes, and the Woman for tolerating me while I worked to graduate

PAGE 4

ACKNOWLEDGMENTS

This work was supported in part by the I/UCRC Program of the National Science Foundation under Grant No. IIP-1161022. The author gratefully acknowledges vendor equipment and tools provided by Xilinx that helped make this work possible.

PAGE 5

TABLE OF CONTENTS
(page)

ACKNOWLEDGMENTS ..... 4
LIST OF TABLES ..... 7
LIST OF FIGURES ..... 8
ABSTRACT ..... 10

CHAPTER

1 INTRODUCTION ..... 12
2 BACKGROUND ..... 19
3 A FRAMEWORK FOR EVALUATING AND OPTIMIZING FPGA-BASED SOCS FOR AEROSPACE COMPUTING ..... 21
  3.1 Background and Related Work ..... 21
  3.2 Framework ..... 24
    3.2.1 Overview ..... 24
    3.2.2 System-Property Components ..... 26
    3.2.3 Analysis Component ..... 28
  3.3 Evaluation Metrics ..... 30
    3.3.1 Power ..... 30
    3.3.2 Dependability ..... 35
    3.3.3 Lifetime ..... 37
  3.4 Example Case Study ..... 41
    3.4.1 Mission Details ..... 41
    3.4.2 Calculation of Framework Evaluation Metrics ..... 44
    3.4.3 Experimental Setup ..... 46
    3.4.4 Results and Analysis ..... 51
  3.5 Conclusions ..... 60
4 OPTIMIZING FPGA PERFORMANCE, POWER, AND DEPENDABILITY WITH LINEAR PROGRAMMING ..... 62
  4.1 Background and Related Work ..... 62
  4.2 Optimizing Performance ..... 63
    4.2.1 Linear Programming (LP) ..... 64
    4.2.2 Creating the Initial Tableau ..... 66
    4.2.3 Final Tableau and Results ..... 71
  4.3 Modifications to Optimize for Power or Dependability ..... 73
    4.3.1 Optimizing Power ..... 74
    4.3.2 Optimizing Dependability ..... 77
  4.4 Results and Analysis ..... 81

PAGE 6

    4.4.1 Base Case Study: Dot Product ..... 81
    4.4.2 Complex Case Study: Distance Calculation ..... 85
  4.5 Conclusions ..... 92
5 MEMORY-AWARE OPTIMIZATION OF FPGA-BASED SPACE SYSTEMS ..... 93
  5.1 Background and Related Work ..... 93
  5.2 Memory-Aware Analysis ..... 94
    5.2.1 Memory-Resource Concepts and Analysis ..... 94
    5.2.2 Internal-Memory Extension ..... 95
    5.2.3 External-Memory Extension ..... 99
  5.3 Case Study ..... 101
    5.3.1 Hyperion Hyperspectral-Imaging Sensor ..... 102
    5.3.2 Case-Study Performance and Memory Analysis ..... 103
  5.4 Results and Analysis ..... 107
    5.4.1 Effects of Increasing Required Performance ..... 107
    5.4.2 Effects of Varying Device Size ..... 111
    5.4.3 Finding Pareto-Optimal Designs ..... 113
  5.5 Conclusions ..... 116
6 CONCLUSIONS ..... 117

REFERENCES ..... 119
BIOGRAPHICAL SKETCH ..... 125

PAGE 7

LIST OF TABLES
(Table, page)

3-1 Virtex-4 CREME96 Weibull Parameters [1, 2]. ..... 45
3-2 Counts of Devices Under Study. ..... 49
3-3 Device TID Ratings and Estimated Lifetimes. ..... 50
3-4 Pareto-Optimal Designs. ..... 56
4-1 Virtex-5 LX20T Resources. ..... 66
4-2 Virtex-5 LX20T Operation Variant Properties. ..... 67
4-3 Summary of Equations. ..... 69
4-4 Initial Tableau for Performance Optimization. ..... 70
4-5 Final Tableau for Performance Optimization. ..... 71
4-6 Iteration Results for Performance Optimization. ..... 73
4-7 Virtex-5 LX20T Operation Variant Estimated Power Consumption. ..... 75
4-8 Initial Tableau for Power Optimization. ..... 76
4-9 Final Tableau for Power Optimization. ..... 77
4-10 Iteration Results for Power Optimization. ..... 77
4-11 Virtex-5 LX20T Operation Variant Estimated Error Rates. ..... 79
4-12 Initial Tableau for Optimizing Dependability. ..... 80
4-13 Final Tableau for Dependability Optimization. ..... 80
4-14 Iteration Results for Dependability Optimization. ..... 81
4-15 Virtex-5 LX85T Operation Variant Properties. ..... 87
4-16 Predictions and Results of Performance Optimization for Distance-Calculation Kernel. ..... 88
5-1 Estimated power and error rates for Virtex-4, Virtex-5, and Virtex-6 FPGA resources. ..... 97
5-2 Power consumption and error rates for the four Pareto-optimal designs. ..... 115

PAGE 8

LIST OF FIGURES
(Figure, page)

3-1 Framework overview consisting of the four system-property components (corners) and analysis component (center). ..... 25
3-2 Flowchart for the analysis component showing inputs from the four system-property components and an output consisting of the final Pareto-optimal design set. ..... 29
3-3 Flowchart for the power-metric calculation. ..... 31
3-4 Flowchart for the dependability-metric calculation. ..... 35
3-5 Example environmental-radiation data [3]. ..... 36
3-6 Flowchart for the lifetime-metric calculation. ..... 38
3-7 Device TID ratings for various Virtex devices shows improving TID trend [4]. ..... 38
3-8 Annual TID in circular equatorial orbits computed using SHIELDOSE, AE8MAX, AP8MAX models with 4 mm spherical Al shielding [5]. ..... 40
3-9 HSI image cube and the characteristic spectrum of a single pixel [6]. ..... 42
3-10 HSI analysis identifying various kinds of vegetation [7]. ..... 43
3-11 SPENVIS model of TID contributions from electrons, protons, and bremsstrahlung with varying levels of aluminum shielding for the EO-1 orbit during solar maximum using the SHIELDOSE-2 model. ..... 47
3-12 Power and dependability results and Pareto front for Virtex-4-based designs. ..... 52
3-13 Power and dependability results and Pareto front for Spartan-3-based designs. ..... 52
3-14 Power and dependability results and Pareto front for Virtex-5-based designs. ..... 53
3-15 Power and dependability results and Pareto front for Spartan-6-based designs. ..... 53
3-16 Power and dependability results and Pareto front for Virtex-6-based designs. ..... 54
3-17 Power and dependability results and Pareto front for Virtex-5QV-based designs. ..... 54
3-18 All designs with six family-specific Pareto fronts. ..... 55
3-19 Power and dependability results for all designs in the final Pareto-optimal design set including family-specific Pareto fronts. ..... 55
4-1 Iterative process for testing multiple limiting frequencies. ..... 72
4-2 Power predictions and results for designs using minimum power on Virtex-5 LX20T running dot-product kernel. ..... 83

PAGE 9

4-3 Frequency predictions and results for designs using minimum power on Virtex-5 LX20T running dot-product kernel. ..... 84
4-4 Power predictions and results for designs using minimum power on Virtex-5 LX85T running distance-calculation kernel. ..... 91
5-1 High-level memory model of memory-resource interactions. ..... 94
5-2 Block MM divides the input and output matrices into square blocks of size n × n and computes one C block at a time using multiple partial MMs. ..... 103
5-3 Each entry in the partial C block requires n multiply and n addition operations and n words from both the A and B blocks. ..... 104
5-4 Sources of power consumption as performance requirements of HSI analysis increase for a system using a Virtex-5 LX155-FF1760-3 device with a power-optimized device configuration. ..... 108
5-5 Sources of errors as performance requirements of HSI analysis increase for a system using a Virtex-5 LX155-FF1760-3 device with a dependability-optimized device configuration. ..... 110
5-6 Total power consumption and error rates as performance requirements of HSI analysis increase for a system using a Virtex-5 LX155-FF1760-3 device with a device configuration optimized for either power or dependability. ..... 111
5-7 Sources of power consumption for a system using a Virtex-4 LX device with a device configuration optimized for power. ..... 112
5-8 Sources of errors for a system using a Virtex-4 LX device with a device configuration optimized for dependability. ..... 113
5-9 Graph of error rates versus power for Virtex-4, -5, and -6 FPGA families (larger markers represent larger devices within a family) showing final Pareto-optimal front traced on top of the four members of the Pareto-optimal set. ..... 114

PAGE 10

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

FRAMEWORK FOR OPTIMIZING FPGA-BASED SPACE SYSTEMS

By
Nicholas L. Wulf
May 2016

Chair: Alan D. George
Cochair: Ann Gordon-Ross
Major: Electrical and Computer Engineering

On-board processing systems are often deployed in harsh aerospace environments and must therefore adhere to stringent constraints such as low power, small size, and high dependability in the presence of faults. Field-programmable gate arrays (FPGAs) are often an attractive option for designers seeking low-power, high-performance devices. However, unlike non-reconfigurable devices, radiation effects can alter an FPGA's functionality instead of just the device's data, requiring designers to consider fault-tolerant strategies to mitigate these effects. This research presents three major contributions toward developing a framework to aid designers in considering a broad range of devices and fault-tolerant strategies for on-board FPGA-based space processing, highlighting the most promising options and tradeoffs early in the design process. First, we introduce our framework's basic methodology, which leverages the computational density (CD) metric to estimate the power and dependability of FPGA-based designs. Our framework uses these estimates to compare various designs and identify the Pareto-optimal set of devices and fault-tolerant strategies for a particular mission environment and application. Using a hyperspectral-imaging (HSI) mission case study, our framework demonstrates effectiveness by pruning a design space of over 1,000 designs to a reduced trade-off space of fifteen Pareto-optimal designs, a 98.7% reduction. Second, we generalize the CD metric using linear programming (LP) to expand the range of applications our framework can consider. Our LP method also increases the accuracy of

PAGE 11

our framework's power and dependability estimations and provides recommendations for allocating FPGA resources to optimize power or dependability. Experimental results using dot product and distance calculation applications validate our LP method's capabilities. Third, we leverage our LP method to enable our framework's memory extension, which improves the accuracy of our framework's analysis by extending our models to include internal-memory capacity and internal/external memory-bandwidth. We use the HSI mission case study to show that memory resources may limit the performance of FPGA-based on-board processing designs and contribute significantly towards power and dependability results. Combined, these three contributions establish our framework's capability as an early-design tool for FPGA-based on-board processing systems and demonstrate our framework's potential for further extensions to improve the accuracy and breadth of analysis.

PAGE 12

CHAPTER 1
INTRODUCTION

Unmanned, remote-sensing systems are commonly used in air and space environments to sense and collect raw data from the surrounding environment. The system typically then transmits the collected data to a central ground station where high-performance computers process and analyze the data. However, rapidly improving sensor technology has significantly increased the amount of collected data, which may exceed the remote system's transmission bandwidth. Additionally, because remote systems are continually exploring farther-reaching areas, transmission latencies can be on the order of tens of minutes or more, which hinders remote systems that rely on real-time operating decisions from a ground station.

In order to address increasing bandwidth pressure and transmission latencies, remote systems include on-board processing capabilities to process the raw data in-situ and transmit only the smaller, processed data. Additionally, on-board processing empowers remote systems to perform the necessary calculations for making intelligent autonomous operating decisions in real-time, thereby reducing the need for high-latency commands from a distant ground station. However, incorporating on-board processing into an aerospace mission is challenging when considering stringent size, weight, and power (SWaP) constraints. Power is often the most limiting of these constraints because energy is difficult to collect and store, and increasing the processing performance increases the power consumption. Challenges in aerospace also include radiation effects, which cause unexpected and erroneous behaviors in processing systems and may be exacerbated by decreasing feature sizes and an increasing number of processing elements.

Field-programmable gate arrays (FPGAs) are often an attractive option for designers seeking low-power, high-performance devices, but additional considerations are required to mitigate radiation effects. Unlike non-reconfigurable devices, radiation effects can alter the device's functionality instead

PAGE 13

of just the device's data. Therefore, once a designer has defined an aerospace mission's system platform, environment, and applications (e.g., hyperspectral imaging (HSI), real-time landing, obstacle avoidance), the primary design challenge is device and fault-tolerant (FT) strategy selection. The device must perform well with the mission's applications and be capable of operating effectively in the mission's environment. An appropriate FT strategy is also necessary for most missions in order to guarantee correct operation without excessive resource overhead.

A successful design of an on-board processing system meets or exceeds all mission constraints (maximum power usage, maximum fault rate, minimum processing throughput, etc.). Since these mission constraints have different, and often competing, tradeoffs, the set of successful designs contains many Pareto-optimal designs [8]. The designer must choose the best design based on the mission constraints and acceptable tradeoffs. For example, because mission failure may be catastrophic (e.g., loss of life), a designer may increase the system's power consumption in order to lower fault rates. Alternatively, for a sensor-based mission, faults may cause superficial damage to the mission's data (e.g., a few discolored pixels), so sacrificing fault tolerance for increased processing performance might be advantageous. Not only is determining the best design a complex task, but the designer's reliance on familiar devices and FT strategies, together with development-time constraints, often limits the design exploration space and may leave no time to explore new devices and FT strategies. These limitations narrow the design space's scope, possibly resulting in successful yet non-Pareto-optimal designs.

Designers evaluate system designs using evaluation metrics, such as performance, power, dependability, device utilization, mission lifetime, and design cost. Once performance constraints have been met, power, dependability, and lifetime are often the most critical evaluation metrics for on-board processing systems in constrained and harsh environments. Therefore, our work focuses on these three metrics; however, additional metrics could be incorporated. The power metric measures how much power the processing device will

PAGE 14

consume during the mission. The dependability metric quantifies a system's ability to correctly operate within the mission environment, which designers often represent as the mean time to failure (MTTF), mean time between failures (MTBF), or data-loss rate. The lifetime metric estimates the expected duration of time for which the design will remain functional.

To aid designers in addressing the challenges of designing on-board processing systems, we present a novel framework that determines a set of Pareto-optimal system designs in terms of device and FT strategy based on a mission's constraints. Although we designed our framework to consider a wide range of processing devices, this foundational work focuses on our framework's methodology and analysis with respect to FPGA devices. Furthermore, while this work focuses on system-on-chip (SoC)-level design, the scope of our framework is easily expandable to include other board- and platform-design factors.

Our framework considers four system properties: mission, application, device, and FT strategy. The designer specifies the mission and application properties. The mission property defines information about the mission environment and dictates the resources and constraints of the on-board processing system based on design constraints and available platform resources (e.g., sensors, power generation, memory capacity). The application property defines the on-board processing tasks, which are typically sensor-data processing and autonomous-operation decisions (i.e., autonomous processing). Once these system properties have been defined, our framework analyzes these properties with respect to all device and FT strategy combinations stored a priori in our framework's database. From this analysis, our framework produces metric data for power, dependability, and lifetime to determine the Pareto-optimal system designs. Using a case study based on an HSI satellite mission, our framework demonstrates effectiveness by pruning a design space of over 1,000 designs to a reduced trade-off design space of fifteen Pareto-optimal designs, a 98.7% reduction.
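The pruning step described above, reducing a large design space to its Pareto-optimal subset, can be illustrated with a minimal sketch. This is not the dissertation's implementation: the design names, the two metrics (`power_w`, `errors_per_day`), and all numeric values are hypothetical, and both metrics are assumed to be minimized.

```python
def dominates(a, b):
    """Return True if design a is at least as good as b on both metrics
    and strictly better on at least one (both metrics are minimized)."""
    no_worse = (a["power_w"] <= b["power_w"]
                and a["errors_per_day"] <= b["errors_per_day"])
    better = (a["power_w"] < b["power_w"]
              or a["errors_per_day"] < b["errors_per_day"])
    return no_worse and better


def pareto_front(designs):
    """Keep every design that no other design dominates."""
    return [d for d in designs
            if not any(dominates(other, d) for other in designs)]


# Hypothetical device/FT-strategy combinations (illustrative values only).
designs = [
    {"name": "A", "power_w": 2.0, "errors_per_day": 0.10},
    {"name": "B", "power_w": 1.2, "errors_per_day": 0.50},
    {"name": "C", "power_w": 2.5, "errors_per_day": 0.40},  # dominated by A
    {"name": "D", "power_w": 3.0, "errors_per_day": 0.05},
]

front = pareto_front(designs)
print([d["name"] for d in front])
```

Design C is discarded because A uses less power and has a lower error rate; A, B, and D remain because each trades one metric against the other. The pairwise comparison is quadratic in the number of designs, which should be acceptable for a design space on the order of a thousand entries.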

PAGE 15

Another challenge specifically associated with FPGA-based designs is determining the most appropriate device configuration, which dictates the type and quantity of FPGA resources (e.g., flip-flops (FF), look-up tables (LUT), digital-signal processing (DSP) units, block RAMs (BRAM), and input/output (I/O) pins) that the system requires. FPGA devices offer multiple ways to define the same operations using different FPGA resources (e.g., multiply operations with and without using DSP units), so the designer's choice of device configuration may affect the system's performance, power, and/or dependability. Therefore, FPGA-based space-system designers must find the appropriate combination of FPGA device, FT strategy, and device configuration to meet the mission's goals. Given the large design space afforded by these numerous design options, FPGA-based space-system design is a daunting, laborious task without reliable automated design tool assistance.

To address this need, we present a linear programming (LP)-based method that, based on an application's specific computational operations and a device's available resources, quickly determines the optimal operation distribution with respect to performance, power, or dependability and calculates quantitative metrics for design comparison purposes. Our LP method is more accurate than vendor-supplied first-order estimations and is capable of making relative comparisons between different FPGAs and operation distributions. Furthermore, designers can perform our LP method with a minimal description of the application, prior to HDL implementation, and without lengthy synthesis times. The speed of our LP method makes this method an effective early design exploration tool that quickly analyzes thousands of FPGAs to determine the best FPGA devices and operation distributions, which significantly reduces design time. Combined with our framework, the LP method expands the range of applications that our framework can analyze, improves the accuracy of our framework's power and dependability metrics, and enables our framework to consider FPGA device configurations that are targeted toward optimizing either power or dependability. We demonstrate our LP method's

PAGE 16

effectiveness by comparing the calculated metrics from our LP method to the experimental results of an optimized and synthesized FPGA design. Since lengthy synthesis times prohibit the rapid testing of many designs and devices, we focus our analysis on two case studies involving dot-product and distance-calculation kernels on a range of Virtex-5 FPGAs. Results show our LP method selects the optimal distribution of operations and predicts when increasing performance can decrease power efficiency and dependability.

Another design challenge that further complicates FPGA-based space-system design involves choosing a sufficient amount of memory and allocated memory bandwidth (i.e., rate at which data transfers within the system) considering mission-specific data capture rates and downlink bandwidth. An FPGA has three standard memory resources that affect system performance: internal-memory capacity (IMC), internal-memory bandwidth (IMB), and external-memory bandwidth (EMB). IMC represents an FPGA's on-chip memory storage (i.e., BRAMs). IMB is the on-chip transfer rate of the data between the IMC and the FPGA's processing operations. EMB is the external transfer rate of data between the IMC and an off-chip external-memory device through an on-chip external-memory port.

Considering these memory resources is an important and non-trivial task for FPGA-based space-system designers. Insufficient IMB or EMB may bottleneck performance since operations cannot quickly process data if there is deficient IMB to move data quickly through the device or EMB to move data quickly into/out of the device (which is common even in high-performance terrestrial computing [9]). Increasing the number of external-memory ports can increase the EMB, but each port requires FFs, LUTs, and I/O pins. This extra resource usage not only increases power but also decreases dependability since FPGA resources are only vulnerable when in use. Conversely, if these resources were already being used by operations, reducing the operation count to add external-memory ports can negatively impact performance. Alternatively, storing large amounts of data in IMC to avoid high EMB may decrease dependability as well, since the IMC is typically

PAGE 17

less dependable than the external-memory devices, which may be hardened against radiation and/or protect data with error-correcting codes. Verifying that an FPGA-based space system has sufficient IMB and EMB and evaluating the design options between high IMC usage and high EMB add an extra burden on system designers already struggling to meet the mission's goals, further necessitating the use of a comprehensive automated design tool.

To further improve and extend our framework's ability to evaluate and optimize increasingly complex on-board processing systems, we present our framework's memory extension, which enables memory-aware analysis through refinements to our LP method. Memory-aware analysis more accurately predicts a system's power and dependability by modeling the IMC, IMB, and EMB memory resources. Due to the non-trivial and application-dependent nature of the EMB/IMC relationship, the designer must specify this relationship in addition to the application before the framework can begin analysis. The memory extension expands our framework's analysis to a more complete and accurate holistic design view by not neglecting the importance of memory in FPGA-based space-system design. We demonstrate the importance of our framework's memory extension by investigating a case study based on an enhanced version of the HSI satellite mission, and show the method a designer might use to analyze their application and determine their application's EMB/IMC relationship. Results of the framework's memory extension show that memory resources may limit the performance of an FPGA-based space-system design and contribute significantly towards power and dependability results.

Collectively, the research described in this dissertation presents a framework to accelerate and improve the early-design phase of FPGA-based on-board processing systems for space missions, which is enhanced by our LP method, enabling greater accuracy in analysis and the flexibility to add on further extensions to our framework, as demonstrated by our framework's memory extension.
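The bottleneck behavior discussed in this chapter, where insufficient IMB or EMB limits performance regardless of how many operations fit on the device, can be sketched as a simple minimum over three rate limits. This is an illustrative simplification and not the dissertation's LP-based memory extension; the function name, parameter names, and all numbers are assumptions introduced here.

```python
def achievable_ops_per_s(compute_ops_per_s, imb_words_per_s, emb_words_per_s,
                         internal_words_per_op, external_words_per_op):
    """Return (sustainable operation rate, name of the limiting resource).

    Each memory bandwidth is converted into the operation rate it can feed,
    and the slowest of the three limits sets the achievable rate.
    """
    limits = {
        "compute": compute_ops_per_s,
        "IMB": imb_words_per_s / internal_words_per_op,
        "EMB": emb_words_per_s / external_words_per_op,
    }
    bottleneck = min(limits, key=limits.get)
    return limits[bottleneck], bottleneck


# Hypothetical device and application numbers (not taken from the dissertation).
rate, bottleneck = achievable_ops_per_s(
    compute_ops_per_s=1e10,       # operations per second the logic could sustain
    imb_words_per_s=4e10,         # on-chip BRAM bandwidth
    emb_words_per_s=5e8,          # off-chip bandwidth through the memory ports
    internal_words_per_op=2.0,    # words each operation reads from BRAM
    external_words_per_op=0.125,  # off-chip words per operation after on-chip reuse
)
print(rate, bottleneck)
```

With these values the external-memory link is the limit. Lowering `external_words_per_op`, for example by buffering more data on chip, raises the achievable rate at the cost of more IMC, which is one way to picture the application-dependent EMB/IMC relationship that the designer must supply.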

PAGE 18

The remainder of this dissertation is organized as follows. Chapter 2 discusses the general background associated with this research, with chapter-specific background and related research discussed in the first section of the relevant chapters. Chapter 3 introduces our framework's methodology for combining various evaluation metrics to select the Pareto-optimal set of devices and FT strategies for an on-board processing space mission. Chapter 4 details our LP method, which enhances the accuracy of the framework's evaluation metrics for FPGAs and expands analysis to consider FPGA device configurations. Chapter 5 shows how to leverage our LP method to enable our framework's memory extension, which improves the accuracy of our framework's analysis by extending our models to include IMB, IMC, and EMB. Finally, Chapter 6 presents conclusions and outlines directions for possible future research.

PAGE 19

CHAPTER 2
BACKGROUND

In this chapter we discuss the general performance and memory metrics that form the foundation of the research presented in the following chapters. We discuss other chapter-specific background and related research in the first section of the relevant chapters.

Williams et al. [10] define a general methodology for determining the maximum processing capabilities of a given device, referred to as computational density (CD). The CD methodology uses the results of single-instantiated operations to predict the frequency and performance of an application on an FPGA without requiring detailed a priori information about the FPGA supplied by the user. Furthermore, the full scope of the CD methodology includes a wide range of device architectures (e.g., central processing unit (CPU), digital signal processor (DSP), FPGA, and graphics processing unit (GPU)) and considers operation types as well as precision when calculating the devices' CDs.

Our basic framework leverages this CD methodology to quickly calculate an upper bound for the optimal performance of a device running a particular application. Although there are numerous designs for a particular FPGA that may satisfy the demands of a mission's application (e.g., DSP- or logic-centric, short or long pipeline length, speed or area optimized), the CD methodology narrows the framework's analysis to a scaled version of the single design that produces the maximum performance. This analysis of a single design prunes the huge design space associated with SoC-design optimization and enables our framework to focus solely on identifying the optimal device and FT strategy combinations. Additionally, the applicability of CD to a wide range of architectures enables extensions to our framework to widen the scope of analysis beyond comparing only FPGA devices.

To compute an FPGA's CD, Williams et al. considered four operations (combinations of an add or multiply function type with DSP and logic-only variants) and instantiated

PAGE 20

each operation one at a time on the FPGA to determine the resources consumed per operation. A simple analytical method then used this data to project how to optimally use the FPGA's resources to fit the most simultaneous operations on the FPGA. Williams et al. then calculated the FPGA's CD as the maximum number of simultaneous operations that could fit on the FPGA multiplied by the limiting frequency of the slowest operation.

Our work introduces our LP method that generalizes the CD methodology by considering any number of function types and operation variants when optimizing an FPGA for performance. We also discuss an improved method for handling the limiting frequency imposed by the slowest operation, which in some cases yields further performance improvements. Finally, we demonstrate how the flexibility of our LP method allows us to make small modifications to optimize an FPGA design for power or dependability instead of performance.

In addition to performance-based metrics, Richardson et al. [11] established memory-based IMB and EMB device metrics. IMB measures the rate at which data can be transferred from on-chip memories to the operations. IMB is important because IMB may become a bottleneck, limiting the speed at which the data can be processed. For an FPGA, IMB represents the bandwidth between on-chip BRAM and the on-chip processing resources. EMB measures the amount of bandwidth between a processing device and off-chip external memory devices. Richardson et al. measured an FPGA's EMB by instantiating an external-memory port on the FPGA for each external memory device and measuring the external memory port's resource requirements. These results were extrapolated to predict how many external-memory ports could simultaneously fit on the FPGA and how much EMB those external memory ports could provide.
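The CD calculation described earlier in this chapter (the number of simultaneous operations that fit on the device multiplied by the limiting frequency of the slowest operation) can be sketched as follows. This is a simplified illustration and not Williams et al.'s analytical method or the LP method of this work: the greedy packing order is an assumption made here, and the device size, per-operation resource costs, and frequencies are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    cost: dict        # resource name -> units consumed per instance (hypothetical)
    fmax_mhz: float   # clock rate measured for a single instance (hypothetical)


def max_instances(cost, available):
    """Largest number of instances that fits in the remaining resources."""
    return min(available[r] // n for r, n in cost.items() if n > 0)


def computational_density(variants, device):
    """Greedily pack variants in the given order.

    Returns (operations per second, instance count per variant). The rate is
    the total instance count times the slowest clock among the variants used.
    """
    remaining = dict(device)
    counts = {}
    for v in variants:
        k = max_instances(v.cost, remaining)
        counts[v.name] = k
        for r, n in v.cost.items():
            remaining[r] -= k * n
    used = [v for v in variants if counts[v.name] > 0]
    if not used:
        return 0.0, counts
    limiting_mhz = min(v.fmax_mhz for v in used)
    return sum(counts.values()) * limiting_mhz * 1e6, counts


# Hypothetical device and multiply variants (values chosen for illustration).
device = {"LUT": 12000, "FF": 12000, "DSP": 24}
variants = [
    Variant("mult_dsp", {"LUT": 0, "FF": 50, "DSP": 1}, 400.0),
    Variant("mult_logic", {"LUT": 300, "FF": 300, "DSP": 0}, 250.0),
]

cd, counts = computational_density(variants, device)
print(counts, cd)
```

In this example the slower logic-only variant sets the clock for the whole design, so adding those instances raises the operation count while lowering the frequency. That interaction is the limiting-frequency issue mentioned above; a greedy fill such as this one does not search for the best mix, which is what an LP formulation is meant to address.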

PAGE 21

CHAPTER 3
A FRAMEWORK FOR EVALUATING AND OPTIMIZING FPGA-BASED SOCS FOR AEROSPACE COMPUTING

This chapter is organized as follows. Section 3.1 discusses the background and related work that provides the foundation for our framework. Section 3.2 presents an overview of our framework, and Section 3.3 discusses our framework's evaluation metrics. In Section 3.4, we present a hypothetical case study based on an HSI mission to demonstrate our framework's basic methodology and show our framework's full, design-enhancing potential.

3.1 Background and Related Work

Our framework leverages previous work related to each of the four system properties and introduces a novel evaluation methodology that combines these properties, producing evaluation-metric results to identify and compare Pareto-optimal system designs. This section discusses important background and research related to each of the four system properties.

Pease et al. [12] discuss appropriate device selection based on an environment's varying radiation levels. A device database stores radiation data for a set of known devices and allows designers to quickly eliminate inappropriate devices. For the device property, our framework leverages a similar device database to store radiation data, with additional data on the device's processing capabilities and power consumption.

Other works have demonstrated methodologies for predicting the optimal performance of an application design on an FPGA device. The RC Amenability Test (RAT) [13] is an analytical methodology that uses three tests for throughput performance, numerical precision, and resource utilization to determine the viability of an algorithm design on an FPGA prior to the use of a hardware description language. RAT measures throughput performance with both communication time for transferring data on and off the FPGA and computation time for processing the data according to an application design, which relies on a user-supplied frequency estimation for the FPGA. Enzler et al. [14] describe a similar high-level estimation methodology for characterizing the area and performance of

PAGE 22

an application on an FPGA. Using a priori information about the FPGA's architecture, the methodology creates a set of equations to describe area, frequency, throughput, latency, and input/output (I/O) pin count, enabling the user to quickly test the tradeoffs involved in decomposing parts of the design, replicating those parts, or adding registers for pipelining. Meswani et al. [15] show how to model and predict the performance of high-performance computing applications on systems that use GPU or FPGA hardware accelerators. Their model evaluates the application's code to find sections that could be easily accelerated and uses simple benchmarks to predict accelerator speedup.

For Earth-orbiting missions, our framework requires designers to input the mission property's data into CREME96 [16] to predict the average radiation flux experienced by a processing system. Using user-provided radiation data for specific devices, CREME96 also predicts device-upset rates based on the radiation-flux effects. For other environments, simpler models based on environmental-radiation literature can predict the average radiation flux according to the mission property's data.

FT strategies increase software and hardware fault tolerance using redundant calculations and/or data storage, which allows processing systems to operate correctly despite effects caused by upset-inducing radiation. However, this redundancy incurs processing and/or area overheads, which increase as the FT strategy's fault-mitigating capabilities increase. For example, triple-modular redundancy (TMR) [17, 18] is capable of detecting and correcting errors and incurs 200% area overhead. Application-dependent FT strategies can offer fault-mitigating capabilities with lower overheads, such as algorithm-based FT (ABFT) [19], which leverages the linear properties of common matrix operations to produce checksums that detect errors in the final calculated matrices. Device-dependent FT strategies, such as reconfigurable FT (RFT) [20], use an FPGA's partial reconfiguration capabilities and the time-varying nature of orbital radiation to dynamically increase/decrease the fault-mitigating capabilities. FPGAs can also use partial reconfiguration after detecting an upset to repair transient faults in real-time

with partial scrubbing or repair permanent faults by reprogramming only the damaged areas to avoid the damaged resources. Furthermore, reduced-precision redundancy [21] is an application- and device-dependent FT strategy that may enable an FPGA to have a significant reduction in overhead compared to TMR with only a small loss in dependability. Our framework considers a wide range of FT strategies, which allows designers to evaluate FT strategies with respect to the specific application and device and view tradeoffs between the fault-mitigating capability and performance/area overhead.

Understanding how the application property impacts a device's performance is paramount in selecting the Pareto-optimal designs. For example, FPGAs are effective for bit-level and fixed-point operations, but potentially less effective than fixed-logic devices for double-precision, floating-point operations due to these operations' much higher reconfigurable resource utilization. Asanovic et al. [22] address this issue for high-performance computing (HPC) systems by identifying thirteen common kernels that represent the essential operations of the vast majority of all HPC applications. By subsetting HPC applications based on the applications' constituent kernels, system designers can quickly and effectively study a broad range of applications and application behaviors with little loss of accuracy by focusing on understanding only these thirteen kernels. Our framework leverages this subsetting methodology to identify the most common kernels that represent the majority of all on-board processing applications, which allows our framework to analyze a broad range of on-board processing applications without requiring research into each specific application.

Research institutions with a significant history in designing aerospace systems usually have stringent guidelines in place for the part-selection process; however, even these established processes can benefit from our framework's analysis. For example, NASA's state-of-the-art part-selection process for a new mission begins from the reference-board design of a previously flown mission, which is compared to the necessary capabilities and requirements of the new mission to identify any parts that need to be upgraded
or modified. Then, based on the available technology, the designer tries to determine which new parts would best suit the needs of the new mission. Any new electrical parts are rigorously tested and screened as outlined in NASA's EEE-INST-002 document [23]. In this process, our framework would serve as a preliminary analysis tool, enabling the designer to quickly narrow down their device-choice scope to the most promising processing devices during the new part selection step without relying on ad hoc selection methodologies or only choosing familiar devices.

3.2 Framework

Our framework determines the Pareto-optimal system designs based on the four system properties (device, mission, FT strategy, and application), allowing a designer to select the best design based on the designer's desired tradeoffs, regardless of a designer's familiarity with the devices and FT strategies. Although this foundational work focuses on FPGA devices for aerospace environments, our framework can support a wide range of devices (e.g., CPUs, DSPs, FPGAs, GPUs) as well as a diverse set of environments (e.g., outer-space, aerospace, underwater) and is easily extendable to additional devices and environments. Furthermore, while this work focuses on SoC-level design, the scope of our framework is easily expandable to include other board- and platform-design factors (e.g., recent work shows how external memory devices can be included into our framework [24]).

The remainder of this section is organized as follows. Section 3.2.1 presents an overview of our framework, focusing on overall scope, general concepts, and our framework's components; Section 3.2.2 details the four system-property components; and Section 3.2.3 discusses the analysis component.

3.2.1 Overview

Figure 3-1 depicts an overview of our framework, which is composed of five components. The first four components are the system-property components, which include the device set, the mission characteristics, the FT strategy set, and the application kernel set components and correspond respectively to the device, mission, FT strategy, and
application system properties. The fifth component, the analysis component, corresponds to the power and dependability evaluation metrics.

Figure 3-1. Framework overview consisting of the four system-property components (corners) and analysis component (center).

The system-property components consist of both designer-specified data and research data obtained from literature. Since our framework does not have a priori knowledge of the system platform, environment, and constraints, the designer provides the mission characteristics, and our framework pre-defines the device set, FT strategy set, and application kernel set based on literature-research data (Section 3.2.2). The analysis component combines the data of the system-property components and produces evaluation-metric results, which the designer evaluates to select the best design. Each evaluation metric combines the data from the system-property components in a
unique method based on the specific evaluation metric's dependency on the interactions of the system-property components. For example, the power metric evaluates device performance with respect to an application, whereas the dependability metric evaluates device radiation-response data with respect to the mission environment. Alternatively, the dependability metric evaluates the fault-mitigation capabilities of the FT strategies, whereas the power metric evaluates the performance and area overheads of the FT strategies. Finally, evaluation metrics only evaluate valid designs that use device- or application-dependent FT strategies with the corresponding devices and applications.

3.2.2 System-Property Components

The device set contains a priori data from our framework's database on a broad range of device architectures as well as any available radiation-hardened versions of these devices. The device set's data records three characteristics for each device: power measurements, processing capability, and radiation response. Power measurements include the maximum dynamic power consumption of the device for a given application, the thermal resistance between the inside of the device and the surrounding environment, and information on how the device's temperature affects the device's static power consumption. Our framework represents processing capability using the CD methodology, which depends on the type and precision of the application's operations. The radiation response involves determining the areas of the device that are sensitive to a single radiation particle (proton or heavy ion for space missions) of a given energy. Literature-research data provides the radiation-response data, because this data is sufficient for our framework's analysis, and obtaining this data via experimental analysis is difficult and time-consuming.

The mission characteristics define the mission environment, available resources, and computational constraints. Designers must specify this data before our framework can begin mission analysis. The mission environment includes data on the mission's specific path (e.g., an orbit in space or a route along the ocean floor), the mission's duration (e.g., months or years), the mission start date for considering time-dependent environments,
and any other harsh conditions that must be considered (e.g., extreme temperatures or excessive vibration). The available resources include the SWaP restrictions and may also include a monetary budget for designing and building the system. Our framework uses the constraints defined in the resource data to test whether each design is successful. The computational constraints dictate the acceptable fault rates, required processing throughput based on the incoming sensor data's throughput, and the maximum allowable memory usage based on the on-board memory constraints.

The FT strategy set is stored a priori in our framework's database and contains literature-research data on the most effective and/or common FT strategies, which includes a wide variety of FT detection and/or correction strategies, some of which are device- or application-dependent. The FT strategy set records three characteristics for each FT strategy: effectiveness, overhead, and dependencies. The effectiveness is the FT strategy's fault-mitigation capability (e.g., detection only, or detection and correction). For example, if a non-fault-tolerant (NFT) system has a 1% chance of experiencing a fault during a certain time interval, adding a TMR FT strategy to the system will correct 97% of these faults over the same time interval. The overhead refers to the extra processing that all FT strategies require due to redundant calculations (e.g., 200% overhead for TMR). Finally, the dependencies define which devices or applications correspond to a given FT strategy, ensuring that our framework only evaluates valid designs. Adding new FT strategies to our framework only requires the specification of the new methods for calculating the FT strategies' effectiveness and overhead based on the application as well as any application or device dependencies.

The application kernel set is also stored a priori in our framework's database and contains the subset of common kernels (e.g., matrix multiplication and fast Fourier transform) representing the essential operations of the vast majority of on-board processing applications. Identifying the common kernels is a key challenge and important area of research for our framework, which involves analyzing a comprehensive survey of
aerospace applications with the goal of identifying the smallest subset of common kernels that encompasses the largest amount of the applications' constituent kernels. If future analysis determines that emerging aerospace applications are not necessarily covered under the current subset of kernels, the subset can easily be expanded to include these new kernels.

In addition to mapping applications to one or more of these kernels, our framework categorizes applications as either sensor processing or autonomous processing. Sensor processing is the processing of the raw data collected from on-board sensors with the purpose of compressing and/or extracting important information before transmission. Autonomous processing is the ability of the on-board processing system to make intelligent decisions and take effective action based solely on in-situ analysis of the environment, such as circumnavigating obstacles and locating landing zones. Sensor processing typically focuses on meeting transmission throughput constraints, while autonomous processing focuses on reliably meeting real-time deadlines.

3.2.3 Analysis Component

Figure 3-2 shows the analysis component, which uses data from all four of the system-property components to create and output the final Pareto-optimal design set. The designer is responsible for supplying the mission characteristic data to our framework as well as identifying applications and other relevant application parameters (e.g., size of input matrices or arrays), which our framework compares against the application kernel set to understand the application's operations. All data from the device set and FT strategy set exists a priori in our framework's database. The analysis component considers every unique and valid device and FT strategy combination as an analyzable design. The analysis component runs each design through all M evaluation metrics, producing M evaluation-metric results for each design.

Although the bulk of the analysis occurs automatically within our framework, external tools (e.g., CREME96 and SPENVIS for radiation modeling) may need to be invoked by
the designer if automatic controls for these tools have not been integrated into our framework.

Figure 3-2. Flowchart for the analysis component showing inputs from the four system-property components and an output consisting of the final Pareto-optimal design set.

Although this foundational work focuses on the power, dependability, and lifetime evaluation metrics, these metrics do not represent the entire scope of our framework's capabilities. Any number of evaluation metrics can be added to and used in our framework's analysis if researchers/designers find these metrics relevant to the mission. Furthermore, methods we present for analyzing each evaluation metric are not meant to be final, so future researchers/designers can improve an evaluation metric by using more accurate data sources or replacing the metric's methods with more advanced analysis methods.

A design is successful if the design's evaluation-metric results meet or out-perform the mission constraints. After removing all unsuccessful designs that fail the constraints, a Pareto-optimal search selects a small subset of the designs that are Pareto optimal, meaning that these designs are the most preferred designs in some aspect based solely on the design's evaluation-metric results. Finally, the analysis component outputs these
designs and the associated evaluation-metric results as the final Pareto-optimal design set, which the designer can use to identify optimal designs and available tradeoffs.

To better understand how the Pareto-optimal search selects Pareto-optimal designs, let the set of all N designs be represented as $\{D_1, D_2, \ldots, D_N\}$, where any particular design $D_x$ in this set is represented by the set of that design's M evaluation-metric results $\{D_{x,1}, D_{x,2}, \ldots, D_{x,M}\}$. Then $D_x$ is a Pareto-optimal design if and only if there is no design that is preferred or equal to $D_x$ for every evaluation metric and is preferred to $D_x$ in at least one evaluation metric (this is known as strongly Pareto optimal). More formally, $D_x$ is a Pareto-optimal design iff:

$$\nexists\, i \in \{1 \ldots N\} : \big(\forall j \in \{1 \ldots M\} : D_{i,j} \succeq D_{x,j}\big) \wedge \big(\exists j \in \{1 \ldots M\} : D_{i,j} \succ D_{x,j}\big) \tag{3-1}$$

For certain evaluation metrics (e.g., power), a minimal value is preferred, so rather than use a standard inequality sign, the Pareto-optimal definition uses the symbol $\succ$ meaning "is preferred to".

3.3 Evaluation Metrics

Our framework focuses on power, dependability, and lifetime, which are critical evaluation metrics for aerospace missions. The remainder of this section is organized as follows. Section 3.3.1, Section 3.3.2, and Section 3.3.3 present the design's device power consumption, dependability, and lifetime calculations, respectively.

3.3.1 Power

Figure 3-3 depicts the power-metric calculation. Our framework calculates the system's required processing in terms of type and rate of operations performed based on the designer-specified application processing and sensor input-data rate. For example, consider a simple on-board image-processing system that uses a camera to capture Earth images from space with a sensor data rate of three images per second, four megapixels per image, and three 8-bit color channels (i.e., red, green, blue) per pixel. The system sums each pixel's three color values to determine if the average brightness of the image exceeds
a certain threshold. Since adding multiple 8-bit values produces a result larger than 8 bits, the system processing can be summarized as three 16-bit addition operations per pixel, which is a required processing of 36 million 16-bit addition operations per second.

Figure 3-3. Flowchart for the power-metric calculation.

Our framework uses the required processing result and device CD to calculate the device utilization ($U_{device}$), which is the amount of device resources a system uses relative to the total amount of device resources available. A device utilization of 100% means that the system is using the device at the device's maximum potential. Our framework calculates device utilization as the ratio of the required processing to the device's CD. This CD value must correspond to the type and precision of operations used in the required processing. For the image-processing example and a simple, representative, sample device with a 16-bit integer addition CD of 100 million operations per second, the device utilization is 36%.
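The required-processing and device-utilization arithmetic for the image-processing example can be checked with a short Python sketch. The function and argument names are illustrative assumptions and are not part of the framework; only the numeric inputs come from the example above.

```python
def required_ops_per_second(images_per_second, pixels_per_image, ops_per_pixel):
    """Rate of operations the application must sustain (operations/s)."""
    return images_per_second * pixels_per_image * ops_per_pixel


def device_utilization(required_ops, device_cd):
    """Ratio of required processing to the device's CD for the same operation type."""
    return required_ops / device_cd


# Example values from the text: 3 images/s, 4 megapixels per image,
# and three 16-bit additions per pixel.
required = required_ops_per_second(3, 4_000_000, 3)

# Sample device with a 16-bit integer addition CD of 100 million operations/s.
u_device = device_utilization(required, 100_000_000)

print(f"required processing: {required / 1e6:.0f} million ops/s")
print(f"device utilization:  {u_device:.0%}")
```

The CD passed to `device_utilization` has to describe the same operation type and precision as the required processing (here, 16-bit integer addition); mixing operation types would make the ratio meaningless.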

Updating the device utilization to include the FT strategy's area overhead $V_{FT}$ produces the device-FT utilization $U_{dFT}$:

$$U_{dFT} = U_{device} \times (1 + V_{FT}) \tag{3-2}$$

For the image-processing example and a TMR FT strategy, TMR introduces a 200% overhead, which results in a device-FT utilization of 108%. Since the exact details of the device configuration dictate TMR's overhead, which could be more or less [25] than 200%, and considering that our framework does not evaluate post-synthesis results for specific designs, our framework assumes an average 200% TMR overhead. Since the device-FT utilization is greater than 100%, the system either requires more than one device or a different device with greater resources.

The final result of the power-metric calculation is the device's total power consumption $P_{total}$, which our framework calculates as the sum of the dynamic power consumption $P_{dynamic}$ and static power consumption $P_{static}$. Our framework calculates $P_{dynamic}$ as the product of the device's maximum dynamic power consumption $P_{max\text{-}dyn}$ (determined as the device's dynamic power consumption during 100% device utilization) and the device-FT utilization value. Therefore, total power consumption is calculated as:

$$P_{total} = P_{static} + P_{dynamic} = P_{static} + (P_{max\text{-}dyn} \times U_{dFT}) \tag{3-3}$$

To calculate the static power, which is temperature-dependent, our framework requires data on the ambient temperature of the platform $T_{ambient}$, the thermal resistance from the device to the platform $R_{thermal}$, and the device's static-power function $f_{sp}(T_{device})$, which records how the device's static power varies with respect to the device's temperature. Our framework stores the device's static-power function and nominal thermal resistance a priori in our framework's device database, while the designer supplies the ambient temperature and any special adjustments to the thermal resistance in the mission characteristics data. After calculating dynamic power consumption, our framework
calculates the static power by first determining the device temperature needed to ensure the total power used by the device is equal to the power dissipated as heat. This state of equilibrium is represented as:

$$P_{total} = \frac{T_{device} - T_{ambient}}{R_{thermal}} \tag{3-4}$$

Combining equations (3-3) and (3-4), substituting $f_{sp}(T_{device})$ for $P_{static}$, and setting the equation to equal zero produces:

$$R_{thermal}\big(f_{sp}(T_{device}) + P_{dynamic}\big) + T_{ambient} - T_{device} = 0 \tag{3-5}$$

Our framework finds the $T_{device}$ that solves (3-5) using the Newton-Raphson method, finds static power consumption by evaluating the device's static-power function at $T_{device}$, and finally computes the total power consumption according to (3-3).

If the device-FT utilization is greater than 100%, the number of required devices n is:

$$n = \lceil U_{dFT} \rceil \tag{3-6}$$

Assuming that the total computation is distributed evenly across the n devices, our framework calculates the utilization for a single device as $U_{dFT}/n$, calculates the total power consumption for a single device as described above, and multiplies the single-device total power consumption by n to produce the total power consumption of all n devices.

To conclude the image-processing example with a TMR FT strategy, we assume an ambient temperature of 25 °C and a sample device with a maximum dynamic power consumption of 10 W, a thermal resistance of 5 °C/W, and a static power consumption that varies linearly from 1 W at 0 °C to 3 W at 100 °C (an oversimplification of a standard static-power function). Since the device-FT utilization is 108%, n is 2, and the dynamic power consumption of a single device is 5.4 W. Using our example values, (3-5) becomes $5(\{1 + T_{device}/50\} + 5.4) + 25 - T_{device} = 0$, which results in a device temperature of 63.3 °C and a static power of 2.27 W. Finally, for our two example devices
running the image-processing example application with a TMR FT strategy, the total power consumption is 15.34 W.

We note that FPGA device utilization does not necessarily imply fabric utilization, which is the percentage of resources (e.g., lookup tables, flip-flops, DSP units) that the FPGA is actively using for processing. To illustrate this difference, we consider a device consisting of 125 computational units and two configuration designs: design A uses 100 computational units (80% fabric utilization) and runs at 100 MHz, and design B uses 50 computational units (40% fabric utilization) and runs at 200 MHz. The device's maximum CD is 20 billion operations per second (GOPS) at 160 MHz, which can be achieved by running any arbitrary design that uses all 125 computational units. Using this same device, both design A and B perform 10 GOPS and therefore have a device utilization of 50%, even though both designs differ considerably in the designs' fabric utilizations. Furthermore, since design B's clock rate is double that of design A and design B's fabric utilization is half of design A's, design B's power consumption should realistically be similar to design A's because the doubled clock should negate the reduced power consumption from halving the fabric utilization. Ultimately, device utilization allows our framework to abstract away the details of fabric utilization and clock frequency because device utilization is the only relevant factor in relating device performance to power consumption.

In this chapter, we have elected to discuss this simple power-metric calculation, which is an acceptably accurate estimate for many situations, including our case-study mission, and which clearly and concisely demonstrates our framework's methodology. However, we have also developed a significantly more complex and advanced calculation [26] for more accurately calculating the dynamic power consumption of a device for a given application. Instead of relying on linear interpolation using the device CD, this advanced calculation leverages linear programming techniques similar to those used in the CD methodology to find the optimal dynamic power consumption and consider a wide range of processing
components (e.g., hard-core/soft-core processors, hard floating-point units) in addition to traditional FPGA logic. Although we could easily update our framework's simple power-metric calculation with this more accurate and advanced calculation, this would unnecessarily add to the complexity of demonstrating our framework's methodology.

3.3.2 Dependability

Figure 3-4 depicts the dependability-metric calculation. Environmental-radiation data for heavy ions describes the particle flux (i.e., particles per square meter per second) for varying linear energy transfer (LET) values (or energy levels for protons). LET measures the amount of energy deposited by a particle as the particle passes through each unit length of a material (silicon in this case) in units of MeV·cm²/mg. Figure 3-5 depicts example environmental-radiation data. In this example, a square meter of silicon will experience a 10 MeV·cm²/mg particle every second, a 40 MeV·cm²/mg particle every 30 years, and less than one 100 MeV·cm²/mg particle every 3 millennia.

Figure 3-4. Flowchart for the dependability-metric calculation.

The device radiation-response data describes the cross section of the device areas that are sensitive to single-event upsets (i.e., bit flips) when hit with a particle of a certain LET or energy level. Although temperature may also affect a device's cross section, this effect is beyond the scope of this work since there is insufficient data on this effect for our studied devices. Literature-research data typically presents cross-section values in units of
cm²/device or cm²/bit. If radiation data is in terms of per bit rather than per device, our framework calculates the total sensitivity of the device as the product of the bit-sensitivity and the number of sensitive bits. Our framework estimates the number of sensitive bits to be equal to the size of the device's bitstream, meaning that our framework accounts for FPGA components (e.g., lookup tables, flip-flops, DSP units, block RAMs) that the bitstream configures or initializes in the sensitivity estimate. Although there are other FPGA-fabric resources that are non-configurable (e.g., internal DSP pipeline stage registers), we do not have access to these resources and cannot assess these resources' impacts on the FPGA's dependability without vendor-provided data.

Figure 3-5. Example environmental-radiation data [3].

Our framework determines the device upset rate based on the rate at which various particles hit the device and the effects of the hits, which are part of the environmental and device radiation-response data. The device upset rate measures the rate at which upsets occur in the whole device, including resources in unused device regions. If upsets occur in these regions, the upsets have no effect on the overall system because the design ignores any output from the unused resources. Therefore, the effective device upset rate is the product of the device upset rate and the device utilization (Section 3.3.1), which measures the relative amount of device resources used. We note that even at 100% device utilization, many device resources will remain unused (e.g., unused routing,
unroutable/congested resources), resulting in only about 10% of the bits being vulnerable [27]. Although it is possible to approximate the device upset rate more precisely and improve our framework's analysis through vendor tools or fault injection [28, 29], analysis based purely on device-utilization scaling provides a reasonable worst-case estimate of the device-upset rate and is assumed in our framework.

With the effective device upset rate and the applied FT strategy, our framework calculates the final MTBF result, which quantifies the average time a device can operate without experiencing a failure. Our framework calculates MTBF differently for different FT strategies, which may include variables such as non-FT effective device upset rate and input data size. For example, a TMR system's reliability is the probability that there is no system upset for some period of time, and is calculated as:

$$R_{TMR} = 3(R_{Orig})^2 - 2(R_{Orig})^3 \tag{3-7}$$

If a non-TMR system has a reliability of 99.0% after one day, then TMR raises the reliability to 99.97%, protecting against 97% of the upsets as compared to the non-TMR system. Conversely, if the non-TMR system has a reliability of 80.0%, TMR raises the reliability to 89.6%, protecting against less than half of the upsets. For other FT strategies, it may not be possible to realistically calculate the FT strategy's fault-mitigating capabilities, requiring either fault-injection testing or literature-research data. After calculating the final upset rate for the system, our framework calculates the final MTBF result by inverting the upset rate.

3.3.3 Lifetime

Figure 3-6 depicts the lifetime-metric calculation. Our framework requires literature-research data for the device's total ionizing dose (TID) rating and the limiting platform TID. The device's TID rating measures the amount of ionizing radiation energy that the device can absorb before becoming nonfunctional. The primary sources of TID radiation are protons, electrons, and bremsstrahlung (high-energy photons released by electron/proton
interactions). Typical device TID ratings can range from a few krad for highly sensitive devices to over 1 Mrad for hardened devices. Recent trends, as seen in Figure 3-7, demonstrate that device TID ratings may continue to improve as device feature sizes decrease, although this trend cannot be guaranteed as fabrication processes continue to change.

Figure 3-6. Flowchart for the lifetime-metric calculation.

Figure 3-7. Device TID ratings for various Virtex devices show an improving TID trend [4].

Essential platform components are any physical platform components (not including the processing device) that must be functional for a mission to remain operational. The limiting platform TID is the TID rating of the essential component with the lowest TID rating. If the platform contains a group of redundant, spare/backup essential components for increased reliability, our framework considers only the group's essential component
with the highest TID rating. Typical components with high sensitivity to TID include bipolar transistors, power MOSFETs (increased oxide thickness results in creation of more electron-hole pairs [30]), and Flash memories (charge pumps are sensitive to TID [31]).

Displacement damage (DD), which occurs when non-ionizing radiation collides with and displaces atoms, leading to component defects, may also affect essential platform components. Our framework treats DD similarly to TID because both accumulate slowly over the component's lifetime, eventually rendering the essential component inoperable. Designers do not usually consider DD for processing devices because TID overshadows the effects of DD [31]. However, other components do show particular sensitivity to DD, including bipolar transistors, photo-detecting charge-coupled devices (CCDs), solar cells, light-emitting diodes, and optocouplers [32].

Our framework requires either literature-research data or modeling results to obtain the environment-TID level, which measures the amount of TID a device or component may experience per unit time due to the combination of environmental radiation and platform shielding. Figure 3-8 shows annual TID levels for equatorial near-Earth orbits of altitudes under 100,000 km assuming moderate shielding. Near-Earth TID levels vary from less than 1 krad/year for orbits closer than 1,000 km up to 400 krad/year for orbits with an altitude around 17,000 km. Note that TID levels for the International Space Station (ISS) at 550 km and the Global Positioning System (GPS) orbit at 20,200 km are significantly different (0.28 and 110 krad/year, respectively) than Figure 3-8 suggests due to the ISS's and GPS's non-zero inclinations of 51.6° and 55°, respectively.

The overall TID rating is the minimum of the device TID rating and the limiting platform TID. Our framework calculates the final operational lifetime result by dividing the overall TID rating by the environment-TID level. The operational lifetime measures the length of time the mission is expected to operate (under normal environmental conditions) before permanent failure of one or more essential components due to excessive radiation exposure. Operational lifetime can vary widely depending on the
device/component and environment. For example, radiation-hardened devices rated for over 1 Mrad can be expected to operate for several years or more with standard shielding in equatorial orbits around 20,000 km altitudes. Furthermore, any Virtex device can be expected to operate for at least several decades before suffering any negative TID-related effects in a typical low Earth orbit (LEO). Although non-TID effects will likely disable the system before the expected TID-based operational lifetime expires, it may still be important to consider the operational lifetime due to solar flares and other sources of ionizing radiation, which can dramatically increase space radiation levels [33]. Therefore, the lifetime metric is still useful for LEO-based missions, because during these events, an excessive operational lifetime could lead to months of useful operations as opposed to just days.

Figure 3-8. Annual TID in circular equatorial orbits computed using SHIELDOSE, AE8MAX, AP8MAX models with 4 mm spherical Al shielding [5].

Currently, our framework's lifetime metric does not consider destructive single-event effects (SEEs), such as single-event latchup (SEL), single-event burnout (SEB), or single-event gate-rupture (SEGR). Just a single occurrence of one of these destructive events can permanently disable an entire device or system, thereby dramatically affecting
the expected lifetime of the mission. If radiation data on these effects does exist, the data typically only reports an onset LET, which is the minimum LET required to cause destructive SEEs. Although it is possible to use this data to give a rough estimate of the expected time until a destructive SEE occurs, designers typically opt to use devices that are immune to destructive SEEs rather than risk permanent device damage because, unlike a TID-based lifetime prediction where catastrophic effects do not occur until closer to the end of the predicted lifetime, destructive SEEs can disable a device at any time regardless of the destructive SEE's frequency. Therefore, rather than integrating destructive SEEs into the lifetime metric, our framework only considers designs that are immune to destructive SEEs based on the device properties and mission environment.

3.4 Example Case Study

This section introduces a currently deployed HSI mission, which serves as a case study for testing and examining our framework's analysis component. Section 3.4.1 introduces HSI data collection and materials analysis as well as our case-study mission. Section 3.4.2 details our framework's power-metric, dependability-metric, and lifetime-metric calculation for a Virtex-4 with ABFT for the case-study mission. Section 3.4.3 describes how we set up our case study, and Section 3.4.4 presents and analyzes the results of our case study.

3.4.1 Mission Details

Our case study's application involves an HSI analysis algorithm, which attempts to identify certain materials within a scene by comparing known material spectral signatures with observed characteristic spectra. An HSI sensor captures scene data in the form of a 3-dimensional image cube, where two spatial dimensions designate an image pixel, and the spectral dimension designates a specific spectral band for the pixels. As shown in Figure 3-9, a pixel's characteristic spectrum is the group of data from each spectral band that corresponds to the given pixel. A priori measurements produce spectral signatures for any materials of interest, which define the material's reflectance values for the spectral bands
used by the HSI sensor. By comparing each pixel's characteristic spectrum to the set of material spectral signatures, HSI analysis can identify any material of interest and the material's locations in the scene, producing an output image similar to Figure 3-10. This process is analogous to humans subconsciously analyzing a scene using an object's color to determine the object's material composition (e.g., brown on an apple indicates rotting). The HSI sensor's greater spectral detail enables HSI analysis to more precisely identify materials (e.g., distinguishing between different types of green vegetation).

Figure 3-9. HSI image cube and the characteristic spectrum of a single pixel [6].

Remote HSI imaging systems typically transmit collected image cubes to a ground station where high-performance processing systems perform HSI analysis. However, advances in space-borne electronics and improvements in fault-mitigating technology enable on-board HSI analysis, which may provide several advantages, such as enabling new HSI systems [34] to provide real-time critical information on natural disasters (e.g., volcanoes, wildfires, and drought). HSI analysis also reduces image cubes to approximately 1% of the cube's original size, affording more efficient data storage and transmission.

Assessing the feasibility of an HSI on-board processing system requires estimation of the processing required for the HSI analysis on the streaming sensor data. However, because around 97% of the required processing involves a single matrix-multiply operation [35], these estimations are simplified. The matrix-multiply operation requires calculating
the autocorrelation sample matrix $R_{L \times L} = (A_{N \times L})^T (A_{N \times L})$, where N is the number of pixels and L is the number of spectral bands. Matrix $A_{N \times L}$ represents the sensor's image cube because spectral data for each pixel corresponds to a certain row of $A_{N \times L}$. Only half of the values of the output matrix need to be calculated because the output matrix must be symmetric. The number of multiply-accumulate (MAC) operations required to calculate $R_{L \times L}$ for a single image cube is:

$$MAC_{HSI} = \frac{1}{2} N L^2 \tag{3-8}$$

Figure 3-10. HSI analysis identifying various kinds of vegetation [7].

Data preprocessing prior to HSI analysis increases the system's processing requirements. First, the system preprocesses the HSI sensor's raw data to correct common image sensor defects. Specifically, each value in the image cube must be offset to account for readout noise and dark current and then scaled to adjust for flat-field effects. Since the operations per value are roughly equivalent to a single MAC operation, and there are NL values for each image cube, raw data preprocessing requires on the order of L times less computation than
HSI analysis. Since L > 100 for most HSI systems, the raw data preprocessing resource demands are negligible.

Our case study's HSI sensor is the Hyperion [36, 37] on the Earth Observing-1 (EO-1) satellite, which orbits the Earth at approximately 7.5 km/s in LEO at a 680 km altitude, capturing single lines of pixels at a time. These lines are perpendicular to the sensor's path, and the combination of many adjacent lines forms an image cube. The Hyperion captures an image every 2.95 seconds and produces an image cube 256 pixels wide, 660 lines long, and 220 12-bit spectral bands deep, requiring a total of 1.386 billion 32-bit integer MAC operations per second (OPS). Although temperatures in LEO can vary widely depending on whether a satellite is inside or outside the Earth's shadow, we assume the EO-1 is equipped with a passive thermal control system (e.g., insulation, radiators, thermal fillers) that regulates the ambient temperature to a standard 25 °C.

3.4.2 Calculation of Framework Evaluation Metrics

In order to clearly define our framework's methodologies and contributions, this subsection details the calculation of the power, dependability, and lifetime evaluation metrics for our case-study mission using a Virtex-4 LX40-FF668-10 device with the ABFT strategy.

The Virtex-4 LX40-FF668-10 device is a low-midrange device in the 90 nm Virtex-4 family and features 12.3 million configurable bits. The device has a CD of 9.37 billion 32-bit integer MAC OPS with a maximum dynamic power consumption of 4.59 W. Based on the Xilinx Power Estimator Tool, the device has a thermal resistance of 6.6 °C/W in the airless vacuum of space and has a non-linear static-power function measuring 0.256 W at 25 °C, 0.323 W at 50 °C, and 0.422 W at 75 °C. Given the EO-1 Hyperion mission's required 1.386 billion 32-bit integer MAC OPS (Section 3.4.1), the device utilization for the Virtex-4 LX40-FF668-10 is 14.8%. Pessimistically assuming a 10% overhead [38] for the ABFT strategy results in a device-FT utilization of 16.3% and a dynamic power consumption of 0.748 W. With

an ambient temperature of 25 °C, the device's temperature reaches 31.9 °C, resulting in a static power consumption of 0.271 W and a total power consumption of 1.02 W.

The EO-1 Hyperion mission's primary radiation concerns are heavy ions and trapped protons. Most trapped protons originate from the Sun's solar winds and are trapped by the Earth's magnetosphere, whereas heavy ions are highly charged particles originating from outside of the solar system. Increased solar activity reduces both radiation hazards by causing atmospheric expansion to remove low-orbiting trapped protons and stronger solar winds to repel heavy ions entering the solar system. CREME96 calculates the effects of these particles on processing devices by reporting the expected upset rate for a device in a given orbit. Engel et al. [1] give a much more detailed description of a similar example with CREME96. We use the NORAD two-line element (TLE) [39] for EO-1 to supply the orbit parameters, and the solar-minimum model to ensure the dependability metric is accurate for the worst case. From these parameters, CREME96 creates a model of the external space ionizing-radiation environment, which models the proton and heavy-ion flux of various energies around the EO-1. Assuming a typical shielding of 100 mils of aluminum, CREME96 creates a transferred radiation model for the radiation environment inside the EO-1. From the internal radiation model, CREME96 can estimate the device upset rate using the device radiation-response data. Table 3-1 shows the heavy-ion and trapped-proton Weibull parameters for CREME96 that define the device's radiation response. The heavy-ion induced upset rate is 0.538 upsets per day and the trapped-proton induced upset rate is 1.12 upsets per day, for a total device upset rate of 1.66 upsets per day. A device utilization of 14.8% results in an effective device upset rate of 0.246 upsets per day.

Table 3-1. Virtex-4 CREME96 Weibull Parameters [1, 2].

Parameter     Heavy Ion            Trapped Proton
Onset         0.87 MeV·cm^2/mg     20 MeV
Width         30                   4
Power         1                    0.5
Limiting XS   1.73 µm^2/bit        0.0156 × 10^-12 cm^2/bit
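The power and upset-rate arithmetic of this example can be reproduced with a short sketch. The text gives only three points of the static-power function, so the quadratic interpolation through those points is an assumption made here for illustration, as are the function names and the fixed-point iteration between temperature and static power; the framework's actual static-power model may differ.

```python
# Sketch of the Section 3.4.2 power and upset-rate arithmetic for the
# Virtex-4 LX40-FF668-10 example. Interpolation scheme and names are
# illustrative assumptions, not the framework's implementation.

def static_power_w(temp_c):
    """Static power from a quadratic through the three reported points
    (0.256 W at 25 C, 0.323 W at 50 C, 0.422 W at 75 C). ASSUMED model."""
    p0, p1, p2 = 0.256, 0.323, 0.422
    u = (temp_c - 25.0) / 25.0          # position in units of 25 C steps
    first_diff = p1 - p0
    second_diff = p2 - 2.0 * p1 + p0
    return p0 + u * first_diff + u * (u - 1.0) / 2.0 * second_diff

def total_power_w(dynamic_w, ambient_c=25.0, theta_c_per_w=6.6, iterations=20):
    """Iterate device temperature and static power until they agree."""
    temp_c = ambient_c
    total = dynamic_w
    for _ in range(iterations):
        total = dynamic_w + static_power_w(temp_c)
        temp_c = ambient_c + theta_c_per_w * total
    return total, temp_c

utilization = 1.386 / 9.37              # required OPS / device CD
ft_utilization = utilization * 1.10     # 10% ABFT overhead
dynamic_w = 4.59 * ft_utilization       # scaled from maximum dynamic power
total_w, device_temp_c = total_power_w(dynamic_w)
upset_rate_per_day = (0.538 + 1.12) * utilization

print(f"utilization {utilization:.1%}, total power {total_w:.2f} W")
```

Because the sketch carries unrounded intermediate values and uses an assumed static-power fit, its device temperature and upset rate agree with the reported figures only approximately.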


Due to the ABFT strategy's 10% overhead, the upset rate increases to 0.271 upsets per day. We also assume a pessimistic 90% coverage [38] for the ABFT detection. If the system detects an upset, the system restarts processing of the current image cube, resulting in no overall adverse effects for the system assuming that there are no impending, hard, real-time deadlines. With 90% coverage, only the undetected 10% of upsets cause failures, so the effective device upset rate drops to 0.0271 upsets per day, which is equivalent to an MTBF of 36.9 days.

To determine the environmental-TID levels, we use the Space Environment Information System (SPENVIS) [40], which was developed by the European Space Agency to model space environment effects. Although CREME96 also models TID effects, this model is insufficient for our case study because CREME96 does not model electron effects, which can account for a significant proportion of the environmental TID. Figure 3-11 depicts the calculated environmental-TID levels for various levels of aluminum shielding in the EO-1 orbit during a solar maximum. For a thin hypothetical aluminum shielding of 2.54 mm (equal to 100 mils), SPENVIS calculates worst and best cases of 2.203 krad/year and 1.436 krad/year during a solar maximum and minimum, respectively. As shown in Figure 3-7, the Virtex-4 has a TID rating of 300 krad. Assuming that no other essential platform components have a lower TID rating, the EO-1 case study mission could operate for 136 years without experiencing failure due to TID.

3.4.3 Experimental Setup

For the EO-1 Hyperion mission, our framework computes the power and dependability evaluation metrics for six commercial FPGA device families and three FT strategies. We evaluate three standard Virtex families (Virtex-4, Virtex-5, and Virtex-6), two low-power Spartan families (Spartan-3 and Spartan-6), and the radiation-hardened Virtex-5QV device. Our automated data-collection tool leverages the Tcl scripting abilities of the Xilinx ISE Design Suite to automatically implement and analyze a wide variety of operations on hundreds of different FPGA devices, enabling the calculation of CD after the designer has specified an application and mission. Our tool is currently set up to interface with


Figure 3-11. SPENVIS model of TID contributions from electrons, protons, and bremsstrahlung with varying levels of aluminum shielding for the EO-1 orbit during solar maximum using the SHIELDOSE-2 model.

Xilinx tools, which is sufficient for demonstrating our framework's abilities; however, the tool could be modified to interface with tools from other vendors that support Tcl scripting (e.g., automating Quartus to analyze Altera devices). Although the additional studied devices greatly increase the design space, our identification of Pareto-optimal designs scales well with the design space size, producing a greater design-space reduction as a design space increases.

Our framework's current database exists as an SQLite database, which a Ruby on Rails-based web server accesses through the Active Record library. When a designer accesses our framework's web tool to apply our framework's analysis to a set of devices, the web server collects all relevant information about these devices from our framework's SQLite database and sends this data client-side along with JavaScript code to the designer's computer. The designer's browser executes the JavaScript code, which processes the database data to calculate the CD of all specified devices, calculate the evaluation-metric results for all possible designs, and determine the final Pareto-optimal design set. Currently, designers must operate the CREME96 and SPENVIS web tools


themselves, entering in parameters for the mission's orbit and any devices of interest based on data from literature research. Future development of our framework's web tool will enable designers to store these input parameters in the framework's database for data consolidation and may eventually enable automatic result retrieval from the CREME96 and SPENVIS web tools. Currently, designers must determine the number and type of operations required by the application manually (demonstrated in Section 3.4.1), and supply this information using our framework's web tool. Ideally, designers would be able to define this application information by only specifying the general parameters of their application (e.g., image cube size, data capture rate). However, this convenience to designers can only be enabled through a full a priori characterization of the most common aerospace kernels, which is a significant area of research and is beyond the scope of this work.

The FT strategies include no fault tolerance (NFT), ABFT, and TMR, each of which performs blind scrubbing (with negligible overhead) after each image cube to remove any remaining errors in the configuration memory, ensuring that upsets during one image cube iteration do not affect the next image cube. Although checkpoint-recovery and error-correction codes are also common FT strategies, these are best-suited for hardcore or softcore processors and the associated caches, respectively, so these strategies are not appropriate for demonstrating our framework with our current case study devices. We assume TMR to have a 200% overhead and include a negligibly small [41], radiation-hardened, off-chip voter. We also assume that common-mode failures caused by single-event functional interrupts are not a problem for TMR because these failures are predicted to be extremely rare, with expected rates of only one event every 36 to 500 years for a commercial Virtex-4 device in various LEO orbits [42]. The design constraints for the EO-1 Hyperion mission are a power consumption less than three Watts and an MTBF greater than ten days.
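As a minimal sketch, the ABFT dependability arithmetic of Section 3.4.2 and the mission constraints stated above can be combined as follows. The function and parameter names are hypothetical, and the sketch covers only the ABFT case, for which the text gives explicit overhead and coverage values.

```python
# Sketch: ABFT MTBF estimate and mission-constraint check for the
# Virtex-4 LX40 example. Names are illustrative, not the framework's API.

RAW_UPSETS_PER_DAY = 0.538 + 1.12     # heavy-ion + trapped-proton rates
UTILIZATION = 1.386 / 9.37            # required OPS / device CD

def abft_mtbf_days(raw_rate, utilization, overhead=0.10, coverage=0.90):
    """MTBF when only upsets in utilized logic that escape detection fail."""
    effective_rate = raw_rate * utilization * (1.0 + overhead)
    undetected_rate = effective_rate * (1.0 - coverage)
    return 1.0 / undetected_rate

def meets_constraints(power_w, mtbf_days, max_power_w=3.0, min_mtbf_days=10.0):
    """EO-1 Hyperion constraints: under 3 W and more than 10 days MTBF."""
    return power_w < max_power_w and mtbf_days > min_mtbf_days

mtbf_days = abft_mtbf_days(RAW_UPSETS_PER_DAY, UTILIZATION)
design_ok = meets_constraints(power_w=1.02, mtbf_days=mtbf_days)

print(f"ABFT MTBF: {mtbf_days:.1f} days, meets constraints: {design_ok}")
```

The unrounded result lands near, but not exactly on, the 36.9 days reported in Section 3.4.2, because the text rounds each intermediate value to three significant digits.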


Table 3-2 depicts our case study's device set. Each FPGA family contains several sub-families, which are groups of devices optimized for basic logic, signal processing, connectivity, embedded processing, or some combination of these. There are several device models with varying fabric sizes grouped within these sub-families, with the largest models being roughly an order of magnitude more powerful than the smallest models. For each model, there are often several packages that offer differing package sizes and I/O capabilities without altering the model's internal functionality. Finally, each package typically offers either two or three speed grades, with the faster speed grades sometimes reaching speeds as high as 40% faster than the lowest speed grades. For clarity, in the remainder of this section, a device refers to a unique model, package, and speed grade combination. Therefore, from the original set of six Xilinx FPGA families, there are a total of 376 devices included in our device set, resulting in a design space of 1,128 designs (there are three designs for each device because we consider three FT strategies).

Table 3-2. Counts of Devices Under Study.

Family       Sub-Families   Device Models   Packages   Speed Grades
Virtex-4     3              17              29         58
Virtex-5     5              26              41         105
Virtex-6     4              17              37         95
Spartan-3    1              8               29         58
Spartan-6    2              9               30         59
Virtex-5QV   1              1               1          1
Total        15             78              167        376

Literature research provides heavy-ion radiation-response data for the Virtex-4 [1], Virtex-5 [43], Spartan-3 [44], and Virtex-5QV [45]. Hiemstra et al. also provided proton radiation-response data for all five of our commercial devices [2, 46-49]. The sources for protons and heavy ions give the radiation-response data for only a single device within each device family because all of the devices within a family share the same bit-level structure. Thus, we reuse radiation-response data for all devices within a family after adjusting for the number of configuration bits per device. Additionally, because we are unable to find sufficient heavy-ion radiation data publicly available for the Virtex-6 and


Spartan-6 families, we use a method discussed by Peterson [50] to accurately estimate heavy-ion limiting cross sections based on the proton limiting cross sections of the same device. The limiting cross section is the most important of the four Weibull parameters. When only the limiting cross section data is known, we can still obtain sufficiently accurate upset rate estimates by copying the missing three Weibull parameters from a known similar device family.

Since we do not have access to Virtex-5QV tools for generating designs, we estimate the Virtex-5QV's CD by analyzing the Virtex-5 FX130T, which is logically identical to the Virtex-5QV. Xilinx documents specify a block memory maximum frequency of 360 MHz for the Virtex-5QV and 550 MHz for the Virtex-5. Since the Virtex-5 FX130T's CD is bandwidth-limited (the device can fit more MAC operators than the on-chip block memory can supply with inputs), we assume that the Virtex-5QV's CD is 65.45% of the FX130T's CD.

To ensure mission success, it is important that the considered devices are immune to destructive SEEs. SEB and SEGR are primarily the concern of power MOSFETs and BJTs [51-53], rarely affecting commercial CMOS devices, such as Xilinx FPGAs. SEL is the most common destructive SEE for CMOS devices, so SEL immunity is an important concern for our EO-1 Hyperion mission. Fortunately, literature shows that all of these commercial families are essentially SEL-immune for the levels of radiation in the relatively calm LEO environment of the EO-1 Hyperion mission [2, 46-49].

Table 3-3. Device TID Ratings and Estimated Lifetimes.

Family       Feature Size (nm)   TID Rating (krad)   Lifetime (years)
Virtex-4     90                  300                 136
Virtex-5     65                  341                 155
Virtex-6     40                  381                 173
Spartan-3    90                  300                 136
Spartan-6    45                  373                 169
Virtex-5QV   65                  1000                454
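The lifetimes in Table 3-3 and the Virtex-5QV CD scaling factor follow from simple ratios, sketched below. The sketch assumes that lifetime is the TID rating divided by the worst-case (solar maximum) dose rate of 2.203 krad/year from Section 3.4.2, rounded to the nearest year; the variable names are illustrative.

```python
# Sketch: lifetime = TID rating / worst-case annual dose, and the
# Virtex-5QV CD scaling from the block-memory frequency ratio.

WORST_CASE_KRAD_PER_YEAR = 2.203   # SPENVIS, solar maximum, 100 mils Al

tid_rating_krad = {
    "Virtex-4": 300,
    "Virtex-5": 341,
    "Virtex-6": 381,
    "Spartan-3": 300,
    "Spartan-6": 373,
    "Virtex-5QV": 1000,
}

lifetime_years = {
    family: round(rating / WORST_CASE_KRAD_PER_YEAR)
    for family, rating in tid_rating_krad.items()
}

# Bandwidth-limited CD is assumed to scale with block-memory frequency.
qv_cd_scale = 360 / 550

for family, years in lifetime_years.items():
    print(f"{family}: {years} years")
print(f"Virtex-5QV CD scale: {qv_cd_scale:.2%}")
```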


The lifetime evaluation metric requires knowledge of the TID ratings of the devices and the expected environmental-TID levels (Section 3.3.3). Table 3-3 depicts the TID ratings for the six Xilinx FPGA families in our case study, which were obtained directly from the data shown in Figure 3-7 or estimated based on the data's linear trendline, which shows a highly linear correlation between feature size and TID rating. This estimation is only appropriate for highlighting the potential suitability of a device for a mission during the early design phase, which is when our framework is most useful. Since these estimations may be inaccurate, any device recommended by our framework must still be tested for TID through radiation injection before the device can be accepted for the final design. Unlike the other families, the Virtex-5QV device is radiation-hardened by design and therefore does not follow the same trendline as the other devices. Instead, the Virtex-5QV's product specification states that the Virtex-5QV has a minimum TID rating of 1 Mrad. Table 3-3 shows the predicted lifetimes (using SPENVIS and the EO-1 mission orbit) for all device families based on the worst case (solar maximum) value, which are the lifetimes used for our case study.

3.4.4 Results and Analysis

Figures 3-12 through 3-19 depict the power and dependability evaluation metric results for our case study. In order of ascending family TID rating, Figures 3-12 through 3-17 depict the results for all designs for each family (Virtex-4, Spartan-3, Virtex-5, Spartan-6, Virtex-6, and Virtex-5QV, respectively) and highlight each family's Pareto-optimal front. Figure 3-18 collectively depicts the families' designs for cross-family comparison and shows the family-specific Pareto-optimal fronts. Figure 3-19 depicts our framework's final Pareto-optimal design set after considering lifetime and filtering unsuccessful designs that fail the EO-1 Hyperion mission's power consumption and MTBF constraints (Section 3.4.3). The results do not show designs requiring more than one device to meet the computational requirements.


Figure 3-12. Power and dependability results and Pareto front for Virtex-4-based designs.

Figure 3-13. Power and dependability results and Pareto front for Spartan-3-based designs.


Figure 3-14. Power and dependability results and Pareto front for Virtex-5-based designs.

Figure 3-15. Power and dependability results and Pareto front for Spartan-6-based designs.


Figure 3-16. Power and dependability results and Pareto front for Virtex-6-based designs.

Figure 3-17. Power and dependability results and Pareto front for Virtex-5QV-based designs.


Figure 3-18. All designs with six family-specific Pareto fronts.

Figure 3-19. Power and dependability results for all designs in the final Pareto-optimal design set including family-specific Pareto fronts.


Table 3-4 lists the designs included in our framework's final Pareto-optimal design set. The specific device used in a design is described by the device's family, model, package, and speed grade listed under the Fam., Mod., Pack., and SG columns, respectively. The FT column shows a design's FT strategy. The final three columns (Power, MTBF, and Life) show the results of our framework's evaluation metrics for power, dependability, and lifetime, respectively, for each design. We group the designs by device family and order the groups by ascending lifetime. Within each device family, we order the designs in ascending order by power and dependability.

Table 3-4. Pareto-Optimal Designs.

Fam.    Mod.     Pack.    SG   FT     Power (W)   MTBF (days)   Life (years)
Vir-4   SX55     FF1148   12   ABFT   0.858       109           136
Vir-4   SX55     FF1148   12   TMR    1.35        1,400,000     136
Vir-5   SX35T    FF665    3    ABFT   0.595       49.2          155
Vir-5   SX35T    FF665    3    TMR    0.912       285,000       155
Sp-6    LX16     CSG324   3    ABFT   0.417       38.7          169
Vir-6   LX75T    FF784    2    ABFT   0.696       74.2          173
Vir-6   LX75T    FF784    3    TMR    0.940       650,000       173
Vir-6   LX130T   FF1156   2    TMR    1.35        666,000       173
Vir-6   LX130T   FF784    3    TMR    1.38        746,000       173
Vir-6   LX130T   FF484    3    TMR    1.78        749,000       173
Vir-6   LX195T   FF784    3    TMR    2.19        793,000       173
Vir-6   LX240T   FF1759   3    TMR    2.43        858,000       173
Vir-6   LX240T   FF1156   3    TMR    2.47        867,000       173
5QV     FX130    CF1752   1    NFT    2.41        3930          454
5QV     FX130    CF1752   1    ABFT   2.44        35700         454

In some cases, clusters of two or more Pareto-optimal designs are nearly identical in power but not in dependability. Within these clusters, designs with less dependability should not be considered because these designs have an insignificant power gain. To address this issue, our framework rounds each evaluation metric result to three significant digits. Therefore, although several designs within these clusters may be technically Pareto-optimal before rounding, our framework considers only the most dependable design as Pareto-optimal because that design is essentially equal in power but superior in dependability.
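The effect of rounding on Pareto selection can be illustrated with a small sketch. The design names and metric values below are hypothetical and chosen only to form a near-identical-power cluster; the pairwise dominance check is a generic formulation and is not taken from the framework's JavaScript code.

```python
# Sketch: three-metric Pareto selection (lower power is better; higher
# MTBF and lifetime are better) with rounding to three significant digits.
from math import floor, log10

def round_sig(value, digits=3):
    """Round to a given number of significant digits."""
    if value == 0:
        return 0
    return round(value, digits - 1 - floor(log10(abs(value))))

def dominates(a, b):
    """True if design a is no worse than b everywhere and better somewhere."""
    no_worse = a[0] <= b[0] and a[1] >= b[1] and a[2] >= b[2]
    better = a[0] < b[0] or a[1] > b[1] or a[2] > b[2]
    return no_worse and better

def pareto_front(designs, digits=3):
    """Names of non-dominated designs; digits=None disables rounding."""
    if digits is None:
        metrics = dict(designs)
    else:
        metrics = {name: tuple(round_sig(v, digits) for v in vals)
                   for name, vals in designs.items()}
    return [name for name, vals in metrics.items()
            if not any(dominates(other, vals)
                       for other_name, other in metrics.items()
                       if other_name != name)]

# Hypothetical designs: (power in W, MTBF in days, lifetime in years).
designs = {
    "A": (0.9404, 650000, 173),   # slightly more power, more dependable
    "B": (0.9401, 600000, 173),   # 0.3 mW less power, less dependable
    "C": (2.41, 3930, 454),       # kept for its superior lifetime
    "D": (1.5, 500000, 136),      # dominated by A on all three metrics
}

front_rounded = pareto_front(designs)            # B drops out of the front
front_exact = pareto_front(designs, digits=None) # B is retained
print(front_rounded, front_exact)
```

After rounding, designs A and B tie at 0.940 W, so B's marginal power advantage disappears and A dominates it on dependability.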


In general, there is significant variation between the families and the designs in each family. Within each family there are three typically horizontally-stretched groups that represent the three FT strategies used in our case study. From bottom to top, these FT strategies are the low-power NFT, the middle-ground ABFT, and the highly-dependable TMR strategies. The horizontal stretching of these groups is typically the result of variations in static power between the differently-sized devices, which affects the power consumption but not the dependability. Within each of the horizontal groups, there are typically smaller vertical groupings consisting of designs that differ only in the design's device package and speed grade, with increasing dependability correlating with increasing speed grades. For example, doubling the device's operating speed does not necessarily increase the device's dynamic power because the faster device would require half as much device utilization to achieve the same computational capacity. Conversely, reducing the device utilization would improve dependability (Section 3.3.2). Finally, varying the design's device package shows a slight yet consistent correlation, with larger packages having a lower thermal resistance and therefore slightly lower static power consumption.

Table 3-4 shows that all designs on the Virtex-4 and Virtex-5 Pareto-optimal fronts use the SX device sub-family. A closer analysis reveals that the SX sub-family is better suited for the HSI application than other sub-families. For example, we consider the differences between the Virtex-5 SX50T-FF1136-3 and the Virtex-5 LX50T-FF1136-3. Both devices are equivalent in package and speed grade, but differ in the number of device resources, with the SX device having 13% more logic and 438% more DSP units than the SX device's LX counterpart. The HSI application's multiplication operations are significantly more power-efficient when implemented using an FPGA's DSP units rather than only using an FPGA's general logic resources. This results in the SX device achieving a 200% larger CD than the LX device for nearly the same power consumption. Therefore, even though the SX device has 0.18 W more static power, the SX device can


still perform the required computations with a lower overall power consumption because the SX device's dynamic power consumption scales efficiently.

Conversely, the SX sub-family does not dominate the Virtex-6's Pareto-optimal front for two reasons. First, the Virtex-6 SX sub-family is only available in two large models, the smallest being the SX315T with a static power of 2.08 W at 25 °C. The smallest Virtex-6 model (the LX75T) has a static power of 0.54 W and performs the mission's required computation while staying well under 2 W of total power consumption. Second, the Virtex-6 devices have a higher DSP-to-logic ratio, as demonstrated by the Virtex-6 LX75T model having the same number of DSPs as the Virtex-5 SX50T. Since neither the Virtex-6 LX's nor the Virtex-6 SX's DSP resources saturate when running the HSI application, neither sub-family has a clear advantage over the other.

The Spartan families do not have large horizontally-stretched groups for two reasons. First, the Spartan devices offer a smaller, more power-efficient alternative to their Virtex counterparts. Therefore, many of the smaller Spartan designs are too small to handle the mission's required computations, resulting in a reduced design set consisting of only larger devices. Second, there is no significant variation between the sub-families within each family. The Spartan-3 family does not have different sub-families and the Spartan-6 has the LX and LXT sub-families, which differ only in I/O bandwidth and not in computational resources. Therefore, limited differences in device size and specialization potential result in the small variation between different Spartan devices shown in Figure 3-13 and Figure 3-15. This effect also applies to the Virtex-5QV device, which has only one model, package, and speed grade, and thus results in design variation only between the three different FT strategies.

Figure 3-18 shows how different device families can affect a design's power and performance. The older Spartan-3 family performs poorly in both power and dependability, while the Virtex-4 family provides slightly better dependability and significantly better power consumption than the Spartan-3 family. Although the Virtex-4 devices have higher


static power consumption than similarly sized Spartan-3 devices, the Virtex-4 family's DSP units are more power efficient than the Spartan-3 multiplier units, which is important to consider when designing for the HSI application. The Spartan-6, Virtex-5, and Virtex-6 families are superior in low-power consumption, and the Virtex-5 and Virtex-6 families also perform similarly in dependability to the Virtex-4 for designs using TMR. As shown in Table 3-4, the Virtex-5QV is the most dependable device with 1,000× greater MTBF for designs with similar FT strategies, but has the highest power consumption of all the devices (also the highest cost at $50,000 compared to a standard FPGA price around $1,000).

Figure 3-19 shows the final Pareto-optimal design set. Our framework reduces the design space of 1,128 possible designs by 98.7%, determining these fifteen designs, consisting of five device families and three FT strategies, as the set of successful Pareto-optimal designs that meet the mission's constraints. The Spartan-3's Pareto-optimal front as well as much of the Pareto-optimal fronts of the Virtex-4, Virtex-5, and Spartan-6 are Pareto-inferior to the Virtex-6's Pareto-optimal front. Our framework also rejects all of the Spartan-6 and Virtex-6 NFT designs because none of these designs meet the dependability constraint of an MTBF greater than ten days. Similarly, the Virtex-5QV's TMR design and the most power-hungry Pareto-optimal Virtex-6 TMR design consume more than 3 W, so our framework rejects these designs as well. Of the Pareto-optimal design set, TMR on the Virtex-4, Virtex-5, and Virtex-6 shows the best dependability, ABFT on the Virtex-4, Virtex-5, Virtex-6, and Spartan-6 shows the best power consumption, the Virtex-5QV has superior lifetime, and TMR on the Virtex-6 provides a balanced design for all three metrics.

Our system to compute these results uses a single core of an Intel Core i7-2600 CPU running at 3.4 GHz with 8 GB of RAM to run our framework's JavaScript code on version 42 of the Mozilla Firefox browser. We evaluate the performance of our framework using ten trial runs on our system, and we measure the duration of the analysis


of each run by querying the system time through the JavaScript Date object before and after the relevant JavaScript code. Our system takes an average of 1.01 seconds to calculate the CD for every device and the evaluation-metric results for each design in our results. After computing all the evaluation-metric results, our system takes an average of 4 ms to find the final Pareto-optimal design set. Although general SoC-design optimization approaches may typically require more processing time, our framework leverages the CD methodology to prune the majority of the design variant options available in the SoC design and focus on a single optimal design variant for each device. Therefore, although the task of finding the final Pareto-optimal design set is $O(n^2)$ in the number of designs n, we have reduced the total computation time by first computing CD and the evaluation-metric results for all designs, which instead scales linearly with the number of designs. We predict that even for device sets that are several orders of magnitude larger, calculating the evaluation metrics for all designs will dominate total computation time, meaning our framework's methodology will scale well into the future with the addition of new devices to our framework with increasing device complexity.

3.5 Conclusions

In this chapter, we have introduced a novel framework that leverages past research and successes in device, application, and fault-tolerant (FT) strategy analysis to aid in the design of on-board FPGA-based SoCs for aerospace computing. Our framework considers a designer-defined mission and application, and analyzes a database of literature research and experimental data to provide designers with a final set of Pareto-optimal system designs (device/FT strategy combinations). Our framework's evaluation metrics enable designers to select the best design from this final set depending on desired metric tradeoffs and mission requirements. To demonstrate our framework's potential given a large design space, we analyzed a design space of 1,128 designs (63× larger than our previous work) and provided a more in-depth analysis of the designs using the new lifetime evaluation metric. Our


framework reduced the design space by 98.7%, identifying fifteen final Pareto-optimal designs, including designs specializing in low power, high dependability, high lifetime, or a compromise between all three of these attributes.


CHAPTER 4
OPTIMIZING FPGA PERFORMANCE, POWER, AND DEPENDABILITY WITH LINEAR PROGRAMMING

In order to enhance the accuracy of our framework and expand our framework's capabilities, this chapter introduces our linear programming (LP) method. This chapter is organized as follows. Section 4.1 discusses the background that provides the foundation of our LP method and related work. In Section 4.2, we show the methodology behind the performance optimization of our LP method. Section 4.3 discusses how this performance optimization methodology can be modified to optimize for power or dependability instead. Finally, in Section 4.4, we show the results of two case studies involving dot-product and distance-calculation kernels on a range of Virtex-5 FPGAs.

4.1 Background and Related Work

LP is a powerful method for determining optimal solutions to problems that can be stated in terms of linear relationships. Since such a wide range of problems can fit into this format (e.g., maximizing profits while dealing with limited resources), LP is frequently used in a diverse set of domains, such as business, economics, industry, military, and engineering. Designers also use LP in many areas of circuit design to produce layouts that minimize latency, power, or error rates while satisfying many simultaneous constraints. Landis et al. [54] used LP techniques to improve fault detection and recovery performance while meeting multiple constraints, such as number of gates, latency, and power. Agrawal et al. [55] used LP to minimize the imbalance in the gate input delays that cause transient energy consumption during a gate transition. By representing gate and buffer delays in combinatorial logic circuits with linear equations, they were able to reduce power by up to 47% in some instances compared to original circuits and find the optimal tradeoff between latency and power. Srinivasan et al. [56] used LP to create power-optimized, network-on-chip architectures for application-specific, system-on-chip designs. Srinivasan et al. minimized total system power by creating a floorplan such that networking routes between processing cores were minimized while also ensuring that all cores and routing


fit within the area of a bounding rectangle and that certain performance constraints were met.

Other works have demonstrated non-LP methodologies for predicting the optimal performance of an application design on an FPGA. The RC Amenability Test (RAT) [13] is an analytical methodology that uses three tests for throughput performance, numerical precision, and resource utilization to determine the viability of an algorithm design on an FPGA prior to the use of any HDL. RAT measures throughput performance with both communication time for transferring data on and off the FPGA and computation time for processing the data according to an application design, which relies on a user-supplied frequency estimation for the FPGA. Enzler et al. [14] described a similar high-level estimation methodology for characterizing the area and performance of an application on an FPGA. Using a priori information about the FPGA's architecture, the methodology created a set of equations to describe area, frequency, throughput, latency, and I/O pin count, enabling the user to quickly test the tradeoffs involved in decomposing parts of the design, replicating those parts, or adding registers for pipelining. Finally, Meswani et al. [15] showed how to model and predict the performance of high-performance computing applications on systems that use GPU or FPGA hardware accelerators. Their model evaluated the application's code to find sections that could be easily accelerated and used simple benchmarks to predict accelerator speedup. In contrast to these methodologies, our LP method uses the results of single instantiated operations to predict the frequency and performance of an application on an FPGA without requiring detailed a priori information about the FPGA supplied by the user. Additionally, our LP method can consider multiple operation variants for any function type and can also optimize for power and dependability.

4.2 Optimizing Performance

Our LP method's primary purpose is to generalize the original CD methodology proposed by Williams et al. [10] for FPGAs to determine the maximum performance


for any number of function types and operation variants. The original CD methodology constrains analysis to only two function types (add and multiply) and two operation variants for each function type (a total of four operation variants), so the methodology only relies on a few simple calculations. Adding support for additional function types rather than just add and multiply enables analysis on a much broader range of applications, and adding support for more operation variants gives designers more options to test when determining an FPGA's maximum performance. However, generalizing the methodology to consider additional function types and operation variants quickly increases the problem size beyond the capabilities of a few simple calculations, requiring the power and robustness of LP. The remainder of this section is organized as follows. Section 4.2.1 provides a brief discussion of how our LP method uses LP, Section 4.2.2 shows the necessary equations to describe the optimization problem and create the initial tableau for LP, and Section 4.2.3 describes how to extract results from the LP's final tableau.

4.2.1 Linear Programming (LP)

LP is a methodology for determining the values of a set of decision variables in a linear system that lead to an optimal result for an objective function. LP operates on a set of input linear constraint equations written in terms of the decision variables. For our LP method, the decision variables represent the quantity of each operation variant used in the FPGA's operation distribution. Every operation variant requires a unique combination of FPGA resource types, and each resource type is limited based on the FPGA device under consideration. Our LP method describes these resource limits with constraint equations that use the operation variant quantities as inputs. Additionally, the application running on the FPGA requires a specific ratio of various function types, which our LP method again represents with constraint equations using the operation variant quantities. When optimizing for performance, our LP method creates an equation for the objective function that defines the FPGA's performance in terms of the operation variants. Finally, our LP method combines the contents of these equations to create an initial tableau. Our LP


method then uses the simplex algorithm [57] and Bland's rule [58] to iteratively perform pivot operations on the initial tableau and create the final tableau. From the final tableau, our LP method can extract any necessary information to determine the optimal operation distribution and predict the optimal performance.

The simplex algorithm requires the optimization problem to be represented in standard form, which means that all decision variables must be nonnegative and all constraints must be represented with equations and not inequalities. It is impossible to use a negative number of operation variants, so the decision variables are necessarily nonnegative. Not only is the nonnegative variable requirement of the simplex algorithm automatically satisfied, but this requirement also prevents our LP method from needing additional equations to define the lower bounds of the decision variables. Unfortunately, many of the constraints are naturally represented with inequalities, which violate the equation requirement of the simplex algorithm, so these constraints must first be transformed into equations before the initial tableau is formed. An example of a simple constraint inequality is shown in Equation (4-1), where x is a decision variable with an upper limit of 100.

$$x \le 100 \tag{4-1}$$

By adding a new nonnegative slack variable s to the lesser side of the inequality and replacing the inequality sign with an equal sign, the constraint inequality in Equation (4-1) is transformed into the equivalent constraint equation:

$$s + x = 100 \tag{4-2}$$

Slack variables are similar to decision variables in that slack variables are nonnegative and are used to define constraint equations, but slack variables do not represent the quantity of any operation variant. Since s is assumed to be nonnegative, Equation (4-2) requires that x can be no larger than 100, and so the constraint inequality of Equation (4-1) is preserved.
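The slack-variable transformation can be sketched in a few lines. The helper names and the row layout (slack columns first, then decision-variable coefficients, then the limit) are assumptions made for illustration and do not reflect the tableau layout used by our LP method.

```python
# Sketch: converting "a . x <= b" constraints to "s_i + a . x = b" by
# adding one slack column per constraint. Layout is an assumed convention.

def add_slack_variables(coeff_rows, limits):
    """Return equation rows [slack columns | coefficients | limit]."""
    count = len(coeff_rows)
    equations = []
    for i, (coeffs, limit) in enumerate(zip(coeff_rows, limits)):
        slack_columns = [1 if j == i else 0 for j in range(count)]
        equations.append(slack_columns + list(coeffs) + [limit])
    return equations

def slack_value(coeffs, limit, x):
    """Slack needed to satisfy the equation; negative means infeasible."""
    return limit - sum(c * v for c, v in zip(coeffs, x))

# x <= 100 becomes s + x = 100, i.e. the single row [1, 1, 100].
single_constraint = add_slack_variables([[1]], [100])
slack_at_40 = slack_value([1], 100, [40])     # 60 units of headroom
slack_at_120 = slack_value([1], 100, [120])   # negative: violates x <= 100

print(single_constraint, slack_at_40, slack_at_120)
```

A nonnegative slack value corresponds to a point that satisfies the original inequality, which is why requiring s to be nonnegative preserves the constraint.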


4.2.2 Creating the Initial Tableau

In order to demonstrate how our LP method forms the initial tableau required for LP, we discuss a base example using a specific FPGA device, the Virtex-5 LX20T-FF323-2, a dot-product kernel, and a set of operation variants. We chose to focus our example on the Virtex-5 LX20T because it is the smallest device in the Virtex-5 family, which is suitable for a base example, and because Virtex-5 devices have a lower DSP-to-logic ratio than the more modern Virtex families, thereby enabling our LP method to make more interesting decisions. Dot product is an important kernel that is widely used within other basic kernels and applications (e.g., matrix multiplication, convolution). Before our LP method can begin creating the constraint equations, the user must define various properties for the device, application, and operation variants. Table 4-1 shows the necessary device properties for defining the Virtex-5 LX20T device. Our LP method defines the usable resource amounts for FFs and LUTs as 85% of the corresponding actual resource amounts, an adjustment identical to that of the CD methodology to account for a 15% logic resource overhead for steering logic and memory or I/O interfacing [10].

Table 4-1. Virtex-5 LX20T Resources.

FFs (Actual)   FFs (Usable)   LUTs (Actual)   LUTs (Usable)   DSPs
12,480         10,608         12,480          10,608          24

Our particular dot-product kernel receives two vectors consisting of multiple 32-bit integer values and outputs a single 64-bit integer, requiring 32-bit integer multiply operations and 64-bit integer add operations. Table 4-2 shows the data that is required for defining the set of operation variants used in our example. Each operation variant has a function, a variant name, the number of resources (i.e., FFs, LUTs, and DSPs) required to instantiate each instance of the operation variant, and the maximum achievable frequency at which the operation variant can operate correctly. For our example, we consider two add variants: a larger, fully pipelined add variant that can operate at a high frequency; and a smaller add variant that uses fewer resources but cannot operate as quickly. We also


consider three multiply variants: a logic variant that uses no DSPs; a mixed variant that uses a mixture of DSPs and basic logic resources; and a DSP variant that uses almost no basic logic resources. To determine the properties of each operation variant, a single instance of each operation variant was generated with the Xilinx CORE Generator System and instantiated on a Virtex-5 LX20T with Xilinx Integrated Synthesis Environment (ISE), the results of which provided the data in Table 4-2.

Table 4-2. Virtex-5 LX20T Operation Variant Properties.

Function   Variant   FFs     LUTs    DSPs   Freq. (MHz)
Add        Small     64      64      0      362
Add        Large     170     210     0      401
Multiply   Logic     1,093   1,133   0      354
Multiply   Mixed     734     711     1      328
Multiply   DSP       81      32      4      500

With the data from Table 4-1 and Table 4-2, our LP method can create the resource-limiting equations (RLE), which are constraint equations that ensure the operation distribution does not lead to a design that uses more resources than are available on the considered device. One RLE is needed for each resource type, and each RLE describes how many of the corresponding resource are available on the device and how many each instance of an operation variant consumes. In our example, our LP method uses the inequalities shown in (4-3), (4-5), and (4-7) to form the RLEs in Equations (4-4), (4-6), and (4-8). The decision variables $x_{As}$, $x_{Al}$, $x_{Ml}$, $x_{Mm}$, and $x_{Md}$ represent the quantity of the small-add, large-add, logic-multiply, mixed-multiply, and DSP-multiply variants, respectively. The slack variables $s_{FF}$, $s_{LUT}$, and $s_{DSP}$ represent the amount of unused FFs, LUTs, and DSPs, respectively.

$$64x_{As} + 170x_{Al} + 1096x_{Ml} + 734x_{Mm} + 81x_{Md} \le 10608 \tag{4-3}$$

$$s_{FF} + 64x_{As} + 170x_{Al} + 1096x_{Ml} + 734x_{Mm} + 81x_{Md} = 10608 \tag{4-4}$$

$$64x_{As} + 210x_{Al} + 1133x_{Ml} + 711x_{Mm} + 32x_{Md} \le 10608 \tag{4-5}$$

PAGE 68

    $s_{LUT} + 64x_{As} + 210x_{Al} + 1133x_{Ml} + 711x_{Mm} + 32x_{Md} = 10608$    (4-6)
    $x_{Mm} + 4x_{Md} \le 24$    (4-7)
    $s_{DSP} + x_{Mm} + 4x_{Md} = 24$    (4-8)

Next, our LP method creates the function ratio equations (FREs) that enable us to target the specific application running on the device. Our LP method characterizes an application by the function types that the application comprises (e.g., add, multiply, divide, square root) and the ratio between those function types. For a dot-product kernel, the function ratio is approximately one add to one multiply, but for a fast Fourier transform kernel, the function ratio is approximately three adds to two multiplies. For an application with $n$ different function types, our LP method requires an FRE for all but one of the $n$ function types, resulting in a total of $n - 1$ FREs. To create an FRE for a particular function type, our LP method uses the generalized FRE shown in Equation (4-9), where $x_i$ is the $i$th decision variable that represents an operation variant with the corresponding function, $y_i$ is the $i$th decision variable that represents an operation variant with a different function, and $\alpha$ equals the fraction of the application's operations that correspond to the function type.

    $(1 - \alpha)\left(\sum x_i\right) - \alpha\left(\sum y_i\right) = 0$    (4-9)

For our example, there are only two function types, so our LP method only creates one FRE using the add function type, which is shown in Equation (4-10).

    $(1 - 0.5)(x_{As} + x_{Al}) - 0.5(x_{Ml} + x_{Mm} + x_{Md}) = 0$    (4-10)

Finally, our LP method creates the objective function that relates the decision variables to a new objective variable, which the method seeks to maximize. Since our LP method is currently trying to maximize performance, the objective variable represents the performance of the device in terms of millions of operations per second (MOPS). Our computational model assumes that all operations operate in the same
clock region, so the performance of the device can be calculated as the sum of all the operations used in the operation distribution scaled by the limiting frequency, which is the minimum achievable frequency amongst the operation variants being considered. For our example, the mixed-multiply variant sets the limiting frequency with the lowest achievable frequency of 328 MHz. Equation (4-11) shows the general objective function for performance optimization, where the objective variable $z$ is the device's performance in MOPS, $x_i$ is the $i$th decision variable, and $f$ is the limiting frequency. Equation (4-12) shows the objective function for our example. Note that although Equation (4-11) assumes that every operation variant produces an output every cycle, Equation (4-11) can be easily modified to allow for operation variants that do not satisfy this assumption by scaling the associated decision variables.

    $z = f \sum x_i$    (4-11)
    $z = 328(x_{As} + x_{Al} + x_{Ml} + x_{Mm} + x_{Md})$    (4-12)

After creating the necessary RLEs, FREs, and objective function, our LP method creates the initial tableau. Table 4-3 summarizes and reformats these equations to clarify how these equations correspond to the rows of the initial tableau.

Table 4-3. Summary of Equations.

    Type          Equation                                                                         Ref. Num.
    Obj. Func.    $z = 328(x_{As} + x_{Al} + x_{Ml} + x_{Mm} + x_{Md})$                            4-12
    FRE           $(1 - 0.5)(x_{As} + x_{Al}) - 0.5(x_{Ml} + x_{Mm} + x_{Md}) = 0$                 4-10
    FF RLE        $s_{FF} + 64x_{As} + 170x_{Al} + 1093x_{Ml} + 734x_{Mm} + 81x_{Md} = 10608$      4-4
    LUT RLE       $s_{LUT} + 64x_{As} + 210x_{Al} + 1133x_{Ml} + 711x_{Mm} + 32x_{Md} = 10608$     4-6
    DSP RLE       $s_{DSP} + x_{Mm} + 4x_{Md} = 24$                                                4-8

Table 4-4 shows the initial tableau for our example. Our LP method constructs the tableau such that each row represents one of the equations from Table 4-3 (the only restriction is that the objective function must go into the topmost row in order to be easily handled by the simplex algorithm). Every column (except for the rightmost column) corresponds to either an objective variable, slack variable, or decision variable. Each
element in the tableau shows the coefficient of the variable corresponding to the element's column in the equation corresponding to the element's row. The elements of the rightmost column represent the constant terms within the corresponding equations on the side opposite of the variables.

Table 4-4. Initial Tableau for Performance Optimization.

                  obj.   slack                   decision
                  z      s_FF   s_LUT   s_DSP    x_As    x_Al    x_Ml    x_Mm    x_Md    Constant
    Obj. Func.    1      0      0       0        -328    -328    -328    -328    -328    0
    FRE           0      0      0       0        0.5     0.5     -0.5    -0.5    -0.5    0
    FF RLE        0      1      0       0        64      170     1093    734     81      10608
    LUT RLE       0      0      1       0        64      210     1133    711     32      10608
    DSP RLE       0      0      0       1        0       0       0       1       4       24

After creating the initial tableau, our LP method can apply the simplex algorithm to perform pivot operations on the tableau until the final tableau is produced. Due to the inclusion of the FRE, the tableau is not in canonical form (i.e., there is no subset of the tableau's columns that can be rearranged to create an identity matrix equal in height to the tableau), so our example requires a two-phase simplex algorithm, where Phase I transforms the tableau into canonical form, and Phase II produces the final tableau. Phase I requires the addition of a new artificial variable for each FRE, a single new artificial objective function, and a single new artificial objective variable. These additions transform the initial tableau into canonical form, and the rest of Phase I can begin to zero out the artificial variables using the artificial objective function. Assuming that at least one operation distribution exists that satisfies the FREs and RLEs (which is always true when optimizing for performance), Phase I successfully completes by producing a canonical tableau from which the added artificial terms can be dropped. Phase II then begins and completes by producing the final tableau, which is recognized by the absence of negative values in the top row.
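
As an illustrative cross-check of the constraint set in Table 4-4, the following Python sketch solves the same problem with exact rational arithmetic. It is not the two-phase simplex algorithm described above: it swaps in brute-force enumeration of basic solutions (every choice of four basic columns), which is only practical for a problem this small. The names `VARIANTS`, `USABLE`, `solve_square`, and `maximize_operations` are assumptions made for this sketch; the coefficients are taken from Tables 4-1 and 4-2.

```python
from fractions import Fraction
from itertools import combinations

# Table 4-2 data: name -> (function, FFs, LUTs, DSPs, max frequency in MHz)
VARIANTS = {
    "xAs": ("add", 64, 64, 0, 362),
    "xAl": ("add", 170, 210, 0, 401),
    "xMl": ("mul", 1093, 1133, 0, 354),
    "xMm": ("mul", 734, 711, 1, 328),
    "xMd": ("mul", 81, 32, 4, 500),
}
USABLE = (10608, 10608, 24)  # usable FFs, LUTs, DSPs (Table 4-1)


def solve_square(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination; return None if singular."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


def maximize_operations():
    """Return (total simultaneous operations, variable values) at the optimum."""
    names = list(VARIANTS)
    cols = names + ["sFF", "sLUT", "sDSP"]
    half = Fraction(1, 2)
    # Row 0 is the FRE (4-10); rows 1-3 are the RLEs (4-4), (4-6), (4-8).
    rows = [[half if VARIANTS[n][0] == "add" else -half for n in names] + [0, 0, 0]]
    for k in range(3):
        slack = [1 if j == k else 0 for j in range(3)]
        rows.append([VARIANTS[n][1 + k] for n in names] + slack)
    rhs = [Fraction(0)] + [Fraction(u) for u in USABLE]
    best = None
    # Try every set of four basic columns; keep the best feasible vertex.
    for basis in combinations(range(len(cols)), len(rows)):
        sub = [[Fraction(row[j]) for j in basis] for row in rows]
        sol = solve_square(sub, rhs)
        if sol is None or any(v < 0 for v in sol):
            continue
        x = dict.fromkeys(cols, Fraction(0))
        x.update({cols[j]: v for j, v in zip(basis, sol)})
        total = sum(x[n] for n in names)
        if best is None or total > best[0]:
            best = (total, x)
    return best


total, x = maximize_operations()
limiting_mhz = min(v[4] for v in VARIANTS.values())  # slowest variant considered
print({k: round(float(v), 1) for k, v in x.items() if v})
print("performance (GOPS):", round(float(total) * limiting_mhz / 1000, 2))
```

Because the limiting frequency is a constant factor, the sketch maximizes the operation count and scales by 328 MHz afterwards. Small last-digit differences from the dissertation's tableau values are to be expected, since the tabulated frequencies are rounded.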

4.2.3 Final Tableau and Results

Table 4-5 shows the final tableau for our example. Since the tableau must be canonical, there must exist a subset of five columns that can be rearranged to form an identity matrix. The variables associated with these columns are called basic variables, while the remaining variables are called nonbasic variables. Our LP method can quickly find the value for each basic variable in the rightmost column by assuming a value of zero for the nonbasic variables. Reading out the values in this manner shows that a maximum performance of about 10.2 GOPS is achieved when the operation distribution uses about 16 small-add operations, 13 mixed-multiply operations, and 3 DSP-multiply operations. The operation distribution does not use the large-add and logic-multiply variants to achieve maximum performance. The slack variables also show that the operation distribution uses all of the FFs and DSPs and almost all of the LUTs.

Table 4-5. Final Tableau for Performance Optimization.

    obj.   slack                   decision
    z      s_FF   s_LUT   s_DSP    x_As   x_Al   x_Ml   x_Mm   x_Md
    1      0.6    0       140.5    0      68.4   91.2   0      0       10217.8 = z
    0      0.0    0       0.2      1      1.1    0.1    0      0       15.6 = x_As
    0      0.0    0       -0.0     0      0.1    1.5    1      0       12.8 = x_Mm
    0      -1.0   1       11.7     0      41.5   56.3   0      0       431.4 = s_LUT
    0      -0.0   0       0.3      0      -0.0   -0.4   0      1       2.8 = x_Md

Although our example determines the maximum performance when considering all five operation variants, it may be possible to find a higher performance by considering only a subset of the operation variants. Although limiting the use of any operation variants can only reduce or have no effect on the total number of operations used by the operation distribution, the performance could still improve if the set of remaining operation variants has an improved limiting frequency. Unfortunately, the method for determining the limiting frequency of a set of operation variants is not linear, so our LP method must instead iteratively perform the LP steps described above multiple times to test alternative
subsets of the operation variants and find the optimal limiting frequency. Figure 4-1 shows a flow diagram for this iterative process.

Figure 4-1. Iterative process for testing multiple limiting frequencies.

After the first iteration shown in our example, successive iterations remove the operation variant with the lowest achievable frequency, thereby enabling our LP method to test higher limiting frequencies with the remaining operation variants. This iterative process continues until any function type has lost all associated variants (a situation in which the FREs would be impossible to satisfy), after which our LP method outputs the operation distribution with the highest performance. Table 4-6 shows the results of each iteration. After the first iteration, the iteration process removes the mixed-multiply, logic-multiply, and finally small-add variants, leaving only the large-add and DSP-multiply variants for the final iteration. Although removing the large-add variant would further increase the limiting frequency to 500 MHz, this removal makes it impossible to achieve the correct function ratio with only the DSP-multiply variant remaining. In our example, the first iteration produced the highest performance, yet our LP method cannot guarantee that this result is optimal without testing other alternative subsets of the operation variants as well.

Note that in the case of an application that requires only one function type, there are no FREs, so the initial tableau is already canonical and Phase I of the simplex algorithm can be skipped. Although Phase I is always successful when optimizing for performance, Phase I may not complete successfully when optimizing for other goals since additional constraint equations may be used.
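
The order in which the iterative process visits subsets of operation variants can be sketched in a few lines of Python. This is a minimal illustration under assumed names (`VARIANTS`, `iteration_schedule`); it only generates the subsets and their limiting frequencies from the Table 4-2 frequencies and leaves out the LP solve performed in each iteration.

```python
# Variant name -> (function type, maximum achievable frequency in MHz), Table 4-2.
VARIANTS = {
    "small-add": ("add", 362),
    "large-add": ("add", 401),
    "logic-multiply": ("multiply", 354),
    "mixed-multiply": ("multiply", 328),
    "DSP-multiply": ("multiply", 500),
}


def iteration_schedule(variants):
    """List (limiting frequency, available variants) for each iteration.

    Each step drops the slowest remaining variant; the loop stops as soon as
    some function type has no variant left, since the FREs could then no
    longer be satisfied.
    """
    required = {function for function, _ in variants.values()}
    remaining = dict(variants)
    schedule = []
    while {function for function, _ in remaining.values()} == required:
        limiting = min(freq for _, freq in remaining.values())
        schedule.append((limiting, sorted(remaining)))
        slowest = min(remaining, key=lambda name: remaining[name][1])
        del remaining[slowest]
    return schedule


for limiting, available in iteration_schedule(VARIANTS):
    print(limiting, available)
```

In a full implementation, each scheduled subset would be passed to the LP solver and the best-performing operation distribution kept.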

Table 4-6. Iteration Results for Performance Optimization.

    Operation is Available? (# if used)              Limiting       Simultaneous    Performance
    Small     Large    Logic    Mixed     DSP        Freq. (MHz)    Operations      (GOPS)
    Add       Add      Mult.    Mult.     Mult.
    X 15.6    X        X        X 12.8    X 2.8      328            31.2            10.22
    X 14.4    X        X 8.4    -         X 6        354            28.8            10.17
    X 6       X        -        -         X 6        362            12.0            4.35
    -         X 6      -        -         X 6        401            12.0            4.81

    (X = variant available; - = variant removed)

Furthermore, this iterative process is the only divergence of our LP method's performance optimization from the CD methodology (aside from the greater flexibility of our LP method to consider any number of function types and operation variants). The CD methodology actually optimizes for the maximum number of simultaneous operations on the device. Then the CD methodology finds the limiting frequency of the operation variants that were actually used and multiplies the number of operations by the limiting frequency to obtain the performance. The difference between the methodologies is subtle and does not always produce different results; however, in some situations, our LP method can find a better operation distribution when considering the same operation variants as the CD methodology.

4.3 Modifications to Optimize for Power or Dependability

Generally, there is only one operation distribution that can produce the optimal performance, thus our LP method cannot optimize for another design goal after optimizing for performance because there would not be any other design options from which to choose. However, a designer may not need to achieve the maximum performance on a device if they already know how much performance their application needs. In cases where a designer is already targeting a specific level of performance for their application, there may be many operation distributions that can satisfy the performance demands, making it possible to optimize for alternative design goals. This section shows how to modify the performance optimization of our LP method to instead optimize an operation distribution for power or dependability for a given target performance.

4.3.1 Optimizing Power

Power-consumption design goals can be as important as performance goals, especially for certain extreme-computing domains. In the aerospace computing domain, power is often a limited resource, and certain situations may place more importance on meeting a minimal power budget than meeting less extreme performance goals. Additionally, although supercomputers may have access to incredible amounts of computational resources, the actual costs required to pay for these resources place a large emphasis on increasing computational power efficiency. Due to the importance of power consumption as a design goal, we show how our LP method can optimize power for a target performance with only two modifications to the initial tableau: replacing the performance-based objective function with one for power; and adding a target performance constraint.

In order to create a power-based objective function to replace the performance objective function, our LP method must know how much power is consumed by each operation variant. We use the Xilinx Power Estimator tool to estimate a frequency-normalized power value with units of mW/MHz, which enables our LP method to reuse this supplied data to determine power consumption for any operating frequency. There are two methods for estimating the power consumption of each operation variant. The first method is easiest and only requires information on the power consumption of the three main FPGA resources that compose all operation variants. With these three values, our LP method can estimate the power consumption of any operation variant by summing up the total power consumptions of the operation variant's constituent resources. A more accurate method involves importing data for each operation variant from Xilinx ISE into the power-estimator tool in order to directly determine a more accurate power estimate for each operation variant. For our work, we use the latter, more accurate method. Table 4-7 shows the power-estimate results for all five operation variants in our example after importing data from Xilinx ISE into the power-estimator tool. Table 4-7 also shows the
contributions of the FPGA resources and clock tree to the total power consumption of each operation variant, though this data is not required by our method.

Table 4-7. Virtex-5 LX20T Operation Variant Estimated Power Consumption.

                           Dynamic Power (mW/MHz)
    Function    Variant    FFs      LUTs     DSPs     Clock    Total
    Add         Small      0.007    0.010    0.000    0.006    0.023
    Add         Large      0.019    0.057    0.000    0.025    0.101
    Multiply    Logic      0.146    0.206    0.000    0.113    0.465
    Multiply    Mixed      0.094    0.151    0.018    0.084    0.347
    Multiply    DSP        0.009    0.014    0.072    0.011    0.106

With the data from Table 4-7, our LP method can create the power-based objective function. Equation (4-13) defines the total dynamic power consumption (static power consumption is added on at the very end of the dynamic power analysis to calculate total power) as the sum of the power contributions of every operation in the operation distribution scaled by the limiting frequency. Since our goal is to maximize the objective variable, if we naively use power as the objective variable, power is maximized instead of minimized. To circumvent this issue without significant alterations, we define the objective variable as the negative of the power consumption. In this way, as our LP method attempts to maximize the objective variable (negative power), it actually minimizes the device's power consumption (positive power). Equation (4-14) shows the new objective function for our example, where the objective variable $z$ is now the negative of the total dynamic power consumption in mW.

    $\mathrm{Power\ (mW)} = f(0.023x_{As} + 0.101x_{Al} + 0.465x_{Ml} + 0.347x_{Mm} + 0.106x_{Md})$    (4-13)
    $z + 7.6x_{As} + 33.2x_{Al} + 152.4x_{Ml} + 113.7x_{Mm} + 34.7x_{Md} = 0$    (4-14)

Next, our LP method creates the target performance equation (TPE), a new constraint equation that enables the targeting of a specific performance for the operation distribution. Without the TPE, our LP method would always achieve a minimum power consumption of 0 W by removing all operations from the operation distribution.
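
The step from Equation (4-13) to the objective-row coefficients in Equation (4-14) is a multiplication by the limiting frequency, which the following Python sketch makes explicit. The function names and the dictionary layout are assumptions for illustration; the per-variant values are the Total column of Table 4-7, and the sample operation distribution is the rounded maximum-performance distribution read from Table 4-5.

```python
# Total dynamic power per variant in mW/MHz (Table 4-7, "Total" column).
POWER_MW_PER_MHZ = {
    "xAs": 0.023,
    "xAl": 0.101,
    "xMl": 0.465,
    "xMm": 0.347,
    "xMd": 0.106,
}


def power_objective_row(limiting_mhz):
    """Coefficients c_i in z + sum(c_i * x_i) = 0, in mW per operation instance."""
    return {name: p * limiting_mhz for name, p in POWER_MW_PER_MHZ.items()}


def dynamic_power_mw(distribution, limiting_mhz):
    """Equation (4-13): dynamic power of an operation distribution in mW."""
    row = power_objective_row(limiting_mhz)
    return sum(row[name] * count for name, count in distribution.items())


# Rounded maximum-performance distribution from Table 4-5.
max_perf_distribution = {"xAs": 15.6, "xMm": 12.8, "xMd": 2.8}

print({name: round(c, 1) for name, c in power_objective_row(328).items()})
print("dynamic power (mW):", round(dynamic_power_mw(max_perf_distribution, 328), 1))
```

The coefficients computed from the rounded table entries differ from those printed in Equation (4-14) in the last digit (for example, about 7.5 rather than 7.6 for the small-add variant), presumably because the dissertation used unrounded estimator output.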

Equation (4-15) shows the general TPE, which is nearly identical to Equation (4-11), the general performance-based objective function. For our example, we target a performance of 7.5 GOPS, and Equation (4-16) shows the resulting TPE.

    $\mathrm{Target\ Performance\ (MOPS)} = f \sum x_i$    (4-15)
    $328x_{As} + 328x_{Al} + 328x_{Ml} + 328x_{Mm} + 328x_{Md} = 7{,}500$    (4-16)

Our LP method creates the initial tableau for our example (Table 4-8) in the same manner as before, except that now the TPE is inserted below the objective function. As seen in Table 4-8, the TPE is similar to the FREs and requires the addition of an extra artificial variable, meaning that our LP method can never skip Phase I of the simplex algorithm when targeting a specific performance. Furthermore, the TPE can cause Phase I to fail if the designer sets the target performance above the maximum performance. A failure in Phase I means that no viable solutions exist for the given constraints, which would obviously be true with an unachievable performance constraint. Aside from the fact that Phase I is now necessary and can potentially fail, our method operates on the initial tableau in the same manner as before to create the final tableau.

Table 4-8. Initial Tableau for Power Optimization.

                  obj.   slack                   decision
                  z      s_FF   s_LUT   s_DSP    x_As    x_Al    x_Ml     x_Mm     x_Md    Constant
    Obj. Func.    1      0      0       0        7.6     33.2    152.4    113.7    34.7    0
    TPE           0      0      0       0        328     328     328      328      328     7500
    FRE           0      0      0       0        0.5     0.5     -0.5     -0.5     -0.5    0
    FF RLE        0      1      0       0        64      170     1093     734      81      10608
    LUT RLE       0      0      1       0        64      210     1133     711      32      10608
    DSP RLE       0      0      0       1        0       0       0        1        4       24

Table 4-9 shows the final tableau for the first iteration of our example. Taking the negative of the objective variable shows a minimum dynamic power consumption of 1.057 W for a dot-product kernel running on the Virtex-5 LX20T-FF323-2 at 7.5 GOPS. Table
4-9 also shows that a significant number of FFs and LUTs are unused on the device, indicating that this design is indeed not optimized for performance.

Table 4-9. Final Tableau for Power Optimization.

    obj.   slack                   decision
    z      s_FF   s_LUT   s_DSP    x_As   x_Al     x_Ml     x_Mm   x_Md
    1      0      0       26.3     0      25.7     12.4     0      0       -1056.8 = z
    0      0      1       226.3    0      146.0    195.7    0      0       4583.1 = s_LUT
    0      0      0       -0.0     1      1.0      -0.0     0      0       11.4 = x_As
    0      0      0       -0.3     0      0.0      1.3      1      0       7.3 = x_Mm
    0      1      0       217.7    0      106.0    141.3    0      0       4211.1 = s_FF
    0      0      0       0.3      0      0.0      -0.3     0      1       4.2 = x_Md

Our LP method performs the iterative process (Figure 4-1) just as before to test the benefits of higher limiting frequencies. Table 4-10 shows the results of each iteration. Once again, the first iteration produced the best operation distribution. Note that Table 4-6 shows that the final iteration of the performance optimization using only the large-add and DSP-multiply variants has a maximum performance of 4.81 GOPS, so the same subset of operation variants fails in the final iteration of power optimization because the target performance of 7.5 GOPS is too high.

Table 4-10. Iteration Results for Power Optimization.

    Operation is Available? (# if used)              Limiting       Simultaneous    Dynamic
    Small     Large    Logic    Mixed    DSP         Freq. (MHz)    Operations      Power (W)
    Add       Add      Mult.    Mult.    Mult.
    X 11.4    X        X        X 7.3    X 4.2       328            22.9            1.057
    X 10.6    X        X 4.6    -        X 6         354            21.2            1.069
    X         X        -        -        X           362            N/A             N/A
    -         X        -        -        X           401            N/A             N/A

    (X = variant available; - = variant removed)

4.3.2 Optimizing Dependability

Dependability, represented here as mean time between failures (MTBF), is also an important design goal for extreme-computing domains. In aerospace, radiation-induced failures can cause systems with low dependability to suffer from data errors, downtime, and even catastrophic failure. Although devices on Earth do not typically suffer from as many failures, a supercomputer that comprises thousands of processing devices can suffer
from a low total dependability if the failure of individual devices leads to system-wide failures. Since dependability is an important design goal in some situations, we show how our LP method can use the initial tableau for performance optimization and make two modifications (similar to the modifications for power optimization) to optimize for dependability instead.

In order to create a dependability-based objective function to replace the performance objective function, our LP method must define a linear relationship between the quantity of the operation variants and the total dependability. Unfortunately, the total MTBF of a system is not linear with respect to the MTBF of the constituent parts, thus our method uses the error rate (the reciprocal of MTBF) to define a linear relationship between the quantity of the operation variants and the total error rate and then converts the error rate to the total dependability.

Estimating error rates requires first estimating the rate of upsets induced in the device by the operating environment and then determining the rate of errors caused by the upset rate (upsets occurring in unused areas of an FPGA do not cause errors). For our example, we consider an FPGA device operating on the International Space Station, and predict an upset rate using CREME96 [16] and Virtex-5 fault-injection data [43, 47]. As with power estimations, we can estimate error rates either as per resource or per operation variant. The most accurate method measures the error rate for each operation variant by performing fault injection (radiation- or software-based) on a single instance of the operation variant. However, fault injection is beyond the scope of this work, so instead we measure the error rate for each FPGA resource using a worst-case estimation technique that assumes all configuration bits associated with a resource cause an error when upset. Table 4-11 shows the error-rate estimation results for all five operation variants in our example. Table 4-11 also shows the contributions of the FPGA resources to the total error rate of each operation variant in order to provide greater clarity on the sources of error in an FPGA, though this data is not required by our method.

Table 4-11. Virtex-5 LX20T Operation Variant Estimated Error Rates.

                           Error Rate (errors/year)
    Function    Variant    FFs     LUTs    DSPs    Total
    Add         Small      0.20    0.20    0.00    0.40
    Add         Large      0.53    0.66    0.00    1.19
    Multiply    Logic      3.43    3.56    0.00    6.99
    Multiply    Mixed      2.30    2.23    0.10    4.63
    Multiply    DSP        0.25    0.10    0.39    0.75

With the data from Table 4-11, our LP method can create the dependability-based objective function. Equation (4-17) defines the total error rate as the sum of error rates of every operation in the operation distribution. Note that the limiting frequency plays no direct role in the calculation of the total error rate because we model the error rates of the FPGA resources to be independent of the frequency (though models incorporating single-event transients could be used instead if desired). In order to maximize dependability, our LP method minimizes the error rate by defining the objective variable as the negative of the total error rate, similarly to the power-minimizing objective function from before. Equation (4-18) shows the new objective function for our example, where the objective variable $z$ is now the negative of the total error rate measured in errors per year.

    $\frac{\mathrm{Errors}}{\mathrm{Year}} = 0.40x_{As} + 1.19x_{Al} + 6.99x_{Ml} + 4.63x_{Mm} + 0.75x_{Md}$    (4-17)
    $z + 0.40x_{As} + 1.19x_{Al} + 6.99x_{Ml} + 4.63x_{Mm} + 0.75x_{Md} = 0$    (4-18)

Since our example is still targeting a performance of 7.5 GOPS, we can reuse Equation (4-16) as the TPE for optimizing dependability. With the new objective function and TPE, our LP method creates the initial tableau (Table 4-12) similarly to that for power optimization, with the TPE once again inserted underneath the new objective function. As with the power optimization, the inclusion of the TPE means that Phase I of the simplex algorithm is now necessary and may potentially fail. Our LP method operates on the initial tableau as before to create the final tableau.
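
To illustrate why the method works with error rates rather than MTBF directly, the following Python sketch evaluates Equation (4-17) for an operation distribution and converts the result to an MTBF. The function names are assumptions for this sketch, as is the 365-day year used for the conversion (the text does not state which year length was used); the per-variant rates are the Total column of Table 4-11, and the sample distribution is the rounded maximum-performance distribution from Table 4-5.

```python
# Total error rate per variant in errors/year (Table 4-11, "Total" column).
ERRORS_PER_YEAR = {
    "xAs": 0.40,
    "xAl": 1.19,
    "xMl": 6.99,
    "xMm": 4.63,
    "xMd": 0.75,
}
DAYS_PER_YEAR = 365.0  # assumption: year length used for the MTBF conversion


def total_error_rate(distribution):
    """Equation (4-17): error rates add linearly over operation instances."""
    return sum(ERRORS_PER_YEAR[name] * count for name, count in distribution.items())


def mtbf_days(errors_per_year):
    """MTBF is the reciprocal of the error rate, expressed here in days."""
    return DAYS_PER_YEAR / errors_per_year


# Rounded maximum-performance distribution from Table 4-5.
distribution = {"xAs": 15.6, "xMm": 12.8, "xMd": 2.8}
rate = total_error_rate(distribution)
print("errors/year:", round(rate, 3), "MTBF (days):", round(mtbf_days(rate), 2))

# Doubling every quantity doubles the error rate but halves the MTBF,
# which is why the LP objective is written in terms of the error rate.
doubled = {name: 2 * count for name, count in distribution.items()}
print("MTBF (days) when doubled:", round(mtbf_days(total_error_rate(doubled)), 2))
```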

Table 4-12. Initial Tableau for Optimizing Dependability.

                  obj.   slack                   decision
                  z      s_FF   s_LUT   s_DSP    x_As    x_Al    x_Ml    x_Mm    x_Md    Constant
    Obj. Func.    1      0      0       0        0.40    1.19    6.99    4.63    0.75    0
    TPE           0      0      0       0        328     328     328     328     328     7500
    FRE           0      0      0       0        0.5     0.5     -0.5    -0.5    -0.5    0
    FF RLE        0      1      0       0        64      170     1093    734     81      10608
    LUT RLE       0      0      1       0        64      210     1133    711     32      10608
    DSP RLE       0      0      0       1        0       0       0       1       4       24

Table 4-13 shows the final tableau for the first iteration of our example. Taking the negative of the objective variable shows a minimum error rate of 41.3 errors/year (or a maximum MTBF of 8.83 days) for a dot-product kernel running on the Virtex-5 LX20T-FF323-2 at 7.5 GOPS. Except for the top row, the final tableaus in our example for the first iteration of the power and dependability optimizations are identical, meaning that this operation distribution is simultaneously ideal for power and dependability.

Table 4-13. Final Tableau for Dependability Optimization.

    obj.   slack                   decision
    z      s_FF   s_LUT   s_DSP    x_As   x_Al     x_Ml     x_Mm   x_Md
    1      0      0       1.3      0      0.8      1.1      0      0       -41.3 = z
    0      0      1       226.3    0      146.0    195.7    0      0       4583.1 = s_LUT
    0      0      0       -0.0     1      1.0      -0.0     0      0       11.4 = x_As
    0      0      0       -0.3     0      0.0      1.3      1      0       7.3 = x_Mm
    0      1      0       217.7    0      106.0    141.3    0      0       4211.1 = s_FF
    0      0      0       0.3      0      0.0      -0.3     0      1       4.2 = x_Md

Our LP method performs the iterative process (Figure 4-1) as before to test the benefits of higher limiting frequencies. Table 4-14 shows the results of each iteration. Surprisingly, even though each iteration produces the same operation distributions as were produced for the power optimization, the second iteration is actually the optimal choice for dependability optimization with an MTBF of 8.93 days. For a fixed target performance, increasing the limiting frequency requires a proportional decrease in the number of simultaneous operations. Since power is proportional to both frequency and quantity of simultaneous operations, increasing the limiting frequency with a fixed target
performance has no direct effect on power consumption. However, since the total error rate is only proportional to the quantity of simultaneous operations and not to the frequency, increasing the limiting frequency directly improves dependability (although limiting the availability of certain operation variants may still produce overall worse results for higher frequencies).

Table 4-14. Iteration Results for Dependability Optimization.

    Operation is Available? (# if used)              Limiting       Simultaneous    Dependability
    Small     Large    Logic    Mixed    DSP         Freq. (MHz)    Operations      (MTBF: days)
    Add       Add      Mult.    Mult.    Mult.
    X 11.4    X        X        X 7.3    X 4.2       328            22.9            8.833
    X 10.6    X        X 4.6    -        X 6         354            21.2            8.925
    X         X        -        -        X           362            N/A             N/A
    -         X        -        -        X           401            N/A             N/A

    (X = variant available; - = variant removed)

4.4 Results and Analysis

This section discusses the setup and results of two case studies, each using a different kernel in order to test a variety of capabilities for a wide-range evaluation of our LP method's effectiveness. Section 4.4.1 concludes this paper's example of using a base case study with the Virtex-5 LX20T running a dot-product kernel to experimentally determine the minimum achievable power for various performance values using this setup. Section 4.4.2 introduces a more complex case study involving a wide range of Virtex-5 devices running distance-calculation kernels, which significantly increases the number of operation variants analyzed in our method.

4.4.1 Base Case Study: Dot Product

Our base case study tests our LP method's predictions of this paper's base example involving a Virtex-5 LX20T device running a dot-product kernel. To test these predictions, we create multiple designs of a dot-product kernel on the Virtex-5 LX20T using various combinations of the five operation variants defined in Tables 4-2, 4-7, and 4-11. After placing and routing the designs in Xilinx ISE with a high effort level, we obtain the frequency of the design, which we multiply by the design's number of operations to
calculate the design's performance in GOPS. We then import the designs into the Xilinx Power Estimator tool to estimate each design's power. Because fault injection is a complicated process that is beyond the scope of this work, we do not measure the dependability of the designs. By comparing the power and performance of various design sizes and operation distributions, we measure the minimum required dynamic power for a given performance and determine which designs are actually most power-efficient for various ranges of performance.

    $a \cdot b = \sum a_i b_i$    (4-19)

Equation (4-19) defines a dot-product operation for vectors $a$ and $b$, where $a_i$ and $b_i$ are the $i$th entries in vectors $a$ and $b$, respectively. A single dual-port block RAM supplies each set of corresponding 32-bit integer input entries (i.e., $a_i$ and $b_i$), and a simple address generator drives each of these block RAMs. For each block RAM, a single 32-bit integer multiply operation calculates the product of the block RAM's two 32-bit integer values, and outputs the 64-bit integer product. Finally, an add-tree consisting of 64-bit integer add operations sums all of these products in parallel and outputs the final answer. The design is fully pipelined, meaning the device performs a full dot-product calculation every cycle. We vary the performance per cycle of the dot-product kernel by varying the length of the input vectors.

Figure 4-2 shows our LP method's predictions, as well as the most power-efficient designs for various performance values in GOPS. Predictions show that designs should only use the DSP-multiply variant for multiply operations when targeting a performance below 4.35 GOPS. At 4.35 GOPS, all of the DSP units are needed to keep up with input vectors of length six. Above 4.35 GOPS, increasing the length of the input vectors requires additional multiply operations, which requires using the remaining unused logic resources. As we add more power-hungry, logic-multiply operations, dynamic power consumption rises sharply. For targeted performances of 6.88 GOPS and above, the most power-efficient designs completely avoid the logic-multiply variant. Instead, mixed-multiply operations
gradually replace some of the DSP-multiply operations, starting with around 57% of the multiply operations being mixed-multiply variants and progressing to around 82%. Our method predicts that the maximum performance operation distribution consists of 15.6 small-add, 2.8 DSP-multiply, and 12.8 mixed-multiply operations, for a total performance of 10.22 GOPS and dynamic power consumption of 1.669 W. Finally, predictions show that using the large-add variant does not increase maximum performance or power efficiency. The higher frequency of the large-add variant does not improve designs using mixed or logic-multiply variants, which operate at a lower frequency. Even when the mixed and logic-multiply variants are absent in lower-performance designs, the extra logic resource overhead of the large-add variant offsets the power-efficiency of the DSP-multiply variant.

Figure 4-2. Power predictions and results for designs using minimum power on Virtex-5 LX20T running dot-product kernel.

Figure 4-2 shows the experimental results, which largely confirm our LP method's predictions. For greatest power efficiency, the large-add variant is never beneficial,
low-performance designs require only the DSP-multiply variant, mid-performance designs require both logic and DSP-multiply variants, and high-performance designs require mixed and DSP-multiply variants. We achieve the highest performance of 7.60 GOPS using two DSP-multiply and twelve mixed-multiply operations. Furthermore, for the highest possible power efficiency (calculated as performance divided by power consumption), our method suggests using only small-add and DSP-multiply operations and avoiding filling up the device any further with the other variants. This insight may be important if the cost of power (or lack of power) is the designer's primary concern rather than maximum performance or the cost of hardware. In such a situation, using multiple devices at less-than-maximum capacity may be preferable to using more total power on fewer fully utilized devices.

Figure 4-3. Frequency predictions and results for designs using minimum power on Virtex-5 LX20T running dot-product kernel.

Several effects are responsible for the differences between our LP method's predictions and the experimental results. The primary effect comes from the difference between the
predicted and achievable frequencies, shown in Figure 4-3. The increased complexity of a full design over just a single instance of an operation variant increases the difficulty of placing and routing the design, which can lead to lower achievable frequencies. A reduction in frequency results in a proportional reduction in performance and dynamic power consumption, causing the points in Figure 4-2 to shift towards the origin. For this reason, graphs of experimental results should look like shrunken versions of our predicted results.

Smaller secondary effects result in additional small experimental differences from predictions. The logic overhead required to support the block RAMs and block RAM address generation results in a slight increase in dynamic power consumption for all designs. For designs using logic and mixed-multiply variants, extra registers are needed to efficiently pipeline the designs and meet timing. These extra registers significantly increase the dynamic power consumption of the mid/high-performance designs and limit designs from being able to handle vector lengths of fifteen as predicted, resulting in decreases to the maximum achievable performance as well. Finally, the larger than expected drop in frequency when adding the logic-multiply variant results in a larger than expected increase in dynamic power consumption at around 4 GOPS.

4.4.2 Complex Case Study: Distance Calculation

Our complex case study is similar to the base case study but increases the complexity of our LP method's analysis by focusing on a 32-bit floating-point distance-calculation kernel, which involves a greater number of function types and operation variants. Furthermore, this case study tests our method's performance optimization across the full range of sizes in the Virtex-5 LXT subfamily and tests power optimization on the mid-size Virtex-5 LX85T device.

Distance calculations are common in numerous applications in supercomputing (e.g., physical simulations and complex-value mathematics) and aerospace (e.g., star tracking using planar triangles [59]). The distance-calculation kernel used in this case
study involves repeatedly performing the 2D-distance calculation shown in Equation (4-20) on a series of $a$ and $b$ vectors, where $a$ and $b$ are 2D-Cartesian coordinate vectors with 32-bit floating-point values for the $x$ and $y$ entries. Similar to the dot-product kernel, dual-port block RAMs driven by address generators supply each of the corresponding entries of the input vectors, requiring one block RAM for the $x$ coordinates and one for the $y$ coordinates. For each block RAM, a subtract operation computes the difference of the block RAM's two 32-bit floating-point values, and the result goes to both inputs of a multiply operation to compute the squared difference between the corresponding coordinates. An addition operation then sums these two results and passes the sum to a square-root operation, which outputs the final 32-bit floating-point answer. The described design represents a single distance-calculation core, which is fully pipelined and therefore performs a distance calculation every cycle, and which consists of two subtract, two multiply, one add, and one square-root operations. We can vary the performance per cycle of the kernel by increasing the number of distance-calculation cores included in the design.

    $d = \sqrt{(a_x - b_x)^2 + (a_y - b_y)^2}$    (4-20)

To demonstrate the capabilities of our LP method, we consider all operation variants available in the floating-point library of the Xilinx CORE Generator System for each function type in our design. For both the add and subtract function types, there is a logic-only variant and a DSP variant that uses two DSP units. Since the subtract variants are so similar to the corresponding add variants, we group the subtract variants with the corresponding add variants for our LP method's analysis to simplify the results without sacrificing any accuracy. For the multiply function type, there is a logic-only multiply variant, a medium-multiply variant that uses one DSP unit, a full-multiply variant that uses two DSP units, and a max-multiply variant that uses three DSP units. Only the logic-only variant is available for the square-root function type. For each device in this case study, we measure the resource consumptions, maximum achievable frequency, and
dynamicpowerconsumptionofeachoperationvariantusingXilinxISEwithahigh-eortplaceandroutelevelandtheXilinxPowerEstimatortool.Table 4-15 showstheresultsofthesemeasurementsforonlytheVirtex-5LX85T-FF1136-3,buttheseresultsaresimilaracrosstheentiresetofstudieddevices,withtheexceptionofproportionalreductionsinachievablefrequenciesfordevicesofaslowerspeedgrade. Table4-15.Virtex-5LX85TOperationVariantProperties. MaxFreq.D.PowerFunctionVariantFFsLUTsDSPs(MHz)(mW/MHz) Add/SubLogic54641605080.213Add/SubDSP32723025040.153MultiplyLogic68161904540.283MultiplyMedium36825814930.147MultiplyFull17110325150.0969MultiplyMax1069034970.0966SquareRootLogic76553105030.266 ThiscasestudyinvestigatesperformanceoptimizationonalleightuniquesizesoftheVirtex-5LXTsubfamilytoshowhowourLPmethodperformsonawiderangeofFPGAsizes.WefocusthecasestudyontheVirtex-5becausethisfamilyofFPGAshasalowerDSP-to-logicratiothanthemoremodernVirtex-6andVirtex-7FPGAs.UsingdeviceswithlowerDSP-to-logicratioshelpstodemonstrateourLPmethod'sdecision-makingprocess,sincethemostpowerecientcomputationalresourcesonthedevice(i.e.,DSPunits)arelimitedandmustbeusedintelligently.WefocusthecasestudyontheLXTsubfamilyoftheVirtex-5familybecausethissubfamilyhasthelargestrangeofresourceamounts,withthelargestmember,theVirtex-5LX330T,containingoversixteentimesthenumberoflogicresourcesasthesmallestmember,theVirtex-5LX20T.Sincepackagesizeandspeedgradehavenegligibleandpredictableeectsrespectively,weonlyinvestigatethefastestspeedgradeonthesmallestpackageforeachuniquesizeoftheVirtex-5LXTsubfamily.Table 4-16 showsourLPmethod'spredictionsforthehighest-performancedesignsoneachdevice,aswellasthehighest-performancedesignsthatwecanexperimentally 87

achieve. As with the base case study, we test these predictions by creating many designs of the distance-calculation kernel on each device using various combinations of the nine available operation variants. After placing and routing the designs in Xilinx ISE with a high-effort level, we obtain the frequency of the design, which we multiply by the number of operations in the design to calculate the performance of the design in GOPS. We select the highest-performing design for each device and report the features of these designs in Table 4-16, including the operation distributions of the design, the percent of the device's total resources that the design uses, and the percent of predicted performance that we can achieve.

Table 4-16. Predictions and Results of Performance Optimization for Distance-Calculation Kernel.

Virtex-5   Variants Predicted                         Variants Used                              Resource Use %     Perf. (GOPS)     Achieved
Device     Add/Sub              Mult.                 Add/Sub              Mult.                 FF    LUT   DSP    Pred.   Result   RU
LX20T      70% Log., 30% DSP    Full                  75% Log., 25% DSP    Full                  88%   59%   92%    11.0    9.4      86%
LX30T      81% Log., 19% DSP    Full                  78% Log., 22% DSP    Full                  87%   58%   100%   18.3    15.9     87%
LX50T      81% Log., 19% DSP    Full                  78% Log., 22% DSP    Full                  87%   58%   100%   27.5    19.0     69%
LX85T      Logic                42% Med., 58% Full    Logic                40% Med., 60% Full    89%   61%   100%   44.8    28.9     65%
LX110T     Logic                42% Med., 58% Full    Logic                40% Med., 60% Full    89%   61%   100%   58.4    38.6     66%
LX155T     Logic                89% Full, 11% Max     Logic                93% Med., 7% Max      84%   57%   91%    90.0    59.3     66%
LX220T     Logic                41% Med., 59% Full    Logic                39% Med., 61% Full    82%   54%   91%    102.6   57.0     56%
LX330T     Logic                41% Med., 59% Full    Logic                33% Med., 67% Full    79%   54%   94%    156.7   82.4     53%

Table 4-16 shows that our LP method accurately predicts the correct operation distribution that produces the optimal-performance design. For the smaller devices that have higher DSP-to-logic ratios, it correctly predicts that some of the add operations should be of the DSP variant and all of the multiply operations should be of the full

variant, which uses two DSPs and a small amount of logic. For the larger devices with lower DSP-to-logic ratios, it correctly recommends using more logic-centric variants by using only the logic variant for the add operations and a combination of medium and full variants for the multiply operations. Average resource utilizations of 85.5% for FFs and 95.8% for DSPs confirm our LP method's assumptions of a 15% overhead for logic resources and 0% overhead for DSPs. We could not test the overhead for LUTs in this case study because every operation variant uses more FFs than LUTs, and every tested device contains an equal number of FFs and LUTs, so LUTs can never be a limiting resource. We also note that it is possible to use more than 85% of logic resources in a design, but doing so increases design complexity and reduces achievable frequencies, which ultimately offsets any advantage from an increased number of operations and reduces performance. Therefore, the 15% logic overhead value does not represent a hard limit, but is useful when optimizing for maximum performance.

Since our LP method is based on the CD methodology, which predicts the theoretical maximum performance for a device, we expect the experimentally achieved maximum performance for each device to be some proportion of our maximum performance prediction. Richardson et al. [60] describe a realizable utilization (RU) metric to quantify the difference between theoretical device performance shown by CD and the performance designers can achieve. For smaller devices, the RU score reaches around 86%, but as device size increases, the RU score steadily falls to as low as 52%. The discrepancy between predicted design performance and achievable performance is primarily caused by a discrepancy in design frequencies, which proportionately affects performance and dynamic power consumption. The Xilinx ISE place and route process is able to obtain slightly higher frequencies for the operation variants when measured in isolation than when the operation variants are included in an entire design, and this effect increases for the larger devices as routing complexity increases. Overall, the theoretical maximum performance

predicted by our method is a good first-order estimate of achievable performance on a particular device, and any known RU scores for similar devices running similar applications can enhance our predictions even further. Furthermore, even without a priori RU scores, our LP method serves as a useful tool in predicting the relative performance between similarly sized devices.

To test our LP method's power optimization on this more complex case study, we investigate the mid-sized Virtex-5 LX85T across a range of performances. With knowledge of the Virtex-5 LX85T RU score, we scale the operating frequencies of the device's operation variants to 64.5% of their measured values in order to improve our LP method's accuracy. Figure 4-4 shows our minimum dynamic-power predictions, as well as the most power-efficient designs achievable for various performance values. Our LP method predicts that designs should only use the DSP add and max multiply variants when targeting a performance below 7.68 GOPS. At 7.68 GOPS, all of the DSP units are needed to keep up with four simultaneous distance-calculation cores. Just after 7.68 GOPS, our method suggests trading the max multiply operations for full multiply operations until 9.22 GOPS, where all multiply operations are of the full variant. After 9.22 GOPS, logic add operations start replacing DSP add operations until 23.3 GOPS, where all add operations are of the logic variant. Beyond 23.3 GOPS, our LP method recommends replacing full multiply operations with medium multiply operations until 28.90 GOPS, where a design consisting of 15.15 distance-calculation cores is using all available FF and DSP resources.

Figure 4-4 shows experimental results that confirm our LP method's predictions for minimum power after accounting for the device's RU score. In comparison to the base case study, the logic overhead required to support the block RAMs and block RAM address generators is relatively small as compared to the increased number of operations required to operate on the data from the block RAMs, so the extra power increase is smaller than in the previous case study as well. Additionally, unlike with the base case study, the

distance-calculation kernel does not require extra power-consuming registers to help with efficient pipelining.

Figure 4-4. Power predictions and results for designs using minimum power on Virtex-5 LX85T running distance-calculation kernel.

Finally, our LP method suggests that full multiply operations should progressively replace max multiply operations between 7.67 and 9.22 GOPS, but this is untrue experimentally. This discrepancy occurs because our method assumes that the quantities of resources, operations, and distance-calculation cores are continuous values rather than integer values. At 7.67 GOPS, the minimum-power design uses only DSP add and max multiply operations to create four distance-calculation cores. However, at 9.22 GOPS, our method recommends using only the full variant for multiply operations, which allows for a maximum of 4.8 distance-calculation cores before the design requires all of the DSP units. Since the max multiply variant is best when using only four distance-calculation cores, and we cannot actually design a fraction of a distance-calculation core, designs using full multiply operations without using logic add operations are never power-optimal.
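The integer-versus-continuous effect described above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the total of 48 DSP units is inferred from the statement that four cores built from DSP add and max multiply variants consume every DSP unit, the count of three add/subtract and two multiply operations per core follows the kernel description, and the function name is hypothetical.

```python
import math

# DSP units consumed per operation variant (Table 4-15).
DSP_PER_DSP_ADD = 2
DSP_PER_FULL_MULT = 2
DSP_PER_MAX_MULT = 3

# Operations in one distance-calculation core that can be mapped to DSP units.
ADD_SUB_PER_CORE = 3
MULT_PER_CORE = 2

# Assumption: inferred from "four cores need all of the DSP units"
# (4 cores x 12 DSPs per core with DSP add and max multiply variants).
TOTAL_DSPS = 48


def dsp_limited_cores(dsp_per_add_sub, dsp_per_mult, total_dsps=TOTAL_DSPS):
    """Continuous (LP-style) number of cores that the DSP budget allows."""
    dsps_per_core = (ADD_SUB_PER_CORE * dsp_per_add_sub
                     + MULT_PER_CORE * dsp_per_mult)
    return total_dsps / dsps_per_core


max_variant = dsp_limited_cores(DSP_PER_DSP_ADD, DSP_PER_MAX_MULT)
full_variant = dsp_limited_cores(DSP_PER_DSP_ADD, DSP_PER_FULL_MULT)

# A real design can only instantiate whole cores, so round down.
print("max multiply: ", max_variant, "->", math.floor(max_variant), "cores")
print("full multiply:", full_variant, "->", math.floor(full_variant), "cores")
```

Under these assumptions both variants round down to four whole cores, so the full variant's fractional 0.8 of a core cannot be realized, while Table 4-15 lists the full variant with more FFs, more LUTs, and slightly higher dynamic power per MHz than the max variant.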

Fortunately, this issue with using continuous values is minimal when the size of the device is significantly larger than the size of an application's computational cores, so most of our other predictions are accurate.

4.5 Conclusions

In this chapter, we have introduced our LP method, an effective tool for exploring early designs by quickly determining the optimal operation distribution for a particular device and application with respect to performance, power, or dependability and calculating quantitative metrics for design comparison purposes. Our LP method is a more powerful generalization of the established CD methodology, and still requires that designers characterize the resource usage and operating frequency of any operations considered for a design.

To demonstrate our LP method's capabilities, we conducted a base and a complex case study. The base case study analyzed power optimization for a dot-product kernel on the smallest Virtex-5 device. The complex case study analyzed performance optimization on a wide range of Virtex-5 devices and again analyzed power optimization, but on a mid-size Virtex-5 device as opposed to the smallest device in the base case study. The complex case study also increased the complexity of the analysis by investigating a distance-calculation kernel involving a more diverse set of function types and operation variants. Results of the case studies show that our method accurately predicts the operation distribution (within an average of 4% of actual values) that can achieve maximum performance and provides reasonable estimates for the amount of performance a designer can reach. Results also show that our method can accurately recommend operation distributions for creating designs that achieve a given performance with minimum power consumption, although these predictions may need to be supplemented with the device's RU score to give absolute rather than relational information. Overall, the results demonstrate that our method can help designers to compare devices, predict design metrics, and select the optimal operation distribution to best meet their design goals.
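As a rough illustration of the kind of calculation the LP method performs, the sketch below reproduces the Virtex-5 LX85T performance prediction reported in this chapter (about 15.15 cores and 44.8 GOPS). It is not a general LP solver and is not the framework's implementation: it assumes that the FF and DSP constraints are both binding and that only logic add, medium multiply, full multiply, and logic square-root variants are used, as the chapter reports for this device. The device totals of 51,840 FFs and 48 DSP units are assumptions taken from the vendor data sheet rather than from the text, and all names are hypothetical.

```python
# FF use, DSP use, and maximum frequency per operation variant (Table 4-15).
VARIANTS = {
    "add_logic":   {"ffs": 546, "dsps": 0, "mhz": 508},
    "mult_medium": {"ffs": 368, "dsps": 1, "mhz": 493},
    "mult_full":   {"ffs": 171, "dsps": 2, "mhz": 515},
    "sqrt_logic":  {"ffs": 765, "dsps": 0, "mhz": 503},
}

# Assumptions (not stated in the text): LX85T totals from the vendor data sheet.
TOTAL_FFS = 51_840
TOTAL_DSPS = 48
LOGIC_OVERHEAD = 0.15  # 15% logic overhead assumed by the LP method
OPS_PER_CORE = 6       # 3 add/subtract, 2 multiply, 1 square root


def predict_lx85t():
    """Solve the two binding constraints (FFs and DSPs) in closed form."""
    ff_budget = (1.0 - LOGIC_OVERHEAD) * TOTAL_FFS
    ff_add = VARIANTS["add_logic"]["ffs"]
    ff_med = VARIANTS["mult_medium"]["ffs"]
    ff_full = VARIANTS["mult_full"]["ffs"]
    ff_sqrt = VARIANTS["sqrt_logic"]["ffs"]

    # Unknowns: x cores, a medium multipliers, b full multipliers.
    #   a + b  = 2x           (two multiplies per core)
    #   a + 2b = TOTAL_DSPS   (DSP constraint, binding)
    # so b = TOTAL_DSPS - 2x and a = 4x - TOTAL_DSPS.
    # FF constraint (binding):
    #   (3*ff_add + ff_sqrt)*x + ff_med*a + ff_full*b = ff_budget
    numerator = ff_budget + TOTAL_DSPS * (ff_med - ff_full)
    denominator = 3 * ff_add + ff_sqrt + 4 * ff_med - 2 * ff_full
    cores = numerator / denominator

    medium = 4 * cores - TOTAL_DSPS
    full = TOTAL_DSPS - 2 * cores
    medium_share = medium / (medium + full)

    # The slowest variant in the design sets the design frequency.
    slowest_mhz = min(v["mhz"] for v in VARIANTS.values())
    gops = cores * OPS_PER_CORE * slowest_mhz / 1000.0
    return cores, medium_share, gops


cores, medium_share, gops = predict_lx85t()
print(f"cores: {cores:.2f}")
print(f"medium multiply share: {medium_share:.0%}")
print(f"predicted performance: {gops:.1f} GOPS")
```

Multiplying the predicted performance by the 64.5% RU scaling used for the power study gives roughly 28.9 GOPS, which matches the experimental result in Table 4-16.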

CHAPTER 5
MEMORY-AWARE OPTIMIZATION OF FPGA-BASED SPACE SYSTEMS

In order to demonstrate the flexibility of our framework and significantly expand our framework's capabilities, we leverage the versatile LP method to enable our framework's memory extension. This chapter is organized as follows. Section 5.1 discusses the background and related work that provides the foundation of our framework's memory extension. In Section 5.2, we discuss our approach in modifying our framework's analysis to consider the EMB/IMC relationship and perform memory-aware analysis. Section 5.3 presents an overview of the HSI case study and demonstrates how a designer can analyze the three memory resources to determine an application's EMB/IMC relationship. Finally, in Section 5.4, we show the results of three experiments to demonstrate the use and effectiveness of our framework's memory extension.

5.1 Background and Related Work

Several prior works have shown the effects of various memory resources on performance and the importance of memory-aware analysis. Underwood and Hemmert [61] analyzed vector dot-product, matrix-vector multiply, and matrix multiply implementations on a Virtex-II 6000 FPGA. The authors discussed the importance of IMB for sustaining the performance of the floating-point operations, and they predicted that as FPGA floating-point performance increases, the FPGA's inherently high IMB would enable FPGAs to outperform traditional CPUs, which are more likely to be IMB-limited. Dou et al. [62] also analyzed matrix multiplication on FPGAs, focusing on a particular matrix multiplication algorithm to show that the algorithm's required EMB was proportional to the inverse square root of the IMC usage. The authors verified this result by implementing the matrix multiplication algorithm on a Virtex-II Pro FPGA and varying the EMB and IMC usage to achieve different performance.

5.2 Memory-Aware Analysis

Our framework's memory-aware analysis enhances our original framework's analysis by including the effects of the memory resources on power and dependability. We enable our framework's memory-aware analysis with two separate extensions: an internal-memory extension and an external-memory extension. First, we introduce a high-level view of our memory-resource concepts and analysis. Then we use these concepts to show how we add the internal-memory extension by extending our LP method. We then show how we add the external-memory extension by embedding our LP method in a wrapper algorithm.

5.2.1 Memory-Resource Concepts and Analysis

Figure 5-1 depicts a high-level memory model of the three memory resources' interactions. IMC (i.e., BRAM storage) buffers data before the operators process this data or send this data to an external-memory device for storage. IMB is the on-chip data bandwidth between the IMC and the operators. EMB is the off-chip data bandwidth between the IMC and any arbitrary number of off-chip external-memory devices via an equal number of on-chip external-memory ports (Figure 5-1 shows an example with two external-memory devices).

Figure 5-1. High-level memory model of memory-resource interactions.

In addition to representing the number and types of operators, the device configuration now represents the amount of IMC usage, IMB, EMB, and external-memory ports on the FPGA. The LP method must now find design configurations that match or exceed the limits set by mission-specific CD, IMB, and EMB requirements. Calculating the mission's required IMB is straightforward since IMB depends only on the operators' inputs and outputs, which the mission's required CD describes. However, calculating the mission's required EMB is more complicated, since EMB depends on the required CD, the IMC usage, and the mission's application.

A cursory analysis of the memory model reveals that the required EMB must inversely correlate with IMC usage, which we can observe by varying the amount of IMC usage. With almost no IMC usage, almost every operator input would require both IMB and EMB, since the IMC could not cache the data for reuse. Conversely, with an infinite amount of IMC, the input data would only require EMB once to transfer the data to IMC, and then each data use by the operators would only require IMB, resulting in a negligibly low EMB requirement. Therefore, for realistic design configurations with IMC usage between these two extremes, the amount of required EMB must be between 0 and the required IMB. Consequently, we can only accurately predict the EMB/IMC relationship after we analyze the mission's application's memory requirements.

5.2.2 Internal-Memory Extension

Our LP method finds the optimal device configuration that satisfies the FPGA resources' constraints. Our LP method defines the optimal device configuration by building an optimizing equation that describes the impact of various operators on the optimization target, which our framework can set to power or dependability. Before our framework can assess the impact of an operator, either the designer or our framework's device database must specify the properties of the logic resources that compose the operators. We consider three logic resources on a standard FPGA: FFs, LUTs, and DSP units. For power, we use vendor-provided tools to estimate the power per MHz of each

logic resource and input these power values into our framework's device database. Our framework measures dependability in terms of errors per day. The designer must specify the error rates for each resource, since the error rates will change significantly based on the mission's environment. Predicting error rates generally involves combining radiation-test results with radiation-environment models such as CREME96 [16] or SPENVIS [40]. Using the power and error-rate properties of the logic resources, our framework can determine the power and error rate of each operator and build the optimizing equation for our LP method.

Our framework must also build a resource-limit equation (RLE) for each logic resource, ensuring our LP method ignores impossible device configurations that use more logic resources than are available. An RLE defines the resource quantity on the FPGA and the resource consumption for each operator.

Table 5-1 shows our estimates of the power and error rates for the three logic resources on a Virtex-4, Virtex-5, and Virtex-6, as well as four additional FPGA resources for analyzing IMB, IMC, and EMB. We obtain the power estimations from the vendor-provided Xilinx Power Estimator tool. For the error rates, we use CREME96 to predict the error rate of a configuration bit for each device in the same orbit as the EO-1 satellite from our case study mission. We calculate the error rate of each resource as the number of configuration bits used to program the resource times the error rate of a single configuration bit [63]. This is a worst-case estimation technique that assumes all bits associated with a resource are able to cause an error. In reality only about 10% of the bits in a fully-utilized FPGA are able to cause an error [27], since most of the configuration bits go towards unused routing resources, but this effect is application- and design-dependent. For the BRAM memory bits, which represent a single bit of IMC and not the BRAM structure itself, we use the adjusted error rate of a single configuration bit based on vendor-provided neutron-injection results [64]. Note that we have formed the error-rate estimates purely for the purpose of demonstrating our framework's analysis in

this chapter. Although these error-rate estimates are justified, these estimates may or may not reflect more accurate results obtained through beam testing.

Table 5-1. Estimated power and error rates for Virtex-4, Virtex-5, and Virtex-6 FPGA resources.

Device Family   Resource               Power (nW/MHz)   Errors/day
Virtex-4        FF                     204              5.76 × 10^-4
Virtex-4        LUT                    308              5.76 × 10^-4
Virtex-4        DSP                    30000            1.76 × 10^-2
Virtex-4        BRAM (config. bits)    59856            3.35 × 10^-2
Virtex-4        BRAM memory bit        N/A              9.26 × 10^-6
Virtex-4        Pin (input mode)       712              6.29 × 10^-3
Virtex-4        Pin (bidir/out mode)   33818            6.29 × 10^-3
Virtex-5        FF                     138              1.58 × 10^-4
Virtex-5        LUT                    175              1.58 × 10^-4
Virtex-5        DSP                    18000            4.93 × 10^-3
Virtex-5        BRAM (config. bits)    71572            1.06 × 10^-2
Virtex-5        BRAM memory bit        N/A              6.50 × 10^-6
Virtex-5        Pin (input mode)       917              1.90 × 10^-3
Virtex-5        Pin (bidir/out mode)   30988            1.90 × 10^-3
Virtex-6        FF                     111              1.99 × 10^-4
Virtex-6        LUT                    140              3.97 × 10^-4
Virtex-6        DSP                    14332            1.24 × 10^-2
Virtex-6        BRAM (config. bits)    57258            2.65 × 10^-2
Virtex-6        BRAM memory bit        N/A              2.50 × 10^-6
Virtex-6        Pin (input mode)       917              7.77 × 10^-3
Virtex-6        Pin (bidir/out mode)   33486            7.77 × 10^-3

The internal-memory extension modifies our LP method in several ways to consider the effects of internal memory, producing our internal-memory LP (IMLP) method. First, our IMLP method ignores unsustainable device configurations, where the operators require more IMB than is available, by adding an IMB-resource limiting equation (IMB-RLE) to the set of RLEs. To define the IMB resource quantity, our framework calculates the maximum IMB/cycle as:

$$\frac{\text{IMB}}{\text{cycle}} = [\text{\# of BRAMs}] \times \frac{\text{\# of Ports}}{\text{BRAM}} \times \frac{\text{Usable Bytes}}{\text{Port}} \tag{5-1}$$

$$\frac{\text{Usable Bytes}}{\text{Port}} = \left\lfloor \frac{\text{BRAM Port Width}}{\text{Word Width}} \right\rfloor \times \frac{\text{Bytes}}{\text{Word}} \tag{5-2}$$
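Equations (5-1) and (5-2) translate directly into integer arithmetic. The following is a minimal sketch with hypothetical function names; the example call uses the Virtex-5 LX330 figures quoted in the text (288 true dual-port BRAMs, 36-bit ports, and 32-bit words of four bytes each).

```python
def usable_bytes_per_port(port_width_bits, word_width_bits, bytes_per_word):
    """Equation (5-2): only whole words fit through a BRAM port."""
    whole_words = port_width_bits // word_width_bits
    return whole_words * bytes_per_word


def imb_per_cycle(num_brams, ports_per_bram, port_width_bits,
                  word_width_bits, bytes_per_word):
    """Equation (5-1): maximum internal-memory bandwidth in bytes per cycle."""
    per_port = usable_bytes_per_port(port_width_bits, word_width_bits,
                                     bytes_per_word)
    return num_brams * ports_per_bram * per_port


# Virtex-5 LX330 example from the text.
lx330_imb = imb_per_cycle(num_brams=288, ports_per_bram=2,
                          port_width_bits=36, word_width_bits=32,
                          bytes_per_word=4)
print(lx330_imb)  # 2304 bytes/cycle
```

The floor division is what discards the four spare bits of a 36-bit port when the word width is 32 bits.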

For example, on the Virtex-5 LX330, only one whole 32-bit word can fit through a BRAM port (36-bit width), so there are four bytes per port. All of the 288 BRAMs on the Virtex-5 LX330 are true dual-port, so the IMB resource quantity is 2,304 bytes/cycle. To complete the IMB-RLE, our framework defines the IMB-resource consumption as the number of bytes from IMB required by each operator per cycle. For our case study application (Section 5.3), each multiply operator requires two words from the IMB every cycle. Because the outputs of the multiply operators supply the inputs of the add operators, the add operators do not directly consume IMB data. Since there are an equal number of add and multiply operators, the case study application requires an average of four bytes from IMB per operator per cycle. A quick analysis of the IMB-resource quantity and average IMB-resource consumption shows that the Virtex-5 LX330 can only sustain 576 multiply and 576 add operators for the case study application.

Although the IMB-RLE restricts our IMLP method to only considering sustainable device configurations, the IMB-RLE does not enable our IMLP method to optimize for power or dependability according to IMB. The internal-memory extension also enables our IMLP method to consider IMB when optimizing a device configuration by modifying our IMLP method's optimization equation. The optimization equation defines the impact of each operator in terms of power or dependability, depending on the optimization target. The internal-memory extension adjusts these impact values based on how much IMB each operator consumes. Furthermore, the calculations for power and dependability assume that using a BRAM at less than maximum capacity will decrease the BRAM's power consumption and error rate accordingly. For power, our framework uses the power per BRAM to calculate the power per byte of IMB and adds the result to each operator based on the operator's IMB consumption. For dependability, the designer must specify an error rate per BRAM (not including BRAM storage bits for IMC). As with the power values, our framework then uses the BRAM error rate to calculate the error rate per byte of IMB and adds the result to each operator based on the operator's IMB consumption.
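The IMB adjustment described above can be sketched as follows. This is an illustration under stated assumptions rather than the framework's code: it treats the "BRAM (config. bits)" row of Table 5-1 as the per-BRAM power and error rate, assumes a true dual-port BRAM moving one 4-byte word per port per cycle, and borrows the resource counts of the full multiply variant from Table 4-15 as a stand-in operator. All names are hypothetical.

```python
# Virtex-5 values from Table 5-1: (power in nW/MHz, errors per day).
VIRTEX5 = {
    "ff":   (138.0, 1.58e-4),
    "lut":  (175.0, 1.58e-4),
    "dsp":  (18000.0, 4.93e-3),
    "bram": (71572.0, 1.06e-2),  # "BRAM (config. bits)" row, assumed per BRAM
}

# Assumption: true dual-port BRAM, one 4-byte word per port per cycle.
IMB_BYTES_PER_BRAM = 2 * 4


def operator_impact(ffs, luts, dsps, imb_bytes_per_cycle, table=VIRTEX5):
    """Return (power in nW/MHz, errors/day) for one operator.

    The operator's IMB consumption is charged as the fraction of a BRAM's
    power and error rate that corresponds to the bytes it reads per cycle.
    """
    counts = {
        "ff": ffs,
        "lut": luts,
        "dsp": dsps,
        "bram": imb_bytes_per_cycle / IMB_BYTES_PER_BRAM,
    }
    power = sum(counts[name] * table[name][0] for name in counts)
    errors = sum(counts[name] * table[name][1] for name in counts)
    return power, errors


# Illustrative multiply operator reading two 4-byte words per cycle.
power, errors = operator_impact(ffs=171, luts=103, dsps=2,
                                imb_bytes_per_cycle=8)
print(f"{power:.0f} nW/MHz, {errors:.6f} errors/day")
```

An add operator that takes its inputs from the multipliers would be evaluated with `imb_bytes_per_cycle=0`, so only its logic resources contribute to its impact.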

Once our IMLP method finds the optimal device configuration for the optimization target, our framework calculates the final power and dependability results by accumulating the resource consumptions of the operators. The device configuration also defines a single operator frequency, which our IMLP method sets as the lowest operational frequency considering all the operators used in the design configuration. Our framework uses the operator frequency to scale the total operator power, which is the reason for normalizing the resource power consumptions by one MHz in the device database. However, only resource usage affects dependability, so our framework does not scale the error rates by the operator frequency. Although IMC resides in the BRAMs, the internal-memory extension cannot factor the IMC into the power and dependability analysis since the extension does not predict how much IMC is necessary. To predict the IMC usage, our framework must analyze the external memory as well.

5.2.3 External-Memory Extension

The external-memory extension enables our framework's complete memory-aware analysis by providing an EMB algorithm to wrap around our IMLP method. The external-memory extension uses a wrapper algorithm instead of directly modifying our IMLP method because the analysis of external-memory ports and external-memory devices is nonlinear for two reasons. First, the maximum number of external-memory ports that can fit on a device is relatively low, so our framework should not consider device configurations using a fractional number of ports (e.g., there is a significant difference between 2.7 ports and 2 ports). Secondly, there is no method for encapsulating the EMB/IMC relationship in a linear equation that can be understood by the underlying LP method. Therefore, instead of directly modifying our IMLP method, the external-memory extension modifies our IMLP method's inputs and outputs based on EMB and IMC usage.

The EMB algorithm tests the results of adding p external-memory ports to a device configuration by modifying the inputs and outputs of our IMLP method. First, the

EMB algorithm tests that the device has enough resources (i.e., I/O pins, FFs, and LUTs) to actually fit p external-memory ports on the device. If there is enough room, the EMB algorithm subtracts the resources needed for the p external-memory ports from the resource quantities of our IMLP method's RLEs. Next, the EMB algorithm runs our IMLP method and adjusts the device configuration based on IMC usage and p. The EMB algorithm calculates the total power by summing the power result of our IMLP method with the extra power consumed by the p external-memory ports and the p external-memory devices, which the EMB algorithm calculates based on port-resource usage and the external-memory device type, respectively. Note from Table 5-1 that pins used for inputs require significantly less power than pins used for outputs or bidirectional signals. Also, IMC usage does not affect power because IMC represents memory storage, which uses static power, and static power remains the same independent of IMC usage. The EMB algorithm calculates the total error rate by summing the error rate result of our IMLP method with the extra error rate for each of the p external-memory ports and the extra error rate from the IMC usage (requires the designer to specify the error rate per IMC bit).

By carefully choosing p and the IMC usage before running our IMLP method, the EMB algorithm can optimize a device configuration for either power or dependability. To optimize for power, the EMB algorithm uses the maximum IMC available on the device to minimize the required EMB and determine the minimum p. Using the smallest possible p minimizes the power contribution from the external-memory ports. Furthermore, because IMC usage has no effect on power, the EMB algorithm can use the maximum amount of IMC with no negative effect on power. Therefore, with the optimization target set to power, the EMB algorithm is guaranteed to find the lowest power device configuration by running our IMLP method with the maximum IMC usage and minimum p.

Alternatively, when optimizing for dependability, the EMB algorithm tests multiple IMC usage and p combinations to find the right balance between the IMC usage and

EMB. Ideally, using the minimum p and no IMC usage would produce the optimal device configuration, since both increase the overall error rate, but the EMB algorithm cannot reduce p without increasing IMC usage due to the EMB/IMC relationship. To find the optimal combination of p and IMC usage, the EMB algorithm begins by finding the minimum p using the maximum IMC usage. However, before running our IMLP method, the EMB algorithm reduces the IMC usage as much as possible without requiring an increase in p. Minimizing IMC usage in this way guarantees that our IMLP method (with the optimization target set to dependability) will produce a device configuration with the minimum error rate for the current value of p. The EMB algorithm iterates this process, increasing p by one each step, until p becomes so large that the external-memory ports do not fit on the device or our IMLP method fails to meet the CD requirement (due to the external-memory ports using too many FFs and LUTs). After collecting the optimal device configurations for each valid value of p, the EMB algorithm selects the optimal device configuration with the lowest error rate.

Since different external-memory devices may differ in their power consumption and EMB, the EMB algorithm optimizes for power or dependability for each type of external-memory device. From the results of these optimization tests on each external-memory device type, the EMB algorithm determines the optimal type of external-memory device and outputs this result as the final output along with the corresponding optimal device configuration.

5.3 Case Study

In order to accurately analyze an FPGA-based space-system design and predict the design's power and dependability, our framework must correctly model inter-memory resource interactions with respect to the operations performed on the data. We demonstrate how designers specify an application and the application's effect on the EMB/IMC relationship using the case study's HSI-analysis application, which we approximate as a matrix multiplication (MM). First, we introduce our case study based on the Hyperion

HSI sensor. Then we analyze our case study by investigating block MM for the dimensions specified in our case study.

5.3.1 Hyperion Hyperspectral-Imaging Sensor

Our case study is based on the Hyperion HSI sensor on the EO-1 satellite mission [36, 37]. The Hyperion HSI sensor captures an image cube (3-9) representing a ground scene every 2.95 seconds, which consists of 256 by 660 pixels and 220 12-bit spectral values per pixel, resulting in 18.9 MB/s of raw data. HSI analysis of the image cube can identify the locations of certain materials of interest within a scene, resulting in one or more two-dimensional images that lack the spectral dimension and are therefore much smaller than the original image cube. If a system could perform the HSI analysis quickly enough in situ, the EO-1 satellite would need to send only a small fraction of the original data to Earth, potentially enabling the continuous streaming of results through the limited downlink bandwidth. Our case study examines how our framework would aid in designing an FPGA-based space system to perform in situ HSI analysis for the Hyperion HSI sensor.

Although HSI analysis is a complex process, we can simplify our HSI computational model by looking at the HSI analysis's largest constituent kernel. Jacobs et al. [35] profiled the HSI-analysis computations and determined that the majority of the computation was a single MM. For an image cube similar to the one produced by Hyperion, 97% of the HSI analysis computation is an MM where the first input matrix has dimensions of 220 (number of spectral bands) by 168,960 (number of pixels in a scene) and the second input matrix is a transpose of the first. Since the remaining 3% of computation is comparatively negligible, we can greatly simplify our case study analysis by modeling the HSI analysis as a single MM for these dimensions. We refer the reader to Section 3.4.1 for a more detailed explanation of HSI analysis and the Hyperion HSI mission.

The EO-1 satellite mission launched into space in November 2000. Since then, the number of spectral bands and pixels that HSI sensors can gather has increased, with high-end sensors capturing up to 1,000 spectral bands and up to 1,000 pixels across [65].
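The raw data rate and pixel count quoted for the original Hyperion sensor follow from its image-cube dimensions. The short check below is a sketch with a hypothetical function name, and it assumes that "MB" denotes 10^6 bytes.

```python
def raw_data_rate(rows, cols, bands, bits_per_value, period_s):
    """Raw sensor output in bytes per second for one image cube per period."""
    bits_per_cube = rows * cols * bands * bits_per_value
    return bits_per_cube / 8 / period_s


# Original Hyperion image cube: 256 x 660 pixels, 220 bands, 12-bit values,
# one cube every 2.95 seconds.
pixels = 256 * 660
rate = raw_data_rate(rows=256, cols=660, bands=220,
                     bits_per_value=12, period_s=2.95)

print(pixels)                # 168960 pixels per scene
print(round(rate / 1e6, 1))  # 18.9 (MB/s, with MB = 10**6 bytes)
```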

Furthermore, most modern commercial FPGAs have a CD that is much higher than what would be required by the standard Hyperion HSI sensor. To compensate for the age of the Hyperion sensor and investigate a more modern mission, our case study considers an enhanced version of the Hyperion sensor that is capable of capturing an enhanced image cube that is roughly twice as large in every dimension as the original size. The other properties of the enhanced Hyperion sensor remain the same as the original sensor, resulting in the production of a 500 × 1300 × 400 image cube every 2.95 seconds.

5.3.2 Case-Study Performance and Memory Analysis

We investigate block MM to analyze and quantify the case study's memory requirements. For large matrices, like those in our case study, block MM is much more efficient with limited IMC than standard MM, so block MM is more representative of what a designer would actually use.

Figure 5-2. Block MM divides the input and output matrices into square blocks of size $n \times n$ and computes one C block at a time using multiple partial MMs.

Figure 5-2 shows how block MM works for the case study with input matrices A and B and output matrix C. Since B is the transpose of A, we can define the dimensions $s$ and $m$ (for the side and middle dimensions of the MM, respectively), where A has dimensions $s \times m$, B has dimensions $m \times s$, and C has dimensions $s \times s$. For our case study, $s = 400$ for the number of spectral bands, and $m = 650{,}000$ for the total number of pixels in an image cube. The FPGA divides each matrix into a set of square blocks of size $n \times n$ and computes one C block at a time using the corresponding row of blocks in A and column of blocks in B. The FPGA processes the A row and B column one block at a time by fetching the A and B blocks from external memory, storing these blocks in the FPGA's IMC, and performing a partial MM by multiplying the A and B blocks to calculate a partial C block. The FPGA accumulates the sum of all the partial C blocks from all the partial MMs of a row/column pair to produce the final C block, which the FPGA stores to the external memory from the IMC. Based on the dimensions of the matrices, the FPGA must perform $m/n$ partial MMs for each of the $(s/n)^2$ C blocks in C, for a total of $ms^2/n^3$ partial MMs per block MM. However, since B is the transpose of A, C must be symmetric, requiring the FPGA to only calculate half of C and therefore perform only $ms^2/(2n^3)$ partial MMs per block MM.

Figure 5-3. Each entry in the partial C block requires $n$ multiply and $n$ addition operations and $n$ words from both the A and B blocks.
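The block MM procedure described above can be sketched in plain Python on small matrices. This is a software illustration of the blocking and accumulation order only, not the FPGA design: it computes every C block (it does not exploit the symmetry of C), it assumes that the block size divides both matrix dimensions, and the function names are hypothetical.

```python
def naive_matmul(A, B):
    """Reference MM: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def block_matmul(A, B, n):
    """Block MM for an s x m matrix A and an m x s matrix B.

    Returns the s x s result and the number of partial MMs performed.
    """
    s, m = len(A), len(A[0])
    assert s % n == 0 and m % n == 0, "block size must divide s and m"
    C = [[0] * s for _ in range(s)]
    partial_mms = 0
    for bi in range(0, s, n):            # row of C blocks
        for bj in range(0, s, n):        # column of C blocks
            for bk in range(0, m, n):    # walk the A row and B column of blocks
                # One partial MM: multiply an n x n A block by an n x n B
                # block and accumulate the result into the current C block.
                for i in range(bi, bi + n):
                    for j in range(bj, bj + n):
                        acc = 0
                        for k in range(bk, bk + n):
                            acc += A[i][k] * B[k][j]
                        C[i][j] += acc
                partial_mms += 1
    return C, partial_mms


# Small example with s = 4, m = 8, n = 2, and B equal to the transpose of A.
s, m, n = 4, 8, 2
A = [[(7 * i + 3 * k) % 5 for k in range(m)] for i in range(s)]
B = [[A[i][k] for i in range(s)] for k in range(m)]

C, partial_mms = block_matmul(A, B, n)
print(C == naive_matmul(A, B))  # True
print(partial_mms)              # 16, i.e. m * s**2 / n**3
```

Skipping the C blocks below the diagonal, which the symmetry of C permits, would roughly halve the count of partial MMs, as in the $ms^2/(2n^3)$ estimate.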

Each partial MM is effectively a standard MM performed on an A and B block to produce a partial C block. As Figure 5-3 shows, each entry in the partial C block requires an $n$-length dot product consisting of $n$ multiplies and $n$ additions. Since there are $n^2$ entries in the partial C block, the FPGA must perform $2n^3$ operations for each partial MM. Therefore the required CD to calculate a full block MM in $\tau = 2.95$ seconds is:

$$\text{CD} = \frac{ms^2}{2n^3} \cdot 2n^3 \cdot \frac{1}{\tau} = \frac{ms^2}{\tau} = 35.254\ \text{GOPS} \tag{5-3}$$

Similarly, because each dot product requires $n$ words from the A block and $n$ words from the B block, and each word is 4 bytes long, the required IMB is:

$$\text{IMB} = 4 \cdot \frac{ms^2}{2n^3} \cdot 2n^3 \cdot \frac{1}{\tau} = \frac{4ms^2}{\tau} = 141.02\ \text{GB/s} \tag{5-4}$$

Since the FPGA only stores the final C blocks into external memory, and there are $(s/n)^2$ C blocks of size $n^2$, the required output EMB is:

$$\text{EMB}_{out} = 4n^2 \left(\frac{s}{n}\right)^2 \frac{1}{\tau} = \frac{4s^2}{\tau} = 216.95\ \text{kB/s} \tag{5-5}$$

Finally, for each partial MM, the FPGA must fetch an A and B block from external memory into the IMC, requiring an input EMB of:

$$\text{EMB}_{in} = 4 \cdot 2n^2 \cdot \frac{ms^2}{2n^3} \cdot \frac{1}{\tau} = \frac{4ms^2}{n\tau} = \frac{141.02}{n}\ \text{GB/s} \tag{5-6}$$

Comparing (5-6) to (5-4) confirms that the upper bound for input EMB is equal to IMB, since $n$ cannot be less than 1. Unlike the CD, IMB, and output EMB, the required input EMB is dependent on the block size, as shown by the remaining $n$ term. Since $n$ is directly related to the IMC usage, we can define the relationship between IMC usage and $n$ to express the required input EMB in terms of the IMC usage.

For the FPGA to correctly perform a partial MM, the A and B blocks for the next partial MM cannot replace the current A and B blocks in IMC until the current partial MM completes. However, if the FPGA waits until the completion of one partial MM

to start fetching the next A and B blocks, there will be a stall between successive partial MMs when the operations cannot process data, thereby reducing performance. To resolve this issue, the FPGA stores A' and B' blocks for the next partial MM in addition to the A and B blocks. When the FPGA operates on the A and B blocks, the FPGA also begins fetching the A' and B' blocks. When the current partial MM completes, the next partial MM begins with the A' and B' blocks, which become the A and B blocks, and the FPGA can begin fetching the next partial MM's A' and B' blocks. With this pre-fetching method, the operators can continuously process data while the external-memory ports fetch new data into the IMC. Therefore, the IMC usage must be large enough to store the A, B, A', B', and partial C blocks. Since each block has $n^2$ 4-byte entries, we calculate the IMC usage as:

$$\text{IMC}_B = 4 \cdot 5n^2 = 20n^2\ \text{Bytes} \tag{5-7}$$

From (5-7), we can calculate $n$ based on bytes of IMC usage:

$$n = \sqrt{\frac{\text{IMC}_B}{20}} \tag{5-8}$$

Combining (5-6) and (5-8) produces the required input EMB as a function of IMC usage:

$$\text{EMB}_{in} = 141.02 \left(\sqrt{\frac{\text{IMC}_B}{20}}\right)^{-1} = \frac{630.65}{\sqrt{\text{IMC}_B}}\ \text{GB/s} \tag{5-9}$$

Comparing (5-6) and (5-5) shows that the required input EMB is greater than the required output EMB if and only if m
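The requirements in Equations (5-3) through (5-9) can be evaluated numerically as a consistency check. The sketch below uses hypothetical function names, takes the case-study values s = 400 and m = 650,000 with a 2.95-second period per image cube, and assumes 4-byte words.

```python
import math

S = 400               # side dimension (spectral bands)
M = 650_000           # middle dimension (pixels per image cube)
PERIOD_S = 2.95       # time allowed to process one image cube, in seconds
BYTES_PER_WORD = 4


def required_cd():
    """Equation (5-3): operations per second."""
    return M * S**2 / PERIOD_S


def required_imb():
    """Equation (5-4): bytes per second between IMC and operators."""
    return BYTES_PER_WORD * M * S**2 / PERIOD_S


def required_emb_out():
    """Equation (5-5): bytes per second written to external memory."""
    return BYTES_PER_WORD * S**2 / PERIOD_S


def required_emb_in(n):
    """Equation (5-6): bytes per second read from external memory."""
    return BYTES_PER_WORD * M * S**2 / (n * PERIOD_S)


def block_size_from_imc(imc_bytes):
    """Equation (5-8): block size supported by a given IMC usage."""
    return math.sqrt(imc_bytes / 20)


def required_emb_in_from_imc(imc_bytes):
    """Equation (5-9): input EMB as a function of IMC usage in bytes."""
    return required_emb_in(block_size_from_imc(imc_bytes))


print(f"CD      = {required_cd() / 1e9:.3f} GOPS")
print(f"IMB     = {required_imb() / 1e9:.2f} GB/s")
print(f"EMB_out = {required_emb_out() / 1e3:.2f} kB/s")
# Example: input EMB for a hypothetical 1 MB (10**6 bytes) of IMC usage.
print(f"EMB_in  = {required_emb_in_from_imc(1e6) / 1e9:.3f} GB/s")
```

Because the input EMB falls only with the square root of the IMC usage, quadrupling the IMC usage halves the required input EMB.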

With the result from (5-10), the designer can completely specify the EMB/IMC relationship, enabling our framework to perform the memory-aware analysis.

5.4 Results and Analysis

Using the HSI mission as our case study, we perform three experiments to demonstrate the memory-aware analysis enabled by our framework's memory extension. We evaluate all of the LX models from the Virtex-4, Virtex-5, and Virtex-6 FPGA families as well as LXT models from the Virtex-6 family (included because there is only one LX model for Virtex-6). For each model, we only consider the most capable package in terms of highest I/O pin count and fastest speed grade. To analyze the external-memory ports, we use the Xilinx CORE Generator System to generate the latest generation of DDR port available for each device (DDR3 for Virtex-6 and DDR2 for Virtex-4 and Virtex-5). For the external-memory device, we assume a standard power consumption of 3 Watts and no error rate due to radiation hardening. First, we show how increasing the required performance of the case study affects the power and dependability of an FPGA-based space system. We then analyze the effects of varying device size by testing different sized FPGAs within the same family. Finally, we show how our framework finds the Pareto-optimal designs from multiple families of FPGAs.

5.4.1 Effects of Increasing Required Performance

To better understand the results from our framework's memory-aware analysis, we investigate the optimal device configurations for power and dependability for a single device as HSI analysis performance requirements increase. We choose to investigate the Virtex-5 LX155-FF1760-3 due to the device's average performance within the Virtex-5 family, and the Virtex-5's average performance relative to the other FPGA families evaluated. We adjust the performance requirement by adjusting $\tau$, the time allowed to process an image cube. The actual value of $\tau$ for the HSI mission is 2.95 seconds, which

PAGE 108

requires 35.254 GOPS, so the appropriate allowed time for a desired required CD is:

$\text{allowed time} = \dfrac{3.5254 \times 10^{10}}{\mathit{CD}}$   (5-11)

Figure 5-4. Sources of power consumption as performance requirements of HSI analysis increase for a system using a Virtex-5 LX155-FF1760-3 device with a power-optimized device configuration.

We analyze an FPGA-based space system with a Virtex-5 LX155-FF1760-3 FPGA device and a power-optimized device configuration. Figure 5-4 shows the varying power consumption of the FPGA-based space system's external-memory devices and the resources and operations of the FPGA device as the performance requirements of the HSI analysis increase. For performance below 10 GOPS, system power is dominated by the FPGA's static power (1.322 W) and the power of a single external-memory device and external-memory port on the FPGA (external-memory devices and external-memory ports do not have a low-power mode). Between 0 and 30 GOPS, power consumption rises primarily from an increasing number of multiply operations (using DSPs) and increasing BRAM activity due to increasing IMB. Although there are always an equal number of
multiply and add operations on the FPGA, the power consumption of each add operation is approximately one tenth of the power consumption of a multiply operation with DSP units. At 31 GOPS, the FPGA requires all available DSP units to sustain the required rate of 32 multiplies per cycle. When performance exceeds 31 GOPS, the FPGA must begin using the more power-hungry non-DSP multiply operations, which consist only of FFs and LUTs. The non-DSP multiply operations have a lower maximum operating frequency than the other operations, so the FPGA must reduce the overall frequency so that all operations function properly together. Therefore, the FPGA needs more total operations to keep up with the performance requirement at the reduced frequency. Since there are no more DSP units, the FPGA must add 13 new non-DSP multiply operations as soon as performance exceeds 31 GOPS, increasing the total power consumption by approximately 1.6 W. At 66 GOPS, the external-memory port can no longer keep up with the EMB requirement, requiring another external-memory port and external-memory device, which increases the total power consumption by approximately 4 W. At 75.44 GOPS, the system reaches maximum performance, consuming 25.69 W of power and using all of the available FFs and DSP units and most of the available LUTs.

We also analyze the same FPGA-based space system using a dependability-optimized device configuration. Figure 5-5 shows the varying error rates for each of the FPGA's resources and operations as the performance requirements of the HSI analysis increase. Of the three operations, non-DSP multiply operations have the highest error rate since these operations use approximately 35 times more FFs and LUTs than the add operations and do not benefit from the relatively low error rates of the DSP units. Similar to the power trends (Figure 5-4), at 31 GOPS, the FPGA must incorporate several non-DSP multiply operations, resulting in a doubling of the error rate.

There is also an interesting dynamic between the external-memory ports and IMC usage. As performance increases up to 9 GOPS, the FPGA uses only one external-memory
port, so the FPGA must increase IMC usage in order to use the data from the single memory port more effectively. At 9 GOPS, IMC usage becomes so large that the FPGA can add an additional external-memory port to significantly reduce the IMC usage and achieve a lower total error rate. This tradeoff occurs several more times as performance increases, reaching as high as six external-memory ports at 71.5 GOPS. However, at 71.8 GOPS, the operations and external-memory ports use all of the available DSP units and FFs, so the FPGA must achieve further performance increases by reducing the external-memory ports in order to increase the number of operations. This tradeoff reduces the number of external-memory ports below the optimal amount and significantly increases the overall error rate. At 75.44 GOPS, the system again reaches maximum performance with a total error rate of 45.69 errors per day.

Figure 5-5. Sources of errors as performance requirements of HSI analysis increase for a system using a Virtex-5 LX155-FF1760-3 device with a dependability-optimized device configuration.

Note that Figure 5-4 shows an FPGA-based space system designed for optimal power, and Figure 5-5 shows an FPGA-based space system designed for optimal dependability,
so the two figures do not represent the same system design. Figure 5-6 shows the tradeoff that exists between the power-optimized and dependability-optimized designs. Since the dependability-optimized design preemptively increases the number of external-memory ports to keep IMC usage low, the design requires more power due to the external-memory devices and ports as compared to the power-optimized design. Alternatively, the power-optimized design avoids increasing the number of external-memory ports to keep the power consumption low, but does so despite rapidly increasing error rates due to increasing IMC usage.

Figure 5-6. Total power consumption and error rates as performance requirements of HSI analysis increase for a system using a Virtex-5 LX155-FF1760-3 device with a device configuration optimized for either power or dependability.

5.4.2 Effects of Varying Device Size

To analyze the power and dependability of an FPGA family for the performance requirements of the HSI case study (allowed processing time = 2.95 seconds and required CD = 35.254 GOPS), we use the Virtex-4 FPGA device family because the Virtex-4's limited processing and memory capabilities produce the most interesting trends in power and dependability. Of
the eight Virtex-4 LX models evaluated, we only consider the largest four models because the other models cannot meet the case study's required performance.

Figure 5-7. Sources of power consumption for a system using a Virtex-4 LX device with a device configuration optimized for power.

Figure 5-7 shows the sources of power consumption for FPGA-based space systems using a Virtex-4 LX device with a device configuration optimized for power. The LX80 device consumes three to four Watts more power than the other three devices due to the LX80's smaller IMC, requiring two external-memory devices and ports to achieve the required performance. Among the three largest devices, static power consumption is the largest differentiator. Although the three largest devices have an equal number of DSP units, the non-DSP units consume more power in the LX200 device because of the device's lower speed grade. Since the LX200 device's DSP units operate at a slower frequency, the device must use more power-hungry, non-DSP multiply operations. The LX100 device achieves the lowest power consumption in the Virtex-4 family due to the device's low static power, high number of DSP units and BRAMs, and high speed grade.

Figure 5-8. Sources of errors for a system using a Virtex-4 LX device with a device configuration optimized for dependability.

Figure 5-8 shows the sources of errors for FPGA-based space systems using a Virtex-4 LX device with a device configuration optimized for dependability. The LX200 device still suffers a higher error rate due to the device's low speed grade and greater usage of non-DSP multiply operations. The LX80 device no longer uses more external-memory devices and ports than the other three FPGA devices (all four FPGA devices use three external-memory devices and ports). However, the LX80 also has fewer DSP units than the other three devices, so the LX80 device uses more non-DSP multiply operations than the LX100 and LX160 devices. The LX100 and LX160 have the lowest error rates due to the devices' high speed grades and high number of DSP units.

5.4.3 Finding Pareto-Optimal Designs

The final output of our framework is the set of Pareto-optimal designs for the mission. A Pareto-optimal design is a design that, when compared to any other design, is superior with respect to at least one of the mission goals (i.e., power or dependability). Therefore, the Pareto-optimal set presents designers with designs that specialize in power,
dependability, or some combination of the two, allowing the designer to make the final decision of which mission goals are most critical.

Figure 5-9. Graph of error rates versus power for Virtex-4, -5, and -6 FPGA families (larger markers represent larger devices within a family) showing final Pareto-optimal front traced on top of the four members of the Pareto-optimal set.

Figure 5-9 shows the power and dependability results for the case study for all FPGA-based space systems using a Virtex-4, Virtex-5, or Virtex-6 FPGA device. The Pareto-optimal set shows the tradeoff space between the various Pareto-optimal designs. Each marker represents a unique system design (i.e., unique combination of FPGA device and optimization target), with larger markers representing a relatively larger model within the device's family. We did not include designs that cannot achieve the case study's required performance (the smallest two Virtex-5 and four Virtex-4 devices).

Aside from the Virtex-5 LX330 designs, every Virtex-6 design performs better in power and dependability than every Virtex-4 and Virtex-5 design. Furthermore, the Virtex-6 designs show no variation based on the optimization target. This is because the device CD of every Virtex-6 FPGA device is significantly higher than the case study's
required CD. As seen in Figure 5-6, a similar phenomenon occurs with Virtex-5 devices performing less than 10 GOPS, where the choice of optimization target has no effect on the power or dependability. For Virtex-6 designs, power and dependability worsen with increasing device size (except for the dependability of the Virtex-6 LX130T and LX240T designs). However, this is not a general rule for every device family as seen with the Virtex-4 and Virtex-5 designs.

The Virtex-6 LX130T and LX240T designs stand out with slightly improved dependability due to the devices' add operations, which run slightly faster than those of the other Virtex-6 designs. Since the Virtex-6 designs do not use any non-DSP multiply operations for the required CD, the add operations are the slowest operation and therefore determine the operator frequency of the FPGA. The higher operator frequency results in a lower number of operators needed to meet the required CD, resulting in higher dependability since there are fewer operators to experience an error. Finally, the high dependability of the Virtex-5 LX330's dependability-optimized design is due to a combination of the Virtex-5's inherently low configuration memory error rate and the Virtex-5 LX330 device's high number of DSP units, allowing the device to meet the performance requirement without using non-DSP multiply operations.

Table 5-2. Power consumption and error rates for the four Pareto-optimal designs.

Device                      Opt. Target     Power (W)   Errors/day
Virtex-6 LX75T-FF784-3      N/A             7.751       7.962
Virtex-6 LX130T-FF784-3     N/A             8.128       7.950
Virtex-6 LX240T-FF1759-3    N/A             8.726       7.718
Virtex-5 LX330-FF1760-2     Dependability   21.52       5.672

Table 5-2 shows the four Pareto-optimal designs for the case study. The smallest Virtex-6 design offers the best power and reasonable dependability. The mid-range Virtex-6 LX240T design offers slightly better dependability, but at the cost of one more Watt of power consumption. The largest Virtex-5 dependability-optimized design offers the most dependability, but this comes at the cost of more than double the power consumption over the Virtex-6 designs.
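The selection of a Pareto-optimal set described in this section can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not the framework's implementation: the names `Design`, `dominates`, and `pareto_front` are hypothetical, the dominance test is the common textbook one (no worse in both goals and strictly better in at least one), and the fifth entry is an invented dominated design added purely to show the filter removing something. The first four rows use the power and error-rate values from Table 5-2.

```python
from typing import List, NamedTuple


class Design(NamedTuple):
    name: str
    power_w: float         # total power in Watts (lower is better)
    errors_per_day: float  # error rate (lower is better)


def dominates(a: Design, b: Design) -> bool:
    """Return True if design a is at least as good as b in both goals
    and strictly better in at least one (textbook Pareto dominance)."""
    no_worse = a.power_w <= b.power_w and a.errors_per_day <= b.errors_per_day
    strictly_better = a.power_w < b.power_w or a.errors_per_day < b.errors_per_day
    return no_worse and strictly_better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep every design that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


designs = [
    # Values taken from Table 5-2.
    Design("Virtex-6 LX75T-FF784-3", 7.751, 7.962),
    Design("Virtex-6 LX130T-FF784-3", 8.128, 7.950),
    Design("Virtex-6 LX240T-FF1759-3", 8.726, 7.718),
    Design("Virtex-5 LX330-FF1760-2 (dependability)", 21.52, 5.672),
    # Hypothetical design, invented for illustration only.
    Design("hypothetical dominated design", 12.0, 9.0),
]

front = pareto_front(designs)
for d in front:
    print(f"{d.name}: {d.power_w} W, {d.errors_per_day} errors/day")
```

With these inputs, the four Table 5-2 designs remain on the front because each one trades more power for a lower error rate, while the hypothetical design is removed because the Virtex-6 LX75T design is better in both goals.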

5.5 Conclusions

In this chapter, we have introduced our framework's memory extension, which enables memory-aware analysis through refinements to our framework's original analysis, further improving our framework's ability to evaluate and optimize increasingly complex FPGA-based space systems. The memory-aware analysis more accurately predicts an FPGA-based space system's power and dependability by modeling the internal-memory capacity (IMC), internal-memory bandwidth (IMB), and external-memory bandwidth (EMB) memory resources. Due to the non-trivial and application-dependent nature of the EMB/IMC relationship, the designer must specify this relationship in addition to the application before our framework can begin analysis. The memory extension expands our framework's analysis to a more complete and accurate holistic design view, revealing that the best systems do not always use the fewest external-memory devices or the smallest, largest, or most modern FPGA devices.

To demonstrate the importance of our framework's memory extension, we investigated a case study based on an enhanced version of a hyperspectral-imaging (HSI) satellite mission. We used the case study to show the method a designer might use to analyze their application and determine their application's EMB/IMC relationship. For the case study mission, our framework evaluated 22 Virtex family FPGAs, determined power- and dependability-optimized device configurations for each device, and selected four Pareto-optimal designs ranging from very low power to high dependability. Results show that memory resources may limit the performance of a system and contribute significantly towards power and dependability results.
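As a compact recap of the kind of EMB/IMC relationship a designer supplies, the case-study relations in equations (5-7) through (5-9) can be evaluated numerically. The Python sketch below is only an illustration under stated assumptions: the function names and the example block dimension of 64 are hypothetical, and the 141.02 GB/s coefficient is simply the constant that appears in (5-9).

```python
import math

BYTES_PER_ENTRY = 4        # each block entry is 4 bytes
BLOCKS_IN_IMC = 5          # A, B, A', B', and partial C blocks
EMB_COEFF_GBPS = 141.02    # coefficient appearing in equation (5-9)


def imc_bytes(n: int) -> int:
    """Equation (5-7): IMC usage in bytes for n-by-n blocks."""
    return BYTES_PER_ENTRY * BLOCKS_IN_IMC * n * n


def block_dim(imc_b: float) -> float:
    """Equation (5-8): block dimension n supported by imc_b bytes of IMC."""
    return math.sqrt(imc_b / (BYTES_PER_ENTRY * BLOCKS_IN_IMC))


def emb_in_gbps(imc_b: float) -> float:
    """Equation (5-9): required input EMB in GB/s for imc_b bytes of IMC."""
    return EMB_COEFF_GBPS / block_dim(imc_b)


if __name__ == "__main__":
    n = 64  # hypothetical block dimension, chosen only for illustration
    usage = imc_bytes(n)
    print(f"IMC usage for n={n}: {usage} bytes")
    print(f"Recovered n: {block_dim(usage)}")
    print(f"Required input EMB: {emb_in_gbps(usage):.4f} GB/s")
```

The sketch makes the tradeoff in (5-9) concrete: quadrupling the IMC usage doubles the block dimension and therefore halves the required input EMB.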

CHAPTER 6
CONCLUSIONS

In this dissertation, we have presented our framework to accelerate and improve the early-design phase of FPGA-based on-board processing systems for space missions. Our framework's methodology begins with evaluating thousands of designs consisting of different device and FT strategy combinations with regard to various evaluation metrics (e.g., power, dependability, lifetime) for a particular application and mission. Based on these metrics, our framework dramatically prunes the design-exploration space by selecting only the Pareto-optimal designs, which represent the optimal tradeoffs available to a designer. Due to our framework's flexibility, we were able to enhance the evaluation metrics by replacing the CD methodology with our LP method. Using our LP method improves the accuracy of the power and dependability evaluation metrics, dramatically expands the range of applications our framework can analyze, and enables our framework to make recommendations about FPGA device configurations in addition to device and FT strategy recommendations. Finally, the power and flexibility of our LP method allowed us to add our framework's memory extension, which improved the accuracy of our framework by including IMB, IMC, and EMB in our framework's design modeling and analysis.

Using multiple case studies, we have demonstrated the capabilities of our original framework as well as our subsequent modifications and additions. Our framework demonstrated effectiveness as an early-design tool in our HSI case study by reducing a design space of over 1,000 designs to a set of fifteen Pareto-optimal designs, a 98.7% reduction. We then experimentally validated the accuracy of our LP method's predictions using two case studies involving dot product and distance calculation applications. Finally, we demonstrated the full culmination of this research in another HSI case study, which showed the power of our LP method and the importance of considering memory when
evaluating the performance, power, and dependability of FPGA-based on-board processing systems.

Future directions for this work include expanding our framework to be able to analyze hybrid devices that contain both reconfigurable and fixed logic resources (e.g., the Xilinx Zynq, which contains two embedded ARM processors). This would involve gathering data on the fixed logic resources using the fixed-logic CD methodology and then representing this data as linear equations that could be included in our LP method with the existing equations describing the reconfigurable logic. Future work in studying the RU scores of other FPGA devices and common applications would help refine the accuracy of our LP method for those devices and applications, and may shed light on a general methodology for predicting decreasing design frequencies without the need for a priori RU score measurements.

Finally, it would be interesting to enable our IMLP method to optimize for a combination of power and dependability, rather than one or the other. As seen with the Virtex-5 LX330 design, there can be a large variance in power and dependability between power-optimized and dependability-optimized device configurations. Determining one or more designs that are a compromise between these two extremes enables a broader range of design options that could lead to a more complete set of Pareto-optimal designs.
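One simple way to picture a combined power and dependability objective is a weighted sum of the two normalized metrics. The Python sketch below is an assumption-laden illustration of that general scalarization idea, not the LP formulation used by the framework and not a proposal from this dissertation: the function names, the min-max normalization, and the weight values are all hypothetical choices, and the candidates are just the four designs of Table 5-2 rather than new device configurations.

```python
from typing import Dict, List, Tuple

# (power in W, errors per day), copied from Table 5-2.
DESIGNS: Dict[str, Tuple[float, float]] = {
    "Virtex-6 LX75T": (7.751, 7.962),
    "Virtex-6 LX130T": (8.128, 7.950),
    "Virtex-6 LX240T": (8.726, 7.718),
    "Virtex-5 LX330 (dependability-optimized)": (21.52, 5.672),
}


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale values to [0, 1] so power and error rate are comparable."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def pick_compromise(designs: Dict[str, Tuple[float, float]],
                    power_weight: float) -> str:
    """Return the design minimizing
    power_weight * power + (1 - power_weight) * error_rate,
    with both metrics min-max normalized (a hypothetical scoring rule)."""
    names = list(designs)
    power = min_max_normalize([designs[n][0] for n in names])
    errors = min_max_normalize([designs[n][1] for n in names])
    scores = {
        n: power_weight * p + (1.0 - power_weight) * e
        for n, p, e in zip(names, power, errors)
    }
    return min(scores, key=scores.get)


if __name__ == "__main__":
    for w in (1.0, 0.5, 0.0):
        print(f"power weight {w}: {pick_compromise(DESIGNS, w)}")
```

A weight of 1.0 reduces to pure power optimization and a weight of 0.0 to pure dependability optimization, so intermediate weights sketch the compromise designs discussed above. A weighted sum can only reach designs on the convex part of the tradeoff curve, so a different combination rule may be needed in practice.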

BIOGRAPHICAL SKETCH

Nicholas Wulf earned his Bachelor of Science degree in electrical engineering from the University of Florida in 2008. After graduation, he interned at Los Alamos National Laboratory and Lawrence Livermore National Laboratory. Nicholas received his Master of Science degree in electrical and computer engineering from the University of Florida in 2010. Nicholas worked as a research assistant at the NSF Center for High-Performance Reconfigurable Computing (CHREC) while pursuing his doctoral degree and received his Ph.D. from the University of Florida in the spring of 2016. His research interests include analysis of reconfigurable device architectures and space computing.