
Record for a UF thesis. Title & abstract won't display until thesis is accessible after 2014-05-31.

DARK ITEM
Permanent Link: http://ufdc.ufl.edu/UFE0043906/00001

Material Information

Title: Record for a UF thesis. Title & abstract won't display until thesis is accessible after 2014-05-31.
Physical Description: Book
Language: English
Creator: Munir, Arslan
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2012

Subjects

Subjects / Keywords: Electrical and Computer Engineering -- Dissertations, Academic -- UF
Genre: Electrical and Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Statement of Responsibility: by Arslan Munir.
Thesis: Thesis (Ph.D.)--University of Florida, 2012.
Local: Adviser: Gordon-Ross, Ann.
Local: Co-adviser: Ranka, Sanjay.
Electronic Access: INACCESSIBLE UNTIL 2014-05-31

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2012
System ID: UFE0043906:00001





Full Text

PAGE 1

MODELING AND OPTIMIZATION OF PARALLEL AND DISTRIBUTED EMBEDDED SYSTEMS

By

ARSLAN MUNIR

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2012

PAGE 2

© 2012 Arslan Munir

PAGE 3

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to my Ph.D. advisor Dr. Ann Gordon-Ross and my Ph.D. co-advisor Dr. Sanjay Ranka for their guidance and support during the course of my Ph.D. I sincerely appreciate the considerable amount of time and effort they invested in guiding me with my research. I would also like to acknowledge Dr. Gregory Steffan for inviting me for visiting research at the University of Toronto (UofT), Ontario, Canada. Many thanks to my Ph.D. committee members Dr. Janise McNair, Dr. Greg Stitt, and Dr. Prabhat Mishra for their comments that helped in improving the quality of this dissertation. This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the National Science Foundation (NSF) (CNS-0834080, CNS-0953447, and CNS-0905308). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSERC and the NSF.

PAGE 4

TABLE OF CONTENTS

(page numbers follow each entry)

ACKNOWLEDGMENTS ... 3
LIST OF TABLES ... 12
LIST OF FIGURES ... 14
ABSTRACT ... 18

CHAPTER

1 INTRODUCTION ... 20
  1.1 Embedded Systems Applications ... 23
    1.1.1 Cyber-Physical Systems ... 24
    1.1.2 Space ... 24
    1.1.3 Medical ... 25
    1.1.4 Automotive ... 26
  1.2 Embedded Systems Applications Characteristics ... 28
    1.2.1 Throughput-Intensive ... 29
    1.2.2 Thermal-Constrained ... 30
    1.2.3 Reliability-Constrained ... 30
    1.2.4 Real-Time ... 31
    1.2.5 Parallel and Distributed ... 31
  1.3 Embedded Systems Hardware and Software ... 32
    1.3.1 Embedded Systems Hardware ... 33
      1.3.1.1 Sensors ... 34
      1.3.1.2 Sample-and-Hold circuits and A/D converters ... 34
      1.3.1.3 Processing units ... 34
      1.3.1.4 Memory subsystems ... 35
      1.3.1.5 D/A converters ... 36
      1.3.1.6 Output devices ... 36
    1.3.2 Embedded Systems Software ... 36
      1.3.2.1 Operating system ... 36
      1.3.2.2 Middleware ... 37
      1.3.2.3 Application software ... 37
  1.4 Modeling: An Integral Part of the Embedded System Design Flow ... 38
    1.4.1 Modeling Objectives ... 40
    1.4.2 Modeling Paradigms ... 42
      1.4.2.1 Differential equations ... 42
      1.4.2.2 State machines ... 43
      1.4.2.3 Dataflow ... 43
      1.4.2.4 Discrete event-based modeling ... 44
      1.4.2.5 Stochastic models ... 44
      1.4.2.6 Petri nets ... 45

PAGE 5

    1.4.3 Strategies for Integration of Modeling Paradigms ... 45
      1.4.3.1 Cosimulation ... 46
      1.4.3.2 Code integration ... 47
      1.4.3.3 Code encapsulation ... 47
      1.4.3.4 Model encapsulation ... 47
      1.4.3.5 Model translation ... 47
  1.5 Dissertation Contributions ... 47
  1.6 Relationship to Published Work ... 50
  1.7 Dissertation Organization ... 51
2 OPTIMIZATION APPROACHES IN DISTRIBUTED SINGLE-CORE EMBEDDED WIRELESS SENSOR NETWORKS ... 53
  2.1 Architecture-Level Optimizations ... 55
  2.2 Sensor Node Component-Level Optimizations ... 57
    2.2.1 Sensing Unit ... 57
    2.2.2 Processing Unit ... 58
    2.2.3 Transceiver Unit ... 59
    2.2.4 Storage Unit ... 59
    2.2.5 Actuator Unit ... 60
    2.2.6 Location Finding Unit ... 60
    2.2.7 Power Unit ... 60
  2.3 Data Link-Level Medium Access Control Optimizations ... 61
    2.3.1 Load Balancing and Throughput Optimizations ... 61
    2.3.2 Power/Energy Optimizations ... 63
  2.4 Network-Level Data Dissemination and Routing Protocol Optimizations ... 65
    2.4.1 Query Dissemination Optimizations ... 65
    2.4.2 Real-Time Constrained Optimizations ... 68
    2.4.3 Network Topology Optimizations ... 68
    2.4.4 Resource Adaptive Optimizations ... 69
  2.5 Operating System-level Optimizations ... 69
    2.5.1 Event-Driven Optimizations ... 69
    2.5.2 Dynamic Power Management ... 70
    2.5.3 Fault-Tolerance ... 70
  2.6 Dynamic Optimizations ... 71
    2.6.1 Dynamic Voltage and Frequency Scaling ... 71
    2.6.2 Software-Based Dynamic Optimizations ... 72
    2.6.3 Dynamic Network Reprogramming ... 72
3 AN APPLICATION METRICS ESTIMATION MODEL FOR DISTRIBUTED EMBEDDED WIRELESS SENSOR NETWORKS ... 73
  3.1 Application Metrics Estimation Model ... 75
    3.1.1 Lifetime Estimation ... 75
    3.1.2 Throughput Estimation ... 81
    3.1.3 Reliability Estimation ... 82

PAGE 6

    3.1.4 Models Validation ... 83
  3.2 Experimental Results ... 84
    3.2.1 Experimental Setup ... 84
    3.2.2 Results ... 85
      3.2.2.1 Lifetime ... 86
      3.2.2.2 Throughput ... 87
      3.2.2.3 Reliability ... 87
  3.3 Concluding Remarks ... 88
4 MARKOV MODELING OF FAULT-TOLERANT DISTRIBUTED EMBEDDED WIRELESS SENSOR NETWORKS ... 89
  4.1 Related Work ... 91
  4.2 Fault-Tolerance Parameters ... 95
    4.2.1 Coverage Factor ... 95
    4.2.2 Sensor Failure Probability ... 96
    4.2.3 Sensor Failure Rate ... 97
  4.3 Fault-Tolerant Markov Models ... 97
    4.3.1 Fault-Tolerant Embedded Sensor Node Model ... 98
    4.3.2 Fault-Tolerant EWSN Cluster Model ... 101
    4.3.3 Fault-Tolerant EWSN Model ... 104
  4.4 Results ... 105
    4.4.1 Experimental Setup ... 106
    4.4.2 Reliability and MTTF for an NFT and an FT sensor node ... 107
    4.4.3 Reliability and MTTF for NFT and FT EWSN clusters ... 113
    4.4.4 Reliability and MTTF for an NFT and an FT EWSN ... 117
  4.5 Concluding Remarks ... 121
5 AN MDP-BASED DYNAMIC OPTIMIZATION METHODOLOGY FOR DISTRIBUTED EMBEDDED WIRELESS SENSOR NETWORKS ... 123
  5.1 Related Work ... 126
  5.2 MDP-Based Tuning Overview ... 130
    5.2.1 MDP-Based Tuning Methodology for Embedded Wireless Sensor Networks ... 130
    5.2.2 MDP Overview with Respect to Embedded Wireless Sensor Networks ... 134
  5.3 Application Specific Embedded Sensor Node Tuning Formulation as an MDP ... 136
    5.3.1 State Space ... 137
    5.3.2 Decision Epochs and Actions ... 138
    5.3.3 State Dynamics ... 138
    5.3.4 Policy and Performance Criterion ... 139
    5.3.5 Reward Function ... 140
    5.3.6 Optimality Equation ... 143
    5.3.7 Policy Iteration Algorithm ... 144
  5.4 Implementation Guidelines and Complexity ... 145

PAGE 7

    5.4.1 Implementation Guidelines ... 145
    5.4.2 Computational Complexity ... 147
    5.4.3 Data Memory Analysis ... 147
  5.5 Model Extensions ... 148
  5.6 Numerical Results ... 152
    5.6.1 Fixed Heuristic Policies for Performance Comparisons ... 153
    5.6.2 MDP Specifications ... 153
    5.6.3 Results for a Security/Defense System Application ... 158
      5.6.3.1 The effects of different discount factors on the expected total discounted reward ... 158
      5.6.3.2 The effects of different state transition costs on the expected total discounted reward ... 160
      5.6.3.3 The effects of different reward function weight factors on the expected total discounted reward ... 161
    5.6.4 Results for a Health Care Application ... 162
      5.6.4.1 The effects of different discount factors on the expected total discounted reward ... 162
      5.6.4.2 The effects of different state transition costs on the expected total discounted reward ... 163
      5.6.4.3 The effects of different reward function weight factors on the expected total discounted reward ... 164
    5.6.5 Results for an Ambient Conditions Monitoring Application ... 165
      5.6.5.1 The effects of different discount factors on the expected total discounted reward ... 165
      5.6.5.2 The effects of different state transition costs on the expected total discounted reward ... 166
      5.6.5.3 The effects of different reward function weight factors on the expected total discounted reward ... 167
    5.6.6 Sensitivity Analysis ... 168
    5.6.7 Number of Iterations for Convergence ... 169
  5.7 Concluding Remarks ... 169
6 ONLINE ALGORITHMS FOR DISTRIBUTED EMBEDDED WIRELESS SENSOR NETWORKS DYNAMIC OPTIMIZATION ... 171
  6.1 Related Work ... 173
  6.2 Dynamic Optimization Methodology ... 175
    6.2.1 Methodology Overview ... 175
    6.2.2 State Space ... 176
    6.2.3 Objective Function ... 177
    6.2.4 Online Optimization Algorithms ... 179
      6.2.4.1 Greedy Algorithm ... 179
      6.2.4.2 Simulated Annealing Algorithm ... 180
  6.3 Experimental Results ... 183
    6.3.1 Experimental Setup ... 183

PAGE 8

    6.3.2 Results ... 184
  6.4 Concluding Remarks ... 191
7 A LIGHTWEIGHT DYNAMIC OPTIMIZATION METHODOLOGY FOR DISTRIBUTED EMBEDDED WIRELESS SENSOR NETWORKS ... 192
  7.1 Related Work ... 195
  7.2 Dynamic Optimization Methodology ... 197
    7.2.1 Overview ... 197
    7.2.2 State Space ... 199
    7.2.3 Optimization Objection Function ... 200
  7.3 Algorithms for Dynamic Optimization Methodology ... 202
    7.3.1 Initial Tunable Parameter Value Settings and Exploration Order ... 202
    7.3.2 Parameter Arrangement ... 204
    7.3.3 Online Optimization Algorithm ... 207
    7.3.4 Computational Complexity ... 209
  7.4 Experimental Results ... 209
    7.4.1 Experimental Setup ... 209
    7.4.2 Results ... 212
      7.4.2.1 Percentage improvements of one-shot over other initial parameter settings ... 213
      7.4.2.2 Comparison of one-shot with greedy variants- and SA-based dynamic optimization methodologies ... 214
      7.4.2.3 Comparison of optimized greedy (GD) with greedy variants- and SA-based dynamic optimization methodologies ... 219
      7.4.2.4 Computational Complexity ... 225
  7.5 Concluding Remarks ... 228
8 HIGH-PERFORMANCE ENERGY-EFFICIENT PARALLEL MULTI-CORE EMBEDDED COMPUTING ... 230
  8.1 Architectural Approaches ... 234
    8.1.1 Core Layout ... 234
      8.1.1.1 Heterogeneous CMP ... 235
      8.1.1.2 Conjoined-Core CMP ... 235
      8.1.1.3 Tiled multi-core architectures ... 236
      8.1.1.4 3D multi-core architectures ... 236
      8.1.1.5 Composable multi-core architectures ... 237
      8.1.1.6 Stochastic processors ... 237
    8.1.2 Memory Design ... 237
      8.1.2.1 Transactional memory ... 238
      8.1.2.2 Cache partitioning ... 238
      8.1.2.3 Cooperative caching ... 239
      8.1.2.4 Smart caching ... 239
    8.1.3 Interconnection Network ... 240
      8.1.3.1 Interconnect topology ... 240

PAGE 9

      8.1.3.2 Packet-Switched interconnect ... 241
      8.1.3.3 Photonic interconnect ... 242
      8.1.3.4 Wireless interconnect ... 243
    8.1.4 Reduction Techniques ... 243
      8.1.4.1 Leakage current reduction ... 244
      8.1.4.2 Short circuit current reduction ... 244
      8.1.4.3 Peak power reduction ... 244
      8.1.4.4 Interconnection length reduction ... 245
      8.1.4.5 Instruction and data fetch energy reduction ... 245
  8.2 Hardware-Assisted Middleware Approaches ... 245
    8.2.1 Dynamic Voltage and Frequency Scaling ... 246
    8.2.2 Advanced Configuration and Power Interface ... 247
    8.2.3 Gating Techniques ... 247
      8.2.3.1 Power gating ... 248
      8.2.3.2 Per-Core power gating ... 248
      8.2.3.3 Split power planes ... 248
      8.2.3.4 Clock gating ... 249
    8.2.4 Threading Techniques ... 249
      8.2.4.1 Hyper-Threading ... 249
      8.2.4.2 Helper threading ... 250
      8.2.4.3 Speculative threading ... 250
    8.2.5 Energy Monitoring and Management ... 251
    8.2.6 Dynamic Thermal Management ... 252
      8.2.6.1 Temperature determination for DTM ... 252
      8.2.6.2 Techniques assisting DTM ... 253
    8.2.7 Dependable Techniques ... 254
      8.2.7.1 N-modular redundancy ... 254
      8.2.7.2 Dynamic constitution ... 255
      8.2.7.3 Proactive checkpoint deallocation ... 255
  8.3 Software Approaches ... 255
    8.3.1 Data Forwarding ... 256
    8.3.2 Load Distribution ... 256
      8.3.2.1 Task scheduling ... 257
      8.3.2.2 Task migration ... 257
      8.3.2.3 Load balancing and unbalancing ... 258
  8.4 High-Performance Energy-Efficient Multi-Core Processors ... 260
    8.4.1 ARM11 MPCore ... 260
    8.4.2 ARM Cortex A-9 MPCore ... 260
    8.4.3 MPC8572E PowerQUICC III ... 262
    8.4.4 Tilera TILEPro64 and TILE-Gx ... 262
    8.4.5 AMD Opteron Processor ... 263
    8.4.6 Intel Xeon Processor ... 263
    8.4.7 Intel Sandy Bridge Processor ... 264
    8.4.8 Graphics Processing Units ... 265

PAGE 10

9 MULTI-CORE PARALLEL AND DISTRIBUTED EMBEDDED WIRELESS SENSOR NETWORKS ... 267
  9.1 Multi-Core Embedded Wireless Sensor Network Architecture ... 271
    9.1.1 Sensing Unit ... 273
    9.1.2 Processing Unit ... 275
    9.1.3 Storage Unit ... 275
    9.1.4 Communication Unit ... 276
    9.1.5 Power Unit ... 277
    9.1.6 Actuator Unit ... 277
    9.1.7 Location Finding Unit ... 277
  9.2 Compute-Intensive Tasks Motivating the Emergence of MCEWSNs ... 278
    9.2.1 Information Fusion ... 278
    9.2.2 Encryption ... 280
    9.2.3 Network Coding ... 281
    9.2.4 Software Defined Radio ... 281
  9.3 MCEWSN Application Domains ... 281
    9.3.1 Wireless Video Sensor Networks ... 282
    9.3.2 Wireless Multimedia Sensor Networks ... 283
    9.3.3 Satellite-Based Wireless Sensor Networks ... 284
    9.3.4 Space Shuttle Sensor Networks ... 286
    9.3.5 Aerial-Terrestrial Hybrid Sensor Networks ... 287
    9.3.6 Fault-Tolerant Sensor Networks ... 289
  9.4 Multi-Core Embedded Sensor Nodes ... 289
    9.4.1 InstraNode ... 290
    9.4.2 Mars Rover Prototype Mote ... 290
    9.4.3 Satellite-Based Sensor Node ... 290
    9.4.4 Multi-CPU-based Sensor Node Prototype ... 291
    9.4.5 Smart Camera Mote ... 291
  9.5 Concluding Remarks ... 292
10 A QUEUEING THEORETIC APPROACH FOR PERFORMANCE EVALUATION OF PARALLEL MULTI-CORE EMBEDDED SYSTEMS ... 294
  10.1 Related Work ... 297
  10.2 Queueing Network Modeling of Multi-Core Embedded Architectures ... 300
  10.3 Queueing Network Models Validation ... 308
  10.4 Insights Obtained from Queueing Theoretic Models ... 316
  10.5 The Effects of Cache Miss Rates on Performance ... 318
    10.5.1 The Effects of Workloads on Performance ... 322
    10.5.2 Performance per Watt and Performance per Unit Area Computations ... 326
  10.6 Concluding Remarks ... 334

PAGE 11

11 PARALLELIZED BENCHMARK-DRIVEN PERFORMANCE EVALUATION OF SYMMETRIC MULTIPROCESSORS AND TILED MULTI-CORE ARCHITECTURES FOR PARALLEL EMBEDDED SYSTEMS ... 335
  11.1 Related Work ... 337
  11.2 Multi-Core Architectures and Benchmarks ... 340
    11.2.1 Multi-Core Architectures ... 340
      11.2.1.1 Symmetric multiprocessors ... 340
      11.2.1.2 Tiled multi-core architectures ... 340
    11.2.2 Applications and Kernels Leveraged as Benchmarks ... 342
      11.2.2.1 Information fusion ... 342
      11.2.2.2 Gaussian elimination ... 344
      11.2.2.3 Embarrassingly parallel benchmark ... 344
  11.3 Parallel Computing Device Metrics ... 344
    11.3.1 Run Time ... 344
    11.3.2 Speedup ... 345
    11.3.3 Efficiency ... 345
    11.3.4 Cost ... 345
    11.3.5 Scalability ... 345
    11.3.6 Computational Density ... 346
    11.3.7 Computational Density per Watt ... 346
    11.3.8 Memory-Sustainable Computational Density ... 347
  11.4 Results ... 348
    11.4.1 Quantitative Comparison of Multi-core Platforms ... 348
    11.4.2 Benchmark-Driven Results for SMPs ... 350
    11.4.3 Benchmark-Driven Results for TMAs ... 355
    11.4.4 Comparison of SMPs and TMAs ... 360
  11.5 Concluding Remarks ... 363
12 CONCLUSIONS ... 365

REFERENCES ... 378
BIOGRAPHICAL SKETCH ... 403

PAGE 12

LIST OF TABLES

(page numbers follow each entry)

Table

2-1 EWSN optimizations at different design-levels ... 55
3-1 Crossbow IRIS mote platform hardware specifications ... 85
4-1 Summary of notations used in EWSN Markov models ... 98
4-2 A typical fault detection algorithm's accuracy ... 108
4-3 Reliability for an NFT and an FT embedded sensor node ... 110
4-4 Percentage MTTF improvement for an FT embedded sensor node as compared to an NFT embedded sensor node ... 111
4-5 Reliability for an NFT and an FT EWSN cluster ... 115
4-6 Percentage MTTF improvement for an FT EWSN cluster as compared to an NFT EWSN cluster ... 116
4-7 Iso-MTTF for EWSN clusters ... 117
4-8 Reliability for an NFT EWSN and an FT EWSN ... 119
4-9 Iso-MTTF for EWSNs ... 121
5-1 Summary of MDP notations ... 135
5-2 Power consumption, throughput, and delay parameters for a sensor node state ... 154
5-3 Minimum and maximum reward function parameter values and application metric weight factors for a security/defense system, health care, and ambient conditions monitoring application ... 156
5-4 The effects of different discount factors for a security/defense system ... 158
6-1 Desirable minimum, desirable maximum, acceptable minimum, and acceptable maximum objective function parameter values for a security/defense system, health care, and an ambient conditions monitoring application ... 185
7-1 Crossbow IRIS mote platform hardware specifications ... 210
7-2 Desirable minimum, desirable maximum, acceptable minimum, and acceptable maximum objective function parameter values for a security/defense system, health care, and an ambient conditions monitoring application ... 211
7-3 Percentage improvements attained by one-shot over other initial parameter settings ... 213

PAGE 13

7-4 Greedy algorithms with different parameter arrangements and exploration orders ... 220
7-5 Energy consumption for the one-shot and the improvement mode for our dynamic optimization methodology ... 228
8-1 Top Green500 and Top500 supercomputers ... 231
8-2 High-performance energy-efficient multi-core processors ... 261
10-1 Multi-core embedded architectures with varying processor cores and cache configurations ... 302
10-2 Cache miss rate statistics obtained from SESC for the SPLASH-2 benchmarks for multi-core architectures ... 312
10-3 Execution time obtained from SESC and our queueing theoretic models for the SPLASH-2 benchmarks for multi-core architectures ... 313
10-4 Execution time comparison of the SPLASH-2 benchmarks on SESC for multi-core architectures ... 315
10-5 Execution time comparison of SPLASH-2 benchmarks on SESC versus our queueing theoretic models ... 316
10-6 Area and power consumption for multi-core architectures ... 328
10-7 Area and power consumption of architectural elements for 2P-2L1ID-2L2-1M multi-core embedded architecture ... 329
11-1 Parallel computing device metrics for the multi-core architectures ... 349
11-2 Performance results for the information fusion application for SMP2xQuadXeon ... 351
11-3 Performance results for the Gaussian elimination benchmark for SMP2xQuadXeon ... 353
11-4 Performance results for the EP benchmark for SMP2xQuadXeon ... 354
11-5 Performance results for the information fusion application for TILEPro64 ... 356
11-6 Performance results for the Gaussian elimination benchmark for TILEPro64 ... 357
11-7 Performance results for the EP benchmark for TILEPro64 ... 359
12-1 High-performance energy-efficient embedded computing challenges ... 374

PAGE 14

LIST OF FIGURES

(page numbers follow each entry)

Figure

1-2 Embedded systems hardware and software overview ... 33
1-3 A linear objective function for reliability ... 41
2-1 Embedded wireless sensor network architecture ... 56
2-2 Embedded sensor node architecture with tunable parameters ... 57
2-3 Data aggregation ... 66
2-4 Directed diffusion ... 67
4-1 Sensor node failure probability distribution ... 96
4-2 A non-FT (NFT) embedded sensor node Markov model ... 99
4-3 FT embedded sensor node Markov model ... 100
4-4 EWSN cluster Markov model ... 102
4-5 EWSN cluster Markov model with three states ... 103
4-6 EWSN Markov model ... 105
4-7 MTTF in days for an NFT and an FT embedded sensor node ... 109
4-8 MTTF in days for an NFT EWSN cluster and an FT EWSN cluster ... 115
4-9 MTTF in days for an NFT EWSN and an FT EWSN ... 120
5-1 Process diagram for MDP-based dynamic optimization methodology for EWSNs ... 130
5-2 Power, throughput, and delay reward functions ... 142
5-3 Reliability reward functions ... 150
5-4 Symbolic representation of MDP model ... 154
5-5 The effects of different discount factors on the expected total discounted reward for a security/defense system ... 159
5-6 Percentage improvement in expected total discounted reward for MDP for a security/defense system ... 159
5-7 The effects of different state transition costs on the expected total discounted reward for a security/defense system ... 160

PAGE 15

5-8 The effects of different reward function weight factors on the expected total discounted reward for a security/defense system ... 162
5-9 The effects of different discount factors on the expected total discounted reward for a health care application ... 162
5-10 Percentage improvement in expected total discounted reward for MDP for a health care application ... 163
5-11 The effects of different state transition costs on the expected total discounted reward for a health care application ... 164
5-12 The effects of different reward function weight factors on the expected total discounted reward for a health care application ... 164
5-13 The effects of different discount factors on the expected total discounted reward for an ambient conditions monitoring application ... 165
5-14 Percentage improvement in expected total discounted reward for MDP for an ambient conditions monitoring application ... 166
5-15 The effects of different state transition costs on the expected total discounted reward for an ambient conditions monitoring application ... 167
5-16 The effects of different reward function weight factors on the expected total discounted reward for an ambient conditions monitoring application ... 168
6-1 Dynamic optimization methodology for distributed EWSNs ... 176
6-2 Lifetime objective function ... 178
6-3 Objective function value normalized to the optimal solution for a varying number of states explored for the greedy and simulated annealing algorithms for a security/defense system ... 186
6-4 Objective function value normalized to the optimal solution for a varying number of states explored for the greedy and simulated annealing algorithms for a health care application ... 187
6-5 Objective function value normalized to the optimal solution for a varying number of states explored for the greedy and simulated annealing algorithms for an ambient conditions monitoring application ... 188
7-1 A lightweight dynamic optimization methodology per sensor node for EWSNs ... 198
7-3 Objective function value normalized to the optimal solution for a varying number of states explored for one-shot, greedy, and SA algorithms for a security/defense system when |S| = 729 ... 215

PAGE 16

7-4 Objective function value normalized to the optimal solution for a varying number of states explored for one-shot, greedy, and SA algorithms for a security/defense system when |S| = 31,104 ... 216
7-5 Objective function value normalized to the optimal solution for a varying number of states explored for one-shot, greedy, and SA algorithms for a health care application when |S| = 729 ... 217
7-6 Objective function value normalized to the optimal solution for a varying number of states explored for one-shot, greedy, and SA algorithms for a health care application when |S| = 31,104 ... 217
7-7 Objective function value normalized to the optimal solution for a varying number of states explored for one-shot, greedy, and SA algorithms for an ambient conditions monitoring application when |S| = 729 ... 218
7-8 Objective function value normalized to the optimal solution for a varying number of states explored for one-shot, greedy, and SA algorithms for an ambient conditions monitoring application when |S| = 31,104 ... 219
7-9 Objective function values normalized to the optimal solution for a varying number of states explored for SA and the greedy algorithms for a security/defense system |S| = 729 ... 221
7-10 Objective function values normalized to the optimal solution for a varying number of states explored for SA and greedy algorithms for a security/defense system for |S| = 31,104 ... 222
7-11 Objective function values normalized to the optimal solution for a varying number of states explored for SA and greedy algorithms for an ambient conditions monitoring application for |S| = 729 ... 224
8-1 High-performance energy-efficient parallel embedded computing domain ... 232
9-1 A heterogeneous hierarchical multi-core embedded wireless sensor network architecture ... 271
9-2 Multi-core embedded sensor node architecture ... 274
9-3 Omnibus sensor information fusion model for an MCEWSN architecture ... 280
10-1 Queueing network model for the 2P-2L1ID-2L2-1M multi-core embedded architecture ... 304
10-2 Queueing network model for the 2P-2L1ID-1L2-1M multi-core embedded architecture ... 306
10-3 Queueing network model validation demonstration ... 309

PAGE 17

10-4 The effects of cache miss rate on response time (ms) for mixed workloads for 2P-2L1ID-2L2-1M for a varying number of jobs ... 319
10-5 The effects of processor-bound workloads on response time (ms) for 2P-2L1ID-2L2-1M for a varying number of jobs ... 323
11-1 Tilera's TILEPro64 processor ... 341
11-2 Performance per watt (MOPS/W) comparison between SMP2xQuadXeon and the TILEPro64 for the information fusion application ... 360
11-3 Performance per watt (MFLOPS/W) comparison between SMP2xQuadXeon and the TILEPro64 for the Gaussian elimination benchmark ... 362

PAGE 18

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

MODELING AND OPTIMIZATION OF PARALLEL AND DISTRIBUTED EMBEDDED SYSTEMS

By

Arslan Munir

May 2012

Chair: Ann Gordon-Ross
Cochair: Sanjay Ranka
Major: Electrical and Computer Engineering

Advancements in silicon technology, micro-electro-mechanical systems (MEMS), wireless communications, computer networking, and digital electronics have led to the proliferation of embedded systems in a plethora of application domains (e.g., industrial and home automation, automotive, space, medical, and defense). To meet the diverse application requirements of these application domains, novel trends have emerged in embedded systems. Many times, the embedded systems in an application domain are networked together to form a multi-unit embedded system, also referred to as a distributed embedded system, which permits sophisticated applications of greater value as compared to an isolated single-unit embedded system. An emerging trend is to connect these distributed embedded systems via a wireless network instead of a bulky, wired networking infrastructure. Another emerging trend in embedded systems is to leverage multi-core (many-core) architectures to meet the continuously increasing performance demands of many application domains (e.g., medical imaging, mobile signal processing). Both single-unit and multi-unit distributed embedded systems can leverage multi-core architectures for attaining high performance and energy efficiency. The burgeoning of multi-core architectures in embedded systems induces parallel computing in the embedded domain, which was previously used predominantly in the supercomputing domain only. For both parallel and distributed embedded systems,


modeling and optimization at various design levels (e.g., verification, simulation, analysis) are of paramount significance. Considering the short time-to-market for many embedded systems, embedded systems designers often resort to modeling approaches for evaluation of design alternatives in terms of performance, power, reliability, and/or scalability. In this dissertation, we focus on modeling, analysis, and optimization of parallel and distributed embedded systems. To illustrate the modeling and optimization of distributed embedded systems, we present our work on modeling and optimization of embedded sensor nodes in an embedded wireless sensor network (EWSN). We elaborate additional modeling issues in parallel embedded systems through our performance modeling of multi-core embedded systems. Finally, we conduct performance analysis of parallel embedded systems based on symmetric multi-processors (SMPs) and tiled multi-core architectures (TMAs).


CHAPTER 1
INTRODUCTION

The word "embedded" literally means "within", so embedded systems are information processing systems within (embedded into) other systems. In other words, an embedded system is a system that uses a computer to perform a specific task but is neither used nor perceived as a computer. Essentially, an embedded system is virtually any computing system other than a desktop computer. Embedded systems have links to physical components/systems, which distinguishes them from traditional desktop computing [1]. Embedded systems possess a large number of common characteristics such as real-time constraints, dependability, and power/energy-efficiency.

Embedded systems can be classified based on functionality as transformational, reactive, or interactive [2]. Transformational embedded systems take input data and transform the data into output data. Reactive embedded systems react continuously to their environment at the speed of the environment whereas interactive embedded systems react with their environment at their own speed.

Embedded systems can be classified based on orchestration/architecture as single-unit or multi-unit embedded systems. Single-unit embedded systems refer to embedded systems that possess computational capabilities and interact with the physical world via sensors and actuators but are fabricated on a single chip and are enclosed in a single package. Multi-unit embedded systems, also referred to as distributed embedded systems, consist of a large number of physically distributed nodes that possess computation capabilities, interact with the physical world via a set of sensors and actuators, and communicate with each other via a wired or wireless network. Cyber-physical systems (CPSs) and embedded wireless sensor networks (EWSNs) are typical examples of distributed embedded systems.

Embedded system design is traditionally power-centric but there has been a recent shift towards high-performance embedded computing (HPEC) due to the proliferation


of compute-intensive embedded applications. For example, the signal processing for a 3G mobile handset requires 35-40 giga operations per second (GOPS) for a 14.4 Mbps channel and 210-290 GOPS for a 100 Mbps orthogonal frequency-division multiplexing (OFDM) channel. Considering the limited energy of a mobile handset battery, these performance levels must be met with a power dissipation budget of approximately 1 W, which translates to a performance efficiency of 25 mW/GOPS or 25 pJ/operation for the 3G receiver and 3-5 pJ/operation for the OFDM receiver [3][4]. These demanding and competing power-performance requirements make modern embedded system design challenging.

The high-performance energy-efficient embedded computing (HPEEC) domain addresses the unique design challenges of high-performance and low-power/energy embedded computing (low-power/energy computing can be termed "green", although green may refer to the broader notion of environmental impact). These design challenges are competing because high performance typically requires maximum processor speeds with enormous energy consumption, whereas low power typically requires nominal or low processor speeds that offer modest performance. HPEEC requires thorough consideration of the thermal design power (TDP) and processor frequency relationship while selecting an appropriate processor for an embedded application. For example, decreasing the processor frequency by a fraction of the maximum operating frequency (e.g., reducing from 3.16 GHz to 3.0 GHz) can cause 10% performance degradation but can decrease power consumption by 30-40% [5]. To meet HPEEC power-performance requirements, embedded system design has transitioned from a single-core to a multi-core paradigm that favors multiple low-power cores running at low processor speeds rather than a single high-speed power-hungry core. The burgeoning of multi-core architectures in embedded systems induces parallel computing in the embedded domain, which was previously used predominantly in the supercomputing domain.
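The energy-per-operation figures quoted above for the 3G and OFDM receivers follow from dividing the power budget by the required throughput. The short Python sketch below reproduces that arithmetic; the function name and the specific throughput points chosen (40, 210, and 290 GOPS) are illustrative assumptions, while the approximately 1 W budget is the value stated in the text.

```python
def energy_per_op_pj(power_watts: float, gops: float) -> float:
    """Energy per operation (picojoules) for a power budget and a throughput in GOPS."""
    ops_per_second = gops * 1e9          # giga operations per second -> operations per second
    joules_per_op = power_watts / ops_per_second
    return joules_per_op * 1e12          # joules -> picojoules


POWER_BUDGET_W = 1.0  # approximate handset power budget quoted in the text

# Throughput points picked from the ranges in the text (illustrative choice).
for label, gops in [("3G receiver, 40 GOPS", 40),
                    ("OFDM receiver, 210 GOPS", 210),
                    ("OFDM receiver, 290 GOPS", 290)]:
    print(f"{label}: {energy_per_op_pj(POWER_BUDGET_W, gops):.1f} pJ/operation")
```

At 40 GOPS the budget works out to 25 pJ/operation, and the OFDM range falls between roughly 3 and 5 pJ/operation, consistent with the figures cited from [3][4].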


Multi-core embedded systems have integrated HPEEC and parallel computing into the high-performance energy-efficient parallel embedded computing (HPEPEC) domain.

The HPEPEC domain encompasses both single-unit and multi-unit distributed embedded systems. Chip multiprocessors (CMPs) provide a scalable HPEPEC platform as performance can be increased by increasing the number of cores as long as the increase in the number of cores offsets the clock frequency reduction by maintaining a given performance level with less power [6]. Multi-processor systems-on-chip (MPSoCs), which are multi-processor versions of systems-on-chip (SoCs), are another alternative HPEPEC platform, which provide an unlimited combination of homogeneous and heterogeneous cores. Though both CMPs and MPSoCs are HPEPEC platforms, MPSoCs differ from CMPs in that MPSoCs provide custom architectures (including specialized instruction sets) tailored for meeting peculiar requirements of specific embedded applications (e.g., real-time, throughput-intensive, reliability-constrained). Both CMPs and MPSoCs rely on HPEPEC hardware/software techniques for delivering high performance per watt and meeting diverse application requirements.

Interaction with the environment, timing of the operations, communication network, short time-to-market, and increasing customer expectations/demands for embedded system functionality have led to an exponential increase in design complexity (e.g., current automotive embedded systems contain more than 100 million lines of code). While industry focuses on increasing the number of on-chip processor cores to meet customer performance demands, embedded system designers face the new challenge of optimal layout of these processor cores along with the memory subsystem (caches and main memory) to satisfy power, area, and stringent real-time constraints. The short time-to-market (time from product conception to market release) of embedded systems further exacerbates design challenges.

Modeling of embedded systems helps in reducing the time-to-market by enabling fast application-to-device mapping, early proof of concept (POC), and system verification.


Original equipment manufacturers (OEMs) increasingly adopt model-based design methodologies for improving the quality and reuse of hardware/software components. A model-based design allows development of control and data flow applications in a graphical language familiar to control engineers and domain experts. Moreover, a model-based design enables components' definition at a higher level of abstraction that permits modularity and reusability. Furthermore, a model-based design allows verification of system behavior using simulation. However, different models provide different levels of abstraction for the system under design (SUD). To ensure timely completion of embedded systems design with sufficient confidence in the product's market release, design engineers have to make tradeoffs between the abstraction level of a model and the accuracy a model can attain.

This chapter introduces the modeling and optimization of embedded systems. Section 1.1 elaborates on several embedded system application domains. Various characteristics of embedded system applications are discussed in Section 1.2. Section 1.3 discusses the main components of a typical embedded system's hardware and software. Section 1.4 gives an overview of modeling, modeling objectives, and various modeling paradigms. Section 1.5 summarizes the main contributions of this dissertation. Finally, we outline the organization of this dissertation in Section 1.7.

1.1 Embedded Systems Applications

Embedded systems have applications in virtually all computing domains (except desktop computing) such as automobiles, medical, industrial automation, home appliances (e.g., microwave ovens, toasters, washers/dryers), offices (e.g., printers, scanners), aircraft, space, military, and consumer electronics (e.g., smart phones, feature phones, portable media players, video games). In this section, we discuss some of these applications in detail.


1.1.1 Cyber-Physical Systems

A cyber-physical system (CPS) is an emerging application domain of multi-unit/networked embedded systems. The CPS term emphasizes the link to physical quantities such as time, energy, and space. Although CPSs are embedded systems, the new terminology has been proposed by researchers to distinguish CPSs from simple microcontroller-based embedded systems. CPSs enable monitoring and control of physical systems via a network (e.g., Internet, Intranet, or wireless cellular network). CPSs are hybrid systems that include both continuous and discrete dynamics. Modeling of CPSs must use hybrid models that represent both continuous and discrete dynamics and should incorporate timing and concurrency. Communication between single-unit embedded devices/subsystems performing distributed computation in CPSs presents challenges due to uncertainty in temporal behavior (e.g., jitter in latency), message ordering because of dynamic routing of data, and data error rates. CPS applications include process control, networked building control systems (e.g., lighting, air-conditioning), telemedicine, and smart structures.

1.1.2 Space

Embedded systems are prevalent in space and aerospace systems where safety, reliability, and real-time requirements are critical. For example, a fly-by-wire aircraft with a 50-year production cycle requires an aircraft manufacturer to purchase, all at once, a 50-year supply of the microprocessors that will run the embedded software. All of these microprocessors must be manufactured from the same production line from the same mask to ensure that the validated real-time performance is maintained. Consequently, aerospace systems are unable to benefit from the technological improvements in this 50-year period without repeating the software validation and certification, which is very expensive. Hence for aerospace applications, efficiency is of less relative importance as compared to predictability and safety, which are difficult to ensure without freezing the design at the physical level [7].


Embedded systems are used in satellites and space shuttles. For example, small-scale satellites in low earth orbit (LEO) use embedded systems for earth imaging and detection of ionospheric phenomena that influence radio wave propagation (the ionosphere is produced by the ionization of atmospheric neutrals by ultraviolet radiation from the Sun and resides above the surface of the earth, stretching from a height of 50 km to more than 1000 km) [8]. Embedded systems enable unmanned and autonomous satellite platforms for space missions. For example, the dependable multiprocessor (DM), commissioned by NASA's New Millennium Program for future space missions, is an embedded system leveraging multi-core processors and field-programmable gate array (FPGA)-based co-processors [9].

1.1.3 Medical

Embedded systems are widely used in medical equipment where a product life cycle of 7 years is a prerequisite (i.e., processors used in medical equipment must be available for at least 7 years of operation) [10]. High-performance embedded systems are used in medical imaging devices (e.g., magnetic resonance imaging (MRI), computed tomography (CT), digital x-ray, and ultrasound) to provide high quality images, which can accurately diagnose and determine treatment for a variety of patients' conditions. Filtering noisy input data and producing high-resolution images at high data processing rates require tremendous computing power (e.g., video imaging applications often require data processing at rates of 30 images per second or more). Using multi-core embedded systems helps in efficient processing of these high-resolution medical images whereas hardware co-processors such as graphics processing units (GPUs) and FPGAs take parallel computing on these images to the next step. These co-processors offload and accelerate some of the processing tasks that the processor would normally handle.

Some medical applications require real-time imaging to provide feedback while performing procedures such as positioning a stent or other devices inside a patient's


heart. Some imaging applications require multiple modalities (e.g., CT, MRI, ultrasound) to provide optimal images as no single technique is optimal for imaging all types of tissues. In these applications, embedded systems combine images from each modality into a composite image that provides more information than images from each individual modality separately [11].

Embedded systems are used in cardiovascular monitoring applications to treat high-risk patients while undergoing major surgery or cardiology procedures. Hemodynamic monitors in cardiovascular embedded systems measure a range of data related to a patient's heart and blood circulation on a beat-by-beat basis. These systems monitor the arterial blood pressure waveform along with the corresponding beat durations, which determines the amount of blood pumped out with each individual beat and heart rate.

Embedded systems have made telemedicine a reality enabling remote patient examination. Telemedicine virtually eliminates the distance between remote patients and urban practitioners by using real-time audio and video with one camera at the patient's location and another with the treatment specialist. Telemedicine requires standards-based platforms capable of integrating a myriad of medical devices via a standard I/O connection such as Ethernet, Universal Serial Bus (USB), or video port. Vendors (e.g., Intel) supply embedded equipment for telemedicine that supports real-time transmission of high-definition audio and video while simultaneously gathering data from the attached peripheral devices (e.g., heart monitor, CT scanner, thermometer, x-ray, and ultrasound machine) [12].

1.1.4 Automotive

Embedded systems are heavily used in the automotive industry for measurement and control. Since these embedded systems are commonly known as electronic control units (ECUs), we use the term ECU to refer to any automotive embedded system. A state-of-the-art luxury car contains more than 70 ECUs for safety and comfort functions


[13]. Typically, ECUs in automotive systems communicate with each other over controller area network (CAN) buses.

ECUs in automobiles are partitioned into two major categories: 1) ECUs for controlling mechanical parts and 2) ECUs for handling information systems and entertainment. The first category includes chassis control, automotive body control (interior air-conditioning, dashboard, power windows, etc.), power-train control (engine, transmission, emissions, etc.), and active safety control. The second category includes office computing, information management, navigation, external communication, and entertainment [14]. Each category has unique requirements for computation speed, scalability, and reliability.

ECUs responsible for power-train control, motor management, gear control, suspension control, airbag release, and anti-locking brakes implement closed-loop control functions as well as reactive functions with hard real-time constraints and communicate over a class C CAN-bus (typically 1 Mbps). ECUs responsible for power-train have stringent real-time and computing power constraints requiring an activation period of a few milliseconds at high engine speeds. Typical power-train ECUs use 32-bit microcontrollers running at a few hundred MHz, whereas the remainder of the real-time subsystems use 16-bit microcontrollers running at less than 1 MHz. Multi-core ECUs are envisioned as the next generation solution for automotive applications with intense computing and high reliability requirements.

The body electronics ECUs, which serve the comfort functions (e.g., air-conditioning, power window, seat control, and parking assistance), are mainly reactive systems with only a few closed-loop control functions and have soft real-time requirements. For example, the driver and passengers issue supervisory commands to initiate power window movement by pressing the appropriate buttons. These buttons are connected to a microprocessor that translates the voltages corresponding to button up and down actions into messages that traverse over a network to the power window controller. The


body electronics ECUs communicate via a class B CAN-bus typically operating at 100 Kbps.

ECUs responsible for entertainment and office applications (e.g., video, sound, phone, and global positioning system (GPS)) are software intensive with millions of lines of code and communicate via an optical data bus typically operating at 100 Mbps, which is the fastest bus in automotive applications. Various CAN buses and optical buses that connect different types of ECUs in automotive applications are in turn connected through a central gateway, which enables the communication of all ECUs.

For high-speed communication of large volumes of data traffic generated by 360-degree sensors positioned around the vehicles, the automotive industry is moving towards the FlexRay communication standard (developed by a consortium that includes BMW, DaimlerChrysler, General Motors, Freescale, NXP, Bosch, and Volkswagen/Audi as core members) [14]. The current CAN standard limits the communication speed to 500 Kbps and imposes a protocol overhead of more than 40 percent whereas FlexRay defines the communication speed at 10 Mbps with comparatively less overhead than CAN. FlexRay offers enhanced reliability using a dual-channel bus specification. The dual-channel bus configuration can exploit physical redundancy and replicate safety-critical messages on both bus channels. The FlexRay standard affords better scalability for distributed ECUs as compared to CAN because of a time-triggered communication channel specification such that each node only needs to know the time slots for its outgoing and incoming communications. To promote high scalability, the node-assigned time slot schedule is distributed across the ECU nodes where each node stores its own time slot schedule in a local scheduling table.

1.2 Embedded Systems Applications Characteristics

Different embedded applications have different characteristics. Although a complete characterization of embedded applications with respect to applications' characteristics is outside the scope of this paper, Fig. 1-1 provides a concise classification of embedded


applications based on their characteristics. We discuss below some of these application characteristics in context of their associated embedded domains.

Figure 1-1. Classification of high-performance energy-efficient embedded computing (HPEEC) techniques based on embedded application characteristics.

1.2.1 Throughput-Intensive

Throughput-intensive embedded applications are applications that require high processing throughput. Networking and multimedia applications, which constitute a large fraction of embedded applications [15], are typically throughput-intensive due to ever increasing quality of service (QoS) demands. An embedded system containing an embedded processor requires a network stack and network protocols to connect with other devices. Connecting an embedded device or a widget to a network enables remote device management including automatic application upgrades. On a large scale, networked embedded systems can enable HPEC for solving complex large problems traditionally handled only by supercomputers (e.g., climate research, weather forecasting, molecular modeling, physical simulations, and data mining).


However, connecting hundreds to thousands of embedded systems for HPC requires sophisticated and scalable interconnection technologies (e.g., packet-switched, wireless interconnects). Examples of networking applications include server I/O devices, network infrastructure equipment, consumer electronics (mobile phones, media players), and various home appliances (e.g., home automation including networked TVs, VCRs, stereos, refrigerators, etc.). Multimedia applications, such as video streaming, require very high throughput on the order of several GOPS. A broadcast video with a specification of 30 frames per second with 720 × 480 pixels per frame requires approximately 400,000 blocks (groups of pixels) to be processed per second. A telemedicine application requires processing of 5 million blocks per second [16].

1.2.2 Thermal-Constrained

An embedded application is thermal-constrained if an increase in temperature above a threshold could lead to incorrect results or even embedded system failure. Depending on the target market, embedded applications typically operate above 45 °C (e.g., telecommunication embedded equipment temperature exceeds 55 °C) in contrast to traditional computer systems, which normally operate below 38 °C [17]. Meeting embedded application thermal constraints is challenging due to typically harsh and high-temperature operating environments. Limited space and energy budgets exacerbate these thermal challenges since active cooling systems (fan-based) are typically infeasible in most embedded systems, resulting in only passive and fanless thermal solutions.

1.2.3 Reliability-Constrained

Embedded systems with high reliability constraints are typically required to operate for many years without errors and/or must recover from errors since many reliability-constrained embedded systems are deployed in harsh environments where post-deployment removal and maintenance is infeasible. Hence, hardware and software for reliability-constrained embedded systems must be developed and tested more


carefully than traditional computer systems. Safety critical embedded systems (e.g., automotive airbags, space missions, aircraft flight controllers) have very high reliability requirements (e.g., the reliability requirement for a flight-control embedded system on a commercial airliner is $10^{-10}$ failures per hour where a failure could lead to aircraft loss [18]).

1.2.4 Real-Time

In addition to correct functional operation, real-time embedded applications have additional stringent timing constraints, which impose real-time operational deadlines on the embedded system's response time. Although real-time operation does not strictly imply high-performance, real-time embedded systems require high performance only to the point that the deadline is met, at which time high performance is no longer needed. Hence, real-time embedded systems require predictable high-performance. Real-time operating systems (RTOSs) provide guarantees for meeting the stringent deadline requirements for embedded applications.

1.2.5 Parallel and Distributed

Parallel and distributed embedded applications leverage distributed embedded devices to cooperate and aggregate their functionalities or resources. Wireless sensor network (WSN) applications use sensor nodes to gather sensed information (statistics and data) and use distributed fault-detection algorithms. Mobile agent (autonomous software agent)-based distributed embedded applications allow the process state to be saved and transported to another new embedded system where the process resumes execution from the suspended point (e.g., virtual migration). Many embedded applications exhibit varying degrees (low to high levels) of parallelism, such as instruction level parallelism (ILP) and thread-level parallelism (TLP). Innovative architectural and software HPEEC techniques are required to exploit an embedded application's available parallelism to achieve high-performance with low power consumption.
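To put the flight-control reliability requirement from Section 1.2.3 in perspective, a per-hour failure rate can be converted into the probability of at least one failure over a mission. The sketch below assumes a constant failure rate (an exponential failure model), which is a common simplification and not something stated in the text; the function name and the 10-hour flight duration are likewise hypothetical.

```python
import math


def failure_probability(rate_per_hour: float, hours: float) -> float:
    """P(at least one failure within `hours`), assuming a constant failure rate.

    Uses 1 - exp(-rate * hours); expm1 keeps precision for very small rates.
    """
    return -math.expm1(-rate_per_hour * hours)


FLIGHT_CONTROL_RATE = 1e-10   # failures per hour, as quoted in the text
FLIGHT_HOURS = 10.0           # hypothetical flight duration

p = failure_probability(FLIGHT_CONTROL_RATE, FLIGHT_HOURS)
print(f"Probability of failure during one flight: {p:.3e}")
```

For rates this small the result is very close to the product of rate and duration, which is why such requirements are often reasoned about as simple per-hour budgets.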


Various HPEEC techniques at different levels (e.g., architecture, middleware, and software) can be used to enable an embedded platform to meet the embedded application requirements. Fig. 1-1 classifies embedded application characteristics and the HPEEC techniques available at architecture, middleware, and software levels that can be leveraged by the embedded platforms executing these applications to meet the application requirements (we describe the details of these techniques in later sections of the paper). For example, throughput-intensive applications can leverage architectural innovations (e.g., tiled multi-core architectures, high-bandwidth interconnects), hardware-assisted middleware techniques (e.g., speculative approaches, DVFS, hyper-threading), and software techniques (e.g., data forwarding, task scheduling, and task migration). We point out that HPEEC techniques are not orthogonal and many of these techniques can be applied in conjunction with one another to more closely meet application requirements. Furthermore, HPEEC techniques that benefit one application requirement (e.g., reliability) may also benefit other application requirements (e.g., throughput, real-time deadlines). For example, the interconnection network not only determines the fault-tolerance characteristics of embedded systems but also affects the attainable throughput and response time.

1.3 Embedded Systems Hardware and Software

An interesting characteristic of embedded system design is hardware/software codesign (i.e., both hardware and software need to be considered together to find the right combination of hardware and software that would result in the most efficient product meeting the requirement specifications). The mapping of application software to hardware must adhere to the design constraints (e.g., real-time deadlines) and objective functions (e.g., cost, energy consumption) (objective functions are discussed in detail in Section 1.4). In this section, we give an overview of embedded systems hardware and software as depicted in Fig. 1-2.


Figure 1-2. Embedded systems hardware and software overview.

1.3.1 Embedded Systems Hardware

Embedded systems hardware is less standardized as compared to that for desktop computers. However, in many embedded systems, hardware is used within a loop where sensors gather information about the physical environment and generate continuous sequences of analog signals/values. Sample-and-hold circuits and analog-to-digital (A/D) converters digitize the analog signals. The digital signals are processed and the results are displayed and/or used to control the physical environment via actuators. At the output, a digital-to-analog (D/A) conversion is generally required because many actuators are analog. In the following subsections, we describe briefly the hardware components of a typical embedded system [1]:


1.3.1.1 Sensors

Embedded systems contain a variety of sensors since there are sensors for virtually every physical quantity (e.g., weight, electric current, voltage, temperature, velocity, and acceleration). A sensor's construction can exploit a variety of physical effects including the law of induction (voltage generation by a changing magnetic field) and photoelectric effects. Recent advances in smart embedded systems design (e.g., WSNs, CPSs) can be attributed to the large variety of available sensors.

1.3.1.2 Sample-and-Hold circuits and A/D converters

Sample-and-hold circuits and A/D converters work in tandem to convert incoming analog signals from sensors into digital signals. Sample-and-hold circuits convert an analog signal from the continuous time domain to the discrete time domain. The circuit consists of a clocked transistor and a capacitor. The transistor functions like a switch where each time the switch is closed by the clocked signal, the capacitor is charged to the voltage v(t) of the incoming voltage e(t). The voltage v(t) essentially remains the same even after opening the switch because of the charge stored in the capacitor until the switch closes again. Each of the voltage values stored in the capacitor is considered as an element of a discrete sequence of values generated from the continuous signal e(t). The A/D converters map these voltage values to a discrete set of possible values afforded by the quantization process that converts these values to digits. There exists a variety of A/D converters with varying speed and precision characteristics.

1.3.1.3 Processing units

The processing units in embedded systems process the digital signals output from the A/D converters. Energy-efficiency is an important factor in selection of processing units for embedded systems. We categorize processing units in three main types:

1. Application-specific Integrated Circuits (ASICs): ASICs implement an embedded application's algorithm in hardware. For a fixed process technology, ASICs provide


the highest energy-efficiency among available processing units at the cost of no flexibility (Fig. 1-2).

2. Processors: Many embedded systems contain a general-purpose microprocessor and/or a microcontroller. These processors enable flexible programming but are much less energy-efficient than ASICs. High-performance embedded applications leverage multi-core/many-core processors, application domain-specific processors (e.g., digital signal processors (DSPs)), and application-specific instruction set processors (ASIPs) that can provide the required energy-efficiency. Graphics processing units (GPUs) are often used as coprocessors in imaging applications to accelerate and offload work from the general-purpose processors (Fig. 1-2).

3. Field-programmable Gate Arrays (FPGAs): Since ASICs are too expensive for low-volume applications and software-based processors can be too slow or energy-inefficient, reconfigurable logic (of which FPGAs are the most prominent) can provide an energy-efficient solution. FPGAs can potentially deliver performance comparable to ASICs but offer reconfigurability using different, specialized configuration data that can be used to reconfigure the device's hardware functionality. FPGAs are mainly used for hardware acceleration for low-volume applications and rapid prototyping. FPGAs can be used for rapid system prototyping that emulates the same behavior as the final system and thus can be used for experimentation purposes.
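Returning briefly to the signal path of Section 1.3.1.2, the sample-and-hold and quantization steps that feed these processing units can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a model of any particular converter: the ideal (instantaneous) sampling, the uniform 8-bit quantizer, the 0-5 V input range, the 50 Hz test signal, and all function names are hypothetical choices; only the symbols e(t) and v(t) come from the text.

```python
import math


def sample_and_hold(e, sample_period, duration):
    """Return the held voltages v(k * sample_period) taken from the input signal e(t).

    Each list element models the capacitor voltage captured when the clocked
    switch closes; the value is held until the next sample.
    """
    n_samples = int(round(duration / sample_period))
    return [e(k * sample_period) for k in range(n_samples)]


def quantize(voltage, v_min, v_max, bits):
    """Map a held voltage to the integer code of a uniform `bits`-bit A/D converter."""
    levels = 2 ** bits
    clamped = min(max(voltage, v_min), v_max)   # saturate out-of-range inputs
    step = (v_max - v_min) / levels             # width of one quantization interval
    code = int((clamped - v_min) / step)
    return min(code, levels - 1)                # top of range maps to the last code


def e(t):
    """Hypothetical sensor output: a 50 Hz sine wave swinging between 0 V and 5 V."""
    return 2.5 + 2.5 * math.sin(2 * math.pi * 50 * t)


held = sample_and_hold(e, sample_period=1e-3, duration=20e-3)   # 1 kHz sampling, one period
codes = [quantize(v, 0.0, 5.0, bits=8) for v in held]
print(codes)
```

Raising `bits` or shortening `sample_period` trades converter cost and speed for precision, which is the design space the text alludes to when it mentions A/D converters with varying speed and precision characteristics.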
1.3.1.4 Memory subsystems

Embedded systems require memory subsystems to store code and data. Memory subsystems in embedded systems typically consist of on-chip caches and an off-chip main memory. Caches in embedded systems are organized in a hierarchy: level one instruction cache (L1-I) for holding instructions, level one data cache (L1-D) for holding data, level two unified (instruction/data) cache (L2), and recently level three cache (L3). Caches provide much faster access to code and data as compared to the main memory. However, caches are not suitable for real-time embedded systems because of limited predictability of hit rates and therefore access time. To offer better timing predictability for memory subsystems, many embedded systems, especially real-time embedded systems, use scratchpad memories. Scratchpad memories enable software-based control for temporary storage of calculations, data, and other work in progress instead of hardware-based control as in caches. For non-volatile storage of code and data, embedded systems use flash memory that can be electrically erased


and reprogrammed. Examples of embedded systems using flash memory include personal digital assistants (PDAs), digital audio and media players, digital cameras, mobile phones, video games, medical equipment, etc.

1.3.1.5 D/A converters

As many of the output devices are analog, embedded systems leverage D/A converters to convert digital signals to analog signals. D/A converters typically use weighted resistors to generate a current proportional to the digital number. This current is transformed into a proportional voltage by using an operational amplifier.

1.3.1.6 Output devices

Embedded systems' output devices include displays and electro-mechanical devices known as actuators. Actuators can directly impact the environment based on the processed and/or control information from embedded systems. Actuators are key elements in reactive and interactive embedded systems, especially CPSs.

1.3.2 Embedded Systems Software

Embedded systems software consists of an operating system, middleware, and application software (Fig. 1-2). Embedded software has more stringent resource constraints (e.g., small memory footprint, small data word sizes) than traditional desktop software. In the following subsections, we describe the main software components of embedded systems:

1.3.2.1 Operating system

Except for very simple embedded systems, most embedded systems require an operating system (OS) for scheduling, task switching, and I/O. Embedded OSs (EOSs) differ from traditional desktop OSs because EOSs provide limited functionality but a high level of configurability in order to accommodate a wide variety of application requirements and hardware platform features. Many embedded systems applications (e.g., CPSs) are real-time and require support from a real-time OS (RTOS). An RTOS


leverages deterministic scheduling policies and provides predictable timing behavior with guarantees on the upper bound of a task's execution time.

1.3.2.2 Middleware

Middleware is a software layer between the application software and the EOS. Middleware typically includes communication libraries (e.g., message passing interface (MPI), iLib API for Tilera [19]). We point out that real-time embedded systems require a real-time middleware.

1.3.2.3 Application software

Embedded systems contain application software specific to an embedded application (e.g., a portable media player, a phone framework, a healthcare application, and an ambient conditions monitoring application). Embedded applications leverage communication libraries provided by the middleware as well as EOS features. Application software development for embedded systems requires knowledge of the target hardware architecture as assembly language fragments are often embedded within the software code for hardware control or performance purposes. The software code is typically written in a high-level language, such as C, which promotes application software conformity to stringent resource constraints (e.g., limited memory footprint and small data word sizes).

Application software development for real-time applications must consider real-time issues, especially the worst-case execution time (WCET). The WCET is defined as the largest execution time of a program for any input and any initial execution state. We point out that the exact WCET can only be computed for certain programs and tasks such as programs without recursion, without while loops, and with loops that have a statically known iteration count [1]. Modern pipelined processor architectures with different types of hazards (e.g., data hazards, control hazards) and modern memory subsystems composed of different cache hierarchies with limited hit rate predictability make WCET determination even more challenging. To offer better timing predictability


for memory subsystems, many embedded systems (real-time embedded systems in particular) use scratchpad memories. Scratchpad memories enable software-based control for temporary storage of calculations, data, and other work in progress instead of hardware-based control as in caches. Since exact WCET determination is extremely difficult, designers typically specify upper bounds on the WCET.

1.4 Modeling: An Integral Part of the Embedded System Design Flow

Modeling stems from the concept of abstraction (i.e., defining a real world object in a simplified form). Formally, a model is defined as [1]: "A model is a simplification of another entity, which can be a physical thing or another model. The model contains exactly those characteristics and properties of the modeled entity that are relevant for a given task. A model is minimal with respect to a task if it does not contain any other characteristics than those relevant for the task."

The key phases in the embedded system design flow are: requirement specifications, hardware/software (HW/SW) partitioning, preliminary design, detailed design, component implementation, component test/validation, code generation, system integration, system verification/evaluation, and production [13]. The first phase, requirement specifications, outlines the expected/desired behavior of the system under design (SUD) and use cases describe potential applications of the SUD. Young et al. [20] commented on the importance of requirement specifications: "A design without specifications cannot be right or wrong, it can only be surprising!" HW/SW partitioning partitions an application's functionality into a combination of interacting hardware and software. Efficient and effective HW/SW partitioning can enable a product to more closely meet the requirement specifications. The preliminary design is a high-level design with minimum functionality that enables designers to analyze the key characteristics/functionality of an SUD. The detailed design specifies the details that are absent from the preliminary design such as detailed models or drivers for a component. Since embedded systems are complex and are comprised of many components/subsystems, many embedded systems are

PAGE 39

designed and implemented component-wise, which adds component implementation and component testing/validation phases to the design flow. Component validation may involve simulation followed by a code generation phase that generates the appropriate code for the component. System integration is the process of integrating the design of the individual components/subsystems into the complete, functioning embedded system. Verification/evaluation is the process of verifying quantitative information of key objective functions/characteristics (e.g., execution time, reliability) of a certain (possibly partial) design. Once an embedded system design has been verified, the SUD enters the production phase, which produces/fabricates the SUD according to market requirements dictated by the supply and demand economic model. Modeling is an integral part of the embedded systems design flow, which abstracts the SUD and is used throughout the design flow, from the requirement specifications phase to the formal verification/evaluation phase.

Most of the errors encountered during embedded systems design are directly or indirectly related to incomplete, inconsistent, or even incorrect requirement specifications. Currently, the requirement specifications are mostly given in sentences of a natural language (e.g., English), which can be interpreted differently by the OEMs and the suppliers (e.g., Bosch, Siemens that provide embedded subsystems). To minimize the design errors, the embedded industry prefers to receive the requirement specifications in a modeling tool (e.g., graphical or language based). Modeling helps designers to deduce errors and quantitative aspects (e.g., reliability, lifetime) early in the design flow.

Once the SUD modeling is complete, the next phase is validation through simulation followed by code generation. Validation is the process of checking whether a design meets all of the constraints and performs as expected. Simulating embedded systems may require modeling the SUD, the operating environment, or both. Three terms are used in the literature depending on whether the SUD, the real environment, or both are modeled: Software-in-the-loop refers to simulation where both the SUD and the
real environment are modeled for early system validation; Rapid prototyping refers to simulation where the SUD is modeled and the real environment exists for early proof of concept (POC); and Hardware-in-the-loop refers to simulation where the physical SUD exists and the real environment is modeled for exhaustive characterization of the SUD.

Scalability in modeling/verification means that if a modeling/verification technique can be used to abstract/verify a specific small system/subsystem, the same technique can be used to abstract/verify large systems. In some scenarios, modeling/verification is scalable if the correctness of a large system can be inferred from a small verifiable modeled system. Reduction techniques such as partial order reduction and symmetry reduction address this scalability problem; however, this area requires further research.

1.4.1 Modeling Objectives

Embedded systems design requires characterization of several objectives, or design metrics, such as the average and worst-case execution times, code size, energy/power consumption, safety, reliability, temperature/thermal behavior, electromagnetic compatibility, cost, and weight. We point out that some of these objectives can be taken as design constraints since in many optimization problems, objectives can be replaced by constraints and vice versa. Considering multiple objectives is a unique characteristic of many embedded systems and can be accurately captured using mathematical models. A system or subsystem's mathematical model is a mathematical structure consisting of sets, definitions, functions, relations, logical predicates (true or false statements), formulas, and/or graphs. Many mathematical models for embedded systems use objective function(s) to characterize some or all of these objectives, which aids in early evaluation of embedded system design by quantifying information for key objectives.

The objectives for an embedded system can be captured mathematically using linear, piecewise linear, or non-linear functions. For example, a linear objective function
for the reliability of an embedded system operating in a state s (Fig. 1-3) can be given as [21]

\[
f_r(s) =
\begin{cases}
1, & r \ge U_R \\
(r - L_R)/(U_R - L_R), & L_R < r < U_R \\
0, & r \le L_R
\end{cases}
\]

Figure 1-3. A linear objective function for reliability.
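The piecewise-linear reliability objective can be sketched in a few lines of Python. This is only an illustration: the function and argument names are hypothetical, L_R and U_R are read here as lower and upper reliability bounds, and the value 0 at or below L_R is an assumption made so that the ramp is continuous.

```python
def reliability_objective(r, lower, upper):
    """Piecewise-linear reliability objective f_r(s) (illustrative sketch).

    r     -- reliability offered in the current state s
    lower -- L_R; at or below this value the objective is taken as 0 (assumption)
    upper -- U_R; at or above this value the objective saturates at 1
    """
    if upper <= lower:
        raise ValueError("upper bound U_R must exceed lower bound L_R")
    if r >= upper:
        return 1.0
    if r <= lower:
        return 0.0
    # Linear ramp between the two bounds
    return (r - lower) / (upper - lower)


# Hypothetical bounds: L_R = 0.5, U_R = 0.9
mid_value = reliability_objective(0.7, 0.5, 0.9)  # halfway up the ramp
```

Normalizing each objective to the range [0, 1] in this way keeps objectives with different units comparable when they are later combined.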
each other), respectively, given that there are m objectives. Individual objectives are characterized by their respective objective functions $f_k(s)$ (e.g., a linear objective function for reliability is given in Equation 1 and depicted in Fig. 1-3).

A single objective function allows selection of a single design from the design space (the design space represents the set containing all potential designs); however, the assignment of weights for different objectives in the single objective function can be challenging using informal requirement specifications. Alternatively, the use of multiple, separate objective functions returns a set of designs from which a designer can select an appropriate design that meets the most critical objectives optimally/sub-optimally. Often embedded systems modeling focuses on optimization of an objective function (e.g., power, throughput, reliability) subject to design constraints. Typical design constraints for embedded systems include safety, hard real-time requirements, and tough operating conditions in a harsh environment (e.g., aerospace), though some or all of these constraints can be added as objectives to the objective function in many optimization problems as described above.

1.4.2 Modeling Paradigms

Since embedded systems contain a large variety of abstraction levels, components, and aspects (e.g., hardware, software, functional, verification) that cannot be supported by one language or tool, designers rely on various modeling paradigms, each of which targets a partial aspect of the complete design flow from requirement specifications to production. Each modeling paradigm describes the system from a different point of view, but none of the paradigms covers all aspects. We discuss some of the modeling paradigms used in embedded systems design in the following subsections, each of which may use different tools to assist with modeling.

1.4.2.1 Differential equations

Differential equations-based modeling can either use ordinary differential equations (ODEs) or partial differential equations (PDEs). ODEs (linear and non-linear) are used
to model systems or components characterized by quantities that are continuous in value and time, such as voltage and current in electrical systems, speed and force in mechanical systems, or temperature and heat flow in thermal systems [13]. ODE-based models typically describe analog electrical networks or the mechanical behavior of the complete system or component. ODEs are especially useful for studying feedback control systems that can make an unstable system a stable one (feedback systems measure the error (i.e., difference between the actual and desired behavior) and use this error information to correct the behavior). We emphasize that ODEs work for smooth motion where linearity, time invariance, and continuity properties hold. Non-smooth motion involving collisions requires hybrid models that are a mixture of continuous and discrete time models [23].

PDEs are used for modeling behavior in space and time, such as moving electrodes in electromagnetic fields and thermal behavior. Numerical solutions for PDEs are calculated by finite-element methods (FEMs) [23].

1.4.2.2 State machines

State machines are used for modeling discrete dynamics and are especially suitable for reactive systems. Finite-state machines (FSMs) and state-charts are some of the popular examples of state machines. Communicating FSMs (CFSMs) represent several FSMs communicating with each other. State-charts extend FSMs with a mechanism for describing hierarchy and concurrency. Hierarchy is incorporated using super-states and sub-states, where super-states are states that comprise other sub-states [1]. Concurrency in state-charts is modeled using AND-states. If a system containing a super-state S is always in all of the sub-states of S whenever the system is in S, then the super-state S is an AND-super-state.

1.4.2.3 Dataflow

Dataflow modeling identifies and models data movement in an information system. Dataflow modeling represents processes that transform data from one form to another,
external entities that receive data from a system or send data into the system, data stores that hold data, and dataflow that indicates the routes over which the data can flow. A dataflow model is represented by a directed graph where the nodes/vertices, called actors, represent computation (computation maps input data streams into output data streams) and the arcs represent communication channels. Synchronous dataflow (SDF) and Kahn process networks (KPNs) are common examples of dataflow models. The key difference between these dataflow models is that SDFs assume that all actors execute in a single clock cycle whereas KPNs permit actors to execute with any finite delay [1].

1.4.2.4 Discrete event-based modeling

Discrete event-based modeling is based on the notion of firing or executing a sequence of discrete events, which are stored in a queue and are sorted by the time at which these events should be processed. An event corresponding to the current time is removed from the queue and processed by performing the necessary actions, and new events may be enqueued based on the action's results [1]. If there is no event in the queue for the current time, the time advances. Hardware description languages (e.g., VHDL, Verilog) are typically based on discrete event modeling. SystemC, which is a system-level modeling language, is also based on the discrete event modeling paradigm.

1.4.2.5 Stochastic models

Numerous stochastic models exist, which mainly differ in the assumed distributions of the state residence times, to describe and analyze system performance and dependability. Analyzing the embedded system's performance in an early design phase can significantly reduce late-detected, and therefore cost-intensive, problems. Markov chains and queueing models are popular examples of stochastic models. The state residence times in Markov chains are typically assumed to have exponential distributions because exponential distributions lead to efficient numerical analysis, although other generalizations are also possible. Performance measures are obtained from Markov chains by determining steady state and transient state probabilities.
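To make the notions of steady state and transient state probabilities concrete, the following minimal Python sketch works on a small discrete-time Markov chain. A discrete-time chain is used purely for simplicity (the chains discussed above are typically continuous-time), and the two-state transition matrix, the state meanings, and the function names are hypothetical illustrations rather than a model taken from this dissertation.

```python
def transient_distribution(P, p0, steps):
    """State probabilities after `steps` transitions, starting from p0."""
    n = len(P)
    p = list(p0)
    for _ in range(steps):
        # One step of p <- p P (row vector times transition matrix)
        p = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]
    return p


def steady_state(P, tol=1e-12, max_iter=100_000):
    """Approximate the stationary distribution pi satisfying pi = pi P.

    Uses simple power iteration, which converges for the small
    irreducible, aperiodic chain used in this example.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = transient_distribution(P, pi, 1)
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")


# Hypothetical chain: state 0 = operational, state 1 = failed/under repair
P = [[0.9, 0.1],
     [0.5, 0.5]]

after_one_step = transient_distribution(P, [1.0, 0.0], 1)
pi = steady_state(P)
```

For this matrix the balance condition 0.1 * pi[0] = 0.5 * pi[1] together with pi[0] + pi[1] = 1 gives pi = (5/6, 1/6), which the iteration approaches. A measure such as long-run availability would then be read directly from pi[0].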

Queueing models are used to model systems that can be associated with some notion of queues. Queueing models are stochastic models since these models represent the probability of finding a queueing system in a particular configuration or state.

Stochastic models can capture the complex interactions between an embedded system and its environment. Timeliness, concurrency, and interaction with the environment are primary characteristics of many embedded systems, and non-determinism enables stochastic models to incorporate these characteristics. Specifically, non-determinism is used for modeling unknown aspects of the environment or system. Markov decision processes (MDPs) are discrete stochastic dynamic programs, an extension of discrete time Markov chains, that exhibit non-determinism. MDPs associate a reward with each state in the Markov chain.

1.4.2.6 Petri nets

A Petri net is a mathematical language for describing distributed systems and is represented by a directed, bipartite graph. The key elements of Petri nets are conditions, events, and a flow relation. Conditions are either satisfied or not satisfied. The flow relation describes the conditions that must be met before an event can fire and prescribes the conditions that become true after an event fires. Activity charts in unified modeling language (UML) are based on Petri nets [1].

1.4.3 Strategies for Integration of Modeling Paradigms

Describing different aspects and views of an entire embedded system, subsystem, or component over different development phases requires different modeling paradigms. However, sometimes partial descriptions of a system need to be integrated for simulation and code generation. Multi-paradigm languages integrate different modeling paradigms. There are two types of multi-paradigm modeling [13]:

1. One model describing a system complements another model resulting in a model of the complete system
2. Two models give different views of the same system

UML is an example of multi-paradigm modeling, which is often used to describe software-intensive system components. UML enables the designer to verify a design before any hardware/software code is written/generated [24] and allows generation of the appropriate code for the embedded system using a set of rules. UML offers a structured and repeatable design: if there is a problem with the behavior of the application, then the model is changed accordingly, and if the problem lies in the performance of the code, then the rules are adjusted. Similarly, MATLAB's Simulink modeling environment integrates a continuous time and discrete time model of computation based on equation solvers, a discrete event model, and an FSM model.

Two strategies for the integration of heterogeneous modeling paradigms are [13]:

1. Integration of operations (analysis, synthesis) on models
2. Integration of models themselves via model translation

We briefly describe several different integration approaches that leverage these strategies in the following subsections.

1.4.3.1 Cosimulation

Cosimulation permits simulation of partial models of a system in different tools and integrates the simulation process. Cosimulation depends on a central cosimulation engine, called a simulation backplane, that mediates between the distributed simulations run by the simulation engines of the participating computer-aided software engineering (CASE) tools. Cosimulation is useful and sufficient for model validation when simulation is the only purpose of model integration. In general, cosimulation is useful for the combination of a system model with a model of the system's environment since the system model is constructed completely in one tool and enters into the code generation phase, whereas the environment model is only used for simulation. Conversely, cosimulation is insufficient if both of the models (the system and its environment model) are intended for code generation.

1.4.3.2 Code integration

Many modeling tools have associated code generators, and code integration is the process of integrating the generated codes from multiple modeling tools. Code integration tools expedite the design process because in the absence of a code integration tool, subsystem codes generated by different tools have to be integrated manually.

1.4.3.3 Code encapsulation

Code encapsulation is a feature offered by many CASE tools that permits code encapsulation of a subsystem model as a black box in the overall system model. Code encapsulation facilitates automated code integration as well as overall system simulation.

1.4.3.4 Model encapsulation

In model encapsulation, an original subsystem model is encapsulated as an equivalent subsystem model in the modeling language of the enclosing system. Model encapsulation permits coordinated code generation in which the code generation for the enclosing system drives the code generator for the subsystem. The enclosing system tool can be regarded as a master tool and the encapsulated subsystem tool as a slave tool; therefore, model encapsulation requires the master tool to have knowledge of the slave tool.

1.4.3.5 Model translation

In model translation, a subsystem model is translated syntactically and semantically to the language of the enclosing system. This translation results in a homogeneous overall system model so that one tool chain can be used for further processing of the complete system.

1.5 Dissertation Contributions

This dissertation focuses on modeling, analysis, and optimization of parallel and distributed embedded systems. To illustrate the modeling and optimization of
distributed embedded systems, we focus our work on modeling and optimization of distributed EWSNs. Specifically, we target dynamic optimization methodologies based on the sensor node tunable parameter value settings for EWSNs. We plan to elaborate additional modeling issues in parallel embedded systems due to the multi-core/many-core wave through our performance modeling of multi-core parallel embedded systems. The dissertation further evaluates two multi-core architectures (i.e., symmetric multiprocessors (SMPs) and tiled multi-core architectures (TMAs)) based on parallelized benchmarks.

Specifically, this dissertation makes the following main contributions:

• The dissertation discusses various optimization approaches in distributed embedded systems focusing on distributed EWSNs.
• The dissertation proposes an application metrics estimation model that estimates high-level application metrics from low-level embedded sensor node tunable parameters and the embedded sensor node's hardware internals (e.g., transceiver voltage, transceiver receive current). The dynamic optimization methodologies for single-core embedded sensor nodes leverage this estimation model while comparing different operating states for optimization purposes.
• The dissertation proposes a fault-tolerant embedded sensor node model consisting of duplex sensors (i.e., one active sensor and one inactive spare sensor), which exploits the synergy of fault detection and FT. Whereas sensors may employ N-modular redundancy (triple modular redundancy (TMR) is a special case of N-modular redundancy) [25], we propose a duplex sensor node model to minimize the additional cost. This duplex sensor node model serves as a first step towards reliable sensor node modeling and can potentially spark further research in FT sensor node modeling and evaluation. We investigate the synergy of fault detection and FT for EWSNs and characterize FT parameters such as coverage factor, sensor failure probability, and sensor failure rate. We propose Markov models to characterize EWSN reliability and MTTF. Our Markov models are comprehensive and characterize sensor nodes, EWSN clusters, and overall EWSN reliability and MTTF.
• The dissertation proposes, for the first time, a Markov Decision Process (MDP)-based dynamic optimization methodology for embedded sensor nodes in distributed EWSNs. MDP is suitable for EWSN dynamic optimization because of MDP's inherent ability to perform dynamic decision making. This dissertation presents a first step towards MDP-based dynamic optimization.
• The dissertation proposes online algorithms for dynamic optimization of embedded sensor nodes in distributed EWSNs. Lightweight (low computational
and memory resources) online algorithms are crucial for embedded sensor nodes considering the limited processing, storage, and energy resources of embedded sensor nodes. Our online lightweight optimization algorithms impart fast design space exploration to yield an optimal or near-optimal parameter value selection.
• The dissertation proposes a lightweight dynamic optimization methodology for embedded sensor nodes that intelligently selects appropriate initial tunable parameter value settings by evaluating application requirements, the relative importance of these requirements with respect to each other, and the magnitude in which each parameter affects each requirement. This one-shot operating state obtained from appropriate initial parameter value settings provides a high-quality operating state with minimal design space exploration for highly-constrained applications. Results reveal that the one-shot operating state is within 8% of the optimal operating state averaged over several different application domains and design spaces. Our dynamic optimization methodology iteratively improves the one-shot operating state to provide an optimal or near-optimal operating state for less constrained applications. The dynamic optimization methodology combines the initial tunable parameter value settings with an intelligent exploration ordering of tunable parameter values and an exploration arrangement of tunable parameters (since some parameters are more critical for an application than others and thus should be explored first [26] (e.g., the transmission power parameter may be more critical for a lifetime-sensitive application than processor voltage)). The dynamic optimization methodology uses a lightweight online greedy algorithm that leverages intelligent parameter arrangement to iteratively explore the design space, resulting in an operating state within 2% of the optimal operating state while exploring only 0.04% of the design space.
• The dissertation discusses performance and energy optimizations for parallel embedded systems, which are summarized in Fig. 1-1. Even though literature discusses high-performance and parallel computing for supercomputers [27][28][29][30], there exists little discussion on HPEPEC [31].
• The dissertation, for the first time, investigates the feasibility and application of multi-core parallel architectures as processing units in distributed embedded sensor nodes. The dissertation proposes a multi-core embedded wireless sensor network (MCEWSN) architecture based on multi-core embedded sensor nodes. Furthermore, the dissertation motivates and summarizes the multi-core initiative in EWSNs by academia and industry.
• The dissertation proposes a novel, queueing theory-based modeling technique for evaluating multi-core parallel embedded architectures. This modeling technique would enable quick and inexpensive architectural evaluation both in terms of design time and resources as compared to developing and/or using existing multi-core simulators and running benchmarks on these simulators. Based on a preliminary evaluation using our model, architecture designers can run
targeted benchmarks to further verify the performance characteristics of selected multi-core architectures (i.e., our queueing theory-based model facilitates early design space pruning).
• This dissertation, for the first time, conducts a performance analysis of parallel embedded systems leveraging SMPs and TMAs based on parallelized benchmarks.

1.6 Relationship to Published Work

This dissertation includes work that has been published in various IEEE/ACM conferences and journals. Furthermore, some of the work has been published as book chapters.

The work on the modeling of scalable embedded systems is to appear in Scalable Computing and Communications: Theory and Practice, John Wiley & Sons, 2012 [32]. Markov modeling of fault-tolerant EWSNs for reliability and mean time to failure (MTTF) has been published in the proceedings of the IEEE International Conference on Computer Communication Networks (ICCCN'11) [33]. This dissertation provides additional details and analysis for the reliability and MTTF modeling of WSNs.

Our proposed queueing theoretic approach for modeling of low-power multi-core embedded architectures for parallel embedded systems originally appeared in the proceedings of the IEEE International Conference on Computer Design (ICCD'11) [34]. The work on high-performance and energy-efficient techniques for multi-core parallel embedded systems has been accepted for publication in IEEE TPDS, 2011 [35].

The work on various optimization approaches in distributed EWSNs has been published as a book chapter in Sustainable Wireless Sensor Networks, INTECH (Open Access Publisher), 2010 [36]. The work on MDP-based dynamic optimization methodology for embedded sensor nodes in EWSNs originally appeared in the proceedings of the IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'09) [37] and is accepted for publication in IEEE Transactions on Parallel and Distributed Systems (TPDS), 2012 [21].

Online algorithms for EWSN dynamic optimization originally appeared in the proceedings of the IEEE Consumer Communications and Networking Conference (CCNC'12) [38]. The work on the one-shot dynamic optimization methodology and application metrics estimation model originally appeared in the proceedings of the IARIA IEEE International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies (UBICOMM'10) [39], and received the Best Paper Award. The extended version of the one-shot dynamic optimization methodology and application metrics estimation model work has been accepted for publication in IARIA International Journal on Advances in Networks and Services, 2012 [40]. The lightweight dynamic optimization methodology that iteratively improves the one-shot operating state using an online greedy algorithm was published in the proceedings of the IEEE International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob'10) [41]. This dissertation presents additional analysis and results for a more in-depth understanding of our dynamic optimization methodologies for embedded sensor nodes in EWSNs.

1.7 Dissertation Organization

This dissertation is organized as follows: Chapter 2 discusses various optimization approaches in distributed embedded systems focusing on distributed EWSNs. Chapter 3 presents an application metrics estimation model for distributed embedded wireless sensor networks. Markov modeling for reliability and MTTF for fault-tolerant embedded sensor nodes in EWSNs is presented in Chapter 4. The MDP-based dynamic optimization methodology for embedded sensor nodes in distributed EWSNs is presented in Chapter 5. Online algorithms for embedded sensor nodes' tunable parameters-based dynamic optimization are presented in Chapter 6. Chapter 7 presents a lightweight dynamic optimization methodology for embedded sensor nodes in distributed EWSNs. Various performance and energy optimizations for parallel multi-core embedded systems are discussed in Chapter 8. Chapter 9 proposes an
MCEWSN architecture based on multi-core embedded sensor nodes and summarizes the multi-core and parallel computing initiative in EWSNs by academia and industry. Chapter 10 proposes a novel, queueing theory-based modeling technique for evaluating multi-core embedded architectures for parallel embedded systems. Chapter 11 evaluates performance of parallel embedded systems leveraging SMPs and TMAs based on parallelized benchmarks. Finally, Chapter 12 concludes this dissertation.

CHAPTER 2
OPTIMIZATION APPROACHES IN DISTRIBUTED SINGLE-CORE EMBEDDED WIRELESS SENSOR NETWORKS

Embedded wireless sensor networks (EWSNs) are distributed embedded systems consisting of embedded sensor nodes with attached sensors to sense data about a phenomenon and communicate with neighboring sensor nodes over wireless links (we refer to wireless sensor networks (WSNs) as EWSNs since sensor nodes are embedded in the physical environment/system). Due to advancements in silicon technology, micro-electro-mechanical systems (MEMS), wireless communications, computer networking, and digital electronics, distributed EWSNs have been proliferating in a wide variety of application domains. These application domains include military, health, ecology, environment, industrial automation, civil engineering, and medical, to name a few. This wide application diversity combined with complex embedded sensor node architectures, functionality requirements, and highly constrained and harsh operating environments makes EWSN design very challenging.

One critical EWSN design challenge involves meeting application requirements such as lifetime, reliability, throughput, delay (responsiveness), etc. for a myriad of application domains. Furthermore, EWSN applications tend to have competing requirements, which exacerbates design challenges. For example, a high priority security/defense system may have both high responsiveness and long lifetime requirements. The mechanisms needed for high responsiveness typically drain battery life quickly, thus making long lifetime difficult to achieve given limited energy reserves.

Commercial off-the-shelf (COTS) embedded sensor nodes have difficulty meeting application requirements due to the generic design traits necessary for wide application applicability. COTS sensor nodes are mass-produced to optimize cost and are not specialized for any particular application. Fortunately, COTS sensor nodes contain tunable parameters (e.g., processor voltage and frequency, sensing frequency, etc.)
whose values can be specialized to meet application requirements. However, optimizing these tunable parameters is left to the application designer.

Optimization techniques at different design levels (e.g., sensor node hardware and software, data link layer, routing, operating system (OS), etc.) assist designers in meeting application requirements. EWSN optimization techniques can be generally categorized as static or dynamic. Static optimizations optimize an EWSN at deployment time and remain fixed for the EWSN's lifetime. Whereas static optimizations are suitable for stable/predictable applications, static optimizations are inflexible and do not adapt to changing application requirements and environmental stimuli. Dynamic optimizations provide more flexibility by continuously optimizing an EWSN/embedded sensor node during runtime, providing better adaptation to changing application requirements and actual environmental stimuli.

This chapter introduces distributed EWSNs from an optimization perspective and explores optimization strategies employed in EWSNs at different design levels to meet application requirements as summarized in Table 2-1. We present a typical WSN architecture and architectural-level optimizations in Section 2.1. We describe sensor node component-level optimizations and tunable parameters in Section 2.2. Next, we discuss data link-level Medium Access Control (MAC) optimizations and network-level routing optimizations in Section 2.3 and Section 2.4, respectively, and operating system-level optimizations in Section 2.5. After presenting these optimization techniques, we focus on dynamic optimizations for WSNs. There exists much previous work on dynamic optimizations (e.g., [42][43][44][45]), but most previous work targets the processor or cache subsystem in computing systems. WSN dynamic optimizations present additional challenges due to a unique design space, stringent design constraints, and varying operating environments. We discuss the current state-of-the-art in dynamic optimization techniques in Section 2.6.

Table 2-1. EWSN optimizations (discussed in this chapter) at different design-levels.

Design-level              Optimizations
Architecture-level        bridging, sensor web, tunneling
Component-level           parameter-tuning (e.g., processor voltage and frequency, sensing frequency)
Data link-level           load balancing and throughput, power/energy
Network-level             query dissemination, data aggregation, real-time, network topology, resource adaptive, dynamic network reprogramming
Operating system-level    event-driven, dynamic power management, fault-tolerance

2.1 Architecture-Level Optimizations

Fig. 2-1 shows an integrated EWSN architecture (i.e., an EWSN integrated with external networks). Embedded sensor nodes are distributed in a sensor field to observe a phenomenon of interest (e.g., environment, vehicle, object, etc.). Embedded sensor nodes in the sensor field form an ad hoc wireless network and transmit the sensed information (data or statistics) gathered via attached sensors about the observed phenomenon to a base station or sink node. The sink node relays the collected data to the remote requester (user) via an arbitrary computer communication network such as a gateway and associated communication network. Since different applications require different communication network infrastructures to efficiently transfer sensed data, EWSN designers can optimize the communication architecture by determining the appropriate topology (number and distribution of embedded sensor nodes within the EWSN) and communication infrastructure (e.g., gateway nodes) to meet the application's requirements.

An infrastructure-level optimization called bridging facilitates the transfer of sensed data to remote requesters residing at different locations by connecting the EWSN to external networks such as Internet, cellular, and satellite networks. Bridging can be accomplished by overlaying a sensor network with portions of the IP network where
gateway nodes encapsulate sensor node packets with transmission control protocol or user datagram protocol/internet protocol (TCP/IP or UDP/IP).

Figure 2-1. Embedded wireless sensor network architecture.

Since embedded sensor nodes can be integrated with the Internet via bridging, this EWSN-Internet integration can be exploited to form a sensor web. In a sensor web, embedded sensor nodes form a web view where data repositories, sensors, and image devices are discoverable, accessible, and controllable via the World Wide Web (WWW). The sensor web can use service-oriented architectures (SoAs) or sensor web enablement (SWE) standards [46]. SoAs leverage extensible markup language (XML) and simple object access protocol (SOAP) standards to describe, discover, and invoke services from heterogeneous platforms. SWE is defined by the OpenGIS Consortium (OGC) and consists of specifications describing sensor data collection and web notification services. An example application for a sensor web may consist of a client using EWSN information via sensor web queries. The client receives responses either from real-time sensors registered in the sensor web or from existing data in the sensor database repository. In this application, clients can use EWSN services without knowledge of the actual embedded sensor nodes' locations.

Figure 2-2. Embedded sensor node architecture with tunable parameters.

Another EWSN architectural optimization is tunneling. Tunneling connects two EWSNs by passing internetwork communication through a gateway node that acts as an EWSN extension and connects to an intermediate IP network. Tunneling enables construction of large virtual EWSNs using smaller EWSNs [47].

2.2 Sensor Node Component-Level Optimizations

COTS sensor nodes provide optimization opportunities at the component-level via tunable parameters (e.g., processor voltage and frequency, sensing frequency, duty cycle, etc.), whose values can be specialized to meet varying application requirements. Fig. 2-2 depicts a sensor node's main components such as a power unit, storage unit, sensing unit, processing unit, and transceiver unit along with potential tunable parameters associated with each component [47]. In this section, we discuss these components and associated tunable parameters.

2.2.1 Sensing Unit

The sensing unit senses the phenomenon of interest using sensors and an analog to digital converter (ADC). The sensing unit can contain a variety of sensors depending
upon an EWSN application since there are sensors for virtually every physical quantity (e.g., weight, electric current, voltage, temperature, velocity, and acceleration). A sensor's construction can exploit a variety of physical effects including the law of induction (voltage generation in an electric field) and photoelectric effects. Recent advances in EWSNs can be attributed to the large variety of available sensors. ADCs convert the analog signals produced by sensors to digital signals, which serve as input to the processing unit.

The sensing unit's tunable parameters can control power consumption by changing the sensing frequency and the speed-resolution product of the ADC. Sensing frequency can be tuned to provide constant sensing, periodic sensing, and/or sporadic sensing. In constant sensing, sensors sense continuously and sensing frequency is limited only by the sensor hardware's design capabilities. Periodic sensing consumes less power than constant sensing because periodic sensing is duty-cycle based where the sensor node takes readings every T seconds. Sporadic sensing consumes less power than periodic sensing because sporadic sensing is typically event-triggered by either external (e.g., environment) or internal (e.g., OS- or hardware-based) interrupts. The speed-resolution product of the ADC can be tuned to provide high speed-resolution with higher power consumption (e.g., seismic sensors use 24-bit converters with a conversion rate on the order of thousands of samples per second) or low speed-resolution with lower power consumption.

2.2.2 Processing Unit

The processing unit consists of a processor (e.g., Intel's StrongARM [48], Atmel's AVR [49]) whose main tasks include controlling sensors, gathering and processing sensed data, executing EWSN applications, and managing communication protocols and algorithms in conjunction with the operating system. The processor's tunable parameters include processor voltage and frequency, which can be specialized to meet power budget and throughput requirements. The processor can also switch between
different operating modes (e.g., active, idle, sleep) to conserve energy. For example, Intel's StrongARM consumes 75 mW in idle mode, 0.16 mW in sleep mode, and 240 mW and 400 mW in active mode while operating at 133 MHz and 206 MHz, respectively.

2.2.3 Transceiver Unit

The transceiver unit consists of a radio (transceiver) and an antenna, and is responsible for communicating with neighboring sensor nodes. The transceiver unit's tunable parameters include modulation scheme, data rate, transmit power, and duty cycle. The radio contains different operating modes (e.g., transmit, receive, idle, and sleep) for power management purposes. The sleep state provides the lowest power consumption, but switching from the sleep state to the transmit state consumes a large amount of power. The power saving modes (e.g., idle, sleep) are characterized by their power consumption and latency overhead (time to switch to transmit or receive modes). Power consumption in the transceiver unit also depends on the distance to the neighboring sensor nodes and transmission interferences (e.g., solar flare, radiation, channel noise).

2.2.4 Storage Unit

Sensor nodes contain a storage unit for temporary data storage because immediate data transmission is not always possible due to hardware failures, environmental conditions, physical layer jamming, and limited energy reserves. A sensor node's storage unit typically consists of Flash and static random access memory (SRAM). Flash is used for persistent storage of application code and text segments whereas SRAM is for run-time data storage. One potential optimization uses ELF, an efficient log-structured Flash file system, which is specifically adapted for sensor node data logging and operating environmental conditions. Storage unit optimization challenges include power conservation and memory resources (limited data and program memory, e.g., the Mica2 sensor node contains only 4 KB of data memory (SRAM) and 128 KB of program memory (Flash)).
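The StrongARM power figures quoted in Section 2.2.2 can be turned into a rough average-power estimate for a duty-cycled processor. The following Python sketch is illustrative only: the time fractions are hypothetical, the active figure is the one quoted for 206 MHz, and mode-transition overhead is ignored, which is an assumption rather than something the text states.

```python
# Power draw per operating mode in milliwatts, using the StrongARM
# figures quoted in Section 2.2.2 (active mode at 206 MHz).
MODE_POWER_MW = {"active": 400.0, "idle": 75.0, "sleep": 0.16}


def average_power_mw(time_fractions, mode_power_mw=MODE_POWER_MW):
    """Time-weighted average power for the given fraction of time per mode.

    Assumption: switching between modes costs no time or energy.
    """
    total = sum(time_fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(mode_power_mw[mode] * frac for mode, frac in time_fractions.items())


# Hypothetical schedules: never sleeping versus 10% active and 90% asleep.
always_idle = average_power_mw({"idle": 1.0})
duty_cycled = average_power_mw({"active": 0.10, "sleep": 0.90})
print(f"always idle:           {always_idle:.3f} mW")
print(f"10% active, 90% sleep: {duty_cycled:.3f} mW")
```

Under these assumed fractions, sleeping between bursts of activity draws less average power than staying idle, which is the motivation for exposing operating modes as a tunable parameter.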

2.2.5 Actuator Unit

The actuator unit consists of actuators (e.g., mobilizer, camera pan-tilt), which enhance the sensing task. Actuators open/close a switch/relay to control functions such as camera or antenna orientation and repositioning sensors. Actuators, in contrast to sensors which only sense a phenomenon, typically affect the operating environment by opening a valve, emitting sound, or physically moving the sensor node. The actuator unit's tunable parameter is actuator frequency, which can be adjusted according to application requirements.

2.2.6 Location Finding Unit

The location finding unit determines a sensor node's location. Depending on the application requirements and available resources, the location finding unit can either be global positioning system (GPS)-based or ad hoc positioning system (APS)-based. The GPS-based location finding unit is highly accurate, but has high monetary cost and requires direct line of sight between the sensor node and satellites. The APS-based location finding unit determines a sensor node's position with respect to landmarks. Landmarks are typically GPS-based position-aware sensor nodes and landmark information is propagated in a multi-hop fashion. A sensor node in direct communication with a landmark estimates its distance from a landmark based on the received signal strength. A sensor node two hops away from a landmark estimates its distance based on the distance estimate of a sensor node one hop away from a landmark via message propagation. When a sensor node has distance estimates to three or more landmarks, the sensor node computes its own position as a centroid of the landmarks.

2.2.7 Power Unit

The power unit supplies power to a sensor node and determines a sensor node's lifetime. The power unit consists of a battery and a DC-DC converter. The electrode material and the diffusion rate of the electrolyte's active material affect the battery capacity. The DC-DC converter provides a constant supply voltage to the sensor node.
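The final APS step in Section 2.2.6, computing a position as a centroid of three or more landmarks, can be sketched in Python as follows. The landmark coordinates are hypothetical. The optional inverse-distance weighting is an assumption added for illustration; the text only states that a centroid is used and does not say how the distance estimates enter the computation.

```python
def centroid_position(landmarks, distances=None):
    """Estimate a node position from landmark coordinates.

    landmarks: list of (x, y) landmark positions.
    distances: optional estimated distances to each landmark. When given,
               nearer landmarks get larger weights (1 / distance); this
               weighting is an assumption, not part of the source text.
    """
    if len(landmarks) < 3:
        raise ValueError("need distance estimates to at least three landmarks")
    if distances is None:
        weights = [1.0] * len(landmarks)
    else:
        weights = [1.0 / d for d in distances]
    total = sum(weights)
    x = sum(w * lx for w, (lx, _) in zip(weights, landmarks)) / total
    y = sum(w * ly for w, (_, ly) in zip(weights, landmarks)) / total
    return x, y


# Hypothetical landmark positions in meters.
position = centroid_position([(0.0, 0.0), (6.0, 0.0), (0.0, 9.0)])
print(position)
```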

2.3 Data Link-Level Medium Access Control Optimizations

Data link-level medium access control (MAC) manages the shared wireless channel and establishes data communication links between embedded sensor nodes in an EWSN. Traditional MAC schemes emphasize high quality of service (QoS) [50] or bandwidth efficiency [51][52]; however, EWSN platforms have different priorities [53], thus inhibiting the straightforward adoption of existing MAC protocols [54]. For example, since EWSN lifetime is typically an important application requirement and batteries are not easily interchangeable/rechargeable, energy consumption is a primary design constraint for EWSNs. Similarly, since the network infrastructure is subject to changes due to dying nodes, self-organization and failure recovery are important. To meet application requirements, EWSN designers tune MAC layer protocol parameters (e.g., channel access schedule, message size, duty cycle, and receiver power-off). This section discusses MAC protocols for EWSNs with reference to their tunable parameters and optimization objectives.

2.3.1 Load Balancing and Throughput Optimizations

MAC layer protocols can adjust wireless channel slot allocation to optimize throughput while maintaining the traffic load balance between sensor nodes. A fairness index measures load balancing or the uniformity of packets delivered to the sink node from all the senders. For the perfectly uniform case (ideal load balance), the fairness index is 1. MAC layer protocols that adjust channel slot allocation for load balancing and throughput optimizations include Traffic Adaptive Medium Access Protocol (TRAMA) [55], Berkeley Media Access Control (B-MAC) [56], and Zebra MAC (Z-MAC) [57].

TRAMA is a MAC protocol that adjusts channel time slot allocation to achieve load balancing while focusing on providing collision-free medium access. TRAMA divides the channel access into random and scheduled access periods and aims to increase the utilization of the scheduled access period using time division multiple access (TDMA). TRAMA calculates a Message-Digest algorithm 5 (MD5) hash for
every one-hop and two-hop neighboring sensor node to determine a node's priority. Experiments comparing TRAMA with both contention-based protocols (IEEE 802.11 and Sensor-MAC (S-MAC) [58]) as well as a schedule-based protocol (Node-Activation Multiple Access (NAMA) [59]) revealed that TRAMA achieved higher throughput than contention-based protocols and comparable throughput with NAMA [60].

B-MAC is a carrier sense MAC protocol for EWSNs. B-MAC adjusts the duty cycle and time slot allocation for throughput optimization and high channel utilization. B-MAC supports on-the-fly reconfiguration of the MAC backoff strategy for performance (e.g., throughput, latency, power conservation) optimization. Results from B-MAC and S-MAC implementation on TinyOS using Mica2 motes indicated that B-MAC outperformed S-MAC by 3.5x on average [56]. No sensor node was allocated more than 15% additional bandwidth as compared with other nodes, thus ensuring fairness (load balancing).

Z-MAC is a hybrid MAC protocol that combines the strengths of TDMA and carrier sense multiple access (CSMA) and offsets their weaknesses. Z-MAC allocates time slots at sensor node deployment time by using an efficient channel scheduling algorithm to optimize throughput, but this mechanism requires high initial overhead. A time slot's owner is the sensor node allocated to that time slot and all other nodes are called non-owners of that time slot. Multiple owners are possible for a given time slot because Z-MAC allows any two sensor nodes beyond their two-hop neighborhoods to own the same time slot. Unlike TDMA, a sensor node may transmit during any time slot but slot owners have a higher priority. Experimental results from Z-MAC implementation on both ns-2 and TinyOS/Mica2 indicated that Z-MAC performed better than B-MAC under medium to high contention but exhibited worse performance than B-MAC under low contention (a behavior inherited from TDMA-based channel access). The fairness index of Z-MAC was between 0.7 and 1, whereas that of B-MAC was between 0.2 and 0.3 for a large number of senders [57].
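The text does not say which fairness index the cited experiments used. A common choice that equals 1 for perfectly uniform delivery is Jain's fairness index, which is assumed in the Python sketch below; the per-sender packet counts are made up for illustration.

```python
def jain_fairness_index(packets_per_sender):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Returns 1.0 when every sender delivers the same number of packets and
    approaches 1/n when a single sender dominates.
    """
    n = len(packets_per_sender)
    if n == 0:
        raise ValueError("need at least one sender")
    total = sum(packets_per_sender)
    sum_of_squares = sum(x * x for x in packets_per_sender)
    if sum_of_squares == 0:
        raise ValueError("at least one sender must deliver a packet")
    return (total * total) / (n * sum_of_squares)


# Hypothetical packet counts delivered to the sink by four senders.
uniform = jain_fairness_index([100, 100, 100, 100])
skewed = jain_fairness_index([400, 0, 0, 0])
print(uniform, skewed)
```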

2.3.2 Power/Energy Optimizations

MAC layer protocols can adapt their transceiver operating modes (e.g., sleep, on, and off) and duty cycle for reduced power and/or energy consumption. MAC layer protocols that adjust duty cycle for power/energy optimization include Power Aware Multi-Access with Signaling (PAMAS) [47][61], S-MAC [58], Timeout-MAC (T-MAC) [62], and B-MAC.

PAMAS is a MAC layer protocol for EWSNs that adjusts the duty cycle to minimize radio on time and optimize power consumption. PAMAS uses separate data and control channels (the control channel manages the request/clear to send (RTS/CTS) signals or the receiver busy tone). If a sensor node is receiving a message on the data channel and receives an RTS message on the signaling channel, then the sensor node responds with a busy tone on the signaling channel. This mechanism avoids collisions and results in energy savings. The PAMAS protocol powers off the receiver if either the transmit message queue is empty and the node's neighbor is transmitting or the transmit message queue is not empty but at least one neighbor is transmitting and one neighbor is receiving. EWSN simulations with 10 to 20 sensor nodes with 512-byte data packets, 32-byte RTS/CTS packets, and 64-byte busy tone signal packets revealed power savings between 10% and 70% [63]. PAMAS optimization challenges include implementation complexity and associated area cost because the separate control channel requires a second transceiver and duplexer.

The S-MAC protocol tunes the duty cycle and message size for energy conservation. S-MAC minimizes wasted energy due to frame (packet) collisions (since collided frames must be retransmitted with additional energy cost), overhearing (a sensor node receiving/listening to a frame destined for another node), control frame overhead, and idle listening (channel monitoring to identify possible incoming messages destined for that node). S-MAC uses a periodic sleep and listen (sleep-sense) strategy defined by the duty cycle. S-MAC avoids frame collisions by using virtual sense (network allocation
vector (NAV)-based) and physical carrier sense (receiver listening to the channel) similar to IEEE 802.11. S-MAC avoids overhearing by instructing interfering sensor nodes to switch to sleep mode after hearing an RTS or CTS packet [61]. Experiments conducted on Rene Motes [64] for a traffic load comprising messages sent every 1-10 seconds revealed that an IEEE 802.11-based MAC consumed 2x to 6x more energy than S-MAC [65].

T-MAC adjusts the duty cycle dynamically for power-efficient operation. T-MAC allows a variable sleep-sense duty cycle as opposed to the fixed duty cycle used in S-MAC (e.g., 10% sense and 90% sleep). The dynamic duty cycle further reduces the idle listening period. The sensor node switches to sleep mode when there is no activation event (e.g., data reception, timer expiration, communication activity sensing, or impending data reception knowledge through neighbors' RTS/CTS) for a predetermined period of time. Experimental results obtained from T-MAC protocol implementation on OMNeT++ [66] to model EYES sensor nodes [67] revealed that under homogeneous load (sensor nodes sent packets with 20- to 100-byte payloads to their neighbors at random), both T-MAC and S-MAC yielded 98% energy savings as compared to CSMA whereas T-MAC outperformed S-MAC by 5x under variable load [60].

B-MAC adjusts the duty cycle for power conservation using channel assessment information. B-MAC duty cycles the radio through a periodic channel sampling mechanism known as low power listening (LPL). Each time a sensor node wakes up, the sensor node turns on the radio and checks for channel activity. If the sensor node detects activity, the sensor node powers up and stays awake for the time required to receive an incoming packet. If no packet is received, indicating inaccurate activity detection, a timeout forces the sensor node to sleep mode. B-MAC requires an accurate clear channel assessment to achieve low power operation. Experimental results obtained from B-MAC and S-MAC implementation on TinyOS using Mica2
motes revealed that B-MAC power consumption was within 25% of S-MAC for low throughputs (below 45 bits per second) whereas B-MAC outperformed S-MAC by 60% for higher throughputs. Results indicated that B-MAC performed better than S-MAC for latencies under 6 seconds whereas S-MAC yielded lower power consumption as latency approached 10 seconds [56].

2.4 Network-Level Data Dissemination and Routing Protocol Optimizations

One commonality across diverse EWSN application domains is the embedded sensor node's task to sense and collect data about a phenomenon and transmit the data to the sink node. To meet application requirements, this data dissemination requires energy-efficient routing protocols to establish communication paths between the embedded sensor nodes and the sink node. Typically harsh operating environments coupled with stringent resource and energy constraints make data dissemination and routing challenging for EWSNs. Ideally, data dissemination and routing protocols should target energy efficiency, robustness, and scalability. To achieve these optimization objectives, routing protocols adjust transmission power and routing strategies, and leverage either single-hop or multi-hop routing. In this section, we discuss protocols that optimize data dissemination and routing in EWSNs.

2.4.1 Query Dissemination Optimizations

Query dissemination (transmission of a sensed data query/request from a sink node to a sensor node) and data forwarding (transmission of sensed data from a sensor node to a sink node) require routing layer optimizations. Protocols that optimize query dissemination and data forwarding include Declarative Routing Protocol (DRP) [68], directed diffusion [69], GRAdient Routing (GRAd) [70], GRAdient Broadcast (GRAB) [71], and Energy Aware Routing (EAR) [60][72].

DRP targets energy efficiency by exploiting in-network aggregation (multiple data items are aggregated as they are forwarded by sensor nodes). Fig. 2-3 shows in-network data aggregation where sensor node I aggregates sensed data from source
nodes A, B, and C, sensor node J aggregates sensed data from source nodes D and E, and sensor node K aggregates sensed data from source nodes F, G, and H. The sensor node L aggregates the sensed data from sensor nodes I, J, and K, and transmits the aggregated data to the sink node. DRP uses reverse path forwarding where data reports (packets containing sensed data in response to a query) flow in the reverse direction of the query propagation to reach the sink.

Figure 2-3. Data aggregation.

Directed diffusion targets energy efficiency, scalability, and robustness under network dynamics using reverse path forwarding. Directed diffusion builds a shared mesh to deliver data from multiple sources to multiple sinks. The sink node disseminates the query, a process referred to as interest propagation (Fig. 2-4(a)). When a sensor node receives a query from a neighboring node, the sensor node sets up a vector called the gradient from itself to the neighboring node and directs future data flows on this gradient (Fig. 2-4(b)). The sink node receives an initial batch of data reports along multiple paths and uses a mechanism called reinforcement to select a path with the best forwarding quality (Fig. 2-4(c)). To handle network dynamics such as sensor node failures, each data source floods data reports periodically at lower rates to maintain alternate paths. Directed diffusion challenges include formation of initial gradients and wasted energy due to redundant data flows to maintain alternate paths.
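The in-network aggregation tree described for Fig. 2-3 (I, J, and K aggregate their source nodes, and L aggregates I, J, and K) can be sketched as a small recursive fold in Python. The readings are hypothetical, and averaging is an assumed aggregation function; the text does not specify which function DRP applies.

```python
# Aggregation tree from the description of Fig. 2-3: each aggregating
# node maps to the nodes whose data it combines.
TREE = {
    "L": ["I", "J", "K"],
    "I": ["A", "B", "C"],
    "J": ["D", "E"],
    "K": ["F", "G", "H"],
}


def aggregate(node, readings, tree=TREE):
    """Return (sum, count) of the readings in the subtree rooted at node.

    Forwarding a (sum, count) pair lets every aggregator send a single
    report upstream while still allowing an exact mean at the sink.
    """
    if node not in tree:  # a source node reports its own reading
        return readings[node], 1
    total, count = 0.0, 0
    for child in tree[node]:
        child_total, child_count = aggregate(child, readings, tree)
        total += child_total
        count += child_count
    return total, count


# Hypothetical sensed values at the eight source nodes.
readings = {"A": 20.0, "B": 22.0, "C": 24.0, "D": 30.0,
            "E": 32.0, "F": 18.0, "G": 19.0, "H": 20.0}
total, count = aggregate("L", readings)
mean = total / count
print(f"L forwards one report: sum={total}, count={count}, mean={mean}")
```

In this sketch node L sends one report to the sink instead of relaying eight separate ones, which is where the energy saving of in-network aggregation comes from.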

Figure 2-4. Directed diffusion: (a) Interest propagation; (b) Initial gradient setup; (c) Data delivery along the reinforced path.

GRAd optimizes data forwarding and uses cost-field based forwarding where the cost metric is based on the hop count (i.e., sensor nodes closer to the sink node have smaller costs and those farther away have higher costs). The sink node floods a REQUEST message and the data source broadcasts the data report containing the requested sensed information. The neighbors with smaller costs forward the report to the sink node. GRAd drawbacks include wasted energy due to redundant data report copies reaching the sink node.

GRAB optimizes data forwarding and uses cost-field based forwarding where the cost metric denotes the total energy required to send a packet to the sink node. GRAB was designed for harsh environments with high channel error rate and frequent sensor node failures. GRAB controls redundancy by controlling the width (number of routes from the source sensor node to the sink node) of the forwarding mesh but requires that sensor nodes make assumptions about the energy required to transmit a data report to a neighboring node.

EAR optimizes data forwarding and uses cost-field based forwarding where the cost metric denotes energy per neighbor. EAR optimization objectives are load balancing and energy conservation. EAR makes forwarding decisions probabilistically where the
assigned probability is inversely proportional to the neighbor energy cost so that paths consuming more energy are used less frequently [60].

2.4.2 Real-Time Constrained Optimizations

Critical EWSN applications may have real-time requirements for sensed data delivery (e.g., a security/defense system monitoring enemy troops or a forest fire detection application). Failure to meet the real-time deadlines for these applications can have catastrophic consequences. Routing protocols that consider the timing constraints for real-time requirements include Real-time Architecture and Protocol (RAP) [73] and a stateless protocol for real-time communication in sensor networks (SPEED) [74].

RAP provides real-time data delivery by considering the data report expiration time (time after which the data is of little or no use) and the remaining distance the data report needs to travel to reach the sink node. RAP calculates the desired velocity $v = d/t$ where $d$ and $t$ denote the destination distance and packet lifetime, respectively. The desired velocity is updated at each hop to reflect the data report's urgency. A sensor node uses multiple first-in-first-out (FIFO) queues where each queue accepts reports of velocities within a certain range and then schedules transmissions according to a report's degree of urgency [60].

SPEED provides real-time data delivery and uses an exponentially weighted moving average for delay calculation. Given a data report with velocity $v$, SPEED calculates the speed $v_i$ of the report if the neighbor $N_i$ is selected as the next hop and then selects a neighbor with $v_i > v$ to forward the report to [60].

2.4.3 Network Topology Optimizations

Routing protocols can adjust radio transmission power to control network topology (based on routing paths). Low-Energy Adaptive Clustering Hierarchy (LEACH) [75] optimizes the network topology for reduced energy consumption by adjusting the radio's transmission power. LEACH uses a hybrid single-hop and multi-hop communication paradigm. The sensor nodes use multi-hop communication to transmit data reports to
a cluster head (LEACH determines the cluster head using a randomized distributed algorithm). The cluster head forwards data to the sink node using long-range radio transmission.

2.4.4 Resource Adaptive Optimizations

Routing protocols can adapt routing activities in accordance with available resources. Sensor Protocols for Information via Negotiation (SPIN) [76] optimizes performance efficiency by using data negotiation and resource adaptation. In data negotiation, sensor nodes associate metadata with their data and exchange this metadata before actual data transmission begins. The sensor nodes interested in the data content, based on metadata, request the actual data. This data negotiation ensures that data is sent only to interested nodes. SPIN allows sensor nodes to adjust routing activities according to available energy resources. At low energy levels, sensor nodes reduce or eliminate certain activities (e.g., forwarding of metadata and data packets) [53].

2.5 Operating System-Level Optimizations

An embedded sensor node's operating system (OS) presents optimization challenges because embedded sensor node operation falls between single-application devices that typically do not need an OS and general-purpose devices with resources to run traditional embedded OSs. An embedded sensor node's OS manages processor, radio, I/O buses, and Flash memory, and provides hardware abstraction to application software, task coordination, power management, and networking services. In this section, we discuss several optimizations provided by existing OSs for embedded sensor nodes [53].

2.5.1 Event-Driven Optimizations

Embedded sensor nodes respond to events by controlling sensing and actuation activity. Since sensor nodes are event-driven, it is important to optimize the OS for event handling. EWSN OSs optimized for event handling include TinyOS [77] and PicOS [78].

TinyOS operates using an event-driven model (tasks are executed based on events). TinyOS is written in the nesC programming language and allows application software to access hardware directly. TinyOS's advantages include simple OS code, energy efficiency, and a small memory footprint. TinyOS challenges include introduced complexity in application development and porting of existing C code to TinyOS.

PicOS is an event-driven OS written in C and designed for limited memory microcontrollers. PicOS tasks are structured as a finite state machine (FSM) and state transitions are triggered by events. PicOS is effective for reactive applications whose primary role is to react to events. PicOS supports multitasking and has small memory requirements but is not suitable for real-time applications.

2.5.2 Dynamic Power Management

A sensor node's OS can control hardware components to optimize power consumption. Examples include Operating System-directed Power Management (OSPM) [79] and MagnetOS [80], each of which provides mechanisms for dynamic power management. OSPM offers greedy-based dynamic power management, which switches the sensor node to a sleep state when idle. Sleep states provide energy conservation; however, transition to a sleep state has the overhead of storing the processor state and requires a finite amount of wakeup time. Disadvantages of OSPM's greedy-based adaptive sleep mechanism include wakeup delay and potentially missing events during sleep time. MagnetOS provides two online power-aware algorithms and an adaptive mechanism for applications to effectively utilize the sensor node's resources.

2.5.3 Fault-Tolerance

Since maintenance and repair of embedded sensor nodes is typically not feasible after deployment, embedded sensor nodes require fault-tolerant mechanisms for reliable operation. MANTIS [81] is a multithreaded OS that provides fault-tolerant isolation between applications by not allowing a blocking task to prevent the execution of other tasks. In the absence of fault-tolerant isolation, if one task executes a conditional loop
whose logical condition is never satisfied, then that task will execute in an infinite loop blocking all other tasks. MANTIS facilitates simple application development and allows dynamic reprogramming to update the sensor node's binary code. MANTIS offers a multimodal prototyping environment for testing EWSN applications by providing a remote shell and command server to enable inspection of the sensor node's memory and status remotely. MANTIS challenges include context switch time, stack memory overhead (since each thread requires one stack), and high energy consumption.

2.6 Dynamic Optimizations

Dynamic optimizations enable in-situ parameter tuning and empower the embedded sensor node to adapt to changing application requirements and environmental stimuli throughout the embedded sensor node's lifetime. Dynamic optimizations are important because application requirements change over time and environmental stimuli/conditions may not be accurately predicted at design time. Although some OS, MAC layer, and routing optimizations discussed in prior sections of this chapter are dynamic in nature, in this section we present additional dynamic optimization techniques for EWSNs.

2.6.1 Dynamic Voltage and Frequency Scaling

Dynamic voltage and frequency scaling (DVFS) adjusts a sensor node's processor voltage and frequency to optimize energy consumption. DVFS trades off performance for reduced energy consumption by considering that the peak computation (instruction execution) rate is much higher than the application's average throughput requirement and that sensor nodes are based on CMOS logic, which has a voltage-dependent maximum operating frequency. Min et al. [82] demonstrated that a DVFS system containing a voltage scheduler running in tandem with the operating system's task scheduler resulted in a 60% reduction in energy consumption. Yuan et al. [83] studied a DVFS system for sensor nodes that required the sensor nodes to insert additional information (e.g., packet length, expected processing time, and deadline)
into the data packet's header. The receiving sensor node utilized this information to select an appropriate processor voltage and frequency to minimize the overall energy consumption.

2.6.2 Software-Based Dynamic Optimizations

Software can provide dynamic optimizations using techniques such as duty cycling, batching, hierarchy, and redundancy reduction. Software can control the duty cycle so that sensor nodes are powered in a cyclic manner to reduce the average power draw. In batching, multiple operations are buffered and then executed in a burst to reduce startup overhead cost. Software can arrange operations in a hierarchy based on energy consumption and then invoke low energy operations before high energy operations. Software can reduce redundancy by compression, data aggregation, and/or message suppression. Kogekar et al. [84] proposed an approach for software reconfiguration in EWSNs. The authors modeled the EWSN operation space (defined by the EWSN software components' models and application requirements) and defined reconfiguration as the process of switching from one point in the operation space to another.

2.6.3 Dynamic Network Reprogramming

Dynamic network reprogramming reprograms embedded sensor nodes to change/modify tasks by disseminating code in accordance with changing environmental stimuli. Since recollection and reprogramming is not a feasible option for most sensor nodes, dynamic network reprogramming enables the sensor nodes to perform different tasks. For example, an EWSN initially deployed for measuring relative humidity can measure temperature statistics after dynamic reprogramming. The MANTIS OS possesses the dynamic reprogramming ability (Section 2.5.3).
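The DVFS trade-off from Section 2.6.1 can be illustrated with the standard CMOS approximation that dynamic switching energy per cycle scales with the square of the supply voltage. The Python sketch below is not the scheduler of Min et al. or Yuan et al.; the operating points and the effective capacitance are hypothetical values chosen only to show how a deadline (such as the one carried in a packet header) could drive the choice of voltage and frequency.

```python
# Hypothetical (voltage in V, frequency in Hz) operating points; these are
# illustrative and not taken from any processor datasheet.
OPERATING_POINTS = [(1.0, 50e6), (1.5, 100e6), (2.0, 200e6)]
C_EFF = 1e-9  # assumed effective switched capacitance in farads


def task_energy_j(voltage, cycles, c_eff=C_EFF):
    """Dynamic switching energy: C_eff * V^2 per cycle, times the cycle count."""
    return c_eff * voltage ** 2 * cycles


def pick_operating_point(cycles, deadline_s, points=OPERATING_POINTS):
    """Return the lowest-energy point that still finishes before the deadline."""
    feasible = [(v, f) for v, f in points if cycles / f <= deadline_s]
    if not feasible:
        raise ValueError("no operating point meets the deadline")
    return min(feasible, key=lambda point: task_energy_j(point[0], cycles))


cycles = 1_000_000
chosen = pick_operating_point(cycles, deadline_s=0.015)
print("chosen (V, Hz):", chosen)
print("energy at chosen point (J):", task_energy_j(chosen[0], cycles))
print("energy at fastest point (J):", task_energy_j(2.0, cycles))
```

With these assumed numbers the slowest point misses the deadline, so the middle point is selected and uses less energy than running at the highest voltage and frequency.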

CHAPTER 3
AN APPLICATION METRICS ESTIMATION MODEL FOR DISTRIBUTED EMBEDDED WIRELESS SENSOR NETWORKS

Advancements in semiconductor technology, as predicted by Moore's law, have enabled high transistor density in a small chip area resulting in the miniaturization of embedded systems (e.g., embedded sensor nodes). Embedded wireless sensor networks (EWSNs) are envisioned as ubiquitous, distributed computing systems, which are proliferating in many application domains (e.g., defense, health care, surveillance systems) each with varying application requirements that can be defined by high-level application metrics (e.g., lifetime, reliability). However, the diversity of EWSN application domains makes it difficult for commercial-off-the-shelf (COTS) embedded sensor nodes to meet these application requirements.

Since COTS embedded sensor nodes are mass-produced to optimize cost, many COTS embedded sensor nodes possess tunable parameters (e.g., processor voltage and frequency, sensing frequency), whose values can be tuned for application specialization [37]. The EWSN application designers (those who design, manage, or deploy the EWSN for an application) are typically biologists, teachers, farmers, and household consumers who are experts within their application domain, but have limited technical expertise. Given the large design space and operating constraints, determining appropriate parameter values (operating state) can be a daunting and/or time-consuming task for non-expert application managers. Typically, embedded sensor node vendors assign initial generic tunable parameter value settings; however, no one tunable parameter value setting is appropriate for all applications. To assist the EWSN managers with parameter tuning to best fit the application requirements, an automated parameter tuning process is required.

Application metrics estimation for distributed EWSNs is still in its infancy. Only a few lifetime estimation models exist for distributed EWSNs [85][86][87]; however, these models either do not consider low-level sensor node tunable parameters or only
consider marginally a few low-level sensor node tunable parameters. Furthermore, existing models for distributed EWSNs focus mainly on networking issues in distributed EWSNs as opposed to embedded issues (e.g., processor, transceiver, sensors) in an embedded sensor node. Moreover, the literature does not discuss application metrics estimation models for application metrics other than lifetime, such as throughput and reliability.

In this chapter, we propose, for the first time to the best of our knowledge, an application metrics estimation model that estimates high-level application metrics from low-level sensor node tunable parameters and the sensor node's hardware internals (e.g., transceiver voltage, transceiver receive current). Dynamic optimization methodologies for distributed EWSNs leverage this estimation model while comparing different operating states for optimization purposes.

Our research has a broad impact on EWSN design and deployment. Our application metrics estimation model provides a first step towards high-level metrics estimation from sensor node tunable parameters and hardware internals. The estimation model establishes a relationship between sensor node operating state and high-level metrics. Since application managers typically focus on high-level metrics and are generally unaware of low-level sensor node internals, this model provides an interface between the application manager and the sensor node internals. Additionally, our model can potentially spark further research in application metrics estimation for EWSNs.

The remainder of this chapter is organized as follows. Section 3.1 describes our application metrics estimation model that is leveraged by various dynamic optimization methodologies for distributed EWSNs. Experimental results are presented in Section 3.2. Finally, Section 3.3 concludes the chapter and discusses future research work directions.

3.1 Application Metrics Estimation Model

This section presents our application metrics estimation model, which is leveraged by various dynamic optimization methodologies for distributed EWSNs. This estimation model estimates high-level application metrics (lifetime, throughput, reliability) from low-level tunable parameters and sensor node hardware internals. The use of hardware internals is appropriate for application metrics modeling as similar approaches have been used in literature especially for lifetime estimation [85][86][87]. Based on tunable parameter value settings corresponding to an operating state and hardware-specific values, the application metrics estimation model determines corresponding values for high-level application metrics. These high-level application metric values are then used in their respective objective functions to determine the objective function values corresponding to an operating state (e.g., the lifetime estimation model determines $s_l$ (lifetime offered by state $s$), which is then used in the lifetime objective function to determine the objective function value). This section presents a complete description of our application metrics estimation model, including a review of our previous application metrics estimation model [88] and additional details.

3.1.1 Lifetime Estimation

An embedded sensor node's lifetime is defined as the time duration between sensor node deployment and sensor node failure due to a wide variety of reasons (e.g., battery depletion, hardware/software fault, environmental damage, external destruction). Lifetime estimation models typically consider battery depletion as the cause of sensor node failure [89]. Since embedded sensor nodes can be deployed in remote and hostile environments, manual battery replacement after deployment is often impractical. An embedded sensor node reaches the failed or dead state once the entire battery energy is depleted. The critical factors that determine a sensor node's lifetime are battery energy and energy consumption during operation.

The embedded sensor node lifetime in days $L_s$ can be estimated as

$$L_s = \frac{E_b}{E_c \times 24} \tag{3-1}$$

where $E_b$ denotes the embedded sensor node's battery energy in Joules and $E_c$ denotes the embedded sensor node's energy consumption per hour. The battery energy in mWh $E'_b$ can be given by

$$E'_b = V_b \cdot C_b \ \text{(mWh)} \tag{3-2}$$

where $V_b$ denotes battery voltage in Volts and $C_b$ denotes battery capacity, typically specified in mA-h. Since 1 J = 1 Ws, $E_b$ can be calculated as

$$E_b = E'_b \times 3600 / 1000 \ \text{(J)} \tag{3-3}$$

The sensors in the embedded sensor node gather information about the physical environment and generate continuous sequences of analog signals/values. Sample-and-hold circuits and analog-to-digital (A/D) converters digitize these analog signals. This digital information is processed by a processor, and the results are communicated to other sensor nodes or a base station node (sink node) via a transmitter. The sensing energy is the energy consumed by the sensor node due to sensing events. The processing energy is the energy consumed by the processor to process the sensed data (e.g., calculating the average of the sensor values over a time interval or the difference between the most recent sensor values and the previously sensed values). The communication energy is the energy consumed due to communication with other sensor nodes or the sink node. For example, sensor nodes send packets containing the sensed/processed data information to other sensor nodes and the sink node, which consumes communication energy.

We model $E_c$ as the sum of the processing energy, communication energy, and sensing energy

$$E_c = E_{sen} + E_{proc} + E_{com} \ \text{(J)} \tag{3-4}$$
where $E_{sen}$, $E_{proc}$, and $E_{com}$ denote the sensing energy per hour, processing energy per hour, and communication energy per hour, respectively.

The sensing (sampling) frequency and the number of sensors attached to the sensor board (e.g., the MTS400 sensor board [90] has Sensirion SHT1x temperature and humidity sensors [91]) are the main contributors to the total sensing energy. Our model considers energy conservation by allowing sensors to switch to a low power, idle mode while not sensing. $E_{sen}$ is given by

$$E_{sen} = E^{m}_{sen} + E^{i}_{sen} \tag{3-5}$$

where $E^{m}_{sen}$ denotes the sensing measurement energy per hour and $E^{i}_{sen}$ denotes the sensing idle energy per hour. $E^{m}_{sen}$ can be calculated as

$$E^{m}_{sen} = N_s \cdot V_s \cdot I^{m}_{s} \cdot t^{m}_{s} \times 3600 \tag{3-6}$$

where $N_s$ denotes the number of sensing measurements per second, $V_s$ denotes the sensing board voltage, $I^{m}_{s}$ denotes the sensing measurement current, and $t^{m}_{s}$ denotes the sensing measurement time. $N_s$ can be calculated as

$$N_s = N_r \cdot F_s \tag{3-7}$$

where $N_r$ denotes the number of sensors on the sensing board and $F_s$ denotes the sensing frequency. $E^{i}_{sen}$ is given by

$$E^{i}_{sen} = V_s \cdot I_s \cdot t^{i}_{s} \times 3600 \tag{3-8}$$

where $I_s$ denotes the sensing sleep current and $t^{i}_{s}$ denotes the sensing idle time. $t^{i}_{s}$ is given by

$$t^{i}_{s} = 1 - t^{m}_{s} \tag{3-9}$$

We assume that the embedded sensor node's processor operates in two modes: active mode and idle mode [92]. The processor operates in active mode while
processing the sensed data and switches to the idle mode for energy conservation when not processing. The processing energy is the sum of the processor's energy consumption while operating in the active and the idle modes. We point out that although we only consider active and idle modes, a processor operating in additional sleep modes (e.g., power-down, power-save, standby) can also be incorporated in our model. $E_{proc}$ is given by

$$E_{proc} = E^{a}_{proc} + E^{i}_{proc} \tag{3-10}$$

where $E^{a}_{proc}$ and $E^{i}_{proc}$ denote the processor's energy consumption per hour in the active and idle modes, respectively. $E^{a}_{proc}$ is given by

$$E^{a}_{proc} = V_p \cdot I^{a}_{p} \cdot t_a \tag{3-11}$$

where $V_p$ denotes the processor voltage, $I^{a}_{p}$ denotes the processor active mode current, and $t_a$ denotes the time spent by the processor in the active mode. $t_a$ can be estimated as

$$t_a = N_I / F_p \tag{3-12}$$

where $N_I$ denotes the average number of processor instructions to process one sensing measurement and $F_p$ denotes the processor frequency. $N_I$ can be estimated as

$$N_I = N_b \cdot R^{b}_{sen} \tag{3-13}$$

where $N_b$ denotes the average number of processor instructions to process one bit and $R^{b}_{sen}$ denotes the sensing resolution bits (number of bits required for storing one sensing measurement).

$E^{i}_{proc}$ is given by

$$E^{i}_{proc} = V_p \cdot I^{i}_{p} \cdot t_i \tag{3-14}$$

where $I^{i}_{p}$ denotes the processor idle mode current and $t_i$ denotes the time spent by the processor in the idle mode. Since the processor switches to the idle mode when not
processing sensing measurements, $t_i$ can be given as

$$t_i = 1 - t_a \qquad (3)$$

The transceiver (radio) is the main contributor to the total communication energy consumption. The transceiver transmits/receives data packets and switches to the idle mode for energy conservation when there are no more packets to transmit/receive. The number of packets transmitted (received) and the packets' transmission (receive) interval dictate the communication energy. The communication energy is the sum of the transmission, receive, and idle energies for the sensor node's transceiver

$$E_{com} = E_{trans}^{tx} + E_{trans}^{rx} + E_{trans}^{i} \qquad (3)$$

where $E_{trans}^{tx}$, $E_{trans}^{rx}$, and $E_{trans}^{i}$ denote the transceiver's transmission energy per hour, receive energy per hour, and idle energy per hour, respectively. $E_{trans}^{tx}$ is given by

$$E_{trans}^{tx} = N_{pkt}^{tx} \cdot E_{tx}^{pkt} \qquad (3)$$

where $N_{pkt}^{tx}$ denotes the number of packets transmitted per hour and $E_{tx}^{pkt}$ denotes the transmission energy per packet. $N_{pkt}^{tx}$ can be calculated as

$$N_{pkt}^{tx} = 3600 / P_{ti} \qquad (3)$$

where $P_{ti}$ denotes the packet transmission interval in seconds (1 hour = 3600 seconds). $E_{tx}^{pkt}$ is given as

$$E_{tx}^{pkt} = V_t \cdot I_t \cdot t_{tx}^{pkt} \qquad (3)$$

where $V_t$ denotes the transceiver voltage, $I_t$ denotes the transceiver current, and $t_{tx}^{pkt}$ denotes the time to transmit one packet. $t_{tx}^{pkt}$ is given by

$$t_{tx}^{pkt} = P_s \times 8 / R_{tx} \qquad (3)$$
where $P_s$ denotes the packet size in bytes and $R_{tx}$ denotes the transceiver data rate (in bits/second).

The transceiver's receive energy per hour $E_{trans}^{rx}$ can be calculated using a similar procedure as $E_{trans}^{tx}$. $E_{trans}^{rx}$ is given by

$$E_{trans}^{rx} = N_{pkt}^{rx} \cdot E_{rx}^{pkt} \qquad (3)$$

where $N_{pkt}^{rx}$ denotes the number of packets received per hour and $E_{rx}^{pkt}$ denotes the receive energy per packet. $N_{pkt}^{rx}$ can be calculated as

$$N_{pkt}^{rx} = 3600 / P_{ri} \qquad (3)$$

where $P_{ri}$ denotes the packet receive interval in seconds. $P_{ri}$ can be calculated as

$$P_{ri} = P_{ti} / n_s \qquad (3)$$

where $n_s$ denotes the number of neighboring sensor nodes. $E_{rx}^{pkt}$ is given as

$$E_{rx}^{pkt} = V_t \cdot I_t^{rx} \cdot t_{rx}^{pkt} \qquad (3)$$

where $I_t^{rx}$ denotes the transceiver receive current and $t_{rx}^{pkt}$ denotes the time to receive one packet. Since the packet size is the same, the time to receive a packet is equal to the time to transmit the packet, that is, $t_{rx}^{pkt} = t_{tx}^{pkt}$.

$E_{trans}^{i}$ can be calculated as

$$E_{trans}^{i} = V_t \cdot I_t^{s} \cdot t_{tx}^{i} \qquad (3)$$

where $I_t^{s}$ denotes the transceiver sleep current and $t_{tx}^{i}$ denotes the transceiver idle time per hour. $t_{tx}^{i}$ can be calculated as

$$t_{tx}^{i} = 3600 - (N_{pkt}^{tx} \cdot t_{tx}^{pkt}) - (N_{pkt}^{rx} \cdot t_{rx}^{pkt}) \qquad (3)$$
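The sensing, processing, and communication energy relations above can be collected into a few small functions. The following Python sketch is illustrative only: the function and argument names are our own, the numeric inputs are the IRIS mote values that the worked example of Section 3.2.2 uses (including the 9.5 mA transmit current, 2.5 mA active current, and 0.65 mA idle current quoted there), and the final conversion to days assumes the chapter's lifetime relation $L_s = E_b / (E_c \times 24)$ with $E_b$ in joules.

```python
SECONDS_PER_HOUR = 3600


def sensing_energy(n_r, f_s, v_s, i_meas, t_meas, i_sleep):
    """E_sen = E_sen^m + E_sen^i, in joules."""
    n_s = n_r * f_s                      # sensing measurements per second
    e_meas = n_s * v_s * i_meas * t_meas * SECONDS_PER_HOUR
    t_idle = 1 - t_meas                  # idle time exactly as defined in the text
    e_idle = v_s * i_sleep * t_idle * SECONDS_PER_HOUR
    return e_meas + e_idle


def processing_energy(v_p, i_active, i_idle, n_b, r_bits, f_p):
    """E_proc = E_proc^a + E_proc^i, in joules."""
    t_active = n_b * r_bits / f_p        # t_a = N_I / F_p with N_I = N_b * R_sen^b
    t_idle = 1 - t_active                # t_i = 1 - t_a
    return v_p * i_active * t_active + v_p * i_idle * t_idle


def communication_energy(v_t, i_tx, i_rx, i_sleep, p_s, r_tx, p_ti, n_nbr):
    """E_com = transmit + receive + idle energy, in joules."""
    t_pkt = p_s * 8 / r_tx               # time to send (or receive) one packet
    n_tx = SECONDS_PER_HOUR / p_ti       # packets transmitted per hour
    n_rx = SECONDS_PER_HOUR / (p_ti / n_nbr)  # packets received per hour
    e_tx = n_tx * v_t * i_tx * t_pkt
    e_rx = n_rx * v_t * i_rx * t_pkt
    t_idle = SECONDS_PER_HOUR - n_tx * t_pkt - n_rx * t_pkt
    e_idle = v_t * i_sleep * t_idle
    return e_tx + e_rx + e_idle


def lifetime_days(v_b, c_b_mah, e_c):
    """Lifetime in days from battery voltage (V), capacity (mA-h), and E_c (J)."""
    e_b = v_b * c_b_mah * 3600 / 1000    # mWh converted to joules
    return e_b / (e_c * 24)


# Example state (V_p, F_p, F_s, P_s, P_ti) = (2.7 V, 4 MHz, 1 Hz, 41 B, 60 s)
e_sen = sensing_energy(n_r=2, f_s=1, v_s=3, i_meas=550e-6,
                       t_meas=55e-3, i_sleep=0.3e-6)
e_proc = processing_energy(v_p=2.7, i_active=2.5e-3, i_idle=0.65e-3,
                           n_b=5, r_bits=24, f_p=4e6)
e_com = communication_energy(v_t=3, i_tx=9.5e-3, i_rx=15.5e-3, i_sleep=20e-9,
                             p_s=41, r_tx=250e3, p_ti=60, n_nbr=2)
e_c = e_sen + e_proc + e_com
print(f"E_c = {e_c:.4f} J, lifetime = {lifetime_days(3.6, 2000, e_c):.1f} days")
```

With these inputs the sketch should land close to the lifetime of roughly 1,617 days reported in Section 3.2.2.1. Keeping each component in its own function makes it straightforward to swap in datasheet values for a different platform.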

3.1.2 Throughput Estimation

Throughput is defined as the amount of work processed by a system in a given unit of time. Defining throughput semantics for embedded sensor nodes is challenging because three main components contribute to the throughput: sensing, processing, and communication (transmission). These throughput components can have different significance for different applications. Since these throughput components are related, one possible interpretation is to take the throughput of the lowest throughput component as the effective throughput. However, the effective throughput may not be a suitable metric for a designer who is interested in throughputs associated with all three components.

In our model, we define the aggregate throughput as the combination of the sensor node's sensing, processing, and transmission rates to observe/monitor a phenomenon (measured in bits/second). The aggregate throughput can be considered as the weighted sum of the constituent throughputs. Our aggregate throughput model can be used for the effective throughput estimation by assigning a weight factor of one to the slowest of the three components and assigning a weight factor of zero to the others. Since aggregate throughput modeling allows flexibility and can be adapted to varying needs of an EWSN designer, we focus on modeling of the aggregate throughput. We model aggregate throughput as

$$R = \omega_s R_{sen} + \omega_p R_{proc} + \omega_c R_{com} : \omega_s + \omega_p + \omega_c = 1 \qquad (3)$$

where $R_{sen}$, $R_{proc}$, and $R_{com}$ denote the sensing, processing, and communication throughputs, respectively, and $\omega_s$, $\omega_p$, and $\omega_c$ denote the associated weight factors.

The sensing throughput, which is the throughput due to sensing activity, depends upon the sensing frequency and sensing resolution bits per sensing measurement. $R_{sen}$ is given by

$$R_{sen} = F_s \cdot R_{sen}^{b} \qquad (3)$$
where $F_s$ denotes the sensing frequency.

The processing throughput, which is the processor's throughput while processing sensed measurements, depends upon the processor frequency and the average number of instructions required to process the sensing measurement. $R_{proc}$ is given by

$$R_{proc} = F_p / N_b \qquad (3)$$

The communication throughput, which measures the number of packets transferred successfully over the wireless channel, depends upon the packet size and the time to transfer one packet. $R_{com}$ is given by

$$R_{com} = P_s^{eff} \times 8 / t_{tx}^{pkt} \qquad (3)$$

where $P_s^{eff}$ denotes the effective packet size excluding the packet header overhead (i.e., $P_s^{eff} = P_s - P_h$ where $P_h$ denotes the packet header size).

3.1.3 Reliability Estimation

The reliability metric measures the number of packets transferred reliably (i.e., error-free packet transmission) over the wireless channel. Accurate reliability estimation is challenging due to dynamic changes in the network topology, number of neighboring sensor nodes, wireless channel fading, sensor network traffic, packet size, etc. The two main factors that affect reliability are transceiver transmission power $P_{tx}$ and receiver sensitivity. For example, the AT86RF230 transceiver [93] has a receiver sensitivity of -101 dBm with a corresponding packet error rate (PER) $\le$ 1% for an additive white Gaussian noise (AWGN) channel with a physical service data unit (PSDU) equal to 20 bytes. Reliability can be estimated using the Friis free space transmission equation [94] for different $P_{tx}$ values, distances between transmitting and receiving sensor nodes, and assumptions on fading model parameters (e.g., shadowing fading model). Different reliability values can be assigned corresponding to different $P_{tx}$ values such that higher $P_{tx}$ values give higher reliability; however, more accurate reliability estimation
requires using profiling statistics for the number of packets transmitted and the number of packets received. These profiling statistics increase the estimation accuracy of the PER and, therefore, reliability.

3.1.4 Models Validation

Our models provide good accuracy in estimating application metrics since our models accommodate many embedded sensor node hardware internals such as the battery voltage, battery capacity, sensing board voltage, sensing sleep current, sensing idle time, sensing resolution bits, etc. Our models are also highly flexible since they permit calculations for particular network settings such as the number of neighboring sensor nodes and different types of sensors with different hardware characteristics (e.g., sensing resolution bits, sensing measurement time, sensing measurement current, etc.).

Since our models provide a first step towards modeling application metrics, our models' accuracy cannot be completely verified against other models because there are no similar/related application metrics estimation models. The existing models for lifetime estimation take different parameters and have different assumptions, thus an exact comparison is not feasible; however, we observe that our lifetime model yields results in a similar range as other models [85][86][87]. We also compare the lifetime estimation from our model with an experimental study on EWSN lifetimes [89]. This comparison verifies conformity of our lifetime model with real measurements. For example, with a sensor node battery capacity of 2500 mA-h, experiments indicate a sensor node lifetime ranging from 72 to 95 hours for a 100% duty cycle for different battery brands (e.g., Ansmann, Panasonic Industrial, Varta High Energy, Panasonic Extreme Power) [89]. Using our model with a duty cycle of 36% on average for the sensing, processing, and communication, we calculated that a lifetime of 95/0.36 = 264 hours ≈ 11 days can be attained. Similarly for a duty cycle of 0.25% on average for the sensing, communication,
and processing, the lifetime can be calculated as 95/0.0025 = 38,000 hours ≈ 1,583 days (example lifetime calculations using our model are given in Section 3.2).

The relative comparison of our models with existing models and real measurements provides insights into the accuracy of our models; however, more accurate models can be constructed following our modeling approach by considering additional parameters and more detailed hardware models for embedded sensor nodes.

3.2 Experimental Results

In this section, we describe the experimental setup and results obtained from our application metrics estimation model.

3.2.1 Experimental Setup

Our experimental setup is based on the Crossbow IRIS mote platform [95] with a battery capacity of 2000 mA-h using two AA alkaline batteries. The IRIS mote platform integrates an Atmel ATmega1281 microcontroller [92], an MTS400 sensor board [90] with Sensirion SHT1x temperature and humidity sensors [91], and an Atmel AT86RF230 low-power 2.4 GHz transceiver [93]. Table 3-1 shows the sensor node hardware-specific values, corresponding to the IRIS mote platform, which are used by the application metrics estimation model [95][92][91][93].

We analyze six tunable parameters: processor voltage $V_p$, processor frequency $F_p$, sensing frequency $F_s$, packet size $P_s$, packet transmission interval $P_{ti}$, and transceiver transmission power $P_{tx}$. In order to explore the fidelity of our methodology across small and large design spaces, we consider two design space cardinalities (number of states in the design space): $|S| = 729$ and $|S| = 31{,}104$. The tunable parameters for $|S| = 729$ are: $V_p = \{2.7, 3.3, 4\}$ (volts), $F_p = \{4, 6, 8\}$ (MHz) [92], $F_s = \{1, 2, 3\}$ (samples per second) [91], $P_s = \{41, 56, 64\}$ (bytes), $P_{ti} = \{60, 300, 600\}$ (seconds), and $P_{tx} = \{-17, -3, 1\}$ (dBm) [93]. The tunable parameters for $|S| = 31{,}104$ are: $V_p = \{1.8, 2.7, 3.3, 4, 4.5, 5\}$ (volts), $F_p = \{2, 4, 6, 8, 12, 16\}$ (MHz) [92], $F_s = \{0.2, 0.5, 1, 2, 3, 4\}$ (samples per second) [91], $P_s = \{32, 41, 56, 64, 100, 127\}$ (bytes), $P_{ti} = \{10, 30, 60, 300, 600, 1200\}$ (seconds), and $P_{tx} = \{-17, -3, 1, 3\}$ (dBm) [93]. All state space tuples are feasible for $|S| = 729$, whereas $|S| = 31{,}104$ contains 7,779 infeasible state space tuples because not all $V_p$ and $F_p$ pairs are feasible.

Table 3-1. Crossbow IRIS mote platform hardware specifications.

Notation        Description                         Value
$V_b$           Battery voltage                     3.6 V
$C_b$           Battery capacity                    2000 mA-h
$N_b$           Processing instructions per bit     5
$R_{sen}^{b}$   Sensing resolution bits             24
$V_t$           Transceiver voltage                 3 V
$R_{tx}$        Transceiver data rate               250 kbps
$I_t^{rx}$      Transceiver receive current         15.5 mA
$I_t^{s}$       Transceiver sleep current           20 nA
$V_s$           Sensing board voltage               3 V
$I_s^{m}$       Sensing measurement current         550 µA
$t_s^{m}$       Sensing measurement time            55 ms
$I_s$           Sensing sleep current               0.3 µA

Although we analyzed our application metrics estimation model for the IRIS mote platform and two design spaces, our application metrics estimation model is equally applicable to any platform, application domain, and design space. Our application metrics estimation model accommodates several sensor node hardware internals, which are hardware platform-specific and can be obtained from the platform's datasheets. Since the appropriate values can be substituted for any given platform, our model can be used with any hardware platform.

3.2.2 Results

In this subsection, we present example application metrics calculations using our application metrics estimation model.

Since the objective function values corresponding to different states depend upon the estimation of high-level application metrics, we present example calculations to exemplify this estimation process using our application metrics estimation model
(Section 3.1) and the IRIS mote platform hardware specifications (Table 3-1). We consider the example state $s_y = (V_{p_y}, F_{p_y}, F_{s_y}, P_{s_y}, P_{ti_y}, P_{tx_y}) = (2.7, 4, 1, 41, 60, -17)$.

3.2.2.1 Lifetime

First, we calculate the lifetime corresponding to $s_y$. Using Equation 3, the battery energy is $E_b' = 3.6 \times 2000 = 7200$ mWh, which is $E_b = 7200 \times 3600 / 1000 = 25{,}920$ J from Equation 3. The lifetime metric calculation requires calculation of processing, communication, and sensing energy.

For the processing energy per hour, Equation 3 and Equation 3 give $N_I = 5 \times 24 = 120$ and $t_a = 120 / (4 \times 10^{6}) = 30$ µs, respectively. The processor's active mode energy consumption per hour from Equation 3 is $E_{proc}^{a} = 2.7 \times 2.5 \times 10^{-3} \times 30 \times 10^{-6} = 0.2025$ µJ where $I_p^{a} = 2.5$ mA corresponding to $(V_{p_y}, F_{p_y}) = (2.7, 4)$ [92]. Using Equation 3 gives $t_i = 1 - 30 \times 10^{-6}$ s $= 999.97$ ms. The processor's idle mode energy consumption per hour from Equation 3 is $E_{proc}^{i} = 2.7 \times 0.65 \times 10^{-3} \times 999.97 \times 10^{-3} = 1.755$ mJ where $I_p^{i} = 0.65$ mA corresponding to $(V_{p_y}, F_{p_y}) = (2.7, 4)$ [92]. The processor energy consumption per hour from Equation 3 is $E_{proc} = 0.2025 \times 10^{-6} + 1.755 \times 10^{-3} = 1.7552$ mJ.
For the communication energy per hour, Equation 3 and Equation 3 give $N_{pkt}^{tx} = 3600 / 60 = 60$ and $t_{tx}^{pkt} = 41 \times 8 / (250 \times 10^{3}) = 1.312$ ms, respectively. Equation 3 gives $E_{tx}^{pkt} = 3 \times 9.5 \times 10^{-3} \times 1.312 \times 10^{-3} = 37.392$ µJ. The transceiver's transmission energy per hour from Equation 3 is $E_{trans}^{tx} = 60 \times 37.392 \times 10^{-6} = 2.244$ mJ. Equation 3 gives $P_{ri} = 60 / 2 = 30$ where we assume $n_s = 2$; however, our model is valid for any number of neighboring sensor nodes. Equation 3 and Equation 3 give $N_{pkt}^{rx} = 3600 / 30 = 120$ and $E_{rx}^{pkt} = 3 \times 15.5 \times 10^{-3} \times 1.312 \times 10^{-3} = 61.01$ µJ, respectively. The transceiver's receive energy per hour from Equation 3 is $E_{trans}^{rx} = 120 \times 61.01 \times 10^{-6} = 7.3212$ mJ. Equation 3 gives $t_{tx}^{i} = 3600 - (60 \times 1.312 \times 10^{-3}) - (120 \times 1.312 \times 10^{-3}) = 3599.764$ s. The transceiver's idle energy per hour from
Equation 3 is $E_{trans}^{i} = 3 \times 20 \times 10^{-9} \times 3599.764 = 0.216$ mJ. Equation 3 gives communication energy per hour $E_{com} = 2.244 + 7.3212 + 0.216 = 9.7812$ mJ.

We calculate sensing energy per hour using Equation 3. Equation 3 gives $N_s = 2 \times 1 = 2$ (since the MTS400 sensor board [90] has Sensirion SHT1x temperature and humidity sensors [91]). Equation 3 gives $E_{sen}^{m} = 2 \times 3 \times 550 \times 10^{-6} \times 55 \times 10^{-3} \times 3600 = 0.6534$ J. Using Equation 3 and Equation 3 gives $t_s^{i} = 1 - 55 \times 10^{-3} = 0.945$ s and $E_{sen}^{i} = 3 \times 0.3 \times 10^{-6} \times 0.945 \times 3600 = 3.062$ mJ, respectively. Equation 3 gives $E_{sen} = 0.6534 + 3.062 \times 10^{-3} = 0.6565$ J.

After calculating processing, communication, and sensing energy, we calculate the energy consumption per hour from Equation 3 as $E_c = 1.7552 \times 10^{-3} + 9.7812 \times 10^{-3} + 0.6565 = 0.668$ J. Equation 3 gives $L_s = 25{,}920 / (0.668 \times 24) = 1{,}616.77$ days.

3.2.2.2 Throughput

For the throughput application metric, Equation 3, Equation 3, and Equation 3 give $R_{sen} = 1 \times 24 = 24$ bps, $R_{proc} = 4 \times 10^{6} / 5 = 800$ kbps, and $R_{com} = 20 \times 8 / (1.312 \times 10^{-3}) = 121.951$ kbps, respectively ($P_s^{eff} = 41 - 21 = 20$ bytes where we assume $P_h = 21$ bytes). Equation 3 gives $R = (0.4)(24) + (0.4)(800 \times 10^{3}) + (0.2)(121.951 \times 10^{3}) = 344.40$ kbps where we assume $\omega_s$, $\omega_p$, and $\omega_c$ equal to 0.4, 0.4, and 0.2, respectively.

3.2.2.3 Reliability

We estimate the reliability corresponding to $P_{tx} = -17$ dBm to be 0.7 (Section 3.1.3); however, an accurate reliability value can only be obtained using profiling statistics for the number of packets transmitted and the number of packets lost.

Similarly, the lifetime, throughput, and reliability for state $s_y = (V_{p_y}, F_{p_y}, F_{s_y}, P_{s_y}, P_{ti_y}, P_{tx_y}) = (5, 16, 4, 127, 10, 3)$ can be calculated as 10.6 days, 1,321.77 kbps, and 0.9999, respectively. These calculations reveal that the tunable parameter value settings for a sensor node can have a profound impact on the application metrics. For example, the lifetime of a sensor node in our two examples varied from 10.6 days to 1,616.8 days for
different tunable parameter value settings. Hence, our proposed application metrics estimation model can help EWSN designers to find appropriate tunable parameter value settings to conserve the sensor node's energy and to enhance the sensor node's lifetime after satisfying other application requirements such as throughput and reliability.

3.3 Concluding Remarks

In this chapter, we proposed an application metrics estimation model to estimate high-level metrics (lifetime, throughput, and reliability) from an embedded sensor node's parameters. This estimation model assists dynamic optimization methodologies in comparing operating states. Our application metrics estimation model provides a prototype model for application metric estimation.

Future work includes enhancing our application metrics estimation model for additional application metrics (e.g., security, delay). We plan to further validate our application metrics estimation model by comparing it against statistics obtained from actual embedded sensor nodes operating in a distributed EWSN.

CHAPTER 4
MARKOV MODELING OF FAULT-TOLERANT DISTRIBUTED EMBEDDED WIRELESS SENSOR NETWORKS

Embedded wireless sensor networks (EWSNs) consist of spatially distributed autonomous embedded sensor nodes that collaborate with each other to perform an application task. EWSN design is a challenging task that must consider potentially calamitous sensor fields (e.g., forest, ocean floor [96], active volcanoes [97], habitats [98]), diverse application requirements, maintenance, and design constraints. Since each EWSN application design is unique, developing highly specialized sensor nodes can be cost and time prohibitive. Therefore, typical EWSNs are constructed using mass-produced commercial off-the-shelf (COTS) embedded sensor nodes. Since these embedded sensor nodes are often deployed in unattended and hostile environments, they can be more susceptible to failures than other systems [97] and manual inspection can be impractical. Additionally, sensors and actuators have significantly higher fault rates than other traditional semiconductor-based systems (e.g., NASA aborted the launch of space shuttle Discovery [99] because of a sensor failure in the shuttle's external tank [100]). Sensor failure can occur due to deployment impact, accidents (animal, vehicular, or bombing), fire, extreme weather, or aging [101]. Failed sensor nodes may result in sensor network partitioning (sensor nodes become isolated and are disconnected from the sensor network), reduced EWSN availability, and sensor network failure. In order to meet application requirements reliably in the presence of sensor failures, EWSNs require fault detection and fault-tolerance (FT) mechanisms.

Fault detection encompasses distributed fault detection (DFD) algorithms to identify faulty sensor readings that indicate a faulty sensor. DFD algorithms typically use existing network traffic to identify sensor failures and therefore do not incur any additional transmission cost. A fault detection algorithm's accuracy signifies the algorithm's ability to accurately identify faults. Though fault detection helps in isolating faulty sensors, EWSNs require FT to reliably accomplish application tasks.

One of the most prominent FT techniques is to add redundant system hardware and/or software [25]. Stringent design constraints (e.g., power, cost) differentiate EWSNs from general-purpose systems and the added redundancy for FT must justify the additional cost. Studies indicate that sensors (e.g., temperature, humidity, light, motion) attached to embedded sensor nodes have comparatively higher fault rates than other components (e.g., processor, transceiver) [100][102][103]. Fortunately, sensors are cheap and adding redundant spare sensors contributes little to the individual sensor node cost.

Even though FT is a well studied research field [104][105][106][107][108][109], fault detection and FT for EWSNs are relatively unstudied. Varying FT requirements across different applications increases the complexity of fault detection and FT for EWSNs. For instance, mission-critical applications have relatively high reliability requirements as compared to non-mission-critical applications (e.g., ambient conditions monitoring). To the best of our knowledge, there exists no sensor node model to provide better reliability for mission-critical applications. Since applications are typically designed to operate reliably for a certain period of time (a specific lifetime requirement), FT metrics such as reliability and mean time to failure (MTTF) are important. In order to meet application requirements, EWSN designers require a model to estimate FT metrics and determine necessary redundancy during design time. Unfortunately, the literature provides no rigorous mathematical model with insights into EWSN reliability and MTTF. Previous works study fault detection and FT in isolation and their synergistic relationship has not been investigated in the context of EWSNs.

Our main contributions in this chapter are:

• We propose an FT embedded sensor node model consisting of duplex sensors (i.e., one active sensor and one inactive spare sensor), which exploits the synergy of fault detection and FT. Whereas sensors may employ N-modular redundancy (triple modular redundancy (TMR) is a special case of N-modular redundancy) [25], we propose a duplex sensor node model to minimize the additional cost. In our duplex sensor model we assume that the redundant sensor is in a cold
standby mode that ideally consumes no power. We point out that although we focus on sensor failures within the embedded sensor node in this study due to higher sensor failure rates as compared to other sensor node components [100][102], our model can be extended to include failures for other components within the sensor node such as the processor and transceiver. Our duplex sensor node model serves as a first step towards reliable sensor node modeling and can potentially spark further research in FT sensor node modeling and evaluation.

• We investigate the synergy of fault detection and FT for EWSNs and characterize FT parameters such as coverage factor, sensor failure probability, and sensor failure rate.

• We develop Markov models to characterize EWSN reliability and MTTF. Our Markov models are comprehensive and characterize embedded sensor nodes, EWSN clusters, and overall EWSN reliability and MTTF. For the first time, to the best of our knowledge, we delineate reliability and MTTF hierarchically at the embedded sensor node, EWSN cluster, and EWSN levels. This hierarchical characterization of FT metrics reinforces the EWSN design process and reduces design time by enabling EWSN designers to investigate the effects of different types of FT embedded sensor nodes (e.g., duplex, TMR), the number of EWSN clusters, and the number of embedded sensor nodes in the cluster on the FT of the overall EWSN. This hierarchical evaluation of FT metrics results in the selection of an efficient EWSN topology to meet application requirements.

• We also investigate iso-MTTF (isoreliability) for EWSN clusters, which we define as how many redundant embedded sensor nodes a non-FT (NFT) EWSN cluster requires over an FT EWSN cluster to achieve a required MTTF (reliability).

The remainder of this chapter is organized as follows. Section 4.1 gives a review of related work. Section 4.2 presents the FT parameters leveraged in our Markov model. Section 4.3 describes our Markov models for characterizing EWSN reliability and MTTF. Results are presented in Section 4.4 and Section 4.5 concludes our study and outlines future research directions.
4.1 Related Work

Despite FT being a well studied research field [104][105][106][107][108][109], little work exists in EWSN fault detection and FT. Jiang [110] proposed a DFD scheme that detected faulty sensor nodes by exchanging data and mutually testing among neighboring nodes. Liang et al. [111] proposed a weighted median fault detection scheme (WMFDS) that used spatial correlations among the sensor measurements
(e.g., temperature, humidity). [112] presented a DFD algorithm that identified faulty sensor nodes based on comparisons between neighboring sensor nodes' data. The DFD algorithm used time redundancy to tolerate transient faults in sensing and communication. Khilar et al. [113] proposed a probabilistic approach to diagnose intermittent EWSN faults. The simulation results indicated that the accuracy of the DFD algorithm increased as the number of diagnostic rounds increased (each round consisted of exchanging measurements with the neighboring nodes).

In the area of EWSN fault detection, Ding et al. [114] proposed algorithms for faulty sensor identification and fault-tolerant event boundary detection. Their algorithms considered that both the faulty sensors and sensors in an event region could generate abnormal readings (readings that deviate from a typical application-specific range). Krishnamachari et al. [115] proposed a distributed Bayesian algorithm for sensor fault detection and correction, which considered that measurement errors due to faulty equipment are likely to be uncorrelated. Wu et al. [116] presented a fault detection scheme in which the fusion center (the node that aggregated data from different nodes) attempted to identify faulty sensor nodes through temporal sequences of received local decisions using a majority voting technique.

In work related to FT for EWSNs, Koushanfar et al. [103] proposed an FT scheme that provided backup for one type of sensor using another type of sensor, but they did not propose any FT model. Clouqueur et al. [117] presented algorithms for collaborative target detection in the presence of faulty sensors. Chiang et al. [118] built and evaluated system-level test interfaces for remote testing, repair, and software upgrade for sensor nodes. They added a test interface module (TIM) to provide the testing function. Experimental results indicated that the TIM with double, triple, and quadruple redundancy increased the EWSN's availability.

There exists some work on providing FT in EWSNs by deploying relay nodes with connectivity as an FT metric. Relay nodes communicate with sensor nodes, other
relay nodes, and sink nodes to prolong EWSN lifetime. Zhang et al. [119] developed approximation algorithms for determining a minimum number of relay nodes and relay node placement to achieve certain connectivity requirements. Han et al. [120] considered the problem of deploying relay nodes to provide FT in heterogeneous EWSNs where sensor nodes had different transmission radii. They developed approximation algorithms for full and partial FT relay node placement. Whereas full FT relay node placement deployed a minimum number of relay nodes to establish disjoint paths between every sensor and/or relay node pair, partial FT relay node placement only considered sensor node pairs. [121] evaluated gossip algorithms (distributed algorithms to distribute computational burden among all nodes) considering connectivity as an FT metric.

Sen et al. [122] introduced region-based connectivity, an FT metric defined as the minimum number of nodes within a region whose failure would disconnect the network. They argued that region-based connectivity was a superior FT metric due to comparatively lower transmission power. Alwan et al. [123] provided a survey of FT routing techniques in EWSNs. Souza et al. [124] presented a framework for failure management in EWSNs, focusing on fault diagnosis and recovery techniques. The FT framework mitigated the failure propagation in a business (enterprise) environment by implementing different FT techniques.

There exists some work on EWSN modeling. Cai et al. [125] presented a reliability model to prolong the network lifetime and availability based on connectivity and coverage constraints. Zhu et al. [126] presented a model that characterized sensor connectivity and investigated the tradeoffs among sensor node connectivity, power consumption, and data rate. They also discussed the impact of sensor connectivity on system reliability. Vasar et al. [127] presented Markov models for EWSN reliability analysis. They presented a reliability comparison for various numbers of defective components' replacements with hot-standby redundant components. Xing et al. [128]
presented EWSN reliability and security modeling in an integrated fashion. Their modeling technique differentiated two types of EWSN failures: security failures due to malicious intrusions and traditional failures due to malfunctioning components.

Further work exists on EWSN modeling. Moustapha et al. [100] used recurrent neural networks (RNN) to model sensor node dynamics for sensor fault detection. Their network model corresponded to the EWSN topology with RNN input taken from the modeled sensor node and neighboring sensor nodes. Kannan et al. [129] developed a game-theoretic model of reliable length- and energy-constrained routing in EWSNs. They showed that optimal length-constrained paths could be computed in polynomial time in a distributed manner using geographic routing. Mukhopadhyay et al. [130] presented a method that used sensor data properties to enable reliable data collection. Their method consisted of predictive models based on the temporal correlation in the data. They demonstrated that their method could handle multiple sources of errors simultaneously and could correct transient errors arising in sensor node hardware and wireless communication channels.

Even though DFD algorithms were proposed in the literature for detecting sensor faults, the fault detection was not leveraged to provide FT. There exists work in the literature regarding FT in EWSNs [103][119][120][122][125]; however, FT metrics such as reliability and MTTF were not investigated rigorously in the context of EWSNs. Specifically, reliability and MTTF were not considered hierarchically at the sensor node, EWSN cluster, and EWSN level. This hierarchical characterization of FT metrics aids EWSN design by enabling an EWSN designer to determine an appropriate EWSN topology comprising an appropriate number of clusters, with each cluster containing an appropriate number of sensor nodes with desired FT characteristics.

4.2 Fault-Tolerance Parameters

In this section, we characterize the FT parameters by exploiting the synergy between fault detection and FT in EWSNs. The FT parameters leveraged in our Markov model are coverage factor, sensor failure probability, and sensor failure rate.

4.2.1 Coverage Factor

The coverage factor $c$ is defined as the probability that the faulty active sensor is correctly diagnosed, disconnected, and replaced by a good inactive spare sensor. The estimation of $c$ is critical in an FT EWSN model and can be determined by

$$c = c_k - c_c \qquad (4)$$

where $c_k$ denotes the accuracy of the fault detection algorithm in diagnosing faulty sensors and $c_c$ denotes the probability of an unsuccessful replacement of the identified faulty sensor with the good spare sensor. Whereas $c_c$ depends upon the sensor switching circuitry and is usually a constant, $c_k$ estimation is challenging as different fault detection algorithms have different accuracies. We analyzed different fault detection algorithms [110][111][114][115] and observed that the accuracy of a typical fault detection algorithm depends upon the average number of sensor node neighbors $k$ and the probability of sensor failure $p$. We model $c_k : c_k \le 1$ with the following empirical relation

$$c_k = \frac{k \times (1 - p)}{k^{(k/M(p))^{1/M(p)}} + (1 - k/M(p))^{k}} \qquad (4)$$

where $M(p)$ is a function of $p$ and denotes an adjustment parameter that may correspond loosely to the desired average number of neighboring sensor nodes required to achieve a good detection accuracy for a given $p$. We point out that whereas Equation 4 provides a good estimate of $c_k$ in general for any fault detection algorithm, the exact $c_k$ for a particular fault detection algorithm can be derived from the algorithm's mathematical model. However, our Markov models are independent of the methodology used to determine $c_k$ and work equally well for any $c_k$ value.
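To make the empirical relation concrete, the following Python sketch evaluates $c_k$ and $c$. It assumes the relation reads $c_k = k(1-p) / \left(k^{(k/M(p))^{1/M(p)}} + (1 - k/M(p))^{k}\right)$ with the result capped at 1, and it takes $M(p)$ as a plain number supplied by the caller. The function names and the sample values ($p = 0.05$, $M(p) = 25$, $c_c = 0$) are illustrative assumptions, not values taken from this section.

```python
def detection_accuracy(k, p, m):
    """Fault detection accuracy c_k, capped at 1.

    k: average number of neighbor nodes (an integer, so the power below
       stays real-valued even when k exceeds m)
    p: sensor failure probability
    m: value of the adjustment parameter M(p) for this p (assumed given)
    """
    exponent = (k / m) ** (1 / m)
    denominator = k ** exponent + (1 - k / m) ** k
    return min(1.0, k * (1 - p) / denominator)


def coverage_factor(k, p, m, c_c):
    """Coverage factor c = c_k - c_c."""
    return detection_accuracy(k, p, m) - c_c


# Hypothetical inputs: p = 0.05, M(p) = 25, ideal switching circuitry (c_c = 0)
for neighbors in (5, 6, 10):
    print(neighbors, round(coverage_factor(neighbors, 0.05, 25, 0.0), 4))
```

Under these assumed inputs the accuracy rises with the number of neighbors and saturates at 1, which matches the constraint $c_k \le 1$ stated above.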

Figure 4-1. Sensor node failure probability distribution.

4.2.2 Sensor Failure Probability

Sensors on an embedded sensor node have comparatively higher fault rates than other components [100][102][103]. The sensor failure probability may not be characterized by a constant due to many possible types of sensor faults (e.g., outlier, noisy [102], etc.). Sensors can fail and/or give erroneous readings due to sensor board disconnection, broken sensor error, or battery offset error [131]. Each of the sensor faults can be characterized by a random variable and the sensor failure probability can be given as the sum of these random variables. Thus, the sensor failure probability density function $f_s(p)$ can be written as

$$f_s(p) = X_1(p) + X_2(p) + X_3(p) + \ldots + X_n(p) \qquad (4)$$

where $X_i$, $i \in \{1, 2, \ldots, n\}$ denote the random variables for $n$ sensor faults with mean $\mu_i$ and variance $\sigma_i^2$. Using the central limit theorem [132], for large $n$, we approximate the sensor failure probability as a normal distribution (Fig. 4-1) with mean $\mu_s$ and variance $\sigma_s^2$, thus

$$f_s(p) = \frac{1}{\sqrt{2\pi}\,\sigma_s} \exp\left[-\frac{1}{2}\left(\frac{p - \mu_s}{\sigma_s}\right)^{2}\right] \quad \forall\ 0 \le p \le 1 \qquad (4)$$

where $\mu_s = \sum_{i=1}^{n} \mu_i$ and $\sigma_s^2 = \sum_{i=1}^{n} \sigma_i^2$.
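The normal approximation above can be evaluated directly once the per-fault means and variances are known. The sketch below sums them into $\mu_s$ and $\sigma_s^2$ and evaluates the density for a given $p$. The three fault types and their means and variances are hypothetical placeholders chosen only to exercise the formula.

```python
import math


def aggregate_parameters(means, variances):
    """Return (mu_s, sigma_s) from per-fault means and variances."""
    mu_s = sum(means)
    sigma_s = math.sqrt(sum(variances))
    return mu_s, sigma_s


def failure_density(p, mu_s, sigma_s):
    """Normal density f_s(p), defined here only for 0 <= p <= 1."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    z = (p - mu_s) / sigma_s
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma_s)


# Hypothetical fault types: board disconnection, broken sensor, battery offset
mu_s, sigma_s = aggregate_parameters([0.01, 0.02, 0.02], [1e-4, 2e-4, 1e-4])
print(mu_s, sigma_s, failure_density(mu_s, mu_s, sigma_s))
```

The density peaks at $p = \mu_s$, where it equals $1 / (\sqrt{2\pi}\,\sigma_s)$.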

4.2.3 Sensor Failure Rate

The sensor failure phenomenon can be represented by an exponential distribution with failure rate $\lambda_s$ over the period $t_s$ [133] (the period $t_s$ signifies the time over which the sensor failure probability $p$ is specified). Thus, we can write using Equation 4

$$f_s(p) = 1 - \exp(-\lambda_s t_s) \qquad (4)$$

Note that $\lambda_s$ is a function of $t_s$ but for notational simplicity, we use $\lambda_s$ instead of $\lambda_s(t_s)$. Solving Equation 4 for $\lambda_s$, we get

$$\lambda_s = -\frac{1}{t_s} \ln\left(1 - \frac{1}{\sqrt{2\pi}\,\sigma_s} \exp\left[-\frac{1}{2}\left(\frac{p - \mu_s}{\sigma_s}\right)^{2}\right]\right) \qquad (4)$$

4.3 Fault-Tolerant Markov Models

In this section, we present our Markov models for FT EWSNs. Our Markov models are comprehensive and encompass hierarchically the embedded sensor node, an EWSN cluster, and the overall EWSN. We adopt a bottom-up paradigm in our modeling by first developing a sensor node model and determining its reliability and MTTF. The embedded sensor node MTTF gives the average sensor node failure rate, which is utilized in the EWSN cluster model. The EWSN cluster model gives the EWSN cluster reliability and MTTF, which determines the EWSN cluster average failure rate. The EWSN cluster average failure rate is utilized in the EWSN model to determine EWSN reliability and MTTF. This bottom-up approach enables EWSN reliability and MTTF characterization by leveraging sensor node and EWSN cluster models. We point out that some EWSN architectures do not employ a cluster-based hierarchy due to the additional energy cost associated with cluster formation and cluster head election. However, our modeling approach is equally applicable to these EWSN architectures by considering that the EWSN is composed of just one cluster where the cluster head is the sink node, which receives the sensed information from other sensor nodes in the EWSN. For clarity, Table 4-1 summarizes important notations.
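The failure-rate relation of Section 4.2.3 reduces to a one-line inversion of the exponential model. The sketch below reads the left-hand side of that relation as a single failure-probability value accumulated over the period $t_s$, which is an interpretive assumption on our part. The function name and the sample inputs (a 5% failure probability over 100 days) are likewise illustrative.

```python
import math


def sensor_failure_rate(failure_prob, t_s):
    """Invert failure_prob = 1 - exp(-lambda_s * t_s) for lambda_s.

    failure_prob: probability that the sensor fails within the period t_s
    t_s: length of that period (the rate is returned per unit of t_s)
    """
    if not 0 <= failure_prob < 1:
        raise ValueError("failure_prob must lie in [0, 1)")
    if t_s <= 0:
        raise ValueError("t_s must be positive")
    # log1p(-x) computes ln(1 - x) accurately for small x
    return -math.log1p(-failure_prob) / t_s


rate = sensor_failure_rate(0.05, 100.0)   # hypothetical: 5% over 100 days
print(f"lambda_s = {rate:.3e} failures per day")
print(f"check: p = {1 - math.exp(-rate * 100.0):.4f}")
```

The second print statement substitutes the rate back into the exponential model as a round-trip check.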

Table4-1. SummaryofnotationsusedinEWSNMarkovmodels NotationDescription ccoveragefactorpsensorfailureprobabilityttemperaturesensorfailureratessensorfailureratetstimeoverwhichpisspeciedPi(t)probabilityofbeinginstateiattimetRsd(t)reliabilityofFT(duplex)sensornodeMTTFsdmeantimetofailureofanFTsensornodekaveragenumberofneighborsensornodessd(k)FTsensornodefailureratewithkneighborsRc(t)EWSNclusterreliabilityc(n)EWSNclusterfailureratewithnsensornodesNnumberofclustersintheEWSNRewsn(t)EWSNreliability 4.3.1Fault-TolerantEmbeddedSensorNodeModel Asabasecase,wedescribeanon-FT(NFT)embeddedsensornodeMarkovmodel(Fig. 4-2 )containingonesensor(temperaturesensorinthiscase,butthesensortypeisarbitrary).TheNFTembeddedsensornodemodelconsistsoftwostates:state1(goodstate)andstate0(failedstate).TheNFTembeddedsensornodefailswhenthenodetransitionsfromstate1tostate0duetoasensorfailure.ThedifferentialequationsdescribingtheNFTembeddedsensornodeMarkovmodelare P01(t)=)]TJ /F5 11.955 Tf 9.3 0 Td[(tP1(t)P00(t)=tP1(t) (4) wherePi(t)denotestheprobabilitythatthesensornodewillbeinstateiattimetandP0i(t)representstherstorderderivativeofPi(t).trepresentsthefailurerateofanactivetemperaturesensor. 98


Figure 4-2. A non-FT (NFT) embedded sensor node Markov model.

Solving Equation 4 with the initial conditions $P_1(0) = 1$ and $P_0(0) = 0$ yields

$$\begin{aligned} P_1(t) &= e^{-\lambda_t t} \\ P_0(t) &= 1 - e^{-\lambda_t t} \end{aligned}$$

The reliability of the NFT embedded sensor node is given by

$$R_s(t) = 1 - P_0(t) = P_1(t) = e^{-\lambda_t t}$$

The MTTF of the NFT embedded sensor node is

$$MTTF_s = \int_0^\infty R_s(t)\,dt = \frac{1}{\lambda_t}$$

The average failure rate of the NFT embedded sensor node is given by

$$\lambda_s = \frac{1}{MTTF_s} = \lambda_t$$

Since sensors have comparatively higher fault rates than other components [100][102][103], we propose an FT duplex sensor node model consisting of one active sensor and one inactive spare sensor. Using TMR [25] for FT is a possible scenario, but we consider a duplex sensor node model to minimize the additional cost, as the additive cost of spare sensors can be prohibitive for large EWSNs. In addition, a duplex model limits the increase in sensor node size. Our model provides a first step towards FT sensor node modeling and encourages the evaluation of other FT sensor node models.


Figure 4-3. FT embedded sensor node Markov model.

In our duplex sensor node, the inactive sensor becomes active only once the active sensor is declared faulty by the fault detection algorithm. We refer to our duplex sensor node as an FT sensor node, whose Markov model is depicted in Fig. 4-3. The states in the Markov model represent the number of good sensors. The differential equations describing the FT embedded sensor node Markov model are

$$\begin{aligned} P_2'(t) &= -\lambda_t P_2(t) \\ P_1'(t) &= \lambda_t c P_2(t) - \lambda_t P_1(t) \\ P_0'(t) &= \lambda_t (1 - c) P_2(t) + \lambda_t P_1(t) \end{aligned}$$

where $P_i(t)$ denotes the probability that the sensor node will be in state $i$ at time $t$ and $P_i'(t)$ represents the first-order derivative of $P_i(t)$. $\lambda_t$ represents the failure rate of an active temperature sensor and $c\lambda_t$ is the rate at which recoverable failure occurs. The probability that the sensor failure cannot be recovered is $(1 - c)$ and the rate at which unrecoverable failure occurs is $(1 - c)\lambda_t$.
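The three differential equations of the FT sensor node model can also be integrated numerically, which gives a quick sanity check on any closed-form solution. The following is a minimal sketch in Python using a hand-written fourth-order Runge-Kutta step; the function names, the step count, and the sample values ($\lambda_t = 5.13 \times 10^{-4}$ failures/day and $c = 0.979$, the values used for the example in Section 4.4.2) are illustrative assumptions and are not part of the SHARPE-based implementation.

```python
def ft_node_rhs(state, lam_t, cov):
    """Right-hand side of the duplex (FT) sensor node model.

    state = (P2, P1, P0); lam_t is the sensor failure rate and
    cov is the coverage factor c.
    """
    p2, p1, _p0 = state
    dp2 = -lam_t * p2
    dp1 = lam_t * cov * p2 - lam_t * p1
    dp0 = lam_t * (1.0 - cov) * p2 + lam_t * p1
    return (dp2, dp1, dp0)


def integrate_ft_node(lam_t, cov, t_end, steps=1000):
    """Integrate from P2(0)=1, P1(0)=P0(0)=0 with classical RK4 (illustrative helper)."""
    state = (1.0, 0.0, 0.0)
    h = t_end / steps

    def shifted(base, slope, factor):
        # Return base + factor * slope, component by component.
        return tuple(b + factor * s for b, s in zip(base, slope))

    for _ in range(steps):
        k1 = ft_node_rhs(state, lam_t, cov)
        k2 = ft_node_rhs(shifted(state, k1, 0.5 * h), lam_t, cov)
        k3 = ft_node_rhs(shifted(state, k2, 0.5 * h), lam_t, cov)
        k4 = ft_node_rhs(shifted(state, k3, h), lam_t, cov)
        state = tuple(
            s + (h / 6.0) * (a + 2.0 * b + 2.0 * d + e)
            for s, a, b, d, e in zip(state, k1, k2, k3, k4)
        )
    return state


p2, p1, p0 = integrate_ft_node(lam_t=5.13e-4, cov=0.979, t_end=100.0)
print("reliability at t = 100 days:", p2 + p1)
```

Because the three derivatives sum to zero, the state probabilities should continue to sum to one throughout the integration, which is a useful check on the transcription of the equations.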


Solving Equation 4 with the initial conditions $P_2(0) = 1$, $P_1(0) = 0$, and $P_0(0) = 0$ yields

$$\begin{aligned} P_2(t) &= e^{-\lambda_t t} \\ P_1(t) &= c\lambda_t t\, e^{-\lambda_t t} \\ P_0(t) &= 1 - P_1(t) - P_2(t) \end{aligned}$$

The reliability of the FT embedded sensor node is given by

$$R_{sd}(t) = 1 - P_0(t) = P_2(t) + P_1(t) = e^{-\lambda_t t} + c\lambda_t t\, e^{-\lambda_t t}$$

where the subscript $d$ in $R_{sd}(t)$ stands for duplex. The MTTF of the FT embedded sensor node is

$$MTTF_{sd} = \int_0^\infty R_{sd}(t)\,dt = \frac{1}{\lambda_t} + \frac{c}{\lambda_t}$$

The average failure rate of the FT embedded sensor node depends upon $k$ (since the fault detection algorithm's accuracy depends upon $k$ (Section 4.2)) and is given by

$$\lambda_{sd}(k) = \frac{1}{MTTF_{sd}(k)}$$

where $\lambda_{sd}(k)$ and $MTTF_{sd}(k)$ denote the failure rate and MTTF of an FT embedded sensor node with $k$ neighbors.

4.3.2 Fault-Tolerant EWSN Cluster Model

A typical EWSN consists of many clusters, and we assume for our model that all nodes in a cluster are neighbors to each other. If the average number of nodes in a cluster is $n$, then the average number of neighbor nodes per sensor node is $k = n - 1$. Fig. 4-4 depicts our Markov model for an EWSN cluster. We assume that a cluster fails (i.e., fails to perform its assigned application task) if the number of alive (non-faulty) sensor nodes in the cluster reduces to $k_{min}$. The differential equations describing the


EWSN cluster Markov model are

$$\begin{aligned} P_n'(t) &= -n\lambda_{sd}(n-1) P_n(t) \\ P_{n-1}'(t) &= n\lambda_{sd}(n-1) P_n(t) - (n-1)\lambda_{sd}(n-2) P_{n-1}(t) \\ &\;\;\vdots \\ P_{k_{min}}'(t) &= (k_{min}+1)\lambda_{sd}(k_{min}) P_{k_{min}+1}(t) \end{aligned}$$

where $\lambda_{sd}(n-1)$, $\lambda_{sd}(n-2)$, and $\lambda_{sd}(k_{min})$ represent the FT (duplex) sensor node failure rate (Equation 4) when the average number of neighbor sensor nodes is $n-1$, $n-2$, and $k_{min}$, respectively.

Figure 4-4. EWSN cluster Markov model.

For mathematical tractability and a closed-form solution, we analyze a special (simple) case when $n = k_{min} + 2$, which reduces the Markov model to three states as shown in Fig. 4-5. The differential equations describing the EWSN cluster Markov model when $n = k_{min} + 2$ are

$$\begin{aligned} P_{k_{min}+2}'(t) &= -(k_{min}+2)\lambda_{sd}(k_{min}+1) P_{k_{min}+2}(t) \\ P_{k_{min}+1}'(t) &= (k_{min}+2)\lambda_{sd}(k_{min}+1) P_{k_{min}+2}(t) - (k_{min}+1)\lambda_{sd}(k_{min}) P_{k_{min}+1}(t) \\ P_{k_{min}}'(t) &= (k_{min}+1)\lambda_{sd}(k_{min}) P_{k_{min}+1}(t) \end{aligned}$$


Figure 4-5. EWSN cluster Markov model with three states.

Solving Equation 4 with the initial conditions $P_{k_{min}+2}(0) = 1$, $P_{k_{min}+1}(0) = 0$, and $P_{k_{min}}(0) = 0$ yields

$$\begin{aligned} P_{k_{min}+2}(t) &= e^{-(k_m+2)\lambda_{sd}(k_m+1)t} \\ P_{k_{min}+1}(t) &= \frac{(k_m+2)\lambda_{sd}(k_m+1)\, e^{-(k_m+2)\lambda_{sd}(k_m+1)t}}{(k_m+1)\lambda_{sd}(k_m) - (k_m+2)\lambda_{sd}(k_m+1)} + \frac{(k_m+2)\lambda_{sd}(k_m+1)\, e^{-(k_m+1)\lambda_{sd}(k_m)t}}{(k_m+2)\lambda_{sd}(k_m+1) - (k_m+1)\lambda_{sd}(k_m)} \\ P_{k_{min}}(t) &= 1 - P_{k_{min}+1}(t) - P_{k_{min}+2}(t) \end{aligned}$$

where we denote $k_{min}$ by $k_m$ in Equation 4 for conciseness. The reliability of the EWSN cluster is

$$\begin{aligned} R_c(t) = 1 - P_{k_{min}}(t) ={}& e^{-(k_{min}+2)\lambda_{sd}(k_{min}+1)t} + \frac{(k_{min}+2)\lambda_{sd}(k_{min}+1)\, e^{-(k_{min}+2)\lambda_{sd}(k_{min}+1)t}}{(k_{min}+1)\lambda_{sd}(k_{min}) - (k_{min}+2)\lambda_{sd}(k_{min}+1)} \\ &+ \frac{(k_{min}+2)\lambda_{sd}(k_{min}+1)\, e^{-(k_{min}+1)\lambda_{sd}(k_{min})t}}{(k_{min}+2)\lambda_{sd}(k_{min}+1) - (k_{min}+1)\lambda_{sd}(k_{min})} \end{aligned}$$

The MTTF of the EWSN cluster is

$$\begin{aligned} MTTF_c = \int_0^\infty R_c(t)\,dt ={}& \frac{1}{(k_m+2)\lambda_{sd}(k_m+1)} + \frac{1}{(k_m+1)\lambda_{sd}(k_m) - (k_m+2)\lambda_{sd}(k_m+1)} \\ &+ \frac{(k_m+2)\lambda_{sd}(k_m+1)}{(k_m+1)(k_m+2)\lambda_{sd}(k_m)\lambda_{sd}(k_m+1) - (k_m+1)^2\lambda_{sd}^2(k_m)} \end{aligned}$$


where we denote $k_{min}$ by $k_m$ in Equation 4 for conciseness. The average failure rate of the cluster $\lambda_c(n)$ depends upon the average number of nodes in the cluster $n$ at deployment time and is given by

$$\lambda_c(n) = \frac{1}{MTTF_c(n)}$$

where $MTTF_c(n)$ denotes the MTTF of an EWSN cluster of $n$ sensor nodes.

4.3.3 Fault-Tolerant EWSN Model

A typical EWSN consists of $N = n_s/n$ clusters, where $n_s$ denotes the total number of sensor nodes in the EWSN and $n$ denotes the average number of nodes in a cluster. Fig. 4-6 depicts our EWSN Markov model. We assume that the EWSN fails to perform its assigned task when the number of alive clusters reduces to $N_{min}$. The differential equations describing the EWSN Markov model are

$$\begin{aligned} P_N'(t) &= -N\lambda_c(n) P_N(t) \\ P_{N-1}'(t) &= N\lambda_c(n) P_N(t) - (N-1)\lambda_c(n) P_{N-1}(t) \\ &\;\;\vdots \\ P_{N_{min}}'(t) &= (N_{min}+1)\lambda_c(n) P_{N_{min}+1}(t) \end{aligned}$$

where $\lambda_c(n)$ represents the average cluster failure rate (Equation 4) when the cluster contains $n$ sensor nodes at deployment time. For mathematical tractability, we analyze a special (simple) case when $N = N_{min} + 2$, which reduces the Markov model to three states. The differential equations describing the EWSN Markov model when $N = N_{min} + 2$ are

$$\begin{aligned} P_{N_{min}+2}'(t) &= -(N_{min}+2)\lambda_c(n) P_{N_{min}+2}(t) \\ P_{N_{min}+1}'(t) &= (N_{min}+2)\lambda_c(n) P_{N_{min}+2}(t) - (N_{min}+1)\lambda_c(n) P_{N_{min}+1}(t) \\ P_{N_{min}}'(t) &= (N_{min}+1)\lambda_c(n) P_{N_{min}+1}(t) \end{aligned}$$
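Both the cluster model and the EWSN model are pure-death chains: each state can only move to the state with one fewer alive unit. For such a chain the MTTF equals the sum of the mean holding times of the transient states, so it can be evaluated for any $n$ or $N$ without solving the differential equations. The sketch below relies on that standard Markov chain property, which is not a formula stated in this section; `pure_death_mttf`, `exit_rate`, and the sample rates are hypothetical names and values chosen for illustration. For the three-state special case the sum has two terms, $1/a + 1/b$, with $a$ and $b$ the exit rates of the two transient states.

```python
def pure_death_mttf(start, fail_state, exit_rate):
    """MTTF of a pure-death chain started in state `start`.

    exit_rate(i) must return the total rate of leaving state i.
    The chain is absorbed (failed) on reaching `fail_state`.
    Uses the standard result MTTF = sum of 1 / exit_rate(i)
    over the transient states fail_state+1 .. start.
    """
    return sum(1.0 / exit_rate(i) for i in range(fail_state + 1, start + 1))


# EWSN example: N = 2 clusters, N_min = 0, every cluster fails at rate lam_c.
lam_c = 2.99e-3  # failures/day, illustrative value
mttf_ewsn = pure_death_mttf(2, 0, lambda i: i * lam_c)

# Cluster example: n = 6, k_min = 4; a node with i-1 neighbors fails at lam_sd[i-1].
lam_sd = {4: 5.83e-4, 5: 5.67e-4}  # failures/day, illustrative values
mttf_cluster = pure_death_mttf(6, 4, lambda i: i * lam_sd[i - 1])

print("EWSN MTTF (days):", mttf_ewsn)
print("cluster MTTF (days):", mttf_cluster)
```

The design choice here is to pass the exit rate as a function, so the same helper covers the cluster chain (state-dependent node failure rate) and the EWSN chain (constant cluster failure rate).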


Figure 4-6. EWSN Markov model.

Solving Equation 4 with the initial conditions $P_{N_{min}+2}(0) = 1$, $P_{N_{min}+1}(0) = 0$, and $P_{N_{min}}(0) = 0$ yields

$$\begin{aligned} P_{N_{min}+2}(t) &= e^{-(N_{min}+2)\lambda_c(n)t} \\ P_{N_{min}+1}(t) &= (N_{min}+2)\left[e^{-(N_{min}+1)\lambda_c(n)t} - e^{-(N_{min}+2)\lambda_c(n)t}\right] \\ P_{N_{min}}(t) &= 1 - P_{N_{min}+1}(t) - P_{N_{min}+2}(t) \end{aligned}$$

The EWSN reliability is given as

$$R_{ewsn}(t) = 1 - P_{N_{min}}(t) = e^{-(N_{min}+2)\lambda_c(n)t} + (N_{min}+2)\left[e^{-(N_{min}+1)\lambda_c(n)t} - e^{-(N_{min}+2)\lambda_c(n)t}\right]$$

The EWSN MTTF when $N = N_{min} + 2$ is

$$MTTF_{ewsn} = \int_0^\infty R_{ewsn}(t)\,dt = \frac{1}{(N_{min}+2)\lambda_c(n)} + \frac{1}{\lambda_c(n)}\left[\frac{N_{min}+2}{N_{min}+1} - 1\right]$$

4.4 Results

In this section, we present reliability and MTTF results for our Markov models (Section 4.3) implemented in the SHARPE (Symbolic Hierarchical Automated Reliability and Performance Evaluator) Software Package [134]. SHARPE provides MTTF results directly based on our models' implementations; it does not provide reliability results directly, but reliability results can be calculated from state probabilities. We present example reliability calculations as well as detailed reliability and MTTF results


for an NFT as well as an FT embedded sensor node, EWSN cluster, and EWSN using our bottom-up Markov modeling paradigm.

4.4.1 Experimental Setup

In this subsection, we describe our FT embedded sensor node, EWSN cluster, and EWSN model implementation in the SHARPE Software Package. We also present a typical fault detection algorithm's accuracy for different sensor failure probability $p$ values.

SHARPE is a software tool for performance, reliability, and performability model specification and analysis. The SHARPE toolkit provides a specification language and solution methods for commonly used performance, reliability, and performability model types including combinatorial models, state-space models (e.g., Markov and semi-Markov reward models), and stochastic Petri nets. SHARPE allows computation of steady-state, transient, and interval measures. SHARPE allows output measures of a model to be used as parameters of other models to facilitate the hierarchical combination of different model types [135].

Due to SHARPE limitations that take only exponential polynomials, we simplify our sensor failure rate expression Equation 4 to take the mean of sensor failure probabilities. The exponential distribution, which has a property of constant failure rate, is a good model for the long flat intrinsic failure portion of the bathtub curve and is applicable to a variety of practical situations since many embedded components and systems spend most of their lifetimes in this portion of the bathtub curve. Furthermore, any failure rate curve can be approximated by piecewise exponential distribution segments patched together, where each exponential distribution segment specifies a constant failure rate over a small time unit (e.g., daily, weekly, or monthly) that is the average of the actual changing rate during the respective time duration. This failure rate curve approximation by piecewise exponential distributions is analogous to a curve approximation by piecewise straight line segments. Moreover, many natural phenomena


have a constant failure rate (or occurrence rate) property (e.g., the arrival rate of cosmic ray alpha particles). The exponential model works well for the interarrival times where the total number of events in a given period is given by the Poisson distribution. When these events trigger failures, the exponential lifetime distribution model naturally applies [136]. Thus

$$p = 1 - \exp(-\lambda_t t_s)$$

where $p$ denotes the mean of sensor failure probability. Solving Equation 4 for $\lambda_t$, we get

$$\lambda_t = -\frac{1}{t_s}\ln(1 - p)$$

We point out that using the mean of sensor failure probability instead of the normal distribution gives a good estimate of $\lambda_t$ as the variance $\sigma_s^2$ in Equation 4 approaches zero (i.e., $\sigma_s^2 \to 0$).

Our Markov model exploits the synergy of fault detection and fault tolerance for EWSNs. Table 4-2 depicts $c_k$ values estimated for a DFD algorithm using Equation 4 for different $p$ and $k$ values. These estimated $c_k$ values approximate the accuracy of the DFD algorithms proposed in [110][111][114][115]. We point out that any inaccuracy in the $c_k$ estimation does not affect our results calculations, since these calculations leverage $c_k$ values but are independent of whether these values accurately reflect a particular DFD algorithm's accuracy. We assume $c_c = 0$ in Equation 4 (i.e., once a faulty sensor is identified, the faulty sensor is replaced by a good spare sensor perfectly), which gives $c = c_k$ in Equation 4.

4.4.2 Reliability and MTTF for an NFT and an FT sensor node

In this subsection, we present the reliability and MTTF results for an NFT and an FT embedded sensor node for the two cases $c \neq 1$ and $c = 1$, when $k = 5$ and $p = 0.05$. The case $c \neq 1$ corresponds to a real-world fault detection algorithm (typical values are shown in Table 4-2), whereas the case $c = 1$ corresponds to an ideal fault detection algorithm, which detects faulty sensors perfectly for all $p$ values. We calculate reliability


at $t = 100$ days since we assume $t_s = 100$ days [137] in Equation 4; however, we point out that reliability can be calculated for any other time value using our Markov models.

Table 4-2. A typical fault detection algorithm's accuracy

                      k
p       M(p)    5        6        7        10       15
0.05    25      0.979    1        1        1        1
0.1     50      0.858    0.895    0.921    0.957    0.96
0.2     56      0.755    0.717    0.81     0.845    0.851
0.3     65      0.652    0.679    0.699    0.732    0.742
0.4     66      0.558    0.5813   0.599    0.627    0.636
0.5     67      0.464    0.484    0.498    0.522    0.53
0.6     68      0.371    0.386    0.398    0.417    0.424
0.7     69      0.278    0.29     0.298    0.313    0.318
0.8     70      0.185    0.193    0.198    0.208    0.212
0.9     71      0.092    0.096    0.099    0.104    0.106
0.99    72      0.0092   0.00962  0.0099   0.0104   0.0106

For an NFT embedded sensor node reliability calculation, we require the sensor failure rate $\lambda_t$, which can be calculated using Equation 4 (i.e., $\lambda_t = (-1/100)\ln(1 - 0.05) = 5.13 \times 10^{-4}$ failures/day). SHARPE gives $P_1(t) = e^{-5.13 \times 10^{-4} t}$ and sensor node reliability $R_s(t) = P_1(t)$. Evaluating $R_s(t)$ at $t = 100$ gives $R_s(t)|_{t=100} = 0.94999$.

For an FT embedded sensor node when $c \neq 1$, different reliability results are obtained for different $k$ because the fault detection algorithm's accuracy and coverage factor $c$ depend upon $k$. For $k = 5$, $c = 0.979$ (Table 4-2), SHARPE gives $P_2(t) = e^{-5.13 \times 10^{-4} t}$ and $P_1(t) = 5.0223 \times 10^{-4}\, t\, e^{-5.13 \times 10^{-4} t}$. The reliability is $R_{sd}(t) = P_2(t) + P_1(t)$, which when evaluated at $t = 100$ gives $R_{sd}(t)|_{t=100} = 0.99770$.

For an FT embedded sensor node when $c = 1$ for all $k$, $p$, SHARPE gives $P_2(t) = e^{-5.13 \times 10^{-4} t}$ and $P_1(t) = 5.13 \times 10^{-4}\, t\, e^{-5.13 \times 10^{-4} t}$. Using Equation 4, the reliability is $R_{sd}(t)|_{t=100} = 0.99872$.
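The three example values above can be reproduced directly from the closed-form expressions $\lambda_t = -\ln(1-p)/t_s$, $R_s(t) = e^{-\lambda_t t}$, and $R_{sd}(t) = e^{-\lambda_t t}(1 + c\lambda_t t)$. The following Python sketch is only a cross-check of the arithmetic under those formulas, not the SHARPE model itself; the function names are hypothetical, and small differences in the last digit can arise because the text rounds $\lambda_t$ to $5.13 \times 10^{-4}$.

```python
import math


def sensor_failure_rate(p, t_s):
    """Failure rate (per day) implied by failure probability p over t_s days."""
    return -math.log(1.0 - p) / t_s


def nft_reliability(lam_t, t):
    """Reliability of the non-FT node: R_s(t) = exp(-lam_t * t)."""
    return math.exp(-lam_t * t)


def ft_reliability(lam_t, cov, t):
    """Reliability of the duplex node: R_sd(t) = exp(-lam_t*t) * (1 + cov*lam_t*t)."""
    return math.exp(-lam_t * t) * (1.0 + cov * lam_t * t)


lam_t = sensor_failure_rate(p=0.05, t_s=100.0)
print("lambda_t:", lam_t)
print("NFT:", nft_reliability(lam_t, 100.0))
print("FT, c = 0.979:", ft_reliability(lam_t, 0.979, 100.0))
print("FT, c = 1:", ft_reliability(lam_t, 1.0, 100.0))
```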


Figure 4-7. MTTF in days for an NFT and an FT embedded sensor node.

Table 4-3 shows the reliability for an NFT and an FT embedded sensor node evaluated at $t = 100$ days for $k$ values of 5, 10, and 15. As expected, the results show that reliability decreases for both NFT and FT embedded sensor nodes as $p$ increases, and the reliability attained by an FT sensor node (both for $c \neq 1$ and $c = 1$) is always better than that of an NFT sensor node. For example, the percentage reliability improvement achieved by an FT sensor node with $c \neq 1$, $k = 15$ over an NFT sensor node is 18.98% when $p = 0.2$. However, an FT sensor node with $c = 1$ outperforms both an FT sensor node with $c \neq 1$ and an NFT sensor node for all $p$ values. For example, the percentage improvements in reliability for an FT sensor node with $c = 1$ over an NFT sensor node and an FT sensor node ($c \neq 1$) with $k$ equal to 5, 10, and 15 are 69.09%, 28.11%, 24.32%, and 23.82%, respectively, for $p = 0.5$. The percentage improvements in reliability attained by an FT sensor node with $c = 1$ over an NFT sensor node and an FT sensor node ($c \neq 1$) with $k$ equal to 5, 10, and 15 are 230%, 172.36%, 166.31%, and 165.32%, respectively, for $p = 0.9$. These results reveal that the percentage improvement in reliability attained by an FT sensor node with $c = 1$ increases as $p \to 1$ because for an FT sensor node with $c \neq 1$, $c$ decreases as $p \to 1$ (Table 4-2). The sensor node reliability analysis reveals that a robust fault detection algorithm with $c \approx 1$ for all $p$ and $k$ values is necessary to attain good reliability for an FT sensor node.


Table 4-3. Reliability for an NFT and an FT embedded sensor node.

Prob.   NFT          FT             FT              FT              FT
p       (k, c: n/a)  (k=5, c≠1)     (k=10, c≠1)     (k=15, c≠1)     (c=1)
0.05    0.94999      0.99770        0.99872         0.99872         0.99872
0.1     0.89996      0.98135        0.99074         0.99102         0.99482
0.2     0.80011      0.93481        0.95088         0.95195         0.97853
0.3     0.69977      0.86265        0.88263         0.88513         0.94959
0.4     0.59989      0.77094        0.79209         0.79485         0.90643
0.5     0.5007       0.66086        0.68097         0.68374         0.84662
0.6     0.40012      0.53610        0.55295         0.55552         0.76663
0.7     0.30119      0.40167        0.41432         0.41612         0.66262
0.8     0.19989      0.25943        0.26683         0.26812         0.52171
0.9     0.10026      0.12148        0.12424         0.12470         0.33086
0.99    0.01005      0.01048        0.01053         0.01056         0.05628

Fig. 4-7 depicts the MTTF for an NFT and an FT sensor node for $k$ values of 5, 10, and 15 versus the sensor failure probability $p$ when $t_s$ in Equation 4 is 100 days [114][115]. The results show that the MTTF for an FT sensor node improves with increasing $k$. However, the MTTF shows negligible improvement when $k = 15$ over $k = 10$, as the fault detection algorithm's accuracy improvement gradient (slope) decreases for large $k$ values.

Fig. 4-7 also compares the MTTF for an FT sensor node when $c = 1\ \forall\, k, p$. Whereas $c \neq 1$ for existing fault detection algorithms, comparison with $c = 1$ provides insight into how the fault detection algorithm's accuracy affects the sensor node's MTTF. Fig. 4-7 shows that the MTTF for an FT sensor node with $c = 1$ is always greater than that of an FT sensor node with $c \neq 1$.

Fig. 4-7 shows that the MTTF for an NFT and an FT sensor node decreases as $p$ increases; however, the FT sensor node maintains a better MTTF than the NFT sensor node for all $p$ values. This observation reveals that a sensor with a lower failure probability achieves a better MTTF for both the NFT and FT sensor nodes. Fig. 4-7 also shows that the MTTF for both the NFT and the FT sensor nodes approaches zero as $p$


approaches one (i.e., $MTTF \to 0 \iff p \to 1$). This observation is intuitive because a faulty sensor (with $p = 1$) is unreliable and leads to a failed sensor node with zero MTTF, and suggests that, depending upon the application's reliability and MTTF requirements, the faulty sensor should be replaced before $p$ approaches one.

Table 4-4. Percentage MTTF improvement for an FT embedded sensor node as compared to an NFT embedded sensor node.

Prob.   FT            FT             FT
p       (k=5, c≠1)    (k=10, c≠1)    (c=1)
0.1     85.75         95.65          100
0.2     75.61         84.63          100
0.3     65.14         72.99          100
0.5     46.25         52.49          100
0.7     27.62         31.23          100
0.9     9.37          10.52          100
0.99    0.875         1.34           100

Table 4-4 depicts the percentage MTTF improvement gained by an FT sensor node over an NFT sensor node for different values of $p$. We calculate the percentage improvement as

$$\%\,MTTF\ \text{Improvement} = \frac{MTTF_{FT} - MTTF_{NFT}}{MTTF_{NFT}} \times 100$$

The table shows that the MTTF percentage improvement for an FT sensor node decreases as $p$ increases when $c \neq 1$. The percentage MTTF improvements for an FT sensor node with $k = 5$ and $k = 10$ are 85.75% and 95.65%, respectively, for $p = 0.1$ and drop to 0.875% and 1.34%, respectively, for $p = 0.99$. This observation reveals that having more neighboring sensor nodes (a higher $k$ value) improves MTTF because, in general, a fault detection algorithm's accuracy and $c$ improve with increasing $k$. Table 4-4 also shows that the MTTF percentage improvement for an FT sensor node over an NFT sensor node is 100% on average when $c = 1$, thus highlighting the importance of a robust fault detection algorithm.

Model validation: We compare the MTTF results for an NFT embedded sensor node obtained using our Markov model with other existing models in literature as


well as practical sensor node implementations to provide insights into the accuracy of our models. We point out that a detailed comparison and verification of our model with existing models is not possible because there are no similar/related models that leverage FT constructs. [86] modeled the lifetime of trigger-driven sensor nodes with different event arrival rates (trigger-driven sensor nodes perform processing based on events, as opposed to duty-cycle-based sensor nodes that perform processing based on a duty cycle), which can be interpreted as sensing events that depend on the sensor node's sensing frequency. Results were obtained for three trigger-driven sensor node platforms: XSM, MicaZ, and Telos. XSM and MicaZ consisted of an ATmega128L 8-bit processor with a maximum processor frequency of 16 MHz and a maximum performance of 16 million instructions per second (MIPS). Telos consisted of a TI MSP430 16-bit processor with a maximum processor frequency of 8 MHz and a maximum performance of 16 MIPS. Results revealed that Telos, XSM, and MicaZ delivered lifetimes of 1500, 1400, and 600 days, respectively, with an event arrival rate of 0.1 events/hour, which dropped to 350, 100, and 0 days, respectively, with an event arrival rate of 100 events/hour. The lifetime values for an NFT sensor node obtained from our Markov model range from 949 days to 22 days for different sensor failure probabilities. Depending upon the sensor node energy consumption, the lifetime model proposed in [85] estimated that the sensor node lifetime varied from 25 days to 215 days. These comparisons indicate that our Markov model yields sensor node lifetime values in the range obtained from other existing models.

We also compare the NFT embedded sensor node lifetime estimation from our model with an experimental study on sensor node lifetimes [89]. This comparison verifies conformity of our model with real measurements. For example, with a sensor node battery capacity of 2500 mA-h, experiments indicate a sensor node lifetime ranging from 72 to 95 hours for a 100% duty cycle for different battery brands (e.g., Ansmann, Panasonic Industrial, Varta High Energy, Panasonic Extreme Power) [89].


The sensor node lifetime for a duty cycle of 18% can be estimated as 95/0.18 = 528 hours ≈ 22 days and for a duty cycle of 0.42% as 95/0.0042 = 22,619 hours ≈ 942 days. This comparison reveals that our NFT embedded sensor node lifetime model estimates lifetime values in the range obtained by experimental sensor node implementations.

4.4.3 Reliability and MTTF for NFT and FT EWSN clusters

In this subsection, we present the reliability and MTTF results for two EWSN clusters that contain on average $n = k_{min} + 2$ and $n = k_{min} + 5$ sensor nodes at deployment time ($k_{min} = 4$). The selection of two different $n$ values provides insight into how the size of the cluster affects reliability and MTTF (other $n$ and $k_{min}$ values depicted similar trends). The NFT EWSN cluster consists of NFT sensor nodes and the FT EWSN cluster consists of FT sensor nodes. For brevity, we present example EWSN cluster reliability calculations for $n = k_{min} + 2$ ($k_{min} = 4$) and $p = 0.1$.

Leveraging our bottom-up modeling approach, the NFT EWSN cluster uses the failure rate $\lambda_s$ of an NFT sensor node. By using Equation 4, $\lambda_t = 1.054 \times 10^{-3}$ failures/day. Since the coverage factor $c$ does not appear in the reliability (and MTTF) calculation for an NFT sensor node, the failure rate is $\lambda_s = \lambda_t = 1/MTTF_s = 1.054 \times 10^{-3}$ failures/day. In other words, for an NFT sensor node $\lambda_s(k_{min}) = \lambda_s(k_{min}+1)$, where $\lambda_s(k_{min}) = \lambda_s(4)$ and $\lambda_s(k_{min}+1) = \lambda_s(5)$ denote the sensor node failure rate when the average number of neighbor sensor nodes is 4 and 5, respectively. The NFT EWSN cluster reliability calculation utilizes $\lambda_s(4)$ and $\lambda_s(5)$, which gives $P_{k_{min}+2}(t) = e^{-6.324 \times 10^{-3} t}$ and $P_{k_{min}+1}(t) = 6e^{-5.27 \times 10^{-3} t} - 6e^{-6.324 \times 10^{-3} t}$. The reliability is $R_c(t) = P_{k_{min}+2}(t) + P_{k_{min}+1}(t)$, which when evaluated at $t = 100$ days gives $R_c(t)|_{t=100} = 0.88562$.
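The NFT cluster example can be cross-checked with the closed-form solution of a three-state chain whose two transient states are left at rates $a = (k_{min}+2)\lambda_s(k_{min}+1)$ and $b = (k_{min}+1)\lambda_s(k_{min})$. The sketch below is a Python check under that assumption; `three_state_reliability` is a hypothetical helper name, and it requires $a \neq b$, which holds here because the node counts differ even when the per-node rates are equal.

```python
import math


def three_state_reliability(a, b, t):
    """R(t) = P_top(t) + P_mid(t) for the chain top --a--> mid --b--> failed.

    Closed form (valid only for a != b):
      P_top(t) = exp(-a t)
      P_mid(t) = a / (a - b) * (exp(-b t) - exp(-a t))
    """
    if a == b:
        raise ValueError("closed form requires distinct exit rates")
    p_top = math.exp(-a * t)
    p_mid = a / (a - b) * (math.exp(-b * t) - math.exp(-a * t))
    return p_top + p_mid


k_min = 4
lam_s = 1.054e-3                 # NFT node failure rate for p = 0.1, failures/day
a = (k_min + 2) * lam_s          # exit rate with six alive nodes
b = (k_min + 1) * lam_s          # exit rate with five alive nodes
print("NFT cluster reliability at t = 100 days:", three_state_reliability(a, b, 100.0))
```

With equal per-node rates the coefficient $a/(a-b)$ evaluates to $k_{min} + 2 = 6$, which matches the factor 6 in the state probability quoted above.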
For an FT EWSN cluster when $c \neq 1$, the reliability calculation requires $\lambda_{sd}(k_{min})$ and $\lambda_{sd}(k_{min}+1)$, which depend on $k$ because the fault detection algorithm's accuracy and $c$ depend upon $k$, yielding different reliability results for different $k_{min}$ values. Using Table 4-2 and Equation 4, $\lambda_{sd}(k_{min}) = \lambda_{sd}(4) = 1/MTTF_{sd}(4) = 1/(1.715 \times 10^{3}) = 5.83 \times 10^{-4}$ failures/day and $\lambda_{sd}(k_{min}+1) = \lambda_{sd}(5) = 1/MTTF_{sd}(5) = 1/(1.763 \times 10^{3}) = 5.67 \times 10^{-4}$ failures/day. SHARPE


gives $P_{k_{min}+2}(t) = e^{-3.402 \times 10^{-3} t}$ and $P_{k_{min}+1}(t) = 6.9856e^{-2.915 \times 10^{-3} t} - 6.9856e^{-3.402 \times 10^{-3} t}$. The reliability of the FT EWSN cluster is calculated as $R_c(t)|_{t=100} = 0.95969$.

For an FT EWSN cluster when $c = 1$, the reliability calculation does not depend on $k$, which gives $\lambda_{sd}(k_{min}) = \lambda_{sd}(4) = 1/MTTF_{sd}(4) = 1/1898 = 5.269 \times 10^{-4}$ failures/day and $\lambda_{sd}(k_{min}+1) = \lambda_{sd}(5) = 1/MTTF_{sd}(5) = 1/1898 = 5.269 \times 10^{-4}$ failures/day. SHARPE gives $P_{k_{min}+2}(t) = e^{-3.1614 \times 10^{-3} t}$ and $P_{k_{min}+1}(t) = 6e^{-2.6345 \times 10^{-3} t} - 6e^{-3.1614 \times 10^{-3} t}$, from which we calculate $R_c(t)|_{t=100} = 0.96552$.

Table 4-5 shows the reliability for an NFT and an FT EWSN cluster (for $c \neq 1$ and $c = 1$) evaluated at $t = 100$ days when $n = k_{min} + 2$ ($k_{min} = 4$). We observe similar trends as with sensor node reliability (Table 4-3), where the reliability of both NFT and FT EWSN clusters decreases as $p$ increases (i.e., reliability $R_c \to 0 \iff p \to 1$) because decreased individual sensor node reliability decreases overall EWSN cluster reliability. Table 4-5 shows that an FT EWSN cluster with $c = 1$ outperforms an FT EWSN cluster with $c \neq 1$ and an NFT EWSN cluster for all $p$ values. For example, the percentage improvement in reliability for an FT EWSN cluster with $c = 1$ over an NFT EWSN cluster and an FT EWSN cluster with $c \neq 1$ is 77.33% and 12.02% for $p = 0.3$ and 601.12% and 142.73% for $p = 0.6$, respectively. These results show that the percentage improvement in reliability attained by an FT EWSN cluster increases as $p$ increases because a fault detection algorithm's accuracy and $c$ decrease as $p$ increases (Table 4-2). This trend is similar to the percentage improvement in reliability for an FT sensor node (Section 4.4.2). The results show that an FT EWSN cluster always performs better than an NFT EWSN cluster. For example, the percentage improvement in reliability for an FT EWSN cluster with $c \neq 1$ over an NFT EWSN cluster is 58.31% for $p = 0.3$.

Fig. 4-8 depicts the MTTF for an NFT EWSN cluster and an FT EWSN cluster versus $p$ when $k_{min} = 4$ for average cluster sizes of $n = k_{min} + 2$ and $n = k_{min} + 5$ sensor nodes, respectively, at deployment time. The figure reveals that the FT EWSN cluster's MTTF is considerably greater than the NFT EWSN cluster's MTTF for both cluster sizes.

PAGE 115

Table 4-5. Reliability for an NFT and an FT EWSN cluster when n = kmin + 2 (kmin = 4).

p       NFT               FT (c≠1)          FT (c=1)
0.05    0.96720           0.99058           0.99098
0.1     0.88562           0.95969           0.96552
0.2     0.65567           0.84304           0.87474
0.3     0.41968           0.66438           0.74422
0.4     0.23309           0.45925           0.59388
0.5     0.10942           0.26595           0.43653
0.6     0.04098           0.11837           0.28732
0.7     0.01114           0.03548           0.16279
0.8     1.5957 × 10^-3    4.3470 × 10^-3    0.06723
0.9     5.5702 × 10^-5    1.4838 × 10^-4    0.01772
0.99    6.1056 × 10^-10   1.0057 × 10^-9    5.5702 × 10^-5

Figure 4-8. MTTF in days for an NFT EWSN cluster and an FT EWSN cluster with kmin = 4.

Fig. 4-8 also compares the MTTF for an FT EWSN cluster when $c = 1$ with $c \neq 1$ and shows that the MTTF for an FT EWSN cluster with $c = 1$ is always better than that of an FT EWSN cluster with $c \neq 1$. This observation again verifies the significance of a robust (ideal) fault detection algorithm, which can ideally provide $c$ values close to one (i.e., $c \approx 1$ for all $p$ values). We point out that both an NFT and an FT EWSN cluster with $n > k_{min}$ have redundant sensor nodes and can inherently tolerate $n - k_{min}$ sensor node failures. The EWSN cluster with $n = k_{min} + 5$ has more redundant sensor nodes than

PAGE 116

the EWSN cluster with $n = k_{min} + 2$, resulting in a comparatively greater MTTF. Fig. 4-8 shows that the MTTF for both an NFT and an FT EWSN cluster approaches zero as $p$ approaches one and further solidifies the necessity of low failure probability sensors to achieve better MTTF in EWSN clusters. The MTTF variation for an EWSN cluster with varying $p$ is similar to the MTTF variation for a sensor node (Fig. 4-7) because an EWSN cluster is composed of sensor nodes and reflects the MTTF variation of its constituent sensor nodes.

Table 4-6. Percentage MTTF improvement for an FT EWSN cluster as compared to an NFT EWSN cluster (kmin = 4).

Prob.   FT                    FT                    FT
p       (n=kmin+2, c≠1)       (n=kmin+5, c≠1)       (n=kmin+2, c=1)
0.1     83.04                 87.55                 100
0.2     73.84                 77.24                 100
0.3     63.58                 66.51                 100
0.5     44.97                 47.22                 100
0.7     26.4                  28.71                 100
0.9     9.81                  9.57                  100
0.99    2.26                  2.47                  100

Table 4-6 shows the percentage MTTF improvement for an FT EWSN cluster as compared to an NFT EWSN cluster for cluster sizes of $n = k_{min} + 2$ and $n = k_{min} + 5$ sensor nodes. The percentage improvements are calculated separately for the two cluster sizes, and we compare the MTTFs of clusters of the same size. We observe that the MTTF improvement for an FT cluster is slightly greater when $n = k_{min} + 5$ as compared to when $n = k_{min} + 2$; however, the MTTF improvement decreases with increasing $p$ when $c \neq 1$. This observation reveals that an EWSN cluster consisting of more sensor nodes can achieve better MTTF improvements when compared to an EWSN cluster containing fewer sensor nodes. The MTTF percentage improvement for an FT EWSN cluster with $n = k_{min} + 2$, $c \neq 1$, is 83.04% for $p = 0.1$ and drops to 2.26% for $p = 0.99$. Similarly, the percentage MTTF improvement for an FT EWSN cluster with $n = k_{min} + 5$, $c \neq 1$, is 87.55% for $p = 0.1$ and drops to 2.47% for $p = 0.99$. The

PAGE 117

Table 4-7. Iso-MTTF for EWSN clusters (kmin = 4).

calculations for $N = N_{min} + 2$ ($N_{min} = 0$, that is, each EWSN fails when there are no more active clusters) and $p = 0.2$. We assume that both EWSNs contain clusters with $n = k_{min} + 5$ ($k_{min} = 4$) $= 9$ sensor nodes on average with cluster failure rate $\lambda_c(9)$.

The reliability calculation for an NFT EWSN requires the NFT EWSN cluster failure rate $\lambda_c(9)$, which can be calculated using Equation 4 with $n = 9$ (i.e., $\lambda_c(9) = 1/MTTF_c(9) = 1/(3.34 \times 10^{2}) = 2.99 \times 10^{-3}$ failures/day). Using $\lambda_c(9)$, SHARPE gives $P_{N_{min}+2}(t) = e^{-5.98 \times 10^{-3} t}$ and $P_{N_{min}+1}(t) = 2e^{-2.99 \times 10^{-3} t} - 2e^{-5.98 \times 10^{-3} t}$. The EWSN reliability $R_{wsn}(t) = P_{N_{min}+2}(t) + P_{N_{min}+1}(t)$ when evaluated at $t = 100$ days gives $R_{wsn}(t)|_{t=100} = 0.93321$.

For an FT EWSN when $c \neq 1$, the reliability calculation requires the FT EWSN cluster failure rate $\lambda_c(9)$ (for $c \neq 1$) (i.e., $\lambda_c(9) = 1/MTTF_c(9) = 1/(5.92 \times 10^{2}) = 1.69 \times 10^{-3}$ failures/day). Using $\lambda_c(9)$, SHARPE gives $P_{N_{min}+2}(t) = e^{-3.38 \times 10^{-3} t}$ and $P_{N_{min}+1}(t) = 2e^{-1.69 \times 10^{-3} t} - 2e^{-3.38 \times 10^{-3} t}$. The EWSN reliability is $R_{wsn}(t)|_{t=100} = 0.97583$.
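The two EWSN reliability values above follow from the three-state solution with a common cluster failure rate $\lambda_c(n)$. For $N_{min} = 0$ the expression $e^{-2\lambda t} + 2(e^{-\lambda t} - e^{-2\lambda t})$ equals $1 - (1 - e^{-\lambda t})^2$, the reliability of two independent clusters in parallel, which offers an independent check. The Python sketch below assumes that closed form; `ewsn_reliability` is a hypothetical name, not a SHARPE construct.

```python
import math


def ewsn_reliability(lam_c, n_min, t):
    """Reliability of an EWSN with N = n_min + 2 clusters, each failing at rate lam_c.

    R(t) = exp(-a t) + (n_min + 2) * (exp(-b t) - exp(-a t)),
    with a = (n_min + 2) * lam_c and b = (n_min + 1) * lam_c.
    """
    a = (n_min + 2) * lam_c
    b = (n_min + 1) * lam_c
    return math.exp(-a * t) + (n_min + 2) * (math.exp(-b * t) - math.exp(-a * t))


def two_parallel_clusters(lam_c, t):
    """Reliability of two independent clusters in parallel (the n_min = 0 case)."""
    return 1.0 - (1.0 - math.exp(-lam_c * t)) ** 2


for label, lam_c in (("NFT", 2.99e-3), ("FT, c != 1", 1.69e-3)):
    print(label, ewsn_reliability(lam_c, 0, 100.0), two_parallel_clusters(lam_c, 100.0))
```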
For an FT EWSN when $c = 1$, the reliability calculation requires the FT EWSN cluster failure rate $\lambda_c(9)$ (for $c = 1$) (i.e., $\lambda_c(9) = 1/MTTF_c(9) = 1/668.73 = 1.49 \times 10^{-3}$ failures/day). Using $\lambda_c(9)$, SHARPE gives $P_{N_{min}+2}(t) = P_2(t) = e^{-2.98 \times 10^{-3} t}$ and $P_{N_{min}+1}(t) = P_1(t) = 2e^{-1.49 \times 10^{-3} t} - 2e^{-2.98 \times 10^{-3} t}$, which gives $R_{wsn}(t)|_{t=100} = 0.98084$.

Table 4-8 shows the reliability for an NFT EWSN and an FT EWSN evaluated at $t = 100$ days when $N = N_{min} + 2$ ($N_{min} = 0$) for clusters with nine sensor nodes on average. We observe similar trends as with sensor node reliability (Table 4-3) and EWSN cluster reliability (Table 4-5), where reliability for both an NFT EWSN and an FT EWSN decreases as $p$ increases (i.e., reliability $R_{wsn} \to 0 \iff p \to 1$) because an EWSN contains clusters of sensor nodes, and decreased individual sensor node reliability with increasing $p$ decreases both EWSN cluster and EWSN reliability. Table 4-8 shows that an FT EWSN with $c = 1$ outperforms an FT EWSN with $c \neq 1$ and an NFT EWSN for all $p$ values. For example, the percentage improvement in reliability for an FT EWSN with $c = 1$ over an NFT EWSN and an FT EWSN with $c \neq 1$ is 5.1% and 0.51% for $p = 0.2$ and


350.18% and 236.22% for $p = 0.9$, respectively. These results show that the percentage improvement in reliability attained by an FT EWSN increases as $p$ increases because the fault detection algorithm's accuracy and $c$ decrease as $p$ increases (Table 4-2). This trend is similar to the percentage improvement in reliability for an FT sensor node (Section 4.4.2) and an FT EWSN cluster (Section 4.4.3). The results show that an FT EWSN always performs better than an NFT EWSN. For example, the percentage improvement in reliability for an FT EWSN with $c \neq 1$ over an NFT EWSN is 4.57% for $p = 0.2$.

Table 4-8. Reliability for an NFT EWSN and an FT EWSN when N = Nmin + 2 (Nmin = 0).

p       NFT              FT (c≠1)         FT (c=1)
0.05    0.99557          0.99883          0.99885
0.1     0.98261          0.99474          0.99534
0.2     0.93321          0.97583          0.98084
0.3     0.85557          0.93775          0.95482
0.4     0.75408          0.87466          0.91611
0.5     0.63536          0.78202          0.86218
0.6     0.51166          0.65121          0.78948
0.7     0.36303          0.49093          0.69527
0.8     0.20933          0.30328          0.55494
0.9     0.08807          0.11792          0.39647
0.99    4.054 × 10^-3    4.952 × 10^-3    0.08807

Fig. 4-9 depicts the MTTF for an NFT EWSN and an FT EWSN containing on average $N = N_{min} + 2$ and $N = N_{min} + 5$ clusters ($N_{min} = 0$) at deployment time. The figure reveals that an FT EWSN improves the MTTF considerably over an NFT EWSN for both numbers of clusters. Fig. 4-9 also shows that the MTTF for an FT EWSN when $c = 1$ is always greater than the MTTF for an FT EWSN when $c \neq 1$. We observe that since the MTTF for an FT EWSN drops nearly to that of an NFT EWSN as $p \to 1$, building a more reliable FT EWSN requires low failure probability sensors. This EWSN MTTF variation with $p$ follows similar trends as observed in EWSN clusters (Fig. 4-8) and sensor nodes


(Fig. 4-7). Similar MTTF variations for an EWSN, EWSN cluster, and a sensor node result from our bottom-up modeling approach (Section 4.3), where each hierarchical level captures the MTTF variation trends of lower levels. We observe that the MTTF for an EWSN with $N = N_{min} + 5$ is always greater than the MTTF for an EWSN with $N = N_{min} + 2$. This observation is intuitive because EWSNs with $N = N_{min} + 5$ have more redundant EWSN clusters (and sensor nodes) and can survive more cluster failures before reaching the failed state ($N = 0$) as compared to an EWSN with $N = N_{min} + 2$.

Figure 4-9. MTTF in days for an NFT EWSN and an FT EWSN with Nmin = 0.

We observe the percentage MTTF improvement for an FT EWSN over an NFT EWSN containing on average $N = N_{min} + 2$ and $N = N_{min} + 5$ clusters. We observe that the MTTF improvement for both numbers of clusters decreases with increasing $p$ when $c \neq 1$ (a similar trend as for an EWSN cluster and for a sensor node). The MTTF percentage improvement for an FT EWSN with $N = N_{min} + 2$, $c \neq 1$, is 87.56% for $p = 0.1$ and drops to 3.3% for $p = 0.99$. Similarly, the MTTF percentage improvement for an FT EWSN with $N = N_{min} + 5$, $c \neq 1$, is 88.2% for $p = 0.1$ and drops to 3.26% for $p = 0.99$. We observe that the MTTF improvement for an FT EWSN with $c = 1$ is 100% on average for all $p$ values and is greater than that of an FT EWSN with $c \neq 1$. The MTTF percentage improvement for an FT EWSN with $N = N_{min} + 5$ over an FT EWSN with $N = N_{min} + 2$ is 52.22% on average.


Table 4-9. Iso-MTTF for EWSNs (Nmin = 0).

embedded sensor node ($c \neq 1$) with an average number of neighbors $k$ equal to 5, 10, and 15 are 230%, 172.36%, 166.31%, and 165.32%, respectively, for $p = 0.9$. The percentage improvement in reliability for an FT EWSN cluster with $c = 1$ over an NFT EWSN cluster and an FT EWSN cluster with $c \neq 1$ is 601.12% and 142.73%, respectively, for $p = 0.6$. The percentage improvement in reliability for an FT EWSN with $c = 1$ over an NFT EWSN and an FT EWSN with $c \neq 1$ is 350.18% and 236.22%, respectively, for $p = 0.9$. Results indicated that our FT model could provide on average 100% MTTF improvement with a perfect fault detection algorithm, whereas the MTTF improvement varied from 95.95% to 1.34% due to a fault detection algorithm's typical poor performance at high sensor failure rates. We also observed that the redundancy in EWSNs plays an important role in improving EWSN reliability and MTTF. Results revealed that just three redundant sensor nodes in an EWSN cluster resulted in an MTTF improvement of 103.35% on average. Similarly, redundancy in EWSN clusters contributes to the reliability and MTTF improvement, and the results indicated that three redundant EWSN clusters could improve the MTTF by 52.22% on average. The iso-MTTF results indicate that an NFT EWSN cluster requires three redundant sensor nodes to attain an MTTF comparable to that of an FT EWSN cluster, whereas an NFT EWSN requires nine redundant EWSN clusters to achieve an MTTF comparable to that of an FT EWSN.

Our results motivate the development of robust distributed fault detection algorithms, which are the focus of our future work. We plan to develop an EWSN performability model to capture both the performance and availability (and/or reliability) simultaneously. We also plan to investigate FT in sensed data aggregation (fusion) and routing in EWSNs.


CHAPTER 5
AN MDP-BASED DYNAMIC OPTIMIZATION METHODOLOGY FOR DISTRIBUTED EMBEDDED WIRELESS SENSOR NETWORKS

Embedded wireless sensor networks (EWSNs) are distributed systems consisting of spatially distributed autonomous sensor nodes that span diverse application domains. The EWSN application domains include security and defense systems, industrial monitoring, building automation, logistics, ecology, environment and ambient conditions monitoring, health care, home and office applications, vehicle tracking, etc. However, this wide application diversity combined with increasing embedded sensor node complexity, functionality requirements, and highly constrained operating environments makes EWSN design very challenging.

One critical EWSN design challenge involves meeting application requirements such as reliability, lifetime, throughput, delay (responsiveness), etc. for a myriad of application domains. For example, a vineyard irrigation system may require less responsiveness to environmental stimuli (i.e., decreased irrigation during wet periods), but have a long lifetime requirement. On the other hand, in a disaster relief application, sensor nodes may require high responsiveness but have a short lifetime. Additional requirements may include high adaptability to rapid network changes as sensor nodes are destroyed. Meeting these application-specific requirements is critical to accomplishing the application's assigned function. Nevertheless, satisfying these demands in a scalable and cost-effective way is a challenging task.

Commercial off-the-shelf (COTS) embedded sensor nodes have difficulty meeting application requirements due to inherent manufacturing traits. In order to reduce manufacturing costs, generic COTS embedded sensor nodes capable of implementing nearly any application are produced in large volumes, and are not specialized to meet any specific application requirements. In order to meet application requirements, sensor nodes must possess tunable parameters. Fortunately, some COTS have tunable


parameters such as processor voltage, processor frequency, sensing frequency, radio transmission power, and radio transmission frequency.

Embedded sensor node parameter tuning is the process of determining appropriate parameter values which meet application requirements. However, determining such values presents several tuning challenges. First, application managers (the individuals responsible for EWSN deployment and management) typically lack sufficient technical expertise [138][139], as many managers are non-experts (e.g., biologists, teachers, structural engineers, agriculturists, etc.). In addition, parameter value tuning is still a cumbersome and time-consuming task even for expert application managers due to unpredictable EWSN environments and difficulty in creating accurate simulation environments. Second, selected parameter values may not be optimal (suboptimal). Given a highly configurable embedded sensor node with many tunable parameters and with many values for each tunable parameter, choosing the optimal combination is difficult. In addition, unanticipated changes in the sensor node's environment can alter optimal parameter values. For example, an embedded sensor node designed to monitor a short-lived volcanic eruption may need to operate for more months/years than expected if earthquakes alter magma flow.

To ease parameter value selection, dynamic optimizations enable embedded sensor nodes to dynamically tune their parameter values in situ according to application requirements and environmental stimuli. This dynamic tuning of parameters ensures that an EWSN performs the assigned task optimally, enabling the embedded sensor node to constantly conform to the changing environment. Besides, the application manager need not know embedded sensor node and/or dynamic optimization specifics, thus easing parameter tuning for non-expert application managers.

Unfortunately, there exists little previous work on EWSN dynamic optimizations with respect to relating high-level application requirements to low-level embedded sensor node parameters. Moreover, changes in application requirements over time were not


addressed in previous work. Hence, novel dynamic optimization methodologies that respond to changing application requirements and environmental stimuli are essential.

In this chapter, we propose an application-oriented dynamic tuning methodology for embedded sensor nodes in distributed EWSNs based on the Markov decision process (MDP). Our MDP-based application-oriented tuning methodology performs dynamic voltage, frequency, and sensing (sampling) frequency scaling (DVFS2). MDP is an appropriate candidate for EWSN dynamic optimizations where dynamic decision making is a requirement in light of changing environmental stimuli and wireless channel conditions. We focus on DVFS2 for several reasons. Traditional microprocessor-based systems use dynamic voltage and frequency scaling (DVFS) for energy optimizations. However, sensor nodes are distinct from traditional systems in that they have embedded sensors coupled with an embedded processor. Therefore, DVFS only provides a partial tuning methodology and does not consider sensing frequency. Sensing frequency tuning is essential for sensor nodes to meet application requirements because the sensed data delay (the delay between the sensor sensing the data and the data's reception by the application manager) depends upon the sensor node sensing frequency, as it influences the amount of processed and communicated data. Thus, DVFS2 provides enhanced optimization potential as compared to DVFS with respect to EWSNs.

Our main contributions in this chapter are:

- To the best of our knowledge, we propose for the first time a Markov decision process (MDP) for dynamic optimization of embedded sensor nodes in EWSNs. MDP is suitable for embedded sensor node dynamic optimization because of MDP's inherent ability to perform dynamic decision making. This chapter presents a first step towards MDP-based dynamic optimization.

- Our MDP-based dynamic optimization methodology gives a policy that performs DVFS2 and specifies a good-quality solution for embedded sensor node parameter tuning for the EWSN lifetime.

- Our MDP-based dynamic tuning methodology adapts to changing application requirements and environmental stimuli.


- We provide implementation guidelines for our proposed dynamic tuning methodology in embedded sensor nodes.

- We compare our proposed MDP-based application-oriented dynamic tuning methodology with several fixed heuristics. The results show that our proposed methodology outperforms other heuristics for given application requirements.

The broader impacts of our research include helping EWSN designers (persons who design EWSNs for an application) to better meet application requirements by selecting suitable tunable parameter values for each sensor node. As this chapter presents a first step towards MDP-based dynamic optimization, our work can potentially spark further research in MDP-based optimizations for embedded sensor nodes in EWSNs.

The remainder of this chapter is organized as follows. A review of related work is given in Section 5.1. Section 5.2 provides an overview of MDP with respect to EWSNs and our proposed MDP-based application-oriented dynamic tuning methodology. Section 5.3 describes the formulation of our proposed methodology as an MDP. Section 5.4 provides implementation guidelines and our proposed methodology's computational complexity. Section 5.5 provides model extensions to our proposed policy. Numerical results are presented in Section 5.6. Section 5.7 concludes our study and outlines future research work directions.

5.1 Related Work

There is a large body of research in the area of dynamic optimizations [140][44][43][45]; however, most previous work focuses on the processor or memory (cache) in computer systems. Whereas these endeavors can provide valuable insights into embedded sensor node dynamic optimizations, they are not directly applicable to embedded sensor nodes due to different design spaces, platform particulars, and embedded sensor nodes' tight design constraints.

Stevens-Navarro et al. [141] applied MDPs for vertical handoff decisions in heterogeneous wireless networks. Although our work leverages the reward function


idea from their work, our work, for the first time to the best of our knowledge, applies MDPs to dynamic optimizations of embedded sensor nodes in EWSNs.

Little previous work exists in the area of application-specific tuning and dynamic profiling in EWSNs. Sridharan et al. [142] obtained accurate environmental stimuli by dynamically profiling the EWSN's operating environment; however, they did not propose any methodology to leverage these profiling statistics for optimizations. Tilak et al. [143] investigated infrastructure (referred to as sensor node characteristics, number of deployed sensors, and deployment strategy) tradeoffs on application requirements. The application requirements considered were accuracy, latency, energy efficiency, fault tolerance, goodput (ratio of the total number of packets received to the total number of packets sent), and scalability. However, the authors did not delineate the interdependence between low-level sensor node parameters and high-level application requirements. Kogekar et al. [84] proposed an approach for dynamic software reconfiguration in EWSNs using dynamically adaptive software. Their approach used tasks to detect environmental changes (event occurrences) and adapt the software to the new conditions. Their work did not consider embedded sensor node tunable parameters. Kadayif et al. [144] proposed an automated strategy for data filtering to determine the amount of computation or data filtering to be done at the sensor nodes before transmitting data to the sink node. Unfortunately, the authors only studied the effects of data filtering tuning on energy consumption and did not consider other sensor node parameters and application requirements.

Some previous and current work investigates EWSN operation in changing application (mission) requirements and environmental stimuli. Marron et al. [145] presented an adaptive cross-layer architecture, TinyCubus, for TinyOS-based sensor networks that allowed dynamic management of components (e.g., caching, aggregation, broadcast strategies) and reliable code distribution considering EWSN topology. TinyCubus considered optimization parameters (e.g., energy, communication latency,


and bandwidth), application requirements (e.g., reliability), and system parameters (e.g., mobility). The system parameters selected the best set of components based on current application requirements and optimization parameters. Vecchio [146] discussed adaptability in EWSNs at three different levels: communication-level (by tuning the communication scheme), application-level (by software changes), and hardware-level (by injecting new sensor nodes). The International Technology Alliance in Network and Information Science (ITA), sponsored by the UK Ministry of Defence (MoD) and the US Army Research Laboratory (ARL), investigates task reassignment and reconfiguration (including physical movement of sensor nodes or reprocessing of data) of already deployed sensor nodes in the sensor field in response to current or predicted future conditions to provide the expected sensed information at a sufficient quality [147]. However, current ITA projects, to the best of our knowledge, do not consider sensor node parameter tuning, and our MDP-based parameter tuning can optimize EWSN operation in changing application requirements.

Several papers explore DVFS for reduced energy consumption. Pillai et al. [148] proposed real-time dynamic voltage scaling (RT-DVS) algorithms capable of modifying the operating system's real-time scheduler and task management service for reduced energy consumption. Childers et al. [149] proposed a technique for adjusting supply voltage and frequency at run-time to conserve energy. Their technique monitored a program's instruction-level parallelism (ILP) and adjusted processor voltage and speed in response to the current ILP. Their proposed technique allowed users to specify performance constraints, which the hardware maintained while running at the lowest energy consumption.

Liu et al. [150] investigated reducing processor speed by varying the supply and threshold voltages for low power consumption in complementary metal-oxide-semiconductor (CMOS) VLSI (very-large-scale integration). Results showed that an optimized threshold voltage yielded an 8x power savings without any negative performance impacts. In


addition, significantly greater energy savings could be achieved by reducing processor speed in tandem with threshold voltage. Burd et al. [151] presented a microprocessor system that dynamically varied its supply voltage and clock frequency to deliver high throughput during critical high-speed execution periods and extended battery life during low-speed execution periods. Results revealed that dynamic voltage scaling (DVS) could improve energy efficiency by 10x for battery-powered processor systems without sacrificing peak throughput.

Min et al. [82] demonstrated that dynamic voltage scaling in a sensor node's processor reduced energy consumption. Their technique used a voltage scheduler, running in tandem with the operating system's task scheduler, to adjust voltage and frequency based on a priori knowledge of the predicted sensor node's workload. Yuan et al. [83] studied a DVFS system for sensor nodes, which required sensor nodes sending data to insert additional information into a transmitted data message's header such as the packet length, expected processing time, and deadline. The receiving sensor node used this message information to select an appropriate processor voltage and frequency to minimize the overall energy consumption.

Some previous works in EWSN optimizations explore greedy and simulated annealing (SA)-based methods. Lysecky et al. [152] proposed an SA-based automated application-specific tuning of parameterized sensor-based embedded systems and found that automated tuning can improve EWSN operation by 40% on average. Verma [153] studied SA and particle swarm optimization (PSO) methods for automated application-specific tuning and observed that SA performed better than PSO because PSO often quickly converged to local minima. In prior work, Munir et al. [154] proposed greedy- and simulated annealing (SA)-based algorithms for parameter tuning. Whereas greedy- and SA-based algorithms are lightweight, these algorithms do not ensure convergence to an optimal solution.


Figure 5-1. Process diagram for our MDP-based application-oriented dynamic tuning methodology for embedded wireless sensor networks.

Previous work related to DVFS exists, and several initiatives towards application-specific tuning have been taken. Nevertheless, the literature presents no mechanisms to determine a dynamic tuning policy for sensor node parameters in accordance with changing application requirements. To the best of our knowledge, we propose the first methodology to address EWSN dynamic optimizations with the goal of meeting application requirements in a dynamic environment.

5.2 MDP-Based Tuning Overview

In this section, we present our MDP-based tuning methodology along with an MDP overview with respect to EWSNs [137].

5.2.1 MDP-Based Tuning Methodology for Embedded Wireless Sensor Networks

Fig. 5-1 depicts the process diagram for our MDP-based application-oriented dynamic tuning methodology. Our methodology consists of three logical domains: the application characterization domain, the communication domain, and the sensor node tuning domain.


The application characterization domain refers to the EWSN application's characterization/specification. In this domain, the application manager defines various application metrics (e.g., tolerable power consumption, tolerable delay, etc.), which are calculated from (or based on) application requirements. The application manager also assigns weight factors to application metrics to signify the importance of each application metric. Weight factors provide application managers with an easy method to express the relative importance of each application metric. The application manager defines an MDP reward function which signifies the overall reward (revenue) for given application requirements. The application metrics, along with the associated weight factors, represent the MDP reward function parameters.

The communication domain contains the sink node (which gathers statistics from the embedded sensor nodes) and encompasses the communication network between the application manager and the sensor nodes. The application manager transmits the MDP reward function parameters to the sink node via the communication domain. The sink node in turn relays the reward function parameters to the embedded sensor nodes.

The sensor node tuning domain consists of embedded sensor nodes and performs sensor node tuning. Each sensor node contains an MDP controller module which implements our MDP-based tuning methodology (summarized here and described in detail in Section 5.3). After an embedded sensor node receives reward function parameters from the sink node through the communication domain, the sensor node invokes the MDP controller module. The MDP controller module calculates the MDP-based policy. The MDP-based policy prescribes the sensor node actions to meet application requirements over the lifetime of the sensor node. An action prescribes the sensor node state (defined by processor voltage, processor frequency, and sensing frequency) to which to transition from the current state. The sensor node identifies its current operating state, determines an action `a' prescribed by the MDP-based policy


(i.e., whether to continue operation in the current state or transition to another state), and subsequently executes action `a'.

Our proposed MDP-based dynamic tuning methodology can adapt to changes in application requirements (since application requirements may change with time; for example, a defense system initially deployed to monitor enemy troop position for four months may later be required to monitor troop activity for an extended period of six months). Whenever application requirements change, the application manager updates the reward function (and/or associated parameters) to reflect the new application requirements. Upon receiving the updated reward function, the sensor node reinvokes the MDP controller module and determines the new MDP-based policy to meet the new application requirements.

Our MDP-based application-oriented dynamic tuning methodology reacts to environmental stimuli via a dynamic profiler module in the sensor node tuning domain. The dynamic profiler module monitors environmental changes over time and captures unanticipated environmental situations not predictable at design time [142]. The dynamic profiler module may be connected to the sensor node and gathers profiling statistics (e.g., wireless channel condition, number of packets dropped, packet size, radio transmission power, etc.) when triggered by the EWSN application. The dynamic profiler module informs the application manager of the profiled statistics via the communication domain. After receiving the profiling statistics, the application manager evaluates the statistics and possibly updates the reward function parameters. This reevaluation process may be automated, thus eliminating the need for continuous application manager input. Based on these received profiling statistics and updated reward function parameters, the sensor node MDP controller module determines whether application requirements are met. If application requirements are not met, the MDP controller module reinvokes the MDP-based policy to determine a new operating state to better meet application requirements. This feedback process


continues to ensure that the application requirements are met in the presence of changing environmental stimuli.

Input: MDP Reward Function Parameters
Output: An MDP-based Optimal Policy Meeting Application Requirements
 1  foreach Embedded Sensor Node in Wireless Sensor Network do
 2      while Embedded Sensor Node Alive do
 3          if Initial Deployment of Embedded Wireless Sensor Network then
 4              Determine MDP-based policy according to initial application requirements characterized by MDP reward function parameters
 5          end
 6          if Embedded Wireless Sensor Network Application Requirements Change then
 7              Determine new MDP-based policy according to new application requirements characterized by MDP reward function parameters
 8          end
 9          foreach Embedded Sensor Node State s ∈ S do
10              while embedded sensor node in state i do
11                  if action a suggests state different from i then
12                      switch to the state given by action a
13                  else
14                      continue operating in state i
15                  end
16              end
17          end
18      end
19  end
Algorithm 1: Algorithm for our MDP-based application-oriented dynamic tuning methodology for embedded wireless sensor networks.

Algorithm 1 depicts our MDP-based dynamic tuning methodology in algorithmic form, which applies to all embedded sensor nodes in the WSN (line 1) and spans the entire lifetime of the embedded sensor nodes (line 2). The algorithm determines an MDP-based policy $\pi^{MDP}$ according to the application's initial requirements (captured by the MDP reward function parameters) at the time of EWSN deployment (line 3 and line 4). The algorithm determines a new MDP-based policy to adapt to changing application requirements (line 6 and line 7). After determining $\pi^{MDP}$, the algorithm specifies the action $a$ according to $\pi^{MDP}$ for each embedded sensor node state $s$ in the state space $S$ (lines 9-17).
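The per-node control flow of Algorithm 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_node`, `toy_policy_for`, the state labels, and the requirement labels are hypothetical names introduced here, and the policy is represented as a plain dictionary from the current state to the state prescribed by the policy rather than being computed by an MDP solver.

```python
def run_node(policy_for, requirements_per_epoch, initial_state):
    """Trace one node's operating state over its decision epochs (hypothetical helper).

    policy_for(requirements) returns a dict mapping each state to the next
    state prescribed by the policy for those requirements.
    requirements_per_epoch has one entry per decision epoch while the node is alive.
    """
    state = initial_state
    active_requirements = None
    policy = None
    visited = []
    for requirements in requirements_per_epoch:
        # Lines 3-8 of Algorithm 1: (re)compute the policy at initial
        # deployment and whenever the application requirements change.
        if requirements != active_requirements:
            policy = policy_for(requirements)
            active_requirements = requirements
        # Lines 9-17: move to the prescribed state, which may be the current one.
        state = policy[state]
        visited.append(state)
    return visited


def toy_policy_for(requirements):
    """Stand-in for the MDP controller: every state is sent to one target state."""
    target = "s1" if requirements == "long-lifetime" else "s3"
    return {s: target for s in ("s1", "s2", "s3")}


trace = run_node(
    toy_policy_for,
    ["long-lifetime"] * 2 + ["high-responsiveness"] * 2,
    initial_state="s2",
)
print(trace)
```

In this toy run the node settles in one state under the first requirement and switches once when the requirement changes, which mirrors the policy recomputation in lines 6-7 of the algorithm.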


5.2.2 MDP Overview with Respect to Embedded Wireless Sensor Networks

In this section, we define basic MDP terminology in the context of EWSNs and give an overview of our proposed MDP-based dynamic tuning policy formulation for embedded sensor nodes. MDPs, also known as stochastic dynamic programming, are used to model and solve dynamic decision making problems. We use standard notations as defined in [155] for our MDP-based problem formulation. Table 5-1 presents a summary of key notations used in this chapter.

The basic elements of an MDP model are: decision epochs and periods, states, action sets, transition probabilities, and rewards. An MDP is Markovian (memoryless) because the transition probabilities and rewards depend on the past only through the current state and the action selected by the decision maker in that state. The decision epochs refer to the points of time during a sensor node's lifetime at which the sensor node makes a decision. Specifically, a sensor node makes a decision regarding its operating state at these decision epochs (i.e., whether to continue operating at the current state (processor voltage, frequency, and sensing frequency), or transition to another state). We consider a discrete time process where time is divided into periods and a decision epoch corresponds to the beginning of a period. The set of decision epochs can be denoted as $T = \{1, 2, 3, \ldots, N\}$, where $N \le \infty$ denotes the sensor node's lifetime (each individual time period in $T$ can be denoted as time $t$). The decision problem is referred to as a finite horizon problem when the decision making horizon $N$ is finite and an infinite horizon problem otherwise. In a finite horizon problem, the final decision is made at decision epoch $N - 1$; hence the finite horizon problem is also known as the $N - 1$ period problem.

The system (a sensor node) operates in a particular state at each decision epoch, where $S$ denotes the complete set of possible system states. States specify particular sensor node parameter values and each state represents a different combination of these values. An action set represents all allowable actions in all possible states. At


Table 5-1. Summary of MDP notations

Notation                  Description
$s$                       sensor node state
$S$                       state space
$N$                       number of decision epochs
$I$                       number of sensor node state tuples
$r_t(s,a)$                reward at time $t$ given state $s$ and action $a$
$a_{i,j}$                 action to transition from state $i$ to state $j$
$A_s$                     allowable actions in state $s$
$p_t(j \mid s,a)$         transition probability function
$E^{\pi}_{s}$             expected reward of policy $\pi$ with initial state $s$
$\lambda$                 discount factor
$v^{\pi}_{N}(s)$          expected total reward
$v^{\pi}_{\lambda}(s)$    expected total discounted reward
$v^{*}(s)$                maximum expected total discounted reward
$d^{*}$                   optimal decision rule
$d_t$                     decision rule at time $t$
$\pi^{*}$                 optimal policy
$\pi^{MDP}$               MDP-based policy

each decision epoch, the sensor node decides whether to continue operating in the current state or to switch to another state. The sensor node state (in our problem) represents a tuple consisting of processor voltage ($V_p$), processor frequency ($F_p$), and sensing frequency ($F_s$). If the system is in state $s \in S$ at a decision epoch, the sensor node can choose an action $a$ from the set of allowable actions $A_s$ in state $s$. Thus, an action set can be written as $A = \bigcup_{s \in S} A_s$. We assume that $S$ and $A_s$ do not vary with time $t$ [155].

When a sensor node selects action $a \in A_s$ in state $s$, the sensor node receives a reward $r_t(s,a)$ and the transition probability distribution $p_t(\cdot \mid s,a)$ determines the system state at the next decision epoch. The real-valued function $r_t(s,a)$ denotes the value of the reward received at time $t$ in period $t$. The reward is referred to as income or cost depending on whether $r_t(s,a)$ is positive or negative, respectively. When the


reward depends on the system state at the next decision epoch, we let $r_t(s,a,j)$ denote the value of the reward received at time $t$ when the system state at decision epoch $t$ is $s$, the sensor node selects action $a \in A_s$, and the system occupies the state $j$ at decision epoch $t+1$. The sensor node evaluates $r_t(s,a)$ using [155]

$$ r_t(s,a) = \sum_{j \in S} r_t(s,a,j)\, p_t(j \mid s,a) \qquad (5) $$

where the non-negative function $p_t(j \mid s,a)$ is called a transition probability function. $p_t(j \mid s,a)$ denotes the probability that the system occupies state $j \in S$ at time $t+1$ when the sensor node selects action $a \in A_s$ in state $s$ at time $t$, and usually $\sum_{j \in S} p_t(j \mid s,a) = 1$. Formally, an MDP is defined as the collection of objects $\{T, S, A_s, p_t(\cdot \mid s,a), r_t(s,a)\}$.

A decision rule prescribes an action in each state at a specified decision epoch. Our decision rule for sensor nodes is a function $d_t : S \to A_s$ which specifies the action at time $t$ when the system is in state $s$; for each $s \in S$, $d_t(s) \in A_s$. This decision rule is both Markovian and deterministic.

A policy specifies the decision rule for all decision epochs. In the case of sensor nodes, the policy prescribes action selection under any possible system state. A policy $\pi$ is a sequence of decision rules, that is, $\pi = (d_1, d_2, d_3, \ldots, d_{N-1})$ for $N \le \infty$. A policy is stationary if $d_t = d\ \forall\, t \in T$, i.e., for a stationary policy $\pi = (d, d, d, \ldots, d)$.

As a result of selecting and implementing a particular policy, the sensor node receives rewards at time periods $\{1, 2, 3, \ldots, N\}$. The reward sequence is random because the rewards received in different periods are not known prior to policy implementation. The sensor node's optimization objective is to determine a policy which maximizes the corresponding random reward sequence.

5.3 Application-Specific Embedded Sensor Node Tuning Formulation as an MDP

In this section, we describe the formulation of our embedded sensor node application-specific DVFS2 tuning as an MDP. We formulate MDP-based policy constructs (i.e., state space, decision epochs, actions, state dynamics, policy, performance


criterion, and reward function) for our system. We also introduce optimality equations and the policy iteration algorithm.

5.3.1 State Space

The state space for our MDP-based tuning methodology is a composite state space containing the Cartesian product of embedded sensor node tunable parameters' state spaces. We define the state space $S$ as

$$ S = S_1 \times S_2 \times \cdots \times S_M : |S| = I \qquad (5) $$

where $\times$ denotes the Cartesian product, $M$ is the total number of sensor node tunable parameters, $S_k$ denotes the state space for tunable parameter $k$ where $k \in \{1, 2, \ldots, M\}$, and $|S|$ denotes the state space $S$ cardinality (the number of states in $S$).

The tunable parameter $k$'s state space $S_k$ ($k \in \{1, 2, \ldots, M\}$) consists of $n$ values

$$ S_k = \{s_{k_1}, s_{k_2}, s_{k_3}, \ldots, s_{k_n}\} : |S_k| = n \qquad (5) $$

where $|S_k|$ denotes the tunable parameter $k$'s state space cardinality (the number of tunable values in $S_k$). $S$ is a set of $M$-tuples where each $M$-tuple represents a sensor node state $s$. Each state $s_i$ is an $M$-tuple, that is, $s_i = (v_1, v_2, \ldots, v_M) : v_k \in S_k$. Note that some tuples in $S$ may not be feasible (e.g., not all processor voltage and frequency pairs are feasible) and can be regarded as do-not-care tuples.

Each embedded sensor node state has an associated power consumption, throughput, and delay. The power, throughput, and delay for state $s_i$ are denoted by $p_i$, $t_i$, and $d_i$, respectively. Since different sensor nodes may have different embedded processors and attached sensors, each node may have node-specific power consumption, throughput, and delay information for each state.
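As a rough illustration of the composite state space, the sketch below enumerates the Cartesian product of three tunable parameters and filters out infeasible voltage and frequency pairs. All numeric values, the feasibility rule, and the names `build_state_space` and `MAX_FREQ_AT_VOLTAGE` are assumptions made for this example and are not taken from the thesis or from any particular sensor node platform.

```python
from itertools import product

# Hypothetical tunable-parameter values (illustrative only).
V_P = (2.7, 3.3, 4.0)   # processor voltage in volts
F_P = (4, 6, 8)         # processor frequency in MHz
F_S = (1, 2, 3)         # sensing frequency in samples per second

# Hypothetical feasibility rule: each voltage supports frequencies up to a cap.
MAX_FREQ_AT_VOLTAGE = {2.7: 4, 3.3: 6, 4.0: 8}


def build_state_space(vp_values, fp_values, fs_values, max_freq):
    """Return the feasible (Vp, Fp, Fs) tuples of S = S1 x S2 x S3."""
    return [
        (vp, fp, fs)
        for vp, fp, fs in product(vp_values, fp_values, fs_values)
        if fp <= max_freq[vp]   # drop the "do-not-care" tuples
    ]


full_product_size = len(V_P) * len(F_P) * len(F_S)
states = build_state_space(V_P, F_P, F_S, MAX_FREQ_AT_VOLTAGE)
print(full_product_size, "tuples in the full product;", len(states), "feasible states")
```

With these made-up values the full product has 27 tuples, of which 18 survive the feasibility filter; the number of surviving tuples plays the role of $I$ when infeasible tuples are excluded.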


5.3.2 Decision Epochs and Actions

Embedded sensor nodes make decisions at decision epochs, which occur after fixed time periods. The sequence of decision epochs is represented as

$$ T = \{1, 2, 3, \ldots, N\}, \quad N \le \infty \qquad (5) $$

where the random variable $N$ corresponds to the embedded sensor node's lifetime.

At each decision epoch, a sensor node's action determines the next state to transition to given the current state. The sensor node action in state $i \in S$ is defined as

$$ A_i = \{a_{i,j}\} \in \{0, 1\} \qquad (5) $$

where $a_{i,j}$ denotes the action taken at time $t$ that causes the sensor node to transition to state $j$ at time $t+1$ from the current state $i$. A policy determines whether an action is taken or not. If $a_{i,j} = 1$, the action is taken, and if $a_{i,j} = 0$, the action is not taken. For a given state $i \in S$, a selected action cannot result in a transition to a state that is not in $S$. The action space can be defined as

$$ A = \left\{ a = [a_{i,j}] : \{a_{i,j}\} \in \{0, 1\},\ i \in \{1, 2, 3, \ldots, I\},\ j \in \{1, 2, 3, \ldots, I\} \right\} \qquad (5) $$

5.3.3 State Dynamics

The state dynamics of the system can be delineated by the state transition probabilities of the embedded Markov chain. We formulate our sensor node policy as a deterministic dynamic program (DDP) because the choice of an action determines the subsequent state with certainty. Our sensor node DDP policy formulation uses a transfer function to specify the next state. A transfer function defines a mapping $\tau_t(s,a)$ from $S \times A_s \to S$, which specifies the system state at time $t+1$ when the sensor node selects action $a \in A_s$ in state $s$ at time $t$. To formulate our DDP as an MDP, we define


the transition probability function as

$$ p_t(j \mid s,a) = \begin{cases} 1 & \text{if } \tau_t(s,a) = j \\ 0 & \text{if } \tau_t(s,a) \neq j. \end{cases} \qquad (5) $$

5.3.4 Policy and Performance Criterion

For each given state $s \in S$, a sensor node selects an action $a \in A_s$ according to a policy $\pi \in \Pi$ where $\Pi$ is a set of admissible policies defined as

$$ \Pi = \{\pi : S \to A_s \mid d_t(s) \in A_s,\ \forall\, s \in S\} \qquad (5) $$

A performance criterion compares the performance of different policies. The sensor node selects an action prescribed by a policy based on the sensor node's current state. If the random variable $X_t$ denotes the state at decision epoch $t$ and the random variable $Y_t$ denotes the action selected at decision epoch $t$, then for the deterministic case, $Y_t = d_t(X_t)$.

As a result of selecting an action, the sensor node receives a reward $r(X_t, Y_t)$ at time $t$. The expected total reward denotes the expected total reward over the decision making horizon given a specific policy. Let $v^{\pi}_{N}(s)$ denote the expected total reward over the decision making horizon when the horizon length $N$ is a random variable, the system is in state $s$ at the first decision epoch, and policy $\pi$ is used [141][155]

$$ v^{\pi}_{N}(s) = E^{\pi}_{s}\left[ E_N \left\{ \sum_{t=1}^{N} r(X_t, Y_t) \right\} \right] \qquad (5) $$

where $E^{\pi}_{s}$ represents the expected reward with respect to policy $\pi$ and the initial state $s$ (the system state at the time of the expected reward calculation), and $E_N$ denotes the expected reward with respect to the probability distribution of the random variable $N$. We can write Equation 5 as [155]

$$ v^{\pi}_{\lambda}(s) = E^{\pi}_{s}\left\{ \sum_{t=1}^{\infty} \lambda^{t-1} r(X_t, Y_t) \right\} \qquad (5) $$


which gives the expected total discounted reward. We assume that the random variable $N$ is geometrically distributed with parameter $\lambda$ and hence the distribution mean is $1/(1 - \lambda)$ [141]. The parameter $\lambda$ can be interpreted as a discount factor, which measures the present value of one unit of reward received one period in the future. Thus, $v^{\pi}_{\lambda}(s)$ represents the expected total present value of the reward (income) stream obtained using policy $\pi$ [155]. Our objective is to find a policy that maximizes the expected total discounted reward, that is, a policy $\pi^{*}$ is optimal if

$$ v^{\pi^{*}}_{\lambda}(s) \ge v^{\pi}_{\lambda}(s) \quad \forall\, \pi \in \Pi \qquad (5) $$

5.3.5 Reward Function

The reward function captures application metrics and sensor node characteristics. Our reward function characterization considers the power consumption (which affects the sensor node lifetime), throughput, and delay application metrics. We define the reward function $f(s,a)$ given the current sensor node state $s$ and the sensor node's selected action $a$ as

$$ f(s,a) = \omega_p f_p(s,a) + \omega_t f_t(s,a) + \omega_d f_d(s,a) \qquad (5) $$

where $f_p(s,a)$ denotes the power reward function, $f_t(s,a)$ denotes the throughput reward function, and $f_d(s,a)$ denotes the delay reward function; $\omega_p$, $\omega_t$, and $\omega_d$ represent the weight factors for power, throughput, and delay, respectively. The weight factors' constraints are given as $\sum_m \omega_m = 1$ where $m \in \{p, t, d\}$ such that $0 \le \omega_p \le 1$, $0 \le \omega_t \le 1$, and $0 \le \omega_d \le 1$. The weight factors are selected based on the relative importance of application metrics with respect to each other. For example, a habitat monitoring application taking camera images of the habitat requires a minimum image resolution to provide meaningful analysis, which necessitates a minimum throughput, and therefore throughput can be assigned a higher weight factor than the power metric for this application.
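Returning briefly to the discount factor of Section 5.3.4, the equivalence between a geometrically distributed lifetime and discounting can be checked numerically. The sketch below is an illustration under stated assumptions: it takes $P(N = n) = (1 - \lambda)\lambda^{n-1}$ for $n = 1, 2, \ldots$ as the geometric distribution, uses an arbitrary bounded reward sequence, truncates both infinite sums at a large cap, and uses hypothetical function names.

```python
def expected_total_reward(rewards, discount, cap=2000):
    """E_N[sum_{t=1}^{N} r_t] with P(N = n) = (1 - discount) * discount**(n - 1).

    The infinite sum over n is truncated at `cap`, which is negligible
    for discount < 1 and bounded rewards.
    """
    total = 0.0
    partial_sum = 0.0
    for n in range(1, cap + 1):
        partial_sum += rewards(n)                     # sum of r_1 .. r_n
        total += (1 - discount) * discount ** (n - 1) * partial_sum
    return total


def discounted_reward(rewards, discount, cap=2000):
    """sum_{t=1}^{inf} discount**(t - 1) * r_t, truncated at `cap`."""
    return sum(discount ** (t - 1) * rewards(t) for t in range(1, cap + 1))


def example_rewards(t):
    """Arbitrary bounded reward sequence used only for this check."""
    return 1.0 + 0.5 * (t % 3)


lam = 0.9                      # illustrative discount factor
mean_lifetime = 1 / (1 - lam)  # about 10 decision epochs for lam = 0.9
lhs = expected_total_reward(example_rewards, lam)
rhs = discounted_reward(example_rewards, lam)
print(mean_lifetime, lhs, rhs)
```

The two quantities agree up to truncation and floating-point error because the probability that the node is still alive at epoch $t$ is $\lambda^{t-1}$, so each reward $r_t$ is weighted by exactly that factor.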


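We define linear reward functions for application metrics because an application metric reward (objective function) typically varies linearly, or piecewise linearly, between the minimum and the maximum allowed values of the metric [141][152]. However, a non-linear characterization of reward functions is also possible and depends upon the particular application. We point out that our methodology works for any characterization of reward function. The reward function characterization only defines the reward obtained from operating in a given state. Our MDP-based policy determines the reward by selecting an optimal/suboptimal operating state given the sensor node design space and application requirements for any reward function characterization. We consider linear reward functions as a typical example from the space of possible reward functions (e.g., piecewise linear, non-linear) to illustrate our MDP-based policy.

We define the power reward function (Fig. 5-2(a)) in Equation 5 as

$$ f_p(s,a) = \begin{cases} 1, & 0 < p_a \le L_P \\ (U_P - p_a)/(U_P - L_P), & L_P < p_a < U_P \\ 0, & p_a \ge U_P \end{cases} \qquad (5) $$

where $p_a$ denotes the power consumption of the state prescribed by action $a$, and the constants $L_P$ and $U_P$ denote the minimum and maximum allowed power consumption, respectively.

We define the throughput reward function (Fig. 5-2(b)) in Equation 5 as

$$ f_t(s,a) = \begin{cases} 1, & t_a \ge U_T \\ (t_a - L_T)/(U_T - L_T), & L_T < t_a < U_T \\ 0, & t_a \le L_T \end{cases} \qquad (5) $$

where $t_a$ denotes the throughput of the state prescribed by action $a$, and the constants $L_T$ and $U_T$ denote the minimum and maximum allowed throughput, respectively.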

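Figure 5-2. Reward functions: (a) Power reward function $f_p(s,a)$; (b) Throughput reward function $f_t(s,a)$; (c) Delay reward function $f_d(s,a)$.

We define the delay reward function (Fig. 5-2(c)) in Equation 5 as

$$ f_d(s,a) = \begin{cases} 1, & 0 < d_a \le L_D \\ (U_D - d_a)/(U_D - L_D), & L_D < d_a < U_D \\ 0, & d_a \ge U_D \end{cases} \qquad (5) $$

where $d_a$ denotes the delay of the state prescribed by action $a$, and the constants $L_D$ and $U_D$ denote the minimum and maximum allowed delay, respectively.

We define the transition cost function $h(s,a)$ for switching states as

$$ h(s,a) = \begin{cases} H_{i,a} & \text{if } i \neq a \\ 0 & \text{if } i = a. \end{cases} \qquad (5) $$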


where $H_{i,a}$ denotes the transition cost to switch from the current state $i$ to the next state as determined by action $a$. Note that a sensor node incurs no transition cost if action $a$ prescribes that the next state is the same as the current state.

Hence, the overall reward function $r(s,a)$ given state $s$ and action $a$ at time $t$ is

$$ r(s,a) = f(s,a) - h(s,a) \qquad (5) $$

which accounts for the power, throughput, and delay application metrics as well as the state transition cost.

We point out that many other application metrics (e.g., security, reliability, and lifetime) are of immense significance to EWSNs. For example, EWSNs are vulnerable to security attacks such as distributed denial of service and Sybil attacks, for which a security reward function can be included. A reliability reward function can encompass the reliability aspect of EWSNs since sensor nodes are often deployed in unattended and hostile environments and are susceptible to failures. Similarly, considering sensor nodes' constrained battery resources, power optimization techniques exist that put sensor nodes in sleep or low-power mode (where communication and/or processing functions are disabled) for power conservation when less activity is observed in the sensed region as determined by previously sensed data. These power optimizations can be captured by a lifetime reward function. Incorporating these additional metrics in our model is the focus of our future work.

5.3.6 Optimality Equation

The optimality equation, also known as Bellman's equation, for the expected total discounted reward criterion is given as [155]

$$ v^{*}(s) = \max_{a \in A_s} \left\{ r(s,a) + \sum_{j \in S} \lambda\, p(j \mid s,a)\, v^{*}(j) \right\} \qquad (5) $$

where $v^{*}(s)$ denotes the maximum expected total discounted reward. The salient properties of the optimality equation are: the optimality equation has a unique solution;

an optimal policy exists given conditions on states, actions, rewards, and transition probabilities; the value of the discounted MDP satisfies the optimality equation; and the optimality equation characterizes stationary policies.

The solution of Equation 5 gives the maximum expected total discounted reward $v^{\lambda}(s)$ and the MDP-based policy $\pi^{MDP}$, which gives the maximum $v^{\lambda}(s)$. $\pi^{MDP}$ prescribes the action $a$ from action set $A_s$ given the current state $s$ for all $s \in S$. There are several methods to solve the optimality Equation 5, such as value iteration, policy iteration, and linear programming; in this work we use the policy iteration algorithm.

5.3.7 Policy Iteration Algorithm

The policy iteration algorithm can be described in four steps:

1. Set $l = 0$ and choose any arbitrary decision rule $d_0 \in D$, where $D$ is the set of all possible decision rules.

2. Policy evaluation: obtain $v^{l}(s)\ \forall\, s \in S$ by solving the equations

\[
v^{l}(s) = r(s,a) + \sum_{j \in S} \lambda\, p(j|s,a)\, v^{l}(j) \tag{5}
\]

   with $a = d_l(s)$.

3. Policy improvement: select $d_{l+1}(s)\ \forall\, s \in S$ to satisfy the equations

\[
d_{l+1}(s) \in \arg\max_{a \in A_s} \Big\{ r(s,a) + \sum_{j \in S} \lambda\, p(j|s,a)\, v^{l}(j) \Big\} \tag{5}
\]

   setting $d_{l+1} = d_l$ if possible.

4. If $d_{l+1} = d_l$, stop and set $d^{*} = d_l$, where $d^{*}$ denotes the optimal decision rule. If $d_{l+1} \ne d_l$, set $l = l + 1$ and go to step 2.

Step 2 is referred to as policy evaluation because, by solving Equation 5, we obtain the expected total discounted reward for decision rule $d_l$. Step 3 is referred to as policy improvement because this step selects a $v^{l}$-improving decision rule. Keeping $d_{l+1} = d_l$ whenever possible (step 3) ensures that the stopping test in step 4 quells cycling, because a maximizing decision rule is not necessarily unique.
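
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names (`solve_linear`, `policy_iteration`), the nested-list data layout, and the two-state toy MDP are all hypothetical, and the toy rewards are chosen only so that the result is easy to check by hand.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def policy_iteration(reward, prob, lam):
    """reward[s][a] = r(s,a), prob[s][a][j] = p(j|s,a), lam = discount factor."""
    n = len(reward)
    d = [0] * n  # Step 1: arbitrary initial decision rule
    while True:
        # Step 2 (policy evaluation): solve (I - lam * P_d) v = r_d
        A = [[(1.0 if i == j else 0.0) - lam * prob[i][d[i]][j]
              for j in range(n)] for i in range(n)]
        b = [reward[i][d[i]] for i in range(n)]
        v = solve_linear(A, b)
        # Step 3 (policy improvement): keep d[s] whenever it is already maximal
        new_d = []
        for s in range(n):
            q = [reward[s][a] + lam * sum(prob[s][a][j] * v[j] for j in range(n))
                 for a in range(len(reward[s]))]
            best = max(q)
            new_d.append(d[s] if q[d[s]] >= best - 1e-12 else q.index(best))
        # Step 4: stop when the decision rule no longer changes
        if new_d == d:
            return d, v
        d = new_d


# Hypothetical two-state example: action a moves the node to state a with
# probability 1, state rewards are 0.45 and 0.6, and switching costs 0.1.
toy_reward = [[0.45, 0.5], [0.35, 0.6]]
toy_prob = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
policy, values = policy_iteration(toy_reward, toy_prob, lam=0.9)
```

In this toy case the rule converges to "move to, then stay in, the second state", because its higher per-epoch reward outweighs the one-time switching cost.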

5.4 Implementation Guidelines and Complexity

In this section, we describe the implementation guidelines and computational complexity for our proposed MDP-based policy. The implementation guidelines describe the mapping of MDP specifics (e.g., state space, reward function) in our problem formulation (Section 5.3) to actual sensor node hardware. The computational complexity focuses on the convergence of the policy iteration algorithm and the data memory analysis for our MDP-based dynamic tuning methodology. The prototype implementation of our MDP-based tuning methodology on hardware sensor platforms is the focus of our future work.

5.4.1 Implementation Guidelines

In order to implement our MDP-based policy, particular values must be initially defined. The reward function Equation 5 uses the power, throughput, and delay values offered in an embedded sensor node state $s_i$ (Section 5.3.1). An application manager specifies the minimum and maximum power, throughput, and delay values required by Equation 5, Equation 5, and Equation 5, respectively, and the power, throughput, and delay weight factors required in Equation 5 according to application specifications.

A sensor node's embedded processor defines the transition cost $H_{i,a}$ as required in Equation 5, which is dependent on a processor's particular power management and switching techniques. Processors have a set of available voltage and frequency pairs, which define the $V_p$ and $F_p$ values, respectively, in a sensor node state tuple (Section 5.3.1). Embedded sensors can operate at different defined sensing rates, which define the $F_s$ value in a sensor node state tuple (Section 5.3.1). The embedded processor and sensor characteristics determine the value of $I$, which characterizes the state space in Equation 5 (i.e., the number of allowable processor voltage, processor frequency, and sensing frequency values determine the total number of sensor node operating states, and thus the value of $I$).
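
As a rough illustration of how the allowable parameter values determine $I$, the sketch below enumerates every $[V_p, F_p, F_s]$ combination. The value lists are hypothetical and chosen only for illustration; on real hardware a designer would typically keep only the voltage/frequency pairs that the processor actually supports, so the usable state space may be a subset of this product.

```python
from itertools import product

# Hypothetical tunable-parameter values (illustrative only, not from a datasheet).
processor_voltages = [2.7, 3.0]      # V_p in volts
processor_frequencies = [2, 4]       # F_p in MHz
sensing_frequencies = [2, 4]         # F_s in KHz

# Each combination [V_p, F_p, F_s] is one candidate operating state s_i.
states = list(product(processor_voltages, processor_frequencies, sensing_frequencies))

# I is the cardinality of the resulting state space.
I = len(states)
```

With two values per parameter the product contains eight candidate states, which shows how quickly $I$ (and the associated storage) grows as parameter values are added.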

The sensor nodes perform parameter tuning decisions at decision epochs (Section 5.2.2). The decision epochs can be guided by the dynamic profiler module to adapt to the environmental stimuli. For instance, for a target tracking application and a fast moving target, the decision epoch period should be small to better capture the fast moving target. On the other hand, for stationary or slow moving targets, the decision epoch period should be large to conserve battery energy. However, since the exact decision epoch period is application specific, the period should be adjusted to control the sensor node lifetime.

Both the MDP controller module (which implements the policy iteration algorithm to calculate the MDP-based policy $\pi^{MDP}$) and the dynamic profiler module (Section 5.2.1) can be implemented either as software running on a sensor node's embedded processor or as custom hardware for faster execution.

One drawback of an MDP-based policy is that the computational and storage overhead increases as the number of states increases. Therefore, an EWSN designer may want to restrict the number of embedded sensor node states (e.g., 2, 4, or 16) to reduce the computational and storage overhead. If state restriction is not a viable option, the EWSN configuration could be augmented with a back-end base station node to run our MDP-based optimization, and the sensor node operating states would be communicated to the sensor nodes. This communication of operating state information to sensor nodes by the base station node would not consume significant power resources, given that this state information is transmitted periodically and/or aperiodically after some minimum duration determined by the agility of the environmental stimuli (e.g., more frequent communication would be required for rapidly changing environmental stimuli than for slowly changing environmental stimuli). This EWSN configuration could also consider global optimizations, which are optimizations that take into account sensor node interactions and dependencies and are a focus of our future work. We point out that
global optimization storage and processing overhead increases rapidly as the number of sensor nodes in the EWSN increases.

5.4.2 Computational Complexity

Since sensor nodes have limited energy reserves and processing resources, it is critical to analyze our proposed MDP-based policy's computational complexity, which is related to the convergence of the policy iteration algorithm. Our problem formulation (Section 5.3) consists of finite states and actions, and [155] proves a theorem that establishes the convergence of the policy iteration algorithm for finite states and actions in a finite number of iterations. Another important computational complexity factor is the algorithm's convergence rate. [155] shows that for a finite number of states and actions, the policy iteration algorithm converges to the optimal/suboptimal value function at least quadratically fast. Empirical observations suggest that the policy iteration algorithm can converge in $O(\ln|S|)$ iterations, where each iteration takes $O(|S|^3)$ time (for policy evaluation); however, no proof yet exists to verify these empirical observations [156]. Based on these empirical observations for convergence, the policy iteration algorithm can converge in approximately 4 iterations for $|S| = 64$ (since $\ln 64 \approx 4.16$).

5.4.3 Data Memory Analysis

We performed data memory analysis using the 8-bit Atmel ATmega128L microprocessor [49] in XSM sensor nodes [157]. The Atmel ATmega128L microprocessor contains 128 KB of on-chip in-system reprogrammable flash memory for program storage, 4 KB of internal SRAM data memory, and up to 64 KB of external SRAM data memory. Integer and floating point data types require 2 and 4 bytes of storage, respectively. Our data memory analysis considers all storage requirements for our MDP-based dynamic tuning formulation (Section 5.3), including state space, action space, state dynamics (transition probability matrix), Bellman's Equation 5, MDP reward function calculation (reward matrix), and policy iteration algorithm. We estimate data memory size for three sensor node configurations:

• 4 sensor node states with 4 allowable actions in each state (16 actions in the action space) (Fig. 5-4)
• 8 sensor node states with 8 allowable actions in each state (64 actions in the action space)
• 16 sensor node states with 16 allowable actions in each state (256 actions in the action space)

Data memory analysis revealed that the 4, 8, and 16 sensor node state configurations required approximately 1.55 KB, 14.8 KB, and 178.55 KB, respectively. Thus, currently available sensor node platforms contain enough memory resources to implement our MDP-based dynamic tuning methodology with 16 sensor node states or fewer. However, the memory requirements increase rapidly as the number of states increases due to the transition probability matrix and reward matrix specifications. Therefore, depending upon available memory resources, an application developer could restrict the number of states accordingly or otherwise would have to resort to the back-end base station based computational policy outlined in Section 5.4.1 to conserve power and storage.

5.5 Model Extensions

Our proposed MDP-based dynamic tuning methodology for EWSNs is highly adaptive to different EWSN characteristics and particulars, including additional sensor node tunable parameters (e.g., radio transmission power) and application metrics (e.g., reliability). Furthermore, our problem formulation can be extended to form MDP-based stochastic dynamic programs. Our current MDP-based dynamic optimization methodology provides a basis for MDP-based stochastic dynamic optimization that would react to changing environmental stimuli and wireless channel conditions to autonomously switch to an appropriate operating state. This stochastic dynamic optimization would provide a major incentive to use an MDP-based policy (because of the capability of MDPs to formulate stochastic dynamic programs) as opposed to using lightweight heuristic policies (e.g., greedy or simulated annealing based) for
parameter tuning that can determine an appropriate operating state out of a large state space without requiring large computational and memory resources [41].

To exemplify additional tuning parameters, we consider a sensor node's transceiver (radio) transmission power. The extended state space can be written as

\[
S = V_p \times F_p \times F_s \times P_{tx} \tag{5}
\]

where $P_{tx}$ denotes the state space for a sensor node's radio transmission power. We define the sensor node's radio transmission power state space $P_{tx}$ as

\[
P_{tx} = \{P_{tx_1}, P_{tx_2}, P_{tx_3}, \ldots, P_{tx_m}\} : |P_{tx}| = m \tag{5}
\]

where $P_{tx_i} \in P_{tx}\ \forall\, i \in \{1, 2, 3, \ldots, m\}$ denotes a radio transmission power, $m$ denotes the number of radio transmission power values, and $|P_{tx}| = m$ denotes the radio transmission power state space cardinality.

To exemplify the inclusion of additional application metrics, we consider reliability, which measures the reliability of sensed data, such as the total number of sensed data packets received at the sink node without error in an arbitrary time window. The reliability can be interpreted as the packet reception rate, which is the complement of the packet error rate (PER) [158]. The factors that affect reliability include wireless channel condition, network topology, traffic patterns, and the physical phenomenon that triggered the sensor node communication activity [158]. In general, the wireless channel condition has the greatest effect on the reliability metric because sensed data packets may experience different error rates depending upon the channel condition. A sensor node may maintain application-specified reliability in different wireless channel conditions by tuning/changing the error correcting codes, modulation schemes, and/or transmission power. The dynamic profiler module in our proposed tuning methodology helps estimate the reliability metric at runtime by profiling the number of packet
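transmissions from each sensor node and the number of packet receptions at the sink node.

Figure 5-3. Reliability reward functions: (a) linear variation; (b) quadratic variation.

The sensor node's reliability can be added to the reward function, and the extended reward function can be written as

\[
f(s,a) = \omega_p f_p(s,a) + \omega_t f_t(s,a) + \omega_d f_d(s,a) + \omega_r f_r(s,a) \tag{5}
\]

where $f_r(s,a)$ denotes the reliability reward function, $\omega_r$ represents the weight factor for reliability, and the remainder of the terms in Equation 5 have the same meaning as in Equation 5. The weight factors' constraints are given as $\sum_m \omega_m = 1$ where $m = \{p, t, d, r\}$ such that $0 \le \omega_p \le 1$, $0 \le \omega_t \le 1$, $0 \le \omega_d \le 1$, and $0 \le \omega_r \le 1$.

The reliability reward function (Fig. 5-3(a)) in Equation 5 can be defined as

\[
f_r(s,a) =
\begin{cases}
1, & r_a \ge U_R \\
(r_a - L_R)/(U_R - L_R), & L_R < r_a < U_R \\
0, & r_a \le L_R
\end{cases}
\]

where $r_a$ denotes the reliability offered by the state prescribed by action $a$, and the constant parameters $L_R$ and $U_R$ denote the minimum and maximum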
allowed/tolerated reliability, respectively. The reliability may be represented as a multiple of a base reliability unit equal to 0.1, which represents a 10% packet reception rate [158].

The reward function capturing an application's metrics can be defined according to particular application requirements, and may vary quadratically (Fig. 5-3(b)) instead of linearly (as defined above) over the minimum and maximum allowed parameter values. In that case, the reliability reward function can be expressed as

\[
f_r(s,a) =
\begin{cases}
1, & r_a \ge U_R \\
(r_a - L_R)^2/(U_R - L_R)^2, & L_R < r_a < U_R \\
0, & r_a \le L_R
\end{cases}
\]

[...] state dynamics could be given by $p_t(j|t,s,a)$, which denotes the probability that the system occupies state $j$ in $t$ time units given $s$ and $a$. If the wireless channel condition does not change state with time, then $p_t(j|t,s,a) = 1\ \forall\, t$, thus forming a DDP. The determination of $p_t(j|t,s,a)$ requires probabilistic modeling of the wireless channel condition over time and is the focus of our future work.

5.6 Numerical Results

In this section, we compare the performance (based on the expected total discounted reward criterion (Section 5.3.6)) of our proposed MDP-based DVFS2 policy ($\pi^{MDP}$) with several fixed heuristic policies using a representative EWSN platform. We use the MATLAB MDP toolbox [160] implementation of the policy iteration algorithm described in Section 5.3.7 to solve Bellman's Equation 5 to determine the MDP-based policy. We select sensor node state parameters based on eXtreme Scale Motes (XSM) [157][161]. The XSM motes have an average lifetime of 1,000 hours of continuous operation with two AA alkaline batteries, which can deliver 6 Whr, or an average of 6 mW [157]. The XSM platform integrates an Atmel ATmega128L microcontroller [49], a Chipcon CC1000 radio operating at 433 MHz, and a 4 Mbit serial flash memory. The XSM motes contain infrared, magnetic, acoustic, photo, and temperature sensors.

To represent sensor node operation, we analyze sample application domains that represent a typical security system or defense application (henceforth referred to as a security/defense system) [137], a health care application, and an ambient conditions monitoring application. For brevity, we select a single sample EWSN platform configuration and several application domains, but we point out that our proposed MDP model and methodology work equally well for any other EWSN platform and application.

For each application domain, we evaluate the effects of different discount factors, different state transition costs, and different application metric weight factors on the expected total discounted reward for our MDP-based policy and several fixed heuristic policies (Section 5.6.1). The magnitude of difference in the total expected discounted
reward for different policies is important as it provides relative comparisons between the different policies.

5.6.1 Fixed Heuristic Policies for Performance Comparisons

Due to the infancy of EWSN dynamic optimizations, there exist no dynamic sensor node tuning methods for comparison with our MDP-based policy. Therefore, we compare to several fixed heuristic policies (heuristic policies have been shown to be a viable comparison method [141]). To provide a consistent comparison, the fixed heuristic policies use the same reward function and associated parameter settings as our MDP-based policy. We consider the following four fixed heuristic policies:

• A fixed heuristic policy $\pi^{POW}$, which always selects the state with the lowest power consumption.
• A fixed heuristic policy $\pi^{THP}$, which always selects the state with the highest throughput.
• A fixed heuristic policy $\pi^{EQU}$, which spends an equal amount of time in each of the available states.
• A fixed heuristic policy $\pi^{PRF}$, which spends an unequal amount of time in each of the available states based on a specified preference for each state. For example, given a system with four possible states, the $\pi^{PRF}$ policy may spend 40% of the time in the first state, 20% of the time in the second state, 10% of the time in the third state, and 30% of the time in the fourth state.

5.6.2 MDP Specifications

We compare different policies using the expected total discounted reward performance criterion (Section 5.3.6). The state transition probability for each sensor node state is given by Equation 5. The sensor node's lifetime and the time between decision epochs are subjective and may be assigned by an application manager according to application requirements. A sensor node's mean lifetime is given by $1/(1 - \lambda)$ time units, where one time unit is the time between successive decision epochs (which we assume to be 1 hour). For instance, for $\lambda = 0.999$, the sensor node's mean lifetime is $1/(1 - 0.999) = 1000$ hours $\approx$ 42 days.
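
The relationship between the discount factor and the mean lifetime can be checked with a short calculation. The helper name `mean_lifetime_hours` and its `epoch_hours` parameter are hypothetical; the 1-hour decision epoch is the assumption stated above.

```python
def mean_lifetime_hours(discount_factor, epoch_hours=1.0):
    """Mean lifetime 1 / (1 - lambda), expressed in hours.

    epoch_hours is the assumed time between successive decision epochs.
    """
    if not 0.0 <= discount_factor < 1.0:
        raise ValueError("discount factor must lie in [0, 1)")
    return epoch_hours / (1.0 - discount_factor)


lifetime = mean_lifetime_hours(0.999)  # about 1000 hours
days = lifetime / 24.0                 # about 41.7 days, i.e. roughly 42 days
```

The same helper can be used to translate any candidate discount factor into an expected deployment duration before running the optimization.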

Figure 5-4. Symbolic representation of our MDP model with four sensor node states.

For our numerical results, we consider a sensor node capable of operating in four different states (i.e., $I = 4$ in Equation 5). Fig. 5-4 shows the symbolic representation of our MDP model with four sensor node states. Each state has a set of allowed actions prescribing transitions to available states. For each allowed action $a$ in a state, there is a $\{r_a, p_a\}$ pair where $r_a$ specifies the immediate reward obtained by taking action $a$ and $p_a$ denotes the probability of taking action $a$.

Table 5-2. Parameters for wireless sensor node state $s_i = [V_p, F_p, F_s]$ ($V_p$ is specified in volts, $F_p$ in MHz, and $F_s$ in KHz). Parameters are specified as a multiple of a base unit where one power unit is equal to 1 mW, one throughput unit is equal to 0.5 MIPS, and one delay unit is equal to 50 ms. Parameter values are based on the XSM mote. ($p_i$, $t_i$, and $d_i$ denote the power consumption, throughput, and delay, respectively, in state $s_i$)

Notation   s1 = [2.7, 2, 2]   s2 = [3, 4, 4]   s3 = [4, 6, 6]   s4 = [5.5, 8, 8]
p_i        10 units           15 units         30 units         55 units
t_i        4 units            8 units          12 units         16 units
d_i        26 units           14 units         8 units          6 units

Table 5-2 summarizes state parameter values for each of the four states $s_1$, $s_2$, $s_3$, and $s_4$. We define each state using a $[V_p, F_p, F_s]$ tuple where $V_p$ is specified in volts, $F_p$ in MHz, and $F_s$ in KHz. For instance, state one $s_1$ is defined as [2.7, 2, 2], which
corresponds to a processor voltage of 2.7 volts, a processor frequency of 2 MHz, and a sensing frequency of 2 KHz (2000 samples per second). We represent the power consumption, throughput, and delay of state $s_i\ \forall\, i \in \{1, 2, 3, \ldots, I\}$ as multiples of power, throughput, and delay base units, respectively. We assume one base power unit is equal to 1 mW, one base throughput unit is equal to 0.5 MIPS (millions of instructions per second), and one base delay unit is equal to 50 ms. We assign base units such that these units provide a convenient representation of application metrics (power, throughput, delay). We point out, however, that any other feasible base unit values can be assigned [141]. We assume, without loss of generality, that the transition cost for switching from one state to another is $H_{i,a} = 0.1$ if $i \ne a$. The transition cost could be a function of power, throughput, and delay, but we assume a constant transition cost for simplicity as it is typically constant for different state transitions [49].

Our selection of the state parameter values in Table 5-2 corresponds to XSM mote specifications [157][49]. The XSM mote's Atmel ATmega128L microprocessor has an operating voltage range of 2.7 to 5.5 V and a processor frequency range of 0 to 8 MHz. The ATmega128L throughput varies with processor frequency at 1 MIPS per MHz, thus allowing an application manager to optimize power consumption versus processing speed [49]. Our chosen sensing frequency also corresponds with standard sensor node specifications. The Honeywell HMC1002 magnetometer sensor [162] consumes on average 15 mW of power and can be sampled in 0.1 ms on the Atmel ATmega128L microprocessor, which results in a maximum sampling frequency of approximately 10 KHz (10,000 samples per second). The acoustic sensor embedded in the XSM mote has a maximum sensing frequency of approximately 8.192 KHz [157]. Although the power consumption in a state depends not only upon the processor voltage and frequency but also upon the processor utilization, which in turn depends upon the sensing frequency, we report the average power consumption values in a state as derived from the datasheets [49][162].
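
The throughput entries in Table 5-2 follow from the figures quoted above (1 MIPS per MHz and a base throughput unit of 0.5 MIPS). The short sketch below reproduces that arithmetic; the constant and function names are hypothetical and introduced only for illustration.

```python
MIPS_PER_MHZ = 1.0           # ATmega128L throughput figure quoted in the text
THROUGHPUT_UNIT_MIPS = 0.5   # one base throughput unit, as assumed in the text


def throughput_units(processor_mhz):
    """Convert a processor frequency in MHz into base throughput units."""
    return processor_mhz * MIPS_PER_MHZ / THROUGHPUT_UNIT_MIPS


# Processor frequencies of states s1..s4 in Table 5-2.
units = [throughput_units(f) for f in (2, 4, 6, 8)]

# A sample every 0.1 ms corresponds to a maximum sampling frequency in KHz.
max_sampling_khz = 1.0 / 0.1
```

The resulting values (4, 8, 12, and 16 units) match the $t_i$ row of Table 5-2, and the 0.1 ms sampling time gives the approximately 10 KHz sampling limit mentioned for the magnetometer.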

Table 5-3. Minimum $L$ and maximum $U$ reward function parameter values and application metric weight factors for a security/defense system, health care, and ambient conditions monitoring application ($L_P$ and $U_P$ denote minimum and maximum acceptable power consumption, respectively; $L_T$ and $U_T$ denote minimum and maximum acceptable throughput, respectively; $L_D$ and $U_D$ denote minimum and maximum acceptable delay, respectively; $\omega_p$, $\omega_t$, and $\omega_d$ denote the weight factors for power, throughput, and delay, respectively)

Notation   Security/Defense   Health Care   Ambient Monitoring
L_P        12 units           8 units       5 units
U_P        35 units           20 units      32 units
L_T        6 units            3 units       2 units
U_T        12 units           9 units       8 units
L_D        7 units            8 units       12 units
U_D        16 units           20 units      40 units
ω_p        0.45               0.5           0.65
ω_t        0.2                0.3           0.15
ω_d        0.35               0.2           0.2

Table 5-3 summarizes the minimum $L$ and maximum $U$ reward function parameter values for application metrics (power, throughput, and delay) and associated weight factors for a security/defense system, health care, and ambient conditions monitoring application. Our selected reward function parameter values represent typical application requirements [163]. We describe below the relative importance of these application metrics with respect to our considered applications.

Power is a primary concern for all EWSN applications, and tolerable power consumption values are specified based on the desired EWSN lifetime considering the limited battery resources of sensor nodes. However, the relative importance of power for different applications can be delineated mainly by the infeasibility of sensor node battery replacement due to hostile environments. For example, deploying sensor nodes in a war zone for a security/defense application or at an active volcano for a monitoring application makes battery replacement almost impractical. For a health care application with sensors attached to a patient to monitor physiological data (e.g., heart rate, glucose
level), a sensor node's battery may be replaced, though power is constrained because excessive heat dissipation could adversely affect a patient's health. Similarly, delay can be an important factor for security/defense in the case of enemy target tracking and for health care for a patient in intensive health conditions, whereas delay may be relatively less important for a humidity monitoring application. A data-sensitive security/defense system may require a comparatively large minimum throughput in order to obtain a sufficient number of sensed data samples for meaningful analysis. Although the relative importance and the minimum and maximum values of these application metrics can vary widely within an application domain and between application domains, we pick our parameter values (Table 5-3) for demonstration purposes to provide insight into our optimization methodology.

We point out that all of our considered application metrics, specifically throughput, depend upon the traffic pattern. EWSN throughput is a complex function of the number of nodes, traffic volume and patterns, and the parameters of the medium access technique. As the number of nodes and traffic volume increase, contention-based medium access methods result in an increased number of packet collisions, which waste energy without transmitting useful data. This contention and packet collision result in saturation, which decreases the effective throughput and increases the delay sharply. We briefly outline the variance in EWSN traffic patterns for our considered applications. The security/defense application would have infrequent bursts of heavy traffic (e.g., when an enemy target appears within the sensor nodes' sensing range), health care applications would have a steady flow of medium to high traffic, and ambient conditions monitoring applications would have a steady flow of low to medium traffic except for emergencies (e.g., a volcano eruption). Although modeling application metrics with respect to traffic patterns would result in a better characterization of these metric values at particular instants in the WSN, these metric values can still be bounded
by a lower minimum and upper maximum value as captured by our reward functions (Section 5.3.5).

Given the reward function, the sensor node state parameters corresponding to the XSM mote, and the transition probabilities, our MATLAB MDP toolbox [160] implementation of the policy iteration algorithm solves Bellman's Equation 5 to determine the MDP-based policy and the expected total discounted reward (Equation 5).

5.6.3 Results for a Security/Defense System Application

Table 5-4. The effects of different discount factors $\lambda$ for a security/defense system. $H_{i,j} = 0.1$ if $i \ne j$, $\omega_p = 0.45$, $\omega_t = 0.2$, $\omega_d = 0.35$.

λ         Sensor Lifetime   MDP          POW         THP         EQU         PRF
0.94      16.67 hours       10.0006      7.5111      9.0778      7.2692      7.5586
0.95      20 hours          12.0302      9.0111      10.9111     8.723       9.0687
0.96      25 hours          15.0747      11.2611     13.6611     10.9038     11.3339
0.97      33.33 hours       20.1489      15.0111     18.2445     14.5383     15.1091
0.98      50 hours          30.2972      22.5111     27.4111     21.8075     22.6596
0.99      100 hours         60.7422      45.0111     54.9111     43.6150     45.3111
0.999     1000 hours        608.7522     450.0111    549.9111    436.15      453.0381
0.9999    10,000 hours      6.0889×10^3  4.5×10^3    5.5×10^3    4.4×10^3    4.5×10^3
0.99999   100,000 hours     6.1×10^4     4.5×10^4    5.5×10^4    4.4×10^4    4.5×10^4

5.6.3.1 The effects of different discount factors on the expected total discounted reward

Table 5-4 and Fig. 5-5 depict the effects of different discount factors $\lambda$ on the heuristic policies and $\pi^{MDP}$ for a security/defense system when the state transition cost $H_{i,j}$ is held constant at 0.1 for $i \ne j$, and $\omega_p$, $\omega_t$, and $\omega_d$ are equal to 0.45, 0.2, and 0.35, respectively. Since we assume the time between successive decision epochs to be 1 hour, the range of $\lambda$ from 0.94 to 0.99999 corresponds to a range of average sensor node lifetime from 16.67 to 100,000 hours ($\approx$ 4167 days $\approx$ 11.4 years). Table 5-4 and Fig. 5-5 show that $\pi^{MDP}$ results in the highest expected total discounted reward for all values of $\lambda$ and corresponding average sensor node lifetimes.
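
To make the role of the reward function parameters concrete, the sketch below evaluates the weighted reward of each state in Table 5-2 under the security/defense bounds and weights of Table 5-3. It assumes the piecewise-linear reward shapes described in Section 5.3.5 (power and delay rewards decrease between the bounds, the throughput reward increases), ignores transition costs, and uses hypothetical helper names; it is an illustration, not the MATLAB implementation used for the reported results.

```python
def decreasing_reward(x, low, high):
    """Reward for metrics where smaller is better (power, delay)."""
    if x <= low:
        return 1.0
    if x >= high:
        return 0.0
    return (high - x) / (high - low)


def increasing_reward(x, low, high):
    """Reward for metrics where larger is better (throughput)."""
    if x >= high:
        return 1.0
    if x <= low:
        return 0.0
    return (x - low) / (high - low)


# (power, throughput, delay) in base units, from Table 5-2.
STATES = {"s1": (10, 4, 26), "s2": (15, 8, 14), "s3": (30, 12, 8), "s4": (55, 16, 6)}

# Security/defense bounds and weights, from Table 5-3.
LP, UP, LT, UT, LD, UD = 12, 35, 6, 12, 7, 16
WP, WT, WD = 0.45, 0.2, 0.35


def weighted_reward(p, t, d):
    """f = wp*fp + wt*ft + wd*fd for a single state (no transition cost)."""
    return (WP * decreasing_reward(p, LP, UP)
            + WT * increasing_reward(t, LT, UT)
            + WD * decreasing_reward(d, LD, UD))


rewards = {name: weighted_reward(*metrics) for name, metrics in STATES.items()}
best_state = max(rewards, key=rewards.get)
```

Under these assumptions the lowest-power state $s_1$ earns 0.45 per epoch, the highest-throughput state $s_4$ earns 0.55, and $s_3$ earns the most (about 0.61). Remaining in $s_3$ for a mean lifetime of 1,000 epochs would give roughly 609, which lands near, but does not exactly reproduce, the MDP entry for $\lambda = 0.999$ in Table 5-4, since this sketch omits transition costs and initial-state effects.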

Figure 5-5. The effects of different discount factors on the expected total discounted reward for a security/defense system. $H_{i,j} = 0.1$ if $i \ne j$, $\omega_p = 0.45$, $\omega_t = 0.2$, $\omega_d = 0.35$.

Figure 5-6. Percentage improvement in expected total discounted reward for $\pi^{MDP}$ for a security/defense system as compared to the fixed heuristic policies. $H_{i,j} = 0.1$ if $i \ne j$, $\omega_p = 0.45$, $\omega_t = 0.2$, $\omega_d = 0.35$.

Fig. 5-6 shows the percentage improvement in expected total discounted reward for $\pi^{MDP}$ for a security/defense system as compared to the fixed heuristic policies. The percentage improvement is calculated as $[(R^{MDP} - R^{X})/R^{MDP}] \times 100$, where $R^{MDP}$ denotes the expected total discounted reward for $\pi^{MDP}$ and $R^{X}$ denotes the expected total discounted reward for the $X$ fixed heuristic policy, where $X = \{POW, THP, EQU, PRF\}$. For instance, when the average sensor node lifetime is 1,000 hours ($\lambda = 0.999$), $\pi^{MDP}$ results in a 26.08%, 9.67%, 28.35%, and 25.58% increase in expected total discounted
reward compared to $\pi^{POW}$, $\pi^{THP}$, $\pi^{EQU}$, and $\pi^{PRF}$, respectively. Fig. 5-6 also depicts that $\pi^{MDP}$ shows increased savings as the average sensor node lifetime increases due to an increase in the number of decision epochs and thus prolonged operation of sensor nodes in optimal/suboptimal states as prescribed by $\pi^{MDP}$. On average over all discount factors $\lambda$, $\pi^{MDP}$ results in a 25.57%, 9.48%, 27.91%, and 25.1% increase in expected total discounted reward compared to $\pi^{POW}$, $\pi^{THP}$, $\pi^{EQU}$, and $\pi^{PRF}$, respectively.

5.6.3.2 The effects of different state transition costs on the expected total discounted reward

Fig. 5-7 depicts the effects of different state transition costs on the expected total discounted reward for a security/defense system with a fixed average sensor node lifetime of 1000 hours ($\lambda = 0.999$) and $\omega_p$, $\omega_t$, and $\omega_d$ equal to 0.45, 0.2, and 0.35, respectively. Fig. 5-7 shows that $\pi^{MDP}$ results in the highest expected total discounted reward for all transition cost values.

Figure 5-7. The effects of different state transition costs on the expected total discounted reward for a security/defense system. $\lambda = 0.999$, $\omega_p = 0.45$, $\omega_t = 0.2$, $\omega_d = 0.35$.

Fig. 5-7 also shows that the expected total discounted reward for $\pi^{MDP}$ is relatively unaffected by state transition cost. This relatively constant behavior can be explained by the fact that our MDP policy does not perform many state transitions. Relatively few state transitions to reach the optimal/suboptimal state according to the specified
application metrics may be advantageous for some application managers who consider the number of state transitions prescribed by a policy as a secondary evaluation criterion [141]. $\pi^{MDP}$ performs state transitions primarily at sensor node deployment or whenever a new MDP-based policy is determined as the result of changes in application requirements.

We further analyze the effects of different state transition costs on the fixed heuristic policies, which consistently result in a lower expected total discounted reward as compared to $\pi^{MDP}$. The expected total discounted rewards for $\pi^{POW}$ and $\pi^{THP}$ are relatively unaffected by state transition cost. The explanation for this behavior is that these heuristics perform state transitions only at initial sensor node deployment, when the sensor node transitions to the lowest power state and the highest throughput state, respectively, and remains in that state for the sensor node's entire lifetime. On the other hand, state transition cost has the largest effect on the expected total discounted reward for $\pi^{EQU}$ due to high state transition rates, because the policy spends an equal amount of time in all states. Similarly, high switching costs have a large effect on the expected total discounted reward for $\pi^{PRF}$ (although less severe than for $\pi^{EQU}$) because $\pi^{PRF}$ spends a certain percentage of time in each available state (Section 5.6.1), thus requiring comparatively fewer transitions than $\pi^{EQU}$.

5.6.3.3 The effects of different reward function weight factors on the expected total discounted reward

Fig. 5-8 shows the effects of different reward function weight factors on the expected total discounted reward for a security/defense system when the average sensor node lifetime is 1,000 hours ($\lambda = 0.999$) and the state transition cost $H_{i,j}$ is held constant at 0.1 for $i \ne j$. We explore various weight factors that are appropriate for different security/defense system specifics, that is, $(\omega_p, \omega_t, \omega_d) = \{(0.35, 0.1, 0.55), (0.45, 0.2, 0.35), (0.5, 0.3, 0.2), (0.55, 0.35, 0.1)\}$. Fig. 5-8 reveals that $\pi^{MDP}$ results in the highest expected total discounted reward for all weight factor variations.
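
As a worked check of the percentage improvement formula $[(R^{MDP} - R^{X})/R^{MDP}] \times 100$ used for Fig. 5-6, the sketch below applies it to the $\lambda = 0.999$ row of Table 5-4. The function and variable names are hypothetical; the reward values are copied from the table.

```python
def percentage_improvement(r_mdp, r_x):
    """Improvement of the MDP-based policy over heuristic X, relative to R^MDP."""
    return (r_mdp - r_x) / r_mdp * 100.0


# Expected total discounted rewards for lambda = 0.999 (Table 5-4).
ROW = {"MDP": 608.7522, "POW": 450.0111, "THP": 549.9111,
       "EQU": 436.15, "PRF": 453.0381}

improvements = {
    name: round(percentage_improvement(ROW["MDP"], reward), 2)
    for name, reward in ROW.items()
    if name != "MDP"
}
```

The computed values agree with the 26.08%, 9.67%, 28.35%, and 25.58% improvements reported for the security/defense system at an average lifetime of 1,000 hours. Note that the improvement is normalized by $R^{MDP}$ rather than by the heuristic's reward.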

Figure 5-8. The effects of different reward function weight factors on the expected total discounted reward for a security/defense system. $\lambda = 0.999$, $H_{i,j} = 0.1$ if $i \ne j$.

Figure 5-9. The effects of different discount factors on the expected total discounted reward for a health care application. $H_{i,j} = 0.1$ if $i \ne j$, $\omega_p = 0.5$, $\omega_t = 0.3$, $\omega_d = 0.2$.

5.6.4 Results for a Health Care Application

5.6.4.1 The effects of different discount factors on the expected total discounted reward

Fig. 5-9 depicts the effects of different discount factors $\lambda$ for a health care application when the state transition cost $H_{i,j}$ is held constant at 0.1 for $i \ne j$, and $\omega_p$, $\omega_t$, and $\omega_d$ are equal to 0.5, 0.3, and 0.2, respectively. Fig. 5-9 shows that $\pi^{MDP}$ results in the highest expected total discounted reward for all values of $\lambda$ and corresponding average sensor node lifetimes as compared to the fixed heuristic policies.

Figure 5-10. Percentage improvement in expected total discounted reward for $\pi^{MDP}$ for a health care application as compared to the fixed heuristic policies. $H_{i,j} = 0.1$ if $i \ne j$, $\omega_p = 0.5$, $\omega_t = 0.3$, $\omega_d = 0.2$.

Fig. 5-10 shows the percentage improvement in expected total discounted reward for $\pi^{MDP}$ for a health care application as compared to the fixed heuristic policies. For instance, when the average sensor node lifetime is 1,000 hours ($\lambda = 0.999$), $\pi^{MDP}$ results in a 16.39%, 10.43%, 27.22%, and 21.47% increase in expected total discounted reward compared to $\pi^{POW}$, $\pi^{THP}$, $\pi^{EQU}$, and $\pi^{PRF}$, respectively. On average over all discount factors $\lambda$, $\pi^{MDP}$ results in a 16.07%, 10.23%, 26.8%, and 21.04% increase in expected total discounted reward compared to $\pi^{POW}$, $\pi^{THP}$, $\pi^{EQU}$, and $\pi^{PRF}$, respectively.

5.6.4.2 The effects of different state transition costs on the expected total discounted reward

Fig. 5-11 shows the effects of different state transition costs on the expected total discounted reward for a health care application with a fixed average sensor node lifetime of 1000 hours ($\lambda = 0.999$) and $\omega_p$, $\omega_t$, and $\omega_d$ equal to 0.5, 0.3, and 0.2, respectively. Fig. 5-11 shows that $\pi^{MDP}$ results in the highest expected total discounted reward for all transition cost values. The fixed heuristic policies consistently result in a lower expected total discounted reward as compared to $\pi^{MDP}$. Comparison of Fig. 5-7 and Fig. 5-11 reveals that a security/defense system and a health care application have similar trends with respect to different state transition costs on the expected total discounted reward.

Figure 5-11. The effects of different state transition costs on the expected total discounted reward for a health care application. $\lambda = 0.999$, $\omega_p = 0.5$, $\omega_t = 0.3$, $\omega_d = 0.2$.

Figure 5-12. The effects of different reward function weight factors on the expected total discounted reward for a health care application. $\lambda = 0.999$, $H_{i,j} = 0.1$ if $i \ne j$.

5.6.4.3 The effects of different reward function weight factors on the expected total discounted reward

Fig. 5-12 depicts the effects of different reward function weight factors on the expected total discounted reward for a health care application when the average sensor node lifetime is 1,000 hours ($\lambda = 0.999$) and the state transition cost $H_{i,j}$ is kept constant at 0.1 for $i \ne j$. We explore various weight factors that are appropriate for different health care application specifics (i.e., ($\omega_p$, $\omega_t$, $\omega_d$) = {(0.42, 0.36, 0.22), (0.45, 0.4, 0.15), (0.5,
0.3, 0.2), (0.58, 0.28, 0.14)}). Fig. 5-12 shows that $\pi^{MDP}$ results in the highest expected total discounted reward for all weight factor variations.

Fig. 5-8 and Fig. 5-12 show that the expected total discounted reward of $\pi^{POW}$ gradually increases as the power weight factor increases and eventually exceeds that of $\pi^{THP}$ for a security/defense system and a health care application, respectively. However, close observation reveals that the expected total discounted reward of $\pi^{POW}$ for a security/defense system is affected more sharply than that for a health care application, because of the more stringent constraint on maximum acceptable power for a health care application (Table 5-3). Fig. 5-8 and Fig. 5-12 show that $\pi^{PRF}$ tends to perform better than $\pi^{EQU}$ with increasing power weight factors because $\pi^{PRF}$ spends a greater percentage of time in low power states.

5.6.5 Results for an Ambient Conditions Monitoring Application

5.6.5.1 The effects of different discount factors on the expected total discounted reward

Figure 5-13. The effects of different discount factors on the expected total discounted reward for an ambient conditions monitoring application. $H_{i,j} = 0.1$ if $i \ne j$, $\omega_p = 0.65$, $\omega_t = 0.15$, $\omega_d = 0.2$.

Fig. 5-13 demonstrates the effects of different discount factors $\lambda$ for an ambient conditions monitoring application when the state transition cost $H_{i,j}$ is held constant at 0.1 for $i \ne j$, and $\omega_p$, $\omega_t$, and $\omega_d$ are equal to 0.65, 0.15, and 0.2, respectively. Fig. 5-13
shows that $\pi^{MDP}$ results in the highest expected total discounted reward for all values of $\lambda$.

Figure 5-14. Percentage improvement in expected total discounted reward for $\pi^{MDP}$ for an ambient conditions monitoring application as compared to the fixed heuristic policies. $H_{i,j} = 0.1$ if $i \ne j$, $\omega_p = 0.65$, $\omega_t = 0.15$, $\omega_d = 0.2$.

Fig. 5-14 shows the percentage improvement in expected total discounted reward for $\pi^{MDP}$ for an ambient conditions monitoring application as compared to the fixed heuristic policies. For instance, when the average sensor node lifetime is 1,000 hours ($\lambda = 0.999$), $\pi^{MDP}$ results in an 8.77%, 52.99%, 40.49%, and 32.11% increase in expected total discounted reward as compared to $\pi^{POW}$, $\pi^{THP}$, $\pi^{EQU}$, and $\pi^{PRF}$, respectively. On average over all discount factors $\lambda$, $\pi^{MDP}$ results in an 8.63%, 52.13%, 39.92%, and 31.59% increase in expected total discounted reward as compared to $\pi^{POW}$, $\pi^{THP}$, $\pi^{EQU}$, and $\pi^{PRF}$, respectively.

5.6.5.2 The effects of different state transition costs on the expected total discounted reward

Fig. 5-15 depicts the effects of different state transition costs on the expected total discounted reward for an ambient conditions monitoring application with a fixed average sensor node lifetime of 1000 hours ($\lambda = 0.999$) and $\omega_p$, $\omega_t$, and $\omega_d$ equal to 0.65, 0.15, and 0.2, respectively. Fig. 5-15 reveals that $\pi^{MDP}$ results in the highest expected total discounted reward for all transition cost values. The fixed heuristic policies consistently

result in a lower expected total discounted reward as compared to MDP. Fig. 5-15 reveals that the ambient conditions monitoring application has similar trends with respect to different state transition costs as compared to the security/defense system (Fig. 5-7) and health care applications (Fig. 5-11).

Figure 5-15. The effects of different state transition costs on the expected total discounted reward for an ambient conditions monitoring application. $\lambda = 0.999$, $\omega_p = 0.65$, $\omega_t = 0.15$, $\omega_d = 0.2$.

5.6.5.3 The effects of different reward function weight factors on the expected total discounted reward

Fig. 5-16 shows the effects of different reward function weight factors on the expected total discounted reward for an ambient conditions monitoring application when the average sensor node lifetime is 1,000 hours ($\lambda = 0.999$) and the state transition cost $H_{i,j}$ is held constant at 0.1 for $i \neq j$. We explore various weight factors that are appropriate for different ambient conditions monitoring application specifics (i.e., $(\omega_p, \omega_t, \omega_d) = \{(0.5, 0.25, 0.25), (0.6, 0.1, 0.3), (0.65, 0.15, 0.2), (0.7, 0.12, 0.18)\}$). Fig. 5-16 reveals that MDP results in the highest expected total discounted reward for all weight factor variations.

For an ambient conditions monitoring application, Fig. 5-16 shows that the expected total discounted reward of POW becomes closer to MDP as the power weight factor increases, because with higher power weight factors, MDP spends more time in


lower power states to meet application requirements. Fig. 5-16 shows that PRF tends to perform better than EQU with increasing power weight factors, similar to the security/defense system (Fig. 5-8) and health care applications (Fig. 5-12).

Figure 5-16. The effects of different reward function weight factors on the expected total discounted reward for an ambient conditions monitoring application. $\lambda = 0.999$, $H_{i,j} = 0.1$ if $i \neq j$.

5.6.6 Sensitivity Analysis

An application manager can assign values to MDP reward function parameters, such as $H_{i,a}$, $L_P$, $U_P$, $L_T$, $U_T$, $L_D$, $U_D$, $\omega_p$, $\omega_t$, and $\omega_d$, before an EWSN's initial deployment according to projected/anticipated application requirements. However, the average sensor node lifetime (calculated from $\lambda$) may not be accurately estimated at the time of initial EWSN deployment, as environmental stimuli and wireless channel conditions vary with time and may not be accurately anticipated. The sensor node lifetime depends on sensor node activity (both processing and communication), which varies with the changing environmental stimuli and wireless channel conditions. Sensitivity analysis analyzes the effects of changes in average sensor node lifetime after initial deployment on the expected total discounted reward. In other words, if the actual lifetime differs from the estimated lifetime, sensitivity analysis quantifies the loss in expected total discounted reward relative to the case in which the actual lifetime had been accurately predicted at deployment.

EWSN sensitivity analysis can be carried out with the following steps [141]:


1. Determine the expected total discounted reward given the actual average sensor node lifetime $l = 1/(1 - \lambda)$, referred to as the Optimal Reward $R_o$.
2. Let $\hat{l}$ denote the estimated average sensor node lifetime and $\delta l$ denote the percentage change from the actual average sensor node lifetime (i.e., $\hat{l} = (1 + \delta l)\, l$). $\hat{l}$ results in a suboptimal policy with a corresponding suboptimal total expected discounted reward, referred to as the Suboptimal Reward $R_{so}$.
3. The Reward Ratio $r$ is the ratio of the suboptimal reward to the optimal reward (i.e., $r = R_{so}/R_o$), which indicates suboptimal expected total discounted reward variation with the average sensor node lifetime estimation inaccuracy.

It can be shown that the reward ratio varies from (0, 2] as $\delta l$ varies from (-100%, 100%]. The reward ratio's ideal value is 1, which occurs when the average sensor node lifetime is accurately estimated/predicted ($\hat{l} = l$, corresponding to $\delta l = 0$). Sensitivity analysis revealed that our MDP-based policy is sensitive to accurate determination of parameters, especially average lifetime, because an inaccurate average sensor node lifetime results in a suboptimal expected total discounted reward. The dynamic profiler module (Fig. 5-1) measures/profiles the remaining battery energy (lifetime) and sends this information to the application manager along with other profiled statistics (Section 5.2.1), which helps in accurate estimation of $\lambda$. Estimating $\lambda$ using the dynamic profiler's feedback ensures that the estimated average sensor node lifetime differs only slightly from the actual average sensor node lifetime, and thus helps in maintaining a reward ratio close to 1.

5.6.7 Number of Iterations for Convergence

The policy iteration algorithm determines the MDP-based policy and the corresponding expected total discounted reward on the order of $O(\ln(|S|))$, where $|S|$ is the total number of states. In our numerical results with four sensor node states, the policy iteration algorithm converges in two iterations on average.

5.7 Concluding Remarks

In this chapter, we presented the first (to the best of our knowledge) application-oriented dynamic tuning methodology for embedded sensor nodes in distributed EWSNs based


on Markov Decision Processes (MDPs). Our MDP-based policy tunes embedded sensor node processor voltage, frequency, and sensing frequency in accordance with application requirements over the lifetime of a sensor node. Our proposed methodology is adaptive and dynamically determines the new MDP-based policy whenever application requirements change (which may be in accordance with changing environmental stimuli). We compared our MDP-based policy with four fixed heuristic policies and conclude that our proposed MDP-based policy outperforms each heuristic policy for all sensor node lifetimes, state transition costs, and application metric weight factors. We provided implementation guidelines for our proposed policy in embedded sensor nodes. We showed that our proposed policy has a fast convergence rate and is computationally inexpensive, and thus can be considered for implementation in sensor nodes with limited processing resources.

Future work includes enhancing our MDP model to incorporate additional high-level application metrics (e.g., security, reliability, energy, lifetime, etc.) as well as additional embedded sensor node tunable parameters (such as radio transmission power, radio transmission frequency, etc.). Furthermore, we plan to incorporate wireless channel condition in the MDP state space, thus formulating a stochastic dynamic program that enables sensor node tuning in accordance with changing wireless channel condition. We plan to implement our MDP-based methodology on hardware sensor nodes for further verification of results. In addition, we will enhance sensor node tuning automation using profiling statistics by architecting mechanisms that enable the sensor node to automatically react to environmental stimuli without the need for an application manager's feedback. Future work also includes the extension of our MDP-based dynamic optimization methodology for performing global optimization (i.e., selection of sensor node tunable parameter settings to ensure that application requirements are met for the EWSN as a whole, where different sensor nodes collaborate with each other in determining optimal/suboptimal tunable parameter settings).
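The relation between the discount factor and the average sensor node lifetime used in the sensitivity analysis of Section 5.6.6 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, a fixed policy that earns a constant per-epoch reward `rho`, so that the expected total discounted reward is `rho / (1 - lambda)`. Under that assumption the reward ratio reduces to 1 + δl, which is consistent with the (0, 2] range stated in Section 5.6.6. The function names and the constant-reward model are assumptions made for this sketch and are not part of the MDP formulation of this chapter.

```python
def discount_from_lifetime(lifetime_hours):
    """Discount factor lambda implied by l = 1 / (1 - lambda)."""
    return 1.0 - 1.0 / lifetime_hours


def lifetime_from_discount(discount):
    """Average sensor node lifetime l = 1 / (1 - lambda)."""
    return 1.0 / (1.0 - discount)


def discounted_reward(rho, discount):
    """Total discounted reward of a constant per-epoch reward rho.

    Assumption for illustration only: a geometric series rho / (1 - lambda),
    not the state-dependent reward function of the MDP model.
    """
    return rho / (1.0 - discount)


def reward_ratio(actual_lifetime, delta, rho=1.0):
    """r = R_so / R_o when the lifetime estimate is off by the fraction delta."""
    estimated_lifetime = (1.0 + delta) * actual_lifetime
    r_o = discounted_reward(rho, discount_from_lifetime(actual_lifetime))
    r_so = discounted_reward(rho, discount_from_lifetime(estimated_lifetime))
    return r_so / r_o


if __name__ == "__main__":
    print("lifetime for lambda = 0.999:", lifetime_from_discount(0.999))
    for delta in (-0.5, 0.0, 0.5, 1.0):
        print("delta =", delta, "reward ratio =", reward_ratio(1000.0, delta))
```

With an accurate estimate (`delta = 0.0`) the ratio is 1; in this simplified model, overestimating the lifetime by 100% doubles the computed reward and underestimating it by 50% halves it.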


CHAPTER 6
ONLINE ALGORITHMS FOR DISTRIBUTED EMBEDDED WIRELESS SENSOR NETWORKS DYNAMIC OPTIMIZATION

An embedded wireless sensor network (EWSN) typically consists of a set of spatially distributed embedded sensor nodes that wirelessly communicate with each other to collectively accomplish an application-specific task. Due to technological advancements in wireless communications and embedded systems, there exists a plethora of EWSN applications, including security/defense systems, industrial automation, health care, and logistics.

Given the wide range of EWSN applications, an application designer is left with the challenging task of designing an EWSN while taking into consideration application requirements (lifetime, throughput, reliability, etc.). Moreover, these application requirements are affected by environmental stimuli (e.g., poor wireless channel conditions may necessitate increased transmission power) and can change over time as operational situations evolve (e.g., unexpected winds fuel a dying forest fire). Since commercial off-the-shelf (COTS) embedded sensor nodes have limited resources (e.g., battery lifetime, processing power, etc.), delicate design and tradeoff considerations are necessary to meet often competing application requirements (e.g., high processing requirements with long lifetime requirements).

In order to meet a wide range of application requirements, COTS embedded sensor nodes are generically designed; however, tunable parameters (e.g., processor voltage, processor frequency, sensing frequency, radio transmission power, packet size, etc.) enable the embedded sensor node to tune operation to meet application requirements. Nevertheless, application designers are left with the task of parameter tuning during EWSN design time. Parameter tuning is the process of assigning appropriate values for sensor node tunable parameters in order to meet application requirements. Parameter tuning involves several challenges such as optimal parameter value selection given large design spaces, consideration for competing application


requirements and tunable parameters, difficulties in creating accurate simulation environments, slow simulation times, etc. In addition, design time static determination of these parameters leaves the embedded sensor node with little or no flexibility to adapt to the actual operating environment. Furthermore, many application designers are non-experts (e.g., agriculturists, biologists, etc.) and lack sufficient expertise for parameter tuning. Therefore, autonomous parameter tuning methodologies may alleviate many of these design challenges.

Dynamic optimizations enable autonomous embedded sensor node parameter tuning using special hardware/software algorithms to determine parameter values in situ according to application requirements and changing environmental stimuli. Dynamic optimizations require minimal application designer effort and enable application designers to specify only high-level application requirements without knowledge of parameter specifics. Nevertheless, dynamic optimizations rely on fast and lightweight online optimization algorithms for in situ parameter tuning.

The dynamic profiling and optimization project aims to alleviate the complexities associated with sensor-based system design through the use of dynamic profiling methods capable of observing application-level behavior and dynamic optimization to tune the underlying platform accordingly [164]. The dynamic profiling and optimization project has evaluated dynamic profiling methods for observing application-level behavior by gathering profiling statistics, but dynamic optimization methods still need exploration. In our previous work [137], we proposed a Markov Decision Process (MDP)-based methodology to prescribe optimal/suboptimal sensor node operation to meet application requirements and adapt to changing environmental stimuli. However, the MDP-based policy was not autonomous because the methodology required the application designer to orchestrate MDP-based policy reevaluation whenever application requirements and environmental stimuli changed. In addition, since policy reevaluation


was computationally and memory expensive, this process was done offline on a powerful desktop machine.

To enable in situ autonomous EWSN dynamic optimizations, we propose an online EWSN optimization methodology that extends static design time parameter tuning [152]. Our methodology is advantageous over static design time parameter tuning because it enables the embedded sensor node to automatically adapt to actual changing environmental stimuli, resulting in closer adherence to application requirements. Furthermore, our methodology is more amenable to non-expert application designers and requires no application designer effort after initial EWSN deployment. Lightweight (low computational and memory resources) online algorithms are crucial for embedded sensor nodes considering the limited processing, storage, and energy resources of embedded sensor nodes in distributed EWSNs. Our online lightweight optimization algorithms provide fast design space exploration to yield an optimal or near-optimal parameter value selection.

6.1 Related Work

There exists much research in the area of dynamic optimizations [42][43][44][45]; however, most previous work focuses on the processor or memory (cache) in computer systems. Whereas these endeavors can provide valuable insights into EWSN dynamic optimizations, they are not directly applicable to EWSNs due to different design spaces, platform particulars, and a sensor node's tight design constraints.

In the area of EWSN dynamic profiling and optimizations, Sridharan et al. [142] obtained accurate environmental stimuli by dynamically profiling the EWSN's operating environment; however, they did not propose any methodology to leverage these profiling statistics for optimizations. In our previous work [137], we proposed an automated Markov Decision Process (MDP)-based methodology to prescribe optimal sensor node operation to meet application requirements and adapt to changing environmental stimuli. Kogekar et al. [84] proposed an approach for dynamic software reconfiguration in


EWSNs using dynamically adaptive software, which used tasks to detect environmental changes (event occurrences) and adapt the software to the new conditions. Their work did not consider sensor node tunable parameters.

Several papers explored dynamic voltage and frequency scaling (DVFS) for reduced energy consumption in EWSNs. Min et al. [82] demonstrated that dynamic processor voltage scaling reduced energy consumption by 60%. Similarly, Yuan et al. [83] studied a DVFS system that used additional transmitted data packet information to select appropriate processor voltage and frequency values. Although DVFS provides a mechanism for dynamic optimizations, considering additional sensor node tunable parameters increases the design space and the sensor node's ability to meet application requirements. To the best of our knowledge, our work is the first to explore an extensive sensor node design space.

Some previous works in EWSN optimizations explore greedy and simulated annealing (SA)-based methods, but these previous works did not analyze execution time and memory requirements. Huber et al. [165] maximized the amount of data gathered using a distributed greedy scheduling algorithm that aimed at determining an optimal sensing schedule, which consisted of a time sequence of scheduled sensor node measurements. In prior work, Lysecky et al. [152] proposed an SA-based automated application-specific tuning of parameterized sensor-based embedded systems and found that automated tuning can improve EWSN operation by 40% on average. Verma [153] studied SA and particle swarm optimization (PSO) methods for automated application-specific tuning and observed that SA performed better than PSO because PSO often quickly converged to local minima.

In summary, although previous works in EWSN optimizations explore greedy and SA-based methods, they neither analyzed execution time and memory requirements nor investigated greedy and SA algorithms as online algorithms for dynamic optimizations. Prior work [152][153] considered a limited design


space with a few sensor node tunable parameters. To address the deficiencies in previous work, we analyze greedy and SA algorithms as online algorithms for performing dynamic optimizations considering a large design space containing many tunable parameters and values. This fine-grained design space enables embedded sensor nodes to more closely meet application requirements, but exacerbates optimization challenges considering an embedded sensor node's constrained memory and computational resources.

6.2 Dynamic Optimization Methodology

In this section, we give an overview of our dynamic optimization methodology. We also formulate the state space, objective function, and online lightweight optimization algorithms/heuristics for our dynamic optimization methodology.

6.2.1 Methodology Overview

Fig. 6-1 depicts our dynamic optimization methodology. The application designer specifies application requirements using high-level application metrics (e.g., lifetime, throughput, reliability), associated minimum and maximum desired/acceptable values, and associated weight factors that specify the importance of each high-level metric with respect to each other.

The shaded box in Fig. 6-1 depicts the overall operational flow, orchestrated by the dynamic optimization controller, at each sensor node. The dynamic optimization controller receives application requirements and invokes the dynamic optimization module. The dynamic optimization module determines the sensor node's operating state (tunable parameter value settings) using an online optimization algorithm. The sensor node will operate in that state until a state change is necessary. State changes occur to react to changing environmental stimuli using the dynamic profiler module and profiling statistics processing module.

The dynamic profiler module records profiling statistics (e.g., wireless channel condition, number of dropped packets, packet size, radio transmission power, etc.) and


the profiling statistics processing module performs any necessary data processing. The dynamic optimization controller evaluates the processed profiling statistics to determine if the current operating state meets the application requirements. If the application requirements are not met, the dynamic optimization controller reinvokes the dynamic optimization module to determine a new operating state. This feedback process continues to ensure the selection of an appropriate operating state to best meet the application requirements. Currently, our online algorithms do not directly consider these profiling statistics, but that incorporation is the focus of our future work.

Figure 6-1. Dynamic optimization methodology for distributed EWSNs.

6.2.2 State Space

The state space $S$ for our dynamic optimization methodology is defined as

$$S = S_1 \times S_2 \times \cdots \times S_N \tag{6}$$

where $N$ denotes the number of tunable parameters, $S_i$ denotes the state space for tunable parameter $i$, $\forall\, i \in \{1, 2, \ldots, N\}$, and $\times$ denotes the Cartesian product. Each


tunable parameter's state space $S_i$ consists of $n$ values

$$S_i = \{s_{i1}, s_{i2}, s_{i3}, \ldots, s_{in}\} : |S_i| = n \tag{6}$$

where $|S_i|$ denotes tunable parameter $i$'s state space cardinality (the number of tunable values in $S_i$). $S$ is a set of $N$-tuples where each $N$-tuple represents a sensor node state $s$. Note that some tuples in $S$ may not be feasible (e.g., not all processor voltage and frequency pairs are feasible) and can be regarded as do-not-care tuples.

6.2.3 Objective Function

The embedded sensor node dynamic optimization problem can be formulated as

$$\max f(s) \quad \text{s.t.} \quad s \in S \tag{6}$$

where $f(s)$ represents the objective function, which captures application requirements and can be given as

$$f(s) = \sum_{k=1}^{m} \omega_k f_k(s) \quad \text{s.t.} \quad s \in S, \quad \omega_k \geq 0, \; k = 1, 2, \ldots, m, \quad \omega_k \leq 1, \; k = 1, 2, \ldots, m, \quad \sum_{k=1}^{m} \omega_k = 1 \tag{6}$$

where $f_k(s)$ and $\omega_k$ denote the objective function and weight factor for the $k$th application metric, respectively, given that there are $m$ application metrics. Our objective function characterization considers lifetime, throughput, and reliability (i.e., $m = 3$; additional application metrics can be included) and is given as

$$f(s) = \omega_l f_l(s) + \omega_t f_t(s) + \omega_r f_r(s) \tag{6}$$


where $f_l(s)$, $f_t(s)$, and $f_r(s)$ denote the lifetime, throughput, and reliability objective functions, respectively, and $\omega_l$, $\omega_t$, and $\omega_r$ denote the weight factors for lifetime, throughput, and reliability, respectively.

Figure 6-2. Lifetime objective function $f_l(s)$.

We consider piecewise linear objective functions for lifetime, throughput, and reliability [152][153]. We define the lifetime objective function (Fig. 6-2) in Equation 6 as

$$f_l(s) = \begin{cases} 1, & s_l \geq \beta_l \\ C_{U_l} + \dfrac{(C_{\beta_l} - C_{U_l})(s_l - U_l)}{(\beta_l - U_l)}, & U_l \leq s_l < \beta_l \\ \quad\vdots & \end{cases}$$

objective function value at $L_l$, $U_l$, and $\beta_l$, respectively. The throughput and reliability objective functions can be defined similar to Equation 6.

6.2.4 Online Optimization Algorithms

In this subsection, we present our online optimization algorithms/heuristics for dynamic optimizations. We focus on two main online optimization algorithms: a greedy algorithm and an SA-based algorithm.

6.2.4.1 Greedy Algorithm

Input: f(s), N, n
Output: Embedded sensor node state that maximizes f(s) and the corresponding f(s) value
 1  best state ← initial tunable parameter values;
 2  objBestSol ← solution from best state;
 3  foreach Embedded Sensor Node Tunable Parameter do
 4      for i ← 1 to n do
 5          objSolTemp ← current state solution;
 6          if objSolTemp > objBestSol then
 7              objBestSol ← objSolTemp;
 8              best state ← current state;
 9          else
10              break;
11          end
12      end
13      select the next tunable parameter;
14  end
15  return best state, objBestSol
Algorithm 2: Greedy algorithm for embedded sensor node dynamic optimization.

Algorithm 2 depicts our greedy algorithm, which takes as input the objective function $f(s)$ (Equation 6), the number of embedded sensor node tunable parameters $N$, and each tunable parameter's design space cardinality $n$ (the algorithm assumes the same state space cardinality for all tunable parameters for notational simplicity). The algorithm sets the initial state with initial tunable parameter values (line 1) and the best solution objective function value objBestSol to the value obtained from the initial state (line 2). The algorithm explores each parameter in turn, starting from the last parameter (with state space $S_N$ in Equation 6), while holding all other parameters fixed according to the best state. For each parameter value (explored in ascending


order), denoted as the current state, the algorithm computes the objective function value objSolTemp (lines 4 and 5). If the current state results in an improvement in the objective function value (line 6), objBestSol and the best state are updated to reflect the new best state (lines 6-8). This process continues until there is no objective function value improvement (objSolTemp

Input: f(s), N, n, T0, α, c0, t0
Output: Embedded sensor node state that maximizes f(s) and the corresponding f(s) value
 1  c, t, q ← 0;
 2  current state ← rand() % N;
 3  Tq ← T0;
 4  objSolInit ← solution from current state;
 5  objSolTemp ← objSolInit;
 6  objBestSol ← objSolInit;
 7  q ← q + 1;
 8  while t < t0 do
 9      while c < c0 do
10          if rand() > RAND_MAX/2 then
11              new state ← |(current state + rand() % N)| % N;
12          else
13              new state ← |(current state - rand() % N)| % N;
14          end
15          objSolNew ← new state solution;
16          if objSolNew > objBestSol then
17              objBestSol ← objSolNew;
18              best state ← new state;
19          end
20          if objSolNew > objSolTemp then
21              P ← 1;
22          else
23              P ← exp((objSolNew - objSolTemp)/Tq);
24          end
25          rP ← rand()/RAND_MAX;
26          if P > rP then
27              objSolTemp ← objSolNew;
28              current state ← new state;
29          end
30          q ← q + 1;
31          c ← c + 1;
32      end
33      Tq ← α · Tq;
34      t ← t + 1;
35      c ← 0;
36  end
37  return best state, objBestSol
Algorithm 3: Simulated annealing algorithm for embedded sensor node dynamic optimization.

annealing temperature $T_q$ is initialized to $T_0$ (line 3); the initial state's objective function value is assigned to the variable objSolInit (line 4); the current state objective function value objSolTemp and the best state objective function value objBestSol are initialized to objSolInit (lines 5 and 6).
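The structure of Algorithm 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the C++ implementation evaluated in this chapter: states are assumed to be integer indices in `0..num_states-1`, `toy_objective` is a hypothetical stand-in for $f(s)$, and the names `simulated_annealing`, `t0_temp`, and `seed` are introduced only for this sketch. Python's `random.Random` replaces `rand()` and `RAND_MAX`.

```python
import math
import random


def toy_objective(state):
    """Hypothetical objective over state indices (not the thesis's f(s))."""
    return 1.0 - abs(state - 500) / 729.0


def simulated_annealing(objective, num_states, t0_temp, alpha, c0, t0, seed=0):
    """Maximize objective(state) over state indices 0..num_states-1.

    Mirrors Algorithm 3: t0 temperature reductions, c0 trials per
    temperature, and geometric cooling T <- alpha * T.
    """
    rng = random.Random(seed)
    current = rng.randrange(num_states)        # pseudo-random initial state
    temperature = t0_temp
    obj_current = objective(current)
    obj_best, best = obj_current, current
    explored = 1                               # the initial state counts as explored

    for _ in range(t0):                        # temperature reductions
        for _ in range(c0):                    # trials at this temperature
            step = rng.randrange(num_states)
            if rng.random() > 0.5:
                candidate = (current + step) % num_states
            else:
                candidate = (current - step) % num_states
            obj_new = objective(candidate)
            explored += 1

            if obj_new > obj_best:             # remember the best state seen so far
                obj_best, best = obj_new, candidate

            # Metropolis acceptance: always accept an improvement, otherwise
            # accept with probability exp((new - current) / T).
            if obj_new > obj_current:
                accept_prob = 1.0
            else:
                accept_prob = math.exp((obj_new - obj_current) / temperature)
            if accept_prob > rng.random():
                current, obj_current = candidate, obj_new

        temperature *= alpha                   # exponential cooling
    return best, obj_best, explored


if __name__ == "__main__":
    print(simulated_annealing(toy_objective, 729, 0.6416, 0.8, c0=40, t0=10, seed=1))
```

The number of states explored is fixed in advance at `1 + c0 * t0`, so the runtime budget can be set directly through `c0` and `t0`; the values used in the example call are arbitrary.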


The algorithm begins the state exploration by first incrementing the number of states explored $q$ from 0 to 1 (line 7). For each trial (lines 10-31), the algorithm explores new neighboring states pseudo-randomly, with the state index wrapped modulo $N$ (lines 10-14), in search of a better solution and calculates the resulting objective function value objSolNew (line 15). If the new state offers a higher objective function value as compared to the previous objBestSol, the new state becomes the best solution (lines 16-19); otherwise the algorithm determines the acceptance probability $P$ of the new state being selected as the current state using the Metropolis-Hastings random walk algorithm (lines 20-29) [167]. At high temperatures, the Metropolis-Hastings algorithm accepts all moves (random walk), while at low temperatures, the Metropolis-Hastings algorithm performs stochastic hill-climbing (the acceptance probability depends on the objective function difference and the annealing temperature). After every $c_0$ trials, the annealing temperature is decreased exponentially (line 33), and the process continues until $t \rightarrow t_0$ (lines 8-36). After all trials have completed, the algorithm returns the current best state and the resulting objective function value objBestSol (line 37).

The selection of the SA algorithm's parameters is critical in determining a good quality solution for embedded sensor node parameter tuning. Specifically, the selection of the $T_0$ value is important because an inappropriate $T_0$ value may yield lower quality solutions. We propose to set $T_0$ equal to the maximal objective function difference between any two neighboring states (i.e., $T_0 = \max|\Delta f(s)|$ where $\Delta f(s)$ denotes the objective function difference between any two neighboring states). This proposition is an extension of $T_0$ selection based on the maximal energy difference between neighboring states [167][168]. However, it is not possible to estimate $\max|\Delta f(s)|$ between two neighboring states because SA explores the design space pseudo-randomly. We propose an approximation $T_0 \approx |\max|f(s)| - \min|f(s)||$ where $\min|f(s)|$ and $\max|f(s)|$ denote the minimum and maximum objective function values in the design space,


respectively. The exhaustive search algorithm can be used to find $\min|f(s)|$ and $\max|f(s)|$ by minimizing and maximizing the objective function, respectively.

6.3 Experimental Results

In this section, we describe our experimental setup and experimental results for greedy and simulated annealing (SA) algorithms for different application domains. These results evaluate the greedy and SA algorithms in terms of solution quality and the percentage of state space explored. We also present data memory, execution time, and energy results to provide insights into the complexity and energy requirements of our online algorithms.

6.3.1 Experimental Setup

Our experiments are based on the Crossbow IRIS motes [95] that operate using two AA alkaline batteries with a battery capacity of 2,000 mAh. This platform integrates an Atmel ATmega1281 microcontroller [92], an Atmel AT86RF230 low power 2.4 GHz transceiver [93], and an MTS400 sensor board [90] with Sensirion SHT1x humidity and temperature sensors [91]. In order to investigate the fidelity of our online algorithms across small and large design spaces, we consider two design space cardinalities (number of states in the design space): $|S| = 729$ and $|S| = 31{,}104$. The state space $|S| = 729$ results from six tunable parameters with three tunable values each: processor voltage $V_p = \{2.7, 3.3, 4\}$ (volts), processor frequency $F_p = \{4, 6, 8\}$ (MHz) [92], sensing frequency $F_s = \{1, 2, 3\}$ (samples per second) [91], packet transmission interval $P_{ti} = \{60, 300, 600\}$ (seconds), packet size $P_s = \{41, 56, 64\}$ (bytes), and transceiver transmission power $P_{tx} = \{-17, -3, 1\}$ (dBm) [93]. The tunable parameters for $|S| = 31{,}104$ are $V_p = \{1.8, 2.7, 3.3, 4, 4.5, 5\}$ (volts), $F_p = \{2, 4, 6, 8, 12, 16\}$ (MHz) [92], $F_s = \{0.2, 0.5, 1, 2, 3, 4\}$ (samples per second) [91], $P_s = \{32, 41, 56, 64, 100, 127\}$ (bytes), $P_{ti} = \{10, 30, 60, 300, 600, 1200\}$ (seconds), and $P_{tx} = \{-17, -3, 1, 3\}$ (dBm) [93]. All state space tuples are feasible for $|S| = 729$, whereas $|S| = 31{,}104$ contains 7,779 infeasible state space tuples (e.g., not all $V_p$ and $F_p$ pairs are feasible).
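The design space cardinalities above, and the way the greedy algorithm of Section 6.2.4.1 walks such a space, can be sketched in Python. The parameter values are copied from the experimental setup; everything else is an assumption made for illustration. In particular, `toy_objective`, `TARGETS`, and `SCALES` are hypothetical and unrelated to the lifetime, throughput, and reliability objective functions, and the sketch starts from the lowest value of every parameter, which is only one possible choice of initial state.

```python
from itertools import product
from math import prod

# |S| = 729: six tunable parameters with three values each.
SMALL_SPACE = [
    (2.7, 3.3, 4.0),      # processor voltage Vp (volts)
    (4, 6, 8),            # processor frequency Fp (MHz)
    (1, 2, 3),            # sensing frequency Fs (samples per second)
    (60, 300, 600),       # packet transmission interval Pti (seconds)
    (41, 56, 64),         # packet size Ps (bytes)
    (-17, -3, 1),         # transceiver transmission power Ptx (dBm)
]

# |S| = 31,104: five parameters with six values each, Ptx with four.
LARGE_SPACE = [
    (1.8, 2.7, 3.3, 4.0, 4.5, 5.0),   # Vp (volts)
    (2, 4, 6, 8, 12, 16),             # Fp (MHz)
    (0.2, 0.5, 1, 2, 3, 4),           # Fs (samples per second)
    (32, 41, 56, 64, 100, 127),       # Ps (bytes)
    (10, 30, 60, 300, 600, 1200),     # Pti (seconds)
    (-17, -3, 1, 3),                  # Ptx (dBm)
]

TARGETS = (3.3, 8, 2, 300, 56, -3)    # hypothetical preferred values
SCALES = (4.0, 8, 3, 600, 64, 17)     # hypothetical normalization constants


def toy_objective(state):
    """Hypothetical separable objective: 1 minus a mean squared distance."""
    penalty = sum(((v - t) / s) ** 2 for v, t, s in zip(state, TARGETS, SCALES))
    return 1.0 - penalty / len(state)


def greedy(objective, space):
    """Coordinate-wise greedy search in the spirit of Algorithm 2."""
    best = [values[0] for values in space]     # assumed initial state
    obj_best = objective(tuple(best))
    explored = 1
    for p in reversed(range(len(space))):      # start from the last parameter
        for value in space[p][1:]:             # remaining values, ascending
            candidate = best.copy()
            candidate[p] = value
            obj_temp = objective(tuple(candidate))
            explored += 1
            if obj_temp > obj_best:
                best, obj_best = candidate, obj_temp
            else:
                break                          # no improvement: next parameter
    return tuple(best), obj_best, explored


def exhaustive(objective, space):
    """Reference optimum: evaluate every tuple of the Cartesian product."""
    return max(product(*space), key=objective)


if __name__ == "__main__":
    print(prod(len(v) for v in SMALL_SPACE), prod(len(v) for v in LARGE_SPACE))
    print(greedy(toy_objective, SMALL_SPACE))
    print(exhaustive(toy_objective, SMALL_SPACE))
```

Because the toy objective is separable and unimodal in each parameter, the greedy walk reaches the same state as the exhaustive search while evaluating only a small fraction of the tuples; this property is not guaranteed for an arbitrary objective function.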


We analyze three sample application domains: a security/defense system, a health care application, and an ambient conditions monitoring application. To model each application domain, we assign application-specific values for the desirable minimum $L$, desirable maximum $U$, acceptable minimum $\alpha$, and acceptable maximum $\beta$ objective function parameter values for application metrics and associated weight factors. We specify the objective function parameters as a multiple of a base unit where one lifetime unit is equal to 5 days, one throughput unit is equal to 20 kbps, and one reliability unit is equal to 0.05 (percentage of error-free packet transmissions). We assign application metric values for an application considering the application's typical requirements [137]. For example, a health care application with a sensor implanted into a patient to monitor physiological data (e.g., heart rate, glucose level, etc.) may have a longer lifetime requirement because frequent battery replacement may be difficult. Table 6-1 depicts the application requirements in terms of objective function parameter values for the three application domains.

The lifetime, throughput, and reliability objective function values corresponding to the desirable minimum and maximum parameter values are 0.1 and 0.9, respectively, and the objective function values corresponding to the acceptable minimum and maximum parameter values are 0 and 1, respectively.

For brevity, we selected a single sample EWSN platform configuration and three application domains, but we point out that our dynamic optimization methodology and online optimization algorithms are equally applicable to any EWSN platform and application.

6.3.2 Results

We implemented our greedy and SA-based online optimization algorithms in C++ and evaluated the algorithms in terms of the percentage of the design space explored, the quality (objective function value) of each algorithm's determined best state as compared to the optimal state determined using an exhaustive search, and the


total execution time. We also performed data memory and energy analysis to analyze scalability for different design space sizes.

Table 6-1. Desirable minimum $L$, desirable maximum $U$, acceptable minimum $\alpha$, and acceptable maximum $\beta$ objective function parameter values for a security/defense system, health care, and an ambient conditions monitoring application. One lifetime unit = 5 days, one throughput unit = 20 kbps, one reliability unit = 0.05.

Notation     Security/Defense   Health Care   Ambient Monitoring
$L_l$        8 units            12 units      6 units
$U_l$        30 units           32 units      40 units
$\alpha_l$   1 units            2 units       3 units
$\beta_l$    36 units           40 units      60 units
$L_t$        20 units           19 units      15 units
$U_t$        34 units           36 units      29 units
$\alpha_t$   0.5 units          0.4 units     0.05 units
$\beta_t$    45 units           47 units      35 units
$L_r$        14 units           12 units      11 units
$U_r$        19.8 units         17 units      16 units
$\alpha_r$   10 units           8 units       6 units
$\beta_r$    20 units           20 units      20 units

Fig. 6-3 shows the objective function value normalized to the optimal solution for the SA and greedy algorithms versus the number of states explored for a security/defense system for $|S| = 729$ where $\omega_l = 0.25$, $\omega_t = 0.35$, and $\omega_r = 0.4$. The SA parameters are calculated as outlined in Section 6.2.4.2 (e.g., $T_0 = |\max|f(s)| - \min|f(s)|| = |0.7737 - 0.1321| = 0.6416$ and $\alpha = 0.8$ [167]). Fig. 6-3 shows that the greedy and SA algorithms converged to a steady state solution after exploring 11 and 400 states, respectively. These convergence results show that the greedy algorithm converged to the final solution faster than the SA algorithm, exploring only 1.51% of the design space, whereas the SA algorithm explored 54.87% of the design space. The figure reveals that the average growth rate for increasing solution quality was faster in the initial iterations than in the later iterations. Fig. 6-3 shows an average growth rate


of approximately 22.96% and 52.56% for the initial iterations for the greedy and SA algorithms, respectively, and decreased to 12.8% and 0.00322% for the later iterations of the greedy and SA algorithms, respectively. Both algorithms converged to the optimal solution as obtained from an exhaustive search of the design space.

Figure 6-3. Objective function value normalized to the optimal solution for a varying number of states explored for the greedy and simulated annealing algorithms for a security/defense system where $\omega_l = 0.25$, $\omega_t = 0.35$, $\omega_r = 0.4$, $|S| = 729$.

Fig. 6-4 shows the objective function value normalized to the optimal solution for the SA and greedy algorithms versus the number of states explored for a health care application for $|S| = 729$ where $\omega_l = 0.25$, $\omega_t = 0.35$, and $\omega_r = 0.4$. The SA parameters are $T_0 = |0.7472 - 0.2254| = 0.5218$ and $\alpha = 0.8$ [167]. Fig. 6-4 shows that the greedy and SA algorithms converged to a steady state solution after exploring 11 states (1.51% of the design space) and 400 states (54.87% of the design space), respectively. The SA algorithm converged to the optimal solution after exploring 400 states, whereas the greedy algorithm's solution quality after exploring 11 states was within 0.027% of the optimal solution. Fig. 6-4 shows an average growth rate of approximately 11.76% and 5.22% for the initial iterations for the greedy and SA algorithms, respectively, and decreased to 2.27% and 0.001% for the later iterations of the greedy and SA algorithms, respectively.
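The objective function values plotted in these figures are weighted sums of piecewise linear per-metric functions (Section 6.2.3). The Python sketch below illustrates that form using the security/defense column of Table 6-1 and the weight factors $\omega_l = 0.25$, $\omega_t = 0.35$, $\omega_r = 0.4$. It rests on an assumption: the segments below $U$ are taken to interpolate linearly through $(\alpha, 0)$, $(L, 0.1)$, $(U, 0.9)$, and $(\beta, 1)$, which matches the "piecewise linear" description and the 0, 0.1, 0.9, 1 anchor values of Section 6.3.1 but is not copied from a legible equation. The function names are hypothetical, and the mapping from a sensor node state to the lifetime, throughput, and reliability it offers is not modeled here.

```python
def piecewise_objective(x, alpha, low, up, beta, c_low=0.1, c_up=0.9, c_beta=1.0):
    """Piecewise linear metric objective.

    Assumed shape: 0 below alpha, linear through (alpha, 0), (low, c_low),
    (up, c_up), (beta, c_beta), and 1 at or above beta.
    """
    if x >= beta:
        return 1.0
    if x >= up:
        return c_up + (c_beta - c_up) * (x - up) / (beta - up)
    if x >= low:
        return c_low + (c_up - c_low) * (x - low) / (up - low)
    if x >= alpha:
        return c_low * (x - alpha) / (low - alpha)
    return 0.0


# Security/defense parameters from Table 6-1 (in base units): alpha, L, U, beta.
LIFETIME = dict(alpha=1, low=8, up=30, beta=36)
THROUGHPUT = dict(alpha=0.5, low=20, up=34, beta=45)
RELIABILITY = dict(alpha=10, low=14, up=19.8, beta=20)
WEIGHTS = dict(lifetime=0.25, throughput=0.35, reliability=0.4)


def security_defense_objective(lifetime_units, throughput_units, reliability_units):
    """Weighted sum f = w_l * f_l + w_t * f_t + w_r * f_r for given metric values."""
    return (
        WEIGHTS["lifetime"] * piecewise_objective(lifetime_units, **LIFETIME)
        + WEIGHTS["throughput"] * piecewise_objective(throughput_units, **THROUGHPUT)
        + WEIGHTS["reliability"] * piecewise_objective(reliability_units, **RELIABILITY)
    )


if __name__ == "__main__":
    # A state offering exactly the desirable maximum of every metric.
    print(security_defense_objective(30, 34, 19.8))
    # A state offering 19 lifetime units, halfway between L_l = 8 and U_l = 30.
    print(piecewise_objective(19, **LIFETIME))
```

A state that offers exactly the desirable maximum for every metric evaluates to about 0.9 under this sketch, since each per-metric function equals 0.9 there and the weight factors sum to 1.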


Figure 6-4. Objective function value normalized to the optimal solution for a varying number of states explored for the greedy and simulated annealing algorithms for a health care application where $\omega_l = 0.25$, $\omega_t = 0.35$, $\omega_r = 0.4$, $|S| = 729$.

Fig. 6-5 shows the objective function value normalized to the optimal solution for the SA and greedy algorithms versus the number of states explored for an ambient conditions monitoring application for $|S| = 31{,}104$ where $\omega_l = 0.6$, $\omega_t = 0.25$, and $\omega_r = 0.15$. The SA parameters are $T_0 = |0.8191 - 0.2163| = 0.6028$ and $\alpha = 0.8$ [167]. Fig. 6-5 shows that the greedy and SA algorithms converged to a steady state solution after exploring 9 states (0.029% of the design space) and 400 states (1.29% of the design space), respectively (this represents a similar trend as for the security/defense and health care applications). The greedy and SA algorithms' solutions after exploring 9 and 400 states are within 6.6% and 0.5% of the optimal solution, respectively. Fig. 6-5 shows an average growth rate of approximately 5.02% and 8.16% for the initial iterations for the greedy and SA algorithms, respectively, and 0.11% and 0.0017% for the later iterations of the greedy and SA algorithms, respectively.

The results also provide insights into the convergence rates and reveal that even though the design space cardinality increases by 43x (from 729 to 31,104), the greedy and SA algorithms still explore only a small percentage of the design space and result in high-quality solutions. The results indicate that the SA algorithm converges to the


optimal (or near-optimal) solution slowly; however, the SA algorithm can result in a desired solution quality by controlling the allowable number of states explored. The results reveal that for tightly constrained runtimes, the greedy algorithm can provide better results than the SA algorithm (e.g., when exploration of only 6 states (0.82% of $S$) or fewer is allowed), whereas the SA algorithm requires longer runtimes to achieve a near-optimal solution (e.g., the greedy algorithm obtained a solution within 8.3% of the optimal solution on average after exploring 1.37% of the design space, whereas the SA algorithm obtained a solution within 0.237% of the optimal solution on average after exploring 54.87% of the design space for $|S| = 729$).

Figure 6-5. Objective function value normalized to the optimal solution for a varying number of states explored for the greedy and simulated annealing algorithms for an ambient conditions monitoring application where $\omega_l = 0.6$, $\omega_t = 0.25$, $\omega_r = 0.15$, $|S| = 31{,}104$.

To verify that our algorithms are lightweight, we analyzed the execution time, energy consumption, and data memory requirements. We measured the execution time (averaged over 10,000 runs to smooth any discrepancies due to operating system overheads) for both algorithms on an Intel Xeon CPU running at 2.66 GHz [169] using the Linux/Unix time command [170]. We scaled these runtimes to the Atmel ATmega1281 microcontroller [92] running at 8 MHz. Whereas this scaling does not


provide exact absolute runtimes for the Atmel processor, the comparison of these values provides valuable insights. For each SA run, we initialized the pseudo-random number generator with a different seed using srand() [171]. We observe that the greedy algorithm explores 1 (0.14% of the design space $S$), 4 (0.55% of $S$), and 10 (1.37% of $S$) states in 0.366, 0.732, and 0.964 ms, respectively (the greedy algorithm converged after 10 iterations). The SA algorithm explores 1, 4, 10, 100 (13.72% of $S$), 421 (57.75% of $S$), and 729 (100% of $S$) states in 1.097, 1.197, 1.297, 3.39, 11.34, and 18.19 ms, respectively, for $|S| = 729$. On average, the execution time linearly increases by 0.039 and 0.023 ms per state for the greedy and SA algorithms, respectively. The SA algorithm requires 34.54% more execution time on average than the greedy algorithm (after exploring 10 states). We measured the greedy and SA algorithms' execution time for $|S| = 31{,}104$ and observed similar results as for $|S| = 729$ because both algorithms' execution time depends upon the number of states explored and not on the design space cardinality. The exhaustive search requires 29.526 ms and 2.765 seconds for $|S| = 729$ and $|S| = 31{,}104$, respectively. Compared with an exhaustive search, the greedy and SA algorithms (after exploring 10 states) require 30.63x and 22.76x less execution time, respectively, for $|S| = 729$, and require 2868.26x and 2131.84x less execution time, respectively, for $|S| = 31{,}104$. We verified our execution time analysis using clock() [171] and observed similar trends. These execution time results indicate that our online algorithms' efficacy increases as the design space cardinality increases.

We calculated the energy consumption of our algorithms $E_{algo}$ for an Atmel ATmega1281 microcontroller [92] operating at $V_p = 2.7$ V and $F_p = 8$ MHz as $E_{algo} = V_p \cdot I_{ap} \cdot T_{exe}$ where $I_{ap}$ and $T_{exe}$ denote the processor's active current and the algorithm's execution time at $(V_p, F_p)$, respectively (we observed similar trends for other processor voltage and frequency settings). Our calculations indicate that the greedy algorithm requires 5.237, 10.475, and 13.795 µJ to explore 1, 4, and 10 states, respectively, whereas the SA algorithm requires 15.698, 17.129, 18.56, 48.51, 162.28,

and 260.3 µJ for exploring 1, 4, 10, 100, 421, and 729 states, respectively, both for $|S| = 729$ and $|S| = 31{,}104$. The exhaustive search requires 0.422 and 39.567 mJ for $|S| = 729$ and $|S| = 31{,}104$, respectively. The SA algorithm requires 34.54% more energy as compared to the greedy algorithm for exploring 10 states, whereas both algorithms are highly energy-efficient as compared to exhaustive search.

Figure 6-6. Data memory requirements for exhaustive search, greedy, and simulated annealing algorithms for design space cardinalities of 8, 81, 729, and 46,656.

Fig. 6-6 depicts low data memory requirements for both algorithms for design space cardinalities of 8, 81, 729, and 46,656. We observe that the greedy algorithm requires 452, 520, 562, and 874 bytes, whereas the SA algorithm requires 508, 574, 612, and 924 bytes of storage for design space cardinalities of 8, 81, 729, and 46,656, respectively. The data memory analysis shows that the SA algorithm has comparatively larger memory requirements (9.35% on average for the analyzed design space cardinalities) than the greedy algorithm. The data memory requirements for both algorithms increase linearly as the number of tunable parameters and tunable values (and thus the design space) increases. We point out that the data memory requirements for the exhaustive search are comparable to the greedy algorithm because the exhaustive search simply evaluates the objective function value for each state in the design space.
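The energy and speedup figures reported in this analysis follow from the relation E_algo = V_p x I_ap x T_exe and from simple ratios of execution times. The following is a minimal sketch of that arithmetic, not the code used for the measurements. The function names are hypothetical, and the active current of 5.3 mA is an assumption: the text does not state I_ap, and 5.3 mA is simply the value implied by the reported energies (for example, 5.237 µJ at 2.7 V over 0.366 ms).

```python
# Processor voltage quoted in the text (volts).
V_P_VOLTS = 2.7
# ASSUMPTION: the text does not give I_ap. 5.3 mA is back-derived from the
# reported energies, e.g. 5.237 uJ / (2.7 V * 0.366 ms).
I_AP_MILLIAMPS = 5.3


def energy_microjoules(t_exe_ms, v_p=V_P_VOLTS, i_ap_ma=I_AP_MILLIAMPS):
    """E_algo = V_p * I_ap * T_exe. Volts * milliamps * milliseconds = microjoules."""
    return v_p * i_ap_ma * t_exe_ms


def speedup(t_exhaustive_ms, t_algo_ms):
    """How many times less execution time an algorithm needs than exhaustive search."""
    return t_exhaustive_ms / t_algo_ms


# Greedy execution times from the text: states explored -> milliseconds.
greedy_times_ms = {1: 0.366, 4: 0.732, 10: 0.964}
greedy_energy_uj = {n: energy_microjoules(t) for n, t in greedy_times_ms.items()}

# Exhaustive search for |S| = 729 takes 29.526 ms according to the text.
greedy_speedup_729 = speedup(29.526, greedy_times_ms[10])
```

Under the assumed current, the sketch reproduces the reported greedy energies and the 30.63x ratio for |S| = 729 to within rounding.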

However, the exhaustive search yields a high penalty in execution time because of complete design space evaluation. The figure reveals that our algorithms scale well with increased design space cardinality, and thus our proposed algorithms are appropriate for sensor nodes with a large number of tunable parameters and parameter values.

6.4 Concluding Remarks

In this chapter, we proposed a dynamic optimization methodology using greedy and simulated annealing online optimization algorithms for distributed embedded wireless sensor networks. Compared to previous work, our methodology considered an extensive embedded sensor node design space, which allowed embedded sensor nodes to more closely meet application requirements. Results revealed that our online algorithms were lightweight, required little computational, memory, and energy resources, and thus were amenable to implementation on sensor nodes with tight resource and energy constraints. Furthermore, our online algorithms could perform in situ parameter tuning to adapt to changing environmental stimuli to meet application requirements.

Future work includes further results verification using larger state spaces containing more embedded sensor node tunable parameters and tunable values. In addition, we will implement our dynamic optimization methodology on a hardware sensor node platform to gather and incorporate profiling statistics into our lightweight optimization algorithms.

CHAPTER 7
A LIGHTWEIGHT DYNAMIC OPTIMIZATION METHODOLOGY FOR DISTRIBUTED EMBEDDED WIRELESS SENSOR NETWORKS

Advancements in semiconductor technology, as predicted by Moore's law, have enabled high transistor density in a small chip area, resulting in the miniaturization of embedded systems (e.g., embedded sensor nodes). Embedded wireless sensor networks (EWSNs) are envisioned as distributed computing systems, which are proliferating in many application domains (e.g., defense, health care, surveillance systems), each with varying application requirements that can be defined by high-level application metrics (e.g., lifetime, reliability). However, the diversity of EWSN application domains makes it difficult for commercial-off-the-shelf (COTS) embedded sensor nodes to meet these application requirements.

Since COTS embedded sensor nodes are mass-produced to optimize cost, many COTS embedded sensor nodes possess tunable parameters (e.g., processor voltage and frequency, sensing frequency), whose values can be tuned for application specialization [37]. The EWSN application designers (those who design, manage, or deploy the EWSN for an application) are typically biologists, teachers, farmers, and household consumers who are experts within their application domain but have limited technical expertise. Given the large design space and operating constraints, determining appropriate parameter values (operating state) can be a daunting and/or time-consuming task for non-expert application managers. Typically, embedded sensor node vendors assign initial generic tunable parameter value settings; however, no one tunable parameter value setting is appropriate for all applications. To assist the EWSN managers with parameter tuning to best fit the application requirements, an automated parameter tuning process is required.

Parameter optimization is the process of assigning appropriate (optimal or near-optimal) tunable parameter value settings to meet application requirements. Parameter optimizations can be static or dynamic. Static optimizations assign

parameter values at deployment and these values remain fixed for the lifetime of the sensor node. One of the challenges associated with static optimizations is accurately determining the tunable parameter value settings using environmental stimuli prediction/simulation. Furthermore, static optimizations are not appropriate for applications with varying environmental stimuli. Alternatively, dynamic optimizations assign (and re-assign/change) parameter values during runtime, enabling the sensor node to adapt to changing environmental stimuli, and thus more accurately meet application requirements.

EWSN dynamic optimizations present additional challenges as compared to traditional processor or memory (cache) dynamic optimizations because sensor nodes have more tunable parameters and a larger design space. The dynamic profiling and optimization (DPOP) project aims to address these challenges and complexities associated with sensor-based system design through the use of automated optimization methods [164]. The DPOP project has gathered dynamic profiling statistics from a sensor-based system; however, the parameter optimization process has not been addressed.

In this chapter, we investigate parameter optimization using dynamic profiling data already collected from the platform. We analyze several dynamic optimization methods and evaluate algorithms that provide a good operating state without significantly depleting the battery energy. We explore a large design space with many tunable parameters and values, which provide a fine-grained design space, enabling embedded sensor nodes to more closely meet application requirements as compared to smaller, more coarse-grained design spaces. Gordon-Ross et al. [172] showed that finer-grained design spaces contain interesting design alternatives and result in increased benefits in the cache subsystem (though similar trends follow for other subsystems). However, the large design space exacerbates optimization challenges, taking into consideration an embedded sensor node's constrained memory and computational resources.

Considering the embedded sensor node's limited battery life, energy-efficient computing is always of paramount significance. Therefore, optimization algorithms that conserve energy by minimizing design space exploration to find a good operating state are critical, especially for large design spaces and highly constrained systems. Additionally, rapidly changing application requirements and environmental stimuli coupled with limited battery reserves necessitate a highly responsive and low-overhead methodology.

Our main contributions in this chapter are:

• We propose a lightweight dynamic optimization methodology that intelligently selects appropriate initial tunable parameter value settings by evaluating application requirements, the relative importance of these requirements with respect to each other, and the magnitude by which each parameter affects each requirement. This one-shot operating state obtained from appropriate initial parameter value settings provides a high-quality operating state with minimal design space exploration for highly constrained applications. Results reveal that the one-shot operating state is within 8% of the optimal operating state averaged over several different application domains and design spaces.

• We present a dynamic optimization methodology to iteratively improve the one-shot operating state to provide an optimal or near-optimal operating state for less constrained applications. Our dynamic optimization methodology combines the initial tunable parameter value settings with an intelligent exploration ordering of tunable parameter values and an exploration arrangement of tunable parameters, since some parameters are more critical for an application than others and thus should be explored first [26] (e.g., the transmission power parameter may be more critical for a lifetime-sensitive application than processor voltage).

• We architect a lightweight online greedy algorithm that leverages intelligent parameter arrangement to iteratively explore the design space, resulting in an operating state within 2% of the optimal operating state while exploring only 0.04% of the design space.

Our research has a broad impact on EWSN design and deployment. Our work enables non-expert application managers to leverage our dynamic optimization methodology to automatically tailor the embedded sensor node tunable parameters to best meet the application requirements with little design time effort. Our proposed methodology is suitable for all EWSN applications ranging from highly constrained to highly flexible applications. The one-shot operating state provides a good operating

state for highly constrained applications, whereas greedy exploration of the parameters provides improvement over the one-shot operating state to determine a high-quality operating state for less constrained applications. Our initial parameter value settings, parameter arrangement, and exploration ordering techniques are also applicable to other systems or application domains (e.g., cache tuning) with different application requirements and different tunable parameters.

The remainder of this chapter is organized as follows. Section 7.1 overviews the related work. Section 7.2 presents our dynamic optimization methodology along with the state space and objective function formulation. We describe our dynamic optimization methodology's steps and algorithms in Section 7.3. Experimental results are presented in Section 7.4. Finally, Section 7.5 concludes the chapter and discusses future research work directions.

7.1 Related Work

There exists much research in the area of dynamic optimizations [43][44][45][26], but most previous work targets the processor or memory (cache) in computer systems. There exists little previous work on EWSN dynamic optimization, which presents more challenges given a unique design space, design constraints, platform particulars, and external influences from the EWSN's operating environment.

In the area of dynamic profiling and optimization, Sridharan et al. [142] dynamically profiled an EWSN's operating environment to gather profiling statistics; however, they did not describe a methodology to leverage these profiling statistics for dynamic optimization. Shenoy et al. [173] investigated profiling methods for dynamically monitoring sensor-based platforms with respect to network traffic and energy consumption, but did not explore dynamic optimizations. In prior work, Munir et al. [37][174] proposed a Markov Decision Process (MDP)-based methodology for optimal sensor node parameter tuning to meet application requirements as a first step towards EWSN dynamic optimization. The MDP-based methodology required high computational and

memory resources for large design spaces and needed a high-performance base station node (sink node) to compute the optimal operating state for large design spaces. The operating states determined at the base station were then communicated to the other sensor nodes. The high resource requirements made the MDP-based methodology infeasible for autonomous dynamic optimization for large design spaces given the constrained resources of individual sensor nodes. Kogekar et al. [84] proposed dynamic software reconfiguration to adapt software to new operating conditions; however, their work did not consider sensor node tunable parameters and application requirements. Verma [153] investigated simulated annealing (SA) and particle swarm optimization (PSO)-based parameter tuning for EWSNs and observed that SA performed better than PSO because PSO often quickly converged to local minima. Although there exists work on optimization of EWSNs, our work uses multi-objective optimization and a fine-grained design space to find optimal (or near-optimal) sensor node operating states that meet application requirements.

One of the prominent dynamic optimization techniques for reducing energy consumption is dynamic voltage and frequency scaling (DVFS). Several previous works explored DVFS in EWSNs. Min et al. [82] utilized a voltage scheduler, running in tandem with the operating system's task scheduler, to perform DVFS based on an a priori prediction of the sensor node's workload, which resulted in a 60% reduction in energy consumption. Similarly, Yuan et al. [83] used additional transmitted data packet information to select appropriate processor voltage and frequency values. Although DVFS is a method for dynamic optimization, DVFS considers only two sensor node tunable parameters (processor voltage and frequency). In this chapter, we expand the embedded sensor node parameter tuning space, which provides a finer-grained design space, enabling embedded sensor nodes to more closely meet application requirements.

Some dynamic optimization work utilized dynamic power management for energy conservation in EWSNs. Wang et al. [175] proposed a strategy for optimizing mobile sensor node placement to meet coverage and energy requirements. Their strategy utilized dynamic power management to optimize the sensor nodes' sleep state transitions for energy conservation. Ning et al. [176] presented a link layer dynamic optimization approach for energy conservation in EWSNs by minimizing the idle listening time. Their approach utilized traffic statistics to optimally control the receiver sleep interval. In our work, we incorporate energy conservation by switching the sensors, processors, and transceivers to low-power, idle modes when these components are not actively sensing, processing, and communicating, respectively.

Even though there exists work on EWSN optimizations, dynamic optimization requires further research. Specifically, there is a need for lightweight dynamic optimization methodologies for sensor node parameter tuning considering a sensor node's limited energy and storage. Furthermore, sensor node tunable parameter arrangement and exploration order require further investigation. Our work contributes to the dynamic optimization of EWSNs by proposing a lightweight dynamic optimization methodology for EWSNs in addition to techniques for arranging a sensor node's tunable parameters and ordering their exploration.

7.2 Dynamic Optimization Methodology

In this section, we give an overview of our dynamic optimization methodology along with the state space and objective function formulation for the methodology.

7.2.1 Overview

Fig. 7-1 depicts our dynamic optimization methodology for distributed EWSNs. EWSN designers evaluate application requirements and capture these requirements as high-level application metrics (e.g., lifetime, throughput, reliability) and associated weight factors. The weight factors signify the importance of application metrics with respect to each other. The sensor nodes use application metrics and weight factors

to determine an appropriate operating state (tunable parameter value settings) by leveraging an application metrics estimation model. The application metrics estimation model estimates high-level application metrics from low-level sensor node parameters and sensor node hardware-specific internals (Section 3.1 discusses our application metrics estimation model in detail).

Figure 7-1. A lightweight dynamic optimization methodology per sensor node for EWSNs.

Fig. 7-1 shows the per sensor node dynamic optimization process (encompassed by the dashed circle), which is orchestrated by the dynamic optimization controller. The process consists of two operating modes: the one-shot mode, wherein the sensor node operating state is directly determined by initial parameter value settings, and the improvement mode, wherein the operating state is iteratively improved using an online optimization algorithm. The dynamic optimization process consists of three steps. In the first step, corresponding to the one-shot mode, the dynamic optimization controller intelligently determines the initial parameter value settings (operating state) and exploration order (ascending or descending), which is critical in reducing the number of states explored in the third step. In the one-shot mode, the dynamic optimization

process is complete and the sensor node transitions directly to the operating state specified by the initial parameter value settings. The second step corresponds to the improvement mode, which determines the parameter arrangement based on application metric weight factors (e.g., explore processor voltage, then frequency, then sensing frequency). This parameter arrangement reduces the design space exploration time using an optimization algorithm in the third step to determine a good quality operating state. The third step corresponds to the improvement mode and invokes an online optimization algorithm for parameter exploration to iteratively improve the operating state to more closely meet application requirements as compared to the one-shot's operating state. The online optimization algorithm leverages the intelligent initial parameter value settings, exploration order, and parameter arrangement.

A dynamic profiler records profiling statistics (e.g., processor voltage, wireless channel condition, radio transmission power) given the current operating state and environmental stimuli and passes these profiling statistics to the dynamic optimization controller. The dynamic optimization controller processes the profiling statistics to determine whether the current operating state meets the application requirements. If the application requirements are not met, the dynamic optimization controller reinvokes the dynamic optimization process to determine a new operating state. This feedback process continues to ensure that the application requirements are best met under changing environmental stimuli. We point out that our current work describes the dynamic optimization methodology; however, incorporation of profiling statistics to provide feedback is part of our future work.

7.2.2 State Space

The state space $S$ for our dynamic optimization methodology given $N$ tunable parameters is defined as

$$S = P_1 \times P_2 \times \cdots \times P_N \qquad (7)$$

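where $P_i$ denotes the state space for tunable parameter $i$, $\forall\, i \in \{1, 2, \ldots, N\}$, and $\times$ denotes the Cartesian product. Each tunable parameter's state space $P_i$ consists of $n$ values

$$P_i = \{p_{i1}, p_{i2}, p_{i3}, \ldots, p_{in}\} : |P_i| = n \qquad (7)$$

where $|P_i|$ denotes the tunable parameter $i$'s state space cardinality (the number of tunable values in $P_i$). $S$ is a set of $N$-tuples formed by taking one tunable parameter value from each tunable parameter. A single $N$-tuple $s \in S$ is given as

$$s = (p_{1y}, p_{2y}, \ldots, p_{Ny}) : p_{iy} \in P_i,\ \forall\, i \in \{1, 2, \ldots, N\},\ y \in \{1, 2, \ldots, n\} \qquad (7)$$

Each $N$-tuple represents a sensor node operating state. We point out that some $N$-tuples in $S$ may not be feasible (such as invalid combinations of processor voltage and frequency) and can be regarded as do-not-care tuples.

7.2.3 Optimization Objective Function

The sensor node dynamic optimization problem can be formulated as

$$\max f(s) \quad \text{s.t.}\ s \in S \qquad (7)$$

where $f(s)$ represents the objective function and captures application metrics and weight factors, and is given as

$$f(s) = \sum_{k=1}^{m} \omega_k f_k(s) \quad \text{s.t.}\ s \in S,\quad \omega_k \ge 0,\ k = 1, 2, \ldots, m,\quad \omega_k \le 1,\ k = 1, 2, \ldots, m,\quad \sum_{k=1}^{m} \omega_k = 1, \qquad (7)$$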

where $f_k(s)$ and $\omega_k$ denote the objective function and weight factor for the $k$th application metric, respectively, given that there are $m$ application metrics.

Figure 7-2. Lifetime objective function $f_l(s)$.

For our dynamic optimization methodology, we consider three application metrics ($m = 3$): lifetime, throughput, and reliability, whose objective functions are represented by $f_l(s)$, $f_t(s)$, and $f_r(s)$, respectively. We define $f_l(s)$ (Fig. 7-2) using the piecewise linear function

$$f_l(s) = \begin{cases} 1, & s_l \ge \beta_l \\[4pt] C_{U_l} + \dfrac{(C_{\beta_l} - C_{U_l})(s_l - U_l)}{(\beta_l - U_l)}, & U_l \le s_l < \cdots \\[4pt] \vdots \end{cases}$$
the minimum and maximum acceptable/desired values of application metrics). The $f_t(s)$ and $f_r(s)$ can be defined similar to Equation 7.

The objective function characterization enables the reward/gain calculation from operating in a given state based on the high-level metric values offered by the state. Although different characterizations of objective functions result in different reward values from different states, our dynamic optimization methodology selects a high-quality operating state from the design space to maximize the given objective function value. We consider piecewise linear objective functions as a typical example from the possible objective functions (e.g., linear, piecewise linear, non-linear) to illustrate our dynamic optimization methodology, though other objective function characterizations work equally well for our methodology.

7.3 Algorithms for Dynamic Optimization Methodology

In this section, we describe our dynamic optimization methodology's three steps (Fig. 7-1) and associated algorithms.

7.3.1 Initial Tunable Parameter Value Settings and Exploration Order

The first step of our dynamic optimization methodology determines initial tunable parameter value settings and exploration order (ascending or descending). These initial tunable parameter value settings result in a high-quality operating state in one shot, hence the name one-shot mode (Fig. 7-1). The algorithm calculates the application metric objective function values for the first and last values in the set of tunable values for each tunable parameter while other tunable parameters are set to an arbitrary initial setting (either first or last value). We point out that the tunable values for a tunable parameter can be arranged in ascending order (e.g., for processor voltage $V_p = \{2.7, 3.3, 4\}$ (volts)). This objective function value calculation determines the effectiveness of setting a particular tunable parameter value in meeting the desired objective (e.g., lifetime). The tunable parameter setting that gives a higher objective function value is selected as the initial parameter value for that tunable parameter. The exploration order

for that tunable parameter is set to descending if the last value in the set of tunable values (e.g., $V_p = 4$ in our previous example) gives a higher objective function value, or ascending otherwise. This exploration order selection helps in reducing design space exploration for a greedy-based optimization algorithm (step 3), which stops exploring a tunable parameter as soon as a tunable parameter setting gives a lower objective function value than the initial setting. This initial parameter value setting and exploration order determination procedure is then repeated for all other tunable parameters and application metrics.

Input: $f(s)$, $N$, $n$, $m$, $P$
Output: Initial tunable parameter value settings and exploration order
 1  for $k \leftarrow 1$ to $m$ do
 2      for $P_i \leftarrow P_1$ to $P_N$ do
 3          $f^k_{p_{i1}} \leftarrow$ $k$th metric objective function value when parameter setting is $\{P_i = p_{i1}, P_j = P_{j0}, \forall\, i \ne j\}$;
 4          $f^k_{p_{in}} \leftarrow$ $k$th metric objective function value when parameter setting is $\{P_i = p_{in}, P_j = P_{j0}, \forall\, i \ne j\}$;
 5          $\delta f^k_{P_i} \leftarrow f^k_{p_{in}} - f^k_{p_{i1}}$;
 6          if $\delta f^k_{P_i} \ge 0$ then
 7              explore $P_i$ in descending order;
 8              $P^k_d[i] \leftarrow$ descending;
 9              $P^k_0[i] \leftarrow p^k_{in}$;
10          else
11              explore $P_i$ in ascending order;
12              $P^k_d[i] \leftarrow$ ascending;
13              $P^k_0[i] \leftarrow p^k_{i1}$;
14          end
15      end
16  end
    return $P^k_d$, $P^k_0$, $\forall\, k \in \{1, \ldots, m\}$
Algorithm 4: Initial tunable parameter value settings and exploration order algorithm.

Algorithm 4 describes our technique to determine initial tunable parameter value settings and exploration order (first step of our dynamic optimization methodology). The algorithm takes as input the objective function $f(s)$, the number of tunable parameters $N$, the number of values for each tunable parameter $n$, the number of application metrics $m$, and $P$, where $P$ represents a vector containing the tunable parameters, $P = \{P_1, P_2, \ldots, P_N\}$. For each application metric $k$, the algorithm calculates vectors

$P^k_0$ and $P^k_d$ (where $d$ denotes the exploration direction (ascending or descending)), which store the initial value settings and exploration order, respectively, for the tunable parameters. The algorithm determines the $k$th application metric objective function values $f^k_{p_{i1}}$ and $f^k_{p_{in}}$ where the parameter being explored $P_i$ is assigned its first $p_{i1}$ and last $p_{in}$ tunable values, respectively, and the rest of the tunable parameters $P_j$, $\forall\, j \ne i$, are assigned initial values (lines 3-4). $\delta f^k_{P_i}$ stores the difference between $f^k_{p_{in}}$ and $f^k_{p_{i1}}$. If $\delta f^k_{P_i} \ge 0$, $p_{in}$ results in a greater (or equal when $\delta f^k_{P_i} = 0$) objective function value as compared to $p_{i1}$ for parameter $P_i$ (i.e., the objective function value decreases as the parameter value decreases). Therefore, to reduce the number of states explored while considering that the greedy algorithm (Section 7.3.3) stops exploring a tunable parameter if a tunable parameter's value yields a comparatively lower objective function value, $P_i$'s exploration order must be descending (lines 6-8). The algorithm assigns $p_{in}$ as the initial value of $P_i$ for the $k$th application metric (line 9). If $\delta f^k_{P_i} < 0$, the algorithm assigns the exploration order as ascending for $P_i$ and $p_{i1}$ as the initial value setting of $P_i$ (lines 11-13). This $\delta f^k_{P_i}$ calculation procedure is repeated for all $m$ application metrics and all $N$ tunable parameters (lines 1-16).

7.3.2 Parameter Arrangement

Depending on the application metric weight factors, some parameters are more critical to meeting application requirements than other parameters. For example, sensing frequency is a critical parameter for applications with a high responsiveness weight factor and therefore, sensing frequency should be explored first. In this subsection, we describe a technique for parameter arrangement such that parameters are explored in an order characterized by the parameters' impact on application metrics based on relative weight factors. This parameter arrangement technique (step 2) is part of the improvement mode, which is suitable for relatively less constrained applications that would benefit from a higher quality operating state than the one-shot mode's operating state (Fig. 7-1).
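The first step (Algorithm 4, Section 7.3.1), whose per-metric differences also drive the arrangement step, can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the implementation evaluated in this chapter. The function name `initial_settings` is hypothetical, each metric objective is assumed to be a callable that takes a full parameter setting, every parameter's values are assumed to be listed in ascending order, and the arbitrary initial setting for the parameters not being probed is taken to be their first value.

```python
def initial_settings(metric_fns, params):
    """Sketch of Algorithm 4 (hypothetical helper, not the thesis code).

    metric_fns: list of m callables, f_k(setting) -> float.
    params: list of N lists of tunable values, each in ascending order.
    Returns (init, order, delta), each indexed as [k][i].
    """
    # ASSUMPTION: parameters not being probed sit at their first value.
    base = [values[0] for values in params]
    init, order, delta = [], [], []
    for f_k in metric_fns:                      # line 1: each application metric
        init_k, order_k, delta_k = [], [], []
        for i, values in enumerate(params):     # line 2: each tunable parameter
            first = list(base)
            first[i] = values[0]                # P_i = p_i1 (line 3)
            last = list(base)
            last[i] = values[-1]                # P_i = p_in (line 4)
            d = f_k(last) - f_k(first)          # delta f (line 5)
            if d >= 0:                          # last value is at least as good
                order_k.append("descending")
                init_k.append(values[-1])
            else:                               # first value is better
                order_k.append("ascending")
                init_k.append(values[0])
            delta_k.append(d)
        init.append(init_k)
        order.append(order_k)
        delta.append(delta_k)
    return init, order, delta


# Toy usage with two made-up metric objectives (illustrative only).
toy_params = [[2.7, 3.3, 4.0], [4, 6, 8]]       # e.g. voltage, frequency
toy_lifetime = lambda s: -s[0] - 0.1 * s[1]     # lower settings favor lifetime
toy_throughput = lambda s: float(s[1])          # higher frequency favors throughput
toy_init, toy_order, toy_delta = initial_settings(
    [toy_lifetime, toy_throughput], toy_params)
```

The returned differences are kept because the arrangement step ranks parameters by their magnitude.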

The parameter arrangement step determines an arrangement for the tunable parameters corresponding to each application metric, which dictates the order in which the parameters will be explored. This arrangement is based on the difference between the application metric's objective function values corresponding to the first and last values of the tunable parameters, which is calculated in step 1 (i.e., the tunable parameter that gives the highest difference in an application metric's objective function values is the first parameter in the arrangement vector for that application metric). For an arrangement that considers all application metrics, the tunable parameters' order is set in accordance with application metrics' weight factors such that the tunable parameters having a greater effect on application metrics with higher weight factors are situated before parameters having a lesser effect on application metrics with lower weight factors in the arrangement. We point out that the effect of the tunable parameters on an application metric is determined from the objective function value calculations as described in step 1. The arrangement that considers all application metrics selects the first few tunable parameters corresponding to each application metric, starting from the application metric with the highest weight factor such that no parameters are repeated in the final intelligent parameter arrangement. For example, if processor voltage is amongst the first few tunable parameters corresponding to two application metrics, then the processor voltage setting corresponding to the application metric with the greater weight factor is selected, whereas the processor voltage setting corresponding to the application metric with the lower weight factor is ignored in the final intelligent parameter arrangement. Step 3 (online optimization algorithm) uses this intelligent parameter arrangement for further design space exploration. The mathematical details of the parameter arrangement step are as follows.

Our parameter arrangement technique is based on calculations performed in Algorithm 4. We define

$$\nabla f_P = \{\nabla f^1_P, \nabla f^2_P, \ldots, \nabla f^m_P\} \qquad (7)$$

where $\nabla f_P$ is a vector containing $\nabla f^k_P$, $\forall\, k \in \{1, 2, \ldots, m\}$, arranged in descending order by their respective values and is given as

$$\nabla f^k_P = \{\delta f^k_{P_1}, \delta f^k_{P_2}, \ldots, \delta f^k_{P_N}\} : |\delta f^k_{P_i}| \ge |\delta f^k_{P_{i+1}}|,\ \forall\, i \in \{1, 2, \ldots, N-1\} \qquad (7)$$

The tunable parameter arrangement vector $P^k$ corresponding to $\nabla f^k_P$ (one-to-one correspondence) is given by

$$P^k = \{P^k_1, P^k_2, \ldots, P^k_N\},\ \forall\, k \in \{1, 2, \ldots, m\} \qquad (7)$$

An intelligent parameter arrangement $\widehat{P}$ must consider all application metrics' weight factors with higher importance given to the higher weight factors, that is

$$\widehat{P} = \{P^1_1, \ldots, P^1_{l_1}, P^2_1, \ldots, P^2_{l_2}, P^3_1, \ldots, P^3_{l_3}, \ldots, P^m_1, \ldots, P^m_{l_m}\} \qquad (7)$$

where $l_k$ denotes the number of tunable parameters taken from $P^k$, $\forall\, k \in \{1, 2, \ldots, m\}$, such that $\sum_{k=1}^{m} l_k = N$. Our technique allows taking more tunable parameters from parameter arrangement vectors corresponding to higher weight factor application metrics: $l_k \ge l_{k+1}$, $\forall\, k \in \{1, 2, \ldots, m-1\}$. In Equation 7, $l_1$ tunable parameters are taken from vector $P^1$, then $l_2$ from vector $P^2$, and so on to $l_m$ from vector $P^m$ such that $\{P^k_1, \ldots, P^k_{l_k}\} \cap \{P^{k-1}_1, \ldots, P^{k-1}_{l_{k-1}}\} = \emptyset$, $\forall\, k \in \{2, 3, \ldots, m\}$. In other words, we select those tunable parameters from parameter arrangement vectors corresponding to the lower weight factors that are not already selected from parameter arrangement vectors corresponding to the higher weight factors (i.e., $\widehat{P}$ comprises disjoint or non-overlapping tunable parameters corresponding to each application metric).

In the situation where weight factor $\omega_1$ is much greater than all other weight factors, an intelligent parameter arrangement $\widetilde{P}$ would correspond to the parameter arrangement for the application metric with weight factor $\omega_1$

$$\widetilde{P} = P^1 = \{P^1_1, P^1_2, \ldots, P^1_N\} \iff \omega_1 \gg \omega_q,\ \forall\, q \in \{2, 3, \ldots, m\} \qquad (7)$$
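The arrangement defined above can be sketched as a sort followed by a merge. The following is a minimal illustration, not the implementation evaluated in this chapter. The function name `arrange_parameters` is hypothetical. The text does not specify how the counts l_k are chosen, so the sketch assumes they are supplied by the caller, listed from the highest-weight metric to the lowest, and it only checks the two stated constraints (they sum to N and are non-increasing).

```python
def arrange_parameters(deltas, weights, counts):
    """Sketch of the parameter arrangement step (hypothetical helper).

    deltas: deltas[k][i] is the objective difference for metric k and
        parameter i, as produced in the first step.
    weights: weight factor of each metric.
    counts: ASSUMED caller-supplied l_k values, ordered from the
        highest-weight metric to the lowest.
    Returns (per_metric, arrangement); arrangement holds
    (parameter_index, metric_index) pairs in exploration order.
    """
    n_params = len(deltas[0])
    non_increasing = all(a >= b for a, b in zip(counts, counts[1:]))
    if sum(counts) != n_params or not non_increasing:
        raise ValueError("counts must sum to N and be non-increasing")

    # Per-metric arrangement: parameters sorted by |delta|, largest first.
    per_metric = [
        sorted(range(n_params), key=lambda i: -abs(d[i])) for d in deltas
    ]
    # Visit metrics from the highest weight factor to the lowest.
    metric_order = sorted(range(len(weights)), key=lambda k: -weights[k])

    arrangement, chosen = [], set()
    for count, k in zip(counts, metric_order):
        taken = 0
        for i in per_metric[k]:
            if taken == count:
                break
            if i not in chosen:                 # keep the selections disjoint
                chosen.add(i)
                arrangement.append((i, k))
                taken += 1
    return per_metric, arrangement


# Toy usage: three parameters, two metrics, second metric weighted higher.
toy_deltas = [[0.1, -0.5, 0.3], [0.4, 0.2, -0.6]]
toy_per_metric, toy_arrangement = arrange_parameters(
    toy_deltas, weights=[0.3, 0.7], counts=[2, 1])
```

Keeping the metric index alongside each parameter makes it straightforward to look up that parameter's initial value and exploration order from the matching per-metric vectors.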

The initial parameter value vector $\widehat{P}_0$ and the exploration order (ascending or descending) vector $\widehat{P}_d$ corresponding to $\widehat{P}$ (Equation 7) can be determined from $\widehat{P}$ (Equation 7), $P^k_d$, and $P^k_0$, $\forall\, k \in \{1, \ldots, m\}$ (Algorithm 4), by examining the tunable parameter from $\widehat{P}$ and determining the tunable parameter's initial value setting from $P^k_0$ and exploration order from $P^k_d$.

7.3.3 Online Optimization Algorithm

Step three of our dynamic optimization methodology, which also belongs to the improvement mode, iteratively improves the one-shot's operating state. This step leverages information from steps one and two, and uses a greedy optimization algorithm for tunable parameter exploration in an effort to determine a better operating state than the one obtained from step one (Section 7.3.1). The greedy algorithm explores the tunable parameters in the order determined in step 2. The greedy algorithm stops exploring a tunable parameter as soon as a tunable parameter setting yields a lower objective function value as compared to the previous tunable parameter setting for that tunable parameter, hence the name greedy. This greedy approach helps in reducing design space exploration to determine an operating state. Even though we propose a greedy algorithm for design space exploration, any other algorithm can be used in step three.

Algorithm 5 depicts our online greedy optimization algorithm, which leverages the initial parameter value settings (Section 7.3.1), parameter value exploration order (Section 7.3.1), and parameter arrangement (Section 7.3.2). The algorithm takes as input the objective function $f(s)$, the number of tunable parameters $N$, the number of values for each tunable parameter $n$, the intelligent tunable parameter arrangement vector $\widehat{P}$, the tunable parameters' initial value vector $\widehat{P}_0$, and the tunable parameters' exploration order (ascending or descending) vector $\widehat{P}_d$. The algorithm initializes state $\zeta$ from $\widehat{P}_0$ (line 1) and $f_{best}$ with $\zeta$'s objective function value (line 2). The algorithm explores each parameter in $\widehat{P}_i$ where $\widehat{P}_i \in \widehat{P}$ (Equation 7) in ascending or descending order

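as given by $\widehat{P}_d$ (lines 3-4). For each tunable parameter $\widehat{P}_i$ (line 5), the algorithm assigns $f_{temp}$ the objective function value from the current state $\zeta$ (line 6). The current state $\zeta \in S$ denotes tunable parameter value settings and can be written as

$$\zeta = \{P_i = p_{ix}\} \cup \{P_j,\ \forall\, j \ne i\},\quad i, j \in \{1, 2, \ldots, N\} \qquad (7)$$

where $p_{ix} : x \in \{1, 2, \ldots, n\}$ denotes the parameter value corresponding to the tunable parameter $P_i$ being explored, and the set $P_j$, $\forall\, j \ne i$, denotes the parameter value settings other than the current tunable parameter $P_i$ being explored and is given by

$$P_j = \begin{cases} P_{j0}, & \text{if } P_j \text{ not explored before},\ \forall\, j \ne i \\ P_{jb}, & \text{if } P_j \text{ explored before},\ \forall\, j \ne i \end{cases} \qquad (7)$$

where $P_{j0}$ denotes the initial value of the parameter as given by $\widehat{P}_0$ and $P_{jb}$ denotes the best found value of $P_j$ after exploring $P_j$ (lines 5-13 of Algorithm 5).

Input: $f(s)$, $N$, $n$, $\widehat{P}$, $\widehat{P}_0$, $\widehat{P}_d$
Output: Sensor node state that maximizes $f(s)$ and the corresponding $f(s)$ value
 1  $\zeta \leftarrow$ initial tunable parameter value settings from $\widehat{P}_0$;
 2  $f_{best} \leftarrow$ solution from initial parameter settings;
 3  for $\widehat{P}_i \leftarrow \widehat{P}_1$ to $\widehat{P}_N$ do
 4      explore $\widehat{P}_i$ in ascending or descending order as suggested by $\widehat{P}_d$;
 5      foreach $\widehat{P}_i = \{\widehat{p}_{i1}, \widehat{p}_{i2}, \ldots, \widehat{p}_{in}\}$ do
 6          $f_{temp} \leftarrow$ current state $\zeta$ solution;
 7          if $f_{temp} > f_{best}$ then
 8              $f_{best} \leftarrow f_{temp}$;
 9              $\xi \leftarrow \zeta$;
10          else
11              break;
12          end
13      end
14  end
    return $\xi$, $f_{best}$
Algorithm 5: Online greedy optimization algorithm for tunable parameter exploration.

If $f_{temp} > f_{best}$ (the objective function value increases), $f_{temp}$ is assigned to $f_{best}$ and the state $\zeta$ is assigned to state $\xi$ (lines 7-9). If $f_{temp} \le f_{best}$, the algorithm stops exploring the current parameter $\widehat{P}_i$ and starts exploring the next tunable parameter (lines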

10-12). The algorithm returns the best found objective function value $f_{best}$ and the state $\xi$ corresponding to $f_{best}$.

7.3.4 Computational Complexity

The computational complexity for our dynamic optimization methodology is $O(Nm \log N + Nn)$, which comprises the intelligent initial parameter value settings and exploration ordering (Algorithm 4) $O(Nm)$, parameter arrangement $O(Nm \log N)$ (sorting $\nabla f^k_P$ (Equation 7) contributes the $N \log N$ factor) (Section 7.3.2), and the online optimization algorithm for parameter exploration (Algorithm 5) $O(Nn)$. Assuming that the number of tunable parameters $N$ is larger than the number of a parameter's tunable values $n$, the computational complexity of our methodology can be given as $O(Nm \log N)$. This complexity reveals that our proposed methodology is lightweight and is thus feasible for implementation on sensor nodes with tight resource constraints.

7.4 Experimental Results

In this section, we describe the experimental setup and results for three application domains: security/defense, health care, and ambient conditions monitoring. The results include the percentage improvements attained by our initial tunable parameter settings (one-shot operating state) over other alternative initial value settings, and a comparison of our greedy algorithm (which leverages intelligent initial parameter settings, exploration order, and parameter arrangement) for design space exploration with other variants of a greedy algorithm and SA. This section also presents an execution time and data memory analysis to verify the complexity of our dynamic optimization methodology.

7.4.1 Experimental Setup

Our experimental setup is based on the Crossbow IRIS mote platform [95] with a battery capacity of 2000 mA-h using two AA alkaline batteries. The IRIS mote platform integrates an Atmel ATmega1281 microcontroller [92], an MTS400 sensor board [90] with Sensirion SHT1x temperature and humidity sensors [91], and an Atmel AT86RF230 low-power 2.4 GHz transceiver [93]. Table 7-1 shows the sensor node hardware specific

values, corresponding to the IRIS mote platform, which are used by the application metrics estimation model [95][92][91][93].

Table 7-1. Crossbow IRIS mote platform hardware specifications.

Notation        Description                          Value
$V_b$           Battery voltage                      3.6 V
$C_b$           Battery capacity                     2000 mA-h
$N_b$           Processing instructions per bit      5
$R^b_{sen}$     Sensing resolution bits              24
$V_t$           Transceiver voltage                  3 V
$R_{tx}$        Transceiver data rate                250 kbps
$I^{rx}_t$      Transceiver receive current          15.5 mA
$I^s_t$         Transceiver sleep current            20 nA
$V_s$           Sensing board voltage                3 V
$I^m_s$         Sensing measurement current          550 µA
$t^m_s$         Sensing measurement time             55 ms
$I_s$           Sensing sleep current                0.3 µA

We analyze six tunable parameters: processor voltage $V_p$, processor frequency $F_p$, sensing frequency $F_s$, packet size $P_s$, packet transmission interval $P_{ti}$, and transceiver transmission power $P_{tx}$. In order to explore the fidelity of our methodology across small and large design spaces, we consider two design space cardinalities (number of states in the design space): $|S| = 729$ and $|S| = 31{,}104$. The tunable parameters for $|S| = 729$ are: $V_p = \{2.7, 3.3, 4\}$ (volts), $F_p = \{4, 6, 8\}$ (MHz) [92], $F_s = \{1, 2, 3\}$ (samples per second) [91], $P_s = \{41, 56, 64\}$ (bytes), $P_{ti} = \{60, 300, 600\}$ (seconds), and $P_{tx} = \{-17, -3, 1\}$ (dBm) [93]. The tunable parameters for $|S| = 31{,}104$ are: $V_p = \{1.8, 2.7, 3.3, 4, 4.5, 5\}$ (volts), $F_p = \{2, 4, 6, 8, 12, 16\}$ (MHz) [92], $F_s = \{0.2, 0.5, 1, 2, 3, 4\}$ (samples per second) [91], $P_s = \{32, 41, 56, 64, 100, 127\}$ (bytes), $P_{ti} = \{10, 30, 60, 300, 600, 1200\}$ (seconds), and $P_{tx} = \{-17, -3, 1, 3\}$ (dBm) [93]. All state space tuples are feasible for $|S| = 729$, whereas $|S| = 31{,}104$ contains 7,779 infeasible state space tuples because not all $V_p$ and $F_p$ pairs are feasible.
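The two design space cardinalities follow directly from the Cartesian product of the tunable value sets listed above. The sketch below builds both spaces and counts their states. It is only an illustration: the dictionary and function names are hypothetical, and no feasibility filter is applied, because the text reports the number of infeasible tuples for the larger space without giving the rule that identifies them.

```python
from itertools import product
from math import prod

# Tunable values as listed in the text for |S| = 729.
SMALL_SPACE = {
    "Vp": [2.7, 3.3, 4.0],           # volts
    "Fp": [4, 6, 8],                 # MHz
    "Fs": [1, 2, 3],                 # samples per second
    "Ps": [41, 56, 64],              # bytes
    "Pti": [60, 300, 600],           # seconds
    "Ptx": [-17, -3, 1],             # dBm
}

# Tunable values as listed in the text for |S| = 31,104.
LARGE_SPACE = {
    "Vp": [1.8, 2.7, 3.3, 4.0, 4.5, 5.0],
    "Fp": [2, 4, 6, 8, 12, 16],
    "Fs": [0.2, 0.5, 1, 2, 3, 4],
    "Ps": [32, 41, 56, 64, 100, 127],
    "Pti": [10, 30, 60, 300, 600, 1200],
    "Ptx": [-17, -3, 1, 3],
}


def cardinality(space):
    """|S| is the product of the per-parameter value counts."""
    return prod(len(values) for values in space.values())


def states(space):
    """Yield every operating state as a dict, one value per parameter.

    NOTE: infeasible Vp/Fp combinations are NOT filtered out here.
    """
    names = list(space)
    for combo in product(*space.values()):
        yield dict(zip(names, combo))


small_count = sum(1 for _ in states(SMALL_SPACE))
```

An exhaustive search, as used for the optimal reference solution, would evaluate the objective function once for each state yielded by `states`.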

Table 7-2. Desirable minimum $L$, desirable maximum $U$, acceptable minimum $\alpha$, and acceptable maximum $\beta$ objective function parameter values for a security/defense (defense) system, health care, and an ambient conditions monitoring application. One lifetime unit = 5 days, one throughput unit = 20 kbps, one reliability unit = 0.05.

Notation     Defense       Health Care   Ambient Monitoring
$L_l$        8 units       12 units      6 units
$U_l$        30 units      32 units      40 units
$\alpha_l$   1 unit        2 units       3 units
$\beta_l$    36 units      40 units      60 units
$L_t$        20 units      19 units      15 units
$U_t$        34 units      36 units      29 units
$\alpha_t$   0.5 units     0.4 units     0.05 units
$\beta_t$    45 units      47 units      35 units
$L_r$        14 units      12 units      11 units
$U_r$        19.8 units    17 units      16 units
$\alpha_r$   10 units      8 units       6 units
$\beta_r$    20 units      20 units      20 units

In order to evaluate the robustness of our methodology across different applications with varying application metric weight factors, we model three sample application domains: a security/defense system, a health care application, and an ambient conditions monitoring application. We assign application-specific values for the desirable minimum $L$, desirable maximum $U$, acceptable minimum $\alpha$, and acceptable maximum $\beta$ objective function parameter values for the application metrics (Section 7.2.3), as shown in Table 7-2. We specify the objective function parameters as multiples of base units for lifetime, throughput, and reliability. We assume that one lifetime unit is equal to 5 days, one throughput unit is equal to 20 kbps, and one reliability unit is equal to 0.05 (percentage of error-free packet transmissions).

In order to evaluate our one-shot dynamic optimization solution quality, we compare the solution from the one-shot initial parameter settings $\widehat{P_0}$ with the solutions obtained from the following four potential initial parameter value settings (although any feasible n-tuple $s \in S$ can be taken as the initial parameter settings):

- $I_1$ assigns the first parameter value for each tunable parameter, that is, $I_1 = p_{i_1}, \forall i \in \{1, 2, \ldots, N\}$.
- $I_2$ assigns the last parameter value for each tunable parameter, that is, $I_2 = p_{i_n}, \forall i \in \{1, 2, \ldots, N\}$.
- $I_3$ assigns the middle parameter value for each tunable parameter, that is, $I_3 = \lfloor p_{i_n}/2 \rfloor, \forall i \in \{1, 2, \ldots, N\}$.
- $I_4$ assigns a random value for each tunable parameter, that is, $I_4 = p_{i_q}$ with q = rand() % n, $\forall i \in \{1, 2, \ldots, N\}$, where rand() denotes a function that generates a random/pseudo-random integer and % denotes the modulus operator.

Although we analyzed our methodology for the IRIS mote platform, three application domains, and two design spaces, our algorithms are equally applicable to any platform, application domain, and design space. Since the constant assignments for the minimum and maximum desirable values and weight factors are application-dependent and designer-specified, appropriate assignments can be made for any application given the application's specific requirements. Finally, since the number of tunable parameters and the parameters' possible/allowed tunable values dictate the size of the design space, we evaluate both large and small design spaces, but a design space of any size could be evaluated by varying the number of tunable parameters and associated values.

7.4.2 Results

In this subsection, we present results for the percentage improvements attained by our dynamic optimization methodology over other optimization methodologies. We implemented our dynamic optimization methodology in C++. To evaluate the effectiveness of our one-shot solution, we compare the one-shot solution's results with four alternative initial parameter settings (Section 7.4.1). We normalize the objective function values corresponding to the operating states attained by our dynamic optimization methodology with respect to the optimal solution obtained using an exhaustive search. We compare the relative complexity of our one-shot dynamic optimization methodology with two other dynamic optimization methodologies, which leverage greedy- and SA-based algorithms for design space exploration [41]. Although

for brevity we present results for only a subset of the initial parameter value settings, application domains, and design spaces, we observed that results for extensive application domains, design spaces, and initial parameter settings revealed similar trends.

Table 7-3. Percentage improvements attained by $\widehat{P_0}$ over other initial parameter settings for $|S|$ = 729 and $|S|$ = 31,104.

                                 $|S|$ = 729                      $|S|$ = 31,104
Application Domain               $I_1$   $I_2$   $I_3$   $I_4$    $I_1$   $I_2$   $I_3$   $I_4$
Security/Defense System          155%    10%     57%     29%      148%    0.3%    10%     92%
Health Care                      78%     7%      31%     11%      73%     0.3%    10%     45%
Ambient Conditions Monitoring    52%     6%      20%     7%       15%     -7%     -12%    18%

7.4.2.1 Percentage improvements of one-shot over other initial parameter settings

Table 7-3 depicts the percentage improvements attained by the one-shot parameter settings $\widehat{P_0}$ over other parameter settings for different application domains and weight factors. We point out that different weight factors could result in different percentage improvements; however, we observed similar trends for other weight factors. Table 7-3 shows that the one-shot initial parameter settings can result in an improvement of as much as 155% as compared to other initial value settings. We observe that some arbitrary settings may give a comparable or even a better solution for a particular application domain, set of application metric weight factors, and design space cardinality, but such an arbitrary setting would not scale to other application domains, application metric weight factors, and design space cardinalities. For example, $I_3$ obtains a 12% better quality solution than $\widehat{P_0}$ for the ambient conditions monitoring application for $|S|$ = 31,104, but yields a 10% lower quality solution for the security/defense and health care applications for $|S|$ = 31,104, and a 57%, 31%, and 20% lower quality solution than $\widehat{P_0}$ for the security/defense, health care, and ambient conditions monitoring applications, respectively, for $|S|$ = 729. The percentage improvement attained by $\widehat{P_0}$ over all

application domains and design spaces is 33% on average. Our one-shot methodology is the first approach (to the best of our knowledge) to intelligent initial tunable parameter value settings for sensor nodes to provide a good quality operating state, as arbitrary initial parameter value settings typically result in a poor operating state. Results reveal that on average $\widehat{P_0}$ gives a solution within 8% of the optimal solution obtained from an exhaustive search [88].

7.4.2.2 Comparison of one-shot with greedy variants- and SA-based dynamic optimization methodologies

In order to investigate the effectiveness of our one-shot methodology, we compare the one-shot solution's quality (indicated by the attained objective function value) with two other dynamic optimization methodologies, which leverage SA-based and greedy-based (denoted by $GD^{asc}$, where asc stands for ascending order of parameter exploration) exploration of the design space. We assign the initial parameter value settings for the greedy- and SA-based methodologies as $I_1$ and $I_4$, respectively. Note that, for brevity, we present results for $I_1$ and $I_4$ only; however, other initial parameter settings such as $I_2$ and $I_3$ would yield similar trends when combined with greedy-based and SA-based design space exploration.

Fig. 7-3 shows the objective function value normalized to the optimal solution (obtained from exhaustive search) versus the number of states explored for the one-shot, $GD^{asc}$, and SA algorithms for a security/defense system for $|S|$ = 729.
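The four baseline initial settings $I_1$ to $I_4$ (Section 7.4.1), against which the one-shot settings are compared and which seed the greedy- and SA-based explorations, can be sketched in Python as follows. This is an illustrative sketch, not the dissertation's C++ code: the names `PARAMS_729` and `baseline_settings` are hypothetical, and the choice of the 0-based index `n // 2` as the "middle" value for $I_3$ is an assumption, since the text does not spell out the rounding for even $n$.

```python
import random

# Tunable parameter values for |S| = 729, as listed in Section 7.4.1.
PARAMS_729 = {
    "Vp": [2.7, 3.3, 4],
    "Fp": [4, 6, 8],
    "Fs": [1, 2, 3],
    "Ps": [41, 56, 64],
    "Pti": [60, 300, 600],
    "Ptx": [-17, -3, 1],
}


def baseline_settings(params, rng=None):
    """Return the baseline initial settings (I1, I2, I3, I4) as dicts."""
    rng = rng or random.Random()
    # I1: first value of every tunable parameter.
    i1 = {name: values[0] for name, values in params.items()}
    # I2: last value of every tunable parameter.
    i2 = {name: values[-1] for name, values in params.items()}
    # I3: middle value (assumption: 0-based index n // 2).
    i3 = {name: values[len(values) // 2] for name, values in params.items()}
    # I4: random value, mirroring q = rand() % n with a 0-based index q.
    i4 = {name: values[rng.randrange(len(values))]
          for name, values in params.items()}
    return i1, i2, i3, i4


i1, i2, i3, i4 = baseline_settings(PARAMS_729, random.Random(0))
print(i1, i2, i3, i4, sep="\n")
```

Passing a seeded `random.Random` makes $I_4$ reproducible across runs, which is convenient when comparing exploration algorithms that start from a random state.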

The one-shot solution is within 1.8% of the optimal solution obtained from exhaustive search. The figure shows that $GD^{asc}$ and SA explore 11 states (1.51% of the design space) and 10 states (1.37% of the design space), respectively, to attain an equivalent or better quality solution than the one-shot solution. Although greedy- and SA-based methodologies explore few states to reach a solution comparable to that of our one-shot methodology, the one-shot methodology is suitable when design space exploration is not an option due to an extremely large design space and/or extremely stringent computational, memory, and timing constraints. These results indicate that other arbitrary initial value settings (e.g., $I_1$, $I_4$, etc.) do not provide a good quality operating state and necessitate design space exploration by online algorithms (e.g., greedy) to provide a good quality operating state. We point out that if the greedy- and SA-based methodologies leverage our one-shot initial tunable parameter value settings $I$, further improvements over the one-shot solution can produce a very good quality (optimal or near-optimal) operating state [41].

Figure 7-3. Objective function value normalized to the optimal solution for a varying number of states explored for one-shot, greedy, and SA algorithms for a security/defense system where $\omega_l = 0.25$, $\omega_t = 0.35$, $\omega_r = 0.4$, $|S|$ = 729.

Fig. 7-4 shows the objective function value normalized to the optimal solution versus the number of states explored for a security/defense system for $|S|$ = 31,104. The one-shot solution is within 8.6% of the optimal solution. The figure shows that $GD^{asc}$

converges to a lower quality solution than the one-shot solution after exploring 9 states (0.029% of the design space), and SA explores 8 states (0.026% of the design space) to yield a better quality solution than the one-shot solution. These results reveal that the greedy exploration of parameters may not necessarily attain a better quality solution than our one-shot solution.

Figure 7-4. Objective function value normalized to the optimal solution for a varying number of states explored for one-shot, greedy, and SA algorithms for a security/defense system where $\omega_l = 0.25$, $\omega_t = 0.35$, $\omega_r = 0.4$, $|S|$ = 31,104.

Fig. 7-5 shows the objective function value normalized to the optimal solution versus the number of states explored for a health care application for $|S|$ = 729. The one-shot solution is within 2.1% of the optimal solution. The figure shows that $GD^{asc}$ converges to an almost equal quality solution as compared to the one-shot solution after exploring 11 states (1.5% of the design space), and SA explores 10 states (1.4% of the design space) to yield an almost equal quality solution as compared to the one-shot solution. These results indicate that further exploration of the design space is required to find an equivalent quality solution as compared to one-shot if the intelligent initial value settings leveraged by one-shot are not used.

Fig. 7-6 shows the objective function value normalized to the optimal solution versus the number of states explored for a health care application for $|S|$ = 31,104. The one-shot solution is within 1.6% of the optimal solution. The figure shows that $GD^{asc}$

converges to a lower quality solution than the one-shot solution after exploring 9 states (0.029% of the design space), and SA explores 6 states (0.019% of the design space) to yield a better quality solution than the one-shot solution. These results confirm that the greedy exploration of parameters may not necessarily attain a better quality solution than our one-shot solution.

Figure 7-5. Objective function value normalized to the optimal solution for a varying number of states explored for one-shot, greedy, and SA algorithms for a health care application where $\omega_l = 0.25$, $\omega_t = 0.35$, $\omega_r = 0.4$, $|S|$ = 729.

Figure 7-6. Objective function value normalized to the optimal solution for a varying number of states explored for one-shot, greedy, and SA algorithms for a health care application where $\omega_l = 0.25$, $\omega_t = 0.35$, $\omega_r = 0.4$, $|S|$ = 31,104.

Fig. 7-7 shows the objective function value normalized to the optimal solution versus the number of states explored for an ambient conditions monitoring application

for $|S|$ = 729. The one-shot solution is within 7.7% of the optimal solution. The figure shows that $GD^{asc}$ and SA converge to an equivalent or better quality solution than the one-shot solution after exploring 4 states (0.549% of the design space) and 10 states (1.37% of the design space), respectively. These results again confirm that greedy- and SA-based exploration can provide improved results over the one-shot solution, but requires additional state exploration.

Figure 7-7. Objective function value normalized to the optimal solution for a varying number of states explored for one-shot, greedy, and SA algorithms for an ambient conditions monitoring application where $\omega_l = 0.4$, $\omega_t = 0.5$, $\omega_r = 0.1$, $|S|$ = 729.

Fig. 7-8 shows the objective function value normalized to the optimal solution versus the number of states explored for an ambient conditions monitoring application for $|S|$ = 31,104. The one-shot solution is within 24.7% of the optimal solution. The figure shows that both $GD^{asc}$ and SA converge to an equivalent or better quality solution than the one-shot solution after exploring 3 states (0.01% of the design space). These results indicate that both greedy and SA can give good quality solutions after exploring a very small percentage of the design space and that both greedy- and SA-based methods enable lightweight dynamic optimizations [41]. The results also indicate that the one-shot solution provides a good quality solution when further design space exploration is not possible due to resource constraints.
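The design-space percentages quoted in this subsection are the number of explored states divided by the design space cardinality, and the "within x% of the optimal solution" figures are gaps of normalized objective values. A minimal Python sketch of both calculations follows; the function names are illustrative assumptions, and the gap formula assumes that larger objective function values are better, as the normalization to the optimal solution suggests.

```python
def percent_explored(states_explored, cardinality):
    """Percentage of the design space visited by an exploration algorithm."""
    return 100.0 * states_explored / cardinality


def gap_to_optimal_percent(f_value, f_optimal):
    """Percentage by which an objective value falls short of the optimum.

    Assumes a maximization objective, so f_value <= f_optimal.
    """
    return 100.0 * (f_optimal - f_value) / f_optimal


# GD-asc and SA for the security/defense system, |S| = 729.
print(round(percent_explored(11, 729), 2))    # 11 states
print(round(percent_explored(10, 729), 2))    # 10 states
# GD-asc for |S| = 31,104.
print(round(percent_explored(9, 31104), 3))   # 9 states
# A normalized objective value of 0.982 lies about 1.8% below the optimum.
print(round(gap_to_optimal_percent(0.982, 1.0), 1))
```

These helpers reproduce the rounded percentages in the text from the reported state counts; they do not perform any exploration themselves.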

Figure 7-8. Objective function value normalized to the optimal solution for a varying number of states explored for one-shot, greedy, and SA algorithms for an ambient conditions monitoring application where $\omega_l = 0.4$, $\omega_t = 0.5$, $\omega_r = 0.1$, $|S|$ = 31,104.

7.4.2.3 Comparison of optimized greedy (GD) with greedy variants- and SA-based dynamic optimization methodologies

For comparison purposes, we implemented an SA-based algorithm, our greedy online optimization algorithm (GD) (which leverages intelligent initial parameter value selection, exploration ordering, and parameter arrangement), and several other greedy online algorithm variations (Table 7-4) in C++. We compare our results with SA to provide relative comparisons of our dynamic optimization methodology with another methodology that leverages an SA-based online optimization algorithm and arbitrary initial value settings. We point out that step 3 of our dynamic optimization methodology can use any lightweight algorithm (e.g., greedy, SA-based) in the improvement mode (Fig. 7-1). Although we present SA for comparison with the greedy algorithm, both of these algorithms are equally applicable to our dynamic optimization methodology. We compare GD results with different greedy algorithm variations (Table 7-4) to provide insight into how initial parameter value settings, exploration ordering, and parameter arrangement affect the final operating state quality. We normalize the objective function value (corresponding to the operating state) attained by the algorithms with respect to

the optimal solution (objective function value corresponding to the optimal operating state) obtained from an exhaustive search.

Table 7-4. Greedy algorithms with different parameter arrangements and exploration orders.

Notation      Description
GD            Greedy algorithm with parameter exploration order $\widehat{P_d}$ and arrangement $\widehat{P}$
$GD^{ascA}$   Explores parameter values in ascending order with arrangement A = {$V_p$, $F_p$, $F_s$, $P_s$, $P_{ti}$, $P_{tx}$}
$GD^{ascB}$   Explores parameter values in ascending order with arrangement B = {$P_{tx}$, $P_{ti}$, $P_s$, $F_s$, $F_p$, $V_p$}
$GD^{ascC}$   Explores parameter values in ascending order with arrangement C = {$F_s$, $P_{ti}$, $P_{tx}$, $V_p$, $F_p$, $P_s$}
$GD^{desD}$   Explores parameter values in descending order with arrangement D = {$V_p$, $F_p$, $F_s$, $P_s$, $P_{ti}$, $P_{tx}$}
$GD^{desE}$   Explores parameter values in descending order with arrangement E = {$P_{tx}$, $P_{ti}$, $P_s$, $F_s$, $F_p$, $V_p$}
$GD^{desF}$   Explores parameter values in descending order with arrangement F = {$P_s$, $F_p$, $V_p$, $P_{tx}$, $P_{ti}$, $F_s$}

Fig. 7-9 shows the objective function values normalized to the optimal solution for SA and greedy algorithms versus the number of states explored for a security/defense system for $|S|$ = 729. Results indicate that $GD^{ascA}$, $GD^{ascB}$, $GD^{ascC}$, $GD^{desD}$, $GD^{desE}$, $GD^{desF}$, and GD converge to a steady state solution (objective function value corresponding to the operating state) after exploring 11, 10, 11, 10, 10, 9, and 8 states, respectively. We point out that, for brevity, we do not plot the results for every iteration and greedy algorithm variation; however, we obtained the results for all iterations and greedy algorithm variations. These convergence results show that GD converges to a final operating state slightly faster than other greedy algorithms, exploring only 1.1% of the design space. $GD^{ascA}$ and $GD^{ascB}$ converge to almost equal quality solutions as $GD^{desD}$ and $GD^{desE}$, showing that ascending or descending parameter values exploration

and parameter arrangements do not significantly impact the solution quality for this application for $|S|$ = 729.

Figure 7-9. Objective function values normalized to the optimal solution for a varying number of states explored for SA and the greedy algorithms for a security/defense system where $\omega_l = 0.25$, $\omega_t = 0.35$, $\omega_r = 0.4$, and $|S|$ = 729.

Results also indicate that the SA algorithm outperforms all greedy algorithms and converges to the optimal solution after exploring 400 states or 55% of the design space. Fig. 7-9 also verifies the ability of our methodology to determine a good quality, near-optimal solution in one-shot that is within 1.4% of the optimal solution. GD achieves only a 1.8% improvement over the initial state after exploring 8 states.

Fig. 7-10 shows the objective function values normalized to the optimal solution for SA and greedy algorithms versus the number of states explored for a security/defense system for $|S|$ = 31,104. Results reveal that GD converges to the final solution by exploring only 0.04% of the design space. $GD^{desD}$, $GD^{desE}$, and $GD^{desF}$ converge to better solutions than $GD^{ascA}$, $GD^{ascB}$, and $GD^{ascC}$, showing that descending parameter values exploration and parameter arrangements D, E, and F are better for this application as compared to ascending parameter values exploration and parameter arrangements A, B, and C. This difference arises because a descending exploration order tends to select higher tunable parameter values, which increases the throughput considerably as compared to lower tunable parameter values. Since throughput has

been assigned a higher weight factor for this application than the lifetime, better overall objective function values are attained.

Figure 7-10. Objective function values normalized to the optimal solution for a varying number of states explored for SA and greedy algorithms for a security/defense system where $\omega_l = 0.25$, $\omega_t = 0.35$, $\omega_r = 0.4$, and $|S|$ = 31,104.

Comparing Fig. 7-10 and Fig. 7-9 reveals that the design space size also affects the solution quality, in addition to the parameter value exploration order and parameter arrangement. For example, for $|S|$ = 729, the ascending and descending parameter value exploration orders and parameter arrangements result in comparable quality solutions, whereas for $|S|$ = 31,104, the descending parameter value exploration order results in higher quality solutions. Again, the SA algorithm outperforms all greedy algorithms and converges to the optimal solution for $|S|$ = 31,104 after exploring 100 states or 0.3% of the design space. Fig. 7-10 also verifies the ability of our methodology to determine a good quality, near-optimal solution in one-shot that is within 9% of the optimal solution. GD achieves only a 0.3% improvement over the initial state (one-shot solution) after exploring 11 states.

Results for a health care application for $|S|$ = 729 reveal that GD converges to the final solution slightly faster than other greedy algorithms, exploring only 1% of the design space. The SA algorithm outperforms the greedy algorithm variants after exploring

400 states or 55% of the design space for $|S|$ = 729, but the SA improvement over the greedy algorithm variants is insignificant as the greedy algorithm variants attain near-optimal solutions. Results indicate that the one-shot solution is within 2% of the optimal solution. GD achieves only a 2% improvement over the one-shot solution after exploring 8 states.

Results for a health care application for $|S|$ = 31,104 reveal that GD converges to the final solution by exploring only 0.0257% of the design space. The SA algorithm outperforms all greedy algorithms and converges to the optimal solution after exploring 100 states (0.3% of the design space). The one-shot solution is within 1.5% of the optimal solution. GD achieves only a 0.2% improvement over the one-shot solution after exploring 8 states.

Fig. 7-11 shows the objective function values normalized to the optimal solution versus the number of states explored for an ambient conditions monitoring application for $|S|$ = 729. Results reveal that GD converges to the final solution slightly faster than other greedy algorithms, exploring only 1.1% of the design space. $GD^{ascA}$, $GD^{ascB}$, and $GD^{ascC}$ converge to a higher quality solution than $GD^{desD}$, $GD^{desE}$, and $GD^{desF}$ because the ascending exploration order tends to select lower tunable parameter values, which results in comparatively larger lifetime values as compared to higher tunable parameter values. This higher lifetime results in higher lifetime objective function values and thus higher overall objective function values. We observe that the greedy algorithm variants result in higher quality solutions than the one attained by GD after exploring more states, since $GD^{ascA}$, $GD^{ascB}$, $GD^{ascC}$, and $GD^{desF}$ attain the optimal solution for $|S|$ = 729. This observation reveals that other arbitrary parameter arrangements and exploration orders may obtain better solutions than GD, but those arbitrary arrangements and exploration orders would not scale for different application domains with different weight factors and for different design space cardinalities. The SA algorithm outperforms $GD^{desD}$, $GD^{desE}$, and GD after exploring 400 states (55% of the design space). $GD^{ascA}$, $GD^{ascB}$, $GD^{ascC}$,

and $GD^{desF}$ attain optimal solutions. Our one-shot solution is within 8% of the optimal solution. GD achieves only a 2% improvement over the one-shot solution after exploring 8 states.

Figure 7-11. Objective function values normalized to the optimal solution for a varying number of states explored for SA and greedy algorithms for an ambient conditions monitoring application where $\omega_l = 0.4$, $\omega_t = 0.5$, $\omega_r = 0.1$, and $|S|$ = 729.

Results for an ambient conditions monitoring application for $|S|$ = 31,104 indicate that GD converges to the optimal solution after exploring 13 states (0.04% of the design space), with a 17% improvement over the one-shot solution. The one-shot solution is within 14% of the optimal solution. $GD^{ascA}$, $GD^{ascB}$, and $GD^{ascC}$ converge to a better solution than $GD^{desD}$, $GD^{desE}$, and $GD^{desF}$ for similar reasons as for $|S|$ = 729. SA converges to a near-optimal solution after exploring 400 states (1.3% of the design space).

The results for different application domains and design spaces verify that the one-shot mode provides a high-quality solution that is within 8% of the optimal solution averaged over all application domains and design space cardinalities. These results also verify that improvements can be achieved over the one-shot solution during the improvement mode. The results indicate that GD may explore more states than other greedy algorithms if state exploration provides a noticeable improvement over the

one-shot solution. The results also provide insight into the convergence rates and reveal that even though the design space cardinality increases by 43x, both heuristic algorithms (greedy and SA) still explore only a small percentage of the design space and result in high-quality solutions. Furthermore, although SA outperforms the greedy algorithms after exploring a comparatively larger portion of the design space, GD still provides an optimal or near-optimal solution with significantly less design space exploration. These results advocate the use of GD as a design space exploration algorithm for constrained applications, whereas SA can be used for relatively less constrained applications. We point out that both GD and SA are online algorithms for dynamic optimization and are suitable for larger design spaces as compared to other stochastic algorithms, such as MDP-based algorithms, which are only suitable for restricted (comparatively smaller) design spaces [37].

7.4.2.4 Computational Complexity

We analyze the relative complexity of the algorithms by measuring their execution time and data memory requirements. We perform the data memory analysis for each step of our dynamic optimization methodology. Our data memory analysis assumes an 8-bit processor for sensor nodes, with integer data types requiring 2 bytes of storage and float data types requiring 4 bytes of storage. Analysis reveals that the one-shot solution (step one) requires only 150, 188, 248, and 416 bytes, whereas step two requires 94, 140, 200, and 494 bytes for (number of tunable parameters $N$, number of application metrics $m$) equal to (3, 2), (3, 3), (6, 3), and (6, 6), respectively. GD in step three requires 458, 528, 574, 870, and 886 bytes, whereas SA in step three requires 514, 582, 624, 920, and 936 bytes of storage for design space cardinalities of 8, 81, 729, 31,104, and 46,656, respectively.

The data memory analysis shows that SA has comparatively larger memory requirements than the greedy algorithm. Our analysis reveals that the data memory requirements for all three steps of our dynamic optimization methodology increase

linearly as the number of tunable parameters, tunable values, and application metrics, and thus the design space, increases. The analysis verifies that although all three steps of our dynamic optimization methodology have low data memory requirements, the one-shot solution in step one requires 361% less memory on average.

We measured the execution time for all three steps of our dynamic optimization methodology averaged over 10,000 runs (to smooth any discrepancies in execution time due to operating system overheads) on an Intel Xeon CPU running at 2.66 GHz [169] using the Linux/Unix time command [170]. We scaled these execution times to the Atmel ATmega1281 microcontroller [92] running at 8 MHz. Even though scaling does not provide 100% accuracy for the microcontroller runtime because of different instruction set architectures and memory subsystems, scaling provides reasonable runtime estimates and enables relative comparisons. Results showed that step one and step two required 1.66 ms and 0.332 ms, respectively, both for $|S|$ = 729 and $|S|$ = 31,104. For step three, we compared GD with SA. GD explored 10 states and required 0.887 ms and 1.33 ms on average to converge to the solution for $|S|$ = 729 and $|S|$ = 31,104, respectively. SA required 2.76 ms and 2.88 ms to explore the first 10 states (to provide a fair comparison with GD) for $|S|$ = 729 and $|S|$ = 31,104, respectively. The other greedy algorithms required comparatively more time than GD because they required more design state exploration to converge than GD; however, all the greedy algorithms required less execution time than SA.

To verify that our dynamic optimization methodology is lightweight, we compared the execution time results for all three steps of our dynamic optimization methodology with the exhaustive search. The exhaustive search required 29.526 ms and 2.765 seconds for $|S|$ = 729 and $|S|$ = 31,104, respectively, which gives speedups of 10x and 832x, respectively, for our dynamic optimization methodology. The execution time analysis reveals that all three steps of our dynamic optimization methodology require execution time on the order of milliseconds, and the one-shot solution requires

138% less execution time on average as compared to all three steps of the dynamic optimization methodology. Execution time savings attained by the one-shot solution as compared to the three steps of our dynamic optimization methodology are 73% and 186% for GD and SA, respectively, when $|S|$ = 729, and are 100% and 138% for GD and SA, respectively, when $|S|$ = 31,104. These results indicate that the design space cardinality affects the execution time linearly and that our dynamic optimization methodology's advantage increases as the design space cardinality increases. We verified our execution time analysis using the clock() function [171], which confirmed similar trends.

To further verify that our dynamic optimization methodology is lightweight, we calculate the energy consumption for the two modes of our methodology: the one-shot mode and the improvement mode with either a GD- or SA-based online algorithm. We calculate the energy consumption $E_{dyn}$ for an Atmel ATmega1281 microcontroller [92] operating at $V_p$ = 2.7 V and $F_p$ = 8 MHz as $E_{dyn} = V_p \cdot I_p^{a} \cdot T_{exe}$, where $I_p^{a}$ and $T_{exe}$ denote the processor's active current and the execution time for the methodology's operating mode at ($V_p$, $F_p$), respectively (we observed similar trends for other processor voltage and frequency settings). We point out that we consider the execution time for exploring the first 10 states both for the GD- and SA-based online algorithms in our energy calculations, as both the GD and SA algorithms attained near-optimal results after exploring 10 states both for $|S|$ = 729 and $|S|$ = 31,104. Table 7-5 summarizes the energy calculations for different modes of our dynamic optimization methodology as well as for the exhaustive search for $|S|$ = 729 and $|S|$ = 31,104. We assume that the sensor node's battery energy in our calculations is $E_b$ = 25,920 J, which is computed using Equation 3 (from our application metrics estimation model). Results indicate that one-shot consumes 1,679% and 166,510% less energy as compared to the exhaustive search for $|S|$ = 729 and $|S|$ = 31,104, respectively, where each saving is expressed relative to the energy of the cheaper mode (for example, (422.52 - 23.75)/23.75 = 16.79 for $|S|$ = 729). Improvement mode using GD as the online algorithm consumes 926% and 83,135% less energy as compared to the

exhaustive search for $|S|$ = 729 and $|S|$ = 31,104, respectively. Improvement mode using SA as the online algorithm consumes 521% and 56,656% less energy as compared to the exhaustive search for $|S|$ = 729 and $|S|$ = 31,104, respectively. Furthermore, our dynamic optimization methodology using GD as the online algorithm can be executed $545.22 \times 10^{6} - 655 \times 10^{3} = 544.6 \times 10^{6}$ more times than the exhaustive search and $173.45 \times 10^{6}$ more times than when using SA as the online algorithm for $|S|$ = 31,104. These results verify that our dynamic optimization methodology is lightweight and can theoretically be executed on the order of millions of times even on energy-constrained sensor nodes.

Table 7-5. Energy consumption for the one-shot and the improvement mode for our dynamic optimization methodology. $IM^{GD}_{|S|=X}$ and $IM^{SA}_{|S|=X}$ denote the improvement mode using GD and SA as the online algorithms, respectively, for $|S| = X$ where X = {729, 31,104}. $ES_{|S|=X}$ denotes the exhaustive search for $|S| = X$ where X = {729, 31,104}. $B_f$ denotes the fraction of battery energy consumed in an operating mode and $R_l$ denotes the maximum number of times (runs) our dynamic optimization methodology can be executed in a given mode depending upon the sensor node's battery energy.

Mode                     $T_{exe}$ (ms)   $E_{dyn}$ (µJ)   $B_f$                     $R_l$
One-shot                 1.66             23.75            $9.16 \times 10^{-10}$    $1.1 \times 10^{10}$
$IM^{GD}_{|S|=729}$      2.879            41.2             $1.6 \times 10^{-9}$      $629.13 \times 10^{6}$
$IM^{SA}_{|S|=729}$      4.752            68               $2.62 \times 10^{-9}$     $381.2 \times 10^{6}$
$ES_{|S|=729}$           29.526           422.52           $1.63 \times 10^{-10}$    $61.35 \times 10^{6}$
$IM^{GD}_{|S|=31,104}$   3.322            47.54            $1.83 \times 10^{-9}$     $545.22 \times 10^{6}$
$IM^{SA}_{|S|=31,104}$   4.872            69.72            $2.7 \times 10^{-9}$      $371.77 \times 10^{6}$
$ES_{|S|=31,104}$        2,765            39,570           $1.53 \times 10^{-6}$     $655 \times 10^{3}$

7.5 Concluding Remarks

In this chapter, we proposed a lightweight dynamic optimization methodology for EWSNs, which provided a high-quality solution in one-shot using intelligent initial tunable parameter value settings for highly constrained applications. We also proposed an online greedy optimization algorithm that leveraged intelligent design space exploration techniques to iteratively improve on the one-shot solution for less

constrained applications. Results showed that our one-shot solution is near-optimal and within 8% of the optimal solution on average. Compared with simulated annealing (SA) and different greedy algorithm variations, results showed that the one-shot solution yielded improvements as high as 155% over other arbitrary initial parameter settings. Results indicated that our greedy algorithm converged to the optimal or near-optimal solution after exploring only 1% and 0.04% of the design space, whereas SA explored 55% and 1.3% of the design space, for design space cardinalities of 729 and 31,104, respectively. Data memory and execution time analysis revealed that our one-shot solution (step one) required 361% and 85% less data memory and execution time, respectively, when compared to using all three steps of our dynamic optimization methodology. Furthermore, one-shot consumed 1,679% and 166,510% less energy as compared to the exhaustive search for $|S|$ = 729 and $|S|$ = 31,104, respectively. Improvement mode using GD as the online algorithm consumed 926% and 83,135% less energy as compared to the exhaustive search for $|S|$ = 729 and $|S|$ = 31,104, respectively. Computational complexity along with the execution time, data memory analysis, and energy consumption confirmed that our methodology is lightweight and thus feasible for sensor nodes with limited resources.

Future work includes the incorporation of profiling statistics into our dynamic optimization methodology to provide feedback with respect to changing environmental stimuli. In addition, we plan to further verify our dynamic optimization methodology by implementing it on a physical hardware sensor node platform. Future work also includes the extension of our dynamic optimization methodology to global optimizations, which will ensure that individual sensor node tunable parameter settings are optimal both for the sensor node and for the entire EWSN.
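The execution-time speedups and the battery figures reported in Section 7.4.2.4 follow from simple arithmetic on the measured step times and energies. The Python sketch below reproduces that arithmetic under stated assumptions: the battery energy is taken as $E_b = V_b \cdot C_b$ converted to joules (which matches the stated 25,920 J, although the text attributes the value to Equation 3), the function and constant names are hypothetical, and the processor active current is left as a caller-supplied argument because its value is not given in this section.

```python
V_B = 3.6           # battery voltage (V), Table 7-1
C_B_MAH = 2000.0    # battery capacity (mA-h), Table 7-1

# Assumption: E_b = V_b * C_b, with mA-h converted to ampere-seconds.
E_B = V_B * (C_B_MAH / 1000.0) * 3600.0   # approximately 25,920 J

# Scaled execution times (ms) reported in Section 7.4.2.4.
STEP1_MS, STEP2_MS = 1.66, 0.332
STEP3_MS = {("GD", 729): 0.887, ("GD", 31104): 1.33,
            ("SA", 729): 2.76, ("SA", 31104): 2.88}
EXHAUSTIVE_MS = {729: 29.526, 31104: 2765.0}


def total_ms(algorithm, cardinality):
    """Execution time of all three steps for the chosen online algorithm."""
    return STEP1_MS + STEP2_MS + STEP3_MS[(algorithm, cardinality)]


def speedup(algorithm, cardinality):
    """Speedup of the three-step methodology over the exhaustive search."""
    return EXHAUSTIVE_MS[cardinality] / total_ms(algorithm, cardinality)


def e_dyn_uj(v_p_volts, i_active_ma, t_exe_ms):
    """E_dyn = V_p * I_active * T_exe; V * mA * ms yields microjoules."""
    return v_p_volts * i_active_ma * t_exe_ms


def battery_fraction(e_dyn_microjoules):
    """B_f: fraction of the battery energy consumed by one run."""
    return e_dyn_microjoules * 1e-6 / E_B


def max_runs(e_dyn_microjoules):
    """R_l: number of runs the battery energy could sustain."""
    return E_B / (e_dyn_microjoules * 1e-6)


print(round(speedup("GD", 729)), round(speedup("GD", 31104)))
print(battery_fraction(41.2), max_runs(41.2))
```

With the 41.2 µJ reported for the GD-based improvement mode at $|S|$ = 729, `max_runs` gives roughly 629 million runs, and the 39,570 µJ of the exhaustive search at $|S|$ = 31,104 gives roughly 655 thousand runs.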

CHAPTER 8
HIGH-PERFORMANCE ENERGY-EFFICIENT PARALLEL MULTI-CORE EMBEDDED COMPUTING

Even though the literature discusses high-performance computing (HPC) for supercomputers [27][28][29][30], there exists little discussion on high-performance energy-efficient embedded computing (HPEEC) [31]. In embedded systems, high performance and energy efficiency are mostly attained via multi-core architectures, which introduces parallel computing into embedded systems. The integration of HPEEC and parallel computing can be termed high-performance energy-efficient parallel embedded computing (HPEPEC).

The distinction between HPC and parallel computing for supercomputers and HPEPEC is important because performance is the most significant metric for supercomputers, with less emphasis given to energy efficiency, whereas energy efficiency is a primary concern for HPEPEC. For example, each of the 10 most powerful contemporary supercomputers has a peak power requirement of up to 10 MW, which is equivalent to the power needs of a city with a population of 40,000 [177][31]. To acknowledge the increasing significance of energy-efficient computing, the Green500 list ranks supercomputers using the FLOPS per watt performance metric [178]. Table 8-1 lists the top 5 green supercomputers along with their Top500 supercomputer ranking. The table shows that the top performing supercomputers are not necessarily energy-efficient [177][178]. Table 8-1 indicates that most of the top green supercomputers consist of low-power embedded processor clusters aiming at achieving high performance per watt and high performance per unit area [179].

Fig. 8-1 gives an overview of the HPEPEC domain, which spans architectural approaches to middleware and software approaches. In this chapter, we focus on high-performance and energy-efficient techniques that are applicable to embedded systems (CMPs, SoCs, or MPSoCs) to meet particular application requirements. Although the main focus of the chapter is on embedded systems, many of the energy

Table 8-1. Top Green500 and Top500 supercomputers as of June 2011 [177][178].

Supercomputer                    Green500 Rank   Top500 Rank   Cores    Power Efficiency    Peak Performance   Peak Power
BlueGene/Q Prototype 2           1               109           8192     2097.19 MFLOPS/W    104857.6 GFLOPS    40.95 kW
BlueGene/Q Prototype 1           2               165           8192     1684.20 MFLOPS/W    104857.6 GFLOPS    38.80 kW
DEGIMA Cluster, Intel i5         3               430           7920     1375.88 MFLOPS/W    111150 GFLOPS      34.24 kW
TSUBAME 2.0                      4               5             73278    958.35 MFLOPS/W     2287630 GFLOPS     1243.80 kW
iDataPlex DX360M3, Xeon 2.4      5               54            3072     891.88 MFLOPS/W     293274 GFLOPS      160.00 kW

and performance issues are equally applicable to supercomputers, since state-of-the-art supercomputers leverage embedded processors/chips (e.g., the Jaguar supercomputer, comprising 224,162 processor cores, leverages AMD Opteron six-core CMPs [177]).

Figure 8-1. High-performance energy-efficient parallel embedded computing (HPEPEC) domain.


However, we summarize several differences between supercomputing applications and embedded applications as follows:

1. Supercomputing applications tend to be highly data parallel, where the goal is to decompose a task with a large data set across many processing units such that each subtask operates on a portion of the data set. On the other hand, embedded applications tend to consist of many tasks where each task is executed on a single processing unit and may have arrival and deadline constraints.
2. Supercomputing applications tend to focus on leveraging a large number of processors, whereas the scale of embedded applications is generally much smaller.
3. Supercomputing applications' main optimization objective is performance (although energy is increasingly becoming a very important secondary metric), while performance and energy are equally important objectives for embedded applications. Also, reliability and fault tolerance play a more important role in embedded applications as compared to supercomputing applications.

The HPEPEC domain benefits from architectural innovations in processor core layouts (e.g., heterogeneous CMP, tiled multi-core architectures), memory design (e.g., transactional memory, cache partitioning), and interconnection networks (e.g., packet-switched, photonic, wireless). The HPEPEC platforms provide hardware support for functionalities that can be controlled by middleware, such as dynamic voltage and frequency scaling (DVFS), hyper-threading, helper threading, energy monitoring and management, dynamic thermal management (DTM), and various power-gating techniques. The HPEPEC domain also benefits from software approaches such as task scheduling, task migration, and load balancing. Many of the HPEPEC techniques at different levels (e.g., architectural, middleware, and software) are complementary in nature and work in conjunction with one another to better meet application requirements. To the best of our knowledge, this is the first dissertation targeting HPEPEC that provides a comprehensive classification of various HPEPEC techniques in relation to meeting diverse embedded application requirements.

Different embedded applications have different characteristics, such as throughput-intensive, parallel and distributed, real-time, reliability-constrained, and thermal-constrained.


Various HPEPEC techniques at different levels (e.g., architecture, middleware, and software) can be used to enable an embedded platform to meet the embedded application requirements. Fig. 1-1 classifies embedded application characteristics and the HPEEC techniques available at architecture, middleware, and software levels that can be leveraged by the embedded platforms executing these applications to meet the application requirements. For example, parallel embedded applications can leverage architectural innovations (e.g., tiled multi-core architectures, high-bandwidth interconnects), hardware-assisted middleware techniques (e.g., speculative approaches, DVFS, hyper-threading), and software techniques (e.g., data forwarding, task scheduling, and task migration). We point out that HPEEC techniques are not orthogonal and many of these techniques can be applied in conjunction with one another to more closely meet application requirements. In this chapter, we discuss HPEEC techniques pertinent to distributed and parallel embedded systems. However, we point out that HPEEC techniques that benefit one application requirement (e.g., parallel, low-power) may also benefit other application requirements (e.g., throughput, real-time deadlines).

8.1 Architectural Approaches

Novel HPEEC architectural approaches play a dominant role in meeting varying application requirements. These architectural approaches can be broadly categorized into four categories: core layout, memory design, interconnection networks, and reduction techniques. In this section, we describe these HPEEC architectural approaches.

8.1.1 Core Layout

In this subsection, we discuss various core layout techniques encompassing chip and processor design, since high performance cannot be achieved only from semiconductor technology advancements. There exist various core layout considerations during chip and processor design, such as whether to use homogeneous cores (cores of the same type) or heterogeneous cores (cores of varying types), whether to position the cores in a 2D or 3D layout on the chip, whether to design independent


processor cores with switches that can turn on/off processor cores, or to have a reconfigurable integrated circuit that can be configured to form processor cores of different granularity. In this subsection, we describe a few core layout techniques including heterogeneous CMP, conjoined-core CMP, tiled multi-core architectures, 3D multi-core architectures, composable multi-core architectures, multi-component architectures, and stochastic processors. We also discuss the power/energy issues associated with these architectural approaches.

8.1.1.1 Heterogeneous CMP

Heterogeneous CMPs consist of multiple cores of varying size, performance, and complexity on a single die. Since the amount of ILP or TLP varies for different workloads, building a CMP with some large cores with high single-thread performance and some small cores with greater throughput per die area provides an attractive approach for chip design. Research indicates that the best heterogeneous CMPs contain cores customized to a subset of application characteristics (since no single core can be well suited for all embedded applications), resulting in non-monotonic cores (i.e., cores cannot be strictly ordered in terms of performance or complexity for all the applications) [180]. To achieve high performance, applications are mapped to the heterogeneous cores such that the assigned core best meets an application's resource requirements. Heterogeneous CMPs can provide performance gains as high as 40%, but at the expense of additional customization cost [181].

8.1.1.2 Conjoined-Core CMP

Conjoined-core CMPs are multiprocessors that allow topologically feasible resource sharing (e.g., floating-point units (FPUs), instruction and data caches) between adjacent cores to reduce die area with minimal impact on performance and improve the overall computational efficiency. Since conjoined-core CMPs are topology oriented, the layout must be co-designed with the architecture; otherwise, the architectural specifications for resource sharing may not be topologically possible or may incur higher communication


costs. In general, the shared resources should be large enough that the cost of the additional wiring required for sharing does not exceed the area benefits achieved by sharing. Static scheduling is the simplest way to organize resource sharing in conjoined-core CMPs, where cores share resources in different non-overlapping cycles (e.g., one core may use the shared resource during even cycles and the other core may use the shared resource during odd cycles, or one core may use the resource for the first five cycles, another core for the next five cycles, and so on). Results indicate that conjoined-core CMPs can reduce area requirements by 50% and maintain performance within 9-12% of conventional cores without conjoining [182].

8.1.1.3 Tiled multi-core architectures

Tiled multi-core architectures exploit massive on-chip resources by combining each processor core with a switch to create a modular element called a tile, which can be replicated to create a multi-core embedded system with any number of tiles. Tiled multi-core architectures contain a high-performance interconnection network that constrains interconnection wire length to no longer than the tile width, and each tile's switch (communication router) connects to the switches of neighboring tiles. Examples of tiled multi-core architectures include the Raw processor, Intel's Tera-Scale research processor, and the Tilera TILE64, TILEPro64, and TILE-Gx processor family [183].

8.1.1.4 3D multi-core architectures

A 3D multi-core architecture is an integrated circuit that orchestrates architectural units (e.g., processor cores and memories) across cores in a 3D layout. The architecture provides HPEEC by decreasing the interconnection lengths across the chip, which results in reduced communication latency. Research reveals that 3D multi-core processors can achieve a 47% performance gain and a 20% power reduction on average over 2D multi-core processors [184]. The disadvantages of 3D multi-core architectures include high power density, which exacerbates thermal challenges, as well as increased interconnect capacitance due to electrical coupling between different layers [185].
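
The static, cycle-based sharing described for conjoined-core CMPs in Section 8.1.1.2 can be illustrated with a minimal sketch. The function names and the two-core default are assumptions made here for illustration; they do not come from [182], and real arbitration is implemented in hardware rather than software.

```python
def owner_alternating(cycle: int, num_cores: int = 2) -> int:
    """Return the core that owns the shared resource when ownership
    rotates every cycle (core 0 on even cycles, core 1 on odd cycles
    in the two-core case)."""
    return cycle % num_cores


def owner_block(cycle: int, num_cores: int = 2, block: int = 5) -> int:
    """Return the core that owns the shared resource when ownership
    rotates every `block` cycles (first five cycles to core 0, the
    next five to core 1, and so on)."""
    return (cycle // block) % num_cores
```

Because each cycle maps to exactly one owner, two cores can never access the shared unit in the same cycle, which is the non-overlapping property that makes the static schedule simple to implement.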


8.1.1.5 Composable multi-core architectures

The composable multi-core architecture is an integrated circuit that allows the number of processors and each processor's granularity to be configured based on application requirements (i.e., large powerful processors for applications (tasks) with more ILP and small less powerful processors for tasks with more TLP). The architecture consists of an array of composable lightweight processors (CLPs) that can be aggregated to form large powerful processors to achieve high performance, depending upon the task granularity. Examples of composable multi-core architectures include TRIPS and TFlex [183].

8.1.1.6 Stochastic processors

Stochastic processors are processors used for fault-tolerant computing that are scalable with respect to performance requirements and power constraints while producing outputs that are stochastically correct in the worst case. Stochastic processors maintain scalability by exposing to the application layer multiple functionally equivalent units that differ in their architecture and exhibit different reliability levels. Applications select appropriate functional units for a program or program phase based on the reliability requirements of that program or program phase. Stochastic processors can provide significant power reduction and throughput improvement, especially for stochastic applications (applications with a priori knowledge of reliability requirements, such as multimedia applications, where computational errors are considered an additional noise source). Results indicate that stochastic processors can achieve 20-60% power savings in the motion estimation block of an H.264 video encoding application [186].

8.1.2 Memory Design

The cache miss rate, fetch latency, and data transfer bandwidth are some of the main factors impacting the performance and energy consumption of embedded systems. The memory subsystem encompasses the main memory and cache hierarchy and


must take into consideration issues such as consistency, sharing, contention, size, and power dissipation. In this subsection, we discuss HPEEC memory design techniques, which include transactional memory, cache partitioning, cooperative caching, and smart caching.

8.1.2.1 Transactional memory

Transactional memory incorporates the definition of a transaction (a sequence of instructions executed by a single process with the following properties: atomicity, consistency, and isolation) in parallel programming to achieve lock-free synchronization efficiency by coordinating concurrent threads. A computation within a transaction executes atomically and either commits on successful completion, making the transaction's changes visible to other processes, or aborts, causing the transaction's changes to be discarded. A transaction ensures that concurrent reads and writes to shared data do not produce inconsistent or incorrect results. The isolation property of a transaction ensures that a transaction produces the same result as if no other transactions were running concurrently [187]. In transactional memories, regions of code in a parallel program can be defined as transactions. Transactional memory benefits from hardware support that ranges from complete execution of transactions in hardware to hardware-accelerated software implementations of transactional memory [183].

8.1.2.2 Cache partitioning

One of the major challenges in using multi-core embedded systems for real-time applications is timing unpredictability due to core contention for on-chip shared resources (e.g., level two (L2) or level three (L3) caches, interconnect networks). Worst-case execution time (WCET) estimation techniques for single-core embedded systems are not directly applicable to multi-core embedded systems because a task running on one core may evict useful L2 cache contents of another task running on another core. Cache partitioning is a cache space isolation technique that exclusively allocates different portions of shared caches to different cores to avoid cache contention


for hard real-time tasks, thus ensuring a more predictable runtime. Cache partitioning-aware scheduling techniques allow each task to use a fixed number of cache partitions, ensuring that a cache partition is occupied by at most one scheduled task at any time [188]. Cache partitioning can enhance performance by assigning larger portions of shared caches to cores with higher workloads as compared to the cores with lighter workloads.

8.1.2.3 Cooperative caching

Cooperative caching is a hardware technique that creates a globally managed shared cache using the cooperation of private caches. Cooperative caching allows remote L2 caches to hold and serve data that would not fit in the local L2 cache of a core and therefore improves average access latency by minimizing off-chip accesses [189]. Cooperative caching provides three performance-enhancing mechanisms: it facilitates cache-to-cache transfers of unmodified data to minimize off-chip accesses, it replaces replicated data blocks to make room for unique on-chip data blocks called singlets, and it allows singlets evicted from a local L2 cache to be placed in another L2 cache. Cooperative caching implementation requires placement of cooperation-related information in private caches and the extension of cache coherence protocols to support data migration across private caches for capacity sharing. Results indicate that for an 8-core CMP with a 1 MB L2 cache per core, cooperative caching improves the performance of multi-threaded commercial workloads by 5-11% and 4-38% as compared to a shared L2 cache and private L2 caches, respectively [190].

8.1.2.4 Smart caching

Smart caching focuses on energy-efficient computing and leverages cache set (way) prediction and low-power cache design techniques [16]. Instead of waiting for the tag array comparison, way prediction predicts the matching way prior to the cache access. Way prediction enables faster average cache access time and reduces power


consumption because only the predicted way is accessed if the prediction is correct. However, if the prediction is incorrect, the remaining ways are accessed during the subsequent clock cycle(s), resulting in a longer cache access time and increased energy consumption as compared to a cache without way prediction. The drowsy cache is a low-power cache design technique that reduces leakage power by periodically setting the SRAM cells of unused cache lines to a drowsy, low-power mode. A drowsy cache is advantageous over turning off cache lines completely because the drowsy mode preserves the cache line's data, whereas turning off the cache line loses the data. However, drowsy mode requires transitioning the drowsy cache line to a high-power mode before accessing the cache line's data. Research reveals that 80-90% of cache lines can be put in drowsy mode with less than a 1% performance degradation, resulting in a cache static and dynamic energy reduction of 50-75% [191].

8.1.3 Interconnection Network

As the number of on-chip cores increases, a scalable and high-bandwidth interconnection network to connect on-chip resources becomes crucial. Interconnection networks can be static or dynamic. Static interconnection networks consist of point-to-point communication links between computing nodes and are also referred to as direct networks (e.g., bus, ring, hypercube). Dynamic interconnection networks consist of switches (routers) and links and are also referred to as indirect networks (e.g., packet-switched networks). This subsection discusses prominent interconnect topologies (e.g., bus, 2D mesh, hypercube) and interconnect technologies (e.g., packet-switched, photonic, wireless).

8.1.3.1 Interconnect topology

One of the most critical interconnection network parameters is the network topology, which determines the on-chip network cost and performance. The interconnect topology governs the number of hops or routers a message must traverse as well as the interconnection length. Therefore, the interconnect topology determines


the communication latency and energy dissipation (since message traversal across links and through routers dissipates energy). Furthermore, the interconnect topology determines the number of alternate paths between computing nodes, which affects reliability (since messages can route around faulty paths) as well as the ability to evenly distribute network traffic across multiple paths, which affects the effective on-chip network bandwidth and performance. The interconnect topology cost is dictated by the node degree (the number of links at each computing node) and the length of the interconnecting wires. Examples of on-chip interconnection network topologies include buses (linear 1D array or ring), 2D mesh, and hypercube. In a bus topology, the processor cores share a common bus for exchanging data. Buses are the most prevalent interconnect network in multi-core embedded systems due to the bus's low cost and ease of implementation. Buses provide lower costs than other interconnect topologies because of a lower node degree: the node degree for a bus interconnect is two, for a 2D mesh is four, and for a hypercube is log p, where p is the total number of computing nodes. However, buses do not scale well as the number of cores in the CMP increases. The 2D mesh interconnect topology provides short channel lengths and low router complexity; however, the 2D mesh diameter is proportional to the perimeter of the mesh, which can lead to energy inefficiency and high network latency (e.g., the diameter of a 10x10 mesh is 18 hops) [192]. The hypercube topology is a special case of a d-dimensional mesh (a d-dimensional mesh has a node degree of 2d) when d = log p.

8.1.3.2 Packet-Switched interconnect

Packet-switched interconnection networks replace buses and crossbar interconnects as the demand for scalability and high bandwidth increases for multi-core embedded systems. Packet-switched networks connect a router to each computing node, and routers are connected to each other via short-length interconnect wires. Packet-switched interconnection networks multiplex multiple packet flows over the interconnect wires


to provide highly scalable bandwidth [183]. Tilera's TILE architectures leverage the packet-switched interconnection network.

8.1.3.3 Photonic interconnect

As the number of on-chip cores in a CMP increases, global on-chip communication plays a prominent role in overall performance. While local interconnects scale with the number of transistors, the global wires do not, because the global wires span the entire chip to connect distant logic gates and the global wires' bandwidth requirements increase as the number of cores increases. A photonic interconnection network consisting of a photonic source, optical modulators (rates exceed 12.5 Gbps), and symmetrical optical waveguides can deliver higher bandwidth and lower latencies with considerably lower power consumption than an electronic signaling based interconnect network. In photonic interconnects, once a photonic path is established using optical waveguides, data can be transmitted end-to-end without repeaters, regenerators, or buffers, as opposed to electronic interconnects, which require buffering, regeneration, and retransmission of messages multiple times from source to destination [193]. The photonic interconnection network is divided into zones, each with a drop point, such that the clock signal is optically routed to the drop point, where the optical clock signal is converted to an electrical signal. Analysis reveals that power dissipation in an optical clock distribution is lower than in an electrical clock distribution [185].

Photonic interconnection networks can benefit several classes of embedded applications, including real-time and throughput-intensive applications (especially applications with limited data reuse such as streaming applications) (Fig. 1-1). However, even though photonic interconnection networks provide many benefits, these networks have several drawbacks, such as delays associated with the rise and fall times of optical emitters and detectors, losses in the optical waveguides, signal noise due to waveguide coupling, limited buffering, and limited signal processing [185].
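
Returning to the topology parameters of Section 8.1.3.1, the quoted node degrees and the 18-hop diameter of a 10x10 mesh can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function names are chosen here for illustration, the logarithm is assumed to be base 2 (the text writes only "log p"), and the mesh is assumed to have no wraparound links.

```python
def mesh_2d_diameter(rows: int, cols: int) -> int:
    """Longest shortest path (in hops) between two nodes of a 2D mesh
    without wraparound links: corner to opposite corner."""
    return (rows - 1) + (cols - 1)


def mesh_node_degree(d: int) -> int:
    """Node degree of a d-dimensional mesh as stated in the text (2d)."""
    return 2 * d


def hypercube_node_degree(p: int) -> int:
    """Node degree of a hypercube with p nodes, taken as log2(p).
    p must be a power of two."""
    if p < 1 or p & (p - 1):
        raise ValueError("p must be a positive power of two")
    return p.bit_length() - 1
```

For example, `mesh_2d_diameter(10, 10)` gives the 18 hops cited for a 10x10 mesh, while a 64-node hypercube has a node degree of 6, which illustrates the higher per-node link cost the text contrasts with the bus and the 2D mesh.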


8.1.3.4 Wireless interconnect

Wireless interconnect is an emerging technology that promises to provide high bandwidth, low latency, and low energy dissipation by eliminating lengthy wired interconnects. Carbon nanotubes (CNTs) are a good candidate for wireless antennas due to a CNT's high aspect ratio (virtually a one-dimensional wire), high conductance (low losses), and high current carrying capacity (10^9 A/cm^2, which is much higher than that of silver and copper) [185]. Wireless interconnect can deliver high bandwidth by providing multiple channels and using time-division, code-division, frequency-division, or some hybrid of these multiplexing techniques. Experiments indicate that a wireless interconnect can reduce the communication latency by 20-45% as compared to a 2D-mesh interconnect while consuming a comparable amount of power [192]. A wireless interconnect's performance advantage increases as the number of on-chip cores increases. For example, a wireless interconnect can provide a performance gain of 217%, 279%, and 600% over a 2D-mesh interconnect when the number of on-chip cores is equal to 128, 256, and 512, respectively [194].

8.1.4 Reduction Techniques

Due to an embedded system's constrained resources, embedded system architectural design must consider power dissipation reduction techniques. Power reduction techniques can be applied at various design levels: the complementary metal-oxide-semiconductor (CMOS) level targets leakage and short circuit current reduction, the processor level targets instruction/data supply energy reduction as well as power-efficient management of other processor components (e.g., execution units, reorder buffers), and the interconnection network level targets minimizing interconnection length using an appropriate network layout. In this subsection, we present several power reduction techniques including leakage current reduction, short circuit current reduction, peak power reduction, and interconnection length reduction.


8.1.4.1 Leakage current reduction

As advances in the chip fabrication process reduce the feature size, the CMOS leakage current and associated leakage power have increased. Leakage current reduction techniques include back biasing, silicon on insulator technologies, multi-threshold MOS transistors, and power gating [16].

8.1.4.2 Short circuit current reduction

Short circuit current flows in a CMOS gate when both the nMOSFET and the pMOSFET are on, which causes a large amount of current to flow through the transistors and can result in increased power dissipation or even transistor burn-out. The short circuit effect is exacerbated as the clock period approaches the transistor switching period due to increasing clock frequencies. The short circuit current can be reduced using low-level design techniques that aim to reduce the time during which both the nMOSFET and the pMOSFET are on [16].

8.1.4.3 Peak power reduction

Peak power reduction not only increases power supply efficiency but also reduces packaging, cooling, and power supply costs, as these costs are proportional to the peak power dissipation rather than the average power dissipation. Adaptive processors can reduce peak power by centrally managing architectural component configurations (e.g., instruction and data caches, integer and floating point instruction queues, reorder buffers, load-store execution units, integer and floating point registers, register renaming) to ensure that not all of these components are maximally configured simultaneously. Adaptive processors incur minimal performance loss and achieve high peak power reduction by restricting maximum configuration to a single resource or a few resources (but not all) at a time. Research reveals that adaptive processors reduce peak power consumption by 25% with only a 5% performance degradation [195].
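
The effect of restricting how many components may be maximally configured at once (Section 8.1.4.3) can be illustrated with a small worst-case calculation. This is only a sketch under stated assumptions: each component is modeled with just two configurations (maximum and reduced), and the function name and any power figures passed to it are hypothetical rather than taken from [195].

```python
def worst_case_peak_power(max_power: dict, reduced_power: dict,
                          max_at_full: int) -> float:
    """Worst-case peak power when at most `max_at_full` components may
    be in their maximum configuration at the same time.

    `max_power` and `reduced_power` map a component name to its power
    draw (same units) in the maximum and reduced configurations, with
    max_power[c] >= reduced_power[c] assumed for every component c.
    """
    # Every component draws at least its reduced-configuration power.
    baseline = sum(reduced_power.values())
    # The worst case promotes the components with the largest extra
    # draw to their maximum configuration.
    extras = sorted((max_power[c] - reduced_power[c] for c in max_power),
                    reverse=True)
    return baseline + sum(extras[:max_at_full])
```

With hypothetical draws of 4.0, 5.0, 3.0, and 3.0 units at maximum configuration and 2.0, 3.0, 2.0, and 1.5 units at reduced configuration for four components, allowing all four at maximum gives a peak of 15.0 units, whereas allowing only one at a time bounds the peak at 10.5 units.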


8.1.4.4 Interconnection length reduction

The interconnecting wire length increases as the number of on-chip devices increases, resulting in both increased power dissipation and delay. An energy-efficient design requires reduced interconnection wire lengths for high switching activity signals and the use of placement and routing optimization algorithms for reduced delay and power consumption [16]. Chip design techniques (e.g., 3D multi-core architectures) and various interconnect topologies (e.g., 2D-mesh, hypercube) help in reducing interconnection wire lengths.

8.1.4.5 Instruction and data fetch energy reduction

Hardwired ASICs typically provide 50x more efficient computing as compared to general purpose programmable processors; however, architecture-level energy consumption analysis can help in the energy-efficient design of programmable processors [3]. Previous work indicates that programmable processors spend approximately 70% of their total energy consumption supplying instructions (42%) and data (28%) to the arithmetic units, whereas performing the arithmetic consumes a small fraction of the total energy (around 6%). Moreover, the instruction cache consumes the majority of the instruction fetch energy (67%) [3]. Research indicates that reducing instruction and data fetch energy can reduce the energy-efficiency gap between ASICs and programmable processors to 3x. Specifically, instruction fetch techniques that avoid accessing power-hungry caches are required for energy-efficient programmable processors (e.g., the Stanford efficient low-power microprocessor (ELM) fetches instructions from a set of distributed instruction registers rather than the cache) [3].

8.2 Hardware-Assisted Middleware Approaches

Various HPEEC techniques (Fig. 8-1) are implemented as middleware and/or part of an embedded OS to meet application requirements. The HPEEC middleware techniques are assisted and/or partly implemented in hardware to provide the requested functionalities (e.g., power gating support in hardware enables middleware to power gate


processor cores). HPEEC hardware-assisted middleware techniques include dynamic voltage and frequency scaling (DVFS), advanced configuration and power interface (ACPI), threading techniques (hyper-threading, helper threading, and speculative threading), energy monitoring and management, dynamic thermal management (DTM), dependable HPEEC techniques (N-modular redundancy, dynamic constitution, and proactive checkpoint deallocation), and various low-power gating techniques (power gating, per-core power gating, split power planes, and clock gating).

8.2.1 Dynamic Voltage and Frequency Scaling

DVFS is a dynamic power management (DPM) technique in which performance and power dissipation are regulated by adjusting the processor's voltage and frequency. The one-to-one correspondence between a processor's voltage and frequency in CMOS circuits imposes a strict constraint on dynamic voltage scaling (DVS) techniques to ensure that the voltage adjustments do not violate application timing (deadline) constraints (especially for real-time applications). Multi-core embedded systems leverage two DVFS techniques: global DVFS scales the voltages and frequencies of all the cores simultaneously, and local DVFS scales the voltage and frequency on a per-core basis [16]. Experiments indicate that local DVFS can improve performance (throughput) by 2.5x on average and can provide an 18% higher throughput than global DVFS on average [196][197].

DVFS-based optimizations can be employed for real-time applications to conform with tasks' deadlines in an energy-efficient manner. For example, if a task deadline is impending, DVFS can be adjusted to operate at the highest frequency to meet the task deadline, whereas if the task deadline is not close, then DVFS can be adjusted to lower voltage and frequency settings to conserve energy while still meeting the task deadline.

Although DVFS is regarded as one of the most efficient energy saving techniques, the associated overhead of performing DVFS needs to be considered. DVFS requires a programmable DC-DC converter and a programmable clock generator (mostly phase


lock loop (PLL)-based) that incur time and energy overhead whenever the processor changes its voltage and frequency setting. This overhead dictates the minimum duration of time that the target system should stay in a particular voltage-frequency state for the DVS to produce a positive energy gain [198].

8.2.2 Advanced Configuration and Power Interface

Though DPM techniques can be implemented in hardware as part of the electronic circuit, hardware implementation complicates the modification and reconfiguration of power management policies. The advanced configuration and power interface (ACPI) specification is a platform-independent software-based power management interface that attempts to unify existing DPM techniques (e.g., DVFS, power and clock gating) and put these techniques under OS control [199]. ACPI defines various states for an ACPI-compliant embedded system, but the processor power states (C-states) and the processor performance states (P-states) are most relevant to HPEEC. ACPI defines four C-states: C0 (the operating state where the processor executes instructions normally), C1 (the halt state where the processor stops executing instructions but can return to C0 instantaneously), C2 (the stop-clock state where the processor and cache maintain state but can take longer to return to C0), and C3 (the sleep state where the processor goes to sleep, does not maintain the processor and cache state, and takes the longest of the C-states to return to C0). ACPI defines n P-states (P1, P2, ..., Pn), where n ≤ 16, corresponding to the processor C0 state. Each P-state designates a specific DVFS setting such that P0 is the highest performance state while P1 to Pn are successively lower performance states. The ACPI specification is implemented in various manufactured chips (e.g., Intel names P-states SpeedStep while AMD names them Cool'n'Quiet).

8.2.3 Gating Techniques

To enable low-power operation and meet an application's constrained energy budget, various hardware-supported low-power gating techniques can be controlled by


the middleware. These gating techniques can switch off a component's supply voltage or clock signal to save power during otherwise idle periods. In this subsection, we discuss gating techniques such as power gating, per-core power gating, split power planes, and clock gating.

8.2.3.1 Power gating

Power gating is a power management technique that reduces leakage power by switching off the supply voltage to idle logic elements after detecting no activity for a certain period of time. Power gating can be applied to idle functional units, cores, and cache banks [16].

8.2.3.2 Per-Core power gating

Per-core power gating is a fine-grained power gating technique that individually switches off idle cores. In conjunction with DVFS, per-core power gating provides more flexibility in optimizing the performance and power dissipation of multi-core processors running applications with varying degrees of parallelism. Per-core power gating increases single-thread performance on a single active core by increasing the active core's supply voltage while power gating the other idle cores, which provides additional power and thermal headroom for the active core. Experiments indicate that per-core power gating in conjunction with DVFS can increase the throughput of a multi-core processor (with 16 cores) by 16% on average for different workloads exhibiting a range of parallelism while maintaining the power and thermal constraints [200].

8.2.3.3 Split power planes

Split power planes is a low-power technique that allows different power planes to coexist on the same chip and minimizes both static and dynamic power dissipation by removing power from idle portions of the chip. Each power plane has separate pins, a separate (or isolated) power supply, and independent power distribution routing. For example, Freescale's MPC8536E PowerQUICC III processor has two power planes: one


plane for the processor core (e500) and L2 cache arrays, and a second plane for the remainder of the chip's components [201].

8.2.3.4 Clock gating

Clock gating is a low-power technique that allows gating off the clock signal to registers, latches, clock regenerators, or entire subsystems (e.g., cache banks). Clock gating can yield significant power savings by gating off the functional units (e.g., adders, multipliers, and shifters) not required by the currently executing instruction, as determined by the instruction decode unit. Clock gating can also be applied internally for each functional unit to further reduce power consumption by disabling the functional unit's upper bits for small operand values that do not require the functional unit's full bit width. The granularity at which clock gating can be applied is limited by the overhead associated with the clock enable signal generation [16].

8.2.4 Threading Techniques

Different threading techniques target high performance by either enabling a single processor to execute multiple threads or by speculatively executing multiple threads. Prominent high-performance threading techniques include hyper-threading, helper threading, and speculative threading. We point out that helper and speculative threading are performance-centric and may lead to increased power consumption in case of misspeculation, where speculative processing needs to be discarded. Therefore, helper and speculative threading should be used with caution in energy-critical embedded systems. Below we give a brief description of these threading techniques.

8.2.4.1 Hyper-Threading

Hyper-threading leverages simultaneous multithreading to enable a single processor to appear as two logical processors and allows instructions from both of the logical processors to execute simultaneously on the shared resources [189]. Hyper-threading enables the OS to schedule multiple threads to the processor so that different threads can use the idle execution units. The architecture state, consisting of general-purpose


registers, interrupt controller registers, control registers, and some machine state registers, is duplicated for each logical processor. However, hyper-threading does not offer the same performance as a multiprocessor with two physical processors.

8.2.4.2 Helper threading

Helper threading leverages special execution modes to provide faster execution by reducing cache miss rates and miss latency [189]. Helper threading accelerates the performance of single-threaded applications using speculative pre-execution. This pre-execution is most beneficial for irregular applications where data prefetching is ineffective because data addresses are difficult to predict. The helper threads run ahead of the main thread and reduce cache miss rates and miss latencies by pre-executing regions of the code that are likely to incur many cache misses. Helper threading can be particularly useful for applications with multiple control paths, where helper threads pre-execute all possible paths and prefetch the data references for all paths instead of waiting until the correct path is determined. Once the correct execution path is determined, all the helper threads executing incorrect paths are aborted.

8.2.4.3 Speculative threading

Speculative threading approaches provide high performance by removing unnecessary serialization in programs. We discuss two speculative approaches: speculative multi-threading and speculative synchronization.

Speculative multi-threading divides a sequential program into multiple contiguous program segments called tasks and executes these tasks in parallel on multiple cores. The architecture provides hardware support for detecting dependencies in a sequential program and rolling back the program state on misspeculations. Speculative multi-threaded architectures exploit high transistor density by having multiple cores and relieve programmers from parallel programming, as is required for conventional CMPs. Speculative multi-threaded architectures provide instruction windows much larger than conventional uniprocessors by combining the instruction windows of multiple

PAGE 251

cores to exploit distant TLP as opposed to the nearby ILP exploited by conventional uniprocessors [183].

Speculative synchronization removes unnecessary serialization by applying thread-level speculation to parallel applications and preventing speculative threads from blocking at barriers, busy locks, and unset flags. Hardware monitors detect conflicting accesses and roll back the speculative threads to the synchronization point prior to the access violation. Speculative synchronization guarantees forward execution using a safe thread, which ensures that the worst-case performance is of the order of conventional synchronization (i.e., threads not using any speculation) when speculative threads fail to make progress.

8.2.5 Energy Monitoring and Management

Profiling the power consumption of various components (e.g., processor cores, caches) for different embedded applications at a fine granularity identifies how, when, and where power is consumed by the embedded system and the applications. Power profiling is important for energy-efficient HPEEC system design. Energy monitoring software can monitor, track, and analyze performance and power consumption for different components at the function-level or block-level granularity. PowerPack is an energy monitoring tool that uses a combination of hardware (e.g., sensors and digital meters) and software (e.g., drivers, benchmarks, and analysis tools). PowerPack profiles power and energy, as well as power dynamics, of DVFS in CMP-based cluster systems for different parallel applications at the component and code segment granularity [202].

Power management middleware dynamically adapts the application behavior in response to fluctuations in workload and power budget. PowerDial is a power management middleware that transforms static application configuration parameters into dynamic control variables stored in the address space of the executing application [203]. These control variables are accessible via a set of dynamic knobs to change the running application's configuration dynamically to trade off computation accuracy
(as far as the application's minimum accuracy requirements are satisfied) and resource requirements, which translates to power savings. Experiments indicate that PowerDial can reduce power consumption by 75%.

Green is a power management middleware that enables application programmers to exploit approximation opportunities to meet performance demands while meeting quality of service (QoS) guarantees [204]. Green provides a framework that enables application programmers to approximate expensive functions and loops. Green operates in two phases: the calibration phase and the operation phase. In the calibration phase, Green creates a QoS loss model for the approximated functions to quantify the approximation impact (loss in accuracy). The operation phase uses this QoS loss model to make approximation decisions based on programmer-specified QoS constraints. Experiments indicate that Green can improve the performance and energy consumption by 21% and 14%, respectively, with only a 0.27% QoS degradation.

8.2.6 Dynamic Thermal Management

Temperature has become an important constraint in HPEEC embedded systems because high temperature increases cooling costs, degrades reliability, and reduces performance. Furthermore, an embedded application's distinct and time-varying thermal profile necessitates dynamic thermal management (DTM) approaches. DTM for multi-core embedded systems is more challenging than for single-core embedded systems because a core's configuration and workload have a significant impact on the temperature of neighboring cores due to lateral heat transfer between adjacent cores. The goal of DTM techniques is to maximize performance while keeping temperature below a defined threshold.

8.2.6.1 Temperature determination for DTM

DTM requires efficient chip thermal profiling, which can be done using sensor-based, thermal model-based, or performance counters-based methods. Sensor-based methods leverage physical sensors to monitor the temperature in real-time. DTM
typically uses one of two sensor placement techniques: global sensor placement monitors global chip hotspots, and local sensor placement places sensors in each processor component to monitor local processor components. Thermal model-based methods use thermal models that exploit the duality between electrical and thermal phenomena by leveraging lumped-RC (resistor/capacitor) models. Thermal models can be either low-level or high-level. Low-level thermal models estimate temperature accurately and report the steady state as well as transient temperature estimates; however, they are computationally expensive. High-level thermal models leverage a simplified lumped-RC model that can only estimate the steady-state temperature; however, they are computationally less expensive than the low-level thermal models. Performance counters-based methods estimate the temperature of different on-chip functional units using temperature values read from specific processor counter registers. These counter readings can be used to estimate the access rate and timing information of various on-chip functional units.

8.2.6.2 Techniques assisting DTM

DVFS is one of the major techniques that helps DTM in maintaining a chip's thermal balance and alleviates a core's thermal emergency by reducing the core voltage and frequency. DVFS can be global or local. Global DVFS provides less control and efficiency as a single core's hotspot could result in unnecessary stalling or scaling of all the remaining cores. Local DVFS controls each core's voltage and frequency individually to alleviate the thermal emergency of the affected cores; however, it introduces design complexity. A hybrid local-global thermal management approach has the potential to provide better performance than local DVFS while maintaining the simplicity of global DVFS. The hybrid approach applies global DVFS across all the cores but specializes the architectural parameters (e.g., instruction window size, issue width, fetch throttling/gating) of each core locally. Research reveals that the hybrid approach achieves 5% better throughput than local DVFS [197]. Although DVFS
can help DTM to maintain thermal balance, there exist other techniques to assist DTM (e.g., Zhou et al. [205] suggested that adjusting micro-architectural parameters such as instruction window size and issue width has relatively lower overhead than DVFS-based approaches).

8.2.7 Dependable Techniques

Achieving performance-efficiency while meeting an application's reliability requirements defines the dependable HPEEC (DHPEEC) domain, which ranges from redundancy techniques to dependable processor design. DHPEEC platforms are critical for space exploration, space science, and defense applications with ever increasing demands for high data bandwidth, processing capability, and reliability. We describe several hardware-assisted middleware techniques leveraged by DHPEEC including N-modular redundancy, dynamic constitution, and proactive checkpoint deallocation.

8.2.7.1 N-modular redundancy

Process variation, technology scaling (deep submicron and nanoscale devices), and computational energy approaching thermal equilibrium lead to high error rates in CMPs, which necessitates redundancy to meet reliability requirements. Core-level N-modular redundancy (NMR) runs N program copies on N different cores and can meet high reliability goals for multi-core processors. Each core performs the same computation and the results are voted (compared) for consistency. Voting can be either time-based or event-based. Based on the voting result, program execution continues or rolls back to a checkpoint (a previously stored, valid architectural state). A multi-core NMR framework can provide either static or dynamic redundancy. Static redundancy uses a set of statically configured cores whereas dynamic redundancy assigns redundant cores during runtime based on the application's reliability requirements and environmental stimuli [206]. Static redundancy incurs high area requirement and power consumption due to the large number of cores required to meet an application's
reliability requirements, whereas dynamic redundancy provides better performance, power, and reliability tradeoffs.

The dependable multiprocessor (DM) is an example of a DHPEEC platform that leverages NMR. The DM design includes a fault-tolerant embedded message passing interface (FEMPI) (a lightweight fault-tolerant version of the Message Passing Interface (MPI) standard) for providing fault-tolerance to parallel embedded applications [9]. Furthermore, DM can leverage HPEC platforms such as the TILEPro64 [207].

8.2.7.2 Dynamic constitution

Dynamic constitution, an extension of dynamic redundancy, permits an arbitrary core on a chip to be a part of an NMR group, which increases dependability as compared to the static NMR configuration by scheduling around cores with permanent faults. For example, if an NMR group is statically constituted and the number of non-faulty cores in the group drops below the threshold needed to meet the application's reliability requirements, the remaining non-faulty cores in the NMR group are rendered useless. Dynamic constitution can also be helpful in alleviating thermal constraints by preventing NMR hotspots [208].

8.2.7.3 Proactive checkpoint deallocation

Proactive checkpoint deallocation is a high-performance extension for NMR that permits cores participating in voting to continue execution instead of waiting on the voting logic results. After a voting logic decision, only the cores with correct results are allowed to continue further execution.

8.3 Software Approaches

The performance and power efficiency of an embedded platform depend not only upon the built-in hardware techniques but also upon the software's ability to effectively leverage the hardware support. Software-based HPEEC techniques assist DPM by signaling the hardware of the resource requirements of an application phase. Software approaches enable high performance by scheduling and migrating tasks
statically or dynamically to meet application requirements. HPEEC software-based techniques include data forwarding, task scheduling, task migration, and load balancing.

8.3.1 Data Forwarding

Data forwarding benefits HPEEC by hiding memory latency, which is more challenging in multiprocessor systems as compared to uniprocessor systems because uniprocessor caches can hide memory latency by exploiting spatial and temporal locality whereas coherent multiprocessors have sharing misses in addition to the non-sharing misses present in uniprocessor systems. In a shared memory architecture, processors that cache the same data address are referred to as sharing processors. Data forwarding integrates fine-grained message passing capabilities in a shared memory architecture and hides the memory latency associated with sharing accesses by sending the data values to the sharing processors as soon as the data values are produced [209]. Data forwarding can be performed by the compiler where the compiler inserts write and forward assembly instructions in place of ordinary write instructions. Compiler-assisted data forwarding uses an extra register to indicate the processors that should receive the forwarded data. Another data forwarding technique, referred to as programmer-assisted data forwarding, requires a programmer to insert a post-store operation that causes a copy of an updated data value to be sent to all the sharing processors. Experiments indicate that remote writes together with prefetching improve performance by 10-48% relative to the base system (no data forwarding and prefetching) whereas remote writes improve performance by 3-28% relative to the base system with prefetching [189].

8.3.2 Load Distribution

A multi-core embedded system's performance is dictated by the workload distribution across the cores, which in turn dictates the execution time and power/thermal profile of each core. Load distribution techniques focus on load balancing between the executing cores via task scheduling and task migration.

8.3.2.1 Task scheduling

The task scheduling problem can be defined as determining an optimal assignment of tasks to cores that minimizes the power consumption while maintaining the chip temperature below the DTM-enforced ceiling temperature with minimal or no performance degradation given the total energy budget. Task scheduling applies to both DPM and DTM and plays a pivotal role in extending battery life for portable embedded systems, alleviating thermal emergencies, and enabling long-term savings from reduced cooling costs. Task scheduling can be applied in conjunction with DVFS to meet real-time task deadlines as a higher processing speed results in faster task execution and shorter scheduling lengths, but at the expense of greater power consumption. Conversely, a decrease in processor frequency reduces power consumption but increases the scheduling length, which may increase the overall energy consumption. Since the task scheduling overhead increases as the number of cores increases, hardware-assisted task scheduling techniques are the focus of emerging research (e.g., thread scheduling in graphics processing units (GPUs) is hardware-assisted). Experiments indicate that hardware-assisted task scheduling can improve the scheduling time by 8.1% for CMPs [210].

8.3.2.2 Task migration

In a multi-threaded environment, threads periodically and/or aperiodically enter and leave cores. Thread migration is a DPM and DTM technique that allows a scheduled thread to execute, preempt, or migrate to another core based on the thread's thermal and/or power profile. The OS or thread scheduler can dynamically migrate threads running on cores with limited resources to the cores with more resources as resources become available. Depending on the executing workloads, there can be a substantial temperature variation across cores on the same chip. Thread migration-based DTM periodically moves threads away from the hot cores to the cold cores based on this temperature differential to maintain the cores' thermal balance. A thread migration
technique must take into account the overhead incurred due to thread migration communication costs and address space updates. Temperature determination techniques (e.g., performance counter-based, sensor-based) assist thread management techniques in making migration decisions.

Thread migration techniques can be characterized as rotation-based, temperature-based, or power-based [211]. The rotation-based technique migrates a thread from core (i) to core ((i + 1) mod N) where N denotes the total number of processor cores. The temperature-based technique orders cores based on the cores' temperature and the thread on core (i) is swapped with the thread on core (N - i - 1) (i.e., the thread on the hottest core is swapped with the thread on the coldest core, the thread on the second hottest core is swapped with the thread on the second coldest core, and so on). The power-based technique orders cores based on the cores' temperature in ascending order and orders threads based on the threads' power consumption in descending order. The power-based technique then schedules thread (i) to core (i) (e.g., the most power-hungry thread is scheduled to the coldest core).

Thread migration can be applied in conjunction with DVFS to enhance performance. Research indicates that thread migration alone can improve performance by 2x on average whereas thread migration in conjunction with DVFS can improve performance by 2.6x on average [196].

8.3.2.3 Load balancing and unbalancing

Load balancing techniques distribute a workload equally across all the cores in a multi-core embedded system. Load unbalancing can be caused by either extrinsic or intrinsic factors. Extrinsic factors are associated with the OS and hardware topology. For example, the OS can schedule daemon processes during the execution of a parallel application and an asymmetric hardware topology can result in varying communication latencies for different processes. Intrinsic factors include imbalanced parallel algorithms, imbalanced data distribution, and changes in the input data set. An unbalanced task
assignment can lead to a performance degradation because cores executing light workloads may have to wait/stall for other cores executing heavier workloads to reach a synchronization point. Load balancing relies on efficient task scheduling techniques as well as balanced parallel algorithms. Cache partitioning can assist load balancing by assigning more cache partitions to the cores executing heavier workloads to decrease the cache miss rate and increase the core's execution speed, and thus reduce the stall time for the cores executing light workloads [212].

Although load balancing provides a mechanism to achieve high performance in embedded systems, load balancing may lead to high power consumption if not applied judiciously because load balancing focuses on utilizing all the cores even for a small number of tasks. A load unbalancing strategy that considers workload characteristics (i.e., periodic or aperiodic) can achieve better performance and lower power consumption as compared to a load balancing or a load unbalancing strategy that ignores workload characteristics. A workload-aware load unbalancing strategy assigns repeatedly executed periodic tasks to a minimum number of cores and distributes aperiodic tasks that are not likely to be executed repeatedly to a maximum number of cores. We point out that the critical performance metric for periodic tasks is deadline satisfaction rather than faster execution (a longer waiting time is not a problem as long as the deadline is met), whereas the critical performance metric for aperiodic tasks is response time rather than deadline satisfaction. The periodic tasks not distributed over all the cores leave more idle cores for scheduling aperiodic tasks, which shortens the response time of aperiodic tasks. Results on an ARM11 MPCore chip demonstrate that the workload-aware load unbalancing strategy reduces power consumption and the mean waiting time of aperiodic tasks by 26% and 82%, respectively, as compared to a load balancing strategy. The workload-aware load unbalancing strategy reduces the mean waiting time of aperiodic tasks by 92% with similar power efficiency as compared to a workload-unaware load unbalancing strategy [213].
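
As a minimal sketch of the workload-aware load unbalancing idea described above, the following Python fragment packs periodic tasks onto as few cores as possible and spreads aperiodic tasks over the least-loaded cores. The first-fit-decreasing packing rule, the per-core utilization cap `cap`, and the names `Core` and `assign_workload_aware` are our own illustrative assumptions; they are not taken from the strategy evaluated in [213].

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    core_id: int
    utilization: float = 0.0            # fraction of the core's capacity in use
    tasks: list = field(default_factory=list)


def assign_workload_aware(periodic, aperiodic, num_cores, cap=1.0):
    """Assign tasks given as (name, utilization) tuples to num_cores cores.

    Assumed policy (illustrative only):
      * periodic tasks are packed first-fit, largest first, so that they
        occupy as few cores as possible;
      * each aperiodic task goes to the currently least-loaded core, so that
        aperiodic work is spread over as many cores as possible.
    """
    cores = [Core(i) for i in range(num_cores)]

    # Pack periodic tasks onto the lowest-numbered core that still has room.
    for name, util in sorted(periodic, key=lambda t: t[1], reverse=True):
        target = next((c for c in cores if c.utilization + util <= cap), None)
        if target is None:
            raise ValueError(f"periodic task {name!r} does not fit on any core")
        target.tasks.append(name)
        target.utilization += util

    # Spread aperiodic tasks: pick the least-loaded core, lowest id on ties.
    for name, util in aperiodic:
        target = min(cores, key=lambda c: (c.utilization, c.core_id))
        target.tasks.append(name)
        target.utilization += util

    return cores


# Hypothetical workload: three periodic and three aperiodic tasks on four cores.
example = assign_workload_aware(
    periodic=[("p1", 0.4), ("p2", 0.3), ("p3", 0.2)],
    aperiodic=[("a1", 0.1), ("a2", 0.1), ("a3", 0.1)],
    num_cores=4,
)
for core in example:
    print(core.core_id, core.tasks)
```

In this hypothetical example the three periodic tasks share one core and each aperiodic task lands on an otherwise idle core, which is the behavior the strategy relies on to shorten aperiodic response times.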

8.4 High-Performance Energy-Efficient Multi-Core Processors

Silicon and chip vendors have developed various high-performance multi-core processors that leverage the various HPEEC techniques discussed in this chapter. Although providing an exhaustive list of all the prevalent high-performance multi-core processors that can be used in embedded applications is outside of the scope of this chapter, we discuss some prominent multi-core processors (summarized in Table 8-2) and focus on their HPEEC features.

Table 8-2. High-performance energy-efficient multi-core processors

Processor                     Cores               Speed            Power              Performance
ARM11 MPCore                  1-4                 620 MHz          600 mW             2600 DMIPS
ARM Cortex-A9 MPCore          1-4                 800 MHz-2 GHz    250 mW per CPU     4,000-10,000 DMIPS
MPC8572E PowerQUICC III       2                   1.2 GHz-1.5 GHz  17.3 W @ 1.5 GHz   6897 MIPS @ 1.5 GHz
Tilera TILEPro64              64 tiles            700 MHz-866 MHz  19-23 W @ 700 MHz  443 GOPS
Tilera TILE-Gx                16/36/64/100 tiles  1 GHz-1.5 GHz    10-55 W            750 GOPS
AMD Opteron 6100              8/12                1.7-2.3 GHz      65-105 W           -
Intel Xeon Processor LV 5148  2                   2.33 GHz         40 W               -
Intel Sandy Bridge            4                   3.8 GHz          35-45 W            121.6 GFLOPS
AMD Phenom II X6 1090T        6                   3.6 GHz          125 W              -
NVIDIA GeForce GTX 460        336 CUDA cores      1.3 GHz          160 W              748.8 GFLOPS
NVIDIA GeForce 9800 GX2       256 CUDA cores      1.5 GHz          197 W              1152 GFLOPS
NVIDIA GeForce GTX 295        480 CUDA cores      1.242 GHz        289 W              748.8 GFLOPS
NVIDIA Tesla C2050/C2070      448 CUDA cores      1.15 GHz         238 W              1.03 TFLOPS
AMD FireStream 9270           800 stream cores    750 MHz          160 W              1.2 TFLOPS
ATI Radeon HD 4870 X2         1600 stream cores   750 MHz          423 W              2.4 TFLOPS

8.4.1 ARM11 MPCore

The ARM11 MPCore processor features configurable level one caches, a fully coherent data cache, 1.3 GB/sec memory throughput from a single CPU, and vector floating point coprocessors. The ARM11 MPCore processor provides energy-efficiency via accurate branch and sub-routine return prediction (reduces the number of incorrect instruction fetches and decode operations), physically addressed caches (reduces the number of cache flushes and refills), and power and clock gating to disable inputs to idle functional blocks [214]. The ARM11 MPCore supports adaptive shutdown of idle processors to yield dynamic power consumption of 0.49 mW/MHz @ 130 nm process. The ARM Intelligent Energy Manager (IEM) can dynamically predict the application performance and perform DVFS to reduce power consumption to 0.3 mW/MHz [215].

8.4.2 ARM Cortex-A9 MPCore

The ARM Cortex-A9 MPCore is a multi-issue out-of-order superscalar pipelined multi-core processor that consists of one to four Cortex-A9 processor cores grouped in a cluster and delivers a peak performance of 2.5 DMIPS/MHz [216]. The ARM Cortex-A9 MPCore features a snoop control unit (SCU) that ensures cache coherence within the Cortex-A9 processor cluster and a high-performance L2 cache controller that supports between 128K and 8M of L2 cache. The Cortex-A9 processor incorporates the ARM Thumb-2 technology that delivers the peak performance of traditional ARM code while providing up to a 30% instruction memory storage reduction. Each Cortex-A9 processor in the cluster can be in one of the following modes: run mode (the entire processor is clocked and powered-up), standby mode (the CPU clock is gated off and only the logic required to wake up the processor is active), dormant mode (everything except the RAM arrays is powered off), and shutdown (everything is powered off) [217]. The ARM Cortex-A9 MPCore processor can support up to fourteen power domains: four Cortex-A9 processor power domains (one for each core), four Cortex-A9 processor data engines power domains (one for each core), four power domains (one for each of the Cortex-A9 processor caches and translation lookaside buffer (TLB) RAMs), one power domain for SCU duplicated tag RAMs, and one power domain for the remaining logic, private peripherals, and the SCU logic cells [217]. Typical ARM Cortex-A9 MPCore applications include high-performance networking and mobile communications.

8.4.3 MPC8572E PowerQUICC III

Freescale's PowerQUICC III integrated communications processor consists of two processor cores, enhanced peripherals, and a high-speed interconnect technology to match processor performance with I/O system throughput. The MPC8572E PowerQUICC III processor contains an application acceleration block that integrates four powerful engines: a table lookup unit (TLU) (performs complex table searches and header inspections), a pattern-matching engine (PME) (carries out expression matching), a deflate engine (handles file decompression), and a security engine (accelerates cryptography-related operations) [218]. The processor uses DPM to minimize power consumption of idle blocks by putting idle blocks in one of the power saving modes (doze, nap, and sleep) [219]. Typical MPC8572E PowerQUICC III applications include multi-service routing and switching, unified threat management, firewall, and wireless infrastructure equipment (e.g., radio node controllers).

8.4.4 Tilera TILEPro64 and TILE-Gx

Tilera revolutionizes high-performance multi-core embedded computing by leveraging a tiled multi-core architecture (e.g., the TILEPro64 and TILE-Gx processor
family [220][221]). The TILEPro64 and TILE-Gx processor family feature an 8x8 grid and an array of 16 to 100 tiles (cores), respectively, where each tile consists of a 32-bit very long instruction word (VLIW) processor, three deep pipelines delivering up to 3 instructions per cycle (IPC), integrated L1 and L2 cache, and a non-blocking switch that integrates the tile into a power-efficient interconnect mesh. The TILEPro64 and TILE-Gx processors offer 5.6 MB and 32 MB of on-chip cache, respectively, and implement Tilera's dynamic distributed cache (DDC) technology that provides a 2x improvement on average in cache coherence performance over traditional cache technologies using a cache coherence protocol. Each tile can independently run a complete OS or multiple tiles can be grouped together to run a multi-processing OS like SMP Linux. The TILEPro64 and TILE-Gx processor family employs DPM to put idle tiles into a low-power sleep mode. The TILEPro64 and TILE-Gx family of processors can support a wide range of computing applications including advanced networking, wireless infrastructure, telecom, digital multimedia, and cloud computing.

8.4.5 AMD Opteron Processor

The AMD Opteron is a dual-core processor where each core has a private L2 cache but shares an on-chip memory controller. The AMD Opteron 6100 series platform consists of 8 or 12 core AMD Opteron processors. The AMD Opteron's Cool'n'Quiet technology switches cores to low-power states when a temperature threshold is reached [222].

8.4.6 Intel Xeon Processor

Intel leverages Hafnium Hi-K and metal gates in next generation Xeon processors to achieve higher clock speeds and better performance per watt. The Xeon processors also implement hyper-threading and wide dynamic execution technologies for high performance. The wider execution pipelines enable each core to simultaneously fetch, dispatch, execute, and retire up to four instructions per cycle [223]. The Intel Xeon 5500 processor family features 15 power states and a fast transition between these power
states (less than 2 microseconds) [5]. The Xeon processors are based on the Intel Core 2 Duo micro-architecture where the two cores share a common L2 cache to provide faster inter-core communication. The shared L2 cache can be dynamically resized depending on each individual core's needs. Intel's deep power down technology enables both cores and the L2 cache to be powered down when the processor is idle [224]. Intel's dynamic power coordination technology allows software-based DPM to alter each core's sleep state to trade off between power dissipation and performance. The processor incorporates digital temperature sensors on each core to monitor thermal behavior using Intel's advanced thermal manager technology [225]. The Dual-core Intel Xeon processor LV 5148, a low-power embedded processor, enables micro-gating of processor circuitry to disable the processor's inactive portions with finer granularity [226]. Typical applications for the Intel Xeon processor include medical imaging, gaming, industrial control and automation systems, mobile devices, military, and aerospace.

8.4.7 Intel Sandy Bridge Processor

The Sandy Bridge is Intel's second-generation quad-core processor that offers high sustained throughput for floating-point math, media processing applications, and data-parallel computation [227][228]. The processor's floating point unit supports the advanced vector extension (AVX) instruction set that allows vector processing of up to 256 bits in width. The processor leverages the hyper-threading technology that provides the OS with eight logical CPUs. Intel Sandy Bridge leverages Intel Turbo Boost Technology that allows processor cores and the built-in integrated graphics processor (IGP) to run faster than the base operating frequency if the processor is operating below power, current, and temperature specification limits [229]. The processor uses a ring-style interconnect between the cores offering a communication bandwidth up to 384 GB/s.

8.4.8 Graphics Processing Units

A graphics processing unit (GPU) is a massively parallel processor capable of executing a large number of threads concurrently, and accelerates and offloads graphics rendering from the CPU. GPUs feature high memory bandwidth that is typically 10x higher than that of contemporary CPUs. NVIDIA and AMD/ATI are the two main GPU vendors. GPUs are suitable for high-definition (HD) videos, photos, 3D movies, high-resolution graphics, and gaming. Apart from high graphics performance, GPUs enable general-purpose computing on graphics processing units (GPGPU), which is a computing technique that leverages GPUs to perform compute-intensive operations traditionally handled by CPUs. GPGPUs are realized by adding programmable stages and higher precision arithmetic to the rendering pipelines, which enables stream processors to process non-graphics data. For example, the NVIDIA Tesla personal supercomputer consisting of 3 or 4 Tesla C1060 computing processors [230] offers up to 4 TFLOPS of compute capability with 4 GB of dedicated memory per GPU [231].

NVIDIA's PowerMizer technology available on all NVIDIA GPUs is a DPM technique that adapts the GPU to suit an application's requirements [232]. Digital watchdogs monitor GPU utilization and turn off idle processor engines. NVIDIA's Parallel DataCache technology accelerates algorithms, such as ray-tracing, physics solvers, and sparse matrix multiplication, where data addresses are not known a priori [233]. ATI's PowerPlay technology is a DPM solution that monitors GPU activity and adjusts GPU power between low, medium, and high states via DVFS based on workload characteristics. For example, PowerPlay puts the GPU in a low-power state when receiving and composing emails, and switches the GPU to a high-power state for compute-intensive gaming applications. PowerPlay incorporates on-chip sensors to monitor the GPU's temperature and triggers thermal actions accordingly. The PowerPlay technology is available on the ATI Radeon HD 3800 and 4800 series
graphics processors, the ATI Mobility Radeon graphics processors, and the Radeon Express motherboard chipsets.
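
The GPU rows of Table 8-2 allow a rough energy-efficiency comparison. As a hedged illustration, the sketch below divides the listed peak performance by the listed power. The result is only a nominal peak-GFLOPS-per-watt figure derived from the table entries, not a measured efficiency, and the helper name `gflops_per_watt` is our own.

```python
# (name, peak performance in GFLOPS, power in W), copied from the GPU rows of
# Table 8-2; TFLOPS entries are converted to GFLOPS (1 TFLOPS = 1000 GFLOPS).
GPUS = [
    ("NVIDIA GeForce GTX 460", 748.8, 160),
    ("NVIDIA GeForce 9800 GX2", 1152.0, 197),
    ("NVIDIA GeForce GTX 295", 748.8, 289),
    ("NVIDIA Tesla C2050/C2070", 1030.0, 238),
    ("AMD FireStream 9270", 1200.0, 160),
    ("ATI Radeon HD 4870 X2", 2400.0, 423),
]


def gflops_per_watt(rows):
    """Return {name: nominal peak GFLOPS per watt} for each table row."""
    return {name: gflops / watts for name, gflops, watts in rows}


efficiency = gflops_per_watt(GPUS)

# Rank the processors from most to least efficient under this nominal metric.
ranked = sorted(efficiency, key=efficiency.get, reverse=True)
for name in ranked:
    print(f"{name:26s} {efficiency[name]:5.2f} GFLOPS/W")
```

Because the table mixes vendors' peak figures and rated power values, the ranking should be read as indicative only.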

CHAPTER 9
MULTI-CORE PARALLEL AND DISTRIBUTED EMBEDDED WIRELESS SENSOR NETWORKS

Advancements in silicon technology, embedded systems, sensors, micro-electro-mechanical systems, and wireless communications have led to the emergence of embedded wireless sensor networks (EWSNs). EWSNs consist of embedded sensor nodes with attached sensors to sense data about a phenomenon and communicate with neighboring sensor nodes over wireless links (we refer to wireless sensor networks (WSNs) as EWSNs since sensor nodes are embedded in the physical environment/system). EWSNs have applications in various domains such as surveillance, environment monitoring, traffic monitoring, volcano monitoring, and health care, among others.

Many emerging EWSN applications require a plethora of sensors (e.g., acoustic, seismic, temperature, and more recently image sensors and/or smart cameras) embedded in the sensor nodes. Although traditional EWSNs equipped with scalar sensors (e.g., temperature, humidity) transmit most of the sensed information to a sink node (base station node), this sense-transmit paradigm is becoming infeasible for information-hungry applications equipped with a plethora of sensors including image sensors and/or smart cameras. For example, consider a military EWSN deployed in a battlefield, which requires various sensors such as imaging, acoustic, and electromagnetic. In this application, images are appropriate for visually monitoring the battlefield whereas electromagnetic and acoustic sensors enable efficient detection and tracking of targets of interest. Once a target is detected, high resolution images and/or video sequences may be required in real-time for detailed study of the target [234]. This application presents various challenges for existing EWSNs since transmission of high resolution images and video streams over limited wireless bandwidth from sensor nodes to the sink node is infeasible. Furthermore, meaningful processing of multimedia data (acoustic, image, and video in this example) in real-time exceeds the capabilities of
traditional EWSNs consisting of single-core embedded sensor nodes [235], and requires more powerful embedded sensor nodes to realize this application.

Realizing that single-core EWSNs will soon be unable to satisfy the ever growing appetite of information-rich applications (e.g., video sensor networks), each new generation of sensor nodes possesses enhanced computation and communication capabilities. For example, the transmission rate for the first generation of Mica motes is 38.4 kbps whereas the second generation of Mica motes (MicaZ motes) can communicate at 250 kbps using IEEE 802.15.4 (Zigbee) [236]. Despite the advances in communication, limited wireless bandwidth from sensor nodes to the sink node makes timely transmission of multimedia data to the sink node infeasible. In traditional EWSNs, communication energy dominates computation energy. For example, an embedded sensor node produced by Rockwell Automation [237] expends 2000x more energy for transmitting a bit as compared to executing a single instruction [238]. Similarly, transmitting a 15 frames per second (FPS) digital video stream over a wireless Bluetooth link takes 400 mW [239].

Fortunately, there exists a tradeoff between transmission and computation in an EWSN, which is well-suited for in-network processing for information-hungry applications and allows transmission of only event descriptions (e.g., detection of a target of interest) to the sink node to conserve energy. Technological advancements in multi-core architecture have made multi-core processors a viable choice for increasing the computation power of embedded sensor nodes. Multi-core embedded sensor nodes can extract the desired information from the sensed data, which reduces the size of the data sent from the sensor nodes to the sink node by sending only the processed information. By replacing a large percentage of communication by in-network computation, multi-core embedded sensor nodes would experience enormous energy savings that would increase the overall lifetime of the sensor network.
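
The transmission/computation tradeoff above can be illustrated with a back-of-the-envelope model. The sketch below expresses all energies in units of the energy of one executed instruction and uses the 2000x ratio quoted for the Rockwell node [238]. The linear cost model, the frame size, the event-description size, and the instruction count are hypothetical values chosen for illustration only, and the model ignores radio start-up, protocol overhead, and retransmissions.

```python
# From the text: transmitting one bit costs about as much energy as executing
# 2000 instructions on the cited sensor node. All energies below are expressed
# in "instruction-energy units" under an assumed linear cost model.
TX_BIT_TO_INSTRUCTION_RATIO = 2000


def transmit_raw_energy(raw_bits, ratio=TX_BIT_TO_INSTRUCTION_RATIO):
    """Energy to send the raw sensed data to the sink without processing."""
    return raw_bits * ratio


def in_network_energy(instructions, result_bits,
                      ratio=TX_BIT_TO_INSTRUCTION_RATIO):
    """Energy to process the data locally and send only the result."""
    return instructions + result_bits * ratio


def break_even_instructions(raw_bits, result_bits,
                            ratio=TX_BIT_TO_INSTRUCTION_RATIO):
    """Largest local instruction count for which processing still pays off."""
    return (raw_bits - result_bits) * ratio


# Hypothetical scenario: one 320x240 8-bit grayscale frame versus a 256-bit
# event description produced by 50 million instructions of local processing.
raw_bits = 320 * 240 * 8
result_bits = 256
instructions = 50_000_000

e_raw = transmit_raw_energy(raw_bits)
e_local = in_network_energy(instructions, result_bits)
print(f"send raw frame    : {e_raw:,} units")
print(f"process, then send: {e_local:,} units")
print(f"saving factor     : {e_raw / e_local:.1f}x")
print(f"break-even budget : {break_even_instructions(raw_bits, result_bits):,} instructions")
```

Under these assumed numbers, in-network processing remains the cheaper option as long as the local computation stays below the printed break-even instruction budget.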

Multi-core embedded sensor nodes enable energy savings over traditional single-core embedded sensor nodes in two ways. First, a multi-core embedded sensor node reduces the energy expended in communication by performing in-situ computation of sensed data and transmitting only processed information. Second, a multi-core embedded sensor node allows the computations to be split among multiple cores while running each core at a low processor voltage and frequency, which results in energy savings. Utilizing a single-core embedded sensor node for information processing in information-rich applications requires the sensor node to run at a high processor voltage and frequency to meet the application's delay requirements, which increases the power dissipation of the processor. A multi-core embedded sensor node reduces the number of memory accesses, clock speed, and instruction decoding, thereby enabling higher arithmetic performance at a lower power consumption as compared to a single-core processor [239]. Preliminary studies have demonstrated the energy-efficiency of multi-core embedded sensor nodes as compared to single-core embedded sensor nodes in an EWSN. For example, Dogan et al. [240] evaluated single- and multi-core architectures for biomedical signal processing in wireless body sensor networks (WBSNs) where both energy-efficiency and real-time processing are crucial design objectives. Results revealed that the multi-core architecture consumed 66% less power than the single-core architecture for high biosignal computation workloads (i.e., 50.1 mega operations per second (MOPS)) whereas the multi-core architecture consumed 10.4% more power than the single-core architecture for relatively light computation workloads (i.e., 681 kilo operations per second (KOPS)).

Although performance and energy advantages rendered by multi-core processors in embedded sensor nodes could be attained using field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) [238], we focus only on multi-core processors in this chapter due to the following reasons. First, the majority of existing sensor nodes leverage single-core processors; therefore, transitioning from
a single-core to a multi-core architecture provides a comparatively easier change as compared with transitioning to FPGAs and ASICs since both single- and multi-core architectures typically leverage C-based parallel programming languages, a widely adopted programming paradigm. Alternatively, FPGAs require specialized hardware description languages (HDLs) and both FPGAs and ASICs require specialized hardware design expertise. Second, multi-core embedded sensor nodes, similar to single-core embedded sensor nodes, permit flexibility and reprogrammability, which are critical for emerging EWSN applications. For example, real-time video processing requires programmable platforms as new vision methods and algorithms emerge regularly. Although FPGAs also offer reprogrammability, FPGAs require a higher programming effort as compared to multi-core architectures, which may not be feasible for rapidly changing application requirements. Additionally, FPGAs would not be suitable for sensor nodes requiring intense floating point operations due to very large hardware overheads. On the other hand, ASICs provide no flexibility or reprogrammability but can deliver the highest performance and energy efficiency as compared to multi-core architectures and FPGAs. Hence, ASICs would only be suitable for EWSNs requiring a high volume of embedded sensor nodes (to warrant the high cost associated with ASIC production) with fixed application requirements and algorithms.

To the best of our knowledge, this work is the first to highlight the feasibility and application of multi-core technology in EWSNs. Although a few initiatives studying the feasibility of multi-core technology for EWSNs exist in literature [240][241], there has been no work in literature that proposes a multi-core embedded wireless sensor network (MCEWSN) architecture based on multi-core embedded sensor nodes. Furthermore, motives and application domains for MCEWSNs have not yet been characterized. This chapter motivates and summarizes the multi-core initiative in EWSNs by academia and industry.


Figure 9-1. A heterogeneous hierarchical multi-core embedded wireless sensor network (MCEWSN) architecture.

The remainder of this chapter is organized as follows. Section 9.1 proposes an MCEWSN architecture and discusses the architecture of a multi-core embedded sensor node. Section 9.2 elaborates on several compute-intensive tasks that motivated the emergence of MCEWSNs. Potential application domains amenable to MCEWSNs are discussed in Section 9.3. Section 9.4 discusses several prototypes of multi-core embedded sensor nodes and Section 9.5 concludes this chapter.

9.1 Multi-Core Embedded Wireless Sensor Network Architecture

Fig. 9-1 depicts our proposed heterogeneous hierarchical MCEWSN architecture, which is envisioned to cope with the increasing in-network computational requirements of emerging EWSN applications. The heterogeneity in the architecture stems from the integration of numerous single-core embedded sensor nodes and a few multi-core embedded sensor nodes. We note that homogeneous hierarchical single-core EWSNs have been discussed in literature for large EWSNs (EWSNs consisting of a large number of sensor nodes) [242][243]. Our proposed architecture is hierarchical since the architecture comprises various clusters (a group of embedded sensor nodes in communication range with each other) and a sink node. A hierarchical network is well
suited for large EWSNs since small EWSNs (EWSNs consisting of a few sensor nodes) can send the sensed data directly to the base station or sink node.

Each cluster consists of several leaf sensor nodes and a cluster head. Leaf sensor nodes contain a single-core processor and are responsible for sensing, pre-processing sensed data, and transmitting sensed data to the cluster head nodes. Since leaf sensor nodes are not intended to perform complex processing of sensed data in our proposed architecture, a single-core processor sufficiently meets the computation requirements of leaf sensor nodes. Cluster head nodes consist of a multi-core processor and are responsible for coalescing/fusing the data received from leaf sensor nodes for transmission to the sink node in an energy and bandwidth efficient manner. Our proposed architecture with multi-core cluster heads is based on practical reasons: sending all the collected data from the cluster heads to the sink node is not feasible for bandwidth limited EWSNs, which warrants carrying out complex processing and information fusion (discussed in detail in Section 9.2) at cluster head nodes and transmitting only the concise processed information to the sink node.

The sink node contains a multi-core processor and is responsible for transforming high-level user queries from the control and analysis center (CAC) to network-specific directives, querying the MCEWSN for the desired information, and returning the requested information to the user/CAC. The sink node's multi-core processor facilitates post-processing of the information received from multiple cluster heads. The post-processing at the sink node includes information fusion and event detection based on aggregated data from all of the sensor nodes in the network. The CAC further analyzes the information received from the sink node and issues control commands and queries to the sink node.

MCEWSNs can be coupled with a satellite backbone network that provides long-haul communication from the sink node to the CAC since MCEWSNs are often deployed in remote areas with no wireless infrastructure such as a cellular network infrastructure. The satellites in the satellite backbone network communicate with each
other via inter-satellite links (ISLs). Since a satellite's uplink and downlink bandwidth is limited, a multi-core processor in the sink node is required to process, compress, and/or encrypt the information sent to the satellite backbone network.

Even though this chapter focuses on heterogeneous MCEWSNs, homogeneous MCEWSN architectures are an extension of our proposed architecture (Fig. 9-1) where leaf sensor nodes also contain a multi-core processor. In a homogeneous MCEWSN equipped with multiple sensors, each processor core in a multi-core embedded sensor node can be assigned processing of one sensing task (e.g., one processor core handles sensed temperature data and another processor core handles sensed humidity data and so on) as opposed to single-core embedded sensor nodes where the single processor core is responsible for processing the sensed data from all sensors. We focus on heterogeneous MCEWSNs as we believe that heterogeneous MCEWSNs would serve as a first step towards integration of multi-core and sensor networking technology for the following reason. Due to the dominance of single-core embedded sensor nodes in existing EWSNs, replacing all of the single-core embedded sensor nodes with multi-core embedded sensor nodes may not be feasible and cost-effective given that only a few multi-core embedded sensor nodes operating as cluster heads could meet an application's in-network computation requirements. Hence, our proposed heterogeneous MCEWSN would enable a smooth transition from single-core to multi-core EWSNs.

Fig. 9-2 depicts the architecture of a multi-core embedded sensor node in our MCEWSN. The multi-core embedded sensor node consists of a sensing unit, a processing unit, a storage unit, a communication unit, a power unit, an optional actuator unit, and an optional location finding unit (optional units are represented by dotted lines in Fig. 9-2) [235].

9.1.1 Sensing Unit

The sensing unit senses the phenomenon of interest and is composed of two subunits: sensors (e.g., camera/image, audio, and scalar sensors (e.g., temperature,
pressure)) and analog-to-digital converters (ADCs). Image sensors can either leverage traditional charge-coupled device (CCD) technology or complementary metal-oxide-semiconductor (CMOS) imaging technology. The CCD sensor accumulates the incident light energy as the charge accumulated on a pixel, which is then converted into an analog voltage signal. In CMOS imaging technology, each pixel has its own charge-to-voltage conversion and other processing components such as amplifiers, noise correction, and digitization circuits. The CMOS imaging technology enables integration of lens, an image sensor, and image compression and processing technology on a single chip. ADCs convert the analog signals produced by sensors to digital signals, which serve as input to the processing unit.

Figure 9-2. Multi-core embedded sensor node architecture.
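
The analog-to-digital conversion step of the sensing unit can be sketched as a uniform quantizer. This is a minimal illustration, not a model of any particular ADC: the 3.0 V reference voltage, the 10-bit resolution, and the function names `adc_sample` and `code_to_voltage` are all hypothetical choices made for the example.

```python
def adc_sample(voltage: float, v_ref: float = 3.0, bits: int = 10) -> int:
    """Map an analog voltage in [0, v_ref] to an unsigned integer code.

    v_ref and bits are illustrative assumptions, not values from the text.
    """
    max_code = (1 << bits) - 1                 # highest code, e.g. 1023 for 10 bits
    clamped = min(max(voltage, 0.0), v_ref)    # saturate out-of-range inputs
    return round(clamped / v_ref * max_code)


def code_to_voltage(code: int, v_ref: float = 3.0, bits: int = 10) -> float:
    """Reconstruct the voltage that a given digital code represents."""
    return code / ((1 << bits) - 1) * v_ref
```

The reconstruction error of such a quantizer is bounded by one code step (`v_ref / (2**bits - 1)`), which is the resolution the processing unit has to work with downstream.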


9.1.2 Processing Unit

The processing unit consists of a multi-core processor and is responsible for controlling sensors, gathering and processing sensed data, executing the system software that coordinates sensing and communication tasks, and interfacing with the storage unit. The processing unit for traditional sensor nodes consists of a single-core processor for general-purpose applications such as periodic sensing of scalar data (e.g., temperature, humidity). High-performance single-core processors would not be a feasible way to meet computation requirements as these single-core processors would require operation at high processor voltage and frequency. A processor operating at a high voltage and frequency consumes an enormous amount of power as power increases proportionally to the operating processor frequency and the square of the operating processor voltage. Furthermore, even if these energy issues are ignored, a single high-performance processor core may not be able to meet the computation requirements of emerging applications such as multimedia sensor networks in real-time.

Multi-core processors distribute the computations among the available cores, which speeds up the computations as well as conserves energy by allowing each processor core to operate at a lower processor voltage and frequency. Multi-core processors are suitable for streaming and complex, event-based monitoring applications such as in smart camera sensor networks that require data to be processed and compressed as well as require extraction of key information features. For example, the IC3D/Xetal single-instruction multiple-data (SIMD) processor, which consists of a linear processor array (LPA) with 320 reduced instruction set computers (RISC)/processors, is being used in smart camera sensor networks [244].

9.1.3 Storage Unit

The storage unit consists of the memory subsystem, which can be classified as user memory and program memory, and a memory controller, which coordinates memory accesses between different processor cores. The user memory stores sensed
data when immediate data transmission is not possible due to hardware failures, environmental conditions, physical layer jamming, limited energy reserves, or when the data requires processing. The program memory is used for programming the embedded sensor node and using flash memory for the program memory provides persistent storage of application code and text segments. Static random-access memory (SRAM), which does not need periodic refreshing but is expensive in terms of area and power consumption, is used as dedicated processor memory. Synchronous dynamic random-access memory (SDRAM) is typically used as user memory. For example, the Imote2 embedded sensor node, which contains a Marvell PXA271 XScale processor operating between 13 and 416 MHz, has 256 KB SRAM, 32 MB Flash, and 32 MB SDRAM [245].

9.1.4 Communication Unit

The communication unit interfaces the embedded sensor node to the wireless network and consists of a transceiver unit (transceiver and antenna) and communication unit software. The communication unit software mainly consists of the communication protocol stack, and the physical layer software in the case of software defined radio. The transceiver unit consists of either a wireless local area network (WLAN) card, such as an IEEE 802.11b compliant card, or an IEEE 802.15.4 compatible card, such as a Texas Instruments/Chipcon CC2420 chipset. The choice of a transceiver unit card depends upon the application requirements such as desired range and allowable power. The maximum transmit power of IEEE 802.11b cards is higher as compared to IEEE 802.15.4 cards, which results in a higher communication range but consumes more power. For example, the Intel PRO/Wireless 2011 card has a data rate of 11 Mbps and a typical transmit power of 18 dBm but draws 300 mA and 170 mA for sending and receiving, respectively. The CC2420 802.15.4 radio has a maximum data rate of 250 kbps and a transmit power of 0 dBm but draws 17.4 mA and 19.7 mA for sending and receiving, respectively.
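
The two radios quoted above can be compared with a few lines of arithmetic. The sketch below converts the quoted transmit powers from dBm to milliwatts and estimates the charge drawn per transmitted bit. It rests on assumptions that are not in the text: each radio is assumed to sustain its maximum quoted data rate, and protocol overhead, start-up time, and idle listening are ignored. Because the text gives no supply voltage, the sketch reports charge per bit rather than energy per bit. The `Radio` class and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Radio:
    name: str
    data_rate_bps: float   # maximum data rate quoted in the text
    tx_power_dbm: float    # transmit power quoted in the text
    tx_current_a: float    # current drawn while sending
    rx_current_a: float    # current drawn while receiving


def dbm_to_mw(dbm: float) -> float:
    """Convert a power level in dBm to milliwatts: P_mW = 10 ** (dBm / 10)."""
    return 10 ** (dbm / 10)


def charge_per_bit_nc(current_a: float, data_rate_bps: float) -> float:
    """Charge drawn per bit in nanocoulombs, assuming the link runs at data_rate_bps."""
    return current_a / data_rate_bps * 1e9


# Figures taken from the paragraph above.
RADIOS = [
    Radio("Intel PRO/Wireless 2011 (802.11b)", 11e6, 18.0, 0.300, 0.170),
    Radio("CC2420 (802.15.4)", 250e3, 0.0, 0.0174, 0.0197),
]

if __name__ == "__main__":
    for radio in RADIOS:
        print(
            f"{radio.name}: "
            f"TX power {dbm_to_mw(radio.tx_power_dbm):.1f} mW, "
            f"TX charge/bit {charge_per_bit_nc(radio.tx_current_a, radio.data_rate_bps):.1f} nC, "
            f"RX charge/bit {charge_per_bit_nc(radio.rx_current_a, radio.data_rate_bps):.1f} nC"
        )
```

Under these assumptions, 18 dBm corresponds to roughly 63 mW of radiated power against 1 mW for 0 dBm, while the charge per transmitted bit works out to about 27 nC for the 802.11b card and about 70 nC for the CC2420. The 802.11b card's advantage holds only when the link is kept busy at full rate. For the small, infrequent payloads typical of sensor nodes, the much lower instantaneous current of the 802.15.4 radio is usually the deciding factor.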


9.1.5 Power Unit

The power unit supplies power to various components/units on the embedded sensor node and dictates the sensor node's lifetime. The power unit consists of a battery and a DC-DC converter. The DC-DC converter provides a constant supply voltage to the sensor node. The power unit may be augmented by an optional energy-harvesting unit that derives energy from external sources such as solar cells. Although multi-core embedded sensor nodes are more power-efficient as compared to single-core embedded sensor nodes, energy-harvesting units in multi-core cluster heads and the sink node would prolong the MCEWSN lifetime. Energy-harvesting units are more suitable for cluster heads and the sink node as these nodes perform more computations as compared to the single-core leaf sensor nodes. Furthermore, incorporating energy-harvesting units in only a few embedded sensor nodes (i.e., cluster heads and sink node) would not substantially increase the cost of EWSN deployment. Without an energy-harvesting unit, MCEWSNs would only be suitable for applications with relatively small lifetime requirements.

9.1.6 Actuator Unit

The optional actuator unit consists of actuators (e.g., motors, servo, linear actuators, air muscles, muscle wire, camera pan tilt, etc.) and an optional mobilizer unit for sensor node mobility. Actuators enhance the sensing task by opening/closing a switch/relay to control functions such as a camera or antenna orientation and repositioning sensors. Actuators, in contrast to sensors that only sense a phenomenon, typically affect the operating environment by opening a valve, emitting sound, or physically moving the sensor node.

9.1.7 Location Finding Unit

The optional location finding unit determines a sensor node's location. Depending on the application requirements and available resources, the location finding unit can either be global positioning system (GPS)-based or ad hoc positioning system
(APS)-based. Even though GPS is highly accurate, the GPS components are expensive and require direct line of sight between the sensor node and satellites. APS determines a sensor node's position with respect to defined landmarks, which may be other GPS-based sensor nodes [36]. A sensor node estimates the distance from itself to the landmark based on direct communication and the received communication signal strength. A sensor node that is two hops away from a landmark estimates its distance based on the distance estimate of a sensor node one hop away from a landmark via message propagation. A sensor node with distance estimates to three or more landmarks can compute its own position via triangulation.

9.2 Compute-Intensive Tasks Motivating the Emergence of MCEWSNs

Many applications require embedded sensor nodes to perform various compute-intensive tasks that often exceed the computing capability of traditional single-core sensor nodes. These tasks include information fusion, encryption, network coding, and software defined radio to name a few, and motivate the emergence of MCEWSNs. In this section, we discuss these compute-intensive tasks requiring multi-core support in an embedded sensor node.

9.2.1 Information Fusion

Perhaps the most important and crucial processing task in EWSNs is information fusion, which benefits from a multi-core processor in an embedded sensor node. EWSNs produce a large amount of data that must be processed, delivered, and assessed according to application objectives. Since the transmission bandwidth is limited, information fusion condenses the sensed data and transmits only the selected fused information to the sink node. Additionally, the data received from neighboring sensor nodes is often redundant and highly correlated, which warrants fusing the sensed data. Formally, information fusion encompasses theory, techniques, and tools created and applied to exploit the synergy in the information acquired from multiple sources (sensor, databases, etc.) in a manner that the resulting decision or action is in
some sense (qualitatively or quantitatively) better in terms of accuracy or robustness than would be possible if any of the fused sources were used individually without such synergy exploitation [246]. Data aggregation is an instance of information fusion in which the data from various sources is aggregated by means of summarization functions (e.g., minimum, maximum, and average) that reduce the volume of data being manipulated. Information fusion can reduce the amount of data traffic, filter noisy measurements, and make predictions and inferences about a monitored entity.

Information fusion can be computationally expensive particularly for video sensing applications. Unlike scalar data, which can be combined using relatively simple mathematical manipulations such as average and summation, video data is vectorial and requires complex computations to fuse (e.g., edge detection, histogram formation, compression, filtering, etc.). Reducing transmission overhead via information fusion in video sensor networks comes at the expense of a substantial increase in intermediate processing, which warrants the use of multi-core cluster heads in MCEWSNs. Multi-core cluster heads fuse data received from multiple sensor nodes to eliminate redundant transmission and provide fused information to the sink node with minimum data latency. Data latency is the sum of the delay involved in data transmission, routing, and information fusion/data aggregation [242]. Data latency is important in many applications, especially real-time applications, where freshness of data is an important factor. Multi-core cluster heads can fuse data much faster than single-core sensor nodes, which justifies the use of multi-core cluster heads in MCEWSNs with complex real-time computing requirements.

Omnibus model for information fusion: The Omnibus model [247] guides information fusion for sensor-based devices. Fig. 9-3 illustrates the Omnibus model with respect to our MCEWSN architecture and we exemplify the model's usage by considering a surveillance application performing target tracking based on acoustic sensors [246]. The Observe stage, which can be carried out at single-core sensor
nodes and/or multi-core cluster heads, uses a filter (e.g., moving average filter) to reduce noise (Signal Processing) from acoustic sensor data provided by the embedded sensor nodes (Sensing). The Orientate stage, which is carried out at multi-core cluster heads, uses the filtered acoustic data for range estimation (Feature Extraction) and estimates the target's location and trajectory (Pattern Processing). The Decide stage, which is carried out at multi-core cluster heads and/or multi-core sink node, classifies the sensed target (Context Processing) and determines whether the target represents a threat (Decision Making). If the target is a threat, the Act stage, which is carried out at the CAC, intercepts the target (Control) (e.g., with a missile) and activates available armaments (Resource Tasking).

Figure 9-3. Omnibus sensor information fusion model for an MCEWSN architecture.

9.2.2 Encryption

Security is an important issue in many sensor networking applications since sensors are deployed in open environments and are susceptible to malicious attacks. The sensed and/or aggregated data needs to be encrypted for secure transmission to the sink node. The two main practical issues involved in encryption are the size of the encrypted message and the execution time for encryption. Privacy homomorphisms (PHs) are encryption functions suitable for MCEWSNs that allow a set of operations to
be performed on encrypted data without knowing the decryption functions [242]. PHs use a positive integer d ≥ 2 for computing the secret key for encryption such that the size of the encrypted data increases by a factor of d as compared to the original data. Both the security of the encrypted data and the execution time for encryption increase with d. For example, the execution time for encryption of one byte of data is 3,481 clock cycles on a MICA2 mote when d = 2 and this execution time increases to 4,277 clock cycles when d = 4. MICA2 motes cannot handle the computations for d > 4 [242]; hence, applications requiring greater security require multi-core sensor nodes and/or cluster heads to perform computations for d > 4.

9.2.3 Network Coding

Network coding is a coding technique to enhance network throughput in multi-nodal environments such as EWSNs. Despite the effectiveness of network coding for EWSNs, excessive decoding cost associated with network coding hinders the technique's adoption in traditional EWSNs with constrained computing power [248]. Future MCEWSNs will enable adoption of sophisticated coding techniques such as network coding to increase network throughput.

9.2.4 Software Defined Radio

Software defined radio (SDR) is a radio in which some or all of the physical layer functions execute as software. The radio of existing EWSNs is hardware-based, which results in higher production costs and minimal flexibility in supporting multiple waveform standards [249]. MCEWSNs can realize SDR-based radio by enabling fast and parallel computation of signal processing operations needed in SDR (e.g., fast Fourier transform (FFT)). SDR-based MCEWSNs would enable multi-mode, multi-band, and multi-functional radios that can be enhanced using software upgrades.

9.3 MCEWSN Application Domains

MCEWSNs are suitable for sensor networking application domains that require complex in-network information processing such as wireless video sensor networks,
wireless multimedia sensor networks, satellite-based wireless sensor networks, space shuttle sensor networks, aerial-terrestrial hybrid sensor networks, and fault-tolerant sensor networks. In this section, we discuss these application domains for MCEWSNs.

9.3.1 Wireless Video Sensor Networks

Wireless video sensor networks (WVSNs) are WSNs in which smart cameras and/or image sensors are embedded in the sensor nodes. WVSNs emulate the compound eye found in certain arthropods. Although WVSNs are a subset of wireless multimedia sensor networks (WMSNs), we discuss WVSNs separately to emphasize the WVSNs' stand-alone existence. WVSNs are suitable for applications in areas such as homeland security, battlefield monitoring, and mining. For example, video sensors deployed at airports, borders, and harbors provide a level of continuous and accurate monitoring and protection that is otherwise unattainable. We discuss the application of multi-core embedded sensor nodes both for image- and video-centric WVSNs.

In image-centric WVSNs, multiple image/camera sensors observe a scene from multiple directions and are able to describe objects in their true three-dimensional appearance by overcoming occlusion problems. Low-cost imaging sensors are readily available such as CCD and CMOS imaging sensors from Kodak, and the Cyclops camera from the University of California at Los Angeles (UCLA) designed as an add-on for Mica sensor nodes [239]. Image pre-processing involves convolutions and data-dependent operations using a limited neighborhood of pixels. The signal processing algorithms for image processing in WVSNs typically exhibit a high degree of parallelism and are dominated by a few regular kernels (e.g., FFT) responsible for a large fraction of the execution time and energy consumption. Accelerating these kernels on multi-core embedded sensor nodes would achieve significant speedup in execution time and reduction in energy consumption, and would help achieve real-time computation requirements for many applications in energy-constrained domains.
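
The parallelism of neighborhood-based image pre-processing can be illustrated by splitting an image into horizontal bands and filtering each band independently. The sketch below is an illustration only and is not taken from any of the cited systems: the function names are hypothetical, pixels outside the image are assumed to be zero, and Python threads merely stand in for processor cores (CPython threads do not give a real speedup for this kind of loop, so an actual multi-core node would map each band to a core in compiled code).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

Image = List[List[float]]


def convolve_rows(image: Image, kernel: Sequence[Sequence[float]],
                  row_start: int, row_end: int) -> Image:
    """Filter rows [row_start, row_end) with a 3x3 kernel.

    The kernel is applied without flipping (cross-correlation), which equals
    convolution for symmetric kernels such as a box blur. Pixels outside the
    image are treated as 0.
    """
    height, width = len(image), len(image[0])
    out: Image = []
    for r in range(row_start, row_end):
        out_row = []
        for c in range(width):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        acc += image[rr][cc] * kernel[dr + 1][dc + 1]
            out_row.append(acc)
        out.append(out_row)
    return out


def parallel_convolve(image: Image, kernel: Sequence[Sequence[float]],
                      workers: int = 4) -> Image:
    """Split the image into row bands and filter each band in its own worker."""
    if not image:
        return []
    height = len(image)
    workers = max(1, min(workers, height))
    band = -(-height // workers)  # ceiling division: rows per band
    bounds = [(start, min(start + band, height)) for start in range(0, height, band)]
    result: Image = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields band results in submission order, so rows stay ordered.
        for part in pool.map(lambda b: convolve_rows(image, kernel, b[0], b[1]), bounds):
            result.extend(part)
    return result
```

Each band reads only its own rows plus one neighboring row on either side and writes a disjoint part of the output, so the workers need no synchronization beyond the final concatenation. This independence is what makes such kernels attractive for multi-core embedded sensor nodes.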


Video-centric WVSNs rely on multiple video streams from multiple embedded sensor nodes. Since sensor nodes can only serve low-resolution video streams given the sensor nodes' resource limitations, a single video stream alone does not contain enough information for vision analysis such as event detection and tracking; however, multiple sensor nodes can capture video streams from different angles and distances, together providing enormous visual data [236]. Video encoders rely on intraframe compression techniques that reduce redundancy within one frame and interframe compression techniques (e.g., predictive coding) that exploit redundancy among subsequent frames [235]. Video coding techniques require complex algorithms that exceed the computing power of single-core embedded sensor nodes. The visual data from numerous sensor nodes can be combined to give high-resolution video streams; however, this processing requires multi-core embedded sensor nodes and/or cluster heads.

9.3.2 Wireless Multimedia Sensor Networks

A wireless multimedia sensor network (WMSN) consists of wirelessly connected embedded sensor nodes that can retrieve multimedia content such as video and audio streams, still images, and scalar sensor data of the observed phenomenon. WMSNs target a large variety of distributed, wireless, streaming multimedia networking applications ranging from home surveillance to military and space applications. A multimedia sensor captures audio and image/video streams using an embedded microphone and a micro-camera.

Various sensors in a WMSN coordinate closely to achieve application goals. For example, in a military application for target detection and tracking, acoustic and electromagnetic sensors can enable early detection of a target but may not provide adequate information about the target. Additional target details such as type of vehicle, equipped armaments, and onboard personnel are often required and gathering these details requires image sensors. Although the sensing ability in most sensors is isotropic
and attenuates with distance, a distinct characteristic of video/image sensors is these sensors' directional sensing range. Recently omnicameras have become available that can provide complete coverage of the scene around a sensor node; however, applications are limited to close range scenarios to guarantee sufficient image resolution for moving objects [236]. To ensure full coverage of the sensor field, a set of directional cameras is required to capture enough information for activity detection. The image and video sensors' high sensing cost limits these sensors' continuous activation given constrained embedded sensor node resources. Hence, the image and video sensors in a WMSN require sophisticated control such that the image and video sensors are triggered only after a target is detected based on sensed data from other lower cost sensors such as acoustic and electromagnetic sensors.

The desirable WMSN characteristics include the ability to store, process in real-time, correlate, and fuse multimedia data originated from heterogeneous sources [235]. Multimedia contents, especially video streams, require data rates that are orders of magnitude higher than those supported by traditional single-core embedded sensor nodes. To process multimedia data in real-time and to reduce the wireless bandwidth demand, multi-core embedded sensor nodes in the network are required. Multi-core embedded sensor nodes facilitate in-situ processing of voluminous information from various sensors, notifying the CAC only once an event is detected (e.g., target detection).

9.3.3 Satellite-Based Wireless Sensor Networks

A satellite-based wireless sensor network (SBWSN) is a wireless communication sensing network composed of many satellites, each equipped with multi-functional sensors, long-range wireless communication modules, a mobilizer for mobility management, and a computational unit (potentially multi-core) to carry out processing of the sensed data. Traditional satellite missions are extremely expensive to design, build, launch, and operate, thereby motivating the aerospace industry to direct attention to distributed
space missions, which would consist of multiple small, inexpensive, and distributed satellites working in a coordinated fashion to attain mission goals. SBWSNs would enable robust space missions by tolerating the failure of a single or a few satellites as compared to a large single satellite where a single failure could compromise the success of a mission. SBWSNs can be used for a variety of missions such as space weather monitoring, studying the impact of solar storms on Earth's magnetosphere and ionosphere, environmental monitoring (e.g., pollution, land, and ocean surface monitoring), and hazard prediction (e.g., flood and earthquake prediction).

Each SBWSN mission requires specific orbits and constellations to meet mission requirements and GPS provides an essential tool for orbit determination and navigation. Typical constellations include string-of-pearls, flower constellation, and satellite cluster. In particular, the flower constellation provides stable orbit configurations, which are suitable for micro-satellite (mass < 100 kg), nano-satellite (mass < 10 kg), and pico-satellite (mass < 1 kg) missions. Important orbital factors to consider in SBWSN design are relative range (distance) and speed between satellites, the ISL access opportunity, and the ground-link access opportunity. The access time is the time for two satellites to communicate with each other and depends upon distance between the satellites (range). Satellites in an SBWSN can be used as an interferometer, which correlates different images acquired from slightly different angles/viewpoints in order to get better resolution and more meaningful insights.

All the satellites in an SBWSN collaborate to sense the desired phenomenon, communicate over long distances through beam-forming over an ISL, and maintain the network topology through self-organized mobility [250]. Studies indicate that IEEE 802.11b (Wi-Fi) and IEEE 802.16 (WiMax) can be used for inter-satellite communications (communication between satellites) and IEEE 802.15.4 (Zigbee) can be used for intra-satellite (communication between sensor nodes within a satellite) communications [251]. We point out that the IEEE 802.11b protocol requires modifications
for use in an ISL where distance between satellites is more than one kilometer since the IEEE 802.11b standard normally supports a communication range within 300 meters. The feasibility of wireless protocols for inter-satellite communication depends on range, power requirements, medium access control (MAC) features, and support for mobility. The intra-satellite protocols are mainly selected based on power since the range is small within a satellite. A low duty cycle and the ability to put the radio to sleep are desirable features for intra-satellite communication protocols. For example, the MICA2DOT mote, which requires 24 mW of active power and 3 µW of standby power, supplied by a 3 V 750 mAh battery cell can last for 27,780 hours (about three years and two months) while operating at a duty cycle of 0.1% (supported by Zigbee) [252].

Since an individual satellite within an SBWSN may not have sufficient power to communicate with a ground station, a sink satellite in an SBWSN can communicate with a ground station, which is connected to the CAC. Ground communication in SBWSNs takes place in very-high frequency (VHF) (30 MHz to 300 MHz) and ultra-high frequency (UHF) (300 MHz to 3 GHz) bands. VHF frequencies pass through the ionosphere with effects such as scintillation, fading, Faraday rotation, and multi-path effects during intense solar cycles due to reflection of the VHF signals. UHF frequencies, in which both S- and L-bands lie, can suffer severe disruptions during a solar storm. For a formation of several SNAP-1 nano-satellites, the typical downlink data rate is 38.4 kbps or 76.8 kbps maximum [252], which necessitates multi-core embedded sensor nodes in SBWSNs to perform in-situ processing so that only event descriptions are sent to the CAC.

9.3.4 Space Shuttle Sensor Networks

A space shuttle sensor network (3SN) corresponds to a network of sensors aimed to monitor a space shuttle during pre-flight, ascent, on-orbit, and re-entry phases. Battery-operated embedded wireless sensors can be easily bonded to the space shuttle structure and enable real-time monitoring of temperature, triaxial vibration, strain, pressure, tilt, chemical, and ultrasound data. MCEWSNs would enable real-time
monitoring of space vehicles not possible by ground-based sensing systems. For example, the Columbia space shuttle accident was caused by damage done when foam shielding dislodged from the external fuel tank during the shuttle's launch, which damaged the wing leading edge panels [253]. The vehicle lacked on-board sensors that could have enabled ground personnel to determine the extent and location of the damage. Ground-based cameras captured images of the impact but were not able to reliably characterize the location and severity of the impact and resulting damage.

MCEWSNs for space shuttles, currently under development, would be used for space shuttle main engine (SSME) crack investigation, space shuttle environmental control life support system (ECLSS) oxygen and nitrogen flex hoses analysis, and wing leading edge impact detection. Since the amount of data acquired during the 10-minute ascent period is nearly 100 MB, the time to download all data even for a single event via the radio frequency (RF) link is prohibitively long. Hence, information fusion algorithms are required to be implemented in 3SNs to minimize the quantity and increase the quality of data being transmitted via the RF link. Furthermore, MCEWSNs would enable a 10x reduction in the installation costs for the shuttle as compared to the sensing systems based on traditional wired approaches [253].

9.3.5 Aerial-Terrestrial Hybrid Sensor Networks

Aerial-terrestrial hybrid sensor networks (ATHSNs), which consist of ground sensors and aerial sensors, integrate terrestrial sensor networks with aerial/space sensor networks. To connect remote terrestrial EWSNs to a CAC located far away in urban areas, ATHSNs can include a satellite backbone network. The satellite backbone network is widely available at remote locations and provides a reliable and broadband communication network [234][254]. Various satellite communication choices are possible such as WildBlue, HughesNet, and NASA's geostationary operational environmental satellite (GOES) system. However, a satellite's uplink and downlink bandwidth is limited, which requires pre-processing as well as compression of
sensed data (especially multimedia data such as image and video streams). Multi-core embedded sensor nodes are suitable for ATHSNs, and are capable of carrying out the processing and compression of high-quality image and video streams for transmission to and from the satellite backbone network.

Aerial networks in ATHSNs may consist of unmanned aerial vehicles (UAVs) and satellites. For example, consider an ATHSN in which UAVs contain embedded image and video sensors such that only the image scenes that are of significant interest from a military strategy perspective are sensed in greater detail. The working of ATHSNs consisting of UAVs and satellites can be described concisely in seven steps [234]:

1) Ground sensors detect the presence of a hostile target in the monitored field and store events in memory;
2) The satellite periodically contacts multi-core cluster heads in the terrestrial EWSN to download updates about target presence;
3) Satellites contact UAVs to acquire image data about the scene where the intrusion is detected;
4) UAVs gather image data through the embedded image sensors;
5) The embedded multi-core sensors in UAVs process and compress the image data for transmission to the satellite backbone network in a bandwidth efficient manner;
6) The satellite backbone network relays the processed information received from the UAVs to the CAC;
7) The satellite backbone network relays the commands (e.g., launching the UAVs' arsenal) from the CAC to the UAVs.

Ye et al. [254] have implemented an ATHSN prototype for an ecological study using temperature, humidity, photosynthetically active radiation (PAR), wind speed, and precipitation sensors. The prototype consists of a small satellite dish and a communication modem for integrating a terrestrial EWSN with the WildBlue satellite backbone network, which provides commercial service. The prototype uses Intel's Stargate processor as the sink node, which provides access control and manages the use of the satellite link.


The transformational satellite (TSAT) system is a future generation satellite system, which is designed for military applications by the National Aeronautics and Space Administration (NASA), the U.S. Department of Defense (DoD), and the Intelligence Community (IC) [234]. The TSAT system is conceived as a constellation of five satellites, placed in geostationary orbit, that constitute a high-bandwidth satellite backbone network, which allows terrestrial units to access optical and radar imagery from UAVs and satellites in real-time. TSAT provides broadband, reliable, worldwide, and secure transmission of various data. TSAT supports RF communication links with data rates up to 45 Mbps and laser communication links with data rates up to 10-100 Gbps [234].

9.3.6 Fault-Tolerant Sensor Networks

The sensor nodes in an EWSN are typically deployed in harsh and unattended environments, which makes fault-tolerance (FT) an important consideration in EWSN design particularly for space-based WSNs. For example, the temperature of aerospace vehicles varies from cryogenic to extremely high temperature, and pressure from vacuum to very high pressure. Additionally, shock and vibration levels during launch can cause component failures. Furthermore, high levels of ionizing radiation require electronics to be FT if not radiation-hardened (rad-hard). Multi-core embedded sensors can provide hardware-based (e.g., triple modular redundancy (TMR) or self-checking pairs (SCP)) as well as software-based (e.g., algorithm-based fault tolerance (ABFT)) FT mechanisms for applications requiring high reliability. Computations such as pre-processing and data fusion can be replicated on multiple cores so that if radiation corrupts processing on one core, processing on other cores would still enable reliable computation of results.

9.4 Multi-Core Embedded Sensor Nodes

MCEWSNs are not merely a theoretical vision as several initiatives towards multi-core embedded sensor nodes have been undertaken by academia and industry
for various real-time applications. In this section, we describe several state-of-the-art multi-core embedded sensor node prototypes.

9.4.1 InstraNode

InstraNode is a dual-core sensor node for real-time health monitoring of civil structures such as highway bridges and skyscrapers. InstraNode is equipped with a 4000 mAh lithium-ion battery, three accelerometers, a gyroscope, and an IEEE 802.11b (Wi-Fi) card for communication with other nodes. One low-power processor core in InstraNode runs at 3 V and 4 MHz and is dedicated to sampling data from sensors, whereas the other faster, high-power processor core runs at 4.3 V and 40 MHz and is responsible for networking tasks such as transmission/reception of data and execution of a routing algorithm. Furthermore, InstraNode possesses multi-modal operation capabilities such as wired/wireless and battery-powered/AC-adaptor powered options. Experiments indicate that the InstraNode outperforms single-core sensor nodes in terms of power-efficiency and network performance [255].

9.4.2 Mars Rover Prototype Mote

Etchison et al. [256] have proposed a high-performance EWSN for the Mars Rover, which consists of dual-core mobile sensor nodes and a wireless cluster consisting of multiple processors to process image data gathered from the sensor nodes as well as to make decisions based on gathered information. The prototype mote consists of a MicroATX motherboard with Intel's dual-core Atom processor and 2 GB of RAM, and is powered by a 12 V/5 A DC power supply for lab testing. Each mote performs data acquisition, processing, and transmission.

9.4.3 Satellite-Based Sensor Node

Vladimirova et al. [257] have developed a system-on-chip (SoC) satellite-based sensor node (SBSN). The SBSN prototype contains a SPARC V8 LEON3 soft processor core, which allows configuration in a symmetric multiprocessor (SMP) architecture [258]. The LEON3 processor core runs software applications and interfaces with the upper


layers of the communication stack with the IEEE 802.11 protocol. The SBSN prototype uses a number of intellectual property cores such as hardware accelerated Wi-Fi MAC, transceiver core, and a Java co-processor. The Java co-processor enables distributed computing and Internet protocol (IP)-based networking functions in SBWSNs. The inter-satellite communication module (ISCM) in the SBSN prototype adheres to IEEE 802.11 and CubeSat design specifications. The ISCM supports ground communication links and ISLs at variable data rates and configurable waveforms to adapt to channel conditions. The ISCM incorporates S-band (2.4 GHz) as well as 434/144 MHz radio front ends interfaced to a single reconfigurable modem. The ISCM leverages a high-end AD9861 ADC/digital-to-analog converter (DAC) for the 2.4 GHz radio front end for a Maxim 2830 radio and a low-end AD7731 for the 434/144 MHz front end for an Alinco DJC-7E radio. Additionally, the ISCM incorporates current and temperature sensors and a 16-bit microcontroller for housekeeping purposes.

9.4.4 Multi-CPU-based Sensor Node Prototype

Ohara et al. [259] have developed a prototype for an embedded sensor node using three PIC18 central processing units (CPUs). The prototype is supplied by a configurable voltage stabilized power supply but the same voltage is supplied to all CPUs. The prototype allows changing the frequency of each CPU statically by changing a corresponding ceramic resonator. Experiments reveal that the multi-CPU sensor node prototype consumes 76% less power as compared to a single-core sensor node for benchmarks that involved sampling, root mean square calculation, and samples pre-processing for transmission.

9.4.5 Smart Camera Mote

A smart camera mote has been developed by Kleihorst et al. [239], which consists of four basic components: color image sensors, an IC3D SIMD processor (a member of the Philips' Xetal family of SIMD processors) for low-level image processing, a general purpose processor for intermediate and high-level processing


and control, and a communication module. Both of the processors are coupled with a dual-port random-access memory (RAM) that enables these processors to work in a shared workspace. The IC3D SIMD processor consists of a linear array of 320 RISC processors. The peak pixel performance of the IC3D processor is approximately 50 Giga operations per second (GOPS). Despite high pixel performance, the IC3D processor is an inherently low-power processor, which makes the processor suitable for multi-core embedded sensor nodes. The power consumption of the IC3D processor for typical applications such as feature finding or face detection is below 100 mW in active processing modes.

9.5 Concluding Remarks

In this chapter, we proposed an architecture for heterogeneous hierarchical multi-core embedded wireless sensor networks (MCEWSNs). Compute-intensive tasks such as information fusion, encryption, network coding, and software defined radio will benefit in particular from the increased computation power offered by multi-core embedded sensor nodes. Many wireless sensor networking application domains, such as wireless video sensor networks, wireless multimedia sensor networks, satellite-based sensor networks, space shuttle sensor networks, aerial-terrestrial hybrid sensor networks, and fault-tolerant sensor networks, can benefit from MCEWSNs. Perceiving the potential benefits of MCEWSNs, several initiatives have been undertaken in both academia and industry to develop multi-core embedded sensor nodes such as InstraNode, satellite-based sensor node, and smart camera mote.

Despite a few initiatives towards MCEWSNs, the domain is still in its infancy and requires addressing some challenges to facilitate ubiquitous deployment of MCEWSNs. Since battery energy is the most critical resource constraint for MCEWSNs, research and development in energy-efficient batteries and energy-harvesting systems would be beneficial for MCEWSNs. Mobility and self-adaptability of embedded sensor nodes require further research to obtain the desired view of the sensor field (e.g., an image


sensor facing downward towards the earth may not be desirable). Furthermore, distilling information from a large number of low-resolution video streams obtained from multiple video sensors requires novel algorithms since current computer-vision algorithms are able to analyze only a few high-resolution images. Finally, reconfigurability in MCEWSNs is an important research avenue that would allow the network to adapt to new requirements by accepting code upgrades (e.g., a more efficient algorithm for video compression may be discovered after deployment).


CHAPTER 10
A QUEUEING THEORETIC APPROACH FOR PERFORMANCE EVALUATION OF PARALLEL MULTI-CORE EMBEDDED SYSTEMS

With Moore's law supplying billions of transistors on-chip, embedded systems are undergoing a paradigm shift from single-core to multi-core to exploit this high transistor density for high performance. This paradigm shift has led to the emergence of diverse multi-core embedded systems in a plethora of application domains (e.g., high-performance computing, dependable computing, mobile computing, etc.). Many modern embedded systems integrate multiple cores (whether homogeneous or heterogeneous) on-chip to satisfy computing demand while maintaining design constraints (e.g., energy, power, performance, etc.). For example, a 3G mobile handset's signal processing requires 35-40 Giga operations per second (GOPS). Considering the limited energy of a mobile handset battery, these performance levels must be met with a power dissipation budget of approximately 1 W, which translates to a performance efficiency of 25 mW/GOPS or 25 pJ/operation for the 3G receiver [260]. These demanding and competing power-performance requirements make modern embedded system design challenging.

Increasing customer expectations/demands for embedded system functionality have led to an exponential increase in design complexity. While industry focuses on increasing the number of on-chip processor cores to meet customer performance demands, embedded system designers face the new challenge of optimal layout of these processor cores along with the memory subsystem (caches and main memory) to satisfy power, area, and stringent real-time constraints. The short time-to-market (time from product conception to market release) of embedded systems further exacerbates design challenges. Architectural modeling of embedded systems helps in reducing the time-to-market by enabling fast application-to-device mapping, since identifying an appropriate architecture for a set of target applications significantly reduces the design time of an embedded system. To ensure timely completion of embedded systems design


with sufficient confidence in the product's market release, design engineers have to make tradeoffs between the abstraction level and the accuracy a multi-core architecture model can attain.

Modern multi-core embedded systems allow processor cores to share hardware structures such as last-level caches (LLCs) (e.g., level two (L2) or level three (L3) cache), memory controllers, and interconnection networks [261]. Since the LLC's configuration (e.g., size, line size, associativity) and the layout of the processor cores (on-chip location) have a significant impact on a multi-core embedded system's performance and energy, our work focuses on performance and energy characterization of embedded architectures based on different LLC configurations and layout of the processor cores. Though there is a general consensus on using private level one (L1) instruction (L1-I) and data (L1-D) caches in embedded systems, there has been no dominant architectural paradigm for private or shared LLCs. Since many embedded systems contain an L2 cache as the LLC, we focus on the L2 cache; however, our study can easily be extended for L3 caches and beyond as LLCs.

Since multi-core benchmark simulation requires significant simulation time and resources, a lightweight modeling technique for multi-core architecture evaluation is crucial [262]. Previous work presents various multi-core system models; however, these models become increasingly complex with varying degrees of cache sharing [263]. Many of the previous models assumed that either sharing amongst processor cores occurred at the main memory level or the processor cores all shared the same cache hierarchy; however, multi-core embedded systems can have an L2 cache shared by a subset of cores (e.g., Intel's six-core Dunnington processor has L2 caches shared by two processor cores). We leverage for the first time, to the best of our knowledge, queueing network theory as an alternative approach for modeling multi-core embedded systems for performance analysis (though queueing network models have been studied in the context of traditional computer systems [134]). Our queueing network model approach


allows modeling the layout of processor cores (homogeneous or heterogeneous) with caches of different capacities and configurations at different cache levels. Our modeling technique only requires a high-level workload characterization of an application (i.e., whether the application is processor-bound (requiring high processing resources), memory-bound (requiring a large number of memory accesses), or mixed).

Our main contributions in this chapter are:

• We present a novel, queueing theory-based modeling technique for evaluating multi-core embedded architectures that does not require architectural-level benchmark simulation. This modeling technique enables quick and inexpensive architectural evaluation both in terms of design time and resources as compared to developing and/or using existing multi-core simulators and running benchmarks on these simulators. Based on a preliminary evaluation using our model, architecture designers can run targeted benchmarks to further verify the performance characteristics of selected multi-core architectures (i.e., our queueing theory-based model facilitates early design space pruning).

• Our queueing theoretic approach quantifies performance metrics (e.g., response time, throughput) for different workload/benchmark characteristics and different cache miss rates. Although general trends of performance metrics could be anticipated for different cache miss rates and workload characteristics, our work for the first time quantifies the percentage increase and decrease in the performance metrics for different cache miss rates and workload/benchmark characteristics for different architectures.

• Our queueing theoretic approach enables architectural evaluation for workloads with any computing requirements characterized probabilistically. We also propose a method to quantify computing requirements of real benchmarks probabilistically.

• Our queueing theoretic modeling approach can be used for performance per watt and performance per unit area characterizations of multi-core embedded architectures, with varying number of processor cores and cache configurations, to provide a comparative analysis. For performance per watt and performance per unit area computations, we calculate chip area and power consumption for different multi-core embedded architectures with varying number of processor cores and cache configurations.

We point out that although queueing theory has been used in literature for performance analysis of multi-disk systems [134][264], we for the first time, to the best of our knowledge, apply queueing theory-based modeling and performance analysis


techniques to multi-core embedded systems. Furthermore, we for the first time develop a methodology to simulate workloads/benchmarks on our queueing theoretic multi-core model based on probabilities that are assigned according to workload characteristics (e.g., processor-bound, memory-bound, or mixed) and cache miss rates. We verify our queueing theoretic modeling approach by running SPLASH-2 multi-threaded benchmarks on the SuperESCalar simulator (SESC). The SESC simulation results validate our queueing theoretic modeling approach as a quick and inexpensive architectural evaluation method.

Our investigation of performance and energy for different cache miss rates and workloads is important because cache miss rates and workloads can significantly impact the performance and energy of an embedded architecture. Furthermore, cache miss rates also give an indication of the degree of cache contention between different threads' working sets. Our performance, power, and performance per watt results indicate that multi-core embedded architectures that leverage shared LLCs are scalable and provide the best LLC performance per watt. However, shared LLC architectures may introduce main memory response time and throughput bottlenecks for high cache miss rates. The architectures that leverage a hybrid of private and shared LLCs are scalable and alleviate main memory bottlenecks at the expense of reduced performance per watt. The architectures with private LLCs exhibit less scalability but do not introduce main memory bottlenecks at the expense of reduced performance per watt.

10.1 Related Work

Queueing theory has been used in literature for the performance analysis of computer networks and other computer systems. Samari et al. [265] used queueing theory for the analysis of distributed computer networks. The authors proposed a correction factor in the analytical model and compared the results with those of the analytical model without the correction factor [266] and with simulation to verify the correctness of the proposed model. Mainkar et al. [267] used queueing theory-based models for


performance evaluation of computer systems with a central processing unit (CPU) and disk drives. Our work differs from the previous work on queueing theory-based models for computer systems in that our work applies queueing theory for performance evaluation of multi-core embedded systems with different cache subsystems, which were not investigated before using queueing theory. Furthermore, our work introduces a novel way of representing workloads with different computing requirements probabilistically in a queueing theory based model.

Previous work presents evaluation and modeling techniques for multi-core embedded architectures for different applications and varying workload characteristics. Savage et al. [263] proposed a unified memory hierarchy model for multi-core architectures that captured varying degrees of cache sharing at different cache levels. The model, however, only worked for straight-line computations that could be represented by directed acyclic graphs (DAGs) (e.g., matrix multiplication, fast Fourier transform (FFT)). Our queueing theoretic models can work for virtually any type of workload with any computing requirements. Fedorova et al. [261] studied contention-aware task scheduling for multi-core architectures with shared resources (caches, memory controllers, and interconnection networks). They modeled the contention-aware task scheduler and investigated the scheduler's impact on application performance for multi-core architectures. Our queueing theoretic models permit a wide range of scheduling disciplines based on workload requirements (e.g., first-come-first-served (FCFS), priority, round robin (RR), etc.).

Some previous work investigated performance and energy aspects for multi-core systems. Kumar et al. [268] studied power, throughput, and response time metrics for heterogeneous CMPs. The authors observed that heterogeneous CMPs could improve energy per instruction by 4-6x and throughput by 63% over an equivalent area homogeneous CMP because of closer adaptation to the resource requirements of different application phases. The authors used a multi-core simulator for performance


analysis; however, our queueing theoretic models can be used as a quick and inexpensive alternative for investigating performance aspects of heterogeneous CMPs. Sabry et al. [269] investigated performance, energy, and area tradeoffs for private and shared L2 caches for multi-core embedded systems. The authors proposed a SystemC-based platform that could model private, shared, and hybrid L2 cache architectures (a hybrid L2 cache architecture contains several private L2 caches, each containing only private data, and a unified shared L2 cache that stores only shared data). The SystemC-based model, however, required integration with other simulators such as MPARM to obtain performance results. Our queueing theoretic models do not require integration with other multi-core simulators to obtain performance results and serve as an independent comparative performance analysis approach for multi-core architecture evaluation. Benítez et al. [270] proposed an adaptive L2 cache architecture that adapted to fit the code and data during runtime using partial cache array shutdown. The adaptive cache could be configured in four modes that prioritized either instructions per cycle (IPC), processor power dissipation, processor energy consumption, or processor power² × delay product. Experiments revealed that CMPs with 2 MB of private adaptive L2 cache provided 14.2%, 44.3%, 18.1%, and 29.4% improvement in IPC, power dissipation, energy consumption, and power² × delay, respectively, over a 4 MB shared L2 cache. Our work does not consider adaptive private caches but compares private, shared, and hybrid caches on an equal area basis to provide a fair comparison between different LLCs.

There exists work in the literature related to memory subsystem latency and throughput analysis. Ruggiero [271] investigated cache latency (at all cache levels), memory latency, cache bandwidth/throughput (at all cache levels), and memory bandwidth for multi-core embedded systems using LMBench, an open source benchmark suite, on Intel processors. Our models enable measurement of cache latency and


throughput of modeled architectures in an early design phase when fabricated architectures are not available.

Although there exists previous work on performance evaluation, our work is novel because we for the first time develop queueing network models for various multi-core embedded architectures. Some previous work presents benchmark-driven evaluation for specific embedded architectures; however, multi-core embedded architecture evaluation considering different workload characteristics, cache configurations (private, shared, or hybrid), and miss rates with comparative analysis has not been addressed.

10.2 Queueing Network Modeling of Multi-Core Embedded Architectures

In this section, we define queueing network terminologies and our modeling approach in the context of multi-core embedded architectures. We often use the term jobs instead of tasks (decomposed workload resulting from parallelizing a job) to be consistent with the queueing network terminology. Our modeling approach is broadly applicable to multi-programmed workloads where multiple jobs run on the multi-core embedded architecture as well as to parallelized applications/jobs that run different tasks on the multi-core architectures.

A queueing network consists of service centers (e.g., processor core, L1-I cache, L1-D cache, L2 cache, and main memory (MM)) and customers (e.g., jobs/tasks). A service center consists of one or more queues to hold jobs waiting for service. Arriving jobs enter the service center's queue and a scheduling/queueing discipline (e.g., first-come-first-served (FCFS), priority, round robin (RR), processor sharing (PS), etc.) selects the next job to be served when a service center becomes idle. The queueing discipline is preemptive if an arriving higher priority job can suspend the service/execution of a lower priority job; otherwise the queueing discipline is non-preemptive. FCFS is a non-preemptive queueing discipline that serves the waiting jobs in the order in which the jobs enter the queue. Priority based queueing disciplines can be preemptive or non-preemptive and serve the jobs based on an assigned job priority.
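The selection rules of the queueing disciplines named above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: all jobs are already waiting at time zero, a smaller priority number means higher priority, and the job names, priorities, service times, and the function names `fcfs_order`, `priority_order`, and `round_robin_order` are hypothetical and not taken from the thesis. The round robin (RR) variant follows the standard time-quantum definition.

```python
from collections import deque
import heapq


def fcfs_order(jobs):
    """FCFS: serve jobs in arrival order. jobs = [(name, priority, service_time)]."""
    queue = deque(jobs)
    order = []
    while queue:
        name, _, _ = queue.popleft()
        order.append(name)
    return order


def priority_order(jobs):
    """Non-preemptive priority: lowest priority number first, ties by arrival order."""
    heap = [(prio, idx, name) for idx, (name, prio, _) in enumerate(jobs)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order


def round_robin_order(jobs, quantum):
    """RR: each job gets one quantum per turn; unfinished jobs rejoin the tail.

    Returns the order in which jobs complete.
    """
    queue = deque((name, service) for name, _, service in jobs)
    finished = []
    while queue:
        name, remaining = queue.popleft()
        remaining -= quantum
        if remaining > 0:
            queue.append((name, remaining))  # not done: back to the end of the queue
        else:
            finished.append(name)
    return finished


# Hypothetical jobs: (name, priority, service time in arbitrary units)
jobs = [("A", 2, 5), ("B", 1, 2), ("C", 3, 1)]
print(fcfs_order(jobs))            # service order under FCFS
print(priority_order(jobs))        # service order under non-preemptive priority
print(round_robin_order(jobs, 2))  # completion order under RR with quantum 2
```

The same three jobs are served or completed in a different order under each discipline, which is why the choice of discipline affects the per-job response time at a service center.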


In the RR queueing discipline, a job receives a service time quantum (slot). If the job does not complete during the service time quantum, the job is placed at the end of the queue to resume during a subsequent service time quantum. In the PS queueing discipline, all jobs at a service center are serviced simultaneously (and hence there is no queue) with the service center's speed equally divided across all of the jobs. After being serviced, a job either moves to another service center or leaves the network.

A queueing network is open if jobs arrive from an external source, spend time in the network, and then depart. A queueing network is closed if there is no external source and no departures (i.e., a fixed number of jobs circulate indefinitely among the service centers). A queueing network is a single-chain queueing network if all jobs possess the same characteristics (e.g., arrival rates, required service rate, and routing probabilities for various service centers) and are serviced by the same service centers in the same order. If different jobs can belong to different chains, the network is a multi-chain queueing network. An important class of queueing networks is product-form, where the joint probability of the queue sizes in the network is a product of the probabilities for the individual service centers' queue sizes.

The queueing network performance metrics include response time, throughput, and utilization. The response time is the amount of time a job spends at the service center including the queueing delay (the amount of time a job waits in the queue) and the service time. The service time of a job depends upon the amount of work (e.g., number of instructions) needed by that job. The throughput is defined as the number of jobs served per unit of time. In our multi-core embedded architecture context, throughput measures the number of instructions/data (bits) processed by the architectural element (processor, cache, MM) per second. Utilization measures the fraction of time that a service center (processor, cache, MM) is busy. Little's law governs the relationship between the number of jobs in the queueing network $N$ and response time $tr$ (i.e.,


$N = \lambda \cdot tr$ where $\lambda$ denotes the average arrival rate of jobs admitted to the queueing network [272]).

We consider the closed product-form queueing network for modeling multi-core embedded architectures because a typical embedded system executes a fixed number of jobs (e.g., a mobile phone has only a few applications to run such as instant messaging, audio coding/decoding, calculator, graphics interface, etc.). Furthermore, closed product-form queueing networks assume that a job leaving the network is replaced instantaneously by a statistically identical new job [134]. Table 10-1 describes the multi-core embedded architectures that we evaluate in this chapter. We focus on embedded architectures ranging from 2 (2P) to 4 (4P) processor cores to reflect current architectures [225]; however, our model is applicable to any number of cores. Our modeled embedded architectures contain processor cores, L1-I and L1-D private caches, L2 caches (private or shared), and MM (embedded systems are typically equipped with DRAM/NAND/NOR Flash memory [273][274]).

Table 10-1. Multi-core embedded architectures with varying processor cores and cache configurations (P denotes a processor core, M main memory, and integer constants in front of P, L1ID, L2, and M denote the number of these architectural components in the embedded architecture).

Architecture        Description
2P-2L1ID-2L2-1M     Multi-core embedded architecture with 2 processor cores, private L1 I/D caches, private L2 caches, and a shared M
2P-2L1ID-1L2-1M     Multi-core embedded architecture with 2 processor cores, private L1 I/D caches, a shared L2 cache, and a shared M
4P-4L1ID-4L2-1M     Multi-core embedded architecture with 4 processor cores, private L1 I/D caches, private L2 caches, and a shared M
4P-4L1ID-1L2-1M     Multi-core embedded architecture with 4 processor cores, private L1 I/D caches, a shared L2 cache, and a shared M
4P-4L1ID-2L2-1M     Multi-core embedded architecture with 4 processor cores, private L1 I/D caches, 2 shared L2 caches, and a shared M
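As a small worked illustration of Little's law as stated in this passage, the following Python sketch computes the number of jobs from an arrival rate and a response time, and conversely the response time from a job count and a throughput. The numeric values and the function names `jobs_in_network` and `response_time` are illustrative assumptions, not measurements or code from the thesis. The second function relies on the standard observation that in a closed network the throughput plays the role of the arrival rate.

```python
import math


def jobs_in_network(arrival_rate, response_time_s):
    """Little's law: N = lambda * tr (jobs = jobs/second * seconds)."""
    return arrival_rate * response_time_s


def response_time(n_jobs, throughput):
    """Little's law rearranged: tr = N / throughput.

    In a closed network the throughput takes the place of the arrival rate.
    """
    return n_jobs / throughput


# Assumed example values (not from the thesis):
# 200 jobs/s admitted, each spending 0.05 s in the network on average.
n = jobs_in_network(200.0, 0.05)
print(f"average jobs in network: {n:.1f}")

# Closed network with 4 circulating jobs and a throughput of 80 jobs/s.
tr = response_time(4, 80.0)
print(f"mean response time: {tr * 1e3:.1f} ms")
```

With these assumed values, the first call gives about 10 jobs in the network and the second a mean response time of about 50 ms.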


Consider a closed product-form queueing network with $I$ service centers where each service center $i \in I$ has a service rate $\mu_i$. Let $p_{ij}$ be the probability of a job leaving service center $i$ and entering another service center $j$. The relative visit count $\vartheta_j$ to service center $j$ is

\[ \vartheta_j = \sum_{i=1}^{I} \vartheta_i \, p_{ij} \tag{10-1} \]

The performance metrics (e.g., throughput, response time, etc.) for a closed product-form queueing network can be calculated using a mean value analysis (MVA) iterative algorithm [275]. The basis of MVA is a theorem stating that when a job arrives at a service center in a closed network with $N$ jobs, the distribution of the number of jobs already queued is the same as the steady state distribution of $N - 1$ jobs in the queue [276]. Solving Equation 10-1 using MVA recursively gives the following performance metric values: the mean response time $\bar{r}_i(k)$ at service center $i$, the mean queueing network throughput $\bar{T}(k)$, the mean throughput of jobs $\bar{t}_i(k)$ at service center $i$, and the mean queue length $\bar{l}_i(k)$ at service center $i$ when there are $k$ jobs in the network. The initial recursive conditions are at $k = 0$ such that $\bar{r}_i(0) = \bar{T}(0) = \bar{t}_i(0) = \bar{l}_i(0) = 0$. The values for these performance metrics can be calculated for $k$ jobs based on the computed values for $k - 1$ jobs as [134]

\[ \bar{r}_i(k) = \frac{1}{\mu_i} \left( 1 + \bar{l}_i(k-1) \right) \tag{10-2} \]

\[ \bar{T}(k) = \frac{k}{\bar{R}} = \frac{k}{\sum_{i=1}^{I} \vartheta_i \, \bar{r}_i(k)} \tag{10-3} \]

\[ \bar{t}_i(k) = \vartheta_i \, \bar{T}(k) \tag{10-4} \]

\[ \bar{l}_i(k) = \bar{t}_i(k) \, \bar{r}_i(k) \tag{10-5} \]

To explain our modeling approach for multi-core embedded architectures, we describe a sample queueing model for the 2P-2L1ID-2L2-1M architecture in detail (other


Figure 10-1. Queueing network model for the 2P-2L1ID-2L2-1M multi-core embedded architecture.

architecture models follow a similar explanation). Fig. 10-1 depicts the queueing network model for 2P-2L1ID-2L2-1M. The task scheduler schedules the tasks/jobs on the two processor cores P1 and P2. We assume that the task scheduler is contention-aware and schedules tasks with minimal or no contention on cores sharing LLCs [261]. The queueing network consists of two chains: chain one corresponds to processor core P1 and chain two corresponds to processor core P2. The jobs serviced by P1 either reenter P1 with probability $Pr^{1}_{P_1P_1}$ or enter the L1-I cache with probability $Pr^{1}_{P_1L1I}$ or the L1-D cache with probability $Pr^{1}_{P_1L1D}$. The job arrival probabilities into the service centers (processor core, L1-I, L1-D, L2, or MM) depend upon the workload characteristics (i.e., processor-bound, memory-bound, or mixed). The data from the L1-I cache and the L1-D cache returns to P1 with probabilities $Pr^{1}_{L1IP_1}$ and $Pr^{1}_{L1DP_1}$, respectively, after L1-I and L1-D cache hits. The requests from the L1-I cache and the L1-D cache are directed to the L2 cache with probabilities $Pr^{1}_{L1IL2}$ and $Pr^{1}_{L1DL2}$, respectively, after L1-I and L1-D cache misses. The probability of requests entering P1 or the L2 cache from the L1-I and L1-D cache depends on the miss rates of the L1-I and L1-D caches. After an L2 cache hit, the requested data is transferred to P1


with probability $Pr^{1}_{L2P_1}$, or the request enters MM with probability $Pr^{1}_{L2M}$ after an L2 cache miss. The requests from MM always return to P1 with probability $Pr^{1}_{MP_1} = 1$. The queueing network chain and path for chain two corresponding to P2 follow the same pattern as chain one corresponding to P1. For example, requests from the L2 cache in chain two either return to P2 with probability $Pr^{2}_{L2P_2}$ after an L2 cache hit or enter MM with probability $Pr^{2}_{L2M}$ after an L2 cache miss.

The queueing network model probabilities for the 2P-2L1ID-2L2-1M multi-core architecture for memory-bound workloads (processor-to-processor probability $P_{pp} = 0.1$, processor-to-memory probability $P_{pm} = 0.9$), assuming that L1-I, L1-D, and L2 cache miss rates are 25%, 50%, and 30%, respectively, are set as: $Pr^{1}_{P_1P_1} = 0.1$, $Pr^{1}_{P_1L1I} = 0.45$, $Pr^{1}_{P_1L1D} = 0.45$, $Pr^{1}_{L1IP_1} = 0.75$, $Pr^{1}_{L1DP_1} = 0.5$, $Pr^{1}_{L1IL2} = 0.25$, $Pr^{1}_{L1DL2} = 0.5$, $Pr^{1}_{L2P_1} = 0.7$, $Pr^{1}_{L2M} = 0.3$, $Pr^{1}_{MP_1} = 1$ (different probabilities are assigned for processor-bound or mixed workloads). Our model allows study of workloads based on an overall workload behavior where $P_{pp}$ and $P_{pm}$ remain uniform throughout the workload. Our model also allows detailed study of workloads with different phases by assigning a different $P_{pp}$ and $P_{pm}$ for each phase.

To further elaborate on our modeling approach, Fig. 10-2 depicts the queueing model for 2P-2L1ID-1L2-1M, which is similar to the model for 2P-2L1ID-2L2-1M (Fig. 10-1) except that this queueing model contains a shared L2 cache (L2s denotes the shared L2 cache in Fig. 10-2) for the two processor cores P1 and P2 instead of private L2 caches. The queueing network consists of two chains: chain one corresponds to processor core P1 and chain two corresponds to processor core P2. The requests from the L1-I cache and the L1-D cache for chain one go to the shared L2 cache (L2s) with probability $Pr^{1}_{L1IL2s}$ and $Pr^{1}_{L1DL2s}$, respectively, on L1-I and L1-D cache misses. The requested data is transferred from the L2 cache to P1 with probability $Pr^{1}_{L2sP_1}$ on an L2 cache hit whereas the data request goes to MM with probability $Pr^{1}_{L2sM}$ on an L2 cache miss. The requests from the L1-I cache and the L1-D cache for chain two go


Figure 10-2. Queueing network model for the 2P-2L1ID-1L2-1M multi-core embedded architecture.

to the shared L2 cache (L2s) with probability $Pr^{2}_{L1IL2s}$ and $Pr^{2}_{L1DL2s}$, respectively, on L1-I and L1-D cache misses. The requested data from the L2 cache is transferred to P2 with probability $Pr^{2}_{L2sP_2}$ on an L2 cache hit whereas the data request goes to MM with probability $Pr^{2}_{L2sM}$ on an L2 cache miss.

Our queueing theoretic models determine performance metrics of component-level architectural elements (e.g., processor cores, L1-I, L1-D, etc.); however, embedded system designers are often also interested in system-wide performance metrics. For example, system-wide response time of an architecture is an important metric for real-time embedded applications. Our queueing theoretic models enable calculations of system-wide performance metrics. Based on our queueing theoretic models, we can


calculate the system-wide response time $R_{tr}$ of a multi-core embedded architecture as

\[
\begin{aligned}
R_{tr} = {} & \max_{\forall\, i = 1, \ldots, N_p} \left( Pr^{i}_{P_iP_i} \times tr_{P_i} \right)
 + \max_{\forall\, i = 1, \ldots, N_p} \left( Pr^{i}_{P_iL1I} \times tr^{i}_{L1I} \right)
 + \max_{\forall\, i = 1, \ldots, N_p} \left( Pr^{i}_{P_iL1D} \times tr^{i}_{L1D} \right) \\
& + \max_{\forall\, i = 1, \ldots, N_p} \left( \left( Pr^{i}_{L1IL2} + Pr^{i}_{L1DL2} \right) \times tr^{i}_{L2} \right)
 + \max_{\forall\, i = 1, \ldots, N_p} \left( Pr^{i}_{L2M} \times tr_{M} \right)
\end{aligned}
\tag{10-6}
\]

where $N_p$ denotes the total number of processor cores in the multi-core embedded architecture. Since processor cores in a multi-core embedded architecture operate in parallel, the effective response time of the processor cores is the maximum response time out of all of the processor cores' response times as given in Equation 10-6 (similar reasoning holds for other architectural elements). $tr_{P_i}$, $tr^{i}_{L1I}$, $tr^{i}_{L1D}$, $tr^{i}_{L2}$, and $tr_{M}$ denote the response time for processor core $P_i$, L1-I, L1-D, and L2 corresponding to chain $i$, and MM, respectively. $Pr^{i}_{P_iP_i}$ denotes the probability of requests looping back from processor core $P_i$ to processor core $P_i$ in the queueing network chain $i$ (the total number of chains in the queueing network is equal to $N_p$). $Pr^{i}_{P_iL1I}$ and $Pr^{i}_{P_iL1D}$ denote the probability of requests going from processor core $P_i$ in chain $i$ to the L1-I cache and the L1-D cache, respectively. $Pr^{i}_{L1IL2}$ and $Pr^{i}_{L1DL2}$ denote the probability of requests going from the L1-I cache and the L1-D cache in chain $i$ to the L2 cache, respectively. $Pr^{i}_{L2M}$ denotes the probability of requests going from the L2 cache in chain $i$ to MM. System-wide throughput can be given similar to Equation 10-6.

Our queueing network modeling provides a faster alternative for performance evaluation of multi-core architectures as compared to running complete benchmarks on multi-core simulators (and/or trace simulators), though at the expense of accuracy. Our queueing network models only require simulating a subset of the benchmark's instructions (specified implicitly by the service rates of the architectural components such as processor cores and caches) necessary to reach steady state/equilibrium


of the queueing network (the exact minimum number of instructions required depends upon the workload behavior) with workload behavioral characteristics captured by processor-to-processor and processor-to-memory probabilities (as shown in Fig. 10-1).

10.3 Queueing Network Models Validation

We validate our queueing network models for different cache miss rates and workloads and find that the model's simulation results conform with expected queueing theoretical results. For example, Fig. 10-3 depicts the response time for mixed workloads ($P_{pp} = 0.5$, $P_{pm} = 0.5$) for 2P-2L1ID-1L2-1M as the number of jobs/tasks $N$ varies. The figure shows that as $N$ increases, the response time for the processor core, L1-I, L1-D, L2, and MM increases for all of the cache miss rates. We point out that cache miss rates could increase as $N$ increases due to inter-task address conflicts and increasing cache pressure (increased number of working sets in the cache), but we assume that the cache sizes are sufficiently large so that capacity misses remain the same for the considered number of jobs. We present the average response time individually for the processor cores and the L1-I, L1-D, and L2 caches. For smaller L1-I, L1-D, and L2 cache miss rates, the processor core response time increases drastically as $N$ increases because most of the time jobs are serviced by the processor core, whereas for larger L1-I, L1-D, and L2 cache miss rates, the MM response time increases drastically because of a large number of MM accesses. These results along with our other observed results conform with the expected queueing theoretical results and validate the correctness of our queueing network models for multi-core architectures. We point out that small variations in results could be due to inaccuracies in the SHARPE simulator, but do not change the overall trends.

We further validate our queueing theoretic approach for modeling multi-core architectures using multi-threaded benchmarks executing on a multi-core simulator. We choose kernels/applications from the SPLASH-2 benchmark suite, which represent a


Figure 10-3. Queueing network model validation demonstration: response time (ms) for mixed workloads for 2P-2L1ID-1L2-1M for a varying number of jobs $N$.

range of computations in the scientific, engineering, and graphics domains. We briefly describe our selected kernels/applications from the SPLASH-2 benchmark suite [277]:

• Fast Fourier transform (FFT): The FFT kernel is a complex 1-D algorithm for FFT calculation, which is optimized to minimize interprocess communication. The kernel's time and memory requirement growth rates are $O(N^{1.5} \log N)$ and $O(N)$, respectively.

• LU decomposition: The LU kernel factors a dense matrix into the product of a lower triangular and an upper triangular matrix using a blocking algorithm that exploits temporal locality on submatrix elements. We obtain results using the non-contiguous version of LU in the SPLASH-2 suite as the non-contiguous version exhibits better performance on chip multi-processors (CMPs) [278], which is the focus of our study. The kernel's time and memory requirement growth rates are $O(N^3)$ and $O(N)$, respectively.

• Radix: The integer radix sort kernel implements an iterative non-comparison-based sorting algorithm for integers. The kernel's time and memory requirement growth rates are both $O(N)$.

• Raytrace: The Raytrace application renders a three dimensional scene using ray tracing. The time and memory requirement growth rates for this application are unpredictable [262].


Water-Spatial: This application evaluates forces that occur over time in a system of water molecules. The application's time and memory requirement growth rates are both O(N).

To enable probabilistic characterization of multi-threaded benchmarks/workloads that can be used in our queueing theoretic models, we outline a procedure to estimate Ppp and Ppm for benchmarks given a few processor and memory statistics for the benchmarks. For example, Ppp can be estimated as

Ppp = Op / Ot    (10)

where Op and Ot denote processor operations and total operations (processor and memory) in a benchmark, respectively. For floating point benchmarks, Op can be assumed to be equal to the number of floating point (FP) operations. Ot can be obtained as

Ot = Op + Om    (10)

where Om denotes the number of memory operations, which is the sum of the total read and total write operations. Ppm can be obtained as Ppm = 1 - Ppp.

Using our probabilistic characterization procedure for benchmarks and the processor and memory statistics from [277], we obtain Ppp for FFT, LU, and Water-Spatial as 0.48, 0.38, and 0.46, respectively. For Radix, FP operations are not specified because Radix is an integer sort benchmark; however, sorting 1 million integers requires Op ≈ 1 million (since Radix requires O(N) operations for sorting N integers), from which we approximate Ppp to be 0.05 using our probabilistic characterization procedure. From our probabilistic characterization of workloads, FFT and Water-Spatial can be classified as mixed workloads, whereas LU and Radix can be classified as memory-bound workloads (Radix is a highly memory-intensive workload, as indicated by its Ppm = 1 - 0.05 = 0.95). We could not characterize Raytrace probabilistically because of insufficient statistics on the number of operations required (one may assume Ppp


= 0.5 for such benchmarks). We point out that we can also obtain these processor and memory statistics for different benchmarks by running these benchmarks on a multi-core simulator first (since running benchmarks on a multi-core simulator takes time, benchmarks should only be run on a multi-core simulator when the benchmarks' statistics are not available from any prior work in the literature).

We simulate the architectures in Table 10-1 using the SuperESCalar simulator (SESC) [279]. SESC models the MIPS instruction set architecture and uses MINT as a functional simulator with a library that handles most operating system calls, which eliminates the need for a full operating system running in the simulator and therefore significantly accelerates simulation. In order to accurately capture our modeled architectures with SESC, we specify the same processor and cache parameters (e.g., processor operating frequency, cache sizes and associativity, etc.) for the architectures in the SESC configuration files as specified in our queueing theoretic models. We consider single-issue processors with five pipeline stages and a 45 nm process technology. We point out that the execution times for the benchmarks on SESC are calculated from the number of cycles required to execute those benchmarks, that is, execution time = (number of cycles) × (cycle time). For example, FFT requires 964,057 cycles to execute on 4P-4L1ID-4L2-1M at 16.8 MHz (cycle time of 59.524 ns), which gives an execution time of 57.38 ms.

To verify that the insights obtained from our queueing theoretic models regarding architectural evaluation are the same as those obtained from a multi-core simulator, we compare the execution time results for the FFT (an example for mixed workloads), LU (non-contiguous) (an example for memory-bound workloads), and Raytrace (an example for mixed workloads) benchmarks on SESC and our queueing theoretic models (the benchmarks are represented probabilistically for our queueing theoretic models) for the four-core processor architectures. We assign the probabilities corresponding to these benchmarks for our queueing theoretic models from the cache miss statistics obtained


Table 10-2. Cache miss rate statistics obtained from SESC for the SPLASH-2 benchmarks for multi-core architectures.

                   FFT                      LU                         Raytrace
Architecture       L1-I    L1-D    L2       L1-I     L1-D    L2        L1-I    L1-D    L2
4P-4L1ID-4L2-1M    0.011   0.082   0.135    0.0003   0.202   0.0594    0.025   0.095   0.125
4P-4L1ID-1L2-1M    0.011   0.083   0.08     0.00024  0.212   0.0182    0.024   0.094   0.06
4P-4L1ID-2L2-1M    0.011   0.083   0.086    0.00024  0.212   0.02      0.024   0.094   0.104


from SESC as depicted in Table 10-2. We point out that probabilities in our queueing theoretic model can be assigned based on the cache miss statistics for the benchmarks from any prior work in the literature. System-wide response time is obtained from our queueing theoretic model for these benchmarks using Equation 10. Table 10-3 summarizes the execution time results for the FFT, LU, and Raytrace benchmarks on SESC and the modeled benchmarks for our queueing theoretic models using the SHARPE modeling tool/simulator [134]. The results from SESC and our queueing theoretic models provide similar insights, that is, the multi-core architectures with shared LLCs provide better performance than the architectures with private and hybrid LLCs for these benchmarks. Furthermore, architectures with hybrid LLCs exhibit better performance than the architectures with private LLCs. We again clarify that our queueing theoretic models provide relative performance measures for different architectures and benchmarks by simulating a minimum number of the benchmarks' representative instructions, which explains the difference in the execution time obtained from SESC and our queueing theoretic models for these benchmarks. Furthermore, the execution time of the benchmarks on SESC depends upon the input sizes for the benchmarks and varies for different input sizes but retains similar trends across different architectures. Our queueing theoretic models capture the performance trends, which are important for relative comparison of different architectures, and these trends agree with the performance trends obtained from SESC.

Table 10-3. Execution time obtained from SESC and our queueing theoretic models for the SPLASH-2 benchmarks for multi-core architectures. (QT denotes our queueing theoretic model)

                   FFT                    LU                      Raytrace
Architecture       SESC       QT          SESC        QT          SESC      QT
4P-4L1ID-4L2-1M    57.38 ms   4.43 ms     362.13 ms   3.75 ms     17.84 s   4.38 ms
4P-4L1ID-1L2-1M    52.77 ms   3.6 ms      336.54 ms   3.1 ms      15.91 s   3.37 ms
4P-4L1ID-2L2-1M    52.81 ms   3.7 ms      337.3 ms    3.12 ms     16.52 s   4.05 ms
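The agreement in trends claimed above can be checked mechanically. The following is a minimal sketch, not part of the original evaluation flow: it copies the Table 10-3 values, ranks the three four-core architectures by execution time under SESC and under the queueing theoretic (QT) model, and also reproduces the cycles-to-time conversion for FFT. The labels "private", "shared", and "hybrid" (for 4P-4L1ID-4L2-1M, 4P-4L1ID-1L2-1M, and 4P-4L1ID-2L2-1M) and all function names are illustrative assumptions.

```python
# Execution times from Table 10-3, all converted to milliseconds.
# Keys are illustrative labels: private = 4P-4L1ID-4L2-1M,
# shared = 4P-4L1ID-1L2-1M, hybrid = 4P-4L1ID-2L2-1M.
SESC_MS = {
    "FFT": {"private": 57.38, "shared": 52.77, "hybrid": 52.81},
    "LU": {"private": 362.13, "shared": 336.54, "hybrid": 337.3},
    "Raytrace": {"private": 17840.0, "shared": 15910.0, "hybrid": 16520.0},
}
QT_MS = {
    "FFT": {"private": 4.43, "shared": 3.6, "hybrid": 3.7},
    "LU": {"private": 3.75, "shared": 3.1, "hybrid": 3.12},
    "Raytrace": {"private": 4.38, "shared": 3.37, "hybrid": 4.05},
}


def ranking(times_ms):
    """Return architecture labels ordered from fastest to slowest."""
    return sorted(times_ms, key=times_ms.get)


def execution_time_ms(cycles, frequency_hz):
    """Execution time = (number of cycles) x (cycle time), in ms."""
    return cycles / frequency_hz * 1e3


if __name__ == "__main__":
    for benchmark in SESC_MS:
        same = ranking(SESC_MS[benchmark]) == ranking(QT_MS[benchmark])
        print(benchmark, ranking(SESC_MS[benchmark]), "trend agrees:", same)
    # FFT on 4P-4L1ID-4L2-1M: 964,057 cycles at 16.8 MHz.
    print(f"FFT: {execution_time_ms(964_057, 16.8e6):.2f} ms")
```

Only the ordering is compared, because the QT model reports relative rather than absolute execution times.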


To further verify the insights revealed with respect to architectures with different LLC configurations, we obtain performance results for additional SPLASH-2 benchmarks on SESC for two-core and four-core processor architectures. Table 10-4 summarizes the performance (execution time) results on SESC for the FFT, LU (non-contiguous), Radix, Raytrace, and Water-Spatial benchmarks for both the two-core and four-core processor architectures. The SESC simulation results verify that the architectures with shared LLCs provide better performance than the architectures with private LLCs for most of the mixed and memory-bound benchmarks. Similar performance trends can be observed for processor-bound workloads; however, the memory subsystem's impact on performance is more prominent for mixed and memory-bound workloads, as discussed in Section 10.5.1. The SESC simulation results also indicate that the architectures with hybrid LLCs provide better performance than the architectures with private LLCs and worse performance than those with shared LLCs for most of the benchmarks. For example, shared LLCs provide 8.7%, 7.6%, 2.3%, and 12.1% better performance over private LLCs and 0.08%, 0.2%, 8.7%, and 3.8% better performance over hybrid LLCs for FFT, LU, Radix, and Raytrace, respectively, for multi-core architectures with four processor cores. We point out that, depending upon the data partitioning in a multi-threaded benchmark, a benchmark may yield different than expected performance results from an architecture because of peculiar spatial and temporal localities in the data, unpredictable shared misses, shared writes, and cache miss rates [278]. For example, private LLCs depict 7.8% and 8.4% better performance than shared and hybrid LLCs, respectively, for the Water-Spatial benchmark for multi-core architectures with four processor cores.

These SESC simulation results on multi-threaded benchmarks verify the results and insights obtained from our queueing theoretic models for mixed and memory-bound workloads with comparable processor-to-processor and processor-to-memory probabilities (Section 10.5.1). From the probabilistic characterization of the SPLASH-2


Table 10-4. Execution time comparison of the SPLASH-2 benchmarks on SESC for multi-core architectures.

Architecture       FFT        LU          Radix    Raytrace   Water-Spatial
2P-2L1ID-2L2-1M    65.06 ms   518.52 ms   4.24 s   26.32 s    24.47 s
2P-2L1ID-1L2-1M    60.23 ms   480.85 ms   4.16 s   23.5 s     24.22 s
4P-4L1ID-4L2-1M    57.38 ms   362.13 ms   2.24 s   17.84 s    13.06 s
4P-4L1ID-1L2-1M    52.77 ms   336.54 ms   2.19 s   15.91 s    14.08 s
4P-4L1ID-2L2-1M    52.81 ms   337.3 ms    2.38 s   16.52 s    14.16 s

benchmarks, we discover that it is difficult to find benchmarks in a benchmark suite that cover the entire range of processor-to-processor and processor-to-memory probabilities (e.g., Ppp ranging from 0.05 for some benchmarks to 0.95 for others). The lack of computationally diverse benchmarks in terms of processor and memory requirements in a benchmark suite makes our queueing theoretic modeling approach an attractive solution for rigorous architectural evaluation because our modeling approach enables architectural evaluation via benchmarks with virtually any computing requirements characterized probabilistically.

To verify that our queueing theoretic modeling approach provides a quick architectural evaluation as compared to running benchmarks on a multi-core simulator, we compare the execution time for evaluating multi-core embedded architectures on SESC and using our queueing theoretic models' implementation on SHARPE. The time taken to execute SPLASH-2 benchmarks on SESC was measured using the Linux time command on an Intel Xeon E5430 processor running at 2.66 GHz. The time taken to execute our queueing theoretic models was measured using the Linux time command on an AMD Opteron 246 processor running at 2 GHz, and the execution time results were then scaled to 2.66 GHz to provide a fair comparison (SHARPE runtime was measured on the AMD Opteron processor because we used the SHARPE software package installation from CHREC [207]). We observe that our queueing theoretic models take 1.92 ms and 8.33 ms on average for multi-core embedded architectures with two and four processor


Table 10-5. Execution time comparison of SPLASH-2 benchmarks on SESC versus our queueing theoretic models. T_Y^(x-core) denotes the execution time for simulating an x-core architecture using Y, where Y = {SESC, QT} (QT denotes our queueing theoretic model)

Benchmark       T_SESC^(2-core) (s)   T_SESC^(2-core) / T_QT^(2-core)   T_SESC^(4-core) (s)   T_SESC^(4-core) / T_QT^(4-core)
FFT             1.7                   885                               1.4                   168
LU              21                    10,938                            26.1                  3,133
Radix           72.1                  37,552                            77.3                  9,280
Raytrace        772.7                 402,448                           780.3                 93,673
Water-Spatial   927.35                482,995                           998.4                 119,856

cores, respectively. We point out that these execution times on our queueing theoretic models do not include the time taken to obtain processor, memory, and cache statistics (either from any prior work in the literature or from running benchmarks on a multi-core simulator if these statistics are not available from any prior work), as this statistics gathering time can vary for different benchmarks and depends upon the existing work in the literature on different benchmarks. Table 10-5 depicts the average time taken by executing SPLASH-2 benchmarks on SESC for multi-core embedded architectures with two and four processor cores as well as the ratio of the time taken on SESC to the time taken by our queueing theoretic models. Results reveal that our queueing theoretic models can provide architectural evaluation results up to 482,995× faster as compared to executing benchmarks on SESC. Hence, our queueing theoretic modeling approach can be used for quick architectural evaluation for multi-core embedded systems for virtually any set of workloads.

10.4 Insights Obtained from Queueing Theoretic Models

In this section, we present insights obtained from our queueing theoretic models regarding performance, performance per watt, and performance per unit area for the five different multi-core embedded architectures depicted in Table 10-1 (for brevity, we present a subset of the results; however, our analysis and derived conclusions are based on our complete set of experimental results). We implement our queueing


network models of the multi-core embedded architectures using the SHARPE modeling tool/simulator [134]. We consider the ARM7TDMI processor core, which is a 32-bit low-power processor with 32-bit instruction and data bus widths [280][281]. We consider the following cache parameters [282]: cache sizes of 8 KB, 8 KB, and 64 KB for the L1-I, L1-D, and L2 caches, respectively; associativities of direct-mapped, 2-way, and 2-way for the L1-I, L1-D, and L2 caches, respectively; and block (line) sizes of 64 B, 16 B, and 64 B for the L1-I, L1-D, and L2 caches, respectively. We assume a 32 MB MM for all architectures, which is typical for mobile embedded systems (e.g., the Sharp Zaurus SL-5600 personal digital assistant (PDA)) [283]. To provide a fair comparison between architectures, we ensure that the total L2 cache size for shared L2 cache architectures and private L2 cache architectures remains the same.

We first calculate the service rates for the service centers used in our multi-core queueing models. We assume that the processor core delivers 15 MIPS @ 16.8 MHz [280] (cycle time = 1/(16.8 × 10^6) = 59.524 ns), which for 32-bit instructions corresponds to a service rate of 480 Mbps. We assume L1-I, L1-D, L2 cache, and MM access latencies of 2, 2, 10, and 100 cycles, respectively [280][284]. With an L1-I cache line size of 64 B, an access latency of 2 cycles, and a 32-bit (4 B) bus, transferring 64 B requires 64/4 = 16 cycles, which results in a total L1-I time (cycles) = access time + transfer time = 2 + 16 = 18 cycles, with a corresponding L1-I service rate = (64 × 8)/(18 × 59.524 × 10^-9) = 477.86 Mbps. With an L1-D cache line size of 16 B, transfer time = 16/4 = 4 cycles, and total L1-D time = 2 + 4 = 6 cycles, with a corresponding L1-D service rate = (16 × 8)/(6 × 59.524 × 10^-9) = 358.4 Mbps. With an L2 cache line size of 64 B, transfer time = 64/4 = 16 cycles, which gives total L2 time = 10 + 16 = 26 cycles, with a corresponding L2 service rate = (64 × 8)/(26 × 59.524 × 10^-9) = 330.83 Mbps. With an MM line size of 64 B, transfer time = 64/4 = 16 cycles, which gives total MM time = 100 + 16 = 116 cycles, with a corresponding service rate = (64 × 8)/(116 × 59.524 × 10^-9) = 74.15 Mbps. We assume that each individual job/task


requires processing 1 Mb of instruction and data, which is implicit in our queueing model via service rate specifications (this ensures steady state/equilibrium behavior of the queueing network for our simulated workloads).

10.5 The Effects of Cache Miss Rates on Performance

In this subsection, we present results describing the effects of different L1-I, L1-D, and L2 cache miss rates on the architecture response time and throughput performance metrics for mixed, processor-bound, and memory-bound workloads. Considering the effects of different cache miss rates is an important aspect of performance evaluation of multi-core embedded architectures with shared resources because cache miss rates give an indication of whether the threads (corresponding to tasks) are likely to experience cache contention. Threads with higher LLC miss rates are more likely to have large working sets since each miss results in the allocation of a new cache line. These working sets may suffer from contention because threads may repeatedly evict the other threads' data (i.e., cache thrashing) [261]. We obtained results for cache miss rates of 0.0001, 0.05, and 0.2, up to 0.5, 0.7, and 0.7 for the L1-I, L1-D, and L2 caches, respectively. These cache miss rate ranges represent typical multi-core embedded systems for a wide diversity of workloads [285][286][287].

Fig. 10-4 depicts the effects of cache miss rate on response time for mixed workloads (Ppp = 0.5, Ppm = 0.5) for 2P-2L1ID-2L2-1M as the number of jobs N varies. We observe that the MM response time increases by 314% as the miss rates for the L1-I, L1-D, and L2 caches increase from 0.0001, 0.05, and 0.2 to 0.5, 0.7, and 0.7, respectively, when N = 5. This response time increase is explained by the fact that as the cache miss rate increases, the number of accesses to MM increases, which increases the queue length and MM utilization, which causes an increase in the MM response time. The L1-I and L1-D response times decrease by 14% and 18%, respectively, as the miss rates for the L1-I and L1-D caches increase from 0.0001 and 0.05, respectively, to 0.5 and 0.7, respectively, when N = 5. This decrease in response time occurs because


Figure 10-4. The effects of cache miss rate on response time (ms) for mixed workloads for 2P-2L1ID-2L2-1M for a varying number of jobs N.

increased miss rates decrease the L1-I and L1-D queue lengths and utilizations. The L2 cache response time increases by 12% as the miss rates for the L1-I and L1-D caches increase from 0.0001 and 0.05, respectively, to 0.5 and 0.7, respectively, when N = 5 (even though the L2 cache miss rate also increases from 0.2 to 0.7, the increased L1-I and L1-D miss rates effectively increase the number of L2 cache references, which increases the L2 cache queue length and utilization and thus the L2 cache response time).

We observed that for mixed workloads (Ppp = 0.5, Ppm = 0.5), the response times for the processor core, L1-I, L1-D, and MM for 2P-2L1ID-1L2-1M are very close to the response times for 2P-2L1ID-2L2-1M; however, the L2 response time presents interesting differences. The L2 response time for 2P-2L1ID-1L2-1M is 22.3% less than the L2 response time for 2P-2L1ID-2L2-1M when the L1-I, L1-D, and L2 cache miss rates are 0.0001, 0.05, and 0.2, respectively, and N = 5 (similar percentage differences were observed for other values of N), whereas the L2 response time for 2P-2L1ID-1L2-1M is only 6.5% less than the L2 response time for 2P-2L1ID-2L2-1M when the L1-I, L1-D, and L2 cache miss rates are 0.5, 0.7, and 0.7, respectively. This result shows that the shared L2 cache (of comparable area to the sum of the private L2 caches) performs better than the private L2 caches in terms of response time for small cache miss rates; however, the performance improvement decreases as the cache miss rate increases. Similar trends were observed for processor-bound workloads (Ppp = 0.9, Ppm = 0.1) and memory-bound workloads (Ppp = 0.1, Ppm = 0.9).
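The response times discussed above depend on the service rates assigned to each service center in Section 10.4. As a minimal sketch (the function and constant names are illustrative assumptions, not taken from the SHARPE models), those service rates can be recomputed from the stated line sizes, access latencies, 4 B bus width, and 16.8 MHz clock:

```python
CLOCK_HZ = 16.8e6            # processor operating frequency (Section 10.4)
CYCLE_TIME_S = 1 / CLOCK_HZ  # about 59.524 ns
BUS_BYTES = 4                # 32-bit bus


def service_rate_mbps(line_bytes, access_cycles):
    """Service rate of a cache or MM service center in Mbps.

    total cycles = access latency + cycles to transfer one line over the bus
    rate = bits per line / (total cycles x cycle time)
    """
    transfer_cycles = line_bytes // BUS_BYTES
    total_cycles = access_cycles + transfer_cycles
    return (line_bytes * 8) / (total_cycles * CYCLE_TIME_S) / 1e6


# (line size in bytes, access latency in cycles) as given in Section 10.4
SERVICE_CENTERS = {
    "L1-I": (64, 2),
    "L1-D": (16, 2),
    "L2": (64, 10),
    "MM": (64, 100),
}

if __name__ == "__main__":
    # Processor core: 15 MIPS with 32-bit instructions.
    print(f"core: {15 * 32} Mbps")
    for name, (line_bytes, latency) in SERVICE_CENTERS.items():
        print(f"{name}: {service_rate_mbps(line_bytes, latency):.2f} Mbps")
```

Small differences in the last digit relative to the values in the text can arise because the text uses the rounded cycle time of 59.524 ns.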


For mixed workloads, the response time for the processor core, L1-I, L1-D, and MM for 4P-4L1ID-1L2-1M is 1.2×, 1×, 1.1×, and 2.4× greater than that of the corresponding architectural elements (processor core, L1-I, L1-D, and MM) for 4P-4L1ID-4L2-1M, whereas the L2 response time for 4P-4L1ID-1L2-1M is 1.1× less than the L2 response time for 4P-4L1ID-4L2-1M when the L1-I, L1-D, and L2 cache miss rates are 0.5, 0.7, and 0.7, respectively, and N = 5. This observation, in conjunction with our other experiments' results, reveals that the architectures with private LLCs provide improved response time for processor cores and L1 caches as compared to the architectures with shared LLCs; however, the response time of the LLC alone can be slightly better for architectures with shared LLCs because of the larger effective size for each core. The results also indicate that the MM response time could become a bottleneck for architectures with shared LLCs, especially when the cache miss rates become high. Another interesting observation is that shared LLCs could lead to increased response time for processor cores as compared to the private LLCs because of stalling or idle waiting of processor cores for bottlenecks caused by MM. Similar trends were observed for processor-bound and memory-bound workloads.

For mixed workloads, the response time of the L2 for 4P-4L1ID-2L2-1M is 1.2× less than for 4P-4L1ID-4L2-1M and 1.1× greater than for 4P-4L1ID-1L2-1M when the L1-I, L1-D, and L2 cache miss rates are 0.0001, 0.05, and 0.2, respectively, and N = 5. The MM response time for 4P-4L1ID-2L2-1M is 2.3× less than for 4P-4L1ID-1L2-1M, whereas the MM response time for 4P-4L1ID-2L2-1M and 4P-4L1ID-4L2-1M is the same when the L1-I, L1-D, and L2 cache miss rates are 0.5, 0.7, and 0.7, respectively, and N = 5. The response times for the processor core and L1-I/D are comparable for the three architectures (4P-4L1ID-4L2-1M, 4P-4L1ID-2L2-1M, and 4P-4L1ID-1L2-1M). These results, along with our other results, show that having LLCs shared by fewer cores (e.g., the L2 cache shared by two cores in our considered architecture) does not introduce MM as a response time bottleneck, whereas the MM becomes the bottleneck as more cores


share the LLCs, especially for large cache miss rates. Similar trends were observed for processor-bound and memory-bound workloads.

We observe the effects of cache miss rates on throughput for various multi-core embedded architectures. For mixed workloads, the throughput for the processor core, L1-I, L1-D, and MM for 2P-2L1ID-1L2-1M is very close to the throughput for 2P-2L1ID-2L2-1M; however, the L2 throughput for 2P-2L1ID-1L2-1M is 100% greater on average than the L2 throughput for 2P-2L1ID-2L2-1M for different miss rates for the L1-I, L1-D, and L2 and N = 5. However, the combined throughput of the two private L2 caches in 2P-2L1ID-2L2-1M is comparable to the L2 throughput for 2P-2L1ID-1L2-1M. This shows that the shared and private L2 caches provide comparable net throughputs for the two architectures. The throughput for the processor core, L1-I, L1-D, and L2 for 4P-4L1ID-4L2-1M is 2.1× less on average than for the corresponding 2P-2L1ID-2L2-1M elements, whereas the throughput for the processor core, L1-I, L1-D, and L2 for the two architectures is the same when the miss rates for the L1-I, L1-D, and L2 caches are 0.5, 0.7, and 0.7, respectively, and N = 5. This indicates that the throughput for the individual elements (except MM, since MM is shared for both architectures) decreases for the architecture with more cores since the workload remains the same. The throughput for the processor core, L1-I, L1-D, L2, and MM for 4P-4L1ID-2L2-1M is 1.5×, 1.5×, 1.5×, 2.5×, and 1.3× less than the throughput for the corresponding 4P-4L1ID-1L2-1M elements when the miss rates for the L1-I, L1-D, and L2 caches are 0.0001, 0.05, and 0.2, respectively, and N = 5. These observations reveal that changing the L2 cache from private to shared can also impact the throughput for other architectural elements because of the interaction between the elements.

We evaluate the effects of cache miss rates on throughput for processor-bound workloads (Ppp = 0.9, Ppm = 0.1) for 2P-2L1ID-2L2-1M as N varies. Results reveal that there is no apparent increase in processor core throughput as N increases from 5 to 20 because processors continue to operate at utilization close to 1 when the L1-I,


L1-D, and L2 cache miss rates are 0.3, 0.3, and 0.3, respectively (similar trends were observed for other cache miss rates). The MM throughput increases by 4.67% (4.67% - 1.64% = 3.03% greater than for the mixed workloads) as N increases from 5 to 20 when the L1-I, L1-D, and L2 cache miss rates are 0.5, 0.7, and 0.7, respectively. In this case, the MM percentage throughput increase is greater for processor-bound workloads as compared to mixed workloads because the MM is underutilized for processor-bound workloads (e.g., a utilization of 0.519 for processor-bound workloads as compared to a utilization of 0.985 for mixed workloads when N = 5). However, the MM absolute throughput for processor-bound workloads is less than for the mixed workloads (e.g., an MM throughput of 38.5 Mbps for processor-bound workloads as compared to a throughput of 73 Mbps for mixed workloads when N = 5). For processor-bound workloads, the throughput for the processor core, L1-I, L1-D, and MM for 2P-2L1ID-1L2-1M is similar to the throughput for 2P-2L1ID-2L2-1M; however, the L2 throughput for 2P-2L1ID-1L2-1M is 100% greater than the L2 throughput for 2P-2L1ID-2L2-1M on average over all the cache miss rates when N = 5. Similar trends were observed for memory-bound workloads and mixed workloads for architectures with two or four cores with private and shared LLCs (these throughput trends would continue as the number of cores increases).

10.5.1 The Effects of Workloads on Performance

In this subsection, we present results describing the effects of different workloads on the response time and throughput performance metrics when the L1-I, L1-D, and L2 cache miss rates are held constant. We discuss the effects of varying computing requirements of these workloads. The computing requirement of a workload signifies the workload's demand for processor resources, which depends on the percentage of arithmetic, logic, and control instructions in the workload relative to load and store instructions. The computing requirements of workloads are captured by Ppp and Ppm in our model.
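As a minimal sketch of how Ppp and Ppm capture a workload's computing requirements, the estimation procedure from Section 10.3 (Ppp = Op / Ot with Ot = Op + Om, and Ppm = 1 - Ppp) can be written as follows. The operation counts in the usage example are hypothetical, and the classification thresholds (0.4 and 0.6) are an assumption chosen only to be consistent with the classifications reported in the text; the source does not state explicit thresholds.

```python
def characterize(processor_ops, read_ops, write_ops):
    """Estimate (Ppp, Ppm) from operation counts.

    Om is the sum of read and write operations, Ot = Op + Om,
    Ppp = Op / Ot, and Ppm = 1 - Ppp.
    """
    memory_ops = read_ops + write_ops
    total_ops = processor_ops + memory_ops
    p_pp = processor_ops / total_ops
    return p_pp, 1.0 - p_pp


def classify(p_pp, low=0.4, high=0.6):
    """Label a workload; the low/high thresholds are assumed, not from the source."""
    if p_pp < low:
        return "memory-bound"
    if p_pp > high:
        return "processor-bound"
    return "mixed"


if __name__ == "__main__":
    # Hypothetical counts: 48 processor ops, 30 reads, 22 writes (in millions).
    p_pp, p_pm = characterize(48, 30, 22)
    print(f"Ppp = {p_pp:.2f}, Ppm = {p_pm:.2f}, class = {classify(p_pp)}")
```

With the assumed thresholds, the reported Ppp values of 0.48 (FFT) and 0.46 (Water-Spatial) fall in the mixed class, while 0.38 (LU) and 0.05 (Radix) fall in the memory-bound class.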


Figure 10-5. The effects of processor-bound workloads on response time (ms) for 2P-2L1ID-2L2-1M for a varying number of jobs N for cache miss rates: L1-I = 0.01, L1-D = 0.13, and L2 = 0.3.

Fig. 10-5 depicts the effects of varying computing requirements for processor-bound workloads on response time for 2P-2L1ID-2L2-1M as N varies where the L1-I, L1-D, and L2 cache miss rates are 0.01, 0.13, and 0.3, respectively. The figure depicts that as N increases, the response time for the processor core, L1-I, L1-D, L2, and MM increases for all values of Ppp and Ppm. The figure shows that as Ppp increases, the response time of the processor increases, whereas the response time of L1-I, L1-D, L2, and MM changes comparatively little because of the processor-bound nature of the workloads. For example, the processor response time increases by 19.8% as Ppp increases from 0.7 to 0.95 when N = 5. The response time of L1-I, L1-D, L2, and MM decreases by 10.8%, 14.2%, 2.2%, and 15.2%, respectively, as Ppp increases from 0.7 to 0.95 when N = 5 because an increase in Ppp results in a decrease in memory requests, which decreases the response time for the caches and MM.

We observe that the response time for the processor core, L1-I, and L1-D for 2P-2L1ID-1L2-1M is very close (within 7%) to that for 2P-2L1ID-2L2-1M as the computing requirements of the processor-bound workload vary. However, 2P-2L1ID-1L2-1M provides a 21.5% improvement in L2 response time and a 12.3% improvement in MM response time over 2P-2L1ID-2L2-1M when Ppp = 0.7, and a 23.6% improvement in L2 response time and a 1.4% improvement in MM response time when Ppp = 0.95 and N = 5. 4P-4L1ID-2L2-1M provides a 22.3% improvement in L2 response time


and a 13% improvement in MM response time over 4P-4L1ID-4L2-1M when Ppp = 0.7 and N = 5. 4P-4L1ID-2L2-1M provides a 22.3% improvement in L2 response time and a 3% improvement in MM response time over 4P-4L1ID-4L2-1M when Ppp = 0.95 because a higher Ppp results in fewer MM references. 4P-4L1ID-1L2-1M provides a 7.4% improvement in L2 response time with a 5.2% degradation in MM response time over 4P-4L1ID-2L2-1M when Ppp = 0.7 and N = 5. 4P-4L1ID-1L2-1M provides a 12.4% improvement in L2 response time with no degradation in MM response time over 4P-4L1ID-2L2-1M when Ppp = 0.95 and N = 5. These results indicate that shared LLCs provide more improvement in L2 response time as compared to hybrid and private LLCs for more compute-intensive processor-bound workloads. The hybrid LLCs, however, provide better MM response time than shared LLCs for more compute-intensive processor-bound workloads. These results suggest that hybrid LLCs may be more suitable than shared LLCs in terms of scalability and overall response time for comparatively less compute-intensive processor-bound workloads.

For memory-bound workloads, 2P-2L1ID-1L2-1M provides a 16.7% improvement in L2 response time and a 31.5% improvement in MM response time over 2P-2L1ID-2L2-1M when Ppm = 0.95 and N = 5. 2P-2L1ID-1L2-1M provides an 18.2% improvement in L2 response time and a 25.8% improvement in MM response time over 2P-2L1ID-2L2-1M when Ppm = 0.7 and N = 5. 4P-4L1ID-2L2-1M provides a 19.8% improvement in L2 response time and a 20.2% improvement in MM response time on average over 4P-4L1ID-4L2-1M for both Ppm = 0.95 and Ppm = 0.7 and N = 5. 4P-4L1ID-1L2-1M provides a 2.4% improvement in L2 response time with a 15% degradation in MM response time over 4P-4L1ID-2L2-1M when Ppm = 0.95 and N = 5. 4P-4L1ID-1L2-1M provides no improvement in L2 response time with an 11.5% degradation in MM response time over 4P-4L1ID-2L2-1M when Ppm = 0.7 and N = 5. These results indicate that shared LLCs provide a larger improvement in L2 and MM response time as compared to private LLCs for memory-bound workloads. Furthermore, hybrid LLCs are more


amenable in terms of response time as compared to shared and private LLCs for memory-bound workloads. Similar trends were observed for mixed workloads for architectures with two or four cores containing private, shared, or hybrid LLCs.

We observe the effects of varying computing requirements for processor-bound workloads on throughput for 2P-2L1ID-2L2-1M as N varies. As N increases, the throughput for the processor core, L1-I, L1-D, L2, and MM increases for all values of Ppp and Ppm. Furthermore, as Ppp increases, the throughput of the processor core increases, whereas the throughput of L1-I, L1-D, L2, and MM decreases because of relatively fewer memory requests. For memory-bound workloads, the L1-I and L1-D throughputs for 2P-2L1ID-2L2-1M and 2P-2L1ID-1L2-1M are comparable; however, 2P-2L1ID-1L2-1M improves the L2 throughput by 106.5% and 111% (due to the larger combined L2 cache), whereas the MM throughput decreases by 126% and 121.2% when Ppm is 0.7 and 0.95, respectively. For memory-bound workloads, 2P-2L1ID-1L2-1M provides 5.3% and 3.4% improvement in processor core throughput over 2P-2L1ID-2L2-1M when Ppm = 0.95 and Ppm = 0.7, respectively, and N = 5. For processor-bound workloads, the processor core throughputs for 2P-2L1ID-2L2-1M and 2P-2L1ID-1L2-1M are comparable. Similar trends were observed for the architectures with four cores containing private, shared, or hybrid LLCs. This is because processor cores operate close to saturation (at high utilization) for processor-bound workloads, and memory stalls due to memory subsystem response time have a negligible effect on processor core performance as memory accesses are completely overlapped with computation.

Response time and throughput results reveal that memory subsystems for an architecture with private, shared, or hybrid LLCs have a profound impact on the response time of the architecture and on the throughputs for the L2 and MM, with relatively little impact on the throughput for the processor cores, L1-I, and L1-D.
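The MM throughput and utilization figures quoted in Section 10.5 for N = 5 (38.5 Mbps at a utilization of 0.519 for processor-bound workloads, and 73 Mbps at 0.985 for mixed workloads) can be cross-checked against the MM service rate of 74.15 Mbps from Section 10.4. The sketch below assumes the standard utilization law (utilization = throughput / service rate); the source does not state that this is how SHARPE reports utilization, so this is only a consistency check, and the names are illustrative.

```python
MM_SERVICE_RATE_MBPS = 74.15  # MM service rate derived in Section 10.4


def utilization(throughput_mbps, service_rate_mbps=MM_SERVICE_RATE_MBPS):
    """Utilization law (assumed here): U = throughput / service rate."""
    return throughput_mbps / service_rate_mbps


# (reported MM throughput in Mbps, reported MM utilization) for N = 5
REPORTED = {
    "processor-bound": (38.5, 0.519),
    "mixed": (73.0, 0.985),
}

if __name__ == "__main__":
    for workload, (throughput, reported_u) in REPORTED.items():
        computed = utilization(throughput)
        print(f"{workload}: computed {computed:.3f}, reported {reported_u}")
```

The computed values agree with the reported utilizations to within the rounding of the reported throughputs.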


10.5.2 Performance per Watt and Performance per Unit Area Computations

In this subsection, we compute performance per watt and performance per unit area for the multi-core embedded architectures using our queueing theoretic models. The performance per unit area is an important metric for embedded systems where the entire system needs to be packaged in a limited space; however, performance per unit area may not be as important a metric for desktop and supercomputing. Our performance per watt and performance per unit area computations assist in relative comparisons between different multi-core embedded architectures. For these computations, we first need to calculate the area and worst-case (peak) power consumption for different multi-core embedded architectures, which we obtain using CACTI 6.5 [288], International Technology Roadmap for Semiconductors (ITRS) specifications [289], and datasheets for multi-core embedded architectures.

To illustrate our area and power calculation procedure, which can be combined with the results obtained from our queueing theoretic models to obtain performance per unit area and performance per watt, we provide example area and power calculations for 2P-2L1ID-2L2-1M. The area calculations for 2P-2L1ID-2L2-1M are: total processor core area = 2 × 0.0325 = 0.065 mm^2; total L1-I cache area = 2 × (0.281878 × 0.19619) = 2 × 0.0553 = 0.11 mm^2 (CACTI provides cache height × width for individual caches (e.g., 0.281878 × 0.19619 for the L1-I cache)); total L1-D cache area = 2 × (0.209723 × 0.23785) = 2 × 0.0499 = 0.0998 mm^2; total L2 cache area = 2 × (0.45166 × 0.639594) = 2 × 0.2889 = 0.578 mm^2; total MM area = 8.38777 × 4.08034 = 34.22 mm^2. The power consumption of the caches and MM is the sum of the dynamic power and the leakage power. Since CACTI gives dynamic energy and access time, dynamic power is calculated as the ratio of the dynamic energy to the access time. For example, for the L1-I cache, the dynamic power = 0.020362 (nJ) / 0.358448 (ns) = 56.8 mW and the leakage power = 10.9229 mW, which gives the L1-I power consumption = 56.8 + 10.9229 = 67.72 mW. For the MM dynamic power calculation, we calculate the average


dynamic energy per read and write access = (1.27955 + 1.26155)/2 = 1.27055 nJ, which gives the dynamic power = 1.27055 (nJ) / 5.45309 (ns) = 233 mW. The total power consumption for the MM = 233 + 2941.12 = 3174.12 mW (2941.12 mW is the leakage power for the MM). The power consumption calculations for 2P-2L1ID-2L2-1M are: total processor core power consumption = 2 × 1.008 = 2.016 mW; total L1-I cache power consumption = 2 × 67.72 = 135.44 mW; total L1-D cache power consumption = 2 × 39.88 = 79.76 mW; total L2 cache power consumption = 2 × 153.84 = 307.68 mW.

We summarize the area and peak power consumption for the processor cores, L1-I, L1-D, L2, and MM for the other multi-core embedded architectures assuming a 45 nm process. For 2P-2L1ID-1L2-1M, the areas for the processor core, L1-I, L1-D, and L2 are 0.065, 0.11, 0.0998, and 0.5075 mm^2, respectively, whereas the power consumptions are 2.016, 135.44, 79.76, and 253.28 mW, respectively. For 4P-4L1ID-4L2-1M, the areas for the processor core, L1-I, L1-D, and L2 are 0.13, 0.2212, 0.1996, and 1.1556 mm^2, respectively, whereas the power consumptions are 4.032, 270.88, 159.52, and 615.36 mW, respectively. For 4P-4L1ID-1L2-1M, the area and power consumption for the L2 are 0.9366 mm^2 and 354.04 mW, respectively, whereas the area and power consumption for the remainder of the architectural elements are the same as those of 4P-4L1ID-4L2-1M. For 4P-4L1ID-2L2-1M, the area and power consumption for the L2 are 1.015 mm^2 and 506.8 mW, respectively, whereas the area and power consumption for the remainder of the architectural elements are the same as those of 4P-4L1ID-4L2-1M.

The area and power results for multi-core embedded architectures show that the MM consumes the most area and power, followed by the L2, L1-I, L1-D, and the processor core. We observe that the shared L2 caches for 2P-2L1ID-1L2-1M and 4P-4L1ID-1L2-1M require 14% and 24% less area and consume 21.5% and 74% less power as compared to the private L2 caches for 2P-2L1ID-2L2-1M and 4P-4L1ID-4L2-1M, respectively. The hybrid L2 caches for 4P-4L1ID-2L2-1M require 14% less area and consume 21.4% less power as compared to the private L2 caches


Table 10-6. Area and power consumption for multi-core architectures.

Architecture       Area (mm^2)   Power (mW)
2P-2L1ID-2L2-1M    0.8528        524.896
2P-2L1ID-1L2-1M    0.7823        470.5
4P-4L1ID-4L2-1M    1.7064        1049.79
4P-4L1ID-1L2-1M    1.4874        788.472
4P-4L1ID-2L2-1M    1.5658        941.232

for 4P-4L1ID-4L2-1M, whereas the shared L2 cache for 4P-4L1ID-1L2-1M requires 8.7% less area and consumes 43% less power as compared to the hybrid L2 caches for 4P-4L1ID-2L2-1M. These results indicate that the power efficiency of shared LLCs improves as the number of cores increases.

Table 10-6 shows the area and peak power consumption for different multi-core embedded architectures. Table 10-6 does not include the MM area and power consumption, which allows the results to isolate the area and peak power consumption of the processor cores and caches. This MM isolation from Table 10-6 enables deeper insights and a fair comparison for the embedded architectures since we assume an off-chip MM that has the same size and characteristics for all evaluated architectures. To illustrate the area and power calculations for multi-core embedded architectures, we provide area and power consumption calculations for 2P-2L1ID-2L2-1M as an example. We point out that these area and power consumption calculations use the constituent area and power consumption calculations for the architectural elements in a multi-core embedded architecture. For 2P-2L1ID-2L2-1M, the total cache area = 0.11 + 0.0998 + 0.578 = 0.7878 mm^2, which gives the overall area (excluding the MM) = 0.065 + 0.7878 = 0.8528 mm^2 (0.065 mm^2 is the area for the processor cores as calculated previously). For 2P-2L1ID-2L2-1M, the total cache power consumption = 135.44 + 79.76 + 307.68 = 522.88 mW, which gives the overall power consumption (excluding the MM) = 2.016 + 522.88 = 524.896 mW (2.016 mW is the power consumption for the processor cores).
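The worked example for 2P-2L1ID-2L2-1M extends to the other architectures. As a minimal sketch (the data layout and function name are illustrative assumptions), the on-chip totals of Table 10-6 can be recomputed by summing the per-element areas and peak powers reported in the text, excluding the MM:

```python
# Per-element totals (area in mm^2, peak power in mW) reported in the text
# for the processor cores and L1 caches, keyed by number of cores.
CORE_AND_L1 = {
    2: [(0.065, 2.016), (0.11, 135.44), (0.0998, 79.76)],
    4: [(0.13, 4.032), (0.2212, 270.88), (0.1996, 159.52)],
}

# Architecture -> (number of cores, (total L2 area, total L2 power)).
L2_TOTALS = {
    "2P-2L1ID-2L2-1M": (2, (0.578, 307.68)),
    "2P-2L1ID-1L2-1M": (2, (0.5075, 253.28)),
    "4P-4L1ID-4L2-1M": (4, (1.1556, 615.36)),
    "4P-4L1ID-1L2-1M": (4, (0.9366, 354.04)),
    "4P-4L1ID-2L2-1M": (4, (1.015, 506.8)),
}

# Values from Table 10-6, used only for comparison.
TABLE_10_6 = {
    "2P-2L1ID-2L2-1M": (0.8528, 524.896),
    "2P-2L1ID-1L2-1M": (0.7823, 470.5),
    "4P-4L1ID-4L2-1M": (1.7064, 1049.79),
    "4P-4L1ID-1L2-1M": (1.4874, 788.472),
    "4P-4L1ID-2L2-1M": (1.5658, 941.232),
}


def on_chip_totals(architecture):
    """Sum area and peak power over cores, L1-I, L1-D, and L2 (MM excluded)."""
    cores, l2 = L2_TOTALS[architecture]
    elements = CORE_AND_L1[cores] + [l2]
    area = sum(a for a, _ in elements)
    power = sum(p for _, p in elements)
    return area, power


if __name__ == "__main__":
    for arch, (table_area, table_power) in TABLE_10_6.items():
        area, power = on_chip_totals(arch)
        print(f"{arch}: {area:.4f} mm^2 ({table_area}), "
              f"{power:.3f} mW ({table_power})")
```

The recomputed sums match Table 10-6 up to the rounding used in the table.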


Table 10-7. Area and power consumption of architectural elements for the 2P-2L1ID-2L2-1M multi-core embedded architecture.

Element   Area (mm²)   Power (mW)
Core      0.065        2.016
L1-I      0.11         135.44
L1-D      0.0998       79.76
L2        0.578        307.68
MM        34.22        3174.12

The overall area and power consumption results for different multi-core embedded architectures (Table 10-6) show that 2P-2L1ID-2L2-1M requires 8.3% more on-chip area and consumes 10.4% more power as compared to 2P-2L1ID-1L2-1M. 4P-4L1ID-4L2-1M requires 8.2% and 12.8% more on-chip area and consumes 10.3% and 24.9% more power as compared to 4P-4L1ID-2L2-1M and 4P-4L1ID-1L2-1M, respectively. These results reveal that the architectures with shared LLCs become more area and power efficient as compared to the architectures with private or hybrid LLCs as the number of cores in the architecture increases.

Table 10-7 shows the area and peak power consumption for the processor cores, L1-I, L1-D, L2, and MM for 2P-2L1ID-2L2-1M assuming a 45 nm process. The core areas are calculated using Moore's law and the ITRS specifications [289] (i.e., the chip area required for the same number of transistors reduces approximately by 1/2x every technology node (process) generation). For example, the ARM7TDMI core area is 0.26 mm² at a 130 nm process [290]; the core area at a 45 nm process (after three technology node generations, i.e., 130 nm, 90 nm, 65 nm, 45 nm) can be given approximately as (1/2)³ × 0.26 = 0.0325 mm².

We emphasize that the workloads and cache miss rates have a large influence on an architecture's power consumption. For example, higher cache miss rates lead to an increase in power consumption because of more frequent requests to the power-hungry MM. The processor-bound workloads are likely to consume less power than the mixed workloads, which in turn consume less power than the memory-bound


workloads. The power efficiency of processor-bound workloads stems from the fact that workloads spend more time in power-efficient processor cores as compared to the power-hungry caches and MM (as shown in the power breakdown of architectural elements in Table 10-7). Furthermore, power consumption of an architecture increases as the number of jobs increases because increased utilization restricts prolonged operation of architectural elements (e.g., processor cores) in low-power modes.

In the remainder of this subsection, we discuss performance per watt and performance per unit area results for multi-core embedded architectures assuming 64-bit FP operations. We observe that the performance per watt and performance per unit area delivered by the processor cores and the L1-I and L1-D caches for these architectures are very close (within 7%); however, the L2 cache presents interesting results. Although the MM performance per watt for these architectures also differs, this difference does not provide meaningful insights for the following two reasons: 1) the MM is typically off-chip and the performance per watt is more critical for on-chip architectural components than the off-chip components, and 2) if more requests are satisfied by the LLC, then fewer requests are deferred to the MM, which decreases the MM throughput and hence the performance per watt. Therefore, we mainly focus on the performance per watt and performance per unit area calculations for the LLCs for our studied architectures in the following.

We calculate performance per watt results for memory-bound workloads when the L1-I, L1-D, and L2 cache miss rates are 0.01, 0.13, and 0.3, respectively. The performance per watt values for the L2 caches are 2.42 MFLOPS/W and 3.1 MFLOPS/W and the performance per watt for the MM is 0.164 MFLOPS/W and 0.074 MFLOPS/W for 2P-2L1ID-2L2-1M and 2P-2L1ID-1L2-1M, respectively, when Ppm = 0.95 and N = 5. We point out that our performance per watt calculations for 2P-2L1ID-2L2-1M incorporate the aggregate throughput for the L2 cache, which is the sum of throughputs for the two private L2 caches in 2P-2L1ID-2L2-1M. The performance per watt for the L2


caches drops to 2.02 MFLOPS/W and 2.53 MFLOPS/W whereas the performance per watt for the MM drops to 0.137 MFLOPS/W and 0.06 MFLOPS/W for 2P-2L1ID-2L2-1M and 2P-2L1ID-1L2-1M, respectively, when Ppm = 0.7 and N = 5. The performance per watt values for the L2 caches are 2.21 MFLOPS/W, 4.77 MFLOPS/W, and 3.08 MFLOPS/W for 4P-4L1ID-4L2-1M, 4P-4L1ID-1L2-1M, and 4P-4L1ID-2L2-1M, respectively, when Ppm = 0.95 and N = 10. We observe similar trends for mixed workloads and processor-bound workloads but with comparatively lower performance per watt for the LLCs because these workloads have comparatively lower Ppm as compared to memory-bound workloads. The performance per watt for caches drops as Ppm decreases because fewer requests are directed to the LLCs for a low Ppm, which decreases the throughput and hence the performance per watt. These results indicate that architectures with shared LLCs provide the highest LLC performance per watt followed by architectures with hybrid LLCs and private LLCs. We point out that the difference in performance per watt for these multi-core architectures is mainly due to the difference in the LLC power consumption as there is a relatively small difference in the throughput delivered by these architectures for the same workloads with identical cache miss rates.

To investigate the effects of cache miss rates on the performance per watt of the LLCs, we calculate the performance per watt for memory-bound workloads (Ppm = 0.9) at high cache miss rates: L1-I = 0.5, L1-D = 0.7, and L2 = 0.7, and when N = 10. The performance per watt values for the L2 caches are 5.4 MFLOPS/W and 6.55 MFLOPS/W for 2P-2L1ID-2L2-1M and 2P-2L1ID-1L2-1M, respectively, whereas the performance per watt values are 2.7 MFLOPS/W, 3.28 MFLOPS/W, and 4.69 MFLOPS/W for 4P-4L1ID-4L2-1M, 4P-4L1ID-2L2-1M, and 4P-4L1ID-1L2-1M, respectively. Results reveal that at high cache miss rates, the performance per watt of the LLCs increases because relatively more requests are directed to the LLCs at higher


cache miss rates than lower cache miss rates, which increases the throughput, and hence the performance per watt.

Based on our experimental results for different cache miss rates and workloads, we determine the peak performance per watt for the LLCs for our studied multi-core embedded architectures. The peak performance per watt values for the L2 caches are 11.8 MFLOPS/W and 14.3 MFLOPS/W for 2P-2L1ID-2L2-1M and 2P-2L1ID-1L2-1M, respectively, when the L1-I, L1-D, and L2 cache miss rates are all equal to 0.3, Ppm = 0.9, and N = 20. The peak performance per watt values for the L2 caches are 7.6 MFLOPS/W, 9.2 MFLOPS/W, and 13.8 MFLOPS/W for 4P-4L1ID-4L2-1M, 4P-4L1ID-2L2-1M, and 4P-4L1ID-1L2-1M, respectively, when the L1-I, L1-D, and L2 cache miss rates are all equal to 0.2, Ppm = 0.9, and N = 20. The comparison of performance per watt results for the LLCs for two-core and four-core architectures indicates that architectures with different numbers of cores and private or shared caches deliver peak performance per watt for different types of workloads with different cache miss rate characteristics. For example, architectures with two processor cores and a shared LLC deliver better performance per watt for the LLC than architectures with four processor cores and a shared LLC when the L1-I, L1-D, and L2 cache miss rates are all equal to 0.3, Ppm = 0.9, and N = 20 (i.e., the LLC's performance per watt is 14.3 MFLOPS/W for 2P-2L1ID-1L2-1M as compared to 10.9 MFLOPS/W for 4P-4L1ID-1L2-1M). Furthermore, these architectures deliver peak LLC performance per watt for workloads with mid-range cache miss rates (e.g., miss rates of 0.2 or 0.3) because at higher cache miss rates, a larger number of requests are directed towards the LLCs, which causes the LLCs' utilization to be close to one, resulting in an increased response time and decreased throughput.

We calculate performance per unit area results for memory-bound workloads when the L1-I, L1-D, and L2 cache miss rates are 0.01, 0.13, and 0.3, respectively. The performance per unit area values for the L2 caches are 1.29 MFLOPS/mm² and


1.54 MFLOPS/mm² and the performance per unit area values for the MM are 15.25 KFLOPS/mm² and 6.9 KFLOPS/mm² for 2P-2L1ID-2L2-1M and 2P-2L1ID-1L2-1M, respectively, when Ppm = 0.95 and N = 5. The performance per unit area values for the L2 caches drop to 1.08 MFLOPS/mm² and 1.26 MFLOPS/mm² whereas the performance per unit area values for the MM drop to 12.68 KFLOPS/mm² and 5.6 KFLOPS/mm² for 2P-2L1ID-2L2-1M and 2P-2L1ID-1L2-1M, respectively, when Ppm = 0.7 and N = 5. The performance per unit area values for the L2 caches are 1.18 MFLOPS/mm², 1.8 MFLOPS/mm², and 1.54 MFLOPS/mm² for 4P-4L1ID-4L2-1M, 4P-4L1ID-1L2-1M, and 4P-4L1ID-2L2-1M, respectively, when Ppm = 0.95 and N = 10. We observe similar trends for performance per unit area for mixed workloads and processor-bound workloads as for the performance per watt trends explained above. These results indicate that architectures with shared LLCs provide the highest LLC performance per unit area followed by architectures with hybrid LLCs and private LLCs. We point out that the difference in performance per unit area for these multi-core architectures is mainly due to the difference in the LLC throughput as we ensure that the total LLC area occupied by a multi-core embedded architecture with a given number of cores remains close enough (a minor difference in the occupied area of the LLCs occurs for different multi-core embedded architectures due to practical implementation and fabrication constraints as determined by CACTI) to provide a fair comparison.

Based on our queueing theoretic models' results and area calculations, we determine the peak performance per unit area for the LLCs for our studied multi-core embedded architectures. The peak performance per unit area values for the L2 caches are 6.27 MFLOPS/mm² and 7.14 MFLOPS/mm² for 2P-2L1ID-2L2-1M and 2P-2L1ID-1L2-1M, respectively, when the L1-I, L1-D, and L2 cache miss rates are all equal to 0.3, Ppm = 0.9, and N = 20. The peak performance per unit area values for the L2 caches are 4.04 MFLOPS/mm², 4.6 MFLOPS/mm², and 5.22 MFLOPS/mm² for 4P-4L1ID-4L2-1M, 4P-4L1ID-2L2-1M, and 4P-4L1ID-1L2-1M, respectively, when the L1-I, L1-D, and L2


cache miss rates are all equal to 0.2, Ppm = 0.9, and N = 20. Other results for peak performance per unit area reveal similar trends as for the peak performance per watt, and therefore we do not discuss them for brevity.

10.6 Concluding Remarks

In this chapter, we developed closed product-form queueing network models for performance evaluation of multi-core embedded architectures for different workload characteristics. The performance evaluation results indicated that the architectures with shared LLCs provided better cache response time and MFLOPS/W than the private LLCs for all cache miss rates, especially as the number of cores increases. The results also revealed the downside of shared LLCs, indicating that the shared LLCs are more likely to cause a main memory response time bottleneck for larger cache miss rates as compared to the private LLCs. The memory bottleneck caused by shared LLCs may lead to increased response time for processor cores because of stalling or idle waiting. However, results indicated that the main memory bottleneck created by shared LLCs can be mitigated by using a hybrid of private and shared LLCs (i.e., sharing LLCs among fewer cores), though hybrid LLCs consume more power than the shared LLCs and deliver comparatively less MFLOPS/W. The performance per watt and performance per unit area results for the multi-core embedded architectures revealed that the multi-core architectures with shared LLCs become more area and power efficient as compared to the architectures with private LLCs as the number of processor cores in the architectures increases. The simulation results for the SPLASH-2 benchmarks executing on the SESC simulator verified our architectural evaluation insights obtained from our queueing theoretic models.

In our future work, we plan to enhance our queueing theoretic models for performance evaluation of heterogeneous multi-core embedded architectures.


CHAPTER 11
PARALLELIZED BENCHMARK-DRIVEN PERFORMANCE EVALUATION OF SYMMETRIC MULTIPROCESSORS AND TILED MULTI-CORE ARCHITECTURES FOR PARALLEL EMBEDDED SYSTEMS

As on-chip transistor counts increase, embedded system design has shifted to multi- and many-core architectures. A primary reason for this architecture reformation is that performance speedups are becoming more difficult to achieve by simply increasing the clock frequency of traditional single-core architectures because of limitations in power dissipation. This single-core to multi-core paradigm shift in embedded systems has introduced parallel computing to the embedded domain, which was previously used predominantly in supercomputing. Furthermore, this paradigm shift in the computing industry has led to the proliferation of diverse multi-core architectures, which necessitates comparison and evaluation of these disparate architectures for different embedded domains (e.g., distributed, real-time, reliability-constrained).

Contemporary multi-core architectures are not designed to deliver high performance for all embedded domains, but are instead designed to provide high performance for a subset of these domains. The precise evaluation of multi-core architectures for a particular embedded domain requires executing complete applications prevalent in that domain. Despite the diversity of embedded domains, many embedded domains (and the distributed embedded domain in particular, of which embedded wireless sensor networks (EWSNs) are a prominent example) have information fusion as one of the most critical applications. Furthermore, many other embedded applications consist of various kernels, such as Gaussian elimination (GE) (used in network coding), that dominate the computation time [262]. Parallelized applications as well as kernels from an embedded domain provide an effective way of evaluating multi-core architectures for the embedded domain.

In this chapter, we evaluate two multi-core architectures for embedded systems: symmetric multiprocessors (SMPs) and Tilera's tiled multi-core architectures (TMAs).


We consider SMPs because SMPs are ubiquitous and pervasive, which provides a standard/fair basis for comparing with other novel architectures (e.g., TMAs). We consider Tilera's TILEPro64 for TMAs because of Tilera's innovative architectural features (e.g., three-way issue superscalar tiles, on-chip mesh interconnect, and dynamic distributed cache (DDC) technology). We parallelize an information fusion application, a GE kernel, and an embarrassingly parallel (EP) benchmark for SMPs and TMAs for performance evaluation.

The choice of a multi-core architecture dictates the high-level parallel languages since some multi-core architectures support proprietary parallel languages whose benchmarks are not available open source (e.g., Tilera's TILEPro64). Tilera provides a multi-core development environment (MDE) ilib API [19] whereas many SMPs (e.g., the Intel-based SMP) support OpenMP (Open Multi-Processing); hence, the cross-architectural evaluation results may be affected by the parallel language's efficiency. However, our analysis provides insights into the attainable performance per watt from these two multi-core architectures for embedded systems.

Our main contributions in this chapter are:

- To the best of our knowledge, this work is the first to evaluate SMPs and TMAs for multi-core parallel embedded systems.
- We provide a quantitative comparison between SMPs and TMAs based on various device metrics (e.g., computational density, memory subsystem bandwidth). This quantitative comparison provides a high-level evaluation of the computational capability of these architectures.
- We parallelize an information fusion application, GE, and EP benchmarks for SMPs and TMAs to compare and analyze the performance and performance per watt of the two architectures. This parallelized benchmark-driven evaluation provides deeper insights as compared to a theoretical quantitative approach.
- Our cross-architectural evaluation results reveal that TMAs perform better in terms of scalability and performance per watt than SMPs for applications involving integer operations on data with little communication between operating tiles. For


applications requiring floating point (FP) operations and frequent dependencies between computations, SMPs outperform TMAs in terms of scalability and performance per watt.

The remainder of this chapter is organized as follows. A review of related work is provided in Section 11.1. Section 11.2 discusses the multi-core architectures studied in this chapter along with the parallelized benchmarks for evaluating the multi-core architectures. Parallel computing metrics leveraged to evaluate the multi-core architectures are described in Section 11.3. Performance evaluation results for the multi-core architectures are presented in Section 11.4 and Section 11.5 concludes this chapter.

11.1 Related Work

Parallelization of algorithms and performance analysis of SMPs has been a focus of previous work. Sun et al. [291] investigated performance metrics, such as speedup, efficiency, and scalability, for shared memory systems. The authors identified the causes of superlinear speedups, such as cache size, parallel processing overhead reduction, and randomized algorithms for shared memory systems. Brown et al. [292] studied the performance and programmability comparison to transform the Born calculation (a model used to study the interactions between a protein and surrounding water molecules) using both OpenMP and Message Passing Interface (MPI). The authors observed that the programmability and performance were better for the OpenMP implementation as compared to the MPI implementation; however, the scalability of the MPI version was superior to the OpenMP version. Lively et al. [293] explored the energy consumption and execution time of different parallel implementations of scientific applications using MPI-only versus MPI/OpenMP. The results indicated that the hybrid MPI/OpenMP implementation resulted in less execution time and energy. Our work differs from the previous parallel programming work in that we compare parallel implementations of different benchmarks using OpenMP and Tilera's ilib API (a


proprietary API) for two multi-core architectures as opposed to comparing OpenMP with MPI as in many previous works.

Some previous work investigated the performance of TMAs. Bikshandi et al. [294] demonstrated that hierarchical tiled arrays (HTAs) yielded increased performance on parallelized benchmarks, such as matrix multiplication (MM) and NASA advanced supercomputing (NAS) benchmarks, by improving the data locality. Zhu et al. [295] presented a performance study of OpenMP language constructs on the IBM Cyclops-64 (C64) architecture that integrated 160 processing cores on a single chip. The authors observed that the overhead of OpenMP language constructs on the C64 architecture was at least one order of magnitude lower as compared to the previous work on conventional SMP systems. Garcia et al. [296] proposed a high-performance MM algorithm for the IBM C64. Their proposed MM algorithm was able to attain 55.2% of the peak performance of the chip.

Some previous work investigated multi-core architectures for distributed embedded systems. Dogan et al. [240] evaluated a single-core and a multi-core architecture for biomedical signal processing in wireless body sensor networks (WBSNs) where both energy efficiency and real-time processing are crucial design objectives. Results revealed that the multi-core architecture consumed 66% less power than the single-core architecture for high biosignal computation workloads (i.e., 50.1 mega operations per second (MOPS)) whereas it consumed 10.4% more power than the single-core architecture for relatively light computation workloads (i.e., 681 kilo operations per second (KOPS)). Kwok et al. [238] proposed FPGA-based multi-core computing for batch processing of image data in distributed EWSNs. The authors observed that the image data consisted of a two-dimensional array of pixels and exhibited a high level of parallelism for acceleration on FPGAs since FPGAs consist of a two-dimensional array of configurable logic blocks (CLBs). The authors noted that a set of CLBs (e.g., 4 CLBs) could handle a pixel and each set of CLBs directly communicated with its neighbor sets.


The authors further proposed to employ FPGA-based acceleration of image data at a layer higher than the leaf sensor nodes in a hierarchical EWSN (the layer higher than the leaf sensor nodes corresponds to cluster heads in hierarchical EWSNs). The speedup obtained by FPGA-based acceleration at 20 MHz for edge detection, which is an image processing technique, was 22x as compared to a 48 MHz MicroBlaze microprocessor. Our work differs from the work done by Kwok et al. [238] in that we study the feasibility of two fixed logic multi-core architecture paradigms (i.e., SMPs and TMAs), instead of reconfigurable logic, for parallel embedded systems.

Various networking algorithms have been implemented on multi-core architectures for investigating the architectures' feasibility for distributed embedded systems. Kim et al. [248] proposed a parallel network coding algorithm and implemented the proposed algorithm on the Cell Broadband Engine for demonstration purposes. Kulkarni et al. [243] evaluated computational intelligence (CI) algorithms for various tasks performed by distributed EWSNs such as information fusion, energy-aware routing, scheduling, security, and localization. Our work differs from the previous work in that we implement GE and EP benchmarks, which are used as kernels in many embedded domains (e.g., GE is used in the decoding part of network coding), as well as an information fusion application for two multi-core architectures amenable to distributed embedded systems. A research group at Purdue explored a parallel histogram-based particle filter for object tracking on single instruction multiple data (SIMD)-based smart cameras [244]. Our work differs from the previous work on filters in embedded systems in that we implement a moving average filter for reducing noise in sensed data as part of an information fusion application, which also serves as a performance evaluation benchmark for SMPs and TMAs.

Although there exists work for independent performance evaluation of SMPs and TMAs, there has been no work in the literature that cross-evaluates the two architectures. Our parallelization of the same set of benchmarks for SMPs and TMAs enables cross-architectural evaluation of the two architectures. Furthermore, our work


backs up experimental results with theoretical computation of performance metrics (e.g., computational density and memory subsystem bandwidth) for the two architectures.

11.2 Multi-Core Architectures and Benchmarks

In this section, we describe the multi-core architectures studied in this chapter and the applications and kernels used as benchmarks to evaluate these architectures.

11.2.1 Multi-Core Architectures

In this subsection, we give an overview of the two multi-core architectures studied in this chapter.

11.2.1.1 Symmetric multiprocessors

SMPs are the most pervasive and prevalent type of parallel architecture that provides a global physical address space and symmetric access to all of main memory from any processor core. Every processor has a private cache and all of the processors and memory modules attach to a shared interconnect, typically a shared bus [262]. For this chapter, we consider an Intel-based SMP, which is an 8-core SMP consisting of two chips containing Intel's Xeon E5430 quad-core processors at 45 nm CMOS lithography [297] (henceforth we denote the Intel-based SMP as SMP2xQuadXeon for conciseness). The Xeon E5430 quad-core processor chip offers a maximum clock frequency of 2.66 GHz, integrates a 32 KB level one instruction (L1-I) and a 32 KB level one data (L1-D) cache per core, a 12 MB level two (L2) unified cache (a dual-core option with a 6 MB L2 cache is also available), and a 1333 MHz front side bus (FSB). The Xeon E5430 leverages Intel's enhanced front-side bus running at 1333 MHz that enables enhanced throughput between each of the processor cores [298].

11.2.1.2 Tiled multi-core architectures

TMAs exploit massive on-chip resources by combining each processor core with a switch to create a modular element called a tile, which can be replicated to create a multi-core platform with any number of tiles. TMAs contain a high-performance interconnection network that constrains interconnection wire length to be no longer than


the tile width, and a switch (communication router) interconnects neighboring switches. Examples of TMAs include the Raw processor, Intel's Tera-Scale research processor, and Tilera's TILE64, TILEPro64, and TILE-Gx processor family [183][300][220]. For this chapter, we consider the TILEPro64 processor, which is depicted in Fig. 11-1. The TILEPro64 processor features an 8x8 grid of 64 tiles (cores) at 90 nm CMOS lithography where each tile consists of a three-way very long instruction word (VLIW) pipelined processor capable of delivering up to three instructions per cycle (IPC), integrated L1 and L2 caches, and a non-blocking switch that integrates the tile into a power-efficient 31 Tbps on-chip interconnect mesh. Each tile of the TILEPro64 has a 16 KB L1 cache (8 KB instruction cache and 8 KB data cache) and a 64 KB L2 cache, resulting in a total of 5 MB of on-chip cache with Tilera's dynamic distributed cache (DDC) technology. Each tile can independently run a complete operating system (OS) or multiple tiles can be grouped together to run a multi-processing OS, such as SMP Linux [301]. The TILEPro processor family can support a wide range of computing applications, including advanced networking, wireless infrastructure, telecom, digital multimedia, and cloud computing.

Figure 11-1. Tilera's TILEPro64 processor [299].
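The 5 MB on-chip cache total quoted for the TILEPro64 follows directly from the per-tile cache sizes given above. The short Python check below is only an illustration of that arithmetic; the constant and function names are hypothetical, and 1 MB is taken as 1024 KB.

```python
# Per-tile cache sizes for the TILEPro64 as stated in the text.
TILES = 64
L1_INSTRUCTION_KB = 8
L1_DATA_KB = 8
L2_KB = 64


def total_on_chip_cache_kb(tiles, l1i_kb, l1d_kb, l2_kb):
    """Aggregate cache capacity across all tiles, in KB."""
    return tiles * (l1i_kb + l1d_kb + l2_kb)


total_kb = total_on_chip_cache_kb(TILES, L1_INSTRUCTION_KB, L1_DATA_KB, L2_KB)
total_mb = total_kb / 1024  # assumes 1 MB = 1024 KB
print(f"{total_kb} KB = {total_mb} MB of on-chip cache")
```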


11.2.2 Applications and Kernels Leveraged as Benchmarks

Many applications require embedded systems to perform various compute-intensive tasks that often exceed the computing capability of traditional single-core embedded systems. In this subsection, we briefly describe the applications and/or kernels that we parallelize to leverage as benchmarks for evaluating multi-core architectures for embedded systems. We consider an information fusion application, GE, and EP benchmarks for our study.

11.2.2.1 Information fusion

One of the crucial processing tasks in distributed embedded systems is information fusion, which benefits from a multi-core processor in these embedded systems. Distributed embedded systems (e.g., EWSNs) produce a large amount of data that must be processed, delivered, and assessed according to application objectives. Since the transmission bandwidth is limited, information fusion condenses the sensed data and transmits only the selected fused information to the base station node. Additionally, the data received from neighboring embedded nodes is often redundant and highly correlated, which warrants fusing the sensed/received data. Formally, information fusion encompasses theory, techniques, and tools created and applied to exploit the synergy in the information acquired from multiple sources (sensors, databases, etc.) in a manner that the resulting decision or action is in some sense (qualitatively or quantitatively) better in terms of accuracy or robustness than would be possible if any of the fused sources were used individually without such synergy exploitation [246]. Data aggregation is an instance of information fusion in which the data from various sources is aggregated by means of summarization functions (e.g., minimum, maximum, and average) that reduce the volume of data being manipulated. Information fusion can reduce the amount of data traffic, filter noisy measurements, and make predictions and inferences about a monitored entity.


Considering the significance of information fusion for distributed embedded systems, we parallelize an information fusion application for both SMPs and TMAs to investigate the suitability of the two architectures for distributed embedded systems. We consider a hierarchical distributed embedded system for information fusion consisting of embedded sensor nodes such that each cluster head receives sensing measurements from 10 single-core embedded sensor nodes equipped with temperature, pressure, humidity, acoustic, magnetometer, accelerometer, gyroscope, proximity, and orientation sensors [302]. The cluster head implements a moving average filter, which computes the arithmetic mean of a number of input measurements to produce each output measurement, to reduce random white noise from sensor measurements. Given an input sensor measurement vector $x = (x(1), x(2), \ldots)$, the moving average filter estimates the true sensor measurement vector after noise removal $y = (y(1), y(2), \ldots)$ as

$$ y(k) = \frac{1}{M} \sum_{i=0}^{M-1} x(k-i), \quad \forall\, k \geq M \tag{11-1} $$

where $M$ is the filter's window, which dictates the number of input sensor measurements to fuse for noise reduction. When the sensor measurements have random white noise, the moving average filter reduces the noise standard deviation by a factor of $\sqrt{M}$ (equivalently, the noise variance by a factor of $M$). For practical distributed embedded systems, $M$ can be chosen as the smallest value that can reduce the noise to meet the application requirements. For each of the filtered sensor measurements (i.e., after applying the moving average filter) for each of the embedded sensor nodes in the cluster, the cluster head calculates the minimum, maximum, and average of the sensed measurements. This information fusion application requires $100 \cdot N(3 + M)$ operations with complexity $O(NM)$ where $N$ denotes the number of sensor samples. Parallel performance metrics calculations for the parallelized information fusion application highlight the advantages attained by multi-core architectures for cluster heads/base station nodes as compared to single-core architectures.
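The filtering and aggregation steps described above can be sketched for a single sensor stream as follows. This is a minimal sequential illustration, not the parallelized OpenMP or ilib implementation evaluated in this chapter; the function names are hypothetical, and the operation count simply restates the $100 \cdot N(3 + M)$ expression from the text.

```python
def moving_average(x, M):
    """Moving average filter: y(k) = (1/M) * sum_{i=0}^{M-1} x(k - i), k >= M.

    The text indexes samples from 1; here x is a 0-indexed list, so the
    first output corresponds to the window x[0:M].
    """
    if M < 1 or M > len(x):
        raise ValueError("window M must satisfy 1 <= M <= len(x)")
    return [sum(x[k - M + 1:k + 1]) / M for k in range(M - 1, len(x))]


def fuse(x, M):
    """Filter one sensor stream, then summarize it as (min, max, average)."""
    y = moving_average(x, M)
    return min(y), max(y), sum(y) / len(y)


def fusion_operation_count(N, M):
    """Operation count quoted in the text for N samples and window M."""
    return 100 * N * (3 + M)


samples = [1, 2, 3, 4, 5]
print(moving_average(samples, 3))
print(fuse(samples, 3))
print(fusion_operation_count(10, 5))
```

Because each sensor stream is filtered independently of the others, the outer loop over sensors and nodes is a natural candidate for distributing across cores.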


11.2.2.2 Gaussian elimination

The GE algorithm solves a system of linear equations and is used in many scientific applications, including the Linpack benchmark used for ranking supercomputers in the TOP500 list of the world's fastest computers [177][303]. Furthermore, GE is used in distributed embedded systems. For example, the decoding algorithm for network coding uses a variant of GE (network coding is a coding technique to enhance network throughput in distributed embedded systems) [248]. The sequential runtime of the GE algorithm is $O(n^3)$. Our implementation of GE focuses on upper-triangularization of matrices and requires $(2/3)n^3 + (7/4)n^2 + (7/2)n$ FP operations (these FP operations include the extra operations required to make the GE algorithm numerically stable).

11.2.2.3 Embarrassingly parallel benchmark

The EP benchmark is typically used to quantify the peak attainable performance of a parallel computer architecture. For the EP benchmark, we consider the generation of normally distributed random variates that are used in the simulation of stochastic applications [304]. We use Box-Muller's algorithm for our EP benchmark, which requires $99n$ FP operations (we assume that square root requires 15 FP operations and logarithm, cosine, and sine each require 20 FP operations [305]).

11.3 Parallel Computing Device Metrics

In this section, we characterize various metrics to compare parallel architectures. These metrics include run time, speedup, efficiency, cost, scalability, computational density (CD), computational density per watt (CD/W), and memory-sustainable CD. In this chapter, we use these metrics to provide a quantitative comparison of our investigated multi-core architectures.

11.3.1 Run Time

The serial run time $T_s$ of a program is the time elapsed between the beginning and end of the program on a sequential computer. The parallel run time $T_p$ is the time


elapsed from the beginning of a program to the moment the last processor finishes execution.

11.3.2 Speedup

Speedup measures the performance gain achieved by parallelizing a given application/algorithm over the best sequential implementation of that application/algorithm. Speedup $S$ is defined as the ratio of the serial run time $T_s$ of the best sequential algorithm for solving a problem to the time taken by the parallel algorithm to solve the same problem on $p$ processors, that is, $S = T_s / T_p$.

11.3.3 Efficiency

Efficiency measures the fraction of the time for which a processor is usefully employed. Efficiency $E$ is defined as the ratio of the speedup $S$ to the number of processors $p$, that is, $E = S / p$.

11.3.4 Cost

Cost measures the sum of the time that each processor spends solving the problem. The cost $C$ of solving a problem on a parallel system is defined as the product of the parallel run time $T_p$ and the number of processors $p$ used, that is, $C = T_p \cdot p$. A parallel computing system is cost optimal if the cost of solving a problem on a parallel computer is proportional to the execution time of the best known sequential algorithm on a single processor [306].

11.3.5 Scalability

Scalability of a parallel system measures the performance gain achieved by parallelizing as the problem size and the number of processors vary. Formally, scalability of a parallel system is a measure of the system's capacity to increase speedup in proportion to the number of processors. A scalable parallel system maintains a fixed efficiency as the number of processors and the problem size increase [306].
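The speedup, efficiency, and cost definitions above translate directly into code. The sketch below is illustrative only: the function names and the example timings (8 s serial, 2 s on 8 processors) are made up for demonstration and are not measurements from this work.

```python
def speedup(t_serial, t_parallel):
    """S = Ts / Tp."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, p):
    """E = S / p."""
    return speedup(t_serial, t_parallel) / p


def cost(t_parallel, p):
    """C = Tp * p."""
    return t_parallel * p


# Hypothetical timings in seconds, chosen only to exercise the formulas.
ts, tp, processors = 8.0, 2.0, 8
print(speedup(ts, tp))
print(efficiency(ts, tp, processors))
print(cost(tp, processors))
```

In this made-up case the cost (16 processor-seconds) is twice the serial run time, which corresponds to an efficiency of 0.5; a cost-optimal system would keep the cost proportional to $T_s$ as $p$ grows.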


11.3.6 Computational Density

The CD metric measures the computational performance of a device (parallel system). The CD for double precision FP (DPFP) operations can be given as [307]

$$ CD_{DPFP} = f \times \sum_{i} \frac{N_i}{CPI_i} \tag{11-2} $$

where $f$ denotes the operating frequency of the device, $N_i$ denotes the number of instructions of type $i$ requiring FP computations that can be issued simultaneously, and $CPI_i$ denotes the average number of cycles per instruction of type $i$.

11.3.7 Computational Density per Watt

The CD/W metric takes into account the power consumption of a device while quantifying performance. We propose a system-level power model to estimate the power consumption of multi-core platforms that can be used in estimating the CD/W. Our power model considers both the active and idle mode power consumptions of multi-core platforms. Given a multi-core platform with a total of $N$ processor cores, the power consumption of the platform with $p$ active processor cores can be given as

$$ P^{p} = p \cdot \frac{P^{active}_{max}}{N} + (N - p) \cdot \frac{P^{idle}_{max}}{N} \tag{11-3} $$

where $P^{active}_{max}$ and $P^{idle}_{max}$ denote the maximum active and idle mode power consumptions of the multi-core platform, respectively. $P^{active}_{max}/N$ and $P^{idle}_{max}/N$ give the active and idle mode power, respectively, per processor core and associated switching and interconnection network circuitry. Our power model incorporates the power saving features of state-of-the-art multi-core platforms. Contemporary multi-core platforms provide instructions to switch the processor cores and associated circuitry (switches, clock, interconnection network) not used in a computation to a low-power idle state. For example, a software-usable NAP instruction can be executed on a tile in Tilera's TMAs to put the tile into a low-power IDLE mode [299][308]. Similarly, the Xeon processor 5400 series provides an extended HALT state whereas Opteron


processors provide a HALT mode (entered by executing the HLT instruction) to reduce power consumption by stopping the clock to internal sections of the processor (other low-power modes are also available) [298]. Investigation of a comprehensive power model for TMAs is the focus of our future work.

11.3.8 Memory-Sustainable Computational Density

Memory subsystem performance plays a crucial role in overall parallel system performance given the increasing disparity between the processor and memory speed improvements. The internal memory bandwidth (IMB) metric assesses on-chip memory performance by quantifying the number of operations that can be sustained by the computational cores for a given application. IMB can be given as (adapted from [307])

$$ IMB = \sum_{i} H_{c_i} \times \frac{N_{c_i} \times P_{c_i} \times W_{c_i} \times f_{c_i}}{8 \times CPA_{c_i}} \tag{11-4} $$

where $H_{c_i}$ denotes the hit rate for a cache of type $i$; $N_{c_i}$ denotes the number of caches of type $i$; $P_{c_i}$, $W_{c_i}$, $f_{c_i}$, and $CPA_{c_i}$ denote the number of ports, width of memory, operating frequency, and number of cycles per access for a cache of type $i$, respectively. IMB for each cache level and type is calculated separately and then summed to obtain the total IMB.

In parallel systems, external memory units supplement on-chip memory units. External memory units are farther from the computational cores as compared to the on-chip memory units; however, the external units' bandwidth impacts the performance. The external memory bandwidth (EMB) metric assesses this external memory bandwidth [309]. We define theoretical EMB as

$$ EMB_{th} = \sum_{i} \frac{N_{m_i} \times W_{m_i} \times TR_{m_i}}{8} \tag{11-5} $$

where $N_{m_i}$ denotes the number of memory modules of type $i$, and $W_{m_i}$ and $TR_{m_i}$ denote the memory-interface width and transfer rate of a memory of type $i$, respectively. $TR_{m_i}$ is typically a factor (1-4x depending upon the memory technology) of the external


memory clock rate. This theoretical EMB is typically not attainable in practice because of the delay involved in accessing external memory from a processor core; therefore, we define the effective EMB, $EMB_{eff}$, as

$$EMB_{eff} = \sum_i \frac{EMB_{th_{m_i}}}{CPA_{m_i}}$$

where $EMB_{th_{m_i}}$ and $CPA_{m_i}$ denote the theoretical EMB and number of cycles per access for a memory of type $i$, respectively.

11.4 Results

In this section, we first present a quantitative comparison of SMPs and TMAs based on the device metrics characterized in Section 11.3. Then we present performance and performance per watt results of our parallelized benchmarks for SMPs and TMAs. We parallelize the studied benchmarks for SMPs and TMAs using OpenMP and Tilera's MDE ilib API, respectively. The purpose of this comparison is to evaluate SMPs and TMAs as multi-core processor architectures for embedded systems.

We obtain the power consumption values of the SMPs and TMAs from the devices' respective datasheets and use these values in our power model (Section 11.3.7). For example, the TILEPro64 has a maximum active and idle mode power consumption of 28 W and 5 W, respectively [307][310]. Intel's Xeon E5430 has a maximum power consumption of 80 W and a minimum power consumption of 16 W in an extended HALT state [297][298].

11.4.1 Quantitative Comparison of Multi-core Platforms

In this subsection, we provide a quantitative comparison of the SMPs and TMAs based on the parallel computing device metrics characterized in Section 11.3. Table 11-1 summarizes the device metrics calculations. Table 11-1 shows that the SMPs provide greater CD than the TMAs; however, the TMAs excel in memory subsystem performance and therefore sustainable CD.

For the CD and CD/W metrics, we provide lower and upper values for the theoretical CD. The upper CD values represent the absolute maximum CD given


Table 11-1. Parallel computing device metrics for the multi-core architectures (Intel's Xeon E5430 refers to the Xeon quad-core chip on SMP2xQuadXeon).

Multi-core Platform | CD_DPFP (GFLOPS) | CD/W (MFLOPS/W) | IMB L1-I + L1-D (GB/s) | IMB L2 (GB/s) | IMB L1 + L2 (GB/s) | EMB_th (GB/s) | EMB_eff (MB/s)
Intel's Xeon E5430 | 2.3432953868170387117910.66(perDDR2)107
TILEPro64 | 0.41814643892228289724925.6324
SMP2xQuadXeon | 4.7865810761363406161423581701712


that the superscalar processor cores in the multi-core architectures issue as well as retire the maximum possible number of instructions each clock cycle, assuming that each FP instruction (operation) completes in one clock cycle. For example, the TILEPro64 and Xeon E5430 can simultaneously issue three integer instructions and four integer/FP instructions, respectively [298][299]. The lower CD values indicate CD values that may be attainable for real workloads/benchmarks, as the lower CD value is based on the actual number of cycles required for different FP operations. For example, we assume that the Xeon E5430 processor can perform 32- and 64-bit arithmetic and logic unit (ALU) instructions (e.g., add, subtract, rotate, shift) in one cycle, DPFP multiply in four cycles, and DPFP division in seventeen cycles. We point out that the lower CD value calculations in Table 11-1 are based on a fixed instruction mix (50% ALU, 35% multiply, and 15% divide), and different lower CD values can be obtained for different instruction mixes depending on a benchmark's/workload's particular characteristics; however, our calculations provide an estimate as well as outline the CD calculation procedure. Since the TILEPro64 does not have FP execution units, nor does the official documentation mention FP performance, we calculate the CD values for the TILE64/TILEPro64 based on experimental estimates of the cycles required to execute various FP operations.

The IMB and EMB calculations in Table 11-1 are based on the IMB, theoretical EMB, and effective EMB equations of Section 11.3.8. The lower value for IMB corresponds to typical low cache hit rates of L1-I (0.5), L1-D (0.3), and L2 (0.3) [285], whereas the upper value for IMB corresponds to hit rates of 1 (or 100%).

11.4.2 Benchmark-Driven Results for SMPs

In this subsection, we present performance results for SMPs based on our parallelized benchmarks. These results provide deeper insights into the actual attainable performance as compared to the quantitative theoretical performance estimates depicted in Table 11-1.
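The device metrics of Section 11.3 are simple sums over instruction, cache, and memory types. The following is a minimal Python sketch of those formulas, not the calculation procedure used to produce Table 11-1. The function names, the tuple layouts, and the assumption that widths are given in bits (so that the division by 8 yields bytes) are ours, and the example inputs are hypothetical.

```python
from typing import Iterable, Tuple


def cd_dpfp(freq_hz: float, types: Iterable[Tuple[float, float]]) -> float:
    """CD = f * sum_i(N_i / CPI_i); each entry of `types` is (N_i, CPI_i)."""
    return freq_hz * sum(n / cpi for n, cpi in types)


def imb(caches: Iterable[Tuple[float, float, float, float, float, float]]) -> float:
    """IMB = sum_i(H * N * P * W * f / (8 * CPA)).

    Each cache entry is (hit_rate, count, ports, width_bits, freq_hz, cpa).
    Assumption: width in bits, so the result is in bytes per second.
    """
    return sum(h * n * p * w * f / (8 * cpa) for h, n, p, w, f, cpa in caches)


def emb_theoretical(memories: Iterable[Tuple[float, float, float]]) -> float:
    """EMB_th = sum_i(N * W * TR / 8); each entry is (modules, width_bits, transfers_per_s)."""
    return sum(n * w * tr / 8 for n, w, tr in memories)


def emb_effective(memories: Iterable[Tuple[float, float, float, float]]) -> float:
    """EMB_eff = sum_i(EMB_th_i / CPA_i); each entry adds cycles per access as a 4th field."""
    return sum(n * w * tr / 8 / cpa for n, w, tr, cpa in memories)


if __name__ == "__main__":
    # Hypothetical device: 1 GHz, one FP instruction type with N = 2, CPI = 4.
    print(cd_dpfp(1e9, [(2, 4)]))
    # Hypothetical memory: one 64-bit module at 1e9 transfers/s, 100 cycles per access.
    print(emb_theoretical([(1, 64, 1e9)]), emb_effective([(1, 64, 1e9, 100)]))
```

Keeping each metric as a sum over typed entries mirrors the statement that IMB is computed per cache level and type and then summed.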


Table 11-2. Performance results for the information fusion application for SMP2xQuadXeon when M = 40.

Problem Size N | # of Cores p | Execution Time T_p (s) | Speedup S = T_s/T_p | Efficiency E = S/p | Cost C = T_p x p | Perf. (MOPS) | Perf. per watt (MOPS/W)
3,000,000 | 1 | 12.02 | 1 | 1 | 12.02 | 1073.2 | 22.36
3,000,000 | 2 | 7.87 | 1.53 | 0.76 | 15.74 | 1639.14 | 25.61
3,000,000 | 4 | 4.03 | 2.98 | 0.74 | 16.12 | 3201 | 33.34
3,000,000 | 6 | 2.89 | 4.2 | 0.7 | 17.34 | 4463.67 | 34.87
3,000,000 | 8 | 2.48 | 4.85 | 0.61 | 19.84 | 5201.6 | 32.51
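The speedup, efficiency, and cost columns of Table 11-2 follow directly from the execution times via the definitions in the table header. A minimal sketch is given below; the function name `scaling_metrics` is ours, and the input values are the p = 1 and p = 8 execution times from Table 11-2.

```python
def scaling_metrics(t_serial: float, t_parallel: float, p: int):
    """Return (speedup, efficiency, cost) for a run on p cores.

    speedup    S = T_s / T_p
    efficiency E = S / p
    cost       C = T_p * p
    """
    speedup = t_serial / t_parallel
    efficiency = speedup / p
    cost = t_parallel * p
    return speedup, efficiency, cost


if __name__ == "__main__":
    # Execution times (seconds) taken from Table 11-2: p = 1 and p = 8.
    s, e, c = scaling_metrics(12.02, 2.48, 8)
    print(round(s, 2), round(e, 2), round(c, 2))
```

An efficiency near 1 with a nearly constant cost indicates good scalability; a falling efficiency with a rising cost indicates that additional cores are increasingly spent on overhead.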


Table 11-2 depicts the performance results for the information fusion application for SMP2xQuadXeon when the number of samples N to be fused is equal to 3,000,000 and the moving average filter's window M = 40. Results are obtained with compiler optimization level -O3, which optimizes the execution of the application for the given architecture. Results show that the multi-core processor reduces the execution time as compared to a single-core processor (i.e., executing the application on a single processor core). The multi-core processor also increases the throughput (MOPS) as compared to a single-core processor. For example, eight processor cores increase the throughput of the information fusion application by a factor of 4.85 as compared to a single processor core. More importantly, multiple cores execute the information fusion application more power efficiently than a single-core processor, as can be seen from the performance per watt results in Table 11-2. For example, the SMP-based multi-core processor using four processor cores (i.e., p = 4) attains 49% better performance per watt than a single processor core.

Table 11-3 depicts the performance results for the GE benchmark, obtained with the compiler optimization level -O3, for SMP2xQuadXeon when (m, n) = (2000, 2000), where m is the number of linear equations and n is the number of variables in the linear equations. Results show that the multi-core processor reduces the execution time as compared to the single-core processor and that the speedups attained by the multi-core processor are proportional to p (i.e., the number of processor cores). For example, the multi-core processor with p = 8 increases the throughput (MFLOPS) and performance per watt (MFLOPS/W) by a factor of 7.45 and 2.2, respectively, as compared to a single-core processor.

Table 11-4 depicts the performance results for the EP benchmark, obtained with the compiler optimization level -O3, for SMP2xQuadXeon when the number of random variates generated n is equal to 100,000,000. We observe that the SMP architecture delivers higher MFLOPS/W as p increases and that attained speedups are close to p (i.e., ideal


Table 11-3. Performance results for the Gaussian elimination benchmark for SMP2xQuadXeon.

Problem Size (m, n) | # of Cores p | Execution Time T_p (s) | Speedup S = T_s/T_p | Efficiency E = S/p | Cost C = T_p x p | Perf. (MFLOPS) | Perf. per watt (MFLOPS/W)
(2000, 2000) | 1 | 8.05 | 1 | 1 | 8.05 | 663.35 | 13.82
(2000, 2000) | 2 | 3.76 | 2.14 | 1.07 | 7.52 | 1420.21 | 22.2
(2000, 2000) | 4 | 2.08 | 3.87 | 0.97 | 8.32 | 2567.31 | 26.74
(2000, 2000) | 6 | 1.42 | 5.67 | 0.94 | 8.52 | 3760.56 | 29.38
(2000, 2000) | 8 | 1.08 | 7.45 | 0.93 | 8.64 | 4944.44 | 30.9


Table 11-4. Performance results for the EP benchmark for SMP2xQuadXeon.

Problem Size n | # of Cores p | Execution Time T_p (s) | Speedup S = T_s/T_p | Efficiency E = S/p | Cost C = T_p x p | Perf. (MFLOPS) | Perf. per watt (MFLOPS/W)
100,000,000 | 1 | 7.61 | 1 | 1 | 7.61 | 1300.92 | 27.1
100,000,000 | 2 | 3.82 | 1.99 | 1 | 7.64 | 2591.62 | 40.49
100,000,000 | 4 | 1.92 | 3.96 | 0.99 | 7.68 | 5156.25 | 53.71
100,000,000 | 6 | 1.29 | 5.9 | 0.98 | 7.74 | 7674.4 | 59.96
100,000,000 | 8 | 0.97 | 7.84 | 0.98 | 7.76 | 10,206.2 | 63.79


speedups). A comparison with Table 11-1 indicates that SMP2xQuadXeon attains 96% of the theoretical lower CD values.

These results verify that SMP-based multi-core architectures provide better performance per watt as compared to a comparable single-core architecture for embedded systems. Hence, an embedded system using an SMP-based multi-core processor is more performance- and power-efficient than an embedded system using a single-core processor.

11.4.3 Benchmark-Driven Results for TMAs

In this subsection, we present performance results for the TMAs based on our parallelized benchmarks to investigate the feasibility of TMA-based multi-core processors for embedded systems.

Table 11-5 depicts the performance results for the information fusion application, obtained with the compiler optimization level -O3, for the TMA-based multi-core processor (TILEPro64) when N = 3,000,000 and M = 40. Results indicate that the TMA-based multi-core processor reduces the execution time in proportion to the number of tiles p (i.e., ideal speedup) as compared to a comparable single-core processor (i.e., executing the application on a single TMA tile). The efficiency remains close to 1 and the cost remains nearly constant as the number of tiles increases, indicating ideal scalability of the TMA-based multi-core processor for the information fusion application. For example, the TMA-based multi-core processor increases MOPS and MOPS per watt (MOPS/W) by 48.4x and 11.3x, respectively, for p = 50 as compared to a single TMA tile.

Table 11-6 depicts the performance results for the GE benchmark, obtained with the compiler optimization level -O3, for TILEPro64 when (m, n) = (2000, 2000). Results show that the TMA-based multi-core processor reduces the execution time as compared to the single-core processor; however, the attained speedup is much less than p. Furthermore, efficiency decreases and cost increases as p increases, indicating poor scalability for


Table 11-5. Performance results for the information fusion application for TILEPro64 when M = 40.

Problem Size N | # of Tiles p | Execution Time T_p (s) | Speedup S = T_s/T_p | Efficiency E = S/p | Cost C = T_p x p | Perf. (MOPS) | Perf. per watt (MOPS/W)
3,000,000 | 1 | 70.65 | 1 | 1 | 70.65 | 182.6 | 34.07
3,000,000 | 2 | 35.05 | 2 | 1 | 70.1 | 368 | 64.33
3,000,000 | 4 | 17.18 | 4.1 | 1.02 | 68.72 | 750.87 | 116.6
3,000,000 | 6 | 11.48 | 6.2 | 1.03 | 68.9 | 1123.69 | 156.94
3,000,000 | 8 | 8.9 | 7.94 | 0.99 | 71.2 | 1449.44 | 183.94
3,000,000 | 10 | 6.79 | 10.4 | 1.04 | 67.9 | 1899.85 | 221.17
3,000,000 | 50 | 1.46 | 48.4 | 0.97 | 73 | 8835.62 | 384.66


Table 11-6. Performance results for the Gaussian elimination benchmark for TILEPro64.

Problem Size (m, n) | # of Tiles p | Execution Time T_p (s) | Speedup S = T_s/T_p | Efficiency E = S/p | Cost C = T_p x p | Perf. (MFLOPS) | Perf. per watt (MFLOPS/W)
(2000, 2000) | 1 | 416.71 | 1 | 1 | 416.71 | 12.81 | 2.39
(2000, 2000) | 2 | 372.35 | 1.12 | 0.56 | 744.7 | 14.34 | 2.51
(2000, 2000) | 4 | 234.11 | 1.8 | 0.45 | 936.44 | 22.81 | 3.54
(2000, 2000) | 6 | 181.23 | 2.3 | 0.38 | 1087.38 | 29.46 | 4.11
(2000, 2000) | 8 | 145.51 | 2.86 | 0.36 | 1164.08 | 36.7 | 4.66
(2000, 2000) | 16 | 84.45 | 4.9 | 0.31 | 1351.2 | 63.23 | 5.88
(2000, 2000) | 28 | 52.25 | 7.98 | 0.28 | 1463 | 102.2 | 6.79
(2000, 2000) | 44 | 36.26 | 11.49 | 0.26 | 1595.44 | 147.27 | 7.08
(2000, 2000) | 56 | 29.72 | 14 | 0.25 | 1664.32 | 179.68 | 7.15


GE benchmark on the TMA-based multi-core processor. This is because of the excessive memory operations, dependency between the computations, and synchronization operations required by the GE benchmark. However, the TMA-based multi-core processor still attains better performance and performance per watt than a comparable single-core processor. For example, the TMA-based multi-core processor utilizing 56 tiles increases the throughput (MFLOPS) and performance per watt (MFLOPS/W) by 14x and 3x, respectively, as compared to a single TMA tile.

Table 11-7 depicts the performance results for the EP benchmark, obtained with the compiler optimization level -O3, for TILEPro64 when n = 100,000,000. Results indicate that the TMA-based multi-core processor delivers higher performance and performance per watt as p increases. For example, the TMA-based processor increases performance and performance per watt by 7.9x and 5.4x, respectively, for p = 8 and by 42.6x and 9.1x, respectively, for p = 56 as compared to a single core/tile of the TMA. Since the EP benchmark enables quantification of the peak attainable performance of a parallel architecture, our EP benchmark results quantify the peak FP performance attained by TMAs. A comparison with Table 11-1 indicates a disparity between the attained CD and the theoretical CD, which can be explained by our assumptions in Section 11.4.1 regarding the number of clock cycles required to execute FP operations on Tilera's integer execution units (no data is available regarding FP operations' execution time in Tilera's official documentation). We point out that the EP benchmark involves complex FP operations (e.g., square root and logarithm) that take many more cycles than simple FP operations (e.g., add and multiply), especially on integer execution units [305]. Furthermore, the calculated theoretical CD values represent the compute capacity of 64 tiles while the EP benchmark results are obtained for at most 56 tiles, since in practice not all 64 tiles can be used by a parallel application on Tilera's TMAs because some tiles are reserved for other OS processes and resource management [19].


Table 11-7. Performance results for the EP benchmark for TILEPro64.

Problem Size n | # of Tiles p | Execution Time T_p (s) | Speedup S = T_s/T_p | Efficiency E = S/p | Cost C = T_p x p | Perf. (MFLOPS) | Perf. per watt (MFLOPS/W)
100,000,000 | 1 | 687.77 | 1 | 1 | 687.77 | 14.39 | 2.68
100,000,000 | 2 | 345.79 | 1.99 | 1 | 691.58 | 28.63 | 5
100,000,000 | 4 | 173.48 | 3.96 | 0.99 | 693.92 | 57.07 | 8.86
100,000,000 | 6 | 117.74 | 5.84 | 0.97 | 706.44 | 84.08 | 11.74
100,000,000 | 8 | 87.32 | 7.88 | 0.98 | 698.56 | 113.38 | 14.4
100,000,000 | 16 | 44.23 | 15.55 | 0.97 | 707.68 | 223.83 | 20.82
100,000,000 | 28 | 28.86 | 23.83 | 0.85 | 808.08 | 343.03 | 22.78
100,000,000 | 44 | 19.49 | 35.29 | 0.8 | 857.56 | 507.95 | 24.41
100,000,000 | 56 | 16.15 | 42.6 | 0.76 | 904.4 | 613 | 24.4
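The performance per watt columns can be related to the system-level power model of Section 11.3.7, which charges each active core its share of the maximum active power and each idle core its share of the maximum idle power. The sketch below is an illustration under stated assumptions rather than the exact procedure used for the tables: the function names are ours, and we assume N = 64 tiles together with the 28 W active and 5 W idle figures quoted for the TILEPro64 in Section 11.4.

```python
def platform_power(p: int, n_cores: int, p_active_max: float, p_idle_max: float) -> float:
    """P_p = p * P_active_max / N + (N - p) * P_idle_max / N, in watts."""
    return p * p_active_max / n_cores + (n_cores - p) * p_idle_max / n_cores


def perf_per_watt(perf: float, p: int, n_cores: int,
                  p_active_max: float, p_idle_max: float) -> float:
    """Throughput (e.g., MFLOPS) divided by the modeled platform power."""
    return perf / platform_power(p, n_cores, p_active_max, p_idle_max)


if __name__ == "__main__":
    # Assumed TILEPro64 parameters: 64 tiles, 28 W max active, 5 W max idle.
    n, active, idle = 64, 28.0, 5.0
    print(platform_power(8, n, active, idle))                     # modeled watts for p = 8
    print(round(perf_per_watt(113.38, 8, n, active, idle), 1))    # EP benchmark, p = 8
    print(round(perf_per_watt(613.0, 56, n, active, idle), 1))    # EP benchmark, p = 56
```

Because idle tiles still draw power in this model, performance per watt grows more slowly than raw throughput as p increases.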


Figure 11-2. Performance per watt (MOPS/W) comparison between SMP2xQuadXeon and the TILEPro64 for the information fusion application when N = 3,000,000.

A comparison of performance and performance per watt for the information fusion application (Table 11-5) and the EP benchmark (Table 11-7) reveals that TMAs deliver much higher performance and performance per watt for benchmarks involving integer operations than for benchmarks involving FP operations. For example, for these benchmarks, both the performance and the performance per watt for integer operations are about 13x those for the FP operations for p = 8. This is because Tilera's TMAs only contain integer execution units and lack dedicated FP units.

These results verify that TMAs provide better performance per watt as compared to a comparable single processor-core architecture. Hence, an embedded system using TMAs as processing units is more performance- and power-efficient than an embedded system using a single-core processing unit.

11.4.4 Comparison of SMPs and TMAs

In this subsection, we present results for a cross-architectural evaluation of SMPs and Tilera's TMAs for multi-core parallel embedded systems based on our parallelized benchmarks. The comparison of the performance results for the SMPs and TMAs provides insights into the suitability of the two architectures for multi-core embedded systems for different types of applications.
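The comparative figures quoted in this section are ratios of entries in Tables 11-2 through 11-7 at p = 8. A minimal sketch of the two conventions used ("x times" and "percent better") follows; the helper names are ours and the inputs are the p = 8 table entries.

```python
def times_better(a: float, b: float) -> float:
    """How many times larger a is than b (the 'Nx' convention)."""
    return a / b


def percent_better(a: float, b: float) -> float:
    """Relative improvement of a over b in percent: (a / b - 1) * 100."""
    return (a / b - 1.0) * 100.0


if __name__ == "__main__":
    # Integer (information fusion) vs. FP (EP) on the TILEPro64, p = 8.
    print(round(times_better(1449.44, 113.38)))   # throughput ratio
    print(round(times_better(183.94, 14.4)))      # performance per watt ratio
    # TILEPro64 vs. SMP2xQuadXeon, information fusion, MOPS/W at p = 8.
    print(round(percent_better(183.94, 32.51), 1))
    # SMP2xQuadXeon vs. TILEPro64, GE benchmark, MFLOPS/W at p = 8.
    print(round(percent_better(30.9, 4.66)))
    # SMP2xQuadXeon vs. TILEPro64, EP benchmark, MFLOPS/W at p = 8.
    print(round(times_better(63.79, 14.4), 1))
```

Note that "465.8% better" corresponds to a ratio of roughly 5.66x, since the percentage convention subtracts the baseline.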


Fig. 11-2 shows that TILEPro64 delivers higher performance per watt than SMP2xQuadXeon for the information fusion application. For example, TILEPro64 attains 465.8% better performance per watt than the SMP for the information fusion application when p = 8. This is because the information fusion application requires operations on private data of various sensors/sources, which is very well parallelized on TILEPro64 using the ilib API. The parallelization on TILEPro64 exploits data locality for enhanced performance when computing moving averages, minimum, maximum, and average of the sensed data. The exploitation of data locality enables fast access to private data, which leads to higher IMB and consequently higher MOPS and MOPS/W.

The comparatively lower performance of the information fusion application on SMP2xQuadXeon than on TILEPro64 arises for two reasons. First, the SMP architecture is more suited for shared memory applications, whereas the information fusion application is well suited for architectures that can better exploit data locality. Second, OpenMP-based parallel programming uses the sections and parallel constructs, which require the sensed data to be shared by the operating threads (specified by the shared clause) even if the data requires independent processing by each thread. Trying to keep an independent copy of the sensed data for each thread (by using the private clause) for potentially better performance resulted in a segmentation fault due to huge memory requirements, thereby forcing the use of shared memory for the sensed data. Hence, the comparatively lower performance of the SMP than the TMA for the information fusion application is partly attributable to the limitation of OpenMP in not allowing declaration of thread-specific private data (i.e., received data from the first source/sensor is private to the first thread only, received data from the second source/sensor is private to the second thread only, and so on).

Fig. 11-3 shows that SMP2xQuadXeon delivers higher MFLOPS/W than the TILEPro64 for the GE benchmark. For example, SMP2xQuadXeon attains 563% better performance per watt than the TILEPro64 for the GE benchmark when p = 8. Furthermore, results


Figure 11-3. Performance per watt (MFLOPS/W) comparison between SMP2xQuadXeon and the TILEPro64 for the Gaussian elimination benchmark when (m, n) = (2000, 2000).

indicate that SMP2xQuadXeon exhibits better scalability and cost-efficiency than the TILEPro64 for the GE benchmark. For example, the efficiency of SMP2xQuadXeon for the GE benchmark is 0.93 as compared to an efficiency of 0.36 for the TILEPro64 when p = 8. This is because the GE benchmark requires many communication and synchronization operations between processing cores as well as excessive memory operations, which favors the SMP-based shared memory architecture, as the communication transforms to read and write operations in shared memory. In the case of TMAs, communication operations burden the on-chip interconnection network, particularly when the communicated data size is large. Furthermore, the higher IMB and EMB of SMP2xQuadXeon than the TILEPro64 lead to higher memory-sustainable CD and thus enhanced performance for the GE benchmark, which requires frequent memory accesses.

For the EP benchmark, SMP2xQuadXeon delivers higher MFLOPS/W than the TILEPro64 because the EP benchmark's execution time on SMP2xQuadXeon is significantly less than the execution time on the TILEPro64. For example, for the EP benchmark, SMP2xQuadXeon delivers 4.4x better performance per watt than the TILEPro64 when p = 8. The comparatively larger execution time on the TILEPro64 as compared to


SMP2xQuadXeon results from the complex FP operations (e.g., square root, logarithm) in the EP benchmark that require many cycles to execute on the integer execution units in the TILEPro64.

We also compare the overall execution time for the benchmarks. We observe that the execution time of the benchmarks on a single core of the SMPs is significantly less than the execution time of the benchmarks on a single tile of the TMAs. For example, for the information fusion application, the execution time on a single core of SMP2xQuadXeon is 6x less than the execution time on a single tile of the TILEPro64. This execution time difference between a single TMA tile and a single SMP2xQuadXeon core is mainly due to the lower computing power and operating frequency of each tile and the lack of FP execution units in the TILEPro64. For example, each tile on the TMA has a maximum clock frequency of 866 MHz as compared to the 2.66 GHz clock frequency of SMP2xQuadXeon. This observation confirms the corollary of Amdahl's law that emphasizes the performance advantage of a powerful single core over multiple less powerful cores [311]. We point out that this execution time difference may be exacerbated for memory-intensive benchmarks because of the larger L2 cache on the SMP2xQuadXeon (12 MB) as compared to the TILEPro64 (5 MB on-chip cache with Tilera's DDC technology).

11.5 Concluding Remarks

In this chapter, we compared the performance of SMPs and tiled multi-core architectures (TMAs) based on a parallelized information fusion application, Gaussian elimination (GE), and embarrassingly parallel (EP) benchmarks. We provided a quantitative comparison of these two architectures based on various device metrics' calculations (e.g., computational density (CD), CD per watt (CD/W), internal memory bandwidth (IMB), and external memory bandwidth (EMB)). Although the quantitative comparison provided a high-level evaluation of the computational capabilities of the architectures, our work provides deeper insights based on parallelized benchmark-driven evaluation.


Results revealed that the TILEPro64 exhibited better scalability and attained better performance per watt than the SMPs for applications involving integer operations on data and for applications that operate primarily on private data with little communication between operating threads, by exploiting data locality. For example, TILEPro64 attains 465.8% better performance per watt than an Intel-based SMP for the information fusion application when the number of operating processor cores/tiles p = 8. Interestingly, the parallelized GE benchmark did not scale well for the TILEPro64, especially as the number of participating tiles increases. For example, the Intel-based SMP attained 563% better performance per watt than the TILEPro64 for the GE benchmark when p = 8. This is because the GE benchmark requires many communication and synchronization operations between operating cores, which the SMP was able to handle better than the TMA. Results from the EP benchmark revealed that the SMP provided higher peak floating point performance per watt than the TMA, mainly because of the lack of a dedicated floating point unit in the TMA.

In the future, we plan to further evaluate SMPs and TMAs for other benchmarks (e.g., block matching kernel for image processing, sorting kernels, and fast Fourier transform (FFT)) used in various embedded applications. We also plan to develop a robust energy model for the SMPs and TMAs. Additionally, we plan to evaluate multi-core architectures with field programmable gate arrays (FPGAs) for embedded systems.


CHAPTER 12
CONCLUSIONS

In this dissertation, we presented novel methods for modeling and optimization of parallel and distributed embedded systems. To illustrate the modeling and optimization of distributed embedded systems, we presented our research on distributed embedded wireless sensor networks (EWSNs). Specifically, we targeted dynamic optimization methodologies based on the embedded sensor node tunable parameter value settings for EWSNs and modeling of application metrics (e.g., lifetime, reliability). To demonstrate the modeling and optimization of parallel embedded systems, we presented our research on multi-core-based parallel embedded systems.

Chapter 1 introduced embedded systems, hardware and software, application characteristics, the modeling paradigm, and the dissertation organization along with its relationship to published work.

Chapter 2 discussed distributed embedded systems from an optimization perspective focusing on distributed EWSNs. We presented a typical EWSN architecture along with several possible integration scenarios with external IP networks for ubiquitous availability of EWSN-offered services (e.g., sensed temperature and humidity statistics). We discussed commercial off-the-shelf (COTS) embedded sensor node components and associated tunable parameters that can be specialized to provide component-level optimizations. We presented data link-level and network-level optimization strategies focusing on MAC and routing protocols, respectively. Our presented MAC protocols targeted load balancing, throughput, and energy optimizations, and the routing protocols addressed query dissemination, real-time data delivery, and network topology. The OS optimizations discussed include event-driven execution, dynamic power management, and fault-tolerance.

Chapter 3 proposed an application metric estimation model to estimate high-level metrics (lifetime, throughput, and reliability) from an embedded sensor node's parameters.


This estimation model assisted dynamic optimization methodologies in comparing operating states. Our application metrics estimation model provided a prototype model for application metric estimation.

In Chapter 4, we proposed a fault-tolerant (FT) duplex embedded sensor node model based on the novel concept of determining the coverage factor using a sensor's fault detection algorithm accuracy. We developed comprehensive Markov models that hierarchically encompass embedded sensor nodes, EWSN clusters, and the overall EWSN. Our Markov models characterized EWSN reliability and MTTF for different sensor failure probabilities and can assist EWSN designers in creating EWSNs that meet application requirements by determining the reliability and MTTF in the pre-deployment phase. Our models could also enable EWSN designers to select a particular fault detection algorithm based on the algorithm's detection accuracy considering application requirements.

We observed that the fault detection algorithm's accuracy plays a crucial role in FT EWSNs by comparing our results against a perfect fault detection algorithm (c = 1). The results indicated that the percentage improvement in reliability for an FT embedded sensor node with c = 1 over an NFT embedded sensor node and an FT embedded sensor node (c ≠ 1) with an average number of neighbors k equal to 5, 10, and 15, is 230%, 172.36%, 166.31%, and 165.32%, respectively, for p = 0.9. The percentage improvement in reliability for an FT EWSN cluster with c = 1 over an NFT EWSN cluster and an FT EWSN cluster with c ≠ 1 is 601.12% and 142.73%, respectively, for p = 0.6. The percentage improvement in reliability for an FT EWSN with c = 1 over an NFT EWSN and an FT EWSN with c ≠ 1 is 350.18% and 236.22%, respectively, for p = 0.9. Results indicated that our FT model could provide on average 100% MTTF improvement with a perfect fault detection algorithm, whereas the MTTF improvement varied from 95.95% to 1.34% due to a fault detection algorithm's typical poor performance at high sensor failure rates. We also observed that the redundancy


in EWSNs plays an important role in improving EWSN reliability and MTTF. Results revealed that just three redundant sensor nodes in an EWSN cluster resulted in an MTTF improvement of 103.35% on average. Similarly, redundancy in EWSN clusters contributes to the reliability and MTTF improvement, and the results indicated that three redundant EWSN clusters could improve the MTTF by 52.22% on average. The iso-MTTF results indicate that an NFT EWSN cluster requires three redundant sensor nodes to attain an MTTF comparable to that of an FT EWSN cluster, whereas an NFT EWSN requires nine redundant EWSN clusters to achieve an MTTF comparable to that of an FT EWSN.

We proposed an application-oriented dynamic optimization methodology for EWSNs based on Markov Decision Processes (MDPs) in Chapter 5. Our MDP-based optimal policy tuned sensor node processor voltage, frequency, and sensing frequency in accordance with application requirements over the lifetime of a sensor node. Our proposed methodology was adaptive and dynamically determined a new MDP-based policy whenever application requirements changed (which may be in accordance with changing environmental stimuli). We compared our MDP-based policy with four fixed heuristic policies and concluded that our proposed MDP-based policy outperformed each heuristic policy for all sensor node lifetimes, state transition costs, and application metric weight factors. We provided implementation guidelines for our proposed policy in embedded sensor nodes.

Although our MDP-based methodology for good quality sensor node parameter tuning to meet application requirements was a first step towards EWSN dynamic optimization, the MDP-based methodology required high computational and memory resources for large design spaces and needed a high-performance base station node (sink node) to compute the optimal operating state for large design spaces. The operating states determined at the base station were then communicated to the other sensor nodes. The high resource requirements made the MDP-based


methodology infeasible for autonomous dynamic optimization for large design spaces given the constrained resources of individual sensor nodes. Therefore, we proposed lightweight online greedy and simulated annealing algorithms suitable for dynamic optimization of distributed EWSNs in Chapter 6. Compared to previous work, our methodology considered an extensive embedded sensor node design space, which allowed embedded sensor nodes to more closely meet application requirements. Results revealed that our online algorithms were lightweight, required little computational, memory, and energy resources, and thus are amenable to implementation on sensor nodes with tight resource and energy constraints. Furthermore, our online algorithms could perform in situ parameter tuning to adapt to changing environmental stimuli to meet application requirements.

Chapter 7 proposed a lightweight dynamic optimization methodology for EWSNs, which provided a high-quality solution in one shot using intelligent initial tunable parameter value settings for highly constrained applications. We also proposed an online greedy optimization algorithm that leveraged intelligent design space exploration techniques to iteratively improve on the one-shot solution for less constrained applications. Results showed that our one-shot solution is near-optimal and within 8% of the optimal solution on average. Results indicated that our greedy algorithm converged to the optimal or near-optimal solution after exploring only 1% and 0.04% of the design space, whereas SA explored 55% and 1.3% of the design space, for design space cardinalities of 729 and 31,104, respectively. Data memory and execution time analysis revealed that our one-shot solution (step one) required 361% and 85% less data memory and execution time, respectively, when compared to using all three steps of our dynamic optimization methodology. Furthermore, one-shot consumed 1,679% and 166,510% less energy as compared to the exhaustive search for |S| = 729 and |S| = 31,104, respectively. Improvement mode using GD as the online algorithm consumed 926% and 83,135% less energy as compared to the exhaustive search for |S| = 729 and


|S| = 31,104, respectively. Computational complexity along with the execution time, data memory analysis, and energy consumption confirmed that our methodology is lightweight and thus feasible for sensor nodes with limited resources.

Chapter 8 gave an overarching survey of high performance energy efficient computing (HPEEC) techniques for multi-core-based parallel embedded systems. We discussed state-of-the-art multi-core processors that leverage these HPEEC techniques.

Chapter 9 proposed an architecture for heterogeneous hierarchical multi-core embedded wireless sensor networks (MCEWSNs). Compute-intensive tasks such as information fusion, encryption, network coding, and software defined radio will benefit in particular from the increased computation power offered by multi-core embedded sensor nodes. Many wireless sensor networking application domains, such as wireless video sensor networks, wireless multimedia sensor networks, satellite-based sensor networks, space shuttle sensor networks, aerial-terrestrial hybrid sensor networks, and fault-tolerant sensor networks, can benefit from MCEWSNs. Perceiving the potential benefits of MCEWSNs, several initiatives have been undertaken in both academia and industry to develop multi-core embedded sensor nodes, such as InstraNode, satellite-based sensor nodes, and smart camera motes, which were discussed in the chapter.

In Chapter 10, we developed closed product-form queueing network models for performance evaluation of multi-core embedded architectures for different workload characteristics. The performance evaluation results indicated that the architectures with shared last-level caches (LLCs) provided better cache response time and MFLOPS/W than the private LLCs for all cache miss rates, especially as the number of cores increases. The results also revealed the downside of shared LLCs, indicating that shared LLCs are more likely to cause a main memory response time bottleneck for larger cache miss rates as compared to private LLCs. The memory bottleneck caused by shared LLCs may lead to increased response time for processor cores


because of stalling or idle waiting. However, results indicated that the main memory bottleneck created by shared LLCs can be mitigated by using a hybrid of private and shared LLCs (i.e., sharing LLCs among a smaller number of cores), though hybrid LLCs consume more power than shared LLCs and deliver comparatively less MFLOPS/W. The performance per watt and performance per unit area results for the multi-core embedded architectures revealed that the multi-core architectures with shared LLCs become more area and power efficient than the architectures with private LLCs as the number of processor cores in the architectures increases. The simulation results for the SPLASH-2 benchmarks executing on the SESC simulator verified the architectural evaluation insights obtained from our queueing theoretic models.

Chapter 11 compared the performance of symmetric multiprocessors (SMPs) and tiled multi-core architectures (TMAs) based on a parallelized information fusion application, Gaussian elimination (GE), and embarrassingly parallel (EP) benchmarks. We provided a quantitative comparison of these two architectures based on various device metrics' calculations (e.g., computational density (CD), CD per watt (CD/W), internal memory bandwidth (IMB), and external memory bandwidth (EMB)). Although the quantitative comparison provided a high-level evaluation of the computational capabilities of the architectures, our work provides deeper insights based on parallelized benchmark-driven evaluation.

Results revealed that the TILEPro64 exhibited better scalability and attained better performance per watt than the SMPs for applications involving integer operations on data and for applications that operate primarily on private data with little communication between operating threads, by exploiting data locality. For example, TILEPro64 attains 465.8% better performance per watt than an Intel-based SMP for the information fusion application when the number of operating processor cores/tiles p = 8. Interestingly, the parallelized GE benchmark did not scale well for the TILEPro64, especially as the number of participating tiles increases. For example, the Intel-based


SMP attained 563% better performance per watt than the TILEPro64 for the GE benchmark when p = 8. This is because the GE benchmark requires many communication and synchronization operations between operating cores, which the SMP was able to handle better than the TMA. Results from the EP benchmark revealed that the SMP provided higher peak floating point performance per watt than the TMA, mainly because of the lack of a dedicated floating point unit in the TMA.

This dissertation also outlines future research directions in the area of modeling and optimization of parallel and distributed embedded systems. Regarding the reliability and mean time to failure (MTTF) modeling of distributed EWSNs, our results motivate the development of robust distributed fault detection algorithms, which are the focus of our future work. We plan to develop an EWSN performability model to capture both the performance and availability (and/or reliability) simultaneously. We also plan to investigate FT in sensed data aggregation (fusion) and routing in EWSNs. The application metrics estimation model proposed in this dissertation can be enhanced for additional application metrics (e.g., security, delay). The application metrics estimation model can be further validated by comparison with statistics obtained from actual embedded sensor nodes operating in a distributed EWSN.

Regarding the optimization of distributed embedded systems, our proposed MDP-based model can be enhanced to incorporate additional high-level application metrics (e.g., security, reliability, energy, lifetime) as well as additional embedded sensor node tunable parameters (such as radio transmission power and radio transmission frequency). Furthermore, wireless channel conditions can be incorporated in the MDP state space, thus formulating a stochastic dynamic program that would enable sensor node tuning in accordance with changing wireless channel conditions. Additionally, the MDP-based methodology can be implemented on hardware sensor nodes for further verification of results. Moreover, sensor node tuning automation can be enhanced using profiling statistics by architecting mechanisms that enable the sensor

node to automatically react to environmental stimuli without the need for an application manager's feedback.

Regarding the lightweight dynamic optimization methodologies proposed in this dissertation, further results verification can be carried out using larger state spaces containing more embedded sensor node tunable parameters and tunable values. Future work includes the incorporation of profiling statistics into our dynamic optimization methodology to provide feedback with respect to changing environmental stimuli. In addition, we plan to further verify our dynamic optimization methodology by implementing our methodology on a physical hardware sensor node platform. Future work also includes the extension of our dynamic optimization methodology to global optimizations, which will ensure that individual sensor node tunable parameter settings are optimal both for the sensor node and for the entire EWSN.

Additional work needs to be done in parallel and distributed embedded systems such as multi-core EWSNs (MCEWSNs). Despite a few initiatives towards MCEWSNs, the domain is still in its infancy and requires addressing some challenges to facilitate ubiquitous deployment of MCEWSNs. Since battery energy is the most critical resource constraint for MCEWSNs, research and development in energy-efficient batteries and energy-harvesting systems would be beneficial for MCEWSNs. Mobility and self-adaptability of embedded sensor nodes require further research to obtain the desired view of the sensor field (e.g., an image sensor facing downward towards the earth may not be desirable). Furthermore, distilling information from a large number of low-resolution video streams obtained from multiple video sensors requires novel algorithms since current computer-vision algorithms are able to analyze only a few high-resolution images. Finally, reconfigurability in MCEWSNs is an important research avenue that would allow the network to adapt to new requirements by accepting code upgrades (e.g., a more efficient algorithm for video compression may be discovered after deployment).

Our proposed queueing theoretic models for performance evaluation of multi-core embedded systems can be enhanced to incorporate heterogeneous multi-core embedded architectures. In the future, we plan to further evaluate SMPs and TMAs for other benchmarks (e.g., block matching kernel for image processing, sorting kernels, and fast Fourier transform (FFT)) used in various embedded applications. We also plan to develop a robust energy model for the SMPs and TMAs. Additionally, we plan to evaluate multi-core architectures with field programmable gate arrays (FPGAs) for embedded systems.

Despite remarkable advancements in parallel embedded systems, the HPEEC domain still faces various arduous challenges, which require further research to leverage the full-scale benefits of HPEEC techniques. Although power is still a first-order constraint in HPEEC platforms, we discuss several additional challenges facing the HPEEC domain (summarized in Table 12-1) along with future research directions.

Heterogeneous CMPs provide performance efficiency, but present additional design challenges as the design space increases considering various types of cores and the flexibility of changing each core's architectural parameters (e.g., issue width, instruction window size, fetch gating) for arbitrary permutations of workloads. Furthermore, for a given die size, there exists a fundamental tradeoff between the number and type of cores and appropriate cache sizes for these cores. Efficient distribution of available cache size across the cache hierarchies (private and shared) to provide high performance is challenging [180].

Synchronization between multiple threads running on multiple cores introduces performance challenges. Threads use semaphores or locks to control access to shared data, which degrades performance due to the busy waiting of threads. Furthermore, threads use synchronization barriers (a defined point in the code that all threads must reach before further execution), which decreases performance due to idle-waiting of faster threads for slower threads.
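
The idle-waiting cost of synchronization barriers can be illustrated with a small sketch. The Python example below is not taken from the dissertation: the function names (expected_idle, run_with_barrier) and the simulated per-thread work times are assumptions chosen only for illustration. Each thread sleeps to mimic its computation phase and then waits at a barrier, so a thread's idle time is roughly the gap between its own finish time and that of the slowest thread.

```python
import threading
import time


def expected_idle(work_times):
    """Analytic idle time per thread: everyone waits for the slowest thread."""
    slowest = max(work_times)
    return [slowest - w for w in work_times]


def run_with_barrier(work_times):
    """Run one thread per work time and measure how long each waits at the barrier.

    work_times are hypothetical per-thread computation times in seconds.
    """
    n = len(work_times)
    barrier = threading.Barrier(n)
    idle = [0.0] * n

    def worker(i):
        time.sleep(work_times[i])        # simulated computation phase
        arrived = time.perf_counter()
        barrier.wait()                   # blocks until all n threads arrive
        idle[i] = time.perf_counter() - arrived

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return idle


if __name__ == "__main__":
    work = [0.01, 0.05, 0.20]            # one fast, one medium, one slow thread
    print("expected idle:", expected_idle(work))
    print("measured idle:", [round(x, 3) for x in run_with_barrier(work)])
```

In this sketch the fastest thread spends most of the phase idle, which is the serialization effect described above; the measured values depend on the host scheduler and are only approximate.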

Table 12-1. High-performance energy-efficient embedded computing (HPEEC) challenges.

Challenge | Description
Complex design space | Large design space due to various core types (homogeneous, heterogeneous) and each core's tunable parameters (e.g., instruction window size, issue width, fetch gating)
High on-chip bandwidth | Increased communication due to the increasing number of cores requires high-bandwidth on-chip interconnects
Synchronization | Synchronization primitives (e.g., locks, barriers) result in program serialization, degrading performance
Shared memory bottleneck | Threads running on different cores make a large number of accesses to various shared memory data partitions
Cache coherence | Heterogeneous cores with different cache line sizes require cache coherence protocol redesign, and synchronization primitives (e.g., semaphores, locks) increase cache coherence traffic
Cache thrashing | Threads working concurrently evict each other's data out of the shared cache to bring in their own data

Although different threads can work independently on private data, shared memory becomes a bottleneck due to the large number of shared-data accesses to different cache partitions. Furthermore, threads can communicate via shared memory, which requires cache state transitions to transfer data between threads. Threads must stall until cache state transitions occur, as there is likely insufficient speculative or out-of-order work available for these threads. Moreover, designing a common interface to the shared cache, clock distribution, and cache coherence provides additional design challenges [189].

Cache coherence is required to provide a consistent memory view in shared-memory multi-core processors with various cache hierarchies. Embedded systems conventionally rely on software-managed cache coherency, which does not scale well with the number of cores and thereby necessitates hardware-assisted cache coherence.
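
The cache thrashing entry in Table 12-1 can be made concrete with a toy simulation. The Python sketch below is only an illustration under stated assumptions: the cache geometry (16-byte lines, 8 lines total), the LRU replacement policy, the function name count_misses, and the two-address access trace are all hypothetical and are not drawn from the dissertation. It counts misses for a set-associative cache, where ways=1 models a direct-mapped cache.

```python
from collections import OrderedDict


def count_misses(addresses, num_sets, ways, line_size=16):
    """Count misses for a set-associative cache with LRU replacement.

    ways=1 models a direct-mapped cache. All parameters are illustrative.
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // line_size
        index = block % num_sets         # which set the block maps to
        tag = block // num_sets
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)       # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return misses


# Two threads alternately touch addresses 0 and 128, which map to the same set.
trace = [0, 128] * 10

direct_mapped = count_misses(trace, num_sets=8, ways=1)   # 8 lines, 1-way
two_way = count_misses(trace, num_sets=4, ways=2)         # 8 lines, 2-way
print(direct_mapped, two_way)   # prints "20 2"
```

With equal capacity, the direct-mapped configuration misses on every access because the two blocks keep evicting each other, while the 2-way configuration misses only on the first touch of each block. The sketch does not model victim caches or power, which are the constraints that make the choice non-trivial in embedded systems.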

Hardware-software codesign of the cache coherence protocol defines challenging tradeoffs between performance, power, and time-to-market [312].

Cache thrashing, an additional HPEEC challenge, is a phenomenon where threads continually evict each other's working set from the cache, which increases the miss rate and latency for all threads. Although direct-mapped caches present an attractive choice for multi-core embedded systems due to a direct-mapped cache's power efficiency as compared to associative caches, direct-mapped caches are more predisposed to thrashing than set associative caches. Cache thrashing can be minimized by providing larger and more associative caches; however, these opportunities are constrained by strict power requirements for embedded systems. Victim caches employed alongside direct-mapped caches help to alleviate cache thrashing by providing associativity for localized cache conflict regions [313].

Various new avenues are emerging in HPEEC such as energy-efficient data centers, grid and cluster embedded computing, and dependable HPEEC. Various vendors are developing energy-efficient high-performance architectures for data centers by leveraging a huge volume of low-power mobile processors (e.g., SeaMicro's SM10000 servers family integrates 512 low-power x86 1.66 GHz, 64-bit, Intel Atom cores [314]). Advances are being made in grid and cluster embedded computing, e.g., AMAX's ClusterMax SuperG GPGPU clusters consisting of NVIDIA Tesla 20-series GPU computing platforms feature 57,344 GPU cores and offer 131.84 TFLOPS of single precision performance and 65.92 TFLOPS of double precision performance [315]. Although grid embedded computing has revolutionized HPEEC, it requires further investigation in associated task scheduling policies due to the unique dynamics of grid embedded computing. Different heterogeneous embedded processors can be added to or removed from the grid dynamically, which requires intelligent dynamic task scheduling policies to map tasks to the best available computing nodes. The task scheduling

policies must consider the impact of dynamic changes in available computing resources on time and energy requirements of tasks.

As the number of on-chip cores increases to satisfy performance demands, communicating data between these cores in an energy-efficient manner becomes challenging and requires scalable, high-bandwidth interconnection networks. Although wireless interconnects provide a power-efficient high-performance alternative to wired interconnects, associated research challenges include partitioning of wired and wireless interconnect domains, directional antenna design, and lightweight medium access control (MAC) protocols. Since many supercomputing applications leverage multiple many-core chips (CMOS technology and power dissipation limits restrict the number of processor cores on a single chip), design of high-bandwidth and low-power interconnection networks between these many-core chips is also an emerging research avenue. Although photonic network designs have been proposed in the literature as a prospective low-power and high-bandwidth solution to interconnect many-core CMPs [316][317], the domain of scalable interconnection networks (inter-chip and intra-chip) requires further research.

Dynamic optimization techniques that can autonomously adapt embedded systems according to changing application requirements and environmental stimuli present an interesting research avenue. The task scheduling techniques in real-time embedded systems are typically based on tasks' worst-case execution times, which can produce slack time whenever a task finishes execution before the task's deadline. Therefore, dynamic task scheduling techniques that leverage this slack time information at runtime to reduce energy consumption are crucial for HPEEC systems and require further research.

To keep up with Moore's law, innovative transistor technologies are needed that can permit high transistor density on-chip, facilitating chip miniaturization while allowing operation at higher speeds with lower power consumption as compared to

the contemporary CMOS transistor technology. Miniaturized embedded multi-core processor/memory design and fabrication using new transistor technologies (e.g., multiple gate field-effect transistors (MuGFETs), FinFETs, Intel's tri-gate) is an interesting HPEEC lithography research avenue [318].

Finally, advanced power monitoring and analysis tools are required for HPEEC platforms to monitor power at a fine granularity (i.e., the functional unit-level in relation to an application's code segments) and profile architectural components with respect to power consumption for different code segments. Specifically, power measurement and analysis tools for GPUs are required considering the proliferation of GPUs in the HPEEC domain [319].
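
The slack-time idea raised in the discussion of dynamic task scheduling earlier in this chapter can be sketched as follows. This Python example is a simplified illustration and not the dissertation's method: the function name reclaim_slack, the normalized frequency scale, the minimum frequency f_min, and the energy model (energy per cycle proportional to the square of frequency, which assumes supply voltage scales linearly with frequency) are all assumptions. Tasks run back to back, each budgeted its worst-case execution time (WCET); when a task finishes early, the unused time is handed to the next task, which can then run at a lower frequency while still fitting its WCET into the enlarged budget.

```python
def reclaim_slack(tasks, f_min=0.25):
    """Greedy slack reclamation for a chain of tasks.

    tasks: list of (wcet, actual) execution times measured at full speed
           (normalized frequency 1.0); values are hypothetical.
    Returns (frequencies chosen per task, relative energy consumed).
    """
    slack = 0.0
    freqs = []
    energy = 0.0
    for wcet, actual in tasks:
        budget = wcet + slack                 # own WCET slot plus inherited slack
        f = max(f_min, wcet / budget)         # slowest speed that still fits WCET
        run_time = actual / f                 # actual time taken at speed f
        slack = max(0.0, budget - run_time)   # leftover time for the next task
        freqs.append(f)
        energy += actual * f ** 2             # cycles x assumed energy per cycle
    return freqs, energy


if __name__ == "__main__":
    # First task finishes in half its WCET; second task needs its full WCET.
    tasks = [(10.0, 5.0), (10.0, 10.0)]
    freqs, energy = reclaim_slack(tasks)
    baseline = sum(actual for _, actual in tasks)   # energy if always at f = 1.0
    print(freqs, round(energy, 2), baseline)
```

For the two-task input shown, the second task runs at roughly two thirds of full speed and the relative energy drops from 15 to about 9.44 under the assumed model. Real systems would also have to account for discrete frequency levels, transition overheads, and static power, which this sketch ignores.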

REFERENCES

[1] P. Marwedel, "Embedded and Cyber-Physical Systems in a Nutshell," in Design Automation Conference (DAC) Knowledge Center Article, November 2010. [Online]. Available: http://www.dac.com/front end+topics.aspx?article=58&topic=1
[2] S. Edwards, L. Lavagno, E. Lee, and A. Sangiovanni-Vincentelli, "Design of Embedded Systems: Formal Models, Validation, and Synthesis," Proceedings of the IEEE, vol. 85, no. 3, pp. 366, March 1997.
[3] W. Dally, J. Balfour, D. Black-Shaffer, J. Chen, R. Harting, V. Parikh, J. Park, and D. Sheffield, "Efficient Embedded Computing," IEEE Computer, vol. 41, no. 7, pp. 27, July 2008.
[4] J. Balfour, "Efficient Embedded Computing," Ph.D. Thesis, EE Dept., Stanford Univ., May 2010.
[5] P. Gepner, D. Fraser, M. Kowalik, and R. Tylman, "New Multi-Core Intel Xeon Processors Help Design Energy Efficient Solution for High Performance Computing," in Proc. of IMCSIT, Mragowo, Poland, October 2009.
[6] P. Crowley, M. Franklin, J. Buhler, and R. Chamberlain, "Impact of CMP Design on High-Performance Embedded Computing," in Proc. of HPEC Workshop, Lexington, Massachusetts, September 2006.
[7] E. Lee, "Cyber-Physical Systems - Are Computing Foundations Adequate?" in NSF Workshop on Cyber-Physical Systems: Research Motivations, Techniques and Roadmap (Position Paper), Austin, Texas, October 2006.
[8] G. Starr, J. Wersinger, R. Chapman, L. Riggs, V. Nelson, J. Klingelhoeffer, and C. Stroud, "Application of Embedded Systems in Low Earth Orbit for Measurement of Ionospheric Anomalies," in Proc. of International Conference on Embedded Systems & Applications (ESA'09), Las Vegas, Nevada, July 2009.
[9] J. Samson, J. Ramos, A. George, M. Patel, and R. Some, "Technology Validation: NMP ST8 Dependable Multiprocessor Project," in Proc. of IEEE Aerospace Conference, Big Sky, Montana, March 2006.
[10] Intel, "Advantech Puts Intel Architecture at the Heart of LiDCO's Advanced Cardiovascular Monitoring System," in White Paper, 2010. [Online]. Available: http://download.intel.com/design/embedded/medical/323210.pdf
[11] M. Reunert, "High Performance Embedded Systems for Medical Imaging," in Intel's White Paper, October 2007. [Online]. Available: ftp://download.intel.com/design/embedded/medical-solutions/basoct07p9.pdf
[12] Intel, "Intel Technology Helps Medical Specialists More Quickly Reach and Treat Patients in Remote Areas," in White Paper, 2011. [Online]. Available: http://download.intel.com/embedded/applications/medical/325447.pdf

[13] K. Muller-Glaser, G. Frick, E. Sax, and M. Kuhl, "Multiparadigm Modeling in Embedded Systems Design," IEEE Trans. on Control Systems Technology, vol. 12, no. 2, pp. 279, March 2004.
[14] A. Sangiovanni-Vincentelli and M. Natale, "Embedded System Design for Automotive Applications," IEEE Computer, vol. 40, no. 10, pp. 42, October 2007.
[15] D. Milojicic, "Trend Wars: Embedded Systems," IEEE Concurrency, vol. 8, no. 4, pp. 80, October-December 2000.
[16] G. Kornaros, Multi-core Embedded Systems. Taylor and Francis Group, CRC Press, 2010.
[17] C. Gonzales and H. Wang, "White Paper: Thermal Design Considerations for Embedded Applications," June 2011. [Online]. Available: http://download.intel.com/design/intarch/papers/321055.pdf
[18] J. C. Knight, Software Challenges in Aviation Systems. Springer Berlin/Heidelberg, 2002.
[19] TILERA, "Tilera Multicore Development Environment: iLib API Reference Manual," in Tilera Official Documentation, April 2009.
[20] W. Young, W. Boebert, and R. Kain, "Proving a Computer System Secure," Scientific Honeyweller, vol. 6, no. 2, pp. 18, July 1985.
[21] A. Munir and A. Gordon-Ross, "An MDP-based Dynamic Optimization Methodology for Wireless Sensor Networks," IEEE Trans. on Parallel and Distributed Systems, 2011.
[22] J. Zhao and R. Govindan, "Understanding Packet Delivery Performance in Dense Wireless Sensor Networks," in Proc. of ACM SenSys, Los Angeles, California, November 2003.
[23] C. Myers, "Modeling and Verification of Cyber-Physical Systems," in Design Automation Summer School, University of Utah, June 2011. [Online]. Available: http://www.lems.brown.edu/iris/dass11/Myers-DASS.pdf
[24] OMG, "Unified Modeling Language," in Object Management Group Standard, 2011. [Online]. Available: http://www.uml.org/
[25] I. Koren and M. Krishna, Fault-Tolerant Systems. Morgan Kaufmann Publishers, 2007.
[26] C. Zhang, F. Vahid, and R. Lysecky, "A Self-Tuning Cache Architecture for Embedded Systems," ACM Trans. on Embedded Computing Systems, vol. 3, no. 2, pp. 407, May 2004.

[27] K. Hwang, "Advanced Parallel Processing with Supercomputer Architectures," Proceedings of the IEEE, vol. 75, no. 10, pp. 1348, October 1987.
[28] A. Klietz, A. Malevsky, and K. Chin-Purcell, "Mix-and-match High Performance Computing," IEEE Potentials, vol. 13, no. 3, pp. 6, August/September 1994.
[29] W. Pulleyblank, "How to Build a Supercomputer," IEEE Review, vol. 50, no. 1, pp. 48, January 2004.
[30] S. Bokhari and J. Saltz, "Exploring the Performance of Massively Multithreaded Architectures," Concurrency and Computation: Practice & Experience, vol. 22, no. 5, pp. 588, April 2010.
[31] W.-c. Feng and K. Cameron, "The Green500 List: Encouraging Sustainable Supercomputing," IEEE Computer, vol. 40, no. 12, pp. 38, December 2007.
[32] A. Munir, S. Ranka, and A. Gordon-Ross, "Modeling of scalable embedded systems," in Scalable Computing and Communications: Theory and Practice (to appear), S. Khan, L. Wang, and A. Zomaya, Eds. John Wiley & Sons, 2012.
[33] A. Munir and A. Gordon-Ross, "Markov Modeling of Fault-Tolerant Wireless Sensor Networks," in Proc. of IEEE International Conference on Computer Communication Networks (ICCCN), Maui, Hawaii, August 2011.
[34] A. Munir, A. Gordon-Ross, and S. Ranka, "A Queueing Theoretic Approach for Performance Evaluation of Low-Power Multi-core Embedded Systems," in Proc. of IEEE International Conference on Computer Design (ICCD), Amherst, Massachusetts, October 2011.
[35] A. Munir, S. Ranka, and A. Gordon-Ross, "High-Performance Energy-Efficient Multi-core Embedded Computing," IEEE Trans. on Parallel and Distributed Systems (TPDS), 2011.
[36] A. Munir and A. Gordon-Ross, "Optimization approaches in wireless sensor networks," in Sustainable Wireless Sensor Networks, W. Seah and Y. K. Tan, Eds. InTech, 2010. [Online]. Available: http://www.intechopen.com/articles/show/title/optimization-approaches-in-wireless-sensor-networks
[37] A. Munir and A. Gordon-Ross, "An MDP-based Application Oriented Optimal Policy for Wireless Sensor Networks," in Proc. of the IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), Grenoble, France, October 2009, pp. 183.
[38] A. Munir, A. Gordon-Ross, S. Lysecky, and R. Lysecky, "Online Algorithms for Wireless Sensor Networks Dynamic Optimization," in Proc. of IEEE Consumer Communications and Networking Conference (CCNC), Las Vegas, Nevada, January 2012.

[39] A. Munir, A. Gordon-Ross, S. Lysecky, and R. Lysecky, "A One-Shot Dynamic Optimization Methodology for Wireless Sensor Networks," in Proc. IARIA International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies (UBICOMM), Florence, Italy, October 2010.
[40] A. Munir, A. Gordon-Ross, S. Lysecky, and R. Lysecky, "A One-Shot Dynamic Optimization Methodology and Application Metrics Estimation Model for Wireless Sensor Networks," IARIA International Journal on Advances in Networks and Services, 2012.
[41] A. Munir, A. Gordon-Ross, S. Lysecky, and R. Lysecky, "A Lightweight Dynamic Optimization Methodology for Wireless Sensor Networks," in Proc. of IEEE WiMob, October 2010.
[42] D. Brooks and M. Martonosi, "Value-based Clock Gating and Operation Packing: Dynamic Strategies for Improving Processor Power and Performance," ACM Trans. on Computer Systems, vol. 18, no. 2, pp. 89, May 2000.
[43] H. Hamed, A. El-Atawy, and A.-S. Ehab, "On Dynamic Optimization of Packet Matching in High-Speed Firewalls," IEEE Journal on Selected Areas in Communications, vol. 24, no. 10, pp. 1817, October 2006.
[44] K. Hazelwood and M. D. Smith, "Managing Bounded Code Caches in Dynamic Binary Optimization Systems," ACM Trans. on Architecture and Code Optimization, vol. 3, no. 3, pp. 263, September 2006.
[45] S. Hu, M. Valluri, and L. K. John, "Effective Management of Multiple Configurable Units using Dynamic Optimization," ACM Trans. on Architecture and Code Optimization, vol. 3, no. 4, pp. 477, December 2006.
[46] N. Mahalik, Sensor Networks and Configuration: Fundamentals, Standards, Platforms, and Applications. Springer, 2007.
[47] H. Karl and A. Willig, Protocols and Architectures for Wireless Sensor Networks. John Wiley and Sons, Inc., 2005.
[48] StrongARM, "Intel StrongARM SA-1110 Microprocessor," August 2011. [Online]. Available: http://bwrc.eecs.berkeley.edu/research/pico radio/test bed/hardware/documentation/arm/sa1110briefdatasheet.pdf
[49] ATMEL, "ATMEL ATmega128L 8-bit Microcontroller Datasheet," in ATMEL Corporation, San Jose, California, December 2010. [Online]. Available: http://www.atmel.com/dyn/resources/prod documents/doc2467.pdf
[50] T. S. Rappaport, Wireless Communications, Principles and Practice. Prentice Hall, 1996.
[51] N. Abramson, "Development of the ALOHANET," IEEE Trans. on Information Theory, vol. 31, no. 2, pp. 119, March 1985.

[52] IEEE Standards, "Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specification," IEEE Std 802.11-1999 edition: LAN MAN Standards Committee of the IEEE Computer Society, 1999.
[53] K. Sohraby, D. Minoli, and T. Znati, Wireless Sensor Networks: Technology, Protocols, and Applications. John Wiley and Sons, Inc., 2007.
[54] A. Chandrakasan, R. Amirtharajah, S. Cho, J. Konduri, J. Kulik, W. Rabiner, and A. Wang, "Design Considerations for Distributed Microsensor Systems," in Proc. of IEEE Custom Integrated Circuits Conference (CICC), San Diego, California, May 1999.
[55] V. Rajendran, K. Obraczka, and J. Garcia-Luna-Aceves, "Energy-efficient Collision-free Medium Access Control for Wireless Sensor Networks," in Proc. of International Conference on Embedded Networked Sensor Systems (SenSys)'03. Los Angeles, California: ACM, November 2003.
[56] J. Polastre, J. Hill, and D. Culler, "Versatile Low Power Media Access for Wireless Sensor Networks," in Proc. of International Conference on Embedded Networked Sensor Systems (SenSys)'04. Baltimore, Maryland: ACM, November 2004.
[57] I. Rhee, A. Warrier, M. Aia, and J. Min, "Z-MAC: A Hybrid MAC for Wireless Sensor Networks," in Proc. of International Conference on Embedded Networked Sensor Systems (SenSys)'05. San Diego, California: ACM, November 2005.
[58] W. Ye, J. Heidemann, and D. Estrin, "An Energy-Efficient MAC Protocol for Wireless Sensor Networks," in Proc. of IEEE INFOCOM, New York, New York, June 2002.
[59] L. Bao and J. Garcia-Luna-Aceves, "A New Approach to Channel Access Scheduling for Ad Hoc Networks," in Proc. of MobiCom'01. Rome, Italy: ACM, July 2001.
[60] C. Raghavendra, K. Sivalingam, and T. Znati, Wireless Sensor Networks. Kluwer Academic Publishers, 2004.
[61] I. Stojmenovic, Handbook of Sensor Networks: Algorithms and Architectures. John Wiley and Sons, Inc., 2005.
[62] T. Van Dam and K. Langendoen, "An Adaptive Energy-Efficient MAC Protocol for Wireless Sensor Networks," in Proc. of International Conference on Embedded Networked Sensor Systems (SenSys)'03. Los Angeles, California: ACM, November 2003.
[63] S. Singh and C. S. Raghavendra, "PAMAS - Power Aware Multi-Access protocol with Signaling for ad hoc networks," ACM Sigcomm Computer Communication Review, vol. 28, no. 3, pp. 5, July 1998.

[64] D. Culler, J. Hill, M. Horton, K. Pister, R. Szewczyk, and A. Woo, "MICA: The Commercialization of Microsensor Motes," in Sensor Magazine, April 2002. [Online]. Available: http://www.sensorsmag.com/articles/0402/40/
[65] W. Ye, J. Heidemann, and D. Estrin, "Medium Access Control with Coordinated Adaptive Sleeping for Wireless Sensor Networks," IEEE/ACM Trans. on Networking, vol. 12, no. 3, pp. 493, June 2004.
[66] A. Varga, "The OMNeT++ discrete event simulation system," in Proc. of European Simulation Multiconference (ESM)'01, Prague, Czech Republic, June 2001.
[67] EYES, "Energy Efficient Sensor Networks," 2010. [Online]. Available: http://www.eyes.eu.org/sensnet.htm
[68] D. Coffin, D. Hook, S. McGarry, and S. Kolek, "Declarative Ad-hoc Sensor Networking," in SPIE Integrated Command Environments, July 2000.
[69] C. Intanagonwiwat, R. Govindan, D. Estrin, J. Heidemann, and F. Silva, "Directed Diffusion for Wireless Sensor Networking," IEEE/ACM Trans. on Networking, vol. 11, no. 1, pp. 2, February 2003.
[70] R. Poor, "Gradient Routing in Ad Hoc Networks," 2010. [Online]. Available: http://www.media.mit.edu/pia/Research/ESP/texts/poorieeepaper.pdf
[71] F. Ye, G. Zhong, S. Lu, and L. Zhang, "GRAdient Broadcast: A Robust Data Delivery Protocol for Large Scale Sensor Networks," ACM Wireless Networks (WINET), vol. 11, no. 2, March 2005.
[72] R. Shah and J. Rabaey, "Energy Aware Routing for Low Energy Ad Hoc Sensor Networks," in Proc. of Wireless Communications and Networking Conference (WCNC). Orlando, Florida: IEEE, March 2002.
[73] C. Lu, B. Blum, T. Abdelzaher, J. Stankovic, and T. He, "RAP: A Real-Time Communication Architecture for Large-Scale Wireless Sensor Networks," in Real-Time and Embedded Technology and Applications Symposium (RTAS)'02, San Jose, California, September 2002.
[74] T. He, J. Stankovic, C. Lu, and T. Abdelzaher, "SPEED: A Stateless Protocol for Real-time Communication in Sensor Networks," in Proc. of International Conference on Distributed Computing Systems (ICDCS)'03. Providence, Rhode Island: IEEE, May 2003.
[75] W. Heinzelman, A. Chandrakasan, and H. Balakrishnan, "Energy-Efficient Communication Protocols for Wireless Microsensor Networks," in Hawaiian International Conference on System Sciences, January 2000.
[76] J. Kulik, W. Heinzelman, and H. Balakrishnan, "Negotiation-Based Protocols for Disseminating Information in Wireless Sensor Networks," ACM Wireless Networks (WINET), vol. 8, no. 2/3, pp. 169, May 2002.

[77] TinyOS, 2010. [Online]. Available: http://www.tinyos.net/
[78] E. Akhmetshina, P. Gburzynski, and F. Vizeacoumar, "PicOS: A Tiny Operating System for Extremely Small Embedded Platforms," in Proc. of Conference on Embedded Systems and Applications (ESA)'02, Las Vegas, Nevada, June 2002, pp. 116.
[79] A. Sinha and A. Chandrakasan, "Operating System and Algorithmic Techniques for Energy Scalable Wireless Sensor Networks," in Proc. of International Conference on Mobile Data Management, Hong Kong, January 2001, pp. 199.
[80] R. Barr et al., "On the Need for System-Level Support for Ad Hoc and Sensor Networks," ACM SIGOPS Operating Systems Review, vol. 36, no. 2, pp. 1, April 2002.
[81] H. Abrach et al., "MANTIS: System Support for Multimodal Networks of In-Situ Sensors," in Proc. of Workshop on Wireless Sensor Networks and Applications (WSNA)'03, San Diego, California, September 2003, pp. 50.
[82] R. Min, T. Furrer, and A. Chandrakasan, "Dynamic Voltage Scaling Techniques for Distributed Microsensor Networks," in Proc. of IEEE WVLSI, Orlando, Florida, April 2000.
[83] L. Yuan and G. Qu, "Design Space Exploration for Energy-Efficient Secure Sensor Network," in Proc. of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors (ASAP), San Jose, California, July 2002, pp. 88.
[84] S. Kogekar, S. Neema, B. Eames, X. Koutsoukos, A. Ledeczi, and M. Maroti, "Constraint-Guided Dynamic Reconfiguration in Sensor Networks," in Proc. of ACM IPSN, Berkeley, California, April 2004.
[85] K. Sha and W. Shi, "Modeling the Lifetime of Wireless Sensor Networks," Sensor Letters, vol. 3, pp. 126, 2005.
[86] D. Jung, T. Teixeira, A. Barton-Sweeney, and A. Savvides, "Model-based Design Exploration of Wireless Sensor Node Lifetimes," in Proc. of the ACM 4th European conference on Wireless sensor networks (EWSN'07), Delft, The Netherlands, January 2007.
[87] D. Jung, T. Teixeira, and A. Savvides, "Sensor Node Lifetime Analysis: Models and Tools," ACM Trans. on Sensor Networks (TOSN), vol. 5, no. 1, February 2009.
[88] A. Munir, A. Gordon-Ross, S. Lysecky, and R. Lysecky, "A One-Shot Dynamic Optimization Methodology for Wireless Sensor Networks," in Proc. of the IARIA IEEE International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies (UBICOMM), Florence, Italy, October 2010.

[89] H. Nguyen, A. Forster, D. Puccinelli, and S. Giordano, "Sensor Node Lifetime: An Experimental Study," in Proc. of IEEE International Conference on Pervasive Computing and Communications (PerCom'11), Seattle, Washington, March 2011.
[90] Crossbow, "MTS/MDA Sensor Board Users Manual," in Crossbow Technology, Inc., San Jose, California, October 2010. [Online]. Available: http://www.xbow.com/support/Support pdf files/MTS-MDA Series Users Manual.pdf
[91] Sensirion, "Datasheet SHT1x (SHT10, SHT11, SHT15) Humidity and Temperature Sensor," in SENSIRION - The Sensor Company, Staefa, Switzerland, April 2010. [Online]. Available: http://www.sensirion.com/en/pdf/product information/Datasheet-humidity-sensor-SHT1x.pdf
[92] Atmel, "ATMEL ATmega1281 Microcontroller with 256K Bytes In-System Programmable Flash," in ATMEL Corporation, San Jose, California, October 2010. [Online]. Available: http://www.atmel.com/dyn/resources/prod documents/2549S.pdf
[93] Atmel, "ATMEL AT86RF230 Low Power 2.4 GHz Transceiver for ZigBee, IEEE 802.15.4, 6LoWPAN, RF4CE and ISM Applications," in ATMEL Corporation, San Jose, California, October 2010. [Online]. Available: http://www.atmel.com/dyn/resources/prod documents/doc5131.pdf
[94] H. Friis, "A Note on a Simple Transmission Formula," Proc. IRE, vol. 34, p. 254, 1946.
[95] Crossbow, "Crossbow IRIS Datasheet," in Crossbow Technology, Inc., San Jose, California, October 2010. [Online]. Available: http://www.xbow.com/Products/Product pdf files/Wireless pdf/IRIS Datasheet.pdf
[96] K. Yifan and J. Peng, "Development of Data Video Base Station in Water Environment Monitoring Oriented Wireless Sensor Networks," in Proc. of IEEE ICESS, Washington, DC, July 2008.
[97] G. Werner-Allen, K. Lorincz, M. Welsh, O. Marcillo, J. Johnson, M. Ruiz, and J. Lees, "Deploying a Wireless Sensor Network on an Active Volcano," IEEE Internet Computing, vol. 10, no. 2, pp. 18, March 2006.
[98] A. Mainwaring, D. Culler, J. Polastre, R. Szewczyk, and Anderson, "Wireless Sensor Networks for Habitat Monitoring," in Proc. of ACM WSNA, Atlanta, Georgia, September 2002.
[99] NASA, "NASA Kennedy Space Center: NASA Orbiter Fleet," May 2011. [Online]. Available: http://www.nasa.gov/centers/kennedy/shuttleoperations/orbiters/orbitersdis.html

[100] A. Moustapha and R. Selmic, "Wireless Sensor Network Modeling Using Modified Recurrent Neural Networks: Application to Fault Detection," in Proc. of IEEE ICNSC, London, UK, April 2007.
[101] J. Bredin, E. Demaine, M. Hajiaghayi, and D. Rus, "Deploying Sensor Networks With Guaranteed Fault Tolerance," IEEE/ACM Trans. on Networking, vol. 18, no. 1, pp. 216, February 2010.
[102] A. Sharma, L. Golubchik, and R. Govindan, "On the Prevalence of Sensor Faults in Real-World Deployments," in Proc. of IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON), San Diego, California, June 2007.
[103] F. Koushanfar, M. Potkonjak, and A. Sangiovanni-Vincentelli, "Fault Tolerance Techniques for Wireless Ad Hoc Sensor Networks," in Proc. of IEEE Sensors, Orlando, Florida, June 2002.
[104] A. Hopkins, T. B. Smith, and J. Lala, "FTMP - A Highly Reliable Fault-Tolerant Multiprocessor for Aircraft," Proc. of the IEEE, vol. 66, no. 10, pp. 1221, October 1978.
[105] J. Wensley, L. Lamport, J. Goldberg, M. Green, N. Levitt, P. Melliar-Smith, R. Shostak, and C. Weinstock, "SIFT: Design and Analysis of a Fault-Tolerant Computer for Aircraft Control," Proc. of the IEEE, vol. 66, no. 10, pp. 1240, October 1978.
[106] A. Avizienis, "The N-Version Approach to Fault-Tolerant Software," IEEE Trans. on Software Engineering, vol. 11, no. 12, pp. 1491, December 1985.
[107] A. Somani and N. Vaidya, "Understanding Fault Tolerance and Reliability," IEEE Computer, vol. 30, no. 4, pp. 45, April 1997.
[108] J. Sklaroff, "Redundancy Management Technique for Space Shuttle Computers," IBM Journal of Research and Development, vol. 20, no. 1, pp. 20, January 1976.
[109] A. Avizienis and J. Laprie, "Dependable Computing: From Concepts to Design Diversity," Proc. of the IEEE, vol. 74, no. 5, pp. 629, May 1986.
[110] P. Jiang, "A New Method for Node Fault Detection in Wireless Sensor Networks," Sensors, vol. 9, no. 2, pp. 1282, May 2009.
[111] G. Jian-Liang, X. Yong-Jun, and L. Xiao-Wei, "Weighted-Median based Distributed Fault Detection for Wireless Sensor Networks," Journal of Software, vol. 18, no. 5, pp. 1208, May 2007.
[112] M. Lee and Y. Choi, "Fault Detection of Wireless Sensor Networks," Elsevier Computer Communications, vol. 31, no. 14, pp. 3469, September 2008.

[113] P. Khilar and S. Mahapatra, "Intermittent Fault Diagnosis in Wireless Sensor Networks," in Proc. of IEEE ICIT, Rourkela, India, December 2007.
[114] M. Ding, D. Chen, K. Xing, and X. Cheng, "Localized Fault-Tolerant Event Boundary Detection in Sensor Networks," in Proc. of IEEE INFOCOM, Miami, Florida, March 2005.
[115] B. Krishnamachari and S. Iyengar, "Distributed Bayesian Algorithms for Fault-Tolerant Event Region Detection in Wireless Sensor Networks," IEEE Trans. on Computers, vol. 53, no. 3, pp. 241, March 2004.
[116] J. Wu, D. Duh, T. Wang, and L. Chang, "On-Line Sensor Fault Detection Based on Majority Voting in Wireless Sensor Networks," in Proc. of 24th Workshop on Combinatorial Mathematics and Computation Theory (ALGO), Eilat, Israel, October 2007.
[117] T. Clouqueur, K. Saluja, and P. Ramanathan, "Fault Tolerance in Collaborative Sensor Networks for Target Detection," IEEE Trans. on Computers, vol. 53, no. 3, pp. 320, March 2004.
[118] M. Chiang, Z. Zilic, J. Chenard, and K. Radecka, "Architectures of Increased Availability Wireless Sensor Network Nodes," in Proc. of IEEE ITC, Washington, DC, October 2004.
[119] W. Zhang, G. Xue, and S. Misra, "Fault-Tolerant Relay Node Placement in Wireless Sensor Networks: Problems and Algorithms," in Proc. of IEEE INFOCOM, Anchorage, Alaska, May 2007.
[120] X. Han, X. Cao, E. Lloyd, and C.-C. Shen, "Fault-Tolerant Relay Node Placement in Heterogeneous Wireless Sensor Networks," IEEE Trans. on Mobile Computing, vol. 9, no. 5, pp. 643, May 2010.
[121] M. Baldi, F. Chiaraluce, and E. Zanaj, "Fault Tolerance in Sensor Networks: Performance Comparison of Some Gossip Algorithms," in IEEE WISES, Ancona, Italy, June 2009.
[122] A. Sen, B. Shen, L. Zhou, and B. Hao, "Fault-Tolerance in Sensor Networks: A New Evaluation Metric," in Proc. of IEEE INFOCOM, Barcelona, Catalunya, Spain, April 2006.
[123] H. Alwan and A. Agarwal, "A Survey on Fault Tolerant Routing Techniques in Wireless Sensor Networks," in Proc. of IEEE SENSORCOMM, Athens, Greece, June 2009.
[124] L. Souza, "FT-CoWiseNets: A Fault Tolerance Framework for Wireless Sensor Networks," in Proc. of IEEE SENSORCOMM, Valencia, Spain, October 2007.

[125] W. Cai, X. Jin, Y. Zhang, K. Chen, and J. Tang, "Research on Reliability Model of Large-Scale Wireless Sensor Networks," in Proc. of IEEE WiCOM, Wuhan, China, September 2006.
[126] J. Zhu and S. Papavassiliou, "On the Connectivity Modeling and the Tradeoffs between Reliability and Energy Efficiency in Large Scale Wireless Sensor Networks," in Proc. of IEEE WCNC, New Orleans, Louisiana, March 2003.
[127] C. Vasar, O. Prostean, I. Filip, R. Robu, and D. Popescu, "Markov Models for Wireless Sensor Network Reliability," in Proc. of IEEE ICCP, Cluj-Napoca, Romania, August 2009.
[128] L. Xing and H. Michel, "Integrated Modeling for Wireless Sensor Networks Reliability and Security," in Proc. of IEEE/ACM RAMS, Newport Beach, California, January 2006.
[129] R. Kannan and S. Iyengar, "Game-Theoretic Models for Reliable Path-Length and Energy-Constrained Routing With Data Aggregation in Wireless Sensor Networks," IEEE Journal on Selected Areas in Communications (JSAC), vol. 22, no. 6, pp. 1141, August 2004.
[130] S. Mukhopadhyay, C. Schurgers, D. Panigrahi, and S. Dey, "Model-Based Techniques for Data Reliability in Wireless Sensor Networks," IEEE Trans. on Mobile Computing, vol. 8, no. 4, pp. 528, April 2009.
[131] N. Peterson, L. Anusuya-Rangappa, B. Shirazi, W. Song, R. Huang, D. Tran, S. Chien, and R. LaHusen, "Volcano Monitoring: A Case Study in Pervasive Computing," Springer Computer Communications and Networks, 2009.
[132] M. DeGroot, Probability and Statistics. Addison Wesley, 1975.
[133] N. Johnson, S. Kotz, and N. Balakrishnan, Continuous Univariate Distributions. John Wiley and Sons, Inc., 1994.
[134] R. Sahner, K. Trivedi, and A. Puliafito, Performance and Reliability Analysis of Computer Systems: An Example-Based Approach Using the SHARPE Software Package. Kluwer Academic Publishers, 1996.
[135] SHARPE, "The SHARPE Tool & the Interface (GUI)," May 2011. [Online]. Available: http://people.ee.duke.edu/chirel/IRISA/sharpeGui.html
[136] NIST, "Engineering Statistics Handbook: Exponential Distribution," September 2011. [Online]. Available: http://www.itl.nist.gov/div898/handbook/apr/section1/apr161.htm
[137] A. Munir and A. Gordon-Ross, "An MDP-based Application Oriented Optimal Policy for Wireless Sensor Networks," in Proc. of International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'09). Grenoble, France: ACM, October 2009, pp. 183.

[138] M. Horton, "Commercial Wireless Sensor Networks: Status, Issues and Challenges," in Proc. of IEEE SECON: Keynote Presentation, Santa Clara, California, October 2004.
[139] K. Greene, "Sensor Networks for Dummies," Technology Review (published by MIT), March 2006.
[140] C.-Y. Seong and B. Widrow, "Neural Dynamic Optimization for Control Systems," IEEE Trans. on Systems, Man, and Cybernetics, vol. 31, no. 4, pp. 482, August 2001.
[141] E. Stevens-Navarro, Y. Lin, and V. Wong, "An MDP-based Vertical Handoff Decision Algorithm for Heterogeneous Wireless Networks," IEEE Trans. on Vehicular Technology, vol. 57, no. 2, pp. 1243, March 2008.
[142] S. Sridharan and S. Lysecky, "A First Step Towards Dynamic Profiling of Sensor-Based Systems," in Proc. of IEEE SECON, San Francisco, California, June 2008.
[143] S. Tilak, N. Abu-Ghazaleh, and W. Heinzelman, "Infrastructure Tradeoffs for Sensor Networks," in Proc. of the 1st ACM International Workshop on Wireless Sensor Networks and Applications, Atlanta, Georgia, September 2002.
[144] I. Kadayif and M. Kandemir, "Tuning In-Sensor Data Filtering to Reduce Energy Consumption in Wireless Sensor Networks," in Proc. of ACM DATE, Paris, France, February 2004.
[145] P. Marron et al., "Adaptation and Cross-Layer Issues in Sensor Networks," in Proc. of IEEE ISSNIP, Melbourne, Australia, December 2004.
[146] A. Vecchio, "Adaptability in Wireless Sensor Networks," in Proc. of IEEE ICECS, Malta, September 2008.
[147] ITA, "International Technology Alliance in Network and Information Science," December 2010. [Online]. Available: http://www.usukita.org/
[148] P. Pillai and K. Shin, "Real-Time Dynamic Voltage Scaling for Low-Power Embedded Operating Systems," in Proc. of ACM SOSP, Banff, Alberta, Canada, October 2001.
[149] B. Childers, H. Tang, and R. Melhem, "Adapting Processor Supply Voltage to Instruction-Level Parallelism," in Proc. of Koolchips Workshop, in conjunction with MICRO-33, Monterey, California, December 2001.
[150] D. Liu and C. Svensson, "Trading Speed for Low Power by Choice of Supply and Threshold Voltages," IEEE Journal of Solid-State Circuits, vol. 28, no. 1, pp. 10, January 1993.

PAGE 390

[151] T.Burd,T.Pering,A.Stratakos,andR.Brodersen,ADynamicVoltageScaledMicroprocessorSystem,IEEEJournalofSolid-StateCircuits,vol.35,no.11,pp.1571,November2000. [152] S.LyseckyandF.Vahid,AutomatedApplication-SpecicTuningofParameterizedSensor-BasedEmbeddedSystemBuildingBlocks,inProc.oftheInternationalConferenceonUbiquitousComputing(UbiComp),OrangeCounty,California,September2006,pp.507. [153] R.Verma,AutomatedApplicationSpecicSensorNetworkNodeTuningforNon-ExpertApplicationDevelopers,M.S.Thesis,DepartmentofElectricalandComputerEngineering,UniversityofArizona,2008. [154] Withheldforblindreview. [155] M.Puterman,MarkovDecisionProcesses:DiscreteStochasticDynamicProgram-ming.JohnWileyandSons,Inc.,2005. [156] T.Lane,MDPPolicyIterationLecture,April2011.[Online].Available: http://www.cs.unm.edu/terran/ [157] P.Dutta,M.Grimmer,A.Arora,S.Bibyk,andD.Culler,DesignofaWirelessSensorNetworkPlatformforDetectingRare,Random,andEphemeralEvents,inProc.ofACMIPSN,LosAngeles,California,April2005. [158] J.ZhaoandR.Govindan,UnderstandingPacketDeliveryPerformanceinDenseWirelessSensorNetworks,inProc.ofACMSenSys,LosAngeles,California,November2003. [159] F.YuandV.Krishnamurthy,OptimalJointSessionAdmissionControlinIntegratedWLANandCDMACellularNetworkswithVerticalHandoff,IEEETrans.onMobileComputing,vol.6,no.1,pp.126,January2007. [160] I.Chades,M.Cros,F.Garcia,andR.Sabbadin,MarkovDecisionProcess(MDP)Toolboxv2.0forMATLAB,inINRAToulouse,INRA,France,February2005.[Online].Available: http://www.inra.fr/internet/Departements/MIA/T/MDPtoolbox/ [161] P.DuttaandD.Culler,SystemSoftwareTechniquesforLow-PowerOperationinWirelessSensorNetworks,inProc.ofIEEE/ACMICCAD,SanJose,California,November2005. [162] Honeywell,Honeywell1-and2-AxisMageneticSensorsHMC1001/1002,andHMC1021/1022Datasheet,inHoneywellInternationalInc.,Morristown,NewJersey,December2010.[Online].Available: http://www.ssec.honeywell.com/magnetic/datasheets/hmc1001-2 1021-2.pdf 390

PAGE 391

[163] I.Akyildiz,W.Su,Y.Sankarasubramaniam,andE.Cayirci,WirelessSensorNetworks:ASurvey,ElsevierComputerNetworks,vol.38,no.4,pp.393,March2002. [164] DPOP,DynamicProlingandOptimization(DPOP)forSensorNetworks,December2010.[Online].Available: http://www.ece.arizona.edu/dpop/ [165] M.F.Huber,A.Kuwertz,F.Sawo,andU.D.Hanebeck,DistributedGreedySensorSchedulingforModel-basedReconstructionofSpace-TimeContinuousPhysicalPhenomenon,inProc.oftheIEEEInternationalConferenceonInformationFusion(FUSION),Seattle,Washington,July2009,pp.102. [166] L.XuandE.Oja,ImprovedSimulatedAnnealing,BoltzmannMachine,andAttributedGraphMatching,inProc.oftheEURASIPWorkshoponNeuralNetworks.Springer-Verlag,February1990,pp.151. [167] S.Kirkpatrick,C.Gelatt,andM.Vecchi,OptimizationbySimulatedAnnealing,Science,vol.220,no.4598,pp.671,May1983. [168] W.Ben-Ameur,ComputingtheInitialTemperatureofSimulatedAnnealing,ComputationalOptimizationandApplications,vol.29,no.3,pp.369,December2004. [169] Xeon,IntelXeonProcessorE5430,December2010.[Online].Available: http://processornder.intel.com/details.aspx?sSpec=SLANU [170] Linux,LinuxManPages,December2010.[Online].Available: http://linux.die.net/man/ [171] C++ReferenceLibrary,incplusplus.com,2010.[Online].Available: http://cplusplus.com/reference/clibrary/ctime/clock/ [172] A.Gordon-Ross,F.Vahid,andN.Dutt,FastCongurable-CacheTuningWithaUniedSecond-LevelCache,IEEETrans.onVeryLargeScaleIntegration(VLSI)Systems,vol.17,no.1,pp.80,January2009. [173] A.Shenoy,J.Hiner,S.Lysecky,R.Lysecky,andA.Gordon-Ross,EvaluationofDynamicProlingMethodologiesforOptimizationofSensorNetworks,IEEEEmbeddedSystemsLetters,vol.2,no.1,pp.10,March2010. [174] A.MunirandA.Gordon-Ross,AnMDP-basedDynamicOptimizationMethodologyforWirelessSensorNetworks,acceptedforpublicationinIEEETrans.onParallelandDistributedSystems(TPDS),2011. 
[175] X.Wang,J.Ma,andS.Wang,CollaborativeDeploymentOptimizationandDynamicPowerManagementinWirelessSensorNetworks,inProc.oftheIEEEInternationalConferenceonGridandCooperativeComputing(GCC),Changsha,Hunan,China,October2006,pp.121. 391

PAGE 392

[176] X.NingandC.Cassandras,OptimalDynamicSleepTimeControlinWirelessSensorNetworks,inProc.oftheIEEEConferenceonDecisionandControl(CDC),Cancun,Mexico,December2008,pp.2332. [177] Top500,Top500SupercomputerSites,June2011.[Online].Available: http://www.top500.org/ [178] Green500,RankingtheWorld'sMostEnergy-EfcientSupercomputers,June2011.[Online].Available: http://www.green500.org/ [179] I.AhmadandS.Ranka,HandbookofEnergy-AwareAndGreenComputing.TaylorandFrancisGroup,CRCPress,2011. [180] R.Kumar,D.Tullsen,P.Ranganathan,N.Jouppi,andK.Farkas,Single-ISAHeterogeneousMulti-CoreArchitecturesforMultithreadedWorkloadPerformance,inProc.ofIEEEISCA,Munich,Germany,June2004. [181] R.Kumar,D.Tullsen,andN.Jouppi,CoreArchitectureOptimizationforHeterogeneousChipMultiprocessors,inProc.ofACMInternationalConfer-enceonParallelArchitecturesandCompilationTechniques(PACT),Seattle,Washington,September2006. [182] R.Kumar,N.Jouppi,andD.Tullsen,Conjoined-coreChipMultiprocessing,inProc.ofIEEE/ACMMICRO-37,Portland,Oregon,December2004. [183] S.Keckler,K.Olukotun,andH.Hofstee,MulticoreProcessorsandSystems.Springer,2009. [184] K.PuttaswamyandG.Loh,ThermalHerding:MicroarchitectureTechniquesforControllingHotspotsinHigh-Performance3D-IntegratedProcessors,inProc.ofIEEEHPCA,Phoenix,Arizona,February2007. [185] P.Pande,A.Ganguly,B.Belzer,A.Nojeh,andA.Ivanov,NovelInterconnectInfrastructuresforMassiveMulticoreChips-AnOverview,inProc.ofIEEEISCAS,Seattle,Washington,May2008. [186] S.Narayanan,J.Sartori,R.Kumar,andD.Jones,ScalableStochasticProcessors,inProc.ofIEEE/ACMDATE,Dresden,Germany,March2010. [187] M.Hill,TransactionalMemory,inSynthesisLecturesonComputerArchitecture,June2010.[Online].Available: http://www.morganclaypool.com/toc/cac/1/1 [188] N.Guan,M.Stigge,W.Yi,andG.Yu,Cache-AwareSchedulingandAnalysisforMulticores,inProc.ofACMEMSOFT,Grenoble,France,October2009. [189] S.Fide,ArchitecturalOptimizationsinMulti-CoreProcessors.VDMVerlag,2008. 
[190] J.ChangandG.Sohi,CooperativeCachingforChipMultiprocessors,inProc.ofACMISCA,Boston,Massachusetts,May2006. 392

PAGE 393

[191] K.Flautner,N.Kim,S.Martin,D.Blaauw,andT.Mudge,DrowsyCaches:SimpleTechniquesforReducingLeakagePower,inProc.ofIEEE/ACMISCA,Anchorage,Alaska,May2002. [192] S.-B.Lee,S.-W.Tam,I.Pefkianakis,S.L.Lu,M.Chang,C.Guo,G.Reinman,C.Peng,M.Naik,L.Zhang,andJ.Cong,AScalableMicroWirelessInterconnectStructureforCMPs,inProc.ofACMMobiCom,Beijing,China,September2009. [193] A.Shacham,K.Bergman,andL.Carloni,PhotonicNetworks-on-ChipforFutureGenerationsofChipMultiprocessors,IEEETrans.onComputers,vol.57,no.9,pp.1246,September2008. [194] P.Pande,A.Ganguly,K.Chang,andC.Teuscher,HybridWirelessNetworkonChip:ANewParadigminMulti-CoreDesign,inProc.ofIEEENoCArc,NewYork,NewYork,December2009. [195] V.Kontorinis,A.Shayan,D.Tullsen,andR.Kumar,ReducingPeakPowerwithaTable-DrivenAdaptiveProcessorCore,inProc.ofIEEE/ACMMICRO-42,NewYork,NewYork,December2009. [196] J.DonaldandM.Martonosi,TechniquesforMulticoreThermalManagement:ClassicationandNewExploration,inProc.ofIEEEISCA,Boston,Massachusetts,June2006. [197] R.JayaseelanandT.Mitra,AHybridLocal-GlobalApproachforMulti-CoreThermalManagement,inProc.ofIEEE/ACMICCAD,SanJose,California,November2009. [198] J.Park,D.Shin,N.Chang,andM.Pedram,AccurateModelingandCalculationofDelayandEnergyOverheadsofDynamicVoltageScalinginModernHigh-PerformanceMicroprocessors,inProc.ofACM/IEEEISLPED,Austin,Texas,August2010. [199] ACPI,AdvancedCongurationandPowerInterface,June2011.[Online].Available: http://www.acpi.info/ [200] J.LeeandN.Kim,OptimizingThroughputofPower-andThermal-ConstrainedMulticoreProcessorsUsingDVFSandPer-CorePower-Gating,inProc.ofIEEE/ACMDAC,SanFrancisco,California,July2009. [201] Freescale,GreenEmbeddedComputingandtheMPC8536EPowerQUICCIIIProcessor,2009.[Online].Available: http://www.freescale.com/les/32bit/doc/white paper/MPC8536EWP.pdf [202] R.Ge,X.Feng,S.Song,H.-C.Chang,D.Li,andK.Cameron,PowerPack:EnergyProlingandAnalysisofHigh-PerformanceSystemsandApplications,IEEETrans.onParallelandDistributedSystems,vol.21,no.5,pp.658,May2010. 393

PAGE 394

[203] H.Hoffmann,S.Sidiroglou,M.Carbin,S.Misailovic,A.Agarwal,andM.Rinard,Power-AwareComputingwithDynamicKnobs,MITTechnicalReport:ComputerScienceandArticialIntelligenceLaboratory(MIT-CSAIL-TR-2010-027),May2010. [204] W.BaekandT.Chilimbi,Green:AFrameworkforSupportingEnergy-ConsciousProgrammingusingControlledApproximation,inProc.ofACMSIGPLANPLDI,Toronto,Ontario,Canada,June2010. [205] X.Zhou,J.Yang,M.Chrobak,andY.Zhang,Performance-awareThermalManagementviaTaskScheduling,ACMTrans.onArchitectureandCodeOptimization(TACO),vol.7,no.1,pp.5:1:31,April2010. [206] A.Jacobs,A.George,andG.Cieslewski,RecongurableFaultTolerance:AFrameworkforEnvironmentallyAdaptiveFaultMitigationinSpace,inProc.ofIEEEFPL,Prague,CzechRepublic,August-September2009. [207] CHREC,NSFCenterforHigh-PerformanceRecongurableComputing,June2011.[Online].Available: http://www.chrec.org/ [208] J.SloanandR.Kumar,TowardsScalableReliabilityFrameworksforErrorProneCMPs,inProc.ofACMCASES,Grenoble,France,October2009. [209] D.PoulsenandP.-C.Yew,DataPrefetchingandDataForwardinginSharedMemoryMultiprocessors,inProc.ofIEEEICPP,NorthCarolinaStateUniversity,NorthCarolina,August1994. [210] L.Yan,W.Hu,T.Chen,andZ.Huang,HardwareAssistantSchedulingforSynergisticCoreTasksonEmbeddedHeterogeneousMulti-coreSystem,JournalofInformation&ComputationalScience,vol.5,no.6,pp.2369,2008. [211] P.Chaparro,J.Gonzalez,G.Magklis,Q.Cai,andA.Gonzalez,UnderstandingtheThermalImplicationsofMulticoreArchitectures,IEEETrans.onParallelandDistributedSystems,vol.18,no.8,pp.1055,August2007. [212] G.SuoandX.-j.Yang,BalancingParallelApplicationsonMulti-coreProcessorsBasedonCachePartitioning,inProc.ofIEEEISPA,ChenduandJiuZhaiValley,China,August2009. [213] H.Jeon,W.Lee,andS.Chung,LoadUnbalancingStrategyforMulti-CoreEmbeddedProcessors,IEEETrans.onComputers,vol.59,no.10,pp.1434,October2010. 
[214] ARM,ARM11MPCoreProcessorTechnicalReferenceManual,June2011.[Online].Available: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0360e/DDI0360E arm11 mpcore r1p0 trm.pdf [215] K.HirataandJ.Goodacre,ARMMPCore:TheStreamlinedandScalableARM11ProcessorCore,inProc.ofIEEEASP-DAC,Yokohama,Japan,January2007. 394

PAGE 395

[216] ARM,WhitePaper:TheARMCortex-A9Processors,June2011.[Online].Available: http://www.arm.com/les/pdf/ARMCortexA-9Processors.pdf [217] ,Cortex-A9MPCoreTechnicalReferenceManual,June2011.[Online].Available: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0407f/DDI0407F cortex a9 r2p2 mpcore trm.pdf [218] Freescale,MPC8572EPowerQUICCIIIProcessor,June2011.[Online].Available: http://www.freescale.com/les/netcomm/doc/fact sheet/MPC8572FS.pdf [219] ,MPC8572EPowerQUICCIIIIntegratedProcessorHardwareSpecications,June2011.[Online].Available: http://cache.freescale.com/les/32bit/doc/data sheet/MPC8572EEC.pdf [220] TILERA,ManycorewithoutBoundaries:TILEPro64Processor,July2011.[Online].Available: http://www.tilera.com/products/processors/TILEPRO64 [221] ,ManycorewithoutBoundaries:TILE-GxProcessorFamily,June2011.[Online].Available: http://www.tilera.com/products/processors/TILE-Gx Family [222] AMD,AMDCool`n'QuietTechnology,June2011.[Online].Available: http://www.amd.com/us/products/technologies/cool-n-quiet/Pages/cool-n-quiet.aspx [223] Intel,High-PerformanceEnergy-EfcientProcessorsforEmbeddedMarketSegments,June2011.[Online].Available: http://www.intel.com/design/embedded/downloads/315336.pdf [224] ,IntelCore2DuoProcessorMaximizingDual-CorePerformanceEfciency,June2011.[Online].Available: ftp://download.intel.com/products/processor/core2duo/mobile prod brief.pdf [225] ,Dual-CoreIntelXeonProcessorsLVandULVforEmbeddedComputing,June2011.[Online].Available: ftp://download.intel.com/design/intarch/prodbref/31578602.pdf [226] ,IntelXeonProcessorLV5148,June2011.[Online].Available: http://ark.intel.com/Product.aspx?id=27223 [227] ,IntelMicroarchitectureCodenameSandyBridge,June2011.[Online].Available: http://www.intel.com/technology/architecture-silicon/2ndgen/index.htm [228] ,Intel's`SandyBridge'Coreprocessors,June2011.[Online].Available: http://techreport.com/articles.x/20188 [229] ,IntelTurboBoostTechnology2.0,June2011.[Online].Available: http://www.intel.com/technology/turboboost/ 
[230] NVIDIA,NVIDIATeslaC1060ComputingProcessor,June2011.[Online].Available: http://www.nvidia.com/object/product tesla c1060 us.html 395

PAGE 396

[231] ,NVIDIATeslaPersonalSupercomputer,June2011.[Online].Available: http://www.nvidia.com/docs/IO/43395/NV DS Tesla PSC US Mar09 LowRes.pdf [232] ,NVIDIAPowerMizerTechnology,June2011.[Online].Available: http://www.nvidia.com/object/feature powermizer.html [233] ,NVIDIATeslaC2050/C2070GPUComputingProcessor,June2011.[Online].Available: http://www.nvidia.com/object/product tesla C2050 C2070 us.html [234] M.Hamdi,N.Boudriga,andM.Obaidat,Bandwidth-EffecitveDesignofaSatellite-BasedHybridWirelessSensorNetworkforMobileTargetDetectionandTracking,IEEESystemsJournal,vol.2,no.1,pp.74,March2008. [235] I.F.Akyildiz,T.Melodia,andK.R.Chowdhury,WirelessMultimediaSensorNetworks:ApplicationsandTestbeds,ProceedingsoftheIEEE,vol.96,no.10,pp.1588,October2008. [236] Y.LiuandS.K.Das,Information-IntensiveWirelessSensorNetworks:PotentialandChallenges,IEEECommunicationsMagazine,vol.44,no.11,pp.142,November2006. [237] Rockwell,RockwellAutomation,October2011.[Online].Available: www.rockwellautomation.com [238] T.T.-O.KwokandY.-K.Kwok,ComputationandEnergyEfcientImageProcessinginWirelessSensorNetworksBasedonRecongurableComputing,inProc.oftheInternationalConferenceonParallelProcessingWorkshops(ICPPW),Columbus,Ohio,August2006. [239] R.Kleihorst,B.Schueler,A.Danilin,andM.Heijligers,SmartCameraMotewithHighPerformanceVisionSystem,inProc.oftheWorkshoponDistributedSmartCameras(DSC),Boulder,Colorado,October2006. [240] A.Y.Dogan,D.Atienza,A.Burg,I.Loi,andL.Benini,Power/PerformanceExplorationofSingle-CoreandMulti-coreProcessorApproachesforBiomedicalSignalProcessing,inProc.oftheWorkshoponPowerandTimingModeling,OptimizationandSimulation(PATMOS),Madrid,Spain,September2011. [241] M.Coates,DistributedParticleFiltersforSensorNetworks,inProc.oftheIEEE/ACM3rdInternationalSymposiumonInformationProcessinginSensorNetworks(IPSN),Berkeley,California,April2004. [242] R.RajagopalanandP.Varshney,Data-AggregationTechniquesinSensorNetworks:ASurvey,IEEECommunicationsSurveys&Tutorials,vol.8,no.4,pp.48,2006. 396

PAGE 397

[243] R.Kulkarni,A.Forster,andG.Venayagamoorthy,ComputationalIntelligenceinWirelessSensorNetworks:ASurvey,IEEECommunicationsSurveys&Tutorials,vol.13,no.1,pp.68,2011. [244] A.Kak,ParallelHistogram-basedParticleFilterforObjectTrackingonSIMD-basedSmartCameras,inPurdueRobotVisionLab,PurdueUniversity,WestLafayette,Indiana,October2011.[Online].Available: https://engineering.purdue.edu/RVL/Research/SIMD PF/index.html [245] MEMSIC,Imote2HardwareBundleforWirelessSensorNetworks,October2011.[Online].Available: www.memsic.com [246] E.F.Nakamura,A.A.Loureiro,andA.C.Frery,InformationFusionforWirelessSensorNetworks:Methods,Models,andClassications,ACMComputingSurveys,vol.39,no.3,August2007. [247] M.BedworithandJ.O'Brien,TheOmnibusModel:ANewModelforDataFusion?IEEEAerospaceandElectronicSystemsMagazine,vol.15,no.4,pp.30,April2000. [248] D.Kim,K.Park,andW.Ro,NetworkCodingonHeterogeneousMulti-CoreProcessorsforWirelessSensorNetworks,Sensors,vol.11,no.8,pp.7908,2011. [249] G.R.Murthy,Control,CommunicationandComputingUnits:ConvergedArchitectures,InternationalJournalofComputerApplications,vol.1,no.4,pp.49,2010. [250] W.Li,T.Arslan,J.Han,A.T.Erdogan,A.El-Rayis,N.Haridas,andE.Yang,EnergyEfciencyEnhancementinSatelliteBasedWSNthroughCollaboratoinandSelf-OrganizedMobility,inProc.oftheIEEEAerospaceConference,BigSky,Montana,March2009. [251] T.Vladimirova,C.Bridges,G.Prassinos,X.Wu,K.Sidibeh,D.Barnhart,A.-H.Jallad,J.Paul,V.Lappas,A.Baker,K.Maynard,andR.Magness,CharacterizingWirelessSensorMotesforSpaceApplications,inProc.oftheNASA/ESAConferenceonAdaptiveHardwareandSystems(AHS),Edinburgh,UnitedKingdom,August2007. [252] V.Lappas,G.Prassinos,A.Baker,andR.Magness,WirelessSensorMotesforSmallSatelliteApplications,IEEEAntennasandPropagationMagazine,vol.48,no.5,pp.175,October2006. 
[253] K.Champaigne,WirelessSensorSystemsforNear-termSpaceShuttleMissions,inProc.oftheConference&ExpositiononStructuralDynamics(IMAC),2005.[Online].Available: http://sem.org/Proceedings/ConferencePapers-Paper.cfm?ConfPapersPaperID=23262 397

PAGE 398

[254] W.Ye,F.Silva,A.DeSchon,andS.Bhatt,ArchitectureofaSatellite-BasedSensorNetworkforEnvironmentalObservation,inProc.oftheEarthScienceTechnologyConference(ESTC),Adelphi,Maryland,June2008. [255] C.Park,Q.Xie,andP.Chou,InstraNode:Dual-MicrocontrollerBasedSensorNodeforReal-TimeStructuralHealthMonitoring,inProc.oftheIEEECommunicationsSocietyConferenceonSensorandAdHocCommunicationsandNetworks(SECON),SantaClara,California,September2005. [256] J.Etchison,G.Skelton,Q.Pang,andT.Hulitt,MobileIntelligentSensorNetworkUsedforDataProcessing,JacksonSateUniversity,Jackson,Mississippi,2010.[Online].Available: http://www.iiis.org/CDs2010/CD2010SCI/SCI 2010/PapersPdf/SA874PZ.pdf [257] T.Vladimirova,C.Bridges,J.Paul,S.Malik,andM.Sweeting,Space-basedWirelessSensorNetworks:DesignIssues,inProc.oftheIEEEAerospaceConference,BigSky,Montana,March2009. [258] Aeroex,Leon3Processor,October2011.[Online].Available: http://www.gaisler.com/cms/index.php?option=com content&task=view&id=13&Itemid=53 [259] S.Ohara,M.Suzuki,S.Saruwatari,andH.Morikawa,APrototypeofaMulti-CoreWirelessSensorNodeforReducingPowerConsumption,inProc.oftheInternationalSymposiumonApplicationsandtheInternet(SAINT),Turku,Finland,July2008. [260] J.Balfour,EfcientEmbeddedComputing,Ph.D.dissertation,DepartmentofElectricalEngineering,StanfordUniversity,May2010. [261] A.Fedorova,S.Blagodurov,andS.Zhuravlev,ManagingContentionforSharedResourcesonMulticoreProcessors,CommunicationsoftheACM,vol.53,no.2,pp.49,February2010. [262] D.Culler,J.Singh,andA.Gupta,ParallelComputerArchitecture:AHard-ware/SoftwareApproach.MorganKaufmannPublishers,Inc.,1999. [263] J.SavageandM.Zubair,AUniedModelforMulticoreArchitectures,inProc.ofACMInternationalForumonNext-generationMulticore/ManycoreTechnologies(IFMT),Cairo,Egypt,November2008. [264] R.Jain,TheArtofComputerSystemsPerformanceAnalysis:TechniquesforExperimentalDesign,Measurement,Simulation,andModeling.Wiley,1991. 
[265] N.SamariandG.Schneider,AQueueingTheory-BasedAnalyticModelofaDistributedComputerNetwork,IEEETrans.onComputers,vol.C-29,no.11,pp.994,November1980. [266] L.Kleinrock,QueueingSystems,VolumeII:ComputerApplications.Wiley-Interscience,1976. 398

PAGE 399

[267] V.MainkarandK.Trivedi,PerformanceModelingUsingSHARPE,inProc.oftheEighthSymposiumonReliabilityinElectronics(RELECTRONIC),Budapest,Hungary,August1991. [268] R.Kumar,D.Tullsen,N.Jouppi,andP.Ranganathan,HeterogeneousChipMultiprocessors,IEEEComputer,vol.38,no.11,pp.32,November2005. [269] M.Sabry,M.Ruggiero,andP.Valle,PerformanceandEnergyTrade-offsAnalysisofL2On-chipCacheArchitecturesforEmbeddedMPSoCs,inProc.ofIEEE/ACMGreatLakesSymposiumonVLSI(GLSVLSI),Providence,RhodeIsland,USA,May2010. [270] D.Bentez,J.Moure,D.Rexachs,andE.Luque,AdaptiveL2CacheforChipMultiprocessors,inProc.ofACMInternationalEuropeanConferenceonParallelandDistributedComputing(Euro-Par),Rennes,France,August2007. [271] J.Ruggiero,MeasuringCacheandMemoryLatencyandCPUtoMemoryBandwidth,IntelWhitePaper,pp.1,December2008. [272] J.Medhi,StochasticModelsinQueueingTheory.AcademicPress,AnimprintofElsevierScience,2003. [273] O.Kwon,H.Bahn,andK.Koh,FARS:APageReplacementAlgorithmforNANDFlashMemoryBasedEmbeddedSystems,inProc.ofIEEECIT,Sydney,Australia,July2008. [274] L.Shiandetal.,WriteActivityReductiononFlashMainMemoryviaSmartVictimCache,inProc.ofACMGLSVLSI,Providence,RhodeIsland,USA,May2010. [275] M.ReiserandS.Lavenberg,MeanValueAnalysisofClosedMulti-chainQueueingNetworks,JournalofACM,vol.27,no.2,pp.313,April1980. [276] K.SevcikandI.Mitrani,TheDistributionofQueueingNetworkStatesatInputandOutputInstants,JournalofACM,vol.28,no.2,pp.358,April1981. [277] S.Woo,M.Ohara,E.Torrie,J.Singh,andA.Gupta,TheSPLASH-2Programs:CharacterizationandMethodologicalConsiderations,inProc.ofACMISCA,SantaMargheritaLigure,Italy,June1995. [278] C.Bienia,S.Kumar,andK.Li,PARSECvs.SPLASH-2:AQuantitativeComparisonofTwoMultithreadedBenchmarkSuitesonChip-Multiprocessors,inProc.oftheIEEEInternationalSymposiumonWorkloadCharacterization(IISWC),Seattle,Washington,September2008. 
[279] SESC,SESC:SuperESCalarSimulator,September2011.[Online].Available: http://iacoma.cs.uiuc.edu/paulsack/sescdoc/ [280] ARM7TDMI,ATMELEmbeddedRISCMicrocontrollerCore:ARM7TDMI,November2010.[Online].Available: http://www.atmel.com/ 399

PAGE 400

[281] ,ARM7TDMIDataSheet,November2010.[Online].Available: http://www.atmel.com/ [282] TILERA,TileProcessorArchitectureOverview,inTILERAOfcialDocumentation,Copyright2006-2009TileraCorporation,November2009. [283] L.Yang,R.Dick,H.Lekatsas,andS.Chakradhar,OnlineMemoryCompressionforEmbeddedSystems,ACMTrans.onEmbeddedComputingSystems(TECS),vol.9,no.3,pp.27:1:30,March2010. [284] Freescale,CacheLatenciesofthePowerPCMPC7451,January2011.[Online].Available: http://cache.freescale.com/les/32bit/doc/app note/AN2180.pdf [285] R.Min,W.-B.Jone,andY.Hu,LocationCache:ALow-PowerL2CacheSystem,inProc.ofACMInternationalSymposiumonLowPowerElectronicsandDesign(ISLPED),NewportBeach,California,August2004. [286] Y.Chen,E.Li,J.Li,andY.Zhang,AcceleratingVideoFeatureExtractionsinCBVIRonMulti-coreSystems,IntelTechnologyJournal,vol.11,no.4,pp.349,November2007. [287] P.Jain,Software-assistedcachemechanismsforembeddedsystems,Ph.D.dissertation,DepartmentofElectricalEngineeringandComputerScience,MassachusettsInstituteofTechnology,February2008. [288] CACTI,AnIntegratedCacheandMemoryAccessTime,CycleTime,Area,Leakage,andDynamicPowerModel,November2010.[Online].Available: http://www.hpl.hp.com/research/cacti/ [289] ITRS,InternationalTechnologyRoadmapforSemiconductors,January2011.[Online].Available: http://www.itrs.net/ [290] ARM,ARM7ThumbFamily,January2011.[Online].Available: http://saluc.engr.uconn.edu/refs/processors/arm/arm7 family.pdf [291] X.SunandJ.Zhu,PerformanceConsiderationsofSharedVirtualMemoryMachines,IEEETrans.onParallelandDistributedSystems(TPDS),vol.6,no.11,pp.1185,November1995. [292] R.BrownandI.Sharapov,PerformanceandProgrammabilityComparisonBetweenOpenMPandMPIImplementationsofaMolecularModelingApplication,Springer-VerlagLectureNotesinComputerScience,vol.4315,pp.349,November2008. 
[293] C.Lively,X.Wu,V.Taylor,S.Moore,H.-C.Chang,andK.Cameron,EnergyandPerformanceCharacteristicsofDifferentParallelImplementationsofScienticApplicationsonMulticoreSystems,InternationalJournalofHighPerformanceComputingApplications,2011. 400

PAGE 401

[294] G.Bikshandi,J.Guo,D.Hoeinger,G.Almasiy,B.Fraguelaz,M.Garzaran,D.Padua,andC.vonPrauny,ProgrammingforParallelismandLocalitywithHierarchicallyTiledArrays,inProc.oftheACMSIGPLANSymposiumonPrinciplesandPracticeofParallelProgramming(PPoPP),Manhattan,NewYorkCity,NewYork,March2006. [295] W.Zhu,J.Cuvillo,andG.Gao,PerformanceCharacteristicsofOpenMPLanguageConstructsonaMany-core-on-a-chipArchitecture,inProc.ofthe2005and2006InternationalConferenceonOpenMPSharedMemoryParallelProgramming,ser.IWOMP'05/IWOMP'06.Berlin,Heidelberg:Springer-Verlag,2008,pp.230.[Online].Available: http://portal.acm.org/citation.cfm?id=1892830.1892855 [296] E.Garcia,I.Venetis,R.Khan,andG.Gao,OptimizedDenseMatrixMultiplicationonaMany-CoreArchitecture,inProc.oftheACMEuro-ParconferenceonParallelprocessing,2010. [297] Intel,IntelXeonProcessorE5430,April2011.[Online].Available: http://ark.intel.com/Product.aspx?id=33081 [298] ,Quad-CoreIntelXeonProcessor5400SeriesDatasheet,August2008.[Online].Available: http://www.intel.com/assets/PDF/datasheet/318589.pdf [299] TILERA,TileProcessorArchitectureOverviewfortheTILEProSeries,inTileraOfcialDocumentation,November2009. 
[300] ,ManycorewithoutBoundaries:TILE64Processor,February2011.[Online].Available: http://www.tilera.com/products/processors/TILE64 [301] IBM,LinuxandSymmetricMultiprocessing,May2011.[Online].Available: http://www.ibm.com/developerworks/library/l-linux-smp/ [302] Android,SensorEvent,November2011.[Online].Available: http://developer.android.com/reference/android/hardware/SensorEvent.html [303] LINPACK,LINPACKBenchamarks,February2011.[Online].Available: http://en.wikipedia.org/wiki/LINPACK [304] NPB,NASAAdvancedSupercomputing(NAS)ParallelBenchmarks,May2011.[Online].Available: http://www.nas.nasa.gov/Resources/Software/npb.html [305] JavaDoc,ClassFlops:Countingoatingpointoperations,May2011.[Online].Available: http://ai.stanford.edu/paskin/slam/javadoc/javaslam/util/Flops.html [306] V.Kumar,A.Grama,A.Gupta,andG.Karypis,IntroductiontoParallelComputing.TheBenjamin/CummingsPublishingCompany,Inc.,1994. [307] J.Williams,C.Massie,A.George,J.Richardson,K.Gosrani,andH.Lam,CharacterizationofFixedandRecongurableMulti-CoreDevicesforApplication 401

PAGE 402

Acceleration,ACMTrans.onRecongurableTechnologyandSystems,vol.3,no.4,November2010. [308] TILERA,TileProcessorArchitectureOverview,inTileraOfcialDocumentation,November2009. [309] J.Richardson,S.Fingulin,D.Raghunathan,C.Massie,A.George,andH.Lam,ComparativeAnalysisofHPCandAccelaratorDevices:Computation,Memory,I/O,andPower,inProc.oftheIEEEWorkshoponHPRCTA,NewOrleans,Louisiana,November2010. [310] TILERA,TILEmPowerApplianceUser'sGuide,inTileraOfcialDocumentation,January2010. [311] K.Asanovic,R.Bodik,J.Demmel,T.Keaveny,K.Keutzer,J.Kubiatowicz,N.Morgan,K.Patterson,DavidSen,J.Wawrzynek,D.Wessel,andK.Yelick,AViewoftheParallelComputingLandscape,CommunicationsoftheACM,vol.52,no.10,October2009. [312] T.Berg,MaintainingI/ODataCoherenceinEmbeddedMulticoreSystems,IEEEMICRO,vol.29,no.3,pp.10,May/June2009. [313] G.BournoutianandA.Orailoglu,MissReductioninEmbeddedProcessorsthroughDynamic,Power-FriendlyCacheDesign,inProc.ofIEEE/ACMDAC,Anaheim,California,June2008. [314] SeaMicro,TheSM10000Family,June2011.[Online].Available: http://www.seamicro.com/ [315] AMAX,HighPerformanceComputing:ClusterMaxSuperGTeslaGPGPUHPCSolutions,June2011.[Online].Available: http://www.amax.com/hpc/productdetail.asp?product id=superg [316] P.Koka,M.McCracken,H.Schwetman,X.Zheng,R.Ho,andA.Krishnamoorthy,Silicon-photonicNetworkArchitecturesforScalable,Power-efcientMulti-chipsystems,inProc.ofACM/IEEEISCA,Saint-Malo,France,June2010. [317] M.AsghariandA.Krishnamoorthy,SiliconPhotonics:Energy-efcientCommunication,NaturePhotonics,pp.268,May2011. [318] C.-W.Lee,S.-R.-N.Yun,C.-G.Yu,J.-T.Park,andJ.-P.Colinge,DeviceDesignGuidelinesforNano-scaleMuGFETs,ElsevierSolid-StateElectronics,vol.51,no.3,pp.505,March2007. [319] S.Collange,D.Defour,andA.Tisserand,PowerConsumptionofGPUsfromaSoftwarePerspective,inProc.ofACMICCS,BatonRouge,Louisiana,May2009. 402

PAGE 403

BIOGRAPHICAL SKETCH

Arslan Munir received his B.S. in electrical engineering from the University of Engineering and Technology (UET), Lahore, Pakistan, in 2004, and his M.A.Sc. degree in electrical and computer engineering (ECE) from the University of British Columbia (UBC), Vancouver, Canada, in 2007. He received his Ph.D. degree in ECE from the University of Florida (UF), Gainesville, Florida, USA in the spring of 2012. He was also invited as a visiting research student at the University of Toronto (UofT), Ontario, Canada in the spring of 2012. He served as a visiting lecturer in the Department of Electrical Engineering, UET, Lahore in 2004. He worked as a management consultant in National Engineering Services Pakistan (NESPAK), from 2004 to 2005, where he was responsible for the compilation of technical specifications leading to the formulation of bidding documents for the procurements of diverse high-tech electronic projects as well as evaluation of both financial and technical aspects of bids tendered. From 2007 to 2008, he worked as a software development engineer at Mentor Graphics in the Embedded Systems Division where he worked on the software development for mobile phone and portable media player projects along with the testing and porting of the developed software on target embedded hardware. He was the recipient of many academic awards including the academic Roll of Honor and Gold Medals for the best overall performance in Electrical Engineering. He received a Best Paper award at the IARIA International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies (UBICOMM) in 2010. He is currently a reviewer for many prestigious IEEE/ACM journals and conferences. His current research interests include embedded systems, low-power design, computer architecture, multi-core platforms, parallel computing, dynamic optimizations, fault-tolerance, and computer networks.