- Permanent Link:
- https://ufdc.ufl.edu/UFE0022603/00001
## Material Information

- Title:
- Scheduling Algorithms for Energy Minimization
- Creator:
- Kang, Jaeyeon
- Place of Publication:
- [Gainesville, Fla.]
- Publisher:
- University of Florida
- Publication Date:
- 2008
- Language:
- English
- Physical Description:
- 1 online resource (171 p.)
## Thesis/Dissertation Information

- Degree:
- Doctorate (Ph.D.)
- Degree Grantor:
- University of Florida
- Degree Disciplines:
- Computer Engineering
- Computer and Information Science and Engineering
- Committee Chair:
- Ranka, Sanjay
- Committee Co-Chair:
- Sahni, Sartaj
- Committee Members:
- Fortes, Jose A.
- Peir, Jih-Kwon
- Avery, Paul R.
- Graduation Date:
- 8/9/2008
## Subjects

- Subjects / Keywords:
- Algorithms (jstor)
- Deadlines (jstor)
- Directed acyclic graphs (jstor)
- Electric potential (jstor)
- Energy consumption (jstor)
- Energy requirements (jstor)
- Energy value (jstor)
- Prioritization (jstor)
- Scheduling (jstor)
- Temporal logic (jstor)
- Computer and Information Science and Engineering -- Dissertations, Academic -- UF
- dvs, multicore, parallel, power, scheduling
- Genre:
- Electronic Thesis or Dissertation
- born-digital (sobekcm)
- Computer Engineering thesis, Ph.D.
## Notes

- Abstract:
- Energy consumption is a critical issue in parallel and distributed embedded systems. We present novel algorithms for energy-efficient scheduling of DAG (Directed Acyclic Graph) based applications on DVS (Dynamic Voltage Scaling) enabled systems. The proposed scheduling algorithms consist of two main steps: assignment and slack allocation. The proposed assignment and slack allocation schemes effectively minimize energy consumption while meeting deadline constraints in both static and dynamic environments, and they apply equally to homogeneous and heterogeneous parallel machines. Experimental results show that the proposed algorithms substantially reduce energy consumption while requiring little computational time.
- General Note:
- In the series University of Florida Digital Collections.
- General Note:
- Includes vita.
- Bibliography:
- Includes bibliographical references.
- Source of Description:
- Description based on online resource; title from PDF title page.
- Source of Description:
- This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
- Thesis:
- Thesis (Ph.D.)--University of Florida, 2008.
- Local:
- Adviser: Ranka, Sanjay.
- Local:
- Co-adviser: Sahni, Sartaj.
- Statement of Responsibility:
- by Jaeyeon Kang.
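
The slack allocation step mentioned in the abstract can be illustrated with a small sketch. It is not the dissertation's algorithm; it assumes a simple, commonly used DVS model in which a task needing `t_max` time units at full speed runs for `t_max / s` at normalized speed `s`, and power grows as `s**3`, so the task's energy is proportional to `t_max * s**2`. The function names `task_energy` and `chain_energy` are hypothetical. For a single chain of dependent tasks on one processor, this convex model makes it best to slow every task by the same factor, that is, to share the slack before the deadline in proportion to task length.

```python
# Toy model of DVS slack allocation (illustrative assumptions, not the
# dissertation's energy model):
#   - a task needing t_max time units at full speed runs for t_max / s
#     at normalized speed s, 0 < s <= 1;
#   - power is proportional to s**3, so energy is proportional to t_max * s**2.


def task_energy(t_max: float, allotted: float) -> float:
    """Energy of one task slowed down to finish in exactly `allotted` time."""
    if allotted < t_max:
        raise ValueError("allotted time is shorter than full-speed execution time")
    speed = t_max / allotted  # more allotted time (slack) means a lower speed
    return t_max * speed ** 2


def chain_energy(t_maxes: list[float], deadline: float) -> float:
    """Energy of a chain of dependent tasks on one processor when the slack
    (deadline minus total full-speed time) is shared in proportion to task
    length, so every task runs at the same speed."""
    total = sum(t_maxes)
    if deadline < total:
        raise ValueError("deadline cannot be met even at full speed")
    stretch = deadline / total
    return sum(task_energy(t, t * stretch) for t in t_maxes)


if __name__ == "__main__":
    tasks = [2.0, 3.0, 5.0]
    print(chain_energy(tasks, deadline=10.0))  # no slack: 10.0
    print(chain_energy(tasks, deadline=20.0))  # 10 units of slack shared: 2.5
    # Giving all 10 units of slack to the first task is far worse (about 8.06):
    print(task_energy(2.0, 12.0) + task_energy(3.0, 3.0) + task_energy(5.0, 5.0))
```

In this toy setting, stretching all three tasks to the deadline cuts energy from 10.0 to 2.5 units, while handing all of the slack to one task leaves the total near 8.06. Real DAG schedules on several processors are harder because precedence and processor constraints couple the tasks, which is the setting the dissertation's assignment and slack allocation algorithms address.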
## Record Information

- Source Institution:
- University of Florida
- Holding Location:
- University of Florida
- Rights Management:
- Copyright Kang, Jaeyeon. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
- Classification:
- LD1780 2008 (lcc)
## Full Text

SCHEDULING ALGORITHMS FOR ENERGY MINIMIZATION

By

JAEYEON KANG

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2008

© 2008 Jaeyeon Kang

To my beloved Mom and Dad

ACKNOWLEDGMENTS

First and foremost, I would like to thank my advisor, Sanjay Ranka, for his constant support and guidance. He taught me the passion, patience, and devotion that a true researcher needs. I would also like to thank my co-advisor, Sartaj Sahni, for his helpful advice and guidance. He taught me thoroughness and the right attitude towards research, and he helped me think more broadly about my work. My grateful thanks also go to my committee members, Jih-Kwon Peir, Jose Fortes, and Paul Avery, for their valuable insights and comments.

I am grateful to all my colleagues for being my good friends and collaborators. They have been very helpful and supportive, both academically and personally, and they have made my journey memorable. I wish to give special thanks to all of my friends in Korea for listening to me and giving me peace of mind.

Finally, none of this would have happened without the full support of my beloved family. I would like to thank my mom, who is in heaven, for always believing in me and supporting me. She was the one who helped me overcome many difficulties throughout my PhD program. I would like to thank my dad for motivating me to start this journey and encouraging me to continue it. He has been an excellent role model in my life. I would also like to thank my brothers for their sincere support and encouragement. My deepest gratitude goes to my beloved husband, Hyuckchul, for being with me. Words are not enough to express my gratitude for everything he has done for me. I love him and hope that I will be there for him when he needs me.
And I thank my eight-month-old daughter, Katherine (Hyunseung), for coming to me. She has made me the happiest person in the whole world. I love her and promise that I will always be on her side.

TABLE OF CONTENTS

- ACKNOWLEDGMENTS (p. 4)
- LIST OF TABLES (p. 9)
- LIST OF FIGURES (p. 11)
- ABSTRACT (p. 15)
- 1 INTRODUCTION (p. 16)
  - 1.1 Introduction (p. 16)
  - 1.2 Preliminaries (p. 18)
    - 1.2.1 Energy Model (p. 18)
    - 1.2.2 Application Model (p. 19)
    - 1.2.3 Dynamic Environments (p. 20)
      - 1.2.3.1 Overestimation (p. 20)
      - 1.2.3.2 Underestimation (p. 20)
  - 1.3 Scheduling for Energy Minimization (p. 21)
    - 1.3.1 Static Assignment (p. 21)
      - 1.3.1.1 Assignment to minimize total finish time (p. 21)
      - 1.3.1.2 Assignment to minimize total energy consumption (p. 22)
    - 1.3.2 Static Slack Allocation (p. 22)
    - 1.3.3 Dynamic Assignment (p. 22)
    - 1.3.4 Dynamic Slack Allocation (p. 22)
  - 1.4 Contributions (p. 24)
    - 1.4.1 Static Assignment to Minimize Total Finish Time (p. 24)
    - 1.4.2 Static Assignment to Minimize Total Energy Consumption (p. 25)
    - 1.4.3 Static Slack Allocation to Minimize Total Energy Consumption (p. 25)
    - 1.4.4 Dynamic Slack Allocation to Minimize Total Energy Consumption (p. 26)
    - 1.4.5 Dynamic Assignment to Minimize Total Energy Consumption (p. 27)
  - 1.5 Document Layout (p. 28)
- 2 RELATED WORK (p. 29)
  - 2.1 Static Slack Allocation (p. 31)
    - 2.1.1 Non-optimal Slack Allocation (p. 31)
    - 2.1.2 Near-optimal Slack Allocation (p. 32)
  - 2.2 Dynamic Slack Allocation (p. 34)
  - 2.3 Static Assignment (p. 34)
  - 2.4 Dynamic Assignment (p. 37)
- 3 STATIC SLACK ALLOCATION (p. 38)
  - 3.1 Proposed Slack Allocation (p. 38)
  - 3.2 Unit Slack Allocation (p. 41)
    - 3.2.1 Maximum Available Slack for a Task (p. 41)
    - 3.2.2 Compatible Task Matrix (p. 42)
    - 3.2.3 Search Space Reduction (p. 44)
      - 3.2.3.1 Fully independent tasks (p. 45)
      - 3.2.3.2 Fully dependent tasks (p. 45)
      - 3.2.3.3 Compressible tasks (p. 45)
    - 3.2.4 Branch and Bound Search (p. 47)
    - 3.2.5 Estimating the Lower Bound to Reduce the Search Space (p. 49)
  - 3.3 Experimental Results (p. 50)
    - 3.3.1 Simulation Methodology (p. 51)
      - 3.3.1.1 The DAG generation (p. 51)
      - 3.3.1.2 Performance measures (p. 51)
    - 3.3.2 Memory Requirements (p. 52)
    - 3.3.3 Determining the Size of Unit Slack and the Number of Intervals (p. 53)
    - 3.3.4 Homogeneous Environments (p. 55)
      - 3.3.4.1 Comparison of energy requirements (p. 55)
      - 3.3.4.2 Comparison of time requirements (p. 59)
    - 3.3.5 Heterogeneous Environments (p. 60)
      - 3.3.5.1 Comparison of energy requirements (p. 60)
      - 3.3.5.2 Comparison of time requirements (p. 64)
    - 3.3.6 Effect of Search Space Reduction Techniques for PathDVS (p. 65)
- 4 DYNAMIC SLACK ALLOCATION (p. 68)
  - 4.1 Proposed Dynamic Slack Allocation (p. 69)
    - 4.1.1 Choosing a Subset of Tasks for Slack Reallocation (p. 71)
      - 4.1.1.1 Greedy approach (p. 72)
      - 4.1.1.2 The *k* time lookahead approach (p. 72)
      - 4.1.1.3 The *k* descendent lookahead approach (p. 73)
    - 4.1.2 Time Range for Selected Tasks (p. 75)
  - 4.2 Experimental Results (p. 79)
    - 4.2.1 Simulation Methodology (p. 79)
      - 4.2.1.1 The DAG generation (p. 79)
      - 4.2.1.2 Dynamic environments generation (p. 79)
      - 4.2.1.3 Performance measures (p. 80)
    - 4.2.2 Overestimation (p. 81)
      - 4.2.2.1 Comparison of energy requirements (p. 81)
      - 4.2.2.2 Comparison of time requirements (p. 87)
    - 4.2.3 Underestimation (p. 89)
      - 4.2.3.1 Comparison of deadline requirements (p. 90)
      - 4.2.3.2 Comparison of energy requirements (p. 95)
      - 4.2.3.3 Comparison of time requirements (p. 100)
- 5 STATIC ASSIGNMENT (p. 102)
  - 5.1 Overall Scheduling Process (p. 103)
  - 5.2 Proposed Static Assignment to Minimize Finish Time (p. 106)
    - 5.2.1 Task Selection (p. 107)
    - 5.2.2 Processor Selection (p. 108)
    - 5.2.3 Iterative Scheduling (p. 109)
  - 5.3 Proposed Static Assignment to Minimize Energy (p. 111)
    - 5.3.1 Task Prioritization (p. 112)
    - 5.3.2 Estimated Deadline for a Task (p. 114)
    - 5.3.3 Processor Selection (p. 115)
      - 5.3.3.1 Greedy approach for the computation of expected energy (p. 116)
      - 5.3.3.2 Example for assignment (p. 118)
  - 5.4 Experimental Results for Assignment Algorithms that Minimize Finish Time (p. 120)
    - 5.4.1 Simulation Methodology (p. 121)
      - 5.4.1.1 The DAG generation (p. 121)
      - 5.4.1.2 Performance measures (p. 121)
    - 5.4.2 Comparison of Assignment Algorithms Using Different DVS Algorithms (p. 121)
    - 5.4.3 Comparison between CPS (Used in Prior Scheduling for Energy Minimization) and ICP (p. 126)
  - 5.5 Experimental Results for Assignment Algorithms that Minimize Energy (p. 127)
    - 5.5.1 Simulation Methodology (p. 128)
      - 5.5.1.1 The DAG generation (p. 128)
      - 5.5.1.2 Performance measures (p. 128)
      - 5.5.1.3 Variations of our algorithms (p. 129)
      - 5.5.1.4 Variations of GA based algorithms (p. 130)
    - 5.5.2 DVS Schemes to Compute Expected Energy in Processor Selection Step (p. 131)
    - 5.5.3 Independence between Time and Energy Requirements (p. 131)
      - 5.5.3.1 Comparison of energy requirements of proposed algorithms (p. 132)
      - 5.5.3.2 Comparison of energy requirements with GA based algorithms (p. 134)
      - 5.5.3.3 Comparison of time requirements (p. 139)
    - 5.5.4 Dependence between Time and Energy Requirements (p. 141)
- 6 DYNAMIC ASSIGNMENT (p. 144)
  - 6.1 Proposed Dynamic Assignment (p. 145)
    - 6.1.1 Choosing a Subset of Tasks for Rescheduling (p. 146)
    - 6.1.2 Time Range for Selected Tasks (p. 147)
    - 6.1.3 Estimated Deadline and Energy (p. 149)
    - 6.1.4 Processor Selection (p. 150)
  - 6.2 Experimental Results (p. 152)
    - 6.2.1 System Methodology (p. 153)
      - 6.2.1.1 The DAG generation (p. 153)
      - 6.2.1.2 Dynamic environments generation (p. 153)
      - 6.2.1.3 Performance measures (p. 154)
    - 6.2.2 Comparison of Energy Requirements (p. 154)
    - 6.2.3 Comparison of Time Requirements (p. 158)
- 7 CONCLUSION AND FUTURE WORK (p. 160)
  - 7.1 Static Slack Allocation (p. 160)
  - 7.2 Dynamic Slack Allocation (p. 161)
  - 7.3 Static Assignment (p. 162)
  - 7.4 Dynamic Assignment (p. 162)
  - 7.5 Future Work (p. 163)
- LIST OF REFERENCES (p. 164)
- BIOGRAPHICAL SKETCH (p. 171)

LIST OF TABLES

- 3-1 Results for 100 tasks in homogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage) (p. 55)
- 3-2 Results for 200 tasks in homogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage) (p. 56)
- 3-3 Results for 300 tasks in homogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage) (p. 56)
- 3-4 Results for 400 tasks in homogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage) (p. 57)
- 3-5 Normalized energy consumption of PathDVS and LPDVS with respect to different deadline extension rates in homogeneous environments (Positive difference indicates that PathDVS performs better than LPDVS) (p. 58)
- 3-6 Runtime ratio of LPDVS to PathDVS for no deadline extension in homogeneous environments (p. 59)
- 3-7 Results for 100 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage) (p. 61)
- 3-8 Results for 200 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage) (p. 61)
- 3-9 Results for 300 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage) (p. 62)
- 3-10 Results for 400 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage) (p. 62)
- 3-11 Normalized energy consumption of PathDVS and LPDVS with respect to different deadline extension rates in heterogeneous environments (Positive difference indicates that PathDVS performs better than LPDVS) (p. 63)
- 3-12 Runtime ratio of LPDVS to PathDVS for no deadline extension in heterogeneous environments (p. 64)
- 3-13 Number of tasks participating in search with respect to different number of tasks and processors (p. 66)
- 3-14 Depth of search tree with respect to different number of tasks and processors (p. 66)
- 3-15 Number of nodes explored in search with respect to different number of tasks and processors (p. 67)
- 4-1 Normalized energy consumption of *k* time lookahead and *k* descendent lookahead algorithms with different *k* values with respect to different early finished task rates and time decrease rates for no deadline extension (p. 83)
- 4-2 Deadline miss ratio of *k* time lookahead and *k* descendent lookahead algorithms with different *k* values with respect to different late finished task rates and time increase rates for 0.05 deadline extension rate (p. 91)
- 5-1 Results for 50 tasks and 4 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage) (p. 123)
- 5-2 Results for 50 tasks and 8 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage) (p. 123)
- 5-3 Results for 50 tasks and 16 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage) (p. 124)
- 5-4 Results for 100 tasks and 4 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage) (p. 124)
- 5-5 Results for 100 tasks and 8 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage) (p. 125)
- 5-6 Results for 100 tasks and 16 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates for 100 tasks on 16 processors (unit: percentage) (p. 125)

LIST OF FIGURES

- 1-1 Example of DAG and assignment DAG (p. 20)
- 1-2 Overall process of scheduling for energy minimization (p. 23)
- 3-1 Example of a DAG and assignment on two processors (p. 41)
- 3-2 Compatible task matrix and lists for an example in Figure 1-1 (p. 44)
- 3-3 Compression of assignment DAG (p. 47)
- 3-4 Compression of compatible task lists (p. 47)
- 3-5 Reduced compatible task lists and search graph (p. 49)
- 3-6 Runtime of PathDVS with respect to different size of DAGs (unit: ms) (p. 52)
- 3-7 Normalized energy consumption of PathDVS with respect to different unit slack rates for different number of tasks (p. 53)
- 3-8 Normalized energy consumption of LPDVS with respect to different interval rates for different number of tasks (p. 54)
- 3-9 Normalized energy consumption of slack allocation algorithms with respect to different deadline extension rates for different number of tasks (p. 58)
- 3-10 Runtime to execute algorithms with respect to different deadline extension rates for different number of tasks in homogeneous environments (unit: ms) (p. 59)
- 3-11 Normalized energy consumption of slack allocation algorithms with respect to different deadline extension rates for different number of tasks in heterogeneous environments (p. 63)
- 3-12 Runtime to execute algorithms with respect to different deadline extension rates for different number of tasks in heterogeneous environments (unit: ms) (p. 64)
- 4-1 Tasks selected for slack reallocation in an assignment DAG depending on dynamic slack allocation algorithms (p. 74)
- 4-2 Overestimation: Time range for selected slack allocable tasks using *k*-time lookahead approach and *k*-descendent lookahead approach (p. 78)
- 4-3 Underestimation: Time range for selected slack allocable tasks using *k*-time lookahead approach and *k*-descendent lookahead approach (p. 78)
- 4-4 Normalized energy consumption of Greedy, dPathDVS, and kallDescendent with respect to different early finished task rates and time decrease rates for no deadline extension (p. 82)
- 4-5 Normalized energy consumption for no deadline extension (p. 84)
- 4-6 Normalized energy consumption for 0.01 deadline extension rate (p. 85)
- 4-7 Normalized energy consumption for 0.02 deadline extension rate (p. 85)
- 4-8 Normalized energy consumption for 0.05 deadline extension rate (p. 86)
- 4-9 Normalized energy consumption for 0.1 deadline extension rate (p. 86)
- 4-10 Normalized energy consumption for 0.2 deadline extension rate (p. 87)
- 4-11 Computational time to readjust the schedule from an early finished task with respect to different time decrease rates for no deadline extension (unit: ns, logarithmic scale) (p. 88)
- 4-12 Results for variable deadline extension rates: Computational time to readjust the schedule from one early finished task with respect to different time decrease rates (unit: ns, logarithmic scale) (p. 89)
- 4-13 Deadline miss ratio with respect to different time increase rates and late finished task rates for 0.05 deadline extension rate (p. 90)
- 4-14 Deadline miss ratio for no deadline extension (p. 92)
- 4-15 Deadline miss ratio for 0.01 deadline extension rate (p. 93)
- 4-16 Deadline miss ratio for 0.02 deadline extension rate (p. 93)
- 4-17 Deadline miss ratio for 0.05 deadline extension rate (p. 94)
- 4-18 Deadline miss ratio for 0.1 deadline extension rate (p. 94)
- 4-19 Deadline miss ratio for 0.2 deadline extension rate (p. 95)
- 4-20 Energy increase ratio with respect to different time increase rates and late finished task rates for 0.05 deadline extension rate (p. 96)
- 4-21 Energy increase ratio for no deadline extension (p. 97)
- 4-22 Energy increase ratio for 0.01 deadline extension rate (p. 97)
- 4-23 Energy increase ratio for 0.02 deadline extension rate (p. 98)
- 4-24 Energy increase ratio for 0.05 deadline extension rate (p. 98)
- 4-25 Energy increase ratio for 0.1 deadline extension rate (p. 99)
- 4-26 Energy increase ratio for 0.2 deadline extension rate (p. 99)
- 4-27 Computational time to readjust the schedule from a late finished task with respect to different time increase rates for no deadline extension (unit: ns, logarithmic scale) (p. 100)
- 4-28 Results for variable deadline extension rates: Computational time to readjust the schedule from one late finished task with respect to different time decrease rates (unit: ns, logarithmic scale) (p. 101)
- 5-1 A high level description of proposed scheduling approach (p. 105)
- 5-2 The ICP procedure (p. 110)
- 5-3 The DVSbasedAssignment procedure (p. 117)
- 5-4 Example of assignment to minimize finish time and assignment to minimize DVS based energy (p. 120)
- 5-5 Normalized energy consumption of ICP and CPS using PathDVS with respect to different deadline extension rates for different number of tasks and processors (p. 127)
- 5-6 Comparison between optimal scheme and greedy scheme for processor selection of A0 for 50 tasks on 4 and 8 processors (p. 131)
- 5-7 Results for 50 tasks: Normalized energy consumption of our algorithms with respect to variable deadline extension rates for different number of processors (p. 132)
- 5-8 Results for 100 tasks: Normalized energy consumption of our algorithms with respect to variable deadline extension rates for different number of processors (p. 133)
- 5-9 Improvement of our algorithms over ICP-PathDVS (i.e., baseline algorithm) with respect to different number of processors for variable deadline extension rates (unit: percentage) (p. 134)
- 5-10 Normalized energy consumption of GARandNonOptimal and our algorithms for different number of tasks and processors (p. 136)
- 5-11 Normalized energy consumption of GARandOptimal and our algorithms for different number of tasks and processors (p. 137)
- 5-12 Normalized energy consumption of GASolNonOptimal and our algorithms with respect to different extension rates for different number of tasks and processors (p. 138)
- 5-13 Normalized energy consumption of GASolNonOptimal and our algorithms (p. 138)
- 5-14 Normalized energy consumption of GASolOptimal and our algorithms (p. 139)
- 5-15 Runtime to execute our algorithms with respect to variable deadline extension rates for different number of tasks (unit: ms) (p. 140)
- 5-16 Runtime to execute GA algorithms and our algorithm with respect to different number of tasks for 1.0 deadline extension rate (unit: ms, logarithmic scale) (p. 140)
- 5-17 Results for 4 processors: Improvement of our algorithms over ICP-PathDVS (i.e., baseline algorithm) in terms of energy consumption with respect to different correlation rates for variable deadline extension rates for 50 and 100 tasks (unit: percentage) (p. 142)
- 5-18 Results for 8 processors: Improvement of our algorithms over ICP-PathDVS (i.e., baseline algorithm) in terms of energy consumption with respect to different correlation rates for variable deadline extension rates for 50 and 100 tasks (unit: percentage) (p. 143)
- 6-1 The DynamicDVSbasedAssignment procedure (p. 152)
- 6-2 Results for 4 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks
6-3 Results for 8 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks
6-4 Results for 16 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks
6-5 Results for 32 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks
6-6 Computational time to readjust the schedule from an early finished task with respect to different time decrease rates (unit: ns, logarithmic scale)

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

SCHEDULING ALGORITHMS FOR ENERGY MINIMIZATION

By

Jaeyeon Kang

August 2008

Chair: Sanjay Ranka
Cochair: Sartaj Sahni
Major: Computer Engineering

Energy consumption is a critical issue in parallel and distributed embedded systems. We present novel algorithms for energy-efficient scheduling of DAG (Directed Acyclic Graph) based applications on DVS (Dynamic Voltage Scaling) enabled systems. The proposed scheduling algorithms consist of two main parts: assignment and slack allocation. All schemes for the assignment and the slack allocation effectively minimize energy consumption while meeting the deadline constraints in static or dynamic environments. They are also equally applicable to homogeneous and heterogeneous parallel machines.
Experimental results show that the proposed algorithms achieve significant energy reductions and require only a small amount of computational time.

CHAPTER 1 INTRODUCTION

1.1 Introduction

Computers account for a significant and growing portion of energy consumption. Roughly 8% of the electricity in the US is now consumed by computers [1]. A study by Dataquest [15] reported that the worldwide total power dissipation of processors in PCs was 160 MW in 1992, and that by 2001 it had grown to 9000 MW. It is now widely recognized that power-aware computing is no longer an issue confined to mobile and real-time computing environments, but is also important for desktop and conventional computing. In particular, high-performance parallel and distributed systems, data centers, supercomputers, clusters, embedded systems, servers, and networks consume a considerable amount of energy. In addition to the cost of the energy consumed by computers, significant additional costs must be borne for cooling the facility. Thus, reducing the energy requirements of executing an application is very important both for large-scale systems that consume a considerable amount of energy and for embedded systems that rely on batteries for power.

More recently, industry and researchers have been eyeing multi-core processors, which can attain higher performance by running multiple threads in parallel [18, 19, 36, 39, 40, 58, 67, 68]. By integrating multiple cores on a chip, designers hope to sustain performance growth while depending less on raw circuit speed and decreasing the power requirements per unit of performance. These workhorses of the next generation of supercomputers and wireless devices are poised to alter the horizon of high-performance computing. However, proper scheduling and allocation of applications on these architectures is required [17]. Most effective energy minimization techniques are based on Dynamic Voltage Scaling (DVS).
The DVS technique assigns differential voltages to each task to minimize the energy requirements of an application [20, 63, 66]. Assigning differential voltages is equivalent to allocating additional time, or slack, to a task. This technique has been found to be a very effective method for reducing energy in DVS-enabled processors. Scheduling algorithms that do not use DVS, such as Energy Aware Scheduling [27, 28] and several heuristics in [61], do not perform as well in DVS-enabled systems.

There is considerable research on DVS scheduling algorithms for independent tasks in a single processor real-time system [3, 4, 5, 11, 12, 21, 23, 25, 26, 33, 34, 35, 38, 43, 49, 50, 52, 59, 60, 70, 72, 73, 76, 78, 79]. Recently, several DVS-based algorithms for slack allocation have been proposed for tasks with precedence relationships in a multiprocessor real-time system [6, 13, 22, 29, 31, 45, 46, 47, 48, 51, 55, 56, 57, 75, 77]. The precedence relationships are represented as a Directed Acyclic Graph (DAG) consisting of nodes that represent computations and edges that represent the dependencies between the nodes. DAGs have been shown to be representative of a large number of applications.

We explore novel scheduling algorithms for DVS-based energy minimization of DAG-based applications on parallel and distributed machines. The proposed schemes are equally applicable to homogeneous and heterogeneous parallel machines. The scheduling of DAG-based applications with the goal of DVS-based energy minimization broadly consists of two steps: assignment and slack allocation.

Assignment: This step determines the order in which tasks execute and the mapping of tasks to processors, based on the computation time at the maximum voltage level. Note that, for any feasible schedule, the finish time of the DAG at the maximum voltage must be less than or equal to the deadline.
Slack allocation: Once the assignment of each task is known, this step allocates a variable amount of slack to each task so that the total energy consumption is minimized while the DAG still executes within a given deadline.

A scheduling algorithm can be classified as either a static scheduling algorithm (i.e., an offline algorithm) or a dynamic scheduling algorithm (i.e., an online algorithm). Static scheduling algorithms for DAG execution use the estimated execution time of tasks. However, the estimated execution time (ET) of a task may differ from its actual execution time (AET) at runtime. Dynamic environments can be divided into two broad categories based on whether the actual execution time is less than or more than the estimated time: overestimation (AET < ET) and underestimation (AET > ET). Overestimation creates an opportunity to reduce energy requirements further, while underestimation may cause deadline constraints to be missed. Dynamic scheduling algorithms address these problems at runtime with the goals of minimizing energy consumption and satisfying deadline constraints.

In this thesis, we present novel scheduling algorithms for energy minimization in both static and dynamic environments. The algorithms can be divided into four categories: static slack allocation, dynamic slack allocation, static assignment, and dynamic assignment. Algorithms for these four categories are presented in Chapters 3, 4, 5, and 6, respectively.

1.2 Preliminaries

In this section, we briefly describe the energy model, the application model, and the dynamic environments used in this thesis.

1.2.1 Energy Model

The Dynamic Voltage Scaling (DVS) technique reduces dynamic power dissipation by dynamically scaling the supply voltage and the clock frequency of processors.
The power dissipation, $P_d$, is represented by $P_d = C_{ef} V_{dd}^2 f$, where $C_{ef}$ is the switched capacitance, $V_{dd}$ is the supply voltage, and $f$ is the operating frequency [9, 10]. The relationship between the supply voltage and the frequency is represented by $f = k (V_{dd} - V_t)^2 / V_{dd}$, where $k$ is a circuit-dependent constant and $V_t$ is the threshold voltage. The energy consumed to execute task $i$, $E_i$, is expressed by $E_i = C_{ef} V_{dd}^2 c_i$, where $c_i$ is the number of cycles needed to execute the task. Decreasing the processor speed allows the supply voltage to be reduced, which in turn reduces the energy consumed by a task. During assignment, we use each task's execution time at the maximum supply voltage to guarantee the deadline constraints; this time is given by $compTime_i = c_i / f_{max}$.

1.2.2 Application Model

A Directed Acyclic Graph (DAG) represents the workflow among tasks. In the DAG shown in Figure 1-1 (a), a node represents a task and a directed edge between nodes represents a precedence relationship between tasks. Given a DAG, an assignment algorithm maps its tasks to appropriate processors of a parallel architecture. Figure 1-1 (b) depicts an assignment for the DAG of Figure 1-1 (a). The assignment varies with the mapping method, subject to meeting the given deadline of the DAG. Figure 1-1 (c) shows the assignment DAG, which captures the direct workflow among tasks after the assignment. The direct precedence relationships among tasks may differ from those in the original DAG, depending on the assignment. For instance, task 1 and task 4 have a direct dependency in the original DAG, but not in the assignment DAG. Furthermore, if task 2 finishes at time 5, task 5 no longer has a direct dependency on task 2, although the dependency is still present indirectly in the assignment DAG. In addition, the assignment DAG may contain dependencies that are not in the original DAG, due to scheduling constraints within a processor.
For example, task 3 and task 4 have a dependency relationship in the assignment DAG.

Figure 1-1. Example of DAG and assignment DAG: (a) DAG, (b) Assignment on two processors, (c) Assignment DAG

1.2.3 Dynamic Environments

The actual execution time (AET) of tasks may differ from the estimated execution time (ET) used in static scheduling. We divide the tasks into two broad categories based on whether the actual execution time is less than or more than the estimated time: overestimation (i.e., AET < ET) and underestimation (i.e., AET > ET).

1.2.3.1 Overestimation

For most real-time applications, a worst-case upper bound on the actual execution time of each task is used to guarantee that the application completes within a given time bound. Many such tasks may complete earlier than expected during the actual execution. Also, when historical data is used to estimate the time requirements, the actual execution time of each task may be less than its estimated execution time. This allows dependent tasks to potentially begin earlier than envisioned during static scheduling. The extra available slack can then be allocated to tasks that have not yet begun execution, with the goal of reducing the total energy requirements while still meeting the deadline constraints.

1.2.3.2 Underestimation

For many applications that do not use the worst-case execution time for estimation, the actual execution time of a task may be larger than its estimated execution time. In this case, it cannot be guaranteed that the deadline constraints will always be satisfied. However, slack can be removed from future tasks with the hope of satisfying the deadline constraints as closely as possible while still trying to reduce energy.
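To make the energy model of Section 1.2.1 concrete, the following Python sketch evaluates it for a single task. The constants C_EF, K, V_T, and V_MAX are hypothetical normalized values chosen only for illustration (they are not taken from this thesis), and all function names are ours. Given a time budget at least as long as $compTime_i$, the sketch picks the slowest frequency that still finishes in time, inverts $f = k (V_{dd} - V_t)^2 / V_{dd}$ to recover the supply voltage, and evaluates $E_i = C_{ef} V_{dd}^2 c_i$.

```python
import math

# Hypothetical normalized circuit parameters, chosen only for illustration.
C_EF = 1.0   # switched capacitance C_ef
K = 1.0      # circuit constant k
V_T = 0.3    # threshold voltage V_t
V_MAX = 1.0  # maximum supply voltage


def frequency(v_dd: float) -> float:
    """Operating frequency for supply voltage v_dd: f = k (V_dd - V_t)^2 / V_dd."""
    return K * (v_dd - V_T) ** 2 / v_dd


F_MAX = frequency(V_MAX)


def comp_time(cycles: float) -> float:
    """Execution time at the maximum supply voltage: compTime_i = c_i / f_max."""
    return cycles / F_MAX


def voltage_for(f: float) -> float:
    """Invert f = k (V - V_t)^2 / V, i.e. solve k V^2 - (2 k V_t + f) V + k V_t^2 = 0.

    The larger root is returned because a working voltage must exceed V_t.
    """
    b = 2 * K * V_T + f
    return (b + math.sqrt(f * f + 4 * K * V_T * f)) / (2 * K)


def task_energy(cycles: float, time_budget: float) -> float:
    """E_i = C_ef V_dd^2 c_i when the task is stretched to fill time_budget."""
    if time_budget < comp_time(cycles) - 1e-12:
        raise ValueError("time budget is shorter than the execution time at V_max")
    f = cycles / time_budget          # slowest frequency that still meets the budget
    v = min(voltage_for(f), V_MAX)    # guard against rounding slightly above V_max
    return C_EF * v * v * cycles


if __name__ == "__main__":
    c = 4.9                            # cycles, so compTime is about 10 time units
    print(comp_time(c))                # about 10.0
    print(task_energy(c, 10.0))        # about 4.9: no slack, run at V_max
    print(task_energy(c, 20.0))        # about 2.54: 10 units of slack, V_dd about 0.72
```

With these illustrative constants, giving the task 10 extra time units of slack lowers the supply voltage from 1.0 to about 0.72 and roughly halves its energy, which is the effect that slack allocation exploits.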
1.3 Scheduling for Energy Minimization

Figure 1-2 shows the overall process of scheduling for energy minimization. Scheduling tasks in a DAG for energy minimization broadly requires the following four-step process:
- Static assignment
- Static slack allocation
- Dynamic assignment
- Dynamic slack allocation

1.3.1 Static Assignment

The static assignment process determines the order in which tasks execute and the mapping of tasks to processors, based on the computation time at the maximum voltage level. The schedule generated by this process is not yet complete, because slack may remain before the deadline. The assignment can be performed with two different goals: minimizing the total finish time or minimizing the total energy consumption.

1.3.1.1 Assignment to minimize total finish time

The assignment is performed to minimize the total finish time of a DAG. For a feasible solution, the deadline must be greater than or equal to the total finish time. An important side effect of minimizing the total finish time is that, for a given deadline, the total amount of available slack increases. In general, more slack should lead to lower energy after slack allocation algorithms are applied.

1.3.1.2 Assignment to minimize total energy consumption

The assignment is performed to minimize the total energy consumption after slack allocation (i.e., DVS-based energy) while still meeting the deadline constraints. This can be done by considering energy consumption when determining the execution order of tasks, and the expected energy after slack allocation when mapping tasks to processors. In general, incorporating energy minimization into the assignment process should lead to lower energy requirements.
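The intuition of Section 1.3.1.1, that a shorter schedule leaves more slack before the deadline, can be illustrated with a small list scheduler. The Python sketch below is a simplified HEFT-style heuristic, not the assignment algorithm proposed in this thesis: it assumes communication costs are zero, ranks tasks by the length of their longest path to an exit task using average computation times, and greedily maps each task to the processor that gives the earliest finish time. The DAG, the per-processor times, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Succ = Dict[int, List[int]]       # task -> successors in the DAG
Times = Dict[int, List[float]]    # task -> computation time at V_max on each processor


def upward_rank(succ: Succ, avg: Dict[int, float]) -> Dict[int, float]:
    """Length of the longest path from each task to an exit task (average times)."""
    rank: Dict[int, float] = {}

    def visit(t: int) -> float:
        if t not in rank:
            rank[t] = avg[t] + max((visit(s) for s in succ[t]), default=0.0)
        return rank[t]

    for t in succ:
        visit(t)
    return rank


def list_schedule(succ: Succ, comp: Times) -> Tuple[Dict[int, int], float]:
    """Greedy earliest-finish-time assignment; returns (task -> processor, makespan)."""
    n_proc = len(next(iter(comp.values())))
    pred: Dict[int, List[int]] = {t: [] for t in succ}
    for t, children in succ.items():
        for s in children:
            pred[s].append(t)
    avg = {t: sum(times) / len(times) for t, times in comp.items()}
    rank = upward_rank(succ, avg)
    # With positive times a predecessor always ranks higher than its successors,
    # so this priority order is also a valid topological order.
    order = sorted(succ, key=lambda t: -rank[t])
    proc_free = [0.0] * n_proc
    finish: Dict[int, float] = {}
    assign: Dict[int, int] = {}
    for t in order:
        ready = max((finish[p] for p in pred[t]), default=0.0)
        best = min(range(n_proc),
                   key=lambda p: max(ready, proc_free[p]) + comp[t][p])
        finish[t] = max(ready, proc_free[best]) + comp[t][best]
        proc_free[best] = finish[t]
        assign[t] = best
    return assign, max(finish.values())


if __name__ == "__main__":
    succ = {1: [2, 3], 2: [4], 3: [4], 4: []}
    comp = {1: [2, 3], 2: [3, 3], 3: [4, 2], 4: [2, 2]}
    assign, makespan = list_schedule(succ, comp)
    deadline = 10.0
    print(assign)               # {1: 0, 2: 0, 3: 1, 4: 0}
    print(makespan)             # 7.0
    print(deadline - makespan)  # 3.0 units of slack left for slack allocation
```

Any reduction in the makespan here translates one-for-one into extra slack before the deadline; an energy-aware assignment (Section 1.3.1.2) would instead weigh each processor choice by its expected energy after slack allocation.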
1.3.2 Static Slack Allocation

The static slack allocation process allocates slack to tasks at compile time to minimize energy consumption while meeting the deadline constraints. The initial static schedule is generated by static assignment followed by static slack allocation (i.e., static scheduling). The slack allocation problem can be posed as follows: allocate a variable amount of slack to each task so that the total energy consumption is minimized while the deadlines are met.

1.3.3 Dynamic Assignment

The dynamic assignment process reassigns tasks to processors at runtime whenever a task finishes earlier or later than expected according to the current schedule (i.e., the initial static schedule or the previous schedule updated at runtime). The reassignment is performed to minimize DVS-based energy. However, if the reassignment does not satisfy the deadline constraints, it is discarded and the current assignment is kept. Once the reassignment is determined, slack is reallocated to tasks (i.e., dynamic slack allocation) to minimize energy consumption while still meeting the deadline constraints.

1.3.4 Dynamic Slack Allocation

The dynamic slack allocation process reallocates slack to tasks at runtime whenever a task finishes earlier or later than expected according to the current schedule (i.e., the initial static schedule or the previous schedule updated at runtime). The current schedule is initialized to the static schedule and is updated whenever dynamic scheduling is applied in response to early or late finished tasks at runtime. The assignment is not changed during slack reallocation. The main goal of the dynamic slack allocation algorithm differs slightly depending on the dynamic environment (i.e., whether the execution time of a task is overestimated or underestimated). For overestimation, the dynamic slack allocation algorithm minimizes energy consumption while guaranteeing that the deadline constraints are always met.
For underestimation, it tries to reduce the likelihood that the DAG does not complete by the required deadline, while still trying to reduce energy.

Figure 1-2. Overall process of scheduling for energy minimization (static scheduling: static assignment and static slack allocation; dynamic scheduling at runtime: dynamic assignment and dynamic slack allocation)

1.4 Contributions

In this section, we present the main contributions of the scheduling algorithms proposed in this thesis.

1.4.1 Static Assignment to Minimize Total Finish Time

While most prior research on scheduling DAGs for energy minimization has not concentrated on the assignment process, we show that the assignment itself is as important as the slack allocation process for minimizing energy requirements. In general, minimizing time (i.e., the schedule length of a DAG) and minimizing energy are regarded as conflicting goals. However, when DVS techniques are used under a specified deadline, we show that minimizing the total finish time can lead to lower energy requirements because it increases the total amount of available slack. The main features of the proposed static assignment algorithm that minimizes finish time are as follows:

Assign multiple independent ready tasks simultaneously: The priority of a task is computed by estimating the execution path from this task to the last task of the DAG representing the workflow. Since the mapping of tasks yet to be scheduled is unknown and the cost of task execution depends on the processor to which it is assigned, the priority has to be approximated during scheduling. Hence, it is difficult to explicitly distinguish the execution order of tasks with similar priorities. Based on this intuition, the proposed algorithm groups independent ready tasks with similar priorities and finds an optimal solution (e.g., resource assignment) for this subset of tasks simultaneously.
Here the set of ready tasks that can be assigned consists of tasks whose predecessors have all been assigned.

Iteratively refine the scheduling: The schedule is iteratively refined by using the cost of the critical path based on the assignment generated in the previous iteration. Here the critical path of a task is the longest path from that task to an exit task, and its length is used to determine the priority of the task. Assuming that the mappings from the previous iteration are good, this provides a better estimate of the cost of the critical path than the average or median computation and communication times used as the estimate in the first iteration.

1.4.2 Static Assignment to Minimize Total Energy Consumption

Most prior research on scheduling DAGs for energy minimization is based on a simple list-based assignment algorithm. An assignment that minimizes the total finish time may be a reasonable approach, since minimizing time generally leaves more slack to be allocated and thus reduces energy requirements during the slack allocation step. However, this approach cannot incorporate the differential energy and time requirements of each task of the workflow on different processors. Our assignment algorithms mitigate this problem by considering the expected effect of slack allocation during the assignment process. They significantly outperform other existing algorithms in terms of energy consumption, and they require little computational time. The main features of the proposed static assignment algorithms that minimize energy consumption are as follows:

Utilize expected DVS-based energy information during assignment: Our algorithm assigns each task to the processor such that the total energy expected after slack allocation is minimized.
The expected energy after slack allocation (i.e., expected DVS-based energy) for each task is computed by using an estimated deadline for each task, chosen so that the overall DAG can be executed within its deadline. This leads to good performance in terms of energy minimization.

Consider multiple task prioritizations: We test multiple assignments using multiple task prioritizations based on tradeoffs between energy and time for each task. This leads to good performance in terms of energy minimization. Furthermore, these assignments can potentially be computed in parallel to minimize the computational time (i.e., runtime overhead).

1.4.3 Static Slack Allocation to Minimize Total Energy Consumption

The proposed static slack allocation algorithm, the Path-based DVS algorithm, finds the best set of tasks that can efficiently use a unit of slack to minimize energy consumption. It incorporates assignment-based dependency relationships among tasks as well as the different energy profiles of tasks on different processors. It provides near-optimal solutions for energy minimization with considerably smaller computational time and memory requirements than an existing algorithm that provides near-optimal solutions (i.e., a linear programming based approach). The main features of the proposed static slack allocation algorithm, particularly those that keep its computation time small, are as follows:

Utilize a compatible task matrix: For each task, the compatible task matrix lists the tasks that can share a unit of slack (i.e., a minimum indivisible amount of slack) with it. The matrix is built from the following two characteristics. First, each assignment-based path, which consists of tasks with precedence relationships in an assignment DAG, cannot have more than one unit of slack. Second, this unit of slack cannot be allocated to more than one task on each assignment-based path.
Using the matrix, the branch and bound search method can be applied efficiently.

Apply search space reduction techniques: In general, the branch and bound search method requires large computational time. Thus, to reduce the search space (and, as a result, the computational time), we check whether each task is a fully independent task, a fully dependent task, or a compressible task. Only one representative of the compressible tasks participates in the search. This dramatically reduces the search space without reducing the quality of the energy results.

1.4.4 Dynamic Slack Allocation to Minimize Total Energy Consumption

Prior dynamic slack allocation algorithms for DAGs are based on a simple greedy approach that allocates the slack to the next ready task on the same processor where the task that completed earlier than expected was executed. This slack-forwarding approach, although fast, does not perform well in terms of energy reduction in our experiments. A simple option for adjusting slack at runtime is to reapply the static slack allocation algorithms to the unexecuted tasks whenever a task finishes early or late. This can be expected to come close to the best achievable energy minimization, particularly when near-optimal static slack allocation algorithms are applied. However, the time requirements of static algorithms are large, and they may not be practical for many runtime scenarios. The proposed dynamic slack allocation algorithms effectively reallocate the slack to unexecuted tasks to further reduce energy and/or meet a given deadline at runtime. They are comparable to static algorithms applied at runtime in terms of reducing energy and/or meeting a given deadline, but require considerably smaller computational time. They are also effective when the execution time of tasks is underestimated or overestimated.
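The two characteristics behind the compatible task matrix (Section 1.4.3) suggest that two tasks can share the same unit of slack when no assignment-based path contains both of them, that is, when neither task is reachable from the other in the assignment DAG. The Python sketch below builds such a matrix from that reachability test alone. It is our illustration under this reading, not the thesis's Path-based DVS implementation, and the example assignment DAG is hypothetical.

```python
from typing import Dict, List, Set

Succ = Dict[int, List[int]]   # assignment DAG: task -> direct successors


def reachability(succ: Succ) -> Dict[int, Set[int]]:
    """For each task, the set of tasks reachable from it in the assignment DAG."""
    reach: Dict[int, Set[int]] = {}

    def dfs(t: int) -> Set[int]:
        if t not in reach:
            found: Set[int] = set()
            for s in succ[t]:
                found.add(s)
                found |= dfs(s)
            reach[t] = found
        return reach[t]

    for t in succ:
        dfs(t)
    return reach


def compatible_task_matrix(succ: Succ) -> Dict[int, List[int]]:
    """For each task, the tasks that lie on no common assignment-based path with it."""
    reach = reachability(succ)
    return {
        a: sorted(b for b in succ
                  if b != a and b not in reach[a] and a not in reach[b])
        for a in succ
    }


if __name__ == "__main__":
    # Hypothetical assignment DAG: P0 runs 1, 2, 4 and P1 runs 3, 5; the edge
    # 3 -> 5 comes from the execution order on P1.
    succ = {1: [2, 3], 2: [4], 3: [4, 5], 4: [], 5: []}
    print(compatible_task_matrix(succ))
    # {1: [], 2: [3, 5], 3: [2], 4: [5], 5: [2, 4]}
```

In this example, tasks 2 and 5 may receive the same unit of slack, whereas task 1 is compatible with no other task because every assignment-based path (1-2-4, 1-3-4, 1-3-5) passes through it. A branch and bound search can then restrict each candidate task set to mutually compatible tasks.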
The main features of the proposed dynamic slack allocation algorithms are as follows:

Select the subset of tasks for slack reallocation: The tasks that may be rescheduled by the dynamic slack allocation algorithm are those that have not yet started when the algorithm is applied. We assume that the voltage can be selected before a task starts executing. The dynamic slack allocation (i.e., rescheduling) is applied to a subset of these tasks that depends on the algorithm. The main reason to limit the set of potentially rescheduled tasks is to minimize the overhead of reallocating slack during runtime. Clearly, this should be done so that the other goal, energy reduction, is also met.

Determine the time range for the selected tasks: The time range of the selected tasks has to be changed because some tasks have completed earlier or later than expected. Based on the computation time in the current schedule and the assignment-based dependency relationships among tasks, we recompute the time range (i.e., earliest start time and latest finish time) within which the selected tasks should be executed. Slack is allocated to the selected tasks within this time range in order to try to meet the deadline constraints.

1.4.5 Dynamic Assignment to Minimize Total Energy Consumption

There is very little research on dynamic scheduling for DAGs with the goal of energy minimization. We have shown that reallocating slack at runtime (i.e., dynamic slack allocation) leads to better energy minimization. However, this alone may not be enough to improve energy requirements at runtime. We show that reassigning tasks along with reallocating slack at runtime can lead to better energy minimization than only reallocating slack at runtime. For the approach to be effective and useful at runtime, its computational time (i.e., runtime overhead) is also kept small.
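The time range computation described in Section 1.4.4 (earliest start time and latest finish time over the assignment-based dependencies) can be sketched as a forward and a backward pass over the assignment DAG. In this Python sketch, which is our illustration rather than the thesis's algorithm, the assignment DAG is formed from the original DAG edges plus an edge between consecutive tasks on each processor, communication costs are assumed to be zero, and the example DAG, processor orders, and times are hypothetical. At runtime, `start` would be the current time rather than 0; handling tasks that are still executing is left out of the sketch. It uses `graphlib`, available in Python 3.9 and later.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def assignment_dag(edges: Iterable[Edge], proc_order: List[List[int]]) -> Dict[int, Set[int]]:
    """Original DAG edges plus an edge between consecutive tasks on each processor."""
    succ: Dict[int, Set[int]] = {t: set() for order in proc_order for t in order}
    for u, v in edges:
        succ[u].add(v)
    for order in proc_order:
        for u, v in zip(order, order[1:]):
            succ[u].add(v)
    return succ


def time_ranges(succ: Dict[int, Set[int]], comp: Dict[int, float],
                deadline: float, start: float = 0.0):
    """Earliest start time (forward pass) and latest finish time (backward pass)."""
    pred: Dict[int, Set[int]] = {t: set() for t in succ}
    for u, children in succ.items():
        for v in children:
            pred[v].add(u)
    # TopologicalSorter expects node -> predecessors and yields predecessors first.
    order = list(TopologicalSorter(pred).static_order())
    est: Dict[int, float] = {}
    for t in order:
        est[t] = max((est[p] + comp[p] for p in pred[t]), default=start)
    lft: Dict[int, float] = {}
    for t in reversed(order):
        lft[t] = min((lft[s] - comp[s] for s in succ[t]), default=deadline)
    return est, lft


if __name__ == "__main__":
    edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
    proc_order = [[1, 2, 4], [3]]                # P0 runs 1, 2, 4; P1 runs 3
    comp = {1: 2.0, 2: 3.0, 3: 2.0, 4: 2.0}      # times on the assigned processors
    est, lft = time_ranges(assignment_dag(edges, proc_order), comp, deadline=10.0)
    for t in sorted(est):
        # task, earliest start, latest finish, window length minus computation time
        print(t, est[t], lft[t], lft[t] - est[t] - comp[t])
    # 1 0.0 5.0 3.0
    # 2 2.0 8.0 3.0
    # 3 2.0 8.0 4.0
    # 4 5.0 10.0 3.0
```

The last column is the room each task has on its own; the windows overlap along paths, so these per-task values cannot simply be summed, which is why slack allocation must respect the assignment-based paths.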
The main features of the proposed dynamic assignment algorithm are as follows:

Select the subset of tasks for reassignment: As in dynamic slack allocation, the tasks that may be rescheduled by the dynamic assignment algorithm are those that have not yet started when the algorithm is applied. We assume that the voltage can be selected before a task starts executing. The dynamic reassignment is applied only to a subset of these tasks. The tasks considered for rescheduling are limited in order to minimize the overhead of reassigning processors during runtime.

Determine the time range for the selected tasks: The time range of the selected tasks has to be determined in order to meet the deadline constraints. Based on the computation time in the current schedule and the assignment-based dependency relationships among tasks, we recompute the time range within which the selected tasks should be executed. In dynamic slack reallocation, the time range is defined for the selected tasks under a given assignment (i.e., the earliest start time and latest finish time of the selected tasks on their assigned processors). For reassignment, it is instead defined over each processor for the selected tasks (i.e., the available earliest start time and latest finish time of the selected tasks on each processor). The reassignment of the selected tasks is performed within this time range.

Utilize expected DVS-based energy information during reassignment: Our algorithm reassigns each selected task to the processor such that the total energy expected after slack allocation is minimized. The expected DVS-based energy of each selected task is computed by using an estimated deadline for each task, chosen so that the selected tasks can be executed within the time range. This leads to good performance in terms of energy minimization while meeting the deadline constraints.

1.5 Document Layout

The remainder of this document is organized as follows.
Chapter 2 presents related work on scheduling for energy minimization. Chapter 3 presents the static slack allocation algorithm to minimize total energy consumption under the deadline constraints. Chapter 4 presents the dynamic slack allocation algorithms to minimize total energy consumption under the deadline constraints at runtime. Chapter 5 presents the static assignment algorithms. Chapter 6 presents the dynamic assignment algorithm to minimize total energy consumption. Chapter 7 presents conclusions and future work.

CHAPTER 2 RELATED WORK

There has been significant interest in the development of energy-aware scheduling algorithms, as energy is an important concern in many systems. Energy-aware scheduling algorithms can be classified by their goal: scheduling to minimize overall energy consumption, scheduling to balance energy consumption across processors, and so on. Scheduling with the goal of balancing energy is usually applicable in wireless sensor networks [74]. In most other cases, scheduling is done with the goal of energy minimization, which is the focus of this dissertation. Scheduling algorithms for energy minimization can be broadly classified according to:
- Whether the Dynamic Voltage Scaling (DVS) technique is used or not
- Whether they target independent tasks or dependent tasks (i.e., tasks with precedence relationships)
- Whether they target single processor systems or multiprocessor systems
- Whether they target homogeneous systems or heterogeneous systems
- Whether they are applied at compile time or at runtime

In the following, we briefly describe current work that addresses the above issues.

Several algorithms have been developed to minimize energy consumption without the DVS technique [27, 28, 61]. However, they do not perform well in DVS-enabled systems, and the DVS technique has been found to be a very effective method for reducing energy in DVS-enabled processors.
The proposed scheduling algorithms in this thesis focus on the DVS technique. Scheduling algorithms for energy minimization can also be classified by the characteristics of the tasks that make up the target applications: scheduling for independent tasks and scheduling for dependent tasks (i.e., tasks with precedence relationships). The precedence relationships are represented as a Directed Acyclic Graph (DAG) whose nodes represent computations and whose edges represent the dependencies between nodes. There is considerable research on DVS scheduling algorithms for independent tasks [3, 4, 5, 11, 12, 21, 23, 25, 26, 33, 34, 35, 38, 43, 49, 50, 52, 59, 60, 70, 72, 73, 76, 78, 79]. However, many applications are represented by DAGs, and the proposed scheduling algorithms in this thesis focus on DAG-based applications. Scheduling algorithms for energy minimization can also be categorized by whether the target system is a single-processor system or a multiprocessor system. There is considerable research on DVS scheduling algorithms for single-processor real-time systems [3, 4, 5, 25, 49, 50, 52, 72]. In practice, however, many applications are executed on multiprocessor real-time systems, and the proposed scheduling algorithms in this thesis target multiprocessor systems. Multiprocessor systems can further be divided into homogeneous and heterogeneous systems. While several prior scheduling algorithms for multiprocessor systems apply only to homogeneous systems, the proposed scheduling algorithms in this thesis are applicable to both homogeneous and heterogeneous systems. Finally, scheduling algorithms for energy minimization can be divided by whether they are applied at compile time (i.e., static algorithms) or at runtime (i.e., dynamic algorithms).
Several runtime approaches have been studied in the literature [4, 5, 21, 23, 33, 35, 47, 51, 59, 60, 76, 77, 78, 79]. However, most of these approaches have been developed for independent tasks [4, 5, 21, 23, 33, 35, 59, 60, 76, 79]. The proposed scheduling algorithms in this thesis include dynamic algorithms for DAG-based applications on multiprocessor systems as well as static algorithms.

As described in Chapter 1, a scheduling algorithm for energy minimization broadly consists of two steps: assignment, followed by slack allocation. Most prior research on scheduling DAGs on parallel machines for energy minimization has focused on the slack allocation step rather than on the assignment step. However, the assignment step is also very important for minimizing energy consumption. The proposed scheduling algorithms in this thesis therefore address both assignment and slack allocation. In the following sections, we present in detail the related work on static slack allocation, dynamic slack allocation, static assignment, and dynamic assignment for scheduling DAG-based applications on homogeneous and heterogeneous parallel processors.

2.1 Static Slack Allocation

There is considerable research on DVS scheduling algorithms for independent tasks [3, 11, 12, 25, 26, 34, 38, 43, 49, 50, 52, 59, 70, 72, 73]. Recently, several DVS-based slack allocation algorithms have been proposed for tasks with precedence relationships on multiprocessor real-time systems [6, 13, 22, 29, 45, 46, 48, 55, 57, 75]. Slack allocation algorithms (i.e., DVS schemes) can be divided into two main categories: non-optimal slack allocation and near-optimal slack allocation.
2.1.1 Non-optimal Slack Allocation

Slack is greedily allocated to tasks in decreasing or increasing order of their finish times [13], or allocated evenly to all possible tasks [45]. In [22], the scheduling algorithm iteratively assigns slack based on dynamic recalculation of priorities. The algorithms in [13, 22, 45] ignore the different energy profiles of tasks on different processors during slack allocation, which leads to poor energy reduction. Using these energy profiles can increase the energy savings [48, 55]. The static slack allocation algorithms described in [48, 55] work as follows:

- Divide the total available slack into equal partitions called unit slack.
- Iteratively execute the following until all the available slack is used: allocate the unit slack to the task(s) that lead to the maximum reduction in energy.

However, because of the dependency relationships among tasks in an assignment, the sum of the energy reductions of several tasks (i.e., tasks executed in parallel) may be higher than the highest energy reduction of a single task. In this case, allocating the slack one unit at a time to the single task with the highest energy reduction, as in [48, 55], leads to suboptimal slack allocation. Our scheme exploits this fact to determine a set of multiple independent tasks that cumulatively have the maximum energy reduction.

2.1.2 Near-optimal Slack Allocation

As a near-optimal slack allocation algorithm, a Linear Programming (LP) based approach has been developed [75]. In [75], the continuous voltage case is formulated as an LP problem whose objective is to minimize total energy consumption. The constraints include the deadline constraint of each task, the precedence relationships among tasks in the original DAG, and the relationships among tasks assigned to the same processor.
Since the formulation in [75] does not consider the communication time among tasks, we extend it by including communication time when representing the precedence relationships among tasks. The linear formulation for the continuous voltage case is as follows:

$$
\begin{aligned}
\text{Minimize} \quad & \sum_{i} f(x_i) \\
\text{subject to} \quad & x_i \ge compTime_i && \forall i \\
& startTime_i + x_i \le deadline_i && \forall i \\
& startTime_j + x_j + commTime_{ji} \le startTime_i && \forall i,\ \forall j \in pred_i \\
& startTime_{pPred_i} + x_{pPred_i} \le startTime_i && \forall i \\
& startTime_{source} \ge 0, \qquad startTime_{sink} \le deadline
\end{aligned}
$$

where x_i is the (possibly slowed) computation time of task i, f(x_i) is the energy model as a function of computation time, startTime_i is the start time of task i on its assigned processor, pred_i is the set of direct predecessors of task i in the DAG, pPred_i is the task assigned immediately before task i on the same processor, compTime_i is the computation time of task i on its assigned processor, and commTime_ij is the communication time between task i and task j on their assigned processors. The source and sink nodes are dummy nodes representing the start and end of the DAG, respectively; their computation times and the communication times of the edges connected to them are zero.

In general, the function f(x) is nonlinear. As an effective approximation, the convex objective function can be formulated as a piecewise linear function. The accuracy of this approximation increases with the number of intervals (i.e., with shorter intervals), which leads to more energy-efficient choices. Convex optimization problems with linear constraints and an objective function that is a sum of convex functions of independent variables can be solved in polynomial time [2, 24, 37], using a piecewise linear approximation of the energy function of each variable. In [24], the number of intervals of the piecewise linear function is proportional to 8n (in our case, n is the number of tasks).
This process has to be repeated multiple times to achieve the required level of accuracy. In practice, we found that significantly fewer intervals and a single iteration are sufficient to achieve an acceptable level of accuracy (i.e., the level beyond which the reduction in energy plateaus). The LP-based algorithm provides near-optimal solutions but requires substantial time and memory. Our scheme addresses these problems (i.e., time and memory) by combining a compatible task matrix, search space reduction techniques, and a lower bound, while still providing near-optimal solutions.

2.2 Dynamic Slack Allocation

Several runtime approaches for slack allocation have been studied in the literature [4, 5, 21, 23, 33, 35, 47, 51, 59, 60, 76, 77, 78, 79]. Most of these approaches have been developed for independent tasks [4, 5, 21, 23, 33, 35, 59, 60, 76, 79]. For tasks with precedence relationships on multiprocessor real-time systems, the algorithm in [51] uses a greedy technique (i.e., slack forwarding) that allocates the generated slack to the next ready task on the processor where the early-finishing task was executed. Although the greedy approach requires little time, its energy reduction is significantly lower than that obtained by applying static methods at runtime. Our methods show that more intelligent methods can further reduce energy requirements.

2.3 Static Assignment

The assignment algorithms used in scheduling for energy minimization can be classified into two broad categories: assignment to minimize finish time and assignment to minimize energy.

Assignment to minimize finish time: The goal of this assignment is to minimize the total finish time of a DAG. If the deadline constraints are met, appropriate slack is allocated to tasks in the second phase to minimize energy.
Assignment to minimize energy: This method tries to make assignments that lead to lower energy (before slack allocation) but may not meet the deadline constraints. Furthermore, even if such assignments minimize total energy consumption before slack allocation, they may not minimize the energy consumption after slack allocation, because the energy after slack allocation depends on the execution times, the available slack, and the energy profiles of the tasks.

Most prior scheduling algorithms for energy minimization use simple list assignment algorithms. The parallel computing literature contains a variety of algorithms that minimize the finish time of a DAG on a parallel machine. Prior research on scheduling DAG tasks to minimize total finish time has mainly focused on homogeneous environments [16, 41, 42, 54, 69, 71]. Scheduling algorithms such as the Dynamic Critical Path (DCP) algorithm [41], which perform well in a homogeneous environment, may not be efficient in a heterogeneous environment, since the computation time of a task may depend on the processor to which it is mapped. Several scheduling algorithms for heterogeneous environments have recently been proposed [8, 32, 44, 62, 64]. Most of them are based on static list scheduling heuristics that minimize the finish time of DAGs, for example, Dynamic Level Scheduling (DLS) [62], Heterogeneous Earliest Finish Time (HEFT) [64], and Iterative List Scheduling (ILS) [44]. At each step, the DLS algorithm selects a task to schedule and a processor on which the task will be executed. It has two features that can adversely affect its performance. First, it uses the earliest start time to select a processor for a task. This may not be effective in a heterogeneous environment, as the completion time of the task depends on the processor to which it is assigned.
Second, it uses the average computation time of a task across all processors to determine a critical task, which can cause an inaccurate estimation of task priorities. The HEFT algorithm reduces the cost of scheduling by using precalculated task priorities, and it uses the earliest finish time to select a processor. In general, this provides better performance than the DLS algorithm. However, since HEFT also uses the average computation time of a task across all processors to determine task priorities, it may produce an inaccurate ordering of task execution. To address this problem, the ILS algorithm generates an initial schedule using HEFT and iteratively improves it by updating the task priorities. While ILS has been shown to perform well [44], we show that the determination of task priorities can be improved by group-based assignment. This is because the calculated priorities have a degree of inaccuracy in a heterogeneous environment, as the assignment of future tasks is unknown.

Most existing algorithms for energy minimization are based on a single execution of assignment and slack allocation. To improve energy performance, iterative execution of assignment and slack allocation based on genetic algorithms or simulated annealing has been proposed. These approaches try out several assignments (or iteratively refine an assignment), and each assignment is followed by a slack allocation algorithm to determine its energy requirements.
The Genetic Algorithm (GA) based approach in [56, 57] consists of two interleaved steps:

- Processor selection for tasks based on a GA.
- For each processor selection, derivation of the best schedule, including the execution order of tasks, using another GA.

Each GA evolves its solutions via two-point crossover and mutation, starting from randomly generated initial solutions, and explores a large search space to find better solutions. Given each schedule obtained from the processor selection and the task ordering, a DVS-based slack allocation scheme is applied. Based on the authors' experimental results, this approach outperforms existing algorithms in terms of energy consumption. However, the assignment itself still does not consider the energy consumption after slack allocation. Also, evaluating the energy requirements of multiple solutions, each corresponding to a different assignment, requires considerable computational time.

2.4 Dynamic Assignment

There is little research on the dynamic scheduling of DAGs with the goal of energy minimization. Furthermore, existing dynamic scheduling algorithms have concentrated only on dynamic slack reallocation. However, as shown in this thesis, reassigning tasks (i.e., dynamic assignment) in addition to reallocating slack at runtime can further improve energy minimization.

CHAPTER 3 STATIC SLACK ALLOCATION

Slack allocation algorithms assume that the tasks have already been assigned to processors. The slack allocation problem can be posed as follows: allocate a variable amount of slack to each task so that the total energy is minimized while the deadlines are still met. Most prior slack allocation algorithms provide non-optimal solutions for energy minimization. They ignore the different energy profiles of tasks on different processors during slack allocation.
While some algorithms use the energy profiles for better energy minimization, they still ignore the dependency relationships among tasks in an assignment. All of them lead to poor energy reduction. To address these problems, our slack allocation algorithm incorporates the assignment-based dependency relationships among tasks as well as the different energy profiles of tasks. Unlike most algorithms, a Linear Programming (LP) based approach provides near-optimal solutions for energy minimization; however, it requires large computational time and memory. We introduce a slack allocation algorithm that provides close to optimal solutions for energy minimization but requires less computational time and memory than the LP-based approach.

3.1 Proposed Slack Allocation

The Path-based algorithm, our novel approach for energy minimization, is an iterative approach that allocates a small amount of slack (called unit slack) in each iteration and asks the following question: find the subset of tasks that can be allocated this unit slack so that the total energy consumption is minimized while the deadline constraint is still met. This process is applied iteratively until all the slack is used. We show that each iteration of the problem can be reduced to finding a maximum weighted independent set of tasks, where the weight of a task is the energy reduction obtained by allocating the unit slack to it. The dependency relationships in an assignment DAG constrain the total slack that can be allocated to the different tasks. For instance, in Figure 1-1, consider an example in which one unit of slack can be allocated (i.e., the deadline is 12 units). This one unit of slack can be used by either one or two tasks. If task 7 (or 1) is allocated the slack, no other task can use this slack without violating the deadline constraint.
Tasks 2 and 3 (or 2 & 4, 2 & 6, 4 & 5, 5 & 6) can use this slack concurrently, as they are not dependent on each other and both can be slowed down. The appropriate choice between the two options depends on the energy reduction of task 7 versus the sum of the energy reductions of tasks 2 and 3. Our slack allocation algorithm considers the overall assignment-based dependency relationships among tasks, whereas most existing algorithms ignore them. We define two phases:

- Phase 1: Slack allocation from the start time to the total finish time based on a given assignment. In this case, slack can be allocated only to the subset of tasks that are not on the critical path.
- Phase 2: Slack allocation from the total finish time to the deadline. In this case, slack can potentially be allocated to all tasks.

For instance, while there is no slack between the start time and the total finish time in Figure 1-1, in Figure 3-1 the slack from time 5 to 6 is considered for slack allocation in Phase 1. This slack can be allocated only to task 2. However, the slack from time 8 to 9 in Phase 2 can be allocated to a subset of tasks (e.g., 1, 2 & 3, or 4).

Phase 1 is executed before Phase 2 to obtain more energy savings by reducing the possibility of redundant slack allocation to the same tasks. In the example of Figure 3-1, assume that the energy reductions of tasks 1, 2, 3, and 4 obtained by allocating one time unit of slack are 1, 10, 1, and 10, respectively, and that the energy model follows a quadratic function. The total energy saving is 20 when slack is allocated to task 2 in Phase 1 and then to task 4 in Phase 2. In contrast, when slack is allocated to tasks 2 and 3 in Phase 2 and then to task 2 in Phase 1, the total energy saving is 16.6, a difference of 17%. For each of the two phases, our algorithm iteratively allocates one unit of slack (the size of this unit, called unitSlack, is a parameter).
In Phase 1, at each iteration over unitSlack, only the tasks with the maximum available slack are considered, because the number of tasks that can use slack is limited and each task has a different amount of available slack. Thus, the tasks considered may change from one iteration to the next. For instance, consider an example where only three tasks have available slack of 5, 4, and 3, respectively. In the first iteration, only the task with slack 5 is considered. In the next iteration, two tasks are considered, as both of them now have a slack of 4 (the first task has used one unit). This process is executed iteratively until no task can use slack before the total finish time. In contrast, in Phase 2 all tasks are considered for slack allocation at each iteration. The number of iterations in Phase 2 equals totalSlack divided by unitSlack, where totalSlack is the difference between the actual deadline and the total finish time. At each iteration, one unitSlack is allocated to one or more tasks so that the sum of the energy reductions over the full use of the unitSlack is maximized. Because each task is allocated either the entire unitSlack or no slack during each iteration, branch and bound techniques can be used to find the optimal slack allocation. The size of unitSlack can be reduced until further reduction no longer significantly improves the energy requirements.

Figure 3-1. Example of a DAG and assignment on two processors

3.2 Unit Slack Allocation

In this section, we present our slack allocation algorithm over a minimum indivisible unit of slack, called unitSlack, which finds the task set that uses unitSlack most effectively to minimize energy consumption. A key requirement of the slack allocation algorithm is to incorporate the assignment-based dependency relationships among tasks as well as the different energy profiles of tasks on different processors.
The slack allocation algorithm is motivated by the observation that each assignment-based path (a path consisting of tasks with precedence relationships in an assignment DAG) cannot receive more than one unitSlack. Furthermore, this slack cannot be allocated to more than one task on each path. In Figure 1-1, there are three assignment-based paths: 1-2-5-7 (Path1), 1-3-5-7 (Path2), and 1-3-4-6-7 (Path3). The maximum number of tasks that can be allocated unitSlack therefore equals the number of paths, and only one task along each of these three paths can be allocated the unitSlack. An implication is that two tasks on the same path of an assignment DAG cannot both be allocated unitSlack. Using a matrix that represents which tasks can share slack with a given task, the branch and bound search can be applied efficiently.

3.2.1 Maximum Available Slack for a Task

Each task has a different amount of maximum available slack, because the assignment must preserve the precedence relationships among tasks in the original DAG. This slack is normalized by unitSlack, i.e., the maximum number of unitSlacks that can be allocated to a task equals its maximum available slack divided by unitSlack. The maximum available slack of task i, slack_i, is defined as the difference between the latest start time of i, LST_i, and the earliest start time of i, EST_i.
The latest start time of task i, the earliest start time of task i, and the slack of task i are defined, respectively, by

$$LST_i = \min\Big( deadline_i,\ \min_{j \in succ_i}\big(LST_j - commTime_{ij}\big),\ LST_{pSucc_i} \Big) - compTime_i$$

$$EST_i = \max\Big( start_i,\ \max_{j \in pred_i}\big(EST_j + compTime_j + commTime_{ji}\big),\ EST_{pPred_i} + compTime_{pPred_i} \Big)$$

$$slack_i = LST_i - EST_i$$

where deadline_i is the deadline of task i, start_i is the start time of i, succ_i is the set of direct successors of i in the DAG, pSucc_i is the task assigned immediately after i on the same processor, pred_i is the set of direct predecessors of i in the DAG, and pPred_i is the task assigned immediately before i on the same processor. Note that in Phase 1 the deadline is assumed to equal the total finish time, unless the specified deadline of a task is earlier than the total finish time.

3.2.2 Compatible Task Matrix

The matrix represents, for each task, the tasks that can share unitSlack with it. If task i and task j are on the same assignment-based path, the elements m_ij and m_ji of the compatible task matrix M are set to zero; otherwise, they are set to one. The diagonal elements (m_ij with i = j) are set to zero. If the element for two tasks equals one, the two tasks can share unitSlack, because they are executed independently (in parallel). If it equals zero, the two tasks cannot share unitSlack, because only one task on each assignment-based path can receive unitSlack.

The assignment-based dependency relationships among tasks may change after slack allocation over unitSlack, in which case the compatible task matrix must be modified accordingly. The compatible task matrix M is defined by

$$
M = \begin{pmatrix}
m_{11} & m_{12} & \cdots & m_{1n} \\
m_{21} & m_{22} & \cdots & m_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
m_{n1} & m_{n2} & \cdots & m_{nn}
\end{pmatrix},
\qquad
m_{ij} = \begin{cases} 1 & \text{if } \Pi_i \cap \Pi_j = \emptyset \\ 0 & \text{otherwise} \end{cases}
$$

where m_ij indicates whether task i and task j can share slack, and Π_i is the set of assignment-based paths that include task i.
In Phase 2, n (the matrix is n by n) is the total number of tasks; in Phase 1, it is the number of tasks whose maximum available slack is the greatest. This matrix can be generated by performing a transitive closure on the assignment DAG and then taking the complement of the resulting matrix. The DAG structure can also be used to derive a list of ancestors for each task, and this list can be updated by performing a level-wise search of the DAG. In most cases the matrix is sparse, so it can be represented effectively by an array of lists (one per task). The compatible task list of task i consists of the tasks that do not share any path with task i; thus, the tasks in compatibleTask_i are exactly those that can share unitSlack with task i. The compatible task list of task i is defined by

$$compatibleTask_i = \{\, k \mid k \in \Gamma,\ m_{ik} = 1 \,\}$$

where Γ is the set of all tasks in the DAG. Figure 3-2 shows the compatible task matrix and lists for the example in Figure 1-1. Using the compatible task matrix or lists, we find the set of tasks that can share unitSlack such that the sum of their energy reductions is maximized. This corresponds to the maximum weighted independent set (MWIS) problem, which is known to be NP-hard [7, 53, 65]. Our approach to task scheduling for energy minimization addresses this problem using a branch and bound search, and we demonstrate its efficiency.

$$
M = \begin{pmatrix}
0&0&0&0&0&0&0\\
0&0&1&1&0&1&0\\
0&1&0&0&0&0&0\\
0&1&0&0&1&0&0\\
0&0&0&1&0&1&0\\
0&1&0&0&1&0&0\\
0&0&0&0&0&0&0
\end{pmatrix}
$$

compatibleTask_1 = [ ], compatibleTask_2 = [3, 4, 6], compatibleTask_3 = [2], compatibleTask_4 = [2, 5], compatibleTask_5 = [4, 6], compatibleTask_6 = [2, 5], compatibleTask_7 = [ ]

Figure 3-2. Compatible task matrix and lists for the example in Figure 1-1

3.2.3 Search Space Reduction

We reduce the search space by performing the following checks for each task: fully independent tasks, fully dependent tasks, and compressible tasks.
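All three checks operate on the compatible task lists of Section 3.2.2, which the text obtains by a transitive closure of the assignment DAG followed by a complement. The Python sketch below illustrates that construction and the first two checks (the classification rule that follows) on Figure 1-1. It is a minimal sketch under stated assumptions, not the dissertation's code: the successor-dictionary input, the function names, and the labels "allocate", "candidate", and "search" are hypothetical, and the edges of Figure 1-1 are assumed to be exactly those on its three assignment-based paths 1-2-5-7, 1-3-5-7, and 1-3-4-6-7. Two tasks are treated as sharing a path exactly when one is reachable from the other.

```python
from typing import Dict, List, Set


def transitive_closure(succ: Dict[int, List[int]]) -> Dict[int, Set[int]]:
    """reach[i] holds every task reachable from task i in the assignment DAG."""
    reach: Dict[int, Set[int]] = {}

    def visit(i: int) -> Set[int]:
        if i not in reach:
            reach[i] = set()
            for j in succ[i]:
                reach[i] |= {j} | visit(j)
        return reach[i]

    for i in succ:
        visit(i)
    return reach


def compatible_lists(succ: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Complement of the closure: i and j can share unitSlack iff neither reaches the other."""
    reach = transitive_closure(succ)
    tasks = sorted(succ)
    return {i: [j for j in tasks if j != i and j not in reach[i] and i not in reach[j]]
            for i in tasks}


def compatible_matrix(compat: Dict[int, List[int]]) -> List[List[int]]:
    """Dense form of M: m_ij = 1 iff j is in compatibleTask_i (diagonal stays 0)."""
    tasks = sorted(compat)
    return [[1 if j in compat[i] else 0 for j in tasks] for i in tasks]


def classify(compat: Dict[int, List[int]]) -> Dict[int, str]:
    """Apply the fully independent / fully dependent checks to each task."""
    n = len(compat)
    kinds: Dict[int, str] = {}
    for i, lst in compat.items():
        if len(lst) == n - 1:
            kinds[i] = "allocate"   # fully independent: always receives unitSlack
        elif not lst:
            kinds[i] = "candidate"  # fully dependent: a one-task candidate solution
        else:
            kinds[i] = "search"     # decided by the branch and bound search
    return kinds


# Assumed edges of the Figure 1-1 assignment DAG (read off its three paths).
fig11 = {1: [2, 3], 2: [5], 3: [4, 5], 4: [6], 5: [7], 6: [7], 7: []}
compat = compatible_lists(fig11)
print(compat)
print(classify(compat))
```

Under these assumptions the sketch reproduces the lists and matrix of Figure 3-2 and labels tasks 1 and 7 as fully dependent, with no fully independent task.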
The rule that classifies task i using the compatible task matrix is as follows: if m_ij = 1 for every j ≠ i, then allocate unitSlack to task i; else, if m_ij = 0 for every j, then consider task i as an allocable candidate; otherwise, consider task i in the search.

The equivalent rule using the compatible task lists is: if N(compatibleTask_i) = N(Γ) − 1, then allocate unitSlack to task i; else, if N(compatibleTask_i) = 0, then consider task i as an allocable candidate; otherwise, consider task i in the search. Here N(compatibleTask_i) is the number of tasks in compatibleTask_i and N(Γ) is the number of tasks.

3.2.3.1 Fully independent tasks

If a task lies only on an assignment-based path consisting of that task alone (i.e., it is independent of all other tasks), unitSlack is allocated to the task without search, regardless of the slack allocation of the other tasks.

3.2.3.2 Fully dependent tasks

If a task lies on all assignment-based paths (i.e., it is dependent on all other tasks), the task by itself is one of the candidate task sets to which unitSlack can be allocated. Thus, its energy reduction is compared with those of the other candidates, without including the task in the search. In Figure 3-2, tasks 1 and 7 are examples of fully dependent tasks.

3.2.3.3 Compressible tasks

Tasks that lie on exactly the same assignment-based paths can be represented by a single task for the purpose of slack allocation. The representative of a group of compressible tasks is the task with the maximum energy reduction in the group. This can substantially reduce runtime without decreasing energy performance. In the assignment DAG of Figure 3-3(a), tasks 3, 5, and 11 can be compressed and represented by a single representative task (e.g., 3), since they lie on exactly the same paths.
Using compatible task lists, we can check whether tasks can be compressed without enumerating the assignment-based paths that include them. The rule of compression based on compatible task lists is as follows:

$$i \text{ and } j \text{ are merged into } c_k \ \text{ if } \ compatibleTask_i = compatibleTask_j, \quad \text{with } i \text{ as the representative if } er_i \ge er_j$$

where c_k is the kth compressed task (including the representative of the compressed tasks) and er_i is the energy reduction of task i.

Figure 3-3 illustrates an initial assignment DAG and its compressed DAG for an application with 12 tasks. Figure 3-4 illustrates the compression of the compatible task lists for the example of Figure 3-3. In other words, (a) and (b) in Figure 3-4 represent, using compatible task lists, the assignment DAG of Figure 3-3(a) and the compressed assignment DAG of Figure 3-3(b), respectively. From the initial compatible task lists, the following tasks are compressed: (1, 12), (2, 9, 10), (3, 5, 11), and (4, 8). Each group of compressible tasks is represented by the task with the maximum energy reduction (e.g., 1, 2, 3, 8). The second column of Figure 3-4(b) shows the compatible task lists after compression. Any fully independent task (e.g., 2) is automatically allocated unitSlack and excluded from the search by removing it from the compatible task lists of the other tasks and from its own. Once the fully independent task has been removed from the compatible task lists, task 1 is identified as fully dependent and is also excluded from the search; it is considered a feasible solution without any search. The remaining tasks, other than the fully independent and fully dependent ones, participate in the search. To avoid redundant traversal during the search, each pair of mutually compatible tasks is kept only once: the lower-indexed task is removed from the compatible task list of the higher-indexed one. For instance, task 6 appears in the compressed compatible task list of task 7, but it is removed because the compatible task list of task 6 already includes task 7.
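The compression, independent-task removal, and reduction steps described above can be sketched in Python as follows. The input is the compatible task lists of Figure 3-4(a) as printed; the energy reductions in `er` are hypothetical values chosen only so that the representatives come out as tasks 1, 2, 3, and 8, as stated in the text, and the function name `compress_and_reduce` is illustrative. Grouping tasks by identical compatible lists serves as the compression test, following the rule above, and only one pass of independent-task removal is made, mirroring the walkthrough.

```python
from typing import Dict, List


def compress_and_reduce(compat: Dict[int, List[int]], er: Dict[int, float]):
    # 1. Compression: tasks with identical compatible lists lie on the same
    #    assignment-based paths; the task with the largest er represents them.
    groups: Dict[frozenset, List[int]] = {}
    for i in sorted(compat):
        groups.setdefault(frozenset(compat[i]), []).append(i)
    rep: Dict[int, int] = {}
    for members in groups.values():
        best = max(members, key=lambda t: er[t])
        for t in members:
            rep[t] = best
    reps = sorted(set(rep.values()))
    compressed = {r: sorted({rep[j] for j in compat[r]}) for r in reps}

    # 2. Fully independent tasks (compatible with every other task) get unitSlack
    #    and are removed from all lists.
    n = len(compressed)
    independent = [i for i in reps if len(compressed[i]) == n - 1]
    lists = {i: [j for j in compressed[i] if j not in independent]
             for i in reps if i not in independent}

    # 3. Fully dependent tasks (now compatible with no task) become feasible solutions.
    dependent = [i for i in lists if not lists[i]]

    # 4. Reduction: keep each compatible pair once, in the lower-indexed task's list.
    reduced = {i: [j for j in lists[i] if j > i] for i in lists if i not in dependent}
    return compressed, independent, dependent, reduced


# Compatible task lists of Figure 3-4(a).
fig34a = {
    1: [2, 9, 10], 2: [1, 3, 4, 5, 6, 7, 8, 12], 3: [2, 4, 8, 9, 10],
    4: [2, 3, 5, 6, 7, 9, 10, 11], 5: [2, 4, 8, 9, 10], 6: [2, 4, 7, 8, 9, 10],
    7: [2, 4, 6, 8, 9, 10], 8: [2, 3, 5, 6, 7, 9, 10, 11],
    9: [1, 3, 4, 5, 6, 7, 8, 12], 10: [1, 3, 4, 5, 6, 7, 8, 12],
    11: [2, 4, 8, 9, 10], 12: [2, 9, 10],
}
# Hypothetical energy reductions (not from the dissertation).
er = {1: 4.0, 2: 3.0, 3: 5.0, 4: 1.0, 5: 2.0, 6: 1.0,
      7: 1.0, 8: 2.0, 9: 1.0, 10: 2.0, 11: 1.0, 12: 1.0}
compressed, independent, dependent, reduced = compress_and_reduce(fig34a, er)
print(compressed)
print(independent, dependent, reduced)
```

With these inputs the sketch yields the two columns of Figure 3-4(b): task 2 is allocated unitSlack, task 1 is a feasible solution, and the search runs over tasks 3, 6, 7, and 8.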
The third column of Figure 3-4(b) shows the reduced compatible task lists after compression. The search is finally performed on tasks 3, 6, 7, and 8 based on the reduced compatible task lists. The search considers two solutions, {3, 8} and {6, 7, 8}, in addition to the fully dependent task (i.e., 1), as feasible solutions.

[Figure 3-3 image: (a) the 12-task assignment DAG with a dummy start node S; (b) the compressed assignment DAG over tasks 1, 2, 3, 6, 7, and 8. Legend: start (dummy) node; compressed task.]

Figure 3-3. Compression of assignment DAG: (a) Assignment DAG, (b) Compressed assignment DAG

(a)

| Task | Compatible tasks |
|---|---|
| 1 | [2, 9, 10] |
| 2 | [1, 3, 4, 5, 6, 7, 8, 12] |
| 3 | [2, 4, 8, 9, 10] |
| 4 | [2, 3, 5, 6, 7, 9, 10, 11] |
| 5 | [2, 4, 8, 9, 10] |
| 6 | [2, 4, 7, 8, 9, 10] |
| 7 | [2, 4, 6, 8, 9, 10] |
| 8 | [2, 3, 5, 6, 7, 9, 10, 11] |
| 9 | [1, 3, 4, 5, 6, 7, 8, 12] |
| 10 | [1, 3, 4, 5, 6, 7, 8, 12] |
| 11 | [2, 4, 8, 9, 10] |
| 12 | [2, 9, 10] |

(b)

| Task | Compatible tasks (compression) | Compatible tasks (reduction) |
|---|---|---|
| 1 | [2] | Feasible solution |
| 2 | [1, 3, 6, 7, 8] | Allocate unitSlack |
| 3 | [2, 8] | [8] |
| 6 | [2, 7, 8] | [7, 8] |
| 7 | [2, 6, 8] | [8] |
| 8 | [2, 3, 6, 7] | [ ] |

Figure 3-4. Compression of compatible task lists: (a) Compatible task lists in a given assignment, (b) Compressed and reduced compatible task lists

3.2.4 Branch and Bound Search

The energy reduction of a task is defined as the difference between its original energy and its energy expected after allocating a unitSlack to it. A branch and bound algorithm is used to search all possible compatible solutions to determine the one with the maximum energy reduction. The feasible states in the state space consist of all compatible subsets of tasks. We use a Depth First Search (DFS) to search efficiently through all possible subsets of compatible tasks. The advantage of DFS is that, during the search, it stores only one search path, representing a candidate task set to which unitSlack can be allocated.
By maintaining a running lower bound from the energy reduction of the search paths traversed so far, we apply bounding heuristics that eliminate the parts of the search space where a better solution cannot be found. At any node of the state-space tree, the set of possible search options is limited to the tasks in the intersection of the compatible task lists of all tasks on the path from the root to that node. Each node in the search graph has its own explorable task list, which indicates the tasks that can be explored as its child nodes. The explorable task list of node i, which contains task k, is defined by

$$\mathrm{explorableTask}_i = \begin{cases} \mathrm{compatibleTask}_k, & \text{if } parent_i \text{ is the root} \\ \mathrm{explorableTask}_{parent_i} \cap \mathrm{compatibleTask}_k, & \text{otherwise} \end{cases}$$

The cost of node x is defined as c(x) = f(x) + g(x), where f(x) is the sum of the energy reductions of the tasks from the root to node x, and g(x) is an estimate of the sum of the energy reductions of the tasks in the child nodes of x. g(x) is obtained as the sum of the energy reductions of the tasks in the explorable task list of x, and is therefore an upper bound on the energy reduction that the child nodes can add. Thus, when a node x is reached during the search, it is pruned if c(x) is lower than the lower bound; otherwise, it is expanded. At a leaf node x, c(x) equals the actual sum of the energy reductions of the tasks on the search path. If c(x) at a leaf node x is greater than the lower bound, the lower bound is updated to c(x) and the search path becomes the candidate solution. When the search terminates, the candidate is the optimal task set for the current unitSlack.

Figure 3-5 illustrates the reduction of the compatible task lists of Figure 3-2 and its use in exploring a search graph. Through the search, five solutions, {2, 3}, {2, 4}, {2, 6}, {4, 5}, and {5, 6}, are considered as feasible solutions to which unitSlack can be allocated, in addition to fully dependent tasks (e.g., tasks 1 and 7).

Figure 3-5. Reduced compatible task lists and search graph

| Task | Reduced compatible tasks |
|---|---|
| 2 | [3, 4, 6] |
| 3 | [ ] |
| 4 | [5] |
| 5 | [6] |
| 6 | [ ] |

3.2.5 Estimating the Lower Bound to Reduce the Search Space

Finding the set of tasks that maximizes the sum of energy reductions when unitSlack is allocated can be cast as a maximum weighted independent set (MWIS) problem. The authors in [53] showed that simple greedy algorithms for MWIS are guaranteed to find a set whose weight is at least

$$\sum_{v \in V(G)} \frac{W(v)}{d(v) + 1},$$

where W(v) is the weight of vertex v in a graph G and d(v) is the degree of v. We adapt this guaranteed minimum weight to our problem to obtain an initial lower bound. Here a vertex is a task, its weight is its energy reduction, and an edge joins two tasks that are not compatible, so task i has degree N(s) - 1 - N(compatibleTask_i). The lower bound, lowerbound, is initialized as follows:

$$\mathrm{lowerbound} = \sum_{i \in s} \frac{er_i}{N(s) - N(\mathrm{compatibleTask}_i)},$$

where s is the set of tasks participating in the search, N(s) is the number of tasks in s, N(compatibleTask_i) is the number of tasks in compatibleTask_i, and er_i is the energy reduction obtained by allocating unitSlack to task i.

If the set of fully dependent tasks is nonempty, the lower bound is compared with the energy reduction of each fully dependent task. In the example of Figure 3-2, before the search, the lower bound is replaced by the maximum energy reduction among fully dependent tasks 1 and 7 if the initial lower bound is lower. The fully dependent task (1 or 7) with the maximum energy reduction then becomes a feasible solution for slack allocation. Furthermore, at each iteration, unless the assignment-based dependency relationships among tasks have changed since the previous step, the energy reduction of the previous step's solution (i.e., the sum of the energy reductions of the tasks to which unitSlack was allocated in the previous step) can be used as the lower bound for the next unit slack allocation.
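A compact Python sketch of the depth-first branch and bound of Section 3.2.4, seeded with the initial lower bound of Section 3.2.5, is given below. It is an illustration under assumptions rather than the author's code: the energy reductions `er` are made-up values, the reduced lists are those of Figure 3-5, the full compatible lists needed for N(compatibleTask_i) are rebuilt by symmetry from the reduced ones, and all function names are hypothetical. g(x) is computed from each node's explorable list, so a node is pruned as soon as f(x) + g(x) falls below the best value found so far.

```python
from typing import Dict, List, Set, Tuple


def initial_lower_bound(er: Dict[int, float], compatible: Dict[int, Set[int]]) -> float:
    """MWIS-style guarantee: sum of er_i / (d_i + 1), d_i = N(s) - 1 - N(compatibleTask_i)."""
    n = len(er)
    return sum(er[i] / (n - len(compatible[i])) for i in er)


def best_unit_slack_set(
    er: Dict[int, float],
    reduced: Dict[int, Set[int]],
    fully_dependent: Dict[int, float],
) -> Tuple[float, List[int]]:
    """Depth-first branch and bound over compatible subsets of the search tasks."""
    # Rebuild full (symmetric) compatible lists from the reduced ones for the bound.
    compatible = {i: set(reduced[i]) for i in er}
    for i in er:
        for k in reduced[i]:
            compatible[k].add(i)

    best_value = initial_lower_bound(er, compatible)
    best_set: List[int] = []
    # A fully dependent task is a feasible solution on its own.
    for task, value in fully_dependent.items():
        if value >= best_value:
            best_value, best_set = value, [task]

    def dfs(path: List[int], f: float, explorable: Set[int]) -> None:
        nonlocal best_value, best_set
        g = sum(er[k] for k in explorable)  # upper bound on what children can add
        if f + g < best_value:              # c(x) below the lower bound: prune
            return
        if not explorable:                  # leaf: c(x) equals f(x)
            if f > best_value or not best_set:
                best_value, best_set = f, list(path)
            return
        for k in sorted(explorable):
            path.append(k)
            # Child's explorable list = parent's list intersected with compatibleTask_k.
            dfs(path, f + er[k], explorable & reduced[k])
            path.pop()

    dfs([], 0.0, set(er))  # the root may branch to any search task
    return best_value, best_set


er = {2: 5.0, 3: 2.0, 4: 3.0, 5: 4.0, 6: 1.0}                 # hypothetical values
reduced = {2: {3, 4, 6}, 3: set(), 4: {5}, 5: {6}, 6: set()}  # Figure 3-5
print(best_unit_slack_set(er, reduced, {1: 6.5, 7: 4.0}))      # (8.0, [2, 4])
```

With these values the MWIS estimate is 17/3, the fully dependent task 1 raises the bound to 6.5, and the search returns {2, 4}. If task 1's reduction is raised to 9.0, every search node is pruned and task 1 alone is returned.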
3.3 Experimental Results

We compare the performance of our DVS algorithm (PathDVS) with two others: the DVS algorithm in [48, 55] that allocates slack to the task(s) with the highest energy reduction (EProfileDVS), and the greedy slack allocation based DVS algorithm in [13] (GreedyDVS). All the DVS algorithms assume that the assignment of tasks to processors has already been completed. Two assignment strategies are used: ICP, which assigns based on the earliest finish time (presented in Chapter 5), and CPS, which assigns based on the earliest possible start time [48]. We also compare PathDVS with LPDVS, an extension of the formulation in [75] that incorporates communication costs. Both PathDVS and LPDVS provide near-optimal solutions, whose quality is controlled by the size of unitSlack and the number of intervals, respectively. In these experiments, the size of unitSlack and the number of intervals are set to the best values obtained empirically. For LPDVS, CPLEX v. 10.0 [14] was used to solve the LP problem, with the convex objective function approximated by a piecewise linear function.

3.3.1 Simulation Methodology

In this section, we describe the DAG generation and the performance measures used in our experiments.

3.3.1.1 The DAG generation

To show the performance of the proposed static slack allocation algorithm in both heterogeneous and homogeneous environments, we randomly generated a large number of synthetic graphs with 100, 200, 300, and 400 tasks. For heterogeneous systems, the execution time of each task on each processor at the maximum voltage varies from 10 to 40 units, and the communication time between a task and its child task for a pair of processors varies from 1 to 4 units.
For homogeneous systems, within the same range, the execution time of each task on all processors at the maximum voltage varies from 10 to 40 units, and all communication times between tasks on different processors are set to 2 units. The energy consumed to execute each task varies from 10 to 80. The graphs are executed on 4, 8, and 16 processors. For each combination of number of tasks and number of processors, 20 different synthetic graphs are generated.

3.3.1.2 Performance measures

Performance is measured in terms of normalized total energy consumption, that is, the total energy normalized by the energy obtained from an assignment algorithm without a DVS scheme. The deadline is determined by

deadline = (1 + deadline extension rate) × maximum total finish time from assignments without a DVS scheme

We provide experimental results for deadline extension rates of 0.0 (no deadline extension), 0.01, 0.02, 0.05, 0.1, 0.2, 0.3, and 0.4.

3.3.2 Memory Requirements

The size of the compatible task matrix is O(n^2). This matrix is generally sparse and can be reduced to O(kn) using lists, where n is the number of tasks and k is a constant representing the number of compatible tasks per task. At every level of the search, a list of explorable tasks, bounded in size by O(n), is stored; this list shrinks at each level and becomes empty at a leaf node. Our branch and bound method uses DFS and stores only one path, whose length is the number of tasks that can be allocated slack together; this length should be O(min(n, p)), where p is the number of processors. Thus the number of variables stored during the search is O(n), and the overall memory requirement of our algorithm is O(kn + n), which can be further reduced by the search space reduction techniques. The number of variables required for LPDVS is O(n × number of intervals), and its memory requirement depends on the actual implementation of linear programming.
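To make the O(n^2) versus O(kn) comparison concrete, the following Python sketch converts a dense boolean compatibility matrix into per-task lists that store only compatible pairs. The helper name and the 4-task matrix are hypothetical illustrations, not data from the experiments.

```python
from typing import Dict, List, Sequence


def matrix_to_lists(matrix: Sequence[Sequence[bool]]) -> Dict[int, List[int]]:
    """Convert an n x n compatibility matrix (O(n^2) cells) into per-task lists
    holding only compatible pairs (O(kn) entries). Tasks are numbered from 1."""
    return {
        i + 1: [j + 1 for j, is_compatible in enumerate(row) if is_compatible]
        for i, row in enumerate(matrix)
    }


matrix = [
    [False, True, False, False],
    [True, False, True, True],
    [False, True, False, False],
    [False, True, False, False],
]
lists = matrix_to_lists(matrix)
stored = sum(len(v) for v in lists.values())
print(lists, stored, len(matrix) ** 2)  # {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]} 6 16
```

The lists keep 6 entries where the matrix keeps 16 cells; the gap widens as n grows while the number of compatible tasks per task stays small.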
Using CPLEX on a machine with 2 GB of memory, the maximum number of tasks that LPDVS could reliably handle was around 200 for a 0.4 deadline extension rate (i.e., 400 piecewise linear intervals per task). In contrast, we were able to run PathDVS on DAGs of 1000 tasks, as shown in Figure 3-6.

Figure 3-6. Runtime of PathDVS with respect to different sizes of DAGs (unit: ms) [plot: runtime vs. number of tasks; 8 processors, 0.4 deadline extension rate]

3.3.3 Determining the Size of Unit Slack and the Number of Intervals

Figure 3-7 compares the energy consumption of PathDVS for different sizes of unitSlack. The size of unitSlack is determined as a fraction of the total finish time (i.e., unitSlack = totalFinishTime × unitSlackRate). The energy performance of our slack allocation algorithm depends on the size of unitSlack: in general, a smaller unitSlack yields more energy savings but increases runtime. However, the size of unitSlack can be limited to a level below which further reduction does not significantly improve energy requirements. Based on the results, a unitSlackRate of 0.0005 does not give a significant energy improvement over 0.001: while a unitSlackRate of 0.001 improves energy by 7-10% over 0.01, the difference between 0.001 and 0.0005 is less than 0.3%. Thus the size of unitSlack corresponding to a unitSlackRate of 0.001 is a reasonable choice.

Figure 3-7. Normalized energy consumption of PathDVS with respect to different unit slack rates for different numbers of tasks: (a) 100 tasks and (b) 200 tasks [plot: normalized energy vs. unit slack rate (0.1, 0.01, 0.001, 0.0005); one curve per deadline extension rate (0, 0.1, 0.2, 0.3, 0.4)]

The authors in [24] suggest that an LP problem with a convex objective function and linear constraints can be solved optimally using 8n intervals for the piecewise linear function that approximates the convex function, where n is the number of tasks. However, we found that in practice a smaller number of intervals is sufficient for our target applications. For each task, the time span that is divided into intervals for the piecewise linear function equals the maximum slack available to that task (i.e., deadline extension rate × total finish time + the slack available before the total finish time, or the slack available based on the minimum voltage); dividing beyond this span is unnecessary and requires more computational time. The total slack available to each task can be approximately bounded by the total available slack (i.e., deadline - total finish time before slack allocation). The number of intervals is proportional to the deadline extension rate divided by the interval rate (i.e., number of intervals ≈ deadline extension rate / intervalRate). Figure 3-8 compares the energy consumption of LPDVS for different interval rates, which determine how finely the objective function is divided. Based on the results, an intervalRate of 0.0005 does not give a significant energy improvement over 0.001: there is a 2-8% improvement with an intervalRate of 0.001 over 0.01, but only a 0.05% difference between 0.001 and 0.0005. Thus the interval length corresponding to an intervalRate of 0.001 is a reasonable choice.
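The sizing rules above can be checked with a few lines of arithmetic. In the sketch below the function name and the total finish time of 1000 are illustrative assumptions, while the formulas (the deadline, unitSlack = totalFinishTime × unitSlackRate, and number of intervals ≈ deadline extension rate / intervalRate) follow the text. With both rates at 0.001 and a 0.4 deadline extension rate, it reproduces the 400 intervals per task quoted for LPDVS; it also shows that the number of unitSlack quanta contained in the deadline extension depends only on the two rates, not on the total finish time.

```python
from typing import Dict


def slack_parameters(total_finish_time: float, extension_rate: float,
                     unit_slack_rate: float = 0.001,
                     interval_rate: float = 0.001) -> Dict[str, float]:
    """Deadline, unitSlack size and interval count implied by the rates (illustrative)."""
    deadline = (1.0 + extension_rate) * total_finish_time
    extension = extension_rate * total_finish_time    # deadline - total finish time
    unit_slack = unit_slack_rate * total_finish_time  # unitSlack = totalFinishTime x unitSlackRate
    return {
        "deadline": deadline,
        "unit_slack": unit_slack,
        # unitSlack quanta inside the deadline extension only; slack that already
        # exists before the total finish time is not counted here
        "unit_slack_steps": round(extension / unit_slack) if unit_slack else 0,
        # number of intervals ~ deadline extension rate / intervalRate
        "intervals_per_task": round(extension_rate / interval_rate),
    }


for rate in (0.1, 0.4):
    print(rate, slack_parameters(1000.0, rate))
```

Both counts grow linearly with the deadline extension rate, which is consistent with the linear growth of PathDVS runtime with the extension rate reported in the runtime comparisons.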
Figure 3-8. Normalized energy consumption of LPDVS with respect to different interval rates for different numbers of tasks: (a) 100 tasks and (b) 200 tasks [plot: normalized energy vs. interval rate (0.1, 0.01, 0.001, 0.0005); one curve per deadline extension rate (0, 0.1, 0.2, 0.3, 0.4)]

3.3.4 Homogeneous Environments

In this section, we show the performance of the proposed static slack allocation algorithm in homogeneous environments, where the computation time of each task and the communication time among tasks are the same on all processors.

3.3.4.1 Comparison of energy requirements

Tables 3-1, 3-2, 3-3, and 3-4 show the improvement of PathDVS over EProfileDVS and GreedyDVS with respect to different assignments and different numbers of processors for different numbers of tasks in homogeneous environments. PathDVS considerably outperforms the existing DVS algorithms regardless of the assignment algorithm used. For instance, given the ICP assignment, PathDVS improves by 12-29% over EProfileDVS and 60-70% over GreedyDVS with a 0.4 deadline extension rate. The results show that the performance improvement of PathDVS is higher for larger numbers of processors.

Table 3-1. Results for 100 tasks in homogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage; columns 0 to 0.4 are deadline extension rates)

| Processors | Assignment | Over | 0 | 0.01 | 0.02 | 0.05 | 0.1 | 0.2 | 0.3 | 0.4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | ICP | EProfileDVS | 0.26 | 1.16 | 1.82 | 3.55 | 5.86 | 8.10 | 9.03 | 9.81 |
| 4 | ICP | GreedyDVS | 2.18 | 5.96 | 9.34 | 17.69 | 28.19 | 42.41 | 51.90 | 59.08 |
| 4 | CPS | EProfileDVS | 0.08 | 1.02 | 1.88 | 3.67 | 5.53 | 7.60 | 9.02 | 9.94 |
| 4 | CPS | GreedyDVS | 2.19 | 6.08 | 9.53 | 18.04 | 28.49 | 42.60 | 52.24 | 59.41 |
| 8 | ICP | EProfileDVS | 1.00 | 2.59 | 3.97 | 7.24 | 11.24 | 16.06 | 18.97 | 20.64 |
| 8 | ICP | GreedyDVS | 4.97 | 9.40 | 13.35 | 22.80 | 34.35 | 49.20 | 58.70 | 65.34 |
| 8 | CPS | EProfileDVS | 0.20 | 1.98 | 3.47 | 7.37 | 11.31 | 15.77 | 18.25 | 19.86 |
| 8 | CPS | GreedyDVS | 3.32 | 8.11 | 12.26 | 22.06 | 33.59 | 48.34 | 57.75 | 64.44 |
| 16 | ICP | EProfileDVS | 1.67 | 4.99 | 7.54 | 13.55 | 19.59 | 24.12 | 26.24 | 27.37 |
| 16 | ICP | GreedyDVS | 7.15 | 13.43 | 18.78 | 30.45 | 42.80 | 56.23 | 64.35 | 70.02 |
| 16 | CPS | EProfileDVS | 0.54 | 3.75 | 6.40 | 12.67 | 18.37 | 23.89 | 26.09 | 27.27 |
| 16 | CPS | GreedyDVS | 6.41 | 13.27 | 18.81 | 30.61 | 42.62 | 56.23 | 64.34 | 70.06 |

Table 3-2. Results for 200 tasks in homogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage; columns 0 to 0.4 are deadline extension rates)

| Processors | Assignment | Over | 0 | 0.01 | 0.02 | 0.05 | 0.1 | 0.2 | 0.3 | 0.4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | ICP | EProfileDVS | 0.21 | 1.32 | 2.17 | 4.08 | 6.24 | 8.59 | 10.03 | 11.14 |
| 4 | ICP | GreedyDVS | 1.30 | 5.84 | 9.59 | 18.36 | 28.95 | 43.07 | 52.65 | 59.84 |
| 4 | CPS | EProfileDVS | 0.08 | 1.23 | 2.14 | 4.32 | 6.86 | 10.43 | 12.41 | 14.07 |
| 4 | CPS | GreedyDVS | 1.23 | 5.78 | 9.53 | 18.46 | 29.23 | 44.00 | 53.77 | 61.06 |
| 8 | ICP | EProfileDVS | 0.37 | 2.66 | 4.23 | 8.04 | 12.72 | 18.39 | 21.51 | 23.54 |
| 8 | ICP | GreedyDVS | 2.71 | 8.29 | 12.74 | 23.01 | 34.98 | 50.05 | 59.49 | 66.18 |
| 8 | CPS | EProfileDVS | 0.20 | 2.13 | 4.08 | 8.26 | 13.11 | 18.31 | 20.79 | 22.74 |
| 8 | CPS | GreedyDVS | 2.18 | 7.77 | 12.29 | 22.72 | 34.82 | 49.73 | 58.88 | 65.55 |
| 16 | ICP | EProfileDVS | 1.25 | 3.42 | 5.37 | 9.93 | 15.58 | 22.37 | 26.04 | 28.15 |
| 16 | ICP | GreedyDVS | 4.88 | 10.50 | 15.21 | 26.04 | 38.52 | 53.85 | 63.17 | 69.44 |
| 16 | CPS | EProfileDVS | 0.25 | 2.76 | 4.95 | 10.06 | 15.80 | 22.61 | 26.41 | 28.53 |
| 16 | CPS | GreedyDVS | 3.81 | 9.80 | 14.60 | 25.52 | 38.06 | 53.52 | 62.88 | 69.16 |

Table 3-3. Results for 300 tasks in homogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage; columns 0 to 0.4 are deadline extension rates)

| Processors | Assignment | Over | 0 | 0.01 | 0.02 | 0.05 | 0.1 | 0.2 | 0.3 | 0.4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | ICP | EProfileDVS | 0.10 | 0.80 | 1.22 | 2.30 | 3.86 | 6.49 | 8.91 | 10.50 |
| 4 | ICP | GreedyDVS | 1.36 | 5.58 | 8.98 | 17.14 | 27.36 | 41.88 | 52.17 | 59.63 |
| 4 | CPS | EProfileDVS | 0.02 | 0.81 | 1.27 | 2.36 | 3.64 | 5.41 | 6.34 | 7.19 |
| 4 | CPS | GreedyDVS | 0.94 | 5.30 | 8.77 | 17.02 | 27.10 | 41.23 | 50.91 | 58.24 |
| 8 | ICP | EProfileDVS | 0.47 | 3.34 | 5.69 | 11.45 | 17.89 | 25.14 | 30.68 | 34.62 |
| 8 | ICP | GreedyDVS | 1.89 | 8.80 | 14.05 | 25.91 | 39.25 | 54.74 | 64.68 | 71.46 |
| 8 | CPS | EProfileDVS | 0.06 | 3.15 | 5.44 | 10.72 | 16.87 | 23.58 | 27.77 | 30.60 |
| 8 | CPS | GreedyDVS | 1.57 | 8.65 | 13.94 | 25.52 | 38.45 | 53.65 | 63.04 | 69.52 |
| 16 | ICP | EProfileDVS | 0.68 | 4.31 | 7.01 | 13.23 | 19.72 | 26.67 | 29.95 | 31.73 |
| 16 | ICP | GreedyDVS | 3.76 | 11.13 | 16.74 | 28.82 | 41.93 | 56.90 | 65.51 | 71.28 |
| 16 | CPS | EProfileDVS | 0.62 | 4.53 | 7.02 | 13.15 | 19.64 | 26.58 | 29.73 | 31.54 |
| 16 | CPS | GreedyDVS | 3.21 | 10.90 | 16.60 | 28.69 | 41.70 | 56.59 | 65.20 | 71.00 |

Table 3-4. Results for 400 tasks in homogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage; columns 0 to 0.4 are deadline extension rates)

| Processors | Assignment | Over | 0 | 0.01 | 0.02 | 0.05 | 0.1 | 0.2 | 0.3 | 0.4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | ICP | EProfileDVS | 0.09 | 1.20 | 2.05 | 4.40 | 7.25 | 11.30 | 14.59 | 16.66 |
| 4 | ICP | GreedyDVS | 0.83 | 5.91 | 9.84 | 18.93 | 29.90 | 44.91 | 55.24 | 62.49 |
| 4 | CPS | EProfileDVS | 0.02 | 1.22 | 2.02 | 3.82 | 6.03 | 8.54 | 9.62 | 10.18 |
| 4 | CPS | GreedyDVS | 0.62 | 5.70 | 9.59 | 18.40 | 29.00 | 43.14 | 52.59 | 59.56 |
| 8 | ICP | EProfileDVS | 0.22 | 2.20 | 3.69 | 7.06 | 11.32 | 17.02 | 21.29 | 24.70 |
| 8 | ICP | GreedyDVS | 1.48 | 7.27 | 11.70 | 21.78 | 33.85 | 49.29 | 59.53 | 66.85 |
| 8 | CPS | EProfileDVS | 0.04 | 1.90 | 3.23 | 6.90 | 11.33 | 18.70 | 23.77 | 27.56 |
| 8 | CPS | GreedyDVS | 1.51 | 7.21 | 11.59 | 21.86 | 33.88 | 50.36 | 60.90 | 68.19 |
| 16 | ICP | EProfileDVS | 0.56 | 4.91 | 8.09 | 14.42 | 21.52 | 28.41 | 31.38 | 33.31 |
| 16 | ICP | GreedyDVS | 2.47 | 10.37 | 16.17 | 28.53 | 41.76 | 56.85 | 65.52 | 71.28 |
| 16 | CPS | EProfileDVS | 0.29 | 4.75 | 7.58 | 13.62 | 20.33 | 27.92 | 31.63 | 33.46 |
| 16 | CPS | GreedyDVS | 1.76 | 10.39 | 16.26 | 28.75 | 42.03 | 57.09 | 65.76 | 71.52 |

Figure 3-9 shows the energy comparison of the DVS algorithms (i.e., PathDVS, EProfileDVS, and GreedyDVS) using ICP for different numbers of tasks.
The results show that the performance improvement of PathDVS over the other DVS algorithms generally increases as the deadline extension rate increases. Table 3-5 compares the energy of PathDVS and LPDVS in homogeneous environments. Note that the comparison is limited to 200 tasks, as this was the largest problem that we were able to solve using LPDVS on our workstation. The unitSlackRate for PathDVS and the intervalRate for LPDVS are set to 0.001. These results show that the two algorithms are comparable in energy minimization, with PathDVS slightly better in most cases.

Figure 3-9. Normalized energy consumption of slack allocation algorithms with respect to different deadline extension rates for different numbers of tasks: (a) 100 tasks, (b) 200 tasks, (c) 300 tasks, and (d) 400 tasks [plot: normalized energy of GreedyDVS, EProfileDVS, and PathDVS vs. deadline extension rate (0 to 0.4)]

Table 3-5. Normalized energy consumption of PathDVS and LPDVS with respect to different deadline extension rates in homogeneous environments (positive difference indicates that PathDVS performs better than LPDVS)

| Deadline extension rate | LPDVS (100 tasks) | PathDVS (100 tasks) | Difference (100 tasks) | LPDVS (200 tasks) | PathDVS (200 tasks) | Difference (200 tasks) |
|---|---|---|---|---|---|---|
| 0 | 0.962454 | 0.962451 | 0.000003 | 0.978646 | 0.978653 | -0.000007 |
| 0.01 | 0.921541 | 0.921532 | 0.000009 | 0.924348 | 0.924422 | -0.000074 |
| 0.02 | 0.888233 | 0.888223 | 0.000010 | 0.883956 | 0.884003 | -0.000047 |
| 0.05 | 0.809833 | 0.809764 | 0.000069 | 0.793064 | 0.793112 | -0.000048 |
| 0.1 | 0.713714 | 0.713611 | 0.000103 | 0.686825 | 0.685985 | 0.00084 |
| 0.2 | 0.579983 | 0.579758 | 0.000225 | 0.547527 | 0.543398 | 0.004129 |
| 0.3 | 0.487378 | 0.48714 | 0.000238 | 0.455571 | 0.447699 | 0.007872 |
| 0.4 | 0.417437 | 0.417173 | 0.000264 | 0.388111 | 0.377237 | 0.010874 |

3.3.4.2 Comparison of time requirements

Table 3-6 and Figure 3-10 compare the computational time of PathDVS and LPDVS in homogeneous environments. PathDVS requires less runtime because it substantially reduces the search space by using compatible task lists, their compression, and the lower bound. In particular, the time requirements of PathDVS are substantially smaller when the deadline extension rate is small (i.e., a tight deadline), and they increase linearly as the deadline extension rate increases due to the iterative search over unitSlack. Tight deadlines are common in many practical real-time systems. Based on the results shown in Table 3-6, with no deadline extension (i.e., a deadline extension rate of 0), the runtime of PathDVS is one to two orders of magnitude less than that of LPDVS.

Table 3-6. Runtime ratio of LPDVS to PathDVS for no deadline extension in homogeneous environments

| | 100 Tasks | 200 Tasks |
|---|---|---|
| 4 Processors | 61.97 | 210.32 |
| 8 Processors | 19.46 | 52.74 |

Figure 3-10. Runtime to execute algorithms with respect to different deadline extension rates for different numbers of tasks in homogeneous environments (unit: ms): (a) 100 tasks and (b) 200 tasks [plot: runtime of LPDVS and PathDVS vs. deadline extension rate (0.01 to 0.4)]

3.3.5 Heterogeneous Environments

In this section, we show the performance of the proposed static slack allocation algorithm in heterogeneous environments, where the computation time of each task and the communication time among tasks differ across processors.

3.3.5.1 Comparison of energy requirements

Tables 3-7, 3-8, 3-9, and 3-10 show the improvement of PathDVS over EProfileDVS and GreedyDVS with respect to different assignments (i.e., ICP and CPS), different numbers of processors (i.e., 4, 8, and 16), and different numbers of tasks (i.e., 100, 200, 300, and 400) in heterogeneous environments. As in homogeneous environments, PathDVS considerably outperforms the existing DVS algorithms regardless of the assignment algorithm used. For instance, given the ICP assignment, PathDVS improves by 7-36% over EProfileDVS and 80-93% over GreedyDVS with a 0.4 deadline extension rate. The results also show that the performance improvement of PathDVS is higher for larger numbers of processors and larger numbers of tasks. Figure 3-11 compares the energy of the DVS algorithms (i.e., PathDVS, EProfileDVS, and GreedyDVS) using ICP for different numbers of tasks (i.e., 100, 200, 300, and 400). In general, the performance improvement of PathDVS over the other DVS algorithms increases as the deadline extension rate increases.

Table 3-7. Results for 100 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage; columns 0 to 0.4 are deadline extension rates)

| Processors | Assignment | Over | 0 | 0.01 | 0.02 | 0.05 | 0.1 | 0.2 | 0.3 | 0.4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | ICP | EProfileDVS | 6.70 | 6.74 | 6.78 | 6.88 | 6.91 | 7.05 | 7.12 | 7.22 |
| 4 | ICP | GreedyDVS | 58.63 | 59.44 | 60.27 | 62.63 | 66.09 | 71.68 | 75.97 | 79.35 |
| 4 | CPS | EProfileDVS | 0.67 | 1.19 | 1.76 | 2.66 | 3.75 | 5.37 | 6.32 | 6.71 |
| 4 | CPS | GreedyDVS | 3.45 | 6.86 | 10.02 | 17.62 | 27.67 | 41.63 | 51.21 | 58.35 |
| 8 | ICP | EProfileDVS | 19.43 | 19.47 | 19.50 | 19.48 | 19.49 | 19.67 | 19.60 | 19.61 |
| 8 | ICP | GreedyDVS | 76.32 | 76.78 | 77.26 | 78.61 | 80.60 | 83.82 | 86.29 | 88.24 |
| 8 | CPS | EProfileDVS | 0.53 | 2.63 | 4.20 | 8.08 | 12.28 | 16.47 | 18.08 | 18.90 |
| 8 | CPS | GreedyDVS | 8.28 | 12.66 | 16.56 | 25.88 | 36.92 | 50.65 | 59.23 | 65.38 |
| 16 | ICP | EProfileDVS | 23.56 | 23.53 | 23.62 | 23.54 | 23.62 | 23.57 | 23.69 | 23.69 |
| 16 | ICP | GreedyDVS | 84.19 | 84.48 | 84.80 | 85.69 | 87.02 | 89.18 | 90.83 | 92.14 |
| 16 | CPS | EProfileDVS | 1.81 | 4.37 | 6.42 | 10.97 | 15.10 | 20.03 | 22.14 | 23.24 |
| 16 | CPS | GreedyDVS | 13.04 | 17.37 | 21.38 | 30.40 | 40.71 | 53.99 | 62.40 | 68.33 |

Table 3-8. Results for 200 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage; columns 0 to 0.4 are deadline extension rates)

| Processors | Assignment | Over | 0 | 0.01 | 0.02 | 0.05 | 0.1 | 0.2 | 0.3 | 0.4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | ICP | EProfileDVS | 10.22 | 10.28 | 10.37 | 10.47 | 10.54 | 10.79 | 10.91 | 10.88 |
| 4 | ICP | GreedyDVS | 60.40 | 61.21 | 62.01 | 64.27 | 67.58 | 72.93 | 77.03 | 80.27 |
| 4 | CPS | EProfileDVS | 0.16 | 1.51 | 2.44 | 4.41 | 6.62 | 8.64 | 9.57 | 10.00 |
| 4 | CPS | GreedyDVS | 2.73 | 7.23 | 10.91 | 19.53 | 29.90 | 43.70 | 52.96 | 59.85 |
| 8 | ICP | EProfileDVS | 15.75 | 15.75 | 15.91 | 15.86 | 16.10 | 16.18 | 16.30 | 16.28 |
| 8 | ICP | GreedyDVS | 73.25 | 73.78 | 74.31 | 75.83 | 78.07 | 81.69 | 84.47 | 86.65 |
| 8 | CPS | EProfileDVS | 0.42 | 1.39 | 2.28 | 4.84 | 7.78 | 11.43 | 13.59 | 14.97 |
| 8 | CPS | GreedyDVS | 5.46 | 9.33 | 12.88 | 21.54 | 32.15 | 46.42 | 55.87 | 62.69 |
| 16 | ICP | EProfileDVS | 26.89 | 26.87 | 26.96 | 26.85 | 26.80 | 26.83 | 26.95 | 26.89 |
| 16 | ICP | GreedyDVS | 83.45 | 83.77 | 84.09 | 85.02 | 86.40 | 88.65 | 90.37 | 91.73 |
| 16 | CPS | EProfileDVS | 1.21 | 4.43 | 7.15 | 12.29 | 18.45 | 23.50 | 25.46 | 26.45 |
| 16 | CPS | GreedyDVS | 9.94 | 15.37 | 19.70 | 29.47 | 41.40 | 54.65 | 62.60 | 68.26 |

Table 3-9. Results for 300 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage; columns 0 to 0.4 are deadline extension rates)

| Processors | Assignment | Over | 0 | 0.01 | 0.02 | 0.05 | 0.1 | 0.2 | 0.3 | 0.4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | ICP | EProfileDVS | 7.75 | 7.79 | 7.86 | 7.93 | 8.07 | 8.22 | 8.31 | 8.34 |
| 4 | ICP | GreedyDVS | 58.83 | 59.67 | 60.49 | 62.81 | 66.23 | 71.77 | 76.02 | 79.37 |
| 4 | CPS | EProfileDVS | 0.00 | 0.83 | 1.35 | 2.82 | 4.42 | 6.40 | 7.33 | 7.85 |
| 4 | CPS | GreedyDVS | 1.47 | 5.98 | 9.60 | 18.02 | 28.30 | 42.28 | 51.78 | 58.85 |
| 8 | ICP | EProfileDVS | 18.96 | 18.91 | 18.92 | 19.06 | 19.12 | 19.18 | 19.20 | 19.26 |
| 8 | ICP | GreedyDVS | 74.63 | 75.14 | 75.65 | 77.09 | 79.21 | 82.63 | 85.27 | 87.34 |
| 8 | CPS | EProfileDVS | 0.08 | 1.84 | 3.03 | 5.99 | 9.54 | 14.04 | 16.38 | 17.78 |
| 8 | CPS | GreedyDVS | 4.00 | 8.90 | 12.85 | 22.20 | 33.43 | 48.05 | 57.45 | 64.09 |
| 16 | ICP | EProfileDVS | 35.29 | 35.37 | 35.41 | 35.26 | 35.36 | 35.44 | 35.47 | 35.39 |
| 16 | ICP | GreedyDVS | 85.29 | 85.59 | 85.89 | 86.72 | 87.96 | 89.96 | 91.50 | 92.71 |
| 16 | CPS | EProfileDVS | 0.50 | 4.50 | 7.73 | 14.79 | 22.20 | 29.25 | 32.34 | 33.64 |
| 16 | CPS | GreedyDVS | 10.19 | 16.75 | 21.91 | 33.27 | 45.46 | 59.14 | 66.96 | 72.21 |

Table 3-10. Results for 400 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage; columns 0 to 0.4 are deadline extension rates)

| Processors | Assignment | Over | 0 | 0.01 | 0.02 | 0.05 | 0.1 | 0.2 | 0.3 | 0.4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | ICP | EProfileDVS | 9.30 | 9.37 | 9.45 | 9.53 | 9.68 | 9.90 | 9.97 | 10.01 |
| 4 | ICP | GreedyDVS | 59.27 | 60.11 | 60.94 | 63.26 | 66.67 | 72.16 | 76.38 | 79.70 |
| 4 | CPS | EProfileDVS | 0.10 | 1.23 | 1.97 | 3.84 | 5.86 | 8.10 | 9.16 | 9.72 |
| 4 | CPS | GreedyDVS | 1.69 | 6.53 | 10.30 | 19.00 | 29.44 | 43.36 | 52.72 | 59.64 |
| 8 | ICP | EProfileDVS | 22.48 | 22.49 | 22.51 | 22.53 | 22.66 | 22.73 | 22.71 | 22.75 |
| 8 | ICP | GreedyDVS | 75.45 | 75.96 | 76.45 | 77.84 | 79.90 | 83.22 | 85.78 | 87.79 |
| 8 | CPS | EProfileDVS | 0.89 | 3.31 | 4.99 | 9.02 | 13.36 | 18.08 | 20.40 | 21.65 |
| 8 | CPS | GreedyDVS | 5.36 | 10.98 | 15.35 | 25.34 | 36.79 | 50.98 | 59.84 | 66.08 |
| 16 | ICP | EProfileDVS | 36.18 | 36.09 | 36.09 | 36.05 | 36.07 | 36.05 | 36.14 | 36.05 |
| 16 | ICP | GreedyDVS | 85.34 | 85.64 | 85.93 | 86.77 | 88.00 | 89.99 | 91.52 | 92.73 |
| 16 | CPS | EProfileDVS | 1.28 | 4.83 | 8.34 | 16.16 | 23.10 | 29.95 | 32.73 | 34.07 |
| 16 | CPS | GreedyDVS | 7.83 | 14.56 | 20.18 | 32.27 | 44.82 | 58.80 | 66.82 | 72.20 |

Figure 3-11. Normalized energy consumption of slack allocation algorithms with respect to different deadline extension rates for different numbers of tasks in heterogeneous environments: (a) 100 tasks, (b) 200 tasks, (c) 300 tasks, and (d) 400 tasks [plot: normalized energy of GreedyDVS, EProfileDVS, and PathDVS vs. deadline extension rate (0 to 0.4)]

Table 3-11. Normalized energy consumption of PathDVS and LPDVS with respect to different deadline extension rates in heterogeneous environments (positive difference indicates that PathDVS performs better than LPDVS)

| Deadline extension rate | PathDVS (100 tasks) | LPDVS (100 tasks) | Difference (100 tasks) | PathDVS (200 tasks) | LPDVS (200 tasks) | Difference (200 tasks) |
|---|---|---|---|---|---|---|
| 0 | 0.922500 | 0.922383 | 0.000116 | 0.947985 | 0.947781 | -0.000203 |
| 0.01 | 0.885561 | 0.885098 | -0.000462 | 0.906328 | 0.906165 | -0.000162 |
| 0.02 | 0.851380 | 0.850993 | -0.000386 | 0.870523 | 0.870373 | -0.000149 |
| 0.05 | 0.770131 | 0.769853 | -0.000277 | 0.785193 | 0.785067 | -0.000126 |
| 0.1 | 0.671220 | 0.671066 | -0.000154 | 0.681989 | 0.682629 | 0.000639 |
| 0.2 | 0.537683 | 0.537611 | -0.000072 | 0.543238 | 0.545835 | 0.002597 |
| 0.3 | 0.447022 | 0.447104 | 0.000081 | 0.449777 | 0.453937 | 0.004160 |
| 0.4 | 0.379900 | 0.380161 | 0.000261 | 0.382132 | 0.385930 | 0.003798 |

Table 3-11 compares the energy of PathDVS and LPDVS in heterogeneous environments. Note that the comparison is limited to 200 tasks, as this was the largest problem that we were able to solve using LPDVS on our workstation. The unitSlackRate for PathDVS and the intervalRate for LPDVS are set to 0.001. As in homogeneous environments, these results show that the two algorithms are comparable in energy minimization.

3.3.5.2 Comparison of time requirements

Table 3-12 and Figure 3-12 show the runtime comparison between PathDVS and LPDVS.
As in homogeneous environments, PathDVS requires less runtime because it substantially reduces the search space by using compatible task lists, their compression, and the lower bound. In particular, the time requirements of PathDVS are substantially smaller when the deadline extension rate is small (i.e., a tight deadline). For instance, the runtime ratio of LPDVS to PathDVS is 56.49 for 200 tasks on 4 processors with no deadline extension.

Table 3-12. Runtime ratio of LPDVS to PathDVS for no deadline extension in heterogeneous environments

| | 100 Tasks | 200 Tasks |
|---|---|---|
| 4 Processors | 37.38 | 56.49 |
| 8 Processors | 13.27 | 12.22 |

Figure 3-12. Runtime to execute algorithms with respect to different deadline extension rates for different numbers of tasks in heterogeneous environments (unit: ms): (a) 100 tasks and (b) 200 tasks [plot: runtime of LPDVS and PathDVS vs. deadline extension rate (0.01 to 0.4)]

3.3.6 Effect of Search Space Reduction Techniques for PathDVS

The main factor that determines the cost of the PathDVS algorithm is the size of the search space. In this section, we present the effect of the search space reduction techniques introduced in this chapter (i.e., compression, compatible task matrix/lists, and lower bound). The experiments are performed on 50 different synthetic graphs for each combination of number of tasks and number of processors, with a 0.01 deadline extension rate. We report the average values of the different metrics for Phase 2, as it is considerably more computationally intensive than Phase 1; the cost of Phase 1 is small because fewer slack-allocable tasks are considered. The size of the search space depends on the depth of the search tree and the number of tasks participating in the search. It is O(n^d), where n is the total number of tasks and d is the depth of the search tree.
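For a sense of scale, the sketch below evaluates, in log10 terms, the unreduced size n^d and the size t^d obtained when only the t tasks left after compression take part in the search, using the averages reported in Tables 3-13, 3-14, and 3-15, and compares both with the number of nodes actually explored. Using the average depth as the exponent is a simplification made for illustration only.

```python
import math

# (n tasks, p processors, t tasks in search, mean depth d, nodes explored),
# averages taken from Tables 3-13, 3-14 and 3-15.
ROWS = [
    (100, 4, 11.8, 4.0, 22),
    (100, 8, 24.5, 8.2, 1114),
    (100, 16, 42.4, 17.3, 141342),
    (200, 4, 12.1, 4.0, 27),
    (200, 8, 24.8, 7.9, 1000),
    (200, 16, 53.6, 17.4, 415924),
]


def log10_sizes(n: int, t: float, d: float, explored: int):
    """Return log10 of n^d (no compression), t^d (after compression) and nodes explored."""
    return d * math.log10(n), d * math.log10(t), math.log10(explored)


for n, p, t, d, explored in ROWS:
    full, compressed, actual = log10_sizes(n, t, d, explored)
    print(f"n={n} p={p}: n^d ~ 1e{full:.1f}, t^d ~ 1e{compressed:.1f}, explored ~ 1e{actual:.1f}")
```

For 100 tasks on 4 processors, for example, n^d is 10^8 and t^d is about 1.9 × 10^4, while only 22 nodes are explored on average.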
By using compression, the size can be reduced to O(t^d), where t is the number of tasks participating in the search. Table 3-13 shows the average number of tasks after compression. Note that the compression technique classifies tasks into three categories (fully independent tasks, fully dependent tasks, and compressible tasks) and then lets only one representative of each set of compressible tasks participate in the search. These results show that compression reduces the number of tasks significantly (by 58-94%), leading to a much smaller search space. The amount of compression decreases as the number of processors increases. This is because the amount of compression achieved depends on the assignment-based dependency relationships among tasks in the assignment DAG (not the actual DAG), and these relationships generally become more complex as the number of processors increases. Table 3-14 shows the depth of the search tree. Based on the results, the depth is proportional to the number of processors (i.e., depth ≈ number of processors), so the size can be expressed as O(t^p), where p is the number of processors. Thus, the maximum number of independent tasks to which unitSlack can be allocated together is approximately equal to the number of processors. Although the worst-case size of the search space is O(t^p), the use of the compatible task matrix/lists can lead to a substantially smaller number of tasks that are expanded (i.e., explorable tasks) at each level. Furthermore, the maximum level that is searched is generally much smaller than the depth. This makes the search space significantly smaller, and it is further reduced by the branch and bound techniques. Table 3-15 shows the number of nodes explored in the search, which is considerably smaller than the total search space.

Table 3-13. Number of tasks participating in search with respect to different numbers of tasks and processors

| Number of tasks | Number of processors | Number of tasks participating in search |
|---|---|---|
| 100 | 4 | 11.8 |
| 100 | 8 | 24.5 |
| 100 | 16 | 42.4 |
| 200 | 4 | 12.1 |
| 200 | 8 | 24.8 |
| 200 | 16 | 53.6 |

Table 3-14. Depth of search tree with respect to different numbers of tasks and processors

| Number of tasks | Number of processors | Depth of search tree |
|---|---|---|
| 100 | 4 | 4 |
| 100 | 8 | 8.2 |
| 100 | 16 | 17.3 |
| 200 | 4 | 4 |
| 200 | 8 | 7.9 |
| 200 | 16 | 17.4 |

Table 3-15. Number of nodes explored in search with respect to different numbers of tasks and processors

| Number of tasks | Number of processors | Number of nodes explored in search |
|---|---|---|
| 100 | 4 | 22 |
| 100 | 8 | 1114 |
| 100 | 16 | 141342 |
| 200 | 4 | 27 |
| 200 | 8 | 1000 |
| 200 | 16 | 415924 |

CHAPTER 4
DYNAMIC SLACK ALLOCATION

Static scheduling algorithms for DAG execution use the estimated execution time. The estimated execution time (ET) of tasks may differ from their actual execution time (AET) at runtime. We divide dynamic environments into two broad categories based on whether the actual execution time is less than or more than the estimated time: overestimation (AET < ET) and underestimation (AET > ET). For most real-time applications, an upper bound on the actual execution time of each task (i.e., the worst-case execution time) is used to guarantee that the application completes within a given time bound. This corresponds to overestimation of the actual execution time. Therefore, many tasks may complete earlier than expected during the actual execution. This allows assignment-based dependent tasks to potentially start earlier than envisioned during static scheduling. The extra available slack can then be allocated to tasks that have not yet begun execution, with the goal of reducing the total energy requirements while still meeting the deadline constraints.
For many applications that do not use the worst-case time for estimation, historical data is used to estimate the time requirements of each task, and the estimated execution time may be less than the actual execution time. This corresponds to underestimation of the actual execution time. In this case, because some tasks take longer than estimated, many future tasks may complete later than expected during the actual execution. Thus, it cannot be guaranteed that the deadline constraints will always be satisfied. However, slack can be removed from future tasks in the hope of meeting the deadline constraints as closely as possible.

A simple option for adjusting slack at runtime is to reapply the static slack allocation algorithm to the unexecuted tasks whenever a task finishes early or late. However, the time requirements of static algorithms applied at runtime are generally large, and they may not be practical for many runtime scenarios. We explore novel dynamic (or runtime) algorithms for achieving these goals.

In this chapter, we present novel dynamic algorithms that perform well in terms of both computational time (i.e., runtime overhead) and energy requirements. The main intuition behind our methods is that slack allocation can be restricted to a small subset of tasks, so that the static slack allocation algorithm is applied to that subset rather than to all tasks. Our methods make three main contributions:

- They require significantly less computational time (i.e., runtime overhead) than applying the static algorithm at runtime every time a task finishes early or late.
- Their performance in terms of reducing energy and/or meeting a given deadline is comparable to that of applying the static algorithm at runtime.
- They are effective whether the estimated execution time of tasks is underestimated or overestimated.
4.1 Proposed Dynamic Slack Allocation

We assume that a static algorithm has already been applied before the tasks execute, and that the schedule must be adjusted whenever a task finishes early or late. The dynamic slack allocation algorithm reallocates slack whenever a task finishes earlier or later than expected based on the current schedule. The current schedule is initialized to the static schedule. It is updated each time dynamic slack allocation is triggered by an early or late finishing task at runtime. Our algorithms do not change the assignment of tasks to processors.

The requirements of the dynamic slack allocation algorithm depend on whether the execution time is overestimated (AET < ET) or underestimated (AET > ET):

- Overestimation: The extra slack can potentially be allocated to tasks that have not yet executed. Here the goal of dynamic slack allocation is to reduce energy while still meeting the deadline constraints.
- Underestimation: The primary goal of dynamic slack allocation is to reduce the slack of future tasks so that the DAG completes within the deadline constraints, or as close to the deadline as possible. A secondary goal is to minimize the energy requirements.

Our approach can be used in a mixed environment (i.e., one where some tasks are underestimated and others are overestimated). However, the main motivation is to support environments where the estimated execution time of tasks is mostly overestimated or mostly underestimated. For underestimated tasks the main focus is meeting the deadline, while for overestimated tasks it is minimizing energy.

The proposed dynamic slack allocation algorithms are based on choosing a subset of tasks whose schedule will be readjusted. The schedule for the remaining tasks (i.e., tasks not selected for slack reallocation) is not affected. Two steps need to be addressed.
First, select the subset of tasks for slack reallocation. The only tasks that the dynamic slack allocation algorithm can reschedule are those that have not yet started when it is applied; we assume that the voltage can be selected before a task starts executing. Dynamic slack allocation (i.e., rescheduling) is applied to a subset of these tasks, and which subset depends on the algorithm. The main reason to limit the set of potentially rescheduled tasks is to minimize the overhead of reallocating slack at runtime. Clearly, this must be done in a way that still meets the other goal, energy reduction.

Second, determine the time range for the selected tasks. This time range has to change because some tasks have completed earlier or later than expected. We recompute the time range (i.e., earliest start time and latest finish time) within which the selected tasks should execute. The computation uses the computation times in the current schedule and the assignment-based dependency relationships among tasks. Slack has to be allocated to the selected tasks within this time range in order to try to meet the deadline constraints.

At this stage, a static slack allocation approach is applied to the subset of tasks within the time range described above. It is worth noting that the dynamic slack allocation algorithms presented in this section are independent of the static scheduling algorithms. Once the tasks and their constraints are determined, any static scheduling algorithm can potentially be used at runtime. For this purpose we have used the methods that provide near-optimal solutions (i.e., the LP-based approach and the Path-based approach described in Chapter 3). The computational overhead stays small because only a limited number of tasks are selected for slack reallocation.
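The two-step procedure above, followed by a static slack allocator on the chosen subset, can be sketched as a small driver. This is a minimal sketch under assumptions. The `Slot` record, the `readjust` function and its three pluggable callables (`select`, `time_range`, `allocate`) are hypothetical names introduced for illustration. Any static slack allocator, for example an LP- or Path-based one as the text suggests, would be passed in as `allocate`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple


@dataclass(frozen=True)
class Slot:
    """Scheduled execution window of one task (hypothetical representation)."""
    start: float
    finish: float


Schedule = Dict[int, Slot]
Window = Tuple[float, float]  # (earliest start, latest finish) of a selected task


def readjust(
    schedule: Schedule,
    now: float,
    select: Callable[[Set[int]], Set[int]],
    time_range: Callable[[Set[int]], Dict[int, Window]],
    allocate: Callable[[Set[int], Dict[int, Window]], Dict[int, Slot]],
) -> Schedule:
    """Re-plan a chosen subset of not-yet-started tasks; every other task keeps its slot."""
    # Only tasks that have not started yet are candidates, since a task's
    # voltage is assumed to be fixed once it begins executing.
    pending = {t for t, slot in schedule.items() if slot.start >= now}
    # Step 1: the selection policy (greedy, k time, k descendent, ...) picks a subset.
    chosen = select(pending) & pending
    # Step 2: recompute the admissible time window of each chosen task.
    windows = time_range(chosen)
    # Step 3: run a static slack allocator on the small subset only.
    new_slots = allocate(chosen, windows)
    updated = dict(schedule)  # copy, so the previous schedule is left untouched
    for t in chosen:
        updated[t] = new_slots[t]
    return updated
```

Because the policy, the time-range computation and the allocator are separate arguments, the same driver can host any of the selection rules in this section without modification. This mirrors the text's point that the dynamic scheme is independent of the static algorithm.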
Before dynamic slack allocation is applied, the computation time of each selected task is reset to the estimated execution time used in the assignment algorithm (i.e., before any static slack allocation). The slack is then recalculated for the selected tasks, ignoring the slack allocated by the static scheme. In general, this leads to lower energy requirements because the recalculation accounts for how the early (or late) finishing task changes the assignment-based dependency relationships among tasks. Slack allocation that considers these relationships performs better in terms of reducing energy.

4.1.1 Choosing a Subset of Tasks for Slack Reallocation

The proposed dynamic slack allocation algorithms are based on choosing a subset of tasks whose schedule will be readjusted. The schedule for the remaining tasks (i.e., tasks not selected for slack reallocation) is not affected. Figure 4-1 shows the subset of tasks selected for slack reallocation in an assignment DAG when task 2 finishes early or late. It shows the greedy approach and the two proposed dynamic slack allocation algorithms, the k time lookahead approach and the k descendent lookahead approach. These approaches are described in detail in the following subsections. Note that the assignment DAG itself may change. Slack reallocation and tasks finishing early or late alter the assignment-based direct dependency relationships.

4.1.1.1 Greedy approach

In the greedy approach, only the assignment-based direct successors of the early or late finished task are considered for readjusting the schedule. In the example shown in Figure 4-1, only the direct successors of task 2 (i.e., tasks 4 and 5) are considered for slack allocation. The greedy approach uses slack forwarding [51], which allocates slack to a direct successor of the early or late finished task on the same processor.
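The selection step of the greedy approach amounts to collecting the assignment-based direct successors of the finished task. The sketch below assumes, as the EST/LST formulas of Section 4.1.2 suggest, that the assignment DAG consists of two kinds of edges. These are the original DAG edges plus an edge from each task to the task scheduled right after it on the same processor ($pSucc$). The function names and the dictionary-based graph representation are hypothetical.

```python
from typing import Dict, List, Set


def next_on_processor(proc_order: Dict[int, List[int]]) -> Dict[int, int]:
    """Map each task to the task executed right after it on the same processor (pSucc)."""
    p_succ: Dict[int, int] = {}
    for order in proc_order.values():
        for a, b in zip(order, order[1:]):
            p_succ[a] = b
    return p_succ


def greedy_selection(
    finished: int,
    dag_succ: Dict[int, List[int]],
    proc_order: Dict[int, List[int]],
) -> Set[int]:
    """Assignment-based direct successors of `finished`: its DAG successors plus its pSucc."""
    selected = set(dag_succ.get(finished, []))
    p_succ = next_on_processor(proc_order)
    if finished in p_succ:
        selected.add(p_succ[finished])
    return selected
```

Slack forwarding as in [51] would keep only the same-processor successor (`p_succ[finished]`). The function above returns all assignment-based direct successors, which corresponds to the extended greedy variant the text describes next.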
We extend the greedy approach in [51] by considering all assignment-based direct successors, on any processor, for slack allocation. This extension is expected to reduce energy further than allocating slack to a single task.

4.1.1.2 The k time lookahead approach

In the k time lookahead approach, all tasks within a limited range of time are considered for readjusting the schedule. The range of time is limited by the value of k (i.e., k × the maximum computation time of tasks). The maximum computation time is the computation time of the task that takes the longest. Consider the example shown in Figure 4-1. For ease of presentation, assume that the computation time of each task is one unit, that the communication time among tasks is zero, and that tasks at the same depth finish at the same time. If k is equal to 2, the time range is 2 units (2 × 1 unit). The tasks within that range from the finish of task 2 (i.e., tasks 4, 5, 6, 7, 8, 9, and 10) are then considered. The set of tasks selected for slack reallocation when task l finishes early is defined by

$$allocation_l = \{\, i \mid staticSTime_i \ge ftime_l,\ staticFTime_i \le ftime_l + k \cdot \max_j compTime_j \,\}, \quad \text{where } ftime_l < staticFTime_l$$

where $staticSTime_i$ is the start time of task i in the static or previous schedule, and $staticFTime_i$ is the finish time of task i in the static or previous schedule. $ftime_l$ is the actual finish time of task l at runtime. $compTime_j$ is the computation time of task j on its assigned processor, i.e., its estimated execution time at the maximum voltage.

Using the all option for k (i.e., the k-all time lookahead approach) corresponds to static slack allocation with no limit on the time range of the tasks considered for rescheduling. Thus, the k-all time lookahead approach is the same as applying static slack allocation to all the remaining tasks at runtime.
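The two lookahead selection rules can be sketched side by side. `k_time_lookahead` follows the k time set definition: tasks that start no earlier than $ftime_l$ and finish by $ftime_l + k \cdot \max_j compTime_j$ in the current schedule. `k_descendent_lookahead` follows the k descendent rule of Section 4.1.1.3, as a breadth-first walk over assignment-based direct successors up to distance k. Passing `k=None` stands for the all option. This is a minimal sketch. The function names and dictionary inputs are hypothetical, and the finish-time bound reflects our reading of the set definition.

```python
from typing import Dict, List, Optional, Set, Tuple


def k_time_lookahead(
    ftime_l: float,
    static_times: Dict[int, Tuple[float, float]],  # task -> (staticSTime, staticFTime)
    comp_time: Dict[int, float],                   # task -> compTime at maximum voltage
    k: Optional[float],
) -> Set[int]:
    """Tasks scheduled inside [ftime_l, ftime_l + k * max compTime]; k=None means k-all."""
    horizon = float("inf") if k is None else ftime_l + k * max(comp_time.values())
    return {
        i
        for i, (st, ft) in static_times.items()
        if st >= ftime_l and ft <= horizon
    }


def k_descendent_lookahead(
    finished: int,
    assign_succ: Dict[int, List[int]],  # task -> assignment-based direct successors
    k: Optional[int],
) -> Set[int]:
    """Assignment-based descendants of `finished` at distance <= k; k=None means k-all."""
    selected: Set[int] = set()
    frontier = set(assign_succ.get(finished, []))  # first step: direct successors
    depth = 1
    while frontier and (k is None or depth <= k):
        selected |= frontier
        # Next step: direct successors of the tasks added at the previous step.
        nxt: Set[int] = set()
        for j in frontier:
            nxt.update(assign_succ.get(j, []))
        frontier = nxt - selected
        depth += 1
    return selected
```

With unit-time tasks laid out by depth (task 2 finishing at time 2, tasks 4 to 6 in [2, 3], tasks 7 to 10 in [3, 4], task 11 in [4, 5]) and k = 2, `k_time_lookahead` returns tasks 4 to 10. That matches the set quoted for Figure 4-1. With assumed edges 2→{4, 5}, 4→{7, 8}, 5→{8, 9}, `k_descendent_lookahead` with k = 2 returns tasks 4, 5, 7, 8, and 9. These layouts and edges are illustrative assumptions, not read off the figure.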
One would expect the k-all time lookahead approach to come close to the best that can be achieved, particularly with near-optimal static slack allocation algorithms (i.e., the LP-based and Path-based approaches described in Chapter 3). The set of tasks selected for slack reallocation when task l finishes early is then defined by

$$allocation_l = \{\, i \mid staticSTime_i \ge ftime_l \,\}, \quad \text{where } ftime_l < staticFTime_l$$

4.1.1.3 The k descendent lookahead approach

Unlike the k time lookahead approach, the k descendent lookahead approach considers only tasks whose schedules are directly influenced by the early or late finished task. The main intuition is that limiting the selection to direct descendants reduces the scheduling time requirements. It still gives good energy performance, because the schedules of uninfluenced or only indirectly influenced tasks are kept. Specifically, the assignment-based direct successors of the early or late finished task are considered first, then their assignment-based direct successors, and so on. Only descendants at a distance of at most k are considered, which limits the number of tasks whose schedule is readjusted. Consider the example of Figure 4-1 with the descendent lookahead approach and k equal to 2. The tasks considered are the direct assignment-based successors of task 2 (i.e., tasks 4 and 5) and their direct successors (i.e., tasks 7, 8, and 9). However, task 9 will not be allocated any slack, because its direct dependency on task 6 leaves it no available slack. Using the all option for k (i.e., the k-all descendent lookahead approach) corresponds to setting k equal to the remaining depth. The set of tasks selected for slack reallocation is defined by

$$allocation_l = \{\, i \mid i \in assgnSucc_l \text{ at the first step};\ i \in assgnSucc_j,\ j \in allocation_l \text{ generated at the previous step, after the first step} \,\}, \quad \text{where } ftime_l < staticFTime_l$$

where $assgnSucc_l$ is the set of assignment-based direct successors of task l, and the steps are repeated up to distance k.

Figure 4-1.
Tasks selected for slack reallocation in an assignment DAG depending on dynamic slack allocation algorithms
[Diagram: an 11-task assignment DAG shown in five panels: Greedy; k-2 Time Lookahead; k-all Time Lookahead (Static DVS applied at runtime); k-2 Descendent Lookahead; k-all Descendent Lookahead]

4.1.2 Time Range for Selected Tasks

For tasks outside the set of slack reallocable tasks (i.e., the set of tasks selected for slack reallocation), the static schedule (or the previous schedule updated at runtime) is kept unchanged. For the slack reallocable tasks, three quantities are changed before a slack reallocation algorithm is applied: computation time, start time, and finish time (or deadline).

First, the computation time of each such task is reset to its minimum, i.e., its estimated time at the maximum voltage. Formally, $staticCTime_i = compTime_i$ for $i \in allocation$, where $staticCTime_i$ is the computation time of task i in the static schedule or in the previous schedule generated by the last slack reallocation. This is the same time that was used during the static assignment process, and it ensures that maximum flexibility is available for slack reallocation. For instance, the computation times of tasks 5 and 8 in Figure 4-2 (c) are reset to their estimated computation times before the runtime algorithm is applied. Their computation times in Figure 4-2 (d) are not changed, because whether a task's computation time is reset depends on whether it is a slack reallocable task. Tasks in light grey boxes indicate slack reallocable tasks.

Next, the start times of the tasks are set as flexibly as possible. They are still subject to the deadline constraints and to the finish times of each task's assignment-based predecessors. Note that the finish times of predecessors that have already completed, or that are not among the selected tasks, are fixed. In the case of overestimation, the tasks selected for slack reallocation may start earlier than currently scheduled.
For instance, in Figure 4-2 (c), tasks 3 and 4 can start early because task 1 finishes early. Task 5 cannot start early, because of its assignment-based direct dependency on task 2. In the case of underestimation, by contrast, the selected tasks may have to start later than currently scheduled. For instance, in Figure 4-3 (c), tasks 3 and 4 must start late because task 1 finishes late. Task 5 can still start early because it is not directly influenced by the late finished task 1.

Finally, the finish times (or deadlines) of the tasks are set so that they can complete as late as possible while the deadline constraints are still met (or met as closely as possible). Successors that are not among the selected tasks keep their timing from the current schedule (i.e., task 7 in Figure 4-2 (d) and Figure 4-3 (d)). In the case of overestimation, the deadlines of the selected tasks remain their scheduled finish times. For instance, it is acceptable if slack reallocable tasks 6, 7, and 8 finish no later than their finish times in the static schedule depicted in Figure 4-2 (a). In the case of underestimation, the deadlines of the selected tasks may be pushed back to ensure that each task can complete when executed at the maximum voltage. For instance, the deadline of task 7 has to be increased because there is no slack in task 4. Some tasks can complete before their scheduled finish time (i.e., task 6), and their deadlines are not changed. Extending them to the maximum finish time (i.e., the finish time of task 7) may negatively affect the remaining tasks.

Figure 4-2 and Figure 4-3 illustrate the application of the above constraints for both the k time lookahead approach and the k descendent lookahead approach. They cover the cases of overestimation and underestimation, respectively.
The dotted box shows the range of time, from the start time to the finish time (or deadline), of the slack reallocable tasks considered for slack reallocation at runtime. For edges among tasks, a solid line represents an assignment-based direct dependency relationship, while a dotted line represents an assignment-based indirect dependency relationship.

Under the above constraints, each slack reallocable task has a different maximum amount of slack available for reallocation, and the actual slack allocated stays within the time range of the slack reallocable tasks. The maximum available slack of slack reallocable task i, $slack_i$, is the difference between its latest start time $LST_i$ and its earliest start time $EST_i$. These three quantities are computed as follows:

$$LST_i = \min\Big( deadline_i,\ LST_{pSucc_i},\ \min_{j \in succ_i} \big( LST_j - commTime_{ij} \big) \Big) - staticCTime_i$$

$$EST_i = \max\Big( start_i,\ EST_{pPred_i} + staticCTime_{pPred_i},\ \max_{j \in pred_i} \big( EST_j + staticCTime_j + commTime_{ij} \big) \Big)$$

$$slack_i = LST_i - EST_i$$

The symbols are defined as follows:

- $deadline_i$ is the deadline of task i, and $start_i$ is its start time.
- $succ_i$ is the set of direct successors of task i in the DAG, and $pSucc_i$ is the task assigned next after task i on the same processor.
- $pred_i$ is the set of direct predecessors of task i in the DAG, and $pPred_i$ is the task assigned immediately before task i on the same processor.
- $commTime_{ij}$ is the communication time between tasks i and j on their assigned processors.
- $staticCTime_i$ is the computation time of task i in the static schedule or in the previous schedule generated by the last slack reallocation.
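The three formulas can be evaluated with one forward pass (for $EST$) and one backward pass (for $LST$) over a topological order of the assignment DAG. The sketch below makes three assumptions:

- The assignment DAG is the DAG plus same-processor ordering edges.
- Communication times are given per directed edge.
- Tasks outside the selected set are pinned to their scheduled start ($EST_j = LST_j = staticSTime_j$, as stated for non-reallocable tasks).

`time_range` and its parameter names are hypothetical. It uses `graphlib`, which is in the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def time_range(
    tasks: List[int],
    ctime: Dict[int, float],           # staticCTime; reset to compTime for selected tasks,
                                       # actual duration for tasks that already completed
    comm: Dict[Edge, float],           # commTime for each DAG edge (i, j)
    proc_order: Dict[int, List[int]],  # processor -> tasks in execution order
    start: Dict[int, float],           # start_i bound, read only for selected tasks
    deadline: Dict[int, float],        # deadline_i bound, read only for selected tasks
    selected: Set[int],
    fixed_start: Dict[int, float],     # staticSTime_j (actual start if completed) otherwise
) -> Tuple[Dict[int, float], Dict[int, float], Dict[int, float]]:
    """Return (slack, EST, LST) following the EST/LST/slack formulas of Section 4.1.2."""
    pred: Dict[int, List[int]] = {t: [] for t in tasks}
    succ: Dict[int, List[int]] = {t: [] for t in tasks}
    for i, j in comm:
        succ[i].append(j)
        pred[j].append(i)
    p_pred: Dict[int, int] = {}
    p_succ: Dict[int, int] = {}
    for order in proc_order.values():
        for a, b in zip(order, order[1:]):
            p_succ[a] = b
            p_pred[b] = a
    # Assignment DAG = DAG edges plus same-processor ordering edges (node -> predecessors).
    graph = {t: set(pred[t]) | ({p_pred[t]} if t in p_pred else set()) for t in tasks}
    order = list(TopologicalSorter(graph).static_order())

    est: Dict[int, float] = {}
    for i in order:  # forward pass: all predecessors are visited first
        if i not in selected:
            est[i] = fixed_start[i]
            continue
        cands = [start[i]] + [est[j] + ctime[j] + comm[(j, i)] for j in pred[i]]
        if i in p_pred:
            cands.append(est[p_pred[i]] + ctime[p_pred[i]])
        est[i] = max(cands)

    lst: Dict[int, float] = {}
    for i in reversed(order):  # backward pass: all successors are visited first
        if i not in selected:
            lst[i] = fixed_start[i]
            continue
        cands = [deadline[i]] + [lst[j] - comm[(i, j)] for j in succ[i]]
        if i in p_succ:
            cands.append(lst[p_succ[i]])
        lst[i] = min(cands) - ctime[i]

    slack = {i: lst[i] - est[i] for i in selected}
    return slack, est, lst
```

Consider two selected tasks that run back to back on one processor. Their windows overlap, so the returned values are per-task upper bounds on reallocable slack, not an allocation. Deciding how much each task actually receives is left to the static slack allocator applied afterwards.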
For a task j that is not in the set of slack reallocable tasks, the earliest and latest start times are both equal to its scheduled start time. That is, $EST_j = LST_j = staticSTime_j$ for $j \notin allocation$, where $staticSTime_j$ is the start time of task j in the static schedule or in the previous schedule generated by the last slack reallocation. Once the time range has been determined for the slack reallocable tasks, slack is reallocated to the appropriate tasks using a slack allocation approach that minimizes the total energy requirements. The schedule is then updated.

Figure 4-2. Overestimation: Time range for selected slack allocable tasks using k-time lookahead approach and k-descendent lookahead approach: (a) Initial static schedule, (b) Schedule from the early finished task, (c) State before applying k time lookahead approach, (d) State before applying k descendent lookahead approach

Figure 4-3. Underestimation: Time range for selected slack allocable tasks using k-time lookahead approach and k-descendent lookahead approach: (a) Initial static schedule, (b) Schedule from the late finished task, (c) State before applying k time lookahead approach, (d) State before applying k descendent lookahead approach

[Diagrams for Figures 4-2 and 4-3: schedules of tasks 1 to 9 between the start and the deadline in panels (a) to (d); legend: slack reallocable task, early finished task (Figure 4-2) or late finished task (Figure 4-3)]

4.2 Experimental Results

In this section, we compare the performance of the dynamic slack allocation algorithms (i.e., k-Descendent, k-Time, and Greedy) with each other and with applying static slack allocation in dynamic environments. Each dynamic algorithm is applied to a static schedule produced in two stages. First, a known assignment algorithm assigns tasks based on earliest finish time. Then a static slack allocation algorithm (i.e., LPDVS or PathDVS) is applied.
Our previous experiments in Chapter 3 show that the energy reduction of LPDVS is comparable to that of PathDVS, while its time requirement is higher. To distinguish it from the PathDVS used to generate the static schedule, we refer to PathDVS applied at runtime as dPathDVS. The size of the unit slack for PathDVS and dPathDVS is set to 0.001 × (finish time of the DAG), based on the empirical results for static slack allocation described in Chapter 3.

4.2.1 Simulation Methodology

In this section, we describe the DAG generation, the generation of dynamic environments, and the performance measures used in our experiments.

4.2.1.1 The DAG generation

We randomly generated a large number of graphs with 100 and 200 tasks. Since the results for heterogeneous environments are similar to those for homogeneous environments, we present only the results for the latter. The execution time of each task varies from 10 to 40 units, and the communication time among tasks is set to 2 units. The graphs are executed on 4, 8, and 16 processors.

4.2.1.2 Dynamic environments generation

We simulated a number of dynamic cases to study the effectiveness of our algorithms. The following important parameters can be varied to create dynamic cases for overestimation and underestimation, respectively:

- Overestimation: The fraction of tasks that finish earlier than expected (i.e., tasks with AET < ET) is given by earlyFinishedTaskRate (i.e., number of early finished tasks = earlyFinishedTaskRate × total number of tasks). The fractional difference between the actual and estimated execution time of each task that finishes early is given by timeDecreaseRate (i.e., amount of decrease = timeDecreaseRate × estimated execution time).
- Underestimation: The fraction of tasks that finish later than expected (i.e., tasks with AET > ET) is given by lateFinishedTaskRate (i.e., number of late finished tasks = lateFinishedTaskRate × total number of tasks).
The fractional difference between the actual and estimated execution time of each task that finishes late is given by timeIncreaseRate (i.e., amount of increase = timeIncreaseRate × estimated execution time).

To generate cases with overestimation, we used earlyFinishedTaskRate values of 0.2, 0.4, 0.6, and 0.8 and timeDecreaseRate values of 0.1, 0.2, 0.3, and 0.4. To generate cases with underestimation, we used lateFinishedTaskRate values of 0.2, 0.4, 0.6, and 0.8 and timeIncreaseRate values of 0.05, 0.1, 0.15, and 0.2. The deadline is determined by

$$deadline = (1 + \text{deadline extension rate}) \times \text{total finish time from assignments without DVS scheme}$$

The total finish time without DVS is the time requirement of an execution schedule that minimizes execution time on the given set of processors. The deadline extension therefore represents the overall slack available for allocation. We used deadline extension rates of 0.0 (no extension), 0.01, 0.02, 0.05, 0.1, and 0.2.

4.2.1.3 Performance measures

An important measure is the computational time (i.e., runtime overhead) required to readjust the schedule when the execution time is less than or greater than the estimated time. The other important measures for the cases of overestimation and underestimation are as follows.

For the case of overestimation, normalized energy consumption is measured. It is the total energy required to complete the DAG divided by the total energy required under static slack allocation (i.e., with all tasks completing in exactly their estimated time). A lower normalized energy consumption is desirable.

For the case of underestimation, the deadline miss ratio and the energy increase ratio are measured. When tasks take more time than estimated, the overall execution time may exceed the deadline.
The deadline miss ratio measures the difference between the actual execution time and the deadline, normalized by the deadline. A lower deadline miss ratio is desirable; a value of zero implies that the deadline was not missed. The energy increase ratio is the increase in the total energy required to complete the DAG, divided by the total energy required under static slack allocation (i.e., with all tasks completing in exactly their estimated time). A lower energy increase ratio is desirable.

4.2.2 Overestimation

In this section, we show the performance of our algorithms when the execution time of tasks is overestimated (i.e., the actual execution time of a task is less than its estimated time).

4.2.2.1 Comparison of energy requirements

We first compared the k-all descendent algorithm with the Greedy and dPathDVS algorithms. Figure 4-4 shows the normalized energy requirements of the kallDescendent, Greedy, and dPathDVS algorithms for different time decrease rates and early finished task rates. All results use no deadline extension (i.e., a deadline extension rate of zero). The results show that kallDescendent requires significantly less energy than the greedy approach. For instance, for timeDecreaseRate equal to 0.4, kallDescendent reduces energy by 17% and 29% compared to the Greedy algorithm for earlyFinishedTaskRate values of 0.2 and 0.8, respectively. Most importantly, the energy requirements of kallDescendent are within 1% of those of dPathDVS in almost all cases. Its time requirement, however, is one to two orders of magnitude smaller than that of dPathDVS, as shown in Figure 4-11. These results show that restricting slack allocation to the descendants of the finished task reduces time requirements. Energy requirements remain comparable to those of static scheduling algorithms applied at runtime.
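The experimental setup of Section 4.2.1 can be summarized in a small sketch. It perturbs estimated execution times to create overestimation or underestimation cases, computes the deadline, and evaluates the three performance measures. The text does not specify how the affected tasks are chosen, so uniform random sampling is our assumption. Clamping the deadline miss ratio at zero when the deadline is met is also our assumption, consistent with "a value of zero implies that the deadline was not missed". All function names are hypothetical.

```python
import random
from typing import Dict


def actual_times(
    est_time: Dict[int, float],
    task_rate: float,      # earlyFinishedTaskRate or lateFinishedTaskRate
    time_rate: float,      # timeDecreaseRate or timeIncreaseRate
    overestimated: bool,   # True: AET < ET, False: AET > ET
    rng: random.Random,
) -> Dict[int, float]:
    """Shrink or stretch the execution time of a randomly chosen fraction of tasks."""
    tasks = sorted(est_time)
    affected = set(rng.sample(tasks, round(task_rate * len(tasks))))
    factor = 1.0 - time_rate if overestimated else 1.0 + time_rate
    return {t: est_time[t] * factor if t in affected else est_time[t] for t in tasks}


def deadline(finish_without_dvs: float, extension_rate: float) -> float:
    """deadline = (1 + deadline extension rate) x finish time without DVS."""
    return (1.0 + extension_rate) * finish_without_dvs


def normalized_energy(energy: float, static_energy: float) -> float:
    """Overestimation measure: energy relative to static slack allocation."""
    return energy / static_energy


def deadline_miss_ratio(finish_time: float, deadline_value: float) -> float:
    """Underestimation measure: overrun normalized by the deadline (0 means met)."""
    return max(0.0, finish_time - deadline_value) / deadline_value


def energy_increase_ratio(energy: float, static_energy: float) -> float:
    """Underestimation measure: energy increase relative to static slack allocation."""
    return (energy - static_energy) / static_energy
```

For example, `actual_times(et, 0.4, 0.2, True, random.Random(0))` shortens 40% of the tasks by 20%. This corresponds to the earlyFinishedTaskRate = 0.4, timeDecreaseRate = 0.2 case.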
[Plot: normalized energy (0.5 to 1) vs. time decrease rate (0.1 to 0.4), one panel per early finished task rate (0.2, 0.4, 0.6, 0.8); series: Greedy, dPathDVS, kallDescendent]
Figure 4-4. Normalized energy consumption of Greedy, dPathDVS, and kallDescendent with respect to different early finished task rates and time decrease rates for no deadline extension

Table 4-1 compares the energy of our proposed algorithms, the k time lookahead (i.e., kTime) and k descendent lookahead (i.e., kDescendent) algorithms, for several values of k. These are k equal to 2 and 3 for kTime, and 4, 6, and all for kDescendent. The results show that the energy requirements of k3Time and k6Descendent are comparable to those of kallDescendent. The difference between k6Descendent (or k3Time) and kallDescendent is within 1-5%. kallDescendent is better than k3Time and k6Descendent when the fraction of early finished tasks is small. When that fraction is large, k6Descendent and k3Time are better.

Table 4-1.
Normalized energy consumption of k time lookahead and k descendent lookahead algorithms with different k values with respect to different early finished task rates and time decrease rates for no deadline extension

| Early Finished Task Rate | Time Decrease Rate | k2Time | k3Time | k4Descendent | k6Descendent | kallDescendent |
|---|---|---|---|---|---|---|
| 0.2 | 0.1 | 0.9425 | 0.9367 | 0.9379 | 0.9334 | 0.9207 |
| 0.2 | 0.2 | 0.9108 | 0.9008 | 0.9024 | 0.8952 | 0.8753 |
| 0.2 | 0.3 | 0.8866 | 0.8721 | 0.8738 | 0.8639 | 0.8372 |
| 0.2 | 0.4 | 0.8701 | 0.8506 | 0.8515 | 0.8393 | 0.8069 |
| 0.4 | 0.1 | 0.8899 | 0.8826 | 0.8845 | 0.8800 | 0.8780 |
| 0.4 | 0.2 | 0.8307 | 0.8194 | 0.8223 | 0.8164 | 0.8153 |
| 0.4 | 0.3 | 0.7857 | 0.7696 | 0.7730 | 0.7657 | 0.7660 |
| 0.4 | 0.4 | 0.7527 | 0.7312 | 0.7348 | 0.7266 | 0.7276 |
| 0.6 | 0.1 | 0.8481 | 0.8426 | 0.8420 | 0.8404 | 0.8424 |
| 0.6 | 0.2 | 0.7699 | 0.7621 | 0.7610 | 0.7604 | 0.7657 |
| 0.6 | 0.3 | 0.7092 | 0.6984 | 0.6969 | 0.6974 | 0.7061 |
| 0.6 | 0.4 | 0.6647 | 0.6497 | 0.6480 | 0.6492 | 0.6606 |
| 0.8 | 0.1 | 0.8070 | 0.8023 | 0.8007 | 0.8002 | 0.8071 |
| 0.8 | 0.2 | 0.7111 | 0.7057 | 0.7029 | 0.7041 | 0.7186 |
| 0.8 | 0.3 | 0.6430 | 0.6355 | 0.6319 | 0.6343 | 0.6548 |
| 0.8 | 0.4 | 0.5890 | 0.5782 | 0.5744 | 0.5781 | 0.6042 |

Figures 4-5, 4-6, 4-7, 4-8, 4-9, and 4-10 show the energy requirements of several algorithms. These are our proposed dynamic slack allocation algorithms, kTime and kDescendent, with k equal to 2 and 3 for kTime and 4, 6, and all for kDescendent. They also include the greedy algorithm (Greedy) and static slack allocation applied at runtime (dPathDVS). The figures correspond, respectively, to no deadline extension and to deadline extension rates of 0.01, 0.02, 0.05, 0.1, and 0.2. The results are very similar to those for no deadline extension described above.
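The comparison of k3Time and k6Descendent with kallDescendent in Table 4-1 can be checked directly from the table. The sketch below reads "within 1-5%" as a difference in normalized energy, i.e., in percentage points of the static-allocation energy; that reading is our assumption. The values are copied from Table 4-1, and the helper names are hypothetical.

```python
from typing import Dict, Tuple

# (early finished task rate, time decrease rate) -> (k3Time, k6Descendent, kallDescendent),
# copied from Table 4-1 (no deadline extension).
TABLE_4_1: Dict[Tuple[float, float], Tuple[float, float, float]] = {
    (0.2, 0.1): (0.9367, 0.9334, 0.9207),
    (0.2, 0.2): (0.9008, 0.8952, 0.8753),
    (0.2, 0.3): (0.8721, 0.8639, 0.8372),
    (0.2, 0.4): (0.8506, 0.8393, 0.8069),
    (0.4, 0.1): (0.8826, 0.8800, 0.8780),
    (0.4, 0.2): (0.8194, 0.8164, 0.8153),
    (0.4, 0.3): (0.7696, 0.7657, 0.7660),
    (0.4, 0.4): (0.7312, 0.7266, 0.7276),
    (0.6, 0.1): (0.8426, 0.8404, 0.8424),
    (0.6, 0.2): (0.7621, 0.7604, 0.7657),
    (0.6, 0.3): (0.6984, 0.6974, 0.7061),
    (0.6, 0.4): (0.6497, 0.6492, 0.6606),
    (0.8, 0.1): (0.8023, 0.8002, 0.8071),
    (0.8, 0.2): (0.7057, 0.7041, 0.7186),
    (0.8, 0.3): (0.6355, 0.6343, 0.6548),
    (0.8, 0.4): (0.5782, 0.5781, 0.6042),
}


def largest_gap(column: int) -> Tuple[Tuple[float, float], float]:
    """Row with the largest |variant - kallDescendent|; column 0 = k3Time, 1 = k6Descendent."""
    gaps = {key: abs(row[column] - row[2]) for key, row in TABLE_4_1.items()}
    worst = max(gaps, key=gaps.get)
    return worst, gaps[worst]


def kall_wins(early_rate: float) -> bool:
    """True if kallDescendent beats both k3Time and k6Descendent at every time decrease rate."""
    rows = [row for (rate, _), row in TABLE_4_1.items() if rate == early_rate]
    return all(row[2] < min(row[0], row[1]) for row in rows)
```

Under this reading, the largest gap is about 4.4 points for k3Time and 3.2 points for k6Descendent. Both occur at early finished task rate 0.2 and time decrease rate 0.4. kallDescendent beats both variants in every row at rate 0.2 but in no row at rate 0.8, matching the trend described in the text.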
[Plot: normalized energy (0.5 to 1) vs. time decrease rate (0.1 to 0.4), one panel per early finished task rate (0.2, 0.4, 0.6, 0.8); series: Greedy, dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, k3Time]
Figure 4-5. Normalized energy consumption for no deadline extension

[Plot: panels, axes, and series as in Figure 4-5]
Figure 4-6.
Normalized energy consumption for 0.01 deadline extension rate

[Plot: panels, axes, and series as in Figure 4-5]
Figure 4-7. Normalized energy consumption for 0.02 deadline extension rate

[Plot: panels, axes, and series as in Figure 4-5]
Figure 4-8.
Normalized energy consumption for 0.05 deadline extension rate

[Plot: panels, axes, and series as in Figure 4-5]
Figure 4-9. Normalized energy consumption for 0.1 deadline extension rate

[Plot: panels, axes, and series as in Figure 4-5]
Figure 4-10. Normalized energy consumption for 0.2 deadline extension rate

4.2.2.2 Comparison of time requirements

Figure 4-11 shows the average time required to readjust the schedule after a single task finishes early. The computational time of k6Descendent is roughly an order of magnitude lower than that of kallDescendent and 3-4 times lower than that of k3Time.
Based on the time and energy comparisons above, k6Descendent provides reasonable energy performance at substantially lower overhead.

[Plot: computational time in ns (logarithmic scale, 10^4 to 10^9) vs. time decrease rate (0.1 to 0.4); series: Greedy, dPathDVS, k2Time, k3Time, k4Descendent, k6Descendent, kallDescendent]
Figure 4-11. Computational time to readjust the schedule from an early finished task with respect to different time decrease rates for no deadline extension (unit: ns, logarithmic scale)

Figure 4-12 shows the time required to readjust the schedule after a single task finishes early, for different time decrease rates. It covers deadline extension rates of 0.01, 0.02, 0.05, 0.1, and 0.2. The results are very similar to those for no deadline extension described above.

[Plot: five panels (a) to (e) for deadline extension rates 0.01, 0.02, 0.05, 0.1, and 0.2; computational time in ns (logarithmic scale, 10^4 to 10^10) vs. time decrease rate (0.1 to 0.4); series: Greedy, dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, k3Time]
Figure 4-12. Results for variable deadline extension rates: Computational time to readjust the schedule from one early finished task with respect to different time decrease rates (unit: ns, logarithmic scale): (a) for 0.01 deadline extension rate, (b) for 0.02 deadline extension rate, (c) for 0.05 deadline extension rate, (d) for 0.1 deadline extension rate, and (e) for 0.2 deadline extension rate

4.2.3 Underestimation

In this section, we show the performance of our algorithms when the execution time of tasks is underestimated (i.e., the actual execution time of a task is greater than its estimated time).

4.2.3.1 Comparison of deadline requirements

We first compared the k-all descendent algorithm with the Greedy and dPathDVS algorithms. The results in Figure 4-13 show that kallDescendent is significantly better than the greedy approach at meeting the deadline requirements. Most importantly, its deadline miss ratio was within 0.1% of that of dPathDVS in most cases. Its time requirement, however, is one to two orders of magnitude smaller than that of dPathDVS, as shown in Figure 4-27. These results show that restricting slack allocation to the descendants of the finished task reduces time requirements. The deadline is still met as closely as with the static algorithms executed at runtime.

[Plot: deadline miss ratio (0 to 0.15) vs. time increase rate (0.05 to 0.2), one panel per late finished task rate (0.2, 0.4, 0.6, 0.8); series: No Scheme, Greedy, dPathDVS, kallDescendent]
Figure 4-13.
Deadline miss ratio with respect to different time increase rates and late finished task rates for 0.05 deadline extension rate PAGE 91 91 Table 4-2 shows the deadline miss ra tio of our proposed algorithms, k time lookahead (i.e, kTime) and k descendent lookahead (i.e., kDes cendent) algorithms, with variable k values for each algorithm (i.e., k is equal to 2 and 3 for kTime, and 4, 6, and all option for kDescendent). These results show that the deadline miss ra tios of k3Time and k6Descendent are comparable with that of kallDescendent. Table 4-2. Deadline miss ratio of k time lookahead and k descendent lookahead algorithms with different k values with respect to different late finished task rates and time increase rates for 0.05 deadline extension rate Late Finished Task Rate Time Increase Rate k2 Time k3 Time k4 Descendent k6 Descendent kall Descendent 0.05 0.0010.0000.0010.000 0.000 0.1 0.0040.0020.0040.002 0.000 0.15 0.0100.0070.0100.006 0.001 0.2 0.2 0.0180.0130.0180.013 0.003 0.05 0.0000.0000.0000.000 0.000 0.1 0.0030.0010.0030.002 0.001 0.15 0.0100.0080.0100.009 0.006 0.4 0.2 0.0220.0200.0220.020 0.016 0.05 0.0030.0030.0030.003 0.003 0.1 0.0120.0120.0120.012 0.013 0.15 0.0270.0270.0270.028 0.029 0.6 0.2 0.0500.0510.0510.051 0.052 0.05 0.0100.0090.0100.010 0.010 0.1 0.0330.0330.0330.034 0.035 0.15 0.0610.0620.0620.062 0.064 0.8 0.2 0.1000.1000.1000.100 0.101 PAGE 92 92 Figures 4-14, 4-15, 4-16, 417, 4-18, and 4-19 show the deadline miss ratio of our proposed dynamic slack allocation algorithms, k time lookahead (i.e, kTime) and k descendent (i.e., kDescendent) lookahead algorithms with variable k values for each algorithm (i.e., k is equal to 2 and 3 for kTime, and 4, 6, and all option for kDescendent), static scheduling without any change at runtime (i.e., NoScheme), greedy algorithm (i.e., Greedy), and static slack allocation applied at runtime (i.e., dPathDVS), w ith respect to different time increase rates and different early finished task rates, for no 
deadline extension, 0.01, 0.02, 0.05, 0.1, and 0.2 deadline extension rates, respectively. The resu lts are very similar with ones for 0.05 deadline extension rate descri bed in the above. 0.2 Late Finished Task Rate0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.05 0.1 0.15 0.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time 0.4 Late Finished Task Rate0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.050.10.150.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time 0.6 Late Finished Task Rate0 0.02 0.04 0.06 0.08 0.1 0.12 0.050.10.150.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time 0.8 Late Finished Task Rate0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.050.10.150.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time Figure 4-14. Deadline miss ratio for no deadline extension PAGE 93 93 0.2 Late Finished Task Rate0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.05 0.1 0.15 0.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time 0.4 Late Finished Task Rate0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.050.10.150.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time 0.6 Late Finished Task Rate0 0.02 0.04 0.06 0.08 0.1 0.12 0.050.10.150.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time 0.8 Late Finished Task Rate0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.050.10.150.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time Figure 4-15. 
Deadline miss ratio for 0.01 deadline extension rate 0.2 Late Finished Task Rate0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.05 0.1 0.15 0.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time 0.4 Late Finished Task Rate0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.050.10.150.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time 0.6 Late Finished Task Rate0 0.02 0.04 0.06 0.08 0.1 0.12 0.050.10.150.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time 0.8 Late Finished Task Rate0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.050.10.150.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time Figure 4-16. Deadline miss ratio for 0.02 deadline extension rate PAGE 94 94 0.2 Late Finished Task Rate0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.05 0.1 0.15 0.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time 0.4 Late Finished Task Rate0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.050.10.150.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time 0.6 Late Finished Task Rate0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.050.10.150.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time 0.8 Late Finished Task Rate0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.050.10.150.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time Figure 4-17. 
Deadline miss ratio for 0.05 deadline extension rate 0.2 Late Finished Task Rate0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.05 0.1 0.15 0.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time 0.4 Late Finished Task Rate0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.050.10.150.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time 0.6 Late Finished Task Rate0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.050.10.150.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time 0.8 Late Finished Task Rate0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.050.10.150.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time Figure 4-18. Deadline miss ratio for 0.1 deadline extension rate PAGE 95 95 0.2 Late Finished Task Rate0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.05 0.1 0.15 0.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time 0.4 Late Finished Task Rate0 0.01 0.02 0.03 0.04 0.05 0.06 0.050.10.150.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time 0.6 Late Finished Task Rate0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.050.10.150.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time 0.8 Late Finished Task Rate0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.050.10.150.2 Time Increase RateDeadline Miss Ratio NoScheme Greedy dPathDVS kallDescendent k4Descendent k6Descendent k2Time k3Time Figure 4-19. Deadline miss ratio for 0.2 deadline extension rate 4.2.3.2 Comparison of energy requirements Figure 4-20 shows the energy increase ratio for the three algorithms: dPathDVS, kallDescendent, and k6Descendent. 
The deadline extension rate is set to 0.05 (this corresponds to the case where the amount of slack is small and there is potential for a large number of deadline misses). The three algorithms were found to be comparable in the amount of energy increase. In general, the k-6 descendent lookahead algorithm is better in terms of energy when a larger number of tasks finish late, while the static algorithm applied at runtime is better when a smaller number of tasks finish late.

[Figure 4-20 plots, four panels for late finished task rates 0.2, 0.4, 0.6, and 0.8: x-axis Time Increase Rate (0.05 to 0.2); y-axis Energy Increase Ratio; series dPathDVS, kallDescendent, k6Descendent]
Figure 4-20. Energy increase ratio with respect to different time increase rates and late finished task rates for 0.05 deadline extension rate

Figures 4-21, 4-22, 4-23, 4-24, 4-25, and 4-26 show the energy increase ratio of our proposed dynamic slack allocation algorithms, the k time lookahead (kTime) and k descendent lookahead (kDescendent) algorithms with different k values (k = 2 and 3 for kTime; k = 4, 6, and all for kDescendent), and of static slack allocation applied at runtime (dPathDVS), with respect to different time increase rates and late finished task rates, for no deadline extension and for 0.01, 0.02, 0.05, 0.1, and 0.2 deadline extension rates, respectively. The results are very similar to those for the 0.05 deadline extension rate described above.
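The underestimation experiments above are parameterized by a late finished task rate and a time increase rate. This section does not restate how those two parameters perturb a workload, so the following is only a sketch under one plausible reading: a random fraction of tasks equal to the late finished task rate is chosen, and each chosen task runs longer than its estimate by the time increase rate. The function `underestimate` and its arguments are hypothetical, not taken from the dissertation.

```python
import random


def underestimate(est_times, late_rate, increase_rate, rng=None):
    """Hypothetical underestimation model (an assumption, not the author's code).

    est_times:     dict task -> estimated execution time
    late_rate:     fraction of tasks that finish late (late finished task rate)
    increase_rate: relative extra time of each late task (time increase rate)
    Returns a dict task -> actual execution time.
    """
    rng = rng or random.Random()
    tasks = sorted(est_times)  # fixed order so a seeded generator is reproducible
    n_late = round(late_rate * len(tasks))
    late = set(rng.sample(tasks, n_late))
    return {
        t: est_times[t] * (1 + increase_rate) if t in late else est_times[t]
        for t in tasks
    }


# Example: 10 tasks, 20% of them run 10% longer than estimated.
estimates = {f"t{i}": 10.0 for i in range(10)}
actual = underestimate(estimates, late_rate=0.2, increase_rate=0.1, rng=random.Random(7))
late_tasks = [t for t in actual if actual[t] > estimates[t]]
print(len(late_tasks))  # 2
```

Under this reading, a late finished task rate of 0.2 and a time increase rate of 0.1 mean that one task in five overruns its estimate by 10%; the dynamic slack allocation algorithms then react to each such overrun at runtime.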
[Figures 4-21 to 4-26 plots: each has four panels for late finished task rates 0.2, 0.4, 0.6, and 0.8; x-axis Time Increase Rate; y-axis Energy Increase Ratio; series dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, k3Time]
Figure 4-21. Energy increase ratio for no deadline extension
Figure 4-22. Energy increase ratio for 0.01 deadline extension rate
Figure 4-23. Energy increase ratio for 0.02 deadline extension rate
Figure 4-24. Energy increase ratio for 0.05 deadline extension rate
Figure 4-25. Energy increase ratio for 0.1 deadline extension rate
Figure 4-26. Energy increase ratio for 0.2 deadline extension rate

4.2.3.3 Comparison of time requirements
Figure 4-27 shows the average time required to readjust the schedule per underestimated task. The computational time of k6Descendent is roughly an order of magnitude lower than that of kallDescendent and 3-4 times lower than that of k3Time.
Based on the time, deadline miss ratio, and energy increase ratio comparisons described above, k6Descendent provides reasonable performance in deadline satisfaction and energy requirements at substantially lower overheads.

[Figure 4-27 plot: x-axis Time Increase Rate (0.05 to 0.2); y-axis Computational Time (ns, logarithmic scale); series Greedy, dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, k3Time]
Figure 4-27. Computational time to readjust the schedule from a late finished task with respect to different time increase rates for no deadline extension (unit: ns, logarithmic scale)

Figure 4-28 shows the time required to readjust the schedule due to a single task's late finish with respect to different time increase rates for different deadline extension rates. The results are very similar to those described above.

[Figure 4-28 plots, panels (a) to (e) for no deadline extension and for 0.01, 0.02, 0.1, and 0.2 deadline extension rates: x-axis Time Increase Rate; y-axis Computational Time (ns, logarithmic scale); series Greedy, dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, k3Time]
Figure 4-28. Results for variable deadline extension rates: computational time to readjust the schedule from one late finished task with respect to different time increase rates (unit: ns, logarithmic scale): (a) for 0.0 deadline extension rate (no deadline extension), (b) for 0.01 deadline extension rate, (c) for 0.02 deadline extension rate, (d) for 0.1 deadline extension rate, and (e) for 0.2 deadline extension rate

CHAPTER 5
STATIC ASSIGNMENT

As presented in Chapter 1, a two-step process is generally used for scheduling tasks with the goal of energy minimization while still meeting deadline constraints: assignment and then slack allocation. In this chapter, we explore the assignment process at compile time (i.e., static assignment), which determines the order in which tasks execute and the mapping of tasks to processors based on the computation time at the maximum voltage level. Note that for any feasible schedule, the finish time of the DAG at the maximum voltage must be less than or equal to the deadline.

Most of the prior research on scheduling for energy minimization of DAGs on parallel machines derives, in the assignment step, an assignment that minimizes total finish time. Simple list-based scheduling algorithms are generally used for this purpose. This may be a reasonable approach, as minimizing finish time generally leaves more slack to allocate and thus reduces the energy requirements during the slack allocation step. However, this approach is not sufficient to minimize total energy consumption because it cannot incorporate the differential energy and time requirements of each task of the workflow on different processors. For the first step, we present a novel algorithm that achieves a lower finish time than existing algorithms in a heterogeneous environment.
We show that the extra slack this algorithm generates can lead to an overall reduction in energy after slack allocation compared to existing algorithms. The main thrust of this chapter is to show that incorporating energy minimization into the assignment process can lead to even better results. Genetic Algorithm (GA) based scheduling algorithms [56, 57] have tried to partially address this issue by searching through a large number of assignments. Their experimental results showed that this approach outperforms existing algorithms in terms of energy consumption. However, the assignment itself does not take into account the energy consumption after slack allocation. Furthermore, evaluating the energy requirements of multiple solutions, each corresponding to a different assignment, requires considerable computational time. We present novel algorithms that achieve assignments with better energy requirements at lower computational times than the GA-based methods.

5.1 Overall Scheduling Process
In this section, we present the overall process of our proposed scheduling approach. A high-level description is illustrated in Figure 5-1. In the first step, tasks are assigned to processors with the goal of minimizing the total finish time of a DAG to derive a Baseline Assignment. This is done for two reasons:
- To check whether the deadline constraint can be met. If the deadline is shorter than the finish time of the DAG, the DAG cannot be feasibly executed in the required time.
- To generate an initial time-based task prioritization that determines the scheduling order of tasks. For the rest of this chapter, this time-minimizing prioritization is called the Baseline Prioritization.
If a feasible assignment is derived, a DVS-based slack allocation scheme is then applied with the goal of energy minimization. The resulting energy serves as the Baseline Energy.
In general, a lower finish time leaves a larger amount of slack that can be allocated to the appropriate tasks during the slack allocation step. This can lead to a large reduction in energy requirements compared to algorithms with a larger finish time. However, incorporating DVS-based energy minimization into the assignment process can provide better solutions for energy minimization. Thus, our goal is to derive assignments whose energy requirements are lower than the Baseline Energy.

The Baseline Prioritization, along with the energy requirements of each task, is used to generate multiple prioritizations. Each prioritization is based on a parameter that weighs the importance of time versus energy for the assignment. For each such prioritization, a time minimization assignment algorithm is applied to minimize total finish time. If the finish time for a given prioritization is larger than the deadline, the DAG cannot be feasibly executed within the required time, and the prioritization is abandoned. For all feasible prioritizations, the following steps are applied:
Step 1: An estimated deadline is assigned to each task, based on the criticality of the task in the schedule for meeting the deadline constraint.
Step 2: An assignment that minimizes the estimated DVS-based energy is computed such that the estimated deadlines defined in Step 1 are generally met.
If this yields a feasible assignment (i.e., one whose finish time is less than or equal to the deadline), a DVS-based slack allocation scheme is applied to minimize energy.
The estimated deadline assigned to each task in Step 1 is controlled by a second parameter. A higher value of this parameter potentially allows lower energy requirements by providing more flexibility in processor selection, but it also increases the probability of deriving an assignment that does not meet the deadline constraint. The above steps are executed for each value of this parameter, each potentially resulting in a different assignment. The feasible assignment with the least energy over all values of the two parameters is chosen.

[Figure 5-1 flowchart: (a) time-based and energy-based task prioritizations are combined using a weighting factor on time; each resulting task prioritization is assigned to minimize finish time and kept as a feasible task prioritization if the solution is feasible. (b) For each feasible task prioritization, task deadlines are derived from the assignment that minimizes time using a weighting factor on latest finish time; a DVS assignment to minimize energy is computed, and the feasible schedule with the minimum energy is selected.]
Figure 5-1. A high-level description of the proposed scheduling approach

It is worth noting that the above methodology is independent of the time minimization assignment algorithm and of the DVS scheme used for slack allocation. As the time minimization assignment algorithm, we use ICP-based assignment (presented in the next section), as it is shown to have superior performance over prior algorithms.
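The outer search of Section 5.1 can be sketched as a loop over the two parameters. This is a minimal sketch, assuming every algorithmic step is supplied as a callable: `baseline` stands for the time-minimizing assignment (ICP in this chapter), `slack_energy` for DVS slack allocation (PathDVS here), and `time_weight`/`lft_weight` for the weighting factor on time and the weighting factor on latest finish time shown in Figure 5-1. All of these names are hypothetical; only the control flow (feasibility checks, abandoning infeasible prioritizations, keeping the least-energy feasible schedule) follows the text.

```python
from typing import Any, Callable, Iterable, Optional, Tuple

Assignment = Any  # opaque placeholder for a task-to-processor mapping


def search_assignments(
    deadline: float,
    baseline: Callable[[], Tuple[Assignment, float]],
    prioritize: Callable[[float], Any],
    assign_time: Callable[[Any], Tuple[Assignment, float]],
    assign_energy: Callable[[Any, Assignment, float], Tuple[Assignment, float]],
    slack_energy: Callable[[Assignment], float],
    time_weights: Iterable[float],
    lft_weights: Iterable[float],
) -> Optional[Tuple[float, Assignment]]:
    """Return (energy, assignment) of the best feasible schedule, or None.

    Hypothetical hooks:
      baseline()                      -> (assignment, finish time) minimizing finish time
      prioritize(time_weight)         -> prioritization mixing time and energy
      assign_time(prio)               -> (assignment, finish time) for that prioritization
      assign_energy(prio, a, lft_w)   -> (assignment, finish time) of the energy-aware step
      slack_energy(a)                 -> energy after DVS slack allocation on assignment a
    """
    base_assignment, base_finish = baseline()
    if base_finish > deadline:
        return None  # the DAG cannot meet the deadline even at maximum voltage
    best = (slack_energy(base_assignment), base_assignment)  # Baseline Energy
    lft_weights = list(lft_weights)  # reused for every prioritization
    for time_weight in time_weights:
        prio = prioritize(time_weight)
        time_assignment, finish = assign_time(prio)
        if finish > deadline:
            continue  # infeasible prioritization is abandoned
        for lft_weight in lft_weights:
            candidate, finish = assign_energy(prio, time_assignment, lft_weight)
            if finish > deadline:
                continue  # energy-aware assignment missed the deadline
            energy = slack_energy(candidate)
            if energy < best[0]:
                best = (energy, candidate)
    return best


# Toy usage with made-up hooks and energies.
energies = {"base": 100.0, "p0.5-e0.0": 80.0, "p1.0-e0.0": 90.0}
result = search_assignments(
    deadline=10.0,
    baseline=lambda: ("base", 8.0),
    prioritize=lambda w: f"p{w}",
    assign_time=lambda prio: (prio + "-t", 12.0 if prio == "p0.0" else 7.0),
    assign_energy=lambda prio, ta, b: (f"{prio}-e{b}", 9.0 + 2.0 * b),
    slack_energy=energies.__getitem__,
    time_weights=[0.0, 0.5, 1.0],
    lft_weights=[0.0, 1.0],
)
print(result)  # (80.0, 'p0.5-e0.0')
```

Keeping the baseline schedule as the initial incumbent means the search can never return a schedule worse than the Baseline Energy, which matches the stated goal of improving on it.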
For the DVS scheme, we have chosen PathDVS (presented in Chapter 3), which provides near-optimal solutions for slack allocation with small computational time requirements.

5.2 Proposed Static Assignment to Minimize Finish Time
Several scheduling algorithms that generate an assignment minimizing the finish time of DAGs in a heterogeneous environment have recently been proposed [44, 62, 64]. Most of them are static list scheduling heuristics, for example, Dynamic Level Scheduling (DLS) [62], Heterogeneous Earliest Finish Time (HEFT) [64], and Iterative List Scheduling (ILS) [44]. At each step, the DLS algorithm selects a task to schedule and the processor on which it will be executed using earliest task first. The HEFT algorithm reduces the cost of scheduling by using precalculated task priorities and selects a processor by earliest finish time. This can in general provide better performance than the DLS algorithm. However, since HEFT uses the average computation time of a task across all processors to determine task priorities, it may lead to an inaccurate ordering for executing tasks. To address this problem, the ILS algorithm generates an initial schedule using HEFT and iteratively improves it by updating the task priorities.

Our approach is based on the observation that task prioritization can be improved by using a group-based approach. The proposed assignment algorithm, called Iterative Critical Path (ICP), has two main features. First, it assigns multiple independent ready tasks simultaneously. The priority of a task depends on estimating the execution path from this task to the last task of the DAG representing the workflow.
Since the mapping of tasks yet to be scheduled is unknown and the cost of task execution depends on the assigned processor, the priority has to be approximated during scheduling. Hence, it is difficult to explicitly distinguish the execution order of tasks with similar priorities. Using this intuition, the proposed algorithm groups independent ready tasks with similar priorities and finds an optimal solution (i.e., a resource assignment) for this subset of tasks simultaneously. Here, the set of ready tasks that can be assigned consists of tasks for which all the predecessors have already been assigned. Second, it iteratively refines the schedule, using the cost of the critical path based on the assignment generated in the previous iteration. Assuming that the mappings of the previous iteration are good, this provides a better estimate of the cost of the critical path than the average or median computation and communication times used as the estimate in the first iteration.

5.2.1 Task Selection
To determine the scheduling order of tasks in our algorithm, the priority of each task is computed using its critical path, which is the length of the longest path from the task to an exit task. The critical path of each task is computed by traversing the graph upward, starting from an exit task. The critical path of task i, cp_i, is defined by

$$cp_i = \mathit{avgCompTime}_i + \max_{j \in succ_i} \left( \mathit{avgCommTime}_{ij} + cp_j \right)$$

where avgCompTime_i is the average computation time of task i, avgCommTime_ij is the average communication time between task i and task j, and succ_i is the set of direct successors of task i in the DAG. Using the critical path of each task, the tasks are sorted into non-increasing order of their critical path values. This task ordering preserves the original precedence constraints among tasks of the given DAG, since cp_i exceeds cp_j for every successor j of task i whenever computation times are positive.

During the assignment process, at each step a list of ready tasks is used to determine the next set of tasks that can be assigned. The list of ready tasks consists of tasks for which all the predecessors have already been assigned. ICP finds a subset of these ready tasks whose critical path values are similar and which have no precedence relationships with each other. In other words, the subset is composed of independent tasks whose predecessors have all been assigned to processors. The size of the selected subset is bounded by a prespecified threshold value. The average computation and communication times are used in the initial step. After the first assignment, the actual computation and communication times based on the previous assignment are used to compute the critical path.

5.2.2 Processor Selection
ICP optimally and simultaneously assigns the multiple independent ready tasks selected in the previous step to the available processors. For a list of independent ready tasks, ICP finds the best processor for each task in the list such that either the total finish time of the selected subset of tasks is minimized (Option 1) or the sum of the finish times on the processors is minimized (Option 2). Option 1 applies the goal of reducing the finish time of the DAG directly, whereas Option 2 increases the possibility of minimizing the finish times of subsequent tasks by leaving more room for them. Either method, or a combination of both, can be applied for processor selection. The optimal solution for processor selection over the selected subset of tasks is generated using an ILP (Integer Linear Programming) formulation.
The formulation for Option 1 is as follows:

$$\text{Minimize} \quad \max_{i \in s} \; ftime_{i,p(i)}$$

$$\text{subject to} \quad ftime_{ij} = compTime_{ij} + \max\Big( availableTime_{ij},\; \max_{k \in pred_i} \big( ftime_{k,p(k)} + commTime_{i,j,k,p(k)} \big) \Big), \quad \forall i \in s,\; p_j \in P$$

Here, $ftime_{ij}$ is the finish time of task i on processor $p_j$, s is the subset of ready tasks, P is the set of processors, $compTime_{ij}$ is the computation time of task i on processor $p_j$, p(k) is the processor to which task k is assigned, $commTime_{i,j,k,p(k)}$ is the communication time between task i on processor $p_j$ and task k on processor p(k), and $pred_i$ is the set of direct predecessors of task i in the DAG. $availableTime_{ij}$ is the time at which task i can start in a free slot of processor $p_j$. For Option 2, only the objective function differs from Option 1; the constraints are the same. The formulation for Option 2 is as follows:

$$\text{Minimize} \quad \sum_{i \in s} ftime_{i,p(i)}$$

subject to the same constraints as in Option 1.

We found that the schedules generated with either of these two options were comparable. Thus, in the following, we limit ourselves to Option 1.

5.2.3 Iterative Scheduling

The ICP assignment method uses iterative scheduling in order to obtain a better estimate of the cost of the critical path. Figure 5-2 presents a high-level description of the ICP assignment procedure.

In the first iteration, the critical path is estimated from the average computation time across all processors for the tasks yet to be scheduled, which can make the estimate inaccurate. To reduce or eliminate the possibility of an inappropriate assignment due to an inaccurate critical-path estimate, ICP iteratively reschedules tasks using a critical path determined from the assignment produced by the previous iteration of the scheduling algorithm. In other words, the critical path of each task depends on the previous assignment: the computation time used for each task is its computation time on its assigned processor rather than the average across all processors, and the communication time between tasks is likewise the value determined by their assigned processors. This iterative refinement continues until the total finish time no longer decreases or a pre-specified number of iterations has been completed. The threshold on the size of the task subset starts at a fixed value and is decremented by one if no reduction in finish time (i.e., schedule length) is seen after a few iterations. Changing the threshold increases the possibility of improving the finish time.

Figure 5-2. The ICP procedure

Initialize
1. minFinishTime = maxValue
2. Compute the average computation time and communication time for each task
3. Compute the critical path value for each task based on the average values
Procedure ICP
4. While the performance continues to improve do
5.   Generate the list of tasks, sorted in non-increasing order of critical path values
6.   While the list of tasks is not empty do
7.     Find tasks i whose priorities are close, where i ∉ succ_k and k ∈ s
8.     Insert them into the list of ready tasks s
9.     Assign the ready tasks based on the ILP formulation
10.    If the finish time of any assigned task >= minFinishTime then
11.      Break
12.    End If
13.    Delete the tasks in s from the list and empty s (s = {})
14.  End While
15.  If the total finish time is less than minFinishTime then
16.    Update minFinishTime
17.    Assign each task i to its selected processor
18.  End If
19.  Compute the critical path based on the current assignment
20.  If the total finish time has not improved for more than k iterations with the current threshold, or the critical path is the same as for a previous assignment, then
21.    Change the threshold on the number of ready tasks
22.  End If
23. End While
End Procedure
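For a small subset, the Option 1 objective can be evaluated by brute force instead of with an ILP solver, which makes the formulation easy to check by hand. The sketch below is such a stand-in, not the ILP used by ICP: it enumerates every processor choice for the subset, applies the finish-time constraint, and keeps the choice with the smallest maximum finish time. Two simplifications are assumptions: `availableTime_ij` is modelled as the time at which processor p_j becomes free (no insertion into idle gaps), and subset tasks sent to the same processor are queued in list order. The names `best_option1` and `comm` are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

CommFn = Callable[[int, int, int, int], float]  # comm(k, p(k), i, p_j)


def best_option1(subset: List[int], procs: List[int],
                 comp: Dict[int, Dict[int, float]],
                 proc_free: Dict[int, float],
                 preds: Dict[int, List[int]],
                 placed: Dict[int, Tuple[int, float]],
                 comm: CommFn
                 ) -> Optional[Tuple[float, Dict[int, int], Dict[int, float]]]:
    """Enumerate all processor choices for `subset` and minimise
    max_{i in s} ftime_{i,p(i)} (Option 1).  `placed[k] = (p(k), ftime_k)`
    for every already-assigned predecessor k."""
    best = None
    for choice in product(procs, repeat=len(subset)):
        free = dict(proc_free)               # processor -> time it becomes free
        ftime: Dict[int, float] = {}
        for i, j in zip(subset, choice):
            start = free[j]                  # stand-in for availableTime_ij
            for k in preds.get(i, []):
                pk, fk = placed[k]
                start = max(start, fk + comm(k, pk, i, j))
            ftime[i] = start + comp[i][j]    # ftime_ij = compTime_ij + max(...)
            free[j] = ftime[i]               # later subset tasks on p_j queue behind i
        objective = max(ftime.values())      # Option 2 would use sum(ftime.values())
        if best is None or objective < best[0]:
            best = (objective, dict(zip(subset, choice)), ftime)
    return best


def comm(k: int, pk: int, i: int, j: int) -> float:
    """Hypothetical cost: 1 time unit across processors, 0 on the same one."""
    return 0.0 if pk == j else 1.0


# Task 1 already runs on processor 0 and finishes at time 2;
# tasks 2 and 3 are its independent successors.
result = best_option1([2, 3], [0, 1],
                      comp={2: {0: 2.0, 1: 3.0}, 3: {0: 2.0, 1: 2.0}},
                      proc_free={0: 2.0, 1: 0.0},
                      preds={2: [1], 3: [1]},
                      placed={1: (0, 2.0)},
                      comm=comm)
print(result)   # (5.0, {2: 0, 3: 1}, {2: 4.0, 3: 5.0})
```

Enumeration grows as |P|^|s|, which is one reason the subset size is bounded by a threshold; an ILP solver replaces this loop in practice.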
5.3 Proposed Static Assignment to Minimize Energy

As described earlier, prior research on scheduling for energy minimization has concentrated on the slack allocation step, minimizing the energy requirements during that phase while using simple list-based scheduling approaches that minimize total finish time for the assignment step. Unlike these methods, our proposed assignment algorithm considers energy requirements based on the potential slack during the assignment step. The main features of our assignment algorithm are as follows.

First, it utilizes expected DVS-based energy information during assignment. Our algorithm assigns each task to the processor that minimizes the total energy expected after slack allocation. The expected energy after slack allocation (i.e., the expected DVS-based energy) for each task is computed using an estimated deadline for each task, chosen so that the overall DAG can be executed within its deadline.

Second, it considers multiple task prioritizations. We test multiple assignments using multiple task prioritizations based on tradeoffs between energy and time for each task. These assignments can potentially be executed in parallel to reduce the computational time (i.e., the runtime of the algorithm).

The details of task prioritization, the estimated deadline for each task, and processor selection for our assignment algorithm to minimize DVS-based energy are described in the following subsections.

5.3.1 Task Prioritization

In time-minimization assignment methods, the priorities of tasks, which determine the scheduling order, are based only on time information, without any attention to energy requirements. In our algorithm, task prioritization is based on a weighted sum of the time and energy requirements.
After an assignment algorithm that minimizes finish time is applied (the baseline assignment), the baseline prioritization (i.e., the time-based prioritization) is generated and used to determine the task prioritization when the assignment algorithm is reapplied. An appropriate choice of weight provides tradeoffs between energy and deadline constraints.

To compute the time-based priority of each task (i.e., the baseline prioritization), we use its critical path, which is the length of the longest path from the task to an exit task. The critical path of each task is computed in the same way as for the ICP assignment presented in the previous section. The task ordering list generated from the critical paths of the tasks preserves the original precedence constraints among the tasks of the given DAG. The critical path of task i, $cp_i$, is defined by

$$cp_i = compTime_i + \max_{j \in succ_i} \left( commTime_{ij} + cp_j \right)$$

where $compTime_i$ is the computation time of task i, $commTime_{ij}$ is the communication time between task i and task j, and $succ_i$ is the set of direct successors of task i in the DAG (for an exit task, $cp_i = compTime_i$).

Given the baseline prioritization, the priority of each task used in our algorithm is recomputed by incorporating energy information. The priority of task i, $priority_i$, is defined by

$$priority_i = \alpha \frac{cp_i}{CP} + (1 - \alpha) \frac{energy_i}{\sum_{k \in \Gamma} energy_k}, \qquad 0 \le \alpha \le 1$$

where CP is the critical path of the DAG (i.e., the total finish time of the DAG), $cp_i$ is the critical path of task i, $energy_i$ is the energy consumed to execute task i, α is the weight of time, and Γ is the set of all tasks in the DAG. If the weighting factor α is closer to zero, a task that requires more energy is assigned to an appropriate processor with higher priority than tasks with lower energy consumption. This is expected to lead to better energy performance. However, because time information is ignored, the finish time of the DAG may become larger, and the deadline constraints may not even be satisfied.
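The critical-path recurrence and the weighted priority translate directly into code. The sketch below computes cp_i by memoised recursion over successors (so that cp_i = compTime_i for an exit task), takes CP as the largest cp_i, and evaluates priority_i for a given weight of time α. It also shows one possible way to keep the ordering consistent with precedence, by placing any unplaced predecessors of a task (highest priority first) just before the task itself; that rule, the dictionary inputs, and the function names are assumptions rather than the thesis's implementation.

```python
from typing import Dict, List, Tuple


def critical_paths(comp: Dict[int, float],
                   comm: Dict[Tuple[int, int], float],
                   succ: Dict[int, List[int]]) -> Dict[int, float]:
    """cp_i = compTime_i + max_{j in succ_i}(commTime_ij + cp_j)."""
    cp: Dict[int, float] = {}

    def visit(i: int) -> float:
        if i not in cp:
            tail = max((comm[(i, j)] + visit(j) for j in succ.get(i, [])),
                       default=0.0)          # exit task: no successors
            cp[i] = comp[i] + tail
        return cp[i]

    for i in comp:
        visit(i)
    return cp


def priorities(cp: Dict[int, float], energy: Dict[int, float],
               alpha: float) -> Dict[int, float]:
    """priority_i = alpha*cp_i/CP + (1-alpha)*energy_i/sum_k energy_k."""
    CP = max(cp.values())                    # critical path of the whole DAG
    total = sum(energy.values())
    return {i: alpha * cp[i] / CP + (1 - alpha) * energy[i] / total
            for i in cp}


def precedence_order(prio: Dict[int, float],
                     preds: Dict[int, List[int]]) -> List[int]:
    """Visit tasks by decreasing priority, but place a task's unplaced
    predecessors (highest priority first) before the task itself."""
    order: List[int] = []
    placed = set()

    def place(i: int) -> None:
        if i in placed:
            return
        for p in sorted(preds.get(i, []), key=lambda t: -prio[t]):
            place(p)
        placed.add(i)
        order.append(i)

    for i in sorted(prio, key=lambda t: -prio[t]):
        place(i)
    return order


# Diamond DAG 1 -> {2, 3} -> 4 with unit communication costs.
comp = {1: 2.0, 2: 2.0, 3: 2.0, 4: 2.0}
succ = {1: [2, 3], 2: [4], 3: [4]}
preds = {2: [1], 3: [1], 4: [2, 3]}
comm = {(1, 2): 1.0, (1, 3): 1.0, (2, 4): 1.0, (3, 4): 1.0}
energy = {1: 1.0, 2: 5.0, 3: 20.0, 4: 2.0}
cp = critical_paths(comp, comm, succ)
print(cp)                                                    # {4: 2.0, 2: 5.0, 3: 5.0, 1: 8.0}
print(precedence_order(priorities(cp, energy, 0.0), preds))  # [1, 3, 2, 4]
```

With α = 0 the energy-heavy task 3 ranks first, but its predecessor task 1 is still placed ahead of it.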
If the weighting factor is closer to one, the probability of a feasible assignment of the DAG is higher, but the lack of energy information may lead to poorer energy performance.

The above prioritization is modified to accommodate the precedence relationships among tasks during assignment, i.e., a successor task is always assigned after its predecessor tasks. For instance, consider the DAG of Figure 1-1 and an ordering of its tasks by priority value. Because of the precedence relationships among tasks, the actual ordering used for assignment differs from this ordering: tasks 3 and 2 precede task 5 even though their priorities are lower, and another task is also placed ahead of task 4 so that task 5 can be executed ahead of task 4, as their priorities require.

5.3.2 Estimated Deadline for a Task

The goal of the assignment is to minimize the expected total energy consumption after slack allocation while still satisfying the deadline constraints. Consider a scenario in which a subset of tasks has already been assigned and the next task in the prioritization list has to be assigned. The choice of processors for this task should be limited to those on which the expected finish time allows the overall assignment to meet the deadline constraints (otherwise the assignment will be infeasible). Clearly, when the assignment of a given task is being determined, there is no guarantee that the resulting schedule will be feasible (i.e., will meet the deadline), because feasibility depends on the assignment of the remaining tasks, which has not yet been determined. The proposed algorithm therefore calculates an estimated deadline for each task, that is, a deadline that is expected to enable a feasible schedule if the task's finish time satisfies it.
The estimated deadline of a task is a value interpolated between its earliest finish time and its latest finish time using a weighting factor β. The latest finish time $LFT_i$, the earliest finish time $EFT_i$, and the estimated deadline $d_i$ of task i are defined, respectively, by

$$LFT_i = \min\Big( deadline_i,\; LFT_{pSucc_i} - compTime_{pSucc_i},\; \min_{j \in succ_i} \big( LFT_j - compTime_j - commTime_{ij} \big) \Big)$$

$$EFT_i = \max\Big( start_i,\; EFT_{pPred_i},\; \max_{j \in pred_i} \big( EFT_j + commTime_{ij} \big) \Big) + compTime_i$$

$$d_i = \beta \, LFT_i + (1 - \beta) \, EFT_i, \qquad 0 \le \beta \le 1$$

where $deadline_i$ is the deadline of task i, $start_i$ is the start time of task i, $compTime_i$ is the computation time of task i on its assigned processor, $commTime_{ij}$ is the communication time between task i and task j on their assigned processors, $succ_i$ is the set of direct successors of task i in the DAG, $pSucc_i$ is the task placed immediately after task i on the same processor, $pred_i$ is the set of direct predecessors of task i in the DAG, $pPred_i$ is the task placed immediately before task i on the same processor, and β is the weight of the latest finish time. If β is closer to one, the task has more flexibility in processor assignment, since it is allowed to take longer to complete; however, the probability of a feasible assignment of the DAG may be lower. If β is closer to zero, there is less flexibility in assigning the task to a processor, but the probability of a feasible assignment of the DAG is higher. Moreover, since this potentially leaves more slack after assignment, that slack can be allocated by the DVS algorithm to minimize energy.

5.3.3 Processor Selection

Figure 5-3 presents a high-level description of the assignment procedure for a given task prioritization. Each task is assigned to a processor such that the total energy consumption expected after applying the DVS scheme to the tasks assigned so far (including the task being considered) is minimized, while trying to meet the estimated deadline of the task.
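The recurrences for LFT_i, EFT_i, and d_i of Section 5.3.2 can be evaluated with one forward and one backward pass over an order that is consistent with both the DAG edges and the task sequence on each processor. The sketch below assumes a single DAG deadline for every task (deadline_i = D), start_i = 0, and communication costs already resolved for the current assignment (zero between tasks on the same processor); β is the weight of the latest finish time. The diamond DAG 1 → {2, 3} → 4, the cross-processor communication time of 1, and the placement of tasks 1 and 2 on p1 and tasks 3 and 4 on p2 are assumptions inferred from the numbers in the example of Section 5.3.3.2 (finish time 7, energy 27), because the edges of Figure 5-4 (a) are not given in the text; with a deadline of 9 and β = 1 they reproduce the estimated deadlines 4, 6, 7, and 9 quoted there. The function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def estimated_deadlines(order: List[int],
                        comp: Dict[int, float],
                        comm: Dict[Tuple[int, int], float],
                        succ: Dict[int, List[int]],
                        pred: Dict[int, List[int]],
                        p_succ: Dict[int, Optional[int]],
                        p_pred: Dict[int, Optional[int]],
                        deadline: float, beta: float, start: float = 0.0):
    """Return (d, EFT, LFT) for a fixed assignment.

    `comp[i]` is compTime_i on the assigned processor, `comm[(i, j)]` is the
    communication time for edge i -> j under the assignment, and
    `p_pred` / `p_succ` give the neighbouring task on the same processor.
    """
    eft: Dict[int, float] = {}
    for i in order:                              # forward pass: EFT
        t = start
        if p_pred.get(i) is not None:
            t = max(t, eft[p_pred[i]])
        for j in pred.get(i, []):
            t = max(t, eft[j] + comm[(j, i)])
        eft[i] = t + comp[i]

    lft: Dict[int, float] = {}
    for i in reversed(order):                    # backward pass: LFT
        t = deadline
        s = p_succ.get(i)
        if s is not None:
            t = min(t, lft[s] - comp[s])
        for j in succ.get(i, []):
            t = min(t, lft[j] - comp[j] - comm[(i, j)])
        lft[i] = t

    d = {i: beta * lft[i] + (1 - beta) * eft[i] for i in order}
    return d, eft, lft


# Assumed assignment: p1 runs tasks 1 then 2, p2 runs tasks 3 then 4.
order = [1, 2, 3, 4]
comp = {1: 2.0, 2: 2.0, 3: 2.0, 4: 2.0}
comm = {(1, 2): 0.0, (1, 3): 1.0, (2, 4): 1.0, (3, 4): 0.0}
succ = {1: [2, 3], 2: [4], 3: [4]}
pred = {2: [1], 3: [1], 4: [2, 3]}
p_succ = {1: 2, 3: 4}
p_pred = {2: 1, 4: 3}
d, eft, lft = estimated_deadlines(order, comp, comm, succ, pred,
                                  p_succ, p_pred, deadline=9.0, beta=1.0)
print(d)    # {1: 4.0, 2: 6.0, 3: 7.0, 4: 9.0}
print(eft)  # {1: 2.0, 2: 4.0, 3: 5.0, 4: 7.0}
```

Lowering β moves each d_i toward EFT_i; with β = 0.5 the same assignment gives estimated deadlines of 3, 5, 6, and 8.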
The candidate processors for the task are those on which it can finish within its estimated deadline. Once the candidate processors have been identified, the next step depends on the following conditions.

First, if no processor allows the task to meet its estimated deadline, the processor with the earliest finish time is selected. The schedule may still turn out to be feasible, since the assignment is based on estimated times for future tasks whose assignment is yet to be determined. When a task finishes between its earliest finish time and its latest finish time, we assume that the deadline of the DAG can be met with high probability; by selecting the processor on which the task finishes earliest, the chance of meeting the deadline increases.

Second, if exactly one candidate processor meets the above constraint, the task is assigned to that processor, again to increase the chance of meeting the deadline constraints.

Finally, if more than one candidate processor meets the above constraint, the processor is selected such that the total energy expected after slack allocation is minimized. The expected total energy is the sum of the expected energy of the already-assigned tasks and of the task being considered for assignment. To compute the expected energy for a given processor assignment in this step, a faster heuristic strategy is used instead of PathDVS (which provides nearly optimal solutions); this procedure is described in the next subsection.

The above selection process is performed iteratively until all tasks are assigned. However, if the finish time of a task exceeds the deadline, the process stops.

5.3.3.1 Greedy approach for the computation of expected energy

The unit slack allocation used in the PathDVS algorithm (described in Chapter 3) finds the subset of tasks that maximally reduces the total energy consumption.
This corresponds to the maximum weighted independent set (MWIS) problem [7, 53, 65], which is computationally intensive. Our approach requires a DVS scheme to be applied during the assignment of each task, in order to compute the expected DVS-based energy used to select the best processor in the processor selection step. This is an intermediate step in which exact energy estimates are not as important as in the slack allocation step. To reduce the time requirements of the optimal branch-and-bound strategy for unit slack allocation described in Chapter 3, a greedy algorithm for the MWIS problem [53] can be used, which still provides good estimates of energy. The greedy algorithm in our approach is as follows:

- Select the task with the maximum energy reduction (i.e., the energy reduced when unit slack is allocated to it) among all tasks (i.e., the already-assigned tasks and the task being considered for assignment).
- Select the task with the maximum energy reduction among the tasks that are independent of all previously selected tasks.
- Repeat the selection until no task remains that is independent of all the selected tasks.

The above greedy approach for unit slack allocation is performed iteratively until there is no slack left, or no task to which slack can be allocated under the estimated deadline constraints. In the proposed greedy approach, the independent tasks can be identified easily using a compatible task matrix or lists, which record, for each task, the tasks that can share unit slack with it, as in PathDVS.

Figure 5-3. The DVSbasedAssignment procedure

Procedure DVSbasedAssignment
1. Compute the estimated deadline for each task
2. For each task i
3.   Find the processors on which task i can execute within its estimated deadline d_i
     Condition 1: If there is no such processor
     4.1. If the finish time of task i > deadline
     4.2.   Stop the procedure
     4.3. Else
     4.4.   Select the processor on which the finish time of task i is minimized
     4.5. End If
     Condition 2: If there is exactly one such processor
     4.1. Select that processor for task i
     Condition 3: If there is more than one such processor
     4.1. Apply the greedy algorithm for the weighted independent task set problem to task i and the already-assigned tasks
     4.2. Select the processor on which the total energy is minimized
5. End For
End Procedure

5.3.3.2 Example for assignment

In the following, we use a simple example to illustrate the benefit of considering DVS-based expected energy for tasks during the assignment process. Figure 5-4 (a) and (b) show a DAG with 4 tasks, along with the execution time and energy consumption of each task on each processor at the maximum voltage level. The energy requirements of the tasks vary widely (this was done mainly to keep the example simple in terms of the number of tasks). An assignment that minimizes total finish time is presented in Figure 5-4 (c). Its total finish time is 7, and the corresponding energy consumption before slack allocation is 27. The time-based task prioritization corresponding to this assignment is task 1, task 2, task 3, task 4. Consider a deadline of 9 for completing the execution of the DAG. This prioritization is clearly feasible, since the finish time is less than the deadline. The estimated deadline for each task is determined using the assignment shown in Figure 5-4 (c): with the weighting factor of the latest finish time equal to one, the estimated deadlines of tasks 1, 2, 3, and 4 are 4, 6, 7, and 9, respectively.

The proposed assignment method to minimize energy is now applied (note that the energy model follows a quadratic function and the unit of slack is one time unit). In the following, we show the assignment process based on the above prioritization order. First, consider task 1. If task 1 is assigned to processor p1, there is an estimated slack of two units, since its finish time is 2 and its estimated deadline is 4. After the slack is allocated to the task, its energy consumption is 0.25.
If the task is assigned to processor p2, the expected energy is 2.5 after the estimated slack of two units is allocated to it. Thus, task 1 is assigned to processor p1.

Second, consider task 2. If this task is assigned to processor p1, the estimated slack is two units. The entire slack can be allocated to task 1 or to task 2, or one unit can be allocated to each of tasks 1 and 2. The best choice is to allocate the whole slack to task 2, which gives a total energy of 2.25 for tasks 1 and 2 under this assignment. If the task is assigned to processor p2, there is no estimated slack (since the estimated deadline of task 2 is 6). However, the total expected energy under this assignment is 2, so task 2 is assigned to processor p2.

Next, consider task 3. Processor p2 is not considered for task 3 because the finish time of task 3 on processor p2 exceeds its estimated deadline. Therefore, task 3 is assigned to processor p1.

Finally, consider task 4. If this task is assigned to processor p1, the estimated slack is three units. In this case, the entire slack is allocated to task 3, and the total expected energy is 7.2. If the task is assigned to processor p2, a slack of two units is allocated to task 3 and a slack of one unit to task 2, and the total expected energy is 7.6. Thus, task 4 should be assigned to processor p1, even though its energy requirement on processor p2 (i.e., its energy before slack allocation) is less than that on processor p1.

Figure 5-4 (d) shows the assignment that minimizes DVS-based energy. Here the total finish time is 9, and the total energy consumption before slack allocation is 24. Once the assignment is completed, a slack allocation algorithm is applied to minimize the total energy requirements. Let us now compare the two assignments of Figure 5-4 (c) and (d) after slack allocation. For the assignment in Figure 5-4 (c) (i.e., the assignment that minimizes finish time), a slack of two units is allocated to each of tasks 2 and 3, resulting in a total energy of 8.25. For the assignment in Figure 5-4 (d) (i.e., the assignment that minimizes energy), the total energy after slack allocation is 7.2, which corresponds to allocating the slack of three units to task 3. This represents a 12.7% improvement in overall energy requirements. The algorithm achieves this improvement by focusing the potential slack on task 3, which has higher energy requirements.

Figure 5-4. Example of assignment to minimize finish time and assignment to minimize DVS based energy: (a) DAG, (b) Execution time and energy information for each task on two processors, (c) Assignment to minimize finish time, (d) Assignment to minimize DVS based energy (i.e., our assignment)

5.4 Experimental Results for Assignment Algorithms that Minimize Finish Time

In this section, we compare our algorithm with algorithms that minimize total finish time, each followed by slack allocation. We compare the performance of our algorithm with ILS [44] and HEFT [64]; the latter two algorithms have been shown to be superior to existing algorithms for minimizing time in heterogeneous environments. We combined these assignment algorithms with three DVS algorithms, namely PathDVS (presented in Chapter 3), EProfileDVS [48, 55], and GreedyDVS [13], in order to see whether the choice of DVS algorithm makes a difference in the relative comparison of the three assignment algorithms. The unit slack size for PathDVS (i.e., unitSlack) is set to the best value obtained empirically in the experiments of Chapter 3: unitSlack = 0.001 × total finish time.
Figure 5-4 (b). Execution time and energy of each task on processors P1 and P2 at the maximum voltage level:

| Task | Time on P1 | Time on P2 | Energy on P1 | Energy on P2 |
|---|---|---|---|---|
| 1 | 2 | 2 | 1 | 10 |
| 2 | 2 | 3 | 5 | 1 |
| 3 | 2 | 2 | 20 | 20 |
| 4 | 2 | 2 | 2 | 1 |

5.4.1 Simulation Methodology

In this section, we describe the DAG generation and the performance measures used in our experiments.

5.4.1.1 The DAG generation

We randomly generated a large number of graphs with 50 and 100 tasks. The execution time of each task on each processor at the maximum voltage varies from 10 to 40 units (given that we are targeting a heterogeneous environment), and the communication time between a task and its child task for a pair of processors varies from 1 to 4 units. The energy consumed to execute each task on each processor varies from 10 to 80. The graphs are executed on 4, 8, and 16 processors. For each combination of number of tasks and number of processors, 20 different synthetic graphs are generated.

5.4.1.2 Performance measures

We used total finish time and improvement in total energy consumption to compare the different algorithms. The deadline extension rate is the fraction of the total finish time that is added to the deadline, i.e., deadline = (1 + deadline extension rate) × the maximum total finish time among the assignments before applying DVS. We provide experimental results for deadline extension rates of 0 (no deadline extension), 0.2, 0.4, 0.6, 0.8, and 1.0. For ICP, the total number of iterations and the number of iterations without improvement allowed for the same threshold are set to 10 and 3, respectively, and the threshold varies from 1 to 4.
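A generator in the spirit of Section 5.4.1.1 can be sketched as follows. The text gives only the value ranges, so everything else here is an assumption: values are drawn uniformly as real numbers, edges are added independently with a hypothetical probability `edge_prob` from lower- to higher-numbered tasks (which guarantees acyclicity), and communication between two tasks on the same processor is taken to be zero. The helper `deadlines` applies the deadline-extension formula of Section 5.4.1.2. All function and parameter names are hypothetical.

```python
import random
from typing import List, Optional


def random_instance(n_tasks: int, n_procs: int, edge_prob: float = 0.1,
                    seed: Optional[int] = None):
    """Random heterogeneous DAG instance using the ranges of Section 5.4.1.1."""
    rng = random.Random(seed)
    tasks = range(n_tasks)
    procs = range(n_procs)
    # Execution time (10-40) and energy (10-80) of each task on each processor.
    comp = {i: {p: rng.uniform(10, 40) for p in procs} for i in tasks}
    energy = {i: {p: rng.uniform(10, 80) for p in procs} for i in tasks}
    # Edges only go from a lower to a higher index, so the graph is acyclic.
    edges = [(i, j) for i in tasks for j in range(i + 1, n_tasks)
             if rng.random() < edge_prob]
    # Communication time (1-4) per edge and processor pair; 0 on the same processor.
    comm = {e: {(p, q): 0.0 if p == q else rng.uniform(1, 4)
                for p in procs for q in procs}
            for e in edges}
    return comp, energy, edges, comm


def deadlines(finish_time: float, rates: List[float]) -> List[float]:
    """deadline = (1 + deadline extension rate) * total finish time."""
    return [(1.0 + r) * finish_time for r in rates]


comp, energy, edges, comm = random_instance(50, 4, seed=1)
print(len(comp), len(edges))
print(deadlines(100.0, [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]))
```

Passing a fixed `seed` makes a batch of synthetic graphs reproducible, which is convenient when the same 20 graphs per configuration must be fed to every assignment and DVS combination.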
5.4.2 Comparison of Assignment Algorithms Using Different DVS Algorithms

We compared our algorithm, ICP, with ILS [44] and HEFT [64], which outperform other existing algorithms in terms of total finish time. The algorithms are compared in terms of total finish time and of total energy consumption after slack allocation, in order to show the relationship between minimizing finish time and minimizing energy consumption.

A comparison of the three algorithms shows that ICP was slightly better than ILS and considerably better than HEFT in terms of total finish time: the average total finish time of ICP is 3.95% and 9.31% lower than that of ILS and HEFT, respectively.

Tables 5-1 through 5-6 show the improvement in energy consumption of ICP-PathDVS over the other combinations of the three assignment algorithms (ICP, ILS, and HEFT) and the three DVS algorithms (EProfileDVS, GreedyDVS, and PathDVS), for different deadline extension rates and for each combination of 50 and 100 tasks on 4, 8, and 16 processors. The results show that our assignment algorithm, ICP, leads to lower energy requirements than the other assignment algorithms regardless of the DVS algorithm used. For instance, with the PathDVS algorithm, the energy of the ICP assignment is 11-14% lower than that of ILS and 13-17% lower than that of HEFT. We believe the main reason is that a lower finish time leaves a larger amount of slack, which can be allocated optimally to the appropriate tasks during the slack allocation step. This leads to a large reduction in energy requirements compared to an algorithm with a larger finish time. The results also show that PathDVS (presented in Chapter 3) outperforms the other DVS algorithms in terms of minimizing energy, regardless of the assignment algorithm used.
For instance, given the ICP assignment, PathDVS improves on EProfileDVS by 4-18% and on GreedyDVS by 19-84%, depending on the deadline extension rate. Finally, the combination of ICP and PathDVS outperforms all other combinations. For instance, ICP combined with PathDVS provides an improvement of 13-26% over the combination of ILS and EProfileDVS.

Table 5-1. Results for 50 tasks and 4 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)

| Assignment | DVS algorithm | Rate 0 | Rate 0.2 | Rate 0.4 | Rate 0.6 | Rate 0.8 | Rate 1.0 |
|---|---|---|---|---|---|---|---|
| ICP | EProfileDVS | 2.83% | 5.97% | 6.75% | 7.08% | 7.31% | 7.36% |
| ICP | GreedyDVS | 19.82% | 47.24% | 61.90% | 71.08% | 77.29% | 81.70% |
| ILS | PathDVS | 12.15% | 11.68% | 11.93% | 12.10% | 12.28% | 12.33% |
| ILS | EProfileDVS | 13.86% | 16.05% | 16.81% | 17.14% | 17.34% | 17.38% |
| ILS | GreedyDVS | 24.98% | 50.41% | 64.19% | 72.83% | 78.66% | 82.80% |
| HEFT | PathDVS | 21.80% | 17.94% | 17.88% | 17.98% | 18.14% | 18.19% |
| HEFT | EProfileDVS | 21.88% | 21.83% | 22.25% | 22.42% | 22.62% | 22.65% |
| HEFT | GreedyDVS | 26.00% | 50.08% | 63.93% | 72.63% | 78.51% | 82.68% |

Table 5-2. Results for 50 tasks and 8 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)

| Assignment | DVS algorithm | Rate 0 | Rate 0.2 | Rate 0.4 | Rate 0.6 | Rate 0.8 | Rate 1.0 |
|---|---|---|---|---|---|---|---|
| ICP | EProfileDVS | 3.72% | 10.08% | 12.08% | 13.14% | 13.94% | 14.20% |
| ICP | GreedyDVS | 20.40% | 49.71% | 64.46% | 73.37% | 79.29% | 83.40% |
| ILS | PathDVS | 12.20% | 11.97% | 12.80% | 13.52% | 14.17% | 14.49% |
| ILS | EProfileDVS | 14.82% | 20.55% | 22.29% | 23.36% | 23.99% | 24.31% |
| ILS | GreedyDVS | 26.64% | 53.44% | 67.10% | 75.36% | 80.85% | 84.65% |
| HEFT | PathDVS | 20.64% | 17.64% | 17.75% | 18.26% | 18.84% | 19.11% |
| HEFT | EProfileDVS | 21.06% | 24.97% | 26.37% | 27.35% | 27.95% | 28.20% |
| HEFT | GreedyDVS | 27.09% | 52.62% | 66.47% | 74.87% | 80.46% | 84.34% |

Table 5-3. Results for 50 tasks and 16 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)

| Assignment | DVS algorithm | Rate 0 | Rate 0.2 | Rate 0.4 | Rate 0.6 | Rate 0.8 | Rate 1.0 |
|---|---|---|---|---|---|---|---|
| ICP | EProfileDVS | 5.04% | 11.73% | 13.00% | 13.93% | 14.35% | 14.60% |
| ICP | GreedyDVS | 20.99% | 49.48% | 63.85% | 72.81% | 78.80% | 83.07% |
| ILS | PathDVS | 13.96% | 12.44% | 12.40% | 12.91% | 13.20% | 13.43% |
| ILS | EProfileDVS | 16.26% | 22.29% | 23.60% | 24.45% | 24.88% | 25.18% |
| ILS | GreedyDVS | 24.92% | 51.16% | 64.97% | 73.66% | 79.46% | 83.60% |
| HEFT | PathDVS | 17.44% | 14.93% | 14.59% | 14.89% | 15.08% | 15.24% |
| HEFT | EProfileDVS | 18.01% | 24.05% | 24.96% | 25.91% | 26.28% | 26.53% |
| HEFT | GreedyDVS | 25.46% | 50.97% | 64.74% | 73.45% | 79.29% | 83.45% |

Table 5-4. Results for 100 tasks and 4 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)

| Assignment | DVS algorithm | Rate 0 | Rate 0.2 | Rate 0.4 | Rate 0.6 | Rate 0.8 | Rate 1.0 |
|---|---|---|---|---|---|---|---|
| ICP | EProfileDVS | 2.92% | 7.31% | 9.18% | 10.81% | 11.48% | 11.97% |
| ICP | GreedyDVS | 16.33% | 47.45% | 62.65% | 72.04% | 78.15% | 82.46% |
| ILS | PathDVS | 9.16% | 8.40% | 9.16% | 10.29% | 10.71% | 11.13% |
| ILS | EProfileDVS | 10.63% | 14.09% | 15.70% | 17.15% | 17.82% | 18.30% |
| ILS | GreedyDVS | 19.35% | 49.22% | 63.90% | 72.99% | 78.89% | 83.06% |
| HEFT | PathDVS | 17.11% | 13.48% | 13.15% | 14.15% | 14.28% | 14.57% |
| HEFT | EProfileDVS | 17.14% | 18.65% | 19.91% | 21.28% | 21.88% | 22.35% |
| HEFT | GreedyDVS | 19.61% | 48.82% | 63.62% | 72.78% | 78.73% | 82.93% |

Table 5-5. Results for 100 tasks and 8 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)

| Assignment | DVS algorithm | Rate 0 | Rate 0.2 | Rate 0.4 | Rate 0.6 | Rate 0.8 | Rate 1.0 |
|---|---|---|---|---|---|---|---|
| ICP | EProfileDVS | 4.36% | 12.88% | 16.61% | 18.29% | 18.69% | 19.43% |
| ICP | GreedyDVS | 17.30% | 50.16% | 65.40% | 74.29% | 79.91% | 83.99% |
| ILS | PathDVS | 8.86% | 8.58% | 10.39% | 11.76% | 12.02% | 12.83% |
| ILS | EProfileDVS | 11.38% | 19.16% | 22.67% | 24.27% | 24.62% | 25.29% |
| ILS | GreedyDVS | 20.15% | 51.73% | 66.53% | 75.15% | 80.59% | 84.53% |
| HEFT | PathDVS | 14.07% | 11.12% | 12.52% | 13.82% | 14.08% | 14.87% |
| HEFT | EProfileDVS | 14.31% | 20.95% | 24.33% | 25.85% | 26.19% | 26.86% |
| HEFT | GreedyDVS | 19.57% | 50.94% | 65.98% | 74.75% | 80.28% | 84.30% |

Table 5-6. Results for 100 tasks and 16 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)

| Assignment | DVS algorithm | Rate 0 | Rate 0.2 | Rate 0.4 | Rate 0.6 | Rate 0.8 | Rate 1.0 |
|---|---|---|---|---|---|---|---|
| ICP | EProfileDVS | 5.06% | 16.17% | 18.78% | 19.73% | 20.22% | 20.40% |
| ICP | GreedyDVS | 19.28% | 52.75% | 67.13% | 75.46% | 80.93% | 84.77% |
| ILS | PathDVS | 9.65% | 9.41% | 9.88% | 10.28% | 10.57% | 10.82% |
| ILS | EProfileDVS | 12.85% | 23.59% | 26.09% | 26.97% | 27.44% | 27.71% |
| ILS | GreedyDVS | 23.23% | 54.82% | 68.55% | 76.52% | 81.75% | 85.43% |
| HEFT | PathDVS | 13.39% | 11.41% | 11.49% | 11.74% | 13.01% | 13.91% |
| HEFT | EProfileDVS | 14.25% | 24.50% | 26.75% | 27.43% | 28.71% | 29.49% |
| HEFT | GreedyDVS | 21.74% | 53.52% | 67.62% | 75.82% | 81.20% | 84.99% |

5.4.3 Comparison between CPS (Used in Prior Scheduling for Energy Minimization) and ICP

We also compared our algorithm with the CPS assignment algorithm, which is typically used in the energy minimization literature [48]. Here we show the performance for a large number of graphs with 100 and 200 tasks on 4 and 8 processors. The other experimental settings (e.g., execution time and communication time) are the same as above. Performance is again measured in terms of total finish time and of total energy consumption after slack allocation, in order to show the relationship between minimizing finish time and minimizing energy consumption. The average ratio of the total finish time of ICP to that of CPS is 0.71 and 0.59 on 4 and 8 processors, respectively. Figure 5-5 compares ICP and CPS, each followed by slack allocation (PathDVS), in terms of total energy consumption. The results show that the ICP assignment algorithm yields greater energy savings than the CPS assignment algorithm, because its earlier total finish time leaves more slack that can be used to save energy. For instance, the results for 100 tasks on 8 processors show that ICP required 40% less time and 67-75% less energy than CPS.
Similarly, the results for 100 tasks on 4 processors show that ICP required 29% less time and 48-56% less energy than CPS. These results show that the assignment is one of the critical factors in minimizing energy consumption, because a smaller finish time creates more slack that can potentially be used for energy minimization.

(Plot panels: normalized energy versus deadline extension rate, from 0 to 1, for ICP-PathDVS and CPS-PathDVS.)

Figure 5-5. Normalized energy consumption of ICP and CPS using PathDVS with respect to different deadline extension rates for different numbers of tasks and processors: (a) 100 tasks on 4 processors, (b) 100 tasks on 8 processors, (c) 200 tasks on 4 processors, and (d) 200 tasks on 8 processors

5.5 Experimental Results for Assignment Algorithms that Minimize Energy

We have conducted a number of simulations to evaluate the benefits of our algorithm relative to other algorithms that do not consider energy profiles during assignment. We also compared our proposed scheduling algorithm with GA-based algorithms that consider multiple assignments [56, 57]. The performance of our energy-based assignment algorithm is relatively independent of the slack allocation and time-minimization assignment algorithms. Given that PathDVS and ICP perform better than other related algorithms (as presented in Chapter 3 and Section 5.4, respectively), we use these algorithms for slack allocation and time-minimization assignment, respectively.
The experimental results are presented in two broad subsections. In the first, we assume that the energy requirements of a task on a processor are relatively independent of its execution time requirements. In the second, we assume that there is a strong correlation between the time and energy requirements of executing a task on a processor.

5.5.1 Simulation Methodology

In this section, we describe the DAG generation and the performance measures used in our experiments.

5.5.1.1 The DAG generation

We randomly generated a large number of graphs with 50 and 100 tasks. The execution time of each task on each processor at the maximum voltage varies from 10 to 40 units, and the communication time between a task and its child task for a pair of processors varies from 1 to 4 units. The energy consumed to execute each task on each processor varies from 10 to 80. The graphs are executed on 4, 8, 16, and 32 processors. For each combination of number of tasks and number of processors, 20 different synthetic graphs are generated.

5.5.1.2 Performance measures

Performance is measured in terms of normalized total energy consumption and computational requirements (i.e., the runtime of the algorithms). The former is defined as the total energy consumption normalized by the energy consumption of the assignment algorithm without a DVS scheme. We assume that the deadline is always greater than or equal to the finish time of the DAG, where the finish time of the DAG is based on the baseline assignment (i.e., time-minimization assignment using time-based prioritization). The deadline extension rate is the fraction of the total finish time that is added to the deadline, i.e., deadline = (1 + deadline extension rate) × total finish time of the assignment before applying DVS. We provide experimental results for deadline extension rates of 0 (no deadline extension), 0.2, 0.4, 0.6, 0.8, and 1.0.
5.5.1.3 Variations of our algorithms

We tested three variations of our algorithms to understand the impact of multiple prioritizations (controlled by the weight of time in the task prioritization) and of variable deadline estimates for each task (controlled by the weight of the latest finish time, LFT). The algorithms used in our experiments fall into three categories: A0, A1, and A2. First, A0 is an assignment that uses time based task prioritization (weight of time = 1.0) and a deadline estimate equal to the latest finish time (weight of LFT = 1.0), followed by a slack allocation step. It corresponds to an assignment that uses the base prioritization and allows the maximum allowable deadline for each task. Second, A1 is an assignment with the weight of time equal to one and various weights of LFT (weight of time = 1.0; weight of LFT = 1.0, 0.75, and 0.5). For each feasible assignment, a final slack allocation step is performed. This corresponds to assignments based on the base prioritization, for which variable amounts of estimated deadline, set by the weight of LFT, are attempted. The basic idea is that choosing the maximum allowable deadline for each task (i.e., a higher weight of LFT) may lead to infeasible assignments, but may also lead to the best energy requirements by providing more flexibility for processor selection. Finally, A2 is an assignment with various weights of both time and LFT (weight of time = 0, 0.2, 0.4, 0.6, 0.8, and 1.0; weight of LFT = 1.0, 0.75, and 0.5). For all feasible assignments, a final slack allocation step is performed. These correspond to assignments based on multiple prioritizations, with variable amounts of estimated deadline attempted for each prioritization. The optimal values of the two weights for A1 and A2 are instance dependent: for each instance, all the values are attempted and the one that results in the minimal energy is chosen. We chose the ranges of values for the two weights based on initial experimentation.
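The A0/A1/A2 variations can be sketched as a search over (weight of time, weight of LFT) pairs that keeps the feasible schedule with minimum energy. This is an illustrative sketch: the names `weight_grid`, `best_schedule`, and the callback `assign_and_allocate` are hypothetical. The callback stands in for the energy based assignment followed by the final slack allocation step and returns `None` when the assignment is infeasible. The toy callback in the usage example is invented purely to exercise the search.

```python
def weight_grid(variant):
    """(weight of time, weight of LFT) pairs tried by each variant of Section 5.5.1.3."""
    if variant == "A0":
        return [(1.0, 1.0)]
    if variant == "A1":
        return [(1.0, lft_w) for lft_w in (1.0, 0.75, 0.5)]
    if variant == "A2":
        return [(time_w, lft_w)
                for time_w in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
                for lft_w in (1.0, 0.75, 0.5)]
    raise ValueError(f"unknown variant: {variant}")


def best_schedule(variant, assign_and_allocate):
    """Try every weight pair and keep the feasible result with minimum energy.

    assign_and_allocate(time_w, lft_w) is a hypothetical callback returning
    (energy, schedule) for a feasible assignment, or None if it is infeasible.
    """
    best = None
    for time_w, lft_w in weight_grid(variant):
        result = assign_and_allocate(time_w, lft_w)
        if result is not None and (best is None or result[0] < best[0]):
            best = result
    return best


def toy(time_w, lft_w):
    # Made-up cost model: some pairs with the maximum deadline estimate are infeasible.
    if lft_w == 1.0 and time_w < 0.5:
        return None
    return (100 - 20 * lft_w + 10 * time_w, (time_w, lft_w))


print(best_schedule("A1", toy))  # (90.0, (1.0, 1.0))
print(best_schedule("A2", toy))  # (85.0, (0.0, 0.75))
```

Because each weight pair is evaluated independently, A1 costs about 3 runs of A0 and A2 about 18, and the loop body could be distributed across workers.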
5.5.1.4 Variations of GA based algorithms

Genetic algorithms evolve a population of individuals over several generations. The algorithms in [56, 57] use a nested set of individuals. The first set corresponds to multiple mappings of tasks to processors. For each mapping, there is a population of individuals corresponding to orderings, or prioritizations, of tasks. Each generation is used to produce the next through crossover and mutation: the former combines two individuals to generate a new pair of individuals, and the latter modifies a single individual. The fitness of an individual is measured by the total energy requirement after applying a slack allocation scheme and by whether the deadline constraints are satisfied. There are several parameters, including the number of individuals in the population, the crossover rate, and the mutation rate; their values are set as suggested in [56, 57]. We terminate the GA if the improvement is less than 1% after 10 generations, as suggested in [56, 57]. The performance of GA based algorithms depends on the slack allocation method and on the initial seeding of the population. To compare our algorithms with GA based approaches, we conducted experiments with four variations of GA based algorithms: GARandNonOptimal, GARandOptimal, GASolNonOptimal, and GASolOptimal. The first, GARandNonOptimal, is GA using the DVS scheme in [56, 57] with randomly generated solutions for the initial population.
This is the scheme presented in [56, 57]. The second, GARandOptimal, is GA using PathDVS with randomly generated solutions for the initial population. The third, GASolNonOptimal, is GA using the DVS scheme in [56, 57] with a randomly generated population that includes A0 as one of its solutions. The fourth, GASolOptimal, is GA using PathDVS with a randomly generated population that includes A0 as one of its solutions. We chose different DVS schemes because the GA requires a fitness calculation (in our case, the energy required) for each generated solution. We wanted to find out whether a less computationally intensive DVS scheme during the GA process can lead to solutions similar to those of a more computationally intensive DVS scheme.

5.5.2 DVS Schemes to Compute Expected Energy in Processor Selection Step

As discussed in the algorithm section, our approach requires the use of a DVS scheme during the assignment of each task in order to compute the expected DVS based energy used to select the best processor in the processor selection step. This is an intermediate step where the exact energy requirement is not needed. To reduce the time requirements of the optimal branch and bound strategy for unit slack allocation described in Chapter 3, we used a greedy strategy. To test whether this strategy leads to inferior assignments, we compared the energy requirements of the two slack allocation methods during this intermediate step. Figure 5-6 shows this comparison for different deadline extension rates. Since the difference in energy was not significant and the greedy scheme is one to two orders of magnitude faster, we chose the greedy scheme for this step.

Figure 5-6. Comparison between the optimal and greedy schemes for processor selection of A0 (A0-Optimal and A0-Greedy) for 50 tasks on 4 and 8 processors, with respect to deadline extension rate: (a) normalized energy consumption and (b) runtime (unit: ms)

5.5.3 Independence between Time and Energy Requirements

In this section, we present experimental results for the case where the energy requirement of a task on a processor is relatively independent of its execution time requirement.

5.5.3.1 Comparison of energy requirements of proposed algorithms

Figures 5-7 and 5-8 compare the energy consumption of our algorithms (A0, A1, and A2) and the baseline algorithm (Base: the combination of ICP and PathDVS) with respect to different deadline extension rates for different numbers of processors (4, 8, 16, and 32) and tasks (50 and 100). All of our algorithms lead to significant energy reduction compared to the baseline algorithm. Furthermore, A2 is better than A1, and A1 is better than A0. For instance, using a 1.0 deadline extension rate on 32 processors, A0, A1, and A2 improve by 30.9%, 32.8%, and 36.8% over the baseline algorithm, respectively.

Figure 5-7. Results for 50 tasks: Normalized energy consumption of our algorithms (Base, A0, A1, A2) with respect to variable deadline extension rates for different numbers of processors: (a) 4 processors, (b) 8 processors, (c) 16 processors, and (d) 32 processors

Figure 5-8. Results for 100 tasks: Normalized energy consumption of our algorithms (Base, A0, A1, A2) with respect to variable deadline extension rates for different numbers of processors: (a) 4 processors, (b) 8 processors, (c) 16 processors, and (d) 32 processors

Figure 5-9 shows the improvement of our algorithms over the baseline algorithm (Base: ICP-PathDVS) with respect to different numbers of processors. As the number of processors increases, the improvement of our algorithms over the baseline grows. For instance, with a 0.4 deadline extension rate, A0 improves by 8.4%, 11.3%, 21.4%, and 31%, A1 by 10.6%, 12.7%, 23.1%, and 33%, and A2 by 16.8%, 18.9%, 27.2%, and 35.8% for 4, 8, 16, and 32 processors, respectively, compared to the baseline algorithm.
Figure 5-9. Improvement of our algorithms (A0, A1, A2) over ICP-PathDVS (i.e., baseline algorithm) with respect to different numbers of processors (4, 8, 16, and 32) for variable deadline extension rates (unit: percentage): (a) no deadline extension, (b) 0.2 deadline extension rate, (c) 0.4 deadline extension rate, (d) 0.6 deadline extension rate, (e) 0.8 deadline extension rate, and (f) 1.0 deadline extension rate

5.5.3.2 Comparison of energy requirements with GA based algorithms

We found that the GA (Genetic Algorithm) based algorithms perform relatively poorly and do not always generate a feasible schedule (i.e., a schedule that completes by the given deadline), especially when the deadline is tight (i.e., for small deadline extension rates). In general, A0 was considerably better than the GA: the improvements ranged from 50% to 70% of the energy requirements of the GA. In the following, we present the comparison of our algorithms with the four variations of GA based algorithms (GARandNonOptimal, GARandOptimal, GASolNonOptimal, and GASolOptimal).
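The GA termination rule of Section 5.5.1.4 (stop when the improvement is less than 1% after 10 generations) can be sketched as a generic GA driver. This is a minimal sketch under assumptions: `evolve` (crossover plus mutation) and `fitness` (energy after slack allocation, with infeasible schedules penalized) are hypothetical callbacks, and "improvement" is interpreted here as the relative decrease of the best energy compared with the best energy 10 generations earlier. It does not reproduce the nested mapping and ordering populations of [56, 57].

```python
def should_stop(best_history, window=10, min_rel_improvement=0.01):
    """True when the best energy improved by less than 1% over the last `window` generations.

    best_history[g] is the best (lowest) energy found up to generation g.
    """
    if len(best_history) <= window:
        return False
    old, new = best_history[-1 - window], best_history[-1]
    return (old - new) < min_rel_improvement * old


def run_ga(population, evolve, fitness, window=10, min_rel_improvement=0.01,
           max_generations=10_000):
    """Generic GA loop; evolve and fitness are hypothetical callbacks (lower fitness is better)."""
    best = min(fitness(ind) for ind in population)
    history = [best]
    while len(history) <= max_generations and not should_stop(history, window, min_rel_improvement):
        population = evolve(population)  # crossover + mutation
        best = min(best, min(fitness(ind) for ind in population))
        history.append(best)
    return best, len(history) - 1  # best energy, number of generations run


# A population that never improves stops after exactly `window` generations.
print(run_ga([5, 7], evolve=lambda pop: pop, fitness=lambda x: x))  # (5, 10)
```

With a small initial pool and this stopping rule, a run seeded only with random solutions can end before reaching a feasible or low-energy schedule, which is consistent with the behavior reported below.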
Comparison with GARandNonOptimal

Figure 5-10 compares our algorithms and GARandNonOptimal in terms of energy consumption with respect to different numbers of tasks and processors. The GA based algorithm that uses randomly generated initial solutions for task ordering and mapping does not provide good performance in terms of energy consumption. Furthermore, it cannot even generate a feasible schedule (i.e., a schedule meeting the deadline), especially under a tight deadline, when using the limited initial solution pool (25 individuals for ordering and 50 individuals for mapping) and the termination constraint of the GA (repeat until no improvement of at least 1% is made for 10 generations) suggested in [56, 57]. We therefore report results for a deadline extension rate of 1.0, so that energy is compared only among feasible solutions. GARandNonOptimal performs even worse than the baseline algorithm (Base: ICP-PathDVS); for example, Base improves by 65% over GARandNonOptimal. Our algorithms A0, A1, and A2 improve energy consumption by 68.7%, 70.0%, and 73.1%, respectively, compared to GARandNonOptimal for 8 processors. As the number of processors increases, our algorithms provide greater benefit: A0 improves by 48.5% for 4 processors and by 68.7% for 8 processors. Our algorithms also perform better as the number of tasks increases; for instance, A0 improves by 58.3% for 50 tasks and by 61.5% for 100 tasks.

Figure 5-10. Normalized energy consumption of GARandNonOptimal and our algorithms (Base, GARandNonOptimal, A0, A1, A2) for different numbers of tasks and processors: (a) with respect to different numbers of processors (4 and 8) and (b) with respect to different numbers of tasks (50 and 100)

Comparison with GARandOptimal

Performance did not improve significantly with a better slack allocation scheme such as PathDVS, which provides near-optimal solutions for energy minimization. Like GARandNonOptimal, GARandOptimal does not perform well because of the randomly generated initial solutions, the limited number of individuals, and the termination constraint. Figure 5-11 compares our algorithms and GARandOptimal in terms of energy consumption with respect to different numbers of tasks and processors for a 1.0 deadline extension rate. Our algorithms A0, A1, and A2 improve energy consumption by 69.1%, 70.5%, and 73.5%, respectively, compared to GARandOptimal for 8 processors. The advantage of our algorithms over GARandOptimal also grows as the number of processors and tasks increases. For instance, A0 improves by 46.5% and 69.1% for 4 and 8 processors, and by 58.8% and 60.5% for 50 and 100 tasks, respectively.

Figure 5-11. Normalized energy consumption of GARandOptimal and our algorithms (Base, GARandOptimal, A0, A1, A2) for different numbers of tasks and processors: (a) with respect to different numbers of processors (4 and 8) and (b) with respect to different numbers of tasks (50 and 100)

Comparison with GASolNonOptimal

Because using only randomly generated initial solutions leads to poor performance, we next tried another approach that seeds the population with a good solution (from A0).
Figure 5-12 compares our algorithms and GASolNonOptimal in terms of energy consumption with respect to different deadline extension rates for different numbers of tasks and processors. Although GASolNonOptimal uses one good solution from A0, it achieved no significant improvement over A0. This is because the DVS scheme used in GASolNonOptimal does not perform well in terms of energy consumption, while our algorithms use PathDVS, a near-optimal DVS scheme. Our algorithms A0, A1, and A2 improve by 11.1%, 15.0%, and 20.5%, respectively, compared to GASolNonOptimal for 100 tasks on 8 processors with a 1.0 deadline extension rate. Figure 5-13 compares our algorithms and GASolNonOptimal in terms of energy consumption with respect to different numbers of tasks and processors for a 1.0 deadline extension rate.

Figure 5-12. Normalized energy consumption of GASolNonOptimal and our algorithms (Base, GASolNonOptimal, A0, A1, A2) with respect to different deadline extension rates for different numbers of tasks and processors: (a) 50 tasks and 4 processors, (b) 50 tasks and 8 processors, (c) 100 tasks and 4 processors, and (d) 100 tasks and 8 processors

Figure 5-13. Normalized energy consumption of GASolNonOptimal and our algorithms (Base, GASolNonOptimal, A0, A1, A2): (a) with respect to different numbers of processors (4 and 8) and (b) with respect to different numbers of tasks (50 and 100)

Comparison with GASolOptimal

Figure 5-14 compares our algorithms and GASolOptimal in terms of energy consumption with respect to different numbers of tasks and processors for a 1.0 deadline extension rate. Although GASolOptimal uses one good solution from A0 and a near-optimal DVS scheme, it achieved no significant improvement over A0. The performance of A0 and GASolOptimal is very similar: the fractional difference between their energy requirements was between 0.00009 and 0.002. Furthermore, our iterative algorithms (A1 and A2) provide improved performance: A1 and A2 improve by 4.6% and 14.3%, respectively, for 8 processors, and by 5.3% and 16.3% for 100 tasks.

Figure 5-14. Normalized energy consumption of GASolOptimal and our algorithms (Base, GASolOptimal, A0, A1, A2): (a) with respect to different numbers of processors (4 and 8) and (b) with respect to different numbers of tasks (50 and 100)

5.5.3.3 Comparison of time requirements

Figure 5-15 shows the runtime of our algorithms with respect to different deadline extension rates. The total runtime of A1 and A2 is proportional to the number of (weight of time, weight of LFT) combinations tried, multiplied by the runtime of A0. Note that since the runs of A1 and A2 for different combinations can execute in parallel, their runtime can be reduced significantly in a parallel environment.

Figure 5-15. Runtime of our algorithms (A0, A1, A2) with respect to variable deadline extension rates for different numbers of tasks (unit: ms): (a) 50 tasks and (b) 100 tasks

Figure 5-16 compares A0 and the GA based algorithms in terms of computational time (i.e., the runtime taken to execute the algorithms) for a 1.0 deadline extension rate with respect to different numbers of tasks. A0 is two orders of magnitude faster than the GA based algorithms that use a suboptimal DVS scheme (GARandNonOptimal and GASolNonOptimal). Furthermore, A0 is 2237 and 2406 times faster than GARandOptimal and GASolOptimal, respectively, which use a near-optimal DVS scheme.

Figure 5-16. Runtime of the GA algorithms (GARandNonOptimal, GASolNonOptimal, GARandOptimal, GASolOptimal) and our algorithm A0 with respect to different numbers of tasks for a 1.0 deadline extension rate (unit: ms, logarithmic scale)

5.5.4 Dependence between Time and Energy Requirements

In the experimental results presented in the previous section, we assumed that the time and energy requirements of a task were independent of each other. We also conducted experiments to evaluate performance under various degrees of correlation between the time and energy consumption of tasks on a given processor. We define a parameter, the correlation rate, that controls this correlation: the energy of a task is proportional to its execution time multiplied by a value drawn from the range [1 − correlation rate, 1 + correlation rate] (i.e., energy of each task = execution time of each task × a value in [1 − correlation rate, 1 + correlation rate]). We experimented with a number of correlation rates and present results for 0, 0.4, and 0.8. We also compare the results with the independent case defined in the previous section, labeled rand. Figures 5-17 and 5-18 show the energy improvement of our algorithms over ICP-PathDVS (i.e., baseline algorithm) for variable correlation rates and different deadline extension rates, for 4 and 8 processors, respectively. As the correlation rate increases, the relative improvement of our algorithms increases. For instance, with a 1.0 deadline extension rate, A0 improves by 4%, 5.7%, and 17.9%, A1 by 7.6%, 9.5%, and 20.5%, and A2 by 15.5%, 19.2%, and 30.3%, for correlation rates of 0, 0.4, and 0.8, respectively. For time-independent energy consumption (under our experimental setting), the improvement lies between those obtained with correlation rates of 0.4 and 0.8. For instance, with the rand option, A0 improves by 12.5%, A1 by 16.9%, and A2 by 23.2% for a 1.0 deadline extension rate.
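The correlated energy model above can be sketched as follows. This is a sketch under assumptions: the function name `energy_matrix` and the parameter `correlation_rate` are hypothetical, the factor is assumed to be drawn uniformly from [1 − correlation rate, 1 + correlation rate] (the text gives only the range), and the "rand" branch reuses the independent 10 to 80 range of Section 5.5.1.1.

```python
import random


def energy_matrix(exec_time, correlation_rate, seed=None):
    """Energy of each task on each processor for the experiments of Section 5.5.4.

    For a numeric correlation_rate: energy = execution time x a factor drawn
    uniformly (an assumption) from [1 - correlation_rate, 1 + correlation_rate].
    For correlation_rate == "rand": the independent case, integers in 10..80.
    """
    rng = random.Random(seed)
    if correlation_rate == "rand":
        return [[rng.randint(10, 80) for _ in row] for row in exec_time]
    return [[t * rng.uniform(1.0 - correlation_rate, 1.0 + correlation_rate) for t in row]
            for row in exec_time]


times = [[10, 25, 40], [12, 30, 18]]  # exec_time[task][processor]
print(energy_matrix(times, 0.0))      # [[10.0, 25.0, 40.0], [12.0, 30.0, 18.0]]
```

A correlation rate of 0 makes energy exactly proportional to time, so the fastest processor for a task is also its lowest-energy one; larger rates let the two rankings diverge, leaving more room for an energy-aware assignment to differ from a time-based one.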
Figure 5-17. Results for 4 processors: Improvement of our algorithms (A0, A1, A2) over ICP-PathDVS (i.e., baseline algorithm) in terms of energy consumption with respect to different correlation rates (0, 0.4, 0.8, and rand) for variable deadline extension rates for 50 and 100 tasks (unit: percentage): (a) no deadline extension, (b) 0.2 deadline extension rate, (c) 0.4 deadline extension rate, (d) 0.6 deadline extension rate, (e) 0.8 deadline extension rate, and (f) 1.0 deadline extension rate

Figure 5-18. Results for 8 processors: Improvement of our algorithms (A0, A1, A2) over ICP-PathDVS (i.e., baseline algorithm) in terms of energy consumption with respect to different correlation rates (0, 0.4, 0.8, and rand) for variable deadline extension rates for 50 and 100 tasks (unit: percentage): (a) no deadline extension, (b) 0.2 deadline extension rate, (c) 0.4 deadline extension rate, (d) 0.6 deadline extension rate, (e) 0.8 deadline extension rate, and (f) 1.0 deadline extension rate

CHAPTER 6 DYNAMIC ASSIGNMENT

We assume that a static scheduling algorithm has already been applied before executing tasks, and that the schedule needs to be adjusted whenever a task finishes before its scheduled time. The schedule is therefore updated whenever dynamic scheduling is applied. When a task finishes before its estimated time, two changes may occur for the remaining tasks (i.e., tasks that have not yet executed) in the schedule. First, a task's processor mapping may change (along with its start and end times). Second, its amount of slack (the time beyond its minimum execution time on that processor, i.e., the time to execute the task at maximum voltage) may change.

Most prior research on scheduling for energy minimization does not focus on the assignment process, in particular in dynamic environments. We showed in Chapter 4 that reallocating the slack at runtime (i.e., dynamic slack allocation) leads to better energy minimization, and that our dynamic slack allocation method not only outperforms the existing greedy method but is also comparable in energy requirements to static near-optimal methods applied at runtime. In this chapter, we explore whether reassigning tasks along with reallocating slack at runtime can lead to even better energy minimization. For such an approach to be useful at runtime, its overhead must be small.
The proposed dynamic scheduling algorithm uses several threads to generate a schedule: one set reallocates slack while keeping the assignment in the current schedule, and another set changes the assignment and then reallocates slack. The schedule providing the minimum energy is then selected.

As described in Chapter 4, dynamic scheduling (i.e., rescheduling) involves two steps. First, select the subset of tasks for rescheduling. The tasks that can be rescheduled by the dynamic scheduling algorithm are those that have not yet started when the algorithm is applied; we assume that the voltage can be selected before a task starts executing. Dynamic scheduling is applied to a subset of these tasks. The number of tasks considered for rescheduling is limited in order to minimize the overhead of reassigning processors and reallocating slack at runtime; clearly, this must be done so that the other goal, energy reduction, is also met. Second, determine the time range for the selected tasks. The time range of the selected tasks has to be updated because some tasks have completed earlier than expected. Based on the computation times in the schedule and the assignment-based dependency relationships among tasks, we recompute the time range (i.e., earliest start time and latest finish time) within which the selected tasks must be executed. The time range is defined differently for reassignment and for slack reallocation: a time range over processors for reassignment, and a time range for the selected tasks, given an assignment, for slack reallocation. The main concept is the same, however: the selected tasks must be reassigned and reallocated slack within this time range in order to meet the deadline constraints. At this stage, our proposed reassignment algorithm and slack reallocation approach are applied to the subset of tasks within the time range described above.
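The thread-based selection described above (one strategy keeps the assignment and only reallocates slack, another reassigns and then reallocates slack, and the lowest-energy schedule wins) can be sketched with Python's `concurrent.futures`. The function `reschedule` and the strategy callables are hypothetical stand-ins, each returning `(energy, new_schedule)`. In CPython, threads do not run CPU-bound Python code in parallel because of the global interpreter lock, so a `ProcessPoolExecutor` (with picklable, module-level functions) would be the choice for real speedups.

```python
from concurrent.futures import ThreadPoolExecutor


def reschedule(schedule, strategies):
    """Run rescheduling strategies concurrently and keep the lowest-energy result.

    Each strategy is a hypothetical callable schedule -> (energy, new_schedule),
    e.g. slack reallocation with the current assignment, or reassignment followed
    by slack reallocation. strategies must be non-empty.
    """
    with ThreadPoolExecutor(max_workers=len(strategies)) as pool:
        results = list(pool.map(lambda strategy: strategy(schedule), strategies))
    return min(results, key=lambda result: result[0])


def keep_assignment(schedule):
    return (10.0, "slack reallocation only")  # made-up energy for illustration


def reassign(schedule):
    return (7.5, "reassignment + slack reallocation")  # made-up energy for illustration


print(reschedule({}, [keep_assignment, reassign]))  # (7.5, 'reassignment + slack reallocation')
```

Keeping the slack-reallocation-only strategy in the pool means the selected schedule is never worse in energy than the one obtained without reassignment.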
The computational time (i.e., runtime overhead) is kept small because only a limited number of tasks is selected for rescheduling. While several assignment methods can be applied using threads, we propose a reassignment method based on the method described in Chapter 5, which incorporates expected DVS based energy information during the reassignment process. The dynamic assignment algorithm is described in detail in the next section.

6.1 Proposed Dynamic Assignment

This section presents a novel dynamic assignment algorithm that reassigns processors for the reschedulable tasks at runtime. The main feature of our proposed reassignment algorithm is that it considers the energy requirements based on potential slack during the assignment step. In other words, the algorithm assigns an appropriate processor to each reschedulable task such that the total energy expected after slack allocation is minimized. The expected energy after slack allocation for each reschedulable task is computed using the task's estimated deadline, chosen so that the overall DAG can be executed by the deadline.

6.1.1 Choosing a Subset of Tasks for Rescheduling

The proposed dynamic scheduling algorithm, the k-lookahead approach, is based on choosing a subset of tasks for which the schedule will be readjusted. The schedule for the remaining tasks (i.e., tasks not selected for rescheduling) is not affected. Figure 5-1 shows the subset of tasks for rescheduling in an assignment DAG when task 2 finishes early. In the k-lookahead approach, all tasks within a limited range of time are considered for readjusting the schedule. The range of time is limited by the value of k (i.e., k × the maximum computation time of tasks). In the example of Figure 5-1, for ease of presentation of the key concepts, assume that the computation time of each task is one unit, the communication time among tasks is zero, and tasks at the same depth finish at the same time.
In this case, if k is equal to 2, the time range is 2 units (2 × one unit), and the tasks within this time range from the finish of task 2, i.e., tasks 4, 5, 6, 7, 8, 9, and 10, are considered. The set of tasks selected for rescheduling, allocation, is defined by

$$\mathit{allocation} = \{\, i \mid \mathit{ftime}_l \le \mathit{staticSTime}_i \le \mathit{ftime}_l + k \cdot \max_j \mathit{compTime}_j \,\}, \quad \text{where } \mathit{ftime}_l < \mathit{staticFTime}_l,$$

where staticSTime_i is the start time of task i in the static or previous schedule, staticFTime_i is the finish time of task i in the static or previous schedule, ftime_l is the actual finish time at runtime of task l (the task that finished earlier than scheduled), and compTime_j is the computation time of task j on its assigned processor, i.e., its estimated execution time at the maximum voltage.

The approach with the all option for k (i.e., the k-all lookahead approach) corresponds to the static scheduling approach without any limit on the time range of the tasks considered for rescheduling. Thus, the k-all lookahead approach is the same as applying the static scheduling algorithm to all the remaining tasks at runtime, and one would expect it to be close to the best that can be achieved. The set of tasks selected for rescheduling is then defined by

$$\mathit{allocation} = \{\, i \mid \mathit{staticSTime}_i \ge \mathit{ftime}_l \,\}, \quad \text{where } \mathit{ftime}_l < \mathit{staticFTime}_l.$$

6.1.2 Time Range for Selected Tasks

The schedule for tasks not in the set of reschedulable tasks is kept the same (it is based on the static schedule or on the schedule generated by the last rescheduling). For the set of reschedulable tasks, the range of time in which to execute them is defined, to ensure feasible solutions, before the dynamic scheduling algorithm is applied. The time range is defined differently for reassignment and for slack reallocation: a time range over each processor for reassignment, and a time range for the set of reschedulable tasks, given an assignment, for slack reallocation. This is because reassignment can map a task to any processor.
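The k-lookahead selection can be sketched directly from the set definitions above. This is a sketch of those definitions as reconstructed here; the function name `select_reschedulable` and the dictionary-based inputs are hypothetical, and `k=None` stands for the k-all option. The example data are invented and do not reproduce the DAG of the figure.

```python
def select_reschedulable(static_start, comp_time, ftime_l, k=None):
    """Tasks selected for rescheduling after task l finished early at ftime_l.

    static_start: dict task -> staticSTime (start time in the static/previous schedule)
    comp_time:    dict task -> compTime (execution time at maximum voltage on its processor)
    k:            lookahead factor; None stands for the k-all option
    """
    if k is None:
        # k-all: every task that has not started yet is reschedulable.
        return {i for i, s in static_start.items() if s >= ftime_l}
    horizon = ftime_l + k * max(comp_time.values())
    return {i for i, s in static_start.items() if ftime_l <= s <= horizon}


# Hypothetical schedule: unit computation times, task l finished early at time 1.5.
static_start = {3: 1.0, 4: 2.0, 5: 2.0, 6: 3.0, 7: 4.0, 8: 5.0}
comp_time = {i: 1.0 for i in static_start}
print(sorted(select_reschedulable(static_start, comp_time, ftime_l=1.5, k=2)))  # [4, 5, 6]
print(sorted(select_reschedulable(static_start, comp_time, ftime_l=1.5)))       # [4, 5, 6, 7, 8]
```

Task 3 is excluded in both cases because it started before ftime_l; a smaller k trades some potential energy savings for a lower runtime overhead.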
Even when there is a processor to which no reschedulable task is assigned, the time range of that processor for reassignment may be limited by the assignment of other tasks not in the set of reschedulable tasks. For slack reallocation, in contrast, there is no need to define the time range for all processors, only for the set of selected tasks, because the slack is reallocated based on a given assignment.

For reassignment, the time range of processors is defined as follows. First, the computation time of each reschedulable task is set to its estimated time at the maximum voltage, i.e., staticCTime_i = compTime_i for i ∈ allocation, where staticCTime_i is the computation time of task i in the static schedule or in the schedule generated by the last rescheduling. This is the same time that was used during the static assignment process, and it ensures that maximum flexibility is available for reassignment.

Second, the available start time of each processor is the earliest possible start time on that processor for the tasks. It is set to the expected finish time (i.e., the finish time in the current schedule) of the last task on that processor that is not in the set of reschedulable tasks and has already started when the algorithm is applied (it is still executing or has finished), i.e., the task with the latest finish time on that processor among the tasks not in the set of reschedulable tasks. Note that this is not the earliest start time of the reschedulable tasks on that processor: the earliest start times of the tasks on a processor differ because of the precedence relationships among tasks. The available start time of a processor p_j, procSTime_j, is defined by

$$\mathit{procSTime}_j = \max\{\, \mathit{staticFTime}_i \mid \mathit{proc}_i = p_j \,\wedge\, i \notin \mathit{allocation} \,\wedge\, \mathit{staticSTime}_i < \mathit{ftime}_l \,\},$$

where proc_i is the processor to which task i is assigned.

Finally, the deadline of each processor is the latest possible finish time on that processor for the tasks.
It is set to the expected start time (i.e., the start time in the current schedule) of the first task on that processor that is not in the set of reschedulable tasks and has not yet started when the algorithm is applied (i.e., the task with the earliest start time on that processor among those tasks). Note that this is not the latest finish time of the reschedulable tasks on that processor: as with the earliest start times, the latest finish times of the tasks on a processor differ because of their precedence relationships with other tasks. The deadline of processor p_j, procDeadline_j, is defined by

$$\mathit{procDeadline}_j = \min_i \{\, \mathit{staticSTime}_i \mid \mathit{proc}_i = p_j \,\wedge\, i \notin \mathit{allocation} \,\wedge\, \mathit{staticSTime}_i \ge \mathit{ftime}_l \,\}$$

6.1.3 Estimated Deadline and Energy

The goal of the assignment is to minimize the expected total energy consumption after slack allocation while still satisfying the deadline constraints. Consider a scenario in which a subset of tasks has already been assigned and the next task in the prioritization list has to be assigned. The processors considered for this task should be limited to those on which its expected finish time allows the overall assignment to meet the deadline constraints (otherwise the assignment will be infeasible). Clearly, when the assignment of a given task is being determined, there is no guarantee that the resulting schedule will be feasible (i.e., will meet the deadline), because feasibility also depends on the assignment of the remaining tasks, which has not yet been determined. The proposed algorithm therefore calculates an estimated deadline for each task, that is, a deadline that is expected to enable a feasible schedule if the task's finish time satisfies it.
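The per-processor time window for reassignment defined in Section 6.1.2 (procSTime_j and procDeadline_j) can be sketched as below. This is a hedged illustration, not the author's code: the function name `processor_windows` and its parameters are hypothetical, "finish time in the current schedule" is passed in as `cur_ft` (assumed to hold the actual finish time for completed tasks such as task l), and the fallbacks used when a processor has no qualifying task (the current time ftime_l and the DAG deadline) are assumptions the text does not specify.

```python
from typing import Dict, Set, Tuple


def processor_windows(processors: Set[int], proc_of: Dict[int, int],
                      static_st: Dict[int, float], cur_ft: Dict[int, float],
                      resched: Set[int], ftime_l: float,
                      deadline: float) -> Dict[int, Tuple[float, float]]:
    """Return {p_j: (procSTime_j, procDeadline_j)} for reassignment.

    proc_of maps each task to its processor in the current schedule,
    static_st holds staticSTime_i, cur_ft holds the finish time in the
    current schedule, and resched is the set of reschedulable tasks.
    """
    windows = {}
    for p in processors:
        fixed = [i for i, q in proc_of.items() if q == p and i not in resched]
        # Non-reschedulable tasks that have already started (running or done).
        started = [cur_ft[i] for i in fixed if static_st[i] < ftime_l]
        # Non-reschedulable tasks that have not started yet.
        pending = [static_st[i] for i in fixed if static_st[i] >= ftime_l]
        # Hypothetical defaults when a processor has no such task.
        proc_st = max(started) if started else ftime_l
        proc_deadline = min(pending) if pending else deadline
        windows[p] = (proc_st, proc_deadline)
    return windows
```

A reassigned task may then be placed on p_j only inside [procSTime_j, procDeadline_j], which keeps the schedule of the non-reschedulable tasks intact.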
The estimated deadline of a task is set to its latest finish time to allow more flexibility in processor assignment, because the task can then take longer to complete (although the probability of obtaining a feasible schedule for the DAG may be lower). The latest finish time of task i, LFT_i, is defined by

$$\mathit{LFT}_i = \min\Big( \mathit{deadline},\ \min_{j \in \mathit{succ}_i} \big( \mathit{LFT}_j - \mathit{staticCTime}_j - \mathit{commTime}_{ij} \big),\ \mathit{LFT}_{\mathit{pSucc}_i} - \mathit{staticCTime}_{\mathit{pSucc}_i} \Big)$$

where succ_i is the set of successors of task i in the DAG and pSucc_i is the task that follows task i on the same processor. Because of these assignment-based dependencies, the latest finish time of a task differs depending on the processor to which it may be assigned, so the time limit within which a task must complete varies across processors. Using this estimated deadline, the estimated energy of the reschedulable tasks is computed while processors are selected for reassignment. The estimated energy is the energy expected after slack allocation. To compute it, we apply the principle of unit slack allocation used in the PathDVS algorithm, a static slack allocation algorithm that provides near-optimal solutions. The unit slack allocation in PathDVS (described in Chapter 3) finds the subset of tasks that maximally reduces the total energy consumption. This corresponds to the maximum weighted independent set (MWIS) problem [7, 53, 65], which is computationally intensive. Our approach requires a DVS scheme to be applied during the assignment of each task in order to compute the expected DVS-based energy and select the best processor in the processor selection step. This is an intermediate step in which exact energy estimates are less important. To reduce the time requirements of the optimal branch-and-bound strategy for unit slack allocation described in Chapter 3, a greedy algorithm for the MWIS problem [53] can be used.
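The latest-finish-time recurrence above and one round of the greedy MWIS selection (whose steps are listed next) can be sketched as follows. This is a minimal sketch under assumptions, not the thesis code: the names `latest_finish_times` and `greedy_unit_slack` are hypothetical, pSucc_i is read as the next task on the same processor, communication time defaults to 0 for pairs missing from `comm`, tasks with no positive energy reduction are assumed ineligible, and the outer loop that repeats rounds until no slack remains is omitted. `compatible[i]` plays the role of the compatible task list (tasks that can share a unit of slack with task i).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Optional, Set, Tuple


def latest_finish_times(succ: Dict[int, List[int]],
                        p_succ: Dict[int, Optional[int]],
                        ctime: Dict[int, float],
                        comm: Dict[Tuple[int, int], float],
                        deadline: float) -> Dict[int, float]:
    """LFT_i for one candidate assignment (processor order given by p_succ)."""
    # Predecessor sets over DAG edges plus same-processor ordering edges.
    preds: Dict[int, Set[int]] = {i: set() for i in ctime}
    for i in ctime:
        for j in succ.get(i, []):
            preds[j].add(i)
        if p_succ.get(i) is not None:
            preds[p_succ[i]].add(i)
    order = list(TopologicalSorter(preds).static_order())
    lft: Dict[int, float] = {}
    for i in reversed(order):  # all successors of i are processed before i
        bound = deadline
        for j in succ.get(i, []):
            bound = min(bound, lft[j] - ctime[j] - comm.get((i, j), 0.0))
        nxt = p_succ.get(i)
        if nxt is not None:
            bound = min(bound, lft[nxt] - ctime[nxt])
        lft[i] = bound
    return lft


def greedy_unit_slack(reduction: Dict[int, float],
                      compatible: Dict[int, Set[int]]) -> Set[int]:
    """One greedy MWIS round: the tasks that share the next unit of slack."""
    candidates = {i for i, r in reduction.items() if r > 0}
    chosen: Set[int] = set()
    while candidates:
        # Largest energy reduction first; ties broken by the smaller task id.
        best = max(candidates, key=lambda i: (reduction[i], -i))
        chosen.add(best)
        # Keep only tasks that can share the unit slack with every chosen task.
        candidates = {i for i in candidates
                      if i != best and i in compatible.get(best, set())}
    return chosen
```

Because each round re-filters the candidates against the newest chosen task, every task left in the set is compatible with all tasks chosen so far, which is the independence condition the greedy steps require.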
The greedy algorithm in our approach is as follows:
1. Select the task with the maximum energy reduction (i.e., the energy reduced when a unit of slack is allocated to it) among all tasks (i.e., the tasks already assigned and the task being considered for assignment).
2. Select the task with the maximum energy reduction among the tasks that are independent of all previously selected tasks.
3. Repeat step 2 until no task remains that is independent of all the selected tasks.

This greedy approach for unit slack allocation is performed iteratively until no slack remains or no task can receive slack under the estimated deadline constraints. As in PathDVS, the independent tasks can be identified easily using a compatible task matrix or compatible task lists, which record for each task the tasks that can share a unit of slack with it (or, conversely, those that cannot).

6.1.4 Processor Selection

Figure 6-1 presents a high-level description of the assignment procedure. Each task is assigned to a processor such that the total energy consumption expected after applying the DVS scheme to the tasks assigned so far (including the new task being considered for assignment) is minimized, while trying to meet the task's estimated deadline. The candidate processors for the task are those on which it can execute within its estimated deadline; note that the estimated deadline of a task may differ across processors. Once the candidate processors have been identified, one of the following three cases applies. First, if no processor satisfies the estimated deadline, the processor with the earliest finish time is selected (the schedule may still become feasible later, because the assignment is based on estimated times for future tasks whose assignment is yet to be determined).
When a task finishes within its latest finish time, we assume that the deadline of the DAG can be met with high probability, so selecting the processor on which the task finishes earliest increases the chance of meeting the deadline. However, if the task's finish time exceeds the time range for reschedulable tasks or the deadline of that processor, the reassignment process stops, because the schedule clearly cannot meet the deadline constraints. Second, if exactly one processor meets the above constraint, the task is assigned to that processor, again to increase the chance of meeting the deadline constraints. Finally, if more than one candidate processor meets the above constraint, the processor is selected such that the total energy expected after slack allocation is minimized. The expected total energy is the sum of the expected energy of the tasks already assigned and that of the task being considered for assignment. To compute the expected energy for a given processor assignment in this step, the faster greedy heuristic described in the previous subsection is used instead of PathDVS, which is nearly optimal but slower.

This selection process is repeated until all tasks selected for rescheduling have been assigned. However, if the finish time of a task exceeds the deadline, the process stops and the previous assignment is kept for all reschedulable tasks.

Procedure DynamicDVSbasedAssignment
1. Compute the estimated deadline for each task
2. For each task i
3.   Find the processors on which task i can execute within its estimated deadline
   Condition 1: If there is no such processor
   4.1. If the finish time of task i > deadline
   4.2.   Stop the procedure
   4.3. Else
   4.4.   If there is any processor on which task i can execute within that processor's deadline for reschedulable tasks
   4.5.     Select the processor such that the finish time of task i is minimized
   4.6.   Else
   4.7.     Stop the procedure
   4.8.   End If
   4.9. End If
   Condition 2: If there is exactly one processor
   4.1. Select that processor for task i
   Condition 3: If there is more than one processor
   4.1. Apply the greedy algorithm for the weighted independent task set problem to task i and the already assigned tasks
   4.2. Select the processor such that the total energy is minimized
5. End For
End Procedure

Figure 6-1. The DynamicDVSbasedAssignment procedure

6.2 Experimental Results

In this section, we compare the performance of the combination of dynamic assignment and dynamic slack allocation proposed in this chapter (i.e., DynamicAssgn) with the following two main methods, each of which outperforms other existing methods in its setting:
- Static scheduling (i.e., StaticDVS), presented in Chapter 3: this static scheduling provides near-optimal solutions for energy minimization given an assignment, but it keeps the schedule generated at compile time unchanged at runtime.
- Dynamic slack allocation (i.e., DynamicDVS), presented in Chapter 4: this dynamic slack allocation readjusts the schedule whenever a task finishes earlier than expected at runtime, while keeping the given assignment. In our experiments, the k-3 time lookahead slack allocation approach, which gives good performance in terms of energy, is used.

The dynamic algorithms (i.e., DynamicDVS and DynamicAssgn) are applied to a static schedule that is based on a known assignment algorithm, which assigns tasks by earliest finish time, and a static slack allocation algorithm; we use the static scheduling algorithm presented in Chapter 3. For a fair comparison with DynamicDVS, the k-3 time lookahead approach is also used for DynamicAssgn, and PathDVS is used as the slack allocation method applied at runtime.

6.2.1 System Methodology

In this section, we describe the DAG generation, the generation of dynamic environments, and the performance measures used in our experiments.

6.2.1.1 The DAG generation

We randomly generated a large number of graphs with 50 and 100 tasks.
The execution time of each task on each processor at the maximum voltage varies from 10 to 40 units, and the communication time between a task and its child task for a pair of processors varies from 1 to 4 units. The energy consumed to execute each task on each processor varies from 10 to 80. The graphs are executed on 4, 8, 16, and 32 processors.

6.2.1.2 Dynamic environments generation

Dynamic environments are characterized by two broad parameters:
- The number of tasks that finish earlier than expected (i.e., tasks whose actual execution time is less than their estimated execution time) is given by earlyFinishedTaskRate (i.e., number of early finished tasks = earlyFinishedTaskRate × total number of tasks).
- The amount of decrease for each task that finishes early is given by timeDecreaseRate (i.e., amount of decrease = timeDecreaseRate × estimated execution time).

We experimented with earlyFinishedTaskRate values of 0.2, 0.4, 0.6, and 0.8 and timeDecreaseRate values of 0.1, 0.2, 0.3, and 0.4.

6.2.1.3 Performance measures

The deadline extension rate is the fraction of the total finish time that is added to the deadline (i.e., deadline = (1 + deadline extension rate) × total finish time of the assignment without the DVS scheme). We experimented with deadline extension rates of 0 (no extension), 0.01, 0.02, 0.05, 0.1, and 0.2; because the results are similar, only the results for no deadline extension are presented, due to space limitations. To compare the algorithms, we use the normalized energy consumption, that is, the total energy normalized by the energy obtained from the static assignment (before static slack allocation is applied). The computational time (i.e., runtime overhead) is also measured, since it is an important measure for algorithms in dynamic environments.
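The experimental parameters above can be turned into a small generator, sketched below. This is an illustrative reading, not the thesis setup: the text gives only value ranges, so uniform sampling is an assumption, DAG structure generation is not described and is omitted, and the function names are hypothetical. Communication times (1 to 4 units) could be drawn in the same way as the execution times.

```python
import random
from typing import List, Tuple


def random_costs(n_tasks: int, n_procs: int,
                 rng: random.Random) -> Tuple[List[List[float]], List[List[float]]]:
    """Per-(task, processor) execution time at max voltage and energy.

    Assumption: values are drawn uniformly from the stated ranges.
    """
    exec_time = [[rng.uniform(10, 40) for _ in range(n_procs)] for _ in range(n_tasks)]
    energy = [[rng.uniform(10, 80) for _ in range(n_procs)] for _ in range(n_tasks)]
    return exec_time, energy


def actual_times(estimated: List[float], early_finished_task_rate: float,
                 time_decrease_rate: float, rng: random.Random) -> List[float]:
    """Shorten a random subset of tasks to model early finishes."""
    n_early = int(round(early_finished_task_rate * len(estimated)))
    early = set(rng.sample(range(len(estimated)), n_early))
    # amount of decrease = timeDecreaseRate x estimated execution time
    return [t - time_decrease_rate * t if i in early else t
            for i, t in enumerate(estimated)]


def deadline_for(total_finish_time: float, extension_rate: float) -> float:
    """deadline = (1 + deadline extension rate) x finish time without DVS."""
    return (1 + extension_rate) * total_finish_time


def normalized_energy(total_energy: float, static_assignment_energy: float) -> float:
    """Energy relative to the static assignment before slack allocation."""
    return total_energy / static_assignment_energy
```

For example, with earlyFinishedTaskRate = 0.4 and timeDecreaseRate = 0.25, four of ten tasks with an estimated time of 10 units finish after 7.5 units.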
6.2.2 Comparison of Energy Requirements

Figures 6-2, 6-3, 6-4, and 6-5 compare our algorithm with static scheduling and dynamic slack allocation in terms of energy consumption for different time decrease rates and early finished task rates on 4, 8, 16, and 32 processors, respectively. The results show that the combination of dynamic assignment and dynamic slack allocation (i.e., DynamicAssgn) significantly outperforms both static scheduling and dynamic slack allocation in terms of energy consumption. For instance, for 32 processors, DynamicAssgn improves energy requirements by 15-26% and 8-12% compared to StaticDVS and DynamicDVS, respectively. These results show that adjusting the assignment at runtime, in addition to adjusting the slack at runtime, is necessary for minimizing energy requirements. Furthermore, the improvement of DynamicAssgn over the other two algorithms generally increases as timeDecreaseRate and earlyFinishedTaskRate increase.

[Plot: four panels for early finished task rates 0.2, 0.4, 0.6, and 0.8; x-axis: Time Decrease Rate (0.1-0.4); y-axis: Normalized Energy (0.5-1); series: StaticDVS, DynamicDVS, DynamicAssgn]
Figure 6-2.
Results for 4 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks

[Plot: four panels for early finished task rates 0.2, 0.4, 0.6, and 0.8; x-axis: Time Decrease Rate (0.1-0.4); y-axis: Normalized Energy (0.5-1); series: StaticDVS, DynamicDVS, DynamicAssgn]
Figure 6-3. Results for 8 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks

[Plot: four panels for early finished task rates 0.2, 0.4, 0.6, and 0.8; x-axis: Time Decrease Rate (0.1-0.4); y-axis: Normalized Energy (0.5-1); series: StaticDVS, DynamicDVS, DynamicAssgn]
Figure 6-4.
Results for 16 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks

[Plot: four panels for early finished task rates 0.2, 0.4, 0.6, and 0.8; x-axis: Time Decrease Rate (0.1-0.4); y-axis: Normalized Energy (0.5-1); series: StaticDVS, DynamicDVS, DynamicAssgn]
Figure 6-5. Results for 32 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks

6.2.3 Comparison of Time Requirements

Figure 6-6 shows the average time required to readjust the schedule after a single task finishes early (i.e., the runtime overhead). The computational time of DynamicAssgn is an order of magnitude larger than that of DynamicDVS, since DynamicAssgn requires an assignment process as well as a slack allocation process. However, DynamicAssgn requires only 0.02-0.04 seconds on average to readjust the schedule at runtime; this small overhead should make it useful for a large number of computation-intensive applications.

[Plot: two panels (50 tasks, 100 tasks); x-axis: Time Decrease Rate (0.1-0.4); y-axis: Computational Time (logarithmic, 10^3 to 10^8); series: DynamicDVS, DynamicAssgn]
Figure 6-6.
Computational time to readjust the schedule after an early finished task with respect to different time decrease rates (unit: ns, logarithmic scale)

CHAPTER 7 CONCLUSION AND FUTURE WORK

Energy consumption is a critical issue in parallel and distributed embedded systems. Scheduling for DVS-based energy minimization broadly consists of two steps: assignment and slack allocation.
- Assignment: this step determines the order in which tasks execute and the mapping of tasks to processors, based on the computation time at the maximum voltage level.
- Slack allocation: once the assignment of each task is known, this step allocates a variable amount of slack to each task so that the total energy consumption is minimized while the DAG executes within a given deadline.

We have presented novel scheduling algorithms that minimize the DVS-based energy consumption of DAG-based applications under deadline constraints on parallel systems. The proposed scheduling algorithms fall into four categories: static slack allocation, dynamic slack allocation, static assignment, and dynamic assignment, presented in Chapters 3, 4, 5, and 6, respectively. In this chapter, we review the main contributions of the scheduling algorithms presented in this thesis.

7.1 Static Slack Allocation

In Chapter 3, we presented a novel static slack allocation algorithm (i.e., a static DVS scheme) for DAG-based applications in parallel and distributed systems. Our method makes three main contributions:
- Its performance in terms of energy reduction is comparable to that of the LP (Linear Programming) based algorithm, which provides near-optimal solutions.
- It requires significantly less memory than the LP-based algorithm and scales to larger problems.
- It is one to two orders of magnitude faster than the LP-based algorithm when the total amount of available slack is small (i.e., when the deadline is tight).
Our experimental results also show that the energy reduction achieved by our proposed algorithm is considerably better than that of simplistic schemes. Furthermore, with efficient search-space reduction techniques such as compatible task lists, compression, and lower bounds, the branch-and-bound search method can effectively provide near-optimal solutions for energy minimization at low computational cost.

7.2 Dynamic Slack Allocation

In Chapter 4, we presented novel slack allocation algorithms that minimize energy consumption and/or meet deadline constraints for DAG-based applications in dynamic environments, where the actual execution time of a task may differ from its estimated time. Our methods make three main contributions:
- They require significantly less computational time (i.e., runtime overhead) than applying the static algorithm at runtime every time a task finishes early or late.
- Their performance in terms of reducing energy and/or meeting a given deadline is comparable to that of applying the static algorithm at runtime.
- They are effective both when the estimated execution time is underestimated and when it is overestimated.

The experimental results also show that our methods offer significant improvement over simplistic greedy methods in terms of energy requirements and/or satisfying the deadline constraints. Our methods have been shown to work in environments where the estimated time of every task is greater than or equal to its execution time (i.e., underestimation) or where the estimated time of every task is less than or equal to its execution time (i.e., overestimation). However, they should be equally effective in hybrid environments where some tasks complete before their estimated time while others complete after it.
7.3 Static Assignment

In Chapter 5, we presented novel static assignment algorithms that minimize the DVS-based energy consumption of DAG-based applications on parallel systems. The proposed algorithms assign tasks to appropriate processors with the goal of energy minimization by using expected DVS-based energy information during assignment and by considering multiple task prioritizations based on time and energy. Our methods make three main contributions:
- Through the assignment method that minimizes finish time, the deadline constraints are satisfied, and energy can also be reduced because a larger amount of slack is generated that can be allocated to tasks during the slack allocation step.
- Performance in terms of reducing energy requirements is significantly improved by incorporating energy minimization into the assignment process.
- They require two to three orders of magnitude less time than the Genetic Algorithm based formulations, which outperform other existing algorithms in terms of energy consumption.

Our experimental results show that our proposed algorithms significantly outperform existing algorithms in terms of energy consumption while requiring less computational time.

7.4 Dynamic Assignment

In Chapter 6, we presented a novel assignment algorithm that minimizes energy consumption in dynamic environments. The proposed algorithm adjusts the schedule by reassigning tasks to processors and then reallocating slack to tasks whenever a task finishes earlier than expected at runtime. Our method makes two main contributions:
- Its time requirements are small enough that it should be useful for a large number of application workflows.
- It provides considerably better energy minimization than (a) static scheduling, which does not change the schedule at runtime, and (b) reallocating only the slack at runtime while keeping the assignment.
Our experimental results show that our proposed algorithm significantly outperforms these approaches in terms of energy consumption while requiring little computational time. Like dynamic slack allocation, our scheme can easily be modified for cases in which the actual execution time is greater than the estimated time, although in these cases the deadline guarantees cannot be maintained.

7.5 Future Work

In this thesis, we have presented scheduling algorithms that assume there is no resource contention. In practice, however, resources such as buses, caches, and I/O devices may be shared among multiple tasks. Such resource conflicts can significantly affect time and energy requirements and must be incorporated effectively into scheduling. We will develop algorithms that model and address these issues for energy minimization.

LIST OF REFERENCES

1. AeA (formerly American Electronics Association) Report, Cybernation, http://www.aeanet.org
2. R. K. Ahuja and J. B. Orlin, "A Fast Scaling Algorithm for Minimizing Separable Convex Functions Subject to Chain Constraints," Operations Research, 49(5), Sept. 2001, pp. 784-789.
3. H. Aydin, R. Melhem, D. Mossé, and P. Mejía-Alvarez, "Determining Optimal Processor Speeds for Periodic Real-Time Tasks with Different Power Characteristics," Euromicro Conference on Real-Time Systems (ECRTS), Delft, Netherlands, June 2001, pp. 225-232.
4. H. Aydin, R. Melhem, D. Mossé, and P. Mejía-Alvarez, "Dynamic and Aggressive Scheduling Techniques for Power-Aware Real-Time Systems," Real-Time Systems Symposium (RTSS), London, UK, Dec. 2001, pp. 95-105.
5. H. Aydin, R. Melhem, D. Mossé, and P. Mejía-Alvarez, "Power-Aware Scheduling for Periodic Real-Time Tasks," IEEE Transactions on Computers, 53(5), May 2004, pp. 584-600.
6. N. K. Bambha, S. S. Bhattacharyya, J. Teich, and E.
Zitzler, "A Hybrid Global/Local Search Strategies for Dynamic Voltage Scaling in Embedded Multiprocessors," International Symposium on Hardware/Software Codesign (CODES), Copenhagen, Denmark, Apr. 2001, pp. 243-248.
7. S. Basagni, "Finding a Maximal Weighted Independent Set in Wireless Networks," Telecommunication Systems, 18(1-3), Sept. 2001, pp. 155-168.
8. T. D. Braun, H. J. Siegel, N. Beck, L. L. Boloni, M. Maheswaran, A. I. Reuther, J. P. Robertson, M. D. Theys, and B. Yao, "A Comparison of Eleven Static Heuristics for Mapping a Class of Independent Tasks onto Heterogeneous Distributed Computing Systems," Journal of Parallel and Distributed Computing, 61(6), June 2001, pp. 810-837.
9. T. D. Burd, T. A. Pering, A. J. Stratakos, and R. W. Brodersen, "Dynamic Voltage Scaled Microprocessor System," IEEE Journal of Solid-State Circuits, 35(11), Nov. 2000, pp. 1571-1580.
10. A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-Power CMOS Digital Design," IEEE Journal of Solid-State Circuits, 27(4), Apr. 1992, pp. 473-484.
11. J. Chen, H. Hsu, K. Chuang, C. Yang, A. Pang, and T. Kuo, "Multiprocessor Energy-Efficient Scheduling with Task Migration Considerations," Euromicro Conference on Real-Time Systems (ECRTS), Sicily, Italy, July 2004, pp. 101-108.
12. J. Chen and T. Kuo, "Multiprocessor Energy-Efficient Scheduling for Real-Time Tasks with Different Power Characteristics," International Conference on Parallel Processing (ICPP), Oslo, Norway, June 2005, pp. 13-20.
13. P. Chowdhury and C. Chakrabarti, "Static Task-Scheduling Algorithms for Battery-Powered DVS Systems," IEEE Transactions on Very Large Scale Integration Systems, 13(2), Feb. 2005, pp. 226-237.
14. CPLEX, http://www.ilog.com/products/cplex/
15. Dataquest, http://data1.cde.ca.gov/dataquest/
16. H. El-Rewini and T. G. Lewis, "Scheduling Parallel Program Tasks onto Arbitrary Target Machines," Journal of Parallel and Distributed Computing, 9(2), June 1990, pp. 138-153.
17. W. Felter, K. Rajamani, T. Keller, and C.
Rusu, "A Performance-Conserving Approach for Reducing Peak Power Consumption in Server Systems," International Conference on Supercomputing (ICS), Cambridge, MA, USA, June 2005, pp. 293-302.
18. F. Franchetti, Y. Voronenko, and M. Pueschel, "FFT Program Generation for Shared Memory: SMP and Multicore," Supercomputing (SC), Tampa, FL, USA, Nov. 2006, pp. 51.
19. D. Geer, "Chip Makers Turn to Multicore Processors," IEEE Computer, 38(5), May 2005, pp. 11-13.
20. K. Govil, E. Chan, and H. Wasserman, "Comparing Algorithms for Dynamic Speed-Setting of a Low-Power CPU," International Conference on Mobile Computing and Networking, Berkeley, CA, USA, Nov. 1995, pp. 13-25.
21. F. Gruian, "Hard Real-Time Scheduling for Low-Energy Using Stochastic Data and DVS Processors," International Symposium on Low Power Electronics and Design, Huntington Beach, CA, USA, Aug. 2001, pp. 46-51.
22. F. Gruian and K. Kuchcinski, "LEneS: Task Scheduling for Low-Energy Systems Using Variable Supply Voltage Processors," Asia South Pacific Design Automation Conference (ASP-DAC'01), Yokohama, Japan, Jan. 2001, pp. 449-455.
23. F. Gruian and K. Kuchcinski, "Uncertainty-Based Scheduling: Energy-Efficient Ordering for Tasks with Variable Execution Time," International Symposium on Low Power Electronics and Design, Seoul, Korea, Aug. 2003, pp. 465-468.
24. D. S. Hochbaum and J. G. Shanthikumar, "Convex Separable Optimization Is Not Much Harder than Linear Optimization," Journal of the ACM, 37(4), Oct. 1990, pp. 843-862.
25. I. Hong, G. Qu, M. Potkonjak, and M. B. Srivastava, "Synthesis Techniques for Low-Power Hard Real-Time Systems on Variable Voltage Processors," Real-Time Systems Symposium (RTSS), Madrid, Spain, Dec. 1998, pp. 178-187.
26. I. Hong, D. Kirovski, G. Qu, M. Potkonjak, and M. B. Srivastava, "Power Optimization of Variable-Voltage Core-Based Systems," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18(12), Dec. 1999, pp. 1702-1714.
27. J. Hu and R.
Marculescu, "Energy-Aware Communication and Task Scheduling for Network-on-Chip Architectures under Real-Time Constraints," Design, Automation and Test in Europe Conference (DATE), Paris, France, Feb. 2004, pp. 10234.
28. J. Hu and R. Marculescu, "Communication and Task Scheduling of Application-Specific Networks-on-Chip," Computer and Digital Techniques, 152(5), Sept. 2005, pp. 643-651.
29. S. Hua and G. Qu, "Power Minimization Techniques on Distributed Real-Time Systems by Global and Local Slack Management," Asia South Pacific Design Automation Conference (ASP-DAC'05), Shanghai, China, Jan. 2005, pp. 830-835.
30. O. H. Ibarra and C. E. Kim, "Heuristic Algorithms for Scheduling Independent Tasks on Nonidentical Processors," Journal of the ACM, 24(2), Apr. 1977, pp. 280-289.
31. T. Ishihara and H. Yasuura, "Voltage Scheduling Problem for Dynamically Variable Voltage Processors," International Symposium on Low Power Electronics and Design (ISLPED), Monterey, CA, USA, Aug. 1998, pp. 197-202.
32. M. Iverson, F. Ozuner, and G. Follen, "Parallelizing Existing Applications in a Distributed Heterogeneous Environment," Heterogeneous Computing Workshop (HCW), Santa Barbara, California, USA, Apr. 1995, pp. 93-100.
33. R. Jejurikar and R. Gupta, "Dynamic Slack Reclamation with Procrastination Scheduling in Real-Time Embedded Systems," Design Automation Conference (DAC), San Diego, California, USA, June 2005, pp. 111-116.
34. R. Jejurikar and R. Gupta, "Energy-Aware Task Scheduling with Task Synchronization for Embedded Real-Time Systems," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 25(6), June 2006, pp. 1024-1037.
35. R. Jejurikar and R. Gupta, "Optimized Slowdown in Real-Time Task Systems," IEEE Transactions on Computers, 55(12), Dec. 2006, pp. 1588-1598.
36. A. Jerraya, H. Tenhunen, and W. Wolf, "Multiprocessor Systems-on-Chips," IEEE Computer, 38(7), July 2005, pp. 36-40.
37. P. V. Karzanov and S. T.
McCormick, "Polynomial Methods for Separable Convex Optimization in Unimodular Linear Spaces with Applications," SIAM Journal on Computing, 26(4), Aug. 1997, pp. 1245-1275.
38. W. Kim, D. Shin, H. Yun, J. Kim, and S. Min, "Performance Comparison of Dynamic Voltage Scaling Algorithms for Real-Time Systems," Real-Time and Embedded Technology and Applications Symposium (RTAS'02), San Jose, CA, USA, Sept. 2002, pp. 219-228.
39. R. Kumar, K. Farkas, N. Jouppi, P. Ranganathan, and D. Tullsen, "Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction," International Symposium on Microarchitecture, Washington, DC, USA, Dec. 2003, pp. 81.
40. R. Kumar, D. M. Tullsen, N. P. Jouppi, and P. Ranganathan, "Heterogeneous Chip Multiprocessors," IEEE Computer, 38(11), Nov. 2005, pp. 32-38.
41. Y. Kwok and I. Ahmad, "Dynamic Critical-Path Scheduling: An Effective Technique for Allocating Task Graphs to Multiprocessors," IEEE Transactions on Parallel and Distributed Systems, 7(5), May 1996, pp. 506-521.
42. Y. Kwok and I. Ahmad, "Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors," ACM Computing Surveys, 31(4), Dec. 1999, pp. 406-471.
43. W. Kwon and T. Kim, "Optimal Voltage Allocation Techniques for Dynamically Variable Voltage Processors," ACM Transactions on Embedded Computing Systems, 4(1), Feb. 2005, pp. 211-230.
44. G. Q. Liu, K. L. Poh, and M. Xie, "Iterative List Scheduling for Heterogeneous Computing," Journal of Parallel and Distributed Computing, 65(5), May 2005, pp. 654-665.
45. J. Luo and N. K. Jha, "Power-Conscious Joint Scheduling of Periodic Task Graphs and Aperiodic Tasks in Distributed Real-Time Embedded Systems," International Conference on Computer-Aided Design (ICCAD), San Jose, California, USA, Nov. 2000, pp. 357-364.
46. J. Luo and N. K.
Jha, "Battery-Aware Static Scheduling for Distributed Real-Time Embedded Systems," Design Automation Conference (DAC), Las Vegas, NV, USA, June 2001, pp. 444-449.
47. J. Luo and N. K. Jha, "Static and Dynamic Variable Voltage Scheduling Algorithms for Real-Time Heterogeneous Distributed Embedded Systems," Asia South Pacific Design Automation Conference (ASP-DAC'02), Bangalore, India, Jan. 2002, pp. 712-719.
48. J. Luo and N. K. Jha, "Power-Profile Driven Variable Voltage Scaling for Heterogeneous Distributed Real-Time Embedded Systems," International Conference on VLSI Design (VLSI), Las Vegas, Nevada, USA, Jan. 2003, pp. 369-375.
49. A. Manzak and C. Chakrabarti, "Variable Voltage Task Scheduling for Minimizing Energy or Minimizing Power," International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Istanbul, Turkey, June 2000, pp. 3239-3242.
50. A. Manzak and C. Chakrabarti, "Variable Voltage Task Scheduling Algorithms for Minimizing Energy," International Symposium on Low Power Electronics and Design (ISLPED), Huntington Beach, California, USA, Aug. 2001, pp. 279-282.
51. R. Mishra, N. Rastogi, D. Zhu, D. Mossé, and R. Melhem, "Energy Aware Scheduling for Distributed Real-Time Systems," International Parallel and Distributed Processing Symposium (IPDPS), Nice, France, Apr. 2003, pp. 21b.
52. P. Pillai and K. G. Shin, "Real-Time Dynamic Voltage Scaling for Low-Power Embedded Operating Systems," ACM Symposium on Operating Systems Principles, Banff, Alberta, Canada, Oct. 2001, pp. 89-102.
53. S. Sakai, M. Togasaki, and K. Yamazaki, "A Note on Greedy Algorithms for the Maximum Weighted Independent Set Problem," Discrete Applied Mathematics, 126(2-3), Mar. 2003, pp. 313-322.
54. V. Sarkar, Partitioning and Scheduling Parallel Programs for Multiprocessors, Cambridge, Mass., MIT Press, 1989.
55. M. T. Schmitz and B. M.
Al-Hashimi, Cons idering Power Variations of DVS Processing Elements for Energy Minimisation in Distri buted Systems, Intern ational Symposium on System Synthesis, Montral, P.Q., Canada, Oct. 2001, pp.250-255. 56. M. T. Schmitz, B. M. Al-Hashimi, and P.Eles, Energy-Efficient Mapping and Scheduling for DVS Enabled Distributed Embedded Syst ems, Design, Automation, and Test in Europe Conference (DATE), Paris, France, Mar. 2002, pp.514-521. 57. M. T. Schmitz, B. M. Al-Hashimi, and P.Eles, Iterative Schedule Optimization for Voltage Scalable Distributed Embedded Systems, ACM Transactions on Embedded Computing Systems, 3(1), Feb. 2004, pp.182-217. 58. S. Shankland and M. Kanellos, Intel to Elaborate on New Multicore Processor, http://news.zdnet.co.uk/hardware/0,1000000091,39116043,00.htm ?r=1 59. Y. Shin and K. Choi, Power Conscious Fixed Priority Scheduling for Hard Real-Time Systems, Design Automation Conference (DAC 99), New Orleans, Louisiana, USA, June 1999, pp.134-139. 60. Y. Shin, K. Choi, and T. Sakurai, Power Optimization of Real-Time Embedded Systems on Variable Speed Processors, Internati onal Conference on Computer-Aided Design (ICCAD), San Jose, Californi a, USA, Nov. 2000, pp.365-368. 61. S. Shivel, H. J. Siegel, A. A. Maciejewski, P. Sugavanam, T. Banka, R. Castain, K. Chindam, S. Dussinger, P. Pichumani, P. Sat yqsekaran, W. Saylor, D. Sendek, J. Sousa, J. Sridharan, and J. Velazco, Static Allocat ion of Resources to Communicating Subtasks in a Heterogeneous Ad Hoc Grid Environmen t, Journal of Parallel and Distributed Computing, 66(4), Apr. 2006, pp.600-611. 62. G. C. Sih and E. A. Lee, A Compile-Time Scheduling Heuristic for InterconnectionConstrained Heterogeneous Pro cessor Architectures, IEEE Tran sactions on Parallel and Distributed Systems, 4(2), Feb. 1993, pp.175-187. PAGE 169 169 63. V. Tiwari, D. Singh, S. Rajgopal, G. Mehta, R. Patel, and F. 
Baez, Reducing Power in High-Performance Microprocessors, Desi gn Automation Conference (DAC), San Francisco, California, USA, June 1998, pp.732-737. 64. H. Topcuoglu, S. Hariri, and M. Wu, Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing, IEEE Transactions on Parall el and Distributed Systems, 13(3), Mar. 2002, pp.260-274. 65. D. Warrier, W. E. Wilhelm, J. S. Warren, I. V. Hicks, A Branch-and-Price Approach for the Maximum Weight Independent Set Probl em, Networks, 46(4), Dec. 2005, pp. 198209. 66. M. Weiser, B. Welch, A. Demers, and S. Shenker, Scheduling for Reduced CPU Energy, USENIX Conference on Operating Systems De sign and Implementation, Monterey, CA, USA, Nov. 1994, pp.13-23. 67. S. Williams, L. Oliker, R. Vuduc, K. Yelick, J. Demmel, and J. Shalf, Optimization of Sparse Matrix-vector Multiplication on Emer ging Multicore Platforms, Supercomputing (SC), Reno, NV, USA, Nov. 2007, pp.38. 68. W. Wolf, The Future of Multiprocessor Systems-on-Chips, Design Automation Conference (DAC), San Diego, CA, USA, June 2004, pp.681-685. 69. M. Y. Wu and D. D. Gajski, Hypertool: A Programming Aid for Message-Passing Systems, IEEE Transactions on Parallel a nd Distributed Systems, 1(3), July 1990, pp.330-343. 70. C. Yang, J. Chen, T. Kuo, An Approximation Algorithm for Energy-Efficient Scheduling on A Chip Multiprocessor, Design, Automation, and Test in Europe Conference (DATE), Munich, Germany, Mar. 2005, pp.468-473. 71. T. Yang and A. Gerasoulis, DSC: Schedu ling Parallel Tasks on an Unbounded Number of Processors, IEEE Transactions on Parallel and Distributed System s, 5(9), Sept. 1994, pp.951-967. 72. R. Yao, A. Demers, and S. Shenker, A Scheduling Model for Reduced CPU Energy, IEEE Symposium on Foundations of Comput er Science (FOCS), Milwaukee, Wisconsin, USA, Oct. 1995, pp.374-382. 73. Y. Yu and V. K. 
Prasanna, Resource Allo cation for Independent Real-Time Tasks in Heterogeneous Systems for Energy Minimizati on, Journal of Information Science and Engineering, 19(3), May 2003, pp.433-449. 74. Y. Yu and V. K. Prasanna, Energy-Balanced Task Allocation for Collaborative Processing in Wireless Sensor Networks, M obile Networks and A pplications, 10(1-2), Feb. 2005, pp.115-131. PAGE 170 170 75. Y. Zhang, X. (Sharon) Hu, and D. Z. Chen Task Scheduling and Voltage Selection for Energy Minimization, Design Automation Conference (DAC), New Orleans, Louisiana, USA, June 2002, pp.183-188. 76. D. Zhu, R. Melhem, and B. R. Childers, Scheduling with Dynamic Voltage/Speed Adjustment Using Slack Reclamation in Multiprocessor Real-Time Systems, IEEE Transactions on Parallel and Distributed Systems, 14(7), July 2003, pp.686-700. 77. D. Zhu, D. Moss, and R. Melhem, Powe r-Aware Scheduling for AND/OR Graphs in Real-Time Systems, IEEE Transactions on Para llel and Distributed Systems, 15(9), Sept. 2004, pp.849-864. 78. J. Zhuo and C. Chakrabarti, An Efficient Dynamic Task Scheduling Algorithm for Battery Powered DVS Systems, Asian South Pacific Design Automation Conference (ASP-DAC05), Shanghai, China, Jan. 2005, pp.846-849. 79. J. Zhuo and C. Chakrabarti, System-Level Energy-Efficient Dynamic Task Scheduling, Design Automation Conference (DAC), Sa n Diego, California, USA, June 2005, pp.628-631. PAGE 171 BIOGRAPHICAL SKETCH Jaeyeon Kang obtained her Master of Science in com puter science from University of Southern California in 2002. Sh e obtained her Master of Science and Bachelor of Science degrees in electrical and computer engineer ing from Sungkyunkwan University, Korea in 1997 and 1999 respectively. |