Citation
Comparative Analysis of Space-Grade Processors

Material Information

Title:
Comparative Analysis of Space-Grade Processors
Creator:
Lovelly, Tyler M
Place of Publication:
[Gainesville, Fla.]
Florida
Publisher:
University of Florida
Publication Date:
Language:
english
Physical Description:
1 online resource (84 p.)

Thesis/Dissertation Information

Degree:
Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Electrical and Computer Engineering
Committee Chair:
GEORGE,ALAN DALE
Committee Co-Chair:
LAM,HERMAN
Committee Members:
GORDON-ROSS,ANN M
TELESCO,CHARLES MICHAEL

Subjects

Subjects / Keywords:
afrl -- algorithm -- analysis -- application -- architecture -- autonomous -- bandwidth -- benchmark -- benchmarking -- chrec -- comparison -- computation -- computing -- cots -- cpu -- dsp -- dwarf -- efficiency -- fpga -- gpu -- hardened -- hardening -- manycore -- memory -- metric -- multicore -- nasa -- onboard -- optimization -- overhead -- parallel -- performance -- power -- processing -- processor -- radiation -- reconfigurable -- sensor -- space -- taxonomy
Electrical and Computer Engineering -- Dissertations, Academic -- UF
Genre:
bibliography ( marcgt )
theses ( marcgt )
government publication (state, provincial, terriorial, dependent) ( marcgt )
born-digital ( sobekcm )
Electronic Thesis or Dissertation
Electrical and Computer Engineering thesis, Ph.D.

Notes

Abstract:
Onboard computing demands for space missions are continually increasing due to the need for real-time sensor and autonomous processing combined with limited communication bandwidth to ground stations. However, creating space-grade processors that can operate reliably in environments that are highly susceptible to radiation hazards is a lengthy, complex, and costly process, resulting in limited processor options for space missions. Therefore, research is conducted into current, upcoming, and potential future space-grade processors to provide critical insights for progressively more advanced architectures that can better meet the increasing demands for onboard computing. Metrics and benchmarking data are generated and analyzed for various processors in terms of performance, power efficiency, memory bandwidth, and input/output bandwidth. Metrics are used to measure and compare the theoretical capabilities of a broad range of processors. Results demonstrate how onboard computing capabilities are increasing due to processors with architectures that support high levels of parallelism in terms of computational units, internal memories, and input/output resources; and how performance varies between applications, depending on the intensive computations used. Furthermore, the overheads incurred by radiation hardening are quantified and used to analyze low-power commercial processors for potential use as future space-grade processors. Once the top-performing processors are identified using metrics, benchmarking is used to measure and compare their realizable capabilities. Computational dwarfs are established and a taxonomy is formulated to characterize the space-computing domain and identify computations for benchmark development, optimization, and testing. Results demonstrate how to optimize for the architectures of space-grade processors and how they compare to one another for a variety of integer and floating-point computations. Metrics and benchmarking results and analysis thus provide critical insights for progressively more advanced architectures for space-grade processors that can better meet the increasing onboard computing demands of space missions. Trade-offs between architectures are determined that can be considered when deciding which space-grade processors are best suited for specific space missions or which characteristics and features are most desirable for future space-grade processors. ( en )
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Thesis:
Thesis (Ph.D.)--University of Florida, 2017.
Local:
Adviser: GEORGE,ALAN DALE.
Local:
Co-adviser: LAM,HERMAN.
Electronic Access:
RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2018-06-30
Statement of Responsibility:
by Tyler M Lovelly.

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Embargo Date:
6/30/2018
Classification:
LD1780 2017 ( lcc )

Downloads

This item has the following downloads:


Full Text

PAGE 1

COMPARATIVE ANALYSIS OF SPACE GRADE PROCESSORS By TYLER MICHAEL LOVELLY A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 2017

PAGE 2

2017 Tyler Michael Lovelly

PAGE 3

To my country

PAGE 4

4 ACKNOWLEDGMENTS This work was supported in part by the Industry/University Cooperative Research Center Program of the National Science Foundation under grant nos. IIP 1161022 and CNS 1738783, by the industry and government members of the NSF Center for High Performance Re configurable Computing and the NSF Center for Space, High Performance, and Resilient Computing, by the University of Southern California Information Sciences Institute for remote access to their systems, and by Xilinx and Microsemi for provided hardware an d software

PAGE 5

5 TABLE OF CONTENTS page ACKNOWLEDGMENTS ................................ ................................ ................................ .. 4 LIST OF TABLES ................................ ................................ ................................ ............ 7 LIST OF FIGURES ................................ ................................ ................................ .......... 8 ABSTRACT ................................ ................................ ................................ ..................... 9 CHAPTER 1 INTRODUCTION ................................ ................................ ................................ .... 11 2 BACKGROUND AND RELATED RESEARCH ................................ ....................... 14 Metrics and Benchmarking Analysis ................................ ................................ ....... 15 Computational Dwarfs and Taxonomies ................................ ................................ 18 3 METRICS AND BENCHMARKING METHODOLOGIES ................................ ........ 20 Metrics Calculations for Fixed Logic and Reconfigurable Logic Processors ........... 20 Benchmark Development, Optimization, and Testing for CPUs and FPGAs .......... 24 4 METRICS EXPERIMENTS, RESULTS, AND ANALYSIS ................................ ....... 27 Metrics Comparisons of Space Grade CPUs, DSPs, and FPGAs .......................... 27 Performance Variations in Space Grade CPUs, DSPs, and FPGAs ...................... 29 Overheads Incurred from Radiation Hardening of CPUs, DSPs, and FPGAs ......... 36 Projected Future Space Grade C PUs, DSPs, FPGAs, and GPUs .......................... 39 5 BENCHMARKING EXPERIMENTS, RESULTS, AND ANALYSIS ......................... 44 Space Computing Taxonomy and Benchmarks ................................ ...................... 44 Performance Analysis of Space Grade CPUs ................................ ........................ 48 Performance Analysis of Space Grade FPGAs ................................ ...................... 51 Benchmarking Comparisons of Space Grade CPUs and FPGAs ........................... 53 Expanded Analysis of Space Grade CPUs ................................ ............................. 55 6 CONCLUSIONS ................................ ................................ ................................ ..... 58 APPENDIX A METRICS DATA ................................ ................................ ................................ ..... 62 B BENCHMARKING DATA ................................ ................................ ........................ 65

PAGE 6

6 LIST OF REFERENCES ................................ ................................ ............................... 68 BIOGRAPHICAL SKETCH ................................ ................................ ............................ 84

PAGE 7

7 LIST OF TABLES Table page 2 1 UCB computational dwarfs. ................................ ................................ ................ 18 5 1 Space computing taxonomy. ................................ ................................ .............. 45 5 2 Space computing benchmarks. ................................ ................................ .......... 47 A 1 Metrics data for space grade CPUs, DSPs, and FPGAs. ................................ ... 62 A 2 Performance variations in space grade CPUs, DSPs, and FPGAs. ................... 62 A 3 Metrics data for closest COTS counterparts to space grade CPUs, DSPs, and FPGAs. ................................ ................................ ................................ ........ 62 A 4 Radiation hardening outcomes for space grade CPUs, DSPs, and FPGAs. ...... 63 A 5 Percentages achieved by space grade CPUs, DSPs, and FPGAs after radiation hardening. ................................ ................................ ............................ 63 A 6 Metrics data for low power COTS CPUs, DSPs, FPGAs, and GPU s. ................ 64 A 7 Metrics data for projected future space grade CPUs, DSPs, FPGAs, and GPUs (worst case). ................................ ................................ ............................ 64 A 8 Metrics data for projected future space grade CPUs, DSPs, FPGAs, and GPUs (best case). ................................ ................................ .............................. 64 B 1 Parallelization data for space grade CPUs. ................................ ........................ 65 B 2 Resource usage data for space grade FPGAs. ................................ .................. 66 B 3 Benchmarking data for matrix multiplication on space grade CPUs and FPGAs. ................................ ................................ ................................ ............... 66 B 4 grade CPUs and FPGAs. .. 66 B 5 Performance data for additional benchmarks on space grade CPUs. ................ 67

PAGE 8

8 LIST OF FIGURES Figure page 4 1 Metrics data for space grade CPUs, DSPs, and FPGAs ................................ ... 28 4 2 Operations mixes of intensive computations. ................................ ..................... 30 4 3 Performance variations in space grade CPUs, DSPs, and FPGAs. ................... 31 4 4 Metrics data for closest COTS counterparts to sp ace grade CPUs, DSPs, and FPGAs. ................................ ................................ ................................ ........ 37 4 5 Percentages achieved by space grade CPUs, DSPs, and FPGAs after radiatio n hardening. ................................ ................................ ............................ 38 4 6 Metrics data for low power C OTS CPUs, DSPs, FPGAs, and GPUs ................ 40 4 7 Metrics data for current and projected future space gra de CPUs, DSPs, FPGAs, and GPUs. ................................ ................................ ............................ 42 5 1 Parallelization data for matrix multip lication on space grade CPUs. ................... 49 5 2 Parallelization data for Kepler grade CPUs ...................... 50 5 3 Resource usage data for matrix multiplication on space grade FPGAs. ............. 51 5 4 Resource u grade FPGAs. ................ 52 5 5 Benchmarking data for matrix multiplicatio n on space grade CPUs and FPGAs ................................ ................................ ................................ ............... 54 5 6 on space grade CPUs and FPGAs .. 54 5 7 Performance data for additional benchmarks on space grade CPUs ................ 56

PAGE 9

9 Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy COMPARATIVE ANALYSIS OF SPACE GRADE PROCESSORS By Tyler Michael Lovelly December 2017 Chair: Ala n Dale George Major: Electrical and Computer Engineering O nboard computing demands for space missions are continually increasing due to the need for real time sensor and autonomous processing combined with limited communication bandwidth to ground stations. However, creating space grade processors that can operat e reliably in environments that are highly susceptible to radiation hazards is a lengthy, complex, and costl y process, resulting in limited processor options for space missions. Therefore, researc h is conducted into current, upcoming and potential future space grade processors to provide critical insights for progressively more advanced architectures that can better meet the increasing demands for onboard computing Metrics and benchmarking data are generated and analyzed for various processors in terms o f performance, power efficiency, memory bandwidth, and input/output bandwidth. M etrics are used to measure and compare the theoretical capabilities of a broad range of processors R esults demonstrate how onboard computing capabilities are increasing due to processors with architectures that support high levels of parallelism in terms of computational units, internal memories, and input/output resources; and how performance varies between applications, depending on the intensive computations

PAGE 10

10 used. Furthermor e, the overheads incurred by radiation hardening are quantified and used to analyze lo w power commercial processors for poten tial use as future space grade processors Once the top performing processors are identified using metrics, benchmarking is used to measure and compare their realizable capabilities. Computational dwarfs are established and a taxonomy is formulated to characte rize the space co mputing domain and identify computations for benchmark development, optimization, and testing Results demonstrate how to optimize for the architectures of space grade processors and how they compare to one another for a variety of integer and floating point computations Metrics and benchmarking results and analysis thus provide critical insights for progressively more advanced architectures for space grade processors that can better meet the increasing onboard computing demands of space missions. Trade offs between architectures are determined that can be co nsidered when deciding which space grade processors are best suited for specific space missions or which characteristics and features are most desirable for future space grade processors

PAGE 11

11 CHAPTER 1 INTRODUCTION Curren tly available processor options for space missions are limited due to the lengthy, complex, and costly process of creating space grade processors, and because space mission design typically requires lengthy development cycles, resulting in a large and potentially increasing technological gap between space grade and co mmercial off the shelf (COTS) processors [1 5 ]. However, computing requirements for space missions are becoming more demanding due to the increasing need for real time sensor and autonomous processing with more advanced sensor technologies and increasing m ission data rates, data p recisions, and problem sizes [5 7 ]. Furthermore, communication bandwidth to ground stations remains limited and suffers from long transmission latencies, making remote transmission of sensor data and real time operating decisions i mpractical. High performance space grade processors can alleviate these challenges by processing data before transmission to ground stations and making decisions autonomously, but careful consideration is required to ensure that they can meet the unique ne eds of onboard computing [7,8 ]. To address the continually increasing demand for high performance onboard computing, architectures must be carefully analyzed for their potential as future space grade processors. C urrent space grade processors are typically based upon COTS processors with architectures that were not explicitly designed for the unique needs of onboard computing. To ensure that future space grade processors are based upon architectures that are most suitable for space missions, trade offs betw een various architectures should be determin ed and considered when designing optimizing, or comparing space grade processors or when selecting COTS architecture s for radiation

PAGE 12

12 hardening and use in space missions [9 11] However, the range of available pr ocessors is large and diverse, with many possible architectures to evaluate. To analyze the broad range of current, upcoming, and potential future processor s for onboard computing, a set of metrics is used that provides a theoretical basis for the study of their architectures [12 16 ]. Facilitated by these metrics, quantitative analysis and objective comparisons are conducted for many diverse space gra de and low po wer COTS processors from categories such as multicore and many core central processing units (CPUs), digital signal processors (DSPs), field programmable gate arrays (FPGAs), graphics processing units (GPUs), and hybrid configura tions of these architectures. M etrics analysis provides insights into the performance, power efficiency, memory bandwid th, and input/output bandwidth of specific implementations of these processors to track the current and future progress of their development and to determine which can better meet the computing needs of space missions [1]. Once the top performing space gra de p rocessors have been identified, a benchmarking analysis is conducted to study the realizable capabilities of their architectures. To characterize the broad space computing domain, a comprehensive study is performed to determine common and critical comp uting requirements for space missions based upon applicat ion requirements Using this information, computational dwarfs are established, and an expansive taxonomy is formulated that broadly defines and classifies the computationally intensive applications required by space missions. From this taxonomy, a set of benchmarks is identified that is largely representative of onboard computing requirements, and thus simplifies the space computing domain in to a manageable set of computations. Then, a variety of the se space computing

PAGE 13

13 benchmarks are developed, optimized, and tested on top performing processors to analyze and compare the realizable capabilities of each architecture in terms of performance and power efficiency. Benchmarking analysis provide insights into which architectures and optimizations are most eff ective and which factors are limiting additional performance from being achieved [9]. The remainder of this dissertation is structured as follows. Chapter 2 describes background and related research fo r space grade processors, metrics and benchmarking analysis and computational dwarfs and taxonomies. Chapter 3 describes methodologies fo r metri cs calculations and benchmark developmen t optimization, and testing for space grade processors Chapter 4 prov ides a metrics analysis of current upcoming, and projected fu ture space grade processors including comparisons of space grad e processors to one another, detailed analysis of how the performance of space grade processors varies between applications and co mputations based upon operations mix, comparisons of space grade processors to the closest COTS counterparts upon which they were based to determine overheads incurred from radiation hardening, and comparisons of top performing space grade and COTS process ors to determine the potential for future space grade processors. Chapter 5 provides a benchmarking analysis of top performing space grade processors, including the formulation of a taxonomy that is used characterize the space computing domain and identify benchmarks direct comparisons of space grade processors to one another in terms of performance and power efficiency and an expanded performance analysis using a variety of additional benchmarks. Finally, Chapter 6 provides conclusions. Data for all resu lts are tabulated and included in the A ppendix.

PAGE 14

14 CHAPTER 2 BACKGROUND AND RELATED RESEARCH Many radiation hazards exist in the harsh space environment such as galactic cosmic rays, solar particle events, and trapped radiation in the Van Allen belts, which threaten the ope ration of onboard processors [17,18 ]. Space grade processors must be radiation hardened to withstand cumulative radiation effects such as charge buildup within the gate oxide that causes damage to the silicon lattice over time, and th ey must provide immunity to single event effects that occur when single particles pass through the silicon lattice and cause errors that can lead to data corruption or disrupt the func tionality of the processor [19 21 ]. Several techniques exist for the fab ricati on of space grade processors [22 24 ], including radiation hardening by process, which involves the use of an insulating oxide layer, and radiation hardening by design, which involves specialize d transistor layout techniques. Although COTS processors can be used in space, they cannot always satisfy reliability and accessibility requireme nts for missions with long planned lifetimes within harsh orbits and locations that are highly susceptible to radiation hazards However, creat ing a space grade implementation of a COTS processor often comes with associated costs [25 ], including slower operating frequencies, decreased numbers of processor cores or computational units, increased power dissipation, and de creased input/output resour ces. Traditionally, space grade processors have come in the form of single core CPUs [26 ]. However, in recent years, development has occurred on space grade processors with more adva nced architectures such as multicore and many core CPUs, DSPs, and FP GAs. For example, a space grade CPU based upon a multicore ARM

PAGE 15

15 Cortex A53 architecture is currently being developed through a joint investment of the National Aeronautics and Space Administration (NASA) and the Air Force Research Laboratory (AFRL) under the Hig h P erformance Spaceflight Comping (HPSC) program [ 6 8, 27,28 ]. This processor is referred to as the Boeing HPSC, although it may be renamed upon completion of the program. Metrics and Benchmarking Analysis To analyze and compare processors for use in space missions, an established set of metrics is used for the quantitative ana lysis of diverse processors in terms of performance, power efficiency, memory bandwidt h, and input/output bandwidth [12 16 ]. These me trics provide a basis f or the analysis of a the theoretical capabilities of processors and enable the objective comparison of diverse architectures, from categories such as multicore and many core CPUs, DSPs, FPGAs, GPUs, and hybrid configu rations of these architectures. Computational density (CD), reported in gigaoperations per second (GOPS), is a metric for the steady st ate performance of the computational units of a processor for a stream of independent operations. By default, calculations are based upon an operations mix of half additions and half multiplications. However, the default can be varied to analyze how performance differs between applications that contain computations that require o ther operations mixes. Multiply accumulate functions are only considered to be one operation each because they require data dependency between each addition and multiplication. CD is calculated separately for each da ta type considered, including 8 bit, 16 bit, and 32 bit integers, as well as both single precision and double precision floating point (hereafter referred to as Int8, Int16, Int32, SPFP, and DPFP, respectively). CD per watt (CD/W), reported in GOPS per watt (GOPS/W), is

PAGE 16

16 a metric for the performance achieved for each watt of power dissipated by the processo r. I nternal Memory B andwidth (IMB), reported in gigabytes per second, is a metric for the throughput between a processor and onchip memories. E xternal Memory B andwidth (EMB), reported in gigabytes per second, is a metric for the throughput between a proces sor and offchip memories through dedicated memory controllers. I nput/Output B andwidth (IOB), reported in gigabytes per second, is a metric for the total throughput between a processor and offchip resources through both dedicated memory controllers and all other available forms of input/output. Although no single metric can complete ly characterize the performance of any given processor, each metric provides unique insights into specific features that can be related to applications and computations as needed. The most releva nt metric for performance is CD when bound computationally, C D/W when bound by power efficiency, IMB or EMB when bound by memory, IOB when bound by input/output resources, or some combination of multiple metrics depending on specific application requirements. Although other metrics are also of interest, such as the cost and reliability of each processor, this information is not standardized between vendors and is often unavailable or highly dependent on mission specific factors. M etrics can be calculated solely based upon information from vendor provided documentatio n and software, without the hardware costs and software development efforts required for benchmarking, thus providing a practical methodology for the comparison and analysis of a broad range of processors. However, metrics describe only the theoretical cap abilities of each architecture without complete consideration of software requireme nts and implementation details that result in additional costs to

PAGE 17

17 performance. Therefore, once the top performing processors have been identified using metrics, more thoroug h performance anal ysis can then be conducted using benchmarking to de termine realizable capabilities through hardware and software experimentation. To analyze the realizable capabilities of space grade processors, benchmarks are identified for the space co mputing domain, then developed, optimized, and tested directly on each processor. Typically, vendor provided libraries achieve the highest performance for any given processor because they are carefully and specifically op timized for its architecture [14 ]. However, these libraries are often very limited and are unlikely to be optimized for the more esoteric computations used for autonomous processing. Furthermore, most highly optimized numerical libraries are developed specifically for the high performance c omputing domain, which is primarily concerned with floating point data types, and thus do not support integer data types that are often used for sensor processing. Therefore, the development of custom benchmarks becomes necessary when existing libraries do not support all computations, data types, and optimizations being considered. Although benchmarking analysis of space grade processors requires greater hardware costs and development efforts than metrics analysis, the resulting insights are specific to th e computations required by the space computing domain. Thorough analysis becomes possible as processors approach theoretical capabilities and the effects of their architectures on performance can be carefully studied. Therefore, benchmarking provides an ac curate and insightful methodology for comparing performance trade offs for various architectures, computations, and optimizations.

PAGE 18

18 B enchmarking of onboard processors has been conducted previously [ 11,29,30 ] but there has been no research focused on analyzing and comparing the performance of the advanced architectures used in current and upc oming space grade processors Computational Dwarfs and Taxonomies Before benchmarking analysis can be conducted, the computing domain being considered must first b e studied and characterized to identify computations that are largely representative of its critical applications [31,32 ]. Thus, the University of California at Berkeley (UCB) introduced the concept of computational dwarfs for designing and analyzing compu method that captures a pattern of co pp. 1 ]. Table 2 1 lists the UCB dwarfs, which were defined at high levels of abstraction to encompass all computation al methods used in modern compu ting Table 2 1. UCB computational dwarfs. Dwarf Dense linear algebra Sparse linear algebra Spectral methods N body methods Structured grids Unstructured grids MapReduce Combinational logic Graph traversal Dynamic programming Backtrack and branch and bound Graphical models Finite state machines The UCB dwarfs are used to characterize applications by determining their intensive computations and classifying them under the appropriate dwarf. For example, computations such as matrix multiplication and matrix addition are both classified under the dense and sparse linear algebra dwarfs, while fast Fourier transform and discrete

PAGE 19

19 wave let transform are both classified under the spectral methods dwarf. Abstracting applications as dwarfs enables analysis of computational patterns across a broad range of applications, independent of the actual hardware and software implementation details. For any computing domain, dwarfs can be identified and used to create taxonomies that broadly define and classify the computational patterns within that domain. This concept has been demonstrated in various computing domains, including high performance com puting [34 ], cloud compu ting [35], and symbolic computation [36 ]. T hese concepts are used to establish dwarfs and a taxonomy for the space computing domain that is then used to identify a set of computations for benchmarking

PAGE 20

20 CHAPTER 3 METRICS AND BENCHMARKING METHODOLOGIES E stablished concepts in the analysis of processor s are used and applied to space grade processors These concepts include metrics analysi s as an initial comparison of a broad range of architectures and benchmarking analysis for f urther insights into the top performing architectures based upon performance and power efficiency for various computations Metrics Calculations for Fixed Logic and Reconfigurable Logic Processors To calculate metrics for a fixed logic processor such as a CPU, DSP, or GPU, several pieces of information are required about the architecture that are obtained from vendor provid ed documentation [12 14 ]. For example, Equations 3 1 to 3 15 demonstrate the process of calculating metrics fo r the Freescale QorIQ P5 0 40, which is a quadcore CPU [37 3 9 ]. CD calculations require information about the operating frequency reported in megahertz (MHz), the number of each type of computational unit, and the number of operations per cycle that can be a chieved by each type of computational unit for all operations mixes and data t ypes considered. As shown in Equations 3 1 and 3 2 there is one integer addition unit and one integer multiplication unit on each processor core, allowing for one addition and o ne multiplication to be issued simultaneously per cycle for all integer data types. There is only one floating point unit on each processor core, which handles both additions and multiplications, allowing for only one operation to be issued per cycle for a ll floating point data types. CD/W calculations require the same information as CD calculations, in addition to the maximum power dissipation. As shown in Equations 3 3 and 3 4 CD/W is calculated using the corresponding CD calculations and the maximum pow er dissipation. IMB

PAGE 21

21 calculations require information about the number of each type of onchip memory unit, such as caches and register files, and associated operating frequencies, bus widths, access latencies, and data rates. As shown in Equations 3 5 to 3 7 IMB is calculated for all types of caches available on each processor core. Assuming cache hits, both types of L1 cache can s upply data in each clock cycle. Although the L2 cache has a higher bus width, it also require s a substantial access latency that limits the overall bandwidth. IMB values are combined to obtain the total IMB. EMB calculations require information about the number of each type of dedicated controller for offchip memories and associated operating frequencies, bus widths, and data rates As shown in Equation 3 8 EMB is calculated for the dedicated controllers available for external memories on the QorIQ P5040. IOB calculations require the same information as EMB calculations, in addition to the number of each type of available input/out put resource and associated operating frequencies, bus widths, and data rates. As shown in Equations 3 9 to 3 15 IOB is calculated for each type of input/output resource available using optimal configurations for signal multiplexing. IOB values are combin ed to obtain the tot al IOB. ( 3 1) ( 3 2) ( 3 3) ( 3 4) ( 3 5) ( 3 6) ( 3 7)

PAGE 22

22 ( 3 8) ( 3 9) ( 3 10) ( 3 11) ( 3 12) ( 3 13) ( 3 14) ( 3 15) To calculate metrics for a reconfigurable logic processor such as an FPGA, the process is more complex as compared to fixed logic processors, and it requires several pieces of information about the architecture that are obtained from vendor provided documentation, s oft ware, and test cores [12 16 ]. For example, Equations 3 16 to 3 30 demonstrate the process of calculati ng metrics for the Xilinx Virtex 5 FX130T, which is an FPGA [4 0 4 3 ]. CD calculations require information about the total available logic resources of th e architecture in terms of multiply accumulate units (MACs) lookup tables (LUTs) and flip flops (FFs) Ad ditionally, the use of software and test cores is required to generate information about the operating frequencies and logic resources used for all operations mix e s and data types considered A linear programming algorithm is used for optimization, based u pon operating frequencies and the configuration of computational units on the reconfigurable architecture [15 ,16 ]. As shown in Equations 3 16 to 3 20 CD is calculated separately for each integer and floating point data type, based upon the operating frequ encies and logic resources used for additions and multiplications, where each computational unit can compute one

PAGE 23

23 operation per cycle and multiple implementations of each computational unit are considered that make use of different types of logic resources. CD/W calculations require the use software to generate information about power dissipation given the configuration of computatio nal units for each data type As shown in Equations 3 21 to 3 25 CD/W is calculated separately for each integer and floating p oint data type using estimates for maximum power dissipation generated using vendor provided software IMB calculations require information about the number of onchip memory units such as b lock random access memory units and the associated operating frequencies, number of ports, bus widths, and data rates. As shown in Equation 3 26 IMB is c alculated for the internal block random access memory units on the Virtex 5. EMB calculations require the operating frequency, logic a nd input/output resource usage, bus widths, and data rates for dedicated controllers for offchip memories. As shown in Equation 3 27 EMB is calculated for dedicated controllers for external memories, where the maximum number of controllers is limited by t he number of input/output ports available. IOB calculations require the same type of information that is required for fixed logic processors. As shown in Equations 3 28 to 3 30 IOB is calculated for each type of input/output resource available. IOB values are combined to obtain the total IOB. ( 3 16) ( 3 17) ( 3 18) ( 3 19) ( 3 20) ( 3 21)

PAGE 24

24 ( 3 22) ( 3 23) ( 3 24) ( 3 25) ( 3 26) ( 3 27) ( 3 28) ( 3 29) ( 3 30) To calculate metrics for a hybrid processor that contains some combination of CPU, DSP, GPU, and FPGA architectures, the calculations must first be completed for each constituent architecture. CD va lues are then combined to obtain the hybrid CD, which is then divided by the combined maximum power dissipation to obtain the hybrid CD/W. IMB, EMB, and IOB values are also combined to obtain the hybrid IMB, EMB, and IOB, but they must account for any over lap of memory and input/output resources that are shared between the constituent architectures. Benchmark Development Optimization and Testing for CPUs and FPGAs Benchmarks are developed for various data types, including 8 bit, 16 bit, and 32 bit integer s, as well as both single precision and double precision floating point (hereafter referred to as Int8, Int16, Int32, SPFP, and DPFP, respectively). Correct functionality is verified using known test patterns, then execution times are measured using random ized data. The total number of arithmetic operations, solved equations, or memory transfers per second are calculated for each benchm ark and reported in either

PAGE 25

25 mega ope rations per second (MOPS), mega soluti ons per second (MSOLS), or mega transfers per second (MT/s). Data are generated using various developme nt boards, including the Cobham GR CPCI LEON4 N2X [4 4 ] for the Cobham GR740 [45 47], the Freescale P5040DS [48 ] for the BAE Systems RAD5545 [49] the LeMaker Hi Key [50 ] for the Boeing HPSC, t he Boeing Maestro Development Board [51 ] for t he Boeing Maestro [52 54], the Xilinx ML510 [55 ] for the Xilinx Virtex 5QV FX130 [56] and the Microsemi RTG4 DEV KIT [57 ] for the Microsemi RTG4 [58] In some cases, the development boards used contain close COTS counte rparts because the space grade processors are inaccessible. Therefore, results for the GR740, RAD5545, and HPSC are adjusted to determine the space grade performance based upon differences in operating frequencies where the HPSC is estimated to operate at 500 MHz Additionally, floating point results for the GR740 are adjusted to account for an estimated 30% performance reduction due to differences in the number of floating point units. Because the Virtex 5QV is inaccessible, results are generated using ve ndor provided software, and thus the COTS counterpart is only used for functional verification. Although there are also two DSPs of interest they are not included due to lack of suitable resources for benchmarking analysis For each of the CPUs being considered, benchmarks are developed in C, then compiled and executed using a GNU/Linux operating system. Wherever possible, benchmarks use optimized floating point libraries [59,60 ]. For the GR740, RAD5545, and HPSC, which are multicore architectures, com putations are parallelized across processor cores with a shared memory strateg y using the OpenMP interface [61 ].

PAGE 26

26 Additionally, benchmarks that have results reported in MOPS or MT/s are further parallelized for the HPSC within each processor core using its single instructi on, multiple data (SIMD) units [62 ]. For the Maestro, which is a many core architecture with nonuniform memory accesses, computations are parallelized across processor cores with a hybrid shared memory and message passing strategy using bot h the OpenMP interface and the Tilera User Dynamic Network interface to preserve data locality and control cache usage by manually scattering and gathering data between core s [63,64 ]. Results for parallel efficiency are based upon speedup calculated by div iding parallel performance by serial baselines. Because power dissipation cannot be measured directly, results for power efficiency are based upon estimated maximum power dissipation as described in vendor provided documentation. For each of the FPGAs bein g considered, benchmarks are developed in VHDL, then simulated, synthesized, and implemented using vendor provided software. Wherever possible, benchmarks use vendor provided and open source libraries [42,43,65 68 ], and are further optimized by inserting p ipeline stages into the design, which reduces the propagation delay incurred per clock cycle, thus increasing achievable operating frequencies. Based upon the resource usage of each benchmark in terms of available MACs LUTs FFs and block random access m emories (BRAMs), computations are parallelized across the architectures wherever possible by instantiating and operating multiple benchmarks simultaneously. Because power dissipation cannot be measured directly, results for power efficiency are based upon data generated from vendor provided software that provides the estimated power d issipation for each design

PAGE 27

27 CHAPTE R 4 METRICS EXPERIMENTS, RESULTS, AND ANALYSIS To enable quantitative analysis and objective comparisons of space grade processors, metrics are calculated for many diverse space grade and low power COTS processors. First, space grade processors are compared to one another. Next, top performing space grade processors are further analyzed to determine how performance varies between appli cations and computations based upon operations mix. Then, space grade processors are compared to the closest COTS counterparts upon which they were based to determine the overheads incurr ed from radiation hardening. Finally, top performing space grade and low power COTS processors are compared to determine the potential for future space grade processors. Metrics Comparisons of Space Grade CPUs, DSPs, and FPGA s Usi ng the methods described in Chapter 3 Figure 4 1 provides CD, CD/W, IMB, EMB, and IOB for vari ous current and upcoming space grade processors in logarithmic scale, incl uding the Honeywell HXRHPPC [69 ] and BAE Systems RAD750 [70 ], which are single co re CPUs; the Cobham GR7 12RC [71,72 ], Cobham GR740 [45 47 ], and BAE System s RAD5545 [49 ], which are multico re CPUs; the Boeing Maestro [52 54 ], which is a many cor e CPU; the Ramon Chips RC64 [73 77 ] and BAE Systems RADSPEED [78,79 ], which are multicore DSPs; and the Xilinx Virtex 5Q V [41 43,56 ] and Microsemi RTG4 [ 58,80 82 ], whic h are FPGAs. Data from Fi gure 4 1 are provided within Table A 1.

PAGE 28

28 A B C Figure 4 1. Metrics data for space grade CPUs, DSPs, and FPGAs A) CD. B) CD/W. C) IMB, EMB, and IOB. The HXRHPPC, RAD750, and GR712RC achieve lower CD and CD/W due to slower operating frequencie s and older single core or dual core CPU architectures with limited computational units. Additionally, they achi eve low IMB due to limited internal caches, low EMB due to limited or no dedicated ex ternal memory controllers, and low IOB due to limited and slow input/output resources. CPUs such as the GR740, RAD5545, and Maestro achieve a much higher CD than older CPUs due to their higher operating frequencies, newer multicore and many core architectures, and (in the case of both the RAD5545 and Maestro), multiple integer units within each processor core. Of all the CPUs compared, the Maestro achiev es the highest CD and IMB due to its large

PAGE 29

29 number of processor cores and ca ches, whereas the GR740 achieves the highes t CD/W due to its low power dissipation. Although the theoretical capabilities of space grade processors are greatly increasing due to newer CPUs, even further gains are made with DSPs and FPGAs. The RC64 achieves a high integer CD, and the RADSPEED achiev es a high floating point CD due to large levels of parallelism for these types of computational units; and both achieve a high IMB due to large numbers of internal caches and register files. The Virtex 5QV achieves high CD and C D/W, and the RTG4 achieves h igh integer CD and CD/W because they support large numbers of computat ional units at a relatively low power dissipation; and both achieve high IMB due to large n umbers of internal BRAM units, high EMB because they support multiple dedicated controll ers for external memories, and high IOB due to the large number of general purpose input/output ports available By comparing space grade processors using metrics, the changes in theoretical capabilities of space grade processors are analyzed. The performance ach ieved by space grade processors has increased by several orders of magnitude due to newer processors with more advanced architectures that support higher levels of parallelism in terms of computational units, internal memories, and input/output resources. Performance Variations in Space Grade CPUs, DSPs, and FPGAs CD calculations for each processor are based upon an operations mix of half additions and half multiplications by default because this is a common and critical operations mix for many intensive computations that are used i n space applications. However, further analysis is conducted for other important operations mixes. Figure 4 2 displays several examples of computations used in space applications and their

PAGE 30

30 corresponding operations mixes of a d dit ions and multiplications [83 8 8 ], where subtractions are considered logically equivalent to additions. Although overheads are required during implementation, these operations mixes characterize the work operations involved, and thus provide a foundation fo r the performance of each computation and the appli cations in which they are used. Figure 4 2. Operations mixes of intensive computations Figure 4 3 provides CD for top performing space grade CPUs, DSPs, and FPGAs using all possible operations mixes consisting of additions and multiplications to demonstrate how the performance varies between differ ent computations Data from Figure 4 3 are provided within Table A 2. Further experimentation would be conducted for additional operations mixes that relate to other computations consisting of operations such as divisions, shifts, square roo ts, and trigonometric functions, but is not possible because information about the performance of these operations is often not included in vendor provided documentation or they are accomplished using software emulation.

PAGE 31

31 A B C D E F G Figure 4 3. Performance variations in space grade CPUs, DSPs, and FPGAs A) GR740. B) RAD5545. C) Maestro. D) RC64. E) RADSPEED. F) Virtex 5QV. G) RTG4.

PAGE 32

32 The GR740 contains an integer unit for each processor core that can compute one Int8, Int16, or Int32 addition or multiplication per cycle. The GR740 also co ntains a floating point unit for each processor core that can compute one SPFP or DPFP addition or multiplication per cycle. Therefore, both integer and floating point CD remain constant for all operations mixes because additions and multiplications are co mputed in the same number of cycles. The RAD5545 contains several integer unit s for each processor core, including two units that can each compute one Int8, Int16, or Int32 addition per cycle and one unit that can compute one Int8, Int16, or Int32 multipli cation per cycle. Operations can be issued to two of these units in the same cycle, resulting in the ability to compute both an addition and a multiplication per cycle, two additions per cycle, or one multiplication per cycle. Therefore, integer CD remains constant for operations mixes with a majority of additions, but it decreases up to 50% as the percentage of multiplications surpasses the percentage of additions due to more multiplications that cannot be computed simultaneously with additions. The RAD554 5 also contains a floating point unit for each processor that can compute one SPFP or DPFP addition or multiplication per cycle. Therefore, floating point CD remains constant for all operations mixes because additions and multiplications are computed in th e same number of cycles. The Maestro contains several integer unit s for each processor core, including two units that can each compute four Int8 additions, two Int16 additions, or one Int32 addition per cycle, and one unit that can compute one Int8, Int16, or Int32 multiplication in two cycles. Therefore, integer CD decreases up to 94% as the percentage of multiplications increases because multiplications take more cycles to compute than

PAGE 33

33 additions and have less computational units for each processor core. T he Maestro also contains a floating point unit for each processor core that can compute one SPFP or DPFP addition per cycle and one SPFP or DPFP multiplication in two cycles, with the ability to interleave additions with multiplications. Therefore, floatin g point CD remains constant for operations mixes with a majority of additions but decreases up to 50% as the percentage of multiplications surpasses the percentage of additions because multiplications take more cycles to compute and this results in more mu ltiplications that cannot be interleaved with additions. The RC64 contains several computational units for each processor core that can compute eight Int8 or Int16 additions per cycle, four Int32 additions per cycle, four Int8 or Int16 multiplications per cycle, one Int32 multiplication per cycle, or one SPFP addition or multiplication per cycle. DPFP operations are not supported. Therefore, integer CD decreases up to 75% as the percentage of multiplications increases because multiplications take more cycle s to compute than additions. F loating point CD remains constant for all operations mixes because ad ditions and multiplications are computed in the same number of cycles. The RADSPEED contains an integer unit for each processor core that can compute one Int 8 addition per cycle, one Int16 addition in two cycles, one Int32 addition in four cycles, one Int8 or Int16 multiplication in four cycles, or one Int32 multiplicatio n in seven cycles. Therefore, integer CD decreases up to 75% as the percentage of multipli cations increases because multiplications take more cycles to compute than additions. The RADSPEED also contains several floating point unit s for each processor core, including one unit that can compute one SPFP or DPFP addition per cycle and

PAGE 34

34 one unit that can compute one SPFP or DPFP multiplication per cycle. Operations can be issued to both of these units in the same cycle, resulting in the ability to compute both an addition and a multiplication per cycle but not two additions or two multiplications per cycle. However, the ability to compute two operations per cycle only applies to SPFP operations because DPFP operations are limited by bus widths. Therefore, single precision floating point CD peaks when the percentages of additions and multiplications are equal and decreases up to 50% as the percentages of additions and multiplications become more unbalanced. D ouble precision floating point CD remains constant for all operations mixes because additions and multiplications are computed in the same number of cycles. The Virtex 5QV and RTG4 contain reconfigurable architectures that support computational units that compute one Int8, Int16, Int32, SPFP, or DPFP addition or multiplication per cycle. As data types and precisions increase, slower operating frequencies can typically be achieved and more logic resources are required. For Int8, Int16, and Int32 operations, multiplications typically achieve slower operating frequencies than additions and require more logic resources. Therefore, integer CD decreases up to 92% for the Virtex 5QV and up to 99% for the RTG4 as the percenta ge of multiplications increases. For SPFP and DPFP operations, multiplications typically achieve slower operating frequencies than additions and require less logic resources when multiply accumulate units are used, but they require more logic resources whe n these units are not used. Therefore, floating point CD either increases or decreases as the percentage of multiplications increases, depending on the use of multiply accumulate units. However, floating point CD does not vary as much as the

PAGE 35

35 integer CD bec ause the differences between logic resources used for additions and multiplications are not as significant. By matching the operations mixes from Figure 4 2 with the results from Figure 4 3, the variations in performance between different computations are analyzed for each top performing space grade processor. For all operations on the GR740, the floating point operations on the RAD5545 and RC64, and the double precision floating point operations on the RADSPEED, CD does not vary between computations For i nteger operations on the RAD5545 and floating point operations on the Maestro, CD is highest for computations that use at least half additions (such as matrix addition, fast Fourier transform, matrix multiplication, and matrix convolution), becomes worse f or computations that use more than half multiplications (such as Jacobi transformation), and is lowest for computations that use all multiplications (such as the Kronecker product). For integer operations on the Maestro, RC64, RADSPEED, Virtex 5QV, and RTG 4, CD is highest for computations that use all additions such as matrix addition and becomes worse for all other computations where more multiplications are used. For single precision floating point operations on the RADSPEED, CD is highest for computation s that use half additions and half multiplications (such as matrix multiplication and matrix convolution), becomes worse for all other computations as either more additions or more multiplications are used, and is lowest for computations that use either al l additions or all multiplications (such as matrix addition or the Kronecker product). For floating point operations on the Virtex 5QV and RTG4, CD varies moderately between computations Variations in CD demonstrate how the

PAGE 36

36 performance of space grade processors is affected by the operations mixes of the intensive computations used in space applications. Overheads Incurred from Radiation Hardening of CPUs, DSPs, and FPGAs Figure 4 4 provides CD, CD/W, IMB, EMB, and IOB for the closest COTS counter parts to the space grade CPUs, DSPs, and FPGAs from Figure 4 1 in logarithmic scale, where the HXRHPPC was based upo n the Freescale PowerPC603e [8 9 ], the RAD750 was based upon the IBM Power PC750 [9 0 9 2 ], the RAD5545 wa s based upon the QorIQ P 5040 [3 7 3 9 ], the Maestro was b ased upon the Tilera TILE64 [93,9 4 ], the RADSPEED was based upon the ClearSpeed CSX700 [95,9 6 ], and the Virtex 5QV was ba sed upon the Virtex 5 FX130T [4 0 4 3 ]. The GR712RC, GR740, RC64, and RTG4 are not included because they were not based upon any specific COTS processors Data from Figure 4 4 are provided within Table A 3. By comparing the results from Figures 4 1 and 4 4, the overheads incurred from radiation hardening of COTS processors are calculated. Figure 4 5 provides the percentages of operating frequencies, the number of processor cores for CPUs and DSPs or computational units for FPGAs power dissipation, CD, CD/W, IMB, EMB, and IOB achieved by each space grade processor as compared t o its closest COTS counterpart. Data from Figure 4 5 are provided within Tables A 4 and A 5. The largest decreases in operating frequencies are for the multicore and many core CPUs because their closest COTS counterparts benefited from high operating frequencies that were significantly decreased to be sustainable on space grade processors, whereas the closest COTS counterparts to the RADSPEED and Virtex 5QV only required moderate operating frequencies to begin with, and therefore did not need t o be dec reased as significantly.

PAGE 37

37 A B C Figure 4 4. Metrics data for closest COTS counterparts to space grade CPUs, DSPs, and FPGAs. A) CD. B) CD/W. C) IMB, EMB, and IOB. The largest decreases in the number of processor cores or computational units are for the Maestro, RADSPEED, and Virtex 5QV because their closest COTS counterparts contained large levels of parallelism that could not be sustained after radiation hardening, whereas the closest COTS counterparts of the multicore CPUs did not contain enough pa rallelism to require any decreases to the number of processor cores during radiation hardening. The Maestro achieves a larger floating point CD and CD/W than its closest COTS counterpart due to the addition of floating point units to each processor core, r esulting in the only occurrence of increases in metrics after radiation hardening.

PAGE 38

38 A B C D E F Figure 4 5. Percentages achieved by space grade CPUs, DSPs, and FPGAs after radiation hardening A) Operating frequency. B) Processor c ores or computational units. C) Power dissipation. D) CD. E) CD/W. F) IMB, EMB, and IOB. Increases and decreases in power dissipation are more unpredictable because they are dependent on many factors, including decreases in operating frequencies and

PAGE 39

39 the numbe r of processor cores or computational units and changes to input/output peripherals By comparing space grade processors to their closest COTS counterparts using metrics, the overheads incurred from radiation hardening are analyzed. The largest decreases in CD and IMB occurred for the multicore and many core CPUs rather than the DSP and FPGA, demonstrating that large decreases in operating frequencies had a more significant impact on the resulting CD and IMB than decreases in the number of processor cores or computational units The smallest decreases in CD/W occurred for the Virtex 5QV due to relatively small decreases in CD and only minor variations in power dissipation. The largest decreases in EMB and IOB occurred for the older single core CPUs because their input/output resources are highly dependent on operating frequencies that were significantly decreased. These overheads can be considered when analyzing processors for potential radiation hardening and use in space missions. Project ed Future Space Grade CPUs, DSPs, FPGA s and GPUs Figure 4 6 provides CD, CD/W, IMB, EMB, and IOB for a variety of low power COTS processors in logarithmic scale, incl uding the Intel Quark X1000 [97,9 8 ], which is a single cor e CPU; the In tel Atom Z3770 [99 ,10 0], Intel Core i7 4610Y [10 1 10 4] and Samsung Exynos 5433 [10 5 10 7 ], which are multicore CPUs; the Tilera TILE Gx8036 [10 8 11 0 ], which is a many core CPU; the Free scale MSC8256 [11 1 11 3 ], which is a multicore DSP; the Texas Instr uments KeyStone II 66AK 2H12 [11 4 11 6 ], which is a multicore DSP paired with a m ulticore CPU; the Xilinx Spartan 6Q LX150T [42,43,117,11 8 ], Xilinx Artix 7Q 350T [42,43,119,12 0 ], and Xilinx Kintex 7Q K410T [42,43,119,12 0 ], which are FPGAs; and the NVIDIA Tegra 3 [1 21,12 2 ], NVIDIA Tegra K1 [115,116,123], and Tegra X1 [106,107,12 4 ], which are GPUs paired with multicore

PAGE 40

40 CPUs. Several modern processors are considered from each category with power dissipation no larger than 30 W. Data from Figure 4 6 are provided within Table A 6. A B C Figure 4 6. Metrics data for low power COTS CPUs, DSPs, FPGAs, and GPUs A) CD. B) CD/W. C) IMB, EMB, and IOB. By comparing many low power COTS processors, the top performing architectures are selected and considered for potential radiation hardening and use in future space missions. Although the Core i7 4610Y is the top performing CPU in most

PAGE 41

41 cases, the Exynos 5433 achieves the largest CD/W of the CPUs due to its low power dissipation. The top performing DSP, FPGA, and GPU are the KeyStone II, Kintex 7Q, and Tegra X1, respectively. However, if the architectures from these COTS processors were to be used in potential future space grade processors, several overheads would likely be incurred durin g the radiation hardening p rocess that must be considered. Therefore, the results for top performing COTS processors from Figure 4 6 are decreased based upon the worst case and best case radiation hardening overheads from Figure 4 5 to project metrics for potential future space grade processors. Figure 4 7 provides worst case and best case projections in logarithmic scale for potential future space grade processors based upon the Core i7 4610Y, Exynos 5433, KeyStone II, Kintex 7Q, and Tegra X1 alongside the top performing space grade processors from Figure 4 1 to determine how additional radiation hardening of top performing COTS processors could impact the theoretical capabilities of space grade processors. Data from Figure 4 7 are provided within Tables A 7 and A 8. By comparing top performing and projected future space grade processors using metrics, the potential benefits of radiation hardening additional COTS architectures are analyzed. Although the results from Figure 4 5 suggest that the radiation hardening of CP Us typically results in large overheads, the Core i7 4610Y and Exynos 5433 achieve the largest CD and CD/W for each data type considered, as well as the largest IMB, out of all space grade CPUs even when using worst case projections. However, the results f rom Figure 4 5 also suggest that the radiation hardening of DSPs and FPGAs typically results in smaller overheads. When using best case projections, the KeyStone II and Kintex 7Q achieve the largest CD and CD/W for each data type considered, as well as

PAGE 42

42 the largest EMB, as well as the largest IMB and IOB in most cases, out of all space grade processors. Finally, although there are no past results for the radiation hardening of GPUs, the Tegra X1 achieves a large CD and CD/Wand a moderate IMB, EMB, and IOB wi thin the range of projections used. A B C Figure 4 7. Metrics data for current and projected future space grade CPUs, DSPs, FPGAs, and GPUs. A) CD. B) CD/W. C) IMB, EMB, and IOB.

PAGE 43

43 Based upon the projections and comparisons from Figure 4 7, COTS proces sors from each category have a high potential to increase the theoretical capabilities of space grade processors, even with the overheads incurred from radiation hardening. Therefore, as expected, radiation hardening of modern COTS processors could benefit onboard computing in terms of performance, power efficiency, memory bandwidth, and input/output bandwidth; and these results help to quantify potential outcomes.

PAGE 44

44 CHAPTER 5 BENCHMARKING EXPERIMENTS, RESULTS, AND ANALYSIS To analyze and compare the top performing processors, benchmarking is conducted to determine their realizable capabilities through hardware and software experimentation. First, a taxonomy is presented that characterizes and classifies the space computing domain and is used to identi fy computations for benchmarking analysis. Next, the performance of space grade processors is analyzed to determine how to optimize and parallelize computations for their architectures for both a sensor processing benchmark and an autonomous processing ben chmark. Then, space grade processors are directly compared to one another to provide insights into which architectures perform best in terms of performance and power efficiency. Finally, an expanded analysis is presented using a variety of additional bench marks Space Computing Taxonomy and Benchmarks A comprehensive study of common and critical applications is presented based upon space mission needs and is used to establish computational dwarfs and formulate a corresponding taxonomy for the space computin g domain. Because thorough consideration of every possible application is impractical, the space computing taxonomy provides a broad and expansive representation of the computing requirements required for space missions. Table 5 1 presents the space comput ing taxonomy, which is composed of high level dwarfs and their corresponding applications, and is followed by discussion

PAGE 45

45 Table 5 1. Space computing taxonomy. Dwarf Applications Remote sensing Synthetic aperture radar Light detection and ranging Beamforming Sensor fusion Image processing Hyper/multi spectral imaging Hyper temporal imaging Stereo vision Feature detection and tracking Image and video compression Orbital orientation Horizon and star tracking Attitude determination and control Orbit determination and control Orbital maneuvering Relative motion control Rapid trajectory generation On orbit assembly Surface maneuvering Autonomous landing Hazard detection and avoidance Terrain classification and mapping Path optimization Missio n planning Intelligent scheduling Model checking Environmental control Communication Software defined radio Error detection and correction Cryptography Requirements for onboard sensor processing are rapidly increasing due to advancements in remote sensing and data acquisition, including radar and laser applications and operations for combining sensor data, which impose in tens ive computational demands [11,12 5 12 7 ]. Image processing is commonly required, including imaging across frequency spectrums an d in noisy environments, in addition to resolution enhancement, stereo vision, and detection and tracki ng of features across frames [12 8 13 1 ]. Because sensor data cannot always be processed onboard, and communication bandwidth to ground stations is limited, data compression can reduce

PAGE 46

46 communication requirements to ensure that critical sensor dat a are retrieved and analyzed [13 2 13 4 ]. Guidance, navigation, and control applications are critical to space missions, and require intensive computing for real time autonomous operations, including horizon and star tracking, and determination and control algorithms for s pacecraft attitude and orbit [13 5 13 7 ]. Autonomous maneuvering is required in orbital missions for proximity operations, including relative motion control for formation flying, rendezvous and d o cking, and on orbit assembly [13 8 14 2 ]. Surface missions require autonomous maneuvering to study foreign environments and to safely and precisely land on and nav igate unfamiliar terrain [14 3 14 7 ]. Autonomous mission planning consists of profiling, intelligent scheduling, and abstract modeling of onboard science experiments, environmental control syst ems, and space craft maneuvering operations [127,14 8 15 1 ]. Communication capabilities to ground stations or other remote systems are also critical for space missions, increasingly based upon software defined radio due to higher flexib ility and ease of adapt ation [15 2 ]. Due to the unreliability of remote communication systems and the hazards posed by the harsh space environment, fault tolerance is critical for space missions, and data reliability can be strengthened by periodically scrubbing memories and appl ying error detection and correction codes to data transmissions [153,15 4 ]. Cryptographic techniques are often required to protect sensitive and valua ble data during transmission [15 5 ]. While mission security can require specific, classified cryptographic a lgorithms, computationally similar unclassified algorithms are also of significance for less sensitive or shorter duration missions [156,15 7 ].

PAGE 47

47 To determine which computations to prioritize for benchmark development, optimization, and testing, the taxonomy is decomposed to identify its most computationally intensive parts, where most can be characterized by several of the more abstracted UBC dwarfs such as dense and sparse linear algebra, spectral methods, and combinational logic. Table 5 2 presents the set of space computing benchmarks, which largely represent s the computations required by the dwarfs a nd applications of the taxonomy Table 5 2. Space computing benchmarks. Benchmark Matrix multiplication Matrix addition Matrix convolution Matrix transpose Kronecker product Fast Fourier transform Haar wavelet transform Discrete wavelet transform Clohessy Wiltshire equations Artificial potential functions Reed Solomon codes Advanced Encryption Standard From this set of computations, space computing benchmarks are developed, optimized, and tested on space grade processors using the methods described in Chapter 3 First, space grade CPUs and FPGAs are analyzed and compared using a matrix multiplication be nchmark with size 1024 1024 matrices, which largely vectors, which largely represents autonomous processing. Then, an expanded analysis is conducted on space grade CPUs using a variety of additional benchmarks to determine how their architectures perform across a broad range of computations used within the space computing taxonomy. The expanded analysis uses a matrix addition

PAGE 48

48 benchmark with size 2048 2048 matrices, a matrix c onvolution benchmark with size 2048 2048 matrices and a size 3 3 Sobel filter, a matrix transpose benchmark with size 2048 2048 matrices, and a Clohessy Wiltshire equations benchmark with size 2048 vectors. By generating benchmarking data on space gr ade processors, their realizable capabilities are analyzed for computations that are used either within specific applications or broadly across the space computing domain Performance Analysis of Space Grade CPUs Figures 5 1 and 5 2 provide parallelization data for the matrix multiplication and grade CPUs, including the Cobham GR740, BAE Systems RAD5545, Boeing HPSC, and Boeing Maestro. For each architecture, speedup is achieved as computations are distributed across pr ocessor cores. Performance is analyzed using increasing numbers of threads to quantify and compare the parallel efficiency of ea ch architecture. Data from Figures 5 1 and 5 2 are provided within Table B 1. For matrix multiplication, speedup is achieved on the GR740, RAD5545, and HPSC, with minor increases once the number of threads exceeds the number of processor cores. The GR740 achieves minor speedup, the RAD5545 achieves near linear speedup in most cases, and the Maestro achieves significant speedup. The HPSC achieves moderate speedup for most data types but only minor speedup for the Int32 data type that requires large data precisions for its results to prevent overflow

PAGE 49

49 A B C D Figure 5 1. Parallelization data for matrix multiplication on space grade CPUs A) GR740. B) RAD5545. C) HPSC. D) Maestro. and HPSC, but decreases once the number of threads exceeds the number of processor cores. The GR740 achieves minor to moderat e speedup, the RAD5545 achieves near linear speedup in most cases, and the HPSC achieves near linear speedup in all cases other than the less computationally intensive Int8 and Int16 data types. The Maestro achieves the lowest speedup, whe re integer performance decreases once more than only few threads are used, and floating point performance decreases once the number of threads approaches half the number of processor cores.

PAGE 50

50 A B C D Figure 5 equation on space grade CPUs A) GR740. B) RAD5545. C) HPSC. D) Maestro. By analyzing the parallel speedup achieved by space grade CPUs, the efficiency of their multicore and many core architectures is quantified and compared. In particular, the GR740 achieves minor to moderate levels of parallel efficiency with more speedup levels of parallel efficiency with near linear speedup in most cases, de monstrating that the performance of its architecture does not suffer significantly from communication overheads between processor cores. The HPSC achieves moderate speedup for matrix multiplication, but benefits from the SIMD units within each processor co re, and achieves near

PAGE 51

51 intensive data types The Maestro achieves the lowest levels of parallel efficiency considering its large number of processor cores, demonstrating that some computatio ns cannot be efficiently parallelized even using a hybrid strategy. Results demonstrate the realizable capabilities and limitations of space grade CPUs and that future architectures must be designed to minimize communication overheads between processor cor es to achieve progressively higher levels of performance Performance Analysis of Space Grade FPGAs Figures 5 3 and 5 4 provide resource usage for the matrix multiplication and grade FPGAs, including the Xilinx Virtex 5QV and Microsemi RTG4. For each architecture, speedup is achieved wherever possible as computations are distributed across the architecture by instantiating and operating multiple benchmarks simultaneously. Resource efficiency is analyzed in terms of MACs LUTs, FFs and BRAMs, to quantify and compare the resource efficiency of each architecture. Data from Figures 5 3 and 5 4 are provided within Table B 2 A B Figure 5 3. Resource usage data for matrix multiplication on space grade FPGAs A) Virtex 5QV. B) RTG4.

PAGE 52

52 A B Figure 5 grade FPGAs A) Virtex 5QV. B) RTG4. For matrix multiplication, the resources required by the designs are significant enough that only one benchmark can be instantiated and operated. In most cases, the limiting factors that prevent parallel speedup from being achieved on these architectures are the availability of additional MACs for the Virtex 5QV and of additional BRAMs for the RTG4. Thus, matrix multip lication is typically bound computationally on Virtex 5QV and typically bound by memory on RTG4. significant, which allows for many benchmarks to be instantiated and operated si multaneously, where the number of benchmarks that can operate in parallel is highest for Int8 data types and typically decreases as the level of data precision increases. In most cases, the limiting factors that prevent additional parallel speedup from bei ng achieved on these architectures are the availability of additional LUTs and FFs because co mputationally on the Virtex 5QV and RTG4.

PAGE 53

53 By analyzing the resource usage of space grade FPGAs, the efficiency of their reconfigurable architectures is quantified and compared. In particular, the Virtex 5QV and RTG4 achieve much higher levels of parallel for matrix multiplication, due to significantly less resource usage that allows multiple benchmarks to be instantiated and operated simultaneously. In most cases, the Virtex 5QV is bound computationally, while the RTG4 equation and bound by memory for matrix multiplication. In some cases, floating point results are bound by the inability to route additional designs across the reconfigurable architectures. Results demonstrate the rea lizable capabilities and limitations of space grade FPGAs and that future architectures must be designed to minimize resource usage for arithmetic operations and trigonometric functions to achieve progressively higher levels of performance Benchmarking Co mparisons of Space Grade CPUs and FPGAs Figures 5 5 and 5 6 provide performance and power efficiency for the matrix grade CPUs and FPGAs in logarithmic scale, including the GR740, RAD5545, HPSC, Maes tro, Virtex 5QV and RTG4. The realizable capabilities of their architectures are analyzed and directly compared to one another. Data from Figures 5 5 and 5 6 are provided within Tables B 3 and B 4 For matrix multiplication, the floating point performance of the RAD5545 and HPSC benefit significantly from the use of optimized libraries. Although integer performance is also of importance for sensor processing, there is no equivalent library available for integer data types. The HPSC also benefits significant ly from the use of SIMD units The Virtex 5QV and RTG4 typically achieve higher performance for integer

PAGE 54

54 data types than for floating point data types due to more efficient libraries that require fewer cycles to compute each result and support higher operat ing frequencies. A B Figure 5 5. Benchmarking data for matrix multiplication on space grade CPUs and FPGAs A) Performance. B) Power efficiency. A B Figure 5 grade CPUs and FPGAs A) Performance. B) Power efficiency. point data types are typically required, but integer data types can also be advantageous if potential increases in performance are worth the loss in data precision. Although integer perform ance on space grade CPUs is not significantly higher than floating point performance, the space grade FPGAs achieve much higher integer performance than floating point performance due to the lower

PAGE 55

55 resource usage of the designs, which results in higher achi evable operating frequencies and the ability to instantiate and operate many more benchmarks in parallel. Therefore, a trade off between performance and data precision is likely not worthwhile on the space grade CPUs, but could be worthwhile on the space g rade FPGAs. By analyzing the performance and power efficiency of space grade CPUs and FPGAs and directly comparing them to one another, the realizable capabilities of their architectures are quantified and examined Of the space grade CPUs, the GR740 achie ves high levels of power efficiency due to its low power dissipation, the RAD5545 and HPSC achieve high levels of floating point performance due to the use of optimized libraries, and the Maestro achieves high levels of integer performance in some cases du e to its multiple integer execution pipelines However, the HPSC achieves the highest performance in most cases largely due to its SIMD units. The space grade FPGAs achieve much higher performance and power efficiency than any of the space grade CPUs due to their ability to reconfigure their architectures specifically for each benchmark, their ability to support high levels of parallelism in many cases, and their low power dissipation. In particular, the Virtex 5QV typically achieves the highest performanc e largely due to its relatively high operating frequencies and the RTG4 typically achieves the highest power efficiency due to its very low power dissipation. Results demonstrate that the most desirable architectures for future space grade processors are m ulticore CPUs, FPGAs, and hybrid combinations of both, particularly those that include SIMD units and better support for trigonometric functions Expanded Analysis of Space Grade CPUs Figure 5 7 provides performance for the matrix addition, matrix convolut ion, matrix transpose, and Clohessy Wiltshire equations benchmarks on space grade CPUs

PAGE 56

56 in logarithmic scale, including the GR740, RAD5545, HPSC, and Maestro. The architectures of these processors are analyzed and compared to provide additional insights int o their realizable capabilities for onboa rd computing using a variety of benchmarks. Data from Figure 5 7 are provided within Table B 5. A B C D Figure 5 7. Performance data for additional benchmarks on space grade CPUs A) Matrix addition. B) Matrix convolution. C ) Matrix transpose. D ) Clohessy Wiltshire equations The HPSC achieves the highest performance in almost all cases, demonstrating its advantages for arithmetic operations, trigonometric functions, and memory transfers. Th e RAD5545 typically achieves the next highest performance, followed by the GR740, and then by the Maestro. Results further demonstrate that multicore architectures are

PAGE 57

57 more desirable than many core architectures for future space grade CPUs due to better pa rallel efficiency between processor cores and th at the SIMD units of the HPSC are highly beneficial for a variety of computations By expanding the performance analysis of space grade CPUs using a variety of benchmarks, the realizable capabilities of their architectures are quantified and compared across a broad range of computations used within the space computing taxonomy.

PAGE 58

58 CHAPTER 6 CONCLUSIONS To address the continually increasing demands for onboard computing research is conducted into a broad range of processors using metrics and benchmarking to provide critical insights for progressively more advanced architectures that can better meet the computing needs of future space missions Trade offs between architectures are determined that can be considered when deciding which space grade processors are best suited for specific space missions or which characteristics and features are most desirable for future space grade processors A metrics analysis is presented as a methodology to quantitativ ely and objectively analyze a broad range of space grade and low power COTS processor s in terms of performance, power efficiency memory bandwidth, and input/output bandwidt h Results are generated to enable comparisons of space grade processors to one ano ther, comparisons of space grade processors to their closest COTS counterparts to determine overheads incurred from radiation hardening, and comparisons of top performing space grade and COTS processors to determine the potential for future space grade pro cessors. Metrics results demonstrate and quantify how space grade processors with multicore and many core CPU, DSP, and FPGA architectures are continually increasing the theoretical capabilities of space missions by supporting high levels of parallelism in terms of computational units, internal memories, and input/output resources. In particular, the best results are provided by the RC64, Virtex 5QV, and RTG4 for integer CD and CD/W; the RADSPEED and Virtex 5QV for floating point CD and CD/W; the RC64 and V irtex 5QV for IMB; the RAD5545 and Virtex 5QV for EMB; and the

PAGE 59

59 RAD5545, Virtex 5QV, and RTG4 for IOB. Additionally, CD results for each top performing space grade processor are further analyzed to demonstrate and evaluate how the performance can vary signi ficantly between applications, depending on the operations mixes used within the intensive computations with the largest variations occurring for integer operations on the Maestro, Virtex 5QV, and RTG4. The overheads incurred from radiation hardening are quantified using metrics and analyzed, where the overheads incurred by the space grade CPUs are typically much larger than those incurred by the DSP and FPGA because they required more significant decreases in operating frequencies. O verheads from past ca ses of radiation hardening are used to project metrics for potential future space grade processors, demonstrating and quantifying how the radiation hardening of modern COTS processors from each category could result in significant increases in the theoreti cal capabilities of future space missions. In particular, the Core i7 4610Y and Exynos 5433 could provide the largest CD, CD/W, and IMB out of all space grade CPUs; the KeyStone II and Kintex 7Q could provide the largest CD, CD/W, and EMB out of all space grade processors, as well as the largest IMB and IOB in most cases; and the Tegra X1 could provide the largest CD and CD/W out of all space grade processors. Once the top performing processors are identified using metrics, a benchmarking analysis is demons trated as a methodology to measure and compare their realizable capabilities. T he space computing domain is broadly characterized to establish computational dwarfs, including remote sensing, image processing, orbital orientation, orbital maneuvering, surfa ce maneuvering, mission planning, and communication, and a corresponding taxonomy is formulated and used to identify a set of computations for

PAGE 60

60 benchmark development, optimization, and testing. R esults are generated on space grade CPUs and FPGAs for matrix multiplication, a sensor processing benchmark, and autonomous processing benchmark and for a variety of additional benchmarks on space grade CPUs for an expanded analysis. Of the CPUs analyzed using benchmarking the GR740 achieves h igh levels of power efficiency due to its low power dissipation, the RAD5545 achieves high levels of parallel efficiency due to its low communication overheads, and the HPSC achieves high levels of performance and powe r efficiency for a variety of computat ions due to its SIMD units. Results demonstrate that multicore architectures are more desirable than many core architectures for future space grade CPUs due to better parallel efficiency between processor cores. The space grade FPGAs analyzed using benchmarking achieve much higher performance and power efficiency than any of the space grade CPUs due to their ability to reconfigure their architectures specifically for each benchmark, their ability to support high levels of parallelism in many cases, and their low power dissipation. In particular, the Virtex 5QV typically achieves the highest performance largely due to its relatively high operating frequencies and the RTG4 typically achieves the highest power efficiency due to its very low power dissip ation. Results demonstrate that lower resource usage for arithmetic operations and trigonometric functions is desirable for future space grade FPGAs to increase the parallel efficiency of their reconfigurable architectures. In conclusion, m etrics results d emonstrate that multicore and many core CPU, DSP, and FPGA architectures are continually increasing the capabilities of space missions, and that radiation hardening of modern COTS processors from each category

PAGE 61

61 could result in significant increases in futur e capabilities. Benchmarking results demonstrate that FPGAs achieve the highest levels of realizable performance and power efficiency, and that the most desirable architectures for future space grade processors are multicore CPUs, FPGAs, and hybrid combina tions of both, particularly those that include SIMD units and better support for trigonometric functions Future research directions involve expanding these experiments using additional processors, metrics benchmarks, and optimizations.

PAGE 62

62 APPENDIX A METRICS DATA Table A 1. Metrics data for space grade CPUs, DSPs, and FPGAs Processor CD (GOPS) Power (W) CD/W (GOPS/W) IMB (GB/s) EMB (GB/s) IOB (GB/s) Int8 Int16 Int32 SPFP DPFP Int8 Int16 Int32 SPFP DPFP HXRHPPC 0.08 0.08 0.08 0.08 0.04 7.60 0.01 0.01 0.01 0.01 0.01 1.92 0.00 0.16 RAD750 0.27 0.27 0.27 0.13 0.13 5.00 0.05 0.05 0.05 0.03 0.03 3.19 1.06 1.59 GR712RC 0.08 0.08 0.08 0.03 0.03 1.50 0.05 0.05 0.05 0.02 0.02 4.80 0.40 1.21 GR740 1.00 1.00 1.00 1.00 1.00 1.50 0.67 0.67 0.67 0.67 0.67 32.80 1.06 1.90 RAD5545 3.73 3.73 3.73 1.86 1.86 20.00 0.19 0.19 0.19 0.09 0.09 55.58 12.80 32.48 Maestro 11.99 11.32 10.19 12.74 12.74 22.20 0.54 0.51 0.46 0.57 0.57 152.88 8.32 15.07 RC64 102.40 102.40 30.72 19.20 0.00 8.00 12.80 12.80 3.84 2.40 0.00 3840.00 4.80 24.80 RADSPEED 14.17 11.81 6.44 70.83 35.42 15.00 0.94 0.79 0.43 4.72 2.36 589.04 7.46 15.16 Virtex 5QV 503.72 214.57 59.67 51.93 14.96 9.97 a 44.30 22.62 5.91 5.14 1.70 1931.04 16.00 109.16 RTG4 418.32 252.18 18.36 3.12 0.83 3.91 b 55.60 41.71 5.68 1.96 0.74 707.40 5.33 68.70 a Averaged between data types (Int8: 11.37 W; Int16: 9.49 W; Int32: 10.09 W; SPFP: 10.10 W; DPFP: 8.78 W) b Averaged between data types (Int8: 7.52 W; Int16: 6.05 W; Int32: 3.23 W; SPFP: 1.60 W; DPFP: 1.14 W) Table A 2. Performance variati ons in space grade CPUs, DSPs, and FPGAs Processor CD (GOPS) for 100% additions CD (GOPS) for 50% each CD (GOPS) for 100% multiplications Int8 Int16 Int32 SPFP DPFP Int8 Int16 Int32 SPFP DPFP Int8 Int16 Int32 SPFP DPFP GR740 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 RAD5545 3.73 3.73 3.73 1.86 1.86 3.73 3.73 3.73 1.86 1.86 1.86 1.86 1.86 1.86 1.86 Maestro 101.92 50.96 25.48 12.74 12.74 11.99 11.32 10.19 12.74 12.74 6.37 6.37 6.37 6.37 6.37 RC64 153.60 153.60 76.80 19.20 0.00 102.40 102.40 30.72 19.20 0.00 76.80 76.80 19.20 19.20 0.00 RADSPEED 35.42 17.71 8.85 35.42 35.42 14.17 11.81 6.44 70.83 35.42 8.85 8.85 5.06 35.42 35.42 Virtex 5QV 2722.86 988.42 413.12 47.65 18.72 503.72 214.57 59.67 51.93 14.96 293.83 117.01 33.15 52.47 10.66 RTG4 3766.97 2180.88 1169.23 2.64 1.32 418.32 252.18 18.36 3.12 0.83 238.66 126.09 9.42 3.12 0.42 Table A 3. Metrics data for closest COTS counterpa rts to space grade CPUs, DSPs, and FPGAs Processor CD (GOPS) Power (W) CD/W (GOPS/W) IMB (GB/s) EMB (GB/s) IOB (GB/s) Int8 Int16 Int32 SPFP DPFP Int8 Int16 Int32 SPFP DPFP PowerPC60 3e 0.27 0.27 0.27 0.27 0.13 3.50 0.08 0.08 0.08 0.08 0.04 6.38 0.00 2.13 PowerPC750 0.80 0.80 0.80 0.40 0.40 4.70 0.17 0.17 0.17 0.09 0.09 9.60 3.20 4.00 P5040 17.60 17.60 17.60 8.80 8.80 49.00 0.36 0.36 0.36 0.18 0.18 262.40 25.60 44.22 TILE64 42.16 39.82 35.84 0.00 0.00 23.00 1.83 1.73 1.56 0.00 0.00 537.60 12.80 19.55 CSX700 19.20 16.00 8.73 96.00 48.00 10.00 1.92 1.60 0.87 9.60 4.80 792.00 8.00 20.72 Virtex 5 833.20 416.00 89.97 80.52 17.43 13.61 a 52.50 24.72 6.39 6.06 2.18 2413.80 21.33 121.58 a Averaged between data types ( Int8: 15.87 W; Int16: 16.83 W; Int32: 14.07 W; SPFP: 13.28 W; DPFP: 8.00 W )

PAGE 63

63 Table A 4. Radiation hardening outcom es for space grade CPUs, DSPs, and FPGAs Processor Space grade Closest COTS counterpart Percentages achiev ed by space grade Operating frequency (M Hz) Cores/units Power (W) Operating frequency (M Hz) Cores/units Power (W) Operating frequency (%) Cores/units (%) Power (%) HXRHPPC 80 .00 1 7.60 266 .00 1 3.50 30.08 100.00 217.14 RAD750 133 .00 1 5.00 400 .00 1 4.70 33.25 100.00 106.38 RAD5545 466 .00 4 20.00 2 200 .00 4 49.00 21.18 100.00 40.82 Maestro 260 .00 49 22.20 700 .00 64 23.00 37.14 76.56 96.52 RADSPEED 233 .00 76 15.00 250 .00 96 10.00 93.20 79.17 150.00 Virtex 5QV 226.47 a 695 a 9.97 a 304.99 b 820 b 13.61 b 74.25 84.79 73.22 a Avera ged between data types (Int8: 301 .30 M Hz 1672 cores, 11.37 W; Int16: 205.80 M Hz 1043 cores, 9.49 W ; Int32: 215.47 M Hz 276 c ores, 10.09 W; SPFP: 222.57 M Hz 233 cores, 10.10 W; DPFP: 187 .20 M Hz 79 cores, 8.78 W) b Avera ged between data types (Int8: 353 35 M Hz 2358 cores, 15.87 W; Int16: 380.95 M Hz 1092 cores, 16.83 W; Int32: 301.93 M Hz 298 cores, 14.07 W; SPFP: 327 .33 M Hz 246 cores, 13.28 W; DPFP: 161 .39 M Hz 108 cores, 8.00 W) Table A 5. Percentages achieved by space grade CPUs, DSPs, and FPGAs after radiation hardening Processor CD (%) CD/W (%) IMB (%) EMB (%) IOB (%) Int8 Int16 Int32 SPFP DPFP Int8 Int16 Int32 SPFP DPFP HXRHPPC 30.08 30.08 30.08 30.08 30.08 13.85 13.85 13.85 13.85 13.16 30.08 a 7.52 RAD750 33.25 33.25 33.25 33.25 33.25 31.26 31.26 31.26 31.26 31.26 33.25 33.25 39.80 RAD5545 21.18 21.18 21.18 21.18 21.18 51.90 51.90 51.90 51.90 51.90 21.18 50.00 73.46 Maestro 28.44 28.44 28.44 a a 29.46 29.46 29.46 a a 28.44 65.00 77.08 RADSPEED 73.80 73.81 73.77 73.78 73.78 48.96 49.38 49.43 49.19 49.19 74.37 93.25 73.17 Virtex 5QV 60.46 51.58 66.32 64.49 85.83 84.39 91.50 92.55 84.85 78.17 80.00 75.00 89.78 a Not applicable because original value was zero

PAGE 64

64 Table A 6. Metrics data for low power COTS CPUs, DSPs, FPGAs, and GPUs Processor CD (GOPS) Power (W) CD/W (GOPS/W) IMB (GB/s) EMB (GB/s) IOB (GB/s) Int8 Int16 Int32 SPFP DPFP Int8 Int16 Int32 SPFP DPFP Quark X1000 0.40 0.80 0.80 0.40 0.40 2.22 0.18 0.36 0.36 0.18 0.18 3.20 1.60 5.41 Atom Z3770 198.56 105.12 58.40 46.72 23.36 4.00 49.64 26.28 14.60 11.68 5.84 280.32 17.36 18.79 Core i7 4610Y 626.60 348.20 177.00 124.80 62.40 11.50 54.49 30.28 15.39 10.85 5.43 835.20 25.60 64.10 Exynos 5433 461.60 251.90 147.20 121.60 52.40 4.00 115.40 62.98 36.80 30.40 13.10 696.00 13.20 15.02 TILE Gx8036 388.80 216.00 129.60 43.20 43.20 30.00 12.96 7.20 4.32 1.44 1.44 1728.00 12.80 33.86 MSC8256 24.00 24.00 12.00 12.00 6.00 6.04 3.97 3.97 1.99 1.99 0.99 288.00 12.80 17.94 KeyStone II 1459.20 729.60 364.80 198.40 99.20 21.69 67.28 33.64 16.82 9.15 4.57 1270.40 28.80 48.22 Spartan 6Q 590.40 185.10 37.96 21.22 7.86 7.04 a 60.58 22.46 5.95 3.89 1.47 675.36 24.00 57.80 Artix 7Q 1245.00 939.10 163.30 134.20 45.52 14.70 b 75.49 52.65 13.73 8.93 3.72 3598.61 16.00 75.60 Kintex 7Q 2295.00 1696.00 380.60 224.30 91.95 27.41 c 74.28 51.36 18.03 8.72 3.50 6555.27 42.67 184.29 Tegra 3 265.98 137.98 73.98 73.98 25.60 2.00 132.99 68.99 36.99 36.99 12.80 265.60 10.68 16.33 Tegra K1 697.60 440.00 311.20 256.00 44.40 5.00 139.50 88.00 62.20 51.20 19.52 625.60 6.40 33.58 Tegra X1 1152.00 704.00 480.00 384.00 72.00 5.00 230.40 140.80 96.00 76.80 14.40 544.00 25.60 32.16 a Averaged between data types (Int8: 9.75 W; Int16: 8.24 W; Int32: 6.38 W; SPFP: 5.46 W; DPFP: 5.36 W) b Averaged between data types (Int8: 16.49 W; Int16: 17.84 W; Int32: 11.89 W; SPFP: 15.03 W; DPFP: 12.23 W) c Averaged between data types (Int8: 30.90 W; Int16: 33.02 W; Int32: 21.11 W; SPFP: 25.74 W; DPFP: 26.27 W) Table A 7. Metrics data for projected future space grade CPUs, DSPs, FPGAs, and GPUs (worst case). Processor CD (GOPS) CD/W (GOPS/W) IMB (GB/s) EMB (GB/s) IOB (GB/s) Int8 Int16 Int32 SPFP DPFP Int8 Int16 Int32 SPFP DPFP Core i7 4610Y 132.73 73.76 37.49 26.43 13.22 7.55 4.19 2.13 1.50 0.71 176.91 8.51 4.82 Exynos 5433 97.78 53.36 31.18 25.76 11.10 15.98 8.72 5.10 4.21 1.72 147.43 4.39 1.13 KeyStone II 309.09 154.54 77.27 42.02 21.01 9.32 4.66 2.33 1.27 0.60 269.09 9.58 3.63 Kintex 7Q 486.12 359.24 80.62 47.51 19.48 10.29 7.11 2.50 1.21 0.46 1388.52 14.19 13.86 Tegra X1 244.01 149.12 101.67 81.34 15.25 31.91 19.50 13.30 10.64 1.89 115.23 8.51 2.42 Table A 8 Metrics data for projected future space grade CPUs, DSPs, FPGAs, and GPUs ( best case). Processor CD (GOPS) CD/W (GOPS/W) IMB (GB/s) EMB (GB/s) IOB (GB/s) Int8 Int16 Int32 SPFP DPFP Int8 Int16 Int32 SPFP DPFP Core i7 4610Y 462.44 257.02 130.57 92.08 53.56 45.98 27.71 14.24 9.21 4.24 668.16 23.87 57.55 Exynos 5433 340.67 185.93 108.59 89.72 44.97 97.38 57.63 34.06 25.79 10.24 556.80 12.31 13.49 KeyStone II 1076.92 538.54 269.11 146.39 85.14 56.78 30.78 15.57 7.76 3.57 1016.32 26.86 43.29 Kintex 7Q 1693.76 1251.86 280.76 165.50 78.92 62.68 47.00 16.69 7.39 2.74 5244.21 39.79 165.46 Tegra X1 850.20 519.64 354.09 283.33 61.80 194.43 128.84 88.85 65.17 11.26 435.20 23.87 28.87

PAGE 65

65 APPENDIX B BENCHMARKING DATA Table B 1 Parallelization data for space grade CPUs. Processor Threads Matrix multiplication speedup ( ) speedup ( ) Int8 Int16 Int32 SPFP DPFP Int8 Int16 Int32 SPFP DPFP GR740 1 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 GR740 2 1.42 1.38 1.39 1.51 1.50 1.55 1.73 1.71 1.87 1.90 GR740 4 1.58 1.51 1.53 1.67 1.67 2.45 2.81 2.65 3.38 3.26 RAD5545 1 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 RAD5545 2 1.81 1.06 1.99 1.98 1.99 1.93 1.94 1.96 1.79 1.99 RAD5545 4 3.62 2.22 3.98 3.98 3.95 3.71 3.60 3.70 2.95 3.87 RAD5545 8 3.99 2.03 3.89 3.91 3.90 3.32 3.46 3.73 2.84 3.51 RAD5545 16 3.45 2.03 3.90 3.89 3.90 2.83 3.31 3.56 2.68 3.25 HPSC 1 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 HPSC 2 1.35 1.30 0.20 1.43 1.43 1.62 1.60 1.95 1.93 1.92 HPSC 4 2.63 2.58 0.28 2.86 2.75 2.23 2.53 3.67 3.81 3.81 HPSC 8 5.20 5.01 0.32 5.52 5.47 2.80 2.33 7.13 7.43 7.34 HPSC 16 5.15 5.06 0.32 5.59 5.57 2.01 1.71 2.85 4.90 4.89 HPSC 32 5.22 5.07 1.73 5.61 5.62 1.73 1.54 2.40 4.48 4.41 Maestro 1 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 Maestro 2 1.84 1.91 1.93 1.88 1.79 1.49 1.14 1.65 1.51 1.22 Maestro 4 1.81 2.19 2.75 2.81 3.05 1.74 0.96 1.54 1.48 1.21 Maestro 8 3.07 3.53 3.76 4.18 4.57 1.53 0.67 1.61 1.63 1.19 Maestro 16 4.82 5.06 5.10 5.10 7.89 1.01 0.47 1.77 1.96 1.39 Maestro 32 7.80 8.12 8.01 8 .0 7 9.92 0.07 0.04 0.47 1.06 0.69 Maestro 45 10.63 11.00 10.82 10.84 12 .2 0 < 0.01 0.00 0.01 0.02 < 0.01

PAGE 66

66 Table B 2 Resource usage data for space grade FPGAs. Processor Resource Matrix multiplication resource u sage (%) resource usage (%) Int8 Int16 Int32 SPFP DPFP Int8 Int16 Int32 SPFP DPFP Virtex 5QV a MACs 80.00 80.00 80.00 30.00 55.00 23.13 14.38 6.25 60.00 34.69 Virtex 5QV a LUTs 34.49 42.68 20.67 42.14 30.21 40.07 57.93 63.01 63.80 50.11 Virtex 5QV a FFs 34.79 47.26 25.81 46.75 36.47 49.09 75.58 75.36 80.48 63.89 Virtex 5QV a BRAMs 52.68 51.68 21.48 10.74 10.74 100.00 65.10 14.09 56.38 32.55 RTG4 b MACs 27.71 27.71 41.56 55.41 55.41 14.72 14.72 19.05 15.15 20.78 RTG4 b LUTs 25.83 30.75 26.72 81.72 30.85 37.42 69.29 62.57 67.23 33.72 RTG4 b FFs 18.27 24.23 24.33 48.09 18.70 37.42 90.60 94.28 60.02 26.55 RTG4 b BRAMs 61.24 61.24 61.24 61.24 30.62 98.56 98.56 32.54 77.03 44.50 a Total available resources are 320 MACs, 81920 LUTs, 81920 FFs, and 298 BRAMs b Total available resources are 462 MACs, 151824 LUTs, 151824 FFs, and 209 BRAMs Table B 3 Benchmarking data for matrix multiplication on space grade CPUs and FPGAs Processor P erformance (M OPS) Power (W) P ower efficiency (M OPS /W ) Int8 Int16 Int32 SPFP DPFP Int8 Int16 Int32 SPFP DPFP GR740 17.30 14.78 11.08 15.02 15.01 1.50 11.53 9.85 7.39 10.01 10.01 RAD5545 75.03 24.10 11.70 361.96 565.41 20.00 3.75 1.21 0.58 18.10 28.27 HPSC 3041.25 1503.63 468.10 1243.98 627 .2 5 10.00 304.12 150.36 46.81 124.40 62.73 Maestro 63.21 60.75 58.55 57.18 50.33 22.2 0 2.8 5 2.74 2.64 2.5 8 2.2 7 Virtex 5QV 46692.03 40957.92 3537.45 7943.51 3528.65 3.25 a 13996.41 12018.17 1187.86 2385.44 1099.27 RTG4 13653.13 13653.13 6023.50 4818.80 630.15 0.70 b 20256.87 17662.52 7538.80 5682.54 1507.55 a Averaged between data types (Int8: 3.34 W; Int16: 3.41 W; Int32: 2.98 W; SPFP: 3.33 W; DPFP: 3.21 W) b Averaged between data types (Int8: 0.67 W; Int16: 0.77 W; Int32: 0.80 W; SPFP: 0.85 W; DPFP: 0.42 W ). Table B 4 grade CPUs and FPGAs Processor Performance ( MSOLS ) Power (W) Power efficiency (MSOLS/W ) Int8 Int16 Int32 SPFP DPFP Int8 Int16 Int32 SPFP DPFP GR740 0.83 0.79 0.80 0.43 0.41 1.50 0.55 0.53 0.53 0.29 0.28 RAD5545 0.86 0.19 0.21 0.47 0.70 20.00 0.04 0.01 0.01 0.02 0.04 HPSC 1.85 1.41 4.22 1.26 1.25 10.00 0.18 0.14 0.42 0.13 0.13 Maestro 1.75 2.65 0.35 0.13 0.11 22.2 0 0.08 0.12 0.02 0.01 0.01 Virtex 5QV 991.93 418.70 81.09 161.14 58.88 3.89 a 262.83 104.94 20.17 41.53 15.45 RTG4 302.29 194.65 68.46 26.75 11.99 0.73 b 661.48 235.94 78.60 40.59 14.19 a Averaged between data types (Int8: 3.77 W; Int16: 3.99 W; Int32: 4.02 W; SPFP: 3.88 W; DPFP: 3.81 W ) b Averaged between data types (Int8: 0.46 W; Int16: 0.83 W; Int32: 0.87 W; SPFP: 0.66 W; DPFP: 0.85 W )

PAGE 67

67 Table B 5 Performance data for additional benchmarks on space grade CPUs Processor Benchmark Perfo rmance (M OPS MSOLS, or MT/s ) Int8 Int16 Int32 SPFP DPFP GR740 Matrix addition a 52.37 37.43 21.40 30.62 15.96 GR740 Matrix convolution a 46.17 52.34 48.05 59.04 56.17 GR740 Matrix transpose b 5.98 4.82 4.78 6.74 6.28 GR740 Clohessy Wiltshire equations c d d d 10.86 5.40 RAD5545 Matrix addition a 136.53 96.41 48.14 46.13 36.61 RAD5545 Matrix convolution a 149.11 98.33 126.72 126.10 117.90 RAD5545 Matrix transpose b 9.69 8.66 7.96 8.26 7.81 RAD5545 Clohessy Wiltshire equations c d d d 17.07 17.42 HPSC Matrix addition a 1508.90 792.27 411.23 413.79 213.34 HPSC Matrix convolution a 459.87 456.67 461.30 408.74 1162.32 HPSC Matrix transpose b 143.96 68.40 33.69 33.48 53.09 HPSC Clohessy Wiltshire equations c d d d 21.67 14.11 Maestro Matrix addition a 7.85 8.11 4.82 5.07 3.46 Maestro Matrix convolution a 81.54 67.59 67.91 65.57 44.19 Maestro Matrix transpose b 4.55 4.23 3.62 3.64 2.64 Maestro Clohessy Wiltshire equations c d d d 0.76 0.47 a Performance reported in MOPS b Performance reported in MT/s c Performance reported in MSOLS d Integer data types not applicable for this benchmark

PAGE 68

68 LIST OF REFERENCES Comparative Analysis of Present and Future Space Grade Processors with Device Metrics AIAA Journal of Aerospace Information Systems (JAIS) Vol. 14, No. 3, March 2017, pp. 184 197 doi: 10.2514/1.I010472 Proceedings of the Government Microcircuit Applications and Critical Technology Conferenc e (GOMACTech) Defense Technical Information Center, Ft. Belvoir, VA, March 2015, pp. 1 27. [3] Lovelly, T. M., Cheng, K., Garcia, W., Bryan, D., Wise, T., and George, A. D., Generation Architectures for Proceedings of the 7th Workshop on Fault Tolerant Spaceborne Computing Employing New Technologies Sandia National Laboratories, Albuquerque, NM, June 2014, pp. 1 27. Proceedings of the Military and Aerospace Programmable Logic Devices Conference (MAPLD) SEE Symposium, San Die go, CA, May 2014, pp. 1 24, http://www.chrec.org/chrec pubs/Lovelly_ MAPLD14.pdf [retrieved Nov. 2017]. [5] Lovelly, T. M., Bryan, D., Cheng, K., Kreynin, R., George, A. D., Gordon Ross, A., or Next Generation On Proceedings of the IEEE Aerospace Conference (AERO) IEEE, Piscataway, NJ, March 2014, Paper 2440. doi:10.1109/AERO.2014.6836387 [6] Some, R., Doyle, R., Bergman, L., Whitaker, W., Powell, W., Johnson, M., Gofo rth, Performance AIAA Infotech@Aerospace Conference (I@A) AIAA Paper 2013 4729, Aug. 2013. doi:10.2514/6.2013 4729 [7] Doyle, R., Some, R., Powell, W., Mounce, G., Go forth, M., Horan, S., and Lowry, M., Generation Space Processor: A Joint Proceedings of the International Symposium on Artificial Intelligence, Robotics, and Automation in Space (i SAIRAS) Canadian Space Agency, Montreal, June 2014, http://robotics.estec.esa.int/i SAIRAS/isairas2014/Data/ Plenaries/ ISAIRAS_FinalPaper_0153.pdf [retrieved Nov. 2017]. Chiplet B ased Approach for Heterogeneous Processing and Packaging Architectures, Proceedings of the IEEE Aerospace Conference (AERO) IEEE, Piscataway, NJ, March 2016. doi: 10.1109/AERO.2016.7500830

PAGE 69

69 [9] Lovelly, T. M., Wise, T. W., Holtzman, S. H., and George, A. Benchmarking Analysis of Space Grade Central Processing Units and Field Programmable Gate Arrays AIAA Journal of Aerospace Information Systems (JAIS) submitted Nov. 2017 [10] Space Proceedings of the 9th Workshop on Fault Tolerant Spaceborne Computing Employing New Technologies Sandia National Laborator ies, Albuquerque, NM, June 2016, pp. 1 15. [11] Proceedings of the IEEE Aerospace Conference (AERO) IEEE, Piscataway, NJ, March 2016. doi:10.1109/AERO.2016.7500866 [12] Williams, J., Massie, C., George, A. D., Richardson, J., Gosrani, K., and Lam, H., Multicore Devices for Application ACM Transactio ns on Reconfigurable Technology and Systems (TRETS) Vol. 3, No. 4, Nov. 2010, pp. 1 29. doi:10.1145/1862648 [13] Richardson, J., Fingulin, S., Raghunathan, D., Massie, C., George, A. D., and Lam, omputation, Memory, I/O, Proceedings of the High Performance Reconfigurable Computing Technology and Applications Workshop (HPRCTA) at ACM/IEEE Supercomputing Conference (SC) CFP1015F ART, IEEE, Piscataway, NJ, Nov. 2010. doi:10.1109/HPRCTA.20 10.5670797 Analysis of Fixed, Reconfigurable, and Hybrid Devices with Computational, Memory, I/O, & Realizable Utilization Metrics ACM Transactions on Reconfigurable Technology and Systems (TRETS) Vol. 10, No. 1, Dec. 2016, pp. 1 21. doi: 10.1145/2888401 [15] Wulf, N., George, A. D., and Gordon Optimizing FPGA ACM Transactions on Reconfigurable Technology and Systems (TR ETS) Vol. 10, No. 1, Dec. 2016, pp. 1 29. doi: 10.1145/2888400 [16 ] Proceedings of the Military and Aerospace Programmable Logic D evices Conference (MAPLD) SEE Symposium, San Diego, CA, April 2013, pp. 1 13, http://www.chrec.org/chrec pubs/Wulf_MAPLD13.pdf [retrieved Nov 2017]. [17 IEEE Transactions on Nu clear Science (TNS) Vol. 55, No. 4, Aug. 2008, pp. 1810 1832. doi:10.1109/TNS.2008.2001409

PAGE 70

70 [18 IEEE Transactions on Nuclear Science (TNS) Vol. 50, No. 3, June 2003, pp. 466 482. doi:10.1109/TNS.2003.813131 [19 ] Schwank, J. R., Ferlet Cavrois, V., Shaneyfelt, M. R., Paillet, P., and Dodd, P. E., IEEE Transactions on Nuclear Science (TNS) Vol. 50, No 3, June 2003, pp. 522 538. doi:10.1109/TNS.2003.812930 [20 ] Gaillardin, M., Raine, M., Paillet, P., Martinez, M., Marcandella, C., Girard, S., in Advanced SOI Devices : New Insights into Total Ionizing Dose and Single Event Proceedings of the IEEE SOI 3D Subthreshold Microelectronics Technology Unified Conference (S3S) CFP13SOI USB, IEEE, Piscataway, NJ, Oct. 2013. doi:10.1109/S3S.2013.6716530 [21 IEEE Design and Test of Computers (D&T) Vol. 22, No. 3, June 2005, pp. 258 266. doi:10.1109/MDT.2005.69 [22 ] Ballast, J., Amort, T., Cabanas Holmen, M., Cannon, E. H., Brees, R., Neathery, C., F Proceedings of the IEEE Aerospace Conference (AERO) IEEE, Piscataway, NJ, March 2015. doi:10.1109/AERO.2015.7119216 [23 ] Makihara, A., Midorikawa, M., Yama guchi, T., Iide, Y., Yokose, T., Tsuchiya, Y., By IEEE Transactions on Nuclear Science (TNS) Vol. 52, No. 6, Dec. 2005, pp. 2524 2530. doi:10.1109/TNS.2005.860716 [24 of Hardness By Design Methodology to Radiation Tolerant ASIC Tech IEEE Transactions on Nuclear Science (TNS) Vol. 47, No. 6, Dec. 2000, pp. 2334 2341. doi:10.1109/23.903774 [25 ] Glein, R., Rittner, F., Becher, A., Ziener, D., Frickel, J., Teich, J., and Heuberger, Grade vs. COTS SRAM Based FPGA in N Modular NASA/ESA Adaptive Hardware and Systems Conference (AHS) CFP1563A ART, IEEE, Piscataway, NJ, June 2015. doi:10.1109/AHS.2015.7231159 [26 Proceedings of the Data Systems i n Aerospace Conference (DASIA) AeroSpace and Defence Industries Association of Europe, ESA SP 701, Dubrovnik, Croatia, May 2012, http://www.ramon chips.com/ papers/SurveySpaceProcessors DASIA2012 paper.pdf [retrieved Nov. 2017].

PAGE 71

71 [27 ] ASA Selects High Performance Spaceflight Computing (HPSC) 2017, https://www.nasa.gov/press release/goddard/2017/nasa selects high performance spaceflight computing hpsc processor contractor [retrieved Nov. 2017] [28 ] Synopsis/Solicitation, Solicitation Number NNG16574410R, NASA Goddard Space Flight Center, Greenbelt, MD, July 2016, https://www.fbo.gov/ind ex?s=opportunity& mode=form&id=eefe806f639ae00527a13da6b73b3001 [retrieved Nov. 2017] [29] Core Processors to Improve the Proceedings of the IEEE Aerospace Conference (AERO) IEEE, Piscataway, NJ, March 2011. doi:10.1109/AERO.2011.5747445 [30] Quinn, H., Robinson, W. H., Rech, P., Aguirre, M., Barnard, A., Desogus, M., Entrena, L., Garcia Valderas, M., Guertin, S. M., Kaeli, D., Kastensmidt, F. L., Kiddie, B. T., Sanchez Clem on Nuclear Science (TNS), Vol. 62, No. 6, Dec. 2015, pp. 2547 2554. doi:10.1109/TNS.2015.2498313 [31] Marshall AIAA Infotech@Aerospace Conference (I@A) AIAA Paper 2011 1620, March 2011. doi:10.2514/6.2011 1620 [32] Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI) IEEE, Piscataway, NJ, April 2 008. doi:10.1109/ISVLSI.2008.25 [33] Asanovic, K., Bodik, R., Catanzaro, B. C., Gebis, J. J., Husbands, P., Keutzer, K., Landscape of Parallel Computing Research: A View No. UCB/EECS 2006 183, University of California, Berkeley, Dec. 2006, https://www2 eecs.berkeley.edu/Pubs/TechRpts/2006/EECS 2006 183.pdf [retrieved Nov. 2017]. [34 ] Manakul, K., Siripongwutikorn, P., See, S., and Achalak Proceedings of the IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS) IEEE, Piscataway, NJ, Dec. 2012. doi: 10.1109/ICPADS.2012.126 [35] Phillips, S. C., Engen, V., and P Proceedings of the IEEE Third International Conference on Cloud Computing Technology and Science (CloudCom) IEEE, Piscataway, NJ, Nov. Dec. 2011. doi: 10.1109/CloudCom.2011.114

PAGE 72

72 [36 Numerical and Symbolic Scientific Computing, Texts and Monographs in Symbolic Computation (A Series of the Research Institute for Symbolic Computation, Johannes Kepler University, Linz, Austria) Vol. 1, 2012, pp. 9 5 104. doi: 10.1007/978 3 7091 0794 2_5 [37 P5040RM, Rev. 3, Freescale Semiconductor, Austin, TX, June 2016, pp. 65 76, 128 138. Semiconductor, Austin, TX, May 2014, http://cache.freescale.com/files/32bit/doc/ data_sheet/P5040.pdf [retrieved Nov. 2017]. [39 reescale Semiconductor, Austin, TX, March 2013, pp. 36 46, 151 152, 162, 246 250, 274, https://www nxp.com/ webapp/ Download?colCode=E5500RM [retrieved Nov. 2017]. [40 San J ose, CA, Aug. 2015, https://www.xilinx.com/support/documentation/data_sheets/ ds100.pdf [retrieved Nov. 2017]. [41 5, Virtex CA, Oct. 2012, https://www.xilinx.com/products/technolog y/power/xpe.html [retrieved Nov. 2017]. [42 San Jose, CA, March 2011, https://www.xilinx.com/support/documentation/ ip_documentation/ cordic_ds249.pdf [retrieved Nov. 2017]. [4 3 Xilinx Inc., San Jose, CA, March 2011, https://www.xilinx.com/support/documentation/ ip_documentation/floating_point_ds335.pdf [retrieved Nov. 2017]. [4 4] CPCI LEON4 N2X D evelopmen Cobham/Gaisler, Gteborg, Sweden, Aug. 2013, pp. 1 57, http://www.gaisler.com/doc/ GR CPCI LEON4 N2X_UM.pdf [retrieved Nov. 2017] [45 UM DS, Ver. 1.5, Cobham/Gaisle r, Gteborg, Sweden, Nov. 2016, http://www.gaisler.com/doc/gr740/ GR740 UM DS.pdf [retrieved Nov. 2017]. [46 ] Hjorth, M., Aberg, M., Wessman, N., Andersson, J., Chevallier, R., Forsyth, R., Hard Quadcore LEON4FT Sy stem on Proceedings of the Data Systems in Aerospace Conference (DASIA) AeroSpace and Defence Industries Association of Europe, ESA SP 732, Barcelona, Spain, May 2015, http://microelectronics.esa.int/gr740/DASIA2015 GR740 Hjorth.pdf [retrieved Nov. 2017].

PAGE 73

73 [47 Proceedings of Data Systems in Aerospace Conference (DASIA) AeroSpace and Defence Industries Association of Europe, ESA SP 701, Dubrovnik, Croatia, May 2012, http://microelectronics.esa.int/ gr740/NGMP NGFP DASIA 2012 Paper.pdf [retrieved Nov. 2017]. [48 ] Semiconductor, Austin, TX, July 2013, pp. 1 111. [49 ] Berger, R., Chadwic k, S., Chan, E., Ferguson, R., Fleming, P., Gilliam, J., Graziano, M., Hanley, M., Kelly, A., Lassa, M., Li, B., Lapihuska, R., Marshall, J., Miller, H., Moser, D., Pirkl, D., Rickard, D., Ross, J., Saari, B., Stanley, D., and Stevenson, J. Hardened System On Chip Power Architecture Proceedings of the IEEE Aerospace Conference (AERO) IEEE, Piscataway, NJ, March 2015. doi:10.1109/AERO.2015.7119114 [50] Shenzhen LeMaker Technology Co. Ltd., Shenzhen, China, Dec. 2015, pp. 1 22, https://github.com/96boards/documentation/blob/master/ConsumerEdition/HiKey/ AdditionalDocs/HiKey_Hardware_User_Manual_Rev0.2.pdf [retrieved Nov. 2017] [51 ] 10th Workshop on Fault Tolerant Spaceborne Computing Employing New Technologies Sandia National Laboratories, Albuquerque, NM, May June 2017. [52 ] Rogers, C. M., Bar A 49 Proceedings of the IEEE Aerospace Conference (AERO) IEEE, Piscataway, NJ, March 2016. doi:10.1109/AERO.2016.7500626 [53 ] Suh, J., Kang, D. Proceedings of the IEEE Aerospace Conference (AERO) CFP13AACCDR, IEEE, Piscataway, NJ, March 2013. doi:10.1109/AERO.2013.6496949 [54 ] Villalpando, C., Rennels, D., Some, R., and Cabanas Proceedings of the IEEE Aerospace Conference (AERO) IEEE, Piscataway, NJ, March 2011. doi:10.1109/AERO.2011.5747447 [55 ] Inc., San Jose, CA, June 2011, https://www.xilinx.com/support/documentation/ boards_and_kits/ug356.pdf [retrieved Nov. 2017] [56 Hardened, Space Grade Virtex 5QV Family Xilinx Inc., San Jose, CA, Apr. 2017, https://www.xilinx.com/support/documentation/ data_sheets/ds192_V5QV_Device_Overview.pdf [retrieved Nov. 2017].

PAGE 74

74 [57 ] ., Aliso Viejo, CA, July 2017, https://www.microsemi.com/document portal/doc_download/ 135213 ug0617 rtg4 fpga development kit user guide [retrieved Nov. 2017] [58 CA, July 2017 http://www.microsemi.com/document portal/doc_view/ 134430 pb0051 rtg4 fpgas product brief [retrieved Nov. 2017]. [59 ] Blackford, L. S., Demmel, J., Dongarra, J., Duff, I., Hammarling, S., Henry, G., Heroux, M., Kaufman, L., Lumsdaine, A., Petitet, A., Poz o, R., Remington, K., and ACM Transactions on Mathematical Software (TOMS) Vol. 28, No. 2, June 2002, pp. 135 151. doi:10.1145/567806.567807 [60 ] Whaley, R. C., and Petitet, A., Wiley Journal of Software: Practice and Experience Vol. 35, No. 2, Feb. 2005, pp. 101 121. doi:10.1002/spe.626 [61 ] Standard API for Shared IEEE Computational Science and Engineering Vol. 5, No. 1, Jan. March 1998, pp. 46 55. doi:10.1109/99.660313 [62 ] 2016, pp. 1 348, https://static.docs.arm.com/ihi0073/b/IHI0073B_arm_neon_intrinsics _ref.pdf [retrieved Nov. 2017]. [63 ] Proceedings of the 10th Wor kshop on Fault Tolerant Spaceborne Computing Employing New Technologies Sandia National Laboratories, Albuquerque, NM, June 2017, pp. 1 15. [64 ] UG227, Release 2.1.0.98943, Tiler a Corp., San Jose, CA, April 2010. [65 ] Xilinx Inc., San Jose, CA, April 2014, https://www.xilinx.com/support/documentation/ ip_documentation/floating_point/v7_0/pg060 floating p oint.pdf [retrieved Nov. 2017] [66 Inc., San Jose, CA, June 2011, https://www.xilinx.com/support/documentation/ ip_documentation/ div_gen/v4_0/ds819_div_gen.pdf [retrieved Nov. 2017]. [67 2015, http://www.actel.com/ipdocs/CoreCORDIC_HB.pdf [retrieved Nov. 2017]. [68 ] Amsterdam, Netherlands, Feb. 2012, http://opencores .org/project,fpuvhdl [retrieved Nov. 2017].

PAGE 75

75 [69 Plymouth, MN, Aug. 2008, https://aerocontent.honeywell.com/aero/common/documents/ myaerospacecatalog documents/Space documents/ HXRHPPC_Processor.pdf [retrieved Nov. 2017]. [70 ] Berger, R. W., Bayles, D., Brown, R., Doyle, S., Kazemzadeh, A., Knowles, K., Moser, D., Rodgers, J., Saari, B A Radiation Hardened Proceedings of the IEEE Aerospace Conference (AERO) IEEE, Piscataway, NJ, March 2001. doi:10.1109/AERO.2001.931184 [71 Core LEON3 GR712RC DS, Ver. 2.3, Cobham/Gaisler, Gteborg, Sweden, Jan. 2016, http://www.gaisler.com/ j25/doc/ gr712rc datasheet.pdf [retrieved Nov. 2017]. [72 FT SPARC V8 Processor LEON3FT RTAX, Ver. 1.9, Cobham/Gaisler, Gteborg, Sweden, Jan. 2013, pp. 1 2, 15 17, 28, 32 33, http://www.gaisler.com/doc/leon3ft rtax ag.pdf [retrieved Nov. 2017]. [73 Rad Proceed ings of the IEEE Aerospace Conference (AERO) IEEE, Piscataway, NJ, March 2016. doi:10.1109/AERO.2016.7500697 [74 ] Ginosar, R., Aviely, P., Gellis, H., Liran, T., Israeli, T., Nesher, R., Lange, F., Hard Ma ny Core High Performance Proceedings of the Data Systems in Aerospace Conference (DASIA) AeroSpace and Defence Industries Association of Europe, ESA SP 732, Barcelona, Spain, May 2015, http://www.ramon chips.com/papers/ D ASIA2015 RC64 paper.pdf [retrieved Nov. 2017]. [75 Rad Hard Manycore with FPGA Extension for Telecomm Satellites and Other Space Proceedings of the Military a nd Aerospace Programmable Logic Devices Conference (MAPLD) SEE Symposium, San Diego, CA, May 2015, pp. 1 4, http:// www.ramon chips.com/papers/MAPLD2015 RC64 paper.pdf [retrieved Nov. 2017]. [76 rs to Move to Its Ceva X and Ceva linleygroup.com/newsletters/newsletter_detail.php?num=3993 [retrieved Nov. 2017]. [77 EE Times, 27 Aug. 2008, http://www.eetimes.com/document.asp?doc_id=1275609 [retrieved Nov. 2017].

PAGE 76

76 [78 Performance Tiled Rad Hard Digital Signal Processor AIAA Infotech@Aerospace Conference (I@A) AIAA Paper 2013 4728, Aug. 2013. doi:10.2514/6.2013 4728 [79 ] Marshall, J., Berger, R., Bear, M., Hollinden, L., Robertson, J., and Rickard, D., Hard Digital Signal Processor to Spaceborne Proceedings of the IEEE Aerospace Conference (AERO) IEEE, Piscataway, NJ March 2012. doi:10.1109/AERO.2012.6187229 [80 May 2016, http://www.microsemi.com/document portal/doc_download/ 135193 ds0131 rtg4 fpga datasheet [retrieved Nov. 2017]. [81 Feb. 2016, https://www.microsemi.com/document portal/doc_download/ 134921 rtg4 power estimator [retrieved Nov. 2017]. [82 Microsemi Corp., Aliso Viejo, CA, June 2015, http://microsemi.com/document portal/ doc_download/135182 smartfusion2 hard multiplier addsub configuration guide [retrieved Nov. 2017]. [8 MathWorld A Wolfra m Web Resource Oct 2017, http://mathworld.wolfram.com/MatrixAddition.html [retrieved Nov. 2017]. [8 MathWorld A Wolfram Web Resource Oct 2017, http://mathworld.wolfram.com/FastFourierTransform.html [retrieved Nov. 2017]. [8 MathWorld A Wolfram Web Resource Oct 2017, http://mathworld.wolfram.com/MatrixMultiplication.html [retrieved Nov. 2017]. [8 MathWorld A Wolfram Web Resource Oct 2017, http://mathworld.wolfram.com/Convolution.html [retrieved Nov. 2017]. [8 MathWorld A Wolfram Web Resource Oct 2017, http://mathworld.wolfram.com/JacobiTransformation.html [r etrieved Nov. 2017]. [8 MathWorld A Wolfram Web Resource Oct 2017, http://mathworld.wolfram.com/KroneckerProduct.html [retrieved Nov. 2017]. [89 MPC603/D, Rev. 3, Freescale Semiconductor, Austin, TX, June 1994, http://www.nxp. com/assets/ documents/data/en/data sheets/MPC603.pdf [retrieved Nov. 2017].

PAGE 77

77 [90 Rev. 1, Freescale Semiconductor, Austin, TX, Dec. 2001, http://www.nxp.com/ assets/ documents/data/en/reference manuals/MPC750UM.pdf [retrieved Nov. 2017]. [91 Microelectronics Division, Hopewell Junction, NY, Sept. 2002, http://datasheets chipdb.org/IBM/PowerPC/7xx/PowerPC 740 750.pdf [retrieved Nov. 2017]. [92 Bit Micropro MPC60XBUSRM, Rev. 0.1, Freescale Semiconductor, Austin, TX, Jan. 2004, http:// www.nxp.com/assets/documents/data/en/reference manuals/MPC60XBUSRM.pdf [retrieved Nov. 2017]. [93 Release 1.2, Tilera Corp., San Jose, CA, Feb. 2013, http://www.mellanox.com/repository/ solutions/tile scm/docs/UG120 Architecture Overview TILEPro.pdf [retrieved Nov. 2017]. [94 ., San Jose, CA, Nov. 2011, http://www.mellanox.com/repository/solutions/tile scm/docs/ UG101 User Architecture Reference.pdf [retrieved Nov. 2017]. [95 PD 1425, Rev. 1E, ClearSpeed Technology, Ltd., B ristol, England, U.K., Jan. 2011. [96 RM 1137, Rev. 4B, ClearSpeed Technology, Ltd., Bristol, England, U.K., Nov. 2009. [97 329678 002US, Intel Corp., Santa Clara, CA, April 2014, pp. 12 24, 28, http://www intel.com/content/ dam/support/us/en/documents/processors/quark/sb/329678_ intelquarkcore_hwrefman_002.pdf [retrieved Nov. 2017]. [98 001US, Intel Corp., Santa Clara, CA, Oct. 2013, pp. 21 23, 114, 252 290, http://www.intel.com/ content/dam/ support/us/en /documents/processors/quark/sb/ intelquarkcore_ devman_001.pdf [retrieved Nov. 2017]. [99 329474 003, Rev. 003, Intel Corp., Santa Clara, CA, Dec. 2014, http://www.intel.com/ content/dam/ www/public/us/en/documents/datasheets/atom z36xxx z37xxx datasheet vol 1.pdf [retrieved Nov. 2017]. [100 Sept. 2013, http://ark.intel.com/products/76760 [retrieved Nov. 2017].

PAGE 78

78 [101 325462 060US, Combined Vols. 1, 2A 2D, and 3A 3D, Intel Corp., Santa Clara, CA, Sept. 2016, pp. 113 146, http://www.intel.com/content/dam/www/public/us/en/ documents/manuals/64 ia 32 architectures software developermanual 325462.pdf [retrieved Nov. 2017]. [102 248966 033, Intel Corp., Santa Clara, CA, June 2016, pp. 36 43, 62 69, http://www intel.com / content/dam/www/public/us/en/documents/manuals/64 ia 32 architectures optimization manual.pdf [retrieved Nov. 2017]. [103 Processor Family, and Desktop Intel Celeron Processor Fam Number 328897 010, Vol. 1, Intel Corp., Santa Clara, CA, March 2015, http://www intel.com/content/ dam/www/public/us/en/documents/datasheets/4th gen core family desktop vol 1 datasheet.pdf [retrieved Nov. 2017]. [104 46 Sept. 2013, http://ark.intel.com/products/76618 [retrieved Nov. 2017]. [105 ] Frumusanu, A., and Smith, R., Samsung eles, CA, Feb. 2015, http://www.anandtech.com/show/8718 [retrieved Nov. 2017]. [106 Rev. r1p3, ARM Holdings, San Jose, CA, Feb. 2016, pp. 24 27, 509 521, http:// infocenter.arm.com/h elp/index.jsp?topic=/com.arm.doc.ddi0488h [retrieved Nov. 2017]. [107 Rev. r0p2, ARM Holdings, San Jose, CA, Feb. 2014, pp. 13 19, 24 29, http:// infocenter.arm.com/help/index.jsp?to pic=/com.arm.doc.ddi0500d [retrieved Nov. 2017]. [108 UG404, Release 1.12, Tilera Corp., San Jose, CA, Oct. 2014, http://www.mellanox.com/ repository/solutions/tile scm/docs/UG40 4 IO Device Guide.pdf [retrieved Nov. 2017]. [109 Jose, CA, Feb. 2013, http://www.mellanox.com/repository/solutions/tile scm/docs/ UG401 ISA.pdf [retrieved Nov. 2017]. [110 le Processor Architecture Overview for the TILE 1.1, Tilera Corp., San Jose, CA, May 2012, http://www.mellanox.com/repository/ solutions/tile scm/docs/UG130 ArchOverview TILE Gx.pdf [retrieved Nov. 2017]. [11 Core Freescale Semiconductor, Austin, TX, July 2013, http://www.nxp.com/assets/ documents/ data/en/data sheets/MSC8256.pdf [retrieved Nov. 2017].

PAGE 79

79 [11 MSC8256RM, Rev. 0, Freescale Semiconductor, Austin, TX, July 2011, pp. 44 72, 80 82, http://www.nxp.com/assets/documents/data/en/reference manuals/MSC8256RM.pdf [retrieved Nov. 2017]. [11 SC8256PB, Rev. 1, Freescale Semiconductor, Austin, TX, May 2011, http://www.nxp.com/assets/ documents/ data/en/product briefs/MSC8256PB.pdf [retrieved Nov. 2017]. [114 on Texas Instrumen ts, Dallas, TX, Nov. 2013, http://www.ti.com/lit/ds/symlink/ 66ak2h12.pdf [retrieved Nov. 2017]. [11 5 A and ARMv7 0406C.c, ARM Holdings, San Jose, CA, May 2014, pp. 165 171, 181 191, 261 272, http:/ /infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0406c [retrieved Nov. 2017]. [116 ] ARM Cortex A15 MPCore Processor Technical Reference Manual, DDI 0438I, Rev. r4p0, ARM Holdings, San Jose, CA, June 2013, pp. 12 16, 26 29, http:// infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0438i [retrieved Nov. 2017]. [11 Grade Spartan Ver. 1.1, Xilinx, Inc., San Jose, CA, May 2014, https://www.xilinx.com/support/ documentation/ data_sheets/ds172_S6Q_Overview.pdf [retrieved Nov. 2017]. [11 3A, Spartan San Jose, CA, Oct. 2012, https://www.xilinx.com/products/technology/power/xpe.html [retrieved No v. 2017] [11 Ver. 1.2, Xilinx, Inc., San Jose, CA, July 2015, https://www.xilinx.com/support/ documentation/ data_sheets/ds185 7SeriesQ Overview.pdf [retrieved Nov. 2017]. [120 Xilinx Power Estimator: Artix 7, Kintex 7, Virtex 7, Zynq Xilinx, Inc., San Jose, CA, April 2014, https://www.xilinx.com/products/technology/ power/ xpe.html [retrieved Nov. 2017]. [121 Reference Manual DP 05644 001_v01p, NVIDIA Corp., Santa Clara, CA, Jan. 2012. [122 San Jose, CA, April 2010, http://infocenter.arm.com/help/index.jsp?topic = / com.arm.doc.ddi0388f [retrieved Nov. 2017]. [12 v1.1, Santa Clara, CA, Jan. 2014, http://www.nvidia.com/content/pdf/tegra_white_ papers/tegra K1 whitepaper.pdf [retrieved Nov. 20 17].

PAGE 80

80 [12 v1.0, Santa Clara, CA, Jan. 2015, http://international.download.nvidia.com/pdf/tegra/ Tegra X1 whitepaper v1.0.pdf [retrieved Nov. 2017]. [12 5] Martn del Campo, G., Reigb Proceedings of the 11th European Conference on Synthetic Aperture Radar (EUSAR) VDE, Frankfurt, Germany, June 2016. [12 6] Sun X., Abshire, J. B., McGarry, J. F., Neumann, G. A., Smith, J. C., Cavanaugh, J. F., Harding, D. J., Zwally, H. J., Smith, D. E., and Zuber, M. T. Developed at the N ASA Goddard Space Flight Center: IEEE Journal of Select ed Topics in Applied Earth Observations and Remote Sensing (J STARS) Vol 6, No. 3, June 2013, pp. 1660 1675. doi:10.1109/JSTARS.2013.2259578 [12 7] IEEE Sensors Journal Vol. 16, No. 12, June 2016, pp. 4866 4881. doi:10.1109/JSEN.2016.2549860 [ 12 8] Up Robust Features Elsevier Journal of Computer Vision and Image Understanding Vol. 110, No. 3, June 2008, pp. 346 359. doi:10.1016/j.cviu.2007.09.014 [12 9] Plaza, A., Qian, D., Ya ng IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing ( J STARS ) V ol. 4, No. 3, Sep 2011, pp. 528 544. doi: 10.1109/JSTARS.2010.2095495 [ 13 Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) IEEE, Piscataway, NJ June 2011. doi: 10.1109/CVPRW.2011.5981842 [13 1] Chang, C., T ime Processing Algorithms for IEEE Transactions on Geoscience and Remote Sensing (TGRS) Vol 39, No. 4, Ap ril 2001, pp. 760 768. doi: 10.1109/36.917889 [13 2] Ho, A., George, A., and Gordon Bit Proceedings of the IEEE Aerospace Conference (AERO) IEEE, Piscataway, NJ, March 2016. doi:10.1109/AERO.2016.7500799 [13 B 2, Consultative Committee for Space Data Systems (CCSDS), Washington, D.C., Sept. 2017, https://public.ccsds.org/Pubs/122x0b2.pdf [retrieved Nov. 2017].

PAGE 81

81 [13 4] Standard 123.0 B 1, Consultative Committee for Space Data Systems (CCSDS), Washington, D.C., May. 2012, https://public.ccsds.org/Pubs/123x0b1ec1.pdf [retrieved Nov. 2017] [13 5] 2009, https://calhoun.nps.edu/handle/10945/4335 [retrieved Nov. 2017]. [13 6] Boyarko, G. A., Optimal Reorientation AIAA Journal of Guidance, Control and Dynamics (JGCD) Vol. 34, No. 4, July 2011, pp. 1197 1208 doi: 10.2514/1.49449 [13 7] Curtis, H., Orbital Mechanics for Engineering Students, 1st ed., Elsevier Aerospace Engineering Series Elsevier Butterworth Heinemann, Burlington, MA, 2005. [13 8] Proceedings of the IEE E Aerospace Conference (AERO) IEEE, Piscataway, NJ, March 2016. doi:10.1109/AERO.2016.7500944 [13 Time 6DoF Guidance For of Spacecraft Proximity Maneuvering and Close Approach with a Tumbling Objec Proceedings of the AIAA/AAS Astrodynamics Specialist Conference AIAA Paper 2010 7666, Aug. 2010. doi: 10.2514/6.2010 7666 [14 AIAA Journal of Guidanc e, Control, and Dynamics (JGCD) Vol. 18, No. 2, March 1995, pp. 237 241. doi: 10.2514/3.21375 [14 1] upon the Clohessy Springer Journal of the Astronautical Sciences Vol 61, No. 4, Dec. 2014, pp. 341 366. doi:10.1007/s40295 014 0029 6 [14 Orbit Assembly Using Superquadric AIAA Journal of Guidance, Control and Dynamics (JGCD) Vol. 31, No. 1, Jan. 2008, pp. 30 43. doi: 10.2514/1.28865 [14 Proceedings of the IEEE Aerospace Conference (AERO) IEEE, Piscataway, NJ, March 2007. d oi: 10.1109/AERO.2007.352724 [14 4] Estlin, T., Bornstein, B., Gaines, D., Thompson, D. R., Castano, R., Anderson, ACM Transacti ons on Intelligent Systems and Technology (TIST) Vol. 3, No. 50, May 2012. doi: 10.1145/2168752.2168764

PAGE 82

82 [14 5] Wolf, A. A., Acikmese, B., Cheng, Y., Casoliva, J., Carson, J. M., and Ivanov, M. Proceedings of the IEEE Aerospace Conference (AERO) IEEE, Piscataway, NJ, March 2011. doi:10.1109/AERO.2011.5747243 [14 6] Goldberg, S. B., Maimone, M. ision and Rover Proceedings of the IEEE Aerospace Conference (AERO) IEEE, Piscataway, NJ, March 2002. doi: 10.1109/AERO.2002.1035370 [14 7] Bajracharya, M., Ma, J., Howar T ime 3D Stereo Semantic Mapping, Perception, and Explo ration Workshop (SPME) at International Conference on Robotics and Automation (ICRA) May 2012. [14 8] Knight R., S cale Activity Scheduling and Planning Proceedings of the Sixth International Workshop in Planning and Scheduling for Space (IWPSS) Pasadena, California, July 2009, https://smcit.ecs.baylor.edu/2009/ iwpss/papers/22.pdf [retrieved Nov. 2017]. [14 Springer Internationa l Journal of Automated Software Engineering Vol. 10, No. 2, April 2003, pp. 203 232. doi: 10.1023/A:1022920129859 [15 Control of Three Dimensional Spacecraft Relative M Proceedings of the American Control Conference (ACC) pp. 173 178, June 2012. doi: 10.1109/ACC.2012.6314862 [15 IEEE Transactions on Automatic Control (TA C) Vol. 57, No. 11, Nov. 2012, pp. 2817 2830. doi: 10.1109/TAC.2012.2195811 [15 Defined Radio: A Brief IEEE Potentials Vo l. 23, No. 4, Oct. 2004, pp. 14 15. doi: 10.1109/MP.2004.1343223 [15 3 ] Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC) IEEE, Piscataway, NJ, Sept. 2012. doi:10.1109/HPEC.2012.6408673 [15 B 3, Consultative Committee for Space Data Systems (CCSDS), Washington, D.C., Sept. 2017, https://public.ccsds.org/Pubs/131x0b3.pdf [retrieved Nov 2017].

PAGE 83

83 [15 5] (CNSS), Fort George G. Meade, MD, Nov. 2012, https://www.cnss.gov/CNSS/ issuances/P olicies.cfm [retrieved Nov. 2017] [15 6 ] Committee on National Security Systems (CNSS), Fort George G. Meade, MD, Oct. 2016, https://www.cnss.gov/CNSS/issuances/Policies.cfm [retrieved Nov. 2017] [15 B 1, Consultative Committee for Space Data Systems (CCSDS), Washington, D.C., Nov. 2012, https://public.ccsds.org/Pubs/352x0b1.pdf [retrieved Nov. 2017].

PAGE 84

84 BIOGRAPHICAL SKETCH Tyler Michael Lovelly received the Bachelor of Science in Computer E ngineering in 2011, Master of Science in Electrical and Computer E ngineering in 2013, and Doctor of Phil osophy in E lectrical and Computer E ngineering in 2017 from the University of Florida He completed internships with United Space Alliance at NASA Kennedy Space Center as part of the Space Shuttle program in 2009 and 2010 and with the Air Force Research Laboratory at Kirtland Air F orce Base as part of the Space Electronic s Technology program from 2013 to 2016 He was a r esearch group leader at the NSF Center for High Performance Reconfigurable Computing and the NSF Center for Space, High Performance, and Resilient Computing from 201 2 to 2017 and a visiting scholar at the University of Pittsburgh in 2017.