
Record for a UF thesis. Title & abstract won't display until thesis is accessible after 2012-08-31.

Permanent Link: http://ufdc.ufl.edu/UFE0042283/00001

Material Information

Title: Record for a UF thesis. Title & abstract won't display until thesis is accessible after 2012-08-31.
Physical Description: Book
Language: english
Creator: Shrivastava, Kartik
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2010

Subjects

Subjects / Keywords: Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre: Computer Engineering thesis, M.S.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Statement of Responsibility: by Kartik Shrivastava.
Thesis: Thesis (M.S.)--University of Florida, 2010.
Local: Adviser: Mishra, Prabhat.
Electronic Access: INACCESSIBLE UNTIL 2012-08-31

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2010
System ID: UFE0042283:00001


Full Text

PAGE 1

SYNERGISTIC INTEGRATION OF CODE COMPRESSION AND ENCRYPTION IN EMBEDDED SYSTEMS

By

KARTIK SHRIVASTAVA

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA

2010

PAGE 2

© 2010 Kartik Shrivastava

PAGE 3

To my family and friends

PAGE 4

ACKNOWLEDGMENTS

First, I would like to thank my thesis supervisor Dr. Prabhat Mishra for providing me an opportunity to solve many interesting and complex problems, assisting me to learn new technologies and recognizing the potential in me to positively contribute to ongoing research in the Embedded Systems Lab. My sincere thanks to Dr. My Thai and Dr. Alin Dobra for being my thesis committee members and providing me valuable feedback and constructive comments on my thesis. I would also like to thank all Computer and Information Science and Engineering Department faculty for offering advanced courses to augment my knowledge and inspiring me to apply those concepts to solve numerous complex issues. I would like to extend my profound gratitude to the research members in the Embedded Systems Lab who were there at all times, providing me a joyous environment, listening to all my problems and assisting me in solving them with great ease. Last but by no means the least, I would like to convey my heartfelt thanks to my family and friends who, at all times, have been a great source of positive spirit and inspiration, encouraging me to take bold decisions and accomplish them successfully.

This work is partially supported by the National Science Foundation under Grant No. CNS-0915376.

PAGE 5

TABLE OF CONTENTS

ACKNOWLEDGMENTS, p. 4
LIST OF TABLES, p. 7
LIST OF FIGURES, p. 8
ABSTRACT, p. 9

CHAPTER

1 INTRODUCTION, p. 10
    1.1 Code Compression, p. 10
    1.2 Integration of Code Compression and Encryption, p. 12
    1.3 Thesis Contributions and Organization, p. 12
2 RELATED WORK, p. 14
    2.1 Code Compression, p. 14
    2.2 Encryption, p. 16
    2.3 Combination of Code Compression and Encryption, p. 16
3 DUAL CODE COMPRESSION, p. 18
    3.1 Overview, p. 19
        3.1.1 Offline Dual Compression, p. 19
        3.1.2 Decompression Architecture, p. 20
    3.2 Dynamic Frequency based Compression, p. 21
        3.2.1 Profile creation, p. 21
        3.2.2 Compression Mechanism, p. 23
        3.2.3 Runtime Decompression, p. 25
    3.3 Static Frequency based Compression, p. 27
    3.4 Experiments, p. 29
        3.4.1 Experimental Setup, p. 29
        3.4.2 Code Size Reduction, p. 29
        3.4.3 Performance Increase, p. 30
4 INTEGRATION OF CODE COMPRESSION AND ENCRYPTION, p. 36
    4.1 Combining Compression and Encryption, p. 36
        4.1.1 Encryption followed by compression, p. 36
        4.1.2 Compression followed by encryption, p. 36
    4.2 Dynamic Code Encryption and Compression, p. 37
        4.2.1 Compressed Binary Creation, p. 37
        4.2.2 Performance Analysis, p. 39
        4.2.3 Placement of Cache, p. 42

PAGE 6

    4.3 Experiments, p. 44
        4.3.1 Experimental Setup, p. 44
        4.3.2 Results, p. 45
5 CONCLUSION, p. 48
REFERENCES, p. 50
BIOGRAPHICAL SKETCH, p. 52

PAGE 7

LIST OF TABLES

Table

3-1 A summary of the number of static and dynamic instructions in the selected benchmarks where each instruction is of 4 bytes, p. 29
3-2 Number of clock cycles for the uncompressed and compressed benchmarks for various cache sizes. The cache sizes are in bytes, p. 31
4-1 Average ratio of the number of cycles for a combination of the used encryption and compression methods, p. 45

PAGE 8

LIST OF FIGURES

Figure

1-1 Overview of Code Compression, p. 11
3-1 Overview of Dual Code Compression, p. 19
3-2 Percent of coverage of dynamic instructions for various dictionary sizes in the selected benchmarks, p. 22
3-3 Dynamic Frequency based Compression Mechanism, p. 24
3-4 Decompression and execution of DFC compressed code, p. 25
3-5 Compression encoding used in bit-mask based encoding, p. 28
3-6 Compression ratios for the benchmarks, using SFC, p. 30
3-7 The miss ratios for the benchmarks for various cache sizes, p. 33
3-8 Ratio of the reduction in the number cycles due to compression for various cache sizes, p. 33
3-9 Cycles for epic, p. 34
3-10 Cycles for djpeg, p. 34
3-11 Cycles for cjpeg, p. 35
3-12 Cycles for rawcaudio, p. 35
4-1 Encryption followed by Compression, p. 37
4-2 Compression followed by Encryption, p. 37
4-3 Procedure used to compress and encrypt an ECOFF binary, p. 38
4-4 Basic compression followed by encryption model, p. 39
4-5 Processor-Cache-Decoder (PCD) architecture, p. 42
4-6 Processor-Decompressor-Cache-Decryptor (PDCD) architecture, p. 43
4-7 Compression ratio for the various benchmarks, p. 45
4-8 Performance ratios for DES for various cache sizes, p. 46
4-9 Performance ratios for AES for various cache sizes, p. 47
4-10 Ratio of execution cycles between compressed and uncompressed binaries, p. 47

PAGE 9

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

SYNERGISTIC INTEGRATION OF CODE COMPRESSION AND ENCRYPTION IN EMBEDDED SYSTEMS

By

Kartik Shrivastava

August 2010

Chair: Prabhat Kumar Mishra
Major: Computer Engineering

Embedded systems are used in a wide variety of places today, from cell phones to automobiles. Architects aim to make embedded systems more powerful and space efficient as well as secure. Code compression techniques are promising for reducing the memory requirements, whereas existing encryption techniques are widely used for application security. Code compression is traditionally used to reduce the code size by compressing the instructions with higher static frequency. However, it may introduce a decompression overhead. Performance aware compression strategies try to improve performance through reduction of cache misses by utilizing the dynamic instruction frequency, but they sacrifice code size. Code compression and encryption can be integrated to make an embedded system efficient (in terms of area, power and performance) as well as secure. This thesis studies a promising direction of compression followed by encryption to reduce the decryption overhead while maintaining the individual advantages of both code compression and encryption. This thesis also proposes a dual compression scheme that aims to simultaneously optimize code size reduction and performance improvement. Experimental results show that dual compression can achieve both compression ratios of up to 60% and an average performance improvement of 50%. Moreover, compression followed by encryption reduces the execution time of the encrypted binary by 40% on average.

PAGE 10

CHAPTER 1
INTRODUCTION

Embedded systems have a wide variety of applications today, from multipurpose handheld PDAs to dedicated real-time control systems. Embedded systems are resource constrained, i.e., they generally have limited memory and computational capabilities, and there is a driving need to extract as much space efficiency and performance from the available resources as possible. There is also a need to secure proprietary programs from espionage and sabotage while minimizing the effect on performance. Code compression addresses the memory requirements, whereas encryption provides security for application programs.

This chapter is organized as follows. Section 1.1 describes code compression techniques. Section 1.2 motivates the need for combining code compression with encryption. Finally, Section 1.3 describes the thesis' contributions and organization.

1.1 Code Compression

General data compression techniques like Huffman [1], LZW [2] etc. are used to reduce the size of the targeted data to better utilize storage space. Compressing the application binary and decompressing it at runtime helps us better utilize the limited memory space in embedded systems. Figure 1-1 shows an overview of code compression in embedded systems. The compressed code is placed in the main memory and/or in the instruction cache, thus increasing their effective sizes by enabling them to hold more instructions. During runtime, compressed code is fetched, decompressed and sent to the next memory level or to the processor.

Decompression introduces a certain overhead which increases the number of cycles for each fetch, which may reduce the program's execution rate. However, the reduced binary size of a compressed application has some features which can improve its performance. If the compressed code is stored in the main memory, filling up a cache line on a cache miss will require fewer cycles on average, in effect reducing the average

PAGE 11

latency to fetch an instruction block from the memory. Moreover, placing the compressed code in the cache means that it holds more instructions, hence increasing the effective cache size and causing a reduction in the miss rate.

Figure 1-1. Overview of Code Compression

Code compression has been employed to exploit both code size reduction and performance increase in embedded systems. Compression Ratio is widely accepted as the metric for measuring the efficiency of compression algorithms and is defined as:

Compression Ratio = Compressed Code Size / Original Code Size

Good compression ratios can be achieved by compressing the instructions that occur most frequently in the code, whereas a speedup is achieved by compressing the instructions that are fetched most often. The most frequent instructions in static code may not be the most executed ones and vice versa. Hence, a binary compressed to maximize one benefit may not provide the best results in the other scenario.

There are some mixed profile based compression schemes which attempt to achieve both code size reduction and performance improvement. In mixed profiling, the dictionary consists of instructions from both sets of instructions by selectively combining both static and dynamic frequencies. This approach can lead to a trade-off but cannot

PAGE 12

achieve the best of both worlds. Chapter 3 describes how to simultaneously achieve both code size reduction and performance improvement.

1.2 Integration of Code Compression and Encryption

For more than two decades now, the software industry has risen remarkably in size and importance, and like all other industries it is susceptible to espionage and sabotage. Numerous ways have been devised to recover the logic of the code, or even to alter its instructions, from the binaries of poorly protected software. As our dependency on software increases, so does the necessity of better and more efficient protection schemes against these threats, i.e., making the software more secure without heavily compromising its execution time and throughput. This thesis proposes a way of achieving this using encryption over code compression.

Encryption has always served as a dependable way of protecting information. Protecting critical data for storage and transmission is one of the most common applications of cryptography. For example, encryption of messages while sending them out over common media is a common practice among wireless companies. Ciphering files while storing them on the hard disk protects them from being read in case the hardware itself is lost or stolen.

Software, or rather binary code, is fundamentally different from other static data. Code encryption itself is a relatively new and open field of research. Encrypting static data is mainly concerned with the encryption algorithm and the mode of operation. Code encryption requires that instructions be decrypted during execution, and therefore can introduce significant overhead. Such intricacies of using binary code as an active process make encryption/decryption more complex. Chapter 4 integrates code compression with code encryption, attempting to make code execution secure and efficient at the same time.

1.3 Thesis Contributions and Organization

This thesis has two major contributions: i) a novel dual compression scheme which aims to simultaneously maximize the reduction in the overall execution cycles and

PAGE 13

the binary size, and ii) a synergistic and efficient combination of encryption with code compression.

In the dual compression scheme, the code is first compressed on the basis of its execution profile, and then a second compression is done to reduce the binary size, based on the static occurrences of the instructions after the first compression. During execution, decompression is first done between the cache and the memory and then between the processor and the cache. I present a detailed description of the compression algorithm and the decompression system with performance results and analysis.

While combining encryption and code compression, I present an analysis of the most feasible and efficient architecture, followed by an analysis of how various parameters such as compression ratio, decryption and decompression latency, cache size etc. affect the application's performance. The framework is implemented on the SimpleScalar simulator and validated using MediaBench and MiBench benchmarks.

The rest of the thesis is organized as follows. Chapter 2 surveys related work on code compression and encryption. Chapter 3 describes the dual compression scheme. Integration of compression and encryption is discussed in Chapter 4. Finally, Chapter 5 concludes the thesis.

PAGE 14

CHAPTER 2
RELATED WORK

The existing approaches can be divided into three related categories: code compression, encryption and a combination of code compression and encryption. Section 2.1 lists the existing code compression techniques that target code size reduction, performance improvement and the attempts to combine these two. Sections 2.2 and 2.3 list some of the encryption techniques and attempts to combine them with code compression.

2.1 Code Compression

Code compression techniques were first developed for embedded systems by Wolfe and Channin [3]. They developed a Huffman coding based compression technique in which the compressed program is stored in the main memory. A Line Address Table (LAT) is used to map the instructions in the original code to the compressed code. Lekatsas and Wolfe used arithmetic coding for code compression in embedded systems [4]. Nam et al. [5] used dictionary based compression to compress VLIW instructions. Larin and Conte devised a Huffman based compression for embedded systems in [6]. Tunstall coding was used by Xie et al. [7] to perform variable to fixed length compression. The use of variable sized blocks was further exploited by Lin et al. [8], who proposed an LZW compression scheme for code compression on embedded processors. Code compression techniques were applied to variable length instruction set processors by Das et al. [9].

Several new techniques have been proposed to improve the standard dictionary based compression by remembering as many mismatches as possible. Although different approaches have been proposed to accomplish this, recent work by Seong et al. [10] has given promising results. They remember the mismatch positions using bit masks, which is advantageous since a number of mismatches can be remembered using a single bit mask. The other major advantage of this method is that the compressed code can be decompressed in one cycle and therefore, it has a minimal decompression overhead and does not hamper

PAGE 15

the processor performance. All these works emphasize reducing the code size of the application at the cost of potential performance degradation.

There has also been some work on code compression based on dynamic frequency profiling to increase performance efficiency. Benini et al. [11] proposed a technique of selective compression to reduce the energy required by the program to execute on embedded systems. They compressed the most commonly fetched instructions to reduce the energy dissipated in memory accesses. Their profiling results show that the 256 most frequently fetched instructions in their benchmark took up a large portion of the program execution time. Therefore, they only compressed these 256 instructions. The advantage of their method is the simplicity of the decompression logic. However, they targeted energy dissipation rather than system performance or code size.

Lekatsas et al. [12] proposed a dictionary based compression technique for code compression, which exclusively dealt with taking advantage of compressing words with higher frequencies. They developed a compression scheme and a decompressor which takes one clock cycle to extract instructions from compressed code, through which a performance increase was achieved. They used fixed and variable-length codewords in their experiments. Their results show an average performance improvement of 25%. However, code size reduction is not discussed.

Netto et al. [13] described a multi-profile based compression technique in which they proposed an approach to mix static and dynamic instruction profiling to effectively exploit the size-performance trade-off. Like our approach, they too used word-sized sets of indices, which removes compressed word misalignment and gives faster decompression. Their results show a 35% reduction in code size and a 50% reduction in instruction cache accesses. However, their work uses a single compression scheme, with a single decompressor. So, for any combination of instructions from their dynamic and static profiles, both size and performance cannot be optimal at the same time.

PAGE 16

In the dual compression scheme described in Chapter 3, compression for speed and compression for size are done separately. To increase speed, we have improved the code compression technique in [13] with a more compact compression format and a faster decompression method that uses an auxiliary table. The compression technique targeting a reduction in size is similar to Seong et al. [10], as it gives the best compression ratio while incurring only a minor decompression overhead.

2.2 Encryption

Encryption techniques have been used since historic times. These techniques were basically of two types: substitution ciphers and transposition ciphers. In the former, substitution rules are defined to substitute one character with another. On the other hand, transposition ciphers change the order of characters in the code. However, all these ciphers were easily broken using statistical attacks.

Private key cryptography has been used since the early 20th century, in which both parties operating on the data had the same key to encrypt and decrypt. This type of shared key cryptography is of two types: block and stream. A block cipher operates on a block of data while a stream cipher works by combining the data with a stream of pseudo-random bits. Examples of block ciphers include AES and DES. RC4 is an example of a stream cipher.

However, the problem of sharing the private key forced people to change to public key cryptography. In this system, there are two sets of keys, public keys and private keys, used by the encryptor and the decryptor respectively. These keys are different and one cannot be produced from the other. The encryptor encrypts the code using the public key while the decryptor decrypts it using its own private key. Therefore, the need for key sharing is avoided. RSA is an example of public key cryptography.

2.3 Combination of Code Compression and Encryption

There have been few efforts to combine encryption and compression. Johnson et al. [14] proposed a method to compress encrypted data using Low Density Parity Check (LDPC) codes and they have shown its performance on OTP encrypted

PAGE 17

data. However, their method is not suitable since LDPC compression is NP hard. Also, they have used their algorithm only on OTP encrypted data, which is not considered a good encryption scheme. Ruan et al. [15] improved the Shannon-Fano-Elias technique of encrypting compressed data by improving the code length. However, the intensive decryption/decompression of these codes is not applicable in embedded systems. Although IBM CodePack uses keys for decompression [16], there has been no indication of encryption in it.

Shaw et al. [17] developed a method along similar lines to ours for combining compression and encryption. They work mainly on image and video files and not on embedded systems. The compression schemes they use, which consist of codebooks, are lossy in nature. This may be suitable for data, but is certainly not applicable to code, since lossy code can lead to inaccuracy and inadequate functionality.

Cypress, developed by Lekatsas et al. [18], has integrated compression and encryption. They deal with both code and data sequences for multimedia embedded systems. In their system, they use a compression/encryption technique which works on both code and data. The problem arises when operating on data. Data can be written back to the memory by the processor. They try to counter this by changing the system, including changes in the page table and the placement of instruction and data caches. This data has to be encrypted and compressed again before being written. Since, in this case, encryption and compression have to be performed during runtime, it will significantly affect processor performance and is not applicable in many systems with real-time constraints. Moreover, their approach is inherently intrusive, since there is a significant change in the actual hardware of the system.

In Chapter 4, I have proposed a method in which the compression and encryption of the code can be done with minimal modification to the hardware of the system, while retaining all the advantages of compression as well as encryption.

PAGE 18

CHAPTER 3
DUAL CODE COMPRESSION

Dual code compression aims to optimize both system performance and code size reduction, which is not possible in any single code compression scheme as it will target either size reduction or performance improvement. At a high level, the dual code compression scheme is similar to other code compression methods, i.e., first a compressed binary is created offline, then decompression is done dynamically for each block of code when it is fetched during execution. The difference lies in the fact that compression and decompression are done twice, first for performance improvement and then for size reduction, based on the frequencies of dynamic and static instructions respectively. Therefore, there should be a synergy between the two steps. The output of the first compression step should be a valid input for the second. Moreover, dynamic decompression for the two steps should be done in such a way that the overhead is minimal.

To achieve a speedup we must reduce the cache miss ratio, which is possible by placing compressed code in the cache. Holding the most frequently executed instructions in compressed form will greatly enhance cache usage and correspondingly improve system performance as the cache miss rate will reduce. The main memory utilization is enhanced by holding statically compressed code with minimal code size. The ordering of the two compression and decompression steps is shown in Figure 3-1.

The rest of the chapter is organized as follows. Section 3.1 presents an overview of dual compression and decompression. Section 3.2 describes the first half of dual code compression and its corresponding decompression, i.e., compression to target performance enhancement. Section 3.3 presents a description of the second half of dual compression, which reduces the code size. Finally, Section 3.4 presents the experimental results and analysis.

PAGE 19

Figure 3-1. Overview of Dual Code Compression

3.1 Overview

3.1.1 Offline Dual Compression

Algorithms 1 and 2 show an overview of the steps involved in generating a compressed binary. First, code compression is done to improve performance by selecting the most frequently fetched instructions. We refer to this step as Dynamic Frequency based Compression (DFC). The second step aims at code size reduction: the most frequently occurring static instructions are selected for compression. We call this step Static Frequency based Compression (SFC).

Algorithm 1 Dynamic Frequency based Compression
1: Create profile P of most executed basic blocks
2: Create a 256 entry dictionary D1 based on P.
3: Compress each 32-bit vector using D1 to produce C1.
4: Generate Basic Block Mapping Table BBM
5: return C1, D1, BBM

In DFC the most frequently called basic blocks are tightly compressed in such a way that compressed and uncompressed word lengths are fixed at the original instruction

PAGE 20

word length. SFC compresses the output of DFC, where the most frequently occurring static words are compressed. The word boundaries are maintained in DFC, which facilitates compression in SFC. A bit-mask based code compression algorithm is used in SFC.

Algorithm 2 Static Frequency based Compression
1: Create Dictionary D2 using the most frequent words in C1
2: Compress C1 using D2 to produce C2
3: Readjust Jump targets in C2
4: return C2, D1, BBM, D2

3.1.2 Decompression Architecture

Decompression for DFC is done between the cache and the processor to increase cache utilization. The cache holds the most frequently executed instructions in compressed form; thus, the effective size of the cache increases and the total number of cache misses is reduced. As the decompressor is invoked for each instruction fetch, it has to be fast enough to decompress a compressed word and provide it to the processor's fetch unit in a single clock cycle.

Decompression for SFC is done between the cache and the memory to make decompression distributed and to reduce its overhead. When a cache line needs to be refilled, compressed words are fetched from the main memory, decompressed and then sent to the cache. As decompression is done only when there is a cache miss, the decompressor is invoked less frequently. Therefore, we can use highly efficient compression techniques, yielding the best possible compression ratios, which may have a reasonable decompression overhead.

Various other placements of DFC and SFC decompression are possible, but they are less efficient. Post-cache decompression for both DFC and SFC would cause a heavy latency for each instruction fetch. Decompressing them together before the cache would mean that the cache would hold uncompressed instructions.
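As a rough illustration of step 3 of Algorithm 1, the sketch below packs the dictionary indices of one basic block into fixed-width words so that word boundaries are preserved. It assumes 8-bit indices (which follows from the 256 entry dictionary D1) packed four to a 32-bit word; the function names, the most-significant-byte-first ordering and the zero padding value are hypothetical choices made for this example and are not taken from the thesis.

```python
def pack_block(indices, per_word=4, index_bits=8, pad=0):
    """Pack the dictionary indices of one basic block into fixed-width words.

    Assumption: indices are stored most significant first and unused
    slots in the last word are filled with `pad`.
    """
    words = []
    for start in range(0, len(indices), per_word):
        group = list(indices[start:start + per_word])
        group += [pad] * (per_word - len(group))  # pad the final word
        word = 0
        for idx in group:
            word = (word << index_bits) | idx
        words.append(word)
    return words


def unpack_word(word, per_word=4, index_bits=8):
    """Recover the per-slot indices from one packed word."""
    mask = (1 << index_bits) - 1
    return [(word >> (index_bits * (per_word - 1 - i))) & mask
            for i in range(per_word)]


if __name__ == "__main__":
    packed = pack_block([1, 2, 3, 4, 5])
    print([hex(w) for w in packed])
    print(unpack_word(packed[0]))
```

Because every compressed word has the same width as an original instruction, a later stage such as SFC can treat the packed output as an ordinary sequence of 32-bit words.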

PAGE 21

In the decompression architecture, whenever there is a cache miss, compressed blocks are fetched from the main memory in a quantity sufficient to fill up the cache line on decompression. This way the cache holds code that is compressed with DFC. If the instruction present in the cache is in uncompressed form, it is sent directly to the processor. If it is compressed, the decompressor fetches and decompresses it, stores it in its buffer, and passes on the required instruction to the processor. The details of DFC and SFC are discussed in the following sections.

3.2 Dynamic Frequency based Compression

The DFC scheme is split into three steps. The first step is profile creation, which involves identifying all the basic blocks of code in the program and the relative frequencies with which they are fetched, and then creating a dictionary based on the most frequently fetched blocks. The second step efficiently compresses the code in a manner which best exploits the locality of the most frequently fetched instructions in the basic block. The third step performs a fast runtime decompression of the compressed code.

3.2.1 Profile creation

The first step in profile creation is the identification of the basic blocks and the relative frequencies with which they are fetched. A basic block is a piece of code with one entry point, one exit point and no jump instructions contained in it. It is a sequence of instructions which are all executed if the first one in the sequence is executed. The starting instruction of the block may be jumped to from any location, but none of the other instructions can be branch targets.

The method used to identify the basic blocks and their respective frequencies is as follows. We generate an execution trace of the program and calculate the frequency with which each instruction is fetched. We also identify the targets of the jump instructions. Here basic blocks are those sequences of instructions which have the same frequency of execution and in which no instruction is a jump target except the first one.
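The trace-based identification described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the tool used in the thesis: the trace is assumed to be a list of word-granular instruction addresses, the set of jump targets is assumed to be known already, and the name find_basic_blocks is hypothetical. Instructions that never appear in the trace are simply ignored.

```python
from collections import Counter


def find_basic_blocks(trace, jump_targets):
    """Group executed addresses into blocks of equal fetch frequency.

    Returns a list of (first_address, last_address, frequency) tuples.
    A new block starts when the address is not consecutive, when the
    fetch frequency changes, or when the address is a jump target.
    """
    freq = Counter(trace)
    blocks = []
    current = []
    for pc in sorted(freq):
        starts_new = (
            not current
            or pc != current[-1] + 1
            or freq[pc] != freq[current[-1]]
            or pc in jump_targets
        )
        if starts_new and current:
            blocks.append((current[0], current[-1], freq[current[0]]))
            current = []
        current.append(pc)
    if current:
        blocks.append((current[0], current[-1], freq[current[0]]))
    return blocks


if __name__ == "__main__":
    # Addresses 1-5 run three times, 6-7 once, 8-10 twice.
    trace = [1, 2, 3, 4, 5] * 3 + [6, 7] + [8, 9, 10] * 2
    for block in find_basic_blocks(trace, jump_targets={1, 8}):
        print(block)
```

Sorting the resulting tuples by frequency gives a candidate ordering of blocks for the profile.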

PAGE 22

The next step in profile creation is selecting the basic blocks which are most frequently fetched. We compress the most frequently fetched basic blocks based on a couple of intuitions. Firstly, keeping the most frequently executed instructions in the cache in compressed form helps us better utilize its space and reduces the number of cache misses. Secondly, if a basic block is compressed, it takes fewer fetches to bring it from the memory, which saves a certain number of cycles for each fetch. Moreover, the higher the frequency with which that block is fetched, the more cycles we save cumulatively over the entire execution.

We have to decide exactly how many instructions should be marked for compression. For this we rely on the 90-10 rule, which states that 90% of program execution time is spent on 10% of the code. A dictionary is created consisting of the most frequently fetched instructions. Figure 3-2 shows the percentage of fetches to the instructions contained in dictionaries for different dictionary sizes. For example, in the epic benchmark, the 256 most frequently fetched instructions make up 96.9% of the total number of fetches.

Figure 3-2. Percent of coverage of dynamic instructions for various dictionary sizes in the selected benchmarks.
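The dictionary selection and the coverage measure plotted in Figure 3-2 can be sketched as below. The sketch assumes the trace is available as a list of fetched instruction words; the function name build_dictionary and the toy trace are hypothetical, and the numbers it produces are illustrative only, not results from the thesis.

```python
from collections import Counter


def build_dictionary(fetched_words, size=256):
    """Pick the `size` most frequently fetched words and report coverage.

    Coverage is the fraction of all fetches in the trace that hit a
    word contained in the dictionary.
    """
    counts = Counter(fetched_words)
    dictionary = [word for word, _ in counts.most_common(size)]
    total = len(fetched_words)
    covered = sum(counts[word] for word in dictionary)
    coverage = covered / total if total else 0.0
    return dictionary, coverage


if __name__ == "__main__":
    # Toy trace: three distinct instruction words with skewed frequencies.
    toy_trace = [0xA] * 6 + [0xB] * 3 + [0xC]
    for size in (1, 2, 3):
        _, coverage = build_dictionary(toy_trace, size)
        print(size, round(coverage, 2))
```

Running the same function over a real trace for several values of size would give one coverage curve per benchmark, which is the kind of data Figure 3-2 summarizes.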

PAGE 23

3.2.2 Compression Mechanism

First we have to decide on the dictionary size. Figure 3-2 suggests that a dictionary size of 256 is reasonable since it can accommodate around 70 to 99 percent of the total instructions executed in these benchmarks. To compress the code we replace the instructions with their respective indices. As the target instruction set architecture here is Alpha, the instruction size is 32 bits, i.e., 4 bytes. By selecting a dictionary size of 256, the index size would be one byte.

Unlike bit-masking or dictionary based compression, a fixed block encoding is used to better facilitate compression and decompression of the basic blocks. Groups of words belonging to a basic block are compressed together to form a single word. The main advantage of this approach is that the compressed code does not get misaligned, so only a single fetch is required to obtain an instruction. Moreover, fetching a compressed word aligned to the word boundary is faster and can enable parallel decompression.

Figure 3-3 illustrates the compression mechanism. Instructions {1,2,3,4,5} and {8,9,10} form basic blocks in the program and each instruction is of size 32 bits. Due to the chosen dictionary size of 256, the index size will be 8 bits. Instructions 1, 2, 3 and 4 are replaced to form one word consisting of their respective dictionary indices. Instruction 5 is put as an index in the next word and the remaining space is filled up with padding. Similarly, instructions 8, 9 and 10 are put as indices and the remaining space is padded. The idea behind such compression is that whenever the first instruction of a basic block is called, the next few instructions are fetched along with it.

As the words do not contain any information as to whether they are compressed or not, a Basic Block Mapping (BBM) table is required to indicate if a word is in the compressed format or not. Each entry in the table consists of information about a basic block, such as the address of the first instruction of the block, the address of the last instruction and its address mapped to the compressed form. The BBM table eliminates the necessity of flag bits/bytes that indicate whether the instruction is compressed or

PAGE 24

not. This extra information (used in existing methods), spread over the entire binary, adds to the size of the compressed binary. The size of the BBM table itself is very small as it only contains information about the most frequently fetched basic blocks. It also eliminates the requirement for Line Address Tables (LAT), which map the jump targets in the compressed code in existing methods. It is easy to map the jump target using the BBM table due to the fixed encoding.

Figure 3-3. Dynamic Frequency based Compression Mechanism

The dictionary size is important in this type of compression format. One compressed word consists entirely of dictionary indices. Thus, the smaller the dictionary, the more instructions can be fit into one compressed word. In the Alpha ISA, an instruction word is 32 bits; if we choose a dictionary size of 256, i.e., an index of size 8 bits, we would be able to fit in a maximum of four instructions into one word, as shown in Figure 3-3. A choice has to be made for the index size; a larger index would mean more blocks to be

PAGE 25

compressed but they will be loosely compressed, whereas a smaller index will have the opposite effect.

3.2.3 Runtime Decompression

Here I describe the details of the system that is used to perform the runtime decompression. For DFC the decompressor is placed between the cache and the processor, which has two advantages. Firstly, compressed code is placed in the cache, and secondly, fetching individual words from the cache (compressed or uncompressed) is straightforward.

Figure 3-4. Decompression and execution of DFC compressed code

The runtime decompression unit uses the BBM table to see which instructions are compressed. When the decoder fetches a compressed word, it decompresses it using the dictionary, sends the required instruction back to the processor and stores the rest in its buffer. The number of instructions contained in a compressed word depends on the dictionary size, as discussed earlier. As the word boundaries are maintained even after compression, fetching a compressed word from the cache is fast and simple. If the instruction to be fetched is uncompressed, we only have to map it to the right location and fetch the whole word. If the instruction is compressed, we fetch the compressed word, obtain the index and return the required instruction after a dictionary lookup.

Figure 3-4 shows how an instruction is extracted from a compressed word and executed. Here we execute the instruction with the original PC 3. By looking at the BBM table,


PC 3 is shown to be in the basic block {1, 2, 3, 4, 5}, which starts at address 1 in the compressed code; the word at that address contains instructions {1, 2, 3, 4}. Consider another example where the PC is not part of a compressed basic block, for example PC 6. In this case the basic block that precedes 6 is {1, 2, 3, 4, 5}. We need to divide the basic block size by 4 (right shift by 2) to obtain the number of compressed words and add it to the offset from the last word of the compressed block, i.e., new address for current PC = new address for the block above + ((block size - 1) >> 2) + (current PC - last address of the block). In this case, the new address for PC 6 is 1 + ((5 - 1) >> 2) + (6 - 5) = 3.

After mapping the PC, the compressed word at address 1 is fetched by the decompressor, the instructions are extracted from it and kept in the decompressor's buffer, and the required instruction is sent to the processor for execution. The decompressor's fetches to the cache are pipelined; thus fetching any instruction from the cache takes only one cycle, except for instructions that are jump targets.

An additional advantage of the BBM table is that it enables code compression without an additional compressed/uncompressed flag on each word. This saves significant space, since the BBM table itself is very small. Placing the decompressor after the cache also means that the cache holds compressed code, so the effective size of the cache increases. If the cache only holds the most frequently executed code, i.e., the compressed basic blocks, the effective size of the cache increases by the inverse of the compression ratio of the basic blocks. In this system, where each compressed word holds four instructions, the cache size effectively increases fourfold. This increase in effective cache size is the reason for the expected speedup. A larger cache means fewer overall fetches from main memory.

The decompression overhead should also be small in order to obtain a proper speedup. The decoder in this system uses one cycle to fetch an instruction from the cache, decompress it, and store the four uncompressed words in its buffer. The


processor fetches the instructions from the buffer in the next cycle. Thus, fetching four instructions from a compressed basic block takes five cycles. Fetching an instruction that is not compressed takes two cycles: one for the decompressor to fetch it from the cache and one for the processor to fetch it from the buffer. We can reduce the number of cycles further by pipelining the decompressor's fetches. A fetch by the decoder then takes two cycles only if the instruction to be fetched is a jump target; all other instructions take just a single cycle. The cycle time to fetch an instruction from the decoder's buffer would be very small compared to that from an L1 cache.

3.3 Static Frequency based Compression

Compression schemes used to optimize code size can be complex, and their dynamic decompression can have significant latency. Dynamic decompression for SFC is done before the cache; thus decompression is invoked only on a cache miss. Because the decompressor is not in the critical path of execution, i.e., it is not invoked for each fetch by the processor, we are free to use efficient compression mechanisms such as Huffman or arithmetic coding, which provide excellent compression ratios but have a high decompression latency. The compression mechanism used for SFC is based on the work of Seong and Mishra [10], a bit-mask based compression scheme that gives high compression efficiency and has a single-cycle decompression penalty.

Compression is performed on the DFC compressed code. As mentioned earlier, the word boundaries in DFC are maintained; hence bitmask-based compression can be applied directly to perform SFC. Unlike [10], we have placed the decompression engine for SFC before the cache. Thus, decompression is invoked at each cache miss to fill a cache line. As the code in main memory is in compressed form, it will intuitively require fewer fetches from main memory, on average, to fill a cache line. Furthermore, as we are using


a fast decompression engine, we should see a further speed increase in the system because of SFC.

I have used bitmask-based compression with a two-bit bitmask. The dictionary consists of the most frequently occurring static instructions, and the bitmask is selected by XORing the instruction with the dictionary entry to capture the bits in which they differ. Other than these variations, the compression mechanism remains the same as in [10] and is outlined in Algorithm 3. Figure 3-5 shows the encoding used for compression.

Algorithm 3 Bitmask-Based Compression
1: Create the frequency distribution of instructions.
2: Create the dictionary based on frequency as well as bit-mask based savings.
3: Compress each 32-bit vector.
4: Handle and adjust branch targets.
5: return Compressed code and dictionary

Figure 3-5. Compression encoding used in bit-mask based encoding

It is useful to consider larger dictionary sizes when the current dictionary size cannot accommodate all the vectors with a frequency above a certain threshold (e.g., a frequency above 5 is profitable). However, increasing the dictionary size has certain disadvantages. A larger dictionary costs more because the dictionary index becomes bigger. The cost increase is balanced only if most of the dictionary is filled with high-frequency vectors. Most importantly, a bigger dictionary increases access time and thereby reduces decompression efficiency. A standard dictionary size of 2048 is used.

During execution, each time there is a cache miss, compressed blocks are fetched from memory, decompressed, and placed in the cache. The number


of blocks fetched from memory should be sufficient to fill the cache line after decompression. The rest is stored in the decompressor's buffer. As fewer blocks are fetched from main memory to fill a cache line than in regular execution of uncompressed code, a speedup is expected.

3.4 Experiments

3.4.1 Experimental Setup

Experiments were performed in the SimpleScalar performance simulator for the MIPS uniprocessor architecture using a selection of benchmarks from MediaBench and MiBench compiled for the Alpha ISA. The benchmark programs employed were epic, the cjpeg and djpeg image compression utilities, the adpcm encode and decode voice compression programs, bitcnt from MiBench's automotive suite, and crc32 from its telecomm suite. The simulated system consisted of a superscalar MIPS processor, one decompressor each for DFC and SFC, and a single direct-mapped instruction cache with a line size fixed at 16 bytes; fetching a cache line from main memory takes 64 cycles. Table 3-1 shows the number of static and dynamic instructions for each benchmark used.

Table 3-1. A summary of the number of static and dynamic instructions in the selected benchmarks, where each instruction is 4 bytes.

Benchmark    Dynamic Instructions    Static Instructions
epic         59494631                47124
cjpeg        19025567                49896
djpeg        5887958                 53852
rawcaudio    7610111                 27256
rawdaudio    6309300                 27248
bitcnts      5276065                 23284
crc32        5108304                 28392

3.4.2 Code Size Reduction

Figure 3-6 shows the code size reduction achieved by SFC. The implementation of bit-mask based compression with a dictionary size of 2048 entries gives compression ratios from 0.60 to 0.65. These numbers are similar to the results in


[10]. As expected, there is almost no size reduction in the DFC stage; therefore, SFC is exclusively responsible for code size reduction.

Figure 3-6. Compression ratios for the benchmarks, using SFC

3.4.3 Performance Increase

Table 3-2 shows the performance of the uncompressed and compressed binaries in terms of the number of clock cycles taken to execute, for a range of cache sizes, and the corresponding misses for each benchmark. For each benchmark, the performance improvement grows as the cache size decreases. The reason for this trend is that the difference in cache misses between uncompressed and compressed code decreases as the cache size increases. Therefore, the ratio of reduction in cycles shrinks as the cache size increases. The greatest performance improvement is observed for a cache size of 128 bytes. Embedded systems generally use small caches, and our techniques can be beneficial in such environments.

The performance improvement is greater for benchmarks whose critical code (the most frequently executed instructions) is much larger than the cache. This could be seen


Table 3-2. Number of clock cycles for the uncompressed and compressed benchmarks for various cache sizes. The cache sizes are in bytes.

Benchmarks   Cache (bytes)   Clock Cycles (Original, Compressed)   Cache misses (Original, Compressed)

epic 128291288151125664394479037217018372561445916139454271919154291784835129468715681393578986832662288102480951000670855066603913174682048636964125132219129812439456
djpeg 12886863373254273221474658540177256469424061915854976681639274251224810547140506275672262848151024174151001142093156274015381820489579240705484911439895077
cjpeg 1282376495221046961094144970205601125611368434490706713183452517103725127792169267695685121656211595911024592943585629056673296485530320484066687529184203436131265894
rawcaudio 12883493514445402643613131229625645848254429282302220466051244793974395664677135511024436033143448564233231320484334726433950112001074
rawdaudio 128834935144454026148093946602564584825442928267713551512447939743956644233231310244360331434485612001074204843347264339501702925
bitcnt 1285519528116122581148093926589425630083364971625450906888666512167472789728918228571883771024104597575248356108273811320485338534522095077487404
crc32 1283697698357427152153529256369769835742715215352951236624663601606465446621024357636235383662853245820483535711351313019581936


for djpeg, which has a fairly large critical code size. There is a large percentage reduction in the number of cycles, which decreases steadily as the cache size increases. For benchmarks whose critical code fits easily in the cache, the difference between the performance of compressed and uncompressed code is negligible. For example, there is only a minor change in the number of cache misses for rawcaudio and crc32 when the cache size is increased beyond 256 bytes, which implies that the cache easily accommodates the entire critical code, even in uncompressed form. Therefore, compressing the code in that cache configuration would not decrease the number of cache misses, and hence no performance improvement is seen. In the case of bitcnts, no performance improvement was observed for a cache size of 2K, because a 2K cache holds the whole critical code. Moreover, that performance is equal to that seen for compressed code in a 1K cache. This is because the compressed version of the critical code fits entirely in a 1K cache but the uncompressed form does not, which results in a twofold performance improvement in this case.

Figure 3-7 summarizes the trends in the reduction in cache misses with increasing cache size for the various benchmarks. The decrease in the cache miss ratio for the benchmarks across cache sizes and the reduction in the number of cycles due to compression follow a similar trend, as shown in Figure 3-8.

As discussed earlier, SFC may produce a slight speedup, as the total number of fetches made to memory is expected to decrease due to the reduced binary size. As a corollary, combining DFC with SFC should give a better speedup than DFC alone. The following figures show the number of cycles for four cases: running an uncompressed binary, a binary compressed using SFC only, a binary compressed using DFC only, and a binary compressed using both DFC and SFC. Running uncompressed code requires the most cycles, followed by SFC-only code, DFC-only code, and SFC and DFC combined. The improvement due to SFC is more apparent in smaller caches. A smaller cache means a greater miss rate, which results in more accesses to


the main memory. If main memory holds compressed code, each memory access effectively brings in more instructions. Thus, fewer memory accesses are required during the entire execution. This difference in the number of memory accesses is the reason for the speedup.

Figure 3-7. The miss ratios for the benchmarks for various cache sizes

Figure 3-8. Ratio of the reduction in the number of cycles due to compression for various cache sizes.

Figure 3-9, Figure 3-10, and Figure 3-11 show this trend for the larger benchmarks epic, djpeg, and cjpeg, respectively. Figure 3-12 shows the same for rawcaudio. The


performance improvement is more apparent in the larger benchmarks because their critical code is large and hence incurs more cache misses. The critical code of rawcaudio is fairly small and easily fits in a cache of 256 bytes when the binary is not compressed using DFC. When compressed, the critical code of rawcaudio also fits in a cache of 128 bytes. Therefore, no improvement is apparent for rawcaudio at cache sizes above 128 bytes.

Figure 3-9. Cycles for epic

Figure 3-10. Cycles for djpeg
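The link between critical code size, cache size, and cache misses discussed in this section can be illustrated with a toy simulation. The sketch below is not the SimpleScalar setup used in the experiments: it assumes a hypothetical loop of 64 instructions, a direct-mapped cache with 16-byte lines, and a DFC-style packing of four instructions per 32-bit word, and it ignores the decompressor's buffer. The function names `count_misses` and `loop_trace` are illustrative only.

```python
def count_misses(addresses, cache_bytes, line_bytes=16):
    """Count misses in a direct-mapped cache for a sequence of byte addresses."""
    n_lines = cache_bytes // line_bytes
    resident = [None] * n_lines          # memory line currently held by each cache line
    misses = 0
    for addr in addresses:
        mem_line = addr // line_bytes
        slot = mem_line % n_lines        # direct-mapped placement
        if resident[slot] != mem_line:
            resident[slot] = mem_line
            misses += 1
    return misses


def loop_trace(n_instr, iterations, instr_per_word):
    """Byte addresses of the words fetched by a loop of n_instr instructions.

    instr_per_word = 1 models uncompressed code (one 4-byte word per instruction);
    instr_per_word = 4 models fully compressed code (assumed: four indices per word).
    """
    one_pass = [4 * (i // instr_per_word) for i in range(n_instr)]
    return one_pass * iterations


if __name__ == "__main__":
    for cache_bytes in (128, 256):
        plain = count_misses(loop_trace(64, 100, 1), cache_bytes)
        packed = count_misses(loop_trace(64, 100, 4), cache_bytes)
        print(cache_bytes, plain, packed)
```

With a 128-byte cache, the uncompressed loop (256 bytes) misses on every line in every iteration (1600 misses over 100 iterations), while the packed loop (64 bytes) incurs only its 4 initial misses. With a 256-byte cache both versions fit (16 versus 4 misses), so the gap nearly disappears, which mirrors the trend described for rawcaudio and bitcnts.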


Figure 3-11. Cycles for cjpeg

Figure 3-12. Cycles for rawcaudio
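As a recap of the DFC decode path from Section 3.2.3, the following minimal sketch packs four 8-bit dictionary indices into one 32-bit word and applies the PC mapping formula given in the text. The slot order within a word (first instruction in the least significant byte), the `locate_in_block` rule for a PC inside a compressed block, and all function names are assumptions made for illustration; the thesis text does not specify them.

```python
INDEX_BITS = 8                      # 256-entry dictionary, as in the text
PER_WORD = 32 // INDEX_BITS         # four indices fit in one 32-bit word
INDEX_MASK = (1 << INDEX_BITS) - 1


def pack_indices(indices):
    """Pack up to four dictionary indices into one 32-bit compressed word.

    Assumption: the first instruction occupies the least significant byte.
    """
    if not 0 < len(indices) <= PER_WORD:
        raise ValueError("a compressed word holds 1 to 4 indices")
    word = 0
    for slot, index in enumerate(indices):
        if not 0 <= index <= INDEX_MASK:
            raise ValueError("index does not fit in 8 bits")
        word |= index << (INDEX_BITS * slot)
    return word


def unpack_index(word, slot):
    """Return the dictionary index stored in the given slot (0..3) of a word."""
    return (word >> (INDEX_BITS * slot)) & INDEX_MASK


def locate_in_block(block_new_addr, block_first_pc, pc):
    """Assumed rule for a PC inside a compressed block: (word address, slot)."""
    offset = pc - block_first_pc
    return block_new_addr + (offset >> 2), offset & 3


def map_pc_after_block(block_new_addr, block_size, block_last_pc, pc):
    """Formula from the text for an uncompressed PC that follows a compressed block."""
    return block_new_addr + ((block_size - 1) >> 2) + (pc - block_last_pc)


if __name__ == "__main__":
    word = pack_indices([7, 0, 255, 18])
    print(unpack_index(word, 2))                 # index of the third instruction
    print(locate_in_block(1, 1, 3))              # PC 3 in block {1, ..., 5}
    print(map_pc_after_block(1, 5, 5, 6))        # PC 6, following the same block
```

For the example in the text, PC 3 resolves to the compressed word at address 1, and PC 6 maps to address 3.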


CHAPTER 4
INTEGRATION OF CODE COMPRESSION AND ENCRYPTION

This chapter describes and analyzes the integration of encryption and code compression. Section 4.1 discusses a basic architectural decision regarding the order of encryption and compression. Section 4.2 presents a performance model of compression and encryption, with an analysis of cache placement. Section 4.3 presents the experimental setup and results.

4.1 Combining Compression and Encryption

Compression and encryption can be combined in two ways: encryption followed by compression, or compression followed by encryption. Combining encryption and compression may lead to a number of problems. The first and major problem is that both decompression and decryption are slow and hence may prevent full utilization of the processor's performance. The decompression engine should produce instructions at the same rate at which the processor executes them. In the next two subsections, we discuss the challenges associated with the two possible combinations of encryption and compression.

4.1.1 Encryption followed by compression

The first scenario is shown in Figure 4-1. Most compression algorithms take advantage of matching patterns in the uncompressed data set. Encrypted data generally has high entropy and therefore little similarity in patterns. As a result, it is difficult to compress.

4.1.2 Compression followed by encryption

This is the most useful sequence for combining compression and encryption. Unencrypted code is easier to compress because the instructions contain highly regular patterns. Moreover, the compressed data can be easily encrypted and sent across the insecure channel to the receiving end. The decryptor and


the decompressor can do the rest of the work. The whole scenario is shown in Figure 4-2.

Figure 4-1. Encryption followed by Compression

Figure 4-2. Compression followed by Encryption

4.2 Dynamic Code Encryption and Compression

The discussion in the previous section concludes that compression followed by encryption is suitable for embedded systems. This section discusses the possible implementation mechanisms and their impact on performance.

4.2.1 Compressed Binary Creation

Algorithm 4 outlines the basic steps in creating a compressed-encrypted text segment in a binary.


Algorithm 4 Basic Compression-Encryption
1: Compress the text segment of the program.
2: Retarget the jumps where possible.
3: Create a mapping table for the rest of the jumps.
4: Encrypt the compressed text segment.

Figure 4-3. Procedure used to compress and encrypt an ECOFF binary

Figure 4-3 illustrates the algorithm for an ECOFF binary.¹ The text segment is extracted from the binary and compressed using bitmasking [10], which produces an auxiliary jump-mapping table, a dictionary, and a compressed text segment. This compressed text is then encrypted, and a new binary file is created from the encrypted text, the dictionary, the jump-mapping table, and the remaining segments of the original file.

¹ ECOFF and ELF are widely used formats for binary representation of application programs.

The creation of the encrypted binary is static, i.e., it is done offline. Execution of this binary, however, is dynamic. The encrypted text segment is kept in memory, and during the fetch of each instruction it has to be decrypted and decompressed. Dynamic decoding (decryption and decompression) involves a dedicated decoder, which fetches instruction blocks from memory, decodes them, and sends


back the decoded instruction to the cache or the fetch unit and stores the rest in its buffer. There is always a decompression overhead associated with the decoder unit, which can be minimized by pipelining.

Figure 4-4. Basic compression followed by encryption model

4.2.2 Performance Analysis

Figure 4-4 shows a basic system on which dynamic decoding is performed. The decoder sits between the processor and the memory, and individual instructions are fetched from memory. As the code is compressed, a single fetch of a block brings in more instructions on average. This reduces the total number of fetches from main memory, since a single fetch brings in more instructions in compressed form than in uncompressed form. Reducing the number of fetches should reduce the total number of execution cycles. However, decoding a block of compressed instructions takes some cycles, depending on the complexity of the compression and encryption algorithms. If the total cycles spent decoding the instructions is less than the fetch cycles saved by compression, we will see a speedup in execution.

The following equations give a basic mathematical model for the above analysis. We take the basic system shown in Figure 4-4 and assume a uniform compression ratio throughout the text segment. We consider a simple unpipelined model, which can be improved when pipelining is introduced.

Let
C = compression ratio of the text segment
M = cycles taken to fetch a word from memory
E = cycles taken to decrypt a word of encoded text


R = cycles taken to decompress a word of encoded text
N = total number of instruction words fetched during execution

Then the total cycles taken to fetch the code for an unencrypted and uncompressed binary would be

$$T_n = N \cdot M \tag{4-1}$$

The total cycles taken to fetch and decrypt the code for a binary that is only encrypted would be

$$T_e = N \cdot (M + E) \tag{4-2}$$

The total cycles taken to fetch, decrypt, and decompress code that is encrypted as well as compressed would be

$$T_{er} = C \cdot N \cdot (M + E + R) \tag{4-3}$$

Note that $T_n$, $T_e$, and $T_{er}$ do not include the cycles taken by the processor to execute the code, only those used in fetching the code to the processor. Equation 4-4 gives the ratio of non-executing cycles between encrypted-compressed text and regular text, and Equation 4-5 gives that ratio for encrypted-compressed text and only-encrypted text.

$$C_N = \frac{T_{er}}{T_n} = \frac{C \cdot (M + E + R)}{M} = C \left(1 + \frac{E + R}{M}\right) \tag{4-4}$$

$$C_E = \frac{T_{er}}{T_e} = \frac{C \cdot (M + E + R)}{M + E} = C \left(1 + \frac{R}{M + E}\right) \tag{4-5}$$

$C_E$ gives the effect of compression alone on the performance of an executed binary, while $C_N$ gives the absolute effect on performance that encrypting and compressing a binary would have. The goal is to make $C_N$ and $C_E$ as low as possible. The obvious way to do this is to have a lower compression ratio $C$ (better compression efficiency) and a low decompression latency $R$. We usually have to make tradeoffs when considering these two potentially conflicting requirements. For example, Huffman and arithmetic encoding give the best compression ratios, but their decompression proves to be very slow. Dictionary-based compression and selective bit-masking give a decent compression ratio with fast decompression.

Here we have assumed that the compression is uniform throughout the text segment. In a real scenario, uniformity or irregularity in compression would also affect $C_N$ and $C_E$. Some parts of the code may be compressed more tightly than others, and different parts of the code are fetched at different frequencies. If the parts that are fetched most often are compressed more tightly, performance will improve further.

Similarly, the encryption algorithm should be chosen in relation to $C$, $M$, and $R$ so as to keep $C_N$ small. However, the lower the computational complexity of the encryption algorithm, the less secure it would be. So the designer has to choose between security and speed. For example, AES will have a larger decryption latency than DES. Hence $C_N$ would be smaller for DES and $C_E$ would be smaller for AES, i.e., execution of encrypted and compressed code will be slower for AES than for DES, but the effect of compression would be more significant for AES. In both Equations 4-4 and 4-5, $M$ gives the cache miss latency, and as the equations suggest, a large $M$ would reduce the effect of the other latencies. That is, if $M$ is much larger than both $E$ and $R$, the decompression and decryption latencies would be negligible.
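To get a feel for the two ratios defined above, $C_N = C(1 + (E + R)/M)$ and $C_E = C(1 + R/(M + E))$, the following sketch evaluates them for illustrative parameter values. The numbers are assumptions chosen only for the example: $C = 0.6$, $M = 64$, and $R = 1$, with $E = 32$ standing in for a DES-like cipher and $E = 128$ for an AES-like one. The function name is hypothetical.

```python
def fetch_cycle_ratios(C, M, E, R):
    """Return (C_N, C_E) for the unpipelined fetch model.

    C: compression ratio (compressed size / original size)
    M: cycles to fetch a word from memory
    E: cycles to decrypt a word
    R: cycles to decompress a word
    The word count N appears in every total and cancels in both ratios.
    """
    t_n = M                    # per-word fetch cost, plain binary
    t_e = M + E                # encrypted only
    t_er = C * (M + E + R)     # encrypted and compressed
    return t_er / t_n, t_er / t_e


if __name__ == "__main__":
    for name, E in (("DES-like", 32), ("AES-like", 128)):
        c_n, c_e = fetch_cycle_ratios(C=0.6, M=64, E=E, R=1)
        print(f"{name}: C_N = {c_n:.3f}, C_E = {c_e:.3f}")
```

Under these assumed values, $C_N$ is smaller for the DES-like cipher while $C_E$ is slightly smaller for the AES-like one, in line with the discussion above. A value of $C_N$ above 1 means the fetch path is still slower than for the plain binary, even though compression recovers part of the decryption cost.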


4.2.3 Placement of Cache

So far we have discussed a generic system involving only a processor, decoder, and memory. A more realistic system would also involve caches. This gives us the opportunity to explore the different configurations that the system caches can have and their relative advantages and disadvantages.

First of all, the decryptor should always be placed before the cache. Placing it after the cache would mean that each instruction fetch invokes the decryptor, which would make the system extremely slow. Placing it before the cache causes it to be invoked only on a cache miss. As decompression can be fast, the placement of the decompressor is more flexible. Figure 4-5 shows a configuration where the decoder is placed between the cache and main memory. Here the job of the decoder is to both decrypt and decompress a block of code from memory and provide the cache with a block of regular code.

Figure 4-5. Processor-Cache-Decoder (PCD) architecture

The decoded instructions are sent back to the cache. In this scheme the granularity of decompression changes from one instruction to one cache block of instructions. The larger the cache block size, the more instructions are sent back to the cache per fetch. Also, the more compressed instructions are fetched for a single round of decompression, the better the performance improvement. A larger block size would not necessarily mean a reduction in the total number of fetches from the cache; that depends on the actual binary and the size of the basic blocks in the code.

The PCD architecture is similar to the simplistic model in Figure 4-4, except that the granularity has changed: instead of individual words, processing is done on cache blocks. A larger cache size would mean more hits and a lower frequency of


fetches. As that frequency drops with a larger cache size, so does the effect of the decoder unit, i.e., the total number of fetches from memory and the total number of blocks decoded will drop. So the difference between the number of fetches for compressed code and uncompressed code is smaller. In other words, a large cache reduces the significance of a reduced code size. However, as the unencrypted code sits in the cache, the total number of times blocks are decrypted decreases with a large cache. That means a lower decryption overhead. So when compared with the performance of a regular program (unencrypted and uncompressed) with the same cache size, there should be little difference in the performance ratio.

Figure 4-6 shows a processor-decompressor-cache-decryptor (PDCD) architecture. In this scheme the encrypted text is fetched from memory as blocks by the decryptor, which decrypts them and sends them to the cache.

Figure 4-6. Processor-Decompressor-Cache-Decryptor (PDCD) architecture

The decompressor fetches the compressed text from the cache and sends instruction words back to the processor. The decryptor and the decompressor cannot be placed together between the processor and the cache, as that would incur a large decryption overhead for each fetch from the processor. In the PCD architecture, the uncompressed instructions are kept in the cache, from where the processor fetches them. The cache size and the cache replacement method determine the miss rate of the cache system.


The advantage of PDCD over PCD is that, because the compressed text is kept in the cache, more instructions are effectively placed in the cache, i.e., the effective cache size increases, which reduces the miss ratio, which in turn reduces the number of fetches from main memory. However, as the processor fetches instructions directly from the decompressor, there is a decompression latency for each instruction fetch, and it is essential for the decompressor to be extremely fast.

4.3 Experiments

4.3.1 Experimental Setup

Our experiments were performed using the SimpleScalar performance simulator for the MIPS uniprocessor architecture. A selection of benchmarks from MediaBench and MiBench compiled for the Alpha ISA was used. The benchmark programs were epic, the cjpeg and djpeg image compression utilities, and the adpcm encode and decode voice compression programs. I added the decompressor and decryptor modules to SimpleScalar's sim-outorder. I kept a single direct-mapped instruction cache with a line size fixed at 16 bytes; fetching a cache line from main memory takes 64 cycles.

I used three compression techniques: bitmask-based compression, dictionary-based compression, and the dual compression described in Chapter 3. Dictionary- and bitmask-based compression use a single-cycle decompressor. The block ciphers DES and AES are used for encryption in ECB mode. DES uses a block size of 64 bits and AES a block size of 128 bits, with decryption latencies of 32 and 128 cycles, respectively. As the bitmask- and dictionary-based compression algorithms have a single-cycle decompression rate, we have used the PDCD architecture to integrate them with AES and DES. For dual compression we have also used PDCD; however, SFC decompression and decryption are done at the same stage.


4.3.2 Results

Figure 4-7 shows the compression ratios for the dual, bit-mask, and dictionary-based compression schemes. Bit-masking and dual compression have similar compression ratios, both better than that of the dictionary-based scheme.

Figure 4-7. Compression ratio for the various benchmarks

The benchmarks show an average increase in performance when binaries that are only encrypted are compared with those that are encrypted as well as compressed. The improvement is large when the cache size is small, as a greater number of cache misses is eliminated. Table 4-1 shows the average ratio of cycles relative to regular execution for the various combinations of AES and DES with the dual, bitmask, and dictionary-based compression schemes, as well as for uncompressed code.

Table 4-1. Average ratio of the number of cycles for a combination of the used encryption and compression methods

Compression     AES     DES
Uncompressed    4.35    2.29
Bitmask         2.79    1.69
Dictionary      3.47    1.77
Dual            2.50    1.36


Programs that are only encrypted take the most cycles, followed by those compressed using the dictionary-based scheme, the bitmask-based scheme, and dual compression, in that order. This result agrees with our hypothesis, as bitmasking gives a better compression ratio than dictionary-based compression. Also, cache utilization is best with dual compression, as the most frequently fetched instructions are the most tightly compressed. Programs encrypted with AES take more cycles than those encrypted with DES, as AES has a higher decryption latency.

Figure 4-8 and Figure 4-9 show the effect of caches on the execution performance of only-encrypted and encrypted-and-compressed code for DES and AES, respectively. As the cache size increases, the performance ratio between regular code and encrypted code decreases for both only-encrypted and encrypted-and-compressed code. This is because a larger cache means a lower miss ratio. A lower miss ratio means fewer fetches from main memory and fewer blocks to decode.

Figure 4-8. Performance ratios for DES for various cache sizes

Figure 4-10 shows the reduction in the number of cycles for various combinations of compression and encryption methods compared to the corresponding encryption method alone. The performance improvement is most noticeable for small caches. A small cache


means more fetches, hence the effect of compression would be more prominent.

Figure 4-9. Performance ratios for AES for various cache sizes

Figure 4-10. Ratio of execution cycles between compressed and uncompressed binaries


CHAPTER 5
CONCLUSION

Embedded systems are used everywhere, from day-to-day appliances to complex biomedical, military, and other scientific equipment. Such systems need to be efficient (in terms of area, power, and performance) as well as secure. This thesis described a novel dual code compression scheme. Code compression techniques can be used in embedded systems to improve either code size or performance. With my proposed scheme of dual compression, we can optimize code size and performance simultaneously.

Dual compression is split into two parts. Dynamic Frequency based Compression (DFC) improves performance by compressing the most frequently executed basic blocks. Static Frequency based Compression (SFC) exploits the most frequent static instructions and uses bit-mask based compression to reduce the code size. DFC compresses the original binary and provides a valid input for SFC, as the word boundaries are maintained. DFC may cause a minor size reduction. The dynamic decompression for DFC is done between the cache and the processor, and that for SFC is done between the cache and main memory. This way decompression is distributed, and cache and memory space are utilized efficiently. SFC itself causes a minor speedup, as fetching compressed code from main memory takes comparatively fewer cycles. Experimental results demonstrate that dual compression reduces cache misses significantly for small caches, produces an average speedup of 50%, and achieves compression ratios of 60-65%.

The second part of this thesis presented a synergistic scheme for combining encryption and code compression. While the former protects application programs from reverse engineering and malicious manipulation, the latter is used to minimize the code size and thus reduce the memory requirements. This thesis analyzed the sequence in which compression and encryption should be done and showed that it is useful to first compress the code and then encrypt it, as then it will


have reduced code for encryption and decryption. The thesis also analyzed the effect of various parameters on the performance of such a system and the effect of the cache. A complex encryption algorithm makes the effect of compression more prominent, and the performance increase is proportional to the compression ratio. Large memory access latencies diminish the impact of the decryption and decompression latencies, whereas a smaller access latency, as is the case with embedded systems, makes their effect more visible. With the introduction of caches, it is always more profitable to place the decryptor before the cache, as that reduces its invocation rate. A large cache translates into a lower miss ratio, which means fewer accesses to main memory and fewer invocations of the decryptor; thus, the decryption latency is less prominent. Finally, a Processor-Decompressor-Cache-Decryptor (PDCD) architecture would give better results than a Processor-Cache-Decoder (PCD) architecture for a fast decompressor, as in that case the cache would hold compressed instructions, thus effectively increasing its size.

Experimental results demonstrated that the execution time is indeed lower when encryption is combined with compression than when encryption is done alone. Compression algorithms that give a better compression ratio, such as bitmask-based over dictionary-based compression, gave better performance results. For example, bitmask-based compression gave a 17% reduction, whereas dictionary-based compression gave 13%, when combined with DES. Furthermore, dual compression makes the best use of both the memory and the cache and therefore gave the best performance result, a 40% reduction in execution cycles with DES. The performance improvement due to compression was more apparent for small caches, as more cache misses invoked the decryptor more frequently.


REFERENCES

[1] D. A. Huffman, "A method for the construction of minimum-redundancy codes," Proceedings of the Institute of Radio Engineers (IRE), vol. 40, no. 9, pp. 1098, September 1952.
[2] T. A. Welch, "A technique for high-performance data compression," Computer, vol. 17, no. 6, pp. 8, June 1984.
[3] A. Wolfe and A. Chanin, "Executing compressed programs on an embedded RISC architecture," in Proceedings of International Symposium on Microarchitecture (MICRO), 1992, pp. 81.
[4] H. Lekatsas and W. Wolf, "SAMC: A code compression algorithm for embedded processors," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 18, no. 12, pp. 1689, December 1999.
[5] S. Nam, I. Park and C. Kyung, "Improving dictionary-based code compression in VLIW architectures," IEICE Trans. Fundamentals, vol. E82-A, no. 11, pp. 2318, November 1999.
[6] S. Larin and T. Conte, "Compiler-driven cached code compression schemes for embedded ILP processors," in Proceedings of International Symposium on Microarchitecture (MICRO), 1999, pp. 82.
[7] Y. Xie, W. Wolf and H. Lekatsas, "Code compression for VLIW processors using variable-to-fixed coding," in Proceedings of International Symposium on System Synthesis (ISSS), 2002, pp. 138.
[8] C. Lin, Y. Xie and W. Wolf, "LZW-based code compression for VLIW embedded systems," in Proceedings of Design Automation and Test in Europe (DATE), 2004, pp. 76.
[9] D. Das, R. Kumar and P. P. Chakrabarti, "Dictionary based code compression for variable length instruction encodings," in Proceedings of VLSI Design, 2005, pp. 545.
[10] S. Seong and P. Mishra, "Bitmask-based code compression for embedded systems," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 27(4), pp. 673, April 2008.
[11] L. Benini, D. Bruni, A. Macii and E. Macii, "Hardware-assisted data compression for energy minimization in systems with embedded processors," in Proceedings of Design Automation and Test in Europe (DATE), 2002, pp. 449.
[12] H. Lekatsas, J. Henkel and V. Jakkula, "Design of an one-cycle decompression hardware for performance increase in embedded systems," in Proceedings of Design Automation Conference (DAC), 2002, pp. 34.


[13] E. Wanderley Netto, R. Azevedo, P. Centoducatte and G. Araujo, "Multi-profile based code compression," in 41st Design Automation Conference, 2004, pp. 244.
[14] M. Johnson, "On compressing encrypted data," IEEE Transactions on Signal Processing, vol. 52, pp. 2992, 2004.
[15] X. Ruan, "Using improved Shannon-Fano-Elias codes for data encryption," Information Theory, 2006 IEEE International Symposium on, pp. 1249, 2006.
[16] A. Orpaz and S. Weiss, "A study of CodePack: optimizing embedded code space," in CODES '02: Proceedings of the tenth international symposium on Hardware/software codesign. New York, NY, USA: ACM, 2002, pp. 103.
[17] C. Shaw, D. Chatterji, P. Maji, S. Sen, B. Roy and P. P. Chaudhuri, "A pipeline architecture for encompression (encryption + compression) technology," in Proceedings of International Conference on VLSI Design, 2003, p. 277.
[18] H. Lekatsas, J. Henkel, S. T. Chakradhar and V. Jakkula, "Cypress: compression and encryption of data and code for embedded multimedia systems," IEEE Design & Test of Computers, vol. 21, no. 5, pp. 406, Sep-Oct 2004.


BIOGRAPHICAL SKETCH

Kartik Shrivastava received his Bachelor of Technology in information technology from Malviya National Institute of Technology, India in 2008. He completed his Master of Science in computer engineering at the University of Florida in 2010. Since the summer of 2009, he has been working on code compression techniques for embedded systems at the Embedded Systems Laboratory, University of Florida.