<%BANNER%>

Multi-Query Optimization in the Datapath System

Permanent Link: http://ufdc.ufl.edu/UFE0025140/00001

Material Information

Title: Multi-Query Optimization in the Datapath System
Physical Description: 1 online resource (49 p.)
Language: english
Creator: Pansare, Niketan
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2009

Subjects

Subjects / Keywords: datapath, multi, optimization, query
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre: Computer Engineering thesis, M.S.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, terriorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: The performance of conventional databases on modern hardware is suboptimal as compared to the scientific and commercial applications. Recent studies have shown that this suboptimal performance is contributed to bad data placement and data retrieval strategies. Also, for typical analytic processing workload, most queries work on similar data and hence there is a huge potential to minimize the transfer of data through the memory hierarchy. In spite of this, the focus of current databases is on optimizing the computations and not the data. The Datapath system is a novel database that is implemented from the ground-up using a data-centric approach. In this thesis, I describe and evaluate a multi-query optimizer for the Datapath system. Unlike traditional multi-query optimizers that only try to overlap common sub-expressions, I propose an efficient optimization algorithm that minimizes the data (or the overall number of tuples) flowing through the system. Using this objective function, a qualitative and quantitative study is presented comparing the commonly used algorithms against the proposed multi-query optimization algorithm.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Niketan Pansare.
Thesis: Thesis (M.S.)--University of Florida, 2009.
Local: Adviser: Jermaine, Christopher.

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2009
System ID: UFE0025140:00001

Permanent Link: http://ufdc.ufl.edu/UFE0025140/00001

Material Information

Title: Multi-Query Optimization in the Datapath System
Physical Description: 1 online resource (49 p.)
Language: english
Creator: Pansare, Niketan
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2009

Subjects

Subjects / Keywords: datapath, multi, optimization, query
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre: Computer Engineering thesis, M.S.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, terriorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: The performance of conventional databases on modern hardware is suboptimal as compared to the scientific and commercial applications. Recent studies have shown that this suboptimal performance is contributed to bad data placement and data retrieval strategies. Also, for typical analytic processing workload, most queries work on similar data and hence there is a huge potential to minimize the transfer of data through the memory hierarchy. In spite of this, the focus of current databases is on optimizing the computations and not the data. The Datapath system is a novel database that is implemented from the ground-up using a data-centric approach. In this thesis, I describe and evaluate a multi-query optimizer for the Datapath system. Unlike traditional multi-query optimizers that only try to overlap common sub-expressions, I propose an efficient optimization algorithm that minimizes the data (or the overall number of tuples) flowing through the system. Using this objective function, a qualitative and quantitative study is presented comparing the commonly used algorithms against the proposed multi-query optimization algorithm.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Niketan Pansare.
Thesis: Thesis (M.S.)--University of Florida, 2009.
Local: Adviser: Jermaine, Christopher.

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2009
System ID: UFE0025140:00001


This item has the following downloads:


Full Text

PAGE 1

1

PAGE 2

2

PAGE 3

3

PAGE 4

ThanksgoouttoChristopherJermaine,AlinDobra,SubiArumugam,RaviJampaniandLuisPerez. 4

PAGE 5

Page ACKNOWLEDGMENTS ................................. 4 LISTOFTABLES ..................................... 7 LISTOFFIGURES .................................... 8 ABSTRACT ........................................ 9 CHAPTER 1INTRODUCTION .................................. 10 1.1Compute-CentricSystem ............................ 10 1.2Data-CentricSystem .............................. 10 1.3Compute-CentricVersusData-Centric .................... 11 1.4ProblemStatement ............................... 14 2RELATEDWORK .................................. 17 3OVERVIEW ..................................... 19 4DESIGN ........................................ 22 4.1TheNetworkIntegratorClass ......................... 22 4.2TheEnumeratorMethod ............................ 22 4.3TheCostFunction ............................... 23 4.4TheSearchStrategy .............................. 23 5IMPLEMENTATION ................................. 26 5.1Types ...................................... 26 5.2Enumeration .................................. 27 5.3MappingRules ................................. 28 5.4BypassableRules ................................ 29 5.5TheCostFunction ............................... 31 5.6TheSearchStrategy .............................. 33 6EXPERIMENTALRESULTS ............................ 35 6.1Goal ....................................... 35 6.2Setup ...................................... 35 6.3ExperimentalResults .............................. 37 6.4Analysis ..................................... 37 7CONCLUSION .................................... 45 8FUTUREWORK ................................... 46 5

PAGE 6

....................................... 47 BIOGRAPHICALSKETCH ................................ 49 6

PAGE 7

Table Page 5-1SelectivityFactor ................................... 33 6-1CostandTimetakenbyeachalgorithm ...................... 39 7

PAGE 8

Figure Page 1-1Queryplansfortraditionaldatabases ........................ 12 1-2PathNetworkafterquery1 ............................. 13 1-3PathNetworkafterquery2 ............................. 13 1-4PathNetworkafterquery3 ............................. 14 5-1Pathnetworkbeforebypassing ........................... 29 5-2Finalpathnetworkwithbypassing ......................... 30 5-3Finalpathnetworkwithoutbypassing ....................... 30 5-4Examplepathnetworkforbypassing ........................ 30 6-1Frameworkfortestingdierentqueryoptimizationtechniques .......... 36 6-2ComparisonofFIFOwithotherselectors ...................... 37 6-3ComparisonofRandomselectorwiththecostbasedselectors ........... 38 6-4Comparisonofthecostbasedselectors ....................... 38 6-5Comparisonofthetheaveragetimetakenbytheselectors ............ 40 6-6Pathnetworkafterquery11 ............................. 42 6-7Pathnetworkafterquery5forwaypoint-countselector .............. 43 6-8Pathnetworkafterquery5forcost-basedselector ................. 43 8

PAGE 9

9

PAGE 10

17 19 23 ]aresuggestedtoimprovethememoryperformanceofthedatabases,buttheycanonlyperformasgoodasthedataexpectedbythecomputations.Thisemphasisonthecomputationisill-suitedforthedatabasesduetolargeamountofdatatheyhandle.Second,thegapbetweenthetimetakentopushthedatathroughmemoryhierarchyandthetimetakentoperformthecomputationonithasbeenincreasinginthepastyears.Thisproblemhasbeenfurtheraggravatedwiththeadventofmulti-coreprocessorsandhencetransferringthedatathroughmemoryhierarchyisbecomingabottleneckforthemoderndatabases[ 2 ].Inspiteofthisbottleneck,computationsstilldrivethedataincurrentdatabasesystems. 10

PAGE 11

21 ]suggestedbythequeryprocessor.Toputitsimply,inData-centricsystem,datadrivesthecomputation;whileintraditionaldatabasesystems,computationdrivesthedata. 11

PAGE 12

1-1 )forthesethreequeries.Thisistrueevenformostmulti-queryoptimizers,sincethesequerieshavenocommonsub-expressions[ 22 ].Thegure 1-1 ignoresthephysicaloperatorssuchasindex,sort,etc. Figure1-1. Queryplansfortraditionaldatabases TheJoin1operatorwillhavethecodetofetchthedata(GetNextmethod)andalsotoperformthecomputationonit.Thismeansthatthesamedata(fromthenationtable)isbroughttwiceontothecache;rstforJoin1andthenforJoin2.Thoughsomedatabasestrytoalleviatethisproblembyusingmulti-queryoptimizersandmaterializedviews,itdoesnotsolvetheproblem. 12

PAGE 13

1-2 Figure1-2. PathNetworkafterquery1 Thepathmanagerwillnowtrytooverlapquery2toreducetheowofdatainthesystem.Thegure 1-3 showsthenewpathnetwork,whereJoin1andJoin2aremergedtogethertoformJoin1-2. Figure1-3. PathNetworkafterquery2 13

PAGE 14

1-4 showsapathnetworkafterintegratingquery3intotheexistingpathnetworkshowninthegure 1-3 .Noticethattheselectionwaypointofquery1actsasabypasswaypoint1forquery3. Figure1-4. PathNetworkafterquery3 Theplangeneratedbytraditionaldatabase 1-1 hasmoredatapaths.Thismeansthatthereismuchmoredatabeingtransferredthaninthepathnetwork.Clearly,mosttraditionalqueryoptimizersarenotanidealchoicefortheDatapathsystem.Multi-queryoptimizers(MQO)trytoalleviatethisproblembysharingtheresultofcommonsub-expressionsbetweenqueries[ 22 ].TheconstraintsonthetypeofdataintheDatapathsystemaremorerelaxedthanwhatmostMQOassume,hencemakingtheproblemalittledierentfrommulti-queryoptimization.Also,multi-queryoptimizersliketraditionalqueryoptimizersfocusonoptimizingthecomputations,whileignoringthedatapaths.Therefore,traditionalmulti-queryoptimizersarealsonotsuitablefortheDatapathsystem. 14

PAGE 15

1. 2. 1. 2. 15

PAGE 16

13 ]andhenceexhaustivesolutionisnotfeasible.Forsimplestcasewheretherearenoqueriesinthesystem,ourproblembecomesatraditionalqueryoptimizationproblem.Hence,weusethestrategythatlimitsthesearchspacebyperformingalook-aheadsearchratherthanexhaustivesearchintheSearchcomponent.Thiswillbediscussedindepthinthechapter 5 16

PAGE 17

21 ]laidthefoundationforoptimizingsinglequeriesinthedatabasesystem.Mostqueryoptimizersuseacostmodeltosearchthroughthesearchspacedeterminedbytheirsearchstrategies.Varioussearchstrategieshavebeenproposedforsinglequeryoptimization[ 4 7 10 14 { 16 27 ].Moreover,dierentqueryoptimizationschemeswereproposedtoachievedierentoptimizationgoals,namelyminimizingresponsetimeoftheinputquery,minimizingthememoryusage,maximizingthethroughputofthesystem,etc.Mostsinglequeryoptimizersfocusontryingtominimizetheresponsetimeofinputquery,whereasmulti-queryoptimizers[ 22 ]trytoimprovethethroughputofthesystem.Insteadofoptimizingeachqueryindependently,multi-queryoptimizerstrytooptimizetheglobalqueryplanthatrepresentsallthequeriesinthesystemtoexploitcommonsub-expressionsinmultiplequeries.Amultiple-querygraphisgenerallyusedtorepresentthisglobalqueryplan[ 3 18 ].Sellis[ 22 ]provedthatmulti-queryoptimizationwouldleadtosubstantialsavingsoversinglequeryoptimization.Sincemulti-queryoptimizationisaNP-hardproblem,Sellis[ 22 ]suggestedusinganAsearchdirectedbyaheuristicfunctionratherthananexhaustivesolution.Later,thisheuristicfunctionwasreplacedbyamoreinformedcostfunctionwhichimprovedtheperformanceoftheoptimizer[ 24 ].Royetal.[ 20 ]suggestedagreedyheuristicalgorithmthattriedtomaximizesharingbymaterializingsomepartialresultsonthedisk.Dalvietal.[ 6 ]extendedthisalgorithmbyusingpipeliningtoreducethecostofmaterialization.TorosluandCosar[ 26 ]proposedadynamicprogrammingschemeformulti-queryoptimizers.Mostmulti-queryoptimizerstrytooverlaponlycommonsub-expressionsinmultiplequeries.Hall[ 11 ]suggesteddetectingcommonsub-expressionwithinsinglequery.ChenandDunham[ 5 ]allowforpartialoverlapofselectionpredicatesbyleavingallprojectionoperationstothenalstages.Theyarguethatpushingprojectionsupisbadfornestedloopjoinbutgoodforhashjoin[ 9 ]. 17

PAGE 18

11 ]suggestsevaluatingcommonsub-expressionasapre-processingstep;whereasSubramanianandVenkataraman[ 25 ]suggestsitasapost-processingstepoftraditionalqueryoptimization.ThiswouldallowtheMQOtechniquestobeintegratedwiththeexistingqueryoptimizersandhenceprovideapracticalsolution.Royetal.[ 20 ]alsoprovideapracticalalgorithmbymodifyingtheVolcanosearchstrategy[ 10 ].LiketheDatapathsystem,theStagedDBsystemfocusofsharingtheaccesstothedataandnotthecomputation.Boththesystemsgroupthecomputations(ortheexecutionrequests)ofdierentqueriesthatsharethesamedata.TheStagedDBusesthestagestogroupthecomputations,whereastheDatapathsystemusesthewaypoints.So,theoptimizeroftheStagedDBisexpectedtosolvethesimilarproblem(ifnotthesameproblem)asthePathOptimizer.However,thedecisionofsharingthedataispusheddowntotheexecutionengine.TheexecutionengineoftheStagedDBsystemtakesmostofthedecisionsbymonitoringeachrelationaloperatorsorthestagestodetectanoverlap.ThismakessharingofthedataopportunisticintheStagedDBsystem.Asaresult,theoptimizerfortheStagedDBissimilartotraditionalqueryoptimizers[ 12 ].Also,thelevelofsharingsupportedbytheexecutionengineoftheStagedDBsystemislessascomparedtothatoftheDatapathsystem.Thecooperativescans[ 28 ]alsosharethedatainconcurrentscans.Thisisanalogoustosharingofthetable-scansintheDatapathsystem.Apartfromthetablescans,thecooperativescansdonotsupportanysharingofthedata.Inessence,thecooperativescansonlytrytominimizethediskaccessandnottheaccessestothecache.ThoughbothcooperativescansandtheStagedDBsystemfocustosomeextentonsharingofthedata(ratherthancomputations),theydonotfullyexploitthelevelofdata-sharingasascomparedtotheDatapathsystem. 18

PAGE 19

19

PAGE 20

6 ).TheQueryPlannerconsistofthreemaincomponents,namelytheParser,thePathOptimizerandtheTranslator.TheParsergetsaSQLqueryandperformstype-checkingandothervalidations.Ifthequeryisvalid,itforwardsthequerytothePathOptimizer.ThePathOptimizerrsttransformsthequeryintoagraphcalledasquerydescription.Thequerydescriptioncontainsnoinformationabouttheorderingofjoins.Asdiscussedearlier,thepathnetworkisagraphthatrepresentstheoverallexecutionplanofallthequeriesinthesystemortheglobalqueryplan.ThePathOptimizerthentriestointegratethequerydescriptionontothepathnetwork.Itdoessoincrementallybyconsideringonepredicateatatimefromthequerydescriptionandtryingtointegrateitontothepathnetwork.Itisimportanttonotethatthisintegrationisnon-destructive.Thismeansthattheedgesintheexistingpathnetworkarenotmodied.Thedetailsofthisalgorithmwillbediscussedlater.ThePathOptimizerusesanobjectcalledthenetworkintegratortomaintainthestateofthealgorithm.Thenetworkintegratorobjectcontainsapartiallyintegratedpathnetworkandapartialquerydescription.ThenalstateofthePathOptimizerisanetworkintegratorobjectthatcontainsafullyintegratedpathnetworkandanemptyquerydescription.Tosummarize: 1. Data-centricqueryoptimizationisdierentthancompute-centricqueryoptimization. 2. Thisthesisusesadata-centricapproachtomulti-queryoptimization. 20

PAGE 21

ThegoalofthePathOptimizeristominimizethenumberoftuplesinthepathnetwork. 4. Theproposedalgorithm(whichwillbediscussedindepthlater)isincremental,non-destructive,non-exhaustiveandmodular(toseperatedierentaspectsofqueryoptimization). 21

PAGE 22

1. Theexistingpathnetwork 2. Arepresentationofthenewquerythatistobeintegratedintothenetwork(Querydescription)Thejobofthisclassistointegratethenewqueryintothepathnetwork.However,forreasonsthatIwilldiscusssubsequently,thisclassdoesnotencodeanynotionof"search".Infact,itisquiteunintelligent.Allthisclassdoesistoprovidethemachinerynecessarytointegratethequeryintothenetwork:theclassdoesnotguidetheintegrationinanyway.Thatisdoneviaanexternalalgorithmthatmakesuseoftheclass.TheNetworkIntegratorclassworksasfollows.Atalltimes,aninstanceofthisclasscontainsacertain"stateofintegration".Initially,aftertheconstructoriscalled,thenewqueryistotallyseparatefromtheexistingpathnetworkinsideoftheNetworkIntegratorobject.Thus,intially,thetwoaretotallyun-integrated.Eventually,thequeryandthenetworkwillbetotallyintegrated,inwhichcasetheinstanceencapsulatesavalidpathnetworkthattotallycontainsthenewqueryandcouldbedirectlyexecutedbythesystem.AninstanceoftheNetworkIntegratorclassmayalsoholdanintermediatelevelofintegration,wherethenewqueryisonlypartiallyintegratedintotheexistingpathnetwork. 22

PAGE 23

23

PAGE 24

24

PAGE 26

1. 2. 3. 4. 26

PAGE 27

1. 5.3 2. 5.4 .Ifthepredicatesarebypassable,thenanewpredicatePBypassable,itcreatedandaddedtothewaypointWPN.ThePathOptimizeralsorecursivelychecksforthebypassableparentsandaddsthebypasspredicatestothem. 3. 27

PAGE 28

1 ). networkintegratorobject(PN,QD) listofnetworkintegratorobjectsLetPN=InputPathNetwork;andQD=InputQueryDescription;andreturnList=listofnetworkintegratorobjectstobereturned(initiallyempty);foreachpredicatePQDinthequerydescriptionQDdo foreachwaypointWPNinthepathnetworkPNdo 5.3MappingRulesThetwopredicatesP1andP2canbemappediftheysatisfyfollowingcriteria: 28

PAGE 29

Boththepredicatesareofthesametype.Forexample,ifP1isaselectionpredicateandP2isajoinpredicate,theycannotbemapped. 2. Ifboththepredicatesaretablescanorselectionpredicatesandworkonthesametable,thentheycanbemappedelsetheycannotbemapped.Forexample,ifP1ispredicateoftheformnation.n name='US'andP2ispredicateoftheformorders.o orderstatus='F',thentheycannotbemappedbecausetheyhavedierenttables. 3. Ifboththepredicatesarejoinpredicates,thelefthandsidetableandattributeofthepredicateP1shouldbesameaseitherleftorrighthandsidetableandattributeofthepredicateP2.Forexample,thepredicatelineitem.l suppkey=supplier.s suppkeycanbemappedontothepredicatelineitem.l suppkey=partsupp.ps suppkey. 4. Tokeepthealgorithmsimple,thetoppredicatesarenotmappable. 5-1 ,thathasthejoinoflineitemandordersfollowedbythejoinoflineitemandsupplier. Figure5-1. Pathnetworkbeforebypassing Say,ifthenewqueryisthejoinoflineitemandpartsupp,thenthegure 5-2 showsthepathnetworkwithbypasswaypointswhereasthegure 5-3 showsthepathnetworkwithoutbypassing.Clearly,theformerpathnetworkhaslessdataowthanthelatterpathnetwork. 29

PAGE 30

Finalpathnetworkwithbypassing Figure5-3. Finalpathnetworkwithoutbypassing Figure5-4. Examplepathnetworkforbypassing 30

PAGE 31

5-4 .OnlyaquerywithjoinofTbl1andTbl5canbebypassedforJoinF.Allothertablesarerighthandsidetablesforatleastonejoin.ThoughTbl3isonlefthandsideofJoinB,butitisonrighthandsideofJoinD,henceitcannotbeconsiderforbypassing. 31

PAGE 32

2 ). ANetworkIntegratorobject AnIntegerCostLetPN=InputPathNetworkandQD=InputQueryDescription;LetFull-PN=Mini-Search(PN,QD)andH=Hashtblof(Waypoint,Flow);foreachwaypointWinFull-PNdo ifWaypointknotpresentinHthen foreachpredicatePinthewaypointWdo foreachpredicatePinthewaypointWdo 32

PAGE 33

21 ]and[ 8 ].Thetable 5-1 givestheselectivityfactorforvariouscases2. Table5-1. SelectivityFactor Typeofpredicate Condition Selectivityfactor Selection Selection 3 Join 5-1 ,ValueCount(R.A)meansnumberofdistinctvaluesofattributeAinrelationR. 33

PAGE 34

3 ). ListofNetworkIntegratorobjects:lni AfullyintegratedpathnetworkforeachPathnetworkPNinlnido ifIsPNfullyintegratedthen

PAGE 35

1. 2. 3. 35

PAGE 36

6-1 showsdierentmodulesoftheframeworkandtheirinterfaces. Figure6-1. Frameworkfortestingdierentqueryoptimizationtechniques ThePathOptimizeristestedon8TPC-Hqueries1.ThesequeriesarerandomlyshuedandareincrementallygiventothePathOptimizer.Thesamesequenceofqueriesarealsogiventodierentcombinationsofthesearchalgorithms,selectorfunctionsandthecostfunctions.Theframeworkistestedontenrandominputorderingsandthecostofnalpathnetworkandalsothetimetakenbyeachalgorithmisrecordedintoa 36

PAGE 37

1 ]. 6-1 showsthecostandthetimetakenbyeachalgorithm.Thegure 6-2 comparestheaveragecostoftheFIFOselectorwiththatofotherselectors. Figure6-2. ComparisonofFIFOwithotherselectors Thegure 6-3 comparestheaveragecostoftheRandomselectorwiththeaveragecostofthecost-basedselectors.Thegure 6-4 comparestheaveragecostofthecost-basedselectors.Thegure 6-5 comparestheaveragetimetakenbyalltheselectors. 37

PAGE 38

ComparisonofRandomselectorwiththecostbasedselectors Figure6-4. Comparisonofthecostbasedselectors 38

PAGE 39

CostandTimetakenbyeachalgorithm Search Selector Avgcost Maxcost Mincost Avgtime Maxtime Mintime Exhaus-tive Costbasedwithmini-search 3308260720 4339400090 2146320100 180.13 312.22 80.91 Exhaus-tive Costbasedwithoutmini-search 3449952840 4339400095 2912333459 162.5 312.27 85.99 Exhaus-tive Waypointcount 4395383130 8972215000 1440000400 25.903 73.17 3.37 Exhaus-tive FIFO 2.05728E+12 8.10005E+12 1440000555 33.904 104.25 4.17 Exhaus-tive Random 14768863045 74218667890 1520010980 41.712 119.29 3.98 Greedy Costbasedwithmini-search 3746172230 4769866765 3348026765 36.827 74.72 10.85 Greedy Costbasedwithoutmini-search 3507063135 3640986771 3251026765 36.997 75.28 10.87 Greedy Waypointcount 22552508025 65898400180 2572666700 0.493 0.56 0.305 Greedy FIFO 2.96523E+12 1.60001E+13 4.91044E+11 0.692 0.8 0.407 Greedy Random 64181229885 94817668050 16000632860 0.643 0.84 0.409 Look-ahead Costbasedwithmini-search 3385487380 4470200090 2629520100 71.268 109.94 33.27 Look-ahead Costbasedwithoutmini-search 3444960100 4554400090 2713720100 71.061 110.68 33.18 Look-ahead Waypointcount 15058316550 32314505050 8640000000 0.426 0.52 0.281 Look-ahead FIFO 2.23372E+12 9.6E+12 6480604050 0.536 0.63 0.322 Look-ahead Random 45932428745 79600156560 28000056210 0.612 0.69 0.372 look-aheaddepthof1issucientinmostcasesandperformsasgoodasthedepthof2or3.Thetimetakenbyanalgorithmdependsuponthenumberofnetworkintegratorobjectitenumerates.Hence,exhaustivesearchtakesalotmoretimethanthelook-ahead 39

PAGE 40

Comparisonofthetheaveragetimetakenbytheselectors orgreedysearch.Thoughgreedysearchtakeslesstime,itdoesnotperformasgoodasthelook-aheadsearch(seetable 6-1 ).Therefore,theproposedalgorithmusesthelook-aheadsearch.Also,thecostingofanetworkintegratorobjectisatime-consumingoperation.Infact,timetakenbyblind-selectorsandexhaustivesearchisalmostequaltothetimetakenbythedata-centriccost-basedselectorsusingagreedysearch.Thoughcostbasedselectorstakemoretimethantheblindselectors,theyusuallyprovidethepathnetworkwithordersofmagnitudelessnumberoftuplesthantheblindselectors.Duetotheorderingoffoldingfunction,theFIFOselectortriestoselectthepathnetworkwithextrawaypoints.Infact,FIFOactsasasinglequeryoptimizerbecauseitalwaystriestointroducenewowsinthenetworkandhenceproducesworstresults.Thegure 6-2 showsthatmappingwaypointsprovidessignicantgainoversinglequeryoptimization.RandomandWaypointCount(compute-centric)costfunctionarebothbad.However,compute-centric(orthewaypointcount)cost-basedselectorperformswellforexhaustivesearch(butnotbetterthandata-centricfunction).Thisisbecause 40

PAGE 41

6-4 ).Hence,theproposedalgorithmusesthedata-centriccostfunctionratherthanblindselectorsorcompute-centric(orthewaypointcount)cost-basedselector.Theaboveresultsshowthatstatisticallydata-centriccost-basedselectorperformsbetterthanwaypoint-countcostbasedselector.Theexamplebelowexplainsthereasonforthisbehaviour.ConsiderthetestcasewhereTPC-Hquery11istherstqueryandTPC-Hquery5isthesecondquery.2 Query5:select*fromcustomer,orders,lineitem,supplier,nation,regionwherec_custkey=o_custkeyandl_orderkey=o_orderkeyandl_suppkey=s_suppkeyandc_nationkey=s_nationkeyands_nationkey=n_nationkeyandn_regionkey=r_regionkey 41

PAGE 42

6-6 .Butwhenquery5isintegratedontothepathnetworkwithquery11,thepathnetworkgeneratedbywaypoint-countselectorisshowninthegure 6-7 andthatgeneratedbythedata-centricselectorisshowninthegure 6-8 6-7 .Inthepathnetwork 6-7 ,ordersandcustomertablesarejoinedafterlineitem.Itisclearthathavingthisjoinlowerdownthequeryplanisabetterchoiceasitproduceslessow.Thedata-centricselectoriscognizantofthisfactandhencechoses 6-8 .Also,theaboveexperimentalresultsattestthatthepathnetworkselectedbydata-centricselectorhaslowerowthanthewaypoint-countselector. Figure6-6. Pathnetworkafterquery11 Mini-searchperformswellforexhaustiveandlook-aheadbutnotforgreedy.Thisisbecauseittriestopredictthefuturejoinorderinganddoesnotsimplyjointhesmaller 42

PAGE 43

Pathnetworkafterquery5forwaypoint-countselector Figure6-8. Pathnetworkafterquery5forcost-basedselector 43

PAGE 44

44

PAGE 45

45

PAGE 46

46

PAGE 47

[1] Graphviz-graphvisualizationsoftware. [2] A.Ailamaki,D.J.Dewitt,M.D.Hill,andD.A.Wood.Dbmssonamodernprocessor:Wheredoestimego?InVLDB'99:Proceedingsofthe25thInternationalConferenceonVeryLargeDataBases,pages266{277,SanFrancisco,CA,USA,1999.MorganKaufmannPublishersInc. [3] U.S.ChakravarthyandJ.Minker.Multiplequeryprocessingindeductivedatabasesusingquerygraphs.InVLDB'86:Proceedingsofthe12thInternationalConferenceonVeryLargeDataBases,pages384{391,SanFrancisco,CA,USA,1986.MorganKaufmannPublishersInc. [4] S.Chaudhuri.Anoverviewofqueryoptimizationinrelationalsystems.InPODS'98:ProceedingsoftheseventeenthACMSIGACT-SIGMOD-SIGARTsymposiumonPrinciplesofdatabasesystems,pages34{43,NewYork,NY,USA,1998.ACM. [5] F.-C.F.ChenandM.H.Dunham.Commonsubexpressionprocessinginmultiple-queryprocessing.IEEETrans.onKnowl.andDataEng.,10(3):493{499,1998. [6] N.N.Dalvi,S.K.Sanghai,P.Roy,andS.Sudarshan.Pipelininginmulti-queryoptimization.InPODS'01:ProceedingsofthetwentiethACMSIGMOD-SIGACT-SIGARTsymposiumonPrinciplesofdatabasesystems,pages59{70,NewYork,NY,USA,2001.ACM. [7] A.Deshpande,Z.Ives,andV.Raman.Adaptivequeryprocessing.Found.Trendsdatabases,1(1):1{140,2007. [8] H.Garcia-Molina,J.Ullman,andJ.Widom.DatabaseSystems:TheCompleteBook.PearsonEducation,2008. [9] G.Graefe.Sort-merge-join:Anideawhosetimehas(h)passed?InProceedingsoftheTenthInternationalConferenceonDataEngineering,pages406{417,Washington,DC,USA,1994.IEEEComputerSociety. [10] G.GraefeandW.J.McKenna.Thevolcanooptimizergenerator:Extensibilityandecientsearch.InProceedingsoftheNinthInternationalConferenceonDataEngineering,pages209{218,Washington,DC,USA,1993.IEEEComputerSociety. [11] P.Hall.Optimizationofasinglerelationalexpressioninarelationaldatabasesystem.IBMJournalofResearchandDevelopment,pages244{257,1976. [12] S.HarizopoulosandA.Ailamaki.Acaseforstageddatabasesystems.InInProceedingsof1stConferenceonInnovativeDataSystemsResearch,2003. [13] T.IbarakiandT.Kameda.Ontheoptimalnestingorderforcomputingn-relationaljoins.ACMTrans.DatabaseSyst.,9(3):482{502,1984. 47

PAGE 48

Y.E.Ioannidis.Queryoptimization.ACMComput.Surv.,28(1):121{123,1996. [15] M.JarkeandJ.Koch.Queryoptimizationindatabasesystems.ACMComput.Surv.,16(2):111{152,1984. [16] Y.C.Kang.Randomizedalgorithmsforqueryoptimization.PhDthesis,Madison,WI,USA,1991. [17] P.-A.LarsonandG.Graefe.Memorymanagementduringrungenerationinexternalsorting.InSIGMOD'98:Proceedingsofthe1998ACMSIGMODinternationalconferenceonManagementofdata,pages472{483,NewYork,NY,USA,1998.ACM. [18] T.Neumann.EcientGenerationandExecutionofDAG-StructuredQueryGraphs.PhDthesis,2005. [19] C.Nyberg,T.Barclay,Z.Cvetanovic,J.Gray,andD.Lomet.Alphasort:acache-sensitiveparallelexternalsort.TheVLDBJournal,4(4):603{628,1995. [20] P.Roy,S.Seshadri,S.Sudarshan,andS.Bhobe.Ecientandextensiblealgorithmsformultiqueryoptimization.SIGMODRec.,29(2):249{260,2000. [21] P.G.Selinger,M.M.Astrahan,D.D.Chamberlin,I.A.Lorie,andT.G.Price.Accesspathselectioninarelationaldatabasemanagementsystem.pages23{34,1979. [22] T.K.Sellis.Multiple-queryoptimization.ACMTransactionsonDatabaseSystems,13:23{52,1988. [23] A.Shatdal,C.Kant,andJ.F.Naughton.Cacheconsciousalgorithmsforrelationalqueryprocessing.InVLDB'94:Proceedingsofthe20thInternationalConferenceonVeryLargeDataBases,pages510{521,SanFrancisco,CA,USA,1994.MorganKaufmannPublishersInc. [24] K.Shim,T.Sellis,andD.Nau.Improvementsonaheuristicalgorithmformultiple-queryoptimization.DataKnowl.Eng.,12(2):197{222,1994. [25] S.N.SubramanianandS.Venkataraman.Cost-basedoptimizationofdecisionsupportqueriesusingtransient-views.InACMSIGMODConf,pages319{330,1998. [26] I.H.TorosluandA.Cosar.Dynamicprogrammingsolutionformultiplequeryoptimizationproblem.Inf.Process.Lett.,92(3):149{155,2004. [27] L.B.WarshawandD.P.Miranker.Rule-basedqueryoptimization,revisited.InACMInternationalConferenceonInformationandKnowledgeManagement(CIKM),pages267{275.ACMPress,1999. [28] M.Zukowski,S.Heman,N.Nes,andP.Boncz.Cooperativescans:dynamicbandwidthsharinginadbms.InVLDB'07:Proceedingsofthe33rdinternationalconferenceonVerylargedatabases,pages723{734.VLDBEndowment,2007. 48

PAGE 49

NiketanR.PansarereceivedhisBachelorofEngineeringdegreeinInformationTechnologyfromVeermataJijabaiInstituteofTechnologyin2006.HethenreceivedhisMasterofSciencedegreeinComputerEngineeringfromtheUniversityofFloridainFall2009.HisprimaryresearchisfocusedonDatabaseandMachineLearning. 49