
BigCube

Permanent Link: http://ufdc.ufl.edu/UFE0043650/00001

Material Information

Title: BigCube A User-Centric Modeling Paradigm with Multidimensional Data Types and Operations Supporting Complex Spatial Objects in Data Warehouses
Physical Description: 1 online resource (222 p.)
Language: english
Creator: Viswanathan, Ganesh
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2011

Subjects

Subjects / Keywords: bigcube -- solap -- spatial -- warehousing
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre: Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: The primary goal of the proposed research is to enable end-users to effectively design and analyze complex structured spatial data using data warehouses. This topic is inspired by increasing requests from data analysts to help create systems capable of storing both simple and complex structured scientific data, supporting multidimensional operations and aggregations to analyze them. This work contains four main parts: an abstract data model called BigCube that enables conceptual data warehouse design with in-built support for complex spatial objects along data hierarchies and an associated BigCube Algebra that provides an extensible set of analysis operators, a query interface for users called the Cube Analysis Language (CAL), a set of new spatial aggregation and query operations to support complex OLAP, and a novel data structure called Intelligent Binary Large Object (iBLOB) that leverages databases to provide efficient support for storing and querying large, multi-structured hierarchical objects. Data warehouses have traditionally been at the forefront of information technology applications as a way for organizations to effectively use information for business planning and decision making. The large increase in the availability of spatial data in recent years has led to increased challenges in storing and analyzing such data. This work introduces a high-level user-view called BigCube that enables analysts to integrate and query on complex structured data. CAL queries are translated directly to the underlying logical model such as MDX or SQL implementations. New spatial operators such as cardinal direction relations are defined as part of the BigCube algebra to demonstrate the extensible query capabilities of this approach.
Additionally, in order to improve the database storage and query performance of large structured objects, we introduce the in-database iBLOB, which indexes and retrieves exact sub-components of complex objects and outperforms XML or BLOB-based storage in experiments. Overall, this work supports applications such as decision support, weather event research, GIS aggregations and biological data analysis by providing analyst-friendly tools to consolidate and query on complex structured datasets.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Ganesh Viswanathan.
Thesis: Thesis (Ph.D.)--University of Florida, 2011.
Local: Adviser: Schneider, Markus.
Electronic Access: RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2014-12-31

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2011
System ID: UFE0043650:00001




Full Text

PAGE 1

BIGCUBE: A USER-CENTRIC MODELING PARADIGM WITH MULTIDIMENSIONAL DATA TYPES AND OPERATIONS SUPPORTING COMPLEX SPATIAL OBJECTS IN DATA WAREHOUSES

By GANESH VISWANATHAN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2011

PAGE 2

© 2011 Ganesh Viswanathan

PAGE 3

To Amma and Appa

PAGE 4

ACKNOWLEDGMENTS

It is a pleasure to thank the many people who made this dissertation possible. I am indebted in the preparation of this thesis to Dr. Markus Schneider, my PhD advisor. I fondly remember my first days at UF when he introduced me to moving objects databases and we talked about the various aspects of spatial data engineering. Over the past several years, after much interesting work, this area has now become my specialty and passion. His teachings, sound advice, encouragement and positive spirit have helped me in pushing through many challenges and completing this work. I am very thankful for his ideas and good company during this important phase of my life.

I thank Dr. Alin Dobra, Dr. Sumi Helal, Dr. Sanjay Ranka and Dr. Gary Koehler for serving on my supervisory committee, supporting me and providing positive encouragement during my entire time in graduate school. I am thankful to our graduate advisors John Bowers, Joan Crisman, and to Keri Taylor in the CISE Department at UF for supporting me and helping to track my progress.

Thanks to Reid Porter at Los Alamos National Laboratory for introducing me to several new and exciting moving objects applications. Rohan Loveland (Oxford University) and Beate Zimmer (Texas A&M University, Corpus Christi) made my summer in the remote slopes of the Jemez Mountains at Los Alamos very memorable as we biked and raced up the Pajarito ski slopes. I thank Bhavesh Doshi, Fio Cattaneo and Shyam Rajagopal at Amazon.com for introducing me to the exciting world of large-scale data management on the cloud and letting me have an eventful summer at Seattle.

I would like to thank the many people who have taught me engineering and computer science: my college and high school teachers. Thanks to Dr Sartaj Sahni and Dr Chris Jermaine for positively influencing me during my early days at UF. I had a great time in your classes and admired your passion at work!

A big shout-out to my many student colleagues and roommates for the caring, camaraderie, entertainment and emotional support they provided. I learnt the ropes

PAGE 5

of graduate school life from various labmates in the Database & Systems Research Center at UF. Thank you all. Parbati Manna, Jayendra Venkateswaran, Laukik Chitnis, Manas Somaiya, Padmavati Sridhar deserve special mention for mentoring me during my early days in the lab. A special mention to Tao Chen for putting up with me for all these years and for the great times we have had together learning, coding, writing and researching on our group and lab projects. I had a good time with Alejandro Pauly, Mark Mckenney and Reasey Praing sharing lunches and playing tennis. Santhosh Kodipaka, Ravi Jampani and I shared good times playing volleyball and discussing varied, random and off-beat research topics. I am grateful to Sridhar Narayanan, Shrinaresh Subramanian, Karthik Veeramani, Karthik Gurumoorthy, Kannan and Manivannan Rajah, Venkatakrishnan Ramaswamy, Harish Bharadwaj, Sanketh Bhat, Chandrasekhar Sridhar and Arif Khan. I have learnt a lot from all of you.

I would also like to thank the National Science Foundation (NSF) and NASA JPL laboratory for supporting me during my Ph.D. through several financial grants (through my advisor). We had an exciting time preparing and presenting our moving objects database (MOD) demonstrations and reviews for the NASA Advanced Information Systems Technologies (AIST) project.

I wish to thank my entire family for providing a loving environment for me and helping me manage life when I was 8000 plus miles away from home. Thanks to Sukumar Iyer, Sailesh Ramakrishnan and Arun Krishnaswamy for their expert advice and motivation. I cannot begin to thank my brother Vasant Viswanathan for his support, encouragement and helpful arm when most needed. Lastly, and most importantly, I wish to thank my parents, Chittur Venkatachalam Viswanathan and Geetha Viswanathan. They bore me, nurtured me, supported me, taught me, and loved me. To them I dedicate this thesis.

PAGE 6

TABLE OF CONTENTS

                                                                    page
ACKNOWLEDGMENTS ..... 4
LIST OF TABLES ..... 9
LIST OF FIGURES ..... 10
ABSTRACT ..... 13

CHAPTER

1 INTRODUCTION ..... 15
  1.1 Overview ..... 15
  1.2 Problems with Existing Data Warehouse Design Architecture ..... 19
  1.3 Contributions and Devised Approaches ..... 21
  1.4 Dissertation Outline ..... 22
2 RELATED WORK ..... 25
  2.1 Overview ..... 25
  2.2 Data Warehouse Conceptual Design Frameworks ..... 25
    2.2.1 E-R Approaches ..... 26
    2.2.2 UML Approaches ..... 29
    2.2.3 Ad-hoc Approaches ..... 30
  2.3 Logical Multidimensional Modeling Methodologies ..... 31
  2.4 Physical Implementation of Data Warehouses ..... 39
    2.4.1 The Data Warehouse Architecture ..... 39
    2.4.2 Data Hierarchies ..... 40
    2.4.3 Storage and Retrieval of Large Structured Objects ..... 40
    2.4.4 Cube Materialization ..... 42
  2.5 Requirements for User-Centric Spatial OLAP ..... 43
  2.6 Spatial Data Warehouse Design Models and SOLAP Tools ..... 52
    2.6.1 Data Warehouse Query Languages ..... 55
    2.6.2 Spatial Aggregation ..... 57
    2.6.3 Cardinal Directions between Spatial Objects ..... 58
3 CUBE AS AN ABSTRACT DATA TYPE: THE BigCube MODEL ..... 60
  3.1 The Refined Data Warehouse Architecture ..... 60
  3.2 Using the Meta-Framework for Spatial Data Warehouse Design ..... 62
  3.3 Case Study: Product Management ..... 65
  3.4 The BigCube Model ..... 66
  3.5 Data Model and C3 constructs ..... 66
  3.6 Spatial Data Types and their Integration into Data Warehouse Hierarchies ..... 82
  3.7 Complex Spatial Types ..... 82

PAGE 7

    3.7.1 Complex Point ..... 82
    3.7.2 Complex Lines ..... 83
    3.7.3 Complex Regions ..... 85
    3.7.4 Hierarchical Representation of Spatial Objects ..... 87
4 BigCube ALGEBRA ..... 89
  4.1 BigCube Operators ..... 89
  4.2 BigCube Definition Operators ..... 89
  4.3 BigCube Manipulation Operators ..... 90
  4.4 BigCube Analysis Operators ..... 92
  4.5 Spatial Aggregation Operators ..... 95
  4.6 Translations to Logical Design and Physical Implementation ..... 98
    4.6.1 Case Study: Weather Event Analysis ..... 98
    4.6.2 OLAP formulations to support Spatial Data Warehouse Modeling with C3 Constructs ..... 100
    4.6.3 Translation from BigCube to Relational OLAP Design ..... 102
    4.6.4 Observations ..... 106
5 BigCube QUERY LANGUAGE ..... 108
  5.1 Overview ..... 108
  5.2 BigCube Data Model and Example ..... 110
  5.3 CUBE ANALYSIS LANGUAGE (CAL) ..... 111
    5.3.1 Cube Definition Language (CDL) ..... 111
    5.3.2 Cube Manipulation Language (CML) ..... 114
    5.3.3 Cube Query and Analysis Language (CQAL) ..... 115
  5.4 Transformation to Logical Model and Implementation ..... 119
  5.5 Advantages of CAL over MDX and SQL ..... 127
  5.6 Observations ..... 135
6 iBLOB: EFFICIENT QUERYING OVER COMPLEX HIERARCHICAL OBJECTS ..... 138
  6.1 Overview ..... 138
  6.2 Problems in Handling Structured Objects in Database Systems ..... 141
  6.3 Representing and Interpreting Structured Application Objects with Type Structure Specifications ..... 144
  6.4 Intelligent Binary Large Objects (iBLOBs) ..... 149
    6.4.1 Structure Index: Preserving Structure in Unstructured Storage ..... 150
    6.4.2 Sequence Index: Tracking Data Order for Updates ..... 152
    6.4.3 The iBLOB Interface ..... 155
  6.5 Implementation of Multi-Structured iBLOBs ..... 157
    6.5.1 Sequence Index Implementation ..... 164
    6.5.2 Structure Index Implementation ..... 166
    6.5.3 iBLOB implementation ..... 166
  6.6 Evaluation of iBLOBS ..... 171

PAGE 8

7 QUERYING FOR CARDINAL DIRECTION RELATIONS ON SPATIAL DATA USING the BigCube APPROACH ..... 188
  7.1 Introduction ..... 188
  7.2 Overview of the Objects Interaction Matrix (OIM) Model ..... 190
  7.3 The Objects Interaction Graticule (OIG) Approach for Modeling Cardinal Directions in Data Warehouses ..... 194
  7.4 The Tiling Phase: Representing Interactions of Objects with the Objects Interaction Graticule and Matrix ..... 197
  7.5 The Interpretation Phase: Assigning Semantics to the Objects Interaction Matrix ..... 201
  7.6 Directional Predicates for OLAP Querying ..... 204
  7.7 Observations ..... 207
8 CONCLUSIONS AND FUTURE WORK ..... 208
  8.1 Contributions ..... 208
  8.2 List of Publications ..... 209
  8.3 Directions for Future Work ..... 210
REFERENCES ..... 212
BIOGRAPHICAL SKETCH ..... 222

PAGE 9

LIST OF TABLES

Table                                                               page
2-1 Examples of aggregation operators in data warehouses ..... 48
3-1 Five levels of BigCube data types ..... 79
4-1 Scalar and spatial aggregation operators ..... 94
6-1 Measures used to quantify iBLOB performance ..... 174

PAGE 10

LIST OF FIGURES

Figure                                                              page
2-1 Considerations for spatial data warehouse design ..... 39
2-2 Illustration of a spatial region object ..... 45
2-3 Detailed illustration of a complex region ..... 46
2-4 Types of spatial dimensions in data warehouses ..... 47
2-5 Use of buffer operation for spatial region object ..... 49
2-6 Possible configurations for two spatial objects ..... 59
3-1 The generic 3-tier architecture for modeling data warehouses ..... 61
3-2 Meta-framework for spatial data warehouse design ..... 63
3-3 Illustration of BigCube structure (product sales example) ..... 67
3-4 Example BigCube instance ..... 71
3-5 Data hierarchy represented as a directed acyclic graph ..... 77
3-6 Ragged and non-ragged hierarchies ..... 78
3-7 Balanced and unbalanced hierarchies ..... 78
3-8 Examples of complex spatial objects ..... 83
3-9 Examples of possible geometric anomalies of a region object ..... 85
3-10 The hierarchical structure of a region object ..... 87
4-1 Illustration of spatial slice ..... 93
4-2 Convex hull aggregation operator for spatial objects ..... 96
4-3 Bounding box aggregation operator for spatial data ..... 98
4-4 Bounding box aggregation operator for spatial data ..... 99
4-5 Illustration of weather events BigCube design ..... 101
5-1 Star schema design from CDL query ..... 121
5-2 Snowflake schema design from CDL query ..... 123
5-3 Galaxy or Fact Constellation schema design from CDL query ..... 124
5-4 Results of Spatial CAL Query ..... 133

PAGE 11

5-5 Results of Spatial Aggregation using CAL Query ..... 134
5-6 Using Topological Operators in Spatial Analysis ..... 135
5-7 CAL query editor and results of OLAP analysis on Mondrian ..... 136
6-1 Illustration of a complex region object ..... 141
6-2 Methodologies for ADT Implementation ..... 142
6-3 Structure of hierarchical complex objects ..... 145
6-4 Elements of an iBLOB object ..... 150
6-5 A structured object consisting of n sub-objects and n internal offsets ..... 151
6-6 A structured object consisting of a base object and structured sub-objects ..... 151
6-7 An out-of-order set of data blocks and their corresponding sequence index ..... 152
6-8 The initial in-order and de-fragmented data and sequence index ..... 153
6-9 A sequence index after inserting block [j...l] at position k ..... 153
6-10 A sequence index after deleting block [m...n] ..... 154
6-11 A sequence index after replacing block [o...p] by block [l...q] ..... 154
6-12 The standardized iBLOB interface ..... 155
6-13 A multi-structured complex region object ..... 158
6-14 Multi-structured region object with secondary structures ..... 159
6-15 Internal iBLOB structure ..... 160
6-16 A region with two face cycles and a hole cycle ..... 161
6-17 Interface for iBLOB sequence index ..... 165
6-18 A sample instance of an iBLOB ..... 167
6-19 Available strategies for managing complex application objects in databases ..... 172
6-20 Detailed architecture for TSS and iBLOB implementation ..... 173
6-21 iBLOB performance evaluation ..... 178
6-22 Hierarchical representation of a spatial geometry using ESRI shapefiles ..... 179
6-23 Graph comparing the iBLOB storage performance with XML and BLOB ..... 182
6-24 Graph comparing the iBLOB create performance with XML and BLOB ..... 183

PAGE 12

6-25 Graph comparing the iBLOB read performance with XML and BLOB ..... 184
6-26 Graph comparing the iBLOB insert performance with XML and BLOB ..... 185
6-27 Graph comparing the iBLOB delete performance with XML and BLOB ..... 186
7-1 Overview of the two phases of the Objects Interaction Matrix (OIM) model ..... 190
7-2 Illustration of an objects interaction grid and its OIM ..... 192
7-3 Illustration of various direction relation models ..... 194
7-4 Overview of the OIG model ..... 196
7-5 Illustration of the OIG and OIM for two regions ..... 198

PAGE 13

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

BIGCUBE: A USER-CENTRIC MODELING PARADIGM WITH MULTIDIMENSIONAL DATA TYPES AND OPERATIONS SUPPORTING COMPLEX SPATIAL OBJECTS IN DATA WAREHOUSES

By Ganesh Viswanathan

December 2011

Chair: Markus Schneider
Major: Computer Engineering

The primary goal of the proposed research is to enable end-users to effectively design and analyze complex structured spatial data using data warehouses. This topic is inspired by increasing requests from data analysts to help create systems capable of storing both simple and complex structured scientific data, supporting multidimensional operations and aggregations to analyze them. This work contains four main parts: an abstract data model called BigCube that enables conceptual data warehouse design with in-built support for complex spatial objects along data hierarchies and an associated BigCube Algebra that provides an extensible set of analysis operators, a query interface for users called the Cube Analysis Language (CAL), a set of new spatial aggregation and query operations to support complex OLAP, and a novel data structure called Intelligent Binary Large Object (iBLOB) that leverages databases to provide efficient support for storing and querying large, multi-structured hierarchical objects.

Data warehouses have traditionally been at the forefront of information technology applications as a way for organizations to effectively use information for business planning and decision making. The large increase in the availability of spatial data in recent years has led to increased challenges in storing and analyzing such data. This work introduces a high-level user-view called BigCube that enables analysts to integrate and query on complex structured data. CAL queries are translated directly

PAGE 14

to the underlying logical model such as MDX or SQL implementations. New spatial operators such as cardinal direction relations are defined as part of the BigCube algebra to demonstrate the extensible query capabilities of this approach. Additionally, in order to improve the database storage and query performance of large structured objects, we introduce the in-database iBLOB, which indexes and retrieves exact sub-components of complex objects and outperforms XML or BLOB-based storage in experiments. Overall, this work supports applications such as decision support, weather event research, GIS aggregations and biological data analysis by providing analyst-friendly tools to consolidate and query on complex structured datasets.

PAGE 15

CHAPTER 1
INTRODUCTION

1.1 Overview

For more than a decade, data warehouses have been at the forefront of information technology applications as a way for organizations to effectively use information for business planning and decision making. They contain large repositories of analytical and subject-oriented data, integrated from several heterogeneous sources over a historical time-line (Inmon 2005; Kimball, R. and Ross, M. 2002). The technique of performing complex analysis over the information stored in the data warehouse is popularly called Online Analytical Processing (OLAP). The large increase in the availability of spatial data in recent years has led to increased challenges in storing and analyzing such information. Data warehouses could provide an effective way to manage spatial information by providing large-scale storage, multidimensional data management and OLAP querying capabilities together in one system.

Spatial data warehouses (SDWs) are full-fledged data warehouses which provide native support for spatial data and advanced spatial OLAP operations on them. These operations on the spatial objects can include basic querying operations, such as "Find the city with the largest sales volume for iPads in the state of Florida in 2010"; map generalization operations, such as "Find all states where the top five school districts out-performed all others within that state, between 2005 and 2010 in terms of student grades"; or spatial analysis operations such as convex hull, "Find the smallest convex region in western United States containing the maximum number of college towns where more than 2500 units of Kinect were sold in 2010", and selective spatial union, "Return the geometry of the region in Florida described by the counties where DropBox¹

¹ Dropbox is a registered trademark of Dropbox Inc.

PAGE 16

usage exceeded that of Twitter² in the last five months". This last query requires a spatial aggregate union on the geometry of the various counties satisfying the condition. Many other interesting spatial aggregation queries are possible when spatial data is fully integrated into data cubes and an effective approach for multidimensional querying is available on them.

There have been several proposals of a conceptual design model for data warehouses. A comparison of existing models for spatial data warehousing along with requirements for user-centric data warehousing was presented in (Viswanathan, G. and Schneider, M. 2011). These conceptual models lie in the direction of Entity Relationship (E/R) models, UML extensions and other ad-hoc systems like model driven architecture based approaches. However, they are not extensively used for data warehouse design since most of them are system-centric, unintuitive for the non-database user or have overly convoluted terminology. In practice, most data warehousing models directly expose the logical design schemata to all users. This logical design often consists of developing relational tables through star, fact constellation or snowflake schema (called Relational OLAP or ROLAP) (Kimball, R. and Ross, M. 2002), or multidimensional cubes (called Multidimensional OLAP or MOLAP). As a result, the user requirements are often neglected and the approach becomes totally system-centric, instead of being focused on the user's need to understand and analyze the complex datasets. Moreover, such direct implementation strategies can lead to an inefficient translation of the data analysis requirements into the design schema. A better approach for multidimensional data modeling is a generic metamodel at an abstract level that captures the needs of data analysts and does not expose the logical implementation aspects and system features. Thus the user gets a platform for analyzing data without the need to handle implementation details such as the ROLAP or MOLAP constraints on the schema.

² Twitter is a registered trademark of Twitter, Inc.
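To make concrete what it means for a logical ROLAP design to be exposed directly to the analyst, the following is a minimal sketch of a star schema and of the basic query from this section (the city with the largest sales volume in Florida in 2010). It is an illustration only, not the dissertation's implementation: the table names, column names and data are hypothetical, the product dimension is omitted for brevity, and SQLite (through Python's standard sqlite3 module) merely stands in for a relational back end.

```python
import sqlite3

# In-memory database standing in for a ROLAP back end (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact table keyed to two dimension tables.
conn.executescript("""
CREATE TABLE dim_location (loc_id INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE dim_time     (time_id INTEGER PRIMARY KEY, month INTEGER, year INTEGER);
CREATE TABLE fact_sales (
    loc_id  INTEGER REFERENCES dim_location(loc_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    units   INTEGER
);
INSERT INTO dim_location VALUES
    (1, 'Gainesville', 'Florida'), (2, 'Miami', 'Florida'), (3, 'Atlanta', 'Georgia');
INSERT INTO dim_time VALUES (1, 1, 2010), (2, 2, 2010), (3, 1, 2011);
INSERT INTO fact_sales VALUES
    (1, 1, 10), (1, 2, 5), (2, 1, 30), (2, 3, 99), (3, 1, 7);
""")

# The analyst has to know the surrogate keys and join paths of the schema
# in order to phrase even this simple roll-up question.
query = """
SELECT l.city, SUM(f.units) AS total_units
FROM fact_sales AS f
JOIN dim_location AS l ON l.loc_id = f.loc_id
JOIN dim_time     AS t ON t.time_id = f.time_id
WHERE l.state = 'Florida' AND t.year = 2010
GROUP BY l.city
ORDER BY total_units DESC
LIMIT 1
"""
top_city = conn.execute(query).fetchone()
print(top_city)  # ('Miami', 30)
```

Even in this toy setting the query is phrased in terms of keys and joins rather than in terms of the cube, which is the kind of implementation detail an abstract metamodel is meant to hide.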

PAGE 17

In order to provide built-in support for spatial data in analysis systems it is important to incorporate spatial data hierarchies, spatial data dimensions and spatial measures within the data warehouse. These would help to support spatial aggregation operations on them. However, these pose several new challenges related to spatial data modeling in a multidimensional context, such as the need for new spatial data types suited for aggregation operations, inclusion of spatial hierarchies in data dimensions and as measures, the development of new spatial OLAP operations, ensuring consistent and valid spatial OLAP, etc. Consider for example the result of the spatial aggregate union query shown above. Since the result can be either a simple region, a simple region with holes, a complex region with multiple faces, or a complex region with multiple faces that bear holes, based on the selection condition for the constituent counties, a SDW model should provide dynamic, built-in support for such return types.

OLAP operations are often categorized as distributive, algebraic and holistic (Gray et al. 1996; Han and Kamber 2006), depending on whether the measures of high level cells can be easily computed from their low level counterparts, without accessing base tuples residing at the finest level. For example, in the classic sales (location, time, product) data, the total sales of an item at [Florida, 2010] can be calculated by adding up the total sales over all months in 2010, i.e., [Florida, January 2010] to [Florida, December 2010], without looking at base data points such as [Florida, 20 March 2010], which means that SUM is a distributive measure. In comparison, AVG is often cited as an algebraic or semi-distributive measure, in that AVG can be derived from two distributive measures: SUM and COUNT, i.e., algebraic measures are functions of distributive measures. Holistic measures such as standard deviation require data at the specific requisite level for all computations. Similarly, spatial querying and aggregation operations such as spatial roll-up, drill-down and selection also involve several levels of data manipulation. For example, consider a drill down operation from Country (region) to county (maps) to cities (string labels for points). This complex navigation operator

PAGE 18

can be very useful in mining several levels of spatial information such as geo-spatial and video data.

Upon reviewing existing modeling approaches for spatial data warehousing (Section 2.1) we found that one of the major shortcomings of existing models is the heavy focus on direct ad-hoc implementation strategies such as a combination of OLAP tools or GIS mapping clients with databases to create a pseudo spatial data warehouse. However, for effective multidimensional data modeling and analysis what is needed is a refined data warehouse architecture that keeps the user as the focal point and achieves a clear abstraction of the data for all stakeholders in the system. Hence our proposal is for a sound conceptual model built on abstract data types (ADTs) and using the cube metaphor for OLAP analysis while natively supporting spatial data along the data dimensions and as measures for aggregation. The user view is created by using a generic textual analysis language called Cube Analysis Language (CAL) that helps to write SOLAP queries. Finally, a set of transformation rules from the conceptual model to logical design strategies such as Relational OLAP (ROLAP) (Inmon 2005), Multidimensional OLAP (MOLAP) (Kimball, R. and Ross, M. 2002) and Hybrid OLAP (HOLAP) (Pedersen and Jensen 2002) is also needed to help complete the design of the spatial data warehouse. Overall, this paper provides a new insight into the fundamental requirements for designing a user-friendly spatial data warehouse model by providing an objective analysis of the essential requirements for it.

Existing query languages popularly used for OLAP in data warehouses such as MDX³ or Microsoft eXpressions language and Oracle⁴ OLAP are based on unique

³ Microsoft and MDX are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
⁴ Oracle, Java and Oracle OLAP are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

PAGE 19

logical implementation models, and can often be inappropriate for the user of a general purpose data warehouse. In recent years, MDX has become the de facto standard for textual OLAP languages. However, it is not often used in applications or industry due to several problems. First, it requires the user to have good knowledge of the underlying implementation details for performing unions and cross joins. Second, it does not provide a global view of the schema but requires the user to have the multidimensional schematic of data before querying. Third, MDX can often get complicated due to several levels of ordered cross joins in the query. Thus, often the query itself cannot be written or validated directly, instead requiring an MDX parser and implementation for testing its validity. We found several of the existing MDX implementations to be rather different in their levels of functionality and support, leading to many errors and incompatible syntax issues. Finally, MDX does not include spatial operators or support for storing and analyzing complex point, line and region data.

1.2 Problems with Existing Data Warehouse Design Architecture

There are several problems with existing conceptual design approaches for data warehouses as outlined below. These serve as one of the motivations for developing the BigCube model. First, currently available conceptual models for data warehouses are often too complicated for end users. The conceptual models based on ER and UML diagrams, for example, add complex extensions to the conventional modeling diagrams, which are oftentimes not easily perceptible to users. Thus they fail to serve as a communication interface between the user and the designer. Second, different conceptual models use different terminologies to illustrate the same data warehouse concepts. This is confusing to the user and even misleading in some cases. There is also a clash between the concepts used in conceptual modeling and those used for logical implementation. The third issue stems from the previous two problems. Since most conceptual models are complex and have overly convoluted terminology, in practice,


most designers directly apply logical implementation strategies such as Relational OLAP (ROLAP), Multidimensional OLAP (MOLAP), or a combination of the two, as Hybrid OLAP (HOLAP), to develop their data warehouses. For example, in most cases, the simple star schema (Kimball and Ross 1996) is chosen as the design strategy and the data warehouse is implemented. As a result, the user requirements take a back seat and the approach becomes totally system-centric, instead of being user-centric. Fourth, querying and analyzing data using data warehouses is still a difficult task for the unskilled user. This is because of two factors. First, available graphical user interface (GUI) tools for business intelligence (BI), such as spreadsheets, charts, dynamic figures and dashboards, do not always provide the complete functionality. They are often specific to operations and restricted in features, such as the capability for framing additive queries. They do not provide the capability to perform all available data warehouse operations using a simple common interface. Second, the available query languages for OLAP such as Oracle OLAP or Multidimensional eXpression (MDX) (Microsoft Corporation 2010) are powerful, but require expert knowledge about the underlying data model and operations such as the cross-join. Also, the SQL-like style of MDX reminds users of the standard database query language, but the intended semantics is different. Fifth, most existing conceptual models provide support for only simple data such as alphanumeric data. However, with new emerging applications of spatial and temporal interest, support for a wide range of complex data (such as multi-structured spatial and temporal data) would be necessary. Finally, existing conceptual models have several feature shortcomings. For example, some models cannot capture all the user requirements succinctly with their existing functionality. Others encounter problems in the mapping from conceptual to logical design. Some others cannot handle the translation of a large number of dimensions, hierarchies and their inter-relationships to logical design. This is discussed in detail in Chapter 2.


1.3 Contributions and Devised Approaches

In this work, we propose a novel approach for conceptually analyzing multidimensional data sets in a user-centric fashion. Our model generalizes over existing design techniques, and this metamodel is called BigCube. As the name suggests, we use the cube metaphor to provide a complete set of abstract multidimensional data types to model complex structured data. This cube view corresponds well with the cognitive understanding of multidimensional data for the analysts. Using abstract data types for each of the components of the BigCube makes the model very extensible and helps us to support complex spatial objects (such as points, lines and regions) and develop multiple layers of OLAP queries on them. We also provide built-in support for spatial data hierarchies both along the perspectives defining the cube and inside the cube as measures of analysis.

Further, we aim to solve the problems associated with data warehouse query languages by introducing the Cube Analysis Language (CAL). CAL is an abstraction over standard SQL aggregate extensions and MDX, and helps the user to design, build and manipulate data cubes in a fully multidimensional environment. CAL also includes a query and analysis component that can be seamlessly extended to support complex analysis operations. These are later translated into logical components and implemented using existing technologies such as Relational OLAP or Multidimensional OLAP. The advantage of CAL is that users can create, modify, query and analyze the structure and instances of multidimensional data cubes simply by using the abstract cube interface. The underlying relational (ROLAP), multidimensional (MOLAP) or hybrid (HOLAP) implementation scheme is completely hidden. This helps analysts to easily query and analyze data sets without resolving multi-level table and key dependencies from the backend tables that drive the cube representation. The combination of the BigCube model and CAL query language presents a new user-centric framework for


modeling multidimensional data and for supporting complex aggregations on spatial objects in data warehouses. We have extended the traditional multidimensional algebra by providing spatial support in the data hierarchies and developing spatial qualitative and quantitative operations for BigCube based queries. In particular, we introduce a new approach to evaluate cardinal direction relations among complex (multi-component) spatial point, line and region objects using a novel Objects Interaction Matrix (OIM) approach. This is extended to support the spatio-temporal variation of complex regions using a graticule tiling. These operations can then be used as part of the BigCube algebra, for example to `find the regions west of Wyoming that had sales of iPhones greater than 20K units in 2010'. Additionally, since we base the entire BigCube view on data hierarchies, we built a new data structure called Intelligent Binary Large Objects (iBLOBs) that leverages the LOB storage capability of traditional databases to support efficient reads, writes and updates of complex, multi-structured objects. An experimental evaluation of iBLOBs demonstrates the improved performance of this approach when compared to XML and naive BLOB based in-database storage of large, complex objects. In summary, this work supports scientific data management by providing a novel user-level interface for spatial data warehouses that natively supports the storage and efficient retrieval of complex data such as spatial points, lines and regions, and allows new operations like cardinal directions to be evaluated over them.

1.4 Dissertation Outline

This dissertation is organized as detailed below. Chapter 2 provides a detailed discussion about existing conceptual design approaches for data warehouses. This is followed by a study of conceptual, logical and implementation models for spatial data warehouses and existing SOLAP analysis tools. These are later evaluated using a set of basic design requirements for multidimensional,


spatial data modeling. Later, a summary of available query language interfaces for multidimensional and scientific databases and data warehouses is presented, followed by a detailed study of spatial data modeling and operations on spatial data. This helps us determine novel spatial operations and aggregations for analyzing over complex spatial points, lines and regions in the data warehouse. One example of such an operation is the determination and evaluation of qualitative cardinal direction relationships between spatial objects.

Chapter 3 presents the BigCube approach for designing data warehouses. The basis of this approach is to model data as a cube and to incorporate the cube as an abstract data type in data warehouses. This chapter includes a formalization of multidimensional data types for modeling complex spatial data and OLAP operations on them.

Chapter 4 introduces the various operators available to define, manipulate, query and analyze data using the BigCube model. Additionally, we provide new variations of traditional multidimensional operators such as spatial slice and convex hull, MBR and MBR based spatial aggregations.

Chapter 5 introduces the presentation layer of our model. This is composed of the Cube Analysis Language (CAL) that includes the CDL for BigCube definition, CML for BigCube manipulation and CQAL for BigCube query and analysis.

Chapter 6 introduces our novel Intelligent Binary Large Objects (iBLOBs) for the efficient storage and retrieval of large hierarchical objects in databases by extending the BLOB structure. We also present results of our evaluation of iBLOBs for spatial hierarchical objects and provide comparisons with XML and BLOB data structures and their storage and query performance.

Chapter 7 explains the cardinal direction operator using the Objects Interaction Matrix (OIM) model we define as part of the BigCube algebra. Here we provide an application of OIM and the BigCube operators. The new model called the Objects


Interaction Graticule (OIG) helps in evaluating spatial cardinal direction relationships among moving objects in data warehouses. We demonstrate how cardinal directions can be used for querying spatial objects using the BigCube model.

Chapter 8 concludes the document, provides a listing of publications supporting this work and mentions exciting new avenues for future work.
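As a rough illustration of the kind of query mentioned in Section 1.3 (regions west of Wyoming whose iPhone sales exceeded 20K units in 2010), the following Python sketch combines a spatial predicate with a filter on a measure. It is not the Objects Interaction Matrix approach of this dissertation: as a deliberately simpler stand-in it approximates each region by its minimum bounding rectangle (MBR) and calls a region "west of" another when its MBR lies entirely to the left. The names `Region`, `is_west_of` and `regions_west_with_sales`, the grid coordinates and the sales figures are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A region approximated by its minimum bounding rectangle (MBR)."""
    name: str
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def is_west_of(a: Region, b: Region) -> bool:
    """True if a lies entirely west of b, judged on the MBRs only."""
    return a.max_x <= b.min_x


# Toy coordinates on an abstract grid (assumption: NOT real geography).
REGIONS = {
    "Wyoming": Region("Wyoming", 4, 4, 6, 6),
    "Idaho": Region("Idaho", 1, 4, 3, 7),
    "Oregon": Region("Oregon", 0, 3, 1, 6),
    "Nebraska": Region("Nebraska", 7, 3, 9, 5),
}

# Hypothetical fact data: (region, product, year) -> units sold.
SALES = {
    ("Idaho", "iPhone", 2010): 25_000,
    ("Oregon", "iPhone", 2010): 12_000,
    ("Nebraska", "iPhone", 2010): 30_000,
    ("Wyoming", "iPhone", 2010): 8_000,
}


def regions_west_with_sales(reference, product, year, min_units):
    """Names of regions west of `reference` whose sales exceed `min_units`."""
    ref = REGIONS[reference]
    return sorted(
        name
        for name, region in REGIONS.items()
        if name != reference
        and is_west_of(region, ref)
        and SALES.get((name, product, year), 0) > min_units
    )


print(regions_west_with_sales("Wyoming", "iPhone", 2010, 20_000))
```

With this toy data only Idaho is both west of the reference region and above the threshold. An MBR test cannot handle multi-component or interleaved regions, which is the case the OIM model is intended to address.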


CHAPTER 2
RELATED WORK

2.1 Overview

Most research in the area of data warehousing and OLAP has followed advancements in commercial products. However, the general vision of a data warehouse as a large collection of data from several sources to aid decision support is shared by most researchers. For efficient analysis, the data in the OLAP system is often stored in a multidimensional structure like a multidimensional array or cube. All existing data warehouse models can be classified broadly into conceptual models and logical models. The conceptual modeling approaches present an abstract view of the data to the user, and do not involve implementation aspects. The logical models present another design methodology that is tied closely with the implementation aspects of the data warehouse. In this section we present a detailed survey of data warehouse design architectures using conceptual modeling strategies, logical multidimensional modeling strategies, and finally query languages for OLAP. We also review existing research on data warehousing and OLAP tools, spatial data modelling and associated implementation strategies, leading to the list of essential requirements for spatial data warehousing (in Section 2.5). After gathering the requirements for a good multidimensional data warehouse model we provide a comparison of the above mentioned models based on the requirement criteria. Figure 2-1 illustrates the various domains that need to be considered for deciding the architecture of a spatial data warehouse (Section 3.4). A survey of the state-of-the-art in each of these domains is the topic of the current section.

2.2 Data Warehouse Conceptual Design Frameworks

There have been several proposals for a data warehouse design framework in the past few years. The conceptual modeling approaches can be broadly classified into Extensions of Entity Relationship (E/R) models ((Franconi and Kamble 2004;


Kamble 2008; Malinowski and Zimanyi 2006; Sapia et al. 1999; Tryfona et al. 1999)), Extensions of Unified Modeling Language (UML) ((Abello et al. 2006; Lujan-Mora et al. 2006; Prat et al. 2006)) and Ad-hoc ((Golfarelli et al. 1998b; Husemann et al. 2000; Zepeda et al. 2008)) models.

2.2.1 E-R Approaches

The data warehouse is basically a collection of data dimensions that revolve around a central fact. Because of this nature of data warehouses and the fact that they are used to analyze very large collections of data, the E-R design model is fundamentally unsuitable for modeling data warehouses. For example, in Kimball's (1996) view: Entity relation models are a disaster for querying because they cannot be understood by users and cannot be navigated usefully by DBMS software. Entity relation models cannot be used as the basis for enterprise data warehouses. Entity relationship (E-R) data modeling is good for reporting and point queries while dimensional data modeling (using fact and dimension tables) is good for ad-hoc query analysis. Many times, this translates to an entity relationship-based data warehouse (overall schema) and a dimensional data mart layer (smaller operational schema).

Among the E/R approaches, Malinowski et al. present the MultiDimER model based on the star and snowflake schema in (Malinowski and Zimanyi 2006). This model is well defined in terms of the E/R model and its logical representations, and is a promising conceptual E/R approach for multidimensional modeling. However, the MultiDimER model becomes complicated when there are multiple hierarchies over the dimensions (too many links have to be constructed). Another problem of this approach is that online aggregation (or aggregation over multiple levels), which is essential for data warehouses, is not supported. Also, the transformation from conceptual view to logical design becomes tedious, since there are multiple possible ways by which irregular hierarchies can be mapped into relations. Since a data warehouse commonly has several many-to-many relationships, this means that the E-R diagram would


be flooded with associative entities, since each of their attributes might be related to several fundamental entities. (This is actually a general weakness of using the E-R approach.) In traditional database design, the transformation from ER to logical design is followed by a normalization step. This model tries to include some aspects of database normalization in the transformation process directly, and is hence a convoluted mix of database and multidimensional modeling concepts that takes the user away from the goal of analyzing the data.

The starER model is presented in (Tryfona et al. 1999), but this model lacks an explicit mechanism for representing initial user requirements for dynamic multidimensional modeling, and a methodology to perform OLAP operations on the data. This model makes use of fact and dimension tables directly, with star or snowflake schema for DW design; so it is actually dimensional modeling. It does identify the problem that conceptual modeling has not had much interest among academic researchers in data warehousing and that there is a lack of a classification and understanding of different kinds of hierarchies.

Kamble et al. (Kamble 2008) present a list of requirements and evaluate the Generalising Conceptual Multidimensional data Model (CGMD) for data warehouse design against the requirements. The core of this model is an extension of the E/R approach (Franconi and Kamble 2004) that automates the construction of multidimensional conceptual schema from an E/R diagram using separate representations of simple and multidimensional aggregated entities. The E/R approach is however counter-intuitive to the cognitive understanding of the multidimensional data analysis schemata for the business user. Further, this model also does not explicitly support transformations between measures and dimension members and the many-to-many relationships that appear among dimensions and aggregation at intermediate levels along the hierarchies.

Sapia et al. present a multidimensional extension to the E/R model called ME/R or Multidimensional Entity Relationship model in (Sapia et al. 1999). The extension is a specialization of E/R based on the ISO/IRDS standard metadata. The


semantic is similar to E/R and only the structures for multidimensional usage are described. The structures presented are a Meta-Model, which is extended E/R with the generalization concept, the E/R modeler which decides the legal constructions, and the Multidimensional E/R which is a specialization class from the previous E/R modeler. This is also a visual conceptual modeling approach that extends E/R to the multidimensional arena, by using two special relationship sets, namely n-ary for facts and binary for regular classification hierarchies, and a new specialized entity set called the dimension level. Measures are modeled as attributes of the fact relationship set, and description attributes are shown as attributes of the dimension levels. There is a connect relationship between the dimension level and the fact relationship set and the rolls-up relationship set, respectively. The connect between a rolls-up relationship set and a dimension level has a constraint that the graph constructed with the pairs of dimension levels connected by a rolls-up relationship set must be a Directed Acyclic Graph (DAG); that is, the graph of dimension levels cannot be cyclic. The model provides several specific graphic notations for the ME/R extension. This model is based on the well-known E/R model, and borrows syntax and semantics from it. Hence it is easy to understand and develop. The drill-across paths arise in a natural way when two or more fact relationship sets are connected to levels in the same hierarchy. The model also uses state diagrams to represent the system's behavior and even defines a set of OLAP operations. However, the model fails to support non-strict classification hierarchies, and many-to-many relationships among particular dimensions.

Developed under the European Commission's ESPRIT program, the DWQ project (Jarke and Vassiliou 1997) provides a conceptual model proposal (DWCDM) with a graphical language based on E/R, and Description Logics to formalize knowledge representation as concepts and roles. The goal of the DWQ project is to develop a semantic foundation that will allow the designers of data warehouses to link the choice of deeper models, richer data structures and rigorous implementation techniques


to quality-of-service factors in a systematic manner, thus improving the design, the operation, and most importantly the evolution of data warehouse applications. DWQ's research objectives address three critical domains where quality factors are of central importance: to enrich the semantics of meta databases with formal models of information quality to enable adaptive and quantitative design optimization of data warehouses; to enrich the semantics of information resource models to enable more incremental change propagation and conflict resolution; to enrich the semantics of data warehouse schema models to enable designers and query optimizers to take explicit advantage of the temporal, spatial and aggregate nature of DW data. The results will be delivered in the form of publications and supported by a suite of prototype modules to achieve the following practical objectives: validating their individual usefulness by linking them with related methods and vendor-specific tools. The research goal is to demonstrate progress over commercial state-of-the-art, and to give members of the industrial steering committee a competitive advantage by early access to results demonstrating the interaction of the different contributions in the context of case studies in telecommunications and environmental protection. This model inherits all the disadvantages of ER representation along with no support for generic dimensionality and no clear modular representation. The model is vague and vendor-specific, and provides no support for navigation along dimension hierarchies.

2.2.2 UML Approaches

Among the UML approaches, (Abello et al. 2006), (Lujan-Mora et al. 2006) and (Prat et al. 2006) present UML extensions for modeling multidimensional data. (Abello et al. 2006) presents YAM2 as an extension of UML which emphasizes part-whole relationships for aggregation, uses state diagrams to represent the system behavior and even defines a set of OLAP operations. The model generates conceptual multidimensional constellation schemas from requirements expressed in SQL queries and relational sources. This model strongly captures end-user requirements and


performs validation before generating the conceptual schemas. However, one drawback of this model is the lack of support for aggregations at the schema level. Trujillo's work extends UML in (Lujan-Mora et al. 2006), by using a UML profile defined by a set of stereotypes, constraints and tagged values to represent the main properties at the conceptual level. This is further extended and automated in (Prat et al. 2006) and supports multiple levels of aggregation, which is essential in a data warehouse. However, there are some problems with the UML approach. There is a lack of a complete translation of the user design into logical and implementation levels, while guaranteeing summarizability. Further, as the number of data dimensions increases, the models become too complex for a business analyst to creatively continue the analysis process. aspect of data for analysis.

2.2.3 Ad-hoc Approaches

Among the ad-hoc approaches, the Dimensional Fact Model (DFM) is constructed in (Golfarelli et al. 1998b) from an operational E/R schema based on requirement analysis. DFM is based on the relational star schema (Kimball, R. and Ross, M. 2002) and does not support generalization/specialization hierarchies and many-to-many relationships. In a similar manner, Zepeda et al. (Zepeda et al. 2008) presented a Model Driven Architecture (MDA) for producing candidate multidimensional schemas from operational E/R schema based on requirement analysis. Each of the candidate schemes is based on star schema and the MD model supports generalization hierarchies and many-to-many relationships. (Husemann et al. 2000) presents another ad-hoc modeling approach based on an extension of the relational Normal Forms for multidimensional data. However, none of these proposals is capable of inherently supporting aggregation in the structure and instance, and providing a clear abstraction of the data for the users of the system.
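Several of the models above are judged on whether they support aggregation over multiple levels of a hierarchy. A minimal Python sketch of what that involves is given below, assuming a strict hierarchy (every city belongs to exactly one state, every state to one country) and a SUM measure. The dictionaries, member names and numbers are hypothetical and are not taken from any of the surveyed models.

```python
from collections import defaultdict

# Hypothetical strict hierarchy: city -> state -> country.
CITY_TO_STATE = {"Gainesville": "Florida", "Miami": "Florida", "Austin": "Texas"}
STATE_TO_COUNTRY = {"Florida": "USA", "Texas": "USA"}

# Hypothetical fact rows at the finest grain: (city, units_sold).
FACTS = [("Gainesville", 10), ("Miami", 25), ("Austin", 40), ("Gainesville", 5)]


def aggregate(pairs):
    """Sum the values of (member, value) pairs per member."""
    totals = defaultdict(int)
    for member, value in pairs:
        totals[member] += value
    return dict(totals)


def roll_up(totals, parent_of):
    """Re-aggregate per-member totals one level up the hierarchy."""
    return aggregate((parent_of[member], value) for member, value in totals.items())


by_city = aggregate(FACTS)
by_state = roll_up(by_city, CITY_TO_STATE)
by_country = roll_up(by_state, STATE_TO_COUNTRY)

print(by_city)
print(by_state)
print(by_country)
```

Reusing the state totals to compute the country total is valid here only because SUM is distributive and the hierarchy is strict. With a non-strict hierarchy (a child with several parents) the same procedure would count some values more than once, which is the summarizability problem raised in the discussion above.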


2.3 Logical Multidimensional Modeling Methodologies

Several different logical models have also been proposed to model multidimensional data in the past few years. The data cube operator was formally introduced in (Gray et al. 1996) in an attempt to extend the relational model to suit multidimensional analysis. Though much of the work was built to aid in the relational representation of aggregate data, contributions like the ALL operator are essential even from a multidimensional perspective. A complete survey of the properties of several earlier logical design models can be found in the works of Blaschka and others (Blaschka et al. 1998), Vassiliadis (Vassiliadis and Sellis 1999) and Pedersen (Pedersen et al. 2001). We now present a detailed discussion of all these approaches, including the newer models introduced in recent years, ordered loosely over the model timeline.

One of the earliest approaches for multidimensional modeling was introduced by Kimball (Kimball 1997). This dimensional modeling approach proposes an informal methodology to derive the multidimensional schema, and provides a way to develop a relational implementation in the form of the star schema. Dimensional modeling imposes some rules on the modeling, but results in a data model that has the access methods defined clearly by virtue of the relationships (Kimball et al. 1998; Kimball, R. 1996). Users are also better able to relate to the `see measure by dimensional value(s)' paradigm rather than a simple collection of values. The approach involves discovering the data marts for the data warehouse space, listing all dimensions for each data mart, using an ad-hoc matrix to capture user requirements, and then designing a fact table with measures added to each grain of detail along the dimension levels.

Li et al. (Li and Wang 1996) present a grouping model that describes the cube as a function from several relations (dimensions) to some measurable values. For each combination of tuples (called a coordinate), one from each dimension, there is an associated data value. Each dimension is viewed as a basic grouping, i.e., each tuple in the dimension corresponds to the group consisting of all the coordinates that contain this


tuple. In order to express user queries, relational algebra expressions are then extended to those on basic groupings for obtaining complex groupings, including order oriented groupings (for expressing, e.g., cumulative sum). An extension to relational algebra called grouping algebra has been developed to facilitate data derivation. This model basically provides a formalism for the relational OLAP approach.

Gyssens et al. proposed another cube based model (Gyssens and Lakshmanan 1997), which is based, rather strongly, on the relational approach. The cube is represented as a table, defined with a set of dimension names, attributes and a function to pair or map attributes to the respective dimensions. The set of dimension names is disjoint. The measures of the cube are mapped to the attribute-dimension pairs. An instance for a table is defined as n+1 finite relations where the first n relations are over the attributes from each dimension with an extra attribute for tuple identification (T-id). The last relation is over the attributes from M and has each one (T-id) from the first n relations. This is another specification for star schema. The tables have a tabular and a relational representation, for simplifying the inclusion of aggregation operations. The relational algebra is extended with aggregate functions and group-by mechanisms. There are two functions provided for restructuring tables. Fold transforms dimension attributes into measures, and unfold transforms measures into dimension attributes. This model thus supports symmetric treatment of dimensions and measures, but the relational aspect lends the weakness of not being able to analyze intuitively along all dimensions.

The model presented by Agrawal et al. in (Agrawal et al. 1997) is a logical data model for multidimensional databases. The cube is defined as a set of dimensions (each associated with a domain) and a set of elements (measures). A mapping is provided between the dimensions and the set of elements. The elements of the cube can be 0, 1 (the Boolean Cube) or an n-tuple of elements. This model does not require the dimensions to have a ranked, discrete domain. Instead the mapping function can


be used to provide a symmetric treatment between measures and dimensions. An algebra is also defined over the model with operations such as PUSH (to transform a dimension into a measure), PULL (to transform a measure into a dimension), DESTROY DIMENSION (to remove a single-valued dimension from the cube), RESTRICTION (to constrain the cube along dimensions), and JOIN (to construct a new cube from two cubes). The JOIN operation between two cubes is particularly interesting. In the JOIN operation, each dimension in the first cube is combined with exactly one dimension in the second. The result is a dimension with the union of the original dimensions. Two mapping functions map the joining dimensions to the new dimension. The measures from each cube for each coordinate are combined into a new measure using another function f-elem. Several other operations like Cartesian product, natural join, and associate are also discussed. However, this model does not discuss the handling of explicit multiple hierarchies among dimensions, or the problem of imprecision due to double counting during data aggregation. The model is a logical model developed to express manipulations to handle complex multidimensional data. A clear data structure capable of meeting the requirements for a formal multidimensional OLAP model is not discussed. It does, however, express a general OLAP structure that can be implemented using Relational OLAP (ROLAP) or Multidimensional OLAP (MOLAP) or a combination of these two methodologies.

Golfarelli et al. present another multidimensional data model called DFM or Dimensional Fact Model (Golfarelli and Rizzi 1998; Golfarelli, M. and Rizzi, S. 1999). A method to construct the multidimensional schema from E/R is also presented. The model is primarily described in graphical notation similar to E/R (Golfarelli et al. 1998a). The emphasis for this model is on conceptual design with a clear representation of the structures. The fact scheme is formalized using a type of Directed Acyclic Graph (DAG) called a Quasi-Tree. The fact scheme is defined as a six-tuple with measures, dimension attributes, non-dimension attributes, set of ordered pairs to build the tree,


a set of optional relationships and a set of aggregation statements. The aggregation element is a triplet consisting of measure, dimension, and aggregation operator. A hierarchy over the dimension is depicted by the quasi-tree graph over its attributes. Primary and Secondary fact instances are then defined for placing the measures. The Secondary fact instances serve as aggregations of primary fact instances (based on legal v-dimensional aggregation patterns). Later, a vague mechanism for integration of fact schemes is described as the union of measures and intersection of dimension values. However, drill-across constructs are still not clear, and later work mentions a relational implementation of the structure developed for analysis without a formalism for verifying the correctness of resulting aggregations.

Cabibbo et al. (Cabibbo and Torlone 1998) present a logical multidimensional model called MD, along with a calculus as a query language and observations on its expressive power (Cabibbo and Torlone 1997). The methodology is, however, oriented to transform an E/R schema of operative data into a logical MD schema. The MD schema comprises dimensions and f-tables that represent the cube. A multidimensional database is defined as a pair with a set of dimensions and a set of f-tables constructed over those dimensions. Dimensions include a set of names, each of which belongs to a level covered by a domain. A dimension can have one or more levels. An interesting observation is that each pair of level domains is disjoint. Thus, for example, Cities belong to a different domain from States. (However, the authors seem to consider only alphanumeric values in the various levels.) The dimension hierarchy is represented by a partial order on the levels. Thus for l1

focused on the operations. Dimension attributes are classified into three categories as primary attributes (to identify dimension elements), classification attributes (to structure the dimension elements in levels) and dimensional attributes (to describe the features of dimensional elements). The classification attributes describe a classification hierarchy represented as a balanced tree where each node is an instance of the attribute for that level. Thus there is no distinction between the schema and instance specification for this model. Classification and Feature domain (for sub-tree) types are defined on the nodes of the classification hierarchy. Over this structure, the multidimensional object MO is defined through a selection, aggregation operation over specific node identifiers. Secondary MOs are defined to provide context descriptors to the primary MOs (the MO type mentioned foremost). Standard aggregation operations like sum, avg, min, max and count are described over this structure, using an algebra. This model supports notions of summarizability and explicit hierarchies. However, multiple hierarchies over the same dimension are not shown to be supported, and it lacks a symmetric treatment for dimensions and measures.

Datta et al. proposed a basic cube metaphor to model the data warehouse in (Datta and Thomas 1999). This model describes the cube as a four-tuple composed of dimensions, measures, attributes, and a function to map dimensions to measure values. The mapping is defined such that the attribute sets corresponding to dimensions are pairwise disjoint. The paper also describes aggregation operators on the cube such as cartesian product, join, difference, and sum, max, min, etc. Push and Pull operators similar to (Agrawal et al. 1997) are also defined. It also provides sample OLAP queries and an algebra to describe them, by operating on the data cube. However, with the notion of ATTRIBUTES, the model does not talk about the explicit handling of hierarchies, or the double-counting problems during aggregation. Further, many-to-many relationships between facts and dimensions are not handled. This is a primitive model that does not take into account complex


multidimensional relationships. It seems to have evolved backwards from the relational schema, since each dimension is represented by one attribute at the instance level. This is a severe drawback.

Pedersen et al. proposed a multidimensional data model (Pedersen et al. 2001; Pedersen and Jensen 1999) that presents a clear and consistent way to model complex data. The n-dimensional fact schema (related to the relation schema definition (Codd 1970)) is comprised of a fact type and dimension types. The multidimensional object (MO) is defined as a four-tuple composed of fact schema, facts, dimensions, and fact-dimension relations. The fact schema comprises a fact type and its corresponding dimension types. The inclusion of type in this model is just to enable the matching of similar data cubes for integration, and also to perform some aggregation operations. The fact is the object of interest as defined by the cube and it is quantized by means of measure values in the cells of the cube. Everything that characterizes the fact type is considered dimensional in this model. Thus there are no attributes. The dimension type is defined by category types as a partial order function. Thus, dimensions are composed of categories (similar to level descriptors) and the categories define a level in the hierarchy of the dimension. Some important facts have been noted in this paper: several facts may be characterized by the same combination of dimension values in multidimensional space, but not within one MO. Further, imprecision in the data is handled by testing the granularity and specifying the precision of the results as coarse or precise. The paper also lists eleven basic requirements for a multidimensional model (summarized and extended from previous works) and provides a survey of earlier models. The model does not consider complex spatial objects as measures, nor does it perform aggregation operations over them. The inclusion of types is an added overhead for the conceptual model. Moreover, this model does not provide a good user view for the data warehouse space and the formalism is rather convoluted. It also does not clearly formalize the structure of the multidimensional data. For example, there is a disparity between the


structure of the dimension using the types, which consists of four elements, and the actual dimension which only requires two elements, namely, the set of categories and the partial order relation. The model also uses fact-dimension relations to help link facts to dimension values, and thus does not allow for missing values. In reality, however, business intelligence data can have unknown, missing or null values, and it becomes the work of the data model to capture and handle such imprecision and uncertainty under the policies set forth by the user for summarizability. Often, statistical analysis on terabyte sized business data is less affected by some sparseness in certain areas of the multidimensional cube. This means that functions to link dimensions and associated facts are more important in a data warehouse model. Moreover, we believe that the model is also non-intuitive for the user to help in conceptual design and in the construction of the multidimensional cube structure for a data warehouse. of the big cube as the basis for data warehouse design for the user.

Some papers have also proposed new models to integrate OLAP and GIS systems (Bimonte et al. 2005), (Bimonte et al. 2007), along with some possible operations (Bimonte et al. 2006). The simple cube data models like (Datta and Thomas 1999) do not provide specifics of hierarchies among dimensions and the many-to-many relationship between facts and dimensions. (Malinowski and Zimanyi 2005a) addresses issues with spatial data and talks about hierarchies and spatial measures.

In this section, we review existing research on data warehousing and OLAP tools, spatial data modelling and associated implementation strategies, leading to the list of essential requirements for spatial data warehousing (in Section 2.5). Figure 2-1 illustrates the various domains that need to be considered for deciding the architecture of a spatial data warehouse (Section 3.4). A survey of the state-of-the-art in each of these domains is the topic of the current section. Over the past decade several approaches have been proposed for modelling data warehouses. Now, we present a study of the best available conceptual and


logical models for data warehousing. Existing conceptual modeling approaches can be broadly classified into Extensions of Entity Relationship (E/R) models (Franconi and Kamble 2004; Kamble 2008; Malinowski and Zimanyi 2006; Sapia et al. 1999; Tryfona et al. 1999), Extensions of Unified Modeling Language (UML) (Abello et al. 2006; Lujan-Mora et al. 2006; Prat et al. 2006) and Ad-hoc (Golfarelli et al. 1998b; Husemann et al. 2000; Viswanathan and Schneider 2010; Zepeda et al. 2008) design models.

Several different logical models have also been proposed to model multidimensional data in the past few years. The data cube operator (Gray et al. 1996) was the first clear attempt to extend the relational model to suit multidimensional analysis. A complete survey of the properties of several earlier logical design models can be found in the works of Blaschka et al. (Blaschka et al. 1998), Vassiliadis et al. (Vassiliadis and Sellis 1999) and Pedersen et al. (Pedersen et al. 2001). Though many of these models aid in the relational representation of aggregate data, contributions like the ALL operator and concepts regarding data hierarchies are significant even in a multidimensional context.

One of the earliest approaches for multidimensional modeling was introduced by Kimball in (Kimball 1997). This dimensional modeling approach proposes an informal methodology to derive the multidimensional schema and provides a way to develop a relational implementation in the form of the star schema. Dimensional modeling imposes some rules on the modeling but results in a data model that has the access methods defined clearly by virtue of the relationships (Kimball 1997; Kimball, R. and Ross, M. 2002). Users are also better able to relate to the `see measure by dimensional value(s)' paradigm rather than a simple collection of values. The approach involves discovering the data marts for the data warehouse space, listing all dimensions for each data mart, using an ad-hoc matrix to capture user requirements, and then designing a fact table with measures added to each grain of detail along the dimension levels.

The model presented by Agrawal et al. (Agrawal et al. 1997) is a logical data model for multidimensional databases. The cube is defined as a set of

PAGE 39

dimensions (each associated with a domain) and a set of elements (measures). A mapping is provided between the dimensions and the set of elements. The elements of the cube can be 0, 1 (the Boolean cube) or an n-tuple of elements. This model does not require the dimensions to have a ranked, discrete domain. Instead the mapping function can be used to provide a symmetric treatment between measures and dimensions. An algebra is also defined over the model with operations such as push and pull (to transform a dimension into a measure and vice versa), destroy dimension, restriction (to constrain member values), and join (to combine two cubes). Several other operations like cartesian product, natural join, and associate are also mentioned. However, this model does not discuss the handling of explicit multiple hierarchies among dimensions or the problem of imprecision due to double counting during data aggregation.

Figure 2-1. An illustration of the various domains considered during the design of the spatial data warehouse meta-framework.

2.4 Physical Implementation of Data Warehouses

2.4.1 The Data Warehouse Architecture

Hurtado et al. (Hurtado et al. 1999) study the effect of dimension updates on data cubes and present a model for both domain updates and structural updates of dimensions. (Lenz and Shoshani 1997) presents the necessary and sufficient
conditions for summarizability in OLAP. (Malinowski and Zimanyi 2005b) addresses issues with spatial data and relates hierarchies and spatial measures. (Han et al. 1998) defines the spatial measure as a collection of pointers to spatial objects and presents three types of dimensions that exist in a spatial data cube. Additionally, for relational OLAP the conventional schema designs such as star, snowflake, galaxy (sub-dimension tables) or fact constellation (many facts with shared dimension tables) are well known and discussed in numerous works on data warehouse design (Inmon 2005; Kimball 1997; Poe et al. 1997).

2.4.2 Data Hierarchies

A thorough study of classification hierarchies is crucial for the development of a powerful multidimensional model. There have been several papers discussing cube hierarchies in the context of conceptual modeling. Malinowski et al. (Malinowski and Zimanyi 2004), (Malinowski and Zimanyi 2006) provide a detailed treatment of hierarchy requirements using a real-world example. (Niemi et al. 2001) provides a brief classification of hierarchies and enforcing dependencies on various hierarchy types. We provide a detailed discussion on different types of hierarchies involved in complex data types later in Chapter 3.

2.4.3 Storage and Retrieval of Large Structured Objects

The need for extensibility in databases, in general, and for new data types in databases (Stonebraker 1986), in particular, has been the topic of extensive research from the late eighties. In this section, we review work related to the storage and
management of structured large application objects. The four main approaches can be subdivided into specialized file formats, new DBMS prototypes, traditional relational DBMS, and object-oriented extensibility mechanisms in DBMS.

The specialized file formats can be further categorized into text formats and binary formats (McGrath 2003). Text formats organize data as a stream of Unicode characters whereas binary formats store numbers in native formats. XML (Bray et al. 2000) is a universal standard text data format primarily meant for data exchange. A critical issue with all text data formats is that they make the data structure visible and that one cannot randomly access specific subcomponent data in the middle of the file. The whole XML file has to be loaded into main memory to extract the data portion of interest. Moreover, the methods used to define the legal structure for an XML document such as Document Type Definition (DTD) and XML Schema Definition (XSD) have several shortcomings. DTD lacks support for data types and inheritance, while XSD is overly verbose and unintuitive when defining complex hierarchical objects. On the other hand, binary data formats like NetCDF (McGrath 2003; Rew et al. 2004) and HDF (HDF - Hierarchical Data Format 2011; McGrath 2003) support random access of subcomponent data. But updating an existing structure is not explicitly supported in either format. Further, since HDF stores a large amount of internal structural specifications, the size of an HDF file is considerably larger than a flat storage format. Further, these file formats do not benefit from DBMS properties such as transactions, concurrency control, and recovery.

The second approach to storing large objects is the development of new DBMS prototypes as standalone data management solutions. These include systems such as BSSS (Hwang et al. 1994), DASDBS (Schek et al. 1990), EOS (Biliris 1992), Exodus (Carey et al. 1988), Genesis (Batory et al. 1988), and Starburst (Haas et al. 1990). These systems operate on variable-length, uninterpreted byte sequences and offer low-level byte range operations for insertion, deletion, and modification. However, these
systems do not manage structural information of large application objects and are hence unable to provide random access to object components.

The third approach taken to store large objects is the use of tables and BLOBs in traditional object-relational database management systems. Any hierarchical structure within an object can be incorporated in tables using a separate attribute column that cross-references tuples with their primary keys. Some databases such as Oracle even support hierarchical SQL queries on such tables. However, the drawback of this method is that the querying becomes unintuitive and has to be supported by complex procedural language functions inside the database. Further, these queries are slow because of the need for multiple joins between tables. Binary Large OBjects (BLOBs) provide another means to store large objects in databases. However, this is a mechanism for storing unstructured, binary data. Hence, the entire BLOB has to be loaded into main memory each time for processing purposes.

The fourth approach to storing large objects is the use of object-oriented extension mechanisms in databases. Most popular DBMS support the CREATE TYPE construct to create user-defined data types. However, the type constructors provided (like array constructors) do not allow the creation of large and variable-length application objects.

2.4.4 Cube Materialization

Hurtado et al. (Hurtado et al. 1999) study the effect of dimension updates on data cubes and present a model for both domain updates and structural updates of dimensions. (Lenz and Shoshani 1997) presents the necessary and sufficient conditions for summarizability in OLAP. (Malinowski and Zimanyi 2005b) addresses issues with spatial data and relates hierarchies and spatial measures. (Han et al. 1998) defines the spatial measure as a collection of pointers to spatial objects and presents three types of dimensions that exist in a spatial data cube.
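
The table-based storage approach reviewed in Section 2.4.3, in which tuples cross-reference one another through their primary keys, can be sketched as follows. This is only a minimal illustration under stated assumptions, not the storage design proposed in this dissertation: the table name component, its columns, and the helper function subtree are hypothetical, and SQLite's recursive common table expression is used here in place of Oracle-style hierarchical queries. The sample rows mirror the faces and cycles named in the caption of Figure 2-3.

```python
import sqlite3

# Hypothetical schema: one row per component of a structured object.
# parent_id cross-references the primary key of the enclosing component.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE component ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES component(id),"
    " kind TEXT NOT NULL,"
    " label TEXT NOT NULL)"
)

# Region R with faces F1 (outer cycle C1, hole cycle C2), F2 (C3) and F3 (C4).
rows = [
    (1, None, "region", "R"),
    (2, 1, "face", "F1"),
    (3, 2, "outerCycle", "C1"),
    (4, 2, "holeCycle", "C2"),
    (5, 1, "face", "F2"),
    (6, 5, "outerCycle", "C3"),
    (7, 1, "face", "F3"),
    (8, 7, "outerCycle", "C4"),
]
conn.executemany("INSERT INTO component VALUES (?, ?, ?, ?)", rows)


def subtree(conn, root_label):
    """Return (depth, kind, label) for a component and all its descendants."""
    query = """
        WITH RECURSIVE walk(id, kind, label, depth) AS (
            SELECT id, kind, label, 0 FROM component WHERE label = ?
            UNION ALL
            SELECT c.id, c.kind, c.label, w.depth + 1
            FROM component AS c JOIN walk AS w ON c.parent_id = w.id
        )
        SELECT depth, kind, label FROM walk ORDER BY id
    """
    return conn.execute(query, (root_label,)).fetchall()


face_f1 = subtree(conn, "F1")      # one face with its cycles
whole_region = subtree(conn, "R")  # the complete object
```

Even retrieving a single face requires a recursive self-join in this sketch, which is consistent with the observation above that querying hierarchical objects stored this way becomes unintuitive and join-heavy as the nesting deepens.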

2.5 Requirements for User-Centric Spatial OLAP

For a data warehouse model to be effective in modeling, storing and querying data, some essential requirements need to be met. Blaschka et al. (Blaschka et al. 1998) provide a list of requirements for multidimensional modeling for OLAP applications. Pedersen et al. present eleven requirements for a multidimensional model (Pedersen et al. 2001) using a clinical data warehousing application as an example. These requirements are then used to evaluate fourteen existing models and to classify them into three groups: simple cube models, structured cube models and statistical object models. Finally, an extended multidimensional model is also presented. (Tsois et al. 2001) presents ten additional requirements for conceptual models of multidimensional data and compares them against several other earlier models. By studying these existing models for multidimensional data modeling (Section 2.1), along with several new OLAP tools and applications that have emerged in the last few years, we now compile a list of basic features that are essential for an effective user-centric spatial data warehouse model.

1. Multidimensional data store: A spatial data warehouse system must first and foremost qualify as a multidimensional data store. The primary reason behind this basic property is to ensure support for increasing data dimensionality over time. Since data warehouses typically integrate data from heterogeneous sources over a broad timeline, it is common to see a large number of attributes for each data object accommodated into a data mart. Several such data marts with varied and independent dimensionality are often threaded together to build a single, large enterprise-level data warehouse system. Thus, the model should allow for multiple facts, data dimensions and even multiple data cubes to be included in the data warehouse system. Limits to the multidimensionality of the data cube or the granularity of the members or measures should not be based on logical or implementation considerations.

2. Simple user view: As data warehousing and OLAP systems have become increasingly complex to understand and develop, in recent years the trend has been to get back to a user-centric approach for modeling multidimensional data. This brings up the second major requirement for a spatial data warehouse (SDW) model, in that the user view be simple and intuitive, yet capable of capturing the full dimensionality of the data. This can be achieved by a conceptual modeling strategy based on an
abstract view such as multidimensional arrays or cubes. In recent years, many data mining and complex knowledge gathering systems have been built over traditional relational databases and complex scientific file systems. However, for users to be able to perform complex analytics over large data, the foremost requirement is an easy-to-use interface. Such an interface should have the following properties: (i) simple, (ii) easy to understand the structure of the data (the interface should clearly illustrate the conceptual structure of the data, for example, in terms of classes and associations), (iii) easily available aggregation functions on particular data types and the means to apply them, (iv) easy to gather results, visualize and export or save them, (v) ability to perform multi-level queries using the results from previous aggregations, (vi) ability to create new types and specify the syntax and semantics of new operations in an extensible manner. One example of such a user interface is an abstract multidimensional view, like a data cube or multidimensional array with a supporting set of OLAP operations that would make it easier for users to navigate through hierarchical data and perform analysis.

3. Implementation independent conceptual design: By definition, any conceptual model should be completely free of implementation aspects to serve as an effective data model. Since business intelligence (BI) systems are most often used by data analysts for decision support, the user view should be independent of implementation aspects to ease analysis. Thus, for example, the use of fact tables and dimension tables (thereby exposing ROLAP implementation) should be avoided in the user view. The data should be viewable to the analyst at an abstract and high level, without requiring the understanding of complex logical designs such as star, galaxy or snowflake schema, or physical implementation considerations such as optimizations for materialized views, column stores and indexing requirements. Internal requirements should not dictate the conceptual design of the data warehouse as this can further force users to model data warehouses for system-specific implementations and restrict data analysis.

4. Separation of structure and values: There should be an explicit separation of schema and instances, i.e., the structure of data and their actual values. The distinction between structure and instances of the data cube helps the analyst to apply OLAP operations and manipulate the multidimensional view of data and its contents independently. For example, consider dropping a data dimension in a data cube. Though this is a simple conceptual operation for the user to change the structure of the data cube, it also drastically affects the values inside the cube. The cells of the cube have to be re-evaluated and their contents updated with new measures. However, this reformulation of the cube should not affect the view of the data cube itself for the analyst, and should only be noticeable as a change in the state of the cube. This helps in keeping analysis online and user-friendly, without developing new transformations from conceptual to logical designs for each new state of the multidimensional cube. Further, this also allows for efficient type checking for ensuring the validity of complex OLAP operations.
Figure 2-2. Illustration of (a) a complex region object with three faces and its interior, boundary and exterior point sets, and (b) a single face, also denoted as a simple region with holes.

5. Complex abstract types and data objects: The model should support the basic set of data types such as alphanumeric types (int, char, etc.) and more complex types such as geo-spatial types (point, line, region, etc.) and temporal types (time interval, instant, etc.). This helps to integrate existing types and operations into the system for OLAP analysis. Further, the model should be extensible to support abstract user-defined types (UDTs) and operations on them. There must be facilities to specify the syntax and semantics of such UDTs, along with any additional constraints to ensure meaningful aggregations on such data. These complex objects can reside as measures of analysis or as the members of the data dimensions in the multidimensional data warehouse. Additionally, the model should also provide support for multiple (composite) and complex members and measures. For example, a cell in a sales data cube can conceptually include several measures such as sales quantity, inventory and/or sales profit. Location can be a complex object such as a polygon representing Italy with the Vatican as a hole inside it. An example of a complex region object is illustrated in Figure 2-2, with its several faces and segment cycles (representing the object boundary). Thus, it is essential that the spatial data cube be capable of storing and managing spatial members and measures as simple, complex and composite (map) spatial objects.

6. Descriptive attributes: Thematic or descriptive attributes for members and measures (geometric or otherwise) allow adding additional information about data. For example, applications such as web data warehouses often involve one or more keyword or tag fields, and GIS and spatial database systems use labels to help identify, qualify and correctly represent composite spatial partitions. This must be supported by the SDW model. Additionally, selection, navigation and aggregation queries over such thematic attributes should also be available to improve analysis capabilities.

7. Explicit hierarchies: Hierarchies (with several levels of member or measure categories) should be supported explicitly in the data dimensions and even for the various facts of analysis. Such data hierarchies should be supported as first
class citizens inside the data warehouse. Moreover, hierarchies should also be supported in their most general form, meaning that ragged, unbalanced and uneven hierarchies of data should be available and usable for analysis. This requirement provides an opportunity to model complex data objects with variable representations in the most generic and user-friendly format without any influence from implementation considerations (Niemi et al. 2001).

Figure 2-3. Illustration of (a) a complex structured region showing faces F1 (containing outer cycle C1 and hole cycle C2), F2 (with cycle C3) and F3 (with cycle C4), and (b) a hierarchical representation for the region (or multi-polygon) object.

8. Support for spatial hierarchies: The model should support generalization and specialization hierarchies on spatial objects. This would, for example, enable roll-up operations from a city level to a state level to a country level in the location hierarchy. Further, allowing inter-linking of spatial hierarchies with thematic attribute hierarchies allows for improved multidimensional data analysis. For example, a query such as "Find the trajectories of hurricanes labelled Category 3 or higher that traversed the state of Florida in 2005" involves returning spatial line objects (trajectories) selected through spatial (Florida), temporal (year) and thematic (hurricane category) constraints.

9. Multiple hierarchies: Multiple hierarchies along the data dimensions and even measure values should be supported. Data hierarchies in SDWs can be of two basic kinds: data dimension hierarchies and object hierarchies. The former set includes hierarchies along the data dimensions, which can allow users to roll up or drill down along the levels of the hierarchy. The object hierarchies are complex hierarchies representing the internal structures of basic data types such as a region. Thus, in their most generic form, SDWs should support object hierarchies as members of data hierarchies. However, the uniqueness and constraints for each of them should be maintained individually; for example, it must be possible to ensure that a region object does not contain dangling lines or independent points in its structure. The support for multiple data dimension hierarchies is a well known requirement for data warehouses (Malinowski and Zimanyi 2004; Pedersen et al. 2001; Tsois et al. 2001). However, in this work, we also motivate the need for object hierarchies, to enable native support for hierarchical UDTs in DWs. Consider, for example, Figure 2-2a that illustrates a complex region object which
consists of three regions with one of them inside the hole of another. Another example in Figure 2-2b displays a single face of a region object (which can also be regarded as a simple region) with multiple holes. Such complex data objects can require several hierarchies for correct representation of their structure, attribution of internal types to measure values ("Find the quantity of sales of coffee beans in mainland USA (one face of the entire USA object) in 2010") and for performing efficient operations on them. Figure 2-3a provides a more detailed visualization of a complex region object with three faces labeled as F1, F2 and F3. The interior, exterior and boundary point sets of the region are also displayed. After performing a scan operation, the cyclic order of the region's boundary is stored to represent each face uniquely. A secondary hierarchy linking the sibling lists of outer cycles can help to optimize operations such as intersections and unions that involve computations on the object's geometry. Figure 2-3b shows the detailed tree structure of a region object. In the figure, face[], holeCycle[], and segment[] represent a list of faces, a list of hole cycles and a list of segments respectively. In the tree representation, the root node represents the structured object itself, and each child node represents a component named sub-object. A sub-object can further have a structure, which is represented in a sub-tree rooted with that sub-object node. For example, the region object in Figure 2-3a consists of a label component and a list of face components. Each face in the face list is also a structured object that contains a face label, an outer cycle, and a list of hole cycles, where both the outer cycle and the hole cycles are formed by segment lists. While storing such a region object in the data warehouse, it is necessary to provide the ordered segment lists for the face cycles for performing efficient plane-sweep operations. However, we would also like to store secondary hierarchies in the structure such as the basic region hierarchy as illustrated in Figure 2-3b. This can only be achieved if multiple hierarchies are allowed in the spatial data warehouse.

Figure 2-4. Three types of spatial dimensions: (a) Geometric, (b) Non-Geometric, (c) Mixed.

10. Support for irregular hierarchies: There must be support for non-conformant (non-onto, non-strict and ragged) hierarchies and generalization/specialization (is-a) relationships (Malinowski and Zimanyi 2004; Niemi et al. 2001). For example, consider a location hierarchy that exists in a Sales data warehouse: ⟨City → County → State → Country⟩. If the user would later like to include another hierarchy such as ⟨School District → City → Voting Zone → Country⟩
by creating new, independent levels along with some existing levels, this should be allowed by the abstract model. Note that the two paths of the location hierarchy illustrated above specify different aggregation semantics on the measure values. Such scenarios are often encountered in data warehouses where updates to the structure of data emerge with the inclusion of heterogeneous datasets over time.

Table 2-1. Examples of non-spatial and spatial aggregation operators

Type            BigCube Aggregation Operator
Additive        Sum, Count, Max or Apex, Min or Base, Concatenate, Length, Area,
                Convex Hull, Spatial Union, Spatial Intersection
Semi-Additive   Average, Variance, Standard Deviation, MaxN, MinN, Centroid,
                Center of Gravity, Center of Mass, Diameter, Perimeter
Non-Additive    Median, Most Frequent, Rank, (First) Last NonNull Value, Diameter,
                Minimum Bounding Box, Nearest Neighbor, Equi-Partition

11. Support for spatial dimensions: Hierarchies can be used as the data dimensions defining the spatial data cube structure of the data warehouse. This native support for spatial dimensions will help users to perform selection, navigation and aggregation operations on both members and measure values easily. The support for spatial data and spatial hierarchies along the data dimensions is one of the essential requirements for any integrated spatial decision support system. Additionally, the model should allow one or more spatial hierarchies to be combined as a single data dimension to define the cube structure. Using a classification of existing concepts on data hierarchies, a unique separation of data dimensions as fully geometric, semi-geometric and non-geometric dimensions (Rivest et al. 2001b) is often considered in spatial ROLAP. There exist example applications of SOLAP systems supporting these three types of spatial dimensions (Figure 2-4) (Bedard et al. 2001). Geometric spatial dimensions are said to comprise geometric shapes in all levels of the data dimension. Non-geometric or descriptive dimensions essentially contain only alphanumeric data in their dimension members, and mixed spatial dimensions comprise some spatial shapes and non-spatial data along the hierarchical levels. Though this distinction of types sounds intuitive, it is often difficult to formalize data types and apply constraints for closure and uniqueness based on such generic structures. Another approach to handling spatial data along data dimensions is to have unique geometric and non-geometric data dimensions followed by an association operator between them. For example, one can follow the path from a city object (polygon) to a state name (string) in GIS systems by first using the map generalization operator and then selecting the map label for that state. A similar notion could be applied in spatial data cubes to generalize spatial dimensions with non-spatial thematic attributes.

12. Support for attribute aggregations: The model must provide good support for aggregation on both geometric and alphanumeric attributes apart from basic numeric and statistical computations on the members and measures of the OLAP
cube. There should also be support for aggregations along attributes that are not part of the data dimensions, hierarchies or measures themselves, such as thematic attributes. Further, apart from aggregations inside just one data store, analytics should also be available between different OLAP cubes to enable the seamless integration of the various data marts. Examples of possible aggregate operations are shown in Table 2-1.

13. Support for spatial aggregations: The model should explicitly support aggregation operations (examples in Table 2-1) on the spatial measures and members. For example, "Return the convex hull on cities having the top-k highest sales of iPads in every US state in 2010."

14. User-defined aggregates, extensible OLAP: User-defined aggregation functions should be supported. These may even include ad-hoc operations such as ratio (metric) and multi-level buffer (geometric) operations. For example, consider a query to find the moving buffer in ranges of 10 km over Fukushima Prefecture in Japan to assess the extent of spread of radioactivity through the atmosphere and to aid in relocating the population (Figure 2-5) after the 2011 earthquake. Since the rate of contamination reduces with the range from the affected region, the buffers and the rate of contamination in each zone in this case are aggregations computed over the previous buffer extents based on the rate of decay of nuclear activity with spatio-temporal variations. Thus, support for ad-hoc, user-defined geo-spatial operations on spatial measures, members and their thematic attributes provides for an extensible data analysis system.

15. Online aggregation: The model should allow for multiple levels of online aggregation, i.e., dynamic, multi-level query design. This allows the user, for example, to navigate along a spatial data dimension of the cube while aggregating the measure values inside it and determining a valid analysis result.

Figure 2-5. Illustration of three 10 km range buffers for Fukushima Prefecture in Japan to assess the spread of contaminated material during the 2011 earthquake and the resulting effect on nuclear power plants.

16. Handling data imprecision and summarizability conditions: An important property of any multidimensional data model is ensuring correct summarizability.
This property was first introduced in the domain of statistical databases (Lenz and Shoshani 1997). An SDW model should be able to handle data imprecision so that double-counting of data is avoided, and non-additive data are not summarized. This is particularly relevant for semi-additive and non-additive aggregate operations on spatial OLAP data. For example, consider averages computed for the sales percentages of an item at the city granularity. When rolling up to the next level, i.e., state, one must ensure that the average is re-computed by taking into account the new sales percentage figures at the new granularity. Moreover, the association of spatial geometry to measure values must be evaluated correctly while performing aggregations across spatial hierarchies. Additionally, the data model must provide built-in support for aggregations on additive, semi-additive and non-additive data in an easily perceptible manner for users.

17. Drill-across capability: The model should support drilling across dimensions, i.e., sharing of dimensions among different fact cubes. For example, consider a Sales data cube with the following data dimensions: product, time and location. For the location information consider a level called district in one cube. The same level is named prefecture in another data cube due to localizations. Thus we require capabilities to relate and drill across data cubes along varied granularities by means of suitable association functions. However, the aggregations over such operations should be correct and should yield meaningful results. This can be achieved by explicitly monitoring the current state of a cube during OLAP navigation.

18. Drill-through capability: The model should support drill-through capability to be able to query the base level (raw) data. This means that access to the base data cube and the stored low-level data (in databases, spreadsheets or complex scientific formats) must be available to the user.

19. Handling uncertainty: The model should also be able to handle the uncertainty in the data using techniques such as data lineage tracking or special null or {?} values.

20. Handling changes over time: Another requirement for data warehouses, which has been the subject of research in several domains including statistical databases, multidimensional data warehouses and temporal OLAP systems, is the ability to handle updates and deletions over time. Since data warehouses typically collect data over a long time period, the system should ensure that (re)calculations of measure values are consistent and correct over time.

In conclusion, although several approaches have been proposed to model data warehouses in recent years, most of them lie at the logical design and physical implementation levels. From the review in Section 2.1, we find that only few approaches are conceptual, in that they take into account user design aspects and abstract
from implementation considerations. Further, only few models satisfy many of the requirements enumerated above. Among such models, the E/R extensions from Malinowski et al. (Malinowski and Zimanyi 2004; Malinowski and Zimanyi 2006) (MADS model) achieve a clear conceptual separation of schema and instances in the multidimensional dataset. Further, this model also supports explicit hierarchies of several kinds (both balanced and ragged) and allows for complex measures with different levels of granularity. However, the model becomes quite complicated for users due to the varied extensions to E/R. Further, it does not support aggregation along thematic attributes, and only has partial support for multiple facts and complex generalization and specialization hierarchies in the schema. The starER model (Tryfona et al. 1999) supports complex measures in the data cube but only has partial support for irregular data hierarchies and attribute aggregations along different levels of granularity. Among the extensions to UML class-association modelling, the Dimension Fact Model (DFM) (Golfarelli et al. 1998b) provides a separation of structure and instances and considers several different kinds of data hierarchies. However, this model also does not support aggregation via thematic attributes and only provides partial support for multiple facts in the data cube and OLAP functions such as drill-across. Among other ad-hoc and specialized data warehouse design models, the multidimensional model by Pedersen (Pedersen et al. 2001) provides good support for complex measures and aggregations along such measures and thematic attributes. However, the model defines several complex types for the multidimensional elements and only has partial support for irregular hierarchies and user-defined aggregates. The BigCube model (Viswanathan and Schneider 2010) satisfies most of the above requirements such as simple conceptual user-view, support for several different kinds of data hierarchies and online aggregations by using the cube as an abstract data type and defining operations explicitly on data hierarchies on its perspectives. A detailed description of this BigCube model, along with its algebra, new aggregation operators, and a supporting query
language, so as to provide a significant improvement in user-centric, multidimensional data warehousing of complex spatial data, is the primary contribution of this dissertation.

2.6 Spatial Data Warehouse Design Models and SOLAP Tools

Spatial data warehousing (SDW) has become a topic of growing interest in both the database and GIS communities in recent years. This is primarily due to the explosion in the amount of spatial information available from various sources such as GPS receivers, communication media, online social networks and other geo-spatial applications. Consequently some spatial OLAP tools are now available to help model and analyze such data. The term SOLAP was introduced very early (Rivest et al. 2001b), though the concept of leveraging decision support systems with spatial data and operations has been the topic of research in database and GIS communities for several years now.

An early approach to spatial online analytical processing (SOLAP) is (Rivest et al. 2001b), which mentions essential SOLAP features classified into three areas of requirements. The first is to enable data visualization via cartographic (maps) and non-cartographic displays (e.g., 2D tables), numeric data representation and the visualization of context data. Second, data exploration requires multidimensional navigation on both cartographic and non-cartographic displays, filtering on data dimensions (members) and support for calculated measures. The third area discussed involves the structure of the data, for example, the support for spatial and mixed data dimensions, support for storage of geometric data over an extended time period, etc.

The conceptual design models for spatial data warehouses are extensions of E/R and UML diagrams or ad-hoc design approaches. Among extensions of E/R models, (Malinowski and Zimanyi 2004) presents a clear integration of spatial data for OLAP by extending the MultiDimER and MADS approaches. Among other ad-hoc design approaches, (Ferri et al. 2002) presents a formal framework to integrate spatial and multidimensional databases by using a full containment relationship between the hierarchy levels. In (Jensen et al.
2004), the formal model from (Pedersen et al. 2001) is extended to support spatially overlapping hierarchies by exploiting the partial containment relations among data levels, thus leading to a more flexible modeling strategy. Bimonte et al. (Bimonte et al. 2006; Bimonte et al. 2010) present the GeoCube model for spatial data warehouse design, based on a formal schema and instance definition for cube elements. GeoCube also extends conventional SOLAP operations with five new operations, namely classify, specialize, permute, OLAP-buffer and OLAP-overlay. However, one of the shortcomings of this approach is the use of n-n mappings between data dimensions and facts. Since each cell of the data cube is a unique cartesian product of the associated data dimensions, this n-n mapping weakens the cube structure and makes it difficult to apply constraints and dynamic schema changes during OLAP operations.

The logical SDW design models aim to provide support for spatial data dimensions (Scotch and Parmanto 2005), spatial measures (Han et al. 1997; Marchand et al. 2004; Rivest et al. 2005; Shekhar et al. 2001) and spatial aggregations (Gomez et al. 2009). The concept of spatial measures (with a specific geometric part) is defined either as references to spatial objects (Rivest et al. 2001b; Stefanovic et al. 2002), or as the results of topological, distance or metric operations (Malinowski and Zimanyi 2004; Rivest et al. 2001b), or as values associated with a spatial data dimension in the data cube (Han et al. 1998; Marchand et al. 2004). In (Stefanovic et al. 2002), the authors classify spatial dimension hierarchies according to their spatial references as non-geometric (like traditional descriptive data dimensions), geometric to non-geometric, and fully geometric. In addition to supporting spatial objects, most GIS models use both geometric (e.g., the extent of fire spread is shown as a polygon) and thematic or descriptive attributes (e.g., state name) to help qualify geometric data objects (Rigaux et al. 2002). This is a very useful feature for supporting spatial aggregation operations and map generalizations (such as moving from state level to country level in the location hierarchy). A discussion of spatial hierarchies and topological operators in a conceptual
SDW model is presented in (Malinowski and Zimanyi 2005a). Some techniques to refine ragged and unbalanced hierarchies in logical multidimensional database design by the use of functional dependencies are specified in (Niemi et al. 2001). Shekhar et al. (Shekhar et al. 2001) extend the MapCube operator to support spatial data and aggregations, but the model is rather constrained and not easily extendable for user-defined queries.

The major implementations of SOLAP tools can be broadly classified as OLAP dominant, GIS dominant, or integrated OLAP and GIS solutions (Rivest et al. 2005). OLAP approaches provide means for aggregation of data, while GIS approaches focus on geometric operations and visual data selections while limiting multidimensional data analysis. Another approach is the integration of OLAP and GIS systems (Bimonte et al. 2006; Marchand et al. 2004; Scotch and Parmanto 2005). New models to integrate OLAP and GIS systems are specified in (Bimonte et al. 2005) and (Bimonte et al. 2007), along with some possible operations in (Bimonte et al. 2006). The simple cube data models like (Datta and Thomas 1999) do not provide specifics of hierarchies among dimensions and the n-n relationship between facts and dimensions. (Han et al. 1998) discusses cube materialization aspects and defines spatial measures as a collection of pointers to spatial objects. In (Rivest et al. 2005), the authors present requirements and guidelines for implementing Spatial OLAP (SOLAP) technology and introduce a commercial product called JMAP that combines GIS and OLAP technology and presents an easy-to-use interface for performing analysis for non-technical users.

Though GIS systems have traditionally been used for geo-spatial exploration, they have an enormous drawback when considering OLAP analysis requirements. This is primarily because GIS systems are not built to support interactive navigation along data hierarchies and to provide decision support. Instead, they are often transactional systems that import spatial data files and provide a cartographic user interface for visual exploration and specific spatial functionality. However, an integration of GIS and
OLAP systems could be an interesting path for developing spatial data warehouses that can facilitate complex spatial operations along with decision support functionality. The GeoMondrian (GeoMondrian Project 2011) project aims to develop an open-source implementation of a SOLAP analysis server. Currently, it provides a spatially enabled version of the Mondrian OLAP server (Pentaho Analysis Services: Mondrian Project 2011). However, it is unclear whether GeoMondrian has a well-defined underlying spatial data model with SOLAP operators. It seems to be essentially built ad hoc, by using a combination of the Java Topology Suite (Java Topology Suite (JTS) 2011) (which provides spatial operations according to OGC standards) and Mondrian (which provides the OLAP operations on thematic attributes) with PostGIS (which provides the spatial data types) and JPivot (JPivot: JSP Custom Tag Library 2011) for the user view as a web-based spreadsheet. These together create a functional spatial data analysis toolkit supporting the integration of spatial data and operations in an OLAP server.

2.6.1 Data Warehouse Query Languages

There have been a few proposals for OLAP query languages intended for multidimensional and statistical databases. For example, the syntax and definition of a Data Mining Query Language (DMQL) for the DBMiner System is presented in (Han et al. 1996). DMQL supports concept hierarchies on the dimensions of the cube. The language provides a framework for mining through multiple levels of data using characteristic, discriminant, classification, and association rules. Since the language design is too specific, it is not complete in a data warehouse context. However, DMQL serves as an interesting querying platform for OLAP. The specification for SumQL, a query language for Summary Databases (SDB), is provided in (Pedersen et al. 2000). SumQL helps the user to pose aggregate queries over SDBs by taking advantage of the semantic properties of the summary data model. SumQL is an SQL-like query language that includes constructs to reflect SDB concepts such as dimensions, hierarchies over categories and aggregation. However, SumQL is
provided as a language for summary databases, and it does not explicitly include data warehousing and OLAP functions such as slicing, dicing, etc., and hence the user has to design complex queries for handling such operations. MultiDimensional eXpressions (MDX) (Microsoft Corporation 2010) is a powerful language for multidimensional data querying and is now supported by numerous BI and OLAP system vendors. However, MDX can be complicated for an unskilled user of a data warehouse, since it does not specify conventional OLAP operations such as slice or dice explicitly, but instead forces the user to join data across several dimensions (using multiple crossjoins among its members) to arrive at the required aggregation state. MDX has been widely adopted and is supported by Microsoft, Microstrategy, Whitelight, SAS, SAP and other popular analysis services vendors. Another multidimensional query language, the Cube Query Language (CQL), is presented in (Bauer and Lehner 1997); it is designed for statistical and scientific databases (SSDB), particularly the CROSS-DB system (Lehner et al. 1996). CQL is an extension of SQL that supports multidimensional querying and processes queries in a two-step fashion, with a data querying (GET) step followed by a data presentation (SHOW) step. It enables the use of features to further refine the selection process. It allows selection predicates not only on classification nodes, but also on feature values that characterize the single items. However, CQL does not provide a generic interface for conventional data warehouse operations.

For modeling spatial data there are now several established approaches in the database community. An introduction to basic spatial data types is given in (Shekhar and Chawla 2003). The ROSE algebra (Guting et al. 1995; Guting and Schneider 1995) provides a more robust discussion of spatial data types by introducing types such as point, line and region for simple and complex spatial objects and describes the associated spatial algebra. Composite spatial objects (collections of points, lines and regions) are presented as spatial partitions or map objects. Similarly, the Open
GIS Consortium also provides a Reference Model (Open GIS Consortium: Reference Model 2011) as a standard for representing geo-spatial information. Qualitative spatial operations include topological relations (Schneider and Behr 2006) such as disjoint, meet, overlap, equal, inside, contains, covers and coveredBy, and cardinal direction relations such as North, South, East or West. Quantitative relations on spatial objects include metric operations based on the size, shape and metric distances between objects or their components. Many of these operations can be used to select data objects, to restrict their range or extent, and to query and analyze spatial data in the data warehouse. Other implementation aspects of spatial data warehouses relate to spatial aggregation operations and cube materialization, which are discussed below.

2.6.2 Spatial Aggregation

There has been some work done in the area of spatial aggregation. (Lopez et al. 2005) presents a comprehensive survey on spatiotemporal aggregation that includes a section on spatial aggregation. (Rivest et al. 2001a) introduces the concept of SOLAP and describes the desirable features and operators a SOLAP system should have. However, there is no formal model presented. Han et al. (Han et al. 1998) used OLAP techniques for materializing selected spatial objects, and presented the concept of a Spatial Data Cube. This model only supports aggregation of such spatial objects. Pedersen et al. (Pedersen and Tryfona 2001) propose a model for the pre-aggregation of spatial facts. First, they pre-process these facts, computing their disjoint parts in order to be able to aggregate them later, given that pre-aggregation works if the spatial properties of the objects are distributive over some aggregate function. This proposal ignores the geometry, and does not address forms other than polygons. The authors do not report experimental results. Extending this model with the ability to present partial containment hierarchies (used for a location-based services environment), Jensen et al. (Jensen et al. 2002) proposed a multidimensional data model for mobile services, i.e., services that deliver content to users depending on their
location. This model omits the geometry, limiting the set of queries that can be addressed. (Gomez et al. 2007) define summable queries and provide a method for integrating GIS and OLAP. This is implemented using a new query language called GISOLAP-QL. (Ruiz and Times 2009) presents an exploratory note on combining scalar and spatial aggregation operators. Using the classification of aggregation functions into distributive (additive), algebraic (semi-additive) and holistic (non-additive) from (Han and Kamber 2006), the authors provide an interesting classification of aggregation operators on spatial data as distributive numeric and distributive spatial, algebraic numeric and spatial, and holistic numeric and spatial operators. They also present an example application based on the GeoMDQL concept (Silva 2008; Silva et al. 2007) for a geographic data model and processor, using practical data from the Brazilian public health system.

2.6.3 Cardinal Directions between Spatial Objects

A good survey of existing approaches for modeling cardinal directions between region objects without temporal variation is provided in (Chen et al. 2010). The models proposed to capture cardinal direction relations between simple spatial objects (like point, line, and region objects) as instances of spatial data types (Schneider 1997) can be classified into tiling-based models and minimum bounding rectangle-based (MBR-based) models, some examples of which are shown in Figure 7-3. Tiling-based models use partitioning lines that subdivide the plane into tiles. They can be further classified into projection-based models and cone-shaped models, both of which assign different roles to the two spatial objects involved. The projection-based models define direction relations by using partitioning lines parallel to the coordinate axes. The Direction-Relation Matrix model (Goyal and Egenhofer 2000; Skiadopoulos and Koubarakis 2004) helps capture the influence of the objects' shapes as shown in Figure 7-3c. However, this model only applies to spatial objects with non-temporal variations. It also leads to imprecise results with intertwined objects. We introduced
an improved modeling strategy for cardinal directions between region objects in (Chen et al. 2010). The cone-shaped models define direction relations by using angular zones. The MBR-based model (Papadias and Egenhofer 1997) approximates objects using minimum bounding rectangles and brings the sides of these MBRs into relation with each other using Allen's interval relations (Allen 1983).

Figure 2-6. Possible configurations between two objects: spatial variation (a), spatio-temporal variation in one object (b), spatio-temporal variation in both objects (c).

The data warehouse helps to store and query over large, multidimensional datasets and is hence a good choice for storing and querying spatio-temporal data. We presented a conceptual, user-centric approach to data warehouse design called the BigCube model in (Viswanathan and Schneider 2010) and the query language called Cube Analysis Language (CAL) in (Viswanathan and Schneider 2011a). Several other models have also been proposed to conceptually model a data warehouse using a cube metaphor, as surveyed and extended in (Malinowski and Zimanyi 2005a; Pedersen et al. 2001). Moving objects, their formal data type characterizations and operations have been introduced in (Guting et al. 2000; Lema et al. 2003). However, all existing works lack a model for qualitative direction relations between moving objects. We provide a solution to this problem by describing the basics of the moving object data warehouse (MODW) framework as an extension of basic data warehouses in Chapter 7.

CHAPTER 3
CUBE AS AN ABSTRACT DATA TYPE: THE BigCube MODEL

In this Chapter, we introduce a meta-framework for data warehouse design which helps to keep the modeling and analysis user-centric and provides an extensible model to support complex spatial objects and aggregations on them.

3.1 The Refined Data Warehouse Architecture

The development of a data warehouse starts with the modeling step. Similar to the ANSI three-tier database design architecture, we propose a three-tier architecture to design and model the entire data warehouse or OLAP system. This data model includes three views: the User view, the Design view and the Implementation view. These different levels help to provide users with an abstract view of the data (see Figure 3-1). The user view consists of external cube views that provide a high-level abstraction of the data warehouse for the business analysts. It allows for cube design and query operations from a purely analytical standpoint. The conceptual design view is provided by our BigCube model and this supports the user view, helping the analyst to abstractly model the data warehouse without regard to any implementation aspects. However, it also helps to validate the structure and contents of the data warehouse. The logical design views help to include the logical implementation aspects of the system. This phase, often performed by the data warehouse designer, provides the translation from the BigCube conceptual model to the specified logical implementation paradigm, namely Relational OLAP, Multidimensional OLAP or a combination of both such as Hybrid OLAP. The Implementation view provides the lowest level of abstraction, for the data warehouse developer or administrator. It includes file handling and physical data warehouse design aspects such as storage, buffer and consistency management, RAID, data integration, etc. This three-tier data warehouse design architecture is described in detail below.

Figure 3-1. The Generic Three-Tier Architecture for Modeling Data Warehouses

The User View primarily provides the business user of the data warehouse system with a set of External Cube Views that present different sets of the data from the data warehouse. At this highest level of abstraction, the External Cube Views help users to visualize relevant data from the data warehouse. External Cube Views are a mechanism to visually present query results to enterprise business users according to their level of authorization for viewing a specific part of the organization's data. The cube view presents an intuitive model for data analysis and the multidimensional view is easily understood by the analyst. These are also used to present the results of OLAP queries to the user through suitable GUI tools.

The Design View is the second tier in the data warehouse design model. This can be divided into the Conceptual Design phase and the Logical Design phase. The Conceptual View is the first phase of the data warehouse design view, and serves
to capture all the domain and data constraints for analysis. A set of standardized transformation rules is applied to transform the conceptual design view (BigCube) into the elements of logical design. The Conceptual Design Phase describes the stored data in terms of the BigCube data model for data warehouses. Most models utilize an extension of the Entity-Relationship diagram, UML CASE packages, or an ad hoc combination of these concepts to present the conceptual data model. However, in our model we shall use a cube view called BigCube to model the multidimensional data space. The model includes operators to design, develop and query the data cubes.

The Logical View is the second major phase of the data warehouse design, and allows for the conversion of the conceptual model into logical DW implementation models such as ROLAP, MOLAP or HOLAP. For example, if the relational model is chosen for data warehouse implementation, the data warehouse designer can use the BigCube ROLAP transformation rules to convert the Conceptual Model (from the previous phase) into the Star, Snowflake or Fact Constellation schema. Materialized views are also designed at this level. The conversion from ER to the relational tables is handled by a special set of conversion rules.

The Implementation View is the lowest level of abstraction of the data warehouse, and describes how the data is actually stored in the system. It presents the physical storage aspect of the data warehouse to the administrator. The Internal Storage Design level takes care of complex low-level data structures and issues related to file storage, buffer management, transaction and concurrency control. The Physical Storage level comprises integration units with heterogeneous store databases that help to complete the data warehousing physical system.

3.2 Using the Meta-Framework for Spatial Data Warehouse Design

After reviewing the existing data warehouse and SOLAP modeling approaches and generating the list of essential requirements for an effective spatial data warehouse
model, we now provide a broad insight into how a spatial data warehouse architecture should be constructed for supporting user-centric OLAP. For providing user-friendly spatial data analysis it is essential to use an abstract data model to design and construct the data warehouse. This can be provided by a conceptual design view that fully abstracts over the underlying implementation details. To allow users to interact with the conceptual cube, a user view (query language or visual map interface) can be used to expose the set of data types and operations for OLAP analysis. At each level, explicit support for spatial data must be provided using spatial data types, which can be single objects such as point, line or region, or a combination of these in terms of spatial partitions or map objects. Figure 3-2 illustrates such a meta-framework that we propose for spatial data warehouse design.

Figure 3-2. Components of the Meta-Framework for Spatial Data Warehouse Design.

The conceptual model should provide built-in support for spatial objects by using abstract data types (ADTs) or by extending multidimensional data types such as
perspectives (data dimensions) and subjects (facts) to include spatial values. Later, additive, semi-additive and holistic classes of aggregation operations can be defined over them (Gray et al. 1996; Han and Kamber 2006; Ruiz and Times 2009). Such a conceptual design view can be an extension of the BigCube model (Viswanathan and Schneider 2010), which provides ADTs arranged over different levels to create the conceptual cube, or one of the other conceptual modelling approaches supporting spatial data analysis (Bimonte et al. 2010; Gomez et al. 2009; Jensen et al. 2004; Malinowski and Zimanyi 2004).

A set of transformation rules is needed from the conceptual model to the logical design level. The logical design can be done in one of three ways. Data warehouse star, snowflake or galaxy schemas can be constructed and the corresponding relational tables are stored in a database linked by foreign keys and other functional dependencies. This is called Relational OLAP or ROLAP. In multidimensional OLAP design, data cubes can be constructed in memory to store and operate over the data warehouse. This is very similar to the cube model used for conceptual design. However, though multidimensional querying is often faster in comparison to relational querying, this approach can lead to increased memory and storage requirements. A balance between these two approaches is achieved in Hybrid OLAP by using a combination of relational and multidimensional design strategies. For example, in-memory multidimensional arrays can be used for constructing the materialized views that enable faster query processing on frequently accessed measures and data dimensions, while base level data (at highest granularity) is still stored in relational datasets. A drill-through operation can be used to retrieve the raw data when required.

The user view can include generic textual query languages, a visual graphical dashboard of map clients such as OpenLayers (OpenLayers mapping client 2011),
Google Maps¹ (Google Maps 2011), Bing Maps (Microsoft Bing Maps 2011) or tabular representations using tools such as JPivot (JPivot: JSP Custom Tag Library 2011). A combination of these tools is often required for effective data visualization and user-friendly analysis to design multiple levels of queries.

¹ Google and Google Maps are registered trademarks of Google Inc.

This meta-framework can be used to represent the exact semantics of spatial aggregation operations on different view levels. For example, a query such as "find all adjacent states where more than 5000 iPhone units were sold in 2010" leads to a selection on thematic attributes followed by a test for the meet topological relation on the spatial partitions to generate the required results.

3.3 Case Study: Product Management

Let us consider an example product management dataset from a supplier for illustrating the features of the conceptual data warehouse model. The dataset stores inventory, sales orders and invoice information for the sales of products to customers. Products are categorized by their name and brand. Each product also has its type and color information. The cost price for each product is maintained by the supplier, along with its available quantity and the manufacturing location, as part of the inventory. The invoice issued by the seller contains the selling price and quantity of products ordered by the customer. Once an order is placed, the order commit time is recorded using day, month, week and year fields. The order location is also recorded for shipping purposes, and includes city, county, state, zone and country information. Further, the supplier also records the profit resulting from the sales of a certain quantity of products to track their overall sales revenue.

We will now briefly illustrate how to perform multidimensional modeling of this dataset using the BigCube approach. The formalization of the BigCube data types and their operations is provided in the next section. From the example, we gather the base
data values for the BigCube as Gainesville (for city), 2010 (for year), 100 (for sales quantity), etc. We then construct categories of values from the source data. Examples of these include year, month, week and day (for time). Similarly for location, we can construct categories such as city, county, state, zone and country. Internally these are represented as multiset data types (as explained in Section 3.4). Next, we build hierarchies over the categories of data. By definition each of the categories constituting the hierarchy should be at a unique level. Let us construct two hierarchies for time as: (day, month, year) and (day, week, year). Similarly, from the location, we can construct two hierarchies as: (city, state, country) and (city, county, zone, country). At the next level, we are now ready to build perspectives for analysis. This is rather clear from the source data. For example, we can construct time and location perspectives using the two hierarchies we built at the previous step. Similarly, subjects such as invoice, sales quantity, profit and inventory can be constructed from their constituent hierarchies. Finally, we build the BigCube as the union of cells, each being defined by the combination of available perspectives. The subjects of the BigCube are placed in the cells of the BigCube. In this manner the entire product management data can be modeled using a cube view. The resulting BigCube structure is illustrated in Figure 3-3.

3.4 The BigCube Model

3.5 Data Model and C3 constructs

In this Section, we present our data model for multidimensional data cubes supporting complex hierarchical spatial objects. These are extensions to the BigCube approach (Viswanathan and Schneider 2010), which is a conceptual metamodel for multidimensional OLAP data with unique types as shown in Table 3-1. To support complex objects in data warehouses we need new constructs that can handle data with complicated structures. However, to keep the data warehouse modeling user-friendly, the approach taken for conceptual modeling and for applying aggregations must be simple. The C3 constructs presented here satisfy both these requirements
by providing the analyst with three simple and logical operations to construct data cubes, namely categorization, containment and cubing. Later, by using classical OLAP operations such as slice, dice, rollup, drilldown and pivot, users can navigate and query the data cubes.

Figure 3-3. Illustration of the structure of a Sales BigCube showing three perspectives: Time, Product and Location that define two subjects of analysis: Sales-Quantity and Sales-Profit. Note the various hierarchies in each of the perspectives and subjects of analysis.

Categorization helps to create groupings of base data values based on their logical and physical relationships. Containment helps to organize the data categories into levels and place them in at least a partial ordering in order to construct hierarchies. Cubing or Combination takes different categories of data from the various hierarchies and helps to create a data cube from them by specifying meaningful semantics. This is done by associating a set of members defining the cube to a set of measures placed inside the cube. Further, each of the C3 constructs has a set of analysis functions associated with it, called the A-set. An A-set can include aggregation functions, query functions such
as selections, and user-defined functions (UDFs). Since aggregations are fundamental to OLAP cubes, we first introduce the definition of an A-set in Definition 1.

Definition 1. Analysis set or A-set
An analysis set or A-set is a set of functions defined on the components of a data cube that are available for aggregation, querying and other user-defined operations. An A-set $A$ is built from the functions $a_i$, $q_i$ and $u_i$, where $a_i$ represents the $i$th aggregation function available, $q_i$ the $i$th query function available and $u_i$ the $i$th user-defined function (UDF) available in that particular cube component.

The A-set is available as part of every category, hierarchy and perspective in the data cube. The operations on the constituent elements of these cube components are specified by the corresponding A-set. Next, to facilitate the development of the C3 constructs and additional OLAP formulations, we present some necessary terminology and definitions based on lattice theory (Davey and Priestley 2002) and OLAP formalisms (Gray et al. 1997; Viswanathan and Schneider 2010).

Definition 2. Poset and its Top and Bottom Elements
A partially ordered set or poset $P$ is a set with an associated binary relation $\leq$ that, for any $x$, $y$ and $z$, satisfies the following conditions:
Reflexivity: $x \leq x$
Transitivity: $x \leq y$ and $y \leq z \Rightarrow x \leq z$
Anti-Symmetry: $x \leq y$ and $y \leq x \Rightarrow x = y$
For any $S \subseteq P$, $m \in S$ is a maximum or greatest element of $S$ if $\forall x \in S: (m \geq x)$, and is represented as $\max S$. The minimum or least element of $S$ is defined dually and
represented as $\min S$. A poset $(P, \leq)$ is a totally or linearly ordered set (also called a chain) if $\forall x, y \in P$: $x \leq y$ or $y \leq x$. With an induced order, any subset of a chain is also a chain. The greatest element of $P$ is called the top element of $P$ and is represented as $\top$, and its dual, the least element of $P$, is called the bottom element of $P$ and represented as $\bot$. A non-empty finite chain $P$ always has a $\top$ element. OLAP cubes often contain sparse data. To ensure that a bottom element exists and to make the OLAP operations generically applicable to all multidimensional cube elements, we perform a lifting procedure where, given a poset $P$ (with or without $\bot$), we take an element $0 \notin P$ and define $\leq$ on $P_\bot = P \cup \{0\}$ as: $x \leq y$ iff $x = 0$ or $x \leq y$ in $P$.

Definition 3. Disjoint Union and Linear Sum of Posets
Let $P$ and $Q$ be two (disjoint) posets. The union $P \cup Q$ of $P$ and $Q$ is the ordered set formed by defining $x \leq y$ in $P \cup Q$ iff: either ($x, y \in P$ and $x \leq y$ in $P$) or ($x, y \in Q$ and $x \leq y$ in $Q$). The linear sum $P + Q$ is defined by taking the following order relation on $P \cup Q$: $x \leq y$ iff:
$(x, y \in P \wedge x \leq y \text{ in } P) \vee (x, y \in Q \wedge x \leq y \text{ in } Q) \vee (x \in P, y \in Q)$
We represent $P \cup Q$ by placing $P$ and $Q$ side by side. To represent $P + Q$ we place $P$ directly below $Q$ and add a path from the greatest element of $P$ to the minimum element of $Q$. This concept helps in the coalescing of spatial data hierarchies.

Definition 4. Strict Poset
A strong or strict partially ordered set is a set with an associated binary relation that, for any $x$, $y$ and $z$, satisfies the following conditions:
Irreflexivity: ¬(x

Definition 5. Lattice
Let $P$ be a poset and let $S \subseteq P$. An element $u \in P$ is called an upper bound of $S$ if $\forall s \in S: (s \leq u)$. Dually, an element $l \in P$ is called a lower bound of $S$ if $\forall s \in S: (s \geq l)$. The sets of all upper bounds and all lower bounds are represented as $S^u$ and $S^l$ respectively.
$S^u = \{u \in P \mid (\forall s \in S): s \leq u\}$
$S^l = \{l \in P \mid (\forall s \in S): s \geq l\}$
An element $x$ is called the supremum or the least upper bound of $S$ if: $x \in S^u$ and $\forall y \in S^u: x \leq y$. This is represented as $\sup S$ or $\bigvee S$. The infimum or the greatest lower bound of $S$ is defined dually and represented as $\inf S$ or $\bigwedge S$. A non-empty ordered set $P$ is called a lattice if $x \vee y$ and $x \wedge y$ exist for all $x, y \in P$. Such a lattice would form the basis for the definition of data and spatial object hierarchies, which are central concepts in data warehouse design as described in the following sections.

We are now ready to describe the BigCube metamodel for user-centric modeling of multidimensional data. First, we introduce specialized multidimensional abstract data types over five levels, to develop and design the overall framework for the data warehouse. These multidimensional types are collectively called BigCube Data Types (BDTs) as shown in Table 3-1. Each level has three basic structural components, namely, kinds (containing a set of types), the data types and elements of the data types (instances). The BigCube structure is illustrated in Figure 3-3 and its instantiation is shown in Figure 3-4. At each level of BDTs, we introduce an example to illustrate the need for the structural component and then define it formally. These are essential user concepts to model multidimensional data. To provide a consistent and independent user view for the data warehouse using BigCube, we refrain from using common relational data warehouse terminology like fact
tables and dimension tables. Instead we provide a new, formal definition for the basic components of the BigCube.

Figure 3-4. An example instance of the Sales BigCube shown in Figure 3-3

Example: Consider the Product Management case study presented in Section 3.3. The basic data that needs to be stored (and later analyzed) in the data warehouse are values such as 1500 (of type int) for the profit in USD and Gainesville (of type string) for the City name. These are called the base data values of the dataset and are defined in Definition 6. The base data type for each value is indicated within parentheses.

Definition 6. Base data value, base data types and BASE.
A base data value $v$ is defined as any indivisible, low-level value in the data warehouse. A base data type is the data type of a base data value and includes all alphanumeric types (such as int, real, char, confined to finite domains in $\mathbb{Z}$, $\mathbb{R}$, $A$ respectively), all temporal types (such as date, time, interval) and all geo-spatial data types (such as point, line, region for complex spatial objects, confined to a finite domain in $\mathbb{R}^2$). The set of all base data types is defined as a kind BASE.
$\mathit{BASE} = \mathit{ALPHA} \cup \mathit{NUM} \cup \mathit{TIME} \cup \mathit{GEO}$
$\mathit{ALPHA} = \{\mathit{char}, \mathit{string}, \ldots\}$
$\mathit{NUM} = \{\mathit{integer}, \mathit{real}, \ldots\}$
$\mathit{TIME} = \{\mathit{date}, \mathit{time}, \mathit{interval}, \ldots\}$
$\mathit{GEO} = \{\mathit{point}, \mathit{line}, \mathit{region}, \ldots\}$
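The kind BASE of Definition 6 can be pictured as a small lookup structure. The following Python sketch is only an illustration under stated assumptions: the names `BASE_KINDS`, `BASE` and `kind_of` are hypothetical and not part of the BigCube model, and only the type names listed explicitly in the definition are included (the elided entries are left out).

```python
# Illustrative encoding of the kind BASE (Definition 6).
# All identifiers are hypothetical; they are not defined by the BigCube model.
BASE_KINDS = {
    "ALPHA": {"char", "string"},
    "NUM": {"integer", "real"},
    "TIME": {"date", "time", "interval"},
    "GEO": {"point", "line", "region"},
}

# BASE is the union of the four sub-kinds.
BASE = set().union(*BASE_KINDS.values())


def kind_of(type_name):
    """Return the sub-kind (ALPHA, NUM, TIME or GEO) of a base data type."""
    for kind, types in BASE_KINDS.items():
        if type_name in types:
            return kind
    raise ValueError(f"{type_name!r} is not a base data type")
```

A call such as `kind_of("region")` looks up the sub-kind of a geo-spatial type, while an unknown type name raises an error instead of silently being accepted as a base data type.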

According to their functionality, base data values ($v \in a \in \mathit{BASE}$) can be either members when used for analysis along data dimensions, or measures when used to quantify factual data. The second step in modeling multidimensional data for analysis involves groupings or categorization of the base data values. However, before we can define a subjective grouping of base values as a category, we first construct a multiset of valid types from the available base data types. The powerset operator yields the set of all subsets including the empty set and the set itself, i.e., $\mathcal{P}(A) = \{B \mid B \subseteq A\}$. Using powersets, we can now define the Multiset Type Constructor as shown in Definition 7.

Definition 7. Multiset type constructor.
The multiset type constructor creates multiset types from the base data types as follows: for $a \in \mathit{BASE}$, the multiset type over $a$ is $\mathcal{P}(a \times \mathbb{N})$.
This means that, for example, the multiset type over string is $\mathcal{P}(\mathit{string} \times \mathbb{N})$. This multiset string type is the set of all multisets based on the data type string. The number of occurrences of each value is shown by the second element of the pair. For example, City = {(Gainesville, 3), (Orlando, 2), (Miami, 1)} is an element of the multiset string type. We can now define the multiset string type as ms_string := $\mathcal{P}(\mathit{string} \times \mathbb{N})$. Therefore, City is a category of city measure values of the type ms_string and can be represented as City: ms_string. Similarly, we can also define corresponding multiset types for all available base data types.

Now we are ready to introduce the C3 constructs and supporting OLAP formulations which are central to the BigCube model and to multidimensional data analysis. Real-world data always has some form of symmetry and asymmetry associated with its base data values. For example, all persons working in a University can be employees (symmetric relationship). Employees could be students, faculty or administrators (asymmetric relationship).
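The multiset types of Definition 7 pair each value with its number of occurrences. As a minimal sketch, assuming Python's `collections.Counter` as the counting mechanism, the City example can be built from raw base data values as shown below; the helper names `make_category` and `is_ms_string` are hypothetical and only illustrate the idea.

```python
from collections import Counter


def make_category(values):
    """Build a multiset of (value, occurrence count) pairs from raw base data values."""
    return frozenset(Counter(values).items())


def is_ms_string(category):
    """Check that every pair is (string value, positive occurrence count)."""
    return all(
        isinstance(value, str) and isinstance(count, int) and count > 0
        for value, count in category
    )


# Six raw city values collapse into three (value, count) pairs.
city = make_category(
    ["Gainesville", "Orlando", "Gainesville", "Miami", "Orlando", "Gainesville"]
)
```

Using a `frozenset` of pairs mirrors the statement that a category is an unordered collection: no order exists among the constituent pairs, only the values and their counts.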

Definition 8. The first C in C3: Categorization
A categorization construct defines groupings of base data values based on the similarity of data as $(C, A_c)$, where $C$ is a category (collection of base values) and $A_c$ is a set of analysis functions that can be applied on the elements of $C$. The semantics of categorization relationships are defined in one of three ways: arbitrary (e.g., split 100 base values into 10 categories equally according to some criteria), user-defined (e.g., Gainesville, Chapel Hill and Madison can be categorized as College Towns), or according to real-world behavior (such as spatial grouping, e.g., New Delhi, Berlin and Miami can be categorized as Cities).

Definition 9. The second C in C3: Containment
The Containment construct helps to define hierarchies in the data. These data hierarchies are modeled as partially ordered sets (or posets) to use an extensible paradigm that supports different kinds of ragged and unbalanced hierarchies. The containment construct takes the data categories and builds a data hierarchy from them. The containment construct is defined as a set inclusion from one level to another, where $P$ and $Q$ represent the categories of data on which the inclusion holds. The containment construct is analogous to a single path between two levels in a poset. The set of analysis functions that are applicable on a particular containment are available in $A$. These functions can be applied while moving from the elements of one category to another. This helps to uniquely define operations on specific hierarchical paths in the perspectives of the data cube. A containment relationship defines the ordering of the levels in the hierarchy. The semantics of this construct is defined by: (i) any arbitrary containment, e.g., fifteen base data values can be ordered into a four-level hierarchy using the structure of a balanced binary tree; (ii) user-defined containment, e.g., products can be ordered into a hierarchy based on their selling price; (iii) real-world behavior: these reflect the fact that a higher level element is a
context of the elements of the lower level, it constrains the lower level values, it evolves at a lower frequency than the lower level elements, or that it contains the lower level elements.

Example: In our case study, two examples of categories are City = {(Gainesville, Orlando, Miami)} and Profit = {( , , )} for the profit in USD. These are of types string and int respectively.

Definition 10. Category, Category Type and CATEGORY.
A category of elements $c \in S$, $S \subseteq \mathit{BASE}$, is a grouping of base data values such that a valid categorization relationship exists among the set of elements. A category type provides the multiset data types for each category. The set of all available category types is defined as a kind CATEGORY.

Categories help us to construct hierarchies of data as shown by the next level of BigCube structure types, namely hierarchy, perspective and subject. Hierarchies are constructed using the containment construct over the categories, and perspectives are defined as a combination of hierarchies. To define the multidimensional cube space we now need the third C in C3, which is the cubing or combination construct.

Definition 11. The third C in C3: Cubing or Combination
The Combination construct helps to map two semantically unique categories of data values by a set of analysis functions. Given two ordered sets of categories $P$ and $Q$, we define an order-preserving (monotone) mapping $\varphi: P \to Q$ such that $x \leq y$ in $P \Rightarrow \varphi(x) \leq \varphi(y)$ in $Q$. Now, the combination construct is defined as $(P, Q, \varphi, A)$, where $A$ represents the set of analysis functions that can be applied on the combination relationship. The Combination construct helps to define hierarchies in the data. These data hierarchies are modeled as partially ordered sets (or posets) to use an extensible paradigm that supports different kinds of ragged and unbalanced hierarchies. The containment construct takes the data categories and builds a data hierarchy from them.
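The order-preserving requirement in Definition 11 can be checked mechanically on finite categories. The sketch below is an illustration only: the function `is_monotone`, the month and quarter data, and the comparison helper are hypothetical names chosen for this example, with months standing in for $P$ and quarters for $Q$.

```python
from itertools import product


def is_monotone(p_elements, leq_p, leq_q, mapping):
    """Return True if x <= y in P always implies mapping[x] <= mapping[y] in Q."""
    return all(
        leq_q(mapping[x], mapping[y])
        for x, y in product(p_elements, repeat=2)
        if leq_p(x, y)
    )


def calendar_leq(a, b):
    """Order months (or quarters) by their calendar position."""
    return a <= b


# Hypothetical example data: months 1..6 mapped to quarters 1..2.
months = [1, 2, 3, 4, 5, 6]
to_quarter = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}
scrambled = {1: 2, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}
```

Here `to_quarter` preserves the calendar order, whereas `scrambled` sends month 1 to a later quarter than month 2 and therefore fails the check.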

Example:.Inourcasestudy,twoexamplesofcategoriesareCity=f(Gainesville,3),(Orlando,2),(Miami,1)gandProt=f(,3),(,2),(,1)gfortheprotinUSD.Theseareofthemultisettypesms stringandms intrespectively. Denition12. Category,CategoryTypeandCATEGORY.Acategoryofelementsc2(a),isagroupingofbasedatavaluessuchthatavalidcategorizationrelationshipexistsamongthesetofelements.Acategorytype(a),providesthemultisetdatatypesforeachcategory.ThesetofallavailablecategorytypesisdenedasakindCATEGORY.EverycategoryincludesanA-setoranalysissetthatdeterminesthevariousaggregationsfunctionsapplicableontheelementsofthatcategory.Fornbasedatatypes,wehavencorrespondingmultisetsinCATEGORY.Inourexample,thecategoriescitiesandprotaremultisetsandthereisnoorderamongtheconstituentpairsofelements.Weonlystorecitynamesalongwiththenumberofoccurrencesofeachcityinthecube.Thesameappliesforprotandallothercategoriesofmembers/measuresinthecube.CategorieshelpustoconstructhierarchiesofdataasshownbythenextlevelofBigCubestructuretypesinDenition 13 Example:.Inoursampledataset,wenoticethattherearedatahierarchiessuchasCityStateCountryforlocation,andDayMonthYearfortime.ThesearedenedashierarchiesinDenition 13 Denition13. Hierarchy,hierarchytypeandHIERARCHY.Givenadatasetwithncategoriesofbasedatavalues,ahierarchyisdenedasatotalorderingofthecate-gories(c1...cn),suchthatafulllogicalcontainmentrelationship()existsbetweentheconstituentbasedatavalues.Everyhierarchyincludesananalysis-setthatdeterminestheaggregateoperationsapplicableonthevariouspathsofthehierarchy.ThehierarchytyperepresentsthetypeofhierarchyandthekindHIERARCHYisdenedasasetofallhierarchytypesinthedatawarehouse. 75


$\text{Hierarchy} = \langle c_1, c_2, \dots, c_n, A \rangle$ such that
(i) $\forall\, 1 \le i \le n : c_i \in a \in \text{CATEGORY}$
(ii) $c_0 = \{(\bot, 1)\}$, $c_{n+1} = \{(\top, 1)\}$
(iii) $\forall\, 1 \le i \le n : c_i \sqsubseteq c_{i+1}$, where $\sqsubseteq$ stands for the logical containment relationship

With $n$ categories, each hierarchy contains up to 2n levels in total linear order. Each logical containment relation between two categories is called a path in the hierarchy. Each hierarchy consists of two special levels called Apex or $\top$ and Base or $\bot$ that denote the topmost level (smallest set of instances) and the lowermost level (largest set of instances) respectively. Overall, the structure of a data hierarchy in the BigCube is given by the lattice thus constructed (Definition). Thus at the instance level, specific sets of measures always exist at unique levels in the hierarchy and are comparable over the structure of the ordering. The use of unique $\bot$ and $\top$ levels helps to uniquely identify each (aggregation valid) hierarchy even after they are combined in later levels to form a lattice. It also helps to keep the hierarchies balanced (onto). The $\top$ level always points to the root of a hierarchy while the $\bot$ points to the leaf nodes.

A Note on Data Hierarchies: A thorough study of classification hierarchies is crucial for the development of a powerful multidimensional data model. There have been several approaches describing cube hierarchies in the context of conceptual modeling. Malinowski et al. (Malinowski and Zimányi 2004), (Malinowski and Zimányi 2006), (Malinowski and Zimányi 2007) provide a detailed treatment of hierarchy requirements using a real-world example. (Niemi et al. 2001) provides a brief classification of hierarchies and of enforcing dependencies on various hierarchy types. In this section, we provide an analysis of the various data hierarchy structures which can be supported by the lattice structure defined in Definition 3.5.
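The hierarchy structure of Definition 13 (a total order of categories with an analysis set, closed off by a Base and an Apex level) can be sketched as follows. This is a minimal sketch under stated assumptions: the class name `Hierarchy`, its fields, and the string markers `"BASE"` and `"APEX"` are hypothetical names chosen for illustration and do not come from the BigCube model itself.

```python
from dataclasses import dataclass, field

BASE, APEX = "BASE", "APEX"  # stand-ins for the bottom and top levels


@dataclass
class Hierarchy:
    # Categories ordered from finest to coarsest, e.g. City, State, Country.
    categories: list
    # Names of the aggregate functions applicable on the hierarchy (the A-set).
    a_set: set = field(default_factory=set)

    def levels(self):
        """All levels in total linear order, including Base and Apex."""
        return [BASE, *self.categories, APEX]

    def paths(self):
        """Each containment step between two adjacent levels is one path."""
        lv = self.levels()
        return list(zip(lv, lv[1:]))

    def contains(self, upper, lower):
        """True if `lower` sits at or below `upper` in the total order."""
        lv = self.levels()
        return lv.index(lower) <= lv.index(upper)


location = Hierarchy(["City", "State", "Country"], {"sum", "count"})
print(location.paths())
```

Keeping the levels in a plain list makes the total order explicit; a lattice over several hierarchies would need a richer structure than this sketch provides.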


Figure 3-5. Data Hierarchy Represented as a Directed Acyclic Graph

Hierarchies. There may be several types of hierarchies found in an OLAP business application. These include balanced and unbalanced, strict (onto) and non-strict (non-onto), ragged and non-ragged hierarchies as described below. Understanding the different types of hierarchies is essential for supporting the different kinds of OLAP queries that may be posed on the hierarchies along perspectives and subjects of analysis.

Directed Acyclic Graph. This is the most general class in the taxonomy of hierarchies. A directed acyclic graph (DAG) is a directed graph with no cycles. There are redundant aggregation paths in DAGs. See Figure 3-5 for an illustration.

Transitive Anti-closed DiGraph. This is an acyclic graph with no redundant aggregation paths. If two nodes are connected both by an edge of length one and by a path of length greater than one, the edge of length one is removed. Thus there are no direct shortcuts in the graph. This kind of hierarchy is also called a Covered Hierarchy.

Tree. A tree is a DAG in which each node has a single parent, except the root or ALL level. Thus unique aggregation paths are guaranteed.

Simple hierarchies. Simple hierarchies are hierarchies which use a single criterion for analysis and in which the relationship among members can be represented by a tree. Each link between parent and child levels has one-to-many cardinalities, i.e., a parent member may be related to several child members, but a child member is related to at most one parent. Such hierarchies, where each child belongs to at most one parent, are also called Strict hierarchies.


Figure 3-6. Ragged and Non-Ragged Hierarchies

Figure 3-7. Balanced and Unbalanced Hierarchies

Symmetric Hierarchy and Asymmetric Hierarchy. A Symmetric Hierarchy is one where all the branches have the same length. Such hierarchies also have the properties of balanced, non-ragged, homogeneous and level-based hierarchies (as seen below). An Asymmetric hierarchy is one where not all branches have the same length. Some branches could have different lengths from the root to the leaf nodes. These hierarchies have the properties of non-balanced, non-onto and ragged hierarchies.

Ragged and Non-Ragged Hierarchies. Non-ragged hierarchies are hierarchies in which each child member of a parent member is at the same level. In ragged hierarchies some parent members can have child members appearing at different levels. An example could occur in banking, where a clerical employee and a manager could both be managed by a Branch manager. See Figure 3-6.

Generalized Hierarchy. This hierarchy contains multiple exclusive paths sharing some levels. All the paths represent a single perspective of analysis with a single criterion. These are also called Ragged hierarchies.

Balanced and Unbalanced Hierarchies. Balanced hierarchies have all leaf nodes at the same level in the tree. In Unbalanced hierarchies some lower levels of the hierarchy are not mandatory. See Figure 3-7.
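Two of the properties described above, strictness (each child has at most one parent) and balance (all leaf nodes at the same level), can be checked mechanically on member-level data. The following is a minimal sketch under stated assumptions: the function names, the dictionary encodings (`parents` maps a member to its list of parents, `children` maps a member to its list of children) and the sample members are all hypothetical.

```python
def is_strict(parents):
    """A hierarchy is strict if every member has at most one parent."""
    return all(len(ps) <= 1 for ps in parents.values())


def leaf_depths(children, root):
    """Collect the depths at which leaf members occur below `root`."""
    depths = set()
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        kids = children.get(node, [])
        if not kids:
            depths.add(depth)
        stack.extend((kid, depth + 1) for kid in kids)
    return depths


def is_balanced(children, root):
    """A hierarchy is balanced if all leaves sit at the same depth."""
    return len(leaf_depths(children, root)) == 1


# Hypothetical member data for illustration.
balanced = {"ALL": ["Florida", "Georgia"],
            "Florida": ["Gainesville", "Miami"],
            "Georgia": ["Atlanta"]}
unbalanced = {"ALL": ["Florida", "Washington DC"],
              "Florida": ["Gainesville"]}
non_strict_parents = {"Gainesville": ["Florida"],
                      "Kansas City": ["Missouri", "Kansas"]}

print(is_balanced(balanced, "ALL"), is_balanced(unbalanced, "ALL"))
print(is_strict(non_strict_parents))
```

The traversal assumes the input is acyclic, as every hierarchy class in this taxonomy is.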


Table 3-1. Five levels of BigCube data types

Kinds:         Types:              Elements:
BASE           base data type      base data value
CATEGORY       category type       category
HIERARCHY      hierarchy type      hierarchy
PERSPECTIVE    perspective type    perspective
SUBJECT        subject type        subject
BigCube

The structure of the hierarchy has a significant effect on the correctness of aggregations and on cube materialization. However, the exact semantics and restrictions imposed by these structures on the instance values of the data cube are handled at the logical level by the model applied. For example, a relational OLAP approach can impose multidimensional functional dependencies and count constraints on the members of the data dimensions.

By making use of data hierarchies in a generic manner, we can construct the next tier of BigCube structure types. This next level helps to create and distinguish the role of the data dimension for analysis. Perspective types help to create data dimensions for analysis, while subject types help to create the measurements of data analysis.

Example: In our sample dataset, we notice that there are three data dimensions, namely Location, Time and Product, each of which can be used for independently analyzing the dataset. Each of these is called a perspective in the BigCube model and is defined in Definition 14.

Definition 14. Perspective, perspective type and PERSPECTIVE. A perspective P is a special union of one or more hierarchies (over members) that creates a strict partial ordering between their categories, such that meaningful querying and analysis of the multidimensional data is possible. Perspective types are defined over the constituent hierarchy types. The PERSPECTIVE is defined as a kind that contains a multiset of perspective types.

The perspective creates a special union of hierarchies such that a lattice structure is created in which categories from different hierarchies are merged according to


user-defined semantics. Each path in the lattice of a perspective's hierarchy, beginning from its least upper bound and ending in its greatest lower bound, is called a path in the perspective. While total order is maintained in the hierarchies, the perspective creates a strict partially ordered set (poset) of categories. The strict nature of the poset implies that each category exists at a unique level in the perspective, which helps to define and validate aggregation operations easily. Perspectives have hierarchies built on member (or measure) categories along with an Apex $\top$ to denote the highest aggregated level of the ordering (analogous to the ALL construct) and a Base $\bot$ to denote the lowest aggregated level of the ordering of the categories constituting the bottom-most level of the perspective.

Example: In our case study, we notice that there are three sets of measurable facts that can serve as the basis for analysis of the entire dataset: Sales, Invoice and Inventory. Each of these is defined as a subject of analysis in Definition 15.

Definition 15. Subject, subject type and SUBJECT. A subject S is a special union of hierarchies over measures that helps to signify measured facts, and helps to perform analysis of the multidimensional data. Each subject is a value of the subject type, and the SUBJECT is defined as a kind that contains a multiset of subject types.

Subjects are similar to perspectives, except that they help to quantify facts of analysis and thus are measurements, while perspectives form data dimensions for performing analysis. Finally, having defined these four levels of BigCube types, we can now construct the complete BigCube as shown in Definition 16.

Definition 16. BigCube. Given a multidimensional dataset, the BigCube cell structure is defined as an injective function from the n-dimensional space defined by the Cartesian product of $n$ functionally independent perspectives $P$ (identified by its members) to a set of $r$ subjects $S$ (identified by its measures), quantifying the data for analysis as:


$$f_B : (P_1 \times P_2 \times \dots \times P_n) \rightarrow S_i \quad \text{where } i \in \{1, \dots, r\} \wedge (S_i, P) \in \text{BASE}$$

The complete BigCube structure is now defined as a union of all its cells, given as:

$$\text{BigCube}(B) = \bigcup_{i \in \{1, \dots, r\},\, f_B} S_i$$

The combination notation means that any order of the two argument types is valid, but the combination of perspectives identifies one unique cell in the structure of the BigCube, as in Definition 11. The BigCube cell stores the subjects consisting of one or more measure values. Every combination in the BigCube has an A-set or analysis set associated with it that determines the aggregation functions applicable on the combination of perspectives that together determine a unique cell in the cube. Generally, the A-set in the combination operator of the BigCube is only an association. This means that, for example, in our example sales BigCube, a cell storing the sales quantity as a subject of analysis has the structure (product, location, time).Quantity. This association can be set in the A-set of the BigCube.

Thus, the overall structure of the BigCube can be illustrated using a five-tuple $B = \langle P, S, C, H, A \rangle$ where $P$ represents a set of perspectives defining the edges of the BigCube, $S$ represents the subjects of analysis contained in the BigCube, $C$ represents the categories of data that are arranged in $H$ hierarchical levels within the BigCube and $A$ represents the A-set or analysis set which includes categorization, containment and cubing aggregate attributes for the BigCube $B$. The base cube $\bot_B$ is a special state of the BigCube that defines the cube view in the least aggregated form (at finest granularity). The most aggregated state of the BigCube is called the apex cube or $\top_B$. Thus the entire multidimensional dataset can be modeled using the data warehouse framework as a constellation of BigCubes.

The example dataset is represented in the BigCube model as illustrated in Figure 3-3 (structure) and Figure 3-4 (instance). All five levels of BigCube data types thus constructed are shown in Table 3-1.
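The cell structure of Definition 16 can be read as a lookup from one member per perspective to the subjects stored in that cell. A minimal sketch of this reading is given below, assuming a plain dictionary as the store. The names `perspectives`, `cells`, `cell` and `cube_space`, as well as the sample members and quantities, are hypothetical and serve only as illustration.

```python
from itertools import product

# Hypothetical perspectives, each given by its member values.
perspectives = {
    "product": ["iPad", "Kinect"],
    "location": ["Gainesville", "Orlando"],
    "time": [2010],
}

# Each key is one combination of perspective members and identifies one cell;
# each value holds the subject measures stored in that cell.
cells = {
    ("iPad", "Gainesville", 2010): {"quantity": 120},
    ("Kinect", "Orlando", 2010): {"quantity": 75},
}


def cell(key):
    """Return the subjects of one cell, or None if the cell is undefined."""
    return cells.get(key)


def cube_space(perspectives):
    """Enumerate the n-dimensional space P1 x P2 x ... x Pn."""
    return list(product(*perspectives.values()))


print(len(cube_space(perspectives)), cell(("iPad", "Gainesville", 2010)))
```

Cells that are absent from the dictionary correspond to undefined measures, which a concrete implementation might store as nulls instead.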


3.6 Spatial Data Types and their Integration into Data Warehouse Hierarchies

In the past, numerous data models and query languages for spatial data have been proposed with the aim of formulating and processing spatial queries in databases and GIS (Güting 1994). Spatial data types (see (Schneider 1997) for a survey) like point, line, or region provide fundamental abstractions for modeling the structure of geometric entities, their relationships, properties, and operations. A few models (Güting and Schneider 1995; OpenGIS Consortium: Reference Model 2011; Schneider 1997) have been developed towards complex spatial objects. All these approaches allow multiple object components. Some approaches are based on a finite geometric domain (Güting and Schneider 1995; Schneider 1997) whereas we define our data types in the infinite Euclidean plane.

3.7 Complex Spatial Types

This section defines the underlying spatial data types in the BigCube model. We strive for a very general, abstract definition of complex lines and complex regions (see Figure 3-8) in the Euclidean plane $\mathbb{R}^2$. Our formal framework consists of basic concepts of point set theory and point set topology (Gaal 1964). The task is to determine those point sets that are admissible for complex line (Section 3.7.2) and complex region (Section 3.7.3) objects. The definitions we give contribute to an unstructured object definition which solely determines the point set of a line or region. Due to space restrictions, we do not identify structural components. But a complex line represents a spatially embedded network possibly consisting of several connected components, and a complex region represents a multi-part region possibly with holes. For both spatial data types we specify the topological notions of boundary, interior, exterior, and closure since these notions are later needed for the specification of topological relationships.

3.7.1 Complex Point

We begin our definition of spatial data with the point type.

Definition 17. The spatial data type point is defined as $\text{point} = \{P \subset \mathbb{R}^2 \mid P \text{ is finite}\}$.


Figure 3-8. Examples of a complex line object (a) and a complex region object (b).

A value of this type, $p \in P$, is called a complex point. If $P \in \text{point}$ is a singleton set, then $P$ is denoted as a simple point. A point hierarchy in the spatial data warehouse comprises a complex point definition by a $\top$ and $\bot$ ordering in the lattice (Definition 3.5).

3.7.2 Complex Lines

Before we start with a definition for complex lines (Figure 3-8a), we need a few definitions of some well-known and needed topological concepts. We assume the existence of the Euclidean distance function $d : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ with $d(p, q) = d((x_1, y_1), (x_2, y_2)) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$. With the notion of distance, we can now proceed to define what is meant by a neighborhood of a point in $\mathbb{R}^2$.

Definition 18. Let $q \in \mathbb{R}^2$ and $\varepsilon \in \mathbb{R}^+$. The set $N_\varepsilon(q) = \{p \in \mathbb{R}^2 \mid d(p, q)$ [...] $0$, there exists a number $\delta > 0$ (usually depending on $\varepsilon$)


such that for every $x \in N_\delta(x_0) \cap X$ we obtain that $f(x) \in N_\varepsilon(f(x_0))$. The mapping $f$ is said to be continuous on $X$ if it is continuous at every point of $X$.

For a function $f : X \to Y$ and a set $A \subseteq X$ we introduce the notation $f(A) = \{f(x) \mid x \in A\}$. Definition 19 enables us to give an unstructured definition for complex lines as the union of the images of a finite number of continuous mappings.

Definition 20. The spatial data type line is defined as

$$\text{line} = \{L \subset \mathbb{R}^2 \mid \text{(i) } L = \bigcup_{i=1}^{n} f_i([0,1]) \text{ with } n \in \mathbb{N}_0,\ \text{(ii) } \forall\, 1 \le i \le n : f_i : [0,1] \to \mathbb{R}^2 \text{ is a continuous mapping},\ \text{(iii) } \forall\, 1 \le i \le n : |f_i([0,1])| > 1\}$$

We call a value of this type a complex line and the image of a continuous mapping a continuous line. The first condition also allows a line object to be the empty point set ($n = 0$ in Definition 20). The third condition avoids degenerate line objects consisting only of a single point.

The boundary of a complex line $L$ is the set of its end points minus those end points that are shared by several continuous lines. The shared points belong to the interior of a complex line. Based on Definition 20, let $E(L) = \bigcup_{i=1}^{n} \{f_i(0), f_i(1)\}$ be the set of end points of all continuous lines. We obtain

$$\partial L = E(L) - \{p \in E(L) \mid \text{card}(\{f_i \mid 1 \le i \le n \wedge f_i(0) = p\}) + \text{card}(\{f_i \mid 1 \le i \le n \wedge f_i(1) = p\}) \ne 1\}$$

Let $L \ne \varnothing$. It is possible that $\partial L$ is empty (e.g., if $L$ is a closed continuous line). The closure $\overline{L}$ of $L$ is the set of all points of $L$ including the end points. Therefore $\overline{L} = L$ holds. For the interior of $L$ we obtain $L^\circ = \overline{L} - \partial L = L - \partial L \ne \varnothing$, and for the exterior we get $L^- = \mathbb{R}^2 - L$, since $\mathbb{R}^2$ is the embedding space. The set of all points determining the line can be arranged in a partial order comprising a $\top$ and a $\bot$. This creates a complex line hierarchy in the spatial data warehouse.
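The boundary rule for complex lines (keep exactly those end points that occur once over all continuous lines) can be sketched in a few lines of Python. The sketch assumes that each continuous line is represented only by its pair of end points $(f_i(0), f_i(1))$; the function name `line_boundary` and this encoding are assumptions for illustration, and intersections away from the end points are not considered, in line with the formula above.

```python
from collections import Counter


def line_boundary(curves):
    """Return the boundary of a complex line.

    `curves` is a list of (start, end) end point pairs, one per continuous
    line. An end point belongs to the boundary only if it occurs exactly
    once as a start or end over all curves.
    """
    counts = Counter()
    for start, end in curves:
        counts[start] += 1
        counts[end] += 1
    return {p for p, c in counts.items() if c == 1}


# Two continuous lines sharing the end point (1, 0): the shared point is interior.
open_chain = [((0, 0), (1, 0)), ((1, 0), (2, 1))]
# A closed continuous line starts and ends at the same point: empty boundary.
closed_loop = [((0, 0), (0, 0))]

print(line_boundary(open_chain), line_boundary(closed_loop))
```

For the closed loop the single end point is counted twice, once as $f_i(0)$ and once as $f_i(1)$, so it is removed and the boundary is empty, as noted in the text.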


Figure 3-9. Examples of possible geometric anomalies of a region object.

3.7.3 Complex Regions

Regions are embedded into the two-dimensional Euclidean space $\mathbb{R}^2$ and modeled as special infinite point sets. We briefly introduce some needed concepts from point set topology in $\mathbb{R}^2$.

Definition 21. Let $X \subseteq \mathbb{R}^2$ and $q \in \mathbb{R}^2$. $q$ is an interior point of $X$ if there exists a neighborhood $N$ such that $N(q) \subseteq X$. $q$ is an exterior point of $X$ if there exists a neighborhood $N$ such that $N(q) \cap X = \varnothing$. $q$ is a boundary point of $X$ if $q$ is neither an interior nor an exterior point of $X$. $q$ is a closure point of $X$ if $q$ is either an interior or a boundary point of $X$. The set of all interior points of $X$ is called the interior of $X$ and is denoted by $X^\circ$. The set of all exterior points of $X$ is called the exterior of $X$ and is denoted by $X^-$. The set of all boundary points of $X$ is called the boundary of $X$ and is denoted by $\partial X$. The set of all closure points of $X$ is called the closure of $X$ and is denoted by $\overline{X}$. A point $q$ is a limit point of $X$ if for every neighborhood $N(q)$ it holds that $(N(q) - \{q\}) \cap X \ne \varnothing$. $X$ is called an open set in $\mathbb{R}^2$ if $X = X^\circ$. $X$ is called a closed set in $\mathbb{R}^2$ if every limit point of $X$ is a point of $X$.

It follows from the definition that every interior point of $X$ is a limit point of $X$. Thus, limit points need not be boundary points. The converse is also true. A boundary point of $X$ need not be a limit point; it is then called an isolated point of $X$. For the closure of $X$ we obtain that $\overline{X} = \partial X \cup X^\circ$.


It is obvious that arbitrary point sets do not necessarily form a region. But open and closed point sets in $\mathbb{R}^2$ are also inadequate models for complex regions since they can suffer from undesired geometric anomalies (Figure 3-9). A complex region defined as an open point set runs into the problem that it may have missing lines and points in the form of cuts and punctures. At any rate, its boundary is missing. A complex region defined as a closed point set admits isolated or dangling point and line features. Regular closed point sets avoid these anomalies.

Definition 22. Let $X \subseteq \mathbb{R}^2$. $X$ is called a regular closed set if, and only if, $X = \overline{X^\circ}$.

The effect of the interior operation is to eliminate dangling points, dangling lines, and boundary parts. The effect of the closure operation is to eliminate cuts and punctures by appropriately supplementing points and to add the boundary. For the specification of the region data type, definitions are needed for bounded and connected sets.

Definition 23. (i) Two sets $X, Y \subseteq \mathbb{R}^2$ are said to be separated if, and only if, $X \cap \overline{Y} = \varnothing = \overline{X} \cap Y$. A set $X \subseteq \mathbb{R}^2$ is connected if, and only if, it is not the union of two non-empty separated sets. (ii) Let $q = (x, y) \in \mathbb{R}^2$. Then the length or norm of $q$ is defined as $\|q\| = \sqrt{x^2 + y^2}$. (iii) A set $X \subseteq \mathbb{R}^2$ is said to be bounded if there exists a number $r \in \mathbb{R}^+$ such that $\|q\|$ [...]

Figure 3-10. The hierarchical structure of a region object.

obtain $\overline{F} = \partial F \cup F^\circ = F$ and $F^- = \mathbb{R}^2 - \overline{F} = \mathbb{R}^2 - F\ (\ne \varnothing)$. Every face $F$ of a region, thus described, is arranged into a lattice hierarchy (Definition 3.5) by incorporating a unique $\top$ and $\bot$ identifier.

3.7.4 Hierarchical Representation of Spatial Objects

The structures of different application objects can vary. Examples are the structure of a region (Figure 6-1) and the structure of a book. We aim at developing a generic platform that accommodates all kinds of hierarchical structures. Thus, the first step is to explore and extract the common properties of all structured objects. Unsurprisingly, the hierarchy of a structured object can always be represented as a tree. Figure 6-3 shows the tree structure of a region object. In the figure, face[], holeCycle[], and segment[] represent a list of faces, a list of hole cycles and a list of segments respectively. In the tree representation, the root node represents the structured object itself, and each child node represents a component named sub-object. A sub-object can further have a structure, which is represented in a sub-tree rooted at that sub-object node. For example, a region object in Figure 6-3 consists of a label component and a list of face components. Each face in the face list is also a structured object that contains a face label, an outer cycle, and a list of hole cycles, where both the outer cycle and the hole cycles are formed by segment lists.

Further, we observe that two types of sub-objects can be distinguished, called structured objects and base objects. These two components form the basis of a type structure specification (TSS) for complex, variable structured objects that we


first introduced in (Chen et al. 2010). Structured objects consist of sub-objects, and base objects are the smallest units that have no further inner structure. In a tree representation, each leaf node is a base object while internal nodes represent structured objects. This structure maps into the structure of a lattice from Definition 3.5. Further, for the translation of such hierarchical structures to logical data warehouse designs (relational, multidimensional or hybrid scenario) we have also developed a specialized binary large object (BLOB) extension called Intelligent BLOBs (iBLOBs) (Chen et al. 2010) that supports the user-centric materialization of complex, variable-structured large objects in databases. iBLOBs, together with the TSS, provide an effective framework for the logical design and storage of BigCubes. Thus, BigCube provides a user-centric paradigm for the modeling, storage and analysis of complex spatial datasets in data warehouses.

A tree representation is a useful tool to describe hierarchical information at a conceptual level. However, to give a more precise description and to make it understandable to computers, a formal specification would be more appropriate. Therefore, we propose a generic type structure specification as an alternative to the tree representation for describing the hierarchical structure of application objects, and use an intelligent binary large object (iBLOB) structure for efficient storage and querying of this hierarchical information, as described in Chapter 6.

In the next chapter, we define the various BigCube operators that help to define, manipulate and create the spatial data warehouse, followed by new operators for querying and aggregating over alphanumeric and complex spatial data.


CHAPTER 4
BigCube ALGEBRA

This chapter specifies the BigCube algebra, which provides a compendium of operators available in the BigCube model. These include BigCube definition, manipulation, and OLAP navigation, aggregation and querying operators. While this Chapter mostly embodies on-going research work, some operations on spatial data have also been introduced.

4.1 BigCube Operators

In this section, we provide the BigCube operators that help to create, manipulate, modify and develop the BigCubes in the data warehouse. In a first step, we provide the BigCube Definition Operators that help to define the BigCube components and create them. Next, we introduce the BigCube Manipulation Operators that help to manipulate and alter the structure and components of existing BigCubes. Finally, we introduce the BigCube Query and Analysis Operators that help both in data retrieval and OLAP navigation in the BigCubes and in the analysis and mining of the data by performing correct aggregations.

In the following pages, instances of the BigCube components are represented as: $m$ for member, $v$ for measure, $l$ for level, $c$ for category, [...] for hierarchy, $p$ for perspective, $s$ for subject, $b$ for BigCube, and $a$ for an instance of the descriptive attribute.

Let [...] represent the set of all data types available in the source dataset of $n$ dimensions. This can include basic data types such as int, char, etc., or complex data types such as geometric data types for complex spatial objects, both confined to finite domains. We introduce a type constructor $t$ that transforms any given atomic data type $e$ into a BigCube type $t(e)$ with the appropriate semantics.

4.2 BigCube Definition Operators

The BigCube primitive types such as members, measures and categories are constructed from the base data types by applying the constructor $t$. Each tier of


definition operators helps to construct the corresponding order of BigCube data types, as shown below.

$\forall\, (1 \le i \le n,\ 1 \le j \le n,\ i \ne j,\ e_i \in [\ldots])$

Order-1 Operators
add_member: $t(e_i) \to m_i$
add_measure: $t(e_i) \to v_i$
add_category: $t(e_i, e_j) \to c_i$

Order-2 Operator
add_hierarchy: $t(\top, c_i, c_j, \bot) \to [\ldots]_i$

Order-3 Operators
add_perspective: $t(m_i, v_i, c_i, \top_c, \bot_c, [\ldots]_i) \to p_i$
add_subject: $t(v_i) \to s_i$

Order-4 Operator
add_attribute: $f_a(e_i) \to a_i$

These seven operators are required for the generation of BigCube components. The add_attribute operator helps to add new thematic attributes to any of the basic BDT instances. Once the cube components have been generated, the cube structure is initialized by using the formulate operator:

$\text{formulate}(p_i, s_i) = (p_i \times p_j) \to S_i = b_i$

At this point, the cells in the BigCube contain measures, along with some null values for undefined measures, and the structure of the data warehouse has been constructed. We also provide seven corresponding remove, rename and update operators to remove, rename and modify the respective components of the BigCube. However, every structural update operation on the BigCube should be followed by the formulate operation to reformulate and reinitialize the multidimensional structure.

4.3 BigCube Manipulation Operators

The basic BigCube manipulation operators help to alter, rename, update and delete the components of the BigCube and the BigCube itself. Each of these functions is


defined uniquely over four Orders. The signatures of the basic BigCube manipulation operators are shown below. Note that alter is used over the BigCube structure whereas update helps to change the measure and member values contained in the BigCube.

$\forall\, (1 \le i \le n,\ 1 \le j \le n,\ i \ne j,\ e_i \in [\ldots])$

Order-1 Operators
update_member: $t(m_i) \to m_j$
update_measure: $t(v_i) \to v_j$
alter_category: $t(c_i) \to c_j$

Order-2 Operator
alter_hierarchy: $t(\top, c_i, c_j, \bot, [\ldots]_i) \to \langle \top, c_i, c_j, \bot, [\ldots]_j \rangle$

Order-3 Operators
alter_perspective: $t(p_i, m_i, v_i, c_i, \top_c, \bot_c, [\ldots]_i) \to \langle p_j, m_j, v_j, c_j, \top_c, \bot_c, [\ldots]_j \rangle$
alter_subject: $t(s_i) \to s_j$

Order-4 Operator
alter_attribute: $f_a(a_i) \to a_j$

Given a BigCube $B = \langle P, S, C, H, A \rangle$, having the scheme $(P_1 \times P_2 \times \dots \times P_n) \rightarrow S_i$, the set [...] of BASE data types defined in the BigCube, and $e_i, e_j \in [\ldots]$, the rename operator $r$ is defined as: $r(e_i) \to e_j$. Similarly, the delete operator applies on any BigCube component to remove it from the structure of the BigCube. Hence each delete operation should be followed by formulate (defined in Section 4.2).

The push and pull operators in BigCube help to treat perspectives and subjects of analysis in an almost symmetric manner. However, the semantics of these two components are very different inside the BigCube, since every combination of perspectives signifies a unique cell in the BigCube structure. The push operator helps to convert perspectives into analysis-subjects in the BigCube and thus develops a new BigCube both in its structure and contents. The pull operator is the dual of push and helps


to reformulate the BigCube by removing a subject and adding it as another perspective of the BigCube.

4.4 BigCube Analysis Operators

We now provide the BigCube analysis operations for querying and aggregating over the multidimensional dataset. All the standard multidimensional querying operations available in a data warehouse context can be incorporated into BigCube by specifying transformations on the elements of BigCube Data Types (BDTs). The analysis operators for BigCubes have been developed to be basic, elegant and functional for the purposes of data transformation and knowledge gathering. The goal here is to depart completely from relational concepts and continue with a pure multidimensional BigCube view for performing data analysis. These operations will later be mapped to the underlying logical model used for the BigCube implementation.

The basic binary operators available in BigCube include union, intersection and difference. The union of two BigCubes provides a new BigCube with the combination of all axis perspectives (across their inner categorical levels) and formulates a new cube of analysis subjects.

The multidimensional analysis operators available in the BigCube model help to perform querying and aggregation operations for data analysis. These include slice, dice, pivot, roll-up, drill-down, drill-through and drill-across. Slice is defined on the BigCube by removing or restricting specific members of a category on the data hierarchy. For example, one can perform slice to remove a specific product (by name) or remove an entire set of products from the BigCube view by selecting all products with a name starting with 'i'. Note that removing entire categories from a level in the hierarchy is not a query operator, but is instead a BigCube manipulation operator as defined earlier. Dice is defined the same as slice, but can include several restrictions on multiple perspectives (together) that may result in an entirely smaller cube. Both slice and dice can incorporate not only equivalence tests on alphanumeric data, but also high level spatial operators. This is possible since the base data types defined


Figure 4-1. Spatial slice operation on the set of US state polygons to determine the states lying to the west of Wyoming

have specific operations associated with each of them. For example, we can use the spatial topological operators (Egenhofer and ROBERT 1991; Schneider and Behr 2006) or spatial direction relation operators (Chen et al. 2010; Schneider et al. 2012) to perform restriction and selection over complex spatial objects. An example of a query where a spatial slice is applied to the set of US states (represented as polygons) to determine the states lying to the west of Wyoming is illustrated in Figure 4-1. We now provide the signatures of the other basic analysis operators.

$\forall\, (1 \le i \le n,\ 1 \le j \le n,\ i \ne j,\ m_i\ [\ldots]\ m_j)$

rollup: $b_i(p_i, [\ldots], m_i) \to b_j(p_j, [\ldots], m_j)$
drilldown: $b_j(p_j, [\ldots], m_j) \to b_i(p_i, [\ldots], m_i)$
pivot: $b_i(p_i, s_j) \to b_j(p_j, s_i)$
drillthrough: $b_i(p_i, [\ldots]_i, m_i) \to \{\bot_i\}$

The BigCube $b$ notation is overloaded to illustrate the change in the level of aggregation (state of the BigCube) during these operations. Note that the BigCube also contains a set of aggregate operators called the A-set, with each internal component, whose semantics are defined by the underlying logical design model. We define a state operator for every BigCube as $[\ldots](B) \to P_i$, which returns the exact levels of the BigCube perspectives


Table 4-1. Data Warehouse Aggregation Operators Using BigCube Approach

Type            BigCube Aggregation Operator
Additive        Sum, Count, Max or Apex, Min or Base, Concatenate, Convex Hull, Spatial Union, Spatial Intersection
Semi-Additive   Average, Variance, Standard Deviation, MaxN, MinN, Centroid, Center of Gravity, Center of Mass
Non-Additive    Median, Most Frequent, Rank, Last Non Null Value, First Non Null Value, Minimum Bounding Box, Nearest Neighbor, Equi-Partition

that define the current state of the cube. Using the formulate operator we can then obtain the cells of the cube from the set of instances of the perspective levels.

The BigCube projection operator restricts the BigCube by allowing selections on one or more subjects, while disregarding the others contained in the BigCube. This helps in the creation of summary or quotient BigCubes for efficient multidimensional analysis. The slice operation removes one perspective and returns the resulting BigCube; dice performs slice across two or more perspectives. These operations change the state of the BigCube, because any change in perspectives redefines the cells (measures) in it. Pivot rotates the perspectives for analysis across axes and returns a BigCube with a different ordering of subjects. Roll-up performs a generalization transformation over one or more constituent hierarchical levels and drill-down applies the specialization transformation over one or more hierarchical levels. Drill-through obtains the base data values with highest granularity. Drill-across combines several BigCubes in order to obtain aggregated data across the common perspectives.

Further, we can also include additive, semi-additive and non-additive aggregate operations for different base data types as shown in Table 4-1. The additive operators can be meaningfully aggregated by addition over specified perspectives. The semi-additive operators can be summarized using addition only along some perspectives, and need to be recomputed from base data for others. Non-additive operators cannot be used additively (one aggregation following another) over any of the perspectives of the BigCubes.
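The practical difference between additive and non-additive operators shows up when a roll-up is computed from an already aggregated level instead of from base data. The following minimal sketch uses hypothetical sales figures and a hypothetical City-to-State roll-up: a state total can be obtained by adding city totals, whereas a median of city medians generally differs from the median over the base values.

```python
from statistics import median

# Hypothetical base data: individual sales values per city of one state.
city_sales = {
    "Gainesville": [1, 2, 9],
    "Orlando": [4, 5],
    "Miami": [3, 100, 7, 8],
}
base_values = [x for values in city_sales.values() for x in values]

# Additive: rolling up the city totals gives the same result as
# aggregating the base data directly.
city_totals = {city: sum(values) for city, values in city_sales.items()}
state_total = sum(city_totals.values())

# Non-additive: one median applied after another is not the median
# of the base data, so it must be recomputed from the base values.
median_of_medians = median(median(values) for values in city_sales.values())
true_median = median(base_values)

print(state_total, sum(base_values))
print(median_of_medians, true_median)
```

An average sits in between: it can be rolled up correctly if each level keeps the pair (sum, count) rather than the average itself.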


4.5 Spatial Aggregation Operators

Spatial aggregation operators include topological predicates and analysis operators such as equals, intersects, touches, within, crosses, overlaps, disjoint and contains. Quantitative (metric) spatial operators include metric distance, area, length, perimeter, etc. Qualitative spatial operators include cardinal directions, relative directions, etc. Other spatial aggregation operators that yield a spatial data result are convex hull, envelope, centroid, boundary, intersection, union, difference and buffer.

The convex hull operator on BigCubes takes a spatial data BigCube as input and returns a BigCube that contains the convex hull of the specified geometric objects in the perspectives. This is achieved by applying the convex hull transform on the geometric dimension and applying the formulate operation on the resulting perspectives to determine the new set of subjects (and their correct state of aggregation). The other spatial aggregations operate in a similar fashion and provide a new aggregated cube using a combination of the spatial operation followed by dice and formulate operators to determine the new cube state and its instance values.

A combination of numeric and spatial aggregation operators is also available on spatial data cubes. For example, consider the computation of the maximum city-city distance between Florida and Georgia (farthest cities in the two regions). These are examples of spatial operators allowed on BigCubes.

(Ruiz and Times 2009) presents an exploratory note on combining scalar and spatial aggregation operators. Using concepts about distributive (additive), algebraic (semi-additive) and holistic (non-additive) aggregation function classification from (Han and Kamber 2006), the authors provide an interesting classification of aggregation operators on spatial data as distributive numeric and distributive spatial, algebraic numeric and spatial, and holistic numeric and spatial operators. They also present an example application based on the GeoMDQL concept from (Silva 2008; Silva et al.


Figure 4-2. Convex hull aggregation operator for spatial objects

2006, 2007), for a geographic data model and processor, using practical data from the Brazilian public health system. With BigCube, we extend this set by integrating basic alphanumeric aggregation operators such as concatenation, sum, avg, etc., as well as spatial qualitative and quantitative operators such as convex hull.

Our goal is to develop the spatial data warehouse (SDW) as a full-fledged data storage and analysis system which provides native support for complex spatial data and advanced spatial OLAP operations on them. These operations on the spatial objects can include basic querying operations, such as "Find the city with the largest sales volume for iPads in the state of Florida in 2010", map generalization operations such as "Find all states where the top five school districts out-performed all others within that state, between 2005 and 2010 in terms of student grades", or spatial analysis operations such as convex hull: "Find the smallest convex region in western United States containing the maximum number of college towns where more than 2500 units of Kinect were sold in 2010", and selective spatial union: "Return the geometry of the region in Florida described by the counties where DropBox usage exceeded that of


Twitter in the last five months". The spatial union query requires a spatial aggregate union on the geometry of the various counties satisfying the condition.

Aggregation Using Convex Hull: The convex hull operator can be used as an aggregate operation in spatial data warehouses. The convex hull operation returns a geometry object that is the minimum convex set containing the specified input geometry objects. An illustration is shown in Figure 4-2, applied over the State of Florida.

Aggregation Using Bounding Box or Circle: The bounding box or circle operator can also be used as an aggregate operation in spatial data warehouses. The bounding box operation returns a geometry object that is the minimum bounding rectangle (MBR) or the minimum bounding circle (MBC) containing the specified input geometry objects. An example is shown in Figure 4-3, which shows a query used to return the average wind speed of all hurricane locations (points) recorded within the state of Florida. This is achieved by applying the bounding box operator over all the locations inside the state of FL where hurricane events were recorded and then taking the average of these wind speeds. Another example using minimum bounding circles is shown in Figure 4-4 over the State of Florida.

Many other interesting spatial aggregation queries can be constructed when spatial data is fully integrated into data cubes and an effective approach for multidimensional querying is available on them. A comparison of existing models for spatial data warehousing was introduced in (Viswanathan, G. and Schneider, M. 2011). By following the strategy of this Section we can define the semantics of additional aggregation operators involving various combinations of scalar and spatial operations.

By incorporating the base data types within all the perspectives and analysis subjects of the cube view, BigCube presents a simple yet extensible foundation for multidimensional querying on large-scale, complex scientific data and enables analysts to perform effective data analysis. In the next chapter we shall describe the transformations performed to a logical model that is then used for implementing the BigCube view.
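The convex hull and bounding box aggregations described in this section can be sketched on plain point data. The sketch below is an illustration under stated assumptions, not the BigCube implementation: it uses Andrew's monotone chain algorithm for the hull (a standard technique that the source does not name), axis-aligned rectangles for the bounding box, and hypothetical observation tuples of the form ((x, y), wind_speed).

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive if the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def bounding_box(points):
    """Minimum axis-aligned bounding rectangle as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical hurricane observations: ((x, y) location, wind speed).
observations = [((0, 0), 80.0), ((2, 0), 100.0), ((2, 2), 120.0),
                ((0, 2), 90.0), ((1, 1), 110.0)]
points = [location for location, _ in observations]

hull = convex_hull(points)
box = bounding_box(points)
mean_speed = sum(speed for _, speed in observations) / len(observations)
print(hull, box, mean_speed)
```

The spatial aggregate (hull or box) and the numeric aggregate (mean wind speed) are computed over the same set of selected observations, which mirrors the combination of spatial and scalar operators discussed above.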


Figure 4-3. Bounding box aggregation operator for spatial data

4.6 Translations to Logical Design and Physical Implementation

This section focuses on translations from the high-level BigCube view to available logical data models. For example, one approach could be to translate the BigCube view to a relational star, snowflake, fact constellation or galaxy type of ROLAP schema. This transformation from multidimensional view to database-based implementation is explained here.

4.6.1 Case Study: Weather Event Analysis

To illustrate the application of such a user-centric, conceptual data warehouse meta-framework, we have used an example from tropical weather events research. The US National Hurricane Center (NHC) (National Hurricane Center (NHC) - Season Archives 2011), NOAA Hurricane Research Division (NOAA) (National Oceanic and Atmospheric Administration (NOAA) 2011), and Joint Typhoon Warning Center (JTWC) (Joint Typhoon Warning Center (JTWC) 2011) collect data about hurricane events in the North Atlantic and Pacific Ocean using a combination of satellite, weather balloons


Figure 4-4. Bounding box aggregation operator for spatial data

and flight telemetry systems. These datasets contain historical hurricane trajectory information (from 1997-2010) along with about 150 other relevant attributes such as wind speed, pressure, hurricane-stage (category), etc. Using this information and the shapefiles for US State boundaries, we create a spatial data warehouse with data cubes describing the hurricane trajectory and other associated attributes (an example is shown in Figure 4-5). This framework can now be used to execute spatial analysis queries such as "find the hurricane that crossed the state of Louisiana in 2005 with maximum monthly wind speed averages", "determine all wind speeds for a 5 km radius for hurricanes classified as category-3 or higher from 2003-2010", and "determine a heat map for all US States based on the number of hurricanes that affected each of them from 1990-2010".

We employed the generic spatial data warehouse meta-modelling approach to design the example data cube. In this approach, hierarchies are defined as first class citizens of the multidimensional structure. Hierarchies of data categories exist in


both the perspectives (often called data dimensions) and the subjects of analysis or metrics (often called facts) of the cube. Measure values are instances of subjects of analysis, and members are instances of the cube's perspectives of visualization. The measures and members of the data cube can be alphanumeric values, spatial objects, or a combination of these. For example, we store the location of the eye of the hurricane as a spatial point object. The spatial point is defined by latitude and longitude on the geographic WGS 84 coordinate system.

The execution of the first query (above) included the following steps. First, we selected all hurricanes that had a topological is-cross relation with Louisiana, with a slice on year (2005). For these hurricanes, we gathered the wind speeds at the location of the eye of the hurricane and then computed the hurricane-specific wind-speed averages for each month in 2005. Finally, we selected the name of the hurricane with the maximum wind speed average (Katrina). The ability to perform thematic selections, spatial topological relations, and aggregations on measures (such as the average on wind speed values) over data integrated from heterogeneous sources over the historical time period allows for the execution of such a query. This illustrates the versatility and usefulness of a spatial data warehouse for performing OLAP operations on large-scale datasets. The generic meta-model for such a spatial data warehouse allows the system to completely capture and store the multidimensional structure while dormant, and easily recreate, pivot and query the relevant perspectives and analysis-subjects of the data cube while queries are being processed.

4.6.2 OLAP Formulations to Support Spatial Data Warehouse Modeling with C3 Constructs

In this Section we present OLAP formulations that help to apply analysis operations on data cubes with complex spatial data by using the C3 constructs on the BigCube model (Viswanathan and Schneider 2011c).


Figure 4-5. Illustration of the structure of a Weather-Events data cube showing three perspectives: Hurricane Data, Location (spatial point) and Time that define three subjects of analysis: wind-speed, wind-pressure and hurricane-stage.

First, we analyze how data cubes can be easily designed and modeled using the C3 constructs as follows. The basic, low-level data types are available in the kind BASE. These include alphanumeric, time and geo-spatial data types. Elements of these types are the base data values, which are first organized into Categories by using the categorization construct. For example, 'GNV', 'LA', 'MN' can be a category of cities. Analysis functions can be associated with the domain of the categories. For example, we can define a union function that takes the elements of cities and performs a union operation to yield a new polygon.

The next step is to use the containment construct to define the hierarchical nature of the elements within the categories. This allows for the creation of explicit hierarchical paths between categories and the specification of analysis operations on each of them individually or as a whole. An example of analysis using the containment construct is the often-used SUM aggregation operator on Sales quantity defined from the City to the State level.

The final step is the creation of interacting lattice galaxies, which is achieved by using the combination construct. The combination construct maps the categories in different hierarchies to others in the galaxy to create the data cube schema (cells).
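The interplay of the three constructs can be sketched with ordinary dictionaries. Everything in this sketch is an assumption made for illustration: the city and state codes, the CONTAINMENT mapping and the rollup_sum helper are hypothetical and are not part of the BigCube definition.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Categorization: base values grouped into named categories (toy data).
CATEGORIES = {"City": {"GNV", "MIA", "LA"}, "State": {"FL", "CA"}}

# Containment: each City element is contained in exactly one State element.
CONTAINMENT = {"GNV": "FL", "MIA": "FL", "LA": "CA"}

Cell = Tuple[str, int]  # (location member, year member)


def rollup_sum(cells: Dict[Cell, int], containment: Dict[str, str]) -> Dict[Cell, int]:
    """Apply SUM along the City -> State containment path.

    Combination: each cell is identified by one member from every
    perspective, here (city, year); only the location member is generalized.
    """
    out: Dict[Cell, int] = defaultdict(int)
    for (city, year), quantity in cells.items():
        out[(containment[city], year)] += quantity
    return dict(out)
```

Given cells such as {("GNV", 2010): 5, ("MIA", 2010): 7}, the rollup yields a single ("FL", 2010) cell holding 12, which is the SUM-from-City-to-State example in the text.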


Elements of the data cube (objects within the cells) are identified by their defining cube perspectives. In the next two sections, we describe how to translate from the BigCube view to a relational OLAP database design (star, snowflake and fact constellation or galaxy schema) for implementation.

4.6.3 Translation from BigCube to Relational OLAP Design

The mapping from BigCube constructs to a logical model is achieved by translating the basic BigCube entities (as defined in Chapter 3) to the entities in the logical model. For relational OLAP the most commonly used schemas are the star, snowflake and fact constellation or galaxy schema. The galaxy schema is a complex schema which has several fact tables sharing one or more dimension tables. The dimension tables could be shared at different levels among the different fact tables, which often makes the aggregation operations associated with the schema very difficult to manage. However, the schema is extensible in that it allows new dimensions and associated facts to be built over time by simply specifying the exact aggregation operations allowed on the new facts over the shared dimension levels. Our approach to map from the abstract BigCube view to relational OLAP requires a clear understanding of the ROLAP schema that is assigned for the mapping, and is extended and improved for handling complex spatial objects from research presented in (Abello et al. 2003; Agrawal et al. 1997; Gyssens and Lakshmanan 1997; Vassiliadis 1998; Vassiliadis and Skiadopoulos 2000).

Consider the BigCube B that is to be mapped into a relational schema R with associated fact and dimension relations FR and DR respectively. To map BigCubes to relations, we first define two basic mapping functions. The first function, q, maps BigCube perspective and analysis-subject levels to an attribute of a relation, and the second, k, maps relational attributes to a particular perspective or analysis-subject level (and its associated data categories). These define the data dimension tables in the relational schema. Thus a perspective's hierarchical level P can be mapped


to an attribute A if q(P) = A and k(A) = P. Further, after the BigCube is formulated using the injective function from the perspective levels to the set of instances of the analysis subjects, we also have q(S) = A and k(A) = S, thus defining the fact tables. The relationship between the fact and dimension tables is based on the injective function $f_B : (P_1 \times P_2 \times \dots \times P_n) \to S_i$, where $i \in \{1, \dots, r\} \wedge (S_i, P) \in APEX$, defining the BigCube view.

These constraints are translated to the relational schema by creating dimension tables $D_a$ for each of the categories in the perspective hierarchy of the apex cube. Once this is done, every lower level of the perspective is transformed into a separate dimension table $D_l$. The primary key from the $D_a$ base tables is placed into the corresponding $D_l$ tables such that the hierarchical path between the apex level $a$ and the next level $l = a - 1$ is maintained. This process is continued top-down in breadth-first order until we reach the base cube. Thus the parent levels are associated with every appropriate child level to create the entire perspective hierarchy in the underlying relational schema, which in this case turns out to be snowflake or galaxy (if new fact tables are also built). To achieve the same process using a star schema (with single-level dimension tables) we create a tuple for each upper-level parent category in all its descendants. This process is illustrated in the examples below.

The final step is to store the analysis functions associated with each level of the subjects (in their A-set) as $S_A$ for the subject A-sets. For each function in the A-set we create a new attribute in the fact table that contains the new value aggregated from the base tables. Consider an aggregation function SUM that exists in the sales quantity subject's A-set for the time date to month levels. This means that the sales quantities for month values (corresponding to the month_id foreign key) in the fact tables are computed by performing a summation over the day values (corresponding to day_id). Another example is shown in the schema translations below. As an alternative, the $S_A$ set tables can also be built and stored within the relational schema with one tuple for each path in the lattice describing the subject S. Apart from the two level attributes (for


parent-child relationship), the attribute aggfn contains the exact A-set functions defining the path in the lattice. For example, for the Location subject, if the aggregation between Country and State is map-generalization, meaning that States (of type point) are mapped to the Countries (of type region) that they are topologically present in, this is recorded in a table Location_Aset with attributes (S_level1, S_level2, aggfn) as an entry <country, state, map-generalization-fn>. The A-set tables are used whenever a roll-up or drill-down is performed on the cube when the Location exists as a subject inside the BigCube. This helps to maintain the continuity of levels defining the BigCube lattice structure. The same process is continued for each of the A-sets present in the BigCube entities.

We define a database schema R to contain a BigCube B iff for each path in the lattice of the perspectives there is a corresponding link (key) from the child to parent attributes in R, and for each subject $s \in S_i$ in the cells of the BigCube there is a corresponding tuple in the fact tables of the schema. The final relational schema R is defined by the collection of fact tables $F_i$, dimension tables $D_i$, and aggregation-set tables for the facts and dimensions, such that R contains B. For example, a star schema for the example introduced in Section 4.6.1 is shown below.

    HURRICANE (hurr_id number, hurr_name varchar2(15), hurr_alias varchar2(15))
    LOCATION (loc_id number, lat number(5,2), long number(5,2), city point, state region, country region)
    TIME (time_id number, year timestamp, month timestamp, week number(2), day timestamp)
    MEASURE (id number, speed number, pressure number, stage varchar2(10), hurr_id number, loc_id number, time_id number)

Similarly, using the approach presented above, a snowflake schema can be built for the BigCube example introduced in Section 4.6.1 as shown below.

    HURRICANE (hurr_id number, hurr_name varchar2(15), hurr_alias varchar2(15))
    LOCATION_country (country_id number, country region)
    LOCATION_state (state_id number, state region)


    LOCATION_city (city_id number, city point, state_id, country_id)
    LOCATION_point (point_id number, lat number(5,2), long number(5,2), city_id number)
    TIME_year (year_id number, year timestamp)
    TIME_month (month_id number, month timestamp, year_id number)
    TIME_week (week_id number, week number(2), month_id number, year_id number)
    TIME_day (day_id number, day timestamp, week_id number, month_id number, year_id number)
    MEASURE (id number, speed number, pressure number, stage varchar2(10), hurr_id number, loc_id number, time_id number)

The galaxy schema consists of one or more fact tables related to one or more dimension tables. There can be different kinds of galaxy schema designs. In the first kind, only one fact table is built for all facts to be analyzed. In the second kind, one fact table is built for each fact to be analyzed in the data warehouse. In the third design, dimension tables exist at only one high level (meaning the apex perspective level is tied to the schema) with an attribute in the table for each additional inner level of the hierarchy. These are associated with the fact table(s). In the fourth design, one dimension table is built for each level in the dimensional hierarchies, and each level table is associated with specific facts based on the data definition and aggregation semantics provided by the user.

As an example, a galaxy schema for the case study presented in Section 4.6.1 is shown below. We consider the case when hurricane speed is provided for day values and the average function is used to compute the new wind-speed values for the month and year rollup on the time perspective.

    HURRICANE (hurr_id number, hurr_name varchar2(15), hurr_alias varchar2(15))
    LOCATION_country (country_id number, country region)
    LOCATION_state (state_id number, state region)
    LOCATION_city (city_id number, city point, state_id, country_id)
    LOCATION_point (point_id number, lat number(5,2), long number(5,2), city_id number)


    TIME_year (year_id number, year timestamp)
    TIME_month (month_id number, month timestamp, year_id number)
    TIME_week (week_id number, week number(2), month_id number, year_id number)
    TIME_day (day_id number, day timestamp, week_id number, month_id number, year_id number)
    MEASURE_speed_day (speed_id number, speed number, point_id number, day_id number)
    MEASURE_speed_month (speed_id number, speed number, point_id number, month_id number)
    MEASURE_speed_year (speed_id number, speed number, point_id number, year_id number)
    MEASURE_pressure (speed_id number, speed number, city_id number, week_id number)
    MEASURE_stage (speed_id number, stage varchar2(10), point_id number, day_id number)

4.6.4 Observations

During translation from the BigCube multidimensional view to the relational schema, the most basic observation is that the relational tables make the design static, in that fact and dimension tables are specifically built for subjects and perspectives of analysis. However, when we have to use a fact as one of the dimensions of analysis, the entire schema has to be recomputed and rebuilt because the association from the new dimension to the other facts is not yet contained in the schema. Consider, for example, the case when location becomes a subject of analysis in the following query: "Find the cities which were affected by hurricanes in 2005, with intensity greater than Category-3." In this case, in the BigCube view we move to the BigCube state defined by $hurr\_id \times hurr\_stage \times time\_year \to city$ by using the formulate operator on the requisite perspective levels, running the slice for time = 2005, hurr_id = not null and hurr_stage = 'category-3', and then returning the rollup result from the point to the city level in the location perspective.

The BigCube algebra is closed because the operations are performed on BigCubes and return valid BigCube elements. It is also complete when compared to relational algebra: since any clause can be modified, any valid BigCube query can be computed as a set of operations applied to the appropriate BigCube.


To perform the same query on a relational schema, we need to build the new tables for the corresponding dimension levels and then perform the aggregate operations. The multidimensional view enables a more user-centric analysis because all the perspectives and subjects are clearly visible in the BigCube view and analysis aggregations can be directly applied to the BigCube. The semantics of the logical model help to constrain the aggregations and relationships between the available data dimensions and measures in the data warehouse. Overall, this design helps to construct and query over complex scientific data, and to define and develop new aggregation operations intuitively using the multidimensional view.
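For comparison, the relational side of the city query from Section 4.6.4 can be sketched against a simplified star schema using Python's standard sqlite3 module. The sketch rests on several assumptions that are not in the original design: SQLite has no spatial types, so city is stored as a plain name; stage is stored as an integer category so that "greater than Category-3" becomes a numeric comparison; the time dimension is named time_dim; and all rows are made-up toy data.

```python
import sqlite3
from typing import List


def build_demo() -> sqlite3.Connection:
    """Create an in-memory star schema with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE hurricane (hurr_id INTEGER PRIMARY KEY, hurr_name TEXT);
        CREATE TABLE location  (loc_id INTEGER PRIMARY KEY, city TEXT, state TEXT);
        CREATE TABLE time_dim  (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE measure (
            id INTEGER PRIMARY KEY, speed REAL, stage INTEGER,
            hurr_id INTEGER REFERENCES hurricane(hurr_id),
            loc_id  INTEGER REFERENCES location(loc_id),
            time_id INTEGER REFERENCES time_dim(time_id));
        INSERT INTO hurricane VALUES (1, 'H1'), (2, 'H2');
        INSERT INTO location  VALUES (1, 'City A', 'S1'), (2, 'City B', 'S1'),
                                     (3, 'City C', 'S2');
        INSERT INTO time_dim  VALUES (1, 2005, 8), (2, 2005, 9), (3, 2006, 8);
        INSERT INTO measure VALUES
            (1, 140.0, 5, 1, 1, 1),
            (2,  90.0, 2, 1, 2, 1),
            (3, 120.0, 4, 2, 3, 2),
            (4, 150.0, 5, 2, 2, 3);
        """
    )
    return conn


def cities_hit(conn: sqlite3.Connection, year: int, min_category: int) -> List[str]:
    """Cities with at least one hurricane reading above min_category in year."""
    rows = conn.execute(
        """
        SELECT DISTINCT l.city
        FROM measure m
        JOIN location l ON l.loc_id = m.loc_id
        JOIN time_dim t ON t.time_id = m.time_id
        WHERE t.year = ? AND m.stage > ? AND m.hurr_id IS NOT NULL
        ORDER BY l.city
        """,
        (year, min_category),
    ).fetchall()
    return [row[0] for row in rows]
```

Here location only has to be projected out of a dimension table, so the query is a plain join. The rebuilding cost discussed above arises when location must itself carry measures and aggregation paths, which this simplified schema does not model.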


CHAPTER 5
BigCube QUERY LANGUAGE

5.1 Overview

In recent years, several new and innovative front-end applications have been developed for performing business intelligence (BI) and online analytical processing (OLAP). These include several GUI tools that help to analyze and query databases and spreadsheets as a means to gather information for organizational decision making. Data warehouses serve as the foundation repositories of massive amounts of data that are collected from several heterogeneous sources over a large period of time in an organization. These large repositories store valuable information that can be mined using OLAP query tools for knowledge gathering. However, there is a lack of a generic user interface that enables analyst users to query data from the data warehouse by just using an abstract cube interface. The existing query languages popularly used in multidimensional databases, such as Oracle OLAP (Oracle OLAP: Multidimensional Analytic Engine 2011) and MDX (Microsoft Corporation 2010), are based on unique logical implementation models, and can often be inappropriate for the user of a general purpose data warehouse.

In recent years, MDX has become the de facto standard for textual OLAP languages. However, it is not often used in applications or industry due to several problems. First, it requires the user to have good knowledge of the underlying implementation details for performing unions and cross joins. Second, it does not provide a global view of the schema but requires the user to have the multidimensional schematic of the data before querying. Third, MDX can often get complicated due to several levels of ordered cross joins in the query. Thus, often the query itself cannot be written or validated directly, instead requiring an MDX parser and implementation for testing its validity. We found several of the existing MDX implementations to be rather different in their levels of functionality and support, leading to many errors and incompatible syntax issues.


In this chapter, we aim to solve these problems associated with existing query languages for data warehouses by introducing the Cube Analysis Language (CAL). CAL is an abstraction over standard SQL aggregate extensions and MDX, and helps the user to design, build and manipulate data cubes in a fully multidimensional environment. CAL also includes a query and analysis component that can be seamlessly extended to support complex analysis operations. These are later translated into logical components and implemented using existing technologies such as Relational OLAP or Multidimensional OLAP.

There have been a few proposals for OLAP query languages intended for multidimensional and statistical databases. For example, the syntax and definition of a Data Mining Query Language (DMQL) for the DBMiner System is presented in (Han et al. 1996). DMQL supports concept hierarchies on the dimensions of the cube. The language provides a framework for mining through multiple levels of data using characteristic, discriminant, classification, and association rules. Since the language design is too specific, it is not complete in a data warehouse context. However, DMQL serves as an interesting querying platform for OLAP.

The specification for SumQL, a query language for Summary Databases (SDB), is provided in (Pedersen et al. 2000). SumQL helps the user to pose aggregate queries over SDBs by taking advantage of the semantic properties of the summary data model. SumQL is an SQL-like query language that includes constructs to reflect SDB concepts such as dimensions, hierarchies over categories and aggregation. However, SumQL is provided as a language for summary databases, and it does not explicitly include data warehousing and OLAP functions such as slicing, dicing, etc., and hence the user has to design complex queries for handling such operations.

MultiDimensional eXpressions (MDX) (Microsoft Corporation 2010) is a powerful language for multidimensional data querying and is now supported by numerous BI and OLAP system vendors. However, MDX can be complicated for an unskilled user of a data


warehouse, since it does not specify conventional OLAP operations such as slice or dice explicitly, but instead forces the user to join data across several dimensions (using multiple cross joins among its members) to arrive at the required aggregation state. MDX has been widely adopted and is supported by Microsoft, Microstrategy, Whitelight, SAS, SAP and other popular analysis services vendors.

Another multidimensional query language, called the Cube Query Language (CQL), is presented in (Bauer and Lehner 1997). It is designed for statistical and scientific databases (SSDB), particularly the CROSS-DB system (Lehner et al. 1996). CQL is an extension of SQL that supports multidimensional querying and processes queries in a two-step fashion, with a data querying (GET) step followed by a data presentation (SHOW) step. It enables the use of features to further refine the selection process. It allows selection predicates not only on classification nodes, but also on feature values that characterize the single items. However, CQL does not provide a generic interface for conventional data warehouse operations.

5.2 BigCube Data Model and Example

We introduced the BigCube data model for multidimensional analysis in (Viswanathan and Schneider 2010) and the CAL query language in (Viswanathan and Schneider 2011a). Here, we provide a query language for BigCube called the Cube Analysis Language (CAL) that provides a complete new syntax for multidimensional data modeling, navigation and analysis using the abstract hypercube metaphor. This allows for a clear and user-centric data warehouse implementation. In this Section, we present a Product Sales example that is used in the rest of the chapter to demonstrate CAL functionality.

Consider an example Sales dataset from a supplier, storing order information for the sale of products to several customers. Products are categorized by their name and brand. Each product also has type and color information. The cost price for each product is maintained by the supplier, along with its available quantity and


the manufacturing location, as part of the inventory. The invoice issued by the seller contains the selling price and quantity of products ordered by the customer. Once an order is placed, the order commit time is recorded for each day, month, week and year. The store location is also recorded for shipping purposes and includes city, zone, state and country information. Apart from this, the supplier also records the profit resulting from the sales of a certain quantity of products, to capture and track their overall sales.

In BigCube, sales represents an enterprise data warehouse containing order information from the sale of products to customers. Based on user requirements, we identify three subjects for analysis, namely Inventory, Invoice, and Sales. Inventory is quantified by the following measures: cost_price, manufactured_at and unit_sales. Similarly, Invoice and Sales have their own measures. The perspectives for data analysis include products, time and location. Time has the members days, weeks, months and years ordered into hierarchies, as determined by the user. Similarly, location also includes its constituent hierarchies. This illustrates the conceptual structure of the sales data warehouse.

5.3 CUBE ANALYSIS LANGUAGE (CAL)

The Cube Analysis Language (Viswanathan and Schneider 2011a) consists of three components: CDL or Cube Definition Language to design and construct the cube structures, CML or Cube Manipulation Language to alter and modify the cube schema, and CQAL or Cube Query and Analysis Language to navigate through cube instances and query them for analysis. The basic arithmetic operators in CAL include addition (+), subtraction (-), multiplication (x), division (/), power, and parentheses for grouping. CAL also supports regular expressions in its syntax and uses the dot notation for accessing the components of the cube.

5.3.1 Cube Definition Language (CDL)

The Cube Definition Language (CDL) helps to design and construct data cubes in a conceptual manner. The underlying type system presents an object-oriented view of the cube structure that helps to model the multidimensional dataset with high


expressiveness. Each cube contains a set of subjects and perspectives. Subjects have associated measures, and perspectives have associated categories of members forming data hierarchies as presented in (Viswanathan and Schneider 2010). The list of members is unique for each perspective. The list of subjects and perspectives is unique for each cube. Cube creation is achieved using the CREATE clause, modification of the cube structure is achieved using ALTER, and DROP is used for deletion, as illustrated by the examples below.

    CREATE CUBE Sales (
        SUBJECT Invoice (Selling_Price decimal(20,2), Sales_Quantity int)
        SUBJECT Inventory (Cost_Price decimal(20,2), `Unit Sales` int, Manufactured_At char)
        SUBJECT Sale (Profit profit_t, `Unit Sales` int)
        SHARED (Sales.`Unit Sales`, Invoice.Quantity)
        PERSPECTIVE Product (PName char, Brand char)
            ATTRIBUTE (PColor char, PType char)
        PERSPECTIVE Time (Day int, Month char, Week int, Year int)
            HIERARCHY (Day, Month, Year) AS Time_H1
            HIERARCHY (Day, Week, Year) AS Time_H2
        PERSPECTIVE Location (City point, County region, State char,
                              Region_1 char, Region_2 char, Country char),
            HIERARCHY (City, County, Country) AS Loc_H1


            HIERARCHY (City, State, Region_1, Country) AS Loc_H2
            HIERARCHY (City, State, Region_2, Country) AS Loc_H3
    );

During the logical design phase (Section 5.4), for a relational implementation the attributes for the product dimension become part of separate tables linked to product tables by foreign keys. Thus each product (identified by name) would have unique color and type attributes. Members and measures in a cube can be defined using alphanumeric data types or spatial types as supported by the underlying database system or extensibility mechanism. In the above example, generalization is applied on the Location perspective during rollup of city (point) values (spatial data) to state and higher levels (alphanumeric data).

The ALTER ADD and ALTER MODIFY clauses can be used to add new components and modify existing components of a cube respectively. Each perspective defines a new axis on the multidimensional cube. However, since the combination of perspectives determines the set of measures in the cells of the cube, the cube itself changes and has to be reconstructed (except for ALTER RENAME). The following examples illustrate the working of the ALTER clause.

    ALTER Sales ADD SUBJECT Supplier (Shipped_Date datetime);
    ALTER Sales.Location ADD HIERARCHY (City, Country, Continent) AS H_Loc4;
    ALTER Sales DROP PERSPECTIVE Supplier;

The USING clause provides a mechanism to create duplicates of the cube structure. When used with the CREATE CUBE clause, all the data values are also conceptually duplicated by default. With a NULL suffix, the subjects, perspectives and associated hierarchies in the newly created cube (its structure) match those of the source but the values are null, as in the first query below.

    CREATE CUBE Sales_Orders USING Sales NULL;
    CREATE SUBJECT Sales_Orders.Order_Invoice USING Product_Management.Invoice;

The DROP CUBE statement drops the entire cube along with all of its subjects and perspectives.

    DROP CUBE Sales_Orders;


5.3.2 Cube Manipulation Language (CML)

Once the data cubes have been created, CML commands are used to populate the cubes with member and measure values. The data loading process in a data warehouse is often complex and consists of several Extraction, Transform and Load (ETL) phases. CML commands provide an alternative textual interface to achieve these operations.

The INSERT statement is used to add new data into a cell of the data cube. Inserting new members can lead to several empty cells (for measures) in the data cube. Inserting the measure values in cells is achieved by explicitly specifying a combination of perspectives, using the PERSPECTIVE clause following the INSERT INTO ... VALUES clause.

    INSERT INTO Sales.Inventory VALUES (200.50, 50, 'Orlando')
        PERSPECTIVE Location.City = 'Boston' AND Time.Year = 2011 AND Product.Name = 'TiVo';

The UPDATE-SET statement helps to make changes to data values that already exist in the cube. Updating members may or may not affect the measure values in the cells of the cube.

    UPDATE Sales SET Inventory.Quantity = 200
        WHERE (Location.City = 'Boston' AND Time.Year = 2011 AND Product.Name = 'TiVo');

The DELETE statement helps to remove data (member or measure values) from the cube. The example shows the removal of all data related to the sales of iPhone at Boston on the first of every month.

    DELETE FROM Sales
        WHERE (Location.City = 'Boston' AND Time.Day = 01 AND Product.Name = 'iPhone');

The TRUNCATE command clears the entire cube by removing all the measure values. TRUNCATE can also be used on the perspectives of a cube, but since this removes all the members from that perspective, the cells of the cube become null.

    TRUNCATE Sales.Invoice;


5.3.3 Cube Query and Analysis Language (CQAL)

The Cube Query and Analysis Language (CQAL) helps in navigating and querying data cubes and performing analysis. The primary command in CQAL is the GET clause, which is used to retrieve measure values from the data cube (similar to SELECT in SQL for relations). The period or dot operator can be used to implicitly drill down or roll up on the levels of the perspective (wherever hierarchies have been defined). The period operator references the cube components using their names in the data warehouse environment by using the CML metadata. The AS clause at the end of the query helps to store results and build multi-level queries. Backticks are used whenever spaces are present in the names of cube components. An optional clause that can be used with any categorical level of a perspective or subject hierarchy is the member children clause, which returns a multiset of all children values belonging to the element at that particular level.

The ON operator helps to format the resultant multidimensional data cube along its axes. Axis numbers in CAL start from 1, which refers to the columns in the report. However, since only two-dimensional reports are convenient for visualizations, automatic joins of members are achieved from left to right as specified in the query. To change this ordering, the SOLVE_PRIORITY can be specified in the CML file.

Slicing is a simple form of selection applied on OLAP cubes. During slicing, one or more perspectives can be removed from the cube using a set of equality constraints on its perspectives. Members along different perspectives can also be filtered by slicing. Consider a query to show the inventory of all electronics products at Boston in 2011. SLICE allows both the dot notation and filtering via equality, as shown below.

    GET Sales.Inventory
        SLICE Product.PType = 'electronics'
        SLICE Location.City = 'Boston'
        SLICE Time.Year.2011
    FROM Sales;
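The effect of SLICE on a cube can be sketched in Python as follows. This illustrates the semantics only and is not how CAL is evaluated: the cube is assumed to be a dictionary from member tuples to measure values, and slice_cube is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple

Cells = Dict[Tuple[Any, ...], Any]


def slice_cube(cells: Cells, perspectives: List[str], **fixed: Any) -> Tuple[List[str], Cells]:
    """Fix one member on each named perspective and drop those perspectives.

    cells maps a tuple of members (one per perspective, in order) to a
    measure value; fixed gives the equality constraint per sliced perspective.
    """
    positions = {name: i for i, name in enumerate(perspectives)}
    keep = [i for i, name in enumerate(perspectives) if name not in fixed]
    result: Cells = {}
    for key, value in cells.items():
        if all(key[positions[name]] == member for name, member in fixed.items()):
            # Sliced perspectives hold a single member, so they are removed.
            result[tuple(key[i] for i in keep)] = value
    return [perspectives[i] for i in keep], result
```

Slicing a (product, city, year) cube on city and year leaves a one-dimensional cube over product, which mirrors the statement that slicing removes perspectives through equality constraints.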


The DICE clause allows for the creation of subcubes that are based on a combination of members to yield a small set of measure values. DICE may not reduce the number of perspectives, just yielding a smaller cube. Unlike SLICE, DICE also allows filtering of multiple members from within the same perspective.

    GET Inventory.Cost_Price NOT NULL
        DICE (Time.Year > 2011) AND (Location.County LIKE 'S%')
    FROM Sales;

This query returns a smaller data cube (subcube) that shows only sales measure values (both profit and quantity) from counties whose names start with 'S' and for the period after 2011.

The PIVOT clause can be used at the end of the GET statement to pivot the resultant data across the standard axes. The following example provides the results of the cube with the Location on columns and sales quantity on the rows for visualization. The multidimensional cube obtained as a result (AS clause) is also pivoted on its axes.

    GET Product_Management.Sales.Quantity ON 1,
        Product_Management.Location.City ON 2 PIVOT
        SLICE Product ON Name = 'iPhone'
        SLICE Location ON County = 'Suffolk'
        DICE Time ON Year > 2008;

ROLLUP is an aggregation (generalization) operator that helps the user to roll up across the levels of a hierarchy by specifying the current level (state) in the hierarchy, and the final level or the number of levels to roll up for generalization. Since the hierarchy chosen is also explicitly specified, aggregation proceeds correctly even if multiple aggregation paths exist for the same perspective. If the rollup reaches the apex cube, the highest level values are returned. Additionally, ROLLUP can also be used like the GROUP BY clause in SQL, meaning that a single hierarchical level (data category) can be specified for performing a grouping of cell values based on the instances of each of the category values. For example, we can get the sum of all sales values for all products by


their names by applying ROLLUP on the product name. An example of this is shown later in the chapter. Now, referring to the CREATE CUBE statement in Section 5.3.1, we can see that both queries shown below perform the same rollup operations and return similar results. The syntax of the rollup operator is: ROLLUP perspective.hierarchy(current_level, final_level);

    GET Sale.`Unit Sales`
        ROLLUP Location.Loc_H2(City, Country)
    FROM Sales;

DRILLDOWN is a specialization operator that helps the user to drill down the levels of a hierarchy by specifying the current level (state) in the hierarchy, and the final level or the number of levels to step down for specialization. If the drilldown reaches the base cube, the lowest level values are returned. Both examples shown below perform drilldown to the same city level but along two different hierarchies, thus yielding different aggregate (measure) results in the cube. The syntax of the drilldown operator is: DRILLDOWN perspective.hierarchy(current_level, final_level);

    GET Sale.`Unit Sales`
        DRILLDOWN Location.Loc_H3(Country, City)
    FROM Sales;

The DRILLTHROUGH clause helps to retrieve the base data by drilling through the base cubes into the relational databases. In the following example, we get the sales data for each city (which is the lowest category level of the location perspective) using the existing representation across the axes.

    GET Sales.`Unit Sales`
        DRILLTHROUGH Location AS "data1.dbfs";

The DRILLACROSS clause involves access across data cubes using type checking on the perspectives or subjects of the associated cubes. The following example helps to obtain sales data for all cities by using two structurally similar cubes, each having a distinct set of location values.

    GET Product_Management.Sales.Quantity


        DRILLACROSS Product_Management.Location.City, Sales_Orders.Location.City;

Statistical Aggregation Operators: CQAL includes a collection of predicates for many aggregation operators that are supported based on the available data types and operations in the underlying logical implementation model (Section 5.4). Some examples of predicates for aggregations over conventional alphanumeric data include sum, count, min, max, and avg. The following query shows the quantity of sales of electronics products in the UK, from 2010. This example uses a combination of many available CAL operators as an illustration of the flexibility of the query language for multidimensional data analysis.

    GET COUNT(Sales.Sale.`Unit_Sales`)
        SLICE Product.PType = 'electronics'
        DICE Time.Year > 2010 AND Location.City LIKE 'London'
        ROLLUP Location.Loc_H2(City, 3)
    AS C_SALES_elecUK2010;

Additionally, we use the AGGREGATE clause in the CDL to specify the exact aggregation function over the respective hierarchies. This is optional, because often the aggregate function is either specified by the user during the query process or handled using constraints set by the logical model. The NULL aggregate function is used to block the use of an aggregation function over the cell values. The presence of null indicates that a union of the cell values is done and a multiset is created for that higher level cell in the BigCube. An example of the AGGREGATE clause in CDL is shown below.

    CREATE CUBE Sales (
        SUBJECT Invoice (Selling_Price decimal(20,2), Sales_Quantity int)
        SUBJECT Inventory (Cost_Price decimal(20,2), `Unit Sales` int, Manufactured_At char)


    SUBJECT Sale(Profit profit_t, `UnitSales' int)
    SHARED(Sales.`UnitSales', Invoice.Quantity)
    PERSPECTIVE Product(PName char, Brand char)
    ATTRIBUTE(PColor char, PType char)
    PERSPECTIVE Time(Day int, Month char, Week int, Year int)
    HIERARCHY(Day, Month, Year) AS Time_H1
    HIERARCHY(Day, Week, Year) AS Time_H2
    PERSPECTIVE Location(City point, County region, State char, Region_1 char, Region_2 char, Country char),
    HIERARCHY(City, County, Country) AS Loc_H1
    HIERARCHY(City, State, Region_1, Country) AS Loc_H2
    HIERARCHY(City, State, Region_2, Country) AS Loc_H3)
    AGGREGATE Sales.Profit ON Loc_H2 USING SUM,
    AGGREGATE Sales.Inventory ON Loc_H2 USING NULL,
    AGGREGATE Sales.Quantity ON Loc_H1 or Loc_H2 USING SUM;

5.4 Transformation to Logical Model and Implementation

Once the conceptual BigCube is complete, translations can be developed from the abstract model to the logical design approaches for data warehousing. The works of (Abello et al. 2003; GeoMondrian Project 2011; Malinowski and Zimanyi 2006; Niemi et al. 2001; Poe et al. 1997) provide useful technical guidelines and observations on managing hierarchies and the user-centric operations required for translating from an


abstract level to a logical design in a data warehouse environment, leading further to a physical implementation. In our example, if the Relational OLAP representation is used, we can translate the BigCube view to a star, snowflake or galaxy schema. The broad concept is that the subjects of analysis are realized as fact tables with measures as columns (and storing measure values as tuples), and the perspectives are realized as dimension tables bearing the categories (levels) as columns (and storing members as tuples), and linked to fact tables using foreign keys. Queries from CAL can be translated to MDX or SQL (using OLAP extensions and GROUP BY) as appropriate. These are then evaluated and the resulting data is returned to the user.

The detailed semantics of the BigCube translation operations are now explained. We present the exact methodology used to transform the algebraic queries and CAL elements to MDX queries and later to SQL. The end users of spatial data warehouse systems are analysts or scientific domain experts who wish to analyze data by moving from one BigCube to another. The BigCube is first built using the CDL queries. Consider the following CDL.

    CREATE CUBE Sales(
    SUBJECT Sale(Profit profit_t, `UnitSales' int)
    PERSPECTIVE Product(PName char, Brand char)
    PERSPECTIVE Time(Day int, Month char, Week int, Year int)
    HIERARCHY(Day, Month, Year) AS Time_H1
    HIERARCHY(Day, Week, Year) AS Time_H2
    PERSPECTIVE Location(City point, County region, State char, Region_1 char,


    Region_2 char, Country char),
    HIERARCHY(City, County, Country) AS Loc_H1
    HIERARCHY(City, State, Region_1, Country) AS Loc_H2
    HIERARCHY(City, State, Region_2, Country) AS Loc_H3);

Figure 5-1. Star schema design from CDL query.

For relational implementation, we first translate each CREATE CUBE specification into a database USER.SCHEMA inside the appropriate tablespace. In this case we first create a Sales schema and this is called a data-mart for this data. Now, there are three broad schema specifications available to implement such a data warehouse. The first is a star schema, which is characterized by relatively few relations and well-defined join paths. The benefit of using a simple design with a single fact table connected to defining dimension tables is that this structural schema is easily understood by the end-users of the system, who are data analysts. Additionally, this big table structure provides fast query response, since data warehouse relations are characterized by large tables with relatively few inserts, updates or deletes (transactions). The benefits of using the star schema include the following:


- It provides a simple physical database design that enables the easy understanding of the structure and metadata for end users and developers of the spatial data warehouse.
- It matches the end user view of data, in that direct associations can be made from the perspectives of the BigCube to the data dimension tables, and from the subjects for analysis to the measures in the fact table.
- It creates a database design providing fast query response times.
- Database optimizers get a simpler database design to enable good execution plans.
- UI and report generation is much easier because the connections lead to a simpler underlying design and reduced joins over tables with a large number of tuples.

For the star schema, we create fact tables out of the subjects for analysis and dimension tables out of the perspectives of analysis. The primary key from the dimension tables is put into the fact tables and the collection of these keys from each of the dimension tables together defines a key for a tuple in the fact table. The overall star schema is shown in Figure 5-1. In an extended example, if attributes exist for perspectives, e.g., in the product perspective defined earlier, the attributes become part of separate tables linked to product tables by foreign keys. Thus each product (identified by name) would have unique color and type attributes. Members and measures in a cube can be defined using alpha-numeric data types or spatial types as supported by the underlying database system or extensibility mechanism. In the above example, generalization is applied on the Location perspective during rollup of city (point) values (spatial data) to state and higher levels (alphanumeric data).

The snowflake schema involves a further normalization of the data dimension tables. Oftentimes the dimension table schema is built in the third normal form and the set of dimension tables are related to the fact tables by extending the primary key from the dimension table defining the perspective in the BigCube into the fact table. This is illustrated in Figure 5-2.
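The star-schema mapping described above (subjects become fact tables, perspectives become dimension tables, and the dimension keys together form the fact key) can be sketched with Python's built-in sqlite3 module. This is only an illustrative sketch: the table and column names (`sales_fact`, `location_dim`, `loc_id`, and so on) and the sample rows are assumptions made for the example, not the schema of Figure 5-1, and the spatial types of the Location perspective are replaced by plain text columns.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table and three dimension tables (hypothetical names)."""
    conn.executescript(
        """
        CREATE TABLE product_dim (prod_id INTEGER PRIMARY KEY, pname TEXT, brand TEXT);
        CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, day INTEGER, month TEXT, year INTEGER);
        CREATE TABLE location_dim (loc_id INTEGER PRIMARY KEY, city TEXT, state TEXT, country TEXT);
        -- The fact key is the combination of the three dimension keys.
        CREATE TABLE sales_fact (
            prod_id INTEGER REFERENCES product_dim(prod_id),
            time_id INTEGER REFERENCES time_dim(time_id),
            loc_id INTEGER REFERENCES location_dim(loc_id),
            unit_sales INTEGER,
            PRIMARY KEY (prod_id, time_id, loc_id)
        );
        """
    )


def load_sample(conn):
    """Insert a few made-up rows so the rollup below has something to aggregate."""
    conn.execute("INSERT INTO product_dim VALUES (1, 'Drink', 'Acme')")
    conn.execute("INSERT INTO time_dim VALUES (1, 15, 'Jan', 1997)")
    conn.executemany(
        "INSERT INTO location_dim VALUES (?, ?, ?, ?)",
        [(1, "Gainesville", "Florida", "USA"),
         (2, "Miami", "Florida", "USA"),
         (3, "London", "England", "UK")],
    )
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
        [(1, 1, 1, 10), (1, 1, 2, 5), (1, 1, 3, 7)],
    )


def rollup_to_country(conn):
    """Roll unit sales up from the city level to the country level with GROUP BY."""
    return conn.execute(
        """
        SELECT l.country, SUM(f.unit_sales)
        FROM sales_fact f JOIN location_dim l ON f.loc_id = l.loc_id
        GROUP BY l.country
        ORDER BY l.country
        """
    ).fetchall()


demo = sqlite3.connect(":memory:")
build_star_schema(demo)
load_sample(demo)
country_totals = rollup_to_country(demo)
```

In this sketch the rollup along a Location hierarchy is a single join between the fact table and one dimension table, which is the reduced-join property the benefits list attributes to the star schema.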


Figure 5-2. Snowflake schema design from CDL query.

When there are multiple facts associated with the schema, we can use a galaxy or fact constellation schema. In these, the facts can be present in separate tables that can share the data dimension tables. The dimension tables themselves can be at one level, or decomposed into the third normal form. If the set of keys from the dimension tables does not define a unique tuple in the fact tables, new primary keys are created in the fact tables by combining one or more of the measure values. An example galaxy schema is illustrated in Figure 5-3 with an invoice table which includes an additional invoice id, since the combination of the product id and time id alone does not determine the key for this fact. Note that the galaxy schema quickly clutters the database tables and provides reduced flexibility in terms of creating new analysis queries from the BigCube view.

For conversion from CQAL to MDX the following procedure is followed. For each CAL query we perform a UNION for the first and second elements defined in axis 1 and 2 in the GET clause of the CAL query. A CROSSJOIN is performed on the union of these two with the third axis element. The HIERARCHIZE function can be built


over the crossjoin to enable the correct navigation semantics and visualization of the resulting data along the axes. If the data in any perspective is sliced from the BigCube, this is executed by using the FILTER command in MDX, e.g., Filter(Axis(2), [Product].[drink]). Using this methodology we can now translate CAL commands to MDX and then to SQL.

Figure 5-3. Galaxy or Fact Constellation schema design from CDL query.

Consider an example product sales dataset with the following perspectives: media, product and time, and subjects: unit sales, store count, and store sales. A CAL query based on this Sales BigCube is shown below.

    GET `UnitSales', `StoreSales', `StoreCount' ON 1,
    Time.ALL, Time.ALL.Children ON 2,
    Product.ALL ON 3,
    Media.ALL ON 4
    FROM Sales
    DICE Time.Year = `1997' AND Product.PName = `Drink'

The MDX query equivalent for this is provided using the methodology defined above. This is illustrated below.

    SELECT {[Measures].[UnitSales], [Measures].[StoreSales],


    [Measures].[StoreCount]} ON COLUMNS,
    HIERARCHIZE(CROSSJOIN({[Measures].[ALL], UNION(CROSSJOIN([Product].[AllProducts], [Time].[AllTimes]), CROSSJOIN([Product].[AllProducts], [Time].[AllTimes].[Children])})) ON ROWS
    FROM Sales
    WHERE [Time].[1997]

The Cube Markup Language (CubeML) itself is only an optional facility to enable the ETL and data warehouse query phases to proceed in parallel and to connect them together at a later time. This is an XML specification that can be used to connect the underlying database representation with the requirements of the BigCube structure. We also extend CubeML to accommodate relational (ROLAP) and multidimensional (MOLAP) implementation strategies and a combination of both. As an example, for ROLAP we have included a clause in CubeML to provide information regarding the underlying relations. For example, the following clause provides the link to the connecting tables for storing Year, Month and Day members in the Time perspective's Time_H1 hierarchy.

    ...

Further, we can add key values to the various levels and provide the primary key as an attribute to the LEVEL clause. In this manner, our extensible CubeML specification helps with ROLAP implementation.

The relation from CAL to SQL queries is as follows. Consider the typical SQL query structure, SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY, which is used in data warehouses. We shall explain the semantics of each of these elements and their relation to CAL in the following discussion. The SELECT clause allows subjects and perspectives to be added or removed from the result set. The measure and member values are returned based on the hierarchical relationships defined on the BigCube perspectives. The FROM clause allows for an explicit BigCube schema to be specified.


This translates into the collection of data dimension tables and fact tables available for querying to the user. We consider that data that is not used in the query exists in the schema but is not considered during materialization. For example, when product × time × location → sale is the definition for the BigCube, queries can be built to query on product × time or product × location. The sale-quantity fact values are then aggregated over the resulting cube cells. In particular, an aggregation of sales-quantity values on the locations is done in the first query, while the aggregation of sales-quantity values on the time values is done in the second query. The exact aggregation function used is based on the A-set specified with the BigCube definition, which is translated into the semantics of the logical model.

The WHERE clause allows paths and conditions to be specified. It can also be used to remove existing sub-objects of members or measures or entire data categories or levels in the perspectives and subjects. The SLICE and DICE operators are translated into constraints in the WHERE clause for operations on alphanumeric data. For more complex operations such as slicing by a spatial topological relation, we use the appropriate IS_CROSS or similar operator as specified by the spatial database library. For example, this can be used as follows.

    SELECT ... FROM ... WHERE IS_CROSS(region_1, region_2) = 1

The GROUP BY clause in SQL provides the basis for implementing the rollup aggregation operator in BigCube. Columns can be interchanged or eventually removed by rolling up to the ALL (apex) level using the GROUP BY clause. However, once a GROUP BY is done and the cube reaches a higher state, it cannot be drilled down using the same hierarchy unless a drilldown aggregation function is also specified. This is possible, for example, if we know that the inventory quantity is split evenly across all cities in Alachua county. However, in general we start from the base level BigCube or a lower state and roll up to a higher level by using the GROUP BY operator. The ORDER BY clause is used to sort resulting member and measure values based on the data categories defining the


structure of the BigCube. These correspond to the elements specified in the SELECT clause, and hence sorting can sometimes affect the resulting analysis for specific aggregations. For example, if a top-k operator is used to rank the top 5 regions according to their area and then by their achieved sales-quantity, and if later a spatial union of the results is applied, the result changes according to the sorting of the parameters sent to the ORDER BY clause.

5.5 Advantages of CAL over MDX and SQL

Multi-Dimensional eXpressions (MDX) has become popular in recent years and is supported by several major BI vendors. However, in these systems MDX is not used directly for querying but instead as an internal link in the query handler system, both due to its complexity and its lack of extensibility and support for new complex data types. Thus dashboards and graphical front-ends are the only tools used commonly for data warehouse development and multidimensional data analysis. CAL provides a new textual interface (for multidimensional data) that is similar in its ease-of-use to SQL (for relational datasets). In this section, we provide a comparison of syntax and semantics between CAL and MDX, highlight some of the shortcomings of MDX and motivate the need for a new abstraction over existing OLAP languages.

First, MDX requires building sets and tuples and the creation of explicit crossjoins and unions between members. This is achieved using many different parentheses that often make the code unreadable. In CAL, the underlying BigCube data model takes care of type checking and query validation. CAL does not provide for explicit user-level cross-joins. Instead the order of aggregation is according to the hierarchy explicitly specified by the user.

Further, MDX requires axes to be specified in order and does not allow alteration or deletion of a previous axis. This is often a big restriction when the DW structure is being altered by an aggregation query. In CAL, the query parser dynamically reorders query results with internal axis ids which are different from those specified by the user.
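One way to picture the internal axis ids is as a mapping from the axis numbers a user writes to contiguous internal positions. The following Python sketch is illustrative only: the function name `assign_internal_axes` and the choice of zero-based, contiguous internal ids are assumptions made for this example, not the actual behavior of the CAL parser.

```python
def assign_internal_axes(user_axes):
    """Map user-written axis numbers to contiguous internal ids.

    user_axes: iterable of distinct integers, in any order and possibly
    with gaps (for example 1, 2 and 4 with axis 3 skipped).
    Returns a dict {user_axis_number: internal_axis_id}.
    """
    ordered = sorted(user_axes)
    if len(set(ordered)) != len(ordered):
        raise ValueError("each axis number may be used only once")
    # Internal ids are assigned in ascending order of the user numbers.
    return {user: internal for internal, user in enumerate(ordered)}


# Axes written out of order and with a gap still get dense internal ids.
axis_map = assign_internal_axes([4, 1, 2])
```

Under these assumptions the user-facing numbering stays free-form while the internal numbering stays dense, so the user-facing axis numbers need not be contiguous or ordered.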


Additionally, the Cube Markup Language (CubeML) can be used to specify a solve priority for joining perspective levels as shown below.

    ......

If the above CML file is referenced in the query using the WITH clause, the SOLVE_PRIORITY supersedes the order of perspectives specified in the query itself for the joins among the various members along different axes. This helps when multi-level CAL queries are built with several sub-cubes that are used in higher level computations. The link between the internal physical axis id and the logical user-specified axis number is maintained by the system. Thus, CAL does not restrict the number of axes or a skip in their specification in the query.

Third, from the user's standpoint one of the main problems with MDX is the mixing of structure and the values of the cube. For example, Time represents a cube dimension type whereas 2005 represents the value for the Year level in Time. This separation is not clear in an MDX query. In CAL, the underlying BigCube data model makes the separation clear using explicit data-typing, and the user is always allowed to start from the pre-defined cube structure and lead to the values, as seen in the examples below.

    SELECT {[Time].[2006] : [Time].[2011]} on columns,
    {[Product].[Tools] : [Product].[HomeAudio]} on rows
    FROM Sales


    WHERE ([Customer].[Gainesville], [Measures].[UnitSales])

Moreover, in the MDX query above there is redundancy in the specification of data dimensions with the colon operator. The CAL syntax is much simpler and more intuitive, as seen in the equivalent query below.

    GET Time.[2006:2011] ON 1, Product.[Tools:HomeAudio] ON 2
    FROM Sales
    DICE Customer.Gainesville AND Sale.Unit_sales;

Further, MDX only allows implicit slice (using the WHERE clause) across dissimilar dimensions. However, in CAL, DICE can be used for choosing members even from the same perspective. The system internally handles the two cuboids that result from each slice and merges them together into one single data cuboid. Also, CAL is not case or line-oriented, and spaces in member or measure names are handled using enclosing backtick characters as in SQL.

Finally, consider the following query to display the store cost, unit sales and the profit generated from the sales of products during the year 1997 in stores inside the USA and the gross total over all stores (all locations) from our sample dataset. The CAL query and its MDX variant required to generate the results (in two-dimensional report format as shown in Figure 5-7) are provided below.

CAL query:

    GET `StoreCost', `UnitSales', Profit ON 1, Time on 2, Store on 3, Product on 4
    FROM Sales
    DICE (Time.Year = "1997") AND (Store.ALL AND Store.Country = "USA");

MDX query:

    SELECT {[Measures].[StoreCost], [Measures].[UnitSales], [Measures].[Profit]} ON COLUMNS,
    Union(Union(Crossjoin({[Product].[AllProducts]}, Union(Union(Union(Union(Crossjoin({[Store].[AllStores]}, {[Time].[1997]}), Crossjoin({[Store].[AllStores]}, {[Time].[1997].[Q1]})), Crossjoin({[Store].[AllStores]}, {[Time].[1997].[Q2]})), Crossjoin({[Store].[AllStores]}, {[Time].[1997].[Q3]})), Crossjoin({[Store].[AllStores]}, {[Time].[1997].[Q4]}))), Crossjoin({[Product].[AllProducts]}, Union(Union(Union(Union(Crossjoin({[Store].[USA]}, {[Time].[1997]}), Crossjoin({[Store].[USA]},


    {[Time].[1997].[Q1]})), Crossjoin({[Store].[USA]}, {[Time].[1997].[Q2]})), Crossjoin({[Store].[USA]}, {[Time].[1997].[Q3]})), Crossjoin({[Store].[USA]}, {[Time].[1997].[Q4]})))),
    {([Product].[Drink], [Store].[AllStores], [Time].[1997]), ([Product].[Non-Consumable], [Store].[AllStores], [Time].[1997]), ([Product].[Food], [Store].[AllStores], [Time].[1997])}) ON ROWS
    FROM [Sales]

Consider another query to 'find the product name with the maximum sales quantities and the city where they were sold.' The CAL query for this is shown below.

    GET Product.PName, Location.City on 1
    FROM Sales
    SLICE Sales.`UnitSales' = (GET MAX(Sales.`UnitSales') SLICE P.PName);

The equivalent SQL query on the underlying database schema is shown below.

    CREATE VIEW MaxSalesView AS
    SELECT P.PName, MAX(S.Quantity) AS MaxS
    FROM Sales S, Product P, Time T, Location L
    WHERE S.TimeID = T.TimeID AND S.LocID = L.LocID AND S.ProdID = P.ProdID
    GROUP BY P.PName;

    SELECT P.PName, L.City, MSV.MaxS
    FROM MaxSalesView MSV, Sales S, Product P, Location L, Time T
    WHERE MSV.PName = P.PName AND S.TimeID = T.TimeID AND S.LocID = L.LocID AND S.ProdID = P.ProdID;

The star schema tables use 'quantity' to refer to the 'unit sales' attribute. First we create a view for the max sales, and then retrieve the product name and location data for each of these sales values. Notice how the complex structure of the underlying tables and the many explicit joins make the SQL query difficult for end-users to formulate.
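The view-then-join pattern of the SQL query above can be tried out with Python's built-in sqlite3 module. This is a hedged sketch rather than the dissertation's implementation: the sample rows are invented, the Time table is left out for brevity, and the extra join condition `S.Quantity = MSV.MaxS` is an assumption about the intended semantics (return only the city where each product reached its maximum quantity).

```python
import sqlite3


def build_sales_db():
    """Create a tiny star schema with made-up rows (Time table omitted)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Product (ProdID INTEGER PRIMARY KEY, PName TEXT);
        CREATE TABLE Location (LocID INTEGER PRIMARY KEY, City TEXT);
        CREATE TABLE Sales (ProdID INTEGER, LocID INTEGER, Quantity INTEGER);

        INSERT INTO Product VALUES (1, 'Drink'), (2, 'Food');
        INSERT INTO Location VALUES (1, 'Gainesville'), (2, 'Miami');
        INSERT INTO Sales VALUES (1, 1, 10), (1, 2, 30), (2, 1, 25), (2, 2, 5);

        -- Step 1: maximum quantity per product name.
        CREATE VIEW MaxSalesView AS
            SELECT P.PName AS PName, MAX(S.Quantity) AS MaxS
            FROM Sales S JOIN Product P ON S.ProdID = P.ProdID
            GROUP BY P.PName;
        """
    )
    return conn


def max_sales_with_city(conn):
    """Step 2: join the view back to the fact rows that reach the maximum."""
    return conn.execute(
        """
        SELECT P.PName, L.City, MSV.MaxS
        FROM MaxSalesView MSV
        JOIN Product P ON MSV.PName = P.PName
        JOIN Sales S ON S.ProdID = P.ProdID AND S.Quantity = MSV.MaxS
        JOIN Location L ON S.LocID = L.LocID
        ORDER BY P.PName
        """
    ).fetchall()


result = max_sales_with_city(build_sales_db())
```

Even in this reduced form the query needs a view and three joins, which is the formulation burden the text contrasts with the single CAL statement.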


We will now provide additional sample CAL and their equivalent MDX queries to illustrate the benefits of a conceptual, high-level approach for modeling and querying complex spatial data. The dataset we have used is the real-world retail sales dataset that includes the sales measures for retail products in several continents across the world over several years. The data has several levels of granularity, e.g., the store location, the city and country names are also available. To leverage spatial analysis, we have added spatial information from the TIGER [1] datasets of city, county and state locations and boundaries from the official data.gov clearinghouse for spatial data. We have also included a base version of the FoodMart database that accompanies the Mondrian open-source OLAP system and contains similar product sales measures. Our queries are written as CAL queries, and we achieve translation to MDX and process it finally as SQL. This demonstrates our overall design for a user-centric spatial data warehouse system that can be used in relation with a logical data model and its query system (such as MDX) and implemented finally as SQL queries over the attached physical databases.

Since BigCube was built explicitly to support complex structured data such as spatial data, we have also implemented the essential spatial operators such as geometric union, difference and intersection by extending the postgis spatial library and using the OpenGIS Geographic Objects Implementation Specification [2] from the Open Geo-Spatial Consortium (OGC), to test the functionality of these from a user-centric analysis view. This is illustrated by the sample queries and their results below. The tests

[1] TIGER is a registered trademark of the U.S. Census Bureau.
[2] Open Geospatial Consortium, the OGC logos, and the marks OGC, OPENGIS, OPENGIS, OPENGIS.NET, OGC NETWORK, OGC Network, OpenLS, Open Location Services, OGCA, Open Geospatial Consortium (Australia), Ltd, OGCE, Open Geospatial Consortium (Europe), Ltd, OWS, OGC Sensor Web Services, OGC Web Services, are registered trademarks or service marks or trademarks or service marks of Open Geospatial Consortium in the United States and in other countries.


were conducted on a Linux [3] system with 1 GB RAM running a PostgreSQL [4] database with PostGIS [5] support.

Query: Find the unit sales and store sales for all stores inside USA during 1997.

To answer this query using CAL, we use the spatial topological operator inside. In postgis, this is defined using ST_Within(geometry A, geometry B), which returns true if a geometry A is completely inside another geometry B. Hence we can use inside to check for all stores that lie completely inside the extent of the USA. The CAL and its corresponding MDX are shown below. The results are displayed in Figure 5-4. This query returned 8 data rows and took 134 ms for processing.

CAL:

    GET `UnitSales', `StoreSales' on 1, Store.City on 2,
    DICE ST_Within(Store.Location, Store.Country) AND Time.Year = 1997;

MDX:

    SELECT {[Measures].[UnitSales], [Measures].[StoreSales]} on columns,
    Filter({[Store].[StoreCity].members}, ST_Within([Store].CurrentMember.Properties("geom"), [Store].[StoreCountry].[USA].Properties("geom"))) on rows

[3] Linux is a registered trademark of Linus Torvalds, the inventor of the Linux operating system, and is administered by the Linux Mark Institute (LMI).
[4] PostgreSQL is Copyright -2009 by the PostgreSQL Global Development Group.
[5] PostGIS has been developed by Refractions Research as a project in open source spatial database technology and is released under the GNU General Public License.


    FROM [Sales]
    WHERE [Time].[1997]

Figure 5-4. Results of Spatial CAL Query

Query: Find the overall sales of all stores in the USA during 1997 along with the total geometric area spanned by these stores.

To execute this query we have to ensure that the correct spatial coordinate system is used while applying the union and area operators. For this, the OGC specification (OpenGIS Consortium: Reference Model 2011) provides the SRID parameter. The WGS84 lat-long coordinate system is specified using SRID 4326 and we use this in the ST_Transform function provided by postgis as shown below. The query returned one aggregate result row in 8 ms (Figure 5-5).

CAL:

    GET `ST_Area(ST_Transform(ST_UnionAgg(Store.All, "geom"), 4326, 2991))/1E6' as `Geom_area_in_km2' ON 1,
    Store.All ON 2
    FROM [Sales]
    DICE [Time].[1997]

MDX:

    with member [Measures].[Geom_area_in_km2] as 'ST_Area(


    ST_Transform(ST_UnionAgg([Store].[AllStores].CurrentMember.Children, "geom"), 4326, 2991))/1E6'
    select {[Measures].[UnitSales], [Measures].[Geom_area_in_km2]} ON COLUMNS,
    {[Store].[AllStores].[USA]} ON ROWS
    from [Sales]
    where [Time].[1997]

Figure 5-5. Results of Spatial Aggregation using CAL Query

Query: Find the unit sales and store sales for stores located inside the state of Washington.

For this query, by using a basic bounding polygon for the state of Washington, we can execute a spatial inside topological operator to check for the stores located geometrically within the state. The executed queries are shown below followed by the results (Figure 5-6).

CAL:

    GET `UnitSales', `StoreSales' ON 1, `StoreCity'.ALL ON 2
    FROM [Sales]
    DICE [Time].[1997] AND (ST_Within([Store].CurrentMember.Properties("geom"), ST_GeomFromText("POLYGON((-123.10 45.50, -123.10 49.00, -120.00 49.00, -120.00 45.50, -123.10 45.50))")));

MDX:

    SELECT {[Measures].[UnitSales], [Measures].[StoreSales]} on columns,
    Filter(


    {[Store].[StoreCity].members}, ST_Within([Store].CurrentMember.Properties("geom"), ST_GeomFromText("POLYGON((-123.10 45.50, -123.10 49.00, -120.00 49.00, -120.00 45.50, -123.10 45.50))"))) on rows
    FROM [Sales]
    WHERE [Time].[1997]

Figure 5-6. Using Topological Operators in Spatial Analysis

These queries demonstrate how an underlying spatially enabled database can be used to leverage data warehouses for user-centric spatial OLAP queries.

5.6 Observations

The Cube Analysis Language (Viswanathan and Schneider 2011a) can work with both popular logical design approaches for data warehouses, namely, Relational OLAP (ROLAP) and Multidimensional OLAP (MOLAP). To demonstrate the functionality of this query language in user-centric multidimensional analysis, we have implemented a CAL parser for Mondrian [6] (Pentaho Analysis Services: Mondrian Project 2011), which is an Open Source OLAP server from Pentaho. We have provided XMLA datasource access from a relational PostgreSQL database and extended JPivot, a JSP tag library capable of rendering OLAP tables and charts (see Figure 5-7) in order to display the results in graphical format. Consider a query 'to display the store cost, unit sales and the profit generated from the sales of products, during 1997, in stores inside the USA and the gross total over all stores (all locations)' from our sample dataset. Using CAL,

[6] Mondrian and Pentaho are registered trademarks of Pentaho Corporation.


this query can be built using a simple set of CQAL statements and its result can be graphically analyzed in Mondrian as shown in Figure 5-7.

Figure 5-7. CAL query editor and results of OLAP analysis on Mondrian (Pentaho Analysis Services: Mondrian Project 2011).

When a multidimensional OLAP strategy is applied, a corresponding logical model is used to translate the subjects and perspectives directly into facts and data dimensions along with other associated constructs. The links between the cube components and the MOLAP constructs are written to the CubeML file as metadata. These can be used by the system for performing MOLAP storage. It is thus clear that the use of CML and CAL provides a unique platform for analyst users to design, develop and query data cubes abstractly regardless of implementation aspects. We incorporate a CAL to MDX converter module that serves to convert CAL code into one or more MDX queries to perform the required action.


This chapter presents a major improvement in user-centric data warehouse design by introducing a novel text-based Cube Analysis Language for multidimensional analysis. CAL consists of three components, namely the Cube Definition Language (CDL), the Cube Manipulation Language (CML) and the Cube Query and Analysis Language (CQAL), that help to build, manipulate and query the data cube respectively. CAL helps the user to communicate in an abstract data cube environment and provides an extensible interface to perform multidimensional navigation and data analysis operations. These DW operations are internally validated and implemented using the underlying logical data model (ROLAP or MOLAP). We also demonstrate the functionality of CAL by implementing a working prototype over Mondrian (an open source OLAP tool) and a relational Postgres database with postgis support. We would also like to motivate the use of CAL as a fresh user interface for spatial data warehouses, in contrast to the existing graphical dashboard tools or schema specific SQL.


CHAPTER 6
IBLOB: EFFICIENT QUERYING OVER COMPLEX HIERARCHICAL OBJECTS

New emerging applications including genomic, multimedia, and geo-spatial technologies have necessitated the handling of complex application objects that are highly structured, large, and of variable length. Currently, such objects are handled using file system formats like HDF and NetCDF as well as the XML and BLOB data types in databases. However, some of these approaches are very application specific and do not provide proper levels of data abstraction for the users. Others do not support random updates or cannot manage large volumes of structured data and provide their associated operations. In this paper, we propose a novel two-step solution to manage and query application objects within databases. First, we present a generalized conceptual framework to capture and validate the structure of application objects by means of a type structure specification. Second, we introduce a novel data type called Intelligent Binary Large Object (iBLOB) that leverages the traditional BLOB type in databases, preserves the structure of application objects, and provides smart query and update capabilities. The iBLOB framework generates a type structure specific application programming interface (API) that allows applications to easily access the components of complex application objects. This greatly simplifies the implementation of new type systems inside traditional DBMS.

6.1 Overview

Many fields in computer science are increasingly confronted with the problem of handling large, variable-length, highly structured, complex application objects and enabling their storage, retrieval, and update by application programs in a user-friendly, efficient, and high-level manner. Examples of such objects include biological sequence data, spatial data, spatiotemporal data, multimedia data, and image data, just to name a few. Traditional database management systems (DBMS) are well suited to store and manage large, unstructured alphanumeric data. However, storing and manipulating


large, structured application objects at the low byte level as well as providing operations on them are hardly supported. Binary large objects (BLOBs) are the only means to store such objects. However, BLOBs represent them as low-level, binary strings and do not preserve their structure. As a result, this database solution turns out to be unsatisfactory. Hence, scientists have designed special file formats like NetCDF (network Common Data Form) and HDF5 (Hierarchical Data Format) to store such objects in files. Unfortunately, without the support of a DBMS, standard features like a query language, concurrency control, transaction management, security, and recovery are unavailable (data management problem).

A widely accepted approach to handling complex data in databases is to model and implement them as values of abstract data types (ADT) in a type system, or algebra, which is then embedded into an extensible DBMS and its query language. This enables their use as attribute data types in a database schema without disclosing the implementation details of their complex internal structure to the user and DBMS. At the type system level, extensible DBMS enable the specification of new ADTs like spatial, image, and XML data types. However, these ADTs have DBMS specific implementations and are not universally deployable (generality problem).

On the other hand, BLOBs are not well suited for structured object management. They have originally been designed for storing unstructured data as byte sequences and offer a low-level interface for simple read/write access to byte ranges. Thus BLOBs do not understand the semantics of the internal structure of the application objects stored in them and therefore do not include methods to access internal components of them (abstraction problem). This makes the access to a component of an application object rather expensive since the entire object needs to be loaded into main memory to understand its structural semantics and get access to the component of interest. Further, BLOBs typically allow data to be appended, truncated, and modified through the overwriting of bytes. However, general


data insertions and deletions are not supported unless the user explicitly shifts data (update problem).

In this paper, we present a novel, generic model for complex object management that focuses on providing the required functionality to address the data management, generality, abstraction, and update problems. We first propose a generalized method, named type structure specification, for representing and interpreting the structure of application objects. This specification provides an interface for the ADT implementer to describe the structure of complex objects at the conceptual level. Based on this specification, we employ a generalized framework, called intelligent binary large objects (iBLOBs), for the efficient and high-level storage, retrieval, and update of hierarchically structured complex objects in databases. iBLOBs store complex objects by utilizing the unstructured storage capabilities of DBMS and provide component-wise access to them. In this sense, they serve as a communication bridge between the high-level abstract type system and the low-level binary storage. This framework is based on two orthogonal concepts called structured index and sequence index. A structured index facilitates the preservation of the structural composition of application objects in unstructured BLOB storage. A sequence index is a mechanism that permits full support of random updates in a BLOB environment.

In Section 6.2, we describe the applications that involve large structured application objects, the existing approaches to handling them, and our approach to dealing with structured objects in a database context. We introduce the concept of type structure specification and the iBLOB framework in Sections 6.3 and 6.4. We then evaluate iBLOBs and compare their performance in terms of both storage and querying components of large structured spatial objects using a real-world large spatial dataset. Finally, in Section 8.1, we draw conclusions and discuss future work. This work has been presented in our detailed concept paper on TSS and iBLOBs (Chen et al. 2010).
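To make the update problem and the role of a sequence index more concrete, the following Python sketch shows one possible way to support insertions and deletions at arbitrary positions without shifting stored bytes. It is an assumption-laden illustration, not the iBLOB design described in this chapter: the class name `SequenceIndexedBlob`, the append-only byte store and the list of (offset, length) pieces are all hypothetical choices made for this example.

```python
class SequenceIndexedBlob:
    """Append-only byte store plus an ordered list of (offset, length) pieces.

    The logical content is the concatenation of the pieces in list order,
    so inserts and deletes only edit the piece list (hypothetical design).
    """

    def __init__(self, data=b""):
        self._store = bytearray(data)
        self._pieces = [(0, len(data))] if data else []

    def read(self):
        """Return the logical content by following the piece list."""
        return b"".join(bytes(self._store[o:o + n]) for o, n in self._pieces)

    def _split(self, pos):
        """Ensure a piece boundary at logical position pos; return its index."""
        logical = 0
        for i, (offset, length) in enumerate(self._pieces):
            if pos == logical:
                return i
            if pos < logical + length:
                k = pos - logical
                self._pieces[i:i + 1] = [(offset, k), (offset + k, length - k)]
                return i + 1
            logical += length
        if pos == logical:
            return len(self._pieces)
        raise IndexError("position out of range")

    def insert(self, pos, data):
        """Insert data at logical position pos by appending to the store."""
        if not data:
            return
        i = self._split(pos)
        offset = len(self._store)
        self._store.extend(data)
        self._pieces.insert(i, (offset, len(data)))

    def delete(self, pos, length):
        """Remove length bytes starting at pos by dropping pieces from the index."""
        if length <= 0:
            return
        start = self._split(pos)
        end = self._split(pos + length)
        del self._pieces[start:end]

    def stored_bytes(self):
        """Number of physically stored bytes, including bytes no longer referenced."""
        return len(self._store)


blob = SequenceIndexedBlob(b"helloworld")
blob.insert(5, b", ")   # logical content becomes b"hello, world"
blob.delete(0, 7)       # logical content becomes b"world"
```

In this sketch an insert only appends the new bytes and edits the index, and a delete only edits the index, so the bytes already stored are never moved; reclaiming unreferenced bytes would be a separate concern that the sketch does not address.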


Figure 6-1. A region object as an example of a complex, structured application object. It contains the faces F1, F2, and F3, which consist of the cycles C1 and C2 for F1, C3 for F2, and C4 for F3.

6.2 Problems in Handling Structured Objects in Database Systems

Application objects like DNA structures, 3D buildings, and spatial regions are complex, highly structured, and of variable representation length. The desired operations on the application objects usually involve high complexity, long execution time and large memory. For example, region objects are complex application objects that are frequently used in GIS applications. As shown in Figure 6-1, a region object consists of components called faces, and faces are enclosed by cycles. Each cycle is a closed sequence of connected segments. Applications that deal with regions might be interested in numeric operations that compute the area, the perimeter and the number of faces of a region. They might also be interested in geometric operations that compute the intersection, union, and difference of two regions. Many more operations on regions are relevant to applications that work with maps and images. In any case, the implementation of an operation requires easy access to components of structured objects (e.g., segments, cycles, and faces of a region) that uses less memory and runs in less time.

Since database systems provide built-in advanced features like the SQL query language, transaction control, and security, handling complex objects in a database context is an expedient strategy. Most approaches are built upon two important architectures that enable database support for applications involving complex application objects.
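The region structure described in this section (a region made of faces, faces enclosed by cycles, cycles as closed sequences of connected segments) and the numeric operations on it can be sketched in Python as follows. The class names and the in-memory representation are assumptions made for illustration. In particular, treating the first cycle of a face as its outer boundary and any further cycles as holes is an assumption of this sketch, since the text only states that faces are enclosed by cycles.

```python
from dataclasses import dataclass, field
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Cycle:
    """A closed sequence of connected segments, stored as its vertices."""
    vertices: List[Point]

    def segments(self) -> List[Tuple[Point, Point]]:
        n = len(self.vertices)
        # The last vertex connects back to the first to close the cycle.
        return [(self.vertices[i], self.vertices[(i + 1) % n]) for i in range(n)]

    def perimeter(self) -> float:
        return sum(hypot(b[0] - a[0], b[1] - a[1]) for a, b in self.segments())

    def area(self) -> float:
        # Shoelace formula over the cycle's segments.
        twice = sum(a[0] * b[1] - b[0] * a[1] for a, b in self.segments())
        return abs(twice) / 2.0


@dataclass
class Face:
    """One outer cycle plus optional hole cycles (assumed interpretation)."""
    outer: Cycle
    holes: List[Cycle] = field(default_factory=list)

    def area(self) -> float:
        return self.outer.area() - sum(h.area() for h in self.holes)

    def perimeter(self) -> float:
        return self.outer.perimeter() + sum(h.perimeter() for h in self.holes)


@dataclass
class Region:
    faces: List[Face]

    def num_faces(self) -> int:
        return len(self.faces)

    def area(self) -> float:
        return sum(f.area() for f in self.faces)

    def perimeter(self) -> float:
        return sum(f.perimeter() for f in self.faces)


# A face with one hole (two cycles) and a second, simple face.
f1 = Face(Cycle([(0, 0), (4, 0), (4, 4), (0, 4)]),
          [Cycle([(1, 1), (2, 1), (2, 2), (1, 2)])])
f2 = Face(Cycle([(10, 10), (12, 10), (12, 12), (10, 12)]))
region = Region([f1, f2])
```

Each operation here walks the hierarchy from region to face to cycle to segment, which is the kind of component-level access that the text argues a plain BLOB cannot offer without loading and interpreting the whole object.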


Figure 6-2. The layered architecture (a), the integrated architecture (b), and our solution (c).

Early approaches apply a layered architecture as shown in Figure 6-2a, in which a middleware that handles complex application objects is clearly separated from the application front-end that provides services and analysis methods to its users. In this architecture, only the underlying primitive data are physically stored in traditional RDBMS tables. The knowledge about the structure of complex objects is maintained in the middleware. It is the responsibility of the middleware to load the primitive data from the underlying database tables, to reconstruct complex objects from the primitive data, and to provide operations on complex objects. The underlying DBMS in the layered architecture does not understand the semantics of the complex data stored. In this sense, the database is of limited value, and the burden is on the application developer to implement a middleware for handling complex objects. This complicates and slows down the application development process.

A largely accepted approach is to model and implement complex data as abstract data types (ADTs) in a type system, or algebra, which is then embedded into an extensible DBMS and its query language. This approach employs an integrated architecture (Figure 6-2b), where the applications directly interact with the extended database system, and use the ADTs as attribute data types in a database schema. Some commercial database vendors like Oracle and Postgres have included some

PAGE 143

ADTs like spatial data types as built-in data types in their database products. Extensible DBMSs provide users with interfaces for implementing their own ADTs so that all types of applications can be supported. Since the only available data structure for storing complex objects with variable length is the BLOB, the implementations of ADTs for complex objects are generally based on BLOBs. The implementation of an abstract data type involves three tasks: the design of the binary representation, the implementation of component retrieval and update, and the implementation of high-level operations and predicates.

The integrated architecture has obvious advantages. It transfers the burden of handling complex objects from the application developer to databases. Once abstract data types are designed and integrated into a database context, applications that deal with complex objects become standard database applications, which require no special treatment. This simplifies and speeds up the development process for complex applications. However, the drawback of this approach is that ADTs for structured application objects rely on the unstructured BLOB type, which provides only byte-level operations that complicate, or even foil, the implementation of component retrieval and update. Byte manipulation is a redundant and tedious task for type system implementers, who want to focus on the design of the data types and the algorithms for the high-level operations and predicates.

In this chapter, we propose a new concept that extends the integrated architecture approach, provides type system implementers with high-level access to complex objects, and is capable of handling any structured application objects. Our motivation for designing structured object storage in databases also comes from some visions presented in (Carey and DeWitt 1996; McKenney et al. 2006) and the several new conferences on object databases. In our concept for supporting object storage in databases, we apply the integrated architecture approach and extend it with a generalized framework (Figure 6-2(c)) that consists of two components, the type structure
specification (Section 6.3) and the intelligent BLOB concept (Section 6.4). The type structure specification consists of algebraic expressions that are used by type system implementers to specify the internal hierarchy of the abstract data type. It is later used as the metadata for the intelligent BLOB to identify the semantic meaning of each structure component. Further, as part of the type structure specification we provide a set of high-level functions as interfaces for type system implementers to create, access, or manipulate data at the component level.

To support the corresponding interfaces, we propose a generic storage method called intelligent BLOB (iBLOB), which is a binary array whose implementation is based on the BLOB type and which maintains hierarchical information. It is intelligent because, unlike BLOBs, it understands the structure of the object stored and supports fast access, insertion and update to components at any level in the object hierarchy.

The type structure specification in the framework provides an abstract view of the application object which hides the implementation details of the underlying data structure. The underlying intelligent BLOBs ensure a generic storage solution for any kind of structured application object, and enable the implementation of the high-level interfaces provided by the type structure specification. Therefore, the type structure specification and the concept of intelligent BLOBs together enable an easy implementation of abstract data types. Type system implementers are released from the task of interpreting the logical semantics of binary unstructured data, and component-level access is natively supported by the underlying iBLOB.

6.3 Representing and Interpreting Structured Application Objects with Type Structure Specifications

The structures of different application objects can vary. Examples are the structure of a region (Figure 6-1) and the structure of a book. We aim at developing a generic platform that accommodates all kinds of hierarchical structures. Thus, the first step is to explore and extract the common properties of all structured objects. Unsurprisingly, the
hierarchy of a structured object can always be represented as a tree. Figure 6-3(a) shows the tree structure of a region object. In the figure, face[], holeCycle[], and segment[] represent a list of faces, a list of hole cycles and a list of segments, respectively. In the tree representation, the root node represents the structured object itself, and each child node represents a component named sub-object. A sub-object can further have a structure, which is represented in a sub-tree rooted with that sub-object node. For example, a region object in Figure 6-3(a) consists of a label component and a list of face components. Each face in the face list is also a structured object that contains a face label, an outer cycle, and a list of hole cycles, where both the outer cycle and the hole cycles are formed by segment lists. Similarly, the structure of a book can also be represented as a tree (Figure 6-3(b)).

Figure 6-3. The hierarchical structure of a region object (a) and the hierarchical structure of a book object (b).

Further, we observe that two types of sub-objects can be distinguished, called structured objects and base objects. Structured objects consist of sub-objects, and base objects are the smallest units that have no further inner structure. In a tree representation, each leaf node is a base object while internal nodes represent structured objects.

A tree representation is a useful tool to describe hierarchical information at a conceptual level. However, to give a more precise description and to make it understandable to computers, a formal specification is more appropriate. Therefore, we propose a generic type structure specification as an alternative to the tree representation for describing the hierarchical structure of application objects.
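The tree view described above can be sketched in a few lines of C++. This is only an illustration under our own assumptions: the names Node, Kind, leaf, group, and makeSampleRegion are hypothetical and do not come from the implementation described in this chapter. The sketch builds a region with one face (an outer cycle and one hole cycle of three segments each) and counts base and structured objects.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// SO = structured object (internal node), BO = base object (leaf).
enum class Kind { SO, BO };

// Hypothetical tree node for a structured application object.
struct Node {
    std::string name;
    Kind kind;
    std::vector<Node> children;  // always empty for base objects
};

Node leaf(std::string name) {
    return Node{std::move(name), Kind::BO, {}};
}

Node group(std::string name, std::vector<Node> children) {
    return Node{std::move(name), Kind::SO, std::move(children)};
}

// Counts the base objects (leaves) in the subtree rooted at n.
std::size_t countBaseObjects(const Node& n) {
    assert(n.kind == Kind::SO || n.children.empty());  // base objects have no inner structure
    if (n.kind == Kind::BO) return 1;
    std::size_t total = 0;
    for (const Node& child : n.children) total += countBaseObjects(child);
    return total;
}

// Counts the structured objects (internal nodes) in the subtree rooted at n.
std::size_t countStructuredObjects(const Node& n) {
    if (n.kind == Kind::BO) return 0;
    std::size_t total = 1;
    for (const Node& child : n.children) total += countStructuredObjects(child);
    return total;
}

// A region with a label and one face; the face has a label, an outer cycle,
// and one hole cycle, each cycle consisting of three segments.
Node makeSampleRegion() {
    return group("region", {
        leaf("regionLabel"),
        group("face", {
            leaf("faceLabel"),
            group("outerCycle", {leaf("segment"), leaf("segment"), leaf("segment")}),
            group("holeCycle", {leaf("segment"), leaf("segment"), leaf("segment")}),
        }),
    });
}
```

An explicit Kind flag is used instead of testing for an empty child list, so that a structured object whose list is currently empty (for example, a face without hole cycles) is not mistaken for a base object.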

We first introduce the concept of structure expressions. Structure expressions define the hierarchy of a structured object. A structure expression is composed of structure tags (TAGs) and structure tag lists (TAGLISTs). A structure tag (TAG) provides the declaration for a single component of a structured object, whereas a structure tag list (TAGLIST) provides the declaration for a list of components that have the same structure. The declaration of a TAG, named tag declaration, is ⟨NAME : TYPE⟩, where NAME is the identifier of the tag and the value of TYPE is either SO, which is a flag that indicates a structured object, or BO, which is a flag that indicates a base object. An example of a structured object tag is ⟨region : SO⟩, and ⟨segment : BO⟩ is an example of a base object tag. We first define a set of terminals that will be used in structure expressions as constants. Then, we show the syntax of structure expressions.

Terminal Set S = {:=, ⟨, ⟩, |, [, ], SO, BO, :}

Expression ::= TAG := ⟨TAG | TAGLIST⟩+ ;
TAGLIST ::= TAG[ ]
TAG ::= ⟨NAME : TYPE⟩
TYPE ::= ⟨SO | BO⟩
NAME ::= IDENTIFIER

In the region example, we can define the structure of a region object with the following expression: ⟨region : SO⟩ := ⟨regionLabel : BO⟩ ⟨face : SO⟩[ ]. In the expression, the left side of := gives the tag declaration of a region object and the right side of := gives the tag declarations of its components, in this case, the region label and the face list. Thus, we say the region object is defined by this structure expression. With structure expressions, the type system implementer can recursively define the structure of structured sub-objects until no structured sub-objects are left undefined. A list of structure expressions then forms a specification. We call a specification that consists of structure expressions and is organized following some rules a type structure specification (TSS) for an abstract data type. Three rules are designed to ensure the
correctness and completeness of a type structure specification when writing structure expressions: (1) the first structure expression in a TSS must be the expression that defines the abstract data type itself (correctness); (2) every structured object in a TSS has to be defined with one and only one structure expression (completeness and uniqueness); (3) none of the base objects in a TSS is defined (correctness). By following these rules, the type system implementer can write one type structure specification for each abstract data type. Further, it is not difficult to observe that the conversion between a tree representation and a type structure specification is simple. The root node in a tree maps to the first structure expression in the TSS. Since all internal nodes are structured sub-objects and leaf nodes are base sub-objects, each internal node has exactly one corresponding structure expression in the TSS, and leaf nodes require no structure expressions. The type structure specification of the abstract data type region corresponding to the tree structure in Figure 6-3(a) is as follows:

⟨region : SO⟩ := ⟨regionLabel : BO⟩ ⟨face : SO⟩[ ];
⟨face : SO⟩ := ⟨faceLabel : BO⟩ ⟨outerCycle : SO⟩ ⟨holeCycle : SO⟩[ ];
⟨outerCycle : SO⟩ := ⟨segment : BO⟩[ ];
⟨holeCycle : SO⟩ := ⟨segment : BO⟩[ ];

The next step after specifying the structure is to create the application object and store it in the database. The TSS provides a workable interface for the type system implementer to create, access and navigate through the object. This higher-level interface is an abstraction of the iBLOB interface. This abstraction, along with the specification, frees the type system implementer from understanding the underlying data type iBLOB that is ultimately used to represent the application object in the database.

Navigating through the structure of the object is done by specifying a path from the root to the node as a string in dot notation. For example, the first segment of the outer cycle of the third face of a region object is specified by the string region.face[3].outerCycle.segment[1]. A component number (e.g., first segment,
third face) is determined by the temporal order in which the components were inserted. An important point to mention is that the structural validity of a path (e.g., whether an outer cycle is a subcomponent of a face) can be verified by parsing the TSS. However, the existence of a third face can only be detected at runtime. The set of operators defined by the interface is given below:

create: → SO
get: path → BO[ ]
set: path → bool
set: path × char → bool
baseObjectCount: path → int
subObjectCount: path → int

An application object can be created by the operator create(), which generates an empty application object. The operator get(p) returns all base objects at leaf nodes under the node specified by any valid path p. Since no data types are defined for the structured objects in intermediate nodes, these objects are not accessible, and paths to them are undefined. Hence, paths to intermediate nodes are interpreted differently in the sense that the operator get(p) recursively identifies and returns all base objects under p. The operator set(p) creates an intermediate component. The operator set(p, s) inserts a base object given as a character string s at the location specified by the path p. The last two operators baseObjectCount(p) and subObjectCount(p) return the number of base objects and the number of sub-objects under a node specified by the path p. As an example, for a region object with one face that contains an outer cycle with three segments, the corresponding code for creating the region object is given below:

region r = create();
r.set("region.regionLabel", "MyRegion");
r.set("region.face[1]");
r.set("region.face[1].faceLabel", "Face1");
r.set("region.face[1].outerCycle");
r.set("region.face[1].outerCycle.segment[1]", seg1);
r.set("region.face[1].outerCycle.segment[2]", seg2);
r.set("region.face[1].outerCycle.segment[3]", seg3);

The first line of the code shows how the type system implementer can create a region object based on the specified type structure specification. The second line sets the region label. The third and fifth lines create the first face and its outer cycle as intermediate components, and the fourth line sets the face label. The last three lines store the three segments seg1, seg2, and seg3 as components of the outer cycle.

6.4 Intelligent Binary Large Objects (iBLOBs)

In this section, we present the conceptual framework for a new database data type called iBLOB, for Intelligent Binary Large Object. This type enhances the functionality of traditional binary large objects (BLOBs) in database systems. Our concept also helps to solve the generality, abstraction and update problems (described in Section 7.1) that are exhibited by current approaches (see Section 2.1) to manage large application objects. BLOBs currently serve as the only means to store large objects in DBMSs. However, they do not preserve the structure of application objects and do not provide access, update and query functionality for the sub-components of large objects. iBLOBs extend traditional BLOBs by preserving the object structure internally and providing application-friendly access interfaces to the object components. All this is achieved while maintaining low-level access to data and extending existing database systems using object-oriented constructs and abstract data types (ADTs).

The iBLOB framework consists of two main sections called the structure index and the sequence index (Figure 6-4). The first section contains the structure index, which helps us represent the object structure as well as the base data. The second
section contains the sequence index that dictates the sequential organization of object fragments and preserves it under updates. Since the underlying storage structure of an iBLOB is provided through a BLOB, which is available in most DBMSs, the iBLOB data type can be registered as a user-defined data type and be used in SQL.

Figure 6-4. Illustration of an iBLOB object consisting of a structure index and a sequence index.

6.4.1 Structure Index: Preserving Structure in Unstructured Storage

A structure index is a mechanism that allows an arbitrary hierarchical structure to be represented and stored in an unstructured storage medium. It consists of two components for, first, the representation of the structure of the data and, second, the actual data themselves. The structural component is used as a reference to access the data's structural hierarchy. The mechanism is not intended to enforce constraints on the data within it; thus, it has no knowledge of the semantics of the data upon which it is imposed.

This concept considers hierarchically structured objects as consisting of a number of variable-length sub-objects where each sub-object can either be a structured object or a base object. Within each structured object, its sub-objects reside in sequentially numbered slots. The leaves of the structure hierarchy contain base objects.

To illustrate the concept of a structure index, we show an example of how to store a spatial region object with a specific structure in a database. A region data type may be described by a hierarchical structure as shown in Figure 6-3(a). Consider a region made up of several faces. If we needed to access the 50th face of a region object using a traditional BLOB storage mechanism, we would have to load and sequentially traverse the entire BLOB until the desired face was found. Further, since the face objects can be of variable length, the location of the 50th face cannot be easily computed
without extra support built into the BLOB. In order to avoid an undesirable sequential traversal of the BLOB, we introduce the notion of offsets to describe structure. Each hierarchical level of a structure in a structure index stored in a BLOB is made up of two components (corresponding to the two components of the general structure index described above). The first component contains offsets that represent the location of specific sub-objects. The second component represents the sub-objects themselves. We define offsets to have a fixed size; thus, the location of the ith face can be directly determined by first calculating the location of the ith offset and then reading the offset to find the location of the face. Figure 6-5 shows a structured object with internal offsets.

Figure 6-5. A structured object consisting of n sub-objects and n internal offsets.

The recursive nature of hierarchical structures allows us to generalize the above description. Each sub-object can itself have a structure like the region described above. Objects at the same level are not required to have the same structure; thus, at any given level it is possible to find both structured sub-objects and base objects (raw data). For example, we can extend the structure of a region object so that it is made up of a collection of faces, each of which contains an outer cycle and zero or more hole cycles, which in turn are made up of a collection of segments. Segments can be implemented as a pair of (x, y)-coordinate values. This example is illustrated in terms of structured and base objects in Figure 6-6, where the top-level object represents a region with an information part, a label, and one of its face sub-objects.

Figure 6-6. A structured object consisting of a base object and structured sub-objects.
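The fixed-size offset idea can be illustrated with a small C++ sketch of a single hierarchical level. The byte layout used here (a 32-bit count followed by n 32-bit offsets in native byte order, then the sub-object bytes) is an assumption made for the example and is not the layout of the actual iBLOB implementation; the names Bytes, packLevel, and subObject are likewise hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Assumed layout of one level (illustrative only):
//   [count : u32][offset_0 : u32] ... [offset_{n-1} : u32][sub-object bytes ...]
// Each offset is the position of a sub-object measured from the buffer start.

void putU32(Bytes& buf, std::size_t at, std::uint32_t v) {
    std::memcpy(buf.data() + at, &v, sizeof v);
}

std::uint32_t getU32(const Bytes& buf, std::size_t at) {
    std::uint32_t v;
    std::memcpy(&v, buf.data() + at, sizeof v);
    return v;
}

// Packs variable-length sub-objects behind a table of fixed-size offsets.
Bytes packLevel(const std::vector<std::string>& subObjects) {
    const std::uint32_t n = static_cast<std::uint32_t>(subObjects.size());
    Bytes buf(sizeof(std::uint32_t) * (1 + n));  // header: count plus n offsets
    putU32(buf, 0, n);
    for (std::uint32_t i = 0; i < n; ++i) {
        const std::uint32_t here = static_cast<std::uint32_t>(buf.size());
        putU32(buf, sizeof(std::uint32_t) * (1 + i), here);
        buf.insert(buf.end(), subObjects[i].begin(), subObjects[i].end());
    }
    return buf;
}

// Returns the i-th sub-object (0-based). Only the i-th offset entry (and the
// next one, to find the end) is read; earlier sub-objects are never scanned.
std::string subObject(const Bytes& buf, std::uint32_t i) {
    assert(buf.size() >= sizeof(std::uint32_t));
    const std::uint32_t n = getU32(buf, 0);
    if (i >= n) throw std::out_of_range("no such sub-object");
    const std::size_t begin = getU32(buf, sizeof(std::uint32_t) * (1 + i));
    const std::size_t end =
        (i + 1 < n) ? getU32(buf, sizeof(std::uint32_t) * (2 + i)) : buf.size();
    return std::string(buf.begin() + static_cast<std::ptrdiff_t>(begin),
                       buf.begin() + static_cast<std::ptrdiff_t>(end));
}
```

Because every offset entry has the same size, the position of the ith entry is a simple multiplication, which is what makes direct access to the ith sub-object possible. A nested structure would store another packed level as the payload of a sub-object.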

In general, a specific structure index implementation must be defined with respect to the underlying unstructured storage medium. Because we have to use BLOBs as the only alternative in a database context, we are forced to embed both the structure index and the data into a BLOB. Thus, using an offset structure embedded within the data itself is an appropriate solution. However, this may not be ideal in all cases. For instance, one could implement a structure index for data stored in flat files. In this case, the structure index and the data could be represented in separate files. In general, the structure index concept must be adapted to the capabilities of the user's desired storage medium for implementation.

6.4.2 Sequence Index: Tracking Data Order for Updates

Different DBMSs provide different implementations of the BLOB type with varied functionalities. However, most advanced BLOB implementations support three operations at the byte level, namely, random read and append (write bytes at end of BLOB), truncate (delete bytes at end) and overwrite (replace bytes with another block of bytes of the same or smaller length).

Figure 6-7. An out-of-order set of data blocks and their corresponding sequence index.

Structured large objects require the ability to update sub-objects within a structure. Specifically, they require random updates, which include insertion, deletion and the ability to replace data with new data of arbitrary size. Examples are the replacement of a segment by several segments in a cycle of a region object, or adding a new face. Given a large region object, updating it entirely for each change in a face, cycle or segment becomes very costly when stored in BLOBs (update problem). Thus, it is desirable to update only the part of the structure that needs updating. For this purpose, we present a novel sequence index concept that is based on the random read and data append
operations supported by BLOBs. Extra capabilities provided by higher-level BLOBs are a further improvement and serve for optimization purposes.

The sequence index concept is based on the idea of physically storing new data at the end of a BLOB and providing an index that preserves the logically correct order of data. Consequently, data will have internal fragmentation and will be physically stored out-of-order, as illustrated in Figure 6-7. In this figure, the data blocks (with start and end byte addresses represented by letters under each boundary) representing faces should be read in the order 1, 2, 3, 4, even though physically they are stored out-of-order in the BLOB (we will study the possible reasons shortly). By using an ordered list of physical byte address ranges, the sequence index specifies the order in which the data should be read for sequential access. The sequence index from Figure 6-7 indicates that the block [i...j] must be read first, followed by the block [l...m], etc.

Based on the general description of the sequence index given above, we now show how to apply it as a solution to the update problem. Assume that the data for a given structured object is initially stored sequentially in a BLOB, as shown in Figure 6-8. Suppose further that the user then makes an insertion at position k in the middle of the object. Instead of shifting data after position k within the BLOB to make room for the new data, we append it to the BLOB as block [j...l], as shown in Figure 6-9. By modifying the sequence index to reflect the insertion, we are able to locate the new data at its logical position in the object.

Figure 6-8. The initial in-order and de-fragmented data and sequence index.

Figure 6-9. A sequence index after inserting block [j...l] at position k.
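The insertion scenario of Figures 6-8 and 6-9 can be sketched in C++ as follows. This is a minimal model under stated assumptions, not the implementation described in this chapter: the BLOB is simulated by a std::string that only grows by appending, ranges are half-open for simplicity, and the names Range, SketchBlob, insertAt, and readLogical are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Half-open physical byte range [begin, end) inside the BLOB
// (a simplification chosen for this sketch).
struct Range {
    std::size_t begin;
    std::size_t end;
    std::size_t size() const {
        assert(end >= begin);
        return end - begin;
    }
};

struct SketchBlob {
    std::string bytes;       // stand-in for the BLOB; existing bytes never move
    std::vector<Range> seq;  // sequence index: ranges in logical read order
};

// Inserts data at logical position pos: the bytes are appended physically and
// only the sequence index is adjusted.
void insertAt(SketchBlob& b, std::size_t pos, const std::string& data) {
    if (data.empty()) return;
    const Range added{b.bytes.size(), b.bytes.size() + data.size()};

    // Skip all ranges that end at or before pos.
    std::size_t skipped = 0;  // logical bytes covered by ranges before index i
    std::size_t i = 0;
    while (i < b.seq.size() && skipped + b.seq[i].size() <= pos) {
        skipped += b.seq[i].size();
        ++i;
    }
    if (i == b.seq.size() && skipped != pos) {
        throw std::out_of_range("logical position past end of object");
    }

    b.bytes += data;  // physical append at the end of the BLOB

    if (i < b.seq.size() && pos > skipped) {
        // pos lies strictly inside range i: split it around the new block.
        const Range whole = b.seq[i];
        const std::size_t cut = whole.begin + (pos - skipped);
        b.seq[i] = Range{whole.begin, cut};
        b.seq.insert(b.seq.begin() + static_cast<std::ptrdiff_t>(i) + 1,
                     {added, Range{cut, whole.end}});
    } else {
        // pos is on a range boundary (or at the very end): no split needed.
        b.seq.insert(b.seq.begin() + static_cast<std::ptrdiff_t>(i), added);
    }
}

// Reads the object in logical order by following the sequence index.
std::string readLogical(const SketchBlob& b) {
    std::string out;
    for (const Range& r : b.seq) out.append(b.bytes, r.begin, r.size());
    return out;
}
```

For instance, inserting "xy" at logical position 3 of an object stored as "ABCDEF" leaves the physical bytes as "ABCDEFxy" while a logical read yields "ABCxyDEF"; the single initial range has become three ranges.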

Figure 6-10 illustrates the behavior of the sequence index when a block is intended to be deleted from the structured object. Even though there is no new data to append to the BLOB, the sequence index must be updated to reflect the new logical sequence. Because the BLOB does not actually allow for the deletion of data, the sequence index is modified in order to prevent access to the deleted block [m...n] of data. This can result in internal fragmentation of data in the iBLOB, which can be managed using a special resequence operation shown later in the iBLOB interface.

Figure 6-10. A sequence index after deleting block [m...n].

Finally, Figure 6-11 illustrates the case of an update where the values of a block of data [o...p] as a portion of block [j...l] are replaced with values from a new block [l...q]. For this kind of update, it is possible for the new set of values to generate a block size different from that of the original block being replaced.

Figure 6-11. A sequence index after replacing block [o...p] by block [l...q].

iBLOBs enhance BLOBs by providing support for truncate and overwrite operations at the higher component level of an application object's structure. The truncate operation in BLOB (delete bytes at end) is enhanced in iBLOB with a remove function which can perform deletion of components at any location (beginning, middle or end of the structure), as shown in Figure 6-10. The overwrite operation in BLOB (replace byte array with another of same length) is enhanced in iBLOB with a combination of remove and insert functions and sequence index adjustments, to perform the overwrite of components with other components of different sizes, as shown in Figure 6-11.
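Deletion as an index-only operation can be sketched as follows. The sketch is written in C++ under our own assumptions: ranges are half-open, and the names Range, logicalLength, and removeLogical are hypothetical rather than taken from the iBLOB implementation. No BLOB bytes are touched; the function only returns a sequence index in which the deleted logical bytes are no longer reachable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open physical byte range [begin, end) (assumption of this sketch).
struct Range {
    std::size_t begin;
    std::size_t end;
};

// Number of logical bytes reachable through the sequence index.
std::size_t logicalLength(const std::vector<Range>& seq) {
    std::size_t total = 0;
    for (const Range& r : seq) {
        assert(r.end >= r.begin);
        total += r.end - r.begin;
    }
    return total;
}

// Removes count logical bytes starting at logical position pos. Ranges that
// overlap the deleted interval are trimmed or split; ranges fully inside it
// are dropped. The physical data is left in place.
std::vector<Range> removeLogical(const std::vector<Range>& seq,
                                 std::size_t pos, std::size_t count) {
    if (count == 0) return seq;
    const std::size_t cutBegin = pos;
    const std::size_t cutEnd = pos + count;
    std::vector<Range> out;
    std::size_t lo = 0;  // logical position of the first byte of the current range
    for (const Range& r : seq) {
        const std::size_t hi = lo + (r.end - r.begin);
        if (lo < cutBegin) {
            // The part of this range before the deleted interval survives.
            out.push_back({r.begin, r.begin + (std::min(hi, cutBegin) - lo)});
        }
        if (hi > cutEnd) {
            // The part of this range after the deleted interval survives.
            out.push_back({r.begin + (std::max(lo, cutEnd) - lo), r.end});
        }
        lo = hi;
    }
    return out;
}
```

A replacement with data of a different size can then be modeled as such a removal followed by an append-based insertion at the same logical position, which matches the remove and insert combination described above.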

create: → iBLOB
create: Storage → iBLOB
create: iBLOB → iBLOB
copy: iBLOB × iBLOB → iBLOB
locateiBLOB: iBLOB → Locator
locate: iBLOB × Locator × Int → Locator
getStream: iBLOB × Locator → Stream
insert: iBLOB × data × Int × Locator × Int → iBLOB
insert: iBLOB × iBLOB × Locator × Int → iBLOB
remove: iBLOB × Locator × Int → iBLOB
append: iBLOB × data × Int × Locator → iBLOB
append: iBLOB × iBLOB × Locator → iBLOB
length: iBLOB × Locator → Int
count: iBLOB × Locator → Int
resequence: iBLOB → iBLOB

Figure 6-12. The standardized iBLOB interface.

6.4.3 The iBLOB Interface

In this section, we present a generic interface for constructing, retrieving and manipulating iBLOBs. Within this iBLOB interface, we assume the existence of the following data types: the primitive type Int for representing integers, Storage as a storage structure handle type (e.g., blob handle, file descriptor, etc.), Locator as a reference type for an iBLOB or any of its sub-objects, Stream as an output channel for reading byte blocks of arbitrary size from an iBLOB object or any of its sub-objects, and data as a representation of a base object. Figure 6-12 lists the operations offered by the interface. We use the term l-referenced object to indicate the object that is referred to by a given locator l. The following descriptions for these operations are organized by their functionality:

Construction and Duplication: An iBLOB object can be constructed in three different ways. The first constructor create() creates an empty iBLOB object. The second constructor create(sh) constructs an iBLOB object from a specific storage structure handle sh such as a BLOB object handle or a file descriptor. The third constructor create(s) is a copy constructor and builds a new
iBLOB object from an existing iBLOB object s. Similarly, an iBLOB object s2 can also be copied into another iBLOB object s1 by using the copy(s1, s2) operator.

Internal Reference: In order to provide access to an internal sub-object of an iBLOB object, we need a way to obtain the reference of such a sub-object. The sub-object referencing process must start from the topmost hierarchical level of the iBLOB object s, whose locator l is provided by the operator locateiBLOB(s). From this locator l, a next-level sub-object can be referenced by its slot i in the operator locate(s, l, i).

Read and Write: Since iBLOBs support large objects which may not fit into main memory, we provide a stream-based mechanism through the operator getStream(s, l) to consecutively read arbitrarily sized data from any l-referenced object. The stream obtained from this operator behaves similarly to a common file output stream. Other than reading data, the interface allows insertion of either a base object d of specified size z through the operator insert(s, d, z, l, i) or an entire iBLOB object s1 through the operator insert(s, s1, l, i) into any l-referenced object at a specified slot i. A base object d, as in the operator append(s, d, z, l), or an iBLOB object s1, as in the operator append(s, s1, l), can be appended to an l-referenced object. This is effectively the same as inserting the input as the last sub-object of the referenced object. The operator remove(s, l, i) removes the sub-object at slot i from the parent component with Locator l.
Properties and Maintenance: The actual size of an l-referenced object is obtained by using the operator length(s, l), while the number of sub-objects of the object is provided by the operator count(s, l). Finally, the operator resequence(s) reorganizes and defragments the iBLOB object s, collapsing its sequence index such that it contains a single range. This operation effectively synchronizes the physical and logical representations of the iBLOB object and minimizes the storage space.

To test the functionality, we have implemented the iBLOB data type in Oracle Database 11g Release 2, IBM Informix (IBM, the IBM logo, and Informix are registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide) and PostgreSQL using the object-oriented extensions and programming API of each DBMS. Each operator in the TSS interface can be implemented using the corresponding iBLOB interface operators. For example, to implement get(region.face[1].outerCycle.segment[1]), we first use locateiBLOB to
get a Locator to the iBLOB, then use locate() repeatedly to move across levels and navigate to the required component (i.e., the first segment), and finally getStream() to retrieve the first segment of the outerCycle in that face. Other TSS interface functions like set, baseObjectCount and subObjectCount can also be implemented in a similar manner.

6.5 Implementation of Multi-Structured iBLOBs

This section describes our implementation concept for iBLOBs to handle multi-structured objects. Multiple structures are often needed for complex objects, such as spatial data. Consider for example the representation of a point object as a collection of (lat, lon) values that provides its primary structure. Note that a complex point object at any instant can include a collection of point objects, described as a multi-point geometry type in the (OpenGIS Consortium: Reference Model 2011) geometry types, with individual geometries represented by a latitude, longitude value pair. An additional representation, called the secondary structure, can then be built over this structure as the index for this point cloud. This could be, for example, a lexicographical ordering of the coordinate values corresponding to the single complex point object for efficient retrieval.

Another interesting application for such multiple structures is in the case of region or polygon objects. Often the internal representation of a region object does not match the ordering required for executing a plane-sweep algorithm to implement spatial operators such as the intersection of two regions, which has the signature intersection: region × region → region (Guting and Schneider 1995; Schneider 1997). Each region is stored as containing faces (single-component regions) with their outer cycles stored in counter-clockwise ordering of segments. A face can contain zero or more hole cycles. These are often represented by a clockwise ordering of segment lists. This constitutes the region's primary structure. A secondary structure for this region object, represented as the lexicographical ordering of segment coordinates, can then be incorporated into the
structure as a spatial index, as shown in Figure 6-13, for the application of spatial operations.

Figure 6-13. The multi-structured hierarchical structure of a complex region or multi-polygon object.

The concept of multiple views on the same data object is essential for complex structured data and deserves additional discussion of its requirements and of its implementation as abstract data types in databases. The different views of a structured object reflect different arrangements of its elements. Since a base element only contains pure data, it is an atomic element with a specific, fixed view of the data. Thus, only the arrangement of the base elements in an object affects how the object is viewed. To enable an object to have multiple structures or views of the same data and to avoid the redundancy and update consistency problems that may occur when storing the same data more than once in the object, reference links are used to refer to components already existing elsewhere in the object.

In order to store multiple representations of objects, we extend the basic TSS grammar from Section 6.3 by introducing a Reference Object (RO). A reference object (RO) is defined as a pointer object that references another existing structured object (SO), base object (BO) or reference object (RO). A reference object is conceptually similar to a programming language pointer in that it contains information on how to locate other objects. We use a notation and string representation similar to those used to define basic iBLOBs. Because structural elements impose a hierarchical structure on the object, we are able to represent them as parentheses that hierarchically subdivide a string of base
elements and reference elements. A base element is again represented as a single character b. A reference element is similar to a base element in that it contains data (in this case the data is information that points to another element in the object) and cannot contain sub-elements. Therefore, we simply represent a reference element as a single character r. For example, the structure of the iBLOB in Figure 6-14 can be represented as (((rrr)(rr))(bbbbrr)((r(rr))r(bb))). The new syntax for structure expressions is now defined as below.

Terminal Set S = {:=, ⟨, ⟩, |, [, ], SO, BO, RO, :}

Expression ::= TAG := ⟨TAG | TAGLIST⟩+ ;
TAGLIST ::= TAG[ ]
TAG ::= ⟨NAME : TYPE⟩
TYPE ::= ⟨SO | BO | RO⟩
NAME ::= IDENTIFIER

Figure 6-14. A multi-structured object with three different structures and reference links denoted by shaded circles and arrows (a) and the corresponding directed graph representation with references as dashed arrows (b).

Like a base element, reference elements may not have sub-elements; thus, the structure of an iBLOB is hierarchical. However, because reference elements can point to other elements in the structured object, a multi-structured object is able to model data with directed graph structure, as opposed to standard structured objects that can only model hierarchical objects. Figure 6-14(a) illustrates an iBLOB with one structure representing a sequential view of its base elements and two other structures
representing two different hierarchical views of the same base elements. Figure 6-14(b) shows a directed graph representation of the same multi-structured iBLOB. For the example shown in Figure 6-13, the new TSS grammar can be defined with the secondary structures for segment references (segmentRef) as below.

⟨region : SO⟩ := ⟨regionLabel : BO⟩ ⟨face : SO⟩[ ] ⟨index : SO⟩;
⟨face : SO⟩ := ⟨faceLabel : BO⟩ ⟨outerCycle : SO⟩;
⟨index : SO⟩ := ⟨indexLabel : BO⟩ ⟨segmentRef : RO⟩[ ];
⟨outerCycle : SO⟩ := ⟨segment : BO⟩[ ];
⟨holeCycle : SO⟩ := ⟨segment : BO⟩[ ];
⟨segmentRef : RO⟩ := &⟨segment : BO⟩;

Figure 6-15. Example of iBLOB internal representation.

Now, we continue to illustrate how an iBLOB can be used to represent a region spatial data type. As mentioned earlier, conceptually, objects of type region are composed of faces. Each face is made up of one or more polygons. The first polygon represents the outer boundary of the face while the remaining polygons represent holes within the face. Logically, all polygons representing holes in a face must lie inside the polygon that represents the outer boundary. Each polygon is represented by a sequence of segments. This hierarchy of components is regarded as the structural composition of regions. For algorithmic purposes, it is favorable to represent the whole region as a single ordered sequence of segments. This ordering is independent of the structural composition; thus, inferring the structure from the overall sequence of segments is not
trivial. As a result, the multi-structured representation in iBLOBs is a good option for storing regions. A single region is represented by its ordered sequence of segments stored in one view of the iBLOB hierarchy (which we call the primary structure view), and its structural composition is stored in another view as a secondary structure, allowing the lowest elements in this view's hierarchy to reference the segments located in the primary structure. Figure 6-16(a) illustrates a region with two faces, one with a hole and the other with no holes. Figure 6-16(b) shows the multi-structured representation of that region.

Figure 6-16. (a) An example region object with two faces and one hole. The numbers are used to identify each segment. (b) A sample multi-structured object representation of this region. Dashed lines indicate references. P, S, O, H, F, and S# denote the primary structure, the secondary structure, outer cycles, holes, faces, and segments, respectively.

In Figure 6-16(b), the separation of the primary structure view and the secondary structure of a multi-structured region representation is illustrated. The primary structure is composed of a single level containing the actual ordered sequence of segments that make up the region in Figure 6-16(a). The secondary structure branch has all the levels needed to represent the structural composition of a region. At the bottom of the hierarchy, instead of storing the actual segments that make up the polygons, references to the segments in the primary structure branch are stored.

At the implementation level, we propose a new database type named iBLOB that integrates the concepts of structure index and sequence index. This data type is intended as the implementation basis for constructing algebras with complex,
multi-structured application data. An iBLOB object is general enough to store any structured object irrespective of its depth and the complexity of its hierarchical structure. When the iBLOB data type is made available in object database management systems (OODBMSs), new emerging database applications can store their specialized objects in iBLOBs (which themselves reside within BLOBs) and at the same time take advantage of the advanced features these systems provide (e.g., concurrency and transaction control, multi-user access, recoverability, robustness, etc.).

We now describe our implementation of iBLOBs using the conceptual structure developed earlier in this chapter. In order to provide a general implementation of iBLOBs, our architecture includes a generic BLOB adapter (GBA). The GBA consists of the following abstracted application programming interface (API) on top of the available database BLOB API.

class GenericBLOBInterface {
public:
    // This variable stores the blob locator. To support the LOB locators
    // of multiple databases, the type is void*.
    void *locator;
    // This variable stores the connection pointer.
    void *connect;
    // The constructor takes the LOB locator on which to perform BLOB
    // operations using this interface.
    GenericBLOBInterface(void *loc, void *conn);
    // Reads a specific portion of the BLOB.
    int read(unsigned char *buffer, int start, int len);
    // (Over)writes a specific portion of the BLOB.
    int write(unsigned char *buffer, int start, int len);
    // Appends data to the BLOB.
    int append(unsigned char *buffer);
    // Returns the length of the BLOB.
    int length();
    // Sets the BLOB length to the size provided.
    void truncate(int size);
    // Sets the BLOB to empty.
    void setEmpty();
};

We have written separate implementations for Oracle, Informix and Postgres databases based on this generic API to demonstrate how the GBA can be used to connect to any available DBMS with BLOB functionality. For example, in the Oracle implementation we include the header file occi.h, which provides the required programming functionality for Oracle database LOBs with the `Blob' type using the Oracle C++ Call Interface (OCCI). For the Informix database, we include the header file it.h and use the corresponding LOB locator with the `ITLargeObject' type. Other databases can be connected by providing a similar implementation for the GBA interface.

Next, using this GBA API and the appropriate implementation, we build the structure and sequence indexes, which are the basic components of iBLOBs. By demonstrating the iBLOB implementation at the most generic level, we ensure that all iBLOB functionality can be provided under any major DBMS. Thus the implementation of iBLOBs at higher levels of LOB capabilities (such as when insertion of larger-sized sub-components inside a BLOB is available) only allows for increased performance and reduced storage overhead.
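To make the role of the adapter concrete, the following C++ sketch shows a purely in-memory stand-in that offers the same set of operations as the GBA. It is not one of the Oracle, Informix, or Postgres implementations mentioned above. The class name InMemoryBlob is hypothetical, and two signatures deviate from the listing by assumption: append takes an explicit length, and input buffers are const.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical in-memory stand-in with the GBA operation set, useful for
// exercising index code without a database connection.
class InMemoryBlob {
public:
    // Copies up to len bytes starting at start into buffer; returns bytes copied.
    int read(unsigned char* buffer, int start, int len) const {
        assert(buffer != nullptr);
        if (start < 0 || len <= 0 || start >= length()) return 0;
        const int n = std::min(len, length() - start);
        std::memcpy(buffer, bytes_.data() + start, static_cast<std::size_t>(n));
        return n;
    }

    // Overwrites len existing bytes at start. The BLOB never grows here:
    // a write that would run past the end is rejected and returns 0.
    int write(const unsigned char* buffer, int start, int len) {
        assert(buffer != nullptr);
        if (start < 0 || len <= 0 || start + len > length()) return 0;
        std::memcpy(bytes_.data() + start, buffer, static_cast<std::size_t>(len));
        return len;
    }

    // Appends len bytes at the end (explicit length is an assumption of this sketch).
    int append(const unsigned char* buffer, int len) {
        assert(buffer != nullptr);
        if (len <= 0) return 0;
        bytes_.insert(bytes_.end(), buffer, buffer + len);
        return len;
    }

    // Returns the current length in bytes.
    int length() const { return static_cast<int>(bytes_.size()); }

    // Shrinks the BLOB to size bytes; larger or negative sizes are ignored.
    void truncate(int size) {
        if (size >= 0 && size < length()) bytes_.resize(static_cast<std::size_t>(size));
    }

    // Removes all bytes.
    void setEmpty() { bytes_.clear(); }

private:
    std::vector<unsigned char> bytes_;
};
```

Restricting write to existing bytes mirrors the byte-level overwrite capability discussed in Section 6.4.2, so that any growth has to go through append, which is the operation the sequence index relies on.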


The iBLOB interface is implemented in C++ to take advantage of its capabilities as a system programming language as well as the fact that most DBMSs provide C++ accessibility libraries. Our iBLOB implementation is provided for the Oracle DBMS version 11g Release 2. Oracle provides level 3 LOBs (with truncate and update functionality); however, we can ignore the Oracle LOB operations available at the top two levels for demonstration. In Section 6.5.1, we describe our sequence index implementation. Then, in Section 6.5.2, we present our structure index implementation and the details of its integration into the iBLOB datatype implementation.

6.5.1 Sequence Index Implementation

The purpose of the sequence index is to provide a logically sequential view of data physically stored in arbitrary order in the LOB. To allow this, we represent the sequence index as an array of address ranges. For our implementation, the start of each range is represented by the address of the first byte in the range, and the end of each range is represented by the address of the last byte in the range. The sequence index is designed as a main-memory array that is persisted on disk. In order to facilitate efficient access and manipulation of the sequence index and the underlying iBLOB, the sequence index is loaded into main memory upon access to a particular iBLOB.

Locating data in the iBLOB is done by using the sequence index to map logical byte positions to physical addresses. For example, consider Figure 6-7. Assume we have read block 1 and are now at position j. To retrieve the next logical block (in this case block 2), we must find the physical address of the logical byte position immediately following block 1 (position j + (1 logical byte position)). Note that in this particular example, this physical address is l. The sequence index provides this mapping through the location operations described in the interface below.

When an update operation is performed on an iBLOB object, the sequence index is modified to represent a logical shift in the data sequence. To delete a block of data, we remove any references within the sequence index to physical addresses in the block.


Note that removing data from the sequence index does not necessarily imply a physical deletion of the referenced data. To reflect the insertion of new data in the sequence index, references to the new block of data must be inserted into the sequence index. Either a new range is inserted into the sequence index, or an existing range is extended to include the new block. Replacing a data block with a new data block amounts to a deletion followed by an insertion. Figure 6-17 presents the interface offered by the sequence index. Let SeqIdx be the type of all sequence indexes.

    create:                                  → SeqIdx
    create:     range[]                      → SeqIdx
    fromBack:   SeqIdx × Int                 → Int
    fromFront:  SeqIdx × Int                 → Int
    fromPos:    SeqIdx × Int × Int           → Int
    insert:     SeqIdx × Int × Int × Int     → SeqIdx
    append:     SeqIdx × Int × Int           → SeqIdx
    rmBySize:   SeqIdx × Int × Int           → SeqIdx
    rmByEnd:    SeqIdx × Int × Int           → SeqIdx
    replace:    SeqIdx × Int × Int × Int     → SeqIdx
    byteCount:  SeqIdx × Int × Int           → Int

Figure 6-17. Interface for iBLOB sequence index

The sequence index operations span the following categories:

Constructor: A sequence index can be constructed with create(), which generates an empty sequence index with no ranges. When constructed with create(d), it creates a sequence index loaded with the address ranges provided in the array d.

Locate: Data is located by traversing the sequence index in three ways. A call to fromBack(si, b) finds the physical address that is located b (logical) bytes from the end of the byte sequence given by the sequence index si. Similarly, fromFront(si, f) finds the address located f (logical) bytes from the beginning of the sequence index si. Function fromPos(si, p) works the same way as fromFront, but instead of starting from the beginning, it starts from a given location p.


Insert: Operator insert(si, l, p, s) modifies the logical sequence by inserting s bytes of data stored at physical location p at the position where address l is found in the sequence index si. The operation append(si, p, s) appends s bytes of data stored at physical location p to the end of the logical sequence.

Remove: Data is logically removed by using the sequence index in one of two ways. The operator rmBySize(si, s, l) allows the removal of l bytes from start address s in sequence index si. Notice how this removal is logical and can span several ranges of the sequence index. The operator rmByEnd(si, s, e) removes all bytes between the start address s and end address e in sequence index si. Note that such a removal can span several logical address ranges.

Replace: Operator replace(si, l, p, s) allows the replacement of s bytes at location l with s bytes stored at location p.

Object Byte Counter: Operator byteCount(si, s, e) provides the number of bytes accessible by the sequence index between s and e, where s and e can span multiple byte ranges.

6.5.2 Structure Index Implementation

The implementation of our structure index works by maintaining, for each object, a set of fixed-size offsets to the address where each of its sub-objects ends. We purposely define offsets to point to end positions of sub-objects because the beginning of the first sub-object can be calculated by locating the end of the last offset. On the other hand, the end position of the last sub-object cannot be calculated in general; thus, it must be stored. Each offset value is determined by the physical location where the offset is written, and the physical location where the sub-object it represents ends. So if at location x we write the offset for a sub-object that ends at location y, the value of that offset will be $y - x$. Conceptually, an offset is a pointer to the end of an object. For this reason, we can locate the beginning of that sub-object by computing the end position of its (logically) previous sub-object (or in the case of the first sub-object, the end position of the last offset).

6.5.3 iBLOB Implementation

The iBLOB datatype implementation unifies the functionality of the sequence index and the structure index. In addition to the data required for the structure index (offsets) and sequence index (address ranges), the iBLOB requires each object to include its number of contained sub-objects, which we denote the count, and a logical object boundary, implemented as a dummy byte. The count is located immediately following the dummy byte at the end of an object. Figure 6-18 illustrates the location of the count and the dummy bytes at different levels of the hierarchy.

A basic implementation requirement for iBLOBs is the ability to find and refer to specific sub-objects within an iBLOB object. We define the type Locator as an object handle type for an iBLOB and its sub-objects. An instance of the Locator type (which we denote a locator) represents an object but does not contain any of the object's data. In our implementation, a locator is made up of the physical address where an object starts, the physical address where an object ends, and an object's count. If the object is at the lowest level, the number of sub-objects is set to 0.

Figure 6-18. A sample instance of an iBLOB. Note that the inner offset structure of the sentences in between brackets is not shown in order to improve clarity. The bytes immediately after these sentences symbolize their dummy bytes and their sub-object counts.

Recall that offsets refer to physical byte positions instead of logical byte positions. Offsets are implemented in this way because if offsets were to store logical addresses, every data update in the iBLOB would cause all offsets logically forward of the update to become obsolete due to the logical shift of data. In the case of physical offsets, this problem only appears if the last sub-object of an object is updated, causing the physical end of the object to change, which would render that object's offset obsolete. This problem is solved by the existence of the dummy byte marking the logical end of an object, which is pointed to by the object's offset. The dummy byte is only removed if the object it corresponds to is removed. As a consequence, the offset always points to a physical location that exists and can be found through the sequence index.

Based on the previous ideas, the internal reference functions listed in Section 6.4.3 are implemented. We provide the details of the implementation of each of the iBLOB interface functions as follows:

Constructor: To create an iBLOB based on a LOB handle with create, we load the sequence index, which is located at the end of the LOB. This is possible because we store the size of the sequence index as the last piece of data of the whole LOB.

Internal Reference: To generate a locator for the iBLOB in locateiBLOB, we first locate its end address by moving q bytes from the end of the iBLOB, where q is equal to the sum of the size of the iBLOB's count and the size of the dummy byte. This is done with the sequence index function fromBack. Similarly, its start address is retrieved with a call to the sequence index operation fromFront(0). The locator of the sub-object at slot k, given a locator l to its parent object, is generated using the operation locate. To locate the end of the object we must find its offset, which is located $k \cdot s_{off}$ bytes from the start of the parent (where $s_{off}$ represents the number of bytes used by each offset). To find the start of the sub-object when k = 0 (i.e., it is the first sub-object) we can simply calculate $count \cdot s_{off}$, where count is the number of sub-objects of the parent. For all other sub-objects, their start can be calculated by finding the offset for the previous sub-object ($k - 1$) and logically moving (from the end of the previous sub-object) q bytes.

Write: First, the data corresponding to the new sub-object is appended to the end of the LOB. These data include the count, the dummy byte and the offset of the new object, in addition to the incremented count of its parent. Once this is done, the new data is logically relocated by modifying the sequence index and then writing the new sequence index into the LOB. Similarly, when a sub-object is removed, the only data to be written into the LOB is the decremented count of the parent object. The removed sub-object is only removed through the sequence index but it still remains physically stored in the LOB.
Properties and Maintenance: Resequencing an iBLOB with operation resequence entails modifying the physical locations of the objects. As a consequence, the resequencing process must generate new offsets. By recursively resequencing each hierarchical level, the resequence operation is able to rewrite the iBLOB with all data in-order and without internal fragmentation.
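The sequence index of Section 6.5.1, on which all of the functions above rely, can be sketched as a vector of inclusive physical address ranges. The sketch below is illustrative only and is not the dissertation's code: the class and method names are hypothetical, and insertAt and removeAt take a logical byte position rather than the address arguments of insert and rmBySize, which keeps the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One address range: physical addresses of its first and last byte.
struct Range {
    long first;
    long last;  // inclusive
    long size() const { return last - first + 1; }
};

// Hypothetical, simplified sequence index.
class SequenceIndex {
public:
    // Appends s bytes stored at physical address p to the logical end,
    // extending the last range when the new block is contiguous with it.
    void append(long p, long s) {
        assert(s > 0);
        if (!ranges_.empty() && ranges_.back().last + 1 == p) {
            ranges_.back().last += s;
        } else {
            ranges_.push_back({p, p + s - 1});
        }
    }

    // Physical address of the byte f logical bytes from the front,
    // or -1 when f lies past the logical end.
    long fromFront(long f) const {
        assert(f >= 0);
        for (const Range& r : ranges_) {
            if (f < r.size()) return r.first + f;
            f -= r.size();
        }
        return -1;
    }

    // Inserts s bytes stored at physical address p before logical
    // position pos, splitting an existing range when necessary.
    void insertAt(long pos, long p, long s) {
        assert(pos >= 0 && s > 0);
        const Range added{p, p + s - 1};
        for (auto it = ranges_.begin(); it != ranges_.end(); ++it) {
            if (pos == 0) { ranges_.insert(it, added); return; }
            if (pos < it->size()) {
                const Range tail{it->first + pos, it->last};
                it->last = it->first + pos - 1;       // shrink to the head part
                it = ranges_.insert(it + 1, tail);    // tail follows the head
                ranges_.insert(it, added);            // new block goes between
                return;
            }
            pos -= it->size();
        }
        assert(pos == 0);  // only the logical end is left as a valid position
        ranges_.push_back(added);
    }

    // Logically removes len bytes starting at logical position pos.
    // The bytes stay in the LOB but are no longer referenced.
    void removeAt(long pos, long len) {
        assert(pos >= 0 && len >= 0);
        std::vector<Range> kept;
        long cursor = 0;  // logical position of the current range's first byte
        for (const Range& r : ranges_) {
            const long lo = cursor, hi = cursor + r.size();
            const long cutLo = std::max(lo, pos);
            const long cutHi = std::min(hi, pos + len);
            if (cutLo >= cutHi) {
                kept.push_back(r);  // range not affected by the removal
            } else {
                if (cutLo > lo) kept.push_back({r.first, r.first + (cutLo - lo) - 1});
                if (cutHi < hi) kept.push_back({r.first + (cutHi - lo), r.last});
            }
            cursor = hi;
        }
        ranges_.swap(kept);
    }

    // Total number of bytes reachable through the index.
    long byteCount() const {
        long total = 0;
        for (const Range& r : ranges_) total += r.size();
        return total;
    }

    std::size_t rangeCount() const { return ranges_.size(); }

private:
    std::vector<Range> ranges_;
};
```

Note how a single insertion in the middle of a range turns one range into three, which is the growth pattern the later storage analysis treats as the worst case.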


In addition to these basic functions, we provide statistical functions to determine the fragmentation count for iBLOBs (when large objects are inserted continuously), the object count and size for sub-objects at specific levels in the hierarchy, and the overall LOB storage size. We also implement a batch mode in which updates made to an iBLOB are collected and executed in a batch sequence so that the number of updates on the internal LOB is reduced. Finally, using this implementation, we implement the type system specification for spatial data types. Consider the following TSS grammar we have developed for a spatial region object with the following characteristics, optionally given as input as an XML file.


The features of this TSS XML specification are as follows. Every XML node refers to an element in the TSS. TSS elements can be one of three types: SO, which denotes a structured object (inner node in the hierarchy); BO, which denotes a base object (leaf node in the hierarchy); and RO, which denotes a reference object, which can point to any other structured, base or reference object in the hierarchy. Additionally, we also include LIST versions of each of the above. This is indicated by the list tag attribute in the XML file. If list is set to true, then we include a placeholder object (in the iBLOB) and then insert its sub-objects. All elements of a list shall be of the same type (either an SO, BO or RO). A Boolean return value called primary indicates which of the parent-level SOs is the primary structure. There is only one primary structure in any TSS. Every other structure must be secondary and defined using reference pointers to the primary structure's nodes. Optionally, every secondary structure can be identified by a structure tag which has a value of type integer from 0 (primary) to n (secondary) structures.

Other important properties of this TSS XML schema definition are as follows.

1. The root element will be called "TSS".
2. The name of the type system shall be defined in the "name" tag of the TSS.
3. All other elements inside the TSS shall be either SO, BO, or RO, referring to structured objects, base objects or reference objects respectively.
4. At every level of the hierarchy, elements will have a unique name tag. Names can be used multiple times at different hierarchical levels. For example, there can be only "segments" defined inside "outerCycle" and later inside "innerCycle" elements.
5. BO elements will always have a "type" tag that refers to the type of the base object. These must be one of the types supported at the iBLOB level. Currently, this includes binary, int, double, string, intArray, and doubleArray.
6. The explicit "type" tag is not included for SO objects. [See note III below]. The type of an SO object is a combination of its internal nodes as defined by the user. For example, the type of an outerCycle is each of which is a doubleArray. The value of base objects is maintained by the user of the type system. Base objects can be as simple as an unstructured base value or as complex as a structured object which the type system user knows how to parse and use. TSS and iBLOB shall allow such flexibility to users.
7. RO elements will always have a "type" tag that refers to the type of the reference object. These must be one of the types supported at the iBLOB level. Currently, this includes binary, int, double, string, intArray, and doubleArray. The value of the RO objects is maintained by the user of the type system.
8. SO objects shall have an open and close "SO" XML element. BO and RO objects are "empty XML elements" indicated by a tag, since the type of the base object is one of the pre-defined types referring to the iBLOB level.

Given this specification, an example path query for the complex hierarchical region object can then be represented as `Region.Shape.Faces[0].InnerCycle[0].PointList[0]', which retrieves the first inner cycle from the first face of a region object. Thus, the type system user can construct intuitive queries to retrieve parts or the whole of complex structured objects based on the implementation provided by TSS and iBLOBs. In the next section, we experimentally evaluate this internal representation for complex structured objects in terms of storage and performance efficiency when compared to other object storage organizations in XML and BLOB inside databases.

6.6 Evaluation of iBLOBs

In this section, we investigate the performance of an iBLOB in comparison to the performance of a LOB storing a structured object. In our experiments, as illustrated below, iBLOBs performed better than BLOBs and XML datatypes in databases for constructing, retrieving and querying parts of large hierarchical objects. The broad spectrum of available strategies for storing and retrieving complex hierarchical objects is illustrated in Figure 6-19. We quantify the advantages that iBLOB provides with respect to performance in comparison with a non-iBLOB implementation of an algebra extension requiring LOBs or the database XML type. We also compare the storage cost of implementing iBLOBs against the storage costs of the non-iBLOB algebra implementation. Our detailed implementation architecture for using TSS and iBLOBs as a generic data access interface (GDAI) is shown in Figure 6-20.

Figure 6-19. Available strategies for managing complex application objects in databases.

To make a valid comparison, we make the following assumptions about DBMS LOBs. First, in our analysis, we ignore the amount of bytes that must be read in order to locate objects within iBLOBs and LOBs. Available Object-Relational DBMSs (ORDBMSs) are highly optimized for read operations. Therefore, reading large amounts of data is generally not a problem in comparison to updating data. To remain in compliance with the update problem (inserting a larger binary object within a LOB), we assume the LOB does not include a mechanism to allow random updates; however, we assume that the DBMS in which the LOB resides provides the highest level of LOB handling capabilities (random data overwrite, truncate, append and random read). We do not assume that iBLOBs are implemented at a particular level of LOB functionality,


rather, we test iBLOBs at each LOB functionality level. In summary, we are testing the highest possible functionality LOB against iBLOBs based on each level of LOB functionality. Note that these assumptions do not indicate that a particular DBMS be used. Instead, the comparison is valid for all LOB implementations that provide the highest level of LOB handling capabilities. This is first a simpler test based on our conceptual design and using a basic concept of a hierarchical book with chapters containing sentences, paragraphs, words, characters and punctuations. Later, we perform a detailed implementation using a spatial region example and measure the query performance of large, structured spatial objects using iBLOBs.

Figure 6-20. Detailed architecture for TSS and iBLOB implementation

We identify two critical measures that gauge the storage performance of iBLOBs versus LOBs. The first is the number of bytes written through the life of a structured large object. This measure takes into consideration the number of bytes required to create an initial object, plus any bytes that must be written as a result of updates to the object. This measure quantifies the required workload for performing updates on a structured large object. The second measure quantifies the number of bytes actually stored in the iBLOB or LOB after a series of updates has been executed. This measure is used to understand the storage overhead introduced when using iBLOBs.

Table 6-1. Measures used to quantify iBLOB performance

    iBLOB level 1
        Bytes stored:   $T(N_T) = (b + x)v + by + S_a(N_T) + N_B y + N_T (v + w)$
        Bytes written:  $Q(N_T) = (b + x)v + by + S_a(N_T) + N_B y + N_T (w + v)$
    iBLOB level 2
        Bytes stored:   $T(N_T) = (b + x)v + by + S_b(N_T) + N_B y + N_T (v + w)$
        Bytes written:  as for level 1
    iBLOB level 3
        Bytes stored:   $T(N_T) = (b + x)v + by + S_b(N_T) + N_B y + N_T (v + w) - N_B a$
        Bytes written:  as for level 1
    LOB
        Bytes stored:   $T(N_T) = y (b + N_B)$
        Bytes written:  $Q(N_T) = Q_B(N_B)$, with $Q_B(i) = Q_B(i - 1) + y + Q_B(i - 1)/2$ and $Q_B(0) = by$

    Symbols used
        b = number of base objects
        x = number of non-base objects
        y = average number of bytes per base object
        v = overhead per object
        w = overhead for updating an object
        a = byte reuse rate
        $N_T$ = total number of updates
        $N_B$ = number of base object updates
        $N_X$ = number of non-base object updates
        $S_a$, $S_b$ = number of bytes to write the sequence index

The two measures we test are quantified in the formulas shown in Table 6-1. We begin by summarizing the notation used in these formulas. Each formula calculates a number of bytes that must be written or stored; Q is a function that yields the total sum of bytes written to a LOB or iBLOB during $N_T$ updates, and T is a function that yields the total number of bytes stored in a LOB or iBLOB after $N_T$ updates. $Q_B$ is a recursive function that calculates the number of bytes that must be written in a LOB to insert a new object. $N_B$ is the total number of updates involving base objects (objects at the lowest levels of the structural hierarchy), and $N_X$ is the total number of updates involving non-base objects. For example, adding an empty chapter to a book requires the insertion of a non-base object, while adding a word to a book requires adding a base object. We calculate these values by introducing a constant d which represents the fraction of the total updates that are base object updates. $N_B$ and $N_X$ are then calculated as follows:

    $N_B = \lfloor d N_T \rfloor$
    $N_X = N_T - N_B$

We assume that each LOB and iBLOB that we test will already contain an initial object. The value of b represents the original number of base objects in the LOB or iBLOB, and x represents the original number of non-base objects. We define y to be the average size of a base object in units of bytes. v represents the number of bytes of overhead required to represent structural information for each object in an iBLOB. In our implementation, the overhead for an object consists of the offset for that object (4 bytes), the boundary byte for that object (1 byte), and the integer indicating the number of sub-objects within that object (4 bytes); thus, v = 9. In a secondary implementation, we used a dummy boundary object of size 4 bytes, but this will be handled in the spatial implementation, in the forthcoming sections. We consider the bytes of the original object to be physically in-order; therefore, the size of the original object in an iBLOB is determined by calculating the overhead required for all objects plus the number of base object bytes plus the size of a minimal (single range) sequence index: $v(b + x) + by + 8$.
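To make the notation concrete, the iBLOB side of Table 6-1 can be evaluated with a few small functions. This is a sketch under stated assumptions, not part of the original evaluation: the struct and function names are hypothetical, the parameter values used in the note after the code are made up for illustration, and the two sequence index terms $S_a$ and $S_b$ follow the worst-case formulas that are introduced a few paragraphs further on in this section.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Symbols follow Table 6-1; all names here are illustrative.
struct ModelParams {
    std::int64_t b;  // initial number of base objects
    std::int64_t x;  // initial number of non-base objects
    std::int64_t y;  // average number of bytes per base object
    std::int64_t v;  // structural overhead per object (9 bytes in the text)
    std::int64_t w;  // overhead for updating an object (4 bytes in the text)
    double d;        // fraction of all updates that are base object updates
};

// N_B = floor(d * N_T)
std::int64_t baseUpdates(const ModelParams& p, std::int64_t nt) {
    assert(nt >= 0 && p.d >= 0.0 && p.d <= 1.0);
    return static_cast<std::int64_t>(std::floor(p.d * static_cast<double>(nt)));
}

// N_X = N_T - N_B
std::int64_t nonBaseUpdates(const ModelParams& p, std::int64_t nt) {
    return nt - baseUpdates(p, nt);
}

// Size of the original, physically in-order iBLOB: v(b + x) + by + 8.
std::int64_t initialIblobSize(const ModelParams& p) {
    return p.v * (p.b + p.x) + p.b * p.y + 8;
}

// S_a(N_T) = 32 * (0 + 1 + ... + N_T) + 8 * (N_T + 1):
// sum of every sequence index written over N_T worst-case updates.
std::int64_t seqIndexBytesWritten(std::int64_t nt) {
    return 32 * (nt * (nt + 1) / 2) + 8 * (nt + 1);
}

// S_b(N_T) = 32 * N_T + 8: size of the final sequence index.
std::int64_t seqIndexFinalSize(std::int64_t nt) {
    return 32 * nt + 8;
}

// Q(N_T) for an iBLOB (identical for all three iBLOB levels).
std::int64_t iblobBytesWritten(const ModelParams& p, std::int64_t nt) {
    return (p.b + p.x) * p.v + p.b * p.y + seqIndexBytesWritten(nt)
         + baseUpdates(p, nt) * p.y + nt * (p.v + p.w);
}

// T(N_T) for a level 2 iBLOB, where old sequence indexes are truncated.
std::int64_t iblobLevel2BytesStored(const ModelParams& p, std::int64_t nt) {
    return (p.b + p.x) * p.v + p.b * p.y + seqIndexFinalSize(nt)
         + baseUpdates(p, nt) * p.y + nt * (p.v + p.w);
}
```

With the made-up inputs b = 1000, x = 100, y = 20, v = 9, w = 4, d = 0.75 and $N_T$ = 10, these functions give $N_B$ = 7, $N_X$ = 3, an initial iBLOB size of 29,908 bytes, 32,018 bytes written, and 30,498 bytes stored at level 2. The gap between the last two numbers is entirely due to the superseded sequence indexes.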


The original object size for an object in a LOB is equal to the number of base objects multiplied by the average base object size: by. In order to calculate the number of bytes that must be written to an iBLOB for a given number of updates, we must also consider the overhead of updating an object. For instance, when we add a chapter to a book, in addition to adding the new chapter and its structural overhead, the count of the book object must be modified to indicate that it now has another chapter. We introduce w to indicate this overhead, and for our experiments, set w = 4 (size of count). Thus, we can calculate the number of bytes required to perform $N_T$ updates as the number of updates of base objects multiplied by the average size of a base object, plus the total number of updates multiplied by the sum of the overhead for an object and the overhead for updating an object: $y N_B + N_T (v + w)$. If the iBLOB is implemented on a level 3 LOB, then updates replacing one object with a new object can reuse the bytes in which the old object was stored. Thus, the resulting iBLOB will have a length of the size of the object minus the number of bytes that can be overwritten. To model this, we introduce the byte reuse rate constant a.

Finally, we provide two formulas to calculate the size of the sequence index over a series of updates. The first formula, $S_a$, calculates the number of sequence index bytes that must be written throughout a series of $N_T$ updates. After each update, a new sequence index is written, and this formula calculates the sum of all of these sequence indexes. For any given update, the sequence index can grow by 32 bytes in the worst case. This case occurs when a new object is inserted such that its offset and data each break a sequence index range into three ranges, effectively generating four extra ranges in the sequence index. Not every update results in worst case growth; however, we will assume a worst case growth pattern for iBLOB updates. A sequence index representing an object with no updates has a size of 8 bytes. The second formula, $S_b$, calculates the size of the final sequence index once $N_T$ updates have been completed. The formulas for both are as follows:

    $S_a(N_T) = 32 \sum_{i=0}^{N_T} i + 8 (N_T + 1)$
    $S_b(N_T) = 32 N_T + 8$

Using the above notation and the formulas in Table 6-1, we calculate the performance behavior of iBLOBs and LOBs regarding storing and updating structured large objects. In Figure 6-21(a), we first examine the number of bytes that must be written to an iBLOB or LOB over the course of a series of updates. Note that the graph does not represent the final amount of bytes stored after the updates have occurred, but the number of bytes written to make all the updates. We plot the Y-axis of this graph on a logarithmic scale. Recall that after each update, the sequence index is written to the end of the iBLOB. Because we are measuring bytes written, the fact that some LOB capability levels allow the overwriting of data has no effect. If 5 bytes are written to an iBLOB, and 3 bytes overwrite previously used space, 5 bytes have still been written. Thus, in Table 6-1, only a single formula is given for all three iBLOB levels for this measure. The formula calculates the size of the original iBLOB, then adds the amount of bytes required to write the objects involved in the update, and finally calculates the number of bytes required to write the sequence index to the iBLOB after each update.

Calculating this metric for a LOB is more difficult. Recall the update problem associated with LOBs. In order to insert a new object within the LOB, we must physically shift the data that is positioned after the new object to make room for the new object. Similarly, to delete objects or replace objects with different sized objects, we must shift data as well. For this measure, we assume that objects are updated in the middle of the LOB on average. Thus, to insert a new object, we must write the new object, plus its overhead, plus half of the current size of the LOB in order to shift data to make room for the new object. Rewriting half of the LOB for every update requires the amount of data written to the LOB to increase greatly as the number of updates increases (note the logarithmic scale of the Y axis in Figure 6-21). This measure emphasizes the impact of the update problem associated with LOBs. Because iBLOBs overcome the update problem, making updates requires only that the new object be appended and the sequence index rewritten. Therefore, dramatically fewer bytes are written when updating iBLOBs versus LOBs.

Figure 6-21. Number of bytes written on updates (a), and size of the underlying LOB object (b)

While using iBLOBs instead of LOBs for storing structured objects alleviates the update problem, iBLOBs do have a greater storage overhead than LOBs. In Figure 6-21(b), we graph the amount of data stored in an iBLOB or LOB after a user executes a series of updates. Because LOBs must store data in-order, the resulting LOB will always be smaller than the resulting iBLOB. This is due to the storage overhead required to store the sequence index in iBLOBs. In Table 6-1, we provide a separate formula for each level of iBLOB. Recall that level 1 iBLOBs do not allow any data to be truncated. Thus, after each update, the sequence index for the old iBLOB is not physically removed from storage; it is removed from the new sequence index so that it logically does not exist anymore. We denote this the worst case iBLOB, and the large overhead for storing the sequence indexes is evident in its graph. Level 2 iBLOBs allow the physical


truncation of data. Thus, when an update occurs, the old sequence index can be physically removed from the iBLOB and the new sequence index written in its place. By simply allowing truncation, the storage overhead of iBLOBs drops dramatically. The level 3 iBLOB allows some data to be overwritten when an object update is made. In general, we assume that the amount of bytes that can be reused is relatively low. For this graph, we assume that 20 percent of the updated bytes will overwrite existing bytes. As expected, the level 3 iBLOB has even lower storage overhead than the level 2 iBLOB. In the case where the size of base objects is fixed, a level 3 iBLOB would have the same storage overhead as the LOB implementation, plus 8 bytes for the single range sequence index. Although iBLOBs of all levels incur some storage overhead when storing structured objects as compared to LOBs, the performance they gain by alleviating the update problem shows that they are a viable alternative to storing structured objects in LOBs.

For a large spatial object, we then perform experiments to evaluate the storage and performance of iBLOBs when compared with naive BLOB storage for the same object and for XMLType data in the Oracle DBMS. The dataset used is the set of state boundaries for all the states in the continental USA in TIGER format, stored commonly as ESRI's shapefiles. The structure of these shapefiles is provided by ESRI and shown in Figure 6-22. We wrote parsers for these files to create the spatial object in memory and then convert them to BLOB storage, iBLOB storage and XMLType storage, as these are the most common formats available for storing complex objects in databases. Refer to Figure 6-19 for an illustration of available methodologies for storing complex objects in databases.

Figure 6-22. Hierarchical representation of a spatial geometry using ESRI shapefiles

For these three different formats of storage, we then evaluate the storage size for the polygon boundaries representing the various states of the continental USA. This is shown in the following graphs. Figure 6-23 provides the comparative storage sizes of the same polygon object in iBLOB, BLOB and XML storage. XML takes the largest amount of storage because of the high amount of structural metadata, including tag names, stored inside the tuple. For the multi-structured representation, we have used a 4 byte boundary (dummy) object to indicate the end of data in any sub-object. Thus, iBLOB has an overhead of 16 bytes for a base object (4 bytes for ObjectType, 4 bytes for Offsets, 4 bytes for count and 4 bytes for the dummy boundary object) and so it shows a higher storage requirement than the conventional BLOB approach. However, in iBLOB (as with XML) the object structure is stored inside the tuple as metadata and this can be used to directly access specific sub-components of the object. Note that we have not used the XML index, since the same applies for iBLOBs. Additionally, from a conceptual design standpoint, the storing of tag names is a major difference between XML and iBLOB in terms of storage. Since iBLOBs directly store the binary object in an internally serialized manner, we can place a pointer from the relevant object class defining the storage and recreate the object in memory. Thus, explicit serialization and deserialization functions are avoided in the iBLOB context. This presents a major improvement in functionality in using iBLOBs for developing new type system implementations.

Figure 6-24 illustrates the time required to create the spatial polygon objects using the iBLOB, BLOB and XML approaches respectively. The create time for XML and iBLOB, which store additional structural information, is higher due to the additional amount of disk reads and writes required for the increased storage shown in Figure 6-23. However, the iBLOB shows increased performance for object read, write and delete operations for large sized complex objects. Consider the comparison of sub-object read times shown in Figure 6-25. We have measured the read performance of each of the approaches for slicing through the object to retrieve one specific sub-component, in this case, the first ring of the polygon, often representing a County in that State. Clearly, the read performance of iBLOBs is much higher than that of conventional BLOB storage when the complex object size exceeds 2 MB. Moreover, iBLOBs perform better than the naive XML storage when the size exceeds 200 KB because of the increased overhead in XML to parse the hierarchy (without an index).

Figure 6-26 shows the comparative performance of iBLOBs with BLOBs and XML storage for inserting a new ring in the polygon object (representing new County information for a particular State). Again, iBLOBs perform better than both XML and BLOB storage for large sized objects. iBLOBs took less time than XML based sub-object inserts when the size of the base polygon exceeded 200 KB. Similarly, iBLOBs took less time for sub-object inserts than BLOB storage when the size of the base polygon exceeded 700 KB. Another important observation is that the time taken for inserts with iBLOBs remains almost constant once the base object size exceeds 2 MB. This advantage arises because we only perform appends in the structural index of the iBLOB instead of the conventional reformulation of the object required in BLOB storage.


Figure 6-23. Graph illustrating the comparative storage performance of iBLOB, BLOB and XML based in-database LOB stores for complex spatial objects


Figure 6-24. Graph illustrating the comparative performance of object construction for iBLOB, BLOB and XML based in-database storage


Figure 6-25. Graph illustrating the comparative performance of the sub-object read operation for iBLOB, BLOB and XML based in-database storage


Figure 6-26. Graph illustrating the comparative performance of the sub-object insert operation for iBLOB, BLOB and XML based in-database storage


Figure 6-27. Graph illustrating the comparative performance of the sub-object delete operation for iBLOB, BLOB and XML based in-database storage


Finally, Figure 6-27 evaluates the delete performance of the three modes of storage. Following the same strategy as above, we test the deletion of a particular County sub-object (represented by a ring) in the State (represented by a polygon). Deletion of sub-objects is most efficient in iBLOBs, mainly because deletion only requires the sub-object to be dereferenced from the structure index, which is achieved by simply deleting the memory chunk for that sub-object from the sequence index. For deletion, iBLOBs perform better than XML storage when the size exceeds 100 KB, and better than conventional BLOBs when the base object size exceeds approximately 500 KB. Thus, in terms of performance, iBLOBs exhibit reduced times for insertion, deletion and reads of parts of complex objects stored in databases when compared to both XML and BLOB storage. The slightly increased storage size of iBLOBs is easily offset by this improvement in performance for querying over large objects stored in Object-Relational DBMSs.

To conclude, in this chapter, we provide a novel solution to store and manage complex application objects (i.e., variable length, structured, hierarchical data) by introducing a new mechanism for handling structured objects inside DBMSs. This includes two major concepts. First, we present a type structure specification (TSS) that helps to describe the structure of complex application objects. Then we introduce a special SQL datatype called Intelligent Binary Large Object, or iBLOB, that enables the database to handle structured objects. We experimentally evaluate this strategy and compare it with XML and BLOB based complex object storage in databases and find higher performance for large sized multi-component, hierarchical objects. The combination of type structure specification and iBLOBs provides the necessary tools to easily implement type systems in databases. However, the focus of this chapter is to extend database functionality to natively support complex objects in an efficient manner.


CHAPTER 7
QUERYING FOR CARDINAL DIRECTION RELATIONS ON SPATIAL DATA USING THE BigCube APPROACH

In this chapter, we introduce a novel approach called the Objects Interaction Graticule (OIG) model (Viswanathan and Schneider 2011b) to determine the cardinal directions between moving, simple (single-component, hole-free) region objects. We also show how directional predicates can be derived from the cardinal directions and use them in MDX queries. This presents a novel application of a spatial operation for finding the qualitative cardinal direction relationships among complex objects in data warehouses.

7.1 Introduction

For more than a decade, data warehouses have been at the forefront of information technology applications as a way for organizations to effectively use information for business planning and decision making. The data warehouse contains data that give information about a particular, decision-making subject instead of about an organization's ongoing operations (subject-oriented). Data is gathered into the data warehouse from a variety of sources and then merged into a coherent whole (integrated). All the data in a data warehouse can be identified with a particular time period (time-variant). Data is periodically added to a data warehouse but is hardly ever removed (non-volatile). This enables the manager to gain a consistent picture of the business. Thus, the data warehouse is a large, subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process (Inmon 2005; Kimball and Ross 1996). Online Analytical Processing (OLAP) is the technology that helps perform complex analyses over the information stored in the data warehouse. Data warehouses and OLAP enable organizations to gather overall trends and discover new avenues for growth.

With the emergence of new applications in areas such as geo-spatial, sensor, multimedia and genome research, there is an explosion of complex, spatio-temporal data that needs to be properly managed and analyzed. This data is often complex (with hierarchical, multidimensional nature) and has spatio-temporal characteristics. A good framework to store, query and mine such datasets involves next-generation moving objects data warehouses that bring the best tools for data management to support complex, spatio-temporal datasets. The moving objects data warehouse (MODW) can be defined as a large, subject-oriented, integrated, time-variant, non-volatile collection of analytical, spatio-temporal data that is used to support the strategic decision-making process for an enterprise. Moving objects data warehouses help to analyze complex multidimensional geo-spatial data exhibiting temporal variations, and provide enterprise decision support.

Qualitative relations between spatial objects include cardinal direction relations, topological relations and approximate relations. Of these, cardinal directions have turned out to be very important due to their application in spatial wayfinding, qualitative spatial reasoning and in domains such as cognitive sciences, robotics, and GIS. In spatial databases and GIS they are frequently used as selection criteria in spatial queries. However, currently there is no available method to model and query for cardinal directions between moving objects (with a spatio-temporal variation).

An early approach to modeling data warehouses with support for several built-in datatypes is presented in (Viswanathan and Schneider 2010). We described a novel system to model cardinal directions between spatial regions in databases using the Objects Interaction Matrix (OIM) model in (Chen et al. 2010; Schneider et al. 2012). This model solves the problems found in existing direction relation models like the unequal treatment of the two spatial objects as arguments of a cardinal direction relation, the use of too coarse approximations of the two spatial operand objects in terms of single representative points or MBRs, the lacking property of converseness of the cardinal directions computed, the partial restriction and limited applicability to simple spatial objects only, and the computation of incorrect results in some cases. The basis of

the model was a bounded grid called the objects interaction grid, which helps to capture the information about the spatial objects that intersect each of its cells. Then, we used a matrix to store this information and applied an interpretation method to determine the cardinal direction between spatial objects. The Objects Interaction Matrix (OIM) approach serves as a basis for modeling and capturing direction relations among spatial data. In Section 7.2 we start by providing a description of the OIM approach.

Figure 7-1. Overview of the two phases of the Objects Interaction Matrix (OIM) model

7.2 Overview of the Objects Interaction Matrix (OIM) Model

In this section, we give a brief overview of our novel cardinal direction model, called the Objects Interaction Matrix (OIM) model. We emphasize its main features and sketch how it overcomes the weaknesses of current models and how it satisfies the requirements for evaluating cardinal directions over interacting spatial objects. The OIM model belongs to the tiling-based models and, more specifically, to the projection-based models. However, it is important to understand that it differs considerably from the Direction-Relation Matrix (DRM) model and is thus not its extension. For example, the DRM model subdivides the entire unbounded Euclidean space around a reference object, while the OIM model subdivides a closed subspace enclosing both operand objects.

Figure 7-1 shows its two-phase strategy for calculating the cardinal direction between two spatial objects A and B. In the following, we assume that A and B are non-empty values of the complex spatial data type region (Schneider 1997). In the first phase, called the tiling phase, we first determine the nine directional zones that belong to spatial region object A and then the nine directional zones that
belong to spatial region object B. We obtain two partitions of the Euclidean plane, and the simple but fundamental new idea is that both partitions are overlaid so that the region objects A and B interact with each other in the tiling process. The partition overlay generates a grid called the objects interaction grid (OIG) (Figure 7-2(a)). In contrast to all other tiling-based models, which have unbounded zones except for the central zone, our grid is closed and bounded. We achieve this by omitting all peripheral, unbounded zones. In our example, a $3 \times 3$ grid is generated (see the continuous segments in Figure 7-2(a)). The surrounding 16 unbounded grid cells (indicated by the dashed segments in Figure 7-2(a)) are irrelevant since neither A nor B can intersect them. More precisely, the area covered by the objects interaction grid, called the objects interaction grid space (OIG space), is given by the minimum and maximum x- and y-coordinates of the minimum bounding rectangles of A and B. We will see later that all other $n \times m$ grids with $1 \le n, m \le 3$ are also possible.

The partition overlay caters for an equal and symmetric treatment of the operand objects A and B. Concepts like reference object and target object do not exist in our model. An objects interaction grid tells us which object intersects which grid cell. This means that a grid cell or tile may be intersected by no region (coded by 0), by region A only (coded by 1), by region B only (coded by 2), or by both regions A and B (coded by 3). For each grid cell $t_{i,j}$ in the $i$th row and $j$th column ($1 \le i, j \le 3$), we store the coded information about the regions that intersect it in an objects interaction matrix (OIM) $M$ in cell $M_{i,j}$. That is, we abstract from the geometry of the OIG space and only keep the information about which region intersects which tile. We can do this since cardinal directions have a qualitative and not a quantitative nature. In our example, we obtain a $3 \times 3$ objects interaction matrix (Figure 7-2(b)).

In the second phase, called the interpretation phase, we leverage the objects interaction matrix to derive the cardinal direction between A and B. We call this step interpretation since such a matrix can be mapped to different models of basic cardinal
directions, i.e., it can be interpreted in different ways. In this article, we have already seen three different models and thus three different interpretations of basic cardinal directions, namely the four cardinal directions north, east, south, and west (Haar 1976), the four cardinal directions northwest, northeast, southwest, southeast (Frank), and the nine cardinal directions north, east, south, west, northwest, northeast, southwest, southeast, and origin (Goyal and Egenhofer 1997; Skiadopoulos and Koubarakis 2004). We will in this chapter confine ourselves to the latter cardinal direction model due to its popularity and high level of detail, and in order to make our approach comparable to the Direction-Relation Matrix model.

$$OIM(A,B) = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 2 \\ 0 & 1 & 0 \end{pmatrix}$$

Figure 7-2. The objects interaction grid OIG(A,B) for two complex region objects (a) and the derived objects interaction matrix OIM(A,B) (b)

We distinguish two steps in the interpretation phase. In the first step, we use an index pair $(i,j)$ with $1 \le i \le m$ and $1 \le j \le n$ to represent the location of the element $M_{i,j}$ in the $m \times n$ objects interaction matrix $M$. We apply a function $loc$ to each region and $M$ in order to determine at which locations we can find components of each region. For our example in Figure 7-2 we obtain $loc(A,M) = \{(2,2),(3,2)\}$ and $loc(B,M) = \{(1,1),(2,3)\}$. In the second step, we use a function $dir$ to determine the composite cardinal direction between A and B. We form all pairs of elements of $loc(A,M)$ and $loc(B,M)$ and determine the basic cardinal direction for each pair by applying a corresponding interpretation function. The composite cardinal direction between A and B is then equal to the union of all determined basic cardinal directions. The interpretation function determines the basic cardinal direction between any two
object components on the basis of their $(i,j)$-locations in the objects interaction matrix. The values of the interpretation function are stored in an interpretation table for a lookup in constant time. For our example in Figure 7-2, let us consider $(3,2) \in loc(A,M)$ and $(2,3) \in loc(B,M)$. The fact that a component of A is located in the cell (3,2) and a component of B is located in the cell (2,3) implies that the component of A must be southwest of the component of B. Overall, we obtain $dir(A,B) = \{SW, W, SE\}$ and, similarly, $dir(B,A) = \{NE, E, NW\}$.

Note that for determining the cardinal directions between the interacting spatial objects, the shapes of both region objects are taken into account. Further, both region objects are treated as equal partners. To a large extent, this coequal treatment contributes to assuring the property of converseness of cardinal directions in our approach. For two region objects A and B, we can use the same approach to compute $dir(A,B)$ and $dir(B,A)$. Further, we obtain the consistent result that $dir(A,B)$ is the inverse of $dir(B,A)$, and vice versa. If we have either $dir(A,B)$ or $dir(B,A)$, we can derive the inverse composite cardinal direction immediately in constant time as the union of the inverse basic cardinal directions. That is, if we know $dir(A,B)$, for example, we can immediately determine $dir(B,A)$ as $dir(B,A) = \{inv(d) \mid d \in dir(A,B)\}$, where the function $inv$ determines the inverse of each basic cardinal direction (e.g., $inv(W) = E$, $inv(NW) = SE$).

Now, we shall continue and present the novel Objects Interaction Graticule (OIG) system for modeling cardinal directions between moving objects and querying for such relations. We also introduce a moving objects data warehouse framework to help achieve this task. Moving objects are spatio-temporal objects that display a continuous evolution in their spatial location over time. Our method improves upon the OIM model by adding support for moving objects and provides an innovative approach to model cardinal direction relations inside data warehouses. In a first phase, we apply a multi-grid tiling strategy to determine the zones belonging to the nine cardinal
directions of each spatial object at a particular time and then intersect them. This leads to a collection of grids over time called the Objects Interaction Graticule. For each grid cell, the information about the spatial objects that intersect it is stored in an Objects Interaction Matrix. In the second phase, an interpretation method is applied to these matrices to determine the cardinal direction between the moving objects. These results are integrated into MDX queries using directional predicates.

Figure 7-3. Projection-based (a) and cone-shaped (b) models, and the Direction-Relation Matrix model with A as reference object (c) and with B as reference object (d)

In the next section, we provide a survey of existing techniques to model cardinal directions in general, and discuss their applicability to data warehouses and for modeling direction relations between moving objects. In Section 7.3, we introduce our moving objects data warehouse framework and the Objects Interaction Graticule model for modeling cardinal direction relations between moving objects. The tiling phase of the model (explained in Section 7.4) helps to generate the OIM matrix; its interpretation is achieved in Section 7.5. Section 7.6 provides direction predicates and MDX queries (Microsoft Corporation 2010) that illustrate cardinal direction querying using our model. Finally, Section 8.1 concludes the paper and provides some directions for future research.

7.3 The Objects Interaction Graticule (OIG) Approach for Modeling Cardinal Directions in Data Warehouses

The idea behind moving objects data warehouses (MODW) is to provide a system capable of representing moving entities in data warehouses and to be able to ask queries about them. Moving entities could be moving points such as people, animals, all
kinds of vehicles such as cars, trucks, airplanes, ships, etc., where usually only the time-dependent position in space is relevant, not the extent. However, moving entities with an extent, e.g., hurricanes, fires, oil spills, epidemic diseases, etc., could be characterized as moving regions. Such entities with a continuous, spatio-temporal variation (in position, extent or shape) are called moving objects. With a focus on cardinal direction relations, moving regions are more interesting because of the change in the relationship between their evolving extents over time. In this paper, we focus on simple (single-component, hole-free) moving regions and provide a novel approach to gather direction relations between such objects over time, using a data warehousing framework.

The moving objects data warehouse is defined by a conceptual cube with moving objects in the data dimensions (containing members) defining the structure of the cube, and its cells containing measure values that quantify real-world facts. The measures and members are instances of moving object data types (Güting et al. 2000). The BigCube (Viswanathan and Schneider 2010) is an example of a conceptual, user-centric data warehouse model that can be extended for MODW design. In this paper, our OIG model lies at the conceptual level and the predicates provide the means to implement the model using any logical approach (Vassiliadis and Sellis 1999). However, we shall provide MDX queries to help illustrate the versatility of the model in querying for direction relations between moving objects.

The goal of the OIG model is to enable a data warehouse user to query for cardinal directions between moving region objects. To achieve this goal, we need to take the various possible moving objects' configurations into account and model for direction relations in all of the cases to arrive at the overall direction relation. This is because the direction relation between two moving objects, between two queried time instances, can be arrived at only by considering all the direction relations between them during their lifetimes. The possible configurations between moving objects that we need to consider include the following. First, two objects could be at different spatial locations
at the same instant of time (dual object, spatial variation) as shown in Figure 2-6(a). Second, an object could be at two different spatial locations at two different instances of time (single object, spatio-temporal variation) as shown in Figure 2-6(b). Third, two objects could be at two different spatial locations at two different time instances (dual object, spatio-temporal variation) as shown in Figure 2-6(c). The dotted lines between the configurations of objects across time represent the flux in the intersection of the coordinate systems used in the space-time continuum. We include this in our model to be able to capture the locations of objects across the time extents. However, the dashed lines indicate the part not bounded by the objects interaction graticule (OIG). The OIG is a closed, bounded region and the dotted lines do not signify any holes in the spatio-temporal variation of the moving objects.

Figure 7-4. Overview of the two phases of the Objects Interaction Graticule (OIG) model

Figure 7-4 shows the two-phase strategy of our model for calculating the cardinal direction relations between two objects A and B at time instances $t_1$ and $t_2$. We assume that A and B (in the general case) are non-empty values of the complex spatio-temporal data type mregion (Güting et al. 2000). For computing the direction relation between two moving objects' snapshots, we need to consider all possible direction relations between the various combinations of objects in the interacting system. First, we consider the scenario at each snapshot $t_1$ and $t_2$, and also the case when $t_1 = t_2$. For these, we default to the OIM approach for directions between objects without temporal variation and gather the direction relations between them. These are given by $dir(A^{t_1}, B^{t_1})$ and $dir(A^{t_2}, B^{t_2})$. The second case arises if $t_1 \neq t_2$. Then five
more direction relations can be computed as shown in Figure 7-4. This includes four combinations for the two objects at $t_1$ and $t_2$, given by $dir(A^{t_1}, B^{t_2})$, $dir(A^{t_2}, B^{t_1})$, $dir(A^{t_1}, A^{t_2})$ and $dir(B^{t_1}, B^{t_2})$. Plus, we also relate the entire system (both objects) at each of the different time instances used to determine the query result. This is given by $dir(mbr(union(A^{t_1}, B^{t_1})), mbr(union(A^{t_2}, B^{t_2})))$. Using all these direction relations, we can now compute the moving direction relations between the two regions over time. Notice that, for clarity, we have used the notation $A^t$ instead of $A(t)$ to refer to the temporal development of the moving region A (A is actually defined by a continuous function $A: time \to region$). We will use this notation through the rest of the paper.

The tiling phase in Section 7.4 details our novel tiling strategy that produces the objects interaction graticule and shows how they are represented by objects interaction matrices. The interpretation phase in Section 7.5 leverages the objects interaction matrix to derive the directional relationship between two moving region objects.

7.4 The Tiling Phase: Representing Interactions of Objects with the Objects Interaction Graticule and Matrix

In this section, we describe the tiling phase of the model. The general idea of our tiling strategy is to superimpose a graticule called objects interaction graticule (OIG) on a configuration of two moving spatial objects (regions). Such a graticule is determined by four vertical and four horizontal partitioning lines of each object at available time instances. The four vertical (four horizontal) partitioning lines of an object are given as infinite extensions of the two vertical (two horizontal) segments of the object's minimum bounding rectangle at each of the two time instances. The partitioning lines of both objects create a partition of the Euclidean plane consisting of multiple mutually exclusive, directional tiles or zones. In the most general case, all partitioning lines are different from each other, and we obtain an overlay partition with central, bounded tiles and peripheral, unbounded tiles (indicated by the dashed segments in Figure 7-5(a)). The unbounded tiles do not
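contain any objects and therefore, we exclude them and obtain a graticule space that is a bounded proper subset of $\mathbb{R}^2$, as Definition 25 states.

$$OIM(A^{t_1}, B^{t_2}) = \begin{pmatrix} 1 & 0 & 2 \end{pmatrix} \qquad OIM(A^{t_2}, B^{t_1}) = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

$$OIM(mbr(union(A^{t_1}, B^{t_1})), mbr(union(A^{t_2}, B^{t_2}))) = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 0 & 2 \end{pmatrix}$$

$$OIM(A^{t_1}, A^{t_2}) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad OIM(B^{t_1}, B^{t_2}) = \begin{pmatrix} 0 & 0 & 2 \\ 0 & 0 & 0 \\ 2 & 0 & 0 \end{pmatrix}$$

Figure 7-5. The objects interaction graticule OIG(A,B) for the two region objects A and B in Figures 7-3c and 7-3d (a) and the derived objects interaction matrices (OIM) for OIG components described in Definition 27 (b)

Definition 25. Let $R = (A^{t_1}, B^{t_1}, A^{t_2}, B^{t_2})$, $R \in region$ with $A^{t_1} \neq \emptyset \wedge B^{t_1} \neq \emptyset \wedge A^{t_2} \neq \emptyset \wedge B^{t_2} \neq \emptyset$, and let $\min_x^r = \min\{x \mid (x,y) \in r\}$, $\max_x^r = \max\{x \mid (x,y) \in r\}$, $\min_y^r = \min\{y \mid (x,y) \in r\}$, and $\max_y^r = \max\{y \mid (x,y) \in r\}$ for $r \in \{A^{t_1}, B^{t_1}, A^{t_2}, B^{t_2}\}$. The objects interaction graticule space (OIGS) of $A^{t_1}$, $B^{t_1}$, $A^{t_2}$ and $B^{t_2}$ is given as:

$$\begin{aligned} OIGS(R) = \{(x,y) \in \mathbb{R}^2 \mid{} & \min(\min_x^{A^{t_1}}, \min_x^{B^{t_1}}, \min_x^{A^{t_2}}, \min_x^{B^{t_2}}) \le x \le \max(\max_x^{A^{t_1}}, \max_x^{B^{t_1}}, \max_x^{A^{t_2}}, \max_x^{B^{t_2}}) \\ {} \wedge {} & \min(\min_y^{A^{t_1}}, \min_y^{B^{t_1}}, \min_y^{A^{t_2}}, \min_y^{B^{t_2}}) \le y \le \max(\max_y^{A^{t_1}}, \max_y^{B^{t_1}}, \max_y^{A^{t_2}}, \max_y^{B^{t_2}})\} \end{aligned}$$

Definition 26 determines the bounded graticule formed as a part of the partitioning lines and superimposed on $OIGS(A^{t_1}, B^{t_1}, A^{t_2}, B^{t_2})$.

Definition 26. Let $seg$ be a function that constructs a segment between any two given points $p, q \in \mathbb{R}^2$, i.e., $seg(p,q) = \{t \mid t = (1-\lambda)p + \lambda q,\ 0 \le \lambda \le 1\}$. Let $H_r = \{seg((\min_x^r, \min_y^r), (\max_x^r, \min_y^r)),\ seg((\min_x^r, \max_y^r), (\max_x^r, \max_y^r))\}$ and $V_r = \{seg((\min_x^r, \min_y^r), (\min_x^r, \max_y^r)),\ seg((\max_x^r, \min_y^r), (\max_x^r, \max_y^r))\}$ for $r \in \{A^{t_1}, B^{t_1}, A^{t_2}, B^{t_2}\}$. We call the elements of $H_{A^{t_1}}$, $H_{B^{t_1}}$, $H_{A^{t_2}}$, $H_{B^{t_2}}$, $V_{A^{t_1}}$, $V_{B^{t_1}}$, $V_{A^{t_2}}$ and $V_{B^{t_2}}$ objects interaction graticule segments. Then, the objects interaction graticule (OIG) for A and B is given as: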
$$OIG(A^{t_1}, B^{t_1}, A^{t_2}, B^{t_2}) = H_{A^{t_1}} \cup V_{A^{t_1}} \cup H_{B^{t_1}} \cup V_{B^{t_1}} \cup H_{A^{t_2}} \cup V_{A^{t_2}} \cup H_{B^{t_2}} \cup V_{B^{t_2}}$$

In the OIG of an object, there are two constituent object interaction coordinate systems (OICS) for each temporal state of the moving objects. These are defined as follows:

$$OICoordS(A^{t_1}, B^{t_1}) = H_{A^{t_1}} \cup V_{A^{t_1}} \cup H_{B^{t_1}} \cup V_{B^{t_1}}, \text{ and}$$

$$OICoordS(A^{t_2}, B^{t_2}) = H_{A^{t_2}} \cup V_{A^{t_2}} \cup H_{B^{t_2}} \cup V_{B^{t_2}}.$$

The definition of OIG comprises the description of all graticules that can arise. In the most general case, if $t_1 = t_2$ and $H_{A^{t_1}} \cap H_{B^{t_1}} = \emptyset$ and $V_{A^{t_1}} \cap V_{B^{t_1}} = \emptyset$, we obtain a bounded $3 \times 3$ graticule similar to that for a non-temporal variation in the objects' configurations. Special cases arise if $H_{A^{t_1}} \cap H_{B^{t_1}} \neq \emptyset$ and/or $V_{A^{t_1}} \cap V_{B^{t_1}} \neq \emptyset$. Then equal graticule segments coincide in the union of all graticule segments. As a result, depending on the relative position of two objects to each other, the objects interaction graticule can be of different sizes. However, due to the non-empty property of a region object, not all graticule segments can coincide. This means that at least two horizontal graticule segments and at least two vertical graticule segments must be maintained. Definition 27 gives a formal characterization for the OIG.

Definition 27. An objects interaction graticule $OIG(A^{t_1}, B^{t_1}, A^{t_2}, B^{t_2})$ consists of two objects interaction coordinate systems, at $t_1$ and $t_2$, each of size $m \times n$, with $m, n \in \{1, 2, 3\}$, if $|H_A \cap H_B| = 3 - m$ and $|V_A \cap V_B| = 3 - n$. Further, it also consists of four objects interaction grids for each of the spatio-temporal combinations of the two moving objects and a fifth for the overall system. Together, these are called the objects interaction graticule components.

The objects interaction graticule partitions the objects interaction graticule space into objects interaction graticule tiles (zones, cells). Definition 28 provides their definition for each of the time instances uniquely, using the objects interaction coordinate systems.

Definition 28. Given $A^{t_1}, B^{t_1}, A^{t_2}, B^{t_2} \in region$ with $A^{t_1} \neq \emptyset \wedge B^{t_1} \neq \emptyset \wedge A^{t_2} \neq \emptyset \wedge B^{t_2} \neq \emptyset$, $OIGS(A^{t_1}, B^{t_1}, A^{t_2}, B^{t_2})$, and $OIG(A^{t_1}, B^{t_1}, A^{t_2}, B^{t_2})$, we define $c_H = |H_A \cup H_B| = |H_A| + |H_B| - |H_A \cap H_B|$ and $c_V$ correspondingly at a time instant. Let $H_{AB} = H_A \cup H_B = \{h_1, \ldots, h_{c_H}\}$ such that (i) $\forall\, 1 \le i \le c_H : h_i = seg((x_i^1, y_i), (x_i^2, y_i))$ with $x_i^1$
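$$\iota(A, B, t_{i,j}) = \begin{cases} 0 & \text{if } A^\circ \cap t_{i,j}^\circ = \emptyset \wedge B^\circ \cap t_{i,j}^\circ = \emptyset \\ 1 & \text{if } A^\circ \cap t_{i,j}^\circ \neq \emptyset \wedge B^\circ \cap t_{i,j}^\circ = \emptyset \\ 2 & \text{if } A^\circ \cap t_{i,j}^\circ = \emptyset \wedge B^\circ \cap t_{i,j}^\circ \neq \emptyset \\ 3 & \text{if } A^\circ \cap t_{i,j}^\circ \neq \emptyset \wedge B^\circ \cap t_{i,j}^\circ \neq \emptyset \end{cases}$$

We use the $mbr$ and $union$ functions for computing the minimum bounding rectangle and the spatial union of two objects, respectively. To support both objects interaction coordinate systems, we extend $\iota$ to accept $mbr(union(A^{t_1}, B^{t_1}))$ and $mbr(union(A^{t_2}, B^{t_2}))$ as operands. The operator $^\circ$ denotes the point-set topological interior operator and yields a region without its boundary. For each graticule cell $t_{i,j}$ in the $i$th row and $j$th column of an $m \times n$ graticule with $1 \le i \le m$ and $1 \le j \le n$, we store the coded information in an objects interaction matrix (OIM) in cell $OIM(A,B)_{i,j}$.

$$OIM(A,B) = \begin{pmatrix} \iota(A,B,t_{1,1}) & \iota(A,B,t_{1,2}) & \iota(A,B,t_{1,3}) \\ \iota(A,B,t_{2,1}) & \iota(A,B,t_{2,2}) & \iota(A,B,t_{2,3}) \\ \iota(A,B,t_{3,1}) & \iota(A,B,t_{3,2}) & \iota(A,B,t_{3,3}) \end{pmatrix}$$

7.5 The Interpretation Phase: Assigning Semantics to the Objects Interaction Matrix

The second phase of the OIG model is the interpretation phase. This phase takes an objects interaction matrix (OIM) obtained as the result of the tiling phase as input and uses it to generate a set of cardinal directions as output. This is achieved by separately identifying the locations of both objects in the objects interaction matrix and by pairwise interpreting these locations in terms of cardinal directions. The union of all these cardinal directions is the result. This phase is similar to the interpretation phase of the OIM model (Chen et al. 2010). We use an interpretation function to determine the basic cardinal direction between any two object components on the basis of their $(i,j)$-locations in the objects interaction matrix. The composite cardinal relation between A and B is then the union of all determined relations.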
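
The two interpretation steps can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the implementation used in this work: the names `loc`, `psi`, `direction` and `inv` are chosen here for illustration, the nine-direction model is assumed, row 1 is taken to be the northernmost row of the matrix, and the input is the example matrix of Section 7.2 (A in cells (2,2) and (3,2), B in cells (1,1) and (2,3)).

```python
from itertools import product

def loc(code, oim):
    """Return the 1-based (row, column) cells of `oim` occupied by one object.

    `code` is 1 for object A and 2 for object B; a cell coded 3 holds both.
    """
    return {(i, j)
            for i, row in enumerate(oim, start=1)
            for j, cell in enumerate(row, start=1)
            if cell in (code, 3)}

def psi(p, q):
    """Basic cardinal direction of cell p relative to cell q (row 1 = north)."""
    (i, j), (k, l) = p, q
    north_south = "N" if i < k else "S" if i > k else ""
    west_east = "W" if j < l else "E" if j > l else ""
    return (north_south + west_east) or "O"

def direction(oim):
    """Composite direction dir(A, B): union of psi over all component pairs."""
    return {psi(p, q) for p, q in product(loc(1, oim), loc(2, oim))}

# Inverse of each basic direction; the origin is assumed to be its own inverse.
INVERSE = {"N": "S", "S": "N", "W": "E", "E": "W",
           "NW": "SE", "SE": "NW", "NE": "SW", "SW": "NE", "O": "O"}

def inv(directions):
    """Inverse composite direction, e.g. dir(B, A) derived from dir(A, B)."""
    return {INVERSE[d] for d in directions}

oim_ab = [[2, 0, 0],
          [0, 1, 2],
          [0, 1, 0]]
dir_ab = direction(oim_ab)
print(sorted(dir_ab))        # ['SE', 'SW', 'W']
print(sorted(inv(dir_ab)))   # ['E', 'NE', 'NW']
```

The sketch evaluates `psi` with comparisons; a lookup table indexed by the two cell locations would serve the same purpose.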

In a first step, we define a function $loc$ (see Definition 30) that acts on one of the region objects A or B and their OIM and determines all locations of components of each object in the matrix for both temporal extents individually. Let $I_{m,n} = \{(i,j) \mid 1 \le i \le m,\ 1 \le j \le n\}$. We use an index pair $(i,j) \in I_{m,n}$ to represent the location of the element $M_{i,j} \in \{0, 1, 2, 3\}$ and thus the location of an object component from A or B in an $m \times n$ objects interaction matrix.

Definition 30. Let $M$ be the $m \times n$ objects interaction matrix of two region objects A and B. Then the function $loc$ is defined as:

$$loc(A, M) = \{(i,j) \mid 1 \le i \le m,\ 1 \le j \le n,\ M_{i,j} = 1 \vee M_{i,j} = 3\}$$

$$loc(B, M) = \{(i,j) \mid 1 \le i \le m,\ 1 \le j \le n,\ M_{i,j} = 2 \vee M_{i,j} = 3\}$$

In a second step, we define an interpretation function $\psi$ to determine the cardinal direction between any two object components of A and B on the basis of their locations in the objects interaction matrix. We use a popular model with the nine basic cardinal directions: north (N), northwest (NW), west (W), southwest (SW), south (S), southeast (SE), east (E), northeast (NE), and origin (O) to symbolize the possible cardinal directions between object components. A different set of basic cardinal directions would lead to a different interpretation function and hence to a different interpretation of index pairs. Definition 31 provides the interpretation function $\psi$ with the signature $\psi: I_{m,n} \times I_{m,n} \to CD$.

Definition 31. Given $(i,j), (i',j') \in I_{m,n}$, the interpretation function $\psi$ on the basis of the set $CD = \{N, NW, W, SW, S, SE, E, NE, O\}$ of basic cardinal directions is defined as
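$$\psi((i,j),(i',j')) = \begin{cases} N & \text{if } i < i' \wedge j = j' \\ NW & \text{if } i < i' \wedge j < j' \\ W & \text{if } i = i' \wedge j < j' \\ SW & \text{if } i > i' \wedge j < j' \\ S & \text{if } i > i' \wedge j = j' \\ SE & \text{if } i > i' \wedge j > j' \\ E & \text{if } i = i' \wedge j > j' \\ NE & \text{if } i < i' \wedge j > j' \\ O & \text{if } i = i' \wedge j = j' \end{cases}$$

The main difference compared to the OIM approach, however, is in the following third and final step. We temporally lift the $dir$ cardinal direction relation function to include objects over their temporal extents. Here, we specify the cardinal direction function named $mdir$ (moving-direction), which determines the composite moving cardinal direction for two moving region objects A and B. This function has the signature $mdir: region^{t_1} \times region^{t_2} \to 2^{CD}$ and yields a set of basic cardinal directions as its result. In order to compute the function $dir$, we first generalize the signature of our interpretation function $\psi$ to $\psi: 2^{I_{m,n}} \times 2^{I_{m,n}} \to 2^{CD}$ such that for any two sets $X, Y \subseteq I_{m,n}$ holds: $\psi(X, Y) = \{\psi((i,j),(i',j')) \mid (i,j) \in X,\ (i',j') \in Y\}$. We are now able to specify the cardinal direction function $mdir$ in Definition 32.

Definition 32. Let $A, B \in region$ and $dir(A,B) = \psi(loc(A, OIM(A,B)), loc(B, OIM(A,B)))$. Then the cardinal direction function $mdir$ is defined as

$$\begin{aligned} mdir(A^{t_1}, B^{t_1}, A^{t_2}, B^{t_2}) ={} & dir(A^{t_1}, B^{t_1}) \cup dir(A^{t_2}, B^{t_2}) \cup dir((A^{t_1}, B^{t_1}), (A^{t_2}, B^{t_2})) \\ & \cup\, dir(A^{t_1}, B^{t_2}) \cup dir(A^{t_2}, B^{t_1}) \cup dir(A^{t_1}, A^{t_2}) \cup dir(B^{t_1}, B^{t_2}) \end{aligned}$$

We apply this definition to our example in Figure 7-5. With $loc(A^{t_1}, OIM(A^{t_1}, B^{t_1})) = \{(1,1)\}$ and $loc(B^{t_1}, OIM(A^{t_1}, B^{t_1})) = \{(3,3)\}$, and so on, we obtain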
$$\begin{aligned} mdir(A^{t_1}, B^{t_1}, A^{t_2}, B^{t_2}) ={} & \{\psi((1,1),(3,3)),\ \psi((3,1),(1,3)),\ \psi(\{(1,1)\}, \{(1,3),(2,3)\}),\ \psi((1,1),(1,3)), \\ & \ \ \psi((3,1),(1,2)),\ \psi((1,1),(3,1)),\ \psi((3,1),(1,3))\} \\ ={} & \{NW, SW, W, SE\} \end{aligned}$$

Finally, we can say regarding Figure 7-5 that object A is partly northwest, partly southwest, partly west, and partly southeast of object B over the period from time $t_1$ to $t_2$. Each of the individual directions between the moving objects for the three possible configurations described in Section 7.3 can also be provided by using the results from each application of $dir$ that is used to finally arrive at the moving direction relations (given by $mdir$).

7.6 Directional Predicates for OLAP Querying

Based on the OIG model and the interpretation mechanism described in the previous sections, we can identify the cardinal directions between any given two moving region objects. To integrate the cardinal directions into moving object data warehouses as selection and join conditions in spatial queries, binary directional predicates need to be formally defined. For example, a query like "Find all hurricanes that affect states which lie strictly to the north of Florida" requires a directional predicate like strict_north as a selection condition of a spatial join.

The $mdir$ function, which produces the final moving cardinal directions between two complex region objects A and B across temporal variation, yields a subset of the set $CD = \{N, NW, W, SW, S, SE, E, NE, O\}$ of basic cardinal directions. As a result, a total number of $2^9 = 512$ cardinal directions can be identified. Therefore, at most 512 directional predicates can be defined to provide an exclusive and complete coverage of all possible directional relationships. We can assume that users will not be interested in such a large, overwhelming collection of detailed predicates since they will find it difficult to distinguish, remember and handle them. Instead we provide a mechanism for the user to define and maintain several levels of predicates for querying. As a first
step, in Definition 33, we propose nine existential directional predicates that ensure the existence of a particular basic cardinal direction between parts of two region objects A and B.

Definition 33. Let $R = (A^{t_1}, B^{t_1}, A^{t_2}, B^{t_2})$, $R \in region$. Then the existential directional predicate for north is defined as:

$$exists\_north(R) \equiv (N \in mdir(R))$$

Eight further existential direction predicates for S, E, W, O, NE, SE, NW, and SW are also defined correspondingly. Later, by using the $\neg$, $\vee$ and $\wedge$ operators, the user will be able to define any set of composite derived directional predicates from this set for their own applications. We shall provide two examples for these.

The first set of predicates is designed to handle similarly oriented directional predicates between two regions. Similarly oriented means that several cardinal directions facing the same general orientation belong to the same group. Definition 34 shows an example of northern by using the existential predicates.

Definition 34. Let $R = (A^{t_1}, B^{t_1}, A^{t_2}, B^{t_2})$, $R \in region$. Then northern is defined as:

$$northern(R) = exists\_north(R) \vee exists\_northwest(R) \vee exists\_northeast(R)$$

The other similarly oriented directional predicates southern, eastern, and western are defined in a similar way.

The second set of predicates is designed to handle strict directional predicates between two region objects. Strict means that two region objects are in exactly one basic cardinal direction to each other. Definition 35 shows an example of strict_north by using the existential predicates.

Definition 35. Let $R = (A^{t_1}, B^{t_1}, A^{t_2}, B^{t_2})$, $R \in region$. Then strict_north is defined as:

$$\begin{aligned} strict\_north(R) ={} & exists\_north(R) \wedge \neg exists\_south(R) \wedge \neg exists\_west(R) \wedge \neg exists\_east(R) \\ & \wedge \neg exists\_northwest(R) \wedge \neg exists\_northeast(R) \wedge \neg exists\_southwest(R) \\ & \wedge \neg exists\_southeast(R) \wedge \neg exists\_origin(R) \end{aligned}$$
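
The three levels of predicates in Definitions 33 to 35 can be illustrated with plain set operations. The following Python sketch is an illustration under stated assumptions rather than part of the BigCube implementation: it assumes that the result of mdir is available as a set of direction labels, and the helper names `exists`, `northern` and `strict_north` are hypothetical.

```python
# The nine basic cardinal directions of the set CD.
CD = frozenset({"N", "NW", "W", "SW", "S", "SE", "E", "NE", "O"})

def exists(direction, mdir_result):
    """Existential predicate (Definition 33): the direction occurs in the result."""
    if direction not in CD:
        raise ValueError(f"unknown basic cardinal direction: {direction}")
    return direction in mdir_result

def northern(mdir_result):
    """Similarly oriented predicate (Definition 34): N, NW or NE occurs."""
    return any(exists(d, mdir_result) for d in ("N", "NW", "NE"))

def strict_north(mdir_result):
    """Strict predicate (Definition 35): N occurs and no other direction does."""
    others = CD - {"N"}
    return exists("N", mdir_result) and not any(exists(d, mdir_result) for d in others)

sample = {"NW", "SW", "W", "SE"}   # an example mdir result set
print(exists("NW", sample), northern(sample), strict_north(sample))  # True True False
print(strict_north({"N"}))                                           # True
```

Building the derived predicates from a single `exists` helper mirrors the definitions, so that further user-defined predicates only need negation, disjunction and conjunction.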

The other strict directional predicates strict_south, strict_east, strict_west, strict_origin, strict_northeast, strict_northwest, strict_southeast, strict_southwest, strict_northern, strict_southern, strict_eastern, and strict_western are defined in a similar way.

We can now employ these predicates in MDX queries in the moving objects data warehouse. For example, assuming we are given a sample WeatherEvents cube (analogous to a spreadsheet table) with hurricane names (ordered in categories according to their intensity) from several years and containing geographic information, we can pose the following query: "Determine the names of hurricanes, ordered in categories according to their intensity, which had a path moving towards the east from their point of origin, and affected states strictly in the northern part of Florida, during the period from 2005 to 2009." The corresponding MDX query is as follows:

    SELECT
      { NONEMPTY Filter(
          { [Measures].[Hurricanes].[Category].MEMBERS },
          exists_east([Measures].[Hurricanes].CurrentMember, [Measures].[Hurricanes])) } ON ROWS,
      { [Date].[2005] : [Date].[2009] } ON COLUMNS,
      { [Geography].[Country].[State] } ON PAGES
    FROM Cube_WeatherEvents
    WHERE ( strict_northern([Geography].[Country].[State].MEMBERS, [Geography].[Country].[USA].[FL]) )

The result of this query yields Georgia and North Carolina with the various hurricanes and categories from 2005 to 2009. For example, we see that Kevin, Cindy and Katrina affected Georgia in 2005, and North Carolina had hurricanes Cindy and Katrina in 2005.

In conclusion, the BigCube model provides an extensible approach for specifying new complex operations on spatial data. Cardinal directions are an important qualitative relationship that exists between spatial objects and can be used in several applications
such as in Earth science research (for example, 'Find the changes in trajectory in terms of direction relations that occurred between hurricane Gustav and hurricane Hanna with the State of New York in 2010'), in environmental monitoring (for example, 'Find the direction of movement of the radioactive cloud from Fukushima prefecture in Japan during early March 2011') and several other spatial applications. The BigCube model provides a user-centric paradigm to develop and integrate such operators with the underlying spatial type system and to perform data analysis.

7.7 Observations

In this chapter, we introduced a novel approach called the Objects Interaction Graticule (OIG) model to determine the cardinal directions between moving, simple (single-component, hole-free) region objects. We also showed how different kinds of directional predicates can be derived from the cardinal directions and how these predicates can be employed in MDX queries using the moving objects data warehouse framework. In the future, we plan to extend our approach to include complex moving points, lines and other mixed combinations of moving object data types using an extensible underlying framework of moving objects database (MOD) (Schneider et al. 2010, 2011). Further work includes an efficient implementation of the moving objects data warehouse and the design of spatial reasoning techniques for direction relations using the objects interaction graticule model.
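
The two phases of the OIG model can be sketched end to end in Python. This is a simplified illustration and not the implementation of this work: each region snapshot is assumed to be an axis-aligned rectangle `(xmin, ymin, xmax, ymax)` with positive area, so that a region coincides with its minimum bounding rectangle; all function names and the sample coordinates are hypothetical; and the system-level relation is computed on the bounding rectangles of the unions, as described in Section 7.3.

```python
from itertools import product

def mbr_union(r, s):
    """Minimum bounding rectangle of the union of two rectangles."""
    return (min(r[0], s[0]), min(r[1], s[1]), max(r[2], s[2]), max(r[3], s[3]))

def oim(a, b):
    """Tiling phase: overlay the partitioning lines of a and b, code each tile.

    Codes: 0 = empty, 1 = a only, 2 = b only, 3 = both. Row 1 is northernmost.
    """
    xs = sorted({a[0], a[2], b[0], b[2]})
    ys = sorted({a[1], a[3], b[1], b[3]}, reverse=True)

    def covers(r, x_lo, x_hi, y_lo, y_hi):
        # True if the interiors of rectangle r and the tile intersect.
        return r[0] < x_hi and r[2] > x_lo and r[1] < y_hi and r[3] > y_lo

    return [[(1 if covers(a, x_lo, x_hi, y_lo, y_hi) else 0)
             + (2 if covers(b, x_lo, x_hi, y_lo, y_hi) else 0)
             for x_lo, x_hi in zip(xs, xs[1:])]
            for y_hi, y_lo in zip(ys, ys[1:])]

def psi(p, q):
    """Basic cardinal direction of cell p relative to cell q."""
    (i, j), (k, l) = p, q
    north_south = "N" if i < k else "S" if i > k else ""
    west_east = "W" if j < l else "E" if j > l else ""
    return (north_south + west_east) or "O"

def direction(a, b):
    """Interpretation phase: dir(a, b) from the matrix built by oim(a, b)."""
    cells = [(i, j, c)
             for i, row in enumerate(oim(a, b), start=1)
             for j, c in enumerate(row, start=1)]
    loc_a = [(i, j) for i, j, c in cells if c in (1, 3)]
    loc_b = [(i, j) for i, j, c in cells if c in (2, 3)]
    return {psi(p, q) for p, q in product(loc_a, loc_b)}

def mdir(a1, b1, a2, b2):
    """Union of the seven direction relations listed in Definition 32."""
    return (direction(a1, b1) | direction(a2, b2)
            | direction(mbr_union(a1, b1), mbr_union(a2, b2))
            | direction(a1, b2) | direction(a2, b1)
            | direction(a1, a2) | direction(b1, b2))

# Hypothetical snapshots: A moves south while B moves north.
a1, b1 = (0, 4, 1, 5), (4, 0, 5, 1)
a2, b2 = (0, 0, 1, 1), (4, 4, 5, 5)
print(oim(a1, b1))                    # [[1, 0, 0], [0, 0, 0], [0, 0, 2]]
print(sorted(mdir(a1, b1, a2, b2)))   # ['N', 'NW', 'O', 'S', 'SW', 'W']
```

Because the sketch follows Definition 32 literally, the result also contains the relations of each object to its own later snapshot (here N and S) and the relation between the two system-level bounding rectangles (here O).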

CHAPTER 8
CONCLUSIONS AND FUTURE WORK

8.1 Contributions

Existing multidimensional modeling techniques present a system-centric and logical view that is based on relational database design tools. In this paper, we present a generic metamodel for conceptual data warehouse design that helps to provide users with an abstract view of data suitable for multidimensional modeling and analysis. The BigCube model provides an extensible set of abstract multidimensional data types for data warehouse modeling. The cube view is intuitive to the cognitive understanding of multidimensional hierarchical data for analysts. After describing the multidimensional data types of the BigCube model and associated OLAP and aggregation operations, we present the Cube Analysis Language (CAL), which is a novel, user-level OLAP query language to handle data analysis using the abstract-level BigCube view. As a demonstration of functionality and application, we have extended the open-source Mondrian OLAP server with a CAL query compiler and performed translations from CAL queries to MDX and finally to SQL for implementation using a star, snowflake or galaxy schema database architecture. As an application of operations on complex spatial data in data warehouses, we also developed a new model called the Objects Interaction Graticule (OIG) to gather cardinal direction relations among objects with spatio-temporal variations. Additionally, in order to improve the database storage and query performance of large structured objects, we introduce a novel data structure called intelligent binary large object (iBLOB) that indexes and retrieves exact sub-components of complex objects, and outperforms XML or BLOB-based storage in experiments. Overall, this work presents a major improvement in spatial data warehousing by presenting a novel framework that provides a user-centric, conceptual view of the multidimensional data as a BigCube abstract data type (ADT), OLAP navigation and aggregation operations on the BigCube and
an extensible query language to develop and analyze BigCubes. It supports varied applications in several domains such as decision support, weather event research, GIS aggregations and biological data analysis by providing analyst-friendly tools to consolidate and query on complex structured datasets.

8.2 List of Publications

Here is the full list of publications supporting this thesis and describing the related projects.

1. Cardinal Directions between Complex Regions, Markus Schneider, Tao Chen, Ganesh Viswanathan & Wenjie Yuan, ACM Journal on Transactions on Database Systems (TODS), In Press, 2012
2. OLAP Formulations for Supporting Complex Spatial Objects in Data Warehouses, Ganesh Viswanathan & Markus Schneider, 13th International Conference on Data Warehousing and Knowledge Discovery (DaWaK), Toulouse, France, 2011
3. CAL: A Generic Query and Analysis Language for Data Warehouses, Ganesh Viswanathan & Markus Schneider, 20th International Conference on Software Engineering and Data Engineering (SEDE), Las Vegas, 2011
4. A Moving Objects Database Infrastructure for Hurricane Research: Data Integration and Complex Object Management, Markus Schneider, Shen-Shyang Ho, Malvika Agarwal, Tao Chen, Hechen Liu & Ganesh Viswanathan, NASA Earth Science Technology Forum (ESTF), Pasadena, CA, 2011
5. On the requirements for User-Centric Spatial Data Warehousing and SOLAP, Ganesh Viswanathan & Markus Schneider, 1st International Workshop on Spatial Information Modeling, Management and Mining (SIM3), DASFAA Workshops, Hong Kong, 2011
6. User-Centric Spatial Data Warehousing: Requirements and Approaches, Ganesh Viswanathan & Markus Schneider, Invited submission to International Journal of Data Mining, Modelling and Management (IJDMMM), 2011
7. Querying Cardinal Directions between Complex Objects in Data Warehouses, Ganesh Viswanathan & Markus Schneider, Invited submission to Fundamenta Informaticae (FI) Journal, 2011
8. iBLOB: Complex Object Management in Databases Through Intelligent Binary Large Objects, Tao Chen, Arif Khan, Markus Schneider & Ganesh Viswanathan, 3rd International Conference on Objects and Databases (ICOODB), Frankfurt, Germany, 2010
9. The Objects Interaction Graticule for Cardinal Direction Querying in Moving Objects Data Warehouses, Ganesh Viswanathan & Markus Schneider, 14th East-European Conference on Advances in Databases and Information Systems (ADBIS), Track on Warehousing and OLAPing Complex, Spatial and Spatio-Temporal Data, Novi Sad, 2010
10. Moving Objects Database Technology for Ad-Hoc Querying and Satellite Data Retrieval of Dynamic Atmospheric Events, Markus Schneider, Shen-Shyang Ho, Tao Chen, Arif Khan, Ganesh Viswanathan, Wenqing Tang & W. Timothy Liu, NASA Earth Science Technology Forum (ESTF), Arlington, VA, 2010
11. BigCube: A Metamodel for Managing Multidimensional Data, Ganesh Viswanathan & Markus Schneider, 19th International Conference on Software Engineering and Data Engineering (SEDE), San Francisco, 2010
12. The Objects Interaction Matrix for Modeling Cardinal Directions in Spatial Databases, Tao Chen, Markus Schneider, Ganesh Viswanathan & Wenjie Yuan, 15th International Conference on Database Systems for Advanced Applications (DASFAA), Tsukuba, Japan, 2010

8.3 Directions for Future Work

As future work, we propose to design additional aggregation operations for BigCubes and explore the inclusion of spatio-temporal objects in data warehouses. New aggregate operations on spatial data can include several combinations of spatial and scalar aggregate operations, such as finding the area of aggregate spatial objects described over several levels of the perspective or subject hierarchy. Another area for future research is operations involving multiple data cubes. Conceptually, this translates into developing plans for the integration of several data marts with heterogeneous designs that often exist in an enterprise data warehouse system. An efficient strategy for automatically incorporating existing data sources (ETL systems) into user-level data BigCubes, together with its implementation, is another interesting area for future research. Additional future work involves the extension of this approach to handle spatio-temporal data and aggregations on them. We are currently working on providing data warehousing solutions for applications in Earth science research, following the methodology presented in (Schneider et al. 2010, 2011). As a first step, we are building a moving object database (MOD) capable of efficiently storing and handling large
PAGE 211

scale,complexstructuredweathereventdataandperformingintelligentanalysisonthem( Schneideretal. 2011 2010 ).Overall,thisdemonstratesacompleteconceptualmodelingframeworkforuser-centricOLAP,anddatastructuresandoperatorsforsupportingcomplexspatialobjectsindatawarehousesanditsautomatictranslationtodatawarehouseschemaforuser-centric,complexdataengineering.Thisworkthusextendsandimprovestheareaofspatialdataengineeringbyprovidinganewapproachforcomplexdatawarehousing,specifyingnewqualitativeoperatorssuchascardinaldirectionsandusinganovelintelligentBLOBdatastructuretoleveragedatabaseLOBstostoremetadataandimprovethequeryperformanceoflargestructuredobjects. 211

REFERENCES

ABELLO, A., SAMOS, J., AND SALTOR, F. 2003. Implementing operations to navigate semantic star schemas. In Proceedings of the 6th ACM international workshop on Data warehousing and OLAP. ACM, 56.

ABELLO, A., SAMOS, J., AND SALTOR, F. 2006. YAM2: A Multidimensional Conceptual Model Extending UML. Information Systems 31, 6, 541.

AGRAWAL, R., GUPTA, A., AND SARAWAGI, S. 1997. Modeling Multidimensional Databases. In Proceedings of the 13th International Conference on Data Engineering. 232.

ALLEN, J. F. 1983. Maintaining Knowledge about Temporal Intervals. Journal of the Association for Computing Machinery 26, 11, 832.

BATORY, D. S., BARNETT, J. R., GARZA, J. F., SMITH, K. P., TSUKUDA, K., TWICHELL, B. C., AND WISE, T. E. 1988. Genesis: an Extensible Database Management System. IEEE Trans. on Software Engineering 14, 1711.

BAUER, A. AND LEHNER, W. 1997. The Cube-Query-Languages (CQL) for Multidimensional Statistical and Scientific Database Systems. In Proceedings of the 5th International Conference on Database Systems for Advanced Applications (DASFAA). World Scientific Press, 263.

BEDARD, Y., MERRETT, T., AND HAN, J. 2001. 3 Fundamentals of Spatial Data Warehousing for Geographic Knowledge Discovery. Geographic Data Mining and knowledge discovery, 53.

BILIRIS, A. 1992. The Performance of Three Database Storage Structures for Managing Large Objects. In ACM SIGMOD Int. Conf. on Management of Data. 276.

BIMONTE, S., TCHOUNIKINE, A., AND MIQUEL, M. 2005. Towards a Spatial Multidimensional Model. In 8th ACM International workshop on Data Warehousing and OLAP (DOLAP). ACM, New York, NY, USA, 39.

BIMONTE, S., TCHOUNIKINE, A., AND MIQUEL, M. 2006. GeoCube, a Multidimensional Model and Navigation Operators Handling Complex Measures: Application in Spatial OLAP. In 4th International Conference on Advances in Information Systems (ADVIS). 100.

BIMONTE, S., TCHOUNIKINE, A., AND MIQUEL, M. 2007. Spatial OLAP: Open Issues and a Web Based Prototype. In 10th AGILE International Conference on Geographic Information Science. 1.

BIMONTE, S., TCHOUNIKINE, A., MIQUEL, M., AND PINET, F. 2010. When Spatial Analysis Meets OLAP: Multidimensional Model and Operators. International Journal of Data Warehousing and Mining (IJDWM) 6, 4, 33.

BLASCHKA, M., SAPIA, C., HOFLING, G., AND DINTER, B. 1998. Finding Your Way through Multidimensional Data Models. In 9th International Workshop on Database and Expert Systems Applications. 198.

BRAY, T., PAOLI, J., SPERBERG-MCQUEEN, C., MALER, E., AND YERGEAU, F. 2000. Extensible markup language (XML) 1.0. W3C recommendation 6.

CABIBBO, L. AND TORLONE, R. 1997. Querying Multidimensional Databases. In 6th International Workshop on Database Programming Languages. 253.

CABIBBO, L. AND TORLONE, R. 1998. A Logical Approach to Multidimensional Databases. Advances in Database Technology (EDBT), 183.

CAREY, M. AND DEWITT, D. 1996. Of objects and databases: A decade of turmoil. In Proceedings of the International Conference on Very Large Databases. Citeseer, 3.

CAREY, M. J., DEWITT, D. J., AND VANDENBERG, S. L. 1988. A Data Model and Query Language for Exodus. ACM SIGMOD Record 17, 413-423.

CHEN, T., KHAN, A., SCHNEIDER, M., AND VISWANATHAN, G. 2010. iBLOB: Complex Object Management in Databases through Intelligent Binary Large Objects. In ICOODB. 85.

CHEN, T., SCHNEIDER, M., VISWANATHAN, G., AND YUAN, W. 2010. The Objects Interaction Matrix for Modeling Cardinal Directions in Spatial Databases. In Proceedings of the 15th International Conference on Database Systems for Advanced Applications (DASFAA). 218.

CODD, E. F. 1970. A Relational Model of Data for Large Shared Data Banks. Commun. ACM 13, 6, 377.

DATTA, A. AND THOMAS, H. 1999. The Cube Data Model: A Conceptual Model and Algebra for On-line Analytical Processing in Data Warehouses. Decision Support Systems 27, 3, 289.

DAVEY, B. AND PRIESTLEY, H. 2002. Introduction to lattices and order. Cambridge Univ Press.

EGENHOFER, M. AND ROBERT, D. 1991. Point-set topological spatial relations. International Journal of Geographical Information System 5, 2, 161.

FERRI, F., POURABBAS, E., RAFANELLI, M., AND RICCI, F. 2002. Extending Geographic Databases for a Query Language to Support Queries Involving Statistical Data. In International Conference on Scientific and Statistical Database Management. IEEE, 220.

FRANCONI, E. AND KAMBLE, A. 2004. A data warehouse conceptual data model. In Proceedings of Scientific and Statistical Database Management. 435.

FRANK, A. U. Qualitative Spatial Reasoning: Cardinal Directions as an Example. International Journal of Geographical Information Systems.

GAAL, S. 1964. Point set topology. Academic Press Inc.

GEOMONDRIAN PROJECT. 2011. http://www.spatialytics.org/projects/geomondrian/.

GOLFARELLI, M., MAIO, D., AND RIZZI, S. 1998a. Conceptual design of data warehouses from e/r schema. In HICSS '98: Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences - Volume 7. IEEE Computer Society, Washington, DC, USA, 334.

GOLFARELLI, M., MAIO, D., AND RIZZI, S. 1998b. The Dimensional Fact Model: A Conceptual Model for Data Warehouses. International Journal of Cooperative Information Systems 7, 2, 215.

GOLFARELLI, M. AND RIZZI, S. 1998. A Methodological Framework for Data Warehouse Design. In Proceedings of the 1st ACM International workshop on Data warehousing and OLAP. ACM, 3.

GOLFARELLI, M. AND RIZZI, S. 1999. Designing the Data Warehouse: Key Steps and Crucial Issues.

GOMEZ, L., HAESEVOETS, S., KUIJPERS, B., AND VAISMAN, A. 2009. Spatial Aggregation: Data Model and Implementation. Information Systems 34, 6, 551.

GOMEZ, L., HAESEVOETS, S., KUIJPERS, B., AND VAISMAN, A. A. 2007. Spatial aggregation: Data model and implementation. CoRR abs/0707.4304.

GOOGLE MAPS. 2011. http://maps.google.com/.

GOYAL, R. AND EGENHOFER, M. 2000. Cardinal Directions between Extended Spatial Objects. Unpublished manuscript.

GOYAL, R. AND EGENHOFER, M. J. 1997. The Direction-Relation Matrix: A Representation for Directions Relations between Extended Spatial Objects. In The Annual Assembly and the Summer Retreat of the University Consortium for Geographic Information Systems Science.

GRAY, J., BOSWORTH, A., LAYMAN, A., AND PIRAHESH, H. 1996. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. International Conference on Data Engineering, 152.

GRAY, J., CHAUDHURI, S., BOSWORTH, A., LAYMAN, A., REICHART, D., VENKATRAO, M., PELLOW, F., AND PIRAHESH, H. 1997. Data cube: A Relational Aggregation Operator Generalizing Group-by, Cross-tab, and Sub-totals. Data Mining and Knowledge Discovery 1, 1, 29.

GUTING, R. 1994. An Introduction to Spatial Database Systems. The VLDB Journal 3, 4, 357.

GUTING, R., BOHLEN, M., ERWIG, M., JENSEN, C., LORENTZOS, N., SCHNEIDER, M., AND VAZIRGIANNIS, M. 2000. A Foundation for Representing and Querying Moving Objects. ACM Transactions on Database Systems (TODS) 25, 1, 42.

GUTING, R., DE RIDDER, T., AND SCHNEIDER, M. 1995. Implementation of the ROSE algebra: Efficient Algorithms for Realm-based Spatial Data Types. In Advances in Spatial Databases. Springer, 216.

GUTING, R. AND SCHNEIDER, M. 1995. Realm-based Spatial Data Types: the ROSE algebra. The VLDB Journal 4, 2, 243.

GYSSENS, M. AND LAKSHMANAN, L. V. S. 1997. A Foundation for Multi-dimensional Databases. In VLDB '97: Proceedings of the 23rd International Conference on Very Large Data Bases. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 106.

HAAR, R. 1976. Computational Models of Spatial Relations. Tech. Rep. TR-478, Computer Science, University of Maryland, College Park, MD, USA.

HAAS, L. M., CHANG, W., LOHMAN, G. M., MCPHERSON, J., WILMS, P. F., LAPIS, G., LINDSAY, B. G., PIRAHESH, H., CAREY, M. J., AND SHEKITA, E. J. 1990. Starburst Mid-flight: As the Dust Clears. IEEE Trans. on Knowledge and Data Engineering (TKDE) 2, 143.

HAN, J., FU, Y., WANG, W., KOPERSKI, K., AND ZAIANE, O. 1996. DMQL: A data mining query language for relational databases. In SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD). Montreal, Canada.

HAN, J. AND KAMBER, M. 2006. Data Mining: Concepts and Techniques. Morgan Kaufmann.

HAN, J., KOPERSKI, K., AND STEFANOVIC, N. 1997. GeoMiner: A System Prototype for Spatial Data Mining. In ACM SIGMOD International Conference on Management of data. ACM, 553.

HAN, J., STEFANOVIC, N., AND KOPERSKI, K. 1998. Selective Materialization: An Efficient Method for Spatial Data Cube Construction. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). 144.

HDF - HIERARCHICAL DATA FORMAT. 2011. http://www.hdfgroup.org/.

HURTADO, C., MENDELZON, A., AND VAISMAN, A. 1999. Maintaining data cubes under dimension updates. Data Engineering, 1999. Proceedings., 15th International Conference on, 346.

HUSEMANN, B., LECHTENBORGER, J., AND VOSSEN, G. 2000. Conceptual Data Warehouse Design. In Workshop on Design and Management of Data Warehouses. 3.

HWANG, B., JUNG, I., AND MOON, S. 1994. Efficient storage management for large dynamic objects. In EUROMICRO 94. System Architecture and Integration 20th EUROMICRO Conference. 37.

INMON, W. 2005. Building the Data Warehouse. John Wiley & Sons, New York.

JARKE, M. AND VASSILIOU, Y. 1997. Foundations of data warehouse quality. In 2nd Conference on Information Quality. DWQ: ESPRIT Long Term Research Project, No 22469, Massachusetts Institute of Technology, Cambridge.

JAVA TOPOLOGY SUITE (JTS). 2011. http://www.vividsolutions.com/jts/.

JENSEN, C., KLIGYS, A., PEDERSEN, T., AND TIMKO, I. 2004. Multidimensional Data Modeling for Location-based Services. The International Journal on Very Large Data Bases (VLDBJ) 13, 1, 1.

JENSEN, C. S., KLIGYS, A., PEDERSEN, T. B., AND TIMKO, I. 2002. Multidimensional data modeling for location-based services. In GIS '02: Proceedings of the 10th ACM International Symposium on Advances in Geographic Information Systems. ACM, New York, NY, USA, 55.

JOINT TYPHOON WARNING CENTER (JTWC). 2011. http://metocph.nmci.navy.mil/jtwc.

JPIVOT: JSP CUSTOM TAG LIBRARY. 2011. http://jpivot.sourceforge.net/.

KAMBLE, A. 2008. A conceptual model for multidimensional data. In 5th Asia-Pacific Conference on Conceptual Modeling. Vol. 79. 29.

KIMBALL, R. 1997. A Dimensional Modeling Manifesto. DBMS Magazine 10, 9, 58.

KIMBALL, R., REEVES, L., ROSS, M., AND THORNTHWAITE, W. 1998. The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses.

KIMBALL, R. AND ROSS, M. 1996. The Data Warehousing Toolkit. John Wiley & Sons, New York.

KIMBALL, R. 1996. The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses. John Wiley & Sons, Inc. New York, NY, USA.

KIMBALL, R. AND ROSS, M. 2002. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition). Wiley.

LEHNER, W. 1998. Modeling large scale olap scenarios. In Advances in Database Technology (EDBT). LNCS, vol. 1377. Springer, 153.

LEHNER, W., RUF, T., AND TESCHKE, M. 1996. CROSS-DB: A Feature-Extended Multidimensional Data Model for Statistical and Scientific Databases. In Proceedings of the 5th International Conference on Information and Knowledge Management (CIKM). ACM, New York, USA, 253.

LEMA, C., ANTONIO, J., FORLIZZI, L., GUTING, R., NARDELLI, E., AND SCHNEIDER, M. 2003. Algorithms for Moving Objects Databases. The Computer Journal 46, 6, 680.

LENZ, H. AND SHOSHANI, A. 1997. Summarizability in OLAP and Statistical Databases. In Proceedings of the International Conference on Scientific and Statistical Database Management Conference (SSDBM). IEEE Computer Society, 132.

LI, C. AND WANG, X. S. 1996. A data model for supporting on-line analytical processing. In Proceedings of the 5th International Conference on Information and Knowledge Management (CIKM). ACM, 81.

LOPEZ, I., SNODGRASS, R., AND MOON, B. 2005. Spatiotemporal aggregate computation: A survey. Knowledge and Data Engineering, IEEE Transactions on 17, 2 (Feb.), 271.

LUJAN-MORA, S., TRUJILLO, J., AND SONG, I. 2006. A UML Profile for Multidimensional Modeling in Data Warehouses. Data Knowledge Engineering 59, 3, 725.

MALINOWSKI, E. AND ZIMANYI, E. 2004. OLAP Hierarchies: A Conceptual Perspective. In Advanced Information Systems Engineering. Springer, 19.

MALINOWSKI, E. AND ZIMANYI, E. 2004. Representing Spatiality in a Conceptual Multidimensional Model. In 12th ACM International workshop on Geographic Information Systems. ACM, 12.

MALINOWSKI, E. AND ZIMANYI, E. 2005a. Spatial Hierarchies and Topological Relationships in the Spatial MultiDimER Model. In Database: Enterprise, Skills and Innovation. LNCS, vol. 3567/2005. Springer Berlin/Heidelberg, 17.

MALINOWSKI, E. AND ZIMANYI, E. 2005b. Spatial Hierarchies and Topological Relationships in the Spatial MultiDimER Model. In Database: Enterprise, Skills and Innovation. Lecture Notes in Computer Science, vol. 3567/2005. Springer Berlin/Heidelberg, 17.

MALINOWSKI, E. AND ZIMANYI, E. 2006. Hierarchies in a Multidimensional Model: From Conceptual Modeling to Logical Representation. Data Knowledge Engineering 59, 2, 348.

MALINOWSKI, E. AND ZIMANYI, E. 2007. Logical Representation of a Conceptual Model for Spatial Data Warehouses. Geoinformatica 11, 4, 431.

MALINOWSKI, E. AND ZIMANYI, E. 2004. Olap hierarchies: A conceptual perspective. In Proceedings of the 16th International Conference on Advanced Information Systems Engineering, CAiSE 04, LNCS 3084. Springer-Verlag, 477.

MARCHAND, P., BRISEBOIS, A., BEDARD, Y., AND EDWARDS, G. 2004. Implementation and Evaluation of a Hypercube-based Method for Spatiotemporal Exploration and Analysis. ISPRS Journal of Photogrammetry and Remote Sensing 59, 1-2, 6.

MCGRATH, R. 2003. XML and Scientific File Formats. In The Geological Society of America.

MCKENNEY, M., PAULY, A., PRAING, R., AND SCHNEIDER, M. 2006. Technical Report: Structured Large Objects in Databases. Tech. rep., University of Florida, CISE Department.

MICROSOFT BING MAPS. 2011. http://www.bing.com/maps/.

MICROSOFT CORPORATION. 2010. Multidimensional Expressions (MDX) Reference. http://msdn.microsoft.com/en-us/library/ms145506.aspx.

NATIONAL HURRICANE CENTER (NHC) - SEASON ARCHIVES. 2011. http://www.nhc.noaa.gov/pastall.shtml.

NATIONAL OCEANIC AND ATMOSPHERIC ADMINISTRATION (NOAA). 2011. http://www.aoml.noaa.gov/hrd.

NIEMI, T., NUMMENMAA, J., AND THANISCH, P. 2001. Logical Multidimensional Database Design for Ragged and Unbalanced Aggregation Hierarchies. In International Workshop on Design and Management of Data Warehouses. Interlaken, Switzerland. Citeseer.

OPENGIS CONSORTIUM: REFERENCE MODEL. 2011. http://openlayers.org.

OPENLAYERS MAPPING CLIENT. 2011. http://openlayers.org.

ORACLE OLAP: MULTIDIMENSIONAL ANALYTIC ENGINE. 2011. http://www.oracle.com/technetwork/database/options/olap/index.html.

PAPADIAS, D. AND EGENHOFER, M. 1997. Algorithms for Hierarchical Spatial Reasoning. GeoInformatica 1, 3, 251.

PEDERSEN, T. AND JENSEN, C. 2002. Multidimensional Database Technology. Computer 34, 12, 40.

PEDERSEN, T., JENSEN, C., AND DYRESON, C. 2001. A Foundation for Capturing and Querying Complex Multidimensional Data. Information Systems 26, 5, 383.

PEDERSEN, T., SHOSHANI, A., GU, J., AND JENSEN, C. 2000. Extending OLAP Querying to External Object Databases. In Proceedings of the 9th International Conference on Information and knowledge management (CIKM). ACM New York, USA, 405.

PEDERSEN, T. B. AND JENSEN, C. S. 1999. Multidimensional Data Modeling for Complex Data. In Proceedings of 15th International Conference on Data Engineering (ICDE). IEEE Computer Society, 336.

PEDERSEN, T. B. AND TRYFONA, N. 2001. Pre-aggregation in spatial data warehouses. In SSTD '01: Proceedings of the 7th International Symposium on Advances in Spatial and Temporal Databases. Springer-Verlag, London, UK, 460.

PENTAHO ANALYSIS SERVICES: MONDRIAN PROJECT. 2011. http://mondrian.pentaho.org/.

POE, V., BROBST, S., AND KLAUER, P. 1997. Building a data warehouse for decision support. Prentice-Hall, Inc.

PRAT, N., AKOKA, J., AND WATTIAU, I. 2006. A UML-based Data Warehouse Design Method. Decision Support Systems 42, 3, 1449.

REW, R., UCAR, B., AND HARTNETT, E. 2004. Merging netCDF and HDF5. In 20th Int. Conf. on Interactive Information and Processing Systems.

RIGAUX, P., SCHOLL, M., AND VOISARD, A. 2002. Introduction to Spatial Databases: With Application to GIS. Morgan Kaufmann.

RIVEST, S., BEDARD, Y., AND MARCHAND, P. 2001a. Modeling Multidimensional Spatio-temporal Data Warehouses in a Context of Evolving Specifications. Geomatica, 55(4).

RIVEST, S., BEDARD, Y., AND MARCHAND, P. 2001b. Toward Better Support for Spatial Decision Making: Defining the Characteristics of Spatial On-line Analytical Processing (SOLAP). Geomatica 55, 4, 539.

RIVEST, S., BEDARD, Y., PROULX, M., NADEAU, M., HUBERT, F., AND PASTOR, J. 2005. SOLAP technology: Merging Business Intelligence with Geospatial Technology for Interactive Spatio-temporal Exploration and Analysis of Data. ISPRS journal of photogrammetry and remote sensing 60, 1, 17.

RUIZ, C. AND TIMES, V. 2009. A Taxonomy of SOLAP Operators. XXIV Simposio Brasileiro de Banco de Dados, Fortaleza, CE.

SAPIA, C., BLASCHKA, M., HOFLING, G., AND DINTER, B. 1999. Extending the E/R Model for the Multidimensional Paradigm. In ER '98: Workshops on Data Warehousing and Data Mining. Springer-Verlag, 105.

SCHEK, H.-J., PAUL, H.-B., SCHOLL, M. H., AND WEIKUM, G. 1990. The DASDBS Project: Objectives, Experiences, and Future Prospects. IEEE Trans. on Knowledge and Data Engineering (TKDE) 2, 1, 25.

SCHNEIDER, M. 1997. Spatial Data Types for Database Systems - Finite Resolution Geometry for Geographic Information Systems. Vol. LNCS 1288. Springer-Verlag.

SCHNEIDER, M. AND BEHR, T. 2006. Topological Relationships between Complex Spatial Objects. ACM Transactions on Database Systems (TODS) 31, 1, 39.

SCHNEIDER, M., CHEN, T., VISWANATHAN, G., AND YUAN, W. 2012. Cardinal Directions between Complex Regions. ACM Transactions on Database Systems (TODS) In Press.

SCHNEIDER, M., HO, S., AGARWAL, M., CHEN, T., HECHEN, L., AND VISWANATHAN, G. 2011. Moving Objects Database Technology for Ad-Hoc Querying and Satellite Data Retrieval of Dynamic Atmospheric Events. NASA Earth Science Technology Forum (ESTF).

SCHNEIDER, M., HO, S., CHEN, T., KHAN, A., VISWANATHAN, G., TANG, W., AND LIU, W. 2010. Moving Objects Database Technology for Ad-Hoc Querying and Satellite Data Retrieval of Dynamic Atmospheric Events. NASA Earth Science Technology Forum (ESTF).

SCOTCH, M. AND PARMANTO, B. 2005. SOVAT: Spatial OLAP Visualization and Analysis Tool. In 38th Hawaii International Conference on System Sciences (HICSS). IEEE, 142b.

SHEKHAR, S. AND CHAWLA, S. 2003. Spatial Databases: A Tour. Prentice Hall.

SHEKHAR, S., LU, C., TAN, X., CHAWLA, S., AND VATSAVAI, R. 2001. Map Cube: A Visualization Tool for Spatial Data Warehouses. Geographic data mining and knowledge discovery, 73.

SILVA, J. 2008. Geomdql: Uma linguagem de consulta geografica e multidimensional. PhD Thesis, Universidade Federal de Pernambuco.

SILVA, J., TIMES, V., AND SALGADO, A. 2006. An open source and web based framework for geographic and multidimensional processing. In Proceedings of the 2006 ACM symposium on Applied computing. ACM, 63.

SILVA, J., VERA, A., OLIVEIRA, A., NASCIMENTO FIDALGO, R., SALGADO, A., AND TIMES, V. 2007. Querying geographical data warehouses with geomdql. In Brazilian Symposium on Databases (SBBD). 223.

SKIADOPOULOS, S. AND KOUBARAKIS, M. 2004. Composing Cardinal Direction Relations. Artificial Intelligence 152, 2, 143.

STEFANOVIC, N., HAN, J., AND KOPERSKI, K. 2002. Object-based Selective Materialization for Efficient Implementation of Spatial Data Cubes. Knowledge and Data Engineering, IEEE Transactions on 12, 6, 938.

STONEBRAKER, M. 1986. Inclusion of New Types in Relational Data Base Systems. In Int. Conf. on Data Engineering Conference (ICDE). 262.

TRYFONA, N., BUSBORG, F., AND CHRISTIANSEN, J. 1999. starER: A Conceptual Model for Data Warehouse Design. In Proceedings of ACM 2nd International Workshop on Data Warehousing and OLAP. 3.

TSOIS, A., KARAYANNIDIS, N., AND SELLIS, T. 2001. MAC: Conceptual Data Modeling for OLAP. In Intl. Workshop on the Design and Management of Data Warehouses (DMDW). Citeseer, 28.

VASSILIADIS, P. 1998. Modeling multidimensional databases, cubes and cube operations (extended version). In Scientific and Statistical Database Management, 1998. Proceedings. 10th International Conference on. IEEE, 53.

VASSILIADIS, P. AND SELLIS, T. 1999. A Survey of Logical Models for OLAP Databases. SIGMOD Record 28, 4, 64.

VASSILIADIS, P. AND SKIADOPOULOS, S. 2000. Modelling and optimisation issues for multidimensional databases. In Advanced Information Systems Engineering. Springer, 482.

VISWANATHAN, G. AND SCHNEIDER, M. 2010. BigCube: A MetaModel for Managing Multidimensional Data. In 19th International Conference on Software Engineering and Data Engineering (SEDE). 237.

VISWANATHAN, G. AND SCHNEIDER, M. 2011a. CAL: A Generic Query and Analysis Language for Data Warehouses. In 20th International Conference on Software Engineering and Data Engineering (SEDE). 18.

VISWANATHAN, G. AND SCHNEIDER, M. 2011b. The objects interaction graticule for cardinal direction querying in moving objects data warehouses. In Advances in Databases and Information Systems. Springer, 520.

VISWANATHAN, G. AND SCHNEIDER, M. 2011c. Olap formulations for supporting complex spatial objects in data warehouses. In Data Warehousing and Knowledge Discovery (DaWaK). Springer, 39.

VISWANATHAN, G. AND SCHNEIDER, M. 2011. Requirements for Spatial Data Warehousing and SOLAP. In 1st International Workshop on Spatial Information Modeling, Management and Mining.

ZEPEDA, L., CELMA, M., AND ZATARAIN, R. 2008. A Mixed Approach for Data Warehouse Conceptual Design with MDA. In International Conference on Computational Science and Its Applications. 1204.

BIOGRAPHICAL SKETCH

Ganesh Viswanathan was born in Chennai, India. After attending school in several places throughout India, he earned his undergraduate degree in Information Technology from Anna University, Chennai, India, graduating first in his class. Ganesh has been a research intern at Los Alamos National Laboratory (ISR2 - Space and Remote Sensing group) and Amazon.com (EC2 - Elastic Block Store group), where he specialized in large-scale data analytics. He received his Ph.D. from the University of Florida in the Fall of 2011.