On Nonparametric Measures of Dependence and Conditional Independence


Material Information

Title:
On Nonparametric Measures of Dependence and Conditional Independence Theory and Applications
Physical Description:
1 online resource (195 p.)
Language:
english
Creator:
Seth,Sohan
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Electrical and Computer Engineering
Committee Chair:
Principe, Jose C
Committee Members:
Rangarajan, Anand
Harris, John G
Rao, Murali

Subjects

Subjects / Keywords:
association -- causal -- conditional -- dependence -- granger -- hypothesis -- ica -- independence -- inference -- measure -- metric -- nonparametric -- point -- process -- selection -- space -- spike -- testing -- train -- variable
Electrical and Computer Engineering -- Dissertations, Academic -- UF
Genre:
Electrical and Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
"Does greenhouse gas emission cause global warming?", "Does gas price depend on supply and demand?", "Does stress cause depression?", "Does productivity depend on competition?" -- these are a few of the many questions that we face in our daily lives that naturally expose our minds to the concepts of dependence and causation; e.g., in the context of politics, sports, finance, education, health and technology. But what do these two words imply? When exposed to the word dependence, an average person almost always thinks of a relation; i.e., if two variables are dependent, then a change in one would result in a change in the other. The same intuitive understanding also applies to the concept of causation. Thus, these concepts are rather transparent in our minds. But what remains a challenging problem to this day, and often a center of debate, is how to quantify these concepts while preserving their intuitive nature. Quantifying dependence and causation can be viewed as constructing shelters. Any building is a shelter, and it is built with bricks, the basic building block. But buildings differ in their architecture, and may possess different characteristics to serve different purposes. Similarly, dependence and causation can also be quantified in many different ways, where the building blocks are observations of signals, and the objective is to arrange them in the most robust and cost-effective way, to capture a certain attribute. This dissertation explores the concepts of dependence and causation from an engineering perspective; in particular, machine learning and data mining.
The contributions of this dissertation are: first, unifying available quadratic measures of independence to develop a computationally simpler independent component analysis algorithm; second, establishing a robust kernel based measure of conditional independence to detect Granger non-causality beyond linear interaction; third, developing a novel understanding of dependence between arbitrary random variables from the perspective of realizations, and constructing parameter-free estimators for practical problems such as variable selection in gene expression data; and fourth, extending this approach to develop scalable estimators of conditional dependence to quantify causal flow in EEG data.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Sohan Seth.
Thesis:
Thesis (Ph.D.)--University of Florida, 2011.
Local:
Adviser: Principe, Jose C.

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2011
System ID:
UFE0043252:00001




Full Text

ON NONPARAMETRIC MEASURES OF DEPENDENCE AND CONDITIONAL INDEPENDENCE: THEORY AND APPLICATIONS

By

SOHAN SETH

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2011

© 2011 Sohan Seth

To my parents who dreamed it

ACKNOWLEDGMENTS

I should take this opportunity to thank the people who have played a significant role in my journey to pursue a PhD degree. I thank Dr. Jose C. Príncipe for the financial support and technical advice that led to the completion of my work, Dr. Murali Rao for enriching my understanding of theoretical science, Dr. John Harris and Dr. Anand Rangarajan for being kind enough to serve on my committee, Dr. Anutosh Chatterjee, Dr. Rajib Bandyopadhyay, and Dr. Bipan Tudu for their support and guidance that encouraged me to pursue a PhD degree, and Dr. Il 'Memming' Park for being a true inspiration, an ever enthusiastic collaborator, and a brother.

The last five years have been a wonderful journey which has been made memorable by a number of amazing people. I thank Shalom Darmanjian, Erion Hasanbelliu and Alexander Singh Alvarado for always being there through the ups and downs. I thank the senior CNEL members, Sudhir Rao, Weifeng Liu, and Puskal Pokharel, for sharing their knowledge and vision, and the junior members, Lin Li, Austin Brockmeier, Goktug Cinar, Rakesh Chalasani, and Bilal Fadlallah, for sharing their curiosity and spirit. I thank Stefan Craciun, Jeremy Anderson, Manu Rastogi, Sheng-Feng 'Steve' Yen, Kittipat 'Bot' Kampa, Ravi Shekhar, Hector Galloza, Vaibhab Garg, Dr. Jie Xu, Evan Kriminger, and Savyasachi Singh for being a source of fun and laughter. I thank Dr. Andreas Keil, Luis Sanchez Giraldo, Abhishek Singh, Pingping Zhu, Songlin Zhao, Darren Aiken, Jianwu Xu, Antonio Paiva, Mustafa Can Ozturk, Seung Ju Han, Kyu-Hwa Jeong, and Jung Phil Kwon for many fruitful discussions. I thank Ashish Nemani and Amit Agarwal for being ever cheerful roommates.

Finally, I thank my parents for their love, patience and faith, my elder brother Dr. Suman Seth for his everlasting support and care, and my sister-in-law Barnali Basak and my little sisters Anushree Goyal and Rosha Pokharel for being the greatest source of joy.

TABLE OF CONTENTS

ACKNOWLEDGMENTS 4
LIST OF TABLES 9
LIST OF FIGURES 10
ABSTRACT 12

CHAPTER

1 INTRODUCTION 14
  1.1 Concepts 14
  1.2 Applications 16
  1.3 Issues 17
    1.3.1 Suitability 17
    1.3.2 Estimation 18
    1.3.3 Abstraction 20
    1.3.4 Scalability 20
  1.4 Contribution 21

2 BACKGROUND 23
  2.1 Correntropy 23
  2.2 Mean Square Contingency 37
  2.3 Discussion 43

3 INDEPENDENCE 45
  3.1 Background 50
    3.1.1 Strict Positive Definiteness 50
    3.1.2 Quadratic Mutual Information 50
    3.1.3 Characteristic Function Based Measure 52
    3.1.4 Quadratic Dependence Measure 53
  3.2 Unification 54
    3.2.1 Generalized Cross Information Potential 54
    3.2.2 Composition Kernel 55
    3.2.3 Related Works 57
  3.3 Novel Kernels 58
  3.4 Independent Component Analysis 63
    3.4.1 Description 65
    3.4.2 Optimization 65
    3.4.3 Approximation 67
    3.4.4 Experimental Setup 68
    3.4.5 Discussion 70
    3.4.6 Error vs. Speed Plot 72
  3.5 Summary 75

4 CONDITIONAL INDEPENDENCE 76
  4.1 Background 79
    4.1.1 Conditional Cumulative Distribution Function Based Method 80
    4.1.2 Conditional Probability Density Function Based Method 82
    4.1.3 Reproducing Kernel Hilbert Space Based Method 83
  4.2 Proposed Approach 85
    4.2.1 Measure of Conditional Independence 86
    4.2.2 Estimation of Conditional Distribution 88
    4.2.3 Estimator of Conditional Independence 90
    4.2.4 Choice of Functions and Regularization 90
    4.2.5 Discussion 92
  4.3 Simulation 93
    4.3.1 Experimental Setup 94
    4.3.2 Permutation Test 95
    4.3.3 Data Generating Processes 97
      4.3.3.1 Experiment A 97
      4.3.3.2 Experiment B 98
      4.3.3.3 Experiment C 99
      4.3.3.4 Experiment D 99
      4.3.3.5 Experiment E 100
      4.3.3.6 Experiment F 102
      4.3.3.7 Experiment G 103
      4.3.3.8 Experiment H 105
      4.3.3.9 Experiment I 106
  4.4 Summary 106

5 GENERALIZED ASSOCIATION 108
  5.1 Concepts and Limitations 111
    5.1.1 Independence 112
    5.1.2 Correlation 112
    5.1.3 Association 113
    5.1.4 Copula 114
    5.1.5 Divergence 116
    5.1.6 Renyi's Approach 118
    5.1.7 Quadratic Measures of Independence 120
    5.1.8 Further Limitations 121
      5.1.8.1 Parameters 121
      5.1.8.2 Symmetry 122
  5.2 A Novel Framework 122
    5.2.1 Properties of Dependence 122
      5.2.1.1 Definition 122
      5.2.1.2 Asymmetry 123
      5.2.1.3 Bounds 123
      5.2.1.4 Invariance 124
      5.2.1.5 Estimator 124
    5.2.2 Generalized Association 124
    5.2.3 Generalized Measure of Association 126
    5.2.4 Related Work 129
    5.2.5 Examples 129
      5.2.5.1 Bivariate Gaussian 129
      5.2.5.2 Clayton Copula 130
      5.2.5.3 Multivariate Gaussian 131
      5.2.5.4 Categorical Data 132
      5.2.5.5 Time Series 133
    5.2.6 Properties of Generalized Measure of Association 133
  5.3 Applications 135
    5.3.1 Time Series Analysis: Auto-dependence 135
    5.3.2 Variable Selection 138
      5.3.2.1 Interactive Association 138
      5.3.2.2 Causal Variable Selection 140
    5.3.3 Causal Inference 141
    5.3.4 Application to Point Processes 144
      5.3.4.1 Victor-Purpura Spike Train Metric 144
      5.3.4.2 Dependence Between Stimulus and Spike Trains 146
      5.3.4.3 Dependence Between Sets of Spike Trains 149
      5.3.4.4 Micro-stimulation Data 149
  5.4 Summary 153

6 CONDITIONAL ASSOCIATION 155
  6.1 Background 158
  6.2 Method 159
    6.2.1 Conditional Association 159
    6.2.2 Examples 161
      6.2.2.1 Conditionally Dependent but Independent Variables 162
      6.2.2.2 Conditionally Independent but Dependent Variables 162
  6.3 Surrogate Test 162
  6.4 Simulation 168
    6.4.1 Conditional Granger Causality 168
    6.4.2 Linear System 169
    6.4.3 Nonlinear System 170
    6.4.4 Varying Coupling Strength 172
    6.4.5 Relatively High Synchrony 174
    6.4.6 Multivariate Time Series 174
    6.4.7 Heart Rate and Respiration Force 175
    6.4.8 Digoxin Clearance 176
    6.4.9 Causality Under Noise 177
    6.4.10 EEG Data 178
  6.5 Discussion 179

7 FUTURE WORK 181
  7.1 Unified Quadratic Measure of Independence 182
  7.2 Asymmetric Quadratic Measure of Conditional Independence 183
  7.3 Generalized Association 183
  7.4 Conditional Association 184

REFERENCES 186

BIOGRAPHICAL SKETCH 195

LIST OF TABLES

3-1 Kernels for different choices of g(x, u) and (u) 57
3-2 Performance of independent component analysis (ICA) for two sources 69
3-3 Performance of ICA algorithms for varying number of sources 71
3-4 Computational load of ICA algorithms 73
4-1 Performance of conditional independence measures in experiment A 97
4-2 Performance of conditional independence measures in experiment B 98
4-3 Performance of conditional independence measures in experiment C 100
4-4 Performance of conditional independence measures in experiment D 101
4-5 Performance of conditional independence measures in experiment E 102
4-6 Performance of conditional independence measures in experiment E (single lag) 103
4-7 Performance of conditional independence measures in experiment F 104
4-8 Performance of conditional independence measures in experiment F (single lag) 104
4-9 Performance of conditional independence measures in experiment G 104
5-1 Description of datasets and the performance of variable selection methods 139
5-2 Causal directions derived from the cause-effect pairs 143
6-1 Performance of MCA for linear system 171
6-2 Performance of MCA for nonlinear system 172
6-3 Performance of MCA for sleep apnea data 177
7-1 Equivalence of association and information based approaches 182

LIST OF FIGURES

2-1 Kernels used for defining correlation and correntropy 25
2-2 Graphical description of mean square contingency (MSC) and its estimator 42
3-1 Two functions, not zero everywhere, with convolution zero everywhere 51
3-2 Max K_M(x, 0.5), min K_m(x, 0.5), and triangle K_T(x, 0.5) kernels in (0,1)^2 63
3-3 Source densities used in independent component analysis (ICA) experiment 64
3-4 Performance of several ICA algorithms 75
4-1 Depiction of measure of conditional independence M_CI and its estimator 87
4-2 Performance of conditional independence measures in experiment H 105
4-3 Performance of conditional independence measures in experiment I 107
5-1 Is A more dependent than B? 113
5-2 Situations where the cause X and the effect Y cannot/can be separated 119
5-3 Illustration of the estimation of generalized association 128
5-4 Rank variable for (increasing) correlation coefficient 130
5-5 Rank variable for (increasing) dependence parameter 131
5-6 Rank variable for (increasing) correlation coefficient (multivariate) 132
5-7 Rank variable for (increasing) displacement parameter 133
5-8 Rank variable for (increasing) correlation coefficient (time series) 134
5-9 Auto-dependence structure of time series 137
5-10 Performance of variable selection algorithm on microarray datasets 141
5-11 Rasters for inhomogeneous Gamma renewal process 144
5-12 GMA across values of q 148
5-13 Dependence measures for different number of classes and samples 148
5-14 GMA between two sets of spike trains 150
5-15 Microstimulation waveforms 151
5-16 GMA across temporal precision for 14 cortical channels 152
5-17 Conditional, joint, and marginal GMA for 14 channels 154
6-1 What makes Y conditionally dependent on X given Z? 157
6-2 Performance of surrogate data generation scheme under conditional dependence 165
6-3 Performance of surrogate data generation scheme under conditional independence 166
6-4 Performance of MCA for nonlinear system 173
6-5 Performance of MCA for high synchrony 175
6-6 Performance of MCA for multivariate time series 176
6-7 Granger causal inference under additive noise 179
6-8 Causal flow of alpha rhythm 180

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

ON NONPARAMETRIC MEASURES OF DEPENDENCE AND CONDITIONAL INDEPENDENCE: THEORY AND APPLICATIONS

By

Sohan Seth

August 2011

Chair: Jose C. Príncipe
Major: Electrical and Computer Engineering

"Does greenhouse gas emission cause global warming?", "Does gas price depend on supply and demand?", "Does stress cause depression?", "Does productivity depend on competition?" -- these are a few of the many questions that we face in our daily lives that naturally expose our minds to the concepts of dependence and causation; e.g., in the context of politics, sports, finance, education, health and technology. But what do these two words imply? When exposed to the word dependence, an average person almost always thinks of a relation; i.e., if two variables are dependent, then a change in one would result in a change in the other. The same intuitive understanding also applies to the concept of causation. Thus, these concepts are rather transparent in our minds. But what remains a challenging problem to this day, and often a center of debate, is how to quantify these concepts while preserving their intuitive nature. Quantifying dependence and causation can be viewed as constructing shelters. Any building is a shelter, and it is built with bricks, the basic building block. But buildings differ in their architecture, and may possess different characteristics to serve different purposes. Similarly, dependence and causation can also be quantified in many different ways, where the building blocks are observations of signals, and the objective is to arrange them in the most robust and cost-effective way, to capture a certain attribute.

This dissertation explores the concepts of dependence and causation from an engineering perspective; in particular, machine learning and data mining. The contributions of this dissertation are: first, unifying available quadratic measures of independence to develop a computationally simpler independent component analysis algorithm; second, establishing a robust kernel based measure of conditional independence to detect Granger non-causality beyond linear interaction; third, developing a novel understanding of dependence between arbitrary random variables from the perspective of realizations, and constructing parameter-free estimators for practical problems such as variable selection in gene expression data; and fourth, extending this approach to develop scalable estimators of conditional dependence to quantify causal flow in EEG data.

CHAPTER 1
INTRODUCTION

Dependence and causation play a crucial role in every aspect of science and engineering. Their ubiquity is undeniable, e.g., in machine learning and data mining (Song et al. 2007a,b), where a user often encounters or expresses statements such as X depends on Y or X causes Y. Although these concepts are often exploited in a colloquial sense rather than a mathematical one, their usage nonetheless simplifies our understanding of a problem and our explanation of a solution, and thus demands a more rigorous theoretical and computational foundation.

Although dependence and causation (or equivalently conditional dependence) lack universally accepted definitions, their respective counterparts, i.e., independence and non-causation (or equivalently conditional independence), are precisely defined in mathematical terms. Therefore, it is a common practice to define, understand and quantify dependence and conditional dependence as the mere absence of independence and conditional independence (Fukumizu et al. 2008; Achard 2008). However, these concepts are not always reciprocals of each other, and they bear very different understanding and usage in the context of specific applications.

1.1 Concepts

Independence is, perhaps, the most fundamental concept that lies at the core of modern machine learning and data mining algorithms. The assumption of statistical independence between two random variables not only simplifies the understanding of a problem, but it also provides the essential tools for formulating a feasible and efficient solution (Comon 1994), a typical example being the assumption of independence between signal and noise. However, although independence enjoys a strong and clear mathematical definition, how to decide if two random variables are independent in practice remains a difficult and open problem (Rosenblatt 1975; Ahmad and Li 1997; Gretton et al. 2007). A part of this difficulty arises from the fact that independence is defined in terms
of random variables, whereas in practice a user only has access to a finite amount of realizations. Therefore, a great deal of research throughout the last century has been dedicated to correctly inferring independence using both the least amount of realizations and the least computational effort.

Conditional independence is a generalization of independence, and like independence, it cherishes a well accepted mathematical definition, and it is difficult to infer from realizations (Dawid 1998; Su and White 2008). The assumption of conditional independence plays the same role in machine learning and data mining as independence; i.e., of simplifying understanding and easing the computational burden of complex problems (Koller and Sahami 1996). But the necessity of deciding whether two random variables are conditionally independent or not, perhaps, spawns mostly from its inherent connection to non-causation (Diks and Panchenko 2006). Following the seminal work of Granger, non-causality, i.e., one variable not having any (Granger) causal influence on the other, can be directly related to conditional independence (Granger 1980). The growing popularity of Granger causality in flourishing research areas such as neuroscience demands ways of inferring conditional independence correctly and efficiently from limited amounts of realizations and computational resources (Nedungadi et al. 2009).

Dependence is the absence of independence; i.e., if independence is compared to linearity, then being dependent is the same as being nonlinear, and like nonlinearity, it lacks a formal mathematical definition, and thus, a well accepted computational foundation (Mari and Kotz 2001). However, in many aspects of science, it is not enough to observe if two random variables are independent; it is required to understand how much dependence the two random variables share, or, given two pairs of random variables, whether one pair shares more dependence than the other; e.g., in variable selection (Guyon and Elisseeff 2003). Like nonlinearity, dependence between two random variables can originate from many different perspectives. Therefore, when addressing dependence it is more important to understand why a pair of variables is more dependent than another
pair of variables, than just assessing if they are independent or not. This, rather ill-posed, question has attracted immense attention in the statistical literature in the last decades (Renyi 1959; Lehmann 1966; Schweizer and Wolff 1981; Nelsen 2002). The two most popular notions of dependence that have been extensively used in science are correlation, due to its simplicity, and mutual information (MI), due to its intuitive understanding through the colloquial concept of uncertainty (Shannon and Weaver 1949). However, both these concepts fall short in quantifying dependence in its full understanding and potential; e.g., correlation, due to its linear nature, and MI, due to its difficulty in estimation. Therefore, the quest of understanding dependence in the context of an application, and of quantifying it reliably and efficiently from realizations, still remains active.

Conditional dependence, again, is a generalization of the concept of dependence, and in the popular view, it is the mere absence of conditional independence (Fukumizu et al. 2008). However, quantifying this concept has recently received considerable effort, due to its link to, again, causation. While conditional independence infers non-causation, conditional dependence potentially suggests the strength of causal influence (Nolte et al. 2008). Similar to dependence, the two quantifiers of conditional dependence that are mostly used in practice are the partial correlation, an extension of correlation, and the conditional mutual information, a generalization of mutual information (Joe 1989), which fail to capture the notion of conditional dependence in its completeness; therefore, conditional dependence still remains a largely unexplored area of research, perhaps the least studied of the four.

1.2 Applications

Besides their extensive colloquial use, the applications of these four concepts are far-reaching. For example, independence is essential for defining independent component analysis (ICA). It separates ICA from the more general, and often ill-posed, problem of blind source separation (BSS), and provides a computationally feasible solution (Bach and Jordan 2002; Shen et al. 2009). Dependence, on the other hand, is useful in the context
of variable selection, where it is necessary to weigh the contributions of several input variables on the output variable, and select only the important variables to avoid the curse of dimensionality and to improve generalization (Bontempi and Meyer 2010). Conditional independence and dependence, as mentioned earlier, play a crucial role in detecting the existence and strength of (Granger) causation (Schreiber 2000). However, they can also be exploited in graphical modeling in the context of inferring a more general notion of causality; i.e., not in the sense of Granger but causality without involving time, which is an emerging research area (Edwards 2000).

Another typical application of dependence and conditional dependence is the investigation of functional and effective connectivity among brain regions (Friston 1995). How different regions of the brain communicate while performing a particular task has been among the most intriguing questions to the neuroscientist in the last century. It is evident that answering this question requires assessing the dependence between the signals originating from different regions, and also exploring the (Granger) causal influence of these regions on each other. However, due to the inherent difficulties in assessing these attributes, the use of simpler methods that only capture second order statistical structure, such as correlation for quantifying dependence and linear Granger causality for quantifying causation, remains predominant. Therefore, the development of novel understanding and tools to analyze these problems opens up the possibility of new discoveries.

1.3 Issues

From an application perspective, however, the present view of understanding, quantifying and applying these four concepts remains incomplete. Some particular issues of the current view can be summarized as follows.

1.3.1 Suitability

A quantifier of these concepts should be problem specific. For example, consider the problem of ICA, where it is required to minimize a quantifier of independence as a cost function. First, since this problem only addresses real valued signals, it is sufficient to
employ a quantifier that is defined on real valued random variables. Second, since the objective of this cost function is to detect if two random variables are independent or not, it is sufficient to design a quantifier that is sensitive to departure from independence, more than anything else. Also, since ICA problems are usually linear, it is sufficient to use a quantifier that is sensitive to the linear mixing. Third, since the quantifier defines the performance surface over which the mixing operator is searched, the quantifier should not involve any free parameters that would change the shape of the performance surface, and thus, the position of the local minima. Finally, since this quantifier needs to be evaluated in each iteration, it should be easy to compute to reduce the computational load. However, these simple requirements are often overlooked in practice, and a quantifier of independence -- perhaps better in other aspects -- is used that might not be suitable for this particular application, e.g. (Suzuki and Sugiyama 2011).

On the other hand, consider the example of variable selection, where it is required to judge the importance of a variable for regression or classification using a quantifier of dependence as an evaluation function. First, since the objective here is to compare the contributions of two random variables over a third random variable, it is not sufficient to employ a quantifier of independence. Second, since regression and classification can be regarded as establishing a functional relationship between random variables, a quantifier of dependence should be sensitive to perturbation in the functional relationship, more than anything else. Finally, since the variable selection step is followed by a learning machine, it is required that the capacity of the quantifier matches that of the learning machine, i.e., the quantifier should not be too relaxed or too strict in selecting a functional relationship. These requirements are often ignored in practice, and a quantifier of dependence is employed without evaluating its complete characteristics (Song et al. 2007b).

1.3.2 Estimation

The quantifiers of these concepts can be expressed either through the random variables (i.e. a measure) or through the realizations (i.e. an estimator). It is a common
practice to express the quantifiers as a function of the random variables or the probability law, and then to estimate them from realizations, since in practice a user only has access to the realizations. In the popular view, an estimator is regarded as an approximation of the measure, and an extensive amount of research is devoted toward finding estimators that approximate the value of the measure as closely as possible using the least number of realizations possible. Most estimators are designed to be consistent, i.e., in theory they always reach arbitrarily close to the measure given sufficient realizations (Lehmann and Casella 1998).

However, this view does not address the issue of whether the estimator follows the same characteristics as the measure, e.g. does it reach its minimum or maximum value under the same conditions, or is it invariant to the same class of transformations? These questions are unavoidable since in practice either a user hardly has enough realizations to reach arbitrarily close to the measure, or a good estimator that can reach the value of the measure effectively is computationally prohibitive (Granger et al. 2004), and thus, crude estimators are often employed where the estimated value usually differs a lot from the measure (Peng et al. 2005). Therefore, it is essential, at least from an application perspective, to understand what attribute of the realizations is exactly quantified by these estimators; e.g., for a quantifier of independence, it should be well understood when the estimated value is minimum, and for a quantifier of dependence, it should be well understood when the estimated value increases or decreases or reaches its maximum.

Moreover, an estimator often involves free parameters, such as the kernel size in the Parzen estimate. Free parameter(s) are necessary in practice to regularize an ill-posed problem such as estimating the Radon-Nikodym derivative (Perez-Cruz 2008). But they obscure the understanding of the quantifier, and choosing the best values of these parameters is computationally cumbersome. In summary, there exists a gap between what a measure promises and what an estimator can achieve, and this is often overlooked in practice.

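The sensitivity to free parameters described above is easy to demonstrate. The sketch below is a minimal illustration, not code from the dissertation; the function name `parzen_density` and the bandwidth values are chosen purely for the example. It evaluates a Gaussian Parzen window density estimate of the same sample at the same point for three kernel sizes: the same realizations produce visibly different answers, which is precisely why a free parameter obscures what a quantifier measures.

```python
import numpy as np

def parzen_density(x, samples, h):
    """Parzen window (kernel density) estimate at the point x,
    using a Gaussian kernel of size (bandwidth) h."""
    z = (x - samples) / h
    return np.mean(np.exp(-0.5 * z ** 2)) / (h * np.sqrt(2 * np.pi))

# 500 draws from a standard normal; the true density at 0 is about 0.3989.
rng = np.random.default_rng(0)
samples = rng.standard_normal(500)

# The same data give noticeably different estimates as the kernel size
# varies -- the free-parameter sensitivity discussed above.
for h in (0.05, 0.5, 5.0):
    print(f"h = {h}: estimate at 0 = {parzen_density(0.0, samples, h):.4f}")
```

A small kernel size yields a noisy estimate, a large one an oversmoothed estimate; neither regime is announced by the estimator itself, so the value reported depends as much on h as on the data.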
1.3.3Abstraction Traditionally,thesefourconceptshavemostlybeenexplor edinthecontextofreal orvectorvaluedrandomvariablesandrealizations.Theyca nbeperceivedmoreeasily intheEuclideanspaceduetotheexistenceofdirection(oro rdering)anddimension.For example,toanaverageuserdependencecanbeperceivedasal inearornonlineartracein theEuclideanspace.However,inrecentyears,moreabstrac tdata-types,suchasstrings, trees,graphs,andspiketrains,havebecomeincreasinglyc ommonintheeldofmachine learninganddatamining( Grettonetal. 2008 ).Thesedata-typesdonotbelongtoan Euclideanspace,andtherefore,itishardertoperceivewha ttheseconceptsimplyinthis context.Usuallytheavailablenotionsofdependenceorcon ditionaldependencecannot bedenedonthesedata-types,orifdened,theycannotbees timatedusingtraditional approach.Thislackofabstractionisoftenignoredinpract ice,andthesemoreexotic data-typesareusuallyrepresentedinEuclideanspaceforp rocessingwhichdestroystheir originalstructure( VictorandPurpura 1997 ; Victor 2002 ). 1.3.4Scalability Withtherecentadvancesinsensortechnology,theamountof datacollectedfroman experimenthasseenanexponentialgrowthovertherecentye ars;bothintermsofnumber ofobservationsanddimensionality.Forexample,anEEGrec odingdeviceoftenemploys 129 sensorstorecordsignalsat 1 kHzforseveralminutes.Althoughsimpleralgorithms thatexploitsecondorderstatistic,e.g.,linearGrangerc ausality,canbeeasilyusedto exploresuchvastamountofdata,themoresophisticatedmet hods,e.g.methodsbased onconditionalindependenceusuallysuerfromlackofscal abilityduetoeithertheir computationalcostortheirrequirementofchoosingfreepa rameters( SuandWhite 2008 ). Therefore,thereexistsacleartrade-obetweensimplicit yandscalability,whichisusually overlookedinpractice. 20


1.4 Contribution

This dissertation enhances the current understanding of these concepts on different frontiers by addressing these issues. In chapter 2, I discuss some preliminary work on addressing the issue of independence, using the concept of correntropy (Rao et al. 2011), and dependence, using the concept of mean square contingency, to elaborate the issues presented in the introduction and to motivate the methods presented in the later chapters.

Quantifiers of independence have received considerable attention in the literature, and there exists a multitude of measures with different characteristics. Therefore, I address this quantification issue from an application perspective, in particular in the context of ICA in chapter 3. I show that although several seemingly unrelated quantifiers have been proposed in the last decade to perform ICA, these quantifiers actually originate from a unified framework. Then, I explore a novel quantifier following this framework that has not been explored in ICA. An attractive property of this quantifier is that it is parameter free, and I experimentally show that it performs equally well compared to other state-of-the-art methods. I also address in detail the issue of trade-off between computational complexity and performance that exists in the independence based ICA methods.

Quantifiers of conditional independence, as substitutes of linear Granger causality, have received comparatively less attention in the literature due to the inherent difficulty in assessing conditional attributes such as the conditional density function. Therefore, I propose a new quantifier of conditional independence in chapter 4. Some attractive properties of this approach are its simplicity and robustness. Assessing conditional independence usually requires free parameter(s) since estimating a conditional attribute is an ill-posed problem, and the proposed approach is no exception. However, research on this area focuses on designing quantifiers that are robust to the selection of parameters and yet efficient in


inferring conditional independence correctly from a small number of realizations. I show that the proposed method satisfies this objective.

The concept of dependence and its quantifiers are not well understood in practice. Therefore, I address a novel framework for understanding and quantifying dependence in chapter 5. Some attractive properties of this framework are that, first, it captures dependence from the perspective of the realizations rather than the random variables, a view that is usually non-existent in the literature; second, it can be applied in any arbitrary metric space, not just Euclidean; and third, it is parameter free. I show the effectiveness of this approach on several synthetic and real data, and also discuss its applicability in variable selection and causal inference. In this chapter, I also provide an extensive survey on existing views of quantifying dependence and address their drawbacks.

The concept of conditional dependence is hardly addressed in the literature. Therefore, in chapter 6, I extend the method described in the previous chapter to address this issue. Again, I address conditional dependence from the perspective of the realizations. Some attractive properties of this approach are its scalability and robustness. Also, it does not require selecting a free parameter. I demonstrate the effectiveness of this approach through both real and simulated examples. Here I also propose a novel scheme of generating surrogate data for the test of conditional independence.

In summary, this dissertation addresses the following aspects to enhance the understanding and quantification of these four concepts. First, it discusses the applicability of these measures to particular problems. Second, it constructs simple, robust and/or parameter free quantifiers. Third, it explores some of these concepts from the perspective of the realizations rather than just random variables or the probability law. This dissertation is a collection of four papers that have been assembled as individual chapters. These papers are (Seth et al. 2011), (Seth and Príncipe 2011c), (Seth and Príncipe 2011b) and (Seth and Príncipe 2011a).


CHAPTER 2
BACKGROUND

In this chapter, I discuss some initial works on independence and dependence that lay the motivation and foundation of the following chapters. I start by discussing the concept of correntropy, a generalized similarity measure, and show how it can be exploited to design a measure of independence. Next, I address the postulates of dependence as proposed by Rényi, and show how a statistic satisfying these postulates can be estimated in practice. Finally, I discuss some issues with these approaches.

2.1 Correntropy

Correntropy is a generalization of correlation that extracts not only the second order information but also higher order moments of the joint distribution (Santamaría et al. 2006; Liu et al. 2007). In the last few years, this concept has been successfully applied in several engineering applications such as time series modeling (Park and Príncipe 2008; Xu et al. 2008b), nonlinearity testing (Gunduz and Príncipe 2009), object recognition (Jeong et al. 2009) and independent component analysis (Li et al. 2007). Although correntropy is similar to correlation by definition, recent studies have shown that it performs better than correlation while dealing with nonlinear systems and non-Gaussian noise environments, without any significant increase in the computational cost.

Before proceeding with the idea of correntropy, I introduce the concept of non-negative definiteness and state Bochner's theorem, which will be essential in proving the properties of correntropy.

Definition 1. (Non-negative definite functions) A complex valued function $\kappa(x, y)$, defined on some set $X \times X$, is said to be non-negative definite if
$$\sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \bar{\alpha}_j \kappa(x_i, x_j) \ge 0$$
whenever $\{x_1, x_2, \dots, x_n\}$ is a finite subset of $X$ and $\{\alpha_1, \alpha_2, \dots, \alpha_n\}$ is a finite set of complex numbers.


The use of non-negative definite functions in engineering applications has recently gained immense popularity due to the advent of kernel machines (Schölkopf and Smola 2002). However, the use of non-negative definite kernels in this chapter is inspired by the relation between non-negative definite functions and positive measures on the real line (Bochner 1941).

Theorem 2.1. (Bochner's theorem) Every continuous non-negative definite function $\kappa(\cdot)$ on the real line has the following representation,
$$\kappa(z) = \int e^{i\rho z}\,\mu(\mathrm{d}\rho),$$
for some finite positive measure $\mu$ on $\mathbb{R}$; that is, $\kappa(\cdot)$ is the Fourier transform of a positive measure $\mu$.

We will use this particular property of a non-negative definite kernel to prove the properties of correntropy and related quantities.

Let $\kappa(\cdot, \cdot)$ be a real valued, continuous, symmetric and non-negative definite kernel; then correntropy is defined in the following way.

Definition 2. (Correntropy) Given two random variables $X$ and $Y$, correntropy is defined as
$$V(X, Y) = E_{XY}[\kappa(X, Y)] = \iint \kappa(x, y)\,\mathrm{d}F_{XY}(x, y) \qquad (2\text{--}1)$$
where $E$ is the expectation operator and $F_{XY}(x, y)$ is the joint probability distribution function of $X$ and $Y$.

Given realizations $\{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, correntropy can be estimated using the strong law of large numbers, and therefore the estimation is consistent. It can be seen that if $\kappa(x, y) = xy$ then correntropy actually becomes correlation¹ (see Figure 2-1). However,

¹ The correlation kernel $\kappa(x, y) = xy$ is also a non-negative definite kernel. However, it is not shift invariant but a separable kernel (Genton 2001).


Figure 2-1. Kernels used for defining correlation and correntropy. A) Kernel for correlation. B) Kernel (Gaussian) for correntropy.

in this chapter we only concentrate on shift invariant kernels of the form $\kappa(x, y) = \kappa(x - y)$ that obey Bochner's theorem.

Definition 3. (Centered correntropy) Given two random variables $X$ and $Y$, centered correntropy is defined as
$$U(X, Y) = E_{XY}[\kappa(X, Y)] - E_X E_Y[\kappa(X, Y)] = \iint \kappa(x, y)\,\{\mathrm{d}F_{XY}(x, y) - \mathrm{d}F_X(x)\,\mathrm{d}F_Y(y)\} \qquad (2\text{--}2)$$
where $F_X(x)$ and $F_Y(y)$ are the marginal probability distribution functions of $X$ and $Y$ respectively.

Given realizations, centered correntropy can again be estimated using the strong law of large numbers. Also, note that if $\kappa(x, y) = xy$ then centered correntropy reduces to covariance.

Definition 4. (Correntropy coefficient) Given two random variables $X$ and $Y$, neither of them being a constant with probability $1$, the correntropy coefficient is defined as
$$\eta(X, Y) = \frac{U(X, Y)}{\sqrt{U(X, X)\,U(Y, Y)}} \qquad (2\text{--}3)$$
where $U(X, Y)$ is the centered correntropy of $X$ and $Y$, and $U(X, X)$ and $U(Y, Y)$ are the centered autocorrentropies of $X$ and $Y$ respectively.


Correntropy, centered correntropy and the correntropy coefficient exhibit very similar properties to those of correlation, covariance and the correlation coefficient. Below we state some of these properties to demonstrate this fact.

Property 1. $V(X, X) > 0$ and $U(X, X) \ge 0$, with $U(X, X) = 0$ if and only if $X$ is degenerate.

Proof. $V(X, X) = \int \kappa(x, x)\,\mathrm{d}F_X(x) = \kappa(0) > 0$, and
$$\begin{aligned}
U(X, X) &= \kappa(0) - \iint \kappa(x - y)\,\mathrm{d}F_X(x)\,\mathrm{d}F_X(y) \\
&= \kappa(0) - \iint \mathrm{d}F_X(x)\,\mathrm{d}F_X(y) \int e^{i\rho(x - y)}\,\mu(\mathrm{d}\rho) \\
&= \kappa(0) - \int \mu(\mathrm{d}\rho) \int e^{i\rho x}\,\mathrm{d}F_X(x) \overline{\int e^{i\rho y}\,\mathrm{d}F_X(y)} \\
&= \kappa(0) - \int \mu(\mathrm{d}\rho) \left| \int e^{i\rho x}\,\mathrm{d}F_X(x) \right|^2 \ge 0.
\end{aligned}$$
If $U(X, X) = 0$ then we have
$$\kappa(0) = \int \mu(\mathrm{d}\rho) \left| \int e^{i\rho x}\,\mathrm{d}F_X(x) \right|^2,$$
which implies
$$\left| \int e^{i\rho x}\,\mathrm{d}F_X(x) \right| = 1$$
almost everywhere. Therefore, if $0$ is a limit point of the support of $\mu$, then we must have $X \equiv c$, where $c$ is a constant, i.e., $X$ is degenerate. This also implies that if $X$ is not a degenerate random variable then we have $U(X, X) > 0$.

Property 2. Both $V(X, Y)$ and $U(X, Y)$ are symmetric non-negative definite functions on the space of random variables.


Proof. The symmetry follows from the symmetry of the kernel. Let $\alpha_1, \dots, \alpha_n \in \mathbb{C}$ and $X_1, \dots, X_n \in \mathcal{X}$; then
$$\sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \bar{\alpha}_j V(X_i, X_j) = E\left[ \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \bar{\alpha}_j \kappa(X_i - X_j) \right] \ge 0,$$
and
$$\begin{aligned}
\sum_{j=1}^{n} \sum_{k=1}^{n} \alpha_j \bar{\alpha}_k U(X_j, X_k) &= \sum_{j=1}^{n} \sum_{k=1}^{n} \alpha_j \bar{\alpha}_k \iint \{\mathrm{d}F_{X_j X_k}(x, y) - \mathrm{d}F_{X_j}(x)\,\mathrm{d}F_{X_k}(y)\} \int \mu(\mathrm{d}\rho)\, e^{i\rho(x - y)} \\
&= \int \mu(\mathrm{d}\rho) \sum_{j=1}^{n} \sum_{k=1}^{n} E\left[ \alpha_j \left( e^{i\rho X_j} - E[e^{i\rho X_j}] \right) \overline{\alpha_k \left( e^{i\rho X_k} - E[e^{i\rho X_k}] \right)} \right] \\
&= \int \mu(\mathrm{d}\rho)\, E\left[ \left| \sum_{k=1}^{n} \alpha_k \left( e^{i\rho X_k} - E[e^{i\rho X_k}] \right) \right|^2 \right] \ge 0.
\end{aligned}$$

Property 3. $|V(X, Y)| \le \sqrt{V(X, X)\,V(Y, Y)}$ and $|U(X, Y)| \le \sqrt{U(X, X)\,U(Y, Y)}$.

Proof.
$$\begin{aligned}
|V(X, Y)| &= \left| \iint \kappa(x - y)\,\mathrm{d}F_{XY}(x, y) \right| \le \int \mu(\mathrm{d}\rho) \left| \iint e^{i\rho(x - y)}\,\mathrm{d}F_{XY}(x, y) \right| \\
&\le \int \mu(\mathrm{d}\rho) = \kappa(0) = \sqrt{\kappa(0)}\sqrt{\kappa(0)} = \sqrt{V(X, X)}\sqrt{V(Y, Y)},
\end{aligned}$$
and
$$\begin{aligned}
|U(X, Y)| &= \left| \iint \kappa(x - y)\,\{\mathrm{d}F_{XY}(x, y) - \mathrm{d}F_X(x)\,\mathrm{d}F_Y(y)\} \right| \\
&\le \int \mu(\mathrm{d}\rho) \left| \iint e^{i\rho(x - y)}\,\{\mathrm{d}F_{XY}(x, y) - \mathrm{d}F_X(x)\,\mathrm{d}F_Y(y)\} \right| \\
&= \int \mu(\mathrm{d}\rho) \left| \operatorname{cov}\!\left( e^{i\rho X}, e^{i\rho Y} \right) \right|
\end{aligned}$$


$$\begin{aligned}
&\le \int \mu(\mathrm{d}\rho)\, \sqrt{\operatorname{var}(e^{i\rho X})}\, \sqrt{\operatorname{var}(e^{i\rho Y})} \le \sqrt{\int \operatorname{var}(e^{i\rho X})\,\mu(\mathrm{d}\rho)}\, \sqrt{\int \operatorname{var}(e^{i\rho Y})\,\mu(\mathrm{d}\rho)} \\
&= \sqrt{U(X, X)}\, \sqrt{U(Y, Y)}.
\end{aligned}$$

Corollary 1. $-1 \le \eta(X, Y) \le 1$.

Property 4. $\eta(X, Y) = 1$ if and only if $Y = X$.

Proof. ($\Leftarrow$) is straightforward. Next, consider
$$\begin{aligned}
U(X, Y) &= \int \mu(\mathrm{d}\rho)\, \operatorname{cov}\!\left( e^{i\rho X}, e^{i\rho Y} \right) \qquad &(2\text{--}4) \\
&\le \int \mu(\mathrm{d}\rho)\, \sqrt{\operatorname{var}[e^{i\rho X}]}\, \sqrt{\operatorname{var}[e^{i\rho Y}]} \qquad &(2\text{--}5) \\
&\le \left[ \int \mu(\mathrm{d}\rho)\, \operatorname{var}[e^{i\rho X}] \right]^{1/2} \left[ \int \mu(\mathrm{d}\rho)\, \operatorname{var}[e^{i\rho Y}] \right]^{1/2} \qquad &(2\text{--}6) \\
&= \sqrt{U(X, X)}\, \sqrt{U(Y, Y)}.
\end{aligned}$$
When $\eta(X, Y) = 1$, the inequalities in the above equations turn into equalities. Therefore, equality of eq. (2--5) and eq. (2--6) implies
$$\operatorname{var}[e^{i\rho X}] = \lambda^2 \operatorname{var}[e^{i\rho Y}] \qquad (2\text{--}7)$$
for almost all $\rho$, where $\lambda$ is a constant, and since we assume that the support of $\mu$ is the real line, this holds for all $\rho$. Equality of eq. (2--4) and eq. (2--5) implies
$$\operatorname{cov}\!\left( e^{i\rho X}, e^{i\rho Y} \right) = \sqrt{\operatorname{var}[e^{i\rho X}]}\, \sqrt{\operatorname{var}[e^{i\rho Y}]}$$
and therefore
$$e^{i\rho X} - E[e^{i\rho X}] = r(\rho)\left( e^{i\rho Y} - E[e^{i\rho Y}] \right), \quad r(\rho) > 0, \qquad (2\text{--}8)$$
where $r(\rho)$ is constant, and from eq. (2--7) and eq. (2--8) we get $r(\rho) = \lambda$.


Therefore, we have
$$e^{i\rho X} - E[e^{i\rho X}] = \lambda \left( e^{i\rho Y} - E[e^{i\rho Y}] \right) \qquad (2\text{--}9)$$
for all $\rho$. Multiplying both sides of eq. (2--9) by $f \in L^1$ and integrating with respect to $\rho$ yields
$$\hat{f}(X) - E[\hat{f}(X)] = \lambda \left( \hat{f}(Y) - E[\hat{f}(Y)] \right)$$
almost everywhere, where
$$\hat{f}(x) = \int e^{i\rho x} f(\rho)\,\mathrm{d}\rho.$$
This set of functions is an algebra, and we can thus approximate all continuous functions vanishing at infinity uniformly. Thus we have
$$\psi(X) - E[\psi(X)] = \lambda \left( \psi(Y) - E[\psi(Y)] \right)$$
for all continuous functions $\psi(\cdot)$ and hence, taking limits, all Borel functions. In particular, if $A$ is a Borel subset of $\mathbb{R}$,
$$1_A(X) - P(X \in A) = \lambda \left( 1_A(Y) - P(Y \in A) \right) \qquad (2\text{--}10)$$
almost everywhere. Let $A$ be such that $0 < P(X \in A) < 1$. Now eq. (2--10) holds with probability $1$. Taking an $\omega$ such that $X(\omega) \in A$ and an $\tilde{\omega}$ such that $X(\tilde{\omega}) \notin A$, we get
$$P(X \in A^c) = \lambda \left( 1_A(Y(\omega)) - P(Y \in A) \right) \qquad (2\text{--}11)$$
$$-P(X \in A) = \lambda \left( 1_A(Y(\tilde{\omega})) - P(Y \in A) \right). \qquad (2\text{--}12)$$
As $0 \le P(Y \in A) \le 1$, eq. (2--11) forces $1_A(Y(\omega)) = 1$ and eq. (2--12) forces $1_A(Y(\tilde{\omega})) = 0$. Thus,
$$P(X \in A^c) = \lambda\, P(Y \in A^c) \qquad (2\text{--}13)$$
$$P(X \in A) = \lambda\, P(Y \in A). \qquad (2\text{--}14)$$


Adding equations (2--13) and (2--14), we get $\lambda = 1$. Thus we have
$$e^{i\rho X} - E[e^{i\rho X}] = e^{i\rho Y} - E[e^{i\rho Y}]$$
for all $\rho$. Taking derivatives on both sides twice and letting $\rho = 0$, we get
$$X - E[X] = Y - E[Y] \qquad (2\text{--}15)$$
$$X^2 - E[X^2] = Y^2 - E[Y^2]. \qquad (2\text{--}16)$$
Computing $X^2$ from eq. (2--15) and substituting it in eq. (2--16), we get
$$2Y\left( E[X] - E[Y] \right) = \left( E[X^2] - E[Y^2] \right) - \left( E[X] - E[Y] \right)^2,$$
which is a first order equation in $Y$ if $E[X] \ne E[Y]$. Solving this equation would give only one value of $Y$, indicating that $Y$ is a degenerate random variable. But this is a contradiction. Therefore $E[X] = E[Y]$, and eq. (2--15) then gives $X = Y$.

These properties are very similar to the properties of correlation and related statistics, and play a crucial role in many applications where correlation is replaced with correntropy. In this chapter we explore its applicability as a measure of independence. Interestingly enough, the properties exhibited by correntropy and correlation are very similar in this context, too. We show this in the following propositions.

Proposition 2.1. If $X$ and $Y$ are jointly normal random variables, then the centered correntropy of $X - E[X]$ and $Y - E[Y]$ is zero if and only if the random variables are independent.

Proof. If $X$ is a normal random variable with mean $m$ and variance $\sigma^2$, then
$$E[e^{i\rho X}] = \exp\left( i\rho m - \frac{\rho^2 \sigma^2}{2} \right). \qquad (2\text{--}17)$$


Let $X$ and $Y$ be jointly normal with $X \sim N(m_1, \sigma_1^2)$ and $Y \sim N(m_2, \sigma_2^2)$; then $X - Y$ is also normally distributed and $X - Y \sim N(m_1 - m_2,\, \sigma_1^2 + \sigma_2^2 - 2\operatorname{cov}(X, Y))$. Therefore $X - m_1 \sim N(0, \sigma_1^2)$, $Y - m_2 \sim N(0, \sigma_2^2)$ and $(X - m_1) - (Y - m_2) \sim N(0,\, \sigma_1^2 + \sigma_2^2 - 2\operatorname{cov}(X, Y))$. Using eq. (2--17) we find
$$\begin{aligned}
U(X - m_1, Y - m_2) &= \int \mu(\mathrm{d}\rho) \left[ \exp\left( -\frac{\rho^2(\sigma_1^2 + \sigma_2^2)}{2} + \rho^2 \operatorname{cov}(X, Y) \right) - \exp\left( -\frac{\rho^2(\sigma_1^2 + \sigma_2^2)}{2} \right) \right] \\
&= \int \mu(\mathrm{d}\rho) \exp\left( -\frac{\rho^2(\sigma_1^2 + \sigma_2^2)}{2} \right) \left[ \exp\left( \rho^2 \operatorname{cov}(X, Y) \right) - 1 \right].
\end{aligned}$$
Thus $U(X - m_1, Y - m_2) > 0$ if $\operatorname{cov}(X, Y) > 0$, $U(X - m_1, Y - m_2) < 0$ if $\operatorname{cov}(X, Y) < 0$, and $U(X - m_1, Y - m_2) = 0$ if $\operatorname{cov}(X, Y) = 0$. Therefore, if $(X, Y)$ is jointly normal, then $X - m_1$ and $Y - m_2$, and hence $X$ and $Y$, are independent if and only if $U(X - m_1, Y - m_2) = 0$.

Proposition 2.2. Zero centered correntropy does not imply independence.

Proof. We prove this using a counterexample. Let $a_n > 0$, $a_n = a_{-n}$ and $\sum_{n=-\infty}^{\infty} a_n = 1$, that is, $a_0 + 2\sum_{n=1}^{\infty} a_n = 1$. Let $\mu$ put mass $a_n$ at $n$. Then
$$\kappa(x) = \int e^{2\pi i x \rho}\,\mu(\mathrm{d}\rho) = \sum_{n=-\infty}^{\infty} e^{2\pi i x n} a_n = a_0 + \sum_{n=1}^{\infty} \left( e^{2\pi i x n} + e^{-2\pi i x n} \right) a_n = a_0 + 2\sum_{n=1}^{\infty} \cos(2\pi x n)\, a_n.$$
Here we change the definition of the kernel slightly by introducing an extra $2\pi$ term to simplify the proof. Note that $\kappa(\cdot)$ is non-negative definite but not necessarily positive, and $\kappa(m) = a_0 + 2\sum_{n=1}^{\infty} a_n = 1$ for all $m = 0, \pm 1, \pm 2, \dots$. Thus for any probability measure $F$ concentrated on points $(m, n)$, $m, n = 0, \pm 1, \pm 2, \dots$, we have
$$\iint \kappa(x - y)\,\mathrm{d}F(x, y) = \sum_{m,n} \kappa(m - n)\, F(\{(m, n)\}) = \sum_{m,n} F(\{(m, n)\}) = 1.$$
In particular, for any two such measures $F$ and $G$ (for example, a joint distribution and the product of its marginals), we see that
$$\iint \kappa(x - y)\,\{\mathrm{d}F(x, y) - \mathrm{d}G(x, y)\} = 0.$$
This completes the proof.
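Proposition 2.1 lends itself to a quick numerical check. The sketch below (plain Python; the Gaussian kernel width, sample size, and function names are my own illustrative choices, and the estimator is the naive $O(N^2)$ plug-in) draws jointly Gaussian pairs and shows that the sign of the estimated centered correntropy of the mean-centered variables tracks the sign of the covariance:

```python
import math
import random

def centered_u(x, y, sigma=1.0):
    # Plug-in estimate of U(X, Y): mean of kappa(x_i - y_i) minus the
    # mean of kappa(x_j - y_k) over all N^2 pairs (Gaussian kernel).
    k = lambda z: math.exp(-z * z / (2.0 * sigma * sigma))
    n = len(x)
    joint = sum(k(a - b) for a, b in zip(x, y)) / n
    marginal = sum(k(a - b) for a in x for b in y) / (n * n)
    return joint - marginal

random.seed(3)
n = 400
x = [random.gauss(0, 1) for _ in range(n)]
e = [random.gauss(0, 1) for _ in range(n)]   # noise independent of x
xm = sum(x) / n
xc = [v - xm for v in x]                      # sample version of X - E[X]

def centered_gaussian_y(rho):
    # Y = rho*X + sqrt(1 - rho^2)*E is jointly normal with X, cov = rho.
    y = [rho * xi + math.sqrt(1.0 - rho * rho) * ei for xi, ei in zip(x, e)]
    ym = sum(y) / n
    return [v - ym for v in y]

print(centered_u(xc, centered_gaussian_y(0.8)))    # positive
print(centered_u(xc, centered_gaussian_y(-0.8)))   # negative
print(centered_u(xc, centered_gaussian_y(0.0)))    # near zero
```

As Proposition 2.2 warns, a near-zero value here rules out only this particular kind of dependence; it does not certify independence in general.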


From these propositions, what we find is that, first, although centered correntropy is zero under independence, zero centered correntropy does not imply independence, and second, for Gaussian random variables independence can be inferred from centered correntropy. These properties are similar to those of covariance. However, correntropy is conceptually different from correlation, and perhaps better than the latter in the context of detecting independence. For example, consider the following situation. If $X$ and $Y$ are random variables, then $\operatorname{cov}(X - a, Y) = \operatorname{cov}(X, Y)$ for all $a$. Therefore, $\operatorname{cov}(X, Y) = 0$ implies $\operatorname{cov}(X - a, Y) = 0$ for all $a$. Thus, no information is obtained from the parameter $a$. Suppose, on the other hand, that $U(X - a, Y) = 0$ for all $a$. Then
$$\int e^{-i\rho a}\, \operatorname{cov}\!\left( e^{i\rho X}, e^{i\rho Y} \right) \mu(\mathrm{d}\rho) = 0$$
for all $a$. This implies
$$E[e^{i\rho(X - Y)}] = E[e^{i\rho X}]\, E[e^{-i\rho Y}]$$
almost everywhere. To appreciate the "degree" of independence this implies, let us assume that all the quantities, in $\rho$, above are entire and that the support of $\mu$ has a limit point. Then, equating the coefficients of $\rho^n$ on both sides of the equation, we get, for all $n$,
$$E[(X - Y)^n] = \sum_{k=0}^{n} \frac{n!}{k!\,(n-k)!}\, E[X^k (-Y)^{n-k}] = \sum_{k=0}^{n} \frac{n!}{k!\,(n-k)!}\, E[X^k]\, E[(-Y)^{n-k}],$$
which implies that the random variables are "almost independent". This example shows that correntropy conveys more information about independence than uncorrelatedness. But we still cannot infer independence exactly, i.e., in a mathematical sense. However, as shown in (Li et al. 2007), centered correntropy is an appropriate contrast function for independent component analysis (ICA). Next, we discuss the necessary steps that would lead to an exact test of independence based on correntropy. In order to design an exact test of independence from centered correntropy, we introduce the concept of parametric centered correntropy.


Definition 5. (Parametric centered correntropy) Given two random variables $X$ and $Y$, the parametric centered correntropy is defined as
$$U_{a,b}(X, Y) = E_{XY}[\kappa(aX + b, Y)] - E_X E_Y[\kappa(aX + b, Y)] = \iint \kappa(ax + b, y)\,\{\mathrm{d}F_{XY}(x, y) - \mathrm{d}F_X(x)\,\mathrm{d}F_Y(y)\}$$
where $a$ and $b$ are scalars in $\mathbb{R}$ and $a \ne 0$.

Using the concept of parametric centered correntropy, we can design a test of independence due to the following lemma.

Lemma 1. (Zero parametric centered correntropy and independence) Given two random variables $X$ and $Y$, the parametric centered correntropy $U_{a,b}(X, Y) = 0$ for all $a, b \in \mathbb{R}$ if and only if $X$ and $Y$ are independent.

Proof. The sufficient condition is straightforward. For the necessary condition, assume $U_{a,b}(X, Y) = 0$ for all $a, b \in \mathbb{R}$. From the definition we have
$$U_{a,b}(X, Y) = \iint \kappa(ax + b - y)\,\{\mathrm{d}F_{XY}(x, y) - \mathrm{d}F_X(x)\,\mathrm{d}F_Y(y)\} = \iint \kappa(ax + b - y)\,\mathrm{d}Q(x, y)$$
where $Q(x, y) = F_{XY}(x, y) - F_X(x)\,F_Y(y)$. Therefore, to prove independence we need to show that $\mathrm{d}Q(x, y) = 0$. Now, $U_{a,b}(X, Y) = 0$ for all $a, b \in \mathbb{R}$ implies
$$\int e^{i\rho b} \left[ \iint e^{i\rho(ax - y)}\,\mathrm{d}Q(x, y) \right] \mu(\mathrm{d}\rho) = 0$$
for all $a, b \in \mathbb{R}$, in particular for all $b \in \mathbb{R}$. Hence
$$\iint e^{i\rho(ax - y)}\,\mathrm{d}Q(x, y) = 0$$
for almost all $\rho$, as $\mu$ is always positive. Since the support of $\mu$ is $\mathbb{R}$, this holds for all $\rho$, and it holds for all $a \in \mathbb{R}$. This is easily written as
$$\iint e^{i(\alpha x + \beta y)}\,\mathrm{d}Q(x, y) = 0$$


for all $\alpha, \beta \in \mathbb{R}$. Thus we conclude that $\mathrm{d}Q = 0$.²

What this lemma states is that two random variables are independent if and only if the parametric centered correntropy is zero for all possible parameter values. Therefore, using the lemma we can define a test of independence as follows.

Definition 6. (Correntropy independence measure) Given two random variables $X$ and $Y$, the correntropy independence measure $\Gamma(X, Y)$ is defined as follows:
$$\Gamma(X, Y) = \sup_{a,b} |U_{a,b}(X, Y)| \qquad (2\text{--}18)$$
where $a, b \in \mathbb{R}$.

$\Gamma(X, Y)$ is a valid measure of independence since it is zero if and only if $X$ and $Y$ are independent. The proposed test of independence requires computing the parametric centered correntropy for different parameter values. Given realizations $\{(x_i, y_i)\}_{i=1}^{N}$, the estimator of centered correntropy is given by
$$\hat{U}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} \kappa(x_i - y_i) - \frac{1}{N^2} \sum_{j=1}^{N} \sum_{k=1}^{N} \kappa(x_j - y_k).$$
Therefore, a straightforward computation of centered correntropy requires $O(N^2)$ computation, which is expensive for many applications.³ However, the computational complexity can be reduced significantly by choosing the Laplacian kernel.

² The last line of the proof follows from the theory of the Fourier transform: if the Fourier transform of a measure is zero everywhere, then the measure is zero everywhere (Kankainen 1995).

³ Since $\kappa$ is non-negative definite, the estimator can be computed efficiently as described in (Seth and Príncipe 2009). However, in this chapter we present a different and, perhaps, more efficient method exploiting the fact that the proposed tests are bivariate, i.e., both $X$ and $Y$ are one dimensional random variables.
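As a concrete illustration, the measure in Definition 6 can be approximated by brute force: evaluate the plug-in estimate of $U_{a,b}$ on a finite parameter grid and take the largest absolute value. The sketch below is a minimal plain-Python version; the Laplacian kernel, the grid resolution, the parameter ranges, and the function names are illustrative choices of mine, not the dissertation's final settings, and the double sum is the naive $O(N^2)$ one.

```python
import math

def parametric_u(x, y, a, b):
    # Plug-in estimate of U_{a,b}(X, Y) = E[kappa(aX+b - Y)] - E_X E_Y[kappa(aX+b - Y)]
    # with the Laplacian kernel kappa(z) = exp(-|z|); naive O(N^2) double sum.
    n = len(x)
    z = [a * xi + b for xi in x]
    joint = sum(math.exp(-abs(zi - yi)) for zi, yi in zip(z, y)) / n
    marginal = sum(math.exp(-abs(zj - yk)) for zj in z for yk in y) / (n * n)
    return joint - marginal

def gamma_estimate(x, y, n_theta=20, bs=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    # Coarse grid over a = tan(theta) and offsets b, taking the supremum of |U_{a,b}|.
    best = 0.0
    for i in range(1, n_theta):
        theta = i * math.pi / n_theta
        if abs(math.cos(theta)) < 1e-9:      # theta = pi/2: a undefined
            continue
        a = math.tan(theta)
        if abs(a) < 1e-9:                    # the definition requires a != 0
            continue
        for b in bs:
            best = max(best, abs(parametric_u(x, y, a, b)))
    return best
```

On independent samples every grid value hovers near zero, so the supremum stays small; when $Y$ is close to a linear function of $X$, some grid point aligns $aX + b$ with $Y$ and the estimate is clearly positive.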


Consider $\kappa(z)$ to be a Laplacian kernel of the form $\kappa(z) = e^{-|z|}$; then
$$\hat{U}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} e^{-|x_i - y_i|} - \frac{1}{N^2} \sum_{j=1}^{N} \sum_{k=1}^{N} e^{-|x_j - y_k|}.$$
In this expression the first term can be computed in $O(N)$ time. To compute the second term we follow the approach proposed by (Chen 2006). Note that this term can be rewritten in the following way:
$$\sum_{j=1}^{N} \left[ e^{-x_j} \sum_{\{k:\, y_k \le x_j\}} e^{y_k} + e^{x_j} \sum_{\{k:\, y_k > x_j\}} e^{-y_k} \right].$$
Now, let us assume that $\{y_i\}_{i=1}^{N}$ is sorted in ascending order. If the sequence is not sorted, then it can be sorted in $O(N \log N)$ time using any optimal sorting algorithm. Then the expression can just be written as
$$\sum_{j=1}^{N} \left[ e^{-x_j} \sum_{k=1}^{K} e^{y_k} + e^{x_j} \sum_{k=K+1}^{N} e^{-y_k} \right]$$
where $K$ is chosen such that $y_K \le x_j$ and $y_{K+1} > x_j$. Now let us assume that we have the cumulative sums
$$\left\{ S_K = \sum_{k=1}^{K} e^{y_k} \right\}_{K=1}^{N} \quad \text{and} \quad \left\{ \bar{S}_K = \sum_{k=K+1}^{N} e^{-y_k} \right\}_{K=1}^{N}$$
of the sorted $\{y_i\}_{i=1}^{N}$. These sums can be computed in $O(N)$ time. Finally, using $\{e^{x_i}\}_{i=1}^{N}$, $\{e^{-x_i}\}_{i=1}^{N}$, $\{S_i\}_{i=1}^{N}$ and $\{\bar{S}_i\}_{i=1}^{N}$, the double sum can be computed in $O(N)$ time. Therefore, the overall complexity of computing centered correntropy using the Laplacian kernel becomes $O(N \log N)$ instead of $O(N^2)$.
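The cumulative-sum trick above can be sketched in a few lines of plain Python. This is a minimal version for illustration; the function name is mine, and no attempt is made to guard against overflow of $e^{x_j}$ for data with very large magnitude.

```python
import bisect
import math

def centered_correntropy_laplacian(x, y):
    # U-hat(X, Y) with kappa(z) = exp(-|z|), computed in O(N log N)
    # using sorted prefix sums in place of the naive O(N^2) double sum.
    n = len(x)
    first = sum(math.exp(-abs(xi - yi)) for xi, yi in zip(x, y)) / n

    ys = sorted(y)
    # S[K] = sum of e^{y_k} over the K smallest y's;
    # T[K] = sum of e^{-y_k} over the remaining (larger) y's.
    S = [0.0] * (n + 1)
    for k in range(n):
        S[k + 1] = S[k] + math.exp(ys[k])
    T = [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):
        T[k] = T[k + 1] + math.exp(-ys[k])

    double = 0.0
    for xj in x:
        K = bisect.bisect_right(ys, xj)          # number of y_k <= x_j
        double += math.exp(-xj) * S[K] + math.exp(xj) * T[K]
    return first - double / (n * n)
```

A quick check against the naive double sum on a few hundred points confirms that the two agree to floating point precision.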


and accuracy, we suggest using a grid search. The resolution of the grid is, of course, user defined, and a finer grid results in better accuracy. Moreover, the grid search only adds a multiplicative constant to the overall complexity, thus keeping the overall complexity $O(N \log N)$. We rewrite $a$ as $\tan\theta$ and search over the grid $\theta = 0 : \pi/40 : \pi$ and $b = -2 : 0.5 : 2$. Note that other sophisticated optimization techniques, such as the half quadratic optimization technique, can also be applied to solve this problem efficiently (Yuan and Hu 2009).

An interesting property of the correntropy independence test is that it generalizes the Gaussianity assumption required by correlation. Let $X$, $Y$ be normal with mean zero and variances $\sigma_1^2$ and $\sigma_2^2$ respectively, that is, $X \sim N(0, \sigma_1^2)$ and $Y \sim N(0, \sigma_2^2)$. Then the general nondegenerate bi-Gaussian with marginals $N(0, \sigma_1^2)$ and $N(0, \sigma_2^2)$ has the following density,
$$f_{\rho}(x, y) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left( -\frac{1}{2(1 - \rho^2)} \left[ \frac{x^2}{\sigma_1^2} + \frac{y^2}{\sigma_2^2} - \frac{2\rho x y}{\sigma_1 \sigma_2} \right] \right)$$
where $|\rho| \le 1$. Thus for $0 \le p_i \le 1$, $\sum p_i = 1$, the mixture of Gaussians $\sum_{i=1}^{m} p_i f_{\rho_i}(x, y)$ has marginals $N(0, \sigma_1^2)$ and $N(0, \sigma_2^2)$. Also, if $(X, Y)$ has a joint density of the form $\sum_{i=1}^{m} p_i f_{\rho_i}(x, y)$, then $\operatorname{cov}(X, Y) = \sigma_1 \sigma_2 \sum_{i=1}^{m} p_i \rho_i$ and $\operatorname{cov}(X, -Y) = -\operatorname{cov}(X, Y)$, and $X - Y$ and $X + Y$ have densities
$$\sum_{i=1}^{m} p_i\, N(0,\, \sigma_1^2 + \sigma_2^2 - 2\rho_i \sigma_1 \sigma_2) \quad \text{and} \quad \sum_{i=1}^{m} p_i\, N(0,\, \sigma_1^2 + \sigma_2^2 + 2\rho_i \sigma_1 \sigma_2),$$
respectively.

Theorem 2.2. Suppose $(X, Y)$ has a joint density of the form $\sum_{i=1}^{m} p_i f_{\rho_i}(x, y)$ as described above. Then $(X, Y)$ are independent if and only if $U(X, Y) = U(X, -Y) = 0$.

Proof. Using eq. (2--17) we find


$$\begin{aligned}
U(X, Y) &= \int \mu(\mathrm{d}\rho) \left[ \sum_{i=1}^{m} p_i \exp\left( -\frac{\rho^2(\sigma_1^2 + \sigma_2^2)}{2} + \rho_i \sigma_1 \sigma_2 \rho^2 \right) - \exp\left( -\frac{\rho^2(\sigma_1^2 + \sigma_2^2)}{2} \right) \right] \\
&= \int \mu(\mathrm{d}\rho) \exp\left( -\frac{\rho^2(\sigma_1^2 + \sigma_2^2)}{2} \right) \left[ \sum_{i=1}^{m} p_i \exp\left( \rho_i \sigma_1 \sigma_2 \rho^2 \right) - 1 \right],
\end{aligned} \qquad (2\text{--}19)$$
and $U(X, -Y)$ is given by the same expression with $\rho_i$ replaced by $-\rho_i$. Suppose $U(X, Y) = 0$. Then the integrand in eq. (2--19), and hence
$$\sum_{i=1}^{m} p_i \left( \exp(\rho_i \sigma_1 \sigma_2 \rho^2) - 1 \right),$$
assumes both positive and negative values, or is identically zero. Without loss of generality, we replace $\rho^2$ by $\rho$. Then for $\rho > 0$,
$$\phi(\rho) = \sum_{i=1}^{m} p_i \left( \exp(\rho_i \sigma_1 \sigma_2 \rho) - 1 \right)$$
assumes both positive and negative values. Now $\phi(\rho)$ is convex in $(0, \infty)$ and $\phi(0) = 0$. Since $\phi$ is convex, $\phi'$ is increasing. If $\phi$ assumes negative values, then since $\phi(0) = 0$, $\phi'(0)$ must be negative, that is, $\sum p_i \rho_i < 0$; and if $U(X, -Y) = 0$, the same argument applied to $-\rho_i$ leads to $\sum p_i \rho_i > 0$. Therefore, if $U(X, Y) = 0$ and $U(X, -Y) = 0$, then $\sum p_i \rho_i = 0$ and hence $\phi'(0) = 0$, which implies $\phi(\rho) \ge 0$. Then the integrand in (2--19) is zero almost everywhere, and if $\mu$ has support $\mathbb{R}$, the integrand is identically equal to zero. Then all the $\rho_i = 0$, that is, $X$, $Y$ are independent.

This theorem proves that the search space can be restricted drastically with appropriate assumptions on the underlying distribution. Although the assumed distribution is not the most flexible one, it can easily be seen that it is actually a generalization of the Gaussianity assumption, i.e., the Gaussianity assumption is the special case $m = 1$.

2.2 Mean Square Contingency

The concept of dependence in statistics has been conceived by many renowned authors, e.g., the concept of positive quadrant dependence by Lehmann (Lehmann 1966)


and the concept of copula by Sklar (Nelsen 1999). However, when applying these concepts in engineering applications, the approach proposed by Rényi is perhaps the most intuitive one. Rényi in his seminal work proposed a set of postulates that he found desirable for a measure of dependence (Rényi 1959). Unlike other approaches to defining dependence, these postulates lead to the notion of "functional" dependence. Rényi also showed that there is only one measure, namely the maximal correlation coefficient (MCC), that satisfies all these desired properties, and several other measures, such as MI and mean square contingency (MSC), that satisfy most of them (Rényi 1959).⁴ These postulates have further been investigated by others, and several modifications and/or simplifications have been suggested (Schweizer and Wolff 1981; Granger et al. 2004).

The notion of MSC was first introduced by Pearson for two discrete random variables (Pearson 1915). Let $X$ and $Y$ be two random variables that can take values $x_i$ for $i = 1, \dots, m$ and $y_j$ for $j = 1, \dots, n$ respectively. Then MSC ($\phi$) is defined as
$$\phi^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} \left( \frac{P_{ij}}{P_{i\cdot} P_{\cdot j}} - 1 \right)^2 P_{i\cdot} P_{\cdot j} = \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{P_{ij}^2}{P_{i\cdot} P_{\cdot j}} - 1,$$
where $P_{ij}$ denotes the probability $P(X = x_i, Y = y_j)$, and $P_{i\cdot} = \sum_j P_{ij}$ and $P_{\cdot j} = \sum_i P_{ij}$. MSC can be regarded as a dependence measure as it satisfies the following properties:
1. $\phi^2$ vanishes if and only if the random variables are independent.
2. $\phi^2$ reaches its maximum if and only if the random variables are completely dependent, i.e., one is a function of the other.
Moreover, it can be shown that $0 \le \phi^2 \le \min(m, n) - 1$, i.e., MSC is bounded, and thus it can be appropriately normalized between $0$ and $1$. Other modifications of the definition of MSC can be found in (Royer 1933; Steffensen 1934).

⁴ However, the definitions of MSC and MI can be modified such that they satisfy all these desired properties (Micheas and Zografos 2006).
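Pearson's discrete MSC is straightforward to compute from a joint probability table. The sketch below is a minimal plain-Python version (the function name is mine); it also illustrates the two defining properties: $\phi^2 = 0$ for a product table and $\phi^2 = \min(m, n) - 1$ for a diagonal, completely dependent table.

```python
def mean_square_contingency(table):
    # phi^2 = sum_ij P_ij^2 / (P_i. * P_.j) - 1 for a joint pmf table.
    row = [sum(r) for r in table]        # marginals P_i.
    col = [sum(c) for c in zip(*table)]  # marginals P_.j
    total = 0.0
    for i, r in enumerate(table):
        for j, p in enumerate(r):
            if p > 0.0:
                total += p * p / (row[i] * col[j])
    return total - 1.0

# Independent: P_ij = P_i. * P_.j  ->  phi^2 = 0
independent = [[0.03, 0.07],
               [0.27, 0.63]]
# Completely dependent (Y determined by X)  ->  phi^2 = min(m, n) - 1 = 1
diagonal = [[0.5, 0.0],
            [0.0, 0.5]]
print(mean_square_contingency(independent))  # 0 up to rounding
print(mean_square_contingency(diagonal))     # 1.0
```

Note the normalization hinted at in the text: dividing $\phi^2$ by $\min(m, n) - 1$ maps the measure into $[0, 1]$.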


Rényi later extended the idea of MSC to a more general set of random variables (Rényi 1959, 1970). Before proceeding, let us discuss the notions of absolute continuity and regular dependence. If $\mu$ and $\nu$ are two measures on the same measurable space $(Z, \mathcal{C})$, then $\nu$ is said to be absolutely continuous with respect to $\mu$ if $\nu(C) = 0$ for every set $C \in \mathcal{C}$ for which $\mu(C) = 0$ (Rényi 1959). Now consider two random variables $(X, Y)$ that take values in the product space $\mathcal{X} \times \mathcal{Y}$. Define
$$\nu(A \times B) = P(X \in A, Y \in B) \quad \text{and} \quad \mu(A \times B) = P(X \in A)\, P(Y \in B).$$
Then the dependence between $X$ and $Y$ is said to be regular if $\nu$ is absolutely continuous with respect to $\mu$ (Rényi 1959). Now, if the dependence between $X$ and $Y$ is regular, then according to the Radon-Nikodym (R-N) theorem there exists a measurable function $h(x, y) = \mathrm{d}\nu / \mathrm{d}\mu$ such that
$$\nu(A \times B) = \int_{A \times B} h(x, y)\,\mathrm{d}\mu.$$
Let $F_{XY}(x, y)$ be the joint distribution of $(X, Y)$ and $F_X(x)$ and $F_Y(y)$ be the marginal distributions; then
$$P(X \in A, Y \in B) = \int_A \int_B h(x, y)\,\mathrm{d}F_X(x)\,\mathrm{d}F_Y(y),$$
and if the random variables have joint density $f_{XY}(x, y)$ and marginal densities $f_X(x)$ and $f_Y(y)$, then
$$h(x, y) = \frac{f_{XY}(x, y)}{f_X(x)\, f_Y(y)}. \qquad (2\text{--}20)$$
Using the function $h(x, y)$, Rényi defined MSC ($\phi$) between two arbitrary regularly dependent random variables $X$ and $Y$ as
$$\phi^2 = \int (h(x, y) - 1)^2\, f_X(x)\, f_Y(y)\,\mathrm{d}x\,\mathrm{d}y = \int h(x, y)\, f_{XY}(x, y)\,\mathrm{d}x\,\mathrm{d}y - 1.$$


Note that MSC is finite if and only if the likelihood ratio $h(x, y)$ is square integrable with respect to $F_X(x)\,F_Y(y)$, i.e., $h(x, y) \in L^2(F_X(x)\,F_Y(y))$.

MSC defined in such a way is always positive, i.e., $\phi^2 \in [0, +\infty]$, and it can be normalized in the following way,
$$\Phi(X, Y) = \frac{\phi}{\sqrt{1 + \phi^2}},$$
such that $\Phi \in [0, 1]$. $\Phi(X, Y)$ satisfies five out of seven postulates proposed by Rényi (Rényi 1959).⁵ According to Rényi, if $C(X, Y)$ is a measure of dependence, then:
1. $C(X, Y)$ is defined for any pair of random variables $X$ and $Y$, neither of them being constant with probability 1.
2. $C(X, Y) = C(Y, X)$.
3. $0 \le C(X, Y) \le 1$.
4. $C(X, Y) = 0$ if and only if $X$ and $Y$ are independent.
5. $C(X, Y) = 1$ if there is a strict dependence between $X$ and $Y$, i.e., either $X = g(Y)$ or $Y = f(X)$, where $f$ and $g$ are Borel-measurable functions.
6. If the Borel-measurable functions $f$ and $g$ map the real axis in a one to one way onto itself, then $C(f(X), g(Y)) = C(X, Y)$.
7. If the joint distribution of $X$ and $Y$ is normal, then $C(X, Y) = |R(X, Y)|$, where $R(X, Y)$ is the correlation coefficient of $X$ and $Y$.

$\Phi(X, Y)$ does not satisfy the first and the fifth postulates, as it is defined only for regularly dependent random variables, and under strict dependence, i.e., $Y = f(X)$, the random variables are not regular. However, the definition of MSC can be slightly modified, and it can be shown that MSC then satisfies all the postulates (Micheas and Zografos 2006).

⁵ This particular transformation allows Rényi's seventh postulate to be satisfied.


and marginal distributions $F_X(x)$ and $F_Y(y)$, there exists a function $C : [0,1]^2 \to [0,1]$ such that $C(F_X(x), F_Y(y)) = F_{XY}(x, y)$ (Mari and Kotz 2001). The function $C$ is called the copula, and it is unique if the joint distribution is continuous. The copula can also be regarded as the joint distribution function of two uniform random variables, $F_X(X)$ and $F_Y(Y)$. Two random variables are independent if and only if $C(u, v) = uv$.

If $C$ is doubly differentiable, then
$$f_{XY}(x, y) = \frac{\partial^2 C(u, v)}{\partial u\, \partial v}\, f_X(x)\, f_Y(y).$$
Therefore it can readily be seen that
$$h(x, y) = \frac{\partial^2 C(u, v)}{\partial u\, \partial v} = c(u, v).$$
Using this fact, MSC can be defined as
$$\phi^2 = \iint \frac{\partial^2 C(u, v)}{\partial u\, \partial v}\,\mathrm{d}C(u, v) - 1 = \iint c^2(u, v)\,\mathrm{d}u\,\mathrm{d}v - 1.$$
Note that MSC is a function of the density of the copula only (i.e., it does not depend on the marginal distributions of $X$ and $Y$). Therefore, it inherits the nice properties of a copula, such as invariance to monotonic transformations (Mari and Kotz 2001).

In order to derive an estimator of MSC, we exploit the fact that MSC is the distance between $h(x, y)$ and $1$ in the $L^2(F_X(x)\,F_Y(y))$ space. Let $\hat{h}(x, y) \in L^2(F_X(x)\,F_Y(y))$ be an estimate of the likelihood ratio $h(x, y)$. Then an estimate of MSC is given by
$$\hat{\phi}^2 = \int (\hat{h}(x, y) - 1)^2\, f_X(x)\, f_Y(y)\,\mathrm{d}x\,\mathrm{d}y.$$
Using the triangle inequality (see Figure 2-2), we get
$$|\phi - \hat{\phi}| \le \epsilon \qquad (2\text{--}21)$$
where
$$\epsilon = \left[ \int \left( h(x, y) - \hat{h}(x, y) \right)^2 f_X(x)\, f_Y(y)\,\mathrm{d}x\,\mathrm{d}y \right]^{\frac{1}{2}}.$$


Figure 2-2. Graphical description of mean square contingency (MSC) and its estimator.

Therefore, we find an estimate $\hat{h}(x, y)$ that minimizes the upper bound in the inequality, i.e., we solve
$$h^*(x, y) = \arg\min_{\hat{h}(x, y)} \epsilon^2. \qquad (2\text{--}22)$$
Since it is impossible to search the entire $L^2$ space, we restrict our attention to a set of functions defined as follows,
$$\mathcal{H} = \left\{ \hat{h}(x, y) : \hat{h}(x, y) = \sum_{i=1}^{n} \alpha_i\, \kappa_1(x, x_i)\, \kappa_2(y, y_i) \right\} \qquad (2\text{--}23)$$
where $\{(x_i, y_i)\}_{i=1}^{n}$ are realizations of $(X, Y)$ and $\kappa_1$ and $\kappa_2$ are positive definite functions. We will justify the selection of this functional space in detail in the following subsection. Replacing $\hat{h}(x, y)$ in $\epsilon$, we get
$$\begin{aligned}
\epsilon^2 &= \int \left( \frac{f_{XY}(x, y)}{f_X(x)\, f_Y(y)} - \sum_{i=1}^{n} \alpha_i\, \kappa_1(x, x_i)\, \kappa_2(y, y_i) \right)^2 f_X(x)\, f_Y(y)\,\mathrm{d}x\,\mathrm{d}y \\
&= \int \frac{f_{XY}^2(x, y)}{f_X(x)\, f_Y(y)}\,\mathrm{d}x\,\mathrm{d}y - 2\, E_{XY}\left[ \sum_{i=1}^{n} \alpha_i\, \kappa_1(X, x_i)\, \kappa_2(Y, y_i) \right] \\
&\quad + E_{X \perp Y}\left[ \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j\, \kappa_1(X, x_i)\, \kappa_2(Y, y_i)\, \kappa_1(X, x_j)\, \kappa_2(Y, y_j) \right] \\
&\approx \int \frac{f_{XY}^2(x, y)}{f_X(x)\, f_Y(y)}\,\mathrm{d}x\,\mathrm{d}y - \frac{2}{n}\, \boldsymbol{\alpha}^{\top} K_{[XY][XY]}\, \mathbf{1}
\end{aligned}$$


$$+ \frac{1}{n^2}\, \boldsymbol{\alpha}^{\top} \left( K_{XX} K_{XX} \circ K_{YY} K_{YY} \right) \boldsymbol{\alpha}$$
where $K_{ZZ}(i, j) = \kappa(z_i, z_j)$ and $K_{[XY][XY]} = K_{XX} \circ K_{YY}$, with $\circ$ denoting the Hadamard (entrywise) product of two matrices. As the first term does not depend on $\boldsymbol{\alpha}$, we can ignore it. Therefore, our original problem (2--22) reduces to
$$\boldsymbol{\alpha}^* = \arg\min_{\boldsymbol{\alpha}} \left[ \frac{1}{n^2}\, \boldsymbol{\alpha}^{\top} \left( K_{XX} K_{XX} \circ K_{YY} K_{YY} \right) \boldsymbol{\alpha} - \frac{2}{n}\, \boldsymbol{\alpha}^{\top} K_{[XY][XY]}\, \mathbf{1} \right]. \qquad (2\text{--}24)$$
Since $K_{XX}$ and $K_{YY}$ are positive definite, so is $\left( K_{XX} K_{XX} \circ K_{YY} K_{YY} \right)$. This result follows from the Schur (Hadamard) product theorem (Ballantine 2005). Therefore, the solution is given by
$$\boldsymbol{\alpha}^* = n \left( K_{XX} K_{XX} \circ K_{YY} K_{YY} \right)^{-1} K_{[XY][XY]}\, \mathbf{1}. \qquad (2\text{--}25)$$
Using the solution $\boldsymbol{\alpha}^*$, an estimate of MSC is given by
$$\begin{aligned}
\hat{\phi}^2 &= E_{X \perp Y}\left[ \left( \sum_{i=1}^{n} \alpha_i^*\, \kappa_1(X, x_i)\, \kappa_2(Y, y_i) - 1 \right)^2 \right] \\
&\approx \frac{1}{n^2} \sum_{j=1}^{n} \sum_{k=1}^{n} \left( \sum_{i=1}^{n} \alpha_i^*\, \kappa_1(x_j, x_i)\, \kappa_2(y_k, y_i) - 1 \right)^2 \\
&= \frac{1}{n^2}\, \boldsymbol{\alpha}^{*\top} \left( K_{XX} K_{XX} \circ K_{YY} K_{YY} \right) \boldsymbol{\alpha}^* - \frac{2}{n^2}\, \boldsymbol{\alpha}^{*\top} \left( K_{XX} \mathbf{1} \circ K_{YY} \mathbf{1} \right) + 1
\end{aligned} \qquad (2\text{--}26)$$
where $E_{X \perp Y}$ denotes the expectation taken over the measure $F_X(x)\,F_Y(y)$. Notice that a direct computation of this estimator is $O(n^3)$ in complexity.

2.3 Discussion

In this chapter, I have established a novel test of independence based on the concept of correntropy, and a novel estimate of mean square contingency based on kernel least squares regression. The correntropy independence test is easy to evaluate, and it provides


an elegant generalization of correlation using non-negative definite functions. Moreover, using the correntropy coefficient it is possible to construct a measure of dependence, since it reaches its maximum if and only if $X$ and $Y$ are linearly related. However, a few drawbacks of this approach are that it is only defined between two real valued random variables, and that it requires searching over two parameters. Although evaluating independence between two real valued signals is sufficient in the context of, e.g., ICA, the requirement of searching prohibits its applicability in this problem. On the other hand, although the proposed estimator of MSC is consistent in nature, it is computationally expensive and requires selecting two free parameters, i.e., the kernel and the regularizer. Moreover, it is not clear from the estimation point of view when the estimated value is maximum or minimum, and when it increases or decreases.

These two examples demonstrate that just establishing a measure of independence or dependence is not sufficient in practice; the measures should also be simple to evaluate and should convey a simple understanding in order to be applicable to practical problems.


CHAPTER 3
INDEPENDENCE

Statistical independence between random variables has become an effective tool in many signal processing applications such as independent component analysis (ICA), where the objective is to extract statistically independent sources from a set of (linearly) mixed observations. Although the ideal contrast function for ICA is a measure of independence such as mutual information, the inherent difficulty in assessing independence reliably from finite realizations has prevented the applicability of this approach in practical applications. Instead, earlier methods of ICA have usually exploited surrogate contrast functions such as higher order cumulants (Cardoso and Souloumiac 1993) and negentropy (Hyvärinen 1999). These approaches either establish a necessary (but not sufficient) condition of independence or approximate the contrast function by simpler and tractable nonlinearities. Although the resulting approaches are sufficiently fast and memory efficient, they do not guarantee optimality.

With the advent of reliable estimates of statistical independence, a number of algorithms have been proposed in recent years for accurate source extraction. However, these algorithms face many challenges: (1) computational load and choice of free parameters for reliable estimation of independence, and (2) difficulty of searching a nonlinear performance surface due to the presence of local minima, and as a result, (3) lack of scalability to higher dimensions. Therefore, in recent years, research on ICA has primarily focused on the issues of, first, generating reliable, and yet computationally efficient, measures of independence, and second, developing fast, and yet accurate, methods of optimization. In this chapter, we explore some of these issues. We develop a framework for generating measures of independence that unifies several existing measures, introduce a parameter free measure of independence following this framework, discuss a suitable approximation scheme for evaluating this measure efficiently, explore an appropriate


optimization scheme for searching the performance surface, and finally, compare the proposed framework against existing methods of ICA to address their pros and cons.

Two random variables X and Y are said to be independent (denoted by X ⊥ Y) if the corresponding joint probability law can be factorized into marginal probabilities, i.e.,

P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B).

Therefore, a measure of independence, i.e., a non-negative statistic of two random variables that assumes zero value if and only if the random variables are independent, can be designed by estimating the distance between the respective joint and the product of the marginal probability laws. If the random variables take values in R^d, then there exist other sufficient descriptors, such as the probability density function (pdf), the cumulative distribution function (cdf) or the characteristic function (chf), that completely describe a probability law. Throughout the last decade, a number of measures of independence have been proposed based on these descriptors. For example, Kankainen et al. have proposed a measure of independence using the weighted L₂ distance between the joint and marginal chfs (Kankainen 1995), whereas Príncipe et al. have proposed the L₂ distance between the joint and marginal pdfs as a measure of independence, and have proposed a Parzen type estimator of this quantity (Príncipe 2010). Achard et al., on the other hand, have proposed the weighted distance between the transformed density functions as a measure of independence (Achard et al. 2003). Although these measures are conceptually different from each other, they share a common feature: they are essentially the squared (weighted) L₂ distance of some descriptor of the random variables. Therefore, we refer to all these measures as the quadratic measures. Following this observation, it is natural to ask whether these measures follow a common principle. In this chapter, we address this question and provide a definitive answer by showing that these measures indeed originate from a unified theoretical framework.


In order to unify these measures of independence, we generalize the concept of information theoretic learning (ITL). ITL is an emerging area of research that has gained prominence in the signal processing community by associating physical meaning to information theoretic quantities, e.g., Rényi's quadratic entropy and divergence (Príncipe 2010). In ITL the similarity between two random variables is evaluated not just in terms of their second order statistics, but in terms of their entire pdfs. This is achieved by using the cross information potential (CIP),

v(f_X, f_Y) = ∫ f_X(z) f_Y(z) dz,

which is in essence the inner product of the pdfs f_X and f_Y, assuming that they are in L₂. Using CIP, the distance D between two random variables can simply be evaluated as the Euclidean distance between their corresponding pdfs, i.e., D²(f_X, f_Y) = v(f_X − f_Y, f_X − f_Y). Notice that D = 0 iff f_X = f_Y almost everywhere (a.e.)¹. Therefore, the problem of independence can be addressed in this framework by replacing f_X and f_Y by f_XY and f_X f_Y, respectively.

In this chapter we generalize the definition of CIP, and define the generalized CIP (GCIP) as

v_κ(f_X, f_Y) = ∫∫ κ(x, y) f_X(x) f_Y(y) dx dy,

where κ(x, y) is a symmetric strictly positive definite (sspd) function [see Section 3.1.1 for the definition]. Notice that, by virtue of strict positive definiteness, GCIP is still an inner product in L₂, and under some regularity conditions [see the appendix for more information] v(f_X, f_Y) is a special case of v_κ(f_X, f_Y) when κ(x, y) is the Dirac delta operator δ(x − y). Also notice that GCIP, to some extent, preserves the physical meaning of the CIP. To elaborate this idea, consider the estimators of these two quantities. Replacing the pdfs by their corresponding Parzen estimates, the estimator of CIP is given by (nm)⁻¹ Σ_{i=1}^n Σ_{j=1}^m p(x_i − y_j), where p(x − y) = p(y − x) is a symmetric pdf, and {x_i}_{i=1}^n and {y_j}_{j=1}^m are samples from X and Y, respectively, whereas using the strong law of

¹ We omit `almost everywhere' whenever evident from context.


large numbers, the estimator of GCIP is given by (nm)⁻¹ Σ_{i=1}^n Σ_{j=1}^m κ(x_i, y_j). These two expressions are very similar except for the conditions on p and κ: being a Parzen type kernel, p is always positive whereas κ is allowed to take negative values, and p is usually shift invariant, i.e., p(x, y) = p(x − y), whereas κ is allowed to be non shift invariant. Nonetheless, the intuitive explanation of the kernel p(x − x_i) as the potential generated by sample x_i still applies in GCIP, where the potential is now generated by the kernel κ(x, x_i) instead (Príncipe 2010).

GCIP provides a range of similarity measures in L₂, and allows the design of a family of independence measures. Without loss of generality, assume that X and Y take values in R, and κ: R² × R² → R is an sspd kernel. Then, following the ITL interpretation,

D_κ²(f_XY, f_X f_Y) = v_κ(f_XY − f_X f_Y, f_XY − f_X f_Y)   (3-1)

is a measure of independence. We show that for particular choices of κ this family encompasses the above mentioned quadratic measures of independence. To be specific, we propose a generic sspd kernel of the form

κ(x, y) = ∫ g(x, u) g(y, u) dμ(u),   (3-2)

where g(x, u) is an appropriate function and μ is an appropriate measure, and show that with proper choices of g(x, u) and μ we can reproduce the above mentioned measures of independence. Moreover, given realizations {x_i, y_i}_{i=1}^n and a product kernel of the form κ((x, y), (u, v)) = κ_1(x, u) κ_2(y, v), where κ_i: R^{d_i} × R^{d_i} → R, i = 1, 2, are sspd kernels, D_κ can be estimated using the strong law of large numbers as follows:

D̂²_{κ_1 κ_2}(f_XY, f_X f_Y)
  = (1/n²) Σ_{i=1}^n Σ_{j=1}^n κ_1(x_i, x_j) κ_2(y_i, y_j)
  − (2/n³) Σ_{i=1}^n Σ_{j=1}^n Σ_{k=1}^n κ_1(x_i, x_k) κ_2(y_j, y_k)


  + (1/n⁴) Σ_{i=1}^n Σ_{j=1}^n κ_1(x_i, x_j) Σ_{i=1}^n Σ_{j=1}^n κ_2(y_i, y_j).   (3-3)

It has been observed that the quadratic measures of independence share the same estimator of the form (3-3) (Rao et al. 2011), often with the kernels κ_1 and κ_2 taken to be Gaussian (Kankainen 1995; Príncipe 2010; Achard et al. 2003). The proposed framework thus provides an explanation of this rather surprising observation, since it is possible for different combinations of g and μ to produce the same κ.

This generalized framework also allows the construction of new measures of independence by defining new kernels through appropriate g and μ. In this chapter, we explore two new kernels by setting g(x, u) to be a Heaviside function, either I(x ≤ u) or I(x ≥ u), and μ to be the Lebesgue measure. An interesting property of these kernels is that, unlike other popular kernels such as the Gaussian and Laplacian, they do not involve any free parameters. However, notice that these kernels are unbounded, and therefore cannot be evaluated in closed form. Nonetheless, we show that the estimators of the induced measures of independence can still be evaluated, and can be represented in kernel form as in (3-3) with proper choice of kernel functions.

The rest of the chapter is organized as follows. In Section 3.1 we discuss several available measures of independence in depth. These measures have been widely used as cost functions in the context of ICA, and the inherent similarity in their expressions and their estimators provides the motivation for exploring a unified framework that embraces them. In Section 3.2 we present the main contribution of the chapter, i.e., we use the concept of GCIP and show that the methods described in Section 3.1 actually originate from a unified framework. In Section 3.3 we propose two new measures of independence motivated by this framework, and explore some properties of the associated kernels. As mentioned earlier, these measures and their respective estimators are parameter free, and therefore an excellent choice for ICA. In Section 3.4 we address the computational issues of the proposed approach, and develop an appropriate optimization technique to explore


the nonlinear performance surface. We also apply the proposed approach in the context of linear ICA and compare it with the Gaussian kernel with different kernel sizes, as well as with other established methods such as kernel ICA, RADICAL, JADE and FastICA, in terms of the tradeoff between accuracy and speed. Finally, in Section 3.5 we conclude the chapter with a brief overview of the proposed work, and some open issues in this line of research.

3.1 Background

In this section we discuss several available measures of independence and show that they share the same estimator. Before proceeding we briefly describe the notion of strict positive definiteness, which plays a pivotal role throughout the rest of the chapter.

3.1.1 Strict Positive Definiteness

Let (X, Σ) be a measurable space and let κ: X × X → R be a real valued kernel. Then κ(x, y) is said to be positive definite if for any finite signed Borel measure η: Σ → R the following holds (Sriperumbudur et al. 2010): ∫∫ κ(x, y) dη(x) dη(y) ≥ 0². The kernel is said to be strictly positive definite (spd) if the equality holds only for the zero measure. A function κ̃: X → R is called spd if κ(x, y) = κ̃(x − y) is spd (Bochner 1941). This type of kernel is known as a shift invariant kernel (Genton 2001).

3.1.2 Quadratic Mutual Information

As mentioned earlier, two random variables X and Y are independent if and only if f_XY(x, y) = f_X(x) f_Y(y) for all (x, y). Therefore,

ν = ∫∫ (f_XY(x, y) − f_X(x) f_Y(y))² dx dy   (3-4)

is a measure of independence, i.e., ν = 0 if and only if X ⊥ Y, under the assumption that the density functions are square integrable. This quantity is simply the squared Euclidean

² Notice that in the context of measuring independence η can be regarded as F_XY(x, y) − F_X(x) F_Y(y), and the existence of the Radon-Nikodym derivative dη(x)/dx = f_XY(x, y) − f_X(x) f_Y(y) is not necessary, as shown in Theorem 3.1. However, the existence is needed to define the quadratic mutual information as described in Section 3.1.2.


distance between two density functions, and it has been widely used as a surrogate for mutual information in adaptive signal processing and ICA (Príncipe 2010). Given realizations {x_i, y_i}_{i=1}^n, ν is estimated consistently using Parzen window estimation, i.e.,

ν̂ = ∫∫ ( (1/n) Σ_{i=1}^n p_1(x − x_i) p_2(y − y_i) − (1/n) Σ_{i=1}^n p_1(x − x_i) (1/n) Σ_{i=1}^n p_2(y − y_i) )² dx dy   (3-5)

where p_1(x) and p_2(x) are zero mean symmetric probability density functions³. Given⁴

p^c_{ij}(a − b) = ∫ p_i(x − a) p_j(x − b) dx,

the estimator of ν is given by

ν̂ = D̂²_{p^c_{11} p^c_{22}}(f_XY, f_X f_Y).   (3-6)

[Figure 3-1. Two functions, not zero everywhere, with convolution zero everywhere: sin²(x)/x², sin²(x)cos(4x)/x², and their Fourier transforms.]

³ For consistent estimation the scale of these density functions drops to zero at a certain rate as the number of samples goes to infinity.
⁴ The superscript c denotes convolution.
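Every estimator of the form D̂²_{κ_1 κ_2} in (3-3), including (3-6), can be computed directly from the two n × n Gram matrices K₁ = [κ_1(x_i, x_j)] and K₂ = [κ_2(y_i, y_j)]: the three terms are an elementwise product, a product of column sums, and a product of total sums. A minimal NumPy sketch (the function name and vectorized form are ours, not taken from any toolbox referenced in this chapter):

```python
import numpy as np

def quadratic_independence(K1, K2):
    """Estimator (3-3), computed from the two n-by-n Gram matrices K1 and K2."""
    n = K1.shape[0]
    t1 = np.sum(K1 * K2) / n**2                               # (1/n^2) sum_ij K1_ij K2_ij
    t2 = 2.0 * np.dot(K1.sum(axis=0), K2.sum(axis=0)) / n**3  # (2/n^3) sum_k (col sums product)
    t3 = K1.sum() * K2.sum() / n**4                           # (1/n^4) (sum K1)(sum K2)
    return t1 - t2 + t3
```

Since (3-3) is the squared norm of the difference between the empirical joint embedding and the product of the marginal embeddings, the returned value is non-negative up to floating point error.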


In practice p_1(x) and p_2(x) are usually chosen from a location-scale family p(x), chosen in such a way that p^c(x − y) can be computed in closed form. For example, p(x) could be the Gaussian or the Cauchy family. Notice that p^c_{ii}(a − b) is a positive definite function, since

∫∫ p^c_{ii}(a − b) dη(a) dη(b) = ∫ ( ∫ p_i(x − a) dη(a) )² dx ≥ 0.

However, p^c_{ii} is not strictly positive definite in general. This can be easily shown by a counterexample. Consider p_i(x) to be the squared sinc function, i.e., p_i(x) = sin²(x)/x². The Fourier transform of this function is the triangular function, which vanishes outside [−π, π]. Now if we choose η to be a measure such that the Fourier transform ∫ exp(−iωx) dη(x), where i = √−1, vanishes inside [−π, π], then ∫ p_i(x − a) dη(a) = 0 without η being zero⁵, and this can be achieved, for example, by choosing η such that dη(x) = sin²(x) cos(4x)/x² dx (see Figure 3-1).

Notice that p^c_{ii}(x) is strictly positive definite if p_i(x) is strictly positive definite and, in practice, p_i(x) is often chosen to be strictly positive definite. However, this is only a sufficient condition, since p_i(x) can be a non strictly positive definite function, such as a rectangular function, while the corresponding p^c_{ii}(x) can be a strictly positive definite function, e.g., a triangular function.

3.1.3 Characteristic Function Based Measure

Given a random vector X, the characteristic function of X is defined as φ(ω) = E exp(iω^T X), where i = √−1. Two random variables (i.e., density functions) are the same if their corresponding characteristic functions are the same (Kankainen 1995). Let

φ̃(ω_1, ω_2) = φ(ω_1, ω_2) − φ(ω_1) φ(ω_2)

⁵ Using the property that convolution is equivalent to multiplication in the frequency domain.


= E exp(iω_1 X + iω_2 Y) − E exp(iω_1 X) E exp(iω_2 Y);

then X ⊥ Y if and only if φ̃(ω_1, ω_2) = 0 for all (ω_1, ω_2). Using this condition a measure of independence can be defined as follows (Kankainen 1995):

τ = ∫∫ |φ̃(ω_1, ω_2)|² ψ(ω_1, ω_2) dω_1 dω_2   (3-7)

where ψ(ω_1, ω_2) is an appropriate strictly positive weighting function. Let ψ(ω_1, ω_2) = ψ_1(ω_1) ψ_2(ω_2) and ψ̃_i(x − y) = ∫ exp(iωx) exp(−iωy) ψ_i(ω) dω; then the estimator of τ is given by

τ̂ = D̂²_{ψ̃_1 ψ̃_2}(f_XY, f_X f_Y).   (3-8)

In practice, the weighting functions are chosen in such a way that ψ̃ can be computed in closed form and ψ̃(x) is real valued; e.g., ψ̃ could be a Gaussian or a Cauchy function. Note that since ψ_1(ω) and ψ_2(ω) are strictly positive functions, ψ̃_1(x) and ψ̃_2(x) are strictly positive definite functions.

3.1.4 Quadratic Dependence Measure

A function h(x) is zero almost everywhere if ∫ λ(x − a) h(x) dx = 0 for all a, where λ(x) is a strictly positive definite function. Given random variables (X, Y), define

β(a, b) = ∫∫ λ_1(x − a) λ_2(y − b) (f_XY(x, y) − f_X(x) f_Y(y)) dx dy,

where λ_1, λ_2 are strictly positive definite functions. Then the quadratic measure of independence is defined as (Achard et al. 2003)⁶

ζ = ∫∫ β²(a, b) γ(a, b) da db   (3-9)

⁶ Notice that by quadratic dependence measure we do not refer to a measure of dependence as in (Rényi 1959), but simply follow the terminology introduced in (Achard et al. 2003).


where γ(a, b) is an appropriate strictly positive weighting function. Let γ(a, b) = γ_1(a) γ_2(b) and λ̃_i(x − y) = ∫ λ_i(x − a) λ_i(y − a) γ_i(a) da; then an estimator of ζ is given by (Achard et al. 2003)

ζ̂ = D̂²_{λ̃_1 λ̃_2}(f_XY, f_X f_Y).   (3-10)

In practice, the functions λ_i(x) and γ_i(x) are chosen in such a way that λ̃(x) can be evaluated in closed form. For example, if λ(x) and γ(x) are a Gaussian or Cauchy function and the Lebesgue measure, respectively, then λ̃(x) is also a Gaussian or a Cauchy function. Note that, in such cases, since λ_i(x) is a strictly positive definite function, λ̃_i(x) is a strictly positive definite function (Achard et al. 2003).

3.2 Unification

In the previous section, we discussed a number of measures of independence used in practice. In this section, we lay out the main contribution of the chapter by showing that these measures of independence can be framed as a single measure with a generalized kernel.

3.2.1 Generalized Cross Information Potential

Let X and Y be two random variables and κ(x, y) be a strictly positive definite kernel. Define

ρ = ∫∫ κ(x, x̃)(f_X − f_Y)(x)(f_X − f_Y)(x̃) dx dx̃;

then, due to the property of strict positive definiteness, ρ = 0 if and only if f_X = f_Y. This property can be easily extended to the case of independence, for example, using a product kernel, i.e., define

ρ = ∫∫∫∫ κ_1(x, x̃) κ_2(y, ỹ)(f_XY − f_X f_Y)(x, y)(f_XY − f_X f_Y)(x̃, ỹ) dx dy dx̃ dỹ;   (3-11)

then ρ = 0 if and only if X ⊥ Y.


3.2.2 Composition Kernel

Definition 7 (Schwartz space). The space of rapidly decreasing functions on R^n is the function space

S(R^n) = { f ∈ C^∞(R^n) : ||f||_{α,β} < ∞ for all multi-indices α, β },

where C^∞(R^n) is the set of smooth functions f: R^n → C, ||f||_{α,β} = sup_{x∈R^n} |x^α D^β f(x)|, and D^β is the differential operator. The space S is also called the Schwartz space.

Definition 8 (S-admissible functions). g: R^n × R^n → C is S-admissible if for every h ∈ S the following integral equation has a solution f: R^n → C:

∫ g(x, u) f(u) du = h(x).   (3-12)

Using g(x, u) we define the composition kernel as follows.

Theorem 3.1 (Composition kernels). If g: R^n × R^n → C is an S-admissible or a strictly positive definite function, and μ is a positive measure such that ∫ |g(x, u)|² dμ(u) < ∞ and supp(μ) = R^n, then the following kernel is symmetric strictly positive definite:

K(x, y) = ∫ g(x, u) g(y, u) dμ(u).   (3-13)

Proof. Symmetry is trivial. First we show that K is positive definite. Let η be a finite signed Borel measure; then

D² = ∫∫ K(x, y) dη(x) dη(y)
   = ∫∫∫ g(x, u) g(y, u) dμ(u) dη(x) dη(y)
   = ∫ ( ∫ g(x, u) dη(x) )² dμ(u) ≥ 0.


To show strict positive definiteness, we need to show that D = 0 ⇒ η = 0. Suppose D = 0; then

∫ ( ∫ g(x, u) dη(x) )² dμ(u) = 0 ⇒ ∫ g(x, u) dη(x) = 0 μ-a.e.,

and, since supp(μ) = R^n, this holds everywhere.

Case 1: if g is strictly positive definite, then by integrating with dη(u) we get ∫∫ g(x, u) dη(x) dη(u) = 0, which implies η = 0.

Case 2: if g is S-admissible, multiplying both sides by an arbitrary function f(u) that satisfies (3-12) and integrating,

∫ f(u) ∫ g(x, u) dη(x) du = 0 ⇒ ∫ ( ∫ g(x, u) f(u) du ) dη(x) = 0.

Therefore

∫ h(x) dη(x) = 0 for all h ∈ S,

and so η = 0. This completes the proof.

To illustrate the last line in more detail, consider the following element of the Schwartz space: h(x) = exp(−x²). Then the family of functions exp(−(x − y)²) is also in the Schwartz space for all y. Now consider the situation

∫ exp(−(x − y)²) dη(x) = 0 for all y.

Then, taking the Fourier transform,

h̃(ω) η̃(ω) = 0 for all ω,


where h̃ and η̃ are the Fourier transforms of the function h and the measure η, respectively. Now, since h̃(ω) is never zero, η̃(ω) is zero everywhere, which implies that η is zero everywhere.

Examples of basis functions g(x, u) used for composition kernels in R are: e^{ixu}, where i = √−1; I(x ≤ u) and I(x ≥ u), where I(A) is 1 when A is true and 0 otherwise; the Dirac delta function δ(x − u); the Gaussian function e^{−(x−u)²}, or any strictly positive definite kernel as such; and the fundamental solutions of elliptic operators (Green's functions), such as 1/|x − y|^{d−2} for d ≥ 3, where x, y ∈ R^d. In R^d, we use the tensor product kernels Π_{i=1}^d g(x_i, u_i).

Using the composition kernel, a family of quadratic measures of independence can be generated, as shown in Table 3-1.

Table 3-1. Kernels for different choices of g(x, u) and μ(u)

  Kernel                                              Measure
  K_i(x, y) = ∫ δ(x − u) δ(y − u) du                  (3-4)
  K_i(x, y) = ∫ exp(iωx) exp(−iωy) ψ_i(ω) dω          (3-7)
  K_i(x, y) = ∫ λ_i(x − a) λ_i(y − a) γ_i(a) da       (3-9)
  K_i(x, y) = ∫ I(x ≤ a) I(y ≤ a) dμ_i(a)             (3-16)
  K_i(x, y) = ∫ I(x ≥ a) I(y ≥ a) dμ_i(a)             (3-17)

Notice that this family encompasses the available measures of independence discussed in Section 3.1, and also introduces two new measures, which we address in the next section. Also, the first kernel requires the joint density function to be continuous in order to evaluate the measure of independence; the other kernels do not have such a restriction.

3.2.3 Related Works

The proposed work is closely related to the work done by Fukumizu and coworkers in the context of kernel based learning (Sriperumbudur et al. 2010), and by Diks and coworkers in the context of time series analysis (Diks and Panchenko 2007).

The connection between ITL and kernel methods has been previously established in many contexts (Xu et al. 2008a). The approach involving the generalized cross information


potential strengthens this connection, since a symmetric strictly positive definite kernel induces a reproducing kernel Hilbert space (RKHS) (Sriperumbudur et al. 2010). Let Φ: X → H be the mapping from the input space to the RKHS such that ⟨Φ(x), Φ(y)⟩_H = κ(x, y) is the reproducing kernel. Let P be a probability measure on X; then the mapping Π: P ↦ E_{X∼P} Φ(X) is injective if the kernel is characteristic⁷ (Sriperumbudur et al. 2010), and it has been shown that a symmetric strictly positive definite kernel is characteristic. Through this mapping the inner product between two projected probability measures becomes

⟨Π(P), Π(Q)⟩_H = ⟨E_{X∼P} Φ(X), E_{Y∼Q} Φ(Y)⟩_H = E_{X∼P} E_{Y∼Q} κ(X, Y).

This is the same expression as the generalized cross information potential. However, we arrive at this expression from an entirely different perspective, without explicitly formulating an RKHS.

The generalized inner product has also been used in the context of estimating divergence and independence by (Diks and Panchenko 2007). However, the proposed work is different in the sense that we unify several existing methods using appropriate kernels, and propose new kernels that are easy to estimate.

3.3 Novel Kernels

In this section, we exploit the generic sspd kernel introduced in the previous section to generate new kernels, and use them to generate new measures of independence. We will also show that the resulting measures of independence satisfy the properties of a contrast function.

⁷ X ∼ P denotes a random variable X with probability distribution P.
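For finite samples {x_i} from P and {y_j} from Q, the inner product ⟨Π(P), Π(Q)⟩ = E_P E_Q κ(X, Y) is estimated, exactly as in the GCIP estimator above, by averaging the kernel over all sample pairs. A minimal sketch (the function name is ours):

```python
import numpy as np

def gcip(x, y, kappa):
    """Empirical GCIP: the average of kappa over all pairs (x_i, y_j)."""
    return kappa(x[:, None], y[None, :]).mean()
```

For a positive definite kappa, gcip(x, x, kappa) is the quadratic form (1/n²) 1ᵀK1 and hence non-negative, consistent with GCIP being an inner product.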


Let us now consider the following two sspd kernels, where g(x, u) is I(x ≤ u) or I(x ≥ u):

K_M(x, y) = ∫ I(x ≤ u) I(y ≤ u) dμ(u),   (3-14)
K_m(x, y) = ∫ I(x ≥ u) I(y ≥ u) dμ(u).   (3-15)

Using these kernels in (3-11), and the fact that

F_XY(x, y) = ∫∫ I(x̃ ≤ x) I(ỹ ≤ y) f_XY(x̃, ỹ) dx̃ dỹ,

where F_XY(x, y) is the cdf, we get the following measures of independence:

ϑ_M = ∫∫ (F_XY(x, y) − F_X(x) F_Y(y))² dμ(x) dμ(y),   (3-16)
ϑ_m = ∫∫ (S_XY(x, y) − S_X(x) S_Y(y))² dμ(x) dμ(y),   (3-17)

where

S_XY(x, y) = ∫∫ I(x̃ ≥ x) I(ỹ ≥ y) f_XY(x̃, ỹ) dx̃ dỹ

is the survival function⁸ (sf). Notice that these statistics measure the difference between two cumulative distribution functions or two survival functions, respectively, rather than two density functions or their variations, as in the previous measures. Similar measures of independence have been discussed in (Blum et al. 1961), where dμ(x) dμ(y) is replaced by dF_XY(x, y).

A trivial choice for μ is the Lebesgue measure⁹, for which the measures become

ϑ_M = ∫∫ (F_XY(x, y) − F_X(x) F_Y(y))² dx dy,

⁸ For real valued random variables F_X(x) = 1 − S_X(x). However, this relation does not hold in two or more dimensions.
⁹ This choice cannot be made for (3-7) since it reduces the kernel to a delta function, whereas in (3-9) this choice does not eliminate the choice of kernel.


ϑ_m = ∫∫ (S_XY(x, y) − S_X(x) S_Y(y))² dx dy.

However, these measures might not be integrable. But surprisingly, their finite sample estimators ϑ̂_M and ϑ̂_m can be integrated in closed form as

∫∫ (F^n_XY(x, y) − F^n_X(x) F^n_Y(y))² dx dy
  = ∫_{m_x}^{M_x} ∫_{m_y}^{M_y} ( (1/n) Σ_{i=1}^n I(x_i ≤ x) I(y_i ≤ y) − (1/n²) Σ_{i=1}^n I(x_i ≤ x) Σ_{i=1}^n I(y_i ≤ y) )² dx dy
  = (1/n²) Σ_{i=1}^n Σ_{j=1}^n (M_x − max(x_i, x_j))(M_y − max(y_i, y_j))
  − (2/n³) Σ_{i=1}^n Σ_{j=1}^n Σ_{k=1}^n (M_x − max(x_i, x_k))(M_y − max(y_j, y_k))
  + (1/n⁴) Σ_{i=1}^n Σ_{j=1}^n (M_x − max(x_i, x_j)) Σ_{i=1}^n Σ_{j=1}^n (M_y − max(y_i, y_j))

and

∫∫ (S^n_XY(x, y) − S^n_X(x) S^n_Y(y))² dx dy
  = ∫_{m_x}^{M_x} ∫_{m_y}^{M_y} ( (1/n) Σ_{i=1}^n I(x_i ≥ x) I(y_i ≥ y) − (1/n²) Σ_{i=1}^n I(x_i ≥ x) Σ_{i=1}^n I(y_i ≥ y) )² dx dy
  = (1/n²) Σ_{i=1}^n Σ_{j=1}^n (min(x_i, x_j) − m_x)(min(y_i, y_j) − m_y)
  − (2/n³) Σ_{i=1}^n Σ_{j=1}^n Σ_{k=1}^n (min(x_i, x_k) − m_x)(min(y_j, y_k) − m_y)
  + (1/n⁴) Σ_{i=1}^n Σ_{j=1}^n (min(x_i, x_j) − m_x) Σ_{i=1}^n Σ_{j=1}^n (min(y_i, y_j) − m_y),


respectively, where m_x = min_i{x_i}, M_x = max_i{x_i}, and m_y = min_i{y_i}, M_y = max_i{y_i}. Here we use the fact that F^n_XY(x, y) − F^n_X(x) F^n_Y(y) and S^n_XY(x, y) − S^n_X(x) S^n_Y(y) are zero outside the interval (m_x, M_x) × (m_y, M_y).

Although the expressions of ϑ̂_M and ϑ̂_m are very similar to (3-3), we have not shown that the individual terms, i.e., min(x, y) − m and M − max(x, y), in these expressions are sspd kernels. To show that, let us revisit (3-14) and (3-15), and use a μ that is uniformly distributed over the open interval (a, b). Then the resulting sspd kernels are given by

K_M(x, y) = b − max(x, y) and K_m(x, y) = min(x, y) − a.

Notice that these kernels are only defined on the interval (a, b)² and not on R². However, if the random variables exist in an interval then it is sufficient to use an sspd kernel that is defined on that particular interval¹⁰. Since the realizations {(x_i, y_i)}_{i=1}^n exist in an interval, we can select a_1 = min_i(x_i), a_2 = min_i(y_i) and b_1 = max_i(x_i), b_2 = max_i(y_i) for the kernels κ_1 and κ_2, respectively, as discussed in (3-3). Therefore, it is evident that the estimators ϑ̂_M and ϑ̂_m are indeed special cases of (3-3). Notice also that the respective kernels are data driven, since they adjust themselves to the range of the realizations. This is equivalent to normalizing the realizations between 0 and 1 before applying an sspd kernel defined on (0, 1)², which is a common practice in kernel methods.

¹⁰ The proof of this statement is a straightforward extension of the proof of Theorem 3.1.
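Combining the data-driven Min kernel K_m(s, t) = min(s, t) − a with the Gram-matrix form of (3-3) yields ϑ̂_m without any free parameter. A NumPy sketch (the function name is ours, not taken from the referenced code):

```python
import numpy as np

def min_kernel_independence(x, y):
    """Parameter-free estimator of theta_m: eq. (3-3) with the data-driven
    Min kernel K_m(s, t) = min(s, t) - min of the sample, for both variables."""
    n = len(x)
    K1 = np.minimum(x[:, None], x[None, :]) - x.min()  # Gram matrix for x
    K2 = np.minimum(y[:, None], y[None, :]) - y.min()  # Gram matrix for y
    t1 = np.sum(K1 * K2) / n**2
    t2 = 2.0 * np.dot(K1.sum(axis=0), K2.sum(axis=0)) / n**3
    t3 = K1.sum() * K2.sum() / n**4
    return t1 - t2 + t3
```

Being an integral of a squared difference of empirical survival functions, the statistic is non-negative, and it is typically much larger for dependent pairs than for independent ones.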


This connection can be further exploited by considering μ to be a uniform probability measure on the interval (a, b). Then the corresponding kernels are given by

K_M(x, y) = (b − max(x, y)) / (b − a) and K_m(x, y) = (min(x, y) − a) / (b − a).

It can be easily verified that using these kernels, and the choice of a and b mentioned above,

D̂²_{K_M1 K_M2}(f_XY, f_X f_Y) = ∫∫ (F^n_XY(x, y) − F^n_X(x) F^n_Y(y))² dx dy / [(max_i(x_i) − min_i(x_i))(max_i(y_i) − min_i(y_i))]

and, similarly,

D̂²_{K_m1 K_m2}(f_XY, f_X f_Y) = ∫∫ (S^n_XY(x, y) − S^n_X(x) S^n_Y(y))² dx dy / [(max_i(x_i) − min_i(x_i))(max_i(y_i) − min_i(y_i))].

An advantage of these expressions is that they are scale invariant, which is a desired property of a contrast function (Comon 1994). We call these kernels the Max and the Min kernel, respectively. Once again, notice that applying these rather data driven kernels on realizations is simply equivalent to first normalizing the realizations to (0, 1), and then applying the kernels min(x, y) or 1 − max(x, y), both sspd on the interval (0, 1).

Notice that, if X and Y are real valued random variables, then these two kernels provide the same result, since

F^n_XY(x, y) − F^n_X(x) F^n_Y(y)
  = 1 − S^n_X(x) − S^n_Y(y) + S^n_XY(x, y) − (1 − S^n_X(x))(1 − S^n_Y(y))
  = S^n_XY(x, y) − S^n_X(x) S^n_Y(y).


[Figure 3-2. Max K_M(x, 0.5), min K_m(x, 0.5), and triangle K_T(x, 0.5) kernels in (0, 1)².]

Therefore, in our simulations we only use the Min kernel. Notice that in ICA it is sufficient to consider pairwise independence between the real valued demixed signals for finding the demixing matrix (Comon 1994).

An interesting property of the proposed kernels is that they are symmetric but not shift invariant in nature, unlike the popular kernels used in the other methods, such as the Gaussian, Laplacian and Cauchy kernels. However, the Max and the Min kernel can be combined to design a shift invariant kernel as follows:

K_S = K_M + K_m = 1 − max(x, y) + min(x, y) = 1 − |x − y|.

Note that this kernel is strictly positive definite on (0, 1), since it is the sum of two strictly positive definite kernels, and it is actually the triangular kernel K_T = (1 − |x − y|)₊ restricted to (0, 1).

3.4 Independent Component Analysis

In this section, we briefly discuss the concept of linear ICA, and apply the proposed kernel to perform ICA on several synthetic datasets.
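The identity K_S = K_M + K_m = 1 − |x − y| on (0, 1) follows from max(x, y) − min(x, y) = |x − y|, and is easy to check numerically; a small sketch (function names ours):

```python
import numpy as np

def k_max(x, y):
    """Max kernel on (0, 1): 1 - max(x, y)."""
    return 1.0 - np.maximum(x, y)

def k_min(x, y):
    """Min kernel on (0, 1): min(x, y)."""
    return np.minimum(x, y)

def k_triangle(x, y):
    """Their sum, the shift invariant triangle kernel 1 - |x - y| on (0, 1)."""
    return 1.0 - np.abs(x - y)
```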


[Figure 3-3. Source densities used in the independent component analysis (ICA) experiment; panels (a)-(r) show densities with kurtosis ranging from −1.68 to ∞.]


3.4.1 Description

Linear ICA is a form of the blind source separation (BSS) problem in which the source signals are assumed to be mutually independent. To be specific, let S = [S_1, ..., S_d] be d real valued unknown source signals, at most one of them being Gaussian, let A be a real valued full rank unknown mixing matrix, assumed to be square without loss of generality, and let X = [X_1, ..., X_d] be the d observed signals, i.e., X = AS. Then the goal of ICA is to find a demixing matrix W such that the output signals Y = [Y_1, ..., Y_d], where Y = WX, are as mutually independent as possible. Designing an ICA algorithm therefore comprises two steps: first, designing an appropriate cost function J(X) that measures independence, and second, finding a suitable search algorithm that explores the performance surface J(WX) for the optimal demixing matrix W* = argmin_W J(WX). Based on these two steps, the available ICA algorithms can be broadly divided into two groups. The first group of algorithms, such as JADE and FastICA, avoids estimating independence and employs surrogate measures, such as kurtosis and approximations of negentropy, as cost functions, which results in a fairly simplified optimization problem. These algorithms are remarkably fast, but converge only to near optimal solutions. The second group of algorithms, on the other hand, employs measures of independence as cost functions, which makes these algorithms more accurate. But this accuracy comes at the cost of exploring a nonlinear performance surface, which requires a highly involved optimization algorithm. In the following subsection, we describe a suitable optimization scheme for quadratic measure based ICA.

3.4.2 Optimization

The optimization problem, in the context of ICA, is usually tackled in one of two ways: employing a gradient descent based approach (Bach and Jordan 2002), or employing an exhaustive search (Learned-Miller and Fisher III 2003). Both of these approaches have their own limitations. A gradient descent based approach is faster, but often converges to local minima, whereas an exhaustive search is usually slower, but


avoids local minima (Learned-Miller and Fisher III 2003). In this chapter we explore an exhaustive search based algorithm for the proposed measure of independence.

It is a standard practice in ICA to whiten the mixed observations before applying an algorithm. Given the whitened observations, the demixing matrix W reduces to a rotation matrix R. For simplicity let us first consider the case when we only have two mixed signals, i.e., d = 2; then R can be represented by a single Jacobi rotation θ. We find the optimal value of θ by performing a grid search, i.e., by evaluating the cost function at M equi-spaced θs in the range [−π/4, π/4], and choosing the θ with the minimum cost. For d > 2, we employ a sequential search technique: since a (d × d) rotation matrix can be completely described by d(d − 1)/2 Jacobi rotations, we search the optimal values of these rotations one at a time. We elaborate this concept in Algorithm 1 in detail¹¹, where R(θ, p, q) denotes the rotation matrix that performs a rotation by θ in dimensions (p, q).

This approach, however, is O(Md²) in computation, which easily becomes intractable in higher dimensions. To ease the computational load, we start our optimization from a favorable initial condition W = W₀, a standard approach considered in the ICA literature (Bach and Jordan 2002), and search the region in the vicinity of this solution. This approach reduces the computational load to O(md²), where m ≪ M denotes the number of Jacobi rotations searched. Our experiments show that this approach provides more favorable computational results than gradient descent in higher dimensions, e.g., d > 16. For our experiments we choose m = 8, θ in the range [−π/24, π/24], and decrease the range by half in each consecutive sweep. See Algorithm 1 for the definition of a sweep. Also, we set the number of sweeps S to 2, since the proposed algorithm is linear in S.

¹¹ http://sites.google.com/site/sohanseth/Home/codes
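One sweep of this sequential Jacobi-rotation grid search can be sketched as follows (the function names are ours, and `cost` stands for any measure of independence J, such as the quadratic measures above):

```python
import numpy as np

def jacobi_rotation(theta, p, q, d):
    """R(theta, p, q): the identity with a planar rotation in dimensions (p, q)."""
    R = np.eye(d)
    c, s = np.cos(theta), np.sin(theta)
    R[p, p], R[p, q], R[q, p], R[q, q] = c, -s, s, c
    return R

def ica_sweep(X, cost, M=8, half_range=np.pi / 4):
    """One sweep: grid search each of the d(d-1)/2 Jacobi angles, keep the best."""
    d = X.shape[0]
    W = np.eye(d)
    thetas = np.linspace(-half_range, half_range, M)
    for p in range(d - 1):
        for q in range(p + 1, d):
            costs = [cost(jacobi_rotation(t, p, q, d) @ X) for t in thetas]
            R = jacobi_rotation(thetas[int(np.argmin(costs))], p, q, d)
            W, X = R @ W, R @ X   # accumulate the rotation, update the data
    return W, X
```

Since W is a product of rotation matrices, it remains orthogonal throughout, so the whitened structure of the data is preserved across sweeps.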


Algorithm 1 ICA using quadratic measures of independence

  Input: (d × n) matrix of realizations X, number of sweeps S, number of grid points M, and initial demixing matrix W = W₀
  for each of the S sweeps do
    for each of the d(d − 1)/2 Jacobi rotations (p, q) do
      Evaluate J(R(θ_m, p, q) X) for m = 1, ..., M
      Find θ_pq = argmin_m J(R(θ_m, p, q) X)
      Update the demixing matrix W ← R(θ_pq, p, q) W
      Update the realization matrix X ← R(θ_pq, p, q) X
    end for
  end for
  Output: Demixing matrix W, and demixed realizations X

3.4.3 Approximation

The computational complexity of directly evaluating the quadratic measure of independence is quadratic in nature, i.e., O(n²) given n realizations. This complexity is prohibitive for practical applications when n > 1000. In the literature a number of methods have been suggested to reduce this complexity, e.g., using a low rank decomposition such as the Cholesky decomposition (Bach and Jordan 2002), which exploits the fact that for certain sspd kernels the Gram matrix, i.e., the matrix of all pairwise kernel evaluations, has a fast decaying eigenvalue structure, and therefore it is possible to represent this matrix using a lower rank, say t, matrix. Then the evaluation of the quadratic measure reduces to O(nt²). However, we observe that the eigenvalue structures of both the Max and the Min kernels do not decay very fast, and therefore they cannot be represented by a lower rank matrix. Therefore, to reduce the computational complexity of the corresponding measures of independence, we employ a rather crude quantization approach. In essence we divide the area [min_i{x_i}, max_i{x_i}] × [min_i{y_i}, max_i{y_i}] into L × L equal blocks, and represent the realizations, say n_1 of them, in a particular block by a single realization at the center of the block having probability n_1/n. The motivation for this approach comes from the fact that the proposed measure estimates the area under empirical CDFs, which are essentially staircase functions, and therefore can be approximated by coarser stairs. Using this approach the computational complexity reduces


to O(n + L⁴). Our experiments show that this approach provides favorable computational results when compared with the Gaussian kernel at larger sample sizes, n > 4000. Although this approach is not consistent in nature, an adaptive binning strategy can be developed for consistent estimation as the block area approaches zero. However, we choose this crude approach since, in this case, we only need to evaluate the Gram matrix once, from the centers of the blocks, and then reuse it for the rest of the algorithm. In our experiments, we use the usual estimator in Table 3-2 and the approximate estimator in Tables 3-3 and 3-4.

3.4.4 Experimental Setup

We compare the Max/Min kernel against the Gaussian kernel, in the context of quadratic measures, since it is the most widely used kernel in practice (Príncipe 2010; Achard et al. 2003; Kankainen 1995; Shen et al. 2009). A Gaussian kernel is defined as κ(x, y) = exp(−(x − y)²/2σ²), where σ is a scale parameter known as the kernel size. We compare our kernel against three different values of σ, i.e., 0.25, 0.5, and 1. To evaluate the measure induced by this kernel faster, we use the incomplete Cholesky decomposition (Bach and Jordan 2002), and set the precision of the Cholesky decomposition to 10⁻³. Notice that a higher precision increases accuracy, as well as computational load.

We also compare our algorithm with other established algorithms such as FastICA (Hyvärinen 1999), KernelICA (Bach and Jordan 2002), JADE (Cardoso and Souloumiac 1993) and RADICAL (Learned-Miller and Fisher III 2003). For these methods we use the default parameter settings as in their respective toolboxes. We initialize Algorithm 1 and KernelICA from the solution of FastICA. We do not initialize RADICAL, since the available implementation of this algorithm does not allow initialization. Also, for RADICAL we use either the fast implementation that does not perform augmentation (in Table 3-2), or the regular implementation without augmentation (in Tables 3-3 and 3-4). Augmentation implies generating replicas of the mixed observations to generate a smoother performance surface. It is evident that augmenting increases accuracy of the


Table 3-2. Performance of independent component analysis (ICA) for two sources. Entries marked * denote the best performance and the performances statistically indistinguishable from it (see text).

Source   JADE    FastICA  KernelICA  RADICAL  Gaus σ=1  Gaus σ=0.5  Gaus σ=0.25  Max/Min
a,a      3.62    4.44     *2.94      *2.67    3.85      4.47        5.37         *2.77
b,b      4.98    6.54     *3.02      *3.13    3.99      3.81        4.21         *3.31
c,c      *1.66   2.25     *1.48      1.80     2.18      2.20        1.82         2.09
d,d      *5.66   6.76     6.01       *5.43    6.55      7.49        7.91         *4.95
e,e      4.30    5.11     *1.31      1.54     2.11      2.00        1.94         1.86
f,f      2.87    3.95     *1.44      *1.54    1.68      *1.58       *1.58        *1.52
g,g      *1.32   1.71     *1.31      *1.43    *1.36     *1.37       *1.39        *1.37
h,h      *4.10   5.80     *4.20      4.61     *4.01     *3.96       *4.28        *3.88
i,i      *6.73   9.52     12.35      *9.34    7.36      8.16        9.51         *7.50
j,j      4.71    6.68     *1.35      *1.39    2.98      2.94        2.93         3.16
k,k      4.67    6.51     *2.68      *2.98    3.09      3.11        3.21         *2.96
l,l      6.85    9.40     *4.60      *4.57    *5.30     *5.21       *5.28        *5.04
m,m      2.80    4.09     *1.29      1.51     4.82      1.56        *1.36        *1.39
n,n      3.97    5.47     *1.82      2.19     4.13      2.46        2.14         2.39
o,o      *3.45   4.77     *3.60      4.05     4.80      4.36        4.92         4.83
p,p      2.79    3.88     *1.44      1.76     3.17      2.07        *1.62        1.82
q,q      15.50   19.55    *2.76      *2.43    12.97     12.36       12.29        13.81
r,r      4.00    5.77     *3.14      *3.18    3.74      3.89        4.22         *3.39
Mean     4.67    6.23     3.15       3.09     4.34      4.05        4.22         3.78
Time(s)  0.00    0.00     0.60       1.97     0.17      0.31        0.66         0.94
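The marking rule used in the tables (best mean, plus every method statistically indistinguishable from it under Welch's t-test with size 0.1) can be sketched as follows. This is our illustration on synthetic per-trial errors, not code from the dissertation; for simplicity we compare Welch's t statistic against 1.645, the large-sample normal approximation of the two-sided size-0.1 critical value, instead of the exact t distribution.

```python
import numpy as np

def welch_marked(scores, t_crit=1.645):
    """Return the set of methods whose mean error is statistically
    indistinguishable from the best one.

    Welch's t statistic (unequal variances) is compared against a
    two-sided size-0.1 threshold; note the decision depends on the
    variances of the per-trial errors, not just their means.
    scores: dict mapping method name -> array of per-trial errors."""
    means = {m: np.mean(v) for m, v in scores.items()}
    best = min(means, key=means.get)
    vb, nb = np.var(scores[best], ddof=1), len(scores[best])
    marked = {best}
    for m, v in scores.items():
        if m == best:
            continue
        se = np.sqrt(vb / nb + np.var(v, ddof=1) / len(v))
        if abs(np.mean(v) - means[best]) / se <= t_crit:
            marked.add(m)  # cannot reject equality with the best
    return marked

rng = np.random.default_rng(2)
scores = {
    "KernelICA": rng.normal(2.95, 0.40, 100),  # synthetic Amari divergences
    "RADICAL":   rng.normal(3.00, 0.40, 100),
    "FastICA":   rng.normal(6.20, 0.60, 100),
}
marked = welch_marked(scores)
print(sorted(marked))
```

With these synthetic scores FastICA is far from the best mean and is never marked, while the two close methods may or may not be grouped together depending on the sampled variances.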


solution, as well as the computational load of the algorithm. Recently, Gretton et al. (2005) proposed an extension of KernelICA that avoids using an explicit regularization parameter. However, it has been shown in (Gretton et al., 2005) that this algorithm provides performance similar to KernelICA, and therefore we only compare our method with KernelICA.

We compare the performance of these methods on synthetic data, following the experimental setup described in (Bach and Jordan, 2002). Each experiment involves 2–32 sources that are (randomly) selected from the distributions described in Figure 3-3. We generate 1000–8000 independent realizations, and mix them with a random matrix. We present the performance of the algorithms in Tables 3-2 through 3-4, where in Table 3-2 we show the performance of the algorithms for 2 sources having the same marginal distributions, whereas in Tables 3-3 and 3-4 we show the performance and computational cost of the algorithms for sources with randomly chosen marginal distributions. Each entry in Tables 3-2 and 3-3 shows the mean of 100 times the Amari divergence (Bach and Jordan, 2002) between the actual demixing matrix and the estimated demixing matrix, averaged over the number of trials described in the table. The best performance, and the performances that are statistically indistinguishable from the best performance using Welch's t-test with size 0.1, are marked in the tables. Notice that the output of this test also depends on the variance of the Amari divergences, and not just their mean. Each entry in Table 3-4 represents the mean time taken by the respective algorithm in seconds, averaged over the reported number of trials, each involving the reported number of samples and dimensions.

3.4.5 Discussion

From Table 3-2, we observe that KernelICA performs the best, winning 16 out of 18 times. The performance of KernelICA is justified since it is based on gradient descent, which allows it to reach a better solution, whereas the best solution that can be achieved by the other methods, which are based on exhaustive search, is restricted by the resolution of the search grid. However, we expect this performance to deteriorate when the dimensionality


Table 3-3. Performance of ICA algorithms for varying number of sources. Entries marked * denote the best performance and the performances statistically indistinguishable from it (see text).

Dimensions  Samples  Trials  JADE   FastICA  KernelICA  RADICAL  Gaus σ=1  Gaus σ=0.5  Gaus σ=0.25  Max/Min
2           1000     1024    4.6    6.0      *2.5       3.5      3.8       3.4         3.3          3.2
            2000     1024    3.3    4.1      *1.7       2.6      2.4       2.2         2.1          2.1
            4000     1024    2.4    3.1      *1.2       1.9      1.8       1.6         1.5          1.5
            8000     1024    1.8    2.3      *0.8       1.4      1.3       1.1         1.1          1.1
4           1000     256     5.0    6.0      *3.2       4.2      3.8       *3.4        *3.3         *3.2
            2000     256     3.3    4.2      *2.1       2.5      2.5       2.3         *2.2         *2.2
            4000     256     2.3    2.8      *1.3       1.8      1.8       1.5         1.5          1.4
            8000     256     1.7    2.1      *0.9       1.3      1.2       1.1         1.0          1.1
8           1000     64      5.6    5.7      4.2        7.6      4.0       *3.5        *3.5         *3.4
            2000     64      3.5    4.1      *2.2       3.5      2.5       2.2         *2.1         *2.0
            4000     64      2.6    2.9      *1.4       1.6      1.7       1.5         *1.5         *1.4
            8000     64      1.7    2.0      *1.0       1.2      1.2       1.1         1.1          1.0
16          1000     16      14.8   6.5      6.6        16.5     *4.6      *4.4        *4.5         *4.2
            2000     16      5.5    4.3      2.9        5.4      2.5       *2.2        *2.2         *2.1
            4000     16      2.7    3.2      1.7        1.6      1.7       *1.5        *1.5         *1.4
            8000     16      1.9    2.2      *1.0       1.2      1.2       *1.1        *1.0         *1.0
32          4000     8       12.2   3.5      2.4        11.0     1.9       *1.8        *1.7         *1.6
            8000     8       2.8    2.3      1.2        2.5      1.2       *1.1        *1.0         *1.0


of the problem is increased, since in such cases the performance surface tends to have more local minima. We observe that this is indeed true in Table 3-3, where KernelICA performs poorly for d > 8. Among the exhaustive-search-based methods, however, the Max/Min kernel performs the best, winning 11 out of 16 times, whereas RADICAL performs very closely, winning 10 times. However, RADICAL takes longer to run due to augmentation. This can easily be verified from Table 3-4, where we have run the same experiment with the fast version of RADICAL, and it takes much less computation time. However, as expected, the speed comes at the cost of accuracy. We have used the fast version of RADICAL in Tables 3-3 and 3-4 since the complexity of regular RADICAL becomes prohibitive for the higher dimensions and sample sizes.

From Table 3-2 we also observe that the Gaussian kernel provides different performances for different kernel sizes, and σ = 0.5 performs the best on average. However, from Table 3-3 we observe that the Gaussian kernel with σ = 0.25 performs better than the other kernel sizes, and it also performs very similarly to the Max/Min kernel. A plausible explanation of this observation could be that the Gaussian kernel performs worse in some special cases, as described in Table 3-2, but performs well on average. It is interesting to observe that the Max/Min kernel performs similarly to or better than the best Gaussian kernel. Finally, from Table 3-2 we observe that JADE performs well in certain cases whereas FastICA fails to win in any case. However, from Table 3-4 it is evident that FastICA is the fastest algorithm, which makes it an excellent choice for initializing other algorithms.

3.4.6 Error vs. Speed Plot

Notice, however, that comparing the performance of the ICA algorithms is a non-trivial problem since they always establish a trade-off between accuracy and computation time; i.e., their accuracy can, in most cases, be improved by using more computational resources, e.g., in the case of RADICAL as discussed in the previous section. Therefore, an ICA algorithm can only be said to be better than the rest if it is more


Table 3-4. Computational load of ICA algorithms (mean time in seconds).

Dimensions  Samples  Trials  JADE   FastICA  KernelICA  RADICAL  Gaus σ=1  Gaus σ=0.5  Gaus σ=0.25  Max/Min
2           1000     1024    0.0    0.0      0.7        0.1      0.2       0.3         0.7          1.5
            2000     1024    0.0    0.0      1.1        0.1      0.3       0.6         1.8          2.0
            4000     1024    0.0    0.0      3.4        0.2      0.9       2.5         7.9          3.5
            8000     1024    0.0    0.0      9.1        0.5      2.4       6.8         20.6         6.2
4           1000     256     0.0    0.0      4.8        0.9      1.0       2.0         4.4          9.0
            2000     256     0.0    0.0      6.6        1.6      1.8       3.9         11.4         11.3
            4000     256     0.0    0.0      16.1       3.3      5.1       14.1        47.1         20.4
            8000     256     0.0    0.0      42.0       7.0      14.5      42.1        129.8        36.5
8           1000     64      0.0    0.1      29.0       6.7      4.9       9.2         20.6         41.7
            2000     64      0.0    0.0      37.3       12.6     8.2       17.9        54.2         53.7
            4000     64      0.0    0.1      75.1       26.0     22.5      62.7        205.0        97.4
            8000     64      0.1    0.1      154.3      54.3     60.2      171.0       526.7        172.2
16          1000     16      2.3    0.2      154.3      38.4     19.4      37.4        85.7         187.0
            2000     16      1.1    0.1      259.4      73.0     32.1      69.3        199.2        222.9
            4000     16      0.5    0.2      556.5      150.2    100.4     286.5       955.8        413.2
            8000     16      0.6    0.3      1032.3     317.7    279.7     807.4       2257.8       699.0
32          4000     8       295.9  1.1      3888.7     942.3    500.1     1540.1      4807.7       1636.9
            8000     8       59.5   0.9      9485.9     2077.8   1479.4    4299.0      11444.0      2783.4
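One ingredient of the Gaussian-kernel timings above is the incomplete Cholesky decomposition (precision 10⁻³) used to evaluate its quadratic measure (Section 3.4.4). The factorization can be sketched as a greedy pivoted Cholesky; this is a generic version of ours, not the toolbox implementation, applied here to a Gaussian Gram matrix:

```python
import numpy as np

def incomplete_cholesky(K, tol=1e-3):
    """Greedy pivoted Cholesky: K ~= G @ G.T with rank t << n.

    Stops when the largest remaining diagonal residual falls below tol,
    so evaluating a quadratic form drops from O(n^2) to O(n t^2)."""
    n = K.shape[0]
    d = np.diag(K).astype(float).copy()   # diagonal of the residual
    G = np.zeros((n, 0))
    while d.max() > tol:
        j = int(np.argmax(d))             # pivot: largest residual
        col = (K[:, j] - G @ G[j]) / np.sqrt(d[j])
        G = np.column_stack([G, col])
        d = np.maximum(d - col ** 2, 0.0) # guard against round-off
    return G

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 1))
K = np.exp(-(x - x.T) ** 2 / (2 * 0.5 ** 2))   # Gaussian Gram, sigma = 0.5
G = incomplete_cholesky(K, tol=1e-3)
print(G.shape)   # rank well below n = 200 for this fast-decaying spectrum
```

For positive semidefinite K the off-diagonal residual is bounded by the diagonal one, so every entry of K − GGᵀ ends up below the chosen precision; for the Max and Min kernels, whose eigenvalues decay slowly, the rank t approaches n and this scheme loses its advantage, which is why the text resorts to quantization instead.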


accurate while consuming less resource. To capture this attribute, we propose to examine the error vs. speed (EvS) plot, i.e., a 2-D plot of the errors achieved by the ICA algorithms against the computation time involved in achieving the corresponding error. Intuitively, the realizations in this plot should follow a monotonic decay, which implies that lower errors can be achieved by consuming more resource. Therefore, if an algorithm breaks the monotonic relationship and falls under (over) this curve then it essentially achieves less (more) error in less (more) time, and so it is better (worse) than the other algorithms. To plot this figure, however, we use the ranks of the error and computation time achieved by the methods, rather than the absolute values. Also, since several algorithms can produce statistically indistinguishable performances, we assign rank 1 both to the best performance and to the corresponding equivalent performances; e.g., in Table 3-3, for d = 16 and n = 2000, we assign the following ranks to the algorithms from left to right: 8, 6, 5, 7, 4, 1, 1, 1. Finally, we consider the average of all the ranks over multiple (d, n) combinations, i.e., dimension and sample size combinations, and plot them in a 2-D plot. We show the EvS plot for Tables 3-3 and 3-4 in Figure 3-4. Notice that since we have 8 algorithms in total, the EvS plot is bounded between 1 and 8, i.e., the minimum and maximum rank that can be achieved.

Overall, we observe that the ICA algorithms follow the expected pattern in the EvS plot, i.e., they achieve a lower error at the expense of more computational load. JADE performs slightly better than FastICA since it produces less error in less time, whereas the Gaussian kernel with σ = 1 performs better than the fast version of RADICAL since it achieves less error in almost the same time. Also, the Gaussian kernel with σ = 0.5 performs better than KernelICA since it achieves the same error in less time. The Max/Min kernel performs better than KernelICA, however, at the expense of slightly more computational load. Finally, the Gaussian kernel with σ = 0.25 performs the same as the Max/Min kernel, but takes more time to reach the solution.
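The tied-rank assignment described above can be sketched as follows; the example reproduces the d = 16, n = 2000 row of Table 3-3 (function name ours):

```python
import numpy as np

def evs_ranks(errors, equiv):
    """Rank methods by error for the EvS plot.

    errors: 1-D array of mean errors, one per method.
    equiv:  boolean array, True where the method is statistically
            indistinguishable from the best (e.g., by Welch's t-test).
    The best method and every equivalent one get rank 1; the rest keep
    their 1-based position in error order."""
    order = np.argsort(errors)
    ranks = np.empty(len(errors))
    for pos, idx in enumerate(order, start=1):
        ranks[idx] = 1 if (pos == 1 or equiv[idx]) else pos
    return ranks

# The d = 16, n = 2000 row of Table 3-3: three methods tie at rank 1
errs = np.array([5.5, 4.3, 2.9, 5.4, 2.5, 2.2, 2.2, 2.1])
eq = np.array([0, 0, 0, 0, 0, 1, 1, 1], dtype=bool)
print(evs_ranks(errs, eq))   # -> [8. 6. 5. 7. 4. 1. 1. 1.]
```

Averaging these error ranks, and the analogous time ranks from Table 3-4, over all (d, n) combinations gives the coordinates plotted in Figure 3-4.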


[Figure 3-4. Performance of several ICA algorithms: error ranks vs. time ranks (both on a 1–8 scale) for JADE, FastICA, KernelICA, RADICAL, Gaus σ=1, Gaus σ=0.5, Gaus σ=0.25, and Max/Min.]

3.5 Summary

In this chapter we have proposed a unified framework for a number of measures of independence that are widely used in practice. We have achieved this by generalizing the concept of cross information potential in information theoretic learning using symmetric strictly positive definite kernels, and by proposing a generic form of symmetric strictly positive definite kernel. We have introduced two new kernels and the associated measures of independence based on this framework. An interesting property of this approach is that it does not involve any free parameter. We have also discussed a suitable optimization scheme, and an appropriate approximation technique to integrate the proposed measure into a tractable ICA algorithm. The simulation results show that the proposed algorithm performs equally well compared to the existing algorithms for performing ICA.
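The approximation technique summarized above, the block quantization of Section 3.4.3, can be sketched as follows. This is our minimal illustration for two one-dimensional variables; the function name `quantize_2d` is ours, not from the text:

```python
import numpy as np

def quantize_2d(x, y, L):
    """Collapse n paired realizations onto an L x L grid of block centers.

    Each occupied block is represented by its center with probability
    (count in block) / n, so a downstream O(n^2) pairwise computation
    over weighted points becomes at worst O(L^4), after an O(n) binning
    pass -- coarser stairs approximating the empirical CDF staircase."""
    n = len(x)
    xe = np.linspace(x.min(), x.max(), L + 1)   # bin edges per coordinate
    ye = np.linspace(y.min(), y.max(), L + 1)
    ix = np.clip(np.searchsorted(xe, x, side="right") - 1, 0, L - 1)
    iy = np.clip(np.searchsorted(ye, y, side="right") - 1, 0, L - 1)
    counts = np.zeros((L, L))
    np.add.at(counts, (ix, iy), 1)
    occ = np.nonzero(counts)
    cx, cy = (xe[:-1] + xe[1:]) / 2, (ye[:-1] + ye[1:]) / 2  # block centers
    pts = np.column_stack([cx[occ[0]], cy[occ[1]]])
    prob = counts[occ] / n                      # n_1 / n per occupied block
    return pts, prob

rng = np.random.default_rng(0)
x, y = rng.normal(size=5000), rng.normal(size=5000)
pts, prob = quantize_2d(x, y, L=20)
print(len(pts), prob.sum())   # at most L*L points, probabilities sum to 1
```

The Gram matrix is then built once from the (at most L²) block centers and reused throughout the Jacobi sweeps, which is where the practical saving comes from.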


CHAPTER 4
CONDITIONAL INDEPENDENCE

Given stationary stochastic processes {X_t} and {Y_t}, {X_t} is said to cause {Y_t}, i.e., {X_t} → {Y_t} in the sense of Granger, if the past values [X_{t−1}, X_{t−2}, ...] of {X_t} contain additional information about the present value Y_t of {Y_t} that is not contained in the past values [Y_{t−1}, Y_{t−2}, ...] of {Y_t} alone (Granger, 1980), i.e.,¹

{X_t} ↛ {Y_t} ⟺ P(Y_t ≤ y | Y_{t−1}, Y_{t−2}, ...) = P(Y_t ≤ y | X_{t−1}, X_{t−2}, ..., Y_{t−1}, Y_{t−2}, ...).   (4-1)

Notice that in mathematical terms we are more interested in {X_t} ↛ {Y_t}, i.e., non-causality, than in {X_t} → {Y_t}, i.e., causality, since the former definition reduces to a simple hypothesis test, and we will follow this view for the rest of the chapter.

The notion of statistical causality² was first conceived by Norbert Wiener (Wiener, 1956). However, Clive Granger is credited as the first to exploit its mathematical foundation. Granger initially addressed a much simpler problem with two assumptions: first, that

{X_t} ↛ {Y_t} ⟺ E[Y_t | X_{t−1}, X_{t−2}, ..., Y_{t−1}, Y_{t−2}, ...] = E[Y_t | Y_{t−1}, Y_{t−2}, ...],   (4-2)

i.e., the causality only appears in the (conditional) mean value of the time series, and second, that {X_t, Y_t} is a bivariate linear autoregressive process. Although these assumptions are essential for feasible causal inference in terms of computation and

¹ In this chapter we only explore causality between time series that take values in R^d.

² Causality, in the most general sense, may not involve time (Pearl, 2000). However, in this chapter we only address the problem of causality in time, or causality in the sense of Granger.


data requirements, they are often violated in practice (Su and White, 2008). The first assumption, also known as causality in mean, is sometimes relaxed to causality in variance (Comte and Lieberman, 2000), i.e.,

{X_t} ↛ {Y_t} ⟺ V[Y_t | X_{t−1}, X_{t−2}, ..., Y_{t−1}, Y_{t−2}, ...] = V[Y_t | Y_{t−1}, Y_{t−2}, ...],   (4-3)

where V[X] = E[X²] − E[X]², whereas the second assumption is usually relaxed through the use of either nonlinear (Ancona et al., 2004) or piecewise linear models (Chen et al., 2004). A nonlinear model, however, usually suffers from training difficulty, whereas a piecewise linear model usually requires comparatively more data for a good approximation of the nonlinear process. Most importantly, however, neither (4-2) nor (4-3) represents (4-1) in its entirety; they only act as surrogates.

With the growing applicability of Granger causality in diverse research areas such as machine learning (Chu and Glymour, 2008), dynamical networks (Marinazzo et al., 2008), bioinformatics (Zou and Feng, 2009), neuroscience (Nedungadi et al., 2009) and econometrics (Su and White, 2008), a different approach toward this problem has become increasingly popular in recent years. This new approach focuses on employing tests of conditional independence as a tool to detect non-causality by exploiting the fact that³

{X_t} ↛ {Y_t} ⟺ Y_t ⊥ [X_{t−1}, X_{t−2}, ...] | [Y_{t−1}, Y_{t−2}, ...],   (4-4)

i.e., under Granger non-causality, the present value Y_t of {Y_t} is conditionally independent of the past values [X_{t−1}, X_{t−2}, ...] of {X_t} given the past values [Y_{t−1}, Y_{t−2}, ...] of {Y_t} (Diks and Panchenko, 2006). Notice that (4-4) is a direct consequence of (4-1), and vice versa.

³ Here ⊥ denotes independence and | denotes conditioning.


Therefore, this approach generalizes the concept of causality in mean as in (4-2) and causality in variance as in (4-3), and takes into consideration the entire conditional probability as in (4-1). It is further preferred for being model free, and thus applicable to a wider range of problems, in particular to problems where other forms of Granger causality, such as linear Granger causality, are inadequate (Su and White, 2008).

Assessing conditional independence, however, is a difficult problem that to this day remains largely unexplored. Conditional independence between two random variables given a third random variable can be mathematically formulated in several ways, e.g., using the conditional distribution function (Linton and Gozalo, 1996), the conditional density function (Diks and Panchenko, 2006), the conditional characteristic function (Su and White, 2007), copulas (Bouezmarni et al., 2009) and kernel methods (Fukumizu et al., 2008). However, none of these approaches is actually preferred over the others, since each of them originates from a very different conceptual perspective, and each has its own strengths and weaknesses. Moreover, a detailed comparison of these different approaches, to the best of our knowledge, still remains to be established in the literature. In this chapter we briefly address a few of these widely used methods, discuss their properties and drawbacks, and propose a novel measure of conditional independence that alleviates some of these issues. The proposed method quantifies conditional independence in terms of the quadratic distance between two conditional cumulative distribution functions, and estimates the individual functions using a least squares regression framework, a well studied problem in the literature. This formulation simplifies the problem of assessing conditional independence, and allows the design of potentially better estimators.
Moreover, the issue of causal influence between two time series can be trivially extended to multivariate settings involving three or more time series, where it is often desired to separate a direct cause from an indirect one, i.e., to judge whether the time series {X_t} and {Y_t} are causally connected or not given a third time series {Z_t}. {X_t} is said to cause {Y_t} given {Z_t}, i.e., {X_t} → {Y_t} | {Z_t}, if the past values [X_{t−1}, X_{t−2}, ...]


of {X_t} contain additional information about the present value Y_t of {Y_t} that is not contained in the past values [(Y_{t−1}, Z_{t−1}), (Y_{t−2}, Z_{t−2}), ...] of {Y_t, Z_t}. In terms of conditional independence, {X_t} ↛ {Y_t} | {Z_t} implies that the present value Y_t of {Y_t} is conditionally independent of the past values [X_{t−1}, X_{t−2}, ...] of {X_t} given the past values [(Y_{t−1}, Z_{t−1}), (Y_{t−2}, Z_{t−2}), ...] of {Y_t, Z_t}. Although, in recent years, the conditional independence based approaches have been widely used to determine causal influence between bivariate time series, these methods have seldom been applied to large scale problems involving three or more time series to determine conditional causal influence. In this chapter we explore this particular aspect of these methods, and discuss their powers and limitations compared to other established methods, such as linear Granger causality, in this context.

The rest of the chapter is organized as follows. In Section 4.1 we describe the existing methods for assessing conditional independence, and also discuss their pros and cons in detail. In Section 4.2 we describe a novel measure of conditional independence and derive the corresponding estimator. We also explore the connection of this method to the existing methods, and discuss the advantages and disadvantages of the proposed work. In Section 4.3 we apply the proposed method to several simulated problems and compare its performance against other state-of-the-art methods. Finally, in Section 4.4 we conclude the chapter with a brief overview of the proposed work and some guidelines for future work.

4.1 Background

In this section we briefly describe a few of the widely used approaches to assessing conditional independence, and their respective pros and cons.
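Before turning to the conditional independence tests, it helps to see the baseline they are compared against: under assumption (4-2) with a linear autoregressive model, testing Granger non-causality reduces to an F-test on nested least squares fits. The sketch below is our illustration, with an arbitrary lag order and synthetic data:

```python
import numpy as np

def linear_granger_f(x, y, p=2):
    """F statistic for H0: {x_t} does not Granger-cause {y_t} in mean,
    comparing order-p restricted and unrestricted linear autoregressions."""
    n = len(y)

    def lags(v):
        # columns v_{t-1}, ..., v_{t-p}, aligned with targets y_p, ..., y_{n-1}
        return np.column_stack([v[p - k : n - k] for k in range(1, p + 1)])

    target = y[p:]
    Zr = np.column_stack([np.ones(n - p), lags(y)])   # restricted: own past only
    Zu = np.column_stack([Zr, lags(x)])               # unrestricted: adds past of x

    def rss(Z):
        beta, *_ = np.linalg.lstsq(Z, target, rcond=None)
        return np.sum((target - Z @ beta) ** 2)

    rss_r, rss_u = rss(Zr), rss(Zu)
    df2 = (n - p) - Zu.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / df2)

rng = np.random.default_rng(3)
x = rng.normal(size=1000)
y = np.zeros(1000)
for t in range(1, 1000):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

F_xy = linear_granger_f(x, y, p=2)   # x -> y: large F, H0 rejected
F_yx = linear_granger_f(y, x, p=2)   # y -> x: F near 1 under H0
print(F_xy > F_yx)
```

Under H0 the statistic follows an F(p, df2) distribution; the model-free methods of the next subsections avoid exactly this linear-in-mean restriction.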


4.1.1 Conditional Cumulative Distribution Function Based Method

If two random variables X and Y are conditionally independent given a third random variable Z, then⁴

F_{X|YZ}(x | y, z) = F_{X|Z}(x | z)  ∀(x, y, z),   (4-5)

where F_{U|V}(u | v) = P(U ≤ u | V ≤ v) is the conditional cumulative distribution function (CDF) of the random variable U given the event V ≤ v. Since

F_{U|V}(u | v) = F_{UV}(u, v) / F_V(v),   (4-6)

where F_{UV}(u, v) = P(U ≤ u, V ≤ v) is the joint CDF and F_V(v) = P(V ≤ v) is the marginal CDF, the condition in (4-5) can be restated as

F_{XYZ}(x, y, z) F_Z(z) = F_{XZ}(x, z) F_{YZ}(y, z)  ∀(x, y, z).   (4-7)

Now define

A(x, y, z) = F_{XYZ}(x, y, z) F_Z(z) − F_{XZ}(x, z) F_{YZ}(y, z);

then (Linton and Gozalo, 1996)

X ⊥ Y | Z ⟹ A(x, y, z) = 0  ∀(x, y, z).   (4-8)

This condition has been used by (Linton and Gozalo, 1996) to design two separate tests of conditional independence, first using a Kolmogorov–Smirnov type statistic, i.e.,

KS = sup_{(x, y, z)} A(x, y, z),   (4-9)

⁴ Notice that this condition is not if and only if, as mistakenly stated in (Linton and Gozalo, 1996).


and second, a Cramér–von Mises type statistic, i.e.,

CM = ∫ A²(x, y, z) dF_{XYZ}(x, y, z).   (4-10)

Given realizations {(x_i, y_i, z_i)}_{i=1}^n, these statistics can be consistently estimated as follows:

KS_n = sup_{(x_n, y_n, z_n)} A_n(x, y, z)   (4-11)

and

CM_n = ∫ A_n²(x, y, z) dF^n_{XYZ}(x, y, z),   (4-12)

where A_n(x, y, z) = F^n_{XYZ}(x, y, z) F^n_Z(z) − F^n_{XZ}(x, z) F^n_{YZ}(y, z) and

F^n_U(u) = (1/n) Σ_{i=1}^n I(u ≥ u_i)   (4-13)

is the empirical CDF, where I is the indicator function, i.e., I(x ≥ y) is 1 if x ≥ y and zero otherwise. Notice that when U is a vector, I(u ≥ u_i) means I(u^(d) ≥ u_i^(d)) for each dimension d.

An advantage of these types of estimators is that they do not involve any free parameters such as a kernel or regularization, unlike the estimators that we will explore in the next subsections. However, a disadvantage of these estimators is that they break down when either the sample size (n) is small or the dimensionality (d) is high, since in either case the values of the empirical CDF at the sample locations, i.e., F^n_U(u_i), are often zero. Therefore, statistical tests of conditional independence based on these estimators usually suffer from low small-sample power, especially in higher dimensions. Variations of this approach have been proposed to alleviate these issues (Delgado and Manteiga, 2001). However, these variations usually involve free parameters.
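The estimators (4-11)–(4-12) can be sketched directly from empirical CDFs evaluated at the sample points; this is our minimal version for scalar variables, with the supremum in KS_n taken over |A_n| at the sample locations:

```python
import numpy as np

def ecdf_at_samples(*cols):
    """Empirical joint CDF F^n at each sample location:
    F[i] = (1/n) * #{ j : u_j <= u_i coordinate-wise }."""
    U = np.column_stack(cols)
    leq = np.all(U[:, None, :] <= U[None, :, :], axis=2)
    return leq.mean(axis=0)

def ks_cm(x, y, z):
    """KS_n and CM_n built from A_n = F_XYZ * F_Z - F_XZ * F_YZ."""
    A = (ecdf_at_samples(x, y, z) * ecdf_at_samples(z)
         - ecdf_at_samples(x, z) * ecdf_at_samples(y, z))
    return np.max(np.abs(A)), np.mean(A ** 2)

rng = np.random.default_rng(4)
n = 300
x, y, z = (rng.normal(size=n) for _ in range(3))   # mutually independent
ks_i, cm_i = ks_cm(x, y, z)
ks_d, cm_d = ks_cm(x, x + 0.1 * rng.normal(size=n), z)  # X, Y strongly dependent
print(ks_i < ks_d, cm_i < cm_d)
```

With mutually independent variables A_n fluctuates around zero at the O(n^{-1/2}) scale, while a strong X–Y dependence (with Z independent of both) drives both statistics up; note that each empirical CDF evaluation above costs O(n²), which is part of why these statistics struggle in large, high-dimensional problems.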


4.1.2 Conditional Probability Density Function Based Method

Two random variables X and Y are conditionally independent given a third random variable Z, if and only if,

f_{X|YZ}(x | y, z) = f_{X|Z}(x | z)  ∀(x, y, z),   (4-14)

where f_{U|V}(u | v) is the conditional probability density function (PDF) of the random variable U given the random variable V. Since

f_{U|V}(u | v) = f_{UV}(u, v) / f_V(v),   (4-15)

where f_{UV}(u, v) and f_V(v) are the joint and marginal PDFs, respectively, the condition in (4-14) can be restated as

f_{XYZ}(x, y, z) f_Z(z) = f_{XZ}(x, z) f_{YZ}(y, z)  ∀(x, y, z).   (4-16)

This condition has been widely used to design tests of conditional independence; e.g., (Diks and Panchenko, 2006) proposed a two-sided test of conditional independence, whereas (Su and White, 2008) proposed a one-sided test of conditional independence based on this criterion. We follow the latter approach since it has been shown to outperform many existing methods. Following (Su and White, 2008), we define the Hellinger distance based statistic (HD) as

HD = ∫ ( 1 − √( f_{XZ}(x, z) f_{YZ}(y, z) / ( f_{XYZ}(x, y, z) f_Z(z) ) ) )² a(x, y, z) dF_{XYZ}(x, y, z),   (4-17)

where a(x, y, z) is an appropriate non-negative weighting function. Given realizations {(x_i, y_i, z_i)}_{i=1}^n, this statistic can be consistently estimated as

HD_n = (1/n) Σ_{i=1}^n { 1 − √( f̂_{XZ}(x_i, z_i) f̂_{YZ}(y_i, z_i) / ( f̂_{XYZ}(x_i, y_i, z_i) f̂_Z(z_i) ) ) }² a(x_i, y_i, z_i),   (4-18)


where f̂_U(u) is the Parzen density estimate of the true density f_U(u), i.e., f̂_U(u) = (1/n) Σ_{i=1}^n p(u − u_i), where p(u) is a Parzen kernel (Parzen, 1962). The weighting function a in (4-18) is appropriately chosen to minimize or ignore the effect of the tail densities, where Parzen estimation is poor.

An advantage of a PDF based method, such as HD, over a CDF based method, such as KS or CM, is that the implicit smoothing operation associated with the use of a smooth kernel p allows this method to be more robust to small sample size or high dimensionality. However, a disadvantage of this method is that it requires selecting two free parameters, namely the kernel and the associated kernel size. A PDF based method is usually observed to be sensitive to these parameters, and a cross-validation based approach is often employed to set these parameters to their best values, which in turn increases the computational cost of the method. Moreover, the performance of the estimator usually deteriorates in higher dimensions since the Parzen density estimate itself suffers from the curse of dimensionality.

There exist other approaches involving the PDF, e.g., the characteristic function based approach (Su and White, 2007). However, they still require estimating either the (conditional) PDFs or the (conditional) characteristic functions, and are also sensitive to the choice of the kernel and the kernel size.

4.1.3 Reproducing Kernel Hilbert Space Based Method

Let H_U denote the reproducing kernel Hilbert space (RKHS) induced by the strictly positive definite kernel κ_U : U × U → R (Bochner, 1941). Given that the random variables X, Y and Z take values in X, Y and Z, respectively, the cross-covariance operator Σ_YX : H_X → H_Y is defined as (Fukumizu et al., 2008)

⟨g, Σ_YX f⟩_{H_Y} = E[f(X) g(Y)] − E[f(X)] E[g(Y)],   (4-19)


where f ∈ H_X, g ∈ H_Y, and ⟨·, ·⟩ denotes the inner product. Using Σ_YX, the normalized conditional cross-covariance operator is defined as

V_{YX|Z} = Σ_YY^{−1/2} (Σ_YX − Σ_YZ Σ_ZZ^{−1} Σ_ZX) Σ_XX^{−1/2}.

It has been recently shown that (Fukumizu et al., 2008)

X ⊥ Y | Z ⟺ V_{(YZ)(XZ)|Z} = O,   (4-20)

where O denotes the null operator. Therefore, the Hilbert–Schmidt norm of this operator has been proposed as a measure of conditional independence, i.e.,

HSNCIC = ||V_{(YZ)(XZ)|Z}||²_HS.

Given realizations {(x_i, y_i, z_i)}_{i=1}^n, this quantity can be consistently estimated as follows:

HSNCIC_n = Tr[R_{(XZ)} R_{(YZ)} − 2 R_{(XZ)} R_{(YZ)} R_Z + R_{(XZ)} R_Z R_{(YZ)} R_Z],   (4-21)

where Tr denotes the trace of a matrix, R_U = K_UU (K_UU + n ε_n I)^{−1},

K_UU(i, j) = κ_U(u_i, u_j)   (4-22)

is the Gram matrix, I is the identity matrix, and ε_n is a regularization parameter that depends on the sample size n.

An advantage of the RKHS based method is that it does not explicitly estimate the CDF or the PDF, and therefore it is much more robust to small sample size and high dimensionality. However, a disadvantage of this method is that it requires selecting three free parameters, namely the kernel, the associated kernel size, and the regularization value. A similar idea has also been explored by (Sun, 2008), using the fact that X ⊥ Y | Z if and only if Σ_XX|(Y,Z) = Σ_XX|Z. This condition is similar to (4-20); however, the author arrives at this condition by explicitly generalizing the linear Granger causal model.
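The estimator (4-21) can be sketched as follows. The kernel, the median-distance kernel size, ε_n = 1/n, and the use of centered Gram matrices are all our choices for illustration, not prescriptions from the text:

```python
import numpy as np

def gram(U):
    """Gaussian Gram matrix with kernel size set by the median heuristic."""
    sq = np.sum((U[:, None, :] - U[None, :, :]) ** 2, axis=2)
    return np.exp(-sq / (2 * np.median(sq[sq > 0])))

def R(U, eps):
    # R_U = G (G + n*eps*I)^(-1), with a centered Gram matrix G = H K H
    n = U.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    G = H @ gram(U) @ H
    return G @ np.linalg.inv(G + n * eps * np.eye(n))

def hsncic_n(x, y, z):
    """HSNCIC_n of (4-21) for scalar x, y, z."""
    n = len(x)
    X, Y, Z = (v.reshape(-1, 1) for v in (x, y, z))
    Rxz = R(np.hstack([X, Z]), 1.0 / n)
    Ryz = R(np.hstack([Y, Z]), 1.0 / n)
    Rz = R(Z, 1.0 / n)
    return np.trace(Rxz @ Ryz - 2 * Rxz @ Ryz @ Rz + Rxz @ Rz @ Ryz @ Rz)

rng = np.random.default_rng(6)
n = 200
z = rng.normal(size=n)
x = z + 0.3 * rng.normal(size=n)
y_ci = z + 0.3 * rng.normal(size=n)      # X and Y_ci share only Z
y_dep = x + 0.3 * rng.normal(size=n)     # Y_dep depends on X beyond Z
h_ci, h_dep = hsncic_n(x, y_ci, z), hsncic_n(x, y_dep, z)
print(h_ci < h_dep)
```

The statistic is near zero when the association between X and Y is fully explained by Z, and grows when a direct X–Y link remains after partialling Z out; in a test it would be calibrated against a surrogate or permutation null distribution.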


A measure of conditional independence requires estimating the conditional distribution function, explicitly or implicitly. Since estimating this statistic for continuous random variables from finite realizations is an ill-posed problem, a measure of conditional independence, intuitively, must rely on free parameters for a sensible estimate, where these free parameters are to be selected based on some criteria. Therefore, the objective in estimating conditional independence reliably is to design an approach where the criterion for choosing these parameters is well defined, and/or the parameters can be chosen easily. In the following section we explore one such approach, and then discuss how it differs from the existing ones.

4.2 Proposed Approach

In this chapter, we explore a different representation of conditional independence: two random variables X and Y are conditionally independent given a third random variable Z, if and only if,

P(X ≤ x | Y = y, Z = z) = P(X ≤ x | Z = z)  ∀(x, y, z).   (4-23)

Before proceeding, we briefly discuss a few properties of (4-23), deferring a detailed discussion of the motivation behind this particular formulation to the end of this section. First, notice that the condition in (4-23) is very similar to the condition in (4-5); however, they are not the same, since in (4-23) we condition on the event U = u rather than the event U ≤ u as in (4-5). These two events are very different since, unlike (4-7), the conditional probability P(U ≤ u | V = v) cannot be decomposed as the ratio of P(U ≤ u, V = v) and P(V = v), due to the fact that these terms are zero if V is a continuous random variable. Therefore, statistics similar to KS and CM cannot be derived using this condition. The conditional probability P(U ≤ u | V = v) can, however, be expressed in terms of the conditional PDF as follows:

P(U ≤ u | V = v) = ∫_{−∞}^{u} f_{U|V}(ũ | v) dũ = ( ∫_{−∞}^{u} f_{UV}(ũ, v) dũ ) / ( ∫_{−∞}^{∞} f_{UV}(ũ, v) dũ ).


This expression establishes the link between the condition in (4-23) and the condition in (4-14). Therefore, an approach similar to (4-17) could be considered to design a measure of conditional independence. However, we avoid this approach since it would require using Parzen type estimates of the conditional PDFs, which makes the estimator of conditional independence sensitive to kernel size variation.

4.2.1 Measure of Conditional Independence

A measure of conditional independence is characterized by the property that it attains zero value if and only if the random variables are conditionally independent. Therefore, (4-23) can be used in many different ways to design a measure of conditional independence; e.g.,

M²_CI = ∫ ( P(X ≤ x | Y = y, Z = z) − P(X ≤ x | Z = z) )² dF_X(x) dF_YZ(y, z)   (4-24)

is a measure of conditional independence. We choose this particular measure since it can be efficiently estimated by estimating the conditional distribution P(U ≤ u | V = v) using a least squares regression approach. We will describe the estimation process in detail in the next subsections. Since this measure is quadratic, in the sense that it is the L₂ distance between two functions, and since it is asymmetric, in the sense that M_CI(X, Y, Z) ≠ M_CI(Y, X, Z), we refer to this measure as the Asymmetric Quadratic Measure of Conditional Independence, or AQMCI in short.

Let g_x(y, z) and h_x(z) be estimates of P(X ≤ x | Y = y, Z = z) and P(X ≤ x | Z = z), respectively⁵. Then an estimate of M_CI is given by

M̂²_CI = ∫ ( g_x(y, z) − h_x(z) )² dF̂_X(x) dF̂_YZ(y, z),   (4-25)

⁵ We prefer the notation g_x(y, z) and h_x(z) instead of g(x, y, z) and h(x, z) since we will be estimating g and h for each x separately.


[Figure 4-1. Depiction of the measure of conditional independence M_CI and its estimator.]

where F̂(·) denotes the empirical CDF as in (4-13). Denote by M̃_CI the quantity

M̃²_CI = ∫ ( g_x(y, z) − h_x(z) )² dF_X(x) dF_YZ(y, z);   (4-26)

then, using the triangle inequality (in R), we get

|M_CI − M̂_CI| ≤ |M_CI − M̃_CI| + |M̃_CI − M̂_CI|.

The term |M̃_CI − M̂_CI| vanishes as n → ∞, since the empirical estimate F̂(·) converges to the actual cumulative function F(·) almost surely and the estimates g_x(y, z) and h_x(z) are both bounded. Using the triangle inequality once more on the first term, in the form | ||a|| − ||b|| | ≤ ||a − b||, we get

|M_CI − M̃_CI| ≤ [ ∫ ( ( P(X ≤ x | Y = y, Z = z) − P(X ≤ x | Z = z) ) − ( g_x(y, z) − h_x(z) ) )² dF_X(x) dF_YZ(y, z) ]^{1/2},

and then, using ||a − b|| ≤ ||a|| + ||b||,

≤ [ ∫ ( P(X ≤ x | Y = y, Z = z) − g_x(y, z) )²


dF_X(x) dF_YZ(y, z) ]^{1/2} + [ ∫ ( P(X ≤ x | Z = z) − h_x(z) )² dF_X(x) dF_YZ(y, z) ]^{1/2}

= ε_{X|YZ} + ε_{X|Z}   (from Figure 4-1).

This inequality shows that the absolute difference between the actual and the estimated M_CI is upper bounded by the distances between the actual and the estimated conditional probabilities. Therefore, in order to estimate M_CI properly, we need to estimate g_x(y, z) and h_x(z) in such a way that the corresponding distances ε_{X|YZ} and ε_{X|Z} are minimized.

4.2.2 Estimation of Conditional Distribution

Let us consider the general problem of finding the least squares estimate p_u(v)⁶ of P(U ≤ u | V = v), i.e., finding an estimate p_u(v) such that the distance

ε²_{U|V} = ∫ ( P(U ≤ u | V = v) − p_u(v) )² dF_U(u) dF_V(v)

is minimized. Since F_U(u) is a non-negative measure, ε²_{U|V} is minimized if

ε²_{U|V}(u) = ∫ ( P(U ≤ u | V = v) − p_u(v) )² dF_V(v)

is minimized for all u.

Notice that P(U ≤ u) = E[I(U ≤ u)]. Using this equality, we expand ε²_{U|V}(u) to get

ε²_{U|V}(u) = ∫ ( P(U ≤ u | V = v) − p_u(v) )² dF_V(v)
= C − 2 ∫ P(U ≤ u | V = v) p_u(v) dF_V(v) + ∫ p²_u(v) dF_V(v)

⁶ p_u(v) is equivalent to P̂(U ≤ u | V = v) in Figure 4-1.


= C − 2 ∫∫ I(ũ ≤ u) dF_{U|V}(ũ | v) p_u(v) dF_V(v) + ∫ p²_u(v) dF_V(v)
= C − 2 ∫ I(ũ ≤ u) p_u(v) dF_{UV}(ũ, v) + ∫ p²_u(v) dF_V(v),

where C = ∫ P²(U ≤ u | V = v) dF_V(v) is a constant that does not depend on p_u(v). Notice that here F_{U|V}(ũ | v) = P(U ≤ ũ | V = v), and not P(U ≤ ũ | V ≤ v), and therefore dF_{U|V}(ũ | v) dF_V(v) = dF_{UV}(ũ, v).

Now, assume that we have the following model:

p_u(v) = Σ_{i=1}^m β_{ui} φ_i(v),

where {φ_i}_{i=1}^m is a set of linearly independent functions and {β_{ui}}_{i=1}^m is a set of coefficients. Then,

ε²_{U|V}(u) = C − 2 E[ Σ_{j=1}^m β_{uj} I(U ≤ u) φ_j(V) ] + E[ Σ_{i=1}^m Σ_{j=1}^m β_{ui} β_{uj} φ_i(V) φ_j(V) ]
= C − 2 b^T β_u + β_u^T A β_u,   (4-27)

where b(j) = E[I(U ≤ u) φ_j(V)] is a column vector, A(i, j) = E[φ_i(V) φ_j(V)] is a matrix, and β_u(i) = β_{ui} is a column vector of coefficients. Given realizations {(u_i, v_i)}_{i=1}^n, b and A can be consistently estimated as

b̂(j) = (1/n) Σ_{i=1}^n I(u_i ≤ u) φ_j(v_i) = (1/n) (Φ e_u)(j)

and

Â(i, j) = (1/n) Σ_{k=1}^n φ_i(v_k) φ_j(v_k) = (1/n) (Φ Φ^T)(i, j),

where Φ(j, i) = φ_j(v_i) is a matrix of basis functions evaluated at the sample locations and e_u(i) = I(u_i ≤ u) is a column vector.
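The resulting least squares solve can be sketched as follows; we use a Gaussian kernel basis centered at the samples (a choice made concrete in Section 4.2.4) and add a small ridge term, since the plain solution can over-fit, as noted next. The function name, the kernel size, and the ridge value are ours:

```python
import numpy as np
from math import erf

def fit_conditional_cdf(u, v, sigma=0.5, lam=1e-3):
    """Least squares estimate p_u(w) = sum_i beta_ui * phi_i(w) of
    P(U <= u | V = w), with Gaussian kernel basis functions centered at
    the samples and ridge lam to keep Phi Phi^T well conditioned.
    Returns the (n, n) matrix P with P[j, i] = p_{u_j}(v_i)."""
    n = len(u)
    Phi = np.exp(-(v[:, None] - v[None, :]) ** 2 / (2 * sigma ** 2))  # Phi(j,i) = phi_j(v_i)
    E = (u[:, None] <= u[None, :]).astype(float)   # column j is e_{u_j}
    # beta_{u_j} = (Phi Phi^T + lam*n*I)^(-1) Phi e_{u_j}, all j at once
    B = np.linalg.solve(Phi @ Phi.T + lam * n * np.eye(n), Phi @ E)
    return B.T @ Phi                               # p_{u_j} at every v_i

rng = np.random.default_rng(8)
n = 300
v = rng.normal(size=n)
u = v + 0.5 * rng.normal(size=n)        # U | V = w  ~  N(w, 0.25)
P = fit_conditional_cdf(u, v)
# Compare with the true conditional CDF, the normal CDF of (u_j - v_i)/0.5
truth = 0.5 * (1 + np.vectorize(erf)((u[:, None] - v[None, :]) / (0.5 * np.sqrt(2))))
print(np.mean(np.abs(P - truth)))       # small mean absolute error
```

Each threshold u_j is just another right-hand side of the same regularized normal equations, which is why the estimator below can be written in a single matrix expression.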


Therefore, ε²_{U|V}(u) is approximately minimized at β_u = (Φ Φ^T)^{−1} Φ e_u, where p_u(v) = Σ_{i=1}^m (β_u)_i φ_i(v). Notice that this particular solution might suffer from the over-fitting problem. We will address this issue in a moment.

4.2.3 Estimator of Conditional Independence

Let g_x(y, z) and h_x(z) be the least squares estimates of P(X ≤ x | Y = y, Z = z) and P(X ≤ x | Z = z), respectively, where g_x(y, z) = Σ_{i=1}^m (β_x)_i φ_i(y, z) and h_x(z) = Σ_{i=1}^l (r_x)_i ψ_i(z), {φ_i}_{i=1}^m and {ψ_i}_{i=1}^l are sets of linearly independent functions, and β_x and r_x are the solutions of the respective least squares problems. Then M̂_CI can be evaluated as follows:

M̂²_CI = (1/n²) Σ_{i=1}^n Σ_{j=1}^n ( g_{x_i}(y_j, z_j) − h_{x_i}(z_j) )²
= (1/n²) Σ_{i=1}^n Σ_{j=1}^n ( Σ_{k=1}^m (β_{x_i})_k φ_k(y_j, z_j) − Σ_{k=1}^l (r_{x_i})_k ψ_k(z_j) )²
= (1/n²) || B^T Φ − Γ^T Ψ ||²_F,

where B(j, i) = β_{x_i}(j) and Γ(j, i) = r_{x_i}(j) are matrices of coefficients, Φ(j, i) = φ_j(y_i, z_i) and Ψ(j, i) = ψ_j(z_i), and ||·||_F denotes the Frobenius norm. Replacing the solutions of the coefficient vectors in this equation, we get

M̂²_CI = (1/n²) || ( Φ^T (Φ Φ^T)^{−1} Φ − Ψ^T (Ψ Ψ^T)^{−1} Ψ ) E ||²_F,   (4-28)

where E(j, i) = I(x_j ≤ x_i). Notice that E = [e_{x_1}, ..., e_{x_n}].

4.2.4 Choice of Functions and Regularization

Since we are finding an estimate p_u(v) of P(U ≤ u | V = v) by minimizing the L₂ distance between these two functions, we need to ensure that the expansion functions φ


are rich enough to make this distance arbitrarily small as n → ∞. Therefore, we choose

p_u(v) = Σ_{i=1}^n β_{ui} κ(v, v_i),   (4-29)

where κ is a strictly positive definite kernel (Bochner, 1941). It can be shown that, as n → ∞, (4-29) can approximate any L₂(F_V) function with arbitrary accuracy (i.e., in the L₂ distance sense). To be more specific, given samples {x_i, y_i, z_i}_{i=1}^n, we choose φ_i(y, z) = κ((y, z), (y_i, z_i)) and ψ_i(z) = κ(z, z_i) for i = 1, ..., n. Using these functions the matrices Φ and Ψ become the Gram matrices K_{(Y,Z)(Y,Z)} and K_{ZZ}, respectively, as in (4-22).

Notice that, since κ is strictly positive definite, the matrices K_{(Y,Z)(Y,Z)} and K_{ZZ} are invertible, i.e., K^T (K K^T)^{−1} K = I. This creates a problem since it makes M̂_CI = 0 regardless of the realizations. Moreover, if κ is just positive definite (rather than strictly positive definite) then the matrices are ill-posed and therefore cannot be inverted. Therefore, in either situation the solution needs to be properly regularized. We propose to use Tikhonov regularization on the norm of the coefficient vector, i.e., instead of solving (4-27) we solve the regularized problem

ε²(R)_{U|V}(u) = C − 2 b^T β_u + β_u^T A β_u + λ_n β_u^T β_u,   (4-30)

where λ_n is the regularization parameter. It can be easily shown that the regularized estimate of M̂_CI is given by

M̂²_CI(λ_n) = (1/n²) || ( T_{(Y,Z)(Y,Z)} − T_{ZZ} ) E ||²_F,   (4-31)

where T_UU = K^T_UU (K^T_UU K_UU + λ_n n I)^{−1} K_UU.

It is clear from (4-31) that the estimator involves three free parameters, namely the kernel κ, the associated kernel size, and the regularization λ_n. We discussed earlier that κ should be a strictly positive definite kernel such as the Gaussian or Laplacian. We work with a Laplacian kernel, and set the kernel size to the median distance between


the samples. Finally, we set the regularization parameter to decrease as the sample size increases, i.e., we select $\lambda_n = n^{-1}$. Notice that the proposed estimator requires inverting an $(n \times n)$ matrix, which is $O(n^3)$ in computation. However, this computation can be reduced substantially by exploiting the fact that Gram matrices often have a fast decaying eigenvalue structure, and by using methods such as incomplete Cholesky decomposition (Fine and Scheinberg 2001).

4.2.5 Discussion

A measure of conditional independence derived in this fashion has several advantages over the existing ones. First, the kernel based regression implicitly involves a smoothing operation that allows it to perform well in higher dimensions. Since the quantity $P(U \le u \mid V \le v)$ cannot be estimated in a similar way, this approach gives the corresponding measure of conditional independence an edge over the KS and CM statistics in higher dimensions. Second, a kernel based approach is observed to be less sensitive to kernel size variation (Takeuchi et al. 2006; Kanamori et al. 2008). This happens because a kernel based approach expresses the conditional probability $P(U \le u \mid V = v)$ as a weighted sum of kernels rather than an unweighted sum, as in the Parzen estimate. Therefore, this approach is expected to be more robust to the selection of the kernel and the kernel size, since the weights provide extra degrees of freedom that help minimize the effect of varying the kernel and the kernel size, degrees of freedom that are not available in the Parzen density approaches. This, albeit, comes at the expense of a regularization parameter to handle these extra degrees of freedom. Third, the proposed approach originates from regression, and therefore facilitates understanding these free parameters and provides an easier way of selecting them from knowledge of the regression domain, e.g., using cross validation (Kanamori et al. 2008). This cannot be achieved, for example, in kernel based measures such as HSNCIC, where the primary focus is to embed a distribution in the RKHS using any characteristic kernel rather than to explicitly estimate the distribution. However, the use of a kernel based estimation naturally links AQMCI to HSNCIC. It


has been recently shown that in the RKHS based method the conditional probability is implicitly represented as a sum of weighted kernels, which is very similar to AQMCI (Song et al. 2009). Moreover, these two approaches have similar computational complexity, and they both require selecting an appropriate kernel, kernel size and regularization. In this chapter, we use rules of thumb to select the free parameters rather than cross validation, since the latter is computationally expensive, especially in conjunction with the surrogate test.

4.3 Simulation

In this section, we describe a series of experiments to highlight the properties of AQMCI and to compare it against other available methods such as KS, CM, HD and HSNCIC. We choose a variety of synthetic datasets including linear autoregressive processes, nonlinear dynamical systems, nonlinear systems with external excitation, and systems where the causality appears in the variance and not in the mean. Notice that, to the best of our knowledge, the conditional independence based approaches have only been applied to small networks, mostly involving two time series (Diks and Panchenko 2006; Su and White 2008). However, we apply these measures to larger networks, involving up to 5 time series, to better demonstrate the advantages and the disadvantages of these methods. The objectives of the experiments are thus twofold: first, we compare the results obtained by the conditional independence methods against linear Granger causality (LG), and second, we compare the conditional independence methods among themselves. Since the conditional independence based methods do not put any assumptions on the underlying process, they tend to provide similar performance over all the datasets; whereas LG, although it often outperforms these methods on linear processes, fails to perform well when the linearity assumption is invalidated. Moreover, due to the same reason, we observe that the conditional independence based methods perform a bit worse when the number of conditioning variables is increased. For the DGPs described below, we also observe that the CM and the KS tests either perform poorly in terms of acceptance rate, or perform somewhat erratically in terms of accepting spurious connections. However, we expect this


since these measures usually suffer from estimation issues due to the fact that they require estimating the empirical CDF in 5 to 9 dimensions using only 200 samples. Therefore, we only present results for CM. The other measures of conditional independence, i.e., HD, HSNCIC and AQMCI, usually outperform them since they involve a kernel, while demonstrating competitive performance among themselves.

4.3.1 Experimental Setup

For each experiment, we generate a multivariate time series $\{W_t\}$ with 2 to 5 elements, and focus on finding the causal connectivity among these elements. To separate a direct cause from an indirect cause we test for conditional causality. If the original time series is bivariate, then this is equivalent to testing $\{W^{(1)}_t\} \not\to \{W^{(2)}_t\}$ and vice versa. However, if $\{W_t\}$ has three or more elements, then for any two elements $i$ and $j \ne i$ we test if $\{W^{(i)}_t\} \not\to \{W^{(j)}_t\} \mid \{W^{(k)}_t\}$, where $\{W^{(i)}_t\} \cup \{W^{(j)}_t\} \cup \{W^{(k)}_t\} = \{W_t\}$. Since all the simulated time series are of second order, i.e., the present value only depends on at most two past values, we always condition on a maximum of two past values; i.e., to test $\{X_t\} \not\to \{Y_t\} \mid \{Z_t\}$ we test

$$Y_t \perp [X_{t-1}, X_{t-2}] \mid [Y_{t-1}, Y_{t-2}, Z^{(1)}_{t-1}, Z^{(1)}_{t-2}, Z^{(2)}_{t-1}, Z^{(2)}_{t-2}, \ldots].$$

The estimators KS and CM do not involve any free parameters. Therefore, we simply follow (4-11) and (4-12) to estimate them. The estimator of HD, on the other hand, involves three free parameters: the kernel for density estimation, the associated kernel size, and the weighting function. Following (Su and White 2008), we set the kernel to

$$p(u) = \prod_d \frac{3 - (u^{(d)})^2}{2}\, g(u^{(d)})$$

where $g(v) = \exp(-v^2/2\sigma^2)/\sqrt{2\pi}\sigma$ is a Gaussian function, set the kernel size of the Gaussian function to $\sigma = 2\,n^{-1/8.5}$, and set the weighting function to

$$a(u) = \prod_d \left[\Big(\tfrac{1}{2} + \tfrac{1}{4}u^{(d)}\Big) I(-2 \le u^{(d)} \le 0) + \Big(\tfrac{1}{2} - \tfrac{1}{4}u^{(d)}\Big) I(0 \le u^{(d)} \le 2)\right].$$
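For concreteness, the lag-2 time embedding used in these tests can be sketched as follows. This is our own minimal illustration, not code from the dissertation; the helper name `embed_past` and the array conventions are assumptions, and NumPy is assumed available.

```python
import numpy as np

def embed_past(x, lags=2):
    """Stack the past values [x(t-1), ..., x(t-lags)], one row per time index t."""
    n = len(x)
    return np.column_stack([x[lags - k: n - k] for k in range(1, lags + 1)])

# To test {X_t} -/-> {Y_t} | {Z_t}, one would form
#   target = y[lags:]                                   (present value Y_t)
#   cause  = embed_past(x)                              ([X_{t-1}, X_{t-2}])
#   cond   = np.hstack([embed_past(y), embed_past(z)])  (past of Y and of Z)
```

With 5 time series and two lags, `cond` has 8 columns, which is the conditioning dimension mentioned above.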


For HSNCIC, we also need to choose three free parameters, namely the kernel, the associated kernel size, and the regularization. Following (Fukumizu et al. 2008), we choose a Gaussian kernel with the kernel size set to the median inter-sample distance, and set the regularization to $10^{-3}$. For AQMCI we choose a Laplacian kernel, i.e., $\kappa(u) = \prod_d \exp(-|u^{(d)}|/\sigma)$, with the kernel size set to the median inter-sample distance, and set the regularization to $\lambda_n = n^{-1}$. Other kernels, kernel size selection schemes, and regularization values have been explored in (Seth and Príncipe 2010b). A direct computation of HSNCIC and AQMCI requires $O(n^3)$ computation, whereas KS, CM and HD require $O(n^2)$ computation. To control the computational load of the experiments, we set the sample size to 200 for all the experiments. For LG we use the GCCA toolbox (Seth 2009).

Since we are working with a limited number of samples, we rely on a permutation test to judge the significance of the value obtained by the measure of conditional independence. We discuss the process of generating surrogate data in detail in the following subsection. For each permutation test we generate 200 surrogates to approximate the null distribution, and use a significance level of 0.10. We repeat each experiment 256 times and report the acceptance rate of a particular connection, i.e., the fraction of times the connection has been established, as the performance index. The acceptance rate should be close to 1 if the connection exists, or close to 0 if the connection does not exist.

4.3.2 Permutation Test

Generating surrogate data in the context of a test of conditional independence is not trivial. One of the popular approaches for generating surrogate data has been proposed by (Paparoditis and Politis 2000), and was also implemented by (Su and White 2007, 2008). Briefly, given samples $\{x_i, y_i, z_i\}_{i=1}^{n}$ from $(X, Y, Z)$, the aim of this approach is to generate samples $\{\tilde{x}_i, \tilde{y}_i, \tilde{z}_i\}_{i=1}^{n}$, representing $(\tilde{X}, \tilde{Y}, \tilde{Z})$, such that $\tilde{X} \perp \tilde{Y} \mid \tilde{Z}$ and


$(X, Z) \sim (\tilde{X}, \tilde{Z})$ and $(Y, Z) \sim (\tilde{Y}, \tilde{Z})$.⁷ This is achieved by estimating the conditional density functions $\hat{f}_{X|Z}(x \mid z) = \sum_{i=1}^{n} p_1(x - x_i)\,p_3(z - z_i) \big/ \sum_{i=1}^{n} p_3(z - z_i)$ and $\hat{f}_{Y|Z}(y \mid z) = \sum_{i=1}^{n} p_2(y - y_i)\,p_3(z - z_i) \big/ \sum_{i=1}^{n} p_3(z - z_i)$, where $p_{\cdot}(\cdot)$ is a Parzen kernel for kernel density estimation, and then sampling from them. The complete approach can thus be stated in three steps: for each $i$, first, assign $\tilde{z}_i = z_i$; second, sample $\tilde{x}_i$ from the density $\hat{f}_{X|Z}(x \mid \tilde{z}_i)$; and third, sample $\tilde{y}_i$ from the density $\hat{f}_{Y|Z}(y \mid \tilde{z}_i)$. However, this approach requires selecting a resampling width. Since we only use 200 samples for each experiment and the number of conditioning variables ranges from 2 to 8, the appropriate resampling width becomes difficult to estimate. Therefore, we follow a different approach.

To generate surrogate data for the test of conditional Granger non-causality, we follow the approach described in (Diks and DeGoede 2001). Given samples $\{x_i, y_i, z_i\}_{i=1}^{n}$ from $\{X_t, Y_t, Z_t\}$, the objective is to generate samples $\{\tilde{x}_i, \tilde{y}_i, \tilde{z}_i\}_{i=1}^{n}$ representing a stochastic process $\{\tilde{X}_t, \tilde{Y}_t, \tilde{Z}_t\}$ such that $\{\tilde{X}_t\} \not\to \{\tilde{Y}_t\} \mid \{\tilde{Z}_t\}$, $\{X_t\} \sim \{\tilde{X}_t\}$ and $\{Y_t, Z_t\} \sim \{\tilde{Y}_t, \tilde{Z}_t\}$.⁸ This is usually done by fixing the observations $\{y^{(e)}_i, z^{(e)}_i\}_{i=1}^{n}$ while randomly permuting the observations $\{x^{(e)}_i\}_{i=1}^{n}$, where $\{u^{(e)}_i\}_{i=1}^{n}$ denotes the realizations from the time embedded stochastic process $\{U^{(e)}_t\}$. Notice that this approach not only destroys the causal influence but also, possibly, destroys the internal structure of the process $\{\tilde{X}_t\}$. However, this approach is preferred in practice due to its simplicity, and it is observed to provide sensible results. Moreover, it does not involve any parameter. In the context of the test of conditional independence, however, this approach still does not produce surrogate realizations $\{(\tilde{x}_i, \tilde{y}_i, \tilde{z}_i)\}_{i=1}^{n}$ such that $Y \perp X \mid Z$, but merely modifies the realizations such that $(Y, Z) \perp X$, where the latter is a sufficient condition for the former.
⁷ The expression $X \sim Y$ implies that the random variables $X$ and $Y$ follow the same distribution.

⁸ The expression $\{X_t\} \sim \{Y_t\}$ implies that the time series $\{X_t\}$ and $\{Y_t\}$ have the same joint probability law.
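The estimator (4-31) and the permutation test above can be sketched as follows. This is our own minimal NumPy illustration, not the dissertation's code: the helper names are hypothetical, the $1/n^2$ normalization follows (4-28), and we use the stated choices of a product-Laplacian kernel, the median inter-sample distance as kernel size, and $\lambda_n = n^{-1}$.

```python
import numpy as np

def _t_matrix(U, lam):
    """T_UU = K^T (K^T K + n*lam*I)^{-1} K for a product-Laplacian Gram matrix
    whose kernel size is the median inter-sample L1 distance."""
    n = U.shape[0]
    D = np.abs(U[:, None, :] - U[None, :, :]).sum(axis=2)  # pairwise L1 distances
    sigma = np.median(D[D > 0])                            # median inter-sample distance
    K = np.exp(-D / sigma)                                 # Laplacian Gram matrix (symmetric)
    return K @ np.linalg.solve(K @ K + n * lam * np.eye(n), K)

def aqmci(x, YZ, Z):
    """Regularized estimate (1/n^2) ||(T_(Y,Z)(Y,Z) - T_ZZ) E||_F^2,
    with E[j, i] = I(x_j <= x_i) and lam_n = 1/n as in the text."""
    n = len(x)
    E = (x[:, None] <= x[None, :]).astype(float)
    M = (_t_matrix(YZ, 1.0 / n) - _t_matrix(Z, 1.0 / n)) @ E
    return np.sum(M ** 2) / n ** 2

def surrogate_test(target, cause, cond, n_surr=200, rng=0):
    """Permute the candidate-cause embeddings while fixing (target, cond);
    report the statistic and the fraction of surrogates at or above it."""
    rng = np.random.default_rng(rng)
    stat = lambda c: aqmci(target, np.hstack([c, cond]), cond)
    s0 = stat(cause)
    null = np.array([stat(rng.permutation(cause, axis=0)) for _ in range(n_surr)])
    return s0, float(np.mean(null >= s0))
```

With the significance level of 0.10 used above, a connection would be accepted when the surrogate p-value falls below 0.10; repeating over independent runs gives acceptance rates of the kind reported in the tables below.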


Table 4-1. Performance of conditional independence measures in experiment A
(entries: acceptance rates of the indicated connections)

            x1 -> x2    x2 -> x1
(a) True      0.00        0.00
(b) LG        0.17        0.08
(c) CM        0.06        0.07
(d) HD        0.00        0.00
(e) HSNCIC    0.00        0.00
(f) AQMCI     0.00        0.01

A steady under-rejection of the null hypothesis, which we will observe in our simulations, can be attributed to this fact.

4.3.3 Data Generating Processes

In this section we generate data from several data generating processes (DGPs), and present the performance of the measures of conditional independence in detecting the true causal directions.

4.3.3.1 Experiment A

We start off with a simple bivariate model that does not exhibit any causality. Consider the following bivariate stochastic process,

$$x_1(t) = \epsilon_1, \qquad x_2(t) = \epsilon_2$$

where $(\epsilon_1, \epsilon_2) \sim N(\mathbf{0}, \Sigma)$, and

$$\Sigma = \begin{bmatrix} 1 & 0.8 \\ 0.8 & 1 \end{bmatrix}.$$

The individual processes $\{X^{(1)}_t\}$ and $\{X^{(2)}_t\}$ are instantaneously correlated; however, they do not exhibit any causal influence on each other. From Table 4-1, we observe that all the methods have successfully concluded this fact.
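This DGP can be simulated as follows; a minimal sketch of ours (not the dissertation's code), assuming NumPy and a hypothetical helper name.

```python
import numpy as np

def dgp_a(n, seed=0):
    """Experiment A: two instantaneously correlated white-noise series,
    x1(t) = eps1(t), x2(t) = eps2(t), with corr(eps1, eps2) = 0.8."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, 0.8], [0.8, 1.0]])
    eps = rng.multivariate_normal(np.zeros(2), cov, size=n)
    return eps[:, 0], eps[:, 1]
```

Since neither series depends on the past of the other, any detected connection is a false positive, which is what the acceptance rates in Table 4-1 quantify.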


Table 4-2. Performance of conditional independence measures in experiment B
(entries: acceptance rates of the indicated connections)

            x1 -> x2    x2 -> x1
(a) True      1.00        0.00
(b) LG        0.95        0.12
(c) CM        0.30        0.44
(d) HD        0.05        0.02
(e) HSNCIC    0.20        0.07
(f) AQMCI     0.55        0.05

4.3.3.2 Experiment B

Next, we consider a bivariate linear autoregressive process as follows,

$$x_1(t) = 0.9\,x_1(t-1) - 0.5\,x_1(t-2) + \epsilon_1$$
$$x_2(t) = 0.8\,x_2(t-1) - 0.5\,x_2(t-2) + 0.16\,x_1(t-1) - 0.2\,x_1(t-2) + \epsilon_2$$

where $(\epsilon_1, \epsilon_2) \sim N(\mathbf{0}, \Sigma)$, and

$$\Sigma = \begin{bmatrix} 1 & 0.4 \\ 0.4 & 0.7 \end{bmatrix}.$$

Notice that the individual processes are again instantaneously correlated. However, they also exhibit a unidirectional causal influence, $\{X^{(1)}_t\} \to \{X^{(2)}_t\}$. Since the time series is linear, this problem can be easily solved by LG, as shown in Table 4-2. However, we observe that this problem is rather difficult to solve using the conditional independence approach. A possible reason for this phenomenon could be the inability of the conditional independence based methods to condition on the conditioning variables properly. However, from Table 4-2, we observe that AQMCI outperforms the other methods based on conditional independence in detecting the true causal influence. Moreover, we also observe that the conditional independence based approaches tend to perform better if the correlation between the noise sources $\epsilon_1$ and $\epsilon_2$ is low.
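The bivariate AR(2) process above can be simulated as follows; again a sketch of ours (hypothetical helper name, NumPy assumed), with a burn-in period added to discard the transient.

```python
import numpy as np

def dgp_b(n, seed=0, burn=100):
    """Experiment B: bivariate AR(2) with x1 -> x2 and instantaneously
    correlated innovations, cov = [[1, 0.4], [0.4, 0.7]]."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, 0.4], [0.4, 0.7]])
    eps = rng.multivariate_normal(np.zeros(2), cov, size=n + burn)
    x1 = np.zeros(n + burn)
    x2 = np.zeros(n + burn)
    for t in range(2, n + burn):
        x1[t] = 0.9 * x1[t-1] - 0.5 * x1[t-2] + eps[t, 0]
        x2[t] = (0.8 * x2[t-1] - 0.5 * x2[t-2]
                 + 0.16 * x1[t-1] - 0.2 * x1[t-2] + eps[t, 1])
    return x1[burn:], x2[burn:]   # drop the transient
```

Both AR polynomials have roots of modulus $\sqrt{0.5} < 1$, so the process is stationary and the burn-in suffices.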


4.3.3.3 Experiment C

Next, we again consider a linear autoregressive model, but involving three time series, as follows,

$$x_1(t) = 0.8\,x_1(t-1) - 0.5\,x_1(t-2) + 0.4\,x_3(t-1) + \epsilon_1$$
$$x_2(t) = 0.9\,x_2(t-1) - 0.8\,x_2(t-2) + \epsilon_2$$
$$x_3(t) = 0.5\,x_3(t-1) - 0.2\,x_3(t-2) + 0.5\,x_2(t-1) + \epsilon_3$$

where $\epsilon_1 \sim N(0, 0.3^2)$, $\epsilon_2 \sim N(0, 1)$, and $\epsilon_3 \sim N(0, 0.2^2)$. Since this problem involves 3 time series, we need to separate the direct cause from the indirect cause. Being a linear problem, it can be easily solved by LG, as shown in Table 4-3. From Table 4-3 we also observe that AQMCI has been able to successfully recover the true connectivity, whereas HD has failed to detect the causality $\{X^{(3)}_t\} \to \{X^{(1)}_t\}$. We observe that the conditional independence based methods demonstrate a tendency to accept the false causal connectivity $\{X^{(2)}_t\} \to \{X^{(1)}_t\}$. This happens because $\{X^{(2)}_t\}$ is actually an indirect cause of $\{X^{(1)}_t\}$: it causes $\{X^{(1)}_t\}$ through $\{X^{(3)}_t\}$. Therefore, the tendency to accept this particular connection, again, shows the inability of the conditional independence based methods to condition properly on the intermediate time series. However, we see that AQMCI suffers the least from this problem.

4.3.3.4 Experiment D

Next, we again consider a linear autoregressive process, however involving 5 time series, as follows,

$$x_1(t) = 0.95\sqrt{2}\,x_1(t-1) - 0.9025\,x_1(t-2) + \epsilon_1$$
$$x_2(t) = 0.5\,x_1(t-2) + \epsilon_2$$
$$x_3(t) = -0.4\,x_1(t-2) + \epsilon_3$$


Table 4-3. Performance of conditional independence measures in experiment C
(entry in row $x_i$, column $x_j$: acceptance rate of the connection $x_i \to x_j$)

(a) True
       x1    x2    x3
x1      -  0.00  0.00
x2   0.00     -  1.00
x3   1.00  0.00     -

(b) LG
       x1    x2    x3
x1      -  0.11  0.11
x2   0.10     -  1.00
x3   1.00  0.11     -

(c) CM
       x1    x2    x3
x1      -  0.81  0.98
x2   0.54     -  1.00
x3   1.00  0.89     -

(d) HD
       x1    x2    x3
x1      -  0.00  0.00
x2   0.00     -  1.00
x3   0.05  0.00     -

(e) HSNCIC
       x1    x2    x3
x1      -  0.00  0.05
x2   1.00     -  1.00
x3   0.78  0.00     -

(f) AQMCI
       x1    x2    x3
x1      -  0.00  0.00
x2   0.31     -  1.00
x3   1.00  0.00     -

$$x_4(t) = -0.5\,x_1(t-1) + 0.25\sqrt{2}\,\big(x_4(t-1) + x_5(t-1)\big) + \epsilon_4$$
$$x_5(t) = -0.25\sqrt{2}\,\big(x_4(t-1) - x_5(t-2)\big) + \epsilon_5$$

where $\epsilon_1, \epsilon_5 \sim N(0, 0.6^2)$, $\epsilon_2 \sim N(0, 0.5^2)$, $\epsilon_3 \sim N(0, 0.3^2)$, and $\epsilon_4 \sim N(0, 0.3^2)$. This particular problem is again linear, and therefore it can be solved using LG, as shown in Table 4-4. However, it is a rather difficult problem for the other methods, since it involves 5 time series which, in turn, makes the number of conditioning variables for the conditional independence test 8. We observe from Table 4-4 that AQMCI and HSNCIC have outperformed the other methods based on conditional independence. However, they have failed to detect the connection $\{X^{(4)}_t\} \to \{X^{(5)}_t\}$.

4.3.3.5 Experiment E

Next, we consider a nonlinear network with external excitation, as follows,

$$x_1(t) = 2\,\frac{1 - \exp(-x_2(t-1))}{1 + \exp(-x_2(t-1))} + \epsilon_1$$
$$x_2(t) = 0.1\,x_1(t-1)\,x_2(t-1) + 0.5\,x_2(t-1) + \frac{\big(6 + x_1(t-1)\big)\,x_3(t-1)}{6} + \epsilon_2$$


Table 4-4. Performance of conditional independence measures in experiment D
(entry in row $x_i$, column $x_j$: acceptance rate of the connection $x_i \to x_j$)

(a) True
       x1    x2    x3    x4    x5
x1      -  1.00  1.00  1.00  0.00
x2   0.00     -  0.00  0.00  0.00
x3   0.00  0.00     -  0.00  0.00
x4   0.00  0.00  0.00     -  1.00
x5   0.00  0.00  0.00  1.00     -

(b) LG
       x1    x2    x3    x4    x5
x1      -  1.00  1.00  1.00  0.07
x2   0.11     -  0.07  0.11  0.11
x3   0.11  0.11     -  0.11  0.10
x4   0.12  0.08  0.09     -  0.89
x5   0.09  0.07  0.14  1.00     -

(c) CM
       x1    x2    x3    x4    x5
x1      -  0.90  0.04  0.13  0.17
x2   0.01     -  0.78  0.06  0.66
x3   0.30  0.06     -  0.53  0.13
x4   0.16  0.08  0.04     -  0.02
x5   0.08  0.14  0.35  0.78     -

(d) HD
       x1    x2    x3    x4    x5
x1      -  0.23  0.32  0.40  0.09
x2   0.23     -  0.23  0.22  0.26
x3   0.13  0.13     -  0.13  0.15
x4   0.05  0.04  0.03     -  0.06
x5   0.31  0.31  0.23  0.34     -

(e) HSNCIC
       x1    x2    x3    x4    x5
x1      -  0.84  0.98  1.00  0.00
x2   0.00     -  0.00  0.00  0.00
x3   0.00  0.00     -  0.00  0.00
x4   0.00  0.00  0.00     -  0.00
x5   0.00  0.00  0.00  1.00     -

(f) AQMCI
       x1    x2    x3    x4    x5
x1      -  1.00  1.00  1.00  0.00
x2   0.00     -  0.00  0.00  0.00
x3   0.00  0.00     -  0.00  0.00
x4   0.00  0.00  0.00     -  0.01
x5   0.00  0.00  0.00  0.96     -


Table 4-5. Performance of conditional independence measures in experiment E
(entry in row $x_i$, column $x_j$: acceptance rate of the connection $x_i \to x_j$)

(a) True
       x1    x2    x3
x1      -  1.00  0.00
x2   1.00     -  0.00
x3   0.00  1.00     -

(b) LG
       x1    x2    x3
x1      -  0.91  0.07
x2   1.00     -  0.13
x3   0.19  1.00     -

(c) CM
       x1    x2    x3
x1      -  0.76  0.04
x2   1.00     -  0.00
x3   1.00  1.00     -

(d) HD
       x1    x2    x3
x1      -  0.01  0.02
x2   0.00     -  0.00
x3   0.02  0.92     -

(e) HSNCIC
       x1    x2    x3
x1      -  0.00  0.00
x2   0.35     -  0.00
x3   0.00  1.00     -

(f) AQMCI
       x1    x2    x3
x1      -  0.00  0.00
x2   0.98     -  0.00
x3   0.00  1.00     -

where $\epsilon_1, \epsilon_2 \sim N(0, 0.1^2)$ and $x_3(t) \sim U(0,1)$ is an external input. We observe from Table 4-5 that, although the other methods fail to recover all the valid connections, AQMCI outperforms them, missing only the connection $\{X^{(1)}_t\} \to \{X^{(2)}_t\}$. This particular connection is, again, missed because the conditional independence based methods fail to condition on the conditioning variables properly. This fact can be easily established by reducing the number of past values that we condition on from 2 to 1. We present the result of this modified experiment in Table 4-6, which clearly demonstrates our reasoning. Also, notice that for this modified experiment all the methods, including KS and CM, perform well; this strengthens our reasoning of why these two methods perform poorly in the other experiments, namely due to the estimation issues in higher dimensions. Notice that, although this problem is nonlinear, LG still manages to recover the true connectivity, as shown in Tables 4-5 and 4-6. However, we observe that when the variance of the noise terms $\epsilon_1$ and $\epsilon_2$ is zero, LG establishes a spurious connection $\{X^{(3)}_t\} \to \{X^{(1)}_t\}$.

4.3.3.6 Experiment F

Next, we consider a larger nonlinear network with external excitation, as follows,

$$x_1(t+1) = \left(\frac{x_1(t)}{1 + x_1^2(t)} + 1\right)\sin x_2(t) + \epsilon_1$$
$$x_2(t+1) = x_1(t)\exp\!\left(-\frac{x_1^2(t) + x_2^2(t)}{8}\right) + x_2(t)\cos x_2(t)$$


Table 4-6. Performance of conditional independence measures in experiment E (single lag)
(entry in row $x_i$, column $x_j$: acceptance rate of the connection $x_i \to x_j$)

(a) True
       x1    x2    x3
x1      -  1.00  0.04
x2   1.00     -  0.02
x3   0.12  1.00     -

(b) LG
       x1    x2    x3
x1      -  1.00  0.07
x2   1.00     -  0.14
x3   0.11  1.00     -

(c) CM
       x1    x2    x3
x1      -  1.00  0.04
x2   1.00     -  0.03
x3   0.04  1.00     -

(d) HD
       x1    x2    x3
x1      -  0.75  0.00
x2   1.00     -  0.00
x3   0.12  1.00     -

(e) HSNCIC
       x1    x2    x3
x1      -  0.99  0.00
x2   1.00     -  0.00
x3   0.10  1.00     -

(f) AQMCI
       x1    x2    x3
x1      -  0.95  0.02
x2   1.00     -  0.02
x3   0.09  1.00     -

$$\qquad\quad +\; \frac{x_4^3(t)}{1 + x_4^2(t) + 0.5\cos\big(x_1(t) + x_2(t)\big)} + \epsilon_2$$
$$x_3(t+1) = \frac{x_1(t)}{1 + 0.5\sin x_2(t)} + \frac{x_2(t)}{1 + 0.5\sin x_1(t)} + \epsilon_3$$

where $\epsilon_1, \epsilon_2, \epsilon_3 \sim N(0, 0.1^2)$ and $x_4(t) \sim N(0,1)$ is an external input. Since this problem is again nonlinear, it cannot, in principle, be solved by LG. However, surprisingly, LG provides reasonable results in detecting the existing connections, but it also establishes some non-existing connections. The conditional independence based methods, however, perform poorly. The reason behind this is again the same as described in the previous example. To prove this, we present the result of the same experiment in Table 4-8, after reducing the number of past values that we condition on from 2 to 1, and we observe the exact same phenomenon. Notice that CM and HSNCIC decide in favor of two spurious connections $\{X^{(3)}_t\} \to \{X^{(1)}_t\}, \{X^{(2)}_t\}$, whereas HD and AQMCI successfully eliminate them. However, from Table 4-7 we observe that AQMCI outperforms the other methods when we condition on a larger number of variables.

4.3.3.7 Experiment G

Next, we consider a problem where the causality does not appear in the mean, but in the variance, as follows,

$$x_1(t),\, x_2(t) \sim N\big(0,\; 1 + 0.4\,x_2^2(t-1)\big).$$
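The variance-coupled DGP of experiment G can be simulated as follows. This sketch is ours, and it encodes one reading of the model, flagged as an assumption in the comments: both series are drawn independently at each step with conditional variance $1 + 0.4\,x_2^2(t-1)$, so the past of $x_2$ drives $x_1$ in variance only.

```python
import numpy as np

def dgp_g(n, seed=0):
    """Experiment G (our reading of the model): x1(t) and x2(t) are drawn
    independently from N(0, 1 + 0.4*x2(t-1)^2); causality acts in variance."""
    rng = np.random.default_rng(seed)
    x1 = np.zeros(n)
    x2 = np.zeros(n)
    for t in range(1, n):
        s = np.sqrt(1.0 + 0.4 * x2[t-1] ** 2)   # conditional std. deviation
        x1[t] = s * rng.normal()
        x2[t] = s * rng.normal()
    return x1, x2
```

Under this reading, $x_2$ is an ARCH(1)-type process and the coupling $x_2 \to x_1$ is invisible to mean-based tests, which is why LG fails in Table 4-9.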


Table 4-7. Performance of conditional independence measures in experiment F
(entry in row $x_i$, column $x_j$: acceptance rate of the connection $x_i \to x_j$)

(a) True
       x1    x2    x3    x4
x1      -  1.00  1.00  0.00
x2   1.00     -  1.00  0.00
x3   0.00  0.00     -  0.00
x4   0.00  1.00  0.00     -

(b) LG
       x1    x2    x3    x4
x1      -  0.85  0.99  0.10
x2   0.99     -  1.00  0.10
x3   0.50  0.25     -  0.05
x4   0.14  1.00  0.20     -

(c) CM
       x1    x2    x3    x4
x1      -  0.07  0.82  0.00
x2   0.98     -  0.99  0.00
x3   0.30  0.06     -  0.01
x4   0.45  0.91  0.24     -

(d) HD
       x1    x2    x3    x4
x1      -  0.09  0.16  0.00
x2   0.00     -  0.00  0.00
x3   0.00  0.00     -  0.00
x4   0.09  0.17  0.06     -

(e) HSNCIC
       x1    x2    x3    x4
x1      -  0.00  0.00  0.00
x2   0.88     -  0.97  0.00
x3   0.01  0.00     -  0.00
x4   0.95  1.00  0.91     -

(f) AQMCI
       x1    x2    x3    x4
x1      -  0.00  0.02  0.00
x2   1.00     -  0.98  0.00
x3   0.00  0.00     -  0.00
x4   0.02  1.00  0.09     -

Table 4-8. Performance of conditional independence measures in experiment F (single lag)
(entry in row $x_i$, column $x_j$: acceptance rate of the connection $x_i \to x_j$)

(a) True
       x1    x2    x3    x4
x1      -  1.00  1.00  0.00
x2   1.00     -  1.00  0.00
x3   0.00  0.00     -  0.00
x4   0.00  1.00  0.00     -

(b) LG
       x1    x2    x3    x4
x1      -  1.00  1.00  0.08
x2   1.00     -  1.00  0.11
x3   0.64  0.96     -  0.07
x4   0.07  1.00  0.10     -

(c) CM
       x1    x2    x3    x4
x1      -  0.85  1.00  0.00
x2   1.00     -  1.00  0.04
x3   0.40  0.82     -  0.02
x4   0.09  1.00  0.07     -

(d) HD
       x1    x2    x3    x4
x1      -  1.00  1.00  0.00
x2   1.00     -  1.00  0.00
x3   0.00  0.00     -  0.00
x4   0.09  1.00  0.09     -

(e) HSNCIC
       x1    x2    x3    x4
x1      -  1.00  1.00  0.00
x2   1.00     -  1.00  0.04
x3   0.25  0.15     -  0.00
x4   0.13  1.00  0.13     -

(f) AQMCI
       x1    x2    x3    x4
x1      -  1.00  1.00  0.00
x2   1.00     -  1.00  0.00
x3   0.00  0.00     -  0.00
x4   0.11  1.00  0.12     -

Table 4-9. Performance of conditional independence measures in experiment G
(entries: acceptance rates of the indicated connections)

            x1 -> x2    x2 -> x1
(a) True      0.00        1.00
(b) LG        0.12        0.26
(c) CM        0.08        0.13
(d) HD        0.10        0.33
(e) HSNCIC    0.32        0.91
(f) AQMCI     0.05        0.25
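For completeness, the experiment F network (Tables 4-7 and 4-8) can be simulated as follows. This is our own sketch: the equations are as reconstructed above, and the helper name and seeding convention are ours.

```python
import numpy as np

def dgp_f(n, seed=0):
    """Experiment F: nonlinear network driven by an external input x4 ~ N(0,1);
    true connections: x1 <-> x2, x1 -> x3, x2 -> x3, x4 -> x2."""
    rng = np.random.default_rng(seed)
    e = rng.normal(0.0, 0.1, size=(n, 3))
    x4 = rng.normal(size=n)
    x1 = np.zeros(n); x2 = np.zeros(n); x3 = np.zeros(n)
    for t in range(n - 1):
        x1[t+1] = (x1[t] / (1 + x1[t]**2) + 1) * np.sin(x2[t]) + e[t, 0]
        x2[t+1] = (x1[t] * np.exp(-(x1[t]**2 + x2[t]**2) / 8)
                   + x2[t] * np.cos(x2[t])
                   + x4[t]**3 / (1 + x4[t]**2 + 0.5 * np.cos(x1[t] + x2[t]))
                   + e[t, 1])
        x3[t+1] = (x1[t] / (1 + 0.5 * np.sin(x2[t]))
                   + x2[t] / (1 + 0.5 * np.sin(x1[t])) + e[t, 2])
    return x1, x2, x3, x4
```

All denominators stay bounded away from zero ($1 + 0.5\sin(\cdot) \ge 0.5$), so the trajectories remain bounded.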


Figure 4-2. Performance of conditional independence measures in experiment H. (Acceptance rate versus coupling strength $c$ for the connections $x_1 \to x_2$ and $x_2 \to x_1$; panels A-E: LG, CM, HD, HSNCIC and AQMCI.)

Notice that, since this particular problem does not fit the LG framework, it cannot be solved using LG, as is clear from Table 4-9. We observe from Table 4-9 that the conditional independence based methods, on the other hand, can detect the true causal influence, and that HSNCIC outperforms the rest of the methods in this task. However, it also tends to favor the non-existent causal direction.

4.3.3.8 Experiment H

Next, we consider a bivariate nonlinear dynamical network with varying coupling strength, as follows,

$$x_1(t) = 3.4\,x_1(t-1)\,\big(1 - x_1^2(t-1)\big)\,e^{-x_1^2(t-1)} + 0.8\,x_1(t-2) + \epsilon_1$$
$$x_2(t) = 3.4\,x_2(t-1)\,\big(1 - x_2^2(t-1)\big)\,e^{-x_2^2(t-1)} + 0.5\,x_2(t-2) + c\,x_1^2(t-2) + \epsilon_2$$

where $\epsilon_1, \epsilon_2 \sim N(0, 0.1^2)$ and $0 \le c \le 1$ is the coupling strength. In this particular problem $\{X^{(1)}_t\} \to \{X^{(2)}_t\}$ for $c > 0$, but not the other way around. It has been shown that LG fails to detect the true causal connectivity. From Figure 4-2 we observe that all the methods recover the true causal structure with more or less accuracy, and that HSNCIC and AQMCI perform very similarly. We observe that, although LG performs somewhat well, its performance does not vary over different sample sizes, which suggests that the connection established by LG is actually spurious in nature.
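The coupled nonlinear maps of experiment H can be simulated as follows; our own sketch (hypothetical helper name, NumPy assumed), with the coupling strength $c$ exposed as a parameter so the sweep in Figure 4-2 can be reproduced.

```python
import numpy as np

def dgp_h(n, c, seed=0):
    """Experiment H: coupled nonlinear maps; x1 drives x2 with strength c
    through the term c*x1(t-2)^2, with no coupling in the reverse direction."""
    rng = np.random.default_rng(seed)
    f = lambda u: 3.4 * u * (1.0 - u ** 2) * np.exp(-u ** 2)  # bounded nonlinearity
    e = rng.normal(0.0, 0.1, size=(n, 2))
    x1 = np.zeros(n)
    x2 = np.zeros(n)
    for t in range(2, n):
        x1[t] = f(x1[t-1]) + 0.8 * x1[t-2] + e[t, 0]
        x2[t] = f(x2[t-1]) + 0.5 * x2[t-2] + c * x1[t-2] ** 2 + e[t, 1]
    return x1, x2
```

Because the coupling enters through $x_1^2$, it is invisible to a linear cross-correlation at first order, which is consistent with LG's flat performance over $c$.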


4.3.3.9 Experiment I

Next, we again consider a nonlinear dynamical network with varying coupling strength, however involving three time series, as follows,

$$x_1(t) = 3.4\,x_1(t-1)\,\big(1 - x_1^2(t-1)\big)\,e^{-x_1^2(t-1)} + \epsilon_1$$
$$x_2(t) = 3.4\,x_2(t-1)\,\big(1 - x_2^2(t-1)\big)\,e^{-x_2^2(t-1)} + 0.5\,x_1(t-1) + \epsilon_2$$
$$x_3(t) = 3.4\,x_2(t-1)\,\big(1 - x_2^2(t-1)\big)\,e^{-x_2^2(t-1)} + 0.3\,x_2(t-1) + c\,x_1(t-1) + \epsilon_3$$

where $\epsilon_1, \epsilon_2, \epsilon_3 \sim N(0, 0.1^2)$ and $0 \le c \le 1$ is the coupling strength. For this particular problem, $\{X^{(1)}_t\} \to \{X^{(3)}_t\}$ for $c > 0$; however, the causal connections $\{X^{(1)}_t\} \to \{X^{(2)}_t\}$ and $\{X^{(2)}_t\} \to \{X^{(3)}_t\}$ do not depend on $c$. From Figure 4-3 we observe that both HSNCIC and AQMCI have successfully recovered the true causal structure. LG also performs very well, since the causal effect is partially linear in nature.

4.4 Summary

In this chapter, we have addressed the issue of exploiting tests of conditional independence to detect Granger non-causality. We have discussed the advantages and disadvantages of several existing measures of conditional independence, have explored a novel measure of conditional independence to alleviate some of these issues, and have compared its performance against other state of the art techniques in the context of detecting (conditional) Granger non-causality. A few attractive features of the conditional independence based approach are that, first, it does not put any assumption on the type of causality under consideration, such as causality in mean or causality in variance; second, it can be applied to linear and nonlinear systems alike, without any modification; and third, it can infer the correct causality with a substantially small sample size.


Figure 4-3. Performance of conditional independence measures in experiment I. (Acceptance rate versus coupling strength $c$; rows correspond to the pairs $(x_1, x_2)$, $(x_1, x_3)$ and $(x_2, x_3)$, and columns to LG, CM, HD, HSNCIC and AQMCI.)


CHAPTER 5
GENERALIZED ASSOCIATION

In the modern scientific and engineering literature, one often encounters the term dependence in contexts such as "X is highly dependent on Y" or "Z is more dependent on X than on Y". Upon reading that two random variables are dependent, a typical researcher usually infers one of three things: first, that the two random variables are not independent; second, that the two random variables are correlated; and third, that the two random variables share some information. However, none of these three interpretations explores the concept of dependence in its entirety. For example, although it is common practice to define dependence as the mere absence of independence, this definition does not address the question of how much dependent the two random variables are. On the contrary, although correlation successfully addresses this particular issue using the notion of the correlation coefficient, it suffers from two serious drawbacks: first, it only captures the notion of linear dependence, and second, it is only defined for real valued random variables. Mutual information, on the other hand, has been favored by many as a measure of dependence, mostly due to Shannon's mathematical interpretation of the otherwise vague concepts of uncertainty and information. Unlike correlation, mutual information can be defined for any two random variables irrespective of their domains, and it is capable of capturing nonlinear relations between the two variables. However, its interpretation becomes somewhat obscure when the random variables are not categorical but more exotic, e.g., real valued, time series, graphs etc. Moreover, mutual information is usually difficult to estimate, since it requires estimating the Radon-Nikodym derivative, which is an ill-posed problem on its own. Crude estimators of mutual information, based on histograms (Peng et al. 2005) or Parzen estimates (Granger and Lin 1991), usually do not convey the properties or the interpretations of the actual measure; e.g., unlike the measure itself, the estimated mutual information between two random variables is never invariant to one-to-one transformations. Therefore, the conclusions drawn by these estimators remain in question.


The estimation issue, however, is sometimes avoided, especially in engineering applications, by replacing the measure of dependence with, strictly speaking, a measure of independence, since the latter is often easier to evaluate (Song et al. 2007b). However, although this approach tends to provide reliable results, its validity remains an open issue to skeptical minds.

The concepts of correlation and mutual information have been generalized on many occasions to circumvent their respective limitations. For example, Spearman (Spearman 1904) and Kendall (Kendall 1938) have proposed two generalizations of correlation that, in essence, capture monotone dependence (Nelsen 2002) rather than just linear dependence, whereas Rényi has characterized dependence through a set of properties that are satisfied by certain measures such as mutual information, thus enriching the understanding of this concept. The measures of association provide a clear understanding of what dependence is, in the context of both the random variables and the realizations (Kruskal 1958), since they treat the realizations as a probability law by assigning equal masses over the realization values (Whitt 1976). However, unlike mutual information, the concept of association is restricted to only real valued random variables, and moreover, from an application perspective, the origin of a measure of association is mostly intuitive in nature. Rényi's approach, on the other hand, provides a more rigorous treatment of this subject by characterizing a measure of dependence in terms of a set of desired properties (Rényi 1959). This approach is more suitable for applications, since it advocates designing a measure of dependence following certain specifications, which can certainly be chosen according to the application. Moreover, although initially explored in the context of two real valued random variables, this idea has been extended to two random vectors of arbitrary dimensions, and it has been shown that the $\phi$-divergences also follow similar properties (Joe 1989). Since mutual information is a special case of the $\phi$-divergence, Rényi's approach provides a better understanding of mutual information in the context of a more general class of random variables; but, unlike association, this approach does not provide any intuitive explanation of what dependence is, in the context of the realizations.


From a practical perspective, both the concept of association and the approach considered by Rényi are insufficient for understanding and quantifying dependence in its entirety. We submit that, in practice, one only has access to a finite number of realizations, and therefore, it is more important to understand the properties of an estimator of dependence rather than a measure. Moreover, in practice, the realizations can exist in more abstract domains such as vectors or point processes (Seth et al. 2010), and therefore, dependence should be well-defined irrespective of the nature of these domains.

Although the concept of dependence remains largely unexplored, its application in modern science and engineering, such as in machine learning, neuroscience, statistics, and economics (Bach and Jordan 2002; Granger et al. 2004; Príncipe 2010), continues to grow. Dependence is an attractive concept because it relates and explains complex ideas in simple colloquial terms, very much like information and entropy in communication systems. For example, in variable selection one should select a reduced set of variables on which the target is most dependent, whereas in dimensionality reduction one should project the original data to a subspace that is most dependent on the original space. Similarly, in time series analysis, one should explore and investigate the past instances on which the present instance is most dependent, whereas in system modeling, one should find a model that maximizes the dependence of the output observations on the input observations. Therefore, it is natural that a measure of dependence, at least in an engineering context, should possess the intuitive understanding that relates it to these and other application areas. Also, since these applications only involve a set of realizations, rather than a set of random variables, it is perhaps more reasonable to explore dependence from the point of view of realizations than random variables.

In order to satisfy these requirements, we abandon the traditional approach of quantifying dependence by designing an appropriate measure and then finding a good estimator; instead, we directly explore the concept of dependence in terms of the realizations, and derive a meaningful estimator. Since many scientific and engineering


applications, such as regression, classification, prediction, and filtering, deal with establishing a functional relationship between two random variables, we explore a set of desired properties of the estimator of dependence in this context. We address the questions of, given a set of realizations from a pair of random variables, what attributes make them dependent, when and how does the dependence between these random variables increase or decrease, and when does it reach its maximum value. With this new understanding, we generalize the concept of association to arbitrary metric spaces. We call this generalization the generalized association, and the corresponding estimator the generalized measure of association (GMA). Some interesting properties of this estimator are that it is bounded, parameter-free, easy to compute, and asymmetric in nature. Being parameter-free allows the value of the estimator to be unambiguous, whereas being asymmetric allows it to determine, in some sense, the causal influence of one variable over the other. These rather interesting properties of the proposed approach make it an excellent choice for applications such as time series analysis, variable selection, dimensionality reduction and causal inference.

The rest of the chapter is organized as follows: in section 5.1 we provide a detailed overview of the existing measures of dependence, and discuss their advantages and disadvantages, particularly in the context of applications. In section 5.2 we describe a set of desired properties to characterize the concept of dependence, and introduce a novel estimator of dependence, by generalizing the concept of association, that satisfies many of these requirements. In section 5.3 we provide some simulation results to explore the applicability of GMA, and to also discuss its strengths and limitations. Finally, in section 5.4 we conclude the chapter with some guidelines for future work.

5.1 Concepts and Limitations

In this section we provide a detailed overview of the existing literature on measures of dependence, mostly in chronological order, discuss the intuition behind these measures, and point out their limitations.

5.1.1 Independence

The concept of dependence begins with the concept of independence. Let X and Y be two random variables that take values in the measurable spaces (𝒳, σ(𝒳)) and (𝒴, σ(𝒴)) respectively, where σ(𝒳) denotes the σ-algebra over the set 𝒳. In statistics, these two random variables are said to be independent if and only if P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B) for all A ∈ σ(𝒳), B ∈ σ(𝒴). What this equation implies is that the probability that the random variables take a certain value as a pair, i.e., (X, Y) taking a value in (A, B), is equal to the product of the probabilities that they take the respective values individually, i.e., X taking a value in A and Y taking a value in B. In other words, the random variables do not influence each other, i.e., they take their values independently of each other.

5.1.2 Correlation

When the random variables are not independent, then they are dependent. However, a measure of dependence is different from a measure of independence, despite the fact that they are often treated as synonymous in the literature (Hsing et al. 2005; Achard 2008; Príncipe 2010). By definition, a measure of independence is a nonnegative measure that assumes the value zero if and only if the random variables are independent, whereas a measure of dependence satisfies some additional properties that are almost always very intuitive in nature. For example, if D(X, Y) is a measure of dependence between random variables X and Y, then it is expected that D(X, X) is the same for all X, irrespective of the distribution of X. Moreover, if (X, Y) is a pair of real valued random variables, then D(X, aY) should not depend on a ∈ ℝ, i.e., a measure of dependence should extract more intricate information about the random variables than just their scale. In brief, a measure of dependence should not just detect non-independence, but should also capture the strength of dependence.

The concept of capturing the strength of dependence was, perhaps, first conceived by Pearson, who introduced the notion of the correlation coefficient (Mari and Kotz 2001). Let (X, Y) be a pair of real valued zero mean unit variance random variables. Then the

Figure 5-1. Is A more dependent than B?

correlation coefficient is defined as ρ = E[XY] = ∫∫ (F_XY(x, y) − F_X(x) F_Y(y)) dx dy, where F_X(x) = P(X ≤ x) denotes the cumulative distribution function (CDF), and the second equality is due to Hoeffding (Mari and Kotz 2001). Notice that, unlike independence, the correlation coefficient is defined only for real valued random variables, where the product and the ordering are well defined.

If we put the mathematics aside for a moment, then what the correlation coefficient captures is whether the random variables X and Y take large values together or not, i.e., a higher correlation coefficient implies that when X is large then so is Y. Given realizations {(x_i, y_i)}_{i=1}^{n}, the correlation coefficient can be consistently estimated as ρ̂ = (1/n) Σ_{i=1}^{n} x_i y_i. Notice that the empirical correlation coefficient carries the same analogy as the measure itself, i.e., it captures whether large realizations of X are associated with large realizations of Y (Whitt 1976). However, the correlation coefficient defined in this way can only capture linear relationships; e.g., two random variables can have a monotonic (but not linear) relationship, which also implies that large values of X are associated with large values of Y, but the correlation coefficient between them might be low.

5.1.3 Association

The concept of correlation can, however, be generalized, and two such generalizations have been proposed by Spearman (Spearman 1904) and Kendall (Kendall 1938), respectively. Instead of working with the realizations {(x_i, y_i)}_{i=1}^{n} directly, Spearman

proposed to work with the ranks of the realizations, and defined (Spearman's) correlation as the correlation between the ranks of {x_i}_{i=1}^{n} and the ranks of {y_i}_{i=1}^{n}. Notice that the inspiration behind this approach still remains the same, i.e., one wishes to capture whether large realizations of X are associated with large realizations of Y. However, here the value of a realization is judged in the context of the other realizations, i.e., the notion of being large becomes relative rather than absolute.

Kendall, on the other hand, proposed a different approach for quantifying dependence by introducing the notion of concordance and discordance. Two realizations (x_i, y_i) and (x_j, y_j) are said to be concordant if x_i − x_j and y_i − y_j have the same sign; otherwise they are said to be discordant. Kendall defined (Kendall's) correlation to be the difference between the number of concordant pairs and the number of discordant pairs, normalized by the total number of pairs. Notice that, although this idea is fundamentally different, it nonetheless captures a similar attribute, namely whether relatively large realizations of X are associated with relatively large realizations of Y (Kruskal 1958).

The approaches explored by Spearman and Kendall, also known as association, are essentially stronger than the original correlation concept since they do not consider the value of a realization but only its relative position with respect to the rest of the realizations. As a result they capture monotonic dependence and not just linear dependence (Mari and Kotz 2001), i.e., they are invariant to monotonic transformations, and they are maximum when the realizations are monotonic functions of each other. However, association is again defined only for real valued random variables, and therefore these methods fail when the random variables are, say, vectors.

5.1.4 Copula

The idea of association can also be formulated in terms of copulas (Nelsen 1999), a concept that, to this day, remains largely unexplored in the engineering community. Sklar showed that any continuous bivariate joint distribution function F_XY(x, y) can be uniquely written as F_XY(x, y) = C(F_X(x), F_Y(y)), where C : [0,1]² → [0,1] is a distribution

function itself with uniform marginals. This function is called the copula, and it separates the information about dependence from the information about the marginals. For example, a trivial copula is C(u, v) = uv, and it implies that the random variables are independent irrespective of the marginal distributions of X and Y. Other popular parametric copula families are the Gaussian copula and Archimedean copulas such as the Clayton copula, Gumbel copula, Frank copula, etc. (Mari and Kotz 2001).

Since a copula extracts the dependence structure of a joint distribution, a measure of dependence can be trivially designed by computing the distance between the extracted copula and the independence copula, e.g., D_c² = ∫∫ (C(u, v) − uv)² du dv (Schweizer and Wolff 1981). Given realizations {(x_i, y_i)}_{i=1}^{n}, this measure can be consistently estimated as

    D̂_c² = (1/n²) Σ_{i=1}^{n} Σ_{j=1}^{n} (F̂_XY(x_i, y_j) − F̂_X(x_i) F̂_Y(y_j))²,

where F̂_U(u) = (1/n) Σ_{i=1}^{n} I(u_i ≤ u) is the empirical CDF.

Notice that, following the concept of copula, the dependence between two real valued random variables does not depend on their marginal distributions. Since a change of marginal distribution is equivalent to transforming the random variable by a suitable monotonic function, a copula based measure of dependence is also invariant to monotonic transformations. This attribute of the measure is also inherited by the corresponding estimator, since the estimated value only depends on the ranks of the realizations, i.e., their relative positions, and not their absolute values. It can be shown that both Spearman's correlation ρ_S and Kendall's correlation τ can be written in terms of the copula as follows (Schweizer and Wolff 1981): ρ_S = 12 ∫∫ (C(u, v) − uv) du dv and τ = 4 ∫∫ C(u, v) dC(u, v) − 1. However, although the copula provides a formal treatment of these two concepts, other measures of dependence such as D_c do not carry the same understanding of dependence as association in the context of realizations.
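As a concrete illustration, the estimator D̂_c² above can be computed directly from empirical CDFs. The following is a minimal sketch (the function and variable names are ours, not from the text); it evaluates the double sum verbatim in O(n³) time, so it is only practical for small samples:

```python
import numpy as np

def copula_distance(x, y):
    """Plug-in estimate of D_c^2 = int (C(u,v) - uv)^2 du dv
    (Schweizer-Wolff style) from paired 1-D samples x, y."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    # Marginal empirical CDFs evaluated at the sample points.
    Fx = np.array([np.mean(x <= xi) for xi in x])
    Fy = np.array([np.mean(y <= yj) for yj in y])
    # Joint empirical CDF on the grid {(x_i, y_j)}.
    Fxy = np.array([[np.mean((x <= xi) & (y <= yj)) for yj in y]
                    for xi in x])
    return np.mean((Fxy - np.outer(Fx, Fy)) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=200)
d_dep = copula_distance(x, x + 0.1 * rng.normal(size=200))  # strong dependence
d_ind = copula_distance(x, rng.normal(size=200))            # near independence
```

Because the estimator depends on the data only through the empirical CDFs, i.e., through the ranks, applying a monotonic transformation such as exp to x leaves the value unchanged, which illustrates the invariance property discussed above.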

5.1.5 Divergence

The limitation of association and copula can be circumvented by the concept of divergence, such as mean square contingency and mutual information. Let X and Y be two categorical random variables that take values in the sets {a_1, ..., a_m} and {b_1, ..., b_n} respectively. Then mean square contingency is defined as (Pearson 1915)

    C = Σ_{i=1}^{m} Σ_{j=1}^{n} ( P(a_i, b_j) / (P(a_i) P(b_j)) − 1 )² P(a_i) P(b_j),

whereas mutual information is defined as (Shannon and Weaver 1949)

    I = Σ_{i=1}^{m} Σ_{j=1}^{n} P(a_i, b_j) log ( P(a_i, b_j) / (P(a_i) P(b_j)) ).

Mean square contingency was introduced by Pearson in the context of designing a test of independence (Pearson 1915). It was later discussed that mean square contingency also follows some desired properties of a measure of dependence, such as being zero if and only if the random variables are independent, and being maximum when the value of one random variable can be fully determined from the other (Steffensen 1934).

Mutual information, on the other hand, was proposed by Shannon in the context of capturing the remaining uncertainty, or entropy, of a random variable when the value of the other random variable is known (Shannon and Weaver 1949), and it was observed that it follows the same properties as mean square contingency. Both mean square contingency and mutual information, however, are actually two special cases of a more general family of divergences known as the φ-divergences (Csiszár 1967). Other φ-divergences include the variational distance and Rényi's α-divergence. The φ-divergence between two categorical random variables is consistently estimated by replacing the theoretical probability values by empirical probability values, and this estimator carries the same properties and interpretations of dependence as the measure.

Once again, if we put the mathematics aside for a moment, then what the φ-divergences capture is how different the probability that the random variables take a certain value as

a pair is from the probability that they take the values individually, i.e., how distant the random variables are from being independent of each other. The random variables become maximally dependent when, given a value of one variable, the other variable takes only a certain value with probability 1. Notice that this condition, in a sense, is stricter than monotonic dependence, since it captures a one-to-one relationship. Although a one-to-one relationship makes perfect sense if the random variables are categorical, since dependence should not depend on the label of a category, this concept becomes rather strict in other situations, e.g., when the random variables are real valued, since this property becomes hard to capture from a finite sample size.

It has been shown that the concept of φ-divergences can be generalized to random vectors using the Radon-Nikodym derivative, i.e.,

    D_φ = ∫ φ( f_XY(x, y) / (f_X(x) f_Y(y)) ) f_X(x) f_Y(y) dx dy,

where φ is a convex function and f_Z(z) denotes the probability density function of Z, and that the φ-divergences defined in this way satisfy many desired properties of a measure of dependence (Joe 1989; Micheas and Zografos 2006). However, this extension gives rise to a rather strict measure of dependence. For example, φ-divergences are maximum when the joint probability law is singular with respect to the product of the marginal probability laws, which is a stricter condition than even a one-to-one functional relationship. On the other hand, the φ-divergence measures defined in this way sometimes lack certain desired properties of a measure of dependence. For example, although these measures are maximum when the random variables share a one-to-one relationship, the converse is usually not true.

However, the most serious drawback of the φ-divergences is that they are difficult to estimate, and part of this difficulty arises from the estimation of the Radon-Nikodym derivative, which is an ill-posed problem itself. Although Parzen type estimators (Granger and Lin 1991) or nearest neighbor based estimators (Kraskov et al.

2004) are usually popular in ℝ^n, these methods cannot be trivially extended to an abstract space. Moreover, these estimators, even in ℝ^n, hardly follow any property of the corresponding measure; e.g., they may not reach their maximum even if the random variables share a linear relationship. Moreover, unlike the concept of association, it remains unclear what attributes of a set of realizations contribute to dependence.

Estimating dependence in an abstract space, however, is often essential. For example, consider estimating dependence between two sets of neural spike trains¹, a crucial problem in many neuroscientific applications such as functional connectivity analysis among neural assemblies (Engel et al. 2001). The space of spike trains lacks basic Euclidean structure, such as additivity and multiplicativity, and a base measure, such as the Lebesgue measure, with respect to which the Radon-Nikodym derivative can be defined. Therefore, a direct estimation of mutual information in this space is not possible, and other surrogate approaches are considered, such as representing the spike trains in a Euclidean space by binning, or as categorical variables by clustering (Victor and Purpura 1997), or by dividing them into groups by their spike counts and estimating mutual information separately for each group by representing them in a Euclidean space (Victor 2002). However, these approaches have several limitations; e.g., binning ignores the time structure of the spike train, whereas grouping makes each group sparse and susceptible to small sample estimation issues. Therefore, an alternate approach toward measuring dependence is worth investigating in this research area.

5.1.6 Rényi's Approach

Rényi, in one of his seminal papers, addressed the problem of functional dependence between two real valued random variables by discussing seven postulates that he found desirable for a measure of dependence (Rényi 1959). According to Rényi, if D(X, Y) is a measure of dependence between two random variables X and Y, then:

¹ In essence, a spike train can be thought of as an ordered set of real numbers.

Figure 5-2. Situations where the cause X and the effect Y cannot/can be separated. (A: invertible, 0.70/0.71; B: non-invertible, 0.70/0.57.)

1. D(X, Y) is defined for any X and Y, neither of them being constant with probability 1.
2. D(X, Y) = D(Y, X).
3. 0 ≤ D(X, Y) ≤ 1.
4. D(X, Y) = 0 if and only if X and Y are independent.
5. D(X, Y) = 1 if there is a strict dependence between X and Y, i.e., either X = g(Y) or Y = f(X), where f and g are Borel-measurable functions.
6. If the Borel-measurable functions f and g map the real axis in a one to one way onto itself, D(f(X), g(Y)) = D(X, Y).
7. If the joint distribution of X and Y is normal, then D(X, Y) = |ρ(X, Y)|.

These properties have later been extended to two random vectors of arbitrary dimensions, and it has been shown that φ-divergences such as mutual information and mean square contingency follow these desired properties (Micheas and Zografos 2006). Rényi's approach, therefore, generalizes the concept of mutual information, and provides a better understanding of this concept in the context of two more abstract random variables, rather than just categorical ones.

However, although these postulates characterize dependence well, they do not address the fundamental question of how and when dependence increases or decreases. To elaborate on this issue, consider two pairs of random variables (X_1, Y_1) and (X_2, Y_2). Rényi's postulates identify when D(X_1, Y_1) or D(X_2, Y_2) is minimum or maximum, and also provide an idea of when the dependences D(X_1, Y_1) and D(X_2, Y_2) are the same. But they do not

address under what conditions either D(X_1, Y_1) > D(X_2, Y_2) or D(X_1, Y_1) < D(X_2, Y_2). Therefore, different measures that satisfy Rényi's postulates usually have their individual interpretations of dependence.

Moreover, Rényi only explored these desired properties in the context of two random variables, but not their realizations. It is usually observed that although a measure satisfies many of these properties, the corresponding estimator does not. For example, Rényi showed that the maximal correlation, i.e., ρ_m = sup_{f,g} ρ(f(X), g(Y)) where f, g ∈ H and H is an appropriate set of functions, satisfies all the desired properties, and it reaches its maximum when the random variables share a functional relationship f(X) = g(Y) (Rényi 1959). However, given a finite number of realizations it is always possible to find functions f and g that fit the data perfectly (Bach and Jordan 2002), which makes the dependence maximum irrespective of the realizations. This is undesirable, and therefore the estimator of maximal correlation needs to be regularized; yet it remains unclear under what conditions this regularized estimator reaches its maximum.

5.1.7 Quadratic Measures of Independence

Since estimating a sophisticated measure of dependence such as mutual information is difficult in practice, a different approach toward this problem has become increasingly popular, especially in the engineering community. This approach, strictly speaking, replaces a measure of dependence by a measure of independence (Achard 2008; Príncipe 2010). Among the many different approaches proposed in recent years, the one that has received perhaps the most attention is the kernel based approach. A bivariate function κ : 𝒳 × 𝒳 → ℝ is called strictly positive definite if ∫∫ κ(x, y) dμ(x) dμ(y) > 0 for any nonzero measure μ : σ(𝒳) → ℝ. Using this idea, a measure of independence can be trivially defined by replacing μ by P(x, y) − P(x) P(y) (Diks and Panchenko 2007; Gretton et al. 2007). An advantage of this approach is that it can be defined on any abstract space 𝒳 × 𝒴, and it can be easily estimated using the strong law of large numbers.
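To make this construction concrete, the following sketch estimates the quadratic quantity obtained by integrating a kernel against (P_XY − P_X P_Y) twice, using the plug-in (V-statistic) estimator. The Gaussian product kernel and the fixed width sigma are our own illustrative choices, not prescribed by the text:

```python
import numpy as np

def quadratic_independence(x, y, sigma=1.0):
    """Biased plug-in estimate of the kernel quadratic measure of
    independence with Gaussian product kernels of width sigma."""
    x = np.asarray(x, float).reshape(len(x), -1)
    y = np.asarray(y, float).reshape(len(y), -1)
    K = np.exp(-np.sum((x[:, None] - x[None, :]) ** 2, -1) / (2 * sigma ** 2))
    L = np.exp(-np.sum((y[:, None] - y[None, :]) ** 2, -1) / (2 * sigma ** 2))
    # Expanding the double integral against (P_XY - P_X P_Y) gives a
    # joint-joint term, a joint-marginal cross term, and a marginal-marginal term:
    return ((K * L).mean()
            - 2 * (K.mean(axis=1) * L.mean(axis=1)).mean()
            + K.mean() * L.mean())

rng = np.random.default_rng(1)
x = rng.normal(size=300)
d_dep = quadratic_independence(x, np.sin(x) + 0.1 * rng.normal(size=300))
d_ind = quadratic_independence(x, rng.normal(size=300))
```

The statistic is nonnegative and shrinks toward zero under independence, but its value visibly changes with sigma, which illustrates the kernel dependence criticized in the text.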

However, a disadvantage of this approach is that neither the measure nor the corresponding estimator satisfies the basic properties of a measure of dependence. For example, the dependence between a random variable and itself depends on the distribution of the random variable, and the dependence between a pair of bivariate Gaussian random variables depends on the variances of the individual variables. But these types of methods are nonetheless used in practice due to their simplicity in computation and understanding. Furthermore, the value of this measure depends on the kernel. Although an extension of this approach has been proposed that, in theory, does not depend on the choice of kernel (Fukumizu et al. 2008), the corresponding estimator still does. Once again, although this approach extends the concept of dependence to two abstract spaces, it does not provide any intuition about what aspects of a set of realizations in those spaces make them dependent. Finally, this approach relies on defining strictly positive definite kernels on abstract joint spaces, which is a non-trivial problem; e.g., defining strictly positive definite kernels on the space of spike trains remains an open issue.

5.1.8 Further Limitations

5.1.8.1 Parameters

Many well accepted estimators of dependence, such as the estimator of mutual information, involve free parameters. The inclusion of a free parameter often obscures the interpretation of the estimated dependence value, since the value of the measure changes with the value of this parameter, and the values of such parameters are often changed to make the estimators stable. For example, consider the histogram based estimator of mutual information (Peng et al. 2005). This estimator divides the space on which the realizations exist into disjoint segments, counts the number of realizations that fall in those segments, and treats these counts as estimates of the probability of each segment. These probability values are then used to compute the mutual information. As can easily be seen, the value of the estimator depends on how the segments are created. If the segments are small enough such that each segment contains at most a single realization, then

the estimated dependence becomes maximum, whereas if one segment is large enough that it contains all the realizations, then the estimated dependence becomes zero. This is obviously undesirable; i.e., the estimated dependence value should not depend on any free parameters.

5.1.8.2 Symmetry

Almost any available measure of dependence is symmetric in nature (Rényi 1959). However, in many real life applications we encounter two random variables X and Y such that Y = f(X), where f is a nonlinear, and often non-invertible, function, e.g., in regression. In such situations it is natural to ask whether D(X, Y) = D(Y, X), since the value of Y can be completely predicted from the value of X but not the other way around. Similarly, in a dimensionality reduction problem, one projects the original realizations onto a subspace, and this operation cannot be reversed. In such situations it is natural to think that the concept of dependence is perhaps asymmetric. Therefore, we argue that a dependence measure should be allowed to be asymmetric and should be able to capture the direction of interaction between the independent and the dependent variable, i.e., for a non-linear, non-invertible regression problem, D(X → Y) ≥ D(Y → X), where equality should be achieved if and only if the function f is invertible. See Figure 5-2 for an illustration. Here we read D(X → Y) as the dependence of Y on X.

5.2 A Novel Framework

Having identified the limitations and difficulties of the current view of measuring dependence, we discuss a set of properties that a dependence measure should satisfy to be applicable in practical problems. Then we propose a measure of dependence that follows these properties by generalizing the concept of association.

5.2.1 Properties of Dependence

5.2.1.1 Definition

A measure of dependence should be well defined between any two random variables. For example, one should be able to define dependence between a vector and a real valued

random variable. As we mentioned before, both correlation and association are only defined for real valued random variables, and therefore they address the problem of dependence between two vectors in terms of pairwise interactions. However, in many practical problems, such as variable selection, it is essential to compute a single number that quantifies the strength of interaction between, say, a vector, such as a set of input variables, and a real valued random variable, such as the target variable. Moreover, given a set of realizations {(x_i, y_i)}_{i=1}^{n}, it should be clear what aspects of these realizations make them dependent. For example, correlation and association provide a clear understanding of this issue, while φ-divergence and the kernel based measure fail to do so. Since in practice a user only has access to realizations, this is perhaps more important than understanding the notion of dependence in the context of the random variables.

5.2.1.2 Asymmetry

A dependence measure should be allowed to be asymmetric, and it should be able to show the causal influence between the independent and the dependent variable. This is again desirable, e.g., in variable selection and in causal inference.

5.2.1.3 Bounds

A measure of dependence should be both upper and lower bounded, where the lower bound should indicate independence and the upper bound should indicate strict dependence. The lower bound is often violated by measures of association, since they reach it even if the random variables are not independent, whereas the upper bound is often violated by φ-divergences, since they reach it even if the random variables are not strictly dependent, i.e., functions of each other. The meaning of strict dependence, however, should depend on the application. Since many applications look for a functional relationship between two variables, we argue that strict dependence should be equivalent to a deterministic functional relationship where the dependent variable can be completely determined from the independent variables. However, this condition is very strict in practice, since given a finite number of samples it is always possible to find a curve that fits

them properly; thus, establishing a functional relationship is often trivial in the context of the realizations. Therefore, we argue that this condition should be relaxed to an appropriate set of functions, such as linear, monotonic, continuous, or smooth functions of a certain degree. For example, the correlation coefficient is maximum when two random variables are linearly related, whereas association is maximum when the random variables are monotonic functions of each other.

5.2.1.4 Invariance

A measure of dependence should be invariant to an appropriate class of transformations. This property is important in order to create an equivalence class of random variable pairs for which the dependence values do not change. For example, it is desirable that the dependence value between two random variables not depend on the scale of the random variables. An appropriate class of transformations, however, should again depend on the application, and should be inherited by the estimator. A general set of desired transformations includes, e.g., unitary transformations when the random variables are vectors, and linear transformations when the random variables are real valued. Invariance to such transformations is again typical in the context of variable selection since, intuitively, the dependence between the inputs and the target should not depend on whether the inputs are rotated or scaled.

5.2.1.5 Estimator

An estimator of dependence should not involve any free parameters, and it should follow the properties of the corresponding measure. For example, it should convey the same notion of dependence, should be invariant to the same class of transformations, and should reach the maximum value under the same conditions.

5.2.2 Generalized Association

The problem of estimating dependence between two random variables can be approached in two conceptually different ways. The first approach starts with a measure, i.e., a function of the random variables, that satisfies some desired criteria, and then

proceeds with an estimator, usually consistent, of this measure from finite samples (Kruskal 1958). The second approach, on the other hand, starts with the samples, describes an intuitive way of capturing the attribute from the samples, and then discusses a measure for which the proposed method is a (usually consistent) estimator (Spearman 1904; Kendall 1938). The latter approach is more transparent in the context of understanding and evaluating dependence in terms of the realizations, rather than the random variables, whereas the former approach, although it establishes a desired measure, often fails to provide an appropriate estimator that follows similar properties. Since our goal is to understand dependence in terms of the samples, we favor the latter approach. In order to satisfy the desired properties of an estimator of dependence, we extend the concept of association from ℝ to a metric space. We consider a metric space since this space is often encountered in practical problems such as variable selection.

Inspired by the idea of association, we propose that, given realizations {(x_i, y_i)}_{i=1}^{n} from a pair of random variables (X, Y), Y is associated with X if close realization pairs of Y, i.e., {y_i, y_j}, are associated with close realization pairs of X, i.e., {x_i, x_j}, where the closeness is defined by the respective distance metrics of the spaces where the realizations lie, i.e., 𝒳 and 𝒴. In other words, if two realizations {x_i, x_j} are close in 𝒳, then the corresponding realizations {y_i, y_j} are close in 𝒴. Notice that this approach generalizes the concept of association since it does not require the domain of the random variables to have an order, i.e., it applies to a domain where the notion of a realization being large might not make sense. Moreover, it also gives us an idea about, given two sets of realizations from two distributions, what it means to say that one distribution is more dependent than the other (as in Figure 5-1).
This particular perspective of dependence is very intuitive. For example, in a regression scenario, if the dependent variable is related to the independent variable by a continuous function, then we expect that close points in the independent domain would be transformed to close points in the dependent domain, whereas in a classification

scenario, we expect the classification accuracy to improve if close input points belong to the same class, thus making this notion of dependence very useful in the context of variable selection. On the other hand, it is also useful in the context of dimensionality reduction, since it is desired that close realizations in the original space also remain close in the projected space.

5.2.3 Generalized Measure of Association

The next step is to design an estimator of dependence that satisfies the desired properties. To ensure that, we construct an estimator that only considers the relative positions of the realizations with respect to each other rather than their absolute locations. This idea is similar to the approach considered by Kendall or Spearman, as opposed to the approach considered by Pearson, and it allows the estimator of dependence derived from this concept to retain certain desired invariance properties of a measure of association. First, let us consider the following simple algorithm:

1. For all i ∈ {1, ..., n}, repeat the following;
2. Find x_j closest to x_i in terms of d_X, i.e., j = argmin_{j' ≠ i} d_X(x_i, x_{j'});
3. Find the rank r_i of y_j in terms of d_Y, i.e., r_i = #{j' : j' ≠ i, d_Y(y_{j'}, y_i) ≤ d_Y(y_j, y_i)};

where d_Z denotes the associated metric of the space Z. Then we can consider the r_i's to be realizations of a random variable R. We call this the rank variable. Now, if the random variables X and Y are independent, then R should be a uniform random variable, i.e., it should take any value from the set {1, ..., n − 1} with equal probability, assuming that no two realizations share the same distance from a third realization. In the other extreme case, we say that the random variables are strictly dependent when all the r_i's are 1. This happens, for example, when X = Y. In an intermediate situation the r_i's should be skewed toward a lower value, i.e., closer to 1. Therefore, according to our notion of dependence, two random variables are more dependent if the distribution of R is more skewed. The final piece of the estimation is to capture the skewness of R. This can be done in various ways. We start off with a very simple approach. We consider the area

under the CDF of R, normalized by (n − 1), to be the value of the dependence. Notice that when the random variables are independent this value should be close to 0.5, whereas when the random variables are highly dependent this value should be close to 1. We call this measure the generalized measure of association, or GMA. GMA can be rewritten as

    GMA = (1/(n − 1)) Σ_{r=1}^{n−1} (n − r) P(R = r),    (5-1)

where P(R = r) = #{i : r_i = r}/n is the empirical probability of the rank variable.

Notice, however, that GMA defined in this manner has a serious drawback: it assumes that no two realizations share the same distance from a third realization. Although this assumption is perhaps valid for a pair of continuous random variables, it can easily be violated in practice for many reasons, e.g., if the realizations are generated from a mixed distribution, or if the distance metric being used is a pseudometric. In such situations it is not possible to assign a unique rank to a realization. To resolve this issue, we consider a probabilistic ranking approach. In essence, when we observe ties in either X or Y, we consider several ranks to be equally probable, and assign equal probability to the possible ranks. To elaborate, let us consider the following algorithm:

1. Assign P(R = r) = 0 for all r ∈ {1, ..., (n − 1)};
2. For all i ∈ {1, ..., n}, repeat the following;
3. Find the set of points {x_j : j ∈ J} closest to x_i in terms of d_X, i.e., J = {j : j = argmin_{j' ≠ i} d_X(x_i, x_{j'})};
4. For all j ∈ J, find the spread of ranks, i.e., r_{i,max} and r_{i,min} of y_j in terms of d_Y, such that r_{i,max} = #{j' : j' ≠ i, d_Y(y_{j'}, y_i) ≤ d_Y(y_j, y_i)} and r_{i,min} = #{j' : j' ≠ i, d_Y(y_{j'}, y_i) < d_Y(y_j, y_i)};
5. For all rank values r_{i,min} < r ≤ r_{i,max}, assign P(R = r) ← P(R = r) + 1 / (|J| (r_{i,max} − r_{i,min}) n).
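To make the construction concrete, the following is a minimal sketch of the simple (tie-free) version of the estimator. Substituting the empirical P(R = r) into Eq. (5-1) gives the equivalent closed form GMA = (n − mean(r_i))/(n − 1), which the sketch uses; the names are ours, and ties, if any, are broken arbitrarily rather than by the probabilistic scheme above:

```python
import numpy as np

def gma(x, y):
    """Tie-free estimate of the generalized measure of association
    of y on x, Eq. (5-1); x and y are (n, d) arrays."""
    x = np.asarray(x, float).reshape(len(x), -1)
    y = np.asarray(y, float).reshape(len(y), -1)
    n = len(x)
    ranks = np.empty(n)
    for i in range(n):
        dx = np.linalg.norm(x - x[i], axis=1)
        dx[i] = np.inf                   # exclude the point itself
        j = np.argmin(dx)                # nearest neighbour of x_i in X
        dy = np.linalg.norm(y - y[i], axis=1)
        # rank of y_j among all y_j' (j' != i) by distance from y_i
        ranks[i] = np.sum((dy <= dy[j]) & (np.arange(n) != i))
    # empirical Eq. (5-1): area under the CDF of R over (n - 1)
    return (n - ranks.mean()) / (n - 1)

rng = np.random.default_rng(2)
x = rng.normal(size=(300, 1))
g_self = gma(x, x)                                    # strict dependence -> 1
g_dep  = gma(x, x + 0.3 * rng.normal(size=(300, 1)))  # noisy dependence
g_ind  = gma(x, rng.normal(size=(300, 1)))            # independence -> ~0.5
```

Note that gma(x, y) need not equal gma(y, x), reflecting the asymmetry that the estimator is designed to capture.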

Figure 5-3. Illustration of the estimation of generalized association.

It can easily be seen that upon completion of this algorithm Σ_r P(R = r) = 1, i.e., it is a valid probability mass function. Once again, GMA is defined as the area under the CDF of R as in (5-1). This approach has similar properties as the previous one, i.e., it is bounded between 0.5 and 1, where the lower and upper limits are achieved under independence and strict dependence, respectively. Moreover, when the distance values are distinct, the former approach is a special case of the latter. Figure 5-3 provides a graphical depiction of this method. In this particular situation the red point, i.e., the i-th point, has two closest neighbors, blue and green, in the 𝒳 space. For the blue sample r_{i,max} = 4 and r_{i,min} = 3, whereas for the green sample r_{i,max} = 9 and r_{i,min} = 5. Therefore, for this particular example, we have P(R = 4) ← P(R = 4) + 1/2 and P(R = k) ← P(R = k) + 1/(2 · 4) for k = 6, 7, 8, 9.

Notice that, since we are working with empirical estimates, the minimum value is not strictly 0.5; it can be less than 0.5, and it approaches 0.5 as n increases. This behavior is very similar to that of any consistent estimator of independence. Also, although we have not found a formal proof, we conjecture that the proposed measure does not assume values between 0 and 0.5 (except for inaccuracies due to finite samples), and therefore we always use a one-sided test to judge the significance of an acquired dependence value.
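Such a one-sided test is typically carried out by permutation: shuffling the realizations of Y destroys any dependence while preserving the marginals, so the shuffled GMA values approximate the null distribution. A minimal sketch follows; gma_stat is a compact tie-free version of the estimator, written out here only so the example stands alone, and the permutation count is an arbitrary choice:

```python
import numpy as np

def gma_stat(x, y):
    # Compact tie-free GMA of y on x, Eq. (5-1); x, y are (n, d) arrays.
    n = len(x)
    ranks = np.empty(n)
    for i in range(n):
        dx = np.linalg.norm(x - x[i], axis=1)
        dx[i] = np.inf
        j = np.argmin(dx)
        dy = np.linalg.norm(y - y[i], axis=1)
        ranks[i] = np.sum((dy <= dy[j]) & (np.arange(n) != i))
    return (n - ranks.mean()) / (n - 1)

def gma_test(x, y, num_perm=99, seed=0):
    """One-sided permutation test: p-value of seeing a GMA at least
    as large as the observed one under the independence null."""
    rng = np.random.default_rng(seed)
    observed = gma_stat(x, y)
    null = np.array([gma_stat(x, y[rng.permutation(len(y))])
                     for _ in range(num_perm)])
    p = (1 + np.sum(null >= observed)) / (1 + num_perm)
    return observed, p

rng = np.random.default_rng(3)
x = rng.normal(size=(150, 1))
stat, p = gma_test(x, x + 0.2 * rng.normal(size=(150, 1)))
```

The "+1" in the p-value keeps the test valid at finite permutation counts; the one-sided form matches the conjecture above that only values above 0.5 indicate dependence.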

5.2.4 Related Work

A concept similar to GMA has also been proposed by Friedman and Rafsky (1983) in the context of a test of independence, where the authors provide a graph theoretic interpretation of multivariate association. In essence, Friedman and Rafsky (1983) suggest building their method on K minimum spanning trees (KMST) or a K-th nearest neighbor graph (see Friedman and Rafsky (1983) for the definition) over the realizations of X and Y. If we consider K = 1, i.e., the nearest neighbor graph, then the two estimators explored by Friedman and Rafsky (1983) become γ_1 = #{i : r_i = 1} and γ_2 = Σ_i r_i, respectively. Although similar in nature, these estimators do not satisfy the desired properties of dependence that we have discussed. For example, γ_2 becomes smaller as dependence increases. On the other hand, although γ_1 is bounded between 0 and n, it only takes n + 1 distinct values, and therefore fails to detect a smooth transition of dependence. Also, this estimator usually gives less statistical power, since two random variables can be dependent even if the closest neighbors in X and Y do not match. Although Friedman and Rafsky (1983) show that γ_2 achieves sufficient power using KMST, it is unclear how K should be selected in practice. Also, computing the MST increases the computational cost of the measure of association.

5.2.5 Examples

To illustrate the idea of GMA further, let us consider the following examples, and show how the proposed method captures dependence. For the examples in Euclidean spaces, we use the ℓ_2 norm as the metric.

5.2.5.1 Bivariate Gaussian

Consider a bivariate Gaussian random variable with varying correlation coefficient, i.e., (X, Y) ∼ N(0, Σ), where

    Σ = [ 1  ρ ]
        [ ρ  1 ].

Figure 5-4. Rank variable for (increasing) correlation coefficient. (The estimated generalized association rises monotonically from 0.50 at ρ = 0.11 to 0.91 at ρ = 0.99.)

It is a well accepted fact that the dependence between X and Y is a monotonic function of the correlation coefficient. Figure 5-4 shows the distribution of the rank variable, and the corresponding dependence values, over increasing correlation coefficients. We observe that when the correlation coefficient is small, the rank distribution is flat, and it gets more skewed as the correlation coefficient is increased. We also observe a monotonic increase in the dependence values, as expected.

5.2.5.2 Clayton Copula

Consider a bivariate Clayton copula, i.e., C(u, v) = (u^{−θ} + v^{−θ} − 1)^{−1/θ}, with zero mean unit variance Gaussian marginals. Here the coefficient θ ≥ 0 controls the dependence, and a higher value of θ implies higher dependence between X and Y. Figure 5-5 shows the distribution of the rank variable, and the corresponding dependence captured by the proposed framework, for different values of θ. We observe that the distribution of the rank variable is flat when θ is small, and it gets more skewed when θ increases, whereas the estimated dependence values show a monotonic increase.
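To reproduce this kind of experiment, samples from a Clayton copula can be drawn with the standard conditional-inverse method. The sketch below is our own (the values of θ, the sample size, and the names are illustrative); it checks the sampler through Kendall's τ, which for the Clayton family satisfies τ = θ/(θ + 2):

```python
import numpy as np

def sample_clayton(n, theta, seed=0):
    """Draw n pairs (u, v) from the Clayton copula
    C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta) by conditional inversion."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)
    p = rng.uniform(size=n)  # uniform draw for the conditional CDF of V given U = u
    # Invert p = dC/du(u, v) for v:
    v = (u ** -theta * (p ** (-theta / (1 + theta)) - 1) + 1) ** (-1 / theta)
    return u, v

def kendall_tau(x, y):
    # Plain O(n^2) concordance count (no tie correction).
    n = len(x)
    s = 0
    for i in range(n):
        s += np.sum(np.sign(x[i] - x[i + 1:]) * np.sign(y[i] - y[i + 1:]))
    return 2 * s / (n * (n - 1))

u, v = sample_clayton(2000, theta=2.0, seed=4)
tau = kendall_tau(u, v)  # theory for theta = 2: theta/(theta + 2) = 0.5
```

To obtain the Gaussian marginals used in the text, one would further map u and v through the inverse standard normal CDF (e.g., scipy.stats.norm.ppf), which we omit here since GMA is invariant to that monotonic transformation anyway.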


Figure 5-5. Rank variable for (increasing) dependence parameter (Gen. Asso. grows from 0.63 at θ = 2 to 0.83 at θ = 9)

5.2.5.3 Multivariate Gaussian

Consider two 3-dimensional zero mean identity covariance Gaussian vectors, where the first element of one vector is correlated with the second element of the other vector, i.e., (X, Y) ~ N(0, Σ) where

    Σ = [ 1  0  0  0  ρ  0
          0  1  0  0  0  0
          0  0  1  0  0  0
          0  0  0  1  0  0
          ρ  0  0  0  1  0
          0  0  0  0  0  1 ].

Figure 5-6 shows the distribution of the rank variable and the dependence value estimated by the proposed framework for different values of the correlation coefficient. As expected, we observe a similar effect as in the previous example. However, we notice that in this particular example the dependence value is not close to 1 even though ρ is close to 1. This happens since, when ρ = 1, only two elements of the random vectors become identical,


Figure 5-6. Rank variable for (increasing) correlation coefficient (multivariate; Gen. Asso. grows from 0.50 at ρ = 0.11 to 0.66 at ρ = 0.99)

whereas the rest of the elements still remain independent, and therefore the two random vectors are not strictly dependent.

5.2.5.4 Categorical Data

Consider two 3-dimensional Gaussian random variables X1 ~ N([0, 0, 0], I) and X2 ~ N([μ, μ, μ], I), and assume that they represent two classes C1 and C2 respectively. Then, intuitively, the dependence between the realizations of these two distributions X1 and X2 and their respective labels should increase as they become farther apart, i.e., as μ increases. For the categorical labels, we use the metric d(x, y) = 0 if x = y and 1 otherwise. Figure 5-7 shows the distribution of the rank variable and the dependence value estimated by the proposed framework for different values of the displacement parameter μ. As expected, we observe a similar effect as in the previous example. However, we notice that the dependence value saturates after a certain μ is reached. This happens since from this point onward the realizations from the two classes do not overlap. Also notice that this maximum value is not 1, since several realizations have the same label, and the rank distribution is not smooth since Y can only take two distinct values.


Figure 5-7. Rank variable for (increasing) displacement parameter (Gen. Asso. grows from 0.50 at μ = 0.1 to 0.75, saturating from μ = 3 onward)

5.2.5.5 Time Series

Consider the following bivariate time series,

    x1(t) = 0.9 x1(t-1) - 0.5 x1(t-2) + ε1
    x2(t) = 0.8 x2(t-1) - 0.5 x2(t-2) + 0.16 x1(t-1) - 0.2 x1(t-2) + ε2

where (ε1, ε2) ~ N(0, Σ) and

    Σ = [ 1  ρ
          ρ  1 ].

This is a linear autoregressive model where the two time series are instantaneously correlated through the correlated noise. We generate 100 time series of length 10 each, and use the l2 distance between the magnitude spectra as a measure of distance. Figure 5-8 shows the dependence captured by the proposed framework for different values of the correlation coefficient. We again observe effects similar to the previous examples.

5.2.6 Properties of Generalized Measure of Association

Before finishing this section we briefly describe some properties of GMA.


Figure 5-8. Rank variable for (increasing) correlation coefficient (time series; Gen. Asso. grows from 0.51 at ρ = 0.11 to 0.94 at ρ = 0.99)

1. The proposed dependence measure is defined between any two random variables that take values in metric spaces, and it allows the random variables X and Y to assume values in two distinct spaces.

2. The proposed measure is asymmetric. The intuition behind this is that if Y is a non-invertible function of X then, given two realizations (x_i, y_i) and (x_j, y_j), if x_i and x_j are close then y_i and y_j are close. However, this is not true the other way around. For example, consider the case Y = sin X. See Figure 5-2 for an illustration.

3. If X ⊥ Y [2] then GMA ≈ 0.5, and if X = Y then GMA = 1. However, notice that both these conditions are necessary, and at this point we do not have a formal proof of whether these conditions are also sufficient.

4. The proposed measure is invariant to any isometric transformation of X and Y since it is solely based on pairwise distances. For example, in R^n it is invariant to unitary transformations. Moreover, since it is based on relative rather than absolute distance values, it is invariant to any transformation that leaves the relative distances between the realizations

[2] X ⊥ Y implies that X and Y are independent.


intact. For example, when X and Y are real valued, the proposed measure is invariant to linear transformations.

5. The proposed approach is parameter free.

5.3 Applications

In this section we discuss the applicability of GMA to practical applications such as time series analysis, variable selection and causal inference, and discuss its strengths and weaknesses.

5.3.1 Time Series Analysis: Auto-dependence

The autocorrelation function has been an essential tool for time series analysis for many decades. Although this tool has been successfully applied to many research problems, it has a strong drawback: autocorrelation can only capture the linear dependence between past and present values of a time series. However, in practice there exist time series that exhibit nonlinear dependence over the lags, e.g., econometric time series. In recent years, therefore, there has been substantial development in capturing nonlinear dependence. These methods usually focus on using MI (Granger and Lin 1991; Chapeau-Blondeau 2007). In this section, we employ GMA to explore the nonlinear dependence over different lags using the following data generating processes (DGP) from (Granger and Lin 1991):

1. y_t = ε_t + 0.8 ε²_{t-1} + 0.8 ε²_{t-2} + 0.8 ε²_{t-3}
2. y_t = sign(y_{t-1}) + ε_t
3. y_t = 0.6 ε_{t-1} y_{t-2} + ε_t
4. y_t = 4 y_{t-1}(1 - y_{t-1}), t > 1, 0 < y_1 < 1

The first process, DGP1, is a moving average process of order 3. Therefore, a good measure of dependence should indicate zero dependence for any lag above the correct order. The second process, DGP2, on the other hand, is an autoregressive process of order 1. Therefore, an appropriate dependence measure should indicate an infinitely long decaying memory. The third process, DGP3, is a bilinear process with white noise


characteristics, whereas the fourth process, DGP4, is a logistic map, a deterministic chaotic process. A good dependence measure should reveal the autoregressive nature of DGP3 and the deterministic nature of DGP4. However, given only finite realizations these are often difficult.

Since two consecutive realizations of a time series can be dependent, it is not possible to generate iid realizations. However, assuming that the dependence drops sufficiently over long lags, say τ, we generate n realizations in the following way, {(y_m, y_{m-l})}_{m = n0+1}^{n0+n}, to test the dependence over lag l. We compute auto-dependence for l = 1, ..., 9, and set n = 1000, τ = 25. For GMA, we compute GMA(y_{t-l}, y_t), i.e., the dependence of the present value on the past value. For mutual information (MI) we use the nearest neighbor based estimator discussed in (Kraskov et al. 2004), and set the neighborhood parameter to 5, the default value in the corresponding toolbox. For the Hilbert-Schmidt Independence Criterion (HSIC) (Gretton et al. 2007), we use a Gaussian kernel and set the kernel size to the median intersample distance.

We show the auto-dependence structure evaluated by these measures in Figure 5-9, where we also report the statistical powers achieved by these methods using a permutation test with size 0.05. The solid line is the median of the auto-dependence computed over 128 trials. The dotted line is the median over the surrogate data generated by permuting the original realizations, again computed over 128 trials. The vertical values indicate the powers of the corresponding permutation tests of size 0.05. Notice that unlike GMA and Kendall's τ, MI and HSIC are zero if and only if two random variables are independent. Therefore, it is expected that these two measures would provide higher statistical powers when evaluating independence. Also, since we are working on the real line, Kendall's τ should have an advantage over the other methods, since it is defined solely on R. We observe that these measures on average follow similar trends. GMA and Kendall's τ, however, fail to capture the dependence for DGP3. A possible explanation for their failure is that for this bilinear process the dependence does not appear in simple functional form.
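The sampling scheme and permutation test described above can be sketched as follows. The helper names are ours, and the absolute-correlation statistic in the test is only an illustrative stand-in for GMA or the other measures.

```python
import numpy as np

def lagged_pairs(y, lag, n, n0):
    """Collect n (past, present) pairs {(y_{m-lag}, y_m)} after a burn-in n0."""
    m = np.arange(n0 + 1, n0 + 1 + n)
    return y[m - lag], y[m]

def permutation_pvalue(stat, a, b, trials, rng):
    """Right-tailed permutation test: shuffling b destroys any dependence,
    giving a surrogate distribution of the statistic under independence."""
    observed = stat(a, b)
    surrogate = np.array([stat(a, rng.permutation(b)) for _ in range(trials)])
    return (1 + np.sum(surrogate >= observed)) / (1 + trials)
```

For DGP2, for instance, the lag-1 dependence is strong, so the observed statistic should beat essentially all shuffled surrogates, giving a p-value well below the 0.05 test size.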


Figure 5-9. Auto-dependence structure of time series

On the other hand, HSIC and MI perform rather poorly in capturing the dependence of DGP4. This happens perhaps due to the choice of free parameter, which becomes difficult when the random variables are related in a deterministic fashion. In the first two cases, however, HSIC achieves the best power compared to the other methods, since it is a test of independence.


5.3.2 Variable Selection

Variable selection is an integral part of machine learning and an essential tool for dealing with high dimensional data. The objective of variable selection is to extract a subset of informative variables S̃ from the pool of all variables S, where being informative is subjective to a particular task, e.g., classification (Guyon and Elisseeff 2003). Intuitively, S̃ can be selected by maximizing the dependence between S̃ and the target T, i.e., S̃ = argmax_{s ⊆ S} D(s, T). However, this straightforward approach requires an exhaustive search over all possible combinations s ⊆ S, which is prohibitive in practice. A greedy solution to this problem is to add one variable s_m at a time to the set of already selected variables S̃_{m-1} such that the dependence between the new set of variables S̃_{m-1} ∪ s_m and the target is maximized, i.e., s_m = argmax_{s_m ∈ S \ S̃_{m-1}} D(S̃_{m-1} ∪ s_m, T), for m = 1, ..., M where S̃_0 = ∅. However, applying GMA in this set-up may not be favorable since, first, this approach is still computationally expensive, especially in conjunction with GMA's quadratic complexity in the number of realizations, and second, the value of GMA can easily saturate in higher dimensions and sparse realizations, as shown in Example 5.2.5.4.

Therefore, we follow a simple ranking based approach where we rank the variables in terms of the association of the target with the corresponding variable, i.e., GMA(s_m, T) ∀ s_m ∈ S, and select the top ranked variables. This approach is computationally simpler, and still provides favorable performance as we will see. A drawback of this approach, however, is that it cannot guarantee that the classification error will drop monotonically with an increasing number of features, since the consecutive features are independently selected.

5.3.2.1 Interactive Association

The idea of association can be exploited to capture synergy and redundancy as follows: Given three random variables (X, Y, Z), Y is interactively associated with X and Z if close realization pairs of Y, i.e., {y_i, y_j}, are more associated with close realization pairs of (X, Z), i.e., {(x_i, z_i), (x_j, z_j)}, than with close realization pairs of Z, i.e., {z_i, z_j}, or X, i.e.,
{x_i, x_j}, alone. In other words, knowing that two realizations {(x_i, z_i), (x_j, z_j)} are close in


Table 5-1. Description of datasets and the performance of variable selection methods

    Dataset                    Index  Var.   Sam.  Cl. | GMA   MIA   mRMR  Relief
    Breast cancer              BRE    24481  97    2   | 0.42  0.42  0.40  0.38
    Central nervous system     CNS    7129   60    2   | 0.46  0.44  0.44  0.42
    Colon tumor                COL    2000   62    2   | 0.26  0.24  0.22  0.21
    DLBCL-Stanford             DLS    4026   47    2   | 0.13  0.11  0.16  0.13
    DLBCL-Harvard-Outcome      DLO    7129   58    2   | 0.16  0.13  0.15  0.20
    DLBCL-Harvard-Target       DLT    7129   77    2   | 0.51  0.51  0.45  0.47
    Leukemia ALL-MLL           LEA    7129   72    2   | 0.08  0.07  0.11  0.12
    Leukemia MLL               LEM    12582  72    3   | 0.13  0.12  0.12  0.19
    Lung cancer-Dana Farber    LCD    12600  203   5   | 0.15  0.13  0.11  0.22
    Lung cancer-Brigham        LCB    12533  181   2   | 0.02  0.02  0.02  0.06
    Lung cancer-Michigan       LCM    7129   96    2   | 0.02  0.01  0.01  0.05
    Ovarian cancer             OVC    15154  253   2   | 0.03  0.02  0.01  0.03
    Prostate cancer            PRO    12600  136   2   | 0.13  0.11  0.32  0.21

(X, Z) brings the realizations {y_i, y_j} closer in Y than knowing that the realization pairs {x_i, x_j} and {z_i, z_j} are close in X and Z individually.

Following the definition, we capture interactive association simply by the difference between the generalized associations as follows,

    MIA(X; Y; Z) = GMA((X, Z), Y) - max(GMA(X, Y), GMA(Z, Y)).

Then MIA(X; Y; Z) > 0 implies synergy and MIA(X; Y; Z) ≤ 0 denotes redundancy. To elaborate this concept let us consider two simple examples. First consider Y = (X, Z), where X and Z are independent random variables. Here X and Z have the same effect, and therefore the random variable triplet exhibits synergy. Also, GMA((X, Z), Y) = 1. Now, if the contribution of Z (X) is low compared to X (Z), then the closeness of the realizations of Y is mostly governed by X (Z), and thus GMA(X, Y) ≈ 1 (GMA(Z, Y) ≈ 1), and therefore MIA(X; Y; Z) ≈ 0. On the other hand, if X and Z share equal contributions then GMA(Z, Y) ≪ 1 (GMA(X, Y) ≪ 1), and therefore MIA(X; Y; Z) > 0.

Next, consider the example X = U, Y = U and Z = U + V, where U and V are independent random variables. Since X affects both Z and Y, the random variable


triplet exhibits redundancy. Now, GMA(X, Y) = 1, and since Z is corrupted by V, GMA(Z, Y) < 1. Now, if the contribution of V is small with respect to U, then GMA([X, Z], Y) ≈ 1, and therefore MIA(X; Y; Z) ≈ 0. On the other hand, if the contribution of V is high compared to U, then GMA([X, Z], Y) ≪ 1, and therefore MIA(X; Y; Z) < 0.

5.3.2.2 Causal Variable Selection

We follow a similar approach as in (Bontempi and Meyer 2010) for selecting informative variables, but we replace mutual information by association. In brief, we employ a forward selection approach, and select a variable such that it maximizes the association between the output and the variable as well as the average synergy among the variable, the output and the already selected variables, i.e., mathematically,

    s_m = argmax_{s_m ∈ S \ S̃_{m-1}} [ GMA(s_m, T) + (1/|S̃_{m-1}|) Σ_{s ∈ S̃_{m-1}} MIA(s_m; T; s) ]

for m = 1, ..., M where S̃_0 = ∅.

Intuitively, this approach distinguishes between a direct cause and an indirect cause while choosing variables. For example, given two input variables X1 and X2 and an output variable Y, it is important to distinguish between the three possible choices X1 → Y ← X2, X1(2) → X2(1) → Y, and X2(1) → X1(2) → Y. The first combination exhibits synergy whereas the others exhibit redundancy, and the proposed approach weights the first combination more.

We test this approach on 13 gene expression datasets as described in Table 5-1. We randomly divide each dataset into 75%-25% for training and testing respectively. We select variables using the training set only, then train a classifier with these variables, and test using the test set. We repeat this process for 128 random divisions and report in Table 5-1 the average classification error achieved by the proposed method, Relief (Robnik-Sikonja and Kononenko 2003), mRMR (Peng et al. 2005), and variables selected by just ranking them according to maximum association. Each entry in the table is the


Figure 5-10. Performance of variable selection algorithms on microarray datasets

classification error averaged over the 1-25 best informative variables, and over kNN classifiers with k = 1, 3, 5, 7 (Duda et al. 2000). We observe that the proposed approach outperforms both Relief and mRMR. We also show the average performance in Figure 5-10.

5.3.3 Causal Inference

Although causal inference involving time, or causality in the sense of Granger, has been widely used for 40 years in many diverse research areas such as econometrics, neuroscience and bioinformatics, causal inference without time still remains a largely unexplored area. However, this area has recently gained considerable attention following the work of Pearl (Pearl 2000) and Spirtes and coworkers (Spirtes et al. 2001). It is a common practice to infer X → Y if Y is dependent on X through a non-invertible


relationship (Friedman and Nachman 2000). This idea has been generalized, and it has been proposed that X → Y if Y can be written as Y = f(X) + ε, where f is a function and ε ⊥ X is an independent noise. However, to materialize this concept one needs to first fit a function f to the realizations {(x_i, y_i)}_{i=1}^n and then perform an independence test between the residual ε and X (Mooij et al. 2009). Both these approaches are computationally expensive.

Since GMA is asymmetric in nature, we hypothesize that X → Y if GMA(X, Y) > GMA(Y, X). The intuition behind this approach is that if X causally influences Y, then small changes in X would be associated with small changes in Y. To test this hypothesis, we consider a bootstrap method, i.e., we randomly resample from the original data and compute {GMA(X^(t), Y^(t)) - GMA(Y^(t), X^(t))}_{t=1}^T, where t denotes the resampling trial. Then we conclude X → Y if a significant number of these terms are greater than zero. For the following experiment, we set T = 1000, and infer X → Y if #{GMA(X^(t), Y^(t)) - GMA(Y^(t), X^(t)) > 0} > 800. For each resampling, we resample 75% of the original samples.

We apply the proposed approach to the cause-effect-pair dataset available online. Before applying the proposed method, we remove any repetitions in the realizations. In Table 5-2, we present the effective sample sizes after removing repetitions, the number of times X → Y has been accepted out of 1000 trials, the number of times Y → X has been accepted out of 1000 trials, the decisions made by the proposed approach by observing whether more than 800 trials have agreed, and the ground truths. Overall, we have been able to make the correct decision only 43% of the time. However, we observe that the proposed approach has only made a decision 56% of the time, and out of these decisions 75% are correct, which is promising.

In (Mooij et al. 2009) the authors have proposed an HSIC based approach to fit a model and then apply an independence test between the residual and the input to decide causal influence. They apply this approach on the first 8 pairs, where it fails to detect the


Table 5-2. Causal directions derived from the cause-effect pairs. For each of the 75 pairs the table lists the effective sample size, the number of trials (out of 1000) favoring X → Y, the number favoring Y → X, the decision, and the ground truth.

true causality in pair 6. Otherwise it performs well, making only one error, in pair 4. However, notice that this accuracy comes at the expense of computational load. Our approach, on the other hand, in its present state is fairly simple and easy to evaluate. But further analysis is necessary to understand when and why this method fails, and also when and why it succeeds.
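The bootstrap decision rule above can be sketched as follows. Here `gma` stands for any implementation of the association measure; the function name, defaults, and the deliberately asymmetric stub used in the test are illustrative, not the actual experimental code.

```python
import numpy as np

def causal_direction(gma, x, y, T=1000, frac=0.75, thresh=0.8, rng=None):
    """Resample the data T times, count how often GMA(X,Y) > GMA(Y,X),
    and declare a direction only when more than a fraction thresh of
    the trials agree; otherwise make no decision."""
    rng = rng or np.random.default_rng()
    n = len(x)
    wins = losses = 0
    for _ in range(T):
        idx = rng.choice(n, size=int(frac * n), replace=True)  # bootstrap resample
        a, b = gma(x[idx], y[idx]), gma(y[idx], x[idx])
        wins += a > b
        losses += b > a
    if wins > thresh * T:
        return "X->Y"
    if losses > thresh * T:
        return "Y->X"
    return None  # no significant asymmetry: abstain
```

The abstention branch mirrors the behavior reported above: the method makes a call on only part of the pairs, but is more accurate on the pairs where it does decide.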


Figure 5-11. Rasters for inhomogeneous Gamma renewal process

5.3.4 Application to Point Processes

In this section, we apply the proposed approach to point processes using simulated data, and compare its performance against the method proposed by (Victor and Purpura 1997). Before proceeding, we briefly discuss the notion of distance on the space of spike trains. The distance between two spike trains can be defined in several ways; for example, see (Victor and Purpura 1997) and (van Rossum 2001). However, we choose to explore the measure suggested by (Victor and Purpura 1997), and leave a detailed study of the effect of other distance metrics as future work.

5.3.4.1 Victor-Purpura Spike Train Metric

The spike-distance metric explored by Victor and Purpura (Victor and Purpura 1997) is a cost-based edit distance between sequences of time events. The distance is defined as the minimum cost of transforming one spike train into the other by three simple operations, i.e., shifting, adding, and removing, where each operation has an associated cost. Given two spike trains x = {x_i}_{i=1}^m and y = {y_i}_{i=1}^n, each spike, say x_i, is either matched to a single spike from the other spike train, say y_j, to form a matching (x_i, y_j)


with a cost of q|x_i - y_j|, or the spike is deleted from x at cost 1. Any unmatched spikes in y are assessed a cost of 1 each. Therefore, the sole parameter q controls temporal precision, i.e., the cost of moving a spike in time versus simply removing it and perhaps adding it somewhere else. Let M = {(x_i, y_j), ...} be a set of matchings where a particular spike can appear in at most one element. Then the distance is defined as (Dubbs et al. 2010)

    d^(q)(x, y) = min_M [ Σ_{(x_i, y_j) ∈ M} q|x_i - y_j| + (m + n - 2|M|) ].

For the case q = 0, if m = n then d^(0)(x, y) = 0, and if m ≠ n then d^(0)(x, y) = |m - n|. For q = ∞, unless spikes exactly align in time they must be removed and re-added to align, since no shifting is allowed; thus the minimum cost is d^(∞)(x, y) = m + n. For 0 < q < ∞, this metric takes into account the temporal positions of the spikes, and q controls the temporal resolution, i.e., how much the spikes are separated in time. It has been recently shown that this metric is essentially an L1 metric and that it can be extended to Lp spaces (Dubbs et al. 2010). In addition, the metric has also been extended to multiple spike trains (Aronov 2003), but here we only concentrate on single spike trains.

Implementation. The simulation analysis was conducted using the Spike Train Analysis Toolkit, a neuroinformatics resource funded by the NIH's Human Brain Project. The toolbox has efficient C/C++ implementations of many information theoretic quantities (Goldberg et al. 2009); of interest to us is its implementation of the Victor-Purpura spike train metric and the clustering-based mutual information estimator that we use for a baseline comparison in a simulated experiment. The default parameters of the clustering-based mutual information algorithm were used. For the actual metric we used the Victor-Purpura spike metric across a range of q, which controls the temporal precision. The generalized association code was implemented in MATLAB and the simulations were run concurrently using the same distance evaluations provided by the Spike Train Analysis Toolkit.
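The minimization above can be computed with a standard edit-distance dynamic program. In this sketch (function name ours), a diagonal move matches a pair of spikes at shift cost q|x_i - y_j|, while a move along either axis deletes or inserts a spike at cost 1:

```python
import numpy as np

def vp_distance(x, y, q):
    """Victor-Purpura spike train distance between sorted spike-time
    sequences x and y, with temporal-precision parameter q."""
    m, n = len(x), len(y)
    G = np.zeros((m + 1, n + 1))
    G[:, 0] = np.arange(m + 1)   # delete all spikes of x
    G[0, :] = np.arange(n + 1)   # insert all spikes of y
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            G[i, j] = min(G[i - 1, j] + 1,                            # delete x_i
                          G[i, j - 1] + 1,                            # insert y_j
                          G[i - 1, j - 1] + q * abs(x[i - 1] - y[j - 1]))  # shift
    return G[m, n]
```

The limiting cases follow directly: with q = 0 matching is free so the distance is |m - n|, and for large q shifting any misaligned spike costs more than the delete-plus-insert cost of 2.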


5.3.4.2 Dependence Between Stimulus and Spike Trains

We explore the ability of the generalized association to capture the dependence between a variable that controls the characteristics of simulated spike trains and the trains themselves. We expand on an experiment on temporal phase discrimination used in (Victor and Purpura 1997). The goal is to identify the dependence the phase imposes on cyclic spiking activity. The underlying spiking is an inhomogeneous Gamma renewal process, with underlying marginal intensity function

    λ(t) = R_0 [1 + m cos(2πt/T + φ)].

To form the Gamma process with shape k, where k is a positive integer, a Poisson process with rate kλ(t) is formed and every k-th spike is kept. The resulting spike train is much more regular than an inhomogeneous Poisson process, which lends itself to more precise temporal comparisons. Spikes were generated with different phase φ chosen from a discrete set Φ of uniformly spaced phases in the interval (0, 2π). Like the original experiment, the base firing rate is R_0 = 20 Hz, the period was T = 0.25 s, and the length of the generated spike trains is 1 s. The tuning depth m was varied to alter the difficulty of the dependence estimation. The set of spike trains used for m = 0.5 and |Φ| = 4 is shown in Figure 5-11.
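The generation scheme above — thin a Poisson process of rate kλ(t), then keep every k-th spike — can be sketched as follows; the function name and argument layout are ours, and the thinning bound assumes |m| ≤ 1.

```python
import numpy as np

def gamma_renewal_train(R0, m, period, phi, k, duration, rng):
    """Inhomogeneous Gamma renewal process of shape k: simulate a Poisson
    process with rate k*lambda(t), lambda(t) = R0*(1 + m*cos(2*pi*t/period + phi)),
    by thinning, then keep every k-th accepted spike."""
    lam_max = k * R0 * (1.0 + abs(m))            # rate bound used for thinning
    t, spikes = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam_max)      # candidate inter-event time
        if t > duration:
            break
        lam = k * R0 * (1.0 + m * np.cos(2.0 * np.pi * t / period + phi))
        if rng.uniform() < lam / lam_max:        # accept with prob lambda(t)/lam_max
            spikes.append(t)
    return np.array(spikes[k - 1::k])            # keep every k-th spike
```

With m = 0 and k = 1 this reduces to a homogeneous Poisson train at rate R_0; increasing k regularizes the inter-spike intervals, as described above.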


Eachdependencemeasurementwascalculatedon100bootstra ptrials.Oneachrun thedependencewasalsocalculatedonshuedcategoriestog enerateanindependent surrogatedistribution.Weuseahypothesistestforindepe ndencewithsize 0.05 to testthestatisticalpowerofthemethods.Fortuningdepth m =0 thespiketrainsare independentof Thesimulationresultsforatuningdepthof m =0.5 (Figure 5-12 )showthe dependencevalueacrosstemporalprecisions q .Itpeaksatatemporalprecisionof q =16 s 1 ,thisisthequarter-waveoftheunderlyingperiodofthespi kegeneration. Similarresultsformutualinformationareshownin( VictorandPurpura 1997 ).GMA showsasimilartrendacross q ,butithasasharperpeaksurroundingthepeaktemporal precision,whereas,themutualinformationispeakedforla rgerrangeofvalues;thus,GMA betterhighlightstheoptimaltemporalprecision.Acrossd ierentsamplesizesthevariance ofbothmethodsdecrease.Notethatthesmallsamplesizebia sdevelopedintheshued surrogateisnotremovedfromthemutualinformationcalcul ationtoillustrateitsbehavior acrossparameters;whereas,GMAhasnochangeinbiasfordi erentsamplesizes.The meansofestimatedGMAstaythesameoverdierentsamplesiz eswhereasthemeans oftheestimatedmutualinformationdecreasewhensamplesi zeisincreased.Itshould benotedthatfordierentnumbersofcategories, j j ,thetheoreticalvalueofmutual informationshouldbe log 2 j j bits. Wefurtherexaminehowthemethodsareabletoassess(in)dep endenceattuning depth m =0.2( m =0) whilethetemporalprecisionforthemetricisheldat q =16 (Figure 5-13 ).Itisevidentthatbothmethodsareabletodetectthedepen dence.In addition,bothmethodsshowadecreaseinvarianceasthenum berofsamplespercategory increases;however,atsmallnumberofsamplespercategory GMAhasmuchlower variance;animportantcharacteristicforneuroscienceap plicationswherethenumber oftrialsperstimulusorexactlyrepeatedstimuliarelimit ed.Inaddition,thereisno changeinthemeansoftheestimatedGMAvaluesfordierents amplesizes.Overall,the 147


Figure 5-12. GMA across values of q

Figure 5-13. Dependence measures for different number of classes and samples


results lend confidence that GMA is a stable measure that is able to capture association across changes in the number of categories and samples, and is not plagued by the need for bias correction that is required with mutual information estimators.

5.3.4.3 Dependence Between Sets of Spike Trains

In this chapter, we mostly focus on assessing dependence between an input stimulus and the resulting spike trains. However, in this section, we provide a simple example to show that the proposed approach can also be applied to capture dependence between two sets of spike trains, i.e., when both the spaces X and Y are spaces of spike trains. In order to do that, we follow (Macke et al. 2009) to generate two sets of spike trains with known covariance structure. Notice that (Macke et al. 2009) only proposes an approach to generate binned spike trains. Therefore, to collect the time instances of the events, we generate binary spike trains, i.e., spike trains with at most one spike per bin, and consider the beginning of the bin to be the spike timing. For each i = 1, ..., n, we set the mean spiking probability for both spike trains to 0.05 and generate 100 bins each. This is equivalent to generating two 1 s long spike trains with a mean firing rate of 5 Hz each. We vary the covariance of the binary spike trains in the allowed range, from 10⁻⁴ to 9.9 × 10⁻³, and present the estimated dependence values in Figure 5-14, using the Victor-Purpura metric with q = 20 and n = 40 realizations, where the dotted line is the 0.95 quantile of the surrogate data, and the solid line is the mean ± standard deviation of the actual GMA values. Both surrogate and actual values have been computed 128 times. It can easily be observed that GMA can successfully detect the dependence among the spike train observations. Notice that in a practical situation the two sets of spike trains might require two different metrics to capture their respective statistics. However, here we have used the same q for both since they have the same marginal statistics.

5.3.4.4 Micro-stimulation Data

We apply the proposed method to quantify the dependence between microstimulation parameters and the neural responses they elicit; in particular, the dependence of the


Figure 5-14. GMA between two sets of spike trains

neural response (spike trains) recorded in the somatosensory cortex (S1) on the electrical microstimulation administered in the thalamic somatosensory region. The neural spiking used in this analysis is from a rat with two chronically implanted 16-channel (2 × 8) tungsten micro-wire arrays (Tucker-Davis). Neuronal activity was recorded from one array in the cortex (S1) using the Plexon multichannel acquisition processor. Action potentials were detected using a constant threshold and were sorted by a semi-automated clustering procedure (SortClient) using the first 3 principal components of the detected waveforms. The second array was positioned in the ventral posterior lateral (VPL) region of the thalamus.

Prior to the recording session, anesthesia was induced by isoflurane followed by a Nembutal injection and maintained with isoflurane. A pair of thalamic channels with response to cutaneous touch of a forepaw digit were selected as the channels for microstimulation. Bipolar microstimulation (A-M Systems Model 2200 Isolator) was applied to two adjacent electrodes in the thalamic array. Each stimulation consisted of a biphasic square pulse (Figure 5-15). The pulse duration and current amplitude were varied, but the stimulations were always 500 ms apart. During the session, 19 distinct pairs of pulse duration and current amplitude were applied, with 140 responses from each pair randomly permuted throughout the recording. We analyze 480 ms of spiking data after each stimulus onset on 14 cortical channels. This


Figure 5-15. Microstimulation waveform

time window is much larger than the majority of the response from microstimulation, which occurs within 100 ms of stimulation.

The goal of this analysis is to assess the dependence of the neural response on the stimulation parameters. However, first we investigate the significance of the estimated dependence values using bootstrap sampling. There are 100 bootstrap trials with 40 trials, out of 140, for each stimulation setting. We use the proposed estimator with the Victor-Purpura spike distance as metric, across a range of temporal precision q values. Euclidean distance is used in the two dimensional stimulation space of amplitude and duration. Both generalized association and mutual information are calculated for each sample and for a shuffled version that destroys the dependence between stimulation and spiking. We show the mean values of GMA and the 0.95 quantiles (dotted line) of the corresponding shuffled data in Figure 5-16. It is evident that these values are relatively low, but still significant on all 14 channels over all temporal precisions. Also, we observe a clear peak around 50-100 Hz, which implies that the precision relevant to the stimulation is within 10 to 20 ms. We set the value of q to 50.


Figure 5-16. GMA across temporal precision for 14 cortical channels.

Since the stimulation is discrete valued, a conditioned dependence analysis can be performed to investigate the dependence between spiking activity and a particular parameter while fixing the other parameter to a certain value. We consider this approach to investigate the effect of amplitude and duration on an individual basis. We show the conditional dependence (i.e., the dependence between spiking activity and amplitude keeping the duration fixed, and the dependence between spiking activity and duration keeping the amplitude fixed), the marginal dependence (i.e., the dependence between spiking activity and amplitude, and the dependence between spiking activity and duration), and the joint dependence (i.e., the dependence between spiking activity and amplitude-duration together) for all the 14 channels in Figure 5-17. The black/dotted-line/unfilled is the 0.95 quantile of the surrogate values, whereas the red/solid-line/filled is the 0.5 quantile of the actual values. The values at the top are the average power over the 14 channels, the null hypothesis being independence. As expected, we observe that the


joint dependence is always statistically significant on all channels. The marginal and conditional dependences, on the other hand, reveal more interesting structures. We observe that the duration parameter alone (i.e., from the marginal dependence perspective) does not have any effect on the spiking activity, whereas the amplitude parameter alone imposes a strong dependence on the spiking activity. However, it should be noted that our observation is limited to the particular values of the duration parameter used in the experiment, i.e., in the range 125-250 µs. Moreover, the conditional dependence between the spiking activity and the duration given the amplitude is not significant (or relatively low); i.e., given the range of duration under investigation, it is rather immaterial for how long a certain amplitude has been applied. A similar situation is also observed when the dependence between spiking activity and amplitude is assessed given particular values of duration, but only up to an extent. That is, we observe that given a small duration the spiking activity does not depend much on the amplitude. However, for a longer duration the spiking activity becomes very sensitive to the amplitude applied. These observations provide more insight into the effect of these parameters on the spiking activity. Thus, they can be used to find the range of durations where there is significant dependence between the stimulation amplitude and the spiking response, which, in turn, can be exploited for further refining the experimental design in this study.

5.4 Summary

In this chapter, we have explored the concept of dependence. We have discussed the available notions of dependence, such as correlation and mutual information, addressed their drawbacks, and introduced a novel approach to quantifying dependence. The proposed approach addresses dependence in the context of the realizations, rather than the random variables. It is defined on any metric space and is asymmetric in nature. Moreover, it is parameter free and computationally efficient. We have applied the proposed approach in many practical applications, such as time series analysis, causal inference, and variable selection, and shown that the proposed method quantifies dependence well.


Figure 5-17. Conditional, joint, and marginal GMA for 14 channels.


CHAPTER 6
CONDITIONAL ASSOCIATION

The problem of assessing conditional dependence between two random variables given the knowledge of a third random variable has received considerable attention in recent years due to its growing applicability in many practical machine learning problems, such as causal inference (Sun et al. 2007), dimensionality reduction (Fukumizu et al. 2004), feature selection (Bontempi and Meyer 2010), and time series analysis (Su and White 2008). By definition, two random variables X and Y are said to be conditionally independent given a third random variable Z (i.e., X ⊥ Y | Z) if and only if

    P(X ∈ A | Y ∈ B, Z ∈ C) = P(X ∈ A | Z ∈ C),    (6-1)

where A, B, and C are arbitrary elements of the sigma-algebras of the respective spaces in which the random variables assume values.

In R^d, which is perhaps the most frequently encountered domain in practice, (6-1) can be equivalently expressed in several other ways, e.g., using the cumulative distribution function (CDF), probability density function (PDF), characteristic function (CHF), etc. A measure of conditional dependence is usually derived from these different expressions of conditional independence, e.g., using the conditional CDF (Linton and Gozalo 1996), conditional PDF (Diks and Panchenko 2006), conditional CHF (Su and White 2007), kernel methods (Fukumizu et al. 2008), copulas (Bouezmarni et al. 2009), and the conditional probability distribution function (Seth and Principe 2010a). Although each of these approaches demonstrates some unique advantages over the others, they almost always share a few drawbacks: they are computationally expensive, requiring O(n^2) or O(n^3) time complexity where n is the sample size, and they almost always require selecting several free parameters, such as a kernel, the corresponding kernel size, and often a regularization parameter, where the best selection criterion for these parameters, to this day, remains an open issue. These drawbacks prohibit the applicability of these measures


to large scale problems, i.e., problems involving large sample size, or large dimensionality, or both. Moreover, these methods, by design, only address the issue of conditional independence rather than conditional dependence; i.e., they are zero if and only if (6-1) is satisfied, but they usually ignore when and how these measures increase/decrease or reach their respective maximum values.

The problem of assessing an attribute, such as dependence or conditional dependence, of a set of random variables can be approached in two conceptually different ways. The first approach starts with a measure, i.e., a function of the random variables, that satisfies some desired criteria, and is followed by an estimator, usually consistent, of this measure from finite realizations (Kruskal 1958). The second approach, on the other hand, starts with the realizations, describes an intuitive way of capturing the attribute from the realizations, and then discusses the corresponding measure for which the proposed method is an estimator, usually consistent (Spearman 1904; Kendall 1938). The available methods of assessing conditional dependence follow the former approach. To elaborate, consider the example of conditional mutual information (CMI), which is an established measure of conditional dependence (Joe 1989). Given random variables (X, Y, Z) and the joint probability law P(X, Y, Z), it is clear that CMI is minimum when (6-1) is satisfied, whereas it is maximum when there exists a functional relationship between X and Y for every Z = z (Joe 1989). Given a finite set of realizations {(x_i, y_i, z_i)}_{i=1}^n, CMI is estimated consistently by estimating the Radon-Nikodym derivative using adaptive binning (Perez-Cruz 2008) or kernel smoothing (Joe 1989). However, the resulting estimators do not provide the same understanding of when the estimated value is minimum or maximum, or under what circumstances this value increases or decreases. That is, although the meaning of CMI is transparent in the context of the random variables, i.e., the probability law, from the perspective of the realizations it remains unclear what attribute makes them conditionally dependent.
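To make the first approach concrete, the following is a minimal k-nearest-neighbor CMI estimator in the spirit of the estimators cited above. It uses the well-known Frenzel-Pompe form with max-norm distances rather than the exact adaptive-binning scheme of (Perez-Cruz 2008), so it should be read as an assumption-laden illustration, not as the cited method.

```python
import numpy as np

def _psi_sum(m):
    # digamma(m) = -gamma + H_{m-1} for integer m; the -gamma terms
    # cancel in the CMI combination below, so the harmonic sum suffices
    return sum(1.0 / j for j in range(1, int(m)))

def knn_cmi(x, y, z, k=5):
    """k-NN estimate of CMI(X, Y; Z) (Frenzel-Pompe form), O(n^2) brute force."""
    x, y, z = (np.asarray(a, float).reshape(len(a), -1) for a in (x, y, z))
    dx = np.abs(x[:, None] - x[None, :]).max(-1)   # max-norm distance matrices
    dy = np.abs(y[:, None] - y[None, :]).max(-1)
    dz = np.abs(z[:, None] - z[None, :]).max(-1)
    dxyz = np.maximum(np.maximum(dx, dy), dz)
    n, total = len(x), 0.0
    for i in range(n):
        eps = np.sort(dxyz[i])[k]                           # k-th neighbour distance
        n_xz = np.sum(np.maximum(dx[i], dz[i]) < eps) - 1   # "- 1" excludes the point itself
        n_yz = np.sum(np.maximum(dy[i], dz[i]) < eps) - 1
        n_z = np.sum(dz[i] < eps) - 1
        total += _psi_sum(k) - _psi_sum(n_xz + 1) - _psi_sum(n_yz + 1) + _psi_sum(n_z + 1)
    return total / n

rng = np.random.default_rng(0)
z = rng.normal(size=300)
x = z + 0.1 * rng.normal(size=300)
y_ci = z + 0.1 * rng.normal(size=300)   # X independent of Y_ci given Z
y_cd = x + 0.1 * rng.normal(size=300)   # X influences Y_cd directly
print(knn_cmi(x, y_ci, z), knn_cmi(x, y_cd, z))
```

As the text notes, nothing in this estimator explains, from the realizations themselves, why the second value is large and the first is near zero; the only free parameter is the neighborhood size k.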


Figure 6-1. What makes Y conditionally dependent on X given Z?

In this chapter, we follow the latter approach and explore conditional dependence from the perspective of the realizations; i.e., given realizations {(x_i, y_i, z_i)}_{i=1}^n, we provide an intuitive understanding of what makes them conditionally dependent (see Figure 6-1). This approach is particularly useful for applications since, in practice, we only have access to a finite set of realizations rather than the underlying probability law. In order to achieve this, we revisit the concept of association in statistics, which provides an intuitive understanding of what, given a set of realizations, dependence between two real valued random variables implies. We generalize this concept to arbitrary metric spaces, and then extend it to introduce the concept of conditional association. The proposed approach not only provides an intuitive view of what conditional dependence is and how it changes, but it is also parameter free and relatively easy to compute, making it an excellent alternative to the state-of-the-art methods. Another advantage of the proposed approach is that it is defined only in terms of the pairwise distances between the realizations, and therefore it is applicable to exotic metric spaces such as non-Euclidean and/or infinite dimensional spaces, e.g., the space of spike trains (Seth et al. 2010).

A statistical test of conditional independence is essential for detecting Granger non-causality (Granger 1980) or building a causal network (Pearl 2000). Given only


a finite set of realizations, one usually relies on a permutation test to estimate the threshold of rejection (Diks and DeGoede 2001). However, generating the surrogate data for the permutation test is not trivial and, to this day, remains an open area of research. In this chapter, we introduce a novel scheme for generating surrogate data for the test of conditional independence. Some interesting properties of the proposed approach are that it only involves one free parameter, and that it resamples the original data, thus providing scope for re-using computations when estimating the surrogate values. However, in its present format, this approach is only applicable to Euclidean spaces where the Lebesgue measure is defined. Therefore, in this chapter, we limit ourselves to Euclidean spaces.

The rest of the chapter is organized as follows: in Section 6.1 we provide a brief overview of the existing literature on measures of conditional dependence and point out their weaknesses; in Section 6.2 we show how the concept of association can be generalized and extended to address the notion of conditional association; in Section 6.3 we propose a novel scheme for generating surrogate data for the test of conditional independence; in Section 6.4 we apply the proposed method on several synthetic and real data to provide more insight into the proposed method, and compare it against other available methods; and finally, in Section 6.5 we conclude the chapter with a brief summary of the proposed work and some guidelines for future work.

6.1 Background

We have introduced several measures of conditional (in)dependence, such as CM, HD, and HSNCIC, in Chapter 4. Here we introduce another measure of conditional dependence, conditional mutual information, based on the estimation of the PDF. We used a Parzen type estimate in Chapter 4. The PDF, however, can also be estimated using a k-th nearest neighbor based approach. This approach has been adopted by (Perez-Cruz 2008) to consistently estimate the CMI,

    CMI(X, Y; Z) = ∫ log [ f_XYZ(x, y, z) f_Z(z) / ( f_XZ(x, z) f_YZ(y, z) ) ] dF_XYZ(x, y, z),    (6-2)


by replacing the PDFs with their empirical estimates, i.e.,

    f^n_U(u_i) = k / ( (n - 1) vol(ε_k) ),

where vol(ε_k) denotes the volume of a sphere with radius ε_k, and ε_k is the distance between u_i and its k-th neighbor. The advantage of this approach is that it does not require selecting a kernel. However, it still has a free parameter, i.e., the size of the neighborhood k (Perez-Cruz 2008). To the best of our knowledge, an appropriate criterion for selecting this parameter still remains to be found. Notice that both the PDF and the CDF based approaches inherently assume that the underlying space where the random variables take values is a finite dimensional Euclidean space: the CDF based approach assumes this by considering that the underlying space has an order, whereas the PDF based approaches assume this either by considering the existence of a density kernel, or by defining the volume of a sphere of radius ε by

    vol(ε) = π^{d/2} ε^d / Γ(d/2 + 1),    (6-3)

where d is the dimensionality of the space and Γ is the Gamma function. Therefore, these methods cannot be easily extended to arbitrary metric spaces.

Although these approaches are mathematically precise, i.e., the statistics are zero if and only if conditional independence is satisfied, and the corresponding estimators are consistent, i.e., they reach the respective theoretical values when the number of realizations tends to infinity, they do not provide an intuitive understanding of what conditional dependence is in terms of finite realizations, e.g., how it increases/decreases and when it reaches its maximum. In the following section, we address this issue.

6.2 Method

6.2.1 Conditional Association

The notion of generalized association can be exploited to capture conditional association as follows: given three random variables (X, Y, Z), Y is conditionally associated


with X given Z, if close realization pairs of Y, i.e., {y_i, y_j}, are more associated with close realization pairs of (X, Z), i.e., {(x_i, z_i), (x_j, z_j)}, than with close realization pairs of Z, i.e., {z_i, z_j}, alone. In other words, given that two realizations {z_i, z_j} are close in Z, knowing that the realizations {x_i, x_j} are also close in X brings the realizations {y_i, y_j} closer in Y. Notice that, like generalized association, this concept is also defined over any metric space where the close-ness of two realizations can be assessed by an appropriate metric.

Following the definition, we capture conditional association simply by the difference between the generalized associations as follows:

    MCA(X, Y; Z) = GMA((X, Z), Y) - GMA(Z, Y).

Then MCA(X, Y; Z) is greater than zero when Y is conditionally associated with X given Z, whereas MCA(X, Y; Z) is less than or equal to zero if Y is not conditionally associated with X given Z. To elaborate this concept, let us consider two simple examples.

First, consider X_1 = V_1, Y_1 = U_1, Z_1 = U_1, where U_1 and V_1 are independent. Then GMA(Z_1, Y_1) = 1, whereas if the contribution of V_1 is significantly higher than that of U_1 then GMA((X_1, Z_1), Y_1) ≈ 0.5, and therefore MCA(X_1, Y_1; Z_1) ≈ -0.5 < 0. The significance of contribution can be understood, e.g., in the context of signal-to-noise ratio, where U_1 is the signal and V_1 is the noise. The stronger the effect of V_1, the more the close-ness of the realizations of (X_1, Z_1) is governed by X_1, thus destroying the association between Y_1 and Z_1. Therefore, we expect the conditional association value to be close to 0 if the contribution of V_1 is low compared to U_1, whereas to be close to -0.5 when this contribution is high. A special case is MCA(X, X; X) = 0.

Next, consider X_2 = U_2, Y_2 = U_2, Z_2 = V_2, where U_2 and V_2 are independent random variables. Then GMA(Z_2, Y_2) ≈ 0.5, whereas if the contribution of U_2 is significantly higher than that of V_2 then GMA((X_2, Z_2), Y_2) ≈ 1, and therefore MCA(X_2, Y_2; Z_2) ≈ 0.5 > 0. Using similar intuition as before, we expect the conditional association value to be close to 0.5 if the contribution of V_2 is low compared to U_2, whereas to be close to 0 when


this contribution is high. Therefore, MCA provides an intuitive understanding of how conditional association changes, and when it reaches its maximum/minimum.

These two examples also clearly differentiate the two approaches of working with the probability law and working with the realizations. From a probabilistic point of view, Y_1 is always conditionally independent of X_1 given Z_1, whereas Y_2 is always strictly dependent on X_2 given Z_2. Therefore, a good measure of dependence should always assume its minimum and its maximum value, respectively, for these two examples. However, this is certainly not feasible to achieve using only a finite number of realizations, and therefore the meaning of the finite sample estimator remains obscure. The proposed approach, on the other hand, provides an intuitive understanding of the acquired value by relating it to the relative contributions of the random variables.

Notice that the expression of conditional association is very similar to the expression of CMI, since the latter can be expressed as

    CMI(X, Y; Z) = MI(X, (Y, Z)) - MI(X, Z),

where mutual information (MI) is defined as

    MI(X, Y) = ∫ f_XY(x, y) log( f_XY(x, y) / (f_X(x) f_Y(y)) ) dx dy.

The similarity between these two expressions provides more intuition on how conditional association works. However, unlike CMI, we do not claim that conditional association is a necessary and sufficient condition for conditional independence. A formal proof of the validity of this statement remains to be found. On a final note, although in this chapter we have assumed that the realizations {x_i}_{i=1}^n, {y_i}_{i=1}^n, and {z_i}_{i=1}^n are distinct, this restriction can be easily lifted by employing a method for breaking ties between ranks.

6.2.2 Examples

To demonstrate the validity of generalized association and conditional association, we consider the following two examples.
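The construction above can be sketched numerically. The gma below is only an illustrative nearest-neighbor rank proxy with the qualitative behavior quoted in the text (about 0.5 under independence, approaching 1 under perfect association); the actual GMA estimator is the one defined in Chapter 5, and all variable scales here are equal, so the values are milder than the ±0.5 extremes discussed above.

```python
import numpy as np

def pdist(a):
    a = np.asarray(a, float).reshape(len(a), -1)
    return np.sqrt(((a[:, None] - a[None, :]) ** 2).sum(-1))

def gma(dx, dy):
    """Illustrative rank proxy for generalized association: for each i,
    take its nearest neighbour j in the first space and score how small
    the rank of d(y_i, y_j) is among all distances from y_i.
    Roughly 1 under perfect association, 0.5 under independence."""
    n = len(dx)
    score = 0.0
    for i in range(n):
        d = dx[i].copy()
        d[i] = np.inf                        # exclude the point itself
        j = int(np.argmin(d))                # nearest neighbour in the first space
        rank = int(np.sum(dy[i] < dy[i, j])) # rank of that pair in the second space
        score += (n - 1 - rank) / (n - 1)
    return score / n

def mca(x, y, z):
    """MCA(X, Y; Z) = GMA((X, Z), Y) - GMA(Z, Y), as in the text."""
    xz = np.column_stack([np.asarray(x).reshape(len(x), -1),
                          np.asarray(z).reshape(len(z), -1)])
    return gma(pdist(xz), pdist(y)) - gma(pdist(z), pdist(y))

rng = np.random.default_rng(0)
u, v = rng.normal(size=200), rng.normal(size=200)
print(mca(v, u, u))   # X=V, Y=Z=U: not conditionally associated -> negative
print(mca(u, u, v))   # X=Y=U, Z=V: conditionally associated -> positive
print(mca(u, u, u))   # special case MCA(X, X; X) = 0
```

Note that the sign behavior, not the exact magnitude, is what this proxy is meant to reproduce.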


6.2.2.1 Conditionally Dependent but Independent Variables

Consider three independent normally distributed random variables X, Y, and ε, i.e., X, Y, ε ~ N(0, 1), and a third random variable Z = r(X + Y) + (1 - r)ε, where 0 ≤ r ≤ 1. Here X and Y are independent, but they are conditionally dependent given Z; i.e., we expect GMA(X, Y) ≈ 0.5 and MCA(X, Y; Z) > 0. We call this example ExCoDe. For r = 0.8, we generate 200 sets of 200 realizations from these random variables, and compute the median values of GMA and MCA to be 0.50 and 0.22, respectively.

6.2.2.2 Conditionally Independent but Dependent Variables

Consider three independent normally distributed random variables Z, ε_1, ε_2, i.e., Z, ε_1, ε_2 ~ N(0, 1), and construct two random variables X and Y where X = rZ + (1 - r)ε_1 and Y = rZ + (1 - r)ε_2, where 0 ≤ r ≤ 1. Here X and Y are dependent, but they are conditionally independent given Z; i.e., we expect GMA(X, Y) > 0.5 and MCA(X, Y; Z) ≈ 0. We call this example ExCoIn. For r = 0.5, we generate 200 sets of 200 realizations from these random variables, and compute the median values of GMA and MCA to be 0.56 and 0.00, respectively.

6.3 Surrogate Test

Although conditional association, or any other measure of conditional dependence, returns a value, say v, the significance of this value, in the context of hypothesis testing of conditional independence, remains obscure, since a small/large value can result from the absence/presence of conditional dependence or simply from a lack of evidence, i.e., an insufficient number of realizations. Since the null distribution, i.e., the distribution of the measured value v if the hypothesis is satisfied, is often difficult to assess in the case of finite realizations,


in practice, a user usually relies on a permutation test to decide the significance of v. In simple terms, given a set of realizations {(x_i, y_i, z_i)}_{i=1}^n, the objective of the permutation test is to generate T sets of surrogate realizations {(x^(t)_i, y^(t)_i, z^(t)_i)}_{i=1}^n for t = 1, ..., T that reflect the property of the null condition but keep the other attributes of the data intact, evaluate the measure using these realizations to acquire surrogate values {v^(t)}_{t=1}^T, and then observe whether the true value v is significantly different from these surrogate values.

Generating surrogate data in the context of the test of conditional independence, however, is not trivial. One of the popular approaches for generating surrogate data has been proposed by (Paparoditis and Politis 2000), and has also been implemented by (Su and White 2007, 2008). In brief, given realizations {(x_i, y_i, z_i)}_{i=1}^n from (X, Y, Z), the aim of this approach is to generate realizations {(x̃_i, ỹ_i, z̃_i)}_{i=1}^n, representing (X̃, Ỹ, Z̃), such that X̃ ⊥ Ỹ | Z̃, (X, Z) ~ (X̃, Z̃), and (Y, Z) ~ (Ỹ, Z̃).[1] This is achieved by estimating the conditional density functions

    f̂_X|Z(x|z) = Σ_{i=1}^n p_1(x - x_i) p_3(z - z_i) / Σ_{i=1}^n p_3(z - z_i)

and

    f̂_Y|Z(y|z) = Σ_{i=1}^n p_2(y - y_i) p_3(z - z_i) / Σ_{i=1}^n p_3(z - z_i),

where p_(.) is a Parzen kernel for kernel density estimation, and then sampling from them. The complete approach, thus, can be stated in three steps: for each i, first, assign z̃_i = z_i; second, sample x̃_i from the density f̂_X|Z(x|z̃_i); and third, sample ỹ_i from the density f̂_Y|Z(y|z̃_i). However, this approach requires selecting an appropriate resampling width, which becomes difficult in higher dimensions.
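A minimal one-dimensional sketch of this scheme follows. Gaussian Parzen kernels and a single bandwidth h shared by all three variables are simplifying assumptions of ours, not the cited authors' recipe; sampling from the Parzen mixture amounts to picking a component and adding kernel noise.

```python
import numpy as np

def pp_surrogate(x, y, z, h=0.3, rng=None):
    """Draw (x', y', z') with z' = z, x' ~ f(x|z), y' ~ f(y|z) from
    Parzen estimates of the conditional densities: pick mixture
    component j with weight K_h(z_i - z_j), then add kernel noise."""
    rng = np.random.default_rng(rng)
    n = len(z)
    w = np.exp(-0.5 * ((z[:, None] - z[None, :]) / h) ** 2)  # K_h(z_i - z_j)
    w /= w.sum(axis=1, keepdims=True)
    xs, ys = np.empty(n), np.empty(n)
    for i in range(n):
        xs[i] = x[rng.choice(n, p=w[i])] + h * rng.normal()
        ys[i] = y[rng.choice(n, p=w[i])] + h * rng.normal()  # independent draw
    return xs, ys, z.copy()

def pcorr(a, b, c):
    # partial correlation of a and b given c (linear residuals)
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

rng = np.random.default_rng(0)
z = rng.normal(size=300)
x = z + 0.5 * rng.normal(size=300)
y = x + 0.5 * rng.normal(size=300)        # X and Y remain dependent given Z
xs, ys, zs = pp_surrogate(x, y, z, rng=1)
print(pcorr(x, y, z), pcorr(xs, ys, zs))  # large vs near zero
```

The bandwidth h plays the role of the resampling width criticized in the text: too small and the surrogates merely copy the data, too large and the (X, Z) and (Y, Z) laws are distorted.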
Another widely used approach is discussed in (Diks and DeGoede 2001) in the context of Granger causal inference. Given samples {(x_i, y_i, z_i)}_{i=1}^n from a multivariate stochastic process {X_t, Y_t, Z_t}, the objective is to generate samples {(x̃_i, ỹ_i, z̃_i)}_{i=1}^n representing a stochastic process {X̃_t, Ỹ_t, Z̃_t} such that {X̃_t} does not cause {Ỹ_t} given

[1] The expression X ~ Y implies that the random variables X and Y follow the same distribution.


the knowledge of {Z̃_t}, and {Y_t, Z_t} ~ {Ỹ_t, Z̃_t}.[2] This is usually done by fixing the observations {(y^(e)_i, z^(e)_i)}_{i=1}^n while randomly permuting the observations {x^(e)_i}_{i=1}^n, where {u^(e)_i}_{i=1}^n denotes the realizations from the time embedded stochastic process {U^(e)_t}. Notice that this approach not only destroys the causal influence but also, possibly, destroys the internal structure of the process {X̃_t}. However, this approach is preferred in practice due to its simplicity, and it is observed to provide sensible results. Moreover, it does not involve any parameter. In the context of the test of conditional independence, however, this approach still does not produce surrogate realizations {(x̃_i, ỹ_i, z̃_i)}_{i=1}^n such that Y ⊥ X | Z, but merely modifies the realizations such that (Y, Z) ⊥ X, where the latter is a sufficient condition for the former.

Here, we discuss a different approach for generating surrogate data by modifying the first approach. We follow the approach suggested in (Paparoditis and Politis 2000) in the sense that we first generate samples from f_Z(z) and then from f_X|Z(x|z) and f_Y|Z(y|z), respectively; however, we perform the following modifications in order to make this approach computationally more attractive in the context of the permutation test: first, we use a nearest neighbor based estimate of the conditional density function, as described in the previous section in the context of estimating CMI; and second, we reuse the realizations {z_i}_{i=1}^n, {x_i}_{i=1}^n, and {y_i}_{i=1}^n as realizations from f_Z(z), f_X|Z(x|z), and f_Y|Z(y|z), respectively. Reusing the original data is computationally advantageous in the context of the permutation test, since computation involved in computing the true conditional dependence value can be reused to compute the surrogate values, which includes, e.g., reusing the distance matrix or the kernel Gram matrix. Before discussing the other aspects, we present the algorithm in detail.

[2] The expression {X_t} ~ {Y_t} implies that the time series {X_t} and {Y_t} have the same joint probability law.
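As a preview, one simple reading of the resampling scheme presented next is sketched below for scalar Euclidean variables. Weighting the k nearest z-neighbors uniformly is our stand-in for the k-NN conditional density weights; k = ⌈√n⌉ follows the text.

```python
import numpy as np

def knn_resample_surrogate(x, y, z, rng=None):
    """For each i: keep z_i, then redraw x_i and y_i from the ORIGINAL
    samples whose z_j lie among the k nearest neighbours of z_i, with
    independent draws for x and y. Reusing original points means the
    distance or Gram matrices can be reused across surrogates."""
    rng = np.random.default_rng(rng)
    n = len(z)
    k = int(np.ceil(np.sqrt(n)))            # k = ceil(sqrt(n)), as in the text
    dz = np.abs(z[:, None] - z[None, :])
    xs, ys = np.empty(n), np.empty(n)
    for i in range(n):
        nbrs = np.argsort(dz[i])[:k]        # indices of the k nearest z-values
        xs[i] = x[rng.choice(nbrs)]         # draw x | z close to z_i
        ys[i] = y[rng.choice(nbrs)]         # draw y | z close to z_i, independently
    return xs, ys, z.copy()

rng = np.random.default_rng(0)
z = rng.normal(size=300)
x = z + 0.5 * rng.normal(size=300)
y = x + 0.5 * rng.normal(size=300)
xs, ys, zs = knn_resample_surrogate(x, y, z, rng=1)
print(np.all(np.isin(xs, x)))               # surrogates reuse the original values
```

Because the surrogate values are restricted to the original realizations, any precomputed pairwise distances remain valid, which is the computational advantage emphasized above.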


Figure 6-2. Performance of the surrogate data generation scheme under conditional dependence (ExCoDe, r = 0.8): scatter plots of the true and surrogate data, distributions of the true and surrogate values of PC, MCA, CMI, and HSNCIC, and statistical power as a function of r.

Consider that we have realizations {(u_i, v_i)}_{i=1}^n from a joint distribution f_UV(u, v). Then, using the definition of the conditional density function, we get f_U|V(u | v = v_j) ∝ f_UV(u, v_j), since f_V(v_j) is a constant normalizing factor. Therefore, we can estimate f_U|V(u_i | v_j) ∝ f_UV(u_i, v_j) following (6-3), where we need to specify a neighborhood parameter k. Given the estimate f_U|V(u_i | v_j), we can then sample from this density function assuming the density function only exists over the realization values {u_i}_{i=1}^n. Based on this approach, we follow three simple steps to generate surrogate data: for each i, first, assign z̃_i = z_i; second, sample x̃_i from the ensemble {x_j}_{j=1}^n with probability f_X|Z(x_j | z_i); and third, sample ỹ_i from the ensemble {y_j}_{j=1}^n with probability f_Y|Z(y_j | z_i). Although this process is simple, it involves a free parameter k, which, in some sense, controls the smoothness of the estimated density function. We set k = ⌈√n⌉. It should


Figure 6-3. Performance of the surrogate data generation scheme under conditional independence (ExCoIn, r = 0.5): scatter plots of the true and surrogate data, distributions of the true and surrogate values of PC, MCA, CMI, and HSNCIC, and statistical power as a function of r.

be noted that the choice of this parameter may not be optimal. However, we empirically show that it works well in practice. Also, notice that this approach, in some sense, can be understood as the first approach with a variable kernel size, and the computational complexity of this approach is higher than that of the second approach.

To demonstrate the validity of this approach, we revisit the examples ExCoDe and ExCoIn. Notice that the quality of the surrogate data should be judged by a measure of conditional dependence. However, the significance of the measured value itself is judged by the surrogate data. Therefore, to evaluate the quality of the surrogate data, we rely on a simpler and more established measure of conditional dependence, i.e., the partial correlation (PC), along with the methods described in this chapter, i.e., MCA, CMI, and HSNCIC. PC can be reliably applied in these two examples since the joint probability laws for both the


examples are Gaussian. In Figures 6-2 and 6-3, we show one instance of the true and the surrogate probability distributions for each of these examples, with r = 0.8 and r = 0.5, respectively, along with the distributions of the true and surrogate values of the measures of conditional independence, and the statistical powers of the methods for varying r. For CMI we use k = ⌈√n⌉, whereas for HSNCIC we use a Gaussian kernel with the kernel size set to the median of the intersample distances, and set the regularization value to 1/n. We use 1000 sets of 200 realizations each, both to compute the empirical threshold of rejection at size 0.05 and to compute the empirical power.

We observe that for both examples the surrogate values exist around zero for PC and MCA, which is promising. On the other hand, the null distributions for CMI and HSNCIC are biased, which is a usual observation for finite sample estimation. For ExCoDe, the true values are much higher than the surrogate values, which indicates the presence of conditional dependence, whereas for ExCoIn, the true values are almost identically distributed as the surrogate values, which indicates the absence of conditional dependence.

For ExCoDe, r controls the difficulty of the problem in terms of signal-to-noise ratio, i.e., a lower r injects more noise (ε) relative to the signal (X), making it difficult to observe the contribution of X on Y given Z. We observe that PC achieves the best performance in terms of statistical power, reaching the ideal power 1 at r = 0.4. The performance of PC is justified since the original realizations are Gaussian in nature. The performances of CMI and HSNCIC are better than that of MCA. However, this performance is achieved by proper selection of the parameter values and at the expense of more computational cost. Also, we observe that CMI tends to over-reject in both examples.

Although the true and the surrogate values of the measures for ExCoDe follow the desired pattern, i.e., the true values monotonically increase for increasing r and the surrogate values maintain a steady low, we do not observe the same effect for ExCoIn. Notice that the surrogate values become larger for PC for large r. A possible reason for


this is that for large r, the joint distributions of (X, Z) and (Y, Z) are almost singular, and thus difficult to estimate. Therefore, it is possible that the surrogate data generated from these distributions are not accurate, and therefore not Gaussian, thus manipulating the values returned by PC. It also explains the steady under-rejection of the null hypothesis by MCA for large r. HSNCIC, on the other hand, demonstrates a different behavior. Although it maintains a similar surrogate value distribution over different r, the true value estimated by this measure monotonically decreases with increasing r, thus forcing it to under-reject. A probable cause of this observation is, again, the choice of free parameters. Since for large r the probability law becomes narrower, HSNCIC perhaps requires a smaller kernel size for proper estimation than the selected kernel size.

In brief, this experiment shows that, first, the surrogate data generated by the proposed method may not be accurate, but it is still sufficiently reliable for the permutation test, and second, the available measures of conditional dependence rely on the accurate choice of parameters to make proper decisions, i.e., to avoid over- or under-rejection.

6.4 Simulation

In this section, we apply the proposed approach to several real and synthetic datasets in the context of Granger non-causal inference (Granger 1980), to address its pros and cons.

6.4.1 Conditional Granger Causality

Given stationary stochastic processes {X_t} and {Y_t}, {X_t} is said to cause {Y_t}, i.e., {X_t} -> {Y_t} in the sense of Granger, if the past values [X_{t-1}, X_{t-2}, ...] of {X_t} contain additional information about the present value Y_t of {Y_t} that is not contained in the past values [Y_{t-1}, Y_{t-2}, ...] of {Y_t} alone (Granger 1980), i.e.,

    {X_t} -/-> {Y_t}  <=>  Y_t ⊥ (X_{t-1}, X_{t-2}, ...) | (Y_{t-1}, Y_{t-2}, ...).

Notice that in mathematical terms we are more interested in {X_t} -/-> {Y_t}, i.e., non-causality, than in {X_t} -> {Y_t}, i.e., causality, since the former definition reduces to a


simple hypothesis test of conditional independence. The issue of causal influence between two time series can be trivially extended to multivariate time series involving three or more time series, where it is often desired to separate a direct cause from an indirect one, i.e., to judge whether the time series {X_t} and {Y_t} are causally connected or not given a third time series {Z_t}. {X_t} is said to cause {Y_t} given {Z_t}, i.e., {X_t} -> {Y_t} | {Z_t}, if the past values [X_{t-1}, X_{t-2}, ...] of {X_t} contain additional information about the present value Y_t of {Y_t} that is not contained in the past values [(Y_{t-1}, Z_{t-1}), (Y_{t-2}, Z_{t-2}), ...] of {Y_t, Z_t}. In terms of conditional independence, {X_t} -/-> {Y_t} | {Z_t} implies that the present value Y_t of {Y_t} is conditionally independent of the past values [X_{t-1}, X_{t-2}, ...] of {X_t} given the past values [(Y_{t-1}, Z_{t-1}), (Y_{t-2}, Z_{t-2}), ...] of {Y_t, Z_t}.

For our experiment, we generate a multivariate time series {W} with 2-5 elements. To separate a direct cause from an indirect cause, we quantify the conditional causal influence; i.e., we quantify the causal influence of {W_i} on {W_j} by the conditional association of Y = W_j(t) with X = [W_i(t-1), ..., W_i(t-L)] given Z = [W_j(t-1), ..., W_j(t-L), W_{\i,j}(t-1), ..., W_{\i,j}(t-L)], where W = W_i ∪ W_j ∪ W_{\i,j}, and L is the number of past values that we condition on. Since we are working in Euclidean space, we use the l_2 distance as the metric for all three spaces X, Y, and Z. For each experiment we use T = 200 trials to estimate the empirical threshold of rejection, and accept a causal connection, i.e., reject the null hypothesis, if the estimated p-value is less than 0.05. For all the simulation results we observe a common feature: the performance of MCA improves with increasing sample size n, and it degrades with increasing embedding dimension L. This is expected since L controls the dimensionality of the resulting hypothesis testing problem, and the higher the value of L, the more difficult and sparse the problem is.

6.4.2 Linear System

First, we consider the following linear system (Zou and Feng 2009):

    W_1(t) = 0.95 √2 W_1(t-1) - 0.9025 W_1(t-2) + ε_1


    W_2(t) = 0.5 W_1(t-2) + ε_2
    W_3(t) = -0.4 W_1(t-2) + ε_3
    W_4(t) = -0.5 W_1(t-1) + 0.25 √2 (W_4(t-1) + W_5(t-1)) + ε_4
    W_5(t) = -0.25 √2 (W_4(t-1) - W_5(t-2)) + ε_5

where ε_1, ε_5 ~ N(0, 0.6^2), ε_2 ~ N(0, 0.5^2), ε_3 ~ N(0, 0.3^2), and ε_4 ~ N(0, 0.3^2). Here {W_1} causes {W_2}, {W_3}, and {W_4}, whereas {W_4} and {W_5} cause each other. We present the result of this experiment in Table 6-1 with L = 2, 10, 25 and n = 250, 500, 1000. Each entry in the table shows the fraction of times a particular connection has been established over the indicated number of repetitions R. The true connectivities are marked in the table. We observe that MCA has been able to recover the connectivities {W_1} -> {W_2}, {W_3}, {W_4} with high accuracy. However, it has failed to detect the connectivities {W_4} <-> {W_5} for low sample sizes and high embedding dimensions. A possible explanation of this observation is that the connection strengths between {W_4} and {W_5} are rather weak compared to the other existing connections.

6.4.3 Nonlinear System

Next, we consider the following nonlinear system (Narendra 1997, Page 144):

    W_1(t+1) = [ (1 + W_1(t)) / (1 + W_1^2(t)) ] sin W_2(t) + ε_1
    W_2(t+1) = W_1(t) exp( -(W_1^2(t) + W_2^2(t)) / 8 ) + W_2(t) cos W_2(t) + W_4^3(t) / (1 + W_4(t))^2 + 0.5 cos( W_1(t) + W_2(t) ) + ε_2
    W_3(t+1) = W_1(t) / (1 + 0.5 sin W_2(t)) + W_2(t) / (1 + 0.5 sin W_1(t)) + ε_3

where W_4(t) ~ N(0, 1) is an external input, and ε_{1,2,3} ~ N(0, 0.1^2). Here {W_1} and {W_2} cause each other, and they both cause {W_3}, whereas {W_4} causes {W_2}. We present the result of this experiment in Table 6-2 for L = 2, 10, 25 and n = 250, 500, 1000.
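For reference, a minimal simulation of the linear system of Section 6.4.2 above; the noise standard deviations follow the text, while the burn-in length and seed are arbitrary choices of ours.

```python
import numpy as np

def simulate_linear(n, burn=200, rng=None):
    """Simulate the 5-variable linear benchmark quoted above (Zou and
    Feng 2009); returns an (n, 5) array, one column per W_i."""
    rng = np.random.default_rng(rng)
    std = np.array([0.6, 0.5, 0.3, 0.3, 0.6])   # std of eps_1 ... eps_5
    w = np.zeros((n + burn, 5))
    r2 = np.sqrt(2)
    for t in range(2, n + burn):
        e = rng.normal(size=5) * std
        w[t, 0] = 0.95 * r2 * w[t-1, 0] - 0.9025 * w[t-2, 0] + e[0]
        w[t, 1] = 0.5 * w[t-2, 0] + e[1]
        w[t, 2] = -0.4 * w[t-2, 0] + e[2]
        w[t, 3] = -0.5 * w[t-1, 0] + 0.25 * r2 * (w[t-1, 3] + w[t-1, 4]) + e[3]
        w[t, 4] = -0.25 * r2 * (w[t-1, 3] - w[t-2, 4]) + e[4]
    return w[burn:]                              # drop the transient

w = simulate_linear(1000, rng=0)
# the driving link W1 -> W2 (lag 2) shows up as a strong lagged correlation
print(np.corrcoef(w[2:, 1], w[:-2, 0])[0, 1])
```

The W_1 component is a stable AR(2) oscillator (poles of magnitude 0.95), so the process settles after a short burn-in.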


Table 6-1. Performance of MCA for linear system. Rows are candidate causes, columns are effects; diagonal entries are not applicable.

(a) L = 2, n = 250, R = 128
        W1     W2     W3     W4     W5
  W1     -    0.96   1.00   0.99   0.01
  W2   0.01     -    0.01   0.00   0.00
  W3   0.01   0.02     -    0.00   0.02
  W4   0.02   0.00   0.00     -    0.04
  W5   0.00   0.01   0.01   0.09     -

(b) L = 10, n = 250, R = 128
        W1     W2     W3     W4     W5
  W1     -    0.75   0.89   0.88   0.02
  W2   0.02     -    0.00   0.00   0.02
  W3   0.03   0.01     -    0.00   0.02
  W4   0.00   0.00   0.00     -    0.02
  W5   0.02   0.00   0.00   0.00     -

(c) L = 25, n = 250, R = 128
        W1     W2     W3     W4     W5
  W1     -    0.24   0.35   0.29   0.04
  W2   0.02     -    0.02   0.00   0.03
  W3   0.00   0.00     -    0.00   0.01
  W4   0.00   0.01   0.00     -    0.02
  W5   0.02   0.02   0.02   0.02     -

(d) L = 2, n = 500, R = 64
        W1     W2     W3     W4     W5
  W1     -    1.00   1.00   1.00   0.00
  W2   0.00     -    0.02   0.00   0.00
  W3   0.00   0.00     -    0.00   0.02
  W4   0.00   0.02   0.00     -    0.05
  W5   0.00   0.02   0.00   0.16     -

(e) L = 10, n = 500, R = 64
        W1     W2     W3     W4     W5
  W1     -    0.98   1.00   1.00   0.05
  W2   0.00     -    0.00   0.00   0.08
  W3   0.00   0.00     -    0.00   0.02
  W4   0.02   0.00   0.00     -    0.02
  W5   0.00   0.00   0.00   0.02     -

(f) L = 25, n = 500, R = 64
        W1     W2     W3     W4     W5
  W1     -    0.66   0.84   0.86   0.02
  W2   0.02     -    0.02   0.00   0.05
  W3   0.02   0.02     -    0.00   0.05
  W4   0.00   0.00   0.02     -    0.05
  W5   0.06   0.02   0.00   0.02     -

(g) L = 2, n = 1000, R = 32
        W1     W2     W3     W4     W5
  W1     -    1.00   1.00   1.00   0.03
  W2   0.00     -    0.00   0.00   0.00
  W3   0.00   0.00     -    0.00   0.00
  W4   0.00   0.00   0.00     -    0.07
  W5   0.00   0.00   0.00   0.47     -

(h) L = 10, n = 1000, R = 32
        W1     W2     W3     W4     W5
  W1     -    1.00   1.00   1.00   0.00
  W2   0.00     -    0.00   0.00   0.00
  W3   0.03   0.00     -    0.03   0.00
  W4   0.00   0.00   0.00     -    0.06
  W5   0.00   0.00   0.00   0.00     -

(i) L = 25, n = 1000, R = 32
        W1     W2     W3     W4     W5
  W1     -    1.00   1.00   1.00   0.06
  W2   0.00     -    0.06   0.00   0.06
  W3   0.00   0.00     -    0.00   0.00
  W4   0.03   0.00   0.02     -    0.12
  W5   0.03   0.03   0.00   0.00     -
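The acceptance rates in Table 6-1 come from the permutation procedure described earlier (T = 200 surrogate trials, rejection at the 0.05 level). A minimal sketch of such a permutation test follows; it is my illustration, using a simple stand-in statistic (`abs_corr`) rather than the MCA statistic itself.

```python
import numpy as np

def permutation_pvalue(stat, X, Y, T=200, seed=0):
    """Empirical p-value for dependence between paired samples X and Y.

    `stat` is any association statistic that grows with dependence;
    permuting Y breaks the pairing and samples the null distribution.
    """
    rng = np.random.default_rng(seed)
    observed = stat(X, Y)
    null = np.array([stat(X, Y[rng.permutation(len(Y))]) for _ in range(T)])
    # add-one correction keeps the estimate away from exactly zero
    return (1 + np.sum(null >= observed)) / (T + 1)

def abs_corr(X, Y):
    """A simple stand-in statistic: absolute Pearson correlation."""
    return abs(np.corrcoef(X, Y)[0, 1])
```

For strongly dependent pairs the observed statistic exceeds every surrogate value, so the p-value bottoms out at 1/(T+1), well below the 0.05 threshold.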


Table 6-2. Performance of MCA for nonlinear system. Rows are candidate causes, columns are effects; diagonal entries are not applicable.

(a) L = 2, n = 250, R = 128
        W1     W2     W3     W4
  W1     -    0.00   0.07   0.00
  W2   1.00     -    0.96   0.02
  W3   0.00   0.00     -    0.02
  W4   0.00   1.00   0.00     -

(b) L = 10, n = 250, R = 128
        W1     W2     W3     W4
  W1     -    0.03   0.05   0.04
  W2   0.62     -    0.67   0.08
  W3   0.00   0.00     -    0.02
  W4   0.02   0.67   0.02     -

(c) L = 25, n = 250, R = 128
        W1     W2     W3     W4
  W1     -    0.05   0.10   0.01
  W2   0.32     -    0.37   0.05
  W3   0.00   0.02     -    0.03
  W4   0.02   0.13   0.02     -

(d) L = 2, n = 500, R = 64
        W1     W2     W3     W4
  W1     -    0.00   0.10   0.02
  W2   1.00     -    1.00   0.00
  W3   0.00   0.00     -    0.05
  W4   0.00   1.00   0.00     -

(e) L = 10, n = 500, R = 64
        W1     W2     W3     W4
  W1     -    0.09   0.05   0.06
  W2   0.83     -    0.95   0.03
  W3   0.00   0.00     -    0.03
  W4   0.05   0.94   0.00     -

(f) L = 25, n = 500, R = 64
        W1     W2     W3     W4
  W1     -    0.05   0.16   0.03
  W2   0.45     -    0.61   0.03
  W3   0.02   0.00     -    0.02
  W4   0.03   0.42   0.00     -

(g) L = 2, n = 1000, R = 32
        W1     W2     W3     W4
  W1     -    0.03   0.50   0.03
  W2   1.00     -    1.00   0.00
  W3   0.00   0.00     -    0.00
  W4   0.00   1.00   0.00     -

(h) L = 10, n = 1000, R = 32
        W1     W2     W3     W4
  W1     -    0.03   0.16   0.03
  W2   1.00     -    1.00   0.00
  W3   0.00   0.00     -    0.06
  W4   0.00   1.00   0.00     -

(i) L = 25, n = 1000, R = 32
        W1     W2     W3     W4
  W1     -    0.13   0.40   0.03
  W2   0.84     -    0.84   0.09
  W3   0.00   0.00     -    0.03
  W4   0.00   0.63   0.00     -

We observe that although MCA has been able to recover the rest of the connectivities well, it has failed to recover the connections {W_1} → {W_2}, {W_3} for small sample sizes and higher embedding dimensions. This observation can be attributed to the nonlinear nature of the signal, where for high embedding dimensions the contribution of the nonlinearity is overshadowed by the other components. We also observe that MCA has been able to recover the true connectivity with very high accuracy for L = 1. This observation signifies the contribution of proper embedding dimensions, and that it might be crucial for detecting causal influence.

6.4.4 Varying Coupling Strength

Next, we consider the following time series (Chen et al., 2004),

    W_1(t) = 3.4\,W_1(t-1)\,(1 - W_1^2(t-1))\,e^{-W_1^2(t-1)} + 0.8\,W_1(t-2) + \epsilon_1
    W_2(t) = 3.4\,W_2(t-1)\,(1 - W_2^2(t-1))\,e^{-W_2^2(t-1)} + 0.5\,W_2(t-2) + \gamma\,W_1^2(t-2) + \epsilon_2
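The coupled pair above can be simulated directly. The sketch below is my illustration only; the burn-in, seed, and the correlation-based dependence check are my additions, and `gamma` stands for the coupling strength. It shows that the dependence of W_2 on the past of W_1 appears only when the coupling is switched on.

```python
import numpy as np

def simulate_coupled(n, gamma, seed=0):
    """Simulate the coupled nonlinear pair of Section 6.4.4 for a given
    coupling strength gamma (W1 drives W2 when gamma > 0)."""
    rng = np.random.default_rng(seed)
    burn = 200
    T = n + burn
    w1, w2 = np.zeros(T), np.zeros(T)
    e1 = rng.normal(0, 0.1, T)
    e2 = rng.normal(0, 0.1, T)
    f = lambda x: 3.4 * x * (1 - x**2) * np.exp(-x**2)   # shared nonlinear map
    for t in range(2, T):
        w1[t] = f(w1[t-1]) + 0.8 * w1[t-2] + e1[t]
        w2[t] = f(w2[t-1]) + 0.5 * w2[t-2] + gamma * w1[t-2]**2 + e2[t]
    return w1[burn:], w2[burn:]
```

With gamma = 0 the two series are independent; with gamma > 0 the squared past of W_1 is visible in W_2, which is exactly the asymmetry the causal test is meant to pick up.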


[Figure 6-4 appears here: nine panels (A-I) plotting acceptance rate against coupling strength γ for the directions x1 → x2 and x2 → x1, for L = 2, 10, 25 and n = 250 (R = 128), n = 500 (R = 64), and n = 1000 (R = 32).]

Figure 6-4. Performance of MCA for nonlinear system

where \epsilon_1, \epsilon_2 ~ N(0, 0.1^2) and 0 ≤ γ ≤ 1 is the coupling strength. Here {W_1} causes {W_2} for γ > 0. Due to the nonlinear nature of this time series, Granger causality fails to detect the true causal direction. Therefore, Chen et al. (2004) have proposed a nonlinear version of Granger causal inference by piecewise linear approximation, which successfully captures the true causal direction. However, this approach requires a relatively high number of realizations to sample the time series well. From Figure 6-4, we observe that MCA has been able to recover the true causal direction accurately over different sample sizes and embedding dimensions.


6.4.5 Relatively High Synchrony

Next, we consider the following similar problem but with higher synchrony (Sun, 2008),

    W_1(t) = 1.4 + 0.3\,W_1(t-2) - W_1^2(t-1) + 0.8\,W_1(t-2) + \epsilon_1
    W_2(t) = 1.4 + 0.1\,W_2(t-2) - \gamma\,W_1(t-1)\,W_2(t-1) + 3.4\,W_2(t-1) - (1-\gamma)\,0.4\,W_2^2(t-1) + \epsilon_2

where \epsilon_1, \epsilon_2 ~ N(0, 0.01^2) and 0 ≤ γ ≤ 1 is the coupling strength. The coupling strength is advised to be kept below 0.7 to avoid strong synchrony (Sun, 2008); therefore, we scan this parameter up to 0.6. Due to the higher level of synchrony, this problem is relatively easier than the previous problem, and we observe in Figure 6-5 that MCA has been able to detect the true connectivity with high accuracy.

6.4.6 Multivariate Time Series

Next, we consider the following bivariate time series where each element, i.e. {W_1} or {W_2}, is 4-dimensional (Fukumizu et al., 2008),

    W_1^{(1)}(t+1) = 1.4 - W_1^{(1)}(t)^2 + 0.3\,W_1^{(2)}(t)
    W_1^{(2)}(t+1) = W_1^{(1)}(t)
    W_2^{(1)}(t+1) = 1.4 - \{\gamma\,W_1^{(1)}(t)\,W_2^{(1)}(t) + (1-\gamma)\,W_2^{(1)}(t)^2\} + 0.1\,W_2^{(2)}(t)
    W_2^{(2)}(t+1) = W_2^{(1)}(t)

where W_1^{(3)}, W_1^{(4)}, W_2^{(3)}, W_2^{(4)} ~ N(0, 0.5^2) and 0 ≤ γ ≤ 0.6 is the coupling strength. This problem is relatively difficult since the dimensionality of the joint space is much higher. However, from Figure 6-6, we observe that, although performing worse than in the previous two examples, MCA still has successfully recovered the true causal direction.
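All of these experiments share the same first step: turning the raw series into the triplet (X, Y, Z) described at the start of this section. A minimal sketch of that construction for scalar series follows; the function and variable names are mine.

```python
import numpy as np

def causal_embedding(w_i, w_j, w_rest, L):
    """Build (X, Y, Z) for testing {W_i} -> {W_j} | remaining series, with
    embedding dimension L.

    Y is the present value of the target, X the embedded past of the candidate
    cause, and Z the embedded past of the target and of all remaining series.
    """
    w_i, w_j = np.asarray(w_i), np.asarray(w_j)
    t = np.arange(L, len(w_j))                   # time indices with a full past
    lag = lambda s: np.stack([np.asarray(s)[t - l] for l in range(1, L + 1)], axis=1)
    Y = w_j[t]
    X = lag(w_i)
    Z = np.hstack([lag(w_j)] + [lag(r) for r in w_rest])
    return X, Y, Z
```

For n samples this yields n - L usable rows, with X of width L and Z of width L times (1 + number of remaining series), which is why large L quickly makes the conditioning problem high-dimensional and sparse.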


[Figure 6-5 appears here: nine panels (A-I) plotting acceptance rate against coupling strength γ for the directions x1 → x2 and x2 → x1, over L = 2, 10, 25 and varying n (R = 128, 64, 32).]

Figure 6-5. Performance of MCA for high synchrony

6.4.7 Heart Rate and Respiration Force

Next, we apply MCA on the bivariate time series of heart rate (H) and respiration force (R) of a patient suffering from sleep apnea. This data has been acquired from the Santa Fe Institute time series competition. In natural conditions, it is observed that respiration force has a causal influence on heart rate. However, for a patient suffering from sleep apnea, this causal direction is reversed. Therefore, for this particular dataset, a strong causal influence is observed from heart rate to respiration force, whereas a weak influence is observed in the other direction (Schreiber, 2000). We randomly select n = 480, 720, 960 long segments from the time series, and use L = 30, 60, 90. Since the


[Figure 6-6 appears here: nine panels (A-I) plotting acceptance rate against coupling strength γ for the directions x1 → x2 and x2 → x1, over L = 1, 10, 25 and n = 250, 500, 1000 (R = 128, 64, 32).]

Figure 6-6. Performance of MCA for multivariate time series

time series is sampled at 2 Hz, these measurements are equivalent to n = 4, 6, 8 minutes and L = 15, 30, 45 seconds. We observe from Table 6-3 that MCA strongly supports the causal influence of heart rate on respiration force.

6.4.8 Digoxin Clearance

Finally, we use the measures to examine the structure of a graphical model, using the digoxin clearance dataset, which consists of 35 observations of three variables: creatinine clearance (C), digoxin clearance (D), and urine flow (U), recorded from separate patients (Fukumizu et al., 2008). Medical research shows that although D and U are dependent on each other, they are conditionally independent given C. Tests of independence and


Table 6-3. Performance of MCA for sleep apnea data. Rows are candidate causes, columns are effects.

(a) L = 15 s, n = 4 min
       H      R
  H     -    0.28
  R   0.00     -

(b) L = 15 s, n = 6 min
       H      R
  H     -    0.32
  R   0.00     -

(c) L = 15 s, n = 8 min
       H      R
  H     -    0.20
  R   0.00     -

(d) L = 30 s, n = 4 min
       H      R
  H     -    0.53
  R   0.05     -

(e) L = 30 s, n = 6 min
       H      R
  H     -    0.67
  R   0.02     -

(f) L = 30 s, n = 8 min
       H      R
  H     -    0.64
  R   0.00     -

(g) L = 45 s, n = 4 min
       H      R
  H     -    0.52
  R   0.08     -

(h) L = 45 s, n = 6 min
       H      R
  H     -    0.67
  R   0.08     -

(i) L = 45 s, n = 8 min
       H      R
  H     -    0.69
  R   0.09     -

conditional independence using GMA and MCA reveal the respective p-values to be 0.05 and 0.25, thus putting more emphasis on the desired conditional independence.

6.4.9 Causality Under Noise

x(t) is a bivariate nonlinear autoregressive process corrupted by noise, i.e. (Nolte et al., 2010),

    x(t) = (1-\gamma)\,\sigma\,y(t) + \gamma\,B\,\epsilon(t)

where

    y_1(t) = \sum_{i=1}^{5} a_i\,y_1(t-i) + \sum_{i_1=1}^{5} \cdots \sum_{i_L=1}^{5} b_{i_1 \ldots i_L}\,y_2(t-i_1) \cdots y_2(t-i_L),

y_2 and \epsilon_{1,2} are uncorrelated 5th-order autoregressive processes, B is a mixing matrix, and \sigma = \|y\| / \|\epsilon\| is a normalizing factor. Therefore, y_2 → y_1, but y_1 ↛ y_2; however, it is hard to reject the latter using the principle of Granger causality due to the presence of additive noise (Nolte et al., 2010). But we choose this dataset to observe the performance of the proposed approach under noise and nonlinearity.

Here L controls the order of nonlinear interaction, and L = 1 implies linear interaction. We simulate 1024 independent time series of length 4000 each for several γ, and use random coefficients for the autoregressive models. We report the performance of conditional association and the linear Granger causality in Figure 6-7. The dotted lines are the rejection thresholds. For both methods we embed the time series in


10 dimensions, i.e., for Granger causality we fit a 10th-order autoregressive process, and for conditional association we evaluate MCA([x_j(t-1), ..., x_j(t-10)], x_i(t); [x_i(t-1), ..., x_i(t-10)]). For simplicity we use B = I, i.e., the noise sources do not mix. In this situation, both methods should indicate the absence of causal flow in both directions for γ = 1, which is not true otherwise.

Since the use of conditional association is based on the same principle of causality as discussed by Granger, it is susceptible to inferring the spurious direction y_1 → y_2. However, we observe, first, that conditional association is capable of inferring nonlinear causal influence, with high accuracy for order 2 and high signal-to-noise ratio, and second, that it is less susceptible to inferring the wrong causal direction compared to Granger causality: it produces almost no false positive detections, whereas Granger causality returns the spurious direction very frequently.

6.4.10 EEG Data

We apply conditional association in the context of exploring the causal flow of the alpha rhythm, the most dominant frequency observed in the human brain with the eyes closed, using the dataset described in (Nolte et al., 2008). The data has been recorded from 10 subjects with eyes closed using a 19-channel 10-20 system with 256 Hz sampling frequency. Since the location of the alpha rhythm varies over 8-12 Hz, we filter each channel using an FIR bandpass filter with center frequency 10 Hz and quality factor 1. To assess the causal flow, we consider 64 past values, which corresponds to a lag of 250 ms. Similar to (Nolte et al., 2008), we evaluate pairwise causal flow, not conditional causal flow, to reduce the number of conditioning dimensions. However, we evaluate the causal flow in both directions separately, i.e. F_{x→y} and F_{y→x}, rather than only the residual flow F_{x→y} - F_{y→x}. We observe that there is a significant causal flow from the frontal to the parietal lobe, which matches the observation made by (Nolte et al., 2008). We show an example of the causal flow as assessed by conditional association in Figure 6-8. The figure shows the average causal flow estimated over 16-second windows shifted by 8 seconds.
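The linear Granger baseline used in this comparison amounts to fitting restricted and unrestricted autoregressions and comparing their residuals. The sketch below is my illustration of that baseline in a standard F-statistic form, not the dissertation's code; the default order p = 10 matches the embedding used above.

```python
import numpy as np

def granger_fstat(x, y, p=10):
    """Linear Granger statistic: improvement in predicting y(t) from adding
    the past of x to y's own past (larger F = stronger evidence x -> y)."""
    x, y = np.asarray(x), np.asarray(y)
    t = np.arange(p, len(y))
    lag = lambda s: np.stack([s[t - l] for l in range(1, p + 1)], axis=1)
    ones = np.ones((len(t), 1))
    A_r = np.hstack([ones, lag(y)])      # restricted model: y's own past only
    A_f = np.hstack([A_r, lag(x)])       # full model: also x's past
    rss = lambda A: np.sum((y[t] - A @ np.linalg.lstsq(A, y[t], rcond=None)[0]) ** 2)
    rss_r, rss_f = rss(A_r), rss(A_f)
    df2 = len(t) - A_f.shape[1]
    return ((rss_r - rss_f) / p) / (rss_f / df2)
```

When y is driven by the past of x, the full model removes a large share of the residual variance and the statistic is far above its null range; in the reverse direction it stays near 1.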


[Figure 6-7 appears here: panels comparing MCA against linear Granger causality as a function of γ, for interaction orders 1 to 4, showing the 0.1, 0.5 and 0.9 quantiles of the statistics for the directions x2 → x1 and x1 → x2.]

Figure 6-7. Granger causal inference under additive noise

6.5 Discussion

In this chapter, we have introduced the concept of conditional association as a substitute for conditional dependence. The major difference between the proposed approach and the already existing approaches is that it explores the concept of conditional dependence in the context of the realizations rather than the random variables or the probability law. Unlike available measures of association, the proposed approach is parameter free, and also relatively easy to compute, thus making it an excellent tool for inferring causal relationships between random variables and stochastic processes. We have also introduced a novel scheme of generating surrogate data for the test of conditional independence, which is practically attractive since it resamples the original data, allowing


Figure 6-8. Causal flow of alpha rhythm

the computations involved in computing the conditional measure to be reused to compute surrogate values.


CHAPTER 7
FUTURE WORK

In this dissertation, I have addressed the concepts of independence, conditional independence, dependence and conditional dependence from an application perspective. The concept of independence has been addressed in the context of independent component analysis, where a measure of independence can be used as a cost function. I have unified a number of existing measures of independence, and derived new parameter-free estimators to be used in ICA. I have shown that the proposed approach works equally well compared to the existing methods, in terms of accuracy and computational load, on an extensive simulation set-up.

The concept of conditional independence has been addressed in the context of Granger non-causal inference. A measure of conditional independence usually requires free parameters since conditional attributes are not well-defined in the continuous domain. I have proposed a kernel least squares regression based approach to design an estimator of conditional independence where the free parameters can be chosen using the knowledge of the regression domain. The effectiveness of the proposed method has been demonstrated using simulated data.

The concepts of dependence and conditional dependence have been explored from an application perspective to design estimators that are scalable, robust, parameter-free, and that convey an intuitive understanding of how dependence and conditional dependence increase or decrease in terms of the realizations. The proposed approaches, termed generalized association and conditional association, bear resemblance to mutual information and conditional mutual information. I have also shown that a similar approach can be considered to capture interactive association, which is similar to the concept of interaction information. See Table 7-1. I have shown the applicability of these approaches through variable selection on gene expression data and causal flow exploration on EEG data.
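The parallels with mutual information summarized in Table 7-1 can be made concrete in the jointly Gaussian case, where MI has a closed form and CMI follows from the identity CMI(X, Y; Z) = MI((X, Z), Y) - MI(Z, Y). The following plug-in sketch is my illustration of that identity only, not one of the proposed estimators.

```python
import numpy as np

def gauss_mi(A, B):
    """Plug-in mutual information (in nats) under a joint-Gaussian assumption.
    A and B are (n, dA) and (n, dB) sample matrices."""
    C = np.cov(np.hstack([A, B]), rowvar=False)
    dA = A.shape[1]
    det = lambda M: np.linalg.det(np.atleast_2d(M))
    return 0.5 * np.log(det(C[:dA, :dA]) * det(C[dA:, dA:]) / det(C))

def gauss_cmi(X, Y, Z):
    """CMI(X, Y; Z) = MI((X, Z), Y) - MI(Z, Y), the identity of Table 7-1."""
    return gauss_mi(np.hstack([X, Z]), Y) - gauss_mi(Z, Y)
```

For X and Y that are dependent only through a common Gaussian cause Z, the estimated MI(X, Y) is clearly positive while the estimated CMI(X, Y; Z) sits near zero, matching the "0 if and only if X ⊥ Y | Z" row of the table.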


Table 7-1. Equivalence of association and information based approaches

Available:
  MI(X, Y)                                                     Symmetric, non-negative; 0 if and only if X ⊥ Y
  CMI(X, Y; Z) = MI((X, Z), Y) - MI(Z, Y)                      Non-negative; 0 if and only if X ⊥ Y | Z
  II(X; Y; Z) = MI(Z, Y) + MI(X, Y) - MI((X, Z), Y)            < 0 ⇒ synergy, > 0 ⇒ redundancy

Proposed:
  GMA(X, Y)                                                    Asymmetric, in [0.5, 1]; 0.5 if X ⊥ Y, 1 if X = Y
  MCA(X, Y; Z) = GMA((X, Z), Y) - GMA(Z, Y)                    In [-0.5, 0.5]; 0 if X ⊥ Y | Z
  MIA(X; Y; Z) = GMA((X, Z), Y) - max(GMA(Z, Y), GMA(X, Y))    In [-0.5, 0.5]; < 0 ⇒ synergy, > 0 ⇒ redundancy

I conclude the dissertation by providing some future work guidelines for the work presented in the previous chapters.

7.1 Unified Quadratic Measure of Independence

Although the initial results of the proposed approach are promising, many aspects of this method still require further investigation. First of all, we have proposed a generic strictly positive definite kernel that allows the design of new strictly positive definite kernels by choosing the function g and the second free parameter appropriately. However, we have only explored a particular combination of these two parameters. Since there exist other choices that satisfy the requirements of Theorem 3.1, it is natural to seek other interesting kernels.

Next, we have shown that any independence based ICA algorithm always establishes a trade-off between accuracy and computation. Therefore, it remains an open question how to design a suitable ICA algorithm that would improve on both these fronts. We have seen that, from the optimization point of view, the gradient descent based approach performs better than the exhaustive search in lower dimensions, whereas the roles flip in higher dimensions. Therefore, it is perhaps a natural choice to employ two different optimization schemes for the two different situations, and develop an appropriate switching strategy.

Finally, although we have proposed a generalized theoretical framework for designing kernels, we have only compared different kernels on an empirical basis. However, since we know the generic form of the kernel, it also allows us to perform a theoretical analysis


on the performance surface generated by these kernels for different parameter choices. Therefore, a plausible approach for understanding the quality of the kernel would be to explore the shape of the performance surface in the vicinity of the solution. However, it is certainly an involved problem, and we leave it as future work.

7.2 Asymmetric Quadratic Measure of Conditional Independence

AQMCI, or any other measure of conditional independence as such, suffers from several limitations in the context of detecting Granger non-causality. First, these methods usually involve free parameters; second, they are computationally more involved; third, they usually involve permutation tests, where generating the surrogate data remains a challenging problem; and fourth, they might fail to properly condition on the conditioning variables when either the number of conditioning variables is large or the number of samples is insufficient. Nonetheless, we have shown empirically that these methods have the potential to substitute the traditional approach of using linear Granger causality; however, they certainly require more investigation. Finally, we would like to mention that in the simulations we have conditioned our test on a fixed number of past values, since we know the ground truth, whereas in practical problems the true number of lags is unknown and must be chosen via an appropriate statistical test.

7.3 Generalized Association

Although GMA provides promising performance in all the experimental set-ups, a few attributes of this measure still need some attention. The value of GMA depends on the metric, which is not unique for a particular metric space, and therefore it remains unclear exactly how the value obtained by GMA should be interpreted when a different metric is used. The dependence of GMA on the metric is analogous to the dependence of HSIC on the kernel (Gretton et al., 2007), and the dependence of the nearest neighbor based estimator of MI on the neighborhood (Kraskov et al., 2004). This is undesirable, but a required step to extend statistical ideas to an arbitrary space, e.g. see (Dabo-Niang and Rhomari, 2009). Therefore, it would be interesting to find a metric free version of GMA,


which would be equivalent to finding a population version of the sample based approach that we have considered.

GMA is a measure of association and not a measure of dependence in the strict sense, since we have not provided any concrete evidence that it is zero if and only if the two random variables are independent. It would be interesting to find a proof for this statement, or a counterexample to invalidate it. Also, we have not established under what circumstances GMA reaches its maximum value. Although we are familiar with a few sufficient conditions, especially in R, it would be interesting to explore the necessary conditions, to enrich our understanding of this method.

We have also only explored a particular approach of capturing generalized association, where we have exploited the relative locations of the (first) nearest neighbors. However, this concept can be materialized in other different ways by exploiting the other successive neighbors (i.e. second, third, etc.), e.g. as proposed in (Friedman and Rafsky, 1983). Such estimators of generalized association are currently under investigation. Finally, we have only explored a limited set of possible applications, while in practice this approach can be applied to other problems, such as the following. Unsupervised and causal variable selection: GMA can be used for this application by simply choosing variables such that the dependence between the selected variables and the original set of variables is maximized. The asymmetry of GMA can also be exploited in conjunction with the variable selection, which takes advantage of the causal structure of the variables. Dimensionality reduction: following similar lines of thought, GMA can also be used for dimensionality reduction or feature selection, both supervised and unsupervised, where the goal is to linearly or nonlinearly combine variables to create new variables that are more informative in the context of an application.

7.4 Conditional Association

We have observed that the proposed approach usually provides less power compared to PC in the context of a Gaussian probability law. Although this is not a drawback of this


method, it is certainly an undesirable property. Therefore, more sophisticated measures of association should be investigated in order to improve the performance of the proposed approach in terms of statistical power.

We have observed that the quality of the surrogate values generated by the proposed method is somewhat poor when the probability law is close to degenerate. This issue should be explored in more detail. Also, the full extent of the effect of the free neighborhood parameter remains to be explored. Next, we have only explored the measure of conditional association in the context of the realizations. However, a corresponding population version of this approach would be very interesting to investigate.

On a final note, we have discussed that the proposed approach can be applied to any metric space. However, we have restricted ourselves to Euclidean space, partly due to simplicity in understanding and also due to the unavailability of surrogate data generation techniques. It would be interesting to apply this approach to more abstract spaces to fully understand its capabilities and limitations.
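The first-nearest-neighbor construction discussed in Section 7.3 can be loosely illustrated as follows. This is my own toy sketch of a neighbor-rank association with the same 0.5 (independence) to 1 (identity) scale as GMA in Table 7-1; it is not the GMA estimator defined in the dissertation.

```python
import numpy as np

def nn_association(X, Y):
    """Rank, in Y-space, of each point's nearest neighbor in X-space,
    rescaled so that ~0.5 means no association and ~1 means X determines Y.
    A loose illustration of first-nearest-neighbor association only."""
    X = np.asarray(X, float).reshape(len(X), -1)
    Y = np.asarray(Y, float).reshape(len(Y), -1)
    n = len(X)
    Dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    Dy = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    np.fill_diagonal(Dx, np.inf)
    j = Dx.argmin(axis=1)                    # X-space nearest neighbor of each point
    thresh = Dy[np.arange(n), j]             # its distance in Y-space
    ranks = (Dy <= thresh[:, None]).sum(axis=1) - 1   # Y-space rank, self excluded
    return 1.0 - ranks.mean() / (n - 1)
```

When Y is a near-deterministic function of X, the X-neighbor is also a Y-neighbor and the value approaches 1; for independent pairs the rank is uniform and the value hovers near 0.5, which is the intuition behind neighbor-based association measures.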


REFERENCES

S. Achard. Asymptotic properties of a dimension-robust quadratic dependence measure. Comptes Rendus Mathematique, 346(3):213-126, 2008.
S. Achard, D. T. Pham, and C. Jutten. Quadratic dependence measure for nonlinear blind sources separation. In Proceedings of the Fourth International Symposium on Independent Component Analysis and Blind Source Separation, pages 263-268, Nara, Japan, 2003.
I. A. Ahmad and Q. Li. Testing independence by nonparametric kernel method. Statistics & Probability Letters, 34(2):201-210, 1997.
N. Ancona, D. Marinazzo, and S. Stramaglia. Radial basis function approaches to nonlinear Granger causality of time series. Physical Review E, 70:829-864, 2004.
D. Aronov. Fast algorithm for the metric-space analysis of simultaneous responses of multiple single neurons. Journal of Neuroscience Methods, 124(2):175-179, 2003. ISSN 0165-0270.
F. R. Bach and M. I. Jordan. Kernel independent component analysis. J. Mach. Learn. Res., 3:1-48, 2002.
C. S. Ballantine. On the Hadamard product. Mathematische Zeitschrift, 105(5):365-366, 2005.
J. R. Blum, J. Kiefer, and M. Rosenblatt. Distribution free tests of independence based on the sample distribution function. Ann. Math. Statist., 32(2):485-498, 1961.
S. Bochner. Hilbert distances and positive definite functions. Annals of Mathematics, 42(3):647-656, July 1941.
G. Bontempi and P. Emmanuel Meyer. Causal filter selection in microarray data. In ICML, pages 95-102, 2010.
T. Bouezmarni, J. V. K. Rombouts, and A. Taamouti. A nonparametric copula based test for conditional independence with applications to Granger causality. CIRANO working papers, CIRANO, 2009.
J. F. Cardoso and A. Souloumiac. Blind beamforming for non-Gaussian signals. Radar and Signal Processing, IEE Proceedings F, 140(6):362-370, Dec 1993.
F. Chapeau-Blondeau. Autocorrelation versus entropy-based autoinformation for measuring dependence in random signal. Physica A: Statistical Mechanics and its Applications, 380:1-18, 2007.
A. Chen. Independent Component Analysis and Blind Signal Separation, chapter Fast Kernel Density Independent Component Analysis, pages 24-31. Springer Berlin / Heidelberg, 2006.


Y. Chen, G. Rangarajan, J. Feng, and M. Ding. Analyzing multiple nonlinear time series with extended Granger causality. Physics Letters A, 324:26, 2004.
T. Chu and C. Glymour. Search for additive nonlinear time series causal models. J. Mach. Learn. Res., 9:967-991, 2008. ISSN 1532-4435.
P. Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287-314, 1994.
F. Comte and O. Lieberman. Second-order noncausality in multivariate GARCH processes. Journal of Time Series Analysis, 21(5):535-557, 2000.
I. Csiszár. Information type measures of difference of probability distributions and indirect observations. Studia Sci. Math. Hungar., 2:299-318, 1967.
S. Dabo-Niang and N. Rhomari. Kernel regression estimation in a Banach space. Journal of Statistical Planning and Inference, 139(4):1421-1434, 2009.
A. P. Dawid. Conditional independence, volume 2, pages 146-155. Wiley-Interscience, 1998.
M. A. Delgado and W. G. Manteiga. Significance testing in nonparametric regression based on the bootstrap. Annals of Statistics, 29(5):1469-1507, 2001.
C. Diks and J. DeGoede. Global Analysis of Dynamical Systems, chapter A general nonparametric bootstrap test for Granger causality, pages 391-403. IoP Publishing, London, 2001.
C. Diks and V. Panchenko. A new statistic and practical guidelines for nonparametric Granger causality testing. Journal of Economic Dynamics and Control, 30(9-10):1647-1669, 2006.
C. G. H. Diks and V. Panchenko. Nonparametric tests for serial independence based on quadratic forms. Statistica Sinica, 17:81-98, 2007.
A. J. Dubbs, B. A. Seiler, and M. O. Magnasco. A fast L^p spike alignment metric. Neural Computation, 22(11):2785-2808, 2010.
R. O. Duda, P. E. Hart, and D. H. Stork. Pattern Classification. Wiley Interscience, 2000.
D. Edwards. Introduction to Graphical Modelling. Springer, 2000.
A. K. Engel, P. Fries, and W. Singer. Dynamic predictions: oscillations and synchrony in top-down processing. Nature Reviews Neuroscience, 2(10):704-16, 2001.
S. Fine and K. Scheinberg. Efficient SVM training using low-rank kernel representations. J. Mach. Learn. Res., 2:243-264, 2001.
J. H. Friedman and L. C. Rafsky. Graph-theoretic measures of multivariate association and prediction. In Annals of Statistics, volume 11(2), pages 377-391, 1983.


N. Friedman and I. Nachman. Gaussian process networks. In UAI'00: Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, pages 211-219, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc.
K. J. Friston. Functional and effective connectivity in neuroimaging: A synthesis. Human Brain Mapping, 2:56-78, 1995.
K. Fukumizu, F. R. Bach, and M. I. Jordan. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. J. Mach. Learn. Res., 5:73-99, 2004. ISSN 1533-7928.
K. Fukumizu, A. Gretton, X. Sun, and B. Schölkopf. Kernel measures of conditional dependence. Advances in Neural Information Processing Systems 20, pages 489-496, 2008.
M. G. Genton. Class of kernels for machine learning: A statistics perspective. J. Mach. Learn. Res., 2:299-312, 2001.
D. Goldberg, J. Victor, E. Gardner, and D. Gardner. Spike train analysis toolkit: Enabling wider application of information-theoretic techniques to neurophysiology. Neuroinformatics, 7:165-178, 2009. ISSN 1539-2791.
C. Granger. Testing for causality: a personal viewpoint. J. of Economic Dynamics and Control, 2:329-352, 1980.
C. Granger and J. L. Lin. Using the mutual information coefficient to identify lags in nonlinear models. Journal of Time Series Analysis, 15(4):371-384, 1991.
C. W. Granger, E. Maasoumi, and J. Racine. A dependence metric for possibly nonlinear processes. Journal of Time Series Analysis, 25(5):649-669, 2004.
A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. J. Smola. A kernel method for the two-sample problem. CoRR, abs/0805.2368, 2008.
A. Gretton, K. Fukumizu, C. H. Teo, L. Song, B. Schölkopf, and A. J. Smola. A kernel statistical test of independence. In Advances in Neural Information Processing Systems 19, pages 585-592, 2007.
A. Gretton, R. Herbrich, A. Smola, O. Bousquet, and B. Schölkopf. Kernel methods for measuring independence. J. Mach. Learn. Res., 6:2075-2129, 2005.
A. Gündüz and Jose C. Príncipe. Correntropy as a novel measure for nonlinearity tests. Signal Process., 89(1):14-23, 2009. ISSN 0165-1684.
I. Guyon and A. Elisseeff. An introduction to variable and feature selection. J. Mach. Learn. Res., 3:1157-1182, 2003.


T. Hsing, L.-Y. Liu, M. Brun, and E. R. Dougherty. The coefficient of intrinsic dependence (feature selection using el CID). Pattern Recognition, 38(5):623-636, 2005. ISSN 0031-3203.
A. Hyvärinen. Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10(3):626-634, 1999.
K. H. Jeong, W. Liu, S. Han, E. Hasanbelliu, and Jose C. Príncipe. The correntropy MACE filter. Pattern Recognition, 42(5):871-885, 2009.
H. Joe. Relative entropy measures of multivariate dependence. Journal of American Statistical Association, 84(405):157-164, 1989.
T. Kanamori, S. Hido, and M. Sugiyama. Efficient direct density ratio estimation for non-stationarity adaptation and outlier detection. In Advances in Neural Information Processing Systems 20, pages 809-816, 2008.
A. Kankainen. Consistent testing of total independence based on empirical characteristic function. PhD thesis, University of Jyväskylä, 1995.
M. G. Kendall. A new measure of rank correlation. Biometrika, 30(1):81-93, 1938.
D. Koller and M. Sahami. Toward optimal feature selection. Technical Report 1996-77, Stanford InfoLab, 1996.
A. Kraskov, H. Stögbauer, and P. Grassberger. Estimating mutual information. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics, 69(6 Pt 2), June 2004.
W. H. Kruskal. Ordinal measures of association. Journal of American Statistical Association, 53(284):814-861, 1958.
E. G. Learned-Miller and J. W. Fisher III. ICA using spacings estimates of entropy. J. Mach. Learn. Res., 4:1271-1295, 2003.
E. L. Lehmann. Some concepts of dependence. Annals of Mathematical Statistics, 37(5):1137-1153, 1966.
E. L. Lehmann and G. W. Casella. Theory of Point Estimation. Springer, 1998.
R. Li, W. Liu, and J. C. Príncipe. A unifying criterion for blind source separation based on correntropy. Signal Processing, 87(8):1872-1881, 2007.
O. Linton and P. Gozalo. Conditional independence restrictions: Testing and estimation. Cowles Foundation Discussion Papers 1140, Cowles Foundation, Yale University, November 1996.
W. Liu, P. P. Pokharel, and J. C. Príncipe. Correntropy: Properties and applications in non-Gaussian signal processing. IEEE Transactions on Signal Processing, 55(11):5286-5298, 2007.


J. H. Macke, P. Berens, A. S. Ecker, A. S. Tolias, and M. Bethge. Generating spike trains with specified correlation coefficients. Neural Comput., 21:397-423, 2009.
D. Drouet Mari and S. Kotz. Correlation and Dependence. Imperial College Press, London, 2001.
D. Marinazzo, M. Pellicoro, and S. Stramaglia. Kernel-Granger causality and the analysis of dynamical networks. Phys. Rev. E, 77(5):056215, May 2008.
A. C. Micheas and K. Zografos. Measuring stochastic dependence using φ-divergence. Journal of Multivariate Analysis, 97(3):765-784, 2006.
J. Mooij, D. Janzing, J. Peters, and B. Schölkopf. Regression by dependence minimization and its application to causal inference in additive noise models. In ICML'09: Proceedings of the 26th Annual International Conference on Machine Learning, pages 745-752, New York, NY, USA, 2009. ACM.
K. S. Narendra. Neural networks for intelligent control. American Control Conference workshop, 1997.
A. Nedungadi, G. Rangarajan, N. Jain, and M. Ding. Analyzing multiple spike trains with nonparametric Granger causality. J. of Comp. Neuro., 27:55-64, 2009.
R. B. Nelsen. An Introduction to Copulas. Springer, 1999.
R. B. Nelsen. Distributions with Given Marginals and Statistical Modelling, chapter Concordance and copulas: A survey, pages 169-177. Dordrecht: Kluwer, 2002.
G. Nolte, A. Ziehe, N. Krämer, F. Popescu, and K.-R. Müller. Comparison of Granger causality and phase slope index. Journal of Machine Learning Research - Proceedings Track, 6:267-276, 2010.
G. Nolte, A. Ziehe, V. V. Nikulin, A. Schlögl, N. Krämer, T. Brismar, and K.-R. Müller. Robustly estimating the flow direction of information in complex physical systems. Phys. Rev. Lett., 100(23):234101, Jun 2008.
E. Paparoditis and D. Politis. The local bootstrap for kernel estimators under general dependence conditions. Annals of the Institute of Statistical Mathematics, 52(1):139-159, 2000.
I. Park and J. C. Príncipe. Correntropy based Granger causality. IEEE International Conference on Acoustics, Speech and Signal Processing, pages 3605-3608, 2008.
E. Parzen. On estimation of a probability density function and mode. Annals of Mathematical Statistics, 33(3):1065-1076, 1962.
J. Pearl. Causality: Models, Reasoning and Inference. Cambridge U. Press, 2000.
K. Pearson. On the probable error of a coefficient of mean square contingency. Biometrika, 10(4):570-573, 1915.


H.Peng,F.Long,andC.Ding.Featureselectionbasedonmutu alinformation:criteriaof max-dependency,max-relevance,andmin-redundancy. IEEETransPatternAnalMach Intell ,27(8):1226{1238,August2005. F.Perez-Cruz.Estimationofinformationtheoreticmeasu resforcontinuousrandom variables.In AdvancesinNeuralInformationProcessingSystems ,pages1257{1264, 2008. J.C.Prncipe. Informationtheoreticlearning:Renyi'sentropyandkern elperspective Springer,2010. M.Rao,S.Seth,J.Xu,Y.Chen,H.Tagare,andJoseC.Prnci pe.Atestofindependence basedonageneralizedcorrelationfunction. SignalProcessing ,91(1):15{27,2011. A.Renyi.Onmeasureofdependence. ActaMathematicaHungarica ,10:441{451,1959. A.Renyi. Probabilitytheory .North-Hollandpublishingcompany,1970. M.RobnikSikonjaandI.Kononenko.Theoreticalandempiricalanalys isofrelieand rrelie. MachineLearning ,53:23{69,October2003.ISSN0885-6125. M.Rosenblatt.Aquadraticmeasureofdeviationoftwo-dime nsionaldensityestimatesand atestofindependence. AnnalsofStatistics ,3-1:1{14,1975. E.B.Royer.Asimplemethodforcalculatingmeansquarecont ingency. Annalsof MathematicalStatistics ,4-1:75{78,1933. I.Santamaria,P.Pokharel,andJ.C.Prncipe.Generalize dcorrelationfunction: Denition,properties,andapplicationtoblindequalizat ion. IEEETransactionson SignalProcessing ,54(6):2187{2197,2006. B.ScholkopfandA.Smola. Learningwithkernels .MITPress,Cambridge,MA,2002. T.Schreiber.Measuringinformationtransfer. PhysicalReviewLetters ,85(2):461{464, 2000. B.SchweizerandE.F.Wol.Onnonparametricmeasuresofdep endenceforrandom variables. AnnalsofStatistics ,9-4:879{885,1981. A.K.Seth.AmatlabtoolboxforGrangercausalconnectivity analysis. Journalof NeuroscienceMethods ,2009. S.Seth,I.Park,A.Brockmeier,M.Semework,J.Choi,J.Fran cis,andJ.C.Prncipe.A novelfamilyofnon-parametriccumulativebaseddivergenc esforpointprocesses.In J.Laerty,C.K.I.Williams,J.Shawe-Taylor,R.S.Zemel,a ndA.Culotta,editors, AdvancesinNeuralInformationProcessingSystems23 ,pages2119{2127.2010. 
S. Seth and J. C. Príncipe. On speeding up computation in information theoretic learning. In Proceedings of the International Joint Conference on Neural Networks, 2009.


S. Seth and J. C. Príncipe. A conditional distribution function based approach to design nonparametric tests of independence and conditional independence. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 2066–2069, 2010a.

S. Seth and J. C. Príncipe. A test of Granger non-causality based on nonparametric conditional independence. In International Conference on Pattern Recognition, 2010b.

S. Seth and J. C. Príncipe. Conditional association. Neural Computation (submitted), 2011a.

S. Seth and J. C. Príncipe. Generalized association. IEEE Transactions on Pattern Analysis and Machine Intelligence (submitted), 2011b.

S. Seth and J. C. Príncipe. A test of Granger non-causality based on a nonparametric measure of conditional independence. IEEE Transactions on Neural Networks (submitted), 2011c.

S. Seth, M. Rao, I. Park, and J. C. Príncipe. A unified framework for quadratic measures of independence. IEEE Transactions on Signal Processing (in press), 2011.

C. E. Shannon and W. Weaver. The Mathematical Theory of Communication. University of Illinois Press, Urbana, Illinois, 1949.

H. Shen, S. Jegelka, and A. Gretton. Fast kernel-based independent component analysis. IEEE Transactions on Signal Processing, 57(9):3498–3511, May 2009.

L. Song, J. Huang, A. Smola, and K. Fukumizu. Hilbert space embeddings of conditional distributions with applications to dynamical systems. In ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, pages 961–968, New York, NY, USA, 2009. ACM.

L. Song, A. Smola, A. Gretton, and K. M. Borgwardt. A dependence maximization view of clustering. In ICML '07: Proceedings of the 24th International Conference on Machine Learning, pages 815–822, New York, NY, USA, 2007a. ACM.

L. Song, A. Smola, A. Gretton, K. M. Borgwardt, and J. Bedo. Supervised feature selection via dependence estimation. In ICML '07: Proceedings of the 24th International Conference on Machine Learning, pages 823–830, New York, NY, USA, 2007b. ACM.

C. Spearman. The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101, 1904.

P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search, 2nd Edition, volume 1. The MIT Press, 2001.


B. K. Sriperumbudur, K. Fukumizu, A. Gretton, G. R. G. Lanckriet, and B. Schölkopf. Kernel choice and classifiability for RKHS embeddings of probability distributions. In Y. Bengio, D. Schuurmans, J. Lafferty, C. Williams, and A. Culotta, editors, Twenty-Third Annual Conference on Neural Information Processing Systems, pages 1750–1758, 2010.

J. F. Steffensen. On certain measures of dependence between statistical variables. Biometrika, 26(1):251–255, 1934.

L. Su and H. White. A consistent characteristic function-based test for conditional independence. Journal of Econometrics, 141(2):807–834, 2007.

L. Su and H. White. A nonparametric Hellinger metric test for conditional independence. Econometric Theory, 24:829–864, 2008.

X. Sun. Assessing nonlinear Granger causality from multivariate time series. In ECML PKDD '08: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases - Part II, pages 440–455, 2008.

X. Sun, D. Janzing, B. Schölkopf, and K. Fukumizu. A kernel-based causal learning algorithm. In Proceedings of the 24th International Conference on Machine Learning, pages 855–862, 2007.

T. Suzuki and M. Sugiyama. Least-squares independent component analysis. Neural Computation, 23(1):284–301, 2011.

I. Takeuchi, Q. V. Le, T. D. Sears, and A. J. Smola. Nonparametric quantile estimation. Journal of Machine Learning Research, 7:1231–1264, 2006.

M. C. W. van Rossum. A novel spike distance. Neural Computation, 13:751–763, 2001.

J. D. Victor. Binless strategies for estimation of information from neural data. Physical Review E, 66(5):051903, November 2002.

J. D. Victor and K. P. Purpura. Metric-space analysis of spike trains: theory, algorithms and application. Network: Computation in Neural Systems, 8(2):127–164, 1997.

W. Whitt. Bivariate distributions with given marginals. Annals of Statistics, 4(6):1280–1289, 1976.

N. Wiener. The theory of prediction. In Modern Mathematics for Engineers. McGraw-Hill, New York, 1956.

J. W. Xu, A. R. C. Paiva, I. Park, and J. C. Príncipe. A reproducing kernel Hilbert space framework for information-theoretic learning. IEEE Transactions on Signal Processing, 56(12):5891–5902, 2008a.

J. W. Xu, H. Bakardjian, A. Cichocki, and J. C. Príncipe. A new nonlinear similarity measure for multichannel signals. Neural Networks, 21:222–231, 2008b.


X. T. Yuan and B. G. Hu. Robust feature extraction via information theoretic learning. In ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, pages 1193–1200, 2009.

C. Zou and J. Feng. Granger causality vs. dynamic Bayesian network inference: a comparative study. BMC Bioinformatics, 10(1):122, 2009.


BIOGRAPHICAL SKETCH

Sohan Seth received his Bachelor of Engineering degree in instrumentation and electronics engineering in 2005 from Jadavpur University, Kolkata, India, and his Master of Science degree in electrical and computer engineering in 2008 from the University of Florida, Gainesville, Florida. His research interests lie in the fields of machine learning, data mining, and computational neuroscience.