University Press of Florida

From Algorithms to Z-Scores: Probabilistic and Statistical Modeling in Computer Science


Material Information

Title:
From Algorithms to Z-Scores: Probabilistic and Statistical Modeling in Computer Science
Physical Description:
Book
Language:
en-US
Creator:
Matloff, Norm

Subjects

Subjects / Keywords:
discrete probability models, continuous probability models, multivariate probability models, introduction to statistical inference, introduction to model building, Markov chains, statistical relations between variables, introduction to queuing models, renewal theory and some applications, OGT+ isbn: 9781616100360
Computer Science
Educational Technology / Technology

Notes

Abstract:
The materials here form a textbook for a course in mathematical probability and statistics for computer science students. Computer science examples are used throughout, in areas such as: computer networks; data and text mining; computer security; remote sensing; computer performance evaluation; software engineering; data management. The R statistical/data manipulation language is used throughout. Since this is a computer science audience, a greater sophistication in programming can be assumed. The student must know calculus, basic matrix algebra, and have skill in programming. The author recommends that his R tutorial, "R for Programmers," be used as a supplement. The tutorial is available at: http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf Throughout the units, mathematical theory and applications are interwoven, with a strong emphasis on modeling: What do probabilistic models really mean, in real-life terms? How does one choose a model? How do we assess the practical usefulness of models? There is considerable discussion of the intuition involving probabilistic concepts. However, all models and so on are described precisely in terms of random variables and distributions. The book is available for download from: http://heather.cs.ucdavis.edu/~matloff/132/PLN/ProbStatBook.pdf
General Note:
Expositive
General Note:
Community College, Higher Education
General Note:
http://www.ogtp-cart.com/product.aspx?ISBN=9781616100360
General Note:
Adobe Reader
General Note:
Diagram, Graph, Narrative text, Textbook
General Note:
http://florida.theorangegrove.org/og/file/2e13f939-b18f-38f7-3719-2d4257f2690c/1/ProbStatBook.pdf

Record Information

Source Institution:
University of Florida
Holding Location:
University Press of Florida
Rights Management:
This work is licensed under the Attribution-No Derivative Works 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/3.0/us/. Copyright is retained by N. Matloff in all non-U.S. jurisdictions, but permission to use these materials in teaching is still granted, provided the authorship and licensing information …
Resource Identifier:
isbn - 9781616100360
System ID:
AA00011705:00001


Full Text

PAGE 1

From Algorithms to Z-Scores:
Probabilistic and Statistical Modeling in Computer Science

Norm Matloff
University of California, Davis

$f_X(t) = c e^{-0.5 (t-\mu)' \Sigma^{-1} (t-\mu)}$

    library(MASS)
    x <- mvrnorm(mu,sgm)

See Creative Commons license at http://heather.cs.ucdavis.edu/~matloff/probstatbook.html

PAGE 2


PAGE 3

Contents

1 Discrete Probability Models ..... 1
  1.1 ALOHA Network Example ..... 1
  1.2 Basic Ideas of Probability ..... 2
    1.2.1 The Crucial Notion of a Repeatable Experiment ..... 2
    1.2.2 Our Definitions ..... 4
    1.2.3 Basic Probability Computations: ALOHA Network Example ..... 6
    1.2.4 Bayes' Theorem ..... 8
    1.2.5 ALOHA in the Notebook Context ..... 9
    1.2.6 Simulation ..... 10
      1.2.6.1 Simulation of the ALOHA Example ..... 10
      1.2.6.2 Rolling Dice ..... 11
    1.2.7 Combinatorics-Based Probability Computation ..... 12
      1.2.7.1 Which Is More Likely in Five Cards, One King or Two Hearts? ..... 12
      1.2.7.2 Association Rules in Data Mining ..... 13
  1.3 Discrete Random Variables ..... 14
  1.4 Independence, Expected Value and Variance ..... 14
    1.4.1 Independent Random Variables ..... 14
    1.4.2 Expected Value ..... 15
      1.4.2.1 Intuitive Definition ..... 15
      1.4.2.2 Computation and Properties of Expected Value ..... 15
      1.4.2.3 Casinos, Insurance Companies and Sum Users, Compared to Others ..... 18
    1.4.3 Variance ..... 19

PAGE 4

    1.4.4 Is a Variance of X Large or Small? ..... 20
    1.4.5 Chebychev's Inequality ..... 21
    1.4.6 The Coefficient of Variation ..... 21
    1.4.7 Covariance ..... 22
    1.4.8 A Combinatorial Example ..... 22
    1.4.9 Expected Value, Etc. in the ALOHA Example ..... 23
    1.4.10 Reconciliation of Math and Intuition (optional section) ..... 24
  1.5 Distributions ..... 24
    1.5.1 Basic Notions ..... 24
    1.5.2 Parametric Families of pmfs ..... 25
      1.5.2.1 The Geometric Family of Distributions ..... 25
      1.5.2.2 The Binomial Family of Distributions ..... 26
      1.5.2.3 The Poisson Family of Distributions ..... 27
      1.5.2.4 The Negative Binomial Family of Distributions ..... 27
      1.5.2.5 The Power Law Family of Distributions ..... 29
  1.6 Recognizing Distributions When You See Them ..... 29
    1.6.1 A Coin Game ..... 29
    1.6.2 Tossing a Set of Four Coins ..... 30
    1.6.3 The ALOHA Example Again ..... 31
  1.7 A Cautionary Tale ..... 31
    1.7.1 Trick Coins, Tricky Example ..... 31
    1.7.2 Intuition in Retrospect ..... 32
    1.7.3 Implications for Modeling ..... 33
  1.8 Why Not Just Do All Analysis by Simulation? ..... 33
  1.9 Tips on Finding Probabilities, Expected Values and So On ..... 33
2 Continuous Probability Models ..... 37
  2.1 A Random Dart ..... 37
  2.2 Density Functions ..... 40
    2.2.1 Motivation, Definition and Interpretation ..... 40

PAGE 5

    2.2.2 Use of Densities to Find Probabilities and Expected Values ..... 42
  2.3 Famous Parametric Families of Continuous Distributions ..... 43
    2.3.1 The Uniform Distributions ..... 43
      2.3.1.1 Density and Properties ..... 43
      2.3.1.2 Example: Modeling of Disk Performance ..... 43
      2.3.1.3 Example: Modeling of Denial-of-Service Attack ..... 43
    2.3.2 The Normal (Gaussian) Family of Continuous Distributions ..... 44
      2.3.2.1 Density and Properties ..... 44
      2.3.2.2 Example: Network Intrusion ..... 45
      2.3.2.3 The Central Limit Theorem ..... 46
      2.3.2.4 Example: Coin Tosses ..... 46
      2.3.2.5 Museum Demonstration ..... 47
      2.3.2.6 Optional topic: Formal Statement of the CLT ..... 47
      2.3.2.7 Importance in Modeling ..... 48
    2.3.3 The Chi-Square Family of Distributions ..... 48
      2.3.3.1 Density and Properties ..... 48
      2.3.3.2 Importance in Modeling ..... 48
    2.3.4 The Exponential Family of Distributions ..... 48
      2.3.4.1 Density and Properties ..... 48
      2.3.4.2 Connection to the Poisson Distribution Family ..... 49
      2.3.4.3 Importance in Modeling ..... 49
    2.3.5 The Gamma Family of Distributions ..... 49
      2.3.5.1 Density and Properties ..... 49
      2.3.5.2 Example: Network Buffer ..... 50
      2.3.5.3 Importance in Modeling ..... 51
  2.4 Describing Failure ..... 51
    2.4.1 Memoryless Property ..... 51
    2.4.2 Hazard Functions ..... 54
      2.4.2.1 Basic Concepts ..... 54
    2.4.3 Example: Software Reliability Models ..... 55

PAGE 6

  2.5 A Cautionary Tale: the Bus Paradox ..... 55
  2.6 Choosing a Model ..... 57
  2.7 A General Method for Simulating a Random Variable ..... 57
3 Multivariate Probability Models ..... 61
  3.1 Multivariate Distributions ..... 61
    3.1.1 Why Are They Needed? ..... 61
    3.1.2 Discrete Case ..... 61
    3.1.3 Multivariate Densities ..... 63
      3.1.3.1 Motivation and Definition ..... 63
      3.1.3.2 Use of Multivariate Densities in Finding Probabilities and Expected Values ..... 63
      3.1.3.3 Example: a Triangular Distribution ..... 64
  3.2 More on Co-variation of Random Variables ..... 66
    3.2.1 Covariance ..... 66
    3.2.2 Correlation ..... 67
    3.2.3 Example: Continuation of Section 3.1.3.3 ..... 67
    3.2.4 Example: a Catchup Game ..... 68
  3.3 Sets of Independent Random Variables ..... 69
    3.3.1 Properties ..... 69
      3.3.1.1 Probability Mass Functions and Densities Factor ..... 69
      3.3.1.2 Expected Values Factor ..... 70
      3.3.1.3 Covariance Is 0 ..... 70
      3.3.1.4 Variances Add ..... 71
      3.3.1.5 Convolution ..... 71
    3.3.2 Examples ..... 72
      3.3.2.1 Example: Dice ..... 72
      3.3.2.2 Example: Ethernet ..... 72
      3.3.2.3 Example: Analysis of Seek Time ..... 73
      3.3.2.4 Example: Backup Battery ..... 73
  3.4 Matrix Formulations ..... 74

PAGE 7

    3.4.1 Properties of Mean Vectors ..... 74
    3.4.2 Properties of Covariance Matrices ..... 74
  3.5 Conditional Distributions ..... 75
    3.5.1 Conditional Pmfs and Densities ..... 75
    3.5.2 Conditional Expectation ..... 75
    3.5.3 The Law of Total Expectation (advanced topic) ..... 76
      3.5.3.1 Expected Value As a Random Variable ..... 76
      3.5.3.2 The Famous Formula (Theorem of Total Expectation) ..... 76
    3.5.4 What About the Variance? ..... 77
    3.5.5 Example: Trapped Miner ..... 77
    3.5.6 Example: Analysis of Hash Tables ..... 78
  3.6 Parametric Families of Distributions ..... 80
    3.6.1 The Multinomial Family of Distributions ..... 80
      3.6.1.1 Probability Mass Function ..... 80
      3.6.1.2 Means and Covariances ..... 81
      3.6.1.3 Application: Text Mining ..... 82
    3.6.2 The Multivariate Normal Family of Distributions ..... 83
      3.6.2.1 Densities and Properties ..... 83
      3.6.2.2 The Multivariate Central Limit Theorem ..... 86
      3.6.2.3 Example: Dice Game ..... 86
      3.6.2.4 Application: Data Mining ..... 88
  3.7 Simulation of Random Vectors ..... 88
  3.8 Transform Methods (advanced topic) ..... 89
      3.8.0.5 Generating Functions ..... 89
      3.8.0.6 Moment Generating Functions ..... 90
    3.8.1 Example: Network Packets ..... 91
      3.8.1.1 Poisson Generating Function ..... 91
      3.8.1.2 Sums of Independent Poisson Random Variables Are Poisson Distributed ..... 91
      3.8.1.3 Random Number of Bits in Packets on One Link (advanced topic) ..... 92
    3.8.2 Other Uses of Transforms ..... 93

PAGE 8

  3.9 Vector Space Interpretations (for the mathematically adventurous only) ..... 93
    3.9.1 Properties of Correlation ..... 93
    3.9.2 Conditional Expectation As a Projection ..... 94
  3.10 Proof of the Law of Total Expectation ..... 95
4 Introduction to Statistical Inference ..... 99
  4.1 What Statistics Is All About ..... 99
  4.2 Introduction to Confidence Intervals ..... 99
    4.2.1 How Long Should We Run a Simulation? ..... 99
    4.2.2 Confidence Intervals for Means ..... 100
      4.2.2.1 Sampling Distributions ..... 100
      4.2.2.2 Our First Confidence Interval ..... 102
    4.2.3 Meaning of Confidence Intervals ..... 104
      4.2.3.1 A Weight Survey in Davis ..... 104
      4.2.3.2 Back to Our Bus Simulation ..... 105
      4.2.3.3 One More Point About Interpretation ..... 106
    4.2.4 Sampling With and Without Replacement ..... 107
    4.2.5 Other Confidence Levels ..... 107
    4.2.6 The Standard Error of the Estimate ..... 107
    4.2.7 Why Not Divide by n-1? The Notion of Bias ..... 108
    4.2.8 And What About the Student-t Distribution? ..... 109
    4.2.9 Confidence Intervals for Proportions ..... 110
      4.2.9.1 Derivation ..... 110
      4.2.9.2 Examples ..... 111
      4.2.9.3 Interpretation ..... 112
      4.2.9.4 Non-Effect of the Population Size ..... 112
      4.2.9.5 Planning Ahead ..... 112
    4.2.10 One-Sided Confidence Intervals ..... 113
    4.2.11 Confidence Intervals for Differences of Means or Proportions ..... 113
      4.2.11.1 Independent Samples ..... 113

PAGE 9

      4.2.11.2 Random Sample Size ..... 115
      4.2.11.3 Dependent Samples ..... 115
    4.2.12 Example: Machine Classification of Forest Covers ..... 116
    4.2.13 Exact Confidence Intervals ..... 117
    4.2.14 Slutsky's Theorem (advanced topic) ..... 117
      4.2.14.1 The Theorem ..... 118
      4.2.14.2 Why It's Valid to Substitute s for σ ..... 118
      4.2.14.3 Example: Confidence Interval for a Ratio Estimator ..... 119
    4.2.15 The Delta Method: Confidence Intervals for General Functions of Means or Proportions (advanced topic) ..... 119
      4.2.15.1 The Theorem ..... 119
      4.2.15.2 Example: Square Root Transformation ..... 120
      4.2.15.3 Example: Confidence Interval for σ² ..... 121
    4.2.16 Simultaneous Confidence Intervals ..... 123
      4.2.16.1 The Bonferroni Method ..... 124
      4.2.16.2 Scheffe's Method (advanced topic) ..... 125
      4.2.16.3 Example ..... 126
      4.2.16.4 Other Methods for Simultaneous Inference ..... 126
    4.2.17 The Bootstrap Method for Forming Confidence Intervals (advanced topic) ..... 126
  4.3 Hypothesis Testing ..... 126
    4.3.1 The Basics ..... 126
    4.3.2 General Testing Based on Normally Distributed Estimators ..... 127
    4.3.3 Example: Network Security ..... 128
    4.3.4 The Notion of p-Values ..... 128
    4.3.5 What's Random and What Is Not ..... 128
    4.3.6 One-Sided H_A ..... 129
    4.3.7 Exact Tests ..... 129
    4.3.8 What's Wrong with Hypothesis Testing ..... 131
    4.3.9 What to Do Instead ..... 131
    4.3.10 Decide on the Basis of the Preponderance of Evidence ..... 132

PAGE 10

  4.4 General Methods of Estimation ..... 132
    4.4.1 Example: Guessing the Number of Raffle Tickets Sold ..... 133
    4.4.2 Method of Moments ..... 133
    4.4.3 Method of Maximum Likelihood ..... 134
    4.4.4 Example: Estimating the Parameters of a Gamma Distribution ..... 135
      4.4.4.1 Method of Moments ..... 135
      4.4.4.2 MLEs ..... 136
    4.4.5 More Examples ..... 136
    4.4.6 What About Confidence Intervals? ..... 138
    4.4.7 Bayesian Methods (advanced topic) ..... 138
    4.4.8 The Empirical cdf ..... 139
  4.5 Real Populations and Conceptual Populations ..... 140
  4.6 Nonparametric Density Estimation ..... 141
    4.6.1 Basic Ideas ..... 141
    4.6.2 Histograms ..... 142
    4.6.3 Kernel-Based Density Estimation (advanced topic) ..... 144
    4.6.4 Proper Use of Density Estimates ..... 145
5 Introduction to Model Building ..... 149
  5.1 Bias Vs. Variance ..... 149
  5.2 Desperate for Data ..... 150
    5.2.1 Mathematical Formulation of the Problem ..... 150
    5.2.2 Bias and Variance of the Two Predictors ..... 151
    5.2.3 Implications ..... 151
  5.3 Assessing Goodness of Fit of a Model ..... 153
    5.3.1 The Chi-Square Goodness of Fit Test ..... 153
    5.3.2 Kolmogorov-Smirnov Confidence Bands ..... 154
  5.4 Bias Vs. Variance Again ..... 155
  5.5 Robustness ..... 155
6 Statistical Relations Between Variables ..... 157

PAGE 11

  6.1 The Goals: Prediction and Understanding ..... 157
  6.2 Example Applications: Software Engineering, Networks, Text Mining ..... 157
  6.3 Regression Analysis ..... 158
    6.3.1 What Does Relationship Really Mean? ..... 158
    6.3.2 Multiple Regression: More Than One Predictor Variable ..... 159
    6.3.3 Interaction Terms ..... 160
    6.3.4 Nonrandom Predictor Variables ..... 160
    6.3.5 Prediction ..... 163
    6.3.6 Optimality of the Regression Function ..... 164
    6.3.7 Parametric Estimation of Linear Regression Functions ..... 165
      6.3.7.1 Meaning of Linear ..... 165
      6.3.7.2 Point Estimates and Matrix Formulation ..... 166
      6.3.7.3 Back to Our ALOHA Example ..... 167
      6.3.7.4 Approximate Confidence Intervals ..... 169
      6.3.7.5 Once Again, Our ALOHA Example ..... 171
      6.3.7.6 Estimation Vs. Prediction ..... 172
      6.3.7.7 Exact Confidence Intervals ..... 172
    6.3.8 The Famous Error Term (advanced topic) ..... 172
    6.3.9 Model Selection ..... 173
      6.3.9.1 The Overfitting Problem in Regression ..... 173
      6.3.9.2 Methods for Predictor Variable Selection ..... 174
    6.3.10 Nonlinear Parametric Regression Models ..... 175
    6.3.11 Nonparametric Estimation of Regression Functions ..... 176
    6.3.12 Regression Diagnostics ..... 177
    6.3.13 Nominal Variables ..... 177
    6.3.14 The Case in Which All Predictors Are Nominal Variables: Analysis of Variance ..... 177
      6.3.14.1 It's a Regression! ..... 178
      6.3.14.2 Interaction Terms ..... 178
      6.3.14.3 Now Consider Parsimony ..... 179
      6.3.14.4 Reparameterization ..... 180

PAGE 12

  6.4 The Classification Problem ..... 181
    6.4.1 Meaning of the Regression Function ..... 181
      6.4.1.1 The Mean Here Is a Probability ..... 181
      6.4.1.2 Optimality of the Regression Function ..... 181
    6.4.2 Parametric Models for the Regression Function in Classification Problems ..... 182
      6.4.2.1 The Logistic Model: Form ..... 182
      6.4.2.2 The Logistic Model: Intuitive Motivation ..... 183
      6.4.2.3 The Logistic Model: Theoretical Foundation ..... 183
    6.4.3 Nonparametric Estimation of Regression Functions for Classification (advanced topic) ..... 184
      6.4.3.1 Use the Kernel Method, CART, Etc. ..... 184
      6.4.3.2 SVMs ..... 184
    6.4.4 Variable Selection in Classification Problems ..... 185
      6.4.4.1 Problems Inherited from the Regression Context ..... 185
      6.4.4.2 Example: Forest Cover Data ..... 185
    6.4.5 Y Must Have a Marginal Distribution! ..... 186
  6.5 Principal Components Analysis ..... 187
    6.5.1 Dimension Reduction and the Principle of Parsimony ..... 187
    6.5.2 How to Calculate Them ..... 188
    6.5.3 Example: Forest Cover Data ..... 189
  6.6 Log-Linear Models ..... 189
    6.6.1 The Setting ..... 189
    6.6.2 The Data ..... 190
    6.6.3 The Models ..... 191
    6.6.4 Parameter Estimation ..... 192
    6.6.5 The Goal: Parsimony Again ..... 192
  6.7 Simpson's Non-Paradox ..... 193
7 Markov Chains ..... 197
  7.1 Discrete-Time Markov Chains ..... 197
    7.1.1 Example: Finite Random Walk ..... 197

PAGE 13

    7.1.2 Long-Run Distribution ..... 198
      7.1.2.1 Periodic Chains ..... 200
      7.1.2.2 The Meaning of the Term Stationary Distribution ..... 200
    7.1.3 Example: Stuck-At 0 Fault ..... 200
      7.1.3.1 Description ..... 200
      7.1.3.2 Initial Analysis ..... 201
      7.1.3.3 Going Beyond Finding π ..... 202
    7.1.4 Example: Shared-Memory Multiprocessor ..... 204
      7.1.4.1 The Model ..... 204
      7.1.4.2 Going Beyond Finding π ..... 205
    7.1.5 Example: Slotted ALOHA ..... 206
      7.1.5.1 Going Beyond Finding π ..... 207
  7.2 Hidden Markov Models ..... 209
  7.3 Continuous-Time Markov Chains ..... 210
    7.3.1 Holding-Time Distribution ..... 210
    7.3.2 The Notion of Rates ..... 211
    7.3.3 Stationary Distribution ..... 211
    7.3.4 Minima of Independent Exponentially Distributed Random Variables ..... 213
    7.3.5 Example: Machine Repair ..... 213
    7.3.6 Continuous-Time Birth/Death Processes ..... 215
    7.3.7 Example: Computer Worm ..... 216
  7.4 Hitting Times Etc. ..... 217
    7.4.1 Some Mathematical Conditions ..... 217
    7.4.2 Example: Random Walks ..... 217
    7.4.3 Finding Hitting and Recurrence Times ..... 218
    7.4.4 Example: Finite Random Walk ..... 219
    7.4.5 Example: Tree-Searching ..... 220
8 Introduction to Queuing Models ..... 223
  8.1 Introduction ..... 223

PAGE 14

  8.2 M/M/1 ..... 223
    8.2.1 Steady-State Probabilities ..... 224
    8.2.2 Mean Queue Length ..... 224
    8.2.3 Distribution of Residence Time/Little's Rule ..... 225
  8.3 Multi-Server Models ..... 227
  8.4 Loss Models ..... 227
  8.5 Nonexponential Service Times ..... 229
  8.6 Reversed Markov Chains ..... 230
    8.6.1 Markov Property ..... 231
    8.6.2 Long-Run State Proportions ..... 231
    8.6.3 Form of the Transition Rates of the Reversed Chain ..... 231
    8.6.4 Reversible Markov Chains ..... 232
      8.6.4.1 Conditions for Checking Reversibility ..... 232
      8.6.4.2 Making New Reversible Chains from Old Ones ..... 233
      8.6.4.3 Example: Queues with a Common Waiting Area ..... 233
      8.6.4.4 Closed-Form Expression for π for Any Reversible Markov Chain ..... 234
  8.7 Networks of Queues ..... 235
    8.7.1 Tandem Queues ..... 235
    8.7.2 Jackson Networks ..... 236
      8.7.2.1 Open Networks ..... 236
    8.7.3 Closed Networks ..... 237
9 Renewal Theory and Some Applications ..... 239
  9.1 Introduction ..... 239
    9.1.1 The Light Bulb Example, Generalized ..... 239
    9.1.2 Duality Between Lifetime Domain and Counts Domain ..... 239
  9.2 Where We Are Going ..... 240
  9.3 Properties of Poisson Processes ..... 240
    9.3.1 Definition ..... 240
    9.3.2 Alternate Characterizations of Poisson Processes ..... 240

PAGE 15

      9.3.2.1 Exponential Interrenewal Times ..... 240
      9.3.2.2 Stationary, Independent Increments ..... 241
    9.3.3 Conditional Distribution of Renewal Times ..... 242
    9.3.4 Decomposition and Superposition of Poisson Processes ..... 243
    9.3.5 Nonhomogeneous Poisson Processes ..... 243
      9.3.5.1 Example: Software Reliability ..... 244
  9.4 Properties of General Renewal Processes ..... 244
    9.4.1 The Regenerative Nature of Renewal Processes ..... 244
    9.4.2 Some of the Main Theorems ..... 244
      9.4.2.1 The Functions F_n Sum to m ..... 244
      9.4.2.2 The Renewal Equation ..... 246
      9.4.2.3 The Function m(t) Uniquely Determines F(t) ..... 246
      9.4.2.4 Asymptotic Behavior of m(t) ..... 248
  9.5 Alternating Renewal Processes ..... 248
    9.5.1 Definition and Main Result ..... 248
    9.5.2 Example: Inventory Problem (difficult) ..... 249
  9.6 Residual-Life Distribution ..... 250
    9.6.1 Residual-Life Distribution ..... 250
    9.6.2 Age Distribution ..... 251
    9.6.3 Mean of the Residual and Age Distributions ..... 253
    9.6.4 Example: Estimating Web Page Modification Rates ..... 253
    9.6.5 Example: The (S,s) Inventory Model Again ..... 253
    9.6.6 Example: Disk File Model ..... 253
    9.6.7 Example: Event Sets in Discrete Event Simulation (difficult) ..... 254
    9.6.8 Example: Memory Paging Model ..... 256

PAGE 16


PAGE 17

Preface

Why is this book different from all other books on probability and statistics?

First, the book stresses computer science applications. Though other books of this nature have been published, notably the outstanding text by K.S. Trivedi, this book has much more coverage of statistics, including a full chapter titled "Statistical Relations Between Variables." This should prove especially helpful as machine learning and data mining play a greater role in computer science.

Second, there is a strong emphasis on modeling: What do probabilistic models really mean, in real-life terms? How does one choose a model? How do we assess the practical usefulness of models? This aspect is so important that there is a separate chapter for this as well, titled "Introduction to Model Building." Throughout the text, there is considerable discussion of the intuition involving probabilistic concepts. For instance, when probability density functions are introduced, there is an extended discussion regarding the intuitive meaning of densities and their relation to the inherently discrete nature of real data due to the finite precision of measurement. However, all models and so on are described precisely in terms of random variables and distributions.

Finally, the R statistical/data manipulation language is used throughout. Again, several excellent texts on probability and statistics have been written that feature R, but this book, by virtue of having a computer science audience, uses R in a more sophisticated manner. It is recommended that my online tutorial on R programming, R for Programmers (http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf), be used as a supplement.

As prerequisites, the student must know calculus, basic matrix algebra, and have skill in programming. As with any text in probability and statistics, it is also extremely helpful if the student has a good sense of math intuition, and does not treat mathematics as simply memorization of formulas.

A note regarding the chapters on statistics: It is crucial that students apply the concepts in thought-provoking exercises on real data. Nowadays there are many good sources for real data sets available. Here are a few to get you started:

• UC Irvine Machine Learning Repository, http://archive.ics.uci.edu/ml/datasets.html
• UCLA Statistics Dept. data sets, http://www.stat.ucla.edu/data/
• Dr. B's Wide World of Web Data, http://research.ed.asu.edu/multimedia/DrB/Default.htm
• StatSci.org, at http://www.statsci.org/datasets.html

PAGE 18

• University of Edinburgh School of Informatics, http://www.inf.ed.ac.uk/teaching/courses/dme/html/datasets0405.html

Note that R has the capability of reading files on the Web, e.g.

    > z <- read.table("http://heather.cs.ucdavis.edu/~matloff/z")

This work is licensed under a Creative Commons Attribution-No Derivative Works 3.0 United States License. The details may be viewed at http://creativecommons.org/licenses/by-nd/3.0/us/, but in essence it states that you are free to use, copy and distribute the work, but you must attribute the work to me and not alter, transform, or build upon it. If you are using the book, either in teaching a class or for your own learning, I would appreciate your informing me. I retain copyright in all non-U.S. jurisdictions, but permission to use these materials in teaching is still granted, provided the licensing information here is displayed.

PAGE 19

Chapter 1

Discrete Probability Models

1.1 ALOHA Network Example

Throughout this book, we will be discussing both classical probability examples involving coins, cards and dice, and also examples involving applications to computer science. The latter will involve diverse fields such as data mining, machine learning, computer networks, software engineering and bioinformatics.

In this section, an example from computer networks is presented which will be used at a number of points in this chapter. Probability analysis is used extensively in the development of new, faster types of networks.

Today's Ethernet evolved from an experimental network developed at the University of Hawaii, called ALOHA. A number of network nodes would occasionally try to use the same radio channel to communicate with a central computer. The nodes couldn't hear each other, due to the obstruction of mountains between them. If only one of them made an attempt to send, it would be successful, and it would receive an acknowledgement message in response from the central computer. But if more than one node were to transmit, a collision would occur, garbling all the messages. The sending nodes would time out after waiting for an acknowledgement which never came, and try sending again later. To avoid having too many collisions, nodes would engage in random backoff, meaning that they would refrain from sending for a while even though they had something to send.

One variation is slotted ALOHA, which divides time into intervals which I will call epochs. Each epoch will have duration 1.0, so epoch 1 extends from time 0.0 to 1.0, epoch 2 extends from 1.0 to 2.0 and so on. In the version we will consider here, in each epoch, if a node is active, i.e. has a message to send, it will either send or refrain from sending, with probability p and 1-p. The value of p is set by the designer of the network. Real Ethernet hardware does something like this, using a random number generator inside the chip.

The other parameter q in our model is the probability that a node which had been inactive generates a message during an epoch, and thus becomes active. Think of what happens when you are at a computer. You are not typing constantly, and when you are not typing, the time until you hit a key again will be random. Our parameter q models that randomness.

Let n be the number of nodes, which we'll assume for simplicity is two. Assume also for simplicity that the timing is as follows. Arrival of a new message happens in the middle of an epoch, and the decision as to

PAGE 20

whether to send versus back off is made near the end of an epoch, say 90% into the epoch.

For example, say that at the beginning of the epoch which extends from time 15.0 to 16.0, node A has something to send but node B does not. At time 15.5, node B will either generate a message to send or not, with probability q and 1-q, respectively. Suppose B does generate a new message. At time 15.9, node A will either try to send or refrain, with probability p and 1-p, and node B will do the same. Suppose A refrains but B sends. Then B's transmission will be successful, and at time 16.0, the start of the next epoch, B will be inactive, while node A will still be active. On the other hand, suppose both A and B try to send at time 15.9; both will fail, and thus both will be active at time 16.0, and so on.

Be sure to keep in mind that in our simple model here, during the time a node is active, it won't generate any additional new messages.

Let's observe the network for two epochs, epoch 1 and epoch 2. Assume that the network consists of just two nodes, called node 1 and node 2, both of which start out active. Let $X_1$ and $X_2$ denote the numbers of active nodes at the very end of epochs 1 and 2, after possible transmissions. We'll take p to be 0.4 and q to be 0.8 in this example.

Let's find $P(X_1 = 2)$, the probability that $X_1 = 2$, and then get to the main point, which is to ask what we really mean by this probability.
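The epoch mechanics described so far can be sketched in R. This is only an illustration under the timing assumptions stated above, not the book's own code; the function name aloha_epoch and its arguments are hypothetical.

```r
# One epoch of the slotted ALOHA model described above.
# 'active' is a logical vector, one entry per node (TRUE = has a message).
# p = probability an active node sends; q = probability an inactive
# node generates a message. All names here are illustrative.
aloha_epoch <- function(active, p = 0.4, q = 0.8) {
  n <- length(active)
  # Mid-epoch: each inactive node may generate a new message.
  new_msg <- !active & (runif(n) < q)
  active <- active | new_msg
  # Near the end of the epoch: each active node decides whether to send.
  sends <- active & (runif(n) < p)
  # A lone sender succeeds and becomes inactive; with no sender,
  # or with a collision, nobody's status changes.
  if (sum(sends) == 1) active[sends] <- FALSE
  active
}

set.seed(1)                              # arbitrary seed, for reproducibility
x1 <- sum(aloha_epoch(c(TRUE, TRUE)))    # X1 when both nodes start active
```

With both nodes starting out active, the value returned for epoch 1 can only be 1 (exactly one node sent and succeeded) or 2 (both sent and collided, or both refrained).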
How could $X_1 = 2$ occur? There are two possibilities:

- both nodes try to send; this has probability $p^2$
- neither node tries to send; this has probability $(1-p)^2$

Thus

$$P(X_1 = 2) = p^2 + (1-p)^2 = 0.52 \tag{1.1}$$

1.2 Basic Ideas of Probability

1.2.1 The Crucial Notion of a Repeatable Experiment

It's crucial to understand what that 0.52 figure really means in a practical sense. To this end, let's put the ALOHA example aside for a moment, and consider the "experiment" consisting of rolling two dice, say a blue one and a yellow one. Let X and Y denote the number of dots we get on the blue and yellow dice, respectively, and consider the meaning of $P(X+Y = 6) = \frac{5}{36}$.

In the mathematical theory of probability, we talk of a sample space, which consists of the possible outcomes (X,Y), seen in Table 1.1. In a theoretical treatment, we place weights of 1/36 on each of the points in the space, reflecting the fact that each of the 36 points is equally likely, and then say, "What we mean by $P(X+Y = 6) = \frac{5}{36}$ is that the outcomes (1,5), (2,4), (3,3), (4,2), (5,1) have total weight 5/36."

Though the notion of sample space is presented in every probability textbook, and is central to the advanced theory of probability, most probability computations do not rely on explicitly writing down a sample space.


    1,1   1,2   1,3   1,4   1,5   1,6
    2,1   2,2   2,3   2,4   2,5   2,6
    3,1   3,2   3,3   3,4   3,5   3,6
    4,1   4,2   4,3   4,4   4,5   4,6
    5,1   5,2   5,3   5,4   5,5   5,6
    6,1   6,2   6,3   6,4   6,5   6,6

Table 1.1: Sample Space for the Dice Example

In this particular example it is useful for us as a vehicle for explaining the concepts, but we will NOT use it much.

But the intuitive notion, which is FAR more important, of what $P(X+Y = 6) = \frac{5}{36}$ means is the following. Imagine doing the experiment many, many times, recording the results in a large notebook:

- Roll the dice the first time, and write the outcome on the first line of the notebook.
- Roll the dice the second time, and write the outcome on the second line of the notebook.
- Roll the dice the third time, and write the outcome on the third line of the notebook.
- Roll the dice the fourth time, and write the outcome on the fourth line of the notebook.
- Imagine you keep doing this, thousands of times, filling thousands of lines in the notebook.

    notebook line   outcome             blue+yellow = 6?
    1               blue 2, yellow 6    No
    2               blue 3, yellow 1    No
    3               blue 1, yellow 1    No
    4               blue 4, yellow 2    Yes
    5               blue 1, yellow 1    No
    6               blue 3, yellow 4    No
    7               blue 5, yellow 1    Yes
    8               blue 3, yellow 6    No
    9               blue 2, yellow 5    No

Table 1.2: Notebook for the Dice Problem

The first 9 lines of the notebook might look like Table 1.2. Here 2/9 of these lines say Yes. But after many, many repetitions, approximately 5/36 of the lines will say Yes. For example, after doing the experiment 720 times, approximately $\frac{5}{36} \times 720 = 100$ lines will say Yes.

This is what probability really is: In what fraction of the lines does the event of interest happen? It sounds simple, but if you always think about this "lines in the notebook" idea, probability problems are a lot easier to solve. And it is the fundamental basis of computer simulation.
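The notebook can be filled in by computer rather than by hand. The following is a minimal sketch in R, not taken from the text: the number of lines `nlines`, the seed, and all variable names are illustrative assumptions. It uses R's built-in `sample()` to roll each die, builds the Yes/No column, and reports the fraction of Yes lines next to 5/36.

```r
# Simulate the "notebook" for the two-dice experiment: each repetition is one
# line, and we record whether blue + yellow = 6 on that line.
set.seed(1)                      # arbitrary seed, only for reproducibility
nlines <- 100000                 # number of notebook lines (assumed value)

blue   <- sample(1:6, nlines, replace = TRUE)   # blue die, one roll per line
yellow <- sample(1:6, nlines, replace = TRUE)   # yellow die, one roll per line

yes <- (blue + yellow == 6)      # the Yes/No column, stored as TRUE/FALSE
prop_yes <- mean(yes)            # fraction of lines that say Yes

cat("fraction of Yes lines:", prop_yes, "\n")
cat("5/36:                 ", 5/36, "\n")
```

With more lines the fraction should typically settle closer to 5/36, which is the long-run-proportion reading of probability described above.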


1.2.2 Our Definitions

These definitions are intuitive, rather than rigorous math, but intuition is what we need. Keep in mind that we are making definitions below, not listing properties.

- We assume an "experiment" which is (at least in concept) repeatable. The experiment of rolling two dice is repeatable, and even the ALOHA experiment is so. (We simply watch the network for a long time, collecting data on pairs of consecutive epochs in which there are two active stations at the beginning.) On the other hand, the econometricians, in forecasting 2009, cannot "repeat" 2008. Yet all of the econometricians' tools assume that events in 2008 were affected by various sorts of randomness, and we think of repeating the experiment in a conceptual sense.
- We imagine performing the experiment a large number of times, recording the result of each repetition on a separate line in a notebook.
- We say A is an event for this experiment if it is a possible boolean (i.e. yes-or-no) outcome of the experiment. In the above example, here are some events:
  * X+Y = 6
  * X = 1
  * Y = 3
  * X-Y = 4
- A random variable is a numerical outcome of the experiment, such as X and Y here, as well as X+Y, 2XY and even sin(XY).
- For any event of interest A, imagine a column on A in the notebook. The k-th line in the notebook, k = 1,2,3,..., will say Yes or No, depending on whether A occurred or not during the k-th repetition of the experiment. For instance, we have such a column in our table above, for the event A = {blue+yellow = 6}.
- For any event of interest A, we define P(A) to be the long-run proportion of lines with Yes entries.
- For any events A, B, imagine a new column in our notebook, labeled "A and B." In each line, this column will say Yes if and only if there are Yes entries for both A and B. P(A and B) is then the long-run proportion of lines with Yes entries in the new column labeled "A and B." [1]
    [1] In most textbooks, what we call "A and B" here is written $A \cap B$, indicating the intersection of two sets in the sample space. But again, we do not take a sample space point of view here.
- For any events A, B, imagine a new column in our notebook, labeled "A or B." In each line, this column will say Yes if and only if at least one of the entries for A and B says Yes. [2]
    [2] In the sample space approach, this is written $A \cup B$.
- For any events A, B, imagine a new column in our notebook, labeled "A | B" and pronounced "A given B." In each line:
  * This new column will say "NA" ("not applicable") if the B entry is No.


  * If it is a line in which the B column says Yes, then this new column will say Yes or No, depending on whether the A column says Yes or No.

Think of probabilities in this "notebook" context:

- P(A) means the long-run proportion of lines in the notebook in which the A column says Yes.
- P(A or B) means the long-run proportion of lines in the notebook in which the A-or-B column says Yes.
- P(A and B) means the long-run proportion of lines in the notebook in which the A-and-B column says Yes.
- P(A | B) means the long-run proportion of lines in the notebook in which the A | B column says Yes, among the lines which do NOT say NA.

A hugely common mistake is to confuse P(A and B) and P(A | B). This is where the notebook view becomes so important. Compare the quantities $P(X = 1 \text{ and } S = 6) = \frac{1}{36}$ and $P(X = 1 \mid S = 6) = \frac{1}{5}$, where S = X+Y: [3]

- After a large number of repetitions of the experiment, approximately 1/36 of the lines of the notebook will have the property that both X = 1 and S = 6 (since X = 1 and S = 6 is equivalent to X = 1 and Y = 5).
- After a large number of repetitions of the experiment, if we look only at the lines in which S = 6, then among those lines, approximately 1/5 of those lines will show X = 1.

    [3] Think of adding an S column to the notebook too.

The quantity P(A | B) is called the conditional probability of A, given B.

Note that "and" has higher logical precedence than "or." For example, P(A and B or C) means P[(A and B) or C]. Also, "not" has higher precedence than "and."

Here are some more very important definitions and properties:

- Suppose A and B are events such that it is impossible for them to occur in the same line of the notebook. They are said to be disjoint events. Then

  $$P(A \text{ or } B) = P(A) + P(B) \tag{1.2}$$

  Again, this terminology disjoint stems from the set-theoretic sample space approach, where it means that $A \cap B = \emptyset$. That mathematical terminology works fine for our dice example, but in my experience people have major difficulty applying it correctly in more complicated problems. This is another illustration of why I put so much emphasis on the "notebook" framework.


- If A and B are not disjoint, then

  $$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) \tag{1.3}$$

  In the disjoint case, that subtracted term is 0, so (1.3) reduces to (1.2).
- Events A and B are said to be stochastically independent, usually just stated as independent, [4] if

  $$P(A \text{ and } B) = P(A) \cdot P(B) \tag{1.4}$$

    [4] The term stochastic is just a fancy synonym for random.
- In calculating an "and" probability, how does one know whether the events are independent? The answer is that this will typically be clear from the problem. If we toss the blue and yellow dice, for instance, it is clear that one die has no impact on the other, so events involving the blue die are independent of events involving the yellow die. On the other hand, in the ALOHA example, it's clear that events involving $X_1$ are NOT independent of those involving $X_2$.
- If A and B are not independent, the equation (1.4) generalizes to

  $$P(A \text{ and } B) = P(A) \, P(B \mid A) \tag{1.5}$$

  Note that if A and B actually are independent, then $P(B \mid A) = P(B)$, and (1.5) reduces to (1.4).

1.2.3 Basic Probability Computations: ALOHA Network Example

Please keep in mind that the notebook idea is simply a vehicle to help you understand what the concepts really mean. This is crucial for your intuition and your ability to apply this material in the real world. But the notebook idea is NOT for the purpose of calculating probabilities. Instead, we use the properties of probability, as seen in the following.

Let's look at all of this in the ALOHA context. In Equation (1.1) we found that

$$P(X_1 = 2) = p^2 + (1-p)^2 = 0.52 \tag{1.6}$$

How did we get this? Let $C_i$ denote the event that node i tries to send, i = 1,2. Then using the definitions above, our steps would be

\begin{align}
P(X_1 = 2) &= P(C_1 \text{ and } C_2 \text{ or not } C_1 \text{ and not } C_2) \tag{1.7} \\
&= P(C_1 \text{ and } C_2) + P(\text{not } C_1 \text{ and not } C_2) \quad \text{(from (1.2))} \tag{1.8} \\
&= P(C_1) P(C_2) + P(\text{not } C_1) P(\text{not } C_2) \quad \text{(from (1.4))} \tag{1.9} \\
&= p^2 + (1-p)^2 \tag{1.10}
\end{align}

Here are the reasons for these steps:


- (1.7): We listed the ways in which the event $\{X_1 = 2\}$ could occur.
- (1.8): Write G = $C_1$ and $C_2$, H = $D_1$ and $D_2$, where $D_i$ = not $C_i$, i = 1,2. Then the events G and H are clearly disjoint; if in a given line of our notebook there is a Yes for G, then definitely there will be a No for H, and vice versa.
- (1.9): The two nodes act physically independently of each other. Thus the events $C_1$ and $C_2$ are stochastically independent, so we applied (1.4). Then we did the same for $D_1$ and $D_2$.

Note carefully that in Equation (1.7), our first step was to "break big events down into small events," in this case breaking the event $\{X_1 = 2\}$ down into the events $C_1$ and $C_2$ and $D_1$ and $D_2$. This is a central part of most probability computations. In calculating a probability, ask yourself, "How can it happen?"

Good tip: When you solve problems like this, write out the and and or conjunctions like I've done above. This helps!

Now, what about $P(X_2 = 2)$? Again, we break big events down into small events, in this case according to the value of $X_1$:

\begin{align}
P(X_2 = 2) &= P(X_1 = 0 \text{ and } X_2 = 2 \text{ or } X_1 = 1 \text{ and } X_2 = 2 \text{ or } X_1 = 2 \text{ and } X_2 = 2) \nonumber \\
&= P(X_1 = 0 \text{ and } X_2 = 2) + P(X_1 = 1 \text{ and } X_2 = 2) + P(X_1 = 2 \text{ and } X_2 = 2) \tag{1.11}
\end{align}

Since $X_1$ cannot be 0, that first term, $P(X_1 = 0 \text{ and } X_2 = 2)$, is 0. To deal with the second term, $P(X_1 = 1 \text{ and } X_2 = 2)$, we'll use (1.5). Due to the time-sequential nature of our experiment here, it is natural (but certainly not "mandated," as we'll often see situations to the contrary) to take A and B to be $\{X_1 = 1\}$ and $\{X_2 = 2\}$, respectively. So, we write

$$P(X_1 = 1 \text{ and } X_2 = 2) = P(X_1 = 1) \, P(X_2 = 2 \mid X_1 = 1) \tag{1.12}$$

To calculate $P(X_1 = 1)$, we use the same kind of reasoning as in Equation (1.1). For the event in question to occur, either node A would send and B wouldn't, or A would refrain from sending and B would send. Thus

$$P(X_1 = 1) = 2p(1-p) = 0.48 \tag{1.13}$$

Now we need to find $P(X_2 = 2 \mid X_1 = 1)$. This again involves breaking big events down into small ones. If $X_1 = 1$, then $X_2 = 2$ can occur only if both of the following occur:

- Event A: Whichever node was the one to successfully transmit during epoch 1 (and we are given that there indeed was one, since $X_1 = 1$) now generates a new message.


- Event B: During epoch 2, no successful transmission occurs, i.e. either they both try to send or neither tries to send.

Recalling the definitions of p and q in Section 1.1, we have that

$$P(X_2 = 2 \mid X_1 = 1) = q[p^2 + (1-p)^2] = 0.416 \approx 0.42 \tag{1.14}$$

Thus $P(X_1 = 1 \text{ and } X_2 = 2) = 0.48 \times 0.416 \approx 0.20$.

We go through a similar analysis for $P(X_1 = 2 \text{ and } X_2 = 2)$: We recall that $P(X_1 = 2) = 0.52$ from before, and find that $P(X_2 = 2 \mid X_1 = 2) = 0.52$ as well. So we find $P(X_1 = 2 \text{ and } X_2 = 2)$ to be $0.52^2 = 0.27$. Putting all this together, we find that $P(X_2 = 2) = 0.47$.

Let's do one more; let's find $P(X_1 = 1 \mid X_2 = 2)$. [Pause a minute here to make sure you understand that this is quite different from $P(X_2 = 2 \mid X_1 = 1)$.] From (1.5), we know that

$$P(X_1 = 1 \mid X_2 = 2) = \frac{P(X_1 = 1 \text{ and } X_2 = 2)}{P(X_2 = 2)} \tag{1.15}$$

We computed both numerator and denominator here before, in Equations (1.12) and (1.11), so we see that $P(X_1 = 1 \mid X_2 = 2) = 0.1997/0.4701 = 0.42$.

1.2.4 Bayes' Theorem

Following (1.15) above, we noted that the ingredients had already been computed, in (1.12) and (1.11). If we go back to the derivations in those two equations and substitute in (1.15), we have

\begin{align}
P(X_1 = 1 \mid X_2 = 2) &= \frac{P(X_1 = 1 \text{ and } X_2 = 2)}{P(X_2 = 2)} \tag{1.16} \\
&= \frac{P(X_1 = 1 \text{ and } X_2 = 2)}{P(X_1 = 1 \text{ and } X_2 = 2) + P(X_1 = 2 \text{ and } X_2 = 2)} \tag{1.17} \\
&= \frac{P(X_1 = 1) P(X_2 = 2 \mid X_1 = 1)}{P(X_1 = 1) P(X_2 = 2 \mid X_1 = 1) + P(X_1 = 2) P(X_2 = 2 \mid X_1 = 2)} \tag{1.18}
\end{align}

Looking at this in more generality, for events A and B we would find that

$$P(A \mid B) = \frac{P(A) P(B \mid A)}{P(A) P(B \mid A) + P(\text{not } A) P(B \mid \text{not } A)} \tag{1.19}$$

This is known as Bayes' Theorem or Bayes' Rule.
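The chain of computations from (1.1) through (1.18) can be checked with a few lines of arithmetic. The sketch below is only an illustration in R; the variable names are assumptions chosen for readability, and p = 0.4, q = 0.8 are the values used in the text. Because it carries full precision rather than two-decimal intermediate values, the last figure comes out near 0.425.

```r
# Closed-form two-epoch ALOHA probabilities, following (1.1) and (1.11)-(1.18).
p <- 0.4   # probability an active node tries to send
q <- 0.8   # probability an inactive node generates a new message

p_x1_2 <- p^2 + (1 - p)^2       # P(X1 = 2): both send, or neither sends
p_x1_1 <- 2 * p * (1 - p)       # P(X1 = 1): exactly one node sends

p_x2_2_giv_x1_1 <- q * p_x1_2   # new message arrives, then no success (1.14)
p_x2_2_giv_x1_2 <- p_x1_2       # both still active, then no success

# Denominator of Bayes' Rule: break {X2 = 2} down by the value of X1 (1.11)
p_x2_2 <- p_x1_1 * p_x2_2_giv_x1_1 + p_x1_2 * p_x2_2_giv_x1_2

# Bayes' Rule, as in (1.18)
p_x1_1_giv_x2_2 <- p_x1_1 * p_x2_2_giv_x1_1 / p_x2_2

cat("P(X1 = 2):         ", p_x1_2, "\n")
cat("P(X2 = 2 | X1 = 1):", p_x2_2_giv_x1_1, "\n")
cat("P(X2 = 2):         ", p_x2_2, "\n")
cat("P(X1 = 1 | X2 = 2):", p_x1_1_giv_x2_2, "\n")
```

Changing `p` and `q` at the top is an easy way to see how the designer's choice of p affects these probabilities.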


    notebook line   X1 = 2   X2 = 2   X1 = 2 and X2 = 2   X2 = 2 | X1 = 2
    1               Yes      No       No                  No
    2               No       No       No                  NA
    3               Yes      Yes      Yes                 Yes
    4               Yes      No       No                  No
    5               Yes      Yes      Yes                 Yes
    6               No       No       No                  NA
    7               No       Yes      No                  NA

Table 1.3: Top of Notebook for Two-Epoch ALOHA Experiment

1.2.5 ALOHA in the Notebook Context

Think of doing the ALOHA "experiment" many, many times.

- Run the network for two epochs, starting with both nodes active, the first time, and write the outcome on the first line of the notebook.
- Run the network for two epochs, starting with both nodes active, the second time, and write the outcome on the second line of the notebook.
- Run the network for two epochs, starting with both nodes active, the third time, and write the outcome on the third line of the notebook.
- Run the network for two epochs, starting with both nodes active, the fourth time, and write the outcome on the fourth line of the notebook.
- Imagine you keep doing this, thousands of times, filling thousands of lines in the notebook.

The first seven lines of the notebook might look like Table 1.3. We see that:

- Among those first seven lines in the notebook, 4/7 of them have $X_1 = 2$. After many, many lines, this proportion will be approximately 0.52.
- Among those first seven lines in the notebook, 3/7 of them have $X_2 = 2$. After many, many lines, this proportion will be approximately 0.47. [5]
    [5] Don't make anything of the fact that these probabilities nearly add up to 1.
- Among those first seven lines in the notebook, 2/7 of them have $X_1 = 2$ and $X_2 = 2$. After many, many lines, this proportion will be approximately 0.27.
- Among the first seven lines in the notebook, four of them do not say NA in the $X_2 = 2 \mid X_1 = 2$ column. Among these four lines, two say Yes, a proportion of 2/4. After many, many lines, this proportion will be approximately 0.52.
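The two-epoch notebook can also be generated by computer. What follows is a self-contained sketch in R written for this purpose; it is not the author's listing, and the function names `after_send` and `aloha_sim`, the seed, and the number of repetitions are all assumptions. It relies on the built-in `runif(k)`, which returns k random numbers uniformly distributed on (0,1), so `runif(k) < p` is TRUE with probability p in each position.

```r
# Sketch of the two-epoch ALOHA experiment; both nodes start out active.
# X1 and X2 are the numbers of active nodes at the end of epochs 1 and 2.

# End-of-epoch send decisions: each of the nactive active nodes sends with
# probability p. Exactly one sender means a successful transmission, which
# leaves one fewer active node; otherwise the count is unchanged.
after_send <- function(nactive, p) {
  nsend <- sum(runif(nactive) < p)
  if (nsend == 1) nactive - 1 else nactive
}

aloha_sim <- function(p, q, nreps) {
  x1 <- numeric(nreps)
  x2 <- numeric(nreps)
  for (rep in seq_len(nreps)) {          # one iteration = one notebook line
    # epoch 1: both nodes are active
    x1[rep] <- after_send(2, p)
    # epoch 2: each inactive node generates a message with probability q
    ninactive <- 2 - x1[rep]
    nactive <- x1[rep] + sum(runif(ninactive) < q)
    x2[rep] <- after_send(nactive, p)
  }
  c(p_x1_eq_2 = mean(x1 == 2),
    p_x2_eq_2 = mean(x2 == 2),
    # conditional probability: look only at the lines in which X1 = 1
    p_x2_eq_2_given_x1_eq_1 = mean(x2[x1 == 1] == 2))
}

set.seed(2)                              # arbitrary seed
est <- aloha_sim(p = 0.4, q = 0.8, nreps = 100000)
print(round(est, 3))
```

The three estimates should land near 0.52, 0.47 and 0.42, respectively, with the usual run-to-run variation of a simulation.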


1.2.6 Simulation

To simulate whether a simple event occurs or not, we typically use R function runif(). This function generates random numbers from the interval (0,1), with all the points inside being equally likely. So for instance the probability that the function returns a value in (0,0.5) is 0.5. Thus here is code to simulate tossing a coin:

    if (runif(1) < 0.5) heads <- TRUE else heads <- FALSE

The argument 1 means we wish to generate just one random number from the interval (0,1).

1.2.6.1 Simulation of the ALOHA Example

Following is a computation via simulation of the approximate value of $P(X_1 = 2)$, $P(X_2 = 2)$ and $P(X_2 = 2 \mid X_1 = 1)$, using the R statistical language, the language of choice of professional statisticians. (It is open source, it's statistically correct (not all statistical packages are so), has dazzling graphics capabilities, etc.) To learn about the syntax (e.g. <- as the assignment operator), see my introduction to R for programmers at http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf.

    # finds P(X1 = 2), P(X2 = 2) and P(X2 = 2 | X1 = 1) in ALOHA example
    sim <- function(p,q,nreps) {
       countx2eq2 <- 0
       countx1eq1 <- 0
       countx1eq2 <- 0
       countx2eq2givx1eq1 <- 0
       # simulate nreps repetitions of the experiment
       for (i in 1:nreps) {
          numsend <- 0  # no messages sent so far
          # simulate A and B's decision on whether to send in epoch 1
          for (i in 1:2)
             if (runif ...
       ...
       cat("P(X2 = 2):",countx2eq2/nreps,"\n")
       cat("P(X2 = 2 | X1 = 1):",countx2eq2givx1eq1/countx1eq1,"\n")
    }

Note that each of the nreps iterations of the main for loop is analogous to one line in our hypothetical notebook. So, to find the approximate value of $P(X_1 = 2)$, divide the count of the number of times $X_1 = 2$ occurred by the number of iterations.

Note especially that the way we calculated $P(X_2 = 2 \mid X_1 = 1)$ was to count the number of times $X_2 = 2$ among those times that $X_1 = 1$, just like in the notebook case.

Remember, simulation results are only approximate. The larger the value we use for nreps, the more accurate our simulation results are likely to be. The question of how large we need to make nreps will be addressed in a later chapter.

1.2.6.2 Rolling Dice

If we roll three dice, what is the probability that their total is 8? We could count all the possibilities, or we could get an approximate answer via simulation:

    # roll d dice; find P(total = k)

    # simulate roll of one die; the possible return values are 1,2,3,4,5,6,
    # all equally likely
    roll <- function() return(sample(1:6,1))

    probtotk <- function(d,k,nreps) {
       count <- 0
       # do the experiment nreps times
       for (rep in 1:nreps) {
          sum <- 0
          # roll d dice and find their sum
          for (j in 1:d) sum <- sum + roll()
          if (sum == k) count <- count + 1
       }
       return(count/nreps)
    }

The call to the built-in R function sample() here says to take a sample of size 1 from the sequence of numbers 1,2,3,4,5,6. That's just what we want to simulate the rolling of a die. The code

    for (j in 1:d) sum <- sum + roll()

then simulates the tossing of a die d times, and computing the sum.

Since applications of R often use large amounts of computer time, good R programmers are always looking for ways to speed things up. Here is an alternate version of the above program:

    # roll d dice; find P(total = k)

    probtotk <- function(d,k,nreps) {
       count <- 0


       # do the experiment nreps times
       for (rep in 1:nreps) {
          total <- sum(sample(1:6,d,replace=TRUE))
          if (total == k) count <- count + 1
       }
       return(count/nreps)
    }

Here the code

    sample(1:6,d,replace=TRUE)

simulates tossing the die d times (the argument replace says this is sampling with replacement, so for instance we could get two 6s). That returns a d-element array, and we then call R's built-in function sum() to find the total of the d dice.

The second version of the code here is more compact and easier to read. It also eliminates one explicit loop, which is the key to writing fast code in R.

1.2.7 Combinatorics-Based Probability Computation

In some probability problems all the outcomes are equally likely. The probability computation is then simply a matter of counting all the outcomes of interest and dividing by the total number of possible outcomes. Of course, sometimes even such counting can be challenging, but it is simple in principle. We'll discuss two examples here.

1.2.7.1 Which Is More Likely in Five Cards, One King or Two Hearts?

Suppose we deal a 5-card hand from a regular 52-card deck. Which is larger, P(1 king) or P(2 hearts)? Before continuing, take a moment to guess which one is more likely.

Now, here is how we can compute the probabilities. There are $\binom{52}{5}$ possible hands, so this is our denominator. For P(1 king), our numerator will be the number of hands consisting of one king and four non-kings. Since there are four kings in the deck, the number of ways to choose one king is $\binom{4}{1} = 4$. There are 48 non-kings in the deck, so there are $\binom{48}{4}$ ways to choose them. Every choice of one king can be combined with every choice of four non-kings, so the number of hands consisting of one king and four non-kings is $4 \cdot \binom{48}{4}$. Thus

$$P(\text{1 king}) = \frac{4 \cdot \binom{48}{4}}{\binom{52}{5}} = 0.299 \tag{1.20}$$

The same reasoning gives us

$$P(\text{2 hearts}) = \frac{\binom{13}{2} \cdot \binom{39}{3}}{\binom{52}{5}} = 0.274 \tag{1.21}$$

So, the 1-king hand is just slightly more likely.
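For readers who want to see the intermediate counts behind (1.20) and (1.21), the arithmetic can be written out as follows. This is only a worked expansion of the two formulas above; no new assumptions are involved.

```latex
\begin{align*}
\binom{52}{5} &= 2{,}598{,}960, &
\binom{48}{4} &= 194{,}580, &
\binom{13}{2} &= 78, &
\binom{39}{3} &= 9{,}139 \\[4pt]
P(\text{1 king}) &= \frac{4 \cdot 194{,}580}{2{,}598{,}960}
                  = \frac{778{,}320}{2{,}598{,}960} \approx 0.2995 \\[4pt]
P(\text{2 hearts}) &= \frac{78 \cdot 9{,}139}{2{,}598{,}960}
                    = \frac{712{,}842}{2{,}598{,}960} \approx 0.2743
\end{align*}
```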


By the way, I used the R function choose() to evaluate these quantities, running R in interactive mode, e.g.:

    > choose(13,2) * choose(39,3) / choose(52,5)
    [1] 0.2742797

R also has a very nice function combn() which will generate all the $\binom{n}{k}$ combinations of k things chosen from n, and also at your option call a user-specified function on each combination. This allows you to save a lot of computational work. See the examples in R's online documentation.

Here's how we could do the 1-king problem via simulation:

    # use simulation to find P(1 king) when dealing a 5-card hand from a
    # standard deck

    # think of the 52 cards as being labeled 1-52, with the 4 kings having
    # numbers 1-4

    sim <- function(nreps) {
       count1king <- 0  # count of number of hands with 1 king
       for (rep in 1:nreps) {
          hand <- sample(1:52,5,replace=FALSE)  # deal hand
          kings <- intersect(1:4,hand)  # find which kings, if any, are in hand
          if (length(kings) == 1) count1king <- count1king + 1
       }
       print(count1king/nreps)
    }

1.2.7.2 Association Rules in Data Mining

The field of data mining is a branch of computer science, but it is largely an application of various statistical methods to really huge databases.

One of the applications of data mining is called the market basket problem. Here the data consists of records of sales transactions, say of books at Amazon.com. The business' goal is exemplified by Amazon's suggestion to customers that "Patrons who bought this book also tended to buy the following books." [6] The goal of the market basket problem is to sift through sales transaction records to produce association rules, patterns in which sales of some combinations of books imply likely sales of other related books.

    [6] Some customers appreciate such tips, while others view it as insulting or an invasion of privacy, but we'll not address such issues here.

The notation for association rules is $A, B \Rightarrow C, D, E$, meaning in the book sales example that customers who bought books A and B also tended to buy books C, D and E. Here A and B are called the antecedents of the rule, and C, D and E are called the consequents. Let's suppose here that we are only interested in rules with a single consequent.

We will present some methods for finding "good" rules in another chapter, but for now, let's look at how many possible rules there are. Obviously, it would be impractical to use rules with a large number of antecedents. [7] Suppose the business has a total of 20 products available for sale. What percentage of potential rules have three or fewer antecedents? [8]

    [7] In addition, there are serious statistical problems that would arise, to be discussed in another chapter.
    [8] Be sure to note that this is also a probability, namely the probability that a randomly chosen rule will have three or fewer antecedents.


For each k = 1,...,19, there are $\binom{20}{k}$ possible sets of k antecedents, and each such set can be paired with any one of the remaining 20-k products as the single consequent, giving $\binom{20}{k} \binom{20-k}{1}$ possible rules. The fraction of potential rules using three or fewer antecedents is then

$$\frac{\sum_{k=1}^{3} \binom{20}{k} \cdot \binom{20-k}{1}}{\sum_{k=1}^{19} \binom{20}{k} \cdot \binom{20-k}{1}} = \frac{23180}{10485740} = 0.0022 \tag{1.22}$$

So, this is just scratching the surface. And note that with only 20 products, there are already over ten million possible rules. With 50 products, this number is $2.81 \times 10^{16}$! Imagine what happens in a case like Amazon, with millions of products. These staggering numbers show what a tremendous challenge data miners face.

1.3 Discrete Random Variables

In our dice example, the random variable X could take on six values in the set {1,2,3,4,5,6}. This is a finite set.

In the ALOHA example, $X_1$ and $X_2$ each take on values in the set {0,1,2}, again a finite set. [9]

    [9] We could even say that $X_1$ takes on only values in the set {1,2}, but if we were to look at many epochs rather than just two, it would be easier not to make an exceptional case.

Now think of another experiment, in which we toss a coin until we get heads. Let N be the number of tosses needed. Then N can take on values in the set {1,2,3,...}. This is a countably infinite set.

Now think of one more experiment, in which we throw a dart at the interval (0,1), and assume that the place that is hit, R, can take on any of the values between 0 and 1. This is an uncountably infinite set.

We say that X, $X_1$, $X_2$ and N are discrete random variables, while R is continuous. We'll discuss continuous random variables in a later chapter.

1.4 Independence, Expected Value and Variance

The concepts and properties introduced in this section form the very core of probability and statistics. Except for some specific calculations, these apply to both discrete and continuous random variables.

1.4.1 Independent Random Variables

We already have a definition for the independence of events; what about independence of random variables? Random variables X and Y are said to be independent if for any sets I and J, the events {X is in I} and {Y is in J} are independent, i.e. P(X is in I and Y is in J) = P(X is in I) P(Y is in J).
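The definition of independent random variables can be checked exactly for the two dice by enumerating the 36 equally likely outcomes. The R sketch below is illustrative only: the sets `I` and `J` are arbitrary choices made for this example, not taken from the text, and the variable names are assumptions. It also shows that X and S = X+Y fail the product rule, in line with the earlier comparison of P(X = 1 and S = 6) with P(X = 1 | S = 6).

```r
# Exact check of the independence definition for the two-dice experiment.
# Each of the 36 rows of 'outcomes' is one equally likely (X, Y) pair.
outcomes <- expand.grid(X = 1:6, Y = 1:6)

I <- c(1, 2)        # illustrative set for X (an assumption)
J <- c(3, 5, 6)     # illustrative set for Y (an assumption)

in_I <- outcomes$X %in% I
in_J <- outcomes$Y %in% J

p_both <- mean(in_I & in_J)   # P(X is in I and Y is in J)
p_I    <- mean(in_I)          # P(X is in I)
p_J    <- mean(in_J)          # P(Y is in J)
cat("P(both):", p_both, "  product:", p_I * p_J, "\n")

# By contrast, X and S = X + Y are not independent:
S <- outcomes$X + outcomes$Y
p_x1_and_s6     <- mean(outcomes$X == 1 & S == 6)        # 1/36
p_x1_times_p_s6 <- mean(outcomes$X == 1) * mean(S == 6)  # (1/6) * (5/36)
cat("P(X = 1 and S = 6):", p_x1_and_s6,
    "  P(X = 1) P(S = 6):", p_x1_times_p_s6, "\n")
```

Taking the mean of a TRUE/FALSE vector over the 36 rows is the same as adding up weights of 1/36, so these are exact probabilities rather than simulation estimates.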


1.4.2 Expected Value

1.4.2.1 Intuitive Definition

Consider a repeatable experiment with random variable X. We say that the expected value of X is the long-run average value of X, as we repeat the experiment indefinitely.

In our notebook, there will be a column for X. Let $X_i$ denote the value of X in the i-th row of the notebook. Then the long-run average of X is

$$\lim_{n \to \infty} \frac{X_1 + \ldots + X_n}{n} \tag{1.23}$$

Suppose for instance our experiment is to toss 10 coins. Let X denote the number of heads we get out of 10. We might get four heads in the first repetition of the experiment, i.e. $X_1 = 4$, seven heads in the second repetition, so $X_2 = 7$, and so on. Intuitively, the long-run average value of X will be 5. (This will be proven below.) Thus we say that the expected value of X is 5, and write E(X) = 5.

1.4.2.2 Computation and Properties of Expected Value

Continuing the coin toss example above, let $K_{in}$ be the number of times the value i occurs among $X_1, \ldots, X_n$, i = 0,...,10, n = 1,2,3,... For instance, $K_{4,20}$ is the number of times we get four heads, in the first 20 repetitions of our experiment. Then

\begin{align}
E(X) &= \lim_{n \to \infty} \frac{X_1 + \ldots + X_n}{n} \tag{1.24} \\
&= \lim_{n \to \infty} \frac{0 \cdot K_{0n} + 1 \cdot K_{1n} + 2 \cdot K_{2n} + \ldots + 10 \cdot K_{10,n}}{n} \tag{1.25} \\
&= \sum_{i=0}^{10} i \cdot \lim_{n \to \infty} \frac{K_{in}}{n} \tag{1.26}
\end{align}

But $\lim_{n \to \infty} \frac{K_{in}}{n}$ is the long-run proportion of the time that X = i. In other words, it's P(X = i)! So,

$$E(X) = \sum_{i=0}^{10} i \cdot P(X = i) \tag{1.27}$$

So in general, the expected value of a discrete random variable X which takes values in the set A is

$$E(X) = \sum_{c \in A} c \, P(X = c) \tag{1.28}$$

Note that (1.28) is the formula we'll use. The preceding equations were derivation, to motivate the formula. Note too that (1.28) is not the definition of expected value; that was in (1.23). It is quite important to distinguish between all of these, in terms of goals.
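The regrouping step from (1.24) to (1.25) can be seen concretely in a simulated notebook. The following R sketch is an illustration under stated assumptions: the number of repetitions `n`, the seed, and the variable names are not from the text. It tosses 10 coins in each repetition, averages the X column directly, and then recomputes the same average from the counts $K_{in}$.

```r
# Toss 10 coins, n times; X[i] is the number of heads in the i-th repetition.
set.seed(3)                          # arbitrary seed
n <- 50000                           # number of notebook lines (assumed value)
X <- replicate(n, sum(sample(0:1, 10, replace = TRUE)))

avg <- mean(X)                       # (X_1 + ... + X_n) / n, as in (1.24)

# K[i + 1] holds K_in, the number of repetitions in which X = i, i = 0,...,10
K <- tabulate(X + 1, nbins = 11)
weighted <- sum((0:10) * K / n)      # sum over i of i * K_in / n, as in (1.25)

cat("direct average:     ", avg, "\n")
cat("regrouped by value: ", weighted, "\n")
```

The two numbers agree up to floating-point rounding, since (1.25) merely regroups the terms of (1.24), and both should be close to the intuitive value of 5.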


It will be shown in Section 1.5.2.2 that in our example above in which X is the number of heads we get in 10 tosses of a coin,

$$P(X = i) = \binom{10}{i} 0.5^i (1 - 0.5)^{10-i} \tag{1.29}$$

So

$$E(X) = \sum_{i=0}^{10} i \binom{10}{i} 0.5^i (1 - 0.5)^{10-i} \tag{1.30}$$

It turns out that E(X) = 5.

For X in our dice example,

$$E(X) = \sum_{c=1}^{6} c \cdot \frac{1}{6} = 3.5 \tag{1.31}$$

It is customary to use capital letters for random variables, e.g. X here, and lower-case letters for values taken on by a random variable, e.g. c here. Please adhere to this convention.

By the way, it is also customary to write EX instead of E(X), whenever removal of the parentheses does not cause any ambiguity. An example in which it would produce ambiguity is $E(U^2)$. The expression $EU^2$ might be taken to mean either $E(U^2)$, which is what we want, or $(EU)^2$, which is not what we want.

For S = X+Y in the dice example,

$$E(S) = 2 \cdot \frac{1}{36} + 3 \cdot \frac{2}{36} + 4 \cdot \frac{3}{36} + \ldots + 12 \cdot \frac{1}{36} = 7 \tag{1.32}$$

In the case of N, tossing a coin until we get a head:

$$E(N) = \sum_{c=1}^{\infty} c \cdot \frac{1}{2^c} = 2 \tag{1.33}$$

(We will not go into the details here concerning how the sum of this particular infinite series is computed.)

Some people like to think of E(X) using a center of gravity analogy. Forget that analogy! Think notebook! Intuitively, E(X) is the long-run average value of X among all the lines of the notebook. So for instance in our dice example, E(X) = 3.5, where X was the number of dots on the blue die, means that if we do the experiment thousands of times, with thousands of lines in our notebook, the average value of X in those lines will be about 3.5. With S = X+Y, E(S) = 7. This means that the long-run average in column S in Table 1.4 is 7.

Of course, by symmetry, E(Y) will be 3.5 too, where Y is the number of dots showing on the yellow die. That means we wasted our time calculating in Equation (1.32); we should have realized beforehand that E(S) is $2 \times 3.5 = 7$.
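The sums in (1.31), (1.32) and (1.33) are easy to confirm numerically. The R sketch below is a quick check, with variable names and the cutoff of 60 terms chosen here as assumptions; the infinite series is approximated by a partial sum, since the omitted terms are far below floating-point precision.

```r
# E(X) for one die, using (1.28) with P(X = c) = 1/6
ex <- sum((1:6) * (1/6))

# E(S) for S = X + Y: enumerate the 36 equally likely outcomes
s_values <- outer(1:6, 1:6, "+")     # 6 x 6 matrix of blue + yellow totals
es <- sum(s_values * (1/36))

# E(N) for tossing a coin until the first head: partial sum of (1.33)
c_vals <- 1:60                       # cutoff is an assumption; the tail is negligible
en_partial <- sum(c_vals * 0.5^c_vals)

cat("E(X) =", ex, "\n")
cat("E(S) =", es, " and 2 * E(X) =", 2 * ex, "\n")
cat("E(N), first 60 terms =", en_partial, "\n")
```

The enumeration gives E(S) equal to twice E(X), matching the symmetry shortcut noted above.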


    notebook line   outcome             blue+yellow = 6?   S
    1               blue 2, yellow 6    No                 8
    2               blue 3, yellow 1    No                 4
    3               blue 1, yellow 1    No                 2
    4               blue 4, yellow 2    Yes                6
    5               blue 1, yellow 1    No                 2
    6               blue 3, yellow 4    No                 7
    7               blue 5, yellow 1    Yes                6
    8               blue 3, yellow 6    No                 9
    9               blue 2, yellow 5    No                 7

Table 1.4: Expanded Notebook for the Dice Problem

In other words, for any random variables U and V, the expected value of a new random variable D = U+V is the sum of the expected values of U and V:

$$E(U + V) = E(U) + E(V) \tag{1.34}$$

Note carefully that U and V do NOT need to be independent random variables for this relation to hold. You should convince yourself of this fact intuitively by thinking about the notebook notion. Say we look at 10000 lines of the notebook, which has columns for the values of U, V and U+V. It makes no difference whether we average U+V in that column, or average U and V in their columns and then add; either way, we'll get the same result.

While you are at it, convince yourself that

$$E(aU + b) = aE(U) + b \tag{1.35}$$

for any constants a and b. For instance, say U is temperature in Celsius. Then the temperature in Fahrenheit is $W = \frac{9}{5} U + 32$. So, W is a new random variable, and we can get its expected value from that of U by using (1.35) with $a = \frac{9}{5}$ and b = 32.

But if U and V are independent, then

$$E(UV) = EU \cdot EV \tag{1.36}$$

In the dice example, for instance, let D denote the product of the numbers of blue dots and yellow dots, i.e. D = XY. Then

$$E(D) = 3.5^2 = 12.25 \tag{1.37}$$

Consider a function g() of one variable, and let W = g(X). W is then a random variable too. Say X takes on values in A, as in (1.28). Then W takes on values in $B = \{g(c) : c \in A\}$. Define

$$A_d = \{c : c \in A, g(c) = d\} \tag{1.38}$$

Then

$$P(W = d) = P(X \in A_d) \tag{1.39}$$

so

\begin{align}
E(W) &= \sum_{d \in B} d \, P(W = d) \tag{1.40} \\
&= \sum_{d \in B} d \sum_{c \in A_d} P(X = c) \tag{1.41} \\
&= \sum_{c \in A} g(c) \, P(X = c) \tag{1.42}
\end{align}

The properties of expected value discussed above are key to the entire remainder of this book. You should notice immediately when you are in a setting in which they are applicable. For instance, if you see the expected value of the sum of two random variables, you should instinctively think of (1.34) right away.

1.4.2.3 Casinos, Insurance Companies and "Sum Users," Compared to Others

The expected value is intended as a measure of central tendency, i.e. as some sort of definition of the probabilistic "middle" in the range of a random variable. It plays an absolutely central role in probability and statistics. Yet one should understand its limitations.

First, note that the term expected value itself is a misnomer. Take $W = X^2$, where X is the number of dots on a rolled die; its expected value is 91/6, as computed in (1.47). We do not expect W to be 91/6; in fact, it is impossible for W to take on that value.

Second, the expected value is what we call the mean in everyday life. And the mean is terribly overused. Consider, for example, an attempt to describe how wealthy (or not) people are in the city of Davis. If suddenly Bill Gates were to move into town, that would skew the value of the mean beyond recognition. Even without Gates, there is a question as to whether the mean has that much meaning.

More subtly than that, there is the basic question of what the mean means. What, for example, does Equation (1.23) mean in the context of people's incomes in Davis? We would sample a person at random and record his/her income as $X_1$. Then we'd sample another person, to get $X_2$, and so on. Fine, but in that context, what would (1.23) mean? The answer is, not much.

For a casino, though, (1.23) means plenty. Say X is the amount a gambler wins on a play of a roulette wheel, and suppose (1.23) is equal to $1.88. Then after, say, 1000 plays of the wheel (not necessarily by the same gambler), the casino knows it will have paid out a total of about $1,880. So if the casino charges, say, $1.95 per play, it will have made a profit of about $70 over those 1000 plays. It might be a bit more or less than that amount, but the casino can be pretty sure that it will be around $70, and they can plan their business accordingly.

The same principle holds for insurance companies, concerning how much they pay out in claims. With a large number of customers, they know ("expect"!) approximately how much they will pay out, and thus can set their premiums accordingly.

The key point in the casino and insurance companies examples is that they are interested in totals, e.g. total payouts on a blackjack table over a month's time, or total insurance claims paid in a year. Another example might be the number of defectives in a batch of computer chips; the manufacturer is interested in the total number of defective chips produced, say in a month.

By contrast, in describing how wealthy people of a town are, the total income of all the residents is not relevant. Similarly, in describing how well students did on an exam, the sum of the scores of all the students doesn't tell us much. A better description might involve percentiles, including the 50th percentile, the median.

Nevertheless, the mean has certain mathematical properties, such as (1.34), that have allowed the rich development of the fields of probability and statistics over the years. The median, by contrast, does not have nice mathematical properties. So, the mean has become entrenched as a descriptive measure, and we will use it often.
1.4.3 Variance

While the expected value tells us the average value a random variable takes on, we also need a measure of the random variable's variability: how much does it wander from one line of the notebook to another? In other words, we want a measure of dispersion. The classical measure is variance, defined to be the mean squared difference between a random variable and its mean:

$$Var(U) = E[(U - EU)^2] \tag{1.43}$$

For X in the die example, this would be

$$Var(X) = E[(X - 3.5)^2] \tag{1.44}$$

To evaluate this, apply (1.42) with $g(c) = (c - 3.5)^2$:

$$Var(X) = \sum_{c=1}^{6} (c - 3.5)^2 \cdot \frac{1}{6} = 2.92 \tag{1.45}$$

You can see that variance does indeed give us a measure of dispersion. If the values of U are mostly clustered near its mean, the variance will be small; if there is wide variation in U, the variance will be large.
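Equation (1.45) and its notebook interpretation can be compared side by side. The R sketch below is illustrative; the seed, the number of simulated rolls and the variable names are assumptions. It evaluates the sum in (1.45) exactly and then computes the mean squared difference from the average over a long simulated column of die rolls.

```r
# Exact variance of one die, following (1.44)-(1.45)
vals <- 1:6
ex <- sum(vals * (1/6))                     # E(X) = 3.5, from (1.28)
var_exact <- sum((vals - ex)^2 * (1/6))     # sum of (c - 3.5)^2 * 1/6

# Notebook version: mean squared difference from the average over many rolls
set.seed(4)                                 # arbitrary seed
rolls <- sample(1:6, 100000, replace = TRUE)
var_sim <- mean((rolls - mean(rolls))^2)

cat("exact variance:    ", var_exact, "\n")
cat("simulated variance:", var_sim, "\n")
```

Note that `var_sim` divides by the number of rolls, to mirror (1.43); R's built-in `var()` divides by one less than that, which makes almost no difference at this sample size.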


The properties of E() in (1.34) and (1.35) can be used to show that

$$Var(U) = E(U^2) - (EU)^2 \tag{1.46}$$

The term $E(U^2)$ is again evaluated using (1.42).

Thus for example, if X is the number of dots which come up when we roll a die, and $W = X^2$, then

$$E(W) = \sum_{i=1}^{6} i^2 \cdot \frac{1}{6} = \frac{91}{6} \tag{1.47}$$

An important property of variance is that

$$Var(cU) = c^2 Var(U) \tag{1.48}$$

for any random variable U and constant c. It should make sense to you: If we multiply a random variable by 5, say, then its average squared distance to its mean should increase by a factor of 25. And shifting data over by a constant does not change the amount of variation in them, so

$$Var(cU + d) = c^2 Var(U) \tag{1.49}$$

for any constant d.

The square root of the variance is called the standard deviation.

The squaring in the definition of variance produces some distortion, by exaggerating the importance of the larger differences. It would be more natural to use the mean absolute deviation (MAD), $E(|U - EU|)$. However, this is less tractable mathematically, so the statistical pioneers chose to use the mean squared difference, which lends itself to lots of powerful and beautiful math, in which the Pythagorean Theorem pops up in abstract vector spaces. (See Section 3.9.2 for details.)

As with expected values, the properties of variance discussed above, and also in Section 3.2.1 below, are key to the entire remainder of this book. You should notice immediately when you are in a setting in which they are applicable. For instance, if you see the variance of the sum of two random variables, you should instinctively think of (1.61) right away.

1.4.4 Is a Variance of X Large or Small?

Recall that the variance of a random variable X is supposed to be a measure of the dispersion of X, meaning the amount that X varies from one instance (one line in our notebook) to the next. But if Var(X) is, say, 2.5, is that a lot of variability or not? We will pursue this question here.


1.4.5 Chebychev's Inequality

This inequality states that for a random variable X with mean $\mu$ and variance $\sigma^2$,

$$P(|X - \mu| \ge c\sigma) \le \frac{1}{c^2} \tag{1.50}$$

In other words, X does not often stray more than, say, 3 standard deviations from its mean. This gives some concrete meaning to the concept of variance/standard deviation.

To prove (1.50), let's first state and prove Markov's Inequality: For any nonnegative random variable Y and any d > 0,

$$P(Y \ge d) \le \frac{EY}{d} \tag{1.51}$$

To prove (1.51), let Z equal 1 if $Y \ge d$, 0 otherwise. Then

$$Y \ge dZ \tag{1.52}$$

(think of the two cases), so

$$EY \ge d \, EZ \tag{1.53}$$

The right-hand side of (1.53) is $d \, P(Y \ge d)$, so (1.51) follows.

Now to prove (1.50), define

$$Y = (X - \mu)^2 \tag{1.54}$$

and set $d = c^2 \sigma^2$. Then (1.51) says

$$P[(X - \mu)^2 \ge c^2 \sigma^2] \le \frac{E[(X - \mu)^2]}{c^2 \sigma^2} \tag{1.55}$$

Since

$$(X - \mu)^2 \ge c^2 \sigma^2 \text{ if and only if } |X - \mu| \ge c\sigma \tag{1.56}$$

the left-hand side of (1.55) is the same as the left-hand side of (1.50). The numerator of the right-hand side of (1.55) is simply Var(X), i.e. $\sigma^2$, so we are done.

1.4.6 The Coefficient of Variation

Continuing our discussion of the magnitude of a variance, look at our remark following (1.50):


    In other words, X does not often stray more than, say, 3 standard deviations from its mean. This gives some concrete meaning to the concept of variance/standard deviation.

This suggests that any discussion of the size of Var(X) should relate to the size of E(X). Accordingly, one often looks at the coefficient of variation, defined to be the ratio of the standard deviation to the mean:

$$\text{coef. of var.} = \frac{\sqrt{\mathrm{Var}(X)}}{EX} \tag{1.57}$$

This is a scale-free measure (e.g. inches divided by inches), and serves as a good way to judge whether a variance is large or not.

1.4.7 Covariance

This is a topic we'll cover fully in Chapter 3, but we at least introduce it here.

A measure of the degree to which U and V vary together is their covariance,

$$\mathrm{Cov}(U,V) = E[(U - EU)(V - EV)] \tag{1.58}$$

Except for a divisor, this is essentially correlation. If U is usually large at the same time V is small, for instance, then you can see that the covariance between them will be negative. On the other hand, if they are usually large together or small together, the covariance will be positive.

Again, one can use the properties of E() to show that

$$\mathrm{Cov}(U,V) = E(UV) - EU \cdot EV \tag{1.59}$$

Also

$$\mathrm{Var}(U + V) = \mathrm{Var}(U) + \mathrm{Var}(V) + 2 \, \mathrm{Cov}(U,V) \tag{1.60}$$

If U and V are independent, then Cov(U,V) = 0 and

$$\mathrm{Var}(U + V) = \mathrm{Var}(U) + \mathrm{Var}(V) \tag{1.61}$$

1.4.8 A Combinatorial Example

A committee of four people is drawn at random from a set of six men and three women. Suppose we are concerned that there may be quite a gender imbalance in the membership of the committee. Toward that end, let M and W denote the numbers of men and women in our committee, and let D = M - W. Let's find E(D).


D can take on the values 4-0, 3-1, 2-2 and 1-3, i.e. 4, 2, 0 and -2. So,

$$ED = -2 \cdot P(D = -2) + 0 \cdot P(D = 0) + 2 \cdot P(D = 2) + 4 \cdot P(D = 4) \tag{1.62}$$

Now, using reasoning along the lines in Section 1.2.7, we have

$$P(D = -2) = P(M = 1 \text{ and } W = 3) = \frac{\binom{6}{1}\binom{3}{3}}{\binom{9}{4}} \tag{1.63}$$

After similar calculations for the other probabilities in (1.62), we find that ED = 1.33. If we were to perform this experiment many times, i.e. choose committees again and again, on average the committee would have a bit more than one more man than it has women.

1.4.9 Expected Value, Etc. in the ALOHA Example

Finding expected values etc. in the ALOHA example is straightforward. For instance,

$$EX_1 = 0 \cdot P(X_1 = 0) + 1 \cdot P(X_1 = 1) + 2 \cdot P(X_1 = 2) = 1 \cdot 0.48 + 2 \cdot 0.52 = 1.52 \tag{1.64}$$

Here is R code to find various values approximately by simulation:

    # finds E(X1), E(X2), Var(X2), Cov(X1,X2)
    sim <- function(p,q,nreps) {
       sumx1 <- 0
       sumx2 <- 0
       sumx2sq <- 0
       sumx1x2 <- 0
       for (i in 1:nreps) {
          numsend <- 0
          for (i in 1:2)
             if (runif ...

       meanx2 <- sumx2 / nreps
       cat("E(X2):", meanx2, "\n")
       # Var(X2) = E(X2^2) - (E X2)^2, by (1.46)
       cat("Var(X2):", sumx2sq/nreps - meanx2^2, "\n")
       # Cov(X1,X2) = E(X1 X2) - E(X1) E(X2), by (1.59)
       cat("Cov(X1,X2):", sumx1x2/nreps - (sumx1/nreps) * meanx2, "\n")
    }

As a check on your understanding so far, you should find at least one of these values by hand, and see if it jibes with the simulation output.

1.4.10 Reconciliation of Math and Intuition (optional section)

Here I have been promoting the notebook idea over the sterile, confusing mathematical definitions in the theory of probability. It is worth noting, though, that the theory actually does imply the notebook notion, through a theorem known as the Strong Law of Large Numbers:

Consider a random variable U, and a sequence of independent random variables $U_1, U_2, \ldots$ which all have the same distribution as U, i.e. they are repetitions of the experiment which generates U. Then

$$\lim_{n \to \infty} \frac{U_1 + \cdots + U_n}{n} = E(U) \text{ with probability 1} \tag{1.65}$$

In other words, the average value of U in all the lines of the notebook will indeed converge to EU.

1.5 Distributions

1.5.1 Basic Notions

For the type of random variables we've discussed so far, the distribution of a random variable U is simply a list of all the values it takes on, and their associated probabilities:

Example: For X in the dice example, the distribution of X is

$$\{(1, \tfrac{1}{6}), (2, \tfrac{1}{6}), (3, \tfrac{1}{6}), (4, \tfrac{1}{6}), (5, \tfrac{1}{6}), (6, \tfrac{1}{6})\} \tag{1.66}$$

Example: In the ALOHA example, the distribution of $X_1$ is

$$\{(0, 0.00), (1, 0.48), (2, 0.52)\} \tag{1.67}$$

Example: In our example in which N is the number of tosses of a coin needed to get the first head, the distribution is

$$\{(1, \tfrac{1}{2}), (2, \tfrac{1}{4}), (3, \tfrac{1}{8}), \ldots\} \tag{1.68}$$
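The notebook view of (1.68) can be tried out by simulation: generate many values of N, then compare the observed proportions with 1/2, 1/4, 1/8 and so on. The sketch below uses R's rgeom(); the seed and the number of repetitions are arbitrary choices made for illustration.

```r
set.seed(1)                 # arbitrary seed, for reproducibility
nreps <- 100000             # number of "notebook lines"

# rgeom() returns the number of failures before the first success,
# so adding 1 gives the number of tosses needed to get the first head
n <- rgeom(nreps, 0.5) + 1

empirical <- sapply(1:4, function(k) mean(n == k))   # observed proportions
exact     <- 0.5^(1:4)                               # 1/2, 1/4, 1/8, 1/16
print(rbind(empirical, exact))
print(mean(n))              # long-run average value of N
```

With this many lines in the notebook, the observed proportions should sit close to the listed probabilities, which is what (1.65) leads us to expect.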


It is common to express this in functional notation. We define the probability mass function (pmf) of a discrete random variable V, denoted $p_V$, as

$$p_V(k) = P(V = k) \tag{1.69}$$

for any value k which V can take on.

(Please keep in mind the notation. It is customary to use the lower-case p, with a subscript consisting of the name of the random variable.)

Example: In (1.68),

$$p_N(k) = \frac{1}{2^k}, \quad k = 1, 2, \ldots \tag{1.70}$$

Example: In the dice example, in which S = X + Y,

$$p_S(k) = \begin{cases} \frac{1}{36}, & k = 2 \\ \frac{2}{36}, & k = 3 \\ \frac{3}{36}, & k = 4 \\ \ldots \\ \frac{1}{36}, & k = 12 \end{cases} \tag{1.71}$$

It is important to note that there may not be some nice closed-form expression for $p_V$ like that of (1.70). There was no such form in (1.71), nor is there in our ALOHA example for $p_{X_1}$ and $p_{X_2}$.

1.5.2 Parametric Families of pmfs

1.5.2.1 The Geometric Family of Distributions

Recall our example of tossing a coin until we get the first head, with N denoting the number of tosses needed. In order for this to take k tosses, we need k-1 tails and then a head. Thus

$$p_N(k) = \left(\frac{1}{2}\right)^{k-1} \cdot \frac{1}{2}, \quad k = 1, 2, \ldots \tag{1.72}$$

We might call getting a head a "success," and refer to a tail as a "failure." Of course, these words don't mean anything; we simply refer to the outcome of interest as "success."

Define M to be the number of rolls of a die needed until the number 5 shows up. Then

$$p_M(k) = \left(\frac{5}{6}\right)^{k-1} \frac{1}{6}, \quad k = 1, 2, \ldots \tag{1.73}$$


reflecting the fact that the event {M = k} occurs if we get k-1 non-5s and then a 5. Here "success" is getting a 5.

The tosses of the coin and the rolls of the die are known as Bernoulli trials, which are a sequence of independent 1-0-valued random variables $B_i$, i = 1,2,3,... $B_i$ is 1 for success, 0 for failure, with success probability p. For instance, p is 1/2 in the coin case, and 1/6 in the die example.

In general, suppose the random variable U is defined to be the number of trials needed to get a success in a sequence of Bernoulli trials. Then

$$p_U(k) = (1 - p)^{k-1} p, \quad k = 1, 2, \ldots \tag{1.74}$$

Note that there is a different distribution for each value of p, so we call this a parametric family of distributions, indexed by the parameter p. We say that U is geometrically distributed with parameter p.

It can be shown that

$$E(U) = \frac{1}{p} \tag{1.75}$$

(which should make good intuitive sense to you) and

$$\mathrm{Var}(U) = \frac{1 - p}{p^2} \tag{1.76}$$

By the way, if we were to think of an experiment involving a geometric distribution in terms of our notebook idea, the notebook would have an infinite number of columns, one for each $B_i$. Within each row of the notebook, the $B_i$ entries would be 0 until the first 1, then NA ("not applicable") after that.

1.5.2.2 The Binomial Family of Distributions

A geometric distribution arises when we have Bernoulli trials with parameter p, with a variable number of trials (N) but a fixed number of successes (namely one). A binomial distribution arises when we have the opposite: a fixed number of Bernoulli trials (n) but a variable number of successes (say X). [10]

For example, say we toss a coin five times, and let X be the number of heads we get. We say that X is binomially distributed with parameters n = 5 and p = 1/2. Let's find P(X = 2). There are many orders in which that could occur, such as HHTTT, TTHHT, HTTHT and so on. Each order has probability $0.5^2 (1 - 0.5)^3$ and there are $\binom{5}{2}$ orders. Thus

$$P(X = 2) = \binom{5}{2} 0.5^2 (1 - 0.5)^3 = \binom{5}{2} / 32 = 5/16 \tag{1.77}$$

[10] Note again the custom of using capital letters for random variables, and lower-case letters for constants.
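The counting argument in (1.77) can be compared with R's built-in binomial pmf, dbinom(), and (1.73) with the geometric pmf, dgeom(). This is a small sketch; the variable names are illustrative only. Note that dgeom() is parameterized by the number of failures before the first success, so k trials correspond to k-1 failures.

```r
n <- 5; p <- 0.5
by_hand <- choose(n, 2) * p^2 * (1 - p)^3   # the counting argument in (1.77)
builtin <- dbinom(2, n, p)                  # R's binomial pmf
print(c(by_hand, builtin, 5/16))            # all three agree

# geometric case (1.73): first 5 appears on roll k = 3
k <- 3
geo_by_hand <- (5/6)^(k - 1) * (1/6)
geo_builtin <- dgeom(k - 1, 1/6)            # k - 1 failures, then a success
print(c(geo_by_hand, geo_builtin))
```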


For general n and p,

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k} \tag{1.78}$$

So again we have a parametric family of distributions, in this case a family having two parameters, n and p.

Let's write X as a sum of those 0-1 Bernoulli variables we used in the discussion of the geometric distribution above:

$$X = \sum_{i=1}^{n} B_i \tag{1.79}$$

where $B_i$ is 1 or 0, depending on whether there is success on the i-th trial or not. Then the reader should use our earlier properties of E() and Var() in Section 1.4 to fill in the details in the following derivations of the expected value and variance of a binomial random variable:

$$EX = E(B_1 + \cdots + B_n) = EB_1 + \cdots + EB_n = np \tag{1.80}$$

and

$$\mathrm{Var}(X) = \mathrm{Var}(B_1 + \cdots + B_n) = \mathrm{Var}(B_1) + \cdots + \mathrm{Var}(B_n) = np(1 - p) \tag{1.81}$$

Again, (1.80) should make good intuitive sense to you.

1.5.2.3 The Poisson Family of Distributions

Another famous parametric family of distributions is the set of Poisson distributions, which is used to model unbounded counts. The pmf is

$$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \ldots \tag{1.82}$$

The parameter for the family, $\lambda$, turns out to be the value of E(X) and also Var(X).

The Poisson family is very often used to model count data. For example, if you go to a certain bank every day and count the number of customers who arrive between 11:00 and 11:15 a.m., you will probably find that that distribution is well approximated by a Poisson distribution for some $\lambda$.

1.5.2.4 The Negative Binomial Family of Distributions

Recall that a typical example of the geometric distribution family (Section 1.5.2.1) arises as N, the number of tosses of a coin needed to get our first head. Now generalize that, with N now being the number of tosses


needed to get our r-th head, where r is a fixed value. Let's find P(N = k), k = r, r+1, ... For concreteness, look at the case r = 3, k = 5. In other words, we are finding the probability that it will take us 5 tosses to accumulate 3 heads.

First note the equivalence of two events:

$$\{N = 5\} = \{\text{2 heads in the first 4 tosses and head on the 5th toss}\} \tag{1.83}$$

That event described before the "and" corresponds to a binomial probability:

$$P(\text{2 heads in the first 4 tosses}) = \binom{4}{2} \left(\frac{1}{2}\right)^4 \tag{1.84}$$

Since the probability of a head on the 5th toss is 1/2 and the tosses are independent, we find that

$$P(N = 5) = \binom{4}{2} \left(\frac{1}{2}\right)^5 = \frac{3}{16} \tag{1.85}$$

The negative binomial distribution family, indexed by parameters r and p, corresponds to random variables which count the number of independent trials with success probability p needed until we get r successes. The pmf is

$$P(N = k) = \binom{k-1}{r-1} (1 - p)^{k-r} p^r, \quad k = r, r+1, \ldots \tag{1.86}$$

We can write

$$N = G_1 + \cdots + G_r \tag{1.87}$$

where $G_i$ is the number of tosses needed after success number i-1 to obtain success number i (counting the successful toss itself). But each $G_i$ has a geometric distribution! Since the mean of that distribution is 1/p, we have that

$$E(N) = r \cdot \frac{1}{p} \tag{1.88}$$

In fact, those r geometric variables are also independent, so we know the variance of N is the sum of their variances:

$$\mathrm{Var}(N) = r \cdot \frac{1 - p}{p^2} \tag{1.89}$$
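The pmf (1.86) and the moment formulas (1.88) and (1.89) can be checked in R. The sketch below makes two assumptions worth flagging: R's dnbinom() counts failures before the r-th success rather than total trials, so k trials correspond to k-r failures; and the seed and repetition count are arbitrary.

```r
r <- 3; p <- 0.5; k <- 5
by_formula <- choose(k - 1, r - 1) * (1 - p)^(k - r) * p^r   # formula (1.86)
builtin    <- dnbinom(k - r, size = r, prob = p)             # k - r failures
print(c(by_formula, builtin))                                # both equal 3/16

# simulate N as a sum of r independent geometric variables, as in (1.87)
set.seed(1)
nreps <- 100000
g <- matrix(rgeom(nreps * r, p) + 1, ncol = r)   # each entry is one G_i
nsim <- rowSums(g)                               # one value of N per row
print(c(mean(nsim), r / p))                      # compare with (1.88)
print(c(var(nsim), r * (1 - p) / p^2))           # compare with (1.89)
```

Each row of the matrix g plays the role of one line of the notebook, with one column per geometric waiting time.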


1.5.2.5 The Power Law Family of Distributions

Here

$$p_X(k) = c k^{-\gamma}, \quad k = 1, 2, 3, \ldots \tag{1.90}$$

It is required that $\gamma > 1$, as otherwise the sum of probabilities will be infinite. For $\gamma$ satisfying that condition, the value c is chosen so that that sum is 1.0:

$$1.0 = \sum_{k=1}^{\infty} c k^{-\gamma} \approx c \int_1^{\infty} k^{-\gamma} \, dk = \frac{c}{\gamma - 1} \tag{1.91}$$

Here again we have a parametric family of distributions, indexed by the parameter $\gamma$.

The power law family is an old-fashioned model (an old-fashioned term for distribution is law), but there has been a resurgence of interest in it in recent years. It turns out that many types of networks in the real world exhibit approximately power law behavior.

For instance, in a famous study of the Web (A. Barabasi and R. Albert, Emergence of Scaling in Random Networks, Science, 1999, 509-512), it was found that the number of links leading to a Web page has an approximate power law distribution with $\gamma = 2.1$. The number of links leading out of a Web page was found to be approximately power-law distributed, with $\gamma = 2.7$.

1.6 Recognizing Distributions When You See Them

Many random variables one encounters do not have a distribution in some famous parametric family. But many do, and it's important to be alert to this point, and recognize one when you see one.

1.6.1 A Coin Game

Consider a game played by Jack and Jill. Each of them tosses a coin many times, but Jack gets a head start of two tosses. So by the time Jack has had, for instance, 8 tosses, Jill has had only 6; when Jack tosses for the 15th time, Jill has her 13th toss; etc.

Let $X_k$ denote the number of heads Jack has gotten through his k-th toss, and let $Y_k$ be the head count for Jill at that same time, i.e. among only k-2 tosses for her. (So, $Y_1 = Y_2 = 0$.) Let's find the probability that Jill is winning after Jack's 6th toss, i.e. $P(Y_6 > X_6)$.

Your first reaction might be, "Aha, binomial distribution!" You would be on the right track, but the problem is that you would not be thinking precisely enough. Just WHAT has a binomial distribution? The answer is that both $X_6$ and $Y_6$ have binomial distributions, both with p = 0.5, but n = 6 for $X_6$ while n = 4 for $Y_6$.

Now, as usual, ask the famous question, "How can it happen?" How can it happen that $Y_6 > X_6$? Well, we could have, for example, $Y_6 = 3$ and $X_6 = 1$, as well as many other possibilities. Let's write it


mathematically:

$$P(Y_6 > X_6) = \sum_{i=1}^{4} \sum_{j=0}^{i-1} P(Y_6 = i \text{ and } X_6 = j) \tag{1.92}$$

Make SURE you understand this equation.

Now, to evaluate $P(Y_6 = i \text{ and } X_6 = j)$, we see the "and" so we ask whether $Y_6$ and $X_6$ are independent. They in fact are; Jill's coin tosses certainly don't affect Jack's. So,

$$P(Y_6 = i \text{ and } X_6 = j) = P(Y_6 = i) \cdot P(X_6 = j) \tag{1.93}$$

It is at this point that we finally use the fact that $X_6$ and $Y_6$ have binomial distributions. We have

$$P(Y_6 = i) = \binom{4}{i} 0.5^i (1 - 0.5)^{4-i} \tag{1.94}$$

and

$$P(X_6 = j) = \binom{6}{j} 0.5^j (1 - 0.5)^{6-j} \tag{1.95}$$

We would then substitute (1.94) and (1.95) in (1.92). We could then evaluate it by hand, but it would be more convenient to use R's dbinom() function:

    prob <- 0
    for (i in 1:4)             # Jill's head count
       for (j in 0:(i-1))      # Jack's head count, strictly smaller
          prob <- prob + dbinom(i,4,0.5) * dbinom(j,6,0.5)
    print(prob)

We get an answer of about 0.17. If Jack and Jill were to play this game repeatedly, stopping each time after the 6th toss, then Jill would win about 17% of the time.

1.6.2 Tossing a Set of Four Coins

Consider a game in which we have a set of four coins. We keep tossing the set of four until we have a situation in which exactly two of them come up heads. Let N denote the number of times we must toss the set of four coins.

For instance, on the first toss of the set of four, the outcome might be HTHH. The second might be TTTH, and the third could be THHT. In that situation, N = 3.

Let's find P(N = 5). Here we recognize that N has a geometric distribution, with "success" defined as getting two heads in our set of four coins. What value does the parameter p have here?


Well, p is P(X = 2), where X is the number of heads we get from a toss of the set of four coins. We recognize that X is binomial! Thus

$$p = \binom{4}{2} 0.5^4 = \frac{3}{8} \tag{1.96}$$

Thus using the fact that N has a geometric distribution,

$$P(N = 5) = (1 - p)^4 p = 0.057 \tag{1.97}$$

1.6.3 The ALOHA Example Again

As an illustration of how commonly these parametric families arise, let's again look at the ALOHA example. Consider the general case, with transmission probability p, message creation probability q, and m network nodes. We will not restrict our observation to just two epochs.

Suppose $X_i = m$, i.e. at the end of epoch i all nodes have a message to send. Then the number which attempt to send during epoch i+1 will be binomially distributed, with parameters m and p. [11] For instance, the probability that there is a successful transmission is equal to the probability that exactly one of the m nodes attempts to send,

$$\binom{m}{1} p (1 - p)^{m-1} = mp(1 - p)^{m-1} \tag{1.98}$$

Now in that same setting, $X_i = m$, let K be the number of epochs it will take before some message actually gets through. In other words, we will have $X_i = m$, $X_{i+1} = m$, $X_{i+2} = m$, ... but finally $X_{i+K-1} = m - 1$. Then K will be geometrically distributed, with success probability equal to (1.98).

There is no Poisson distribution in this example, but the Poisson family is central to the analysis of Ethernet, and almost any other network. We will discuss this at various points in later chapters.

1.7 A Cautionary Tale

1.7.1 Trick Coins, Tricky Example

Suppose we have two trick coins in a box. They look identical, but one of them, denoted coin 1, is heavily weighted toward heads, with a 0.9 probability of heads, while the other, denoted coin 2, is biased in the opposite direction, with a 0.9 probability of tails. Let $C_1$ and $C_2$ denote the events that we get coin 1 or coin 2, respectively.

Our experiment consists of choosing a coin at random from the box, and then tossing it n times. Let $B_i$ denote the outcome of the i-th toss, i = 1,2,3,..., where $B_i = 1$ means heads and $B_i = 0$ means tails. Let $X_i = B_1 + \cdots + B_i$, so $X_i$ is a count of the number of heads obtained through the i-th toss.

[11] Note that this is a conditional distribution, given $X_i = m$.


The question is: Does the random variable $X_i$ have a binomial distribution? Or, more simply, the question is, Are the random variables $B_i$ independent? To most people's surprise, the answer is No (to both questions). Why not?

The variables $B_i$ are indeed 0-1 variables, and they have a common success probability. But they are not independent! Let's see why they aren't.

Consider the events $A_i = \{B_i = 1\}$, i = 1,2,3,... In fact, just look at the first two. By definition, they are independent if and only if

$$P(A_1 \text{ and } A_2) = P(A_1) P(A_2) \tag{1.99}$$

First, what is $P(A_1)$? Now, wait a minute! Don't answer, "Well, it depends on which coin we get," because this is NOT a conditional probability. Yes, the conditional probabilities $P(A_1 | C_1)$ and $P(A_1 | C_2)$ are 0.9 and 0.1, respectively, but the unconditional probability is $P(A_1) = 0.5$. You can deduce that either by the symmetry of the situation, or by

$$P(A_1) = P(C_1) P(A_1 | C_1) + P(C_2) P(A_1 | C_2) = (0.5)(0.9) + (0.5)(0.1) = 0.5 \tag{1.100}$$

You should think of all this in the notebook context. Each line of the notebook would consist of a report of three things: which coin we get; the outcome of the first toss; and the outcome of the second toss. (Note by the way that in our experiment we don't know which coin we get, but conceptually it should have a column in the notebook.) If we do this experiment for many, many lines in the notebook, about 90% of the lines in which the coin column says "1" will show Heads in the second column. But 50% of the lines overall will show Heads in that column.

The same argument gives $P(A_2) = 0.5$. So, the right-hand side of Equation (1.99) is equal to 0.25. What about the left-hand side?

$$P(A_1 \text{ and } A_2) = P(A_1 \text{ and } A_2 \text{ and } C_1) + P(A_1 \text{ and } A_2 \text{ and } C_2) \tag{1.101}$$

$$= P(A_1 \text{ and } A_2 | C_1) P(C_1) + P(A_1 \text{ and } A_2 | C_2) P(C_2) \tag{1.102}$$

$$= (0.9)^2 (0.5) + (0.1)^2 (0.5) \tag{1.103}$$

$$= 0.41 \tag{1.104}$$

Well, 0.41 is not equal to 0.25, so you can see that the events are not independent, contrary to our first intuition. And that also means that $X_i$ is not binomial.
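The numbers 0.25 and 0.41 can also be seen in a simulated notebook. The following is a sketch, not the author's code: each of the nreps lines records which coin was drawn and the outcomes of two tosses of that same coin. The seed and the number of lines are arbitrary.

```r
set.seed(1)
nreps <- 200000
coin  <- sample(1:2, nreps, replace = TRUE)   # which coin is drawn on each line
phead <- ifelse(coin == 1, 0.9, 0.1)          # that coin's heads probability
b1 <- rbinom(nreps, 1, phead)                 # first toss (1 = heads)
b2 <- rbinom(nreps, 1, phead)                 # second toss, same coin

pa1   <- mean(b1 == 1)                        # estimate of P(A1)
pa2   <- mean(b2 == 1)                        # estimate of P(A2)
pboth <- mean(b1 == 1 & b2 == 1)              # estimate of P(A1 and A2)
print(c(pa1 * pa2, pboth))                    # near 0.25 and 0.41
```

The two tosses on a line are generated independently given the coin, yet the overall proportions fail (1.99), which is exactly the conditional versus unconditional distinction made above.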
1.7.2 Intuition in Retrospect

To get some intuition here, think about what would happen if we tossed the chosen coin 10000 times instead of just twice. If the tosses were independent, then for example knowledge of the first 9999 tosses should not tell us anything about the 10000th toss. But that is not the case at all. After 9999 tosses, we are going to have a very good idea as to which coin we had chosen, because by that time we will have gotten about 9000 heads (in the case of $C_1$) or about 1000 heads (in the case of $C_2$). In the former case, we know that the


10000th toss is likely to be a head, while in the latter case it is likely to be tails. In other words, earlier tosses do indeed give us information about later tosses, so the tosses aren't independent.

1.7.3 Implications for Modeling

The lesson to be learned is that independence can definitely be a tricky thing, not to be assumed cavalierly. And in creating probability models of real systems, we must give very, very careful thought to the conditional and unconditional aspects of our models; it can make a huge difference, as we saw above. Also, the conditional aspects often play a key role in formulating models of nonindependence.

This trick coin example is just that, tricky, but similar situations occur often in real life. If in some medical study, say, we sample people at random from the population, the people are independent of each other. But if we sample families from the population, and then look at children within the families, the children within a family are not independent of each other.

1.8 Why Not Just Do All Analysis by Simulation?

Now that computer speeds are so fast, one might ask why we need to do mathematical probability analysis; why not just do everything by simulation? There are a number of reasons:

- Even with a fast computer, simulations of complex systems can take days, weeks or even months.
- Mathematical analysis can provide us with insights that may not be clear in simulation.
- Like all software, simulation programs are prone to bugs. The chance of having an uncaught bug in a simulation program is reduced by doing mathematical analysis for a special case of the system being simulated. This serves as a partial check.
- Statistical analysis is used in many professions, including engineering and computer science, and in order to conduct meaningful, useful statistical analysis, one needs a firm understanding of probability principles.

An example of that second point arose in the computer security research of a graduate student at UCD, C. Senthilkumar, who was working on a way to more quickly detect the spread of a malicious computer worm. He was evaluating his proposed method by simulation, and found that things "hit a wall" at a certain point. He wasn't sure if this was a real limitation; maybe, for example, he just wasn't running his simulation on the right set of parameters to go beyond this limit. But a mathematical analysis showed that the limit was indeed real.

1.9 Tips on Finding Probabilities, Expected Values and So On

First, do not write/think nonsense. For example, the expression "P(A) or P(B)" is nonsense (do you see why?).


Similarly, don't use formulas that you didn't learn and are in fact false. For example, in an expression involving a random variable X, one can NOT replace X by its mean EX! (How would you like it if your professor were to lose your exam, and then tell you, "Well, I'll just assign you a score that is equal to the class mean"?)

As noted before, in calculating a probability, ask yourself, "How can it happen?" Then you will typically have a set of and/or terms, which you compute individually and add together. And until you get used to it, write down every step, including reasons, as you see in (1.7)-(1.9).

Another point is that you should define variables, e.g. "Let X denote the number of heads." Write it down! This makes it much easier to translate from words to math expressions and equations.

Exercises

1. This problem concerns the ALOHA network model of Section 1.1. Feel free to use (but cite) computations already in the example.

(a) Find $P(X_1 = 2 \text{ and } X_2 = 1)$, for the same values of p and q in the examples.

(b) Find $P(X_2 = 0)$.

(c) Find $P(X_1 = 1 | X_2 = 1)$.

2. Consider a game in which one rolls a single die until one accumulates a total of at least four dots. Let X denote the number of rolls needed. Find $P(X \le 2)$ and E(X).

3. Recall the committee example in Section 1.4.8. Suppose now, though, that the selection protocol is that there must be at least one man and at least one woman on the committee. Find E(D) and Var(D).

4. Consider the game in Section 1.6.1. Find E(Z) and Var(Z), where $Z = Y_6 - X_6$.

5. Say we choose six cards from a standard deck, one at a time WITHOUT replacement. Let N be the number of kings we get. Does N have a binomial distribution? Choose one: (i) Yes. (ii) No, since trials are not independent. (iii) No, since the probability of success is not constant from trial to trial. (iv) No, since the number of trials is not fixed. (v) (ii) and (iii). (vi) (ii) and (iv). (vii) (iii) and (iv).

6. Suppose we have n independent trials, with the probability of success on the i-th trial being $p_i$. Let X = the number of successes. Use the fact that "the variance of the sum is the sum of the variances" for independent random variables to derive Var(X).

7. You bought three tickets in a lottery, for which 60 tickets were sold in all. There will be five prizes given. Find the probability that you win at least one prize, and the probability that you win exactly one prize.

8. Two five-person committees are to be formed from your group of 20 people. In order to foster communication, we set a requirement that the two committees have the same chair but no other overlap. Find the probability that you and your friend are both chosen for some committee.

9. Consider a device that lasts either one, two or three months, with probabilities 0.1, 0.7 and 0.2, respectively. We carry one spare. Find the probability that we have some device still working just before four months have elapsed.


10. A building has six floors, and is served by two freight elevators, named Mike and Ike. The destination floor of any order of freight is equally likely to be any of floors 2 through 6. Once an elevator reaches any of these floors, it stays there until summoned. When an order arrives to the building, whichever elevator is currently closer to floor 1 will be summoned, with elevator Ike being the one summoned in the case in which they are both on the same floor.

Find the probability that after the summons, elevator Mike is on floor 3. Assume that only one order of freight can fit in an elevator at a time. Also, suppose the average time between arrivals of freight to the building is much larger than the time for an elevator to travel between the bottom and top floors; this assumption allows us to neglect travel time.

11. Without resorting to using the fact that $\binom{n}{k} = n! / [k!(n-k)!]$, find c and d such that

$$\binom{n}{k} = \binom{n-1}{k} + \binom{c}{d} \tag{1.105}$$

12. Prove Equation (1.46), and also show that b = EU minimizes the quantity $E[(U - b)^2]$.

13. Show that if X is a nonnegative-integer valued random variable, then

$$EX = \sum_{i=1}^{\infty} P(X \ge i) \tag{1.106}$$

Hint: Write $i = \sum_{j=1}^{i} 1$, and when you see an iterated sum, reverse the order of summation.

14. A civil engineer is collecting data on a certain road. She needs to have data on 25 trucks, and 10 percent of the vehicles on that road are trucks. Find the probability that she will need to wait for more than 200 vehicles to pass before she gets the needed data.

15. Suppose we toss a fair coin n times, resulting in X heads. Show that the term expected value is a misnomer, by showing that

$$\lim_{n \to \infty} P(X = n/2) = 0 \tag{1.107}$$

Use Stirling's approximation,

$$k! \approx \sqrt{2 \pi k} \left(\frac{k}{e}\right)^k \tag{1.108}$$


Chapter 2

Continuous Probability Models

2.1 A Random Dart

Imagine that we throw a dart at random at the interval (0,1). Let D denote the spot we hit. By "at random" we mean that all subintervals of equal length are equally likely to get hit. For instance, the probability of the dart landing in (0.7,0.8) is the same as for (0.2,0.3), (0.537,0.637) and so on.

The first crucial point to note is that

$$P(D = c) = 0 \tag{2.1}$$

for any individual point c. That can be seen by the fact that c is in as tiny a subinterval as you wish, or by the fact that the interval (c,c), or even [c,c], has length 0. Or, reason that there are infinitely many points, and if they all had some nonzero probability w, say, then the probabilities would sum to infinity instead of to 1; thus they must have probability 0.

That may sound odd to you, but remember, this is an idealization. D actually cannot be just any old point in (0,1). Our dart has nonzero thickness, our measuring instrument has only finite precision, and so on. So it really is an idealization, though an extremely useful one. It's like the assumption of "massless string" in physics analyses; there is no such thing, but it's a good approximation to reality.

But Equation (2.1) presents a problem for us in defining the term distribution for variables like this. We defined it for a discrete random variable Y as a list of the values Y takes on, together with their probabilities. But that would be impossible here; all the probabilities of individual values here are 0.

Instead, we define the distribution of a random variable W which puts 0 probability on individual points in another way. To set this up, we first must define, for any random variable W (including discrete ones), its cumulative distribution function (cdf):

$$F_W(t) = P(W \le t), \quad -\infty < t < \infty \tag{2.2}$$

(Please keep in mind the notation. It is customary to use capital F to denote a cdf, with a subscript consisting of the name of the random variable.)

What is t here? It's simply an argument to a function. The function here has domain $(-\infty, \infty)$, and we must thus define that function for every value of t.

For instance, consider our random dart example above. We know that, for example

$$F_D(0.23) = P(D \le 0.23) = 0.23 \tag{2.3}$$

In general for our dart,

$$F_D(t) = \begin{cases} 0, & \text{if } t \le 0 \\ t, & \text{if } 0 < t < 1 \\ 1, & \text{if } t \ge 1 \end{cases} \tag{2.4}$$

For example, say Z is the number of heads we get from two tosses of a coin. Then

$$F_Z(t) = \begin{cases} 0, & \text{if } t < 0 \\ 0.25, & \text{if } 0 \le t < 1 \\ 0.75, & \text{if } 1 \le t < 2 \\ 1, & \text{if } t \ge 2 \end{cases} \tag{2.5}$$

For instance, $F_Z(1.2) = P(Z \le 1.2) = P(Z = 0 \text{ or } Z = 1) = 0.25 + 0.50 = 0.75$. (Make sure you confirm this!) $F_Z$ is graphed below:

[Figure: graph of $F_Z$]

The fact that one cannot get a noninteger number of heads is what makes the cdf of Z flat between consecutive integers.

In the graphs you see that $F_D$ in (2.4) is continuous while $F_Z$ in (2.5) has jumps. For this reason, we call random variables like D (ones which have 0 probability for individual points) continuous random variables.

At this level of study of probability, most random variables are either discrete or continuous, but some are not.
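Both cdfs discussed above are available in R, which gives a quick way to confirm the values quoted. This is a sketch using the built-in functions punif() (uniform cdf) and pbinom() (binomial cdf); the vector name tvals is chosen here for illustration.

```r
# cdf of the dart D, uniform on (0,1)
print(punif(0.23))                  # F_D(0.23)

# cdf of Z, the number of heads in two tosses of a fair coin
tvals <- c(-1, 0, 0.5, 1.2, 2, 3)
fz <- pbinom(tvals, 2, 0.5)
print(fz)                           # 0, 0.25, 0.25, 0.75, 1, 1

# size of the jump of F_Z at t = 1, which equals P(Z = 1)
jump <- pbinom(1, 2, 0.5) - pbinom(0.999, 2, 0.5)
print(jump)
```

The flat stretches between integers and the jump of size 0.5 at t = 1 are the step behavior described for $F_Z$; $F_D$ has no such jumps.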


2.2 Density Functions

Intuition is key here. Make SURE you develop a good intuitive understanding of density functions, as it is vital in being able to apply probability well. We will use it a lot in our course.

2.2.1 Motivation, Definition and Interpretation

OK, now we have a name for random variables that have probability 0 for individual points ("continuous") and we have solved the problem of how to describe their distribution. Now we need something which will be continuous random variables' analog of a probability mass function.

Think as follows. From (2.2) we can see that for a discrete random variable, its cdf can be calculated by summing its pmf. Recall that in the continuous world, we integrate instead of sum. So, our continuous-case analog of the pmf should be something that integrates to the cdf. That of course is the derivative of the cdf, which is called the density. It is defined as

$$f_W(t) = \frac{d}{dt} F_W(t), \quad -\infty < t < \infty \tag{2.6}$$

[Figure 2.1: Approximation of Probability by a Rectangle]

So, X will take on values in regions in which $f_X$ is large much more often than in regions where it is small, with the ratio of frequencies being proportional to the values of $f_X$.

For our dart random variable D, $f_D(t) = 1$ for t in (0,1), and it's 0 elsewhere. [1] Again, $f_D(t)$ is NOT P(D = t), since the latter value is 0, but it is still viewable as a "relative likelihood." The fact that $f_D(t) = 1$ for all t in (0,1) can be interpreted as meaning that all the points in (0,1) are equally likely to be hit by the dart. More precisely put, you can view the constant nature of this density as meaning that all subintervals of the same length within (0,1) have the same probability of being hit.

Note too that if, say, X has the density in the previous paragraph, then $f_X(3) = 6/15 = 0.4$ and thus P(2.99 < ...

2.2.2 Use of Densities to Find Probabilities and Expected Values

Equation (2.6) implies that

P(a < ...

$$P(X > 2.5) = \int_{2.5}^{4} 2t/15 \, dt = 0.65 \tag{2.17}$$

$$F_X(s) = \int_{1}^{s} 2t/15 \, dt = \frac{s^2 - 1}{15} \text{ for s in (1,4) (cdf is 0 for } s < 1 \text{, and 1 for } s > 4\text{)} \tag{2.18}$$
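The integrals in (2.17) and (2.18) can be reproduced numerically with R's integrate() function. The sketch below assumes the density implied by those two equations, $f_X(t) = 2t/15$ on (1,4) and 0 elsewhere; the function names fx and cdf are hypothetical, and the expected value is computed with the standard formula $EX = \int t f_X(t) \, dt$.

```r
# density implied by (2.17)-(2.18): 2t/15 on (1,4), 0 elsewhere
fx <- function(t) ifelse(t > 1 & t < 4, 2 * t / 15, 0)

total <- integrate(fx, 1, 4)$value      # a density must integrate to 1
p25   <- integrate(fx, 2.5, 4)$value    # P(X > 2.5), as in (2.17)

cdf <- function(s) (s^2 - 1) / 15       # closed form from (2.18), for s in (1,4)

# expected value: integral of t * f_X(t) over the support
ex <- integrate(function(t) t * fx(t), 1, 4)$value

print(c(total, p25, 1 - cdf(2.5), ex))
```

Numerical integration and the closed-form cdf give the same value for P(X > 2.5), which is a useful cross-check when a density is entered by hand.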


2.3 Famous Parametric Families of Continuous Distributions

2.3.1 The Uniform Distributions

2.3.1.1 Density and Properties

In our dart example, we can imagine throwing the dart at the interval (q,r) (so this will be a two-parameter family). Then to be a uniform distribution, i.e. with all the points being "equally likely," the density must be constant in that interval. But it also must integrate to 1 [see (2.11)]. So, that constant must be 1 divided by the length of the interval:

$$f_D(t) = \frac{1}{r - q} \tag{2.19}$$

for t in (q,r), 0 elsewhere.

It is easily shown that $E(D) = \frac{q + r}{2}$ and $\mathrm{Var}(D) = \frac{1}{12}(r - q)^2$.

The notation for this family is U(q,r).

2.3.1.2 Example: Modeling of Disk Performance

Uniform distributions are often used to model computer disk requests. Recall that a disk consists of a large number of concentric rings, called tracks. When a program issues a request to read or write a file, the read/write head must be positioned above the track of the first part of the file. This move, which is called a seek, can be a significant factor in disk performance in large systems, e.g. a database for a bank.

If the number of tracks is large, the position of the read/write head, which I'll denote as X, is like a continuous random variable, and often this position is modeled by a uniform distribution. This situation may hold just before a defragmentation operation. After that operation, the files tend to be bunched together in the central tracks of the disk, so as to reduce seek time, and X will not have a uniform distribution anymore.

Each track consists of a certain number of sectors of a given size, say 512 bytes each. Once the read/write head reaches the proper track, we must wait for the desired sector to rotate around and pass under the read/write head. It should be clear that a uniform distribution is a good model for this rotational delay.

2.3.1.3 Example: Modeling of Denial-of-Service Attack

In one facet of computer security, it has been found that a uniform distribution is actually a warning of trouble, a possible indication of a denial-of-service attack. Here the attacker tries to monopolize, say, a Web server, by inundating it with service requests. According to the research of David Marchette, [2] attackers choose uniformly distributed false IP addresses, a pattern not normally seen at servers.

[2] Statistical Methods for Network and Computer Security, David J. Marchette, Naval Surface Warfare Center, rion.math.iastate.edu/IA/2003/foils/marchette.pdf.
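Returning to the uniform family of Section 2.3.1.1, the mean and variance formulas stated there can be checked by simulation with R's runif(). This is only a sketch: the endpoints q = 2 and r = 10 are hypothetical values picked for illustration, and the seed and sample size are arbitrary.

```r
q <- 2; r <- 10                     # hypothetical interval endpoints
set.seed(1)
x <- runif(100000, q, r)            # simulated U(q,r) values

print(c(mean(x), (q + r) / 2))      # sample mean vs. (q + r)/2
print(c(var(x), (r - q)^2 / 12))    # sample variance vs. (r - q)^2 / 12

# the density 1/(r - q) on (q,r) should integrate to 1
area <- integrate(dunif, q, r, min = q, max = r)$value
print(area)
```

The same check applies to the disk seek position X or the rotational delay whenever a uniform model is assumed for them.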


2.3.2 The Normal (Gaussian) Family of Continuous Distributions

These are the famous "bell-shaped curves," so called because their densities have that shape. [3]

2.3.2.1 Density and Properties

Density and Parameters:

The density for a normal distribution is

$$f_W(t) = \frac{1}{\sqrt{2\pi} \, \sigma} e^{-0.5 \left(\frac{t - \mu}{\sigma}\right)^2}, \quad -\infty < t < \infty \tag{2.20}$$

... c > 0. Then

$$F_Y(t) = P(Y \le t) \quad \text{(definition of } F_Y\text{)} \tag{2.21}$$

$$= P(cX + d \le t) \quad \text{(definition of Y)} \tag{2.22}$$

$$= P\left(X \le \frac{t - d}{c}\right) \quad \text{(algebra)} \tag{2.23}$$

$$= F_X\left(\frac{t - d}{c}\right) \quad \text{(definition of } F_X\text{)} \tag{2.24}$$

Therefore

[3] Note that other parametric families, notably the Cauchy, also have bell shapes. The difference lies in the rate at which the tails of the distribution go to 0. However, due to the Central Limit Theorem, to be presented below, the normal family is of prime interest.

[4] Remember, this is a synonym for expected value.


$$f_Y(t) = \frac{d}{dt} F_Y(t) \quad \text{(definition of } f_Y\text{)} \tag{2.25}$$

$$= \frac{d}{dt} F_X\left(\frac{t-d}{c}\right) \quad \text{(from (2.24))} \tag{2.26}$$

$$= f_X\left(\frac{t-d}{c}\right) \cdot \frac{d}{dt}\,\frac{t-d}{c} \quad \text{(definition of } f_X \text{ and the Chain Rule)} \tag{2.27}$$

$$= \frac{1}{c} \cdot \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-0.5\left(\frac{\frac{t-d}{c}-\mu}{\sigma}\right)^2} \quad \text{(from (2.20))} \tag{2.28}$$

$$= \frac{1}{\sqrt{2\pi}\,(c\sigma)}\, e^{-0.5\left(\frac{t-(c\mu+d)}{c\sigma}\right)^2} \quad \text{(algebra)} \tag{2.29}$$

That last expression is the $N(c\mu+d, c^2\sigma^2)$ density, so we are done!

Evaluating the Normal cdf

The function in (2.20) does not have a closed-form indefinite integral. Thus probabilities involving normal random variables must be approximated. Traditionally, this is done with a table for the cdf of N(0,1). This one table is sufficient for the entire normal family, because if X has the distribution $N(\mu,\sigma^2)$ then

$$\frac{X-\mu}{\sigma} \tag{2.30}$$

has a N(0,1) distribution too, due to the affine transformation closure property discussed above.

By the way, the N(0,1) cdf is traditionally denoted by $\Phi$. As noted, traditionally it has played a central role, as one could transform any probability involving some normal distribution to an equivalent probability involving N(0,1). One would then use a table of N(0,1) to find the desired probability.

Nowadays, probabilities for any normal distribution, not just N(0,1), are easily available by computer. In the R statistical package, the normal cdf for any mean and variance is available via the function pnorm(). Note that pnorm() takes the standard deviation, not the variance, as its third argument. We'll use both methods in our first couple of examples below.

2.3.2.2 Example: Network Intrusion

As an example, let's look at a simple version of the network intrusion problem. Suppose we have found that in Jill's remote logins to a certain computer, the number of disk sectors she reads or writes, X, has a normal distribution with a mean of 500 and a standard deviation of 15. Say our network intrusion monitor finds that Jill (or someone posing as her) has logged in and has read or written 535 sectors. Should we be suspicious?

To answer this question, let's find $P(X \ge 535)$. Let Z = (X - 500)/15. From our discussion above, we know that Z has a N(0,1) distribution, so

$$P(X \ge 535) = P\left(Z \ge \frac{535-500}{15}\right) = 1 - \Phi(35/15) \approx 0.01 \tag{2.31}$$


Again, traditionally we would obtain that 0.01 value from a N(0,1) cdf table in a book. With R, we would just use the function pnorm():

    > 1 - pnorm(535,500,15)
    [1] 0.009815329

Anyway, that 0.01 probability makes us suspicious. While it could really be Jill, this would be unusual behavior for Jill, so we start to suspect that it isn't her. Of course, this is a very crude analysis, and real intrusion detection systems are much more complex, but you can see the main ideas here.

2.3.2.3 The Central Limit Theorem

The Central Limit Theorem (CLT) says, roughly speaking, that a random variable which is a sum of many components will have an approximate normal distribution. [5]

So, for instance, human weights are approximately normally distributed, since a person is made of many components. The same is true for SAT test scores, [6] as the total score is the sum of scores on the individual problems.

Binomially distributed random variables, though discrete, also are approximately normally distributed. This comes from the fact that if, say, T has a binomial distribution with n trials, then we can write $T = T_1 + ... + T_n$, where $T_i$ is 1 for a success and 0 for a failure. Since we have a sum, the CLT applies. Thus we use the CLT if we have binomial distributions with large n.

2.3.2.4 Example: Coin Tosses

For example, let's find the approximate probability of getting more than 12 heads in 20 tosses of a coin. X, the number of heads, has a binomial distribution with n = 20 and p = 0.5. Its mean and variance are then np = 10 and np(1-p) = 5. So, let $Z = (X - 10)/\sqrt{5}$, and write

$$P(X > 12) = P\left(Z > \frac{12-10}{\sqrt{5}}\right) \approx 1 - \Phi(0.894) = 0.186 \tag{2.32}$$

Or:

    > 1 - pnorm(12,10,sqrt(5))
    [1] 0.1855467

The exact answer is 0.132. Remember, the reason we could do this was that X is approximately normal, from the CLT. This is an approximation of the distribution of a discrete random variable by a continuous one, which introduces additional error.

[5] There are many versions of the CLT. The basic one requires that the summands be independent and identically distributed, but more advanced versions are broader in scope.

[6] This refers to the raw scores, before scaling by the testing company.
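The exact value quoted for the coin-toss example can be obtained from the binomial cdf, which R provides as pbinom(). The sketch below simply places the exact probability next to the normal approximation of (2.32); the variable names are illustrative.

```r
# P(X > 12) for X ~ binomial(n = 20, p = 0.5): exact versus normal approximation
n <- 20
p <- 0.5
mu    <- n * p                    # mean np = 10
sigma <- sqrt(n * p * (1 - p))    # standard deviation sqrt(5)

exact  <- 1 - pbinom(12, size = n, prob = p)    # sums the pmf over 13, ..., 20
approx <- 1 - pnorm(12, mean = mu, sd = sigma)  # the value used in (2.32)

print(c(exact = exact, approx = approx, error = approx - exact))
```

The difference between the two numbers is the price paid for replacing a discrete distribution by a continuous one at this fairly small n.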


We can get better accuracy by accounting for the fact that X is discrete, replacing 12 by 12.5 above. (Think of the number 13 "owning" the region between 12.5 and 13.5.) This is customary, and in this case gives us 0.1317762, while the exact answer to six decimal places is 0.131588. This is called the correction for continuity. Of course, for larger n this adjustment is not necessary.

2.3.2.5 Museum Demonstration

Many science museums have the following visual demonstration of the CLT.

There are many balls in a chute, with a triangular array of r rows of pins beneath the chute. Each ball falls through the rows of pins, bouncing left and right with probability 0.5 each, eventually being collected into one of r+1 bins, numbered 0 to r. A ball will end up in bin i if it bounces rightward in i of the r rows of pins, i = 0,1,...,r. Key point:

Let X denote the bin number at which a ball ends up. X is the number of rightward bounces ("successes") in r rows ("trials"). Therefore X has a binomial distribution with n = r and p = 0.5.

Each bin is wide enough for only one ball, so the balls in a bin will stack up. And since there are many balls, the height of the stack in bin i will be approximately proportional to P(X = i). And since the latter will be approximately given by the CLT, the stacks of balls will roughly look like the famous bell-shaped curve!

There are many online simulations of this museum demonstration, such as http://www.rand.org/statistics/applets/clt.html and http://www.jcu.edu/math/isep/Quincunx/Quincunx.html. By collecting the balls in bins, the apparatus basically simulates a histogram for X, which will then be approximately bell-shaped.

2.3.2.6 Optional topic: Formal Statement of the CLT

Definition 1 A sequence of random variables $L_1, L_2, L_3, ...$ converges in distribution to a random variable M if

$$\lim_{n \to \infty} P(L_n \le t) = P(M \le t), \quad \text{for all } t \tag{2.33}$$

Note, by the way, that these random variables need not be defined on the same probability space.

The formal statement of the CLT is:

Theorem 2 Suppose $X_1, X_2, ...$ are independent random variables, all having the same distribution which has mean m and variance $v^2$. Then

$$Z = \frac{X_1 + ... + X_n - nm}{v\sqrt{n}} \tag{2.34}$$

converges in distribution to a N(0,1) random variable.
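Theorem 2 can be illustrated numerically. The sketch below standardizes sums of i.i.d. U(0,1) variables as in (2.34) and compares the empirical values of $P(Z \le t)$ with the N(0,1) cdf at a few points. The choice of U(0,1) summands, n = 50, the seed, the number of replications and the points t are all assumptions made for illustration.

```r
# Sketch: standardized sums of n i.i.d. U(0,1) variables versus N(0,1).
set.seed(2)
n     <- 50          # number of summands (arbitrary choice)
nreps <- 20000       # number of simulated sums
m     <- 0.5         # mean of U(0,1)
v     <- sqrt(1/12)  # standard deviation of U(0,1)

x <- matrix(runif(nreps * n), nrow = nreps)  # one row per simulated sum
z <- (rowSums(x) - n * m) / (v * sqrt(n))    # the quantity Z in (2.34)

tvals     <- c(-1.5, 0, 1, 2)
empirical <- sapply(tvals, function(t) mean(z <= t))
print(round(cbind(t = tvals, empirical = empirical, normal = pnorm(tvals)), 3))
```

Repeating the experiment with smaller n, or with a skewed summand distribution such as rexp(), gives a feel for how quickly the convergence sets in.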


2.3.2.7 Importance in Modeling

Normal distributions play a key role in statistics. Most of the classical statistical procedures assume that one has sampled from a population having an approximately normal distribution. This should come as no surprise, knowing the CLT. The latter implies that many things in nature do have approximate normal distributions.

2.3.3 The Chi-Square Family of Distributions

2.3.3.1 Density and Properties

Let $Z_1, Z_2, ..., Z_k$ be independent N(0,1) random variables. Then the distribution of

$$Y = Z_1^2 + ... + Z_k^2 \tag{2.35}$$

is called chi-square with k degrees of freedom. We write such a distribution as $\chi^2_k$. Chi-square is a one-parameter family of distributions.

It turns out that chi-square is a special case of the gamma family in Section 2.3.5 below, with r = k/2 and $\lambda = 0.5$.

2.3.3.2 Importance in Modeling

This distribution is used widely in statistical applications. As will be seen in our chapters on statistics, many statistical methods involve a sum of squared normal random variables. [7]

2.3.4 The Exponential Family of Distributions

2.3.4.1 Density and Properties

The densities in this family have the form

$$f_W(t) = \lambda e^{-\lambda t}, \quad 0 < t < \infty \tag{2.36}$$

2.3.4.2 Connection to the Poisson Distribution Family

Suppose the lifetimes of a set of light bulbs are independent and identically distributed (i.i.d.), and consider the following process. At time 0, we install a light bulb, which burns an amount of time $X_1$. Then we install a second light bulb, with lifetime $X_2$. Then a third, with lifetime $X_3$, and so on.

Let

$$T_r = X_1 + ... + X_r \tag{2.37}$$

denote the time of the r-th replacement. Also, let N(t) denote the number of replacements up to and including time t. [9] Then it can be shown that if the common distribution of the $X_i$ is exponential, then N(t) has a Poisson distribution with mean $\lambda t$. And the converse is true too: If the $X_i$ are independent and identically distributed and N(t) is Poisson, then the $X_i$ must have exponential distributions.

In other words, N(t) will have a Poisson distribution if and only if the lifetimes are exponentially distributed. You can see the "only if" part quickly, by the following argument. First, note that

$$P(X_1 > t) = P[N(t) = 0] = e^{-\lambda t} \tag{2.38}$$

Then

$$f_{X_1}(t) = \frac{d}{dt}\left(1 - e^{-\lambda t}\right) = \lambda e^{-\lambda t} \tag{2.39}$$

The collection of random variables N(t), $t \ge 0$, is called a Poisson process.

The relation $E[N(t)] = \lambda t$ says that replacements are occurring at an average rate of $\lambda$ per unit time. Thus $\lambda$ is called the intensity parameter of the process. It is this "rate" interpretation that makes $\lambda$ a natural indexing parameter in (2.36).

2.3.4.3 Importance in Modeling

Many distributions in real life have been found to be approximately exponentially distributed. A famous example is the lifetimes of air conditioners on airplanes. Another famous example is interarrival times, such as customers coming into a bank or messages going out onto a computer network. It is used in software reliability studies too.

2.3.5 The Gamma Family of Distributions

2.3.5.1 Density and Properties

Recall Equation (2.37), in which the random variable $T_r$ was defined to be the time of the r-th light bulb replacement. $T_r$ is the sum of r independent exponentially distributed random variables with parameter $\lambda$.

[9] Again, since the replacement times $T_r$ are continuous random variables, the phrase "and including" is unnecessary here.
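The link between exponential lifetimes and Poisson counts described in Section 2.3.4.2 can be explored by simulation. In the sketch below, bulbs with exponential lifetimes are replaced one after another, and the number of replacements by time t is recorded over many runs. The rate 0.5, the window length 10, the cap of 100 bulbs per run and the helper name count_replacements() are all illustrative assumptions.

```r
# Sketch: with exponential lifetimes, N(t) should be Poisson with mean lambda * t.
set.seed(3)
lambda <- 0.5    # replacements per unit time (assumed value)
t_end  <- 10     # length of the observation window (assumed value)
nreps  <- 10000

# One run of the light bulb process; returns N(t_end).
# max_bulbs is chosen large enough that running out of bulbs is negligible.
count_replacements <- function(lambda, t_end, max_bulbs = 100) {
  lifetimes     <- rexp(max_bulbs, rate = lambda)  # X_1, X_2, ...
  burnout_times <- cumsum(lifetimes)               # T_1, T_2, ... as in (2.37)
  sum(burnout_times <= t_end)
}

counts <- replicate(nreps, count_replacements(lambda, t_end))

print(c(sim_mean = mean(counts), poisson_mean = lambda * t_end))
print(c(sim_var  = var(counts),  poisson_var  = lambda * t_end))
print(c(sim_p5   = mean(counts == 5), poisson_p5 = dpois(5, lambda * t_end)))
```

A Poisson distribution has equal mean and variance, so agreement of both with $\lambda t$ is a useful sanity check beyond the mean alone.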


The distribution of $T_r$ is called an Erlang distribution, with density

$$f_{T_r}(t) = \frac{1}{(r-1)!}\, \lambda^r t^{r-1} e^{-\lambda t}, \quad t > 0 \tag{2.40}$$

This is a two-parameter family.

We can generalize this by allowing r to take noninteger values, by defining a generalization of the factorial function:

$$\Gamma(r) = \int_0^\infty x^{r-1} e^{-x}\, dx \tag{2.41}$$

This is called the gamma function, and it gives us the gamma family of distributions, more general than the Erlang:

$$f_W(t) = \frac{1}{\Gamma(r)}\, \lambda^r t^{r-1} e^{-\lambda t}, \quad t > 0 \tag{2.42}$$

(Note that $\Gamma(r)$ is merely serving as the constant that makes the density integrate to 1.0. It doesn't have meaning of its own.)

This is again a two-parameter family, with r and $\lambda$ as parameters.

A gamma distribution has mean $r/\lambda$ and variance $r/\lambda^2$. In the case of integer r, this follows from (2.37) and the fact that an exponentially distributed random variable has mean $1/\lambda$ and variance $1/\lambda^2$, and it can be derived in general. Note again that the gamma reduces to the exponential when r = 1.

Recall from above that the gamma distribution, or at least the Erlang, arises as a sum of independent random variables. Thus the Central Limit Theorem implies that the gamma distribution should be approximately normal for large (integer) values of r. We see in Figure 2.2 that even with r = 10 it is rather close to normal.

It also turns out that the chi-square distribution with d degrees of freedom is a gamma distribution, with r = d/2 and $\lambda = 0.5$.

2.3.5.2 Example: Network Buffer

Suppose in a network context (not our ALOHA example), a node does not transmit until it has accumulated five messages in its buffer. Suppose the times between message arrivals are independent and exponentially distributed with mean 100 milliseconds. Let's find the probability that more than 552 ms will pass before a transmission is made, starting with an empty buffer.

Let $X_1$ be the time until the first message arrives, $X_2$ the time from then to the arrival of the second message, and so on. Then the time until we accumulate five messages is $Y = X_1 + ... + X_5$. Then from the definition of the gamma family, we see that Y has a gamma distribution with r = 5 and $\lambda = 0.01$. Then

$$P(Y > 552) = \int_{552}^\infty \frac{1}{4!}\, 0.01^5\, t^4 e^{-0.01t}\, dt \tag{2.43}$$


This integral could be evaluated via repeated integration by parts, but let's use R instead:

    > 1 - pgamma(552,5,0.01)
    [1] 0.3544101

2.3.5.3 Importance in Modeling

As seen in (2.37), sums of exponentially distributed random variables often arise in applications. Such sums have gamma distributions.

You may ask what the meaning is of a gamma distribution in the case of noninteger r. There is no particular meaning, but when we have a real data set, we often wish to summarize it by fitting a parametric family to it, meaning that we try to find a member of the family that approximates our data well.

In this regard, the gamma family provides us with densities which rise near t = 0, then gradually decrease to 0 as t becomes large, so the family is useful if our data seem to look like this. Graphs of some gamma densities are shown in Figure 2.2.

2.4 Describing "Failure"

In addition to density functions, another useful description of a distribution is its hazard function. Again think of the lifetimes of light bulbs, not necessarily assuming an exponential distribution. Intuitively, the hazard function states the likelihood of a bulb failing in the next short interval of time, given that it has lasted up to now. To understand this, let's first talk about a certain property of the exponential distribution family.

2.4.1 Memoryless Property

One of the reasons the exponential family of distributions is so famous is that it has a property that makes many practical stochastic models mathematically tractable: The exponential distributions are memoryless. What this means is that for positive t and u

$$P(W > t + u \mid W > t) = P(W > u) \tag{2.44}$$

Let's derive this:


Figure 2.2: Various Gamma Densities
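Curves of the kind shown in Figure 2.2 can be produced with R's dgamma(). The sketch below is not a reconstruction of the actual figure: the shape values r = 1, 5, 10 and the common rate $\lambda = 1$ are assumptions chosen for illustration. It also measures how far the r = 10 density is from the normal density with the same mean and variance.

```r
# Sketch: evaluate (and, in an interactive session, plot) some gamma densities.
# The shapes and the rate below are illustrative assumptions.
lambda <- 1
shapes <- c(1, 5, 10)
tgrid  <- seq(0, 30, by = 0.1)

# One column of density values per shape parameter r
dens <- sapply(shapes, function(r) dgamma(tgrid, shape = r, rate = lambda))

if (interactive()) {
  matplot(tgrid, dens, type = "l", lty = 1:3, col = 1,
          xlab = "t", ylab = "density")
  legend("topright", legend = paste("r =", shapes), lty = 1:3)
}

# Normal density with the same mean r/lambda and variance r/lambda^2, for r = 10
norm10  <- dnorm(tgrid, mean = 10 / lambda, sd = sqrt(10) / lambda)
max_gap <- max(abs(dens[, 3] - norm10))
print(max_gap)
```

With r = 1 the curve is the exponential density; as r grows the peak moves right and the shape becomes more symmetric.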


$$P(W > t + u \mid W > t) = \frac{P(W > t + u \text{ and } W > t)}{P(W > t)} \tag{2.45}$$

$$= \frac{P(W > t + u)}{P(W > t)} \tag{2.46}$$

$$= \frac{\int_{t+u}^\infty \lambda e^{-\lambda s}\, ds}{\int_t^\infty \lambda e^{-\lambda s}\, ds} \tag{2.47}$$

$$= e^{-\lambda u} \tag{2.48}$$

$$= P(W > u) \tag{2.49}$$

We say that this means that "time starts over" at time t, or that W "doesn't remember" what happened before time t.

It is difficult for the beginning modeler to fully appreciate the memoryless property. Let's make it concrete. Consider the problem of waiting to cross the railroad tracks on Eighth Street in Davis, just west of J Street. One cannot see down the tracks, so we don't know whether the end of the train will come soon or not.

If we are driving, the issue at hand is whether to turn off the car's engine. If we leave it on, and the end of the train does not come for a long time, we will be wasting gasoline; if we turn it off, and the end does come soon, we will have to start the engine again, which also wastes gasoline. (Or, we may be deciding whether to stay there, or go way over to the Covell Rd. railroad overpass.)

Suppose our policy is to turn off the engine if the end of the train won't come for at least s seconds. Suppose also that we arrived at the railroad crossing just when the train first arrived, and we have already waited for r seconds. Will the end of the train come within s more seconds, so that we will keep the engine on? If the length of the train were exponentially distributed (if there are typically many cars, we can model it as continuous even though it is discrete), Equation (2.45) would say that the fact that we have waited r seconds so far is of no value at all in predicting whether the train will end within the next s seconds. The chance of it lasting at least s more seconds right now is no more and no less than the chance it had of lasting at least s seconds when it first arrived.

The memorylessness of exponential distributions implies that a Poisson process N(t) also has a "time starts over" property (called the Markov property). Recall our example in Section 2.3.4.2 in which N(t) was the number of light bulb burnouts up to time t. The memorylessness property means that if we start counting afresh from time, say z, then the numbers of burnouts after time z, i.e. Q(u) = N(z+u) - N(z), also is a Poisson process. In other words, Q(u) has a Poisson distribution with mean $\lambda u$, so the new process again has intensity parameter $\lambda$. Moreover, Q(u) is independent of N(t) for any t < z.

2.4.2 Hazard Functions

2.4.2.1 Basic Concepts

Suppose the lifetimes of light bulbs L were discrete. Suppose a particular bulb has already lasted 80 hours. The probability of it failing in the next hour would be

$$P(L = 81 \mid L > 80) = \frac{P(L = 81)}{P(L > 80)} = \frac{p_L(81)}{1 - F_L(80)} \tag{2.50}$$

By analogy, for continuous L we define

$$h_L(t) = \frac{f_L(t)}{1 - F_L(t)} \tag{2.51}$$

Again, the interpretation is that $h_L(t)$ is the likelihood of the item failing very soon after t, given that it has lasted t amount of time.

Note carefully that the word "failure" here should not be taken literally. In our Davis railroad crossing example above, "failure" means that the train ends, a "failure" which those of us who are waiting will welcome!

Since we know that exponentially distributed random variables are memoryless, we would expect intuitively that their hazard functions are constant. We can verify this by evaluating (2.51) for an exponential density with parameter $\lambda$; sure enough, the hazard function is constant, with value $\lambda$.

The reader should verify that in contrast to an exponential distribution's constant failure rate, a uniform distribution has an increasing failure rate (IFR). Some distributions have decreasing failure rates, while most have non-monotone rates.

Hazard function models have been used extensively in software testing. Here "failure" is the discovery of a bug, and quantities of interest include the mean time until the next bug is discovered, and the total number of bugs.

People have what is called a "bathtub-shaped" hazard function. It is high near 0 (reflecting infant mortality) and after, say, 70, but is low and rather flat in between.

You may have noticed that the right-hand side of (2.51) is the derivative of $-\ln[1 - F_L(t)]$. Therefore

$$\int_0^t h_L(s)\, ds = -\ln[1 - F_L(t)] \tag{2.52}$$

so that

$$1 - F_L(t) = e^{-\int_0^t h_L(s)\, ds} \tag{2.53}$$


and thus [10]

$$f_L(t) = h_L(t)\, e^{-\int_0^t h_L(s)\, ds} \tag{2.54}$$

In other words, just as we can find the hazard function knowing the density, we can also go in the reverse direction. This establishes that there is a one-to-one correspondence between densities and hazard functions.

This may guide our choice of parametric family for modeling some random variable. We may not only have a good idea of what general shape the density takes on, but may also have an idea of what the hazard function looks like. These two pieces of information can help guide us in our choice of model.

2.4.3 Example: Software Reliability Models

Hazard function models have been used successfully to model the "arrivals" (i.e. discoveries) of bugs in software. Questions that arise are, for instance, "When are we ready to ship?", meaning when can we believe with some confidence that most bugs have been found?

Typically one collects data on bug discoveries from a number of projects of similar complexity, and estimates the hazard function from that data.

See for example Accurate Software Reliability Estimation, by Jason Allen Denton, Dept. of Computer Science, Colorado State University, 1999, and the many references therein.

2.5 A Cautionary Tale: the Bus Paradox

Suppose you arrive at a bus stop, at which buses arrive according to a Poisson process with intensity parameter 0.1, i.e. 0.1 arrival per minute. Recall that this means that the interarrival times have an exponential distribution with mean 10 minutes. What is the expected value of your waiting time until the next bus?

Well, our first thought might be that since the exponential distribution is memoryless, "time starts over" when we reach the bus stop. Therefore our mean wait should be 10.

On the other hand, we might think that on average we will arrive halfway between two consecutive buses. Since the mean time between buses is 10 minutes, the halfway point is at 5 minutes. Thus it would seem that our mean wait should be 5 minutes.

Which analysis is correct? Actually, the correct answer is 10 minutes. So, what is wrong with the second analysis, which concluded that the mean wait is 5 minutes? The problem is that the second analysis did not take into account the fact that although inter-bus intervals have an exponential distribution with mean 10, the particular inter-bus interval that we encounter is special.

Imagine a bag full of sticks, of different lengths. We reach into the bag and choose a stick at random. The key point is that not all pieces are equally likely to be chosen; the longer pieces will have a greater chance of being selected. [11] The formal name for this is length-biased sampling.

Similarly, the particular inter-bus interval that we hit is likely to be a longer interval. To see this, suppose we observe the comings and goings of buses for a very long time, and plot their arrivals on a time line on a wall. In some cases two successive marks on the time line are close together, sometimes far apart. If we were to stand far from the wall and throw a dart at it, we would hit the interval between some pair of consecutive marks. Intuitively we are more apt to hit a wider interval than a narrower one.

Once one recognizes this and carefully finds the density of that interval, we discover that that interval does indeed tend to be longer, so much so that the expected value of this interval is 20 minutes! In other words, if we throw a dart at the wall, say, 1000 times, the mean of the 1000 intervals we would hit would be about 20. This is in contrast to the mean of all of the intervals on the wall, which would be 10.

Thus the halfway point comes at 10 minutes, consistent with the analysis which appealed to the memoryless property.

Actually, we can intuitively reason out what the density is of the length of the particular inter-bus interval that we hit, as follows. First consider the bag-of-sticks example, and suppose (somewhat artificially) that stick length X is a discrete random variable. Let Y denote the length of the stick that we pick. Suppose that, say, stick lengths 2 and 6 each comprise 10% of the sticks in the bag, i.e.

$$p_X(2) = p_X(6) = 0.1 \tag{2.55}$$

Intuitively, one would then reason that

$$p_Y(6) = 3\, p_Y(2) \tag{2.56}$$

In other words, the sticks of length 2 are just as numerous as those of length 6, but since the latter are three times as long, they should have triple the chance of being chosen.

Note that this is not some absolute physical law. Different people might draw sticks from the bag in different ways. But it is a reasonable model.

Now let X denote interarrival times between buses, and Y denote the interarrival time that we hit. The analog of (2.56) would be that $f_Y(t)$ is proportional to $t f_X(t)$, i.e.

$$f_Y(t) = c\, t\, f_X(t) \tag{2.57}$$

for some constant c. Recalling that $f_Y$ must integrate to 1, we see that

$$c = \left[\int_0^\infty t f_X(t)\, dt\right]^{-1} \tag{2.58}$$

But that integral is just EX! The latter quantity is 10, and

$$f_X(t) = 0.1 e^{-0.1t} \tag{2.59}$$

[10] Recall that the derivative of the integral of a function is the original function!

[11] Another example was suggested to me by UCD grad student Shubhabrata Sengupta: Think of a large parking lot on which hundreds of buckets are placed of various diameters. We throw a ball high into the sky, and see what size bucket it lands in. Here the density would be proportional to the square of the diameter.


So,

$$f_Y(t) = 0.01\, t\, e^{-0.1t} \tag{2.60}$$

You may recognize this as an Erlang density, with r = 2 and $\lambda = 0.1$.

2.6 Choosing a Model

The parametric families presented here are often used in the real world. As indicated previously, this may be done on an empirical basis. We would collect data on a random variable X, and plot the frequencies of its values in a histogram. If for example the plot looks roughly like the curves in Figure 2.2, we could choose this as the family for our model.

Or, our choice may arise from theory. If for instance our knowledge of the setting in which we are working says that our distribution is memoryless, that forces us to use the exponential density family.

In either case, the question as to which member of the family we choose will be settled by using some kind of procedure which finds the member of the family which best fits our data. We will discuss this in detail in our chapters on statistics.

Note that we may choose not to use a parametric family at all. We may simply find that our data does not fit any of the common parametric families (there are many others than those presented here) very well. Procedures that do not assume any parametric family are termed nonparametric.

2.7 A General Method for Simulating a Random Variable

Suppose we wish to simulate a random variable X with cdf $F_X$ for which there is no R function. This can be done via $F_X^{-1}(U)$, where U has a U(0,1) distribution. In other words, we call runif() and then plug the result into the inverse of the cdf of X. Here "inverse" is in the sense that, for instance, squaring and "square-rooting," exp() and ln(), etc. are inverse operations of each other.

For example, say X has the density 2t on (0,1). Then $F_X(t) = t^2$, so $F_X^{-1}(s) = s^{0.5}$. We can then generate X in R as sqrt(runif(1)). Here's why:

For brevity, denote $F_X^{-1}$ as G and $F_X$ as H. Our generated random variable is G(U). Then

$$P[G(U) \le t] = P[U \le G^{-1}(t)] = P[U \le H(t)] = H(t) \tag{2.61}$$

In other words, the cdf of G(U) is $F_X$! So, G(U) has the same distribution as X.
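The inverse-cdf method of Section 2.7 can be tried out directly on the example with density 2t on (0,1). The sketch below wraps the idea in a small helper; the name rinv() is hypothetical, introduced here only for illustration, and the seed and sample size are arbitrary. The simulated values are then compared with two quantities that follow from the density: $EX = \int_0^1 t \cdot 2t\, dt = 2/3$ and $F_X(0.5) = 0.25$.

```r
# Sketch of the inverse-cdf method: plug U(0,1) variates into the inverse cdf.
set.seed(4)

# rinv() is a hypothetical helper; Finv must be the inverse of the target cdf.
rinv <- function(n, Finv) Finv(runif(n))

# Density 2t on (0,1): F(t) = t^2, so the inverse cdf is the square root
x <- rinv(100000, sqrt)

sim_mean <- mean(x)         # compare with E(X) = 2/3
sim_cdf  <- mean(x <= 0.5)  # compare with F(0.5) = 0.25
print(c(sim_mean = sim_mean, sim_cdf = sim_cdf))
```

The same helper works for any distribution whose inverse cdf can be written down, for instance function(s) -log(1 - s) / lambda for an exponential distribution with parameter lambda.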


Note that this method, though valid, is not necessarily practical, since computing $F_X^{-1}$ may not be easy.

Exercises

1. Suppose X has a uniform distribution on (-1,1), and let $Y = X^2$. Find $f_Y$. Hint: First find $F_Y(t)$.

2. "All that glitters is not gold," and not every bell-shaped density is normal. The family of Cauchy distributions, having density

$$f_X(t) = \frac{1}{\pi c}\, \frac{1}{1 + \left(\frac{t-b}{c}\right)^2}, \quad -\infty < t < \infty$$

[...] $> 0.7$. Hint: $P(W > 0.7) = P(W^2 > 0.49)$.


6. Suppose a manufacturer of some electronic component finds that its lifetime is exponentially distributed with mean 10000 hours. They give a refund if the item fails before 500 hours. Let N be the number of items they have sold, up to and including the one on which they make the first refund. Find EN and Var(N).

7. For the density $a \exp(-bt)$, t > 0, show that we must have a = b. Then show that the mean and variance for this distribution are 1/b and $1/b^2$, respectively.

8. Consider the "random bucket" example in Footnote 11. Suppose bucket diameter D, measured in meters, has a uniform distribution on (1,2). Let W denote the diameter of the bucket in which the tossed ball lands.

(a) Find the density, mean and variance of W, and also P(W > 1.5).

(b) Write an R function that will generate random variates having the distribution of W.

9. Suppose that computer roundoff error in computing the square roots of numbers in a certain range is distributed uniformly on (-0.5,0.5), and that we will be computing the sum of n such square roots. Find a number c such that the probability is approximately 95% that the sum is in error by no more than c.

A certain public parking garage charges parking fees of $1.50 for the first hour (or fraction thereof), and $1 per hour after that. So, someone who stays 57 minutes pays $1.50, someone who parks for one hour and 12 minutes pays $1.70, and so on. Suppose parking times T are exponentially distributed with mean 1.5 hours. Let W denote the total fee paid. Find E(W) and Var(W).

10. In Section 2.4.1, we showed that the exponential distribution is memoryless. In fact, it is the only continuous distribution with that property. Show that the U(0,1) distribution does NOT have that property. To do this, evaluate both sides of (2.44).


Chapter 3

Multivariate Probability Models

3.1 Multivariate Distributions

3.1.1 Why Are They Needed?

Most applications of probability and statistics involve the interaction between variables. For instance, when you buy a book at Amazon.com, the software will likely inform you of other books that people bought in conjunction with the one you selected. Amazon is relying on the fact that sales of certain pairs or groups of books are correlated.

Individual pmfs $p_X$ and densities $f_X$ don't describe these correlations. We need something more. We need ways to describe multivariate distributions.

3.1.2 Discrete Case

Say we roll a blue die and a yellow one. Let X and Y denote the number of dots which appear on the blue and yellow dice, respectively, and let S denote the total number of dots appearing on the two dice. We will not discuss Y much here, focusing on X and S.

Recall that the distribution of X is defined to be a list of all the values X takes on, and their associated probabilities:

$$\left\{ \left(1,\tfrac{1}{6}\right), \left(2,\tfrac{1}{6}\right), \left(3,\tfrac{1}{6}\right), \left(4,\tfrac{1}{6}\right), \left(5,\tfrac{1}{6}\right), \left(6,\tfrac{1}{6}\right) \right\} \tag{3.1}$$

We can write this more compactly (but equivalently) by defining X's probability mass function (pmf):

$$p_X(i) = P(X = i) = \frac{1}{6}, \quad i = 1, 2, ..., 6 \tag{3.2}$$


The distribution of S is defined similarly, either as a list,

$$\left\{ \left(2,\tfrac{1}{36}\right), \left(3,\tfrac{2}{36}\right), \left(4,\tfrac{3}{36}\right), \left(5,\tfrac{4}{36}\right), \left(6,\tfrac{5}{36}\right), \left(7,\tfrac{6}{36}\right), \left(8,\tfrac{5}{36}\right), \left(9,\tfrac{4}{36}\right), \left(10,\tfrac{3}{36}\right), \left(11,\tfrac{2}{36}\right), \left(12,\tfrac{1}{36}\right) \right\} \tag{3.3}$$

or via its pmf $p_S$. [1]

But it may also be important to describe how X and S vary jointly. For example, intuitively we would feel that X and S are positively correlated. How do we describe their joint variation?

To do this, we define the bivariate probability mass function of X and S. Just as the univariate pmf of X is defined to be $p_X(i) = P(X = i)$, we define the bivariate pmf as

$$p_{X,S}(i,j) = P(X = i, S = j) = \frac{1}{36}, \quad i = 1, 2, ..., 6; \; j = i+1, ..., i+6 \tag{3.4}$$

Expected values are calculated in the analogous manner. Recall that for a function g of X

$$E[g(X)] = \sum_i g(i)\, p_X(i) \tag{3.5}$$

So, for any function g of two discrete random variables U and V, define

$$E[g(U,V)] = \sum_{i,j} g(i,j)\, p_{U,V}(i,j) \tag{3.6}$$

For instance:

$$E(XS) = \sum_{i=1}^{6} \sum_{j=2}^{12} ij\, p_{X,S}(i,j) = \sum_{i=1}^{6} \sum_{j=i+1}^{i+6} ij\, \frac{1}{36} \tag{3.7}$$

The univariate pmfs, called marginal pmfs, can of course be recovered from the bivariate pmf. To get $p_X$ from $p_{X,S}$, we sum over the values of S. For example, let's find $p_X(3)$, which is the probability that X = 3. How could the event X = 3 happen? Well, S could be anywhere from 4 to 9, and each of the six pairs (3,j) has probability 1/36. So,

$$p_X(3) = \sum_{j=4}^{9} p_{X,S}(3,j) = 6 \cdot \frac{1}{36} = \frac{1}{6} \tag{3.8}$$

That is consistent with our univariate calculation of $p_X(3)$, as of course it should be.

[1] Recall that the convention for denoting pmfs is to use the letter 'p' with a subscript indicating the random variable.
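The bivariate pmf (3.4) is small enough to tabulate, which makes the marginal and expected-value calculations above easy to reproduce. The sketch below stores $p_{X,S}$ as a 6-by-12 matrix (rows indexed by the value of X, columns by the value of S); the matrix layout and variable names are choices made here for illustration.

```r
# Sketch: tabulate the bivariate pmf of (X, S) for two fair dice.
# pXS[i, j] holds P(X = i, S = j); column 1 stays 0 since S is at least 2.
pXS <- matrix(0, nrow = 6, ncol = 12, dimnames = list(X = 1:6, S = 1:12))
for (i in 1:6) {
  for (y in 1:6) {          # y is the yellow die, so S = i + y
    pXS[i, i + y] <- pXS[i, i + y] + 1/36
  }
}

pX <- rowSums(pXS)   # marginal pmf of X: sum over the values of S, as in (3.8)
pS <- colSums(pXS)   # marginal pmf of S, matching the list in (3.3)

# E(XS) via (3.6): sum of i * j * p(i, j) over all cells
EXS <- sum(outer(1:6, 1:12) * pXS)

print(pX)
print(round(pS * 36))   # numerators over 36
print(EXS)
```

Summing rows or columns of the table is exactly the "summing out" operation that produces marginal pmfs.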


We get consistent results for expected values too. Treating X as a function of X and S, we have

$$E(X) = \sum_{i=1}^{6} \sum_{j=i+1}^{i+6} i\, p_{X,S}(i,j) \tag{3.9}$$

but the right-hand side (RHS) of (3.9) reduces to

$$E(X) = \sum_{i=1}^{6} i \sum_{j=i+1}^{i+6} p_{X,S}(i,j) = \sum_{i=1}^{6} i\, p_X(i) \tag{3.10}$$

from (3.8). The last expression in (3.10) is E(X) as defined in the univariate setting, so everything is indeed consistent.

3.1.3 Multivariate Densities

3.1.3.1 Motivation and Definition

Extending our previous definition of cdf for a single variable, we define the two-dimensional cdf for a pair of random variables X and Y as

$$F_{X,Y}(u,v) = P(X \le u \text{ and } Y \le v) \tag{3.11}$$

If X and Y were discrete, we would evaluate that cdf via a double sum of their bivariate pmf. You may have guessed by now that the analog for continuous random variables would be a double integral, and it is. The integrand is the bivariate density:

$$f_{X,Y}(u,v) = \frac{\partial^2}{\partial u\, \partial v} F_{X,Y}(u,v) \tag{3.12}$$

Densities in higher dimensions are defined similarly.

As in the univariate case, a bivariate density shows which regions of the X-Y plane occur more frequently, and which occur less frequently.

3.1.3.2 Use of Multivariate Densities in Finding Probabilities and Expected Values

Again by analogy, for any region A in the X-Y plane,

$$P[(X,Y) \in A] = \iint_A f_{X,Y}(u,v)\, du\, dv \tag{3.13}$$

So, just as probabilities involving a single variable X are found by integrating $f_X$ over the region in question, for probabilities involving X and Y, we take the double integral of $f_{X,Y}$ over that region.


Also, for any function g(X,Y),

$$E[g(X,Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(u,v)\, f_{X,Y}(u,v)\, du\, dv \tag{3.14}$$

where it must be kept in mind that $f_{X,Y}(u,v)$ may be 0 in some regions of the U-V plane. Note that there is no set A here as in (3.13). See (3.18) below for an example.

Finding marginal densities is also analogous to the discrete case, e.g.

$$f_X(s) = \int_t f_{X,Y}(s,t)\, dt \tag{3.15}$$

Other properties and calculations are analogous as well. For instance, the double integral of the density is equal to 1, and so on.

3.1.3.3 Example: a Triangular Distribution

Suppose (X,Y) has the density

$$f_{X,Y}(s,t) = 8st, \quad 0 < t < s < 1 \tag{3.16}$$

Let's find P(X + Y > 1). This calculation will involve a double integral. The region A in (3.13) is $\{(s,t): s + t > 1,\ 0 < t < s < 1\}$.

Here s represents X and t represents Y. The gray area is the region in which (X,Y) ranges. The subregion A in (3.13), corresponding to the event X+Y > 1, is shown in the striped area in the figure.

The dark vertical line shows all the points (s,t) in the striped region for a typical value of s in the integration process. Since s is the variable in the outer integral, consider it fixed for the time being and ask where t will range for that s. We see that for X = s, Y will range from 1-s to s; thus we set the inner integral's limits to 1-s and s. Finally, we then ask where s can range, and see from the picture that it ranges from 0.5 to 1 (for s below 0.5 we would have 1-s > s, so the striped region contains no points). Thus those are the limits for the outer integral.

$$P(X + Y > 1) = \int_{0.5}^{1} \int_{1-s}^{s} 8st\, dt\, ds = \int_{0.5}^{1} 8s(s - 0.5)\, ds = \frac{5}{6} \tag{3.17}$$

Following (3.14),

$$E[\sqrt{X + Y}] = \int_0^1 \int_0^s \sqrt{s + t}\; 8st\, dt\, ds \tag{3.18}$$

Let's find the marginal density $f_Y(t)$. So we must "integrate out" the s in (3.16):

$$f_Y(t) = \int_t^1 8st\, ds = 4t - 4t^3 \tag{3.19}$$
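The double integrals in this example can be checked numerically. The sketch below does this in two ways, both of which are illustrations rather than anything taken from the text: nested calls to R's integrate(), wrapped in a hypothetical helper dbl_int(), and a Monte Carlo estimate based on rejection sampling (propose points uniformly on the unit square and accept a point with t < s with probability st, which is proportional to the density 8st). Integrating 8s(s - 0.5) over (0.5, 1) gives 5/6 for P(X + Y > 1), and that is the value both methods are compared against.

```r
# Sketch: numerical checks for the density 8st on 0 < t < s < 1.
set.seed(6)

# dbl_int() is a hypothetical helper: integrates g(s,t) over
# s in (s_lo, s_hi) and, for each s, t in (t_lo(s), t_hi(s)).
dbl_int <- function(g, s_lo, s_hi, t_lo, t_hi) {
  inner <- function(svec) sapply(svec, function(s)
    integrate(function(t) g(s, t), lower = t_lo(s), upper = t_hi(s))$value)
  integrate(inner, lower = s_lo, upper = s_hi)$value
}

dens <- function(s, t) 8 * s * t

# The density should integrate to 1 over the whole triangle
total <- dbl_int(dens, 0, 1, function(s) 0, function(s) s)

# P(X + Y > 1): outer limits 0.5 to 1, inner limits 1-s to s
p_int <- dbl_int(dens, 0.5, 1, function(s) 1 - s, function(s) s)

# E[sqrt(X+Y)] as in (3.18), evaluated numerically
e_sqrt <- dbl_int(function(s, t) sqrt(s + t) * dens(s, t),
                  0, 1, function(s) 0, function(s) s)

# Rejection sampling: accept (s,t) with t < s with probability s*t = 8st/8
n      <- 400000
s_prop <- runif(n)
t_prop <- runif(n)
keep   <- (t_prop < s_prop) & (runif(n) < s_prop * t_prop)
x <- s_prop[keep]
y <- t_prop[keep]
p_sim <- mean(x + y > 1)

print(c(total = total, p_int = p_int, p_sim = p_sim, e_sqrt = e_sqrt))
```

Only about one proposal in eight is accepted, so rejection sampling is wasteful here, but it needs nothing beyond the joint density itself.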


3.2 More on Co-variation of Random Variables

3.2.1 Covariance

The covariance between random variables X and Y is defined as

$$Cov(X,Y) = E[(X - EX)(Y - EY)] \tag{3.20}$$

Suppose that typically when X is larger than its mean, Y is also larger than its mean, and vice versa for below-mean values. Then (3.20) will likely be positive. In other words, if X and Y are positively correlated (a term we will define formally later but keep intuitive for now), then their covariance is positive. Similarly, if X is often smaller than its mean whenever Y is larger than its mean, the covariance and correlation between them will be negative. All of this is roughly speaking, of course, since it depends on how much X is larger or smaller than its mean, etc.

Covariance is linear in both arguments:

$$Cov(aX + bY, cU + dV) = ac\,Cov(X,U) + ad\,Cov(X,V) + bc\,Cov(Y,U) + bd\,Cov(Y,V) \tag{3.21}$$

for any constants a, b, c and d. Also

$$Cov(X, Y + q) = Cov(X,Y) \tag{3.22}$$

for any constant q and so on.

Note that

$$Cov(X,X) = Var(X) \tag{3.23}$$

for any X with finite variance.

Also, here is a shortcut way to find the covariance:

$$Cov(X,Y) = E(XY) - EX \cdot EY \tag{3.24}$$

The proof will help you review some important issues, namely (a) E(U+V) = EU + EV, (b) E(cU) = c EU and Ec = c for any constant c, and (c) EX and EY are constants in (3.24).

$$Cov(X,Y) = E[(X - EX)(Y - EY)] \quad \text{(definition)} \tag{3.25}$$

$$= E[XY - EX \cdot Y - EY \cdot X + EX \cdot EY] \quad \text{(algebra)} \tag{3.26}$$

$$= E(XY) + E[-EX \cdot Y] + E[-EY \cdot X] + E[EX \cdot EY] \quad \text{(E[U+V] = EU + EV)} \tag{3.27}$$

$$= E(XY) - EX \cdot EY \quad \text{(E[cU] = cEU, Ec = c)} \tag{3.28}$$
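Properties (3.21) through (3.24) hold not only for population quantities but also, exactly, for sample averages, which gives a quick way to see them in action. The sketch below uses made-up simulated data and a hypothetical helper pcov() that computes a covariance with divisor n (R's built-in cov() uses divisor n-1, so it would satisfy the shortcut only up to that factor). The data-generating choices and the constants a, b, cc, d are arbitrary.

```r
# Sketch: covariance identities checked on simulated data (divisor-n averages).
set.seed(5)
n <- 1000
x <- runif(n)
y <- x + rnorm(n, sd = 0.5)   # built to be positively related to x
u <- rexp(n)
v <- u * x

# pcov() is a hypothetical helper: the average of (a - mean(a)) * (b - mean(b))
pcov <- function(a, b) mean((a - mean(a)) * (b - mean(b)))

# Shortcut (3.24): Cov(X,Y) = E(XY) - EX * EY
lhs_shortcut <- pcov(x, y)
rhs_shortcut <- mean(x * y) - mean(x) * mean(y)

# Linearity in both arguments, (3.21)
a <- 2; b <- -1; cc <- 3; d <- 0.5
lhs_bilin <- pcov(a * x + b * y, cc * u + d * v)
rhs_bilin <- a * cc * pcov(x, u) + a * d * pcov(x, v) +
             b * cc * pcov(y, u) + b * d * pcov(y, v)

print(c(lhs_shortcut = lhs_shortcut, rhs_shortcut = rhs_shortcut))
print(c(lhs_bilin = lhs_bilin, rhs_bilin = rhs_bilin))
print(c(shifted = pcov(x, y + 7), unshifted = pcov(x, y)))   # (3.22)
```

Each pair of printed numbers should agree up to floating-point rounding, whatever data are used.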


Another important property:

$$Var(X+Y) = Var(X) + Var(Y) + 2\,Cov(X,Y) \tag{3.29}$$

This comes from (3.24), the relation $Var(X) = E(X^2) - (EX)^2$ and the corresponding one for Y. Just substitute and do the algebra.

3.2.2 Correlation

Covariance does measure how much or little X and Y vary together, but it is hard to decide whether a given value of covariance is large or not. For instance, if we are measuring lengths in feet and change to inches, then (3.21) shows that the covariance will increase by a factor of $12^2 = 144$. Thus it makes sense to scale covariance according to the variables' standard deviations. Accordingly, the correlation between two random variables X and Y is defined by

$$\rho(X,Y) = \frac{Cov(X,Y)}{\sqrt{Var(X)}\,\sqrt{Var(Y)}} \tag{3.30}$$

So, correlation is unitless, i.e. does not involve units like feet, pounds, etc.

It is shown later in this chapter that

• $-1 \le \rho(X,Y) \le 1$
• $|\rho(X,Y)| = 1$ if and only if X and Y are exact linear functions of each other, i.e. Y = cX + d for some constants c and d

3.2.3 Example: Continuation of Section 3.1.3.3

Let's find the correlation between X and Y in the example in Section 3.1.3.3.

$$E(XY) = \int_0^1 \int_0^s st \cdot 8st \, dt \, ds \tag{3.31}$$
$$= \int_0^1 8s^2 (s^3/3) \, ds \tag{3.32}$$
$$= \frac{4}{9} \tag{3.33}$$

$$f_X(s) = \int_0^s 8st \, dt \tag{3.34}$$
$$= 4st^2 \Big|_0^s \tag{3.35}$$
$$= 4s^3 \tag{3.36}$$


$$f_Y(t) = \int_t^1 8st \, ds \tag{3.37}$$
$$= 4t \cdot s^2 \Big|_t^1 \tag{3.38}$$
$$= 4t - 4t^3 \tag{3.39}$$

$$EX = \int_0^1 s \cdot 4s^3 \, ds = \frac{4}{5} \tag{3.40}$$

$$E(X^2) = \int_0^1 s^2 \cdot 4s^3 \, ds = \frac{2}{3} \tag{3.41}$$

$$Var(X) = \frac{2}{3} - \left(\frac{4}{5}\right)^2 = 0.027 \tag{3.42}$$

$$EY = \int_0^1 t\,(4t - 4t^3) \, dt = \frac{4}{3} - \frac{4}{5} = \frac{8}{15} \tag{3.43}$$

$$E(Y^2) = \int_0^1 t^2 (4t - 4t^3) \, dt = 1 - \frac{4}{6} = \frac{1}{3} \tag{3.44}$$

$$Var(Y) = \frac{1}{3} - \left(\frac{8}{15}\right)^2 = 0.049 \tag{3.45}$$

$$Cov(X,Y) = \frac{4}{9} - \frac{4}{5} \cdot \frac{8}{15} = 0.018 \tag{3.46}$$

$$\rho(X,Y) = \frac{0.018}{\sqrt{0.027 \cdot 0.049}} = 0.49 \tag{3.47}$$

3.2.4 Example: a Catchup Game

Consider the following simple game. There are two players, who take turns playing. One's position after k turns is the sum of one's winnings in those turns. Basically, a turn consists of generating a random U(0,1) variable, with one difference: if that player is currently losing, he gets a bonus of 0.2 to help him catch up.

Let X and Y be the total winnings of the two players after 10 turns. Intuitively, X and Y should be positively correlated, due to the 0.2 bonus which brings them closer together. Let's see if this is true.

Though very simply stated, this problem is far too tough to solve mathematically in an elementary course (or even an advanced one). So, we will use simulation. In addition to finding the correlation between X and Y, we'll also find $F_{X,Y}(5.8, 5.2)$.


```r
# one turn for a player whose current total is a, against an opponent at b
taketurn <- function(a,b) {
   win <- runif(1)
   if (a >= b) return(win)
   else return(win+0.2)  # currently losing, so add the catchup bonus
}

cdf2 <- function(xy,t1,t2) {  # 2-dim. empirical cdf
   # proportion of rows with first coordinate <= t1 and second <= t2;
   # using mean() avoids trouble when fewer than two rows qualify
   return(mean(xy[,1] <= t1 & xy[,2] <= t2))
}

nreps <- 10000
nturns <- 10
xyvals <- matrix(nrow=nreps,ncol=2)
for (rep in 1:nreps) {
   x <- 0
   y <- 0
   for (turn in 1:nturns) {
      # x's turn
      x <- x + taketurn(x,y)
      # y's turn
      y <- y + taketurn(y,x)
   }
   xyvals[rep,] <- c(x,y)
}
print(cor(xyvals[,1],xyvals[,2]))
print(cdf2(xyvals,5.8,5.2))
```

The output is 0.65 and 0.03. So, X and Y are indeed positively correlated as we had surmised.

Note the use of R's built-in function cor() to compute correlation. Note too that the bonus makes the two players' winnings leapfrog over each other. Without it, we would have EX = EY = 5.0, and since $F_{X,Y}(5.0, 5.0)$ would then be 0.25, $F_{X,Y}(5.8, 5.2)$ would be somewhat greater than 0.25. But the bonus moves the distributions of X and Y more toward 10.0.

3.3 Sets of Independent Random Variables

Great mathematical tractability can be achieved by assuming that the $X_i$ in a random vector $X = (X_1, ..., X_k)$ are independent. In many applications, this is a reasonable assumption.

3.3.1 Properties

In the next few sections, we will look at some commonly-used properties of sets of independent random variables. For simplicity, consider the case k = 2, with X and Y being independent (scalar) random variables.

3.3.1.1 Probability Mass Functions and Densities Factor

If X and Y are independent, then

$$p_{X,Y} = p_X \, p_Y \tag{3.48}$$
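in the discrete case, and

$$f_{X,Y} = f_X \, f_Y \tag{3.49}$$

in the continuous case. In other words, the joint pmf/density is the product of the marginal ones.

This is easily seen in the discrete case:

$$p_{X,Y}(i,j) = P(X = i \text{ and } Y = j) \quad \text{(definition)} \tag{3.50}$$
$$= P(X = i)\,P(Y = j) \quad \text{(independence)} \tag{3.51}$$
$$= p_X(i)\,p_Y(j) \quad \text{(definition)} \tag{3.52}$$

Here is the proof for the continuous case:

$$f_{X,Y}(u,v) = \frac{\partial}{\partial u} \frac{\partial}{\partial v} F_{X,Y}(u,v) \tag{3.53}$$
$$= \frac{\partial}{\partial u} \frac{\partial}{\partial v} P(X \le u \text{ and } Y \le v) \tag{3.54}$$
$$= \frac{\partial}{\partial u} \frac{\partial}{\partial v} \left[ P(X \le u)\,P(Y \le v) \right] \tag{3.55}$$
$$= \frac{\partial}{\partial u} \frac{\partial}{\partial v} \left[ F_X(u)\,F_Y(v) \right] \tag{3.56}$$
$$= f_X(u)\,f_Y(v) \tag{3.57}$$

3.3.1.2 Expected Values Factor

If X and Y are independent, then

$$E(XY) = E(X)\,E(Y) \tag{3.58}$$

To prove this, use (3.48) and (3.49) for the discrete and continuous cases.

3.3.1.3 Covariance Is 0

If X and Y are independent, then from (3.58) and (3.24), we have

$$Cov(X,Y) = 0 \tag{3.59}$$

and thus $\rho(X,Y) = 0$ as well.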


However, the converse is false. A counterexample is the random pair (V,W) that is uniformly distributed on the unit disk, $\{(s,t): s^2 + t^2 \le 1\}$.

3.3.1.4 Variances Add

If X and Y are independent, then from (3.29) and (3.58), we have

$$Var(X+Y) = Var(X) + Var(Y) \tag{3.60}$$

3.3.1.5 Convolution

If X and Y are independent, nonnegative, continuous random variables, and we set Z = X+Y, then the density of Z is the convolution of the densities of X and Y:

$$f_Z(t) = \int_0^t f_X(s)\,f_Y(t-s) \, ds \tag{3.61}$$

You can get intuition on this by considering the discrete case. Say U and V are independent, nonnegative integer-valued random variables, and set W = U+V. Let's find $p_W$:

$$p_W(k) = P(W = k) \quad \text{(by definition)} \tag{3.62}$$
$$= P(U + V = k) \quad \text{(substitution)} \tag{3.63}$$
$$= \sum_{i=0}^{k} P(U = i \text{ and } V = k-i) \quad \text{(In what ways can it happen?)} \tag{3.64}$$
$$= \sum_{i=0}^{k} p_{U,V}(i, k-i) \quad \text{(by definition)} \tag{3.65}$$
$$= \sum_{i=0}^{k} p_U(i)\,p_V(k-i) \quad \text{(from Section 3.3.1.1)} \tag{3.66}$$

Review the analogy between densities and pmfs in our unit on continuous random variables, Section 2.2.1, and then see how (3.61) is analogous to (3.62) through (3.66):

• k in (3.62) is analogous to t in (3.61)
• the limits 0 to k in (3.66) are analogous to the limits 0 to t in (3.61)
• the expression k-i in (3.66) is analogous to t-s in (3.61)
• and so on
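The discrete convolution sum (3.66) translates almost line for line into code. Here is a minimal sketch; the function name conv_pmf and the choice of two fair dice as the example are ours, for illustration only. The pmfs are stored as vectors in which element i+1 holds the probability of the value i.

```r
# pmf of W = U + V for independent nonnegative integer-valued U and V;
# pu[i+1] = P(U = i) and pv[j+1] = P(V = j), for i, j = 0, 1, 2, ...
conv_pmf <- function(pu, pv) {
   pw <- numeric(length(pu) + length(pv) - 1)
   for (k in 0:(length(pw) - 1)) {
      for (i in 0:k) {          # the sum over i in (3.66)
         j <- k - i
         if (i < length(pu) && j < length(pv))
            pw[k + 1] <- pw[k + 1] + pu[i + 1] * pv[j + 1]
      }
   }
   pw
}

die <- c(0, rep(1/6, 6))   # P(U = 0) = 0, P(U = i) = 1/6 for i = 1,...,6
pw <- conv_pmf(die, die)   # pmf of the total of two dice, values 0,...,12
print(round(36 * pw))      # counts out of 36 for totals 0 through 12
```

For the two dice this reproduces the familiar triangular pattern, with a total of 7 being the most likely at probability 6/36.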


3.3.2 Examples

3.3.2.1 Example: Dice

In Section 3.2.1, we speculated that the correlation between X, the number on the blue die, and S, the total of the two dice, was positive. Let's compute it.

Write S = X + Y, where Y is the number on the yellow die. Then using the properties of covariance presented above, we have that

$$Cov(X,S) = Cov(X, X+Y) \quad \text{(by definition)} \tag{3.67}$$
$$= Cov(X,X) + Cov(X,Y) \quad \text{(from (3.21))} \tag{3.68}$$
$$= Var(X) + 0 \quad \text{(from (3.23), (3.59))} \tag{3.69}$$

Also, from (3.60),

$$Var(S) = Var(X+Y) = Var(X) + Var(Y) \tag{3.70}$$

But Var(Y) = Var(X). So the correlation between X and S is

$$\rho(X,S) = \frac{Var(X)}{\sqrt{Var(X)}\,\sqrt{2\,Var(X)}} = 0.707 \tag{3.71}$$

Since correlation is at most 1 in absolute value, 0.707 is considered a fairly high correlation. Of course, we did expect X and S to be highly correlated.

3.3.2.2 Example: Ethernet

Consider this network, essentially Ethernet. Here nodes can send at any time. Transmission time is 0.1 seconds. Nodes can also "hear" each other; one node will not start transmitting if it hears that another has a transmission in progress, and even when that transmission ends, the node that had been waiting will wait an additional random time, to reduce the possibility of colliding with some other node that had been waiting.

Suppose two nodes hear a third transmitting, and thus refrain from sending. Let X and Y be their random backoff times, i.e. the random times they wait before trying to send. Let's find the probability that they clash, which is $P(|X - Y| \le 0.1)$.

Assume that X and Y are independent and exponentially distributed with mean 0.2, i.e. they each have density $5e^{-5u}$ on $(0, \infty)$. Then from (3.49), we know that their joint density is the product of their marginal densities,

$$f_{X,Y}(s,t) = 25 e^{-5(s+t)}, \quad s, t > 0 \tag{3.72}$$


Now

$$P(|X - Y| \le 0.1) = 1 - P(|X - Y| > 0.1) = 1 - P(X > Y + 0.1) - P(Y > X + 0.1) \tag{3.73}$$

Look at that first probability. Applying (3.13) with $A = \{(s,t): s > t + 0.1, \; 0 < s, t\}$, we have

$$P(X > Y + 0.1) = \int_0^\infty \int_{t+0.1}^\infty 25 e^{-5(s+t)} \, ds \, dt = 0.303 \tag{3.74}$$

By symmetry, $P(Y > X + 0.1)$ is the same. So, the probability of a clash is 0.394, rather high. We may wish to increase our mean backoff time, though a more detailed analysis is needed.

3.3.2.3 Example: Analysis of Seek Time

This will be an analysis of seek time on a disk. Suppose we have mapped the innermost track to 0 and the outermost one to 1, and assume that (a) the number of tracks is large enough to treat the position H of the read/write head as a continuous random variable on the interval [0,1], and (b) the track number requested has a uniform distribution on that interval.

Consider two consecutive service requests for the disk, denoting their track numbers by X and Y. In the simplest model, we assume that X and Y are independent, so that the joint density of X and Y is the product of their marginals, and is thus equal to 1 on the square $0 \le X, Y \le 1$.

The seek distance will be $|X - Y|$. Its mean value is found by taking g(s,t) in (3.14) to be $|s - t|$:

$$\int_0^1 \int_0^1 |s - t| \cdot 1 \, ds \, dt = \frac{1}{3} \tag{3.75}$$

By the way, what about the assumptions here? The independence would be a good assumption, for instance, for a heavily-used file server accessed by many different machines. Two successive requests are likely to be from different machines, thus independent. In fact, even within the same machine, if we have a lot of users at this time, successive requests can be assumed independent. On the other hand, successive requests from a particular user probably can't be modeled this way.

As mentioned in our unit on continuous random variables, page 43, if it's been a while since we've done a defragmenting operation, the assumption of a uniform distribution for requests is probably good.

Once again, this is just scratching the surface. Much more sophisticated models are used for more detailed work.

3.3.2.4 Example: Backup Battery

Suppose we have a portable machine that has compartments for two batteries. The main battery has lifetime X with mean 2.0 hours, and the backup's lifetime Y has mean 1.0 hours. One replaces the first by the
second as soon as the first fails. The lifetimes of the batteries are exponentially distributed and independent. Let's find the density of W, the time that the system is operational (i.e. the sum of the lifetimes of the two batteries).

Recall that if the two batteries had the same mean lifetimes, W would have a gamma distribution. But that's not the case here. We do notice, though, that the distribution of W is a convolution of two exponential densities, as it is the sum of two nonnegative independent random variables. Using the convolution formula of Section 3.3.1.5, we have

$$f_W(t) = \int_0^t f_X(s)\,f_Y(t-s) \, ds = \int_0^t 0.5 e^{-0.5s} e^{-(t-s)} \, ds = e^{-0.5t} - e^{-t}, \quad 0 < t < \infty$$
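For the backup-battery example, the convolution integral can be checked numerically against the closed form $e^{-0.5t} - e^{-t}$. This is only a sketch under the example's stated assumptions (independent exponential lifetimes with means 2.0 and 1.0 hours); the function names fx, fy, fw_numeric and fw_closed are ours.

```r
fx <- function(s) 0.5 * exp(-0.5 * s)   # main battery, mean 2.0 hours
fy <- function(s) exp(-s)               # backup battery, mean 1.0 hours

# convolution integral, evaluated numerically for a single value of t
fw_numeric <- function(t)
   integrate(function(s) fx(s) * fy(t - s), lower = 0, upper = t)$value

# closed form obtained by doing the integral by hand
fw_closed <- function(t) exp(-0.5 * t) - exp(-t)

tvals <- c(0.5, 1, 2, 5)
print(cbind(t = tvals,
            numeric = sapply(tvals, fw_numeric),
            closed = fw_closed(tvals)))
```

The two columns should agree to several decimal places; a mismatch would usually point to wrong limits or a wrong rate parameter in one of the densities.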

• If A is an r x k (but nonrandom) matrix, define Q = AW. Then Q is an r-component random vector, and

$$Cov(Q) = A \, Cov(W) \, A' \tag{3.80}$$

• Suppose V and W are independent random vectors, meaning that each component in V is independent of each component of W. (But this does NOT mean that the components within V are independent of each other, and similarly for W.) Then

$$Cov(V + W) = Cov(V) + Cov(W) \tag{3.81}$$

3.5 Conditional Distributions

The key to good probability modeling and statistical analysis is to understand conditional probability. The issue arises constantly.

3.5.1 Conditional Pmfs and Densities

First, let's review: In many repetitions of our experiment, P(A) is the long-run proportion of the time that A occurs. By contrast, P(A|B) is the long-run proportion of the time that A occurs, among those repetitions in which B occurs. Keep this in your mind at all times.

Now we apply this to pmfs, densities, etc. We define the conditional pmf as follows for discrete random variables X and Y:

$$p_{Y|X}(j|i) = P(Y = j \mid X = i) = \frac{p_{X,Y}(i,j)}{p_X(i)} \tag{3.82}$$

By analogy, we define the conditional density for continuous X and Y:

$$f_{Y|X}(t|s) = \frac{f_{X,Y}(s,t)}{f_X(s)} \tag{3.83}$$

3.5.2 Conditional Expectation

Conditional expectations are defined as straightforward extensions of (3.82) and (3.83):

$$E(Y \mid X = i) = \sum_j j \, p_{Y|X}(j|i) \tag{3.84}$$

$$E(Y \mid X = s) = \int_t t \, f_{Y|X}(t|s) \, dt \tag{3.85}$$
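To see (3.82) and (3.84) in action, one can store a joint pmf as a matrix and compute the conditional quantities directly. The joint pmf below is made up purely for illustration (it does not come from any example in the text), as are the variable names.

```r
# hypothetical joint pmf: rows index X = 1,2; columns index Y = 0,1,2
pxy <- matrix(c(0.10, 0.20, 0.10,
                0.30, 0.10, 0.20), nrow = 2, byrow = TRUE)
yvals <- 0:2

px <- rowSums(pxy)    # marginal pmf of X
# conditional pmf (3.82): row i holds p_{Y|X}(j|i) for j = 0,1,2;
# dividing a matrix by a vector of length nrow divides row i by px[i]
pygx <- pxy / px
# conditional means (3.84): E(Y | X = i) for i = 1,2
eygx <- as.vector(pygx %*% yvals)

print(pygx)
print(eygx)
```

Each row of pygx sums to 1, as any pmf must, which is a quick way to catch indexing mistakes.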


3.5.3 The Law of Total Expectation (advanced topic)

3.5.3.1 Expected Value As a Random Variable

For a random variable Y and an event A, the quantity E(Y|A) is the long-run average of Y, among the times when A occurs. Note several things about the expression E(Y|A):

• The expression evaluates to a constant.
• The item to the left of the | symbol is a random variable (Y).
• The item on the right of the | symbol is an event (A).

By contrast, for the quantity E(Y|W) defined below, for a random variable W, it is the case that:

• The expression itself is a random variable, not a constant.
• The item to the left of the | symbol is again a random variable (Y).
• But the item to the right of the | symbol is also a random variable (W).

It will be very important to keep these differences in mind.

Consider the function g(t) defined as²

$$g(t) = E(Y \mid W = t) \tag{3.86}$$

In this case, the item to the right of the | is an event, and thus g(t) is a constant (for each value of t), not a random variable.

Now, define the random variable Q to be g(W). Since W is a random variable, then Q is too. The quantity E(Y|W) is then defined to be Q. (Before reading any further, re-read the two sets of bulleted items above, and make sure you understand the difference between E(Y|W=t) and E(Y|W).)

One can view E(Y|W) as a projection in an abstract vector space. This is very elegant, and actually aids the intuition. If (and only if) you are mathematically adventurous, read the details in Section 3.9.2.

3.5.3.2 The Famous Formula (Theorem of Total Expectation)

An extremely useful formula, given only scant or no mention in most undergraduate probability courses, is

$$E(Y) = E[E(Y \mid W)] \tag{3.87}$$

for any random variables Y and W.

² Of course, the t is just a placeholder, and any other letter could be used.


The RHS of (3.87) looks odd at first, but it's merely E[g(W)]; since Q = E(Y|W) is a random variable, we can certainly ask what its expected value is.

Equation (3.87) is a bit abstract. It's a very useful abstraction, enabling streamlined writing and thinking about the process. Still, you may find it helpful to consider the case of discrete W, in which (3.87) has the more concrete form

$$EY = \sum_i P(W = i) \, E(Y \mid W = i) \tag{3.88}$$

To see this intuitively, think of measuring the heights and weights of all the adults in Davis. Say we measure height to the nearest inch, so that height is discrete. We look at all the adults in Davis who are 72 inches tall, and write down their mean weight. Then we write down the mean weight of all adults of height 68. Then we write down the mean weight of all adults of height 75, and so on. Then (3.87) says that if we take the average of all the numbers we write down (the average of the averages) then we get the mean weight among all adults in Davis.

Note carefully, though, that this is a weighted average. If for instance people of height 69 inches are more numerous in the population, then their mean weight will receive greater emphasis in the overall average of all the means we've written down. This is seen in (3.88), with the weights being the quantities P(W=i).

The relation (3.87) is proved in the discrete case in Section 3.10.

3.5.4 What About the Variance?

By the way, one might guess that the analog of the Theorem of Total Expectation for variance is

$$Var(Y) = E[Var(Y \mid W)] \tag{3.89}$$

But this is false. Think for example of the extreme case in which Y = W. Then Var(Y|W) would be 0, but Var(Y) would be nonzero.

The correct formula, called the Law of Total Variance, is

$$Var(Y) = E[Var(Y \mid W)] + Var[E(Y \mid W)] \tag{3.90}$$

Deriving this formula is easy, by simply evaluating both sides, and using the relation $Var(X) = E(X^2) - (EX)^2$. This exercise is left to the reader.

3.5.5 Example: Trapped Miner

(Adapted from Stochastic Processes, by Sheldon Ross, Wiley, 1996.)

A miner is trapped in a mine, and has a choice of three doors. Though he doesn't realize it, if he chooses to exit the first door, it will take him to safety after 2 hours of travel. If he chooses the second one, it will
lead back to the mine after 3 hours of travel. The third one leads back to the mine after 5 hours of travel. Suppose the doors look identical, and if he returns to the mine he does not remember which doors he tried earlier. What is the expected time until he reaches safety?

Let Y be the time it takes to reach safety, and let W denote the number of the door chosen (1, 2 or 3) on the first try. Then let us consider what values E(Y|W) can have. If W = 1, then Y = 2, so

$$E(Y \mid W = 1) = 2 \tag{3.91}$$

If W = 2, things are a bit more complicated. The miner will go on a 3-hour excursion, and then be back in his original situation, and thus have a further expected wait of EY, since "time starts over." In other words,

$$E(Y \mid W = 2) = 3 + EY \tag{3.92}$$

Similarly,

$$E(Y \mid W = 3) = 5 + EY \tag{3.93}$$

In summary, now considering the random variable E(Y|W), we have

$$Q = E(Y \mid W) = \begin{cases} 2, & \text{w.p. } \frac{1}{3} \\ 3 + EY, & \text{w.p. } \frac{1}{3} \\ 5 + EY, & \text{w.p. } \frac{1}{3} \end{cases} \tag{3.94}$$

where "w.p." means "with probability." So, using (3.87) or (3.88), we have

$$EY = EQ = 2 \cdot \frac{1}{3} + (3 + EY) \cdot \frac{1}{3} + (5 + EY) \cdot \frac{1}{3} = \frac{10}{3} + \frac{2}{3} EY \tag{3.95}$$

Equating the extreme left and extreme right ends of this series of equations, we can solve for EY, which we find to be 10.

It is left to the reader to see how this would change if we assume that the miner remembers which doors he has already hit.

3.5.6 Example: Analysis of Hash Tables

(Famous example, adapted from various sources.)

Consider a database table consisting of m cells, only some of which are currently occupied. Each time a new key must be inserted, it is used in a hash function to find an unoccupied cell. Since multiple keys can map to the same table cell, we may have to probe multiple times before finding an unoccupied cell.

We wish to find EY, where Y is the number of probes needed to insert a new key. One approach to doing so would be to condition on W, the number of currently occupied cells at the time we do a search. After
finding E(Y|W), we can use the Theorem of Total Expectation to find EY. We will make two assumptions (to be discussed later):

(a) Given that W = k, each probe will collide with an existing cell with probability k/m, with successive probes being independent.

(b) W is uniformly distributed on the set 1,2,...,m, i.e. P(W = k) = 1/m for each k.

To calculate E(Y|W=k), we note that given W = k, then Y is the number of independent trials until a "success" is reached, where "success" means that our probe turns out to be to an unoccupied cell. This is a geometric distribution, i.e.

$$P(Y = r \mid W = k) = \left(\frac{k}{m}\right)^{r-1} \left(1 - \frac{k}{m}\right) \tag{3.96}$$

The mean of this geometric distribution is, from the usual formula for the mean of a geometric random variable,

$$\frac{1}{1 - \frac{k}{m}} \tag{3.97}$$

Then

$$EY = E[E(Y \mid W)] \tag{3.98}$$
$$= \sum_{k=1}^{m-1} \frac{1}{m} E(Y \mid W = k) \tag{3.99}$$
$$= \sum_{k=1}^{m-1} \frac{1}{m-k} \tag{3.100}$$
$$= 1 + \frac{1}{2} + \frac{1}{3} + ... + \frac{1}{m-1} \tag{3.101}$$
$$\approx \int_1^m \frac{1}{u} \, du \tag{3.102}$$
$$= \ln(m) \tag{3.103}$$

where the approximation is something you might remember from calculus (you can picture it by drawing rectangles to approximate the area under the curve). Note that the sums stop at k = m-1: the case W = m corresponds to a full table, in which no insertion is possible.

Now, what about our assumptions, (a) and (b)? The assumption in (a) of each probe colliding with probability k/m should be reasonably accurate if k is much smaller than m, because hash functions tend to distribute probes uniformly, and the assumption of independence of successive probes is all right too, since it is very unlikely that we would hit the same cell twice. However, if k is not much smaller than m, the accuracy will suffer.

Assumption (b) is more subtle, with differing interpretations. For example, the model may concern one specific database, in which case the assumption may be questionable. Presumably W grows over time, in
which case the assumption would make no sense (it doesn't even have a distribution). We could instead think of a database which grows and shrinks as time progresses. However, even here, it would seem that W would probably oscillate around some value like m/2, rather than being uniformly distributed as assumed here. Thus, this model is probably not very realistic. However, even idealized models can sometimes provide important insights.

3.6 Parametric Families of Distributions

Since there are so many ways in which random variables can correlate with each other, there are rather few parametric families commonly used to model multivariate distributions (other than those arising from sets of independent random variables that have a distribution in a common parametric univariate family). We will discuss two here.

3.6.1 The Multinomial Family of Distributions

3.6.1.1 Probability Mass Function

This is a generalization of the binomial family.

Suppose one tosses a die 8 times. What is the probability that the results consist of two 1s, one 2, one 4, three 5s and one 6? Well, if the tosses occur in that order, i.e. the two 1s come first, then the 2, etc., then the probability is

$$\left(\frac{1}{6}\right)^2 \left(\frac{1}{6}\right)^1 \left(\frac{1}{6}\right)^0 \left(\frac{1}{6}\right)^1 \left(\frac{1}{6}\right)^3 \left(\frac{1}{6}\right)^1 \tag{3.104}$$

But there are many different orderings, in fact

$$\frac{8!}{2! \, 1! \, 0! \, 1! \, 3! \, 1!} \tag{3.105}$$

of them.

From this, we can see the following. Suppose:

• we have n trials, each of which has r possible outcomes or categories
• the trials are independent
• the i-th outcome has probability $p_i$

Let $X_i$ denote the number of trials with outcome i, i = 1,...,r. Then we say that $X_1, ..., X_r$ have a multinomial distribution, and the joint pmf of the $X_1, ..., X_r$ is

$$p_{X_1,...,X_r}(j_1, ..., j_r) = \frac{n!}{j_1! \cdots j_r!} \, p_1^{j_1} \cdots p_r^{j_r} \tag{3.106}$$
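As a quick check of the die example, the product of (3.104) and (3.105) can be compared with R's built-in multinomial pmf, dmultinom(). This is just a sketch; the variable names counts, byhand and builtin are ours.

```r
# counts of 1s, 2s, ..., 6s in the 8 tosses of the die example
counts <- c(2, 1, 0, 1, 3, 1)

# number of orderings (3.105) times the probability of one ordering (3.104)
byhand <- factorial(8) / prod(factorial(counts)) * (1/6)^8

# the same quantity from the multinomial pmf (3.106)
builtin <- dmultinom(counts, prob = rep(1/6, 6))

print(c(byhand, builtin))  # both are 3360/6^8, about 0.002
```

Here there are 3360 orderings, each with probability $(1/6)^8$, so the event is rare even though it looks unremarkable.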


Note that this family of distributions has r+1 parameters.

3.6.1.2 Means and Covariances

Now look at the vector $X = (X_1, ..., X_r)'$. Let's find its mean vector and covariance matrix.

First, note that the marginal distributions of the $X_i$ are binomial! So,

$$EX_i = np_i \quad \text{and} \quad Var(X_i) = np_i(1 - p_i) \tag{3.107}$$

So we know EX now:

$$EX = \begin{pmatrix} np_1 \\ ... \\ np_r \end{pmatrix} \tag{3.108}$$

What about Cov(X)? To this end, let $T_{ki}$ equal 1 or 0, depending on whether the k-th trial results in outcome i, k = 1,...,n and i = 1,...,r. We say that $T_{ki}$ is the indicator variable for the event that the k-th trial results in outcome i. This is a simple concept, but it has powerful uses, as you'll see.

Make sure you understand that

$$X_i = \sum_{k=1}^{n} T_{ki} \tag{3.109}$$

From (3.109), you can see that

$$X = U_1 + ... + U_n \tag{3.110}$$

where

$$U_k = \begin{pmatrix} T_{k1} \\ ... \\ T_{kr} \end{pmatrix} \tag{3.111}$$

Now, here's where the power of the matrix operations in Section 3.4 will be seen:

$$Cov(X) = Cov(U_1 + ... + U_n) \quad \text{(from (3.110))} \tag{3.112}$$
$$= Cov(U_1) + ... + Cov(U_n) \quad \text{(from (3.81))} \tag{3.113}$$
$$= n \, Cov(U_1) \quad \text{(all have the same distribution)} \tag{3.114}$$


Now, for $i \ne j$, we have from (3.24)

$$Cov(T_{1i}, T_{1j}) = E(T_{1i} T_{1j}) - ET_{1i} \, ET_{1j} \tag{3.115}$$

But $T_{1i} T_{1j} = 0$, since a single trial cannot result in both outcome i and outcome j! And $ET_{1i} = p_i$, and the same for the j case. So,

$$Cov(T_{1i}, T_{1j}) = -p_i p_j \tag{3.116}$$

Of course, for i = j, $Cov(T_{1i}, T_{1j}) = Var(T_{1i}) = p_i(1 - p_i)$, since $T_{1i}$ has a binomial distribution with number of trials equal to 1.

Putting all this together, and recalling (3.114), we see that

$$Cov(X) = n \begin{pmatrix} p_1(1-p_1) & -p_1p_2 & \cdots & -p_1p_r \\ -p_1p_2 & p_2(1-p_2) & \cdots & -p_2p_r \\ \cdots & \cdots & \cdots & \cdots \\ \cdots & \cdots & \cdots & p_r(1-p_r) \end{pmatrix} \tag{3.117}$$

Note too that if we define R = X/n, so that R is the vector of proportions in the various categories (e.g. $X_1/n$ is the fraction of trials that resulted in category 1), then from (3.117) and (3.79), we have

$$Cov(R) = \frac{1}{n} \begin{pmatrix} p_1(1-p_1) & -p_1p_2 & \cdots & -p_1p_r \\ -p_1p_2 & p_2(1-p_2) & \cdots & -p_2p_r \\ \cdots & \cdots & \cdots & \cdots \\ \cdots & \cdots & \cdots & p_r(1-p_r) \end{pmatrix} \tag{3.118}$$

Whew! That was a workout, but these formulas will become very useful later on, both in this unit and subsequent ones.

3.6.1.3 Application: Text Mining

One of the branches of computer science in which the multinomial family plays a prominent role is text mining. One goal is automatic document classification. We want to write software that will make reasonably accurate guesses as to whether a document is about sports, the stock market, elections etc., based on the frequencies of various key words the program finds in the document.

Many of the simpler methods for this use the bag of words model. We have r key words we've decided are useful for the classification process, and the model assumes that statistically the frequencies of those words in a given document category, say sports, follow a multinomial distribution. Each category has its own set of probabilities $p_1, ..., p_r$. For instance, if "Barry Bonds" is considered one word, its probability will be much higher in the sports category than in the elections category, say. So, the observed frequencies of the words in a particular document will hopefully enable our software to make a fairly good guess as to the category the document belongs to.
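To make the bag-of-words idea concrete, here is a toy sketch of how such a guess might be made: compute the multinomial probability of the observed key word counts under each category's probabilities, and pick the category under which the counts are most likely. The three key words, the probability vectors and the counts are all invented for illustration, and this maximum-likelihood rule is just one simple possibility, not a description of any particular text mining system.

```r
# hypothetical probabilities of r = 3 key words within each category
probs <- list(sports    = c(0.6, 0.3, 0.1),
              elections = c(0.1, 0.3, 0.6))

# hypothetical key word counts observed in one document
counts <- c(7, 2, 1)

# log of the multinomial pmf (3.106) under each category
loglik <- sapply(probs, function(p) dmultinom(counts, prob = p, log = TRUE))
print(loglik)

# guess the category under which the observed counts are most likely
guess <- names(which.max(loglik))
print(guess)
```

Working on the log scale avoids numerical underflow when documents are long and the pmf values become extremely small.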


Once again, this is a very simple model here, designed to just introduce the topic to you. Clearly the multinomial assumption of independence between trials is grossly incorrect here; most models are much more complex than this.

3.6.2 The Multivariate Normal Family of Distributions

Note to the reader: This is a more difficult section, but worth putting extra effort into, as so many statistical applications in computer science make use of it. It will seem hard at times, but in the end won't be too bad.

3.6.2.1 Densities and Properties

Intuitively, this family has densities which are shaped like multidimensional bells, just like the univariate normal has the famous one-dimensional bell shape.

Let's look at the bivariate case first. The joint distribution of $X_1$ and $X_2$ is said to be bivariate normal if their density is

$$f_{X_1,X_2}(s,t) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1-\rho^2}} \, \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{(s-\mu_1)^2}{\sigma_1^2} + \frac{(t-\mu_2)^2}{\sigma_2^2} - \frac{2\rho(s-\mu_1)(t-\mu_2)}{\sigma_1 \sigma_2} \right] \right\}, \quad -\infty < s, t < \infty$$

Figure 3.1: Bivariate Normal Density, $\rho = 0.2$

Figure 3.2: Bivariate Normal Density, $\rho = 0.8$


All of this reflects the high correlation (0.8) between the two variables. If we were to continue to increase $\rho$ toward 1.0, we would see the bell become narrower and narrower, with $X_1$ and $X_2$ coming closer and closer to a linear relationship, one which can be shown to be

$$X_1 - \mu_1 = \frac{\sigma_1}{\sigma_2} (X_2 - \mu_2) \tag{3.120}$$

In this case, that would be

$$X_1 = \sqrt{\frac{10}{15}} \, X_2 = 0.82 X_2 \tag{3.121}$$

The multivariate normal family of distributions is parameterized by one vector-valued quantity, the mean $\mu$, and one matrix-valued quantity, the covariance matrix $\Sigma$. Specifically, suppose the random vector $X = (X_1, ..., X_k)'$ has a k-variate normal distribution.

The density has this form:

$$f_X(t) = c \, e^{-0.5 (t-\mu)' \Sigma^{-1} (t-\mu)} \tag{3.122}$$

where

$$c = \frac{1}{(2\pi)^{k/2} \sqrt{\det(\Sigma)}} \tag{3.123}$$

Here again ' denotes matrix transpose, -1 denotes matrix inversion and det() means determinant. Again, note that t is a k x 1 vector.

Since the matrix is symmetric, there are k(k+1)/2 distinct parameters there, and k parameters in the mean vector, for a total of k(k+3)/2 parameters for this family of distributions.

The family has the following important properties:

• The contours for points at which the bell has the same height (think of a topographical map) are elliptical in shape. The larger the correlation (in absolute value) between X and Y, the more elongated the ellipse. When the absolute correlation reaches 1, the ellipse degenerates into a straight line.

• Let X be a k-variate normal vector as above, and let A be a constant (i.e. nonrandom) matrix with k columns. Then the random vector Y = AX also has a multivariate normal distribution, with mean $A\mu$ and covariance matrix $A \Sigma A'$.

This has two important implications:

– The lower-dimensional marginal distributions are also multivariate normal.

– Scalar linear combinations of X are normal. In other words, for constant scalars $a_1, ..., a_k$, the quantity $Y = a_1 X_1 + ... + a_k X_k$ has a univariate normal distribution with mean $a\mu$ and variance $a \Sigma a'$, with $a$ being the row vector $(a_1, ..., a_k)$.


• If X and Y are normal and they are independent, then they jointly have a bivariate normal distribution. In general, though, having a normal distribution for X and Y does not imply that they are jointly bivariate normal.

• Suppose W has a multivariate normal distribution. The conditional distribution of some components of W, given other components, is again multivariate normal.

In R the density, cdf and quantiles of the multivariate normal distribution are given by the functions dmvnorm(), pmvnorm() and qmvnorm() in the library mvtnorm. You can simulate a multivariate normal distribution by using mvrnorm() in the library MASS.

3.6.2.2 The Multivariate Central Limit Theorem

The multidimensional version of the Central Limit Theorem holds. A sum of independent identically distributed random vectors has an approximate multivariate normal distribution.

For example, since a person's body consists of many different components, the CLT (a non-independent, non-identically distributed version of it) explains intuitively why heights and weights are approximately bivariate normal. Histograms of heights will look approximately bell-shaped, and the same is true for weights. The multivariate CLT says that three-dimensional histograms (plotting frequency along the "Z" axis against height and weight along the "X" and "Y" axes) will be approximately three-dimensional bell-shaped.

3.6.2.3 Example: Dice Game

Suppose we roll a die 50 times. Let X denote the number of rolls in which we get one dot, and let Y be the number of times we get either two or three dots. For convenience, let's also define Z to be the number of times we get four or more dots, though our focus will be on X and Y.

Suppose we wish to find $P(X \le 12 \text{ and } Y \le 16)$. Suppose also that we win $5 for each roll of a one, and $2 for each roll of a two or three; we'll find the probability that we win more than $90.

The triple (X,Y,Z) has a multinomial distribution with n = 50 and three possible outcomes (1; 2 or 3; 4, 5 or 6), with $p_1 = 1/6$, $p_2 = 1/3$ and $p_3 = 1/2$. From (3.110), we see that (X,Y,Z) is a sum of independent, identically distributed random vectors, and thus has an approximately multivariate normal distribution.

These probabilities of interest to us here would be quite difficult to find directly. For $P(X \le 12 \text{ and } Y \le 16)$, for instance, we would need to sum (3.106) over many, many different cases. So, the CLT will be very valuable here.

We'll of course need to know the mean vector and covariance matrix of the random vector (X,Y)'. We have those from (3.107) and (3.117):

$$E[(X,Y)] = (50/6, \; 50/3) \tag{3.124}$$
and

$$Cov[(X,Y)] = 50 \begin{pmatrix} 5/36 & -1/18 \\ -1/18 & 2/9 \end{pmatrix} \tag{3.125}$$

We use the R function pmvnorm() introduced in Section 3.6.2.1. To account for the integer nature of X and Y, we call the function with upper limits of 12.5 and 16.5, rather than 12 and 16; this continuity correction is often used to get a better approximation. Our code is

```r
library(mvtnorm)  # provides pmvnorm()

p1 <- 1/6
p23 <- 1/3
meanvec <- 50*c(p1,p23)
var1 <- 50*p1*(1-p1)
var23 <- 50*p23*(1-p23)
covar123 <- -50*p1*p23
covarmat <- matrix(c(var1,covar123,covar123,var23),nrow=2)
print(pmvnorm(upper=c(12.5,16.5),mean=meanvec,sigma=covarmat))
```

We find that

$$P(X \le 12 \text{ and } Y \le 16) \approx 0.43 \tag{3.126}$$

Now, let's find the probability that our total winnings, W, is over $90. We know that W = 5X + 2Y, and Section 3.6.2.1 tells us that linear combinations of a multivariate normal random vector are univariate normal. In other words, W has an (approximately) normal distribution!

We thus need the mean and variance of W. The mean is easy:

$$EW = E(5X + 2Y) = 5EX + 2EY = 250/6 + 100/3 = 75 \tag{3.127}$$

For the variance, use (3.29):

$$Var(W) = Var(5X + 2Y) \quad \text{(definition of W)} \tag{3.128}$$
$$= Var(5X) + Var(2Y) + 2\,Cov(5X, 2Y) \quad \text{(from (3.29))} \tag{3.129}$$
$$= 5^2 Var(X) + 2^2 Var(Y) + 2 \cdot 5 \cdot 2 \, Cov(X,Y) \quad \text{(properties of Var, Cov)} \tag{3.130}$$
$$= 25 \cdot \frac{250}{36} + 4 \cdot \frac{100}{9} + 20 \cdot \left(-\frac{50}{18}\right) \tag{3.131}$$
$$= 162.5 \tag{3.132}$$

Then

$$P(W > 90) = 1 - \Phi\left(\frac{90 - 75}{162.5^{0.5}}\right) = 0.12 \tag{3.133}$$

By the way, what about Z? Since Z = 50 - X - Y, there is no need to look at Z, and we would have difficulties if we did. The reason is that the fact that Z is an exact linear function of X and Y turns out to make the covariance matrix singular, i.e. lacking an inverse. That would create problems in (3.122).
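The calculation of P(W > 90) in the dice game can be reproduced in a few lines using only base R, via the matrix form of the mean and variance of a linear combination from Section 3.6.2.1. The sketch below also runs a direct simulation of the dice rolls with rmultinom() for comparison; the variable names and the number of simulated games are our own choices.

```r
p <- c(1/6, 1/3)    # probabilities of "one dot" and "two or three dots"
n <- 50
mu <- n * p                            # mean vector of (X,Y), as in (3.124)
sigma <- n * (diag(p) - p %*% t(p))    # covariance matrix, as in (3.125)

a <- c(5, 2)                           # winnings W = 5X + 2Y
ew <- sum(a * mu)                                 # mean of W
varw <- as.numeric(t(a) %*% sigma %*% a)          # variance of W
pw <- 1 - pnorm(90, mean = ew, sd = sqrt(varw))   # normal approximation
print(c(ew, varw, pw))

# direct simulation of many 50-roll games, for comparison
set.seed(1)
sims <- rmultinom(100000, size = n, prob = c(1/6, 1/3, 1/2))
w <- 5 * sims[1, ] + 2 * sims[2, ]
print(mean(w > 90))
```

The simulated proportion will not match the normal approximation exactly, since W is discrete and no continuity correction was applied, but the two should be in the same neighborhood.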


88 CHAPTER3.MULTIVARIATEPROBABILITYMODELS 3.6.2.4Application:DataMining Themultivariatenormalfamilyplaysacentralroleinmultivariatestatisticalmethods. Forinstance,amajorissueindataminingis dimensionreduction ,whichmeanstryingtoreducewhatmay behundredsorthousandsofvariablesdowntoamanageablelevel.Oneofthetoolsforthis,called principle componentsanalysis PCA,isbasedonmultivariatenormaldistributions.Googleusesthiskindofthing quiteheavily.We'lldiscussPCAinSection6.5. Toseeabitofhowthisworks,notethatinFigure3.2, X 1 and X 2 hadnearlyalinearrelationshipwitheach other.Thatmeansthatoneofthemisnearlyredundant,whichisgoodifwearetryingtoreducethenumber ofvariableswemustworkwith. Ingeneral,themethodofprinciplecomponentstakesroriginalvariables,inthevectorXandformsr newonesinavectorY,eachofwhichissomelinearcombinationoftheoriginalones.Thesenewonesare independent.Inotherwords,thereisasquarematrixAsuchthatthecomponentsofY=AXareindependent. ThematrixAconsistsoftheeigenvectorsofCovX;moreonthisinSection6.5ofourunitonstatistical relations. Wethendiscardthe Y i withsmallvariance,asthatmeanstheyarenearlyconstantandthusdonotcarry muchinformation.Thatleavesuswithasmallersetofvariablesthatstillcapturesmostoftheinformation oftheoriginalones. Manyanalysesinbioinformaticsinvolvedatathatcanbemodeledwellbymultivariatenormaldistributions.Forexample,inautomatedcellanalysis,twoimportantvariablesareforwardlightscatterFSCand sidewardlightscatterSSC.Thejointdistributionofthetwoisapproximatelybivariatenormal. 4 3.7SimulationofRandomVectors Let X = X 1 ;:::;X k 0 bearandomvectorhavingaspecieddistribution.Howcanwewritecodeto simulateit?Itisnotalwayseasytodothis.We'lldiscussacoupleofeasycaseshere,andillustratewhat onemaydoinothersituations. Theeasiestcaseandaveryfrequenly-occurringoneisthatinwhichthe X i areindependent.Onesimply simulatesthemindividually,andthatsimulatesX! 
Another easy case is that in which X has a multivariate normal distribution. We noted in Section 3.6.2.1 that R includes the function mvrnorm() (in the MASS package), which we can use to simulate our X here. The way this function works is to use the notion of principal components mentioned in Section 3.6.2.4. We construct Y = AX for the matrix A discussed there. The $Y_i$ are independent, thus easily simulated, and then we transform back to X via

$$X = A^{-1} Y$$

In general, though, things may not be so easy. For instance, consider the distribution in (3.16). There is no formulaic solution here, but the following strategy works.

[4] See Bioinformatics and Computational Biology Solutions Using R and Bioconductor, edited by Robert Gentleman, Wolfgang Huber, Vincent J. Carey, Rafael A. Irizarry and Sandrine Dudoit, Springer, 2005.


First we find the marginal density of X. As in the case for Y shown in (3.19), we compute

$$f_X(s) = \int_0^s 8st \; dt = 4s^3 \tag{3.134}$$

Using the method shown in our unit on continuous probability, Section 2.7, we can simulate X as

$$X = F_X^{-1}(W) \tag{3.135}$$

where W is a U(0,1) random variable, generated as runif(1). Since $F_X(u) = u^4$, we have $F_X^{-1}(v) = v^{0.25}$, and thus our code to simulate X is

runif(1)^0.25

Now that we have X, we can get Y. We know that

$$f_{Y|X}(t|s) = \frac{8st}{4s^3} = \frac{2}{s^2} t \tag{3.136}$$

Remember, s is considered constant. So again we use the inverse-cdf method here to find Y, given X, and then we have our pair (X,Y).

3.8 Transform Methods (advanced topic)

We often use the idea of transform functions. For example, you may have seen Laplace transforms in a math or engineering course. The functions we will see here differ from this by just a change of variable.

Though in the form used here they involve only univariate distributions, their applications are often multivariate, as will be the case here.

3.8.0.5 Generating Functions

Let's start with the generating function. For any nonnegative-integer valued random variable V, its generating function is defined by

$$g_V(s) = E(s^V) = \sum_{i=0}^{\infty} s^i p_V(i), \quad 0 \le s \le 1 \tag{3.137}$$

For instance, suppose N has a geometric distribution with parameter p, so that $p_N(i) = (1-p)p^{i-1}$, i = 1,2,... Then

$$g_N(s) = \sum_{i=1}^{\infty} s^i (1-p) p^{i-1} = \frac{1-p}{p} \sum_{i=1}^{\infty} s^i p^i = \frac{1-p}{p} \cdot \frac{ps}{1-ps} = \frac{(1-p)s}{1-ps} \tag{3.138}$$
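The closed form in (3.138) is easy to check numerically. The following sketch sums the defining series for one arbitrary choice of p and s (both values are assumptions made only for illustration) and compares the result with the closed-form expression; the series is truncated at a point where the remaining terms are negligible.

```r
# Numerical check of (3.138) for one arbitrary choice of p and s.
p <- 0.3     # assumed example value of the geometric parameter
s <- 0.8     # any point in [0,1]
i <- 1:200   # truncation point; the omitted terms are negligible here

series <- sum(s^i * (1 - p) * p^(i - 1))   # E(s^N) as a truncated sum
closed <- (1 - p) * s / (1 - p * s)        # right-hand side of (3.138)
cat("series:", series, " closed form:", closed, "\n")
```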


Why restrict s to the interval [0,1]? The answer is that for s > 1 the series in (3.137) may not converge. For $0 \le s \le 1$, the series does converge. To see this, note that if s = 1, we just get the sum of all probabilities, which is 1.0. If a nonnegative s is less than 1, then $s^i$ will also be less than 1, so each term $s^i p_V(i)$ is no larger than $p_V(i)$ and we still have convergence.

One use of the generating function is, as its name implies, to generate the probabilities of values for the random variable in question. In other words, if you have the generating function but not the probabilities, you can obtain the probabilities from the function. Here's why: For clarity, write (3.137) as

$$g_V(s) = P(V = 0) + sP(V = 1) + s^2 P(V = 2) + ... \tag{3.139}$$

From this we see that

$$g_V(0) = P(V = 0) \tag{3.140}$$

So, we can obtain P(V = 0) from the generating function. Now differentiating (3.137) with respect to s, we have

$$g_V'(s) = \frac{d}{ds} \left[ P(V = 0) + sP(V = 1) + s^2 P(V = 2) + ... \right] = P(V = 1) + 2sP(V = 2) + ... \tag{3.141}$$

So, we can obtain P(V = 1) from $g_V'(0)$, and in a similar manner can calculate the other probabilities from the higher derivatives.

3.8.0.6 Moment Generating Functions

The generating function is handy, but it is limited to discrete random variables. More generally, we can use the moment generating function, defined for any random variable X as

$$m_X(t) = E[e^{tX}] \tag{3.142}$$

for any t for which the expected value exists.

That last restriction is anathema to mathematicians, so they use the characteristic function,

$$\phi_X(t) = E[e^{itX}] \tag{3.143}$$

which exists for any t. However, it makes use of pesky complex numbers, so we'll stay clear of it here.

Differentiating (3.142) with respect to t, we have

$$m_X'(t) = E[Xe^{tX}] \tag{3.144}$$


We see then that

$$m_X'(0) = EX \tag{3.145}$$

So, if we just know the moment-generating function of X, we can obtain EX from it. Also,

$$m_X''(t) = E(X^2 e^{tX}) \tag{3.146}$$

so

$$m_X''(0) = E(X^2) \tag{3.147}$$

In this manner, we can for various k obtain $E(X^k)$, the k-th moment of X, hence the name.

3.8.1 Example: Network Packets

As an example, suppose the number of packets N received on a network link in a given time period has a Poisson distribution with mean $\lambda$, i.e.

$$P(N = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0,1,2,3,... \tag{3.148}$$

3.8.1.1 Poisson Generating Function

Let's first find its generating function.

$$g_N(t) = \sum_{k=0}^{\infty} t^k \frac{e^{-\lambda} \lambda^k}{k!} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda t)^k}{k!} = e^{-\lambda + \lambda t} \tag{3.149}$$

where we made use of the Taylor series from calculus,

$$e^u = \sum_{k=0}^{\infty} u^k / k! \tag{3.150}$$

3.8.1.2 Sums of Independent Poisson Random Variables Are Poisson Distributed

Suppose packets come into a network node from two independent links, with counts $N_1$ and $N_2$, Poisson distributed with means $\lambda_1$ and $\lambda_2$. Let's find the distribution of $N = N_1 + N_2$, using a transform approach.

$$g_N(t) = E[t^{N_1 + N_2}] = E[t^{N_1}] E[t^{N_2}] = g_{N_1}(t) g_{N_2}(t) = e^{-\nu + \nu t} \tag{3.151}$$


where $\nu = \lambda_1 + \lambda_2$.

But the last expression in (3.151) is the generating function for a Poisson distribution too! And since there is a one-to-one correspondence between distributions and transforms, we can conclude that N has a Poisson distribution with parameter $\nu$. We of course knew that N would have mean $\nu$ but did not know that N would have a Poisson distribution.

So: A sum of two independent Poisson variables itself has a Poisson distribution. By induction, this is also true for sums of k independent Poisson variables.

3.8.1.3 Random Number of Bits in Packets on One Link (advanced topic)

Consider just one of the two links now, and for convenience denote the number of packets on the link by N, and its mean as $\lambda$. Continue to assume that N has a Poisson distribution.

Let B denote the number of bits in a packet, with $B_1,...,B_N$ denoting the bit counts in the N packets. We assume the $B_i$ are independent and identically distributed. The total number of bits received during that time period is

$$T = B_1 + ... + B_N \tag{3.152}$$

Suppose the generating function of B is known to be h(s). Then what is the generating function of T?

\begin{align}
g_T(s) &= E(s^T) \tag{3.153} \\
&= E[E(s^T | N)] \tag{3.154} \\
&= E[E(s^{B_1 + ... + B_N} | N)] \tag{3.155} \\
&= E[E(s^{B_1} | N) \cdots E(s^{B_N} | N)] \tag{3.156} \\
&= E[h(s)^N] \tag{3.157} \\
&= g_N[h(s)] \tag{3.158} \\
&= e^{-\lambda + \lambda h(s)} \tag{3.159}
\end{align}

Here is how these steps were made:

• From the first line to the second, we used the Theorem of Total Expectation.
• From the second to the third, we just used the definition of T.
• From the third to the fourth lines, we have used algebra plus the fact that the expected value of a product of independent random variables is the product of their individual expected values.
• From the fourth to the fifth, we used the definition of h(s).
• From the fifth to the sixth, we used the definition of $g_N$.


• From the sixth to the last we used the formula for the generating function for a Poisson distribution with mean $\lambda$.

We can then get all the information about T we need from this formula, such as its mean, variance, probabilities and so on, as seen previously.

3.8.2 Other Uses of Transforms

Transform techniques are used heavily in queuing analysis, including for models of computer networks. The techniques are also used extensively in modeling of hardware and software reliability.

3.9 Vector Space Interpretations (for the mathematically adventurous only)

3.9.1 Properties of Correlation

Let $\mathcal{V}$ be the set of all random variables with finite variance and mean 0. Treat this as a vector space, with the sum of two vectors X and Y taken to be the random variable X+Y, and for a constant c, the vector cX being the random variable cX. Note that $\mathcal{V}$ is closed under these operations, as it must be.

Define an inner product on this space:

$$(X,Y) = E(XY) = Cov(X,Y) \tag{3.160}$$

(Recall that Cov(X,Y) = E(XY) - EX EY, and that we are working with random variables that have mean 0.) Thus the norm of a vector X is

$$||X|| = (X,X)^{0.5} = \sqrt{E(X^2)} = \sqrt{Var(X)} \tag{3.161}$$

again since EX = 0.

The famous Cauchy-Schwarz Inequality for inner products says,

$$|(X,Y)| \le ||X|| \; ||Y|| \tag{3.162}$$

i.e.

$$|\rho(X,Y)| \le 1 \tag{3.163}$$

Also, the Cauchy-Schwarz Inequality yields equality if and only if one vector is a scalar multiple of the other, i.e. Y = cX for some c. When we then translate this to random variables of nonzero means, we get Y = cX + d.

In other words, the correlation between two random variables is between -1 and 1, with equality if and only if one is an exact linear function of the other.
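The two conclusions about correlation can be illustrated with simulated data. In this sketch the distributions, coefficients and sample size are arbitrary assumptions; it only shows a sample correlation strictly inside (-1,1) for a noisy linear relationship, and one equal to -1 (up to floating-point error) for an exact linear function with negative slope.

```r
# Illustration of |rho| <= 1, with equality for exact linear functions.
set.seed(1)   # arbitrary seed
n <- 10000
x <- rnorm(n)

y_noisy <- 2 * x + rnorm(n)   # linear in x plus independent noise
y_linear <- -3 * x + 5        # exact linear function, negative slope

rho_noisy <- cor(x, y_noisy)
rho_linear <- cor(x, y_linear)
cat("noisy:", rho_noisy, " exact linear:", rho_linear, "\n")
```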


3.9.2 Conditional Expectation As a Projection

Continue to consider the vector space in Section 3.9.1.

For a random variable X, let $\mathcal{W}$ denote the subspace of $\mathcal{V}$ consisting of all functions h(X) with mean 0 and finite variance. (Again, note that this subspace is indeed closed under vector addition and scalar multiplication.)

Now consider any Y in $\mathcal{V}$. Recall that the projection of Y onto $\mathcal{W}$ is the closest vector T in $\mathcal{W}$ to Y, i.e. T minimizes

$$||Y - T|| = \left( E[(Y - T)^2] \right)^{0.5} \tag{3.164}$$

To find the minimizing T, consider first the minimization of

$$E[(S - c)^2] \tag{3.165}$$

with respect to constants c for some random variable S. Expanding the square, we have

$$E[(S - c)^2] = E(S^2) - 2c\,ES + c^2 \tag{3.166}$$

Taking $\frac{d}{dc}$ and setting the result to 0, we find that the minimizing c is c = ES.

Getting back to (3.164), use the Law of Total Expectation to write

$$E[(Y - T)^2] = E\left\{ E[(Y - T)^2 | X] \right\} \tag{3.167}$$

From what we learned with (3.165), applied to the conditional (i.e. inner) expectation in (3.167), we see that the T which minimizes (3.167) is T = E(Y|X).

In other words, the conditional mean is a projection! Nice, but is this useful in any way? The answer is yes, in the sense that it guides the intuition. All this is related to issues of statistical prediction (here we would be predicting Y from X) and the geometry here can really guide our insight. This is not very evident without getting deeply into the prediction issue, but let's explore some of the implications of the geometry.
For example, a projection is perpendicular to the line connecting the projection to the original vector. So

$$0 = (E(Y|X), \; Y - E(Y|X)) = Cov[E(Y|X), \; Y - E(Y|X)] \tag{3.168}$$

This says that the prediction E(Y|X) is uncorrelated with the prediction error, Y - E(Y|X). This in turn has statistical importance. Of course, (3.168) could have been derived directly, but the geometry of the vector space interpretation is what suggested we look at the quantity in the first place. Again, the point is that the vector space view can guide our intuition.
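Equation (3.168) can be checked by simulation in a case where the conditional mean is known by construction. The model below is an assumption chosen purely for illustration: Y is set to $X^2$ plus independent noise, so that $E(Y|X) = X^2$.

```r
# Simulation check that the prediction E(Y|X) is uncorrelated with
# the prediction error Y - E(Y|X).
set.seed(1)   # arbitrary seed
n <- 100000
x <- runif(n)
y <- x^2 + rnorm(n, sd = 0.5)   # assumed model, so that E(Y | X) = X^2

pred <- x^2       # the conditional mean, known here by construction
err <- y - pred   # prediction error

cov_pred_err <- cov(pred, err)
cat("Cov[E(Y|X), Y - E(Y|X)] approx.", cov_pred_err, "\n")
```

The sample covariance will not be exactly 0, but it should be small relative to the variances involved.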


Similarly, the Pythagorean Theorem holds, so

$$||Y||^2 = ||E(Y|X)||^2 + ||Y - E(Y|X)||^2 \tag{3.169}$$

which means that

$$Var(Y) = Var[E(Y|X)] + Var[Y - E(Y|X)] \tag{3.170}$$

Equation (3.170) is a common theme in linear models in statistics, the decomposition of variance.

3.10 Proof of the Law of Total Expectation

Let's prove (3.87) for the case in which W and Y take values only in the set {1,2,3,...}. Recall that if T is an integer-valued random variable and we have some function h(), then L = h(T) is another random variable, and its expected value can be calculated as[5]

$$E(L) = \sum_k h(k) P(T = k) \tag{3.171}$$

In our case here, Q is a function of W, so we find its expectation from the distribution of W:

\begin{align}
E(Q) &= \sum_{i=1}^{\infty} g(i) P(W = i) \\
&= \sum_{i=1}^{\infty} E(Y | W = i) P(W = i) \\
&= \sum_{i=1}^{\infty} \left[ \sum_{j=1}^{\infty} j P(Y = j | W = i) \right] P(W = i) \\
&= \sum_{j=1}^{\infty} j \sum_{i=1}^{\infty} P(Y = j | W = i) P(W = i) \\
&= \sum_{j=1}^{\infty} j P(Y = j) \\
&= E(Y)
\end{align}

In other words,

$$E(Y) = E[E(Y|W)] \tag{3.172}$$

[5] This is sometimes called The Law of the Unconscious Statistician, by nasty probability theorists who look down on statisticians. Their point is that technically $EL = \sum_k k P(L = k)$, and that (3.171) must be proven, whereas the statisticians supposedly think it's a definition.
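The chain of equalities in the proof can be traced numerically on a small discrete example. The joint probabilities below are made up for illustration only; the code computes E(Y) directly from the marginal distribution of Y, and then again as the weighted average of the conditional means E(Y|W = i).

```r
# Exact check of E(Y) = E[E(Y|W)] on a small, made-up joint distribution.
# Rows index W = 1,2,3; columns index Y = 1,2.
joint <- matrix(c(0.10, 0.20,
                  0.25, 0.05,
                  0.15, 0.25), nrow = 3, byrow = TRUE)
yvals <- c(1, 2)

pW <- rowSums(joint)                           # P(W = i)
condmeans <- as.vector(joint %*% yvals) / pW   # E(Y | W = i)

lhs <- sum(yvals * colSums(joint))   # E(Y), from the marginal of Y
rhs <- sum(condmeans * pW)           # E[E(Y|W)]
cat("E(Y) =", lhs, " E[E(Y|W)] =", rhs, "\n")
```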


Exercises

1. Suppose the random pair (X,Y) has the density $f_{X,Y}(s,t) = 8st$ on the triangle {(s,t): 0 [...] 0.4

3. Suppose Type 1 batteries have exponentially distributed lifetimes with mean 2.0 hours, while Type 2 battery lifetimes are exponentially distributed with mean 1.5.

(a) Suppose we have a portable machine that has compartments for two batteries, a main, of Type 1, and a backup, of Type 2. One replaces the first by the second as soon as the first fails. Find the density of W, the time that the system is operational, i.e. the sum of the lifetimes of the two batteries.

(b) Suppose we have a large box containing a mixture of the two types of batteries, in proportions q and 1-q. We reach into the box, choose a battery at random, then use it. Let Y be the lifetime of the battery we choose. Use the Law of Total Variance, (3.90), to find Var(Y).

4. Newspapers at a certain vending machine cost 25 cents. Suppose 60% of the customers pay with quarters, 20% use two dimes and a nickel, 15% insert a dime and three nickels, and 5% deposit five nickels. When the vendor collects the money, five coins fall to the ground. Let X, Y and Z denote the numbers of quarters, dimes and nickels among these five coins.

(a) Is the joint distribution of (X,Y,Z) a member of a parametric family presented in this chapter? If so, which one?

(b) Find P(X = 2, Y = 2, Z = 1).

(c) Find $\rho(X,Y)$.

5. Bus lines A and B intersect at a certain transfer point, with the schedule stating that buses from both lines will arrive there at 3:00 p.m. However, they are often late, by amounts X and Y for the two buses. The bivariate density is $f_{X,Y}(s,t) = 2 - s - t$, 0 [...] 6

(b) Find EW.

6. Show that

$$\rho(aX + b, cY + d) = \rho(X,Y) \tag{3.174}$$


for any constants a, b, c and d with ac > 0.

7. Suppose we wish to predict a random variable Y by using another random variable, X. We may consider predictors of the form cX + d for constants c and d. Show that the values of c and d that minimize the mean squared prediction error, $E[(Y - cX - d)^2]$, are

$$c = \frac{E(XY) - EX \cdot EY}{Var(X)} \tag{3.175}$$

$$d = \frac{E(X^2) \cdot EY - EX \cdot E(XY)}{Var(X)} \tag{3.176}$$

8. Programs A and B consist of r and s modules, respectively, of which c modules are common to both. As a simple model, assume that each module has probability p of being correct, with the modules acting independently. Let X and Y denote the numbers of correct modules in A and B, respectively. Find the correlation $\rho(X,Y)$ as a function of r, s, c and p.

Hint: Write $X = X_1 + ... + X_{r-c}$, where $X_i$ is 1 or 0, depending on whether module i of A is correct, for the nonoverlapping modules of A. Do the same for B, and for the set of common modules.

9. Use transform methods to derive some properties of the Poisson family:

(a) Show that for any Poisson random variable, its mean and variance are equal.

(b) Suppose X and Y are independent random variables, each having a Poisson distribution. Show that Z = X + Y again has a Poisson distribution.

10. Suppose one keeps rolling a die. Let $S_n$ denote the total number of dots after n rolls, mod 8, and let T be the number of rolls needed for the event $S_n = 0$ to occur. Find E(T), using an approach like that in the trapped miner example in Section 3.5.5.

11. In our ordinary coins which we use every day, each one has a slightly different probability of heads, which we'll call H. Say H has the distribution $N(0.5, 0.03^2)$. We choose a coin from a batch at random, then toss it 10 times. Let N be the number of heads we get. Find Var(N).

12. Jack and Jill play a dice game, in which one wins \$1 per dot. There are three dice, die A, die B and die C. Jill always rolls dice A and B. Jack always rolls just die C, but he also gets credit for 90% of die B.
For instance, say in a particular roll A, B and C are 3, 1 and 6, respectively. Then Jill would win \$4 and Jack would get \$6.90. Let X and Y be Jill's and Jack's total winnings after 100 rolls. Use the Central Limit Theorem to find the approximate values of P(X > 650, Y < 660) and P(Y > 1.06X).

Hints: This will follow a similar pattern to the dice game in Section 3.6.2.3, in which we win \$5 for one dot, and \$2 for two or three dots. Remember, in that example, the key was that we noticed that the pair (X,Y) was a sum of random pairs. That meant that (X,Y) had an approximate bivariate normal distribution, so we could find probabilities if we had the mean vector and covariance matrix of (X,Y). Thus we needed to find EX, EY, Var(X), Var(Y) and Cov(X,Y). We used the various properties of E(), Var() and Cov() to get those quantities.


You will do the same thing here. Write $X = U_1 + ... + U_{100}$, where $U_i$ is Jill's winnings on the i-th roll. Write Y as a similar sum of $V_i$. You probably will find it helpful to define $A_i$, $B_i$ and $C_i$ as the numbers of dots appearing on dice A, B and C on the i-th roll. Then find EX etc. Again, make sure to utilize the various properties for E(), Var() and Cov().

13. Show that if random variables U and V are independent,

$$Var(UV) = E(U^2) \, Var(V) + Var(U) \, (EV)^2 \tag{3.177}$$


Chapter 4

Introduction to Statistical Inference

4.1 What Statistics Is All About

If you follow the events involving space travel,[1] you may hear statements like, "There is a 40% chance that weather conditions on Friday will be good enough to launch the space shuttle." Your response might be curiosity as to the following questions:

• What does that 40% figure really mean?
• How accurate is that figure?
• What data was used to obtain that figure, and what mathematical model was used?

Well, these are typical statistical issues.

If you thought that statistics is nothing more than adding up columns of numbers and plugging into formulas, you are badly mistaken. Actually, statistics is an application of probability theory. We employ probabilistic models for the behavior of our sample data, and infer from the data accordingly, hence the name, statistical inference.

[1] Personally, I don't.

4.2 Introduction to Confidence Intervals

4.2.1 How Long Should We Run a Simulation?

In our simulations in previous units, it was never quite clear how long the simulation should be run, i.e. what value to set for nreps. Now we will finally address this issue.

As our example, recall the Bus Paradox from Section 2.5: Buses arrive at a certain bus stop at random times, with interarrival times being independent exponentially distributed random variables with mean 10


minutes. You arrive at the bus stop every day at a certain time, say four hours (240 minutes) after the buses start their morning run. What is your mean wait for the next bus?

We found mathematically that, due to the memoryless property of the exponential distribution, our wait is again exponentially distributed with mean 10. But suppose we didn't know that, and we wished to find the answer via simulation. We could write a program:

```r
doexpt <- function(opt) {
   lastarrival <- 0.0
   while (lastarrival
```

Let $W_i$ denote the i-th waiting time, i = 1,2,...,n, and let $\overline{W}$ denote the sample mean,

$$\overline{W} = \frac{W_1 + ... + W_n}{n} \tag{4.1}$$

$\overline{W}$ is what the program prints out.

The key points are that

• The random variables $W_i$ each have the distribution $F_W$, and thus each have mean $\mu$ and variance $\sigma^2$.
• The random variables $W_i$ are independent.
• The mean of $\overline{W}$ is also $\mu$:

\begin{align}
E(\overline{W}) &= \frac{1}{n} E\left( \sum_{i=1}^n W_i \right) && \textrm{(for const. c, } E(cU) = cEU) \tag{4.2} \\
&= \frac{1}{n} \sum_{i=1}^n EW_i && (E[U + V] = EU + EV) \tag{4.3} \\
&= \frac{1}{n} n \mu && (EW_i = \mu) \tag{4.4} \\
&= \mu \tag{4.5}
\end{align}

• The variance of $\overline{W}$ is 1/n of the population variance:

\begin{align}
Var(\overline{W}) &= \frac{1}{n^2} Var\left( \sum_{i=1}^n W_i \right) && \textrm{(for const. c, } Var[cU] = c^2 Var[U]) \tag{4.6} \\
&= \frac{1}{n^2} \sum_{i=1}^n Var(W_i) && \textrm{(for U, V indep., } Var[U + V] = Var[U] + Var[V]) \tag{4.7} \\
&= \frac{1}{n^2} n \sigma^2 \tag{4.8} \\
&= \frac{1}{n} \sigma^2 \tag{4.9}
\end{align}

Let's think of the notebook example in a different context. Here our experiment is to sample 20 wait times, again either by personally going to the bus stop 20 times or running the above program with nreps equal to 20. Each line of the notebook would consist of data from 20 visits to the bus stop, with wait times $W_1,...,W_{20}$ and the sample mean $\overline{W}$. So our notebook would have a column for $W_1$, one for $W_2$, ..., one for $W_{20}$ and especially one for $\overline{W}$. Here is what we would find:

• When we say that each $W_i$ has the same distribution as the population, we mean the following, say for i = 1: If we were to gather together all the values of $W_1$, one from each of the infinitely many lines


of the notebook, then their average would be 10.0. Also, the long-run proportion of lines for which $W_1 < 4$, say, would be equal to P(W < 4).

It can be shown that actually W has an exponential distribution. (This follows from the memoryless property.) So,

$$P(W < 4) = \int_0^4 0.1 e^{-0.1t} \; dt = 0.33 \tag{4.10}$$

Therefore the long-run proportion of lines for which $W_1 < 4$ would be 0.33.

• And if we were to calculate the standard deviation of all those values of $W_1$, we'd get $\sigma$, which we also know to be 10, since the mean and standard deviation are equal in the case of exponential distributions.
• Equation (4.5) says that if we were to average all the values of $\overline{W}$ over all the lines of the notebook, we'd get 10.0 there too.
• If we were to calculate the standard deviation of those values of $\overline{W}$, we'd get $\sigma / \sqrt{n}$, which we know to be $10/\sqrt{20} \approx 2.24$.

These points are absolutely key, forming the very basis of statistics. You should spend extra time pondering them.

4.2.2.2 Our First Confidence Interval

The Central Limit Theorem then tells us that

$$Z = \frac{\overline{W} - \mu}{\sigma / \sqrt{n}} \tag{4.11}$$

has an approximately N(0,1) distribution. We will be interested in the central 95% of that distribution, which due to symmetry has 2.5% of the area in the left tail and 2.5% in the right one. Through the R call qnorm(0.025), or by consulting a N(0,1) cdf table in a book, we find that the cutoff points are at -1.96 and 1.96. Thus

$$0.95 \approx P\left( -1.96 < \frac{\overline{W} - \mu}{\sigma / \sqrt{n}} < 1.96 \right) \tag{4.12}$$

Doing a bit of algebra on the inequalities yields

$$0.95 \approx P\left( \overline{W} - 1.96 \frac{\sigma}{\sqrt{n}} < \mu < \overline{W} + 1.96 \frac{\sigma}{\sqrt{n}} \right) \tag{4.13}$$

Now remember, not only do we not know $\mu$, we also don't know $\sigma$. But we can estimate it, as follows:


Recall that by definition

$$\sigma^2 = E[(W - \mu)^2] \tag{4.14}$$

Let's estimate $\sigma^2$ by taking sample analogs. The sample analog of $\mu$ is $\overline{W}$. What about the sample analog of the "E()"? Well, since E() is averaging over the whole population of Ws, the sample analog is to average over the sample. So, we get

$$\frac{1}{n} \sum_{i=1}^n (W_i - \overline{W})^2 \tag{4.15}$$

In other words, just as it is natural to estimate the population mean of W by its sample mean, the same holds for Var(W):

The population variance of W is the mean squared distance from W to its population mean. Therefore it is natural to estimate Var(W) by the average squared distance of W from its sample mean, among our sample values $W_i$.

We use $s^2$ as our symbol for this estimate of population variance.[4] We thus take our estimate of $\sigma$ to be s, the square root of that quantity.

[4] Though I try to stick to the convention of using only capital letters to denote random variables, it is conventional to use lower case in this instance.

By the way, (4.15) is equal to

$$s^2 = \frac{1}{n} \sum_{i=1}^n W_i^2 - \overline{W}^2 \tag{4.16}$$

(Caution: This way of computing $s^2$ is subject to more roundoff error.)

One can show (the details will be given at the end of this section) that (4.13) is still valid if we substitute s for $\sigma$, i.e.

$$0.95 \approx P\left( \overline{W} - 1.96 \frac{s}{\sqrt{n}} < \mu < \overline{W} + 1.96 \frac{s}{\sqrt{n}} \right) \tag{4.17}$$

In other words, we are about 95% sure that the interval

$$\left( \overline{W} - 1.96 \frac{s}{\sqrt{n}}, \; \overline{W} + 1.96 \frac{s}{\sqrt{n}} \right) \tag{4.18}$$

contains $\mu$. This is called a 95% confidence interval for $\mu$.

We could add this feature to our program:


```r
doexpt <- function(opt) {
   lastarrival <- 0.0
   while (lastarrival
```

though it's not immediately important here, note that there would also be columns for $W_1$ through $W_{1000}$, the weights of our 1000 people, and columns for $\overline{W}$ and s.

Now here is the point: Approximately 95% of all those intervals would contain $\mu$, the mean weight in the entire adult population of Davis. The value of $\mu$ would be unknown to us (that's why we'd be sampling 1000 people in the first place!) but it does exist, and it would be contained in approximately 95% of the intervals.

As a variation on the notebook idea, think of what would happen if you and 99 friends each do this experiment. Each of you would sample 1000 people and form a confidence interval. Since each of you would get a different sample of people, you would each get a different confidence interval. What we mean when we say the confidence level is 95% is that of the 100 intervals formed by you and 99 friends, about 95 of them will contain the true population mean weight. Of course, you hope you yourself will be one of the 95 lucky ones! But remember, you'll never know whose intervals are correct and whose aren't.

Now remember, in practice we only take one sample of 1000 people. Our notebook idea here is merely for the purpose of understanding what we mean when we say that we are about 95% confident that one interval we form does contain the true value of $\mu$.

4.2.3.2 Back to Our Bus Simulation

Well, in our simulation case, it is exactly the same situation. Simulation is a sampling process. Our $\mu$ is the mean in the population of all bus waits, while $\overline{W}$ is the mean in our sample of 1000 waits. This is not mere analogy; mathematically the two situations are completely identical, two instances of the same principle.

Let's use the "you and your 99 friends" idea again. Suppose each of you 100 people runs the R program at the end of Section 4.2.2.2. Each of you will get a different confidence interval printed out at the end of your run. Well, when we say that the program prints out a 95% confidence interval, we mean that about 95 of you 100 people will have an interval that contains the true value of E(W).
In the Davis weight example above, I stressed that we don't know $\mu$; after all, that's the reason we are taking a sample of people, so as to estimate $\mu$.

Similarly, the whole point of doing a simulation to find some quantity E(R) is that we don't know the value of E(R)! We will simulate many values of R, forming $\overline{R}$, and use that quantity as an estimate of E(R).

But our bus example was just that, an example, set up to illustrate the notion of adding a confidence interval to the output of a simulation. We actually do know the value of E(W) here; it's 10. That makes this a rather artificial example, but that's good, because it will allow us to really see the "you and 99 friends" idea in action, as follows.

We'll expand the code to simulate 1000 people running the original program. In other words, we'll add an extra outer loop to do 1000 runs of the program. Each run will compute the confidence interval, and then we'll see in the end how many of the 1000 runs have a confidence interval that includes the true E(W), 10.0:

```r
# simulate one wait: time from arrival at the stop (opt) to the next bus
doexpt <- function(opt) {
   lastarrival <- 0.0
   while (lastarrival < opt)
      lastarrival <- lastarrival + rexp(1,0.1)
   return(lastarrival-opt)
}

observationpt <- 240
nreps <- 1000
numruns <- 1000
waits <- vector(length=nreps)
numcorrectcis <- 0  # number of conf. ints. that contain 10.0
for (run in 1:numruns) {
   for (rep in 1:nreps) waits[rep] <- doexpt(observationpt)
   wbar <- mean(waits)
   s2 <- mean((waits - wbar)^2)  # divisor n, as in (4.15)
   s <- sqrt(s2)
   radius <- 1.96 * s/sqrt(nreps)
   if (abs(wbar - 10.0) <= radius) numcorrectcis <- numcorrectcis + 1
}
cat("approx. true confidence level=",numcorrectcis/numruns,"\n")
```

In fact, the output of that program was 0.958, sure enough about 95%.

Why is it not exactly 0.95?

• We only simulated 1000 intervals; ideally it should be an infinite number, to get the exact probability that an interval contains $\mu$.
• The Central Limit Theorem is only approximate.
• Ideally we would use (4.13), but due to lack of knowledge of the true value of $\sigma$ (we don't know $\mu$, so why would we know $\sigma$?), we resorted to using s instead, in (4.18).

Again remember that in practice we only do one run of simulating 1000 waits for the bus. Our simulation code above is merely for the purpose of understanding what we mean when we say that we are about 95% confident that one interval we form does contain the true value of $\mu$.

4.2.3.3 One More Point About Interpretation

Some statistics instructors give students the odd warning, "You can't say that the probability is 95% that $\mu$ is IN the interval; you can only say that the probability is 95% that the interval CONTAINS $\mu$."[7] This of course is nonsense. As any fool can see, the following two statements are equivalent:

• "$\mu$ is in the interval"
• "the interval contains $\mu$"

So it is ridiculous to say that the first is incorrect. Yet many instructors of statistics say so.

[7] See for example the Wikipedia entry, Confidence Intervals, http://en.wikipedia.org/wiki/Confidence_interval#Meaning_and_interpretation.

Where did this craziness come from? Well, way back in the early days of statistics, some instructor was afraid that a statement like "The probability is 95% that $\mu$ is in the interval" would make it sound like $\mu$ is a


random variable. Granted, that was a legitimate fear, because $\mu$ is not a random variable, and without proper warning, some learners of statistics might think incorrectly. The random entity is the interval, not $\mu$. This is clear in our program above: the 10 is constant, while wbar and s vary from interval to interval.

So, it was reasonable for teachers to warn students not to think $\mu$ is a random variable. But later on, some idiot must have then decided that it is incorrect to say "$\mu$ is in the interval," and other idiots then followed suit.

4.2.4 Sampling With and Without Replacement

Implicit in our analyses so far (in our assumption that the $W_i$ are independent) is that we are sampling with replacement, which means it's possible that our random sampling process might choose the same person twice.

If we sample with replacement, we say that we have a random sample. If it is done without replacement, it's called a simple random sample. In the latter case, (4.9) does not hold, because the $W_i$ are not independent (though they are still identically distributed). To see this, suppose that Davis were a tiny town consisting of just three adults, with weights 120, 161 and 190. Then if for example $W_1 = 190$, then $E(W_2 | W_1 = 190) = (120 + 161)/2 = 140.5$, while $E(W_2) = (120 + 161 + 190)/3 = 157$. Thus $W_1$ and $W_2$ are not independent, and (4.9) would fail.[8]

But except for cases in which our sample size is a substantial fraction of the population size, the probability of getting the same person twice would be very low, so it doesn't matter. Thus we can safely use analyses which assume with-replacement sampling even if we are using without-replacement sampling.

4.2.5 Other Confidence Levels

We have been using 95% as our confidence level. This is common, but of course not unique. We can for instance use 90%, which gives us a narrower interval (in (4.18), we multiply by 1.65 instead of by 1.96, which the reader should check), at the expense of lower confidence.
A confidence interval's level is usually denoted by $1 - \alpha$, with $\alpha$ being the error rate, so a 95% confidence level has $\alpha = 0.05$.

4.2.6 The Standard Error of the Estimate

Remember, $\overline{W}$ is a random variable. In our Davis people example, each line of the notebook would correspond to a different sample of 1000 people, and thus each line would have a different value for $\overline{W}$. Thus it makes sense to talk about $Var(\overline{W})$, and to refer to the square root of that quantity, i.e. the standard deviation of $\overline{W}$. In (4.9), we found this to be $\sigma / \sqrt{n}$ and decided to estimate it by $s / \sqrt{n}$. The latter is called the standard error of the estimate (s.e.), meaning the estimate of the standard deviation of the estimate $\overline{W}$. (The word estimate was used twice in the preceding sentence. Make sure to understand the two different settings that they apply to.)

[8] Note, though, that (4.5) does hold, because expected values of sums equal sums of expected values even for dependent random variables.
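The pieces discussed in Sections 4.2.5 and 4.2.6 can be collected into one small function. This is only a sketch: the function name meanci() and the stand-in data are hypothetical, not from the text. It computes the sample mean, the standard error $s/\sqrt{n}$ (with divisor n in $s^2$, as in (4.15)), and an approximate confidence interval at a chosen level, obtaining the multiplier from qnorm() rather than hard-coding 1.96.

```r
# Hypothetical helper (name and interface are illustrative assumptions):
# sample mean, standard error and approximate confidence interval.
meanci <- function(x, level = 0.95) {
   n <- length(x)
   xbar <- mean(x)
   s <- sqrt(mean((x - xbar)^2))      # divisor n, as in (4.15)
   se <- s / sqrt(n)                  # standard error of the estimate
   z <- qnorm(1 - (1 - level) / 2)    # 1.96 for 95%, about 1.645 for 90%
   c(estimate = xbar, se = se, lower = xbar - z * se, upper = xbar + z * se)
}

set.seed(1)                  # arbitrary seed
waits <- rexp(1000, 0.1)     # stand-in data: exponential waits with mean 10
ci95 <- meanci(waits)
ci90 <- meanci(waits, level = 0.90)
print(ci95)
print(ci90)
```

Comparing the two printed intervals shows the tradeoff described above: the 90% interval is narrower, but it is formed by a method that misses the true mean more often.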


We can see from (4.18) what to do in general, if we are estimating some number $\theta$ by $\widehat{\theta}$[9] and the latter has an approximately normal distribution. Let $s.e.(\widehat{\theta})$ denote our estimate for the standard deviation of that distribution, i.e. the standard error of $\widehat{\theta}$. Then an approximate 95% confidence interval for $\theta$ is

$$\widehat{\theta} \pm 1.96 \; s.e.(\widehat{\theta}) \tag{4.19}$$

The standard error of the estimate is one of the most commonly used quantities in statistical applications. You will encounter it frequently in the output of R, for instance. Make sure you understand what it means and how it is used.

[9] The quantity is pronounced "theta-hat." The "hat" symbol is traditional for "estimate of."

4.2.7 Why Not Divide by n-1? The Notion of Bias

It should be noted that it is customary in (4.15) to divide by n-1 instead of n, for reasons that are largely historical. Here's the issue:

If we divide by n, as we have, then it turns out that

$$E(s^2) = \frac{n-1}{n} \sigma^2 \tag{4.20}$$

Think about this in the Davis people example, once again in the notebook context. Remember, here n is 1000, and each line of the notebook represents our taking a different random sample of 1000 people. Within each line, there will be entries for $W_1$ through $W_{1000}$, the weights of our 1000 people, and for $\overline{W}$ and s. For convenience, let's suppose we record that last column as $s^2$ instead of s.

Now, say we want to estimate the population variance $\sigma^2$. As discussed earlier, the natural estimator for it would be the sample variance, $s^2$. What (4.20) says is that after looking at an infinite number of lines in the notebook, the average value of $s^2$ would be just...a...little...bit...too...small. All the $s^2$ values would average out to $0.999\sigma^2$, rather than to $\sigma^2$. We might say that $s^2$ has a little bit more tendency to underestimate $\sigma^2$ than to overestimate it.

We say that $s^2$ is a biased estimator of $\sigma^2$, with the amount of bias being

$$E(s^2) - \sigma^2 = -\frac{1}{n} \sigma^2 \tag{4.21}$$

Let's prove (4.20). We'll use (4.16). As before, let W be a random variable distributed as the population. Recall from Section 4.2.2.1 that this implies that $EW = \mu$ and $Var(W) = \sigma^2$, where $\mu$ and $\sigma^2$ are the population mean and variance. Write

\begin{align}
E\left( \sum_{i=1}^n W_i^2 \right) &= nE(W^2) && \textrm{(Sec. 4.2.2.1)} \tag{4.22} \\
&= n[Var(W) + (EW)^2] && \textrm{(shortcut formula for Var())} \tag{4.23} \\
&= n[\sigma^2 + \mu^2] && \textrm{(Sec. 4.2.2.1)} \tag{4.24}
\end{align}


Continuing to work from (4.16), write

$$E[\overline{W}^2] = Var(\overline{W}) + [E(\overline{W})]^2 = \frac{1}{n} \sigma^2 + \mu^2 \tag{4.25}$$

Now using all this in (4.16), the $\mu^2$ terms cancel, leaving $\sigma^2 - \sigma^2/n$, and we get

$$E(s^2) = \frac{n-1}{n} \sigma^2 \tag{4.26}$$

The earlier developers of statistics were bothered by this bias, so they introduced a "fudge factor" by dividing by n-1 instead of n in (4.15). But we will use n. After all, when n is large (which is what we are assuming by using the Central Limit Theorem in the entire development so far) it doesn't make any appreciable difference. Clearly it is not important in our Davis example, or our bus simulation example.

Moreover, speaking generally now rather than necessarily for the case of $s^2$, there is no particular reason to insist that an estimator be unbiased anyway. An alternative estimator may have a little bias but much smaller variance, and thus might be preferable. And anyway, even if the classical version of $s^2$ is an unbiased estimator for $\sigma^2$, s is not an unbiased estimator for $\sigma$, the population standard deviation. In other words, unbiasedness is not such an important property.

So, in our treatment here, our definition of $s^2$ divides by n rather than by n-1.

The R functions var() and sd() calculate the versions of $s^2$ and s, respectively, that have a divisor of n-1.

4.2.8 And What About the Student-t Distribution?

Another thing we are not doing here is to use the Student t-distribution. That is the name of the distribution of the quantity

$$T = \frac{\overline{W} - \mu}{\tilde{s} / \sqrt{n}} \tag{4.27}$$

Here $\tilde{s}$ denotes the value of s under its classical definition, in which we divide by n-1 instead of n. Note carefully that we are assuming that the $W_i$ themselves (not just $\overline{W}$) have a normal distribution. The exact distribution of T is called the Student t-distribution with n-1 degrees of freedom. These distributions thus form a one-parameter family, with the degrees of freedom being the parameter.

This distribution has been tabulated. In R, for instance, the functions dt(), pt() and so on play the same roles as dnorm(), pnorm() etc. do for the normal family. The call qt(0.975,9) returns 2.26. This enables us to get an interval for $\mu$ from a sample of size 10, at EXACTLY a 95% confidence level, rather than being at an APPROXIMATE 95% level as we have had here, as follows.
Westartwith.12,replacing1.96by2.26, W )]TJ/F46 10.9091 Tf 10.717 0 Td [( = = p n byT,and by = .Doingthesamealgebra, wendthefollowingcondenceintervalfor : W )]TJ/F15 10.9091 Tf 10.909 0 Td [(2 : 26 ~ s p 10 ; W +2 : 26 ~ s p 10 .28
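The t-based interval can be sketched in R as follows. This is only an illustration under stated assumptions: the helper name t_ci and the simulated "weights" are hypothetical, and the data are drawn from an exactly normal population only because the t-based interval assumes one.

```r
# Exact 95% t-based interval for a population mean, in the form of (4.28).
# t_ci is a hypothetical helper name, not a built-in R function.
t_ci <- function(w, conf = 0.95) {
  n <- length(w)
  s_tilde <- sd(w)                       # sd() uses the divisor n-1
  tq <- qt(1 - (1 - conf) / 2, n - 1)    # about 2.26 when n = 10
  radius <- tq * s_tilde / sqrt(n)
  c(mean(w) - radius, mean(w) + radius)
}

set.seed(1)                              # arbitrary seed, for reproducibility
w <- rnorm(10, mean = 150, sd = 20)      # made-up sample of size 10
ci <- t_ci(w)
print(ci)
print(qt(0.975, 9))                      # t quantile with 9 degrees of freedom
print(qnorm(0.975))                      # N(0,1) quantile, about 1.96
```

The built-in t.test(w)$conf.int should give the same two endpoints, which is a convenient cross-check.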

Of course, for general n, replace 2.26 by $t_{0.975,n-1}$, the 0.975 quantile of the t-distribution with n-1 degrees of freedom. The distribution is tabulated by the R functions dt(), pt() and so on.

I do not use the t-distribution here because:

• It depends on the parent population having an exact normal distribution, which is never really true. In the Davis case, for instance, people's weights are approximately normally distributed, but definitely not exactly so. For that to be exactly the case, some people would have to have weights of say, a billion pounds, or negative weights, since any normal distribution takes on all values from $-\infty$ to $\infty$.

• For large n, the difference between the t-distribution and N(0,1) is negligible anyway.

4.2.9 Confidence Intervals for Proportions

In our bus example above, suppose we also want our simulation to print out the estimated probability that one must wait longer than 6.4 minutes:

    doexpt <- function(opt) {
       lastarrival <- 0.0
       while (lastarrival < ...
       ...
    ... > 6.4]) / nreps
    cat("approx. P(W > 6.4) = ",prop,"\n")

The value printed out for the probability is 0.516. We again ask the question, how can we gauge the accuracy of this number as an estimator of the true probability $P(W > 6.4)$?

4.2.9.1 Derivation

It turns out that we already have our answer, because a probability is just a special case of a mean. To see this, let

$$Y = \begin{cases} 1, & \textrm{if } W > 6.4 \\ 0, & \textrm{otherwise} \end{cases} \tag{4.29}$$

Then

$$EY = 1 \cdot P(Y = 1) + 0 \cdot P(Y = 0) = P(W > 6.4) \tag{4.30}$$

Let $p$ denote this probability, and let $\hat{p}$ denote our estimate of it; $\hat{p}$ is our prop in the program. In (4.16), take $W_i$ to be our $Y_i$ here, and note that $Y_i^2 = Y_i$. That means that

$$s^2 = \hat{p} - \hat{p}^2 = \hat{p}(1 - \hat{p}) \tag{4.31}$$

Equation (4.18) becomes

$$\left(\hat{p} - 1.96 \sqrt{\hat{p}(1-\hat{p})/n}, \; \hat{p} + 1.96 \sqrt{\hat{p}(1-\hat{p})/n}\right) \tag{4.32}$$

4.2.9.2 Examples

We incorporate that into our program:

    doexpt <- function(opt) {
       lastarrival <- 0.0
       while (lastarrival < ...
       ...
    ... > 6.4]) / nreps
    s2 <- prop * (1-prop)
    s <- sqrt(s2)
    radius <- 1.96 * s / sqrt(nreps)
    cat("approx. P(W > 6.4) = ",prop,", with a margin of error of ",radius,"\n")

In this case, we get a margin of error of 0.03, thus an interval of (0.51,0.57). We would say, "We don't know the exact value of $P(W > 6.4)$, so we ran a simulation. The latter estimates this probability to be 0.54, with a 95% margin of error of 0.03."

Note again that this uses the same principles as our Davis weights example. Suppose we were interested in estimating the proportion of adults in Davis who weigh more than 150 pounds. Suppose that proportion is 0.45 in our sample of 1000 people. This would be our estimate $\hat{p}$ for the population proportion $p$, and an approximate 95% confidence interval (4.32) for the population proportion would be (0.42,0.48). We would then say, "We are 95% confident that the true population proportion p of people who weigh over 150 pounds is between 0.42 and 0.48."
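The two numerical examples above can be reproduced with a few lines of R. This is a minimal sketch: prop_ci is a hypothetical helper name, and the value n = 1000 for the simulation example is an assumption (the number of replications is not stated, but 1000 is consistent with the quoted margin of error of 0.03).

```r
# Approximate 95% confidence interval for a proportion, following (4.32).
# prop_ci is a hypothetical helper name.
prop_ci <- function(phat, n, z = 1.96) {
  radius <- z * sqrt(phat * (1 - phat) / n)   # margin of error
  c(lower = phat - radius, upper = phat + radius, radius = radius)
}

sim_ci <- prop_ci(0.54, 1000)     # simulation example, n = 1000 assumed
davis_ci <- prop_ci(0.45, 1000)   # Davis weights example
print(round(sim_ci, 2))
print(round(davis_ci, 2))
```

Rounded to two decimal places, the first call gives the interval (0.51,0.57) and the second gives (0.42,0.48), each with a margin of error of 0.03.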

Note also that although we've used the word proportion in the Davis weights example instead of probability, they are the same. If I choose an adult at random from the population, the probability that his/her weight is more than 150 is equal to the proportion of adults in the population who have weights of more than 150.

And the same principles are used in opinion polls during presidential elections. Here $p$ is the population proportion of people who plan to vote for the given candidate. This is an unknown quantity, which is exactly the point of polling a sample of people, to estimate that unknown quantity p. Our estimate is $\hat{p}$, the proportion of people in our sample who plan to vote for the given candidate, and n is the number of people that we poll. We again use (4.32).

4.2.9.3 Interpretation

The same interpretation holds as before. Consider the examples in the last section:

• If each of you and 99 friends were to run the R program at the beginning of Section 4.2.9.2, you 100 people would get 100 confidence intervals for $P(W > 6.4)$. About 95 of you would have intervals that do contain that number.

• If each of you and 99 friends were to sample 1000 people in Davis and come up with confidence intervals for the true population proportion of people who weigh more than 150 pounds, about 95 of you would have intervals that do contain that true population proportion.

• If each of you and 99 friends were to sample 1200 people in an election campaign, to estimate the true population proportion of people who will vote for candidate X, about 95 of you will have intervals that do contain this population proportion.

4.2.9.4 (Non-)Effect of the Population Size

Note that in both the Davis and election examples, it doesn't matter what the size of the population is. The approximate distribution of $\hat{p}$, N(p, p(1-p)/n), and thus the accuracy of $\hat{p}$, depends only on $p$ and $n$. So when people ask, "How can a presidential election poll get by with sampling only 1200 people, when there are more than 100,000,000 voters in the U.S.?" now you know the answer. (We'll discuss the question "Why 1200?" below.)
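The "you and 99 friends" interpretation can be checked by simulation. The sketch below is not from the text: the true proportion p = 0.45, the seed, and the number of simulated polls are arbitrary choices made for illustration. Note that no population size appears anywhere in the code; only p and n matter.

```r
# Simulate many polls of size n from a population with true proportion p,
# and record how often the interval (4.32) contains p.
set.seed(2)                       # arbitrary seed
p <- 0.45                         # assumed "true" population proportion
n <- 1200                         # people polled in each sample
nsamples <- 10000                 # number of notebook lines
x <- rbinom(nsamples, n, p)       # number of "yes" answers in each poll
phat <- x / n
radius <- 1.96 * sqrt(phat * (1 - phat) / n)
covered <- (phat - radius <= p) & (p <= phat + radius)
coverage <- mean(covered)
print(coverage)                   # should be near 0.95
```

With 10,000 simulated polls instead of 100 friends, the fraction of intervals containing p should settle close to 0.95.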
Another way to see this is to think of a situation in which we wish to estimate the probability p of heads for a certain coin. We toss the coin n times, and use $\hat{p}$ as our estimate of p. Here our "population" (the population of all coin tosses) is infinite, yet it is still the case that 1200 tosses would be enough to get a good estimate of p.

4.2.9.5 Planning Ahead

Now, why do the pollsters sample 1200 people?

First, note that the maximum possible value of $\hat{p}(1 - \hat{p})$ is 0.25. [10] Then the pollsters know that their margin of error with n = 1200 will be at most $1.96 \times 0.5/\sqrt{1200}$, or about 3%, even before they poll anyone. They consider 3% to be sufficiently accurate for their purposes, so 1200 is the n they choose.

[10] Use calculus to find the maximum value of f(x) = x(1-x).

4.2.10 One-Sided Confidence Intervals

Confidence intervals as discussed so far give one both an upper and lower bound for the parameter of interest. (From here on, the word parameter is used in a broader context than just parametric families of distributions. The term will refer to any population quantity.)

In some applications, we are interested in having only an upper bound, or only a lower bound. One can go through the same kind of reasoning as in Section 4.2 above to obtain approximate 95% one-sided confidence intervals:

$$\left(\bar{W} - 1.65 \frac{s}{\sqrt{n}}, \; \infty\right) \tag{4.33}$$

$$\left(-\infty, \; \bar{W} + 1.65 \frac{s}{\sqrt{n}}\right) \tag{4.34}$$

Note the constant 1.65, which is the 0.95 quantile of the N(0,1) distribution, compared to 1.96, the 0.975 quantile.

4.2.11 Confidence Intervals for Differences of Means or Proportions

4.2.11.1 Independent Samples

Suppose in our sampling of people in Davis we are mainly interested in the difference in weights between men and women. Let $\bar{X}$ and $n_1$ denote the sample mean and sample size for men, and let $\bar{Y}$ and $n_2$ denote them for the women. Denote the population means and variances by $\mu_i$ and $\sigma_i^2$, i = 1,2. We wish to find a confidence interval for $\mu_1 - \mu_2$. The natural estimator for that quantity is $\bar{X} - \bar{Y}$.

In order to form a confidence interval for $\mu_1 - \mu_2$ using $\bar{X} - \bar{Y}$, we need to know the distribution of that latter quantity. To see this, recall that this is how we eventually got (4.18); we started by noting the distribution of $\bar{W}$, or more precisely the distribution of $(\bar{W} - \mu)/(\sigma/\sqrt{n})$ in (4.11), and then used that to derive our confidence interval. So, here we need to know the distribution of $\bar{X} - \bar{Y}$.

Note first that $\bar{X}$ and $\bar{Y}$ are independent. They come from separate people. Also, as noted before, they are approximately normally distributed. So, they jointly have an approximately bivariate normal distribution. Then from our earlier unit on multivariate distributions, page 85, we know that the linear combination

$$\bar{X} - \bar{Y} = 1 \cdot \bar{X} + (-1) \bar{Y} \tag{4.35}$$

will also have an approximately normal distribution, with mean $1 \cdot \mu_1 + (-1) \mu_2$ and variance $\sigma_1^2/n_1 + (-1)^2 \sigma_2^2/n_2$.

If we then let $s_i^2$, i = 1,2 denote the two sample variances, we have that

$$Z = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \tag{4.36}$$

has an approximate N(0,1) distribution, and working as before, we have that an approximate 95% confidence interval for $\mu_1 - \mu_2$ is

$$\left(\bar{X} - \bar{Y} - 1.96 \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \; \bar{X} - \bar{Y} + 1.96 \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right) \tag{4.37}$$

A similar derivation gives us an approximate 95% confidence interval for the difference in two population proportions $p_1 - p_2$:

$$\left(\hat{p}_1 - \hat{p}_2 - 1.96 \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \; \hat{p}_1 - \hat{p}_2 + 1.96 \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right) \tag{4.38}$$

where

$$s_i^2 = \hat{p}_i (1 - \hat{p}_i) \tag{4.39}$$

Example: In a network security application, C. Mano et al [11] compare round-trip travel time for packets involved in the same application in certain wired and wireless networks. The data was as follows:

    sample      sample mean   sample s.d.   sample size
    wired             2.000         6.299           436
    wireless         11.520         9.939           344

We had observed quite a difference, 11.52 versus 2.00, but could it be due to sampling variation? Maybe we have unusual samples? This calls for a confidence interval!

Then a 95% confidence interval for the difference between wireless and wired networks is

$$11.520 - 2.000 \pm 1.96 \sqrt{\frac{9.939^2}{344} + \frac{6.299^2}{436}} = 9.52 \pm 1.21 \tag{4.40}$$

So you can see that there is a big difference between the two networks, even after allowing for sampling variation.

[11] RIPPS: Rogue Identifying Packet Payload Slicer Detecting Unauthorized Wireless Hosts Through Network Traffic Conditioning, C. Mano and a ton of other authors, ACM Transactions on Information Systems and Security, to appear.
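The computation in the network example can be carried out in R from the summary statistics alone. This is a small sketch; the function name diff_means_ci is a hypothetical helper, not something defined in the text.

```r
# Approximate 95% CI for a difference of two population means, as in (4.37),
# computed from sample means, sample standard deviations and sample sizes.
# diff_means_ci is a hypothetical helper name.
diff_means_ci <- function(xbar, ybar, s1, s2, n1, n2, z = 1.96) {
  est <- xbar - ybar
  se <- sqrt(s1^2 / n1 + s2^2 / n2)   # standard error of xbar - ybar
  c(estimate = est, radius = z * se,
    lower = est - z * se, upper = est + z * se)
}

# Wireless minus wired, using the values in the table above
res <- diff_means_ci(11.520, 2.000, 9.939, 6.299, 344, 436)
print(round(res, 2))
```

Since the whole interval lies well above 0, the conclusion that the two networks differ does not hinge on the exact value of the margin of error.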

4.2.11.2 Random Sample Size

In our Davis weights example in Section 4.2.3.1, we were implicitly assuming that the sample sizes of the two groups, $n_1$ and $n_2$, were nonrandom. For instance, we might sample 500 men and 500 women.

On the other hand, we might simply sample 1000 people without regard to gender. Then the number of men and women in the sample would be random. Think once again of our notebook view. In our first sample of 1000 people, we might have 492 men and 508 women. In our second sample, the gender breakdown might be 505 and 495, and so on. In keeping with the convention to denote random quantities by capital letters, we might write the numbers of men and women in our sample as $N_1$ and $N_2$.

However, in most cases it should not matter. As long as there is not some odd property of our sampling method, e.g. one in which there would be a tendency for larger samples to have shorter men, we can simply do our inference conditionally on $N_1$ and $N_2$, thus treating them as constants.

4.2.11.3 Dependent Samples

Note carefully, though, that a key point above was the independence of the two samples. By contrast, suppose we wish, for instance, to find a confidence interval for $\mu_1 - \mu_2$, the difference in mean weights in Davis of 15-year-old and 10-year-old children, and suppose our data consist of pairs of weight measurements at the two ages on the same children. In other words, we have a sample of n children, and for the $i^{th}$ child we have his/her weight $U_i$ at age 15 and $V_i$ at age 10. Let $\bar{U}$ and $\bar{V}$ denote the sample means.

The problem is that the two sample means are not independent. If a child is heavier than his/her peers at age 15, he/she was probably heavier than them when they were all age 10. In other words, for each i, $V_i$ and $U_i$ are positively correlated, and thus the same is true for $\bar{V}$ and $\bar{U}$. Thus we cannot use (4.37).
However, the random variables $T_i = U_i - V_i$, i = 1,2,...,n are still independent. Thus we can use (4.18), so that our approximate 95% confidence interval is

$$\left(\bar{T} - 1.96 \frac{s}{\sqrt{n}}, \; \bar{T} + 1.96 \frac{s}{\sqrt{n}}\right) \tag{4.41}$$

where $s^2$ is the sample variance of the $T_i$.

A common situation in which we have dependent samples is that in which we are comparing two dependent proportions. Suppose for example that there are three candidates running for a political office, A, B and C. We poll 1,000 voters and ask whom they plan to vote for. Let $p_A$, $p_B$ and $p_C$ be the three population proportions of people planning to vote for the various candidates, and let $\hat{p}_A$, $\hat{p}_B$ and $\hat{p}_C$ be the corresponding sample proportions.

Suppose we wish to form a confidence interval for $p_A - p_B$. Clearly, the two sample proportions are not independent random variables, since for instance if $\hat{p}_A = 1$ then we know for sure that $\hat{p}_B$ is 0. To deal with this, we could set up variables $U_i$ and $V_i$ as above, with for example $U_i$ being 1 or 0, according to whether the $i^{th}$ person in our sample plans to vote for A or not.

But we can do better. Let $N_A$, $N_B$ and $N_C$ denote the actual numbers of people in our sample who state they will vote for the various candidates, so that for instance $\hat{p}_A = N_A/1000$. Well, the point is that the vector $(N_A, N_B, N_C)^T$ has a multinomial distribution, for which $Cov(N_A, N_B) = -1000 \, p_A p_B$. Thus we know that

$$\hat{p}_A - \hat{p}_B = 0.001 N_A - 0.001 N_B \tag{4.42}$$

has variance

$$0.001 p_A (1 - p_A) + 0.001 p_B (1 - p_B) + 0.002 p_A p_B \tag{4.43}$$

So, the standard error of $\hat{p}_A - \hat{p}_B$ is

$$\sqrt{0.001 \hat{p}_A (1 - \hat{p}_A) + 0.001 \hat{p}_B (1 - \hat{p}_B) + 0.002 \hat{p}_A \hat{p}_B} \tag{4.44}$$

4.2.12 Example: Machine Classification of Forest Covers

Remote sensing is machine classification of type from variables observed aerially, typically by satellite. In the application we'll consider here, researchers want to predict forest cover type for a given location (there are seven different types), from known geographic data, as direct observation is too expensive and may suffer from land access permission issues. See Blackard, Jock A. and Denis J. Dean, 2000, "Comparative Accuracies of Artificial Neural Networks and Discriminant Analysis in Predicting Forest Cover Types from Cartographic Variables," Computers and Electronics in Agriculture, 24:131-151.

There were over 50,000 observations, but for simplicity we'll just use the first 1,000 here.

One of the variables was the amount of hillside shade at noon, which we'll call HS12. Let's find an approximate 95% confidence interval for the difference in population mean HS12 values in cover type 1 and type 2 locations. The two sample means were 223.8 and 226.3, with s values of 15.3 and 14.3, and the sample sizes were 226 and 585. So our confidence interval is

$$223.8 - 226.3 \pm 1.96 \sqrt{\frac{15.3^2}{226} + \frac{14.3^2}{585}} = -2.5 \pm 2.3 = (-4.8, -0.2) \tag{4.45}$$

Now let's find a confidence interval for the difference in population proportions of sites that have cover types 1 and 2. Our sample estimate is

$$\hat{p}_1 - \hat{p}_2 = 0.226 - 0.585 = -0.359 \tag{4.46}$$

The standard error of this quantity, from (4.44), is

$$\sqrt{0.001 \cdot 0.226 \cdot 0.774 + 0.001 \cdot 0.585 \cdot 0.415 + 0.002 \cdot 0.226 \cdot 0.585} = 0.026 \tag{4.47}$$

That gives us a confidence interval of

$$-0.359 \pm 1.96 \cdot 0.026 = (-0.410, -0.308) \tag{4.48}$$
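The standard error of a difference of two proportions from the same multinomial sample can be checked by simulation. The sketch below is an illustration, not the author's code: dep_prop_se is a hypothetical helper name, the seed and number of replications are arbitrary, and the third cell probability is simply the remainder 1 - 0.226 - 0.585. It relies on the standard multinomial fact Cov(N_A, N_B) = -n p_A p_B.

```r
# Standard error of phat_A - phat_B when both proportions come from the
# same multinomial sample of size n.  Because Cov(N_A, N_B) = -n p_A p_B,
# the covariance term enters the variance of the difference with a plus sign.
# dep_prop_se is a hypothetical helper name.
dep_prop_se <- function(pa, pb, n) {
  sqrt((pa * (1 - pa) + pb * (1 - pb) + 2 * pa * pb) / n)
}

set.seed(3)                               # arbitrary seed
n <- 1000
probs <- c(0.226, 0.585, 0.189)           # third value is just the remainder
counts <- rmultinom(20000, n, probs)      # one column per simulated sample
d <- (counts[1, ] - counts[2, ]) / n      # simulated phat_1 - phat_2 values
se_formula <- dep_prop_se(0.226, 0.585, n)
print(sd(d))                              # simulated standard deviation
print(se_formula)                         # value from the formula
```

The simulated standard deviation and the formula value should agree closely, which is a useful sanity check on the sign of the covariance term.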

4.2.13 Exact Confidence Intervals

Recall how we derived our previous confidence intervals. We began with a probability statement involving our estimator, and then did some algebra to turn it around into a formula for a confidence interval. Those operations had nothing to do with the approximate nature of the distributions involved. We can do the same thing if we have exact distributions.

For example, suppose we have a random sample $X_1, ..., X_{10}$ from an exponential distribution with parameter $\lambda$. Let's find an exact 95% confidence interval for $\lambda$.

Let

$$T = X_1 + ... + X_{10} \tag{4.49}$$

Recall that T has a gamma distribution with parameters 10 (the "shape," in R's terminology) and $\lambda$. Let $q(\lambda)$ denote the 0.95 quantile of this distribution, i.e. the point to the right of which there is only 5% of the area under the density. Note carefully that this is indeed a function of $\lambda$; it has different values for different $\lambda$.

Then:

$$0.95 = P[T \leq q(\lambda)] = P[q^{-1}(T) \geq \lambda] \tag{4.50}$$

Here we have used the fact that q is a decreasing function.

So, an EXACT 95% one-sided confidence interval for $\lambda$ is

$$(0, q^{-1}(T)) \tag{4.51}$$

Now, what IS $q^{-1}$? Recall what q is, the 0.95 quantile of the gamma distribution with shape 10. It always helps intuition to look at some specific numbers:

    > qgamma(0.95,10,2.5)
    [1] 6.282087
    > qgamma(0.95,10,4)
    [1] 3.926304

So, q(2.5) = 6.28 and q(4) = 3.93. That means $q^{-1}(6.28) = 2.5$ and $q^{-1}(3.93) = 4$.

You can now see how we can form the interval. Say T = 16.4. Then we do some trial-and-error until we find a number w such that qgamma(0.95,10,w) = 16.4. Our confidence interval is then (0,w).

4.2.14 Slutsky's Theorem (advanced topic)

The reader should review Section 2.3.2.6 before continuing.

Since one generally does not know the value of $\sigma$ in (4.13), we replace it by $s$, yielding (4.17). Why was that legitimate?

The answer depends on the theorem below. First, we need a definition.
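For the exact interval of Section 4.2.13, the trial-and-error search for w can be handed to a root finder. The following is one possible sketch, using R's uniroot(); the search bracket (0.01, 100) is an arbitrary assumption chosen to be wide enough for this example.

```r
# One-sided exact 95% interval (0, w) for lambda, given T = X_1 + ... + X_10.
# q(lambda) = qgamma(0.95, 10, lambda) is decreasing in lambda, so we solve
# q(w) = T numerically instead of by trial and error.
tobs <- 16.4                                   # observed value of T
w <- uniroot(function(lam) qgamma(0.95, 10, lam) - tobs,
             interval = c(0.01, 100),          # assumed search bracket
             tol = 1e-10)$root
print(w)

# The rate acts as a scale parameter, so q(lambda) = q(1) / lambda,
# which gives the same answer in closed form.
w_closed <- qgamma(0.95, 10, 1) / tobs
print(w_closed)
```

The closed-form line works only because the gamma rate is a scale parameter; the root-finding version is the one that carries over to other exact intervals.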

Definition 3 We say that a sequence of random variables $L_n$ converges in probability to the random variable $L$ if for every $\epsilon > 0$,

$$\lim_{n \rightarrow \infty} P(|L_n - L| > \epsilon) = 0 \tag{4.52}$$

This is a little weaker than convergence with probability 1, as in the Strong Law of Large Numbers (SLLN, Section 1.4.10). Convergence with probability 1 implies convergence in probability but not vice versa.

So for example, if $Q_1, Q_2, Q_3, ...$ are i.i.d. with mean $\mu$, then the SLLN implies that

$$L_n = \frac{Q_1 + ... + Q_n}{n} \tag{4.53}$$

converges with probability 1 to $\mu$, and thus $L_n$ converges in probability to $\mu$ too.

4.2.14.1 The Theorem

Theorem 4 (Slutsky's Theorem; abridged version) Consider random variables $X_n$, $Y_n$, and $X$, such that $X_n$ converges in distribution to $X$ and $Y_n$ converges in probability to a constant $c$. Then:

(a) $X_n + Y_n$ converges in distribution to $X + c$.

(b) $X_n / Y_n$ converges in distribution to $X/c$.

4.2.14.2 Why It's Valid to Substitute $s$ for $\sigma$

We now return to the question raised above. In our context here, we take

$$X_n = \frac{\bar{W} - \mu}{\sigma/\sqrt{n}} \tag{4.54}$$

$$Y_n = \frac{s}{\sigma} \tag{4.55}$$

We know that (4.54) converges in distribution to N(0,1) while (4.55) converges in probability to 1. Thus for large n, we have that

$$\frac{\bar{W} - \mu}{s/\sqrt{n}} \tag{4.56}$$

has an approximate N(0,1) distribution, so that (4.17) is valid.
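The conclusion that (4.56) is approximately N(0,1) can be checked empirically, even for a population that is far from normal. The sketch below is only an illustration; the exponential population with mean 1, the sample size n = 500, the seed and the number of replications are all arbitrary assumptions.

```r
# Check that (Wbar - mu) / (s / sqrt(n)) behaves like N(0,1) for large n,
# using a non-normal population (exponential with mean 1).
set.seed(4)                              # arbitrary seed
n <- 500                                 # sample size per notebook line
nreps <- 10000                           # number of notebook lines
mu <- 1                                  # true mean of the exponential
z <- replicate(nreps, {
  w <- rexp(n, 1)
  s <- sqrt(mean(w^2) - mean(w)^2)       # divisor-n version of s
  (mean(w) - mu) / (s / sqrt(n))
})
tail_frac <- mean(abs(z) > 1.96)
print(tail_frac)                         # should be near 0.05
```

If the substitution of s for $\sigma$ were not legitimate, the fraction of values beyond 1.96 in absolute value would drift away from 0.05 as n grows rather than toward it.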

4.2.14.3 Example: Confidence Interval for a Ratio Estimator

Again consider the example in Section 4.2.3.1 of weights of men and women in Davis, but this time suppose we wish to form a confidence interval for the ratio of the means,

$$\gamma = \frac{\mu_1}{\mu_2} \tag{4.57}$$

Again, the natural estimator is

$$\hat{\gamma} = \frac{\bar{X}}{\bar{Y}} \tag{4.58}$$

How can we construct a confidence interval from this estimator? If it were a linear combination of $\bar{X}$ and $\bar{Y}$ we'd have no problem, since a linear combination of multivariate normal random variables is again normal.

That is not exactly the case here, but it's close. Since $\bar{Y}$ converges in probability to $\mu_2$, Slutsky's Theorem (Section 4.2.14) tells us that the problem here really is one of such a linear combination. We can form a confidence interval for $\mu_1$, then divide both endpoints of the interval by $\bar{Y}$, yielding a confidence interval for $\gamma$.

4.2.15 The Delta Method: Confidence Intervals for General Functions of Means or Proportions (advanced topic)

The delta method is a great way to derive asymptotic distributions of quantities that are functions of random variables whose asymptotic distributions are already known.

4.2.15.1 The Theorem

Theorem 5 Suppose $R_1, ..., R_k$ are estimators of $\eta_1, ..., \eta_k$ based on a random sample of size n, such that the random vector

$$\sqrt{n}(R - \eta) = \sqrt{n} \begin{pmatrix} R_1 - \eta_1 \\ R_2 - \eta_2 \\ ... \\ R_k - \eta_k \end{pmatrix} \tag{4.59}$$

has an asymptotically multivariate normal distribution with mean 0 and nonsingular covariance matrix $\Sigma = (\sigma_{ij})$. Let h be a smooth scalar function of k variables, with $h_i$ denoting its $i^{th}$ partial derivative. Consider the random variable

$$Y = h(R_1, ..., R_k) \tag{4.60}$$

Then $\sqrt{n}[Y - h(\eta_1, ..., \eta_k)]$ converges in distribution to a normal distribution with mean 0 and variance

$$[\nu_1, ..., \nu_k]^T \, \Sigma \, [\nu_1, ..., \nu_k] \tag{4.61}$$

provided not all of

$$\nu_i = h_i(\eta_1, ..., \eta_k), \; i = 1, ..., k \tag{4.62}$$

are 0.

Informally, the theorem says that $Y$ will be approximately normal with mean $h(\eta_1, ..., \eta_k)$ and variance 1/n times (4.61). This can be used to form confidence intervals for $h(\eta_1, ..., \eta_k)$. Of course, the quantities in (4.61) are typically estimated from the sample.

In other words, our approximate 95% confidence interval for $h(\eta_1, ..., \eta_k)$ is

$$h(R_1, ..., R_k) \pm 1.96 \sqrt{\frac{1}{n} [\hat{\nu}_1, ..., \hat{\nu}_k]^T \, \hat{\Sigma} \, [\hat{\nu}_1, ..., \hat{\nu}_k]} \tag{4.63}$$

Proof

We'll cover the case k = 1 (dropping the subscript 1 for convenience). Recall the Mean Value Theorem from calculus: [12]

$$h(R) = h(\eta) + h'(W)(R - \eta) \tag{4.64}$$

for some $W$ between $\eta$ and $R$. Rewriting this, we have

$$\sqrt{n}[h(R) - h(\eta)] = \sqrt{n} \, h'(W)(R - \eta) \tag{4.65}$$

It can be shown (and should be intuitively plausible to you) that if a sequence of random variables converges in distribution to a constant, the convergence is in probability too. So, $R - \eta$ converges in probability to 0, forcing $W$ to converge in probability to $\eta$. Then from Slutsky's Theorem, the asymptotic distribution of (4.65) is the same as that of $\sqrt{n} \, h'(\eta)(R - \eta)$. The result follows.

4.2.15.2 Example: Square Root Transformation

It used to be common, and to some degree is still common today, for statistical analysts to apply a square-root transformation to Poisson data. The delta method sheds light on the motivation for this.

[12] This is where the "delta" in the name of the method comes from, an allusion to the fact that derivatives are limits of difference quotients.

Consider a random variable $X$ that is Poisson-distributed with mean $\lambda$. Recall from Section 3.8.1.2 that sums of independent Poisson random variables are themselves Poisson distributed. For that reason, $X$ has the same distribution as

$$Y_1 + ... + Y_k \tag{4.66}$$

where the $Y_i$ are i.i.d. Poisson random variables each having mean $\lambda/k$. By the Central Limit Theorem, $Y_1 + ... + Y_k$ has an approximate normal distribution, with mean and variance $\lambda$. (This is not quite a rigorous argument, as the mean of $Y_i$ depends on k, so our treatment here is informal.)

Now consider $W = \sqrt{X} = \sqrt{Y_1 + ... + Y_k}$. Let $h(t) = \sqrt{t}$, so that $h'(t) = 1/(2\sqrt{t})$. The delta method then says that $W$ also has an approximate normal distribution, with asymptotic variance

$$[h'(\lambda)]^2 \cdot \lambda = \frac{1}{4\lambda} \cdot \lambda = \frac{1}{4} \tag{4.67}$$

So, the asymptotic variance of $\sqrt{X}$ is a constant, independent of $\lambda$. This becomes relevant in regression analysis, where, as we will discuss in Chapter 6, a classical assumption is that a certain collection of random variables all have the same variance.

4.2.15.3 Example: Confidence Interval for $\sigma^2$

Recall that in Section 4.2.8 we noted that (4.18) is only an approximate confidence interval for the mean. An exact interval is available using the Student t-distribution, if the population is normally distributed. We pointed out that (4.18) is very close to the exact interval for even moderately large n anyway, and since no population is exactly normal, (4.18) is good enough. Note that one of the implications of this and the fact that (4.18) did not assume any particular population distribution is that a Student-t based confidence interval works well even for non-normal populations. We say that the Student-t interval is robust to the normality assumption.

But what about a confidence interval for a variance? Here one can form an exact interval based on the chi-square distribution, if the population is normal. In this case, though, the interval does NOT work well for non-normal populations; it is NOT robust to the normality assumption. So, let's derive an interval that doesn't assume normality; we'll use the delta method. Warning: This will get a little messy.
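Before working through that derivation, the variance-stabilizing property of the square root from Section 4.2.15.2 can be checked numerically. This is a rough sketch only; the three Poisson means, the seed and the number of draws are arbitrary choices for illustration.

```r
# Variance of sqrt(X) for Poisson X with several different means.
# The delta method predicts a value near 1/4 regardless of the mean.
set.seed(5)                               # arbitrary seed
lambdas <- c(20, 50, 200)                 # arbitrary illustrative means
v <- sapply(lambdas, function(lam) var(sqrt(rpois(100000, lam))))
print(cbind(lambda = lambdas,
            var_X = lambdas,              # Var(X) equals the mean
            var_sqrt_X = round(v, 3)))
```

The variance of X itself grows in step with the mean, while the variance of $\sqrt{X}$ stays close to 0.25, with the approximation improving as the mean grows.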
Write

$$\sigma^2 = E(W^2) - (EW)^2 \tag{4.68}$$

and from (4.16) write our estimator of $\sigma^2$ as

$$s^2 = \frac{1}{n} \sum_{i=1}^n W_i^2 - \bar{W}^2 = T_2 - T_1^2 \tag{4.69}$$

We are using our old notation, with $W_1, ..., W_n$ being a random sample from our population, and with W representing a random variable having the population distribution.

Since $ET_2 = E(W^2)$ and $ET_1 = EW$, we take our function h to be

$$h(u,v) = u - v^2 \tag{4.70}$$

In other words, in the notation of the theorem, $R_1$ is our $T_2$ and $R_2$ is our $T_1$.

We'll need the variances of $T_1$ and $T_2$, and their covariance. We already have their means, as noted above. We also have the variance of $T_1$, from (4.9):

$$Var(T_1) = \frac{1}{n} Var(W) \tag{4.71}$$

Now for the variance of $T_2$: Using (4.9) but on $W^2$ instead of W, we have

$$Var(T_2) = \frac{1}{n} Var(W^2) = \frac{1}{n} [E(W^4) - (E(W^2))^2] \tag{4.72}$$

Now for the covariance:

$$Cov(T_1, T_2) = \frac{1}{n^2} \sum_{i=1}^n Cov(W_i, W_i^2) = \frac{1}{n^2} \, n \, Cov(W, W^2) \tag{4.73}$$

But from the famous formula for covariance,

$$Cov(W, W^2) = E(W^3) - EW \cdot E(W^2) \tag{4.74}$$

To summarize:

$$Var(T_1) = \frac{1}{n} [E(W^2) - (EW)^2] \tag{4.75}$$

$$Var(T_2) = \frac{1}{n} [E(W^4) - (E(W^2))^2] \tag{4.76}$$

$$Cov(T_1, T_2) = \frac{1}{n} [E(W^3) - EW \cdot E(W^2)] \tag{4.77}$$

Also, $h'(u,v) = (1, -2v)^T$, which evaluated at the population values gives

$$h'(ET_2, ET_1) = (1, -2ET_1)^T = (1, -2EW)^T \tag{4.78}$$

The asymptotic variance to use in our confidence interval for $\sigma^2$ is seen in (4.61) to be

$$\frac{1}{n} \left[ E(W^4) - (E(W^2))^2 + 4(EW)^2 \{E(W^2) - (EW)^2\} - 4EW \{E(W^3) - EW \cdot E(W^2)\} \right] \tag{4.79}$$

Now, we do not know the value of $E(W^m)$ here, m = 1,2,3,4. So, we estimate $E(W^m)$ as

$$\frac{1}{n} \sum_{i=1}^n W_i^m \tag{4.80}$$

Our confidence interval is then $s^2$ plus and minus 1.96 times the square root of the resulting estimate of (4.79).

It should be noted, though, that estimating means of higher powers of a random variable requires larger samples in order to achieve comparable accuracy. Our confidence interval here may need a rather large sample to be accurate, as opposed to the situation with (4.18), in which even n = 20 should work well.

4.2.16 Simultaneous Confidence Intervals

Suppose in our study of heights, weights and so on of people in Davis, we are interested in estimating a number of different quantities, with our forming a confidence interval for each one. Though our confidence level for each one of them will be 95%, our overall confidence level will be less than that. In other words, we cannot say we are 95% confident that all the intervals contain their respective population values.

In some cases we may wish to construct confidence intervals in such a way that we can say we are 95% confident that all the intervals are correct. This branch of statistics is known as simultaneous inference or multiple inference.

Usually this kind of methodology is used in the comparison of several treatments. This term originated in the life sciences, e.g. comparing the effectiveness of several different medications for controlling hypertension, but it can be applied in any context. For instance, we might be interested in comparing how well programmers do in several different programming languages, say Python, Ruby and Perl. We'd form three groups of programmers, one for each language, with say 20 programmers per group. Then we would have them write code for a given application. Our measurement could be the length of time T that it takes for them to develop the program to the point at which it runs correctly on a suite of test cases.
Let $T_{ij}$ be the value of T for the $j^{th}$ programmer in the $i^{th}$ group, i = 1,2,3, j = 1,2,...,20. We would then wish to compare the three "treatments," i.e. programming languages, by estimating $\mu_i = ET_{i1}$, i = 1,2,3. Our estimators would be $U_i = \sum_{j=1}^{20} T_{ij}/20$, i = 1,2,3. Since we are comparing the three population means, we may not be satisfied with simply forming ordinary 95% confidence intervals for each mean. We may wish to form confidence intervals which jointly have confidence level 95%. [13]

[13] The word may is important here. It really is a matter of philosophy as to whether one uses simultaneous inference procedures.

Note very, very carefully what this means. As usual, think of our notebook idea. Each line of the notebook would contain the 60 observations; different lines would involve different sets of 60 people. So, there would be 60 columns for the raw data, three columns for the $U_i$. We would also have six more columns for the confidence intervals (lower and upper bounds) for the $\mu_i$. Finally, imagine three more columns, one for each confidence interval, with the entry for each being either Right or Wrong. A confidence interval is labeled Right if it really does contain its target population value, and otherwise is labeled Wrong.

Now, if we construct individual 95% confidence intervals, that means that in a given Right/Wrong column, in the long run 95% of the entries will say Right. But for simultaneous intervals, we hope that within a line we see three Rights, and 95% of all lines will have that property.

In our context here, if we set up our three intervals to have individual confidence levels of 95%, their simultaneous level will be $0.95^3 = 0.86$, since the three confidence intervals are independent. Conversely, if we want a simultaneous level of 0.95, we could take each one at a 98.3% level, since $0.95^{1/3} \approx 0.983$.

However, in general the intervals we wish to form will not be independent, so the above "cube root method" would not work. Here we will give a short introduction to more general procedures.

Note that "nothing in life is free." If we want simultaneous confidence intervals, they will be wider.

4.2.16.1 The Bonferroni Method

One simple approach is Bonferroni's Inequality:

Lemma 6 Suppose $A_1, ..., A_g$ are events. Then

$$P(A_1 \textrm{ or } ... \textrm{ or } A_g) \leq \sum_{i=1}^g P(A_i) \tag{4.81}$$

You can easily see this for g = 2:

$$P(A_1 \textrm{ or } A_2) = P(A_1) + P(A_2) - P(A_1 \textrm{ and } A_2) \leq P(A_1) + P(A_2) \tag{4.82}$$

One can then prove the general case by mathematical induction.

Now to apply this to forming simultaneous confidence intervals, take $A_i$ to be the event that the $i^{th}$ confidence interval is incorrect, i.e. fails to include the population quantity being estimated. Then (4.81) says that if, say, we form two confidence intervals, each having individual confidence level (100-5/2)%, i.e. 97.5%, then the overall collective confidence level for those two intervals is at least 95%. Here's why: Let $A_1$ be the event that the first interval is wrong, and let $A_2$ be the corresponding event for the second interval. Then

$$\textrm{overall conf. level} = P(\textrm{not } A_1 \textrm{ and not } A_2) \tag{4.83}$$

$$= 1 - P(A_1 \textrm{ or } A_2) \tag{4.84}$$

$$\geq 1 - P(A_1) - P(A_2) \tag{4.85}$$

$$= 1 - 0.025 - 0.025 \tag{4.86}$$

$$= 0.95 \tag{4.87}$$
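The normal-quantile multipliers implied by this argument are easy to compute. The sketch below assumes the overall error probability is split equally among g two-sided intervals; the function name bonf_mult is a hypothetical helper introduced only for illustration.

```r
# Multiplier to use in place of 1.96 when forming g simultaneous two-sided
# intervals with overall level at least 1 - alpha, splitting alpha equally.
# bonf_mult is a hypothetical helper name.
bonf_mult <- function(g, alpha = 0.05) qnorm(1 - alpha / (2 * g))

print(bonf_mult(1))     # a single interval: the usual 1.96
print(bonf_mult(2))     # two intervals, each at the 97.5% level
print(bonf_mult(3))     # three intervals

# For comparison, the independent case discussed earlier
joint_level <- 0.95^3         # joint level of three independent 95% intervals
indiv_level <- 0.95^(1/3)     # individual level needed for joint 95%
print(joint_level)
print(indiv_level)
```

The multiplier grows slowly with g, which is the price paid in interval width for the simultaneous guarantee.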

4.2.INTRODUCTIONTOCONFIDENCEINTERVALS 125 4.2.16.2Scheffe'sMethodadvancedtopic TheBonferonnimethodisunsuitableformorethanafewintervals;eachonewouldhavetohavesucha highindividualcondencelevelthattheintervalswouldbeverywide.Manyalternativesexist,afamous onebeing Scheffe'smethod 14 Theorem7 Suppose R 1 ;:::;R k haveanapproximatelymultivariatenormaldistribution,withmeanvector = i andcovariancematrix = ij .Let b beaconsistentestimatorof Foranyconstants c 1 ;:::;c k ,considerlinearcombinationsofthe R i k X i =1 c i R i .88 whichestimate k X i =1 c i i .89 Formthecondenceintervals k X i =1 c i R i q k 2 ; k s c 1 ;:::;c k .90 where [ s c 1 ;:::;c k ] 2 = c 1 ;:::;c k T b c 1 ;:::;c k .91 andwhere 2 ; k istheupperpercentileofachi-squaredistributionwithkdegreesoffreedom. 15 Thenalloftheseintervalsforinnitelymanyvaluesofthe c i !havesimultaneouscondencelevel 1 )]TJ/F46 10.9091 Tf 10.909 0 Td [( Bytheway,ifweareinterestedinonlyconstructingcondenceintervalsfor contrasts ,i.e. c i havingthe propertythat i c i =0 ,wethenumberofdegreesoffreedomreducestok-1,thusproducingnarrower intervals. JustasinSection4.2.7weavoidedthet-distribution,herewehaveavoidedtheFdistribution,whichisused insteadofch-squareintheexactformofScheffe'smethod. 14 Thenameispronouncedsheh-FAY. 15 RecallthatthedistributionofthesumofsquaresofgindependentN,1randomvariablesiscalled chi-squarewithgdegrees offreedom .ItistabulatedintheRstatisticalpackage'sfunction qchisq .

PAGE 144

126 CHAPTER4.INTRODUCTIONTOSTATISTICALINFERENCE 4.2.16.3Example Forexample,againconsidertheDavisheightsexampleinSection4.2.11.Supposewewanttondapproximate95%condenceintervalsfortwopopulationquantities, 1 and 2 .Thesecorrespondtovaluesof c 1 ;c 2 of,0and,1.Sincethetwosamplesareindependent, 12 =0 .Thechi-squarevalueis5.99, 16 sothesquarerootin.90is3.46.So,wewouldcompute.18for X andthenfor Y ,butwoulduse3.46 insteadof1.96. ThisactuallyisnotasgoodasBonferonniinthiscase.ForBonferonni,wewouldndtwo97.5%condence intervals,whichwoulduse2.24insteadof1.96. Scheffe'smethodistooconservativeifwejustareformingasmallnumberofintervals,butitisgreatifwe formalotofthem.Moreover,itisverygeneral,usablewheneverwehaveasetofapproximatelynormal estimators. 4.2.16.4OtherMethodsforSimultaneousInference Therearemanyothermethodsforsimultaneousinference.Itshouldbenoted,though,thatmanyofthem arelimitedinscope,incontrasttoScheffe'smethod,whichisusablewheneveronehasmultivariatenormal estimators,andBonferonni'smethod,whichisuniversallyusable. 4.2.17TheBootstrapMethodforFormingCondenceIntervalsadvancedtopic Manystatisticalapplicationscanbequitecomplex,whichmakesthemverydifculttoanalyzemathematically.Fortunately,thereisafairlygeneralmethodforndingcondenceintervalscalledthe bootstrap Hereisaverybriefoverview. Sayweareestimatingsomepopulationvalue basedoni.i.d.randomvariables Q i ,i=1,...,n.Sayour estimatoris b .Thenwedrawknewsamplesofsizen,bydrawingvalueswithreplacementfromthe Q i Foreachsample,werecompute b ,givingusvalues ~ i ,i=1,...,k.Wesorttheselattervaluesandndthe 0.025and0.975quantiles,i.e.the2.5%and97.5%pointsofthevalues ~ i ,i=1,...,k.Thesetwopointsform ourcondenceintervalfor Rincludesthe boot functiontodothemechanicsofthisforus. 4.3HypothesisTesting 4.3.1TheBasics Supposeyouhaveacoinwhichyouwanttoassessforfairness.Letpbetheprobabilityofheadsfor thecoin.Youcouldtossthecoin,say,100times,andthenformacondenceintervalforpusing.32. 
The width of the interval would tell you whether 100 tosses was enough for the accuracy you want, and the location of the interval would tell you whether the coin is "fair" enough.

[16] Obtained from R via qchisq(0.95,2).
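To make the interval idea concrete for the coin, here is a rough sketch that forms two intervals from simulated tosses: the usual normal-approximation interval (assumed here to be what (4.32) refers to) and a percentile bootstrap interval built by hand along the lines of Section 4.2.17. The data are simulated, and the variable names and the choice k = 2000 are illustrative assumptions.

```r
set.seed(1)                        # only so the illustration is reproducible
n <- 100
tosses <- rbinom(n, 1, 0.5)        # simulated data: 1 = heads, 0 = tails
p_hat <- mean(tosses)

# Normal-approximation interval: estimate plus/minus 1.96 standard errors
se <- sqrt(p_hat * (1 - p_hat) / n)
ci_normal <- c(p_hat - 1.96 * se, p_hat + 1.96 * se)

# Percentile bootstrap: resample the data with replacement, recompute the
# estimate each time, then take the 2.5% and 97.5% points of those values
k <- 2000
boot_vals <- replicate(k, mean(sample(tosses, n, replace = TRUE)))
ci_boot <- unname(quantile(boot_vals, c(0.025, 0.975)))

print(ci_normal)
print(ci_boot)
```

For a simple estimator like a sample proportion the two intervals should be close; the bootstrap earns its keep when no convenient standard error formula is available.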


For instance, if your interval were (0.49,0.54), you might feel satisfied that this coin is reasonably fair. In fact, note carefully that even if the interval were, say, (0.502,0.506), you would still consider the coin to be reasonably fair.

Unfortunately, this entire process would be counter to the traditional usage of statistics. Most users of statistics would use the toss data to test the null hypothesis

$$H_0: p = 0.5 \tag{4.92}$$

against the alternate hypothesis

$$H_A: p \neq 0.5 \tag{4.93}$$

The approach is to consider $H_0$ "innocent until proven guilty." We form the test statistic

$$Z = \frac{\hat{p} - 0.5}{\sqrt{\frac{1}{n} \hat{p} (1 - \hat{p})}} \tag{4.94}$$

Under $H_0$ the random variable Z would have an approximate N(0,1) distribution. The basic idea is that if Z turns out to have a value which is rare for that distribution, we say, "Rather than believe we've observed a rare event, we choose instead to abandon our assumption that $H_0$ is true."

So, what do we take for our cutoff value for rareness? This probability is called the significance level, denoted by $\alpha$. The classical value for $\alpha$ is 0.05. If $H_0$ were true, Z would have an approximate N(0,1) distribution, and thus would be less than -1.96 or greater than 1.96 only 5% of the time, a rare event.

So, if Z does stray that far (i.e. 1.96 or more in either direction) from 0, we reject $H_0$, and decide that $p \neq 0.5$. We say, "The value of p is significantly different from 0.5"; more on this below, as it is NOT what it sounds like.

Let X be the number of heads we get from our 100 tosses. Note that our rule for decision making formulated above is equivalent (do the algebra to see this for yourself) to saying that we will accept $H_0$ if $41 \le X \le 59$ and reject it otherwise. (At X = 60, for example, (4.94) gives Z = 2.04, which exceeds 1.96.)

4.3.2 General Testing Based on Normally Distributed Estimators

Suppose $\hat{\theta}$ is an approximately normally distributed estimator of some population value $\theta$. Then to test $H_0: \theta = c$, form the test statistic

$$Z = \frac{\hat{\theta} - c}{s.e.(\hat{\theta})} \tag{4.95}$$

where $s.e.(\hat{\theta})$ is the standard error of $\hat{\theta}$, and proceed as before:

Reject $H_0: \theta = c$ at the significance level of $\alpha = 0.05$ if $|Z| \ge 1.96$.
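The general recipe (4.95) amounts to a few lines of R. This is a minimal sketch; the function name z_test and the outcome of 55 heads in 100 tosses are hypothetical, chosen only to show the mechanics.

```r
# Test H0: theta = c0 using an approximately normal estimator, as in (4.95)
z_test <- function(est, se, c0 = 0, alpha = 0.05) {
  z <- (est - c0) / se
  cutoff <- qnorm(1 - alpha / 2)     # about 1.96 when alpha = 0.05
  list(z = z, cutoff = cutoff, reject = abs(z) >= cutoff)
}

# Coin example with a hypothetical outcome: 55 heads in 100 tosses
n <- 100
x <- 55
p_hat <- x / n
coin <- z_test(p_hat, sqrt(p_hat * (1 - p_hat) / n), c0 = 0.5)
print(coin$z)
print(coin$reject)
```

Any estimator with a known standard error can be passed in the same way, which is the point of the general formulation.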


4.3.3 Example: Network Security

Let's look at the network security example in Section 4.2.11.1 again. Here $\hat{\theta} = \bar{X} - \bar{Y}$, and c is presumably 0 (depending on the goals of Mano et al). If you review the material leading up to (4.36), you'll see that

$$s.e.(\bar{X} - \bar{Y}) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \tag{4.96}$$

In that example, we found that the standard error was 0.61. So, our test statistic (4.95) is

$$Z = \frac{\bar{X} - \bar{Y} - 0}{0.61} = \frac{11.52 - 2.00}{0.61} = 15.61 \tag{4.97}$$

This is definitely larger in absolute value than 1.96, so we reject $H_0$, and conclude that the population mean round-trip times are different in the wired and wireless cases.

4.3.4 The Notion of "p-Values"

In that example above, the Z value, 15.61, was far larger than the cutoff for rejection of $H_0$, 1.96. You might say that we "resoundingly" rejected $H_0$. When data analysts encounter such a situation, they want to indicate it in their reports. This is done through something called the observed significance level, more often called the p-value.

To illustrate this, let's look at a somewhat milder case, in which Z = 2.14. By checking a table of the N(0,1) distribution, or say by calling pnorm(2.14) in R (which returns the area to the left of 2.14, about 0.984), we would find that the N(0,1) distribution has area 0.016 to the right of 2.14, and of course an equal area to the left of -2.14. In other words, in the general formulation in Section 4.3.2, we would be able to reject $H_0$ even at the much more stringent significance level of 0.032 instead of 0.05. This would be a stronger statement, and in the research community it is customary to say, "The p-value was 0.032."
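The two-sided p-value computation can be sketched as follows. The helper name p_value_two_sided is hypothetical; the only R function relied on is pnorm(), whose lower.tail = FALSE option returns the right-tail area directly and avoids subtracting a number very close to 1.

```r
# Two-sided p-value for a Z statistic: the area beyond |z| in both tails
p_value_two_sided <- function(z) 2 * pnorm(abs(z), lower.tail = FALSE)

p_mild <- p_value_two_sided(2.14)      # the "milder" case discussed above
p_extreme <- p_value_two_sided(15.61)  # the network security example

print(p_mild)
print(p_extreme)
```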
In our example above in which Z was 15.61, the value is literally "off the chart"; pnorm(15.61) returns a value of 1. Of course, it's a tiny bit less than 1, but it is so far out in the right tail of the N(0,1) distribution that the area to the right is essentially 0. So, this would be treated as very, very highly significant.

If many tests are performed and are summarized in a table, it is customary to denote the ones with small p-values by asterisks. This is generally one asterisk for p under 0.05, two for p less than 0.01, three for 0.001, etc. The more asterisks, the more significant the data is supposed to be. Well, that's a common interpretation, but careful analysts know it to be misleading, as we will now discuss.

4.3.5 What's Random and What Is Not

It is crucial to keep in mind that $H_0$ is not an event or any other kind of random entity. This coin either has p = 0.5 or it doesn't. If we repeat the experiment, we will get a different value of X, but p doesn't change. So for example, it would be wrong and meaningless to speak of "the probability that $H_0$ is true."


Similarly, it would be wrong and meaningless to write $0.05 = P(|Z| > 1.96 \mid H_0)$, again because $H_0$ is not an event and this kind of conditional probability would not make sense. What is customarily written is something like

$$0.05 = P_{H_0}(|Z| > 1.96) \tag{4.98}$$

This is read aloud as "the probability that |Z| is larger than 1.96 under $H_0$," with the phrase under $H_0$ referring to the probability measure in the case in which $H_0$ is true.

4.3.6 One-Sided $H_A$

Suppose that somehow we are sure that our coin in the example above is either fair or it is more heavily weighted towards heads. Then we would take our alternate hypothesis to be

$$H_A: p > 0.5 \tag{4.99}$$

A "rare event" which could make us abandon our belief in $H_0$ would now be if Z in (4.94) is very large in the positive direction. So, with $\alpha = 0.05$, our rule would now be to reject $H_0$ if Z > 1.65.

The same would be the case if our null hypothesis were

$$H_0: p \le 0.5 \tag{4.100}$$

instead of

$$H_0: p = 0.5 \tag{4.101}$$

Then (4.98) would change to

$$0.05 \ge P_{H_0}(Z > 1.65) \tag{4.102}$$

4.3.7 Exact Tests

Remember, the tests we've seen so far are all approximate. In (4.94), for instance, $\hat{p}$ had an approximate normal distribution, so that the distribution of Z was approximately N(0,1). Thus the significance level $\alpha$ was approximate, as were the p-values and so on. [17]

But the only reason our tests were approximate is that we only had the approximate distribution of our test statistic Z, or equivalently, we only had the approximate distribution of our estimator, e.g. $\hat{p}$. If we have an exact distribution to work with, then we can perform an exact test.

[17] Another class of probabilities which would be approximate would be the power values. These are the probabilities of rejecting $H_0$ if the latter is not true. We would speak, for instance, of the power of our test at p = 0.55, meaning the chances that we would reject the null hypothesis if the true population value of p were 0.55.
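Power values like the one mentioned in the footnote can be computed without any normal approximation, because the number of heads is exactly binomial. The sketch below does this for the two-sided coin test based on (4.94) with n = 100; the function name power_at is hypothetical.

```r
# Exact probability that the two-sided test based on (4.94) rejects H0: p = 0.5,
# when the true heads probability is p
power_at <- function(p, n = 100, z_cut = 1.96) {
  x <- 0:n                                 # every possible number of heads
  p_hat <- x / n
  se <- sqrt(p_hat * (1 - p_hat) / n)      # 0 at x = 0 and x = n, so z is -Inf/Inf there
  z <- (p_hat - 0.5) / se
  reject <- abs(z) >= z_cut
  sum(dbinom(x[reject], n, p))             # add up binomial probabilities of rejecting
}

print(power_at(0.55))   # power at p = 0.55
print(power_at(0.50))   # actual (not nominal) significance level of the test
print(qnorm(0.95))      # one-sided cutoff, which the text rounds to 1.65
```

Evaluating power_at() at p = 0.5 shows how far the true significance level of the approximate test is from the nominal 0.05.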


Let's consider the coin example again. To keep things simple, let's suppose we toss the coin 10 times. We will make our decision based on X, the number of heads out of 10 tosses. Suppose we set our threshold for "strong evidence" against $H_0$ to be 8 heads, i.e. we will reject $H_0$ if $X \ge 8$. What will $\alpha$ be?

$$\alpha = \sum_{i=8}^{10} P(X = i) = \sum_{i=8}^{10} \binom{10}{i} \left(\frac{1}{2}\right)^{10} = 0.055 \tag{4.103}$$

That's not 0.05. Clearly we cannot get an exact significance level of 0.05, [18] but our $\alpha$ is exactly 56/1024, or 0.055 after rounding.

Of course, if you are willing to assume that you are sampling from a normally-distributed population, then the Student-t test is nominally exact. The R function t.test() performs this operation.

As another example, suppose lifetimes of lightbulbs are exponentially distributed with mean $\mu$. In the past, $\mu = 1000$, but there is a claim that the new light bulbs are improved and $\mu > 1000$. To test that claim, we will sample 10 lightbulbs, getting lifetimes $X_1, \ldots, X_{10}$, and compute the sample mean $\bar{X}$. We will then perform a hypothesis test of

$$H_0: \mu = 1000 \tag{4.104}$$

vs.

$$H_A: \mu > 1000 \tag{4.105}$$

It is natural to have our test take the form in which we reject $H_0$ if

$$\bar{X} > w \tag{4.106}$$

for some constant w chosen so that

$$P(\bar{X} > w) = 0.05 \tag{4.107}$$

under $H_0$. Suppose we want an exact test, not one based on a normal approximation.

Recall that $10\bar{X}$, the sum of the $X_i$, has a gamma distribution, with r = 10 and $\lambda = 0.001$. So, we can find the w for which $P(\bar{X} > w) = 0.05$ by using R's qgamma():

    > qgamma(0.95,10,0.001)
    [1] 15705.22

This is the cutoff for the sum, so dividing by 10, we reject $H_0$ if our sample mean is larger than 1570.5.

[18] Actually, it could be done by introducing some randomization to our test.
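Both exact computations above can be reproduced in a few lines, and the lightbulb cutoff can be sanity-checked by simulation. This is a sketch only; the variable names, the seed and the 20000 replications are arbitrary choices.

```r
# Exact significance level of the rule "reject if X >= 8" with 10 tosses, as in (4.103)
alpha_exact <- sum(dbinom(8:10, size = 10, prob = 0.5))

# Exact cutoff for the lightbulb test: under H0 the sum of the n lifetimes
# is gamma distributed with shape n and rate 1/1000
n <- 10
w <- qgamma(0.95, shape = n, rate = 0.001) / n   # cutoff on the sample-mean scale

# Simulation check: under H0 the sample mean should exceed w about 5% of the time
set.seed(1)
xbars <- replicate(20000, mean(rexp(n, rate = 0.001)))
sim_level <- mean(xbars > w)

print(alpha_exact)
print(w)
print(sim_level)
```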


4.3.8 What's Wrong with Hypothesis Testing

Hypothesis testing is a time-honored approach, used by tens of thousands of people every day. But it is "wrong." I use the quotation marks here because, although hypothesis testing is mathematically correct, it is at best noninformative and at worst seriously misleading.

To begin with, it's absurd to test $H_0$ in the first place. No coin is absolutely perfectly balanced, with p = 0.5000000000000000000000000000... We know that before even collecting any data.

But much worse is this word "significant." Say our coin actually has p = 0.502. From anyone's point of view, that's a fair coin! But look what happens in (4.94) as the sample size n grows. If we have a large enough sample, eventually the denominator in (4.94) will be small enough, and $\hat{p}$ will be close enough to 0.502, that Z will be larger than 1.96 and we will declare that p is "significantly" different from 0.5. But it isn't! Yes, p is different from 0.5, but NOT in any significant sense.

This is especially a problem in computer science applications of statistics, because they often use very large data sets. A data mining application, for instance, may consist of hundreds of thousands of retail purchases. The same is true for data on visits to a Web site, network traffic data and so on. In all of these, the standard use of hypothesis testing can result in our pouncing on very small differences that are quite insignificant to us, yet will be declared "significant" by the test.

Conversely, if our sample is too small, we can miss a difference that actually is significant (i.e. important to us) and we would declare that p is NOT significantly different from 0.5.

In summary, the two basic problems with hypothesis testing are

• $H_0$ is improperly specified. What we are really interested in here is whether p is near 0.5, not whether it is exactly 0.5 (which we know is not the case anyway).

• Use of the word significant is grossly improper (or, if you wish, grossly misinterpreted).

Hypothesis testing forms the very core usage of statistics, yet you can now see that it is, as I said above, "at best noninformative and at worst seriously misleading." This is widely recognized by thinking statisticians.
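The large-sample effect described in this section can be seen numerically. The sketch below is idealized: it plugs the true value 0.502 in for the estimate at each sample size (a real estimate would fluctuate around it), and the particular sample sizes are arbitrary.

```r
p_true <- 0.502
n <- c(1e4, 1e5, 1e6, 1e7)

# Standard error and Z from (4.94), pretending p_hat landed exactly on p_true
se <- sqrt(p_true * (1 - p_true) / n)
z <- (p_true - 0.5) / se

# The corresponding approximate 95% confidence intervals
ci_lower <- p_true - 1.96 * se
ci_upper <- p_true + 1.96 * se

result <- data.frame(n, z, reject = abs(z) > 1.96, ci_lower, ci_upper)
print(result)
```

Z grows like the square root of n, so the test eventually rejects, while the interval simply narrows around a value that anyone would call fair.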
For instance, see http://www.indiana.edu/~stigtsts/quotsagn.html for a nice collection of quotes from famous statisticians on this point. There is an entire chapter devoted to this issue in one of the best-selling elementary statistics textbooks in the nation. [19] But the practice of hypothesis testing is too deeply entrenched for things to have any prospect of changing.

4.3.9 What to Do Instead

In the coin example, we could set limits of fairness, say require that p be no more than 0.01 from 0.5 in order to consider it fair. We could then test the hypothesis

$$H_0: 0.49 \le p \le 0.51 \tag{4.108}$$

[19] Statistics, third edition, by David Freedman, Robert Pisani, Roger Purves, pub. by W.W. Norton, 1997.


Such an approach is almost never used in practice, as it is somewhat difficult to use and explain. But even more importantly, what if the true value of p were, say, 0.51001? Would we still really want to reject the coin in such a scenario?

Note carefully that I am not saying that we should not make a decision. We do have to decide, e.g. decide whether a new hypertension drug is safe or in this case decide whether this coin is "fair" enough for practical purposes, say for determining which team gets the kickoff in the Super Bowl. But it should be an informed decision, and even testing the modified $H_0$ above would be much less informative than a confidence interval.

Forming a confidence interval is the far superior approach. The width of the interval shows us whether n is large enough for $\hat{p}$ to be reasonably accurate, and the location of the interval tells us whether the coin is fair enough for our purposes.

Note that in making such a decision, we do NOT simply check whether 0.5 is in the interval. That would make the confidence interval reduce to a hypothesis test, which is what we are trying to avoid. If for example the interval is (0.502,0.505), we would probably be quite satisfied that the coin is fair enough for our purposes, even though 0.5 is not in the interval.

Hypothesis testing is also used for model building, such as for predictor variable selection in regression analysis (a method to be covered in a later unit). The problem is even worse there, because there is no reason to use $\alpha = 0.05$ as the cutoff point for selecting a variable. In fact, even if one uses hypothesis testing for this purpose (again, very questionable), some studies have found that the best values of $\alpha$ for this kind of application are in the range 0.25 to 0.40.

In model building, we still can and should use confidence intervals. However, it does take more work to do so. We will return to this point in our unit on modeling, Chapter 5.

4.3.10 Decide on the Basis of "the Preponderance of Evidence"

In the movies, you see stories of murder trials in which the accused must be "proven guilty beyond the shadow of a doubt." But in most noncriminal trials, the standard of proof is considerably lighter, preponderance of evidence. This is the standard you must use when making decisions based on statistical data.
Such data cannot "prove" anything in a mathematical sense. Instead, it should be taken merely as evidence. The width of the confidence interval tells us the likely accuracy of that evidence. We must then weigh that evidence against other information we have about the subject being studied, and then ultimately make a decision on the basis of the preponderance of all the evidence.

Yes, juries must make a decision. But they don't base their verdict on some formula. Similarly, you the data analyst should not base your decision on the blind application of a method that is usually of little relevance to the problem at hand (hypothesis testing).

4.4 General Methods of Estimation

In the preceding sections, we often referred to certain estimators as being "natural." For example, if we are estimating a population mean, an obvious choice of estimator would be the sample mean. But in many


applications, it is less clear what a "natural" estimate for a parameter of interest would be. [20] We will present general methods for estimation in this section.

4.4.1 Example: Guessing the Number of Raffle Tickets Sold

You've just bought a raffle ticket, and find that you have ticket number 68. You check with a couple of friends, and find that their numbers are 46 and 79. Let c be the total number of tickets. How should we estimate c, using our data 68, 46 and 79?

It is reasonable to assume that each of the three of you is equally likely to get assigned any of the numbers 1,2,...,c. In other words, the numbers we get, $X_i$, i = 1,2,3 are uniformly distributed on the set {1,2,...,c}. We can also assume that they are independent; that's not exactly true, since we are sampling without replacement, but for large c (or better stated, for n/c small) it's close enough.

So, we are assuming that the $X_i$ are independent and identically distributed (famously written as i.i.d. in the statistics world) on the set {1,2,...,c}. How do we use the $X_i$ to estimate c?

4.4.2 Method of Moments

One approach, an intuitive one, would be to reason as follows. Note first that

$$E(X) = \frac{c+1}{2} \tag{4.109}$$

Let's solve for c:

$$c = 2EX - 1 \tag{4.110}$$

We know that we can use

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \tag{4.111}$$

to estimate EX, so by (4.110), $2\bar{X} - 1$ is an estimate of c. Thus we take our estimator for c to be

$$\hat{c} = 2\bar{X} - 1 \tag{4.112}$$

This estimator is called the Method of Moments estimator of c.

[20] Recall from Section 4.2.10 that we are now using the term parameter to mean any population quantity, rather than an index into a parametric family of distributions.

Let's step back and review what we did:


• We wrote our parameter as a function of the population mean EX of our data item X. Here, that resulted in (4.110).

• In that function, we substituted our sample mean $\bar{X}$ for EX, and substituted our estimator $\hat{c}$ for the parameter c, yielding (4.112). We then solved for our estimator.

We say that an estimator $\hat{\theta}$ of some parameter $\theta$ is consistent if

$$\lim_{n \to \infty} \hat{\theta} = \theta \tag{4.113}$$

where n is the sample size. In other words, as the sample size grows, the estimator eventually converges to the true population value.

Of course here $\bar{X}$ is a consistent estimator of EX. Thus you can see from (4.110) and (4.112) that $\hat{c}$ is a consistent estimator of c. In other words, the Method of Moments generally gives us consistent estimators.

What if we have more than one parameter to estimate, say $\theta_1, \ldots, \theta_r$? We generalize what we did above. To see how, recall that $E(X^i)$ is called the $i^{th}$ moment of X; [21] let's denote it by $\eta_i$. Also, note that although we derived (4.110) by solving (4.109) for c, we did start with (4.109). So we do the following:

• For i = 1,...,r we write $\eta_i$ as a function $g_i$ of all the $\theta_k$.

• For i = 1,...,r set

$$\hat{\eta}_i = \frac{1}{n} \sum_{j=1}^{n} X_j^i \tag{4.114}$$

• Substitute the $\hat{\theta}_k$ in the $g_i$ and then solve for them.

In the above example with the raffle, we had r = 1, $\theta_1 = c$, $g_1(c) = (c+1)/2$ and so on. A two-parameter example will be given below.

4.4.3 Method of Maximum Likelihood

Another method, much more commonly used, is called the Method of Maximum Likelihood. In our example above, it means asking the question, "What value of c would have made our data (68,46,79) most likely to happen?" Well, let's find what is called the likelihood, i.e. the probability of our particular data values occurring:

$$L = P(X_1 = 68, X_2 = 46, X_3 = 79) = \begin{cases} \left(\frac{1}{c}\right)^3, & \text{if } c \ge 79 \\ 0, & \text{otherwise} \end{cases} \tag{4.115}$$

[21] Hence the name, Method of Moments.
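Both estimation ideas can be tried on the raffle data directly. The sketch below computes the Method of Moments estimate (4.112) and evaluates the likelihood (4.115) over a grid of candidate values of c; the grid 1:200 and the function name likelihood are arbitrary illustrative choices.

```r
x <- c(68, 46, 79)                 # the three observed ticket numbers

# Method of Moments estimate, from (4.112)
c_mom <- 2 * mean(x) - 1

# Likelihood (4.115) as a function of a candidate value for c
likelihood <- function(cand, x) {
  ifelse(cand >= max(x), (1 / cand)^length(x), 0)
}

cands <- 1:200                     # candidate values of c to examine
lik <- likelihood(cands, x)
c_best <- cands[which.max(lik)]    # candidate with the largest likelihood

print(c_mom)
print(c_best)
```

Plotting lik against cands (for example with plot(cands, lik, type = "l")) shows the likelihood sitting at 0 for small candidates and decaying beyond its peak.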


Now keep in mind that c is a fixed, though unknown constant. It is not a random variable. What we are doing here is just asking "What if" questions, e.g. "If c were 85, how likely would our data be? What about c = 91?"

Well then, what value of c maximizes (4.115)? Clearly, it is c = 79. Any smaller value of c gives us a likelihood of 0. And for c larger than 79, the larger c is, the smaller (4.115) is. So, our maximum likelihood estimator (MLE) is 79. In general, if our sample size in this problem were n, our MLE for c would be

$$\check{c} = \max_i X_i \tag{4.116}$$

4.4.4 Example: Estimating the Parameters of a Gamma Distribution

As another example, suppose we have a random sample $X_1, \ldots, X_n$ from a gamma distribution,

$$f_X(t) = \frac{1}{\Gamma(c)} \lambda^c t^{c-1} e^{-\lambda t}, \quad t > 0 \tag{4.117}$$

for some unknown c and $\lambda$. How do we estimate c and $\lambda$ from the $X_i$?

4.4.4.1 Method of Moments

Let's try the Method of Moments, as follows. We have two population parameters to estimate, c and $\lambda$, so we need to involve two moments of X. That could be EX and $E(X^2)$, but here it would more conveniently be EX and Var(X). We know from our previous unit on continuous random variables, Chapter 2, that

$$EX = \frac{c}{\lambda} \tag{4.118}$$

$$Var(X) = \frac{c}{\lambda^2} \tag{4.119}$$

In our earlier notation, this would be r = 2, $\theta_1 = c$, $\theta_2 = \lambda$ and $g_1(c, \lambda) = c/\lambda$ and $g_2(c, \lambda) = c/\lambda^2$.

Switching to sample analogs and estimates, we have

$$\frac{\hat{c}}{\hat{\lambda}} = \bar{X} \tag{4.120}$$

$$\frac{\hat{c}}{\hat{\lambda}^2} = s^2 \tag{4.121}$$

Dividing the two quantities yields

$$\hat{\lambda} = \frac{\bar{X}}{s^2} \tag{4.122}$$


which then gives

$$\hat{c} = \frac{\bar{X}^2}{s^2} \tag{4.123}$$

4.4.4.2 MLEs

What about the MLEs of c and $\lambda$? Remember, the $X_i$ are continuous random variables, so the likelihood function, i.e. the analog of (4.115), is the product of the density values:

$$L = \prod_{i=1}^{n} \left[ \frac{1}{\Gamma(c)} \lambda^c X_i^{c-1} e^{-\lambda X_i} \right] \tag{4.124}$$

In general, it is usually easier to maximize the log likelihood (and maximizing this is the same as maximizing the original likelihood):

$$l = (c-1) \sum_{i=1}^{n} \ln(X_i) - \lambda \sum_{i=1}^{n} X_i + nc \ln(\lambda) - n \ln(\Gamma(c)) \tag{4.125}$$

One then takes the partial derivatives of (4.125) with respect to c and $\lambda$, and sets them to 0. The solution values, $\check{c}$ and $\check{\lambda}$, are then the MLEs of c and $\lambda$. Unfortunately, these equations do not have closed-form solutions, so they must be solved numerically.

4.4.5 More Examples

Suppose $f_W(t) = c t^{c-1}$ for t in (0,1), with the density being 0 elsewhere, for some unknown c > 0. We have a random sample $W_1, \ldots, W_n$ from this density.

Let's find the Method of Moments estimator.

$$EW = \int_0^1 t \, c t^{c-1} \, dt = \frac{c}{c+1} \tag{4.126}$$

So, set

$$\bar{W} = \frac{\hat{c}}{\hat{c}+1} \tag{4.127}$$

yielding

$$\hat{c} = \frac{\bar{W}}{1 - \bar{W}} \tag{4.128}$$
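Returning for a moment to the gamma example of Section 4.4.4, here is a sketch of how both sets of estimates might be computed in R. The data are simulated, with true values c = 3 and λ = 2 chosen arbitrarily. The sample variance is taken with divisor n, which is an assumption about how $s^2$ is defined. The MLE is found with R's general-purpose optimizer optim(), maximizing over log(c) and log(λ) so that both parameters stay positive; that reparameterization is a convenience of this sketch, not something prescribed by the text.

```r
set.seed(1)
x <- rgamma(500, shape = 3, rate = 2)   # simulated data; true c = 3, lambda = 2
n <- length(x)

# Method of Moments estimates, (4.122) and (4.123)
xbar <- mean(x)
s2 <- mean((x - xbar)^2)                # sample variance with divisor n (assumed)
lambda_mom <- xbar / s2
c_mom <- xbar^2 / s2

# Log likelihood (4.125)
loglik <- function(cc, lam) {
  (cc - 1) * sum(log(x)) - lam * sum(x) + n * cc * log(lam) - n * lgamma(cc)
}

# Numerical maximization, started from the Method of Moments estimates
fit <- optim(log(c(c_mom, lambda_mom)),
             function(par) -loglik(exp(par[1]), exp(par[2])))
c_mle <- exp(fit$par[1])
lambda_mle <- exp(fit$par[2])

print(c(c_mom = c_mom, lambda_mom = lambda_mom))
print(c(c_mle = c_mle, lambda_mle = lambda_mle))
```

Starting the numerical search at the Method of Moments estimates is a common practical trick, since they are cheap to compute and usually land near the maximum.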


What about the MLE?

$$L = \prod_{i=1}^{n} c W_i^{c-1} \tag{4.129}$$

so

$$l = n \ln c + (c-1) \sum_{i=1}^{n} \ln W_i \tag{4.130}$$

Then set

$$0 = \frac{n}{\hat{c}} + \sum_{i=1}^{n} \ln W_i \tag{4.131}$$

and thus

$$\hat{c} = -\frac{1}{\frac{1}{n} \sum_{i=1}^{n} \ln W_i} \tag{4.132}$$

As in Section 4.4.3, not every MLE can be determined by taking derivatives. Consider a continuous analog of the example in that section, with $f_W(t) = \frac{1}{c}$ on (0,c), 0 elsewhere, for some c > 0.

The likelihood is

$$\left(\frac{1}{c}\right)^n \tag{4.133}$$

as long as

$$c \ge \max_i W_i \tag{4.134}$$

and is 0 otherwise. So,

$$\hat{c} = \max_i W_i \tag{4.135}$$

as before.

Let's find the bias of this estimator.

The bias is $E(\hat{c}) - c$. To get $E(\hat{c})$ we need the density of that estimator, which we get as follows:

\begin{align}
P(\hat{c} \le t) &= P(\text{all } W_i \le t) && \text{(definition)} \tag{4.136} \\
&= \left(\frac{t}{c}\right)^n && \text{(density of } W_i \text{)} \tag{4.137}
\end{align}


So,

$$f_{\hat{c}}(t) = \frac{n}{c^n} t^{n-1} \tag{4.138}$$

Integrating against t, we find that

$$E(\hat{c}) = \frac{n}{n+1} c \tag{4.139}$$

So the bias is $-c/(n+1)$, i.e. the estimator is too low on average by c/(n+1), not bad at all.

4.4.6 What About Confidence Intervals?

Usually we are not satisfied with simply forming estimates (called point estimates). We also want some indication of how accurate these estimates are, in the form of confidence intervals (interval estimates).

In many special cases, finding confidence intervals can be done easily on an ad hoc basis. Look, for instance, at the Method of Moments Estimator in Section 4.4.2. Our estimator (4.112) is a linear function of $\bar{X}$, so we easily obtain a confidence interval for c from one for EX.

Another example is (4.132). Taking the limit as $n \to \infty$ the equation shows us (and we could verify) that

$$c = -\frac{1}{E[\ln W]} \tag{4.140}$$

Defining $X_i = \ln W_i$ and $\bar{X} = (X_1 + \ldots + X_n)/n$, we can obtain a confidence interval for EX in the usual way. We then see from (4.140) that we can form a confidence interval for c by simply taking the negative reciprocal of each endpoint of the interval. (Since $\ln W < 0$, both endpoints will typically be negative, in which case the left and right endpoints keep their order.)

What about in general? For the Method of Moments case, our estimators are functions of the sample moments, and since the latter are formed from sums and thus are asymptotically normal, the delta method can be used to show that our estimators are asymptotically normal and to obtain asymptotic variances for them.

There is a well-developed asymptotic theory for MLEs, which under certain conditions not only shows asymptotic normality with a determined asymptotic variance, but also establishes that MLEs are in a certain sense optimal among all estimators. We will not pursue this here.

4.4.7 Bayesian Methods (advanced topic)

Consider again the example of estimating p, the probability of heads for a certain coin. Suppose we were to say (before tossing the coin even once), "I think p could be any number, but more likely near 0.5, something like a normal distribution with mean 0.5 and standard deviation, oh, let's say 0.1." Note carefully the word "think." We are just using our "gut feeling" here, our "hunch." The number p is NOT a random variable! We are dealing with this ONE coin, and it has just ONE value of p. Yet we are treating p as random anyway.


Under this "random p" assumption, the MLE would change. Our data here is X, the number of heads we get from n tosses of the coin. Instead of the likelihood being

$$L = \binom{n}{X} p^X (1-p)^{n-X} \tag{4.141}$$

it now becomes

$$L = \frac{1}{\sqrt{2\pi} \, 0.1} \exp\left(-0.5 \, [(p - 0.5)/0.1]^2\right) \binom{n}{X} p^X (1-p)^{n-X} \tag{4.142}$$

We would then find the value of p which maximizes L, and take that as our estimate.

A gut feeling or hunch used in this manner is called a subjective prior. "Prior" to collecting any data, we have a certain belief about p. This is very controversial, and many people (including me) consider it to be highly inappropriate. They/I feel that there is nothing wrong using one's gut feelings to make a decision, but it should NOT be part of the mathematical analysis of the data. One's hunches can play a role in deciding the "preponderance of evidence," as discussed in Section 4.3.10.

On the other hand, maybe we have actual data on coins, presumed to be a random sample from the population of all coins of that type, and we assume that our coin now is chosen randomly from that population. Say we have formed a normal or other model for p based on that data. It would be fine to use this in estimating p for a new coin, and the second L above would be appropriate. In this case, we would be using an empirical prior.

4.4.8 The Empirical cdf

Recall that $F_X$, the cdf of X, is defined as

$$F_X(t) = P(X \le t), \quad -\infty < t < \infty$$

the sorted version of the $X_i$. [22] Then

$$\hat{F}_X(t) = \begin{cases} 0, & \text{for } t < Y_1 \\ \frac{j}{n}, & \text{for } Y_j \le t < Y_{j+1}, \; j = 1, \ldots, n-1 \\ 1, & \text{for } t \ge Y_n \end{cases} \tag{4.145}$$

Here is a simple example. Say n = 4 and our data are 4.8, 1.2, 2.2 and 6.1. We can plot the empirical cdf by calling R's ecdf() function:

    > plot(ecdf(x))

Here is the graph:

[Figure: plot of the empirical cdf for the four data points]

4.5 Real Populations and Conceptual Populations

In our example in Section 4.2.3.1, we were sampling from a real population. However, in many, probably most applications of statistics, either the population or the sampling is more conceptual.

Consider the experiment comparing three scripting languages in Section 3.169. We think of our programmers as being a random sample from the population of all programmers, but that is probably an idealization. It may be, for example, that they all work at the same company, in which case we must think of them as a "random sample" from the rather conceptual "population" of all programmers who might work at this company. [23]

[22] A common notation for this is $Y_j = X_{(j)}$, meaning that $Y_j$ is the $j^{th}$ smallest of the $X_i$. These are called the order statistics of our sample.
[23] You're probably wondering why we haven't discussed other factors, such as differing levels of experience among the programmers. This will be dealt with in our unit on regression analysis, Chapter 6.


And what about our raffle example in Section 4.4.1? Certainly we can imagine various kinds of randomness that contribute to the numbers people get on their raffle tickets. Maybe, for instance, you were in a traffic jam on the way to the place where you bought the ticket, so you bought it a little later than you might have and thus got a higher number. But I've always emphasized the notion of a repeatable experiment in these notes. How can that happen here? You could imagine, for instance, the raffle chair suddenly losing all the tickets, and asking everyone to draw again, resulting in different ticket numbers. Or you can imagine the population of all raffles that you might submit to which have the same value of c.

You can see from this that if one chooses to apply statistics carefully (which you absolutely should do) there sometimes are some knotty problems of interpretation to think about.

4.6 Nonparametric Density Estimation

Consider the Bus Paradox example again. Recall that W denoted the time until the next bus arrives. This is called the forward recurrence time. The backward recurrence time is the time since the last bus was here, which we will denote by R.

Suppose we are interested in estimating the density of R, $f_R()$, based on the sample data $R_1, \ldots, R_n$ that we gather in our simulation in Section 4.2.1, where n = 1000. How can we do this? [24]

We could, of course, assume that $f_R$ is a member of some parametric family of distributions, say the two-parameter gamma family. We would then estimate those two parameters as in Section 4.4, and possibly check our assumption using goodness-of-fit procedures, discussed in our unit on modeling, Chapter 5. On the other hand, we may wish to estimate $f_R$ without making any parametric assumptions. In fact, one reason we may wish to do so is to visualize the data in order to search for a suitable parametric model.

If we do not assume any parametric model, we have in essence changed our problem from estimating a finite number of parameters to an infinite-parameter problem; the parameters are the values of $f_R(t)$ for all the different values of t. Of course, we probably are willing to assume some structure on $f_R$, such as continuity, but then we still would have an infinite-parameter problem.
We call such estimation nonparametric, meaning that we don't use a parametric model. However, you can see that it is really infinite-parametric estimation.

Again discussed in our unit on modeling, Chapter 5, the more complex the model, the higher the variance of its estimator. So, nonparametric estimators will have higher variance than parametric ones. The nonparametric estimators will also generally have smaller bias, of course.

4.6.1 Basic Ideas

Recall that

$$f_R(t) = \frac{d}{dt} F_R(t) = \frac{d}{dt} P(R \le t) \tag{4.146}$$

[24] Actually, our unit on renewal theory, Chapter 9, proves that R has an exponential distribution. However, here we'll pretend we don't know that.


From calculus, that means that

$$f_R(t) \approx \frac{P(R \le t+h) - P(R \le t-h)}{2h} \tag{4.147}$$

$$= \frac{P(t-h < R \le t+h)}{2h}$$

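    doexpt <- function(opt) {   # first two lines reconstructed; the start of the listing is missing here
      lastarrival <- 0
      while (TRUE) {
        # generate the next bus arrival time
        newlastarrival <- lastarrival + rexp(1, 0.1)
        if (newlastarrival > opt)
          # we have passed the observation point; report time since the last bus
          return(opt - lastarrival)
        else lastarrival <- newlastarrival
      }
    }

    observationpt <- 240
    nreps <- 10000
    waits <- vector(length = nreps)
    for (rep in 1:nreps) waits[rep] <- doexpt(observationpt)
    hist(waits)

Note that I used the default number of intervals, 20. Here is the result:

[Figure: histogram of waits, default number of intervals]

The density seems to have a shape like that of the exponential parametric family. (This is not surprising, because it is exponential, but remember we're pretending we don't know that.)

Here is the plot with 100 intervals: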


[Figure: histogram of waits, 100 intervals]

Again, a similar shape, though more raggedy.

4.6.3 Kernel-Based Density Estimation (advanced topic)

No matter what the interval width is, the histogram will consist of a bunch of rectangles, rather than a curve. That is basically because, for any particular value of t, $\hat{f}_R(t)$ depends only on the $R_i$ that fall into that interval. We could get a smoother result if we used all our data to estimate $f_R(t)$ but put more weight on the data that is closer to t. One way to do this is called kernel-based density estimation, which in R is handled by the function density().

We need a set of weights, more precisely a weight function k, called the kernel. Any nonnegative function which integrates to 1 (i.e. a density function in its own right) will work. Our estimator is then

$$\hat{f}_R(t) = \frac{1}{nh} \sum_{i=1}^{n} k\left(\frac{t - R_i}{h}\right) \tag{4.152}$$

To make this idea concrete, take k to be the uniform density on (-1,1), which has the value 0.5 on (-1,1) and 0 elsewhere. Then (4.152) reduces to (4.150). Note how the parameter h, called the bandwidth, continues to control how far away from t we wish to go for data points.

But as mentioned, what we really want is to include all data points, so we typically use a kernel with support on all of $(-\infty, \infty)$. In R, the default kernel is that of the N(0,1) density. The bandwidth h controls how much smoothing we do; smaller values of h place heavier weights on data points near t and much lighter weights on the distant points. In R, the bandwidth is expressed as the standard deviation of the scaled kernel, and the default value is chosen from the data by a rule of thumb.

For our data here, I took the defaults:


    plot(density(r))

The result is seen in Figure 4.1.

[Figure 4.1: Kernel estimate, default bandwidth]

I then tried it with a bandwidth of 0.5. See Figure 4.2. This curve oscillates a lot, so an analyst might think 0.5 is too small. (We are prejudiced here, because we know the true population density is exponential.)

4.6.4 Proper Use of Density Estimates

There is no good, practical way to choose a good bin width or bandwidth. Moreover, there is also no good way to form a reasonable confidence band for a density estimate.

So, density estimates should be used as exploratory tools, not as firm bases for decision making. You will probably find it quite unsettling to learn that there is no exact answer to the problem. But that's real life!

Exercises

Note to instructor: See the Preface for a list of sources of real data on which exercises can be assigned to complement the theoretical exercises below.

1. Suppose we draw a sample of size 2 from a population in which X has the values 10, 15 and 12. Find $p_{\bar{X}}$, first assuming sampling with replacement, then assuming sampling without replacement.


[Figure 4.2: Kernel estimate, bandwidth 0.5]

2. Suppose lifetimes of lightbulbs are exponentially distributed with mean $\mu$. In the past, $\mu$ was 1000, but there is a claim that the new light bulbs are improved and $\mu > 1000$. To test that claim, we will sample 20 lightbulbs, getting lifetimes $X_1, \ldots, X_{20}$, and compute $\bar{X} = (X_1 + \ldots + X_{20})/20$. We will then perform a hypothesis test of $H_0: \mu = 1000$ vs. $H_A: \mu > 1000$. It is natural to have our test take the form in which we reject $H_0$ if $\bar{X} > r$ for some constant r chosen so that $P(\bar{X} > r) = 0.05$ under $H_0$. Suppose we want an exact test, not one based on a normal approximation. Find r.

3. Consider the Method of Moments Estimator $\hat{c}$ in the raffle example, Section 4.4.1. Find the exact value of $Var(\hat{c})$. Use the facts that $1 + 2 + \ldots + r = r(r+1)/2$ and $1^2 + 2^2 + \ldots + r^2 = r(r+1)(2r+1)/6$.

4. Suppose W has a uniform distribution on (-c,c), and we draw a random sample of size n, $W_1, \ldots, W_n$. Find the Method of Moments and Maximum Likelihood estimators.

5. An urn contains $\omega$ marbles, one of which is black and the rest being white. We draw marbles from the urn one at a time, without replacement, until we draw the black one; let N denote the number of draws needed. Find the Method of Moments estimator of $\omega$ based on N.

6. In the raffle example, Section 4.4.1, find a $100(1 - \alpha)$% confidence interval for c based on $\check{c}$, the Maximum Likelihood Estimate of c. Hint: Use the example in Section 4.2.13 as a guide.

7. In many applications, observations come in correlated clusters. For instance, we may sample r trees at


random, then s leaves within each tree. Clearly, leaves from the same tree will be more similar to each other than leaves on different trees.

In this context, suppose we have a random sample $X_1, \ldots, X_n$, n even, such that there is correlation within pairs. Specifically, suppose the pair $(X_{2i+1}, X_{2i+2})$ has a bivariate normal distribution with mean $(\mu, \mu)$ and covariance matrix

$$\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \tag{4.153}$$

i = 0,...,n/2-1, with the n/2 pairs being independent. Find the Method of Moments estimators of $\mu$ and $\rho$.

8. Candidates A, B and C are vying for election. Let $p_1$, $p_2$ and $p_3$ denote the fractions of people planning to vote for them. We poll n people at random, yielding estimates $\hat{p}_1$, $\hat{p}_2$ and $\hat{p}_3$. B claims that she has more supporters than the other two candidates combined. Give a formula for an approximate 95% confidence interval for $p_2 - (p_1 + p_3)$.


Chapter 5

Introduction to Model Building

All models are wrong, but some are useful.
    George Box [1]

[Mathematical models] should be made as simple as possible, but not simpler.
    Albert Einstein [2]

The above quote by Box says it all. Consider for example the family of normal distributions. In real life, random variables are bounded (no person's height is negative or greater than 500 inches) and are inherently discrete, due to the finite precision of our measuring instruments. Thus, technically, no random variable in practice can have an exact normal distribution. Yet the assumption of normality pervades statistics, and has been enormously successful, provided one understands its approximate nature.

The situation is similar to that of physics. We know that in many analyses of bodies in motion, we can neglect the effect of air resistance, but that in some situations one must include that factor in our model.

So, the field of probability and statistics is fundamentally about modeling. The field is extremely useful, provided the user understands the modeling issues well. For this reason, this book contains this separate chapter on modeling issues.

5.1 Bias Vs. Variance

Consider a general estimator Q of some population value b. Then a common measure of the quality of the estimator Q is the mean squared error (MSE),

$$E[(Q - b)^2] \tag{5.1}$$

Of course, the smaller the MSE, the better.

[1] George Box (1919-) is a famous statistician, with several statistical procedures named after him.
[2] The reader is undoubtedly aware of Einstein's (1879-1955) famous theories of relativity, but may not know his connections to probability theory. His work on Brownian motion, which describes the path of a molecule as it is bombarded by others, is probabilistic in nature, and later developed into a major branch of probability theory. Einstein was also a pioneer in quantum mechanics, which is probabilistic as well. At one point, he doubted the validity of quantum theory, and made his famous remark, "God does not play dice with the universe."

One can break (5.1) down into variance and squared bias components, as follows: [3]

$$MSE(Q) = E[(Q - b)^2] \tag{5.2}$$

$$= E[\{(Q - EQ) + (EQ - b)\}^2] \tag{5.3}$$

$$= E[(Q - EQ)^2] + 2E[(Q - EQ)(EQ - b)] + E[(EQ - b)^2] \tag{5.4}$$

$$= Var(Q) + (EQ - b)^2 \tag{5.5}$$

$$= \text{variance} + \text{squared bias} \tag{5.6}$$

The middle term in (5.4) vanishes because EQ - b is a constant and E(Q - EQ) = 0.

In other words, in discussing the accuracy of an estimator (especially in comparing two or more candidates to use for our estimator), the average squared error has two main components, one for variance and one for bias. In building a model, these two components are often at odds with each other; we may be able to find an estimator with smaller bias but more variance, or vice versa.

5.2 Desperate for Data

Suppose we have the samples of men's and women's heights described in Section 4.2.11, and say we wish to predict the height H of a new person who we know to be a man but for whom we know nothing else.

The question is, should we take gender into account in our prediction? If so, we would predict the man to be of height [4]

$$T_1 = \bar{X}, \tag{5.7}$$

our estimate for the mean height of all men. If not, then we predict his height to be

$$T_2 = \frac{\bar{X} + \bar{Y}}{2}, \tag{5.8}$$

our estimate of the mean height of all people (assuming that half the population is male).

Recalling our notation from Section 4.2.11, assume that $n_1 = n_2$, and call the common value n. Also, for simplicity, let's assume that $\sigma_1 = \sigma_2 = \sigma$.

5.2.1 Mathematical Formulation of the Problem

Let's formalize this a bit. Let G denote gender, 1 for male, 2 for female. Then our random quantity here is (H,G). Our experiment here is to choose a person from the population at random. Thus the height and gender will be random variables.

[3] In reading the following derivation, keep in mind that EQ and b are constants.

[4] Assuming that predicting too high and too low are of equal concern to us, etc.

Then the correct population model is

$$E(H \mid G = i) = \mu_i \tag{5.9}$$

and our predictor $T_1$ reflects this. [5] However, $T_2$ makes the simplifying assumption that $\mu_1 = \mu_2$, so that

$$E(H \mid G = i) = \mu \tag{5.10}$$

where $\mu$ is the common value of $\mu_1$ and $\mu_2$. We'll refer to (5.9) as the complex model (two parameters, not counting variances), and to (5.10) as the simple model (one parameter, not counting variances).

5.2.2 Bias and Variance of the Two Predictors

Since the true model is (5.9), $T_1$ is unbiased, from (4.5). But the predictor $T_2$ from the simple model is biased:

$$E(T_2 \mid G = 1) = E(0.5\bar{X} + 0.5\bar{Y}) \quad \text{(definition)} \tag{5.11}$$

$$= 0.5E(\bar{X}) + 0.5E(\bar{Y}) \quad \text{(linearity of E())} \tag{5.12}$$

$$= 0.5\mu_1 + 0.5\mu_2 \quad \text{[from (4.5)]} \tag{5.13}$$

$$\neq \mu_1 \tag{5.14}$$

On the other hand, $T_2$ has a smaller variance: Recalling (4.9), we have

$$Var(T_1 \mid G = 1) = \frac{\sigma^2}{n} \tag{5.15}$$

And

$$Var(T_2 \mid G = 1) = Var(0.5\bar{X} + 0.5\bar{Y}) \tag{5.16}$$

$$= 0.5^2\,Var(\bar{X}) + 0.5^2\,Var(\bar{Y}) \quad \text{(properties of Var())} \tag{5.17}$$

$$= \frac{\sigma^2}{2n} \quad \text{[from 4.9]} \tag{5.18}$$

5.2.3 Implications

These findings are highly instructive. You might at first think that "of course" $T_1$ would be the better predictor than $T_2$. But for a small sample size, the smaller (actually 0) bias of $T_1$ is not enough to counteract its larger variance. $T_2$ is biased, yes, but it is based on double the sample size and thus has half the variance.

[5] We are calling it a predictor rather than an estimator as in other examples. This follows custom, which is to use the latter term when the target is a constant, e.g. a population mean, and to use the former term when the target is a random quantity. It is not a major distinction.

In light of (5.6), we see that $T_1$, the "true" predictor, may not necessarily be the better of the two predictors. Granted, it has no bias whereas $T_2$ does have a bias, but the latter has a smaller variance.

Let's consider this in more detail, using (5.5):

$$MSE(T_1) = \frac{\sigma^2}{n} + 0^2 = \frac{\sigma^2}{n} \tag{5.19}$$

$$MSE(T_2) = \frac{\sigma^2}{2n} + \left(\frac{\mu_1 + \mu_2}{2} - \mu_1\right)^2 = \frac{\sigma^2}{2n} + \left(\frac{\mu_2 - \mu_1}{2}\right)^2 \tag{5.20}$$

$T_1$ is a better predictor than $T_2$ if (5.19) is smaller than (5.20), which is true if

$$\left(\frac{\mu_2 - \mu_1}{2}\right)^2 > \frac{\sigma^2}{2n} \tag{5.21}$$

So you can see that $T_1$ is better only if either

• n is large enough, or
• the difference in population mean heights between men and women is large enough, or
• there is not much variation within each population, e.g. most men have very similar heights

Since that third item, small within-population variance, is rarely seen, let's concentrate on the first two items. The big revelation here is that:

A more complex model is more accurate than a simpler one only if either

• we have enough data to support it, or
• the complex model is sufficiently different from the simpler one

In the height/gender example above, if n is too small, we are "desperate for data," and thus make use of the female data to augment our male data. Though women tend to be shorter than men, the bias that results from the augmentation is offset by the reduction in estimator variance that we get. But if n is large enough, the variance will be small in either model, so when we go to the more complex model, the advantage gained by reducing the bias will more than compensate for the increase in variance.

THIS IS AN ABSOLUTELY FUNDAMENTAL NOTION IN STATISTICS.

This was a very simple example, but you can see that in complex settings, fitting too rich a model can result in very high MSEs for the estimates. In essence, everything becomes noise. (Some people have cleverly coined the term noise mining, a play on the term data mining.) This is the famous overfitting problem.
In our unit on statistical relations, Chapter 6, we will show the results of a scary experiment done at the Wharton School, the University of Pennsylvania's business school. The researchers deliberately added fake data to a prediction equation, and standard statistical software identified it as "significant"! This is partly a problem with the word itself, as we saw in Section 4.3.8, but also a problem of using far too complex a model, as will be seen in that future unit.
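
The comparison in (5.19) through (5.21) can be checked numerically. The following is only a sketch: the function names `mse_t1`, `mse_t2` and `sim_mse`, and the values $\mu_1 = 70$, $\mu_2 = 69$, $\sigma = 4$, are illustrative assumptions and do not come from the text. It evaluates the two formulas and then approximates the same quantities by simulation, treating $T_1$ and $T_2$ as estimates of $\mu_1$.

```r
# Theoretical MSEs, following (5.19) and (5.20)
mse_t1 <- function(sigma, n) sigma^2 / n
mse_t2 <- function(sigma, n, mu1, mu2) sigma^2 / (2 * n) + ((mu2 - mu1) / 2)^2

# Monte Carlo approximation of the same two MSEs
# (hypothetical helper; normal populations are an assumption)
sim_mse <- function(n, mu1, mu2, sigma, nreps = 20000) {
  err1 <- numeric(nreps)
  err2 <- numeric(nreps)
  for (r in 1:nreps) {
    xbar <- mean(rnorm(n, mu1, sigma))  # sample mean of the men
    ybar <- mean(rnorm(n, mu2, sigma))  # sample mean of the women
    err1[r] <- (xbar - mu1)^2               # squared error of T1
    err2[r] <- ((xbar + ybar) / 2 - mu1)^2  # squared error of T2
  }
  c(T1 = mean(err1), T2 = mean(err2))
}

# Illustrative values only: mu1 = 70, mu2 = 69, sigma = 4
for (n in c(10, 100)) {
  cat("n =", n,
      " MSE(T1) =", mse_t1(4, n),
      " MSE(T2) =", mse_t2(4, n, 70, 69), "\n")
}

set.seed(1)
sim <- sim_mse(10, 70, 69, 4)
print(sim)
```

With these made-up values, (5.21) says $T_1$ wins only when n exceeds 32, so the simple predictor $T_2$ should come out ahead at n = 10 and the complex one at n = 100.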

5.3 Assessing Goodness of Fit of a Model

Our example in Section 4.4.4 concerned how to estimate the parameters of a gamma distribution, given a sample from the distribution. But that assumed that we had already decided that the gamma model was reasonable in our application. Here we will be concerned with how we might come to such decisions.

Assume we have a random sample $X_1, \ldots, X_n$ from a distribution having density $f_X$.

5.3.1 The Chi-Square Goodness of Fit Test

The classic way to do this would be the Chi-Square Goodness of Fit Test. We would set

$$H_0: f_X \text{ is a member of the exponential parametric family} \tag{5.22}$$

This would involve partitioning $(0, \infty)$ into k intervals $(s_{i-1}, s_i)$ of our choice, and setting

$$N_i = \text{number of } X_j \text{ in } (s_{i-1}, s_i) \tag{5.23}$$

We would then find the Maximum Likelihood Estimate (MLE) of $\lambda$, on the assumption that the distribution of X really is exponential. The MLE turns out to be the reciprocal of the sample mean, i.e.

$$\hat{\lambda} = 1/\bar{X} \tag{5.24}$$

This would be considered the parameter of the "best-fitting" exponential density for our data. We would then estimate the probabilities

$$p_i = P[X \in (s_{i-1}, s_i)] = e^{-\lambda s_{i-1}} - e^{-\lambda s_i}, \quad i = 1, \ldots, k \tag{5.25}$$

by

$$\hat{p}_i = e^{-\hat{\lambda} s_{i-1}} - e^{-\hat{\lambda} s_i}, \quad i = 1, \ldots, k \tag{5.26}$$

Note that $N_i$ has a binomial distribution, with n trials and success probability $p_i$. Using this, the expected value $EN_i$ is estimated to be

$$\nu_i = n\left(e^{-\hat{\lambda} s_{i-1}} - e^{-\hat{\lambda} s_i}\right), \quad i = 1, \ldots, k \tag{5.27}$$

Our test statistic would then be

$$Q = \sum_{i=1}^{k} \frac{(N_i - \nu_i)^2}{\nu_i} \tag{5.28}$$

where $\nu_i$ is the expected value of $N_i$ under the assumption of "exponentialness." It can be shown that Q is approximately chi-square distributed with k-2 degrees of freedom. Note that only large values of Q should be suspicious, i.e. should lead us to reject $H_0$; if Q is small, it indicates a good fit. If Q were large enough to be a "rare event," say larger than $\chi^2_{0.95,k-2}$, we would decide NOT to use the exponential model; otherwise, we would use it.

Hopefully the reader has immediately recognized the problem here. If we have a large sample, this procedure will pounce on tiny deviations from the exponential distribution, and we would decide not to use the exponential model, even if those deviations were quite minor. Again, no model is 100% correct, and thus a goodness of fit test will eventually tell us not to use any model at all.

5.3.2 Kolmogorov-Smirnov Confidence Bands

Again consider the problem above, in which we were assessing the fit of an exponential model. In line with our major point that confidence intervals are far superior to hypothesis tests, we now present Kolmogorov-Smirnov confidence bands, which work as follows.

Recall the concept of empirical cdfs, presented in Section 4.4.8. It turns out that the distribution of

$$M = \max \ldots$$

Warning: The Kolmogorov-Smirnov procedure available in the R language performs only a hypothesis test, rather than forming a confidence band. In other words, it simply checks to see whether a member of the family falls within the band. This is not what we want, because we may be perfectly happy if a member is only near the band.

Of course, another way, this one less formal, of assessing data for suitability for some model is to plot the data in a histogram or something of that nature.

5.4 Bias Vs. Variance Again

In our unit on estimation, Section 4.6, we saw a classic tradeoff in histogram- and kernel-based density estimators. With histograms, for instance, the wider bin width produces a graph which is smoother, but possibly too smooth, i.e. with less oscillation than the true population curve has. The same problem occurs with larger values of h in the kernel case.

This is actually yet another example of the bias/variance tradeoff, discussed above and, as mentioned, ONE OF THE MOST RECURRING NOTIONS IN STATISTICS. A large bin width, or a large value of h, produces more bias. In general, the larger the bin width or h, the further $E[\hat{f}_R(t)]$ is from the true value of $f_R(t)$. This occurs because we are making use of points which are not so near t, and thus at which the density height is different from that of $f_R(t)$. On the other hand, because we are making use of more points, $Var[\hat{f}_R(t)]$ will be smaller.
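
The effect of h on bias and variance can be seen in a small simulation. This is only a sketch under stated assumptions: the population is taken to be N(0,1), the kernel is Gaussian, the point of interest is t = 0, and the helper names `kde_at` and `bias_var` are hypothetical. The kernel estimate is computed by hand at a single point so that nothing depends on the defaults of R's built-in density routines.

```r
# Gaussian-kernel density estimate at a single point t, with bandwidth h
kde_at <- function(x, t, h) mean(dnorm((t - x) / h)) / h

# Repeat the sampling experiment many times to approximate the bias and
# the variance of the estimate at t (true density assumed to be N(0,1))
bias_var <- function(h, n = 100, t = 0, nreps = 2000) {
  est <- replicate(nreps, kde_at(rnorm(n), t, h))
  c(bias = mean(est) - dnorm(t), variance = var(est))
}

set.seed(2024)
small_h <- bias_var(0.05)  # narrow bandwidth
large_h <- bias_var(1.0)   # wide bandwidth
print(rbind(small_h, large_h))
```

One would expect the wide bandwidth to show the larger bias in absolute value and the smaller variance, which is the tradeoff described above.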
THERE IS NO GOOD WAY TO CHOOSE THE BIN WIDTH OR h. Even though there is a lot of theory to suggest how to choose the bin width or h, no method is foolproof. This is made even worse by the fact that the theory generally has a goal of minimizing integrated mean squared error,

$$\int_{-\infty}^{\infty} E\left[\left(\hat{f}_R(t) - f_R(t)\right)^2\right] dt \tag{5.33}$$

rather than, say, the mean squared error at a particular point of interest, v:

$$E\left[\left(\hat{f}_R(v) - f_R(v)\right)^2\right] \tag{5.34}$$

5.5 Robustness

Traditionally, the term robust in statistics has meant resilience to violations in assumptions. For example, in Section 4.2.8, we presented Student-t, a method for finding exact confidence intervals for means, assuming normally-distributed populations. But as noted at the outset of this chapter, no population in the real world has an exact normal distribution. The question at hand (which we will address below) is, does the Student-t method still give approximately correct results if the sample population is not normal? If so, we say that Student-t is robust to the normality assumption.

Later, there was quite a lot of interest among statisticians in estimation procedures that do well even if there are outliers in the data, i.e. erroneous observations that are in the fringes of the sample. Such procedures are said to be robust to outliers.

Our interest here is in robustness to assumptions. Let us first consider the Student-t example. As discussed in Section 4.2.8, the main statistic here is

$$T = \frac{\bar{X} - \mu}{s/\sqrt{n}} \tag{5.35}$$

where $\mu$ is the population mean and s is the square root of the unbiased version of the sample variance:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \tag{5.36}$$

The distribution of T, under the assumption of a normal population, has been tabulated, and tables for it appear in virtually every textbook on statistics. But what if the population is not normal, as is inevitably the case?

The answer is that it doesn't matter. For large n, even for samples having, say, n = 20, the distribution of T is close to N(0,1) by the Central Limit Theorem regardless of whether the population is normal.

By contrast, consider the classic procedure for performing hypothesis tests and forming confidence intervals for a population variance $\sigma^2$, which relies on the statistic

$$K = \frac{(n - 1)s^2}{\sigma^2} \tag{5.37}$$

where again $s^2$ is the unbiased version of the sample variance. If the sampled population is normal, then K can be shown to have a chi-square distribution with n-1 degrees of freedom. This then sets up the tests or intervals. However, it has been shown that these procedures are not robust to the assumption of a normal population. See The Analysis of Variance: Fixed, Random, and Mixed Models, by Hardeo Sahai and Mohammed I. Ageel, Springer, 2000, and the earlier references they cite, especially the pioneering work of Scheffé.

Exercises

Note to instructor: See the Preface for a list of sources of real data on which exercises can be assigned to complement the theoretical exercises below.

1. In our example in Section 5.2, assume $\mu_1 = 70$, $\mu_2 = 66$, $\sigma = 4$ and the distribution of height is normal in the two populations. Suppose we are predicting the height of a man who, unknown to us, has height 68. We hope to guess within two inches. Find $P(|T_1 - 68| < 2)$ and $P(|T_2 - 68| < 2)$ for various values of n.
2. In Section 4.2.16 we discussed simultaneous inference, the forming of confidence intervals whose joint confidence level was 95% or some other target value. The Kolmogorov-Smirnov confidence band in Section 5.3.2 allows us to compute infinitely many confidence intervals for $F_X(t)$ at different values of t, at a "price" of only 1.358. Still, if we are just estimating $F_X(t)$ at a single value of t, an individual confidence interval using (.32) would be narrower than that given to us by Kolmogorov-Smirnov. Compare the widths of these two intervals in a situation in which the true value of $F_X(t) = 0.4$.

Chapter 6

Statistical Relations Between Variables

6.1 The Goals: Prediction and Understanding

Prediction is difficult, especially when it's about the future. (Yogi Berra [1])

In this unit we are interested in relations between variables. Before beginning, it is important to understand the typical goals in analyzing such relations:

• Prediction: Here we are trying to predict one variable from one or more others.
• Understanding: Here we wish to determine which variables have a greater effect on a given variable.

Denote the predictor variables by $X^{(1)}, \ldots, X^{(r)}$. The variable to be predicted, Y, is often called the response variable.

A common statistical methodology used for such analyses is called regression analysis. In the important special cases in which the response variable Y is an indicator variable [2] (taking on just the values 1 and 0 to indicate class membership), we call this the classification problem. (If we have more than two classes, we need several Ys.)

In the above context, we are interested in the relation of a single variable Y with other variables $X^{(i)}$. But in some applications, we are interested in the more symmetric problem of relations among variables $X^{(i)}$ (with there being no Y). A typical tool for the case of continuous random variables is principal components analysis, and a popular one for the discrete case is the log-linear model; both will be discussed later in this unit.

[1] Yogi Berra (1925-) is a former baseball player and manager, famous for his malapropisms, such as "When you reach a fork in the road, take it"; "That restaurant is so crowded that no one goes there anymore"; and "I never said half the things I really said."

[2] Sometimes called a dummy variable.

6.2 Example Applications: Software Engineering, Networks, Text Mining

Example: As an aid in deciding which applicants to admit to a graduate program in computer science, we might try to predict Y, a faculty rating of a student after completion of his/her first year in the program, from $X^{(1)}$ = the student's CS GRE score, $X^{(2)}$ = the student's undergraduate GPA and various other variables. Here our goal would be Prediction, but educational researchers might do the same thing with the goal of Understanding. For an example of the latter, see "Predicting Academic Performance in the School of Computing & Information Technology (SCIT)," 35th ASEE/IEEE Frontiers in Education Conference, by Paul Golding and Sophia McNamarah, 2005.

Example: In a paper, "Estimation of Network Distances Using Off-line Measurements," Computer Communications, by Danny Raz, Nidhan Choudhuri and Prasun Sinha, 2006, the authors wanted to predict Y, the round-trip time (RTT) for packets in a network, using the predictor variables $X^{(1)}$ = geographical distance between the two nodes, $X^{(2)}$ = number of router-to-router hops, and other variables. The goal here is primarily Prediction.

Example: In a paper, "Productivity Analysis of Object-Oriented Software Developed in a Commercial Environment," Software Practice and Experience, by Thomas E. Potok, Mladen Vouk and Andy Rindos, 1999, the authors mainly had an Understanding goal: What impact, positive or negative, does the use of object-oriented programming have on programmer productivity? Here they predicted Y = number of person-months needed to complete the project, from $X^{(1)}$ = size of the project as measured in lines of code, $X^{(2)}$ = 1 or 0 depending on whether an object-oriented or procedural approach was used, and other variables.

Example: Most text mining applications are classification problems. For example, the paper "Untangling Text Data Mining," Proceedings of ACL'99, by Marti Hearst, 1999, cites, inter alia, an application in which the analysts wished to know what proportion of patents come from publicly funded research. They were using a patent database, which of course is far too huge to feasibly search by hand. That meant that they needed to be able to (reasonably reliably) predict Y = 1 or 0 according to whether the patent was publicly funded from a number of $X^{(i)}$, each of which was an indicator variable for a given key word, such as "NSF." They would then treat the predicted Y values as the real ones, and estimate their proportion from them.

6.3 Regression Analysis

6.3.1 What Does "Relationship" Really Mean?

Consider the Davis city population example again. In addition to the random variable W for weight, let H denote the person's height. Suppose we are interested in exploring the relationship between height and weight.

As usual, we must first ask, what does that really mean? What do we mean by "relationship"? Clearly, there is no exact relationship; for instance, we cannot exactly predict a person's weight from his/her height.

Intuitively, though, we would guess that mean weight increases with height. To state this precisely, take Y to be the weight W and $X^{(1)}$ to be the height H, and define

$$m_{W;H}(t) = E(W \mid H = t) \tag{6.1}$$

This looks abstract, but it is just common-sense stuff. For example, $m_{W;H}(68)$ would be the mean weight of all people in the population of height 68 inches. The value of $m_{W;H}(t)$ varies with t, and we would expect that a graph of it would show an increasing trend with t, reflecting that taller people tend to be heavier.

We call $m_{W;H}$ the regression function of W on H. In general, $m_{Y;X}(t)$ means the mean of Y among all units in the population for which X = t.

Note the word population in that last sentence. The function m() is a population function.

Now, let's again suppose we have a random sample of 1000 people from Davis, with

$$(H_1, W_1), \ldots, (H_{1000}, W_{1000}) \tag{6.2}$$

being their heights and weights. We again wish to use this data to estimate population values. But the difference here is that we are estimating a whole function now, the whole curve $m_{W;H}(t)$. That means we are estimating infinitely many values, with one $m_{W;H}(t)$ value for each t. [3] How do we do this?

The traditional method is to choose a parametric model for the regression function. That way we estimate only a finite number of quantities instead of an infinite number.

Typically the parametric model chosen is linear, i.e. we assume that $m_{W;H}(t)$ is a linear function of t:

$$m_{W;H}(t) = ct + d \tag{6.3}$$

for some constants c and d. If this assumption is reasonable (meaning that though it may not be exactly true it is reasonably close) then it is a huge gain for us over a nonparametric model. Do you see why? Again, the answer is that instead of having to estimate an infinite number of quantities, we now must estimate only two quantities, the parameters c and d.

Equation (6.3) is thus called a parametric model of $m_{W;H}()$. The set of straight lines indexed by c and d is a two-parameter family, analogous to parametric families of distributions, such as the two-parametric gamma family; the difference, of course, is that in the gamma case we were modeling a density function, and here we are modeling a regression function.

Note that c and d are indeed population parameters in the same sense that, for instance, r and $\lambda$ are parameters in the gamma distribution family. We will see how to estimate c and d in Section 6.3.7.
6.3.2 Multiple Regression: More Than One Predictor Variable

Note that X and t could be vector-valued. For instance, we could have Y be weight and have X be the pair

$$X = \left(X^{(1)}, X^{(2)}\right) = (H, A) = (\text{height}, \text{age}) \tag{6.4}$$

so as to study the relationship of weight with height and age. If we used a linear model, we would write for $t = (t_1, t_2)$,

$$m_{W;H}(t) = \beta_0 + \beta_1 t_1 + \beta_2 t_2 \tag{6.5}$$

[3] Of course, the population of Davis is finite, but there is the conceptual population of all people who could live in Davis.

In other words

$$\text{mean weight} = \beta_0 + \beta_1\,\text{height} + \beta_2\,\text{age} \tag{6.6}$$

(It is traditional to use the Greek letter $\beta$ to name the coefficients in a linear regression model.)

So for instance $m_{W;H}(68, 37.2)$ would be the mean weight in the population of all people having height 68 and age 37.2.

6.3.3 Interaction Terms

Equation (6.5) implicitly says that, for instance, the effect of age on weight is the same at all height levels. In other words, the difference in mean weight between 30-year-olds and 40-year-olds is the same regardless of whether we are looking at tall people or short people. To see that, just plug 40 and 30 for age in (6.5), with the same number for height in both, and subtract; you get $10\beta_2$, an expression that has no height term.

If we feel that the assumption is not a good one (there are also data plotting techniques to help assess this), we can add an interaction term to (6.5), consisting of the product of the two original predictors. Our new predictor variable $X^{(3)}$ is equal to $X^{(1)}X^{(2)}$, and thus our regression function is

$$m_{W;H}(t) = \beta_0 + \beta_1 t_1 + \beta_2 t_2 + \beta_3 t_1 t_2 \tag{6.7}$$

If you perform the same subtraction described above, you get $10\beta_2 + 10\beta_3 t_1$, which does depend on the height $t_1$. So this more complex model does not assume, as the old one did, that the difference in mean weight between 30-year-olds and 40-year-olds is the same regardless of whether we are looking at tall people or short people.

Recall the study of object-oriented programming in Section 6.2. The authors there set $X^{(3)} = X^{(1)}X^{(2)}$. The reader should make sure to understand that without this term, we are basically saying that the effect (whether positive or negative) of using object-oriented programming is the same for any code size.

Though the idea of adding interaction terms to a regression model is tempting, it can easily get out of hand. If we have k basic predictor variables, then there are $\binom{k}{2}$ potential two-way interaction terms, $\binom{k}{3}$ three-way terms and so on. Unless we have a very large amount of data, we run a big risk of overfitting (Section 6.3.9.1). And with so many interaction terms, the model would be difficult to interpret.

6.3.4 Nonrandom Predictor Variables

In our weight/height/age example above, all three variables are random. If we repeat the experiment, i.e. we choose another sample of 1000 people, these new people will have different weights, different heights and different ages from the people in the first sample.

But we must point out that the function $m_{Y;X}$ makes sense even if X is nonrandom. To illustrate this, let's look at the ALOHA network example in our introductory unit on discrete probability, Section 1.1.

    # simulation of simple form of slotted ALOHA

    # a node is active if it has a message to send (it will never have more
    # than one in this model), inactive otherwise

    # the inactives have a chance to go active earlier within a slot, after
    # which the actives (including those newly-active) may try to send; if
    # there is a collision, no message gets through

    # parameters of the system:
    #    s = number of nodes
    #    b = probability an active node refrains from sending
    #    q = probability an inactive node becomes active

    # parameters of the simulation:
    #    nslots = number of slots to be simulated
    #    nb = number of values of b to run; they will be evenly spaced in (0.2,0.9)

    # will find mean message delay as a function of b;

    # we will rely on the "ergodicity" of this process, which is a Markov
    # chain (see http://heather.cs.ucdavis.edu/~matloff/132/PLN/Markov.tex),
    # which means that we look at just one repetition of observing the chain
    # through many time slots

    # main loop, running the simulation for many values of b
    alohamain <- function(s,q,nslots,nb) {
       deltab <- 0.7 / nb  # we'll try nb values of b in (0.2,0.9)
       md <- matrix(nrow=nb,ncol=2)
       b <- 0.2
       for (i in 1:nb) {
          b <- b + deltab
          # one simulation run per value of b; row i holds (b, mean delay)
          md[i,] <- alohasim(s,b,q,nslots)
       }
       return(md)
    }

    # simulate the process for nslots slots
    alohasim <- function(s,b,q,nslots) {
       # status[i,1] = 1 or 0, for node i active or not
       # status[i,2] = if node i active, then epoch in which msg was created
       # (could try a list structure instead of a matrix)
       status <- matrix(nrow=s,ncol=2)
       # start with all active with msg created at time 0
       for (node in 1:s) status[node,] <- c(1,0)
       nsent <- 0  # number of successful transmits so far
       sumdelay <- 0  # total delay among successful transmits so far
       # now simulate the nslots slots
       for (slot in 1:nslots) {
          # check for new actives
          for (node in 1:s) {
             if (!status[node,1])  # inactive
                # NOTE: from here through the "runif(1) > b" test, the source
                # listing is damaged; these lines are a reconstruction based
                # on the header comments above (an assumption)
                if (runif(1) < q) status[node,] <- c(1,slot)
          }
          # check for attempted transmissions
          ntrysend <- 0
          for (node in 1:s) {
             if (status[node,1])  # active
                if (runif(1) > b) {
                   ntrysend <- ntrysend + 1
                   whotried <- node
                }
          }
          if (ntrysend == 1) {  # something gets through iff exactly one tries
             # do our bookkeeping
             sumdelay <- sumdelay + slot - status[whotried,2]
             # this node now back to inactive
             status[whotried,1] <- 0
             nsent <- nsent + 1
          }
       }
       return(c(b,sumdelay/nsent))
    }

A minor change is that I replaced the probability p, the probability that an active node would send in the original example, with b, the probability of not sending (b for "backoff"). Let A denote the time (measured in slots) between the creation of a message and the time it is successfully transmitted.

We are interested in mean delay, i.e. the mean of A. We are particularly interested in the effect of b here on that mean. Our goal here, as described in Section 6.1, could be Prediction, so that we could have an idea of how much delay to expect in future settings. Or, we may wish to explore finding an optimal b, i.e. one that minimizes the mean delay, in which case our goal would be more in the direction of Understanding.

I ran the program with certain arguments, and then plotted the data:

    > md <- alohamain(s,0.1,1000,100)  # s: the number of nodes
    > plot(md,cex=0.5,xlab="b",ylab="A")

The plot is shown in Figure 6.1.

Note that though our values b here are nonrandom, the A values are indeed random. To dramatize that point, I ran the program again. (Remember, unless you specify otherwise, R will use a different seed for its random number stream each time you run a program.) I've superimposed this second data set on the first, using filled circles this time to represent the points:

    md2 <- alohamain(s,0.1,1000,100)  # s: the number of nodes
    points(md2,cex=0.5,pch=19)

The plot is shown in Figure 6.2.

We do expect some kind of U-shaped relation, as seen here. For b too small, the nodes are clashing with each other a lot, causing long delays to message transmission. For b too large, we are needlessly backing off in many cases in which we actually would get through.

This looks like a quadratic relationship, meaning the following. Take our response variable Y to be A, take our first predictor $X^{(1)}$ to be b, and take our second predictor $X^{(2)}$ to be $b^2$. Then when we say A and b have a quadratic relationship, we mean

$$m_{A;b}(b) = \beta_0 + \beta_1 b + \beta_2 b^2 \tag{6.8}$$

for some constants $\beta_0, \beta_1, \beta_2$. So, we are using a three-parameter family for our model of $m_{A;b}$. No model is exact, but our data seem to indicate that this one is reasonably good, and if further investigation confirms that, it provides for a nice compact summary of the situation.
Again, we'll see how to estimate the $\beta_i$ in Section 6.3.7.
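
The text above mentions the possibility of looking for an optimal b. If the quadratic model (6.8) is taken at face value, and assuming $\beta_2 > 0$ (which is what a U shape would suggest), a short calculus sketch gives the minimizing value, written here as $b^*$ (a symbol introduced only for this illustration):

```latex
\frac{d}{db}\, m_{A;b}(b) = \beta_1 + 2\beta_2 b = 0
\quad\Longrightarrow\quad
b^* = -\frac{\beta_1}{2\beta_2},
\qquad
m_{A;b}(b^*) = \beta_0 - \frac{\beta_1^2}{4\beta_2}
```

This is only as good as the model itself: it is meaningful when $b^*$ falls inside the range of b values that were actually simulated, and in practice the $\beta_i$ would be replaced by estimates.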

Figure 6.1: Scatter Plot

We could also try adding two more predictor variables, consisting of $X^{(3)} = q$ and $X^{(4)} = s$. We would collect more data, in which we varied the values of q and s, and then could entertain the model

$$m_{A;b}(b) = \beta_0 + \beta_1 b + \beta_2 b^2 + \beta_3 q + \beta_4 s \tag{6.9}$$

6.3.5 Prediction

So, we've taken our data on weight/height/age, and estimated the function m using that data, yielding $\hat{m}$. Now, a new person comes in, of height 70.4 and age 24.8. What should we predict his weight to be?

The answer is that we predict his weight to be our estimated mean weight for his height/age group,

$$\hat{m}_{W;H,A}(70.4, 24.8) \tag{6.10}$$

Figure 6.2: Scatter Plot, Two Data Sets

If our model is (6.5), then (6.10) is

$$\hat{m}_{W;H,A}(70.4, 24.8) = \hat{\beta}_0 + \hat{\beta}_1 \cdot 70.4 + \hat{\beta}_2 \cdot 24.8 \tag{6.11}$$

where the $\hat{\beta}_i$ are estimated from our data as in Section 6.3.7 below.

6.3.6 Optimality of the Regression Function

In predicting Y from X (with X random), we might assess our predictive ability by the mean squared prediction error (MSPE):

$$MSPE = E\left[(Y - w(X))^2\right] \tag{6.12}$$

where w is some function we will use to form our prediction for Y based on X. What w is best, i.e. which w minimizes MSPE?

To answer this question, condition on X in (6.12):

$$MSPE = E\left[\,E\{(Y - w(X))^2 \mid X\}\,\right] \tag{6.13}$$

Theorem 8 The best w is m, i.e. the best way to predict Y from X is to "plug in" X in the regression function.

We need this lemma:

Lemma 9 For any random variable Z, the constant c which minimizes

$$E[(Z - c)^2] \tag{6.14}$$

is

$$c = EZ \tag{6.15}$$

Proof. Expand (6.14) to

$$E(Z^2) - 2cEZ + c^2 \tag{6.16}$$

and use calculus to find the best c: setting the derivative with respect to c, namely $-2EZ + 2c$, to 0 yields c = EZ.

Apply the lemma to the inner expectation in (6.13), with Z being Y and c being some function of X. The minimizing value is EZ, i.e. $E(Y \mid X)$, since our expectation here is conditional on X.

All of this tells us that the best function w in (6.12) is $m_{Y;X}$. This proves the theorem.

6.3.7 Parametric Estimation of Linear Regression Functions

6.3.7.1 Meaning of "Linear"

Here we model $m_{Y;X}$ as a linear function of $X^{(1)}, \ldots, X^{(r)}$:

$$m_{Y;X}(t) = \beta_0 + \beta_1 t^{(1)} + \ldots + \beta_r t^{(r)} \tag{6.17}$$

Note that the term linear regression does NOT necessarily mean that the graph of the regression function is a straight line or a plane. Instead, the word linear refers to the regression function being linear in the parameters. So, for instance, (6.8) is a linear model; if for example we multiply $\beta_0$, $\beta_1$ and $\beta_2$ by 8, then $m_{A;b}(b)$ is multiplied by 8.

6.3.7.2 Point Estimates and Matrix Formulation

So, how do we estimate the $\beta_i$? Look for instance at (6.8). Keep in mind that in (6.8), the $\beta_i$ are population values. We need to estimate them from our data. How do we do that?

Let's define $(b_i, A_i)$ to be the i-th pair from the simulation. In the program, this is md[i,]. Our estimated parameters will be denoted by $\hat{\beta}_i$. Using the result of Section 6.3.6 as a guide, the estimation methodology involves finding the values of $\hat{\beta}_i$ which minimize the sum of squared differences between the actual A values and their predicted values:

$$\sum_{i=1}^{100} \left[A_i - \left(\hat{\beta}_0 + \hat{\beta}_1 b_i + \hat{\beta}_2 b_i^2\right)\right]^2 \tag{6.18}$$

Obviously, this is a calculus problem. We set the partial derivatives of (6.18) with respect to the $\hat{\beta}_i$ to 0, giving us three linear equations in three unknowns, and then solve.

For the general case (6.17), we have r+1 equations in r+1 unknowns. This is most conveniently expressed in matrix terms. Let $X_i^{(j)}$ be the value of $X^{(j)}$ for the i-th observation in our sample, and let $Y_i$ be the corresponding Y value. Plugging this data into (6.17), we have

$$E\left(Y_i \mid X_i^{(1)}, \ldots, X_i^{(r)}\right) = \beta_0 + \beta_1 X_i^{(1)} + \ldots + \beta_r X_i^{(r)}, \quad i = 1, \ldots, n \tag{6.19}$$

That's a system of n linear equations, which from your linear algebra class you know can be represented more compactly by a matrix. That would be

$$E(V \mid Q) = Q\beta \tag{6.20}$$

where (with ' denoting matrix transpose and a vector without a ' being a row vector)

$$V = (Y_1, \ldots, Y_n)', \tag{6.21}$$

$$\beta = (\beta_0, \beta_1, \ldots, \beta_r)' \tag{6.22}$$

and Q is the n x (r+1) matrix whose (i,j) element is $X_i^{(j)}$, with $X_i^{(0)}$ taken to be 1. For instance, if we are predicting weight from height and age, then row 5 of Q would consist of a 1, then the height and age of the fifth person in our sample.

Now to estimate the $\beta_i$, let

$$\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_r)' \tag{6.23}$$

Then it can be shown that, after all the partial derivatives are taken and set to 0, the solution is

$$\hat{\beta} = (Q'Q)^{-1} Q'V \tag{6.24}$$

6.3.7.3 Back to Our ALOHA Example

R or any other statistical package does the work for us. In R, we can use the lm() ("linear model") function:

    > md <- cbind(md,md[,1]^2)
    > lmout <- lm(md[,2] ~ md[,1] + md[,3])

First I added a new column to the data matrix, consisting of $b^2$. I then called lm(), with the argument

    md[,2] ~ md[,1] + md[,3]

R documentation calls this model specification argument the formula. It states that I wish to use the first and third columns of md, i.e. b and $b^2$, as predictors, and use A, i.e. the second column, as the response variable. [4]

The return value from this call, which I've stored in lmout, is an object of class lm. One of the member variables of that class, coefficients, is the vector $\hat{\beta}$:

    > lmout$coefficients
    (Intercept)     md[, 1]     md[, 3]
       27.56852   -90.72585    79.98616

So, $\hat{\beta}_0 = 27.57$ and so on.

The result is

$$\hat{m}_{A;b}(t) = 27.57 - 90.73t + 79.99t^2 \tag{6.25}$$

Another member variable in the lm class is fitted.values. This is the "fitted curve," meaning the values of (6.25) at $b_1, \ldots, b_{100}$. In other words, this is (6.25) evaluated at our 100 values of b. I plotted this curve on the same graph,

    > lines(cbind(md[,1],lmout$fitted.values))

See Figure 6.3. As you can see, the fit looks fairly good. What should we look for?

Remember, we don't expect the curve to go through the points; we are estimating the mean of A for each b, not the A values themselves. There is always variation around the mean. If for instance we are looking at the relationship between people's heights and weights, the mean weight for people of height 70 inches might be, say, 160 pounds, but we know that some 70-inch-tall people weigh more than this and some weigh less.

[4] Unfortunately, R did not allow me to put the squared column directly into the formula, forcing me to use cbind() to make a new matrix.
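
As a check on (6.24), one can compute $(Q'Q)^{-1}Q'V$ directly and compare it with what lm() reports. The sketch below does not use the ALOHA output; it uses made-up data (the grid of b values, the noise level and the "true" coefficients are all assumptions chosen for illustration). It also wraps the squared term in I(), which is one way to place it directly in a formula.

```r
set.seed(1)
n <- 100
b <- seq(0.207, 0.9, length.out = n)  # nonrandom predictor values (assumed grid)
# made-up responses: a quadratic mean function plus noise
A <- 27.57 - 90.73 * b + 79.99 * b^2 + rnorm(n, sd = 1)

# Q has a column of 1s, then b, then b^2, as in the matrix formulation
Q <- cbind(1, b, b^2)
V <- A

# solve the normal equations (Q'Q) betahat = Q'V, i.e. (6.24),
# without forming the inverse explicitly
betahat <- solve(t(Q) %*% Q, t(Q) %*% V)

# the same fit via lm(); I() protects the arithmetic inside the formula
lmout <- lm(A ~ b + I(b^2))

print(cbind(by_hand = as.vector(betahat), lm = unname(coef(lmout))))
```

The two columns should agree up to rounding error, since lm() minimizes the same sum of squares as (6.18), though it does so with a numerically more stable decomposition rather than the normal equations.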

Figure 6.3: Quadratic Fit Superimposed

However, there seems to be a tendency for our estimates $\hat{m}_{A;b}(t)$ to be too low for values in the middle range of t, and possibly too high for t around 0.3 or 0.4. That said, with a sample size of only 100, it's difficult to tell. It's always important to keep in mind that the data are random; a different sample may show somewhat different patterns. Nevertheless, we should consider a more complex model.

So I tried a quartic, i.e. fourth-degree, polynomial model. I added third- and fourth-power columns to md, calling the result md4, and invoked the call

    lm(md4[,2] ~ md4[,1] + md4[,3] + md4[,4] + md4[,5])

The result was

    > lmout$coefficients
    (Intercept)    md4[, 1]    md4[, 3]    md4[, 4]    md4[, 5]
       95.98882  -664.02780  1731.90848 -1973.00660   835.89714


Figure 6.4: Fourth Degree Fit Superimposed

In other words, we have an estimated regression function of

$$\hat{m}_{A;b}(t) = 95.98882 - 664.02780 t + 1731.90848 t^2 - 1973.00660 t^3 + 835.89714 t^4 \tag{6.26}$$

The fit is shown in Figure 6.4. It looks much better. On the other hand, we have to worry about overfitting. We return to this issue in Section 6.3.9.1.

6.3.7.4 Approximate Confidence Intervals

As usual, we should not be satisfied with just point estimates, in this case the $\hat{\beta}_i$. We need an indication of how accurate they are, so we need confidence intervals. In other words, we need to use the $\hat{\beta}_i$ to form confidence intervals for the $\beta_i$.

For instance, recall the study on object-oriented programming in Section 6.1. The goal there was primarily Understanding, specifically assessing the impact of OOP. That impact is measured by $\beta_2$. Thus, we want to


find a confidence interval for $\beta_2$.

Equation (6.24) shows that the $\hat{\beta}_i$ are sums of the components of V, i.e. the $Y_j$. So, the Central Limit Theorem implies that the $\hat{\beta}_i$ are approximately normally distributed. That in turn means that, in order to form confidence intervals, we need standard errors for the $\hat{\beta}_i$. How will we get them?

Note carefully that so far we have made NO assumptions other than (6.17). Now, though, we need to add an assumption:[5]

$$Var(Y | X = t) = \sigma^2 \tag{6.27}$$

for all t. Note that this and the independence of the sample observations (e.g. the various people sampled in the Davis height/weight example are independent of each other) imply that

$$Cov(V | Q) = \sigma^2 I \tag{6.28}$$

where I is the usual identity matrix (1s on the diagonal, 0s off diagonal).

Be sure you understand what this means. In the Davis weights example, for instance, it means that the variance of weight among 72-inch-tall people is the same as that for 65-inch-tall people. That is not quite true (the taller group has larger variance), but it's probably accurate enough for our purposes here.

Keep in mind that the derivation below is conditional on the $X_i^{(j)}$, which is the standard approach, especially since there is the case of nonrandom X. Thus we will later get conditional confidence intervals, which is fine. To avoid clutter, I will sometimes not show the conditioning explicitly, and thus for instance will write Cov(V) instead of Cov(V|Q).

We can derive the covariance matrix of $\hat{\beta}$ as follows. First, we can easily derive that for any m x 1 random vector M and constant (i.e. nonrandom) matrix c with m columns,

$$Cov(cM) = c \, Cov(M) \, c' \tag{6.29}$$

Also, one can show that the transpose of the product of two matrices is the reverse product of the transposes.

In (6.29), set $c = (Q'Q)^{-1} Q'$ and $M = V$. Then from (6.24),

$$Cov(\hat{\beta}) = (Q'Q)^{-1} Q' \, Cov(V) \, Q \, [(Q'Q)^{-1}]' \tag{6.30}$$

$$= (Q'Q)^{-1} Q' \, \sigma^2 I \, Q \, [(Q'Q)^{-1}]' \tag{6.31}$$

$$= \sigma^2 (Q'Q)^{-1} \tag{6.32}$$

Here we have used the fact that Q'Q is a symmetric matrix, which implies the same property for its inverse.

Whew! That's a lot of work for you, if your linear algebra is rusty. But it's worth it, because (6.32) now gives us what we need for confidence intervals. Here's how:

[5] Actually, we could derive some usable, though messy, standard errors without this assumption.


First, we need to estimate $\sigma^2$. Recalling that for any random variable U, $Var(U) = E[(U - EU)^2]$, we have

$$\sigma^2 = Var(Y | X = t) \tag{6.33}$$

$$= Var(Y | X^{(1)} = t_1, ..., X^{(r)} = t_r) \tag{6.34}$$

$$= E\left[ \{Y - m_{Y;X}(t)\}^2 \mid X = t \right] \tag{6.35}$$

$$= E\left[ (Y - \beta_0 - \beta_1 t_1 - ... - \beta_r t_r)^2 \mid X = t \right] \tag{6.36}$$

Thus, a natural estimate for $\sigma^2$ would be the sample analog, where we replace E() by averaging over our sample, and replace population quantities by sample estimates:

$$s^2 = \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i^{(1)} - ... - \hat{\beta}_r X_i^{(r)} \right)^2 \tag{6.37}$$

So, the estimated covariance matrix for $\hat{\beta}$ is

$$\widehat{Cov}(\hat{\beta}) = s^2 (Q'Q)^{-1} \tag{6.38}$$

6.3.7.5 Once Again, Our ALOHA Example

In R we can obtain (6.38) via the generic function vcov():

> vcov(lmout)
            (Intercept)    md4[, 1]   md4[, 3]   md4[, 4]   md4[, 5]
(Intercept)    92.73734   -794.4755   2358.860  -2915.238   1279.981
md4[, 1]     -794.47553   6896.8443 -20705.705  25822.832 -11422.355
md4[, 3]     2358.86046 -20705.7047  62804.912 -79026.086  35220.412
md4[, 4]    -2915.23828  25822.8320 -79026.086 100239.652 -44990.271
md4[, 5]     1279.98125 -11422.3550  35220.412 -44990.271  20320.809

What is this telling us? For instance, it is saying that the (4,4) position in the matrix (6.38), numbering rows and columns from 0 so that this is the lower-right entry, is equal to 20320.809, so the standard error of $\hat{\beta}_4$ is the square root of this, 142.6. Thus an approximate 95% confidence interval for the true population $\beta_4$ is

$$835.89714 \pm 1.96 \cdot 142.6 = (556.4, 1115.4) \tag{6.39}$$

That interval is quite wide. Remember what this tells us: our sample of size 100 is not very large. On the other hand, the interval is quite far from 0, which indicates that our fourth-degree model is legitimately better than our quadratic one.

By the way, applying the R function summary() to a linear model object such as lmout here gives standard errors for the $\hat{\beta}_i$ and lots of other information.
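To see how (6.37) and (6.38) relate to what vcov() returns, the matrix can be built by hand. This is a sketch on simulated data (the seed, coefficients and variable names are assumptions for illustration). One point it makes visible: vcov() divides the sum of squared residuals by n-(r+1) rather than by the n used in (6.37), so the by-hand version matches vcov() only with that divisor.

```r
set.seed(2)
n <- 100
b <- runif(n)
A <- 30 - 90*b + 80*b^2 + rnorm(n, sd = 2)   # hypothetical data
lmout <- lm(A ~ b + I(b^2))

Q <- model.matrix(lmout)              # the matrix Q, including the column of 1s
res <- residuals(lmout)               # Y_i minus the fitted value
p <- ncol(Q)                          # r + 1 coefficients

s2_n  <- sum(res^2) / n               # s^2 as in (6.37), divisor n
s2_df <- sum(res^2) / (n - p)         # divisor n - (r+1), which vcov() uses

cov_n  <- s2_n  * solve(t(Q) %*% Q)   # (6.38) with the divisor-n s^2
cov_df <- s2_df * solve(t(Q) %*% Q)   # matches vcov(lmout)

se <- sqrt(diag(vcov(lmout)))         # standard errors of the coefficients
ci <- cbind(lower = coef(lmout) - 1.96*se,
            upper = coef(lmout) + 1.96*se)   # approximate 95% intervals
print(ci)
```

With n = 100 and three coefficients the two divisors differ by 3%, so the two versions of the covariance matrix are close.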


6.3.7.6 Estimation Vs. Prediction

In statistical parlance, there is a keen distinction made between the words estimation and prediction. To explain this, let's again consider the example of predicting Y = weight from X = (height, age). Say we have someone of height 67 inches and age 27, and want to guess, i.e. predict, her weight.

From Section 6.3.6, we know that the best prediction is m[(67,27)]. However, we do not know the value of that quantity, so we must estimate it from our data. So, our predicted value for this person's weight will be $\hat{m}[(67,27)]$, i.e. our estimate for the value of the regression function at the point (67,27).

6.3.7.7 Exact Confidence Intervals

Note carefully that we have not assumed that Y, given X, is normally distributed. In the height/weight context, for example, such an assumption would mean that weights in a specific height subpopulation, say all people of height 70 inches, have a normal distribution.

If we do make such an assumption, then we can get exact confidence intervals (which of course, only hold if we really do have an exact normal distribution in the population). This again uses Student-t distributions. In that analysis, $s^2$ has n-(r+1) in its denominator instead of our n, just as there was n-1 in the denominator for $s^2$ when we estimated a single population variance. The number of degrees of freedom in the Student-t distribution is likewise n-(r+1). But as before, for even moderately large n, it doesn't matter.
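The claim that the exact and approximate intervals barely differ for moderate n is easy to look at numerically. A sketch on simulated data with r = 1 (the data and names are assumptions for illustration); confint() gives the Student-t intervals, while the normal-based ones are formed as in (6.39).

```r
set.seed(3)
n <- 100
x <- runif(n)
y <- 1 + 2*x + rnorm(n)               # hypothetical data, r = 1
fit <- lm(y ~ x)

se <- sqrt(diag(vcov(fit)))
approx_ci <- cbind(coef(fit) - qnorm(0.975)*se,
                   coef(fit) + qnorm(0.975)*se)   # normal-based, as in (6.39)
exact_ci <- confint(fit, level = 0.95)            # Student-t with n-(r+1) df

print(qt(0.975, df = n - 2))          # t quantile, slightly above 1.96
print(approx_ci)
print(exact_ci)
```

The t-based intervals are slightly wider, because the t quantile exceeds the normal one, but the gap shrinks as n grows.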

6.3.8 The Famous "Error Term" (advanced topic)

Books on regression analysis (and there are hundreds, if not thousands of these) generally introduce the subject as follows. They consider the linear case with r = 1, and write

$$Y = \beta_0 + \beta_1 X + \epsilon, \quad E\epsilon = 0 \tag{6.40}$$

with $\epsilon$ being independent of X. They also assume that $\epsilon$ has a normal distribution with variance $\sigma^2$.

Let's see how this compares to what we have been assuming here so far. In the linear case with r = 1, we would write

$$m_{Y;X}(t) = E(Y | X = t) = \beta_0 + \beta_1 t \tag{6.41}$$

Note that in our context, we would define $\epsilon$ as

$$\epsilon = Y - m_{Y;X}(X) \tag{6.42}$$

Equation (6.40) is consistent with (6.41): The former has $E\epsilon = 0$, and so does the latter, since

$$E\epsilon = EY - E[m_{Y;X}(X)] = EY - E[E(Y|X)] = EY - EY = 0 \tag{6.43}$$


In order to produce confidence intervals, we later added the assumption (6.27), which you can see is consistent with (6.40) since the latter assumes that $Var(\epsilon) = \sigma^2$ no matter what value X has.

Now, what about the normality assumption in (6.40)? That would be equivalent to saying that in our context, the conditional distribution of Y given X is normal, which is an assumption we did not make. Note that in the weight/height example, this assumption would say that, for instance, the distribution of weights among people of height 68.2 inches is normal.

No matter what the context is, the variable $\epsilon$ is called the error term. Originally this was an allusion to measurement error, e.g. in chemistry experiments, but the modern interpretation would be prediction error, i.e. how much error we make when we use $m_{Y;X}(t)$ to predict Y.

6.3.9 Model Selection

The issues raised in Chapter 5 become crucial in regression and classification problems. In this unit, we will typically deal with models having large numbers of parameters. A central principle will be that simpler models are preferable, provided of course they are accurate. Hence the Einstein quote above. Simpler models are often called parsimonious.

Here I use the term model selection to mean which predictor variables we will use. If we have data on many predictors, we almost certainly will not be able to use them all, for the following reason:

6.3.9.1 The Overfitting Problem in Regression

Recall (6.8). There we assumed a second-degree polynomial for $m_{A;b}$. Why not a third-degree, or fourth, and so on?

You can see that if we carry this notion to its extreme, we get absurd results. If we fit a polynomial of degree 99 to our 100 points, we can make our fitted curve exactly pass through every point! This clearly would give us a meaningless, useless curve. We are simply fitting the noise.

Recall that we analyzed this problem in Section 5.2.3 in our unit on modeling. There we noted an absolutely fundamental principle in statistics:

In choosing between a simpler model and a more complex one, the latter is more accurate only if either

• we have enough data to support it, or

• the complex model is sufficiently different from the simpler one

This is extremely important in regression analysis.
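The degree-99 thought experiment can be run at a smaller scale. The sketch below uses 10 simulated points and a degree-9 polynomial, which has as many coefficients as there are points; the true regression function, noise level and names are assumptions for illustration. It then compares how the quadratic and the degree-9 fits predict fresh data from the same population.

```r
set.seed(4)
n <- 10
x <- seq(0, 1, length.out = n)
mean_fn <- function(t) 30 - 90*t + 80*t^2      # hypothetical true regression function
y <- mean_fn(x) + rnorm(n, sd = 3)

fit2 <- lm(y ~ poly(x, 2))            # quadratic: 3 parameters
fit9 <- lm(y ~ poly(x, n - 1))        # degree n-1: as many parameters as points

max_resid9 <- max(abs(residuals(fit9)))   # essentially 0: the curve hits every point

# Fresh data from the same population, to see how each fit predicts
xnew <- runif(200)
ynew <- mean_fn(xnew) + rnorm(200, sd = 3)
mspe <- function(fit) mean((ynew - predict(fit, data.frame(x = xnew)))^2)
print(c(max_resid9 = max_resid9, quadratic = mspe(fit2), degree9 = mspe(fit9)))
```

The residuals of the degree-9 fit are zero up to rounding, yet that says nothing about its accuracy on new data; the printed prediction errors are the quantities to compare.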

For example, look at our regression model for A against b in the ALOHA simulation in earlier sections. We did analyses for a simpler model, a quadratic polynomial, and a more complex model, a quartic (polynomial of degree 4). Rephrasing the above points in this context, we would say,

In choosing between the quadratic and quartic models, the latter is more accurate only if either


• we have enough data to support it, or

• at least one of the coefficients $\beta_3$ and $\beta_4$ is quite different from 0

In the weight/height/age example in Section 6.3.1, this would be phrased as

In deciding whether to predict from height only, versus from both height and age, the latter is more accurate only if either

• we have enough data to support it, or

• the coefficient $\beta_2$ is quite different from 0

If we use too many predictor variables,[6] our data is "diluted," by being "shared" by so many $\hat{\beta}_i$. As a result, $Var(\hat{\beta}_i)$ will be large, with big implications: Whether our goal is Prediction or Understanding, our estimates will be so poor that neither goal is achieved.

The questions raised in turn by the above considerations, i.e. How much data is enough data?, and How different from 0 is "quite different"?, are addressed below in Section 6.3.9.2.

A detailed mathematical example of overfitting in regression is presented in my paper A Careful Look at the Use of Statistical Methodology in Data Mining (book chapter), by N. Matloff, in Foundations of Data Mining and Granular Computing, edited by T.Y. Lin, Wesley Chu and L. Matzlack, Springer-Verlag Lecture Notes in Computer Science, 2005.

6.3.9.2 Methods for Predictor Variable Selection

So, we typically must discard some, maybe many, of our predictor variables. In the weight/height/age example, we may need to discard the age variable. In the ALOHA example, we might need to discard $b^4$ and even $b^3$. How do we make these decisions?

Note carefully that this is an unsolved problem. If anyone ever claims they have a foolproof way to do this, they do not understand the problem in the first place. Entire books have been written on this subject (e.g. Subset Selection in Regression, by Alan Miller, pub. by Chapman and Hall, 2002), discussing myriad different methods, but again, none of them is foolproof.

Most of the methods for variable selection use hypothesis testing in one form or another. Typically this takes the form

$$H_0: \beta_i = 0 \tag{6.44}$$

In the context of (6.6), this would mean testing

$$H_0: \beta_2 = 0 \tag{6.45}$$

[6] In the ALOHA example above, $b$, $b^2$, $b^3$ and $b^4$ are separate predictors, even though they are of course correlated.


If we reject $H_0$, then we use the age variable; otherwise we discard it.

I hope I've convinced you that this is not a good idea. As usual, the hypothesis test is asking the wrong question. For instance, in the weight/height/age example, the test is asking whether $\beta_2$ is zero or not, whereas what we want to know is whether $\beta_2$ is far enough from 0 for age to give us better predictions of weight. Those are two very, very different questions.

A very interesting example of overfitting using real data may be found in the paper, Honest Confidence Intervals for the Error Variance in Stepwise Regression, by Foster and Stine, www-stat.wharton.upenn.edu/~stine/research/honests2.pdf. The authors, of the University of Pennsylvania Wharton School, took real financial data and deliberately added a number of extra "predictors" that were in fact random noise, independent of the real data. They then tested the hypothesis (6.44). They found that each of the fake predictors was "significantly" related to Y! This illustrates both the dangers of hypothesis testing and the possible need for multiple inference procedures.[7] This problem has always been known by thinking statisticians, but the Wharton study certainly dramatized it.

Well, then, what can be done instead? First, there is the same alternative to hypothesis testing that we discussed before: confidence intervals. We saw an example of that in (6.39). Granted, the interval was very wide, telling us that it would be nice to have more data. But even the lower bound of that interval is far from zero, so it looks pretty safe to use $b^4$ as a predictor.

Moreover, a confidence interval for $\beta_i$ tells us whether the variable $X^{(i)}$ would have much value as a predictor. Once again, consider the weight/height/age example. Suppose our confidence interval for $\beta_2$ is (0.04,0.56). That would say that, for instance, a 10-year difference in age only makes about half a pound difference in mean weight, in which case age would be of almost no value in predicting weight.

A method that enjoys some popularity in certain circles is the Akaike Information Criterion (AIC). It uses a formula, backed by some theoretical analysis, which creates a tradeoff between richness of the model and size of the standard errors of the $\hat{\beta}_i$. The R statistical package includes a function AIC() for this, which is used by step() in the regression case.

The most popular alternative to hypothesis testing for variable selection today is probably cross validation. Here we split our data into a training set, which we use to estimate the $\beta_i$, and a validation set, in which we see how well our fitted model predicts new data, say in terms of average squared prediction error. We do this for several models, i.e. several sets of predictors, and choose the one which does best in the validation set. I like this method very much, though I often simply stick with confidence intervals.

A rough rule of thumb is that one should have $r < \sqrt{n}$.[8]

6.3.10 Nonlinear Parametric Regression Models

We pointed out in Section 6.3.7.1 that the word linear in linear regression model means linear in $\beta$, not in t. This is the most popular approach, as it is computationally easy, but nonlinear models are often used.

The most famous of these is the logistic model, for the case in which Y takes on only the values 0 and 1.

[7] They added so many predictors that r became greater than n. However, the problems they found would have been there to a large degree even if r were less than n but r/n was substantial.

[8] Asymptotic Behavior of Likelihood Methods for Exponential Families When the Number of Parameters Tends to Infinity, Stephen Portnoy, Annals of Statistics, 1988.


As we have seen before, in this case the expected value becomes a probability. The logistic model for a nonvector X is then

$$m_{Y;X}(t) = P(Y = 1 | X = t) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 t)}} \tag{6.46}$$

It extends to the case of vector-valued X in the obvious way.

The logistic model is quite widely used in computer science, in medicine, economics, psychology and so on.

Here is an example of a nonlinear model used in kinetics of chemical reactions, with r = 3:[9]

$$m_{Y;X}(t) = \frac{\beta_1 t_2 - t_3/\beta_5}{1 + \beta_2 t_1 + \beta_3 t_2 + \beta_4 t_3} \tag{6.47}$$

Here the X vector is (hydrogen, n-pentane, isopentane)'.

Unfortunately, in most cases, the least-squares estimates of the parameters in nonlinear regression do not have closed-form solutions, and numerical methods must be used. But R does that for you, via the nls() function in general, and via glm() for the logistic and related models in particular.

6.3.11 Nonparametric Estimation of Regression Functions

In some applications, there may be no obvious parametric model for $m_{Y;X}$. Or, we may have a parametric model that we are considering, but we would like to have some kind of nonparametric estimation method available as a means of checking the validity of our parametric model. So, how do we estimate a regression function nonparametrically?

To guide our intuition on this, let's turn again to the Davis example of the relationship between height and weight. Consider estimation of the quantity $m_{W;H}(68.2)$, the population mean weight of all people of height 68.2. We could take our estimate $\hat{m}_{W;H}(68.2)$ to be the average weight of all the people in our sample who have that height. But we may have very few people of that height, so that our estimate may have a high variance, i.e. may not be very accurate.

What we could do instead is to take the mean weight of all the people in our sample whose heights are near 68.2, say between 67.7 and 68.7. That would bias things a bit, but we'd get a lower variance. All nonparametric regression methods work like this, though with many variations.

As our definition of "near," we could take all people in our sample whose heights are within h amount of 68.2. This should remind you of our density estimators in Section 4.6 of our unit on estimation and testing.

As we saw there, a generalization would be to use a kernel method. For instance, for univariate X and t:

$$\hat{m}_{Y;X}(t) = \frac{\sum_{i=1}^{n} Y_i \, k\left(\frac{t - X_i}{h}\right)}{\sum_{i=1}^{n} k\left(\frac{t - X_i}{h}\right)} \tag{6.48}$$

[9] See http://www.mathworks.com/index.html?s_cid=docframe_homepage.
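Formula (6.48) translates almost directly into code. The following sketch is a hand-written helper, here called kernreg(), which is a hypothetical name and not a function from any package; the standard normal density as the default kernel and the simulated height/weight data are likewise assumptions for illustration.

```r
# Kernel regression estimate (6.48) at a single point t.
# k is the kernel; the standard normal density is one common choice (an assumption here).
kernreg <- function(t, x, y, h, k = dnorm) {
  w <- k((t - x) / h)                 # weight for each observation
  sum(y * w) / sum(w)                 # weighted average of the Y_i
}

set.seed(5)
height <- rnorm(500, mean = 68, sd = 3)             # hypothetical heights (inches)
weight <- -180 + 5*height + rnorm(500, sd = 15)     # hypothetical weights (pounds)

est <- kernreg(68.2, height, weight, h = 0.5)
print(est)

# With a 0/1 "window" kernel, (6.48) is just the mean weight of those within h of 68.2
window <- function(u) as.numeric(abs(u) <= 1)
print(kernreg(68.2, height, weight, h = 0.5, k = window))
```

The second call reproduces the "take everyone between 67.7 and 68.7" estimate described earlier; the normal kernel instead down-weights people smoothly as their heights move away from 68.2.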


There is an R package that includes a function nkreg() to do this. The R base has a similar method, called LOESS. Note: That is the method name, but the R function is called lowess(). (Base R also provides a separate function named loess().)

Other types of nonparametric methods include Classification and Regression Trees (CART), nearest neighbor methods, support vector machines, splines etc.

6.3.12 Regression Diagnostics

Researchers in regression analysis have devised some diagnostic methods, meaning methods to check the fit of a model, the validity of assumptions [e.g. (6.27)], search for data points that may have an undue influence (and may actually be in error), and so on.

The R package has tons of diagnostic methods. See for example Chapter 4 of Linear Models with R, Julian Faraway, Chapman and Hall, 2005.

6.3.13 Nominal Variables

Recall our example in Section 6.2 concerning a study of software engineer productivity. To review, the authors of the study predicted Y = number of person-months needed to complete the project, from $X^{(1)}$ = size of the project as measured in lines of code, $X^{(2)}$ = 1 or 0 depending on whether an object-oriented or procedural approach was used, and other variables.

As mentioned at the time, $X^{(2)}$ is called an indicator variable. Let's generalize that a bit. Suppose we are comparing two different object-oriented languages, C++ and Java, as well as the procedural language C. Then we could change the definition of $X^{(2)}$ to have the value 1 for C++ and 0 for non-C++, and we could add another variable, $X^{(3)}$, which has the value 1 for Java and 0 for non-Java. Use of the C language would be implied by the situation $X^{(2)} = X^{(3)} = 0$.

Here we are dealing with a nominal variable, Language, which has three values, C++, Java and C, and representing it by the two indicator variables $X^{(2)}$ and $X^{(3)}$. Note that we do NOT want to represent Language by a single variable having the values 0, 1 and 2, which would imply that C has, for instance, double the impact of Java.

You can see that if a nominal variable takes on q values, we need q-1 indicator variables to represent it. We say that the variable has q levels.

6.3.14 The Case in Which All Predictors Are Nominal Variables: "Analysis of Variance"

Continuing the ideas in Section 6.3.13, suppose in the software engineering study they had kept the project size constant, and instead of $X^{(1)}$ being project size, this variable recorded whether the programmer uses an integrated development environment (IDE). Say $X^{(1)}$ is 1 or 0, depending on whether the programmer uses the Eclipse IDE or no IDE, respectively. Continue to assume the study included the nominal Language variable, i.e. assume the study included the indicator variables $X^{(2)}$ (C++) and $X^{(3)}$ (Java). Now all of our predictors would be nominal/indicator variables. Regression analysis in such settings is called analysis of variance (ANOVA).
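In R, a nominal variable is normally stored as a factor, and the q-1 indicator variables are generated automatically when a model is fit. The sketch below shows that coding for a made-up Language variable; the six values are invented for illustration, and the column names are those that model.matrix() produces under R's default treatment contrasts.

```r
# Hypothetical Language values for six projects
language <- factor(c("C", "C++", "Java", "C", "Java", "C++"),
                   levels = c("C", "C++", "Java"))   # C listed first, so it is the baseline

# model.matrix() builds the q-1 = 2 indicator columns (plus a column of 1s)
X <- model.matrix(~ language)
print(X)
```

Rows for C have 0 in both indicator columns, matching the convention above that C is implied when both indicators are 0.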


Each nominal variable is called a factor. So, in our software engineering example, the factors are IDE and Language. Note again that in terms of the actual predictor variables, each factor is represented by one or more indicator variables; here IDE has one indicator variable and Language has two.

Analysis of variance is a classic statistical procedure, used heavily in agriculture, for example. We will not go into details here, but mention it briefly both for the sake of completeness and for its relevance to Sections 6.3.3 and 6.6. (The reader is strongly advised to review Section 6.3.3 before continuing.)

6.3.14.1 It's a Regression!

The term analysis of variance is a misnomer. A more appropriate name would be analysis of means, as it is in fact a regression analysis, as follows.

First, note in our software engineering example we basically are talking about six groups, because there are six different combinations of values for the triple $(X^{(1)}, X^{(2)}, X^{(3)})$. For instance, the triple (1,0,1) means that the programmer is using an IDE and programming in Java. Note that triples of the form (w,1,1) are impossible.

So, all that is happening here is that we have six groups with six means. But that is a regression! Remember, for variables U and V, $m_{V;U}(t)$ is the mean of all values of V in the subpopulation group of people (or cars or whatever) defined by U = t. If U is a continuous variable, then we have infinitely many such groups, thus infinitely many means. In our software engineering example, we only have six groups, but the principle is the same. We can thus cast the problem in regression terms:

$$m_{Y;X}(i,j,k) = E(Y | X^{(1)} = i, X^{(2)} = j, X^{(3)} = k), \quad i,j,k = 0,1, \quad j + k \leq 1 \tag{6.49}$$

Note the restriction $j + k \leq 1$, which reflects the fact that j and k can't both be 1.

Again, keep in mind that we are working with means. For instance, $m_{Y;X}(0,1,0)$ is the population mean project completion time for the programmers who do not use Eclipse and who program in C++.

Since the triple (i,j,k) can take on only six values, m can be modeled fully generally in the following six-parameter linear form:

$$m_{Y;X}(i,j,k) = \beta_0 + \beta_1 i + \beta_2 j + \beta_3 k + \beta_4 ij + \beta_5 ik \tag{6.50}$$

where $\beta_4$ and $\beta_5$ are the coefficients of two interaction terms, as in Section 6.3.3.
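That the six-parameter form (6.50) is fully general can be seen numerically: fitted by least squares, it reproduces the six sample group means exactly. The sketch below uses invented data; the coefficients, sample size and variable names are assumptions for illustration.

```r
set.seed(6)
# Hypothetical data: ide = X^(1); cpp, java = X^(2), X^(3); never both 1
lang <- sample(c("C", "C++", "Java"), 300, replace = TRUE)
ide  <- rbinom(300, 1, 0.5)
cpp  <- as.numeric(lang == "C++")
java <- as.numeric(lang == "Java")
y <- 20 - 3*ide + 2*cpp + 4*java - 1.5*ide*cpp + rnorm(300)  # made-up person-months

# The six-parameter model (6.50): main effects plus the ij and ik interactions
fit <- lm(y ~ ide + cpp + java + ide:cpp + ide:java)

# Its fitted values coincide with the six sample group means
group_means <- ave(y, ide, lang)      # mean of y within each (ide, lang) cell
print(coef(fit))
print(max(abs(fitted(fit) - group_means)))
```

The second printed value is zero up to rounding error, so with both interaction terms present the regression is just another way of writing down the six group means.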

6.3.14.2 Interaction Terms

It is crucial to understand the interaction terms. Without the ij and ik terms, for instance, our model would be

$$m_{Y;X}(i,j,k) = \beta_0 + \beta_1 i + \beta_2 j + \beta_3 k \tag{6.51}$$


which would mean (as in Section 6.3.3) that the difference between using Eclipse and no IDE is the same for all three programming languages, C++, Java and C. That common difference would be $\beta_1$. If this condition (the impact of using an IDE is the same across languages) doesn't hold, at least approximately, then we would use the full model, (6.50). More on this below.

Note carefully that there is no interaction term corresponding to jk, since that quantity is 0, and thus there is no three-way interaction term corresponding to ijk either.

But suppose we add a third factor, Education, represented by the indicator $X^{(4)}$, having the value 1 if the programmer has at least a Master's degree, 0 otherwise. Then m would take on 12 values, and the full model would have 12 parameters:

$$m_{Y;X}(i,j,k,l) = \beta_0 + \beta_1 i + \beta_2 j + \beta_3 k + \beta_4 l + \beta_5 ij + \beta_6 ik + \beta_7 il + \beta_8 jl + \beta_9 kl + \beta_{10} ijl + \beta_{11} ikl \tag{6.52}$$

Again, there would be no ijkl term, as jk = 0.

Here $\beta_1$, $\beta_2$, $\beta_3$ and $\beta_4$ are called the main effects, as opposed to the coefficients of the interaction terms, called of course the interaction effects.

The no-interaction version would be

$$m_{Y;X}(i,j,k,l) = \beta_0 + \beta_1 i + \beta_2 j + \beta_3 k + \beta_4 l \tag{6.53}$$

6.3.14.3 Now Consider Parsimony

In the three-factor example above, we have 12 groups and 12 means. Why not just treat it that way, instead of applying the powerful tool of regression analysis? The answer lies in our desire for parsimony, as noted in Section 6.3.9.1.

If for example (6.53) were to hold, at least approximately, we would have a far more satisfying model. We could for instance then talk of "the" effect of using an IDE, rather than qualifying such a statement by stating what the effect would be for each different language and education level. Moreover, if our sample size is not very large, we would get more accurate estimates of the various subpopulation means.

Or it could be that, while (6.53) doesn't hold, a model with only two-way interactions,

$$m_{Y;X}(i,j,k,l) = \beta_0 + \beta_1 i + \beta_2 j + \beta_3 k + \beta_4 l + \beta_5 ij + \beta_6 ik + \beta_7 il + \beta_8 jl + \beta_9 kl \tag{6.54}$$

does work well. This would not be as nice as (6.53), but it still would be more parsimonious than (6.52).

Accordingly, the major thrust of ANOVA is to decide how rich a model is needed to do a good job of describing the situation under study. There is an implied hierarchy of models of interest here:

• the full model, including two- and three-way interactions, (6.52)

• the model with two-factor interactions only, (6.54)


• the no-interaction model, (6.53)

Traditionally these are determined via hypothesis testing, which involves certain partitionings of sums of squares similar to (6.18). (This is where the name analysis of variance stems from.) The null distribution of the test statistic often turns out to be an F-distribution. Of course, in this book, we consider hypothesis testing inappropriate, preferring to give some careful thought to the estimated parameters, but it is standard. Further testing can be done on individual coefficients, $\beta_1$ and so on. Often people use simultaneous inference procedures, discussed briefly in Section 4.2.16 of our unit on estimation and testing, since many tests are performed.

6.3.14.4 Reparameterization

Classical ANOVA uses a somewhat different parameterization than that we've considered here. For instance, consider a single-factor setting (called one-way ANOVA) with three levels. Our predictors are then $X^{(1)}$ and $X^{(2)}$. Taking our approach here, we would write

$$m_{Y;X}(i,j) = \beta_0 + \beta_1 i + \beta_2 j \tag{6.55}$$

The traditional formulation would be

$$\mu_i = \mu + \alpha_i, \quad i = 1,2,3 \tag{6.56}$$

where

$$\mu = \frac{\mu_1 + \mu_2 + \mu_3}{3} \tag{6.57}$$

and

$$\alpha_i = \mu_i - \mu \tag{6.58}$$

Of course, the two formulations are equivalent. It is left to the reader to check that, for instance,

$$\mu = \beta_0 + \frac{\beta_1 + \beta_2}{3} \tag{6.59}$$

There are similar formulations for ANOVA designs with more than one factor.

Note that the classical formulation overparameterizes the problem. In the one-way example above, for instance, there are four parameters ($\mu$, $\alpha_1$, $\alpha_2$, $\alpha_3$) but only three groups. This would make the system indeterminate, but we add the constraint

$$\sum_{i=1}^{3} \alpha_i = 0 \tag{6.60}$$

Equation (6.24) then must make use of generalized matrix inverses.
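The correspondence between the two parameterizations in the one-way, three-level case can be written out in a few lines. This sketch assumes a particular labeling of the levels, which the text does not fix: level 1 is the baseline $(i,j) = (0,0)$, level 2 is $(1,0)$ and level 3 is $(0,1)$.

```latex
\begin{align*}
\mu_1 &= m_{Y;X}(0,0) = \beta_0, \qquad
\mu_2 = m_{Y;X}(1,0) = \beta_0 + \beta_1, \qquad
\mu_3 = m_{Y;X}(0,1) = \beta_0 + \beta_2 \\
\mu &= \frac{\mu_1 + \mu_2 + \mu_3}{3} = \beta_0 + \frac{\beta_1 + \beta_2}{3} \\
\alpha_1 &= \mu_1 - \mu = -\frac{\beta_1 + \beta_2}{3}, \qquad
\alpha_2 = \mu_2 - \mu = \frac{2\beta_1 - \beta_2}{3}, \qquad
\alpha_3 = \mu_3 - \mu = \frac{2\beta_2 - \beta_1}{3}
\end{align*}
```

Adding the three $\alpha_i$ gives 0, so the sum-to-zero constraint on the $\alpha_i$ holds automatically under this labeling.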


6.4 The Classification Problem

As mentioned earlier, in the special case in which Y is an indicator variable, with the value 1 if the object is in a class and 0 if not, the regression problem is called the classification problem. It is also sometimes called pattern recognition, in which case the predictors are called features. Also, the term machine learning usually refers to classification problems.

If there are c classes, we need c (or c-1) Y variables, which I will denote by $Y^{(i)}$, i = 1,...,c.

6.4.1 Meaning of the Regression Function

6.4.1.1 The Mean Here Is a Probability

Now, here is a key point: Since the mean of any indicator random variable is the probability that the variable is equal to 1, the regression function in classification problems reduces to

$$m_{Y;X}(t) = P(Y = 1 | X = t) \tag{6.61}$$

(Remember that X and t are vector-valued.)

For concreteness, let's look at the patent example in Section 6.1. Again, Y will be 1 or 0, depending on whether the patent had public funding. We'll take $X^{(1)}$ to be an indicator variable for the presence or absence of "NSF" in the patent, $X^{(2)}$ to be an indicator variable for "NIH," and take $X^{(3)}$ to be the number of claims in the patent. This last predictor might be relevant, e.g. if industrial patents are lengthier.

So, $m_{Y;X}[(1,0,5)]$ would be the population proportion of all patents that are publicly funded, among those that contain the word "NSF," do not contain "NIH," and make five claims.

6.4.1.2 Optimality of the Regression Function

Again, our context is that we want to guess Y, knowing X. Since Y is 0-1 valued, our guess for Y based on X, g(X), should be 0-1 valued too. What is the best g?

Again, since Y and g are 0-1 valued, our criterion should be what I will call Probability of Correct Classification (PCC):

$$PCC = P[Y = g(X)] \tag{6.62}$$

Now proceed as in (6.13):

$$PCC = E[P\{Y = g(X) | X\}] \tag{6.63}$$

The analog of Lemma 9 is


Lemma 10 Suppose W takes on values in the set A = {0,1}, and consider the problem of maximizing

$$P(W = c), \quad c \in A \tag{6.64}$$

The solution is

$$c = \begin{cases} 1, & \text{if } P(W = 1) > 0.5 \\ 0, & \text{otherwise} \end{cases} \tag{6.65}$$

Proof

Again recalling that c is either 1 or 0, we have

$$P(W = c) = P(W = 1) c + [1 - P(W = 1)](1 - c) \tag{6.66}$$

$$= [2P(W = 1) - 1] c + 1 - P(W = 1) \tag{6.67}$$

The result follows.

Applying this to (6.63), we see that the best g is given by

$$g(t) = \begin{cases} 1, & \text{if } m_{Y;X}(t) > 0.5 \\ 0, & \text{otherwise} \end{cases} \tag{6.68}$$

So we find that the regression function is again optimal, in this new context.

6.4.2 Parametric Models for the Regression Function in Classification Problems

Remember, we often try a parametric model for our regression function first, as it means we are estimating a finite number of quantities, instead of an infinite number.

6.4.2.1 The Logistic Model: Form

The most common parametric model in the classification problem is the logistic model (often called the logit model), seen in Section 6.3.10. In its r-predictor form, it is

$$m_{Y;X}(t) = P(Y = 1 | X = t) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 t_1 + ... + \beta_r t_r)}} \tag{6.69}$$


For instance, consider the patent example. Under the logistic model, the population proportion of all patents that are publicly funded, among those that contain the word "NSF," do not contain "NIH," and make five claims would have the value

$$\frac{1}{1 + e^{-(\beta_0 + \beta_1 + 5\beta_3)}} \tag{6.70}$$

6.4.2.2 The Logistic Model: Intuitive Motivation

The logistic function itself,

$$\frac{1}{1 + e^{-u}} \tag{6.71}$$

has values between 0 and 1, and is thus a candidate for modeling a probability. Also, it is monotonic in u, making it further attractive, as in many classification problems we believe that $m_{Y;X}(t)$ should be monotonic in the predictor variables.

6.4.2.3 The Logistic Model: Theoretical Foundation

But there are much stronger reasons to use the logit model, as it includes many common parametric models for X. To see this, note that we can write, for vector-valued discrete X and t,

$$P(Y = 1 | X = t) = \frac{P(Y = 1 \text{ and } X = t)}{P(X = t)} \tag{6.72}$$

$$= \frac{P(Y = 1) P(X = t | Y = 1)}{P(X = t)} \tag{6.73}$$

$$= \frac{P(Y = 1) P(X = t | Y = 1)}{P(Y = 1) P(X = t | Y = 1) + P(Y = 0) P(X = t | Y = 0)} \tag{6.74}$$

$$= \frac{1}{1 + \frac{(1 - q) P(X = t | Y = 0)}{q P(X = t | Y = 1)}} \tag{6.75}$$

where q = P(Y = 1) is the proportion of members of the population which have Y = 1. (Keep in mind that this probability is unconditional!!!! In the patent example, for instance, if say q = 0.12, then 12% of all patents in the patent population, without regard to words used, numbers of claims, etc., are publicly funded.)

If X is a continuous random vector, then the analog of (6.75) is

$$P(Y = 1 | X = t) = \frac{1}{1 + \frac{(1 - q) f_{X|Y=0}(t)}{q f_{X|Y=1}(t)}} \tag{6.76}$$


Now suppose X, given Y, has a normal distribution. In other words, within each class, X is normally distributed. Consider the case of just one predictor variable, i.e. r = 1. Suppose that given Y = i, X has the distribution $N(\mu_i, \sigma^2)$, i = 0,1. Then

$$f_{X|Y=i}(t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -0.5 \left( \frac{t - \mu_i}{\sigma} \right)^2 \right] \tag{6.77}$$

After doing some elementary but rather tedious algebra, (6.76) reduces to the logistic form

$$\frac{1}{1 + e^{-(\beta_0 + \beta_1 t)}} \tag{6.78}$$

where $\beta_0$ and $\beta_1$ are functions of $\mu_0$, $\mu_1$ and $\sigma$ (and, in the case of $\beta_0$, of q as well).

In other words, if X is normally distributed in both classes, with the same variance but different means, then $m_{Y;X}$ has the logistic form! And the same is true if X is multivariate normal in each class, with different mean vectors but equal covariance matrices. (The algebra is even more tedious here, but it does work out.)

So, not only does the logistic model have an intuitively appealing form, it is also implied by one of the most famous distributions X can have within each class: the multivariate normal.

If you reread the derivation above, you will see that the logit model will hold for any within-class distributions for which

$$\ln \frac{f_{X|Y=0}(t)}{f_{X|Y=1}(t)} \tag{6.79}$$

(or its discrete analog) is linear in t. Well guess what: this condition is true for exponential distributions too! Work it out for yourself.

In fact, a number of famous distributions imply the logit model.

6.4.3 Nonparametric Estimation of Regression Functions for Classification (advanced topic)

6.4.3.1 Use the Kernel Method, CART, Etc.

Since the classification problem is a special case of the general regression problem, nonparametric regression methods can be used here too.

6.4.3.2 SVMs

There are also some methods which have been developed exclusively, or mainly, for classification. One of them which has been getting a lot of publicity in computer science circles is support vector machines (SVMs). To explain the SVM concept, consider the case r = 2, i.e. two predictor variables $X^{(1)}$ and $X^{(2)}$.


6.4.THECLASSIFICATIONPROBLEM 185 WhatanSVMwoulddoisuseoursampledatatodrawacurveinthe X X plane,withourclassication rulethenbeing,GuessYtobe1ifXisononesideofthecurve,andguessittobe0ifXisontheother side. DON'TBUYSNAKEOIL! Therearenomagicsolutionstostatisticalproblems.SVMsdoverywell insomesituations,notsowellinothers.Ihighlyrecommendthesite www.dtreg.com/benchmarks. htm ,whichcomparessixdifferenttypesofclassicationfunctionestimatorsincludinglogisticregression andSVMonseveraldozenrealdatasets.Theoverallpercentmisclassicationrates,averagedoverallthe datasets,wasfairlyclose,rangingfromahighof25.3%toalowof19.2%.Themuch-vauntedSVMcame inat20.3%.That'snice,butitwasonlyatadbetterthanlogit's20.9%.Consideringthatthelatterhasa bigadvantageinthatonegetsanactualequationfortheclassicationfunction,completewithparameters whichwecanestimateandmakecondenceintervalsfor,itisnotclearjustwhatroleSVMandtheother nonparametricestimatorsshouldplay,ingeneral,thoughinspecicapplicationstheymaybeappropriate. 6.4.4VariableSelectioninClassicationProblems 6.4.4.1ProblemsInheritedfromtheRegressionContext InSection6.3.9.2,itwaspointedoutthattheproblemofpredictorvariableselectioninregressionisunsolved.Sincetheclassicationproblemisaspecialcaseofregression,thereisnosurerewaytoselect predictorvariablesthereeither. 6.4.4.2Example:ForestCoverData Andagain,usinghypothesistestingtochoosepredictorsisnottheanswer.Toillustratethis,let'slookagain attheforestcoverdatawesawinSection4.2.12. Thereweresevenclassesofforestcoverthere.Let'srestrictattentiontoclasses1and2.InmyRanalysisI hadtheclass1and2datainobjects cov1 and cov2 ,respectively.Icombinedthem, >cov1and2<-rbindcov1,cov2 andcreatedanewvariabletoserveasY: cov1and2[,56]<-ifelsecov1and2[,55]==1,1,0 Let'sseehowwellwecanpredictasite'sclassfromthevariableHS12hillsideshadeatnoonthatwe investigatedinthatpastunit,usingalogisticmodel. 
In R we fit logistic models via the glm() function, for generalized linear models. The word generalized here refers to models in which some function of $m_{Y;X}(t)$ is linear in parameters $\beta_i$. For the classification model,

\[ \ln \left( \frac{m_{Y;X}(t)}{1 - m_{Y;X}(t)} \right) = \beta_0 + \beta_1 t^{(1)} + \cdots + \beta_r t^{(r)} \tag{6.80} \]

This kind of generalized linear model is specified in R by setting the named argument family to binomial. Here is the call:


> g <- glm(cov1and2[,56] ~ cov1and2[,8],family=binomial)

The result was:

> summary(g)

Call:
glm(formula = cov1and2[, 56] ~ cov1and2[, 8], family = binomial)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.165  -0.820  -0.775   1.504   1.741

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)    1.515820   1.148665   1.320   0.1870
cov1and2[, 8] -0.010960   0.005103  -2.148   0.0317 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 959.72  on 810  degrees of freedom
Residual deviance: 955.14  on 809  degrees of freedom
AIC: 959.14

Number of Fisher Scoring iterations: 4

So, $\hat{\beta}_1 = -0.01$. This is tiny, as can be seen from our data in the last unit. There we found that the estimated mean values of HS12 for cover types 1 and 2 were 223.8 and 226.3, a difference of only 2.5. That difference in essence gets multiplied by 0.01. More concretely, in (6.46), plug in our estimates 1.52 and -0.01 from our R output above, first taking $t$ to be 223.8 and then 226.3. The results are 0.328 and 0.322, respectively. In other words, HS12 isn't having much effect on the probability of cover type 1, and so it cannot be a good predictor of cover type.

Yet the R output says that $\beta_1$ is "significantly" different from 0, with a p-value of 0.03. Thus, we see once again that hypothesis testing does not achieve our goal. Again, cross validation is a better method for choosing predictors.

6.4.5 Y Must Have a Marginal Distribution!

In our material here, we have tacitly assumed that the vector $(Y, X)$ has a distribution. That may seem like an odd and puzzling remark to make here, but it is absolutely crucial. Let's see what it means.

Consider the study on object-oriented programming in Section 6.1, but turned around. (This example will be somewhat contrived, but it will illustrate the principle.) Suppose we know how many lines of code are in a project, which we will still call $X^{(1)}$, and we know how long it took to complete, which we will now take as $X^{(2)}$, and from this we want to guess whether object-oriented or procedural programming was used (without being able to look at the code, of course), which is now our new $Y$.
Here is our huge problem: Given our sample data, there is no way to estimate $q$ in (6.75). That's because the authors of the study simply took two groups of programmers and had one group use object-oriented programming and had the other group use procedural programming. If we had sampled programmers at random from actual projects done at this company, that would enable us to estimate $q$, the population proportion of projects done with OOP. But we can't do that with the data that we do have. Indeed, in this setting, it may not even make sense to speak of $q$ in the first place.

Mathematically speaking, if you think about the process under which the data was collected in this study, there does exist some conditional distribution of $X$ given $Y$, but $Y$ itself has no distribution. So, we can NOT estimate $P(Y = 1 | X)$. About the best we can do is try to guess $Y$ on the basis of whichever value of $i$ makes $f_{X|Y=i}(X)$ larger.

6.5 Principal Components Analysis

6.5.1 Dimension Reduction and the Principle of Parsimony

Consider a random vector $X = (X_1, X_2)^T$. Suppose the two components of $X$ are highly correlated with each other. Then for some constants $c$ and $d$,

\[ X_2 \approx c + d X_1 \tag{6.81} \]

Then in a sense there is really just one random variable here, as the second is nearly equal to some linear combination of the first. The second provides us with almost no new information, once we have the first. In other words, even though the vector $X$ roams in two-dimensional space, it usually sticks close to a one-dimensional object, namely the line (6.81). We saw a graph illustrating this in our unit on multivariate distributions, page 84.

In general, consider a k-component random vector

\[ X = (X_1, \ldots, X_k)^T \tag{6.82} \]

We again wish to investigate whether just a few, say $w$, of the $X_i$ tell almost the whole story, i.e. whether most $X_j$ can be expressed approximately as linear combinations of these few $X_i$. In other words, even though $X$ is k-dimensional, it tends to stick close to some w-dimensional subspace.

Note that although (6.81) is phrased in prediction terms, we are not (or more accurately, not necessarily) interested in prediction here. We have not designated one of the $X_i$ to be a response variable and the rest to be predictors.

Once again, the Principle of Parsimony is key. If we have, say, 20 or 30 variables, it would be nice if we could reduce that to, for example, three or four. This may be easier to understand and work with, albeit with the complication that our new variables would be linear combinations of the old ones.
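The near-redundancy described by (6.81) can be seen in a small simulation. The following is only a sketch: the constants c0 and d0, the noise level and the sample size are illustrative assumptions, not values from the text.

```r
# Simulate a pair (X1, X2) in which X2 is almost a linear function of X1.
# c0, d0, the noise sd and n are hypothetical choices for illustration.
set.seed(1)
n  <- 1000
c0 <- 1
d0 <- 2
x1 <- rnorm(n)
x2 <- c0 + d0 * x1 + rnorm(n, sd = 0.1)

# Sample correlation between the two components
r12 <- cor(x1, x2)

# Share of the variance of X2 left unexplained by the line c + d * X1
resid_share <- var(resid(lm(x2 ~ x1))) / var(x2)

print(r12)
print(resid_share)
```

With a correlation this close to 1, almost none of the variation in the second component is left once the first is known, which is the sense in which there is "really just one random variable here."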


6.5.2 How to Calculate Them

Here's how it works. Let $\Sigma$ denote the covariance matrix of $X$. The theory of linear algebra says that since $\Sigma$ is a symmetric matrix, it is diagonalizable, i.e. there is a real matrix $Q$ for which

\[ Q^T \Sigma Q = D \tag{6.83} \]

where $D$ is a diagonal matrix. (This is a special case of singular value decomposition.) The columns $C_i$ of $Q$ are the eigenvectors of $\Sigma$, and it turns out that they are orthogonal to each other, i.e. their dot product is 0.

Let

\[ W_i = C_i^T X, \quad i = 1, \ldots, k \tag{6.84} \]

so that the $W_i$ are scalar random variables, and set

\[ W = (W_1, \ldots, W_k)^T \tag{6.85} \]

Then

\[ W = Q^T X \tag{6.86} \]

Now, use the material on covariance matrices from our unit on multivariate analysis, page 75,

\[ Cov(W) = Cov(Q^T X) = Q^T Cov(X) Q = D \quad \text{(from (6.83))} \tag{6.87} \]

Note too that if $X$ has a multivariate normal distribution (which we are not assuming), then $W$ does too.

Let's recap:

• We have created new random variables $W_i$ as linear combinations of our original $X_j$.

• The $W_i$ are uncorrelated. Thus if in addition $X$ has a multivariate normal distribution, so that $W$ does too, then the $W_i$ will be independent.

• The variance of $W_i$ is given by the i-th diagonal element of $D$.

The $W_i$ are called the principal components of the distribution of $X$.

It is customary to relabel the $W_i$ so that $W_1$ has the largest variance, $W_2$ has the second-largest, and so on. We then choose those $W_i$ that have the larger variances, and discard the others, because the latter, having small variances, are close to constant and thus carry almost no information.

All this will become clearer in the example below.
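The diagonalization in (6.83) can be checked numerically with R's eigen() function. This is a minimal sketch, assuming a small made-up covariance matrix Sigma (its entries are hypothetical and chosen only so that the matrix is symmetric and positive definite).

```r
# A hypothetical 3 x 3 covariance matrix (symmetric, positive definite)
Sigma <- matrix(c(4,   2, 0.5,
                  2,   3, 1,
                  0.5, 1, 2), nrow = 3, byrow = TRUE)

# Columns of Q are the eigenvectors of Sigma; eigen() returns the
# eigenvalues of a symmetric matrix in decreasing order
e <- eigen(Sigma, symmetric = TRUE)
Q <- e$vectors

# Q^T Sigma Q should be diagonal, as in (6.83)
D <- t(Q) %*% Sigma %*% Q
offdiag <- D - diag(diag(D))

print(round(D, 6))
print(e$values)   # variances of W_1, W_2, W_3, largest first
```

The diagonal of D matches the eigenvalues, which by (6.87) are the variances of the principal components, already ordered from largest to smallest.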


6.5.3 Example: Forest Cover Data

Let's try using principal component analysis on the forest cover data set we've looked at before. There are 10 continuous variables (also many discrete ones, but there is another tool for that case, the log-linear model, discussed in Section 6.6).

In my R run, the data set (not restricted to just two forest cover types, but consisting only of the first 1000 observations) was in the object f. Here are the call and the results:

> prc <- prcomp(f[,1:10])
> summary(prc)
Importance of components:
                            PC1      PC2      PC3      PC4      PC5      PC6
Standard deviation     1812.394 1613.287 1.89e+02 1.10e+02 96.93455 30.16789
Proportion of Variance    0.552    0.438 6.01e-03 2.04e-03  0.00158  0.00015
Cumulative Proportion     0.552    0.990 9.96e-01 9.98e-01  0.99968  0.99984
                            PC7      PC8  PC9  PC10
Standard deviation     25.95478 16.78595  4.2 0.783
Proportion of Variance  0.00011  0.00005  0.0 0.000
Cumulative Proportion   0.99995  1.00000  1.0 1.000

You can see from the variance values here that R reports the variance of each $W_i$ as a proportion of the total, so that the values sum to 1.0. It has not done so for the standard deviations, which are for the nonscaled variables. This is fine, as we are only interested in the variances relative to each other, i.e. saving the principal components with the larger variances.

What we see here is that eight of the 10 principal components have very small variances, i.e. are close to constant. In other words, though we have 10 variables $X_1, \ldots, X_{10}$, there is really only two variables' worth of information carried in them.

So for example if we wish to predict forest cover type from these 10 variables, we should only use two of them. We could use $W_1$ and $W_2$, but for the sake of interpretability we stick to the original $X$ vector; we can use any two of the $X_i$.

The coefficients of the linear combinations which produce $W$ from $X$, i.e. the $Q$ matrix, are available via prc$rotation.

6.6 Log-Linear Models

Here we discuss a procedure which is something of an analog of principal components for discrete variables. Our material on ANOVA will also come into play. It is recommended that the reader review Sections 6.3.14 and 6.5 before continuing.
6.6.1 The Setting

Let's consider a variation on the software engineering example in Sections 6.2 and 6.3.14. Assume we have the factors IDE, Language and Education. Our change, of extreme importance, is that we will now assume that these factors are RANDOM. What does this mean?


In the original example described in Section 6.2, programmers were assigned to languages, and in our extensions of that example, we continued to assume this. Thus for example the number of programmers who use an IDE and program in Java was fixed; if we repeated the experiment, that number would stay the same. If we were sampling from some programmer population, our new sample would have new programmers, but the number using an IDE and Java would be the same as before, as our study procedure specifies this.

By contrast, let's now assume that we simply sample programmers at random, and ask them whether they prefer to use an IDE or not, and which language they prefer.¹⁰ Then for example the number of programmers who prefer to use an IDE and program in Java will be random, not fixed; if we repeat the experiment, we will get a different count.

Suppose we now wish to investigate relations between the factors. Are choice of platform and language related to education, for instance?

6.6.2 The Data

Denote our three factors by $X^{(s)}$, $s = 1, 2, 3$. Here $X^{(1)}$, IDE, will take on the values 1 and 2 instead of 1 and 0 as before, 1 meaning that the programmer prefers to use an IDE, and 2 meaning not so. $X^{(3)}$, Education, changes this way too, and $X^{(2)}$, Language, will take on the values 1 for C++, 2 for Java and 3 for C. Note that we no longer use indicator variables.

Let $X_r^{(s)}$ denote the value of $X^{(s)}$ for the r-th programmer in our sample, $r = 1, 2, \ldots, n$. Our data are the counts

\[ N_{ijk} = \text{number of } r \text{ such that } X_r^{(1)} = i,\ X_r^{(2)} = j \text{ and } X_r^{(3)} = k \tag{6.88} \]

For instance, if we sample 100 programmers, our data might look like this:

prefers to use IDE:
         Bachelor's or less   Master's or more
C++              18                  15
Java             22                  10
C                 6                   4

prefers not to use IDE:
         Bachelor's or less   Master's or more
C++               7                   4
Java              6                   2
C                 3                   3

So for example $N_{122} = 10$ and $N_{212} = 4$.

Here we have a three-dimensional contingency table. Each $N_{ijk}$ value is a cell in the table.

¹⁰ Other sampling schemes are possible too.
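One way to hold such a table in R is a three-dimensional array indexed as N[i, j, k]. The sketch below enters the example counts above; the dimension labels are illustrative choices, not names used in the text.

```r
# Counts N_ijk with i = IDE (1 = prefers, 2 = not), j = Language
# (1 = C++, 2 = Java, 3 = C), k = Education (1 = Bachelor's or less,
# 2 = Master's or more). R fills arrays with the first index varying fastest.
N <- array(c(18, 7, 22, 6, 6, 3,    # k = 1: (IDE yes, no) for C++, Java, C
             15, 4, 10, 2, 4, 3),   # k = 2
           dim = c(2, 3, 2),
           dimnames = list(IDE = c("yes", "no"),
                           Language = c("C++", "Java", "C"),
                           Education = c("BachOrLess", "MastOrMore")))

n <- sum(N)          # sample size
phat <- N / n        # sample proportions for each cell

print(N[1, 2, 2])    # the cell N_122
print(N[2, 1, 2])    # the cell N_212
print(apply(N, 3, sum))   # marginal counts for Education
```

Storing the data this way keeps the subscripts in the code aligned with the subscripts in (6.88), and marginal tables are available through apply().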


6.6.3 The Models

Let $p_{ijk}$ be the population probability of a randomly-chosen programmer falling into cell $ijk$, i.e.

\[ p_{ijk} = P\left( X^{(1)} = i \text{ and } X^{(2)} = j \text{ and } X^{(3)} = k \right) = E(N_{ijk}) / n \tag{6.89} \]

As mentioned, we are interested in relations between the factors, in the form of independence, full and partial. Consider first the case of full independence:

\[ p_{ijk} = P\left( X^{(1)} = i \text{ and } X^{(2)} = j \text{ and } X^{(3)} = k \right) \tag{6.90} \]

\[ = P\left( X^{(1)} = i \right) \cdot P\left( X^{(2)} = j \right) \cdot P\left( X^{(3)} = k \right) \tag{6.91} \]

Taking logs of both sides in (6.91), we see that independence of the three factors is equivalent to saying

\[ \log(p_{ijk}) = a_i + b_j + c_k \tag{6.92} \]

for some numbers $a_i$, $b_j$ and $c_k$. The numbers must be nonpositive, and since

\[ \sum_m P\left( X^{(s)} = m \right) = 1 \tag{6.93} \]

we must have, for instance,

\[ \sum_{g=1}^{2} \exp(c_g) = 1 \tag{6.94} \]

The point is that (6.92) looks like our no-interaction ANOVA models, e.g. (6.51). On the other hand, if we assume instead that Education is independent of IDE and Language but that IDE and Language are not independent of each other, our model would be

\[ \log(p_{ijk}) = \log\left[ P\left( X^{(1)} = i \text{ and } X^{(2)} = j \right) \cdot P\left( X^{(3)} = k \right) \right] \tag{6.95} \]

\[ = a_i + b_j + d_{ij} + c_k \tag{6.96} \]

Here we have written $\log P\left( X^{(1)} = i \text{ and } X^{(2)} = j \right)$ as a sum of "main effects" $a_i$ and $b_j$, and "interaction effects," $d_{ij}$, analogous to ANOVA.

Another possible model would have IDE and Language conditionally independent, given Education, meaning that at any level of education, a programmer's preference to use IDE or not, and his choice of programming language, are not related. We'd write the model this way:


\[ \log(p_{ijk}) = \log\left[ P\left( X^{(1)} = i \text{ and } X^{(2)} = j \mid X^{(3)} = k \right) \cdot P\left( X^{(3)} = k \right) \right] \tag{6.97} \]

\[ = a_i + b_j + f_{ik} + h_{jk} + c_k \tag{6.98} \]

Note carefully that the type of independence in (6.98) has a quite different interpretation than that in (6.96).

The full model, with no independence assumptions at all, would have three two-way interaction terms, as well as a three-way interaction term.

6.6.4 Parameter Estimation

Remember, whenever we have parametric models, the statistician's "Swiss army knife" is maximum likelihood estimation. That is what is most often used in the case of log-linear models.

How, then, do we compute the likelihood of our data, the $N_{ijk}$? It's actually quite straightforward, because the $N_{ijk}$ have the multinomial distribution we studied in Section 3.6.1.1 of our unit on multivariate distributions.

\[ L = \frac{n!}{\prod_{i,j,k} N_{ijk}!} \prod_{i,j,k} p_{ijk}^{N_{ijk}} \tag{6.99} \]

We then write the $p_{ijk}$ in terms of our model parameters. Take for example (6.96), where we write

\[ p_{ijk} = e^{a_i + b_j + d_{ij} + c_k} \tag{6.100} \]

We then substitute (6.100) in (6.99), and maximize the latter with respect to the $a_i$, $b_j$, $d_{ij}$ and $c_k$, subject to constraints such as (6.94).

The maximization may be messy. But certain cases have been worked out in closed form, and in any case today one would typically do the computation by computer. In R, for example, there is the loglin() function for this purpose.

6.6.5 The Goal: Parsimony Again

Again, we'd like "the simplest model possible, but not simpler." This means a model with as much independence between factors as possible, subject to the model being accurate.

Classical log-linear model procedures do model selection by hypothesis testing, testing whether various interaction terms are 0. The tests often parallel ANOVA testing, with chi-square distributions arising instead of F-distributions.
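As a rough illustration of the loglin() function mentioned in Section 6.6.4, the sketch below fits the full-independence model (6.92) and the model (6.96) to the example counts from Section 6.6.2. The object names are hypothetical, and the example is meant only to show the mechanics, not to recommend test-based model selection.

```r
# Example counts N_ijk (IDE x Language x Education), as in Section 6.6.2
N <- array(c(18, 7, 22, 6, 6, 3,
             15, 4, 10, 2, 4, 3),
           dim = c(2, 3, 2))

# Full independence, as in (6.92): only the three one-way margins are fit
indep <- loglin(N, margin = list(1, 2, 3), fit = TRUE, print = FALSE)

# IDE and Language related, Education independent of both, as in (6.96)
ideLang <- loglin(N, margin = list(c(1, 2), 3), fit = TRUE, print = FALSE)

# Likelihood-ratio statistics, degrees of freedom and chi-square p-values
print(c(indep$lrt, indep$df))
print(c(ideLang$lrt, ideLang$df))
p_indep   <- pchisq(indep$lrt,   indep$df,   lower.tail = FALSE)
p_ideLang <- pchisq(ideLang$lrt, ideLang$df, lower.tail = FALSE)
print(c(p_indep, p_ideLang))
```

The margin argument lists which marginal tables the model must reproduce: list(1, 2, 3) corresponds to main effects only, while c(1, 2) adds the IDE-by-Language interaction term $d_{ij}$. The fitted cell counts are returned in the fit component when fit = TRUE.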


6.7 Simpson's (Non-)Paradox

Suppose each individual in a population either possesses or does not possess traits $A$, $B$ and $C$, and that we wish to predict trait $A$. Let $\bar{A}$, $\bar{B}$ and $\bar{C}$ denote the situations in which the individual does not possess the given trait. Simpson's Paradox then describes a situation in which

\[ P(A | B) > P(A | \bar{B}) \tag{6.101} \]

and yet

P(A | B, C



Richmond's population was 37% black, proportionally far more than New York's 0.2%. So, Richmond's heavy concentration of blacks made its overall mortality rate look worse than New York's, even though things were actually much worse in New York.

But is this really a "paradox"? Closer consideration of this example reveals that the only reason this example (and others like it) is surprising is that the predictors were used in the wrong order. One normally looks for predictors one at a time, first finding the best single predictor, then the best pair of predictors, and so on. If this were done on the above data set, the first predictor variable chosen would be race, not city. In other words, the sequence of analysis would look something like this:

• P(mortality | Richmond) = 0.0022
• P(mortality | New York) = 0.0019
• P(mortality | black) = 0.0048
• P(mortality | white) = 0.0018
• P(mortality | black, Richmond) = 0.0033
• P(mortality | black, New York) = 0.0056
• P(mortality | white, Richmond) = 0.0016
• P(mortality | white, New York) = 0.0018

The analyst would have seen that race is a better predictor than city, and thus would have chosen race as the best single predictor. The analyst would then investigate the race/city predictor pair, and would never reach a point in which city alone were in the selected predictor set. Thus no anomalies would arise.

Exercises

Note to instructor: See the Preface for a list of sources of real data on which exercises can be assigned to complement the theoretical exercises below.

1. Suppose we are interested in documents of a certain type, which we'll call Type 1. Everything that is not Type 1 we'll call Type 2, with a proportion $q$ of all documents being Type 1. Our goal will be to try to guess document type by the presence or absence of a certain word; we will guess Type 1 if the word is present, and otherwise will guess Type 2.

Let $T$ denote document type, and let $W$ denote the event that the word is in the document. Also, let $p_i$ be the proportion of documents that contain the word, among all documents of Type $i$, $i = 1, 2$. The event $C$ will denote our guessing correctly.

Find the overall probability of correct classification, $P(C)$, and also $P(C | W)$.

Hint: Be careful of your conditional and unconditional probabilities here.
2. In the quartic model in the ALOHA simulation example, find an approximate 95% confidence interval for the true population mean wait if our backoff parameter $b$ is set to 0.6.


Hint: You will need to use the fact that a linear combination of the components of a multivariate normal random vector has a univariate normal distribution, as discussed in Section 3.6.2.1.

3. Consider the linear regression model with one predictor, i.e. $r = 1$. Let $Y_i$ and $X_i$ represent the values of the response and predictor variables for the i-th observation in our sample.

(a) Assume as in Section 6.3.7.4 that $Var(Y | X = t)$ is a constant in $t$, $\sigma^2$. Find the exact value of $Cov(\hat{\beta}_0, \hat{\beta}_1)$, as a function of the $X_i$ and $\sigma^2$. Your final answer should be in scalar, i.e. non-matrix form.

(b) Suppose we wish to fit the model $m_{Y;X}(t) = \beta_1 t$, i.e. the usual linear model but without the constant term, $\beta_0$. Derive a formula for the least-squares estimate of $\beta_1$.

4. Suppose the random pair $(X, Y)$ has density $8st$ on 0 0 (6.103)

6. In this problem, you will conduct an R simulation experiment similar to that of Foster and Stine on overfitting, discussed in Section 6.3.9.2.

Generate data $X_i^{(j)}$, $i = 1, \ldots, n$, $j = 1, \ldots, r$ from a $N(0,1)$ distribution, and $\epsilon_i$, $i = 1, \ldots, n$ from $N(0,4)$. Set $Y_i = X_i^{(1)} + \epsilon_i$, $i = 1, \ldots, n$. This simulates drawing a random sample of $n$ observations from an (r+1)-variate population.

Now suppose the analyst, unaware that $Y$ is related to only $X^{(1)}$, fits the model

\[ m_{Y;X^{(1)},\ldots,X^{(r)}}(t_1, \ldots, t_r) = \beta_0 + \beta_1 t_1 + \cdots + \beta_r t_r \tag{6.104} \]

In actuality, $\beta_j = 0$ for $j > 1$ (and for $j = 0$). But the analyst wouldn't know this. Suppose the analyst selects predictors by testing the hypotheses $H_0: \beta_i = 0$, as in Section 6.3.9.2, with $\alpha = 0.05$.

Do this for various values of $r$ and $n$. You should find that, for fixed $n$ and increasing $r$, some of the predictors are declared to be "significantly" related to $Y$ (complete with asterisks) when in fact they are not (while $X^{(1)}$, which really is related to $Y$, may be declared NOT "significant"). This illustrates the folly of using hypothesis testing to do variable selection.
7. Consider a random pair $(X, Y)$ for which the linear model $E(Y | X) = \beta_0 + \beta_1 X$ holds, and think about predicting $Y$, first without $X$ and then with $X$, minimizing mean squared prediction error (MSPE) in each case. From Section 6.3.6, we know that without $X$, the best predictor is $EY$, while with $X$ it is $E(Y | X)$, which under our assumption here is $\beta_0 + \beta_1 X$. Show that the reduction in MSPE accrued by using $X$, i.e.

\[ \frac{E\left[ (Y - EY)^2 \right] - E\left[ \{ Y - E(Y | X) \}^2 \right]}{E\left[ (Y - EY)^2 \right]} \tag{6.105} \]


is equal to $\rho^2(X, Y)$.


Chapter 7

Markov Chains

One of the most famous stochastic models is that of a Markov chain. This type of model is widely used in computer science, biology, physics and so on.

7.1 Discrete-Time Markov Chains

7.1.1 Example: Finite Random Walk

To motivate this discussion, let us start with a simple example: Consider a random walk on the set of integers between 1 and 5, moving randomly through that set, say one move per second, according to the following scheme. If we are currently at position $i$, then one time period later we will be at either $i-1$, $i$ or $i+1$, according to the outcome of rolling a fair die: we move to $i-1$ if the die comes up 1 or 2, stay at $i$ if the die comes up 3 or 4, and move to $i+1$ in the case of a 5 or 6. For the special cases $i = 1$ and $i = 5$, we simply move back to 2 or 4, respectively. (In random walk terminology, these are called reflecting barriers.)

The integers 1 through 5 form the state space for this process; if we are currently at 4, for instance, we say we are in state 4. Let $X_t$ represent the position of the particle at time $t$, $t = 0, 1, 2, \ldots$.

The random walk is a Markov process. The process is "memoryless," meaning that we can "forget the past"; given the present and the past, the future depends only on the present:

\[ P(X_{t+1} = s_{t+1} | X_t = s_t, X_{t-1} = s_{t-1}, \ldots, X_0 = s_0) = P(X_{t+1} = s_{t+1} | X_t = s_t) \tag{7.1} \]

The term Markov process is the general one. If the state space is discrete, i.e. finite or countably infinite, then we usually use the more specialized term, Markov chain.

Although this equation has a very complex look, it has a very simple meaning: The distribution of our next position, given our current position and all our past positions, is dependent only on the current position.¹ It is clear that the random walk process above does have this property; for instance, if we are now at position 4, the probability that our next state will be 3 is 1/3 no matter where we were in the past.

¹ This can be generalized, so that the future depends on the present and also on the state one unit of time ago, etc. However, such models become quite unwieldy.

Continuing this example, let $p_{ij}$ denote the probability of going from position $i$ to position $j$ in one step. For example, $p_{21} = p_{23} = \frac{1}{3}$ while $p_{24} = 0$ (we can reach position 4 from position 2 in two steps, but not in one step). The numbers $p_{ij}$ are called the one-step transition probabilities of the process. Denote by $P$ the matrix whose entries are the $p_{ij}$:

\[ P = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} & 0 & 0 \\ 0 & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} & 0 \\ 0 & 0 & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix} \tag{7.2} \]

By the way, it turns out that the matrix $P^k$ gives the k-step transition probabilities. In other words, the element $(i,j)$ of this matrix gives the probability of going from $i$ to $j$ in $k$ steps.

7.1.2 Long-Run Distribution

In typical applications we are interested in the long-run distribution of the process, for example the long-run proportion of the time that we are at position 4. For each state $i$, define

\[ \pi_i = \lim_{t \to \infty} \frac{N_{it}}{t} \tag{7.3} \]

where $N_{it}$ is the number of visits the process makes to state $i$ among times 1, 2, ..., t. In most practical cases, this proportion will exist and be independent of our initial position $X_0$. The $\pi_i$ are called the steady-state probabilities, or the stationary distribution of the Markov chain.

Intuitively, the existence of $\pi_i$ implies that as $t$ approaches infinity, the system approaches steady-state, in the sense that

\[ \lim_{t \to \infty} P(X_t = i) = \pi_i \tag{7.4} \]

Actually, the limit (7.4) may not exist in some cases. We'll return to that point later, but for typical cases it does exist, and we will usually assume this. It then suggests a way to calculate the values $\pi_i$, as follows.

First note that

\[ P(X_{t+1} = i) = \sum_k P(X_t = k \text{ and } X_{t+1} = i) = \sum_k P(X_t = k) P(X_{t+1} = i | X_t = k) = \sum_k P(X_t = k) p_{ki} \tag{7.5} \]


where the sum goes over all states $k$. For example, in our random walk example above, we would have

\[ P(X_{t+1} = 3) = \sum_{k=1}^{5} P(X_t = k \text{ and } X_{t+1} = 3) = \sum_{k=1}^{5} P(X_t = k) P(X_{t+1} = 3 | X_t = k) = \sum_{k=1}^{5} P(X_t = k) p_{k3} \tag{7.6} \]

Then as $t \to \infty$ in Equation (7.5), intuitively we would have

\[ \pi_i = \sum_k \pi_k p_{ki} \tag{7.7} \]

Remember, here we know the $p_{ki}$ and want to find the $\pi_i$. Solving these equations (one for each $i$), called the balance equations, gives us the $\pi_i$.

A matrix formulation is also useful. Letting $\pi$ denote the row vector of the elements $\pi_i$, i.e. $\pi = (\pi_1, \pi_2, \ldots)$, these equations (one for each $i$) then have the matrix form

\[ \pi = \pi P \tag{7.8} \]

or

\[ (I - P^T) \pi^T = 0 \tag{7.9} \]

Note that there is also the constraint

\[ \sum_i \pi_i = 1 \tag{7.10} \]

For the random walk problem above, for instance, the solution is $\pi = \left( \frac{1}{11}, \frac{3}{11}, \frac{3}{11}, \frac{3}{11}, \frac{1}{11} \right)$. Thus in the long run we will spend 1/11 of our time at position 1, 3/11 of our time at position 2, and so on.

The system (7.9), together with (7.10), can be used to calculate the $\pi_i$. It turns out that one of the equations in the system is redundant. We thus eliminate one of them, say by removing the last row of $I - P^T$ in (7.9). To reflect (7.10), we replace the removed row by a row of all 1s, and in the right-hand side of (7.9) we replace the last 0 by a 1. We can then solve the system. It can be done with R's solve() function. Or one can note from (7.8) that $\pi$ is a left eigenvector of $P$ with eigenvalue 1, so one can call eigen() on the transpose of $P$.

But Equation (7.9) may not be easy to solve. For instance, if the state space is infinite, then this matrix equation represents infinitely many scalar equations. In such cases, you may need to try to find some clever trick which will allow you to solve the system, or in many cases a clever trick to analyze the process in some way other than explicit solution of the system of equations.

And even for finite state spaces, the matrix may be extremely large. In some cases, you may need to resort to numerical methods, or symbolic math packages.
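The row-replacement recipe described above can be sketched in a few lines of R. The function name findpi and the variable names are illustrative assumptions; the matrix P is the random walk matrix (7.2).

```r
# Transition matrix (7.2) for the five-state random walk
P <- matrix(c(0,   1,   0,   0,   0,
              1/3, 1/3, 1/3, 0,   0,
              0,   1/3, 1/3, 1/3, 0,
              0,   0,   1/3, 1/3, 1/3,
              0,   0,   0,   1,   0), nrow = 5, byrow = TRUE)

# Solve (7.9) subject to (7.10): replace the last (redundant) equation
# by the constraint that the pi_i sum to 1
findpi <- function(P) {
  n <- nrow(P)
  A <- diag(n) - t(P)
  A[n, ] <- rep(1, n)          # row of all 1s, reflecting (7.10)
  b <- c(rep(0, n - 1), 1)     # last 0 on the right-hand side becomes 1
  solve(A, b)
}
pivec <- findpi(P)
print(pivec * 11)              # compare with (1, 3, 3, 3, 1)

# Alternative: pi is an eigenvector of t(P) with eigenvalue 1
e <- eigen(t(P))
j <- which.min(abs(e$values - 1))
v <- Re(e$vectors[, j])
pi_eig <- v / sum(v)           # rescale so the entries sum to 1
print(pi_eig * 11)
```

The eigenvector returned by eigen() is determined only up to a constant multiple, which is why it is rescaled to sum to 1 before being compared with the solve() result.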


7.1.2.1 Periodic Chains

Note again that even if Equation (7.9) has a solution, this does not imply that (7.4) holds. For instance, suppose we alter the random walk example above so that

\[ p_{i,i-1} = p_{i,i+1} = \frac{1}{2} \tag{7.11} \]

for $i = 2, 3, 4$, with transitions out of states 1 and 5 remaining as before. In this case, the solution to Equation (7.9) is $\left( \frac{1}{8}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{8} \right)$. This solution is still valid, in the sense that Equation (7.3) will hold. For example, we will spend 1/4 of our time at Position 4 in the long run. But the limit of $P(X_i = 4)$ will not be 1/4, and in fact the limit will not even exist. If say $X_0$ is even, then $X_i$ can be even only for even values of $i$. We say that this Markov chain is periodic with period 2, meaning that returns to a given state can only occur after amounts of time which are multiples of 2.

7.1.2.2 The Meaning of the Term "Stationary Distribution"

Though we have informally defined the term stationary distribution in terms of long-run proportions, the technical definition is this:

Definition 11 Consider a Markov chain. Suppose we have a vector $\pi$ of nonnegative numbers that sum to 1. Let $X_0$ have the distribution $\pi$. If that results in $X_1$ having that distribution too (and thus also all $X_n$), we say that $\pi$ is the stationary distribution of this Markov chain.

Note that this definition stems from (7.5).

In our first random walk example above, this would mean that if we have $X_0$ distributed on the integers 1 through 5 with probabilities $\left( \frac{1}{11}, \frac{3}{11}, \frac{3}{11}, \frac{3}{11}, \frac{1}{11} \right)$, then for example $P(X_1 = 1) = \frac{1}{11}$, $P(X_1 = 4) = \frac{3}{11}$ etc. This is indeed the case, as you can verify using (7.5) with $t = 0$.
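Both claims above can be checked numerically. The sketch below, with illustrative object names and a simple helper matpow() written for this purpose, verifies that $\pi P = \pi$ for the two random walks and shows that in the periodic chain $P(X_t = 4)$ keeps alternating rather than settling at 1/4.

```r
# First random walk (7.2) and its stationary distribution
P1 <- matrix(c(0,   1,   0,   0,   0,
               1/3, 1/3, 1/3, 0,   0,
               0,   1/3, 1/3, 1/3, 0,
               0,   0,   1/3, 1/3, 1/3,
               0,   0,   0,   1,   0), nrow = 5, byrow = TRUE)
pi1 <- c(1, 3, 3, 3, 1) / 11

# Periodic variant (7.11) and the solution of (7.9)
P2 <- matrix(c(0,   1,   0,   0,   0,
               1/2, 0,   1/2, 0,   0,
               0,   1/2, 0,   1/2, 0,
               0,   0,   1/2, 0,   1/2,
               0,   0,   0,   1,   0), nrow = 5, byrow = TRUE)
pi2 <- c(1, 2, 2, 2, 1) / 8

# If X_0 has distribution pi, then by (7.5) X_1 has distribution pi %*% P
gap1 <- max(abs(pi1 %*% P1 - pi1))
gap2 <- max(abs(pi2 %*% P2 - pi2))

# t-step transition probabilities via repeated multiplication
matpow <- function(P, t) {
  R <- diag(nrow(P))
  for (s in seq_len(t)) R <- R %*% P
  R
}
even_t <- matpow(P2, 100)[4, 4]   # P(X_100 = 4 | X_0 = 4)
odd_t  <- matpow(P2, 101)[4, 4]   # P(X_101 = 4 | X_0 = 4)
print(c(gap1, gap2, even_t, odd_t))
```

Starting from state 4, the probability of being back at 4 is zero at every odd time and close to 1/2 at large even times, so the limit in (7.4) does not exist, even though the long-run proportion of time spent at 4 is the average of the two, 1/4.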
In our "notebook" view, here is what we would do. Imagine that we generate a random integer between 1 and 5 according to the probabilities $\left( \frac{1}{11}, \frac{3}{11}, \frac{3}{11}, \frac{3}{11}, \frac{1}{11} \right)$,² and set $X_0$ to that number. We would then generate another random number, by rolling an ordinary die, and going left, right or staying put, with probability 1/3 each. We would then write down $X_0$ and $X_1$ on the first line of our notebook. We would then do this experiment again, recording the results on the second line, then again and again. In the long run, 3/11 of the lines would have, for instance, $X_0 = 4$, and 3/11 of the lines would have $X_1 = 4$. In other words, $X_1$ would have the same distribution as $X_0$.

² Say by rolling an 11-sided die.

7.1.3 Example: Stuck-At 0 Fault

7.1.3.1 Description

In the above example, the labels for the states consisted of single integers $i$. In some other examples, convenient labels may be r-tuples, for example 2-tuples $(i,j)$.


Consider a serial communication line. Let $B_1, B_2, B_3, \ldots$ denote the sequence of bits transmitted on this line. It is reasonable to assume the $B_i$ to be independent, and that $P(B_i = 0)$ and $P(B_i = 1)$ are both equal to 0.5.

Suppose that the receiver will eventually fail, with the type of failure being stuck at 0, meaning that after failure it will report all future received bits to be 0, regardless of their true value. Once failed, the receiver stays failed, and should be replaced. Eventually the new receiver will also fail, and we will replace it; we continue this process indefinitely.

Let $\rho$ denote the probability that the receiver fails on any given bit, with independence between bits in terms of receiver failure. Then the lifetime of the receiver, that is, the time to failure, is geometrically distributed with "success" probability $\rho$, i.e. the probability of failing on receipt of the i-th bit after the receiver is installed is $(1 - \rho)^{i-1} \rho$ for $i = 1, 2, 3, \ldots$

However, the problem is that we will not know whether a receiver has failed (unless we test it once in a while, which we are not including in this example). If the receiver reports a long string of 0s, we should suspect that the receiver has failed, but of course we cannot be sure that it has; it is still possible that the message being transmitted just happened to contain a long string of 0s.

Suppose we adopt the policy that, if we receive $k$ consecutive 0s, we will replace the receiver with a new unit. Here $k$ is a design parameter; what value should we choose for it? If we use a very small value, then we will incur great expense, due to the fact that we will be replacing receiver units at an unnecessarily high rate. On the other hand, if we make $k$ too large, then we will often wait too long to replace the receiver, and the resulting error rate in received bits will be sizable. Resolution of this tradeoff between expense and accuracy depends on the relative importance of the two. (There are also other possibilities, involving the addition of redundant bits for error detection, such as parity bits. For simplicity, we will not consider such refinements here. However, the analysis of more complex systems would be similar to the one below.)
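The geometric lifetime claim in the description above is easy to check numerically. In this sketch the value rho = 0.01 and the truncation point 5000 are arbitrary illustrative choices; note that R's dgeom() counts the number of failures before the first success, so bit number i corresponds to dgeom(i - 1, rho).

```r
rho <- 0.01          # hypothetical per-bit failure probability
i <- 1:5000          # bit on which the receiver fails (truncated range)

# Probability of failing on receipt of the i-th bit
p_fail <- (1 - rho)^(i - 1) * rho

# Same distribution via R's built-in geometric pmf
same <- all.equal(p_fail, dgeom(i - 1, prob = rho))

# Mean lifetime in bits; for this distribution it equals 1 / rho
mean_life <- sum(i * p_fail)
print(same)
print(mean_life)
```

With a mean lifetime of $1/\rho$ bits, a small $\rho$ means a failed receiver is rare but, once it occurs, can go unnoticed for a long time, which is what makes the choice of $k$ a genuine tradeoff.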
7.1.3.2 Initial Analysis

A natural state space in this example would be

\[ \{ (i,j) : i = 0, 1, \ldots, k-1;\ j = 0, 1;\ i + j \neq 0 \} \tag{7.12} \]

where $i$ represents the number of consecutive 0s that we have received so far, and $j$ represents the state of the receiver (0 for failed, 1 for nonfailed). Note that when we are in a state of the form $(k-1, j)$, if we receive a 0 on the next bit (whether it is a true 0 or the receiver has failed), our new state will be (0,1), as we will install a new receiver. Note too that there is no state (0,0), since if the receiver is down it must have received at least one bit.

The calculation of the transition matrix $P$ is straightforward, though it requires careful thought. For example, suppose the current state is (2,1), and that we are investigating the expense and bit accuracy corresponding to a policy having $k = 5$. What can happen upon receipt of the next bit? The next bit will have a true value of either 0 or 1, with probability 0.5 each. The receiver will change from working to failed status with probability $\rho$. Thus our next state could be:

• (3,1), if a 0 arrives, and the receiver does not fail;

• (0,1), if a 1 arrives, and the receiver does not fail; or

• (3,0), if the receiver fails

The probabilities of these three transitions out of state (2,1) are:

\[ p_{(2,1),(3,1)} = 0.5 (1 - \rho) \tag{7.13} \]

\[ p_{(2,1),(0,1)} = 0.5 (1 - \rho) \tag{7.14} \]

\[ p_{(2,1),(3,0)} = \rho \tag{7.15} \]

Other entries of the matrix $P$ can be computed similarly. Note by the way that from state (4,1) we will go to (0,1), no matter what happens.

Formally specifying the matrix $P$ using the 2-tuple notation as above would be very cumbersome. In this case, it would be much easier to map to a one-dimensional labeling. For example, if $k = 5$, the nine states (1,0),...,(4,0),(0,1),(1,1),...,(4,1) could be renamed states 1,2,...,9. Then we could form $P$ under this labeling, and the transition probabilities above would appear as

\[ p_{78} = 0.5 (1 - \rho) \tag{7.16} \]

\[ p_{75} = 0.5 (1 - \rho) \tag{7.17} \]

\[ p_{73} = \rho \tag{7.18} \]

7.1.3.3 Going Beyond Finding $\pi$

Finding the $\pi_i$ should be just the first step. We then want to use them to calculate various quantities of interest.³ For instance, in this example, it would also be useful to find the error rate $\epsilon$, and the mean time (i.e., the mean number of bit receptions) between receiver replacements, $\mu$. We can find both $\epsilon$ and $\mu$ in terms of the $\pi_i$, in the following manner.

The quantity $\epsilon$ is the proportion of the time during which the true value of the received bit is 1 but the receiver is down, which is 0.5 times the proportion of the time spent in states of the form $(i,0)$:

\[ \epsilon = 0.5 (\pi_1 + \pi_2 + \pi_3 + \pi_4) \tag{7.19} \]

This should be clear intuitively, but it would also be instructive to present a more formal derivation of the same thing. Let $E_n$ be the event that the n-th bit is received in error, with $D_n$ denoting the event that the receiver is down. Then

³ Note that unlike a classroom setting, where those quantities would be listed for the students to calculate, in research we must decide on our own which quantities are of interest.


\[ \epsilon = \lim_{n \to \infty} P(E_n) \tag{7.20} \]

\[ = \lim_{n \to \infty} P(B_n = 1 \text{ and } D_n) \tag{7.21} \]

\[ = \lim_{n \to \infty} P(B_n = 1) P(D_n) \tag{7.22} \]

\[ = 0.5 (\pi_1 + \pi_2 + \pi_3 + \pi_4) \tag{7.23} \]

Here we used the fact that the true bit value $B_n$ and the receiver state are independent.

Equations (7.20) follow a pattern we'll use repeatedly in this chapter. In subsequent examples we will not show the steps with the limits, but the limits are indeed there. Make sure to mentally go through these steps yourself.⁴

Now to get $\mu$ in terms of the $\pi_i$ note that since $\mu$ is the long-run average number of bits between receiver replacements, it is then the reciprocal of $\gamma$, the long-run fraction of bits that result in replacements. For example, say we replace the receiver on average every 20 bits. Over a period of 1000 bits, then (speaking on an intuitive level) that would mean about 50 replacements. Thus approximately 0.05 (50 out of 1000) of all bits result in replacements.

\[ \mu = \frac{1}{\gamma} \tag{7.24} \]

Again suppose $k = 5$. A replacement will occur only from states of the form $(4,j)$, and even then only under the condition that the next reported bit is a 0. In other words, there are three possible ways in which replacement can occur:

(a) We are in state (4,0). Here, since the receiver has failed, the next reported bit will definitely be a 0, regardless of that bit's true value. We will then have a total of $k = 5$ consecutive received 0s, and therefore will replace the receiver.

(b) We are in the state (4,1), and the next bit to arrive is a true 0. It then will be reported as a 0, our fifth consecutive 0, and we will replace the receiver, as in (a).

(c) We are in the state (4,1), and the next bit to arrive is a true 1, but the receiver fails at that time, resulting in the reported value being a 0. Again we have five consecutive reported 0s, so we replace the receiver.

Therefore,

\[ \gamma = \pi_4 + \pi_9 (0.5 + 0.5 \rho) \tag{7.25} \]

Again, make sure you work through the full version of (7.25), using the pattern in (7.20).

⁴ The other way to work this out rigorously is to assume that $X_0$ has the distribution $\pi$, as in Section 7.1.2.2. Then no limits are needed in (7.20). But this may be more difficult to understand.
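The quantities in (7.19) and (7.25) can be computed numerically once the transition matrix is built. The following R sketch does this under the one-dimensional labeling of Section 7.1.3.2; the function name stuck_at_0, the helper names and the value rho = 0.01 are illustrative assumptions, and the code assumes $k \geq 2$.

```r
# States (1,0),...,(k-1,0) are numbered 1,...,k-1 and states
# (0,1),...,(k-1,1) are numbered k,...,2k-1, as in the k = 5 labeling.
stuck_at_0 <- function(rho, k = 5) {
  n <- 2 * k - 1
  down <- function(i) i        # index of state (i,0)
  up   <- function(i) k + i    # index of state (i,1)
  P <- matrix(0, n, n)
  for (i in 1:(k - 1)) {       # failed receiver: every reported bit is 0
    if (i < k - 1) {
      P[down(i), down(i + 1)] <- 1
    } else {
      P[down(i), up(0)] <- 1   # k-th consecutive 0: replace the receiver
    }
  }
  for (i in 0:(k - 1)) {       # working receiver
    if (i < k - 1) {
      P[up(i), up(i + 1)]   <- 0.5 * (1 - rho)   # true 0, no failure
      P[up(i), up(0)]       <- 0.5 * (1 - rho)   # true 1, no failure
      P[up(i), down(i + 1)] <- rho               # failure, reported 0
    } else {
      P[up(i), up(0)] <- 1     # from (k-1,1) we always go to (0,1)
    }
  }
  # Stationary distribution: (7.9) with the last row replaced by (7.10)
  A <- diag(n) - t(P)
  A[n, ] <- rep(1, n)
  pivec <- solve(A, c(rep(0, n - 1), 1))
  eps <- 0.5 * sum(pivec[down(1:(k - 1))])                         # (7.19)
  gam <- pivec[down(k - 1)] + pivec[up(k - 1)] * (0.5 + 0.5 * rho) # (7.25)
  list(P = P, pi = pivec, eps = eps, mu = 1 / gam)
}

res <- stuck_at_0(rho = 0.01, k = 5)
print(res$eps)   # long-run error rate
print(res$mu)    # mean number of bits between replacements
```

Calling the function over a range of k values gives the error rate and replacement interval for each candidate policy, which is the raw material for the cost-benefit comparison the text has in mind.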


Thus

\[ \mu = \frac{1}{\gamma} = \frac{1}{\pi_4 + 0.5 \pi_9 (1 + \rho)} \tag{7.26} \]

This kind of analysis could be used as the core of a cost-benefit tradeoff investigation to determine a good value of $k$. (Note that the $\pi_i$ are functions of $k$, and that the above equations for the case $k = 5$ must be modified for other values of $k$.)

7.1.4 Example: Shared-Memory Multiprocessor

(Adapted from Probability and Statistics, with Reliability, Queuing and Computer Science Applications, by K.S. Trivedi, Prentice-Hall, 1982 and 2002, but similar to many models in the research literature.)

7.1.4.1 The Model

Consider a shared-memory multiprocessor system with $m$ memory modules and $m$ CPUs. The address space is partitioned into $m$ chunks, based on either the most-significant or least-significant $\log_2 m$ bits in the address.⁵

The CPUs will need to access the memory modules in some random way, depending on the programs they are running. To make this idea concrete, consider the Intel assembly language instruction

add %eax, (%ebx)

which adds the contents of the EAX register to the word in memory pointed to by the EBX register. Execution of that instruction will (absent cache and other similar effects, as we will assume here and below) involve two accesses to memory: one to fetch the old value of the word pointed to by EBX, and another to store the new value. Moreover, the instruction itself must be fetched from memory. So, altogether the processing of this instruction involves three memory accesses.

Since different programs are made up of different instructions, use different register values and so on, the sequence of addresses in memory generated by the CPUs is modeled as a sequence of random variables. In our model here, the CPUs are assumed to act independently of each other, and successive requests from a given CPU are independent of each other too. A CPU will choose the i-th module with probability $q_i$. A memory request takes one unit of time to process, though the wait may be longer due to queuing. In this very simplistic model, as soon as a CPU's memory request is fulfilled, it generates another one. On the other hand, while a CPU has one memory request pending, it does not generate another.
Let's assume a crossbar interconnect, which means there are $m^2$ separate paths from CPUs to memory modules, so that if the m CPUs have memory requests to m different memory modules, then all the requests can be fulfilled simultaneously. Also, assume as an approximation that we can ignore communication delays.

[5] You may recognize this as high-order and low-order interleaving, respectively.

How good are these assumptions? One weakness, for instance, is that many instructions do not use memory at all, except for the instruction fetch, and as mentioned, even the latter may be suppressed due to cache effects.

Another example of potential problems with the assumptions involves the fact that many programs will have code like

for (i = 0; i < 10000; i++) sum += x[i];

Since the elements of the array x will be stored in consecutive addresses, successive memory requests from the CPU while executing this code will not be independent. The assumption would be more justified if we were including cache effects, or (noticed by Earl Barr) if we are studying a timesharing system with a small quantum size.

Thus, many models of systems like this have been quite complex, in order to capture the effects of various things like caching, nonindependence and so on in the model. Nevertheless, one can often get some insight from even very simple models too. In any case, for our purposes here it is best to stick to simple models, so as to understand more easily.

Our state will be an m-tuple $(N_1, \ldots, N_m)$, where $N_i$ is the number of requests currently pending at memory module i. Recalling our assumption that a CPU generates another memory request immediately after the previous one is fulfilled, we always have that

$$N_1 + \cdots + N_m = m$$

It is straightforward to find the transition probabilities $p_{ij}$. Here are a couple of examples, with m = 2:

- $p_{(2,0),(1,1)}$: Recall that state (2,0) means that currently there are two requests pending at Module 1, one being served and one in the queue, and no requests at Module 2. For the transition (2,0) → (1,1) to occur, when the request being served at Module 1 is done, the CPU that issued it will make a new request, this time for Module 2. This will occur with probability $q_2$. Meanwhile, the request which had been queued at Module 1 will now start service. So, $p_{(2,0),(1,1)} = q_2$.

- $p_{(1,1),(1,1)}$: In state (1,1), both pending requests will finish in this cycle. To go to (1,1) again, that would mean that the two CPUs request different modules from each other (CPUs 1 and 2 choose Modules 1 and 2, or 2 and 1). Each of those two possibilities has probability $q_1 q_2$, so $p_{(1,1),(1,1)} = 2 q_1 q_2$.

We then solve for the $\pi$, using (7.7). It turns out, for example, that

$$\pi_{(1,1)} = \frac{q_1 q_2}{1 - 2 q_1 q_2} \tag{7.27}$$

7.1.4.2 Going Beyond Finding $\pi$

Let B denote the number of memory requests completed in a given memory cycle. Then we may be interested in E(B), the number of requests completed per unit time, i.e. per cycle. We can find E(B) as follows. Let S denote the current state. Then, continuing the case m = 2, we have from the Law of Total Expectation,[6]

$$\begin{aligned}
E(B) &= E[E(B \mid S)] && (7.28) \\
&= P(S = (2,0))\, E(B \mid S = (2,0)) + P(S = (1,1))\, E(B \mid S = (1,1)) + P(S = (0,2))\, E(B \mid S = (0,2)) && (7.29) \\
&= \pi_{(2,0)}\, E(B \mid S = (2,0)) + \pi_{(1,1)}\, E(B \mid S = (1,1)) + \pi_{(0,2)}\, E(B \mid S = (0,2)) && (7.30)
\end{aligned}$$

All this equation is doing is finding the overall mean of B by breaking down into the cases for the different states.

Now if we are in state (2,0), only one request will be completed this cycle, and B will be 1. Thus $E(B \mid S = (2,0)) = 1$. Similarly, $E(B \mid S = (1,1)) = 2$ and so on. After doing all the algebra, we find that

$$E(B) = \frac{1 - q_1 q_2}{1 - 2 q_1 q_2} \tag{7.31}$$

The maximum value of E(B) occurs when $q_1 = q_2 = \frac{1}{2}$, in which case E(B) = 1.5. This is a lot less than the maximum capacity of the memory system, which is m = 2 requests per cycle.

So, we can learn a lot even from this simple model, in this case learning that there may be a substantial underutilization of the system. This is a common theme in probabilistic modeling: Simple models may be worthwhile in terms of insight provided, even if their numerical predictions may not be too accurate.

[6] Actually, we could take a more direct route in this case, noting that B can only take on the values 1 and 2. Then $E(B) = P(B = 1) + 2 P(B = 2) = \pi_{(2,0)} + \pi_{(0,2)} + 2 \pi_{(1,1)}$. But the analysis in (7.28)-(7.30) extends better to the case of general m.

7.1.5 Example: Slotted ALOHA

Recall the slotted ALOHA model from Chapter 1:

- Time is divided into slots or epochs.

- There are n nodes, each of which is either idle or has a single message transmission pending. So, a node doesn't generate a new message until the old one is successfully transmitted (a very unrealistic assumption, but we're keeping things simple here).

- In the middle of each time slot, each of the idle nodes generates a message with probability q.

- Just before the end of each time slot, each active node attempts to send its message with probability p.

- If more than one node attempts to send within a given time slot, there is a collision, and each of the transmissions involved will fail.

- So, we include a backoff mechanism: each node with a message will attempt to send it in a given slot only with probability p, rather than always, with the transmission time occupying the remainder of the slot.
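The slot-by-slot rules in the list above can be turned directly into a short simulation. What follows is only a minimal sketch in Python, not code from the text: the function name simulate_aloha, the seed, and the parameter values in the last line are assumptions made for illustration. It follows the bullets literally: idle nodes create messages with probability q, then each active node transmits with probability p, and a slot counts as a success only when exactly one node transmits.

```python
import random

def simulate_aloha(n, p, q, n_slots, seed=0):
    """Estimate throughput: successful transmissions per time slot."""
    rng = random.Random(seed)
    active = 0       # nodes holding a message at the start of the slot
    successes = 0
    for _ in range(n_slots):
        # Middle of the slot: each idle node creates a message w.p. q.
        active += sum(rng.random() < q for _ in range(n - active))
        # End of the slot: each active node tries to send w.p. p.
        attempts = sum(rng.random() < p for _ in range(active))
        if attempts == 1:      # exactly one sender, so no collision
            successes += 1
            active -= 1        # that node becomes idle again
    return successes / n_slots

# Hypothetical parameter values, chosen only for illustration.
print(simulate_aloha(n=3, p=0.4, q=0.3, n_slots=100_000))
```

Rerunning the last line over a grid of p values gives a rough empirical view of the tradeoff between collisions and idle slots.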

So, p is a design parameter, which must be chosen carefully. If p is too large, we will have too many collisions, thus increasing the average time to send a message. If p is too small, a node will often refrain from sending even if no other node is there to collide with.

Define our state for any given time slot to be the number of nodes currently having a message to send at the very beginning of the time slot (before new messages are generated). Then for 0

Now, to calculate P(success xmit | in state s), recall that in state s we start the slot with s nonidle nodes, but that we may acquire some new ones; each of the n-s idle nodes will create a new message, with probability q. So,

$$P(\text{success xmit} \mid \text{in state } s) = \sum_{j=0}^{n-s} \binom{n-s}{j} q^j (1-q)^{n-s-j}\, (s+j)(1-p)^{s+j-1} p \tag{7.36}$$

Substituting into (7.35), we have

$$\tau = \sum_{s=0}^{n} \sum_{j=0}^{n-s} \binom{n-s}{j} q^j (1-q)^{n-s-j}\, (s+j)(1-p)^{s+j-1} p\, \pi_s \tag{7.37}$$

With some more subtle reasoning, one can derive the mean time a message waits before being successfully transmitted, as follows:

Focus attention on one particular node, say Node 0. It will repeatedly cycle through idle and busy periods, I and B. We wish to find E(B). I has a geometric distribution with parameter q,[7] so

$$E(I) = \frac{1}{q} \tag{7.38}$$

Then if we can find E(I+B), we will get E(B) by subtraction.

To find E(I+B), note that there is a one-to-one correspondence between I+B cycles and successful transmissions; each I+B period ends with a successful transmission at Node 0. Imagine again observing this node for, say, 100000 time slots, and say E(I+B) is 2000. That would mean we'd have about 50 cycles, thus 50 successful transmissions from this node. In other words, the throughput would be approximately 50/100000 = 0.0005 = 1/E(I+B). So, a fraction

$$\frac{1}{E(I+B)} \tag{7.39}$$

of the time slots have successful transmissions from this node.
But that quantity is the throughput for this node (number of successful transmissions per unit time), and due to the symmetry of the system, that throughput is 1/n of the total throughput of the n nodes in the network, which we denoted above by $\tau$. So,

$$E(I+B) = \frac{n}{\tau} \tag{7.40}$$

[7] If a message is sent in the same slot in which it is created, we will count B as 1. If it is sent in the following slot, B = 2, etc. B will have a modified geometric distribution starting at 0 instead of 1, but we will ignore this here for the sake of simplicity.

Thus from (7.38) we have

$$E(B) = \frac{n}{\tau} - \frac{1}{q} \tag{7.41}$$

where of course $\tau$ is the function of the $\pi_i$ in (7.35).

Now let's find the proportion of attempted transmissions which are successful. This will be

$$\frac{E(\text{number of successful transmissions in a slot})}{E(\text{number of attempted transmissions in a slot})} \tag{7.42}$$

To see why this is the case, again think of watching the network for 100,000 slots. Then the proportion of successful transmissions during that period of time is the number of successful transmissions divided by the number of attempted transmissions. Those two numbers are approximately 100,000 times the numerator and denominator of (7.42), respectively, and the factor of 100,000 cancels in the ratio.

Now, how do we evaluate (7.42)? Well, the numerator is easy, since it is $\tau$, which we found before. The denominator will be

$$\sum_s \pi_s\, [sp + (n-s)pq] \tag{7.43}$$

The factor sp + (n-s)pq comes from the following reasoning. If we are in state s, the s nodes which already have something to send will each transmit with probability p, so there will be an expected number sp of them that try to send. Also, of the n-s nodes which are idle at the beginning of the slot, an expected (n-s)q of them will generate new messages, and of those (n-s)q, an expected (n-s)qp will try to send.

7.2 Hidden Markov Models

The word hidden in the term Hidden Markov Model (HMM) refers to the fact that the state of the process is hidden, i.e. unobservable.

Actually, we've already seen an example of this, back in Section 7.1.3. There the state, actually just part of it, was unobservable, namely the status of the receiver being up or down. But there we were not trying to guess $X_n$ from $Y_n$ (see below), so it probably would not be considered an HMM.

An HMM consists of a Markov chain $X_n$ which is unobservable, together with observable values $Y_n$. The $X_n$ are governed by the transition probabilities $p_{ij}$, and the $Y_n$ are generated from the $X_n$ according to

$$r_{km} = P(Y_n = m \mid X_n = k) \tag{7.44}$$

Typically the idea is to guess the $X_n$ from the $Y_n$ and our knowledge of the $p_{ij}$ and $r_{km}$. The details are too complex to give here, but you can at least understand that Bayes' Rule comes into play.
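To give at least a flavor of how Bayes' Rule enters, here is a minimal sketch in Python for a single time step only. It is not the full HMM guessing procedure, which must also use the $p_{ij}$ to link successive time steps. The function name posterior_state, the two-state setup and all the numbers are hypothetical, chosen only for illustration. Given assumed probabilities $P(X_n = k)$ and the $r_{km}$ of (7.44), it computes $P(X_n = k \mid Y_n = m)$.

```python
def posterior_state(prior, r, m):
    """P(X_n = k | Y_n = m) for each hidden state k, by Bayes' Rule.

    prior[k] = P(X_n = k); r[k][m] = P(Y_n = m | X_n = k), as in (7.44).
    """
    joint = [prior[k] * r[k][m] for k in range(len(prior))]
    total = sum(joint)               # this is P(Y_n = m)
    return [x / total for x in joint]

# Hypothetical two-state example (all numbers are assumptions).
prior = [0.7, 0.3]
r = [[0.9, 0.1],    # observation probabilities from hidden state 0
     [0.2, 0.8]]    # observation probabilities from hidden state 1
post = posterior_state(prior, r, m=1)
print(post)
```

In this made-up example, hidden state 1 becomes the more likely guess after observing $Y_n = 1$, even though it was less likely beforehand.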

A good example of HMMs would be in text mining applications. Here the $Y_n$ might be words in the text, and $X_n$ would be their parts of speech (POS): nouns, verbs, adjectives and so on. Consider the word round, for instance. Your first thought might be that it is an adjective, but it could be a noun (e.g. an elimination round in a tournament) or a verb (e.g. to round off a number or round a corner). The HMM would help us to guess which, and therefore guess the true meaning of the word.

HMMs are also used in speech processing, DNA modeling and many other applications.

7.3 Continuous-Time Markov Chains

In the Markov chains we analyzed above, events occur only at integer times. However, many Markov chain models are of the continuous-time type, in which events can occur at any time. Here the holding time, i.e. the time the system spends in one state before changing to another state, is a continuous random variable.

The state of a Markov chain at any time now has a continuous subscript. Instead of the chain consisting of the random variables $X_n$, n = 1, 2, 3, ... (you can also start n at 0 in the sense of Section 7.1.2.2), it now consists of $\{X_t : t \in [0, \infty)\}$. The Markov property is now

$$P(X_{t+u} = k \mid X_s \text{ for all } 0 \le s \le t) = P(X_{t+u} = k \mid X_t) \quad \text{for all } t, u > 0 \tag{7.45}$$

7.3.1 Holding-Time Distribution

In order for the Markov property to hold, the distribution of holding time at a given state needs to be memoryless. You may recall that exponentially distributed random variables have this property. In other words, if a random variable W has density

$$f(t) = \lambda e^{-\lambda t} \tag{7.46}$$

for some $\lambda$ then

$$P(W > r + s \mid W > r) = P(W > s) \tag{7.47}$$

for all positive r and s. Actually, one can show that exponential distributions are the only continuous distributions which have this property. Therefore, holding times in Markov chains must be exponentially distributed.

It is difficult for the beginning modeler to fully appreciate the memoryless property. You are urged to read the material on exponential distributions in Section 2.3.4.1 before continuing.
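A quick numerical check may help make (7.47) concrete. The sketch below, in Python, is an illustration only; the function names and the values of $\lambda$, r and s are assumptions. It uses the fact that for the density (7.46), $P(W > t) = e^{-\lambda t}$, and contrasts the exponential case with a uniform distribution on (0,1), which is not memoryless.

```python
import math

def exp_survival(t, lam):
    """P(W > t) for W exponential with parameter lam."""
    return math.exp(-lam * t)

def unif_survival(t):
    """P(W > t) for W uniform on (0,1)."""
    return max(0.0, 1.0 - t) if t > 0 else 1.0

lam, r, s = 0.5, 3.0, 2.0     # hypothetical values

# Exponential: P(W > r+s | W > r) agrees with P(W > s), as in (7.47).
exp_cond = exp_survival(r + s, lam) / exp_survival(r, lam)
print(exp_cond, exp_survival(s, lam))

# Uniform: P(W > 0.75 | W > 0.5) differs from P(W > 0.25).
unif_cond = unif_survival(0.5 + 0.25) / unif_survival(0.5)
print(unif_cond, unif_survival(0.25))
```

The first printed pair matches and the second does not, which is the memoryless property in miniature.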
Because it is central to the Markov property, the exponential distribution is assumed for all basic activities in Markov models. In queuing models, for instance, both the interarrival time and service time are assumed to be exponentially distributed (though of course with different values of $\lambda$). In reliability modeling, the lifetime of a component is assumed to have an exponential distribution.

Such assumptions have in many cases been verified empirically. If you go to a bank, for example, and record data on when customers arrive at the door, you will find the exponential model to work well (though you may have to restrict yourself to a given time of day, to account for nonrandom effects such as heavy traffic at the noon hour). In a study of time to failure for airplane air conditioners, the distribution was also found to be well fitted by an exponential density. On the other hand, in many cases the distribution is not close to exponential, and purely Markovian models cannot be used.

7.3.2 The Notion of Rates

A key point is that the parameter $\lambda$ in (7.46) has the interpretation of a rate, in the sense we will now discuss. First, recall that $1/\lambda$ is the mean. Say light bulb lifetimes have an exponential distribution with mean 100 hours, so $\lambda = 0.01$. In our lamp, whenever its bulb burns out, we immediately replace it with a new one. Imagine watching this lamp for, say, 100,000 hours. During that time, we will have done approximately 100000/100 = 1000 replacements. That would be using 1000 light bulbs in 100000 hours, so we are using bulbs at the rate of 0.01 bulb per hour. For a general $\lambda$, we would use light bulbs at the rate of $\lambda$ bulbs per hour. This concept is crucial to what follows.

7.3.3 Stationary Distribution

We again define $\pi_i$ to be the long-run proportion of time the system is in state i, and we again will derive a system of linear equations to solve for these proportions.

To this end, let $\lambda_i$ denote the parameter in the holding-time distribution at state i, and define the following:

- $U_{i,t}$ is the total time spent at state i up through time t

- $N_{i,t}$ is the number of visits to state i up through time t

- $H_{ij}$ is the holding time during the j-th visit to state i

The reason $U_{i,t}$ is of interest to us is that

$$\lim_{t \to \infty} \frac{U_{i,t}}{t} = \pi_i \tag{7.48}$$

Next, write

$$U_{i,t} = H_{i1} + H_{i2} + \cdots + H_{i,N_{i,t}} + \text{small error} \tag{7.49}$$

The reason for the small error is that at time t, we may be currently at state i, in a visit that has not yet finished. As $t \to \infty$, this term becomes negligible relative to t, so we'll ignore it.

Now in taking the expected value in (7.49), we need to deal with the fact that there is a random number of terms in the sum on the right-hand side. This we do using the Theorem of Total Expectation, as seen in the example in Section 3.8.1.3, yielding

$$E[U_{i,t}] = \frac{1}{\lambda_i} E[N_{i,t}] \tag{7.50}$$

since holding times at state i have mean $1/\lambda_i$. And then for large t

$$U_{i,t} \approx \frac{1}{\lambda_i} N_{i,t} \tag{7.51}$$

This may be clearer to you if you divide both sides by t. Both U and N are essentially cumulative sums, so dividing by t sets up something like the Strong Law of Large Numbers, discussed in Section 1.4.10.

The next point is to look at the rates of transitions into and out of state i. These should be equal in the long run, and that will be the basis for our balance equations.

The number of transitions out of i up through time t (except for the small error) is equal to $N_{i,t}$. What about inbound transitions? Let $p_{ji}$ be the probability that, when a holding time at state j ends, our transition is to i. Then for large t, the number of transitions from state j to state i is approximately $N_{j,t}\, p_{ji}$. Equating the two, we have, again for large t

$$\sum_{j \ne i} N_{j,t}\, p_{ji} \approx N_{i,t} \tag{7.52}$$

Combining (7.51) and (7.52), we have

$$U_{i,t} \lambda_i \approx N_{i,t} \approx \sum_{j \ne i} N_{j,t}\, p_{ji} \approx \sum_{j \ne i} U_{j,t} \lambda_j p_{ji} \tag{7.53}$$

Dividing by t and taking limits, we have

$$\pi_i \lambda_i = \sum_{j \ne i} \pi_j \lambda_j p_{ji} \tag{7.54}$$

So, voila!, there are our balance equations (one for each i).

We will sometimes refer to quantities

$$\lambda_{rs} = \lambda_r p_{rs} \tag{7.55}$$

with the following interpretation. In the context of the ideas in our example of the rate of light bulb replacements in Section 7.3.2, one can view (7.55) as the rate of transitions from r to s, during the time we are in state r. Equation (7.54) can then be interpreted as equating the rate of transitions into i and the rate out of i.
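For a chain with a handful of states, the balance equations (7.54) can be solved numerically. The following is a rough sketch in Python under stated assumptions: the helper names solve_linear and stationary, and the three-state chain with its rates and transition probabilities, are all hypothetical and chosen only for illustration. Since the balance equations are linearly dependent, the sketch replaces one of them with the constraint that the $\pi_i$ sum to 1.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def stationary(lam, p):
    """Solve pi_i lam_i = sum_{j != i} pi_j lam_j p[j][i], sum(pi) = 1."""
    n = len(lam)
    A = [[-lam[i] if i == j else lam[j] * p[j][i] for j in range(n)]
         for i in range(n)]
    A[-1] = [1.0] * n                # swap last equation for normalization
    b = [0.0] * (n - 1) + [1.0]
    return solve_linear(A, b)

# Hypothetical chain: lam[i] is the holding-time parameter at state i,
# p[j][i] the probability that a transition out of j goes to i.
lam = [1.0, 2.0, 4.0]
p = [[0.0, 0.5, 0.5],
     [0.25, 0.0, 0.75],
     [1.0, 0.0, 0.0]]
pi = stationary(lam, p)
print(pi)
```

One sanity check is to substitute the computed values back into the equation that was dropped; it should hold as well.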

7.3.4 Minima of Independent Exponentially Distributed Random Variables

Equations (7.54) are fine, but in actual examples there will be an issue with finding the $p_{ji}$. The material in this section will be used for that purpose in later sections.

Suppose $W_1, \ldots, W_k$ are independent random variables, with $W_i$ being exponentially distributed with parameter $\lambda_i$. Let $Z = \min(W_1, \ldots, W_k)$. Then

(a) Z is exponentially distributed with parameter $\lambda_1 + \cdots + \lambda_k$

(b) $P(Z = W_i) = \dfrac{\lambda_i}{\lambda_1 + \cdots + \lambda_k}$

The sum $\lambda_1 + \cdots + \lambda_k$ in (a) should make good intuitive sense to you, for the following reasons. Say we have persons 1 and 2. Each has a lamp. Person i uses Brand i light bulbs. Say Brand i light bulbs have exponential lifetimes with parameter $\lambda_i$. Suppose each time person i replaces a bulb, he shouts out, "New bulb!" and each time anyone replaces a bulb, I shout out "New bulb!" Persons 1 and 2 are shouting at a rate of $\lambda_1$ and $\lambda_2$, respectively, so I am shouting at a rate of $\lambda_1 + \lambda_2$. Moreover, at any given time, the time at which I shout next will be the minimum of the times at which persons 1 and 2 shout next.

Similarly, (b) should be intuitively clear as well from the above thought experiment, since for instance a proportion $\lambda_1/(\lambda_1 + \lambda_2)$ of my shouts will be in response to person 1's shouts.

Properties (a) and (b) above are easy to prove, starting with the relation

$$F_Z(t) = 1 - P(Z > t) = 1 - P(W_1 > t \text{ and } \ldots \text{ and } W_k > t) = 1 - \prod_i e^{-\lambda_i t} = 1 - e^{-(\lambda_1 + \cdots + \lambda_k) t} \tag{7.56}$$

Taking $\frac{d}{dt}$ of both sides shows (a).

For (b), suppose k = 2. We have that

$$P(Z = W_1) = P(W_1 < W_2)$$

Suppose the time until failure of a single machine, carrying the full load of the factory, has an exponential distribution with mean 20.0, but the mean is 25.0 when the other machine is working, since it is not so loaded. Repair time is exponentially distributed with mean 8.0.

We can take as our state space {0,1,2}, where the state is the number of working machines. Now, let us find the parameters $\lambda_i$ and $p_{ji}$ for this system. For example, what about $\lambda_2$? The holding time in state 2 is the minimum of the two lifetimes of the machines, and thus from the results of Section 7.3.4, has parameter $\frac{1}{25.0} + \frac{1}{25.0} = 0.08$.

For $\lambda_1$, a transition out of state 1 will be either to state 2 (the down machine is repaired) or to state 0 (the up machine fails). The time until transition will be the minimum of the lifetime of the up machine and the repair time of the down machine, and thus will have parameter $\frac{1}{20.0} + \frac{1}{8.0} = 0.175$. Similarly, $\lambda_0 = \frac{1}{8.0} + \frac{1}{8.0} = 0.25$.

It is important to understand how the Markov property is being used here. Suppose we are in state 1, and the down machine is repaired, sending us into state 2. Remember, the machine which had already been up has "lived" for some time now. But the memoryless property of the exponential distribution implies that this machine is now "born again."

What about the parameters $p_{ji}$? Well, $p_{21}$ is certainly easy to find; since the transition 2 → 1 is the only transition possible out of state 2, $p_{21} = 1$.

For $p_{12}$, recall that transitions out of state 1 are to states 0 and 2, with rates 1/20.0 and 1/8.0, respectively. So,

$$p_{12} = \frac{1/8.0}{1/20.0 + 1/8.0} \approx 0.71 \tag{7.58}$$

and similarly $p_{10} = 0.05/0.175 \approx 0.29$.

Working in this manner, we finally arrive at the complete system of equations (7.54):

$$\pi_2\,(0.08) = \pi_1\,(0.125) \tag{7.59}$$

$$\pi_1\,(0.175) = \pi_2\,(0.08) + \pi_0\,(0.25) \tag{7.60}$$

$$\pi_0\,(0.25) = \pi_1\,(0.05) \tag{7.61}$$

Of course, we also have the constraint $\pi_2 + \pi_1 + \pi_0 = 1$. The solution turns out to be

$$\pi = (\pi_0, \pi_1, \pi_2) = (0.072, 0.362, 0.566) \tag{7.62}$$

Thus for example, during 7.2% of the time, there will be no machine available at all.
Several variations of this problem could be analyzed. We could compare the two-machine system with a one-machine version. It turns out that the proportion of down time (i.e. time when no machine is available) increases to 28.6%. Or we could analyze the case in which only one repair person is employed by this factory, so that only one machine can be repaired at a time, compared to the situation above, in which we (tacitly) assumed that if both machines are down, they can be repaired in parallel. We leave these variations as exercises for the reader.
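The long-run proportions for the two-machine example can also be checked by simulating the chain directly, which is a convenient starting point for the variations just mentioned. The following Python code is a sketch only, not from the text; the function name simulate, the seed and the number of transitions are assumptions. It draws each holding time from an exponential distribution whose parameter is the sum of the competing rates, and picks the next state in proportion to those rates, as in Section 7.3.4.

```python
import random

MEAN_LIFE_ALONE = 20.0    # one machine carrying the full load
MEAN_LIFE_SHARED = 25.0   # each machine when both are up
MEAN_REPAIR = 8.0

def simulate(n_jumps, seed=0):
    """Return the fraction of time spent with 0, 1 and 2 machines up."""
    rng = random.Random(seed)
    time_in = [0.0, 0.0, 0.0]
    state = 2
    for _ in range(n_jumps):
        if state == 2:      # two lifetimes compete, nothing to repair
            fail_rate, repair_rate = 2 / MEAN_LIFE_SHARED, 0.0
        elif state == 1:    # one lifetime competes with one repair
            fail_rate, repair_rate = 1 / MEAN_LIFE_ALONE, 1 / MEAN_REPAIR
        else:               # two repairs proceed in parallel
            fail_rate, repair_rate = 0.0, 2 / MEAN_REPAIR
        total = fail_rate + repair_rate
        time_in[state] += rng.expovariate(total)   # holding time
        # A repair "wins" with probability repair_rate / total.
        if rng.random() < repair_rate / total:
            state += 1
        else:
            state -= 1
    elapsed = sum(time_in)
    return [t / elapsed for t in time_in]

props = simulate(200_000)
print(props)    # compare with the stationary probabilities for states 0, 1, 2
```

Changing the rates in the state-0 branch to a single repair at a time would give a simulation of the one-repair-person variation.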

7.3.6 Continuous-Time Birth/Death Processes

We noted earlier that the system of equations for the $\pi_i$ may not be easy to solve. In many cases, for instance, the state space is infinite and thus the system of equations is infinite too. However, there is a rich class of Markov chains for which closed-form solutions have been found, called birth/death processes.[8]

[8] Though we treat the continuous-time case here, there is also a discrete-time analog.

Here the state space consists of (or has been mapped to) the set of nonnegative integers, and $p_{ji}$ is nonzero only in cases in which $|i - j| = 1$. (The name "birth/death" has its origin in Markov models of biological populations, in which the state is the current population size.) Note for instance that the example of the gracefully degrading system above has this form. An M/M/1 queue (one server, "Markov" i.e. exponential interarrival times and Markov service times) is also a birth/death process, with the state being the number of jobs in the system.

Because the $p_{ji}$ have such a simple structure, there is hope that we can find a closed-form solution to (7.54), and it turns out we can. Let $u_i = \lambda_{i,i+1}$ and $d_i = \lambda_{i,i-1}$ ('u' for "up," 'd' for "down"). Then (7.54) is

$$\pi_{i+1} d_{i+1} + \pi_{i-1} u_{i-1} = \pi_i \lambda_i = \pi_i (u_i + d_i), \quad i \ge 1 \tag{7.63}$$

$$\pi_1 d_1 = \pi_0 \lambda_0 = \pi_0 u_0 \tag{7.64}$$

In other words,

$$\pi_{i+1} d_{i+1} - \pi_i u_i = \pi_i d_i - \pi_{i-1} u_{i-1}, \quad i \ge 1 \tag{7.65}$$

$$\pi_1 d_1 - \pi_0 u_0 = 0 \tag{7.66}$$

Applying (7.65) recursively to the base (7.66), we see that

$$\pi_i d_i - \pi_{i-1} u_{i-1} = 0, \quad i \ge 1 \tag{7.67}$$

so that

$$\pi_i = \pi_{i-1} \frac{u_{i-1}}{d_i}, \quad i \ge 1 \tag{7.68}$$

and thus

$$\pi_i = \pi_0 r_i \tag{7.69}$$

where

$$r_i = \prod_{k=1}^{i} \frac{u_{k-1}}{d_k} \tag{7.70}$$

with $r_i = 0$ for i > m if the chain has no states past m.

Then since the $\pi_i$ must sum to 1, we have that

$$\pi_0 = \frac{1}{1 + \sum_{i=1}^{\infty} r_i} \tag{7.71}$$

and the other $\pi_i$ are then found via (7.69).

Note that the chain might be finite, i.e. have $u_i = 0$ for some i. In that case it is still a birth/death chain, and the formulas above for $\pi$ still apply.

7.3.7 Example: Computer Worm

Not all interesting Markov chains have stationary distributions. Here is an example in which other considerations come into play. This chain happens to be a birth/death chain, but it is "pure birth," and thus does not have a stationary distribution, or more accurately, has its stationary distribution concentrated on the absorbing state.

A computer science graduate student at UCD, C. Senthilkumar, was working on a worm alert mechanism. A simplified version of the model is that network hosts are divided into groups of size g, say on the basis of sharing the same router. Each infected host tries to infect all the others in the group. When g-1 group members are infected, an alert is sent to the outside world.

The student was studying this model via simulation, and found some surprising behavior. No matter how large he made g, the mean time until an external alert was raised seemed bounded. He asked me for advice.

I modeled this as a pure birth process. In state i, there are i infected hosts, each trying to infect all of the g-i noninfected hosts. When the process reaches state g-1, the process ends; we call this state an absorbing state, i.e. one from which the process never leaves.

Suppose that for each infected/noninfected pair of hosts, the time to infection of the noninfected member by the infected member has an exponential distribution with mean 1.0. Assume independence among the various infection attempts. Since in state i there are i(g-i) such pairs, and since we go to state i+1 when the first infection among these occurs, we have $\lambda_i = i(g-i)$. Thus the mean time to go from state i to state i+1 is 1/[i(g-i)].
Then the mean time to go from state 1 to state g-1 is

$$\sum_{i=1}^{g-1} \frac{1}{i(g-i)} \tag{7.72}$$

Using a calculus approximation, we have

$$\int_1^{g-1} \frac{1}{x(g-x)}\, dx = \frac{1}{g} \int_1^{g-1} \left( \frac{1}{x} + \frac{1}{g-x} \right) dx = \frac{2}{g} \ln(g-1) \tag{7.73}$$
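To get a feel for how the sum in (7.72) and the integral in (7.73) behave as g grows, one can simply evaluate both. The short Python sketch below is for illustration only; the function names and the particular values of g are assumptions, and the sum is taken exactly as printed in (7.72).

```python
import math

def mean_time_to_alert(g):
    """The sum in (7.72): mean holding times 1/(i(g-i)), i = 1, ..., g-1."""
    return sum(1.0 / (i * (g - i)) for i in range(1, g))

def integral_approx(g):
    """The closed form in (7.73): (2/g) ln(g-1)."""
    return 2.0 / g * math.log(g - 1)

for g in (10, 100, 1000, 10000):
    print(g, mean_time_to_alert(g), integral_approx(g))
```

The integral is only a rough approximation to the sum for small g, but the printed values show both quantities shrinking as the group size increases.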

The quantity in (7.73) goes to zero as $g \to \infty$. This confirms that the behavior seen by the student in simulations holds in general. In other words, (7.72) remains bounded as $g \to \infty$. This is a very interesting result, since it says that the mean time to alert is bounded no matter how big our group size is.

7.4 Hitting Times Etc.

In this section we're interested in the amount of time it takes to get from one state to another, including cases in which this might be infinite.

7.4.1 Some Mathematical Conditions

There is a rich mathematical theory regarding the asymptotic behavior of Markov chains. We will not present such material here in this brief introduction, but we will give an example of the implications the theory can have.

A state in a Markov chain is called recurrent if it is guaranteed that, if we start at that state, we will return to the state infinitely many times. A nonrecurrent state is called transient.

Let $T_{ii}$ denote the time needed to return to state i if we start there.[9] Note that an equivalent definition of recurrence is that $P(T_{ii} < \infty) = 1$, i.e. we are sure to return to i at least once. By the Markov property, if we are sure to return once, then we are sure to return again once after that, and so on, so this implies infinitely many visits.

A recurrent state i is called positive recurrent if $E(T_{ii}) < \infty$, while a state which is recurrent but not positive recurrent is called null recurrent.

Let $T_{ij}$ be the time it takes to get to state j if we are now in i. Note that this is measured from the time that we enter state i to the time we enter state j.

One can show that in the discrete time case, a state i is recurrent if and only if

$$\sum_{n=0}^{\infty} P(X_n = i \mid X_0 = i) = \infty \tag{7.74}$$

where the n-th term is the probability of being at state i at time n, given that we start at i.

Consider an irreducible Markov chain, meaning one which has the property that one can get from any state to any other state (though not necessarily in one step). One can show that in an irreducible chain, if one state is recurrent then they all are. The same statement holds if "recurrent" is replaced by "positive recurrent."
[9] Keep in mind that $T_{ii}$ is the time from one entry to state i to the next entry to state i. So, it includes time spent in i, which is 1 unit of time for a discrete-time chain and a random exponential amount of time in the continuous-time case, and then time spent away from i, up to the time of next entry to i.

7.4.2 Example: Random Walks

Consider the famous random walk on the full set of integers: At each time step, one goes left one integer or right one integer (e.g. to +3 or +5 from +4), with probability 1/2 each. In other words, we flip a coin and go left for heads, right for tails.

If we start at 0, then we are back at 0 at time n when we have accumulated an equal number of heads and tails. So for even-numbered n, i.e. n = 2m, we have

$$P(X_n = 0 \mid X_0 = 0) = P(m \text{ heads and } m \text{ tails}) = \binom{2m}{m} \frac{1}{2^{2m}} \tag{7.75}$$

One can use Stirling's approximation,

$$m! \approx \sqrt{2\pi}\, e^{-m} m^{m + 1/2} \tag{7.76}$$

to show that the series (7.74) diverges in this case. So, this chain (meaning all states in the chain) is recurrent. However, it is not positive recurrent.

The same is true for the corresponding random walk on the two-dimensional integer lattice (moving up, down, left or right with probability 1/4 each). However, in the three-dimensional case, the chain is not even null recurrent; it is transient.

7.4.3 Finding Hitting and Recurrence Times

For a positive recurrent state i in a discrete-time Markov chain,

$$\pi_i = \frac{1}{E(T_{ii})} \tag{7.77}$$

The approach to deriving this is similar to that of Section 7.1.5.1. Define alternating On and Off subcycles, where On means we are at state i and Off means we are elsewhere. An On subcycle has duration 1, and an Off subcycle has duration $T_{ii} - 1$. Define a full cycle to consist of an On subcycle followed by an Off subcycle.

Then intuitively the proportion of time we are in state i is

$$\pi_i = \frac{E(\text{On})}{E(\text{On}) + E(\text{Off})} = \frac{1}{1 + E(T_{ii} - 1)} = \frac{1}{E(T_{ii})} \tag{7.78}$$

The equation is similar for the continuous-time case. Here $E(\text{On}) = 1/\lambda_i$, and the Off subcycle has mean duration $E(T_{ii}) - 1/\lambda_i$. Note that $T_{ii}$ is measured from the time we enter state i once until the time we enter it again. We then have

$$\pi_i = \frac{1/\lambda_i}{E(T_{ii})} \tag{7.79}$$

Thus positive recurrence means that $\pi_i > 0$. For a null recurrent chain, the limits in Equation (7.3) are 0, which means that there may be rather little one can say of interest regarding the long-run behavior of the chain.
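The relation (7.77) is easy to check empirically on a small chain. The sketch below, in Python, is an illustration under stated assumptions: the function name mean_return_time, the seed and the two-state transition matrix are hypothetical. For this particular matrix the stationary distribution is (5/6, 1/6), so (7.77) predicts a mean return time to state 0 of 6/5 = 1.2.

```python
import random

def mean_return_time(P, state, n_steps, seed=0):
    """Average observed time between successive entries to `state`."""
    rng = random.Random(seed)
    x = state
    last_visit = 0
    gaps = []
    for t in range(1, n_steps + 1):
        # Draw the next state from row x of the transition matrix.
        u, cum = rng.random(), 0.0
        for nxt, prob in enumerate(P[x]):
            cum += prob
            if u < cum:
                break
        x = nxt
        if x == state:
            gaps.append(t - last_visit)
            last_visit = t
    return sum(gaps) / len(gaps)

# Hypothetical two-state chain with stationary distribution (5/6, 1/6).
P = [[0.9, 0.1],
     [0.5, 0.5]]
est = mean_return_time(P, 0, 100_000)
print(est, 1 / (5 / 6))
```

The simulated average should land close to the value predicted by (7.77).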

We are often interested in finding quantities of the form $E(T_{ij})$. We can do so by setting up systems of equations similar to the balance equations used for finding stationary distributions.

First consider the discrete case. Conditioning on the first step we take after being at state i, we have

$$E(T_{ij}) = \sum_{k \ne j} p_{ik}\, [1 + E(T_{kj})] + p_{ij} \cdot 1 \tag{7.80}$$

By varying i and j in (7.80), we get a system of linear equations which we can solve to find the $E(T_{ij})$. Note that (7.77) gives us equations we can use here too.

The continuous version uses the same reasoning:

$$E(T_{ij}) = \sum_{k \ne j} p_{ik} \left[ \frac{1}{\lambda_i} + E(T_{kj}) \right] + p_{ij} \cdot \frac{1}{\lambda_i} \tag{7.81}$$

One can use a similar analysis to determine the probability of ever reaching a state, in chains which have transient or absorbing states. For fixed j define

$$\alpha_i = P(T_{ij} < \infty) \tag{7.82}$$

Then

$$\alpha_i = \sum_{k \ne j} p_{ik} \alpha_k + p_{ij} \tag{7.83}$$

7.4.4 Example: Finite Random Walk

Let's go back to the example in Section 7.1.1.

Suppose we start our random walk at 2. How long will it take to reach state 4? Set $b_i = E(T_{i4} \mid \text{start at } i)$. From (7.80) we could set up equations like

$$b_2 = \frac{1}{3}(1 + b_1) + \frac{1}{3}(1 + b_2) + \frac{1}{3}(1 + b_3) \tag{7.84}$$

Now change the model a little, and make states 1 and 6 absorbing. Suppose we start at position 3. What is the probability that we eventually are absorbed at 6 rather than 1? We could set up equations like (7.83) to find this.
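Equations of the form (7.83) are linear, so for a small chain they can be solved exactly. Here is a sketch in Python of the absorption question just posed, under assumptions that should be stated loudly: the walk is taken to live on states 1 through 6, with 1 and 6 absorbing, and each interior state moves left, stays put, or moves right with probability 1/3 each (the same 1/3 weights that appear in (7.84)). The helper name solve_linear is also hypothetical.

```python
from fractions import Fraction

def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination (exact with Fractions)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]    # augmented matrix
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

third = Fraction(1, 3)
interior = [2, 3, 4, 5]     # assumed model: states 1 and 6 are absorbing
target = 6

# Equation (7.83) for each interior state i, rearranged as
#   alpha_i - sum over interior k of p_ik alpha_k = p_{i,target}.
# alpha_1 = 0, since from the absorbing state 1 we never reach 6.
A, b = [], []
for i in interior:
    row = []
    for k in interior:
        p_ik = third if abs(i - k) <= 1 else Fraction(0)
        row.append((1 if i == k else 0) - p_ik)
    A.append(row)
    b.append(third if i + 1 == target else Fraction(0))

alpha = dict(zip(interior, solve_linear(A, b)))
print(alpha[3])
```

Under these assumptions the absorption probabilities come out linear in the starting position, so starting at 3 gives a probability of 2/5 of ending up at 6.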

7.4.5 Example: Tree-Searching

Consider the following Markov chain with infinite state space {0,1,2,3,...}.[10] The transition matrix is defined by $p_{i,i+1} = q_i$ and $p_{i0} = 1 - q_i$. This kind of model has many different applications, including in computer science tree-searching algorithms. (The state represents the level in the tree where the search is currently, and a return to 0 represents a backtrack. More general backtracking can be modeled similarly.)

[10] Adapted from Performance Modelling of Communication Networks and Computer Architectures, by P. Harrison and N. Patel, pub. by Addison-Wesley, 1993.

The question at hand is, What conditions on the $q_i$ will give us a positive recurrent chain?

Assuming $0 < q_i < 1$ for all i, we have

$$P(T_{00} > n) = \prod_{i=0}^{n-1} q_i \tag{7.85}$$

Therefore, the chain is recurrent if and only if

$$\lim_{n \to \infty} \prod_{i=0}^{n-1} q_i = 0 \tag{7.86}$$

For positive recurrence, we need $E(T_{00}) < \infty$. Now, one can show that for any nonnegative integer-valued random variable Y

$$E(Y) = \sum_{n=0}^{\infty} P(Y > n) \tag{7.87}$$

Thus for positive recurrence, our condition on the $q_i$ is

$$\sum_{n=0}^{\infty} \prod_{i=0}^{n-1} q_i < \infty \tag{7.88}$$

Exercises

1. Consider a "wraparound" variant of the random walk in Section 7.1.1. We still have a reflecting barrier at 1, but at 5, we go back to 4, stay at 5 or wrap around to 1, each with probability 1/3. Find the new set of stationary probabilities.

2. Consider the Markov model of the shared-memory multiprocessor system in our PLN. In each part below, your answer will be a function of $q_1, \ldots, q_m$.

(a) For the case m = 3, find $p_{(2,0,1),(1,1,1)}$.

(b) For the case m = 6, give a compact expression for $p_{(1,1,1,1,1,1),(i,j,k,l,m,n)}$. Hint: We have an instance of a famous parametric distribution family here.

3. This problem involves the analysis of call centers. This is a subject of much interest in the business world, with there being commercial simulators sold to analyze various scenarios. Here are our assumptions:

- Calls come in according to a Poisson process with intensity parameter $\lambda$.

- Call duration is exponentially distributed with parameter $\eta$.

- There are always at least b operators in service, and at most b+r.

- Operators work from home, and can be brought into or out of service instantly when needed. They are paid only for the time in service.

- If a call comes in when the current number of operators is larger than b but smaller than b+r, another operator is brought into service to process the call.

- If a call comes in when the current number of operators is b+r, the call is rejected.

- When an operator completes processing a call, and the current number of operators (including this one) is greater than b, then that operator is taken out of service.

Note that this is a birth/death process, with the state being the number of calls currently in the system.

(a) Find approximate closed-form expressions for the $\pi_i$ for large b+r, in terms of b, r, $\lambda$ and $\eta$. (You should not have any summation symbols.)

(b) Find the proportion of rejected calls, in terms of $\pi_i$ and b, r, $\lambda$ and $\eta$.

(c) An operator is paid while in service, even if he/she is idle, in which case the wages are "wasted." Express the proportion of wasted time in terms of the $\pi_i$ and b, r, $\lambda$ and $\eta$.

(d) Suppose b = r = 2, and $\lambda = \eta = 1.0$. When a call completes while we are in state b+1, an operator is sent away. Find the mean time until we make our next summons to the reserve pool.

4. The bin-packing problem arises in many computer science applications. Items of various sizes must be placed into fixed-sized bins. The goal is to find a packing arrangement that minimizes unused space. Toward that end, work the following problem.
We are working in one dimension, and have a continuing stream of items arriving, of lengths $L_1, L_2, L_3, \ldots$ We place the items in the bins in the order of arrival, i.e. without optimizing. We continue to place items in a bin until we encounter an item that will not fit in the remaining space, in which case we go to the next bin.

Suppose the bins are of length 5, and an item has length 1, 2, 3 or 4, with probability 0.25 each. Find the long-run proportion of wasted space.

Hint: Set up a discrete-time Markov chain, with "time" being the number of items packed so far, and the state being the amount of remaining space in the current bin.

5. Suppose we keep rolling a die. Find the mean number of rolls needed to get three consecutive 4s.

Hint: Use the material in Section 7.4.

6. A system consists of two machines, with exponentially distributed lifetimes having mean 25.0. There is a single repair person, but he is not usually on site. When a breakdown occurs, he is summoned (unless he is already on his way or on site), and it takes him a random amount of time to reach the site, exponentially distributed with mean 2.0. Repair time is exponentially distributed with mean 8.0. If after completing a repair the repair person finds that the other machine needs fixing, he will repair it; otherwise he will leave. Repair is performed on a First Come, First Served schedule. Find the following:

(a) The long-run proportion of the time that the repair person is on site.

(b) The rate per unit time of calls to the repair person.

(c) The mean time to repair, i.e. the mean time between a breakdown of a machine and completion of repair of that machine.

(d) The probability that, when two machines are up and one of them goes down, the second machine fails before the repair person arrives.

7. Consider again the random walk in Section 7.1.1. Find

$$\lim_{n \to \infty} \rho(X_n, X_{n+1}) \tag{7.89}$$

Hint: Apply the Law of Total Expectation to $E(X_n X_{n+1})$.

8. Consider a random variable X that has a continuous density. That implies that $G(u) = P(X > u)$ has a derivative. Differentiate (7.47) with respect to r, then set r = 0, resulting in a differential equation for G. Solve that equation to show that the only continuous densities that produce the memoryless property are those in the exponential family.


Chapter 8

Introduction to Queuing Models

8.1 Introduction

Like other areas of applied stochastic processes, queuing theory has a vast literature, covering a huge number of variations on different types of queues. Our tutorial here can only scratch the surface of this field. Here is a rough overview of a few of the large categories of queuing theory:

• Single-server queues.
• Networks of queues, including open networks (in which jobs arrive from outside the network, visit some of the servers in the network, then leave) and closed networks (in which jobs continually circulate within the network, never leaving).
• Non-First Come, First Served (FCFS) service orderings. For example, there are Last Come, First Served (i.e. stacks) and Processor Sharing (which models CPU timesharing).

In this brief introduction, we will not discuss non-FCFS queues, and will only scratch the surface on the other topics.

8.2 M/M/1

The first M here stands for "Markov" or "memoryless," alluding to the fact that arrivals to the queue are Markovian, i.e. interarrival times are i.i.d. exponentially distributed. The second M means that the service times are also i.i.d. exponential. Denote the reciprocals of the mean interarrival and service times by $\lambda$ and $\mu$.

The 1 in M/M/1 refers to the fact that there is a single server. We will assume FCFS job scheduling here, but close inspection of the derivation will show that it applies to some other kinds of scheduling too.

This system is a continuous-time Markov chain, with the state X(t) at time t being the number of jobs in the system (not just in the queue but also including the one currently being served, if any).


8.2.1 Steady-State Probabilities

Intuitively the steady-state probabilities $\pi_i$ will exist only if $\lambda < \mu$. Otherwise jobs would come in faster than they could be served, and the queue would become infinite. So, we assume that $u < 1$, where $u = \lambda/\mu$.

Clearly this is a birth-and-death chain. For state k, the birth rate $\rho_{k,k+1}$ is $\lambda$ and the death rate $\rho_{k,k-1}$ is $\mu$, k = 0, 1, 2, ..., except that the death rate at state 0 is 0. Using the formula derived for birth/death chains, we have that

\[
\pi_i = u^i \pi_0, \quad i \geq 0 \tag{8.1}
\]

and

\[
\pi_0 = \frac{1}{\sum_{j=0}^{\infty} u^j} = 1 - u \tag{8.2}
\]

In other words,

\[
\pi_i = u^i (1 - u), \quad i \geq 0 \tag{8.3}
\]

Note by the way that since $\pi_0 = 1 - u$, u is the utilization of the server, i.e. the proportion of the time the server is busy. In fact, this can be seen intuitively: Think of a very long period of time of length t. During this time approximately $\lambda t$ jobs will have arrived, keeping the server busy for approximately $\lambda t \cdot \frac{1}{\mu}$ time. Thus the fraction of time during which the server is busy is approximately

\[
\frac{\lambda t \cdot \frac{1}{\mu}}{t} = \frac{\lambda}{\mu} \tag{8.4}
\]

8.2.2 Mean Queue Length

Another way to look at Equation (8.3) is as follows. Let the random variable N have the long-run distribution of X(t), so that

\[
P(N = i) = u^i (1 - u), \quad i \geq 0 \tag{8.5}
\]

Then this says that N+1 has a geometric distribution, with "success" probability 1-u. (N itself is not quite geometrically distributed, since N's values begin at 0 while a geometric distribution begins at 1.)

Thus the long-run average value EN for X(t) will be the mean of that geometric distribution, minus 1, i.e.

\[
EN = \frac{1}{1-u} - 1 = \frac{u}{1-u} \tag{8.6}
\]
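As a quick numerical check of Equations (8.3) and (8.6), the following Python sketch computes the M/M/1 steady-state probabilities and compares a truncated version of the sum $\sum_i i \pi_i$ with the closed form u/(1-u). The function names and the truncation point are illustrative assumptions, not part of the text.

```python
def mm1_steady_state(lam, mu, n_terms):
    """Return pi_0, ..., pi_{n_terms-1} for an M/M/1 queue: pi_i = u^i (1 - u)."""
    u = lam / mu
    if not 0 <= u < 1:
        raise ValueError("steady state requires lam < mu")
    return [(u ** i) * (1 - u) for i in range(n_terms)]


def mm1_mean_jobs(lam, mu):
    """Closed-form long-run mean number of jobs in the system, EN = u / (1 - u)."""
    u = lam / mu
    return u / (1 - u)


if __name__ == "__main__":
    lam, mu = 0.5, 1.0                      # hypothetical arrival and service rates
    pi = mm1_steady_state(lam, mu, 200)     # truncate the infinite state space
    truncated_mean = sum(i * p for i, p in enumerate(pi))
    print(sum(pi), truncated_mean, mm1_mean_jobs(lam, mu))
```

With u well below 1 the truncated sums agree with the closed forms to many decimal places; as u approaches 1, more terms are needed.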


The long-run mean queue length EQ will be this value minus the mean number of jobs being served. The latter quantity is $1 - \pi_0 = u$, so

\[
EQ = \frac{u^2}{1-u} \tag{8.7}
\]

8.2.3 Distribution of Residence Time/Little's Rule

Let R denote the residence time of a job, i.e. the time elapsed from the job's arrival to its exit from the system. Little's Rule says that

\[
EN = \lambda \, ER \tag{8.8}
\]

This property holds for a variety of queuing systems, including this one. It can be proved formally, but here is the intuition:

Think of a particular job (in the literature of queuing theory, it is called a "tagged job") at the time it has just exited the system. If this is an "average" job, then it has been in the system for ER amount of time, during which an average of $\lambda \, ER$ new jobs have arrived behind it. These jobs now comprise the total number of jobs in the system, which in the average case is EN.

Applying Little's Rule here, we know EN from Equation (8.6), so we can solve for ER:

\[
ER = \frac{1}{\lambda} \cdot \frac{u}{1-u} = \frac{1/\mu}{1-u} \tag{8.9}
\]

With a little more work, we can find the actual distribution of R, not just its mean. When a job arrives, say there are N jobs ahead of it, including one in service. Then this job's value of R can be expressed as

\[
R = S_{self} + S_{1,resid} + S_2 + \cdots + S_N \tag{8.10}
\]

where $S_{self}$ is the service time for this job, $S_{1,resid}$ is the remaining time for the job now being served (i.e. the residual life), and for $i > 1$ the random variable $S_i$ is the service time for the i-th waiting job.

Then the Laplace transform of R, evaluated at say w, is

\[
\begin{aligned}
E(e^{-wR}) &= E\left[e^{-w(S_{self} + S_{1,resid} + S_2 + \cdots + S_N)}\right] \\
&= E\left( E\left[e^{-w(S_{self} + S_{1,resid} + S_2 + \cdots + S_N)} \mid N\right] \right) \\
&= E\left[g(w)^{N+1}\right]
\end{aligned}
\tag{8.11}
\]

where

\[
g(w) = E(e^{-wS}) \tag{8.12}
\]


is the Laplace transform of the service variable, i.e. of an exponential distribution with parameter equal to the service rate $\mu$. Here we have made use of these facts:

• The Laplace transform of a sum of independent random variables is the product of their individual Laplace transforms.
• Due to the memoryless property, $S_{1,resid}$ has the same distribution as do the other $S_i$.
• The distribution of the service times $S_i$ and queue length N observed by our tagged job is the same as the distributions of those quantities at all times, not just at arrival times of tagged jobs. This property can be proven for this kind of queue and many others, and is called PASTA (Poisson Arrivals See Time Averages).

Note that the PASTA property is not obvious. On the contrary, given our experience with the Bus Paradox and length-biased sampling in Section 2.5, we should be wary of such things. But the PASTA property does hold and can be proven.

But that last term, $E[g(w)^{N+1}]$, is (setting v = g(w) for convenience)

\[
E(v^{N+1}) = \sum_{k=0}^{\infty} v^{k+1} P(N = k) = \sum_{k=0}^{\infty} v^{k+1} u^k (1-u) = \frac{v(1-u)}{1 - vu} \tag{8.13}
\]

Finally, by definition of Laplace transform,

\[
v = g(w) = E(e^{-wS}) = \int_0^{\infty} e^{-wt} \mu e^{-\mu t} \, dt = \frac{\mu}{w + \mu} \tag{8.14}
\]

So, from (8.11), (8.13) and (8.14), the Laplace transform of R is

\[
\frac{\mu(1-u)}{w + \mu(1-u)} \tag{8.15}
\]

Hey, look at that! Equation (8.15) has the same form as (8.14). In other words, we have discovered that R has an exponential distribution too, only with parameter $\mu(1-u)$ instead of $\mu$.

This is quite remarkable. The fact that the service and interarrival times are exponential doesn't mean that everything else will be exponential too, so it is surprising that R does turn out to have an exponential distribution.

It is even more surprising in that R is a sum of independent exponential random variables, as we saw in (8.10), and we know that such sums have Erlang distributions. The resolution of this seeming paradox is that the number of terms N in (8.10) is itself random. Conditional on N, R has an Erlang distribution, but unconditionally R has an exponential distribution.
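The algebra leading from (8.13) and (8.14) to (8.15) can be checked numerically. The Python sketch below sums the series in (8.13) term by term (truncated at an arbitrary number of terms) and compares it with the closed form in (8.15). The function names and parameter values are assumptions chosen for illustration.

```python
def laplace_exp(rate, w):
    """Laplace transform of an exponential(rate) random variable at w, as in (8.14)."""
    return rate / (w + rate)


def residence_transform_by_series(lam, mu, w, n_terms=2000):
    """E[g(w)^(N+1)] with P(N = k) = u^k (1 - u), summed directly (truncated series)."""
    u = lam / mu
    v = laplace_exp(mu, w)
    return sum(v ** (k + 1) * u ** k * (1 - u) for k in range(n_terms))


def residence_transform_closed(lam, mu, w):
    """Closed form (8.15): the transform of an exponential with rate mu * (1 - u)."""
    u = lam / mu
    return laplace_exp(mu * (1 - u), w)


if __name__ == "__main__":
    lam, mu, w = 0.8, 1.0, 0.3              # hypothetical rates and evaluation point
    print(residence_transform_by_series(lam, mu, w),
          residence_transform_closed(lam, mu, w))
```

The two values agree up to truncation error, which is consistent with R being exponentially distributed with parameter $\mu(1-u)$.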


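8.3 Multi-Server Models

Here we have c servers, with a common queue. When a job gets to the head of the queue, it is served by the first available server.

The state is again the number of jobs in the system, including any jobs at the servers. Again it is a birth/death chain, with $\rho_{i,i+1} = \lambda$ and

\[
\rho_{i,i-1} =
\begin{cases}
i\mu, & \text{if } 0 < i < c \\
c\mu, & \text{if } i \geq c
\end{cases}
\tag{8.16}
\]

The solution turns out to be

\[
\pi_k =
\begin{cases}
\pi_0 \left(\frac{\lambda}{\mu}\right)^k \frac{1}{k!}, & k < c \\
\pi_0 \left(\frac{\lambda}{\mu}\right)^k \frac{1}{c! \, c^{k-c}}, & k \geq c
\end{cases}
\]

[...]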

We consider one particular cell in the system. Mobile phone users drift in and out of the cell as they move around the city. A call can either be a new call, i.e. a call which someone has just dialed, or a handoff call, i.e. a call which had already been in progress in a neighboring cell but now has moved to this cell.

Each call in a cell needs a channel.[1] There are n channels available in the cell. We wish to give handoff calls priority over new calls.[2] This is accomplished as follows.

The system always reserves g channels for handoff calls. When a request for a new call (i.e. a non-handoff call) arrives, the system looks at X(t), the current number of calls in the cell. If that number is less than n-g, so that there are more than g idle channels available, the new call is accepted; otherwise it is rejected.

We assume that new calls originate from within the cells according to a Poisson process with rate $\lambda_1$, while handoff calls drift in from neighboring cells at rate $\lambda_2$. Meanwhile, call durations are exponential with rate $\mu_1$, while the time that a call remains within the cell is exponential with rate $\mu_2$.

We again have a birth/death process, though a bit more complicated than our earlier ones. Let $\lambda = \lambda_1 + \lambda_2$ and $\mu = \mu_1 + \mu_2$. Then here is a sample balance equation, focused on transitions into (left-hand side in the equation) and out of (right-hand side) state 1:

\[
\pi_0 \lambda + \pi_2 \, 2\mu = \pi_1 (\lambda + \mu) \tag{8.21}
\]

Here's why: How can we enter state 1? Well, we could do so from state 0, where there are no calls; this occurs if we get a new call (rate $\lambda_1$) or a handoff call (rate $\lambda_2$). In state 2, we enter state 1 if one of the two calls ends (rate $\mu_1$) or one of the two calls leaves the cell (rate $\mu_2$). The same kind of reasoning shows that we leave state 1 at rate $\lambda + \mu$.

As another example, here is the equation for state n-g:

\[
\pi_{n-g} \left[\lambda_2 + (n-g)\mu\right] = \pi_{n-g+1} \cdot (n-g+1)\mu + \pi_{n-g-1} \lambda \tag{8.22}
\]

Note the term $\lambda_2$ in (8.22), rather than $\lambda$ as in (8.21).

Using our birth/death formula for the $\pi_i$, we find that

\[
\pi_k =
\begin{cases}
\pi_0 \frac{A^k}{k!}, & k \leq n-g \\
\pi_0 \frac{A^{n-g}}{k!} A_1^{k-(n-g)}, & k \geq n-g
\end{cases}
\tag{8.23}
\]

where $A = \lambda/\mu$, $A_1 = \lambda_2/\mu$ and

\[
\pi_0 = \left[ \sum_{k=0}^{n-g-1} \frac{A^k}{k!} + \sum_{k=n-g}^{n} \frac{A^{n-g}}{k!} A_1^{k-(n-g)} \right]^{-1} \tag{8.24}
\]

One can calculate a number of interesting quantities from the $\pi_i$:

[1] This could be a certain frequency or a certain time slot position.
[2] We would rather give the caller of a new call a polite rejection message, e.g. "No lines available at this time," than suddenly terminate an existing conversation.


• The probability of a handoff call being rejected is $\pi_n$.

• The probability of a new call being dropped is

\[
\sum_{k=n-g}^{n} \pi_k \tag{8.25}
\]

• Since the per-channel utilization in state i is i/n, the overall long-run per-channel utilization is

\[
\sum_{i=0}^{n} \pi_i \frac{i}{n} \tag{8.26}
\]

• The long-run proportion of accepted calls which are handoff calls is the rate at which handoff calls are accepted, divided by the rate at which calls are accepted:

\[
\frac{\lambda_2 \sum_{i=0}^{n-1} \pi_i}{\lambda_1 \sum_{i=0}^{n-g-1} \pi_i + \lambda_2 \sum_{i=0}^{n-1} \pi_i} \tag{8.27}
\]

8.5 Nonexponential Service Times

The Markov property is of course crucial to the analyses we made above. Thus dropping the exponential assumption presents a major analytical challenge.

One queuing model which has been found tractable is M/G/1: Exponential interarrival times, general service times, one server. In fact, the mean queue length and related quantities can be obtained fairly easily, as follows.

Consider the residence time R for a tagged job. R is the time that our tagged job must first wait for completion of service of all jobs, if any, which are ahead of it (queued or now in service) plus the tagged job's own service time. Let $T_1, T_2, \ldots$ be i.i.d. with the distribution of a generic service time random variable S. $T_1$ represents the service time of the tagged job itself. $T_2, \ldots, T_N$ represent the service times of the queued jobs, if any.

Let N be the number of jobs in the system, either being served or queued; B be either 1 or 0, depending on whether the system is busy (i.e. N > 0) or not; and $S_{1,resid}$ be the remaining service time of the job currently being served, if any. Finally, we define, as before, $u = \frac{\lambda}{1/ES}$, the utilization. Note that this implies that EB = u.

Then the distribution of R is that of

\[
B S_{1,resid} + (T_1 + \cdots + T_N) + (1 - B) T_1 \tag{8.28}
\]

Note that if N = 0, then $T_1 + \cdots + T_N$ is considered to be 0, i.e. not present in (8.28).

Then


\[
\begin{aligned}
E(R) &= u E(S_{1,resid}) + E(T_1 + \cdots + T_N) + (1-u) E T_1 \\
&= u E(S_{1,resid}) + E(N) E(S) + (1-u) ES \\
&= u E(S_{1,resid}) + \lambda E(R) E(S) + (1-u) ES
\end{aligned}
\tag{8.29}
\]

The last equality is due to Little's Rule. Note also that we have made use of the PASTA property here, so that the distribution of N is the same at arrival times as at general times.

Then

\[
E(R) = \frac{u E(S_{1,resid})}{1-u} + ES \tag{8.30}
\]

Note that the two terms here represent the mean residence time as the mean queuing time plus the mean service time.

So we must find $E(S_{1,resid})$. This is just the mean of the remaining-life distribution which we saw in Section 9.6 of our unit on renewal theory. Then

\[
\begin{aligned}
E(S_{1,resid}) &= \int_0^{\infty} t \, \frac{1 - F_S(t)}{ES} \, dt \\
&= \frac{1}{ES} \int_0^{\infty} t \int_t^{\infty} f_S(u) \, du \, dt \\
&= \frac{1}{ES} \int_0^{\infty} f_S(u) \int_0^{u} t \, dt \, du \\
&= \frac{1}{2 ES} E(S^2)
\end{aligned}
\tag{8.31}
\]

(Here u is a variable of integration, not the utilization.)

So,

\[
E(R) = \frac{u E(S^2)}{2 ES (1-u)} + ES \tag{8.32}
\]

What is remarkable about this famous formula is that ER depends not only on the mean service time but also on the variance. This result, which is not so intuitively obvious at first glance, shows the power of modeling. We might observe the dependency of ER on the variance of service time empirically if we do simulation, but here is a compact formula that shows it for us.

8.6 Reversed Markov Chains

We can get insight into some kinds of queuing systems by making use of the concepts of reversed Markov chains, which involve "playing the Markov chain backward," just as we could play a movie backward.


Consider a continuous-time, irreducible, positive recurrent Markov chain X(t).[3] For any fixed time $\tau$ (typically thought of as large), define the reversed version of X(t) as $Y(t) = X(\tau - t)$, for $0 \leq t \leq \tau$. We will discuss a number of properties of reversed chains. These properties will enable what mathematicians call "soft analysis" of some Markov chains, especially those related to queues. This term refers to short, simple, elegant proofs or derivations.

8.6.1 Markov Property

The first property to note is that Y(t) is a Markov chain! Here is our first chance for soft analysis.

The "hard analysis" approach would be to start with the definition, which in continuous time would be that

\[
P(Y(t) = k \mid Y(u), u \leq s) = P(Y(t) = k \mid Y(s)) \tag{8.33}
\]

for all $0 < s < t$ [...]

8.6.4 Reversible Markov Chains

In some cases, the reversed chain has the same probabilistic structure as the original one! Note carefully what that would mean. In the continuous-time case, it would mean that $\tilde{\rho}_{ij} = \rho_{ij}$ for all i and j, where the $\tilde{\rho}_{ij}$ are the transition rates of Y(t).[4] If this is the case, we say that X(t) is reversible.

That is a very strong property. An example of a chain which is not reversible is the tree-search model in Section 7.4.5.[5] There the state space consists of all the nonnegative integers, and transitions were possible from states n to n+1 and from n to 0. Clearly this chain is not reversible, since we can go from n to 0 in one step but not vice versa.

8.6.4.1 Conditions for Checking Reversibility

Equation (8.34) shows that the original chain X(t) is reversible if and only if

\[
\pi_i \rho_{ij} = \pi_j \rho_{ji} \tag{8.36}
\]

for all i and j. These equations are called the detailed balance equations, as opposed to the general balance equations

\[
\sum_{j \neq i} \pi_j \rho_{ji} = \pi_i \lambda_i \tag{8.37}
\]

which are used to find the $\pi$ values. (Here $\lambda_i$ denotes the total rate of transitions out of state i.) Recall that (8.37) arises from equating the flow into state i with the flow out of it. By contrast, Equation (8.36) equates the flow into i from a particular state j to the flow from i to j. Again, that is a much stronger condition, so we can see that most chains are not reversible. However, a number of important ones are reversible, as we'll see.

Equations (8.36) may not be so easy to check, since for complex chains we may not be able to find closed-form expressions for the $\pi$ values. Thus it is desirable to have another test available for reversibility. One such test is Kolmogorov's Criterion:

The chain is reversible if and only if for any loop of states, the product of the transition rates is the same in both the forward and backward directions.

For example, consider the loop $i \to j \to k \to i$. Then we would check whether $\rho_{ij} \rho_{jk} \rho_{ki} = \rho_{ik} \rho_{kj} \rho_{ji}$.

Technically, we do have to check all loops. However, in many cases it should be clear that just a few loops are representative, as the other loops have the same structure.

Kolmogorov's Criterion trivially shows that birth-and-death processes are reversible, since any loop involves a path which is the same path when traversed in reverse.

[4] Note that for a continuous-time Markov chain, the transition rates do indeed uniquely determine the probabilistic structure of the chain, not just the long-run state proportions. The short-run behavior of the chain is also determined by the transition rates, and at least in theory can be calculated by solving differential equations whose coefficients make use of those rates.
[5] That is a discrete-time example, but the principle here is the same.
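For a chain with only a handful of states, Kolmogorov's Criterion can be checked by brute force. The Python sketch below enumerates all simple loops of three or more states and compares the forward and backward rate products. The dictionary representation of the rates and the function names are assumptions made for this illustration, and the enumeration is practical only for very small state spaces.

```python
from itertools import permutations


def loop_products(rho, loop):
    """Return (forward, backward) products of transition rates around a loop.

    rho maps a pair (i, j) to the transition rate from i to j; pairs that are
    absent are treated as having rate 0.
    """
    forward = backward = 1.0
    closed = list(loop) + [loop[0]]          # return to the starting state
    for a, b in zip(closed, closed[1:]):
        forward *= rho.get((a, b), 0.0)
        backward *= rho.get((b, a), 0.0)
    return forward, backward


def satisfies_kolmogorov(rho, states, tol=1e-12):
    """Check the loop condition over all simple loops of length 3 or more."""
    for size in range(3, len(states) + 1):
        for loop in permutations(states, size):
            forward, backward = loop_products(rho, loop)
            if abs(forward - backward) > tol:
                return False
    return True


if __name__ == "__main__":
    # A 4-state birth/death chain (hypothetical rates): passes the check.
    birth_death = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0,
                   (1, 0): 2.0, (2, 1): 4.0, (3, 2): 6.0}
    # A 3-state ring that moves faster in one direction: fails the check.
    ring = {(0, 1): 1.0, (1, 2): 1.0, (2, 0): 1.0,
            (1, 0): 2.0, (2, 1): 2.0, (0, 2): 2.0}
    print(satisfies_kolmogorov(birth_death, [0, 1, 2, 3]))
    print(satisfies_kolmogorov(ring, [0, 1, 2]))
```

In the birth/death example every loop of three or more states must use a jump of size two or more, which has rate 0 in both directions, so both products vanish and the condition holds.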


8.6.4.2 Making New Reversible Chains from Old Ones

Since reversible chains are so useful when we are lucky enough to have them, a very useful trick is to be able to form new reversible chains from old ones. The following two properties are very handy in that regard:

(a) Suppose U(t) and V(t) are reversible Markov chains, and define W(t) to be the tuple [U(t), V(t)]. Then W(t) is reversible.

(b) Suppose X(t) is a reversible Markov chain, and A is an irreducible subset of the state space of the chain, with long-run state distribution $\pi$. Define a chain W(t) with transition rates $\rho'_{ij}$ for $i \in A$, where $\rho'_{ij} = \rho_{ij}$ if $j \in A$ and $\rho'_{ij} = 0$ otherwise. Then W(t) is reversible, with long-run state distribution given by

\[
\pi'_i = \frac{\pi_i}{\sum_{j \in A} \pi_j} \tag{8.38}
\]

8.6.4.3 Example: Queues with a Common Waiting Area

Consider two M/M/1 queues, with chains G(t) and H(t), with independent arrival streams but having a common waiting area, with jobs arriving to a full waiting area simply being lost.[6]

First consider the case of an infinite waiting area. Let $u_1$ and $u_2$ be the utilizations of the two queues, as in (8.3). G(t) and H(t), being birth/death processes, are reversible. Then by property (a) above, the chain [G(t), H(t)] is also reversible. The long-run proportion of the time that there are m jobs in the first queue and n jobs in the second is

\[
a_{mn} = (1 - u_1) u_1^m (1 - u_2) u_2^n \tag{8.39}
\]

for m, n = 0, 1, 2, 3, ...

Now consider what would happen if these two queues were to have a common, finite waiting area. Denote the amount of space in the waiting area by w. Recall that the state m in the original queue G(t) is the number of jobs, including the one in service if any. That means the number of jobs waiting is $(m-1)^+$, where $x^+ = \max(x, 0)$. That means that for our new system, with the common waiting area, we should take our subset A to be

\[
\{(m,n) : m, n \geq 0, \; (m-1)^+ + (n-1)^+ \leq w\} \tag{8.40}
\]

So, by property (b) above, we know that the long-run state distribution for the queue with the common waiting area is

\[
\pi_{mn} = \frac{1}{a} (1 - u_1) u_1^m (1 - u_2) u_2^n \tag{8.41}
\]

[6] Adapted from Ronald Wolff, Stochastic Modeling and the Theory of Queues, Prentice Hall, 1989.


where

\[
a = \sum_{(i,j) \in A} (1 - u_1) u_1^i (1 - u_2) u_2^j \tag{8.42}
\]

In this example, reversibility was quite useful. It would have been essentially impossible to derive (8.41) algebraically. And even if intuition had suggested that solution as a guess, it would have been quite messy to verify the guess.

8.6.4.4 Closed-Form Expression for $\pi$ for Any Reversible Markov Chain

(Adapted from Ronald Nelson, Probability, Stochastic Processes and Queuing Theory, Springer-Verlag, 1995.)

Recall that most Markov chains, especially those with infinite state spaces, do not have closed-form expressions for the steady-state probabilities. But we can always get such expressions for reversible chains, as follows.

Choose a fixed state s, and find paths from s to all other states. Denote the path to i by

\[
s = j_{i1} \to j_{i2} \to \cdots \to j_{i m_i} = i \tag{8.43}
\]

Define

\[
\psi_i =
\begin{cases}
1, & i = s \\
\prod_{k=1}^{m_i - 1} r_{ik}, & i \neq s
\end{cases}
\tag{8.44}
\]

where

\[
r_{ik} = \frac{\rho(j_{ik}, j_{i,k+1})}{\rho(j_{i,k+1}, j_{ik})} \tag{8.45}
\]

so that there is one factor for each step of the path. Then the steady-state probabilities are

\[
\pi_i = \frac{\psi_i}{\sum_k \psi_k} \tag{8.46}
\]

You may notice that this looks similar to the derivation for birth/death processes, which, as has been pointed out, are reversible.
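The path recipe in (8.43) through (8.46) is short enough to sketch in Python. The code below assumes the chain is already known to be reversible and has finitely many states; the dictionary-based representation and the function name are illustrative choices, not taken from the text.

```python
def reversible_steady_state(rho, paths):
    """Steady-state probabilities of a reversible chain, one path per state.

    rho maps (i, j) to the transition rate from state i to state j.
    paths maps each state i to a list of states that starts at the reference
    state s and ends at i (the path for s itself is just [s]).
    The result is only meaningful if the chain really is reversible.
    """
    psi = {}
    for state, path in paths.items():
        value = 1.0
        for a, b in zip(path, path[1:]):
            value *= rho[(a, b)] / rho[(b, a)]    # one factor r_ik per step, as in (8.45)
        psi[state] = value
    total = sum(psi.values())
    return {state: value / total for state, value in psi.items()}


if __name__ == "__main__":
    # Hypothetical 5-state birth/death chain with birth rate 1 and death rate 2.
    lam, mu, top = 1.0, 2.0, 4
    rho = {}
    for i in range(top):
        rho[(i, i + 1)] = lam
        rho[(i + 1, i)] = mu
    paths = {i: list(range(i + 1)) for i in range(top + 1)}   # 0 -> 1 -> ... -> i
    print(reversible_steady_state(rho, paths))
```

For this birth/death example each step multiplies by lam/mu, so the result is a truncated and renormalized version of the geometric form in (8.3).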


8.7 Networks of Queues

8.7.1 Tandem Queues

Let's first consider an M/M/1 queue. As mentioned earlier, this is a birth/death process, thus reversible. This has an interesting and very useful application, as follows.

Think of the times at which jobs depart this system, i.e. the times at which jobs finish service. In the reversed process, these times are arrivals. Due to the reversibility, that means that the distribution of departure times is the same as that of arrival times. In other words:

• Departures from this system behave as a Poisson process with rate $\lambda$.

Also, let the initial state X(0) be distributed according to the steady-state probabilities $\pi$.[7] Due to the memoryless property of Poisson arrivals, the distribution of the system state at arrival times is the same as the distribution of the system state at nonrandom times t. Then by reversibility, we have that:

• The state distribution at departure times is the same as at nonrandom times.

And finally, noting that, given X(t), the states $\{X(s), s \leq t\}$ of the queue before time t are statistically independent of the arrival process after time t, reversibility gives us that:

• Given t, the departure process before time t is statistically independent of the states $\{X(s), s \geq t\}$ of the queue after time t.

Let's apply that to tandem queues, which are queues acting in series. Suppose we have two such queues, with the first, $X_1(t)$, feeding its output to the second one, $X_2(t)$, as input. Suppose the input into $X_1(t)$ is a Poisson process with rate $\lambda$, and service times at both queues are exponentially distributed, with rates $\mu_1$ and $\mu_2$.

$X_1(t)$ is an M/M/1 queue, so its steady-state probabilities are given by Equation (8.3), with $u = \lambda/\mu_1$.

By the first bulleted item above, we know that the input into $X_2(t)$ is also Poisson. Therefore, $X_2(t)$ also is an M/M/1 queue, with steady-state probabilities as in Equation (8.3), with $u = \lambda/\mu_2$.

Now, what about the joint distribution of $[X_1(t), X_2(t)]$? The third bulleted item above says that the input to $X_2(t)$ up to time t is independent of $\{X_1(s), s \geq t\}$. So, using the fact that we are assuming that $X_1(0)$ has the steady-state distribution, we have that

\[
P[X_1(t) = i, X_2(t) = j] = (1 - u_1) u_1^i \, P[X_2(t) = j] \tag{8.47}
\]

[7] This is just a device used to avoid saying things like "large n" and "large t" below.


Now letting $t \to \infty$, we get that the long-run probability of the vector $[X_1(t), X_2(t)]$ being equal to (i,j) is

\[
(1 - u_1) u_1^i (1 - u_2) u_2^j \tag{8.48}
\]

In other words, the steady-state distribution for the vector has the two components of the vector being independent.

Equation (8.48) is called a product form solution to the balance equations for steady-state probabilities.

By the way, the vector $[X_1(t), X_2(t)]$ is not reversible.

8.7.2 Jackson Networks

The tandem queues discussed in the last section comprise a special case of what are known as Jackson networks. Once again, there exists an enormous literature on Jackson and other kinds of queuing networks. The material can become very complicated (even the notation is very complex), and we will only present an introduction here. Our presentation is adapted from I. Mitrani, Modelling of Computer and Communication Systems, Cambridge University Press, 1987.

Our network consists of N nodes, and jobs move from node to node. There is a queue at each node, and service time at node i is exponentially distributed with mean $1/\mu_i$.

8.7.2.1 Open Networks

Each job originally arrives externally to the network, with the arrival rate at node i being $\gamma_i$. After moving among various nodes, the job will eventually leave the network. Specifically, after a job completes service at node i, it moves to node j with probability $q_{ij}$, where

\[
\sum_j q_{ij} < 1 \tag{8.49}
\]

reflecting the fact that the job will leave the network altogether with probability $1 - \sum_j q_{ij}$.[8] It is assumed that the movement from node to node is memoryless.

As an example, you may wish to think of movement of packets among routers in a computer network, with the packets being jobs and the routers being nodes.

Let $\lambda_i$ denote the total traffic rate into node i. By the usual equating of flow in and flow out, we have

\[
\lambda_i = \gamma_i + \sum_{j=1}^{N} \lambda_j q_{ji} \tag{8.50}
\]

Note that in Equations (8.50), the knowns are the $\gamma_i$ and the $q_{ji}$. We can solve this system of linear equations for the unknowns, the $\lambda_i$.

[8] By the way, $q_{ii}$ can be nonzero, allowing for feedback loops at nodes.
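As an illustration of solving the traffic equations (8.50), here is a minimal Python sketch that uses simple fixed-point iteration in place of a general linear solver. It assumes a small open network in which every job eventually leaves, so that the iteration converges. The function name, tolerance, and example routing matrix are hypothetical.

```python
def solve_traffic_equations(gamma, q, tol=1e-12, max_iter=100000):
    """Solve lambda_i = gamma_i + sum_j lambda_j * q[j][i] by fixed-point iteration.

    gamma[i] is the external arrival rate at node i, and q[j][i] is the
    probability that a job finishing service at node j moves next to node i.
    """
    n = len(gamma)
    lam = list(gamma)                                   # starting guess
    for _ in range(max_iter):
        new = [gamma[i] + sum(lam[j] * q[j][i] for j in range(n))
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, lam)) < tol:
            return new
        lam = new
    raise RuntimeError("no convergence; check that jobs can leave the network")


if __name__ == "__main__":
    # Two tandem queues: all external traffic enters node 0, then moves to node 1.
    print(solve_traffic_equations([1.0, 0.0], [[0.0, 1.0], [0.0, 0.0]]))
    # A two-node network with feedback between the nodes.
    print(solve_traffic_equations([1.0, 0.5], [[0.0, 0.5], [0.25, 0.0]]))
```

For the tandem case the sketch returns a total traffic rate of 1.0 at both nodes, matching the earlier observation that the input to the second queue is Poisson with the same rate as the input to the first.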


The utilization at node i is then $u_i = \lambda_i/\mu_i$, as before. Jackson's Theorem then says that in the long run, node i acts as an M/M/1 queue with that utilization, and that the nodes are independent in the long run:[9]

\[
\lim_{t \to \infty} P[X_1(t) = i_1, \ldots, X_N(t) = i_N] = \prod_{k=1}^{N} (1 - u_k) u_k^{i_k} \tag{8.51}
\]

So, again we have a product form solution.

Let $L_i$ denote the average number of jobs at node i. From Equation (8.6), we have $L_i = u_i/(1 - u_i)$. Thus the mean number of jobs in the system is

\[
L = \sum_{i=1}^{N} \frac{u_i}{1 - u_i} \tag{8.52}
\]

From this we can get the mean time that jobs stay in the network, W: From Little's Rule, $L = \gamma W$, so

\[
W = \frac{1}{\gamma} \sum_{i=1}^{N} \frac{u_i}{1 - u_i} \tag{8.53}
\]

where $\gamma = \gamma_1 + \cdots + \gamma_N$ is the total external arrival rate.

Jackson networks are not generally reversible. The reversed versions of Jackson networks are worth studying for other reasons, but we cannot pursue them here.

8.7.3 Closed Networks

In a closed Jackson network, we have for all i, $\gamma_i = 0$ and

\[
\sum_j q_{ij} = 1 \tag{8.54}
\]

In other words, jobs never enter or leave the network. There have been many models like this in the computer performance modeling literature. For instance, a model might consist of some nodes representing CPUs, some representing disk drives, and some representing users at terminals.

It turns out that we again get a product form solution.[10] The notation is more involved, so we will not present it here.

[9] We do not present the proof here, but it really is just a matter of showing that the distribution here satisfies the balance equations.
[10] This is confusing, since the different nodes are now not independent, due to the fact that the number of jobs in the overall system is constant.

Exercises

1. Investigate the robustness of the M/M/1 queue model with respect to the assumption of exponential service times, as follows. Suppose the service time is actually uniformly distributed on (0,c), so that the mean


service time would be c/2. Assume that arrivals do follow the exponential model, with mean interarrival time 1.0. Find the mean residence time, using (8.9), and compare it to the true value obtained from (8.32). Do this for various values of c, and graph the two curves using R.

2. Many mathematical analyses of queuing systems use finite source models. There are always a fixed number j of jobs in the system. A job queues up for the server, gets served in time S, then waits a random time W before queuing up for the server again.

A typical example would be a file server with j clients. The time W would be the time a client does work before it needs to access the file server again.

(a) Use Little's Rule, on two or more appropriately chosen boxes, to derive the following relation:

\[
ER = \frac{j \, ES}{U} - EW \tag{8.55}
\]

where R is residence time (time spent in the queue plus service time) in one cycle for a job and U is the utilization fraction of the server.

(b) Set up a continuous-time Markov chain, assuming exponential distributions for S and W, with state being the number of jobs currently at the server. Derive closed-form expressions for the $\pi_i$.

3. Consider the following variant of an M/M/1 queue: Each customer has a certain amount of patience, varying from one customer to another, exponentially distributed with rate $\eta$. When a customer's patience wears out while the customer is in the queue, he/she leaves (but not if his/her job is now in service). Arrival and service rates are $\lambda$ and $\mu$, respectively.

(a) Express the $\pi_i$ in terms of $\lambda$, $\mu$ and $\eta$.
(b) Express the proportion of lost jobs as a function of the $\pi_i$, $\lambda$, $\mu$ and $\eta$.


Chapter 9

Renewal Theory and Some Applications

9.1 Introduction

9.1.1 The Light Bulb Example, Generalized

Suppose a certain lamp is used continuously, and that whenever its bulb burns out, it is immediately replaced by a new one. Let N(t) denote the number of replacements, called renewals here, that have occurred up through time t. Assume the lifetimes of the bulbs, $L_1, L_2, L_3, \ldots$ are independent and identically distributed. The collection of random variables $N(t), t \geq 0$ is called a renewal process. The quantities $R_i = L_1 + \cdots + L_i$ are called the renewal points.

We will see that most renewal processes are not Markovian. However, time does "start over" at the renewal points. We say the process is regenerative at these points.

Note that although we are motivating this with the light bulb example, in which the $L_i$ measure time, the theory we will present here is not limited to such a context. All that is needed is that the $L_i$ be i.i.d. and nonnegative.

There is a very rich collection of mathematical material on renewal processes, and there are myriad applications to a variety of fields.

9.1.2 Duality Between "Lifetime Domain" and "Counts Domain"

A very important property of renewal processes is that

\[
N(t) \geq k \text{ if and only if } R_k \leq t \tag{9.1}
\]

This is just a formal mathematical statement of common sense: There have been at least k renewals by now if and only if the k-th renewal has already occurred! But it is a very important device in renewal analysis.

Equation (9.1) might be described as relating the counts domain (left-hand side of the equation) to the lifetimes domain (right-hand side).
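The duality in (9.1) is easy to check on simulated data. The Python sketch below builds renewal points from a list of lifetimes and counts renewals up to time t; the helper names and the uniform lifetime distribution in the demonstration are assumptions made purely for illustration.

```python
import random


def renewal_points(lifetimes):
    """Cumulative sums R_1, R_2, ... of the lifetimes L_1, L_2, ..."""
    points, total = [], 0.0
    for length in lifetimes:
        total += length
        points.append(total)
    return points


def count_renewals(points, t):
    """N(t): the number of renewal points that are at most t."""
    return sum(1 for r in points if r <= t)


if __name__ == "__main__":
    rng = random.Random(1)
    lifetimes = [rng.uniform(0.5, 1.5) for _ in range(50)]   # any i.i.d. nonnegative lifetimes
    points = renewal_points(lifetimes)
    t = 10.0
    n_t = count_renewals(points, t)
    # Duality (9.1): N(t) >= k exactly when the k-th renewal point is <= t.
    print(all((n_t >= k) == (points[k - 1] <= t) for k in range(1, 51)))
```

Because only finitely many lifetimes are generated, count_renewals matches N(t) only for t below the last renewal point, though the duality check itself holds for every k up to the number of generated lifetimes.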


9.2 Where We Are Going

Some of the material in Sections 9.3 and 9.4 of this PLN unit may seem a little theoretical. However, it all does have practical value, and it will also exercise some of the concepts you've learned in earlier units.

After those two sections, the focus will mainly be on concepts which apply the theory, and on specific examples.

9.3 Properties of Poisson Processes

9.3.1 Definition

A renewal process N(t) is called a Poisson process if each N(t) has a Poisson distribution, i.e. for some $\lambda$ we have

\[
P(N(t) = k) = \frac{e^{-\lambda t} (\lambda t)^k}{k!}, \quad k = 0, 1, 2, 3, \ldots \tag{9.2}
\]

The parameter $\lambda$ is called the intensity parameter of the process, and since $E[N(t)] = \lambda t$, it has the natural interpretation of the average rate at which renewal events occur per unit time.

9.3.2 Alternate Characterizations of Poisson Processes

9.3.2.1 Exponential Interrenewal Times

Theorem 12 A renewal process N(t) is a Poisson process if and only if the interrenewal times $L_i$ have an exponential distribution with parameter $\lambda$.

Note that this shows that a Poisson process is also a Markov chain (with state being the current number of renewals), due to the memoryless property of the exponential distribution.

Proof

For the "only if" part of the claim, note that from the "domain switching" discussed in Section 9.1.2,

\[
F_{L_1}(t) = 1 - P(L_1 > t) = 1 - P(N(t) = 0) = 1 - e^{-\lambda t} \tag{9.3}
\]

Differentiating with respect to t, we find that $f_{L_1}(t) = \lambda e^{-\lambda t}$. Since the $L_i$ are i.i.d. by definition of renewal processes, this shows that all the $L_i$ have an exponential distribution.
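Theorem 12 can be explored empirically. The following Python sketch generates exponential interrenewal times, counts the renewals that occur by time t, and compares the empirical mean count and the empirical frequency of zero renewals with $\lambda t$ and $e^{-\lambda t}$ from (9.2). The function names, seed, and parameter values are assumptions made for illustration, and a simulation of this kind illustrates the theorem rather than proving it.

```python
import math
import random


def simulate_count(lam, t, rng):
    """Number of renewals in (0, t] when interrenewal times are exponential(lam)."""
    elapsed, count = rng.expovariate(lam), 0
    while elapsed <= t:
        count += 1
        elapsed += rng.expovariate(lam)
    return count


def poisson_pmf(k, mean):
    """P(N = k) for a Poisson random variable with the given mean, as in (9.2)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


if __name__ == "__main__":
    rng = random.Random(12345)
    lam, t, reps = 2.0, 1.5, 20000          # hypothetical intensity, horizon, sample size
    counts = [simulate_count(lam, t, rng) for _ in range(reps)]
    print(sum(counts) / reps, lam * t)                       # empirical vs. theoretical mean
    print(counts.count(0) / reps, poisson_pmf(0, lam * t))   # empirical vs. e^(-lam t)
```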


9.3.2.2 Stationary, Independent Increments

Theorem 13 Suppose N(t) is a renewal process for which the $L_i$ are continuous random variables. Then N(t) is a Poisson process if and only if it has stationary, independent increments. The latter term means that for all $0 < r < s < t < u$, the increments N(s) - N(r) and N(u) - N(t) are independent random variables, and that the distribution of N(s+z) - N(r+z) is the same as that of N(s) - N(r) for all z > 0.

Proof

(Sketch.)

For the "only if" part: As noted above, Poisson processes have the Markov property, which immediately implies independent increments. Also, since "time starts over," say at time r, then for s > r we will have that N(s) - N(r) has the same distribution as N(s-r).

For the "if" part: The independence of the increments implies the Markov property, and since the only continuous distribution which is memoryless is the exponential, that implies that the $L_i$ have exponential distributions, which we saw above implies that N(t) is Poisson.

Note too that for these reasons N(s) - N(r) will have the same distribution as N(s-r).

We can use these properties to find the autocorrelation function of the Poisson process, which shows the correlation of the process with itself at different times s < t [...]

After performing the remaining calculations, we find that

\[
c(s,t) = \sqrt{\frac{s}{t}} \tag{9.9}
\]


9.3.3 Conditional Distribution of Renewal Times

Theorem 14 Suppose N(t) is a Poisson process. Let $R_i = L_1 + \cdots + L_i$ be the renewal times. Given N(t) = k, then $M_1, \ldots, M_k$ are i.i.d. U(0,t) (uniform distribution on (0,t)), where the $M_i$ comprise a random permutation of the $R_i$.

In other words, conditional on there being k renewals within (0,t), the unordered renewal times are independent and uniformly distributed on (0,t).

This fact often plays a key role in analyses of Poisson models.

Let's look at the intuitive meaning of this, using our "notebook" view, taking k = 3 and t = 12.0 for concreteness. Each line of the notebook would consist of the results of our observing the process up to time 12.0. The first column would show the $R_i$. In rows in which N(12.0) = 3, we would have second and third columns as well. The second column would consist of the $M_i$, and the third column would state which permutation is used to generate the second column from the first. Since there are six permutations, the third column would consist of random numbers in the set 1, 2, 3, 4, 5, 6, with 1 meaning the permutation 123, 2 meaning 132, 3 meaning 321 and so on.

Among all the rows for which N(12.0) = 3, 1/6 would have 1 in the third column, 1/6 would have 2, and so on. The "uniform distribution" part of the theorem is saying that among these rows, 1/2 of them will have, for instance, $M_1 < 6.0$. That will NOT be true for $R_1$: since $R_1$ is the smallest of the three renewal times, considerably more than 1/2 of the rows (7/8 of them) will have $R_1 < 6.0$. The "independence" part of the theorem is saying, for example, that 1/4 of these rows will have both $M_1 < 6.0$ and $M_2 < 6.0$.

Here's the proof.

Proof

Let Y denote the number of $M_i \leq s$, for s in (0,t). Then

\[
\begin{aligned}
P(Y = b \mid N(t) = k) &= \frac{P(Y = b \text{ and } N(t) = k)}{P[N(t) = k]} && (9.10) \\
&= \frac{P[N(s) = b \text{ and } N(t) - N(s) = k - b]}{P[N(t) = k]} && (9.11) \\
&= \frac{\frac{e^{-\lambda s} (\lambda s)^b}{b!} \cdot \frac{e^{-\lambda (t-s)} [\lambda (t-s)]^{k-b}}{(k-b)!}}{\frac{e^{-\lambda t} (\lambda t)^k}{k!}} && (9.12) \\
&= \binom{k}{b} \left(\frac{s}{t}\right)^b \left(1 - \frac{s}{t}\right)^{k-b} && (9.13)
\end{aligned}
\]

This is the probability that would arise if the $M_i$ were i.i.d. U(0,t) as claimed in the theorem.[1]

[1] To make this proof rigorous, though, one would have to explain why only such a distribution for the $M_i$ would produce this probability.
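The simplification from (9.12) to (9.13) can be confirmed numerically. The Python sketch below evaluates the ratio of Poisson probabilities directly and compares it with the binomial probability with success probability s/t. The function names are illustrative, and the value of the intensity is arbitrary, since it cancels out of the ratio.

```python
import math


def poisson_pmf(k, mean):
    """P(N = k) for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def conditional_count_prob(b, k, s, t, lam):
    """P(N(s) = b | N(t) = k) computed as in (9.11)-(9.12), using independent increments."""
    numerator = poisson_pmf(b, lam * s) * poisson_pmf(k - b, lam * (t - s))
    return numerator / poisson_pmf(k, lam * t)


def binomial_pmf(b, k, p):
    """Binomial probability of b successes in k trials, the right-hand side of (9.13)."""
    return math.comb(k, b) * p ** b * (1 - p) ** (k - b)


if __name__ == "__main__":
    k, s, t, lam = 3, 6.0, 12.0, 0.7        # the k = 3, t = 12.0 setting; lam is arbitrary
    for b in range(k + 1):
        print(b, conditional_count_prob(b, k, s, t, lam), binomial_pmf(b, k, s / t))
```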


9.3.4 Decomposition and Superposition of Poisson Processes

Theorem 15 Poisson processes can be split and combined:

(a) Let $N(t)$ be a Poisson process with intensity parameter $\lambda$. Say we decompose $N(t)$ into $N_1(t)$ and $N_2(t)$ by assigning each renewal in $N(t)$ to either $N_1(t)$ or $N_2(t)$ with probability p and 1-p respectively. Then the two resulting processes are again Poisson processes. They are independent of each other, and have intensity parameters $p\lambda$ and $(1-p)\lambda$.

(b) If we superimpose two independent Poisson processes $N_1(t)$ and $N_2(t)$, the result $N(t) = N_1(t) + N_2(t)$ will be a Poisson process, with intensity parameter equal to the sum of the two original parameters.

These properties are often useful in queuing models, where an arrival process is subdivided into two processes corresponding to two job classes.

9.3.5 Nonhomogeneous Poisson Processes

A useful variant of Poisson processes is the nonhomogeneous Poisson process. The key here is that the intensity parameter $\lambda$ varies over time. We will write it as $\lambda(t)$. I'll define it this way:

Definition 16 Let $\{N(t), t \ge 0\}$ be a counting process with independent increments, and let

$$m(t) = E[N(t)] \tag{9.14}$$

If for all 0

9.3.5.1 Example: Software Reliability

Nonhomogeneous Poisson process models have been used successfully to model the "arrivals" (i.e. discoveries) of bugs in software. Questions that arise are, for instance, "When are we ready to ship?", meaning when can we believe with some confidence that most bugs have been found?

Typically one collects data on bug discoveries from a number of projects of similar complexity, and estimates $m(t)$ from that data.

See for example "Estimating the Parameters of a Non-homogeneous Poisson-Process Model for Software Reliability," IEEE Transactions on Reliability, Vol. 42, No. 4, 1993.

9.4 Properties of General Renewal Processes

We now turn our attention to the general case, in which the $L_i$ are not necessarily exponentially distributed. We will still assume the $L_i$ to be continuous random variables, though.

9.4.1 The Regenerative Nature of Renewal Processes

Recall that Markov chains are "memoryless." If we are now at time t, "time starts over," and the probabilities of events after time t do not depend on what happened before time t.

If the $L_i$ are not exponentially distributed, $N(t)$ is not Markovian, since the exponential distribution is the only continuous memoryless distribution. However, it is true that time "starts over" at each renewal epoch $R_i = L_1 + \dots + L_i$. Note the difference: The definition of the Markov property concerns time starting over at a fixed time, t. Here, in the context of renewal processes, time starts over at random times, of the form $R_i$.

9.4.2 Some of the Main Theorems

Let $m(t) = E[N(t)]$. Many of the results concern this function m. Let $F(t) = P(L_i < t)$

If Z is a nonnegative continuous random variable, then

$$E(Z) = \int_0^\infty [1 - F_Z(t)] \, dt \tag{9.18}$$

Proof In the discrete case,

$$E(Z) = \sum_{i=1}^\infty i \, P(Z = i) \tag{9.19}$$

$$= \sum_{i=1}^\infty P(Z = i) \sum_{j=1}^i 1 \tag{9.20}$$

$$= \sum_{j=1}^\infty \sum_{i=j}^\infty P(Z = i) \tag{9.21}$$

$$= \sum_{j=1}^\infty P(Z \ge j) \tag{9.22}$$

$$= \sum_{j=0}^\infty P(Z > j) \tag{9.23}$$

$$= \sum_{j=0}^\infty [1 - F_Z(j)] \tag{9.24}$$

The continuous case is similar.

Now, let's apply that:

Theorem 18

$$m(t) = \sum_{n=1}^\infty F_n(t) \tag{9.25}$$

Proof Let $Z = N(t)$. Then, yet again switching domains as in Section 9.1.2,

$$m(t) = \sum_{j=1}^\infty P(N(t) \ge j) = \sum_{j=1}^\infty P(R_j \le t) = \sum_{j=1}^\infty F_j(t) \tag{9.26}$$
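Both results are easy to check numerically. The sketch below is an illustration under stated assumptions rather than anything from the original text: the function names and the example distribution are made up, and $F_n$ is taken to be the cdf of $R_n$, as the proof of Theorem 18 indicates. The first function computes $E(Z)$ for a finite discrete distribution directly and via the tail sum (9.23). The second sums $F_n(t)$ for exponential lifetimes, where $R_n$ has an Erlang distribution, and should reproduce the Poisson mean $\lambda t$.

```python
import math


def mean_two_ways(pmf):
    """pmf[i] = P(Z = i) for i = 0, 1, ..., len(pmf) - 1.

    Returns E(Z) computed directly and via the tail-sum form (9.23).
    """
    direct = sum(i * p for i, p in enumerate(pmf))
    tail = sum(sum(pmf[j + 1:]) for j in range(len(pmf)))  # sum of P(Z > j)
    return direct, tail


def poisson_mean_function(rate, t, terms=200):
    """Sum of F_n(t) for n = 1..terms with exponential lifetimes.

    F_n(t) = P(R_n <= t) is the Erlang cdf,
    1 - sum_{j < n} exp(-x) x^j / j!, with x = rate * t.
    """
    x = rate * t
    total = 0.0
    partial = 0.0            # running value of sum_{j < n} exp(-x) x^j / j!
    term = math.exp(-x)      # the j = 0 term
    for n in range(1, terms + 1):
        partial += term
        total += 1.0 - partial   # this is F_n(t)
        term *= x / n            # next term: exp(-x) x^n / n!
    return total


if __name__ == "__main__":
    print(mean_two_ways([0.1, 0.2, 0.3, 0.4]))
    print(poisson_mean_function(2.0, 3.5))
```

Truncating the infinite sum at 200 terms is harmless here because $F_n(t)$ is negligible once n is far above $\lambda t$.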


9.4.2.2 The Renewal Equation

Theorem 19 The function m satisfies the equation

$$m(t) = F(t) + \int_0^t m(t - w) f(w) \, dw \tag{9.27}$$

Proof Using the Law of Total Expectation, we have

$$m(t) = E[N(t)] = E\{E[N(t) \mid L_1]\} \tag{9.28}$$

But at time $L_1$ "time starts over again," by the regenerative property of renewal processes. So,

$$E[N(t) \mid L_1] = \begin{cases} 1 + m(t - L_1), & \text{if } L_1 < t \\ 0, & \text{otherwise} \end{cases} \tag{9.29}$$

Taking the expected value over $L_1$, which has density $f$, gives

$$m(t) = \int_0^t [1 + m(t - w)] f(w) \, dw = F(t) + \int_0^t m(t - w) f(w) \, dw$$

which is (9.27).

Proof For any function h for which the absolute value has a finite integral on $(0, \infty)$, let $\hat{h}$ denote the Laplace transform of h,

$$\hat{h}(s) = \int_0^\infty e^{-st} h(t) \, dt$$

Now, we will take the Laplace transform of both sides of (9.27). To do so, we will need to state some facts about Laplace transforms.

Remember, Laplace transforms are, except for a change of variable, just like generating functions or moment generating functions, which you probably studied in your undergraduate probability course. Thus Laplace transforms have the same properties as generating functions.

Now recall that the moment generating function of the sum of two independent nonnegative random variables is the product of their individual moment generating functions. Well, the density of such a sum has the same integral form as in (9.27), which we call a convolution. Even though the m part of the integral is not a density function, it is still true that the Laplace transform of a convolution of two functions is the product of their individual Laplace transforms.

So, taking the Laplace transform of both sides of (9.27) yields:

$$\hat{m}(s) = \hat{F}(s) + \hat{m}(s) \hat{f}(s) \tag{9.31}$$

i.e.

$$\hat{m}(s) = \frac{\hat{F}(s)}{1 - \hat{f}(s)} \tag{9.32}$$

This says that m uniquely determines the Laplace transforms of $F$ and $f$, and since there is a one-to-one correspondence between distributions and Laplace transforms, we see that m uniquely determines the interrenewal distribution. In other words, there is only one possible interrenewal distribution for any given mean function.

In fact, some similar analysis, which we will not present here, yields:

$$\hat{f}(s) = \frac{\hat{r}(s)}{1 + \hat{r}(s)} \tag{9.33}$$

where $r(t) = \frac{d}{dt} m(t)$. So, we can recover f from m.

As an example, we earlier saw that there are three equivalent definitions of a Poisson process, i.e. each implies the other. We can use the above result to show one of those equivalences:


Suppose a renewal process has stationary, independent increments. This would imply that $m(t) = ct$ for some c > 0, and thus $r(t) = c$. Then

$$\hat{r}(s) = \int_0^\infty e^{-st} c \, dt = \frac{c}{s} \tag{9.34}$$

so (9.33) gives us

$$\hat{f}(s) = \frac{c}{s + c} \tag{9.35}$$

This is the Laplace transform for the exponential distribution with parameter c. So, we see how the stationary, independent increments property implies exponential interrenewal times.

9.4.2.4 Asymptotic Behavior of m(t)

Theorem 21

$$\lim_{t \to \infty} \frac{m(t)}{t} = \frac{1}{E(L)} \tag{9.36}$$

This should make good intuitive sense to you. By the way, this and some other things we'll find in this unit can be used to formally prove some assertions we made on only an intuitive basis in early units.

9.5 Alternating Renewal Processes

9.5.1 Definition and Main Result

Suppose we have a sequence of pairs of random variables $(Y_n, Z_n)$ which are i.i.d. as pairs. In other words, for instance the pair $(Y_1, Z_1)$ is independent of, but has the same distribution as, $(Y_2, Z_2)$ and so on, but on the other hand $Y_n$ and $Z_n$ are allowed to be dependent on each other.

Imagine a machine being busy, then idle, then busy, then idle, and so on, with $Y_n$ being the amount of "on" time in the n-th cycle, and similarly $Z_n$ being the "off" time. Each time an on/off pair finishes, call that a "renewal." The sequence is called an alternating renewal process.

It is intuitively clear, and can be proven, that

$$\lim_{t \to \infty} P(\text{on at time } t) = \frac{E(Y)}{E(Y) + E(Z)} \tag{9.37}$$

and

$$\lim_{t \to \infty} P(\text{off at time } t) = \frac{E(Z)}{E(Y) + E(Z)} \tag{9.38}$$

Again, these results can be used to formally prove some assertions we have made in earlier units on just an intuitive basis.
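Result (9.37) can be illustrated by simulation. The following sketch makes its own assumptions, none of which come from the original text: the "on" time Y is uniform on (0,2), the "off" time is Z = 0.5Y plus an exponential variable with mean 1 (so that $Y_n$ and $Z_n$ are dependent, which the definition allows), and the observation time t = 50 is treated as large. Then E(Y) = 1 and E(Z) = 1.5, so (9.37) predicts a long-run "on" probability of 1/2.5 = 0.4.

```python
import random


def prob_on_at(t=50.0, reps=20_000, seed=2):
    """Estimate P(on at time t) for a hypothetical alternating renewal process.

    Y ~ U(0, 2) is the on time; Z = 0.5 * Y + Exp(mean 1) is the off time.
    These distributions are illustrative assumptions.
    """
    rng = random.Random(seed)
    on_count = 0
    for _ in range(reps):
        clock = 0.0  # start of the current on/off cycle
        while True:
            y = rng.uniform(0.0, 2.0)             # E(Y) = 1
            z = 0.5 * y + rng.expovariate(1.0)    # E(Z) = 0.5 + 1 = 1.5
            if clock + y > t:
                on_count += 1   # t falls in the on part of this cycle
                break
            if clock + y + z > t:
                break           # t falls in the off part of this cycle
            clock += y + z
    return on_count / reps


if __name__ == "__main__":
    print(prob_on_at())
```

Each replication corresponds to one row of the notebook: we run the machine up to time t and record whether it is on at that moment.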


9.5.2 Example: Inventory Problem (difficult)

Adapted from Stochastic Processes, by Sheldon Ross, Wiley, 1996.

Consider a vendor which uses an (s,S) policy for replenishing its inventory.$^2$ What this means is that after filling a customer order, if inventory of the item falls below level s, the inventory is replenished to level S.

Suppose customer orders arrive as a renewal process, with the $L_n$ in this case being i.i.d. interarrival times. Let $O_n$ denote the size of the n-th order, and let $I(t)$ denote the amount of inventory on hand at time t. Take $I(0)$ to be S. We wish to find the long-run distribution of $I(t)$, i.e.

$$\lim_{t \to \infty} P(I(t) \ge u) \tag{9.39}$$

for all s

$$N_x = \min\{n : O_1 + \dots + O_n > S - x\} \tag{9.41}$$

In other words, $N_x$ is the index of the first order which makes the inventory fall below x. All the time prior to this order, the inventory is at least x. Then our earlier point that all transitions into the on and off states coincide with some $R_i$ can be refined to say that

Customer number $N_u$ triggers the end of the "on" cycle. Thus the length of the "on" cycle is $R_{N_u}$.

Customer number $N_s$ ends the full cycle. Thus the length of the full cycle is $R_{N_s}$.

Therefore the fraction in (9.40) is equal to

$$\frac{E(R_{N_u})}{E(R_{N_s})} = \frac{E\left(\sum_{i=1}^{N_u} L_i\right)}{E\left(\sum_{i=1}^{N_s} L_i\right)} = \frac{E(N_u) E(L)}{E(N_s) E(L)} = \frac{E(N_u)}{E(N_s)} \tag{9.42}$$

$^2$ I generally try to reserve capital letters for names of random variables, using lower-case letters or Greek to denote constants, but am using Ross' notation here.


where L is a random variable having the common distribution of the $L_i$. Here we have used the fact that the $O_i$ are independent of the $L_j$.

Now consider a new renewal process, with the $O_i$ playing the role of "light bulb lifetimes." This is a bit difficult to get used to, since the $O_i$ are of course not times, but they are nonnegative random variables, and thus from the considerations of (9.1) we see that

$$\tilde{N}(t) = \max\{n : O_1 + \dots + O_n \le t\} \tag{9.43}$$

is indeed a renewal process. The importance of that renewal process is that

$$\tilde{N}(t) + 1 = N_{S-t} \tag{9.44}$$

Now let

$$m_O(x) = E[\tilde{N}(x)] \tag{9.45}$$

Then (9.40) and (9.42) imply that

$$\lim_{t \to \infty} P(I(t) \ge u) = \frac{m_O(S - u) + 1}{m_O(S - s) + 1} \tag{9.46}$$

Though Ross' treatment ends at this point, we can also extend it by using (9.36) if S-s is large, yielding

$$\lim_{t \to \infty} P(I(t) \ge u) \approx \frac{\frac{S-u}{EO} + 1}{\frac{S-s}{EO} + 1} = \frac{S - u + EO}{S - s + EO} \tag{9.47}$$

9.6 Residual-Life Distribution

It is assumed here that you know about the "bus paradox," described in Section 2.5 of our unit on continuous distributions.

9.6.1 Residual-Life Distribution

In the bus-paradox example, if we had been working with light bulbs instead of buses, the analog of the time we wait for the next bus would be the remaining lifetime of the current light bulb. So, in general, the time from a fixed time point t until the next renewal is known as the residual life. Another name for it is the forward recurrence time.

Let us derive the distribution of this quantity.

Here is a derivation for the continuous case. Consider a renewal process with inter-renewal times (i.e. "lifetimes") $L_n$, and define $D(t)$ to be the residual life of the current light bulb at time t.


For w > 0, form the alternating renewal process $(Y_n, Z_n)$, as follows. For the n-th light bulb, $Z_n$ is the portion of the bulb's overall lifetime $L_n$ during which the residual life is less than or equal to w. Say for example that $L_n$ is 500 and w is 202.6. Then $Z_n$ is 202.6 and $Y_n$ is 297.4. Basically $Z_n$ will be w and $Y_n$ will be $L_n - w$. However, w might be larger than $L_n$, so our formal definition is

$$Z_n = \min(L_n, w) \tag{9.48}$$

$$Y_n = L_n - Z_n \tag{9.49}$$

The main point is that the $Z_n$ portion of $L_n$ is the time during which $D(t) \le w$. Then from Equation (9.38) we have that

$$\lim_{t \to \infty} P(D(t) \le w) = \frac{E(Z)}{E(Y) + E(Z)} \tag{9.50}$$

$$= \frac{E[\min(L, w)]}{E(L)} \tag{9.51}$$

Suppose L is a continuous random variable. By Lemma 17 we have

$$E[\min(L, w)] = \int_0^\infty P[\min(L, w) > u] \, du = \int_0^w P(L > u) \, du = \int_0^w [1 - F_L(u)] \, du \tag{9.52}$$

since $P[\min(L, w) > u] = 0$ whenever $u > w$.

Substituting this into (9.50), and taking derivatives with respect to w, we have that

$$\lim_{t \to \infty} f_{D(t)}(w) = \frac{1 - F_L(w)}{E(L)} \tag{9.53}$$

This is a classic result, of central importance and usefulness, as seen in our upcoming examples later in this section.

9.6.2 Age Distribution

Analogous to the residual lifetime $D(t)$, let $A(t)$ denote the age (sometimes called the backward recurrence time) of the current light bulb, i.e. the length of time it has been in service. In the bus-paradox example, $A(t)$ would be the time which has elapsed since the last arrival of a bus, to the current time t. Using an approach similar to that taken above, one can show that

$$\lim_{t \to \infty} f_{A(t)}(w) = \frac{1 - F_L(w)}{E(L)} \tag{9.54}$$

In other words, $A(t)$ has the same long-run distribution as $D(t)$!
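The claim that age and residual life share the same limiting distribution can be checked by simulation. This is a minimal sketch under assumptions of my own choosing, not taken from the original text: lifetimes are uniform on (0,3), so E(L) = 1.5, and the observation time t = 30 is treated as large. For this lifetime distribution, (9.51) and (9.52) give the limiting cdf $(w - w^2/6)/1.5$ for $0 \le w \le 3$, which equals 5/9 at w = 1.

```python
import random


def age_and_residual(t=30.0, reps=20_000, seed=3):
    """Observe a renewal process with U(0,3) lifetimes at time t.

    Returns two lists: the age A(t) and the residual life D(t),
    one pair per replication. The lifetime distribution is an assumption.
    """
    rng = random.Random(seed)
    ages, residuals = [], []
    for _ in range(reps):
        last = 0.0                       # most recent renewal at or before t
        nxt = rng.uniform(0.0, 3.0)      # next renewal time
        while nxt <= t:
            last = nxt
            nxt += rng.uniform(0.0, 3.0)
        ages.append(t - last)
        residuals.append(nxt - t)
    return ages, residuals


def limiting_cdf(w):
    """E[min(L, w)] / E(L) from (9.51) for L ~ U(0,3), valid for 0 <= w <= 3."""
    return (w - w * w / 6.0) / 1.5


if __name__ == "__main__":
    ages, residuals = age_and_residual()
    print(sum(a <= 1.0 for a in ages) / len(ages))
    print(sum(d <= 1.0 for d in residuals) / len(residuals))
    print(limiting_cdf(1.0))
```

Both empirical fractions should land close to the theoretical value, even though one looks backward from t and the other looks forward.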


Here is a derivation for the case in which the $L_i$ are discrete. Remember, our fixed observation point t is assumed large, so that the system is in steady-state. Let W denote the lifetime so far for the current bulb. Then we have a Markov chain in which our state at any time is the value of W. Say we have a new bulb at time 52. Then W is 0 at that time. If the total lifetime turns out to be, say, 12, then W will be 0 again at time 64.

For i = 0,1,... the transition probabilities are

$$p_{i,0} = P(L = i+1 \mid L > i) = \frac{p_L(i+1)}{1 - F_L(i)} \tag{9.55}$$

$$p_{i,i+1} = \frac{1 - F_L(i+1)}{1 - F_L(i)} \tag{9.56}$$

The structure of these equations allows a closed-form solution. Let

$$q_i = \frac{1 - F_L(i+1)}{1 - F_L(i)} \tag{9.57}$$

and write

$$\pi_{i+1} = \pi_i q_i \tag{9.58}$$

Applying (9.58) recursively, we have

$$\pi_{i+1} = \pi_0 \, q_i q_{i-1} \cdots q_0 \tag{9.59}$$

But the right-hand side of (9.59) telescopes down to

$$\pi_{i+1} = \pi_0 [1 - F_L(i+1)] \tag{9.60}$$

since $F_L(0) = 0$. Then

$$1 = \sum_{i=0}^\infty \pi_i = \pi_0 \sum_{i=0}^\infty [1 - F_L(i)] = \pi_0 E(L) \tag{9.61}$$

by (9.24). Thus

$$\pi_i = \frac{1 - F_L(i)}{E(L)} \tag{9.62}$$

in analogy to (9.54).
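The closed-form solution can be compared with a direct numerical computation of the stationary distribution. The sketch below is illustrative only: the function names and the example lifetime distribution (values 1, 2, 3 with probabilities 0.25, 0.5, 0.25) are assumptions. It builds the transitions (9.55) and (9.56), finds the stationary distribution by repeatedly applying the transition probabilities (power iteration), and compares the result with $[1 - F_L(i)]/E(L)$.

```python
def age_chain_stationary(pmf, sweeps=2000):
    """pmf[k-1] = P(L = k) for k = 1..len(pmf), all entries positive.

    Returns (pi, surv): pi is the stationary distribution of the age chain
    found by power iteration, and surv[i] = 1 - F_L(i).
    """
    n = len(pmf)                                        # states 0..n-1
    surv = [1.0 - sum(pmf[:i]) for i in range(n + 1)]   # P(L > i)
    pi = [1.0 / n] * n
    for _ in range(sweeps):
        new = [0.0] * n
        for i in range(n):
            stay = surv[i + 1] / surv[i]     # p_{i,i+1}, as in (9.56)
            new[0] += pi[i] * (1.0 - stay)   # p_{i,0}, as in (9.55)
            if i + 1 < n:
                new[i + 1] += pi[i] * stay
        pi = new
    return pi, surv


def closed_form(pmf):
    """[1 - F_L(i)] / E(L) for i = 0..len(pmf)-1."""
    n = len(pmf)
    surv = [1.0 - sum(pmf[:i]) for i in range(n)]
    mean_l = sum(k * p for k, p in enumerate(pmf, start=1))
    return [s / mean_l for s in surv]


if __name__ == "__main__":
    pmf = [0.25, 0.5, 0.25]
    print(age_chain_stationary(pmf)[0])
    print(closed_form(pmf))
```

For this example E(L) = 2 and the survival values are 1, 0.75 and 0.25, so the closed form gives 0.5, 0.375 and 0.125.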


9.6.3 Mean of the Residual and Age Distributions

Taking the expected value of (9.53) or (9.54), we get a double integral. Reversing the order of integration, we find that the mean residual life or age is given by

$$\frac{E(L^2)}{2 E(L)} \tag{9.63}$$

9.6.4 Example: Estimating Web Page Modification Rates

My paper, "Estimation of Internet File-Access/Modification Rates," ACM Transactions on Modeling and Computer Simulation, 2005, 15, 3, 233-253, concerns the following problem.

Suppose we are interested in the rate of modification of a file in some FTP repository on the Web. We have a spider visit the site at regular intervals. At each visit, the spider records the time of last modification to the site. We do not observe how MANY times the site was modified. The problem then is how to estimate the modification rate from the last-modification time data that we do have.

I assumed that the modifications follow a renewal process. Then the difference between the spider visit time and the time of last modification is equal to the age $A(t)$. I then applied a lot of renewal theory to develop statistical estimators for the modification rate.

9.6.5 Example: The (S,s) Inventory Model Again

Here I extend Ross' example that we saw in Section 9.5.2.

When an order causes the inventory to go below s, we must dip into our reserves to fill it. Let R be the amount of reserves we must draw upon. Assuming that we always have sufficient reserves, what is the distribution of R?

Recall the renewal process $\tilde{N}(t)$ in Section 9.5.2. Then the distribution of R is that of the residual life for the process $\tilde{N}(t)$, given approximately by (9.53).

Suppose for instance that S = 20.0, s = 2.5 and $f_O(x) = 2x$ for 0

Some tracks will contain data from only one file. (The file may extend onto other tracks as well.) Let's find the long-run proportion of tracks which have this property.

Think of the disk as consisting of a Very Long Line, with the end of one track being followed immediately by the beginning of the next track. The points at which files begin then form a renewal process, with "time" being distance along the Very Long Line. If we observe the disk at the end of the k-th track, this is observing at "time" k. That track consists entirely of one file if and only if the "age" A of the current file (i.e. the distance back to the beginning of that file) is greater than 1.0.

Then from Equation (9.54), we have

$$f_A(w) = \frac{1 - \frac{w}{3}}{1.5} = \frac{2}{3} - \frac{2}{9} w \tag{9.65}$$

Then

$$P(A > 1) = \int_1^3 \left(\frac{2}{3} - \frac{2}{9} w\right) dw = \frac{4}{9} \tag{9.66}$$

9.6.7 Example: Event Sets in Discrete Event Simulation (difficult)

Discrete event simulation involves systems whose states change in a discrete rather than a continuous manner. For example, the number of packets currently waiting in a network router changes discretely, while the air temperature in a weather model changes continuously.

A discrete event simulation program must maintain an event set, which is a data structure containing all the pending events. To make this concrete, suppose we are simulating a k-server queue.$^4$ There are two kinds of events: job service completions and customer arrivals. Since we have k servers, at any time in the simulation we could have as many as k+1 pending events. If k = 2, for instance, our event set could, say, consist of a service completion for Machine 1 at time 124.3, one for Machine 2 at 95.4, and a customer arrival at time 99.0.

The core of the program consists of a main loop that repeatedly performs the following steps:

(a) Delete the earliest member of the event set.

(b) Update simulated time to the time for that event.

(c) Generate a new event triggered by the one just executed.

(d) Add the new event to the event set.

In our k-server queue example, say the event in (a) is a service completion. Then if there are jobs waiting in the queue, this will trigger the start of service for the new job at the head of the queue, in (c).
$^4$ For the details, using the SimPy language, see my PLN unit introducing discrete event simulation, at http://heather.cs.ucdavis.edu/~matloff/156/PLN/DESimIntro.pdf.
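Steps (a) through (d) of the main loop can be sketched in a few lines. This is an illustration under assumptions, not the implementation discussed in the text or in the SimPy unit: it uses a single server (k = 1), exponential interarrival and service times with made-up rates, and Python's standard heapq module (a binary heap) to hold the event set. The function name and the event labels are hypothetical.

```python
import heapq
import random


def simulate_queue(arrival_rate=1.0, service_rate=1.5, end_time=100.0, seed=4):
    """Single-server queue driven by an event set held in a binary heap.

    Returns the list of executed events as (time, kind) pairs.
    Rates, end time and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    events = [(rng.expovariate(arrival_rate), "arrival")]   # the event set
    now, waiting, busy = 0.0, 0, False
    log = []
    while events:
        time, kind = heapq.heappop(events)   # (a) delete the earliest event
        if time > end_time:
            break
        now = time                           # (b) update simulated time
        log.append((now, kind))
        if kind == "arrival":
            # (c) an arrival triggers the next arrival; (d) add it to the set
            heapq.heappush(events, (now + rng.expovariate(arrival_rate), "arrival"))
            if busy:
                waiting += 1
            else:
                busy = True   # server was idle, so service starts at once
                heapq.heappush(events, (now + rng.expovariate(service_rate), "completion"))
        else:
            # (c) a completion triggers service for the next waiting job, if any
            if waiting > 0:
                waiting -= 1
                heapq.heappush(events, (now + rng.expovariate(service_rate), "completion"))
            else:
                busy = False
    return log


if __name__ == "__main__":
    for entry in simulate_queue(end_time=5.0):
        print(entry)
```

With one server the event set never holds more than two pending events, one arrival and one completion, matching the k+1 bound mentioned above.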


Due to (a), the event list is a priority queue, and thus any of the wealth of data structures for priority queues could be used to implement it. Here we will assume the simplest one, a linear linked list, which is always maintained in sorted order.

The question is this: In (d) above, we need to search the list to determine where to insert our new event. In order to enhance our program's run speed, should we start our search at the head of the list or at its tail?

An answer to this question was provided in an old paper: "On the Distribution of Event Times for the Notices in a Simulation Event List," Jean Vaucher, INFOR, June 1977. Vaucher realized that this problem was right up renewal theory's alley. Our presentation here is adapted from that paper.

First, think of all events that are ever generated during the entire simulation. They comprise the renewal process. Let $L_i$ denote the simulated time between the (i-1)-st and i-th events.

Let $t_c$ denote the time of the event in (a), and let $t_d$ denote the earliest time in the event list after step (a) is performed. Let L denote the duration of the event generated in (c), so that that event's simulated occurrence time will be $t_c + L$.

We will assume that the size of the event list stays constant at r. This is true for many simulations, and approximately true for many others. Now, the question of where in the event list to place our new event is equivalent to asking how many events in the list have their times $t_e$ after $t_c + L$. That in turn is the same as asking how many of the events have a residual life Z greater than L. Call that number $K$.

The quantity of interest is

$$\alpha = \frac{E(K)}{r} \tag{9.67}$$

If $\alpha < 0.5$ we should start our search at the head of the list; otherwise we should start at the tail.

So, let's derive E(K). First write

$$E(K) = E[E(K \mid L)] \tag{9.68}$$

To simplify notation, let g denote the residual life density (9.53). Then, using the point above concerning residual life and keeping in mind that L is a "constant" in our conditional computation here, we have

$$E(K \mid L) = r P(Z > L) = r \int_L^\infty g(t) \, dt \tag{9.69}$$

Now use this in (9.68):

$$E(K) = r \int_0^\infty \left[\int_s^\infty g(t) \, dt\right] f_L(s) \, ds \tag{9.70}$$

One then does an integration by parts, making use of the fact that $g'(z) = -f_L(z)/E(L)$. After all the dust clears,


it turns out that

$$\alpha = 1 - E(L) \int_0^\infty g^2(t) \, dt \tag{9.71}$$

In Vaucher's paper, he then evaluated this expression for various distributions of L, and found that for most of the distributions he tried, $\alpha$ was in the 0.6 to 0.8 range. In other words, one typically should start the search from the tail end.

9.6.8 Example: Memory Paging Model

Adapted from Probability and Statistics, with Reliability, Queuing and Computer Science Applications by K.S. Trivedi, Prentice-Hall, 1982 and 2002.

Consider a computer with an address space consisting of n pages, and a program which generates a sequence of memory references with addresses (page numbers) $D_1, D_2, \dots$ In this simple model, the $D_i$ are assumed to be i.i.d. integer-valued random variables.

For each page i, let $T_{ij}$ denote the time at which the j-th reference to page i occurs. Then for each fixed i, the $T_{ij}$ form a renewal process, and thus all the theory we have developed here applies.$^5$ Let $F_i$ be the cumulative distribution function for the interrenewal distribution, i.e. $F_i(m) = P(L_{ij} \le m)$, where $L_{ij} = T_{ij} - T_{i,j-1}$, for m = 0,1,2,...

Let $W(t, \tau)$ denote the working set at time t, i.e. the collection of page numbers of pages accessed during the time $(t - \tau, t)$, and let $S(t, \tau)$ denote the size of that set. We are interested in finding the value of

$$s(\tau) = \lim_{t \to \infty} E[S(t, \tau)] \tag{9.72}$$

Since the definition of the working set involves looking backward $\tau$ amount of time from time t, a good place to look for an approach to finding $s(\tau)$ might be to use the limiting distribution of backward-recurrence time, given by the discrete analog of Equation (9.54).

Accordingly, let $A_i(t)$ be the age at time t for page i. Page i is in the working set if and only if it has been accessed after time $t - \tau$, i.e. $A_i(t) < \tau$. Thus, using the discrete analog of (9.54) and letting $1_i$ be 1 or 0 according to whether or not $A_i(t) < \tau$, we have that

$$s(\tau) = \lim_{t \to \infty} E\left(\sum_{i=1}^n 1_i\right) = \lim_{t \to \infty} \sum_{i=1}^n P(A_i(t) < \tau) = \sum_{i=1}^n \sum_{j=0}^{\tau - 1} \frac{1 - F_i(j)}{E(L_i)} \tag{9.73}$$

$^5$ Note, though, that all random variables here are discrete, not continuous.
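Formula (9.73) can be evaluated directly in this i.i.d. reference model. The sketch below rests on assumptions that go slightly beyond the text: page i is referenced with some probability $p_i$ at each time step (the values used are made up), so that the interrenewal time $L_i$ is geometric with $P(L_i > j) = (1 - p_i)^j$ and $E(L_i) = 1/p_i$, and the window is taken to contain $\tau$ references. Under those assumptions the result should agree with a direct count, the expected number of distinct pages among $\tau$ i.i.d. references.

```python
def working_set_size(probs, tau):
    """Evaluate (9.73) when page i is referenced with probability probs[i]
    at each time step, so that L_i is geometric with mean 1 / probs[i]."""
    total = 0.0
    for p in probs:
        mean_l = 1.0 / p
        # 1 - F_i(j) = P(L_i > j) = (1 - p)^j for the geometric distribution
        total += sum((1.0 - p) ** j for j in range(tau)) / mean_l
    return total


def direct_working_set_size(probs, tau):
    """Expected number of distinct pages among tau i.i.d. references:
    page i appears at least once with probability 1 - (1 - p_i)^tau."""
    return sum(1.0 - (1.0 - p) ** tau for p in probs)


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.2]   # hypothetical page-reference probabilities
    for tau in (1, 4, 16):
        print(tau, working_set_size(probs, tau), direct_working_set_size(probs, tau))
```

The agreement follows from the geometric series: summing $p_i (1 - p_i)^j$ over j from 0 to $\tau - 1$ gives $1 - (1 - p_i)^\tau$.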