Grinstead and Snell's Introduction to Probability

The CHANCE Project

Version dated 4 July 2006

Copyright (C) 2006 Peter G. Doyle. This work is a version of Grinstead and Snell's `Introduction to Probability, 2nd edition', published by the American Mathematical Society, Copyright (C) 2003 Charles M. Grinstead and J. Laurie Snell. This work is freely redistributable under the terms of the GNU Free Documentation License.

To our wives and in memory of Reese T. Prosser

Contents

Preface
1 Discrete Probability Distributions
    1.1 Simulation of Discrete Probabilities
    1.2 Discrete Probability Distributions
2 Continuous Probability Densities
    2.1 Simulation of Continuous Probabilities
    2.2 Continuous Density Functions
3 Combinatorics
    3.1 Permutations
    3.2 Combinations
    3.3 Card Shuffling
4 Conditional Probability
    4.1 Discrete Conditional Probability
    4.2 Continuous Conditional Probability
    4.3 Paradoxes
5 Distributions and Densities
    5.1 Important Distributions
    5.2 Important Densities
6 Expected Value and Variance
    6.1 Expected Value
    6.2 Variance of Discrete Random Variables
    6.3 Continuous Random Variables
7 Sums of Random Variables
    7.1 Sums of Discrete Random Variables
    7.2 Sums of Continuous Random Variables
8 Law of Large Numbers
    8.1 Discrete Random Variables
    8.2 Continuous Random Variables
9 Central Limit Theorem
    9.1 Bernoulli Trials
    9.2 Discrete Independent Trials
    9.3 Continuous Independent Trials
10 Generating Functions
    10.1 Discrete Distributions
    10.2 Branching Processes
    10.3 Continuous Densities
11 Markov Chains
    11.1 Introduction
    11.2 Absorbing Markov Chains
    11.3 Ergodic Markov Chains
    11.4 Fundamental Limit Theorem
    11.5 Mean First Passage Time
12 Random Walks
    12.1 Random Walks in Euclidean Space
    12.2 Gambler's Ruin
    12.3 Arc Sine Laws
Appendices
Index

Preface

Probability theory began in seventeenth century France when the two great French mathematicians, Blaise Pascal and Pierre de Fermat, corresponded over two problems from games of chance. Problems like those Pascal and Fermat solved continued to influence such early researchers as Huygens, Bernoulli, and DeMoivre in establishing a mathematical theory of probability. Today, probability theory is a well-established branch of mathematics that finds applications in every area of scholarly activity from music to physics, and in daily experience from weather prediction to predicting the risks of new medical treatments.

This text is designed for an introductory probability course taken by sophomores, juniors, and seniors in mathematics, the physical and social sciences, engineering, and computer science. It presents a thorough treatment of probability ideas and techniques necessary for a firm understanding of the subject. The text can be used in a variety of course lengths, levels, and areas of emphasis.

For use in a standard one-term course, in which both discrete and continuous probability is covered, students should have taken as a prerequisite two terms of calculus, including an introduction to multiple integrals. In order to cover Chapter 11, which contains material on Markov chains, some knowledge of matrix theory is necessary.

The text can also be used in a discrete probability course. The material has been organized in such a way that the discrete and continuous probability discussions are presented in a separate, but parallel, manner.
This organization dispels an overly rigorous or formal view of probability and offers some strong pedagogical value in that the discrete discussions can sometimes serve to motivate the more abstract continuous probability discussions. For use in a discrete probability course, students should have taken one term of calculus as a prerequisite.

Very little computing background is assumed or necessary in order to obtain full benefits from the use of the computing material and examples in the text. All of the programs that are used in the text have been written in each of the languages TrueBASIC, Maple, and Mathematica.

This book is distributed on the Web as part of the Chance Project, which is devoted to providing materials for beginning courses in probability and statistics. The computer programs, solutions to the odd-numbered exercises, and current errata are also available at this site. Instructors may obtain all of the solutions by writing to either of the authors, at jlsnell@dartmouth.edu and cgrinst1@swarthmore.edu.

FEATURES

Level of rigor and emphasis: Probability is a wonderfully intuitive and applicable field of mathematics. We have tried not to spoil its beauty by presenting too much formal mathematics. Rather, we have tried to develop the key ideas in a somewhat leisurely style, to provide a variety of interesting applications to probability, and to show some of the nonintuitive examples that make probability such a lively subject.

Exercises: There are over 600 exercises in the text providing plenty of opportunity for practicing skills and developing a sound understanding of the ideas. In the exercise sets are routine exercises to be done with and without the use of a computer and more theoretical exercises to improve the understanding of basic concepts. More difficult exercises are indicated by an asterisk.
A solution manual for all of the exercises is available to instructors.

Historical remarks: Introductory probability is a subject in which the fundamental ideas are still closely tied to those of the founders of the subject. For this reason, there are numerous historical comments in the text, especially as they deal with the development of discrete probability.

Pedagogical use of computer programs: Probability theory makes predictions about experiments whose outcomes depend upon chance. Consequently, it lends itself beautifully to the use of computers as a mathematical tool to simulate and analyze chance experiments.

In the text the computer is utilized in several ways. First, it provides a laboratory where chance experiments can be simulated and the students can get a feeling for the variety of such experiments. This use of the computer in probability has been already beautifully illustrated by William Feller in the second edition of his famous text An Introduction to Probability Theory and Its Applications (New York: Wiley, 1950). In the preface, Feller wrote about his treatment of fluctuation in coin tossing: "The results are so amazing and so at variance with common intuition that even sophisticated colleagues doubted that coins actually misbehave as theory predicts. The record of a simulated experiment is therefore included."

In addition to providing a laboratory for the student, the computer is a powerful aid in understanding basic results of probability theory. For example, the graphical illustration of the approximation of the standardized binomial distributions to the normal curve is a more convincing demonstration of the Central Limit Theorem than many of the formal proofs of this fundamental result. Finally, the computer allows the student to solve problems that do not lend themselves to closed-form formulas such as waiting times in queues.
Indeed, the introduction of the computer changes the way in which we look at many problems in probability. For example, being able to calculate exact binomial probabilities for experiments up to 1000 trials changes the way we view the normal and Poisson approximations.

ACKNOWLEDGMENTS

Anyone writing a probability text today owes a great debt to William Feller, who taught us all how to make probability come alive as a subject matter. If you find an example, an application, or an exercise that you really like, it probably had its origin in Feller's classic text, An Introduction to Probability Theory and Its Applications.

We are indebted to many people for their help in this undertaking. The approach to Markov Chains presented in the book was developed by John Kemeny and the second author. Reese Prosser was a silent co-author for the material on continuous probability in an earlier version of this book. Mark Kernighan contributed 40 pages of comments on the earlier edition. Many of these comments were very thought-provoking; in addition, they provided a student's perspective on the book. Most of the major changes in this version of the book have their genesis in these notes.

Fuxing Hou and Lee Nave provided extensive help with the typesetting and the figures. John Finn provided valuable pedagogical advice on the text and the computer programs. Karl Knaub and Jessica Sklar are responsible for the implementations of the computer programs in Mathematica and Maple. Jessica and Gang Wang assisted with the solutions.

Finally, we thank the American Mathematical Society, and in particular Sergei Gelfand and John Ewing, for their interest in this book; their help in its production; and their willingness to make the work freely redistributable.

Chapter 1
Discrete Probability Distributions

1.1 Simulation of Discrete Probabilities

Probability

In this chapter, we shall first consider chance experiments with a finite number of possible outcomes $\omega_1, \omega_2, \ldots, \omega_n$. For example, we roll a die and the possible outcomes are 1, 2, 3, 4, 5, 6 corresponding to the side that turns up. We toss a coin with possible outcomes H (heads) and T (tails).

It is frequently useful to be able to refer to an outcome of an experiment. For example, we might want to write the mathematical expression which gives the sum of four rolls of a die. To do this, we could let $X_i$, $i = 1, 2, 3, 4$, represent the values of the outcomes of the four rolls, and then we could write the expression

$$X_1 + X_2 + X_3 + X_4$$

for the sum of the four rolls. The $X_i$'s are called random variables. A random variable is simply an expression whose value is the outcome of a particular experiment. Just as in the case of other types of variables in mathematics, random variables can take on different values.

Let $X$ be the random variable which represents the roll of one die. We shall assign probabilities to the possible outcomes of this experiment. We do this by assigning to each outcome $\omega_j$ a nonnegative number $m(\omega_j)$ in such a way that

$$m(\omega_1) + m(\omega_2) + \cdots + m(\omega_6) = 1.$$

The function $m(\omega_j)$ is called the distribution function of the random variable $X$. For the case of the roll of the die we would assign equal probabilities or probabilities 1/6 to each of the outcomes. With this assignment of probabilities, one could write

$$P(X \le 4) = \frac{2}{3}$$

to mean that the probability is 2/3 that a roll of a die will have a value which does not exceed 4.

Let $Y$ be the random variable which represents the toss of a coin. In this case, there are two possible outcomes, which we can label as H and T.
Unless we have reason to suspect that the coin comes up one way more often than the other way, it is natural to assign the probability of 1/2 to each of the two outcomes.

In both of the above experiments, each outcome is assigned an equal probability. This would certainly not be the case in general. For example, if a drug is found to be effective 30 percent of the time it is used, we might assign a probability .3 that the drug is effective the next time it is used and .7 that it is not effective. This last example illustrates the intuitive frequency concept of probability. That is, if we have a probability $p$ that an experiment will result in outcome $A$, then if we repeat this experiment a large number of times we should expect that the fraction of times that $A$ will occur is about $p$. To check intuitive ideas like this, we shall find it helpful to look at some of these problems experimentally. We could, for example, toss a coin a large number of times and see if the fraction of times heads turns up is about 1/2. We could also simulate this experiment on a computer.

Simulation

We want to be able to perform an experiment that corresponds to a given set of probabilities; for example, $m(\omega_1) = 1/2$, $m(\omega_2) = 1/3$, and $m(\omega_3) = 1/6$. In this case, one could mark three faces of a six-sided die with an $\omega_1$, two faces with an $\omega_2$, and one face with an $\omega_3$.

In the general case we assume that $m(\omega_1), m(\omega_2), \ldots, m(\omega_n)$ are all rational numbers, with least common denominator $n$. If $n > 2$, we can imagine a long cylindrical die with a cross-section that is a regular $n$-gon. If $m(\omega_j) = n_j/n$, then we can label $n_j$ of the long faces of the cylinder with an $\omega_j$, and if one of the end faces comes up, we can just roll the die again. If $n = 2$, a coin could be used to perform the experiment.

We will be particularly interested in repeating a chance experiment a large number of times.
Although the cylindrical die would be a convenient way to carry out a few repetitions, it would be difficult to carry out a large number of experiments. Since the modern computer can do a large number of operations in a very short time, it is natural to turn to the computer for this task.

Random Numbers

We must first find a computer analog of rolling a die. This is done on the computer by means of a random number generator. Depending upon the particular software package, the computer can be asked for a real number between 0 and 1, or an integer in a given set of consecutive integers. In the first case, the real numbers are chosen in such a way that the probability that the number lies in any particular subinterval of this unit interval is equal to the length of the subinterval. In the second case, each integer has the same probability of being chosen.

    .203309  .762057  .151121  .623868
    .932052  .415178  .716719  .967412
    .069664  .670982  .352320  .049723
    .750216  .784810  .089734  .966730
    .946708  .380365  .027381  .900794

Table 1.1: Sample output of the program RandomNumbers.

Let $X$ be a random variable with distribution function $m(\omega)$, where $\omega$ is in the set $\{\omega_1, \omega_2, \omega_3\}$, and $m(\omega_1) = 1/2$, $m(\omega_2) = 1/3$, and $m(\omega_3) = 1/6$. If our computer package can return a random integer in the set $\{1, 2, \ldots, 6\}$, then we simply ask it to do so, and make 1, 2, and 3 correspond to $\omega_1$, 4 and 5 correspond to $\omega_2$, and 6 correspond to $\omega_3$. If our computer package returns a random real number $r$ in the interval $(0, 1)$, then the expression

$$\lfloor 6r \rfloor + 1$$

will be a random integer between 1 and 6. (The notation $\lfloor x \rfloor$ means the greatest integer not exceeding $x$, and is read "floor of $x$.")

The method by which random real numbers are generated on a computer is described in the historical discussion at the end of this section.
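
The mapping from a random real number to the three outcomes can be sketched in a few lines. The following is a minimal illustration in Python, not one of the book's programs (those are written in TrueBASIC, Maple, and Mathematica). The function names `random_face`, `sample_outcome`, and `estimate_distribution`, the string labels for the outcomes, and the fixed seed are all assumptions made for this sketch. It computes $\lfloor 6r \rfloor + 1$ from a random real $r$ and then groups the six faces so that the outcomes occur with probabilities 1/2, 1/3, and 1/6.

```python
import math
import random
from collections import Counter


def random_face(rng):
    """Return a random integer in {1, ..., 6}, computed as floor(6r) + 1."""
    r = rng.random()  # Python returns a real number in [0, 1)
    return math.floor(6 * r) + 1


def sample_outcome(rng):
    """Map faces 1-3 to omega_1, faces 4-5 to omega_2, and face 6 to omega_3."""
    face = random_face(rng)
    if face <= 3:
        return "omega_1"
    if face <= 5:
        return "omega_2"
    return "omega_3"


def estimate_distribution(trials, seed=0):
    """Return the observed proportion of each outcome over `trials` samples."""
    rng = random.Random(seed)  # the seed is an arbitrary choice for repeatability
    counts = Counter(sample_outcome(rng) for _ in range(trials))
    names = ("omega_1", "omega_2", "omega_3")
    return {name: counts[name] / trials for name in names}


if __name__ == "__main__":
    print(estimate_distribution(60_000))
```

With a large number of trials the three proportions should be close to 1/2, 1/3, and 1/6; the exact figures depend on the seed and on the generator used.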
The following example gives sample output of the program RandomNumbers.

Example 1.1 (Random Number Generation) The program RandomNumbers generates $n$ random real numbers in the interval $[0, 1]$, where $n$ is chosen by the user. When we ran the program with $n = 20$, we obtained the data shown in Table 1.1. □

Example 1.2 (Coin Tossing) As we have noted, our intuition suggests that the probability of obtaining a head on a single toss of a coin is 1/2. To have the computer toss a coin, we can ask it to pick a random real number in the interval $[0, 1]$ and test to see if this number is less than 1/2. If so, we shall call the outcome heads; if not we call it tails. Another way to proceed would be to ask the computer to pick a random integer from the set $\{0, 1\}$. The program CoinTosses carries out the experiment of tossing a coin $n$ times. Running this program, with $n = 20$, resulted in:

    THTTTHTTTTHTTTTTHHTT.

Note that in 20 tosses, we obtained 5 heads and 15 tails. Let us toss a coin $n$ times, where $n$ is much larger than 20, and see if we obtain a proportion of heads closer to our intuitive guess of 1/2. The program CoinTosses keeps track of the number of heads. When we ran this program with $n = 1000$, we obtained 494 heads. When we ran it with $n = 10000$, we obtained 5039 heads.

We notice that when we tossed the coin 10,000 times, the proportion of heads was close to the "true value" .5 for obtaining a head when a coin is tossed. A mathematical model for this experiment is called Bernoulli Trials (see Chapter 3).
The Law of Large Numbers, which we shall study later (see Chapter 8), will show that in the Bernoulli Trials model, the proportion of heads should be near .5, consistent with our intuitive idea of the frequency interpretation of probability.

Of course, our program could be easily modified to simulate coins for which the probability of a head is $p$, where $p$ is a real number between 0 and 1. □

In the case of coin tossing, we already knew the probability of the event occurring on each experiment. The real power of simulation comes from the ability to estimate probabilities when they are not known ahead of time. This method has been used in the recent discoveries of strategies that make the casino game of blackjack favorable to the player. We illustrate this idea in a simple situation in which we can compute the true probability and see how effective the simulation is.

Example 1.3 (Dice Rolling) We consider a dice game that played an important role in the historical development of probability. The famous letters between Pascal and Fermat, which many believe started a serious study of probability, were instigated by a request for help from a French nobleman and gambler, Chevalier de Méré. It is said that de Méré had been betting that, in four rolls of a die, at least one six would turn up. He was winning consistently and, to get more people to play, he changed the game to bet that, in 24 rolls of two dice, a pair of sixes would turn up. It is claimed that de Méré lost with 24 and felt that 25 rolls were necessary to make the game favorable. It was un grand scandale that mathematics was wrong.

We shall try to see if de Méré is correct by simulating his various bets. The program DeMere1 simulates a large number of experiments, seeing, in each one, if a six turns up in four rolls of a die.
When we ran this program for 1000 plays, a six came up in the first four rolls 48.6 percent of the time. When we ran it for 10,000 plays this happened 51.98 percent of the time.

We note that the result of the second run suggests that de Méré was correct in believing that his bet with one die was favorable; however, if we had based our conclusion on the first run, we would have decided that he was wrong. Accurate results by simulation require a large number of experiments. □

The program DeMere2 simulates de Méré's second bet that a pair of sixes will occur in $n$ rolls of a pair of dice. The previous simulation shows that it is important to know how many trials we should simulate in order to expect a certain degree of accuracy in our approximation. We shall see later that in these types of experiments, a rough rule of thumb is that, at least 95% of the time, the error does not exceed the reciprocal of the square root of the number of trials. Fortunately, for this dice game, it will be easy to compute the exact probabilities. We shall show in the next section that for the first bet the probability that de Méré wins is $1 - (5/6)^4 = .518$.

[Figure 1.1: Peter's winnings in 40 plays of heads or tails.]

One can understand this calculation as follows: The probability that no 6 turns up on the first toss is $(5/6)$. The probability that no 6 turns up on either of the first two tosses is $(5/6)^2$. Reasoning in the same way, the probability that no 6 turns up on any of the first four tosses is $(5/6)^4$. Thus, the probability of at least one 6 in the first four tosses is $1 - (5/6)^4$. Similarly, for the second bet, with 24 rolls, the probability that de Méré wins is $1 - (35/36)^{24} = .491$, and for 25 rolls it is $1 - (35/36)^{25} = .506$.
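
The exact probabilities above, and a simulation in the spirit of DeMere1 and DeMere2, can be checked with a short Python sketch. This is an illustration only and not the book's program. The names `exact_win_probability` and `simulate_bet`, their parameters, and the fixed seed are assumptions introduced here. The exact formula is the complement rule used in the text, $1 - (1 - p)^k$ for $k$ rolls with success probability $p$ on each roll.

```python
import random


def exact_win_probability(p_success, rolls):
    """Probability of at least one success in `rolls` independent rolls."""
    return 1 - (1 - p_success) ** rolls


def simulate_bet(rolls, dice, plays, seed=0):
    """Estimate the chance that all `dice` show six at least once in `rolls` rolls."""
    rng = random.Random(seed)  # the seed is an arbitrary choice for repeatability
    wins = 0
    for _play in range(plays):
        for _roll in range(rolls):
            if all(rng.randint(1, 6) == 6 for _die in range(dice)):
                wins += 1
                break  # this play is won; move on to the next play
    return wins / plays


if __name__ == "__main__":
    print(round(exact_win_probability(1 / 6, 4), 3))    # first bet: 0.518
    print(round(exact_win_probability(1 / 36, 24), 3))  # second bet, 24 rolls: 0.491
    print(round(exact_win_probability(1 / 36, 25), 3))  # second bet, 25 rolls: 0.506
    plays = 10_000
    # The rule of thumb gives an error bound of about 1 / sqrt(plays).
    print(simulate_bet(4, 1, plays), "+/-", plays ** -0.5)
```

The simulated proportion varies with the seed, which is the point of the rule of thumb: with 10,000 plays the error bound is about .01, which is still too coarse to separate .491 from .506 reliably.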
Using the rule of thumb mentioned above, it would require 27,000 rolls to have a reasonable chance to determine these probabilities with sufficient accuracy to assert that they lie on opposite sides of .5. It is interesting to ponder whether a gambler can detect such probabilities with the required accuracy from gambling experience. Some writers on the history of probability suggest that de Méré was, in fact, just interested in these problems as intriguing probability problems.

Example 1.4 (Heads or Tails) For our next example, we consider a problem where the exact answer is difficult to obtain but for which simulation easily gives the qualitative results. Peter and Paul play a game called heads or tails. In this game, a fair coin is tossed a sequence of times; we choose 40. Each time a head comes up Peter wins 1 penny from Paul, and each time a tail comes up Peter loses 1 penny to Paul. For example, if the results of the 40 tosses are

    THTHHHHTTHTHHTTHHTTTTHHHTHHTHHHTHHHTTTHH,

Peter's winnings may be graphed as in Figure 1.1.

Peter has won 6 pennies in this particular game. It is natural to ask for the probability that he will win $j$ pennies; here $j$ could be any even number from $-40$ to 40. It is reasonable to guess that the value of $j$ with the highest probability is $j = 0$, since this occurs when the number of heads equals the number of tails. Similarly, we would guess that the values of $j$ with the lowest probabilities are $j = \pm 40$.

A second interesting question about this game is the following: How many times in the 40 tosses will Peter be in the lead? Looking at the graph of his winnings (Figure 1.1), we see that Peter is in the lead when his winnings are positive, but we have to make some convention when his winnings are 0 if we want all tosses to contribute to the number of times in the lead.
We adopt the convention that, when Peter's winnings are 0, he is in the lead if he was ahead at the previous toss and not if he was behind at the previous toss. With this convention, Peter is in the lead 34 times in our example. Again, our intuition might suggest that the most likely number of times to be in the lead is 1/2 of 40, or 20, and the least likely numbers are the extreme cases of 40 or 0.

It is easy to settle this by simulating the game a large number of times and keeping track of the number of games in which Peter's final winnings are $j$, and the number of games in which Peter is in the lead $k$ times. The proportions over all games then give estimates for the corresponding probabilities. The program HTSimulation carries out this simulation. Note that when there are an even number of tosses in the game, it is possible to be in the lead only an even number of times. We have simulated this game 10,000 times. The results are shown in Figures 1.2 and 1.3. These graphs, which we call spike graphs, were generated using the program Spikegraph. The vertical line, or spike, at position $x$ on the horizontal axis, has a height equal to the proportion of outcomes which equal $x$.

Our intuition about Peter's final winnings was quite correct, but our intuition about the number of times Peter was in the lead was completely wrong. The simulation suggests that the least likely number of times in the lead is 20 and the most likely is 0 or 40. This is indeed correct, and the explanation for it is suggested by playing the game of heads or tails with a large number of tosses and looking at a graph of Peter's winnings. In Figure 1.4 we show the results of a simulation of the game, for 1000 tosses and in Figure 1.5 for 10,000 tosses.
In the second example Peter was ahead most of the time.
It is a remarkable fact, however, that, if play is continued long enough, Peter's winnings will continue to come back to 0, but there will be very long times between the times that this happens. These and related results will be discussed in Chapter 12. □

[Figure 1.2: Distribution of winnings.]
[Figure 1.3: Distribution of number of times in the lead.]
[Figure 1.4: Peter's winnings in 1000 plays of heads or tails.]
[Figure 1.5: Peter's winnings in 10,000 plays of heads or tails.]

In all of our examples so far, we have simulated equiprobable outcomes. We illustrate next an example where the outcomes are not equiprobable.

Example 1.5 (Horse Races) Four horses (Acorn, Balky, Chestnut, and Dolby) have raced many times. It is estimated that Acorn wins 30 percent of the time, Balky 40 percent of the time, Chestnut 20 percent of the time, and Dolby 10 percent of the time.

We can have our computer carry out one race as follows: Choose a random number $x$. If $x < .3$ then we say that Acorn won. If $.3 \le x < .7$ then Balky wins. If $.7 \le x < .9$ then Chestnut wins. Finally, if $.9 \le x$ then Dolby wins.

The program HorseRace uses this method to simulate the outcomes of $n$ races. Running this program for $n = 10$ we found that Acorn won 40 percent of the time, Balky 20 percent of the time, Chestnut 10 percent of the time, and Dolby 30 percent of the time. A larger number of races would be necessary to have better agreement with the past experience. Therefore we ran the program to simulate 1000 races with our four horses.
Although very tired after all these races, they performed in a manner quite consistent with our estimates of their abilities. Acorn won 29.8 percent of the time, Balky 39.4 percent, Chestnut 19.5 percent, and Dolby 11.3 percent of the time.

The program GeneralSimulation uses this method to simulate repetitions of an arbitrary experiment with a finite number of outcomes occurring with known probabilities. □

Historical Remarks

Anyone who plays the same chance game over and over is really carrying out a simulation, and in this sense the process of simulation has been going on for centuries. As we have remarked, many of the early problems of probability might well have been suggested by gamblers' experiences.

It is natural for anyone trying to understand probability theory to try simple experiments by tossing coins, rolling dice, and so forth. The naturalist Buffon tossed a coin 4040 times, resulting in 2048 heads and 1992 tails. He also estimated the number $\pi$ by throwing needles on a ruled surface and recording how many times the needles crossed a line (see Section 2.1). The English biologist W. F. R. Weldon [1] recorded 26,306 throws of 12 dice, and the Swiss scientist Rudolf Wolf [2] recorded 100,000 throws of a single die without a computer. Such experiments are very time-consuming and may not accurately represent the chance phenomena being studied. For example, for the dice experiments of Weldon and Wolf, further analysis of the recorded data showed a suspected bias in the dice. The statistician Karl Pearson analyzed a large number of outcomes at certain roulette tables and suggested that the wheels were biased. He wrote in 1894:

    Clearly, since the Casino does not serve the valuable end of huge laboratory for the preparation of probability statistics, it has no scientific raison d'être. Men of science cannot have their most refined theories disregarded in this shameless manner!
    The French Government must be urged by the hierarchy of science to close the gaming-saloons; it would be, of course, a graceful act to hand over the remaining resources of the Casino to the Académie des Sciences for the endowment of a laboratory of orthodox probability; in particular, of the new branch of that study, the application of the theory of chance to the biological problems of evolution, which is likely to occupy so much of men's thoughts in the near future. [3]

However, these early experiments were suggestive and led to important discoveries in probability and statistics. They led Pearson to the chi-squared test, which is of great importance in testing whether observed data fit a given probability distribution.

[1] T. C. Fry, Probability and Its Engineering Uses, 2nd ed. (Princeton: Van Nostrand, 1965).
[2] E. Czuber, Wahrscheinlichkeitsrechnung, 3rd ed. (Berlin: Teubner, 1914).
[3] K. Pearson, "Science and Monte Carlo," Fortnightly Review, vol. 55 (1894), p. 193; cited in S. M. Stigler, The History of Statistics (Cambridge: Harvard University Press, 1986).

By the early 1900s it was clear that a better way to generate random numbers was needed. In 1927, L. H. C. Tippett published a list of 41,600 digits obtained by selecting numbers haphazardly from census reports. In 1955, RAND Corporation printed a table of 1,000,000 random numbers generated from electronic noise. The advent of the high-speed computer raised the possibility of generating random numbers directly on the computer, and in the late 1940s John von Neumann suggested that this be done as follows: Suppose that you want a random sequence of four-digit numbers. Choose any four-digit number, say 6235, to start. Square this number to obtain 38,875,225. For the second number choose the middle four digits of this square (i.e., 8752).
Do the same process starting with 8752 to get the third number, and so forth.

More modern methods involve the concept of modular arithmetic. If $a$ is an integer and $m$ is a positive integer, then by $a \pmod{m}$ we mean the remainder when $a$ is divided by $m$. For example, $10 \pmod{4} = 2$, $8 \pmod{2} = 0$, and so forth. To generate a random sequence $X_0, X_1, X_2, \ldots$ of numbers choose a starting number $X_0$ and then obtain the numbers $X_{n+1}$ from $X_n$ by the formula

$$X_{n+1} = (aX_n + c) \pmod{m},$$

where $a$, $c$, and $m$ are carefully chosen constants. The sequence $X_0, X_1, X_2, \ldots$ is then a sequence of integers between 0 and $m - 1$. To obtain a sequence of real numbers in $[0, 1)$, we divide each $X_j$ by $m$. The resulting sequence consists of rational numbers of the form $j/m$, where $0 \le j \le m - 1$. Since $m$ is usually a very large integer, we think of the numbers in the sequence as being random real numbers in $[0, 1)$.

For both von Neumann's squaring method and the modular arithmetic technique the sequence of numbers is actually completely determined by the first number. Thus, there is nothing really random about these sequences. However, they produce numbers that behave very much as theory would predict for random experiments. To obtain different sequences for different experiments the initial number $X_0$ is chosen by some other procedure that might involve, for example, the time of day. [4]

During the Second World War, physicists at the Los Alamos Scientific Laboratory needed to know, for purposes of shielding, how far neutrons travel through various materials. This question was beyond the reach of theoretical calculations.
Daniel McCracken, writing in the Scientific American, states:

The physicists had most of the necessary data: they knew the average distance a neutron of a given speed would travel in a given substance before it collided with an atomic nucleus, what the probabilities were that the neutron would bounce off instead of being absorbed by the nucleus, how much energy the neutron was likely to lose after a given collision and so on.[5]

[4] For a detailed discussion of random numbers, see D. E. Knuth, The Art of Computer Programming, vol. II (Reading: Addison-Wesley, 1969).

John von Neumann and Stanislaw Ulam suggested that the problem be solved by modeling the experiment by chance devices on a computer. Their work being secret, it was necessary to give it a code name. Von Neumann chose the name "Monte Carlo." Since that time, this method of simulation has been called the Monte Carlo Method.

William Feller indicated the possibilities of using computer simulations to illustrate basic concepts in probability in his book An Introduction to Probability Theory and Its Applications. In discussing the problem about the number of times in the lead in the game of "heads or tails," Feller writes:

The results concerning fluctuations in coin tossing show that widely held beliefs about the law of large numbers are fallacious. These results are so amazing and so at variance with common intuition that even sophisticated colleagues doubted that coins actually misbehave as theory predicts. The record of a simulated experiment is therefore included.[6]

Feller provides a plot showing the result of 10,000 plays of heads or tails similar to that in Figure 1.5.
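A simulated experiment of the kind Feller describes can be sketched as follows. This is a minimal sketch rather than the program behind Figure 1.5: the function names are hypothetical, a fair coin is modeled with Python's `random` module, and the convention that a player whose winnings are 0 counts as being in the lead only if he was ahead after the previous toss is an assumption made for this sketch.

```python
import random


def heads_or_tails(plays, rng):
    """Return the player's running winnings after each of `plays` fair tosses.

    The player wins 1 on heads and loses 1 on tails.
    """
    total = 0
    path = []
    for _ in range(plays):
        total += rng.choice((1, -1))
        path.append(total)
    return path


def times_in_lead(path):
    """Count the plays on which the player is in the lead.

    Assumed convention: a total of 0 counts as a lead only when the
    total after the previous play was positive.
    """
    lead = 0
    previous = 0
    for total in path:
        if total > 0 or (total == 0 and previous > 0):
            lead += 1
        previous = total
    return lead


if __name__ == "__main__":
    rng = random.Random(2006)  # fixed seed so that a run can be repeated
    path = heads_or_tails(10_000, rng)
    print("final winnings:", path[-1])
    print("plays in the lead:", times_in_lead(path))
```

Running the sketch with several different seeds gives a feel for how uneven the time spent in the lead can be from one run to the next.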
The martingale betting system described in Exercise 10 has a long and interesting history. Russell Barnhart pointed out to the authors that its use can be traced back at least to 1754, when Casanova wrote in his memoirs, History of My Life:

She [Casanova's mistress] made me promise to go to the casino [the Ridotto in Venice] for money to play in partnership with her. I went there and took all the gold I found, and, determinedly doubling my stakes according to the system known as the martingale, I won three or four times a day during the rest of the Carnival. I never lost the sixth card. If I had lost it, I should have been out of funds, which amounted to two thousand zecchini.[7]

[5] D. D. McCracken, "The Monte Carlo Method," Scientific American, vol. 192 (May 1955), p. 90.
[6] W. Feller, Introduction to Probability Theory and its Applications, vol. 1, 3rd ed. (New York: John Wiley & Sons, 1968), p. xi.
[7] G. Casanova, History of My Life, vol. IV, Chap. 7, trans. W. R. Trask (New York: Harcourt-Brace, 1968), p. 124.

Even if there were no zeros on the roulette wheel, so that the game was perfectly fair, the martingale system, or any other system for that matter, cannot make the game into a favorable game. The idea that a fair game remains fair and unfair games remain unfair under gambling systems has been exploited by mathematicians to obtain important results in the study of probability. We will introduce the general concept of a martingale in Chapter 6.

The word martingale itself also has an interesting history. The origin of the word is obscure. A recent version of the Oxford English Dictionary gives examples of its use in the early 1600s and says that its probable origin is the reference in Rabelais's Book One, Chapter 20:

Everything was done as planned, the only thing being that Gargantua doubted if they would be able to find, right away, breeches suitable to the old fellow's legs; he was doubtful, also, as to what cut would be most becoming to the orator: the martingale, which has a draw-bridge effect in the seat, to permit doing one's business more easily; the sailor-style, which affords more comfort for the kidneys; the Swiss, which is warmer on the belly; or the codfish-tail, which is cooler on the loins.[8]

Dominic Lusinchi noted an earlier occurrence of the word martingale. According to the French dictionary Le Petit Robert, the word comes from the Provençal word "martegalo," which means "from Martigues." Martigues is a town due west of Marseille. The dictionary gives the example of "chausses à la martinguale" (which means Martigues-style breeches) and the date 1491.

In modern uses martingale has several different meanings, all related to holding down, in addition to the gambling use. For example, it is a strap on a horse's harness used to hold down the horse's head, and also part of a sailing rig used to hold down the bowsprit.

The Labouchere system described in Exercise 9 is named after Henry du Pré Labouchere (1831–1912), an English journalist and member of Parliament. Labouchere attributed the system to Condorcet. Condorcet (1743–1794) was a political leader during the time of the French Revolution who was interested in applying probability theory to economics and politics. For example, he calculated the probability that a jury using majority vote will give a correct decision if each juror has the same probability of deciding correctly. His writings provided a wealth of ideas on how probability might be applied to human affairs.[9]

[8] Quoted in the Portable Rabelais, ed. S. Putnam (New York: Viking, 1946), p. 113.
[9] Le Marquis de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix (Paris: Imprimerie Royale, 1785).

Exercises

1 Modify the program CoinTosses to toss a coin $n$ times and print out after every 100 tosses the proportion of heads minus 1/2. Do these numbers appear to approach 0 as $n$ increases? Modify the program again to print out, every 100 times, both of the following quantities: the proportion of heads minus 1/2, and the number of heads minus half the number of tosses. Do these numbers appear to approach 0 as $n$ increases?

2 Modify the program CoinTosses so that it tosses a coin $n$ times and records whether or not the proportion of heads is within .1 of .5 (i.e., between .4 and .6). Have your program repeat this experiment 100 times. About how large must $n$ be so that approximately 95 out of 100 times the proportion of heads is between .4 and .6?

3 In the early 1600s, Galileo was asked to explain the fact that, although the number of triples of integers from 1 to 6 with sum 9 is the same as the number of such triples with sum 10, when three dice are rolled, a 9 seemed to come up less often than a 10, supposedly in the experience of gamblers.
(a) Write a program to simulate the roll of three dice a large number of times and keep track of the proportion of times that the sum is 9 and the proportion of times it is 10.
(b) Can you conclude from your simulations that the gamblers were correct?

4 In racquetball, a player continues to serve as long as she is winning; a point is scored only when a player is serving and wins the volley. The first player to win 21 points wins the game.
Assume that you serve first and have a probability .6 of winning a volley when you serve and probability .5 when your opponent serves. Estimate, by simulation, the probability that you will win a game.

5 Consider the bet that all three dice will turn up sixes at least once in $n$ rolls of three dice. Calculate $f(n)$, the probability of at least one triple-six when three dice are rolled $n$ times. Determine the smallest value of $n$ necessary for a favorable bet that a triple-six will occur when three dice are rolled $n$ times. (DeMoivre would say it should be about $216 \log 2 = 149.7$ and so would answer 150; see Exercise 1.2.17. Do you agree with him?)

6 In Las Vegas, a roulette wheel has 38 slots numbered 0, 00, 1, 2, ..., 36. The 0 and 00 slots are green and half of the remaining 36 slots are red and half are black. A croupier spins the wheel and throws in an ivory ball. If you bet 1 dollar on red, you win 1 dollar if the ball stops in a red slot and otherwise you lose 1 dollar. Write a program to find the total winnings for a player who makes 1000 bets on red.

7 Another form of bet for roulette is to bet that a specific number (say 17) will turn up. If the ball stops on your number, you get your dollar back plus 35 dollars. If not, you lose your dollar. Write a program that will plot your winnings when you make 500 plays of roulette at Las Vegas, first when you bet each time on red (see Exercise 6), and then for a second visit to Las Vegas when you make 500 plays betting each time on the number 17. What differences do you see in the graphs of your winnings on these two occasions?

8 An astute student noticed that, in our simulation of the game of heads or tails (see Example 1.4), the proportion of times the player is always in the lead is very close to the proportion of times that the player's total winnings end up 0.
Work out these probabilities by enumeration of all cases for two tosses and for four tosses, and see if you think that these probabilities are, in fact, the same.

9 The Labouchere system for roulette is played as follows. Write down a list of numbers, usually 1, 2, 3, 4. Bet the sum of the first and last, $1 + 4 = 5$, on red. If you win, delete the first and last numbers from your list. If you lose, add the amount that you last bet to the end of your list. Then use the new list and bet the sum of the first and last numbers (if there is only one number, bet that amount). Continue until your list becomes empty. Show that, if this happens, you win the sum, $1 + 2 + 3 + 4 = 10$, of your original list. Simulate this system and see if you do always stop and, hence, always win. If so, why is this not a foolproof gambling system?

10 Another well-known gambling system is the martingale doubling system. Suppose that you are betting on red to turn up in roulette. Every time you win, bet 1 dollar next time. Every time you lose, double your previous bet. Suppose that you use this system until you have won at least 5 dollars or you have lost more than 100 dollars. Write a program to simulate this and play it a number of times and see how you do. In his book The Newcomes, W. M. Thackeray remarks "You have not played as yet? Do not do so; above all avoid a martingale if you do."[10] Was this good advice?

11 Modify the program HTSimulation so that it keeps track of the maximum of Peter's winnings in each game of 40 tosses. Have your program print out the proportion of times that your total winnings take on values $0, 2, 4, \ldots, 40$. Calculate the corresponding exact probabilities for games of two tosses and four tosses.
12 In an upcoming national election for the President of the United States, a pollster plans to predict the winner of the popular vote by taking a random sample of 1000 voters and declaring that the winner will be the one obtaining the most votes in his sample. Suppose that 48 percent of the voters plan to vote for the Republican candidate and 52 percent plan to vote for the Democratic candidate. To get some idea of how reasonable the pollster's plan is, write a program to make this prediction by simulation. Repeat the simulation 100 times and see how many times the pollster's prediction would come true. Repeat your experiment, assuming now that 49 percent of the population plan to vote for the Republican candidate; first with a sample of 1000 and then with a sample of 3000. (The Gallup Poll uses about 3000.) (This idea is discussed further in Chapter 9, Section 9.1.)

13 The psychologist Tversky and his colleagues[11] say that about four out of five people will answer (a) to the following question:

A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital 15 babies are born each day. Although the overall proportion of boys is about 50 percent, the actual proportion at either hospital may be more or less than 50 percent on any day. At the end of a year, which hospital will have the greater number of days on which more than 60 percent of the babies born were boys?
(a) the large hospital
(b) the small hospital
(c) neither; the number of days will be about the same.

Assume that the probability that a baby is a boy is .5 (actual estimates make this more like .513). Decide, by simulation, what the right answer is to the question. Can you suggest why so many people go wrong?

[10] W. M. Thackeray, The Newcomes (London: Bradbury and Evans, 1854–55).
[11] See K. McKean, "Decisions, Decisions," Discover, June 1985, pp. 22–31. Kevin McKean, Discover Magazine, © 1987 Family Media, Inc. Reprinted with permission. This popular article reports on the work of Tversky et al. in Judgment Under Uncertainty: Heuristics and Biases (Cambridge: Cambridge University Press, 1982).

14 You are offered the following game. A fair coin will be tossed until the first time it comes up heads. If this occurs on the $j$th toss you are paid $2^j$ dollars. You are sure to win at least 2 dollars, so you should be willing to pay to play this game, but how much? Few people would pay as much as 10 dollars to play this game. See if you can decide, by simulation, a reasonable amount that you would be willing to pay per game, if you will be allowed to make a large number of plays of the game. Does the amount that you would be willing to pay per game depend upon the number of plays that you will be allowed?

15 Tversky and his colleagues[12] studied the records of 48 of the Philadelphia 76ers basketball games in the 1980–81 season to see if a player had times when he was hot and every shot went in, and other times when he was cold and barely able to hit the backboard. The players estimated that they were about 25 percent more likely to make a shot after a hit than after a miss. In fact, the opposite was true: the 76ers were 6 percent more likely to score after a miss than after a hit. Tversky reports that the number of hot and cold streaks was about what one would expect by purely random effects.
Assuming that a player has a fifty-fifty chance of making a shot and makes 20 shots a game, estimate by simulation the proportion of the games in which the player will have a streak of 5 or more hits.

[12] Ibid.

16 Estimate, by simulation, the average number of children there would be in a family if all people had children until they had a boy. Do the same if all people had children until they had at least one boy and at least one girl. How many more children would you expect to find under the second scheme than under the first in 100,000 families? (Assume that boys and girls are equally likely.)

17 Mathematicians have been known to get some of the best ideas while sitting in a cafe, riding on a bus, or strolling in the park. In the early 1900s the famous mathematician George Pólya lived in a hotel near the woods in Zurich. He liked to walk in the woods and think about mathematics. Pólya describes the following incident:

At the hotel there lived also some students with whom I usually took my meals and had friendly relations. On a certain day one of them expected the visit of his fiancée, what (sic) I knew, but I did not foresee that he and his fiancée would also set out for a stroll in the woods, and then suddenly I met them there. And then I met them the same morning repeatedly, I don't remember how many times, but certainly much too often and I felt embarrassed: It looked as if I was snooping around which was, I assure you, not the case.[13]

[Figure 1.6: Random walk. (a) Random walk in one dimension. (b) Random walk in two dimensions. (c) Random walk in three dimensions.]

This set him to thinking about whether random walkers were destined to meet. Pólya considered random walkers in one, two, and three dimensions.
In one dimension, he envisioned the walker on a very long street. At each intersection the walker flips a fair coin to decide which direction to walk next (see Figure 1.6a). In two dimensions, the walker is walking on a grid of streets, and at each intersection he chooses one of the four possible directions with equal probability (see Figure 1.6b). In three dimensions (we might better speak of a random climber), the walker moves on a three-dimensional grid, and at each intersection there are now six different directions that the walker may choose, each with equal probability (see Figure 1.6c).

The reader is referred to Section 12.1, where this and related problems are discussed.

(a) Write a program to simulate a random walk in one dimension starting at 0. Have your program print out the lengths of the times between returns to the starting point (returns to 0). See if you can guess from this simulation the answer to the following question: Will the walker always return to his starting point eventually or might he drift away forever?

(b) The paths of two walkers in two dimensions who meet after $n$ steps can be considered to be a single path that starts at $(0, 0)$ and returns to $(0, 0)$ after $2n$ steps. This means that the probability that two random walkers in two dimensions meet is the same as the probability that a single walker in two dimensions ever returns to the starting point. Thus the question of whether two walkers are sure to meet is the same as the question of whether a single walker is sure to return to the starting point. Write a program to simulate a random walk in two dimensions and see if you think that the walker is sure to return to $(0, 0)$. If so, Pólya would be sure to keep meeting his friends in the park.
Perhaps by now you have conjectured the answer to the question: Is a random walker in one or two dimensions sure to return to the starting point? Pólya answered this question for dimensions one, two, and three. He established the remarkable result that the answer is yes in one and two dimensions and no in three dimensions.

(c) Write a program to simulate a random walk in three dimensions and see whether, from this simulation and the results of (a) and (b), you could have guessed Pólya's result.

[13] G. Pólya, "Two Incidents," Scientists at Work: Festschrift in Honour of Herman Wold, ed. T. Dalenius, G. Karlsson, and S. Malmquist (Uppsala: Almqvist & Wiksells Boktryckeri AB, 1970).

1.2 Discrete Probability Distributions

In this book we shall study many different experiments from a probabilistic point of view. What is involved in this study will become evident as the theory is developed and examples are analyzed. However, the overall idea can be described and illustrated as follows: to each experiment that we consider there will be associated a random variable, which represents the outcome of any particular experiment. The set of possible outcomes is called the sample space. In the first part of this section, we will consider the case where the experiment has only finitely many possible outcomes, i.e., the sample space is finite. We will then generalize to the case that the sample space is either finite or countably infinite. This leads us to the following definition.

Random Variables and Sample Spaces

Definition 1.1 Suppose we have an experiment whose outcome depends on chance. We represent the outcome of the experiment by a capital Roman letter, such as $X$, called a random variable. The sample space of the experiment is the set of all possible outcomes.
If the sample space is either finite or countably infinite, the random variable is said to be discrete. □

We generally denote a sample space by the capital Greek letter $\Omega$. As stated above, in the correspondence between an experiment and the mathematical theory by which it is studied, the sample space $\Omega$ corresponds to the set of possible outcomes of the experiment.

We now make two additional definitions. These are subsidiary to the definition of sample space and serve to make precise some of the common terminology used in conjunction with sample spaces. First of all, we define the elements of a sample space to be outcomes. Second, each subset of a sample space is defined to be an event. Normally, we shall denote outcomes by lower case letters and events by capital letters.

Example 1.6 A die is rolled once. We let $X$ denote the outcome of this experiment. Then the sample space for this experiment is the 6-element set
$$\Omega = \{1, 2, 3, 4, 5, 6\},$$
where each outcome $i$, for $i = 1, \ldots, 6$, corresponds to the number of dots on the face which turns up. The event
$$E = \{2, 4, 6\}$$
corresponds to the statement that the result of the roll is an even number. The event $E$ can also be described by saying that $X$ is even. Unless there is reason to believe the die is loaded, the natural assumption is that every outcome is equally likely. Adopting this convention means that we assign a probability of 1/6 to each of the six outcomes, i.e., $m(i) = 1/6$, for $1 \le i \le 6$. □

Distribution Functions

We next describe the assignment of probabilities. The definitions are motivated by the example above, in which we assigned to each outcome of the sample space a nonnegative number such that the sum of the numbers assigned is equal to 1.
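The assignment in Example 1.6 can be written out concretely. The following Python sketch (the names `omega`, `m`, and `E` are chosen here purely for illustration) stores the number assigned to each outcome of a fair die as an exact fraction and checks the two features just noted: every assigned number is nonnegative, and the numbers sum to 1.

```python
from fractions import Fraction

# Sample space for one roll of a die.
omega = [1, 2, 3, 4, 5, 6]

# Assign the number 1/6 to each outcome (the fair-die assumption).
m = {i: Fraction(1, 6) for i in omega}

# The event "the result of the roll is an even number" is a subset of omega.
E = {i for i in omega if i % 2 == 0}

# The two features of the assignment noted in the text.
all_nonnegative = all(m[i] >= 0 for i in omega)
total = sum(m.values(), Fraction(0))

if __name__ == "__main__":
    print("E =", sorted(E))
    print("all weights nonnegative:", all_nonnegative)
    print("sum of weights:", total)
```

Exact fractions are used instead of floating-point numbers so that the sum of the six weights is exactly 1 rather than approximately 1.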
Definition 1.2 Let $X$ be a random variable which denotes the value of the outcome of a certain experiment, and assume that this experiment has only finitely many possible outcomes. Let $\Omega$ be the sample space of the experiment (i.e., the set of all possible values of $X$, or equivalently, the set of all possible outcomes of the experiment). A distribution function for $X$ is a real-valued function $m$ whose domain is $\Omega$ and which satisfies:
1. $m(\omega) \ge 0$, for all $\omega \in \Omega$, and
2. $\sum_{\omega \in \Omega} m(\omega) = 1$.
For any subset $E$ of $\Omega$, we define the probability of $E$ to be the number $P(E)$ given by
$$P(E) = \sum_{\omega \in E} m(\omega).$$ □

Example 1.7 Consider an experiment in which a coin is tossed twice. Let $X$ be the random variable which corresponds to this experiment. We note that there are several ways to record the outcomes of this experiment. We could, for example, record the two tosses, in the order in which they occurred. In this case, we have $\Omega = \{\text{HH}, \text{HT}, \text{TH}, \text{TT}\}$. We could also record the outcomes by simply noting the number of heads that appeared. In this case, we have $\Omega = \{0, 1, 2\}$. Finally, we could record the two outcomes, without regard to the order in which they occurred. In this case, we have $\Omega = \{\text{HH}, \text{HT}, \text{TT}\}$.

We will use, for the moment, the first of the sample spaces given above. We will assume that all four outcomes are equally likely, and define the distribution function $m(\omega)$ by
$$m(\text{HH}) = m(\text{HT}) = m(\text{TH}) = m(\text{TT}) = \frac{1}{4}.$$
Let $E = \{\text{HH}, \text{HT}, \text{TH}\}$ be the event that at least one head comes up.
Then, the probability of $E$ can be calculated as follows:
$$P(E) = m(\text{HH}) + m(\text{HT}) + m(\text{TH}) = \frac{1}{4} + \frac{1}{4} + \frac{1}{4} = \frac{3}{4}.$$
Similarly, if $F = \{\text{HH}, \text{HT}\}$ is the event that heads comes up on the first toss, then we have
$$P(F) = m(\text{HH}) + m(\text{HT}) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}.$$ □

Example 1.8 (Example 1.6 continued) The sample space for the experiment in which the die is rolled is the 6-element set $\Omega = \{1, 2, 3, 4, 5, 6\}$. We assumed that the die was fair, and we chose the distribution function defined by
$$m(i) = \frac{1}{6}, \quad \text{for } i = 1, \ldots, 6.$$
If $E$ is the event that the result of the roll is an even number, then $E = \{2, 4, 6\}$ and
$$P(E) = m(2) + m(4) + m(6) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}.$$ □

Notice that it is an immediate consequence of the above definitions that, for every $\omega \in \Omega$,
$$P(\{\omega\}) = m(\omega).$$
That is, the probability of the elementary event $\{\omega\}$, consisting of a single outcome $\omega$, is equal to the value $m(\omega)$ assigned to the outcome $\omega$ by the distribution function.

Example 1.9 Three people, A, B, and C, are running for the same office, and we assume that one and only one of them wins. The sample space may be taken as the 3-element set $\Omega = \{\text{A}, \text{B}, \text{C}\}$, where each element corresponds to the outcome of that candidate's winning. Suppose that A and B have the same chance of winning, but that C has only 1/2 the chance of A or B. Then we assign
$$m(\text{A}) = m(\text{B}) = 2m(\text{C}).$$
Since
$$m(\text{A}) + m(\text{B}) + m(\text{C}) = 1,$$
we see that
$$2m(\text{C}) + 2m(\text{C}) + m(\text{C}) = 1,$$
which implies that $5m(\text{C}) = 1$. Hence,
$$m(\text{A}) = \frac{2}{5}, \quad m(\text{B}) = \frac{2}{5}, \quad m(\text{C}) = \frac{1}{5}.$$
Let $E$ be the event that either A or C wins. Then $E = \{\text{A}, \text{C}\}$, and
$$P(E) = m(\text{A}) + m(\text{C}) = \frac{2}{5} + \frac{1}{5} = \frac{3}{5}.$$ □

In many cases, events can be described in terms of other events through the use of the standard constructions of set theory. We will briefly review the definitions of these constructions.
The reader is referred to Figure 1.7 for Venn diagrams which illustrate these constructions.

Let $A$ and $B$ be two sets. Then the union of $A$ and $B$ is the set
$$A \cup B = \{x \mid x \in A \text{ or } x \in B\}.$$
The intersection of $A$ and $B$ is the set
$$A \cap B = \{x \mid x \in A \text{ and } x \in B\}.$$
The difference of $A$ and $B$ is the set
$$A - B = \{x \mid x \in A \text{ and } x \notin B\}.$$
The set $A$ is a subset of $B$, written $A \subset B$, if every element of $A$ is also an element of $B$. Finally, the complement of $A$ is the set
$$\tilde{A} = \{x \mid x \in \Omega \text{ and } x \notin A\}.$$

The reason that these constructions are important is that it is typically the case that complicated events described in English can be broken down into simpler events using these constructions. For example, if $A$ is the event that "it will snow tomorrow and it will rain the next day," $B$ is the event that "it will snow tomorrow," and $C$ is the event that "it will rain two days from now," then $A$ is the intersection of the events $B$ and $C$. Similarly, if $D$ is the event that "it will snow tomorrow or it will rain the next day," then $D = B \cup C$. (Note that care must be taken here, because sometimes the word "or" in English means that exactly one of the two alternatives will occur. The meaning is usually clear from context. In this book, we will always use the word "or" in the inclusive sense, i.e., $A$ or $B$ means that at least one of the two events $A$, $B$ is true.) The event $\tilde{B}$ is the event that "it will not snow tomorrow." Finally, if $E$ is the event that "it will snow tomorrow but it will not rain the next day," then $E = B - C$.

[Figure 1.7: Basic set operations.]

Properties

Theorem 1.1 The probabilities assigned to events by a distribution function on a sample space $\Omega$ satisfy the following properties:
1. $P(E) \ge 0$ for every $E \subset \Omega$.
2. $P(\Omega) = 1$.
3. If $E \subset F \subset \Omega$, then $P(E) \le P(F)$.
4. If $A$ and $B$ are disjoint subsets of $\Omega$, then $P(A \cup B) = P(A) + P(B)$.
5. $P(\tilde{A}) = 1 - P(A)$ for every $A \subset \Omega$.

Proof. For any event $E$ the probability $P(E)$ is determined from the distribution $m$ by
$$P(E) = \sum_{\omega \in E} m(\omega),$$
for every $E \subset \Omega$. Since the function $m$ is nonnegative, it follows that $P(E)$ is also nonnegative. Thus, Property 1 is true.

Property 2 is proved by the equations
$$P(\Omega) = \sum_{\omega \in \Omega} m(\omega) = 1.$$

Suppose that $E \subset F \subset \Omega$. Then every element $\omega$ that belongs to $E$ also belongs to $F$. Therefore,
$$\sum_{\omega \in E} m(\omega) \le \sum_{\omega \in F} m(\omega),$$
since each term in the left-hand sum is in the right-hand sum, and all the terms in both sums are nonnegative. This implies that
$$P(E) \le P(F),$$
and Property 3 is proved.

Suppose next that $A$ and $B$ are disjoint subsets of $\Omega$. Then every element $\omega$ of $A \cup B$ lies either in $A$ and not in $B$ or in $B$ and not in $A$. It follows that
$$P(A \cup B) = \sum_{\omega \in A \cup B} m(\omega) = \sum_{\omega \in A} m(\omega) + \sum_{\omega \in B} m(\omega) = P(A) + P(B),$$
and Property 4 is proved.

Finally, to prove Property 5, consider the disjoint union
$$\Omega = A \cup \tilde{A}.$$
Since $P(\Omega) = 1$, the property of disjoint additivity (Property 4) implies that
$$1 = P(A) + P(\tilde{A}),$$
whence $P(\tilde{A}) = 1 - P(A)$. □

It is important to realize that Property 4 in Theorem 1.1 can be extended to more than two sets. The general finite additivity property is given by the following theorem.

Theorem 1.2 If $A_1, \ldots, A_n$ are pairwise disjoint subsets of $\Omega$ (i.e., no two of the $A_i$'s have an element in common), then
$$P(A_1 \cup \cdots \cup A_n) = \sum_{i=1}^{n} P(A_i).$$
Proof. Let $\omega$ be any element in the union
$$A_1 \cup \cdots \cup A_n.$$
Then $m(\omega)$ occurs exactly once on each side of the equality in the statement of the theorem. □

We shall often use the following consequence of the above theorem.

Theorem 1.3 Let $A_1, \ldots, A_n$ be pairwise disjoint events with $\Omega = A_1 \cup \cdots \cup A_n$, and let $E$ be any event. Then
$$P(E) = \sum_{i=1}^{n} P(E \cap A_i).$$
Proof. The sets $E \cap A_1, \ldots, E \cap A_n$ are pairwise disjoint, and their union is the set $E$. The result now follows from Theorem 1.2. □

Corollary 1.1 For any two events $A$ and $B$,
$$P(A) = P(A \cap B) + P(A \cap \tilde{B}).$$ □

Property 4 can be generalized in another way. Suppose that $A$ and $B$ are subsets of $\Omega$ which are not necessarily disjoint. Then:

Theorem 1.4 If $A$ and $B$ are subsets of $\Omega$, then
$$P(A \cup B) = P(A) + P(B) - P(A \cap B). \tag{1.1}$$
Proof. The left side of Equation 1.1 is the sum of $m(\omega)$ for $\omega$ in either $A$ or $B$. We must show that the right side of Equation 1.1 also adds $m(\omega)$ for $\omega$ in $A$ or $B$. If $\omega$ is in exactly one of the two sets, then it is counted in only one of the three terms on the right side of Equation 1.1. If it is in both $A$ and $B$, it is added twice from the calculations of $P(A)$ and $P(B)$ and subtracted once for $P(A \cap B)$. Thus it is counted exactly once by the right side. Of course, if $A \cap B = \emptyset$, then Equation 1.1 reduces to Property 4. (Equation 1.1 can also be generalized; see Theorem 3.8.) □

Tree Diagrams

Example 1.10 Let us illustrate the properties of probabilities of events in terms of three tosses of a coin. When we have an experiment which takes place in stages such as this, we often find it convenient to represent the outcomes by a tree diagram as shown in Figure 1.8. A path through the tree corresponds to a possible outcome of the experiment. For the case of three tosses of a coin, we have eight paths $\omega_1, \omega_2, \ldots, \omega_8$ and, assuming each outcome to be equally likely, we assign equal weight, 1/8, to each path. Let $E$ be the event "at least one head turns up." Then $\tilde{E}$ is the event "no heads turn up." This event occurs for only one outcome, namely, $\omega_8 = \text{TTT}$.
Thus, $\tilde{E} = \{\text{TTT}\}$ and we have
$$P(\tilde{E}) = P(\{\text{TTT}\}) = m(\text{TTT}) = \frac{1}{8}.$$
By Property 5 of Theorem 1.1,
$$P(E) = 1 - P(\tilde{E}) = 1 - \frac{1}{8} = \frac{7}{8}.$$
Note that we shall often find it is easier to compute the probability that an event does not happen rather than the probability that it does. We then use Property 5 to obtain the desired probability.

[Figure 1.8: Tree diagram for three tosses of a coin, with columns First toss, Second toss, Third toss, and Outcome ($\omega_1, \ldots, \omega_8$).]

Let $A$ be the event "the first outcome is a head," and $B$ the event "the second outcome is a tail." By looking at the paths in Figure 1.8, we see that
$$P(A) = P(B) = \frac{1}{2}.$$
Moreover, $A \cap B = \{\omega_3, \omega_4\}$, and so $P(A \cap B) = 1/4$. Using Theorem 1.4, we obtain
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{1}{2} + \frac{1}{2} - \frac{1}{4} = \frac{3}{4}.$$
Since $A \cup B$ is the 6-element set,
$$A \cup B = \{\text{HHH}, \text{HHT}, \text{HTH}, \text{HTT}, \text{TTH}, \text{TTT}\},$$
we see that we obtain the same result by direct enumeration. □

In our coin tossing examples and in the die rolling example, we have assigned an equal probability to each possible outcome of the experiment. Corresponding to this method of assigning probabilities, we have the following definitions.

Uniform Distribution

Definition 1.3 The uniform distribution on a sample space $\Omega$ containing $n$ elements is the function $m$ defined by
$$m(\omega) = \frac{1}{n},$$
for every $\omega \in \Omega$. □

It is important to realize that when an experiment is analyzed to describe its possible outcomes, there is no single correct choice of sample space. For the experiment of tossing a coin twice in Example 1.7, we selected the 4-element set $\Omega = \{\text{HH}, \text{HT}, \text{TH}, \text{TT}\}$ as a sample space and assigned the uniform distribution function. These choices are certainly intuitively natural.
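The uniform distribution of Definition 1.3 and the calculations of Example 1.10 can be checked by direct enumeration. The sketch below is illustrative only: the helper names `uniform` and `prob` are hypothetical, and outcomes are encoded as strings such as "HTH".

```python
from fractions import Fraction
from itertools import product


def uniform(omega):
    """Return the uniform distribution on the finite sample space `omega`."""
    n = len(omega)
    return {w: Fraction(1, n) for w in omega}


def prob(event, m):
    """P(E): the sum of m(w) over the outcomes w in the event E."""
    return sum((m[w] for w in event), Fraction(0))


# The eight paths of the three-toss tree, listed in the order HHH, HHT, ..., TTT.
omega = ["".join(t) for t in product("HT", repeat=3)]
m = uniform(omega)

A = {w for w in omega if w[0] == "H"}  # first outcome is a head
B = {w for w in omega if w[1] == "T"}  # second outcome is a tail
E = {w for w in omega if "H" in w}     # at least one head turns up

if __name__ == "__main__":
    print("P(E)       =", prob(E, m))
    print("P(A or B)  =", prob(A | B, m))
    print("Equation 1.1 right side =", prob(A, m) + prob(B, m) - prob(A & B, m))
```

The last two printed values agree, which is Theorem 1.4 for this pair of events. The same two helpers work unchanged for the two-toss sample space: `uniform(["HH", "HT", "TH", "TT"])` assigns 1/4 to each outcome.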
On the other hand, for some purposes it may be more useful to consider the 3-element sample space $\bar{\Omega} = \{0, 1, 2\}$ in which 0 is the outcome "no heads turn up," 1 is the outcome "exactly one head turns up," and 2 is the outcome "two heads turn up." The distribution function $\bar{m}$ on $\bar{\Omega}$ defined by the equations

$$\bar{m}(0) = \frac{1}{4}, \quad \bar{m}(1) = \frac{1}{2}, \quad \bar{m}(2) = \frac{1}{4}$$

is the one corresponding to the uniform probability density on the original sample space $\Omega$. Notice that it is perfectly possible to choose a different distribution function. For example, we may consider the uniform distribution function on $\bar{\Omega}$, which is the function $\bar{q}$ defined by

$$\bar{q}(0) = \bar{q}(1) = \bar{q}(2) = \frac{1}{3}.$$

Although $\bar{q}$ is a perfectly good distribution function, it is not consistent with observed data on coin tossing.

Example 1.11 Consider the experiment that consists of rolling a pair of dice. We take as the sample space $\Omega$ the set of all ordered pairs $(i, j)$ of integers with $1 \le i \le 6$ and $1 \le j \le 6$. Thus,

$$\Omega = \{(i, j) : 1 \le i, j \le 6\}.$$

(There is at least one other "reasonable" choice for a sample space, namely the set of all unordered pairs of integers, each between 1 and 6. For a discussion of why we do not use this set, see Example 3.14.) To determine the size of $\Omega$, we note that there are six choices for $i$, and for each choice of $i$ there are six choices for $j$, leading to 36 different outcomes. Let us assume that the dice are not loaded. In mathematical terms, this means that we assume that each of the 36 outcomes is equally likely, or equivalently, that we adopt the uniform distribution function on $\Omega$ by setting

$$m((i, j)) = \frac{1}{36}, \quad 1 \le i, j \le 6.$$

What is the probability of getting a sum of 7 on the roll of two dice, or of getting a sum of 11?
The first event, denoted by $E$, is the subset

$$E = \{(1, 6), (6, 1), (2, 5), (5, 2), (3, 4), (4, 3)\}.$$

A sum of 11 is the subset $F$ given by

$$F = \{(5, 6), (6, 5)\}.$$

Consequently,

$$P(E) = \sum_{\omega \in E} m(\omega) = 6 \cdot \frac{1}{36} = \frac{1}{6},$$

$$P(F) = \sum_{\omega \in F} m(\omega) = 2 \cdot \frac{1}{36} = \frac{1}{18}.$$

What is the probability of getting neither snake eyes (double ones) nor boxcars (double sixes)? The event of getting either one of these two outcomes is the set

$$E = \{(1, 1), (6, 6)\}.$$

Hence, the probability of obtaining neither is given by

$$P(\tilde{E}) = 1 - P(E) = 1 - \frac{2}{36} = \frac{17}{18}.$$ □

In the above coin tossing and the dice rolling experiments, we have assigned an equal probability to each outcome. That is, in each example, we have chosen the uniform distribution function. These are the natural choices provided the coin is a fair one and the dice are not loaded. However, the decision as to which distribution function to select to describe an experiment is not a part of the basic mathematical theory of probability. The latter begins only when the sample space and the distribution function have already been defined.

Determination of Probabilities

It is important to consider ways in which probability distributions are determined in practice. One way is by symmetry. For the case of the toss of a coin, we do not see any physical difference between the two sides of a coin that should affect the chance of one side or the other turning up. Similarly, with an ordinary die there is no essential difference between any two sides of the die, and so by symmetry we assign the same probability for any possible outcome. In general, considerations of symmetry often suggest the uniform distribution function. Care must be used here.
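Before going on, the dice probabilities of Example 1.11 can be checked by enumerating the 36 ordered pairs. This is a sketch under the example's assumption of unloaded dice; the variable names are illustrative and do not come from the text.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (i, j) with 1 <= i, j <= 6.
omega = list(product(range(1, 7), repeat=2))

# Uniform distribution on the 36 outcomes (dice assumed not loaded).
weight = Fraction(1, len(omega))

def prob(event):
    """Under the uniform distribution, P(E) = |E| * (1/36)."""
    return weight * len(event)

sum_is_7 = [(i, j) for (i, j) in omega if i + j == 7]
sum_is_11 = [(i, j) for (i, j) in omega if i + j == 11]
snake_eyes_or_boxcars = [(1, 1), (6, 6)]

p_7 = prob(sum_is_7)
p_11 = prob(sum_is_11)
p_neither = 1 - prob(snake_eyes_or_boxcars)   # Property 5: complement rule

print(p_7, p_11, p_neither)
```

Had the unordered pairs been used as the sample space instead, the uniform distribution on them would give different answers, which is the point taken up in Example 3.14.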
We should not always assume that, just because we do not know any reason to suggest that one outcome is more likely than another, it is appropriate to assign equal probabilities. For example, consider the experiment of guessing the sex of a newborn child. It has been observed that the proportion of newborn children who are boys is about .513. Thus, it is more appropriate to assign a distribution function which assigns probability .513 to the outcome boy and probability .487 to the outcome girl than to assign probability 1/2 to each outcome. This is an example where we use statistical observations to determine probabilities. Note that these probabilities may change with new studies and may vary from country to country. Genetic engineering might even allow an individual to influence this probability for a particular case.

Odds

Statistical estimates for probabilities are fine if the experiment under consideration can be repeated a number of times under similar circumstances. However, assume that, at the beginning of a football season, you want to assign a probability to the event that Dartmouth will beat Harvard. You really do not have data that relates to this year's football team. However, you can determine your own personal probability by seeing what kind of a bet you would be willing to make. For example, suppose that you are willing to make a 1 dollar bet giving 2 to 1 odds that Dartmouth will win. Then you are willing to pay 2 dollars if Dartmouth loses in return for receiving 1 dollar if Dartmouth wins. This means that you think the appropriate probability for Dartmouth winning is 2/3.

Let us look more carefully at the relation between odds and probabilities. Suppose that we make a bet at $r$ to 1 odds that an event $E$ occurs.
This means that we think that it is $r$ times as likely that $E$ will occur as that $E$ will not occur. In general, $r$ to $s$ odds will be taken to mean the same thing as $r/s$ to 1, i.e., the ratio between the two numbers is the only quantity of importance when stating odds.

Now if it is $r$ times as likely that $E$ will occur as that $E$ will not occur, then the probability that $E$ occurs must be $r/(r + 1)$, since we have

$$P(E) = r\,P(\tilde{E})$$

and

$$P(E) + P(\tilde{E}) = 1.$$

In general, the statement that the odds are $r$ to $s$ in favor of an event $E$ occurring is equivalent to the statement that

$$P(E) = \frac{r/s}{(r/s) + 1} = \frac{r}{r + s}.$$

If we let $P(E) = p$, then the above equation can easily be solved for $r/s$ in terms of $p$; we obtain $r/s = p/(1 - p)$. We summarize the above discussion in the following definition.

Definition 1.4 If $P(E) = p$, the odds in favor of the event $E$ occurring are $r : s$ ($r$ to $s$) where $r/s = p/(1 - p)$. If $r$ and $s$ are given, then $p$ can be found by using the equation $p = r/(r + s)$. □

Example 1.12 (Example 1.9 continued) In Example 1.9 we assigned probability 1/5 to the event that candidate C wins the race. Thus the odds in favor of C winning are $1/5 : 4/5$. These odds could equally well have been written as $1 : 4$, $2 : 8$, and so forth. A bet that C wins is fair if we receive 4 dollars if C wins and pay 1 dollar if C loses. □

Infinite Sample Spaces

If a sample space has an infinite number of points, then the way that a distribution function is defined depends upon whether or not the sample space is countable. A sample space is countably infinite if the elements can be counted, i.e., can be put in one-to-one correspondence with the positive integers, and uncountably infinite otherwise. Infinite sample spaces require new concepts in general (see Chapter 2), but countably infinite spaces do not.
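The two conversions in Definition 1.4 above are short enough to state as code. The sketch below is illustrative only; the function names `odds_to_prob` and `prob_to_odds` are assumptions, not notation from the text. It uses exact fractions so that odds come out in lowest terms.

```python
from fractions import Fraction

def odds_to_prob(r, s):
    """Odds of r : s in favor of E correspond to p = r / (r + s)."""
    return Fraction(r, r + s)

def prob_to_odds(p):
    """Return (r, s) in lowest terms with r/s = p / (1 - p); requires 0 <= p < 1."""
    ratio = Fraction(p) / (1 - Fraction(p))
    return ratio.numerator, ratio.denominator

# The Dartmouth bet at 2 to 1 odds, and candidate C from Example 1.12.
p_dartmouth = odds_to_prob(2, 1)
odds_c = prob_to_odds(Fraction(1, 5))

print(p_dartmouth, odds_c)
```

Because only the ratio $r/s$ matters, `odds_to_prob(2, 8)` and `odds_to_prob(1, 4)` return the same probability.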
If

$$\Omega = \{\omega_1, \omega_2, \omega_3, \dots\}$$

is a countably infinite sample space, then a distribution function is defined exactly as in Definition 1.2, except that the sum must now be a convergent infinite sum. Theorem 1.1 is still true, as are its extensions Theorems 1.2 and 1.4. One thing we cannot do on a countably infinite sample space that we could do on a finite sample space is to define a uniform distribution function as in Definition 1.3. You are asked in Exercise 20 to explain why this is not possible.

Example 1.13 A coin is tossed until the first time that a head turns up. Let the outcome of the experiment, $\omega$, be the first time that a head turns up. Then the possible outcomes of our experiment are

$$\Omega = \{1, 2, 3, \dots\}.$$

Note that even though the coin could come up tails every time, we have not allowed for this possibility. We will explain why in a moment. The probability that heads comes up on the first toss is 1/2. The probability that tails comes up on the first toss and heads on the second is 1/4. The probability that we have two tails followed by a head is 1/8, and so forth. This suggests assigning the distribution function $m(n) = 1/2^n$ for $n = 1, 2, 3, \dots$. To see that this is a distribution function we must show that

$$\sum_{\omega} m(\omega) = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots = 1.$$

That this is true follows from the formula for the sum of a geometric series,

$$1 + r + r^2 + r^3 + \cdots = \frac{1}{1 - r},$$

or

$$r + r^2 + r^3 + r^4 + \cdots = \frac{r}{1 - r}, \tag{1.2}$$

for $-1 < r < 1$.

Putting $r = 1/2$, we see that we have a probability of 1 that the coin eventually turns up heads. The possible outcome of tails every time has to be assigned probability 0, so we omit it from our sample space of possible outcomes.

Let $E$ be the event that the first time a head turns up is after an even number of tosses.
Then

$$E = \{2, 4, 6, 8, \dots\},$$

and

$$P(E) = \frac{1}{4} + \frac{1}{16} + \frac{1}{64} + \cdots.$$

Putting $r = 1/4$ in Equation 1.2, we see that

$$P(E) = \frac{1/4}{1 - 1/4} = \frac{1}{3}.$$

Thus the probability that a head turns up for the first time after an even number of tosses is 1/3 and after an odd number of tosses is 2/3. □

Historical Remarks

An interesting question in the history of science is: Why was probability not developed until the sixteenth century? We know that in the sixteenth century problems in gambling and games of chance made people start to think about probability. But gambling and games of chance are almost as old as civilization itself. In ancient Egypt (at the time of the First Dynasty, ca. 3500 B.C.) a game now called "Hounds and Jackals" was played. In this game the movement of the hounds and jackals was based on the outcome of the roll of four-sided dice made out of animal bones called astragali. Six-sided dice made of a variety of materials date back to the sixteenth century B.C. Gambling was widespread in ancient Greece and Rome. Indeed, in the Roman Empire it was sometimes found necessary to invoke laws against gambling. Why, then, were probabilities not calculated until the sixteenth century?

Several explanations have been advanced for this late development. One is that the relevant mathematics was not developed and was not easy to develop.
The ancient mathematical notation made numerical calculation complicated, and our familiar algebraic notation was not developed until the sixteenth century. However, as we shall see, many of the combinatorial ideas needed to calculate probabilities were discussed long before the sixteenth century. Since many of the chance events of those times had to do with lotteries relating to religious affairs, it has been suggested that there may have been religious barriers to the study of chance and gambling. Another suggestion is that a stronger incentive, such as the development of commerce, was necessary. However, none of these explanations seems completely satisfactory, and people still wonder why it took so long for probability to be studied seriously. An interesting discussion of this problem can be found in Hacking.[14]

The first person to calculate probabilities systematically was Gerolamo Cardano (1501–1576) in his book Liber de Ludo Aleae. This was translated from the Latin by Gould and appears in the book Cardano: The Gambling Scholar by Ore.[15] Ore provides a fascinating discussion of the life of this colorful scholar with accounts of his interests in many different fields, including medicine, astrology, and mathematics. You will also find there a detailed account of Cardano's famous battle with Tartaglia over the solution to the cubic equation.

In his book on probability Cardano dealt only with the special case that we have called the uniform distribution function. This restriction to equiprobable outcomes was to continue for a long time. In this case Cardano realized that the probability that an event occurs is the ratio of the number of favorable outcomes to the total number of outcomes.

Many of Cardano's examples dealt with rolling dice. Here he realized that the outcomes for two rolls should be taken to be the 36 ordered pairs $(i, j)$ rather than the 21 unordered pairs.
This is a subtle point that was still causing problems much later for other writers on probability. For example, in the eighteenth century the famous French mathematician d'Alembert, author of several works on probability, claimed that when a coin is tossed twice the number of heads that turn up would be 0, 1, or 2, and hence we should assign equal probabilities for these three possible outcomes.[16] Cardano chose the correct sample space for his dice problems and calculated the correct probabilities for a variety of events.

[14] I. Hacking, The Emergence of Probability (Cambridge: Cambridge University Press, 1975).
[15] O. Ore, Cardano: The Gambling Scholar (Princeton: Princeton University Press, 1953).

Cardano's mathematical work is interspersed with a lot of advice to the potential gambler in short paragraphs, entitled, for example: "Who Should Play and When," "Why Gambling Was Condemned by Aristotle," "Do Those Who Teach Also Play Well?" and so forth. In a paragraph entitled "The Fundamental Principle of Gambling," Cardano writes:

The most fundamental principle of all in gambling is simply equal conditions, e.g., of opponents, of bystanders, of money, of situation, of the dice box, and of the die itself. To the extent to which you depart from that equality, if it is in your opponent's favor, you are a fool, and if in your own, you are unjust.[17]

Cardano did make mistakes, and if he realized it later he did not go back and change his error. For example, for an event that is favorable in three out of four cases, Cardano assigned the correct odds $3 : 1$ that the event will occur. But then he assigned odds by squaring these numbers (i.e., $9 : 1$) for the event to happen twice in a row.
Later, by considering the case where the odds are $1 : 1$, he realized that this cannot be correct and was led to the correct result that when $f$ out of $n$ outcomes are favorable, the odds for a favorable outcome twice in a row are $f^2 : n^2 - f^2$. Ore points out that this is equivalent to the realization that if the probability that an event happens in one experiment is $p$, the probability that it happens twice is $p^2$. Cardano proceeded to establish that for three successes the formula should be $p^3$ and for four successes $p^4$, making it clear that he understood that the probability is $p^n$ for $n$ successes in $n$ independent repetitions of such an experiment. This will follow from the concept of independence that we introduce in Section 4.1.

Cardano's work was a remarkable first attempt at writing down the laws of probability, but it was not the spark that started a systematic study of the subject. This came from a famous series of letters between Pascal and Fermat. This correspondence was initiated by Pascal to consult Fermat about problems he had been given by Chevalier de Méré, a well-known writer, a prominent figure at the court of Louis XIV, and an ardent gambler.

The first problem de Méré posed was a dice problem. The story goes that he had been betting that at least one six would turn up in four rolls of a die and winning too often, so he then bet that a pair of sixes would turn up in 24 rolls of a pair of dice. The probability of a six with one die is 1/6 and, by the product law for independent experiments, the probability of two sixes when a pair of dice is thrown is $(1/6)(1/6) = 1/36$. Ore[18] claims that a gambling rule of the time suggested that, since four repetitions was favorable for the occurrence of an event with probability 1/6, for an event six times as unlikely, $6 \cdot 4 = 24$ repetitions would be sufficient for a favorable bet. Pascal showed, by exact calculation, that 25 rolls are required for a favorable bet for a pair of sixes.

[16] J. d'Alembert, "Croix ou Pile," in L'Encyclopédie, ed. Diderot, vol. 4 (Paris, 1754).
[17] O. Ore, op. cit., p. 189.
[18] O. Ore, "Pascal and the Invention of Probability Theory," American Mathematical Monthly, vol. 67 (1960), pp. 409–419.

The second problem was a much harder one: it was an old problem and concerned the determination of a fair division of the stakes in a tournament when the series, for some reason, is interrupted before it is completed. This problem is now referred to as the problem of points. The problem had been a standard problem in mathematical texts; it appeared in Fra Luca Paccioli's book Summa de Arithmetica, Geometria, Proportioni et Proportionalità, printed in Venice in 1494,[19] in the form:

A team plays ball such that a total of 60 points are required to win the game, and each inning counts 10 points. The stakes are 10 ducats. By some incident they cannot finish the game and one side has 50 points and the other 20. One wants to know what share of the prize money belongs to each side. In this case I have found that opinions differ from one to another but all seem to me insufficient in their arguments, but I shall state the truth and give the correct way.

Reasonable solutions, such as dividing the stakes according to the ratio of games won by each player, had been proposed, but no correct solution had been found at the time of the Pascal-Fermat correspondence. The letters deal mainly with the attempts of Pascal and Fermat to solve this problem. Blaise Pascal (1623–1662) was a child prodigy, having published his treatise on conic sections at age sixteen, and having invented a calculating machine at age eighteen.
At the time of the letters, his demonstration of the weight of the atmosphere had already established his position at the forefront of contemporary physicists. Pierre de Fermat (1601–1665) was a learned jurist in Toulouse, who studied mathematics in his spare time. He has been called by some the prince of amateurs and one of the greatest pure mathematicians of all times.

The letters, translated by Maxine Merrington, appear in Florence David's fascinating historical account of probability, Games, Gods and Gambling.[20] In a letter dated Wednesday, 29th July, 1654, Pascal writes to Fermat:

Sir, Like you, I am equally impatient, and although I am again ill in bed, I cannot help telling you that yesterday evening I received from M. de Carcavi your letter on the problem of points, which I admire more than I can possibly say. I have not the leisure to write at length, but, in a word, you have solved the two problems of points, one with dice and the other with sets of games with perfect justness; I am entirely satisfied with it for I do not doubt that I was in the wrong, seeing the admirable agreement in which I find myself with you now.

Your method is very sound and is the one which first came to my mind in this research; but because the labour of the combination is excessive, I have found a short cut and indeed another method which is much quicker and neater, which I would like to tell you here in a few words: for henceforth I would like to open my heart to you, if I may, as I am so overjoyed with our agreement. I see that truth is the same in Toulouse as in Paris.

[19] ibid., p. 414.
[20] F. N. David, Games, Gods and Gambling (London: G. Griffin, 1962), p. 230

                            Number of games A has won
                              0     1     2     3
  Number of games      3      0     0     0
  B has won            2      8    16    32    64
                       1     20    32    48    64
                       0     32    44    56    64

Figure 1.9: Pascal's table.
Here, more or less, is what I do to show the fair value of each game, when two opponents play, for example, in three games and each person has staked 32 pistoles.

Let us say that the first man had won twice and the other once; now they play another game, in which the conditions are that, if the first wins, he takes all the stakes; that is 64 pistoles; if the other wins it, then they have each won two games, and therefore, if they wish to stop playing, they must each take back their own stake, that is, 32 pistoles each.

Then consider, Sir, if the first man wins, he gets 64 pistoles; if he loses he gets 32. Thus if they do not wish to risk this last game but wish to separate without playing it, the first man must say: 'I am certain to get 32 pistoles, even if I lost I still get them; but as for the other 32, perhaps I will get them, perhaps you will get them, the chances are equal. Let us then divide these 32 pistoles in half and give one half to me as well as my 32 which are mine for sure.' He will then have 48 pistoles and the other 16.

Pascal's argument produces the table illustrated in Figure 1.9 for the amount due player A at any quitting point. Each entry in the table is the average of the numbers just above and to the right of the number. This fact, together with the known values when the tournament is completed, determines all the values in this table. If player A wins the first game, then he needs two games to win and B needs three games to win; and so, if the tournament is called off, A should receive 44 pistoles.

The letter in which Fermat presented his solution has been lost; but fortunately Pascal describes Fermat's method in a letter dated Monday, 24th August, 1654.
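Pascal's averaging rule for Figure 1.9 can be written as a short recursion. The sketch below is an illustration, not Pascal's own formulation: the function `due_to_a` and its parameterization by the number of games each player still needs are assumptions chosen for convenience. It uses the boundary values from the text (A takes all 64 pistoles on winning the series, nothing on losing it) and averages the two possible results of the next game.

```python
from fractions import Fraction
from functools import lru_cache

STAKES = 64        # total pot in pistoles: each player staked 32
GAMES_TO_WIN = 3   # the series goes to the first player with three wins

@lru_cache(maxsize=None)
def due_to_a(a_needs, b_needs):
    """Fair amount due A when A still needs a_needs wins and B needs b_needs."""
    if a_needs == 0:
        return Fraction(STAKES)   # A has already won the series
    if b_needs == 0:
        return Fraction(0)        # B has already won the series
    # The next game is equally likely to go either way: average the two results.
    return (due_to_a(a_needs - 1, b_needs) + due_to_a(a_needs, b_needs - 1)) / 2

# Print the table row by row, with B's wins decreasing from 3 to 0.
for b_won in range(GAMES_TO_WIN, -1, -1):
    row = []
    for a_won in range(GAMES_TO_WIN + 1):
        if a_won == GAMES_TO_WIN and b_won == GAMES_TO_WIN:
            continue  # both players cannot have won the series
        row.append(str(due_to_a(GAMES_TO_WIN - a_won, GAMES_TO_WIN - b_won)))
    print(b_won, row)
```

Calling `due_to_a(1, 2)` corresponds to the situation in Pascal's letter (the first man has won twice, the other once), and `due_to_a(2, 3)` to the case where A has won only the first game.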
From Pascal's letter:[21]

This is your procedure when there are two players: If two players, playing several games, find themselves in that position when the first man needs two games and second needs three, then to find the fair division of stakes, you say that one must know in how many games the play will be absolutely decided.

It is easy to calculate that this will be in four games, from which you can conclude that it is necessary to see in how many ways four games can be arranged between two players, and one must see how many combinations would make the first man win and how many the second and to share out the stakes in this proportion. I would have found it difficult to understand this if I had not known it myself already; in fact you had explained it with this idea in mind.

Fermat realized that the ways in which the game might be finished are not necessarily equally likely. For example, if A needs two more games and B needs three to win, two possible ways that the tournament might go for A to win are WLW and LWLW. These two sequences do not have the same chance of occurring. To avoid this difficulty, Fermat extended the play, adding fictitious plays, so that all the ways that the games might go have the same length, namely four. He was shrewd enough to realize that this extension would not change the winner and that he now could simply count the number of sequences favorable to each player since he had made them all equally likely. If we list all possible ways that the extended game of four plays might go, we obtain the following 16 possible outcomes of the play:

  WWWW*   WLWW*   LWWW*   LLWW*
  WWWL*   WLWL*   LWWL*   LLWL
  WWLW*   WLLW*   LWLW*   LLLW
  WWLL*   WLLL    LWLL    LLLL

Player A wins in the cases where there are at least two wins (the 11 cases marked with an asterisk), and B wins in the cases where there are at least three losses (the other 5 cases).
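Fermat's count can be reproduced by brute-force enumeration of the extended four-play sequences. The following sketch assumes, as in the text, that A needs two more wins, B needs three, each play is equally likely to go either way, and the stakes total 64 pistoles; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

A_NEEDS, B_NEEDS = 2, 3
N_PLAYS = A_NEEDS + B_NEEDS - 1   # after this many plays the series is decided
STAKES = 64                       # total pot in pistoles

# All equally likely sequences of wins (W) and losses (L) for player A.
sequences = ["".join(s) for s in product("WL", repeat=N_PLAYS)]

# A takes the series exactly when A wins at least A_NEEDS of the extended plays.
a_wins = [s for s in sequences if s.count("W") >= A_NEEDS]

p_a = Fraction(len(a_wins), len(sequences))
share_a = STAKES * p_a

print(len(sequences), len(a_wins), p_a, share_a)
```

The cost of this approach grows as $2^n$ in the number of plays, which is why more systematic counting methods become necessary for longer series.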
Since A wins in 11 of the 16 possible cases, Fermat argued that the probability that A wins is 11/16. If the stakes are 64 pistoles, A should receive 44 pistoles, in agreement with Pascal's result. Pascal and Fermat developed more systematic methods for counting the number of favorable outcomes for problems like this, and this will be one of our central problems. Such counting methods fall under the subject of combinatorics, which is the topic of Chapter 3.

[21] ibid., p. 239.

We see that these two mathematicians arrived at two very different ways to solve the problem of points. Pascal's method was to develop an algorithm and use it to calculate the fair division. This method is easy to implement on a computer and easy to generalize. Fermat's method, on the other hand, was to change the problem into an equivalent problem for which he could use counting or combinatorial methods. We will see in Chapter 3 that, in fact, Fermat used what has become known as Pascal's triangle! In our study of probability today we shall find that both the algorithmic approach and the combinatorial approach share equal billing, just as they did 300 years ago when probability got its start.

Exercises

1 Let $\Omega = \{a, b, c\}$ be a sample space. Let $m(a) = 1/2$, $m(b) = 1/3$, and $m(c) = 1/6$. Find the probabilities for all eight subsets of $\Omega$.

2 Give a possible sample space $\Omega$ for each of the following experiments:
  (a) An election decides between two candidates A and B.
  (b) A two-sided coin is tossed.
  (c) A student is asked for the month of the year and the day of the week on which her birthday falls.
  (d) A student is chosen at random from a class of ten students.
  (e) You receive a grade in this course.

3 For which of the cases in Exercise 2 would it be reasonable to assign the uniform distribution function?
4 Describe in words the events specified by the following subsets of
  $$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$$
  (see Example 1.6).
  (a) $E = \{\text{HHH, HHT, HTH, HTT}\}$.
  (b) $E = \{\text{HHH, TTT}\}$.
  (c) $E = \{\text{HHT, HTH, THH}\}$.
  (d) $E = \{\text{HHT, HTH, HTT, THH, THT, TTH, TTT}\}$.

5 What are the probabilities of the events described in Exercise 4?

6 A die is loaded in such a way that the probability of each face turning up is proportional to the number of dots on that face. (For example, a six is three times as probable as a two.) What is the probability of getting an even number in one throw?

7 Let $A$ and $B$ be events such that $P(A \cap B) = 1/4$, $P(\tilde{A}) = 1/3$, and $P(B) = 1/2$. What is $P(A \cup B)$?

8 A student must choose one of the subjects, art, geology, or psychology, as an elective. She is equally likely to choose art or psychology and twice as likely to choose geology. What are the respective probabilities that she chooses art, geology, and psychology?

9 A student must choose exactly two out of three electives: art, French, and mathematics. He chooses art with probability 5/8, French with probability 5/8, and art and French together with probability 1/4. What is the probability that he chooses mathematics? What is the probability that he chooses either art or French?

10 For a bill to come before the president of the United States, it must be passed by both the House of Representatives and the Senate. Assume that, of the bills presented to these two bodies, 60 percent pass the House, 80 percent pass the Senate, and 90 percent pass at least one of the two. Calculate the probability that the next bill presented to the two groups will come before the president.

11 What odds should a person give in favor of the following events?
  (a) A card chosen at random from a 52-card deck is an ace.
  (b) Two heads will turn up when a coin is tossed twice.
  (c) Boxcars (two sixes) will turn up when two dice are rolled.

12 You offer $3 : 1$ odds that your friend Smith will be elected mayor of your city. What probability are you assigning to the event that Smith wins?

13 In a horse race, the odds that Romance will win are listed as $2 : 3$ and that Downhill will win are $1 : 2$. What odds should be given for the event that either Romance or Downhill wins?

14 Let $X$ be a random variable with distribution function $m_X(x)$ defined by
  $$m_X(-1) = 1/5, \quad m_X(0) = 1/5, \quad m_X(1) = 2/5, \quad m_X(2) = 1/5.$$
  (a) Let $Y$ be the random variable defined by the equation $Y = X + 3$. Find the distribution function $m_Y(y)$ of $Y$.
  (b) Let $Z$ be the random variable defined by the equation $Z = X^2$. Find the distribution function $m_Z(z)$ of $Z$.

*15 John and Mary are taking a mathematics course. The course has only three grades: A, B, and C. The probability that John gets a B is .3. The probability that Mary gets a B is .4. The probability that neither gets an A but at least one gets a B is .1. What is the probability that at least one gets a B but neither gets a C?

16 In a fierce battle, not less than 70 percent of the soldiers lost one eye, not less than 75 percent lost one ear, not less than 80 percent lost one hand, and not less than 85 percent lost one leg. What is the minimal possible percentage of those who simultaneously lost one ear, one eye, one hand, and one leg?[22]

*17 Assume that the probability of a "success" on a single experiment with $n$ outcomes is $1/n$. Let $m$ be the number of experiments necessary to make it a favorable bet that at least one success will occur (see Exercise 1.1.5).
  (a) Show that the probability that, in $m$ trials, there are no successes is $(1 - 1/n)^m$.
  (b) (de Moivre) Show that if $m = n \log 2$ then
  $$\lim_{n \to \infty} \left(1 - \frac{1}{n}\right)^m = \frac{1}{2}.$$
  Hint:
  $$\lim_{n \to \infty} \left(1 - \frac{1}{n}\right)^n = e^{-1}.$$
  Hence for large $n$ we should choose $m$ to be about $n \log 2$.
  (c) Would de Moivre have been led to the correct answer for de Méré's two bets if he had used his approximation?

18 (a) For events $A_1, \dots, A_n$, prove that
  $$P(A_1 \cup \cdots \cup A_n) \le P(A_1) + \cdots + P(A_n).$$
  (b) For events $A$ and $B$, prove that
  $$P(A \cap B) \ge P(A) + P(B) - 1.$$

19 If $A$, $B$, and $C$ are any three events, show that
  $$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(C \cap A) + P(A \cap B \cap C).$$

20 Explain why it is not possible to define a uniform distribution function (see Definition 1.3) on a countably infinite sample space. Hint: Assume $m(\omega) = a$ for all $\omega$, where $0 \le a \le 1$. Does $m(\omega)$ have all the properties of a distribution function?

21 In Example 1.13 find the probability that the coin turns up heads for the first time on the tenth, eleventh, or twelfth toss.

22 A die is rolled until the first time that a six turns up. We shall see that the probability that this occurs on the $n$th roll is $(5/6)^{n-1} \cdot (1/6)$. Using this fact, describe the appropriate infinite sample space and distribution function for the experiment of rolling a die until a six turns up for the first time. Verify that for your distribution function $\sum_{\omega} m(\omega) = 1$.

[22] See Knot X, in Lewis Carroll, Mathematical Recreations, vol. 2 (Dover, 1958).

23 Let $\Omega$ be the sample space
  $$\Omega = \{0, 1, 2, \dots\},$$
  and define a distribution function by
  $$m(j) = (1 - r)^j r,$$
  for some fixed $r$, $0 < r < 1$, and for $j = 0, 1, 2, \dots$. Show that this is a distribution function for $\Omega$.

24 Our calendar has a 400-year cycle. B. H. Brown noticed that the number of times the thirteenth of the month falls on each of the days of the week in the 4800 months of a cycle is as follows:

  Sunday      687
  Monday      685
  Tuesday     685
  Wednesday   687
  Thursday    684
  Friday      688
  Saturday    684

  From this he deduced that the thirteenth was more likely to fall on Friday than on any other day. Explain what he meant by this.

25 Tversky and Kahneman[23] asked a group of subjects to carry out the following task. They are told that:

  Linda is 31, single, outspoken, and very bright. She majored in philosophy in college. As a student, she was deeply concerned with racial discrimination and other social issues, and participated in anti-nuclear demonstrations.

  The subjects are then asked to rank the likelihood of various alternatives, such as:
  (1) Linda is active in the feminist movement.
  (2) Linda is a bank teller.
  (3) Linda is a bank teller and active in the feminist movement.

  Tversky and Kahneman found that between 85 and 90 percent of the subjects rated alternative (1) most likely, but alternative (3) more likely than alternative (2). Is it? They call this phenomenon the conjunction fallacy, and note that it appears to be unaffected by prior training in probability or statistics. Is this phenomenon a fallacy? If so, why? Can you give a possible explanation for the subjects' choices?

[23] K. McKean, "Decisions, Decisions," pp. 22–31.

26 Two cards are drawn successively from a deck of 52 cards. Find the probability that the second card is higher in rank than the first card. Hint: Show that $1 = P(\text{higher}) + P(\text{lower}) + P(\text{same})$ and use the fact that $P(\text{higher}) = P(\text{lower})$.

27 A life table is a table that lists for a given number of births the estimated number of people who will live to a given age.
In Appendix C we give a life table based upon 100,000 births for ages from 0 to 85, both for women and for men. Show how from this table you can estimate the probability $m(x)$ that a person born in 1981 would live to age $x$. Write a program to plot $m(x)$ both for men and for women, and comment on the differences that you see in the two cases.

*28 Here is an attempt to get around the fact that we cannot choose a "random integer."

(a) What, intuitively, is the probability that a "randomly chosen" positive integer is a multiple of 3?

(b) Let $P_3(N)$ be the probability that an integer, chosen at random between 1 and $N$, is a multiple of 3 (since the sample space is finite, this is a legitimate probability). Show that the limit
$$P_3 = \lim_{N \to \infty} P_3(N)$$
exists and equals 1/3. This formalizes the intuition in (a), and gives us a way to assign "probabilities" to certain events that are infinite subsets of the positive integers.

(c) If $A$ is any set of positive integers, let $A(N)$ mean the number of elements of $A$ which are less than or equal to $N$. Then define the "probability" of $A$ as
$$P(A) = \lim_{N \to \infty} A(N)/N,$$
provided this limit exists. Show that this definition would assign probability 0 to any finite set and probability 1 to the set of all positive integers. Thus, the probability of the set of all integers is not the sum of the probabilities of the individual integers in this set. This means that the definition of probability given here is not a completely satisfactory definition.

(d) Let $A$ be the set of all positive integers with an odd number of digits. Show that $P(A)$ does not exist. This shows that under the above definition of probability, not all sets have probabilities.

29 (from Sholander [24]) In a standard cloverleaf interchange, there are four ramps for making right-hand turns, and inside these four ramps, there are four more ramps for making left-hand turns.
Your car approaches the interchange from the south. A mechanism has been installed so that at each point where there exists a choice of directions, the car turns to the right with fixed probability $r$.

(a) If $r = 1/2$, what is your chance of emerging from the interchange going west?

(b) Find the value of $r$ that maximizes your chance of a westward departure from the interchange.

[24] M. Sholander, Problem #1034, Mathematics Magazine, vol. 52, no. 3 (May 1979), p. 183.

30 (from Benkoski [25]) Consider a "pure" cloverleaf interchange in which there are no ramps for right-hand turns, but only the two intersecting straight highways with cloverleaves for left-hand turns. (Thus, to turn right in such an interchange, one must make three left-hand turns.) As in the preceding problem, your car approaches the interchange from the south. What is the value of $r$ that maximizes your chances of an eastward departure from the interchange?

31 (from vos Savant [26]) A reader of Marilyn vos Savant's column wrote in with the following question:

"My dad heard this story on the radio. At Duke University, two students had received A's in chemistry all semester. But on the night before the final exam, they were partying in another state and didn't get back to Duke until it was over. Their excuse to the professor was that they had a flat tire, and they asked if they could take a make-up test. The professor agreed, wrote out a test and sent the two to separate rooms to take it. The first question (on one side of the paper) was worth 5 points, and they answered it easily. Then they flipped the paper over and found the second question, worth 95 points: 'Which tire was it?' What was the probability that both students would say the same thing? My dad and I think it's 1 in 16. Is that right?"

(a) Is the answer 1/16?
(b) The following question was asked of a class of students. "I was driving to school today and one of my tires went flat. Which tire do you think it was?" The responses were as follows: right front, 58%, left front, 11%, right rear, 18%, left rear, 13%. Suppose that this distribution holds in the general population, and assume that the two test-takers are randomly chosen from the general population. What is the probability that they will give the same answer to the second question?

[25] S. Benkoski, Comment on Problem #1034, Mathematics Magazine, vol. 52, no. 3 (May 1979), pp. 183–184.
[26] M. vos Savant, Parade Magazine, 3 March 1996, p. 14.

Chapter 2
Continuous Probability Densities

2.1 Simulation of Continuous Probabilities

In this section we shall show how we can use computer simulations for experiments that have a whole continuum of possible outcomes.

Probabilities

Example 2.1 We begin by constructing a spinner, which consists of a circle of unit circumference and a pointer as shown in Figure 2.1. We pick a point on the circle and label it 0, and then label every other point on the circle with the distance, say $x$, from 0 to that point, measured counterclockwise. The experiment consists of spinning the pointer and recording the label of the point at the tip of the pointer. We let the random variable $X$ denote the value of this outcome. The sample space is clearly the interval $[0, 1)$. We would like to construct a probability model in which each outcome is equally likely to occur.

If we proceed as we did in Chapter 1 for experiments with a finite number of possible outcomes, then we must assign the probability 0 to each outcome, since otherwise, the sum of the probabilities, over all of the possible outcomes, would not equal 1.
(In fact, summing an uncountable number of real numbers is a tricky business; in particular, in order for such a sum to have any meaning, at most countably many of the summands can be different from 0.) However, if all of the assigned probabilities are 0, then the sum is 0, not 1, as it should be.

In the next section, we will show how to construct a probability model in this situation. At present, we will assume that such a model can be constructed. We will also assume that in this model, if $E$ is an arc of the circle, and $E$ is of length $p$, then the model will assign the probability $p$ to $E$. This means that if the pointer is spun, the probability that it ends up pointing to a point in $E$ equals $p$, which is certainly a reasonable thing to expect.

[Figure 2.1: A spinner.]

To simulate this experiment on a computer is an easy matter. Many computer software packages have a function which returns a random real number in the interval $[0, 1]$. Actually, the returned value is always a rational number, and the values are determined by an algorithm, so a sequence of such values is not truly random. Nevertheless, the sequences produced by such algorithms behave much like theoretically random sequences, so we can use such sequences in the simulation of experiments. On occasion, we will need to refer to such a function. We will call this function rnd. □

Monte Carlo Procedure and Areas

It is sometimes desirable to estimate quantities whose values are difficult or impossible to calculate exactly. In some of these cases, a procedure involving chance, called a Monte Carlo procedure, can be used to provide such an estimate.

Example 2.2 In this example we show how simulation can be used to estimate areas of plane figures.
Suppose that we program our computer to provide a pair $(x, y)$ of numbers, each chosen independently at random from the interval $[0, 1]$. Then we can interpret this pair $(x, y)$ as the coordinates of a point chosen at random from the unit square. Events are subsets of the unit square. Our experience with Example 2.1 suggests that the point is equally likely to fall in subsets of equal area. Since the total area of the square is 1, the probability of the point falling in a specific subset $E$ of the unit square should be equal to its area. Thus, we can estimate the area of any subset of the unit square by estimating the probability that a point chosen at random from this square falls in the subset.

We can use this method to estimate the area of the region $E$ under the curve $y = x^2$ in the unit square (see Figure 2.2). We choose a large number of points $(x, y)$ at random and record what fraction of them fall in the region $E = \{(x, y) : y \le x^2\}$.

The program MonteCarlo will carry out this experiment for us. Running this program for 10,000 experiments gives an estimate of .325 (see Figure 2.3). From these experiments we would estimate the area to be about 1/3. Of course, for this simple region we can find the exact area by calculus. In fact,
$$\text{Area of } E = \int_0^1 x^2 \, dx = \frac{1}{3}.$$

[Figure 2.2: Area under $y = x^2$.]

We have remarked in Chapter 1 that, when we simulate an experiment of this type $n$ times to estimate a probability, we can expect the answer to be in error by at most $1/\sqrt{n}$ at least 95 percent of the time. For 10,000 experiments we can expect an accuracy of 0.01, and our simulation did achieve this accuracy.

This same argument works for any region $E$ of the unit square. For example, suppose $E$ is the circle with center $(1/2, 1/2)$ and radius 1/2.
Then the probability that our random point $(x, y)$ lies inside the circle is equal to the area of the circle, that is,
$$P(E) = \pi \left(\frac{1}{2}\right)^2 = \frac{\pi}{4}.$$
If we did not know the value of $\pi$, we could estimate the value by performing this experiment a large number of times! □

The above example is not the only way of estimating the value of $\pi$ by a chance experiment. Here is another way, discovered by Buffon. [1]

[1] G. L. Buffon, in "Essai d'Arithmétique Morale," Oeuvres Complètes de Buffon avec Supplements, tome iv, ed. Duménil (Paris, 1836).

[Figure 2.3: Computing the area by simulation. Plot annotation: 1000 trials, estimate of area is .325.]

Buffon's Needle

Example 2.3 Suppose that we take a card table and draw across the top surface a set of parallel lines a unit distance apart. We then drop a common needle of unit length at random on this surface and observe whether or not the needle lies across one of the lines. We can describe the possible outcomes of this experiment by coordinates as follows: Let $d$ be the distance from the center of the needle to the nearest line. Next, let $L$ be the line determined by the needle, and define $\theta$ as the acute angle that the line $L$ makes with the set of parallel lines. (The reader should certainly be wary of this description of the sample space. We are attempting to coordinatize a set of line segments. To see why one must be careful in the choice of coordinates, see Example 2.6.) Using this description, we have $0 \le d \le 1/2$, and $0 \le \theta \le \pi/2$. Moreover, we see that the needle lies across the nearest line if and only if the hypotenuse of the triangle (see Figure 2.4) is less than half the length of the needle, that is,
$$\frac{d}{\sin \theta} < \frac{1}{2}.$$

Now we assume that when the needle drops, the pair $(\theta, d)$ is chosen at random from the rectangle $0 \le \theta \le \pi/2$, $0 \le d \le 1/2$. We observe whether the needle lies across the nearest line (i.e., whether $d \le (1/2) \sin \theta$).
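The sampling scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the book's BuffonsNeedle program: the function name `estimate_crossing_probability` is hypothetical, and Python's `random` module stands in for the function rnd.

```python
import math
import random


def estimate_crossing_probability(num_needles, seed=None):
    """Estimate the probability that a unit needle crosses a line.

    Each drop is modeled by an angle theta chosen at random from
    [0, pi/2] and a distance d chosen at random from [0, 1/2]; the
    needle crosses the nearest line when d <= (1/2) sin(theta).
    """
    rng = random.Random(seed)  # hypothetical stand-in for rnd
    crossings = 0
    for _ in range(num_needles):
        theta = rng.uniform(0.0, math.pi / 2)
        d = rng.uniform(0.0, 0.5)
        if d <= 0.5 * math.sin(theta):
            crossings += 1
    return crossings / num_needles


if __name__ == "__main__":
    # Fraction of 10,000 simulated needles that cross a line.
    print(estimate_crossing_probability(10_000, seed=1))
```

The returned fraction is an estimate of the probability of the crossing event; the estimate varies from run to run unless a seed is fixed.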
The probability of this event $E$ is the fraction of the area of the rectangle which lies inside $E$ (see Figure 2.5).

[Figure 2.4: Buffon's experiment.]

[Figure 2.5: Set $E$ of pairs $(\theta, d)$ with $d < \frac{1}{2} \sin \theta$.]

Now the area of the rectangle is $\pi/4$, while the area of $E$ is
$$\text{Area} = \int_0^{\pi/2} \frac{1}{2} \sin \theta \, d\theta = \frac{1}{2}.$$
Hence, we get
$$P(E) = \frac{1/2}{\pi/4} = \frac{2}{\pi}.$$

The program BuffonsNeedle simulates this experiment. In Figure 2.6, we show the position of every 100th needle in a run of the program in which 10,000 needles were "dropped." Our final estimate for $\pi$ is 3.139. While this was within 0.003 of the true value for $\pi$, we had no right to expect such accuracy. The reason for this is that our simulation estimates $P(E)$. While we can expect this estimate to be in error by at most 0.01, a small error in $P(E)$ gets magnified when we use this to compute $\pi = 2/P(E)$. Perlman and Wichura, in their article "Sharpening Buffon's Needle," [2] show that we can expect to have an error of not more than $5/\sqrt{n}$ about 95 percent of the time. Here $n$ is the number of needles dropped. Thus for 10,000 needles we should expect an error of no more than 0.05, and that was the case here. We see that a large number of experiments is necessary to get a decent estimate for $\pi$. □

[Figure 2.6: Simulation of Buffon's needle experiment (10,000 needles; estimate 3.139).]

In each of our examples so far, events of the same size are equally likely. Here is an example where they are not. We will see many other such examples later.

Example 2.4 Suppose that we choose two random real numbers in $[0, 1]$ and add them together. Let $X$ be the sum. How is $X$ distributed?
To help understand the answer to this question, we can use the program Areabargraph. This program produces a bar graph with the property that on each interval, the area, rather than the height, of the bar is equal to the fraction of outcomes that fell in the corresponding interval. We have carried out this experiment 1000 times; the data is shown in Figure 2.7. It appears that the function defined by
$$f(x) = \begin{cases} x, & \text{if } 0 \le x \le 1, \\ 2 - x, & \text{if } 1 < x \le 2 \end{cases}$$
fits the data very well. (It is shown in the figure.) In the next section, we will see that this function is the "right" function. By this we mean that if $a$ and $b$ are any two real numbers between 0 and 2, with $a \le b$, then we can use this function to calculate the probability that $a \le X \le b$. To understand how this calculation might be performed, we again consider Figure 2.7. Because of the way the bars were constructed, the sum of the areas of the bars corresponding to the interval $[a, b]$ approximates the probability that $a \le X \le b$. But the sum of the areas of these bars also approximates the integral
$$\int_a^b f(x) \, dx.$$
This suggests that for an experiment with a continuum of possible outcomes, if we find a function with the above property, then we will be able to use it to calculate probabilities. In the next section, we will show how to determine the function $f(x)$. □

[2] M. D. Perlman and M. J. Wichura, "Sharpening Buffon's Needle," The American Statistician, vol. 29, no. 4 (1975), pp. 157–163.

[Figure 2.7: Sum of two random numbers.]

Example 2.5 Suppose that we choose 100 random numbers in $[0, 1]$, and let $X$ represent their sum. How is $X$ distributed? We have carried out this experiment 10000 times; the results are shown in Figure 2.8. It is not so clear what function fits the bars in this case.
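The area bar graph used in Examples 2.4 and 2.5 can be sketched as follows. This is a minimal illustration and not the book's Areabargraph program: the helper names `sum_of_uniforms` and `area_bar_heights` are hypothetical, Python's `random` module stands in for rnd, and only the bar heights are computed, with the plotting left out.

```python
import random


def sum_of_uniforms(n, rng):
    """Return the sum of n random numbers, each chosen from [0, 1)."""
    return sum(rng.random() for _ in range(n))


def area_bar_heights(samples, low, high, num_bars):
    """Heights of an area bar graph over the interval [low, high).

    The bars have equal width, and each height is chosen so that the
    bar's area (height * width) equals the fraction of all samples
    that fell in that bar's interval.
    """
    width = (high - low) / num_bars
    counts = [0] * num_bars
    for value in samples:
        if low <= value < high:
            # Guard against rounding pushing the index past the last bar.
            index = min(int((value - low) / width), num_bars - 1)
            counts[index] += 1
    return [count / (len(samples) * width) for count in counts]


if __name__ == "__main__":
    rng = random.Random(2)
    # Example 2.5: 10,000 sums of 100 random numbers, bars of width 1.
    sums = [sum_of_uniforms(100, rng) for _ in range(10_000)]
    for k, height in enumerate(area_bar_heights(sums, 40.0, 60.0, 20)):
        print(f"[{40 + k}, {41 + k}): {height:.4f}")
```

Calling `sum_of_uniforms(2, rng)` instead gives the experiment of Example 2.4, with bars over the interval from 0 to 2.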
It turns out that the type of function which does the job is called a normal density function. This type of function is sometimes referred to as a "bell-shaped" curve. It is among the most important functions in the subject of probability, and will be formally defined in Section 5.2. □

[Figure 2.8: Sum of 100 random numbers.]

Our last example explores the fundamental question of how probabilities are assigned.

Bertrand's Paradox

Example 2.6 A chord of a circle is a line segment both of whose endpoints lie on the circle. Suppose that a chord is drawn at random in a unit circle. What is the probability that its length exceeds $\sqrt{3}$?

Our answer will depend on what we mean by random, which will depend, in turn, on what we choose for coordinates. The sample space $\Omega$ is the set of all possible chords in the circle. To find coordinates for these chords, we first introduce a rectangular coordinate system with origin at the center of the circle (see Figure 2.9). We note that a chord of a circle is perpendicular to the radial line containing the midpoint of the chord. We can describe each chord by giving:

1. The rectangular coordinates $(x, y)$ of the midpoint $M$, or
2. The polar coordinates $(r, \theta)$ of the midpoint $M$, or
3. The polar coordinates $(1, \alpha)$ and $(1, \beta)$ of the endpoints $A$ and $B$.

[Figure 2.9: Random chord.]

In each case we shall interpret at random to mean: choose these coordinates at random.

We can easily estimate this probability by computer simulation. In programming this simulation, it is convenient to include certain simplifications, which we describe in turn:

1. To simulate this case, we choose values for $x$ and $y$ from $[-1, 1]$ at random. Then we check whether $x^2 + y^2 \le 1$.
If not, the point $M = (x, y)$ lies outside the circle and cannot be the midpoint of any chord, and we ignore it. Otherwise, $M$ lies inside the circle and is the midpoint of a unique chord, whose length $L$ is given by the formula:
$$L = 2\sqrt{1 - (x^2 + y^2)}.$$

2. To simulate this case, we take account of the fact that any rotation of the circle does not change the length of the chord, so we might as well assume in advance that the chord is horizontal. Then we choose $r$ from $[-1, 1]$ at random, and compute the length of the resulting chord with midpoint $(r, \pi/2)$ by the formula:
$$L = 2\sqrt{1 - r^2}.$$

3. To simulate this case, we assume that one endpoint, say $B$, lies at $(1, 0)$ (i.e., that $\beta = 0$). Then we choose a value for $\alpha$ from $[0, 2\pi]$ at random and compute the length of the resulting chord, using the Law of Cosines, by the formula:
$$L = \sqrt{2 - 2\cos \alpha}.$$

The program BertrandsParadox carries out this simulation. Running this program produces the results shown in Figure 2.10. In the first circle in this figure, a smaller circle has been drawn. Those chords which intersect this smaller circle have length at least $\sqrt{3}$. In the second circle in the figure, the vertical line intersects all chords of length at least $\sqrt{3}$. In the third circle, again the vertical line intersects all chords of length at least $\sqrt{3}$.

In each case we run the experiment a large number of times and record the fraction of these lengths that exceed $\sqrt{3}$. We have printed the results of every 100th trial up to 10,000 trials.

It is interesting to observe that these fractions are not the same in the three cases; they depend on our choice of coordinates. This phenomenon was first observed by Bertrand, and is now known as Bertrand's paradox. [3] It is actually not a paradox at all; it is merely a reflection of the fact that different choices of coordinates will lead to different assignments of probabilities.
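The three simulation schemes above can be sketched in Python as follows. This is an illustrative sketch, not the book's BertrandsParadox program: all function names are hypothetical, and Python's `random` module stands in for rnd.

```python
import math
import random

SQRT3 = math.sqrt(3)


def chord_length_midpoint_xy(rng):
    """Case 1: rectangular coordinates (x, y) of the midpoint."""
    while True:
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        s = x * x + y * y
        if s <= 1.0:  # points outside the circle are ignored
            return 2.0 * math.sqrt(1.0 - s)


def chord_length_midpoint_r(rng):
    """Case 2: polar coordinate r of the midpoint (chord taken horizontal)."""
    r = rng.uniform(-1.0, 1.0)
    return 2.0 * math.sqrt(1.0 - r * r)


def chord_length_endpoints(rng):
    """Case 3: endpoint B fixed at (1, 0), angle alpha of A chosen at random."""
    alpha = rng.uniform(0.0, 2.0 * math.pi)
    return math.sqrt(2.0 - 2.0 * math.cos(alpha))


def fraction_longer_than_sqrt3(sampler, trials, seed=None):
    """Fraction of simulated chords whose length exceeds sqrt(3)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if sampler(rng) > SQRT3)
    return hits / trials


if __name__ == "__main__":
    for sampler in (chord_length_midpoint_xy,
                    chord_length_midpoint_r,
                    chord_length_endpoints):
        print(sampler.__name__, fraction_longer_than_sqrt3(sampler, 10_000, seed=1))
```

Running the three samplers side by side shows the effect discussed in the text: the three fractions differ, because each choice of coordinates assigns probabilities to chords in a different way.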
Which assignment is "correct" depends on what application or interpretation of the model one has in mind. One can imagine a real experiment involving throwing long straws at a circle drawn on a card table. A "correct" assignment of coordinates should not depend on where the circle lies on the card table, or where the card table sits in the room. Jaynes [4] has shown that the only assignment which meets this requirement is (2). In this sense, the assignment (2) is the natural, or "correct" one (see Exercise 11).

We can easily see in each case what the true probabilities are if we note that $\sqrt{3}$ is the length of the side of an inscribed equilateral triangle. Hence, a chord has length $L > \sqrt{3}$ if its midpoint has distance $d < 1/2$ from the origin (see Figure 2.9). The following calculations determine the probability that $L > \sqrt{3}$ in each of the three cases.

[3] J. Bertrand, Calcul des Probabilités (Paris: Gauthier-Villars, 1889).
[4] E. T. Jaynes, "The Well-Posed Problem," in Papers on Probability, Statistics and Statistical Physics, R. D. Rosencrantz, ed. (Dordrecht: D. Reidel, 1983), pp. 133–148.

[Figure 2.10: Bertrand's paradox. Three panels, 10,000 trials each; plotted fractions .488, .227, and .332.]

1. $L > \sqrt{3}$ if $(x, y)$ lies inside a circle of radius 1/2, which occurs with probability
$$p = \frac{\pi (1/2)^2}{\pi (1)^2} = \frac{1}{4}.$$

2. $L > \sqrt{3}$ if $|r| < 1/2$, which occurs with probability
$$\frac{1/2 - (-1/2)}{1 - (-1)} = \frac{1}{2}.$$

3. $L > \sqrt{3}$ if $2\pi/3 < \alpha < 4\pi/3$, which occurs with probability
$$\frac{4\pi/3 - 2\pi/3}{2\pi - 0} = \frac{1}{3}.$$

We see that our simulations agree quite well with these theoretical values. □

Historical Remarks

G. L. Buffon (1707–1788) was a natural scientist in the eighteenth century who applied probability to a number of his investigations. His work is found in his monumental 44-volume Histoire Naturelle and its supplements. [5] For example, he presented a number of mortality tables and used them to compute, for each age group, the expected remaining lifetime. From his table he observed: the expected remaining lifetime of an infant of one year is 33 years, while that of a man of 21 years is also approximately 33 years. Thus, a father who is not yet 21 can hope to live longer than his one year old son, but if the father is 40, the odds are already 3 to 2 that his son will outlive him. [6]

[5] G. L. Buffon, Histoire Naturelle, Generali et Particular avec le Description du Cabinet du Roy, 44 vols. (Paris: L'Imprimerie Royale, 1749–1803).

Buffon wanted to show that not all probability calculations rely only on algebra, but that some rely on geometrical calculations. One such problem was his famous "needle problem" as discussed in this chapter. [7] In his original formulation, Buffon describes a game in which two gamblers drop a loaf of French bread on a wide-board floor and bet on whether or not the loaf falls across a crack in the floor. Buffon asked: what length $L$ should the bread loaf be, relative to the width $W$ of the floorboards, so that the game is fair? He found the correct answer ($L = (\pi/4)W$) using essentially the methods described in this chapter. He also considered the case of a checkerboard floor, but gave the wrong answer in this case. The correct answer was given later by Laplace.

Experimenter         Length of   Number of   Number of   Estimate
                     needle      casts       crossings   for π
Wolf, 1850           .8          5000        2532        3.1596
Smith, 1855          .6          3204        1218.5      3.1553
De Morgan, c.1860    1.0         600         382.5       3.137
Fox, 1864            .75         1030        489         3.1595
Lazzerini, 1901      .83         3408        1808        3.1415929
Reina, 1925          .5419       2520        869         3.1795

Table 2.1: Buffon needle experiments to estimate $\pi$.
The literature contains descriptions of a number of experiments that were actually carried out to estimate $\pi$ by this method of dropping needles. N. T. Gridgeman [8] discusses the experiments shown in Table 2.1. (The halves in the number of crossings come from a compromise when it could not be decided if a crossing had actually occurred.) He observes, as we have, that 10,000 casts could do no more than establish the first decimal place of $\pi$ with reasonable confidence. Gridgeman points out that, although none of the experiments used even 10,000 casts, they are surprisingly good, and in some cases, too good. The fact that the number of casts is not always a round number would suggest that the authors might have resorted to clever stopping to get a good answer. Gridgeman comments that Lazzerini's estimate turned out to agree with a well-known approximation to $\pi$, $355/113 = 3.1415929$, discovered by the fifth-century Chinese mathematician, Tsu Ch'ungchih. Gridgeman says that he did not have Lazzerini's original report, and while waiting for it (knowing only the needle crossed a line 1808 times in 3408 casts) deduced that the length of the needle must have been 5/6. He calculated this from Buffon's formula, assuming $\pi = 355/113$:
$$L = \frac{\pi P(E)}{2} = \frac{1}{2} \left(\frac{355}{113}\right) \left(\frac{1808}{3408}\right) = \frac{5}{6} = .8333.$$
Even with careful planning one would have to be extremely lucky to be able to stop so cleverly.

[6] G. L. Buffon, "Essai d'Arithmétique Morale," p. 301.
[7] ibid., pp. 277–278.
[8] N. T. Gridgeman, "Geometric Probability and the Number $\pi$," Scripta Mathematika, vol. 25, no. 3, (1960), pp. 183–195.
The second author likes to trace his interest in probability theory to the Chicago World's Fair of 1933 where he observed a mechanical device dropping needles and displaying the ever-changing estimates for the value of $\pi$. (The first author likes to trace his interest in probability theory to the second author.)

Exercises

*1 In the spinner problem (see Example 2.1) divide the unit circumference into three arcs of length 1/2, 1/3, and 1/6. Write a program to simulate the spinner experiment 1000 times and print out what fraction of the outcomes fall in each of the three arcs. Now plot a bar graph whose bars have width 1/2, 1/3, and 1/6, and areas equal to the corresponding fractions as determined by your simulation. Show that the heights of the bars are all nearly the same.

2 Do the same as in Exercise 1, but divide the unit circumference into five arcs of length 1/3, 1/4, 1/5, 1/6, and 1/20.

3 Alter the program MonteCarlo to estimate the area of the circle of radius 1/2 with center at $(1/2, 1/2)$ inside the unit square by choosing 1000 points at random. Compare your results with the true value of $\pi/4$. Use your results to estimate the value of $\pi$. How accurate is your estimate?

4 Alter the program MonteCarlo to estimate the area under the graph of $y = \sin \pi x$ inside the unit square by choosing 10,000 points at random. Now calculate the true value of this area and use your results to estimate the value of $\pi$. How accurate is your estimate?

5 Alter the program MonteCarlo to estimate the area under the graph of $y = 1/(x + 1)$ in the unit square in the same way as in Exercise 4. Calculate the true value of this area and use your simulation results to estimate the value of $\log 2$. How accurate is your estimate?
6 To simulate Buffon's needle problem we choose independently the distance $d$ and the angle $\theta$ at random, with $0 \le d \le 1/2$ and $0 \le \theta \le \pi/2$, and check whether $d \le (1/2) \sin \theta$. Doing this a large number of times, we estimate $\pi$ as $2/a$, where $a$ is the fraction of the times that $d \le (1/2) \sin \theta$. Write a program to estimate $\pi$ by this method. Run your program several times for each of 100, 1000, and 10,000 experiments. Does the accuracy of the experimental approximation for $\pi$ improve as the number of experiments increases?

7 For Buffon's needle problem, Laplace [9] considered a grid with horizontal and vertical lines one unit apart. He showed that the probability that a needle of length $L \le 1$ crosses at least one line is
$$p = \frac{4L - L^2}{\pi}.$$
To simulate this experiment we choose at random an angle $\theta$ between 0 and $\pi/2$ and independently two numbers $d_1$ and $d_2$ between 0 and $L/2$. (The two numbers represent the distance from the center of the needle to the nearest horizontal and vertical line.) The needle crosses a line if either $d_1 \le (L/2) \sin \theta$ or $d_2 \le (L/2) \cos \theta$. We do this a large number of times and estimate $\pi$ as
$$\pi = \frac{4L - L^2}{a},$$
where $a$ is the proportion of times that the needle crosses at least one line. Write a program to estimate $\pi$ by this method, run your program for 100, 1000, and 10,000 experiments, and compare your results with Buffon's method described in Exercise 6. (Take $L = 1$.)

8 A long needle of length $L$ much bigger than 1 is dropped on a grid with horizontal and vertical lines one unit apart. We will see (in Exercise 6.3.28) that the average number $a$ of lines crossed is approximately
$$a = \frac{4L}{\pi}.$$
To estimate $\pi$ by simulation, pick an angle $\theta$ at random between 0 and $\pi/2$ and compute $L \sin \theta + L \cos \theta$. This may be used for the number of lines crossed.
Repeat this many times and estimate $\pi$ by
$$\pi = \frac{4L}{a},$$
where $a$ is the average number of lines crossed per experiment. Write a program to simulate this experiment and run your program for the number of experiments equal to 100, 1000, and 10,000. Compare your results with the methods of Laplace or Buffon for the same number of experiments. (Use $L = 100$.)

The following exercises involve experiments in which not all outcomes are equally likely. We shall consider such experiments in detail in the next section, but we invite you to explore a few simple cases here.

9 A large number of waiting time problems have an exponential distribution of outcomes. We shall see (in Section 5.2) that such outcomes are simulated by computing $(-1/\lambda) \log(\text{rnd})$, where $\lambda > 0$. For waiting times produced in this way, the average waiting time is $1/\lambda$. For example, the times spent waiting for a car to pass on a highway, or the times between emissions of particles from a radioactive source, are simulated by a sequence of random numbers, each of which is chosen by computing $(-1/\lambda) \log(\text{rnd})$, where $1/\lambda$ is the average time between cars or emissions. Write a program to simulate the times between cars when the average time between cars is 30 seconds. Have your program compute an area bar graph for these times by breaking the time interval from 0 to 120 into 24 subintervals. On the same pair of axes, plot the function $f(x) = (1/30) e^{-(1/30)x}$. Does the function fit the bar graph well?

[9] P. S. Laplace, Théorie Analytique des Probabilités (Paris: Courcier, 1812).

10 In Exercise 9, the distribution came "out of a hat." In this problem, we will again consider an experiment whose outcomes are not equally likely. We will determine a function $f(x)$ which can be used to determine the probability of certain events.
Let T b e the righ t triangle in the plane with v ertices at the p oin ts (0 ; 0) ; (1 ; 0) ; and (0 ; 1). The exp erimen t consists of pic king a p oin t at random in the in terior of T and recording only the x co ordinate of the p oin t. Th us, the sample space is the set [0 ; 1], but the outcomes do not seem to b e equally lik ely W e can sim ulate this exp erimen t b y asking a computer to return t w o random real n um b ers in [0 ; 1], and recording the rst of these t w o n um b ers if their sum is less than 1. W rite this program and run it for 10,000 trials. Then mak e a bar graph of the result, breaking the in terv al [0 ; 1] in to 10 in terv als. Compare the bar graph with the function f ( x ) = 2 2 x No w sho w that there is a constan t c suc h that the heigh t of T at the x co ordinate v alue x is c times f ( x ) for ev ery x in [0 ; 1]. Finally sho w that Z 1 0 f ( x ) dx = 1 : Ho w migh t one use the function f ( x ) to determine the probabilit y that the outcome is b et w een : 2 and : 5? 11 Here is another w a y to pic k a c hord at r andom on the circle of unit radius. Imagine that w e ha v e a card table whose sides are of length 100. W e place co ordinate axes on the table in suc h a w a y that eac h side of the table is parallel to one of the axes, and so that the cen ter of the table is the origin. W e no w place a circle of unit radius on the table so that the cen ter of the circle is the origin. No w pic k out a p oin t ( x 0 ; y 0 ) at random in the square, and an angle at random in the in terv al ( = 2 ; = 2). Let m = tan Then the equation of the line passing through ( x 0 ; y 0 ) with slop e m is y = y 0 + m ( x x 0 ) ; and the distance of this line from the cen ter of the circle (i.e., the origin) is d = y 0 mx 0 p m 2 + 1 : W e can use this distance form ula to c hec k whether the line in tersects the circle (i.e., whether d < 1). If so, w e consider the resulting c hord a r andom c hord. PAGE 63 2.2. 
This describes an experiment of dropping a long straw at random on a table on which a circle is drawn. Write a program to simulate this experiment 10000 times and estimate the probability that the length of the chord is greater than $\sqrt{3}$. How does your estimate compare with the results of Example 2.6?

2.2 Continuous Density Functions

In the previous section we have seen how to simulate experiments with a whole continuum of possible outcomes and have gained some experience in thinking about such experiments. Now we turn to the general problem of assigning probabilities to the outcomes and events in such experiments. We shall restrict our attention here to those experiments whose sample space can be taken as a suitably chosen subset of the line, the plane, or some other Euclidean space. We begin with some simple examples.

Spinners

Example 2.7 The spinner experiment described in Example 2.1 has the interval $[0, 1)$ as the set of possible outcomes. We would like to construct a probability model in which each outcome is equally likely to occur. We saw that in such a model, it is necessary to assign the probability 0 to each outcome. This does not at all mean that the probability of every event must be zero. On the contrary, if we let the random variable $X$ denote the outcome, then the probability
$$P(0 \le X < 1)$$
that the head of the spinner comes to rest somewhere in the circle should be equal to 1. Also, the probability that it comes to rest in the upper half of the circle should be the same as for the lower half, so that
$$P\left(0 \le X < \tfrac{1}{2}\right) = P\left(\tfrac{1}{2} \le X < 1\right) = \tfrac{1}{2} .$$
More generally, in our model, we would like the equation
$$P(c \le X < d) = d - c$$
to be true for every choice of $c$ and $d$. If we let $E = [c, d]$, then we can write the above formula in the form
$$P(E) = \int_E f(x)\,dx ,$$
where $f(x)$ is the constant function with value 1.
This should remind the reader of the corresponding formula in the discrete case for the probability of an event:
$$P(E) = \sum_{\omega \in E} m(\omega) .$$

[Figure 2.11: Spinner experiment.]

The difference is that in the continuous case, the quantity being integrated, $f(x)$, is not the probability of the outcome $x$. (However, if one uses infinitesimals, one can consider $f(x)\,dx$ as the probability of the outcome $x$.)

In the continuous case, we will use the following convention. If the set of outcomes is a set of real numbers, then the individual outcomes will be referred to by small Roman letters such as $x$. If the set of outcomes is a subset of $R^2$, then the individual outcomes will be denoted by $(x, y)$. In either case, it may be more convenient to refer to an individual outcome by using $\omega$, as in Chapter 1.

Figure 2.11 shows the results of 1000 spins of the spinner. The function $f(x)$ is also shown in the figure. The reader will note that the area under $f(x)$ and above a given interval is approximately equal to the fraction of outcomes that fell in that interval. The function $f(x)$ is called the density function of the random variable $X$. The fact that the area under $f(x)$ and above an interval corresponds to a probability is the defining property of density functions. A precise definition of density functions will be given shortly. □

Darts

Example 2.8 A game of darts involves throwing a dart at a circular target of unit radius. Suppose we throw a dart once so that it hits the target, and we observe where it lands. To describe the possible outcomes of this experiment, it is natural to take as our sample space the set $\Omega$ of all the points in the target.
It is convenient to describe these points by their rectangular coordinates, relative to a coordinate system with origin at the center of the target, so that each pair $(x, y)$ of coordinates with $x^2 + y^2 \le 1$ describes a possible outcome of the experiment. Then $\Omega = \{(x, y) : x^2 + y^2 \le 1\}$ is a subset of the Euclidean plane, and the event $E = \{(x, y) : y > 0\}$, for example, corresponds to the statement that the dart lands in the upper half of the target, and so forth. Unless there is reason to believe otherwise (and with experts at the game there may well be!), it is natural to assume that the coordinates are chosen at random. (When doing this with a computer, each coordinate is chosen uniformly from the interval $[-1, 1]$. If the resulting point does not lie inside the unit circle, the point is not counted.) Then the arguments used in the preceding example show that the probability of any elementary event, consisting of a single outcome, must be zero, and suggest that the probability of the event that the dart lands in any subset $E$ of the target should be determined by what fraction of the target area lies in $E$. Thus,
$$P(E) = \frac{\text{area of } E}{\text{area of target}} = \frac{\text{area of } E}{\pi} .$$
This can be written in the form
$$P(E) = \int_E f(x)\,dx ,$$
where $f(x)$ is the constant function with value $1/\pi$. In particular, if $E = \{(x, y) : x^2 + y^2 \le a^2\}$ is the event that the dart lands within distance $a < 1$ of the center of the target, then
$$P(E) = \frac{\pi a^2}{\pi} = a^2 .$$
For example, the probability that the dart lies within a distance 1/2 of the center is 1/4. □

Example 2.9 In the dart game considered above, suppose that, instead of observing where the dart lands, we observe how far it lands from the center of the target. In this case, we take as our sample space the set $\Omega$ of all circles with centers at the center of the target.
It is convenient to describe these circles by their radii, so that each circle is identified by its radius $r$, $0 \le r \le 1$. In this way, we may regard $\Omega$ as the subset $[0, 1]$ of the real line.

What probabilities should we assign to the events $E$ of $\Omega$? If
$$E = \{r : 0 \le r \le a\} ,$$
then $E$ occurs if the dart lands within a distance $a$ of the center, that is, within the circle of radius $a$, and we saw in the previous example that under our assumptions the probability of this event is given by
$$P([0, a]) = a^2 .$$
More generally, if
$$E = \{r : a \le r \le b\} ,$$
then by our basic assumptions,
$$P(E) = P([a, b]) = P([0, b]) - P([0, a]) = b^2 - a^2 = (b - a)(b + a) = 2(b - a)\,\frac{b + a}{2} .$$

[Figure 2.12: Distribution of dart distances in 400 throws.]

Thus, $P(E) = 2\,(\text{length of } E)(\text{midpoint of } E)$. Here we see that the probability assigned to the interval $E$ depends not only on its length but also on its midpoint (i.e., not only on how long it is, but also on where it is). Roughly speaking, in this experiment, events of the form $E = [a, b]$ are more likely if they are near the rim of the target and less likely if they are near the center. (A common experience for beginners! The conclusion might well be different if the beginner is replaced by an expert.)

Again we can simulate this by computer. We divide the target area into ten concentric regions of equal thickness. The computer program Darts throws $n$ darts and records what fraction of the total falls in each of these concentric regions. The program Areabargraph then plots a bar graph with the area of the $i$th bar equal to the fraction of the total falling in the $i$th region. Running the program for 1000 darts resulted in the bar graph of Figure 2.12.
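The simulation just described can be sketched in a few lines of Python. This is not the book's Darts or Areabargraph program; the function names, the seed, and the choice of rejection sampling from the square $[-1, 1]^2$ are assumptions made for illustration. The sketch returns the fraction of darts in each of the ten rings and converts those fractions to area-bar heights (fraction divided by ring thickness).

```python
import math
import random


def dart_ring_fractions(n_darts, n_rings=10, seed=0):
    """Throw n_darts uniformly at the unit disc and return the
    fraction that lands in each of n_rings rings of equal thickness."""
    rng = random.Random(seed)
    counts = [0] * n_rings
    hits = 0
    while hits < n_darts:
        # Choose each coordinate uniformly from [-1, 1].
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        r = math.hypot(x, y)
        if r >= 1.0:
            continue  # outside the target: the point is not counted
        ring = min(int(r * n_rings), n_rings - 1)
        counts[ring] += 1
        hits += 1
    return [c / n_darts for c in counts]


def area_bar_heights(fractions):
    """Heights of bars whose AREAS equal the given fractions."""
    width = 1.0 / len(fractions)
    return [f / width for f in fractions]


if __name__ == "__main__":
    heights = area_bar_heights(dart_ring_fractions(1000))
    for i, h in enumerate(heights):
        midpoint = (i + 0.5) / len(heights)
        print(f"r near {midpoint:.2f}: bar height {h:.2f}")
```

Printing the bar heights next to the ring midpoints makes it easy to see how the heights change with the distance from the center.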
Note that here the heights of the bars are not all equal, but grow approximately linearly with $r$. In fact, the linear function $y = 2r$ appears to fit our bar graph quite well. This suggests that the probability that the dart falls within a distance $a$ of the center should be given by the area under the graph of the function $y = 2r$ between 0 and $a$. This area is $a^2$, which agrees with the probability we have assigned above to this event. □

Sample Space Coordinates

These examples suggest that for continuous experiments of this sort we should assign probabilities for the outcomes to fall in a given interval by means of the area under a suitable function.

More generally, we suppose that suitable coordinates can be introduced into the sample space $\Omega$, so that we can regard $\Omega$ as a subset of $R^n$. We call such a sample space a continuous sample space. We let $X$ be a random variable which represents the outcome of the experiment. Such a random variable is called a continuous random variable. We then define a density function for $X$ as follows.

Density Functions of Continuous Random Variables

Definition 2.1 Let $X$ be a continuous real-valued random variable. A density function for $X$ is a real-valued function $f$ which satisfies
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
for all $a, b \in R$. □

We note that it is not the case that all continuous real-valued random variables possess density functions. However, in this book, we will only consider continuous random variables for which density functions exist.

In terms of the density $f(x)$, if $E$ is a subset of $R$, then
$$P(X \in E) = \int_E f(x)\,dx .$$
The notation here assumes that $E$ is a subset of $R$ for which $\int_E f(x)\,dx$ makes sense.

Example 2.10 (Example 2.7 continued) In the spinner experiment, we choose for our set of outcomes the interval $0 \le x < 1$, and for our density function
$$f(x) = \begin{cases} 1, & \text{if } 0 \le x < 1, \\ 0, & \text{otherwise.} \end{cases}$$
If $E$ is the event that the head of the spinner falls in the upper half of the circle, then $E = \{x : 0 \le x \le 1/2\}$, and so
$$P(E) = \int_0^{1/2} 1\,dx = \frac{1}{2} .$$
More generally, if $E$ is the event that the head falls in the interval $[a, b]$, then
$$P(E) = \int_a^b 1\,dx = b - a .$$
□

Example 2.11 (Example 2.8 continued) In the first dart game experiment, we choose for our sample space a disc of unit radius in the plane and for our density function the function
$$f(x, y) = \begin{cases} 1/\pi, & \text{if } x^2 + y^2 \le 1, \\ 0, & \text{otherwise.} \end{cases}$$
The probability that the dart lands inside the subset $E$ is then given by
$$P(E) = \iint_E \frac{1}{\pi}\,dx\,dy = \frac{1}{\pi}\cdot(\text{area of } E) .$$
□

In these two examples, the density function is constant and does not depend on the particular outcome. It is often the case that experiments in which the coordinates are chosen at random can be described by constant density functions, and, as in Section 1.2, we call such density functions uniform or equiprobable. Not all experiments are of this type, however.

Example 2.12 (Example 2.9 continued) In the second dart game experiment, we choose for our sample space the unit interval on the real line and for our density the function
$$f(r) = \begin{cases} 2r, & \text{if } 0 < r < 1, \\ 0, & \text{otherwise.} \end{cases}$$
Then the probability that the dart lands at distance $r$, $a \le r \le b$, from the center of the target is given by
$$P([a, b]) = \int_a^b 2r\,dr = b^2 - a^2 .$$
Here again, since the density is small when $r$ is near 0 and large when $r$ is near 1, we see that in this experiment the dart is more likely to land near the rim of the target than near the center. In terms of the bar graph of Example 2.9, the heights of the bars approximate the density function, while the areas of the bars approximate the probabilities of the subintervals (see Figure 2.12).
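As a quick numerical check of the integral in Example 2.12, the following Python sketch integrates the density $f(r) = 2r$ over an interval with a simple midpoint rule and compares the result with $b^2 - a^2$. The helper names `density` and `prob_interval` and the number of steps are illustrative assumptions, not part of the text.

```python
def density(r):
    """Density of the dart's distance from the center (Example 2.12)."""
    return 2.0 * r if 0.0 < r < 1.0 else 0.0


def prob_interval(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    a, b = 0.2, 0.5
    print("numerical integral:", prob_interval(density, a, b))
    print("b^2 - a^2         :", b * b - a * a)
```

Because the density is linear on $(0, 1)$, the midpoint rule reproduces $b^2 - a^2$ up to floating-point rounding for any interval inside $[0, 1]$.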
□

We see in this example that, unlike the case of discrete sample spaces, the value $f(x)$ of the density function for the outcome $x$ is not the probability of $x$ occurring (we have seen that this probability is always 0), and in general $f(x)$ is not a probability at all. In this example, $f(3/4) = 3/2$, which, being bigger than 1, cannot be a probability.

Nevertheless, the density function $f$ does contain all the probability information about the experiment, since the probabilities of all events can be derived from it. In particular, the probability that the outcome of the experiment falls in an interval $[a, b]$ is given by
$$P([a, b]) = \int_a^b f(x)\,dx ,$$
that is, by the area under the graph of the density function in the interval $[a, b]$. Thus, there is a close connection here between probabilities and areas. We have been guided by this close connection in making up our bar graphs; each bar is chosen so that its area, and not its height, represents the relative frequency of occurrence, and hence estimates the probability of the outcome falling in the associated interval.

In the language of the calculus, we can say that the probability of occurrence of an event of the form $[x, x + dx]$, where $dx$ is small, is approximately given by
$$P([x, x + dx]) \approx f(x)\,dx ,$$
that is, by the area of the rectangle under the graph of $f$. Note that as $dx \to 0$, this probability $\to 0$, so that the probability $P(\{x\})$ of a single point is again 0, as in Example 2.7.

A glance at the graph of a density function tells us immediately which events of an experiment are more likely. Roughly speaking, we can say that where the density is large the events are more likely, and where it is small the events are less likely. In Example 2.4 the density function is largest at 1.
Thus, given the two intervals $[0, a]$ and $[1, 1 + a]$, where $a$ is a small positive real number, we see that $X$ is more likely to take on a value in the second interval than in the first.

Cumulative Distribution Functions of Continuous Random Variables

We have seen that density functions are useful when considering continuous random variables. There is another kind of function, closely related to these density functions, which is also of great importance. These functions are called cumulative distribution functions.

Definition 2.2 Let $X$ be a continuous real-valued random variable. Then the cumulative distribution function of $X$ is defined by the equation
$$F_X(x) = P(X \le x) .$$
□

If $X$ is a continuous real-valued random variable which possesses a density function, then it also has a cumulative distribution function, and the following theorem shows that the two functions are related in a very nice way.

Theorem 2.1 Let $X$ be a continuous real-valued random variable with density function $f(x)$. Then the function defined by
$$F(x) = \int_{-\infty}^x f(t)\,dt$$
is the cumulative distribution function of $X$. Furthermore, we have
$$\frac{d}{dx} F(x) = f(x) .$$

Proof. By definition,
$$F(x) = P(X \le x) .$$
Let $E = (-\infty, x]$. Then
$$P(X \le x) = P(X \in E) ,$$
which equals
$$\int_{-\infty}^x f(t)\,dt .$$
Applying the Fundamental Theorem of Calculus to the first equation in the statement of the theorem yields the second statement. □

[Figure 2.13: Distribution and density for $X = U^2$.]

In many experiments, the density function of the relevant random variable is easy to write down. However, it is quite often the case that the cumulative distribution function is easier to obtain than the density function.
(Of course, once we have the cumulative distribution function, the density function can easily be obtained by differentiation, as the above theorem shows.) We now give some examples which exhibit this phenomenon.

Example 2.13 A real number is chosen at random from $[0, 1]$ with uniform probability, and then this number is squared. Let $X$ represent the result. What is the cumulative distribution function of $X$? What is the density of $X$?

We begin by letting $U$ represent the chosen real number. Then $X = U^2$. If $0 \le x \le 1$, then we have
$$F_X(x) = P(X \le x) = P(U^2 \le x) = P(U \le \sqrt{x}) = \sqrt{x} .$$
It is clear that $X$ always takes on a value between 0 and 1, so the cumulative distribution function of $X$ is given by
$$F_X(x) = \begin{cases} 0, & \text{if } x \le 0, \\ \sqrt{x}, & \text{if } 0 \le x \le 1, \\ 1, & \text{if } x \ge 1. \end{cases}$$
From this we easily calculate that the density function of $X$ is
$$f_X(x) = \begin{cases} 0, & \text{if } x \le 0, \\ 1/(2\sqrt{x}), & \text{if } 0 < x \le 1, \\ 0, & \text{if } x > 1. \end{cases}$$
Note that $F_X(x)$ is continuous, but $f_X(x)$ is not. (See Figure 2.13.) □

[Figure 2.14: Calculation of distribution function for Example 2.14.]

When referring to a continuous random variable $X$ (say with a uniform density function), it is customary to say that "$X$ is uniformly distributed on the interval $[a, b]$." It is also customary to refer to the cumulative distribution function of $X$ as the distribution function of $X$. Thus, the word "distribution" is being used in several different ways in the subject of probability. (Recall that it also has a meaning when discussing discrete random variables.) When referring to the cumulative distribution function of a continuous random variable $X$, we will always use the word "cumulative" as a modifier, unless the use of another modifier, such as "normal" or "exponential," makes it clear.
Since the phrase "uniformly densitied on the interval $[a, b]$" is not acceptable English, we will have to say "uniformly distributed" instead.

Example 2.14 In Example 2.4, we considered a random variable, defined to be the sum of two random real numbers chosen uniformly from $[0, 1]$. Let the random variables $X$ and $Y$ denote the two chosen real numbers. Define $Z = X + Y$. We will now derive expressions for the cumulative distribution function and the density function of $Z$.

Here we take for our sample space $\Omega$ the unit square in $R^2$ with uniform density. A point $\omega \in \Omega$ then consists of a pair $(x, y)$ of numbers chosen at random. Then $0 \le Z \le 2$. Let $E_z$ denote the event that $Z \le z$. In Figure 2.14, we show the set $E_{.8}$. The event $E_z$, for any $z$ between 0 and 1, looks very similar to the shaded set in the figure. For $1 < z \le 2$, the set $E_z$ looks like the unit square with a triangle removed from the upper right-hand corner. We can now calculate the probability distribution $F_Z$ of $Z$; it is given by
$$F_Z(z) = P(Z \le z) = \text{Area of } E_z = \begin{cases} 0, & \text{if } z < 0, \\ (1/2)z^2, & \text{if } 0 \le z \le 1, \\ 1 - (1/2)(2 - z)^2, & \text{if } 1 \le z \le 2, \\ 1, & \text{if } 2 < z. \end{cases}$$

[Figure 2.15: Distribution and density functions for Example 2.14.]

[Figure 2.16: Calculation of $F_Z$ for Example 2.15.]

The density function is obtained by differentiating this function:
$$f_Z(z) = \begin{cases} 0, & \text{if } z < 0, \\ z, & \text{if } 0 \le z \le 1, \\ 2 - z, & \text{if } 1 \le z \le 2, \\ 0, & \text{if } 2 < z. \end{cases}$$
The reader is referred to Figure 2.15 for the graphs of these functions. □

Example 2.15 In the dart game described in Example 2.8, what is the distribution of the distance of the dart from the center of the target? What is its density?

Here, as before, our sample space $\Omega$ is the unit disk in $R^2$, with coordinates $(X, Y)$. Let $Z = \sqrt{X^2 + Y^2}$ represent the distance from the center of the target. Let $E$ be the event $\{Z \le z\}$. Then the distribution function $F_Z$ of $Z$ (see Figure 2.16) is given by
$$F_Z(z) = P(Z \le z) = \frac{\text{Area of } E}{\text{Area of target}} .$$

[Figure 2.17: Distribution and density for $Z = \sqrt{X^2 + Y^2}$.]

Thus, we easily compute that
$$F_Z(z) = \begin{cases} 0, & \text{if } z \le 0, \\ z^2, & \text{if } 0 \le z \le 1, \\ 1, & \text{if } z > 1. \end{cases}$$
The density $f_Z(z)$ is given again by the derivative of $F_Z(z)$:
$$f_Z(z) = \begin{cases} 0, & \text{if } z \le 0, \\ 2z, & \text{if } 0 \le z \le 1, \\ 0, & \text{if } z > 1. \end{cases}$$
The reader is referred to Figure 2.17 for the graphs of these functions.

We can verify this result by simulation, as follows: We choose values for $X$ and $Y$ at random from $[0, 1]$ with uniform distribution, calculate $Z = \sqrt{X^2 + Y^2}$, check whether $0 \le Z \le 1$, and present the results in a bar graph (see Figure 2.18). □

[Figure 2.18: Simulation results for Example 2.15.]

Example 2.16 Suppose Mr. and Mrs. Lockhorn agree to meet at the Hanover Inn between 5:00 and 6:00 P.M. on Tuesday. Suppose each arrives at a time between 5:00 and 6:00 chosen at random with uniform probability. What is the distribution function for the length of time that the first to arrive has to wait for the other? What is the density function?

Here again we can take the unit square to represent the sample space, and $(X, Y)$ as the arrival times (after 5:00 P.M.) for the Lockhorns. Let $Z = |X - Y|$. Then we have $F_X(x) = x$ and $F_Y(y) = y$. Moreover (see Figure 2.19),
$$F_Z(z) = P(Z \le z) = P(|X - Y| \le z) = \text{Area of } E .$$
Thus, we have
$$F_Z(z) = \begin{cases} 0, & \text{if } z \le 0, \\ 1 - (1 - z)^2, & \text{if } 0 \le z \le 1, \\ 1, & \text{if } z > 1. \end{cases}$$
The density $f_Z(z)$ is again obtained by differentiation:
$$f_Z(z) = \begin{cases} 0, & \text{if } z \le 0, \\ 2(1 - z), & \text{if } 0 \le z \le 1, \\ 0, & \text{if } z > 1. \end{cases}$$
□

[Figure 2.19: Calculation of $F_Z$.]

Example 2.17 There are many occasions where we observe a sequence of occurrences which occur at "random" times. For example, we might be observing emissions of a radioactive isotope, or cars passing a milepost on a highway, or light bulbs burning out. In such cases, we might define a random variable $X$ to denote the time between successive occurrences. Clearly, $X$ is a continuous random variable whose range consists of the nonnegative real numbers. It is often the case that we can model $X$ by using the exponential density. This density is given by the formula
$$f(t) = \begin{cases} \lambda e^{-\lambda t}, & \text{if } t \ge 0, \\ 0, & \text{if } t < 0. \end{cases}$$
The number $\lambda$ is a positive real number, and represents the reciprocal of the average value of $X$. (This will be shown in Chapter 6.) Thus, if the average time between occurrences is 30 minutes, then $\lambda = 1/30$. A graph of this density function with $\lambda = 1/30$ is shown in Figure 2.20. One can see from the figure that even though the average value is 30, occasionally much larger values are taken on by $X$.

[Figure 2.20: Exponential density with $\lambda = 1/30$, $f(t) = (1/30)e^{-(1/30)t}$.]

Suppose that we have bought a computer that contains a Warp 9 hard drive. The salesperson says that the average time between breakdowns of this type of hard drive is 30 months. It is often assumed that the length of time between breakdowns is distributed according to the exponential density. We will assume that this model applies here, with $\lambda = 1/30$.

Now suppose that we have been operating our computer for 15 months. We assume that the original hard drive is still running. We ask how long we should expect the hard drive to continue to run. One could reasonably expect that the hard drive will run, on the average, another 15 months. (One might also guess that it will run more than 15 months, since the fact that it has already run for 15 months implies that we don't have a lemon.) The time which we have to wait is a new random variable, which we will call $Y$. Obviously, $Y = X - 15$. We can write a computer program to produce a sequence of simulated $Y$ values. To do this, we first produce a sequence of $X$'s, and discard those values which are less than or equal to 15 (these values correspond to the cases where the hard drive has quit running before 15 months). To simulate a value of $X$, we compute the value of the expression
$$\left(-\frac{1}{\lambda}\right)\log(rnd) ,$$
where $rnd$ represents a random real number between 0 and 1. (That this expression has the exponential density will be shown in Chapter 4.3.) Figure 2.21 shows an area bar graph of 10,000 simulated $Y$ values.

[Figure 2.21: Residual lifespan of a hard drive.]

The average value of $Y$ in this simulation is 29.74, which is closer to the original average life span of 30 months than to the value of 15 months which was guessed above. Also, the distribution of $Y$ is seen to be close to the distribution of $X$. It is in fact the case that $X$ and $Y$ have the same distribution. This property is called the memoryless property, because the amount of time that we have to wait for an occurrence does not depend on how long we have already waited. The only continuous density function with this property is the exponential density. □
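The hard-drive experiment of Example 2.17 can be reproduced with a short Python sketch. The function name, the default arguments, and the seed are assumptions for illustration; the sketch also uses $1 - rnd$ in place of $rnd$ inside the logarithm, which has the same distribution but avoids taking $\log 0$, since Python's `random()` returns values in $[0, 1)$.

```python
import math
import random


def simulate_residual_life(n_values, mean_life=30.0, elapsed=15.0, seed=0):
    """Return n_values simulated residual lifetimes Y = X - elapsed,
    keeping only exponential lifetimes X that exceed `elapsed`."""
    rng = random.Random(seed)
    lam = 1.0 / mean_life
    residuals = []
    while len(residuals) < n_values:
        # (-1/lambda) * log(rnd), with rnd replaced by 1 - rnd in (0, 1].
        x = -math.log(1.0 - rng.random()) / lam
        if x > elapsed:  # the drive survived the first `elapsed` months
            residuals.append(x - elapsed)
    return residuals


if __name__ == "__main__":
    ys = simulate_residual_life(10_000)
    print("average residual life:", sum(ys) / len(ys))
```

The exact average depends on the seed, so it will not match the book's 29.74, but it should land near 30 rather than near 15, in line with the memoryless property.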
Assignment of Probabilities

A fundamental question in practice is: How shall we choose the probability density function in describing any given experiment? The answer depends to a great extent on the amount and kind of information available to us about the experiment. In some cases, we can see that the outcomes are equally likely. In some cases, we can see that the experiment resembles another already described by a known density. In some cases, we can run the experiment a large number of times and make a reasonable guess at the density on the basis of the observed distribution of outcomes, as we did in Chapter 1. In general, the problem of choosing the right density function for a given experiment is a central problem for the experimenter and is not always easy to solve (see Example 2.6). We shall not examine this question in detail here but instead shall assume that the right density is already known for each of the experiments under study.

The introduction of suitable coordinates to describe a continuous sample space, and a suitable density to describe its probabilities, is not always so obvious, as our final example shows.

Infinite Tree

Example 2.18 Consider an experiment in which a fair coin is tossed repeatedly, without stopping. We have seen in Example 1.6 that, for a coin tossed $n$ times, the natural sample space is a binary tree with $n$ stages. On this evidence we expect that for a coin tossed repeatedly, the natural sample space is a binary tree with an infinite number of stages, as indicated in Figure 2.22.

It is surprising to learn that, although the $n$-stage tree is obviously a finite sample space, the unlimited tree can be described as a continuous sample space.
To see how this comes about, let us agree that a typical outcome of the unlimited coin tossing experiment can be described by a sequence of the form $\omega = \{\mathrm{H\ H\ T\ H\ T\ T\ H} \ldots\}$. If we write 1 for H and 0 for T, then $\omega = \{1\ 1\ 0\ 1\ 0\ 0\ 1 \ldots\}$. In this way, each outcome is described by a sequence of 0's and 1's.

Now suppose we think of this sequence of 0's and 1's as the binary expansion of some real number $x = .1101001\cdots$ lying between 0 and 1. (A binary expansion is like a decimal expansion but based on 2 instead of 10.) Then each outcome is described by a value of $x$, and in this way $x$ becomes a coordinate for the sample space, taking on all real values between 0 and 1. (We note that it is possible for two different sequences to correspond to the same real number; for example, the sequences $\{\mathrm{T\ H\ H\ H\ H\ H} \ldots\}$ and $\{\mathrm{H\ T\ T\ T\ T\ T} \ldots\}$ both correspond to the real number 1/2. We will not concern ourselves with this apparent problem here.)

What probabilities should be assigned to the events of this sample space? Consider, for example, the event $E$ consisting of all outcomes for which the first toss comes up heads and the second tails. Every such outcome has the form $.10{*}{*}{*}\cdots$, where $*$ can be either 0 or 1. Now if $x$ is our real-valued coordinate, then the value of $x$ for every such outcome must lie between $1/2 = .10000\cdots$ and $3/4 = .11000\cdots$, and moreover, every value of $x$ between 1/2 and 3/4 has a binary expansion of the form $.10{*}{*}{*}\cdots$.

[Figure 2.22: Tree for infinite number of tosses of a coin.]

This means that $\omega \in E$ if and only if $1/2 \le x < 3/4$, and in this way we see that we can describe $E$ by the interval $[1/2, 3/4)$.
More generally, every event consisting of outcomes for which the results of the first $n$ tosses are prescribed is described by a binary interval of the form $[k/2^n, (k + 1)/2^n)$.

We have already seen in Section 1.2 that in the experiment involving $n$ tosses, the probability of any one outcome must be exactly $1/2^n$. It follows that in the unlimited toss experiment, the probability of any event consisting of outcomes for which the results of the first $n$ tosses are prescribed must also be $1/2^n$. But $1/2^n$ is exactly the length of the interval of $x$-values describing $E$. Thus we see that, just as with the spinner experiment, the probability of an event $E$ is determined by what fraction of the unit interval lies in $E$.

Consider again the statement: The probability is 1/2 that a fair coin will turn up heads when tossed. We have suggested that one interpretation of this statement is that if we toss the coin indefinitely, the proportion of heads will approach 1/2. That is, in our correspondence with binary sequences we expect to get a binary sequence with the proportion of 1's tending to 1/2. The event $E$ of binary sequences for which this is true is a proper subset of the set of all possible binary sequences. It does not contain, for example, the sequence $011011011\ldots$ (i.e., (011) repeated again and again). The event $E$ is actually a very complicated subset of the binary sequences, but its probability can be determined as a limit of probabilities for events with a finite number of outcomes whose probabilities are given by finite tree measures. When the probability of $E$ is computed in this way, its value is found to be 1. This remarkable result is known as the Strong Law of Large Numbers (or Law of Averages) and is one justification for our frequency concept of probability. We shall prove a weak form of this theorem in Chapter 8. □
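The correspondence in Example 2.18 between a prescribed run of initial tosses and a binary interval $[k/2^n, (k + 1)/2^n)$ can be made concrete with a small Python sketch. The function name `prefix_interval` and the use of the strings "H" and "T" are illustrative assumptions; exact fractions are used so that the interval endpoints carry no rounding error.

```python
from fractions import Fraction


def prefix_interval(tosses):
    """Map a prescribed run of first tosses, e.g. "HT", to the interval
    [k/2^n, (k+1)/2^n) of x-values whose binary expansion starts with
    that run (H -> 1, T -> 0). Returns the pair (low, high)."""
    k = 0
    for toss in tosses:
        if toss not in ("H", "T"):
            raise ValueError(f"unexpected toss symbol: {toss!r}")
        k = 2 * k + (1 if toss == "H" else 0)
    width = Fraction(1, 2 ** len(tosses))
    return k * width, (k + 1) * width


if __name__ == "__main__":
    low, high = prefix_interval("HT")
    print(f"first toss heads, second tails: [{low}, {high})")
    print("probability (interval length):", high - low)
```

For the event "first toss heads, second tails" this gives the interval $[1/2, 3/4)$, whose length $1/4 = 1/2^2$ is the probability assigned to the event.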
Exercises

1 Suppose you choose at random a real number $X$ from the interval $[2, 10]$.
(a) Find the density function $f(x)$ and the probability of an event $E$ for this experiment, where $E$ is a subinterval $[a, b]$ of $[2, 10]$.
(b) From (a), find the probability that $X > 5$, that $5 < X < 7$, and that $X^2 - 12X + 35 > 0$.

2 Suppose you choose a real number $X$ from the interval $[2, 10]$ with a density function of the form $f(x) = Cx$, where $C$ is a constant.
(a) Find $C$.
(b) Find $P(E)$, where $E = [a, b]$ is a subinterval of $[2, 10]$.
(c) Find $P(X > 5)$, $P(X < 7)$, and $P(X^2 - 12X + 35 > 0)$.

3 Same as Exercise 2, but suppose $f(x) = C/x$.

4 Suppose you throw a dart at a circular target of radius 10 inches. Assuming that you hit the target and that the coordinates of the outcomes are chosen at random, find the probability that the dart falls
(a) within 2 inches of the center.
(b) within 2 inches of the rim.
(c) within the first quadrant of the target.
(d) within the first quadrant and within 2 inches of the rim.

5 Suppose you are watching a radioactive source that emits particles at a rate described by the exponential density $f(t) = \lambda e^{-\lambda t}$, where $\lambda = 1$, so that the probability $P(0, T)$ that a particle will appear in the next $T$ seconds is $P([0, T]) = \int_0^T \lambda e^{-\lambda t}\,dt$. Find the probability that a particle (not necessarily the first) will appear
(a) within the next second.
(b) within the next 3 seconds.
(c) between 3 and 4 seconds from now.
(d) after 4 seconds from now.

6 Assume that a new light bulb will burn out after $t$ hours, where $t$ is chosen from $[0, \infty)$ with an exponential density $f(t) = \lambda e^{-\lambda t}$. In this context, $\lambda$ is often called the failure rate of the bulb.
(a) Assume that $\lambda = 0.01$, and find the probability that the bulb will not burn out before $T$ hours.
This probability is often called the reliability of the bulb.
  (b) For what $T$ is the reliability of the bulb equal to 1/2?

7 Choose a number $B$ at random from the interval $[0, 1]$ with uniform density. Find the probability that
  (a) $1/3 < B < 2/3$.
  (b) $|B - 1/2| \le 1/4$.
  (c) $B < 1/4$ or $1 - B < 1/4$.
  (d) $3B^2 < B$.

8 Choose independently two numbers $B$ and $C$ at random from the interval $[0, 1]$ with uniform density. Note that the point $(B, C)$ is then chosen at random in the unit square. Find the probability that
  (a) $B + C < 1/2$.
  (b) $BC < 1/2$.
  (c) $|B - C| < 1/2$.
  (d) $\max\{B, C\} < 1/2$.
  (e) $\min\{B, C\} < 1/2$.
  (f) $B < 1/2$ and $1 - C < 1/2$.
  (g) conditions (c) and (f) both hold.
  (h) $B^2 + C^2 \le 1/2$.
  (i) $(B - 1/2)^2 + (C - 1/2)^2 < 1/4$.

9 Suppose that we have a sequence of occurrences. We assume that the time $X$ between occurrences is exponentially distributed with $\lambda = 1/10$, so on the average, there is one occurrence every 10 minutes (see Example 2.17). You come upon this system at time 100, and wait until the next occurrence. Make a conjecture concerning how long, on the average, you will have to wait. Write a program to see if your conjecture is right.

10 As in Exercise 9, assume that we have a sequence of occurrences, but now assume that the time $X$ between occurrences is uniformly distributed between 5 and 15. As before, you come upon this system at time 100, and wait until the next occurrence. Make a conjecture concerning how long, on the average, you will have to wait. Write a program to see if your conjecture is right.

11 For examples such as those in Exercises 9 and 10, it might seem that at least you should not have to wait on average more than 10 minutes if the average time between occurrences is 10 minutes. Alas, even this is not true.
To see why, consider the following assumption about the times between occurrences. Assume that the time between occurrences is 3 minutes with probability .9 and 73 minutes with probability .1. Show by simulation that the average time between occurrences is 10 minutes, but that if you come upon this system at time 100, your average waiting time is more than 10 minutes.

12 Take a stick of unit length and break it into three pieces, choosing the break points at random. (The break points are assumed to be chosen simultaneously.) What is the probability that the three pieces can be used to form a triangle? Hint: The sum of the lengths of any two pieces must exceed the length of the third, so each piece must have length $< 1/2$. Now use Exercise 8(g).

13 Take a stick of unit length and break it into two pieces, choosing the break point at random. Now break the longer of the two pieces at a random point. What is the probability that the three pieces can be used to form a triangle?

14 Choose independently two numbers $B$ and $C$ at random from the interval $[-1, 1]$ with uniform distribution, and consider the quadratic equation $x^2 + Bx + C = 0$. Find the probability that the roots of this equation
  (a) are both real.
  (b) are both positive.
  Hints: (a) requires $0 \le B^2 - 4C$; (b) requires $0 \le B^2 - 4C$, $B \le 0$, $0 \le C$.

15 At the Tunbridge World's Fair, a coin toss game works as follows. Quarters are tossed onto a checkerboard. The management keeps all the quarters, but for each quarter landing entirely within one square of the checkerboard the management pays a dollar. Assume that the edge of each square is twice the diameter of a quarter, and that the outcomes are described by coordinates chosen at random. Is this a fair game?

16 Three points are chosen at random on a circle of unit circumference.
What is the probability that the triangle defined by these points as vertices has three acute angles? Hint: One of the angles is obtuse if and only if all three points lie in the same semicircle. Take the circumference as the interval $[0, 1]$. Take one point at 0 and the others at $B$ and $C$.

17 Write a program to choose a random number $X$ in the interval $[2, 10]$ 1000 times and record what fraction of the outcomes satisfy $X > 5$, what fraction satisfy $5 < X < 7$, and what fraction satisfy $X^2 - 12X + 35 > 0$. How do these results compare with Exercise 1?

18 Write a program to choose a point $(X, Y)$ at random in a square of side 20 inches, doing this 10,000 times, and recording what fraction of the outcomes fall within 19 inches of the center; of these, what fraction fall between 8 and 10 inches of the center; and, of these, what fraction fall within the first quadrant of the square. How do these results compare with those of Exercise 4?

19 Write a program to simulate the problem described in Exercise 7 (see Exercise 17). How do the simulation results compare with the results of Exercise 7?

20 Write a program to simulate the problem described in Exercise 12.

21 Write a program to simulate the problem described in Exercise 16.

22 Write a program to carry out the following experiment. A coin is tossed 100 times and the number of heads that turn up is recorded. This experiment is then repeated 1000 times. Have your program plot a bar graph for the proportion of the 1000 experiments in which the number of heads is $n$, for each $n$ in the interval $[35, 65]$. Does the bar graph look as though it can be fit with a normal curve?

23 Write a program that picks a random number between 0 and 1 and computes the negative of its logarithm.
Repeat this process a large number of times and plot a bar graph to give the number of times that the outcome falls in each interval of length 0.1 in $[0, 10]$. On this bar graph plot a graph of the density $f(x) = e^{-x}$. How well does this density fit your graph?

Chapter 3 Combinatorics

3.1 Permutations

Many problems in probability theory require that we count the number of ways that a particular event can occur. For this, we study the topics of permutations and combinations. We consider permutations in this section and combinations in the next section.

Before discussing permutations, it is useful to introduce a general counting technique that will enable us to solve a variety of counting problems, including the problem of counting the number of possible permutations of $n$ objects.

Counting Problems

Consider an experiment that takes place in several stages and is such that the number of outcomes $m$ at the $n$th stage is independent of the outcomes of the previous stages. The number $m$ may be different for different stages. We want to count the number of ways that the entire experiment can be carried out.

Example 3.1 You are eating at Emile's restaurant and the waiter informs you that you have (a) two choices for appetizers: soup or juice; (b) three for the main course: a meat, fish, or vegetable dish; and (c) two for dessert: ice cream or cake. How many possible choices do you have for your complete meal? We illustrate the possible meals by a tree diagram shown in Figure 3.1. Your menu is decided in three stages; at each stage the number of possible choices does not depend on what is chosen in the previous stages: two choices at the first stage, three at the second, and two at the third. From the tree diagram we see that the total number of choices is the product of the number of choices at each stage.
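The tree count for the menu can also be checked by listing every path through the tree. The sketch below is illustrative only: the list names and the helper `all_menus` are assumptions, not part of the text, and `itertools.product` is used as a convenient way to walk the three stages.

```python
from itertools import product

# The three stages of the meal, as described in Example 3.1.
appetizers = ["soup", "juice"]
main_courses = ["meat", "fish", "vegetable"]
desserts = ["ice cream", "cake"]


def all_menus():
    """Return every (appetizer, main course, dessert) path through the tree."""
    return list(product(appetizers, main_courses, desserts))


if __name__ == "__main__":
    menus = all_menus()
    for menu in menus:
        print(menu)
    # The count equals the product of the number of choices at each stage.
    print(len(menus), len(appetizers) * len(main_courses) * len(desserts))
```

Enumeration works here because the number of choices at each stage does not depend on earlier choices, which is exactly the condition stated for the counting technique.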
In this example we have $2 \cdot 3 \cdot 2 = 12$ possible menus. Our menu example is an example of the following general counting technique. □

[Figure 3.1: Tree for your menu. From the start, branches lead to soup or juice; each of these leads to meat, fish, or vegetable; each of these leads to ice cream or cake.]

A Counting Technique

A task is to be carried out in a sequence of $r$ stages. There are $n_1$ ways to carry out the first stage; for each of these $n_1$ ways, there are $n_2$ ways to carry out the second stage; for each of these $n_2$ ways, there are $n_3$ ways to carry out the third stage, and so forth. Then the total number of ways in which the entire task can be accomplished is given by the product $N = n_1 \cdot n_2 \cdot \ldots \cdot n_r$.

Tree Diagrams

It will often be useful to use a tree diagram when studying probabilities of events relating to experiments that take place in stages and for which we are given the probabilities for the outcomes at each stage. For example, assume that the owner of Emile's restaurant has observed that 80 percent of his customers choose the soup for an appetizer and 20 percent choose juice. Of those who choose soup, 50 percent choose meat, 30 percent choose fish, and 20 percent choose the vegetable dish. Of those who choose juice for an appetizer, 30 percent choose meat, 40 percent choose fish, and 30 percent choose the vegetable dish. We can use this to estimate the probabilities at the first two stages as indicated on the tree diagram of Figure 3.2.

We choose for our sample space the set $\Omega$ of all possible paths $\omega = \omega_1, \omega_2, \ldots, \omega_6$ through the tree. How should we assign our probability distribution? For example, what probability should we assign to the customer choosing soup and then the meat?
If 8/10 of the customers choose soup and then 1/2 of these choose meat, a proportion $8/10 \cdot 1/2 = 4/10$ of the customers choose soup and then meat. This suggests choosing our probability distribution for each path through the tree to be the product of the probabilities at each of the stages along the path. This results in the probability distribution for the sample points $\omega$ indicated in Figure 3.2. (Note that $m(\omega_1) + \cdots + m(\omega_6) = 1$.) From this we see, for example, that the probability that a customer chooses meat is $m(\omega_1) + m(\omega_4) = .46$.

[Figure 3.2: Two-stage probability assignment. First stage: soup .8, juice .2. Second stage after soup: meat .5, fish .3, vegetable .2; after juice: meat .3, fish .4, vegetable .3. Path probabilities: $m(\omega_1) = .4$, $m(\omega_2) = .24$, $m(\omega_3) = .16$, $m(\omega_4) = .06$, $m(\omega_5) = .08$, $m(\omega_6) = .06$.]

We shall say more about these tree measures when we discuss the concept of conditional probability in Chapter 4. We return now to more counting problems.

Example 3.2 We can show that there are at least two people in Columbus, Ohio, who have the same three initials. Assuming that each person has three initials, there are 26 possibilities for a person's first initial, 26 for the second, and 26 for the third. Therefore, there are $26^3 = 17{,}576$ possible sets of initials. This number is smaller than the number of people living in Columbus, Ohio; hence, there must be at least two people with the same three initials. □

We consider next the celebrated birthday problem, often used to show that naive intuition cannot always be trusted in probability.

Birthday Problem

Example 3.3 How many people do we need to have in a room to make it a favorable bet (probability of success greater than 1/2) that two people in the room will have the same birthday?

Since there are 365 possible birthdays, it is tempting to guess that we would need about 1/2 this number, or 183. You would surely win this bet.
In fact, the number required for a favorable bet is only 23. To show this, we find the probability $p_r$ that, in a room with $r$ people, there is no duplication of birthdays; we will have a favorable bet if this probability is less than one half.

Assume that there are 365 possible birthdays for each person (we ignore leap years). Order the people from 1 to $r$. For a sample point $\omega$, we choose a possible sequence of length $r$ of birthdays each chosen as one of the 365 possible dates. There are 365 possibilities for the first element of the sequence, and for each of these choices there are 365 for the second, and so forth, making $365^r$ possible sequences of birthdays. We must find the number of these sequences that have no duplication of birthdays. For such a sequence, we can choose any of the 365 days for the first element, then any of the remaining 364 for the second, 363 for the third, and so forth, until we make $r$ choices. For the $r$th choice, there will be $365 - r + 1$ possibilities. Hence, the total number of sequences with no duplications is
$$365 \cdot 364 \cdot 363 \cdot \ldots \cdot (365 - r + 1).$$
Thus, assuming that each sequence is equally likely,
$$p_r = \frac{365 \cdot 364 \cdot \ldots \cdot (365 - r + 1)}{365^r}.$$
We denote the product $(n)(n-1)\cdots(n-r+1)$ by $(n)_r$ (read "$n$ down $r$," or "$n$ lower $r$"). Thus,
$$p_r = \frac{(365)_r}{(365)^r}.$$

The program Birthday carries out this computation and prints the probabilities for $r = 20$ to 25. Running this program, we get the results shown in Table 3.1.

Number of people    Probability that all birthdays are different
20                  .5885616
21                  .5563117
22                  .5243047
23                  .4927028
24                  .4616557
25                  .4313003

Table 3.1: Birthday problem.

As we asserted above, the probability for no duplication changes from greater than one half to less than one half as we move from 22 to 23 people.
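The computation of $p_r$ is short enough to sketch directly. The code below is not the book's Birthday program, whose source is not shown here; it is a minimal Python version under the same assumptions (365 equally likely birthdays, no leap years), with the function name `prob_no_duplicate` chosen for illustration.

```python
def prob_no_duplicate(r, days=365):
    """Probability p_r that r people all have different birthdays.

    Computes (days)_r / days**r as a running product of the factors
    (days - i) / days for i = 0, ..., r - 1, which avoids huge integers.
    """
    p = 1.0
    for i in range(r):
        p *= (days - i) / days
    return p


if __name__ == "__main__":
    for r in range(20, 26):
        print(r, f"{prob_no_duplicate(r):.7f}")
```

Multiplying the ratios one at a time, rather than forming $(365)_r$ and $365^r$ separately, keeps every intermediate value between 0 and 1.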
To see how unlikely it is that we would lose our bet for larger numbers of people, we have run the program again, printing out values from $r = 10$ to $r = 100$ in steps of 10 (Table 3.2). We see that in a room of 40 people the odds already heavily favor a duplication, and in a room of 100 the odds are overwhelmingly in favor of a duplication.

Number of people    Probability that all birthdays are different
10                  .8830518
20                  .5885616
30                  .2936838
40                  .1087682
50                  .0296264
60                  .0058773
70                  .0008404
80                  .0000857
90                  .0000062
100                 .0000003

Table 3.2: Birthday problem.

We have assumed that birthdays are equally likely to fall on any particular day. Statistical evidence suggests that this is not true. However, it is intuitively clear (but not easy to prove) that this makes it even more likely to have a duplication with a group of 23 people. (See Exercise 19 to find out what happens on planets with more or fewer than 365 days per year.) □

We now turn to the topic of permutations.

Permutations

Definition 3.1 Let $A$ be any finite set. A permutation of $A$ is a one-to-one mapping of $A$ onto itself. □

To specify a particular permutation we list the elements of $A$ and, under them, show where each element is sent by the one-to-one mapping.
For example, if $A = \{a, b, c\}$ a possible permutation $\sigma$ would be
$$\sigma = \begin{pmatrix} a & b & c \\ b & c & a \end{pmatrix}.$$
By the permutation $\sigma$, $a$ is sent to $b$, $b$ is sent to $c$, and $c$ is sent to $a$. The condition that the mapping be one-to-one means that no two elements of $A$ are sent, by the mapping, into the same element of $A$.

We can put the elements of our set in some order and rename them $1, 2, \ldots, n$. Then, a typical permutation of the set $A = \{a_1, a_2, a_3, a_4\}$ can be written in the form
$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & 4 & 3 \end{pmatrix},$$
indicating that $a_1$ went to $a_2$, $a_2$ to $a_1$, $a_3$ to $a_4$, and $a_4$ to $a_3$.

If we always choose the top row to be 1 2 3 4 then, to prescribe the permutation, we need only give the bottom row, with the understanding that this tells us where 1 goes, 2 goes, and so forth, under the mapping. When this is done, the permutation is often called a rearrangement of the $n$ objects $1, 2, 3, \ldots, n$. For example, all possible permutations, or rearrangements, of the numbers $A = \{1, 2, 3\}$ are:
$$123,\ 132,\ 213,\ 231,\ 312,\ 321.$$

It is an easy matter to count the number of possible permutations of $n$ objects. By our general counting principle, there are $n$ ways to assign the first element, for each of these we have $n - 1$ ways to assign the second object, $n - 2$ for the third, and so forth. This proves the following theorem.

Theorem 3.1 The total number of permutations of a set $A$ of $n$ elements is given by $n \cdot (n-1) \cdot (n-2) \cdot \ldots \cdot 1$. □

n     n!
0     1
1     1
2     2
3     6
4     24
5     120
6     720
7     5040
8     40320
9     362880
10    3628800

Table 3.3: Values of the factorial function.

It is sometimes helpful to consider orderings of subsets of a given set. This prompts the following definition.

Definition 3.2 Let $A$ be an $n$-element set, and let $k$ be an integer between 0 and $n$. Then a $k$-permutation of $A$ is an ordered listing of a subset of $A$ of size $k$. □

Using the same techniques as in the last theorem, the following result is easily proved.
Theorem 3.2 The total number of $k$-permutations of a set $A$ of $n$ elements is given by $n \cdot (n-1) \cdot (n-2) \cdot \ldots \cdot (n-k+1)$. □

Factorials

The number given in Theorem 3.1 is called $n$ factorial, and is denoted by $n!$. The expression $0!$ is defined to be 1 to make certain formulas come out simpler. The first few values of this function are shown in Table 3.3. The reader will note that this function grows very rapidly.

The expression $n!$ will enter into many of our calculations, and we shall need to have some estimate of its magnitude when $n$ is large. It is clearly not practical to make exact calculations in this case. We shall instead use a result called Stirling's formula. Before stating this formula we need a definition.

Definition 3.3 Let $a_n$ and $b_n$ be two sequences of numbers. We say that $a_n$ is asymptotically equal to $b_n$, and write $a_n \sim b_n$, if
$$\lim_{n \to \infty} \frac{a_n}{b_n} = 1.$$ □

Example 3.4 If $a_n = n + \sqrt{n}$ and $b_n = n$ then, since $a_n/b_n = 1 + 1/\sqrt{n}$ and this ratio tends to 1 as $n$ tends to infinity, we have $a_n \sim b_n$. □

Theorem 3.3 (Stirling's Formula) The sequence $n!$ is asymptotically equal to
$$n^n e^{-n} \sqrt{2\pi n}.$$ □

The proof of Stirling's formula may be found in most analysis texts. Let us verify this approximation by using the computer. The program StirlingApproximations prints $n!$, the Stirling approximation, and, finally, the ratio of these two numbers. Sample output of this program is shown in Table 3.4.

n     n!         Approximation    Ratio
1     1          .922             1.084
2     2          1.919            1.042
3     6          5.836            1.028
4     24         23.506           1.021
5     120        118.019          1.016
6     720        710.078          1.013
7     5040       4980.396         1.011
8     40320      39902.395        1.010
9     362880     359536.873       1.009
10    3628800    3598696.619      1.008

Table 3.4: Stirling approximations to the factorial function.
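A comparison of this kind is easy to reproduce. The following is a sketch in Python rather than the book's StirlingApproximations program, whose source is not shown here; the function name `stirling` is an assumption made for illustration.

```python
import math


def stirling(n):
    """Stirling's approximation n^n * e^(-n) * sqrt(2 * pi * n) to n!."""
    return n**n * math.exp(-n) * math.sqrt(2 * math.pi * n)


if __name__ == "__main__":
    print("n  n!  approximation  ratio")
    for n in range(1, 11):
        exact = math.factorial(n)
        approx = stirling(n)
        # The ratio is exact / approximation, as in the table's last column.
        print(n, exact, f"{approx:.3f}", f"{exact / approx:.3f}")
```

For much larger $n$ the power `n**n` becomes a very large integer and the final product can overflow a float; working with logarithms (for example via `math.lgamma`) would be one way around that, though it is not needed for the range shown.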
Note that, while the ratio of the numbers is getting closer to 1, the difference between the exact value and the approximation is increasing, and indeed, this difference will tend to infinity as $n$ tends to infinity, even though the ratio tends to 1. (This was also true in our Example 3.4 where $n + \sqrt{n} \sim n$, but the difference is $\sqrt{n}$.)

Generating Random Permutations

We now consider the question of generating a random permutation of the integers between 1 and $n$. Consider the following experiment. We start with a deck of $n$ cards, labelled 1 through $n$. We choose a random card out of the deck, note its label, and put the card aside. We repeat this process until all $n$ cards have been chosen. It is clear that each permutation of the integers from 1 to $n$ can occur as a sequence of labels in this experiment, and that each sequence of labels is equally likely to occur. In our implementations of the computer algorithms, the above procedure is called RandomPermutation.

Fixed Points

There are many interesting problems that relate to properties of a permutation chosen at random from the set of all permutations of a given finite set. For example, since a permutation is a one-to-one mapping of the set onto itself, it is interesting to ask how many points are mapped onto themselves. We call such points fixed points of the mapping.

Let $p_k(n)$ be the probability that a random permutation of the set $\{1, 2, \ldots, n\}$ has exactly $k$ fixed points. We will attempt to learn something about these probabilities using simulation.

Number of          Fraction of permutations
fixed points       n = 10    n = 20    n = 30
0                  .362      .370      .358
1                  .368      .396      .358
2                  .202      .164      .192
3                  .052      .060      .070
4                  .012      .008      .020
5                  .004      .002      .002
Average number
of fixed points    .996      .948      1.042

Table 3.5: Fixed point distributions.
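The card-drawing experiment and the fixed-point count described above can be sketched as follows. This is an illustrative Python version, not the book's RandomPermutation or FixedPoints code; the function names, the `seed` parameter, and the use of Python's `random` module are all assumptions.

```python
import random


def random_permutation(n, rng=random):
    """Draw cards labelled 1..n one at a time, without replacement."""
    deck = list(range(1, n + 1))
    drawn = []
    while deck:
        # Choose a random card still in the deck, note it, and set it aside.
        card = deck.pop(rng.randrange(len(deck)))
        drawn.append(card)
    return drawn


def count_fixed_points(perm):
    """Number of positions i (1-based) with perm sending i to i."""
    return sum(1 for i, value in enumerate(perm, start=1) if i == value)


def fixed_point_fractions(n, trials=500, seed=None):
    """Estimate the fixed-point distribution and the average number of fixed points."""
    rng = random.Random(seed)
    counts = {}
    total = 0
    for _ in range(trials):
        k = count_fixed_points(random_permutation(n, rng))
        counts[k] = counts.get(k, 0) + 1
        total += k
    fractions = {k: c / trials for k, c in sorted(counts.items())}
    return fractions, total / trials


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(n, fixed_point_fractions(n))
```

Removing a card from the middle of a list costs time proportional to the deck size, so this direct transcription takes on the order of $n^2$ steps; for the small $n$ used here that is not a concern.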
The program FixedPoints uses the procedure RandomPermutation to generate random permutations and count fixed points. The program prints the proportion of times that there are $k$ fixed points as well as the average number of fixed points. The results of this program for 500 simulations for the cases $n = 10$, 20, and 30 are shown in Table 3.5. Notice the rather surprising fact that our estimates for the probabilities do not seem to depend very heavily on the number of elements in the permutation. For example, the probability that there are no fixed points, when $n = 10$, 20, or 30, is estimated to be between .35 and .37. We shall see later (see Example 3.12) that for $n \ge 10$ the exact probabilities $p_0(n)$ are, to six decimal place accuracy, equal to $1/e \approx .367879$. Thus, for all practical purposes, after $n = 10$ the probability that a random permutation of the set $\{1, 2, \ldots, n\}$ has no fixed points does not depend upon $n$. These simulations also suggest that the average number of fixed points is close to 1. It can be shown (see Example 6.8) that the average is exactly equal to 1 for all $n$.

More picturesque versions of the fixed-point problem are: You have arranged the books on your book shelf in alphabetical order by author and they get returned to your shelf at random; what is the probability that exactly $k$ of the books end up in their correct position? (The library problem.) In a restaurant $n$ hats are checked and they are hopelessly scrambled; what is the probability that no one gets his own hat back? (The hat check problem.) In the Historical Remarks at the end of this section, we give one method for solving the hat check problem exactly. Another method is given in Example 3.12.

Records

Here is another interesting probability problem that involves permutations. Estimates for the amount of measured snow in inches in Hanover, New Hampshire, in the ten years from 1974 to 1983 are shown in Table 3.6.

Date    Snowfall in inches
1974    75
1975    88
1976    72
1977    110
1978    85
1979    30
1980    55
1981    86
1982    51
1983    64

Table 3.6: Snowfall in Hanover.

Suppose we have started keeping records in 1974. Then our first year's snowfall could be considered a record snowfall starting from this year. A new record was established in 1975; the next record was established in 1977, and there were no new records established after this year. Thus, in this ten-year period, there were three records established: 1974, 1975, and 1977. The question that we ask is: How many records should we expect to be established in such a ten-year period? We can count the number of records in terms of a permutation as follows: We number the years from 1 to 10. The actual amounts of snowfall are not important but their relative sizes are. We can, therefore, change the numbers measuring snowfalls to numbers 1 to 10 by replacing the smallest number by 1, the next smallest by 2, and so forth. (We assume that there are no ties.) For our example, we obtain the data shown in Table 3.7.

Year       1   2   3   4   5   6   7   8   9   10
Ranking    6   9   5   10  7   1   3   8   2   4

Table 3.7: Ranking of total snowfall.

This gives us a permutation of the numbers from 1 to 10 and, from this permutation, we can read off the records; they are in years 1, 2, and 4. Thus we can define records for a permutation as follows:

Definition 3.4 Let $\sigma$ be a permutation of the set $\{1, 2, \ldots, n\}$. Then $i$ is a record of $\sigma$ if either $i = 1$ or $\sigma(j) < \sigma(i)$ for every $j = 1, \ldots, i - 1$. □

Now if we regard all rankings of snowfalls over an $n$-year period to be equally likely (and allow no ties), we can estimate the probability that there will be $k$ records in $n$ years as well as the average number of records by simulation.
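Counting records according to Definition 3.4 amounts to tracking the largest value seen so far. The following is a hedged sketch of such a simulation in Python; the names `count_records` and `average_records` are illustrative, and `random.shuffle` is assumed as the source of equally likely rankings rather than the card-drawing procedure described earlier.

```python
import random


def count_records(perm):
    """Count indices i whose value exceeds every earlier value (Definition 3.4)."""
    records = 0
    largest_so_far = 0  # values are rankings 1..n, so 0 is below all of them
    for value in perm:
        if value > largest_so_far:
            records += 1
            largest_so_far = value
    return records


def average_records(n, trials=1000, seed=None):
    """Average number of records over randomly shuffled rankings of 1..n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(1, n + 1))
        rng.shuffle(perm)
        total += count_records(perm)
    return total / trials


if __name__ == "__main__":
    # Rankings of the Hanover snowfall data (Table 3.7): records in years 1, 2, 4.
    print(count_records([6, 9, 5, 10, 7, 1, 3, 8, 2, 4]))
    for n in (10, 20, 30):
        print(n, average_records(n))
```

The first position always counts as a record here, which matches the clause $i = 1$ in the definition.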
We have written a program Records that counts the number of records in randomly chosen permutations. We have run this program for the cases $n = 10$, 20, 30. For $n = 10$ the average number of records is 2.968, for 20 it is 3.656, and for 30 it is 3.960. We see now that the averages increase, but very slowly. We shall see later (see Example 6.11) that the average number is approximately $\log n$. Since $\log 10 \approx 2.3$, $\log 20 \approx 3$, and $\log 30 \approx 3.4$, this is consistent with the results of our simulations.

As remarked earlier, we shall be able to obtain formulas for exact results of certain problems of the above type. However, only minor changes in the problem make this impossible. The power of simulation is that minor changes in a problem do not make the simulation much more difficult. (See Exercise 20 for an interesting variation of the hat check problem.)

List of Permutations

Another method to solve problems that is not sensitive to small changes in the problem is to have the computer simply list all possible permutations and count the fraction that have the desired property. The program AllPermutations produces a list of all of the permutations of $n$. When we try running this program, we run into a limitation on the use of the computer. The number of permutations of $n$ increases so rapidly that even to list all permutations of 20 objects is impractical.

Historical Remarks

Our basic counting principle stated that if you can do one thing in $r$ ways and for each of these another thing in $s$ ways, then you can do the pair in $r \cdot s$ ways. This is such a self-evident result that you might expect that it occurred very early in mathematics. N. L. Biggs suggests that we might trace an example of this principle as follows: First, he relates a popular nursery rhyme dating back to at least 1730:

    As I was going to St. Ives,
    I met a man with seven wives,
    Each wife had seven sacks,
    Each sack had seven cats,
    Each cat had seven kits.
    Kits, cats, sacks and wives,
    How many were going to St. Ives?

(You need our principle only if you are not clever enough to realize that you are supposed to answer one, since only the narrator is going to St. Ives; the others are going in the other direction!)

He also gives a problem appearing on one of the oldest surviving mathematical manuscripts of about 1650 B.C., roughly translated as:

    Houses        7
    Cats         49
    Mice        343
    Wheat      2401
    Hekat     16807
              19607

The following interpretation has been suggested: there are seven houses, each with seven cats; each cat kills seven mice; each mouse would have eaten seven heads of wheat, each of which would have produced seven hekat measures of grain. With this interpretation, the table answers the question of how many hekat measures were saved by the cats' actions. It is not clear why the writer of the table wanted to add the numbers together.¹

One of the earliest uses of factorials occurred in Euclid's proof that there are infinitely many prime numbers. Euclid argued that there must be a prime number between $n$ and $n! + 1$ as follows: $n!$ and $n! + 1$ cannot have common factors. Either $n! + 1$ is prime or it has a proper factor. In the latter case, this factor cannot divide $n!$ and hence must be between $n$ and $n! + 1$. If this factor is not prime, then it has a factor that, by the same argument, must be bigger than $n$. In this way, we eventually reach a prime bigger than $n$, and this holds for all $n$.

The "$n!$" rule for the number of permutations seems to have occurred first in India. Examples have been found as early as 300 B.C., and by the eleventh century the general formula seems to have been well known in India and then in the Arab countries.
The hat check problem is found in an early probability book written by de Montmort and first printed in 1708.² It appears in the form of a game called Treize. In a simplified version of this game considered by de Montmort one turns over cards numbered 1 to 13, calling out 1, 2, ..., 13 as the cards are examined. De Montmort asked for the probability that no card that is turned up agrees with the number called out.

¹ N. L. Biggs, "The Roots of Combinatorics," Historia Mathematica, vol. 6 (1979), pp. 109-136.
² P. R. de Montmort, Essay d'Analyse sur des Jeux de Hazard, 2d ed. (Paris: Quillau, 1713).

This probability is the same as the probability that a random permutation of 13 elements has no fixed point. De Montmort solved this problem by the use of a recursion relation as follows: let $w_n$ be the number of permutations of $n$ elements with no fixed point (such permutations are called derangements). Then $w_1 = 0$ and $w_2 = 1$.

Now assume that $n \ge 3$ and choose a derangement of the integers between 1 and $n$. Let $k$ be the integer in the first position in this derangement. By the definition of derangement, we have $k \ne 1$. There are two possibilities of interest concerning the position of 1 in the derangement: either 1 is in the $k$th position or it is elsewhere. In the first case, the $n - 2$ remaining integers can be positioned in $w_{n-2}$ ways without resulting in any fixed points. In the second case, we consider the set of integers $\{1, 2, \ldots, k-1, k+1, \ldots, n\}$. The numbers in this set must occupy the positions $\{2, 3, \ldots, n\}$ so that none of the numbers other than 1 in this set are fixed, and also so that 1 is not in position $k$. The number of ways of achieving this kind of arrangement is just $w_{n-1}$. Since there are $n - 1$ possible values of $k$, we see that
$$w_n = (n-1)w_{n-1} + (n-1)w_{n-2}$$
for $n \ge 3$.
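De Montmort's recursion can be checked against a direct count for small $n$. The sketch below is illustrative only: the function names are assumptions, and the brute-force check via `itertools.permutations` is added here as a sanity test, not something described in the text.

```python
from itertools import permutations
from math import factorial


def derangement_counts(n_max):
    """Return {n: w_n} for n = 1..n_max using w_n = (n-1)w_{n-1} + (n-1)w_{n-2}."""
    w = {1: 0, 2: 1}
    for n in range(3, n_max + 1):
        w[n] = (n - 1) * w[n - 1] + (n - 1) * w[n - 2]
    return w


def brute_force_derangements(n):
    """Count permutations of 0..n-1 with no fixed point by listing them all."""
    return sum(
        1
        for perm in permutations(range(n))
        if all(perm[i] != i for i in range(n))
    )


if __name__ == "__main__":
    w = derangement_counts(13)
    for n in range(3, 8):
        print(n, w[n], brute_force_derangements(n))
    # Probability that a random permutation of 13 cards has no fixed point.
    print(w[13] / factorial(13))
```

The brute-force count is only practical for small $n$, since the number of permutations grows as $n!$; the recursion has no such limitation.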
One might conjecture from this last equation that the sequence $\{w_n\}$ grows like the sequence $\{n!\}$. In fact, it is easy to prove by induction that
$$w_n = n w_{n-1} + (-1)^n.$$
Then $p_i = w_i/i!$ satisfies
$$p_i - p_{i-1} = \frac{(-1)^i}{i!}.$$
If we sum from $i = 2$ to $n$ and use the fact that $p_1 = 0$, we obtain
$$p_n = \frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^n}{n!}.$$
This agrees with the first $n + 1$ terms of the expansion for $e^x$ for $x = -1$ and hence for large $n$ is approximately $e^{-1} \approx .368$. David remarks that this was possibly the first use of the exponential function in probability.³ We shall see another way to derive de Montmort's result in the next section, using a method known as the Inclusion-Exclusion method.

Recently, a related problem appeared in a column of Marilyn vos Savant.⁴ Charles Price wrote to ask about his experience playing a certain form of solitaire, sometimes called "frustration solitaire." In this particular game, a deck of cards is shuffled, and then dealt out, one card at a time. As the cards are being dealt, the player counts from 1 to 13, and then starts again at 1. (Thus, each number is counted four times.) If a number that is being counted coincides with the rank of the card that is being turned up, then the player loses the game. Price found that he rarely won and wondered how often he should win. Vos Savant remarked that the expected number of matches is 4, so it should be difficult to win the game.

Finding the chance of winning is a harder problem than the one that de Montmort solved because, when one goes through the entire deck, there are different patterns for the matches that might occur. For example, matches may occur for two cards of the same rank, say two aces, or for two different ranks, say a two and a three.

A discussion of this problem can be found in Riordan.⁵ In this book, it is shown that as $n \to \infty$ the probability of no matches tends to $1/e^4$.

³ F. N. David, Games, Gods and Gambling (London: Griffin, 1962), p. 146.
⁴ M. vos Savant, Ask Marilyn, Parade Magazine, Boston Globe, 21 August 1994.
⁵ J. Riordan, An Introduction to Combinatorial Analysis (New York: John Wiley & Sons, 1958).

The original game of Treize is more difficult to analyze than frustration solitaire. The game of Treize is played as follows. One person is chosen as dealer and the others are players. Each player, other than the dealer, puts up a stake. The dealer shuffles the cards and turns them up one at a time calling out, "Ace, two, three, ..., king," just as in frustration solitaire. If the dealer goes through the 13 cards without a match he pays the players an amount equal to their stake, and the deal passes to someone else. If there is a match the dealer collects the players' stakes; the players put up new stakes, and the dealer continues through the deck, calling out, "Ace, two, three, ...." If the dealer runs out of cards he reshuffles and continues the count where he left off. He continues until there is a run of 13 without a match and then a new dealer is chosen.

The question at this point is how much money can the dealer expect to win from each player. De Montmort found that if each player puts up a stake of 1, say, then the dealer will win approximately .801 from each player. Peter Doyle calculated the exact amount that the dealer can expect to win.
The answ er is: 26516072156010 21 85 82 22 76 07 91 27 34 18 27 84 64 21 20 482 13 60 91 44 67 15 37 19 62 08 99 31 52311343541724 55 43 34 91 28 70 54 14 40 29 92 39 25 16 07 694 11 35 00 08 07 75 91 78 18 51 20 13 82176876653563 17 38 52 87 45 55 85 93 67 25 46 32 00 94 77 403 72 73 95 57 28 07 45 93 84 34 27 47 87664965076063 99 05 38 26 11 89 38 81 43 51 35 47 36 63 16 017 00 49 45 50 72 01 76 42 78 82 83 06 60117107953633 14 27 34 38 24 77 92 27 09 83 52 81 75 32 99 035 98 85 81 41 36 88 36 76 55 83 31 13 24476153310720 62 74 74 16 97 19 30 18 06 64 91 52 69 87 04 084 38 39 14 21 79 07 90 69 54 97 60 36 28528211590140 31 62 02 12 06 01 54 91 26 92 08 80 82 49 13 325 55 38 82 69 20 55 42 78 30 81 03 68 57818861208758 24 88 00 68 09 78 64 04 38 11 85 82 83 48 77 542 56 09 55 55 06 62 87 89 27 12 30 48 26997601700116 23 35 92 79 33 08 29 75 33 64 21 93 50 50 74 540 26 89 25 68 31 93 88 78 21 30 14 42 70519791882/33036929133582 59 22 20 11 72 20 71 31 56 07 11 14 97 51 01 149 83 10 63 36 40 72 13 89 69 87 80 07 99647204708825 30 33 87 52 58 92 23 65 81 32 30 15 62 80 05 621 14 34 27 29 06 25 65 89 74 43 39 71 65719454122908 00 70 86 28 98 41 30 60 87 56 13 02 81 89 91 167 35 78 63 62 37 56 06 71 84 98 64 91 35353553622197 44 88 90 22 32 67 10 11 58 80 10 16 28 59 31 351 97 92 94 38 72 23 27 70 33 39 69 67 79797069933475 80 24 23 67 69 49 87 36 61 60 51 84 03 14 77 561 56 03 93 38 02 57 07 09 70 71 19 59 69641268242455 01 33 19 87 97 47 05 46 93 51 78 09 38 37 50 593 48 88 58 69 86 72 36 48 46 95 05 39 88868628582609 90 55 86 27 10 01 31 81 50 62 11 34 40 70 56 983 21 47 40 22 18 51 56 77 06 67 20 80 94586589378459 43 27 99 86 87 06 33 41 61 81 29 88 63 04 96 327 28 72 54 81 84 58 87 93 53 02 44 98 00322425586446 74 10 48 14 77 20 93 41 08 06 13 50 61 35 03 856 97 30 48 97 12 13 06 39 37 04 05 15 59533731591.This is .803 to 3 decimal places. A description of the algorithm used to nd this answ er can b e found on his W eb page. 
[6] A discussion of this problem and other problems can be found in Doyle et al.[7]

The birthday problem does not seem to have a very old history. Problems of this type were first discussed by von Mises.[8] It was made popular in the 1950s by Feller's book.[9]

[6] P. Doyle, "Solution to Montmort's Probleme du Treize," http://math.ucsd.edu/~doyle/.
[7] P. Doyle, C. Grinstead, and J. Snell, "Frustration Solitaire," UMAP Journal, vol. 16, no. 2 (1995), pp. 137-145.
[8] R. von Mises, "Über Aufteilungs- und Besetzungs-Wahrscheinlichkeiten," Revue de la Faculté des Sciences de l'Université d'Istanbul, N. S. vol. 4 (1938-39), pp. 145-163.
[9] W. Feller, Introduction to Probability Theory and Its Applications, vol. 1, 3rd ed. (New York: John Wiley & Sons, 1968).

Stirling presented his formula
\[ n! \sim \sqrt{2\pi n}\,\left(\frac{n}{e}\right)^{n} \]
in his work Methodus Differentialis published in 1730.[10] This approximation was used by de Moivre in establishing his celebrated central limit theorem that we will study in Chapter 9. De Moivre himself had independently established this approximation, but without identifying the constant $\pi$. Having established the approximation
\[ \frac{2B}{\sqrt{n}} \]
for the central term of the binomial distribution, where the constant $B$ was determined by an infinite series, de Moivre writes:

    my worthy and learned Friend, Mr. James Stirling, who had applied himself after me to that inquiry, found that the Quantity B did denote the Square-root of the Circumference of a Circle whose Radius is Unity, so that if that Circumference be called c the Ratio of the middle Term to the Sum of all Terms will be expressed by $2/\sqrt{nc}$.[11]

[10] J. Stirling, Methodus Differentialis (London: Bowyer, 1730).
[11] A. de Moivre, The Doctrine of Chances, 3rd ed. (London: Millar, 1756).

Exercises

1 Four people are to be arranged in a row to have their picture taken. In how many ways can this be done?

2 An automobile manufacturer has four colors available for automobile exteriors and three for interiors. How many different color combinations can he produce?

3 In a digital computer, a bit is one of the integers {0, 1} and a word is any string of 32 bits. How many different words are possible?

4 What is the probability that at least 2 of the presidents of the United States have died on the same day of the year? If you bet this has happened, would you win your bet?

5 There are three different routes connecting city A to city B. How many ways can a round trip be made from A to B and back? How many ways if it is desired to take a different route on the way back?

6 In arranging people around a circular table, we take into account their seats relative to each other, not the actual position of any one person. Show that $n$ people can be arranged around a circular table in $(n-1)!$ ways.

7 Five people get on an elevator that stops at five floors. Assuming that each has an equal probability of going to any one floor, find the probability that they all get off at different floors.

8 A finite set $\Omega$ has $n$ elements. Show that if we count the empty set and $\Omega$ as subsets, there are $2^n$ subsets of $\Omega$.

9 A more refined inequality for approximating $n!$ is given by
\[ \sqrt{2\pi n}\,\left(\frac{n}{e}\right)^{n} e^{1/(12n+1)} < n! < \sqrt{2\pi n}\,\left(\frac{n}{e}\right)^{n} e^{1/(12n)} . \]
Write a computer program to illustrate this inequality for $n = 1$ to 9.

10 A deck of ordinary cards is shuffled and 13 cards are dealt. What is the probability that the last card dealt is an ace?

11 There are $n$ applicants for the director of computing. The applicants are interviewed independently by each member of the three-person search committee and ranked from 1 to $n$. A candidate will be hired if he or she is ranked first by at least two of the three interviewers.
Find the probability that a candidate will be accepted if the members of the committee really have no ability at all to judge the candidates and just rank the candidates randomly. In particular, compare this probability for the case of three candidates and the case of ten candidates.

12 A symphony orchestra has in its repertoire 30 Haydn symphonies, 15 modern works, and 9 Beethoven symphonies. Its program always consists of a Haydn symphony followed by a modern work, and then a Beethoven symphony.
   (a) How many different programs can it play?
   (b) How many different programs are there if the three pieces can be played in any order?
   (c) How many different three-piece programs are there if more than one piece from the same category can be played and they can be played in any order?

13 A certain state has license plates showing three numbers and three letters. How many different license plates are possible
   (a) if the numbers must come before the letters?
   (b) if there is no restriction on where the letters and numbers appear?

14 The door on the computer center has a lock which has five buttons numbered from 1 to 5. The combination of numbers that opens the lock is a sequence of five numbers and is reset every week.
   (a) How many combinations are possible if every button must be used once?
   (b) Assume that the lock can also have combinations that require you to push two buttons simultaneously and then the other three one at a time. How many more combinations does this permit?

15 A computing center has 3 processors that receive $n$ jobs, with the jobs assigned to the processors purely at random so that all of the $3^n$ possible assignments are equally likely. Find the probability that exactly one processor has no jobs.
16 Prove that at least two people in Atlanta, Georgia, have the same initials, assuming no one has more than four initials.

17 Find a formula for the probability that among a set of $n$ people, at least two have their birthdays in the same month of the year (assuming the months are equally likely for birthdays).

18 Consider the problem of finding the probability of more than one coincidence of birthdays in a group of $n$ people. These include, for example, three people with the same birthday, or two pairs of people with the same birthday, or larger coincidences. Show how you could compute this probability, and write a computer program to carry out this computation. Use your program to find the smallest number of people for which it would be a favorable bet that there would be more than one coincidence of birthdays.

*19 Suppose that on planet Zorg a year has $n$ days, and that the lifeforms there are equally likely to have hatched on any day of the year. We would like to estimate $d$, which is the minimum number of lifeforms needed so that the probability of at least two sharing a birthday exceeds 1/2.
   (a) In Example 3.3, it was shown that in a set of $d$ lifeforms, the probability that no two lifeforms share a birthday is
   \[ \frac{(n)_d}{n^d} , \]
   where $(n)_d = (n)(n-1)\cdots(n-d+1)$. Thus, we would like to set this equal to 1/2 and solve for $d$.
   (b) Using Stirling's Formula, show that
   \[ \frac{(n)_d}{n^d} \sim \left(1 + \frac{d}{n-d}\right)^{n-d+1/2} e^{-d} . \]
   (c) Now take the logarithm of the right-hand expression, and use the fact that for small values of $x$, we have
   \[ \log(1+x) \sim x - \frac{x^2}{2} . \]
   (We are implicitly using the fact that $d$ is of smaller order of magnitude than $n$. We will also use this fact in part (d).)
   (d) Set the expression found in part (c) equal to $-\log(2)$, and solve for $d$ as a function of $n$, thereby showing that
   \[ d \sim \sqrt{2(\log 2)\,n} . \]
   Hint: If all three summands in the expression found in part (b) are used, one obtains a cubic equation in $d$. If the smallest of the three terms is thrown away, one obtains a quadratic equation in $d$.
   (e) Use a computer to calculate the exact values of $d$ for various values of $n$. Compare these values with the approximate values obtained by using the answer to part (d).

20 At a mathematical conference, ten participants are randomly seated around a circular table for meals. Using simulation, estimate the probability that no two people sit next to each other at both lunch and dinner. Can you make an intelligent conjecture for the case of $n$ participants when $n$ is large?

21 Modify the program AllPermutations to count the number of permutations of $n$ objects that have exactly $j$ fixed points for $j = 0, 1, 2, \ldots, n$. Run your program for $n = 2$ to 6. Make a conjecture for the relation between the number that have 0 fixed points and the number that have exactly 1 fixed point. A proof of the correct conjecture can be found in Wilf.[12]

22 Mr. Wimply Dimple, one of London's most prestigious watch makers, has come to Sherlock Holmes in a panic, having discovered that someone has been producing and selling crude counterfeits of his best selling watch. The 16 counterfeits so far discovered bear stamped numbers, all of which fall between 1 and 56, and Dimple is anxious to know the extent of the forger's work. All present agree that it seems reasonable to assume that the counterfeits thus far produced bear consecutive numbers from 1 to whatever the total number is.
"Chin up, Dimple," opines Dr. Watson.
"I shouldn't worry overly much if I were you; the Maximum Likelihood Principle, which estimates the total number as precisely that which gives the highest probability for the series of numbers found, suggests that we guess 56 itself as the total. Thus, your forgers are not a big operation, and we shall have them safely behind bars before your business suffers significantly."
"Stuff, nonsense, and bother your fancy principles, Watson," counters Holmes. "Anyone can see that, of course, there must be quite a few more than 56 watches; why, the odds of our having discovered precisely the highest numbered watch made are laughably negligible. A much better guess would be twice 56."
   (a) Show that Watson is correct that the Maximum Likelihood Principle gives 56.
   (b) Write a computer program to compare Holmes's and Watson's guessing strategies as follows: fix a total $N$ and choose 16 integers randomly between 1 and $N$. Let $m$ denote the largest of these. Then Watson's guess for $N$ is $m$, while Holmes's is $2m$. See which of these is closer to $N$. Repeat this experiment (with $N$ still fixed) a hundred or more times, and determine the proportion of times that each comes closer. Whose seems to be the better strategy?

[12] H. S. Wilf, "A Bijection in the Theory of Derangements," Mathematics Magazine, vol. 57, no. 1 (1984), pp. 37-40.

23 Barbara Smith is interviewing candidates to be her secretary. As she interviews the candidates, she can determine the relative rank of the candidates but not the true rank. Thus, if there are six candidates and their true rank is 6, 1, 4, 2, 3, 5 (where 1 is best), then after she had interviewed the first three candidates she would rank them 3, 1, 2. As she interviews each candidate, she must either accept or reject the candidate. If she does not accept the candidate after the interview, the candidate is lost to her.
She wants to decide on a strategy for deciding when to stop and accept a candidate that will maximize the probability of getting the best candidate. Assume that there are $n$ candidates and they arrive in a random rank order.
   (a) What is the probability that Barbara gets the best candidate if she interviews all of the candidates? What is it if she chooses the first candidate?
   (b) Assume that Barbara decides to interview the first half of the candidates and then continue interviewing until getting a candidate better than any candidate seen so far. Show that she has a better than 25 percent chance of ending up with the best candidate.

24 For the task described in Exercise 23, it can be shown[13] that the best strategy is to pass over the first $k - 1$ candidates where $k$ is the smallest integer for which
\[ \frac{1}{k} + \frac{1}{k+1} + \cdots + \frac{1}{n-1} \le 1 . \]
Using this strategy the probability of getting the best candidate is approximately $1/e = .368$. Write a program to simulate Barbara Smith's interviewing if she uses this optimal strategy, using $n = 10$, and see if you can verify that the probability of success is approximately $1/e$.

[13] E. B. Dynkin and A. A. Yushkevich, Markov Processes: Theorems and Problems, trans. J. S. Wood (New York: Plenum, 1969).

3.2 Combinations

Having mastered permutations, we now consider combinations. Let $U$ be a set with $n$ elements; we want to count the number of distinct subsets of the set $U$ that have exactly $j$ elements. The empty set and the set $U$ are considered to be subsets of $U$. The empty set is usually denoted by $\emptyset$.

Example 3.5 Let $U = \{a, b, c\}$. The subsets of $U$ are
\[ \emptyset,\ \{a\},\ \{b\},\ \{c\},\ \{a, b\},\ \{a, c\},\ \{b, c\},\ \{a, b, c\} . \]
□

Binomial Coefficients

The number of distinct subsets with $j$ elements that can be chosen from a set with $n$ elements is denoted by $\binom{n}{j}$, and is pronounced "$n$ choose $j$." The number $\binom{n}{j}$ is called a binomial coefficient.
This terminology comes from an application to algebra which will be discussed later in this section.

In the above example, there is one subset with no elements, three subsets with exactly 1 element, three subsets with exactly 2 elements, and one subset with exactly 3 elements. Thus, $\binom{3}{0} = 1$, $\binom{3}{1} = 3$, $\binom{3}{2} = 3$, and $\binom{3}{3} = 1$. Note that there are $2^3 = 8$ subsets in all. (We have already seen that a set with $n$ elements has $2^n$ subsets; see Exercise 3.1.8.) It follows that
\[ \binom{3}{0} + \binom{3}{1} + \binom{3}{2} + \binom{3}{3} = 2^3 = 8 , \]
\[ \binom{n}{0} = \binom{n}{n} = 1 . \]

Assume that $n > 0$. Then, since there is only one way to choose a set with no elements and only one way to choose a set with $n$ elements, the remaining values of $\binom{n}{j}$ are determined by the following recurrence relation:

Theorem 3.4 For integers $n$ and $j$, with $0 < j < n$, the binomial coefficients satisfy:
\[ \binom{n}{j} = \binom{n-1}{j} + \binom{n-1}{j-1} . \qquad (3.1) \]

Proof. We wish to choose a subset of $j$ elements. Choose an element $u$ of $U$. Assume first that we do not want $u$ in the subset. Then we must choose the $j$ elements from a set of $n - 1$ elements; this can be done in $\binom{n-1}{j}$ ways. On the other hand, assume that we do want $u$ in the subset. Then we must choose the other $j - 1$ elements from the remaining $n - 1$ elements of $U$; this can be done in $\binom{n-1}{j-1}$ ways. Since $u$ is either in our subset or not, the number of ways that we can choose a subset of $j$ elements is the sum of the number of subsets of $j$ elements which have $u$ as a member and the number which do not; this is what Equation 3.1 states. □

The binomial coefficient $\binom{n}{j}$ is defined to be 0 if $j < 0$ or if $j > n$. With this definition, the restrictions on $j$ in Theorem 3.4 are unnecessary.

         j = 0    1    2    3    4    5    6    7    8    9   10
n =  0       1
     1       1    1
     2       1    2    1
     3       1    3    3    1
     4       1    4    6    4    1
     5       1    5   10   10    5    1
     6       1    6   15   20   15    6    1
     7       1    7   21   35   35   21    7    1
     8       1    8   28   56   70   56   28    8    1
     9       1    9   36   84  126  126   84   36    9    1
    10       1   10   45  120  210  252  210  120   45   10    1

Figure 3.3: Pascal's triangle.
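As a quick check of the recurrence in Theorem 3.4, the rows of Figure 3.3 can be regenerated from the boundary values $\binom{n}{0} = \binom{n}{n} = 1$. The sketch below is illustrative only; the name `pascal_rows` is an assumption made here, not a program supplied with the book.

```python
def pascal_rows(n_max):
    """Build rows 0..n_max of Pascal's triangle from Equation 3.1."""
    rows = [[1]]  # row n = 0
    for n in range(1, n_max + 1):
        prev = rows[-1]
        # interior entries: C(n, j) = C(n-1, j) + C(n-1, j-1) for 0 < j < n
        interior = [prev[j] + prev[j - 1] for j in range(1, n)]
        rows.append([1] + interior + [1])
    return rows

rows = pascal_rows(10)
for n, row in enumerate(rows):
    print(n, row)
```

Each row sums to $2^n$, in agreement with the count of all subsets noted above.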
Pascal's Triangle

The relation 3.1, together with the knowledge that
\[ \binom{n}{0} = \binom{n}{n} = 1 , \]
determines completely the numbers $\binom{n}{j}$. We can use these relations to determine the famous triangle of Pascal, which exhibits all these numbers in matrix form (see Figure 3.3).

The $n$th row of this triangle has the entries $\binom{n}{0}, \binom{n}{1}, \ldots, \binom{n}{n}$. We know that the first and last of these numbers are 1. The remaining numbers are determined by the recurrence relation Equation 3.1; that is, the entry $\binom{n}{j}$ for $0 < j < n$ in the $n$th row of Pascal's triangle is the sum of the entry immediately above and the one immediately to its left in the $(n-1)$st row. For example, $\binom{5}{2} = 6 + 4 = 10$.

This algorithm for constructing Pascal's triangle can be used to write a computer program to compute the binomial coefficients. You are asked to do this in Exercise 4.

While Pascal's triangle provides a way to construct recursively the binomial coefficients, it is also possible to give a formula for $\binom{n}{j}$.

Theorem 3.5 The binomial coefficients are given by the formula
\[ \binom{n}{j} = \frac{(n)_j}{j!} . \qquad (3.2) \]

Proof. Each subset of size $j$ of a set of size $n$ can be ordered in $j!$ ways. Each of these orderings is a $j$-permutation of the set of size $n$. The number of $j$-permutations is $(n)_j$, so the number of subsets of size $j$ is
\[ \frac{(n)_j}{j!} . \]
This completes the proof. □

The above formula can be rewritten in the form
\[ \binom{n}{j} = \frac{n!}{j!\,(n-j)!} . \]
This immediately shows that
\[ \binom{n}{j} = \binom{n}{n-j} . \]

When using Equation 3.2 in the calculation of $\binom{n}{j}$, if one alternates the multiplications and divisions, then all of the intermediate values in the calculation are integers. Furthermore, none of these intermediate values exceed the final value. (See Exercise 40.)

Another point that should be made concerning Equation 3.2 is that if it is used to define the binomial coefficients, then it is no longer necessary to require $n$ to be a positive integer.
The variable $j$ must still be a nonnegative integer under this definition. This idea is useful when extending the Binomial Theorem to general exponents. (The Binomial Theorem for nonnegative integer exponents is given below as Theorem 3.7.)

Poker Hands

Example 3.6 Poker players sometimes wonder why a four of a kind beats a full house. A poker hand is a random subset of 5 elements from a deck of 52 cards. A hand has four of a kind if it has four cards with the same value; for example, four sixes or four kings. It is a full house if it has three of one value and two of a second; for example, three twos and two queens. Let us see which hand is more likely. How many hands have four of a kind? There are 13 ways that we can specify the value for the four cards. For each of these, there are 48 possibilities for the fifth card. Thus, the number of four-of-a-kind hands is $13 \cdot 48 = 624$. Since the total number of possible hands is $\binom{52}{5} = 2598960$, the probability of a hand with four of a kind is $624/2598960 = .00024$.

Now consider the case of a full house; how many such hands are there? There are 13 choices for the value which occurs three times; for each of these there are $\binom{4}{3} = 4$ choices for the particular three cards of this value that are in the hand. Having picked these three cards, there are 12 possibilities for the value which occurs twice; for each of these there are $\binom{4}{2} = 6$ possibilities for the particular pair of this value. Thus, the number of full houses is $13 \cdot 4 \cdot 12 \cdot 6 = 3744$, and the probability of obtaining a hand with a full house is $3744/2598960 = .0014$. Thus, while both types of hands are unlikely, you are six times more likely to obtain a full house than four of a kind. □
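The counts in Example 3.6 can be reproduced in a few lines. This is only a sketch, assuming Python 3.8 or later so that the standard-library function `math.comb` is available; the variable names are chosen here for illustration.

```python
from math import comb

total_hands = comb(52, 5)                        # all 5-card subsets of a 52-card deck
four_kind = 13 * 48                              # value of the four cards, then the fifth card
full_house = 13 * comb(4, 3) * 12 * comb(4, 2)   # triple value and suits, then pair value and suits

print("total hands:   ", total_hands)
print("four of a kind:", four_kind, round(four_kind / total_hands, 5))
print("full house:    ", full_house, round(full_house / total_hands, 4))
print("ratio:         ", full_house / four_kind)
```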
Figure 3.4: Tree diagram of three Bernoulli trials.

Bernoulli Trials

Our principal use of the binomial coefficients will occur in the study of one of the important chance processes called Bernoulli trials.

Definition 3.5 A Bernoulli trials process is a sequence of $n$ chance experiments such that
1. Each experiment has two possible outcomes, which we may call success and failure.
2. The probability $p$ of success on each experiment is the same for each experiment, and this probability is not affected by any knowledge of previous outcomes. The probability $q$ of failure is given by $q = 1 - p$.
□

Example 3.7 The following are Bernoulli trials processes:
1. A coin is tossed ten times. The two possible outcomes are heads and tails. The probability of heads on any one toss is 1/2.
2. An opinion poll is carried out by asking 1000 people, randomly chosen from the population, if they favor the Equal Rights Amendment, the two outcomes being yes and no. The probability $p$ of a yes answer (i.e., a success) indicates the proportion of people in the entire population that favor this amendment.
3. A gambler makes a sequence of 1-dollar bets, betting each time on black at roulette at Las Vegas. Here a success is winning 1 dollar and a failure is losing 1 dollar. Since in American roulette the gambler wins if the ball stops on one of 18 out of 38 positions and loses otherwise, the probability of winning is $p = 18/38 = .474$.
□

To analyze a Bernoulli trials process, we choose as our sample space a binary tree and assign a probability distribution to the paths in this tree. Suppose, for example, that we have three Bernoulli trials. The possible outcomes are indicated in the tree diagram shown in Figure 3.4.
We define $X$ to be the random variable which represents the outcome of the process, i.e., an ordered triple of S's and F's. The probabilities assigned to the branches of the tree represent the probability for each individual trial. Let the outcome of the $i$th trial be denoted by the random variable $X_i$, with distribution function $m_i$. Since we have assumed that outcomes on any one trial do not affect those on another, we assign the same probabilities at each level of the tree. An outcome $\omega$ for the entire experiment will be a path through the tree. For example, $\omega_3$ represents the outcomes SFS. Our frequency interpretation of probability would lead us to expect a fraction $p$ of successes on the first experiment; of these, a fraction $q$ of failures on the second; and, of these, a fraction $p$ of successes on the third experiment. This suggests assigning probability $pqp$ to the outcome $\omega_3$. More generally, we assign a distribution function $m(\omega)$ for paths $\omega$ by defining $m(\omega)$ to be the product of the branch probabilities along the path $\omega$. Thus, the probability that the three events S on the first trial, F on the second trial, and S on the third trial occur is the product of the probabilities for the individual events. We shall see in the next chapter that this means that the events involved are independent in the sense that the knowledge of one event does not affect our prediction for the occurrences of the other events.

Binomial Probabilities

We shall be particularly interested in the probability that in $n$ Bernoulli trials there are exactly $j$ successes. We denote this probability by $b(n, p, j)$. Let us calculate the particular value $b(3, p, 2)$ from our tree measure.
We see that there are three paths which have exactly two successes and one failure, namely $\omega_2$, $\omega_3$, and $\omega_5$. Each of these paths has the same probability $p^2 q$. Thus $b(3, p, 2) = 3p^2 q$. Considering all possible numbers of successes we have
\[ b(3, p, 0) = q^3 , \quad b(3, p, 1) = 3pq^2 , \quad b(3, p, 2) = 3p^2 q , \quad b(3, p, 3) = p^3 . \]
We can, in the same manner, carry out a tree measure for $n$ experiments and determine $b(n, p, j)$ for the general case of $n$ Bernoulli trials.

Theorem 3.6 Given $n$ Bernoulli trials with probability $p$ of success on each experiment, the probability of exactly $j$ successes is
\[ b(n, p, j) = \binom{n}{j} p^j q^{n-j} , \]
where $q = 1 - p$.

Proof. We construct a tree measure as described above. We want to find the sum of the probabilities for all paths which have exactly $j$ successes and $n - j$ failures. Each such path is assigned a probability $p^j q^{n-j}$. How many such paths are there? To specify a path, we have to pick, from the $n$ possible trials, a subset of $j$ to be successes, with the remaining $n - j$ outcomes being failures. We can do this in $\binom{n}{j}$ ways. Thus the sum of the probabilities is
\[ b(n, p, j) = \binom{n}{j} p^j q^{n-j} . \]
□

Example 3.8 A fair coin is tossed six times. What is the probability that exactly three heads turn up? The answer is
\[ b(6, .5, 3) = \binom{6}{3} \left(\frac{1}{2}\right)^{3} \left(\frac{1}{2}\right)^{3} = 20 \cdot \frac{1}{64} = .3125 . \]
□

Example 3.9 A die is rolled four times. What is the probability that we obtain exactly one 6? We treat this as Bernoulli trials with success = "rolling a 6" and failure = "rolling some number other than a 6."
Then $p = 1/6$, and the probability of exactly one success in four trials is
\[ b(4, 1/6, 1) = \binom{4}{1} \left(\frac{1}{6}\right)^{1} \left(\frac{5}{6}\right)^{3} = .386 . \]
□

To compute binomial probabilities using the computer, multiply the function choose($n$, $k$) by $p^k q^{n-k}$. The program BinomialProbabilities prints out the binomial probabilities $b(n, p, k)$ for $k$ between $k_{\min}$ and $k_{\max}$, and the sum of these probabilities. We have run this program for $n = 100$, $p = 1/2$, $k_{\min} = 45$, and $k_{\max} = 55$; the output is shown in Table 3.8. Note that the individual probabilities are quite small. The probability of exactly 50 heads in 100 tosses of a coin is about .08. Our intuition tells us that this is the most likely outcome, which is correct; but, all the same, it is not a very likely outcome.

     k    b(n, p, k)
    45    .0485
    46    .0580
    47    .0666
    48    .0735
    49    .0780
    50    .0796
    51    .0780
    52    .0735
    53    .0666
    54    .0580
    55    .0485

Table 3.8: Binomial probabilities for $n = 100$, $p = 1/2$.

Binomial Distributions

Definition 3.6 Let $n$ be a positive integer, and let $p$ be a real number between 0 and 1. Let $B$ be the random variable which counts the number of successes in a Bernoulli trials process with parameters $n$ and $p$. Then the distribution $b(n, p, k)$ of $B$ is called the binomial distribution. □

We can get a better idea about the binomial distribution by graphing this distribution for different values of $n$ and $p$ (see Figure 3.5). The plots in this figure were generated using the program BinomialPlot.

We have run this program for $p = .5$ and $p = .3$. Note that even for $p = .3$ the graphs are quite symmetric. We shall have an explanation for this in Chapter 9. We also note that the highest probability occurs around the value $np$, but that these highest probabilities get smaller as $n$ increases. We shall see in Chapter 6 that $np$ is the mean or expected value of the binomial distribution $b(n, p, k)$.
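The entries of Table 3.8 follow directly from Theorem 3.6. The sketch below is not the book's BinomialProbabilities program; it is a minimal Python stand-in, with the hypothetical names `binomial_prob` and `binomial_table`, that uses the standard-library `math.comb` (Python 3.8 or later) in place of choose($n$, $k$).

```python
from math import comb

def binomial_prob(n, p, k):
    """b(n, p, k): probability of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_table(n, p, kmin, kmax):
    """List of (k, b(n, p, k)) for kmin <= k <= kmax."""
    return [(k, binomial_prob(n, p, k)) for k in range(kmin, kmax + 1)]

table = binomial_table(100, 0.5, 45, 55)
for k, prob in table:
    print(f"{k:3d}  {prob:.4f}")
print("sum =", round(sum(prob for _, prob in table), 4))
```

The same function can be used to check Examples 3.8 and 3.9, for instance `binomial_prob(6, 0.5, 3)` and `binomial_prob(4, 1/6, 1)`.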
The following example gives a nice way to see the binomial distribution, when $p = 1/2$.

Example 3.10 A Galton board is a board in which a large number of BB-shots are dropped from a chute at the top of the board and deflected off a number of pins on their way down to the bottom of the board. The final position of each shot is the result of a number of random deflections either to the left or the right. We have written a program GaltonBoard to simulate this experiment. We have run the program for the case of 20 rows of pins and 10,000 shots being dropped. We show the result of this simulation in Figure 3.6.

Note that if we write 0 every time the shot is deflected to the left, and 1 every time it is deflected to the right, then the path of the shot can be described by a sequence of 0's and 1's of length $n$, just as for the $n$-fold coin toss.

The distribution shown in Figure 3.6 is an example of an empirical distribution, in the sense that it comes about by means of a sequence of experiments. As expected, this empirical distribution resembles the corresponding binomial distribution with parameters $n = 20$ and $p = 1/2$. □

Figure 3.5: Binomial distributions. (Panels: $p = .5$ with $n = 40, 80, 160$; $p = .3$ with $n = 30, 120, 270$.)

Figure 3.6: Simulation of the Galton board.

Hypothesis Testing

Example 3.11 Suppose that ordinary aspirin has been found effective against headaches 60 percent of the time, and that a drug company claims that its new aspirin with a special headache additive is more effective. We can test this claim as follows: we call their claim the alternate hypothesis, and its negation, that the additive has no appreciable effect, the null hypothesis.
Thus the null hypothesis is that $p = .6$, and the alternate hypothesis is that $p > .6$, where $p$ is the probability that the new aspirin is effective.

We give the aspirin to $n$ people to take when they have a headache. We want to find a number $m$, called the critical value for our experiment, such that we reject the null hypothesis if at least $m$ people are cured, and otherwise we accept it. How should we determine this critical value?

First note that we can make two kinds of errors. The first, often called a type 1 error in statistics, is to reject the null hypothesis when in fact it is true. The second, called a type 2 error, is to accept the null hypothesis when it is false. To determine the probability of both these types of errors we introduce a function $\alpha(p)$, defined to be the probability that we reject the null hypothesis, where this probability is calculated under the assumption that the probability of a cure is $p$. In the present case, we have
\[ \alpha(p) = \sum_{m \le k \le n} b(n, p, k) . \]

Note that $\alpha(.6)$ is the probability of a type 1 error, since this is the probability of a high number of successes for an ineffective additive. So for a given $n$ we want to choose $m$ so as to make $\alpha(.6)$ quite small, to reduce the likelihood of a type 1 error. But as $m$ increases above the most probable value $np = .6n$, $\alpha(.6)$, being the upper tail of a binomial distribution, approaches 0. Thus increasing $m$ makes a type 1 error less likely.

Now suppose that the additive really is effective, so that $p$ is appreciably greater than .6; say $p = .8$. (This alternative value of $p$ is chosen arbitrarily; the following calculations depend on this choice.) Then choosing $m$ well below $np = .8n$ will increase $\alpha(.8)$, since now $\alpha(.8)$ is all but the lower tail of a binomial distribution.
Indeed, if we put $\beta(.8) = 1 - \alpha(.8)$, then $\beta(.8)$ gives us the probability of a type 2 error, and so decreasing $m$ makes a type 2 error less likely.

The manufacturer would like to guard against a type 2 error, since if such an error is made, then the test does not show that the new drug is better, when in fact it is. If the alternative value of $p$ is chosen closer to the value of $p$ given in the null hypothesis (in this case $p = .6$), then for a given test population, the value of $\beta$ will increase. So, if the manufacturer's statistician chooses an alternative value for $p$ which is close to the value in the null hypothesis, then it will be an expensive proposition (i.e., the test population will have to be large) to reject the null hypothesis with a small value of $\beta$.

What we hope to do then, for a given test population $n$, is to choose a value of $m$, if possible, which makes both these probabilities small. If we make a type 1 error we end up buying a lot of essentially ordinary aspirin at an inflated price; a type 2 error means we miss a bargain on a superior medication. Let us say that we want our critical number $m$ to make each of these undesirable cases less than 5 percent probable.

We write a program PowerCurve to plot, for $n = 100$ and selected values of $m$, the function $\alpha(p)$, for $p$ ranging from .4 to 1. The result is shown in Figure 3.7. We include in our graph a box (in dotted lines) from .6 to .8, with bottom and top at heights .05 and .95. Then a value for $m$ satisfies our requirements if and only if the graph of $\alpha$ enters the box from the bottom, and leaves from the top (why? which is the type 1 and which is the type 2 criterion?). As $m$ increases, the graph of $\alpha$ moves to the right. A few experiments have shown us that $m = 69$ is the smallest value for $m$ that thwarts a type 1 error, while $m = 73$ is the largest which thwarts a type 2.
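The search for a critical value described in this example can be sketched in a few lines of Python. This is only an illustration and is not the PowerCurve program itself: the helper names b, alpha, and admissible_critical_values are introduced here for the purpose, and the 5 percent level together with the values $p = .6$ and $p = .8$ are taken from the example.

```python
from math import comb


def b(n: int, p: float, k: int) -> float:
    """Binomial probability b(n, p, k) of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def alpha(n: int, m: int, p: float) -> float:
    """Probability of at least m successes in n trials (we reject when this happens)."""
    return sum(b(n, p, k) for k in range(m, n + 1))


def admissible_critical_values(n: int, p_null: float, p_alt: float, level: float) -> list:
    """All m whose type 1 error alpha(p_null) and type 2 error 1 - alpha(p_alt) are below level."""
    return [
        m
        for m in range(n + 1)
        if alpha(n, m, p_null) < level and 1 - alpha(n, m, p_alt) < level
    ]


# Hypothetical driver using the numbers of the aspirin example.
values = admissible_critical_values(100, 0.6, 0.8, 0.05)
print(values)
```

Under these assumptions the printed list should run from 69 to 73. Calling the same function with level 0.01 and $n = 100$ should return an empty list, which is one way to see that a stricter error bound requires a larger test population.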
So we may choose our critical value between 69 and 73. If we're more intent on avoiding a type 1 error we favor 73, and similarly we favor 69 if we regard a type 2 error as worse. Of course, the drug company may not be happy with having as much as a 5 percent chance of an error. They might insist on having a 1 percent chance of an error. For this we would have to increase the number $n$ of trials (see Exercise 28). □

Binomial Expansion

We next remind the reader of an application of the binomial coefficients to algebra. This is the binomial expansion, from which we get the term binomial coefficient.

[Figure 3.7: The power curve.]

Theorem 3.7 (Binomial Theorem) The quantity $(a + b)^n$ can be expressed in the form
$$(a + b)^n = \sum_{j=0}^{n} \binom{n}{j} a^j b^{n-j}.$$

Proof. To see that this expansion is correct, write
$$(a + b)^n = (a + b)(a + b) \cdots (a + b).$$
When we multiply this out we will have a sum of terms each of which results from a choice of an $a$ or $b$ for each of $n$ factors. When we choose $j$ $a$'s and $(n - j)$ $b$'s, we obtain a term of the form $a^j b^{n-j}$. To determine such a term, we have to specify $j$ of the $n$ terms in the product from which we choose the $a$. This can be done in $\binom{n}{j}$ ways. Thus, collecting these terms in the sum contributes a term $\binom{n}{j} a^j b^{n-j}$. □

For example, we have
$$\begin{aligned}
(a + b)^0 &= 1 \\
(a + b)^1 &= a + b \\
(a + b)^2 &= a^2 + 2ab + b^2 \\
(a + b)^3 &= a^3 + 3a^2 b + 3ab^2 + b^3.
\end{aligned}$$
We see here that the coefficients of successive powers do indeed yield Pascal's triangle.

Corollary 3.1 The sum of the elements in the $n$th row of Pascal's triangle is $2^n$. If the elements in the $n$th row of Pascal's triangle are added with alternating signs, the sum is 0.

Proof.
The first statement in the corollary follows from the fact that
$$2^n = (1 + 1)^n = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n},$$
and the second from the fact that
$$0 = (1 - 1)^n = \binom{n}{0} - \binom{n}{1} + \binom{n}{2} - \cdots + (-1)^n \binom{n}{n}.$$
□

The first statement of the corollary tells us that the number of subsets of a set of $n$ elements is $2^n$. We shall use the second statement in our next application of the binomial theorem.

We have seen that, when $A$ and $B$ are any two events (cf. Section 1.2),
$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$
We now extend this theorem to a more general version, which will enable us to find the probability that at least one of a number of events occurs.

Inclusion-Exclusion Principle

Theorem 3.8 Let $P$ be a probability distribution on a sample space $\Omega$, and let $\{A_1, A_2, \ldots, A_n\}$ be a finite set of events. Then
$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum_{i=1}^{n} P(A_i) - \sum_{1 \le i < j \le n} P(A_i \cap A_j) + \sum_{1 \le i < j < k \le n} P(A_i \cap A_j \cap A_k) - \cdots. \tag{3.3}$$

Proof. If an outcome occurs in at least one of the events $A_i$, its probability is added exactly once by the left side of Equation 3.3. We must show that it is added exactly once by the right side. Assume that the outcome is in exactly $k$ of the sets. Then its probability is added $k$ times in the first term, subtracted $\binom{k}{2}$ times in the second, added $\binom{k}{3}$ times in the third, and so forth. Thus, the total number of times that it is added is
$$\binom{k}{1} - \binom{k}{2} + \binom{k}{3} - \cdots + (-1)^{k-1} \binom{k}{k}.$$
But
$$0 = (1 - 1)^k = \sum_{j=0}^{k} \binom{k}{j} (-1)^j = \binom{k}{0} - \sum_{j=1}^{k} \binom{k}{j} (-1)^{j-1}.$$
Hence,
$$1 = \binom{k}{0} = \sum_{j=1}^{k} \binom{k}{j} (-1)^{j-1}.$$
If the outcome is not in any of the events $A_i$, then it is not counted on either side of the equation. □

Hat Check Problem

Example 3.12 We return to the hat check problem discussed in Section 3.1, that is, the problem of finding the probability that a random permutation contains at least one fixed point. Recall that a permutation is a one-to-one map of a set $A = \{a_1, a_2, \ldots, a_n\}$ onto itself. Let $A_i$ be the event that the $i$th element $a_i$ remains fixed under this map. If we require that $a_i$ is fixed, then the map of the remaining $n - 1$ elements provides an arbitrary permutation of $(n - 1)$ objects. Since there are $(n - 1)!$ such permutations, $P(A_i) = (n - 1)!/n! = 1/n$. Since there are $n$ choices for $a_i$, the first term of Equation 3.3 is 1. In the same way, to have a particular pair $(a_i, a_j)$ fixed, we can choose any permutation of the remaining $n - 2$ elements; there are $(n - 2)!$ such choices and thus
$$P(A_i \cap A_j) = \frac{(n - 2)!}{n!} = \frac{1}{n(n - 1)}.$$
The number of terms of this form in the right side of Equation 3.3 is
$$\binom{n}{2} = \frac{n(n - 1)}{2!}.$$
Hence, the second term of Equation 3.3 is
$$-\frac{n(n - 1)}{2!} \cdot \frac{1}{n(n - 1)} = -\frac{1}{2!}.$$
Similarly, for any specific three events $A_i$, $A_j$, $A_k$,
$$P(A_i \cap A_j \cap A_k) = \frac{(n - 3)!}{n!} = \frac{1}{n(n - 1)(n - 2)},$$
and the number of such terms is
$$\binom{n}{3} = \frac{n(n - 1)(n - 2)}{3!},$$
making the third term of Equation 3.3 equal to 1/3!. Continuing in this way, we obtain
$$P(\text{at least one fixed point}) = 1 - \frac{1}{2!} + \frac{1}{3!} - \cdots + (-1)^{n-1} \frac{1}{n!}$$
and
$$P(\text{no fixed point}) = \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^n \frac{1}{n!}.$$

Table 3.9: Hat check problem.

 n    Probability that no one gets his own hat back
 3    .333333
 4    .375
 5    .366667
 6    .368056
 7    .367857
 8    .367882
 9    .367879
10    .367879

From calculus we learn that
$$e^x = 1 + x + \frac{1}{2!} x^2 + \frac{1}{3!} x^3 + \cdots + \frac{1}{n!} x^n + \cdots.$$
Thus, if $x = -1$, we have
$$e^{-1} = \frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^n}{n!} + \cdots = .3678794.$$
Therefore, the probability that there is no fixed point, i.e., that none of the $n$ people gets his own hat back, is equal to the sum of the first $n$ terms in the expression for $e^{-1}$. This series converges very fast. Calculating the partial sums for $n = 3$ to 10 gives the data in Table 3.9. After $n = 9$ the probabilities are essentially the same to six significant figures. Interestingly, the probability of no fixed point alternately increases and decreases as $n$ increases. Finally, we note that our exact results are in good agreement with our simulations reported in the previous section. □

Choosing a Sample Space

We now have some of the tools needed to accurately describe sample spaces and to assign probability functions to those sample spaces.
Nevertheless, in some cases, the description and assignment process is somewhat arbitrary. Of course, it is to be hoped that the description of the sample space and the subsequent assignment of a probability function will yield a model which accurately predicts what would happen if the experiment were actually carried out. As the following examples show, there are situations in which "reasonable" descriptions of the sample space do not produce a model which fits the data.

In Feller's book,[14] a pair of models is given which describe arrangements of certain kinds of elementary particles, such as photons and protons. It turns out that experiments have shown that certain types of elementary particles exhibit behavior which is accurately described by one model, called "Bose-Einstein statistics," while other types of elementary particles can be modelled using "Fermi-Dirac statistics." Feller says:

    We have here an instructive example of the impossibility of selecting or justifying probability models by a priori arguments. In fact, no pure reasoning could tell that photons and protons would not obey the same probability laws.

We now give some examples of this description and assignment process.

[14] W. Feller, Introduction to Probability Theory and Its Applications, vol. 1, 3rd ed. (New York: John Wiley and Sons, 1968), p. 41.

Example 3.13 In the quantum mechanical model of the helium atom, various parameters can be used to classify the energy states of the atom. In the triplet spin state ($S = 1$) with orbital angular momentum 1 ($L = 1$), there are three possibilities, 0, 1, or 2, for the total angular momentum ($J$). (It is not assumed that the reader knows what any of this means; in fact, the example is more illustrative if the reader does not know anything about quantum mechanics.)
We would like to assign probabilities to the three possibilities for $J$. The reader is undoubtedly resisting the idea of assigning the probability of 1/3 to each of these outcomes. She should now ask herself why she is resisting this assignment. The answer is probably because she does not have any "intuition" (i.e., experience) about the way in which helium atoms behave. In fact, in this example, the probabilities 1/9, 3/9, and 5/9 are assigned by the theory. The theory gives these assignments because these frequencies were observed in experiments and further parameters were developed in the theory to allow these frequencies to be predicted. □

Example 3.14 Suppose two pennies are flipped once each. There are several "reasonable" ways to describe the sample space. One way is to count the number of heads in the outcome; in this case, the sample space can be written $\{0, 1, 2\}$. Another description of the sample space is the set of all ordered pairs of $H$'s and $T$'s, i.e.,
$$\{(H, H), (H, T), (T, H), (T, T)\}.$$
Both of these descriptions are accurate ones, but it is easy to see that (at most) one of these, if assigned a constant probability function, can claim to accurately model reality. In this case, as opposed to the preceding example, the reader will probably say that the second description, with each outcome being assigned a probability of 1/4, is the "right" description. This conviction is due to experience; there is no proof that this is the way reality works. □

The reader is also referred to Exercise 26 for another example of this process.

Historical Remarks

The binomial coefficients have a long and colorful history leading up to Pascal's Treatise on the Arithmetical Triangle,[15] where Pascal developed many important properties of these numbers. This history is set forth in the book Pascal's Arithmetical Triangle by A. W. F. Edwards.[16] Pascal wrote his triangle in the form shown in Table 3.10.

[15] B. Pascal, Traité du Triangle Arithmétique (Paris: Desprez, 1665).

Table 3.10: Pascal's triangle.

1    1    1    1    1    1    1    1    1    1
1    2    3    4    5    6    7    8    9
1    3    6   10   15   21   28   36
1    4   10   20   35   56   84
1    5   15   35   70  126
1    6   21   56  126
1    7   28   84
1    8   36
1    9
1

Table 3.11: Figurate numbers.

natural numbers        1    2    3    4    5    6    7    8    9
triangular numbers     1    3    6   10   15   21   28   36   45
tetrahedral numbers    1    4   10   20   35   56   84  120  165

Edwards traces three different ways that the binomial coefficients arose. He refers to these as the figurate numbers, the combinatorial numbers, and the binomial numbers. They are all names for the same thing (which we have called binomial coefficients) but that they are all the same was not appreciated until the sixteenth century.

The figurate numbers date back to the Pythagorean interest in number patterns around 540 BC. The Pythagoreans considered, for example, triangular patterns shown in Figure 3.8. The sequence of numbers
$$1, 3, 6, 10, \ldots$$
obtained as the number of points in each triangle are called triangular numbers. From the triangles it is clear that the $n$th triangular number is simply the sum of the first $n$ integers. The tetrahedral numbers are the sums of the triangular numbers and were obtained by the Greek mathematicians Theon and Nicomachus at the beginning of the second century AD. The tetrahedral number 10, for example, has the geometric representation shown in Figure 3.9. The first three types of figurate numbers can be represented in tabular form as shown in Table 3.11.

These numbers provide the first four rows of Pascal's triangle, but the table was not to be completed in the West until the sixteenth century.

In the East, Hindu mathematicians began to encounter the binomial coefficients in combinatorial problems.
Bhaskara in his Lilavati of 1150 gave a rule to find the number of medicinal preparations using 1, 2, 3, 4, 5, or 6 possible ingredients.[17] His rule is equivalent to our formula
$$\binom{n}{r} = \frac{(n)_r}{r!}.$$

[16] A. W. F. Edwards, Pascal's Arithmetical Triangle (London: Griffin, 1987).

[Figure 3.8: Pythagorean triangular patterns (1, 3, 6, 10).]

[Figure 3.9: Geometric representation of the tetrahedral number 10.]

The binomial numbers as coefficients of $(a + b)^n$ appeared in the works of mathematicians in China around 1100. There are references about this time to "the tabulation system for unlocking binomial coefficients." The triangle to provide the coefficients up to the eighth power is given by Chu Shih-chieh in a book written around 1303 (see Figure 3.10).[18] The original manuscript of Chu's book has been lost, but copies have survived. Edwards notes that there is an error in this copy of Chu's triangle. Can you find it? (Hint: Two numbers which should be equal are not.) Other copies do not show this error.

The first appearance of Pascal's triangle in the West seems to have come from calculations of Tartaglia in calculating the number of possible ways that $n$ dice might turn up.[19] For one die the answer is clearly 6. For two dice the possibilities may be displayed as shown in Table 3.12.

Table 3.12: Outcomes for the roll of two dice.

11
12  22
13  23  33
14  24  34  44
15  25  35  45  55
16  26  36  46  56  66

Displaying them this way suggests the sixth triangular number $1 + 2 + 3 + 4 + 5 + 6 = 21$ for the throw of 2 dice. Tartaglia "on the first day of Lent, 1523, in Verona, having thought about the problem all night,"[20] realized that the extension of the figurate table gave the answers for $n$ dice.
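The counting behind this observation can be checked by brute force. The sketch below is an illustration only and not part of the historical account: it enumerates the outcomes of $n$ dice when the order of the dice is ignored and compares the count with the binomial coefficient $\binom{n+5}{n}$, which is the standard count of unordered selections with repetition from six faces. The function name unordered_dice_outcomes is introduced here for illustration.

```python
from itertools import combinations_with_replacement
from math import comb


def unordered_dice_outcomes(n: int) -> int:
    """Count the outcomes of n six-sided dice when the order of the dice is ignored."""
    return sum(1 for _ in combinations_with_replacement(range(1, 7), n))


for n in range(1, 5):
    # comb(n + 5, n) is the closed-form count of multisets of size n from 6 faces.
    print(n, unordered_dice_outcomes(n), comb(n + 5, n))
```

For two dice both columns give 21, the sixth triangular number, and for three dice they give 56, a tetrahedral number.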
The problem had suggested itself to Tartaglia from watching people casting their own horoscopes by means of a Book of Fortune, selecting verses by a process which included noting the numbers on the faces of three dice. The 56 ways that three dice can fall were set out on each page. The way the numbers were written in the book did not suggest the connection with figurate numbers, but a method of enumeration similar to the one we used for 2 dice does. Tartaglia's table was not published until 1556.

A table for the binomial coefficients was published in 1544 by the German mathematician Stifel.[21] Pascal's triangle appears also in Cardano's Opus novum of 1570.[22] Cardano was interested in the problem of finding the number of ways to choose $r$ objects out of $n$. Thus by the time of Pascal's work, his triangle had appeared as a result of looking at the figurate numbers, the combinatorial numbers, and the binomial numbers, and the fact that all three were the same was presumably pretty well understood.

[17] Ibid., p. 27.
[18] J. Needham, Science and Civilization in China, vol. 3 (New York: Cambridge University Press, 1959), p. 135.
[19] N. Tartaglia, General Trattato di Numeri et Misure (Vinegia, 1556).
[20] Quoted in Edwards, op. cit., p. 37.
[21] M. Stifel, Arithmetica Integra (Norimburgae, 1544).
[22] G. Cardano, Opus Novum de Proportionibus Numerorum (Basilea, 1570).

[Figure 3.10: Chu Shih-chieh's triangle. From J. Needham, Science and Civilization in China, vol. 3 (New York: Cambridge University Press, 1959), p. 135. Reprinted with permission.]

Pascal's interest in the binomial numbers came from his letters with Fermat concerning a problem known as the problem of points. This problem, and the correspondence between Pascal and Fermat, were discussed in Chapter 1.
The reader will recall that this problem can be described as follows: Two players A and B are playing a sequence of games and the first player to win $n$ games wins the match. It is desired to find the probability that A wins the match at a time when A has won $a$ games and B has won $b$ games. (See Exercises 4.1.40-4.1.42.)

Pascal solved the problem by backward induction, much the way we would do today in writing a computer program for its solution. He referred to the combinatorial method of Fermat which proceeds as follows: If A needs $c$ games and B needs $d$ games to win, we require that the players continue to play until they have played $c + d - 1$ games. The winner in this extended series will be the same as the winner in the original series. The probability that A wins in the extended series and hence in the original series is
$$\sum_{r=c}^{c+d-1} \frac{1}{2^{c+d-1}} \binom{c + d - 1}{r}.$$
Even at the time of the letters Pascal seemed to understand this formula.

Suppose that the first player to win $n$ games wins the match, and suppose that each player has put up a stake of $x$. Pascal studied the value of winning a particular game. By this he meant the increase in the expected winnings of the winner of the particular game under consideration. He showed that the value of the first game is
$$\frac{1 \cdot 3 \cdot 5 \cdot \ldots \cdot (2n - 1)}{2 \cdot 4 \cdot 6 \cdot \ldots \cdot (2n)}\, x.$$
His proof of this seems to use Fermat's formula and the fact that the above ratio of products of odd to products of even numbers is equal to the probability of exactly $n$ heads in $2n$ tosses of a coin. (See Exercise 39.)

Pascal presented Fermat with the table shown in Table 3.13. He states:

    You will see as always, that the value of the first game is equal to that of the second which is easily shown by combinations. You will see, in the same way, that the numbers in the first line are always increasing; so also are those in the second; and those in the third.
    But those in the fourth line are decreasing, and those in the fifth, etc. This seems odd.[23]

The student can pursue this question further using the computer and Pascal's backward iteration method for computing the expected payoff at any point in the series.

[23] F. N. David, op. cit., p. 235.

Table 3.13: Pascal's solution for the problem of points.

From my opponent's 256        if each one staken 256 in
positions I get, for the    6 games  5 games  4 games  3 games  2 games  1 game
1st game                       63       70       80       96      128     256
2nd game                       63       70       80       96      128
3rd game                       56       60       64       64
4th game                       42       40       32
5th game                       24       16
6th game                        8

In his treatise, Pascal gave a formal proof of Fermat's combinatorial formula as well as proofs of many other basic properties of binomial numbers. Many of his proofs involved induction and represent some of the first proofs by this method. His book brought together all the different aspects of the numbers in the Pascal triangle as known in 1654, and, as Edwards states, "That the Arithmetical Triangle should bear Pascal's name cannot be disputed."[24]

The first serious study of the binomial distribution was undertaken by James Bernoulli in his Ars Conjectandi published in 1713.[25] We shall return to this work in the historical remarks in Chapter 8.

[24] A. W. F. Edwards, op. cit., p. ix.
[25] J. Bernoulli, Ars Conjectandi (Basil: Thurnisiorum, 1713).

Exercises

1 Compute the following:
(a) $\binom{6}{3}$
(b) $b(5, .2, 4)$
(c) $\binom{7}{2}$
(d) $\binom{26}{26}$
(e) $b(4, .2, 3)$
(f) $\binom{6}{2}$
(g) $\binom{10}{9}$
(h) $b(8, .3, 5)$

2 In how many ways can we choose five people from a group of ten to form a committee?

3 How many seven-element subsets are there in a set of nine elements?

4 Using the relation Equation 3.1 write a program to compute Pascal's triangle, putting the results in a matrix. Have your program print the triangle for $n = 10$.
5 Use the program BinomialProbabilities to find the probability that, in 100 tosses of a fair coin, the number of heads that turns up lies between 35 and 65, between 40 and 60, and between 45 and 55.

6 Charles claims that he can distinguish between beer and ale 75 percent of the time. Ruth bets that he cannot and, in fact, just guesses. To settle this, a bet is made: Charles is to be given ten small glasses, each having been filled with beer or ale, chosen by tossing a fair coin. He wins the bet if he gets seven or more correct. Find the probability that Charles wins if he has the ability that he claims. Find the probability that Ruth wins if Charles is guessing.

7 Show that
$$b(n, p, j) = \frac{p}{q} \left( \frac{n - j + 1}{j} \right) b(n, p, j - 1),$$
for $j \ge 1$. Use this fact to determine the value or values of $j$ which give $b(n, p, j)$ its greatest value. Hint: Consider the successive ratios as $j$ increases.

8 A die is rolled 30 times. What is the probability that a 6 turns up exactly 5 times? What is the most probable number of times that a 6 will turn up?

9 Find integers $n$ and $r$ such that the following equation is true:
$$\binom{13}{5} + 2\binom{13}{6} + \binom{13}{7} = \binom{n}{r}.$$

10 In a ten-question true-false exam, find the probability that a student gets a grade of 70 percent or better by guessing. Answer the same question if the test has 30 questions, and if the test has 50 questions.

11 A restaurant offers apple and blueberry pies and stocks an equal number of each kind of pie. Each day ten customers request pie. They choose, with equal probabilities, one of the two kinds of pie. How many pieces of each kind of pie should the owner provide so that the probability is about .95 that each customer gets the pie of his or her own choice?

12 A poker hand is a set of 5 cards randomly chosen from a deck of 52 cards. Find the probability of a
(a) royal flush (ten, jack, queen, king, ace in a single suit).
(b) straight flush (five in a sequence in a single suit, but not a royal flush).
(c) four of a kind (four cards of the same face value).
(d) full house (one pair and one triple, each of the same face value).
(e) flush (five cards in a single suit but not a straight or royal flush).
(f) straight (five cards in a sequence, not all the same suit). (Note that in straights, an ace counts high or low.)

13 If a set has $2n$ elements, show that it has more subsets with $n$ elements than with any other number of elements.

14 Let $b(2n, .5, n)$ be the probability that in $2n$ tosses of a fair coin exactly $n$ heads turn up. Using Stirling's formula (Theorem 3.3), show that $b(2n, .5, n) \sim 1/\sqrt{\pi n}$. Use the program BinomialProbabilities to compare this with the exact value for $n = 10$ to 25.

15 A baseball player, Smith, has a batting average of .300 and in a typical game comes to bat three times. Assume that Smith's hits in a game can be considered to be a Bernoulli trials process with probability .3 for success. Find the probability that Smith gets 0, 1, 2, and 3 hits.

16 The Siwash University football team plays eight games in a season, winning three, losing three, and ending two in a tie. Show that the number of ways that this can happen is
$$\binom{8}{3}\binom{5}{3} = \frac{8!}{3!\,3!\,2!}.$$

17 Using the technique of Exercise 16, show that the number of ways that one can put $n$ different objects into three boxes with $a$ in the first, $b$ in the second, and $c$ in the third is $n!/(a!\,b!\,c!)$.

18 Baumgartner, Prosser, and Crowell are grading a calculus exam. There is a true-false question with ten parts. Baumgartner notices that one student has only two out of the ten correct and remarks, "The student was not even bright enough to have flipped a coin to determine his answers." "Not so clear," says Prosser.
"With 340 students I bet that if they all flipped coins to determine their answers there would be at least one exam with two or fewer answers correct." Crowell says, "I'm with Prosser. In fact, I bet that we should expect at least one exam in which no answer is correct if everyone is just guessing." Who is right in all of this?

19 A gin hand consists of 10 cards from a deck of 52 cards. Find the probability that a gin hand has
(a) all 10 cards of the same suit.
(b) exactly 4 cards in one suit and 3 in two other suits.
(c) a 4, 3, 2, 1, distribution of suits.

20 A six-card hand is dealt from an ordinary deck of cards. Find the probability that:
(a) All six cards are hearts.
(b) There are three aces, two kings, and one queen.
(c) There are three cards of one suit and three of another suit.

21 A lady wishes to color her fingernails on one hand using at most two of the colors red, yellow, and blue. How many ways can she do this?

22 How many ways can six indistinguishable letters be put in three mail boxes? Hint: One representation of this is given by a sequence |LL|L|LLL| where the |'s represent the partitions for the boxes and the L's the letters. Any possible way can be so described. Note that we need two bars at the ends and the remaining two bars and the six L's can be put in any order.

23 Using the method for the hint in Exercise 22, show that $r$ indistinguishable objects can be put in $n$ boxes in
$$\binom{n + r - 1}{n - 1} = \binom{n + r - 1}{r}$$
different ways.

24 A travel bureau estimates that when 20 tourists go to a resort with ten hotels they distribute themselves as if the bureau were putting 20 indistinguishable objects into ten distinguishable boxes. Assuming this model is correct, find the probability that no hotel is left vacant when the first group of 20 tourists arrives.

25 An elevator takes on six passengers and stops at ten floors.
We can assign two different equiprobable measures for the ways that the passengers are discharged: (a) we consider the passengers to be distinguishable or (b) we consider them to be indistinguishable (see Exercise 23 for this case). For each case, calculate the probability that all the passengers get off at different floors.

26 You are playing heads or tails with Prosser but you suspect that his coin is unfair. Von Neumann suggested that you proceed as follows: Toss Prosser's coin twice. If the outcome is HT call the result win. If it is TH call the result lose. If it is TT or HH ignore the outcome and toss Prosser's coin twice again. Keep going until you get either an HT or a TH and call the result win or lose in a single play. Repeat this procedure for each play. Assume that Prosser's coin turns up heads with probability $p$.
(a) Find the probability of HT, TH, HH, TT with two tosses of Prosser's coin.
(b) Using part (a), show that the probability of a win on any one play is 1/2, no matter what $p$ is.

27 John claims that he has extrasensory powers and can tell which of two symbols is on a card turned face down (see Example 3.11). To test his ability he is asked to do this for a sequence of trials. Let the null hypothesis be that he is just guessing, so that the probability is 1/2 of his getting it right each time, and let the alternative hypothesis be that he can name the symbol correctly more than half the time. Devise a test with the property that the probability of a type 1 error is less than .05 and the probability of a type 2 error is less than .05 if John can name the symbol correctly 75 percent of the time.

28 In Example 3.11 assume the alternative hypothesis is that $p = .8$ and that it is desired to have the probability of each type of error less than .01. Use the program PowerCurve to determine values of $n$ and $m$ that will achieve this.
Choose $n$ as small as possible.

29 A drug is assumed to be effective with an unknown probability $p$. To estimate $p$ the drug is given to $n$ patients. It is found to be effective for $m$ patients. The method of maximum likelihood for estimating $p$ states that we should choose the value for $p$ that gives the highest probability of getting what we got on the experiment. Assuming that the experiment can be considered as a Bernoulli trials process with probability $p$ for success, show that the maximum likelihood estimate for $p$ is the proportion $m/n$ of successes.

30 Recall that in the World Series the first team to win four games wins the series. The series can go at most seven games. Assume that the Red Sox and the Mets are playing the series. Assume that the Mets win each game with probability $p$. Fermat observed that even though the series might not go seven games, the probability that the Mets win the series is the same as the probability that they win four or more games in a series that was forced to go seven games no matter who wins the individual games.
(a) Using the program PowerCurve of Example 3.11 find the probability that the Mets win the series for the cases $p = .5$, $p = .6$, $p = .7$.
(b) Assume that the Mets have probability .6 of winning each game. Use the program PowerCurve to find a value of $n$ so that, if the series goes to the first team to win more than half the games, the Mets will have a 95 percent chance of winning the series. Choose $n$ as small as possible.

31 Each of the four engines on an airplane functions correctly on a given flight with probability .99, and the engines function independently of each other. Assume that the plane can make a safe landing if at least two of its engines are functioning correctly. What is the probability that the engines will allow for a safe landing?

32 A small boy is lost coming down Mount Washington.
The leader of the search team estimates that there is a probability $p$ that he came down on the east side and a probability $1 - p$ that he came down on the west side. He has $n$ people in his search team who will search independently and, if the boy is on the side being searched, each member will find the boy with probability $u$. Determine how he should divide the $n$ people into two groups to search the two sides of the mountain so that he will have the highest probability of finding the boy. How does this depend on $u$?

*33 $2n$ balls are chosen at random from a total of $2n$ red balls and $2n$ blue balls. Find a combinatorial expression for the probability that the chosen balls are equally divided in color. Use Stirling's formula to estimate this probability. Using BinomialProbabilities compare the exact value with Stirling's approximation for $n = 20$.

34 Assume that every time you buy a box of Wheaties, you receive one of the pictures of the $n$ players on the New York Yankees. Over a period of time, you buy $m \ge n$ boxes of Wheaties.
(a) Use Theorem 3.8 to show that the probability that you get all $n$ pictures is
$$1 - \binom{n}{1}\left(\frac{n - 1}{n}\right)^m + \binom{n}{2}\left(\frac{n - 2}{n}\right)^m - \cdots + (-1)^{n-1}\binom{n}{n - 1}\left(\frac{1}{n}\right)^m.$$
Hint: Let $E_k$ be the event that you do not get the $k$th player's picture.
(b) Write a computer program to compute this probability. Use this program to find, for given $n$, the smallest value of $m$ which will give probability $\ge .5$ of getting all $n$ pictures. Consider $n = 50$, 100, and 150 and show that $m = n \log n + n \log 2$ is a good estimate for the number of boxes needed. (For a derivation of this estimate, see Feller.[26])

*35 Prove the following binomial identity
$$\binom{2n}{n} = \sum_{j=0}^{n} \binom{n}{j}^2.$$
Hint: Consider an urn with $n$ red balls and $n$ blue balls inside. Show that each side of the equation equals the number of ways to choose $n$ balls from the urn.
36 Let $j$ and $n$ be positive integers, with $j \le n$. An experiment consists of choosing, at random, a $j$-tuple of positive integers whose sum is at most $n$.
(a) Find the size of the sample space. Hint: Consider $n$ indistinguishable balls placed in a row. Place $j$ markers between consecutive pairs of balls, with no two markers between the same pair of balls. (We also allow one of the $j$ markers to be placed at the end of the row of balls.) Show that there is a 1-1 correspondence between the set of possible positions for the markers and the set of $j$-tuples whose size we are trying to count.
(b) Find the probability that the $j$-tuple selected contains at least one 1.

37 Let $n \pmod m$ denote the remainder when the integer $n$ is divided by the integer $m$. Write a computer program to compute the numbers $\binom{n}{j} \pmod m$ where $\binom{n}{j}$ is a binomial coefficient and $m$ is an integer. You can do this by using the recursion relations for generating binomial coefficients, doing all the arithmetic using the basic function mod($n, m$). Try to write your program to make as large a table as possible. Run your program for the cases $m = 2$ to 7. Do you see any patterns? In particular, for the case $m = 2$ and $n$ a power of 2, verify that all the entries in the $(n - 1)$st row are 1. (The corresponding binomial numbers are odd.) Use your pictures to explain why this is true.

[26] W. Feller, Introduction to Probability Theory and its Applications, vol. I, 3rd ed. (New York: John Wiley & Sons, 1968), p. 106.

38 Lucas [27] proved the following general result relating to Exercise 37. If $p$ is any prime number, then $\binom{n}{j} \pmod p$ can be found as follows: Expand $n$ and $j$ in base $p$ as $n = s_0 + s_1 p + s_2 p^2 + \cdots + s_k p^k$ and $j = r_0 + r_1 p + r_2 p^2 + \cdots + r_k p^k$, respectively. (Here $k$ is chosen large enough to represent all numbers from 0 to $n$ in base $p$ using $k$ digits.)
Let $s = (s_0, s_1, s_2, \ldots, s_k)$ and $r = (r_0, r_1, r_2, \ldots, r_k)$. Then
$$\binom{n}{j} \pmod p = \prod_{i=0}^{k} \binom{s_i}{r_i} \pmod p .$$
For example, if $p = 7$, $n = 12$, and $j = 9$, then
$$12 = 5 \cdot 7^0 + 1 \cdot 7^1 , \qquad 9 = 2 \cdot 7^0 + 1 \cdot 7^1 ,$$
so that $s = (5, 1)$, $r = (2, 1)$, and this result states that
$$\binom{12}{9} \pmod 7 = \binom{5}{2}\binom{1}{1} \pmod 7 .$$
Since $\binom{12}{9} = 220 = 3 \pmod 7$, and $\binom{5}{2} = 10 = 3 \pmod 7$, we see that the result is correct for this example. Show that this result implies that, for $p = 2$, the $(p^k - 1)$st row of your triangle in Exercise 37 has no zeros.

39 Prove that the probability of exactly $n$ heads in $2n$ tosses of a fair coin is given by the product of the odd numbers up to $2n - 1$ divided by the product of the even numbers up to $2n$.

40 Let $n$ be a positive integer, and assume that $j$ is a positive integer not exceeding $n/2$. Show that in Theorem 3.5, if one alternates the multiplications and divisions, then all of the intermediate values in the calculation are integers. Show also that none of these intermediate values exceed the final value.

[27] E. Lucas, "Théorie des Fonctions Numériques Simplement Périodiques," American J. Math., vol. 1 (1878), pp. 184-240, 289-321.

3.3 Card Shuffling

Much of this section is based upon an article by Brad Mann [28], which is an exposition of an article by David Bayer and Persi Diaconis [29].

Riffle Shuffles

Given a deck of $n$ cards, how many times must we shuffle it to make it "random"? Of course, the answer depends upon the method of shuffling which is used and what we mean by "random." We shall begin the study of this question by considering a standard model for the riffle shuffle. We begin with a deck of $n$ cards, which we will assume are labelled in increasing order with the integers from 1 to $n$. A riffle shuffle consists of a cut of the deck into two stacks and an interleaving of the two stacks.
For example, if $n = 6$, the initial ordering is $(1, 2, 3, 4, 5, 6)$, and a cut might occur between cards 2 and 3. This gives rise to two stacks, namely $(1, 2)$ and $(3, 4, 5, 6)$. These are interleaved to form a new ordering of the deck. For example, these two stacks might form the ordering $(1, 3, 4, 2, 5, 6)$. In order to discuss such shuffles, we need to assign a probability distribution to the set of all possible shuffles. There are several reasonable ways in which this can be done. We will give several different assignment strategies, and show that they are equivalent. (This does not mean that this assignment is the only reasonable one.) First, we assign the binomial probability $b(n, 1/2, k)$ to the event that the cut occurs after the $k$th card. Next, we assume that all possible interleavings, given a cut, are equally likely. Thus, to complete the assignment of probabilities, we need to determine the number of possible interleavings of two stacks of cards, with $k$ and $n - k$ cards, respectively.

We begin by writing the second stack in a line, with spaces in between each pair of consecutive cards, and with spaces at the beginning and end (so there are $n - k + 1$ spaces). We choose, with replacement, $k$ of these spaces, and place the cards from the first stack in the chosen spaces. This can be done in $\binom{n}{k}$ ways. Thus, the probability of a given interleaving should be
$$\frac{1}{\binom{n}{k}} .$$

Next, we note that if the new ordering is not the identity ordering, it is the result of a unique cut-interleaving pair. If the new ordering is the identity, it is the result of any one of $n + 1$ cut-interleaving pairs.

We define a rising sequence in an ordering to be a maximal subsequence of consecutive integers in increasing order. For example, in the ordering
$$(2, 3, 5, 1, 4, 7, 6) ,$$
there are 4 rising sequences; they are $(1)$, $(2, 3, 4)$, $(5, 6)$, and $(7)$. It is easy to see that an ordering is the result of a riffle shuffle applied to the identity ordering if and only if it has no more than two rising sequences. (If the ordering has two rising sequences, then these rising sequences correspond to the two stacks induced by the cut, and if the ordering has one rising sequence, then it is the identity ordering.) Thus, the sample space of orderings obtained by applying a riffle shuffle to the identity ordering is naturally described as the set of all orderings with at most two rising sequences.

It is now easy to assign a probability distribution to this sample space. Each ordering with two rising sequences is assigned the value
$$\frac{b(n, 1/2, k)}{\binom{n}{k}} = \frac{1}{2^n} ,$$
and the identity ordering is assigned the value
$$\frac{n + 1}{2^n} .$$

[28] B. Mann, "How Many Times Should You Shuffle a Deck of Cards?", UMAP Journal, vol. 15, no. 4 (1994), pp. 303-331.
[29] D. Bayer and P. Diaconis, "Trailing the Dovetail Shuffle to its Lair," Annals of Applied Probability, vol. 2, no. 2 (1992), pp. 294-313.

There is another way to view a riffle shuffle. We can imagine starting with a deck cut into two stacks as before, with the same probability assignment as before, i.e., the binomial distribution. Once we have the two stacks, we take cards, one by one, off of the bottom of the two stacks, and place them onto one stack. If there are $k_1$ and $k_2$ cards, respectively, in the two stacks at some point in this process, then we make the assumption that the probability that the next card to be taken comes from a given stack is proportional to the current stack size. This implies that the probability that we take the next card from the first stack equals
$$\frac{k_1}{k_1 + k_2} ,$$
and the corresponding probability for the second stack is
$$\frac{k_2}{k_1 + k_2} .$$
We shall now show that this process assigns the uniform probability to each of the possible interleavings of the two stacks.
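As an illustration of the card-by-card process described above, here is a minimal Python sketch. It is not part of the original text: the function names `riffle_shuffle` and `rising_sequences`, and the choice to store a deck as a list with the top card first, are assumptions made for this example.

```python
import random

def riffle_shuffle(deck, rng=random):
    """Apply one riffle shuffle to `deck` (a list, top card first)."""
    n = len(deck)
    # Cut after the k-th card, where k has the binomial distribution b(n, 1/2, k).
    k = sum(1 for _ in range(n) if rng.random() < 0.5)
    first, second = deck[:k], deck[k:]
    k1, k2 = len(first), len(second)    # cards still left in each stack
    dropped = []                        # the new stack, built from the bottom up
    while k1 + k2 > 0:
        # The next card comes from the first stack with probability k1 / (k1 + k2).
        if rng.random() * (k1 + k2) < k1:
            k1 -= 1
            dropped.append(first[k1])   # current bottom card of the first stack
        else:
            k2 -= 1
            dropped.append(second[k2])  # current bottom card of the second stack
    dropped.reverse()                   # report the new deck top card first
    return dropped

def rising_sequences(order):
    """Count the rising sequences in an ordering of 1, 2, ..., n."""
    position = {card: idx for idx, card in enumerate(order)}
    # A new rising sequence starts at c + 1 whenever c + 1 sits to the left of c.
    breaks = sum(1 for c in range(1, len(order)) if position[c + 1] < position[c])
    return breaks + 1

if __name__ == "__main__":
    shuffled = riffle_shuffle(list(range(1, 53)))
    print(shuffled)
    print(rising_sequences(shuffled))   # 1 or 2 after a single riffle shuffle
```

Applied to the ordering $(2, 3, 5, 1, 4, 7, 6)$ from the text, `rising_sequences` returns 4, in agreement with the count given there.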
Suppose, for example, that an interleaving came about as the result of choosing cards from the two stacks in some order. The probability that this result occurred is the product of the probabilities at each point in the process, since the choice of card at each point is assumed to be independent of the previous choices. Each factor of this product is of the form
$$\frac{k_i}{k_1 + k_2} ,$$
where $i = 1$ or 2, and the denominator of each factor equals the number of cards left to be chosen. Thus, the denominator of the probability is just $n!$. At the moment when a card is chosen from a stack that has $i$ cards in it, the numerator of the corresponding factor in the probability is $i$, and the number of cards in this stack decreases by 1. Thus, the numerator is seen to be $k!(n - k)!$, since all cards in both stacks are eventually chosen. Since $k!(n - k)!/n! = 1/\binom{n}{k}$, this process assigns the probability
$$\frac{1}{\binom{n}{k}}$$
to each possible interleaving.

We now turn to the question of what happens when we riffle shuffle $s$ times. It should be clear that if we start with the identity ordering, we obtain an ordering with at most $2^s$ rising sequences, since a riffle shuffle creates at most two rising sequences from every rising sequence in the starting ordering. In fact, it is not hard to see that each such ordering is the result of $s$ riffle shuffles. The question becomes, then, in how many ways can an ordering with $r$ rising sequences come about by applying $s$ riffle shuffles to the identity ordering? In order to answer this question, we turn to the idea of an $a$-shuffle.

$a$-Shuffles

There are several ways to visualize an $a$-shuffle. One way is to imagine a creature with $a$ hands who is given a deck of cards to riffle shuffle. The creature naturally cuts the deck into $a$ stacks, and then riffles them together. (Imagine that!) Thus, the ordinary riffle shuffle is a 2-shuffle.
As in the case of the ordinary 2-shuffle, we allow some of the stacks to have 0 cards. Another way to visualize an $a$-shuffle is to think about its inverse, called an $a$-unshuffle. This idea is described in the proof of the next theorem.

We will now show that an $a$-shuffle followed by a $b$-shuffle is equivalent to an $ab$-shuffle. This means, in particular, that $s$ riffle shuffles in succession are equivalent to one $2^s$-shuffle. This equivalence is made precise by the following theorem.

Theorem 3.9 Let $a$ and $b$ be two positive integers. Let $S_{a,b}$ be the set of all ordered pairs in which the first entry is an $a$-shuffle and the second entry is a $b$-shuffle. Let $S_{ab}$ be the set of all $ab$-shuffles. Then there is a 1-1 correspondence between $S_{a,b}$ and $S_{ab}$ with the following property. Suppose that $(T_1, T_2)$ corresponds to $T_3$. If $T_1$ is applied to the identity ordering, and $T_2$ is applied to the resulting ordering, then the final ordering is the same as the ordering that is obtained by applying $T_3$ to the identity ordering.

Proof. The easiest way to describe the required correspondence is through the idea of an unshuffle. An $a$-unshuffle begins with a deck of $n$ cards. One by one, cards are taken from the top of the deck and placed, with equal probability, on the bottom of any one of $a$ stacks, where the stacks are labelled from 0 to $a - 1$. After all of the cards have been distributed, we combine the stacks to form one stack by placing stack $i$ on top of stack $i + 1$, for $0 \le i \le a - 2$. It is easy to see that if one starts with a deck, there is exactly one way to cut the deck to obtain the $a$ stacks generated by the $a$-unshuffle, and with these $a$ stacks, there is exactly one way to interleave them to obtain the deck in the order that it was in before the unshuffle was performed. Thus, this $a$-unshuffle corresponds to a unique $a$-shuffle, and this $a$-shuffle is the inverse of the original $a$-unshuffle.
If we apply an $ab$-unshuffle $U_3$ to a deck, we obtain a set of $ab$ stacks, which are then combined, in order, to form one stack. We label these stacks with ordered pairs of integers, where the first coordinate is between 0 and $a - 1$, and the second coordinate is between 0 and $b - 1$. Then we label each card with the label of its stack. The number of possible labels is $ab$, as required. Using this labelling, we can describe how to find a $b$-unshuffle and an $a$-unshuffle, such that if these two unshuffles are applied in this order to the deck, we obtain the same set of $ab$ stacks as were obtained by the $ab$-unshuffle.

To obtain the $b$-unshuffle $U_2$, we sort the deck into $b$ stacks, with the $i$th stack containing all of the cards with second coordinate $i$, for $0 \le i \le b - 1$. Then these stacks are combined to form one stack. The $a$-unshuffle $U_1$ proceeds in the same manner, except that the first coordinates of the labels are used. The resulting $a$ stacks are then combined to form one stack.

The above description shows that the cards ending up on top are all those labelled $(0, 0)$. These are followed by those labelled $(0, 1), (0, 2), \ldots, (0, b - 1), (1, 0), (1, 1), \ldots, (a - 1, b - 1)$. Furthermore, the relative order of any pair of cards with the same labels is never altered. But this is exactly the same as an $ab$-unshuffle, if, at the beginning of such an unshuffle, we label each of the cards with one of the labels $(0, 0), (0, 1), \ldots, (0, b - 1), (1, 0), (1, 1), \ldots, (a - 1, b - 1)$. This completes the proof. □

In Figure 3.11, we show the labels for a 2-unshuffle of a deck with 10 cards. There are 4 cards with the label 0 and 6 cards with the label 1, so if the 2-unshuffle is performed, the first stack will have 4 cards and the second stack will have 6 cards. When this unshuffle is performed, the deck ends up in the identity ordering.
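The unshuffle construction used in the proof of Theorem 3.9 amounts to a stable sort of the cards by their labels. The sketch below is an illustration only: the function names and the representation of a deck as a list of (card, label) pairs, top card first, are assumptions of this example. It checks on a randomly labelled deck that a $b$-unshuffle on the second coordinates followed by an $a$-unshuffle on the first coordinates gives the same deck as a single $ab$-unshuffle on the pairs taken in lexicographic order.

```python
import random

def unshuffle(cards, key):
    """Unshuffle a deck given as a list of (card, label) pairs, top card first.

    Each card goes, in deck order, to the bottom of the stack numbered
    key(label); the stacks are then combined with smaller numbers on top.
    This is exactly a stable sort by stack number, and sorted() is stable.
    """
    return sorted(cards, key=lambda pair: key(pair[1]))

def two_step_unshuffle(cards):
    """b-unshuffle on second coordinates, then a-unshuffle on first coordinates."""
    after_b = unshuffle(cards, key=lambda label: label[1])
    return unshuffle(after_b, key=lambda label: label[0])

def one_step_unshuffle(cards):
    """A single ab-unshuffle; tuples compare lexicographically in Python."""
    return unshuffle(cards, key=lambda label: label)

if __name__ == "__main__":
    rng = random.Random(0)              # arbitrary seed, for repeatability
    a, b, n = 2, 2, 10
    deck = list(range(1, n + 1))
    rng.shuffle(deck)
    cards = [(card, (rng.randrange(a), rng.randrange(b))) for card in deck]
    print(two_step_unshuffle(cards) == one_step_unshuffle(cards))  # prints True
```

The equality holds for every labelling, not just the random one drawn here, because sorting stably by the second coordinate and then by the first is the same as sorting once by the pair.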
In Figure 3.12, we show the labels for a 4-unshuffle of the same deck (because there are four labels being used). This figure can also be regarded as an example of a pair of 2-unshuffles, as described in the proof above. The first 2-unshuffle will use the second coordinate of the labels to determine the stacks. In this case, the two stacks contain the cards whose values are
$$\{5, 1, 6, 2, 7\} \quad \text{and} \quad \{8, 9, 3, 4, 10\} .$$
After this 2-unshuffle has been performed, the deck is in the order shown in Figure 3.11, as the reader should check. If we wish to perform a 4-unshuffle on the deck, using the labels shown, we sort the cards lexicographically, obtaining the four stacks
$$\{1, 2\}, \ \{3, 4\}, \ \{5, 6, 7\}, \ \text{and} \ \{8, 9, 10\} .$$
When these stacks are combined, we once again obtain the identity ordering of the deck. The point of the above theorem is that both sorting procedures always lead to the same initial ordering.

Figure 3.11: Before a 2-unshuffle.

Figure 3.12: Before a 4-unshuffle.

Theorem 3.10 If $D$ is any ordering that is the result of applying an $a$-shuffle and then a $b$-shuffle to the identity ordering, then the probability assigned to $D$ by this pair of operations is the same as the probability assigned to $D$ by the process of applying an $ab$-shuffle to the identity ordering.

Proof. Call the sample space of $a$-shuffles $S_a$. If we label the stacks by the integers from 0 to $a - 1$, then each cut-interleaving pair, i.e., shuffle, corresponds to exactly one $n$-digit base $a$ integer, where the $i$th digit in the integer is the stack of which the $i$th card is a member. Thus, the number of cut-interleaving pairs is equal to the number of $n$-digit base $a$ integers, which is $a^n$. Of course, not all of these pairs lead to different orderings. The number of pairs leading to a given ordering will be discussed later.
For our purposes it is enough to point out that it is the cut-interleaving pairs that determine the probability assignment. The previous theorem shows that there is a 1-1 correspondence between $S_{a,b}$ and $S_{ab}$. Furthermore, corresponding elements give the same ordering when applied to the identity ordering. Given any ordering $D$, let $m_1$ be the number of elements of $S_{a,b}$ which, when applied to the identity ordering, result in $D$. Let $m_2$ be the number of elements of $S_{ab}$ which, when applied to the identity ordering, result in $D$. The previous theorem implies that $m_1 = m_2$. Thus, both sets assign the probability
$$\frac{m_1}{(ab)^n}$$
to $D$. This completes the proof. □

Connection with the Birthday Problem

There is another point that can be made concerning the labels given to the cards by the successive unshuffles. Suppose that we 2-unshuffle an $n$-card deck until the labels on the cards are all different. It is easy to see that this process produces each permutation with the same probability, i.e., this is a random process. To see this, note that if the labels become distinct on the $s$th 2-unshuffle, then one can think of this sequence of 2-unshuffles as one $2^s$-unshuffle, in which all of the stacks determined by the unshuffle have at most one card in them (remember, the stacks correspond to the labels). If each stack has at most one card in it, then given any two cards in the deck, it is equally likely that the first card has a lower or a higher label than the second card. Thus, each possible ordering is equally likely to result from this $2^s$-unshuffle.

Let $T$ be the random variable that counts the number of 2-unshuffles until all labels are distinct. One can think of $T$ as giving a measure of how long it takes in the unshuffling process until randomness is reached. Since shuffling and unshuffling are inverse processes, $T$ also measures the number of shuffles necessary to achieve randomness.
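The labelling process just described is easy to simulate. The following sketch is illustrative only; the function name `unshuffles_until_distinct`, the number of trials, and the seed are assumptions of this example. Each 2-unshuffle attaches one more random binary digit to every card's label, and the function returns the number of rounds needed before all $n$ labels differ, which is one sample of $T$.

```python
import random

def unshuffles_until_distinct(n, rng=random):
    """Sample T: the number of 2-unshuffles until all n labels are distinct."""
    labels = [0] * n          # every card starts with the same (empty) label
    rounds = 0
    while len(set(labels)) < n:
        # One 2-unshuffle gives each card a fresh, independent binary digit.
        # Only distinctness matters here, so the digit's position is irrelevant.
        labels = [2 * label + rng.randrange(2) for label in labels]
        rounds += 1
    return rounds

if __name__ == "__main__":
    rng = random.Random(2006)  # arbitrary seed, for repeatability
    trials = 2000
    samples = [unshuffles_until_distinct(52, rng) for _ in range(trials)]
    print("smallest T observed:", min(samples))
    print("average T observed: ", sum(samples) / trials)
```

Since $s$ binary digits give only $2^s$ possible labels, no sample for a 52-card deck can be smaller than 6.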
Suppose that we have an $n$-card deck, and we ask for $P(T \le s)$. This equals $1 - P(T > s)$. But $T > s$ if and only if it is the case that not all of the labels after $s$ 2-unshuffles are distinct. This is just the birthday problem; we are asking for the probability that at least two people have the same birthday, given that we have $n$ people and there are $2^s$ possible birthdays. Using our formula from Example 3.3, we find that
$$P(T > s) = 1 - \binom{2^s}{n} \frac{n!}{2^{sn}} . \qquad (3.4)$$
In Chapter 6, we will define the average value of a random variable. Using this idea, and the above equation, one can calculate the average value of the random variable $T$ (see Exercise 6.1.41). For example, if $n = 52$, then the average value of $T$ is about 11.7. This means that, on the average, about 12 riffle shuffles are needed for the process to be considered random.

Cut-Interleaving Pairs and Orderings

As was noted in the proof of Theorem 3.10, not all of the cut-interleaving pairs lead to different orderings. However, there is an easy formula which gives the number of such pairs that lead to a given ordering.

Theorem 3.11 If an ordering of length $n$ has $r$ rising sequences, then the number of cut-interleaving pairs under an $a$-shuffle of the identity ordering which lead to the ordering is
$$\binom{n + a - r}{n} .$$

Proof. To see why this is true, we need to count the number of ways in which the cut in an $a$-shuffle can be performed which will lead to a given ordering with $r$ rising sequences. We can disregard the interleavings, since once a cut has been made, at most one interleaving will lead to a given ordering. Since the given ordering has $r$ rising sequences, $r - 1$ of the division points in the cut are determined. The remaining $a - 1 - (r - 1) = a - r$ division points can be placed anywhere.
The number of places to put these remaining division points is $n + 1$ (which is the number of spaces between the consecutive pairs of cards, including the positions at the beginning and the end of the deck). These places are chosen with repetition allowed, so the number of ways to make these choices is
$$\binom{n + a - r}{a - r} = \binom{n + a - r}{n} .$$
In particular, this means that if $D$ is an ordering that is the result of applying an $a$-shuffle to the identity ordering, and if $D$ has $r$ rising sequences, then the probability assigned to $D$ by this process is
$$\frac{\binom{n + a - r}{n}}{a^n} .$$
This completes the proof. □

The above theorem shows that the essential information about the probability assigned to an ordering under an $a$-shuffle is just the number of rising sequences in the ordering. Thus, if we determine the number of orderings which contain exactly $r$ rising sequences, for each $r$ between 1 and $n$, then we will have determined the distribution function of the random variable which consists of applying a random $a$-shuffle to the identity ordering.

The number of orderings of $\{1, 2, \ldots, n\}$ with $r$ rising sequences is denoted by $A(n, r)$, and is called an Eulerian number. There are many ways to calculate the values of these numbers; the following theorem gives one recursive method which follows immediately from what we already know about $a$-shuffles.

Theorem 3.12 Let $a$ and $n$ be positive integers. Then
$$a^n = \sum_{r=1}^{a} \binom{n + a - r}{n} A(n, r) . \qquad (3.5)$$
Thus,
$$A(n, a) = a^n - \sum_{r=1}^{a-1} \binom{n + a - r}{n} A(n, r) .$$
In addition,
$$A(n, 1) = 1 .$$

Proof. The second equation can be used to calculate the values of the Eulerian numbers, and follows immediately from Equation 3.5. The last equation is a consequence of the fact that the only ordering of $\{1, 2, \ldots, n\}$ with one rising sequence is the identity ordering. Thus, it remains to prove Equation 3.5.
We will count the set of $a$-shuffles of a deck with $n$ cards in two ways. First, we know that there are $a^n$ such shuffles (this was noted in the proof of Theorem 3.10). But there are $A(n, r)$ orderings of $\{1, 2, \ldots, n\}$ with $r$ rising sequences, and Theorem 3.11 states that for each such ordering, there are exactly
$$\binom{n + a - r}{n}$$
cut-interleaving pairs that lead to the ordering. Therefore, the right-hand side of Equation 3.5 counts the set of $a$-shuffles of an $n$-card deck. This completes the proof. □

Random Orderings and Random Processes

We now turn to the second question that was asked at the beginning of this section: What do we mean by a "random" ordering? It is somewhat misleading to think about a given ordering as being random or not random. If we want to choose a random ordering from the set of all orderings of $\{1, 2, \ldots, n\}$, we mean that we want every ordering to be chosen with the same probability, i.e., any ordering is as "random" as any other.

The word "random" should really be used to describe a process. We will say that a process that produces an object from a (finite) set of objects is a random process if each object in the set is produced with the same probability by the process. In the present situation, the objects are the orderings, and the process which produces these objects is the shuffling process. It is easy to see that no $a$-shuffle is really a random process, since if $T_1$ and $T_2$ are two orderings with a different number of rising sequences, then they are produced by an $a$-shuffle, applied to the identity ordering, with different probabilities.

Variation Distance

Instead of requiring that a sequence of shuffles yield a process which is random, we will define a measure that describes how far away a given process is from a random process.
Let $X$ be any process which produces an ordering of $\{1, 2, \ldots, n\}$. Define $f_X(\pi)$ to be the probability that $X$ produces the ordering $\pi$. (Thus, $X$ can be thought of as a random variable with distribution function $f$.) Let $\Omega_n$ be the set of all orderings of $\{1, 2, \ldots, n\}$. Finally, let $u(\pi) = 1/|\Omega_n|$ for all $\pi \in \Omega_n$. The function $u$ is the distribution function of a process which produces orderings and which is random. For each ordering $\pi \in \Omega_n$, the quantity
$$|f_X(\pi) - u(\pi)|$$
is the difference between the actual and desired probabilities that $X$ produces $\pi$. If we sum this over all orderings $\pi$ and call this sum $S$, we see that $S = 0$ if and only if $X$ is random, and otherwise $S$ is positive. It is easy to show that the maximum value of $S$ is 2, so we will multiply the sum by 1/2 so that the value falls in the interval $[0, 1]$. Thus, we obtain the following sum as the formula for the variation distance between the two processes:
$$\| f_X - u \| = \frac{1}{2} \sum_{\pi \in \Omega_n} |f_X(\pi) - u(\pi)| .$$

Now we apply this idea to the case of shuffling. We let $X$ be the process of $s$ successive riffle shuffles applied to the identity ordering. We know that it is also possible to think of $X$ as one $2^s$-shuffle. We also know that $f_X$ is constant on the set of all orderings with $r$ rising sequences, where $r$ is any positive integer. Finally, we know the value of $f_X$ on an ordering with $r$ rising sequences, and we know how many such orderings there are. Thus, in this specific case, we have
$$\| f_X - u \| = \frac{1}{2} \sum_{r=1}^{n} A(n, r) \left| \binom{2^s + n - r}{n} \Big/ 2^{ns} - \frac{1}{n!} \right| .$$
Since this sum has only $n$ summands, it is easy to compute this for moderate sized values of $n$. For $n = 52$, we obtain the list of values given in Table 3.14. To help in understanding these data, they are shown in graphical form in Figure 3.13. The program VariationList produces the data shown in both Table 3.14 and Figure 3.13.
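The sum over rising sequences can be evaluated exactly with integer and rational arithmetic. The sketch below is not the book's VariationList program, whose source is not reproduced here; the function names `eulerian_numbers` and `variation_distance` are assumptions of this example. It builds the Eulerian numbers from the recursion $A(n, a) = a^n - \sum_{r=1}^{a-1} \binom{n+a-r}{n} A(n, r)$ of Theorem 3.12 and then evaluates the variation distance after $s$ riffle shuffles.

```python
from fractions import Fraction
from math import comb, factorial

def eulerian_numbers(n):
    """Return a list A with A[r] = A(n, r) for 1 <= r <= n (A[0] is unused)."""
    A = [0] * (n + 1)
    for a in range(1, n + 1):
        # Recursion of Theorem 3.12; the empty sum for a = 1 gives A(n, 1) = 1.
        A[a] = a ** n - sum(comb(n + a - r, n) * A[r] for r in range(1, a))
    return A

def variation_distance(n, s):
    """Exact variation distance between s riffle shuffles of n cards and uniform."""
    A = eulerian_numbers(n)
    a = 2 ** s                              # s riffle shuffles act as one 2^s-shuffle
    uniform = Fraction(1, factorial(n))
    total = Fraction(0)
    for r in range(1, n + 1):
        # Probability of one particular ordering with r rising sequences;
        # comb() returns 0 when 2^s + n - r < n, i.e. when r > 2^s.
        f = Fraction(comb(a + n - r, n), a ** n)
        total += A[r] * abs(f - uniform)
    return total / 2

if __name__ == "__main__":
    # The printed values can be compared with Table 3.14.
    for s in range(1, 15):
        print(s, float(variation_distance(52, s)))
```

Exact fractions are used because the individual terms are tiny compared with the Eulerian numbers that multiply them, so floating-point evaluation of each term could lose accuracy.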
One sees that until 5 shuffles have occurred, the output of $X$ is very far from random. After 5 shuffles, the distance from the random process is essentially halved each time a shuffle occurs.

Number of Riffle Shuffles    Variation Distance
 1                           1
 2                           1
 3                           1
 4                           0.9999995334
 5                           0.9237329294
 6                           0.6135495966
 7                           0.3340609995
 8                           0.1671586419
 9                           0.0854201934
10                           0.0429455489
11                           0.0215023760
12                           0.0107548935
13                           0.0053779101
14                           0.0026890130

Table 3.14: Distance to the random process.

Figure 3.13: Distance to the random process.

Given the distribution functions $f_X(\pi)$ and $u(\pi)$ as above, there is another way to view the variation distance $\| f_X - u \|$. Given any event $T$ (which is a subset of $S_n$, the set of all permutations of $\{1, 2, \ldots, n\}$), we can calculate its probability under the process $X$ and under the uniform process. For example, we can imagine that $T$ represents the set of all permutations in which the first player in a 7-player poker game is dealt a straight flush (five consecutive cards in the same suit). It is interesting to consider how much the probability of this event after a certain number of shuffles differs from the probability of this event if all permutations are equally likely. This difference can be thought of as describing how close the process $X$ is to the random process with respect to the event $T$.

Now consider the event $T$ such that the absolute value of the difference between these two probabilities is as large as possible. It can be shown that this absolute value is the variation distance between the process $X$ and the uniform process. (The reader is asked to prove this fact in Exercise 4.)

We have just seen that, for a deck of 52 cards, the variation distance between the 7-riffle shuffle process and the random process is about .334.
It is of interest to find an event $T$ such that the difference between the probabilities that the two processes produce $T$ is close to .334. An event with this property can be described in terms of the game called New-Age Solitaire.

New-Age Solitaire

This game was invented by Peter Doyle. It is played with a standard 52-card deck. We deal the cards face up, one at a time, onto a discard pile. If an ace is encountered, say the ace of Hearts, we use it to start a Heart pile. Each suit pile must be built up in order, from ace to king, using only subsequently dealt cards. Once we have dealt all of the cards, we pick up the discard pile and continue. We define the Yin suits to be Hearts and Clubs, and the Yang suits to be Diamonds and Spades. The game ends when either both Yin suit piles have been completed, or both Yang suit piles have been completed. It is clear that if the ordering of the deck is produced by the random process, then the probability that the Yin suit piles are completed first is exactly 1/2.

Now suppose that we buy a new deck of cards, break the seal on the package, and riffle shuffle the deck 7 times. If one tries this, one finds that the Yin suits win about 75% of the time. This is 25 percentage points more than we would get if the deck were in truly random order. This deviation is reasonably close to the theoretical maximum of 33.4% obtained above.

Why do the Yin suits win so often? In a brand new deck of cards, the suits are in the following order, from top to bottom: ace through king of Hearts, ace through king of Clubs, king through ace of Diamonds, and king through ace of Spades. Note that if the cards were not shuffled at all, then the Yin suit piles would be completed on the first pass, before any Yang suit cards are even seen. If we were to continue playing the game until the Yang suit piles are completed, it would take 13 passes through the deck to do this. Thus, one can see that in a new deck, the Yin suits are in the most advantageous order and the Yang suits are in the least advantageous order. Under 7 riffle shuffles, the relative advantage of the Yin suits over the Yang suits is preserved to a certain extent.

Exercises

1 Given any ordering $\sigma$ of $\{1, 2, \ldots, n\}$, we can define $\sigma^{-1}$, the inverse ordering of $\sigma$, to be the ordering in which the $i$th element is the position occupied by $i$ in $\sigma$. For example, if $\sigma = (1, 3, 5, 2, 4, 7, 6)$, then $\sigma^{-1} = (1, 4, 2, 5, 3, 7, 6)$. (If one thinks of these orderings as permutations, then $\sigma^{-1}$ is the inverse of $\sigma$.) A fall occurs between two positions in an ordering if the left position is occupied by a larger number than the right position. It will be convenient to say that every ordering has a fall after the last position. In the above example, $\sigma^{-1}$ has four falls. They occur after the second, fourth, sixth, and seventh positions. Prove that the number of rising sequences in an ordering $\sigma$ equals the number of falls in $\sigma^{-1}$.

2 Show that if we start with the identity ordering of $\{1, 2, \ldots, n\}$, then the probability that an $a$-shuffle leads to an ordering with exactly $r$ rising sequences equals
$$\frac{\binom{n + a - r}{n}}{a^n} A(n, r) ,$$
for $1 \le r \le a$.

3 Let $D$ be a deck of $n$ cards. We have seen that there are $a^n$ $a$-shuffles of $D$. A coding of the set of $a$-unshuffles was given in the proof of Theorem 3.9. We will now give a coding of the $a$-shuffles which corresponds to the coding of the $a$-unshuffles. Let $S$ be the set of all $n$-tuples of integers, each between 0 and $a - 1$. Let $M = (m_1, m_2, \ldots, m_n)$ be any element of $S$. Let $n_i$ be the number of $i$'s in $M$, for $0 \le i \le a - 1$. Suppose that we start with the deck in increasing order (i.e., the cards are numbered from 1 to $n$). We label the first $n_0$ cards with a 0, the next $n_1$ cards with a 1, etc.
Then the $a$-shuffle corresponding to $M$ is the shuffle which results in the ordering in which the cards labelled $i$ are placed in the positions in $M$ containing the label $i$. The cards with the same label are placed in these positions in increasing order of their numbers. For example, if $n = 6$ and $a = 3$, let $M = (1, 0, 2, 2, 0, 2)$. Then $n_0 = 2$, $n_1 = 1$, and $n_2 = 3$. So we label cards 1 and 2 with a 0, card 3 with a 1, and cards 4, 5, and 6 with a 2. Then cards 1 and 2 are placed in positions 2 and 5, card 3 is placed in position 1, and cards 4, 5, and 6 are placed in positions 3, 4, and 6, resulting in the ordering $(3, 1, 4, 5, 2, 6)$.
(a) Using this coding, show that the probability that in an $a$-shuffle, the first card (i.e., card number 1) moves to the $i$th position, is given by the following expression:
$$\frac{(a - 1)^{i-1} a^{n-i} + (a - 2)^{i-1} (a - 1)^{n-i} + \cdots + 1^{i-1} 2^{n-i}}{a^n} .$$
(b) Give an accurate estimate for the probability that in three riffle shuffles of a 52-card deck, the first card ends up in one of the first 26 positions. Using a computer, accurately estimate the probability of the same event after seven riffle shuffles.

4 Let $X$ denote a particular process that produces elements of $S_n$, and let $U$ denote the uniform process. Let the distribution functions of these processes be denoted by $f_X$ and $u$, respectively. Show that the variation distance $\| f_X - u \|$ is equal to
$$\max_{T \subset S_n} \sum_{\pi \in T} \bigl( f_X(\pi) - u(\pi) \bigr) .$$
Hint: Write the permutations in $S_n$ in decreasing order of the difference $f_X(\pi) - u(\pi)$.

5 Consider the process described in the text in which an $n$-card deck is repeatedly labelled and 2-unshuffled, in the manner described in the proof of Theorem 3.9. (See Figures 3.11 and 3.12.) The process continues until the labels are all different. Show that the process never terminates until at least $\lceil \log_2(n) \rceil$ unshuffles have been done.
Chapter 4

Conditional Probability

4.1 Discrete Conditional Probability

Conditional Probability

In this section we ask and answer the following question. Suppose we assign a distribution function to a sample space and then learn that an event $E$ has occurred. How should we change the probabilities of the remaining events? We shall call the new probability for an event $F$ the conditional probability of $F$ given $E$ and denote it by $P(F \mid E)$.

Example 4.1 An experiment consists of rolling a die once. Let $X$ be the outcome. Let $F$ be the event $\{X = 6\}$, and let $E$ be the event $\{X > 4\}$. We assign the distribution function $m(\omega) = 1/6$ for $\omega = 1, 2, \ldots, 6$. Thus, $P(F) = 1/6$. Now suppose that the die is rolled and we are told that the event $E$ has occurred. This leaves only two possible outcomes: 5 and 6. In the absence of any other information, we would still regard these outcomes to be equally likely, so the probability of $F$ becomes 1/2, making $P(F \mid E) = 1/2$.

Example 4.2 In the Life Table (see Appendix C), one finds that in a population of 100,000 females, 89.835% can expect to live to age 60, while 57.062% can expect to live to age 80. Given that a woman is 60, what is the probability that she lives to age 80?

This is an example of a conditional probability. In this case, the original sample space can be thought of as a set of 100,000 females. The events $E$ and $F$ are the subsets of the sample space consisting of all women who live at least 60 years, and at least 80 years, respectively. We consider $E$ to be the new sample space, and note that $F$ is a subset of $E$. Thus, the size of $E$ is 89,835, and the size of $F$ is 57,062. So, the probability in question equals $57{,}062/89{,}835 = .6352$. Thus, a woman who is 60 has a 63.52% chance of living to age 80.

Example 4.3 Consider our voting example from Section 1.2: three candidates A, B, and C are running for office. We decided that A and B have an equal chance of winning and C is only 1/2 as likely to win as A. Let $A$ be the event "A wins," $B$ that "B wins," and $C$ that "C wins." Hence, we assigned probabilities $P(A) = 2/5$, $P(B) = 2/5$, and $P(C) = 1/5$.

Suppose that before the election is held, A drops out of the race. As in Example 4.1, it would be natural to assign new probabilities to the events $B$ and $C$ which are proportional to the original probabilities. Writing $\tilde{A}$ for the event that A does not win, we would have $P(B \mid \tilde{A}) = 2/3$ and $P(C \mid \tilde{A}) = 1/3$. It is important to note that any time we assign probabilities to real-life events, the resulting distribution is only useful if we take into account all relevant information. In this example, we may have knowledge that most voters who favor A will vote for C if A is no longer in the race. This will clearly make the probability that C wins greater than the value of 1/3 that was assigned above.

In these examples we assigned a distribution function and then were given new information that determined a new sample space, consisting of the outcomes that are still possible, and caused us to assign a new distribution function to this space.

We want to make formal the procedure carried out in these examples. Let $\Omega = \{\omega_1, \omega_2, \ldots, \omega_r\}$ be the original sample space with distribution function $m(\omega_j)$ assigned. Suppose we learn that the event $E$ has occurred. We want to assign a new distribution function $m(\omega_j \mid E)$ to $\Omega$ to reflect this fact. Clearly, if a sample point $\omega_j$ is not in $E$, we want $m(\omega_j \mid E) = 0$. Moreover, in the absence of information to the contrary, it is reasonable to assume that the probabilities for $\omega_k$ in $E$ should have the same relative magnitudes that they had before we learned that $E$ had occurred.
For this we require that
$$m(\omega_k \mid E) = c\, m(\omega_k)$$
for all $\omega_k$ in $E$, with $c$ some positive constant. But we must also have
$$\sum_{E} m(\omega_k \mid E) = c \sum_{E} m(\omega_k) = 1.$$
Thus,
$$c = \frac{1}{\sum_{E} m(\omega_k)} = \frac{1}{P(E)}.$$
(Note that this requires us to assume that $P(E) > 0$.) Thus, we will define
$$m(\omega_k \mid E) = \frac{m(\omega_k)}{P(E)}$$
for $\omega_k$ in $E$. We will call this new distribution the conditional distribution given $E$. For a general event $F$, this gives
$$P(F \mid E) = \sum_{F \cap E} m(\omega_k \mid E) = \sum_{F \cap E} \frac{m(\omega_k)}{P(E)} = \frac{P(F \cap E)}{P(E)}.$$
We call $P(F \mid E)$ the conditional probability of $F$ occurring given that $E$ occurs, and compute it using the formula
$$P(F \mid E) = \frac{P(F \cap E)}{P(E)}.$$

[Figure 4.1: Tree diagram. First level: urn (I or II, each with probability 1/2); second level: color of ball; path probabilities 1/5, 3/10, 1/4, 1/4.]

Example 4.4 (Example 4.1 continued) Let us return to the example of rolling a die. Recall that $F$ is the event $X = 6$, and $E$ is the event $X > 4$. Note that $E \cap F$ is the event $F$. So, the above formula gives
$$P(F \mid E) = \frac{P(F \cap E)}{P(E)} = \frac{1/6}{1/3} = \frac{1}{2},$$
in agreement with the calculations performed earlier.

Example 4.5 We have two urns, I and II. Urn I contains 2 black balls and 3 white balls. Urn II contains 1 black ball and 1 white ball. An urn is drawn at random and a ball is chosen at random from it. We can represent the sample space of this experiment as the paths through a tree as shown in Figure 4.1. The probabilities assigned to the paths are also shown.

Let $B$ be the event "a black ball is drawn," and $I$ the event "urn I is chosen." Then the branch weight 2/5, which is shown on one branch in the figure, can now be interpreted as the conditional probability $P(B \mid I)$. Suppose we wish to calculate $P(I \mid B)$.
Using the formula, we obtain
$$P(I \mid B) = \frac{P(I \cap B)}{P(B)} = \frac{P(I \cap B)}{P(B \cap I) + P(B \cap II)} = \frac{1/5}{1/5 + 1/4} = \frac{4}{9}.$$

[Figure 4.2: Reverse tree diagram. First level: color of ball; second level: urn.]

Bayes Probabilities

Our original tree measure gave us the probabilities for drawing a ball of a given color, given the urn chosen. We have just calculated the inverse probability that a particular urn was chosen, given the color of the ball. Such an inverse probability is called a Bayes probability and may be obtained by a formula that we shall develop later. Bayes probabilities can also be obtained by simply constructing the tree measure for the two-stage experiment carried out in reverse order. We show this tree in Figure 4.2.

The paths through the reverse tree are in one-to-one correspondence with those in the forward tree, since they correspond to individual outcomes of the experiment, and so they are assigned the same probabilities. From the forward tree, we find that the probability of a black ball is
$$\frac{1}{2} \cdot \frac{2}{5} + \frac{1}{2} \cdot \frac{1}{2} = \frac{9}{20}.$$
The probabilities for the branches at the second level are found by simple division. For example, if $x$ is the probability to be assigned to the top branch at the second level, we must have
$$\frac{9}{20} \cdot x = \frac{1}{5},$$
or $x = 4/9$. Thus, $P(I \mid B) = 4/9$, in agreement with our previous calculations. The reverse tree then displays all of the inverse, or Bayes, probabilities.

Example 4.6 We consider now a problem called the Monty Hall problem.
This has long been a favorite problem but was revived by a letter from Craig Whitaker to Marilyn vos Savant for consideration in her column in Parade Magazine.¹ Craig wrote:

    Suppose you're on Monty Hall's Let's Make a Deal! You are given the choice of three doors, behind one door is a car, the others, goats. You pick a door, say 1, Monty opens another door, say 3, which has a goat. Monty says to you "Do you want to pick door 2?" Is it to your advantage to switch your choice of doors?

¹ Marilyn vos Savant, Ask Marilyn, Parade Magazine, 9 September; 2 December; 17 February 1990, reprinted in Marilyn vos Savant, Ask Marilyn, St. Martins, New York, 1992.

Marilyn gave a solution concluding that you should switch, and if you do, your probability of winning is 2/3. Several irate readers, some of whom identified themselves as having a PhD in mathematics, said that this is absurd since after Monty has ruled out one door there are only two possible doors and they should still each have the same probability 1/2, so there is no advantage to switching. Marilyn stuck to her solution and encouraged her readers to simulate the game and draw their own conclusions from this. We also encourage the reader to do this (see Exercise 11).

Other readers complained that Marilyn had not described the problem completely. In particular, the way in which certain decisions were made during a play of the game was not specified. This aspect of the problem will be discussed in Section 4.3. We will assume that the car was put behind a door by rolling a three-sided die which made all three choices equally likely. Monty knows where the car is, and always opens a door with a goat behind it.
Finally, we assume that if Monty has a choice of doors (i.e., the contestant has picked the door with the car behind it), he chooses each door with probability 1/2. Marilyn clearly expected her readers to assume that the game was played in this manner.

As is the case with most apparent paradoxes, this one can be resolved through careful analysis. We begin by describing a simpler, related question. We say that a contestant is using the "stay" strategy if he picks a door, and, if offered a chance to switch to another door, declines to do so (i.e., he stays with his original choice). Similarly, we say that the contestant is using the "switch" strategy if he picks a door, and, if offered a chance to switch to another door, takes the offer. Now suppose that a contestant decides in advance to play the "stay" strategy. His only action in this case is to pick a door (and decline an invitation to switch, if one is offered). What is the probability that he wins a car? The same question can be asked about the "switch" strategy.

Using the "stay" strategy, a contestant will win the car with probability 1/3, since 1/3 of the time the door he picks will have the car behind it. On the other hand, if a contestant plays the "switch" strategy, then he will win whenever the door he originally picked does not have the car behind it, which happens 2/3 of the time.

This very simple analysis, though correct, does not quite solve the problem that Craig posed. Craig asked for the conditional probability that you win if you switch, given that you have chosen door 1 and that Monty has chosen door 3. To solve this problem, we set up the problem before getting this information and then compute the conditional probability given this information.
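Both questions can be checked by listing every possible play of the game. The Python sketch below is an illustration rather than part of the original text. It uses the assumptions stated above (car placed uniformly at random, Monty always opens a goat door, and chooses evenly when he has two options) and adds one assumption of its own, that the contestant picks each door with probability 1/3. The function names `monty_paths` and `win_probability` are hypothetical.

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)


def monty_paths():
    """Yield (car, pick, opened, probability) for every play of the game."""
    third = Fraction(1, 3)
    for car, pick in product(DOORS, DOORS):
        # Monty may open any door that is neither the pick nor the car.
        options = [d for d in DOORS if d != pick and d != car]
        for opened in options:
            yield car, pick, opened, third * third * Fraction(1, len(options))


def win_probability(strategy, given=None):
    """P(win) for 'stay' or 'switch', optionally conditioned on (pick, opened)."""
    win = total = Fraction(0)
    for car, pick, opened, p in monty_paths():
        if given is not None and (pick, opened) != given:
            continue  # discard plays that contradict the given information
        total += p
        if strategy == "stay":
            final = pick
        else:
            final = next(d for d in DOORS if d != pick and d != opened)
        if final == car:
            win += p
    return win / total  # P(win and given) / P(given)


print(win_probability("stay"))                  # 1/3
print(win_probability("switch"))                # 2/3
print(win_probability("switch", given=(1, 3)))  # 2/3
```

The last line is the conditional probability Craig asked about: plays that contradict the given information are discarded, and the remaining path weights are renormalized by dividing by their total.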
This is a process that takes place in several stages; the car is put behind a door, the contestant picks a door, and finally Monty opens a door. Thus it is natural to analyze this using a tree measure. Here we make an additional assumption that if Monty has a choice of doors (i.e., the contestant has picked the door with the car behind it) then he picks each door with probability 1/2. The assumptions we have made determine the branch probabilities and these in turn determine the tree measure. The resulting tree and tree measure are shown in Figure 4.3. It is tempting to reduce the tree's size by making certain assumptions such as: "Without loss of generality, we will assume that the contestant always picks door 1." We have chosen not to make any such assumptions, in the interest of clarity.

[Figure 4.3: The Monty Hall problem. Tree levels: placement of car, door chosen by contestant, door opened by Monty; each path has probability 1/18 or 1/9.]

Now the given information, namely that the contestant chose door 1 and Monty chose door 3, means only two paths through the tree are possible (see Figure 4.4). For one of these paths, the car is behind door 1 and for the other it is behind door 2. The path with the car behind door 2 is twice as likely as the one with the car behind door 1. Thus the conditional probability is 2/3 that the car is behind door 2 and 1/3 that it is behind door 1, so if you switch you have a 2/3 chance of winning the car, as Marilyn claimed.

[Figure 4.4: Conditional probabilities for the Monty Hall problem. The two remaining paths have unconditional probabilities 1/18 and 1/9, and conditional probabilities 1/3 and 2/3.]

At this point, the reader may think that the two problems above are the same, since they have the same answers. Recall that we assumed in the original problem that if the contestant chooses the door with the car, so that Monty has a choice of two doors, he chooses each of them with probability 1/2. Now suppose instead that in the case that he has a choice, he chooses the door with the larger number with probability 3/4. In the "switch" vs. "stay" problem, the probability of winning with the "switch" strategy is still 2/3. However, in the original problem, if the contestant switches, he wins with probability 4/7. The reader can check this by noting that the same two paths as before are the only two possible paths in the tree. The path leading to a win, if the contestant switches, has probability 1/3, while the path which leads to a loss, if the contestant switches, has probability 1/4.

Independent Events

It often happens that the knowledge that a certain event $E$ has occurred has no effect on the probability that some other event $F$ has occurred, that is, that $P(F \mid E) = P(F)$. One would expect that in this case, the equation $P(E \mid F) = P(E)$ would also be true. In fact (see Exercise 1), each equation implies the other. If these equations are true, we might say that $F$ is independent of $E$. For example, you would not expect the knowledge of the outcome of the first toss of a coin to change the probability that you would assign to the possible outcomes of the second toss, that is, you would not expect that the second toss depends on the first. This idea is formalized in the following definition of independent events.

Definition 4.1 Let $E$ and $F$ be two events.
We say that they are independent if either 1) both events have positive probability and
$$P(E \mid F) = P(E) \quad \text{and} \quad P(F \mid E) = P(F),$$
or 2) at least one of the events has probability 0.

As noted above, if both $P(E)$ and $P(F)$ are positive, then each of the above equations implies the other, so that to see whether two events are independent, only one of these equations must be checked (see Exercise 1).

The following theorem provides another way to check for independence.

Theorem 4.1 Two events $E$ and $F$ are independent if and only if
$$P(E \cap F) = P(E) P(F).$$

Proof. If either event has probability 0, then the two events are independent and the above equation is true, so the theorem is true in this case. Thus, we may assume that both events have positive probability in what follows. Assume that $E$ and $F$ are independent. Then $P(E \mid F) = P(E)$, and so
$$P(E \cap F) = P(E \mid F) P(F) = P(E) P(F).$$
Assume next that $P(E \cap F) = P(E) P(F)$. Then
$$P(E \mid F) = \frac{P(E \cap F)}{P(F)} = P(E).$$
Also,
$$P(F \mid E) = \frac{P(F \cap E)}{P(E)} = P(F).$$
Therefore, $E$ and $F$ are independent.

Example 4.7 Suppose that we have a coin which comes up heads with probability $p$, and tails with probability $q$. Now suppose that this coin is tossed twice. Using a frequency interpretation of probability, it is reasonable to assign to the outcome $(H, H)$ the probability $p^2$, to the outcome $(H, T)$ the probability $pq$, and so on. Let $E$ be the event that heads turns up on the first toss and $F$ the event that tails turns up on the second toss. We will now check that with the above probability assignments, these two events are independent, as expected. We have $P(E) = p^2 + pq = p$ and $P(F) = pq + q^2 = q$. Finally, $P(E \cap F) = pq$, so $P(E \cap F) = P(E) P(F)$.
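The check in Example 4.7 can be repeated numerically using the product criterion of Theorem 4.1. The Python sketch below is illustrative only: the helper names are hypothetical, and the value $p = 3/10$ is an arbitrary choice made for the illustration. Exact fractions are used so that the equality test is not affected by rounding.

```python
from fractions import Fraction
from itertools import product


def two_toss_distribution(p):
    """Distribution function on {HH, HT, TH, TT} for a coin with P(H) = p."""
    weight = {"H": p, "T": 1 - p}
    return {(a, b): weight[a] * weight[b] for a, b in product("HT", repeat=2)}


def prob(m, event):
    """P(event), where event is a predicate on outcomes."""
    return sum(pr for outcome, pr in m.items() if event(outcome))


def independent(m, e, f):
    """Theorem 4.1: E and F are independent iff P(E and F) = P(E) P(F)."""
    return prob(m, lambda w: e(w) and f(w)) == prob(m, e) * prob(m, f)


m = two_toss_distribution(Fraction(3, 10))  # p = 3/10 is an arbitrary choice
E = lambda w: w[0] == "H"  # heads on the first toss
F = lambda w: w[1] == "T"  # tails on the second toss

print(prob(m, E), prob(m, F))  # 3/10 7/10
print(independent(m, E, F))    # True
```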
Example 4.8 It is often, but not always, intuitively clear when two events are independent. In Example 4.7, take the coin to be fair ($p = q = 1/2$), and let $A$ be the event "the first toss is a head" and $B$ the event "the two outcomes are the same." Then
$$P(B \mid A) = \frac{P(B \cap A)}{P(A)} = \frac{P\{HH\}}{P\{HH, HT\}} = \frac{1/4}{1/2} = \frac{1}{2} = P(B).$$
Therefore, $A$ and $B$ are independent, but the result was not so obvious.

Example 4.9 Finally, let us give an example of two events that are not independent. In Example 4.7, again with a fair coin, let $I$ be the event "heads on the first toss" and $J$ the event "two heads turn up." Then $P(I) = 1/2$ and $P(J) = 1/4$. The event $I \cap J$ is the event "heads on both tosses" and has probability 1/4. Thus, $I$ and $J$ are not independent since $P(I) P(J) = 1/8 \ne P(I \cap J)$.

We can extend the concept of independence to any finite set of events $A_1, A_2, \ldots, A_n$.

Definition 4.2 A set of events $\{A_1, A_2, \ldots, A_n\}$ is said to be mutually independent if for any subset $\{A_i, A_j, \ldots, A_m\}$ of these events we have
$$P(A_i \cap A_j \cap \cdots \cap A_m) = P(A_i) P(A_j) \cdots P(A_m),$$
or equivalently, if for any sequence $\bar{A}_1, \bar{A}_2, \ldots, \bar{A}_n$ with $\bar{A}_j = A_j$ or $\tilde{A}_j$,
$$P(\bar{A}_1 \cap \bar{A}_2 \cap \cdots \cap \bar{A}_n) = P(\bar{A}_1) P(\bar{A}_2) \cdots P(\bar{A}_n).$$
(For a proof of the equivalence in the case $n = 3$, see Exercise 33.)

Using this terminology, it is a fact that any sequence $(S, S, F, F, S, \ldots, S)$ of possible outcomes of a Bernoulli trials process forms a sequence of mutually independent events.

It is natural to ask: If all pairs of a set of events are independent, is the whole set mutually independent? The answer is not necessarily, and an example is given in Exercise 7.

It is important to note that the statement
$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1) P(A_2) \cdots P(A_n)$$
does not imply that the events $A_1, A_2, \ldots, A_n$ are mutually independent (see Exercise 8).
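The first condition in Definition 4.2 can be tested directly on a finite sample space by checking the product rule for every subset of two or more events. The Python sketch below is one way to do this and is not taken from the original text. The function names are hypothetical, and the three-toss fair-coin sample space is chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod


def prob(m, events):
    """Probability that every event in `events` occurs."""
    return sum(p for w, p in m.items() if all(e(w) for e in events))


def mutually_independent(m, events):
    """Check the product rule for every subset of two or more events."""
    return all(
        prob(m, subset) == prod(prob(m, [e]) for e in subset)
        for k in range(2, len(events) + 1)
        for subset in combinations(events, k)
    )


# Fair coin tossed three times: each of the 8 outcomes has probability 1/8.
m = {w: Fraction(1, 8) for w in product("HT", repeat=3)}
heads = [lambda w, i=i: w[i] == "H" for i in range(3)]  # "toss i+1 is heads"
first_two_heads = lambda w: w[0] == "H" and w[1] == "H"

print(mutually_independent(m, heads))                        # True
print(mutually_independent(m, [heads[0], first_two_heads]))  # False
```

The second call mirrors Example 4.9: the intersection has probability 1/4, while the product of the individual probabilities is 1/8. Because the function examines every subset, it does not stop at the pairwise check, nor at the single full-intersection identity, which the text warns is not sufficient.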
Joint Distribution Functions and Independence of Random Variables

It is frequently the case that when an experiment is performed, several different quantities concerning the outcomes are investigated.

Example 4.10 Suppose we toss a coin three times. The basic random variable $X$ corresponding to this experiment has eight possible outcomes, which are the ordered triples consisting of H's and T's. We can also define the random variable $X_i$, for $i = 1, 2, 3$, to be the outcome of the $i$th toss. If the coin is fair, then we should assign the probability 1/8 to each of the eight possible outcomes. Thus, the distribution functions of $X_1$, $X_2$, and $X_3$ are identical; in each case they are defined by $m(H) = m(T) = 1/2$.

If we have several random variables $X_1, X_2, \ldots, X_n$ which correspond to a given experiment, then we can consider the joint random variable $X = (X_1, X_2, \ldots, X_n)$ defined by taking an outcome $\omega$ of the experiment, and writing, as an $n$-tuple, the corresponding $n$ outcomes for the random variables $X_1, X_2, \ldots, X_n$. Thus, if the random variable $X_i$ has, as its set of possible outcomes, the set $R_i$, then the set of possible outcomes of the joint random variable $X$ is the Cartesian product of the $R_i$'s, i.e., the set of all $n$-tuples of possible outcomes of the $X_i$'s.

Example 4.11 (Example 4.10 continued) In the coin-tossing example above, let $X_i$ denote the outcome of the $i$th toss. Then the joint random variable $X = (X_1, X_2, X_3)$ has eight possible outcomes.

Suppose that we now define $Y_i$, for $i = 1, 2, 3$, as the number of heads which occur in the first $i$ tosses.
Then $Y_i$ has $\{0, 1, \ldots, i\}$ as possible outcomes, so at first glance, the set of possible outcomes of the joint random variable $Y = (Y_1, Y_2, Y_3)$ should be the set
$$\{(a_1, a_2, a_3) : 0 \le a_1 \le 1,\ 0 \le a_2 \le 2,\ 0 \le a_3 \le 3\}.$$
However, the outcome $(1, 0, 1)$ cannot occur, since we must have $a_1 \le a_2 \le a_3$. The solution to this problem is to define the probability of the outcome $(1, 0, 1)$ to be 0. In addition, we must have $a_{i+1} - a_i \le 1$ for $i = 1, 2$.

We now illustrate the assignment of probabilities to the various outcomes for the joint random variables $X$ and $Y$. In the first case, each of the eight outcomes should be assigned the probability 1/8, since we are assuming that we have a fair coin. In the second case, since $Y_i$ has $i + 1$ possible outcomes, the set of possible outcomes has size 24. Only eight of these 24 outcomes can actually occur, namely the ones satisfying both $a_1 \le a_2 \le a_3$ and $a_{i+1} - a_i \le 1$ for $i = 1, 2$. Each of these outcomes corresponds to exactly one of the outcomes of the random variable $X$, so it is natural to assign probability 1/8 to each of these. We assign probability 0 to the other 16 outcomes. In each case, the probability function is called a joint distribution function.

We collect the above ideas in a definition.

Definition 4.3 Let $X_1, X_2, \ldots, X_n$ be random variables associated with an experiment. Suppose that the sample space (i.e., the set of possible outcomes) of $X_i$ is the set $R_i$. Then the joint random variable $X = (X_1, X_2, \ldots, X_n)$ is defined to be the random variable whose outcomes consist of ordered $n$-tuples of outcomes, with the $i$th coordinate lying in the set $R_i$. The sample space $\Omega$ of $X$ is the Cartesian product of the $R_i$'s:
$$\Omega = R_1 \times R_2 \times \cdots \times R_n.$$
The joint distribution function of $X$ is the function which gives the probability of each of the outcomes of $X$.

Example 4.12 (Example 4.10 continued) We now consider the assignment of probabilities in the above example.
In the case of the random variable $X$, the probability of any outcome $(a_1, a_2, a_3)$ is just the product of the probabilities $P(X_i = a_i)$, for $i = 1, 2, 3$. However, in the case of $Y$, the probability assigned to the outcome $(1, 1, 0)$ is not the product of the probabilities $P(Y_1 = 1)$, $P(Y_2 = 1)$, and $P(Y_3 = 0)$. The difference between these two situations is that the value of $X_i$ does not affect the value of $X_j$, if $i \ne j$, while the values of $Y_i$ and $Y_j$ affect one another. For example, if $Y_1 = 1$, then $Y_2$ cannot equal 0. This prompts the next definition.

Definition 4.4 The random variables $X_1, X_2, \ldots, X_n$ are mutually independent if
$$P(X_1 = r_1, X_2 = r_2, \ldots, X_n = r_n) = P(X_1 = r_1) P(X_2 = r_2) \cdots P(X_n = r_n)$$
for any choice of $r_1, r_2, \ldots, r_n$. Thus, if $X_1, X_2, \ldots, X_n$ are mutually independent, then the joint distribution function of the random variable $X = (X_1, X_2, \ldots, X_n)$ is just the product of the individual distribution functions. When two random variables are mutually independent, we shall say more briefly that they are independent.

              Not smoke   Smoke   Total
Not cancer        40        10      50
Cancer             7         3      10
Totals            47        13      60

Table 4.1: Smoking and cancer.

              S = 0     S = 1
C = 0         40/60     10/60
C = 1          7/60      3/60

Table 4.2: Joint distribution.

Example 4.13 In a group of 60 people, the numbers who do or do not smoke and do or do not have cancer are reported as shown in Table 4.1. Let $\Omega$ be the sample space consisting of these 60 people. A person is chosen at random from the group. Let $C(\omega) = 1$ if this person has cancer and 0 if not, and $S(\omega) = 1$ if this person smokes and 0 if not. Then the joint distribution of $\{C, S\}$ is given in Table 4.2. For example, $P(C = 0, S = 0) = 40/60$, $P(C = 0, S = 1) = 10/60$, and so forth. The distributions of the individual random variables are called marginal distributions.
The marginal distributions of $C$ and $S$ are:
$$p_C = \begin{pmatrix} 0 & 1 \\ 50/60 & 10/60 \end{pmatrix}, \qquad p_S = \begin{pmatrix} 0 & 1 \\ 47/60 & 13/60 \end{pmatrix}.$$
The random variables $S$ and $C$ are not independent, since
$$P(C = 1, S = 1) = \frac{3}{60} = .05, \qquad P(C = 1) P(S = 1) = \frac{10}{60} \cdot \frac{13}{60} = .036.$$
Note that we would also see this from the fact that
$$P(C = 1 \mid S = 1) = \frac{3}{13} = .23, \qquad P(C = 1) = \frac{1}{6} = .167.$$

Independent Trials Processes

The study of random variables proceeds by considering special classes of random variables. One such class that we shall study is the class of independent trials.

Definition 4.5 A sequence of random variables $X_1, X_2, \ldots, X_n$ that are mutually independent and that have the same distribution is called a sequence of independent trials or an independent trials process.

Independent trials processes arise naturally in the following way. We have a single experiment with sample space $R = \{r_1, r_2, \ldots, r_s\}$ and a distribution function
$$m_X = \begin{pmatrix} r_1 & r_2 & \cdots & r_s \\ p_1 & p_2 & \cdots & p_s \end{pmatrix}.$$
We repeat this experiment $n$ times. To describe this total experiment, we choose as sample space the space
$$\Omega = R \times R \times \cdots \times R,$$
consisting of all possible sequences $\omega = (\omega_1, \omega_2, \ldots, \omega_n)$ where the value of each $\omega_j$ is chosen from $R$. We assign a distribution function to be the product distribution
$$m(\omega) = m(\omega_1) \cdot \ldots \cdot m(\omega_n),$$
with $m(\omega_j) = p_k$ when $\omega_j = r_k$. Then we let $X_j$ denote the $j$th coordinate of the outcome $(r_1, r_2, \ldots, r_n)$. The random variables $X_1, \ldots, X_n$ form an independent trials process.

Example 4.14 An experiment consists of rolling a die three times. Let $X_i$ represent the outcome of the $i$th roll, for $i = 1, 2, 3$. The common distribution function is
$$m_i = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \end{pmatrix}.$$
The sample space is $R^3 = R \times R \times R$ with $R = \{1, 2, 3, 4, 5, 6\}$. If $\omega = (1, 3, 6)$, then $X_1(\omega) = 1$, $X_2(\omega) = 3$, and $X_3(\omega) = 6$, indicating that the first roll was a 1, the second was a 3, and the third was a 6. The probability assigned to any sample point is
$$m(\omega) = \frac{1}{6} \cdot \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{216}.$$

Example 4.15 Consider next a Bernoulli trials process with probability $p$ for success on each experiment. Let $X_j(\omega) = 1$ if the $j$th outcome is success and $X_j(\omega) = 0$ if it is a failure. Then $X_1, X_2, \ldots, X_n$ is an independent trials process. Each $X_j$ has the same distribution function
$$m_j = \begin{pmatrix} 0 & 1 \\ q & p \end{pmatrix},$$
where $q = 1 - p$.

If $S_n = X_1 + X_2 + \cdots + X_n$, then
$$P(S_n = j) = \binom{n}{j} p^j q^{n-j},$$
and $S_n$ has, as distribution, the binomial distribution $b(n, p, j)$.

Bayes' Formula

In our examples, we have considered conditional probabilities of the following form: Given the outcome of the second stage of a two-stage experiment, find the probability for an outcome at the first stage. We have remarked that these probabilities are called Bayes probabilities.

We return now to the calculation of more general Bayes probabilities. Suppose we have a set of events $H_1, H_2, \ldots, H_m$ that are pairwise disjoint and such that the sample space $\Omega$ satisfies the equation
$$\Omega = H_1 \cup H_2 \cup \cdots \cup H_m.$$
We call these events hypotheses. We also have an event $E$ that gives us some information about which hypothesis is correct. We call this event evidence.

Before we receive the evidence, then, we have a set of prior probabilities $P(H_1), P(H_2), \ldots, P(H_m)$ for the hypotheses. If we know the correct hypothesis, we know the probability for the evidence. That is, we know $P(E \mid H_i)$ for all $i$. We want to find the probabilities for the hypotheses given the evidence. That is, we want to find the conditional probabilities $P(H_i \mid E)$. These probabilities are called the posterior probabilities.
To find these probabilities, we write them in the form
$$P(H_i \mid E) = \frac{P(H_i \cap E)}{P(E)}. \tag{4.1}$$
We can calculate the numerator from our given information by
$$P(H_i \cap E) = P(H_i) P(E \mid H_i). \tag{4.2}$$
Since one and only one of the events $H_1, H_2, \ldots, H_m$ can occur, we can write the probability of $E$ as
$$P(E) = P(H_1 \cap E) + P(H_2 \cap E) + \cdots + P(H_m \cap E).$$
Using Equation 4.2, the above expression can be seen to equal
$$P(H_1) P(E \mid H_1) + P(H_2) P(E \mid H_2) + \cdots + P(H_m) P(E \mid H_m). \tag{4.3}$$
Using (4.1), (4.2), and (4.3) yields Bayes' formula:
$$P(H_i \mid E) = \frac{P(H_i) P(E \mid H_i)}{\sum_{k=1}^{m} P(H_k) P(E \mid H_k)}.$$

Although this is a very famous formula, we will rarely use it. If the number of hypotheses is small, a simple tree measure calculation is easily carried out, as we have done in our examples. If the number of hypotheses is large, then we should use a computer.

Bayes probabilities are particularly appropriate for medical diagnosis. A doctor is anxious to know which of several diseases a patient might have. She collects evidence in the form of the outcomes of certain tests. From statistical studies the doctor can find the prior probabilities of the various diseases before the tests, and the probabilities for specific test outcomes, given a particular disease. What the doctor wants to know is the posterior probability for the particular disease, given the outcomes of the tests.

            Number having        The results
Disease     this disease      ++     +−     −+     −−
d1              3215         2110    301    704    100
d2              2125          396    132   1187    410
d3              4660          510   3568     73    509
Total          10000

Table 4.3: Diseases data.

Example 4.16 A doctor is trying to decide if a patient has one of three diseases $d_1$, $d_2$, or $d_3$. Two tests are to be carried out, each of which results in a positive (+) or a negative (−) outcome.
There are four possible test patterns ++, +−, −+, and −−. National records have indicated that, for 10,000 people having one of these three diseases, the distribution of diseases and test results is as in Table 4.3.

From this data, we can estimate the prior probabilities for each of the diseases and, given a particular disease, the probability of a particular test outcome. For example, the prior probability of disease $d_1$ may be estimated to be $3215/10{,}000 = .3215$. The probability of the test result +−, given disease $d_1$, may be estimated to be $301/3215 = .094$.

         d1      d2      d3
++      .700    .131    .169
+−      .075    .033    .892
−+      .358    .604    .038
−−      .098    .403    .499

Table 4.4: Posterior probabilities.

We can now use Bayes' formula to compute various posterior probabilities. The computer program Bayes computes these posterior probabilities. The results for this example are shown in Table 4.4.

We note from the outcomes that, when the test result is ++, the disease $d_1$ has a significantly higher probability than the other two. When the outcome is +−, this is true for disease $d_3$. When the outcome is −+, this is true for disease $d_2$. Note that these statements might have been guessed by looking at the data. If the outcome is −−, the most probable cause is $d_3$, but the probability that a patient has $d_2$ is only slightly smaller. If one looks at the data in this case, one can see that it might be hard to guess which of the two diseases $d_2$ and $d_3$ is more likely.

Our final example shows that one has to be careful when the prior probabilities are small.

Example 4.17 A doctor gives a patient a test for a particular cancer. Before the results of the test, the only evidence the doctor has to go on is that 1 woman in 1000 has this cancer.
Experience has shown that, in 99 percent of the cases in which cancer is present, the test is positive; and in 95 percent of the cases in which it is not present, it is negative. If the test turns out to be positive, what probability should the doctor assign to the event that cancer is present? An alternative form of this question is to ask for the relative frequencies of false positives and cancers.

We are given that prior(cancer) = .001 and prior(not cancer) = .999. We know also that $P(+ \mid \text{cancer}) = .99$, $P(- \mid \text{cancer}) = .01$, $P(+ \mid \text{not cancer}) = .05$, and $P(- \mid \text{not cancer}) = .95$. Using this data gives the result shown in Figure 4.5.

We see now that the probability of cancer given a positive test has only increased from .001 to .019. While this is nearly a twentyfold increase, the probability that the patient has the cancer is still small. Stated in another way, among the positive results, 98.1 percent are false positives, and 1.9 percent are cancers. When a group of second-year medical students was asked this question, over half of the students incorrectly guessed the probability to be greater than .5. □

[Figure 4.5: Forward and reverse tree diagrams. Panels: Original Tree, Reverse Tree.]

Historical Remarks

Conditional probability was used long before it was formally defined. Pascal and Fermat considered the problem of points: given that team A has won $m$ games and team B has won $n$ games, what is the probability that A will win the series? (See Exercises 40–42.) This is clearly a conditional probability problem.

In his book, Huygens gave a number of problems, one of which was:

    Three gamblers, A, B and C, take 12 balls of which 4 are white and 8 black.
    They play with the rules that the drawer is blindfolded, A is to draw first, then B and then C, the winner to be the one who first draws a white ball. What is the ratio of their chances?²

From his answer it is clear that Huygens meant that each ball is replaced after drawing. However, John Hudde, the mayor of Amsterdam, assumed that he meant to sample without replacement and corresponded with Huygens about the difference in their answers. Hacking remarks that "Neither party can understand what the other is doing."³

By the time of de Moivre's book, The Doctrine of Chances, these distinctions were well understood. De Moivre defined independence and dependence as follows:

    Two Events are independent, when they have no connexion one with the other, and that the happening of one neither forwards nor obstructs the happening of the other.

    Two Events are dependent, when they are so connected together as that the Probability of either's happening is altered by the happening of the other.⁴

De Moivre used sampling with and without replacement to illustrate that the probability that two independent events both happen is the product of their probabilities, and for dependent events that:

    The Probability of the happening of two Events dependent, is the product of the Probability of the happening of one of them, by the Probability which the other will have of happening, when the first is considered as having happened; and the same Rule will extend to the happening of as many Events as may be assigned.⁵

² Quoted in F. N. David, Games, Gods and Gambling (London: Griffin, 1962), p. 119.
³ I. Hacking, The Emergence of Probability (Cambridge: Cambridge University Press, 1975), p. 99.
⁴ A. de Moivre, The Doctrine of Chances, 3rd ed. (New York: Chelsea, 1967), p. 6.

The formula that we call Bayes' formula, and the idea of computing the probability of a hypothesis given evidence, originated in a famous essay of Thomas Bayes. Bayes was an ordained minister in Tunbridge Wells near London. His mathematical interests led him to be elected to the Royal Society in 1742, but none of his results were published within his lifetime. The work upon which his fame rests, "An Essay Toward Solving a Problem in the Doctrine of Chances," was published in 1763, two years after his death.⁶ Bayes reviewed some of the basic concepts of probability and then considered a new kind of inverse probability problem requiring the use of conditional probability.

Bernoulli, in his study of processes that we now call Bernoulli trials, had proven his famous law of large numbers, which we will study in Chapter 8. This theorem assured the experimenter that if he knew the probability $p$ for success, he could predict that the proportion of successes would approach this value as he increased the number of experiments. Bernoulli himself realized that in most interesting cases you do not know the value of $p$ and saw his theorem as an important step in showing that you could determine $p$ by experimentation.

To study this problem further, Bayes started by assuming that the probability $p$ for success is itself determined by a random experiment. He assumed in fact that this experiment was such that this value for $p$ is equally likely to be any value between 0 and 1. Without knowing this value we carry out $n$ experiments and observe $m$ successes. Bayes proposed the problem of finding the conditional probability that the unknown probability $p$ lies between $a$ and $b$. He obtained the answer:

$$P(a \le p < b \mid m \text{ successes in } n \text{ trials}) = \frac{\int_a^b x^m (1-x)^{n-m}\,dx}{\int_0^1 x^m (1-x)^{n-m}\,dx}.$$

We shall see in the next section how this result is obtained.
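The ratio of integrals in Bayes' answer is easy to evaluate numerically today. The sketch below is only an illustration: it uses composite Simpson's rule, which is our choice and not Bayes' own approximation method, and the function names and the sample data (a success ratio of $m/n = .3$) are hypothetical. Raising $x$ to large powers directly limits it to moderate $n$.

```python
def simpson(f, lo, hi, steps=2000):
    """Composite Simpson's rule for the integral of f over [lo, hi].

    steps must be even.
    """
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3

def bayes_interval_probability(a, b, m, n):
    """P(a <= p < b | m successes in n trials), with p uniform on [0, 1]."""
    def integrand(x):
        return x**m * (1 - x)**(n - m)
    return simpson(integrand, a, b) / simpson(integrand, 0.0, 1.0)

# Hypothetical data: the observed success ratio stays at m/n = 0.3
# while the number of experiments grows.
for n in (10, 100, 1000):
    m = 3 * n // 10
    print(n, round(bayes_interval_probability(0.25, 0.35, m, n), 4))
```

For these sample values the printed probability that $p$ lies in $[.25, .35)$ grows toward 1 as $n$ increases, which illustrates how the conditional distribution becomes concentrated as more experiments are observed.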
Bayes clearly wanted to show that the conditional distribution function, given the outcomes of more and more experiments, becomes concentrated around the true value of $p$. Thus, Bayes was trying to solve an inverse problem. The computation of the integrals was too difficult for exact solution except for small values of $m$ and $n$, and so Bayes tried approximate methods. His methods were not very satisfactory and it has been suggested that this discouraged him from publishing his results.

However, his paper was the first in a series of important studies carried out by Laplace, Gauss, and other great mathematicians to solve inverse problems. They studied this problem in terms of errors in measurements in astronomy. If an astronomer were to know the true value of a distance and the nature of the random errors caused by his measuring device, he could predict the probabilistic nature of his measurements. In fact, however, he is presented with the inverse problem of knowing the nature of the random errors, and the values of the measurements, and wanting to make inferences about the unknown true value.

⁵ Ibid., p. 7.
⁶ T. Bayes, "An Essay Toward Solving a Problem in the Doctrine of Chances," Phil. Trans. Royal Soc. London, vol. 53 (1763), pp. 370–418.

As Maistrov remarks, the formula that we have called Bayes' formula does not appear in his essay. Laplace gave it this name when he studied these inverse problems.⁷ The computation of inverse probabilities is fundamental to statistics and has led to an important branch of statistics called Bayesian analysis, assuring Bayes eternal fame for his brief essay.

Exercises

1 Assume that $E$ and $F$ are two events with positive probabilities. Show that if $P(E \mid F) = P(E)$, then $P(F \mid E) = P(F)$.

2 A coin is tossed three times.
What is the probability that exactly two heads occur, given that
   (a) the first outcome was a head?
   (b) the first outcome was a tail?
   (c) the first two outcomes were heads?
   (d) the first two outcomes were tails?
   (e) the first outcome was a head and the third outcome was a head?

3 A die is rolled twice. What is the probability that the sum of the faces is greater than 7, given that
   (a) the first outcome was a 4?
   (b) the first outcome was greater than 3?
   (c) the first outcome was a 1?
   (d) the first outcome was less than 5?

4 A card is drawn at random from a deck of cards. What is the probability that
   (a) it is a heart, given that it is red?
   (b) it is higher than a 10, given that it is a heart? (Interpret J, Q, K, A as 11, 12, 13, 14.)
   (c) it is a jack, given that it is red?

5 A coin is tossed three times. Consider the following events
   A: Heads on the first toss.
   B: Tails on the second.
   C: Heads on the third toss.
   D: All three outcomes the same (HHH or TTT).
   E: Exactly one head turns up.
   (a) Which of the following pairs of these events are independent?
       (1) A, B
       (2) A, D
       (3) A, E
       (4) D, E
   (b) Which of the following triples of these events are independent?
       (1) A, B, C
       (2) A, B, D
       (3) C, D, E

⁷ L. E. Maistrov, Probability Theory: A Historical Sketch, trans. and ed. Samuel Kotz (New York: Academic Press, 1974), p. 100.

6 From a deck of five cards numbered 2, 4, 6, 8, and 10, respectively, a card is drawn at random and replaced. This is done three times. What is the probability that the card numbered 2 was drawn exactly two times, given that the sum of the numbers on the three draws is 12?

7 A coin is tossed twice. Consider the following events.
   A: Heads on the first toss.
   B: Heads on the second toss.
   C: The two tosses come out the same.
   (a) Show that A, B, C are pairwise independent but not independent.
   (b) Show that C is independent of A and B but not of $A \cap B$.

8 Let $\Omega = \{a, b, c, d, e, f\}$. Assume that $m(a) = m(b) = 1/8$ and $m(c) = m(d) = m(e) = m(f) = 3/16$. Let $A$, $B$, and $C$ be the events $A = \{d, e, a\}$, $B = \{c, e, a\}$, $C = \{c, d, a\}$. Show that $P(A \cap B \cap C) = P(A)P(B)P(C)$ but no two of these events are independent.

9 What is the probability that a family of two children has
   (a) two boys given that it has at least one boy?
   (b) two boys given that the first child is a boy?

10 In Example 4.2, we used the Life Table (see Appendix C) to compute a conditional probability. The number 93,753 in the table, corresponding to 40-year-old males, means that of all the males born in the United States in 1950, 93.753% were alive in 1990. Is it reasonable to use this as an estimate for the probability of a male, born this year, surviving to age 40?

11 Simulate the Monty Hall problem. Carefully state any assumptions that you have made when writing the program. Which version of the problem do you think that you are simulating?

12 In Example 4.17, how large must the prior probability of cancer be to give a posterior probability of .5 for cancer given a positive test?

13 Two cards are drawn from a bridge deck. What is the probability that the second card drawn is red?

14 If $P(\tilde{B}) = 1/4$ and $P(A \mid B) = 1/2$, what is $P(A \cap B)$?

15 (a) What is the probability that your bridge partner has exactly two aces, given that she has at least one ace?
   (b) What is the probability that your bridge partner has exactly two aces, given that she has the ace of spades?
16 Prove that for any three events $A$, $B$, $C$, each having positive probability, and with the property that $P(A \cap B) > 0$,

$$P(A \cap B \cap C) = P(A)\,P(B \mid A)\,P(C \mid A \cap B).$$

17 Prove that if $A$ and $B$ are independent so are
   (a) $A$ and $\tilde{B}$.
   (b) $\tilde{A}$ and $\tilde{B}$.

18 A doctor assumes that a patient has one of three diseases $d_1$, $d_2$, or $d_3$. Before any test, he assumes an equal probability for each disease. He carries out a test that will be positive with probability .8 if the patient has $d_1$, .6 if he has disease $d_2$, and .4 if he has disease $d_3$. Given that the outcome of the test was positive, what probabilities should the doctor now assign to the three possible diseases?

19 In a poker hand, John has a very strong hand and bets 5 dollars. The probability that Mary has a better hand is .04. If Mary had a better hand she would raise with probability .9, but with a poorer hand she would only raise with probability .1. If Mary raises, what is the probability that she has a better hand than John does?

20 The Polya urn model for contagion is as follows: We start with an urn which contains one white ball and one black ball. At each second we choose a ball at random from the urn and replace this ball and add one more of the color chosen. Write a program to simulate this model, and see if you can make any predictions about the proportion of white balls in the urn after a large number of draws. Is there a tendency to have a large fraction of balls of the same color in the long run?

21 It is desired to find the probability that in a bridge deal each player receives an ace. A student argues as follows. It does not matter where the first ace goes. The second ace must go to one of the other three players and this occurs with probability 3/4. Then the next must go to one of two, an event of probability 1/2, and finally the last ace must go to the player who does not have an ace.
This occurs with probability 1/4. The probability that all these events occur is the product $(3/4)(1/2)(1/4) = 3/32$. Is this argument correct?

22 One coin in a collection of 65 has two heads. The rest are fair. If a coin, chosen at random from the lot and then tossed, turns up heads 6 times in a row, what is the probability that it is the two-headed coin?

23 You are given two urns and fifty balls. Half of the balls are white and half are black. You are asked to distribute the balls in the urns with no restriction placed on the number of either type in an urn. How should you distribute the balls in the urns to maximize the probability of obtaining a white ball if an urn is chosen at random and a ball drawn out at random? Justify your answer.

24 A fair coin is thrown $n$ times. Show that the conditional probability of a head on any specified trial, given a total of $k$ heads over the $n$ trials, is $k/n$ ($k > 0$).

25 (Johnsonbough⁸) A coin with probability $p$ for heads is tossed $n$ times. Let $E$ be the event "a head is obtained on the first toss" and $F_k$ the event "exactly $k$ heads are obtained." For which pairs $(n, k)$ are $E$ and $F_k$ independent?

26 Suppose that $A$ and $B$ are events such that $P(A \mid B) = P(B \mid A)$ and $P(A \cup B) = 1$ and $P(A \cap B) > 0$. Prove that $P(A) > 1/2$.

27 (Chung⁹) In London, half of the days have some rain. The weather forecaster is correct 2/3 of the time, i.e., the probability that it rains, given that she has predicted rain, and the probability that it does not rain, given that she has predicted that it won't rain, are both equal to 2/3. When rain is forecast, Mr. Pickwick takes his umbrella. When rain is not forecast, he takes it with probability 1/3. Find
   (a) the probability that Pickwick has no umbrella, given that it rains.
   (b) the probability that he brings his umbrella, given that it doesn't rain.

28 Probability theory was used in a famous court case: People v. Collins.¹⁰ In this case a purse was snatched from an elderly person in a Los Angeles suburb. A couple seen running from the scene were described as a black man with a beard and a mustache and a blond girl with hair in a ponytail. Witnesses said they drove off in a partly yellow car. Malcolm and Janet Collins were arrested. He was black and though clean shaven when arrested had evidence of recently having had a beard and a mustache. She was blond and usually wore her hair in a ponytail. They drove a partly yellow Lincoln. The prosecution called a professor of mathematics as a witness who suggested that a conservative set of probabilities for the characteristics noted by the witnesses would be as shown in Table 4.5.

The prosecution then argued that the probability that all of these characteristics are met by a randomly chosen couple is the product of the probabilities or 1/12,000,000, which is very small. He claimed this was proof beyond a reasonable doubt that the defendants were guilty. The jury agreed and handed down a verdict of guilty of second-degree robbery.

⁸ R. Johnsonbough, "Problem #103," Two Year College Math Journal, vol. 8 (1977), p. 292.
⁹ K. L. Chung, Elementary Probability Theory With Stochastic Processes, 3rd ed. (New York: Springer-Verlag, 1979), p. 152.
¹⁰ M. W. Gray, "Statistics and the Law," Mathematics Magazine, vol. 56 (1983), pp. 67–81.

  man with mustache               1/4
  girl with blond hair            1/3
  girl with ponytail              1/10
  black man with beard            1/10
  interracial couple in a car     1/1000
  partly yellow car               1/10

  Table 4.5: Collins case probabilities.

If you were the lawyer for the Collins couple how would you have countered the above argument?
(The appeal of this case is discussed in Exercise 5.1.34.)

29 A student is applying to Harvard and Dartmouth. He estimates that he has a probability of .5 of being accepted at Dartmouth and .3 of being accepted at Harvard. He further estimates the probability that he will be accepted by both is .2. What is the probability that he is accepted by Dartmouth if he is accepted by Harvard? Is the event "accepted at Harvard" independent of the event "accepted at Dartmouth"?

30 Luxco, a wholesale lightbulb manufacturer, has two factories. Factory A sells bulbs in lots that consist of 1000 regular and 2000 softglow bulbs each. Random sampling has shown that on the average there tend to be about 2 bad regular bulbs and 11 bad softglow bulbs per lot. At factory B the lot size is reversed (there are 2000 regular and 1000 softglow per lot), and there tend to be 5 bad regular and 6 bad softglow bulbs per lot.

The manager of factory A asserts, "We're obviously the better producer; our bad bulb rates are .2 percent and .55 percent compared to B's .25 percent and .6 percent. We're better at both regular and softglow bulbs by half of a tenth of a percent each."

"Au contraire," counters the manager of B, "each of our 3000 bulb lots contains only 11 bad bulbs, while A's 3000 bulb lots contain 13. So our .37 percent bad bulb rate beats their .43 percent."

Who is right?

31 Using the Life Table for 1981 given in Appendix C, find the probability that a male of age 60 in 1981 lives to age 80. Find the same probability for a female.

32 (a) There has been a blizzard and Helen is trying to drive from Woodstock to Tunbridge, which are connected like the top graph in Figure 4.6. Here $p$ and $q$ are the probabilities that the two roads are passable. What is the probability that Helen can get from Woodstock to Tunbridge?
   (b) Now suppose that Woodstock and Tunbridge are connected like the middle graph in Figure 4.6. What now is the probability that she can get from W to T? Note that if we think of the roads as being components of a system, then in (a) and (b) we have computed the reliability of a system whose components are (a) in series and (b) in parallel.

[Figure 4.6: From Woodstock to Tunbridge. Panels: (a) Woodstock to Tunbridge by roads with probabilities p and q; (b) W to T by roads with probabilities p and q; (c) roads among W, C, D, and T with probabilities .8, .9, .9, .8, and .95.]

   (c) Now suppose W and T are connected like the bottom graph in Figure 4.6. Find the probability of Helen's getting from W to T. Hint: If the road from C to D is impassable, it might as well not be there at all; if it is passable, then figure out how to use part (b) twice.

33 Let $A_1$, $A_2$, and $A_3$ be events, and let $B_i$ represent either $A_i$ or its complement $\tilde{A}_i$. Then there are eight possible choices for the triple $(B_1, B_2, B_3)$. Prove that the events $A_1$, $A_2$, $A_3$ are independent if and only if

$$P(B_1 \cap B_2 \cap B_3) = P(B_1)P(B_2)P(B_3),$$

for all eight of the possible choices for the triple $(B_1, B_2, B_3)$.

34 Four women, A, B, C, and D, check their hats, and the hats are returned in a random manner. Let $\Omega$ be the set of all possible permutations of A, B, C, D. Let $X_j = 1$ if the $j$th woman gets her own hat back and 0 otherwise. What is the distribution of $X_j$? Are the $X_i$'s mutually independent?

35 A box has numbers from 1 to 10. A number is drawn at random. Let $X_1$ be the number drawn. This number is replaced, and the ten numbers mixed. A second number $X_2$ is drawn. Find the distributions of $X_1$ and $X_2$. Are $X_1$ and $X_2$ independent? Answer the same questions if the first number is not replaced before the second is drawn.
36 A die is thrown twice. Let $X_1$ and $X_2$ denote the outcomes. Define $X = \min(X_1, X_2)$. Find the distribution of $X$.

*37 Given that $P(X = a) = r$, $P(\max(X, Y) = a) = s$, and $P(\min(X, Y) = a) = t$, show that you can determine $u = P(Y = a)$ in terms of $r$, $s$, and $t$.

38 A fair coin is tossed three times. Let $X$ be the number of heads that turn up on the first two tosses and $Y$ the number of heads that turn up on the third toss. Give the distribution of
   (a) the random variables $X$ and $Y$.
   (b) the random variable $Z = X + Y$.
   (c) the random variable $W = X - Y$.

39 Assume that the random variables $X$ and $Y$ have the joint distribution given in Table 4.6.
   (a) What is $P(X \ge 1 \text{ and } Y \le 0)$?
   (b) What is the conditional probability that $Y \le 0$ given that $X = 2$?
   (c) Are $X$ and $Y$ independent?
   (d) What is the distribution of $Z = XY$?

                      Y
              −1      0       1       2
  X   −1      0       1/36    1/6     1/12
       0      1/18    0       1/18    0
       1      0       1/36    1/6     1/12
       2      1/12    0       1/12    1/6

  Table 4.6: Joint distribution.

40 In the problem of points, discussed in the historical remarks in Section 3.2, two players, A and B, play a series of points in a game with player A winning each point with probability $p$ and player B winning each point with probability $q = 1 - p$. The first player to win $N$ points wins the game. Assume that $N = 3$. Let $X$ be a random variable that has the value 1 if player A wins the series and 0 otherwise. Let $Y$ be a random variable with value the number of points played in a game. Find the distribution of $X$ and $Y$ when $p = 1/2$. Are $X$ and $Y$ independent in this case? Answer the same questions for the case $p = 2/3$.

41 The letters between Pascal and Fermat, which are often credited with having started probability theory, dealt mostly with the problem of points described in Exercise 40.
Pascal and Fermat considered the problem of finding a fair division of stakes if the game must be called off when the first player has won $r$ games and the second player has won $s$ games, with $r < N$ and $s < N$. Let $P(r, s)$ be the probability that player A wins the game if he has already won $r$ points and player B has won $s$ points. Then
   (a) $P(r, N) = 0$ if $r < N$,
   (b) $P(N, s) = 1$ if $s < N$,
   (c) $P(r, s) = pP(r + 1, s) + qP(r, s + 1)$ if $r < N$ and $s < N$;
and (a), (b), and (c) determine $P(r, s)$ for $r \le N$ and $s \le N$. Pascal used these facts to find $P(r, s)$ by working backward: He first obtained $P(N - 1, j)$ for $j = N - 1, N - 2, \ldots, 0$; then, from these values, he obtained $P(N - 2, j)$ for $j = N - 1, N - 2, \ldots, 0$ and, continuing backward, obtained all the values $P(r, s)$. Write a program to compute $P(r, s)$ for given $N$, $r$, $s$, and $p$. Warning: Follow Pascal and you will be able to run $N = 100$; use recursion and you will not be able to run $N = 20$.

42 Fermat solved the problem of points (see Exercise 40) as follows: He realized that the problem was difficult because the possible ways the play might go are not equally likely. For example, when the first player needs two more games and the second needs three to win, two possible ways the series might go for the first player are WLW and LWLW. These sequences are not equally likely. To avoid this difficulty, Fermat extended the play, adding fictitious plays so that the series went the maximum number of games needed (four in this case). He obtained equally likely outcomes and used, in effect, the Pascal triangle to calculate $P(r, s)$. Show that this leads to a formula for $P(r, s)$ even for the case $p \ne 1/2$.

43 The Yankees are playing the Dodgers in a world series. The Yankees win each game with probability .6. What is the probability that the Yankees win the series?
(The series is won by the first team to win four games.)

44 C. L. Anderson¹¹ has used Fermat's argument for the problem of points to prove the following result due to J. G. Kingston. You are playing the game of points (see Exercise 40) but, at each point, when you serve you win with probability $p$, and when your opponent serves you win with probability $\bar{p}$. You will serve first, but you can choose one of the following two conventions for serving: for the first convention you alternate service (tennis), and for the second the person serving continues to serve until he loses a point and then the other player serves (racquetball). The first player to win $N$ points wins the game. The problem is to show that the probability of winning the game is the same under either convention.
   (a) Show that, under either convention, you will serve at most $N$ points and your opponent at most $N - 1$ points.
   (b) Extend the number of points to $2N - 1$ so that you serve $N$ points and your opponent serves $N - 1$. For example, you serve any additional points necessary to make $N$ serves and then your opponent serves any additional points necessary to make him serve $N - 1$ points. The winner is now the person, in the extended game, who wins the most points. Show that playing these additional points has not changed the winner.
   (c) Show that (a) and (b) prove that you have the same probability of winning the game under either convention.

¹¹ C. L. Anderson, "Note on the Advantage of First Serve," Journal of Combinatorial Theory, Series A, vol. 23 (1977), p. 363.

45 In the previous problem, assume that $p = 1 - \bar{p}$.
   (a) Show that under either service convention, the first player will win more often than the second player if and only if $p > .5$.
   (b) In volleyball, a team can only win a point while it is serving.
Thus, any individual "play" either ends with a point being awarded to the serving team or with the service changing to the other team. The first team to win $N$ points wins the game. (We ignore here the additional restriction that the winning team must be ahead by at least two points at the end of the game.) Assume that each team has the same probability of winning the play when it is serving, i.e., that $p = 1 - \bar{p}$. Show that in this case, the team that serves first will win more than half the time, as long as $p > 0$. (If $p = 0$, then the game never ends.) Hint: Define $p'$ to be the probability that a team wins the next point, given that it is serving. If we write $q = 1 - p$, then one can show that

$$p' = \frac{p}{1 - q^2}.$$

If one now considers this game in a slightly different way, one can see that the second service convention in the preceding problem can be used, with $p$ replaced by $p'$.

46 A poker hand consists of 5 cards dealt from a deck of 52 cards. Let $X$ and $Y$ be, respectively, the number of aces and kings in a poker hand. Find the joint distribution of $X$ and $Y$.

47 Let $X_1$ and $X_2$ be independent random variables and let $Y_1 = \phi_1(X_1)$ and $Y_2 = \phi_2(X_2)$.
   (a) Show that

$$P(Y_1 = r, Y_2 = s) = \sum_{\phi_1(a) = r,\ \phi_2(b) = s} P(X_1 = a, X_2 = b).$$

   (b) Using (a), show that $P(Y_1 = r, Y_2 = s) = P(Y_1 = r)P(Y_2 = s)$ so that $Y_1$ and $Y_2$ are independent.

48 Let $\Omega$ be the sample space of an experiment. Let $E$ be an event with $P(E) > 0$ and define $m_E(\omega)$ by $m_E(\omega) = m(\omega \mid E)$. Prove that $m_E(\omega)$ is a distribution function on $E$, that is, that $m_E(\omega) \ge 0$ and that $\sum_{\omega \in \Omega} m_E(\omega) = 1$. The function $m_E$ is called the conditional distribution given $E$.

49 You are given two urns each containing two biased coins.
The coins in urn I come up heads with probability $p_1$, and the coins in urn II come up heads with probability $p_2 \ne p_1$. You are given a choice of (a) choosing an urn at random and tossing the two coins in this urn or (b) choosing one coin from each urn and tossing these two coins. You win a prize if both coins turn up heads. Show that you are better off selecting choice (a).

50 Prove that, if $A_1, A_2, \ldots, A_n$ are independent events defined on a sample space $\Omega$ and if $0 < P(A_j) < 1$ for all $j$, then $\Omega$ must have at least $2^n$ points.

51 Prove that if

$$P(A \mid C) \ge P(B \mid C) \quad \text{and} \quad P(A \mid \tilde{C}) \ge P(B \mid \tilde{C}),$$

then $P(A) \ge P(B)$.

52 A coin is in one of $n$ boxes. The probability that it is in the $i$th box is $p_i$. If you search in the $i$th box and it is there, you find it with probability $a_i$. Show that the probability $p$ that the coin is in the $j$th box, given that you have looked in the $i$th box and not found it, is

$$p = \begin{cases} p_j / (1 - a_i p_i), & \text{if } j \ne i, \\ (1 - a_i) p_i / (1 - a_i p_i), & \text{if } j = i. \end{cases}$$

53 George Wolford has suggested the following variation on the Linda problem (see Exercise 1.2.25). The registrar is carrying John and Mary's registration cards and drops them in a puddle. When he picks them up he cannot read the names but on the first card he picked up he can make out Mathematics 23 and Government 35, and on the second card he can make out only Mathematics 23. He asks you if you can help him decide which card belongs to Mary. You know that Mary likes government but does not like mathematics. You know nothing about John and assume that he is just a typical Dartmouth student. From this you estimate:

   P(Mary takes Government 35) = .5,
   P(Mary takes Mathematics 23) = .1,
   P(John takes Government 35) = .3,
   P(John takes Mathematics 23) = .2.

Assume that their choices for courses are independent events.
Show that the card with Mathematics 23 and Government 35 showing is more likely to be Mary's than John's. The conjunction fallacy referred to in the Linda problem would be to assume that the event "Mary takes Mathematics 23 and Government 35" is more likely than the event "Mary takes Mathematics 23." Why are we not making this fallacy here?

54 (Suggested by Eisenberg and Ghosh¹²) A deck of playing cards can be described as a Cartesian product

$$\text{Deck} = \text{Suit} \times \text{Rank},$$

where $\text{Suit} = \{\clubsuit, \diamondsuit, \heartsuit, \spadesuit\}$ and $\text{Rank} = \{2, 3, \ldots, 10, \text{J}, \text{Q}, \text{K}, \text{A}\}$. This just means that every card may be thought of as an ordered pair like $(\diamondsuit, 2)$. By a suit event we mean any event $A$ contained in Deck which is described in terms of Suit alone. For instance, if $A$ is "the suit is red," then

$$A = \{\diamondsuit, \heartsuit\} \times \text{Rank},$$

so that $A$ consists of all cards of the form $(\diamondsuit, r)$ or $(\heartsuit, r)$ where $r$ is any rank. Similarly, a rank event is any event described in terms of rank alone.
   (a) Show that if $A$ is any suit event and $B$ any rank event, then $A$ and $B$ are independent. (We can express this briefly by saying that suit and rank are independent.)
   (b) Throw away the ace of spades. Show that now no nontrivial (i.e., neither empty nor the whole space) suit event $A$ is independent of any nontrivial rank event $B$. Hint: Here independence comes down to

$$c/51 = (a/51) \cdot (b/51),$$

   where $a$, $b$, $c$ are the respective sizes of $A$, $B$, and $A \cap B$. It follows that 51 must divide $ab$, hence that 3 must divide one of $a$ and $b$, and 17 the other. But the possible sizes for suit and rank events preclude this.
   (c) Show that the deck in (b) nevertheless does have pairs $A$, $B$ of nontrivial independent events. Hint: Find 2 events $A$ and $B$ of sizes 3 and 17, respectively, which intersect in a single point.
   (d) Add a joker to a full deck.
Show that now there is no pair $A$, $B$ of nontrivial independent events. Hint: See the hint in (b); 53 is prime.

The following problems are suggested by Stanley Gudder in his article "Do Good Hands Attract?"[13] He says that event $A$ attracts event $B$ if $P(B \mid A) > P(B)$ and repels $B$ if $P(B \mid A) < P(B)$.

55 Let $R_i$ be the event that the $i$th player in a poker game has a royal flush. Show that a royal flush (A, K, Q, J, 10 of one suit) attracts another royal flush, that is, $P(R_2 \mid R_1) > P(R_2)$. Show that a royal flush repels full houses.

56 Prove that $A$ attracts $B$ if and only if $B$ attracts $A$. Hence we can say that $A$ and $B$ are mutually attractive if $A$ attracts $B$.

[12] B. Eisenberg and B. K. Ghosh, "Independent Events in a Discrete Uniform Probability Space," The American Statistician, vol. 41, no. 1 (1987), pp. 52-56.
[13] S. Gudder, "Do Good Hands Attract?" Mathematics Magazine, vol. 54, no. 1 (1981), pp. 13-16.

57 Prove that $A$ neither attracts nor repels $B$ if and only if $A$ and $B$ are independent.

58 Prove that $A$ and $B$ are mutually attractive if and only if $P(B \mid A) > P(B \mid \tilde{A})$.

59 Prove that if $A$ attracts $B$, then $A$ repels $\tilde{B}$.

60 Prove that if $A$ attracts both $B$ and $C$, and $A$ repels $B \cap C$, then $A$ attracts $B \cup C$. Is there any example in which $A$ attracts both $B$ and $C$ and repels $B \cup C$?

61 Prove that if $B_1, B_2, \ldots, B_n$ are mutually disjoint and collectively exhaustive, and if $A$ attracts some $B_i$, then $A$ must repel some $B_j$.

62 (a) Suppose that you are looking in your desk for a letter from some time ago. Your desk has eight drawers, and you assess the probability that it is in any particular drawer is 10% (so there is a 20% chance that it is not in the desk at all). Suppose now that you start searching systematically through your desk, one drawer at a time.
In addition, suppose that you have not found the letter in the first $i$ drawers, where $0 \leq i \leq 7$. Let $p_i$ denote the probability that the letter will be found in the next drawer, and let $q_i$ denote the probability that the letter will be found in some subsequent drawer (both $p_i$ and $q_i$ are conditional probabilities, since they are based upon the assumption that the letter is not in the first $i$ drawers). Show that the $p_i$'s increase and the $q_i$'s decrease. (This problem is from Falk et al.[14])
(b) The following data appeared in an article in the Wall Street Journal.[15] For the ages 20, 30, 40, 50, and 60, the probability of a woman in the U.S. developing cancer in the next ten years is 0.5%, 1.2%, 3.2%, 6.4%, and 10.8%, respectively. At the same set of ages, the probability of a woman in the U.S. eventually developing cancer is 39.6%, 39.5%, 39.1%, 37.5%, and 34.2%, respectively. Do you think that the problem in part (a) gives an explanation for these data?

63 Here are two variations of the Monty Hall problem that are discussed by Granberg.[16]
(a) Suppose that everything is the same except that Monty forgot to find out in advance which door has the car behind it. In the spirit of "the show must go on," he makes a guess at which of the two doors to open and gets lucky, opening a door behind which stands a goat. Now should the contestant switch?

[14] R. Falk, A. Lipson, and C. Konold, "The ups and downs of the hope function in a fruitless search," in Subjective Probability, G. Wright and P. Ayton (eds.) (Chichester: Wiley, 1994), pp. 353-377.
[15] C. Crossen, "Fright by the numbers: Alarming disease data are frequently flawed," Wall Street Journal, 11 April 1996, p. B1.
[16] D. Granberg, "To switch or not to switch," in The Power of Logical Thinking, M. vos Savant (New York: St. Martin's, 1996).
(b) You have observed the show for a long time and found that the car is put behind door A 45% of the time, behind door B 40% of the time, and behind door C 15% of the time. Assume that everything else about the show is the same. Again you pick door A. Monty opens a door with a goat and offers to let you switch. Should you? Suppose you knew in advance that Monty was going to give you a chance to switch. Should you have initially chosen door A?

4.2 Continuous Conditional Probability

In situations where the sample space is continuous we will follow the same procedure as in the previous section. Thus, for example, if $X$ is a continuous random variable with density function $f(x)$, and if $E$ is an event with positive probability, we define a conditional density function by the formula
$$f(x \mid E) = \begin{cases} f(x)/P(E), & \text{if } x \in E, \\ 0, & \text{if } x \notin E. \end{cases}$$
Then for any event $F$, we have
$$P(F \mid E) = \int_F f(x \mid E)\,dx.$$
The expression $P(F \mid E)$ is called the conditional probability of $F$ given $E$. As in the previous section, it is easy to obtain an alternative expression for this probability:
$$P(F \mid E) = \int_F f(x \mid E)\,dx = \int_{E \cap F} \frac{f(x)}{P(E)}\,dx = \frac{P(E \cap F)}{P(E)}.$$
We can think of the conditional density function as being 0 except on $E$, and normalized to have integral 1 over $E$. Note that if the original density is a uniform density corresponding to an experiment in which all events of equal size are equally likely, then the same will be true for the conditional density.

Example 4.18 In the spinner experiment (cf. Example 2.1), suppose we know that the spinner has stopped with head in the upper half of the circle, $0 \leq x \leq 1/2$. What is the probability that $1/6 \leq x \leq 1/3$?
Here $E = [0, 1/2]$, $F = [1/6, 1/3]$, and $F \cap E = F$. Hence
$$P(F \mid E) = \frac{P(F \cap E)}{P(E)} = \frac{1/6}{1/2} = \frac{1}{3},$$
which is reasonable, since $F$ is 1/3 the size of $E$. The conditional density function here is given by
$$f(x \mid E) = \begin{cases} 2, & \text{if } 0 \leq x < 1/2, \\ 0, & \text{if } 1/2 \leq x < 1. \end{cases}$$
Thus the conditional density function is nonzero only on $[0, 1/2]$, and is uniform there. □

Example 4.19 In the dart game (cf. Example 2.8), suppose we know that the dart lands in the upper half of the target. What is the probability that its distance from the center is less than 1/2? Here $E = \{(x, y) : y \geq 0\}$ and $F = \{(x, y) : x^2 + y^2 < (1/2)^2\}$. Hence,
$$P(F \mid E) = \frac{P(F \cap E)}{P(E)} = \frac{(1/\pi)[(1/2)(\pi/4)]}{(1/\pi)(\pi/2)} = 1/4.$$
Here again, the size of $F \cap E$ is 1/4 the size of $E$. The conditional density function is
$$f((x, y) \mid E) = \begin{cases} f(x, y)/P(E) = 2/\pi, & \text{if } (x, y) \in E, \\ 0, & \text{if } (x, y) \notin E. \end{cases}$$
□

Example 4.20 We return to the exponential density (cf. Example 2.17). We suppose that we are observing a lump of plutonium-239. Our experiment consists of waiting for an emission, then starting a clock, and recording the length of time $X$ that passes until the next emission. Experience has shown that $X$ has an exponential density with some parameter $\lambda$, which depends upon the size of the lump. Suppose that when we perform this experiment, we notice that the clock reads $r$ seconds, and is still running. What is the probability that there is no emission in a further $s$ seconds? Let $G(t)$ be the probability that the next particle is emitted after time $t$. Then
$$G(t) = \int_t^\infty \lambda e^{-\lambda x}\,dx = -e^{-\lambda x}\Big|_t^\infty = e^{-\lambda t}.$$
Let $E$ be the event "the next particle is emitted after time $r$" and $F$ the event "the next particle is emitted after time $r + s$." Then
$$P(F \mid E) = \frac{P(F \cap E)}{P(E)} = \frac{G(r + s)}{G(r)} = \frac{e^{-\lambda(r + s)}}{e^{-\lambda r}} = e^{-\lambda s}.$$
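The ratio $G(r+s)/G(r) = e^{-\lambda s}$ can be checked numerically. The following is a minimal sketch, not a program from the text: the function name `conditional_survival`, the rate 0.5, and the values of $r$ and $s$ are illustrative assumptions. It draws exponential waiting times and estimates $P(X > r + s \mid X > r)$ as a ratio of counts.

```python
import math
import random

def conditional_survival(lam, r, s, trials=200_000, seed=0):
    """Estimate P(X > r + s | X > r) for X exponential with rate lam.

    Hypothetical helper for illustration; it is not taken from the text.
    """
    rng = random.Random(seed)
    survived_r = 0          # number of samples with X > r
    survived_r_plus_s = 0   # number of those that also have X > r + s
    for _ in range(trials):
        x = rng.expovariate(lam)
        if x > r:
            survived_r += 1
            if x > r + s:
                survived_r_plus_s += 1
    return survived_r_plus_s / survived_r

lam, s = 0.5, 1.0                 # assumed values, chosen only for the demo
exact = math.exp(-lam * s)        # e^{-lambda s}, which does not involve r
for r in (0.0, 1.0, 3.0):
    estimate = conditional_survival(lam, r, s)
    print(f"r = {r}: estimate {estimate:.3f}, exact {exact:.3f}")
```

Under these assumptions, the estimates for the different values of $r$ should all lie close to the same number $e^{-\lambda s}$, up to sampling error.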
This tells us the rather surprising fact that the probability that we have to wait $s$ seconds more for an emission, given that there has been no emission in $r$ seconds, is independent of the time $r$. This property (called the memoryless property) was introduced in Example 2.17. When trying to model various phenomena, this property is helpful in deciding whether the exponential density is appropriate.

The fact that the exponential density is memoryless means that it is reasonable to assume that if one comes upon a lump of a radioactive isotope at some random time, then the amount of time until the next emission has an exponential density with the same parameter as the time between emissions. A well-known example, known as the "bus paradox," replaces the emissions by buses. The apparent paradox arises from the following two facts: 1) If you know that, on the average, the buses come by every 30 minutes, then if you come to the bus stop at a random time, you should only have to wait, on the average, for 15 minutes for a bus, and 2) Since the buses' arrival times are being modelled by the exponential density, then no matter when you arrive, you will have to wait, on the average, for 30 minutes for a bus.

The reader can now see that in Exercises 2.2.9, 2.2.10, and 2.2.11, we were asking for simulations of conditional probabilities, under various assumptions on the distribution of the interarrival times. If one makes a reasonable assumption about this distribution, such as the one in Exercise 2.2.10, then the average waiting time is more nearly one-half the average interarrival time. □

Independent Events

If $E$ and $F$ are two events with positive probability in a continuous sample space, then, as in the case of discrete sample spaces, we define $E$ and $F$ to be independent if $P(E \mid F) = P(E)$ and $P(F \mid E) = P(F)$.
As before, each of the above equations implies the other, so that to see whether two events are independent, only one of these equations must be checked. It is also the case that, if $E$ and $F$ are independent, then $P(E \cap F) = P(E)P(F)$.

Example 4.21 (Example 4.19 continued) In the dart game (see Example 4.19), let $E$ be the event that the dart lands in the upper half of the target ($y \geq 0$) and $F$ the event that the dart lands in the right half of the target ($x \geq 0$). Then $P(E \cap F)$ is the probability that the dart lies in the first quadrant of the target, and
$$P(E \cap F) = \frac{1}{\pi}\int_{E \cap F} 1\,dx\,dy = \frac{\text{Area}(E \cap F)}{\pi} = \frac{\text{Area}(E)}{\pi} \cdot \frac{\text{Area}(F)}{\pi} = \left(\frac{1}{\pi}\int_E 1\,dx\,dy\right)\left(\frac{1}{\pi}\int_F 1\,dx\,dy\right) = P(E)P(F),$$
so that $E$ and $F$ are independent. What makes this work is that the events $E$ and $F$ are described by restricting different coordinates. This idea is made more precise below. □

Joint Density and Cumulative Distribution Functions

In a manner analogous with discrete random variables, we can define joint density functions and cumulative distribution functions for multi-dimensional continuous random variables.

Definition 4.6 Let $X_1, X_2, \ldots, X_n$ be continuous random variables associated with an experiment, and let $\bar{X} = (X_1, X_2, \ldots, X_n)$. Then the joint cumulative distribution function of $\bar{X}$ is defined by
$$F(x_1, x_2, \ldots, x_n) = P(X_1 \leq x_1, X_2 \leq x_2, \ldots, X_n \leq x_n).$$
The joint density function of $\bar{X}$ satisfies the following equation:
$$F(x_1, x_2, \ldots, x_n) = \int_{-\infty}^{x_1}\int_{-\infty}^{x_2}\cdots\int_{-\infty}^{x_n} f(t_1, t_2, \ldots, t_n)\,dt_n\,dt_{n-1}\cdots dt_1.$$
□

It is straightforward to show that, in the above notation,
$$f(x_1, x_2, \ldots, x_n) = \frac{\partial^n F(x_1, x_2, \ldots, x_n)}{\partial x_1\,\partial x_2 \cdots \partial x_n}. \qquad (4.4)$$

Independent Random Variables

As with discrete random variables, we can define mutual independence of continuous random variables.
Definition 4.7 Let $X_1, X_2, \ldots, X_n$ be continuous random variables with cumulative distribution functions $F_1(x), F_2(x), \ldots, F_n(x)$. Then these random variables are mutually independent if
$$F(x_1, x_2, \ldots, x_n) = F_1(x_1)F_2(x_2)\cdots F_n(x_n)$$
for any choice of $x_1, x_2, \ldots, x_n$. Thus, if $X_1, X_2, \ldots, X_n$ are mutually independent, then the joint cumulative distribution function of the random variable $\bar{X} = (X_1, X_2, \ldots, X_n)$ is just the product of the individual cumulative distribution functions. When two random variables are mutually independent, we shall say more briefly that they are independent. □

Using Equation 4.4, the following theorem can easily be shown to hold for mutually independent continuous random variables.

Theorem 4.2 Let $X_1, X_2, \ldots, X_n$ be continuous random variables with density functions $f_1(x), f_2(x), \ldots, f_n(x)$. Then these random variables are mutually independent if and only if
$$f(x_1, x_2, \ldots, x_n) = f_1(x_1)f_2(x_2)\cdots f_n(x_n)$$
for any choice of $x_1, x_2, \ldots, x_n$. □

Figure 4.7: $X_1$ and $X_2$ are independent.

Let's look at some examples.

Example 4.22 In this example, we define three random variables, $X_1$, $X_2$, and $X_3$. We will show that $X_1$ and $X_2$ are independent, and that $X_1$ and $X_3$ are not independent. Choose a point $\omega = (\omega_1, \omega_2)$ at random from the unit square. Set $X_1 = \omega_1^2$, $X_2 = \omega_2^2$, and $X_3 = \omega_1 + \omega_2$. Find the joint distributions $F_{12}(r_1, r_2)$ and $F_{23}(r_2, r_3)$. We have already seen (see Example 2.13) that
$$F_1(r_1) = P(-\infty < X_1 \leq r_1) = \sqrt{r_1}, \quad \text{if } 0 \leq r_1 \leq 1,$$
and similarly,
$$F_2(r_2) = \sqrt{r_2}, \quad \text{if } 0 \leq r_2 \leq 1.$$
Now we have (see Figure 4.7)
$$F_{12}(r_1, r_2) = P(X_1 \leq r_1 \text{ and } X_2 \leq r_2) = P(\omega_1 \leq \sqrt{r_1} \text{ and } \omega_2 \leq \sqrt{r_2}) = \text{Area}(E_1) = \sqrt{r_1}\sqrt{r_2} = F_1(r_1)F_2(r_2).$$
In this case $F_{12}(r_1, r_2) = F_1(r_1)F_2(r_2)$ so that $X_1$ and $X_2$ are independent. On the other hand, if $r_1 = 1/4$ and $r_3 = 1$, then (see Figure 4.8)
$$F_{13}(1/4, 1) = P(X_1 \leq 1/4,\ X_3 \leq 1) = P(\omega_1 \leq 1/2,\ \omega_1 + \omega_2 \leq 1) = \text{Area}(E_2) = \frac{1}{2} - \frac{1}{8} = \frac{3}{8}.$$

Figure 4.8: $X_1$ and $X_3$ are not independent.

Now recalling that
$$F_3(r_3) = \begin{cases} 0, & \text{if } r_3 < 0, \\ (1/2)r_3^2, & \text{if } 0 \leq r_3 \leq 1, \\ 1 - (1/2)(2 - r_3)^2, & \text{if } 1 \leq r_3 \leq 2, \\ 1, & \text{if } 2 < r_3, \end{cases}$$
(see Example 2.14), we have $F_1(1/4)F_3(1) = (1/2)(1/2) = 1/4$. Since $3/8 \neq 1/4$, $X_1$ and $X_3$ are not independent random variables. A similar calculation shows that $X_2$ and $X_3$ are not independent either. □

Although we shall not prove it here, the following theorem is a useful one. The statement also holds for mutually independent discrete random variables. A proof may be found in Rényi.[17]

Theorem 4.3 Let $X_1, X_2, \ldots, X_n$ be mutually independent continuous random variables and let $\phi_1(x), \phi_2(x), \ldots, \phi_n(x)$ be continuous functions. Then $\phi_1(X_1), \phi_2(X_2), \ldots, \phi_n(X_n)$ are mutually independent. □

Independent Trials

Using the notion of independence, we can now formulate for continuous sample spaces the notion of independent trials (see Definition 4.5).

[17] A. Rényi, Probability Theory (Budapest: Akadémiai Kiadó, 1970), p. 183.

Figure 4.9: Beta density for $\alpha = \beta = .5, 1, 2$.

Definition 4.8 A sequence $X_1, X_2, \ldots, X_n$ of random variables $X_i$ that are mutually independent and have the same density is called an independent trials process.
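Returning briefly to Example 4.22, the two joint distribution values computed there can be checked by simulation. This is a minimal sketch under stated assumptions: the function name `estimate_joint_cdfs` and the test point $r_2 = 0.49$ are illustrative choices, not part of the text. It picks $\omega$ uniformly in the unit square and counts how often the relevant events occur.

```python
import random

def estimate_joint_cdfs(trials=200_000, seed=1):
    """Estimate F12(1/4, 0.49) and F13(1/4, 1) for the variables of Example 4.22.

    Hypothetical helper for illustration; r2 = 0.49 is an assumed test point.
    """
    rng = random.Random(seed)
    r1, r2, r3 = 0.25, 0.49, 1.0
    count12 = 0
    count13 = 0
    for _ in range(trials):
        w1, w2 = rng.random(), rng.random()      # omega = (w1, w2) in the unit square
        x1, x2, x3 = w1 * w1, w2 * w2, w1 + w2   # X1, X2, X3 as defined in the example
        if x1 <= r1 and x2 <= r2:
            count12 += 1
        if x1 <= r1 and x3 <= r3:
            count13 += 1
    return count12 / trials, count13 / trials

f12, f13 = estimate_joint_cdfs()
# Products of the marginals: F1(1/4) F2(0.49) = 0.5 * 0.7, and F1(1/4) F3(1) = 0.5 * 0.5
print(f"F12 estimate {f12:.3f} vs product of marginals {0.5 * 0.7:.3f}")
print(f"F13 estimate {f13:.3f} vs product of marginals {0.5 * 0.5:.3f}")
```

The first estimate should agree with the product of the marginals, while the second should be near 3/8 rather than 1/4, in line with the calculation in the example.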
□

As in the case of discrete random variables, these independent trials processes arise naturally in situations where an experiment described by a single random variable is repeated $n$ times.

Beta Density

We consider next an example which involves a sample space with both discrete and continuous coordinates. For this example we shall need a new density function called the beta density. This density has two parameters $\alpha$, $\beta$ and is defined by
$$B(\alpha, \beta, x) = \begin{cases} (1/B(\alpha, \beta))\,x^{\alpha - 1}(1 - x)^{\beta - 1}, & \text{if } 0 \leq x \leq 1, \\ 0, & \text{otherwise}. \end{cases}$$
Here $\alpha$ and $\beta$ are any positive numbers, and the beta function $B(\alpha, \beta)$ is given by the area under the graph of $x^{\alpha - 1}(1 - x)^{\beta - 1}$ between 0 and 1:
$$B(\alpha, \beta) = \int_0^1 x^{\alpha - 1}(1 - x)^{\beta - 1}\,dx.$$
Note that when $\alpha = \beta = 1$ the beta density is the uniform density. When $\alpha$ and $\beta$ are greater than 1 the density is bell-shaped, but when they are less than 1 it is U-shaped, as suggested by the examples in Figure 4.9.

We shall need the values of the beta function only for integer values of $\alpha$ and $\beta$, and in this case
$$B(\alpha, \beta) = \frac{(\alpha - 1)!\,(\beta - 1)!}{(\alpha + \beta - 1)!}.$$

Example 4.23 In medical problems it is often assumed that a drug is effective with a probability $x$ each time it is used and the various trials are independent, so that one is, in effect, tossing a biased coin with probability $x$ for heads. Before further experimentation, you do not know the value $x$ but past experience might give some information about its possible values. It is natural to represent this information by sketching a density function to determine a distribution for $x$. Thus, we are considering $x$ to be a continuous random variable, which takes on values between 0 and 1. If you have no knowledge at all, you would sketch the uniform density. If past experience suggests that $x$ is very likely to be near 2/3, you would sketch a density with maximum at 2/3 and a spread reflecting your uncertainty in the estimate of 2/3.
You would then want to find a density function that reasonably fits your sketch. The beta densities provide a class of densities that can be fit to most sketches you might make. For example, for $\alpha > 1$ and $\beta > 1$ it is bell-shaped with the parameters $\alpha$ and $\beta$ determining its peak and its spread.

Assume that the experimenter has chosen a beta density to describe the state of his knowledge about $x$ before the experiment. Then he gives the drug to $n$ subjects and records the number $i$ of successes. The number $i$ is a discrete random variable, so we may conveniently describe the set of possible outcomes of this experiment by referring to the ordered pair $(x, i)$.

We let $m(i \mid x)$ denote the probability that we observe $i$ successes given the value of $x$. By our assumptions, $m(i \mid x)$ is the binomial distribution with probability $x$ for success:
$$m(i \mid x) = b(n, x, i) = \binom{n}{i} x^i (1 - x)^j,$$
where $j = n - i$.

If $x$ is chosen at random from $[0, 1]$ with a beta density $B(\alpha, \beta, x)$, then the density function for the outcome of the pair $(x, i)$ is
$$f(x, i) = m(i \mid x)\,B(\alpha, \beta, x) = \binom{n}{i} x^i (1 - x)^j\,\frac{1}{B(\alpha, \beta)}\,x^{\alpha - 1}(1 - x)^{\beta - 1} = \binom{n}{i}\frac{1}{B(\alpha, \beta)}\,x^{\alpha + i - 1}(1 - x)^{\beta + j - 1}.$$
Now let $m(i)$ be the probability that we observe $i$ successes not knowing the value of $x$. Then
$$m(i) = \int_0^1 m(i \mid x)\,B(\alpha, \beta, x)\,dx = \binom{n}{i}\frac{1}{B(\alpha, \beta)}\int_0^1 x^{\alpha + i - 1}(1 - x)^{\beta + j - 1}\,dx = \binom{n}{i}\frac{B(\alpha + i, \beta + j)}{B(\alpha, \beta)}.$$
Hence, the probability density $f(x \mid i)$ for $x$, given that $i$ successes were observed, is
$$f(x \mid i) = \frac{f(x, i)}{m(i)} = \frac{x^{\alpha + i - 1}(1 - x)^{\beta + j - 1}}{B(\alpha + i, \beta + j)}, \qquad (4.5)$$
that is, $f(x \mid i)$ is another beta density. This says that if we observe $i$ successes and $j$ failures in $n$ subjects, then the new density for the probability that the drug is effective is again a beta density but with parameters $\alpha + i$, $\beta + j$.

Now we assume that before the experiment we choose a beta density with parameters $\alpha$ and $\beta$, and that in the experiment we obtain $i$ successes in $n$ trials. We have just seen that in this case, the new density for $x$ is a beta density with parameters $\alpha + i$ and $\beta + j$.

Now we wish to calculate the probability that the drug is effective on the next subject. For any particular real number $t$ between 0 and 1, the density of $x$ at the value $t$ is given by the expression in Equation 4.5. Given that $x$ has the value $t$, the probability that the drug is effective on the next subject is just $t$. Thus, to obtain the probability that the drug is effective on the next subject, we integrate the product of the expression in Equation 4.5 and $t$ over all possible values of $t$. We obtain:
$$\frac{1}{B(\alpha + i, \beta + j)}\int_0^1 t \cdot t^{\alpha + i - 1}(1 - t)^{\beta + j - 1}\,dt = \frac{B(\alpha + i + 1, \beta + j)}{B(\alpha + i, \beta + j)} = \frac{(\alpha + i)!\,(\beta + j - 1)!}{(\alpha + \beta + i + j)!} \cdot \frac{(\alpha + \beta + i + j - 1)!}{(\alpha + i - 1)!\,(\beta + j - 1)!} = \frac{\alpha + i}{\alpha + \beta + n}.$$
If $n$ is large, then our estimate for the probability of success after the experiment is approximately the proportion of successes observed in the experiment, which is certainly a reasonable conclusion. □

The next example is another in which the true probabilities are unknown and must be estimated based upon experimental data.

Example 4.24 (Two-armed bandit problem) You are in a casino and confronted by two slot machines. Each machine pays off either 1 dollar or nothing.
The probability that the first machine pays off a dollar is $x$ and that the second machine pays off a dollar is $y$. We assume that $x$ and $y$ are random numbers chosen independently from the interval $[0, 1]$ and unknown to you. You are permitted to make a series of ten plays, each time choosing one machine or the other. How should you choose to maximize the number of times that you win?

One strategy that sounds reasonable is to calculate, at every stage, the probability that each machine will pay off and choose the machine with the higher probability. Let win($i$), for $i = 1$ or 2, be the number of times that you have won on the $i$th machine. Similarly, let lose($i$) be the number of times you have lost on the $i$th machine. Then, from Example 4.23, the probability $p(i)$ that you win if you choose the $i$th machine is
$$p(i) = \frac{\text{win}(i) + 1}{\text{win}(i) + \text{lose}(i) + 2}.$$
Thus, if $p(1) > p(2)$ you would play machine 1 and otherwise you would play machine 2.

We have written a program TwoArm to simulate this experiment. In the program, the user specifies the initial values for $x$ and $y$ (but these are unknown to the experimenter). The program calculates at each stage the two conditional densities for $x$ and $y$, given the outcomes of the previous trials, and then computes $p(i)$, for $i = 1, 2$. It then chooses the machine with the highest value for the probability of winning for the next play. The program prints the machine chosen on each play and the outcome of this play. It also plots the new densities for $x$ (solid line) and $y$ (dotted line), showing only the current densities. We have run the program for ten plays for the case $x = .6$ and $y = .7$. The result is shown in Figure 4.10.

Figure 4.10: Play the best machine.
The run of the program shows the weakness of this strategy. Our initial probability for winning on the better of the two machines is .7. We start with the poorer machine and our outcomes are such that we always have a probability greater than .6 of winning and so we just keep playing this machine even though the other machine is better. If we had lost on the first play we would have switched machines. Our final density for $y$ is the same as our initial density, namely, the uniform density. Our final density for $x$ is different and reflects a much more accurate knowledge about $x$. The computer did pretty well with this strategy, winning seven out of the ten trials, but ten trials are not enough to judge whether this is a good strategy in the long run.

Another popular strategy is the play-the-winner strategy. As the name suggests, for this strategy we choose the same machine when we win and switch machines when we lose. The program TwoArm will simulate this strategy as well. In Figure 4.11, we show the results of running this program with the play-the-winner strategy and the same true probabilities of .6 and .7 for the two machines. After ten plays our densities for the unknown probabilities of winning suggest to us that the second machine is indeed the better of the two. We again won seven out of the ten trials.

Figure 4.11: Play the winner.

Neither of the strategies that we simulated is the best one in terms of maximizing our average winnings. This best strategy is very complicated but is reasonably approximated by the play-the-winner strategy. Variations on this example have played an important role in the problem of clinical tests of drugs where experimenters face a similar situation.
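The two strategies described in this example can be sketched in a few lines. The code below is a hedged illustration, not the TwoArm program itself: all function names are hypothetical, it produces no density plots, and it assumes that play-the-winner starts on machine 1. The tie rule for play-the-best-machine follows the rule stated in the example (machine 1 only if $p(1) > p(2)$).

```python
import random

def posterior_win_probability(wins, losses):
    """p(i) = (win(i) + 1) / (win(i) + lose(i) + 2), as in Example 4.24."""
    return (wins + 1) / (wins + losses + 2)

def play_best_machine(probs, plays, rng):
    """Play the machine whose current p(i) is larger; return the number of wins."""
    wins = [0, 0]
    losses = [0, 0]
    total = 0
    for _ in range(plays):
        p1 = posterior_win_probability(wins[0], losses[0])
        p2 = posterior_win_probability(wins[1], losses[1])
        machine = 0 if p1 > p2 else 1     # index 0 is machine 1, index 1 is machine 2
        if rng.random() < probs[machine]:
            wins[machine] += 1
            total += 1
        else:
            losses[machine] += 1
    return total

def play_the_winner(probs, plays, rng):
    """Stay after a win, switch after a loss; return the number of wins."""
    machine = 0                           # assumption: start on machine 1
    total = 0
    for _ in range(plays):
        if rng.random() < probs[machine]:
            total += 1
        else:
            machine = 1 - machine
    return total

def average_wins(strategy, probs=(0.6, 0.7), plays=10, repetitions=20_000, seed=2):
    """Average number of wins per series of plays over many repetitions."""
    rng = random.Random(seed)
    return sum(strategy(probs, plays, rng) for _ in range(repetitions)) / repetitions

print("play the best machine:", average_wins(play_best_machine))
print("play the winner:      ", average_wins(play_the_winner))
```

Averaging over many repetitions, rather than looking at a single run of ten plays, is what makes a comparison of the two strategies meaningful; the payoff probabilities .6 and .7 are the ones used in the example.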
□

Exercises

1 Pick a point $x$ at random (with uniform density) in the interval $[0, 1]$. Find the probability that $x > 1/2$, given that
(a) $x > 1/4$.
(b) $x < 3/4$.
(c) $|x - 1/2| < 1/4$.
(d) $x^2 - x + 2/9 < 0$.

2 A radioactive material emits particles at a rate described by the density function
$$f(t) = .1e^{-.1t}.$$
Find the probability that a particle is emitted in the first 10 seconds, given that
(a) no particle is emitted in the first second.
(b) no particle is emitted in the first 5 seconds.
(c) a particle is emitted in the first 3 seconds.
(d) a particle is emitted in the first 20 seconds.

3 The Acme Super light bulb is known to have a useful life described by the density function
$$f(t) = .01e^{-.01t},$$
where time $t$ is measured in hours.
(a) Find the failure rate of this bulb (see Exercise 2.2.6).
(b) Find the reliability of this bulb after 20 hours.
(c) Given that it lasts 20 hours, find the probability that the bulb lasts another 20 hours.
(d) Find the probability that the bulb burns out in the forty-first hour, given that it lasts 40 hours.

4 Suppose you toss a dart at a circular target of radius 10 inches. Given that the dart lands in the upper half of the target, find the probability that
(a) it lands in the right half of the target.
(b) its distance from the center is less than 5 inches.
(c) its distance from the center is greater than 5 inches.
(d) it lands within 5 inches of the point $(0, 5)$.

5 Suppose you choose two numbers $x$ and $y$, independently at random from the interval $[0, 1]$. Given that their sum lies in the interval $[0, 1]$, find the probability that
(a) $|x - y| < 1$.
(b) $xy < 1/2$.
(c) $\max\{x, y\} < 1/2$.
(d) $x^2 + y^2 < 1/4$.
(e) $x > y$.

6 Find the conditional density functions for the following experiments.
(a) A number $x$ is chosen at random in the interval $[0, 1]$, given that $x > 1/4$.
(b) A number $t$ is chosen at random in the interval $[0, \infty)$ with exponential density $e^{-t}$, given that $1 < t < 10$.
(c) A dart is thrown at a circular target of radius 10 inches, given that it falls in the upper half of the target.
(d) Two numbers $x$ and $y$ are chosen at random in the interval $[0, 1]$, given that $x > y$.

7 Let $x$ and $y$ be chosen at random from the interval $[0, 1]$. Show that the events $x > 1/3$ and $y > 2/3$ are independent events.

8 Let $x$ and $y$ be chosen at random from the interval $[0, 1]$. Which pairs of the following events are independent?
(a) $x > 1/3$.
(b) $y > 2/3$.
(c) $x > y$.
(d) $x + y < 1$.

9 Suppose that $X$ and $Y$ are continuous random variables with density functions $f_X(x)$ and $f_Y(y)$, respectively. Let $f(x, y)$ denote the joint density function of $(X, Y)$. Show that
$$\int_{-\infty}^{\infty} f(x, y)\,dy = f_X(x),$$
and
$$\int_{-\infty}^{\infty} f(x, y)\,dx = f_Y(y).$$

*10 In Exercise 2.2.12 you proved the following: If you take a stick of unit length and break it into three pieces, choosing the breaks at random (i.e., choosing two real numbers independently and uniformly from $[0, 1]$), then the probability that the three pieces form a triangle is 1/4. Consider now a similar experiment: First break the stick at random, then break the longer piece at random. Show that the two experiments are actually quite different, as follows:
(a) Write a program which simulates both cases for a run of 1000 trials, prints out the proportion of successes for each run, and repeats this process ten times. (Call a trial a success if the three pieces do form a triangle.) Have your program pick $(x, y)$ at random in the unit square, and in each case use $x$ and $y$ to find the two breaks. For each experiment, have it plot $(x, y)$ if $(x, y)$ gives a success.
(b) Show that in the second experiment the theoretical probability of success is actually $2\log 2 - 1$.

11 A coin has an unknown bias $p$ that is assumed to be uniformly distributed between 0 and 1. The coin is tossed $n$ times and heads turns up $j$ times and tails turns up $k$ times. We have seen that the probability that heads turns up next time is
$$\frac{j + 1}{n + 2}.$$
Show that this is the same as the probability that the next ball is black for the Polya urn model of Exercise 4.1.20. Use this result to explain why, in the Polya urn model, the proportion of black balls does not tend to 0 or 1 as one might expect but rather to a uniform distribution on the interval $[0, 1]$.

12 Previous experience with a drug suggests that the probability $p$ that the drug is effective is a random quantity having a beta density with parameters $\alpha = 2$ and $\beta = 3$. The drug is used on ten subjects and found to be successful in four out of the ten patients. What density should we now assign to the probability $p$? What is the probability that the drug will be successful the next time it is used?

13 Write a program to allow you to compare the strategies play-the-winner and play-the-best-machine for the two-armed bandit problem of Example 4.24. Have your program determine the initial payoff probabilities for each machine by choosing a pair of random numbers between 0 and 1. Have your program carry out 20 plays and keep track of the number of wins for each of the two strategies. Finally, have your program make 1000 repetitions of the 20 plays and compute the average winning per 20 plays. Which strategy seems to be the best? Repeat these simulations with 20 replaced by 100. Does your answer to the above question change?

14 Consider the two-armed bandit problem of Example 4.24.
Bruce Barnes proposed the following strategy, which is a variation on the play-the-best-machine strategy. The machine with the greatest probability of winning is played unless the following two conditions hold: (a) the difference in the probabilities for winning is less than .08, and (b) the ratio of the number of times played on the more often played machine to the number of times played on the less often played machine is greater than 1.4. If the above two conditions hold, then the machine with the smaller probability of winning is played. Write a program to simulate this strategy. Have your program choose the initial payoff probabilities at random from the unit interval $[0, 1]$, make 20 plays, and keep track of the number of wins. Repeat this experiment 1000 times and obtain the average number of wins per 20 plays. Implement a second strategy (for example, play-the-best-machine or one of your own choice), and see how this second strategy compares with Bruce's on average wins.

4.3 Paradoxes

Much of this section is based on an article by Snell and Vanderbei.[18]

One must be very careful in dealing with problems involving conditional probability. The reader will recall that in the Monty Hall problem (Example 4.6), if the contestant chooses the door with the car behind it, then Monty has a choice of doors to open. We made an assumption that in this case, he will choose each door with probability 1/2. We then noted that if this assumption is changed, the answer to the original question changes. In this section, we will study other examples of the same phenomenon.

Example 4.25 Consider a family with two children. Given that one of the children is a boy, what is the probability that both children are boys?
One way to approach this problem is to say that the other child is equally likely to be a boy or a girl, so the probability that both children are boys is 1/2. The "textbook" solution would be to draw the tree diagram and then form the conditional tree by deleting paths to leave only those paths that are consistent with the given information. The result is shown in Figure 4.12. We see that the probability of two boys given a boy in the family is not 1/2 but rather 1/3. □

[Figure 4.12: Tree for Example 4.25.]

[18] J. L. Snell and R. Vanderbei, "Three Bewitching Paradoxes," in Topics in Contemporary Probability and Its Applications, CRC Press, Boca Raton, 1995.

This problem and others like it are discussed in Bar-Hillel and Falk.[19] These authors stress that the answer to conditional probabilities of this kind can change depending upon how the information given was actually obtained. For example, they show that 1/2 is the correct answer for the following scenario.

Example 4.26 Mr. Smith is the father of two. We meet him walking along the street with a young boy whom he proudly introduces as his son. What is the probability that Mr. Smith's other child is also a boy?

As usual we have to make some additional assumptions. For example, we will assume that if Mr. Smith has a boy and a girl, he is equally likely to choose either one to accompany him on his walk. In Figure 4.13 we show the tree analysis of this problem and we see that 1/2 is, indeed, the correct answer. □

Example 4.27 It is not so easy to think of reasonable scenarios that would lead to the classical 1/3 answer.
An attempt was made by Stephen Geller in proposing this problem to Marilyn vos Savant.[20] Geller's problem is as follows: A shopkeeper says she has two new baby beagles to show you, but she doesn't know whether they're both male, both female, or one of each sex. You tell her that you want only a male, and she telephones the fellow who's giving them a bath. "Is at least one a male?" she asks. "Yes," she informs you with a smile. What is the probability that the other one is male?

The reader is asked to decide whether the model which gives an answer of 1/3 is a reasonable one to use in this case. □

[Figure 4.13: Tree for Example 4.26.]

[19] M. Bar-Hillel and R. Falk, "Some teasers concerning conditional probabilities," Cognition, vol. 11 (1982), pgs. 109-122.
[20] M. vos Savant, "Ask Marilyn," Parade Magazine, 9 September; 2 December; 17 February 1990, reprinted in Marilyn vos Savant, Ask Marilyn, St. Martins, New York, 1992.

In the preceding examples, the apparent paradoxes could easily be resolved by clearly stating the model that is being used and the assumptions that are being made. We now turn to some examples in which the paradoxes are not so easily resolved.

Example 4.28 Two envelopes each contain a certain amount of money. One envelope is given to Ali and the other to Baba and they are told that one envelope contains twice as much money as the other. However, neither knows who has the larger prize.
Before anyone has opened their envelope, Ali is asked if she would like to trade her envelope with Baba. She reasons as follows: Assume that the amount in my envelope is \(x\). If I switch, I will end up with \(x/2\) with probability 1/2, and \(2x\) with probability 1/2. If I were given the opportunity to play this game many times, and if I were to switch each time, I would, on average, get
\[ \frac{1}{2}\cdot\frac{x}{2} + \frac{1}{2}\cdot 2x = \frac{5}{4}x. \]
This is greater than my average winnings if I didn't switch.

Of course, Baba is presented with the same opportunity and reasons in the same way to conclude that he too would like to switch. So they switch and each thinks that his/her net worth just went up by 25%. Since neither has yet opened any envelope, this process can be repeated and so again they switch. Now they are back with their original envelopes and yet they think that their fortune has increased 25% twice. By this reasoning, they could convince themselves that by repeatedly switching the envelopes, they could become arbitrarily wealthy. Clearly, something is wrong with the above reasoning, but where is the mistake?

One of the tricks of making paradoxes is to make them slightly more difficult than is necessary to further befuddle us. As John Finn has suggested, in this paradox we could just as well have started with a simpler problem. Suppose Ali and Baba know that I am going to give them either an envelope with $5 or one with $10 and I am going to toss a coin to decide which to give to Ali, and then give the other to Baba. Then Ali can argue that Baba has \(2x\) with probability 1/2 and \(x/2\) with probability 1/2. This leads Ali to the same conclusion as before. But now it is clear that this is nonsense, since if Ali has the envelope containing $5, Baba cannot possibly have half of this, namely $2.50, since that was not even one of the choices.
Similarly, if Ali has $10, Baba cannot have twice as much, namely $20. In fact, in this simpler problem the possible outcomes are given by the tree diagram in Figure 4.14. From the diagram, it is clear that neither is made better off by switching. □

[Figure 4.14: John Finn's version of Example 4.28.]

In the above example, Ali's reasoning is incorrect because she infers that if the amount in her envelope is \(x\), then the probability that her envelope contains the smaller amount is 1/2, and the probability that her envelope contains the larger amount is also 1/2. In fact, these conditional probabilities depend upon the distribution of the amounts that are placed in the envelopes.

For definiteness, let \(X\) denote the positive integer-valued random variable which represents the smaller of the two amounts in the envelopes. Suppose, in addition, that we are given the distribution of \(X\), i.e., for each positive integer \(x\), we are given the value of
\[ p_x = P(X = x). \]
(In Finn's example, \(p_5 = 1\), and \(p_n = 0\) for all other values of \(n\).) Then it is easy to calculate the conditional probability that an envelope contains the smaller amount, given that it contains \(x\) dollars. The two possible sample points are \((x, x/2)\) and \((x, 2x)\). If \(x\) is odd, then the first sample point has probability 0, since \(x/2\) is not an integer, so the desired conditional probability is 1 that \(x\) is the smaller amount. If \(x\) is even, then the two sample points have probabilities \(p_{x/2}\) and \(p_x\), respectively, so the conditional probability that \(x\) is the smaller amount is
\[ \frac{p_x}{p_{x/2} + p_x}, \]
which is not necessarily equal to 1/2.

Steven Brams and D. Marc Kilgour[21] study the problem, for different distributions, of whether or not one should switch envelopes, if one's objective is to maximize the long-term average winnings.
Let \(x\) be the amount in your envelope. They show that for any distribution of \(X\), there is at least one value of \(x\) such that you should switch. They give an example of a distribution for which there is exactly one value of \(x\) such that you should switch (see Exercise 5). Perhaps the most interesting case is a distribution in which you should always switch. We now give this example.

Example 4.29 Suppose that we have two envelopes in front of us, and that one envelope contains twice the amount of money as the other (both amounts are positive integers). We are given one of the envelopes, and asked if we would like to switch.

[21] S. J. Brams and D. M. Kilgour, "The Box Problem: To Switch or Not to Switch," Mathematics Magazine, vol. 68, no. 1 (1995), p. 29.

As above, we let \(X\) denote the smaller of the two amounts in the envelopes, and let
\[ p_x = P(X = x). \]
We are now in a position where we can calculate the long-term average winnings, if we switch. (This long-term average is an example of a probabilistic concept known as expectation, and will be discussed in Chapter 6.) Given that one of the two sample points has occurred, the probability that it is the point \((x, x/2)\) is
\[ \frac{p_{x/2}}{p_{x/2} + p_x}, \]
and the probability that it is the point \((x, 2x)\) is
\[ \frac{p_x}{p_{x/2} + p_x}. \]
Thus, if we switch, our long-term average winnings are
\[ \frac{p_{x/2}}{p_{x/2} + p_x}\cdot\frac{x}{2} + \frac{p_x}{p_{x/2} + p_x}\cdot 2x. \]
If this is greater than \(x\), then it pays in the long run for us to switch. Some routine algebra shows that the above expression is greater than \(x\) if and only if
\[ \frac{p_{x/2}}{p_{x/2} + p_x} < \frac{2}{3}. \tag{4.6} \]
It is interesting to consider whether there is a distribution on the positive integers such that the inequality 4.6 is true for all even values of \(x\). Brams and Kilgour[22] give the following example.
We define \(p_x\) as follows:
\[ p_x = \begin{cases} \dfrac{1}{3}\left(\dfrac{2}{3}\right)^{k-1}, & \text{if } x = 2^k, \\ 0, & \text{otherwise.} \end{cases} \]
It is easy to calculate (see Exercise 4) that for all relevant values of \(x\), we have
\[ \frac{p_{x/2}}{p_{x/2} + p_x} = \frac{3}{5}, \]
which means that the inequality 4.6 is always true. □

So far, we have been able to resolve paradoxes by clearly stating the assumptions being made and by precisely stating the models being used. We end this section by describing a paradox which we cannot resolve.

Example 4.30 Suppose that we have two envelopes in front of us, and we are told that the envelopes contain \(X\) and \(Y\) dollars, respectively, where \(X\) and \(Y\) are different positive integers. We randomly choose one of the envelopes, and we open it, revealing \(X\), say. Is it possible to determine, with probability greater than 1/2, whether \(X\) is the smaller of the two dollar amounts? Even if we have no knowledge of the joint distribution of \(X\) and \(Y\), the surprising answer is yes! Here's how to do it. Toss a fair coin until the first time that heads turns up. Let \(Z\) denote the number of tosses required plus 1/2. If \(Z > X\), then we say that \(X\) is the smaller of the two amounts, and if \(Z < X\), then we say that \(X\) is the larger of the two amounts.

First, if \(Z\) lies between \(X\) and \(Y\), then we are sure to be correct. Since \(X\) and \(Y\) are unequal, \(Z\) lies between them with positive probability. Second, if \(Z\) is not between \(X\) and \(Y\), then \(Z\) is either greater than both \(X\) and \(Y\), or is less than both \(X\) and \(Y\). In either case, \(X\) is the smaller of the two amounts with probability 1/2, by symmetry considerations (remember, we chose the envelope at random). Thus, the probability that we are correct is greater than 1/2. □

[22] ibid.

Exercises

1 One of the first conditional probability paradoxes was provided by Bertrand.[23] It is called the Box Paradox. A cabinet has three drawers.
In the first drawer there are two gold balls, in the second drawer there are two silver balls, and in the third drawer there is one silver and one gold ball. A drawer is picked at random and a ball chosen at random from the two balls in the drawer. Given that a gold ball was drawn, what is the probability that the drawer with the two gold balls was chosen?

2 The following problem is called the two aces problem. This problem, dating back to 1936, has been attributed to the English mathematician J. H. C. Whitehead (see Gridgeman[24]). This problem was also submitted to Marilyn vos Savant by the master of mathematical puzzles Martin Gardner, who remarks that it is one of his favorites.

A bridge hand has been dealt, i.e., thirteen cards are dealt to each player. Given that your partner has at least one ace, what is the probability that he has at least two aces? Given that your partner has the ace of hearts, what is the probability that he has at least two aces? Answer these questions for a version of bridge in which there are eight cards, namely four aces and four kings, and each player is dealt two cards. (The reader may wish to solve the problem with a 52-card deck.)

[23] J. Bertrand, Calcul des Probabilités, Gauthier-Villars, 1888.
[24] N. T. Gridgeman, Letter, American Statistician, 21 (1967), pgs. 38-39.

3 In the preceding exercise, it is natural to ask "How do we get the information that the given hand has an ace?" Gridgeman considers two different ways that we might get this information. (Again, assume the deck consists of eight cards.)

(a) Assume that the person holding the hand is asked to "Name an ace in your hand" and answers "The ace of hearts." What is the probability that he has a second ace?

(b) Suppose the person holding the hand is asked the more direct question "Do you have the ace of hearts?" and the answer is yes.
What is the probability that he has a second ace?

4 Using the notation introduced in Example 4.29, show that in the example of Brams and Kilgour, if \(x\) is a positive power of 2, then
\[ \frac{p_{x/2}}{p_{x/2} + p_x} = \frac{3}{5}. \]

5 Using the notation introduced in Example 4.29, let
\[ p_x = \begin{cases} \dfrac{2}{3}\left(\dfrac{1}{3}\right)^{k}, & \text{if } x = 2^k, \\ 0, & \text{otherwise.} \end{cases} \]
Show that there is exactly one value of \(x\) such that if your envelope contains \(x\), then you should switch.

*6 (For bridge players only. From Sutherland.[25]) Suppose that we are the declarer in a hand of bridge, and we have the king, 9, 8, 7, and 2 of a certain suit, while the dummy has the ace, 10, 5, and 4 of the same suit. Suppose that we want to play this suit in such a way as to maximize the probability of having no losers in the suit. We begin by leading the 2 to the ace, and we note that the queen drops on our left. We then lead the 10 from the dummy, and our right-hand opponent plays the six (after playing the three on the first round). Should we finesse or play for the drop?

[25] E. Sutherland, "Restricted Choice - Fact or Fiction?", Canadian Master Point, November 1, 1993.

Chapter 5

Important Distributions and Densities

5.1 Important Distributions

In this chapter, we describe the discrete probability distributions and the continuous probability densities that occur most often in the analysis of experiments. We will also show how one simulates these distributions and densities on a computer.

Discrete Uniform Distribution

In Chapter 1, we saw that in many cases, we assume that all outcomes of an experiment are equally likely. If \(X\) is a random variable which represents the outcome of an experiment of this type, then we say that \(X\) is uniformly distributed.
If the sample space \(S\) is of size \(n\), where \(0 < n < \infty\), then the distribution function \(m(\omega)\) is defined to be \(1/n\) for all \(\omega \in S\). As is the case with all of the discrete probability distributions discussed in this chapter, this experiment can be simulated on a computer using the program GeneralSimulation. However, in this case, a faster algorithm can be used instead. (This algorithm was described in Chapter 1; we repeat the description here for completeness.) The expression
\[ 1 + \lfloor n\,(rnd) \rfloor \]
takes on as a value each integer between 1 and \(n\) with probability \(1/n\) (the notation \(\lfloor x \rfloor\) denotes the greatest integer not exceeding \(x\)). Thus, if the possible outcomes of the experiment are labelled \(\omega_1, \omega_2, \ldots, \omega_n\), then we use the above expression to represent the subscript of the output of the experiment.

If the sample space is a countably infinite set, such as the set of positive integers, then it is not possible to have an experiment which is uniform on this set (see Exercise 3). If the sample space is an uncountable set, with positive, finite length, such as the interval \([0, 1]\), then we use continuous density functions (see Section 5.2).

Binomial Distribution

The binomial distribution with parameters \(n\), \(p\), and \(k\) was defined in Chapter 3. It is the distribution of the random variable which counts the number of heads which occur when a coin is tossed \(n\) times, assuming that on any one toss, the probability that a head occurs is \(p\). The distribution function is given by the formula
\[ b(n, p, k) = \binom{n}{k} p^k q^{n-k}, \]
where \(q = 1 - p\).

One straightforward way to simulate a binomial random variable \(X\) is to compute the sum of \(n\) independent 0-1 random variables, each of which takes on the value 1 with probability \(p\). This method requires \(n\) calls to a random number generator to obtain one value of the random variable.
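The two simulation recipes just described can be sketched in a few lines of Python. This is only an illustrative sketch, not one of the book's programs: the function names `uniform_outcome` and `binomial_by_bernoulli` are hypothetical, and we assume that Python's `random.random()`, which returns a number in \([0, 1)\), plays the role of \(rnd\).

```python
import math
import random


def uniform_outcome(n, rnd=random.random):
    """Return an integer between 1 and n, each with probability 1/n,
    using the expression 1 + floor(n * rnd)."""
    return 1 + math.floor(n * rnd())


def binomial_by_bernoulli(n, p, rnd=random.random):
    """Return one value of a binomial random variable as the sum of n
    independent 0-1 variables, each equal to 1 with probability p.
    This makes n calls to the random number generator."""
    return sum(1 for _ in range(n) if rnd() < p)


if __name__ == "__main__":
    print(uniform_outcome(6))              # one roll of a fair die
    print(binomial_by_bernoulli(20, 0.3))  # heads in 20 tosses with p = .3
```

The `rnd` argument is there only so that a fixed source of numbers can be substituted when checking the functions by hand.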
When \(n\) is relatively large (say at least 30), the Central Limit Theorem (see Chapter 9) implies that the binomial distribution is well-approximated by the corresponding normal density function (which is defined in Section 5.2) with parameters \(\mu = np\) and \(\sigma = \sqrt{npq}\). Thus, in this case we can compute a value \(Y\) of a normal random variable with these parameters, and if \(-1/2 \le Y < n + 1/2\), we can use the value
\[ \lfloor Y + 1/2 \rfloor \]
to represent the random variable \(X\). If \(Y < -1/2\) or \(Y > n + 1/2\), we reject \(Y\) and compute another value. We will see in the next section how we can quickly simulate normal random variables.

Geometric Distribution

Consider a Bernoulli trials process continued for an infinite number of trials; for example, a coin tossed an infinite sequence of times. We showed in Section 2.2 how to assign a probability distribution to the infinite tree. Thus, we can determine the distribution for any random variable \(X\) relating to the experiment provided \(P(X = a)\) can be computed in terms of a finite number of trials. For example, let \(T\) be the number of trials up to and including the first success. Then
\[ P(T = 1) = p, \qquad P(T = 2) = qp, \qquad P(T = 3) = q^2 p, \]
and in general,
\[ P(T = n) = q^{n-1} p. \]
To show that this is a distribution, we must show that
\[ p + qp + q^2 p + \cdots = 1. \]
The left-hand expression is just a geometric series with first term \(p\) and common ratio \(q\), so its sum is
\[ \frac{p}{1 - q}, \]
which equals 1.

[Figure 5.1: Geometric distributions, for p = .5 and p = .2.]

In Figure 5.1 we have plotted this distribution using the program GeometricPlot for the cases \(p = .5\) and \(p = .2\). We see that as \(p\) decreases we are more likely to get large values for \(T\), as would be expected. In both cases, the most probable value for \(T\) is 1.
This will always be true since
\[ \frac{P(T = j + 1)}{P(T = j)} = q < 1. \]
In general, if \(0 < p < 1\), and \(q = 1 - p\), then we say that the random variable \(T\) has a geometric distribution if
\[ P(T = j) = q^{j-1} p, \]
for \(j = 1, 2, 3, \ldots\).

To simulate the geometric distribution with parameter \(p\), we can simply compute a sequence of random numbers in \([0, 1)\), stopping when an entry does not exceed \(p\). However, for small values of \(p\), this is time-consuming (taking, on the average, \(1/p\) steps). We now describe a method whose running time does not depend upon the size of \(p\). Define \(Y\) to be the smallest integer satisfying the inequality
\[ 1 - q^Y \ge rnd. \tag{5.1} \]
Then we have
\[ P(Y = j) = P\left(1 - q^j \ge rnd > 1 - q^{j-1}\right) = q^{j-1} - q^j = q^{j-1}(1 - q) = q^{j-1} p. \]
Thus, \(Y\) is geometrically distributed with parameter \(p\). To generate \(Y\), all we have to do is solve Equation 5.1 for \(Y\). We obtain
\[ Y = \left\lceil \frac{\log(1 - rnd)}{\log q} \right\rceil, \]
where the notation \(\lceil x \rceil\) means the least integer which is greater than or equal to \(x\). Since \(\log(1 - rnd)\) and \(\log(rnd)\) are identically distributed, \(Y\) can also be generated using the equation
\[ Y = \left\lceil \frac{\log rnd}{\log q} \right\rceil. \]

Example 5.1 The geometric distribution plays an important role in the theory of queues, or waiting lines. For example, suppose a line of customers waits for service at a counter. It is often assumed that, in each small time unit, either 0 or 1 new customers arrive at the counter. The probability that a customer arrives is \(p\) and that no customer arrives is \(q = 1 - p\). Then the time \(T\) until the next arrival has a geometric distribution. It is natural to ask for the probability that no customer arrives in the next \(k\) time units, that is, for \(P(T > k)\).
This is given by
\[ P(T > k) = \sum_{j=k+1}^{\infty} q^{j-1} p = q^k (p + qp + q^2 p + \cdots) = q^k. \]
This probability can also be found by noting that we are asking for no successes (i.e., arrivals) in a sequence of \(k\) consecutive time units, where the probability of a success in any one time unit is \(p\). Thus, the probability is just \(q^k\), since arrivals in any two time units are independent events.

It is often assumed that the length of time required to service a customer also has a geometric distribution but with a different value for \(p\). This implies a rather special property of the service time. To see this, let us compute the conditional probability
\[ P(T > r + s \mid T > r) = \frac{P(T > r + s)}{P(T > r)} = \frac{q^{r+s}}{q^r} = q^s. \]
Thus, the probability that the customer's service takes \(s\) more time units is independent of the length of time \(r\) that the customer has already been served. Because of this interpretation, this property is called the "memoryless" property, and is also obeyed by the exponential distribution. (Fortunately, not too many service stations have this property.) □

Negative Binomial Distribution

Suppose we are given a coin which has probability \(p\) of coming up heads when it is tossed. We fix a positive integer \(k\), and toss the coin until the \(k\)th head appears. We let \(X\) represent the number of tosses. When \(k = 1\), \(X\) is geometrically distributed. For a general \(k\), we say that \(X\) has a negative binomial distribution. We now calculate the probability distribution of \(X\). If \(X = x\), then it must be true that there were exactly \(k - 1\) heads thrown in the first \(x - 1\) tosses, and a head must have been thrown on the \(x\)th toss.
There are
\[ \binom{x-1}{k-1} \]
sequences of length \(x\) with these properties, and each of them is assigned the same probability, namely
\[ p^{k} q^{x-k}. \]
Therefore, if we define
\[ u(x, k, p) = P(X = x), \]
then
\[ u(x, k, p) = \binom{x-1}{k-1} p^k q^{x-k}. \]
One can simulate this on a computer by simulating the tossing of a coin. The following algorithm is, in general, much faster. We note that \(X\) can be understood as the sum of \(k\) outcomes of a geometrically distributed experiment with parameter \(p\). Thus, we can use the following sum as a means of generating \(X\):
\[ \sum_{j=1}^{k} \left\lceil \frac{\log rnd_j}{\log q} \right\rceil. \]

Example 5.2 A fair coin is tossed until the second time a head turns up. The distribution for the number of tosses is \(u(x, 2, p)\) with \(p = 1/2\). Thus the probability that \(x\) tosses are needed to obtain two heads is found by letting \(k = 2\) in the above formula. We obtain
\[ u(x, 2, 1/2) = \binom{x-1}{1} \frac{1}{2^x}, \]
for \(x = 2, 3, \ldots\).

In Figure 5.2 we give a graph of the distribution for \(k = 2\) and \(p = .25\). Note that the distribution is quite asymmetric, with a long tail reflecting the fact that large values of \(x\) are possible. □

[Figure 5.2: Negative binomial distribution with k = 2 and p = .25.]

Poisson Distribution

The Poisson distribution arises in many situations. It is safe to say that it is one of the three most important discrete probability distributions (the other two being the uniform and the binomial distributions). The Poisson distribution can be viewed as arising from the binomial distribution or from the exponential density. We shall now explain its connection with the former; its connection with the latter will be explained in the next section.

Suppose that we have a situation in which a certain kind of occurrence happens at random over a period of time.
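As an aside, before the Poisson model is developed further, the simulation formulas given earlier in this section for the geometric and negative binomial distributions can be sketched in Python. This is a hedged illustration rather than one of the book's programs: the names `geometric` and `negative_binomial` are hypothetical, `random.random()` is assumed to stand in for \(rnd\), and we assume \(0 < p < 1\). Because `random.random()` can return 0, the sketch uses the form \(\lceil \log(1 - rnd)/\log q \rceil\), which has the same distribution, and guards the probability-zero case \(rnd = 0\) by returning at least 1.

```python
import math
import random


def geometric(p, rnd=random.random):
    """Number of trials up to and including the first success,
    generated as ceil(log(1 - rnd) / log(q)) with q = 1 - p."""
    q = 1.0 - p
    u = 1.0 - rnd()  # lies in (0, 1], so log(u) is always defined
    return max(1, math.ceil(math.log(u) / math.log(q)))


def negative_binomial(k, p, rnd=random.random):
    """Number of tosses needed to obtain the k-th head, generated as
    the sum of k independent geometric values with parameter p."""
    return sum(geometric(p, rnd) for _ in range(k))


if __name__ == "__main__":
    # Tosses needed for the second head when p = .25, as in Figure 5.2.
    print([negative_binomial(2, 0.25) for _ in range(10)])
```

Each value of `negative_binomial` uses \(k\) random numbers, however small \(p\) is, which is the point of the faster algorithm.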
For example, the occurrences that we are interested in might be incoming telephone calls to a police station in a large city. We want to model this situation so that we can consider the probabilities of events such as more than 10 phone calls occurring in a 5-minute time interval. Presumably, in our example, there would be more incoming calls between 6:00 and 7:00 P.M. than between 4:00 and 5:00 A.M., and this fact would certainly affect the above probability. Thus, to have a hope of computing such probabilities, we must assume that the average rate, i.e., the average number of occurrences per minute, is a constant. This rate we will denote by \(\lambda\). (Thus, in a given 5-minute time interval, we would expect about \(5\lambda\) occurrences.) This means that if we were to apply our model to the two time periods given above, we would simply use different rates for the two time periods, thereby obtaining two different probabilities for the given event.

Our next assumption is that the numbers of occurrences in two non-overlapping time intervals are independent. In our example, this means that the events that there are \(j\) calls between 5:00 and 5:15 P.M. and \(k\) calls between 6:00 and 6:15 P.M. on the same day are independent.

We can use the binomial distribution to model this situation. We imagine that a given time interval is broken up into \(n\) subintervals of equal length. If the subintervals are sufficiently short, we can assume that two or more occurrences happen in one subinterval with a probability which is negligible in comparison with the probability of at most one occurrence. Thus, in each subinterval, we are assuming that there is either 0 or 1 occurrence. This means that the sequence of subintervals can be thought of as a sequence of Bernoulli trials, with a success corresponding to an occurrence in the subinterval.
To decide upon the proper value of \(p\), the probability of an occurrence in a given subinterval, we reason as follows. On the average, there are \(\lambda t\) occurrences in a time interval of length \(t\). If this time interval is divided into \(n\) subintervals, then we would expect, using the Bernoulli trials interpretation, that there should be \(np\) occurrences. Thus, we want
\[ \lambda t = np, \]
so
\[ p = \frac{\lambda t}{n}. \]
We now wish to consider the random variable \(X\), which counts the number of occurrences in a given time interval. We want to calculate the distribution of \(X\). For ease of calculation, we will assume that the time interval is of length 1; for time intervals of arbitrary length \(t\), see Exercise 11. We know that
\[ P(X = 0) = b(n, p, 0) = (1 - p)^n = \left(1 - \frac{\lambda}{n}\right)^n. \]
For large \(n\), this is approximately \(e^{-\lambda}\). It is easy to calculate that for any fixed \(k\), we have
\[ \frac{b(n, p, k)}{b(n, p, k-1)} = \frac{\lambda - (k-1)p}{kq}, \]
which, for large \(n\) (and therefore small \(p\)), is approximately \(\lambda/k\). Thus, we have
\[ P(X = 1) \approx \lambda e^{-\lambda}, \]
and in general,
\[ P(X = k) \approx \frac{\lambda^k}{k!} e^{-\lambda}. \tag{5.2} \]
The above distribution is the Poisson distribution. We note that it must be checked that the distribution given in Equation 5.2 really is a distribution, i.e., that its values are non-negative and sum to 1. (See Exercise 12.)

The Poisson distribution is used as an approximation to the binomial distribution when the parameters \(n\) and \(p\) are large and small, respectively (see Examples 5.3 and 5.4). However, the Poisson distribution also arises in situations where it may not be easy to interpret or measure the parameters \(n\) and \(p\) (see Example 5.5).

Example 5.3 A typesetter makes, on the average, one mistake per 1000 words. Assume that he is setting a book with 100 words to a page. Let \(S_{100}\) be the number of mistakes that he makes on a single page.
Then the exact probability distribution for \(S_{100}\) would be obtained by considering \(S_{100}\) as a result of 100 Bernoulli trials with \(p = 1/1000\). The expected value of \(S_{100}\) is \(\lambda = 100(1/1000) = .1\). The exact probability that \(S_{100} = j\) is \(b(100, 1/1000, j)\), and the Poisson approximation is
\[ \frac{e^{-.1} (.1)^j}{j!}. \]
In Table 5.1 we give, for various values of \(n\) and \(p\), the exact values computed by the binomial distribution and the Poisson approximation. □

      Poisson   Binomial   Poisson   Binomial   Poisson   Binomial
                n = 100              n = 100              n = 1000
  j   λ = .1    p = .001   λ = 1     p = .01    λ = 10    p = .01
  0   .9048     .9048      .3679     .3660      .0000     .0000
  1   .0905     .0905      .3679     .3697      .0005     .0004
  2   .0045     .0045      .1839     .1849      .0023     .0022
  3   .0002     .0002      .0613     .0610      .0076     .0074
  4   .0000     .0000      .0153     .0149      .0189     .0186
  5                        .0031     .0029      .0378     .0374
  6                        .0005     .0005      .0631     .0627
  7                        .0001     .0001      .0901     .0900
  8                        .0000     .0000      .1126     .1128
  9                                             .1251     .1256
 10                                             .1251     .1257
 11                                             .1137     .1143
 12                                             .0948     .0952
 13                                             .0729     .0731
 14                                             .0521     .0520
 15                                             .0347     .0345
 16                                             .0217     .0215
 17                                             .0128     .0126
 18                                             .0071     .0069
 19                                             .0037     .0036
 20                                             .0019     .0018
 21                                             .0009     .0009
 22                                             .0004     .0004
 23                                             .0002     .0002
 24                                             .0001     .0001
 25                                             .0000     .0000

Table 5.1: Poisson approximation to the binomial distribution.

Example 5.4 In his book,[1] Feller discusses the statistics of flying bomb hits in the south of London during the Second World War.

Assume that you live in a district of size 10 blocks by 10 blocks so that the total district is divided into 100 small squares. How likely is it that the square in which you live will receive no hits if the total area is hit by 400 bombs?

We assume that a particular bomb will hit your square with probability 1/100. Since there are 400 bombs, we can regard the number of hits that your square receives as the number of successes in a Bernoulli trials process with \(n = 400\) and \(p = 1/100\).
Thus we can use the Poisson distribution with \(\lambda = 400 \cdot 1/100 = 4\) to approximate the probability that your square will receive \(j\) hits. This probability is \(p(j) = e^{-4} 4^j / j!\). The expected number of squares that receive exactly \(j\) hits is then \(100 \cdot p(j)\). It is easy to write a program LondonBombs to simulate this situation and compare the expected number of squares with \(j\) hits with the observed number. In Exercise 26 you are asked to compare the actual observed data with that predicted by the Poisson distribution.

In Figure 5.3, we have shown the simulated hits, together with a spike graph showing both the observed and predicted frequencies. The observed frequencies are shown as squares, and the predicted frequencies are shown as dots. □

If the reader would rather not consider flying bombs, he is invited to instead consider an analogous situation involving cookies and raisins. We assume that we have made enough cookie dough for 500 cookies. We put 600 raisins in the dough, and mix it thoroughly. One way to look at this situation is that we have 500 cookies, and after placing the cookies in a grid on the table, we throw 600 raisins at the cookies. (See Exercise 22.)

Example 5.5 Suppose that in a certain fixed amount \(A\) of blood, the average human has 40 white blood cells. Let \(X\) be the random variable which gives the number of white blood cells in a random sample of size \(A\) from a random individual. We can think of \(X\) as binomially distributed with each white blood cell in the body representing a trial. If a given white blood cell turns up in the sample, then the trial corresponding to that blood cell was a success. Then \(p\) should be taken as the ratio of \(A\) to the total amount of blood in the individual, and \(n\) will be the number of white blood cells in the individual.
Of course, in practice, neither of these parameters is very easy to measure accurately, but presumably the number 40 is easy to measure. But for the average human, we then have $40 = np$, so we can think of $X$ as being Poisson distributed, with parameter $\lambda = 40$. In this case, it is easier to model the situation using the Poisson distribution than the binomial distribution. □

To simulate a Poisson random variable on a computer, a good way is to take advantage of the relationship between the Poisson distribution and the exponential density. This relationship and the resulting simulation algorithm will be described in the next section.

¹ ibid., p. 161.

Figure 5.3: Flying bomb hits.

Hypergeometric Distribution

Suppose that we have a set of $N$ balls, of which $k$ are red and $N - k$ are blue. We choose $n$ of these balls, without replacement, and define $X$ to be the number of red balls in our sample. The distribution of $X$ is called the hypergeometric distribution. We note that this distribution depends upon three parameters, namely $N$, $k$, and $n$. There does not seem to be a standard notation for this distribution; we will use the notation $h(N, k, n, x)$ to denote $P(X = x)$. This probability can be found by noting that there are

$$\binom{N}{n}$$

different samples of size $n$, and the number of such samples with exactly $x$ red balls is obtained by multiplying the number of ways of choosing $x$ red balls from the set of $k$ red balls and the number of ways of choosing $n - x$ blue balls from the set of $N - k$ blue balls. Hence, we have

$$h(N, k, n, x) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}.$$

This distribution can be generalized to the case where there are more than two types of objects. (See Exercise 40.)
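The formula for $h(N, k, n, x)$ can be evaluated directly from binomial coefficients. The following is a minimal Python sketch, not part of the original text; the function name `hypergeometric_pmf` and the sample parameters are illustrative assumptions.

```python
from math import comb


def hypergeometric_pmf(N, k, n, x):
    """Return h(N, k, n, x): the probability of exactly x red balls when
    n balls are drawn without replacement from N balls, k of which are red.
    (Hypothetical helper name, chosen for this illustration.)"""
    # Outcomes that cannot occur have probability 0.
    if x < 0 or x > n or x > k or n - x > N - k:
        return 0.0
    # (ways to pick x red) * (ways to pick n - x blue) / (all samples of size n)
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


if __name__ == "__main__":
    # Illustrative parameters: 10 balls, 4 red, sample of size 3.
    N, k, n = 10, 4, 3
    for x in range(n + 1):
        print(x, hypergeometric_pmf(N, k, n, x))
    # h(10, 4, 3, 2) = C(4,2) C(6,1) / C(10,3) = 36/120 = 0.3
```

Summing the returned values over $x = 0, 1, \ldots, n$ should give 1, which is a convenient check on any implementation of this formula.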
If we let $N$ and $k$ tend to $\infty$, in such a way that the ratio $k/N$ remains fixed, then the hypergeometric distribution tends to the binomial distribution with parameters $n$ and $p = k/N$. This is reasonable because if $N$ and $k$ are much larger than $n$, then whether we choose our sample with or without replacement should not affect the probabilities very much, and the experiment consisting of choosing with replacement yields a binomially distributed random variable (see Exercise 44).

An example of how this distribution might be used is given in Exercises 36 and 37. We now give another example involving the hypergeometric distribution. It illustrates a statistical test called Fisher's Exact Test.

Example 5.6 It is often of interest to consider two traits, such as eye color and hair color, and to ask whether there is an association between the two traits. Two traits are associated if knowing the value of one of the traits for a given person allows us to predict the value of the other trait for that person. The stronger the association, the more accurate the predictions become. If there is no association between the traits, then we say that the traits are independent. In this example, we will use the traits of gender and political party, and we will assume that there are only two possible genders, female and male, and only two possible political parties, Democratic and Republican.

Suppose that we have collected data concerning these traits. To test whether there is an association between the traits, we first assume that there is no association between the two traits. This gives rise to an "expected" data set, in which knowledge of the value of one trait is of no help in predicting the value of the other trait. Our collected data set usually differs from this expected data set. If it differs by quite a bit, then we would tend to reject the assumption of independence of the traits.
To nail down what is meant by "quite a bit," we decide which possible data sets differ from the expected data set by at least as much as ours does, and then we compute the probability that any of these data sets would occur under the assumption of independence of traits. If this probability is small, then it is unlikely that the difference between our collected data set and the expected data set is due entirely to chance.

Suppose that we have collected the data shown in Table 5.2. The row and column sums are called marginal totals, or marginals. In what follows, we will denote the row sums by $t_{11}$ and $t_{12}$, and the column sums by $t_{21}$ and $t_{22}$. The $ij$th entry in the table will be denoted by $s_{ij}$. Finally, the size of the data set will be denoted by $n$. Thus, a general data table will look as shown in Table 5.3.

            Democrat   Republican
  Female       24           4         28
  Male          8          14         22
               32          18         50

Table 5.2: Observed data.

            Democrat   Republican
  Female     s_11        s_12        t_11
  Male       s_21        s_22        t_12
             t_21        t_22        n

Table 5.3: General data table.

We now explain the model which will be used to construct the "expected" data set. In the model, we assume that the two traits are independent. We then put $t_{21}$ yellow balls and $t_{22}$ green balls, corresponding to the Democratic and Republican marginals, into an urn. We draw $t_{11}$ balls, without replacement, from the urn, and call these balls females. The $t_{12}$ balls remaining in the urn are called males. In the specific case under consideration, the probability of getting the actual data under this model is given by the expression

$$\frac{\binom{32}{24}\binom{18}{4}}{\binom{50}{28}},$$

i.e., a value of the hypergeometric distribution.

We are now ready to construct the expected data set. If we choose 28 balls out of 50, we should expect to see, on the average, the same percentage of yellow balls in our sample as in the urn.
Thus, we should expect to see, on the average, $28(32/50) = 17.92 \approx 18$ yellow balls in our sample. (See Exercise 36.) The other expected values are computed in exactly the same way. Thus, the expected data set is shown in Table 5.4.

            Democrat   Republican
  Female       18          10         28
  Male         14           8         22
               32          18         50

Table 5.4: Expected data.

We note that the value of $s_{11}$ determines the other three values in the table, since the marginals are all fixed. Thus, in considering the possible data sets that could appear in this model, it is enough to consider the various possible values of $s_{11}$. In the specific case at hand, what is the probability of drawing exactly $a$ yellow balls, i.e., what is the probability that $s_{11} = a$? It is

$$\frac{\binom{32}{a}\binom{18}{28-a}}{\binom{50}{28}}. \qquad (5.3)$$

We are now ready to decide whether our actual data differs from the expected data set by an amount which is greater than could be reasonably attributed to chance alone. We note that the expected number of female Democrats is 18, but the actual number in our data is 24. The other data sets which differ from the expected data set by more than ours correspond to those where the number of female Democrats equals 25, 26, 27, or 28. Thus, to obtain the required probability, we sum the expression in (5.3) from $a = 24$ to $a = 28$. We obtain a value of .000395. Thus, we should reject the hypothesis that the two traits are independent. □

Finally, we turn to the question of how to simulate a hypergeometric random variable $X$. Let us assume that the parameters for $X$ are $N$, $k$, and $n$. We imagine that we have a set of $N$ balls, labelled from 1 to $N$. We decree that the first $k$ of these balls are red, and the rest are blue. Suppose that we have chosen $m$ balls, and that $j$ of them are red. Then there are $k - j$ red balls left, and $N - m$ balls left.
Thus, our next choice will be red with probability

$$\frac{k - j}{N - m}.$$

So at this stage, we choose a random number in $[0, 1]$, and report that a red ball has been chosen if and only if the random number does not exceed the above expression. Then we update the values of $m$ and $j$, and continue until $n$ balls have been chosen.

Benford Distribution

Our next example of a distribution comes from the study of leading digits in data sets. It turns out that many data sets that occur "in real life" have the property that the first digits of the data are not uniformly distributed over the set $\{1, 2, \ldots, 9\}$. Rather, it appears that the digit 1 is most likely to occur, and that the distribution is monotonically decreasing on the set of possible digits. The Benford distribution appears, in many cases, to fit such data. Many explanations have been given for the occurrence of this distribution. Possibly the most convincing explanation is that this distribution is the only one that is invariant under a change of scale. If one thinks of certain data sets as somehow "naturally occurring," then the distribution should be unaffected by which units are chosen in which to represent the data, i.e., the distribution should be invariant under change of scale.

Figure 5.4: Leading digits in President Clinton's tax returns.

Theodore Hill² gives a general description of the Benford distribution, when one considers the first $d$ digits of integers in a data set. We will restrict our attention to the first digit. In this case, the Benford distribution has distribution function

$$f(k) = \log_{10}(k + 1) - \log_{10}(k),$$

for $1 \le k \le 9$.

Mark Nigrini³ has advocated the use of the Benford distribution as a means of testing suspicious financial records such as bookkeeping entries, checks, and tax returns.
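As a brief aside, the first-digit formula $f(k) = \log_{10}(k + 1) - \log_{10}(k)$ is easy to tabulate. The following is a minimal Python sketch, not taken from the text; the names `benford_first_digit` and `benford` are illustrative assumptions.

```python
from math import log10


def benford_first_digit(k):
    """Return f(k) = log10(k + 1) - log10(k) for a leading digit k in 1..9.
    (Hypothetical helper name, chosen for this illustration.)"""
    if not 1 <= k <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return log10(k + 1) - log10(k)


# Table of first-digit probabilities under the Benford distribution.
benford = {k: benford_first_digit(k) for k in range(1, 10)}

if __name__ == "__main__":
    for k, prob in benford.items():
        print(k, round(prob, 4))
```

The nine values telescope to $\log_{10}(10) - \log_{10}(1) = 1$, and they decrease from $f(1) = \log_{10} 2 \approx .301$ down to $f(9) \approx .046$, in line with the monotonicity described above.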
His idea is that if someone were to "make up" numbers in these cases, the person would probably produce numbers that are fairly uniformly distributed, while if one were to use the actual numbers, the leading digits would roughly follow the Benford distribution. As an example, Nigrini analyzed President Clinton's tax returns for a 13-year period. In Figure 5.4, the Benford distribution values are shown as squares, and the President's tax return data are shown as circles. One sees that in this example, the Benford distribution fits the data very well.

This distribution was discovered by the astronomer Simon Newcomb, who stated the following in his paper on the subject: "That the ten digits do not occur with equal frequency must be evident to anyone making use of logarithm tables, and noticing how much faster the first pages wear out than the last ones. The first significant figure is oftener 1 than any other digit, and the frequency diminishes up to 9."⁴

² T. P. Hill, "The Significant Digit Phenomenon," American Mathematical Monthly, vol. 102, no. 4 (April 1995), pgs. 322-327.
³ M. Nigrini, "Detecting Biases and Irregularities in Tabulated Data," working paper.
⁴ S. Newcomb, "Note on the frequency of use of the different digits in natural numbers," American Journal of Mathematics, vol. 4 (1881), pgs. 39-40.

Exercises

1 For which of the following random variables would it be appropriate to assign a uniform distribution?
(a) Let $X$ represent the roll of one die.
(b) Let $X$ represent the number of heads obtained in three tosses of a coin.
(c) A roulette wheel has 38 possible outcomes: 0, 00, and 1 through 36. Let $X$ represent the outcome when a roulette wheel is spun.
(d) Let $X$ represent the birthday of a randomly chosen person.
(e) Let $X$ represent the number of tosses of a coin necessary to achieve a head for the first time.

2 Let $n$ be a positive integer.
Let $S$ be the set of integers between 1 and $n$. Consider the following process: We remove a number from $S$ at random and write it down. We repeat this until $S$ is empty. The result is a permutation of the integers from 1 to $n$. Let $X$ denote this permutation. Is $X$ uniformly distributed?

3 Let $X$ be a random variable which can take on countably many values. Show that $X$ cannot be uniformly distributed.

4 Suppose we are attending a college which has 3000 students. We wish to choose a subset of size 100 from the student body. Let $X$ represent the subset, chosen using the following possible strategies. For which strategies would it be appropriate to assign the uniform distribution to $X$? If it is appropriate, what probability should we assign to each outcome?
(a) Take the first 100 students who enter the cafeteria to eat lunch.
(b) Ask the Registrar to sort the students by their Social Security number, and then take the first 100 in the resulting list.
(c) Ask the Registrar for a set of cards, with each card containing the name of exactly one student, and with each student appearing on exactly one card. Throw the cards out of a third-story window, then walk outside and pick up the first 100 cards that you find.

5 Under the same conditions as in the preceding exercise, can you describe a procedure which, if used, would produce each possible outcome with the same probability? Can you describe such a procedure that does not rely on a computer or a calculator?

6 Let $X_1, X_2, \ldots, X_n$ be $n$ mutually independent random variables, each of which is uniformly distributed on the integers from 1 to $k$. Let $Y$ denote the minimum of the $X_i$'s. Find the distribution of $Y$.

7 A die is rolled until the first time $T$ that a six turns up.
(a) What is the probability distribution for $T$?
(b) Find $P(T > 3)$.
(c) Find $P(T > 6 \mid T > 3)$.
8 If a coin is tossed a sequence of times, what is the probability that the first head will occur after the fifth toss, given that it has not occurred in the first two tosses?

9 A worker for the Department of Fish and Game is assigned the job of estimating the number of trout in a certain lake of modest size. She proceeds as follows: She catches 100 trout, tags each of them, and puts them back in the lake. One month later, she catches 100 more trout, and notes that 10 of them have tags.
(a) Without doing any fancy calculations, give a rough estimate of the number of trout in the lake.
(b) Let $N$ be the number of trout in the lake. Find an expression, in terms of $N$, for the probability that the worker would catch 10 tagged trout out of the 100 trout that she caught the second time.
(c) Find the value of $N$ which maximizes the expression in part (b). This value is called the maximum likelihood estimate for the unknown quantity $N$. Hint: Consider the ratio of the expressions for successive values of $N$.

10 A census in the United States is an attempt to count everyone in the country. It is inevitable that many people are not counted. The U. S. Census Bureau proposed a way to estimate the number of people who were not counted by the latest census. Their proposal was as follows: In a given locality, let $N$ denote the actual number of people who live there. Assume that the census counted $n_1$ people living in this area. Now, another census was taken in the locality, and $n_2$ people were counted. In addition, $n_{12}$ people were counted both times.
(a) Given $N$, $n_1$, and $n_2$, let $X$ denote the number of people counted both times. Find the probability that $X = k$, where $k$ is a fixed positive integer between 0 and $n_2$.
(b) Now assume that $X = n_{12}$. Find the value of $N$ which maximizes the expression in part (a).
Hint: Consider the ratio of the expressions for successive values of $N$.

11 Suppose that $X$ is a random variable which represents the number of calls coming in to a police station in a one-minute interval. In the text, we showed that $X$ could be modelled using a Poisson distribution with parameter $\lambda$, where this parameter represents the average number of incoming calls per minute. Now suppose that $Y$ is a random variable which represents the number of incoming calls in an interval of length $t$. Show that the distribution of $Y$ is given by

$$P(Y = k) = e^{-\lambda t}\,\frac{(\lambda t)^k}{k!},$$

i.e., $Y$ is Poisson with parameter $\lambda t$. Hint: Suppose a Martian were to observe the police station. Let us also assume that the basic time interval used on Mars is exactly $t$ Earth minutes. Finally, we will assume that the Martian understands the derivation of the Poisson distribution in the text. What would she write down for the distribution of $Y$?

12 Show that the values of the Poisson distribution given in Equation 5.2 sum to 1.

13 The Poisson distribution with parameter $\lambda = .3$ has been assigned for the outcome of an experiment. Let $X$ be the outcome function. Find $P(X = 0)$, $P(X = 1)$, and $P(X > 1)$.

14 On the average, only 1 person in 1000 has a particular rare blood type.
(a) Find the probability that, in a city of 10,000 people, no one has this blood type.
(b) How many people would have to be tested to give a probability greater than 1/2 of finding at least one person with this blood type?

15 Write a program for the user to input $n$, $p$, $j$ and have the program print out the exact value of $b(n, p, j)$ and the Poisson approximation to this value.

16 Assume that, during each second, a Dartmouth switchboard receives one call with probability .01 and no calls with probability .99.
Use the Poisson approximation to estimate the probability that the operator will miss at most one call if she takes a 5-minute coffee break.

17 The probability of a royal flush in a poker hand is $p = 1/649{,}740$. How large must $n$ be to render the probability of having no royal flush in $n$ hands smaller than $1/e$?

18 A baker blends 600 raisins and 400 chocolate chips into a dough mix and, from this, makes 500 cookies.
(a) Find the probability that a randomly picked cookie will have no raisins.
(b) Find the probability that a randomly picked cookie will have exactly two chocolate chips.
(c) Find the probability that a randomly chosen cookie will have at least two bits (raisins or chips) in it.

19 The probability that, in a bridge deal, one of the four hands has all hearts is approximately $6.3 \times 10^{-12}$. In a city with about 50,000 bridge players the resident probability expert is called on the average once a year (usually late at night) and told that the caller has just been dealt a hand of all hearts. Should she suspect that some of these callers are the victims of practical jokes?

20 An advertiser drops 10,000 leaflets on a city which has 2000 blocks. Assume that each leaflet has an equal chance of landing on each block. What is the probability that a particular block will receive no leaflets?

21 In a class of 80 students, the professor calls on 1 student chosen at random for a recitation in each class period. There are 32 class periods in a term.
(a) Write a formula for the exact probability that a given student is called upon $j$ times during the term.
(b) Write a formula for the Poisson approximation for this probability. Using your formula, estimate the probability that a given student is called upon more than twice.

22 Assume that we are making raisin cookies.
We put a box of 600 raisins into our dough mix, mix up the dough, then make from the dough 500 cookies. We then ask for the probability that a randomly chosen cookie will have 0, 1, 2, $\ldots$ raisins. Consider the cookies as trials in an experiment, and let $X$ be the random variable which gives the number of raisins in a given cookie. Then we can regard the number of raisins in a cookie as the result of $n = 600$ independent trials with probability $p = 1/500$ for success on each trial. Since $n$ is large and $p$ is small, we can use the Poisson approximation with $\lambda = 600(1/500) = 1.2$. Determine the probability that a given cookie will have at least five raisins.

23 For a certain experiment, the Poisson distribution with parameter $\lambda = m$ has been assigned. Show that a most probable outcome for the experiment is the integer value $k$ such that $m - 1 \le k \le m$. Under what conditions will there be two most probable values? Hint: Consider the ratio of successive probabilities.

24 When John Kemeny was chair of the Mathematics Department at Dartmouth College, he received an average of ten letters each day. On a certain weekday he received no mail and wondered if it was a holiday. To decide this he computed the probability that, in ten years, he would have at least 1 day without any mail. He assumed that the number of letters he received on a given day has a Poisson distribution. What probability did he find? Hint: Apply the Poisson distribution twice. First, to find the probability that, in 3000 days, he will have at least 1 day without mail, assuming each year has about 300 days on which mail is delivered.

25 Reese Prosser never puts money in a 10-cent parking meter in Hanover. He assumes that there is a probability of .05 that he will be caught. The first offense costs nothing, the second costs 2 dollars, and subsequent offenses cost 5 dollars each.
Under his assumptions, how does the expected cost of parking 100 times without paying the meter compare with the cost of paying the meter each time?

  Number of deaths   Number of corps with x deaths in a given year
         0                            144
         1                             91
         2                             32
         3                             11
         4                              2

Table 5.5: Mule kicks.

26 Feller⁵ discusses the statistics of flying bomb hits in an area in the south of London during the Second World War. The area in question was divided into $24 \times 24 = 576$ small areas. The total number of hits was 537. There were 229 squares with 0 hits, 211 with 1 hit, 93 with 2 hits, 35 with 3 hits, 7 with 4 hits, and 1 with 5 or more. Assuming the hits were purely random, use the Poisson approximation to find the probability that a particular square would have exactly $k$ hits. Compute the expected number of squares that would have 0, 1, 2, 3, 4, and 5 or more hits and compare this with the observed results.

27 Assume that the probability that there is a significant accident in a nuclear power plant during one year's time is .001. If a country has 100 nuclear plants, estimate the probability that there is at least one such accident during a given year.

28 An airline finds that 4 percent of the passengers that make reservations on a particular flight will not show up. Consequently, their policy is to sell 100 reserved seats on a plane that has only 98 seats. Find the probability that every person who shows up for the flight will find a seat available.

29 The king's coinmaster boxes his coins 500 to a box and puts 1 counterfeit coin in each box. The king is suspicious, but, instead of testing all the coins in 1 box, he tests 1 coin chosen at random out of each of 500 boxes. What is the probability that he finds at least one fake? What is it if the king tests 2 coins from each of 250 boxes?
30 (From Kemeny⁶) Show that, if you make 100 bets on the number 17 at roulette at Monte Carlo (see Example 6.13), you will have a probability greater than 1/2 of coming out ahead. What is your expected winning?

31 In one of the first studies of the Poisson distribution, von Bortkiewicz⁷ considered the frequency of deaths from kicks in the Prussian army corps. From the study of 14 corps over a 20-year period, he obtained the data shown in Table 5.5. Fit a Poisson distribution to this data and see if you think that the Poisson distribution is appropriate.

⁵ ibid., p. 161.
⁶ Private communication.
⁷ L. von Bortkiewicz, Das Gesetz der Kleinen Zahlen (Leipzig: Teubner, 1898), p. 24.

32 It is often assumed that the auto traffic that arrives at the intersection during a unit time period has a Poisson distribution with expected value $m$. Assume that the number of cars $X$ that arrive at an intersection from the north in unit time has a Poisson distribution with parameter $\lambda = m$ and the number $Y$ that arrive from the west in unit time has a Poisson distribution with parameter $\lambda = \bar{m}$. If $X$ and $Y$ are independent, show that the total number $X + Y$ that arrive at the intersection in unit time has a Poisson distribution with parameter $\lambda = m + \bar{m}$.

33 Cars coming along Magnolia Street come to a fork in the road and have to choose either Willow Street or Main Street to continue. Assume that the number of cars that arrive at the fork in unit time has a Poisson distribution with parameter $\lambda = 4$. A car arriving at the fork chooses Main Street with probability 3/4 and Willow Street with probability 1/4. Let $X$ be the random variable which counts the number of cars that, in a given unit of time, pass by Joe's Barber Shop on Main Street. What is the distribution of $X$?

34 In the appeal of the People v.
Collins case (see Exercise 4.1.28), the counsel for the defense argued as follows: Suppose, for example, there are 5,000,000 couples in the Los Angeles area and the probability that a randomly chosen couple fits the witnesses' description is 1/12,000,000. Then the probability that there are two such couples given that there is at least one is not at all small. Find this probability. (The California Supreme Court overturned the initial guilty verdict.)

35 A manufactured lot of brass turnbuckles has $S$ items of which $D$ are defective. A sample of $s$ items is drawn without replacement. Let $X$ be a random variable that gives the number of defective items in the sample. Let $p(d) = P(X = d)$.
(a) Show that

$$p(d) = \frac{\binom{D}{d}\binom{S-D}{s-d}}{\binom{S}{s}}.$$

Thus, $X$ is hypergeometric.
(b) Prove the following identity, known as Euler's formula:

$$\sum_{d=0}^{\min(D,s)} \binom{D}{d}\binom{S-D}{s-d} = \binom{S}{s}.$$

36 A bin of 1000 turnbuckles has an unknown number $D$ of defectives. A sample of 100 turnbuckles has 2 defectives. The maximum likelihood estimate for $D$ is the number of defectives which gives the highest probability for obtaining the number of defectives observed in the sample. Guess this number $D$ and then write a computer program to verify your guess.

37 There are an unknown number of moose on Isle Royale (a National Park in Lake Superior). To estimate the number of moose, 50 moose are captured and tagged. Six months later 200 moose are captured and it is found that 8 of these were tagged. Estimate the number of moose on Isle Royale from these data, and then verify your guess by computer program (see Exercise 36).

38 A manufactured lot of buggy whips has 20 items, of which 5 are defective. A random sample of 5 items is chosen to be inspected. Find the probability that the sample contains exactly one defective item
(a) if the sampling is done with replacement.
(b) if the sampling is done without replacement.

39 Suppose that $N$ and $k$ tend to $\infty$ in such a way that $k/N$ remains fixed. Show that

$$h(N, k, n, x) \to b(n, k/N, x).$$

40 A bridge deck has 52 cards with 13 cards in each of four suits: spades, hearts, diamonds, and clubs. A hand of 13 cards is dealt from a shuffled deck. Find the probability that the hand has
(a) a distribution of suits 4, 4, 3, 2 (for example, four spades, four hearts, three diamonds, two clubs).
(b) a distribution of suits 5, 3, 3, 2.

41 Write a computer algorithm that simulates a hypergeometric random variable with parameters $N$, $k$, and $n$.

42 You are presented with four different dice. The first one has two sides marked 0 and four sides marked 4. The second one has a 3 on every side. The third one has a 2 on four sides and a 6 on two sides, and the fourth one has a 1 on three sides and a 5 on three sides. You allow your friend to pick any of the four dice he wishes. Then you pick one of the remaining three and you each roll your die. The person with the largest number showing wins a dollar. Show that you can choose your die so that you have probability 2/3 of winning no matter which die your friend picks. (See Tenney and Foster.⁸)

43 The students in a certain class were classified by hair color and eye color. The conventions used were: Brown and black hair were considered dark, and red and blonde hair were considered light; black and brown eyes were considered dark, and blue and green eyes were considered light. They collected the data shown in Table 5.6. Are these traits independent? (See Example 5.6.)

44 Suppose that in the hypergeometric distribution, we let $N$ and $k$ tend to $\infty$ in such a way that the ratio $k/N$ approaches a real number $p$ between 0 and 1. Show that the hypergeometric distribution tends to the binomial distribution with parameters $n$ and $p$.

⁸ R. L. Tenney and C. C.
Foster, "Nontransitive Dominance," Math. Mag. 49 (1976) no. 3, pgs. 115-120.

               Dark Eyes   Light Eyes
  Dark Hair       28           15         43
  Light Hair       9           23         32
                  37           38         75

Table 5.6: Observed data.

Figure 5.5: Distribution of choices in the Powerball lottery.

45 (a) Compute the leading digits of the first 100 powers of 2, and see how well these data fit the Benford distribution.
(b) Multiply each number in the data set of part (a) by 3, and compare the distribution of the leading digits with the Benford distribution.

46 In the Powerball lottery, contestants pick 5 different integers between 1 and 45, and in addition, pick a bonus integer from the same range (the bonus integer can equal one of the first five integers chosen). Some contestants choose the numbers themselves, and others let the computer choose the numbers. The data shown in Table 5.7 are the contestant-chosen numbers in a certain state on May 3, 1996. A spike graph of the data is shown in Figure 5.5.

The goal of this problem is to check the hypothesis that the chosen numbers are uniformly distributed. To do this, compute the value $v$ of the random variable $\chi^2$ given in Example 5.6. In the present case, this random variable has 44 degrees of freedom. One can find, in a $\chi^2$ table, the value $v_0 = 59.43$, which represents a number with the property that a $\chi^2$-distributed random variable takes on values that exceed $v_0$ only 5% of the time. Does your computed value of $v$ exceed $v_0$? If so, you should reject the hypothesis that the contestants' choices are uniformly distributed.
  Integer   Times      Integer   Times      Integer   Times
            Chosen               Chosen               Chosen
     1      2646          2      2934          3      3352
     4      3000          5      3357          6      2892
     7      3657          8      3025          9      3362
    10      2985         11      3138         12      3043
    13      2690         14      2423         15      2556
    16      2456         17      2479         18      2276
    19      2304         20      1971         21      2543
    22      2678         23      2729         24      2414
    25      2616         26      2426         27      2381
    28      2059         29      2039         30      2298
    31      2081         32      1508         33      1887
    34      1463         35      1594         36      1354
    37      1049         38      1165         39      1248
    40      1493         41      1322         42      1423
    43      1207         44      1259         45      1224

Table 5.7: Numbers chosen by contestants in the Powerball lottery.

5.2 Important Densities

In this section, we will introduce some important probability density functions and give some examples of their use. We will also consider the question of how one simulates a given density using a computer.

Continuous Uniform Density

The simplest density function corresponds to the random variable $U$ whose value represents the outcome of the experiment consisting of choosing a real number at random from the interval $[a, b]$.

$$f(\omega) = \begin{cases} 1/(b - a), & \text{if } a \le \omega \le b, \\ 0, & \text{otherwise.} \end{cases}$$

It is easy to simulate this density on a computer. We simply calculate the expression

$$(b - a)\,rnd + a.$$

Exponential and Gamma Densities

The exponential density function is defined by

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } 0 \le x < \infty, \\ 0, & \text{otherwise.} \end{cases}$$

Here $\lambda$ is any positive constant, depending on the experiment. The reader has seen this density in Example 2.17. In Figure 5.6 we show graphs of several exponential densities for different choices of $\lambda$. The exponential density is often used to describe experiments involving a question of the form: How long until something happens? For example, the exponential density is often used to study the time between emissions of particles from a radioactive source.

Figure 5.6: Exponential densities (curves for $\lambda = 1$, $\lambda = 2$, and $\lambda = 1/2$).

The cumulative distribution function of the exponential density is easy to compute.
Let $T$ be an exponentially distributed random variable with parameter $\lambda$. If $x \ge 0$, then we have
\[ F(x) = P(T \le x) = \int_0^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x} . \]

Both the exponential density and the geometric distribution share a property known as the "memoryless" property. This property was introduced in Example 5.1; it says that
\[ P(T > r + s \mid T > r) = P(T > s) . \]
This can be demonstrated to hold for the exponential density by computing both sides of this equation. The right-hand side is just
\[ 1 - F(s) = e^{-\lambda s} , \]
while the left-hand side is
\[ \frac{P(T > r + s)}{P(T > r)} = \frac{1 - F(r + s)}{1 - F(r)} = \frac{e^{-\lambda (r + s)}}{e^{-\lambda r}} = e^{-\lambda s} . \]

There is a very important relationship between the exponential density and the Poisson distribution. We begin by defining $X_1, X_2, \ldots$ to be a sequence of independent exponentially distributed random variables with parameter $\lambda$. We might think of $X_i$ as denoting the amount of time between the $i$th and $(i+1)$st emissions of a particle by a radioactive source. (As we shall see in Chapter 6, we can think of the parameter $\lambda$ as representing the reciprocal of the average length of time between emissions. This parameter is a quantity that might be measured in an actual experiment of this type.)

We now consider a time interval of length $t$, and we let $Y$ denote the random variable which counts the number of emissions that occur in the time interval. We would like to calculate the distribution function of $Y$ (clearly, $Y$ is a discrete random variable). If we let $S_n$ denote the sum $X_1 + X_2 + \cdots + X_n$, then it is easy to see that
\[ P(Y = n) = P(S_n \le t \text{ and } S_{n+1} > t) . \]
Since the event $S_{n+1} \le t$ is a subset of the event $S_n \le t$, the above probability is seen to be equal to
\[ P(S_n \le t) - P(S_{n+1} \le t) . \tag{5.4} \]
We will show in Chapter 7 that the density of $S_n$ is given by the following formula:
\[ g_n(x) = \begin{cases} \lambda \dfrac{(\lambda x)^{n-1}}{(n-1)!}\, e^{-\lambda x}, & \text{if } x > 0, \\ 0, & \text{otherwise.} \end{cases} \]
This density is an example of a gamma density with parameters $\lambda$ and $n$. The general gamma density allows $n$ to be any positive real number. We shall not discuss this general density.

It is easy to show by induction on $n$ that the cumulative distribution function of $S_n$ is given by:
\[ G_n(x) = \begin{cases} 1 - e^{-\lambda x}\left(1 + \dfrac{\lambda x}{1!} + \cdots + \dfrac{(\lambda x)^{n-1}}{(n-1)!}\right), & \text{if } x > 0, \\ 0, & \text{otherwise.} \end{cases} \]
Using this expression, the quantity in (5.4) is easy to compute; we obtain
\[ e^{-\lambda t}\,\frac{(\lambda t)^n}{n!} , \]
which the reader will recognize as the probability that a Poisson-distributed random variable, with parameter $\lambda t$, takes on the value $n$.

The above relationship will allow us to simulate a Poisson distribution, once we have found a way to simulate an exponential density. The following random variable does the job:
\[ Y = -\frac{1}{\lambda} \log(rnd) . \tag{5.5} \]
Using Corollary 5.2 (below), one can derive the above expression (see Exercise 3). We content ourselves for now with a short calculation that should convince the reader that the random variable $Y$ has the required property. We have
\[ P(Y \le y) = P\left(-\frac{1}{\lambda}\log(rnd) \le y\right) = P(\log(rnd) \ge -\lambda y) = P(rnd \ge e^{-\lambda y}) = 1 - e^{-\lambda y} . \]
This last expression is seen to be the cumulative distribution function of an exponentially distributed random variable with parameter $\lambda$.

To simulate a Poisson random variable $W$ with parameter $\lambda$, we simply generate a sequence of values of an exponentially distributed random variable with the same parameter, and keep track of the subtotals $S_k$ of these values. We stop generating the sequence when the subtotal first exceeds 1; this is the case $t = 1$ of the relationship above, for which the Poisson parameter $\lambda t$ equals $\lambda$. Assume that we find that
\[ S_n \le 1 < S_{n+1} . \]
Then the value $n$ is returned as a simulated value for $W$.

Example 5.7 (Queues) Suppose that customers arrive at random times at a service station with one server, and suppose that each customer is served immediately if no one is ahead of him, but must wait his turn in line otherwise.
How long should each customer expect to wait? (We define the waiting time of a customer to be the length of time between the time that he arrives and the time that he begins to be served.)

Let us assume that the interarrival times between successive customers are given by random variables $X_1, X_2, \ldots, X_n$ that are mutually independent and identically distributed with an exponential cumulative distribution function given by
\[ F_X(t) = 1 - e^{-\lambda t} . \]
Let us assume, too, that the service times for successive customers are given by random variables $Y_1, Y_2, \ldots, Y_n$ that again are mutually independent and identically distributed with another exponential cumulative distribution function given by
\[ F_Y(t) = 1 - e^{-\mu t} . \]
The parameters $\lambda$ and $\mu$ represent, respectively, the reciprocals of the average time between arrivals of customers and the average service time of the customers. Thus, for example, the larger the value of $\lambda$, the smaller the average time between arrivals of customers. We can guess that the length of time a customer will spend in the queue depends on the relative sizes of the average interarrival time and the average service time.

It is easy to verify this conjecture by simulation. The program Queue simulates this queueing process.

Figure 5.7: Queue sizes (one panel for $\lambda = 1$, $\mu = .9$ and one for $\lambda = 1$, $\mu = 1.1$).

Figure 5.8: Waiting times.

Let $N(t)$ be the number of customers in the queue at time $t$. Then we plot $N(t)$ as a function of $t$ for different choices of the parameters $\lambda$ and $\mu$ (see Figure 5.7). We note that when $\lambda < \mu$, then $1/\lambda > 1/\mu$, so the average interarrival time is greater than the average service time, i.e., customers are served more quickly, on average, than new ones arrive.
Thus, in this case, it is reasonable to expect that $N(t)$ remains small. However, if $\lambda > \mu$, then customers arrive more quickly than they are served, and, as expected, $N(t)$ appears to grow without limit.

We can now ask: How long will a customer have to wait in the queue for service? To examine this question, we let $W_i$ be the length of time that the $i$th customer has to remain in the system (waiting in line and being served). Then we can present these data in a bar graph, using the program Queue, to give some idea of how the $W_i$ are distributed (see Figure 5.8). (Here $\lambda = 1$ and $\mu = 1.1$.)

We see that these waiting times appear to be distributed exponentially. This is always the case when $\lambda < \mu$. The proof of this fact is too complicated to give here, but we can verify it by simulation for different choices of $\lambda$ and $\mu$, as above. □

Functions of a Random Variable

Before continuing our list of important densities, we pause to consider random variables which are functions of other random variables. We will prove a general theorem that will allow us to derive expressions such as Equation 5.5.

Theorem 5.1 Let $X$ be a continuous random variable, and suppose that $\phi(x)$ is a strictly increasing function on the range of $X$. Define $Y = \phi(X)$. Suppose that $X$ and $Y$ have cumulative distribution functions $F_X$ and $F_Y$ respectively. Then these functions are related by
\[ F_Y(y) = F_X(\phi^{-1}(y)) . \]
If $\phi(x)$ is strictly decreasing on the range of $X$, then
\[ F_Y(y) = 1 - F_X(\phi^{-1}(y)) . \]

Proof. Since $\phi$ is a strictly increasing function on the range of $X$, the events $(X \le \phi^{-1}(y))$ and $(\phi(X) \le y)$ are equal. Thus, we have
\[ F_Y(y) = P(Y \le y) = P(\phi(X) \le y) = P(X \le \phi^{-1}(y)) = F_X(\phi^{-1}(y)) . \]
If $\phi(x)$ is strictly decreasing on the range of $X$, then we have
\[ F_Y(y) = P(Y \le y) = P(\phi(X) \le y) = P(X \ge \phi^{-1}(y)) = 1 - P(X < \phi^{-1}(y)) = 1 - F_X(\phi^{-1}(y)) , \]
where the last step uses the fact that $X$ is continuous, so that $P(X < x) = P(X \le x)$. This completes the proof.
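As an illustration of the decreasing case of Theorem 5.1, consider the random variable of Equation 5.5. Take $X = rnd$, which is uniform on $[0, 1]$ so that $F_X(u) = u$ there, and $\phi(u) = -(1/\lambda)\log u$, which is strictly decreasing with $\phi^{-1}(y) = e^{-\lambda y}$. The theorem then gives $F_Y(y) = 1 - F_X(e^{-\lambda y}) = 1 - e^{-\lambda y}$ for $y \ge 0$. The following is a minimal sketch that checks this by simulation; it is not one of the book's programs, and the function names, the sample size, the seed, and the choice $\lambda = 2$ are assumptions made only for the illustration.

```python
import math
import random

def simulate_exponential(lam, n, rng):
    """Return n values of Y = -(1/lam) * log(rnd), as in Equation 5.5."""
    # 1 - rng.random() lies in (0, 1], which avoids log(0); it has the
    # same distribution as rng.random() itself.
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]

def empirical_cdf(sample, y):
    """Fraction of the sample that is <= y."""
    return sum(1 for v in sample if v <= y) / len(sample)

def theoretical_cdf(lam, y):
    """F_Y(y) = 1 - exp(-lam * y), from the decreasing case of Theorem 5.1."""
    return 1.0 - math.exp(-lam * y) if y >= 0 else 0.0

rng = random.Random(12345)   # fixed seed (arbitrary) so the run is repeatable
lam = 2.0                    # illustrative parameter value, our assumption
sample = simulate_exponential(lam, 100_000, rng)

# Print y, the empirical value, and the value predicted by the theorem.
for y in (0.1, 0.5, 1.0, 2.0):
    print(y, round(empirical_cdf(sample, y), 3), round(theoretical_cdf(lam, y), 3))
```

The two printed columns should agree to about two decimal places; a larger sample brings them closer.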
Corollary 5.1 Let $X$ be a continuous random variable, and suppose that $\phi(x)$ is a strictly increasing function on the range of $X$. Define $Y = \phi(X)$. Suppose that the density functions of $X$ and $Y$ are $f_X$ and $f_Y$, respectively. Then these functions are related by
\[ f_Y(y) = f_X(\phi^{-1}(y))\,\frac{d}{dy}\phi^{-1}(y) . \]
If $\phi(x)$ is strictly decreasing on the range of $X$, then
\[ f_Y(y) = -f_X(\phi^{-1}(y))\,\frac{d}{dy}\phi^{-1}(y) . \]

Proof. This result follows from Theorem 5.1 by using the Chain Rule. □

If the function $\phi$ is neither strictly increasing nor strictly decreasing, then the situation is somewhat more complicated but can be treated by the same methods. For example, suppose that $Y = X^2$. Then $\phi(x) = x^2$, and
\[ F_Y(y) = P(Y \le y) = P(-\sqrt{y} \le X \le +\sqrt{y}) = P(X \le +\sqrt{y}) - P(X \le -\sqrt{y}) = F_X(\sqrt{y}) - F_X(-\sqrt{y}) . \]
Moreover,
\[ f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy}\bigl(F_X(\sqrt{y}) - F_X(-\sqrt{y})\bigr) = \bigl(f_X(\sqrt{y}) + f_X(-\sqrt{y})\bigr)\frac{1}{2\sqrt{y}} . \]
We see that in order to express $F_Y$ in terms of $F_X$ when $Y = \phi(X)$, we have to express $P(Y \le y)$ in terms of $P(X \le x)$, and this process will depend in general upon the structure of $\phi$.

Simulation

Theorem 5.1 tells us, among other things, how to simulate on the computer a random variable $Y$ with a prescribed cumulative distribution function $F$. We assume that $F(y)$ is strictly increasing for those values of $y$ where $0 < F(y) < 1$. For this purpose, let $U$ be a random variable which is uniformly distributed on $[0, 1]$. Then $U$ has cumulative distribution function $F_U(u) = u$. Now, if $F$ is the prescribed cumulative distribution function for $Y$, then to write $Y$ in terms of $U$ we first solve the equation
\[ F(y) = u \]
for $y$ in terms of $u$. We obtain $y = F^{-1}(u)$. Note that since $F$ is an increasing function this equation always has a unique solution (see Figure 5.9).
Then we set $Z = F^{-1}(U)$ and obtain, by Theorem 5.1,
\[ F_Z(y) = F_U(F(y)) = F(y) , \]
since $F_U(u) = u$. Therefore, $Z$ and $Y$ have the same cumulative distribution function. Summarizing, we have the following.

Figure 5.9: Converting a uniform distribution $F_U$ into a prescribed distribution $F_Y$.

Corollary 5.2 If $F(y)$ is a given cumulative distribution function that is strictly increasing when $0 < F(y) < 1$ and if $U$ is a random variable with uniform distribution on $[0, 1]$, then
\[ Y = F^{-1}(U) \]
has the cumulative distribution $F(y)$. □

Thus, to simulate a random variable with a given cumulative distribution $F$ we need only set $Y = F^{-1}(rnd)$.

Normal Density

We now come to the most important density function, the normal density function. We have seen in Chapter 3 that the binomial distribution functions are bell-shaped, even for moderate size values of $n$. We recall that a binomially-distributed random variable with parameters $n$ and $p$ can be considered to be the sum of $n$ mutually independent 0-1 random variables. A very important theorem in probability theory, called the Central Limit Theorem, states that under very general conditions, if we sum a large number of mutually independent random variables, then the distribution of the sum can be closely approximated by a certain specific continuous density, called the normal density. This theorem will be discussed in Chapter 9.

The normal density function with parameters $\mu$ and $\sigma$ is defined as follows:
\[ f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / 2\sigma^2} . \]
The parameter $\mu$ represents the "center" of the density (and in Chapter 6, we will show that it is the average, or expected, value of the density). The parameter $\sigma$ is a measure of the "spread" of the density, and thus it is assumed to be positive.
(In Chapter 6, we will show that $\sigma$ is the standard deviation of the density.) We note that it is not at all obvious that the above function is a density, i.e., that its integral over the real line equals 1. The cumulative distribution function is given by the formula
\[ F_X(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(u - \mu)^2 / 2\sigma^2}\,du . \]

Figure 5.10: Normal density for two sets of parameter values ($\sigma = 1$ and $\sigma = 2$).

In Figure 5.10 we have included for comparison a plot of the normal density for the cases $\mu = 0$ and $\sigma = 1$, and $\mu = 0$ and $\sigma = 2$.

One cannot write $F_X$ in terms of simple functions. This leads to several problems. First of all, values of $F_X$ must be computed using numerical integration. Extensive tables exist containing values of this function (see Appendix A). Secondly, we cannot write $F_X^{-1}$ in closed form, so we cannot use Corollary 5.2 to help us simulate a normal random variable. For this reason, special methods have been developed for simulating a normal distribution. One such method relies on the fact that if $U$ and $V$ are independent random variables with uniform densities on $[0, 1]$, then the random variables
\[ X = \sqrt{-2 \log U}\,\cos 2\pi V \]
and
\[ Y = \sqrt{-2 \log U}\,\sin 2\pi V \]
are independent, and have normal density functions with parameters $\mu = 0$ and $\sigma = 1$. (This is not obvious, nor shall we prove it here. See Box and Muller.⁹)

⁹ G. E. P. Box and M. E. Muller, A Note on the Generation of Random Normal Deviates, Ann. of Math. Stat. 29 (1958), pgs. 610-611.

Let $Z$ be a normal random variable with parameters $\mu = 0$ and $\sigma = 1$. A normal random variable with these parameters is said to be a standard normal random variable. It is an important and useful fact that if we write
\[ X = \sigma Z + \mu , \]
then $X$ is a normal random variable with parameters $\mu$ and $\sigma$. To show this, we will use Theorem 5.1. We have $\phi(z) = \sigma z + \mu$, $\phi^{-1}(x) = (x - \mu)/\sigma$, and
\[ F_X(x) = F_Z\left(\frac{x - \mu}{\sigma}\right) , \]
\[ f_X(x) = f_Z\left(\frac{x - \mu}{\sigma}\right) \cdot \frac{1}{\sigma} = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / 2\sigma^2} . \]
The reader will note that this last expression is the density function with parameters $\mu$ and $\sigma$, as claimed.

We have seen above that it is possible to simulate a standard normal random variable $Z$. If we wish to simulate a normal random variable $X$ with parameters $\mu$ and $\sigma$, then we need only transform the simulated values for $Z$ using the equation $X = \sigma Z + \mu$.

Suppose that we wish to calculate the value of a cumulative distribution function for the normal random variable $X$ with parameters $\mu$ and $\sigma$. We can reduce this calculation to one concerning the standard normal random variable $Z$ as follows:
\[ F_X(x) = P(X \le x) = P\left(Z \le \frac{x - \mu}{\sigma}\right) = F_Z\left(\frac{x - \mu}{\sigma}\right) . \]
This last expression can be found in a table of values of the cumulative distribution function for a standard normal random variable. Thus, we see that it is unnecessary to make tables of normal distribution functions with arbitrary $\mu$ and $\sigma$.

The process of changing a normal random variable to a standard normal random variable is known as standardization. If $X$ has a normal distribution with parameters $\mu$ and $\sigma$ and if
\[ Z = \frac{X - \mu}{\sigma} , \]
then $Z$ is said to be the standardized version of $X$.

The following example shows how we use the standardized version of a normal random variable $X$ to compute specific probabilities relating to $X$.

Example 5.8 Suppose that $X$ is a normally distributed random variable with parameters $\mu = 10$ and $\sigma = 3$. Find the probability that $X$ is between 4 and 16.

To solve this problem, we note that $Z = (X - 10)/3$ is the standardized version of $X$. So, we have
\[ P(4 \le X \le 16) = P(X \le 16) - P(X \le 4) = F_X(16) - F_X(4) = F_Z\left(\frac{16 - 10}{3}\right) - F_Z\left(\frac{4 - 10}{3}\right) = F_Z(2) - F_Z(-2) . \]

Figure 5.11: Distribution of dart distances in 1000 drops.
The expression $F_Z(2) - F_Z(-2)$ can be evaluated by using tabulated values of the standard normal distribution function (see 11.5); when we use this table, we find that $F_Z(2) = .9772$ and $F_Z(-2) = .0228$. Thus, the answer is .9544.

In Chapter 6, we will see that the parameter $\mu$ is the mean, or average value, of the random variable $X$. The parameter $\sigma$ is a measure of the spread of the random variable, and is called the standard deviation. Thus, the question asked in this example is of a typical type, namely, what is the probability that a random variable has a value within two standard deviations of its average value. □

Maxwell and Rayleigh Densities

Example 5.9 Suppose that we drop a dart on a large table top, which we consider as the $xy$-plane, and suppose that the $x$ and $y$ coordinates of the dart point are independent and have a normal distribution with parameters $\mu = 0$ and $\sigma = 1$. How is the distance of the point from the origin distributed?

This problem arises in physics when it is assumed that a moving particle in $R^n$ has components of the velocity that are mutually independent and normally distributed and it is desired to find the density of the speed of the particle. The density in the case $n = 3$ is called the Maxwell density.

The density in the case $n = 2$ (i.e., the dart board experiment described above) is called the Rayleigh density. We can simulate this case by picking independently a pair of coordinates $(x, y)$, each from a normal distribution with $\mu = 0$ and $\sigma = 1$ on $(-\infty, \infty)$, calculating the distance $r = \sqrt{x^2 + y^2}$ of the point $(x, y)$ from the origin, repeating this process a large number of times, and then presenting the results in a bar graph. The results are shown in Figure 5.11.

            Female   Male
A             37      56      93
B             63      60     123
C             47      43      90
Below C        5       8      13
             152     167     319

Table 5.8: Calculus class data.
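The dart simulation of Example 5.9 can be sketched in a few lines. This is only an illustration under stated assumptions, not the program used to produce Figure 5.11: it takes Python's `random.gauss` as the source of standard normal values, and the function names, bin width, seed, and text-mode bar graph are choices of ours. Next to each bar it prints the value of the Rayleigh density $f(r) = r e^{-r^2/2}$ that the text quotes for this experiment.

```python
import math
import random

def dart_distances(n_drops, rng):
    """Drop n_drops darts with independent standard normal x and y
    coordinates and return their distances from the origin."""
    distances = []
    for _ in range(n_drops):
        x = rng.gauss(0.0, 1.0)
        y = rng.gauss(0.0, 1.0)
        distances.append(math.sqrt(x * x + y * y))
    return distances

def histogram(values, bin_width, n_bins):
    """Return bar heights scaled so that the total area is about 1."""
    counts = [0] * n_bins
    for v in values:
        k = int(v / bin_width)
        if k < n_bins:            # values beyond the last bin are dropped
            counts[k] += 1
    return [c / (len(values) * bin_width) for c in counts]

def rayleigh_density(r):
    """Theoretical density f(r) = r * exp(-r^2 / 2) for r >= 0."""
    return r * math.exp(-r * r / 2.0)

rng = random.Random(2006)         # fixed seed (arbitrary) for repeatability
distances = dart_distances(1000, rng)
bin_width = 0.25
heights = histogram(distances, bin_width, 20)

# One row per bin: midpoint, bar height, theoretical density, text bar.
for k, h in enumerate(heights):
    mid = (k + 0.5) * bin_width
    print(f"{mid:4.2f} {h:5.3f} {rayleigh_density(mid):5.3f} " + "#" * int(40 * h))
```

With only 1000 drops the bars fluctuate visibly around the density; increasing the number of drops smooths the picture.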
We have also plotted the theoretical density
\[ f(r) = r e^{-r^2/2} . \]
This will be derived in Chapter 7; see Example 7.7. □

Chi-Squared Density

We return to the problem of independence of traits discussed in Example 5.6. It is frequently the case that we have two traits, each of which has several different values. As was seen in the example, quite a lot of calculation was needed even in the case of two values for each trait. We now give another method for testing independence of traits, which involves much less calculation.

Example 5.10 Suppose that we have the data shown in Table 5.8 concerning grades and gender of students in a Calculus class. We can use the same sort of model in this situation as was used in Example 5.6. We imagine that we have an urn with 319 balls of two colors, say blue and red, corresponding to females and males, respectively. We now draw 93 balls, without replacement, from the urn. These balls correspond to the grade of A. We continue by drawing 123 balls, which correspond to the grade of B. When we finish, we have four sets of balls, with each ball belonging to exactly one set. (We could have stipulated that the balls were of four colors, corresponding to the four possible grades. In this case, we would draw a subset of size 152, which would correspond to the females. The balls remaining in the urn would correspond to the males. The choice does not affect the final determination of whether we should reject the hypothesis of independence of traits.)

            Female   Male
A            44.3    48.7     93
B            58.6    64.4    123
C            42.9    47.1     90
Below C       6.2     6.8     13
             152     167     319

Table 5.9: Expected data.

The expected data set can be determined in exactly the same way as in Example 5.6. If we do this, we obtain the expected values shown in Table 5.9. Even if
the traits are independent, we would still expect to see some differences between the numbers in corresponding boxes in the two tables. However, if the differences are large, then we might suspect that the two traits are not independent. In Example 5.6, we used the probability distribution of the various possible data sets to compute the probability of finding a data set that differs from the expected data set by at least as much as the actual data set does. We could do the same in this case, but the amount of computation is enormous.

Instead, we will describe a single number which does a good job of measuring how far a given data set is from the expected one. To quantify how far apart the two sets of numbers are, we could sum the squares of the differences of the corresponding numbers. (We could also sum the absolute values of the differences, but we would not want to sum the differences themselves, since positive and negative differences would cancel.) Suppose that we have data in which we expect to see 10 objects of a certain type, but instead we see 18, while in another case we expect to see 50 objects of a certain type, but instead we see 58. Even though the two differences are the same, the first difference is more surprising than the second, since the expected number of outcomes in the second case is quite a bit larger than the expected number in the first case. One way to correct for this is to divide the individual squares of the differences by the expected number for that box.
Thus, if we label the values in the eight boxes in the first table by $O_i$ (for observed values) and the values in the eight boxes in the second table by $E_i$ (for expected values), then the following expression might be a reasonable one to use to measure how far the observed data is from what is expected:
\[ \sum_{i=1}^{8} \frac{(O_i - E_i)^2}{E_i} . \]
This expression is a random variable, which is usually denoted by the symbol $\chi^2$, pronounced "ki-squared." It is called this because, under the assumption of independence of the two traits, the density of this random variable can be computed and is approximately equal to a density called the chi-squared density. We choose not to give the explicit expression for this density, since it involves the gamma function, which we have not discussed. The chi-squared density is, in fact, a special case of the general gamma density.

In applying the chi-squared density, tables of values of this density are used, as in the case of the normal density. The chi-squared density has one parameter $n$, which is called the number of degrees of freedom. The number $n$ is usually easy to determine from the problem at hand. For example, if we are checking two traits for independence, and the two traits have $a$ and $b$ values, respectively, then the number of degrees of freedom of the random variable $\chi^2$ is $(a - 1)(b - 1)$. So, in the example at hand, the number of degrees of freedom is 3.

We recall that in this example, we are trying to test for independence of the two traits of gender and grades. If we assume these traits are independent, then the ball-and-urn model given above gives us a way to simulate the experiment. Using a computer, we have performed 1000 experiments, and for each one, we have calculated a value of the random variable $\chi^2$. The results are shown in Figure 5.12, together with the chi-squared density function with three degrees of freedom.
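For the class data of Table 5.8, the expected counts and the value of the statistic can be computed directly. The following is a minimal sketch with names of our own choosing. It assumes that the expected count in a box is (row total)(column total)/(grand total), which is what the urn model gives and which reproduces Table 5.9 to one decimal place.

```python
# Observed counts from Table 5.8: rows are grades A, B, C, Below C;
# columns are Female, Male.
observed = [
    [37, 56],
    [63, 60],
    [47, 43],
    [5, 8],
]

def expected_counts(table):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_squared(table):
    """Sum of (O - E)^2 / E over all boxes of the table."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

def degrees_of_freedom(table):
    """(a - 1)(b - 1) for a table with a rows and b columns."""
    return (len(table) - 1) * (len(table[0]) - 1)

expected = expected_counts(observed)
for row in expected:
    print([round(e, 1) for e in row])      # compare with Table 5.9
print(round(chi_squared(observed), 2), degrees_of_freedom(observed))
```

The same three functions apply unchanged to any rectangular table of counts, for example the two-by-two table of Example 5.6.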
Figure 5.12: Chi-squared density with three degrees of freedom.

As we stated above, if the value of the random variable $\chi^2$ is large, then we would tend not to believe that the two traits are independent. But how large is large? The actual value of this random variable for the data above is 4.13. In Figure 5.12, we have shown the chi-squared density with 3 degrees of freedom. It can be seen that the value 4.13 is larger than most of the values taken on by this random variable.

Typically, a statistician will compute the value $v$ of the random variable $\chi^2$, just as we have done. Then, by looking in a table of values of the chi-squared density, a value $v_0$ is determined which is only exceeded 5% of the time. If $v \ge v_0$, the statistician rejects the hypothesis that the two traits are independent. In the present case, $v_0 = 7.815$, so we would not reject the hypothesis that the two traits are independent. □

Cauchy Density

The following example is from Feller.¹⁰

Example 5.11 Suppose that a mirror is mounted on a vertical axis, and is free to revolve about that axis. The axis of the mirror is 1 foot from a straight wall of infinite length. A pulse of light is shone onto the mirror, and the reflected ray hits the wall. Let $\phi$ be the angle between the reflected ray and the line that is perpendicular to the wall and that runs through the axis of the mirror. We assume that $\phi$ is uniformly distributed between $-\pi/2$ and $\pi/2$. Let $X$ represent the distance between the point on the wall that is hit by the reflected ray and the point on the wall that is closest to the axis of the mirror. We now determine the density of $X$.

Let $B$ be a fixed positive quantity. Then $X \ge B$ if and only if $\tan(\phi) \ge B$, which happens if and only if $\phi \ge \arctan(B)$. This happens with probability
\[ \frac{\pi/2 - \arctan(B)}{\pi} . \]

¹⁰ W.
Feller, An Introduction to Probability Theory and Its Applications, vol. 2 (New York: Wiley, 1966).

Thus, for positive $B$, the cumulative distribution function of $X$ is
\[ F(B) = 1 - \frac{\pi/2 - \arctan(B)}{\pi} . \]
Therefore, the density function for positive $B$ is
\[ f(B) = \frac{1}{\pi (1 + B^2)} . \]
Since the physical situation is symmetric with respect to $\phi = 0$, it is easy to see that the above expression for the density is correct for negative values of $B$ as well.

The Law of Large Numbers, which we will discuss in Chapter 8, states that in many cases, if we take the average of independent values of a random variable, then the average approaches a specific number as the number of values increases. It turns out that if one does this with a Cauchy-distributed random variable, the average does not approach any specific number. □

Exercises

1 Choose a number $U$ from the unit interval $[0, 1]$ with uniform distribution. Find the cumulative distribution and density for the random variables
   (a) $Y = U + 2$.
   (b) $Y = U^3$.

2 Choose a number $U$ from the interval $[0, 1]$ with uniform distribution. Find the cumulative distribution and density for the random variables
   (a) $Y = 1/(U + 1)$.
   (b) $Y = \log(U + 1)$.

3 Use Corollary 5.2 to derive the expression for the random variable given in Equation 5.5. Hint: The random variables $1 - rnd$ and $rnd$ are identically distributed.

4 Suppose we know a random variable $Y$ as a function of the uniform random variable $U$: $Y = \phi(U)$, and suppose we have calculated the cumulative distribution function $F_Y(y)$ and thence the density $f_Y(y)$. How can we check whether our answer is correct? An easy simulation provides the answer: Make a bar graph of $Y = \phi(rnd)$ and compare the result with the graph of $f_Y(y)$. These graphs should look similar. Check your answers to Exercises 1 and 2 by this method.
5 Choose a number $U$ from the interval $[0, 1]$ with uniform distribution. Find the cumulative distribution and density for the random variables
   (a) $Y = |U - 1/2|$.
   (b) $Y = (U - 1/2)^2$.

6 Check your results for Exercise 5 by simulation as described in Exercise 4.

7 Explain how you can generate a random variable whose cumulative distribution function is
\[ F(x) = \begin{cases} 0, & \text{if } x < 0, \\ x^2, & \text{if } 0 \le x \le 1, \\ 1, & \text{if } x > 1. \end{cases} \]

8 Write a program to generate a sample of 1000 random outcomes each of which is chosen from the distribution given in Exercise 7. Plot a bar graph of your results and compare this empirical density with the density for the cumulative distribution given in Exercise 7.

9 Let $U$, $V$ be random numbers chosen independently from the interval $[0, 1]$ with uniform distribution. Find the cumulative distribution and density of each of the variables
   (a) $Y = U + V$.
   (b) $Y = |U - V|$.

10 Let $U$, $V$ be random numbers chosen independently from the interval $[0, 1]$. Find the cumulative distribution and density for the random variables
   (a) $Y = \max(U, V)$.
   (b) $Y = \min(U, V)$.

11 Write a program to simulate the random variables of Exercises 9 and 10 and plot a bar graph of the results. Compare the resulting empirical density with the density found in Exercises 9 and 10.

12 A number $U$ is chosen at random in the interval $[0, 1]$. Find the probability that
   (a) $R = U^2 < 1/4$.
   (b) $S = U(1 - U) < 1/4$.
   (c) $T = U/(1 - U) < 1/4$.

13 Find the cumulative distribution function $F$ and the density function $f$ for each of the random variables $R$, $S$, and $T$ in Exercise 12.

14 A point $P$ in the unit square has coordinates $X$ and $Y$ chosen at random in the interval $[0, 1]$. Let $D$ be the distance from $P$ to the nearest edge of the square, and $E$ the distance to the nearest corner. What is the probability that
   (a) $D < 1/4$?
   (b) $E < 1/4$?
15 In Exercise 14 find the cumulative distribution $F$ and density $f$ for the random variable $D$.

16 Let $X$ be a random variable with density function
\[ f_X(x) = \begin{cases} c x (1 - x), & \text{if } 0 < x < 1, \\ 0, & \text{otherwise.} \end{cases} \]
   (a) What is the value of $c$?
   (b) What is the cumulative distribution function $F_X$ for $X$?
   (c) What is the probability that $X < 1/4$?

17 Let $X$ be a random variable with cumulative distribution function
\[ F(x) = \begin{cases} 0, & \text{if } x < 0, \\ \sin^2(\pi x / 2), & \text{if } 0 \le x \le 1, \\ 1, & \text{if } 1 < x. \end{cases} \]
   (a) What is the density function $f_X$ for $X$?
   (b) What is the probability that $X < 1/4$?

18 Let $X$ be a random variable with cumulative distribution function $F_X$, and let $Y = X + b$, $Z = aX$, and $W = aX + b$, where $a$ and $b$ are any constants. Find the cumulative distribution functions $F_Y$, $F_Z$, and $F_W$. Hint: The cases $a > 0$, $a = 0$, and $a < 0$ require different arguments.

19 Let $X$ be a random variable with density function $f_X$, and let $Y = X + b$, $Z = aX$, and $W = aX + b$, where $a \ne 0$. Find the density functions $f_Y$, $f_Z$, and $f_W$. (See Exercise 18.)

20 Let $X$ be a random variable uniformly distributed over $[c, d]$, and let $Y = aX + b$. For what choice of $a$ and $b$ is $Y$ uniformly distributed over $[0, 1]$?

21 Let $X$ be a random variable with cumulative distribution function $F$ strictly increasing on the range of $X$. Let $Y = F(X)$. Show that $Y$ is uniformly distributed in the interval $[0, 1]$. (The formula $X = F^{-1}(Y)$ then tells us how to construct $X$ from a uniform random variable $Y$.)

22 Let $X$ be a random variable with cumulative distribution function $F$. The median of $X$ is the value $m$ for which $F(m) = 1/2$. Then $X < m$ with probability 1/2 and $X > m$ with probability 1/2. Find $m$ if $X$ is
   (a) uniformly distributed over the interval $[a, b]$.
   (b) normally distributed with parameters $\mu$ and $\sigma$.
   (c) exponentially distributed with parameter $\lambda$.

23 Let $X$ be a random variable with density function $f_X$. The mean of $X$ is the value $\mu = \int_{-\infty}^{+\infty} x f_X(x)\,dx$. Then $\mu$ gives an average value for $X$ (see Section 6.3). Find $\mu$ if $X$ is distributed uniformly, normally, or exponentially, as in Exercise 22.

24 Let $X$ be a random variable with density function $f_X$. The mode of $X$ is the value $M$ for which $f(M)$ is maximum. Then values of $X$ near $M$ are most likely to occur. Find $M$ if $X$ is distributed normally or exponentially, as in Exercise 22. What happens if $X$ is distributed uniformly?

25 Let $X$ be a random variable normally distributed with parameters $\mu = 70$, $\sigma = 10$. Estimate
   (a) $P(X > 50)$.
   (b) $P(X < 60)$.
   (c) $P(X > 90)$.
   (d) $P(60 < X < 80)$.

26 Bridies' Bearing Works manufactures bearing shafts whose diameters are normally distributed with parameters $\mu = 1$, $\sigma = .002$. The buyer's specifications require these diameters to be $1.000 \pm .003$ cm. What fraction of the manufacturer's shafts are likely to be rejected? If the manufacturer improves her quality control, she can reduce the value of $\sigma$. What value of $\sigma$ will ensure that no more than 1 percent of her shafts are likely to be rejected?

27 A final examination at Podunk University is constructed so that the test scores are approximately normally distributed, with parameters $\mu$ and $\sigma$. The instructor assigns letter grades to the test scores as shown in Table 5.10 (this is the process of "grading on the curve"). What fraction of the class gets A, B, C, D, F?

Test Score                               Letter grade
$\mu + \sigma < x$                            A
$\mu < x < \mu + \sigma$                      B
$\mu - \sigma < x < \mu$                      C
$\mu - 2\sigma < x < \mu - \sigma$            D
$x < \mu - 2\sigma$                           F

Table 5.10: Grading on the curve.

28 (Ross¹¹) An expert witness in a paternity suit testifies that the length (in days) of a pregnancy from conception to delivery is approximately normally distributed, with parameters $\mu = 270$, $\sigma = 10$.
The defendant in the suit is able to prove that he was out of the country during the period from 290 to 240 days before the birth of the child. What is the probability that the defendant was in the country when the child was conceived?

29 Suppose that the time (in hours) required to repair a car is an exponentially distributed random variable with parameter $\lambda = 1/2$. What is the probability that the repair time exceeds 4 hours? If it exceeds 4 hours, what is the probability that it exceeds 8 hours?

$^{11}$ S. Ross, A First Course in Probability Theory, 2d ed. (New York: Macmillan, 1984).

30 Suppose that the number of years a car will run is exponentially distributed with parameter $\mu = 1/4$. If Prosser buys a used car today, what is the probability that it will still run after 4 years?

31 Let $U$ be a uniformly distributed random variable on $[0, 1]$. What is the probability that the equation
\[
x^2 + 4Ux + 1 = 0
\]
has two distinct real roots $x_1$ and $x_2$?

32 Write a program to simulate the random variables whose densities are given by the following, making a suitable bar graph of each and comparing the exact density with the bar graph.
(a) $f_X(x) = e^{-x}$ on $[0, \infty)$ (but just do it on $[0, 10]$).
(b) $f_X(x) = 2x$ on $[0, 1]$.
(c) $f_X(x) = 3x^2$ on $[0, 1]$.
(d) $f_X(x) = 4|x - 1/2|$ on $[0, 1]$.

33 Suppose we are observing a process such that the time between occurrences is exponentially distributed with $\lambda = 1/30$ (i.e., the average time between occurrences is 30 minutes). Suppose that the process starts at a certain time and we start observing the process 3 hours later. Write a program to simulate this process. Let $T$ denote the length of time that we have to wait, after we start our observation, for an occurrence. Have your program keep track of $T$. What is an estimate for the average value of $T$?
34 Jones puts in two new lightbulbs: a 60 watt bulb and a 100 watt bulb. It is claimed that the lifetime of the 60 watt bulb has an exponential density with average lifetime 200 hours ($\lambda = 1/200$). The 100 watt bulb also has an exponential density but with average lifetime of only 100 hours ($\mu = 1/100$). Jones wonders what is the probability that the 100 watt bulb will outlast the 60 watt bulb.

If $X$ and $Y$ are two independent random variables with exponential densities $f(x) = \lambda e^{-\lambda x}$ and $g(x) = \mu e^{-\mu x}$, respectively, then the probability that $X$ is less than $Y$ is given by
\[
P(X < Y) = \int_0^\infty f(x)\,(1 - G(x))\,dx,
\]
where $G(x)$ is the cumulative distribution function for $g(x)$. Explain why this is the case. Use this to show that
\[
P(X < Y) = \frac{\lambda}{\lambda + \mu}
\]
and to answer Jones's question.

35 Consider the simple queueing process of Example 5.7. Suppose that you watch the size of the queue. If there are $j$ people in the queue, the next time the queue size changes it will either decrease to $j - 1$ or increase to $j + 1$. Use the result of Exercise 34 to show that the probability that the queue size decreases to $j - 1$ is $\mu/(\mu + \lambda)$ and the probability that it increases to $j + 1$ is $\lambda/(\mu + \lambda)$. When the queue size is 0 it can only increase to 1. Write a program to simulate the queue size. Use this simulation to help formulate a conjecture containing conditions on $\mu$ and $\lambda$ that will ensure that the queue will have times when it is empty.

36 Let $X$ be a random variable having an exponential density with parameter $\lambda$. Find the density for the random variable $Y = rX$, where $r$ is a positive real number.
37 Let $X$ be a random variable having a normal density, and consider the random variable $Y = e^X$. Then $Y$ has a log normal density. Find this density of $Y$.

38 Let $X_1$ and $X_2$ be independent random variables, and for $i = 1, 2$, let $Y_i = \phi_i(X_i)$, where $\phi_i$ is strictly increasing on the range of $X_i$. Show that $Y_1$ and $Y_2$ are independent. Note that the same result is true without the assumption that the $\phi_i$'s are strictly increasing, but the proof is more difficult.

Chapter 6
Expected Value and Variance

6.1 Expected Value of Discrete Random Variables

When a large collection of numbers is assembled, as in a census, we are usually interested not in the individual numbers, but rather in certain descriptive quantities such as the average or the median. In general, the same is true for the probability distribution of a numerically-valued random variable. In this and in the next section, we shall discuss two such descriptive quantities: the expected value and the variance. Both of these quantities apply only to numerically-valued random variables, and so we assume, in these sections, that all random variables have numerical values. To give some intuitive justification for our definition, we consider the following game.

Average Value

A die is rolled. If an odd number turns up, we win an amount equal to this number; if an even number turns up, we lose an amount equal to this number. For example, if a two turns up we lose 2, and if a three comes up we win 3. We want to decide if this is a reasonable game to play. We first try simulation. The program Die carries out this simulation.

The program prints the frequency and the relative frequency with which each outcome occurs. It also calculates the average winnings. We have run the program twice. The results are shown in Table 6.1. In the first run we have played the game 100 times. In this run our average gain is $-.57$.
It looks as if the game is unfavorable, and we wonder how unfavorable it really is. To get a better idea, we have played the game 10,000 times. In this case our average gain is $-.4949$.

We note that the relative frequency of each of the six possible outcomes is quite close to the probability 1/6 for this outcome. This corresponds to our frequency interpretation of probability. It also suggests that for very large numbers of plays, our average gain should be
\[
\mu = 1\Bigl(\frac16\Bigr) - 2\Bigl(\frac16\Bigr) + 3\Bigl(\frac16\Bigr) - 4\Bigl(\frac16\Bigr) + 5\Bigl(\frac16\Bigr) - 6\Bigl(\frac16\Bigr) = \frac96 - \frac{12}{6} = -\frac36 = -.5\,.
\]
This agrees quite well with our average gain for 10,000 plays.

We note that the value we have chosen for the average gain is obtained by taking the possible outcomes, multiplying by the probability, and adding the results. This suggests the following definition for the expected outcome of an experiment.

            n = 100                    n = 10000
Winning     Frequency   Relative       Frequency   Relative
                        Frequency                  Frequency
  1         17          .17            1681        .1681
 -2         17          .17            1678        .1678
  3         16          .16            1626        .1626
 -4         18          .18            1696        .1696
  5         16          .16            1686        .1686
 -6         16          .16            1633        .1633

Table 6.1: Frequencies for dice game.

Expected Value

Definition 6.1 Let $X$ be a numerically-valued discrete random variable with sample space $\Omega$ and distribution function $m(x)$. The expected value $E(X)$ is defined by
\[
E(X) = \sum_{x \in \Omega} x\,m(x),
\]
provided this sum converges absolutely. We often refer to the expected value as the mean, and denote $E(X)$ by $\mu$ for short. If the above sum does not converge absolutely, then we say that $X$ does not have an expected value. □

Example 6.1 Let an experiment consist of tossing a fair coin three times. Let $X$ denote the number of heads which appear. Then the possible values of $X$ are 0, 1, 2, and 3. The corresponding probabilities are 1/8, 3/8, 3/8, and 1/8.
Thus, the expected value of $X$ equals
\[
0\Bigl(\frac18\Bigr) + 1\Bigl(\frac38\Bigr) + 2\Bigl(\frac38\Bigr) + 3\Bigl(\frac18\Bigr) = \frac32\,.
\]
Later in this section we shall see a quicker way to compute this expected value, based on the fact that $X$ can be written as a sum of simpler random variables. □

Example 6.2 Suppose that we toss a fair coin until a head first comes up, and let $X$ represent the number of tosses which were made. Then the possible values of $X$ are $1, 2, \ldots$, and the distribution function of $X$ is defined by
\[
m(i) = \frac{1}{2^i}\,.
\]
(This is just the geometric distribution with parameter 1/2.) Thus, we have
\[
\begin{aligned}
E(X) &= \sum_{i=1}^{\infty} i\,\frac{1}{2^i} \\
     &= \sum_{i=1}^{\infty} \frac{1}{2^i} + \sum_{i=2}^{\infty} \frac{1}{2^i} + \cdots \\
     &= 1 + \frac12 + \frac{1}{2^2} + \cdots \\
     &= 2\,.
\end{aligned}
\]
□

Example 6.3 (Example 6.2 continued) Suppose that we flip a coin until a head first appears, and if the number of tosses equals $n$, then we are paid $2^n$ dollars. What is the expected value of the payment?

We let $Y$ represent the payment. Then
\[
P(Y = 2^n) = \frac{1}{2^n}
\]
for $n \ge 1$. Thus,
\[
E(Y) = \sum_{n=1}^{\infty} 2^n\,\frac{1}{2^n}\,,
\]
which is a divergent sum. Thus, $Y$ has no expectation. This example is called the St. Petersburg Paradox. The fact that the above sum is infinite suggests that a player should be willing to pay any fixed amount per game for the privilege of playing this game. The reader is asked to consider how much he or she would be willing to pay for this privilege. It is unlikely that the reader's answer is more than 10 dollars; therein lies the paradox.

In the early history of probability, various mathematicians gave ways to resolve this paradox. One idea (due to G. Cramer) consists of assuming that the amount of money in the world is finite. He thus assumes that there is some fixed value of $n$ such that if the number of tosses equals or exceeds $n$, the payment is $2^n$ dollars. The reader is asked to show in Exercise 20 that the expected value of the payment is now finite.
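The contrast between the convergent sum of Example 6.2 and the divergent sum of Example 6.3 can be seen numerically. The following Python sketch is an illustration only, not one of the book's programs; the function names and the parameter `cap` are our own. It computes exact partial sums of both series, and the expected payment under a cap of the kind Cramer proposed, assuming that every game lasting `cap` or more tosses pays $2^{cap}$ dollars.

```python
from fractions import Fraction


def geometric_partial_mean(n_terms):
    """Partial sum of sum_{i>=1} i * (1/2)^i, the series for E(X) in Example 6.2."""
    return sum(Fraction(i, 2**i) for i in range(1, n_terms + 1))


def st_petersburg_partial_mean(n_terms):
    """Partial sum of sum_{n>=1} 2^n * (1/2)^n, the series for E(Y) in Example 6.3."""
    return sum(Fraction(2**n, 2**n) for n in range(1, n_terms + 1))


def capped_st_petersburg_mean(cap):
    """Expected payment when every game of `cap` or more tosses pays 2^cap dollars."""
    # Games that end before the cap pay 2^n with probability (1/2)^n.
    below_cap = sum(Fraction(2**n, 2**n) for n in range(1, cap))
    # P(number of tosses >= cap) = (1/2)^(cap - 1): the first cap - 1 tosses are tails.
    at_or_above_cap = 2**cap * Fraction(1, 2 ** (cap - 1))
    return below_cap + at_or_above_cap


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, float(geometric_partial_mean(n)), st_petersburg_partial_mean(n))
    print(capped_st_petersburg_mean(10))
```

The partial sums for Example 6.2 approach 2, while those for Example 6.3 grow by 1 with every added term. With a cap the sum has only finitely many terms, so it is finite.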
Daniel Bernoulli and Cramer also considered another way to assign value to the payment. Their idea was that the value of a payment is some function of the payment; such a function is now called a utility function. Examples of reasonable utility functions might include the square-root function or the logarithm function. In both cases, the value of $2n$ dollars is less than twice the value of $n$ dollars. It can easily be shown that in both cases, the expected utility of the payment is finite (see Exercise 20). □

Example 6.4 Let $T$ be the time for the first success in a Bernoulli trials process. Then we take as sample space $\Omega$ the integers $1, 2, \ldots$ and assign the geometric distribution
\[
m(j) = P(T = j) = q^{j-1}p\,.
\]
Thus,
\[
E(T) = 1 \cdot p + 2qp + 3q^2p + \cdots = p\,(1 + 2q + 3q^2 + \cdots)\,.
\]
Now if $|x| < 1$, then
\[
1 + x + x^2 + x^3 + \cdots = \frac{1}{1 - x}\,.
\]
Differentiating this formula, we get
\[
1 + 2x + 3x^2 + \cdots = \frac{1}{(1 - x)^2}\,,
\]
so
\[
E(T) = \frac{p}{(1 - q)^2} = \frac{p}{p^2} = \frac1p\,.
\]
In particular, we see that if we toss a fair coin a sequence of times, the expected time until the first heads is 1/(1/2) = 2. If we roll a die a sequence of times, the expected number of rolls until the first six is 1/(1/6) = 6. □

Interpretation of Expected Value

In statistics, one is frequently concerned with the average value of a set of data. The following example shows that the ideas of average value and expected value are very closely related.

Example 6.5 The heights of the women on the Swarthmore basketball team are 5' 9", 5' 9", 5' 6", 5' 8", 5' 11", 5' 5", 5' 7", 5' 6", 5' 6", 5' 7", 5' 10", and 6' 0". A statistician would compute the average height (in inches) as follows:
\[
\frac{69 + 69 + 66 + 68 + 71 + 65 + 67 + 66 + 66 + 67 + 70 + 72}{12} = 68\,.
\]
One can also interpret this number as the expected value of a random variable.
To see this, let an experiment consist of choosing one of the women at random, and let $X$ denote her height. Then the expected value of $X$ equals 68. □

Of course, just as with the frequency interpretation of probability, to interpret expected value as an average outcome requires further justification. We know that for any finite experiment the average of the outcomes is not predictable. However, we shall eventually prove that the average will usually be close to $E(X)$ if we repeat the experiment a large number of times. We first need to develop some properties of the expected value. Using these properties, and those of the concept of the variance to be introduced in the next section, we shall be able to prove the Law of Large Numbers. This theorem will justify mathematically both our frequency concept of probability and the interpretation of expected value as the average value to be expected in a large number of experiments.

X      Y
HHH    1
HHT    2
HTH    3
HTT    2
THH    2
THT    3
TTH    2
TTT    1

Table 6.2: Tossing a coin three times.

Expectation of a Function of a Random Variable

Suppose that $X$ is a discrete random variable with sample space $\Omega$, and $\phi(x)$ is a real-valued function with domain $\Omega$. Then $\phi(X)$ is a real-valued random variable. One way to determine the expected value of $\phi(X)$ is to first determine the distribution function of this random variable, and then use the definition of expectation. However, there is a better way to compute the expected value of $\phi(X)$, as demonstrated in the next example.

Example 6.6 Suppose a coin is tossed 9 times, with the result
\[
HHHTTTTHT\,.
\]
The first set of three heads is called a run. There are three more runs in this sequence, namely the next four tails, the next head, and the next tail. We do not consider the first two tosses to constitute a run, since the third toss has the same value as the first two.
Now suppose an experiment consists of tossing a fair coin three times. Find the expected number of runs. It will be helpful to think of two random variables, $X$ and $Y$, associated with this experiment. We let $X$ denote the sequence of heads and tails that results when the experiment is performed, and $Y$ denote the number of runs in the outcome $X$. The possible outcomes of $X$ and the corresponding values of $Y$ are shown in Table 6.2.

To calculate $E(Y)$ using the definition of expectation, we first must find the distribution function $m(y)$ of $Y$; i.e., we group together those values of $X$ with a common value of $Y$ and add their probabilities. In this case, we calculate that the distribution function of $Y$ is: $m(1) = 1/4$, $m(2) = 1/2$, and $m(3) = 1/4$. One easily finds that $E(Y) = 2$.

Now suppose we didn't group the values of $X$ with a common $Y$-value, but instead, for each $X$-value $x$, we multiply the probability of $x$ and the corresponding value of $Y$, and add the results. We obtain
\[
1\Bigl(\frac18\Bigr) + 2\Bigl(\frac18\Bigr) + 3\Bigl(\frac18\Bigr) + 2\Bigl(\frac18\Bigr) + 2\Bigl(\frac18\Bigr) + 3\Bigl(\frac18\Bigr) + 2\Bigl(\frac18\Bigr) + 1\Bigl(\frac18\Bigr),
\]
which equals 2.

This illustrates the following general principle. If $X$ and $Y$ are two random variables, and $Y$ can be written as a function of $X$, then one can compute the expected value of $Y$ using the distribution function of $X$. □

Theorem 6.1 If $X$ is a discrete random variable with sample space $\Omega$ and distribution function $m(x)$, and if $\phi : \Omega \to \mathbf{R}$ is a function, then
\[
E(\phi(X)) = \sum_{x \in \Omega} \phi(x)\,m(x),
\]
provided the series converges absolutely. □

The proof of this theorem is straightforward, involving nothing more than grouping values of $X$ with a common $Y$-value, as in Example 6.6.

The Sum of Two Random Variables

Many important results in probability theory concern sums of random variables. We first consider what it means to add two random variables.
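As a numerical check of Theorem 6.1 on the runs computation of Example 6.6, the following Python sketch computes $E(Y)$ in both ways: by first grouping outcomes to obtain the distribution of $Y$, and by summing $\phi(x)\,m(x)$ directly over the eight outcomes of $X$. It is an illustration under our own naming (`count_runs`, `OUTCOMES`, and so on are not from the text), with `count_runs` playing the role of $\phi$.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import groupby, product


def count_runs(tosses):
    """Number of maximal blocks of equal consecutive symbols, e.g. 'HHT' has 2."""
    return sum(1 for _ in groupby(tosses))


# X: the eight equally likely outcomes of three tosses of a fair coin.
OUTCOMES = ["".join(t) for t in product("HT", repeat=3)]
P_OUTCOME = Fraction(1, len(OUTCOMES))


def distribution_of_runs():
    """Distribution m(y) of Y, found by grouping outcomes with a common Y-value."""
    m = defaultdict(Fraction)
    for x in OUTCOMES:
        m[count_runs(x)] += P_OUTCOME
    return dict(m)


def expected_runs_grouped():
    """E(Y) from the definition of expectation, using the distribution of Y."""
    return sum(y * prob for y, prob in distribution_of_runs().items())


def expected_runs_direct():
    """E(Y) as in Theorem 6.1: the sum of phi(x) m(x) over the outcomes x of X."""
    return sum(count_runs(x) * P_OUTCOME for x in OUTCOMES)


if __name__ == "__main__":
    print(distribution_of_runs())
    print(expected_runs_grouped(), expected_runs_direct())
```

The second function never builds the distribution of $Y$, which is the practical advantage of the theorem when that distribution is awkward to write down.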
Example 6.7 We flip a coin and let $X$ have the value 1 if the coin comes up heads and 0 if the coin comes up tails. Then, we roll a die and let $Y$ denote the face that comes up. What does $X + Y$ mean, and what is its distribution? This question is easily answered in this case, by considering, as we did in Chapter 4, the joint random variable $Z = (X, Y)$, whose outcomes are ordered pairs of the form $(x, y)$, where $0 \le x \le 1$ and $1 \le y \le 6$. The description of the experiment makes it reasonable to assume that $X$ and $Y$ are independent, so the distribution function of $Z$ is uniform, with 1/12 assigned to each outcome. Now it is an easy matter to find the set of outcomes of $X + Y$, and its distribution function. □

In Example 6.1, the random variable $X$ denoted the number of heads which occur when a fair coin is tossed three times. It is natural to think of $X$ as the sum of the random variables $X_1, X_2, X_3$, where $X_i$ is defined to be 1 if the $i$th toss comes up heads, and 0 if the $i$th toss comes up tails. The expected values of the $X_i$'s are extremely easy to compute. It turns out that the expected value of $X$ can be obtained by simply adding the expected values of the $X_i$'s. This fact is stated in the following theorem.

Theorem 6.2 Let $X$ and $Y$ be random variables with finite expected values. Then
\[
E(X + Y) = E(X) + E(Y),
\]
and if $c$ is any constant, then
\[
E(cX) = cE(X)\,.
\]

Proof. Let the sample spaces of $X$ and $Y$ be denoted by $\Omega_X$ and $\Omega_Y$, and suppose that
\[
\Omega_X = \{x_1, x_2, \ldots\} \quad\text{and}\quad \Omega_Y = \{y_1, y_2, \ldots\}\,.
\]
Then we can consider the random variable $X + Y$ to be the result of applying the function $\phi(x, y) = x + y$ to the joint random variable $(X, Y)$.
Then, by Theorem 6.1, we have
\[
\begin{aligned}
E(X + Y) &= \sum_j \sum_k (x_j + y_k)\,P(X = x_j,\ Y = y_k) \\
         &= \sum_j \sum_k x_j\,P(X = x_j,\ Y = y_k) + \sum_j \sum_k y_k\,P(X = x_j,\ Y = y_k) \\
         &= \sum_j x_j\,P(X = x_j) + \sum_k y_k\,P(Y = y_k)\,.
\end{aligned}
\]
The last equality follows from the fact that
\[
\sum_k P(X = x_j,\ Y = y_k) = P(X = x_j)
\]
and
\[
\sum_j P(X = x_j,\ Y = y_k) = P(Y = y_k)\,.
\]
Thus,
\[
E(X + Y) = E(X) + E(Y)\,.
\]
If $c$ is any constant,
\[
E(cX) = \sum_j c\,x_j\,P(X = x_j) = c \sum_j x_j\,P(X = x_j) = cE(X)\,.
\]
□

It is easy to prove by mathematical induction that the expected value of the sum of any finite number of random variables is the sum of the expected values of the individual random variables.

It is important to note that mutual independence of the summands was not needed as a hypothesis in Theorem 6.2 and its generalization. The fact that expectations add, whether or not the summands are mutually independent, is sometimes referred to as the First Fundamental Mystery of Probability.

Example 6.8 Let $Y$ be the number of fixed points in a random permutation of the set $\{a, b, c\}$. To find the expected value of $Y$, it is helpful to consider the basic random variable associated with this experiment, namely the random variable $X$ which represents the random permutation. There are six possible outcomes of $X$, and we assign to each of them the probability 1/6 (see Table 6.3).

X        Y
a b c    3
a c b    1
b a c    1
b c a    0
c a b    0
c b a    1

Table 6.3: Number of fixed points.

Then we can calculate $E(Y)$ using Theorem 6.1, as
\[
3\Bigl(\frac16\Bigr) + 1\Bigl(\frac16\Bigr) + 1\Bigl(\frac16\Bigr) + 0\Bigl(\frac16\Bigr) + 0\Bigl(\frac16\Bigr) + 1\Bigl(\frac16\Bigr) = 1\,.
\]
We now give a very quick way to calculate the average number of fixed points in a random permutation of the set $\{1, 2, 3, \ldots, n\}$. Let $Z$ denote the random permutation. For each $i$, $1 \le i \le n$, let $X_i$ equal 1 if $Z$ fixes $i$, and 0 otherwise.
So if we let $F$ denote the number of fixed points in $Z$, then
\[
F = X_1 + X_2 + \cdots + X_n\,.
\]
Therefore, Theorem 6.2 implies that
\[
E(F) = E(X_1) + E(X_2) + \cdots + E(X_n)\,.
\]
But it is easy to see that for each $i$,
\[
E(X_i) = \frac1n\,,
\]
so
\[
E(F) = 1\,.
\]
This method of calculation of the expected value is frequently very useful. It applies whenever the random variable in question can be written as a sum of simpler random variables. We emphasize again that it is not necessary that the summands be mutually independent. □

Bernoulli Trials

Theorem 6.3 Let $S_n$ be the number of successes in $n$ Bernoulli trials with probability $p$ for success on each trial. Then the expected number of successes is $np$. That is,
\[
E(S_n) = np\,.
\]

Proof. Let $X_j$ be a random variable which has the value 1 if the $j$th outcome is a success and 0 if it is a failure. Then, for each $X_j$,
\[
E(X_j) = 0 \cdot (1 - p) + 1 \cdot p = p\,.
\]
Since
\[
S_n = X_1 + X_2 + \cdots + X_n\,,
\]
and the expected value of the sum is the sum of the expected values, we have
\[
E(S_n) = E(X_1) + E(X_2) + \cdots + E(X_n) = np\,.
\]
□

Poisson Distribution

Recall that the Poisson distribution with parameter $\lambda$ was obtained as a limit of binomial distributions with parameters $n$ and $p$, where it was assumed that $np = \lambda$ and $n \to \infty$. Since for each $n$ the corresponding binomial distribution has expected value $\lambda$, it is reasonable to guess that a Poisson distribution with parameter $\lambda$ also has expected value $\lambda$. This is in fact the case, and the reader is invited to show this (see Exercise 21).

Independence

If $X$ and $Y$ are two random variables, it is not true in general that $E(X \cdot Y) = E(X)E(Y)$. However, this is true if $X$ and $Y$ are independent.

Theorem 6.4 If $X$ and $Y$ are independent random variables, then
\[
E(X \cdot Y) = E(X)E(Y)\,.
\]

Proof. Suppose that $\Omega_X = \{x_1, x_2, \ldots\}$ and $\Omega_Y = \{y_1, y_2, \ldots\}$
are the sample spaces of $X$ and $Y$, respectively. Using Theorem 6.1, we have
\[
E(X \cdot Y) = \sum_j \sum_k x_j y_k\,P(X = x_j,\ Y = y_k)\,.
\]
But if $X$ and $Y$ are independent,
\[
P(X = x_j,\ Y = y_k) = P(X = x_j)\,P(Y = y_k)\,.
\]
Thus,
\[
\begin{aligned}
E(X \cdot Y) &= \sum_j \sum_k x_j y_k\,P(X = x_j)\,P(Y = y_k) \\
             &= \Bigl(\sum_j x_j\,P(X = x_j)\Bigr)\Bigl(\sum_k y_k\,P(Y = y_k)\Bigr) \\
             &= E(X)E(Y)\,.
\end{aligned}
\]
□

Example 6.9 A coin is tossed twice. $X_i = 1$ if the $i$th toss is heads and 0 otherwise. We know that $X_1$ and $X_2$ are independent. They each have expected value 1/2. Thus $E(X_1 \cdot X_2) = E(X_1)E(X_2) = (1/2)(1/2) = 1/4$. □

We next give a simple example to show that the expected values need not multiply if the random variables are not independent.

Example 6.10 Consider a single toss of a coin. We define the random variable $X$ to be 1 if heads turns up and 0 if tails turns up, and we set $Y = 1 - X$. Then $E(X) = E(Y) = 1/2$. But $X \cdot Y = 0$ for either outcome. Hence, $E(X \cdot Y) = 0 \ne E(X)E(Y)$. □

We return to our records example of Section 3.1 for another application of the result that the expected value of the sum of random variables is the sum of the expected values of the individual random variables.

Records

Example 6.11 We start keeping snowfall records this year and want to find the expected number of records that will occur in the next $n$ years. The first year is necessarily a record. The second year will be a record if the snowfall in the second year is greater than that in the first year. By symmetry, this probability is 1/2. More generally, let $X_j$ be 1 if the $j$th year is a record and 0 otherwise. To find $E(X_j)$, we need only find the probability that the $j$th year is a record. But the record snowfall for the first $j$ years is equally likely to fall in any one of these years,
so $E(X_j) = 1/j$. Therefore, if $S_n$ is the total number of records observed in the first $n$ years,
\[
E(S_n) = 1 + \frac12 + \frac13 + \cdots + \frac1n\,.
\]
This is the famous divergent harmonic series. It is easy to show that
\[
E(S_n) \sim \log n
\]
as $n \to \infty$. A more accurate approximation to $E(S_n)$ is given by the expression
\[
\log n + \gamma + \frac{1}{2n}\,,
\]
where $\gamma$ denotes Euler's constant, and is approximately equal to .5772.

Therefore, in ten years the expected number of records is approximately 2.9298; the exact value is the sum of the first ten terms of the harmonic series, which is 2.9290. □

Craps

Example 6.12 In the game of craps, the player makes a bet and rolls a pair of dice. If the sum of the numbers is 7 or 11 the player wins; if it is 2, 3, or 12 the player loses. If any other number results, say $r$, then $r$ becomes the player's point and he continues to roll until either $r$ or 7 occurs. If $r$ comes up first he wins, and if 7 comes up first he loses. The program Craps simulates playing this game a number of times.

We have run the program for 1000 plays in which the player bets 1 dollar each time. The player's average winnings were $-.006$. The game of craps would seem to be only slightly unfavorable. Let us calculate the expected winnings on a single play and see if this is the case. We construct a two-stage tree measure as shown in Figure 6.1.

The first stage represents the possible sums for his first roll. The second stage represents the possible outcomes for the game if it has not ended on the first roll. In this stage we are representing the possible outcomes of a sequence of rolls required to determine the final outcome.
The branch probabilities for the first stage are computed in the usual way, assuming all 36 possibilities for outcomes for the pair of dice are equally likely. For the second stage we assume that the game will eventually end, and we compute the conditional probabilities for obtaining either the point or a 7. For example, assume that the player's point is 6. Then the game will end when one of the eleven pairs, (1, 5), (2, 4), (3, 3), (4, 2), (5, 1), (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1), occurs. We assume that each of these possible pairs has the same probability. Then the player wins in the first five cases and loses in the last six. Thus the probability of winning is 5/11 and the probability of losing is 6/11. From the path probabilities, we can find the probability that the player wins 1 dollar; it is 244/495. The probability of losing is then 251/495. Thus if $X$ is his winning for a dollar bet,
\[
E(X) = 1\Bigl(\frac{244}{495}\Bigr) + (-1)\Bigl(\frac{251}{495}\Bigr) = -\frac{7}{495} \approx -.0141\,.
\]
The game is unfavorable, but only slightly. The player's expected gain in $n$ plays is $-n(.0141)$. If $n$ is not large, this is a small expected loss for the player. The casino makes a large number of plays and so can afford a small average gain per play and still expect a large profit. □

[Figure 6.1: Tree measure for craps.]

Roulette

Example 6.13 In Las Vegas, a roulette wheel has 38 slots numbered 0, 00, 1, 2, ..., 36. The 0 and 00 slots are green, and half of the remaining 36 slots are red and half are black. A croupier spins the wheel and throws an ivory ball. If you bet 1 dollar on red, you win 1 dollar if the ball stops in a red slot, and otherwise you lose a dollar.
We wish to calculate the expected value of your winnings, if you bet 1 dollar on red. Let $X$ be the random variable which denotes your winnings in a 1 dollar bet on red in Las Vegas roulette. Then the distribution of $X$ is given by
\[
m_X = \begin{pmatrix} -1 & 1 \\ 20/38 & 18/38 \end{pmatrix},
\]
and one can easily calculate (see Exercise 5) that
\[
E(X) \approx -.0526\,.
\]
We now consider the roulette game in Monte Carlo, and follow the treatment of Sagan.$^1$ In the roulette game in Monte Carlo there is only one 0. If you bet 1 franc on red and a 0 turns up, then, depending upon the casino, one or more of the following options may be offered:

(a) You get 1/2 of your bet back, and the casino gets the other half of your bet.

(b) Your bet is put "in prison," which we will denote by $P_1$. If red comes up on the next turn, you get your bet back (but you don't win any money). If black or 0 comes up, you lose your bet.

(c) Your bet is put in prison $P_1$, as before. If red comes up on the next turn, you get your bet back, and if black comes up on the next turn, then you lose your bet. If a 0 comes up on the next turn, then your bet is put into double prison, which we will denote by $P_2$. If your bet is in double prison, and if red comes up on the next turn, then your bet is moved back to prison $P_1$ and the game proceeds as before. If your bet is in double prison, and if black or 0 come up on the next turn, then you lose your bet. We refer the reader to Figure 6.2, where a tree for this option is shown. In this figure, $S$ is the starting position, $W$ means that you win your bet, $L$ means that you lose your bet, and $E$ means that you break even.

$^1$ H. Sagan, "Markov Chains in Monte Carlo," Math. Mag., vol. 54, no. 1 (1981), pp. 3-10.

[Figure 6.2: Tree for 2-prison Monte Carlo roulette.]
It is interesting to compare the expected winnings of a 1 franc bet on red, under each of these three options. We leave the first two calculations as an exercise (see Exercise 37). Suppose that you choose to play alternative (c). The calculation for this case illustrates the way that the early French probabilists worked problems like this.

Suppose you bet on red, you choose alternative (c), and a 0 comes up. Your possible future outcomes are shown in the tree diagram in Figure 6.3. Assume that your money is in the first prison and let $x$ be the probability that you lose your franc. From the tree diagram we see that
\[
x = \frac{18}{37} + \frac{1}{37}\,P(\text{you lose your franc} \mid \text{your franc is in } P_2)\,.
\]
Also,
\[
P(\text{you lose your franc} \mid \text{your franc is in } P_2) = \frac{19}{37} + \frac{18}{37}\,x\,.
\]
So, we have
\[
x = \frac{18}{37} + \frac{1}{37}\Bigl(\frac{19}{37} + \frac{18}{37}\,x\Bigr).
\]
Solving for $x$, we obtain $x = 685/1351$. Thus, starting at $S$, the probability that you lose your bet equals
\[
\frac{18}{37} + \frac{1}{37}\,x = \frac{25003}{49987}\,.
\]
To find the probability that you win when you bet on red, note that you can only win if red comes up on the first turn, and this happens with probability 18/37. Thus your expected winnings are
\[
1 \cdot \frac{18}{37} - 1 \cdot \frac{25003}{49987} = -\frac{685}{49987} \approx -.0137\,.
\]

[Figure 6.3: Your money is put in prison.]

It is interesting to note that the more romantic option (c) is less favorable than option (a) (see Exercise 37).

If you bet 1 dollar on the number 17, then the distribution function for your winnings $X$ is
\[
P_X = \begin{pmatrix} -1 & 35 \\ 36/37 & 1/37 \end{pmatrix},
\]
and the expected winnings are
\[
-1 \cdot \frac{36}{37} + 35 \cdot \frac{1}{37} = -\frac{1}{37} \approx -.027\,.
\]
Thus, at Monte Carlo different bets have different expected values. In Las Vegas almost all bets have the same expected value of $-2/38 = -.0526$ (see Exercises 4 and 5).
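The two-prison calculation for option (c) can be checked with exact rational arithmetic. The Python sketch below is an illustration under our own naming (the constants and function names are not from the text). It writes $x$ for the probability of losing from $P_1$ and $y$ for the probability of losing from $P_2$, solves the two linear equations read off the tree, and then forms the expected winnings from the starting position $S$.

```python
from fractions import Fraction

# Single-zero wheel: 18 red slots, 18 black slots, one 0.
P_RED = Fraction(18, 37)
P_BLACK = Fraction(18, 37)
P_ZERO = Fraction(1, 37)


def prison_loss_probabilities():
    """Return (x, y): P(lose | bet in P1) and P(lose | bet in P2) under option (c).

    From the tree, the two unknowns satisfy
        x = P_BLACK + P_ZERO * y
        y = (P_BLACK + P_ZERO) + P_RED * x
    Substituting the second equation into the first gives a linear equation in x.
    """
    x = (P_BLACK + P_ZERO * (P_BLACK + P_ZERO)) / (1 - P_ZERO * P_RED)
    y = (P_BLACK + P_ZERO) + P_RED * x
    return x, y


def loss_probability_from_start():
    """P(lose) from S: black on the first spin, or 0 followed by a loss from P1."""
    x, _ = prison_loss_probabilities()
    return P_BLACK + P_ZERO * x


def expected_winnings_option_c():
    """Expected winnings of a 1 franc bet on red; a win needs red on the first spin."""
    return P_RED - loss_probability_from_start()


if __name__ == "__main__":
    print(prison_loss_probabilities()[0])
    print(loss_probability_from_start())
    print(float(expected_winnings_option_c()))
```

The first two printed values agree with the $x = 685/1351$ and $25003/49987$ obtained above, and the expected winnings come out at about $-.0137$. Using `Fraction` rather than floating point keeps every intermediate value exact, so the results can be compared directly with the hand calculation.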
□

Conditional Expectation

Definition 6.2 If $F$ is any event and $X$ is a random variable with sample space $\Omega = \{x_1, x_2, \ldots\}$, then the conditional expectation given $F$ is defined by
\[
E(X \mid F) = \sum_j x_j\,P(X = x_j \mid F)\,.
\]
Conditional expectation is used most often in the form provided by the following theorem. □

Theorem 6.5 Let $X$ be a random variable with sample space $\Omega$. If $F_1, F_2, \ldots, F_r$ are events such that $F_i \cap F_j = \emptyset$ for $i \ne j$ and $\Omega = \cup_j F_j$, then
\[
E(X) = \sum_j E(X \mid F_j)\,P(F_j)\,.
\]

Proof. We have
\[
\begin{aligned}
\sum_j E(X \mid F_j)\,P(F_j) &= \sum_j \sum_k x_k\,P(X = x_k \mid F_j)\,P(F_j) \\
                             &= \sum_j \sum_k x_k\,P(X = x_k \text{ and } F_j \text{ occurs}) \\
                             &= \sum_k \sum_j x_k\,P(X = x_k \text{ and } F_j \text{ occurs}) \\
                             &= \sum_k x_k\,P(X = x_k) \\
                             &= E(X)\,.
\end{aligned}
\]
□

Example 6.14 (Example 6.12 continued) Let $T$ be the number of rolls in a single play of craps. We can think of a single play as a two-stage process. The first stage consists of a single roll of a pair of dice. The play is over if this roll is a 2, 3, 7, 11, or 12. Otherwise, the player's point is established, and the second stage begins. This second stage consists of a sequence of rolls which ends when either the player's point or a 7 is rolled. We record the outcomes of this two-stage experiment using the random variables $X$ and $S$, where $X$ denotes the first roll, and $S$ denotes the number of rolls in the second stage of the experiment (of course, $S$ is sometimes equal to 0). Note that $T = S + 1$. Then by Theorem 6.5
\[
E(T) = \sum_{j=2}^{12} E(T \mid X = j)\,P(X = j)\,.
\]
If $j = 7$, 11 or 2, 3, 12, then $E(T \mid X = j) = 1$. If $j = 4, 5, 6, 8, 9$, or 10, we can use Example 6.4 to calculate the expected value of $S$. In each of these cases, we continue rolling until we get either a $j$ or a 7. Thus, $S$ is geometrically distributed with parameter $p$, which depends upon $j$. If $j = 4$, for example, the value of $p$ is $3/36 + 6/36 = 1/4$.
Thus, in this case, the expected number of additional rolls is $1/p = 4$, so $E(T \mid X = 4) = 1 + 4 = 5$. Carrying out the corresponding calculations for the other possible values of $j$ and using Theorem 6.5 gives

$$\begin{aligned}
E(T) = {} & 1\left(\frac{12}{36}\right) + \left(1 + \frac{36}{3+6}\right)\left(\frac{3}{36}\right) + \left(1 + \frac{36}{4+6}\right)\left(\frac{4}{36}\right) + \left(1 + \frac{36}{5+6}\right)\left(\frac{5}{36}\right) \\
& + \left(1 + \frac{36}{5+6}\right)\left(\frac{5}{36}\right) + \left(1 + \frac{36}{4+6}\right)\left(\frac{4}{36}\right) + \left(1 + \frac{36}{3+6}\right)\left(\frac{3}{36}\right) \\
= {} & \frac{557}{165} \approx 3.375\ldots
\end{aligned}$$

□

Martingales

We can extend the notion of fairness to a player playing a sequence of games by using the concept of conditional expectation.

Example 6.15 Let $S_1, S_2, \ldots, S_n$ be Peter's accumulated fortune in playing heads or tails (see Example 1.4). Then

$$E(S_n \mid S_{n-1} = a, \ldots, S_1 = r) = \frac{1}{2}(a + 1) + \frac{1}{2}(a - 1) = a.$$

We note that Peter's expected fortune after the next play is equal to his present fortune. When this occurs, we say the game is fair. A fair game is also called a martingale. If the coin is biased and comes up heads with probability $p$ and tails with probability $q = 1 - p$, then

$$E(S_n \mid S_{n-1} = a, \ldots, S_1 = r) = p(a + 1) + q(a - 1) = a + p - q.$$

Thus, if $p < q$, this game is unfavorable, and if $p > q$, it is favorable. □

If you are in a casino, you will see players adopting elaborate systems of play to try to make unfavorable games favorable. Two such systems, the martingale doubling system and the more conservative Labouchere system, were described in Exercises 1.1.9 and 1.1.10. Unfortunately, such systems cannot change even a fair game into a favorable game.

Even so, it is a favorite pastime of many people to develop systems of play for gambling games and for other games such as the stock market. We close this section with a simple illustration of such a system.

Stock Prices

Example 6.16 Let us assume that a stock increases or decreases in value each day by 1 dollar, each with probability 1/2.
Then we can identify this simplified model with our familiar game of heads or tails. We assume that a buyer, Mr. Ace, adopts the following strategy. He buys the stock on the first day at its price $V$. He then waits until the price of the stock increases by one to $V + 1$ and sells. He then continues to watch the stock until its price falls back to $V$. He buys again and waits until it goes up to $V + 1$ and sells. Thus he holds the stock in intervals during which it increases by 1 dollar. In each such interval, he makes a profit of 1 dollar. However, we assume that he can do this only for a finite number of trading days. Thus he can lose if, in the last interval that he holds the stock, it does not get back up to $V + 1$; and this is the only way he can lose. In Figure 6.4 we illustrate a typical history if Mr. Ace must stop in twenty days. Mr. Ace holds the stock under his system during the days indicated by broken lines. We note that for the history shown in Figure 6.4, his system nets him a gain of 4 dollars.

Figure 6.4: Mr. Ace's system.

We have written a program StockSystem to simulate the fortune of Mr. Ace if he uses his system over an $n$-day period. If one runs this program a large number of times, for $n = 20$, say, one finds that his expected winnings are very close to 0, but the probability that he is ahead after 20 days is significantly greater than 1/2. For small values of $n$, the exact distribution of winnings can be calculated. The distribution for the case $n = 20$ is shown in Figure 6.5. Using this distribution, it is easy to calculate that the expected value of his winnings is exactly 0. This is another instance of the fact that a fair game (a martingale) remains fair under quite general systems of play. Although the expected value of his winnings is 0, the probability that Mr.
Ace is ahead after 20 days is about .610. Thus, he would be able to tell his friends that his system gives him a better chance of being ahead than that of someone who simply buys the stock and holds it, if our simple random model is correct. There have been a number of studies to determine how random the stock market is. □

Figure 6.5: Winnings distribution for n = 20.

Historical Remarks

With the Law of Large Numbers to bolster the frequency interpretation of probability, we find it natural to justify the definition of expected value in terms of the average outcome over a large number of repetitions of the experiment. The concept of expected value was used before it was formally defined; and when it was used, it was considered not as an average value but rather as the appropriate value for a gamble. For example, recall, from the Historical Remarks section of Chapter 1, Section 1.2, Pascal's way of finding the value of a three-game series that had to be called off before it was finished.

Pascal first observed that if each player has only one game to win, then the stake of 64 pistoles should be divided evenly. Then he considered the case where one player has won two games and the other one.

Then consider, Sir, if the first man wins, he gets 64 pistoles, if he loses he gets 32. Thus if they do not wish to risk this last game, but wish to separate without playing it, the first man must say: "I am certain to get 32 pistoles, even if I lose I still get them; but as for the other 32 pistoles, perhaps I will get them, perhaps you will get them, the chances are equal. Let us then divide these 32 pistoles in half and give one half to me as well as my 32 which are mine for sure." He will then have 48 pistoles and the other 16.
[2]

Note that Pascal reduced the problem to a symmetric bet in which each player gets the same amount and took it as obvious that in this case the stakes should be divided equally.

The first systematic study of expected value appears in Huygens' book. Like Pascal, Huygens found the value of a gamble by assuming that the answer is obvious for certain symmetric situations and used this to deduce the expected value for the general situation. He did this in steps. His first proposition is

Prop. I. If I expect a or b, either of which, with equal probability, may fall to me, then my Expectation is worth (a + b)/2, that is, the half Sum of a and b.[3]

[2] Quoted in F. N. David, Games, Gods and Gambling (London: Griffin, 1962), p. 231.
[3] C. Huygens, Calculating in Games of Chance, translation attributed to John Arbuthnot (London, 1692), p. 34.

Huygens proved this as follows: Assume that two players A and B play a game in which each player puts up a stake of $(a + b)/2$ with an equal chance of winning the total stake. Then the value of the game to each player is $(a + b)/2$. For example, if the game had to be called off, clearly each player should just get back his original stake. Now, by symmetry, this value is not changed if we add the condition that the winner of the game has to pay the loser an amount $b$ as a consolation prize. Then for player A the value is still $(a + b)/2$. But what are his possible outcomes for the modified game? If he wins, he gets the total stake $a + b$ and must pay B an amount $b$, so he ends up with $a$. If he loses, he gets an amount $b$ from player B. Thus player A wins $a$ or $b$ with equal chances and the value to him is $(a + b)/2$. Huygens illustrated this proof in terms of an example.
If you are offered a game in which you have an equal chance of winning 2 or 8, the expected value is 5, since this game is equivalent to the game in which each player stakes 5 and agrees to pay the loser 3, a game in which the value is obviously 5.

Huygens' second proposition is

Prop. II. If I expect a, b, or c, either of which, with equal facility, may happen, then the Value of my Expectation is (a + b + c)/3, or the third of the Sum of a, b, and c.[4]

His argument here is similar. Three players, A, B, and C, each stake $(a + b + c)/3$ in a game they have an equal chance of winning. The value of this game to player A is clearly the amount he has staked. Further, this value is not changed if A enters into an agreement with B that if one of them wins he pays the other a consolation prize of $b$, and with C that if one of them wins he pays the other a consolation prize of $c$. By symmetry these agreements do not change the value of the game. In this modified game, if A wins he wins the total stake $a + b + c$ minus the consolation prizes $b + c$, giving him a final winning of $a$. If B wins, A wins $b$, and if C wins, A wins $c$. Thus A finds himself in a game with value $(a + b + c)/3$ and with outcomes $a$, $b$, and $c$ occurring with equal chance. This proves Proposition II.

More generally, this reasoning shows that if there are $n$ outcomes $a_1, a_2, \ldots, a_n$, all occurring with the same probability, the expected value is

$$\frac{a_1 + a_2 + \cdots + a_n}{n}.$$

In his third proposition Huygens considered the case where you win $a$ or $b$ but with unequal probabilities. He assumed there are $p$ chances of winning $a$, and $q$ chances of winning $b$, all having the same probability. He then showed that the expected value is

$$E = \frac{p}{p + q} \cdot a + \frac{q}{p + q} \cdot b.$$

This follows by considering an equivalent gamble with $p + q$ outcomes all occurring with the same probability and with a payoff of $a$ in $p$ of the outcomes and $b$ in $q$ of the outcomes.
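Huygens' reduction of the unequal-chance case to equally likely outcomes can be illustrated numerically. The sketch below is an illustration only; the function names and the sample values are assumptions, not Huygens' or the authors'. It builds the equivalent gamble with $p + q$ equally likely outcomes and confirms that its value agrees with the weighted-average formula for $E$.

```python
from fractions import Fraction


def equal_chance_value(outcomes):
    """Value of a gamble whose integer payoffs are all equally likely:
    the sum of the payoffs divided by their number."""
    return Fraction(sum(outcomes), len(outcomes))


def value_by_equivalent_gamble(a, b, p, q):
    """Win a in p chances and b in q chances, treated as a gamble with
    p + q equally likely outcomes."""
    return equal_chance_value([a] * p + [b] * q)


def weighted_average(a, b, p, q):
    """The formula E = p/(p+q) * a + q/(p+q) * b."""
    return Fraction(p, p + q) * a + Fraction(q, p + q) * b


# Equal chance of winning 2 or 8, as in Huygens' own illustration.
print(equal_chance_value([2, 8]))

# Hypothetical unequal case: 3 chances of winning 2, 1 chance of winning 8.
print(value_by_equivalent_gamble(2, 8, 3, 1), weighted_average(2, 8, 3, 1))
```

Because the chances $p$ and $q$ are whole numbers here, the approach covers exactly the rational-probability cases that this argument reaches.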
This allowed Huygens to compute the expected value for experiments with unequal probabilities, at least when these probabilities are rational numbers.

Thus, instead of defining the expected value as a weighted average, Huygens assumed that the expected values of certain symmetric gambles are known and deduced the other values from these. Although this requires a good deal of clever manipulation, Huygens ended up with values that agree with those given by our modern definition of expected value. One advantage of this method is that it gives a justification for the expected value in cases where it is not reasonable to assume that you can repeat the experiment a large number of times, as for example, in betting that at least two presidents died on the same day of the year. (In fact, three did; all were signers of the Declaration of Independence, and all three died on July 4.)

[4] ibid., p. 35.

In his book, Huygens calculated the expected value of games using techniques similar to those which we used in computing the expected value for roulette at Monte Carlo. For example, his proposition XIV is:

Prop. XIV. If I were playing with another by turns, with two Dice, on this Condition, that if I throw 7 I gain, and if he throws 6 he gains allowing him the first Throw: To find the proportion of my Hazard to his.[5]

A modern description of this game is as follows. Huygens and his opponent take turns rolling a pair of dice. The game is over if Huygens rolls a 7 or his opponent rolls a 6. His opponent rolls first. What is the probability that Huygens wins the game?

To solve this problem Huygens let $x$ be his chance of winning when his opponent threw first and $y$ his chance of winning when he threw first. Then on the first roll his opponent wins on 5 out of the 36 possibilities.
Thus,

$$x = \frac{31}{36} \cdot y.$$

But when Huygens rolls he wins on 6 out of the 36 possible outcomes, and in the other 30, he is led back to where his chances are $x$. Thus

$$y = \frac{6}{36} + \frac{30}{36} \cdot x.$$

From these two equations Huygens found that $x = 31/61$.

Another early use of expected value appeared in Pascal's argument to show that a rational person should believe in the existence of God.[6] Pascal said that we have to make a wager whether to believe or not to believe. Let $p$ denote the probability that God does not exist. His discussion suggests that we are playing a game with two strategies, believe and not believe, with payoffs as shown in Table 6.4.

                 God does not exist    God exists
                         p                1 - p
believe                 -u                  v
not believe              0                 -x

Table 6.4: Payoffs.

Here $u$ represents the cost to you of passing up some worldly pleasures as a consequence of believing that God exists. If you do not believe, and God is a vengeful God, you will lose $x$. If God exists and you do believe, you will gain $v$. Now to determine which strategy is best you should compare the two expected values

$$p(-u) + (1 - p)v \qquad \text{and} \qquad p \cdot 0 + (1 - p)(-x),$$

and choose the larger of the two. In general, the choice will depend upon the value of $p$. But Pascal assumed that the value of $v$ is infinite and so the strategy of believing is best no matter what probability you assign for the existence of God. This example is considered by some to be the beginning of decision theory. Decision analyses of this kind appear today in many fields, and, in particular, are an important part of medical diagnostics and corporate business decisions.

[5] ibid., p. 47.
[6] Quoted in I. Hacking, The Emergence of Probability (Cambridge: Cambridge Univ. Press, 1975).

Age    Survivors
 0        100
 6         64
16         40
26         25
36         16
46         10
56          6
66          3
76          1

Table 6.5: Graunt's mortality data.
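The comparison of the two strategies in Table 6.4 can be written out as a short computation. This is a minimal sketch under stated assumptions: the function names and every numerical payoff below are hypothetical and chosen only for illustration; the text assigns no finite values to $u$, $v$, or $x$.

```python
import math
from fractions import Fraction


def expected_values(p, u, v, x):
    """Expected payoffs of the two strategies in Table 6.4, where p is the
    probability that God does not exist."""
    believe = p * (-u) + (1 - p) * v
    not_believe = p * 0 + (1 - p) * (-x)
    return believe, not_believe


def best_strategy(p, u, v, x):
    """Choose the strategy with the larger expected value."""
    believe, not_believe = expected_values(p, u, v, x)
    if believe > not_believe:
        return "believe"
    if believe < not_believe:
        return "not believe"
    return "indifferent"


# Hypothetical finite payoffs: the choice depends on p.
print(best_strategy(Fraction(9, 10), 10, 50, 0))
print(best_strategy(Fraction(1, 2), 10, 50, 0))

# Pascal's assumption of an infinite gain v, for any p strictly less than 1.
print(best_strategy(0.999, 1, math.inf, 0))
```

The infinite-gain case is evaluated only for $p < 1$; at $p = 1$ the product $(1 - p)v$ is undefined in floating-point arithmetic.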
Another early use of expected value was to decide the price of annuities. The study of statistics has its origins in the use of the bills of mortality kept in the parishes in London from 1603. These records kept a weekly tally of christenings and burials. From these John Graunt made estimates for the population of London and also provided the first mortality data,[7] shown in Table 6.5.

As Hacking observes, Graunt apparently constructed this table by assuming that after the age of 6 there is a constant probability of about 5/8 of surviving for another decade.[8] For example, of the 64 people who survive to age 6, 5/8 of 64 or 40 survive to 16, 5/8 of these 40 or 25 survive to 26, and so forth. Of course, he rounded off his figures to the nearest whole person.

Clearly, a constant mortality rate cannot be correct throughout the whole range, and later tables provided by Halley were more realistic in this respect.[9]

A terminal annuity provides a fixed amount of money during a period of $n$ years. To determine the price of a terminal annuity one needs only to know the appropriate interest rate. A life annuity provides a fixed amount during each year of the buyer's life. The appropriate price for a life annuity is the expected value of the terminal annuity evaluated for the random lifetime of the buyer. Thus, the work of Huygens in introducing expected value and the work of Graunt and Halley in determining mortality tables led to a more rational method for pricing annuities. This was one of the first serious uses of probability theory outside the gambling houses.

Although expected value plays a role now in every branch of science, it retains its importance in the casino. In 1962, Edward Thorp's book Beat the Dealer[10] provided the reader with a strategy for playing the popular casino game of blackjack that would assure the player a positive expected winning. This book forevermore changed the belief of the casinos that they could not be beaten.

[7] ibid., p. 108.
[8] ibid., p. 109.
[9] E. Halley, "An Estimate of The Degrees of Mortality of Mankind," Phil. Trans. Royal. Soc., vol. 17 (1693), pp. 596–610; 654–656.
[10] E. Thorp, Beat the Dealer (New York: Random House, 1962).

Exercises

1 A card is drawn at random from a deck consisting of cards numbered 2 through 10. A player wins 1 dollar if the number on the card is odd and loses 1 dollar if the number is even. What is the expected value of his winnings?

2 A card is drawn at random from a deck of playing cards. If it is red, the player wins 1 dollar; if it is black, the player loses 2 dollars. Find the expected value of the game.

3 In a class there are 20 students: 3 are 5' 6", 5 are 5' 8", 4 are 5' 10", 4 are 6', and 4 are 6' 2". A student is chosen at random. What is the student's expected height?

4 In Las Vegas the roulette wheel has a 0 and a 00 and then the numbers 1 to 36 marked on equal slots; the wheel is spun and a ball stops randomly in one slot. When a player bets 1 dollar on a number, he receives 36 dollars if the ball stops on this number, for a net gain of 35 dollars; otherwise, he loses his dollar bet. Find the expected value for his winnings.

5 In a second version of roulette in Las Vegas, a player bets on red or black. Half of the numbers from 1 to 36 are red, and half are black. If a player bets a dollar on black, and if the ball stops on a black number, he gets his dollar back and another dollar. If the ball stops on a red number or on 0 or 00 he loses his dollar. Find the expected winnings for this bet.

6 A die is rolled twice. Let $X$ denote the sum of the two numbers that turn up, and $Y$ the difference of the numbers (specifically, the number on the first roll minus the number on the second). Show that $E(XY) = E(X)E(Y)$. Are $X$ and $Y$ independent?

*7 Show that, if $X$ and $Y$ are random variables taking on only two values each, and if $E(XY) = E(X)E(Y)$, then $X$ and $Y$ are independent.

8 A royal family has children until it has a boy or until it has three children, whichever comes first. Assume that each child is a boy with probability 1/2. Find the expected number of boys in this royal family and the expected number of girls.

9 If the first roll in a game of craps is neither a natural nor craps, the player can make an additional bet, equal to his original one, that he will make his point before a seven turns up. If his point is four or ten he is paid off at 2 : 1 odds; if it is a five or nine he is paid off at odds 3 : 2; and if it is a six or eight he is paid off at odds 6 : 5. Find the player's expected winnings if he makes this additional bet when he has the opportunity.

10 In Example 6.16 assume that Mr. Ace decides to buy the stock and hold it until it goes up 1 dollar and then sell and not buy again. Modify the program StockSystem to find the distribution of his profit under this system after a twenty-day period. Find the expected profit and the probability that he comes out ahead.

11 On September 26, 1980, the New York Times reported that a mysterious stranger strode into a Las Vegas casino, placed a single bet of 777,000 dollars on the "don't pass" line at the crap table, and walked away with more than 1.5 million dollars. In the "don't pass" bet, the bettor is essentially betting with the house. An exception occurs if the roller rolls a 12 on the first roll. In this case, the roller loses and the "don't pass" bettor just gets back the money bet instead of winning. Show that the "don't pass" bettor has a more favorable bet than the roller.
12 Recall that in the martingale doubling system (see Exercise 1.1.10), the player doubles his bet each time he loses. Suppose that you are playing roulette in a fair casino where there are no 0's, and you bet on red each time. You then win with probability 1/2 each time. Assume that you enter the casino with 100 dollars, start with a 1-dollar bet and employ the martingale system. You stop as soon as you have won one bet, or in the unlikely event that black turns up six times in a row so that you are down 63 dollars and cannot make the required 64-dollar bet. Find your expected winnings under this system of play.

13 You have 80 dollars and play the following game. An urn contains two white balls and two black balls. You draw the balls out one at a time without replacement until all the balls are gone. On each draw, you bet half of your present fortune that you will draw a white ball. What is your expected final fortune?

14 In the hat check problem (see Example 3.12), it was assumed that $N$ people check their hats and the hats are handed back at random. Let $X_j = 1$ if the $j$th person gets his or her hat and 0 otherwise. Find $E(X_j)$ and $E(X_j \cdot X_k)$ for $j$ not equal to $k$. Are $X_j$ and $X_k$ independent?

15 A box contains two gold balls and three silver balls. You are allowed to choose successively balls from the box at random. You win 1 dollar each time you draw a gold ball and lose 1 dollar each time you draw a silver ball. After a draw, the ball is not replaced. Show that, if you draw until you are ahead by 1 dollar or until there are no more gold balls, this is a favorable game.

16 Gerolamo Cardano in his book, The Gambling Scholar, written in the early 1500s, considers the following carnival game. There are six dice. Each of the dice has five blank sides.
The sixth side has a number between 1 and 6, a different number on each die. The six dice are rolled and the player wins a prize depending on the total of the numbers which turn up.

(a) Find, as Cardano did, the expected total without finding its distribution.

(b) Large prizes were given for large totals with a modest fee to play the game. Explain why this could be done.

17 Let $X$ be the first time that a failure occurs in an infinite sequence of Bernoulli trials with probability $p$ for success. Let $p_k = P(X = k)$ for $k = 1, 2, \ldots$. Show that $p_k = p^{k-1} q$ where $q = 1 - p$. Show that $\sum_k p_k = 1$. Show that $E(X) = 1/q$. What is the expected number of tosses of a coin required to obtain the first tail?

18 Exactly one of six similar keys opens a certain door. If you try the keys, one after another, what is the expected number of keys that you will have to try before success?

19 A multiple choice exam is given. A problem has four possible answers, and exactly one answer is correct. The student is allowed to choose a subset of the four possible answers as his answer. If his chosen subset contains the correct answer, the student receives three points, but he loses one point for each wrong answer in his chosen subset. Show that if he just guesses a subset uniformly and randomly his expected score is zero.

20 You are offered the following game to play: a fair coin is tossed until heads turns up for the first time (see Example 6.3). If this occurs on the first toss you receive 2 dollars, if it occurs on the second toss you receive $2^2 = 4$ dollars and, in general, if heads turns up for the first time on the $n$th toss you receive $2^n$ dollars.

(a) Show that the expected value of your winnings does not exist (i.e., is given by a divergent sum) for this game. Does this mean that this game is favorable no matter how much you pay to play it?
(b) Assume that you only receive $2^{10}$ dollars if any number greater than or equal to ten tosses are required to obtain the first head. Show that your expected value for this modified game is finite and find its value.

(c) Assume that you pay 10 dollars for each play of the original game. Write a program to simulate 100 plays of the game and see how you do.

(d) Now assume that the utility of $n$ dollars is $\sqrt{n}$. Write an expression for the expected utility of the payment, and show that this expression has a finite value. Estimate this value. Repeat this exercise for the case that the utility function is $\log(n)$.

21 Let $X$ be a random variable which is Poisson distributed with parameter $\lambda$. Show that $E(X) = \lambda$. Hint: Recall that

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots.$$

22 Recall that in Exercise 1.1.14, we considered a town with two hospitals. In the large hospital about 45 babies are born each day, and in the smaller hospital about 15 babies are born each day. We were interested in guessing which hospital would have on the average the largest number of days with the property that more than 60 percent of the children born on that day are boys. For each hospital find the expected number of days in a year that have the property that more than 60 percent of the children born on that day were boys.

23 An insurance company has 1,000 policies on men of age 50. The company estimates that the probability that a man of age 50 dies within a year is .01. Estimate the number of claims that the company can expect from beneficiaries of these men within a year.

24 Using the life table for 1981 in Appendix C, write a program to compute the expected lifetime for males and females of each possible age from 1 to 85. Compare the results for males and females. Comment on whether life insurance should be priced differently for males and females.
*25 A deck of ESP cards consists of 20 cards, each of one of two types: say, ten stars and ten circles (normally there are five types). The deck is shuffled and the cards turned up one at a time. You, the alleged percipient, are to name the symbol on each card before it is turned up. Suppose that you are really just guessing at the cards. If you do not get to see each card after you have made your guess, then it is easy to calculate the expected number of correct guesses, namely ten.

If, on the other hand, you are guessing with information, that is, if you see each card after your guess, then, of course, you might expect to get a higher score. This is indeed the case, but calculating the correct expectation is no longer easy.

But it is easy to do a computer simulation of this guessing with information, so we can get a good idea of the expectation by simulation. (This is similar to the way that skilled blackjack players make blackjack into a favorable game by observing the cards that have already been played. See Exercise 29.)

(a) First, do a simulation of guessing without information, repeating the experiment at least 1000 times. Estimate the expected number of correct answers and compare your result with the theoretical expectation.

(b) What is the best strategy for guessing with information?

(c) Do a simulation of guessing with information, using the strategy in (b). Repeat the experiment at least 1000 times, and estimate the expectation in this case.

(d) Let $S$ be the number of stars and $C$ the number of circles in the deck. Let $h(S, C)$ be the expected winnings using the optimal guessing strategy in (b). Show that $h(S, C)$ satisfies the recursion relation

$$h(S, C) = \frac{S}{S + C}\, h(S - 1, C) + \frac{C}{S + C}\, h(S, C - 1) + \frac{\max(S, C)}{S + C},$$

and $h(0, 0) = h(-1, 0) = h(0, -1) = 0$. Using this relation, write a program to compute $h(S, C)$ and find $h(10, 10)$.
Compare the computed value of $h(10, 10)$ with the result of your simulation in (c). For more about this exercise and Exercise 26 see Diaconis and Graham.[11]

[11] P. Diaconis and R. Graham, "The Analysis of Sequential Experiments with Feedback to Subjects," Annals of Statistics, vol. 9 (1981), pp. 3–23.

*26 Consider the ESP problem as described in Exercise 25. You are again guessing with information, and you are using the optimal guessing strategy of guessing star if the remaining deck has more stars, circle if more circles, and tossing a coin if the numbers of stars and circles are equal. Assume that $S \ge C$, where $S$ is the number of stars and $C$ the number of circles.

We can plot the results of a typical game on a graph, where the horizontal axis represents the number of steps and the vertical axis represents the difference between the number of stars and the number of circles that have been turned up. A typical game is shown in Figure 6.6. In this particular game, the order in which the cards were turned up is $(C, S, S, S, S, C, C, S, S, C)$. Thus, in this particular game, there were six stars and four circles in the deck. This means, in particular, that every game played with this deck would have a graph which ends at the point $(10, 2)$. We define the line $L$ to be the horizontal line which goes through the ending point on the graph (so its vertical coordinate is just the difference between the number of stars and circles in the deck).

Figure 6.6: Random walk for ESP.

(a) Show that, when the random walk is below the line $L$, the player guesses right when the graph goes up (star is turned up) and, when the walk is above the line, the player guesses right when the walk goes down (circle turned up). Show from this property that the subject is sure to have at least $S$ correct guesses.

(b) When the walk is at a point $(x, x)$ on the line $L$, the number of stars and circles remaining is the same, and so the subject tosses a coin. Show that the probability that the walk reaches $(x, x)$ is

$$\frac{\binom{S}{x}\binom{C}{x}}{\binom{S + C}{2x}}.$$

Hint: The outcomes of $2x$ cards is a hypergeometric distribution (see Section 5.1).

(c) Using the results of (a) and (b), show that the expected number of correct guesses under intelligent guessing is

$$S + \sum_{x=1}^{C} \frac{1}{2}\, \frac{\binom{S}{x}\binom{C}{x}}{\binom{S + C}{2x}}.$$

27 It has been said[12] that a Dr. B. Muriel Bristol declined a cup of tea stating that she preferred a cup into which milk had been poured first. The famous statistician R. A. Fisher carried out a test to see if she could tell whether milk was put in before or after the tea. Assume that for the test Dr. Bristol was given eight cups of tea: four in which the milk was put in before the tea and four in which the milk was put in after the tea.

(a) What is the expected number of correct guesses the lady would make if she had no information after each test and was just guessing?

(b) Using the result of Exercise 26, find the expected number of correct guesses if she was told the result of each guess and used an optimal guessing strategy.

[12] J. F. Box, R. A. Fisher, The Life of a Scientist (New York: John Wiley and Sons, 1978).

28 In a popular computer game the computer picks an integer from 1 to $n$ at random. The player is given $k$ chances to guess the number. After each guess the computer responds "correct," "too small," or "too big."

(a) Show that if $n \le 2^k - 1$, then there is a strategy that guarantees you will correctly guess the number in $k$ tries.

(b) Show that if $n \ge 2^k - 1$, there is a strategy that assures you of identifying one of $2^k - 1$ numbers and hence gives a probability of $(2^k - 1)/n$ of winning.
Why is this an optimal strategy? Illustrate your result in terms of the case $n = 9$ and $k = 3$.

29 In the casino game of blackjack the dealer is dealt two cards, one face up and one face down, and each player is dealt two cards, both face down. If the dealer is showing an ace the player can look at his down cards and then make a bet called an insurance bet. (Expert players will recognize why it is called insurance.) If you make this bet you will win the bet if the dealer's second card is a ten card: namely, a ten, jack, queen, or king. If you win, you are paid twice your insurance bet; otherwise you lose this bet. Show that, if the only cards you can see are the dealer's ace and your two cards and if your cards are not ten cards, then the insurance bet is an unfavorable bet. Show, however, that if you are playing two hands simultaneously, and you have no ten cards, then it is a favorable bet. (Thorp[13] has shown that the game of blackjack is favorable to the player if he or she can keep good enough track of the cards that have been played.)

30 Assume that, every time you buy a box of Wheaties, you receive a picture of one of the $n$ players for the New York Yankees (see Exercise 3.2.34). Let $X_k$ be the number of additional boxes you have to buy, after you have obtained $k - 1$ different pictures, in order to obtain the next new picture. Thus $X_1 = 1$, $X_2$ is the number of boxes bought after this to obtain a picture different from the first picture obtained, and so forth.

(a) Show that $X_k$ has a geometric distribution with $p = (n - k + 1)/n$.

(b) Simulate the experiment for a team with 26 players (25 would be more accurate but we want an even number). Carry out a number of simulations and estimate the expected time required to get the first 13 players and the expected time to get the second 13. How do these expectations compare?
(c) Show that, if there are $2n$ players, the expected time to get the first half of the players is
$$2n\left(\frac{1}{2n} + \frac{1}{2n-1} + \cdots + \frac{1}{n+1}\right),$$
and the expected time to get the second half is
$$2n\left(\frac{1}{n} + \frac{1}{n-1} + \cdots + 1\right).$$

^{13} E. Thorp, Beat the Dealer (New York: Random House, 1962).

(d) In Example 6.11 we stated that
$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \sim \log n + .5772 + \frac{1}{2n}.$$
Use this to estimate the expression in (c). Compare these estimates with the exact values and also with your estimates obtained by simulation for the case $n = 26$.

*31 (Feller^{14}) A large number, $N$, of people are subjected to a blood test. This can be administered in two ways: (1) Each person can be tested separately; in this case $N$ tests are required. (2) The blood samples of $k$ persons can be pooled and analyzed together. If this test is negative, this one test suffices for the $k$ people. If the test is positive, each of the $k$ persons must be tested separately, and in all, $k + 1$ tests are required for the $k$ people. Assume that the probability $p$ that a test is positive is the same for all people and that these events are independent.

(a) Find the probability that the test for a pooled sample of $k$ people will be positive.

(b) What is the expected value of the number $X$ of tests necessary under plan (2)? (Assume that $N$ is divisible by $k$.)

(c) For small $p$, show that the value of $k$ which will minimize the expected number of tests under the second plan is approximately $1/\sqrt{p}$.

32 Write a program to add random numbers chosen from $[0, 1]$ until the first time the sum is greater than one. Have your program repeat this experiment a number of times to estimate the expected number of selections necessary in order that the sum of the chosen numbers first exceeds 1. On the basis of your experiments, what is your estimate for this number?
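One possible way to set up the experiment described in Exercise 32 is sketched below in Python. The function names, the fixed seed, and the number of trials are assumptions made for illustration only; the text does not prescribe a particular program.

```python
import random

def draws_until_sum_exceeds_one(rng):
    """Count how many uniform [0, 1) draws are needed before the running sum exceeds 1."""
    total = 0.0
    count = 0
    while total <= 1.0:
        total += rng.random()
        count += 1
    return count

def estimate_expected_draws(trials, seed=0):
    """Average the draw count over many independent repetitions (seed is an arbitrary choice)."""
    rng = random.Random(seed)
    return sum(draws_until_sum_exceeds_one(rng) for _ in range(trials)) / trials

print(estimate_expected_draws(100_000))
```

Increasing `trials` tightens the estimate; rerunning with different seeds gives a feel for how much the estimate fluctuates.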
*33 The following related discrete problem also gives a good clue for the answer to Exercise 32. Randomly select with replacement $t_1, t_2, \ldots, t_r$ from the set $(1/n, 2/n, \ldots, n/n)$. Let $X$ be the smallest value of $r$ satisfying
$$t_1 + t_2 + \cdots + t_r > 1.$$
Then $E(X) = (1 + 1/n)^n$. To prove this, we can just as well choose $t_1, t_2, \ldots, t_r$ randomly with replacement from the set $(1, 2, \ldots, n)$ and let $X$ be the smallest value of $r$ for which
$$t_1 + t_2 + \cdots + t_r > n.$$

(a) Use Exercise 3.2.36 to show that
$$P(X \ge j + 1) = \binom{n}{j}\left(\frac{1}{n}\right)^j.$$

^{14} W. Feller, Introduction to Probability Theory and Its Applications, 3rd ed., vol. 1 (New York: John Wiley and Sons, 1968), p. 240.

(b) Show that
$$E(X) = \sum_{j=0}^{n} P(X \ge j + 1).$$

(c) From these two facts, find an expression for $E(X)$. This proof is due to Harris Schultz.^{15}

*34 (Banach's Matchbox^{16}) A man carries in each of his two front pockets a box of matches originally containing $N$ matches. Whenever he needs a match, he chooses a pocket at random and removes one from that box. One day he reaches into a pocket and finds the box empty.

(a) Let $p_r$ denote the probability that the other pocket contains $r$ matches. Define a sequence of counter random variables as follows: Let $X_i = 1$ if the $i$th draw is from the left pocket, and 0 if it is from the right pocket. Interpret $p_r$ in terms of $S_n = X_1 + X_2 + \cdots + X_n$. Find a binomial expression for $p_r$.

(b) Write a computer program to compute the $p_r$, as well as the probability that the other pocket contains at least $r$ matches, for $N = 100$ and $r$ from 0 to 50.
(c) Show that $(N - r)p_r = (1/2)(2N + 1)p_{r+1} - (1/2)(r + 1)p_{r+1}$.

(d) Evaluate $\sum_r p_r$.

(e) Use (c) and (d) to determine the expectation $E$ of the distribution $\{p_r\}$.

(f) Use Stirling's formula to obtain an approximation for $E$. How many matches must each box contain to ensure a value of about 13 for the expectation $E$? (Take $\pi = 22/7$.)

35 A coin is tossed until the first time a head turns up. If this occurs on the $n$th toss and $n$ is odd you win $2^n/n$, but if $n$ is even then you lose $2^n/n$. Then if your expected winnings exist they are given by the convergent series
$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots$$
called the alternating harmonic series. It is tempting to say that this should be the expected value of the experiment. Show that if we were to do this, the expected value of an experiment would depend upon the order in which the outcomes are listed.

36 Suppose we have an urn containing $c$ yellow balls and $d$ green balls. We draw $k$ balls, without replacement, from the urn. Find the expected number of yellow balls drawn. Hint: Write the number of yellow balls drawn as the sum of $c$ random variables.

^{15} H. Schultz, "An Expected Value Problem," Two-Year Mathematics Journal, vol. 10, no. 4 (1979), pp. 277-78.
^{16} W. Feller, Introduction to Probability Theory, vol. 1, p. 166.

37 The reader is referred to Example 6.13 for an explanation of the various options available in Monte Carlo roulette.

(a) Compute the expected winnings of a 1 franc bet on red under option (a).

(b) Repeat part (a) for option (b).

(c) Compare the expected winnings for all three options.

*38 (from Pittel^{17}) Telephone books, $n$ in number, are kept in a stack. The probability that the book numbered $i$ (where $1 \le i \le n$) is consulted for a given phone call is $p_i > 0$, where the $p_i$'s sum to 1. After a book is used, it is placed at the top of the stack.
Assume that the calls are independent and evenly spaced, and that the system has been employed indefinitely far into the past. Let $d_i$ be the average depth of book $i$ in the stack. Show that $d_i \le d_j$ whenever $p_i \ge p_j$. Thus, on the average, the more popular books have a tendency to be closer to the top of the stack. Hint: Let $p_{ij}$ denote the probability that book $i$ is above book $j$. Show that $p_{ij} = p_{ij}(1 - p_j) + p_{ji}\,p_i$.

*39 (from Propp^{18}) In the previous problem, let $P$ be the probability that at the present time, each book is in its proper place, i.e., book $i$ is $i$th from the top. Find a formula for $P$ in terms of the $p_i$'s. In addition, find the least upper bound on $P$, if the $p_i$'s are allowed to vary. Hint: First find the probability that book 1 is in the right place. Then find the probability that book 2 is in the right place, given that book 1 is in the right place. Continue.

*40 (from H. Shultz and B. Leonard^{19}) A sequence of random numbers in $[0, 1)$ is generated until the sequence is no longer monotone increasing. The numbers are chosen according to the uniform distribution. What is the expected length of the sequence? (In calculating the length, the term that destroys monotonicity is included.) Hint: Let $a_1, a_2, \ldots$ be the sequence and let $X$ denote the length of the sequence. Then
$$P(X > k) = P(a_1 < a_2 < \cdots < a_k),$$
and the probability on the right-hand side is easy to calculate. Furthermore, one can show that
$$E(X) = 1 + P(X > 1) + P(X > 2) + \cdots.$$

41 Let $T$ be the random variable that counts the number of 2-unshuffles performed on an $n$-card deck until all of the labels on the cards are distinct. This random variable was discussed in Section 3.3. Using Equation 3.4 in that section, together with the formula
$$E(T) = \sum_{s=0}^{\infty} P(T > s)$$

^{17} B. Pittel, Problem #1195, Mathematics Magazine, vol. 58, no. 3 (May 1985), pg. 183.
^{18} J.
Propp, Problem #1159, Mathematics Magazine, vol. 57, no. 1 (Feb. 1984), pg. 50.
^{19} H. Shultz and B. Leonard, "Unexpected Occurrences of the Number e," Mathematics Magazine, vol. 62, no. 4 (October, 1989), pp. 269-271.

that was proved in Exercise 33, show that
$$E(T) = \sum_{s=0}^{\infty}\left(1 - \binom{2^s}{n}\frac{n!}{2^{sn}}\right).$$
Show that for $n = 52$, this expression is approximately equal to 11.7. (As was stated in Chapter 3, this means that on the average, almost 12 riffle shuffles of a 52-card deck are required in order for the process to be considered random.)

6.2 Variance of Discrete Random Variables

The usefulness of the expected value as a prediction for the outcome of an experiment is increased when the outcome is not likely to deviate too much from the expected value. In this section we shall introduce a measure of this deviation, called the variance.

Variance

Definition 6.3 Let $X$ be a numerically valued random variable with expected value $\mu = E(X)$. Then the variance of $X$, denoted by $V(X)$, is
$$V(X) = E((X - \mu)^2).$$
□

Note that, by Theorem 6.1, $V(X)$ is given by
$$V(X) = \sum_x (x - \mu)^2 m(x), \qquad (6.1)$$
where $m$ is the distribution function of $X$.

Standard Deviation

The standard deviation of $X$, denoted by $D(X)$, is $D(X) = \sqrt{V(X)}$. We often write $\sigma$ for $D(X)$ and $\sigma^2$ for $V(X)$.

Example 6.17 Consider one roll of a die. Let $X$ be the number that turns up. To find $V(X)$, we must first find the expected value of $X$. This is
$$\mu = E(X) = 1\left(\frac{1}{6}\right) + 2\left(\frac{1}{6}\right) + 3\left(\frac{1}{6}\right) + 4\left(\frac{1}{6}\right) + 5\left(\frac{1}{6}\right) + 6\left(\frac{1}{6}\right) = \frac{7}{2}.$$
To find the variance of $X$, we form the new random variable $(X - \mu)^2$ and compute its expectation. We can easily do this using the following table.

x    m(x)    (x - 7/2)^2
1    1/6     25/4
2    1/6     9/4
3    1/6     1/4
4    1/6     1/4
5    1/6     9/4
6    1/6     25/4

Table 6.6: Variance calculation.
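The entries of Table 6.6 can be reproduced with exact rational arithmetic. The following is a minimal Python sketch, assuming a fair six-sided die; the variable names are illustrative and do not come from the text.

```python
from fractions import Fraction

faces = range(1, 7)
m = {x: Fraction(1, 6) for x in faces}            # distribution function of a fair die

mu = sum(x * m[x] for x in faces)                 # expected value E(X)
squared_devs = {x: (x - mu) ** 2 for x in faces}  # third column of the table
variance = sum(squared_devs[x] * m[x] for x in faces)  # Equation 6.1

for x in faces:
    print(x, m[x], squared_devs[x])
print("mu =", mu, "V(X) =", variance)
```

Using `Fraction` rather than floating point keeps every entry in the same form as the table.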
From this table we find $E((X - \mu)^2)$ is
$$V(X) = \frac{1}{6}\left(\frac{25}{4} + \frac{9}{4} + \frac{1}{4} + \frac{1}{4} + \frac{9}{4} + \frac{25}{4}\right) = \frac{35}{12},$$
and the standard deviation $D(X) = \sqrt{35/12} \approx 1.707$. □

Calculation of Variance

We next prove a theorem that gives us a useful alternative form for computing the variance.

Theorem 6.6 If $X$ is any random variable with $E(X) = \mu$, then
$$V(X) = E(X^2) - \mu^2.$$

Proof. We have
$$V(X) = E((X - \mu)^2) = E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2.$$
□

Using Theorem 6.6, we can compute the variance of the outcome of a roll of a die by first computing
$$E(X^2) = 1\left(\frac{1}{6}\right) + 4\left(\frac{1}{6}\right) + 9\left(\frac{1}{6}\right) + 16\left(\frac{1}{6}\right) + 25\left(\frac{1}{6}\right) + 36\left(\frac{1}{6}\right) = \frac{91}{6},$$
and,
$$V(X) = E(X^2) - \mu^2 = \frac{91}{6} - \left(\frac{7}{2}\right)^2 = \frac{35}{12},$$
in agreement with the value obtained directly from the definition of $V(X)$.

Properties of Variance

The variance has properties very different from those of the expectation. If $c$ is any constant, $E(cX) = cE(X)$ and $E(X + c) = E(X) + c$. These two statements imply that the expectation is a linear function. However, the variance is not linear, as seen in the next theorem.

Theorem 6.7 If $X$ is any random variable and $c$ is any constant, then
$$V(cX) = c^2 V(X)$$
and
$$V(X + c) = V(X).$$

Proof. Let $\mu = E(X)$. Then $E(cX) = c\mu$, and
$$V(cX) = E((cX - c\mu)^2) = E(c^2 (X - \mu)^2) = c^2 E((X - \mu)^2) = c^2 V(X).$$
To prove the second assertion, we note that, to compute $V(X + c)$, we would replace $x$ by $x + c$ and $\mu$ by $\mu + c$ in Equation 6.1. Then the $c$'s would cancel, leaving $V(X)$. □

We turn now to some general properties of the variance. Recall that if $X$ and $Y$ are any two random variables, $E(X + Y) = E(X) + E(Y)$. This is not always true for the case of the variance. For example, let $X$ be a random variable with $V(X) \ne 0$, and define $Y = -X$. Then $V(X) = V(Y)$, so that $V(X) + V(Y) = 2V(X)$. But $X + Y$ is always 0 and hence has variance 0.
Thus $V(X + Y) \ne V(X) + V(Y)$.

In the important case of mutually independent random variables, however, the variance of the sum is the sum of the variances.

Theorem 6.8 Let $X$ and $Y$ be two independent random variables. Then
$$V(X + Y) = V(X) + V(Y).$$

Proof. Let $E(X) = a$ and $E(Y) = b$. Then
$$V(X + Y) = E((X + Y)^2) - (a + b)^2 = E(X^2) + 2E(XY) + E(Y^2) - a^2 - 2ab - b^2.$$
Since $X$ and $Y$ are independent, $E(XY) = E(X)E(Y) = ab$. Thus,
$$V(X + Y) = E(X^2) - a^2 + E(Y^2) - b^2 = V(X) + V(Y).$$
□

It is easy to extend this proof, by mathematical induction, to show that the variance of the sum of any number of mutually independent random variables is the sum of the individual variances. Thus we have the following theorem.

Theorem 6.9 Let $X_1, X_2, \ldots, X_n$ be an independent trials process with $E(X_j) = \mu$ and $V(X_j) = \sigma^2$. Let
$$S_n = X_1 + X_2 + \cdots + X_n$$
be the sum, and
$$A_n = \frac{S_n}{n}$$
be the average. Then
$$E(S_n) = n\mu, \qquad V(S_n) = n\sigma^2, \qquad \sigma(S_n) = \sigma\sqrt{n},$$
$$E(A_n) = \mu, \qquad V(A_n) = \frac{\sigma^2}{n}, \qquad \sigma(A_n) = \frac{\sigma}{\sqrt{n}}.$$

Proof. Since all the random variables $X_j$ have the same expected value, we have
$$E(S_n) = E(X_1) + \cdots + E(X_n) = n\mu,$$
$$V(S_n) = V(X_1) + \cdots + V(X_n) = n\sigma^2,$$
and
$$\sigma(S_n) = \sigma\sqrt{n}.$$
We have seen that, if we multiply a random variable $X$ with mean $\mu$ and variance $\sigma^2$ by a constant $c$, the new random variable has expected value $c\mu$ and variance $c^2\sigma^2$. Thus,
$$E(A_n) = E\left(\frac{S_n}{n}\right) = \frac{n\mu}{n} = \mu,$$
and
$$V(A_n) = V\left(\frac{S_n}{n}\right) = \frac{V(S_n)}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$
Finally, the standard deviation of $A_n$ is given by
$$\sigma(A_n) = \frac{\sigma}{\sqrt{n}}.$$
□
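The identities in Theorem 6.9 can be checked exactly for a small case. The Python sketch below builds the distribution of $S_n$ for $n$ rolls of a fair die by repeated convolution and compares the variances of $S_n$ and $A_n$ with $n\sigma^2$ and $\sigma^2/n$. The helper names (`convolve`, `mean`, `variance`) and the choice $n = 10$ are assumptions made for this illustration.

```python
from fractions import Fraction

def convolve(dist_a, dist_b):
    """Distribution of the sum of two independent discrete random variables."""
    out = {}
    for a, pa in dist_a.items():
        for b, pb in dist_b.items():
            out[a + b] = out.get(a + b, Fraction(0)) + pa * pb
    return out

def mean(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}

n = 10
s_n = die
for _ in range(n - 1):
    s_n = convolve(s_n, die)                       # distribution of S_n

a_n = {Fraction(s, n): p for s, p in s_n.items()}  # distribution of A_n = S_n / n

print("V(S_n) =", variance(s_n), " n * sigma^2 =", n * variance(die))
print("V(A_n) =", variance(a_n), " sigma^2 / n =", variance(die) / n)
```

Because the convolution step assumes independence of the summands, the agreement of the two columns is a direct instance of the theorem rather than a numerical coincidence.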
[Figure 6.7: Empirical distribution of $A_n$ (panels: $n = 10$ and $n = 100$).]

The last equation in the above theorem implies that in an independent trials process, if the individual summands have finite variance, then the standard deviation of the average goes to 0 as $n \to \infty$. Since the standard deviation tells us something about the spread of the distribution around the mean, we see that for large values of $n$, the value of $A_n$ is usually very close to the mean of $A_n$, which equals $\mu$, as shown above. This statement is made precise in Chapter 8, where it is called the Law of Large Numbers. For example, let $X$ represent the roll of a fair die. In Figure 6.7, we show the distribution of a random variable $A_n$ corresponding to $X$, for $n = 10$ and $n = 100$.

Example 6.18 Consider $n$ rolls of a die. We have seen that, if $X_j$ is the outcome of the $j$th roll, then $E(X_j) = 7/2$ and $V(X_j) = 35/12$. Thus, if $S_n$ is the sum of the outcomes, and $A_n = S_n/n$ is the average of the outcomes, we have $E(A_n) = 7/2$ and $V(A_n) = (35/12)/n$. Therefore, as $n$ increases, the expected value of the average remains constant, but the variance tends to 0. If the variance is a measure of the expected deviation from the mean, this would indicate that, for large $n$, we can expect the average to be very near the expected value. This is in fact the case, and we shall justify it in Chapter 8. □

Bernoulli Trials

Consider next the general Bernoulli trials process. As usual, we let $X_j = 1$ if the $j$th outcome is a success and 0 if it is a failure.
If $p$ is the probability of a success, and $q = 1 - p$, then
$$E(X_j) = 0q + 1p = p,$$
$$E(X_j^2) = 0^2 q + 1^2 p = p,$$
and
$$V(X_j) = E(X_j^2) - (E(X_j))^2 = p - p^2 = pq.$$
Thus, for Bernoulli trials, if $S_n = X_1 + X_2 + \cdots + X_n$ is the number of successes, then $E(S_n) = np$, $V(S_n) = npq$, and $D(S_n) = \sqrt{npq}$. If $A_n = S_n/n$ is the average number of successes, then $E(A_n) = p$, $V(A_n) = pq/n$, and $D(A_n) = \sqrt{pq/n}$. We see that the expected proportion of successes remains $p$ and the variance tends to 0. This suggests that the frequency interpretation of probability is a correct one. We shall make this more precise in Chapter 8.

Example 6.19 Let $T$ denote the number of trials until the first success in a Bernoulli trials process. Then $T$ is geometrically distributed. What is the variance of $T$? In Example 4.15, we saw that
$$m_T = \begin{pmatrix} 1 & 2 & 3 & \cdots \\ p & qp & q^2 p & \cdots \end{pmatrix}.$$
In Example 6.4, we showed that
$$E(T) = 1/p.$$
Thus,
$$V(T) = E(T^2) - 1/p^2,$$
so we need only find
$$E(T^2) = 1p + 4qp + 9q^2 p + \cdots = p(1 + 4q + 9q^2 + \cdots).$$
To evaluate this sum, we start again with
$$1 + x + x^2 + \cdots = \frac{1}{1 - x}.$$
Differentiating, we obtain
$$1 + 2x + 3x^2 + \cdots = \frac{1}{(1 - x)^2}.$$
Multiplying by $x$,
$$x + 2x^2 + 3x^3 + \cdots = \frac{x}{(1 - x)^2}.$$
Differentiating again gives
$$1 + 4x + 9x^2 + \cdots = \frac{1 + x}{(1 - x)^3}.$$
Thus,
$$E(T^2) = p\,\frac{1 + q}{(1 - q)^3} = \frac{1 + q}{p^2}$$
and
$$V(T) = E(T^2) - (E(T))^2 = \frac{1 + q}{p^2} - \frac{1}{p^2} = \frac{q}{p^2}.$$
For example, the variance for the number of tosses of a coin until the first head turns up is $(1/2)/(1/2)^2 = 2$. The variance for the number of rolls of a die until the first six turns up is $(5/6)/(1/6)^2 = 30$. Note that, as $p$ decreases, the variance increases rapidly. This corresponds to the increased spread of the geometric distribution as $p$ decreases (noted in Figure 5.1). □
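The closed form $V(T) = q/p^2$ can be compared with a direct, truncated evaluation of the series for $E(T)$ and $E(T^2)$. The Python sketch below is only a numerical sanity check; the function name and the truncation point of 2000 terms are assumptions, chosen so that the neglected tail is negligible for the two values of $p$ used in the example.

```python
def geometric_moments(p, terms=2000):
    """Approximate E(T) and V(T) for P(T = k) = q**(k-1) * p by truncating the series."""
    q = 1.0 - p
    e_t = 0.0
    e_t2 = 0.0
    prob = p                      # P(T = 1)
    for k in range(1, terms + 1):
        e_t += k * prob
        e_t2 += k * k * prob
        prob *= q                 # advance to P(T = k + 1)
    return e_t, e_t2 - e_t ** 2

for p in (1 / 2, 1 / 6):
    mean, var = geometric_moments(p)
    print(p, mean, var, (1 - p) / p ** 2)   # last column is the closed form q / p^2
```

For much smaller values of $p$ the truncation point would need to grow, since the tail of the geometric distribution decays like $q^k$.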
Poisson Distribution

Just as in the case of expected values, it is easy to guess the variance of the Poisson distribution with parameter $\lambda$. We recall that the variance of a binomial distribution with parameters $n$ and $p$ equals $npq$. We also recall that the Poisson distribution could be obtained as a limit of binomial distributions, if $n$ goes to $\infty$ and $p$ goes to 0 in such a way that their product is kept fixed at the value $\lambda$. In this case, $npq = \lambda q$ approaches $\lambda$, since $q$ goes to 1. So, given a Poisson distribution with parameter $\lambda$, we should guess that its variance is $\lambda$. The reader is asked to show this in Exercise 29.

Exercises

1 A number is chosen at random from the set $S = \{-1, 0, 1\}$. Let $X$ be the number chosen. Find the expected value, variance, and standard deviation of $X$.

2 A random variable $X$ has the distribution
$$p_X = \begin{pmatrix} 0 & 1 & 2 & 4 \\ 1/3 & 1/3 & 1/6 & 1/6 \end{pmatrix}.$$
Find the expected value, variance, and standard deviation of $X$.

3 You place a 1-dollar bet on the number 17 at Las Vegas, and your friend places a 1-dollar bet on black (see Exercises 1.1.6 and 1.1.7). Let $X$ be your winnings and $Y$ be her winnings. Compare $E(X)$, $E(Y)$, and $V(X)$, $V(Y)$. What do these computations tell you about the nature of your winnings if you and your friend make a sequence of bets, with you betting each time on a number and your friend betting on a color?

4 $X$ is a random variable with $E(X) = 100$ and $V(X) = 15$. Find

(a) $E(X^2)$.
(b) $E(3X + 10)$.
(c) $E(-X)$.
(d) $V(-X)$.
(e) $D(-X)$.

5 In a certain manufacturing process, the (Fahrenheit) temperature never varies by more than $2^\circ$ from $62^\circ$. The temperature is, in fact, a random variable $F$ with distribution
$$P_F = \begin{pmatrix} 60 & 61 & 62 & 63 & 64 \\ 1/10 & 2/10 & 4/10 & 2/10 & 1/10 \end{pmatrix}.$$

(a) Find $E(F)$ and $V(F)$.

(b) Define $T = F - 62$. Find $E(T)$ and $V(T)$, and compare these answers with those in part (a).
(c) It is decided to report the temperature readings on a Celsius scale, that is, $C = (5/9)(F - 32)$. What is the expected value and variance for the readings now?

6 Write a computer program to calculate the mean and variance of a distribution which you specify as data. Use the program to compare the variances for the following densities, both having expected value 0:
$$p_X = \begin{pmatrix} -2 & -1 & 0 & 1 & 2 \\ 3/11 & 2/11 & 1/11 & 2/11 & 3/11 \end{pmatrix};$$
$$p_Y = \begin{pmatrix} -2 & -1 & 0 & 1 & 2 \\ 1/11 & 2/11 & 5/11 & 2/11 & 1/11 \end{pmatrix}.$$

7 A coin is tossed three times. Let $X$ be the number of heads that turn up. Find $V(X)$ and $D(X)$.

8 A random sample of 2400 people are asked if they favor a government proposal to develop new nuclear power plants. If 40 percent of the people in the country are in favor of this proposal, find the expected value and the standard deviation for the number $S_{2400}$ of people in the sample who favored the proposal.

9 A die is loaded so that the probability of a face coming up is proportional to the number on that face. The die is rolled with outcome $X$. Find $V(X)$ and $D(X)$.

10 Prove the following facts about the standard deviation.

(a) $D(X + c) = D(X)$.
(b) $D(cX) = |c|\,D(X)$.

11 A number is chosen at random from the integers $1, 2, 3, \ldots, n$. Let $X$ be the number chosen. Show that $E(X) = (n + 1)/2$ and $V(X) = (n - 1)(n + 1)/12$. Hint: The following identity may be useful:
$$1^2 + 2^2 + \cdots + n^2 = \frac{(n)(n + 1)(2n + 1)}{6}.$$

12 Let $X$ be a random variable with $\mu = E(X)$ and $\sigma^2 = V(X)$. Define $X^* = (X - \mu)/\sigma$. The random variable $X^*$ is called the standardized random variable associated with $X$. Show that this standardized random variable has expected value 0 and variance 1.

13 Peter and Paul play Heads or Tails (see Example 1.4). Let $W_n$ be Peter's winnings after $n$ matches.
Show that $E(W_n) = 0$ and $V(W_n) = n$.

14 Find the expected value and the variance for the number of boys and the number of girls in a royal family that has children until there is a boy or until there are three children, whichever comes first.

15 Suppose that $n$ people have their hats returned at random. Let $X_i = 1$ if the $i$th person gets his or her own hat back and 0 otherwise. Let $S_n = \sum_{i=1}^{n} X_i$. Then $S_n$ is the total number of people who get their own hats back. Show that

(a) $E(X_i^2) = 1/n$.
(b) $E(X_i \cdot X_j) = 1/(n(n - 1))$ for $i \ne j$.
(c) $E(S_n^2) = 2$ (using (a) and (b)).
(d) $V(S_n) = 1$.

16 Let $S_n$ be the number of successes in $n$ independent trials. Use the program BinomialProbabilities (Section 3.2) to compute, for given $n$, $p$, and $j$, the probability
$$P(-j\sqrt{npq} < S_n - np < j\sqrt{npq}).$$

(a) Let $p = .5$, and compute this probability for $j = 1, 2, 3$ and $n = 10, 30, 50$. Do the same for $p = .2$.

(b) Show that the standardized random variable $S_n^* = (S_n - np)/\sqrt{npq}$ has expected value 0 and variance 1. What do your results from (a) tell you about this standardized quantity $S_n^*$?

17 Let $X$ be the outcome of a chance experiment with $E(X) = \mu$ and $V(X) = \sigma^2$. When $\mu$ and $\sigma^2$ are unknown, the statistician often estimates them by repeating the experiment $n$ times with outcomes $x_1, x_2, \ldots, x_n$, estimating $\mu$ by the sample mean
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,$$
and $\sigma^2$ by the sample variance
$$s^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2.$$
Then $s$ is the sample standard deviation. These formulas should remind the reader of the definitions of the theoretical mean and variance. (Many statisticians define the sample variance with the coefficient $1/n$ replaced by $1/(n - 1)$. If this alternative definition is used, the expected value of $s^2$ is equal to $\sigma^2$. See Exercise 18, part (d).) Write a computer program that will roll a die $n$ times and compute the sample mean and sample variance.
Repeat this experiment several times for $n = 10$ and $n = 1000$. How well do the sample mean and sample variance estimate the true mean 7/2 and variance 35/12?

18 Show that, for the sample mean $\bar{x}$ and sample variance $s^2$ as defined in Exercise 17,

(a) $E(\bar{x}) = \mu$.

(b) $E\big((\bar{x} - \mu)^2\big) = \sigma^2/n$.

(c) $E(s^2) = \frac{n - 1}{n}\sigma^2$. Hint: For (c) write
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} \big((x_i - \mu) - (\bar{x} - \mu)\big)^2$$
$$= \sum_{i=1}^{n} (x_i - \mu)^2 - 2(\bar{x} - \mu)\sum_{i=1}^{n} (x_i - \mu) + n(\bar{x} - \mu)^2$$
$$= \sum_{i=1}^{n} (x_i - \mu)^2 - n(\bar{x} - \mu)^2,$$
and take expectations of both sides, using part (b) when necessary.

(d) Show that if, in the definition of $s^2$ in Exercise 17, we replace the coefficient $1/n$ by the coefficient $1/(n - 1)$, then $E(s^2) = \sigma^2$. (This shows why many statisticians use the coefficient $1/(n - 1)$. The number $s^2$ is used to estimate the unknown quantity $\sigma^2$. If an estimator has an average value which equals the quantity being estimated, then the estimator is said to be unbiased. Thus, the statement $E(s^2) = \sigma^2$ says that $s^2$ is an unbiased estimator of $\sigma^2$.)

19 Let $X$ be a random variable taking on values $a_1, a_2, \ldots, a_r$ with probabilities $p_1, p_2, \ldots, p_r$ and with $E(X) = \mu$. Define the spread of $X$ as follows:
$$\bar{\sigma} = \sum_{i=1}^{r} |a_i - \mu|\,p_i.$$
This, like the standard deviation, is a way to quantify the amount that a random variable is spread out around its mean. Recall that the variance of a sum of mutually independent random variables is the sum of the individual variances. The square of the spread corresponds to the variance in a manner similar to the correspondence between the spread and the standard deviation. Show by an example that it is not necessarily true that the square of the spread of the sum of two independent random variables is the sum of the squares of the individual spreads.

20 We have two instruments that measure the distance between two points.
The measurements given by the two instruments are random variables $X_1$ and $X_2$ that are independent with $E(X_1) = E(X_2) = \mu$, where $\mu$ is the true distance. From experience with these instruments, we know the values of the variances $\sigma_1^2$ and $\sigma_2^2$. These variances are not necessarily the same. From two measurements, we estimate $\mu$ by the weighted average $\bar{\mu} = wX_1 + (1 - w)X_2$. Here $w$ is chosen in $[0, 1]$ to minimize the variance of $\bar{\mu}$.

(a) What is $E(\bar{\mu})$?

(b) How should $w$ be chosen in $[0, 1]$ to minimize the variance of $\bar{\mu}$?

21 Let $X$ be a random variable with $E(X) = \mu$ and $V(X) = \sigma^2$. Show that the function $f(x)$ defined by
$$f(x) = \sum_{\omega} (X(\omega) - x)^2 p(\omega)$$
has its minimum value when $x = \mu$.

22 Let $X$ and $Y$ be two random variables defined on the finite sample space $\Omega$. Assume that $X$, $Y$, $X + Y$, and $X - Y$ all have the same distribution. Prove that $P(X = Y = 0) = 1$.

23 If $X$ and $Y$ are any two random variables, then the covariance of $X$ and $Y$ is defined by $\mathrm{Cov}(X, Y) = E((X - E(X))(Y - E(Y)))$. Note that $\mathrm{Cov}(X, X) = V(X)$. Show that, if $X$ and $Y$ are independent, then $\mathrm{Cov}(X, Y) = 0$; and show, by an example, that we can have $\mathrm{Cov}(X, Y) = 0$ and $X$ and $Y$ not independent.

*24 A professor wishes to make up a true-false exam with $n$ questions. She assumes that she can design the problems in such a way that a student will answer the $j$th problem correctly with probability $p_j$, and that the answers to the various problems may be considered independent experiments. Let $S_n$ be the number of problems that a student will get correct. The professor wishes to choose $p_j$ so that $E(S_n) = .7n$ and so that the variance of $S_n$ is as large as possible.
Show that, to achieve this, she should choose $p_j = .7$ for all $j$; that is, she should make all the problems have the same difficulty.

25 (Lamperti^{20}) An urn contains exactly 5000 balls, of which an unknown number $X$ are white and the rest red, where $X$ is a random variable with a probability distribution on the integers $0, 1, 2, \ldots, 5000$.

(a) Suppose we know that $E(X) = \mu$. Show that this is enough to allow us to calculate the probability that a ball drawn at random from the urn will be white. What is this probability?

(b) We draw a ball from the urn, examine its color, replace it, and then draw another. Under what conditions, if any, are the results of the two drawings independent; that is, does
$$P(\text{white}, \text{white}) = P(\text{white})^2\,?$$

(c) Suppose the variance of $X$ is $\sigma^2$. What is the probability of drawing two white balls in part (b)?

26 For a sequence of Bernoulli trials, let $X_1$ be the number of trials until the first success. For $j \ge 2$, let $X_j$ be the number of trials after the $(j - 1)$st success until the $j$th success. It can be shown that $X_1, X_2, \ldots$ is an independent trials process.

^{20} Private communication.

(a) What is the common distribution, expected value, and variance for $X_j$?

(b) Let $T_n = X_1 + X_2 + \cdots + X_n$. Then $T_n$ is the time until the $n$th success. Find $E(T_n)$ and $V(T_n)$.

(c) Use the results of (b) to find the expected value and variance for the number of tosses of a coin until the $n$th occurrence of a head.

27 Referring to Exercise 6.1.30, find the variance for the number of boxes of Wheaties bought before getting half of the players' pictures and the variance for the number of additional boxes needed to get the second half of the players' pictures.

28 In Example 5.3, assume that the book in question has 1000 pages. Let $X$ be the number of pages with no mistakes. Show that $E(X) = 905$ and $V(X) = 86$.
Using these results, show that the probability is $\le .05$ that there will be more than 924 pages without errors or fewer than 866 pages without errors.

29 Let $X$ be Poisson distributed with parameter $\lambda$. Show that $V(X) = \lambda$.

6.3 Continuous Random Variables

In this section we consider the properties of the expected value and the variance of a continuous random variable. These quantities are defined just as for discrete random variables and share the same properties.

Expected Value

Definition 6.4 Let $X$ be a real-valued random variable with density function $f(x)$. The expected value $\mu = E(X)$ is defined by
$$\mu = E(X) = \int_{-\infty}^{+\infty} x f(x)\,dx,$$
provided the integral
$$\int_{-\infty}^{+\infty} |x| f(x)\,dx$$
is finite. □

The reader should compare this definition with the corresponding one for discrete random variables in Section 6.1. Intuitively, we can interpret $E(X)$, as we did in the previous sections, as the value that we should expect to obtain if we perform a large number of independent experiments and average the resulting values of $X$.

We can summarize the properties of $E(X)$ as follows (cf. Theorem 6.2).

Theorem 6.10 If $X$ and $Y$ are real-valued random variables and $c$ is any constant, then
$$E(X + Y) = E(X) + E(Y),$$
$$E(cX) = cE(X).$$
The proof is very similar to the proof of Theorem 6.2, and we omit it. □

More generally, if $X_1, X_2, \ldots, X_n$ are $n$ real-valued random variables, and $c_1, c_2, \ldots, c_n$ are $n$ constants, then
$$E(c_1 X_1 + c_2 X_2 + \cdots + c_n X_n) = c_1 E(X_1) + c_2 E(X_2) + \cdots + c_n E(X_n).$$

Example 6.20 Let $X$ be uniformly distributed on the interval $[0, 1]$. Then
$$E(X) = \int_0^1 x\,dx = 1/2.$$
It follows that if we choose a large number $N$ of random numbers from $[0, 1]$ and take the average, then we can expect that this average should be close to the expected value of 1/2.
□

Example 6.21 Let $Z = (x, y)$ denote a point chosen uniformly and randomly from the unit disk, as in the dart game in Example 2.8, and let $X = (x^2 + y^2)^{1/2}$ be the distance from $Z$ to the center of the disk. The density function of $X$ can easily be shown to equal $f(x) = 2x$, so by the definition of expected value,
$$E(X) = \int_0^1 x f(x)\,dx = \int_0^1 x(2x)\,dx = \frac{2}{3}.$$
□

Example 6.22 In the example of the couple meeting at the Inn (Example 2.16), each person arrives at a time which is uniformly distributed between 5:00 and 6:00 PM. The random variable $Z$ under consideration is the length of time the first person has to wait until the second one arrives. It was shown that
$$f_Z(z) = 2(1 - z),$$
for $0 \le z \le 1$. Hence,
$$E(Z) = \int_0^1 z f_Z(z)\,dz = \int_0^1 2z(1 - z)\,dz = \left[z^2 - \frac{2}{3}z^3\right]_0^1 = \frac{1}{3}.$$
□

Expectation of a Function of a Random Variable

Suppose that $X$ is a real-valued random variable and $\phi(x)$ is a continuous function from $\mathbf{R}$ to $\mathbf{R}$. The following theorem is the continuous analogue of Theorem 6.1.

Theorem 6.11 If $X$ is a real-valued random variable and if $\phi : \mathbf{R} \to \mathbf{R}$ is a continuous real-valued function with domain $[a, b]$, then
$$E(\phi(X)) = \int_{-\infty}^{+\infty} \phi(x) f_X(x)\,dx,$$
provided the integral exists. □

For a proof of this theorem, see Ross.^{21}

Expectation of the Product of Two Random Variables

In general, it is not true that $E(XY) = E(X)E(Y)$, since the integral of a product is not the product of integrals. But if $X$ and $Y$ are independent, then the expectations multiply.

Theorem 6.12 Let $X$ and $Y$ be independent real-valued continuous random variables with finite expected values. Then we have
$$E(XY) = E(X)E(Y).$$

Proof.
We will prove this only in the case that the ranges of $X$ and $Y$ are contained in the intervals $[a, b]$ and $[c, d]$, respectively. Let the density functions of $X$ and $Y$ be denoted by $f_X(x)$ and $f_Y(y)$, respectively. Since $X$ and $Y$ are independent, the joint density function of $X$ and $Y$ is the product of the individual density functions. Hence
$$E(XY) = \int_a^b \int_c^d xy\, f_X(x) f_Y(y)\,dy\,dx = \int_a^b x f_X(x)\,dx \int_c^d y f_Y(y)\,dy = E(X)E(Y).$$
The proof in the general case involves using sequences of bounded random variables that approach $X$ and $Y$, and is somewhat technical, so we will omit it.

$^{21}$ S. Ross, A First Course in Probability (New York: Macmillan, 1984), pgs. 241-245.

In the same way, one can show that if $X_1, X_2, \ldots, X_n$ are $n$ mutually independent real-valued random variables, then
$$E(X_1 X_2 \cdots X_n) = E(X_1) E(X_2) \cdots E(X_n).$$

Example 6.23 Let $Z = (X, Y)$ be a point chosen at random in the unit square. Let $A = X^2$ and $B = Y^2$. Then Theorem 4.3 implies that $A$ and $B$ are independent. Using Theorem 6.11, the expectations of $A$ and $B$ are easy to calculate:
$$E(A) = E(B) = \int_0^1 x^2\,dx = \frac{1}{3}.$$
Using Theorem 6.12, the expectation of $AB$ is just the product of $E(A)$ and $E(B)$, or 1/9. The usefulness of this theorem is demonstrated by noting that it is quite a bit more difficult to calculate $E(AB)$ from the definition of expectation. One finds that the density function of $AB$ is
$$f_{AB}(t) = \frac{-\log(t)}{4\sqrt{t}},$$
so
$$E(AB) = \int_0^1 t f_{AB}(t)\,dt = \frac{1}{9}.$$

Example 6.24 Again let $Z = (X, Y)$ be a point chosen at random in the unit square, and let $W = X + Y$. Then $Y$ and $W$ are not independent, and we have
$$E(Y) = \frac{1}{2}, \qquad E(W) = 1,$$
$$E(YW) = E(XY + Y^2) = E(X)E(Y) + \frac{1}{3} = \frac{7}{12} \ne E(Y)E(W).$$

We turn now to the variance.

Variance

Definition 6.5 Let $X$ be a real-valued random variable with density function $f(x)$.
The variance $\sigma^2 = V(X)$ is defined by
$$\sigma^2 = V(X) = E((X - \mu)^2).$$

The next result follows easily from Theorem 6.11 (the continuous analogue of Theorem 6.1). There is another way to calculate the variance of a continuous random variable, which is usually slightly easier. It is given in Theorem 6.15.

Theorem 6.13 If $X$ is a real-valued random variable with $E(X) = \mu$, then
$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx.$$

The properties listed in the next three theorems are all proved in exactly the same way that the corresponding theorems for discrete random variables were proved in Section 6.2.

Theorem 6.14 If $X$ is a real-valued random variable defined on $\Omega$ and $c$ is any constant, then (cf. Theorem 6.7)
$$V(cX) = c^2 V(X), \qquad V(X + c) = V(X).$$

Theorem 6.15 If $X$ is a real-valued random variable with $E(X) = \mu$, then (cf. Theorem 6.6)
$$V(X) = E(X^2) - \mu^2.$$

Theorem 6.16 If $X$ and $Y$ are independent real-valued random variables on $\Omega$, then (cf. Theorem 6.8)
$$V(X + Y) = V(X) + V(Y).$$

Example 6.25 (continuation of Example 6.20) If $X$ is uniformly distributed on $[0, 1]$, then, using Theorem 6.13, we have
$$V(X) = \int_0^1 \left( x - \frac{1}{2} \right)^2 dx = \frac{1}{12}.$$

Example 6.26 Let $X$ be an exponentially distributed random variable with parameter $\lambda$. Then the density function of $X$ is
$$f_X(x) = \lambda e^{-\lambda x}.$$
From the definition of expectation and integration by parts, we have
$$\begin{aligned}
E(X) &= \int_0^\infty x f_X(x)\,dx = \lambda \int_0^\infty x e^{-\lambda x}\,dx \\
&= \Big[ -x e^{-\lambda x} \Big]_0^\infty + \int_0^\infty e^{-\lambda x}\,dx \\
&= 0 + \left[ \frac{e^{-\lambda x}}{-\lambda} \right]_0^\infty = \frac{1}{\lambda}.
\end{aligned}$$
Similarly, using Theorems 6.11 and 6.15, we have
$$\begin{aligned}
V(X) &= \int_0^\infty x^2 f_X(x)\,dx - \frac{1}{\lambda^2} = \lambda \int_0^\infty x^2 e^{-\lambda x}\,dx - \frac{1}{\lambda^2} \\
&= \Big[ -x^2 e^{-\lambda x} \Big]_0^\infty + 2 \int_0^\infty x e^{-\lambda x}\,dx - \frac{1}{\lambda^2} \\
&= \Big[ -x^2 e^{-\lambda x} \Big]_0^\infty - \left[ \frac{2x e^{-\lambda x}}{\lambda} \right]_0^\infty - \left[ \frac{2}{\lambda^2} e^{-\lambda x} \right]_0^\infty - \frac{1}{\lambda^2} \\
&= \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.
\end{aligned}$$
In this case, both $E(X)$ and $V(X)$ are finite if $\lambda > 0$.
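The values $E(X) = 1/\lambda$ and $V(X) = 1/\lambda^2$ found in Example 6.26 can be compared with sample estimates. This is a hedged sketch only: the function name `exponential_moments`, the sample size, and the choice $\lambda = 2$ are assumptions for illustration, and the variance is estimated through the identity $V(X) = E(X^2) - \mu^2$ of Theorem 6.15.

```python
import random

def exponential_moments(lam, n=200000, seed=0):
    """Estimate E(X) and V(X) for an exponential density with parameter lam."""
    rng = random.Random(seed)  # seeded for reproducibility
    xs = [rng.expovariate(lam) for _ in range(n)]
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    # Theorem 6.15: V(X) = E(X^2) - mu^2, applied to the sample moments.
    return mean, mean_sq - mean * mean

lam = 2.0
mean, var = exponential_moments(lam)
print("sample mean    ", mean, " vs 1/lambda   =", 1 / lam)
print("sample variance", var, " vs 1/lambda^2 =", 1 / lam**2)
```

The estimates are random, so they should be expected to agree with the exact values only approximately.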
Example 6.27 Let $Z$ be a standard normal random variable with density function
$$f_Z(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}.$$
Since this density function is symmetric with respect to the $y$-axis, it is easy to show that
$$\int_{-\infty}^{\infty} x f_Z(x)\,dx$$
has value 0. The reader should recall, however, that the expectation is defined to be the above integral only if the integral
$$\int_{-\infty}^{\infty} |x| f_Z(x)\,dx$$
is finite. This integral equals
$$2 \int_0^\infty x f_Z(x)\,dx,$$
which one can easily show is finite. Thus, the expected value of $Z$ is 0.

To calculate the variance of $Z$, we begin by applying Theorem 6.15:
$$V(Z) = \int_{-\infty}^{+\infty} x^2 f_Z(x)\,dx - \mu^2.$$
If we write $x^2$ as $x \cdot x$ and integrate by parts, we obtain
$$\frac{1}{\sqrt{2\pi}} \left( -x e^{-x^2/2} \right) \Big|_{-\infty}^{+\infty} + \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-x^2/2}\,dx.$$
The first summand above can be shown to equal 0, since as $x \to \pm\infty$, $e^{-x^2/2}$ gets small more quickly than $x$ gets large. The second summand is just the standard normal density integrated over its domain, so the value of this summand is 1. Therefore, the variance of the standard normal density equals 1.

Now let $X$ be a (not necessarily standard) normal random variable with parameters $\mu$ and $\sigma$. Then the density function of $X$ is
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2 / 2\sigma^2}.$$
We can write $X = \sigma Z + \mu$, where $Z$ is a standard normal random variable. Since $E(Z) = 0$ and $V(Z) = 1$ by the calculation above, Theorems 6.10 and 6.14 imply that
$$E(X) = E(\sigma Z + \mu) = \mu, \qquad V(X) = V(\sigma Z + \mu) = \sigma^2.$$

Example 6.28 Let $X$ be a continuous random variable with the Cauchy density function
$$f_X(x) = \frac{a}{\pi} \frac{1}{a^2 + x^2}.$$
Then the expectation of $X$ does not exist, because the integral
$$\frac{a}{\pi} \int_{-\infty}^{+\infty} \frac{|x|\,dx}{a^2 + x^2}$$
diverges. Thus the variance of $X$ also fails to exist. Densities whose variance is not defined, like the Cauchy density, behave quite differently in a number of important respects from those whose variance is finite. We shall see one instance of this difference in Section 8.2.
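The divergence claimed in Example 6.28 can be made explicit by truncating the integral at $\pm R$. The cutoff $R$ is introduced here only for illustration, and $a > 0$ is assumed:

$$\frac{a}{\pi} \int_{-R}^{R} \frac{|x|\,dx}{a^2 + x^2} = \frac{2a}{\pi} \int_0^R \frac{x\,dx}{a^2 + x^2} = \frac{a}{\pi} \Big[ \log(a^2 + x^2) \Big]_0^R = \frac{a}{\pi} \log\!\left( 1 + \frac{R^2}{a^2} \right).$$

The right-hand side grows without bound as $R \to \infty$, although only logarithmically, so the integral of $|x| f_X(x)$ is not finite.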
Independent Trials

Corollary 6.1 If $X_1, X_2, \ldots, X_n$ is an independent trials process of real-valued random variables, with $E(X_i) = \mu$ and $V(X_i) = \sigma^2$, and if
$$S_n = X_1 + X_2 + \cdots + X_n, \qquad A_n = \frac{S_n}{n},$$
then
$$E(S_n) = n\mu, \qquad E(A_n) = \mu, \qquad V(S_n) = n\sigma^2, \qquad V(A_n) = \frac{\sigma^2}{n}.$$
It follows that if we set
$$S_n^* = \frac{S_n - n\mu}{\sqrt{n\sigma^2}},$$
then
$$E(S_n^*) = 0, \qquad V(S_n^*) = 1.$$
We say that $S_n^*$ is a standardized version of $S_n$ (see Exercise 12 in Section 6.2).

Queues

Example 6.29 Let us consider again the queueing problem, that is, the problem of the customers waiting in a queue for service (see Example 5.7). We suppose again that customers join the queue in such a way that the time between arrivals is an exponentially distributed random variable $X$ with density function
$$f_X(t) = \lambda e^{-\lambda t}.$$
Then the expected value of the time between arrivals is simply $1/\lambda$ (see Example 6.26), as was stated in Example 5.7. The reciprocal $\lambda$ of this expected value is often referred to as the arrival rate. The service time of an individual who is first in line is defined to be the amount of time that the person stays at the head of the line before leaving. We suppose that the customers are served in such a way that the service time is another exponentially distributed random variable $Y$ with density function
$$f_Y(t) = \mu e^{-\mu t}.$$
Then the expected value of the service time is
$$E(Y) = \int_0^\infty t f_Y(t)\,dt = \frac{1}{\mu}.$$
The reciprocal $\mu$ of this expected value is often referred to as the service rate.

We expect on grounds of our everyday experience with queues that if the service rate is greater than the arrival rate, then the average queue size will tend to stabilize, but if the service rate is less than the arrival rate, then the queue will tend to increase in length without limit (see Figure 5.7).
The simulations in Example 5.7 tend to bear out our everyday experience. We can make this conclusion more precise if we introduce the traffic intensity as the product
$$\rho = (\text{arrival rate})(\text{average service time}) = \frac{\lambda}{\mu} = \frac{1/\mu}{1/\lambda}.$$
The traffic intensity is also the ratio of the average service time to the average time between arrivals. If the traffic intensity is less than 1 the queue will perform reasonably, but if it is greater than 1 the queue will grow indefinitely large. In the critical case of $\rho = 1$, it can be shown that the queue will become large but there will always be times at which the queue is empty.$^{22}$

In the case that the traffic intensity is less than 1 we can consider the length of the queue as a random variable $Z$ whose expected value is finite,
$$E(Z) = N.$$
The time spent in the queue by a single customer can be considered as a random variable $W$ whose expected value is finite,
$$E(W) = T.$$
Then we can argue that, when a customer joins the queue, he expects to find $N$ people ahead of him, and when he leaves the queue, he expects to find $\lambda T$ people behind him. Since, in equilibrium, these should be the same, we would expect to find that
$$N = \lambda T.$$
This last relationship is called Little's law for queues.$^{23}$ We will not prove it here. A proof may be found in Ross.$^{24}$ Note that in this case we are counting the waiting time of all customers, even those that do not have to wait at all. In our simulation in Section 4.2, we did not consider these customers.

If we knew the expected queue length then we could use Little's law to obtain the expected waiting time, since
$$T = \frac{N}{\lambda}.$$
The queue length is a random variable with a discrete distribution. We can estimate this distribution by simulation, keeping track of the queue lengths at the times at which a customer arrives. We show the result of this simulation (using the program Queue) in Figure 6.8.

$^{22}$ L. Kleinrock, Queueing Systems, vol. 2 (New York: John Wiley and Sons, 1975).
$^{23}$ ibid., p. 17.
$^{24}$ S. M. Ross, Applied Probability Models with Optimization Applications (San Francisco: Holden-Day, 1970).

Figure 6.8: Distribution of queue lengths.

We note that the distribution appears to be a geometric distribution. In the study of queueing theory it is shown that the distribution for the queue length in equilibrium is indeed a geometric distribution with
$$s_j = (1 - \rho)\rho^j \quad \text{for } j = 0, 1, 2, \ldots,$$
if $\rho < 1$. The expected value of a random variable with this distribution is
$$N = \frac{\rho}{1 - \rho}$$
(see Example 6.4). Thus by Little's result the expected waiting time is
$$T = \frac{\rho}{\lambda(1 - \rho)} = \frac{1}{\mu - \lambda},$$
where $\mu$ is the service rate, $\lambda$ the arrival rate, and $\rho$ the traffic intensity.

In our simulation, the arrival rate is 1 and the service rate is 1.1. Thus, the traffic intensity is $1/1.1 = 10/11$, the expected queue size is
$$\frac{10/11}{1 - 10/11} = 10,$$
and the expected waiting time is
$$\frac{1}{1.1 - 1} = 10.$$
In our simulation the average queue size was 8.19 and the average waiting time was 7.37. In Figure 6.9, we show the histogram for the waiting times. This histogram suggests that the density for the waiting times is exponential with parameter $\mu - \lambda$, and this is the case.

Figure 6.9: Distribution of queue waiting times.

Exercises

1. Let $X$ be a random variable with range $[-1, 1]$ and let $f_X(x)$ be the density function of $X$. Find $\mu(X)$ and $\sigma^2(X)$ if, for $|x| < 1$,
   (a) $f_X(x) = 1/2$.
   (b) $f_X(x) = |x|$.
   (c) $f_X(x) = 1 - |x|$.
   (d) $f_X(x) = (3/2)x^2$.

2. Let $X$ be a random variable with range $[-1, 1]$ and $f_X$ its density function. Find $\mu(X)$ and $\sigma^2(X)$ if, for $|x| > 1$, $f_X(x) = 0$, and for $|x| < 1$,
   (a) $f_X(x) = (3/4)(1 - x^2)$.
   (b) $f_X(x) = (\pi/4)\cos(\pi x/2)$.
   (c) $f_X(x) = (x + 1)/2$.
   (d) $f_X(x) = (3/8)(x + 1)^2$.

3. The lifetime, measured in hours, of the ACME super light bulb is a random variable $T$ with density function $f_T(t) = \lambda^2 t e^{-\lambda t}$, where $\lambda = .05$. What is the expected lifetime of this light bulb? What is its variance?

4. Let $X$ be a random variable with range $[-1, 1]$ and density function $f_X(x) = ax + b$ if $|x| < 1$.
   (a) Show that if $\int_{-1}^{+1} f_X(x)\,dx = 1$, then $b = 1/2$.
   (b) Show that if $f_X(x) \ge 0$, then $-1/2 \le a \le 1/2$.
   (c) Show that $\mu = (2/3)a$, and hence that $-1/3 \le \mu \le 1/3$.
   (d) Show that $\sigma^2(X) = (2/3)b - (4/9)a^2 = 1/3 - (4/9)a^2$.

5. Let $X$ be a random variable with range $[-1, 1]$ and density function $f_X(x) = ax^2 + bx + c$ if $|x| < 1$ and 0 otherwise.
   (a) Show that $2a/3 + 2c = 1$ (see Exercise 4).
   (b) Show that $2b/3 = \mu(X)$.
   (c) Show that $2a/5 + 2c/3 = \sigma^2(X)$.
   (d) Find $a$, $b$, and $c$ if $\mu(X) = 0$, $\sigma^2(X) = 1/15$, and sketch the graph of $f_X$.
   (e) Find $a$, $b$, and $c$ if $\mu(X) = 0$, $\sigma^2(X) = 1/2$, and sketch the graph of $f_X$.

6. Let $T$ be a random variable with range $[0, \infty]$ and $f_T$ its density function. Find $\mu(T)$ and $\sigma^2(T)$ if, for $t < 0$, $f_T(t) = 0$, and for $t > 0$,
   (a) $f_T(t) = 3e^{-3t}$.
   (b) $f_T(t) = 9te^{-3t}$.
   (c) $f_T(t) = 3/(1 + t)^4$.

7. Let $X$ be a random variable with density function $f_X$. Show, using elementary calculus, that the function
$$\phi(a) = E((X - a)^2)$$
takes its minimum value when $a = \mu(X)$, and in that case $\phi(a) = \sigma^2(X)$.

8. Let $X$ be a random variable with mean $\mu$ and variance $\sigma^2$. Let $Y = aX^2 + bX + c$. Find the expected value of $Y$.

9. Let $X$, $Y$, and $Z$ be independent random variables, each with mean $\mu$ and variance $\sigma^2$.
   (a) Find the expected value and variance of $S = X + Y + Z$.
   (b) Find the expected value and variance of $A = (1/3)(X + Y + Z)$.
   (c) Find the expected value of $S^2$ and $A^2$.

10. Let $X$ and $Y$ be independent random variables with uniform density functions on $[0, 1]$. Find
   (a) $E(|X - Y|)$.
   (b) $E(\max(X, Y))$.
   (c) $E(\min(X, Y))$.
   (d) $E(X^2 + Y^2)$.
   (e) $E((X + Y)^2)$.

11. The Pilsdorff Beer Company runs a fleet of trucks along the 100 mile road from Hangtown to Dry Gulch. The trucks are old, and are apt to break down at any point along the road with equal probability. Where should the company locate a garage so as to minimize the expected distance from a typical breakdown to the garage? In other words, if $X$ is a random variable giving the location of the breakdown, measured, say, from Hangtown, and $b$ gives the location of the garage, what choice of $b$ minimizes $E(|X - b|)$? Now suppose $X$ is not distributed uniformly over $[0, 100]$, but instead has density function $f_X(x) = 2x/10{,}000$. Then what choice of $b$ minimizes $E(|X - b|)$?

12. Find $E(X^Y)$, where $X$ and $Y$ are independent random variables which are uniform on $[0, 1]$. Then verify your answer by simulation.

13. Let $X$ be a random variable that takes on nonnegative values and has distribution function $F(x)$. Show that
$$E(X) = \int_0^\infty (1 - F(x))\,dx.$$
Hint: Integrate by parts. Illustrate this result by calculating $E(X)$ by this method if $X$ has an exponential distribution $F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$, and $F(x) = 0$ otherwise.

14. Let $X$ be a continuous random variable with density function $f_X(x)$. Show that if
$$\int_{-\infty}^{+\infty} x^2 f_X(x)\,dx < \infty,$$
then
$$\int_{-\infty}^{+\infty} |x| f_X(x)\,dx < \infty.$$
Hint: Except on the interval $[-1, 1]$, the first integrand is greater than the second integrand.

15. Let $X$ be a random variable distributed uniformly over $[0, 20]$. Define a new random variable $Y$ by $Y = \lfloor X \rfloor$ (the greatest integer in $X$). Find the expected value of $Y$. Do the same for $Z = \lfloor X + .5 \rfloor$. Compute $E(|X - Y|)$ and $E(|X - Z|)$. (Note that $Y$ is the value of $X$ rounded off to the nearest smallest integer, while $Z$ is the value of $X$ rounded off to the nearest integer.
Which method of rounding off is better? Why?)

16. Assume that the lifetime of a diesel engine part is a random variable $X$ with density $f_X$. When the part wears out, it is replaced by another with the same density. Let $N(t)$ be the number of parts that are used in time $t$. We want to study the random variable $N(t)/t$. Since parts are replaced on the average every $E(X)$ time units, we expect about $t/E(X)$ parts to be used in time $t$. That is, we expect that
$$\lim_{t \to \infty} \frac{E(N(t))}{t} = \frac{1}{E(X)}.$$
This result is correct but quite difficult to prove. Write a program that will allow you to specify the density $f_X$ and the time $t$, and simulate this experiment to find $N(t)/t$. Have your program repeat the experiment 500 times and plot a bar graph for the random outcomes of $N(t)/t$. From this data, estimate $E(N(t)/t)$ and compare this with $1/E(X)$. In particular, do this for $t = 100$ with the following two densities:
   (a) $f_X = e^{-t}$.
   (b) $f_X = te^{-t}$.

17. Let $X$ and $Y$ be random variables. The covariance $\operatorname{cov}(X, Y)$ is defined by (see Exercise 6.2.23)
$$\operatorname{cov}(X, Y) = E((X - \mu(X))(Y - \mu(Y))).$$
   (a) Show that $\operatorname{cov}(X, Y) = E(XY) - E(X)E(Y)$.
   (b) Using (a), show that $\operatorname{cov}(X, Y) = 0$ if $X$ and $Y$ are independent. (Caution: the converse is not always true.)
   (c) Show that $V(X + Y) = V(X) + V(Y) + 2\operatorname{cov}(X, Y)$.

18. Let $X$ and $Y$ be random variables with positive variance. The correlation of $X$ and $Y$ is defined as
$$\rho(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sqrt{V(X)V(Y)}}.$$
   (a) Using Exercise 17(c), show that
$$0 \le V\!\left( \frac{X}{\sigma(X)} + \frac{Y}{\sigma(Y)} \right) = 2(1 + \rho(X, Y)).$$
   (b) Now show that
$$0 \le V\!\left( \frac{X}{\sigma(X)} - \frac{Y}{\sigma(Y)} \right) = 2(1 - \rho(X, Y)).$$
   (c) Using (a) and (b), show that
$$-1 \le \rho(X, Y) \le 1.$$

19. Let $X$ and $Y$ be independent random variables with uniform densities in $[0, 1]$. Let $Z = X + Y$ and $W = X - Y$. Find
   (a) $\rho(X, Y)$ (see Exercise 18).
   (b) $\rho(X, Z)$.
   (c) $\rho(Y, W)$.
   (d) $\rho(Z, W)$.

*20. When studying certain physiological data, such as heights of fathers and sons, it is often natural to assume that these data (e.g., the heights of the fathers and the heights of the sons) are described by random variables with normal densities. These random variables, however, are not independent but rather are correlated. For example, a two-dimensional standard normal density for correlated random variables has the form
$$f_{X,Y}(x, y) = \frac{1}{2\pi\sqrt{1 - \rho^2}}\, e^{-(x^2 - 2\rho xy + y^2)/2(1 - \rho^2)}.$$
   (a) Show that $X$ and $Y$ each have standard normal densities.
   (b) Show that the correlation of $X$ and $Y$ (see Exercise 18) is $\rho$.

*21. For correlated random variables $X$ and $Y$ it is natural to ask for the expected value for $X$ given $Y$. For example, Galton calculated the expected value of the height of a son given the height of the father. He used this to show that tall men can be expected to have sons who are less tall on the average. Similarly, students who do very well on one exam can be expected to do less well on the next exam, and so forth. This is called regression on the mean. To define this conditional expected value, we first define a conditional density of $X$ given $Y = y$ by
$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)},$$
where $f_{X,Y}(x, y)$ is the joint density of $X$ and $Y$, and $f_Y$ is the density for $Y$. Then the conditional expected value of $X$ given $Y$ is
$$E(X \mid Y = y) = \int_a^b x f_{X|Y}(x|y)\,dx.$$
For the normal density in Exercise 20, show that the conditional density of $f_{X|Y}(x|y)$ is normal with mean $\rho y$ and variance $1 - \rho^2$. From this we see that if $X$ and $Y$ are positively correlated ($0 < \rho < 1$), and if $y > E(Y)$, then the expected value for $X$ given $Y = y$ will be less than $y$ (i.e., we have regression on the mean).

22. A point $Y$ is chosen at random from $[0, 1]$. A second point $X$ is then chosen from the interval $[0, Y]$.
Find the density for $X$. Hint: Calculate $f_{X|Y}$ as in Exercise 21 and then use
$$f_X(x) = \int_x^1 f_{X|Y}(x|y) f_Y(y)\,dy.$$
Can you also derive your result geometrically?

*23. Let $X$ and $V$ be two standard normal random variables. Let $\rho$ be a real number between $-1$ and 1.
   (a) Let $Y = \rho X + \sqrt{1 - \rho^2}\, V$. Show that $E(Y) = 0$ and $\operatorname{Var}(Y) = 1$. We shall see later (see Example 7.5 and Example 10.17) that the sum of two independent normal random variables is again normal. Thus, assuming this fact, we have shown that $Y$ is standard normal.
   (b) Using Exercises 17 and 18, show that the correlation of $X$ and $Y$ is $\rho$.
   (c) In Exercise 20, the joint density function $f_{X,Y}(x, y)$ for the random variable $(X, Y)$ is given. Now suppose that we want to know the set of points $(x, y)$ in the $xy$-plane such that $f_{X,Y}(x, y) = C$ for some constant $C$. This set of points is called a set of constant density. Roughly speaking, a set of constant density is a set of points where the outcomes $(X, Y)$ are equally likely to fall. Show that for a given $C$, the set of points of constant density is a curve whose equation is
$$x^2 - 2\rho xy + y^2 = D,$$
where $D$ is a constant which depends upon $C$. (This curve is an ellipse.)
   (d) One can plot the ellipse in part (c) by using the parametric equations
$$x = \frac{r\cos\theta}{\sqrt{2(1 - \rho)}} + \frac{r\sin\theta}{\sqrt{2(1 + \rho)}}, \qquad y = \frac{r\cos\theta}{\sqrt{2(1 - \rho)}} - \frac{r\sin\theta}{\sqrt{2(1 + \rho)}}.$$
Write a program to plot 1000 pairs $(X, Y)$ for $\rho = -1/2, 0, 1/2$. For each plot, have your program plot the above parametric curves for $r = 1, 2, 3$.

*24. Following Galton, let us assume that the fathers and sons have heights that are dependent normal random variables. Assume that the average height is 68 inches, standard deviation is 2.7 inches, and the correlation coefficient is .5 (see Exercises 20 and 21).
That is, assume that the heights of the fathers and sons have the form $2.7X + 68$ and $2.7Y + 68$, respectively, where $X$ and $Y$ are correlated standardized normal random variables, with correlation coefficient .5.
   (a) What is the expected height for the son of a father whose height is 72 inches?
   (b) Plot a scatter diagram of the heights of 1000 father and son pairs. Hint: You can choose standardized pairs as in Exercise 23 and then plot $(2.7X + 68, 2.7Y + 68)$.

*25. When we have pairs of data $(x_i, y_i)$ that are outcomes of the pairs of dependent random variables $X$, $Y$, we can estimate the correlation coefficient $\rho$ by
$$\bar{r} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_X s_Y},$$
where $\bar{x}$ and $\bar{y}$ are the sample means for $X$ and $Y$, respectively, and $s_X$ and $s_Y$ are the sample standard deviations for $X$ and $Y$ (see Exercise 6.2.17). Write a program to compute the sample means, variances, and correlation for such dependent data. Use your program to compute these quantities for Galton's data on heights of parents and children given in Appendix B. Plot the equal density ellipses as defined in Exercise 23 for $r = 4$, 6, and 8, and on the same graph print the values that appear in the table at the appropriate points. For example, print 12 at the point $(70.5, 68.2)$, indicating that there were 12 cases where the parent's height was 70.5 and the child's was 68.2. See if Galton's data is consistent with the equal density ellipses.

26. (from Hamming$^{25}$) Suppose you are standing on the bank of a straight river.
   (a) Choose, at random, a direction which will keep you on dry land, and walk 1 km in that direction. Let $P$ denote your position. What is the expected distance from $P$ to the river?
   (b) Now suppose you proceed as in part (a), but when you get to $P$, you pick a random direction (from among all directions) and walk 1 km.
What is the probability that you will reach the river before the second walk is completed?

27. (from Hamming$^{26}$) A game is played as follows: A random number $X$ is chosen uniformly from $[0, 1]$. Then a sequence $Y_1, Y_2, \ldots$ of random numbers is chosen independently and uniformly from $[0, 1]$. The game ends the first time that $Y_i > X$. You are then paid $(i - 1)$ dollars. What is a fair entrance fee for this game?

28. A long needle of length $L$ much bigger than 1 is dropped on a grid with horizontal and vertical lines one unit apart. Show that the average number $a$ of lines crossed is approximately
$$a = \frac{4L}{\pi}.$$

$^{25}$ R. W. Hamming, The Art of Probability for Scientists and Engineers (Redwood City: Addison-Wesley, 1991), p. 192.
$^{26}$ ibid., pg. 205.

Chapter 7

Sums of Independent Random Variables

7.1 Sums of Discrete Random Variables

In this chapter we turn to the important question of determining the distribution of a sum of independent random variables in terms of the distributions of the individual constituents. In this section we consider only sums of discrete random variables, reserving the case of continuous random variables for the next section.

We consider here only random variables whose values are integers. Their distribution functions are then defined on these integers. We shall find it convenient to assume here that these distribution functions are defined for all integers, by defining them to be 0 where they are not otherwise defined.

Convolutions

Suppose $X$ and $Y$ are two independent discrete random variables with distribution functions $m_1(x)$ and $m_2(x)$. Let $Z = X + Y$. We would like to determine the distribution function $m_3(x)$ of $Z$. To do this, it is enough to determine the probability that $Z$ takes on the value $z$, where $z$ is an arbitrary integer. Suppose that $X = k$, where $k$ is some integer.
Then $Z = z$ if and only if $Y = z - k$. So the event $Z = z$ is the union of the pairwise disjoint events
$$(X = k) \text{ and } (Y = z - k),$$
where $k$ runs over the integers. Since these events are pairwise disjoint, we have
$$P(Z = z) = \sum_{k = -\infty}^{\infty} P(X = k) \cdot P(Y = z - k).$$
Thus, we have found the distribution function of the random variable $Z$. This leads to the following definition.

Definition 7.1 Let $X$ and $Y$ be two independent integer-valued random variables, with distribution functions $m_1(x)$ and $m_2(x)$ respectively. Then the convolution of $m_1(x)$ and $m_2(x)$ is the distribution function $m_3 = m_1 * m_2$ given by
$$m_3(j) = \sum_k m_1(k) \cdot m_2(j - k),$$
for $j = \ldots, -2, -1, 0, 1, 2, \ldots$. The function $m_3(x)$ is the distribution function of the random variable $Z = X + Y$.

It is easy to see that the convolution operation is commutative, and it is straightforward to show that it is also associative.

Now let $S_n = X_1 + X_2 + \cdots + X_n$ be the sum of $n$ independent random variables of an independent trials process with common distribution function $m$ defined on the integers. Then the distribution function of $S_1$ is $m$. We can write
$$S_n = S_{n-1} + X_n.$$
Thus, since we know the distribution function of $X_n$ is $m$, we can find the distribution function of $S_n$ by induction.

Example 7.1 A die is rolled twice. Let $X_1$ and $X_2$ be the outcomes, and let $S_2 = X_1 + X_2$ be the sum of these outcomes. Then $X_1$ and $X_2$ have the common distribution function:
$$m = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \end{pmatrix}.$$
The distribution function of $S_2$ is then the convolution of this distribution with itself.
Thus,
$$\begin{aligned}
P(S_2 = 2) &= m(1)m(1) = \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36}, \\
P(S_2 = 3) &= m(1)m(2) + m(2)m(1) = \frac{1}{6} \cdot \frac{1}{6} + \frac{1}{6} \cdot \frac{1}{6} = \frac{2}{36}, \\
P(S_2 = 4) &= m(1)m(3) + m(2)m(2) + m(3)m(1) = \frac{1}{6} \cdot \frac{1}{6} + \frac{1}{6} \cdot \frac{1}{6} + \frac{1}{6} \cdot \frac{1}{6} = \frac{3}{36}.
\end{aligned}$$
Continuing in this way we would find $P(S_2 = 5) = 4/36$, $P(S_2 = 6) = 5/36$, $P(S_2 = 7) = 6/36$, $P(S_2 = 8) = 5/36$, $P(S_2 = 9) = 4/36$, $P(S_2 = 10) = 3/36$, $P(S_2 = 11) = 2/36$, and $P(S_2 = 12) = 1/36$.

The distribution for $S_3$ would then be the convolution of the distribution for $S_2$ with the distribution for $X_3$. Thus
$$\begin{aligned}
P(S_3 = 3) &= P(S_2 = 2)P(X_3 = 1) = \frac{1}{36} \cdot \frac{1}{6} = \frac{1}{216}, \\
P(S_3 = 4) &= P(S_2 = 3)P(X_3 = 1) + P(S_2 = 2)P(X_3 = 2) = \frac{2}{36} \cdot \frac{1}{6} + \frac{1}{36} \cdot \frac{1}{6} = \frac{3}{216},
\end{aligned}$$
and so forth.

This is clearly a tedious job, and a program should be written to carry out this calculation. To do this we first write a program to form the convolution of two densities $p$ and $q$ and return the density $r$. We can then write a program to find the density for the sum $S_n$ of $n$ independent random variables with a common density $p$, at least in the case that the random variables have a finite number of possible values.

Running this program for the example of rolling a die $n$ times for $n = 10, 20, 30$ results in the distributions shown in Figure 7.1. We see that, as in the case of Bernoulli trials, the distributions become bell-shaped. We shall discuss in Chapter 9 a very general theorem called the Central Limit Theorem that will explain this phenomenon.

Example 7.2 A well-known method for evaluating a bridge hand is: an ace is assigned a value of 4, a king 3, a queen 2, and a jack 1. All other cards are assigned a value of 0. The point count of the hand is then the sum of the values of the cards in the hand.
(It is actually more complicated than this, taking into account voids in suits, and so forth, but we consider here this simplified form of the point count.) If a card is dealt at random to a player, then the point count for this card has distribution
$$p_X = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 \\ 36/52 & 4/52 & 4/52 & 4/52 & 4/52 \end{pmatrix}.$$
Let us regard the total hand of 13 cards as 13 independent trials with this common distribution. (Again this is not quite correct because we assume here that we are always choosing a card from a full deck.) Then the distribution for the point count $C$ for the hand can be found from the program NFoldConvolution by using the distribution for a single card and choosing $n = 13$. A player with a point count of 13 or more is said to have an opening bid. The probability of having an opening bid is then
$$P(C \ge 13).$$
Since we have the distribution of $C$, it is easy to compute this probability. Doing this we find that
$$P(C \ge 13) = .2845,$$
so that about one in four hands should be an opening bid according to this simplified model. A more realistic discussion of this problem can be found in Epstein, The Theory of Gambling and Statistical Logic.$^{1}$

$^{1}$ R. A. Epstein, The Theory of Gambling and Statistical Logic, rev. ed. (New York: Academic Press, 1977).

Figure 7.1: Density of $S_n$ for rolling a die $n$ times (panels for $n = 10$, $n = 20$, and $n = 30$).

For certain special distributions it is possible to find an expression for the distribution that results from convoluting the distribution with itself $n$ times.
The convolution of two binomial distributions, one with parameters $m$ and $p$ and the other with parameters $n$ and $p$, is a binomial distribution with parameters $(m + n)$ and $p$. This fact follows easily from a consideration of the experiment which consists of first tossing a coin $m$ times, and then tossing it $n$ more times.

The convolution of $k$ geometric distributions with common parameter $p$ is a negative binomial distribution with parameters $p$ and $k$. This can be seen by considering the experiment which consists of tossing a coin until the $k$th head appears.

Exercises

1. A die is rolled three times. Find the probability that the sum of the outcomes is
   (a) greater than 9.
   (b) an odd number.

2. The price of a stock on a given trading day changes according to the distribution
$$p_X = \begin{pmatrix} -1 & 0 & 1 & 2 \\ 1/4 & 1/2 & 1/8 & 1/8 \end{pmatrix}.$$
Find the distribution for the change in stock price after two (independent) trading days.

3. Let $X_1$ and $X_2$ be independent random variables with common distribution
$$p_X = \begin{pmatrix} 0 & 1 & 2 \\ 1/8 & 3/8 & 1/2 \end{pmatrix}.$$
Find the distribution of the sum $X_1 + X_2$.

4. In one play of a certain game you win an amount $X$ with distribution
$$p_X = \begin{pmatrix} 1 & 2 & 3 \\ 1/4 & 1/4 & 1/2 \end{pmatrix}.$$
Using the program NFoldConvolution, find the distribution for your total winnings after ten (independent) plays. Plot this distribution.

5. Consider the following two experiments: the first has outcome $X$ taking on the values 0, 1, and 2 with equal probabilities; the second results in an (independent) outcome $Y$ taking on the value 3 with probability 1/4 and 4 with probability 3/4. Find the distribution of
   (a) $Y + X$.
   (b) $Y - X$.

6. People arrive at a queue according to the following scheme: During each minute of time either 0 or 1 person arrives. The probability that 1 person arrives is $p$ and that no person arrives is $q = 1 - p$. Let $C_r$ be the number of customers arriving in the first $r$ minutes.
Consider a Bernoulli trials process with a success if a person arrives in a unit time and failure if no person arrives in a unit time. Let $T_r$ be the number of failures before the $r$th success.
(a) What is the distribution for $T_r$?
(b) What is the distribution for $C_r$?
(c) Find the mean and variance for the number of customers arriving in the first $r$ minutes.

7 (a) A die is rolled three times with outcomes $X_1$, $X_2$, and $X_3$. Let $Y_3$ be the maximum of the values obtained. Show that
\[ P(Y_3 \le j) = P(X_1 \le j)^3. \]
Use this to find the distribution of $Y_3$. Does $Y_3$ have a bell-shaped distribution?
(b) Now let $Y_n$ be the maximum value when $n$ dice are rolled. Find the distribution of $Y_n$. Is this distribution bell-shaped for large values of $n$?

8 A baseball player is to play in the World Series. Based upon his season play, you estimate that if he comes to bat four times in a game the number of hits he will get has a distribution
\[ p_X = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 \\ .4 & .2 & .2 & .1 & .1 \end{pmatrix}. \]
Assume that the player comes to bat four times in each game of the series.
(a) Let $X$ denote the number of hits that he gets in a series. Using the program NFoldConvolution, find the distribution of $X$ for each of the possible series lengths: four-game, five-game, six-game, seven-game.
(b) Using one of the distributions found in part (a), find the probability that his batting average exceeds .400 in a four-game series. (The batting average is the number of hits divided by the number of times at bat.)
(c) Given the distribution $p_X$, what is his long-term batting average?

9 Prove that you cannot load two dice in such a way that the probabilities for any sum from 2 to 12 are the same. (Be sure to consider the case where one or more sides turn up with probability zero.)

10 (Lévy²) Assume that $n$ is an integer, not prime. Show that you can find two distributions $a$ and $b$ on the nonnegative integers such that the convolution of $a$ and $b$ is the equiprobable distribution on the set $0, 1, 2, \ldots, n - 1$. If $n$ is prime this is not possible, but the proof is not so easy. (Assume that neither $a$ nor $b$ is concentrated at 0.)

11 Assume that you are playing craps with dice that are loaded in the following way: faces two, three, four, and five all come up with the same probability $(1/6) + r$. Faces one and six come up with probability $(1/6) - 2r$, with $0 < r < .02$. Write a computer program to find the probability of winning at craps with these dice, and using your program find which values of $r$ make craps a favorable game for the player with these dice.

² See M. Krasner and B. Ranulac, "Sur une Propriété des Polynomes de la Division du Cercle"; and the following note by J. Hadamard, in C. R. Acad. Sci., vol. 204 (1937), pp. 397–399.

7.2 Sums of Continuous Random Variables

In this section we consider the continuous version of the problem posed in the previous section: How are sums of independent random variables distributed?

Convolutions

Definition 7.2 Let $X$ and $Y$ be two continuous random variables with density functions $f(x)$ and $g(y)$, respectively. Assume that both $f(x)$ and $g(y)$ are defined for all real numbers. Then the convolution $f * g$ of $f$ and $g$ is the function given by
\[ (f * g)(z) = \int_{-\infty}^{+\infty} f(z - y)\, g(y)\, dy = \int_{-\infty}^{+\infty} g(z - x)\, f(x)\, dx. \]
□

This definition is analogous to the definition, given in Section 7.1, of the convolution of two distribution functions. Thus it should not be surprising that if $X$ and $Y$ are independent, then the density of their sum is the convolution of their densities. This fact is stated as a theorem below, and its proof is left as an exercise (see Exercise 1).
Theorem 7.1 Let $X$ and $Y$ be two independent random variables with density functions $f_X(x)$ and $f_Y(y)$ defined for all $x$. Then the sum $Z = X + Y$ is a random variable with density function $f_Z(z)$, where $f_Z$ is the convolution of $f_X$ and $f_Y$. □

To get a better understanding of this important result, we will look at some examples.

Sum of Two Independent Uniform Random Variables

Example 7.3 Suppose we choose independently two numbers at random from the interval $[0, 1]$ with uniform probability density. What is the density of their sum?

Let $X$ and $Y$ be random variables describing our choices and $Z = X + Y$ their sum. Then we have
\[ f_X(x) = f_Y(x) = \begin{cases} 1, & \text{if } 0 \le x \le 1, \\ 0, & \text{otherwise;} \end{cases} \]
and the density function for the sum is given by
\[ f_Z(z) = \int_{-\infty}^{+\infty} f_X(z - y)\, f_Y(y)\, dy. \]
Since $f_Y(y) = 1$ if $0 \le y \le 1$ and 0 otherwise, this becomes
\[ f_Z(z) = \int_0^1 f_X(z - y)\, dy. \]
Now the integrand is 0 unless $0 \le z - y \le 1$ (i.e., unless $z - 1 \le y \le z$) and then it is 1. So if $0 \le z \le 1$, we have
\[ f_Z(z) = \int_0^z dy = z, \]
while if $1 < z \le 2$, we have
\[ f_Z(z) = \int_{z-1}^1 dy = 2 - z, \]
and if $z < 0$ or $z > 2$ we have $f_Z(z) = 0$ (see Figure 7.2). Hence,
\[ f_Z(z) = \begin{cases} z, & \text{if } 0 \le z \le 1, \\ 2 - z, & \text{if } 1 < z \le 2, \\ 0, & \text{otherwise.} \end{cases} \]
Note that this result agrees with that of Example 2.4. □

Figure 7.2: Convolution of two uniform densities.

Figure 7.3: Convolution of two exponential densities with $\lambda = 1$.

Sum of Two Independent Exponential Random Variables

Example 7.4 Suppose we choose two numbers at random from the interval $[0, \infty)$ with an exponential density with parameter $\lambda$. What is the density of their sum?

Let $X$, $Y$, and $Z = X + Y$ denote the relevant random variables, and $f_X$, $f_Y$, and $f_Z$ their densities. Then
\[ f_X(x) = f_Y(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x \ge 0, \\ 0, & \text{otherwise;} \end{cases} \]
and so, if $z > 0$,
\[ f_Z(z) = \int_{-\infty}^{+\infty} f_X(z - y)\, f_Y(y)\, dy = \int_0^z \lambda e^{-\lambda(z - y)}\, \lambda e^{-\lambda y}\, dy = \int_0^z \lambda^2 e^{-\lambda z}\, dy = \lambda^2 z e^{-\lambda z}, \]
while if $z < 0$, $f_Z(z) = 0$ (see Figure 7.3). Hence,
\[ f_Z(z) = \begin{cases} \lambda^2 z e^{-\lambda z}, & \text{if } z \ge 0, \\ 0, & \text{otherwise.} \end{cases} \]
□

Sum of Two Independent Normal Random Variables

Example 7.5 It is an interesting and important fact that the convolution of two normal densities with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$ is again a normal density with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$. We will show this in the special case that both random variables are standard normal. The general case can be done in the same way, but the calculation is messier. Another way to show the general result is given in Example 10.17.

Suppose $X$ and $Y$ are two independent random variables, each with the standard normal density (see Example 5.8). We have
\[ f_X(x) = f_Y(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \]
and so
\[ f_Z(z) = (f_X * f_Y)(z) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-(z - y)^2/2}\, e^{-y^2/2}\, dy = \frac{1}{2\pi}\, e^{-z^2/4} \int_{-\infty}^{+\infty} e^{-(y - z/2)^2}\, dy = \frac{1}{2\pi}\, e^{-z^2/4}\, \sqrt{\pi} \left[ \frac{1}{\sqrt{\pi}} \int_{-\infty}^{+\infty} e^{-(y - z/2)^2}\, dy \right]. \]
The second equality uses the identity $(z - y)^2/2 + y^2/2 = (y - z/2)^2 + z^2/4$. The expression in the brackets equals 1, since it is the integral of the normal density function with mean $\mu = z/2$ and standard deviation $\sigma = 1/\sqrt{2}$. So, we have
\[ f_Z(z) = \frac{1}{\sqrt{4\pi}}\, e^{-z^2/4}. \]
□

Sum of Two Independent Cauchy Random Variables

Example 7.6 Choose two numbers at random from the interval $(-\infty, +\infty)$ with the Cauchy density with parameter $a = 1$ (see Example 5.10). Then
\[ f_X(x) = f_Y(x) = \frac{1}{\pi(1 + x^2)}, \]
and $Z = X + Y$ has density
\[ f_Z(z) = \frac{1}{\pi^2} \int_{-\infty}^{+\infty} \frac{1}{1 + (z - y)^2}\, \frac{1}{1 + y^2}\, dy. \]
This integral requires some effort, and we give here only the result (see Section 10.3, or Dwass³):
\[ f_Z(z) = \frac{2}{\pi(4 + z^2)}. \]
Now, suppose that we ask for the density function of the average
\[ A = (1/2)(X + Y) \]
of $X$ and $Y$. Then $A = (1/2)Z$. Exercise 5.2.19 shows that if $U$ and $V$ are two continuous random variables with density functions $f_U(x)$ and $f_V(x)$, respectively, and if $V = aU$ with $a > 0$, then
\[ f_V(x) = \frac{1}{a}\, f_U\!\left(\frac{x}{a}\right). \]
Thus, we have
\[ f_A(z) = 2 f_Z(2z) = \frac{1}{\pi(1 + z^2)}. \]
Hence, the average of two random variables, each having a Cauchy density, is again a random variable with a Cauchy density; this remarkable property is a peculiarity of the Cauchy density. One consequence of this is that if the error in a certain measurement process had a Cauchy density and you averaged a number of measurements, the average could not be expected to be any more accurate than any one of your individual measurements! □

³ M. Dwass, "On the Convolution of Cauchy Distributions," American Mathematical Monthly, vol. 92, no. 1 (1985), pp. 55–57; see also R. Nelson, letters to the Editor, ibid., p. 679.

Rayleigh Density

Example 7.7 Suppose $X$ and $Y$ are two independent standard normal random variables. Now suppose we locate a point $P$ in the $xy$-plane with coordinates $(X, Y)$ and ask: What is the density of the square of the distance of $P$ from the origin? (We have already simulated this problem in Example 5.9.) Here, with the preceding notation, we have
\[ f_X(x) = f_Y(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}. \]
Moreover, if $X^2$ denotes the square of $X$, then (see Theorem 5.1 and the discussion following)
\[ f_{X^2}(r) = \begin{cases} \dfrac{1}{2\sqrt{r}} \left( f_X(\sqrt{r}) + f_X(-\sqrt{r}) \right), & \text{if } r > 0, \\ 0, & \text{otherwise} \end{cases} \;=\; \begin{cases} \dfrac{1}{\sqrt{2\pi r}}\, e^{-r/2}, & \text{if } r > 0, \\ 0, & \text{otherwise.} \end{cases} \]
This is a gamma density with $\lambda = 1/2$, $\beta = 1/2$ (see Example 7.4).
Now let $R^2 = X^2 + Y^2$. Then
\[ f_{R^2}(r) = \int_{-\infty}^{+\infty} f_{X^2}(r - s)\, f_{Y^2}(s)\, ds = \frac{1}{4\pi} \int_0^r e^{-(r - s)/2} \left( \frac{r - s}{2} \right)^{-1/2} e^{-s/2} \left( \frac{s}{2} \right)^{-1/2} ds = \begin{cases} \dfrac{1}{2}\, e^{-r/2}, & \text{if } r \ge 0, \\ 0, & \text{otherwise.} \end{cases} \]
Hence, $R^2$ has a gamma density with $\lambda = 1/2$, $\beta = 1$. We can interpret this result as giving the density for the square of the distance of $P$ from the center of a target if its coordinates are normally distributed.

The density of the random variable $R$ is obtained from that of $R^2$ in the usual way (see Theorem 5.1), and we find
\[ f_R(r) = \begin{cases} \dfrac{1}{2}\, e^{-r^2/2} \cdot 2r = r e^{-r^2/2}, & \text{if } r \ge 0, \\ 0, & \text{otherwise.} \end{cases} \]
Physicists will recognize this as a Rayleigh density. Our result here agrees with our simulation in Example 5.9. □

Chi-Squared Density

More generally, the same method shows that the sum of the squares of $n$ independent normally distributed random variables with mean 0 and standard deviation 1 has a gamma density with $\lambda = 1/2$ and $\beta = n/2$. Such a density is called a chi-squared density with $n$ degrees of freedom. This density was introduced in Chapter 4.3. In Example 5.10, we used this density to test the hypothesis that two traits were independent.

Another important use of the chi-squared density is in comparing experimental data with a theoretical discrete distribution, to see whether the data supports the theoretical model. More specifically, suppose that we have an experiment with a finite set of outcomes. If the set of outcomes is countably infinite, we group them into finitely many sets of outcomes. We propose a theoretical distribution which we think will model the experiment well. We obtain some data by repeating the experiment a number of times. Now we wish to check how well the theoretical distribution fits the data.
Let $X$ be the random variable which represents a theoretical outcome in the model of the experiment, and let $m(x)$ be the distribution function of $X$. In a manner similar to what was done in Example 5.10, we calculate the value of the expression
\[ V = \sum_x \frac{(o_x - n \cdot m(x))^2}{n \cdot m(x)}, \]
where the sum runs over all possible outcomes $x$, $n$ is the number of data points, and $o_x$ denotes the number of outcomes of type $x$ observed in the data. Then for moderate or large values of $n$, the quantity $V$ is approximately chi-squared distributed, with $\nu - 1$ degrees of freedom, where $\nu$ represents the number of possible outcomes. The proof of this is beyond the scope of this book, but we will illustrate the reasonableness of this statement in the next example. If the value of $V$ is very large, when compared with the appropriate chi-squared density function, then we would tend to reject the hypothesis that the model is an appropriate one for the experiment at hand. We now give an example of this procedure.

Example 7.8 Suppose we are given a single die. We wish to test the hypothesis that the die is fair. Thus, our theoretical distribution is the uniform distribution on the integers between 1 and 6. So, if we roll the die $n$ times, the expected number of data points of each type is $n/6$. Thus, if $o_i$ denotes the actual number of data points of type $i$, for $1 \le i \le 6$, then the expression
\[ V = \sum_{i=1}^{6} \frac{(o_i - n/6)^2}{n/6} \]
is approximately chi-squared distributed with 5 degrees of freedom.

Now suppose that we actually roll the die 60 times and obtain the data in Table 7.1.

Outcome    Observed Frequency
1          15
2          8
3          7
4          5
5          7
6          18

Table 7.1: Observed data.

If we calculate $V$ for this data, we obtain the value 13.6. The graph of the chi-squared density with 5 degrees of freedom is shown in Figure 7.4.
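The value of $V$ for Table 7.1 can be checked with a few lines of code. The Python sketch below is illustrative only: the function names are assumptions, and the tail probability is obtained by a simple midpoint-rule integration of the chi-squared density (the gamma density with $\lambda = 1/2$ and $\beta = 5/2$) rather than by a statistical library. It recomputes $V$ from the observed counts and estimates how often a chi-squared variable with 5 degrees of freedom exceeds that value.

```python
import math


def chi_squared_statistic(observed, expected):
    """V = sum over outcomes of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_squared_density(x, dof):
    """Gamma density with lambda = 1/2 and beta = dof/2."""
    if x <= 0:
        return 0.0
    beta = dof / 2
    return 0.5 ** beta * x ** (beta - 1) * math.exp(-x / 2) / math.gamma(beta)


def upper_tail(v, dof, upper=100.0, steps=20_000):
    """Midpoint-rule estimate of P(V > v); mass beyond `upper` is ignored."""
    h = (upper - v) / steps
    return h * sum(chi_squared_density(v + (i + 0.5) * h, dof) for i in range(steps))


observed = [15, 8, 7, 5, 7, 18]      # Table 7.1
n = sum(observed)                    # 60 rolls
expected = [n / 6] * 6               # fair-die hypothesis

V = chi_squared_statistic(observed, expected)
tail = upper_tail(V, dof=5)
print(V, tail)
```

A tail probability below .05 corresponds to the rejection rule described in the text.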
One sees that values as large as 13.6 are rarely taken on by $V$ if the die is fair, so we would reject the hypothesis that the die is fair. (When using this test, a statistician will reject the hypothesis if the data gives a value of $V$ which is larger than 95% of the values one would expect to obtain if the hypothesis is true.)

In Figure 7.5, we show the results of rolling a die 60 times, then calculating $V$, and then repeating this experiment 1000 times. The program that performs these calculations is called DieTest. We have superimposed the chi-squared density with 5 degrees of freedom; one can see that the data values fit the curve fairly well, which supports the statement that the chi-squared density is the correct one to use. □

So far we have looked at several important special cases for which the convolution integral can be evaluated explicitly. In general, the convolution of two continuous densities cannot be evaluated explicitly, and we must resort to numerical methods. Fortunately, these prove to be remarkably effective, at least for bounded densities.

Figure 7.4: Chi-squared density with 5 degrees of freedom.

Figure 7.5: Rolling a fair die (1000 experiments, 60 rolls per experiment).

Figure 7.6: Convolution of $n$ uniform densities ($n = 2, 4, 6, 8, 10$).

Independent Trials

We now consider briefly the distribution of the sum of $n$ independent random variables, all having the same density function. If $X_1, X_2, \ldots, X_n$ are these random variables and $S_n = X_1 + X_2 + \cdots + X_n$ is their sum, then we will have
\[ f_{S_n}(x) = (f_{X_1} * f_{X_2} * \cdots * f_{X_n})(x), \]
where the right-hand side is an $n$-fold convolution.
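When no closed form is available, the $n$-fold convolution can be approximated on a grid. The following Python sketch makes several assumptions that are not in the text: the density is supported on a bounded interval starting at 0, it is sampled at the midpoints of equal cells of width $h$, and the convolution integral is replaced by a Riemann sum. The function names are illustrative. For two uniform densities on $[0, 1]$ it reproduces the triangular density of Example 7.3.

```python
def sample_density(f, a, b, cells):
    """Sample f at the midpoints of `cells` equal cells of [a, b]."""
    h = (b - a) / cells
    return [f(a + (i + 0.5) * h) for i in range(cells)], h


def convolve(u, v, h):
    """Riemann-sum approximation of the convolution of two sampled densities."""
    out = [0.0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj * h
    return out


def n_fold(u, h, n):
    """Approximate n-fold convolution of a sampled density with itself."""
    result = u
    for _ in range(n - 1):
        result = convolve(result, u, h)
    return result


def uniform(x):
    """Uniform density on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


u, h = sample_density(uniform, 0.0, 1.0, cells=100)

# Entry k of f2 approximates the density of X + Y at z = (k + 1) * h.
f2 = n_fold(u, h, 2)
for k in (49, 99, 149):
    print(round((k + 1) * h, 2), round(f2[k], 6))

# Four-fold convolution; its total mass should still be close to 1.
f4 = n_fold(u, h, 4)
print(round(sum(f4) * h, 6))
```

Midpoint sampling is used here because, for the uniform density, it makes the two-fold result agree with $z$ and $2 - z$ at the grid points; for other densities the result is only an approximation whose quality depends on $h$.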
It is possible to calculate this density for general values of $n$ in certain simple cases.

Example 7.9 Suppose the $X_i$ are uniformly distributed on the interval $[0, 1]$. Then
\[ f_{X_i}(x) = \begin{cases} 1, & \text{if } 0 \le x \le 1, \\ 0, & \text{otherwise,} \end{cases} \]
and $f_{S_n}(x)$ is given by the formula⁴
\[ f_{S_n}(x) = \begin{cases} \dfrac{1}{(n - 1)!} \displaystyle\sum_{0 \le j \le x} (-1)^j \binom{n}{j} (x - j)^{n - 1}, & \text{if } 0 < x < n, \\ 0, & \text{otherwise.} \end{cases} \]
The density $f_{S_n}(x)$ for $n = 2, 4, 6, 8, 10$ is shown in Figure 7.6.

If the $X_i$ are distributed normally, with mean 0 and variance 1, then (cf. Example 7.5)
\[ f_{X_i}(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \]
and
\[ f_{S_n}(x) = \frac{1}{\sqrt{2\pi n}}\, e^{-x^2/2n}. \]
Here the density $f_{S_n}$ for $n = 5, 10, 15, 20, 25$ is shown in Figure 7.7.

If the $X_i$ are all exponentially distributed, with mean $1/\lambda$, then
\[ f_{X_i}(x) = \lambda e^{-\lambda x}, \]
and
\[ f_{S_n}(x) = \frac{\lambda e^{-\lambda x} (\lambda x)^{n - 1}}{(n - 1)!}. \]
In this case the density $f_{S_n}$ for $n = 2, 4, 6, 8, 10$ is shown in Figure 7.8. □

⁴ J. B. Uspensky, Introduction to Mathematical Probability (New York: McGraw-Hill, 1937), p. 277.

Figure 7.7: Convolution of $n$ standard normal densities ($n = 5, 10, 15, 20, 25$).

Exercises

1 Let $X$ and $Y$ be independent real-valued random variables with density functions $f_X(x)$ and $f_Y(y)$, respectively. Show that the density function of the sum $X + Y$ is the convolution of the functions $f_X(x)$ and $f_Y(y)$. Hint: Let $\bar{X}$ be the joint random variable $(X, Y)$. Then the joint density function of $\bar{X}$ is $f_X(x) f_Y(y)$, since $X$ and $Y$ are independent. Now compute the probability that $X + Y \le z$ by integrating the joint density function over the appropriate region in the plane.
This gives the cumulative distribution function of $Z$. Now differentiate this function with respect to $z$ to obtain the density function of $Z$.

Figure 7.8: Convolution of $n$ exponential densities with $\lambda = 1$ ($n = 2, 4, 6, 8, 10$).

2 Let $X$ and $Y$ be independent random variables defined on the space $\Omega$, with density functions $f_X$ and $f_Y$, respectively. Suppose that $Z = X + Y$. Find the density $f_Z$ of $Z$ if
(a) \[ f_X(x) = f_Y(x) = \begin{cases} 1/2, & \text{if } -1 \le x \le +1, \\ 0, & \text{otherwise.} \end{cases} \]
(b) \[ f_X(x) = f_Y(x) = \begin{cases} 1/2, & \text{if } 3 \le x \le 5, \\ 0, & \text{otherwise.} \end{cases} \]
(c) \[ f_X(x) = \begin{cases} 1/2, & \text{if } -1 \le x \le 1, \\ 0, & \text{otherwise,} \end{cases} \qquad f_Y(x) = \begin{cases} 1/2, & \text{if } 3 \le x \le 5, \\ 0, & \text{otherwise.} \end{cases} \]
(d) What can you say about the set $E = \{ z : f_Z(z) > 0 \}$ in each case?

3 Suppose again that $Z = X + Y$. Find $f_Z$ if
(a) \[ f_X(x) = f_Y(x) = \begin{cases} x/2, & \text{if } 0 < x < 2, \\ 0, & \text{otherwise.} \end{cases} \]
(b) \[ f_X(x) = f_Y(x) = \begin{cases} (1/2)(x - 3), & \text{if } 3 < x < 5, \\ 0, & \text{otherwise.} \end{cases} \]
(c) \[ f_X(x) = \begin{cases} 1/2, & \text{if } 0 < x < 2, \\ 0, & \text{otherwise,} \end{cases} \qquad f_Y(x) = \begin{cases} x/2, & \text{if } 0 < x < 2, \\ 0, & \text{otherwise.} \end{cases} \]
(d) What can you say about the set $E = \{ z : f_Z(z) > 0 \}$ in each case?

4 Let $X$, $Y$, and $Z$ be independent random variables with
\[ f_X(x) = f_Y(x) = f_Z(x) = \begin{cases} 1, & \text{if } 0 < x < 1, \\ 0, & \text{otherwise.} \end{cases} \]
Suppose that $W = X + Y + Z$. Find $f_W$ directly, and compare your answer with that given by the formula in Example 7.9. Hint: See Example 7.3.

5 Suppose that $X$ and $Y$ are independent and $Z = X + Y$. Find $f_Z$ if
(a) \[ f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x > 0, \\ 0, & \text{otherwise,} \end{cases} \qquad f_Y(x) = \begin{cases} \mu e^{-\mu x}, & \text{if } x > 0, \\ 0, & \text{otherwise.} \end{cases} \]
(b) \[ f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x > 0, \\ 0, & \text{otherwise,} \end{cases} \qquad f_Y(x) = \begin{cases} 1, & \text{if } 0 < x < 1, \\ 0, & \text{otherwise.} \end{cases} \]
6 Suppose again that $Z = X + Y$. Find $f_Z$ if
\[ f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-(x - \mu_1)^2 / 2\sigma_1^2}, \qquad f_Y(x) = \frac{1}{\sqrt{2\pi}\,\sigma_2}\, e^{-(x - \mu_2)^2 / 2\sigma_2^2}. \]

*7 Suppose that $R^2 = X^2 + Y^2$. Find $f_{R^2}$ and $f_R$ if
\[ f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-(x - \mu_1)^2 / 2\sigma_1^2}, \qquad f_Y(x) = \frac{1}{\sqrt{2\pi}\,\sigma_2}\, e^{-(x - \mu_2)^2 / 2\sigma_2^2}. \]

8 Suppose that $R^2 = X^2 + Y^2$. Find $f_{R^2}$ and $f_R$ if
\[ f_X(x) = f_Y(x) = \begin{cases} 1/2, & \text{if } -1 \le x \le 1, \\ 0, & \text{otherwise.} \end{cases} \]

9 Assume that the service time for a customer at a bank is exponentially distributed with mean service time 2 minutes. Let $X$ be the total service time for 10 customers. Estimate the probability that $X > 22$ minutes.

10 Let $X_1, X_2, \ldots, X_n$ be $n$ independent random variables each of which has an exponential density with mean $\mu$. Let $M$ be the minimum value of the $X_j$. Show that the density for $M$ is exponential with mean $\mu/n$. Hint: Use cumulative distribution functions.

11 A company buys 100 lightbulbs, each of which has an exponential lifetime with mean 1000 hours. What is the expected time for the first of these bulbs to burn out? (See Exercise 10.)

12 An insurance company assumes that the time between claims from each of its homeowners' policies is exponentially distributed with mean $\mu$. It would like to estimate $\mu$ by averaging the times for a number of policies, but this is not very practical since the time between claims is about 30 years. At Galambos'⁵ suggestion the company puts its customers in groups of 50 and observes the time of the first claim within each group. Show that this provides a practical way to estimate the value of $\mu$.

13 Particles are subject to collisions that cause them to split into two parts with each part a fraction of the parent. Suppose that this fraction is uniformly distributed between 0 and 1.
Following a single particle through several splittings we obtain a fraction of the original particle $Z_n = X_1 \cdot X_2 \cdot \ldots \cdot X_n$ where each $X_j$ is uniformly distributed between 0 and 1. Show that the density for the random variable $Z_n$ is
\[ f_n(z) = \frac{1}{(n - 1)!} (-\log z)^{n - 1}. \]
Hint: Show that $Y_k = -\log X_k$ is exponentially distributed. Use this to find the density function for $S_n = Y_1 + Y_2 + \cdots + Y_n$, and from this the cumulative distribution and density of $Z_n = e^{-S_n}$.

14 Assume that $X_1$ and $X_2$ are independent random variables, each having an exponential density with parameter $\lambda$. Show that $Z = X_1 - X_2$ has density
\[ f_Z(z) = (1/2)\, \lambda e^{-\lambda |z|}. \]

15 Suppose we want to test a coin for fairness. We flip the coin $n$ times and record the number of times $X_0$ that the coin turns up tails and the number of times $X_1 = n - X_0$ that the coin turns up heads. Now we set
\[ Z = \sum_{i=0}^{1} \frac{(X_i - n/2)^2}{n/2}. \]
Then for a fair coin $Z$ has approximately a chi-squared distribution with $2 - 1 = 1$ degree of freedom. Verify this by computer simulation first for a fair coin ($p = 1/2$) and then for a biased coin ($p = 1/3$).

⁵ J. Galambos, Introductory Probability Theory (New York: Marcel Dekker, 1984), p. 159.

16 Verify your answers in Exercise 2(a) by computer simulation: Choose $X$ and $Y$ from $[-1, 1]$ with uniform density and calculate $Z = X + Y$. Repeat this experiment 500 times, recording the outcomes in a bar graph on $[-2, 2]$ with 40 bars. Does the density $f_Z$ calculated in Exercise 2(a) describe the shape of your bar graph? Try this for Exercises 2(b) and 2(c), too.

17 Verify your answers to Exercise 3 by computer simulation.

18 Verify your answer to Exercise 4 by computer simulation.
19 The support of a function $f(x)$ is defined to be the set
\[ \{ x : f(x) > 0 \}. \]
Suppose that $X$ and $Y$ are two continuous random variables with density functions $f_X(x)$ and $f_Y(y)$, respectively, and suppose that the supports of these density functions are the intervals $[a, b]$ and $[c, d]$, respectively. Find the support of the density function of the random variable $X + Y$.

20 Let $X_1, X_2, \ldots, X_n$ be a sequence of independent random variables, all having a common density function $f_X$ with support $[a, b]$ (see Exercise 19). Let $S_n = X_1 + X_2 + \cdots + X_n$, with density function $f_{S_n}$. Show that the support of $f_{S_n}$ is the interval $[na, nb]$. Hint: Write $f_{S_n} = f_{S_{n-1}} * f_X$. Now use Exercise 19 to establish the desired result by induction.

21 Let $X_1, X_2, \ldots, X_n$ be a sequence of independent random variables, all having a common density function $f_X$. Let $A = S_n / n$ be their average. Find $f_A$ if
(a) $f_X(x) = (1/\sqrt{2\pi})\, e^{-x^2/2}$ (normal density).
(b) $f_X(x) = e^{-x}$ (exponential density).
Hint: Write $f_A(x)$ in terms of $f_{S_n}(x)$.

Chapter 8

Law of Large Numbers

8.1 Law of Large Numbers for Discrete Random Variables

We are now in a position to prove our first fundamental theorem of probability. We have seen that an intuitive way to view the probability of a certain outcome is as the frequency with which that outcome occurs in the long run, when the experiment is repeated a large number of times. We have also defined probability mathematically as a value of a distribution function for the random variable representing the experiment. The Law of Large Numbers, which is a theorem proved about the mathematical model of probability, shows that this model is consistent with the frequency interpretation of probability. This theorem is sometimes called the law of averages. To find out what would happen if this law were not true, see the article by Robert M. Coates.¹

¹ R. M. Coates, "The Law," The World of Mathematics, ed. James R. Newman (New York: Simon and Schuster, 1956).

Chebyshev Inequality

To discuss the Law of Large Numbers, we first need an important inequality called the Chebyshev Inequality.

Theorem 8.1 (Chebyshev Inequality) Let $X$ be a discrete random variable with expected value $\mu = E(X)$, and let $\epsilon > 0$ be any positive real number. Then
\[ P(|X - \mu| \ge \epsilon) \le \frac{V(X)}{\epsilon^2}. \]

Proof. Let $m(x)$ denote the distribution function of $X$. Then the probability that $X$ differs from $\mu$ by at least $\epsilon$ is given by
\[ P(|X - \mu| \ge \epsilon) = \sum_{|x - \mu| \ge \epsilon} m(x). \]
We know that
\[ V(X) = \sum_x (x - \mu)^2 m(x), \]
and this is clearly at least as large as
\[ \sum_{|x - \mu| \ge \epsilon} (x - \mu)^2 m(x), \]
since all the summands are nonnegative and we have restricted the range of summation in the second sum. But this last sum is at least
\[ \sum_{|x - \mu| \ge \epsilon} \epsilon^2 m(x) = \epsilon^2 \sum_{|x - \mu| \ge \epsilon} m(x) = \epsilon^2 P(|X - \mu| \ge \epsilon). \]
So,
\[ P(|X - \mu| \ge \epsilon) \le \frac{V(X)}{\epsilon^2}. \]
□

Note that $X$ in the above theorem can be any discrete random variable, and $\epsilon$ any positive number.

Example 8.1 Let $X$ be any random variable with $E(X) = \mu$ and $V(X) = \sigma^2$. Then, if $\epsilon = k\sigma$, Chebyshev's Inequality states that
\[ P(|X - \mu| \ge k\sigma) \le \frac{\sigma^2}{k^2 \sigma^2} = \frac{1}{k^2}. \]
Thus, for any random variable, the probability of a deviation from the mean of more than $k$ standard deviations is $\le 1/k^2$. If, for example, $k = 5$, $1/k^2 = .04$. □

Chebyshev's Inequality is the best possible inequality in the sense that, for any $\epsilon > 0$, it is possible to give an example of a random variable for which Chebyshev's Inequality is in fact an equality. To see this, given $\epsilon > 0$, choose $X$ with distribution
\[ p_X = \begin{pmatrix} -\epsilon & +\epsilon \\ 1/2 & 1/2 \end{pmatrix}. \]
Then $E(X) = 0$, $V(X) = \epsilon^2$, and
\[ P(|X - \mu| \ge \epsilon) = \frac{V(X)}{\epsilon^2} = 1. \]

We are now prepared to state and prove the Law of Large Numbers.
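Chebyshev's Inequality, and the fact that it cannot be improved in general, can be checked exactly on small examples. The Python sketch below is illustrative (the function names are assumptions); it uses exact rational arithmetic to compare the tail probability $P(|X - \mu| \ge \epsilon)$ with the bound $V(X)/\epsilon^2$, first for a fair die with $\epsilon = 2$ and then for the two-point distribution above with $\epsilon = 3$ (any positive value would serve equally well).

```python
from fractions import Fraction


def mean_and_variance(dist):
    """Expected value and variance of a distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var


def tail_probability(dist, eps):
    """Exact P(|X - mu| >= eps)."""
    mu, _ = mean_and_variance(dist)
    return sum(p for x, p in dist.items() if abs(x - mu) >= eps)


def chebyshev_bound(dist, eps):
    """The bound V(X) / eps^2 from Theorem 8.1."""
    _, var = mean_and_variance(dist)
    return var / eps ** 2


# A fair die: the bound holds with room to spare.
die = {k: Fraction(1, 6) for k in range(1, 7)}
print(tail_probability(die, Fraction(2)), chebyshev_bound(die, Fraction(2)))

# The two-point distribution on -eps and +eps: the bound is attained.
eps = Fraction(3)
two_point = {-eps: Fraction(1, 2), eps: Fraction(1, 2)}
print(tail_probability(two_point, eps), chebyshev_bound(two_point, eps))
```

For the die only the outcomes 1 and 6 lie at distance 2 or more from the mean 7/2, so the tail probability is 1/3, while the bound is (35/12)/4 = 35/48.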
Law of Large Numbers

Theorem 8.2 (Law of Large Numbers) Let $X_1, X_2, \ldots, X_n$ be an independent trials process, with finite expected value $\mu = E(X_j)$ and finite variance $\sigma^2 = V(X_j)$. Let $S_n = X_1 + X_2 + \cdots + X_n$. Then for any $\epsilon > 0$,
\[ P\left( \left| \frac{S_n}{n} - \mu \right| \ge \epsilon \right) \to 0 \]
as $n \to \infty$. Equivalently,
\[ P\left( \left| \frac{S_n}{n} - \mu \right| < \epsilon \right) \to 1 \]
as $n \to \infty$.

Proof. Since $X_1, X_2, \ldots, X_n$ are independent and have the same distributions, we can apply Theorem 6.9. We obtain
\[ V(S_n) = n\sigma^2, \]
and
\[ V\left( \frac{S_n}{n} \right) = \frac{\sigma^2}{n}. \]
Also we know that
\[ E\left( \frac{S_n}{n} \right) = \mu. \]
By Chebyshev's Inequality, for any $\epsilon > 0$,
\[ P\left( \left| \frac{S_n}{n} - \mu \right| \ge \epsilon \right) \le \frac{\sigma^2}{n \epsilon^2}. \]
Thus, for fixed $\epsilon$,
\[ P\left( \left| \frac{S_n}{n} - \mu \right| \ge \epsilon \right) \to 0 \]
as $n \to \infty$, or equivalently,
\[ P\left( \left| \frac{S_n}{n} - \mu \right| < \epsilon \right) \to 1 \]
as $n \to \infty$. □

Law of Averages

Note that $S_n / n$ is an average of the individual outcomes, and one often calls the Law of Large Numbers the "law of averages." It is a striking fact that we can start with a random experiment about which little can be predicted and, by taking averages, obtain an experiment in which the outcome can be predicted with a high degree of certainty. The Law of Large Numbers, as we have stated it, is often called the "Weak Law of Large Numbers" to distinguish it from the "Strong Law of Large Numbers" described in Exercise 15.

Consider the important special case of Bernoulli trials with probability $p$ for success. Let $X_j = 1$ if the $j$th outcome is a success and 0 if it is a failure.
Then $S_n = X_1 + X_2 + \cdots + X_n$ is the number of successes in $n$ trials and $\mu = E(X_1) = p$. The Law of Large Numbers states that for any $\epsilon > 0$
\[ P\left( \left| \frac{S_n}{n} - p \right| < \epsilon \right) \to 1 \]
as $n \to \infty$. The above statement says that, in a large number of repetitions of a Bernoulli experiment, we can expect the proportion of times the event will occur to be near $p$. This shows that our mathematical model of probability agrees with our frequency interpretation of probability.

Coin Tossing

Let us consider the special case of tossing a coin $n$ times with $S_n$ the number of heads that turn up. Then the random variable $S_n / n$ represents the fraction of times heads turns up and will have values between 0 and 1. The Law of Large Numbers predicts that the outcomes for this random variable will, for large $n$, be near 1/2.

In Figure 8.1, we have plotted the distribution for this example for increasing values of $n$. We have marked the outcomes between .45 and .55 by dots at the top of the spikes. We see that as $n$ increases the distribution gets more and more concentrated around .5 and a larger and larger percentage of the total area is contained within the interval $(.45, .55)$, as predicted by the Law of Large Numbers.

Die Rolling

Example 8.2 Consider $n$ rolls of a die. Let $X_j$ be the outcome of the $j$th roll. Then $S_n = X_1 + X_2 + \cdots + X_n$ is the sum of the first $n$ rolls. This is an independent trials process with $E(X_j) = 7/2$. Thus, by the Law of Large Numbers, for any $\epsilon > 0$
\[ P\left( \left| \frac{S_n}{n} - \frac{7}{2} \right| \ge \epsilon \right) \to 0 \]
as $n \to \infty$. An equivalent way to state this is that, for any $\epsilon > 0$,
\[ P\left( \left| \frac{S_n}{n} - \frac{7}{2} \right| < \epsilon \right) \to 1 \]
as $n \to \infty$. □

Numerical Comparisons

It should be emphasized that, although Chebyshev's Inequality proves the Law of Large Numbers, it is actually a very crude inequality for the probabilities involved. However, its strength lies in the fact that it is true for any random variable at all, and it allows us to prove a very powerful theorem.
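To see how crude the bound can be in the coin-tossing case discussed above, the following Python sketch computes the exact probability that $S_n / n$ falls in the interval $(.45, .55)$ for a fair coin and sets it beside the Chebyshev lower bound $1 - 1/(4 n \epsilon^2)$ with $\epsilon = .05$. The function names and the choice of $n = 100, 400, 1000$ are assumptions made for illustration; exact rational arithmetic is used for the interval test so that endpoints such as $45/100$ are excluded reliably.

```python
from fractions import Fraction
from math import comb


def exact_prob_within(n, eps):
    """Exact P(|S_n/n - 1/2| < eps) for n tosses of a fair coin."""
    half = Fraction(1, 2)
    favorable = sum(
        comb(n, k) for k in range(n + 1) if abs(Fraction(k, n) - half) < eps
    )
    return favorable / 2 ** n  # ratio of two integers, returned as a float


def chebyshev_lower_bound(n, eps):
    """Chebyshev: P(|S_n/n - 1/2| < eps) >= 1 - (1/4) / (n eps^2)."""
    return float(1 - Fraction(1, 4) / (n * eps ** 2))


eps = Fraction(1, 20)  # the interval (.45, .55)
for n in (100, 400, 1000):
    print(n, round(exact_prob_within(n, eps), 4), chebyshev_lower_bound(n, eps))
```

For $n = 100$ the Chebyshev bound is 0 and so says nothing at all, even though the exact probability is already well above one half.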
In the following example, we compare the estimates given by Chebyshev's Inequality with the actual values.

Figure 8.1: Bernoulli trials distributions (panels for $n = 10, 20, 30, 40, 60, 100$).

Example 8.3 Let $X_1, X_2, \ldots, X_n$ be a Bernoulli trials process with probability .3 for success and .7 for failure. Let $X_j = 1$ if the $j$th outcome is a success and 0 otherwise. Then, $E(X_j) = .3$ and $V(X_j) = (.3)(.7) = .21$. If
\[ A_n = \frac{S_n}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]
is the average of the $X_i$, then $E(A_n) = .3$ and $V(A_n) = V(S_n)/n^2 = .21/n$. Chebyshev's Inequality states that if, for example, $\epsilon = .1$,
\[ P(|A_n - .3| \ge .1) \le \frac{.21}{n (.1)^2} = \frac{21}{n}. \]
Thus, if $n = 100$,
\[ P(|A_{100} - .3| \ge .1) \le .21, \]
or if $n = 1000$,
\[ P(|A_{1000} - .3| \ge .1) \le .021. \]
These can be rewritten as
\[ P(.2 < A_{100} < .4) \ge .79, \]
\[ P(.2 < A_{1000} < .4) \ge .979. \]
These values should be compared with the actual values, which are (to six decimal places)
\[ P(.2 < A_{100} < .4) \approx .962549, \]
\[ P(.2 < A_{1000} < .4) \approx 1. \]
The program Law can be used to carry out the above calculations in a systematic way. □

Historical Remarks

The Law of Large Numbers was first proved by the Swiss mathematician James Bernoulli in the fourth part of his work Ars Conjectandi, published posthumously in 1713.² As often happens with a first proof, Bernoulli's proof was much more difficult than the proof we have presented using Chebyshev's inequality. Chebyshev developed his inequality to prove a general form of the Law of Large Numbers (see Exercise 12).
The inequality itself appeared much earlier in a work by Bienaymé, and in discussing its history Maistrov remarks that it was referred to as the Bienaymé-Chebyshev Inequality for a long time.³

² J. Bernoulli, The Art of Conjecturing IV, trans. Bing Sung, Technical Report No. 2, Dept. of Statistics, Harvard Univ., 1966.

³ L. E. Maistrov, Probability Theory: A Historical Approach, trans. and ed. Samuel Kotz (New York: Academic Press, 1974), p. 202.

In Ars Conjectandi Bernoulli provides his reader with a long discussion of the meaning of his theorem with lots of examples. In modern notation he has an event that occurs with probability $p$ but he does not know $p$. He wants to estimate $p$ by the fraction $\bar{p}$ of the times the event occurs when the experiment is repeated a number of times. He discusses in detail the problem of estimating, by this method, the proportion of white balls in an urn that contains an unknown number of white and black balls. He would do this by drawing a sequence of balls from the urn, replacing the ball drawn after each draw, and estimating the unknown proportion of white balls in the urn by the proportion of the balls drawn that are white. He shows that, by choosing $n$ large enough he can obtain any desired accuracy and reliability for the estimate. He also provides a lively discussion of the applicability of his theorem to estimating the probability of dying of a particular disease, of different kinds of weather occurring, and so forth.

In speaking of the number of trials necessary for making a judgement, Bernoulli observes that the "man on the street" believes the "law of averages."

Further, it cannot escape anyone that for judging in this way about any event at all, it is not enough to use one or two trials, but rather a great number of trials is required.
And sometimes the stupidest man, by some instinct of nature per se and by no previous instruction (this is truly amazing), knows for sure that the more observations of this sort that are taken, the less the danger will be of straying from the mark.[4]

But he goes on to say that he must contemplate another possibility.

Something further must be contemplated here which perhaps no one has thought about till now. It certainly remains to be inquired whether after the number of observations has been increased, the probability is increased of attaining the true ratio between the number of cases in which some event can happen and in which it cannot happen, so that this probability finally exceeds any given degree of certainty; or whether the problem has, so to speak, its own asymptote, that is, whether some degree of certainty is given which one can never exceed.[5]

Bernoulli recognized the importance of this theorem, writing:

Therefore, this is the problem which I now set forth and make known after I have already pondered over it for twenty years. Both its novelty and its very great usefulness, coupled with its just as great difficulty, can exceed in weight and value all the remaining chapters of this thesis.[6]

[4] Bernoulli, op. cit., p. 38.
[5] ibid., p. 39.
[6] ibid., p. 42.

Bernoulli concludes his long proof with the remark:

Whence, finally, this one thing seems to follow: that if observations of all events were to be continued throughout all eternity (and hence the ultimate probability would tend toward perfect certainty), everything in the world would be perceived to happen in fixed ratios and according to a constant law of alternation, so that even in the most accidental and fortuitous occurrences we would be bound to recognize, as it were, a certain necessity and, so to speak, a certain fate.
I do not know whether Plato wished to aim at this in his doctrine of the universal return of things, according to which he predicted that all things will return to their original state after countless ages have passed.[7]

[7] ibid., pp. 65-66.

Exercises

1 A fair coin is tossed 100 times. The expected number of heads is 50, and the standard deviation for the number of heads is $(100 \cdot 1/2 \cdot 1/2)^{1/2} = 5$. What does Chebyshev's Inequality tell you about the probability that the number of heads that turn up deviates from the expected number 50 by three or more standard deviations (i.e., by at least 15)?

2 Write a program that uses the function binomial($n, p, x$) to compute the exact probability that you estimated in Exercise 1. Compare the two results.

3 Write a program to toss a coin 10,000 times. Let $S_n$ be the number of heads in the first $n$ tosses. Have your program print out, after every 1000 tosses, $S_n - n/2$. On the basis of this simulation, is it correct to say that you can expect heads about half of the time when you toss a coin a large number of times?

4 A 1-dollar bet on craps has an expected winning of $-.0141$. What does the Law of Large Numbers say about your winnings if you make a large number of 1-dollar bets at the craps table? Does it assure you that your losses will be small? Does it assure you that if $n$ is very large you will lose?

5 Let $X$ be a random variable with $E(X) = 0$ and $V(X) = 1$. What integer value $k$ will assure us that $P(|X| \ge k) \le .01$?

6 Let $S_n$ be the number of successes in $n$ Bernoulli trials with probability $p$ for success on each trial. Show, using Chebyshev's Inequality, that for any $\epsilon > 0$
$$P\left(\left|\frac{S_n}{n} - p\right| \ge \epsilon\right) \le \frac{p(1-p)}{n\epsilon^2}.$$

7 Find the maximum possible value for $p(1-p)$ if $0 < p < 1$. Using this result and Exercise 6, show that the estimate
$$P\left(\left|\frac{S_n}{n} - p\right| \ge \epsilon\right) \le \frac{1}{4n\epsilon^2}$$
is valid for any $p$.
8 A fair coin is tossed a large number of times. Does the Law of Large Numbers assure us that, if $n$ is large enough, with probability $> .99$ the number of heads that turn up will not deviate from $n/2$ by more than 100?

9 In Exercise 6.2.15, you showed that, for the hat check problem, the number $S_n$ of people who get their own hats back has $E(S_n) = V(S_n) = 1$. Using Chebyshev's Inequality, show that $P(S_n \ge 11) \le .01$ for any $n \ge 11$.

10 Let $X$ be any random variable which takes on values $0, 1, 2, \ldots, n$ and has $E(X) = V(X) = 1$. Show that, for any positive integer $k$,
$$P(X \ge k + 1) \le \frac{1}{k^2}.$$

11 We have two coins: one is a fair coin and the other is a coin that produces heads with probability 3/4. One of the two coins is picked at random, and this coin is tossed $n$ times. Let $S_n$ be the number of heads that turns up in these $n$ tosses. Does the Law of Large Numbers allow us to predict the proportion of heads that will turn up in the long run? After we have observed a large number of tosses, can we tell which coin was chosen? How many tosses suffice to make us 95 percent sure?

12 (Chebyshev[8]) Assume that $X_1, X_2, \ldots, X_n$ are independent random variables with possibly different distributions and let $S_n$ be their sum. Let $m_k = E(X_k)$, $\sigma_k^2 = V(X_k)$, and $M_n = m_1 + m_2 + \cdots + m_n$. Assume that $\sigma_k^2 < R$ for all $k$. Prove that, for any $\epsilon > 0$,
$$P\left(\left|\frac{S_n}{n} - \frac{M_n}{n}\right| < \epsilon\right) \to 1$$
as $n \to \infty$.

13 A fair coin is tossed repeatedly. Before each toss, you are allowed to decide whether to bet on the outcome. Can you describe a betting system with infinitely many bets which will enable you, in the long run, to win more than half of your bets? (Note that we are disallowing a betting system that says to bet until you are ahead, then quit.) Write a computer program that implements this betting system.
As stated above, your program must decide whether to bet on a particular outcome before that outcome is determined. For example, you might select only outcomes that come after there have been three tails in a row. See if you can get more than 50% heads by your "system."

*14 Prove the following analogue of Chebyshev's Inequality:
$$P(|X - E(X)| \ge \epsilon) \le \frac{1}{\epsilon} E(|X - E(X)|).$$

[8] P. L. Chebyshev, "On Mean Values," J. Math. Pure. Appl., vol. 12 (1867), pp. 177-184.

*15 We have proved a theorem often called the "Weak Law of Large Numbers." Most people's intuition and our computer simulations suggest that, if we toss a coin a sequence of times, the proportion of heads will really approach 1/2; that is, if $S_n$ is the number of heads in $n$ tosses, then we will have
$$A_n = \frac{S_n}{n} \to \frac{1}{2}$$
as $n \to \infty$. Of course, we cannot be sure of this since we are not able to toss the coin an infinite number of times, and, if we could, the coin could come up heads every time. However, the "Strong Law of Large Numbers," proved in more advanced courses, states that
$$P\left(\frac{S_n}{n} \to \frac{1}{2}\right) = 1.$$
Describe a sample space $\Omega$ that would make it possible for us to talk about the event
$$E = \left\{\omega : \frac{S_n}{n} \to \frac{1}{2}\right\}.$$
Could we assign the equiprobable measure to this space? (See Example 2.18.)

*16 In this exercise, we shall construct an example of a sequence of random variables that satisfies the weak law of large numbers, but not the strong law. The distribution of $X_i$ will have to depend on $i$, because otherwise both laws would be satisfied. (This problem was communicated to us by David Maslen.)

Suppose we have an infinite sequence of mutually independent events $A_1, A_2, \ldots$. Let $a_i = P(A_i)$, and let $r$ be a positive integer.

(a) Find an expression of the probability that none of the $A_i$ with $i > r$ occur.
(b) Use the fact that $1 - x \le e^{-x}$ to show that
$$P(\text{No } A_i \text{ with } i > r \text{ occurs}) \le e^{-\sum_{i=r+1}^{\infty} a_i}.$$

(c) (The first Borel-Cantelli lemma) Prove that if $\sum_{i=1}^{\infty} a_i$ diverges, then
$$P(\text{infinitely many } A_i \text{ occur}) = 1.$$

Now, let $X_i$ be a sequence of mutually independent random variables such that for each positive integer $i \ge 2$,
$$P(X_i = i) = \frac{1}{2i \log i}, \qquad P(X_i = -i) = \frac{1}{2i \log i}, \qquad P(X_i = 0) = 1 - \frac{1}{i \log i}.$$
When $i = 1$ we let $X_i = 0$ with probability 1. As usual we let $S_n = X_1 + \cdots + X_n$. Note that the mean of each $X_i$ is 0.

(d) Find the variance of $S_n$.

(e) Show that the sequence $\langle X_i \rangle$ satisfies the Weak Law of Large Numbers, i.e., prove that for any $\epsilon > 0$
$$P\left(\left|\frac{S_n}{n}\right| \ge \epsilon\right) \to 0,$$
as $n$ tends to infinity.

We now show that $\{X_i\}$ does not satisfy the Strong Law of Large Numbers. Suppose that $S_n/n \to 0$. Then because
$$\frac{X_n}{n} = \frac{S_n}{n} - \frac{n-1}{n} \cdot \frac{S_{n-1}}{n-1},$$
we know that $X_n/n \to 0$. From the definition of limits, we conclude that the inequality $|X_i| \ge \frac{1}{2} i$ can only be true for finitely many $i$.

(f) Let $A_i$ be the event $|X_i| \ge \frac{1}{2} i$. Find $P(A_i)$. Show that $\sum_{i=1}^{\infty} P(A_i)$ diverges (use the Integral Test).

(g) Prove that $A_i$ occurs for infinitely many $i$.

(h) Prove that
$$P\left(\frac{S_n}{n} \to 0\right) = 0,$$
and hence that the Strong Law of Large Numbers fails for the sequence $\{X_i\}$.

*17 Let us toss a biased coin that comes up heads with probability $p$ and assume the validity of the Strong Law of Large Numbers as described in Exercise 15. Then, with probability 1,
$$\frac{S_n}{n} \to p$$
as $n \to \infty$. If $f(x)$ is a continuous function on the unit interval, then we also have
$$f\left(\frac{S_n}{n}\right) \to f(p).$$
Finally, we could hope that
$$E\left(f\left(\frac{S_n}{n}\right)\right) \to E(f(p)) = f(p).$$
Show that, if all this is correct, as in fact it is, we would have proven that any continuous function on the unit interval is a limit of polynomial functions.
This is a sketch of a probabilistic proof of an important theorem in mathematics called the Weierstrass approximation theorem.

8.2 Law of Large Numbers for Continuous Random Variables

In the previous section we discussed in some detail the Law of Large Numbers for discrete probability distributions. This law has a natural analogue for continuous probability distributions, which we consider somewhat more briefly here.

Chebyshev Inequality

Just as in the discrete case, we begin our discussion with the Chebyshev Inequality.

Theorem 8.3 (Chebyshev Inequality) Let $X$ be a continuous random variable with density function $f(x)$. Suppose $X$ has a finite expected value $\mu = E(X)$ and finite variance $\sigma^2 = V(X)$. Then for any positive number $\epsilon > 0$ we have
$$P(|X - \mu| \ge \epsilon) \le \frac{\sigma^2}{\epsilon^2}.$$
□

The proof is completely analogous to the proof in the discrete case, and we omit it. Note that this theorem says nothing if $\sigma^2 = V(X)$ is infinite.

Example 8.4 Let $X$ be any continuous random variable with $E(X) = \mu$ and $V(X) = \sigma^2$. Then, if $\epsilon = k\sigma$, that is, $k$ standard deviations for some integer $k$,
$$P(|X - \mu| \ge k\sigma) \le \frac{\sigma^2}{k^2\sigma^2} = \frac{1}{k^2},$$
just as in the discrete case. □

Law of Large Numbers

With the Chebyshev Inequality we can now state and prove the Law of Large Numbers for the continuous case.

Theorem 8.4 (Law of Large Numbers) Let $X_1, X_2, \ldots, X_n$ be an independent trials process with a continuous density function $f$, finite expected value $\mu$, and finite variance $\sigma^2$. Let $S_n = X_1 + X_2 + \cdots + X_n$ be the sum of the $X_i$. Then for any real number $\epsilon > 0$ we have
$$\lim_{n \to \infty} P\left(\left|\frac{S_n}{n} - \mu\right| \ge \epsilon\right) = 0,$$
or equivalently,
$$\lim_{n \to \infty} P\left(\left|\frac{S_n}{n} - \mu\right| < \epsilon\right) = 1.$$
□

Note that this theorem is not necessarily true if $\sigma^2$ is infinite (see Example 8.8).
As in the discrete case, the Law of Large Numbers says that the average value of $n$ independent trials tends to the expected value as $n \to \infty$, in the precise sense that, given $\epsilon > 0$, the probability that the average value and the expected value differ by more than $\epsilon$ tends to 0 as $n \to \infty$. Once again, we suppress the proof, as it is identical to the proof in the discrete case.

Uniform Case

Example 8.5 Suppose we choose at random $n$ numbers from the interval $[0, 1]$ with uniform distribution. Then if $X_i$ describes the $i$th choice, we have
$$\mu = E(X_i) = \int_0^1 x \, dx = \frac{1}{2},$$
$$\sigma^2 = V(X_i) = \int_0^1 x^2 \, dx - \mu^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}.$$
Hence,
$$E\left(\frac{S_n}{n}\right) = \frac{1}{2}, \qquad V\left(\frac{S_n}{n}\right) = \frac{1}{12n},$$
and for any $\epsilon > 0$,
$$P\left(\left|\frac{S_n}{n} - \frac{1}{2}\right| \ge \epsilon\right) \le \frac{1}{12n\epsilon^2}.$$
This says that if we choose $n$ numbers at random from $[0, 1]$, then the chances are better than $1 - 1/(12n\epsilon^2)$ that the difference $|S_n/n - 1/2|$ is less than $\epsilon$. Note that $\epsilon$ plays the role of the amount of error we are willing to tolerate: If we choose $\epsilon = 0.1$, say, then the chances that $|S_n/n - 1/2|$ is less than 0.1 are better than $1 - 100/(12n)$. For $n = 100$, this is about .92, but if $n = 1000$, this is better than .99 and if $n = 10{,}000$, this is better than .999.

We can illustrate what the Law of Large Numbers says for this example graphically. The density for $A_n = S_n/n$ is determined by
$$f_{A_n}(x) = n f_{S_n}(nx).$$
We have seen in Section 7.2 that we can compute the density $f_{S_n}(x)$ for the sum of $n$ uniform random variables. In Figure 8.2 we have used this to plot the density for $A_n$ for various values of $n$. We have shaded in the area for which $A_n$ would lie between .45 and .55. We see that as we increase $n$, we obtain more and more of the total area inside the shaded region. The Law of Large Numbers tells us that we can obtain as much of the total area as we please inside the shaded region by choosing $n$ large enough (see also Figure 8.1). □
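The numbers quoted in Example 8.5 can be checked with a short computation. The following Python sketch is only an illustration: the function names are our own and are not part of the book's program library, and the simulation assumes that Python's built-in generator is an adequate source of uniform random numbers. It evaluates the Chebyshev lower bound $1 - 1/(12n\epsilon^2)$ and estimates the actual probability by repeated sampling.

```python
import random


def chebyshev_lower_bound(n, eps):
    """Chebyshev lower bound 1 - 1/(12 n eps^2) for P(|S_n/n - 1/2| < eps)."""
    return 1.0 - 1.0 / (12.0 * n * eps ** 2)


def simulated_probability(n, eps, trials=2000, seed=1):
    """Fraction of simulated averages of n uniform numbers within eps of 1/2."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    hits = 0
    for _ in range(trials):
        average = sum(rng.random() for _ in range(n)) / n
        if abs(average - 0.5) < eps:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, round(chebyshev_lower_bound(n, 0.1), 4))
    print("simulated, n = 100:", simulated_probability(100, 0.1))
```

The simulated fraction should come out well above the Chebyshev bound, which is consistent with the remark made later in this section that Chebyshev estimates are in general not very accurate.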
[Figure 8.2: Illustration of Law of Large Numbers, uniform case. Panels: n = 2, 5, 10, 20, 30, 50.]

Normal Case

Example 8.6 Suppose we choose $n$ real numbers at random, using a normal distribution with mean 0 and variance 1. Then
$$\mu = E(X_i) = 0, \qquad \sigma^2 = V(X_i) = 1.$$
Hence,
$$E\left(\frac{S_n}{n}\right) = 0, \qquad V\left(\frac{S_n}{n}\right) = \frac{1}{n},$$
and, for any $\epsilon > 0$,
$$P\left(\left|\frac{S_n}{n} - 0\right| \ge \epsilon\right) \le \frac{1}{n\epsilon^2}.$$
In this case it is possible to compare the Chebyshev estimate for $P(|S_n/n - \mu| \ge \epsilon)$ in the Law of Large Numbers with exact values, since we know the density function for $S_n/n$ exactly (see Example 7.9). The comparison is shown in Table 8.1, for $\epsilon = .1$. The data in this table was produced by the program LawContinuous. We see here that the Chebyshev estimates are in general not very accurate. □

n       $P(|S_n/n| \ge .1)$    Chebyshev
100     .31731                 1.00000
200     .15730                 .50000
300     .08326                 .33333
400     .04550                 .25000
500     .02535                 .20000
600     .01431                 .16667
700     .00815                 .14286
800     .00468                 .12500
900     .00270                 .11111
1000    .00157                 .10000

Table 8.1: Chebyshev estimates.

Monte Carlo Method

Here is a somewhat more interesting example.

Example 8.7 Let $g(x)$ be a continuous function defined for $x \in [0, 1]$ with values in $[0, 1]$. In Section 2.1, we showed how to estimate the area of the region under the graph of $g(x)$ by the Monte Carlo method, that is, by choosing a large number of random values for $x$ and $y$ with uniform distribution and seeing what fraction of the points $P(x, y)$ fell inside the region under the graph (see Example 2.2). Here is a better way to estimate the same area (see Figure 8.3). Let us choose a large number of independent values $X_n$ at random from $[0, 1]$ with uniform density, set $Y_n = g(X_n)$, and find the average value of the $Y_n$. Then this average is our estimate for the area.
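The two estimation procedures just described can be sketched in Python as follows. This is a minimal illustration under our own assumptions: the function names and the test function $g(x) = x^2$ (whose area is 1/3) are chosen here for illustration and do not come from the text.

```python
import random


def average_value_estimate(g, n, seed=1):
    """Estimate the area under g on [0, 1] by averaging Y_k = g(X_k)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += g(rng.random())  # X_k uniform on [0, 1]
    return total / n


def hit_or_miss_estimate(g, n, seed=1):
    """Estimate the same area as the fraction of random points (x, y) under the graph."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if y <= g(x):
            hits += 1
    return hits / n


if __name__ == "__main__":
    def square(x):
        return x * x  # illustrative choice of g; exact area is 1/3

    print(average_value_estimate(square, 20000))
    print(hit_or_miss_estimate(square, 20000))
```

Both functions return values near 1/3 for this choice of $g$; running them with several seeds gives a rough feeling for how much each estimate fluctuates.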
To see this, note that if the density function for $X_n$ is uniform,
$$\mu = E(Y_n) = \int_0^1 g(x) f(x) \, dx = \int_0^1 g(x) \, dx = \text{average value of } g(x),$$
while the variance is
$$\sigma^2 = E((Y_n - \mu)^2) = \int_0^1 (g(x) - \mu)^2 \, dx < 1,$$
since for all $x$ in $[0, 1]$, $g(x)$ is in $[0, 1]$, hence $\mu$ is in $[0, 1]$, and so $|g(x) - \mu| \le 1$. Now let $A_n = (1/n)(Y_1 + Y_2 + \cdots + Y_n)$. Then by Chebyshev's Inequality, we have
$$P(|A_n - \mu| \ge \epsilon) \le \frac{\sigma^2}{n\epsilon^2} < \frac{1}{n\epsilon^2}.$$
This says that to get within $\epsilon$ of the true value for $\mu = \int_0^1 g(x) \, dx$ with probability at least $p$, we should choose $n$ so that $1/(n\epsilon^2) \le 1 - p$ (i.e., so that $n \ge 1/(\epsilon^2(1 - p))$). Note that this method tells us how large to take $n$ to get a desired accuracy. □

[Figure 8.3: Area problem. The curve Y = g(x) plotted for X from 0 to 1.]

The Law of Large Numbers requires that the variance $\sigma^2$ of the original underlying density be finite: $\sigma^2 < \infty$. In cases where this fails to hold, the Law of Large Numbers may fail, too. An example follows.

Cauchy Case

Example 8.8 Suppose we choose $n$ numbers from $(-\infty, +\infty)$ with a Cauchy density with parameter $a = 1$. We know that for the Cauchy density the expected value and variance are undefined (see Example 6.28). In this case, the density function for
$$A_n = \frac{S_n}{n}$$
is given by (see Example 7.6)
$$f_{A_n}(x) = \frac{1}{\pi(1 + x^2)},$$
that is, the density function for $A_n$ is the same for all $n$. In this case, as $n$ increases, the density function does not change at all, and the Law of Large Numbers does not hold. □

Exercises

1 Let $X$ be a continuous random variable with mean $\mu = 10$ and variance $\sigma^2 = 100/3$. Using Chebyshev's Inequality, find an upper bound for the following probabilities.

(a) $P(|X - 10| \ge 2)$.
(b) $P(|X - 10| \ge 5)$.
(c) $P(|X - 10| \ge 9)$.
(d) $P(|X - 10| \ge 20)$.

2 Let $X$ be a continuous random variable with values uniformly distributed over the interval $[0, 20]$.
(a) Find the mean and variance of $X$.
(b) Calculate $P(|X - 10| \ge 2)$, $P(|X - 10| \ge 5)$, $P(|X - 10| \ge 9)$, and $P(|X - 10| \ge 20)$ exactly. How do your answers compare with those of Exercise 1? How good is Chebyshev's Inequality in this case?

3 Let $X$ be the random variable of Exercise 2.

(a) Calculate the function $f(x) = P(|X - 10| \ge x)$.
(b) Now graph the function $f(x)$, and on the same axes, graph the Chebyshev function $g(x) = 100/(3x^2)$. Show that $f(x) \le g(x)$ for all $x > 0$, but that $g(x)$ is not a very good approximation for $f(x)$.

4 Let $X$ be a continuous random variable with values exponentially distributed over $[0, \infty)$ with parameter $\lambda = 0.1$.

(a) Find the mean and variance of $X$.
(b) Using Chebyshev's Inequality, find an upper bound for the following probabilities: $P(|X - 10| \ge 2)$, $P(|X - 10| \ge 5)$, $P(|X - 10| \ge 9)$, and $P(|X - 10| \ge 20)$.
(c) Calculate these probabilities exactly, and compare with the bounds in (b).

5 Let $X$ be a continuous random variable with values normally distributed over $(-\infty, +\infty)$ with mean $\mu = 0$ and variance $\sigma^2 = 1$.

(a) Using Chebyshev's Inequality, find upper bounds for the following probabilities: $P(|X| \ge 1)$, $P(|X| \ge 2)$, and $P(|X| \ge 3)$.
(b) The area under the normal curve between $-1$ and 1 is .6827, between $-2$ and 2 is .9545, and between $-3$ and 3 it is .9973 (see the table in Appendix A). Compare your bounds in (a) with these exact values. How good is Chebyshev's Inequality in this case?

6 If $X$ is normally distributed, with mean $\mu$ and variance $\sigma^2$, find an upper bound for the following probabilities, using Chebyshev's Inequality.

(a) $P(|X - \mu| \ge \sigma)$.
(b) $P(|X - \mu| \ge 2\sigma)$.
(c) $P(|X - \mu| \ge 3\sigma)$.
(d) $P(|X - \mu| \ge 4\sigma)$.

Now find the exact value using the program NormalArea or the normal table in Appendix A, and compare.
7 If $X$ is a random variable with mean $\mu \ne 0$ and variance $\sigma^2$, define the relative deviation $D$ of $X$ from its mean by
$$D = \left|\frac{X - \mu}{\mu}\right|.$$

(a) Show that $P(D \ge a) \le \sigma^2/(\mu^2 a^2)$.
(b) If $X$ is the random variable of Exercise 1, find an upper bound for $P(D \ge .2)$, $P(D \ge .5)$, $P(D \ge .9)$, and $P(D \ge 2)$.

8 Let $X$ be a continuous random variable and define the standardized version $X^*$ of $X$ by:
$$X^* = \frac{X - \mu}{\sigma}.$$

(a) Show that $P(|X^*| \ge a) \le 1/a^2$.
(b) If $X$ is the random variable of Exercise 1, find bounds for $P(|X^*| \ge 2)$, $P(|X^*| \ge 5)$, and $P(|X^*| \ge 9)$.

9 (a) Suppose a number $X$ is chosen at random from $[0, 20]$ with uniform probability. Find a lower bound for the probability that $X$ lies between 8 and 12, using Chebyshev's Inequality.
(b) Now suppose 20 real numbers are chosen independently from $[0, 20]$ with uniform probability. Find a lower bound for the probability that their average lies between 8 and 12.
(c) Now suppose 100 real numbers are chosen independently from $[0, 20]$. Find a lower bound for the probability that their average lies between 8 and 12.

10 A student's score on a particular calculus final is a random variable with values in $[0, 100]$, mean 70, and variance 25.

(a) Find a lower bound for the probability that the student's score will fall between 65 and 75.
(b) If 100 students take the final, find a lower bound for the probability that the class average will fall between 65 and 75.

11 The Pilsdorff beer company runs a fleet of trucks along the 100 mile road from Hangtown to Dry Gulch, and maintains a garage halfway in between. Each of the trucks is apt to break down at a point $X$ miles from Hangtown, where $X$ is a random variable uniformly distributed over $[0, 100]$.

(a) Find a lower bound for the probability $P(|X - 50| \le 10)$.
(b) Suppose that in one bad week, 20 trucks break down.
Find a lower bound for the probability $P(|A_{20} - 50| \le 10)$, where $A_{20}$ is the average of the distances from Hangtown at the time of breakdown.

12 A share of common stock in the Pilsdorff beer company has a price $Y_n$ on the $n$th business day of the year. Finn observes that the price change $X_n = Y_{n+1} - Y_n$ appears to be a random variable with mean $\mu = 0$ and variance $\sigma^2 = 1/4$. If $Y_1 = 30$, find a lower bound for the following probabilities, under the assumption that the $X_n$'s are mutually independent.

(a) $P(25 \le Y_2 \le 35)$.
(b) $P(25 \le Y_{11} \le 35)$.
(c) $P(25 \le Y_{101} \le 35)$.

13 Suppose one hundred numbers $X_1, X_2, \ldots, X_{100}$ are chosen independently at random from $[0, 20]$. Let $S = X_1 + X_2 + \cdots + X_{100}$ be the sum, $A = S/100$ the average, and $S^* = (S - 1000)/(10/\sqrt{3})$ the standardized sum. Find lower bounds for the probabilities

(a) $P(|S - 1000| \le 100)$.
(b) $P(|A - 10| \le 1)$.
(c) $P(|S^*| \le \sqrt{3})$.

14 Let $X$ be a continuous random variable normally distributed on $(-\infty, +\infty)$ with mean 0 and variance 1. Using the normal table provided in Appendix A, or the program NormalArea, find values for the function $f(x) = P(|X| \ge x)$ as $x$ increases from 0 to 4.0 in steps of .25. Note that for $x \ge 0$ the table gives $NA(0, x) = P(0 \le X \le x)$ and thus $P(|X| \ge x) = 2(.5 - NA(0, x))$. Plot by hand the graph of $f(x)$ using these values, and the graph of the Chebyshev function $g(x) = 1/x^2$, and compare (see Exercise 3).

15 Repeat Exercise 14, but this time with mean 10 and variance 3. Note that the table in Appendix A presents values for a standard normal variable. Find the standardized version $X^*$ for $X$, find values for $f^*(x) = P(|X^*| \ge x)$ as in Exercise 14, and then rescale these values for $f(x) = P(|X - 10| \ge x)$. Graph and compare this function with the Chebyshev function $g(x) = 3/x^2$.

16 Let $Z = X/Y$ where $X$ and $Y$ have normal densities with mean 0 and standard deviation 1.
Then it can be shown that $Z$ has a Cauchy density.

(a) Write a program to illustrate this result by plotting a bar graph of 1000 samples obtained by forming the ratio of two standard normal outcomes. Compare your bar graph with the graph of the Cauchy density. Depending upon which computer language you use, you may or may not need to tell the computer how to simulate a normal random variable. A method for doing this was described in Section 5.2.
(b) We have seen that the Law of Large Numbers does not apply to the Cauchy density (see Example 8.8). Simulate a large number of experiments with Cauchy density and compute the average of your results. Do these averages seem to be approaching a limit? If so, can you explain why this might be?

17 Show that, if $X \ge 0$, then $P(X \ge a) \le E(X)/a$.

18 (Lamperti[9]) Let $X$ be a non-negative random variable. What is the best upper bound you can give for $P(X \ge a)$ if you know

(a) $E(X) = 20$.
(b) $E(X) = 20$ and $V(X) = 25$.
(c) $E(X) = 20$, $V(X) = 25$, and $X$ is symmetric about its mean.

[9] Private communication.

Chapter 9
Central Limit Theorem

9.1 Central Limit Theorem for Bernoulli Trials

The second fundamental theorem of probability is the Central Limit Theorem. This theorem says that if $S_n$ is the sum of $n$ mutually independent random variables, then the distribution function of $S_n$ is well-approximated by a certain type of continuous function known as a normal density function, which is given by the formula
$$f_{\mu,\sigma}(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2/(2\sigma^2)},$$
as we have seen in Section 5.2. In this section, we will deal only with the case that $\mu = 0$ and $\sigma = 1$. We will call this particular normal density function the standard normal density, and we will denote it by $\phi(x)$:
$$\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}.$$
A graph of this function is given in Figure 9.1.
It can be shown that the area under any normal density equals 1.

[Figure 9.1: Standard normal density.]

The Central Limit Theorem tells us, quite generally, what happens when we have the sum of a large number of independent random variables each of which contributes a small amount to the total. In this section we shall discuss this theorem as it applies to Bernoulli trials, and in Section 9.2 we shall consider more general processes. We will discuss the theorem in the case that the individual random variables are identically distributed, but the theorem is true, under certain conditions, even if the individual random variables have different distributions.

Bernoulli Trials

Consider a Bernoulli trials process with probability $p$ for success on each trial. Let $X_i = 1$ or 0 according as the $i$th outcome is a success or failure, and let $S_n = X_1 + X_2 + \cdots + X_n$. Then $S_n$ is the number of successes in $n$ trials. We know that $S_n$ has as its distribution the binomial probabilities $b(n, p, j)$. In Section 3.2, we plotted these distributions for $p = .3$ and $p = .5$ for various values of $n$ (see Figure 3.5).

We note that the maximum values of the distributions appear near the expected value $np$, which causes their spike graphs to drift off to the right as $n$ increases. Moreover, these maximum values approach 0 as $n$ increases, which causes the spike graphs to flatten out.

Standardized Sums

We can prevent the drifting of these spike graphs by subtracting the expected number of successes $np$ from $S_n$, obtaining the new random variable $S_n - np$. Now the maximum values of the distributions will always be near 0.

To prevent the spreading of these spike graphs, we can normalize $S_n - np$ to have variance 1 by dividing by its standard deviation $\sqrt{npq}$ (see Exercise 6.2.12 and Exercise 6.2.16).
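The drifting and flattening of the spike graphs can be seen numerically as well as graphically. The Python sketch below is an illustration only (the helper names are ours, and `math.comb` requires Python 3.8 or later). For $p = .3$ and several values of $n$ it finds the tallest spike of $b(n, p, j)$, and reports its position $j$, the centered position $j - np$, and its height.

```python
from math import comb


def binomial_pmf(n, p, j):
    """Binomial probability b(n, p, j)."""
    return comb(n, j) * p ** j * (1 - p) ** (n - j)


def tallest_spike(n, p):
    """Return (j, b(n, p, j)) for the most likely number of successes."""
    j_max = max(range(n + 1), key=lambda j: binomial_pmf(n, p, j))
    return j_max, binomial_pmf(n, p, j_max)


if __name__ == "__main__":
    p = 0.3
    for n in (10, 40, 160, 640):
        j_max, height = tallest_spike(n, p)
        # Position drifts right with n; centered position stays near 0;
        # height shrinks toward 0.
        print(n, j_max, round(j_max - n * p, 2), round(height, 4))
```

The second printed column grows roughly like $np$, the third stays close to 0, and the fourth decreases, which is the behavior that the centering and rescaling described above are meant to remove.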
Definition 9.1 The standardized sum of $S_n$ is given by
$$S_n^* = \frac{S_n - np}{\sqrt{npq}}.$$
$S_n^*$ always has expected value 0 and variance 1. □

Suppose we plot a spike graph with the spikes placed at the possible values of $S_n^*$: $x_0, x_1, \ldots, x_n$, where
$$x_j = \frac{j - np}{\sqrt{npq}}. \tag{9.1}$$
We make the height of the spike at $x_j$ equal to the distribution value $b(n, p, j)$. An example of this standardized spike graph, with $n = 270$ and $p = .3$, is shown in Figure 9.2. This graph is beautifully bell-shaped. We would like to fit a normal density to this spike graph. The obvious choice to try is the standard normal density, since it is centered at 0, just as the standardized spike graph is. In this figure, we have drawn this standard normal density. The reader will note that a horrible thing has occurred: Even though the shapes of the two graphs are the same, the heights are quite different.

[Figure 9.2: Normalized binomial distribution and standard normal density.]

If we want the two graphs to fit each other, we must modify one of them; we choose to modify the spike graph. Since the shapes of the two graphs look fairly close, we will attempt to modify the spike graph without changing its shape. The reason for the differing heights is that the sum of the heights of the spikes equals 1, while the area under the standard normal density equals 1.
If we were to draw a continuous curve through the top of the spikes, and find the area under this curve, we see that we would obtain, approximately, the sum of the heights of the spikes multiplied by the distance between consecutive spikes, which we will call $\epsilon$. Since the sum of the heights of the spikes equals one, the area under this curve would be approximately $\epsilon$. Thus, to change the spike graph so that the area under this curve has value 1, we need only multiply the heights of the spikes by $1/\epsilon$. It is easy to see from Equation 9.1 that
$$\epsilon = \frac{1}{\sqrt{npq}}.$$

In Figure 9.3 we show the standardized sum $S_n^*$ for $n = 270$ and $p = .3$, after correcting the heights, together with the standard normal density. (This figure was produced with the program CLTBernoulliPlot.) The reader will note that the standard normal fits the height-corrected spike graph extremely well. In fact, one version of the Central Limit Theorem (see Theorem 9.1) says that as $n$ increases, the standard normal density will do an increasingly better job of approximating the height-corrected spike graphs corresponding to a Bernoulli trials process with $n$ summands.

[Figure 9.3: Corrected spike graph with standard normal density.]

Let us fix a value $x$ on the $x$-axis and let $n$ be a fixed positive integer. Then, using Equation 9.1, the point $x_j$ that is closest to $x$ has a subscript $j$ given by the formula
$$j = \langle np + x\sqrt{npq} \rangle,$$
where $\langle a \rangle$ means the integer nearest to $a$. Thus the height of the spike above $x_j$ will be
$$\sqrt{npq}\, b(n, p, j) = \sqrt{npq}\, b(n, p, \langle np + x_j\sqrt{npq} \rangle).$$
For large $n$, we have seen that the height of the spike is very close to the height of the normal density at $x$. This suggests the following theorem.
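The closeness of the height-corrected spike to the normal density can also be checked numerically. The following Python sketch is an illustration under our own assumptions: the function names are hypothetical, the binomial probability is computed through logarithms (`math.lgamma`) so that it does not underflow for large $n$, and Python's `round` stands in for $\langle a \rangle$ (it sends exact halves to the nearest even integer, which does not matter for the values used here).

```python
from math import exp, lgamma, log, pi, sqrt


def phi(x):
    """Standard normal density."""
    return exp(-x * x / 2) / sqrt(2 * pi)


def binomial_pmf(n, p, j):
    """b(n, p, j) for 0 < p < 1, computed via logarithms to avoid underflow."""
    log_b = (lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
             + j * log(p) + (n - j) * log(1 - p))
    return exp(log_b)


def corrected_spike_height(n, p, x):
    """sqrt(npq) * b(n, p, <np + x sqrt(npq)>), the height-corrected spike nearest x."""
    s = sqrt(n * p * (1 - p))
    j = round(n * p + x * s)      # nearest integer to np + x sqrt(npq)
    j = min(max(j, 0), n)         # keep j inside the range 0..n
    return s * binomial_pmf(n, p, j)


if __name__ == "__main__":
    for n in (30, 270, 2430):
        print(n, round(corrected_spike_height(n, 0.3, 0.0), 4), round(phi(0.0), 4))
```

For a fixed $x$ the first printed height moves toward $\phi(x)$ as $n$ grows. For moderate $n$ and $x \ne 0$ part of the discrepancy comes simply from the nearest spike $x_j$ not sitting exactly at $x$.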
Theorem 9.1 (Central Limit Theorem for Binomial Distributions) For the binomial distribution $b(n, p, j)$ we have
$$\lim_{n \to \infty} \sqrt{npq}\, b(n, p, \langle np + x\sqrt{npq} \rangle) = \phi(x),$$
where $\phi(x)$ is the standard normal density.

The proof of this theorem can be carried out using Stirling's approximation from Section 3.1. We indicate this method of proof by considering the case $x = 0$. In this case, the theorem states that
$$\lim_{n \to \infty} \sqrt{npq}\, b(n, p, \langle np \rangle) = \frac{1}{\sqrt{2\pi}} = .3989\ldots.$$
In order to simplify the calculation, we assume that $np$ is an integer, so that $\langle np \rangle = np$. Then
$$\sqrt{npq}\, b(n, p, np) = \sqrt{npq}\, p^{np} q^{nq} \frac{n!}{(np)!\,(nq)!}.$$
Recall that Stirling's formula (see Theorem 3.3) states that
$$n! \sim \sqrt{2\pi n}\, n^n e^{-n} \quad \text{as } n \to \infty.$$
Using this, we have
$$\sqrt{npq}\, b(n, p, np) \sim \frac{\sqrt{npq}\, p^{np} q^{nq} \sqrt{2\pi n}\, n^n e^{-n}}{\sqrt{2\pi np}\, \sqrt{2\pi nq}\, (np)^{np} (nq)^{nq} e^{-np} e^{-nq}},$$
which simplifies to $1/\sqrt{2\pi}$. □

Approximating Binomial Distributions

We can use Theorem 9.1 to find approximations for the values of binomial distribution functions. If we wish to find an approximation for $b(n, p, j)$, we set
$$j = np + x\sqrt{npq}$$
and solve for $x$, obtaining
$$x = \frac{j - np}{\sqrt{npq}}.$$
Theorem 9.1 then says that $\sqrt{npq}\, b(n, p, j)$ is approximately equal to $\phi(x)$, so
$$b(n, p, j) \approx \frac{\phi(x)}{\sqrt{npq}} = \frac{1}{\sqrt{npq}}\, \phi\left(\frac{j - np}{\sqrt{npq}}\right).$$

Example 9.1 Let us estimate the probability of exactly 55 heads in 100 tosses of a coin. For this case $np = 100 \cdot 1/2 = 50$ and $\sqrt{npq} = \sqrt{100 \cdot 1/2 \cdot 1/2} = 5$. Thus $x_{55} = (55 - 50)/5 = 1$ and
$$P(S_{100} = 55) \approx \frac{\phi(1)}{5} = \frac{1}{5}\left(\frac{1}{\sqrt{2\pi}} e^{-1/2}\right) = .0484.$$
To four decimal places, the actual value is .0485, and so the approximation is very good. □

The program CLTBernoulliLocal illustrates this approximation for any choice of $n$, $p$, and $j$. We have run this program for two examples. The first is the probability of exactly 50 heads in 100 tosses of a coin; the estimate is .0798, while the actual value, to four decimal places, is .0796.
The second example is the probability of exactly eight sixes in 36 rolls of a die; here the estimate is .1196, while the actual value, to four decimal places, is .1093.

The individual binomial probabilities tend to 0 as $n$ tends to infinity. In most applications we are not interested in the probability that a specific outcome occurs, but rather in the probability that the outcome lies in a given interval, say the interval $[a, b]$. In order to find this probability, we add the heights of the spike graphs for values of $j$ between $a$ and $b$. This is the same as asking for the probability that the standardized sum $S_n^*$ lies between $a^*$ and $b^*$, where $a^*$ and $b^*$ are the standardized values of $a$ and $b$. But as $n$ tends to infinity the sum of these areas could be expected to approach the area under the standard normal density between $a^*$ and $b^*$. The Central Limit Theorem states that this does indeed happen.

Theorem 9.2 (Central Limit Theorem for Bernoulli Trials) Let $S_n$ be the number of successes in $n$ Bernoulli trials with probability $p$ for success, and let $a$ and $b$ be two fixed real numbers. Then

\[ \lim_{n \to \infty} P\!\left( a \le \frac{S_n - np}{\sqrt{npq}} \le b \right) = \int_a^b \phi(x)\, dx . \]

$\Box$

This theorem can be proved by adding together the approximations to $b(n, p, k)$ given in Theorem 9.1. It is also a special case of the more general Central Limit Theorem (see Section 10.3).

We know from calculus that the integral on the right side of this equation is equal to the area under the graph of the standard normal density $\phi(x)$ between $a$ and $b$. We denote this area by $\mathrm{NA}(a^*, b^*)$. Unfortunately, there is no simple way to integrate the function $e^{-x^2/2}$, and so we must either use a table of values or else a numerical integration program. (See Figure 9.4 for values of $\mathrm{NA}(0, z)$. A more extensive table is given in Appendix A.)
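The local estimate of Theorem 9.1 and the normal area NA can both be checked numerically. The following is a minimal Python sketch, not the text's CLTBernoulliLocal program: the function names are our own, and the area is computed from the error function `math.erf` instead of being read from a table.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def binomial(n, p, j):
    """Exact binomial probability b(n, p, j)."""
    return math.comb(n, j) * p**j * (1 - p) ** (n - j)


def local_approx(n, p, j):
    """Estimate of b(n, p, j) from Theorem 9.1: phi(x) / sqrt(npq)."""
    sd = math.sqrt(n * p * (1 - p))
    x = (j - n * p) / sd  # standardized value of j
    return phi(x) / sd


def normal_area(a, b):
    """NA(a, b): area under the standard normal density between a and b."""
    def cdf(z):
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return cdf(b) - cdf(a)


# Estimate versus exact value for the two coin-tossing cases in the text.
for n, p, j in [(100, 0.5, 55), (100, 0.5, 50)]:
    print(n, j, round(local_approx(n, p, j), 4), round(binomial(n, p, j), 4))

# The area from -2 to 3 equals the area from 0 to 2 plus the area from 0 to 3.
print(normal_area(-2, 3), normal_area(0, 2) + normal_area(0, 3))
```

Calling `local_approx` and `binomial` with other values of n, p, and j gives a quick sense of how the quality of the estimate depends on n and on how far j lies from np.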
It is clear from the symmetry of the standard normal density that areas such as that between $-2$ and 3 can be found from the table in Figure 9.4 by adding the area from 0 to 2 (same as that from $-2$ to 0) to the area from 0 to 3.

Approximation of Binomial Probabilities

Suppose that $S_n$ is binomially distributed with parameters $n$ and $p$. We have seen that the above theorem shows how to estimate a probability of the form

\[ P(i \le S_n \le j) , \qquad (9.2) \]

where $i$ and $j$ are integers between 0 and $n$. As we have seen, the binomial distribution can be represented as a spike graph, with spikes at the integers between 0 and $n$, and with the height of the $k$th spike given by $b(n, p, k)$. For moderate-sized values of $n$, if we standardize this spike graph, and change the heights of its spikes, in the manner described above, the sum of the heights of the spikes is approximated by the area under the standard normal density between $i^*$ and $j^*$. It turns out that a slightly more accurate approximation is afforded by the area under the standard normal density between the standardized values corresponding to $(i - 1/2)$ and $(j + 1/2)$; these values are

\[ i^* = \frac{i - 1/2 - np}{\sqrt{npq}} \]

and

\[ j^* = \frac{j + 1/2 - np}{\sqrt{npq}} . \]

Thus,

\[ P(i \le S_n \le j) \approx \mathrm{NA}\!\left( \frac{i - \frac{1}{2} - np}{\sqrt{npq}} ,\ \frac{j + \frac{1}{2} - np}{\sqrt{npq}} \right) . \]

NA(0, z) = area of shaded region under the standard normal density between 0 and z

  z    NA(z)      z    NA(z)      z    NA(z)      z    NA(z)
 .0    .0000     1.0   .3413     2.0   .4772     3.0   .4987
 .1    .0398     1.1   .3643     2.1   .4821     3.1   .4990
 .2    .0793     1.2   .3849     2.2   .4861     3.2   .4993
 .3    .1179     1.3   .4032     2.3   .4893     3.3   .4995
 .4    .1554     1.4   .4192     2.4   .4918     3.4   .4997
 .5    .1915     1.5   .4332     2.5   .4938     3.5   .4998
 .6    .2257     1.6   .4452     2.6   .4953     3.6   .4998
 .7    .2580     1.7   .4554     2.7   .4965     3.7   .4999
 .8    .2881     1.8   .4641     2.8   .4974     3.8   .4999
 .9    .3159     1.9   .4713     2.9   .4981     3.9   .5000

Figure 9.4: Table of values of NA(0, z), the normal area from 0 to z.

It should be stressed that the approximations obtained by using the Central Limit Theorem are only approximations, and sometimes they are not very close to the actual values (see Exercise 12).

We now illustrate this idea with some examples.

Example 9.2 A coin is tossed 100 times. Estimate the probability that the number of heads lies between 40 and 60 (the word "between" in mathematics means inclusive of the endpoints). The expected number of heads is $100 \cdot 1/2 = 50$, and the standard deviation for the number of heads is $\sqrt{100 \cdot 1/2 \cdot 1/2} = 5$. Thus, since $n = 100$ is reasonably large, we have

\[ P(40 \le S_n \le 60) \approx P\!\left( \frac{39.5 - 50}{5} \le S_n^* \le \frac{60.5 - 50}{5} \right) = P(-2.1 \le S_n^* \le 2.1) \approx \mathrm{NA}(-2.1, 2.1) = 2\,\mathrm{NA}(0, 2.1) = .9642 . \]

The actual value is .96480, to five decimal places.

Note that in this case we are asking for the probability that the outcome will not deviate by more than two standard deviations from the expected value. Had we asked for the probability that the number of successes is between 35 and 65, this would have represented three standard deviations from the mean, and, using our 1/2 correction, our estimate would be the area under the standard normal curve between $-3.1$ and 3.1, or $2\,\mathrm{NA}(0, 3.1) = .9980$. The actual answer in this case, to five places, is .99821. $\Box$

It is important to work a few problems by hand to understand the conversion from a given inequality to an inequality relating to the standardized variable. After this, one can then use a computer program that carries out this conversion, including the 1/2 correction. The program CLTBernoulliGlobal is such a program for estimating probabilities of the form $P(a \le S_n \le b)$.
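The conversion described above, including the 1/2 correction, can be sketched in a few lines of Python. This is an illustration under our own naming, not the CLTBernoulliGlobal program itself; the normal area is computed with `math.erf` in place of a table lookup, and the exact sum is included only for comparison.

```python
import math


def normal_area(a, b):
    """NA(a, b): area under the standard normal density between a and b."""
    def cdf(z):
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return cdf(b) - cdf(a)


def interval_exact(n, p, i, j):
    """Exact P(i <= S_n <= j), summing the binomial spikes from i to j."""
    q = 1 - p
    return sum(math.comb(n, k) * p**k * q ** (n - k) for k in range(i, j + 1))


def interval_approx(n, p, i, j):
    """Normal estimate of P(i <= S_n <= j) with the 1/2 correction."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    i_star = (i - 0.5 - mean) / sd  # standardized value of i - 1/2
    j_star = (j + 0.5 - mean) / sd  # standardized value of j + 1/2
    return normal_area(i_star, j_star)


# The two intervals discussed in Example 9.2.
for i, j in [(40, 60), (35, 65)]:
    print(i, j, interval_approx(100, 0.5, i, j), interval_exact(100, 0.5, i, j))
```

Up to rounding, the two printed pairs should match the estimates and actual values quoted in Example 9.2.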
Example 9.3 Dartmouth College would like to have 1050 freshmen. This college cannot accommodate more than 1060. Assume that each applicant accepts with probability .6 and that the acceptances can be modeled by Bernoulli trials. If the college accepts 1700, what is the probability that it will have too many acceptances?

If it accepts 1700 students, the expected number of students who matriculate is $.6 \cdot 1700 = 1020$. The standard deviation for the number that accept is $\sqrt{1700 \cdot .6 \cdot .4} \approx 20$. Thus we want to estimate the probability

\[ P(S_{1700} > 1060) = P(S_{1700} \ge 1061) = P\!\left( S_{1700}^* \ge \frac{1060.5 - 1020}{20} \right) = P(S_{1700}^* \ge 2.025) . \]

From the table in Figure 9.4, if we interpolate, we would estimate this probability to be $.5 - .4784 = .0216$. Thus, the college is fairly safe using this admission policy. $\Box$

Applications to Statistics

There are many important questions in the field of statistics that can be answered using the Central Limit Theorem for independent trials processes. The following example is one that is encountered quite frequently in the news. Another example of an application of the Central Limit Theorem to statistics is given in Section 9.2.

Example 9.4 One frequently reads that a poll has been taken to estimate the proportion of people in a certain population who favor one candidate over another in a race with two candidates, A and B. (This model also applies to races with more than two candidates and to ballot propositions.) Clearly, it is not possible for pollsters to ask everyone for their preference. What is done instead is to pick a subset of the population, called a sample, and ask everyone in the sample for their preference.
Let $p$ be the actual proportion of people in the population who are in favor of candidate A and let $q = 1 - p$. If we choose a sample of size $n$ from the population, the preferences of the people in the sample can be represented by random variables $X_1, X_2, \ldots, X_n$, where $X_i = 1$ if person $i$ is in favor of candidate A, and $X_i = 0$ if person $i$ is in favor of candidate B. Let $S_n = X_1 + X_2 + \cdots + X_n$. If each subset of size $n$ is chosen with the same probability, then $S_n$ is hypergeometrically distributed. If $n$ is small relative to the size of the population (which is typically true in practice), then $S_n$ is approximately binomially distributed, with parameters $n$ and $p$.

The pollster wants to estimate the value $p$. An estimate for $p$ is provided by the value $\bar{p} = S_n/n$, which is the proportion of people in the sample who favor candidate A. The Central Limit Theorem says that the random variable $\bar{p}$ is approximately normally distributed. (In fact, our version of the Central Limit Theorem says that the distribution function of the random variable

\[ S_n^* = \frac{S_n - np}{\sqrt{npq}} \]

is approximated by the standard normal density.) But we have

\[ \bar{p} = \frac{S_n - np}{\sqrt{npq}} \sqrt{\frac{pq}{n}} + p , \]

i.e., $\bar{p}$ is just a linear function of $S_n^*$. Since the distribution of $S_n^*$ is approximated by the standard normal density, the distribution of the random variable $\bar{p}$ must also be bell-shaped.
We also know how to write the mean and standard deviation of $\bar{p}$ in terms of $p$ and $n$. The mean of $\bar{p}$ is just $p$, and the standard deviation is

\[ \sqrt{\frac{pq}{n}} . \]

Thus, it is easy to write down the standardized version of $\bar{p}$; it is

\[ \bar{p}^* = \frac{\bar{p} - p}{\sqrt{pq/n}} . \]

Since the distribution of the standardized version of $\bar{p}$ is approximated by the standard normal density, we know, for example, that 95% of its values will lie within two standard deviations of its mean, and the same is true of $\bar{p}$. So we have

\[ P\!\left( p - 2\sqrt{\frac{pq}{n}} < \bar{p} < p + 2\sqrt{\frac{pq}{n}} \right) \approx .954 . \]

Now the pollster does not know $p$ or $q$, but he can use $\bar{p}$ and $\bar{q} = 1 - \bar{p}$ in their place without too much danger. With this idea in mind, the above statement is equivalent to the statement

\[ P\!\left( \bar{p} - 2\sqrt{\frac{\bar{p}\bar{q}}{n}} < p < \bar{p} + 2\sqrt{\frac{\bar{p}\bar{q}}{n}} \right) \approx .954 . \]

The resulting interval

\[ \left( \bar{p} - \frac{2\sqrt{\bar{p}\bar{q}}}{\sqrt{n}} ,\ \bar{p} + \frac{2\sqrt{\bar{p}\bar{q}}}{\sqrt{n}} \right) \]

is called the 95 percent confidence interval for the unknown value of $p$. The name is suggested by the fact that if we use this method to estimate $p$ in a large number of samples we should expect that in about 95 percent of the samples the true value of $p$ is contained in the confidence interval obtained from the sample. In Exercise 11 you are asked to write a program to illustrate that this does indeed happen.

The pollster has control over the value of $n$. Thus, if he wants to create a 95% confidence interval with length 6%, then he should choose a value of $n$ so that

\[ \frac{2\sqrt{\bar{p}\bar{q}}}{\sqrt{n}} \le .03 . \]

Using the fact that $\bar{p}\bar{q} \le 1/4$, no matter what the value of $\bar{p}$ is, it is easy to show that if he chooses a value of $n$ so that

\[ \frac{1}{\sqrt{n}} \le .03 , \]

he will be safe. This is equivalent to choosing

\[ n \ge 1111 . \]

Figure 9.5: Polling simulation.
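The interval and the sample-size bound above are easy to try out. The Python sketch below is only an illustration with hypothetical function names. As a simplifying assumption it draws each sample as independent Bernoulli trials, which is the binomial approximation to sampling from a large population, and it rounds $1/(.03)^2 \approx 1111.1$ up to the next whole number.

```python
import math
import random


def confidence_interval(successes, n):
    """95 percent confidence interval: p_bar plus or minus 2*sqrt(p_bar*q_bar/n)."""
    p_bar = successes / n
    half_width = 2 * math.sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - half_width, p_bar + half_width


def sample_size_for_margin(margin):
    """Smallest integer n with 1/sqrt(n) <= margin (worst case p_bar*q_bar = 1/4)."""
    return math.ceil(1 / margin**2)


def coverage(p, n, trials, seed=0):
    """Fraction of simulated samples whose interval contains the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Assumption: each sampled person favors A independently with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        low, high = confidence_interval(successes, n)
        hits += low < p < high
    return hits / trials


print(sample_size_for_margin(0.03))
print(confidence_interval(648, 1200))
print(coverage(0.54, 1200, 1000))
```

The coverage fraction varies with the seed, but over many simulated samples it should settle near 95 percent, which is the sense in which the interval is a 95 percent confidence interval.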
So if the pollster chooses $n$ to be 1200, say, and calculates $\bar{p}$ using his sample of size 1200, then 19 times out of 20 (i.e., 95% of the time), his confidence interval, which is of length 6%, will contain the true value of $p$. This type of confidence interval is typically reported in the news as follows: this survey has a 3% margin of error.

In fact, most of the surveys that one sees reported in the paper will have sample sizes around 1000. A somewhat surprising fact is that the size of the population has apparently no effect on the sample size needed to obtain a 95% confidence interval for $p$ with a given margin of error. To see this, note that the value of $n$ that was needed depended only on the number .03, which is the margin of error. In other words, whether the population is of size 100,000 or 100,000,000, the pollster needs only to choose a sample of size 1200 or so to get the same accuracy of estimate of $p$. (We did use the fact that the sample size was small relative to the population size in the statement that $S_n$ is approximately binomially distributed.)

In Figure 9.5, we show the results of simulating the polling process. The population is of size 100,000, and for the population, $p = .54$. The sample size was chosen to be 1200. The spike graph shows the distribution of $\bar{p}$ for 10,000 randomly chosen samples. For this simulation, the program kept track of the number of samples for which $\bar{p}$ was within 3% of .54. This number was 9648, which is close to 95% of the number of samples used.

Another way to see what the idea of confidence intervals means is shown in Figure 9.6. In this figure, we show 100 confidence intervals, obtained by computing $\bar{p}$ for 100 different samples of size 1200 from the same population as before.
The reader can see that most of these confidence intervals (96, to be exact) contain the true value of $p$.

Figure 9.6: Confidence interval simulation.

The Gallup Poll has used these polling techniques in every Presidential election since 1936 (and in innumerable other elections as well). Table 9.1$^{1}$ shows the results of their efforts. The reader will note that most of the approximations to $p$ are within 3% of the actual value of $p$. The sample sizes for these polls were typically around 1500. (In the table, both the predicted and actual percentages for the winning candidate refer to the percentage of the vote among the "major" political parties. In most elections, there were two major parties, but in several elections, there were three.)

$^{1}$ The Gallup Poll Monthly, November 1992, No. 326, p. 33. Supplemented with the help of Lydia K. Saab, The Gallup Organization.

Year   Winning      Gallup Final   Election   Deviation
       Candidate    Survey         Result
1936   Roosevelt    55.7%          62.5%      6.8%
1940   Roosevelt    52.0%          55.0%      3.0%
1944   Roosevelt    51.5%          53.3%      1.8%
1948   Truman       44.5%          49.9%      5.4%
1952   Eisenhower   51.0%          55.4%      4.4%
1956   Eisenhower   59.5%          57.8%      1.7%
1960   Kennedy      51.0%          50.1%      0.9%
1964   Johnson      64.0%          61.3%      2.7%
1968   Nixon        43.0%          43.5%      0.5%
1972   Nixon        62.0%          61.8%      0.2%
1976   Carter       48.0%          50.0%      2.0%
1980   Reagan       47.0%          50.8%      3.8%
1984   Reagan       59.0%          59.1%      0.1%
1988   Bush         56.0%          53.9%      2.1%
1992   Clinton      49.0%          43.2%      5.8%
1996   Clinton      52.0%          50.1%      1.9%

Table 9.1: Gallup Poll accuracy record.

This technique also plays an important role in the evaluation of the effectiveness of drugs in the medical profession. For example, it is sometimes desired to know what proportion of patients will be helped by a new drug. This proportion can be estimated by giving the drug to a subset of the patients, and determining the proportion of this sample who are helped by the drug. $\Box$

Historical Remarks

The Central Limit Theorem for Bernoulli trials was first proved by Abraham de Moivre and appeared in his book, The Doctrine of Chances, first published in 1718.$^{2}$

$^{2}$ A. de Moivre, The Doctrine of Chances, 3d ed. (London: Millar, 1756).

De Moivre spent his years from age 18 to 21 in prison in France because of his Protestant background. When he was released he left France for England, where he worked as a tutor to the sons of noblemen. Newton had presented a copy of his Principia Mathematica to the Earl of Devonshire. The story goes that, while de Moivre was tutoring at the Earl's house, he came upon Newton's work and found that it was beyond him. It is said that he then bought a copy of his own and tore it into separate pages, learning it page by page as he walked around London to his tutoring jobs. De Moivre frequented the coffeehouses in London, where he started his probability work by calculating odds for gamblers. He also met Newton at such a coffeehouse and they became fast friends. De Moivre dedicated his book to Newton.

The Doctrine of Chances provides the techniques for solving a wide variety of gambling problems. In the midst of these gambling problems de Moivre rather modestly introduces his proof of the Central Limit Theorem, writing

A Method of approximating the Sum of the Terms of the Binomial $(a + b)^n$ expanded into a Series, from whence are deduced some practical Rules to estimate the Degree of Assent which is to be given to Experiments.$^{3}$

De Moivre's proof used the approximation to factorials that we now call Stirling's formula. De Moivre states that he had obtained this formula before Stirling but without determining the exact value of the constant $\sqrt{2\pi}$. While he says it is not really necessary to know this exact value, he concedes that knowing it "has spread a singular Elegancy on the Solution."
The complete proof and an interesting discussion of the life of de Moivre can be found in the book Games, Gods and Gambling by F. N. David.$^{4}$

$^{3}$ ibid., p. 243.
$^{4}$ F. N. David, Games, Gods and Gambling (London: Griffin, 1962).

Exercises

1 Let $S_{100}$ be the number of heads that turn up in 100 tosses of a fair coin. Use the Central Limit Theorem to estimate
(a) $P(S_{100} \le 45)$.
(b) $P(45 < S_{100} < 55)$.
(c) $P(S_{100} > 63)$.
(d) $P(S_{100} < 57)$.

2 Let $S_{200}$ be the number of heads that turn up in 200 tosses of a fair coin. Estimate
(a) $P(S_{200} = 100)$.
(b) $P(S_{200} = 90)$.
(c) $P(S_{200} = 80)$.

3 A true-false examination has 48 questions. June has probability 3/4 of answering a question correctly. April just guesses on each question. A passing score is 30 or more correct answers. Compare the probability that June passes the exam with the probability that April passes it.

4 Let $S$ be the number of heads in 1,000,000 tosses of a fair coin. Use (a) Chebyshev's inequality, and (b) the Central Limit Theorem, to estimate the probability that $S$ lies between 499,500 and 500,500. Use the same two methods to estimate the probability that $S$ lies between 499,000 and 501,000, and the probability that $S$ lies between 498,500 and 501,500.

5 A rookie is brought to a baseball club on the assumption that he will have a .300 batting average. (Batting average is the ratio of the number of hits to the number of times at bat.) In the first year, he comes to bat 300 times and his batting average is .267. Assume that his at bats can be considered Bernoulli trials with probability .3 for success. Could such a low average be considered just bad luck or should he be sent back to the minor leagues? Comment on the assumption of Bernoulli trials in this situation.
6 Once upon a time, there were two railway trains competing for the passenger traffic of 1000 people leaving from Chicago at the same hour and going to Los Angeles. Assume that passengers are equally likely to choose each train. How many seats must a train have to assure a probability of .99 or better of having a seat for each passenger?

7 Assume that, as in Example 9.3, Dartmouth admits 1750 students. What is the probability of too many acceptances?

8 A club serves dinner to members only. They are seated at 12-seat tables. The manager observes over a long period of time that 95 percent of the time there are between six and nine full tables of members, and the remainder of the time the numbers are equally likely to fall above or below this range. Assume that each member decides to come with a given probability $p$, and that the decisions are independent. How many members are there? What is $p$?

9 Let $S_n$ be the number of successes in $n$ Bernoulli trials with probability .8 for success on each trial. Let $A_n = S_n/n$ be the average number of successes. In each case give the value for the limit, and give a reason for your answer.
(a) $\lim_{n \to \infty} P(A_n = .8)$.
(b) $\lim_{n \to \infty} P(.7n < S_n < .9n)$.
(c) $\lim_{n \to \infty} P(S_n < .8n + .8\sqrt{n})$.
(d) $\lim_{n \to \infty} P(.79 < A_n < .81)$.

10 Find the probability that among 10,000 random digits the digit 3 appears not more than 931 times.

11 Write a computer program to simulate 10,000 Bernoulli trials with probability .3 for success on each trial. Have the program compute the 95 percent confidence interval for the probability of success based on the proportion of successes. Repeat the experiment 100 times and see how many times the true value of .3 is included within the confidence limits.

12 A balanced coin is flipped 400 times.
Determine the number $x$ such that the probability that the number of heads is between $200 - x$ and $200 + x$ is approximately .80.

13 A noodle machine in Spumoni's spaghetti factory makes about 5 percent defective noodles even when properly adjusted. The noodles are then packed in crates containing 1900 noodles each. A crate is examined and found to contain 115 defective noodles. What is the approximate probability of finding at least this many defective noodles if the machine is properly adjusted?

14 A restaurant feeds 400 customers per day. On the average 20 percent of the customers order apple pie.
(a) Give a range (called a 95 percent confidence interval) for the number of pieces of apple pie ordered on a given day such that you can be 95 percent sure that the actual number will fall in this range.
(b) How many customers must the restaurant have, on the average, to be at least 95 percent sure that the number of customers ordering pie on that day falls in the 19 to 21 percent range?

15 Recall that if $X$ is a random variable, the cumulative distribution function of $X$ is the function $F(x)$ defined by

\[ F(x) = P(X \le x) . \]

(a) Let $S_n$ be the number of successes in $n$ Bernoulli trials with probability $p$ for success. Write a program to plot the cumulative distribution for $S_n$.
(b) Modify your program in (a) to plot the cumulative distribution $F_n^*(x)$ of the standardized random variable

\[ S_n^* = \frac{S_n - np}{\sqrt{npq}} . \]

(c) Define the normal distribution $N(x)$ to be the area under the normal curve up to the value $x$. Modify your program in (b) to plot the normal distribution as well, and compare it with the cumulative distribution of $S_n^*$. Do this for $n = 10$, 50, and 100.
16 In Example 3.11, we were interested in testing the hypothesis that a new form of aspirin is effective 80 percent of the time rather than the 60 percent of the time as reported for standard aspirin. The new aspirin is given to $n$ people. If it is effective in $m$ or more cases, we accept the claim that the new drug is effective 80 percent of the time and if not we reject the claim. Using the Central Limit Theorem, show that you can choose the number of trials $n$ and the critical value $m$ so that the probability that we reject the hypothesis when it is true is less than .01 and the probability that we accept it when it is false is also less than .01. Find the smallest value of $n$ that will suffice for this.

17 In an opinion poll it is assumed that an unknown proportion $p$ of the people are in favor of a proposed new law and a proportion $1 - p$ are against it. A sample of $n$ people is taken to obtain their opinion. The proportion $\bar{p}$ in favor in the sample is taken as an estimate of $p$. Using the Central Limit Theorem, determine how large a sample will ensure that the estimate will, with probability .95, be correct to within .01.

18 A description of a poll in a certain newspaper says that one can be 95% confident that error due to sampling will be no more than plus or minus 3 percentage points. A poll in the New York Times taken in Iowa says that "according to statistical theory, in 19 out of 20 cases the results based on such samples will differ by no more than 3 percentage points in either direction from what would have been obtained by interviewing all adult Iowans." These are both attempts to explain the concept of confidence intervals. Do both statements say the same thing? If not, which do you think is the more accurate description?
9.2 Central Limit Theorem for Discrete Independent Trials

We have illustrated the Central Limit Theorem in the case of Bernoulli trials, but this theorem applies to a much more general class of chance processes. In particular, it applies to any independent trials process such that the individual trials have finite variance. For such a process, both the normal approximation for individual terms and the Central Limit Theorem are valid.

Let $S_n = X_1 + X_2 + \cdots + X_n$ be the sum of $n$ independent discrete random variables of an independent trials process with common distribution function $m(x)$ defined on the integers, with mean $\mu$ and variance $\sigma^2$. We have seen in Section 7.2 that the distributions for such independent sums have shapes resembling the normal curve, but the largest values drift to the right and the curves flatten out (see Figure 7.6). We can prevent this just as we did for Bernoulli trials.

Standardized Sums

Consider the standardized random variable

\[ S_n^* = \frac{S_n - n\mu}{\sqrt{n\sigma^2}} . \]

This standardizes $S_n$ to have expected value 0 and variance 1. If $S_n = j$, then $S_n^*$ has the value $x_j$ with

\[ x_j = \frac{j - n\mu}{\sqrt{n\sigma^2}} . \]

We can construct a spike graph just as we did for Bernoulli trials. Each spike is centered at some $x_j$. The distance between successive spikes is

\[ b = \frac{1}{\sqrt{n\sigma^2}} , \]

and the height of the spike is

\[ h = \sqrt{n\sigma^2}\, P(S_n = j) . \]

The case of Bernoulli trials is the special case for which $X_j = 1$ if the $j$th outcome is a success and 0 otherwise; then $\mu = p$ and $\sigma = \sqrt{pq}$.

We now illustrate this process for two different discrete distributions. The first is the distribution $m$, given by

\[ m = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ .2 & .2 & .2 & .2 & .2 \end{pmatrix} . \]

In Figure 9.7 we show the standardized sums for this distribution for the cases $n = 2$ and $n = 10$. Even for $n = 2$ the approximation is surprisingly good.
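The spike heights $h$ and positions $x_j$ defined above can be computed directly by convolving the common distribution with itself. The Python sketch below is one possible way to do this, not the program behind the figures in the text; the dictionary representation of a distribution and all function names are our own choices. It reports the largest gap between a corrected spike height and the standard normal density at the same point.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def convolve(dist_a, dist_b):
    """Distribution of the sum of two independent integer-valued variables."""
    out = {}
    for x, px in dist_a.items():
        for y, py in dist_b.items():
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out


def sum_distribution(m, n):
    """Distribution of S_n, the sum of n independent draws from m."""
    dist = {0: 1.0}
    for _ in range(n):
        dist = convolve(dist, m)
    return dist


def standardized_spikes(m, n):
    """Pairs (x_j, h_j): standardized positions and corrected spike heights."""
    mu = sum(x * p for x, p in m.items())
    var = sum((x - mu) ** 2 * p for x, p in m.items())
    scale = math.sqrt(n * var)
    return [((j - n * mu) / scale, scale * prob)
            for j, prob in sorted(sum_distribution(m, n).items())]


def max_gap(m, n):
    """Largest difference between a spike height and phi at the same point."""
    return max(abs(h - phi(x)) for x, h in standardized_spikes(m, n))


uniform = {k: 0.2 for k in range(1, 6)}
for n in (2, 10):
    print(n, max_gap(uniform, n))
```

Passing a different dictionary of values and probabilities to `max_gap` shows how quickly the corrected spikes of other distributions approach the normal density as $n$ grows.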
For our second discrete distribution, we choose

\[ m = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ .4 & .3 & .1 & .1 & .1 \end{pmatrix} . \]

This distribution is quite asymmetric and the approximation is not very good for $n = 3$, but by $n = 10$ we again have an excellent approximation (see Figure 9.8). Figures 9.7 and 9.8 were produced by the program CLTIndTrialsPlot.

Figure 9.7: Distribution of standardized sums (panels: n = 2, n = 10).

Figure 9.8: Distribution of standardized sums (panels: n = 3, n = 10).

Approximation Theorem

As in the case of Bernoulli trials, these graphs suggest the following approximation theorem for the individual probabilities.

Theorem 9.3 Let $X_1, X_2, \ldots, X_n$ be an independent trials process and let $S_n = X_1 + X_2 + \cdots + X_n$. Assume that the greatest common divisor of the differences of all the values that the $X_j$ can take on is 1. Let $E(X_j) = \mu$ and $V(X_j) = \sigma^2$. Then for $n$ large,

\[ P(S_n = j) \sim \frac{\phi(x_j)}{\sqrt{n\sigma^2}} , \]

where $x_j = (j - n\mu)/\sqrt{n\sigma^2}$, and $\phi(x)$ is the standard normal density. $\Box$

The program CLTIndTrialsLocal implements this approximation. When we run this program for 6 rolls of a die, and ask for the probability that the sum of the rolls equals 21, we obtain an actual value of .09285, and a normal approximation value of .09537. If we run this program for 24 rolls of a die, and ask for the probability that the sum of the rolls is 72, we obtain an actual value of .01724 and a normal approximation value of .01705. These results show that the normal approximations are quite good.

Central Limit Theorem for a Discrete Independent Trials Process

The Central Limit Theorem for a discrete independent trials process is as follows.
Theorem 9.4 (Central Limit Theorem) Let $S_n = X_1 + X_2 + \cdots + X_n$ be the sum of $n$ discrete independent random variables with common distribution having expected value $\mu$ and variance $\sigma^2$. Then, for $a < b$,

\[ \lim_{n \to \infty} P\!\left( a < \frac{S_n - n\mu}{\sqrt{n\sigma^2}} < b \right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx . \]

$\Box$

We will give the proofs of Theorems 9.3 and 9.4 in Section 10.3. Here we consider several examples.

Examples

Example 9.5 A die is rolled 420 times. What is the probability that the sum of the rolls lies between 1400 and 1550?

The sum is a random variable

\[ S_{420} = X_1 + X_2 + \cdots + X_{420} , \]

where each $X_j$ has distribution

\[ m_X = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \end{pmatrix} . \]

We have seen that $\mu = E(X) = 7/2$ and $\sigma^2 = V(X) = 35/12$. Thus, $E(S_{420}) = 420 \cdot 7/2 = 1470$, $\sigma^2(S_{420}) = 420 \cdot 35/12 = 1225$, and $\sigma(S_{420}) = 35$. Therefore,

\[ P(1400 \le S_{420} \le 1550) \approx P\!\left( \frac{1399.5 - 1470}{35} \le S_{420}^* \le \frac{1550.5 - 1470}{35} \right) = P(-2.01 \le S_{420}^* \le 2.30) \approx \mathrm{NA}(-2.01, 2.30) = .9670 . \]

We note that the program CLTIndTrialsGlobal could be used to calculate these probabilities. $\Box$

Example 9.6 A student's grade point average is the average of his grades in 30 courses. The grades are based on 100 possible points and are recorded as integers. Assume that, in each course, the instructor makes an error in grading of $k$ with probability $|p/k|$, where $k = \pm 1, \pm 2, \pm 3, \pm 4, \pm 5$. The probability of no error is then $1 - (137/30)p$. (The parameter $p$ represents the inaccuracy of the instructor's grading.) Thus, in each course, there are two grades for the student, namely the "correct" grade and the recorded grade. So there are two average grades for the student, namely the average of the correct grades and the average of the recorded grades.

We wish to estimate the probability that these two average grades differ by less than .05 for a given student. We now assume that $p = 1/20$.
We also assume that the total error is the sum $S_{30}$ of 30 independent random variables each with distribution

\[ m_X : \begin{pmatrix} -5 & -4 & -3 & -2 & -1 & 0 & 1 & 2 & 3 & 4 & 5 \\ \frac{1}{100} & \frac{1}{80} & \frac{1}{60} & \frac{1}{40} & \frac{1}{20} & \frac{463}{600} & \frac{1}{20} & \frac{1}{40} & \frac{1}{60} & \frac{1}{80} & \frac{1}{100} \end{pmatrix} . \]

One can easily calculate that $E(X) = 0$ and $\sigma^2(X) = 1.5$. Then we have

\[ P\!\left( -.05 \le \frac{S_{30}}{30} \le .05 \right) = P(-1.5 \le S_{30} \le 1.5) = P\!\left( \frac{-1.5}{\sqrt{30 \cdot 1.5}} \le S_{30}^* \le \frac{1.5}{\sqrt{30 \cdot 1.5}} \right) = P(-.224 \le S_{30}^* \le .224) \approx \mathrm{NA}(-.224, .224) = .1772 . \]

This means that there is only a 17.7% chance that a given student's grade point average is accurate to within .05. (Thus, for example, if two candidates for valedictorian have recorded averages of 97.1 and 97.2, there is an appreciable probability that their correct averages are in the reverse order.) For a further discussion of this example, see the article by R. M. Kozelka.$^{5}$ $\Box$

$^{5}$ R. M. Kozelka, "Grade-Point Averages and the Central Limit Theorem," American Math. Monthly, vol. 86 (Nov 1979), pp. 773-777.

A More General Central Limit Theorem

In Theorem 9.4, the discrete random variables that were being summed were assumed to be independent and identically distributed. It turns out that the assumption of identical distributions can be substantially weakened. Much work has been done in this area, with an important contribution being made by J. W. Lindeberg. Lindeberg found a condition on the sequence $\{X_n\}$ which guarantees that the distribution of the sum $S_n$ is asymptotically normally distributed. Feller showed that Lindeberg's condition is necessary as well, in the sense that if the condition does not hold, then the sum $S_n$ is not asymptotically normally distributed. For a precise statement of Lindeberg's Theorem, we refer the reader to Feller.$^{6}$ A sufficient condition that is stronger (but easier to state) than Lindeberg's condition, and is weaker than the condition in Theorem 9.4, is given in the following theorem.

$^{6}$ W. Feller, Introduction to Probability Theory and its Applications, vol. 1, 3rd ed. (New York: John Wiley & Sons, 1968), p. 254.

Theorem 9.5 (Central Limit Theorem) Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of independent discrete random variables, and let $S_n = X_1 + X_2 + \cdots + X_n$. For each $n$, denote the mean and variance of $X_n$ by $\mu_n$ and $\sigma_n^2$, respectively. Define the mean and variance of $S_n$ to be $m_n$ and $s_n^2$, respectively, and assume that $s_n \to \infty$. If there exists a constant $A$, such that $|X_n| \le A$ for all $n$, then for $a < b$,

\[ \lim_{n \to \infty} P\!\left( a < \frac{S_n - m_n}{s_n} < b \right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx . \]

$\Box$

The condition that $|X_n| \le A$ for all $n$ is sometimes described by saying that the sequence $\{X_n\}$ is uniformly bounded. The condition that $s_n \to \infty$ is necessary (see Exercise 15).

We illustrate this theorem by generating a sequence of $n$ random distributions on the interval $[a, b]$. We then convolve these distributions to find the distribution of the sum of $n$ independent experiments governed by these distributions. Finally, we standardize the distribution for the sum to have mean 0 and standard deviation 1 and compare it with the normal density. The program CLTGeneral carries out this procedure.

In Figure 9.9 we show the result of running this program for $[a, b] = [-2, 4]$, and $n = 1$, 4, and 10. We see that our first random distribution is quite asymmetric. By the time we choose the sum of ten such experiments we have a very good fit to the normal curve.

The above theorem essentially says that anything that can be thought of as being made up as the sum of many small independent pieces is approximately normally distributed. This brings us to one of the most important questions that was asked about genetics in the 1800's.
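The procedure just described (random distributions, convolution, standardization, comparison with the normal density) can be sketched in Python as follows. This is a hypothetical reconstruction, not the CLTGeneral program: we assume the random distributions live on the integers in $[a, b]$ and are obtained by normalizing uniform random weights, and all function names are ours.

```python
import math
import random


def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def random_distribution(a, b, rng):
    """Assumed model: random positive weights on the integers a..b, normalized."""
    weights = [rng.random() for _ in range(a, b + 1)]
    total = sum(weights)
    return {k: w / total for k, w in zip(range(a, b + 1), weights)}


def convolve(dist_a, dist_b):
    """Distribution of the sum of two independent integer-valued variables."""
    out = {}
    for x, px in dist_a.items():
        for y, py in dist_b.items():
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out


def standardized_sum(dists):
    """Pairs (x_j, h_j) for the standardized sum of independent variables.

    x_j = (j - m_n) / s_n and h_j = s_n * P(S_n = j), where m_n and s_n are
    the mean and standard deviation of the sum S_n.
    """
    total = {0: 1.0}
    for d in dists:
        total = convolve(total, d)
    m_n = sum(j * p for j, p in total.items())
    s_n = math.sqrt(sum((j - m_n) ** 2 * p for j, p in total.items()))
    return [((j - m_n) / s_n, s_n * p) for j, p in sorted(total.items())]


rng = random.Random(0)
dists = [random_distribution(-2, 4, rng) for _ in range(10)]
for n in (1, 4, 10):
    spikes = standardized_sum(dists[:n])
    gap = max(abs(h - phi(x)) for x, h in spikes)
    print(n, gap)
```

The printed gap is the largest difference between a corrected spike height and the normal density at the same point; different seeds give different distributions, so the individual numbers will vary from run to run.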
The Normal Distribution and Genetics

When one looks at the distribution of heights of adults of one sex in a given population, one cannot help but notice that this distribution looks like the normal distribution. An example of this is shown in Figure 9.10. This figure shows the distribution of heights of 9593 women between the ages of 21 and 74. These data come from the Health and Nutrition Examination Survey I (HANES I). For this survey, a sample of the U.S. civilian population was chosen. The survey was carried out between 1971 and 1974.

A natural question to ask is "How does this come about?". Francis Galton, an English scientist in the 19th century, studied this question, and other related questions, and constructed probability models that were of great importance in explaining the genetic effects on such attributes as height. In fact, one of the most important ideas in statistics, the idea of regression to the mean, was invented by Galton in his attempts to understand these genetic effects.

[Figure 9.9: Sums of randomly chosen random variables.]

[Figure 9.10: Distribution of heights of adult women.]

Galton was faced with an apparent contradiction. On the one hand, he knew that the normal distribution arises in situations in which many small independent effects are being summed. On the other hand, he also knew that many quantitative attributes, such as height, are strongly influenced by genetic factors: tall parents tend to have tall offspring. Thus in this case, there seem to be two large effects, namely the parents. Galton was certainly aware of the fact that nongenetic factors played a role in determining the height of an individual.
Nevertheless, unless these nongenetic factors overwhelm the genetic ones, thereby refuting the hypothesis that heredity is important in determining height, it did not seem possible for sets of parents of given heights to have offspring whose heights were normally distributed.

One can express the above problem symbolically as follows. Suppose that we choose two specific positive real numbers $x$ and $y$, and then find all pairs of parents one of whom is $x$ units tall and the other of whom is $y$ units tall. We then look at all of the offspring of these pairs of parents. One can postulate the existence of a function $f(x, y)$ which denotes the genetic effect of the parents' heights on the heights of the offspring. One can then let $W$ denote the effects of the nongenetic factors on the heights of the offspring. Then, for a given set of heights $\{x, y\}$, the random variable which represents the heights of the offspring is given by

$$H = f(x, y) + W,$$

where $f$ is a deterministic function, i.e., it gives one output for a pair of inputs $\{x, y\}$. If we assume that the effect of $f$ is large in comparison with the effect of $W$, then the variance of $W$ is small. But since $f$ is deterministic, the variance of $H$ equals the variance of $W$, so the variance of $H$ is small. However, Galton observed from his data that the variance of the heights of the offspring of a given pair of parent heights is not small. This would seem to imply that inheritance plays a small role in the determination of the height of an individual. Later in this section, we will describe the way in which Galton got around this problem.

We will now consider the modern explanation of why certain traits, such as heights, are approximately normally distributed. In order to do so, we need to introduce some terminology from the field of genetics.
The cells in a living organism that are not directly involved in the transmission of genetic material to offspring are called somatic cells, and the remaining cells are called germ cells. Organisms of a given species have their genetic information encoded in sets of physical entities, called chromosomes. The chromosomes are paired in each somatic cell. For example, human beings have 23 pairs of chromosomes in each somatic cell. The sex cells contain one chromosome from each pair. In sexual reproduction, two sex cells, one from each parent, contribute their chromosomes to create the set of chromosomes for the offspring.

Chromosomes contain many subunits, called genes. Genes consist of molecules of DNA, and one gene has, encoded in its DNA, information that leads to the regulation of proteins. In the present context, we will consider those genes containing information that has an effect on some physical trait, such as height, of the organism. The pairing of the chromosomes gives rise to a pairing of the genes on the chromosomes.

In a given species, each gene can be any one of several forms. These various forms are called alleles. One should think of the different alleles as potentially producing different effects on the physical trait in question. Of the two alleles that are found in a given gene pair in an organism, one of the alleles came from one parent and the other allele came from the other parent. The possible types of pairs of alleles (without regard to order) are called genotypes.

If we assume that the height of a human being is largely controlled by a specific gene, then we are faced with the same difficulty that Galton was. We are assuming that each parent has a pair of alleles which largely controls their heights.
Since each parent contributes one allele of this gene pair to each of its offspring, there are four possible allele pairs for the offspring at this gene location. The assumption is that these pairs of alleles largely control the height of the offspring, and we are also assuming that genetic factors outweigh nongenetic factors. It follows that among the offspring we should see several modes in the height distribution of the offspring, one mode corresponding to each possible pair of alleles. This distribution does not correspond to the observed distribution of heights.

An alternative hypothesis, which does explain the observation of normally distributed heights in offspring of a given sex, is the multiple-gene hypothesis. Under this hypothesis, we assume that there are many genes that affect the height of an individual. These genes may differ in the amount of their effects. Thus, we can represent each gene pair by a random variable $X_i$, where the value of the random variable is the allele pair's effect on the height of the individual. Thus, for example, if each parent has two different alleles in the gene pair under consideration, then the offspring has one of four possible pairs of alleles at this gene location. Now the height of the offspring is a random variable, which can be expressed as

$$H = X_1 + X_2 + \cdots + X_n + W,$$

if there are $n$ genes that affect height. (Here, as before, the random variable $W$ denotes nongenetic effects.) Although $n$ is fixed, if it is fairly large, then Theorem 9.5 implies that the sum $X_1 + X_2 + \cdots + X_n$ is approximately normally distributed. Now, if we assume that the $X_i$'s have a significantly larger cumulative effect than $W$ does, then $H$ is approximately normally distributed.

Another observed feature of the distribution of heights of adults of one sex in
a population is that the variance does not seem to increase or decrease from one generation to the next. This was known at the time of Galton, and his attempts to explain this led him to the idea of regression to the mean. This idea will be discussed further in the historical remarks at the end of the section. (The reason that we only consider one sex is that human heights are clearly sex-linked, and in general, if we have two populations that are each normally distributed, then their union need not be normally distributed.)

Using the multiple-gene hypothesis, it is easy to explain why the variance should be constant from generation to generation. We begin by assuming that for a specific gene location, there are $k$ alleles, which we will denote by $A_1, A_2, \ldots, A_k$. We assume that the offspring are produced by random mating. By this we mean that given any offspring, it is equally likely that it came from any pair of parents in the preceding generation. There is another way to look at random mating that makes the calculations easier. We consider the set $S$ of all of the alleles (at the given gene location) in all of the germ cells of all of the individuals in the parent generation. In terms of the set $S$, by random mating we mean that each pair of alleles in $S$ is equally likely to reside in any particular offspring. (The reader might object to this way of thinking about random mating, as it allows two alleles from the same parent to end up in an offspring; but if the number of individuals in the parent population is large, then whether or not we allow this event does not affect the probabilities very much.)
For $1 \le i \le k$, we let $p_i$ denote the proportion of alleles in the parent population that are of type $A_i$. It is clear that this is the same as the proportion of alleles in the germ cells of the parent population, assuming that each parent produces roughly the same number of germ cells. Consider the distribution of alleles in the offspring. Since each germ cell is equally likely to be chosen for any particular offspring, the distribution of alleles in the offspring is the same as in the parents.

We next consider the distribution of genotypes in the two generations. We will prove the following fact: the distribution of genotypes in the offspring generation depends only upon the distribution of alleles in the parent generation (in particular, it does not depend upon the distribution of genotypes in the parent generation). Consider the possible genotypes; there are $k(k+1)/2$ of them. Under our assumptions, the genotype $A_iA_i$ will occur with frequency $p_i^2$, and the genotype $A_iA_j$, with $i \ne j$, will occur with frequency $2p_ip_j$. Thus, the frequencies of the genotypes depend only upon the allele frequencies in the parent generation, as claimed.

This means that if we start with a certain generation, and a certain distribution of alleles, then in all generations after the one we started with, both the allele distribution and the genotype distribution will be fixed. This last statement is known as the Hardy-Weinberg Law.

We can describe the consequences of this law for the distribution of heights among adults of one sex in a population. We recall that the height of an offspring was given by a random variable $H$, where

$$H = X_1 + X_2 + \cdots + X_n + W,$$

with the $X_i$'s corresponding to the genes that affect height, and the random variable $W$ denoting nongenetic effects.
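The genotype frequencies behind the Hardy-Weinberg Law can be checked with a short computation. The sketch below is illustrative only: the function names are hypothetical, alleles are labelled by the indices $0, \ldots, k-1$, and exact fractions are used so that the equality of allele proportions between generations can be tested without rounding error.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def genotype_frequencies(p):
    """Genotype frequencies under random mating for allele proportions p.

    Genotype (i, i) has frequency p_i^2, and genotype (i, j) with i < j
    has frequency 2 * p_i * p_j.
    """
    k = len(p)
    return {
        (i, j): p[i] * p[j] if i == j else 2 * p[i] * p[j]
        for i, j in combinations_with_replacement(range(k), 2)
    }


def allele_frequencies(genotypes, k):
    """Allele proportions implied by a genotype distribution."""
    freq = [Fraction(0)] * k
    for (i, j), g in genotypes.items():
        freq[i] += g / 2  # each genotype carries two alleles,
        freq[j] += g / 2  # so each allele slot gets half the weight
    return freq


if __name__ == "__main__":
    parents = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    offspring_genotypes = genotype_frequencies(parents)
    print(len(offspring_genotypes))            # k(k+1)/2 genotypes
    print(allele_frequencies(offspring_genotypes, 3) == parents)
```

Feeding the offspring allele proportions back into `genotype_frequencies` reproduces the same genotype distribution, which is the fixed-point behaviour the law describes.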
The Hardy-Weinberg Law states that for each $X_i$, the distribution in the offspring generation is the same as the distribution in the parent generation. Thus, if we assume that the distribution of $W$ is roughly the same from generation to generation (or if we assume that its effects are small), then the distribution of $H$ is the same from generation to generation. (In fact, dietary effects are part of $W$, and it is clear that in many human populations, diets have changed quite a bit from one generation to the next in recent times. This change is thought to be one of the reasons that humans, on the average, are getting taller. It is also the case that the effects of $W$ are thought to be small relative to the genetic effects of the parents.)

Discussion

Generally speaking, the Central Limit Theorem contains more information than the Law of Large Numbers, because it gives us detailed information about the shape of the distribution of $S_n^*$; for large $n$ the shape is approximately the same as the shape of the standard normal density. More specifically, the Central Limit Theorem says that if we standardize and height-correct the distribution of $S_n$, then the normal density function is a very good approximation to this distribution when $n$ is large. Thus, we have a computable approximation for the distribution for $S_n$, which provides us with a powerful technique for generating answers for all sorts of questions about sums of independent random variables, even if the individual random variables have different distributions.

Historical Remarks

In the mid-1800's, the Belgian mathematician Quetelet[7] had shown empirically that the normal distribution occurred in real data, and had also given a method for fitting the normal curve to a given data set. Laplace[8] had shown much earlier that the sum of many independent identically distributed random variables is approximately normal.
Galton knew that certain physical traits in a population appeared to be approximately normally distributed, but he did not consider Laplace's result to be a good explanation of how this distribution comes about. We give a quote from Galton that appears in the fascinating book by S. Stigler[9] on the history of statistics:

First, let me point out a fact which Quetelet and all writers who have followed in his paths have unaccountably overlooked, and which has an intimate bearing on our work tonight. It is that, although characteristics of plants and animals conform to the law, the reason of their doing so is as yet totally unexplained. The essence of the law is that differences should be wholly due to the collective actions of a host of independent petty influences in various combinations...Now the processes of heredity...are not petty influences, but very important ones...The conclusion is...that the processes of heredity must work harmoniously with the law of deviation, and be themselves in some sense conformable to it.

[7] S. Stigler, The History of Statistics, (Cambridge: Harvard University Press, 1986), p. 203.
[8] ibid., p. 136.
[9] ibid., p. 281.

[Figure 9.11: Two-stage version of the quincunx.]

Galton invented a device known as a quincunx (now commonly called a Galton board), which we used in Example 3.10 to show how to physically obtain a binomial distribution. Of course, the Central Limit Theorem says that for large values of the parameter $n$, the binomial distribution is approximately normal. Galton used the quincunx to explain how inheritance affects the distribution of a trait among offspring.

We consider, as Galton did, what happens if we interrupt, at some intermediate height, the progress of the shot that is falling in the quincunx. The reader is referred to Figure 9.11.
This figure is a drawing of Karl Pearson,[10] based upon Galton's notes. In this figure, the shot is being temporarily segregated into compartments at the line AB. (The line $A'B'$ forms a platform on which the shot can rest.) If the line AB is not too close to the top of the quincunx, then the shot will be approximately normally distributed at this line. Now suppose that one compartment is opened, as shown in the figure. The shot from that compartment will fall, forming a normal distribution at the bottom of the quincunx. If now all of the compartments are opened, all of the shot will fall, producing the same distribution as would occur if the shot were not temporarily stopped at the line AB. But the action of stopping the shot at the line AB, and then releasing the compartments one at a time, is just the same as convoluting two normal distributions. The normal distributions at the bottom, corresponding to each compartment at the line AB, are being mixed, with their weights being the number of shot in each compartment. On the other hand, it is already known that if the shot are unimpeded, the final distribution is approximately normal. Thus, this device shows that the convolution of two normal distributions is again normal.

[10] Karl Pearson, The Life, Letters and Labours of Francis Galton, vol. IIIB, (Cambridge at the University Press 1930.) p. 466. Reprinted with permission.

Galton also considered the quincunx from another perspective. He segregated into seven groups, by weight, a set of 490 sweet pea seeds. He gave 10 seeds from each of the seven groups to each of seven friends, who grew the plants from the seeds. Galton found that each group produced seeds whose weights were normally distributed.
(The sweet pea reproduces by self-pollination, so he did not need to consider the possibility of interaction between different groups.) In addition, he found that the variances of the weights of the offspring were the same for each group. This segregation into groups corresponds to the compartments at the line AB in the quincunx. Thus, the sweet peas were acting as though they were being governed by a convolution of normal distributions.

He now was faced with a problem. We have shown in Chapter 7, and Galton knew, that the convolution of two normal distributions produces a normal distribution with a larger variance than either of the original distributions. But his data on the sweet pea seeds showed that the variance of the offspring population was the same as the variance of the parent population. His answer to this problem was to postulate a mechanism that he called reversion, and is now called regression to the mean. As Stigler puts it:[11]

The seven groups of progeny were normally distributed, but not about their parents' weight. Rather they were in every case distributed about a value that was closer to the average population weight than was that of the parent. Furthermore, this reversion followed "the simplest possible law," that is, it was linear. The average deviation of the progeny from the population average was in the same direction as that of the parent, but only a third as great. The mean progeny reverted to type, and the increased variation was just sufficient to maintain the population variability.

Galton illustrated reversion with the illustration shown in Figure 9.12.[12] The parent population is shown at the top of the figure, and the slanted lines are meant to correspond to the reversion effect. The offspring population is shown at the bottom of the figure.

[11] ibid., p. 282.
[12] Karl Pearson, The Life, Letters and Labours of Francis Galton, vol.
IIIA, (Cambridge at the University Press 1930.) p. 9. Reprinted with permission.

[Figure 9.12: Galton's explanation of reversion.]

Exercises

1 A die is rolled 24 times. Use the Central Limit Theorem to estimate the probability that
  (a) the sum is greater than 84.
  (b) the sum is equal to 84.

2 A random walker starts at 0 on the $x$-axis and at each time unit moves 1 step to the right or 1 step to the left with probability 1/2. Estimate the probability that, after 100 steps, the walker is more than 10 steps from the starting position.

3 A piece of rope is made up of 100 strands. Assume that the breaking strength of the rope is the sum of the breaking strengths of the individual strands. Assume further that this sum may be considered to be the sum of an independent trials process with 100 experiments each having expected value of 10 pounds and standard deviation 1. Find the approximate probability that the rope will support a weight
  (a) of 1000 pounds.
  (b) of 970 pounds.

4 Write a program to find the average of 1000 random digits 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9. Have the program test to see if the average lies within three standard deviations of the expected value of 4.5. Modify the program so that it repeats this simulation 1000 times and keeps track of the number of times the test is passed. Does your outcome agree with the Central Limit Theorem?

5 A die is thrown until the first time the total sum of the face values of the die is 700 or greater. Estimate the probability that, for this to happen,
  (a) more than 210 tosses are required.
  (b) less than 190 tosses are required.
  (c) between 180 and 210 tosses, inclusive, are required.

6 A bank accepts rolls of pennies and gives 50 cents credit to a customer without counting the contents.
Assume that a roll contains 49 pennies 30 percent of the time, 50 pennies 60 percent of the time, and 51 pennies 10 percent of the time.
  (a) Find the expected value and the variance for the amount that the bank loses on a typical roll.
  (b) Estimate the probability that the bank will lose more than 25 cents in 100 rolls.
  (c) Estimate the probability that the bank will lose exactly 25 cents in 100 rolls.
  (d) Estimate the probability that the bank will lose any money in 100 rolls.
  (e) How many rolls does the bank need to collect to have a 99 percent chance of a net loss?

7 A surveying instrument makes an error of $-2$, $-1$, 0, 1, or 2 feet with equal probabilities when measuring the height of a 200-foot tower.
  (a) Find the expected value and the variance for the height obtained using this instrument once.
  (b) Estimate the probability that in 18 independent measurements of this tower, the average of the measurements is between 199 and 201, inclusive.

8 For Example 9.6 estimate $P(S_{30} = 0)$. That is, estimate the probability that the errors cancel out and the student's grade point average is correct.

9 Prove the Law of Large Numbers using the Central Limit Theorem.

10 Peter and Paul match pennies 10,000 times. Describe briefly what each of the following theorems tells you about Peter's fortune.
  (a) The Law of Large Numbers.
  (b) The Central Limit Theorem.

11 A tourist in Las Vegas was attracted by a certain gambling game in which the customer stakes 1 dollar on each play; a win then pays the customer 2 dollars plus the return of her stake, although a loss costs her only her stake. Las Vegas insiders, and alert students of probability theory, know that the probability of winning at this game is 1/4. When driven from the tables by hunger, the tourist had played this game 240 times.
Assuming that no near miracles happened, about how much poorer was the tourist upon leaving the casino? What is the probability that she lost no money?

12 We have seen that, in playing roulette at Monte Carlo (Example 6.13), betting 1 dollar on red or 1 dollar on 17 amounts to choosing between the distributions

$$m_X = \begin{pmatrix} -1 & -1/2 & 1 \\ 18/37 & 1/37 & 18/37 \end{pmatrix} \quad \text{or} \quad m_X = \begin{pmatrix} -1 & 35 \\ 36/37 & 1/37 \end{pmatrix}.$$

You plan to choose one of these methods and use it to make 100 1-dollar bets using the method chosen. Using the Central Limit Theorem, estimate the probability of winning any money for each of the two games. Compare your estimates with the actual probabilities, which can be shown, from exact calculations, to equal .437 and .509 to three decimal places.

13 In Example 9.6 find the largest value of $p$ that gives probability .954 that the first decimal place is correct.

14 It has been suggested that Example 9.6 is unrealistic, in the sense that the probabilities of errors are too low. Make up your own (reasonable) estimate for the distribution $m(x)$, and determine the probability that a student's grade point average is accurate to within .05. Also determine the probability that it is accurate to within .5.

15 Find a sequence of uniformly bounded discrete independent random variables $\{X_n\}$ such that the variance of their sum does not tend to $\infty$ as $n \to \infty$, and such that their sum is not asymptotically normally distributed.

9.3 Central Limit Theorem for Continuous Independent Trials

We have seen in Section 9.2 that the distribution function for the sum of a large number $n$ of independent discrete random variables with mean $\mu$ and variance $\sigma^2$ tends to look like a normal density with mean $n\mu$ and variance $n\sigma^2$. What is remarkable about this result is that it holds for any distribution with finite mean and variance.
We shall see in this section that the same result also holds true for continuous random variables having a common density function.

Let us begin by looking at some examples to see whether such a result is even plausible.

Standardized Sums

Example 9.7 Suppose we choose $n$ random numbers from the interval $[0, 1]$ with uniform density. Let $X_1, X_2, \ldots, X_n$ denote these choices, and $S_n = X_1 + X_2 + \cdots + X_n$ their sum.

We saw in Example 7.9 that the density function for $S_n$ tends to have a normal shape, but is centered at $n/2$ and is flattened out. In order to compare the shapes of these density functions for different values of $n$, we proceed as in the previous section: we standardize $S_n$ by defining

$$S_n^* = \frac{S_n - n\mu}{\sqrt{n}\,\sigma}.$$

Then we see that for all $n$ we have

$$E(S_n^*) = 0, \qquad V(S_n^*) = 1.$$

The density function for $S_n^*$ is just a standardized version of the density function for $S_n$ (see Figure 9.13). □

[Figure 9.13: Density function for $S_n^*$ (uniform case, $n = 2, 3, 4, 10$).]

Example 9.8 Let us do the same thing, but now choose numbers from the interval $[0, +\infty)$ with an exponential density with parameter $\lambda$. Then (see Example 6.26)

$$\mu = E(X_i) = \frac{1}{\lambda}, \qquad \sigma^2 = V(X_i) = \frac{1}{\lambda^2}.$$

Here we know the density function for $S_n$ explicitly (see Section 7.2). We can use Corollary 5.1 to calculate the density function for $S_n^*$. We obtain

$$f_{S_n}(x) = \frac{\lambda e^{-\lambda x} (\lambda x)^{n-1}}{(n-1)!}, \qquad f_{S_n^*}(x) = \frac{\sqrt{n}}{\lambda}\, f_{S_n}\!\left(\frac{\sqrt{n}\,x + n}{\lambda}\right).$$

The graph of the density function for $S_n^*$ is shown in Figure 9.14. □

These examples make it seem plausible that the density function for the normalized random variable $S_n^*$ for large $n$ will look very much like the normal density with mean 0 and variance 1 in the continuous case as well as in the discrete case. The Central Limit Theorem makes this statement precise.
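The two density formulas of Example 9.8 can be evaluated directly to see how close the standardized density is to the normal density. The following Python sketch is an illustration under stated assumptions, not part of the text: the function names are hypothetical, the density is computed through logarithms for numerical stability, and the moments are approximated with a simple midpoint rule on a finite interval.

```python
import math


def gamma_sum_density(x, n, lam):
    """Density of S_n, the sum of n independent exponential(lam) variables."""
    if x <= 0:
        return 0.0
    # lam * exp(-lam x) * (lam x)^(n-1) / (n-1)!, computed in logs for stability
    log_f = (math.log(lam) - lam * x
             + (n - 1) * math.log(lam * x) - math.lgamma(n))
    return math.exp(log_f)


def standardized_density(x, n, lam):
    """Density of S_n^* = (S_n - n/lam) / (sqrt(n)/lam)."""
    scale = math.sqrt(n) / lam
    return scale * gamma_sum_density(scale * x + n / lam, n, lam)


def normal_density(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def moments(n, lam, lo=-10.0, hi=30.0, steps=20000):
    """Midpoint-rule estimates of the total mass, mean, and second moment of S_n^*."""
    h = (hi - lo) / steps
    total = mean = second = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        mass = standardized_density(x, n, lam) * h
        total += mass
        mean += x * mass
        second += x * x * mass
    return total, mean, second


if __name__ == "__main__":
    for n in (2, 3, 10, 30):
        print(n, standardized_density(0.0, n, 1.0), normal_density(0.0))
```

Because the standardization removes the scale, the value of $\lambda$ does not affect the standardized density, so a single choice such as $\lambda = 1$ is enough for the comparison.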
[Figure 9.14: Density function for $S_n^*$ (exponential case, $n = 2, 3, 10, 30$, $\lambda = 1$).]

Central Limit Theorem

Theorem 9.6 (Central Limit Theorem) Let $S_n = X_1 + X_2 + \cdots + X_n$ be the sum of $n$ independent continuous random variables with common density function $p$ having expected value $\mu$ and variance $\sigma^2$. Let $S_n^* = (S_n - n\mu)/(\sqrt{n}\,\sigma)$. Then we have, for all $a < b$,

$$\lim_{n \to \infty} P(a < S_n^* < b) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx.$$ □

We shall give a proof of this theorem in Section 10.3. We will now look at some examples.

Example 9.9 Suppose a surveyor wants to measure a known distance, say of 1 mile, using a transit and some method of triangulation. He knows that because of possible motion of the transit, atmospheric distortions, and human error, any one measurement is apt to be slightly in error. He plans to make several measurements and take an average. He assumes that his measurements are independent random variables with a common distribution of mean $\mu = 1$ and standard deviation $\sigma = .0002$ (so, if the errors are approximately normally distributed, then his measurements are within 1 foot of the correct distance about 65% of the time). What can he say about the average?

He can say that if $n$ is large, the average $S_n/n$ has a density function that is approximately normal, with mean $\mu = 1$ mile, and standard deviation $\sigma = .0002/\sqrt{n}$ miles.

How many measurements should he make to be reasonably sure that his average lies within .0001 of the true value? The Chebyshev inequality says

$$P\left(\left|\frac{S_n}{n} - \mu\right| \ge .0001\right) \le \frac{(.0002)^2}{n(10^{-8})} = \frac{4}{n},$$

so that we must have $n \ge 80$ before the probability that his error is less than .0001 exceeds .95.

We have already noticed that the estimate in the Chebyshev inequality is not always a good one, and here is a case in point.
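The Chebyshev figure of 80 measurements can be reproduced with exact arithmetic. The sketch below is only illustrative: the constant and function names are hypothetical, and exact fractions are used so that the boundary case $4/n = .05$ is not disturbed by floating-point rounding.

```python
import math
from fractions import Fraction

SIGMA = Fraction("0.0002")      # standard deviation of one measurement (miles)
TOLERANCE = Fraction("0.0001")  # required accuracy of the average (miles)


def chebyshev_bound(n, sigma=SIGMA, eps=TOLERANCE):
    """Chebyshev upper bound on P(|S_n/n - mu| >= eps)."""
    return sigma ** 2 / (n * eps ** 2)


def measurements_needed(max_error_prob, sigma=SIGMA, eps=TOLERANCE):
    """Smallest n for which the Chebyshev bound is at most max_error_prob."""
    return math.ceil(sigma ** 2 / (Fraction(max_error_prob) * eps ** 2))


if __name__ == "__main__":
    print(measurements_needed(Fraction("0.05")))  # error probability at most .05
    for n in (16, 36, 80):
        print(n, float(chebyshev_bound(n)))
```

With these values the bound is $4/n$, so at $n = 16$ Chebyshev guarantees only that the error probability is at most $1/4$.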
If we assume that $n$ is large enough so that the density for $S_n^*$ is approximately normal, then we have

$$P\left(\left|\frac{S_n}{n} - \mu\right| < .0001\right) = P\left(-.5\sqrt{n} < S_n^* < +.5\sqrt{n}\right) \approx \frac{1}{\sqrt{2\pi}} \int_{-.5\sqrt{n}}^{+.5\sqrt{n}} e^{-x^2/2}\, dx,$$

and this last expression is greater than .95 if $.5\sqrt{n} \ge 2$. This says that it suffices to take $n = 16$ measurements for the same results. This second calculation is stronger, but depends on the assumption that $n = 16$ is large enough to establish the normal density as a good approximation to $S_n^*$, and hence to $S_n$. The Central Limit Theorem here says nothing about how large $n$ has to be. In most cases involving sums of independent random variables, a good rule of thumb is that for $n \ge 30$, the approximation is a good one. In the present case, if we assume that the errors are approximately normally distributed, then the approximation is probably fairly good even for $n = 16$. □

Estimating the Mean

Example 9.10 (Continuation of Example 9.9) Now suppose our surveyor is measuring an unknown distance with the same instruments under the same conditions. He takes 36 measurements and averages them. How sure can he be that his measurement lies within .0001 of the true value?

Again using the normal approximation, with $\sigma = .0002$ and $n = 36$, we get

$$P\left(\left|\frac{S_n}{n} - \mu\right| < .0001\right) = P\left(|S_n^*| < .5\sqrt{n}\right) \approx \frac{1}{\sqrt{2\pi}} \int_{-3}^{3} e^{-x^2/2}\, dx \approx .997.$$

This means that the surveyor can be 99.7 percent sure that his average is within .0001 of the true value. To improve his confidence, he can take more measurements, or require less accuracy, or improve the quality of his measurements (i.e., reduce the variance $\sigma^2$). In each case, the Central Limit Theorem gives quantitative information about the confidence of a measurement process, assuming always that the normal approximation is valid.

Now suppose the surveyor does not know the mean or standard deviation of his measurements, but assumes that they are independent. How should he proceed?
Again, he makes several measurements of a known distance and averages them. As before, the average error is approximately normally distributed, but now with unknown mean and variance. □

Sample Mean

If he knows that the standard deviation $\sigma$ of the error distribution is .0002, then he can estimate the mean $\mu$ by taking the average, or sample mean, of, say, 36 measurements:

$$\bar{\mu} = \frac{x_1 + x_2 + \cdots + x_n}{n},$$

where $n = 36$. Then, as before, $E(\bar{\mu}) = \mu$. Moreover, the preceding argument shows that

$$P(|\bar{\mu} - \mu| < .0001) \approx .997.$$

The interval $(\bar{\mu} - .0001, \bar{\mu} + .0001)$ is called the 99.7% confidence interval for $\mu$ (see Example 9.4).

Sample Variance

If he does not know the variance $\sigma^2$ of the error distribution, then he can estimate $\sigma^2$ by the sample variance:

$$\bar{\sigma}^2 = \frac{(x_1 - \bar{\mu})^2 + (x_2 - \bar{\mu})^2 + \cdots + (x_n - \bar{\mu})^2}{n},$$

where $n = 36$. The Law of Large Numbers, applied to the random variables $(X_i - \bar{\mu})^2$, says that for large $n$, the sample variance $\bar{\sigma}^2$ lies close to the variance $\sigma^2$, so that the surveyor can use $\bar{\sigma}^2$ in place of $\sigma^2$ in the argument above.

Experience has shown that, in most practical problems of this type, the sample variance is a good estimate for the variance, and can be used in place of the variance to determine confidence levels for the sample mean. This means that we can rely on the Law of Large Numbers for estimating the variance, and the Central Limit Theorem for estimating the mean.

We can check this in some special cases. Suppose we know that the error distribution is normal, with unknown mean and variance. Then we can take a sample of $n$ measurements, find the sample mean $\bar{\mu}$ and sample variance $\bar{\sigma}^2$, and form

$$T_n^* = \frac{S_n - n\mu}{\sqrt{n}\,\bar{\sigma}},$$

where $n = 36$. We expect $T_n^*$ to be a good approximation for $S_n^*$ for large $n$.

t-Density

The statistician W. S. Gosset[13] has shown that in this case $T_n^*$ has a density function that is not normal but rather a t-density with $n$ degrees of freedom.
(The number $n$ of degrees of freedom is simply a parameter which tells us which t-density to use.) In this case we can use the t-density in place of the normal density to determine confidence levels for $\mu$. As $n$ increases, the t-density approaches the normal density. Indeed, even for $n = 8$ the t-density and normal density are practically the same (see Figure 9.15).

¹³ W. S. Gosset discovered the distribution we now call the t-distribution while working for the Guinness Brewery in Dublin. He wrote under the pseudonym "Student." The results discussed here first appeared in Student, "The Probable Error of a Mean," Biometrika, vol. 6 (1908), pp. 1-24.

Figure 9.15: Graph of t-density for $n = 1, 3, 8$ and the normal density with $\mu = 0$, $\sigma = 1$.

Exercises

Notes on computer problems:

(a) Simulation: Recall (see Corollary 5.2) that
$$X = F^{-1}(rnd)$$
will simulate a random variable with density $f(x)$ and distribution
$$F(x) = \int_{-\infty}^{x} f(t)\,dt.$$
In the case that $f(x)$ is a normal density function with mean $\mu$ and standard deviation $\sigma$, where neither $F$ nor $F^{-1}$ can be expressed in closed form, use instead
$$X = \sigma\sqrt{-2\log(rnd)}\,\cos 2\pi(rnd) + \mu,$$
where the two occurrences of $rnd$ are separate random numbers.

(b) Bar graphs: you should aim for about 20 to 30 bars (of equal width) in your graph. You can achieve this by a good choice of the range $[x_{\min}, x_{\max}]$ and the number of bars (for instance, $[\mu - 3\sigma, \mu + 3\sigma]$ with 30 bars will work in many cases). Experiment!

1 Let $X$ be a continuous random variable with mean $\mu(X)$ and variance $\sigma^2(X)$, and let $X^* = (X - \mu)/\sigma$ be its standardized version. Verify directly that $\mu(X^*) = 0$ and $\sigma^2(X^*) = 1$.
2 Let $\{X_k\}$, $1 \leq k \leq n$, be a sequence of independent random variables, all with mean 0 and variance 1, and let $S_n$, $S_n^*$, and $A_n$ be their sum, standardized sum, and average, respectively. Verify directly that $S_n^* = S_n/\sqrt{n} = \sqrt{n}\,A_n$.

3 Let $\{X_k\}$, $1 \leq k \leq n$, be a sequence of random variables, all with mean $\mu$ and variance $\sigma^2$, and $Y_k = X_k^*$ be their standardized versions. Let $S_n$ and $T_n$ be the sum of the $X_k$ and $Y_k$, and $S_n^*$ and $T_n^*$ their standardized version. Show that $S_n^* = T_n^* = T_n/\sqrt{n}$.

4 Suppose we choose independently 25 numbers at random (uniform density) from the interval $[0, 20]$. Write the normal densities that approximate the densities of their sum $S_{25}$, their standardized sum $S_{25}^*$, and their average $A_{25}$.

5 Write a program to choose independently 25 numbers at random from $[0, 20]$, compute their sum $S_{25}$, and repeat this experiment 1000 times. Make a bar graph for the density of $S_{25}$ and compare it with the normal approximation of Exercise 4. How good is the fit? Now do the same for the standardized sum $S_{25}^*$ and the average $A_{25}$.

6 In general, the Central Limit Theorem gives a better estimate than Chebyshev's inequality for the average of a sum. To see this, let $A_{25}$ be the average calculated in Exercise 5, and let $N$ be the normal approximation for $A_{25}$. Modify your program in Exercise 5 to provide a table of the function $F(x) = P(|A_{25} - 10| \geq x)$ = fraction of the total of 1000 trials for which $|A_{25} - 10| \geq x$. Do the same for the function $f(x) = P(|N - 10| \geq x)$. (You can use the normal table, Table 9.4, or the procedure NormalArea for this.) Now plot on the same axes the graphs of $F(x)$, $f(x)$, and the Chebyshev function $g(x) = 4/(3x^2)$. How do $f(x)$ and $g(x)$ compare as estimates for $F(x)$?
7 The Central Limit Theorem says the sums of independent random variables tend to look normal, no matter what crazy distribution the individual variables have. Let us test this by a computer simulation. Choose independently 25 numbers from the interval $[0, 1]$ with the probability density $f(x)$ given below, and compute their sum $S_{25}$. Repeat this experiment 1000 times, and make up a bar graph of the results. Now plot on the same graph the density $\phi(x) = \mathrm{normal}(x, \mu(S_{25}), \sigma(S_{25}))$. How well does the normal density fit your bar graph in each case?
(a) $f(x) = 1$.
(b) $f(x) = 2x$.
(c) $f(x) = 3x^2$.
(d) $f(x) = 4|x - 1/2|$.
(e) $f(x) = 2 - 4|x - 1/2|$.

8 Repeat the experiment described in Exercise 7 but now choose the 25 numbers from $[0, \infty)$, using $f(x) = e^{-x}$.

9 How large must $n$ be before $S_n = X_1 + X_2 + \cdots + X_n$ is approximately normal? This number is often surprisingly small. Let us explore this question with a computer simulation. Choose $n$ numbers from $[0, 1]$ with probability density $f(x)$, where $n = 3, 6, 12, 20$, and $f(x)$ is each of the densities in Exercise 7. Compute their sum $S_n$, repeat this experiment 1000 times, and make up a bar graph of 20 bars of the results. How large must $n$ be before you get a good fit?

10 A surveyor is measuring the height of a cliff known to be about 1000 feet. He assumes his instrument is properly calibrated and that his measurement errors are independent, with mean $\mu = 0$ and variance $\sigma^2 = 10$. He plans to take $n$ measurements and form the average. Estimate, using (a) Chebyshev's inequality and (b) the normal approximation, how large $n$ should be if he wants to be 95 percent sure that his average falls within 1 foot of the true value. Now estimate, using (a) and (b), what value $\sigma^2$ should have if he wants to make only 10 measurements with the same confidence.
11 The price of one share of stock in the Pilsdorff Beer Company (see Exercise 8.2.12) is given by $Y_n$ on the $n$th day of the year. Finn observes that the differences $X_n = Y_{n+1} - Y_n$ appear to be independent random variables with a common distribution having mean $\mu = 0$ and variance $\sigma^2 = 1/4$. If $Y_1 = 100$, estimate the probability that $Y_{365}$ is
(a) $\geq 100$.
(b) $\geq 110$.
(c) $\geq 120$.

12 Test your conclusions in Exercise 11 by computer simulation. First choose 364 numbers $X_i$ with density $f(x) = \mathrm{normal}(x, 0, 1/4)$. Now form the sum $Y_{365} = 100 + X_1 + X_2 + \cdots + X_{364}$, and repeat this experiment 200 times. Make up a bar graph on $[50, 150]$ of the results, superimposing the graph of the approximating normal density. What does this graph say about your answers in Exercise 11?

13 Physicists say that particles in a long tube are constantly moving back and forth along the tube, each with a velocity $V_k$ (in cm/sec) at any given moment that is normally distributed, with mean $\mu = 0$ and variance $\sigma^2 = 1$. Suppose there are $10^{20}$ particles in the tube.
(a) Find the mean and variance of the average velocity of the particles.
(b) What is the probability that the average velocity is $\geq 10^{-9}$ cm/sec?

14 An astronomer makes $n$ measurements of the distance between Jupiter and a particular one of its moons. Experience with the instruments used leads her to believe that for the proper units the measurements will be normally distributed with mean $d$, the true distance, and variance 16. She performs a series of $n$ measurements. Let
$$A_n = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
be the average of these measurements.
(a) Show that
$$P\left(A_n - \frac{8}{\sqrt{n}} \leq d \leq A_n + \frac{8}{\sqrt{n}}\right) \approx .95.$$
(b) When nine measurements were taken, the average of the distances turned out to be 23.2 units. Putting the observed values in (a) gives the 95 percent confidence interval for the unknown distance $d$. Compute this interval.
(c) Why not say in (b) more simply that the probability is .95 that the value of $d$ lies in the computed confidence interval?
(d) What changes would you make in the above procedure if you wanted to compute a 99 percent confidence interval?

15 Plot a bar graph similar to that in Figure 9.10 for the heights of the mid-parents in Galton's data as given in Appendix B and compare this bar graph to the appropriate normal curve.

Chapter 10
Generating Functions

10.1 Generating Functions for Discrete Distributions

So far we have considered in detail only the two most important attributes of a random variable, namely, the mean and the variance. We have seen how these attributes enter into the fundamental limit theorems of probability, as well as into all sorts of practical calculations. We have seen that the mean and variance of a random variable contain important information about the random variable, or, more precisely, about the distribution function of that variable. Now we shall see that the mean and variance do not contain all the available information about the density function of a random variable. To begin with, it is easy to give examples of different distribution functions which have the same mean and the same variance. For instance, suppose $X$ and $Y$ are random variables, with distributions
$$p_X = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 0 & 1/4 & 1/2 & 0 & 0 & 1/4 \end{pmatrix}, \qquad p_Y = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 1/4 & 0 & 0 & 1/2 & 1/4 & 0 \end{pmatrix}.$$
Then with these choices, we have $E(X) = E(Y) = 7/2$ and $V(X) = V(Y) = 9/4$, and yet certainly $p_X$ and $p_Y$ are quite different density functions.

This raises a question: If $X$ is a random variable with range $\{x_1, x_2, \ldots\}$ of at most countable size, and distribution function $p = p_X$, and if we know its mean $\mu = E(X)$ and its variance $\sigma^2 = V(X)$, then what else do we need to know to determine $p$ completely?
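The claim that $p_X$ and $p_Y$ share a mean and a variance can be checked mechanically. The following is a minimal sketch in Python using exact rational arithmetic; the function name `mean_and_variance` and the dictionary representation of a distribution are illustrative choices, not part of the text.

```python
from fractions import Fraction as F

def mean_and_variance(dist):
    """Return (E(X), V(X)) for a finite distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    second_moment = sum(x * x * p for x, p in dist.items())
    return mean, second_moment - mean ** 2

# The two distributions from the text, written as {outcome: probability}.
p_X = {1: F(0), 2: F(1, 4), 3: F(1, 2), 4: F(0), 5: F(0), 6: F(1, 4)}
p_Y = {1: F(1, 4), 2: F(0), 3: F(0), 4: F(1, 2), 5: F(1, 4), 6: F(0)}

stats_X = mean_and_variance(p_X)
stats_Y = mean_and_variance(p_Y)
print(stats_X, stats_Y)
```

Both calls return the pair (7/2, 9/4), even though the two dictionaries differ, which is exactly the situation described above.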
Moments

A nice answer to this question, at least in the case that $X$ has finite range, can be given in terms of the moments of $X$, which are numbers defined as follows:
$$\mu_k = k\text{th moment of } X = E(X^k) = \sum_{j=1}^{\infty} (x_j)^k p(x_j),$$
provided the sum converges. Here $p(x_j) = P(X = x_j)$.

In terms of these moments, the mean $\mu$ and variance $\sigma^2$ of $X$ are given simply by
$$\mu = \mu_1, \qquad \sigma^2 = \mu_2 - \mu_1^2,$$
so that a knowledge of the first two moments of $X$ gives us its mean and variance. But a knowledge of all the moments of $X$ determines its distribution function $p$ completely.

Moment Generating Functions

To see how this comes about, we introduce a new variable $t$, and define a function $g(t)$ as follows:
$$g(t) = E(e^{tX}) = \sum_{k=0}^{\infty} \frac{\mu_k t^k}{k!} = E\left(\sum_{k=0}^{\infty} \frac{X^k t^k}{k!}\right) = \sum_{j=1}^{\infty} e^{t x_j} p(x_j).$$
We call $g(t)$ the moment generating function for $X$, and think of it as a convenient bookkeeping device for describing the moments of $X$. Indeed, if we differentiate $g(t)$ $n$ times and then set $t = 0$, we get $\mu_n$:
$$\left.\frac{d^n}{dt^n} g(t)\right|_{t=0} = g^{(n)}(0) = \left.\sum_{k=n}^{\infty} \frac{k!\,\mu_k t^{k-n}}{(k-n)!\,k!}\right|_{t=0} = \mu_n.$$
It is easy to calculate the moment generating function for simple examples.

Examples

Example 10.1 Suppose $X$ has range $\{1, 2, 3, \ldots, n\}$ and $p_X(j) = 1/n$ for $1 \leq j \leq n$ (uniform distribution). Then
$$\begin{aligned} g(t) &= \sum_{j=1}^{n} \frac{1}{n} e^{tj} \\ &= \frac{1}{n}\left(e^t + e^{2t} + \cdots + e^{nt}\right) \\ &= \frac{e^t (e^{nt} - 1)}{n (e^t - 1)}. \end{aligned}$$
If we use the expression on the right-hand side of the second line above, then it is easy to see that
$$\mu_1 = g'(0) = \frac{1}{n}(1 + 2 + 3 + \cdots + n) = \frac{n+1}{2},$$
$$\mu_2 = g''(0) = \frac{1}{n}(1 + 4 + 9 + \cdots + n^2) = \frac{(n+1)(2n+1)}{6},$$
and that $\mu = \mu_1 = (n+1)/2$ and $\sigma^2 = \mu_2 - \mu_1^2 = (n^2 - 1)/12$. □

Example 10.2 Suppose now that $X$ has range $\{0, 1, 2, 3, \ldots, n\}$ and $p_X(j) = \binom{n}{j} p^j q^{n-j}$ for $0 \leq j \leq n$ (binomial distribution).
Then
$$g(t) = \sum_{j=0}^{n} e^{tj} \binom{n}{j} p^j q^{n-j} = \sum_{j=0}^{n} \binom{n}{j} (pe^t)^j q^{n-j} = (pe^t + q)^n.$$
Note that
$$\mu_1 = g'(0) = \left. n(pe^t + q)^{n-1} pe^t \right|_{t=0} = np,$$
$$\mu_2 = g''(0) = n(n-1)p^2 + np,$$
so that $\mu = \mu_1 = np$, and $\sigma^2 = \mu_2 - \mu_1^2 = np(1-p)$, as expected. □

Example 10.3 Suppose $X$ has range $\{1, 2, 3, \ldots\}$ and $p_X(j) = q^{j-1} p$ for all $j$ (geometric distribution). Then
$$g(t) = \sum_{j=1}^{\infty} e^{tj} q^{j-1} p = \frac{pe^t}{1 - qe^t}.$$
Here
$$\mu_1 = g'(0) = \left.\frac{pe^t}{(1 - qe^t)^2}\right|_{t=0} = \frac{1}{p},$$
$$\mu_2 = g''(0) = \left.\frac{pe^t + pqe^{2t}}{(1 - qe^t)^3}\right|_{t=0} = \frac{1+q}{p^2},$$
$\mu = \mu_1 = 1/p$, and $\sigma^2 = \mu_2 - \mu_1^2 = q/p^2$, as computed in Example 6.26. □

Example 10.4 Let $X$ have range $\{0, 1, 2, 3, \ldots\}$ and let $p_X(j) = e^{-\lambda}\lambda^j/j!$ for all $j$ (Poisson distribution with mean $\lambda$). Then
$$g(t) = \sum_{j=0}^{\infty} e^{tj} \frac{e^{-\lambda}\lambda^j}{j!} = e^{-\lambda} \sum_{j=0}^{\infty} \frac{(\lambda e^t)^j}{j!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}.$$
Then
$$\mu_1 = g'(0) = \left. e^{\lambda(e^t - 1)} \lambda e^t \right|_{t=0} = \lambda,$$
$$\mu_2 = g''(0) = \left. e^{\lambda(e^t - 1)} (\lambda^2 e^{2t} + \lambda e^t) \right|_{t=0} = \lambda^2 + \lambda,$$
$\mu = \mu_1 = \lambda$, and $\sigma^2 = \mu_2 - \mu_1^2 = \lambda$.

The variance of the Poisson distribution is easier to obtain in this way than directly from the definition (as was done in Exercise 6.2.29). □

Moment Problem

Using the moment generating function, we can now show, at least in the case of a discrete random variable with finite range, that its distribution function is completely determined by its moments.

Theorem 10.1 Let $X$ be a discrete random variable with finite range $\{x_1, x_2, \ldots, x_n\}$, distribution function $p$, and moment generating function $g$. Then $g$ is uniquely determined by $p$, and conversely.

Proof. We know that $p$ determines $g$, since
$$g(t) = \sum_{j=1}^{n} e^{t x_j} p(x_j).$$
Conversely, assume that $g(t)$ is known. We wish to determine the values of $x_j$ and $p(x_j)$, for $1 \leq j \leq n$. We assume, without loss of generality, that $p(x_j) > 0$ for $1 \leq j \leq n$, and that
$$x_1 < x_2 < \cdots < x_n.$$
We note that $g(t)$ is differentiable for all $t$, since it is a finite linear combination of exponential functions. If we compute $g'(t)/g(t)$, we obtain
$$\frac{x_1 p(x_1) e^{t x_1} + \cdots + x_n p(x_n) e^{t x_n}}{p(x_1) e^{t x_1} + \cdots + p(x_n) e^{t x_n}}.$$
Dividing both top and bottom by $e^{t x_n}$, we obtain the expression
$$\frac{x_1 p(x_1) e^{t(x_1 - x_n)} + \cdots + x_n p(x_n)}{p(x_1) e^{t(x_1 - x_n)} + \cdots + p(x_n)}.$$
Since $x_n$ is the largest of the $x_j$'s, this expression approaches $x_n$ as $t$ goes to $\infty$. So we have shown that
$$x_n = \lim_{t \to \infty} \frac{g'(t)}{g(t)}.$$
To find $p(x_n)$, we simply divide $g(t)$ by $e^{t x_n}$ and let $t$ go to $\infty$. Once $x_n$ and $p(x_n)$ have been determined, we can subtract $p(x_n) e^{t x_n}$ from $g(t)$, and repeat the above procedure with the resulting function, obtaining, in turn, $x_{n-1}, \ldots, x_1$ and $p(x_{n-1}), \ldots, p(x_1)$. □

If we delete the hypothesis that $X$ have finite range in the above theorem, then the conclusion is no longer necessarily true.

Ordinary Generating Functions

In the special but important case where the $x_j$ are all nonnegative integers, $x_j = j$, we can prove this theorem in a simpler way.

In this case, we have
$$g(t) = \sum_{j=0}^{n} e^{tj} p(j),$$
and we see that $g(t)$ is a polynomial in $e^t$. If we write $z = e^t$, and define the function $h$ by
$$h(z) = \sum_{j=0}^{n} z^j p(j),$$
then $h(z)$ is a polynomial in $z$ containing the same information as $g(t)$, and in fact
$$h(z) = g(\log z), \qquad g(t) = h(e^t).$$
The function $h(z)$ is often called the ordinary generating function for $X$. Note that $h(1) = g(0) = 1$, $h'(1) = g'(0) = \mu_1$, and $h''(1) = g''(0) - g'(0) = \mu_2 - \mu_1$. It follows from all this that if we know $g(t)$, then we know $h(z)$, and if we know $h(z)$, then we can find the $p(j)$ by Taylor's formula:
$$p(j) = \text{coefficient of } z^j \text{ in } h(z) = \frac{h^{(j)}(0)}{j!}.$$
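The identities $h(1) = 1$, $h'(1) = \mu_1$, $h''(1) = \mu_2 - \mu_1$ and Taylor's formula for $p(j)$ are easy to verify on a small case. The sketch below is only an illustration: the distribution on $\{0, 1, 2, 3\}$ is a hypothetical example (not taken from the text), and the helper names `poly_eval`, `poly_deriv`, and `taylor_coefficient` are our own. Since $h$ is a polynomial whose coefficients are the $p(j)$, it is stored as a list of coefficients.

```python
from fractions import Fraction as F
from math import factorial

def poly_eval(coeffs, z):
    """Evaluate sum(coeffs[j] * z**j) by Horner's rule."""
    total = 0
    for c in reversed(coeffs):
        total = total * z + c
    return total

def poly_deriv(coeffs):
    """Coefficient list of the derivative polynomial."""
    return [j * c for j, c in enumerate(coeffs)][1:]

def taylor_coefficient(coeffs, j):
    """Return h^(j)(0) / j! for the polynomial h with the given coefficients."""
    d = coeffs
    for _ in range(j):
        d = poly_deriv(d)
    return poly_eval(d, 0) / factorial(j)

# Hypothetical distribution on {0, 1, 2, 3}: p[j] = P(X = j).
p = [F(1, 8), F(3, 8), F(3, 8), F(1, 8)]

h = p                      # h(z) = sum_j p(j) z^j
h1 = poly_deriv(h)         # h'(z)
h2 = poly_deriv(h1)        # h''(z)

mu1 = sum(j * pj for j, pj in enumerate(p))
mu2 = sum(j * j * pj for j, pj in enumerate(p))

checks = (poly_eval(h, 1), poly_eval(h1, 1), poly_eval(h2, 1))
recovered = [taylor_coefficient(h, j) for j in range(len(p))]
print(checks, (1, mu1, mu2 - mu1), recovered)
```

For this example the three values in `checks` agree with $1$, $\mu_1$, and $\mu_2 - \mu_1$, and `recovered` reproduces the list `p`, as Taylor's formula predicts.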
For example, suppose we know that the moments of a certain discrete random variable $X$ are given by
$$\mu_0 = 1, \qquad \mu_k = \frac{1}{2} + \frac{2^k}{4}, \quad \text{for } k \geq 1.$$
Then the moment generating function $g$ of $X$ is
$$g(t) = \sum_{k=0}^{\infty} \frac{\mu_k t^k}{k!} = 1 + \frac{1}{2} \sum_{k=1}^{\infty} \frac{t^k}{k!} + \frac{1}{4} \sum_{k=1}^{\infty} \frac{(2t)^k}{k!} = \frac{1}{4} + \frac{1}{2} e^t + \frac{1}{4} e^{2t}.$$
This is a polynomial in $z = e^t$, and
$$h(z) = \frac{1}{4} + \frac{1}{2} z + \frac{1}{4} z^2.$$
Hence, $X$ must have range $\{0, 1, 2\}$, and $p$ must have values $\{1/4, 1/2, 1/4\}$.

Properties

Both the moment generating function $g$ and the ordinary generating function $h$ have many properties useful in the study of random variables, of which we can consider only a few here. In particular, if $X$ is any discrete random variable and $Y = X + a$, then
$$g_Y(t) = E(e^{tY}) = E(e^{t(X+a)}) = e^{ta} E(e^{tX}) = e^{ta} g_X(t),$$
while if $Y = bX$, then
$$g_Y(t) = E(e^{tY}) = E(e^{tbX}) = g_X(bt).$$
In particular, if
$$X^* = \frac{X - \mu}{\sigma},$$
then (see Exercise 11)
$$g_{X^*}(t) = e^{-\mu t/\sigma} g_X\left(\frac{t}{\sigma}\right).$$
If $X$ and $Y$ are independent random variables and $Z = X + Y$ is their sum, with $p_X$, $p_Y$, and $p_Z$ the associated distribution functions, then we have seen in Chapter 7 that $p_Z$ is the convolution of $p_X$ and $p_Y$, and we know that convolution involves a rather complicated calculation.
But for the generating functions we have instead the simple relations
$$g_Z(t) = g_X(t)\,g_Y(t), \qquad h_Z(z) = h_X(z)\,h_Y(z),$$
that is, $g_Z$ is simply the product of $g_X$ and $g_Y$, and similarly for $h_Z$.

To see this, first note that if $X$ and $Y$ are independent, then $e^{tX}$ and $e^{tY}$ are independent (see Exercise 5.2.38), and hence
$$E(e^{tX} e^{tY}) = E(e^{tX})\,E(e^{tY}).$$
It follows that
$$g_Z(t) = E(e^{tZ}) = E(e^{t(X+Y)}) = E(e^{tX})\,E(e^{tY}) = g_X(t)\,g_Y(t),$$
and, replacing $t$ by $\log z$, we also get
$$h_Z(z) = h_X(z)\,h_Y(z).$$

Example 10.5 If $X$ and $Y$ are independent discrete random variables with range $\{0, 1, 2, \ldots, n\}$ and binomial distribution
$$p_X(j) = p_Y(j) = \binom{n}{j} p^j q^{n-j},$$
and if $Z = X + Y$, then we know (cf. Section 7.1) that the range of $Z$ is $\{0, 1, 2, \ldots, 2n\}$ and $Z$ has binomial distribution
$$p_Z(j) = (p_X * p_Y)(j) = \binom{2n}{j} p^j q^{2n-j}.$$
Here we can easily verify this result by using generating functions. We know that
$$g_X(t) = g_Y(t) = \sum_{j=0}^{n} e^{tj} \binom{n}{j} p^j q^{n-j} = (pe^t + q)^n,$$
and
$$h_X(z) = h_Y(z) = (pz + q)^n.$$
Hence, we have
$$g_Z(t) = g_X(t)\,g_Y(t) = (pe^t + q)^{2n},$$
or, what is the same,
$$h_Z(z) = h_X(z)\,h_Y(z) = (pz + q)^{2n} = \sum_{j=0}^{2n} \binom{2n}{j} (pz)^j q^{2n-j},$$
from which we can see that the coefficient of $z^j$ is just $p_Z(j) = \binom{2n}{j} p^j q^{2n-j}$. □

Example 10.6 If $X$ and $Y$ are independent discrete random variables with the nonnegative integers $\{0, 1, 2, 3, \ldots\}$ as range, and with geometric distribution function
$$p_X(j) = p_Y(j) = q^j p,$$
then
$$g_X(t) = g_Y(t) = \frac{p}{1 - qe^t},$$
and if $Z = X + Y$, then
$$g_Z(t) = g_X(t)\,g_Y(t) = \frac{p^2}{1 - 2qe^t + q^2 e^{2t}}.$$
If we replace $e^t$ by $z$, we get
$$h_Z(z) = \frac{p^2}{(1 - qz)^2} = p^2 \sum_{k=0}^{\infty} (k+1) q^k z^k,$$
and we can read off the values of $p_Z(j)$ as the coefficient of $z^j$ in this expansion for $h(z)$, even though $h(z)$ is not a polynomial in this case. The distribution $p_Z$ is a negative binomial distribution (see Section 5.1). □

Here is a more interesting example of the power and scope of the method of generating functions.

Heads or Tails

Example 10.7 In the coin-tossing game discussed in Example 1.4, we now consider the question "When is Peter first in the lead?"

Let $X_k$ describe the outcome of the $k$th trial in the game:
$$X_k = \begin{cases} +1, & \text{if the } k\text{th toss is heads}, \\ -1, & \text{if the } k\text{th toss is tails}. \end{cases}$$
Then the $X_k$ are independent random variables describing a Bernoulli process. Let $S_0 = 0$, and, for $n \geq 1$, let
$$S_n = X_1 + X_2 + \cdots + X_n.$$
Then $S_n$ describes Peter's fortune after $n$ trials, and Peter is first in the lead after $n$ trials if $S_k \leq 0$ for $1 \leq k < n$ and $S_n = 1$.

Now this can happen when $n = 1$, in which case $S_1 = X_1 = 1$, or when $n > 1$, in which case $S_1 = X_1 = -1$.
In the latter case, $S_k = 0$ for $k = n - 1$, and perhaps for other $k$ between 1 and $n$. Let $m$ be the least such value of $k$; then $S_m = 0$ and $S_k < 0$ for $1 \leq k < m$. In this case Peter loses on the first trial, regains his initial position in the next $m - 1$ trials, and gains the lead in the next $n - m$ trials.

Let $p$ be the probability that the coin comes up heads, and let $q = 1 - p$. Let $r_n$ be the probability that Peter is first in the lead after $n$ trials. Then from the discussion above, we see that
$$\begin{aligned} r_n &= 0, && \text{if } n \text{ even}, \\ r_1 &= p && (= \text{probability of heads in a single toss}), \\ r_n &= q\,(r_1 r_{n-2} + r_3 r_{n-4} + \cdots + r_{n-2} r_1), && \text{if } n > 1,\ n \text{ odd}. \end{aligned}$$
Now let $T$ describe the time (that is, the number of trials) required for Peter to take the lead. Then $T$ is a random variable, and since $P(T = n) = r_n$, $r$ is the distribution function for $T$.

We introduce the generating function $h_T(z)$ for $T$:
$$h_T(z) = \sum_{n=0}^{\infty} r_n z^n.$$
Then, by using the relations above, we can verify the relation
$$h_T(z) = pz + qz\,(h_T(z))^2.$$
If we solve this quadratic equation for $h_T(z)$, we get
$$h_T(z) = \frac{1 \pm \sqrt{1 - 4pqz^2}}{2qz} = \frac{2pz}{1 \mp \sqrt{1 - 4pqz^2}}.$$
Of these two solutions, we want the one that has a convergent power series in $z$ (i.e., that is finite for $z = 0$). Hence we choose
$$h_T(z) = \frac{1 - \sqrt{1 - 4pqz^2}}{2qz} = \frac{2pz}{1 + \sqrt{1 - 4pqz^2}}.$$
Now we can ask: What is the probability that Peter is ever in the lead? This probability is given by (see Exercise 10)
$$\sum_{n=0}^{\infty} r_n = h_T(1) = \frac{1 - \sqrt{1 - 4pq}}{2q} = \frac{1 - |p - q|}{2q} = \begin{cases} p/q, & \text{if } p < q, \\ 1, & \text{if } p \geq q, \end{cases}$$
so that Peter is sure to be in the lead eventually if $p \geq q$.

How long will it take? That is, what is the expected value of $T$? This value is given by
$$E(T) = h_T'(1) = \begin{cases} 1/(p - q), & \text{if } p > q, \\ \infty, & \text{if } p = q. \end{cases}$$
This says that if $p > q$, then Peter can expect to be in the lead by about $1/(p - q)$ trials, but if $p = q$, he can expect to wait a long time.
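The recursion for $r_n$ and the closed form for $h_T(1)$ can be compared numerically. The following sketch computes $r_0, \ldots, r_N$ directly from the recursion and sums them; the value $p = 0.4$, the cutoff $N = 1001$, and the function name `first_lead_probabilities` are assumptions made for illustration only.

```python
import math

def first_lead_probabilities(p, n_max):
    """Return [r_0, ..., r_{n_max}], where r_n is the probability that
    Peter is first in the lead after exactly n trials."""
    q = 1.0 - p
    r = [0.0] * (n_max + 1)      # r_n = 0 for even n
    if n_max >= 1:
        r[1] = p                 # heads on the first toss
    for n in range(3, n_max + 1, 2):
        # r_n = q * (r_1 r_{n-2} + r_3 r_{n-4} + ... + r_{n-2} r_1)
        r[n] = q * sum(r[k] * r[n - 1 - k] for k in range(1, n - 1, 2))
    return r

p = 0.4                          # assumed value with p < q
q = 1.0 - p
r = first_lead_probabilities(p, 1001)

partial_sum = sum(r)                                      # approximates sum of all r_n
closed_form = (1 - math.sqrt(1 - 4 * p * q)) / (2 * q)    # h_T(1)
print(partial_sum, closed_form, p / q)
```

With $p < q$ the partial sum, the closed form $h_T(1)$, and the ratio $p/q$ should agree to many decimal places, because the terms $r_n$ decay geometrically when $4pq < 1$. For $p = q = 1/2$ the same partial sums approach 1 only slowly, which is consistent with $E(T) = \infty$.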
A related problem, known as the Gambler's Ruin problem, is studied in Exercise 23 and in Section 12.2. □

Exercises

1 Find the generating functions, both ordinary $h(z)$ and moment $g(t)$, for the following discrete probability distributions.
(a) The distribution describing a fair coin.
(b) The distribution describing a fair die.
(c) The distribution describing a die that always comes up 3.
(d) The uniform distribution on the set $\{n, n+1, n+2, \ldots, n+k\}$.
(e) The binomial distribution on $\{n, n+1, n+2, \ldots, n+k\}$.
(f) The geometric distribution on $\{0, 1, 2, \ldots\}$ with $p(j) = 2/3^{j+1}$.

2 For each of the distributions (a) through (d) of Exercise 1 calculate the first and second moments, $\mu_1$ and $\mu_2$, directly from their definition, and verify that $h(1) = 1$, $h'(1) = \mu_1$, and $h''(1) = \mu_2 - \mu_1$.

3 Let $p$ be a probability distribution on $\{0, 1, 2\}$ with moments $\mu_1 = 1$, $\mu_2 = 3/2$.
(a) Find its ordinary generating function $h(z)$.
(b) Using (a), find its moment generating function.
(c) Using (b), find its first six moments.
(d) Using (a), find $p_0$, $p_1$, and $p_2$.

4 In Exercise 3, the probability distribution is completely determined by its first two moments. Show that this is always true for any probability distribution on $\{0, 1, 2\}$. Hint: Given $\mu_1$ and $\mu_2$, find $h(z)$ as in Exercise 3 and use $h(z)$ to determine $p$.

5 Let $p$ and $p'$ be the two distributions
$$p = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 1/3 & 0 & 0 & 2/3 & 0 \end{pmatrix}, \qquad p' = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 0 & 2/3 & 0 & 0 & 1/3 \end{pmatrix}.$$
(a) Show that $p$ and $p'$ have the same first and second moments, but not the same third and fourth moments.
(b) Find the ordinary and moment generating functions for $p$ and $p'$.

6 Let $p$ be the probability distribution
$$p = \begin{pmatrix} 0 & 1 & 2 \\ 0 & 1/3 & 2/3 \end{pmatrix},$$
and let $p_n = p * p * \cdots * p$ be the $n$-fold convolution of $p$ with itself.
(a) Find $p_2$ by direct calculation (see Definition 7.1).
(b) Find the ordinary generating functions $h(z)$ and $h_2(z)$ for $p$ and $p_2$, and verify that $h_2(z) = (h(z))^2$.
(c) Find $h_n(z)$ from $h(z)$.
(d) Find the first two moments, and hence the mean and variance, of $p_n$ from $h_n(z)$. Verify that the mean of $p_n$ is $n$ times the mean of $p$.
(e) Find those integers $j$ for which $p_n(j) > 0$ from $h_n(z)$.

7 Let $X$ be a discrete random variable with values in $\{0, 1, 2, \ldots, n\}$ and moment generating function $g(t)$. Find, in terms of $g(t)$, the generating functions for
(a) $-X$.
(b) $X + 1$.
(c) $3X$.
(d) $aX + b$.

8 Let $X_1, X_2, \ldots, X_n$ be an independent trials process, with values in $\{0, 1\}$ and mean $\mu = 1/3$. Find the ordinary and moment generating functions for the distribution of
(a) $S_1 = X_1$. Hint: First find $X_1$ explicitly.
(b) $S_2 = X_1 + X_2$.
(c) $S_n = X_1 + X_2 + \cdots + X_n$.

9 Let $X$ and $Y$ be random variables with values in $\{1, 2, 3, 4, 5, 6\}$ with distribution functions $p_X$ and $p_Y$ given by
$$p_X(j) = a_j, \qquad p_Y(j) = b_j.$$
(a) Find the ordinary generating functions $h_X(z)$ and $h_Y(z)$ for these distributions.
(b) Find the ordinary generating function $h_Z(z)$ for the distribution $Z = X + Y$.
(c) Show that $h_Z(z)$ cannot ever have the form
$$h_Z(z) = \frac{z^2 + z^3 + \cdots + z^{12}}{11}.$$
Hint: $h_X$ and $h_Y$ must have at least one nonzero root, but $h_Z(z)$ in the form given has no nonzero real roots.
It follows from this observation that there is no way to load two dice so that the probability that a given sum will turn up when they are tossed is the same for all sums (i.e., that all outcomes are equally likely).
10 Show that if
$$h(z) = \frac{1 - \sqrt{1 - 4pqz^2}}{2qz},$$
then
$$h(1) = \begin{cases} p/q, & \text{if } p \leq q, \\ 1, & \text{if } p \geq q, \end{cases}$$
and
$$h'(1) = \begin{cases} 1/(p - q), & \text{if } p > q, \\ \infty, & \text{if } p = q. \end{cases}$$

11 Show that if $X$ is a random variable with mean $\mu$ and variance $\sigma^2$, and if $X^* = (X - \mu)/\sigma$ is the standardized version of $X$, then
$$g_{X^*}(t) = e^{-\mu t/\sigma} g_X\left(\frac{t}{\sigma}\right).$$

10.2 Branching Processes

Historical Background

In this section we apply the theory of generating functions to the study of an important chance process called a branching process.

Until recently it was thought that the theory of branching processes originated with the following problem posed by Francis Galton in the Educational Times in 1873.¹

Problem 4001: A large nation, of whom we will only concern ourselves with the adult males, $N$ in number, and who each bear separate surnames, colonise a district. Their law of population is such that, in each generation, $a_0$ per cent of the adult males have no male children who reach adult life; $a_1$ have one such male child; $a_2$ have two; and so on up to $a_5$ who have five.
Find (1) what proportion of the surnames will have become extinct after $r$ generations; and (2) how many instances there will be of the same surname being held by $m$ persons.

¹ D. G. Kendall, "Branching Processes Since 1873," Journal of London Mathematics Society, vol. 41 (1966), p. 386.

The first attempt at a solution was given by Reverend H. W. Watson. Because of a mistake in algebra, he incorrectly concluded that a family name would always die out with probability 1. However, the methods that he employed to solve the problems were, and still are, the basis for obtaining the correct solution.

Heyde and Seneta discovered an earlier communication by Bienaymé (1845) that anticipated Galton and Watson by 28 years. Bienaymé showed, in fact, that he was aware of the correct solution to Galton's problem. Heyde and Seneta in their book I.
J. Bienaymé: Statistical Theory Anticipated,² give the following translation from Bienaymé's paper:

If ... the mean of the number of male children who replace the number of males of the preceding generation were less than unity, it would be easily realized that families are dying out due to the disappearance of the members of which they are composed. However, the analysis shows further that when this mean is equal to unity families tend to disappear, although less rapidly ....
The analysis also shows clearly that if the mean ratio is greater than unity, the probability of the extinction of families with the passing of time no longer reduces to certainty. It only approaches a finite limit, which is fairly simple to calculate and which has the singular characteristic of being given by one of the roots of the equation (in which the number of generations is made infinite) which is not relevant to the question when the mean ratio is less than unity.³

Although Bienaymé does not give his reasoning for these results, he did indicate that he intended to publish a special paper on the problem. The paper was never written, or at least has never been found. In his communication Bienaymé indicated that he was motivated by the same problem that occurred to Galton. The opening paragraph of his paper as translated by Heyde and Seneta says,

A great deal of consideration has been given to the possible multiplication of the numbers of mankind; and recently various very curious observations have been published on the fate which allegedly hangs over the aristocracy and middle classes; the families of famous men, etc. This fate, it is alleged, will inevitably bring about the disappearance of the so-called families fermées.⁴

A much more extensive discussion of the history of branching processes may be found in two papers by David G. Kendall.⁵

² C. C. Heyde and E. Seneta, I. J.
Bienaymé: Statistical Theory Anticipated (New York: Springer Verlag, 1977).
³ ibid., pp. 117-118.
⁴ ibid., p. 118.
⁵ D. G. Kendall, "Branching Processes Since 1873," pp. 385-406; and "The Genealogy of Genealogy: Branching Processes Before (and After) 1873," Bulletin London Mathematics Society, vol. 7 (1975), pp. 225-253.

Figure 10.1: Tree diagram for Example 10.8.

Branching processes have served not only as crude models for population growth but also as models for certain physical processes such as chemical and nuclear chain reactions.

Problem of Extinction

We turn now to the first problem posed by Galton (i.e., the problem of finding the probability of extinction for a branching process). We start in the 0th generation with 1 male parent. In the first generation we shall have 0, 1, 2, 3, ... male offspring with probabilities $p_0, p_1, p_2, p_3, \ldots$. If in the first generation there are $k$ offspring, then in the second generation there will be $X_1 + X_2 + \cdots + X_k$ offspring, where $X_1, X_2, \ldots, X_k$ are independent random variables, each with the common distribution $p_0, p_1, p_2, \ldots$. This description enables us to construct a tree, and a tree measure, for any number of generations.

Examples

Example 10.8 Assume that $p_0 = 1/2$, $p_1 = 1/4$, and $p_2 = 1/4$. Then the tree measure for the first two generations is shown in Figure 10.1.

Note that we use the theory of sums of independent random variables to assign branch probabilities. For example, if there are two offspring in the first generation, the probability that there will be two in the second generation is
$$P(X_1 + X_2 = 2) = p_0 p_2 + p_1 p_1 + p_2 p_0 = \frac{1}{2} \cdot \frac{1}{4} + \frac{1}{4} \cdot \frac{1}{4} + \frac{1}{4} \cdot \frac{1}{2} = \frac{5}{16}.$$
We now study the probability that our process dies out (i.e., that at some generation there are no offspring).
Let $d_m$ be the probability that the process dies out by the $m$th generation. Of course, $d_0 = 0$. In our example, $d_1 = 1/2$ and $d_2 = 1/2 + 1/8 + 1/16 = 11/16$ (see Figure 10.1). Note that we must add the probabilities for all paths that lead to 0 by the $m$th generation. It is clear from the definition that
$$0 = d_0 \leq d_1 \leq d_2 \leq \cdots \leq 1.$$
Hence, $d_m$ converges to a limit $d$, $0 \leq d \leq 1$, and $d$ is the probability that the process will ultimately die out. It is this value that we wish to determine. We begin by expressing the value $d_m$ in terms of all possible outcomes on the first generation. If there are $j$ offspring in the first generation, then to die out by the $m$th generation, each of these lines must die out in $m - 1$ generations. Since they proceed independently, this probability is $(d_{m-1})^j$. Therefore
$$d_m = p_0 + p_1 d_{m-1} + p_2 (d_{m-1})^2 + p_3 (d_{m-1})^3 + \cdots. \qquad (10.1)$$
Let $h(z)$ be the ordinary generating function for the $p_i$:
$$h(z) = p_0 + p_1 z + p_2 z^2 + \cdots.$$
Using this generating function, we can rewrite Equation 10.1 in the form
$$d_m = h(d_{m-1}). \qquad (10.2)$$
Since $d_m \to d$, by Equation 10.2 we see that the value $d$ that we are looking for satisfies the equation
$$d = h(d). \qquad (10.3)$$
One solution of this equation is always $d = 1$, since
$$1 = p_0 + p_1 + p_2 + \cdots.$$
This is where Watson made his mistake. He assumed that 1 was the only solution to Equation 10.3. To examine this question more carefully, we first note that solutions to Equation 10.3 represent intersections of the graphs of
$$y = z$$
and
$$y = h(z) = p_0 + p_1 z + p_2 z^2 + \cdots.$$
Thus we need to study the graph of $y = h(z)$. We note that $h(0) = p_0$. Also,
$$h'(z) = p_1 + 2 p_2 z + 3 p_3 z^2 + \cdots, \qquad (10.4)$$
and
$$h''(z) = 2 p_2 + 3 \cdot 2\, p_3 z + 4 \cdot 3\, p_4 z^2 + \cdots.$$
From this we see that for $z \geq 0$, $h'(z) \geq 0$ and $h''(z) \geq 0$. Thus for nonnegative $z$, $h(z)$ is an increasing function and is concave upward. Therefore the graph of
$y = h(z)$ can intersect the line $y = z$ in at most two points. Since we know it must intersect the line $y = z$ at $(1, 1)$, we know that there are just three possibilities, as shown in Figure 10.2. In case (a) the equation $d = h(d)$ has roots $\{d, 1\}$ with $0 \le d < 1$. In the second case (b) it has only the one root $d = 1$. In case (c) it has two roots $\{1, d\}$ where $1 < d$. Since we are looking for a solution $0 \le d \le 1$, we see in cases (b) and (c) that our only solution is 1. In these cases we can conclude that the process will die out with probability 1. However, in case (a) we are in doubt. We must study this case more carefully.

[Figure 10.2: Graphs of $y = z$ and $y = h(z)$; panels (a) $d < 1$, (b) $d = 1$, (c) $d > 1$.]

From Equation 10.4 we see that
\[ h'(1) = p_1 + 2 p_2 + 3 p_3 + \cdots = m, \]
where $m$ is the expected number of offspring produced by a single parent. In case (a) we have $h'(1) > 1$, in (b) $h'(1) = 1$, and in (c) $h'(1) < 1$. Thus our three cases correspond to $m > 1$, $m = 1$, and $m < 1$. We assume now that $m > 1$. Recall that $d_0 = 0$, $d_1 = h(d_0) = p_0$, $d_2 = h(d_1)$, ..., and $d_n = h(d_{n-1})$. We can construct these values geometrically, as shown in Figure 10.3. We can see geometrically, as indicated for $d_0$, $d_1$, $d_2$, and $d_3$ in Figure 10.3, that the points $(d_i, h(d_i))$ will always lie above the line $y = z$. Hence, they must converge to the first intersection of the curves $y = z$ and $y = h(z)$ (i.e., to the root $d < 1$). This leads us to the following theorem.

Theorem 10.2 Consider a branching process with generating function $h(z)$ for the number of offspring of a given parent. Let $d$ be the smallest root of the equation $z = h(z)$. If the mean number $m$ of offspring produced by a single parent is $\le 1$, then $d = 1$ and the process dies out with probability 1.
If $m > 1$ then $d < 1$ and the process dies out with probability $d$.

[Figure 10.3: Geometric determination of $d$.]

We shall often want to know the probability that a branching process dies out by a particular generation, as well as the limit of these probabilities. Let $d_n$ be the probability of dying out by the $n$th generation. Then we know that $d_1 = p_0$. We know further that $d_n = h(d_{n-1})$, where $h(z)$ is the generating function for the number of offspring produced by a single parent. This makes it easy to compute these probabilities.

The program Branch calculates the values of $d_n$. We have run this program for 12 generations for the case that a parent can produce at most two offspring and the probabilities for the number produced are $p_0 = .2$, $p_1 = .5$, and $p_2 = .3$. The results are given in Table 10.1. We see that the probability of dying out by 12 generations is about .6. We shall see in the next example that the probability of eventually dying out is 2/3, so that even 12 generations is not enough to give an accurate estimate for this probability.

We now assume that at most two offspring can be produced. Then
\[ h(z) = p_0 + p_1 z + p_2 z^2. \]
In this simple case the condition $z = h(z)$ yields the equation
\[ d = p_0 + p_1 d + p_2 d^2, \]
which is satisfied by $d = 1$ and $d = p_0/p_2$. Thus, in addition to the root $d = 1$ we have the second root $d = p_0/p_2$. The mean number $m$ of offspring produced by a single parent is
\[ m = p_1 + 2 p_2 = 1 - p_0 - p_2 + 2 p_2 = 1 - p_0 + p_2. \]
Thus, if $p_0 > p_2$, $m < 1$ and the second root is $> 1$. If $p_0 = p_2$, we have a double root $d = 1$. If $p_0 < p_2$, $m > 1$ and the second root $d$ is less than 1 and represents the probability that the process will die out.
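The source of the program Branch is not reproduced in this section. The following Python sketch is one possible stand-in, under the assumption that Branch simply iterates $d_n = h(d_{n-1})$ starting from $d_0 = 0$; the function name `dying_out_probabilities` is hypothetical.

```python
def dying_out_probabilities(p, generations):
    """Return [d_1, ..., d_generations], where d_n = h(d_{n-1}) and d_0 = 0.

    p[j] is the probability that a parent has j offspring.
    """
    def h(z):
        # Ordinary generating function of the offspring distribution.
        return sum(pj * z**j for j, pj in enumerate(p))

    d = 0.0
    values = []
    for _ in range(generations):
        d = h(d)
        values.append(d)
    return values

p = [0.2, 0.5, 0.3]
d = dying_out_probabilities(p, 12)
for n, dn in enumerate(d, start=1):
    print(n, round(dn, 6))

# In the quadratic case the smaller root of d = h(d) is p0 / p2.
limit = p[0] / p[2]
print("limit:", limit)
```

The iterates increase toward the root $p_0/p_2 = 2/3$ but are still noticeably below it after 12 generations.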
Generation   Probability of dying out
     1       .2
     2       .312
     3       .385203
     4       .437116
     5       .475879
     6       .505878
     7       .529713
     8       .549035
     9       .564949
    10       .578225
    11       .589416
    12       .598931
Table 10.1: Probability of dying out.

 j    p_j
 0    .2092
 1    .2584
 2    .2360
 3    .1593
 4    .0828
 5    .0357
 6    .0133
 7    .0042
 8    .0011
 9    .0002
10    .0000
Table 10.2: Distribution of number of female children.

Example 10.9 Keyfitz$^6$ compiled and analyzed data on the continuation of the female family line among Japanese women. His estimates of the basic probability distribution for the number of female children born to Japanese women of ages 45-49 in 1960 are given in Table 10.2. The expected number of girls in a family is then 1.837, so the probability $d$ of extinction is less than 1. If we run the program Branch we can estimate that $d$ is in fact only about .324.

6 N. Keyfitz, Introduction to the Mathematics of Population, rev. ed. (Reading, PA: Addison Wesley, 1977).

Distribution of Offspring

So far we have considered only the first of the two problems raised by Galton, namely the probability of extinction. We now consider the second problem, that is, the distribution of the number $Z_n$ of offspring in the $n$th generation. The exact form of the distribution is not known except in very special cases. We shall see, however, that we can describe the limiting behavior of $Z_n$ as $n \to \infty$. We first show that the generating function $h_n(z)$ of the distribution of $Z_n$ can be obtained from $h(z)$ for any branching process.
We recall that the value of the generating function at the value $z$ for any random variable $X$ can be written as
\[ h(z) = E(z^X) = p_0 + p_1 z + p_2 z^2 + \cdots. \]
That is, $h(z)$ is the expected value of an experiment which has outcome $z^j$ with probability $p_j$.

Let $S_n = X_1 + X_2 + \cdots + X_n$, where each $X_j$ has the same integer-valued distribution $(p_j)$ with generating function $k(z) = p_0 + p_1 z + p_2 z^2 + \cdots$. Let $k_n(z)$ be the generating function of $S_n$. Then using one of the properties of ordinary generating functions discussed in Section 10.1, we have
\[ k_n(z) = (k(z))^n, \]
since the $X_j$'s are independent and all have the same distribution.

Consider now the branching process $Z_n$. Let $h_n(z)$ be the generating function of $Z_n$. Then
\[ h_{n+1}(z) = E(z^{Z_{n+1}}) = \sum_k E(z^{Z_{n+1}} \mid Z_n = k)\, P(Z_n = k). \]
If $Z_n = k$, then $Z_{n+1} = X_1 + X_2 + \cdots + X_k$, where $X_1, X_2, \ldots, X_k$ are independent random variables with common generating function $h(z)$. Thus
\[ E(z^{Z_{n+1}} \mid Z_n = k) = E(z^{X_1 + X_2 + \cdots + X_k}) = (h(z))^k, \]
and
\[ h_{n+1}(z) = \sum_k (h(z))^k P(Z_n = k). \]
But
\[ h_n(z) = \sum_k P(Z_n = k)\, z^k. \]
Thus,
\[ h_{n+1}(z) = h_n(h(z)). \tag{10.5} \]
If we differentiate Equation 10.5 and use the chain rule we have
\[ h'_{n+1}(z) = h'_n(h(z))\, h'(z). \]
Putting $z = 1$ and using the fact that $h(1) = 1$, $h'(1) = m$, and $h'_n(1) = m_n =$ the mean number of offspring in the $n$th generation, we have
\[ m_{n+1} = m_n \cdot m. \]
Thus, $m_2 = m \cdot m = m^2$, $m_3 = m_2 \cdot m = m^3$, and in general
\[ m_n = m^n. \]
Thus, for a branching process with $m > 1$, the mean number of offspring grows exponentially at a rate $m$.
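When the offspring distribution has finite support, $h_n$ is a polynomial and Equation 10.5 can be carried out exactly. The sketch below is an illustration in Python, not taken from the text; the helper names `poly_mul`, `compose`, `generation_pgf`, and `mean` are hypothetical. It builds the coefficients of $h_n$ by repeated composition for the distribution $p_0 = 1/2$, $p_1 = p_2 = 1/4$ of Example 10.8 and compares the mean of $Z_n$ with $m^n$.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def compose(outer, inner):
    """Coefficients of outer(inner(z)), built up with Horner's rule."""
    result = [Fraction(0)]
    for coeff in reversed(outer):
        result = poly_mul(result, inner)
        result[0] += coeff
    # Drop trailing zero coefficients left over from the first Horner steps.
    while len(result) > 1 and result[-1] == 0:
        result.pop()
    return result

def generation_pgf(h, n):
    """Coefficients of h_n, using h_1 = h and h_{k+1}(z) = h_k(h(z))."""
    h_n = list(h)
    for _ in range(n - 1):
        h_n = compose(h_n, h)
    return h_n

def mean(coeffs):
    """Mean of the distribution whose probabilities are the coefficients."""
    return sum(j * c for j, c in enumerate(coeffs))

h = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
m = mean(h)
h2 = generation_pgf(h, 2)
h3 = generation_pgf(h, 3)
print(h2)
print(mean(h2), m**2)
print(mean(h3), m**3)
```

The constant term of each $h_n$ is $P(Z_n = 0)$, so the same computation also yields the probability $d_n$ of dying out by generation $n$.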
Examples

Example 10.10 For the branching process of Example 10.8 we have
\[ h(z) = 1/2 + (1/4) z + (1/4) z^2, \]
\[ h_2(z) = h(h(z)) = 1/2 + (1/4)\left[ 1/2 + (1/4) z + (1/4) z^2 \right] + (1/4)\left[ 1/2 + (1/4) z + (1/4) z^2 \right]^2 \]
\[ = 11/16 + (1/8) z + (9/64) z^2 + (1/32) z^3 + (1/64) z^4. \]
The probabilities for the number of offspring in the second generation agree with those obtained directly from the tree measure (see Figure 10.1).

It is clear that even in the simple case of at most two offspring, we cannot easily carry out the calculation of $h_n(z)$ by this method. However, there is one special case in which this can be done.

Example 10.11 Assume that the probabilities $p_1, p_2, \ldots$ form a geometric series: $p_k = b c^{k-1}$, $k = 1, 2, \ldots$, with $0 < b \le 1 - c$ and $0 < c < 1$. Then we have
\[ p_0 = 1 - p_1 - p_2 - \cdots = 1 - b - bc - bc^2 - \cdots = 1 - \frac{b}{1 - c}. \]
The generating function $h(z)$ for this distribution is
\[ h(z) = p_0 + p_1 z + p_2 z^2 + \cdots = 1 - \frac{b}{1 - c} + bz + bcz^2 + bc^2 z^3 + \cdots = 1 - \frac{b}{1 - c} + \frac{bz}{1 - cz}. \]
From this we find
\[ h'(z) = \frac{bcz}{(1 - cz)^2} + \frac{b}{1 - cz} = \frac{b}{(1 - cz)^2} \]
and
\[ m = h'(1) = \frac{b}{(1 - c)^2}. \]
We know that if $m \le 1$ the process will surely die out and $d = 1$. To find the probability $d$ when $m > 1$ we must find a root $d < 1$ of the equation
\[ z = h(z), \]
or
\[ z = 1 - \frac{b}{1 - c} + \frac{bz}{1 - cz}. \]
This leads us to a quadratic equation. We know that $z = 1$ is one solution. The other is found to be
\[ d = \frac{1 - b - c}{c(1 - c)}. \]
It is easy to verify that $d < 1$ just when $m > 1$.

It is possible in this case to find the distribution of $Z_n$. This is done by first finding the generating function $h_n(z)$.$^7$ The result for $m \ne 1$ is:
\[ h_n(z) = 1 - m^n \left[ \frac{1 - d}{m^n - d} \right] + \frac{m^n \left[ \frac{1 - d}{m^n - d} \right]^2 z}{1 - \left[ \frac{m^n - 1}{m^n - d} \right] z}. \]
The coefficients of the powers of $z$ give the distribution for $Z_n$:
\[ P(Z_n = 0) = 1 - m^n \frac{1 - d}{m^n - d} = \frac{d(m^n - 1)}{m^n - d} \]
and
\[ P(Z_n = j) = m^n \left( \frac{1 - d}{m^n - d} \right)^2 \cdot \left( \frac{m^n - 1}{m^n - d} \right)^{j - 1}, \]
for $j \ge 1$.
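The closed form of Example 10.11 can be spot-checked against direct iteration of $h$, since $P(Z_n = 0) = h_n(0)$ is the $n$-fold iterate of $h$ at 0. The Python sketch below does this with exact fractions. The parameter values $b = 1/3$, $c = 1/2$ are an arbitrary illustration chosen here (they give $m = 4/3$ and $d = 2/3$) and do not come from the text; the function names are likewise hypothetical.

```python
from fractions import Fraction

# Illustrative parameters (an assumption, not from the text): 0 < b <= 1 - c, 0 < c < 1.
b, c = Fraction(1, 3), Fraction(1, 2)

m = b / (1 - c) ** 2              # mean number of offspring
d = (1 - b - c) / (c * (1 - c))   # second root of z = h(z)
p0 = 1 - b / (1 - c)

def h(z):
    """Generating function of the modified geometric offspring distribution."""
    return p0 + b * z / (1 - c * z)

def extinct_by(n):
    """Closed form for P(Z_n = 0), valid for m != 1."""
    return d * (m**n - 1) / (m**n - d)

def prob(n, j):
    """Closed form for P(Z_n = j), j >= 1, valid for m != 1."""
    mn = m**n
    return mn * ((1 - d) / (mn - d)) ** 2 * ((mn - 1) / (mn - d)) ** (j - 1)

# Compare the closed form with the iteration d_n = h(d_{n-1}), d_0 = 0.
iterate = Fraction(0)
for n in range(1, 7):
    iterate = h(iterate)
    print(n, iterate, extinct_by(n))
```

For $n = 1$ the closed form reduces to the offspring distribution itself: `extinct_by(1)` equals $p_0$, and `prob(1, 1)` and `prob(1, 2)` equal $b$ and $bc$.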
Example 10.12 Let us re-examine the Keyfitz data to see if a distribution of the type considered in Example 10.11 could reasonably be used as a model for this population. We would have to estimate from the data the parameters $b$ and $c$ for the formula $p_k = b c^{k-1}$. Recall that
\[ m = \frac{b}{(1 - c)^2} \tag{10.6} \]
and the probability $d$ that the process dies out is
\[ d = \frac{1 - b - c}{c(1 - c)}. \tag{10.7} \]
Solving Equations 10.6 and 10.7 for $b$ and $c$ gives
\[ c = \frac{m - 1}{m - d} \]
and
\[ b = m \left( \frac{1 - d}{m - d} \right)^2. \]
We shall use the value 1.837 for $m$ and .324 for $d$ that we found in the Keyfitz example. Using these values, we obtain $b = .3666$ and $c = .5533$. Note that $(1 - c)^2 < b < 1 - c$, as required. In Table 10.3 we give for comparison the probabilities $p_0$ through $p_8$ as calculated by the geometric distribution versus the empirical values.

7 T. E. Harris, The Theory of Branching Processes (Berlin: Springer, 1963), p. 9.

 j    Data     Geometric Model
 0    .2092    .1816
 1    .2584    .3666
 2    .2360    .2028
 3    .1593    .1122
 4    .0828    .0621
 5    .0357    .0344
 6    .0133    .0190
 7    .0042    .0105
 8    .0011    .0058
 9    .0002    .0032
10    .0000    .0018
Table 10.3: Comparison of observed and expected frequencies.

The geometric model tends to favor the larger numbers of offspring but is similar enough to show that this modified geometric distribution might be appropriate to use for studies of this kind.

Recall that if $S_n = X_1 + X_2 + \cdots + X_n$ is the sum of independent random variables with the same distribution then the Law of Large Numbers states that $S_n/n$ converges to a constant, namely $E(X_1)$. It is natural to ask if there is a similar limiting theorem for branching processes. Consider a branching process with $Z_n$ representing the number of offspring after $n$ generations.
Then we have seen that the expected value of $Z_n$ is $m^n$. Thus we can scale the random variable $Z_n$ to have expected value 1 by considering the random variable
\[ W_n = \frac{Z_n}{m^n}. \]
In the theory of branching processes it is proved that this random variable $W_n$ will tend to a limit as $n$ tends to infinity. However, unlike the case of the Law of Large Numbers where this limit is a constant, for a branching process the limiting value of the random variables $W_n$ is itself a random variable.

Although we cannot prove this theorem here, we can illustrate it by simulation. This requires a little care. When a branching process survives, the number of offspring is apt to get very large. If in a given generation there are 1000 offspring, the offspring of the next generation are the result of 1000 chance events, and it will take a while to simulate these 1000 experiments. However, since the final result is the sum of 1000 independent experiments, we can use the Central Limit Theorem to replace these 1000 experiments by a single experiment with normal density having the appropriate mean and variance. The program BranchingSimulation carries out this process.

We have run this program for the Keyfitz example, carrying out 10 simulations and graphing the results in Figure 10.4. The expected number of female offspring per female is 1.837, so that we are graphing the outcome for the random variables $W_n = Z_n/(1.837)^n$. For three of the simulations the process died out, which is consistent with the value $d = .3$ that we found for this example. For the other seven simulations the value of $W_n$ tends to a limiting value which is different for each simulation.

[Figure 10.4: Simulation of $Z_n/m^n$ for the Keyfitz example.]

Example 10.13 We now examine the random variable $Z_n$ more closely for the case $m > 1$ (see Example 10.11).
Fix a value $t > 0$; let $[t m^n]$ be the integer part of $t m^n$. Then
\[ P(Z_n = [t m^n]) = m^n \left( \frac{1 - d}{m^n - d} \right)^2 \left( \frac{m^n - 1}{m^n - d} \right)^{[t m^n] - 1} = \frac{1}{m^n} \left( \frac{1 - d}{1 - d/m^n} \right)^2 \left( \frac{1 - 1/m^n}{1 - d/m^n} \right)^{t m^n + a}, \]
where $|a| \le 2$. Thus, as $n \to \infty$,
\[ m^n P(Z_n = [t m^n]) \to (1 - d)^2 \frac{e^{-t}}{e^{-td}} = (1 - d)^2 e^{-t(1 - d)}. \]
For $t = 0$,
\[ P(Z_n = 0) \to d. \]
We can compare this result with the Central Limit Theorem for sums $S_n$ of integer-valued independent random variables (see Theorem 9.3), which states that if $t$ is an integer and $u = (t - n\mu)/\sqrt{\sigma^2 n}$, then as $n \to \infty$,
\[ \sqrt{\sigma^2 n}\; P(S_n = u \sqrt{\sigma^2 n} + \mu n) \to \frac{1}{\sqrt{2\pi}} e^{-u^2/2}. \]
We see that the forms of these statements are quite similar. It is possible to prove a limit theorem for a general class of branching processes that states that under suitable hypotheses, as $n \to \infty$,
\[ m^n P(Z_n = [t m^n]) \to k(t), \]
for $t > 0$, and
\[ P(Z_n = 0) \to d. \]
However, unlike the Central Limit Theorem for sums of independent random variables, the function $k(t)$ will depend upon the basic distribution that determines the process. Its form is known for only a very few examples similar to the one we have considered here.

Chain Letter Problem

Example 10.14 An interesting example of a branching process was suggested by Free Huizinga.$^8$ In 1978, a chain letter called the "Circle of Gold," believed to have started in California, found its way across the country to the theater district of New York. The chain required a participant to buy a letter containing a list of 12 names for 100 dollars. The buyer gives 50 dollars to the person from whom the letter was purchased and then sends 50 dollars to the person whose name is at the top of the list. The buyer then crosses off the name at the top of the list and adds her own name at the bottom in each letter before it is sold again.

Let us first assume that the buyer may sell the letter only to a single person.
If you buy the letter you will want to compute your expected winnings. (We are ignoring here the fact that the passing on of chain letters through the mail is a federal offense with certain obvious resulting penalties.) Assume that each person involved has a probability $p$ of selling the letter. Then you will receive 50 dollars with probability $p$ and another 50 dollars if the letter is sold to 12 people, since then your name would have risen to the top of the list. This occurs with probability $p^{12}$, and so your expected winnings are $-100 + 50p + 50p^{12}$. Thus the chain in this situation is a highly unfavorable game.

It would be more reasonable to allow each person involved to make a copy of the list and try to sell the letter to at least 2 other people. Then you would have a chance of recovering your 100 dollars on these sales, and if any of the letters is sold 12 times you will receive a bonus of 50 dollars for each of these cases. We can consider this as a branching process with 12 generations. The members of the first generation are the letters you sell. The second generation consists of the letters sold by members of the first generation, and so forth.

Let us assume that the probabilities that each individual sells letters to 0, 1, or 2 others are $p_0$, $p_1$, and $p_2$, respectively. Let $Z_1, Z_2, \ldots, Z_{12}$ be the number of letters in the first 12 generations of this branching process. Then your expected winnings are
\[ 50(E(Z_1) + E(Z_{12})) = 50m + 50m^{12}, \]
where $m = p_1 + 2p_2$ is the expected number of letters you sold. Thus to be favorable we must have
\[ 50m + 50m^{12} > 100, \]
or
\[ m + m^{12} > 2. \]
But this will be true if and only if $m > 1$. We have seen that this will occur in the quadratic case if and only if $p_2 > p_0$. Let us assume for example that $p_0 = .2$, $p_1 = .5$, and $p_2 = .3$.

8 Private communication.
Then $m = 1.1$ and the chain would be a favorable game. Your expected profit would be
\[ 50(1.1 + 1.1^{12}) - 100 \approx 112. \]
The probability that you receive at least one payment from the 12th generation is $1 - d_{12}$. We find from our program Branch that $d_{12} = .599$. Thus, $1 - d_{12} = .401$ is the probability that you receive some bonus. The maximum that you could receive from the chain would be $50(2 + 2^{12}) = 204{,}900$ if everyone were to successfully sell two letters. Of course you cannot always expect to be so lucky. (What is the probability of this happening?)

To simulate this game, we need only simulate a branching process for 12 generations. Using a slightly modified version of our program BranchingSimulation we carried out twenty such simulations, giving the results shown in Table 10.4. Note that we were quite lucky on a few runs, but we came out ahead only a little less than half the time. The process died out by the twelfth generation in 12 out of the 20 experiments, in good agreement with the probability $d_{12} = .599$ that we calculated using the program Branch.

Let us modify the assumptions about our chain letter to let the buyer sell the letter to as many people as she can instead of to a maximum of two. We shall assume, in fact, that a person has a large number $N$ of acquaintances and a small probability $p$ of persuading any one of them to buy the letter. Then the distribution for the number of letters that she sells will be a binomial distribution with mean $m = Np$. Since $N$ is large and $p$ is small, we can assume that the probability $p_j$ that an individual sells the letter to $j$ people is given by the Poisson distribution
\[ p_j = \frac{e^{-m} m^j}{j!}. \]
  Z1   Z2   Z3   Z4   Z5   Z6   Z7   Z8   Z9  Z10  Z11  Z12   Profit
   1    0    0    0    0    0    0    0    0    0    0    0      -50
   1    1    2    3    2    3    2    1    2    3    3    6      250
   0    0    0    0    0    0    0    0    0    0    0    0     -100
   2    4    4    2    3    4    4    3    2    2    1    1       50
   1    2    3    5    4    3    3    3    5    8    6    6      250
   0    0    0    0    0    0    0    0    0    0    0    0     -100
   2    3    2    2    2    1    2    3    3    3    4    6      300
   1    2    1    1    1    1    2    1    0    0    0    0      -50
   0    0    0    0    0    0    0    0    0    0    0    0     -100
   1    0    0    0    0    0    0    0    0    0    0    0      -50
   2    3    2    3    3    3    5    9   12   12   13   15      750
   1    1    1    0    0    0    0    0    0    0    0    0      -50
   1    2    2    3    3    0    0    0    0    0    0    0      -50
   1    1    1    1    2    2    3    4    4    6    4    5      200
   1    1    0    0    0    0    0    0    0    0    0    0      -50
   1    0    0    0    0    0    0    0    0    0    0    0      -50
   1    0    0    0    0    0    0    0    0    0    0    0      -50
   1    1    2    3    3    4    2    3    3    3    3    2       50
   1    2    4    6    6    9   10   13   16   17   15   18      850
   1    0    0    0    0    0    0    0    0    0    0    0      -50
Table 10.4: Simulation of chain letter (finite distribution case).

  Z1   Z2   Z3   Z4   Z5   Z6   Z7   Z8   Z9  Z10  Z11  Z12   Profit
   1    2    6    7    7    8   11    9    7    6    6    5      200
   1    0    0    0    0    0    0    0    0    0    0    0      -50
   1    0    0    0    0    0    0    0    0    0    0    0      -50
   1    1    1    0    0    0    0    0    0    0    0    0      -50
   0    0    0    0    0    0    0    0    0    0    0    0     -100
   1    1    1    1    1    1    2    4    9    7    9    7      300
   2    3    3    4    2    0    0    0    0    0    0    0        0
   1    0    0    0    0    0    0    0    0    0    0    0      -50
   2    1    0    0    0    0    0    0    0    0    0    0        0
   3    3    4    7   11   17   14   11   11   10   16   25     1300
   0    0    0    0    0    0    0    0    0    0    0    0     -100
   1    2    2    1    1    3    1    0    0    0    0    0      -50
   0    0    0    0    0    0    0    0    0    0    0    0     -100
   2    3    1    0    0    0    0    0    0    0    0    0        0
   3    1    0    0    0    0    0    0    0    0    0    0       50
   1    0    0    0    0    0    0    0    0    0    0    0      -50
   3    4    4    7   10   11    9   11   12   14   13   10      550
   1    3    3    4    9    5    7    9    8    8    6    3      100
   1    0    4    6    6    9   10   13    0    0    0    0      -50
   1    0    0    0    0    0    0    0    0    0    0    0      -50
Table 10.5: Simulation of chain letter (Poisson case).

In both tables the profit is $50(Z_1 + Z_{12}) - 100$, so a negative entry is a loss.

The generating function for the Poisson distribution is
\[ h(z) = \sum_{j=0}^{\infty} \frac{e^{-m} m^j z^j}{j!} = e^{-m} \sum_{j=0}^{\infty} \frac{m^j z^j}{j!} = e^{-m} e^{mz} = e^{m(z - 1)}. \]
The expected number of letters that an individual passes on is $m$, and again to be favorable we must have $m > 1$. Let us assume again that $m = 1.1$. Then we can find again the probability $1 - d_{12}$ of a bonus from Branch. The result is .232. Although the expected winnings are the same, the variance is larger in this case, and the buyer has a better chance for a reasonably large profit.
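The figures quoted for the chain letter (an expected profit of about 112, and bonus probabilities of .401 and .232) can be reproduced with a few lines of code. The following Python sketch is an illustration rather than the authors' program; the function names `h_finite`, `h_poisson`, and `die_out_by` are hypothetical, and the iteration $d_n = h(d_{n-1})$ with $d_0 = 0$ is used in place of Branch.

```python
import math

M = 1.1  # mean number of letters sold, as assumed in the example

def h_finite(z):
    """Generating function for p0 = .2, p1 = .5, p2 = .3."""
    return 0.2 + 0.5 * z + 0.3 * z * z

def h_poisson(z):
    """Generating function e^{m(z - 1)} of the Poisson distribution with mean M."""
    return math.exp(M * (z - 1.0))

def die_out_by(h, n):
    """Probability d_n that the process has died out by generation n."""
    d = 0.0
    for _ in range(n):
        d = h(d)
    return d

expected_profit = 50 * (M + M**12) - 100
bonus_finite = 1 - die_out_by(h_finite, 12)
bonus_poisson = 1 - die_out_by(h_poisson, 12)
max_payout = 50 * (2 + 2**12)

print(round(expected_profit, 2))
print(round(bonus_finite, 3), round(bonus_poisson, 3))
print(max_payout)
```

Both offspring distributions have the same mean, so the expected profit is identical; only the chance of reaching the twelfth generation differs.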
We again carried out 20 simulations using the Poisson distribution with mean 1.1. The results are shown in Table 10.5. We note that, as before, we came out ahead less than half the time, but we also had one large profit. In only 6 of the 20 cases did we receive any profit. This is again in reasonable agreement with our calculation of a probability .232 for this happening.

Exercises

1 Let $Z_1, Z_2, \ldots, Z_N$ describe a branching process in which each parent has $j$ offspring with probability $p_j$. Find the probability $d$ that the process eventually dies out if
(a) $p_0 = 1/2$, $p_1 = 1/4$, and $p_2 = 1/4$.
(b) $p_0 = 1/3$, $p_1 = 1/3$, and $p_2 = 1/3$.
(c) $p_0 = 1/3$, $p_1 = 0$, and $p_2 = 2/3$.
(d) $p_j = 1/2^{j+1}$, for $j = 0, 1, 2, \ldots$.
(e) $p_j = (1/3)(2/3)^j$, for $j = 0, 1, 2, \ldots$.
(f) $p_j = e^{-2} 2^j / j!$, for $j = 0, 1, 2, \ldots$ (estimate $d$ numerically).

2 Let $Z_1, Z_2, \ldots, Z_N$ describe a branching process in which each parent has $j$ offspring with probability $p_j$. Find the probability $d$ that the process dies out if
(a) $p_0 = 1/2$, $p_1 = p_2 = 0$, and $p_3 = 1/2$.
(b) $p_0 = p_1 = p_2 = p_3 = 1/4$.
(c) $p_0 = t$, $p_1 = 1 - 2t$, $p_2 = 0$, and $p_3 = t$, where $t \le 1/2$.

3 In the chain letter problem (see Example 10.14) find your expected profit if
(a) $p_0 = 1/2$, $p_1 = 0$, and $p_2 = 1/2$.
(b) $p_0 = 1/6$, $p_1 = 1/2$, and $p_2 = 1/3$.
Show that if $p_0 > 1/2$, you cannot expect to make a profit.

4 Let $S_N = X_1 + X_2 + \cdots + X_N$, where the $X_i$'s are independent random variables with common distribution having generating function $f(z)$. Assume that $N$ is an integer-valued random variable independent of all of the $X_j$ and having generating function $g(z)$. Show that the generating function for $S_N$ is $h(z) = g(f(z))$.
Hint: Use the fact that
\[ h(z) = E(z^{S_N}) = \sum_k E(z^{S_N} \mid N = k)\, P(N = k). \]

5 We have seen that if the generating function for the offspring of a single parent is $f(z)$, then the generating function for the number of offspring after two generations is given by $h(z) = f(f(z))$. Explain how this follows from the result of Exercise 4.

6 Consider a queueing process such that in each minute either 1 or 0 customers arrive with probabilities $p$ or $q = 1 - p$, respectively. (The number $p$ is called the arrival rate.) When a customer starts service she finishes in the next minute with probability $r$. (The number $r$ is called the service rate.) Thus when a customer begins being served she will finish being served in $j$ minutes with probability $(1 - r)^{j-1} r$, for $j = 1, 2, 3, \ldots$.
(a) Find the generating function $f(z)$ for the number of customers who arrive in one minute and the generating function $g(z)$ for the length of time that a person spends in service once she begins service.
(b) Consider a customer branching process by considering the offspring of a customer to be the customers who arrive while she is being served. Using Exercise 4, show that the generating function for our customer branching process is $h(z) = g(f(z))$.
(c) If we start the branching process with the arrival of the first customer, then the length of time until the branching process dies out will be the busy period for the server. Find a condition in terms of the arrival rate and service rate that will assure that the server will ultimately have a time when he is not busy.

7 Let $N$ be the expected total number of offspring in a branching process. Let $m$ be the mean number of offspring of a single parent. Show that
\[ N = 1 + \left( \sum p_k \cdot k \right) N = 1 + mN \]
and hence that $N$ is finite if and only if $m < 1$ and in that case $N = 1/(1 - m)$.
8 Consider a branching process such that the number of offspring of a parent is $j$ with probability $1/2^{j+1}$ for $j = 0, 1, 2, \ldots$.
(a) Using the results of Example 10.11 show that the probability that there are $j$ offspring in the $n$th generation is
\[ p^{(n)}_j = \begin{cases} \dfrac{1}{n(n+1)} \left( \dfrac{n}{n+1} \right)^j, & \text{if } j \ge 1, \\[2ex] \dfrac{n}{n+1}, & \text{if } j = 0. \end{cases} \]
(b) Show that the probability that the process dies out exactly at the $n$th generation is $1/n(n+1)$.
(c) Show that the expected lifetime is infinite even though $d = 1$.

10.3 Generating Functions for Continuous Densities

In the previous section, we introduced the concepts of moments and moment generating functions for discrete random variables. These concepts have natural analogues for continuous random variables, provided some care is taken in arguments involving convergence.

Moments

If $X$ is a continuous random variable defined on the probability space $\Omega$, with density function $f_X$, then we define the $n$th moment of $X$ by the formula
\[ \mu_n = E(X^n) = \int_{-\infty}^{+\infty} x^n f_X(x)\, dx, \]
provided the integral
\[ E(|X|^n) = \int_{-\infty}^{+\infty} |x|^n f_X(x)\, dx \]
is finite. Then, just as in the discrete case, we see that $\mu_0 = 1$, $\mu_1 = \mu$, and $\mu_2 - \mu_1^2 = \sigma^2$.

Moment Generating Functions

Now we define the moment generating function $g(t)$ for $X$ by the formula
\[ g(t) = \sum_{k=0}^{\infty} \frac{\mu_k t^k}{k!} = \sum_{k=0}^{\infty} \frac{E(X^k)\, t^k}{k!} = E(e^{tX}) = \int_{-\infty}^{+\infty} e^{tx} f_X(x)\, dx, \]
provided this series converges. Then, as before, we have
\[ \mu_n = g^{(n)}(0). \]

Examples

Example 10.15 Let $X$ be a continuous random variable with range $[0, 1]$ and density function $f_X(x) = 1$ for $0 \le x \le 1$ (uniform density). Then
\[ \mu_n = \int_0^1 x^n\, dx = \frac{1}{n + 1}, \]
and
\[ g(t) = \sum_{k=0}^{\infty} \frac{t^k}{(k + 1)!} = \frac{e^t - 1}{t}. \]
Here the series converges for all $t$. Alternatively, we have
\[ g(t) = \int_{-\infty}^{+\infty} e^{tx} f_X(x)\, dx = \int_0^1 e^{tx}\, dx = \frac{e^t - 1}{t}. \]
Then (by L'Hôpital's rule)
\[ \mu_0 = g(0) = \lim_{t \to 0} \frac{e^t - 1}{t} = 1, \]
\[ \mu_1 = g'(0) = \lim_{t \to 0} \frac{t e^t - e^t + 1}{t^2} = \frac12, \]
\[ \mu_2 = g''(0) = \lim_{t \to 0} \frac{t^3 e^t - 2t^2 e^t + 2t e^t - 2t}{t^4} = \frac13. \]
In particular, we verify that $\mu = g'(0) = 1/2$ and
\[ \sigma^2 = g''(0) - (g'(0))^2 = \frac13 - \frac14 = \frac{1}{12} \]
as before (see Example 6.25).

Example 10.16 Let $X$ have range $[0, \infty)$ and density function $f_X(x) = \lambda e^{-\lambda x}$ (exponential density with parameter $\lambda$). In this case
\[ \mu_n = \int_0^{\infty} x^n \lambda e^{-\lambda x}\, dx = \lambda (-1)^n \frac{d^n}{d\lambda^n} \int_0^{\infty} e^{-\lambda x}\, dx = \lambda (-1)^n \frac{d^n}{d\lambda^n} \left[ \frac{1}{\lambda} \right] = \frac{n!}{\lambda^n}, \]
and
\[ g(t) = \sum_{k=0}^{\infty} \frac{\mu_k t^k}{k!} = \sum_{k=0}^{\infty} \left[ \frac{t}{\lambda} \right]^k = \frac{\lambda}{\lambda - t}. \]
Here the series converges only for $|t| < \lambda$. Alternatively, we have
\[ g(t) = \int_0^{\infty} e^{tx} \lambda e^{-\lambda x}\, dx = \left. \frac{\lambda e^{(t - \lambda)x}}{t - \lambda} \right|_0^{\infty} = \frac{\lambda}{\lambda - t}. \]
Now we can verify directly that
\[ \mu_n = g^{(n)}(0) = \left. \frac{\lambda\, n!}{(\lambda - t)^{n+1}} \right|_{t=0} = \frac{n!}{\lambda^n}. \]

Example 10.17 Let $X$ have range $(-\infty, +\infty)$ and density function
\[ f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \]
(normal density). In this case we have
\[ \mu_n = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} x^n e^{-x^2/2}\, dx = \begin{cases} \dfrac{(2m)!}{2^m m!}, & \text{if } n = 2m, \\[2ex] 0, & \text{if } n = 2m + 1. \end{cases} \]
(These moments are calculated by integrating once by parts to show that $\mu_n = (n - 1)\mu_{n-2}$, and observing that $\mu_0 = 1$ and $\mu_1 = 0$.)
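For readers who want the integration by parts spelled out, the following is a sketch of the step, taking $u = x^{n-1}$ and $dv = x e^{-x^2/2}\,dx$ for $n \ge 2$; the boundary term vanishes because $e^{-x^2/2}$ decays faster than any power of $x$ grows.

```latex
\begin{align*}
\mu_n &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} x^{n-1} \cdot x e^{-x^2/2}\, dx \\
      &= \frac{1}{\sqrt{2\pi}} \Bigl[ -x^{n-1} e^{-x^2/2} \Bigr]_{-\infty}^{+\infty}
         + \frac{n-1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} x^{n-2} e^{-x^2/2}\, dx \\
      &= (n-1)\,\mu_{n-2}, \\
\mu_{2m} &= (2m-1)(2m-3)\cdots 3 \cdot 1 \cdot \mu_0 = \frac{(2m)!}{2^m\, m!}.
\end{align*}
```

For instance, $\mu_2 = 1$, $\mu_4 = 3$, and $\mu_6 = 15$, while every odd moment vanishes because $\mu_1 = 0$.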
Hence,
\[ g(t) = \sum_{n=0}^{\infty} \frac{\mu_n t^n}{n!} = \sum_{m=0}^{\infty} \frac{t^{2m}}{2^m m!} = e^{t^2/2}. \]
This series converges for all values of $t$. Again we can verify that $g^{(n)}(0) = \mu_n$.

Let $X$ be a normal random variable with parameters $\mu$ and $\sigma$. It is easy to show that the moment generating function of $X$ is given by
\[ e^{t\mu + (\sigma^2/2) t^2}. \]
Now suppose that $X$ and $Y$ are two independent normal random variables with parameters $\mu_1$, $\sigma_1$ and $\mu_2$, $\sigma_2$, respectively. Then, the product of the moment generating functions of $X$ and $Y$ is
\[ e^{t(\mu_1 + \mu_2) + ((\sigma_1^2 + \sigma_2^2)/2) t^2}. \]
This is the moment generating function for a normal random variable with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$. Thus, the sum of two independent normal random variables is again normal. (This was proved for the special case that both summands are standard normal in Example 7.5.)

In general, the series defining $g(t)$ will not converge for all $t$. But in the important special case where $X$ is bounded (i.e., where the range of $X$ is contained in a finite interval), we can show that the series does converge for all $t$.

Theorem 10.3 Suppose $X$ is a continuous random variable with range contained in the interval $[-M, M]$. Then the series
\[ g(t) = \sum_{k=0}^{\infty} \frac{\mu_k t^k}{k!} \]
converges for all $t$ to an infinitely differentiable function $g(t)$, and $g^{(n)}(0) = \mu_n$.

Proof. We have
\[ \mu_k = \int_{-M}^{+M} x^k f_X(x)\, dx, \]
so
\[ |\mu_k| \le \int_{-M}^{+M} |x|^k f_X(x)\, dx \le M^k \int_{-M}^{+M} f_X(x)\, dx = M^k. \]
Hence, for all $N$ we have
\[ \sum_{k=0}^{N} \left| \frac{\mu_k t^k}{k!} \right| \le \sum_{k=0}^{N} \frac{(M|t|)^k}{k!} \le e^{M|t|}, \]
which shows that the power series converges for all $t$. We know that the sum of a convergent power series is always differentiable.

Moment Problem

Theorem 10.4 If $X$ is a bounded random variable, then the moment generating function $g_X(t)$ of $X$ determines the density function $f_X(x)$ uniquely.

Sketch of the Proof.
We know that
\[ g_X(t) = \sum_{k=0}^{\infty} \frac{\mu_k t^k}{k!} = \int_{-\infty}^{+\infty} e^{tx} f(x)\, dx. \]
If we replace $t$ by $i\tau$, where $\tau$ is real and $i = \sqrt{-1}$, then the series converges for all $\tau$, and we can define the function
\[ k_X(\tau) = g_X(i\tau) = \int_{-\infty}^{+\infty} e^{i\tau x} f_X(x)\, dx. \]
The function $k_X(\tau)$ is called the characteristic function of $X$, and is defined by the above equation even when the series for $g_X$ does not converge. This equation says that $k_X$ is the Fourier transform of $f_X$. It is known that the Fourier transform has an inverse, given by the formula
\[ f_X(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-i\tau x} k_X(\tau)\, d\tau, \]
suitably interpreted.$^9$ Here we see that the characteristic function $k_X$, and hence the moment generating function $g_X$, determines the density function $f_X$ uniquely under our hypotheses.

9 H. Dym and H. P. McKean, Fourier Series and Integrals (New York: Academic Press, 1972).

Sketch of the Proof of the Central Limit Theorem

With the above result in mind, we can now sketch a proof of the Central Limit Theorem for bounded continuous random variables (see Theorem 9.6). To this end, let $X$ be a continuous random variable with density function $f_X$, mean $\mu = 0$ and variance $\sigma^2 = 1$, and moment generating function $g(t)$ defined by its series for all $t$. Let $X_1, X_2, \ldots, X_n$ be an independent trials process with each $X_i$ having density $f_X$, and let $S_n = X_1 + X_2 + \cdots + X_n$, and $S_n^* = (S_n - n\mu)/\sqrt{n\sigma^2} = S_n/\sqrt{n}$. Then each $X_i$ has moment generating function $g(t)$, and since the $X_i$ are independent, the sum $S_n$, just as in the discrete case (see Section 10.1), has moment generating function
\[ g_n(t) = (g(t))^n, \]
and the standardized sum $S_n^*$ has moment generating function
\[ g_n^*(t) = \left( g\!\left( \frac{t}{\sqrt{n}} \right) \right)^n. \]
We now show that, as $n \to \infty$, $g_n^*(t) \to e^{t^2/2}$, where $e^{t^2/2}$ is the moment generating function of the normal density $n(x) = (1/\sqrt{2\pi})\, e^{-x^2/2}$ (see Example 10.17).
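The claimed limit can be observed numerically for a concrete bounded density. The Python sketch below uses the uniform density on $[-\sqrt{3}, \sqrt{3}]$, which is a choice made here for illustration (it has mean 0 and variance 1, and moment generating function $\sinh(at)/(at)$ with $a = \sqrt{3}$); the function names are hypothetical. It is a numerical illustration only, not a substitute for the argument.

```python
import math

A = math.sqrt(3.0)  # uniform on [-A, A] has mean 0 and variance A**2 / 3 = 1

def g(t):
    """Moment generating function of the uniform density on [-A, A]."""
    if t == 0.0:
        return 1.0
    return math.sinh(A * t) / (A * t)

def g_star(n, t):
    """Moment generating function of the standardized sum S_n / sqrt(n)."""
    return g(t / math.sqrt(n)) ** n

t = 1.0
target = math.exp(t * t / 2)  # moment generating function of the standard normal at t
for n in (1, 10, 100, 1000):
    print(n, g_star(n, t), target)
```

For this density the gap between $g_n^*(t)$ and $e^{t^2/2}$ shrinks roughly in proportion to $1/n$.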
To show this, we set $u(t) = \log g(t)$, and
\[ u_n^*(t) = \log g_n^*(t) = n \log g\!\left( \frac{t}{\sqrt{n}} \right) = n\, u\!\left( \frac{t}{\sqrt{n}} \right), \]
and show that $u_n^*(t) \to t^2/2$ as $n \to \infty$. First we note that
\[ u(0) = \log g(0) = 0, \]
\[ u'(0) = \frac{g'(0)}{g(0)} = \frac{\mu_1}{1} = 0, \]
\[ u''(0) = \frac{g''(0)\, g(0) - (g'(0))^2}{(g(0))^2} = \frac{\mu_2 - \mu_1^2}{1} = \sigma^2 = 1. \]
Now by using L'Hôpital's rule twice, we get
\[ \lim_{n \to \infty} u_n^*(t) = \lim_{s \to \infty} \frac{u(t/\sqrt{s})}{s^{-1}} = \lim_{s \to \infty} \frac{u'(t/\sqrt{s})\, t}{2 s^{-1/2}} = \lim_{s \to \infty} u''\!\left( \frac{t}{\sqrt{s}} \right) \frac{t^2}{2} = \sigma^2 \frac{t^2}{2} = \frac{t^2}{2}. \]
Hence, $g_n^*(t) \to e^{t^2/2}$ as $n \to \infty$.

Now to complete the proof of the Central Limit Theorem, we must show that if $g_n^*(t) \to e^{t^2/2}$, then under our hypotheses the distribution functions $F_n^*(x)$ of the $S_n^*$ must converge to the distribution function $F_N^*(x)$ of the normal variable $N$; that is, that
\[ F_n^*(a) = P(S_n^* \le a) \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-x^2/2}\, dx, \]
and furthermore, that the density functions $f_n^*(x)$ of the $S_n^*$ must converge to the density function for $N$; that is, that
\[ f_n^*(x) \to \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \]
as $n \to \infty$.

Since the densities, and hence the distributions, of the $S_n^*$ are uniquely determined by their moment generating functions under our hypotheses, these conclusions are certainly plausible, but their proofs involve a detailed examination of characteristic functions and Fourier transforms, and we shall not attempt them here.

In the same way, we can prove the Central Limit Theorem for bounded discrete random variables with integer values (see Theorem 9.4).
Let $X$ be a discrete random variable with density function $p(j)$, mean $\mu = 0$, variance $\sigma^2 = 1$, and moment generating function $g(t)$, and let $X_1, X_2, \ldots, X_n$ form an independent trials process with common density $p$. Let $S_n = X_1 + X_2 + \cdots + X_n$ and $S_n^* = S_n/\sqrt{n}$, with densities $p_n$ and $p_n^*$, and moment generating functions $g_n(t)$ and
$$g_n^*(t) = \left( g\!\left( \frac{t}{\sqrt{n}} \right) \right)^{n}.$$
Then we have
$$g_n^*(t) \to e^{t^2/2},$$
just as in the continuous case, and this implies in the same way that the distribution functions $F_n^*(x)$ converge to the normal distribution; that is, that
$$F_n^*(a) = P(S_n^* \le a) \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-x^2/2}\,dx,$$
as $n \to \infty$.

The corresponding statement about the distribution functions $p_n^*$, however, requires a little extra care (see Theorem 9.3). The trouble arises because the distribution $p(x)$ is not defined for all $x$, but only for integer $x$. It follows that the distribution $p_n^*(x)$ is defined only for $x$ of the form $j/\sqrt{n}$, and these values change as $n$ changes.

We can fix this, however, by introducing the function $\bar p(x)$, defined by the formula
$$\bar p(x) = \begin{cases} p(j), & \text{if } j - 1/2 \le x < j + 1/2, \\ 0, & \text{otherwise.} \end{cases}$$
Then $\bar p(x)$ is defined for all $x$, $\bar p(j) = p(j)$, and the graph of $\bar p(x)$ is the step function for the distribution $p(j)$ (see Figure 3 of Section 9.1).

In the same way we introduce the step functions $\bar p_n(x)$ and $\bar p_n^*(x)$ associated with the distributions $p_n$ and $p_n^*$, and their moment generating functions $\bar g_n(t)$ and $\bar g_n^*(t)$. If we can show that $\bar g_n^*(t) \to e^{t^2/2}$, then we can conclude that
$$\bar p_n^*(x) \to \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2},$$
as $n \to \infty$, for all $x$, a conclusion strongly suggested by Figure 9.3.

Now $\bar g(t)$ is given by
$$\bar g(t) = \int_{-\infty}^{+\infty} e^{tx}\, \bar p(x)\,dx = \sum_{j=-N}^{+N} \int_{j-1/2}^{j+1/2} e^{tx} p(j)\,dx = \sum_{j=-N}^{+N} p(j)\, e^{tj}\, \frac{e^{t/2} - e^{-t/2}}{2\,(t/2)} = g(t)\, \frac{\sinh(t/2)}{t/2},$$
where we have put
$$\sinh(t/2) = \frac{e^{t/2} - e^{-t/2}}{2}.$$
In the same way we find that
$$\bar g_n(t) = g_n(t)\, \frac{\sinh(t/2)}{t/2}, \qquad \bar g_n^*(t) = g_n^*(t)\, \frac{\sinh(t/2\sqrt{n})}{t/2\sqrt{n}}.$$
Now, as $n \to \infty$, we know that $g_n^*(t) \to e^{t^2/2}$, and, by L'Hôpital's rule,
$$\lim_{n \to \infty} \frac{\sinh(t/2\sqrt{n})}{t/2\sqrt{n}} = 1.$$
It follows that
$$\bar g_n^*(t) \to e^{t^2/2},$$
and hence that
$$\bar p_n^*(x) \to \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2},$$
as $n \to \infty$.

The astute reader will note that in this sketch of the proof of Theorem 9.3, we never made use of the hypothesis that the greatest common divisor of the differences of all the values that the $X_i$ can take on is 1. This is a technical point that we choose to ignore. A complete proof may be found in Gnedenko and Kolmogorov.¹⁰

¹⁰ B. V. Gnedenko and A. N. Kolmogorov, Limit Distributions for Sums of Independent Random Variables (Reading: Addison-Wesley, 1968), p. 233.

Cauchy Density

The characteristic function of a continuous density is a useful tool even in cases when the moment series does not converge, or even in cases when the moments themselves are not finite. As an example, consider the Cauchy density with parameter $a = 1$ (see Example 5.10)
$$f(x) = \frac{1}{\pi (1 + x^2)}.$$
If $X$ and $Y$ are independent random variables with Cauchy density $f(x)$, then the average $Z = (X + Y)/2$ also has Cauchy density $f(x)$, that is,
$$f_Z(x) = f(x).$$
This is hard to check directly, but easy to check by using characteristic functions. Note first that
$$\mu_2 = E(X^2) = \int_{-\infty}^{+\infty} \frac{x^2}{\pi (1 + x^2)}\,dx = \infty,$$
so that $\mu_2$ is infinite.
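To see the divergence of the second moment concretely, one can truncate the integral to $[-M, M]$. Since $x^2/(1+x^2) = 1 - 1/(1+x^2)$, the truncated integral equals $(2/\pi)(M - \arctan M)$, which grows without bound as $M$ increases. The sketch below is not from the original text and its function names are hypothetical; it compares a midpoint-rule approximation with that closed form.

```python
import math

def cauchy_density(x):
    """Cauchy density with parameter a = 1."""
    return 1.0 / (math.pi * (1.0 + x * x))

def truncated_second_moment(m, steps=200_000):
    """Midpoint-rule approximation of the integral of x^2 f(x) over [-m, m]."""
    h = 2.0 * m / steps
    total = 0.0
    for i in range(steps):
        x = -m + (i + 0.5) * h
        total += x * x * cauchy_density(x)
    return total * h

def truncated_second_moment_exact(m):
    """Closed form (2/pi) * (m - arctan m) of the same truncated integral."""
    return (2.0 / math.pi) * (m - math.atan(m))

for m in (10, 100, 1000):
    print(m, truncated_second_moment(m), truncated_second_moment_exact(m))
```

The printed values grow roughly linearly in $M$, which is the numerical face of the integral being infinite.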
Nevertheless, we can define the characteristic function $k_X(\tau)$ of $X$ by the formula
$$k_X(\tau) = \int_{-\infty}^{+\infty} e^{i\tau x}\, \frac{1}{\pi (1 + x^2)}\,dx.$$
This integral is easy to do by contour methods, and gives us
$$k_X(\tau) = k_Y(\tau) = e^{-|\tau|}.$$
Hence,
$$k_{X+Y}(\tau) = (e^{-|\tau|})^2 = e^{-2|\tau|},$$
and since
$$k_Z(\tau) = k_{X+Y}(\tau/2),$$
we have
$$k_Z(\tau) = e^{-2|\tau/2|} = e^{-|\tau|}.$$
This shows that $k_Z = k_X = k_Y$, and leads to the conclusion that $f_Z = f_X = f_Y$.

It follows from this that if $X_1, X_2, \ldots, X_n$ is an independent trials process with common Cauchy density, and if
$$A_n = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
is the average of the $X_i$, then $A_n$ has the same density as do the $X_i$. This means that the Law of Large Numbers fails for this process; the distribution of the average $A_n$ is exactly the same as for the individual terms. Our proof of the Law of Large Numbers fails in this case because the variance of $X_i$ is not finite.

Exercises

1 Let $X$ be a continuous random variable with values in $[0, 2]$ and density $f_X$. Find the moment generating function $g(t)$ for $X$ if
(a) $f_X(x) = 1/2$.
(b) $f_X(x) = (1/2)x$.
(c) $f_X(x) = 1 - (1/2)x$.
(d) $f_X(x) = |1 - x|$.
(e) $f_X(x) = (3/8)x^2$.
Hint: Use the integral definition, as in Examples 10.15 and 10.16.

2 For each of the densities in Exercise 1 calculate the first and second moments, $\mu_1$ and $\mu_2$, directly from their definition, and verify that $g(0) = 1$, $g'(0) = \mu_1$, and $g''(0) = \mu_2$.

3 Let $X$ be a continuous random variable with values in $[0, \infty)$ and density $f_X$. Find the moment generating functions for $X$ if
(a) $f_X(x) = 2e^{-2x}$.
(b) $f_X(x) = e^{-2x} + (1/2)e^{-x}$.
(c) $f_X(x) = 4xe^{-2x}$.
(d) $f_X(x) = \lambda(\lambda x)^{n-1} e^{-\lambda x}/(n-1)!$.
4 For each of the densities in Exercise 3, calculate the first and second moments, $\mu_1$ and $\mu_2$, directly from their definition, and verify that $g(0) = 1$, $g'(0) = \mu_1$, and $g''(0) = \mu_2$.

5 Find the characteristic function $k_X(\tau)$ for each of the random variables $X$ of Exercise 1.

6 Let $X$ be a continuous random variable whose characteristic function $k_X(\tau)$ is
$$k_X(\tau) = e^{-|\tau|}, \qquad -\infty < \tau < +\infty.$$
Show directly that the density $f_X$ of $X$ is
$$f_X(x) = \frac{1}{\pi (1 + x^2)}.$$

7 Let $X$ be a continuous random variable with values in $[0, 1]$, uniform density function $f_X(x) \equiv 1$, and moment generating function $g(t) = (e^t - 1)/t$. Find in terms of $g(t)$ the moment generating function for
(a) $-X$.
(b) $1 + X$.
(c) $3X$.
(d) $aX + b$.

8 Let $X_1, X_2, \ldots, X_n$ be an independent trials process with uniform density. Find the moment generating function for
(a) $X_1$.
(b) $S_2 = X_1 + X_2$.
(c) $S_n = X_1 + X_2 + \cdots + X_n$.
(d) $A_n = S_n/n$.
(e) $S_n^* = (S_n - n\mu)/\sqrt{n\sigma^2}$.

9 Let $X_1, X_2, \ldots, X_n$ be an independent trials process with normal density of mean 1 and variance 2. Find the moment generating function for
(a) $X_1$.
(b) $S_2 = X_1 + X_2$.
(c) $S_n = X_1 + X_2 + \cdots + X_n$.
(d) $A_n = S_n/n$.
(e) $S_n^* = (S_n - n\mu)/\sqrt{n\sigma^2}$.

10 Let $X_1, X_2, \ldots, X_n$ be an independent trials process with density
$$f(x) = \frac{1}{2}\, e^{-|x|}, \qquad -\infty < x < +\infty.$$
(a) Find the mean and variance of $f(x)$.
(b) Find the moment generating function for $X_1$, $S_n$, $A_n$, and $S_n^*$.
(c) What can you say about the moment generating function of $S_n^*$ as $n \to \infty$?
(d) What can you say about the moment generating function of $A_n$ as $n \to \infty$?

Chapter 11
Markov Chains

11.1 Introduction

Most of our study of probability has dealt with independent trials processes. These processes are the basis of classical probability theory and much of statistics.
We have discussed two of the principal theorems for these processes: the Law of Large Numbers and the Central Limit Theorem.

We have seen that when a sequence of chance experiments forms an independent trials process, the possible outcomes for each experiment are the same and occur with the same probability. Further, knowledge of the outcomes of the previous experiments does not influence our predictions for the outcomes of the next experiment. The distribution for the outcomes of a single experiment is sufficient to construct a tree and a tree measure for a sequence of $n$ experiments, and we can answer any probability question about these experiments by using this tree measure.

Modern probability theory studies chance processes for which the knowledge of previous outcomes influences predictions for future experiments. In principle, when we observe a sequence of chance experiments, all of the past outcomes could influence our predictions for the next experiment. For example, this should be the case in predicting a student's grades on a sequence of exams in a course. But to allow this much generality would make it very difficult to prove general results.

In 1907, A. A. Markov began the study of an important new type of chance process. In this process, the outcome of a given experiment can affect the outcome of the next experiment. This type of process is called a Markov chain.

Specifying a Markov Chain

We describe a Markov chain as follows: We have a set of states, $S = \{s_1, s_2, \ldots, s_r\}$. The process starts in one of these states and moves successively from one state to another. Each move is called a step. If the chain is currently in state $s_i$, then it moves to state $s_j$ at the next step with a probability denoted by $p_{ij}$, and this probability does not depend upon which states the chain was in before the current state.

The probabilities $p_{ij}$ are called transition probabilities. The process can remain in the state it is in, and this occurs with probability $p_{ii}$. An initial probability distribution, defined on $S$, specifies the starting state. Usually this is done by specifying a particular state as the starting state.

R. A. Howard¹ provides us with a picturesque description of a Markov chain as a frog jumping on a set of lily pads. The frog starts on one of the pads and then jumps from lily pad to lily pad with the appropriate transition probabilities.

Example 11.1 According to Kemeny, Snell, and Thompson,² the Land of Oz is blessed by many things, but not by good weather. They never have two nice days in a row. If they have a nice day, they are just as likely to have snow as rain the next day. If they have snow or rain, they have an even chance of having the same the next day. If there is change from snow or rain, only half of the time is this a change to a nice day. With this information we form a Markov chain as follows. We take as states the kinds of weather R, N, and S. From the above information we determine the transition probabilities. These are most conveniently represented in a square array as
$$P = \begin{array}{c|ccc} & R & N & S \\ \hline R & 1/2 & 1/4 & 1/4 \\ N & 1/2 & 0 & 1/2 \\ S & 1/4 & 1/4 & 1/2 \end{array}\;.$$
□

Transition Matrix

The entries in the first row of the matrix $P$ in Example 11.1 represent the probabilities for the various kinds of weather following a rainy day. Similarly, the entries in the second and third rows represent the probabilities for the various kinds of weather following nice and snowy days, respectively. Such a square array is called the matrix of transition probabilities, or the transition matrix.

We consider the question of determining the probability that, given the chain is in state $i$ today, it will be in state $j$ two days from now.
We denote this probability by $p_{ij}^{(2)}$. In Example 11.1, we see that if it is rainy today then the event that it is snowy two days from now is the disjoint union of the following three events: 1) it is rainy tomorrow and snowy two days from now, 2) it is nice tomorrow and snowy two days from now, and 3) it is snowy tomorrow and snowy two days from now. The probability of the first of these events is the product of the conditional probability that it is rainy tomorrow, given that it is rainy today, and the conditional probability that it is snowy two days from now, given that it is rainy tomorrow. Using the transition matrix $P$, we can write this product as $p_{11} p_{13}$. The other two events also have probabilities that can be written as products of entries of $P$. Thus, we have
$$p_{13}^{(2)} = p_{11} p_{13} + p_{12} p_{23} + p_{13} p_{33}.$$
This equation should remind the reader of a dot product of two vectors; we are dotting the first row of $P$ with the third column of $P$. This is just what is done in obtaining the $1,3$-entry of the product of $P$ with itself. In general, if a Markov chain has $r$ states, then
$$p_{ij}^{(2)} = \sum_{k=1}^{r} p_{ik} p_{kj}.$$
The following general theorem is easy to prove by using the above observation and induction.

¹ R. A. Howard, Dynamic Probabilistic Systems, vol. 1 (New York: John Wiley and Sons, 1971).
² J. G. Kemeny, J. L. Snell, G. L. Thompson, Introduction to Finite Mathematics, 3rd ed. (Englewood Cliffs, NJ: Prentice-Hall, 1974).

Theorem 11.1 Let $P$ be the transition matrix of a Markov chain. The $ij$th entry $p_{ij}^{(n)}$ of the matrix $P^n$ gives the probability that the Markov chain, starting in state $s_i$, will be in state $s_j$ after $n$ steps.

Proof. The proof of this theorem is left as an exercise (Exercise 17). □

Example 11.2 (Example 11.1 continued) Consider again the weather in the Land of Oz.
We know that the powers of the transition matrix give us interesting information about the process as it evolves. We shall be particularly interested in the state of the chain after a large number of steps. The program MatrixPowers computes the powers of $P$.

We have run the program MatrixPowers for the Land of Oz example to compute the successive powers of $P$ from 1 to 6. The results are shown in Table 11.1. We note that after six days our weather predictions are, to three-decimal-place accuracy, independent of today's weather. The probabilities for the three types of weather, R, N, and S, are .4, .2, and .4 no matter where the chain started. This is an example of a type of Markov chain called a regular Markov chain. For this type of chain, it is true that long-range predictions are independent of the starting state. Not all chains are regular, but this is an important class of chains that we shall study in detail later. □

We now consider the long-term behavior of a Markov chain when it starts in a state chosen by a probability distribution on the set of states, which we will call a probability vector. A probability vector with $r$ components is a row vector whose entries are non-negative and sum to 1. If $\mathbf{u}$ is a probability vector which represents the initial state of a Markov chain, then we think of the $i$th component of $\mathbf{u}$ as representing the probability that the chain starts in state $s_i$.

With this interpretation of random starting states, it is easy to prove the following theorem.
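The powers of the Land of Oz matrix are easy to reproduce. The sketch below is a stand-in written for illustration and is not the book's MatrixPowers program; the helper names `mat_mul` and `mat_pow` are hypothetical. It also evaluates the two-step probability $p_{13}^{(2)}$ directly as the dot product of the first row and third column of $P$.

```python
# Land of Oz transition matrix from Example 11.1 (state order: R, N, S).
P = [
    [0.50, 0.25, 0.25],
    [0.50, 0.00, 0.50],
    [0.25, 0.25, 0.50],
]

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(p, n):
    """n-th power of p for n >= 1, by repeated multiplication."""
    result = p
    for _ in range(n - 1):
        result = mat_mul(result, p)
    return result

# Two-step probability rain -> snow: first row of P dotted with third column.
p13_two_step = sum(P[0][k] * P[k][2] for k in range(3))
print("p13 after two steps:", p13_two_step)

for n in range(1, 7):
    print(f"P^{n}")
    for row in mat_pow(P, n):
        print("  " + "  ".join(f"{x:.3f}" for x in row))
```

By the sixth power every row should agree with (.4, .2, .4) to three decimal places, which is the behavior described above for a regular chain.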
$$P^1 = \begin{array}{c|ccc} & \text{Rain} & \text{Nice} & \text{Snow} \\ \hline \text{Rain} & .500 & .250 & .250 \\ \text{Nice} & .500 & .000 & .500 \\ \text{Snow} & .250 & .250 & .500 \end{array}$$
$$P^2 = \begin{array}{c|ccc} & \text{Rain} & \text{Nice} & \text{Snow} \\ \hline \text{Rain} & .438 & .188 & .375 \\ \text{Nice} & .375 & .250 & .375 \\ \text{Snow} & .375 & .188 & .438 \end{array}$$
$$P^3 = \begin{array}{c|ccc} & \text{Rain} & \text{Nice} & \text{Snow} \\ \hline \text{Rain} & .406 & .203 & .391 \\ \text{Nice} & .406 & .188 & .406 \\ \text{Snow} & .391 & .203 & .406 \end{array}$$
$$P^4 = \begin{array}{c|ccc} & \text{Rain} & \text{Nice} & \text{Snow} \\ \hline \text{Rain} & .402 & .199 & .398 \\ \text{Nice} & .398 & .203 & .398 \\ \text{Snow} & .398 & .199 & .402 \end{array}$$
$$P^5 = \begin{array}{c|ccc} & \text{Rain} & \text{Nice} & \text{Snow} \\ \hline \text{Rain} & .400 & .200 & .399 \\ \text{Nice} & .400 & .199 & .400 \\ \text{Snow} & .399 & .200 & .400 \end{array}$$
$$P^6 = \begin{array}{c|ccc} & \text{Rain} & \text{Nice} & \text{Snow} \\ \hline \text{Rain} & .400 & .200 & .400 \\ \text{Nice} & .400 & .200 & .400 \\ \text{Snow} & .400 & .200 & .400 \end{array}$$
Table 11.1: Powers of the Land of Oz transition matrix.

Theorem 11.2 Let $P$ be the transition matrix of a Markov chain, and let $\mathbf{u}$ be the probability vector which represents the starting distribution. Then the probability that the chain is in state $s_i$ after $n$ steps is the $i$th entry in the vector
$$\mathbf{u}^{(n)} = \mathbf{u} P^n.$$

Proof. The proof of this theorem is left as an exercise (Exercise 18). □

We note that if we want to examine the behavior of the chain under the assumption that it starts in a certain state $s_i$, we simply choose $\mathbf{u}$ to be the probability vector with $i$th entry equal to 1 and all other entries equal to 0.

Example 11.3 In the Land of Oz example (Example 11.1) let the initial probability vector $\mathbf{u}$ equal $(1/3, 1/3, 1/3)$. Then we can calculate the distribution of the states after three days using Theorem 11.2 and our previous calculation of $P^3$. We obtain
$$\mathbf{u}^{(3)} = \mathbf{u} P^3 = (1/3,\ 1/3,\ 1/3) \begin{pmatrix} .406 & .203 & .391 \\ .406 & .188 & .406 \\ .391 & .203 & .406 \end{pmatrix} = (.401,\ .198,\ .401).$$
□

Examples

The following examples of Markov chains will be used throughout the chapter for exercises.

Example 11.4 The President of the United States tells person A his or her intention to run or not to run in the next election.
Then A relays the news to B, who in turn relays the message to C, and so forth, always to some new person. We assume that there is a probability $a$ that a person will change the answer from yes to no when transmitting it to the next person and a probability $b$ that he or she will change it from no to yes. We choose as states the message, either yes or no. The transition matrix is then
$$P = \begin{array}{c|cc} & \text{yes} & \text{no} \\ \hline \text{yes} & 1 - a & a \\ \text{no} & b & 1 - b \end{array}\;.$$
The initial state represents the President's choice. □

Example 11.5 Each time a certain horse runs in a three-horse race, he has probability 1/2 of winning, 1/4 of coming in second, and 1/4 of coming in third, independent of the outcome of any previous race. We have an independent trials process, but it can also be considered from the point of view of Markov chain theory. The transition matrix is
$$P = \begin{array}{c|ccc} & W & P & S \\ \hline W & .5 & .25 & .25 \\ P & .5 & .25 & .25 \\ S & .5 & .25 & .25 \end{array}\;.$$
□

Example 11.6 In the Dark Ages, Harvard, Dartmouth, and Yale admitted only male students. Assume that, at that time, 80 percent of the sons of Harvard men went to Harvard and the rest went to Yale, 40 percent of the sons of Yale men went to Yale, and the rest split evenly between Harvard and Dartmouth; and of the sons of Dartmouth men, 70 percent went to Dartmouth, 20 percent to Harvard, and 10 percent to Yale. We form a Markov chain with transition matrix
$$P = \begin{array}{c|ccc} & H & Y & D \\ \hline H & .8 & .2 & 0 \\ Y & .3 & .4 & .3 \\ D & .2 & .1 & .7 \end{array}\;.$$
□

Example 11.7 Modify Example 11.6 by assuming that the son of a Harvard man always went to Harvard. The transition matrix is now
$$P = \begin{array}{c|ccc} & H & Y & D \\ \hline H & 1 & 0 & 0 \\ Y & .3 & .4 & .3 \\ D & .2 & .1 & .7 \end{array}\;.$$
□

Example 11.8 (Ehrenfest Model) The following is a special case of a model, called the Ehrenfest model,³ that has been used to explain diffusion of gases. The general model will be discussed in detail in Section 11.5.
We have two urns that, between them, contain four balls. At each step, one of the four balls is chosen at random and moved from the urn that it is in into the other urn. We choose, as states, the number of balls in the first urn. The transition matrix is then
$$P = \begin{array}{c|ccccc} & 0 & 1 & 2 & 3 & 4 \\ \hline 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1/4 & 0 & 3/4 & 0 & 0 \\ 2 & 0 & 1/2 & 0 & 1/2 & 0 \\ 3 & 0 & 0 & 3/4 & 0 & 1/4 \\ 4 & 0 & 0 & 0 & 1 & 0 \end{array}\;.$$
□

³ P. and T. Ehrenfest, "Über zwei bekannte Einwände gegen das Boltzmannsche H-Theorem," Physikalische Zeitschrift, vol. 8 (1907), pp. 311-314.

Example 11.9 (Gene Model) The simplest type of inheritance of traits in animals occurs when a trait is governed by a pair of genes, each of which may be of two types, say G and g. An individual may have a GG combination or Gg (which is genetically the same as gG) or gg. Very often the GG and Gg types are indistinguishable in appearance, and then we say that the G gene dominates the g gene. An individual is called dominant if he or she has GG genes, recessive if he or she has gg, and hybrid with a Gg mixture.

In the mating of two animals, the offspring inherits one gene of the pair from each parent, and the basic assumption of genetics is that these genes are selected at random, independently of each other. This assumption determines the probability of occurrence of each type of offspring. The offspring of two purely dominant parents must be dominant, of two recessive parents must be recessive, and of one dominant and one recessive parent must be hybrid.

In the mating of a dominant and a hybrid animal, each offspring must get a G gene from the former and has an equal chance of getting G or g from the latter. Hence there is an equal probability for getting a dominant or a hybrid offspring. Again, in the mating of a recessive and a hybrid, there is an even chance for getting either a recessive or a hybrid.
In the mating of two hybrids, the offspring has an equal chance of getting G or g from each parent. Hence the probabilities are 1/4 for GG, 1/2 for Gg, and 1/4 for gg.

Consider a process of continued matings. We start with an individual of known genetic character and mate it with a hybrid. We assume that there is at least one offspring. An offspring is chosen at random and is mated with a hybrid, and this process is repeated through a number of generations. The genetic type of the chosen offspring in successive generations can be represented by a Markov chain. The states are dominant, hybrid, and recessive, and are indicated by GG, Gg, and gg respectively.

The transition probabilities are
$$P = \begin{array}{c|ccc} & GG & Gg & gg \\ \hline GG & .5 & .5 & 0 \\ Gg & .25 & .5 & .25 \\ gg & 0 & .5 & .5 \end{array}\;.$$
□

Example 11.10 Modify Example 11.9 as follows: Instead of mating the oldest offspring with a hybrid, we mate it with a dominant individual. The transition matrix is
$$P = \begin{array}{c|ccc} & GG & Gg & gg \\ \hline GG & 1 & 0 & 0 \\ Gg & .5 & .5 & 0 \\ gg & 0 & 1 & 0 \end{array}\;.$$
□

Example 11.11 We start with two animals of opposite sex, mate them, select two of their offspring of opposite sex, and mate those, and so forth. To simplify the example, we will assume that the trait under consideration is independent of sex.

Here a state is determined by a pair of animals. Hence, the states of our process will be: $s_1 = (\text{GG}, \text{GG})$, $s_2 = (\text{GG}, \text{Gg})$, $s_3 = (\text{GG}, \text{gg})$, $s_4 = (\text{Gg}, \text{Gg})$, $s_5 = (\text{Gg}, \text{gg})$, and $s_6 = (\text{gg}, \text{gg})$.

We illustrate the calculation of transition probabilities in terms of the state $s_2$. When the process is in this state, one parent has GG genes, the other Gg. Hence, the probability of a dominant offspring is 1/2. Then the probability of transition to $s_1$ (selection of two dominants) is 1/4, transition to $s_2$ is 1/2, and to $s_4$ is 1/4.
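The calculation for $s_2$ can be sketched in code, and the same routine applies to any of the six states. This is an illustrative sketch only: the function names are hypothetical, and it assumes, as the example does, that each offspring receives one gene from each parent at random and that the two selected offspring are chosen independently.

```python
from itertools import product

GENOTYPES = ["GG", "Gg", "gg"]
ORDER = {g: i for i, g in enumerate(GENOTYPES)}

# The six unordered pairs s1..s6, in the order used in the text.
STATES = [(GENOTYPES[i], GENOTYPES[j]) for i in range(3) for j in range(i, 3)]

def offspring_distribution(parent_a, parent_b):
    """Genotype probabilities for one offspring of the two given parents."""
    dist = {g: 0.0 for g in GENOTYPES}
    for gene_a, gene_b in product(parent_a, parent_b):
        # sorted() puts 'G' before 'g', so "gG" is recorded as "Gg".
        child = "".join(sorted(gene_a + gene_b))
        dist[child] += 0.25
    return dist

def pair_state(a, b):
    """Unordered pair of genotypes, written in the order used by STATES."""
    return tuple(sorted((a, b), key=ORDER.get))

def transition_row(state):
    """Transition probabilities out of the given pair of parents."""
    child = offspring_distribution(*state)
    row = {s: 0.0 for s in STATES}
    for a, b in product(GENOTYPES, repeat=2):
        row[pair_state(a, b)] += child[a] * child[b]
    return row

for s in STATES:
    print(s, [round(p, 4) for p in transition_row(s).values()])
```

The row printed for (GG, Gg) should show 1/4, 1/2, and 1/4 in the positions of $s_1$, $s_2$, and $s_4$, matching the hand calculation.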
The other states are treated the same way. The transition matrix of this chain is:
$$P = \begin{array}{c|cccccc} & \text{GG,GG} & \text{GG,Gg} & \text{GG,gg} & \text{Gg,Gg} & \text{Gg,gg} & \text{gg,gg} \\ \hline \text{GG,GG} & 1.000 & .000 & .000 & .000 & .000 & .000 \\ \text{GG,Gg} & .250 & .500 & .000 & .250 & .000 & .000 \\ \text{GG,gg} & .000 & .000 & .000 & 1.000 & .000 & .000 \\ \text{Gg,Gg} & .062 & .250 & .125 & .250 & .250 & .062 \\ \text{Gg,gg} & .000 & .000 & .000 & .250 & .500 & .250 \\ \text{gg,gg} & .000 & .000 & .000 & .000 & .000 & 1.000 \end{array}\;.$$
□

Example 11.12 (Stepping Stone Model) Our final example is another example that has been used in the study of genetics. It is called the stepping stone model.⁴ In this model we have an $n$-by-$n$ array of squares, and each square is initially any one of $k$ different colors. For each step, a square is chosen at random. This square then chooses one of its eight neighbors at random and assumes the color of that neighbor.

To avoid boundary problems, we assume that if a square $S$ is on the left-hand boundary, say, but not at a corner, it is adjacent to the square $T$ on the right-hand boundary in the same row as $S$, and $S$ is also adjacent to the squares just above and below $T$. A similar assumption is made about squares on the upper and lower boundaries. The top left-hand corner square is adjacent to three obvious neighbors, namely the squares below it, to its right, and diagonally below and to the right. It has five other neighbors, which are as follows: the other three corner squares, the square below the upper right-hand corner, and the square to the right of the bottom left-hand corner. The other three corners also have, in a similar way, eight neighbors. (These adjacencies are much easier to understand if one imagines making the array into a cylinder by gluing the top and bottom edge together, and then making the cylinder into a doughnut by gluing the two circular boundaries together.) With these adjacencies, each square in the array is adjacent to exactly eight other squares.
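The wrap-around adjacencies amount to doing index arithmetic modulo $n$. The following is a minimal simulation sketch under that reading; it is not the book's SteppingStone program, and the function names, the integer color labels, and the seed parameter are assumptions made for illustration.

```python
import random

def neighbors(i, j, n):
    """The eight neighbors of square (i, j) on an n-by-n array whose
    opposite edges are glued together (indices taken modulo n)."""
    return [((i + di) % n, (j + dj) % n)
            for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di, dj) != (0, 0)]

def step(grid, rng):
    """Pick a random square and give it the color of a random neighbor."""
    n = len(grid)
    i, j = rng.randrange(n), rng.randrange(n)
    ni, nj = rng.choice(neighbors(i, j, n))
    grid[i][j] = grid[ni][nj]

def simulate(n, k, steps, seed=0):
    """Start from a random coloring with k colors and run the given number of steps."""
    rng = random.Random(seed)
    grid = [[rng.randrange(k) for _ in range(n)] for _ in range(n)]
    for _ in range(steps):
        step(grid, rng)
    return grid

grid = simulate(n=20, k=2, steps=10_000)
ones = sum(sum(row) for row in grid)
print("squares of color 1 after 10,000 steps:", ones, "of", 20 * 20)
```

For the top left-hand corner, `neighbors(0, 0, n)` returns the three obvious neighbors together with the five wrap-around squares listed in the text, provided $n \ge 3$ so that the eight squares are distinct.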
A state in this Markov chain is a description of the color of each square. For this Markov chain the number of states is $k^{n^2}$, which for even a small array of squares is enormous. This is an example of a Markov chain that is easy to simulate but difficult to analyze in terms of its transition matrix. The program SteppingStone simulates this chain. We have started with a random initial configuration of two colors with $n = 20$ and show the result after the process has run for some time in Figure 11.2.

⁴ S. Sawyer, "Results for The Stepping Stone Model for Migration in Population Genetics," Annals of Probability, vol. 4 (1979), pp. 699-728.

Figure 11.1: Initial state of the stepping stone model.

Figure 11.2: State of the stepping stone model after 10,000 steps.

This is an example of an absorbing Markov chain. This type of chain will be studied in Section 11.2. One of the theorems proved in that section, applied to the present example, implies that with probability 1, the stones will eventually all be the same color. By watching the program run, you can see that territories are established and a battle develops to see which color survives. At any time the probability that a particular color will win out is equal to the proportion of the array of this color. You are asked to prove this in Exercise 11.2.32. □

Exercises

1 It is raining in the Land of Oz. Determine a tree and a tree measure for the next three days' weather. Find $\mathbf{w}^{(1)}$, $\mathbf{w}^{(2)}$, and $\mathbf{w}^{(3)}$ and compare with the results obtained from $P$, $P^2$, and $P^3$.

2 In Example 11.4, let $a = 0$ and $b = 1/2$. Find $P$, $P^2$, and $P^3$. What would $P^n$ be? What happens to $P^n$ as $n$ tends to infinity? Interpret this result.

3 In Example 11.5, find $P$, $P^2$, and $P^3$. What is $P^n$?

4 For Example 11.6, find the probability that the grandson of a man from Harvard went to Harvard.
5 In Example 11.7, find the probability that the grandson of a man from Harvard went to Harvard.

6 In Example 11.9, assume that we start with a hybrid bred to a hybrid. Find $\mathbf{u}^{(1)}$, $\mathbf{u}^{(2)}$, and $\mathbf{u}^{(3)}$. What would $\mathbf{u}^{(n)}$ be?

7 Find the matrices $P^2$, $P^3$, $P^4$, and $P^n$ for the Markov chain determined by the transition matrix
$$P = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
Do the same for the transition matrix
$$P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
Interpret what happens in each of these processes.

8 A certain calculating machine uses only the digits 0 and 1. It is supposed to transmit one of these digits through several stages. However, at every stage, there is a probability $p$ that the digit that enters this stage will be changed when it leaves and a probability $q = 1 - p$ that it won't. Form a Markov chain to represent the process of transmission by taking as states the digits 0 and 1. What is the matrix of transition probabilities?

9 For the Markov chain in Exercise 8, draw a tree and assign a tree measure assuming that the process begins in state 0 and moves through two stages of transmission. What is the probability that the machine, after two stages, produces the digit 0 (i.e., the correct digit)? What is the probability that the machine never changed the digit from 0? Now let $p = .1$. Using the program MatrixPowers, compute the 100th power of the transition matrix. Interpret the entries of this matrix. Repeat this with $p = .2$. Why do the 100th powers appear to be the same?

10 Modify the program MatrixPowers so that it prints out the average $A_n$ of the powers $P^n$, for $n = 1$ to $N$. Try your program on the Land of Oz example and compare $A_n$ and $P^n$.

11 Assume that a man's profession can be classified as professional, skilled laborer, or unskilled laborer. Assume that, of the sons of professional men, 80 percent are professional, 10 percent are skilled laborers, and 10 percent are unskilled laborers.
In the case of sons of skilled laborers, 60 percent are skilled laborers, 20 percent are professional, and 20 percent are unskilled. Finally, in the case of unskilled laborers, 50 percent of the sons are unskilled laborers, and 25 percent each are in the other two categories. Assume that every man has at least one son, and form a Markov chain by following the profession of a randomly chosen son of a given family through several generations. Set up the matrix of transition probabilities. Find the probability that a randomly chosen grandson of an unskilled laborer is a professional man.

12 In Exercise 11, we assumed that every man has a son. Assume instead that the probability that a man has at least one son is .8. Form a Markov chain with four states. If a man has a son, the probability that this son is in a particular profession is the same as in Exercise 11. If there is no son, the process moves to state four, which represents families whose male line has died out. Find the matrix of transition probabilities and find the probability that a randomly chosen grandson of an unskilled laborer is a professional man.

13 Write a program to compute $\mathbf{u}^{(n)}$ given $\mathbf{u}$ and $P$. Use this program to compute $\mathbf{u}^{(10)}$ for the Land of Oz example, with $\mathbf{u} = (0, 1, 0)$, and with $\mathbf{u} = (1/3, 1/3, 1/3)$.

14 Using the program MatrixPowers, find $P^1$ through $P^6$ for Examples 11.9 and 11.10. See if you can predict the long-range probability of finding the process in each of the states for these examples.

15 Write a program to simulate the outcomes of a Markov chain after $n$ steps, given the initial starting state and the transition matrix $P$ as data (see Example 11.12). Keep this program for use in later problems.

16 Modify the program of Exercise 15 so that it keeps track of the proportion of times in each state in $n$ steps.
Run the modified program for different starting states for Example 11.1 and Example 11.8. Does the initial state affect the proportion of time spent in each of the states if $n$ is large?

17 Prove Theorem 11.1.

18 Prove Theorem 11.2.

19 Consider the following process. We have two coins, one of which is fair, and the other of which has heads on both sides. We give these two coins to our friend, who chooses one of them at random (each with probability 1/2). During the rest of the process, she uses only the coin that she chose. She now proceeds to toss the coin many times, reporting the results. We consider this process to consist solely of what she reports to us.
(a) Given that she reports a head on the $n$th toss, what is the probability that a head is thrown on the $(n+1)$st toss?
(b) Consider this process as having two states, heads and tails. By computing the other three transition probabilities analogous to the one in part (a), write down a "transition matrix" for this process.
(c) Now assume that the process is in state "heads" on both the $(n-1)$st and the $n$th toss. Find the probability that a head comes up on the $(n+1)$st toss.
(d) Is this process a Markov chain?

11.2 Absorbing Markov Chains

The subject of Markov chains is best studied by considering special types of Markov chains. The first type that we shall study is called an absorbing Markov chain.

Definition 11.1 A state $s_i$ of a Markov chain is called absorbing if it is impossible to leave it (i.e., $p_{ii} = 1$). A Markov chain is absorbing if it has at least one absorbing state, and if from every state it is possible to go to an absorbing state (not necessarily in one step). □

Definition 11.2 In an absorbing Markov chain, a state which is not absorbing is called transient. □

Drunkard's Walk

Example 11.13 A man walks along a four-block stretch of Park Avenue (see Figure 11.3).
If he is at corner 1, 2, or 3, then he walks to the left or right with equal probability. He continues until he reaches corner 4, which is a bar, or corner 0, which is his home. If he reaches either home or the bar, he stays there.

We form a Markov chain with states 0, 1, 2, 3, and 4. States 0 and 4 are absorbing states. The transition matrix is then
\[
P = \begin{array}{c|ccccc}
  & 0 & 1 & 2 & 3 & 4 \\ \hline
0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1/2 & 0 & 1/2 & 0 & 0 \\
2 & 0 & 1/2 & 0 & 1/2 & 0 \\
3 & 0 & 0 & 1/2 & 0 & 1/2 \\
4 & 0 & 0 & 0 & 0 & 1
\end{array}
\]
The states 1, 2, and 3 are transient states, and from any of these it is possible to reach the absorbing states 0 and 4. Hence the chain is an absorbing chain. When a process reaches an absorbing state, we shall say that it is absorbed. □

Figure 11.3: Drunkard's walk.

The most obvious question that can be asked about such a chain is: What is the probability that the process will eventually reach an absorbing state? Other interesting questions include:
(a) What is the probability that the process will end up in a given absorbing state?
(b) On the average, how long will it take for the process to be absorbed?
(c) On the average, how many times will the process be in each transient state?
The answers to all these questions depend, in general, on the state from which the process starts as well as the transition probabilities.

Canonical Form

Consider an arbitrary absorbing Markov chain. Renumber the states so that the transient states come first. If there are $r$ absorbing states and $t$ transient states, the transition matrix will have the following canonical form:
\[
P = \begin{array}{c|cc}
  & \text{TR.} & \text{ABS.} \\ \hline
\text{TR.} & Q & R \\
\text{ABS.} & \mathbf{0} & I
\end{array}
\]
Here $I$ is an $r$-by-$r$ identity matrix, $\mathbf{0}$ is an $r$-by-$t$ zero matrix, $R$ is a nonzero $t$-by-$r$ matrix, and $Q$ is a $t$-by-$t$ matrix. The first $t$ states are transient and the last $r$ states are absorbing.
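The renumbering into canonical form can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `canonical_form` and the list-of-rows representation are chosen here and are not one of the text's programs, and the sketch assumes the chain is already known to be absorbing, so that every state with $p_{ii} \ne 1$ may be treated as transient.

```python
from fractions import Fraction as F

def canonical_form(P):
    """Split a transition matrix into the blocks Q and R of the canonical form.

    Assumes the chain is absorbing. Returns (transient, absorbing, Q, R),
    where the first two items list the original state indices in their
    new order (transient states first).
    """
    n = len(P)
    absorbing = [i for i in range(n) if P[i][i] == 1]
    transient = [i for i in range(n) if P[i][i] != 1]
    # Q: transient -> transient, R: transient -> absorbing.
    Q = [[P[i][j] for j in transient] for i in transient]
    R = [[P[i][j] for j in absorbing] for i in transient]
    return transient, absorbing, Q, R

# Drunkard's walk of Example 11.13, states 0, 1, 2, 3, 4.
half = F(1, 2)
P = [
    [1, 0, 0, 0, 0],
    [half, 0, half, 0, 0],
    [0, half, 0, half, 0],
    [0, 0, half, 0, half],
    [0, 0, 0, 0, 1],
]
transient, absorbing, Q, R = canonical_form(P)
print(transient, absorbing)
print(Q)
print(R)
```

Exact fractions are used rather than floating-point numbers so that the blocks can be compared entry by entry with the matrices in the text.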
In Section 11.1, we saw that the entry $p_{ij}^{(n)}$ of the matrix $P^n$ is the probability of being in the state $s_j$ after $n$ steps, when the chain is started in state $s_i$. A standard matrix algebra argument shows that $P^n$ is of the form
\[
P^n = \begin{array}{c|cc}
  & \text{TR.} & \text{ABS.} \\ \hline
\text{TR.} & Q^n & * \\
\text{ABS.} & \mathbf{0} & I
\end{array}
\]
where the asterisk $*$ stands for the $t$-by-$r$ matrix in the upper right-hand corner of $P^n$. (This submatrix can be written in terms of $Q$ and $R$, but the expression is complicated and is not needed at this time.) The form of $P^n$ shows that the entries of $Q^n$ give the probabilities for being in each of the transient states after $n$ steps for each possible transient starting state. For our first theorem we prove that the probability of being in the transient states after $n$ steps approaches zero. Thus every entry of $Q^n$ must approach zero as $n$ approaches infinity (i.e., $Q^n \to \mathbf{0}$).

Probability of Absorption

Theorem 11.3 In an absorbing Markov chain, the probability that the process will be absorbed is 1 (i.e., $Q^n \to \mathbf{0}$ as $n \to \infty$).

Proof. From each nonabsorbing state $s_j$ it is possible to reach an absorbing state. Let $m_j$ be the minimum number of steps required to reach an absorbing state, starting from $s_j$. Let $p_j$ be the probability that, starting from $s_j$, the process will not reach an absorbing state in $m_j$ steps. Then $p_j < 1$. Let $m$ be the largest of the $m_j$ and let $p$ be the largest of the $p_j$. The probability of not being absorbed in $m$ steps is less than or equal to $p$, in $2m$ steps less than or equal to $p^2$, etc. Since $p < 1$ these probabilities tend to 0.
Since the probability of not being absorbed in $n$ steps is monotone decreasing, these probabilities also tend to 0, hence $\lim_{n \to \infty} Q^n = \mathbf{0}$. □

The Fundamental Matrix

Theorem 11.4 For an absorbing Markov chain the matrix $I - Q$ has an inverse $N$ and $N = I + Q + Q^2 + \cdots$. The $ij$-entry $n_{ij}$ of the matrix $N$ is the expected number of times the chain is in state $s_j$, given that it starts in state $s_i$. The initial state is counted if $i = j$.

Proof. Let $(I - Q)\mathbf{x} = \mathbf{0}$; that is, $\mathbf{x} = Q\mathbf{x}$. Then, iterating this, we see that $\mathbf{x} = Q^n \mathbf{x}$. Since $Q^n \to \mathbf{0}$, we have $Q^n \mathbf{x} \to \mathbf{0}$, so $\mathbf{x} = \mathbf{0}$. Thus $(I - Q)^{-1} = N$ exists. Note next that
\[
(I - Q)(I + Q + Q^2 + \cdots + Q^n) = I - Q^{n+1} .
\]
Thus multiplying both sides by $N$ gives
\[
I + Q + Q^2 + \cdots + Q^n = N(I - Q^{n+1}) .
\]
Letting $n$ tend to infinity we have
\[
N = I + Q + Q^2 + \cdots .
\]
Let $s_i$ and $s_j$ be two transient states, and assume throughout the remainder of the proof that $i$ and $j$ are fixed. Let $X^{(k)}$ be a random variable which equals 1 if the chain is in state $s_j$ after $k$ steps, and equals 0 otherwise. For each $k$, this random variable depends upon both $i$ and $j$; we choose not to explicitly show this dependence in the interest of clarity. We have
\[
P(X^{(k)} = 1) = q_{ij}^{(k)} ,
\]
and
\[
P(X^{(k)} = 0) = 1 - q_{ij}^{(k)} ,
\]
where $q_{ij}^{(k)}$ is the $ij$th entry of $Q^k$. These equations hold for $k = 0$ since $Q^0 = I$. Therefore, since $X^{(k)}$ is a 0-1 random variable, $E(X^{(k)}) = q_{ij}^{(k)}$.

The expected number of times the chain is in state $s_j$ in the first $n$ steps, given that it starts in state $s_i$, is clearly
\[
E\left(X^{(0)} + X^{(1)} + \cdots + X^{(n)}\right) = q_{ij}^{(0)} + q_{ij}^{(1)} + \cdots + q_{ij}^{(n)} .
\]
Letting $n$ tend to infinity we have
\[
E\left(X^{(0)} + X^{(1)} + \cdots\right) = q_{ij}^{(0)} + q_{ij}^{(1)} + \cdots = n_{ij} .
\]
□
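The two facts used in this proof, the identity for the partial sums and the decay of $Q^n$, can be checked on a small case. The sketch below does so for the $Q$ of the drunkard's walk (the rows and columns of $P$ in Example 11.13 belonging to corners 1, 2, and 3). The helper names `matmul`, `combine`, `identity`, and `partial_sum` are illustrative choices, not programs from the text, and exact fractions are used so that the identity can be tested with equality.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def combine(A, B, sign):
    """Entrywise A + sign * B."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def partial_sum(Q, n):
    """Return (S, Q_next) with S = I + Q + ... + Q^n and Q_next = Q^(n+1)."""
    S = identity(len(Q))
    power = identity(len(Q))          # Q^0
    for _ in range(n):
        power = matmul(power, Q)      # Q^k for k = 1, ..., n
        S = combine(S, power, +1)
    return S, matmul(power, Q)

half = F(1, 2)
Q = [[0, half, 0], [half, 0, half], [0, half, 0]]
I3 = identity(3)
S, Q_next = partial_sum(Q, 40)

# Exact identity from the proof: (I - Q)(I + Q + ... + Q^n) = I - Q^(n+1).
assert matmul(combine(I3, Q, -1), S) == combine(I3, Q_next, -1)

# The largest entry of Q^(n+1) shows how close S already is to N.
largest = max(max(row) for row in Q_next)
print(float(largest))
```

Because the largest entry of $Q^{41}$ is already very small, the partial sum with $n = 40$ agrees with $(I - Q)^{-1}$ to several decimal places in this example.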
Definition 11.3 For an absorbing Markov chain $P$, the matrix $N = (I - Q)^{-1}$ is called the fundamental matrix for $P$. The entry $n_{ij}$ of $N$ gives the expected number of times that the process is in the transient state $s_j$ if it is started in the transient state $s_i$. □

Example 11.14 (Example 11.13 continued) In the Drunkard's Walk example, the transition matrix in canonical form is
\[
P = \begin{array}{c|ccc|cc}
  & 1 & 2 & 3 & 0 & 4 \\ \hline
1 & 0 & 1/2 & 0 & 1/2 & 0 \\
2 & 1/2 & 0 & 1/2 & 0 & 0 \\
3 & 0 & 1/2 & 0 & 0 & 1/2 \\ \hline
0 & 0 & 0 & 0 & 1 & 0 \\
4 & 0 & 0 & 0 & 0 & 1
\end{array}
\]
From this we see that the matrix $Q$ is
\[
Q = \begin{pmatrix} 0 & 1/2 & 0 \\ 1/2 & 0 & 1/2 \\ 0 & 1/2 & 0 \end{pmatrix} ,
\]
and
\[
I - Q = \begin{pmatrix} 1 & -1/2 & 0 \\ -1/2 & 1 & -1/2 \\ 0 & -1/2 & 1 \end{pmatrix} .
\]
Computing $(I - Q)^{-1}$, we find
\[
N = (I - Q)^{-1} = \begin{array}{c|ccc}
  & 1 & 2 & 3 \\ \hline
1 & 3/2 & 1 & 1/2 \\
2 & 1 & 2 & 1 \\
3 & 1/2 & 1 & 3/2
\end{array}
\]
From the middle row of $N$, we see that if we start in state 2, then the expected number of times in states 1, 2, and 3 before being absorbed are 1, 2, and 1. □

Time to Absorption

We now consider the question: Given that the chain starts in state $s_i$, what is the expected number of steps before the chain is absorbed? The answer is given in the next theorem.

Theorem 11.5 Let $t_i$ be the expected number of steps before the chain is absorbed, given that the chain starts in state $s_i$, and let $\mathbf{t}$ be the column vector whose $i$th entry is $t_i$. Then
\[
\mathbf{t} = N\mathbf{c} ,
\]
where $\mathbf{c}$ is a column vector all of whose entries are 1.

Proof. If we add all the entries in the $i$th row of $N$, we will have the expected number of times in any of the transient states for a given starting state $s_i$, that is, the expected time required before being absorbed. Thus, $t_i$ is the sum of the entries in the $i$th row of $N$. If we write this statement in matrix form, we obtain the theorem.
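As a check on Example 11.14 and Theorem 11.5, the following sketch inverts $I - Q$ exactly and then forms the row sums of $N$. It is a minimal illustration, not the text's own program: the names `inverse` and `fundamental_matrix` are assumptions made here, and Gauss-Jordan elimination is simply one convenient way to compute the inverse.

```python
from fractions import Fraction as F

def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination in exact arithmetic."""
    n = len(A)
    # Augment A with the identity: [A | I].
    M = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Pick a row with a nonzero pivot; one exists when A is invertible,
        # which Theorem 11.4 guarantees for I - Q of an absorbing chain.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [x / scale for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def fundamental_matrix(Q):
    """Return N = (I - Q)^(-1)."""
    n = len(Q)
    I_minus_Q = [[F(int(i == j)) - Q[i][j] for j in range(n)] for i in range(n)]
    return inverse(I_minus_Q)

half = F(1, 2)
Q = [[0, half, 0], [half, 0, half], [0, half, 0]]   # Example 11.14
N = fundamental_matrix(Q)
t = [sum(row) for row in N]   # t = Nc: the row sums of N
print(N)
print(t)
```

The computed $N$ agrees with the matrix found in Example 11.14, and its row sums are 3, 4, and 3.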
□

Absorption Probabilities

Theorem 11.6 Let $b_{ij}$ be the probability that an absorbing chain will be absorbed in the absorbing state $s_j$ if it starts in the transient state $s_i$. Let $B$ be the matrix with entries $b_{ij}$. Then $B$ is a $t$-by-$r$ matrix, and
\[
B = NR ,
\]
where $N$ is the fundamental matrix and $R$ is as in the canonical form.

Proof. The chain is absorbed in $s_j$ at step $n + 1$ exactly when it is in some transient state $s_k$ after $n$ steps and then moves to $s_j$. Summing over $n$ and $k$, we have
\[
B_{ij} = \sum_n \sum_k q_{ik}^{(n)} r_{kj}
       = \sum_k \sum_n q_{ik}^{(n)} r_{kj}
       = \sum_k n_{ik} r_{kj}
       = (NR)_{ij} .
\]
This completes the proof. □

Another proof of this is given in Exercise 34.

Example 11.15 (Example 11.14 continued) In the Drunkard's Walk example, we found that
\[
N = \begin{array}{c|ccc}
  & 1 & 2 & 3 \\ \hline
1 & 3/2 & 1 & 1/2 \\
2 & 1 & 2 & 1 \\
3 & 1/2 & 1 & 3/2
\end{array}
\]
Hence,
\[
\mathbf{t} = N\mathbf{c}
= \begin{pmatrix} 3/2 & 1 & 1/2 \\ 1 & 2 & 1 \\ 1/2 & 1 & 3/2 \end{pmatrix}
  \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
= \begin{pmatrix} 3 \\ 4 \\ 3 \end{pmatrix} .
\]
Thus, starting in states 1, 2, and 3, the expected times to absorption are 3, 4, and 3, respectively.

From the canonical form,
\[
R = \begin{array}{c|cc}
  & 0 & 4 \\ \hline
1 & 1/2 & 0 \\
2 & 0 & 0 \\
3 & 0 & 1/2
\end{array}
\]
Hence,
\[
B = NR
= \begin{pmatrix} 3/2 & 1 & 1/2 \\ 1 & 2 & 1 \\ 1/2 & 1 & 3/2 \end{pmatrix}
  \begin{pmatrix} 1/2 & 0 \\ 0 & 0 \\ 0 & 1/2 \end{pmatrix}
= \begin{array}{c|cc}
  & 0 & 4 \\ \hline
1 & 3/4 & 1/4 \\
2 & 1/2 & 1/2 \\
3 & 1/4 & 3/4
\end{array}
\]
Here the first row tells us that, starting from state 1, there is probability 3/4 of absorption in state 0 and 1/4 of absorption in state 4. □

Computation

The fact that we have been able to obtain these three descriptive quantities in matrix form makes it very easy to write a computer program that determines these quantities for a given absorbing chain matrix.

The program AbsorbingChain calculates the basic descriptive quantities of an absorbing Markov chain. We have run the program AbsorbingChain for the example of the drunkard's walk (Example 11.13) with 5 blocks. The results are as follows:
\[
Q = \begin{array}{c|cccc}
  & 1 & 2 & 3 & 4 \\ \hline
1 & .00 & .50 & .00 & .00 \\
2 & .50 & .00 & .50 & .00 \\
3 & .00 & .50 & .00 & .50 \\
4 & .00 & .00 & .50 & .00
\end{array}
\qquad
R = \begin{array}{c|cc}
  & 0 & 5 \\ \hline
1 & .50 & .00 \\
2 & .00 & .00 \\
3 & .00 & .00 \\
4 & .00 & .50
\end{array}
\]
\[
N = \begin{array}{c|cccc}
  & 1 & 2 & 3 & 4 \\ \hline
1 & 1.60 & 1.20 & .80 & .40 \\
2 & 1.20 & 2.40 & 1.60 & .80 \\
3 & .80 & 1.60 & 2.40 & 1.20 \\
4 & .40 & .80 & 1.20 & 1.60
\end{array}
\qquad
\mathbf{t} = \begin{array}{c|c}
1 & 4.00 \\
2 & 6.00 \\
3 & 6.00 \\
4 & 4.00
\end{array}
\qquad
B = \begin{array}{c|cc}
  & 0 & 5 \\ \hline
1 & .80 & .20 \\
2 & .60 & .40 \\
3 & .40 & .60 \\
4 & .20 & .80
\end{array}
\]
Note that the probability of reaching the bar before reaching home, starting at $x$, is $x/5$ (i.e., proportional to the distance of home from the starting point). (See Exercise 24.)

Exercises

1 In Example 11.4, for what values of $a$ and $b$ do we obtain an absorbing Markov chain?

2 Show that Example 11.7 is an absorbing Markov chain.

3 Which of the genetics examples (Examples 11.9, 11.10, and 11.11) are absorbing?

4 Find the fundamental matrix $N$ for Example 11.10.

5 For Example 11.11, verify that the following matrix is the inverse of $I - Q$ and hence is the fundamental matrix $N$.
\[
N = \begin{pmatrix}
8/3 & 1/6 & 4/3 & 2/3 \\
4/3 & 4/3 & 8/3 & 4/3 \\
4/3 & 1/3 & 8/3 & 4/3 \\
2/3 & 1/6 & 4/3 & 8/3
\end{pmatrix} .
\]
Find $N\mathbf{c}$ and $NR$. Interpret the results.

6 In the Land of Oz example (Example 11.1), change the transition matrix by making R an absorbing state. This gives
\[
P = \begin{array}{c|ccc}
  & \text{R} & \text{N} & \text{S} \\ \hline
\text{R} & 1 & 0 & 0 \\
\text{N} & 1/2 & 0 & 1/2 \\
\text{S} & 1/4 & 1/4 & 1/2
\end{array}
\]
Find the fundamental matrix $N$, and also $N\mathbf{c}$ and $NR$. Interpret the results.

7 In Example 11.8, make states 0 and 4 into absorbing states. Find the fundamental matrix $N$, and also $N\mathbf{c}$ and $NR$, for the resulting absorbing chain. Interpret the results.

8 In Example 11.13 (Drunkard's Walk) of this section, assume that the probability of a step to the right is 2/3, and a step to the left is 1/3. Find $N$, $N\mathbf{c}$, and $NR$. Compare these with the results of Example 11.15.

9 A process moves on the integers 1, 2, 3, 4, and 5. It starts at 1 and, on each successive step, moves to an integer greater than its present position, moving with equal probability to each of the remaining larger integers. State five is an absorbing state.
Find the expected number of steps to reach state five.

10 Using the result of Exercise 9, make a conjecture for the form of the fundamental matrix if the process moves as in that exercise, except that it now moves on the integers from 1 to $n$. Test your conjecture for several different values of $n$. Can you conjecture an estimate for the expected number of steps to reach state $n$, for large $n$? (See Exercise 11 for a method of determining this expected number of steps.)

*11 Let $b_k$ denote the expected number of steps to reach $n$ from $n - k$, in the process described in Exercise 9.
(a) Define $b_0 = 0$. Show that for $k > 0$, we have
\[
b_k = 1 + \frac{1}{k}\left(b_{k-1} + b_{k-2} + \cdots + b_0\right) .
\]
(b) Let
\[
f(x) = b_0 + b_1 x + b_2 x^2 + \cdots .
\]
Using the recursion in part (a), show that $f(x)$ satisfies the differential equation
\[
(1 - x)^2 y' - (1 - x) y - 1 = 0 .
\]
(c) Show that the general solution of the differential equation in part (b) is
\[
y = \frac{-\log(1 - x)}{1 - x} + \frac{c}{1 - x} ,
\]
where $c$ is a constant.
(d) Use part (c) to show that
\[
b_k = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{k} .
\]

12 Three tanks fight a three-way duel. Tank A has probability 1/2 of destroying the tank at which it fires, tank B has probability 1/3 of destroying the tank at which it fires, and tank C has probability 1/6 of destroying the tank at which it fires. The tanks fire together and each tank fires at the strongest opponent not yet destroyed. Form a Markov chain by taking as states the subsets of the set of tanks. Find $N$, $N\mathbf{c}$, and $NR$, and interpret your results. Hint: Take as states ABC, AC, BC, A, B, C, and none, indicating the tanks that could survive starting in state ABC. You can omit AB because this state cannot be reached from ABC.

13 Smith is in jail and has 3 dollars; he can get out on bail if he has 8 dollars. A guard agrees to make a series of bets with him. If Smith bets $A$ dollars, he wins $A$ dollars with probability .4 and loses $A$ dollars with probability .6.
Find the probability that he wins 8 dollars before losing all of his money if
(a) he bets 1 dollar each time (timid strategy).
(b) he bets, each time, as much as possible but not more than necessary to bring his fortune up to 8 dollars (bold strategy).
(c) Which strategy gives Smith the better chance of getting out of jail?

14 With the situation in Exercise 13, consider the strategy such that for $i < 4$, Smith bets $\min(i, 4 - i)$, and for $i \ge 4$, he bets according to the bold strategy, where $i$ is his current fortune. Find the probability that he gets out of jail using this strategy. How does this probability compare with that obtained for the bold strategy?

15 Consider the game of tennis when deuce is reached. If a player wins the next point, he has advantage. On the following point, he either wins the game or the game returns to deuce. Assume that for any point, player A has probability .6 of winning the point and player B has probability .4 of winning the point.
(a) Set this up as a Markov chain with state 1: A wins; 2: B wins; 3: advantage A; 4: deuce; 5: advantage B.
(b) Find the absorption probabilities.
(c) At deuce, find the expected duration of the game and the probability that B will win.

Exercises 16 and 17 concern the inheritance of color-blindness, which is a sex-linked characteristic. There is a pair of genes, g and G, of which the former tends to produce color-blindness, the latter normal vision. The G gene is dominant. But a man has only one gene, and if this is g, he is color-blind. A man inherits one of his mother's two genes, while a woman inherits one gene from each parent. Thus a man may be of type G or g, while a woman may be type GG or Gg or gg. We will study a process of inbreeding similar to that of Example 11.11 by constructing a Markov chain.

16 List the states of the chain. Hint: There are six. Compute the transition probabilities.
Find the fundamental matrix $N$, $N\mathbf{c}$, and $NR$.

17 Show that in both Example 11.11 and the example just given, the probability of absorption in a state having genes of a particular type is equal to the proportion of genes of that type in the starting state. Show that this can be explained by the fact that a game in which your fortune is the number of genes of a particular type in the state of the Markov chain is a fair game.⁵

18 Assume that a student going to a certain four-year medical school in northern New England has, each year, a probability $q$ of flunking out, a probability $r$ of having to repeat the year, and a probability $p$ of moving on to the next year (in the fourth year, moving on means graduating).
(a) Form a transition matrix for this process taking as states F, 1, 2, 3, 4, and G, where F stands for flunking out and G for graduating, and the other states represent the year of study.
(b) For the case $q = .1$, $r = .2$, and $p = .7$, find the time a beginning student can expect to be in the second year. How long should this student expect to be in medical school?
(c) Find the probability that this beginning student will graduate.

19 (E. Brown⁶) Mary and John are playing the following game: They have a three-card deck marked with the numbers 1, 2, and 3 and a spinner with the numbers 1, 2, and 3 on it. The game begins by dealing the cards out so that the dealer gets one card and the other person gets two. A move in the game consists of a spin of the spinner. The person having the card with the number that comes up on the spinner hands that card to the other person. The game ends when someone has all the cards.
(a) Set up the transition matrix for this absorbing Markov chain, where the states correspond to the number of cards that Mary has.
(b) Find the fundamental matrix.
(c) On the average, how many moves will the game last?
(d) If Mary deals, what is the probability that John will win the game?

20 Assume that an experiment has $m$ equally probable outcomes. Show that the expected number of independent trials before the first occurrence of $k$ consecutive occurrences of one of these outcomes is $(m^k - 1)/(m - 1)$. Hint: Form an absorbing Markov chain with states 1, 2, ..., $k$, with state $i$ representing the length of the current run. The expected time until a run of $k$ is 1 more than the expected time until absorption for the chain started in state 1. It has been found that, in the decimal expansion of pi, starting with the 24,658,601st digit, there is a run of nine 7's. What would your result say about the expected number of digits necessary to find such a run if the digits are produced randomly?

⁵ H. Gonshor, "An Application of Random Walk to a Problem in Population Genetics," American Math Monthly, vol. 94 (1987), pp. 668-671.
⁶ Private communication.

21 (Roberts⁷) A city is divided into 3 areas 1, 2, and 3. It is estimated that amounts $u_1$, $u_2$, and $u_3$ of pollution are emitted each day from these three areas. A fraction $q_{ij}$ of the pollution from region $i$ ends up the next day at region $j$. A fraction $q_i = 1 - \sum_j q_{ij} > 0$ goes into the atmosphere and escapes. Let $w_i^{(n)}$ be the amount of pollution in area $i$ after $n$ days.
(a) Show that $\mathbf{w}^{(n)} = \mathbf{u} + \mathbf{u}Q + \cdots + \mathbf{u}Q^{n-1}$.
(b) Show that $\mathbf{w}^{(n)} \to \mathbf{w}$, and show how to compute $\mathbf{w}$ from $\mathbf{u}$.
(c) The government wants to limit pollution levels to a prescribed level by prescribing $\mathbf{w}$. Show how to determine the levels of pollution $\mathbf{u}$ which would result in a prescribed limiting value $\mathbf{w}$.

22 In the Leontief economic model,⁸ there are $n$ industries 1, 2, ..., $n$. The $i$th industry requires an amount $0 \le q_{ij} \le 1$ of goods (in dollar value) from company $j$ to produce 1 dollar's worth of goods.
The outside demand on the industries, in dollar value, is given by the vector $\mathbf{d} = (d_1, d_2, \ldots, d_n)$. Let $Q$ be the matrix with entries $q_{ij}$.
(a) Show that if the industries produce total amounts given by the vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, then the amounts of goods of each type that the industries will need just to meet their internal demands is given by the vector $\mathbf{x}Q$.
(b) Show that in order to meet the outside demand $\mathbf{d}$ and the internal demands, the industries must produce total amounts given by a vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ which satisfies the equation $\mathbf{x} = \mathbf{x}Q + \mathbf{d}$.
(c) Show that if $Q$ is the $Q$-matrix for an absorbing Markov chain, then it is possible to meet any outside demand $\mathbf{d}$.
(d) Assume that the row sums of $Q$ are less than or equal to 1. Give an economic interpretation of this condition. Form a Markov chain by taking the states to be the industries and the transition probabilities to be the $q_{ij}$. Add one absorbing state 0. Define
\[
q_{i0} = 1 - \sum_j q_{ij} .
\]
Show that this chain will be absorbing if every company is either making a profit or ultimately depends upon a profit-making company.
(e) Define $\mathbf{x}\mathbf{c}$ to be the gross national product. Find an expression for the gross national product in terms of the demand vector $\mathbf{d}$ and the vector $\mathbf{t}$ giving the expected time to absorption.

⁷ F. Roberts, Discrete Mathematical Models (Englewood Cliffs, NJ: Prentice Hall, 1976).
⁸ W. W. Leontief, Input-Output Economics (Oxford: Oxford University Press, 1966).

23 A gambler plays a game in which on each play he wins one dollar with probability $p$ and loses one dollar with probability $q = 1 - p$. The Gambler's Ruin
problem is the problem of finding the probability $w_x$ of winning an amount $T$ before losing everything, starting with state $x$. Show that this problem may be considered to be an absorbing Markov chain with states 0, 1, 2, ..., $T$, with 0 and $T$ absorbing states. Suppose that a gambler has probability $p = .48$ of winning on each play. Suppose, in addition, that the gambler starts with 50 dollars and that $T = 100$ dollars. Simulate this game 100 times and see how often the gambler is ruined. The proportion of games in which the gambler is not ruined estimates $w_{50}$.

24 Show that $w_x$ of Exercise 23 satisfies the following conditions:
(a) $w_x = p w_{x+1} + q w_{x-1}$ for $x = 1, 2, \ldots, T - 1$.
(b) $w_0 = 0$.
(c) $w_T = 1$.
Show that these conditions determine $w_x$. Show that, if $p = q = 1/2$, then
\[
w_x = \frac{x}{T}
\]
satisfies (a), (b), and (c) and hence is the solution. If $p \ne q$, show that
\[
w_x = \frac{(q/p)^x - 1}{(q/p)^T - 1}
\]
satisfies these conditions and hence gives the probability of the gambler winning.

25 Write a program to compute the probability $w_x$ of Exercise 24 for given values of $x$, $p$, and $T$. Study the probability that the gambler will ruin the bank in a game that is only slightly unfavorable, say $p = .49$, if the bank has significantly more money than the gambler.

*26 We considered the two examples of the Drunkard's Walk corresponding to the cases $n = 4$ and $n = 5$ blocks (see Example 11.13). Verify that in these two examples the expected time to absorption, starting at $x$, is equal to $x(n - x)$. See if you can prove that this is true in general. Hint: Show that if $f(x)$ is the expected time to absorption, then $f(0) = f(n) = 0$ and
\[
f(x) = (1/2) f(x - 1) + (1/2) f(x + 1) + 1
\]
for $0 < x < n$. Show that if $f_1(x)$ and $f_2(x)$ are two solutions, then their difference $g(x)$ is a solution of the equation
\[
g(x) = (1/2) g(x - 1) + (1/2) g(x + 1) .
\]
Also, $g(0) = g(n) = 0$.
Show that it is not possible for $g(x)$ to have a strict maximum or a strict minimum at the point $i$, where $1 \le i \le n - 1$. Use this to show that $g(i) = 0$ for all $i$. This shows that there is at most one solution. Then verify that the function $f(x) = x(n - x)$ is a solution.

27 Consider an absorbing Markov chain with state space $S$. Let $f$ be a function defined on $S$ with the property that
\[
f(i) = \sum_{j \in S} p_{ij} f(j) ,
\]
or in vector form
\[
\mathbf{f} = P\mathbf{f} .
\]
Then $f$ is called a harmonic function for $P$. If you imagine a game in which your fortune is $f(i)$ when you are in state $i$, then the harmonic condition means that the game is fair in the sense that your expected fortune after one step is the same as it was before the step.
(a) Show that for $f$ harmonic
\[
\mathbf{f} = P^n \mathbf{f}
\]
for all $n$.
(b) Show, using (a), that for $f$ harmonic
\[
\mathbf{f} = P^{\infty} \mathbf{f} ,
\]
where
\[
P^{\infty} = \lim_{n \to \infty} P^n = \begin{pmatrix} \mathbf{0} & B \\ \mathbf{0} & I \end{pmatrix} .
\]
(c) Using (b), prove that when you start in a transient state $i$ your expected final fortune
\[
\sum_k b_{ik} f(k)
\]
is equal to your starting fortune $f(i)$. In other words, a fair game on a finite state space remains fair to the end. (Fair games in general are called martingales. Fair games on infinite state spaces need not remain fair with an unlimited number of plays allowed. For example, consider the game of Heads or Tails (see Example 1.4). Let Peter start with 1 penny and play until he has 2. Then Peter will be sure to end up 1 penny ahead.)

28 A coin is tossed repeatedly. We are interested in finding the expected number of tosses until a particular pattern, say B = HTH, occurs for the first time. If, for example, the outcomes of the tosses are HHTTHTH, we say that the pattern B has occurred for the first time after 7 tosses. Let $T^B$ be the time to obtain pattern B for the first time. Li⁹ gives the following method for determining $E(T^B)$.
We are in a casino and, before each toss of the coin, a gambler enters, pays 1 dollar to play, and bets that the pattern B = HTH will occur on the next three tosses. If H occurs, he wins 2 dollars and bets this amount that the next outcome will be T. If he wins, he wins 4 dollars and bets this amount that H will come up next time. If he wins, he wins 8 dollars and the pattern has occurred. If at any time he loses, he leaves with no winnings.

⁹ S-Y. R. Li, "A Martingale Approach to the Study of Occurrence of Sequence Patterns in Repeated Experiments," Annals of Probability, vol. 8 (1980), pp. 1171-1176.

Let A and B be two patterns. Let AB be the amount the gamblers win who arrive while the pattern A occurs and bet that B will occur. For example, if A = HT and B = HTH, then AB = 4, since the first gambler bet on H and won 2 dollars and then bet this amount on T and won 4 dollars. The second gambler bet on H and lost. If A = HH and B = HTH, then AB = 2, since the first gambler bet on H and won but then bet on T and lost, and the second gambler bet on H and won. If A = B = HTH, then AB = BB = 8 + 2 = 10.

Now for each gambler coming in, the casino takes in 1 dollar. Thus the casino takes in $T^B$ dollars. How much does it pay out? The only gamblers who go off with any money are those who arrive during the time the pattern B occurs, and they win the amount BB. But since all the bets made are perfectly fair bets, it seems quite intuitive that the expected amount the casino takes in should equal the expected amount that it pays out. That is, $E(T^B)$ = BB.

Since we have seen that for B = HTH, BB = 10, the expected time to reach the pattern HTH for the first time is 10. If we had been trying to get the pattern B = HHH, then BB = 8 + 4 + 2 = 14, since all the last three gamblers are paid off in this case. Thus the expected time to get the pattern HHH is 14.
To justify this argument, Li used a theorem from the theory of martingales (fair games).

We can obtain these expectations by considering a Markov chain whose states are the possible initial segments of the sequence HTH; these states are HTH, HT, H, and $\emptyset$, where $\emptyset$ is the empty set. Then, for this example, the transition matrix is
\[
\begin{array}{c|cccc}
  & \text{HTH} & \text{HT} & \text{H} & \emptyset \\ \hline
\text{HTH} & 1 & 0 & 0 & 0 \\
\text{HT} & .5 & 0 & 0 & .5 \\
\text{H} & 0 & .5 & .5 & 0 \\
\emptyset & 0 & 0 & .5 & .5
\end{array}
\]
and if B = HTH, $E(T^B)$ is the expected time to absorption for this chain started in state $\emptyset$.

Show, using the associated Markov chain, that the values $E(T^B) = 10$ and $E(T^B) = 14$ are correct for the expected time to reach the patterns HTH and HHH, respectively.

29 We can use the gambling interpretation given in Exercise 28 to find the expected number of tosses required to reach pattern B when we start with pattern A. To be a meaningful problem, we assume that pattern A does not have pattern B as a subpattern. Let $E_A(T^B)$ be the expected time to reach pattern B starting with pattern A. We use our gambling scheme and assume that the first $k$ coin tosses produced the pattern A. During this time, the gamblers made an amount AB. The total amount the gamblers will have made when the pattern B occurs is BB. Thus, the amount that the gamblers made after the pattern A has occurred is BB − AB. Again by the fair game argument, $E_A(T^B)$ = BB − AB.

For example, suppose that we start with pattern A = HT and are trying to get the pattern B = HTH. Then we saw in Exercise 28 that AB = 4 and BB = 10, so $E_A(T^B)$ = BB − AB = 6.

Verify that this gambling interpretation leads to the correct answer for all starting states in the examples that you worked in Exercise 28.

30 Here is an elegant method due to Guibas and Odlyzko¹⁰ to obtain the expected time to reach a pattern, say HTH, for the first time.
Let $f(n)$ be the number of sequences of length $n$ which do not have the pattern HTH. Let $f_p(n)$ be the number of sequences that have the pattern for the first time after $n$ tosses. To each sequence counted by $f(n)$, append the pattern HTH. Then divide the resulting sequences into three subsets: the set where HTH occurs for the first time at time $n + 1$ (for this, the original sequence must have ended with HT); the set where HTH occurs for the first time at time $n + 2$ (cannot happen for this pattern); and the set where the sequence HTH occurs for the first time at time $n + 3$ (the original sequence ended with anything except HT). Doing this, we have
\[
f(n) = f_p(n + 1) + f_p(n + 3) .
\]
Thus,
\[
\frac{f(n)}{2^n} = \frac{2 f_p(n + 1)}{2^{n+1}} + \frac{2^3 f_p(n + 3)}{2^{n+3}} .
\]
If $T$ is the time that the pattern occurs for the first time, this equality states that
\[
P(T > n) = 2 P(T = n + 1) + 8 P(T = n + 3) .
\]
Show that if you sum this equality over all $n$ you obtain
\[
\sum_{n=0}^{\infty} P(T > n) = 2 + 8 = 10 .
\]
Show that for any nonnegative integer-valued random variable
\[
E(T) = \sum_{n=0}^{\infty} P(T > n) ,
\]
and conclude that $E(T) = 10$. Note that this method of proof makes very clear that $E(T)$ is, in general, equal to the expected amount the casino pays out and avoids the martingale system theorem used by Li.

¹⁰ L. J. Guibas and A. M. Odlyzko, "String Overlaps, Pattern Matching, and Non-transitive Games," Journal of Combinatorial Theory, Series A, vol. 30 (1981), pp. 183-208.

31 In Example 11.11, define $f(i)$ to be the proportion of G genes in state $i$. Show that $f$ is a harmonic function (see Exercise 27). Why does this show that the probability of being absorbed in state (GG, GG) is equal to the proportion of G genes in the starting state? (See Exercise 17.)

32 Show that the stepping stone model (Example 11.12) is an absorbing Markov chain.
Assume that you are playing a game with red and green squares, in which your fortune at any time is equal to the proportion of red squares at that time. Give an argument to show that this is a fair game in the sense that your expected winning after each step is just what it was before this step. Hint: Show that for every possible outcome in which your fortune will decrease by one there is another outcome of exactly the same probability where it will increase by one.

Use this fact and the results of Exercise 27 to show that the probability that a particular color wins out is equal to the proportion of squares that are initially of this color.

33 Consider a random walker who moves on the integers 0, 1, ..., $N$, moving one step to the right with probability $p$ and one step to the left with probability $q = 1 - p$. If the walker ever reaches 0 or $N$ he stays there. (This is the Gambler's Ruin problem of Exercise 23.) If $p = q$, show that the function
\[
f(i) = i
\]
is a harmonic function (see Exercise 27), and if $p \ne q$, then
\[
f(i) = \left(\frac{q}{p}\right)^i
\]
is a harmonic function. Use this and the result of Exercise 27 to show that the probability $b_{iN}$ of being absorbed in state $N$ starting in state $i$ is
\[
b_{iN} =
\begin{cases}
\dfrac{i}{N} , & \text{if } p = q , \\[2ex]
\dfrac{(q/p)^i - 1}{(q/p)^N - 1} , & \text{if } p \ne q .
\end{cases}
\]
For an alternative derivation of these results see Exercise 24.

34 Complete the following alternate proof of Theorem 11.6. Let $s_i$ be a transient state and $s_j$ be an absorbing state. If we compute $b_{ij}$ in terms of the possibilities on the outcome of the first step, then we have the equation
\[
b_{ij} = p_{ij} + \sum_k p_{ik} b_{kj} ,
\]
where the summation is carried out over all transient states $s_k$. Write this in matrix form, and derive from this equation the statement
\[
B = NR .
\]

35 In Monte Carlo roulette (see Example 6.6), under option (c), there are six states ($S$, $W$, $L$, $E$, $P_1$, and $P_2$).
The reader is referred to Figure 6.2, which contains a tree for this option. Form a Markov chain for this option, and use the program AbsorbingChain to find the probabilities that you win, lose, or break even for a 1 franc bet on red. Using these probabilities, find the expected winnings for this bet. For a more general discussion of Markov chains applied to roulette, see the article of H. Sagan referred to in Example 6.13.

36 We consider next a game called Penney-ante by its inventor W. Penney.[11] There are two players; the first player picks a pattern A of H's and T's, and then the second player, knowing the choice of the first player, picks a different pattern B. We assume that neither pattern is a subpattern of the other pattern. A coin is tossed a sequence of times, and the player whose pattern comes up first is the winner. To analyze the game, we need to find the probability $p_A$ that pattern A will occur before pattern B and the probability $p_B = 1 - p_A$ that pattern B occurs before pattern A. To determine these probabilities we use the results of Exercises 28 and 29. There you were asked to show that the expected time to reach a pattern B for the first time is
\[ E(T^B) = BB , \]
and, starting with pattern A, the expected time to reach pattern B is
\[ E_A(T^B) = BB - AB . \]

(a) Show that the odds that the first player will win are given by John Conway's formula[12]:
\[ \frac{p_A}{1 - p_A} = \frac{p_A}{p_B} = \frac{BB - BA}{AA - AB} . \]
Hint: Explain why
\[ E(T^B) = E(T^{A \text{ or } B}) + p_A E_A(T^B) \]
and thus
\[ BB = E(T^{A \text{ or } B}) + p_A (BB - AB) . \]
Interchange A and B to find a similar equation involving $p_B$. Finally, note that
\[ p_A + p_B = 1 . \]
Use these equations to solve for $p_A$ and $p_B$.

(b) Assume that both players choose a pattern of the same length $k$. Show that, if $k = 2$, this is a fair game, but, if $k = 3$, the second player has an advantage no matter what choice the first player makes.
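Conway's formula in part (a) is easy to evaluate by machine. The following Python sketch is only a numerical companion to the exercise, not a proof. It assumes that the quantity AB from Exercises 28 and 29 is the sum of $2^k$ over those $k$ for which the last $k$ letters of A equal the first $k$ letters of B; that definition is not restated here, so treat it as an assumption. The function names are made up for illustration.

```python
from fractions import Fraction
from itertools import product


def leading_number(a: str, b: str) -> int:
    """Assumed definition of AB: sum of 2**k over every k for which
    the last k letters of a equal the first k letters of b."""
    return sum(2 ** k
               for k in range(1, min(len(a), len(b)) + 1)
               if a[-k:] == b[:k])


def prob_first_wins(a: str, b: str) -> Fraction:
    """p_A obtained from Conway's odds p_A / p_B = (BB - BA) / (AA - AB)."""
    num = leading_number(b, b) - leading_number(b, a)
    den = leading_number(a, a) - leading_number(a, b)
    return Fraction(num, num + den)


def best_reply(a: str):
    """Return (p_A, B) for the same-length pattern B that is worst for player one."""
    candidates = ("".join(t) for t in product("HT", repeat=len(a)))
    return min((prob_first_wins(a, b), b) for b in candidates if b != a)


if __name__ == "__main__":
    print(leading_number("HTH", "HTH"))  # 10, the expected waiting time for HTH
    for k in (2, 3):
        for a in ("".join(t) for t in product("HT", repeat=k)):
            p_a, b = best_reply(a)
            print(k, a, b, p_a)
```

Exact fractions are used so that the comparison with 1/2 in part (b) is not blurred by rounding.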
(It has been shown that, for $k \ge 3$, if the first player chooses $a_1, a_2, \ldots, a_k$, then the optimal strategy for the second player is of the form $b, a_1, \ldots, a_{k-1}$ where $b$ is the better of the two choices H or T.[13])

[11] W. Penney, "Problem: Penney-Ante," Journal of Recreational Math, vol. 2 (1969), p. 241.
[12] M. Gardner, "Mathematical Games," Scientific American, vol. 10 (1974), pp. 120-125.
[13] Guibas and Odlyzko, op. cit.

11.3 Ergodic Markov Chains

A second important kind of Markov chain we shall study in detail is an ergodic Markov chain, defined as follows.

Definition 11.4 A Markov chain is called an ergodic chain if it is possible to go from every state to every state (not necessarily in one move). □

In many books, ergodic Markov chains are called irreducible.

Definition 11.5 A Markov chain is called a regular chain if some power of the transition matrix has only positive elements. □

In other words, for some $n$, it is possible to go from any state to any state in exactly $n$ steps. It is clear from this definition that every regular chain is ergodic. On the other hand, an ergodic chain is not necessarily regular, as the following examples show.

Example 11.16 Let the transition matrix of a Markov chain be defined by
\[ \mathbf{P} = \bordermatrix{ & 1 & 2 \cr 1 & 0 & 1 \cr 2 & 1 & 0 } . \]
Then it is clear that it is possible to move from any state to any state, so the chain is ergodic. However, if $n$ is odd, then it is not possible to move from state 1 to state 1 in $n$ steps, and if $n$ is even, then it is not possible to move from state 1 to state 2 in $n$ steps, so the chain is not regular. □

A more interesting example of an ergodic, nonregular Markov chain is provided by the Ehrenfest urn model.

Example 11.17 Recall the Ehrenfest urn model (Example 11.8).
The transition matrix for this example is
\[ \mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 \cr 0 & 0 & 1 & 0 & 0 & 0 \cr 1 & 1/4 & 0 & 3/4 & 0 & 0 \cr 2 & 0 & 1/2 & 0 & 1/2 & 0 \cr 3 & 0 & 0 & 3/4 & 0 & 1/4 \cr 4 & 0 & 0 & 0 & 1 & 0 } . \]
In this example, if we start in state 0 we will, after any even number of steps, be in either state 0, 2 or 4, and after any odd number of steps, be in states 1 or 3. Thus this chain is ergodic but not regular. □

Regular Markov Chains

Any transition matrix that has no zeros determines a regular Markov chain. However, it is possible for a regular Markov chain to have a transition matrix that has zeros. The transition matrix of the Land of Oz example of Section 11.1 has $p_{NN} = 0$ but the second power $\mathbf{P}^2$ has no zeros, so this is a regular Markov chain.

An example of a nonregular Markov chain is an absorbing chain. For example, let
\[ \mathbf{P} = \begin{pmatrix} 1 & 0 \\ 1/2 & 1/2 \end{pmatrix} \]
be the transition matrix of a Markov chain. Then all powers of $\mathbf{P}$ will have a 0 in the upper right-hand corner.

We shall now discuss two important theorems relating to regular chains.

Theorem 11.7 Let $\mathbf{P}$ be the transition matrix for a regular chain. Then, as $n \to \infty$, the powers $\mathbf{P}^n$ approach a limiting matrix $\mathbf{W}$ with all rows the same vector $\mathbf{w}$. The vector $\mathbf{w}$ is a strictly positive probability vector (i.e., the components are all positive and they sum to one). □

In the next section we give two proofs of this fundamental theorem. We give here the basic idea of the first proof.

We want to show that the powers $\mathbf{P}^n$ of a regular transition matrix tend to a matrix with all rows the same. This is the same as showing that $\mathbf{P}^n$ converges to a matrix with constant columns. Now the $j$th column of $\mathbf{P}^n$ is $\mathbf{P}^n \mathbf{y}$ where $\mathbf{y}$ is a column vector with 1 in the $j$th entry and 0 in the other entries.
Thus we need only prove that for any column vector $\mathbf{y}$, $\mathbf{P}^n \mathbf{y}$ approaches a constant vector as $n$ tends to infinity.

Since each row of $\mathbf{P}$ is a probability vector, $\mathbf{P}\mathbf{y}$ replaces $\mathbf{y}$ by averages of its components. Here is an example:
\[ \begin{pmatrix} 1/2 & 1/4 & 1/4 \\ 1/3 & 1/3 & 1/3 \\ 1/2 & 1/2 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 1/2 \cdot 1 + 1/4 \cdot 2 + 1/4 \cdot 3 \\ 1/3 \cdot 1 + 1/3 \cdot 2 + 1/3 \cdot 3 \\ 1/2 \cdot 1 + 1/2 \cdot 2 + 0 \cdot 3 \end{pmatrix} = \begin{pmatrix} 7/4 \\ 2 \\ 3/2 \end{pmatrix} . \]
The result of the averaging process is to make the components of $\mathbf{P}\mathbf{y}$ more similar than those of $\mathbf{y}$. In particular, the maximum component decreases (from 3 to 2) and the minimum component increases (from 1 to 3/2). Our proof will show that as we do more and more of this averaging to get $\mathbf{P}^n \mathbf{y}$, the difference between the maximum and minimum component will tend to 0 as $n \to \infty$. This means $\mathbf{P}^n \mathbf{y}$ tends to a constant vector. The $ij$th entry of $\mathbf{P}^n$, $p^{(n)}_{ij}$, is the probability that the process will be in state $s_j$ after $n$ steps if it starts in state $s_i$. If we denote the common row of $\mathbf{W}$ by $\mathbf{w}$, then Theorem 11.7 states that the probability of being in $s_j$ in the long run is approximately $w_j$, the $j$th entry of $\mathbf{w}$, and is independent of the starting state.

Example 11.18 Recall that for the Land of Oz example of Section 11.1, the sixth power of the transition matrix $\mathbf{P}$ is, to three decimal places,
\[ \mathbf{P}^6 = \bordermatrix{ & R & N & S \cr R & .4 & .2 & .4 \cr N & .4 & .2 & .4 \cr S & .4 & .2 & .4 } . \]
Thus, to this degree of accuracy, the probability of rain six days after a rainy day is the same as the probability of rain six days after a nice day, or six days after a snowy day. Theorem 11.7 predicts that, for large $n$, the rows of $\mathbf{P}^n$ approach a common vector. It is interesting that this occurs so soon in our example. □

Theorem 11.8 Let $\mathbf{P}$ be a regular transition matrix, let
\[ \mathbf{W} = \lim_{n \to \infty} \mathbf{P}^n , \]
let $\mathbf{w}$ be the common row of $\mathbf{W}$, and let $\mathbf{c}$ be the column vector all of whose components are 1.
Then

(a) $\mathbf{w}\mathbf{P} = \mathbf{w}$, and any row vector $\mathbf{v}$ such that $\mathbf{v}\mathbf{P} = \mathbf{v}$ is a constant multiple of $\mathbf{w}$.

(b) $\mathbf{P}\mathbf{c} = \mathbf{c}$, and any column vector $\mathbf{x}$ such that $\mathbf{P}\mathbf{x} = \mathbf{x}$ is a multiple of $\mathbf{c}$.

Proof. To prove part (a), we note that from Theorem 11.7,
\[ \mathbf{P}^n \to \mathbf{W} . \]
Thus,
\[ \mathbf{P}^{n+1} = \mathbf{P}^n \cdot \mathbf{P} \to \mathbf{W}\mathbf{P} . \]
But $\mathbf{P}^{n+1} \to \mathbf{W}$, and so $\mathbf{W} = \mathbf{W}\mathbf{P}$, and $\mathbf{w} = \mathbf{w}\mathbf{P}$.

Let $\mathbf{v}$ be any vector with $\mathbf{v}\mathbf{P} = \mathbf{v}$. Then $\mathbf{v} = \mathbf{v}\mathbf{P}^n$, and passing to the limit, $\mathbf{v} = \mathbf{v}\mathbf{W}$. Let $r$ be the sum of the components of $\mathbf{v}$. Then it is easily checked that $\mathbf{v}\mathbf{W} = r\mathbf{w}$. So, $\mathbf{v} = r\mathbf{w}$.

To prove part (b), assume that $\mathbf{x} = \mathbf{P}\mathbf{x}$. Then $\mathbf{x} = \mathbf{P}^n \mathbf{x}$, and again passing to the limit, $\mathbf{x} = \mathbf{W}\mathbf{x}$. Since all rows of $\mathbf{W}$ are the same, the components of $\mathbf{W}\mathbf{x}$ are all equal, so $\mathbf{x}$ is a multiple of $\mathbf{c}$. □

Note that an immediate consequence of Theorem 11.8 is the fact that there is only one probability vector $\mathbf{v}$ such that $\mathbf{v}\mathbf{P} = \mathbf{v}$.

Fixed Vectors

Definition 11.6 A row vector $\mathbf{w}$ with the property $\mathbf{w}\mathbf{P} = \mathbf{w}$ is called a fixed row vector for $\mathbf{P}$. Similarly, a column vector $\mathbf{x}$ such that $\mathbf{P}\mathbf{x} = \mathbf{x}$ is called a fixed column vector for $\mathbf{P}$. □

Thus, the common row of $\mathbf{W}$ is the unique vector $\mathbf{w}$ which is both a fixed row vector for $\mathbf{P}$ and a probability vector. Theorem 11.8 shows that any fixed row vector for $\mathbf{P}$ is a multiple of $\mathbf{w}$ and any fixed column vector for $\mathbf{P}$ is a constant vector.

One can also state Definition 11.6 in terms of eigenvalues and eigenvectors. A fixed row vector is a left eigenvector of the matrix $\mathbf{P}$ corresponding to the eigenvalue 1. A similar statement can be made about fixed column vectors.

We will now give several different methods for calculating the fixed row vector $\mathbf{w}$ for a regular Markov chain.
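Before turning to those methods, Theorems 11.7 and 11.8 can be checked numerically on the Land of Oz chain. The Python sketch below uses exact rational arithmetic from the standard library; the helper names mat_mul, mat_pow and column_spread are illustrative and do not come from the text. It prints, for each power $\mathbf{P}^n$, the largest gap between two entries of the same column (which is 0 exactly when all rows agree), and then confirms that $(.4, .2, .4)$ is a fixed row vector.

```python
from fractions import Fraction as F

# Land of Oz transition matrix from Section 11.1 (states R, N, S).
P = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 2), F(0),    F(1, 2)],
     [F(1, 4), F(1, 4), F(1, 2)]]


def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_pow(m, n):
    """n-th power of a square matrix by repeated multiplication."""
    result = [[F(int(i == j)) for j in range(len(m))] for i in range(len(m))]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


def column_spread(m):
    """Largest gap between two entries of one column (0 when all rows agree)."""
    return max(max(col) - min(col) for col in zip(*m))


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, column_spread(mat_pow(P, n)))  # the gap shrinks toward 0
    w = [F(2, 5), F(1, 5), F(2, 5)]
    print(mat_mul([w], P) == [w])               # True: w is a fixed row vector
```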
Example 11.19 By Theorem 11.8 we can find the limiting vector $\mathbf{w}$ for the Land of Oz from the fact that
\[ w_1 + w_2 + w_3 = 1 \]
and
\[ ( w_1 \;\; w_2 \;\; w_3 ) \begin{pmatrix} 1/2 & 1/4 & 1/4 \\ 1/2 & 0 & 1/2 \\ 1/4 & 1/4 & 1/2 \end{pmatrix} = ( w_1 \;\; w_2 \;\; w_3 ) . \]
These relations lead to the following four equations in three unknowns:
\begin{align*}
w_1 + w_2 + w_3 &= 1 , \\
(1/2) w_1 + (1/2) w_2 + (1/4) w_3 &= w_1 , \\
(1/4) w_1 + (1/4) w_3 &= w_2 , \\
(1/4) w_1 + (1/2) w_2 + (1/2) w_3 &= w_3 .
\end{align*}
Our theorem guarantees that these equations have a unique solution. If the equations are solved, we obtain the solution
\[ \mathbf{w} = ( .4 \;\; .2 \;\; .4 ) , \]
in agreement with that predicted from $\mathbf{P}^6$, given in Example 11.2. □

To calculate the fixed vector, we can assume that the value at a particular state, say state one, is 1, and then use all but one of the linear equations from $\mathbf{w}\mathbf{P} = \mathbf{w}$. This set of equations will have a unique solution, and we can obtain $\mathbf{w}$ from this solution by dividing each of its entries by their sum to give the probability vector $\mathbf{w}$. We will now illustrate this idea for the above example.

Example 11.20 (Example 11.19 continued) We set $w_1 = 1$, and then solve the first and second linear equations from $\mathbf{w}\mathbf{P} = \mathbf{w}$. We have
\begin{align*}
(1/2) + (1/2) w_2 + (1/4) w_3 &= 1 , \\
(1/4) + (1/4) w_3 &= w_2 .
\end{align*}
If we solve these, we obtain
\[ ( w_1 \;\; w_2 \;\; w_3 ) = ( 1 \;\; 1/2 \;\; 1 ) . \]
Now we divide this vector by the sum of the components, to obtain the final answer:
\[ \mathbf{w} = ( .4 \;\; .2 \;\; .4 ) . \]
This method can be easily programmed to run on a computer. □

As mentioned above, we can also think of the fixed row vector $\mathbf{w}$ as a left eigenvector of the transition matrix $\mathbf{P}$. Thus, if we write $\mathbf{I}$ to denote the identity matrix, then $\mathbf{w}$ satisfies the matrix equation
\[ \mathbf{w}\mathbf{P} = \mathbf{w}\mathbf{I} , \]
or equivalently,
\[ \mathbf{w}(\mathbf{P} - \mathbf{I}) = \mathbf{0} . \]
Thus, $\mathbf{w}$ is in the left nullspace of the matrix $\mathbf{P} - \mathbf{I}$. Furthermore, Theorem 11.8 states that this left nullspace has dimension 1.
Certain computer programming languages can find nullspaces of matrices. In such languages, one can find the fixed row probability vector for a matrix $\mathbf{P}$ by computing the left nullspace and then normalizing a vector in the nullspace so the sum of its components is 1.

The program FixedVector uses one of the above methods (depending upon the language in which it is written) to calculate the fixed row probability vector for regular Markov chains.

So far we have always assumed that we started in a specific state. The following theorem generalizes Theorem 11.7 to the case where the starting state is itself determined by a probability vector.

Theorem 11.9 Let $\mathbf{P}$ be the transition matrix for a regular chain and $\mathbf{v}$ an arbitrary probability vector. Then
\[ \lim_{n \to \infty} \mathbf{v}\mathbf{P}^n = \mathbf{w} , \]
where $\mathbf{w}$ is the unique fixed probability vector for $\mathbf{P}$.

Proof. By Theorem 11.7,
\[ \lim_{n \to \infty} \mathbf{P}^n = \mathbf{W} . \]
Hence,
\[ \lim_{n \to \infty} \mathbf{v}\mathbf{P}^n = \mathbf{v}\mathbf{W} . \]
But the entries in $\mathbf{v}$ sum to 1, and each row of $\mathbf{W}$ equals $\mathbf{w}$. From these statements, it is easy to check that
\[ \mathbf{v}\mathbf{W} = \mathbf{w} . \]
□

If we start a Markov chain with initial probabilities given by $\mathbf{v}$, then the probability vector $\mathbf{v}\mathbf{P}^n$ gives the probabilities of being in the various states after $n$ steps. Theorem 11.9 then establishes the fact that, even in this more general class of processes, the probability of being in $s_j$ approaches $w_j$.

Equilibrium

We also obtain a new interpretation for $\mathbf{w}$. Suppose that our starting vector picks state $s_i$ as a starting state with probability $w_i$, for all $i$. Then the probability of being in the various states after $n$ steps is given by $\mathbf{w}\mathbf{P}^n = \mathbf{w}$, and is the same on all steps. This method of starting provides us with a process that is called "stationary." The fact that $\mathbf{w}$ is the only probability vector for which $\mathbf{w}\mathbf{P} = \mathbf{w}$ shows that we must have a starting probability vector of exactly the kind described to obtain a stationary process.
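The method of Example 11.20 (set $w_1 = 1$, solve all but one of the equations of $\mathbf{w}\mathbf{P} = \mathbf{w}$, then normalize) is short to program. The Python sketch below is one possible reading of that method and is not the book's FixedVector program. It drops the last equation, solves the remaining ones by a small Gauss-Jordan elimination over exact fractions, and the name fixed_vector is chosen here for illustration. The final check in the usage lines is the stationarity property $\mathbf{w}\mathbf{P} = \mathbf{w}$ discussed above.

```python
from fractions import Fraction as F


def fixed_vector(P):
    """Fixed probability row vector of an ergodic transition matrix P,
    given as a list of rows with exact entries (Fraction or int)."""
    r = len(P)
    n = r - 1
    # Unknowns w_2..w_r with w_1 = 1. Equation j (column j of wP = w),
    # for j = 1..r-1, reads: sum_{i>=2} w_i (p_ij - [i == j]) = [j == 1] - p_1j.
    aug = [[F(P[i][j]) - (1 if i == j else 0) for i in range(1, r)]
           + [(1 if j == 0 else 0) - F(P[0][j])]
           for j in range(n)]
    # Gauss-Jordan elimination; for an ergodic chain a nonzero pivot exists.
    for col in range(n):
        pivot = next(row for row in range(col, n) if aug[row][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for row in range(n):
            if row != col and aug[row][col] != 0:
                factor = aug[row][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    unnormalized = [F(1)] + [aug[i][n] for i in range(n)]
    total = sum(unnormalized)
    return [x / total for x in unnormalized]


if __name__ == "__main__":
    oz = [[F(1, 2), F(1, 4), F(1, 4)],
          [F(1, 2), F(0),    F(1, 2)],
          [F(1, 4), F(1, 4), F(1, 2)]]
    w = fixed_vector(oz)
    print(w)  # the fractions 2/5, 1/5, 2/5
    # Stationarity: starting with distribution w, one step leaves it unchanged.
    print([sum(w[i] * oz[i][j] for i in range(3)) for j in range(3)] == w)
```

Working with fractions avoids any question of rounding; with floating-point entries one would instead compare against a small tolerance.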
Many interesting results concerning regular Markov chains depend only on the fact that the chain has a unique fixed probability vector which is positive. This property holds for all ergodic Markov chains.

Theorem 11.10 For an ergodic Markov chain, there is a unique probability vector $\mathbf{w}$ such that $\mathbf{w}\mathbf{P} = \mathbf{w}$ and $\mathbf{w}$ is strictly positive. Any row vector such that $\mathbf{v}\mathbf{P} = \mathbf{v}$ is a multiple of $\mathbf{w}$. Any column vector $\mathbf{x}$ such that $\mathbf{P}\mathbf{x} = \mathbf{x}$ is a constant vector.

Proof. This theorem states that Theorem 11.8 is true for ergodic chains. The result follows easily from the fact that, if $\mathbf{P}$ is an ergodic transition matrix, then $\bar{\mathbf{P}} = (1/2)\mathbf{I} + (1/2)\mathbf{P}$ is a regular transition matrix with the same fixed vectors (see Exercises 25-28). □

For ergodic chains, the fixed probability vector has a slightly different interpretation. The following two theorems, which we will not prove here, furnish an interpretation for this fixed vector.

Theorem 11.11 Let $\mathbf{P}$ be the transition matrix for an ergodic chain. Let $\mathbf{A}_n$ be the matrix defined by
\[ \mathbf{A}_n = \frac{\mathbf{I} + \mathbf{P} + \mathbf{P}^2 + \cdots + \mathbf{P}^n}{n + 1} . \]
Then $\mathbf{A}_n \to \mathbf{W}$, where $\mathbf{W}$ is a matrix all of whose rows are equal to the unique fixed probability vector $\mathbf{w}$ for $\mathbf{P}$. □

If $\mathbf{P}$ is the transition matrix of an ergodic chain, then Theorem 11.10 states that there is only one fixed row probability vector for $\mathbf{P}$. Thus, we can use the same techniques that were used for regular chains to solve for this fixed vector. In particular, the program FixedVector works for ergodic chains.

To interpret Theorem 11.11, let us assume that we have an ergodic chain that starts in state $s_i$. Let $X^{(m)} = 1$ if the $m$th step is to state $s_j$ and 0 otherwise. Then the average number of times in state $s_j$ in the first $n$ steps is given by
\[ H^{(n)} = \frac{X^{(0)} + X^{(1)} + X^{(2)} + \cdots + X^{(n)}}{n + 1} . \]
But $X^{(m)}$ takes on the value 1 with probability $p^{(m)}_{ij}$ and 0 otherwise.
Thus $E(X^{(m)}) = p^{(m)}_{ij}$, and the $ij$th entry of $\mathbf{A}_n$ gives the expected value of $H^{(n)}$, that is, the expected proportion of times in state $s_j$ in the first $n$ steps if the chain starts in state $s_i$.

If we call being in state $s_j$ success and any other state failure, we could ask if a theorem analogous to the law of large numbers for independent trials holds. The answer is yes and is given by the following theorem.

Theorem 11.12 (Law of Large Numbers for Ergodic Markov Chains) Let $H_j^{(n)}$ be the proportion of times in $n$ steps that an ergodic chain is in state $s_j$. Then for any $\epsilon > 0$,
\[ P\left( | H_j^{(n)} - w_j | > \epsilon \right) \to 0 , \]
independent of the starting state $s_i$. □

We have observed that every regular Markov chain is also an ergodic chain. Hence, Theorems 11.11 and 11.12 apply also for regular chains. For example, this gives us a new interpretation for the fixed vector $\mathbf{w} = (.4, .2, .4)$ in the Land of Oz example. Theorem 11.11 predicts that, in the long run, it will rain 40 percent of the time in the Land of Oz, be nice 20 percent of the time, and snow 40 percent of the time.

Simulation

We illustrate Theorem 11.12 by writing a program to simulate the behavior of a Markov chain. SimulateChain is such a program.

Example 11.21 In the Land of Oz, there are 525 days in a year. We have simulated the weather for one year in the Land of Oz, using the program SimulateChain. The results are shown in Table 11.2.
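A simulation of this kind takes only a few lines. The following Python sketch is written in the spirit of SimulateChain but is not the book's program: the function names, the choice of a rainy starting day, and the seed are assumptions made for illustration, so its counts will not reproduce the ones reported in the tables.

```python
import random
from collections import Counter

STATES = ("R", "N", "S")
# Land of Oz transition probabilities; each list gives the chances of
# moving to R, N, S from the state named by the key.
P = {"R": [0.5, 0.25, 0.25],
     "N": [0.5, 0.0, 0.5],
     "S": [0.25, 0.25, 0.5]}


def simulate_chain(start, steps, rng):
    """Return the states visited during `steps` transitions after `start`."""
    path, state = [], start
    for _ in range(steps):
        state = rng.choices(STATES, weights=P[state])[0]
        path.append(state)
    return path


def fractions_of_time(path):
    """Proportion of the recorded steps spent in each state."""
    counts = Counter(path)
    return {s: counts[s] / len(path) for s in STATES}


if __name__ == "__main__":
    rng = random.Random(2006)          # arbitrary seed, for repeatability only
    year = simulate_chain("R", 525, rng)
    print("".join(year))               # one Oz year of weather
    print(fractions_of_time(year))
    print(fractions_of_time(simulate_chain("R", 10_000, rng)))
```

By Theorem 11.12 the printed proportions should settle near .4, .2, and .4 as the number of steps grows, whatever the starting state.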
SSRNRNSSSSSSNRSNSSRNSRNSSSNSRRRNSSSNRRSSSSNRSSNSRRRRRRNSSS
SSRRRSNSNRRRRSRSRNSNSRRNRRNRSSNSRNRNSSRRSRNSSSNRSRRSSNRSNR
RNSSSSNSSNSRSRRNSSNSSRNSSRRNRRRSRNRRRNSSSNRNSRNSNRNRSSSRSS
NRSSSNSSSSSSNSSSNSNSRRNRNRRRRSRRRSSSSNRRSSSSRSRRRNRRRSSSSR
RNRRRSRSSRRRRSSRNRRRRRRNSSRNRSSSNRNSNRRRRNRRRNRSNRRNSRRSNR
RRRSSSRNRRRNSNSSSSSRRRRSRNRSSRRRRSSSRRRNRNRRRSRSRNSNSSRRRR
RNSNRNSNRRNRRRRRRSSSNRSSRSNRSSSNSNRNSNSSSNRRSRRRNRRRRNRNRS
SSNSRSNRNRRSNRRNSRSSSRNSRRSSNSRRRNRRSNRRNSSSSSNRNSSSSSSSNR
NSRRRNSSRRRNSSSNRRSRNSSRRNRRNRSNRRRRRRRRRNSNRRRRRNSRRSSSSN
SNS

State   Times   Fraction
R       217     .413
N       109     .208
S       199     .379

Table 11.2: Weather in the Land of Oz.

We note that the simulation gives a proportion of times in each of the states not too different from the long run predictions of .4, .2, and .4 assured by Theorem 11.7. To get better results we have to simulate our chain for a longer time. We do this for 10,000 days without printing out each day's weather. The results are shown in Table 11.3. We see that the results are now quite close to the theoretical values of .4, .2, and .4.

State   Times   Fraction
R       4010    .401
N       1902    .19
S       4088    .409

Table 11.3: Comparison of observed and predicted frequencies for the Land of Oz. □

Examples of Ergodic Chains

The computation of the fixed vector $\mathbf{w}$ may be difficult if the transition matrix is very large. It is sometimes useful to guess the fixed vector on purely intuitive grounds. Here is a simple example to illustrate this kind of situation.

Example 11.22 A white rat is put into the maze of Figure 11.4. There are nine compartments with connections between the compartments as indicated. The rat moves through the compartments at random.
That is, if there are $k$ ways to leave a compartment, it chooses each of these with equal probability. We can represent the travels of the rat by a Markov chain process with transition matrix given by
\[ \mathbf{P} = \bordermatrix{
  & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \cr
1 & 0 & 1/2 & 0 & 0 & 0 & 1/2 & 0 & 0 & 0 \cr
2 & 1/3 & 0 & 1/3 & 0 & 1/3 & 0 & 0 & 0 & 0 \cr
3 & 0 & 1/2 & 0 & 1/2 & 0 & 0 & 0 & 0 & 0 \cr
4 & 0 & 0 & 1/3 & 0 & 1/3 & 0 & 0 & 0 & 1/3 \cr
5 & 0 & 1/4 & 0 & 1/4 & 0 & 1/4 & 0 & 1/4 & 0 \cr
6 & 1/3 & 0 & 0 & 0 & 1/3 & 0 & 1/3 & 0 & 0 \cr
7 & 0 & 0 & 0 & 0 & 0 & 1/2 & 0 & 1/2 & 0 \cr
8 & 0 & 0 & 0 & 0 & 1/3 & 0 & 1/3 & 0 & 1/3 \cr
9 & 0 & 0 & 0 & 1/2 & 0 & 0 & 0 & 1/2 & 0 } . \]
That this chain is not regular can be seen as follows: From an odd-numbered state the process can go only to an even-numbered state, and from an even-numbered state it can go only to an odd-numbered state. Hence, starting in state $i$ the process will be alternately in even-numbered and odd-numbered states. Therefore, odd powers of $\mathbf{P}$ will have 0's for the odd-numbered entries in row 1. On the other hand, a glance at the maze shows that it is possible to go from every state to every other state, so that the chain is ergodic.

Figure 11.4: The maze problem.

To find the fixed probability vector for this matrix, we would have to solve ten equations in nine unknowns. However, it would seem reasonable that the times spent in each compartment should, in the long run, be proportional to the number of entries to each compartment. Thus, we try the vector whose $j$th component is the number of entries to the $j$th compartment:
\[ \mathbf{x} = ( 2 \;\; 3 \;\; 2 \;\; 3 \;\; 4 \;\; 3 \;\; 2 \;\; 3 \;\; 2 ) . \]
It is easy to check that this vector is indeed a fixed vector so that the unique probability vector is this vector normalized to have sum 1:
\[ \mathbf{w} = \left( \tfrac{1}{12} \;\; \tfrac{1}{8} \;\; \tfrac{1}{12} \;\; \tfrac{1}{8} \;\; \tfrac{1}{6} \;\; \tfrac{1}{8} \;\; \tfrac{1}{12} \;\; \tfrac{1}{8} \;\; \tfrac{1}{12} \right) . \]
□

Example 11.23 (Example 11.8 continued) We recall the Ehrenfest urn model of Example 11.8.
The transition matrix for this chain is as follows:
\[ \mathbf{P} = \bordermatrix{
  & 0 & 1 & 2 & 3 & 4 \cr
0 & .000 & 1.000 & .000 & .000 & .000 \cr
1 & .250 & .000 & .750 & .000 & .000 \cr
2 & .000 & .500 & .000 & .500 & .000 \cr
3 & .000 & .000 & .750 & .000 & .250 \cr
4 & .000 & .000 & .000 & 1.000 & .000 } . \]
If we run the program FixedVector for this chain, we obtain the vector
\[ \mathbf{w} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 \cr & .0625 & .2500 & .3750 & .2500 & .0625 } . \]
By Theorem 11.12, we can interpret these values for $w_i$ as the proportion of times the process is in each of the states in the long run. For example, the proportion of times in state 0 is .0625 and the proportion of times in state 1 is .25. The astute reader will note that these numbers are the binomial distribution 1/16, 4/16, 6/16, 4/16, 1/16. We could have guessed this answer as follows: If we consider a particular ball, it simply moves randomly back and forth between the two urns. This suggests that the equilibrium state should be just as if we randomly distributed the four balls in the two urns. If we did this, the probability that there would be exactly $j$ balls in one urn would be given by the binomial distribution $b(n, p, j)$ with $n = 4$ and $p = 1/2$. □

Exercises

1 Which of the following matrices are transition matrices for regular Markov chains?

(a) $\mathbf{P} = \begin{pmatrix} .5 & .5 \\ .5 & .5 \end{pmatrix}$.

(b) $\mathbf{P} = \begin{pmatrix} .5 & .5 \\ 1 & 0 \end{pmatrix}$.

(c) $\mathbf{P} = \begin{pmatrix} 1/3 & 0 & 2/3 \\ 0 & 1 & 0 \\ 0 & 1/5 & 4/5 \end{pmatrix}$.

(d) $\mathbf{P} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$.

(e) $\mathbf{P} = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}$.

2 Consider the Markov chain with transition matrix
\[ \mathbf{P} = \begin{pmatrix} 1/2 & 1/3 & 1/6 \\ 3/4 & 0 & 1/4 \\ 0 & 1 & 0 \end{pmatrix} . \]

(a) Show that this is a regular Markov chain.

(b) The process is started in state 1; find the probability that it is in state 3 after two steps.

(c) Find the limiting probability vector $\mathbf{w}$.

3 Consider the Markov chain with general $2 \times 2$ transition matrix
\[ \mathbf{P} = \begin{pmatrix} 1 - a & a \\ b & 1 - b \end{pmatrix} . \]

(a) Under what conditions is $\mathbf{P}$ absorbing?

(b) Under what conditions is $\mathbf{P}$ ergodic but not regular?
(c) Under what conditions is $\mathbf{P}$ regular?

4 Find the fixed probability vector $\mathbf{w}$ for the matrices in Exercise 3 that are ergodic.

5 Find the fixed probability vector $\mathbf{w}$ for each of the following regular matrices.

(a) $\mathbf{P} = \begin{pmatrix} .75 & .25 \\ .5 & .5 \end{pmatrix}$.

(b) $\mathbf{P} = \begin{pmatrix} .9 & .1 \\ .1 & .9 \end{pmatrix}$.

(c) $\mathbf{P} = \begin{pmatrix} 3/4 & 1/4 & 0 \\ 0 & 2/3 & 1/3 \\ 1/4 & 1/4 & 1/2 \end{pmatrix}$.

6 Consider the Markov chain with transition matrix in Exercise 3, with $a = b = 1$. Show that this chain is ergodic but not regular. Find the fixed probability vector and interpret it. Show that $\mathbf{P}^n$ does not tend to a limit, but that
\[ \mathbf{A}_n = \frac{\mathbf{I} + \mathbf{P} + \mathbf{P}^2 + \cdots + \mathbf{P}^n}{n + 1} \]
does.

7 Consider the Markov chain with transition matrix of Exercise 3, with $a = 0$ and $b = 1/2$. Compute directly the unique fixed probability vector, and use your result to prove that the chain is not ergodic.

8 Show that the matrix
\[ \mathbf{P} = \begin{pmatrix} 1 & 0 & 0 \\ 1/4 & 1/2 & 1/4 \\ 0 & 0 & 1 \end{pmatrix} \]
has more than one fixed probability vector. Find the matrix that $\mathbf{P}^n$ approaches as $n \to \infty$, and verify that it is not a matrix all of whose rows are the same.

9 Prove that, if a 3-by-3 transition matrix has the property that its column sums are 1, then $(1/3, 1/3, 1/3)$ is a fixed probability vector. State a similar result for $n$-by-$n$ transition matrices. Interpret these results for ergodic chains.

10 Is the Markov chain in Example 11.10 ergodic?

11 Is the Markov chain in Example 11.11 ergodic?

12 Consider Example 11.13 (Drunkard's Walk). Assume that if the walker reaches state 0, he turns around and returns to state 1 on the next step and, similarly, if he reaches 4 he returns on the next step to state 3. Is this new chain ergodic? Is it regular?

13 For Example 11.4 when $\mathbf{P}$ is ergodic, what is the proportion of people who are told that the President will run? Interpret the fact that this proportion is independent of the starting state.

14 Consider an independent trials process to be a Markov chain whose states are the possible outcomes of the individual trials. What is its fixed probability vector? Is the chain always regular? Illustrate this for Example 11.5.

15 Show that Example 11.8 is an ergodic chain, but not a regular chain. Show that its fixed probability vector $\mathbf{w}$ is a binomial distribution.

16 Show that Example 11.9 is regular and find the limiting vector.

17 Toss a fair die repeatedly. Let $S_n$ denote the total of the outcomes through the $n$th toss. Show that there is a limiting value for the proportion of the first $n$ values of $S_n$ that are divisible by 7, and compute the value for this limit. Hint: The desired limit is an equilibrium probability vector for an appropriate seven state Markov chain.

18 Let $\mathbf{P}$ be the transition matrix of a regular Markov chain. Assume that there are $r$ states and let $N(r)$ be the smallest integer $n$ such that $\mathbf{P}$ is regular if and only if $\mathbf{P}^{N(r)}$ has no zero entries. Find a finite upper bound for $N(r)$. See if you can determine $N(3)$ exactly.

*19 Define $f(r)$ to be the smallest integer $n$ such that for all regular Markov chains with $r$ states, the $n$th power of the transition matrix has all entries positive. It has been shown[14] that $f(r) = r^2 - 2r + 2$.

(a) Define the transition matrix of an $r$-state Markov chain as follows: For states $s_i$, with $i = 1, 2, \ldots, r - 2$, $\mathbf{P}(i, i + 1) = 1$, $\mathbf{P}(r - 1, r) = \mathbf{P}(r - 1, 1) = 1/2$, and $\mathbf{P}(r, 1) = 1$. Show that this is a regular Markov chain.

(b) For $r = 3$, verify that the fifth power is the first power that has no zeros.

(c) Show that, for general $r$, the smallest $n$ such that $\mathbf{P}^n$ has all entries positive is $n = f(r)$.

20 A discrete time queueing system of capacity $n$ consists of the person being served and those waiting to be served. The queue length $x$ is observed each second.
If $0 < x < n$, then with probability $p$, the queue size is increased by one by an arrival and, independently, with probability $r$, it is decreased by one because the person being served finishes service. If $x = 0$, only an arrival (with probability $p$) is possible. If $x = n$, an arrival will depart without waiting for service, and so only the departure (with probability $r$) of the person being served is possible. Form a Markov chain with states given by the number of customers in the queue. Modify the program FixedVector so that you can input $n$, $p$, and $r$, and the program will construct the transition matrix and compute the fixed vector. The quantity $s = p/r$ is called the traffic intensity. Describe the differences in the fixed vectors according as $s < 1$, $s = 1$, or $s > 1$.

[14] E. Seneta, Non-Negative Matrices: An Introduction to Theory and Applications, Wiley, New York, 1973, pp. 52-54.

21 Write a computer program to simulate the queue in Exercise 20. Have your program keep track of the proportion of the time that the queue length is $j$ for $j = 0, 1, \ldots, n$ and the average queue length. Show that the behavior of the queue length is very different depending upon whether the traffic intensity $s$ has the property $s < 1$, $s = 1$, or $s > 1$.

22 In the queueing problem of Exercise 20, let $S$ be the total service time required by a customer and $T$ the time between arrivals of the customers.

(a) Show that $P(S = j) = (1 - r)^{j-1} r$ and $P(T = j) = (1 - p)^{j-1} p$, for $j > 0$.

(b) Show that $E(S) = 1/r$ and $E(T) = 1/p$.

(c) Interpret the conditions $s < 1$, $s = 1$ and $s > 1$ in terms of these expected values.

23 In Exercise 20 the service time $S$ has a geometric distribution with $E(S) = 1/r$. Assume that the service time is, instead, a constant time of $t$ seconds. Modify your computer program of Exercise 21 so that it simulates a constant time service distribution.
Compare the average queue length for the two types of distributions when they have the same expected service time (i.e., take $t = 1/r$). Which distribution leads to the longer queues on the average?

24 A certain experiment is believed to be described by a two-state Markov chain with the transition matrix $\mathbf{P}$, where
\[ \mathbf{P} = \begin{pmatrix} .5 & .5 \\ p & 1 - p \end{pmatrix} \]
and the parameter $p$ is not known. When the experiment is performed many times, the chain ends in state one approximately 20 percent of the time and in state two approximately 80 percent of the time. Compute a sensible estimate for the unknown parameter $p$ and explain how you found it.

25 Prove that, in an $r$-state ergodic chain, it is possible to go from any state to any other state in at most $r - 1$ steps.

26 Let $\mathbf{P}$ be the transition matrix of an $r$-state ergodic chain. Prove that, if the diagonal entries $p_{ii}$ are positive, then the chain is regular.

27 Prove that if $\mathbf{P}$ is the transition matrix of an ergodic chain, then $(1/2)(\mathbf{I} + \mathbf{P})$ is the transition matrix of a regular chain. Hint: Use Exercise 26.

28 Prove that $\mathbf{P}$ and $(1/2)(\mathbf{I} + \mathbf{P})$ have the same fixed vectors.

29 In his book, Wahrscheinlichkeitsrechnung und Statistik,[15] A. Engle proposes an algorithm for finding the fixed vector for an ergodic Markov chain when the transition probabilities are rational numbers. Here is his algorithm: For each state $i$, let $a_i$ be the least common multiple of the denominators of the non-zero entries in the $i$th row. Engle describes his algorithm in terms of moving chips around on the states; indeed, for small examples, he recommends implementing the algorithm this way. Start by putting $a_i$ chips on state $i$ for all $i$. Then, at each state, redistribute the $a_i$ chips, sending $a_i p_{ij}$ to state $j$. The number of chips at state $i$ after this redistribution need not be a multiple of $a_i$. For each state $i$, add just enough chips to bring the number of chips at state $i$ up to a multiple of $a_i$. Then redistribute the chips in the same manner. This process will eventually reach a point where the number of chips at each state, after the redistribution, is the same as before redistribution. At this point, we have found a fixed vector. Here is an example:
\[ \mathbf{P} = \bordermatrix{ & 1 & 2 & 3 \cr 1 & 1/2 & 1/4 & 1/4 \cr 2 & 1/2 & 0 & 1/2 \cr 3 & 1/2 & 1/4 & 1/4 } . \]
We start with $\mathbf{a} = (4, 2, 4)$. The chips after successive redistributions are shown in Table 11.4.

(4 2 4)
(5 2 3)
(8 2 4)
(7 3 4)
(8 4 4)
(8 3 5)
(8 4 8)
(10 4 6)
(12 4 8)
(12 5 7)
(12 6 8)
(13 5 8)
(16 6 8)
(15 6 9)
(16 6 12)
(17 7 10)
(20 8 12)
(20 8 12)

Table 11.4: Distribution of chips.

We find that $\mathbf{a} = (20, 8, 12)$ is a fixed vector.

(a) Write a computer program to implement this algorithm.

(b) Prove that the algorithm will stop. Hint: Let $\mathbf{b}$ be a vector with integer components that is a fixed vector for $\mathbf{P}$ and such that each coordinate of the starting vector $\mathbf{a}$ is less than or equal to the corresponding component of $\mathbf{b}$. Show that, in the iteration, the components of the vectors are always increasing, and always less than or equal to the corresponding component of $\mathbf{b}$.

[15] A. Engle, Wahrscheinlichkeitsrechnung und Statistik, vol. 2 (Stuttgart: Klett Verlag, 1976).

30 (Coffman, Kaduta, and Shepp[16]) A computing center keeps information on a tape in positions of unit length. During each time unit there is one request to occupy a unit of tape. When this arrives the first free unit is used. Also, during each second, each of the units that are occupied is vacated with probability $p$. Simulate this process, starting with an empty tape.
Estimate the expected number of sites occupied for a given value of $p$. If $p$ is small, can you choose the tape long enough so that there is a small probability that a new job will have to be turned away (i.e., that all the sites are occupied)? Form a Markov chain with states the number of sites occupied. Modify the program FixedVector to compute the fixed vector. Use this to check your conjecture by simulation.

$^{16}$ E. G. Coffman, J. T. Kaduta, and L. A. Shepp, "On the Asymptotic Optimality of First-Storage Allocation," IEEE Trans. Software Engineering, vol. II (1985), pp. 235–239.

*31 (Alternate proof of Theorem 11.8) Let $\mathbf{P}$ be the transition matrix of an ergodic Markov chain. Let $\mathbf{x}$ be any column vector such that $\mathbf{P}\mathbf{x} = \mathbf{x}$. Let $M$ be the maximum value of the components of $\mathbf{x}$. Assume that $x_i = M$. Show that if $p_{ij} > 0$ then $x_j = M$. Use this to prove that $\mathbf{x}$ must be a constant vector.

32 Let $\mathbf{P}$ be the transition matrix of an ergodic Markov chain. Let $\mathbf{w}$ be a fixed probability vector (i.e., $\mathbf{w}$ is a row vector with $\mathbf{w}\mathbf{P} = \mathbf{w}$). Show that if $w_i = 0$ and $p_{ji} > 0$ then $w_j = 0$. Use this to show that the fixed probability vector for an ergodic chain cannot have any 0 entries.

33 Find a Markov chain that is neither absorbing nor ergodic.

11.4 Fundamental Limit Theorem for Regular Chains

The fundamental limit theorem for regular Markov chains states that if $\mathbf{P}$ is a regular transition matrix then
$$\lim_{n \to \infty} \mathbf{P}^n = \mathbf{W},$$
where $\mathbf{W}$ is a matrix with each row equal to the unique fixed probability row vector $\mathbf{w}$ for $\mathbf{P}$. In this section we shall give two very different proofs of this theorem.

Our first proof is carried out by showing that, for any column vector $\mathbf{y}$, $\mathbf{P}^n\mathbf{y}$ tends to a constant vector. As indicated in Section 11.3, this will show that $\mathbf{P}^n$ converges to a matrix with constant columns or, equivalently, to a matrix with all rows the same.

The following lemma says that if an $r$-by-$r$ transition matrix has no zero entries, and $\mathbf{y}$ is any column vector with $r$ entries, then the vector $\mathbf{P}\mathbf{y}$ has entries which are "closer together" than the entries are in $\mathbf{y}$.

Lemma 11.1 Let $\mathbf{P}$ be an $r$-by-$r$ transition matrix with no zero entries. Let $d$ be the smallest entry of the matrix. Let $\mathbf{y}$ be a column vector with $r$ components, the largest of which is $M_0$ and the smallest $m_0$. Let $M_1$ and $m_1$ be the largest and smallest component, respectively, of the vector $\mathbf{P}\mathbf{y}$. Then
$$M_1 - m_1 \le (1 - 2d)(M_0 - m_0).$$

Proof. In the discussion following Theorem 11.7, it was noted that each entry in the vector $\mathbf{P}\mathbf{y}$ is a weighted average of the entries in $\mathbf{y}$. The largest weighted average that could be obtained in the present case would occur if all but one of the entries of $\mathbf{y}$ have value $M_0$ and one entry has value $m_0$, and this one small entry is weighted by the smallest possible weight, namely $d$. In this case, the weighted average would equal
$$d m_0 + (1 - d) M_0.$$
Similarly, the smallest possible weighted average equals
$$d M_0 + (1 - d) m_0.$$
Thus,
$$M_1 - m_1 \le \bigl(d m_0 + (1 - d) M_0\bigr) - \bigl(d M_0 + (1 - d) m_0\bigr) = (1 - 2d)(M_0 - m_0).$$
This completes the proof of the lemma. □

We turn now to the proof of the fundamental limit theorem for regular Markov chains.

Theorem 11.13 (Fundamental Limit Theorem for Regular Chains) If $\mathbf{P}$ is the transition matrix for a regular Markov chain, then
$$\lim_{n \to \infty} \mathbf{P}^n = \mathbf{W},$$
where $\mathbf{W}$ is a matrix with all rows equal. Furthermore, all entries in $\mathbf{W}$ are strictly positive.

Proof. We prove this theorem for the special case that $\mathbf{P}$ has no 0 entries. The extension to the general case is indicated in Exercise 5. Let $\mathbf{y}$ be any $r$-component column vector, where $r$ is the number of states of the chain. We assume that $r > 1$, since otherwise the theorem is trivial.
Let $M_n$ and $m_n$ be, respectively, the maximum and minimum components of the vector $\mathbf{P}^n\mathbf{y}$. The vector $\mathbf{P}^n\mathbf{y}$ is obtained from the vector $\mathbf{P}^{n-1}\mathbf{y}$ by multiplying on the left by the matrix $\mathbf{P}$. Hence each component of $\mathbf{P}^n\mathbf{y}$ is an average of the components of $\mathbf{P}^{n-1}\mathbf{y}$. Thus
$$M_0 \ge M_1 \ge M_2 \ge \cdots$$
and
$$m_0 \le m_1 \le m_2 \le \cdots.$$
Each sequence is monotone and bounded:
$$m_0 \le m_n \le M_n \le M_0.$$
Hence, each of these sequences will have a limit as $n$ tends to infinity.

Let $M$ be the limit of $M_n$ and $m$ the limit of $m_n$. We know that $m \le M$. We shall prove that $M - m = 0$. This will be the case if $M_n - m_n$ tends to 0. Let $d$ be the smallest element of $\mathbf{P}$. Since all entries of $\mathbf{P}$ are strictly positive, we have $d > 0$. By our lemma
$$M_n - m_n \le (1 - 2d)(M_{n-1} - m_{n-1}).$$
From this we see that
$$M_n - m_n \le (1 - 2d)^n (M_0 - m_0).$$
Since $r \ge 2$, we must have $d \le 1/2$, so $0 \le 1 - 2d < 1$, so the difference $M_n - m_n$ tends to 0 as $n$ tends to infinity. Since every component of $\mathbf{P}^n\mathbf{y}$ lies between $m_n$ and $M_n$, each component must approach the same number $u = M = m$. This shows that
$$\lim_{n \to \infty} \mathbf{P}^n\mathbf{y} = \mathbf{u},$$
where $\mathbf{u}$ is a column vector all of whose components equal $u$.

Now let $\mathbf{y}$ be the vector with $j$th component equal to 1 and all other components equal to 0. Then $\mathbf{P}^n\mathbf{y}$ is the $j$th column of $\mathbf{P}^n$. Doing this for each $j$ proves that the columns of $\mathbf{P}^n$ approach constant column vectors. That is, the rows of $\mathbf{P}^n$ approach a common row vector $\mathbf{w}$, or,
$$\lim_{n \to \infty} \mathbf{P}^n = \mathbf{W}.$$

It remains to show that all entries in $\mathbf{W}$ are strictly positive. As before, let $\mathbf{y}$ be the vector with $j$th component equal to 1 and all other components equal to 0. Then $\mathbf{P}\mathbf{y}$ is the $j$th column of $\mathbf{P}$, and this column has all entries strictly positive. The minimum component of the vector $\mathbf{P}\mathbf{y}$ was defined to be $m_1$, hence $m_1 > 0$. Since $m_1 \le m$, we have $m > 0$. Note finally that this value of $m$ is just the $j$th component of $\mathbf{w}$, so all components of $\mathbf{w}$ are strictly positive.
□

Doeblin's Proof

We give now a very different proof of the main part of the fundamental limit theorem for regular Markov chains. This proof was first given by Doeblin,$^{17}$ a brilliant young mathematician who was killed in his twenties in the Second World War.

$^{17}$ W. Doeblin, "Exposé de la Théorie des Chaînes Simples Constantes de Markov à un Nombre Fini d'États," Rev. Math. de l'Union Interbalkanique, vol. 2 (1937), pp. 77–105.

Theorem 11.14 Let $\mathbf{P}$ be the transition matrix for a regular Markov chain with fixed vector $\mathbf{w}$. Then for any initial probability vector $\mathbf{u}$, $\mathbf{u}\mathbf{P}^n \to \mathbf{w}$ as $n \to \infty$.

Proof. Let $X_0, X_1, \ldots$ be a Markov chain with transition matrix $\mathbf{P}$ started in state $s_i$. Let $Y_0, Y_1, \ldots$ be a Markov chain with transition probability $\mathbf{P}$ started with initial probabilities given by $\mathbf{w}$. The $X$ and $Y$ processes are run independently of each other.

We consider also a third Markov chain $\mathbf{P}^*$ which consists of watching both the $X$ and $Y$ processes. The states for $\mathbf{P}^*$ are pairs $(s_i, s_j)$. The transition probabilities are given by
$$\mathbf{P}^*[(i,j),(k,l)] = \mathbf{P}(i,k) \cdot \mathbf{P}(j,l).$$
Since $\mathbf{P}$ is regular there is an $N$ such that $\mathbf{P}^N(i,j) > 0$ for all $i$ and $j$. Thus for the $\mathbf{P}^*$ chain it is also possible to go from any state $(s_i, s_j)$ to any other state $(s_k, s_l)$ in at most $N$ steps. That is, $\mathbf{P}^*$ is also a regular Markov chain.

We know that a regular Markov chain will reach any state in a finite time. Let $T$ be the first time the chain $\mathbf{P}^*$ is in a state of the form $(s_k, s_k)$. In other words, $T$ is the first time that the $X$ and the $Y$ processes are in the same state. Then we have shown that
$$P[T > n] \to 0 \quad \text{as } n \to \infty.$$
If we watch the $X$ and $Y$ processes after the first time they are in the same state we would not predict any difference in their long range behavior.
Since this will happen no matter how we started these two processes, it seems clear that the long range behavior should not depend upon the starting state. We now show that this is true.

We first note that if $n \ge T$, then since $X$ and $Y$ are both in the same state at time $T$,
$$P(X_n = j \mid n \ge T) = P(Y_n = j \mid n \ge T).$$
If we multiply both sides of this equation by $P(n \ge T)$, we obtain
$$P(X_n = j,\ n \ge T) = P(Y_n = j,\ n \ge T). \qquad (11.1)$$
We know that for all $n$,
$$P(Y_n = j) = w_j.$$
But
$$P(Y_n = j) = P(Y_n = j,\ n \ge T) + P(Y_n = j,\ n < T),$$
and the second summand on the right-hand side of this equation goes to 0 as $n$ goes to $\infty$, since $P(n < T)$ goes to 0 as $n$ goes to $\infty$. So,
$$P(Y_n = j,\ n \ge T) \to w_j,$$
as $n$ goes to $\infty$. From Equation 11.1, we see that
$$P(X_n = j,\ n \ge T) \to w_j,$$
as $n$ goes to $\infty$. But by similar reasoning to that used above, the difference between this last expression and $P(X_n = j)$ goes to 0 as $n$ goes to $\infty$. Therefore,
$$P(X_n = j) \to w_j,$$
as $n$ goes to $\infty$. This completes the proof. □

In the above proof, we have said nothing about the rate at which the distributions of the $X_n$'s approach the fixed distribution $\mathbf{w}$. In fact, it can be shown that$^{18}$
$$\sum_{j=1}^{r} \bigl| P(X_n = j) - w_j \bigr| \le 2 P(T > n).$$
The left-hand side of this inequality can be viewed as the distance between the distribution of the Markov chain after $n$ steps, starting in state $s_i$, and the limiting distribution $\mathbf{w}$.

Exercises

1 Define $\mathbf{P}$ and $\mathbf{y}$ by
$$\mathbf{P} = \begin{pmatrix} .5 & .5 \\ .25 & .75 \end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$
Compute $\mathbf{P}\mathbf{y}$, $\mathbf{P}^2\mathbf{y}$, and $\mathbf{P}^4\mathbf{y}$ and show that the results are approaching a constant vector. What is this vector?

2 Let $\mathbf{P}$ be a regular $r \times r$ transition matrix and $\mathbf{y}$ any $r$-component column vector. Show that the value of the limiting constant vector for $\mathbf{P}^n\mathbf{y}$ is $\mathbf{w}\mathbf{y}$.

3 Let
$$\mathbf{P} = \begin{pmatrix} 1 & 0 & 0 \\ .25 & 0 & .75 \\ 0 & 0 & 1 \end{pmatrix}$$
be a transition matrix of a Markov chain. Find two fixed vectors of $\mathbf{P}$ that are linearly independent.
Does this show that the Markov chain is not regular?

4 Describe the set of all fixed column vectors for the chain given in Exercise 3.

5 The theorem that $\mathbf{P}^n \to \mathbf{W}$ was proved only for the case that $\mathbf{P}$ has no zero entries. Fill in the details of the following extension to the case that $\mathbf{P}$ is regular. Since $\mathbf{P}$ is regular, for some $N$, $\mathbf{P}^N$ has no zeros. Thus, the proof given shows that $M_{nN} - m_{nN}$ approaches 0 as $n$ tends to infinity. However, the difference $M_n - m_n$ can never increase. (Why?) Hence, if we know that the differences obtained by looking at every $N$th time tend to 0, then the entire sequence must also tend to 0.

6 Let $\mathbf{P}$ be a regular transition matrix and let $\mathbf{w}$ be the unique nonzero fixed vector of $\mathbf{P}$. Show that no entry of $\mathbf{w}$ is 0.

$^{18}$ T. Lindvall, Lectures on the Coupling Method (New York: Wiley, 1992).

7 Here is a trick to try on your friends. Shuffle a deck of cards and deal them out one at a time. Count the face cards each as ten. Ask your friend to look at one of the first ten cards; if this card is a six, she is to look at the card that turns up six cards later; if this card is a three, she is to look at the card that turns up three cards later, and so forth. Eventually she will reach a point where she is to look at a card that turns up $x$ cards later but there are not $x$ cards left. You then tell her the last card that she looked at even though you did not know her starting point. You tell her you do this by watching her, and she cannot disguise the times that she looks at the cards. In fact you just do the same procedure and, even though you do not start at the same point as she does, you will most likely end at the same point. Why?

8 Write a program to play the game in Exercise 7.
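One way to get started on the game of Exercise 7 is sketched below in Python. It is only an illustration under stated assumptions: the names `count_value`, `last_card`, and `agreement_rate` are hypothetical, an ace is assumed to count as one (the exercise only fixes the value of face cards), and "you" are assumed to start from the first card dealt.

```python
import random

def count_value(rank):
    """Counting value of a card: ranks 1..10 count as themselves,
    face cards (ranks 11, 12, 13) count as ten. Ace = 1 is an assumption."""
    return min(rank, 10)

def last_card(deck, start):
    """Position of the last card looked at when starting at index `start`
    and repeatedly jumping ahead by the value of the current card."""
    pos = start
    while pos + count_value(deck[pos]) < len(deck):
        pos += count_value(deck[pos])
    return pos

def agreement_rate(trials=10000, seed=0):
    """Fraction of shuffles in which both players end on the same card."""
    rng = random.Random(seed)
    deck = [rank for rank in range(1, 14) for _ in range(4)]  # 52 cards
    agree = 0
    for _ in range(trials):
        rng.shuffle(deck)
        friend = last_card(deck, rng.randrange(10))  # secret start among first ten
        you = last_card(deck, 0)                     # you start at the first card
        agree += (friend == you)
    return agree / trials
```

Calling `agreement_rate()` estimates how often the two paths end on the same card; once the two paths land on a common card they stay together, which is the coupling idea behind the trick.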
9 (Suggested by Peter Doyle) In the proof of Theorem 11.14, we assumed the existence of a fixed vector $\mathbf{w}$. To avoid this assumption, beef up the coupling argument to show (without assuming the existence of a stationary distribution $\mathbf{w}$) that for appropriate constants $C$ and $r < 1$, the distance between $\alpha\mathbf{P}^n$ and $\beta\mathbf{P}^n$ is at most $C r^n$ for any starting distributions $\alpha$ and $\beta$. Apply this in the case where $\beta = \alpha\mathbf{P}$ to conclude that the sequence $\alpha\mathbf{P}^n$ is a Cauchy sequence, and that its limit is a matrix $\mathbf{W}$ whose rows are all equal to a probability vector $\mathbf{w}$ with $\mathbf{w}\mathbf{P} = \mathbf{w}$. Note that the distance between $\alpha\mathbf{P}^n$ and $\mathbf{w}$ is at most $C r^n$, so in freeing ourselves from the assumption about having a fixed vector we've proved that the convergence to equilibrium takes place exponentially fast.

11.5 Mean First Passage Time for Ergodic Chains

In this section we consider two closely related descriptive quantities of interest for ergodic chains: the mean time to return to a state and the mean time to go from one state to another state.

Let $\mathbf{P}$ be the transition matrix of an ergodic chain with states $s_1, s_2, \ldots, s_r$. Let $\mathbf{w} = (w_1, w_2, \ldots, w_r)$ be the unique probability vector such that $\mathbf{w}\mathbf{P} = \mathbf{w}$. Then, by the Law of Large Numbers for Markov chains, in the long run the process will spend a fraction $w_j$ of the time in state $s_j$. Thus, if we start in any state, the chain will eventually reach state $s_j$; in fact, it will be in state $s_j$ infinitely often.

Another way to see this is the following: Form a new Markov chain by making $s_j$ an absorbing state, that is, define $p_{jj} = 1$. If we start at any state other than $s_j$, this new process will behave exactly like the original chain up to the first time that state $s_j$ is reached. Since the original chain was an ergodic chain, it was possible to reach $s_j$ from any other state. Thus the new chain is an absorbing chain with a single absorbing state $s_j$ that will eventually be reached.
So if we start the original chain at a state $s_i$ with $i \ne j$, we will eventually reach the state $s_j$.

Let $\mathbf{N}$ be the fundamental matrix for the new chain. The entries of $\mathbf{N}$ give the expected number of times in each state before absorption. In terms of the original chain, these quantities give the expected number of times in each of the states before reaching state $s_j$ for the first time. The $i$th component of the vector $\mathbf{N}\mathbf{c}$ gives the expected number of steps before absorption in the new chain, starting in state $s_i$. In terms of the old chain, this is the expected number of steps required to reach state $s_j$ for the first time starting at state $s_i$.

[Figure 11.5: The maze problem, with compartments numbered 1 through 9.]

Mean First Passage Time

Definition 11.7 If an ergodic Markov chain is started in state $s_i$, the expected number of steps to reach state $s_j$ for the first time is called the mean first passage time from $s_i$ to $s_j$. It is denoted by $m_{ij}$. By convention $m_{ii} = 0$. □

Example 11.24 Let us return to the maze example (Example 11.22). We shall make this ergodic chain into an absorbing chain by making state 5 an absorbing state. For example, we might assume that food is placed in the center of the maze and once the rat finds the food, he stays to enjoy it (see Figure 11.5). The new transition matrix in canonical form is
$$\mathbf{P} = \begin{array}{c|cccccccc|c}
  & 1 & 2 & 3 & 4 & 6 & 7 & 8 & 9 & 5 \\ \hline
1 & 0 & 1/2 & 0 & 0 & 1/2 & 0 & 0 & 0 & 0 \\
2 & 1/3 & 0 & 1/3 & 0 & 0 & 0 & 0 & 0 & 1/3 \\
3 & 0 & 1/2 & 0 & 1/2 & 0 & 0 & 0 & 0 & 0 \\
4 & 0 & 0 & 1/3 & 0 & 0 & 0 & 0 & 1/3 & 1/3 \\
6 & 1/3 & 0 & 0 & 0 & 0 & 1/3 & 0 & 0 & 1/3 \\
7 & 0 & 0 & 0 & 0 & 1/2 & 0 & 1/2 & 0 & 0 \\
8 & 0 & 0 & 0 & 0 & 0 & 1/3 & 0 & 1/3 & 1/3 \\
9 & 0 & 0 & 0 & 1/2 & 0 & 0 & 1/2 & 0 & 0 \\ \hline
5 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\ .$$
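The fundamental matrix for this absorbing chain can be computed exactly with a few lines of Python. The sketch below is not the book's program: the helper `inverse` and the `neighbors` table are hypothetical names, and it assumes the rat moves to each adjacent compartment with equal probability, with compartment 4 adjacent to 3, 5, 9 and compartment 6 adjacent to 1, 5, 7.

```python
from fractions import Fraction as F

def inverse(a):
    """Invert a square matrix of Fractions by Gauss-Jordan elimination."""
    n = len(a)
    # Augment the matrix with the identity.
    aug = [list(row) + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Swap a row with a nonzero pivot into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Clear the pivot column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Transient states in canonical order; state 5 (the food) is absorbing.
states = [1, 2, 3, 4, 6, 7, 8, 9]
neighbors = {1: [2, 6], 2: [1, 3, 5], 3: [2, 4], 4: [3, 5, 9],
             6: [1, 5, 7], 7: [6, 8], 8: [5, 7, 9], 9: [4, 8]}

# Q: transition probabilities among the transient states only.
Q = [[F(1, len(neighbors[s])) if t in neighbors[s] else F(0) for t in states]
     for s in states]
I_minus_Q = [[F(int(i == j)) - Q[i][j] for j in range(8)] for i in range(8)]

N = inverse(I_minus_Q)          # fundamental matrix N = (I - Q)^(-1)
Nc = [sum(row) for row in N]    # expected number of steps until state 5 is reached
```

Exact fractions are used so that the entries of `N` come out as multiples of 1/8 rather than rounded decimals.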
If we compute the fundamental matrix $\mathbf{N}$, we obtain
$$\mathbf{N} = \frac{1}{8}\begin{pmatrix}
14 & 9 & 4 & 3 & 9 & 4 & 3 & 2 \\
6 & 14 & 6 & 4 & 4 & 2 & 2 & 2 \\
4 & 9 & 14 & 9 & 3 & 2 & 3 & 4 \\
2 & 4 & 6 & 14 & 2 & 2 & 4 & 6 \\
6 & 4 & 2 & 2 & 14 & 6 & 4 & 2 \\
4 & 3 & 2 & 3 & 9 & 14 & 9 & 4 \\
2 & 2 & 2 & 4 & 4 & 6 & 14 & 6 \\
2 & 3 & 4 & 9 & 3 & 4 & 9 & 14
\end{pmatrix}.$$
The expected time to absorption for different starting states is given by the vector $\mathbf{N}\mathbf{c}$, where
$$\mathbf{N}\mathbf{c} = \begin{pmatrix} 6 \\ 5 \\ 6 \\ 5 \\ 5 \\ 6 \\ 5 \\ 6 \end{pmatrix}.$$
We see that, starting from compartment 1, it will take on the average six steps to reach food. It is clear from symmetry that we should get the same answer for starting at state 3, 7, or 9. It is also clear that it should take one more step, starting at one of these states, than it would starting at 2, 4, 6, or 8. Some of the results obtained from $\mathbf{N}$ are not so obvious. For instance, we note that the expected number of times in the starting state is 14/8 regardless of the state in which we start. □

Mean Recurrence Time

A quantity that is closely related to the mean first passage time is the mean recurrence time, defined as follows. Assume that we start in state $s_i$; consider the length of time before we return to $s_i$ for the first time. It is clear that we must return, since we either stay at $s_i$ the first step or go to some other state $s_j$, and from any other state $s_j$, we will eventually reach $s_i$ because the chain is ergodic.

Definition 11.8 If an ergodic Markov chain is started in state $s_i$, the expected number of steps to return to $s_i$ for the first time is the mean recurrence time for $s_i$. It is denoted by $r_i$. □

We need to develop some basic properties of the mean first passage time. Consider the mean first passage time from $s_i$ to $s_j$; assume that $i \ne j$. This may be computed as follows: take the expected number of steps required given the outcome of the first step, multiply by the probability that this outcome occurs, and add. If the first step is to $s_j$, the expected number of steps required is 1; if it is to some
other state $s_k$, the expected number of steps required is $m_{kj}$ plus 1 for the step already taken. Thus,
$$m_{ij} = p_{ij} + \sum_{k \ne j} p_{ik}(m_{kj} + 1),$$
or, since $\sum_k p_{ik} = 1$,
$$m_{ij} = 1 + \sum_{k \ne j} p_{ik} m_{kj}. \qquad (11.2)$$
Similarly, starting in $s_i$, it must take at least one step to return. Considering all possible first steps gives us
$$r_i = \sum_k p_{ik}(m_{ki} + 1) \qquad (11.3)$$
$$\phantom{r_i} = 1 + \sum_k p_{ik} m_{ki}. \qquad (11.4)$$

Mean First Passage Matrix and Mean Recurrence Matrix

Let us now define two matrices $\mathbf{M}$ and $\mathbf{D}$. The $ij$th entry $m_{ij}$ of $\mathbf{M}$ is the mean first passage time to go from $s_i$ to $s_j$ if $i \ne j$; the diagonal entries are 0. The matrix $\mathbf{M}$ is called the mean first passage matrix. The matrix $\mathbf{D}$ is the matrix with all entries 0 except the diagonal entries $d_{ii} = r_i$. The matrix $\mathbf{D}$ is called the mean recurrence matrix. Let $\mathbf{C}$ be an $r \times r$ matrix with all entries 1. Using Equation 11.2 for the case $i \ne j$ and Equation 11.4 for the case $i = j$, we obtain the matrix equation
$$\mathbf{M} = \mathbf{P}\mathbf{M} + \mathbf{C} - \mathbf{D}, \qquad (11.5)$$
or
$$(\mathbf{I} - \mathbf{P})\mathbf{M} = \mathbf{C} - \mathbf{D}. \qquad (11.6)$$
Equation 11.6 with $m_{ii} = 0$ implies Equations 11.2 and 11.4. We are now in a position to prove our first basic theorem.

Theorem 11.15 For an ergodic Markov chain, the mean recurrence time for state $s_i$ is $r_i = 1/w_i$, where $w_i$ is the $i$th component of the fixed probability vector for the transition matrix.

Proof. Multiplying both sides of Equation 11.6 by $\mathbf{w}$ and using the fact that
$$\mathbf{w}(\mathbf{I} - \mathbf{P}) = \mathbf{0}$$
gives
$$\mathbf{w}\mathbf{C} - \mathbf{w}\mathbf{D} = \mathbf{0}.$$
Here $\mathbf{w}\mathbf{C}$ is a row vector with all entries 1 and $\mathbf{w}\mathbf{D}$ is a row vector with $i$th entry $w_i r_i$. Thus
$$(1, 1, \ldots, 1) = (w_1 r_1, w_2 r_2, \ldots, w_n r_n)$$
and
$$r_i = 1/w_i,$$
as was to be proved. □

Corollary 11.1 For an ergodic Markov chain, the components of the fixed probability vector $\mathbf{w}$ are strictly positive.

Proof. We know that the values of $r_i$ are finite and so $w_i = 1/r_i$ cannot be 0.
□

Example 11.25 In Example 11.22 we found the fixed probability vector for the maze example to be
$$\mathbf{w} = \left( \tfrac{1}{12},\ \tfrac{1}{8},\ \tfrac{1}{12},\ \tfrac{1}{8},\ \tfrac{1}{6},\ \tfrac{1}{8},\ \tfrac{1}{12},\ \tfrac{1}{8},\ \tfrac{1}{12} \right).$$
Hence, the mean recurrence times are given by the reciprocals of these probabilities. That is,
$$\mathbf{r} = (12,\ 8,\ 12,\ 8,\ 6,\ 8,\ 12,\ 8,\ 12).$$
□

Returning to the Land of Oz, we found that the weather in the Land of Oz could be represented by a Markov chain with states rain, nice, and snow. In Section 11.3 we found that the limiting vector was $\mathbf{w} = (2/5, 1/5, 2/5)$. From this we see that the mean number of days between rainy days is 5/2, between nice days is 5, and between snowy days is 5/2.

Fundamental Matrix

We shall now develop a fundamental matrix for ergodic chains that will play a role similar to that of the fundamental matrix $\mathbf{N} = (\mathbf{I} - \mathbf{Q})^{-1}$ for absorbing chains. As was the case with absorbing chains, the fundamental matrix can be used to find a number of interesting quantities involving ergodic chains. Using this matrix, we will give a method for calculating the mean first passage times for ergodic chains that is easier to use than the method given above. In addition, we will state (but not prove) the Central Limit Theorem for Markov Chains, the statement of which uses the fundamental matrix.

We begin by considering the case that $\mathbf{P}$ is the transition matrix of a regular Markov chain. Since there are no absorbing states, we might be tempted to try $\mathbf{Z} = (\mathbf{I} - \mathbf{P})^{-1}$ for a fundamental matrix. But $\mathbf{I} - \mathbf{P}$ does not have an inverse. To see this, recall that a matrix $\mathbf{R}$ has an inverse if and only if $\mathbf{R}\mathbf{x} = \mathbf{0}$ implies $\mathbf{x} = \mathbf{0}$. But since $\mathbf{P}\mathbf{c} = \mathbf{c}$ we have $(\mathbf{I} - \mathbf{P})\mathbf{c} = \mathbf{0}$, and so $\mathbf{I} - \mathbf{P}$ does not have an inverse.
We recall that if we have an absorbing Markov chain, and $\mathbf{Q}$ is the restriction of the transition matrix to the set of transient states, then the fundamental matrix $\mathbf{N}$ could be written as
$$\mathbf{N} = \mathbf{I} + \mathbf{Q} + \mathbf{Q}^2 + \cdots.$$
The reason that this power series converges is that $\mathbf{Q}^n \to \mathbf{0}$, so this series acts like a convergent geometric series.

This idea might prompt one to try to find a similar series for regular chains. Since we know that $\mathbf{P}^n \to \mathbf{W}$, we might consider the series
$$\mathbf{I} + (\mathbf{P} - \mathbf{W}) + (\mathbf{P}^2 - \mathbf{W}) + \cdots. \qquad (11.7)$$
We now use special properties of $\mathbf{P}$ and $\mathbf{W}$ to rewrite this series. The special properties are: 1) $\mathbf{P}\mathbf{W} = \mathbf{W}$, and 2) $\mathbf{W}^k = \mathbf{W}$ for all positive integers $k$. These facts are easy to verify and are left as an exercise (see Exercise 22). Using these facts, we see that
$$\begin{aligned}
(\mathbf{P} - \mathbf{W})^n &= \sum_{i=0}^{n} (-1)^i \binom{n}{i} \mathbf{P}^{n-i}\mathbf{W}^i \\
&= \mathbf{P}^n + \sum_{i=1}^{n} (-1)^i \binom{n}{i} \mathbf{W}^i \\
&= \mathbf{P}^n + \sum_{i=1}^{n} (-1)^i \binom{n}{i} \mathbf{W} \\
&= \mathbf{P}^n + \left( \sum_{i=1}^{n} (-1)^i \binom{n}{i} \right) \mathbf{W}.
\end{aligned}$$
If we expand the expression $(1 - 1)^n$ using the Binomial Theorem, we obtain the expression in parentheses above, except that we have an extra term (which equals 1). Since $(1 - 1)^n = 0$, we see that the above expression equals $-1$. So we have
$$(\mathbf{P} - \mathbf{W})^n = \mathbf{P}^n - \mathbf{W},$$
for all $n \ge 1$.

We can now rewrite the series in 11.7 as
$$\mathbf{I} + (\mathbf{P} - \mathbf{W}) + (\mathbf{P} - \mathbf{W})^2 + \cdots.$$
Since the $n$th term in this series is equal to $\mathbf{P}^n - \mathbf{W}$, the $n$th term goes to $\mathbf{0}$ as $n$ goes to infinity. This is sufficient to show that this series converges, and sums to the inverse of the matrix $\mathbf{I} - \mathbf{P} + \mathbf{W}$. We call this inverse the fundamental matrix associated with the chain, and we denote it by $\mathbf{Z}$.

In the case that the chain is ergodic, but not regular, it is not true that $\mathbf{P}^n \to \mathbf{W}$ as $n \to \infty$. Nevertheless, the matrix $\mathbf{I} - \mathbf{P} + \mathbf{W}$ still has an inverse, as we will now show.

Proposition 11.1 Let $\mathbf{P}$ be the transition matrix of an ergodic chain, and let $\mathbf{W}$ be the matrix all of whose rows are the fixed probability row vector for $\mathbf{P}$. Then the matrix
$$\mathbf{I} - \mathbf{P} + \mathbf{W}$$
has an inverse.
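Proof. Let $\mathbf{x}$ be a column vector such that
$$(\mathbf{I} - \mathbf{P} + \mathbf{W})\mathbf{x} = \mathbf{0}.$$
To prove the proposition, it is sufficient to show that $\mathbf{x}$ must be the zero vector. Multiplying this equation by $\mathbf{w}$ and using the fact that $\mathbf{w}(\mathbf{I} - \mathbf{P}) = \mathbf{0}$ and $\mathbf{w}\mathbf{W} = \mathbf{w}$, we have
$$\mathbf{w}(\mathbf{I} - \mathbf{P} + \mathbf{W})\mathbf{x} = \mathbf{w}\mathbf{x} = 0.$$
Since every row of $\mathbf{W}$ equals $\mathbf{w}$, this gives $\mathbf{W}\mathbf{x} = \mathbf{0}$. Therefore,
$$(\mathbf{I} - \mathbf{P})\mathbf{x} = \mathbf{0}.$$
But this means that $\mathbf{x} = \mathbf{P}\mathbf{x}$ is a fixed column vector for $\mathbf{P}$. By Theorem 11.10, this can only happen if $\mathbf{x}$ is a constant vector. Since $\mathbf{w}\mathbf{x} = 0$, and $\mathbf{w}$ has strictly positive entries, we see that $\mathbf{x} = \mathbf{0}$. This completes the proof. □

As in the regular case, we will call the inverse of the matrix $\mathbf{I} - \mathbf{P} + \mathbf{W}$ the fundamental matrix for the ergodic chain with transition matrix $\mathbf{P}$, and we will use $\mathbf{Z}$ to denote this fundamental matrix.

Example 11.26 Let $\mathbf{P}$ be the transition matrix for the weather in the Land of Oz. Then
$$\begin{aligned}
\mathbf{I} - \mathbf{P} + \mathbf{W} &= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 1/2 & 1/4 & 1/4 \\ 1/2 & 0 & 1/2 \\ 1/4 & 1/4 & 1/2 \end{pmatrix} + \begin{pmatrix} 2/5 & 1/5 & 2/5 \\ 2/5 & 1/5 & 2/5 \\ 2/5 & 1/5 & 2/5 \end{pmatrix} \\
&= \begin{pmatrix} 9/10 & -1/20 & 3/20 \\ -1/10 & 6/5 & -1/10 \\ 3/20 & -1/20 & 9/10 \end{pmatrix},
\end{aligned}$$
so
$$\mathbf{Z} = (\mathbf{I} - \mathbf{P} + \mathbf{W})^{-1} = \begin{pmatrix} 86/75 & 1/25 & -14/75 \\ 2/25 & 21/25 & 2/25 \\ -14/75 & 1/25 & 86/75 \end{pmatrix}.$$
□

Using the Fundamental Matrix to Calculate the Mean First Passage Matrix

We shall show how one can obtain the mean first passage matrix $\mathbf{M}$ from the fundamental matrix $\mathbf{Z}$ for an ergodic Markov chain. Before stating the theorem which gives the first passage times, we need a few facts about $\mathbf{Z}$.

Lemma 11.2 Let $\mathbf{Z} = (\mathbf{I} - \mathbf{P} + \mathbf{W})^{-1}$, and let $\mathbf{c}$ be a column vector of all 1's. Then
$$\mathbf{Z}\mathbf{c} = \mathbf{c}, \qquad \mathbf{w}\mathbf{Z} = \mathbf{w},$$
and
$$\mathbf{Z}(\mathbf{I} - \mathbf{P}) = \mathbf{I} - \mathbf{W}.$$

Proof. Since $\mathbf{P}\mathbf{c} = \mathbf{c}$ and $\mathbf{W}\mathbf{c} = \mathbf{c}$,
$$\mathbf{c} = (\mathbf{I} - \mathbf{P} + \mathbf{W})\mathbf{c}.$$
If we multiply both sides of this equation on the left by $\mathbf{Z}$, we obtain
$$\mathbf{Z}\mathbf{c} = \mathbf{c}.$$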
Similarly, since $\mathbf{w}\mathbf{P} = \mathbf{w}$ and $\mathbf{w}\mathbf{W} = \mathbf{w}$,
$$\mathbf{w} = \mathbf{w}(\mathbf{I} - \mathbf{P} + \mathbf{W}).$$
If we multiply both sides of this equation on the right by $\mathbf{Z}$, we obtain
$$\mathbf{w}\mathbf{Z} = \mathbf{w}.$$
Finally, we have
$$(\mathbf{I} - \mathbf{P} + \mathbf{W})(\mathbf{I} - \mathbf{W}) = \mathbf{I} - \mathbf{W} - \mathbf{P} + \mathbf{W} + \mathbf{W} - \mathbf{W} = \mathbf{I} - \mathbf{P}.$$
Multiplying on the left by $\mathbf{Z}$, we obtain
$$\mathbf{I} - \mathbf{W} = \mathbf{Z}(\mathbf{I} - \mathbf{P}).$$
This completes the proof. □

The following theorem shows how one can obtain the mean first passage times from the fundamental matrix.

Theorem 11.16 The mean first passage matrix $\mathbf{M}$ for an ergodic chain is determined from the fundamental matrix $\mathbf{Z}$ and the fixed row probability vector $\mathbf{w}$ by
$$m_{ij} = \frac{z_{jj} - z_{ij}}{w_j}.$$

Proof. We showed in Equation 11.6 that
$$(\mathbf{I} - \mathbf{P})\mathbf{M} = \mathbf{C} - \mathbf{D}.$$
Thus,
$$\mathbf{Z}(\mathbf{I} - \mathbf{P})\mathbf{M} = \mathbf{Z}\mathbf{C} - \mathbf{Z}\mathbf{D},$$
and from Lemma 11.2,
$$\mathbf{Z}(\mathbf{I} - \mathbf{P})\mathbf{M} = \mathbf{C} - \mathbf{Z}\mathbf{D}.$$
Again using Lemma 11.2, we have
$$\mathbf{M} - \mathbf{W}\mathbf{M} = \mathbf{C} - \mathbf{Z}\mathbf{D}$$
or
$$\mathbf{M} = \mathbf{C} - \mathbf{Z}\mathbf{D} + \mathbf{W}\mathbf{M}.$$
From this equation, we see that
$$m_{ij} = 1 - z_{ij} r_j + (\mathbf{w}\mathbf{M})_j. \qquad (11.8)$$
But $m_{jj} = 0$, and so
$$0 = 1 - z_{jj} r_j + (\mathbf{w}\mathbf{M})_j,$$
or
$$(\mathbf{w}\mathbf{M})_j = z_{jj} r_j - 1. \qquad (11.9)$$
From Equations 11.8 and 11.9, we have
$$m_{ij} = (z_{jj} - z_{ij}) \cdot r_j.$$
Since $r_j = 1/w_j$,
$$m_{ij} = \frac{z_{jj} - z_{ij}}{w_j}.$$
□

Example 11.27 (Example 11.26 continued) In the Land of Oz example, we find that
$$\mathbf{Z} = (\mathbf{I} - \mathbf{P} + \mathbf{W})^{-1} = \begin{pmatrix} 86/75 & 1/25 & -14/75 \\ 2/25 & 21/25 & 2/25 \\ -14/75 & 1/25 & 86/75 \end{pmatrix}.$$
We have also seen that $\mathbf{w} = (2/5, 1/5, 2/5)$. So, for example,
$$m_{12} = \frac{z_{22} - z_{12}}{w_2} = \frac{21/25 - 1/25}{1/5} = 4,$$
by Theorem 11.16. Carrying out the calculations for the other entries of $\mathbf{M}$, we obtain
$$\mathbf{M} = \begin{pmatrix} 0 & 4 & 10/3 \\ 8/3 & 0 & 8/3 \\ 10/3 & 4 & 0 \end{pmatrix}.$$
□

Computation

The program ErgodicChain calculates the fundamental matrix, the fixed vector, the mean recurrence matrix $\mathbf{D}$, and the mean first passage matrix $\mathbf{M}$. We have run the program for the Ehrenfest urn model (Example 11.8).
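A program of this kind might be organized along the following lines. This is only a minimal Python sketch, not the ErgodicChain program itself: the names `inverse`, `ergodic_chain`, and `ehrenfest` are hypothetical, exact rational arithmetic from the standard `fractions` module stands in for decimals, and the fixed vector is obtained from the identity $\mathbf{w}(\mathbf{I} - \mathbf{P} + \mathbf{E}) = (1, \ldots, 1)$, with $\mathbf{E}$ the all-ones matrix, which is a choice made for this sketch.

```python
from fractions import Fraction as F

def inverse(a):
    """Invert a square matrix of Fractions by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(row) + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ergodic_chain(P):
    """Return (w, Z, r, M) for an ergodic transition matrix P of Fractions."""
    n = len(P)
    ident = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    # w (I - P + E) = (1, ..., 1), so w is the vector of column sums
    # of the inverse of I - P + E (E = all-ones matrix).
    A_inv = inverse([[ident[i][j] - P[i][j] + 1 for j in range(n)]
                     for i in range(n)])
    w = [sum(A_inv[i][j] for i in range(n)) for j in range(n)]
    # Fundamental matrix Z = (I - P + W)^(-1); every row of W equals w.
    Z = inverse([[ident[i][j] - P[i][j] + w[j] for j in range(n)]
                 for i in range(n)])
    r = [1 / wj for wj in w]                       # Theorem 11.15
    M = [[(Z[j][j] - Z[i][j]) / w[j] for j in range(n)]
         for i in range(n)]                        # Theorem 11.16
    return w, Z, r, M

def ehrenfest(balls):
    """Transition matrix of the Ehrenfest urn model with `balls` balls."""
    P = [[F(0)] * (balls + 1) for _ in range(balls + 1)]
    for i in range(balls + 1):
        if i > 0:
            P[i][i - 1] = F(i, balls)
        if i < balls:
            P[i][i + 1] = F(balls - i, balls)
    return P

w, Z, r, M = ergodic_chain(ehrenfest(4))
```

Run on the four-ball Ehrenfest chain, the sketch returns exact fractions (for instance 8/3 and 56/3) that can be compared with four-place decimal output.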
We obtain:
$$\mathbf{P} = \begin{array}{c|ccccc}
  & 0 & 1 & 2 & 3 & 4 \\ \hline
0 & .0000 & 1.0000 & .0000 & .0000 & .0000 \\
1 & .2500 & .0000 & .7500 & .0000 & .0000 \\
2 & .0000 & .5000 & .0000 & .5000 & .0000 \\
3 & .0000 & .0000 & .7500 & .0000 & .2500 \\
4 & .0000 & .0000 & .0000 & 1.0000 & .0000
\end{array}\ ;$$
$$\mathbf{w} = \begin{array}{ccccc}
0 & 1 & 2 & 3 & 4 \\ \hline
.0625 & .2500 & .3750 & .2500 & .0625
\end{array}\ ;$$
$$\mathbf{r} = \begin{array}{ccccc}
0 & 1 & 2 & 3 & 4 \\ \hline
16.0000 & 4.0000 & 2.6667 & 4.0000 & 16.0000
\end{array}\ ;$$
$$\mathbf{M} = \begin{array}{c|ccccc}
  & 0 & 1 & 2 & 3 & 4 \\ \hline
0 & .0000 & 1.0000 & 2.6667 & 6.3333 & 21.3333 \\
1 & 15.0000 & .0000 & 1.6667 & 5.3333 & 20.3333 \\
2 & 18.6667 & 3.6667 & .0000 & 3.6667 & 18.6667 \\
3 & 20.3333 & 5.3333 & 1.6667 & .0000 & 15.0000 \\
4 & 21.3333 & 6.3333 & 2.6667 & 1.0000 & .0000
\end{array}\ .$$

From the mean first passage matrix, we see that the mean time to go from 0 balls in urn 1 to 2 balls in urn 1 is 2.6667 steps while the mean time to go from 2 balls in urn 1 to 0 balls in urn 1 is 18.6667. This reflects the fact that the model exhibits a central tendency. Of course, the physicist is interested in the case of a large number of molecules, or balls, and so we should consider this example for $n$ so large that we cannot compute it even with a computer.

Ehrenfest Model

Example 11.28 (Example 11.23 continued) Let us consider the Ehrenfest model (see Example 11.8) for gas diffusion for the general case of $2n$ balls. Every second, one of the $2n$ balls is chosen at random and moved from the urn it was in to the other urn. If there are $i$ balls in the first urn, then with probability $i/2n$ we take one of them out and put it in the second urn, and with probability $(2n - i)/2n$ we take a ball from the second urn and put it in the first urn. At each second we let the number $i$ of balls in the first urn be the state of the system. Then from state $i$ we can pass only to state $i - 1$ and $i + 1$, and the transition probabilities are given by
$$p_{ij} = \begin{cases} \dfrac{i}{2n}, & \text{if } j = i - 1, \\[1ex] 1 - \dfrac{i}{2n}, & \text{if } j = i + 1, \\[1ex] 0, & \text{otherwise.} \end{cases}$$
This defines the transition matrix of an ergodic, nonregular Markov chain (see Exercise 15).
Here the physicist is interested in long-term predictions about the state occupied. In Example 11.23, we gave an intuitive reason for expecting that the fixed vector $\mathbf{w}$ is the binomial distribution with parameters $2n$ and $1/2$. It is easy to check that this is correct. So,
$$w_i = \frac{\binom{2n}{i}}{2^{2n}}.$$
Thus the mean recurrence time for state $i$ is
$$r_i = \frac{2^{2n}}{\binom{2n}{i}}.$$

[Figure 11.6: Ehrenfest simulation. Two panels, "Time forward" and "Time reversed", each with a horizontal axis running from 0 to 1000 and a vertical axis running from 40 to 65.]

Consider in particular the central term $i = n$. We have seen that this term is approximately $1/\sqrt{\pi n}$. Thus we may approximate $r_n$ by $\sqrt{\pi n}$.

This model was used to explain the concept of reversibility in physical systems. Assume that we let our system run until it is in equilibrium. At this point, a movie is made, showing the system's progress. The movie is then shown to you, and you are asked to tell if the movie was shown in the forward or the reverse direction. It would seem that there should always be a tendency to move toward an equal proportion of balls so that the correct order of time should be the one with the most transitions from $i$ to $i - 1$ if $i > n$ and $i$ to $i + 1$ if $i < n$.

In Figure 11.6 we show the results of simulating the Ehrenfest urn model for the case of $n = 50$ and 1000 time units, using the program EhrenfestUrn. The top graph shows these results graphed in the order in which they occurred and the bottom graph shows the same results but with time reversed. There is no apparent difference. We note that if we had not started in equilibrium, the two graphs would typically look quite different. □

Reversibility

If the Ehrenfest model is started in equilibrium, then the process has no apparent time direction. The reason for this is that this process has a property called reversibility.
Define $X_n$ to be the number of balls in the left urn at step $n$. We can calculate, for a general ergodic chain, the reverse transition probability:
$$\begin{aligned}
P(X_{n-1} = j \mid X_n = i) &= \frac{P(X_{n-1} = j,\ X_n = i)}{P(X_n = i)} \\
&= \frac{P(X_{n-1} = j)\, P(X_n = i \mid X_{n-1} = j)}{P(X_n = i)} \\
&= \frac{P(X_{n-1} = j)\, p_{ji}}{P(X_n = i)}.
\end{aligned}$$
In general, this will depend upon $n$, since $P(X_n = j)$ and also $P(X_{n-1} = j)$ change with $n$. However, if we start with the vector $\mathbf{w}$ or wait until equilibrium is reached, this will not be the case. Then we can define
$$p^*_{ij} = \frac{w_j p_{ji}}{w_i}$$
as a transition matrix for the process watched with time reversed.

Let us calculate a typical transition probability for the reverse chain $\mathbf{P}^* = \{p^*_{ij}\}$ in the Ehrenfest model. For example,
$$\begin{aligned}
p^*_{i,i-1} &= \frac{w_{i-1}\, p_{i-1,i}}{w_i} = \frac{\binom{2n}{i-1}}{2^{2n}} \times \frac{2n - i + 1}{2n} \times \frac{2^{2n}}{\binom{2n}{i}} \\
&= \frac{(2n)!}{(i-1)!\,(2n - i + 1)!} \times \frac{(2n - i + 1)\, i!\, (2n - i)!}{2n\, (2n)!} \\
&= \frac{i}{2n} = p_{i,i-1}.
\end{aligned}$$
Similar calculations for the other transition probabilities show that $\mathbf{P}^* = \mathbf{P}$. When this occurs the process is called reversible. Clearly, an ergodic chain is reversible if, and only if, for every pair of states $s_i$ and $s_j$, $w_i p_{ij} = w_j p_{ji}$. In particular, for the Ehrenfest model this means that $w_i p_{i,i-1} = w_{i-1} p_{i-1,i}$. Thus, in equilibrium, the pairs $(i, i-1)$ and $(i-1, i)$ should occur with the same frequency. While many of the Markov chains that occur in applications are reversible, this is a very strong condition. In Exercise 12 you are asked to find an example of a Markov chain which is not reversible.

The Central Limit Theorem for Markov Chains

Suppose that we have an ergodic Markov chain with states $s_1, s_2, \ldots, s_k$. It is natural to consider the distribution of the random variables $S^{(n)}_j$, which denotes the number of times that the chain is in state $s_j$ in the first $n$ steps.
The $j$th component $w_j$ of the fixed probability row vector $w$ is the proportion of times that the chain is in state $s_j$ in the long run. Hence, it is reasonable to conjecture that the expected value of the random variable $S^{(n)}_j$, as $n \to \infty$, is asymptotic to $n w_j$, and it is easy to show that this is the case (see Exercise 23).

It is also natural to ask whether there is a limiting distribution of the random variables $S^{(n)}_j$. The answer is yes, and in fact, this limiting distribution is the normal distribution. As in the case of independent trials, one must normalize these random variables. Thus, we must subtract from $S^{(n)}_j$ its expected value, and then divide by its standard deviation. In both cases, we will use the asymptotic values of these quantities, rather than the values themselves. Thus, in the first case, we will use the value $n w_j$. It is not so clear what we should use in the second case. It turns out that the quantity
$$\sigma_j^2 = 2 w_j z_{jj} - w_j - w_j^2 \qquad (11.10)$$
represents the asymptotic variance. Armed with these ideas, we can state the following theorem.

Theorem 11.17 (Central Limit Theorem for Markov Chains) For an ergodic chain, for any real numbers $r < s$, we have
$$P\left( r < \frac{S^{(n)}_j - n w_j}{\sqrt{n \sigma_j^2}} < s \right) \to \frac{1}{\sqrt{2\pi}} \int_r^s e^{-x^2/2}\, dx,$$
as $n \to \infty$, for any choice of starting state, where $\sigma_j^2$ is the quantity defined in Equation 11.10. □

Historical Remarks

Markov chains were introduced by Andrei Andreevich Markov (1856–1922) and were named in his honor. He was a talented undergraduate who received a gold medal for his undergraduate thesis at St.
Petersburg University. Besides being an active research mathematician and teacher, he was also active in politics and participated in the liberal movement in Russia at the beginning of the twentieth century. In 1913, when the government celebrated the 300th anniversary of the House of Romanov, Markov organized a counter-celebration of the 200th anniversary of Bernoulli's discovery of the Law of Large Numbers.

Markov was led to develop Markov chains as a natural extension of sequences of independent random variables. In his first paper, in 1906, he proved that for a Markov chain with positive transition probabilities and numerical states the average of the outcomes converges to the expected value of the limiting distribution (the fixed vector). In a later paper he proved the central limit theorem for such chains. Writing about Markov, A. P. Youschkevitch remarks:

    Markov arrived at his chains starting from the internal needs of probability theory, and he never wrote about their applications to physical science. For him the only real examples of the chains were literary texts, where the two states denoted the vowels and consonants. [19]

In a paper written in 1913, [20] Markov chose a sequence of 20,000 letters from Pushkin's Eugene Onegin to see if this sequence can be approximately considered a simple chain. He obtained the Markov chain with transition matrix
$$\begin{array}{c|cc} & \text{vowel} & \text{consonant} \\ \hline \text{vowel} & .128 & .872 \\ \text{consonant} & .663 & .337 \end{array}$$
The fixed vector for this chain is $(.432, .568)$, indicating that we should expect about 43.2 percent vowels and 56.8 percent consonants in the novel, which was borne out by the actual count.
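The fixed vector quoted for Markov's vowel and consonant chain is easy to reproduce. The sketch below assumes the standard closed form for a two-state chain: if the off-diagonal entries are $a$ (first state to second) and $b$ (second state to first), the fixed vector is $(b/(a+b),\, a/(a+b))$. The function name `fixed_vector_two_state` is hypothetical.

```python
def fixed_vector_two_state(P):
    """Fixed probability vector of a two-state chain with positive off-diagonal entries."""
    a = P[0][1]  # probability of moving from state 0 to state 1
    b = P[1][0]  # probability of moving from state 1 to state 0
    return (b / (a + b), a / (a + b))

# Rows and columns are ordered (vowel, consonant), as in the matrix above.
P = [[0.128, 0.872],
     [0.663, 0.337]]
w = fixed_vector_two_state(P)

# Check that w P = w up to floating-point error.
wP = (w[0] * P[0][0] + w[1] * P[1][0], w[0] * P[0][1] + w[1] * P[1][1])
print(w, wP)
```

Rounded to three decimal places, the computed vector agrees with the $(.432, .568)$ reported in the text.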
Claude Shannon considered an interesting extension of this idea in his book The Mathematical Theory of Communication, [21] in which he developed the information-theoretic concept of entropy. Shannon considers a series of Markov chain approximations to English prose. He does this first by chains in which the states are letters and then by chains in which the states are words. For example, for the case of words he presents first a simulation where the words are chosen independently but with appropriate frequencies.

    REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE.

He then notes the increased resemblance to ordinary English text when the words are chosen as a Markov chain, in which case he obtains

    THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED.

A simulation like the last one is carried out by opening a book and choosing the first word, say it is "the". Then the book is read until the word "the" appears again and the word after this is chosen as the second word, which turned out to be "head". The book is then read until the word "head" appears again and the next word, "and", is chosen, and so on.

Other early examples of the use of Markov chains occurred in Galton's study of the problem of survival of family names in 1889 and in the Markov chain introduced by P. and T. Ehrenfest in 1907 for diffusion. Poincaré in 1912 discussed card shuffling in terms of an ergodic Markov chain defined on a permutation group. Brownian motion, a continuous time version of random walk, was introduced in 1900–1901 by L. Bachelier in his study of the stock market, and in 1905–1907 in the works of A. Einstein and M. Smoluchowsky in their study of physical processes.

[19] See Dictionary of Scientific Biography, ed. C. C. Gillespie (New York: Scribner's Sons, 1970), pp. 124–130.
[20] A. A. Markov, "An Example of Statistical Analysis of the Text of Eugene Onegin Illustrating the Association of Trials into a Chain," Bulletin de l'Académie Impériale des Sciences de St. Pétersbourg, ser. 6, vol. 7 (1913), pp. 153–162.
[21] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication (Urbana: Univ. of Illinois Press, 1964).

One of the first systematic studies of finite Markov chains was carried out by M. Fréchet. [22] The treatment of Markov chains in terms of the two fundamental matrices that we have used was developed by Kemeny and Snell [23] to avoid the use of eigenvalues that one of these authors found too complex. The fundamental matrix $N$ occurred also in the work of J. L. Doob and others in studying the connection between Markov processes and classical potential theory. The fundamental matrix $Z$ for ergodic chains appeared first in the work of Fréchet, who used it to find the limiting variance for the central limit theorem for Markov chains.

Exercises

1 Consider the Markov chain with transition matrix
$$P = \begin{pmatrix} 1/2 & 1/2 \\ 1/4 & 3/4 \end{pmatrix}.$$
Find the fundamental matrix $Z$ for this chain. Compute the mean first passage matrix using $Z$.

2 A study of the strengths of Ivy League football teams shows that if a school has a strong team one year it is equally likely to have a strong team or average team next year; if it has an average team, half the time it is average next year, and if it changes it is just as likely to become strong as weak; if it is weak it has 2/3 probability of remaining so and 1/3 of becoming average.

    (a) A school has a strong team. On the average, how long will it be before it has another strong team?
    (b) A school has a weak team; how long (on the average) must the alumni wait for a strong team?

3 Consider Example 11.4 with $a = .5$ and $b = .75$. Assume that the President says that he or she will run. Find the expected length of time before the first time the answer is passed on incorrectly.

4 Find the mean recurrence time for each state of Example 11.4 for $a = .5$ and $b = .75$. Do the same for general $a$ and $b$.

5 A die is rolled repeatedly. Show by the results of this section that the mean time between occurrences of a given number is 6.

[22] M. Fréchet, "Théorie des événements en chaîne dans le cas d'un nombre fini d'états possible," in Recherches théoriques Modernes sur le calcul des probabilités, vol. 2 (Paris, 1938).
[23] J. G. Kemeny and J. L. Snell, Finite Markov Chains.

[Figure 11.7: Maze for Exercise 7.]

6 For the Land of Oz example (Example 11.1), make rain into an absorbing state and find the fundamental matrix $N$. Interpret the results obtained from this chain in terms of the original chain.

7 A rat runs through the maze shown in Figure 11.7. At each step it leaves the room it is in by choosing at random one of the doors out of the room.

    (a) Give the transition matrix $P$ for this Markov chain.
    (b) Show that it is an ergodic chain but not a regular chain.
    (c) Find the fixed vector.
    (d) Find the expected number of steps before reaching Room 5 for the first time, starting in Room 1.

8 Modify the program ErgodicChain so that you can compute the basic quantities for the queueing example of Exercise 11.3.20. Interpret the mean recurrence time for state 0.
9 Consider a random walk on a circle of circumference $n$. The walker takes one unit step clockwise with probability $p$ and one unit counterclockwise with probability $q = 1 - p$. Modify the program ErgodicChain to allow you to input $n$ and $p$ and compute the basic quantities for this chain.

    (a) For which values of $n$ is this chain regular? ergodic?
    (b) What is the limiting vector $w$?
    (c) Find the mean first passage matrix for $n = 5$ and $p = .5$. Verify that $m_{ij} = d(n - d)$, where $d$ is the clockwise distance from $i$ to $j$.

10 Two players match pennies and have between them a total of 5 pennies. If at any time one player has all of the pennies, to keep the game going, he gives one back to the other player and the game will continue. Show that this game can be formulated as an ergodic chain. Study this chain using the program ErgodicChain.

11 Calculate the reverse transition matrix for the Land of Oz example (Example 11.1). Is this chain reversible?

12 Give an example of a three-state ergodic Markov chain that is not reversible.

13 Let $P$ be the transition matrix of an ergodic Markov chain and $P^*$ the reverse transition matrix. Show that they have the same fixed probability vector $w$.

14 If $P$ is a reversible Markov chain, is it necessarily true that the mean time to go from state $i$ to state $j$ is equal to the mean time to go from state $j$ to state $i$? Hint: Try the Land of Oz example (Example 11.1).

15 Show that any ergodic Markov chain with a symmetric transition matrix (i.e., $p_{ij} = p_{ji}$) is reversible.

16 (Crowell [24]) Let $P$ be the transition matrix of an ergodic Markov chain. Show that
$$(I + P + \cdots + P^{n-1})(I - P + W) = I - P^n + nW,$$
and from this show that
$$\frac{I + P + \cdots + P^{n-1}}{n} \to W,$$
as $n \to \infty$.

17 An ergodic Markov chain is started in equilibrium (i.e., with initial probability vector $w$).
The mean time until the next occurrence of state $s_i$ is $\bar{m}_i = \sum_k w_k m_{ki} + w_i r_i$. Show that $\bar{m}_i = z_{ii}/w_i$, by using the facts that $wZ = w$ and $m_{ki} = (z_{ii} - z_{ki})/w_i$.

18 A perpetual craps game goes on at Charley's. Jones comes into Charley's on an evening when there have already been 100 plays. He plans to play until the next time that snake eyes (a pair of ones) are rolled. Jones wonders how many times he will play. On the one hand he realizes that the average time between snake eyes is 36, so he should play about 18 times, as he is equally likely to have come in on either side of the halfway point between occurrences of snake eyes. On the other hand, the dice have no memory, and so it would seem that he would have to play for 36 more times no matter what the previous outcomes have been. Which, if either, of Jones's arguments do you believe? Using the result of Exercise 17, calculate the expected time to reach snake eyes, in equilibrium, and see if this resolves the apparent paradox. If you are still in doubt, simulate the experiment to decide which argument is correct. Can you give an intuitive argument which explains this result?

19 Show that, for an ergodic Markov chain (see Theorem 11.16),
$$\sum_j m_{ij} w_j = \sum_j z_{jj} - 1 = K.$$
The second expression above shows that the number $K$ is independent of $i$. The number $K$ is called Kemeny's constant. A prize was offered to the first person to give an intuitively plausible reason for the above sum to be independent of $i$. (See also Exercise 24.)

[24] Private communication.

[Figure 11.8: Simplified Monopoly. A board with four squares: GO, A, B, and C.]

20 Consider a game played as follows: You are given a regular Markov chain with transition matrix $P$, fixed probability vector $w$, and a payoff function $f$ which assigns to each state $s_i$ an amount $f_i$ which may be positive or negative.
Assume that $wf = 0$. You watch this Markov chain as it evolves, and every time you are in state $s_i$ you receive an amount $f_i$. Show that your expected winning after $n$ steps can be represented by a column vector $g^{(n)}$, with
$$g^{(n)} = (I + P + P^2 + \cdots + P^n) f.$$
Show that as $n \to \infty$, $g^{(n)} \to g$ with $g = Zf$.

21 A highly simplified game of "Monopoly" is played on a board with four squares as shown in Figure 11.8. You start at GO. You roll a die and move clockwise around the board a number of squares equal to the number that turns up on the die. You collect or pay an amount indicated on the square on which you land. You then roll the die again and move around the board in the same manner from your last position. Using the result of Exercise 20, estimate the amount you should expect to win in the long run playing this version of Monopoly.

22 Show that if $P$ is the transition matrix of a regular Markov chain, and $W$ is the matrix each of whose rows is the fixed probability vector corresponding to $P$, then $PW = W$, and $W^k = W$ for all positive integers $k$.

23 Assume that an ergodic Markov chain has states $s_1, s_2, \ldots, s_k$. Let $S^{(n)}_j$ denote the number of times that the chain is in state $s_j$ in the first $n$ steps. Let $w$ denote the fixed probability row vector for this chain. Show that, regardless of the starting state, the expected value of $S^{(n)}_j$, divided by $n$, tends to $w_j$ as $n \to \infty$. Hint: If the chain starts in state $s_i$, then the expected value of $S^{(n)}_j$ is given by the expression
$$\sum_{h=0}^{n} p^{(h)}_{ij}.$$

24 In the course of a walk with Snell along Minnehaha Avenue in Minneapolis in the fall of 1983, Peter Doyle [25] suggested the following explanation for the constancy of Kemeny's constant (see Exercise 19). Choose a target state according to the fixed vector $w$. Start from state $i$ and wait until the time $T$ that the target state occurs for the first time.
Let $K_i$ be the expected value of $T$. Observe that
$$K_i + w_i \cdot 1/w_i = \sum_j P_{ij} K_j + 1,$$
and hence
$$K_i = \sum_j P_{ij} K_j.$$
By the maximum principle, $K_i$ is a constant. Should Peter have been given the prize?

[25] Private communication.

Chapter 12 Random Walks

12.1 Random Walks in Euclidean Space

In the last several chapters, we have studied sums of random variables with the goal being to describe the distribution and density functions of the sum. In this chapter, we shall look at sums of discrete random variables from a different perspective. We shall be concerned with properties which can be associated with the sequence of partial sums, such as the number of sign changes of this sequence, the number of terms in the sequence which equal 0, and the expected size of the maximum term in the sequence.

We begin with the following definition.

Definition 12.1 Let $\{X_k\}_{k=1}^{\infty}$ be a sequence of independent, identically distributed discrete random variables. For each positive integer $n$, we let $S_n$ denote the sum $X_1 + X_2 + \cdots + X_n$. The sequence $\{S_n\}_{n=1}^{\infty}$ is called a random walk. If the common range of the $X_k$'s is $\mathbf{R}^m$, then we say that $\{S_n\}$ is a random walk in $\mathbf{R}^m$. □

We view the sequence of $X_k$'s as being the outcomes of independent experiments. Since the $X_k$'s are independent, the probability of any particular (finite) sequence of outcomes can be obtained by multiplying the probabilities that each $X_k$ takes on the specified value in the sequence. Of course, these individual probabilities are given by the common distribution of the $X_k$'s. We will typically be interested in finding probabilities for events involving the related sequence of $S_n$'s. Such events can be described in terms of the $X_k$'s, so their probabilities can be calculated using the above idea.

There are several ways to visualize a random walk.
One can imagine that a particle is placed at the origin in $\mathbf{R}^m$ at time $n = 0$. The sum $S_n$ represents the position of the particle at the end of $n$ seconds. Thus, in the time interval $[n - 1, n]$, the particle moves (or jumps) from position $S_{n-1}$ to $S_n$. The vector representing this motion is just $S_n - S_{n-1}$, which equals $X_n$. This means that in a random walk, the jumps are independent and identically distributed. If $m = 1$, for example, then one can imagine a particle on the real line that starts at the origin, and at the end of each second, jumps one unit to the right or the left, with probabilities given by the distribution of the $X_k$'s. If $m = 2$, one can visualize the process as taking place in a city in which the streets form square city blocks. A person starts at one corner (i.e., at an intersection of two streets) and goes in one of the four possible directions according to the distribution of the $X_k$'s. If $m = 3$, one might imagine being in a jungle gym, where one is free to move in any one of six directions (left, right, forward, backward, up, and down). Once again, the probabilities of these movements are given by the distribution of the $X_k$'s.

Another model of a random walk (used mostly in the case where the range is $\mathbf{R}^1$) is a game, involving two people, which consists of a sequence of independent, identically distributed moves. The sum $S_n$ represents the score of the first person, say, after $n$ moves, with the assumption that the score of the second person is $-S_n$. For example, two people might be flipping coins, with a match or nonmatch representing $+1$ or $-1$, respectively, for the first player. Or, perhaps one coin is being flipped, with a head or tail representing $+1$ or $-1$, respectively, for the first player.
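The particle picture translates directly into a short simulation. The sketch below is illustrative only and makes one assumption that the text leaves open: the common distribution of the $X_k$'s is taken to be uniform over the $2m$ unit steps along the coordinate axes (left or right for $m = 1$, the four street directions for $m = 2$, the six jungle-gym directions for $m = 3$). The function name `random_walk` is hypothetical.

```python
import random

def random_walk(steps, m, rng):
    """Return the positions S_0, S_1, ..., S_steps of a walk on the integer lattice in R^m.

    Assumption: each of the 2m unit steps is equally likely.
    """
    position = [0] * m
    path = [tuple(position)]            # S_0 is the origin
    for _ in range(steps):
        axis = rng.randrange(m)         # pick a coordinate direction
        position[axis] += rng.choice((-1, 1))  # move one unit along it
        path.append(tuple(position))
    return path

rng = random.Random(1)                  # fixed seed, chosen arbitrarily
walk_1d = random_walk(40, 1, rng)
walk_3d = random_walk(40, 3, rng)
print(walk_1d[-1], walk_3d[-1])
```

For $m = 1$ this is the coin-flipping game described above, with `walk_1d[n][0]` playing the role of the first player's score $S_n$.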
Random Walks on the Real Line

We shall first consider the simplest nontrivial case of a random walk in $\mathbf{R}^1$, namely the case where the common distribution function of the random variables $X_n$ is given by
$$f_X(x) = \begin{cases} 1/2, & \text{if } x = \pm 1, \\ 0, & \text{otherwise.} \end{cases}$$
This situation corresponds to a fair coin being flipped, with $S_n$ representing the number of heads minus the number of tails which occur in the first $n$ flips. We note that in this situation, all paths of length $n$ have the same probability, namely $2^{-n}$.

It is sometimes instructive to represent a random walk as a polygonal line, or path, in the plane, where the horizontal axis represents time and the vertical axis represents the value of $S_n$. Given a sequence $\{S_n\}$ of partial sums, we first plot the points $(n, S_n)$, and then for each $k < n$ we connect $(k, S_k)$ and $(k + 1, S_{k+1})$ with a straight line segment. The length of a path is just the difference in the time values of the beginning and ending points on the path. The reader is referred to Figure 12.1. This figure, and the process it illustrates, are identical with the example, given in Chapter 1, of two people playing heads or tails.

[Figure 12.1: A random walk of length 40.]

Returns and First Returns

We say that an equalization has occurred, or there is a return to the origin at time $n$, if $S_n = 0$. We note that this can only occur if $n$ is an even integer. To calculate the probability of an equalization at time $2m$, we need only count the number of paths of length $2m$ which begin and end at the origin. The number of such paths is clearly
$$\binom{2m}{m}.$$
Since each path has probability $2^{-2m}$, we have the following theorem.

Theorem 12.1 The probability of a return to the origin at time $2m$ is given by
$$u_{2m} = \binom{2m}{m} 2^{-2m}.$$
The probability of a return to the origin at an odd time is 0.
□

A random walk is said to have a first return to the origin at time $2m$ if $m > 0$, $S_{2m} = 0$, and $S_{2k} \ne 0$ for all $0 < k < m$. In Figure 12.1, the first return occurs at time 2. We define $f_{2m}$ to be the probability of this event. (We also define $f_0 = 0$.) One can think of the expression $f_{2m} 2^{2m}$ as the number of paths of length $2m$ between the points $(0, 0)$ and $(2m, 0)$ that do not touch the horizontal axis except at the endpoints. Using this idea, it is easy to prove the following theorem.

Theorem 12.2 For $n \ge 1$, the probabilities $\{u_{2k}\}$ and $\{f_{2k}\}$ are related by the equation
$$u_{2n} = f_0 u_{2n} + f_2 u_{2n-2} + \cdots + f_{2n} u_0.$$

Proof. There are $u_{2n} 2^{2n}$ paths of length $2n$ which have endpoints $(0, 0)$ and $(2n, 0)$. The collection of such paths can be partitioned into $n$ sets, depending upon the time of the first return to the origin. A path in this collection which has a first return to the origin at time $2k$ consists of an initial segment from $(0, 0)$ to $(2k, 0)$, in which no interior points are on the horizontal axis, and a terminal segment from $(2k, 0)$ to $(2n, 0)$, with no further restrictions on this segment. Thus, the number of paths in the collection which have a first return to the origin at time $2k$ is given by
$$f_{2k} 2^{2k}\, u_{2n-2k} 2^{2n-2k} = f_{2k} u_{2n-2k} 2^{2n}.$$
If we sum over $k$, we obtain the equation
$$u_{2n} 2^{2n} = f_0 u_{2n} 2^{2n} + f_2 u_{2n-2} 2^{2n} + \cdots + f_{2n} u_0 2^{2n}.$$
Dividing both sides of this equation by $2^{2n}$ completes the proof. □

The expression in the right-hand side of the above theorem should remind the reader of a sum that appeared in Definition 7.1 of the convolution of two distributions. The convolution of two sequences is defined in a similar manner.
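Because $u_0 = 1$, the relation in Theorem 12.2 can be solved for $f_{2n}$ one term at a time, which gives a simple numerical check. The sketch below is a hedged illustration with hypothetical function names: it computes $u_{2m}$ from Theorem 12.1, extracts $f_2, f_4, \ldots$ from the convolution relation using exact fractions, and compares the result with a brute-force count of paths whose first return to the origin occurs at time $2m$.

```python
from fractions import Fraction
from itertools import product
from math import comb

def u(two_m):
    """u_{2m} = C(2m, m) / 2^{2m}, as in Theorem 12.1."""
    m = two_m // 2
    return Fraction(comb(2 * m, m), 4 ** m)

def first_return_probs(max_m):
    """Solve u_{2n} = sum_k f_{2k} u_{2n-2k} for f_2, ..., f_{2 max_m}, with f_0 = 0."""
    f = [Fraction(0)]
    for n in range(1, max_m + 1):
        # Since u_0 = 1, the k = n term of the sum is f_{2n} itself.
        known = sum(f[k] * u(2 * (n - k)) for k in range(1, n))
        f.append(u(2 * n) - known)
    return f

def first_return_count(m):
    """Count +1/-1 paths of length 2m that are at 0 at time 2m and at no earlier positive time."""
    count = 0
    for steps in product((-1, 1), repeat=2 * m):
        s, ok = 0, True
        for t, x in enumerate(steps, start=1):
            s += x
            if s == 0 and t < 2 * m:
                ok = False      # returned to the origin too early
                break
        if ok and s == 0:
            count += 1
    return count

f = first_return_probs(5)
brute = [Fraction(first_return_count(m), 4 ** m) for m in range(1, 6)]
print(f[1:] == brute)
```

The brute-force enumeration grows like $4^m$, so it is only practical for small $m$; the recursion has no such limitation.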
The above theorem says that the sequence $\{u_{2n}\}$ is the convolution of itself and the sequence $\{f_{2n}\}$. Thus, if we represent each of these sequences by an ordinary generating function, then we can use the above relationship to determine the value $f_{2n}$.

Theorem 12.3 For $m \ge 1$, the probability of a first return to the origin at time $2m$ is given by
$$f_{2m} = \frac{u_{2m}}{2m - 1} = \frac{\binom{2m}{m}}{(2m - 1)\, 2^{2m}}.$$

Proof. We begin by defining the generating functions
$$U(x) = \sum_{m=0}^{\infty} u_{2m} x^m$$
and
$$F(x) = \sum_{m=0}^{\infty} f_{2m} x^m.$$
Theorem 12.2 says that
$$U(x) = 1 + U(x) F(x). \qquad (12.1)$$
(The presence of the 1 on the right-hand side is due to the fact that $u_0$ is defined to be 1, but Theorem 12.2 only holds for $n \ge 1$.) We note that both generating functions certainly converge on the interval $(-1, 1)$, since all of the coefficients are at most 1 in absolute value. Thus, we can solve the above equation for $F(x)$, obtaining
$$F(x) = \frac{U(x) - 1}{U(x)}.$$
Now, if we can find a closed-form expression for the function $U(x)$, we will also have a closed-form expression for $F(x)$. From Theorem 12.1, we have
$$U(x) = \sum_{m=0}^{\infty} \binom{2m}{m} 2^{-2m} x^m.$$
In Wilf, [1] we find that
$$\frac{1}{\sqrt{1 - 4x}} = \sum_{m=0}^{\infty} \binom{2m}{m} x^m.$$
The reader is asked to prove this statement in Exercise 1. If we replace $x$ by $x/4$ in the last equation, we see that
$$U(x) = \frac{1}{\sqrt{1 - x}}.$$

[1] H. S. Wilf, Generatingfunctionology (Boston: Academic Press, 1990), p. 50.

Therefore, we have
$$F(x) = \frac{U(x) - 1}{U(x)} = \frac{(1 - x)^{-1/2} - 1}{(1 - x)^{-1/2}} = 1 - (1 - x)^{1/2}.$$
Although it is possible to compute the value of $f_{2m}$ using the Binomial Theorem, it is easier to note that $F'(x) = U(x)/2$, so that the coefficients $f_{2m}$ can be found by integrating the series for $U(x)$.
We obtain, for $m \ge 1$,
$$f_{2m} = \frac{u_{2m-2}}{2m} = \frac{\binom{2m-2}{m-1}}{m\, 2^{2m-1}} = \frac{\binom{2m}{m}}{(2m - 1)\, 2^{2m}} = \frac{u_{2m}}{2m - 1},$$
since
$$\binom{2m-2}{m-1} = \frac{m}{2(2m - 1)} \binom{2m}{m}.$$
This completes the proof of the theorem. □

Probability of Eventual Return

In the symmetric random walk process in $\mathbf{R}^m$, what is the probability that the particle eventually returns to the origin? We first examine this question in the case that $m = 1$, and then we consider the general case. The results in the next two examples are due to Pólya. [2]

Example 12.1 (Eventual Return in $\mathbf{R}^1$) One has to approach the idea of eventual return with some care, since the sample space seems to be the set of all walks of infinite length, and this set is nondenumerable. To avoid difficulties, we will define $w_n$ to be the probability that a first return has occurred no later than time $n$. Thus, $w_n$ concerns the sample space of all walks of length $n$, which is a finite set. In terms of the $w_n$'s, it is reasonable to define the probability that the particle eventually returns to the origin to be
$$w_* = \lim_{n \to \infty} w_n.$$
This limit clearly exists and is at most one, since the sequence $\{w_n\}_{n=1}^{\infty}$ is an increasing sequence, and all of its terms are at most one.

[2] G. Pólya, "Über eine Aufgabe der Wahrscheinlichkeitsrechnung betreffend die Irrfahrt im Strassennetz," Math. Ann., vol. 84 (1921), pp. 149–160.

In terms of the $f_n$ probabilities, we see that
$$w_{2n} = \sum_{i=1}^{n} f_{2i}.$$
Thus,
$$w_* = \sum_{i=1}^{\infty} f_{2i}.$$
In the proof of Theorem 12.3, the generating function
$$F(x) = \sum_{m=0}^{\infty} f_{2m} x^m$$
was introduced. There it was noted that this series converges for $x \in (-1, 1)$. In fact, it is possible to show that this series also converges for $x = \pm 1$ by using Exercise 4, together with the fact that
$$f_{2m} = \frac{u_{2m}}{2m - 1}.$$
(This fact was proved in the proof of Theorem 12.3.)
Since we also know that
$$F(x) = 1 - (1 - x)^{1/2},$$
we see that
$$w_* = F(1) = 1.$$
Thus, with probability one, the particle returns to the origin. An alternative proof of the fact that $w_* = 1$ can be obtained by using the results in Exercise 2. □

Example 12.2 (Eventual Return in $\mathbf{R}^m$) We now turn our attention to the case that the random walk takes place in more than one dimension. We define $f^{(m)}_{2n}$ to be the probability that the first return to the origin in $\mathbf{R}^m$ occurs at time $2n$. The quantity $u^{(m)}_{2n}$ is defined in a similar manner. Thus, $f^{(1)}_{2n}$ and $u^{(1)}_{2n}$ equal $f_{2n}$ and $u_{2n}$, which were defined earlier. If, in addition, we define $u^{(m)}_0 = 1$ and $f^{(m)}_0 = 0$, then one can mimic the proof of Theorem 12.2, and show that for all $m \ge 1$ and $n \ge 1$,
$$u^{(m)}_{2n} = f^{(m)}_0 u^{(m)}_{2n} + f^{(m)}_2 u^{(m)}_{2n-2} + \cdots + f^{(m)}_{2n} u^{(m)}_0. \qquad (12.2)$$
We continue to generalize previous work by defining
$$U^{(m)}(x) = \sum_{n=0}^{\infty} u^{(m)}_{2n} x^n$$
and
$$F^{(m)}(x) = \sum_{n=0}^{\infty} f^{(m)}_{2n} x^n.$$
Then, by using Equation 12.2, we see that
$$U^{(m)}(x) = 1 + U^{(m)}(x) F^{(m)}(x),$$
as before. These functions will always converge in the interval $(-1, 1)$, since all of their coefficients are at most one in magnitude. In fact, since
$$w^{(m)}_* = \sum_{n=0}^{\infty} f^{(m)}_{2n} \le 1$$
for all $m$, the series for $F^{(m)}(x)$ converges at $x = 1$ as well, and $F^{(m)}(x)$ is left-continuous at $x = 1$, i.e.,
$$\lim_{x \uparrow 1} F^{(m)}(x) = F^{(m)}(1).$$
Thus, we have
$$w^{(m)}_* = \lim_{x \uparrow 1} F^{(m)}(x) = \lim_{x \uparrow 1} \frac{U^{(m)}(x) - 1}{U^{(m)}(x)}, \qquad (12.3)$$
so to determine $w^{(m)}_*$, it suffices to determine
$$\lim_{x \uparrow 1} U^{(m)}(x).$$
We let $u^{(m)}$ denote this limit. We claim that
$$u^{(m)} = \sum_{n=0}^{\infty} u^{(m)}_{2n}.$$
(This claim is reasonable; it says that to find out what happens to the function $U^{(m)}(x)$ at $x = 1$, just let $x = 1$ in the power series for $U^{(m)}(x)$.)
To prove the claim, we note that the coefficients $u^{(m)}_{2n}$ are nonnegative, so $U^{(m)}(x)$ increases monotonically on the interval $[0, 1)$. Thus, for each $K$, we have
$$\sum_{n=0}^{K} u^{(m)}_{2n} \le \lim_{x \uparrow 1} U^{(m)}(x) = u^{(m)} \le \sum_{n=0}^{\infty} u^{(m)}_{2n}.$$
By letting $K \to \infty$, we see that
$$u^{(m)} = \sum_{n=0}^{\infty} u^{(m)}_{2n}.$$
This establishes the claim.

From Equation 12.3, we see that if $u^{(m)} < \infty$, then the probability of an eventual return is
$$\frac{u^{(m)} - 1}{u^{(m)}},$$
while if $u^{(m)} = \infty$, then the probability of eventual return is 1.

To complete the example, we must estimate the sum
$$\sum_{n=0}^{\infty} u^{(m)}_{2n}.$$
In Exercise 12, the reader is asked to show that
$$u^{(2)}_{2n} = \frac{1}{4^{2n}} \binom{2n}{n}^2.$$
Using Stirling's Formula, it is easy to show that (see Exercise 13)
$$\binom{2n}{n} \sim \frac{2^{2n}}{\sqrt{\pi n}},$$
so
$$u^{(2)}_{2n} \sim \frac{1}{\pi n}.$$
From this it follows easily that
$$\sum_{n=0}^{\infty} u^{(2)}_{2n}$$
diverges, so $w^{(2)}_* = 1$, i.e., in $\mathbf{R}^2$, the probability of an eventual return is 1.

When $m = 3$, Exercise 12 shows that
$$u^{(3)}_{2n} = \frac{1}{2^{2n}} \binom{2n}{n} \sum_{j,k} \left( \frac{1}{3^n} \frac{n!}{j!\,k!\,(n - j - k)!} \right)^2.$$
Let $M$ denote the largest value of
$$\frac{1}{3^n} \frac{n!}{j!\,k!\,(n - j - k)!},$$
over all nonnegative values of $j$ and $k$ with $j + k \le n$. It is easy, using Stirling's Formula, to show that
$$M \sim \frac{c}{n},$$
for some constant $c$. Thus, we have
$$u^{(3)}_{2n} \le \frac{1}{2^{2n}} \binom{2n}{n} \sum_{j,k} \frac{M}{3^n} \frac{n!}{j!\,k!\,(n - j - k)!}.$$
Using Exercise 14, one can show that the right-hand expression is at most
$$\frac{c'}{n^{3/2}},$$
where $c'$ is a constant. Thus,
$$\sum_{n=0}^{\infty} u^{(3)}_{2n}$$
converges, so $w^{(3)}_*$ is strictly less than one. This means that in $\mathbf{R}^3$, the probability of an eventual return to the origin is strictly less than one (in fact, it is approximately .34).

One may summarize these results by stating that one should not get drunk in more than two dimensions. □
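The formulas for $u^{(2)}_{2n}$ and $u^{(3)}_{2n}$ quoted in Example 12.2 can be sanity-checked by direct enumeration for small $n$, and the series for $u^{(3)}$ can be summed numerically. The sketch below is illustrative, with hypothetical function names; it assumes the symmetric walk in which each of the $2m$ unit lattice steps is equally likely. Since the terms $u^{(3)}_{2n}$ decay only like $n^{-3/2}$, a partial sum underestimates $u^{(3)}$, so the estimate of the return probability obtained from it lies somewhat below the value of about .34 given in the text.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial

def u2(n):
    """u^{(2)}_{2n} from the closed form quoted in the text."""
    return Fraction(comb(2 * n, n) ** 2, 4 ** (2 * n))

def u3(n):
    """u^{(3)}_{2n} from the double-sum formula quoted in the text."""
    total = 0
    for j in range(n + 1):
        for k in range(n - j + 1):
            multinomial = factorial(n) // (factorial(j) * factorial(k) * factorial(n - j - k))
            total += multinomial ** 2
    # (1/2^{2n}) C(2n, n) * sum of (multinomial / 3^n)^2
    return Fraction(comb(2 * n, n) * total, 4 ** n * 9 ** n)

def returns_by_enumeration(dim, length):
    """Fraction of lattice walks of the given length in R^dim that end at the origin."""
    moves = []
    for axis in range(dim):
        for sign in (-1, 1):
            step = [0] * dim
            step[axis] = sign
            moves.append(tuple(step))
    hits = 0
    for walk in product(moves, repeat=length):
        if all(sum(step[a] for step in walk) == 0 for a in range(dim)):
            hits += 1
    return Fraction(hits, len(moves) ** length)

# Partial sum of the series for u^{(3)}, including the n = 0 term u_0 = 1.
partial_u3 = sum(float(u3(n)) for n in range(61))
return_estimate = (partial_u3 - 1) / partial_u3   # a lower bound on the return probability
print(return_estimate)
```

The cutoff of 60 terms is arbitrary; increasing it moves the estimate upward toward the limiting value, but slowly, because the tail of the series shrinks only like the inverse square root of the cutoff.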
Expected Number of Equalizations

We now give another example of the use of generating functions to find a general formula for terms in a sequence, where the sequence is related by recursion relations to other sequences. Exercise 9 gives still another example.

Example 12.3 (Expected Number of Equalizations) In this example, we will derive a formula for the expected number of equalizations in a random walk of length $2m$. As in the proof of Theorem 12.3, the method has four main parts. First, a recursion is found which relates the $m$th term in the unknown sequence to earlier terms in the same sequence and to terms in other (known) sequences. An example of such a recursion is given in Theorem 12.2. Second, the recursion is used to derive a functional equation involving the generating functions of the unknown sequence and one or more known sequences. Equation 12.1 is an example of such a functional equation. Third, the functional equation is solved for the unknown generating function. Last, using a device such as the Binomial Theorem, integration, or differentiation, a formula for the $m$th coefficient of the unknown generating function is found.

We begin by defining $g_{2m}$ to be the number of equalizations among all of the random walks of length $2m$. (For each random walk, we disregard the equalization at time 0.) We define $g_0 = 0$.
Since the number of walks of length $2m$ equals $2^{2m}$, the expected number of equalizations among all such random walks is $g_{2m}/2^{2m}$. Next, we define the generating function $G(x)$:
$$G(x) = \sum_{k=0}^{\infty} g_{2k} x^k.$$
Now we need to find a recursion which relates the sequence $\{g_{2k}\}$ to one or both of the known sequences $\{f_{2k}\}$ and $\{u_{2k}\}$. We consider $m$ to be a fixed positive integer, and consider the set of all paths of length $2m$ as the disjoint union
$$E_2 \cup E_4 \cup \cdots \cup E_{2m} \cup H,$$
where $E_{2k}$ is the set of all paths of length $2m$ with first equalization at time $2k$, and $H$ is the set of all paths of length $2m$ with no equalization. It is easy to show (see Exercise 3) that
$$|E_{2k}| = f_{2k} 2^{2m}.$$
We claim that the number of equalizations among all paths belonging to the set $E_{2k}$ is equal to
$$|E_{2k}| + 2^{2k} f_{2k}\, g_{2m-2k}. \qquad (12.4)$$
Each path in $E_{2k}$ has one equalization at time $2k$, so the total number of such equalizations is just $|E_{2k}|$. This is the first summand in Equation 12.4. There are $2^{2k} f_{2k}$ different initial segments of length $2k$ among the paths in $E_{2k}$. Each of these initial segments can be augmented to a path of length $2m$ in $2^{2m-2k}$ ways, by adjoining all possible paths of length $2m - 2k$. The number of equalizations obtained by adjoining all of these paths to any one initial segment is $g_{2m-2k}$, by definition. This gives the second summand in Equation 12.4. Since $k$ can range from 1 to $m$, we obtain the recursion
$$g_{2m} = \sum_{k=1}^{m} \left( |E_{2k}| + 2^{2k} f_{2k}\, g_{2m-2k} \right). \qquad (12.5)$$
The second summand in the typical term above should remind the reader of a convolution. In fact, if we multiply the generating function $G(x)$ by the generating function
$$F(4x) = \sum_{k=0}^{\infty} 2^{2k} f_{2k} x^k,$$
the coefficient of $x^m$ equals
$$\sum_{k=0}^{m} 2^{2k} f_{2k}\, g_{2m-2k}.$$
Thus, the product $G(x) F(4x)$ is part of the functional equation that we are seeking.
The first summand in the typical term in Equation 12.5 gives rise to the sum
\[ 2^{2m} \sum_{k=1}^{m} f_{2k} . \]
From Exercise 2, we see that this sum is just $(1 - u_{2m}) 2^{2m}$. Thus, we need to create a generating function whose $m$th coefficient is this term; this generating function is
\[ \sum_{m=0}^{\infty} (1 - u_{2m}) 2^{2m} x^m , \]
or
\[ \sum_{m=0}^{\infty} 2^{2m} x^m - \sum_{m=0}^{\infty} u_{2m} 2^{2m} x^m . \]
The first sum is just $(1-4x)^{-1}$, and the second sum is $U(4x)$. So, the functional equation which we have been seeking is
\[ G(x) = F(4x) G(x) + \frac{1}{1-4x} - U(4x) . \]
If we solve this equation for $G(x)$, and simplify, we obtain
\[ G(x) = \frac{1}{(1-4x)^{3/2}} - \frac{1}{1-4x} . \qquad (12.6) \]
We now need to find a formula for the coefficient of $x^m$. The first summand in Equation 12.6 is $\frac{1}{2}\frac{d}{dx}U(4x)$, so the coefficient of $x^m$ in this function is
\[ u_{2m+2}\, 2^{2m+1} (m+1) . \]
The second summand in Equation 12.6 is the sum of a geometric series with common ratio $4x$, so the coefficient of $x^m$ is $2^{2m}$. Thus, we obtain
\[ g_{2m} = u_{2m+2}\, 2^{2m+1} (m+1) - 2^{2m} = \frac{1}{2} \binom{2m+2}{m+1} (m+1) - 2^{2m} . \]
We recall that the quotient $g_{2m}/2^{2m}$ is the expected number of equalizations among all paths of length $2m$. Using Exercise 4, it is easy to show that
\[ \frac{g_{2m}}{2^{2m}} \sim \sqrt{\frac{2}{\pi}} \sqrt{2m} . \]
In particular, this means that the average number of equalizations among all paths of length $4m$ is not twice the average number of equalizations among all paths of length $2m$. In order for the average number of equalizations to double, one must quadruple the lengths of the random walks. □

It is interesting to note that if we define
\[ M_n = \max_{0 \le k \le n} S_k , \]
then we have
\[ E(M_n) \sim \sqrt{\frac{2}{\pi}} \sqrt{n} . \]
This means that the expected number of equalizations and the expected maximum value for random walks of length $n$ are asymptotically equal as $n \to \infty$. (In fact, it can be shown that the two expected values differ by at most 1/2 for all positive integers $n$. See Exercise 9.)
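The closed form for $g_{2m}$ and the quadrupling remark can be explored numerically. The following is a minimal sketch, with function names of our own choosing, that evaluates the exact expected number of equalizations $g_{2m}/2^{2m}$ and prints it next to the asymptotic estimate $\sqrt{2/\pi}\sqrt{2m}$.

```python
from math import comb, pi, sqrt


def total_equalizations(m):
    """g_2m = (1/2)(m + 1) C(2m + 2, m + 1) - 2^(2m), as an exact integer.

    A central binomial coefficient C(2n, n) with n >= 1 is even, so the
    integer division by 2 is exact.
    """
    return (m + 1) * comb(2 * m + 2, m + 1) // 2 - 4 ** m


def expected_equalizations(m):
    """Expected number of equalizations in a walk of length 2m."""
    return total_equalizations(m) / 4 ** m


def asymptotic_estimate(m):
    """The estimate sqrt(2/pi) * sqrt(2m) for walks of length 2m."""
    return sqrt(2 / pi) * sqrt(2 * m)


for m in (10, 100, 1000):
    print(m, expected_equalizations(m), asymptotic_estimate(m))
```

Comparing `expected_equalizations(4 * m)` with `expected_equalizations(m)` for large `m` gives a ratio close to 2, which is the quadrupling statement above in numerical form.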
Exercises

1 Using the Binomial Theorem, show that
\[ \frac{1}{\sqrt{1-4x}} = \sum_{m=0}^{\infty} \binom{2m}{m} x^m . \]
What is the interval of convergence of this power series?

2 (a) Show that for $m \ge 1$,
\[ f_{2m} = u_{2m-2} - u_{2m} . \]
(b) Using part (a), find a closed-form expression for the sum
\[ f_2 + f_4 + \cdots + f_{2m} . \]
(c) Using part (b), show that
\[ \sum_{m=1}^{\infty} f_{2m} = 1 . \]
(One can also obtain this statement from the fact that $F(x) = 1 - (1-x)^{1/2}$.)
(d) Using parts (a) and (b), show that the probability of no equalization in the first $2m$ outcomes equals the probability of an equalization at time $2m$.

3 Using the notation of Example 12.3, show that
\[ |E_{2k}| = f_{2k} 2^{2m} . \]

4 Using Stirling's Formula, show that
\[ u_{2m} \sim \frac{1}{\sqrt{\pi m}} . \]

5 A lead change in a random walk occurs at time $2k$ if $S_{2k-1}$ and $S_{2k+1}$ are of opposite sign.
(a) Give a rigorous argument which proves that among all walks of length $2m$ that have an equalization at time $2k$, exactly half have a lead change at time $2k$.
(b) Deduce that the total number of lead changes among all walks of length $2m$ equals
\[ \frac{1}{2} ( g_{2m} - u_{2m} ) . \]
(c) Find an asymptotic expression for the average number of lead changes in a random walk of length $2m$.

6 (a) Show that the probability that a random walk of length $2m$ has a last return to the origin at time $2k$, where $0 \le k \le m$, equals
\[ \frac{\binom{2k}{k} \binom{2m-2k}{m-k}}{2^{2m}} = u_{2k} u_{2m-2k} . \]
(The case $k = 0$ consists of all paths that do not return to the origin at any positive time.) Hint: A path whose last return to the origin occurs at time $2k$ consists of two paths glued together, one path of which is of length $2k$ and which begins and ends at the origin, and the other path of which is of length $2m-2k$ and which begins at the origin but never returns to the origin. Both types of paths can be counted using quantities which appear in this section.
(b) Using part (a), show that if $m$ is odd, the probability that a walk of length $2m$ has no equalization in the last $m$ outcomes is equal to 1/2, regardless of the value of $m$. Hint: The answer to part (a) is symmetric in $k$ and $m-k$.

7 Show that the probability of no equalization in a walk of length $2m$ equals $u_{2m}$.

*8 Show that
\[ P(S_1 \ge 0,\ S_2 \ge 0,\ \ldots,\ S_{2m} \ge 0) = u_{2m} . \]
Hint: First explain why
\[ P(S_1 > 0,\ S_2 > 0,\ \ldots,\ S_{2m} > 0) = \frac{1}{2} P(S_1 \ne 0,\ S_2 \ne 0,\ \ldots,\ S_{2m} \ne 0) . \]
Then use Exercise 7, together with the observation that if no equalization occurs in the first $2m$ outcomes, then the path goes through the point $(1, 1)$ and remains on or above the horizontal line $x = 1$.

*9 In Feller,[3] one finds the following theorem: Let $M_n$ be the random variable which gives the maximum value of $S_k$, for $1 \le k \le n$. Define
\[ p_{n,r} = \binom{n}{\frac{n+r}{2}} 2^{-n} . \]
If $r \ge 0$, then
\[ P(M_n = r) = \begin{cases} p_{n,r}, & \text{if } r \equiv n \pmod 2, \\ p_{n,r+1}, & \text{if } r \not\equiv n \pmod 2. \end{cases} \]
(a) Using this theorem, show that
\[ E(M_{2m}) = \frac{1}{2^{2m}} \sum_{k=1}^{m} (4k-1) \binom{2m}{m+k} , \]
and if $n = 2m+1$, then
\[ E(M_{2m+1}) = \frac{1}{2^{2m+1}} \sum_{k=0}^{m} (4k+1) \binom{2m+1}{m+k+1} . \]
(b) For $m \ge 1$, define
\[ r_m = \sum_{k=1}^{m} k \binom{2m}{m+k} \]
and
\[ s_m = \sum_{k=1}^{m} k \binom{2m+1}{m+k+1} . \]
By using the identity
\[ \binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k} , \]
show that
\[ s_m = 2 r_m - \frac{1}{2} \left( 2^{2m} - \binom{2m}{m} \right) \]
and
\[ r_m = 2 s_{m-1} + \frac{1}{2}\, 2^{2m-1} , \]
if $m \ge 2$.

[3] W. Feller, Introduction to Probability Theory and its Applications, vol. I, 3rd ed. (New York: John Wiley & Sons, 1968).
(c) Define the generating functions
\[ R(x) = \sum_{k=1}^{\infty} r_k x^k \]
and
\[ S(x) = \sum_{k=1}^{\infty} s_k x^k . \]
Show that
\[ S(x) = 2R(x) - \frac{1}{2} \left( \frac{1}{1-4x} \right) + \frac{1}{2} \left( \frac{1}{\sqrt{1-4x}} \right) \]
and
\[ R(x) = 2x S(x) + x \left( \frac{1}{1-4x} \right) . \]
(d) Show that
\[ R(x) = \frac{x}{(1-4x)^{3/2}} , \]
and
\[ S(x) = \frac{1}{2} \left( \frac{1}{(1-4x)^{3/2}} \right) - \frac{1}{2} \left( \frac{1}{1-4x} \right) . \]
(e) Show that
\[ r_m = m \binom{2m-1}{m-1} , \]
and
\[ s_m = \frac{1}{2} (m+1) \binom{2m+1}{m} - \frac{1}{2} ( 2^{2m} ) . \]
(f) Show that
\[ E(M_{2m}) = \frac{m}{2^{2m-1}} \binom{2m}{m} + \frac{1}{2^{2m+1}} \binom{2m}{m} - \frac{1}{2} , \]
and
\[ E(M_{2m+1}) = \frac{m+1}{2^{2m+1}} \binom{2m+2}{m+1} - \frac{1}{2} . \]
The reader should compare these formulas with the expression for $g_{2m}/2^{2m}$ in Example 12.3.

*10 (from K. Levasseur[4]) A parent and his child play the following game. A deck of $2n$ cards, $n$ red and $n$ black, is shuffled. The cards are turned up one at a time. Before each card is turned up, the parent and the child guess whether it will be red or black. Whoever makes more correct guesses wins the game. The child is assumed to guess each color with the same probability, so she will have a score of $n$, on average. The parent keeps track of how many cards of each color have already been turned up. If more black cards, say, than red cards remain in the deck, then the parent will guess black, while if an equal number of each color remain, then the parent guesses each color with probability 1/2. What is the expected number of correct guesses that will be made by the parent? Hint: Each of the $\binom{2n}{n}$ possible orderings of red and black cards corresponds to a random walk of length $2n$ that returns to the origin at time $2n$. Show that between each pair of successive equalizations, the parent will be right exactly once more than he will be wrong. Explain why this means that the average number of correct guesses by the parent is greater than $n$ by exactly one-half the average number of equalizations.
Now define the random variable $X_i$ to be 1 if there is an equalization at time $2i$, and 0 otherwise. Then, among all relevant paths, we have
\[ E(X_i) = P(X_i = 1) = \frac{\binom{2n-2i}{n-i} \binom{2i}{i}}{\binom{2n}{n}} . \]
Thus, the expected number of equalizations equals
\[ E\left( \sum_{i=1}^{n} X_i \right) = \frac{1}{\binom{2n}{n}} \sum_{i=1}^{n} \binom{2n-2i}{n-i} \binom{2i}{i} . \]
One can now use generating functions to find the value of the sum.

It should be noted that in a game such as this, a more interesting question than the one asked above is what is the probability that the parent wins the game? For this game, this question was answered by D. Zagier.[5] He showed that the probability of winning is asymptotic (for large $n$) to the quantity
\[ \frac{1}{2} + \frac{1}{2\sqrt{2}} . \]

*11 Prove that
\[ u^{(2)}_{2n} = \frac{1}{4^{2n}} \sum_{k=0}^{n} \frac{(2n)!}{k!\,k!\,(n-k)!\,(n-k)!} , \]
and
\[ u^{(3)}_{2n} = \frac{1}{6^{2n}} \sum_{j,k} \frac{(2n)!}{j!\,j!\,k!\,k!\,(n-j-k)!\,(n-j-k)!} , \]
where the last sum extends over all nonnegative $j$ and $k$ with $j + k \le n$. Also show that this last expression may be rewritten as
\[ \frac{1}{2^{2n}} \binom{2n}{n} \sum_{j,k} \left( \frac{1}{3^n} \frac{n!}{j!\,k!\,(n-j-k)!} \right)^2 . \]

*12 Prove that if $n \ge 0$, then
\[ \sum_{k=0}^{n} \binom{n}{k}^2 = \binom{2n}{n} . \]
Hint: Write the sum as
\[ \sum_{k=0}^{n} \binom{n}{k} \binom{n}{n-k} \]
and explain why this is a coefficient in the product
\[ (1+x)^n (1+x)^n . \]
Use this, together with Exercise 11, to show that
\[ u^{(2)}_{2n} = \frac{1}{4^{2n}} \binom{2n}{n} \sum_{k=0}^{n} \binom{n}{k}^2 = \frac{1}{4^{2n}} \binom{2n}{n}^2 . \]

*13 Using Stirling's Formula, prove that
\[ \binom{2n}{n} \sim \frac{2^{2n}}{\sqrt{\pi n}} . \]

*14 Prove that
\[ \sum_{j,k} \left( \frac{1}{3^n} \frac{n!}{j!\,k!\,(n-j-k)!} \right) = 1 , \]
where the sum extends over all nonnegative $j$ and $k$ such that $j + k \le n$. Hint: Count how many ways one can place $n$ labelled balls in 3 labelled urns.

[4] K. Levasseur, "How to Beat Your Kids at Their Own Game," Mathematics Magazine vol. 61, no. 5 (December, 1988), pp. 301-305.
[5] D. Zagier, "How Often Should You Beat Your Kids?" Mathematics Magazine vol. 63, no. 2 (April 1990), pp. 89-92.
*15 Using the result proved for the random walk in $\mathbf{R}^3$ in Example 12.2, explain why the probability of an eventual return in $\mathbf{R}^n$ is strictly less than one, for all $n \ge 3$. Hint: Consider a random walk in $\mathbf{R}^n$ and disregard all but the first three coordinates of the particle's position.

12.2 Gambler's Ruin

In the last section, the simplest kind of symmetric random walk in $\mathbf{R}^1$ was studied. In this section, we remove the assumption that the random walk is symmetric. Instead, we assume that $p$ and $q$ are nonnegative real numbers with $p + q = 1$, and that the common distribution function of the jumps of the random walk is
\[ f_X(x) = \begin{cases} p, & \text{if } x = 1, \\ q, & \text{if } x = -1. \end{cases} \]
One can imagine the random walk as representing a sequence of tosses of a weighted coin, with a head appearing with probability $p$ and a tail appearing with probability $q$. An alternative formulation of this situation is that of a gambler playing a sequence of games against an adversary (sometimes thought of as another person, sometimes called "the house") where, in each game, the gambler has probability $p$ of winning.

The Gambler's Ruin Problem

The above formulation of this type of random walk leads to a problem known as the Gambler's Ruin problem. This problem was introduced in Exercise 23, but we will give the description of the problem again. A gambler starts with a "stake" of size $s$. She plays until her capital reaches the value $M$ or the value 0. In the language of Markov chains, these two values correspond to absorbing states. We are interested in studying the probability of occurrence of each of these two outcomes.

One can also assume that the gambler is playing against an "infinitely rich" adversary. In this case, we would say that there is only one absorbing state, namely when the gambler's stake is 0. Under this assumption, one can ask for the probability that the gambler is eventually ruined.
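The problem just described can be explored by simulation before any formula is derived. The following is a minimal sketch of the two-barrier version, with function and parameter names that are our own assumptions (`stake` for $s$, `target` for $M$); it plays the game repeatedly and reports the fraction of games that end in ruin.

```python
import random


def play_until_absorbed(stake, target, p, rng):
    """Play one game; return True if the stake hits 0 before reaching target."""
    while 0 < stake < target:
        # The gambler wins one unit with probability p, otherwise loses one unit.
        stake += 1 if rng.random() < p else -1
    return stake == 0


def estimate_ruin_probability(stake, target, p, trials, rng):
    """Fraction of simulated games that end with the gambler ruined."""
    ruined = sum(play_until_absorbed(stake, target, p, rng) for _ in range(trials))
    return ruined / trials


if __name__ == "__main__":
    rng = random.Random(2006)  # fixed seed so that runs are repeatable
    print(estimate_ruin_probability(stake=3, target=10, p=0.5, trials=20000, rng=rng))
```

For a fair game with $s = 3$ and $M = 10$, the estimate should come out close to 0.7, in line with the formula $(M - z)/M$ obtained later in this section. The infinitely rich adversary cannot be simulated directly this way, since a game with $p \ge q$ may never end; taking `target` large only approximates it.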
We begin by defining $q_k$ to be the probability that the gambler's stake reaches 0, i.e., she is ruined, before it reaches $M$, given that the initial stake is $k$. We note that $q_0 = 1$ and $q_M = 0$. The fundamental relationship among the $q_k$'s is the following:
\[ q_k = p\, q_{k+1} + q\, q_{k-1} , \]
where $1 \le k \le M-1$. This holds because if her stake equals $k$, and she plays one game, then her stake becomes $k+1$ with probability $p$ and $k-1$ with probability $q$. In the first case, the probability of eventual ruin is $q_{k+1}$ and in the second case, it is $q_{k-1}$. We note that since $p + q = 1$, we can write the above equation as
\[ p ( q_{k+1} - q_k ) = q ( q_k - q_{k-1} ) , \]
or
\[ q_{k+1} - q_k = \frac{q}{p} ( q_k - q_{k-1} ) . \]
From this equation, it is easy to see that
\[ q_{k+1} - q_k = \left( \frac{q}{p} \right)^k ( q_1 - q_0 ) . \qquad (12.7) \]
We now use telescoping sums to obtain an equation in which the only unknown is $q_1$:
\[ -1 = q_M - q_0 = \sum_{k=0}^{M-1} ( q_{k+1} - q_k ) , \]
so
\[ -1 = \sum_{k=0}^{M-1} \left( \frac{q}{p} \right)^k ( q_1 - q_0 ) = ( q_1 - q_0 ) \sum_{k=0}^{M-1} \left( \frac{q}{p} \right)^k . \]
If $p \ne q$, then the above expression equals
\[ ( q_1 - q_0 ) \frac{(q/p)^M - 1}{(q/p) - 1} , \]
while if $p = q = 1/2$, then we obtain the equation
\[ -1 = ( q_1 - q_0 ) M . \]
For the moment we shall assume that $p \ne q$. Then we have
\[ q_1 - q_0 = - \frac{(q/p) - 1}{(q/p)^M - 1} . \]
Now, for any $z$ with $1 \le z \le M$, we have
\[ q_z - q_0 = \sum_{k=0}^{z-1} ( q_{k+1} - q_k ) = ( q_1 - q_0 ) \sum_{k=0}^{z-1} \left( \frac{q}{p} \right)^k = ( q_1 - q_0 ) \frac{(q/p)^z - 1}{(q/p) - 1} = - \frac{(q/p)^z - 1}{(q/p)^M - 1} . \]
Therefore,
\[ q_z = 1 - \frac{(q/p)^z - 1}{(q/p)^M - 1} = \frac{(q/p)^M - (q/p)^z}{(q/p)^M - 1} . \]
Finally, if $p = q = 1/2$, it is easy to show that (see Exercise 10)
\[ q_z = \frac{M - z}{M} . \]
We note that both of these formulas hold if $z = 0$.

We define, for $0 \le z \le M$, the quantity $p_z$ to be the probability that the gambler's stake reaches $M$ without ever having reached 0. Since the game might continue indefinitely, it is not obvious that $p_z + q_z = 1$ for all $z$. However, one can use the same method as above to show that if $p \ne q$, then
\[ p_z = \frac{(q/p)^z - 1}{(q/p)^M - 1} , \]
and if $p = q = 1/2$, then
\[ p_z = \frac{z}{M} . \]
Thus, for all $z$, it is the case that $p_z + q_z = 1$, so the game ends with probability 1.

Infinitely Rich Adversaries

We now turn to the problem of finding the probability of eventual ruin if the gambler is playing against an infinitely rich adversary. This probability can be obtained by letting $M$ go to $\infty$ in the expression for $q_z$ calculated above. If $q < p$, then the expression approaches $(q/p)^z$, and if $q > p$, the expression approaches 1. In the case $p = q = 1/2$, we recall that $q_z = 1 - z/M$. Thus, if $M \to \infty$, we see that the probability of eventual ruin tends to 1.

Historical Remarks

In 1711, De Moivre, in his book De Mensura Sortis, gave an ingenious derivation of the probability of ruin. The following description of his argument is taken from David.[6] The notation used is as follows: We imagine that there are two players, A and B, and the probabilities that they win a game are $p$ and $q$, respectively. The players start with $a$ and $b$ counters, respectively.

Imagine that each player starts with his counters before him in a pile, and that nominal values are assigned to the counters in the following manner. A's bottom counter is given the nominal value $q/p$; the next is given the nominal value $(q/p)^2$, and so on until his top counter which has the nominal value $(q/p)^a$. B's top counter is valued $(q/p)^{a+1}$, and so on downwards until his bottom counter which is valued $(q/p)^{a+b}$. After each game the loser's top counter is transferred to the top of the winner's pile, and it is always the top counter which is staked for the next game.
Then in terms of the nominal values B's stake is always $q/p$ times A's, so that at every game each player's nominal expectation is nil. This remains true throughout the play; therefore A's chance of winning all B's counters, multiplied by his nominal gain if he does so, must equal B's chance multiplied by B's nominal gain. Thus,
\[ P_a \left( \left( \frac{q}{p} \right)^{a+1} + \cdots + \left( \frac{q}{p} \right)^{a+b} \right) = P_b \left( \left( \frac{q}{p} \right) + \cdots + \left( \frac{q}{p} \right)^{a} \right) . \qquad (12.8) \]
Using this equation, together with the fact that
\[ P_a + P_b = 1 , \]
it can easily be shown that
\[ P_a = \frac{(q/p)^a - 1}{(q/p)^{a+b} - 1} , \]
if $p \ne q$, and
\[ P_a = \frac{a}{a+b} , \]
if $p = q = 1/2$.

In terms of modern probability theory, de Moivre is changing the values of the counters to make an unfair game into a fair game, which is called a martingale. With the new values, the expected fortune of player A (that is, the sum of the nominal values of his counters) after each play equals his fortune before the play (and similarly for player B). (For a simpler martingale argument, see Exercise 9.) De Moivre then uses the fact that when the game ends, it is still fair, thus Equation 12.8 must be true. This fact requires proof, and is one of the central theorems in the area of martingale theory.

[6] F. N. David, Games, Gods and Gambling (London: Griffin, 1962).

Exercises

1 In the gambler's ruin problem, assume that the gambler's initial stake is 1 dollar, and assume that her probability of success on any one game is $p$. Let $T$ be the number of games until 0 is reached (the gambler is ruined). Show that the generating function for $T$ is
\[ h(z) = \frac{1 - \sqrt{1 - 4pqz^2}}{2pz} , \]
and that
\[ h(1) = \begin{cases} q/p, & \text{if } q \le p, \\ 1, & \text{if } q \ge p, \end{cases} \]
and
\[ h'(1) = \begin{cases} 1/(q-p), & \text{if } q > p, \\ \infty, & \text{if } q = p. \end{cases} \]
Interpret your results in terms of the time $T$ to reach 0. (See also Example 10.7.)
2 Show that the Taylor series expansion for $\sqrt{1-x}$ is
\[ \sqrt{1-x} = \sum_{n=0}^{\infty} \binom{1/2}{n} (-1)^n x^n , \]
where the binomial coefficient $\binom{1/2}{n}$ is
\[ \binom{1/2}{n} = \frac{(1/2)(1/2 - 1) \cdots (1/2 - n + 1)}{n!} . \]
Using this and the result of Exercise 1, show that the probability that the gambler is ruined on the $n$th step is
\[ p_T(n) = \begin{cases} \dfrac{(-1)^{k-1}}{2p} \dbinom{1/2}{k} (4pq)^k, & \text{if } n = 2k-1, \\ 0, & \text{if } n = 2k. \end{cases} \]

3 For the gambler's ruin problem, assume that the gambler starts with $k$ dollars. Let $T_k$ be the time to reach 0 for the first time.
(a) Show that the generating function $h_k(t)$ for $T_k$ is the $k$th power of the generating function for the time $T$ to ruin starting at 1. Hint: Let $T_k = U_1 + U_2 + \cdots + U_k$, where $U_j$ is the time for the walk starting at $j$ to reach $j-1$ for the first time.
(b) Find $h_k(1)$ and $h_k'(1)$ and interpret your results.

4 (The next three problems come from Feller.[7]) As in the text, assume that $M$ is a fixed positive integer.
(a) Show that if a gambler starts with a stake of 0 (and is allowed to have a negative amount of money), then the probability that her stake reaches the value of $M$ before it returns to 0 equals $p(1 - q_1)$.
(b) Show that if the gambler starts with a stake of $M$, then the probability that her stake reaches 0 before it returns to $M$ equals $q\, q_{M-1}$.

5 Suppose that a gambler starts with a stake of 0 dollars.
(a) Show that the probability that her stake never reaches $M$ before returning to 0 equals $1 - p(1 - q_1)$.
(b) Show that the probability that her stake reaches the value $M$ exactly $k$ times before returning to 0 equals $p(1 - q_1)(1 - q\, q_{M-1})^{k-1}(q\, q_{M-1})$. Hint: Use Exercise 4.

6 In the text, it was shown that if $q < p$, there is a positive probability that a gambler, starting with a stake of 0 dollars, will never return to the origin.
Thus, we will now assume that $q \ge p$. Using Exercise 5, show that if a gambler starts with a stake of 0 dollars, then the expected number of times her stake equals $M$ before returning to 0 equals $(p/q)^M$, if $q > p$, and 1, if $q = p$. (We quote from Feller: "The truly amazing implications of this result appear best in the language of fair games. A perfect coin is tossed until the first equalization of the accumulated numbers of heads and tails. The gambler receives one penny for every time that the accumulated number of heads exceeds the accumulated number of tails by $m$. The 'fair entrance fee' equals 1 independent of $m$.")

[7] W. Feller, op. cit., pg. 367.

7 In the game in Exercise 6, let $p = q = 1/2$ and $M = 10$. What is the probability that the gambler's stake equals $M$ at least 20 times before it returns to 0?

8 Write a computer program which simulates the game in Exercise 6 for the case $p = q = 1/2$, and $M = 10$.

9 In de Moivre's description of the game, we can modify the definition of player A's fortune in such a way that the game is still a martingale (and the calculations are simpler). We do this by assigning nominal values to the counters in the same way as de Moivre, but each player's current fortune is defined to be just the value of the counter which is being wagered on the next game. So, if player A has $a$ counters, then his current fortune is $(q/p)^a$ (we stipulate this to be true even if $a = 0$). Show that under this definition, player A's expected fortune after one play equals his fortune before the play, if $p \ne q$. Then, as de Moivre does, write an equation which expresses the fact that player A's expected final fortune equals his initial fortune. Use this equation to find the probability of ruin of player A.

10 Assume in the gambler's ruin problem that $p = q = 1/2$.
(a) Using Equation 12.7, together with the facts that $q_0 = 1$ and $q_M = 0$, show that for $0 \le z \le M$,
\[ q_z = \frac{M - z}{M} . \]
(b) In Equation 12.8, let $p \to 1/2$ (and since $q = 1 - p$, $q \to 1/2$ as well). Show that in the limit,
\[ q_z = \frac{M - z}{M} . \]
Hint: Replace $q$ by $1 - p$, and use L'Hopital's rule.

11 In American casinos, the roulette wheels have the integers between 1 and 36, together with 0 and 00. Half of the non-zero numbers are red, the other half are black, and 0 and 00 are green. A common bet in this game is to bet a dollar on red. If a red number comes up, the bettor gets her dollar back, and also gets another dollar. If a black or green number comes up, she loses her dollar.
(a) Suppose that someone starts with 40 dollars, and continues to bet on red until either her fortune reaches 50 or 0. Find the probability that her fortune reaches 50 dollars.
(b) How much money would she have to start with, in order for her to have a 95% chance of winning 10 dollars before going broke?
(c) A casino owner was once heard to remark that "If we took 0 and 00 off of the roulette wheel, we would still make lots of money, because people would continue to come in and play until they lost all of their money." Do you think that such a casino would stay in business?

12.3 Arc Sine Laws

In Exercise 12.1.6, the distribution of the time of the last equalization in the symmetric random walk was determined. If we let $\alpha_{2k,2m}$ denote the probability that a random walk of length $2m$ has its last equalization at time $2k$, then we have
\[ \alpha_{2k,2m} = u_{2k} u_{2m-2k} . \]
We shall now show how one can approximate the distribution of the $\alpha$'s with a simple function.
We recall that
\[ u_{2k} \sim \frac{1}{\sqrt{\pi k}} . \]
Therefore, as both $k$ and $m$ go to $\infty$, we have
\[ \alpha_{2k,2m} \sim \frac{1}{\pi \sqrt{k(m-k)}} . \]
This last expression can be written as
\[ \frac{1}{\pi m \sqrt{(k/m)(1 - k/m)}} . \]
Thus, if we define
\[ f(x) = \frac{1}{\pi \sqrt{x(1-x)}} , \]
for $0 < x < 1$, then we have
\[ \alpha_{2k,2m} \approx \frac{1}{m} f\left( \frac{k}{m} \right) . \]
The reason for the $\approx$ sign is that we no longer require that $k$ get large. This means that we can replace the discrete $\alpha_{2k,2m}$ distribution by the continuous density $f(x)$ on the interval $[0, 1]$ and obtain a good approximation. In particular, if $x$ is a fixed real number between 0 and 1, then we have $\sum_{k}$ [...]

[...] random walk could be viewed as a polygonal line connecting $(0, 0)$ with $(m, S_m)$. Under this interpretation, we define $b_{2k,2m}$ to be the probability that a random walk of length $2m$ has exactly $2k$ of its $2m$ polygonal line segments above the $t$-axis.

The probability $b_{2k,2m}$ is frequently interpreted in terms of a two-player game. (The reader will recall the game Heads or Tails, in Example 1.4.) Player A is said to be in the lead at time $n$ if the random walk is above the $t$-axis at that time, or if the random walk is on the $t$-axis at time $n$ but above the $t$-axis at time $n-1$. (At time 0, neither player is in the lead.) One can ask what is the most probable number of times that player A is in the lead, in a game of length $2m$. Most people will say that the answer to this question is $m$. However, the following theorem says that $m$ is the least likely number of times that player A is in the lead, and the most likely number of times in the lead is 0 or $2m$.

Theorem 12.4 If Peter and Paul play a game of Heads or Tails of length $2m$, the probability that Peter will be in the lead exactly $2k$ times is equal to
\[ \alpha_{2k,2m} . \]
Proof.
To prove the theorem, we need to show that
\[ b_{2k,2m} = \alpha_{2k,2m} . \qquad (12.9) \]
Exercise 12.1.7 shows that $b_{2m,2m} = u_{2m}$ and $b_{0,2m} = u_{2m}$, so we only need to prove that Equation 12.9 holds for $1 \le k \le m-1$. We can obtain a recursion involving the $b$'s and the $f$'s (defined in Section 12.1) by counting the number of paths of length $2m$ that have exactly $2k$ of their segments above the $t$-axis, where $1 \le k \le m-1$. To count this collection of paths, we assume that the first return occurs at time $2j$, where $1 \le j \le m-1$. There are two cases to consider. Either during the first $2j$ outcomes the path is above the $t$-axis or below the $t$-axis. In the first case, it must be true that the path has exactly $(2k - 2j)$ line segments above the $t$-axis, between $t = 2j$ and $t = 2m$. In the second case, it must be true that the path has exactly $2k$ line segments above the $t$-axis, between $t = 2j$ and $t = 2m$.

We now count the number of paths of the various types described above. The number of paths of length $2j$ all of whose line segments lie above the $t$-axis and which return to the origin for the first time at time $2j$ equals $(1/2) 2^{2j} f_{2j}$. This also equals the number of paths of length $2j$ all of whose line segments lie below the $t$-axis and which return to the origin for the first time at time $2j$. The number of paths of length $(2m - 2j)$ which have exactly $(2k - 2j)$ line segments above the $t$-axis is $2^{2m-2j}\, b_{2k-2j,2m-2j}$. Finally, the number of paths of length $(2m - 2j)$ which have exactly $2k$ line segments above the $t$-axis is $2^{2m-2j}\, b_{2k,2m-2j}$. Dividing the resulting count by $2^{2m}$, the total number of paths of length $2m$, we have
\[ b_{2k,2m} = \frac{1}{2} \sum_{j=1}^{k} f_{2j}\, b_{2k-2j,2m-2j} + \frac{1}{2} \sum_{j=1}^{m-k} f_{2j}\, b_{2k,2m-2j} . \]
We now assume that Equation 12.9 is true for $m < n$. Then we have
\begin{align*}
b_{2k,2n} &= \frac{1}{2} \sum_{j=1}^{k} f_{2j}\, \alpha_{2k-2j,2n-2j} + \frac{1}{2} \sum_{j=1}^{n-k} f_{2j}\, \alpha_{2k,2n-2j} \\
&= \frac{1}{2} \sum_{j=1}^{k} f_{2j}\, u_{2k-2j} u_{2n-2k} + \frac{1}{2} \sum_{j=1}^{n-k} f_{2j}\, u_{2k} u_{2n-2j-2k} \\
&= \frac{1}{2} u_{2n-2k} \sum_{j=1}^{k} f_{2j}\, u_{2k-2j} + \frac{1}{2} u_{2k} \sum_{j=1}^{n-k} f_{2j}\, u_{2n-2j-2k} \\
&= \frac{1}{2} u_{2n-2k} u_{2k} + \frac{1}{2} u_{2k} u_{2n-2k} ,
\end{align*}
where the last equality follows from Theorem 12.2. Thus, we have
\[ b_{2k,2n} = \alpha_{2k,2n} , \]
which completes the proof. □

We illustrate the above theorem by simulating 10,000 games of Heads or Tails, with each game consisting of 40 tosses. The distribution of the number of times that Peter is in the lead is given in Figure 12.2, together with the arc sine density.

Figure 12.2: Times in the lead.

We end this section by stating two other results in which the arc sine density appears. Proofs of these results may be found in Feller.[8]

Theorem 12.5 Let $J$ be the random variable which, for a given random walk of length $2m$, gives the smallest subscript $j$ such that $S_j = S_{2m}$. (Such a subscript $j$ must be even, by parity considerations.) Let $r_{2k,2m}$ be the probability that $J = 2k$. Then we have
\[ r_{2k,2m} = \alpha_{2k,2m} . \]
□

[8] W. Feller, op. cit., pp. 93-94.

The next theorem says that the arc sine density is applicable to a wide range of situations. A continuous distribution function $F(x)$ is said to be symmetric if $F(x) = 1 - F(-x)$. (If $X$ is a continuous random variable with a symmetric distribution function, then for any real $x$, we have $P(X \le x) = P(X \ge -x)$.) We imagine that we have a random walk of length $n$ in which each summand has the distribution $F(x)$, where $F$ is continuous and symmetric. The subscript of the first maximum of such a walk is the unique subscript $k$ such that
\[ S_k > S_0,\ \ldots,\ S_k > S_{k-1},\ S_k \ge S_{k+1},\ \ldots,\ S_k \ge S_n . \]
We define the random variable $K_n$ to be the subscript of the first maximum.
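The definition of the first maximum can be made concrete with a short sketch. The names below are ours, and the choice of standard normal steps is only an assumption standing in for a symmetric continuous $F$. The code finds the subscript of the first maximum of a walk and estimates $P(K_n < na)$ for a fixed $a$ between 0 and 1, which can then be compared with $\frac{2}{\pi}\arcsin\sqrt{a}$, the integral of the arc sine density $f$ from 0 to $a$.

```python
import random
from math import asin, pi, sqrt


def first_maximum_index(steps):
    """Subscript k of the first maximum among S_0 = 0, S_1, ..., S_n."""
    best_index, best_value, partial_sum = 0, 0.0, 0.0
    for i, step in enumerate(steps, start=1):
        partial_sum += step
        if partial_sum > best_value:  # strict inequality keeps the earliest maximum
            best_index, best_value = i, partial_sum
    return best_index


def estimate_first_max_before(n, a, trials, rng):
    """Estimate P(K_n < n * a) for walks with standard normal steps (an assumption)."""
    hits = 0
    for _ in range(trials):
        steps = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if first_maximum_index(steps) < n * a:
            hits += 1
    return hits / trials


def arc_sine_cdf(a):
    """Integral of f(x) = 1 / (pi * sqrt(x (1 - x))) from 0 to a."""
    return (2 / pi) * asin(sqrt(a))


if __name__ == "__main__":
    rng = random.Random(12)  # fixed seed so that runs are repeatable
    print(estimate_first_max_before(n=100, a=0.25, trials=3000, rng=rng), arc_sine_cdf(0.25))
```

With $a = 0.25$ the comparison value is exactly 1/3; for moderate $n$ the estimate should be near this value, up to sampling error and a small finite-$n$ discrepancy.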
We can now state the following theorem concerning the random variable $K_n$.

Theorem 12.6 Let $F$ be a symmetric continuous distribution function, and let $\alpha$ be a fixed real number strictly between 0 and 1. Then as $n \to \infty$, we have
\[ P(K_n < n\alpha) \to \frac{2}{\pi} \arcsin \sqrt{\alpha} . \]
□

A version of this theorem that holds for a symmetric random walk can also be found in Feller.

Exercises

1 For a random walk of length $2m$, define $\epsilon_k$ to equal 1 if $S_k > 0$, or if $S_{k-1} = 1$ and $S_k = 0$. Define $\epsilon_k$ to equal $-1$ in all other cases. Thus, $\epsilon_k$ gives the side of the $t$-axis that the random walk is on during the time interval $[k-1, k]$. A "law of large numbers" for the sequence $\{\epsilon_k\}$ would say that for any $\delta > 0$, we would have
\[ P\left( -\delta < \frac{\epsilon_1 + \epsilon_2 + \cdots + \epsilon_n}{n} < \delta \right) \to 1 \]
as $n \to \infty$. Even though the $\epsilon$'s are not independent, the above assertion certainly appears reasonable. Using Theorem 12.4, show that if $-1 \le x \le 1$, then
\[ \lim_{n \to \infty} P\left( \frac{\epsilon_1 + \epsilon_2 + \cdots + \epsilon_n}{n} < x \right) = \frac{2}{\pi} \arcsin \sqrt{\frac{1+x}{2}} . \]

2 Given a random walk $W$ of length $m$, with summands
\[ \{ X_1, X_2, \ldots, X_m \} , \]
define the reversed random walk to be the walk $W^*$ with summands
\[ \{ X_m, X_{m-1}, \ldots, X_1 \} . \]
(a) Show that the $k$th partial sum $S^*_k$ satisfies the equation
\[ S^*_k = S_m - S_{m-k} , \]
where $S_k$ is the $k$th partial sum for the random walk $W$.
(b) Explain the geometric relationship between the graphs of a random walk and its reversal. (It is not in general true that one graph is obtained from the other by reflecting in a vertical line.)
(c) Use parts (a) and (b) to prove Theorem 12.5.
RANDOM W ALKS PAGE 507 App endices 499 PAGE 508 500 APPENDICES NA (0,d) = area of rshaded region 0 d .00 .01 .02 .03 .04 .05 .06 .07 .08 .09 0.0 .0000 .0040 .0080 .0120 .0160 .0199 .0239 .0279 0319 .0359 0.1 .0398 .0438 .0478 .0517 .0557 .0596 .0636 .0675 0714 .0753 0.2 .0793 .0832 .0871 .0910 .0948 .0987 .1026 .1064 1103 .1141 0.3 .1179 .1217 .1255 .1293 .1331 .1368 .1406 .1443 1480 .1517 r 0.4 .1554 .1591 .1628 .1664 .1700 .1736 .1772 .1808 1844 .1879 r 0.5 .1915 .1950 .1985 .2019 .2054 .2088 .2123 .2157 2190 .2224 r 0.6 .2257 .2291 .2324 .2357 .2389 .2422 .2454 .2486 2517 .2549 r 0.7 .2580 .2611 .2642 .2673 .2704 .2734 .2764 .2794 2823 .2852 r 0.8 .2881 .2910 .2939 .2967 .2995 .3023 .3051 .3078 3106 .3133 r 0.9 .3159 .3186 .3212 .3238 .3264 .3289 .3315 .3340 3365 .3389 r 1.0 .3413 .3438 .3461 .3485 .3508 .3531 .3554 .3577 3599 .3621 r 1.1 .3643 .3665 .3686 .3708 .3729 .3749 .3770 .3790 3810 .3830 r 1.2 .3849 .3869 .3888 .3907 .3925 .3944 .3962 .3980 3997 .4015 r 1.3 .4032 .4049 .4066 .4082 .4099 .4115 .4131 .4147 4162 .4177 1.4 .4192 .4207 .4222 .4236 .4251 .4265 .4279 .4292 4306 .4319r 1.5 .4332 .4345 .4357 .4370 .4382 .4394 .4406 .4418 4429 .4441r 1.6 .4452 .4463 .4474 .4484 .4495 .4505 .4515 .4525 4535 .4545r 1.7 .4554 .4564 .4573 .4582 .4591 .4599 .4608 .4616 4625 .4633r 1.8 .4641 .4649 .4656 .4664 .4671 .4678 .4686 .4693 4699 .4706 r 1.9 .4713 .4719 .4726 .4732 .4738 .4744 .4750 .4756 .4761 .4767r 2.0 .4772 .4778 .4783 .4788 .4793 .4798 .4803 .4808 4812 .4817r 2.1 .4821 .4826 .4830 .4834 .4838 .4842 .4846 .4850 4854 .4857 r 2.2 .4861 .4864 .4868 .4871 .4875 .4878 .4881 .4884 4887 .4890 r 2.3 .4893 .4896 .4898 .4901 .4904 .4906 .4909 .4911 4913 .4916r 2.4 .4918 .4920 .4922 .4925 .4927 .4929 .4931 .4932 .4934 .4936r 2.5 .4938 .4940 .4941 .4943 .4945 .4946 .4948 .4949 4951 .4952r 2.6 .4953 .4955 .4956 .4957 .4959 .4960 .4961 .4962 4963 .4964 r 2.7 .4965 .4966 .4967 .4968 .4969 .4970 .4971 .4972 4973 .4974 r 2.8 .4974 .4975 .4976 .4977 .4977 .4978 .4979 .4979 
4980 .4981r 2.9 .4981 .4982 .4982 .4983 .4984 .4984 .4985 .4985 .4986 .4986r 3.0 .4987 .4987 .4987 .4988 .4988 .4989 .4989 .4989 4990 .4990r 3.1 .4990 .4991 .4991 .4991 .4992 .4992 .4992 .4992 4993 .4993r 3.2 .4993 .4993 .4994 .4994 .4994 .4994 .4994 .4995 4995 .4995 r 3.3 .4995 .4995 .4995 .4996 .4996 .4996 .4996 .4996 4996 .4997r 3.4 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4997 4997 .4998r 3.5 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998 4998 .4998r 3.6 .4998 .4998 .4999 .4999 .4999 .4999 .4999 .4999 4999 .4999r 3.7 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 4999 .4999 r 3.8 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 4999 .4999r 3.9 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000 5000 .5000 Appendix A Normal distribution table PAGE 509 APPENDICES 501 Above . . . 1 3 4 5 72.5 . 1 2 1 2 7 2 4 19 6 72.2 71.5 . 1 3 4 3 5 10 4 9 2 2 43 11 69.9 70.5 1 1 1 1 3 12 18 14 7 4 3 3 6 8 22 69.5 69.5 1 16 4 17 27 20 33 25 20 11 4 5 183 41 68.9 68.5 1 7 11 16 25 31 34 48 21 18 4 3 219 49 68.2 67.5 3 5 14 15 36 3 8 28 38 19 11 4 211 33 67.6 66.5 3 3 5 2 17 17 14 13 4 . 78 20 67.2 65.5 1 9 5 7 11 11 7 7 5 2 1 66 12 66.7 64.5 1 1 4 4 1 5 5 2 . 23 5 65.8 Below 1 2 4 1 2 2 1 1 . 14 1 Totals 5 7 32 59 48 117 138 120 167 99 64 41 17 14 928 205 Medians 66.3 67.8 67.9 67.7 67.9 68.3 68.5 69.0 69.0 70.0 . Heights of the Midparentsin inches. Below 62.2 63.2 64.2 65.2 66.2 67.2 68.2 69 .2 70.2 71.2 72.2 73.2 Above children. Heights of the adult children. Total number of Adult Midparents. Medians Number of adult children of various statures born of 205 midparents of various statures. (All female heights have been multiplied by 1.08) Note. In calculating the Medians, the entries have been taken as referring to the middle of the squares in which they stand. The reason why the headings run 62.2, 63.2, &c., instead of 62.5, 63.5, &c., is that the ob servations are unequally distributed between 62 and 63, 63 and 64, &c., there being a strong bias in favour of integral inches. 
After careful consideration, I concluded that the headings, as adopted, best satisfied the conditions. This inequality was not apparent in the case of the Midparents.

Source: F. Galton, "Regression towards Mediocrity in Hereditary Stature", Royal Anthropological Institute of Great Britain and Ireland, vol. 15 (1885), p. 248.

Appendix B

PAGE 510

Appendix C Life Table

Number of survivors at single years of Age, out of 100,000 Born Alive, by Race and Sex: United States, 1990

            All races                           All races
Age  Both sexes    Male  Female     Age  Both sexes    Male  Female
  0      100000  100000  100000      43       94707   92840   96626
  1       99073   98969   99183      44       94453   92505   96455
  2       99008   98894   99128      45       94179   92147   96266
  3       98959   98840   99085      46       93882   91764   96057
  4       98921   98799   99051      47       93560   91352   95827
  5       98890   98765   99023      48       93211   90908   95573
  6       98863   98735   99000      49       92832   90429   95294
  7       98839   98707   98980      50       92420   89912   94987
  8       98817   98680   98962      51       91971   89352   94650
  9       98797   98657   98946      52       91483   88745   94281
 10       98780   98638   98931      53       90950   88084   93877
 11       98765   98623   98917      54       90369   87363   93436
 12       98750   98608   98902      55       89735   86576   92955
 13       98730   98586   98884      56       89045   85719   92432
 14       98699   98547   98862      57       88296   84788   91864
 15       98653   98485   98833      58       87482   83777   91246
 16       98590   98397   98797      59       86596   82678   90571
 17       98512   98285   98753      60       85634   81485   89835
 18       98421   98154   98704      61       84590   80194   89033
 19       98323   98011   98654      62       83462   78803   88162
 20       98223   97863   98604      63       82252   77314   87223
 21       98120   97710   98555      64       80961   75729   86216
 22       98015   97551   98506      65       79590   74051   85141
 23       97907   97388   98456      66       78139   72280   83995
 24       97797   97221   98405      67       76603   70414   82772
 25       97684   97052   98351      68       74975   68445   81465
 26       97569   96881   98294      69       73244   66364   80064
 27       97452   96707   98235      70       71404   64164   78562
 28       97332   96530   98173      71       69453   61847   76953
 29       97207   96348   98107      72       67392   59419   75234
 30       97077   96159   98038      73       65221   56885   73400
 31       96941   95962   97965      74       62942   54249   71499
 32       96800   95785   97887      75       60557   51519   69376
 33       96652   95545   97804      76       58069   48704   67178
 34       96497   95322   97717      77       55482   45816   64851
 35       96334   95089   97624      78       52799   42867   62391
 36       96161   94843   97525      79       50026   39872   59796
 37       95978   94585   97419      80       47168   36848   57062
 38       95787   94316   97306      81       44232   33811   54186
 39       95588   94038   97187      82       41227   30782   51167
 40       95382   93753   97061      83       38161   27782   48002
 41       95168   93460   96926      84       35046   24834   44690
 42       94944   93157   96782      85       31892   21962   41230

PAGE 511

Index

π, estimation of, 43-46
n!, 80
absorbing Markov chain, 416
absorbing state, 416
AbsorbingChain (program), 421
absorption probabilities, 420
Ace, Mr., 241
Ali, 178
alleles, 348
AllPermutations (program), 84
ANDERSON, C. L., 157
annuity, 246
  life, 247
  terminal, 247
arc sine laws, 493
area, estimation of, 42
Areabargraph (program), 46
asymptotically equal, 81
Baba, 178
babies, 14, 250
Banach's Matchbox, 255
BAR-HILLEL, M., 176
BARNES, B., 175
BARNHART, R., 11
BAYER, D., 120
Bayes (program), 147
Bayes probability, 136
Bayes' formula, 146
BAYES, T., 149
beard, 153
bell-shaped, 47
Benford distribution, 195
BENKOSKI, S., 40
Bernoulli trials process, 96
BERNOULLI, D., 227
BERNOULLI, J., 113, 149, 310-312
Bertrand's paradox, 47-50
BERTRAND, J., 49, 181
BertrandsParadox (program), 49
beta density, 168
BIENAYMÉ, I., 310, 377
BIGGS, N. L., 85
binary expansion, 69
binomial coefficient, 93
binomial distribution, 99, 184
  approximating a, 329
Binomial Theorem, 103
BinomialPlot (program), 99
BinomialProbabilities (program), 98
Birthday (program), 78
birthday problem, 77
blackjack, 247, 253
blood test, 254
Bose-Einstein statistics, 107
Box paradox, 181
BOX, G. E. P., 213
boxcars, 27
BRAMS, S., 179, 182
Branch (program), 381
branching process, 376
  customer, 393
BranchingSimulation (program), 386
bridge, 181, 182, 199, 203, 287
BROWN, B. H., 38
BROWN, E., 425
Buffon's needle, 44-46, 51-53
BUFFON, G. L., 9, 44, 50-51
BuffonsNeedle (program), 45
bus paradox, 164
calenda