Citation |

- Permanent Link:
- https://ufdc.ufl.edu/UF00097460/00001
## Material Information
- Title:
- Population density estimation using line transect sampling /
- Creator:
- Ondrasik, John Anthony, 1951-
- Place of Publication:
- Gainesville, Fla.
- Publisher:
- University of Florida
- Publication Date:
- 1979
- Copyright Date:
- 1979
- Language:
- English
- Physical Description:
- viii, 92 leaves : ill. ; 28 cm.
## Subjects
- Subjects / Keywords:
- Approximation ( jstor )
- Density estimation ( jstor )
- Estimation methods ( jstor )
- Expected values ( jstor )
- Maximum likelihood estimations ( jstor )
- Population density ( jstor )
- Population estimates ( jstor )
- Random variables ( jstor )
- Statistical discrepancies ( jstor )
- Wildlife population estimation ( jstor )
- Animal populations -- Mathematical models ( lcsh )
- Dissertations, Academic -- Statistics -- UF
- Plant populations -- Mathematical models ( lcsh )
- Statistics thesis Ph. D
- Genre:
- bibliography ( marcgt )
- non-fiction ( marcgt )
## Notes
- Thesis:
- Thesis--University of Florida.
- Bibliography:
- Bibliography: leaves 90-91.
- Additional Physical Form:
- Also available on World Wide Web
- General Note:
- Typescript.
- General Note:
- Vita.
- Statement of Responsibility:
- by John A. Ondrasik.
## Record Information
- Source Institution:
- University of Florida
- Holding Location:
- University of Florida
- Rights Management:
- Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
- Resource Identifier:
- 023347944 ( AlephBibNum )
- 06591994 ( OCLC )
- AAL3076 ( NOTIS )
Full Text |

POPULATION DENSITY ESTIMATION USING LINE TRANSECT SAMPLING

BY JOHN A. ONDRASIK

A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA 1979

To Toni For Her Love and Support

ACKNOWLEDGMENTS

I would like to thank my adviser, Dr. P. V. Rao, for his guidance and assistance throughout the course of this research. His patience and thoughtful advice during the writing of this dissertation are sincerely appreciated. I would also like to thank Dr. Dennis D. Wackerly for the help and encouragement that he provided during my years at the University of Florida.

Special thanks go to my family for the moral support they provided during the pursuit of this degree. I am especially grateful to my wife, Toni, whose love and understanding made it possible for me to finish this project. Her patience and sacrifices will never be forgotten.

Finally, I want to express my thanks to Mrs. Edna Larrick for her excellent job of typing this manuscript despite the time constraints involved.

TABLE OF CONTENTS

- ACKNOWLEDGMENTS
- LIST OF TABLES
- ABSTRACT
- CHAPTER I. INTRODUCTION
  - 1.1 Literature Review
  - 1.2 Density Estimation Using Line Transects
  - 1.3 Summary of Results
- CHAPTER II. DENSITY ESTIMATION USING THE INVERSE SAMPLING PROCEDURE
  - 2.1 Introduction
  - 2.2 A General Model Based on Right Angle Distances and Transect Length
    - 2.2.1 Assumptions
    - 2.2.2 Derivation of the Likelihood Function
  - 2.3 A Parametric Density Estimate
    - 2.3.1 Maximum Likelihood Estimate for D
    - 2.3.2 Unbiased Estimate for D
    - 2.3.3 Variance of $\hat{D}_u$
    - 2.3.4 Sample Size Determination Using $\hat{D}_u$
  - 2.4 Nonparametric Density Estimate
    - 2.4.1 The Nonparametric Model for Estimating D
    - 2.4.2 An Estimate for $f_Y(0)$
    - 2.4.3 Approximations for the Mean and Variance of $\hat{f}_Y(0)$
    - 2.4.4 A Monte Carlo Study
    - 2.4.5 The Expected Value and Variance for a Nonparametric Estimate of D
    - 2.4.6 Sample Size Determination Using $\hat{D}_N$
- CHAPTER III. DENSITY ESTIMATION BASED ON A COMBINATION OF INVERSE AND DIRECT SAMPLING
  - 3.1 Introduction
  - 3.2 Gates Estimate
    - 3.2.1 The Mean and Variance of $\hat{D}_g$
  - 3.3 Expected Value of $\hat{D}_{CP}$
  - 3.4 Variance of $\hat{D}_{CP}$
  - 3.5 Maximum Likelihood Justification for $\hat{D}_{CP}$
- CHAPTER IV. DENSITY ESTIMATION FOR CLUSTERED POPULATIONS
  - 4.1 Introduction
  - 4.2 Assumptions
  - 4.3 General Form of the Likelihood Function
  - 4.4 Estimation of D when $p(\cdot)$ and $h(\cdot)$ Have Specific Forms
  - 4.5 A Worked Example
- BIBLIOGRAPHY
- BIOGRAPHICAL SKETCH

LIST OF TABLES

- Table 1. Number of animals, $N_0$, that must be sighted to guarantee the estimate $\hat{D}_u$ has coefficient of variation $CV(\hat{D}_u)$
- Table 2. Forms proposed for the function $g(y)$
- Table 3. Results of Monte Carlo Study using $g_1(y) = e^{-10y}$
- Table 4. Results of Monte Carlo Study using $g_2(y) = -y$
- Table 5. Results of Monte Carlo Study using $g_3(y) = 1-y$
- Table 6. Number of animals, $N_0$, that must be sighted to guarantee the estimate $\hat{D}_N$ has coefficient of variation $CV(\hat{D}_N)$

Abstract of Dissertation Presented to the Graduate Council of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

POPULATION DENSITY ESTIMATION USING LINE TRANSECT SAMPLING

By John A. Ondrasik

December 1979

Chairman: Pejaver V. Rao

Major Department: Statistics

The use of line transect methods in estimating animal and plant population densities has recently been receiving increased attention in the literature.
Many of the density estimates which are currently available are based only on the right angle distances from the sighted objects to a randomly placed transect of known length. This type of sampling, wherein an observer is required to travel along a line transect of some predetermined length, will be referred to as the direct sampling method. In contrast, one can use an inverse sampling plan which will allow the observer to terminate sampling as soon as he sights a prespecified number of animals. An obvious advantage of an inverse sampling plan is that sampling is terminated as soon as the required number of objects are sighted. A disadvantage is the possibility that sampling may not terminate in any reasonable period of time. Consequently, a third sampling plan, in which sampling stops as soon as either a prespecified number of objects are sighted or a prespecified length of the transect is traversed, is of practical interest. Such a sampling plan will be referred to as the combined sampling method. The objective of this dissertation is to develop density estimation techniques suitable for both inverse and combined sampling plans.

In Chapter II, both a parametric and a nonparametric estimate for the population density are developed using the inverse sampling approach. We will show that a primary advantage of estimation using inverse sampling is the fact that these estimates can be expressed as the product of two independent random variables. This representation not only enables us to obtain the expected value and variance of our estimates easily, but also leads to a simple criterion for sample size determination.

In Chapter III, we derive a parametric density estimate that is suitable for the combined sampling method. This estimate will be shown to be asymptotically unbiased. An approximation to the variance of this estimate is also provided.

The density estimates developed in Chapters II and III are based on the assumption that the sightings of animals are independent events. In Chapter IV we relax this assumption and develop an estimation procedure using inverse sampling that can be applied to clustered populations, that is, populations composed of small groups or "clusters" of objects.

CHAPTER I

INTRODUCTION

1.1 Literature Review

Our objective in this dissertation is to examine the problem of density estimation in animal and plant populations. The demand for new and more efficient population density estimates has grown quite rapidly in the past few years. Anderson et al. (1976, p. 1) give a good assessment of the present situation and provide some reasons for the renewed interest in this subject in the following paragraph:

The need to accurately and precisely estimate the size or density of biological populations has increased dramatically in recent years. This has been due largely to ecological problems created by the effects of man's rapidly increasing population. Within the past decade, we have witnessed numerous data gathering activities related to the Environmental Impact Statement (NEPA) process or Biological Monitoring programs. Environmental programs related to phosphate, uranium and coal mining and the extraction of shale oil typically require estimates of the size or density of biological populations. The Endangered Species Act has focused attention on the lack of techniques to estimate population size. It now appears that hundreds of species of plants may be protected under the Act, and, therefore, we will need information on the size of some plant populations. Estimation of the size of biological populations was a major objective of the International Biological Program (IBP) (Smith et al. 1975).
Finally, we mention that the ability to estimate population size or density is fundamental to efficient wildlife and habitat management and many important studies in basic ecological research.

The estimation of population size has always been a very interesting and complex problem. For a recent review of the general subject area see Seber (1973). Although many of the methods described in Seber's book are quite useful, they are frequently very expensive and time consuming. Estimation methods based on capture-recapture studies would fall into this category. A further problem with many estimation methods is that they are based on models requiring very restrictive assumptions which severely limit their use in analyzing and interpreting the data. For these reasons and others, line transect sampling schemes are becoming more and more popular. This method of sampling requires an observer to travel along a line transect that has been randomly placed through the area containing the population under study and to record certain measurements whenever a member of the population is sighted. There are several density estimation techniques available using line transect data; however, the full potential is yet to be realized.

Density estimation through line transects is typically practical, rapid and inexpensive for a wide variety of populations. Published references to line transect studies date back to the method used by King (see Leopold, 1933) in the estimation of ruffed grouse populations. Since that time, numerous papers investigating line transect models have appeared, e.g., Webb (1942), Hayne (1949), Robinette et al. (1954), Gates et al. (1968), Anderson and Pospahala (1970), Sen et al. (1974), Burnham and Anderson (1976) and Crain et al. (1978).
Since it is commonly assumed by these authors that the objects being sampled are fixed with respect to the transect, line transect models are best suited for either immotile populations, flushing populations (populations where the animal being observed makes a conspicuous response upon the approach of the observer) or slow moving populations. Examples of such populations are: (i) immotile birds' nests, dead deer and plants, (ii) flushing grouse, pheasants and quail, and (iii) slow moving desert tortoise and gila monster. The degree to which line transect methods can be applied to more motile populations, such as deer and hare, will depend on the degree to which the basic assumptions are met. In any case, one should proceed cautiously when using these models for motile populations.

Despite the wide applicability of line transect methods, the estimation problem has only recently begun to receive rigorous treatment and attention from a statistical standpoint. Gates et al. (1968) were the first to develop a density estimation procedure within a standard statistical framework. After making certain assumptions with regard to the probability of sighting an animal located at a given right angle distance from the transect, they rigorously derived a population density estimate. In addition, they were the first authors to provide an explicit form for the approximate sampling variance of their density estimate.

While the assumptions of Gates et al. (1968) concerning the probability of sighting an animal did work well for the ruffed grouse populations they were studying, it is clear that the validity of their assumptions will be quite crucial in establishing the validity of their density estimates. If the collected data fail to substantiate their assumptions, large biases could occur in the estimates as seen in Robinette et al. (1974). As a result, Sen et al. (1974) and Pollock (1978) relaxed the assumptions of Gates et al.
(1968) by using more general forms for the sighting probability, while Burnham and Anderson (1976) developed a nonparametric approach as a means of providing a more robust estimation procedure.

In the following sections, we will outline the general problem of density estimation using line transects, give our approach to the solution of this problem and summarize the results found in the remainder of this work.

1.2 Density Estimation Using Line Transects

The line transect method is simply a means of sampling from some unknown population of objects that are spatially distributed. In the context of animal or plant population density estimation, these objects take the form of mammals, birds, plants, nests, etc., which are distributed over a particular area of interest. From this point on, our references will always be to animal populations with the understanding that the estimation methods we describe are applicable to all populations which satisfy the necessary assumptions.

In the line transect sampling procedure, a line is randomly placed across an area, $A$, that contains the unknown population of interest. An observer follows the transect and records one or more of the following three pieces of information for each animal sighted:

(i) The radial distance, $r$, from the observer to the animal.

(ii) The right angle distance, $y$, from the animal to the line transect.

(iii) The sighting angle, $\theta$, between the line transect and the line joining the observer to the point at which the animal is sighted.

These measurements are illustrated in Figure 1.

Figure 1. Measurements recorded using line transect sampling. (Z is the position of an observer when an animal is sighted at X. XP is the line from the animal perpendicular to the transect.)

In this work, we shall consider the problem of estimating population density using only the right angle distances.
Because estimates depending only on right angle distances are easy and economical to use, such estimates have become very popular over the past several years.

Before any estimation procedure based on right angle distances can be formulated, certain assumptions regarding the population of interest must be made. A set of assumptions used by several workers in the area is detailed in Section 2.2.1. One of the key assumptions in this set is that the probability of sighting an animal located at a right angle distance, $y$, from the transect can be represented by some nonincreasing function $g(y)$, which satisfies the equality $g(0) = 1$. This function is simply a mathematical tool for dealing with the fact that animals located closer to the line transect will be seen more readily than animals located further away from the transect. An alternative method of dealing with this phenomenon is given by Anderson and Pospahala (1970).

If $g(y)$ is assumed to have some specific functional form determined by some unknown parameters, then the estimate is said to be parametric. On the other hand, if $g(y)$ is left unspecified except for the requirements that it is nonincreasing and $g(0) = 1$, then the estimate is said to be nonparametric.

Seber (1973) has shown that any density estimate based on right angle distances will have the form

$$\hat{D}_s = \frac{N}{2 L_0 \hat{c}},$$

where $N$ is a random variable representing the number of animals seen in a line transect of length $L_0$ and $\hat{c}$ is an estimate for $c$, a parameter which depends on $g(y)$ through the relation

$$c = \int_0^{\infty} g(y)\,dy.$$

By noting that the density is simply the number of animals present per unit of area, it is clear that $c$ can be interpreted as one-half of the effective width of the strip actually covered by the observer as he moves along the transect. Further examination of $\hat{D}_s$ also points out that estimating the parameter $c$ is the key to the estimation problem.
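
As a rough illustration of how $c$ enters the estimate, the sketch below evaluates $c = \int_0^\infty g(y)\,dy$ numerically and plugs it into the general form $N/(2L_0c)$. The exponential shape of `g`, the decay rate `LAM`, the counts and the transect length are all assumptions made up for this example, and the function names are hypothetical; in practice $c$ would have to be estimated from the observed right angle distances rather than computed from a known $g$.

```python
import math

def half_width(g, upper, steps=10_000):
    """Trapezoidal approximation of c = integral of g(y) over [0, upper]."""
    h = upper / steps
    total = 0.5 * (g(0.0) + g(upper))
    for i in range(1, steps):
        total += g(i * h)
    return total * h

def seber_density(n_sighted, transect_length, c_hat):
    """General form D = N / (2 * L0 * c) for a transect of fixed length L0."""
    return n_sighted / (2.0 * transect_length * c_hat)

# Hypothetical inputs, chosen only for illustration.
LAM = 10.0                        # assumed decay rate of g(y) = exp(-LAM * y)
g = lambda y: math.exp(-LAM * y)  # nonincreasing, with g(0) = 1

# The upper limit 5.0 stands in for infinity; g is negligible beyond it.
c_hat = half_width(g, upper=5.0)
d_hat = seber_density(n_sighted=40, transect_length=2.0, c_hat=c_hat)
print(f"c = {c_hat:.4f}, D = {d_hat:.2f}")
```

For this particular $g$ the integral has the closed form $1/\lambda$, which gives a simple check on the numerical value.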

At this time, we would like to point out that the range for the right angle distance, $y$, is allowed to go from $0$ to $+\infty$, as seen in the integral on the right hand side of the equation for $c$. In practice, since we are considering only a finite area, $A$, there will most certainly be a maximum observation distance, $W$, perpendicular to the transect. However, if $W$ is large enough so that the approximation

$$\int_0^{W} g(y)\,dy \approx \int_0^{\infty} g(y)\,dy \tag{1.1}$$

is reasonable, then letting $y$ range in the interval $(0, +\infty)$ will not cause any real problems. In practical terms, this means that the probability of observing an animal located beyond the boundary, $W$, should be essentially zero. In most real life situations, $W$ can be chosen large enough so that the approximation given in (1.1) is valid. Thus, in the chapters which follow, we will implicitly assume that relation (1.1) holds for the density estimates that we develop.

Both parametric and nonparametric models have been used to derive an estimate for the parameter $c$, and, consequently, for the population density. In both cases, the estimate for $c$ turns out to be a function of the observed right angle distances. In the parametric case, $\hat{c}$ will simply be a function of the parameters that define the function chosen for $g(y)$. Examples of parametric estimates are found in Gates et al. (1968), Sen et al. (1974) and Pollock (1978).

Estimation using the nonparametric model is more complicated. Burnham and Anderson (1976) have shown that estimating $1/c$ is equivalent to estimating $f_Y(0)$, where $f_Y(\cdot)$ is the conditional probability density function for right angle distance given an animal is sighted. Thus, the problem of finding a nonparametric estimate for the population density reduces to the problem of estimating a density function at a given point. Unfortunately, this problem has not received much attention in the literature.
Burnham and Anderson (1976) suggest four possible estimates for $f_Y(0)$, but the sampling variances associated with these estimates have not been established. Crain et al. (1978) have also considered the problem of estimating $f_Y(0)$. They derive an estimate using a Fourier series expansion to approximate the conditional probability density function $f_Y(y)$. Although their procedure does not lead to a simple estimate, they do provide an approximation to its sampling variance.

The line transect method and the corresponding population density estimates so far described require the observer to travel a predetermined distance, $L_0$, along the transect. This method will be called the direct sampling method. An alternative to the direct method is the inverse sampling method, wherein sampling is terminated as soon as a specified number, $N_0$, of animals are sighted. Clearly, in the direct method, the number of animals seen is a random variable and the total length travelled is a fixed quantity, while in the inverse sampling method, the total length travelled is the random variable and the number of animals that must be seen is fixed. The main focus of this work will be to develop density estimation techniques that are based on the inverse sampling method. In addition, we will consider the density estimation problem when a combination of the inverse and direct sampling plans is used.

1.3 Summary of Results

In Chapter II we derive two estimates for the population density, $D$, using an inverse sampling scheme. The set of assumptions which justify the use of these estimates is similar to those used by Gates et al. (1968) and several others. The estimates have the form

$$\hat{D}_I = \frac{N_0}{2 L \hat{c}},$$

where $N_0$ is the number of animals that must be seen before sampling terminates, $L$ is a random variable representing the length travelled on the transect and $\hat{c}$ is as previously defined. Note the similarity of $\hat{D}_I$ to $\hat{D}_s$ given in Section 1.2.
The only difference between the two estimates is that in $\hat{D}_I$ the random variables are $L$ and $\hat{c}$, while in $\hat{D}_s$ they are $N$ and $\hat{c}$. However, this difference gives the inverse sampling method a theoretical advantage over the direct sampling method. The random variables $L$ and $\hat{c}$ will be seen to be independent while $N$ and $\hat{c}$ are not. Thus, the estimate $\hat{D}_I$ is the product of two independent random variables, a fact which not only allows us to obtain its expected value and variance easily, but also leads to a simple criterion for sample size determination.

Both a parametric and a nonparametric estimate for the animal population density are developed in Chapter II. In deriving the parametric estimate, the functional form assumed for $g(y)$ is identical to the one used by Gates et al. (1968). Our parametric density estimate is shown to be unbiased and the exact variance of this estimate is also provided. In the nonparametric case we propose an estimate for $f_Y(0)$ using the method developed by Loftsgaarden and Quesenberry (1965). We then use heuristic arguments to show that the corresponding density estimate is asymptotically unbiased, and derive a large sample approximation for its variance.

The inverse sampling method does have one drawback when there is little information available concerning the population to be studied, namely, there exists the possibility that an observer might have to cover a very long transect to sight $N_0$ animals. To overcome this problem, we develop a parametric density estimate in Chapter III that is based on a combination of the inverse and the direct sampling procedures. In the combined sampling scheme, sampling is terminated when either a prespecified number, $N_0$, of animals are sighted or when a prespecified length, $L_0$, has been travelled along the transect. Thus, in combined sampling both the length travelled and the number of animals seen will be random variables.
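
The combined stopping rule can be sketched in a few lines. This is only an illustration under an added assumption: sightings along the transect are taken to occur as a Poisson process with some rate `theta` per unit length, so the gaps between sightings are exponential. The function name and the numerical values are hypothetical.

```python
import random

def combined_sample(n0, l0, theta, rng):
    """Return (animals sighted, length travelled) under the combined rule:
    stop at the n0-th sighting or at length l0, whichever comes first."""
    sighted = 0
    position = 0.0
    while sighted < n0:
        position += rng.expovariate(theta)  # distance to the next sighting
        if position >= l0:
            return sighted, l0              # length limit reached first
        sighted += 1
    return sighted, position                # count limit reached first

rng = random.Random(1979)                   # fixed seed so runs are repeatable
runs = [combined_sample(n0=20, l0=5.0, theta=3.0, rng=rng) for _ in range(1000)]
stopped_by_count = sum(1 for n, _ in runs if n == 20)
print(f"{stopped_by_count} of {len(runs)} runs stopped on the animal count")
```

Each run ends with either the count equal to `n0` or the length equal to `l0`, so neither quantity is fixed in advance, which is the sense in which both are random variables under this plan.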
In deriving the density estimate based on the combined sampling method, we again use the functional form for $g(y)$ proposed by Gates et al. (1968). This estimate is shown to be asymptotically unbiased. In addition, an approximate variance for this density estimate is provided.

The density estimates developed in Chapters II and III are based on the assumption that the sightings of animals will be independent events. Gates et al. (1968) showed that this assumption failed to hold for the animal population they were studying. In Chapter IV we relax this assumption, and develop an estimate based on inverse sampling that can be applied to clustered populations, that is, populations in which the animals aggregate into small groups or "clusters." Since the estimation procedure developed will require the use of a high-speed computer, the last section of Chapter IV is devoted to a worked example to illustrate the computations that would be involved.

CHAPTER II

DENSITY ESTIMATION USING THE INVERSE SAMPLING PROCEDURE

2.1 Introduction

In this chapter we shall propose estimates for animal population density based on an inverse sampling procedure. Unlike the direct sampling method considered by Gates et al. (1968), the inverse sampling procedure specifies the number of animals that must be sighted before the sampling can be terminated. Thus, in the inverse case the number of animals sighted will be a fixed rather than a random quantity.

A precise formulation of the inverse sampling method is as follows:

1. Place a line at random across the area, $A$, to be sampled.

2. Specify a fixed number, $N_0$, and sample along the line transect until $N_0$ animals are observed.

As one proceeds along the transect, certain measurements will be made. These will be denoted by $y_1, y_2, \ldots, y_{N_0}$ and $\ell$, where $y_i$ is the right angle distance from the $i$th animal observed to the transect and $\ell$ is the total distance travelled along the transect during the observation period.
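
The two-step procedure above can be mimicked with a small simulation. The sketch rests on assumptions that are not part of the procedure itself: an exponential sighting function $g(y) = e^{-\lambda y}$ (so that $c = 1/\lambda$), sightings arriving along the transect at rate $2cD$, and sighted right angle distances drawn with density $g(y)/c$. The function name and all numerical values are hypothetical.

```python
import random

def inverse_sample(n0, density, lam, rng):
    """Simulate one inverse-sampling run.

    Returns the n0 right angle distances y_1, ..., y_n0 and the total
    length travelled, assuming g(y) = exp(-lam * y).
    """
    c = 1.0 / lam                  # integral of g over (0, infinity)
    rate = 2.0 * c * density       # assumed sightings per unit of transect
    distances = []
    travelled = 0.0
    while len(distances) < n0:     # stop as soon as n0 animals are seen
        travelled += rng.expovariate(rate)      # gap to the next sighting
        distances.append(rng.expovariate(lam))  # its right angle distance
    return distances, travelled

rng = random.Random(42)            # fixed seed so the run is repeatable
ys, length = inverse_sample(n0=25, density=100.0, lam=10.0, rng=rng)
print(f"sighted {len(ys)} animals over a transect length of {length:.3f}")
```

Here the number of sightings is fixed by design while the length returned varies from run to run, which is the reverse of the direct method.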
A visual depiction of these measurements is given in Figure 2.

Figure 2. Measurements recorded using inverse sampling.

2.2 A General Model Based on Right Angle Distances and Transect Length

The estimates for the density, $D$, that we will develop are based on the right angle distances, $y_1, y_2, \ldots, y_{N_0}$, and the total distance, $\ell$, travelled along the transect. Two possible approaches to the estimation of $D$ merit consideration. First, recall that density is defined as the number of animals present per unit of area, or equivalently the rate at which animals are distributed over some specific area. Therefore, we can write

$$D = \frac{N^*}{A},$$

where $A$ is the area of interest and $N^*$ is the total number of animals present in $A$. In the direct sampling approach the estimation of $D$ is most often accomplished by first estimating $N^*$ and then dividing by $A$. Seber (1973) shows that any estimate of $N^*$ based on direct sampling has the form

$$\hat{N}^* = \frac{N A}{2 L_0 \hat{c}},$$

where $N$ is a random variable denoting the number of animals seen, $L_0$ is the length of the transect and $\hat{c}$ is an estimate of $c$, a parameter which depends on the probability of sighting an animal given its right angle distance from the transect. Note that in Seber's estimate, $N$ is random and $L_0$ is fixed. It follows then, that Seber's estimate for $D$ does not depend explicitly on $A$ and has the form

$$\hat{D}_s = \frac{N}{2 L_0 \hat{c}}.$$

Therefore, the estimate of $D$ is independent of the actual size of $A$, a property that any reasonable estimate of $D$ should possess.

As an alternative, $D$ itself can be regarded as the basic parameter of interest and estimates for $D$ can be derived directly. This is the approach taken by Burnham and Anderson (1976) and the one that we will follow in developing our estimates.

2.2.1 Assumptions

The form of any estimate of $D$, the animal population density, will depend upon the type of assumptions we can make regarding the distribution of the animals to be censused and the nature of the observations that will be made.
The assumptions our estimates will be based on are as follows:

A1. The animals are randomly distributed with rate or density $D$ over the area of interest $A$, i.e., the probability of a given animal being in a particular region of area, $\delta A$, is $\delta A / A$.

A2. The animals are independently distributed over $A$, i.e., given two disjoint regions of area, $\delta A_1$ and $\delta A_2$, $P(n_1 \text{ animals are in } \delta A_1 \text{ and } n_2 \text{ animals are in } \delta A_2) = P(n_1 \text{ animals are in } \delta A_1)\,P(n_2 \text{ animals are in } \delta A_2)$.

A3. The probability of sighting an animal depends only on its distance from the transect. In addition, there exists a function $g(y)$ giving the conditional probability of observing an animal given its right angle distance, $y$, from the transect. In probability notation, $g(y) = P(\text{observing an animal} \mid y)$.

A4. $g(0) = 1$, i.e., animals on the line are seen with probability one.

A5. Animals are fixed, i.e., there is no confusion over animals moving during sampling and none are counted twice.

2.2.2 Derivation of the Likelihood Function

We will use the maximum likelihood procedure to obtain an estimate for $D$. The joint density function we are interested in is

$$f_{\mathbf{Y},L}(\mathbf{y}, \ell; N_0),$$

where $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_{N_0})$ is the vector of random variables representing the right angle distances, $L$ is the random variable representing the total length travelled, and $N_0$ is the specified number of animals to be seen before sampling terminates. Since the dependence of the joint density on $N_0$ is implicit throughout the rest of this chapter, it will be dropped from our notation for convenience. Thus, from now on we will denote the density as $f_{\mathbf{Y},L}(\mathbf{y}, \ell)$, and all other expressions depending on $N_0$ in this manner will be handled accordingly.

The following two theorems will be very useful in the derivation of the likelihood function.

Theorem 1: Let $N(\ell)$ denote the number of animals sighted in the interval $(0, \ell]$ along the transect. Then, $N(\ell)$ is a Poisson process, and for some $\theta > 0$,

$$P\{N(\ell) = n\} = \frac{e^{-\theta \ell}(\theta \ell)^n}{n!}, \qquad n = 0, 1, 2, \ldots$$
Note that the quantity $\theta\ell$ equals the expected number of animals sighted per segment of length $\ell$.

Proof: In order to show that $N(\ell)$ is a Poisson process, we will show that the assumptions in Section 2.2.1 imply the postulates necessary for a Poisson process given in Lindgren (1968, p. 162). First, consider two disjoint intervals, $\ell_1$ and $\ell_2$, along the transect and the corresponding areas, $A(\ell_1)$ and $A(\ell_2)$, enclosed by lines perpendicular to the transect as shown in Figure 3.

Figure 3. Two disjoint areas along the transect.

Now let $N_1$ and $N_2$ be random variables representing the total number of animals that occupy $A(\ell_1)$ and $A(\ell_2)$, respectively. By definition, $N(\ell_1)$ and $N(\ell_2)$ are the number of animals sighted in $A(\ell_1)$ and $A(\ell_2)$, respectively. We know from assumption A2 that $N_1$ and $N_2$ are independent, and from assumption A3 that sighting an animal depends only on its distance from the transect. Thus, $N(\ell_1)$, which depends solely on $N_1$ and the distances to the $N_1$ animals from the transect, is independent of $N(\ell_2)$, i.e., the number of sightings that occur in two disjoint intervals along the transect are independent events.

Next we will show that for every $\ell > m > 0$ and any $h > 0$, $N(\ell) - N(m)$ and $N(\ell+h) - N(m+h)$ are identically distributed. First, note that the effective area sampled in seeing $N(\ell) - N(m)$ animals and $N(\ell+h) - N(m+h)$ animals is equal to $A(\ell - m)$ as seen in Figure 4.

Figure 4. Effective area sampled in seeing $N(\ell) - N(m)$ animals and $N(\ell+h) - N(m+h)$ animals.

Therefore, by assumptions A1, A2, and A3, and since the transect is dropped at random, it follows that

$$P\{N(\ell) - N(m) = j\} = P\{N(\ell+h) - N(m+h) = j\}, \qquad j = 0, 1, 2, \ldots$$

Next we must show that for every $\ell > 0$, and some $\theta > 0$,

$$P\{N(\ell) = 1\} = \theta\ell + o(\ell), \quad \text{as } \ell \to 0,$$

where $o(\ell)$ is a function such that

$$\lim_{\ell \to 0} \frac{o(\ell)}{\ell} = 0.$$

Again let $A(\ell)$ be the area defined by $\ell$ on the transect. Now define $B_j$ to be the event $\{N(\ell) = j\}$ and $E_j$ to be the event that there are exactly $j$ animals in area $A(\ell)$. Then it follows that

$$P(B_1) = \sum_{j=1}^{\infty} P(B_1 E_j) = \sum_{j=1}^{\infty} P(B_1 \mid E_j)\,P(E_j).$$

Under assumptions A1 and A2, Pielou (1969, p. 81) has shown that

$$P(E_j) = \frac{e^{-DA(\ell)}\,[DA(\ell)]^j}{j!}, \qquad j = 0, 1, 2, \ldots$$

Also, under assumptions A1, A2 and A3, Seber (1973, Eq. (2.6)) has shown that

$$P(B_1 \mid E_1) = \frac{2c\ell}{A(\ell)},$$

where

$$c = \int_0^{\infty} g(y)\,dy. \tag{2.1}$$

Therefore, we can write

$$P(B_1) = 2cD\ell\, e^{-DA(\ell)} + \sum_{j=2}^{\infty} P(B_1 \mid E_j)\,P(E_j),$$

and if we show

$$\sum_{j=2}^{\infty} P(B_1 \mid E_j)\,P(E_j) = o(\ell),$$

the proof will be complete. Note that

$$\begin{aligned}
\sum_{j=2}^{\infty} P(B_1 \mid E_j)\,P(E_j)
 &\le \sum_{j=2}^{\infty} \frac{e^{-DA(\ell)}\,[DA(\ell)]^j}{j!} \\
 &= DA(\ell) \sum_{j=2}^{\infty} \frac{e^{-DA(\ell)}\,[DA(\ell)]^{j-1}}{j!} \\
 &\le DA(\ell) \sum_{j=2}^{\infty} \frac{e^{-DA(\ell)}\,[DA(\ell)]^{j-1}}{(j-1)!} \\
 &= DA(\ell)\left[1 - e^{-DA(\ell)}\right].
\end{aligned}$$

For any finite area $A$, $A(\ell)$ is $O(\ell)$, that is,

$$\lim_{\ell \to 0} \frac{A(\ell)}{\ell} < K, \quad \text{for some } K > 0.$$

Therefore, as $\ell \to 0$,

$$\sum_{j=2}^{\infty} P(B_1 \mid E_j)\,P(E_j) = o(\ell),$$

and, upon writing

$$\theta = 2cD, \tag{2.2}$$

we get, as $\ell \to 0$,

$$P(B_1) = \theta\ell + o(\ell).$$

Finally, we need to show that for every $\ell > 0$,

$$\sum_{n > 1} P\{N(\ell) = n\} = o(\ell), \quad \text{as } \ell \to 0.$$

Note that for all $n > 1$, we can write

$$P(B_n) = \sum_{j=1}^{\infty} P(B_n E_j) = \sum_{j=n}^{\infty} P(B_n \mid E_j)\,P(E_j).$$

Again, by using the fact that $A(\ell)$ is $O(\ell)$, it is easy to show that $P(B_n) = o(\ell)$, as $\ell \to 0$, and $N(\ell)$ satisfies the four conditions necessary for a Poisson process.

Before proceeding to the second theorem, we need to define the following random variables. Let $T_i$ denote the random variable corresponding to the distance travelled on the transect between sightings of the $(i-1)$st and $i$th animals, $i = 1, 2, \ldots, N_0$. Then the total distance travelled is given by

$$L = \sum_{i=1}^{N_0} T_i.$$

The following theorem establishes the independence of $\mathbf{Y}$ and $T_1, T_2, \ldots, T_{N_0}$ for the case $N_0 = 2$, and this fact enables us to derive the joint density function, $f_{\mathbf{Y},L}(\mathbf{y}, \ell)$.

Theorem 2. The random variables $T_1$, $T_2$, $Y_1$ and $Y_2$ are mutually independent.

Proof: In order to establish the independence of $T_1$, $T_2$, $Y_1$ and $Y_2$ we will derive the joint density $f_{T_1,T_2,Y_1,Y_2}(t_1, t_2, y_1, y_2)$ and show that it can be factored into four functions, each depending on only one of the random variables of interest. Let $y_1$, $y_2$, $t_1$, $t_2$, $h_1$, $h_2$, $g_1$ and $g_2$ be non-negative real numbers such that $t_1 + h_1 < t_1 + t_2$ and $y_1 + g_1 < y_2$, as shown in Figure 5.
Figure 5. Areas defined by $y_1$, $y_2$, $t_1$, $t_2$, $g_1$, $g_2$, $h_1$, and $h_2$.

Now let
$$P(h_1, g_1, h_2, g_2) = P(t_1 < T_1 \le t_1 + h_1,\ y_1 < Y_1 \le y_1 + g_1,\ t_2 < T_2 \le t_2 + h_2,\ y_2 < Y_2 \le y_2 + g_2).$$
Then we can write
$$f_{T_1 T_2 Y_1 Y_2}(t_1, t_2, y_1, y_2) = \lim_{\substack{h_i \to 0,\ g_i \to 0 \\ i = 1, 2}} \frac{P(h_1, g_1, h_2, g_2)}{h_1 g_1 h_2 g_2},$$
provided the limit exists. Now notice that the event whose probability we wish to find, namely
$$\{t_1 < T_1 \le t_1 + h_1,\ y_1 < Y_1 \le y_1 + g_1,\ t_2 < T_2 \le t_2 + h_2,\ y_2 < Y_2 \le y_2 + g_2\},$$
is equivalent to the intersection of the following events:

$S_1$, the event $\{N(t_1) = 0\}$;
$S_2$, the event $\{N(t_1 + h_1) - N(t_1) = 1\}$ and $\{y_1 < Y_1 \le y_1 + g_1\}$, i.e., an animal is seen in area I;
$S_3$, the event $\{N(t_1 + t_2) - N(t_1 + h_1) = 0\}$;
$S_4$, the event $\{N(t_1 + t_2 + h_2) - N(t_1 + t_2) = 1\}$ and $\{y_2 < Y_2 \le y_2 + g_2\}$, i.e., an animal is seen in area II.

Now, by Theorem 1 and assumption A3, the events $S_1$, $S_2$, $S_3$ and $S_4$ are independent, so that we can write
$$P(h_1, g_1, h_2, g_2) = P(S_1 S_2 S_3 S_4) = P(S_1) P(S_2) P(S_3) P(S_4).$$
We now need to find expressions for the probabilities of $S_1$, $S_2$, $S_3$ and $S_4$. Since $N(\ell)$ is a Poisson process,
$$P(S_1) = e^{-\theta t_1} \quad \text{and} \quad P(S_3) = e^{-\theta (t_2 - h_1)}.$$
However, $P(S_2)$ and $P(S_4)$ are not so easily obtained. We will only show how to find $P(S_2)$, since $P(S_4)$ is found in a similar fashion.

First, define $S_{2j}$ to be the event that there are exactly $j$ animals in area I. Then
$$P(S_2) = \sum_{j=1}^{\infty} P(S_2 S_{2j}) = \sum_{j=1}^{\infty} P(S_2 \mid S_{2j}) P(S_{2j}).$$
By assumptions A1 and A2, the number of animals located in area I will be distributed as a Poisson random variable with parameter $2Dg_1h_1$, where $D$ is the density of the animals (see Pielou, 1969, p. 81). Note that the factor of 2 comes in since area I can be found on both sides of the transect. Therefore,
$$P(S_{2j}) = \frac{e^{-2Dg_1h_1} (2Dg_1h_1)^j}{j!}, \quad j = 0, 1, 2, \ldots$$
By assumption A3, $P(S_2 \mid S_{21}) = g(y_1')$ for some $y_1'$ with $y_1 < y_1' \le y_1 + g_1$, so that, as $g_1 \to 0$ and $h_1 \to 0$,
$$P(S_2) = 2Dg_1h_1 e^{-2Dg_1h_1} g(y_1') + \sum_{j=2}^{\infty} P(S_2 \mid S_{2j}) P(S_{2j}) = 2Dg_1h_1 e^{-2Dg_1h_1} g(y_1') + o(g_1h_1).$$
Similarly, we can show that as $g_2 \to 0$ and $h_2 \to 0$,
$$P(S_4) = 2Dg_2h_2 e^{-2Dg_2h_2} g(y_2') + o(g_2h_2),$$
for some $y_2'$ with $y_2 < y_2' \le y_2 + g_2$. Combining these results, we obtain
$$P(h_1, g_1, h_2, g_2) = e^{-\theta t_1} e^{-\theta (t_2 - h_1)} \left\{2Dg_1h_1 e^{-2Dg_1h_1} g(y_1') + o(g_1h_1)\right\} \times \left\{2Dg_2h_2 e^{-2Dg_2h_2} g(y_2') + o(g_2h_2)\right\}.$$
Consequently,
$$\lim_{\substack{h_i \to 0,\ g_i \to 0 \\ i = 1, 2}} \frac{P(h_1, g_1, h_2, g_2)}{g_1 h_1 g_2 h_2} = 4D^2 e^{-\theta t_1} e^{-\theta t_2} g(y_1) g(y_2), \qquad (2.3)$$
which completes the proof of Theorem 2.

In the same manner, we can show that the independence established in Theorem 2 will hold for any finite number of sightings, $N_0$. In this case, if $\mathbf{T} = (T_1, T_2, \ldots, T_{N_0})$ and $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_{N_0})$, then (2.3) becomes
$$f_{\mathbf{T},\mathbf{Y}}(\mathbf{t}, \mathbf{y}) = 2^{N_0} D^{N_0} e^{-\theta \sum_{i=1}^{N_0} t_i} \prod_{i=1}^{N_0} g(y_i).$$
Upon using equation (2.2) in $f_{\mathbf{T},\mathbf{Y}}(\mathbf{t}, \mathbf{y})$, we get
$$f_{\mathbf{T},\mathbf{Y}}(\mathbf{t}, \mathbf{y}) = \theta^{N_0} e^{-\theta \sum_{i=1}^{N_0} t_i} \cdot \frac{1}{c^{N_0}} \prod_{i=1}^{N_0} g(y_i).$$
Thus, the marginal distributions for $T_i$ and $Y_i$ are
$$f_Y(y_i) = \frac{g(y_i)}{c}, \quad y_i > 0,$$
and
$$f_T(t_i) = \theta e^{-\theta t_i}, \quad t_i > 0.$$
Therefore, $T_1, T_2, \ldots, T_{N_0}$ are independent, identically distributed (iid) Exponential random variables with parameter $\theta$, and $L = \sum_{i=1}^{N_0} T_i$ has a Gamma distribution with parameters $N_0$ and $\theta$, i.e.,
$$f_L(\ell) = \frac{\theta^{N_0} \ell^{N_0 - 1} e^{-\theta\ell}}{\Gamma(N_0)}, \quad \ell > 0,\ \theta > 0.$$
Furthermore, $L$ is independent of $\mathbf{Y}$. The likelihood function for the estimation of $\theta$ and $c$ can now be obtained by taking the product of $f_L(\ell)$ and $f_{\mathbf{Y}}(\mathbf{y})$, i.e.,
$$L(\theta, c; \mathbf{y}, \ell) = \frac{1}{c^{N_0}} \prod_{i=1}^{N_0} g(y_i) \cdot \frac{\theta^{N_0} \ell^{N_0 - 1} e^{-\theta\ell}}{\Gamma(N_0)}. \qquad (2.4)$$

We will now outline how one can estimate $D$, the animal population density, from the likelihood function given in (2.4). As noted earlier, $D$ is related to $\theta$ and $c$ by equation (2.2), i.e.,
$$D = \frac{\theta}{2c}.$$
Thus, the maximum likelihood estimate for $D$ would be
$$\hat{D} = \frac{\hat{\theta}}{2\hat{c}},$$
where $\hat{\theta}$ and $\hat{c}$ are maximum likelihood estimates of $\theta$ and $c$, respectively, obtained from (2.4). Note that the estimate $\hat{D}$ is the ratio of two mutually independent random variables, one depending on $L$ alone and the other depending on $\mathbf{Y}$ alone. This property will be found to be very useful when evaluating the moments of $\hat{D}$.

We have now set the framework necessary for deriving an estimate of $D$. In the next section we shall obtain an estimate for $D$ assuming that $g(y)$ has a particular parametric form.

2.3 A Parametric Density Estimate

Any estimate for $D$ that is derived after assuming an explicit function for $g(y)$ will be called a parametric estimate. Gates et al.
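(1968), using direct sampling, derived an estimate for $D$ assuming $g(y) = e^{-\lambda y}$. Using this same function for $g(y)$, we will derive the corresponding estimate based on inverse sampling.

2.3.1 Maximum Likelihood Estimate for D

To estimate $D$ we need to estimate both $\theta$ and $c$ from the likelihood function (2.4). In this case
$$g(y) = e^{-\lambda y}, \quad y > 0,\ \lambda > 0,$$
so that
$$c = \frac{1}{\lambda}.$$
Substituting for $c$ in (2.2) yields
$$D = \frac{\theta\lambda}{2}. \qquad (2.5)$$
Also, by substituting for $c$ in (2.4), the likelihood function becomes
$$L(\theta, \lambda; \mathbf{y}, \ell) = \lambda^{N_0} e^{-\lambda \sum_{i=1}^{N_0} y_i} \cdot \frac{\theta^{N_0} \ell^{N_0 - 1} e^{-\theta\ell}}{\Gamma(N_0)}, \quad \ell > 0,\ y_i > 0. \qquad (2.6)$$

The joint maximum likelihood estimates for $\theta$ and $\lambda$ can now be easily obtained. The natural logarithm of the likelihood function is
$$\ln L(\theta, \lambda; \mathbf{y}, \ell) = N_0 \ln\lambda - \lambda \sum_{i=1}^{N_0} y_i + N_0 \ln\theta + (N_0 - 1)\ln\ell - \theta\ell - \ln\Gamma(N_0).$$
Taking the partial derivatives with respect to $\theta$ and $\lambda$ yields
$$\frac{\partial \ln L}{\partial \theta} = \frac{N_0}{\theta} - \ell, \qquad \frac{\partial \ln L}{\partial \lambda} = \frac{N_0}{\lambda} - \sum_{i=1}^{N_0} y_i.$$
Setting these equal to 0 yields
$$\hat{\theta} = \frac{N_0}{\ell} \quad \text{and} \quad \hat{\lambda} = \frac{N_0}{\sum_{i=1}^{N_0} y_i}.$$
Substituting these estimates for $\theta$ and $\lambda$ in (2.5), the maximum likelihood estimate for $D$ is seen to be
$$\hat{D} = \frac{\hat{\theta}\hat{\lambda}}{2} = \frac{N_0^2}{2\ell \sum_{i=1}^{N_0} y_i}.$$

2.3.2 Unbiased Estimate for D

The expected value of the estimate $\hat{D}$ developed in Section 2.3.1 is
$$E(\hat{D}) = \tfrac{1}{2} E(\hat{\theta}) E(\hat{\lambda}),$$
since $\hat{\theta}$ and $\hat{\lambda}$ are independent. Using the fact that $L$ has a Gamma distribution with parameters $N_0$ and $\theta$, we obtain
$$E(\hat{\theta}) = E\left(\frac{N_0}{L}\right) = \frac{N_0\,\theta}{N_0 - 1}.$$
To derive an expression for $E(\hat{\lambda})$, first recall that $Y_1, Y_2, \ldots, Y_{N_0}$ are iid with the common density
$$f_Y(y) = \frac{g(y)}{c} = \lambda e^{-\lambda y}, \quad y > 0.$$
Therefore, $\sum_{i=1}^{N_0} Y_i$ is distributed as a Gamma random variable with parameters $N_0$ and $\lambda$, and
$$E(\hat{\lambda}) = E\left(\frac{N_0}{\sum_{i=1}^{N_0} Y_i}\right) = \frac{N_0\,\lambda}{N_0 - 1}.$$
Independence of $\hat{\theta}$ and $\hat{\lambda}$ now yields
$$E(\hat{D}) = \tfrac{1}{2} E(\hat{\theta}) E(\hat{\lambda}) = \frac{N_0^2\, \theta\lambda}{2 (N_0 - 1)^2} = \frac{N_0^2}{(N_0 - 1)^2}\, D.$$
An unbiased estimate for $D$ is, therefore, given by
$$\hat{D}_u = \frac{(N_0 - 1)^2}{2L \sum_{i=1}^{N_0} Y_i}. \qquad (2.7)$$

2.3.3 Variance of $\hat{D}_u$

Due to the independence of $L$ and $\sum_{i=1}^{N_0} Y_i$, the variance of $\hat{D}_u$ can be derived directly. We have
$$\mathrm{Var}(\hat{D}_u) = \mathrm{Var}\left(\frac{(N_0 - 1)^2}{2L \sum Y_i}\right) = \frac{(N_0 - 1)^4}{4}\, \mathrm{Var}\left(\frac{1}{L \sum Y_i}\right) = \frac{(N_0 - 1)^4}{4} \left\{ E\left[\frac{1}{L^2 (\sum Y_i)^2}\right] - \left[E\left(\frac{1}{L \sum Y_i}\right)\right]^2 \right\}.$$
Since $L$ and $\sum_{i=1}^{N_0} Y_i$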
are independent, it follows that
$$\mathrm{Var}(\hat{D}_u) = \frac{(N_0 - 1)^4}{4} \left\{ E\left(\frac{1}{L^2}\right) E\left(\frac{1}{(\sum Y_i)^2}\right) - \left[E\left(\frac{1}{L}\right)\right]^2 \left[E\left(\frac{1}{\sum Y_i}\right)\right]^2 \right\}. \qquad (2.8)$$
Deriving $\mathrm{Var}(\hat{D}_u)$ now reduces to the problem of evaluating the expected values of $1/L$, $1/L^2$, $1/\sum Y_i$ and $1/(\sum Y_i)^2$. Expressions for these quantities are easily obtained by noting that $L$ and $\sum_{i=1}^{N_0} Y_i$ have Gamma distributions with respective parameters $(N_0, \theta)$ and $(N_0, \lambda)$. Straightforward calculations show that for $N_0 > 2$,
$$E\left(\frac{1}{L}\right) = \frac{\theta}{N_0 - 1}, \qquad (2.9)$$
$$E\left(\frac{1}{\sum Y_i}\right) = \frac{\lambda}{N_0 - 1}, \qquad (2.10)$$
$$E\left(\frac{1}{L^2}\right) = \frac{\theta^2}{(N_0 - 1)(N_0 - 2)}, \qquad (2.11)$$
$$E\left(\frac{1}{(\sum Y_i)^2}\right) = \frac{\lambda^2}{(N_0 - 1)(N_0 - 2)}. \qquad (2.12)$$
Now using (2.5), (2.9), (2.10), (2.11), and (2.12) in (2.8) we get
$$\mathrm{Var}(\hat{D}_u) = \frac{(N_0 - 1)^4}{4} \left\{ \frac{\theta^2\lambda^2}{(N_0 - 1)^2 (N_0 - 2)^2} - \frac{\theta^2\lambda^2}{(N_0 - 1)^4} \right\} = D^2 \left\{ \frac{(N_0 - 1)^2}{(N_0 - 2)^2} - 1 \right\} = D^2\, \frac{2N_0 - 3}{(N_0 - 2)^2}, \qquad (2.13)$$
provided $N_0 > 2$. Note that $\mathrm{Var}(\hat{D}_u)$ does not exist if $N_0 \le 2$.

2.3.4 Sample Size Determination Using $\hat{D}_u$

The first problem in designing a survey using the inverse line transect method is to determine in advance the number of animals, $N_0$, that must be sighted before sampling terminates. One criterion for the selection of $N_0$ (see Seber, 1973) is the requirement that the design must yield an estimate of the density, $D$, with a prescribed coefficient of variation,
$$CV = \frac{\sigma_{\hat{D}}}{E(\hat{D})},$$
where $\sigma_{\hat{D}}$ and $E(\hat{D})$ denote, respectively, the standard deviation and the expected value of the estimate, $\hat{D}$. Small values of $CV$ are desirable, since they indicate that the estimate has a small standard deviation relative to its expected value.

With the inverse sampling method, the value of $N_0$ needed to guarantee a preset value, $C$, for the coefficient of variation of $\hat{D}_u$ can be calculated easily. Using (2.7) and (2.13) we see that, for $N_0 > 2$,
$$CV(\hat{D}_u) = \frac{(2N_0 - 3)^{1/2}}{N_0 - 2}.$$
Then, setting $C = CV(\hat{D}_u)$, it is easily shown that $N_0$ is the root of the quadratic equation
$$C^2 N_0^2 - (4C^2 + 2) N_0 + 4C^2 + 3 = 0.$$
Solving for $N_0$ yields the two roots
$$N_0 = 2 + \frac{1 \pm (1 + C^2)^{1/2}}{C^2}.$$
Since the variance of $\hat{D}_u$ exists only for $N_0 > 2$, the required sample size is
$$N_0 = 2 + \frac{1 + (1 + C^2)^{1/2}}{C^2},$$
rounded up to the next integer. For example, if $C = .25$, then $N_0 = 35$.
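The sample size rule above is easy to check numerically. The following Python sketch is illustrative only: the function names `cv_unbiased` and `required_sightings` are hypothetical and not part of the original work, and rounding the larger root up to the next integer is an assumption that is consistent with the worked example $C = .25$, $N_0 = 35$.

```python
import math


def cv_unbiased(n0: int) -> float:
    """Coefficient of variation of the unbiased estimate: sqrt(2*N0 - 3) / (N0 - 2)."""
    if n0 <= 2:
        raise ValueError("the variance exists only for N0 > 2")
    return math.sqrt(2 * n0 - 3) / (n0 - 2)


def required_sightings(c: float) -> int:
    """Smallest integer N0 whose coefficient of variation does not exceed c.

    Uses the larger root of C^2 N0^2 - (4 C^2 + 2) N0 + 4 C^2 + 3 = 0,
    rounded up (the rounding rule is an assumption of this sketch).
    """
    if c <= 0:
        raise ValueError("c must be positive")
    root = 2 + (1 + math.sqrt(1 + c * c)) / (c * c)
    return math.ceil(root)


if __name__ == "__main__":
    n0 = required_sightings(0.25)
    print(n0, round(cv_unbiased(n0), 4))
```

Because the coefficient of variation decreases in $N_0$ for $N_0 > 2$, taking the ceiling of the larger root gives the smallest number of sightings that meets the target.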
Table 1 gives values of $N_0$ corresponding to coefficients of variation ranging from .1 to .5.

Table 1. Number of animals, $N_0$, that must be sighted to guarantee that the estimate $\hat{D}_u$ has coefficient of variation $CV(\hat{D}_u)$.

    CV(D_u)    N_0
    .50         11
    .40         15
    .30         25
    .25         35
    .20         53
    .15         92
    .10        203

2.4 Nonparametric Density Estimate

In this section we will consider a nonparametric estimate for the population density, $D$, using inverse sampling. In contrast to the parametric approach used in Section 2.3, the nonparametric approach leaves the function $g(y)$, which represents the probability of observing an animal given its right angle distance, unspecified.

In Section 2.2.2 we showed that an estimate for $D$ is given by
$$\hat{D} = \frac{\hat{\theta}}{2\hat{c}},$$
where $\hat{\theta}$ and $\hat{c}$ are the estimates for $\theta$, the expected number of sightings per unit length of the transect, and $c$, defined as
$$c = \int_0^{\infty} g(y)\,dy.$$
If $g(y)$ is completely specified, except for some parameters, then the problem of estimating $D$ reduces to the problem of estimating $\theta$ and the parameters in $g(y)$. In Section 2.3 we considered the specific case $g(y) = e^{-\lambda y}$.

A drawback to this approach, where we specify a functional form for $g(y)$, is that the function chosen must take into account the inherent detection difficulties that are present when a particular animal species is being sampled. If one examines the various forms that have been suggested for $g(y)$, one quickly becomes aware of the problem of finding a form that is flexible enough to accommodate the many possibilities which exist. Some of the functions that have been proposed for $g(y)$ are presented in Table 2. As seen in the table, the suggestions for $g(y)$ represent a number of different shapes in an effort to reflect the nature of the animal being sampled and the type of ground cover being searched.

Because of the problems that can arise in choosing a function for $g(y)$, Burnham and Anderson (1976) considered a nonparametric approach as a means of avoiding the need for the specification of $g(y)$.
Leaving $g(y)$ unspecified will allow the estimation procedure to depend on the observations that are actually made, not on any particular model. Thus, a nonparametric model might provide a more robust estimation method, that is, an estimation method that could be applied to a much wider class of animal species.

Table 2. Forms proposed for the function, $g(y)$.

    Function                                                                                     Author
    $g(y) = e^{-\lambda y}$, $\lambda > 0$                                                       Gates et al. (1968)
    $g(y) = 1 - (y/w)^a$ for $0 \le y \le w$; $g(y) = 0$ for $y > w$                             Eberhardt (1968)
    $g(y) = 1$ for $0 \le y \le w$; $g(y) = 0$ for $y > w$                                       Seber (1973)
    $g(y) = \int_y^{\infty} \frac{\beta^{\alpha} x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}\,dx$, $\beta > 0$, $\alpha > 0$   Sen et al. (1974)
    $g(y) = e^{-(y/\lambda)^p}$, $p > 0$, $\lambda > 0$                                          Pollock (1978)

2.4.1 The Nonparametric Model for Estimating D

Consider the estimate for $D$ developed in Section 2.2.2, that is,
$$\hat{D} = \frac{\hat{\theta}}{2\hat{c}}.$$
As noted earlier, if $g(y) = e^{-\lambda y}$, then $c = 1/\lambda$ and our estimate for $D$ is
$$\hat{D} = \frac{\hat{\theta}\hat{\lambda}}{2}.$$
Now, if $g(y)$ is left unspecified, then an estimate for $1/c$ may be obtained along the same lines Burnham and Anderson (1976) used in the case of direct sampling. By assumption A4,
$$f_Y(0) = \frac{g(0)}{c} = \frac{1}{c}.$$
Hence, $1/c$ equals the value of $f_Y(\cdot)$ evaluated at $y = 0$, where $f_Y(\cdot)$ is the probability density function for the right angle distance, $Y$, given an animal is seen. The problem of finding a nonparametric estimate for $1/c$, therefore, reduces to the problem of finding an estimate, $\hat{f}_Y(0)$, for $f_Y(0)$. An estimate for $D$ will then be given by
$$\hat{D} = \frac{\hat{\theta}\, \hat{f}_Y(0)}{2}, \qquad (2.14)$$
where $\hat{\theta}$ may be taken as the maximum likelihood estimate derived in Section 2.3.1. That is,
$$\hat{\theta} = \frac{N_0 - 1}{L},$$
where we have replaced $N_0$ by $N_0 - 1$ to remove the bias.

2.4.2 An Estimate for $f_Y(0)$

Burnham and Anderson (1976) suggested four possible methods for estimating $f_Y(0)$, but we are not aware of any work which investigates the theoretical properties of any of these estimates. Loftsgaarden and Quesenberry (1965) considered a density function estimate based on the observation that
$$f_Y(x) = \lim_{h \to 0} \frac{F_Y(x + h) - F_Y(x - h)}{2h},$$
where $F_Y(\cdot)$ is the cumulative distribution function.
For the purpose of estimating $f_Y(0)$, their estimate takes the form
$$\hat{f}_Y(0) = \left\{ \sqrt{N_0}\; Y_{([\sqrt{N_0} + 1])} \right\}^{-1}, \qquad (2.15)$$
where $[\sqrt{N_0} + 1]$ is the value of $\sqrt{N_0} + 1$ rounded off to the nearest integer and $Y_{(j)}$ is the $j$th order statistic of the sample $y_1, y_2, \ldots, y_{N_0}$. Loftsgaarden and Quesenberry (1965) showed that $\hat{f}_Y(0)$ as given in (2.15) is a consistent estimate, provided $f_Y(\cdot)$ is a positive and continuous probability density function.

One nice property of $\hat{f}_Y(0)$ is that it can be easily calculated from the data. However, evaluation of the moments of this estimate does present some problems. In fact, the mean and the variance may not even exist in some cases. But, whenever $[\sqrt{N_0} + 1] \ge 3$, i.e., whenever $N_0 \ge 3$, the variance of $\hat{f}_Y(0)$ is finite, as shown in the following theorem.

Theorem 3. Let $Y_1, Y_2, \ldots, Y_n$ be a set of independent, identically distributed random variables, representing the right angle distances, with continuous probability density function (p.d.f.)
$$f_Y(y) = \frac{g(y)}{c}, \quad y > 0.$$
Also, let $Y_{(r)}$ be the $r$th order statistic. Then
$$E\left(\frac{1}{Y_{(r)}^2}\right) < \infty$$
for every integer $r$ such that $3 \le r \le n$.

Proof: The density function for $Y_{(r)}$ is
$$h_r(y) = n \binom{n-1}{r-1} F_Y^{\,r-1}(y) \left[1 - F_Y(y)\right]^{n-r} f_Y(y),$$
where
$$F_Y(y) = \int_0^y f_Y(t)\,dt.$$
Therefore,
$$E\left(\frac{1}{Y_{(r)}^2}\right) = n \binom{n-1}{r-1} \int_0^{\infty} \frac{1}{y^2}\, F_Y^{\,r-1}(y) \left[1 - F_Y(y)\right]^{n-r} dF_Y(y).$$
Since $g(y)$ represents a probability, $g(y) \le 1$ and
$$F_Y(y) = \int_0^y \frac{g(t)}{c}\,dt \le \frac{y}{c}.$$
Therefore,
$$E\left(\frac{1}{Y_{(r)}^2}\right) \le \frac{n}{c^2} \binom{n-1}{r-1} \int_0^{\infty} F_Y^{\,r-3}(y) \left[1 - F_Y(y)\right]^{n-r} dF_Y(y) = \frac{n}{c^2} \binom{n-1}{r-1} \frac{\Gamma(r-2)\,\Gamma(n-r+1)}{\Gamma(n-1)} < \infty,$$
which completes the proof.

Simple asymptotic approximations for the mean and variance of $\hat{f}_Y(0)$ which work well for several densities given in Table 2 can be developed using the first order terms in a formal Taylor series expansion of $\hat{f}_Y(0)$. The basic ideas involved in the derivation of these approximations are presented in the following section.

2.4.3 Approximations for the Mean and Variance of $\hat{f}_Y(0)$

Let $F(\cdot)$ and $F^{-1}(\cdot)$ denote the cumulative distribution function and its inverse for the random variable $Y$, the right angle distance.
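Also, let
$$r = [\sqrt{N} + 1], \qquad U_r = F(Y_{(r)}),$$
and
$$\phi(U_r) = \left\{ (r - 1)\, F^{-1}(U_r) \right\}^{-1},$$
where $Y_{(r)}$ is the $r$th order statistic in a random sample of size $N$ from $F(\cdot)$. Then proceeding as in Lindgren (1968, p. 409) it is easy to see that
$$\hat{f}_Y(0) = \phi(U_r), \qquad E(U_r) = \frac{r}{N + 1} = p_r,$$
and
$$\mathrm{Var}(U_r) = \frac{p_r (1 - p_r)}{N + 2}. \qquad (2.16)$$
Assuming that $\phi(\cdot)$ is continuous and differentiable once at $p_r$, the first order terms in the Taylor series expansion of $\phi(\cdot)$ at $p_r$ yield the approximation
$$\phi(U_r) \approx \phi(p_r) + (U_r - p_r) \left. \frac{d\phi(u)}{du} \right|_{u = p_r}. \qquad (2.17)$$
Taking expectations on both sides of (2.17) yields
$$E\{\phi(U_r)\} \approx \phi(p_r),$$
and substituting for $r$, $p_r$ and $\phi(\cdot)$ yields
$$E\{\hat{f}_Y(0)\} \approx \left[ \sqrt{N}\, F^{-1}\!\left( \frac{\sqrt{N} + 1}{N + 1} \right) \right]^{-1}.$$
Taking the limit as $N$ tends to infinity and noting that $F^{-1}(0)$ is 0 and $u = F(y)$ yields
$$\lim_{N \to \infty} E\{\hat{f}_Y(0)\} \approx \lim_{N \to \infty} \left[ \sqrt{N}\, F^{-1}\!\left( \frac{\sqrt{N} + 1}{N + 1} \right) \right]^{-1} = \left[ \left. \frac{dF^{-1}(u)}{du} \right|_{u = 0} \right]^{-1} = \left. \frac{dF(y)}{dy} \right|_{y = 0} = f_Y(0). \qquad (2.18)$$
Thus for large $N$, $\hat{f}_Y(0)$ is approximately unbiased.

An approximation for the variance of $\hat{f}_Y(0)$ is found in a similar fashion. Using (2.17) we get
$$\mathrm{Var}\{\phi(U_r)\} \approx \left[ \left. \frac{d\phi(u)}{du} \right|_{u = p_r} \right]^2 \mathrm{Var}(U_r).$$
Evaluating the derivative yields
$$\left. \frac{d\phi(u)}{du} \right|_{u = p_r} = \frac{-1}{(r - 1) \{F^{-1}(p_r)\}^2 f_Y(F^{-1}(p_r))},$$
so that
$$\mathrm{Var}\{\phi(U_r)\} \approx \left[ \frac{1}{(r - 1) \{F^{-1}(p_r)\}^2 f_Y(F^{-1}(p_r))} \right]^2 \mathrm{Var}(U_r).$$
Now, using (2.16) and then making the appropriate substitutions for $r$, $p_r$ and $\phi(\cdot)$, we get
$$\mathrm{Var}\{\hat{f}_Y(0)\} \approx \frac{(\sqrt{N} + 1)(N - \sqrt{N})}{(N + 1)^2 (N + 2)} \cdot \frac{1}{N \{F^{-1}(p_r)\}^4 f_Y^2(F^{-1}(p_r))}, \qquad p_r = \frac{\sqrt{N} + 1}{N + 1}.$$
Therefore, as $N \to \infty$ we have
$$\lim_{N \to \infty} \sqrt{N}\, \mathrm{Var}\{\hat{f}_Y(0)\} = f_Y^2(0),$$
so that an approximation for the variance, when $N$ is large, is given by
$$\mathrm{Var}\{\hat{f}_Y(0)\} \approx \frac{f_Y^2(0)}{\sqrt{N}}. \qquad (2.19)$$

As stated earlier, the expressions obtained for the expected value and variance of $\hat{f}_Y(0)$ are only approximations. Their adequacy for practical purposes may be evaluated by a Monte Carlo study involving various specific forms for the p.d.f., $f_Y(\cdot)$. In the next section we will look at the results of just this kind of simulation study.

2.4.4 A Monte Carlo Study

A Monte Carlo study was used to examine the approximations for $E\{\hat{f}_Y(0)\}$ and $\mathrm{Var}\{\hat{f}_Y(0)\}$ presented in Section 2.4.3. Three possible shapes for $f_Y(\cdot)$ were used in the study. Since the shape of $f_Y(\cdot)$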
depends solely on the choice of $g(y)$, the functions
$$g_1(y) = e^{-10y}, \quad y > 0,$$
$$g_2(y) = 1 - y, \quad 0 < y < 1,$$
and
$$g_3(y) = 1 - y^2, \quad 0 < y < 1,$$
were chosen. The function $g_1(\cdot)$ was proposed by Gates et al. (1968), while Eberhardt (1968) suggested both $g_2(\cdot)$ and $g_3(\cdot)$. The different shapes that these three functions represent are depicted in Figure 6.

Figure 6. Three forms for the function $g(y)$.

For each value of $n$ = 25, 35, 45, 65, 80 and 100, two thousand random samples of size $n$ were selected from each of the three populations defined by $g_i(\cdot)$, $i = 1, 2, 3$. These samples were obtained by first generating observations from a uniform distribution defined on the interval [0, 1] and then transforming these values using the appropriate density $f_Y(\cdot)$. The UNIFORM function described in Barr et al. (1976) was used to generate the samples from the uniform distribution.

For each set of 2000 samples, empirical estimates were calculated for the expected value, $\hat{\mu}_e$, the percent bias, $B_e$, and the standard deviation, $\hat{\sigma}_e$, of $\hat{f}_Y(0)$ given in equation (2.15) as follows. Let $\hat{f}_{iY}(0)$ denote the estimate from the $i$th sample, $i = 1, 2, \ldots, 2000$. Then
$$\hat{\mu}_e = \frac{1}{2000} \sum_{i=1}^{2000} \hat{f}_{iY}(0),$$
$$B_e = 100 \left( \frac{\hat{\mu}_e - f_Y(0)}{f_Y(0)} \right),$$
and
$$\hat{\sigma}_e = \left[ \frac{1}{1999} \sum_{i=1}^{2000} \left( \hat{f}_{iY}(0) - \hat{\mu}_e \right)^2 \right]^{1/2}.$$
All of the necessary computing was performed under release 76.6B of SAS (see Barr et al., 1976) at the Northeast Regional Data Center located at the University of Florida.

The results of the study, along with the approximate standard deviations,
$$\sigma_T = \frac{f_Y(0)}{n^{1/4}},$$
are presented in Tables 3, 4, and 5. As can be seen from the tables, the estimate of $f_Y(0)$ has a negative bias for most samples, generally of a magnitude less than 10% of the true value. The ratio $\sigma_T / \hat{\sigma}_e$ is also within 10% of one for almost all samples considered. This is even true for the smaller sample sizes, $n \le 45$. Also, when considering the smaller sample sizes, the ratio was for the most part greater than one.
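The simulation design described above can be sketched in a few lines of Python. This is an illustrative re-creation under stated assumptions, not the original SAS program: Python's standard `random` generator stands in for the SAS UNIFORM function, the seed is arbitrary, all function names are hypothetical, and only $g_1$ and $g_2$ are included because their distribution functions invert in closed form ($g_3$ would require solving a cubic).

```python
import math
import random


def f0_estimate(sample):
    """Order-statistic estimate (2.15): 1 / (sqrt(n) * Y_(r)), with r = [sqrt(n) + 1]."""
    n = len(sample)
    r = math.floor(math.sqrt(n) + 1 + 0.5)  # sqrt(n) + 1 rounded to the nearest integer
    y_r = sorted(sample)[r - 1]             # r-th order statistic (1-based index)
    return 1.0 / (math.sqrt(n) * y_r)


def draw_g1(rng):
    """Inverse-CDF draw from f(y) = 10 exp(-10 y), which corresponds to g1(y) = exp(-10 y)."""
    return -math.log(1.0 - rng.random()) / 10.0


def draw_g2(rng):
    """Inverse-CDF draw from f(y) = 2 (1 - y) on (0, 1), which corresponds to g2(y) = 1 - y."""
    return 1.0 - math.sqrt(1.0 - rng.random())


def monte_carlo(draw, n, f0_true, reps=2000, seed=1979):
    """Return (empirical mean, percent bias, empirical sd, approximate sd f0 / n**0.25)."""
    rng = random.Random(seed)  # arbitrary seed, chosen only for reproducibility
    est = [f0_estimate([draw(rng) for _ in range(n)]) for _ in range(reps)]
    mean = sum(est) / reps
    bias_pct = 100.0 * (mean - f0_true) / f0_true
    sd = math.sqrt(sum((e - mean) ** 2 for e in est) / (reps - 1))
    sd_theory = f0_true / n ** 0.25
    return mean, bias_pct, sd, sd_theory


if __name__ == "__main__":
    for n in (25, 35, 45, 65, 80, 100):
        print(n, monte_carlo(draw_g1, n, 10.0), monte_carlo(draw_g2, n, 2.0))
```

Since the random number stream differs from the one used in the original study, the individual figures will not match the published tables; only the general pattern (a modest negative bias and a ratio of approximate to empirical standard deviation near one) should be comparable.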
Based on the results of this simulation, we feel that, in practice, the approximations obtained for the expected value and variance of $\hat{f}_Y(0)$ would perform adequately.

Table 3. Results of Monte Carlo Study using $g_1(y) = e^{-10y}$.

    Sample Size   mu_e    B_e     sigma_T   sigma_e   sigma_T/sigma_e
     25           9.05    -9.5    4.47      4.55       .98
     35           9.08    -9.2    4.11      4.12      1.00
     45           8.87   -11.3    3.86      3.65      1.06
     65           9.48    -5.2    3.52      3.65       .96
     80           9.49    -5.1    3.35      3.56       .94
    100           9.48    -5.2    3.16      3.13      1.01

For $g_1(y)$ the theoretical mean is 10.

Table 4. Results of Monte Carlo Study using $g_2(y) = 1 - y$.

    Sample Size   mu_e    B_e     sigma_T   sigma_e   sigma_T/sigma_e
     25           1.88    -6.0    .894      .850      1.05
     35           1.88    -6.0    .822      .801      1.03
     45           1.83    -8.5    .772      .674      1.15
     65           1.96    -2.0    .704      .722       .98
     80           1.92    -4.0    .669      .647      1.03
    100           1.93    -3.5    .632      .615      1.03

For $g_2(y)$ the theoretical mean is 2.

Table 5. Results of Monte Carlo Study using $g_3(y) = 1 - y^2$.

    Sample Size   mu_e    B_e     sigma_T   sigma_e   sigma_T/sigma_e
     25           1.47    -2.0    .671      .625      1.07
     35           1.48    -1.3    .616      .594      1.04
     45           1.44    -4.0    .579      .559      1.04
     65           1.51      .7    .528      .536       .99
     80           1.49     -.7    .502      .506       .99
    100           1.50     0.0    .474      .477       .99

For $g_3(y)$ the theoretical mean is 1.5.

2.4.5 The Expected Value and Variance for a Nonparametric Estimate of D

Now that we have decided upon an estimate for $f_Y(0)$, the problem of estimating $D$ is straightforward. Substituting the estimate, $\hat{f}_Y(0)$, defined in Section 2.4.2 into expression (2.14), a nonparametric estimate for $D$ is
$$\hat{D}_N = \frac{N_0 - 1}{2L \sqrt{N_0}\; Y_{([\sqrt{N_0} + 1])}}. \qquad (2.20)$$
Expressions for the expected value and variance of $\hat{D}_N$ are easily obtained. Since $L$ and $Y_1, Y_2, \ldots, Y_{N_0}$ are independent, we can write
$$E(\hat{D}_N) = \tfrac{1}{2} E(\hat{\theta})\, E\{\hat{f}_Y(0)\},$$
and (see Goodman, 1960, Eq.
(2))
$$\mathrm{Var}(\hat{D}_N) = \tfrac{1}{4} \left[ E^2(\hat{\theta})\, \mathrm{Var}\{\hat{f}_Y(0)\} + E^2\{\hat{f}_Y(0)\}\, \mathrm{Var}(\hat{\theta}) + \mathrm{Var}(\hat{\theta})\, \mathrm{Var}\{\hat{f}_Y(0)\} \right],$$
where
$$\hat{\theta} = \frac{N_0 - 1}{L} \quad \text{and} \quad \hat{f}_Y(0) = \frac{1}{\sqrt{N_0}\; Y_{([\sqrt{N_0} + 1])}}.$$
Then, upon substituting the appropriate expressions for the moments of $\hat{\theta}$ and $\hat{f}_Y(0)$ into the above equations, we get
$$E(\hat{D}_N) \approx D, \qquad (2.21)$$
and
$$\mathrm{Var}(\hat{D}_N) \approx D^2\, \frac{\sqrt{N_0} + 1}{N_0 + 2}. \qquad (2.22)$$

2.4.6 Sample Size Determination Using $\hat{D}_N$

We can now determine the approximate value of $N_0$ that is needed to guarantee some preset value for the coefficient of variation of $\hat{D}_N$, $CV(\hat{D}_N)$. These values for $N_0$ can then be compared to the corresponding values for $N_0$ (see Table 1) that are needed to ensure the same coefficient of variation with the parametric estimate, $\hat{D}_u$. Using (2.21) and (2.22), we see that an approximation for the coefficient of variation of $\hat{D}_N$ is
$$CV(\hat{D}_N) \approx \frac{(\sqrt{N_0} + 1)^{1/2}}{(N_0 + 2)^{1/2}},$$
and by setting $C = CV(\hat{D}_N)$, one can easily show that $\sqrt{N_0}$ is the root of the quadratic equation
$$C^2 N_0 - \sqrt{N_0} + 2C^2 - 1 = 0.$$
Solving for $\sqrt{N_0}$ yields the two roots
$$\sqrt{N_0} = \frac{1 \pm \left(1 - 4C^2 (2C^2 - 1)\right)^{1/2}}{2C^2},$$
and since $\left(1 - 4C^2 (2C^2 - 1)\right)^{1/2} > 1$ whenever $C^2 < \tfrac{1}{2}$, the required sample size for values of $C \le .5$ is
$$N_0 = \left\{ \frac{1 + \left(1 - 4C^2 (2C^2 - 1)\right)^{1/2}}{2C^2} \right\}^2.$$
For example, if $C = .25$, then $N_0 = 284$. Table 6 gives values for $N_0$ corresponding to coefficients of variation ranging from .2 to .5.

Table 6. Number of animals, $N_0$, that must be sighted to guarantee that the estimate $\hat{D}_N$ has coefficient of variation $CV(\hat{D}_N)$.

    CV(D_N)    N_0
    .50         20
    .40         48
    .30        142
    .25        284
    .20        671

CHAPTER III
DENSITY ESTIMATION BASED ON A COMBINATION OF INVERSE AND DIRECT SAMPLING

3.1 Introduction

When sampling a population by means of line transects, it is important to keep in mind that the transect length that can be covered by an observer will be finite. This poses a problem for the inverse sampling plan since there will exist the possibility of not seeing the specified number of animals within the entire length of the transect.
Therefore, it seems reasonable to develop a sampling scheme that would employ a rule which allows one to stop when either a specified number, $N_0$, of animals are seen or a fixed distance, $L_0$, has been travelled on the transect. In this chapter we will consider a sampling plan which combines the inverse sampling procedure discussed in Chapter II and the direct sampling procedure of Gates et al. (1968). More precisely, we will define the combined sampling method as follows:

1. Place a line at random across the area, $A$, to be sampled.
2. Specify a fixed number of animals, $N_0 > 2$, and a fixed transect length, $L_0$, and then continue sampling along the transect until either $N_0$ animals are seen or a distance, $L_0$, has been travelled.

Since the above method merely incorporates the individual stopping rules from the inverse and direct sampling methods, it seems reasonable to use the estimate
$$\hat{D}_{CP} = \begin{cases} \hat{D}_u & \text{if } N = N_0, \\ \hat{D}_g & \text{if } N < N_0, \end{cases} \qquad (3.1)$$
where $N$ is a random variable corresponding to the actual number of animals sighted using combined sampling, $\hat{D}_u$ is the inverse sampling estimator given in (2.7) and $\hat{D}_g$ is an estimator appropriate for the direct sampling case. In other words, the combined sampling procedure uses the inverse sampling estimate if sampling terminates after $N_0$ animals are seen and the direct sampling estimate if sampling terminates after travelling a distance $L_0$. In Section 3.5 we will also show that $\hat{D}_{CP}$ has a maximum likelihood justification.

Before proceeding to derive the mean and variance for $\hat{D}_{CP}$, we need an estimate appropriate for the direct sampling case.

3.2 Gates Estimate

Based on the direct sampling approach and assuming $g(y) = e^{-\lambda y}$, $y \ge 0$, Gates et al.
(1968) developed the estimate
$$\hat{D}_d = \begin{cases} 0, & n = 0, 1, \\[4pt] \dfrac{n(n - 1)}{2L_0 \sum_{i=1}^{n} y_i}, & n \ge 2, \end{cases} \qquad (3.2)$$
where $L_0$ is the fixed length of the transect, $n$ is an observed value for the random variable $N_d$, the number of animals seen using direct sampling, and $y_i$ is an observed value for the random variable $Y_i$, $i = 1, 2, \ldots, n$, the right angle distance to the $i$th animal seen. In what follows, we shall show that the variance of $\hat{D}_d$ is not finite. First, we need a result concerning the joint density of the $Y_i$, $i = 1, 2, \ldots, N_d$, conditional on $N_d$.

Theorem 3. Under the assumptions stated in Section 2.2.1, conditional on $N_d = n > 0$, the random variables $Y_1, Y_2, \ldots, Y_{N_d}$ are independently, identically distributed with common density
$$f_Y(y) = \lambda e^{-\lambda y}, \quad y > 0,\ \lambda > 0.$$
Consequently, conditional on $N_d = n > 0$, the random variable $\sum_{i=1}^{N_d} Y_i$ has a Gamma distribution with parameters $n$ and $\lambda$.

Proof: We want to show that for $y_i > 0$, $i = 1, 2, \ldots, N_d$,
$$f_{Y_1 Y_2 \cdots Y_{N_d}}(y_1, y_2, \ldots, y_{N_d} \mid N_d = n) = \lambda^n e^{-\lambda \sum_{i=1}^{n} y_i}.$$
Recall that in the direct sampling procedure, the total length travelled, $L_0$, is fixed, and define $L_n$ to be the random variable representing the total length travelled on the transect when the $n$th animal is sighted. Then the events $\{N_d = n\}$ and $\{L_n \le L_0 < L_{n+1}\}$ are equivalent, so that
$$f_{Y_1 Y_2 \cdots Y_{N_d}}(y_1, y_2, \ldots, y_{N_d} \mid N_d = n) = f_{Y_1 Y_2 \cdots Y_n}(y_1, y_2, \ldots, y_n \mid L_n \le L_0 < L_{n+1}).$$
Now by Theorem 2, $Y_1, Y_2, \ldots, Y_n$, $L_n$ and $L_{n+1}$ are mutually independent, and
$$f_Y(y_i) = \frac{g(y_i)}{c}.$$
Consequently,
$$f_{Y_1 Y_2 \cdots Y_{N_d}}(y_1, y_2, \ldots, y_{N_d} \mid N_d = n) = \prod_{i=1}^{n} f_Y(y_i) = \frac{1}{c^n} \prod_{i=1}^{n} g(y_i) = \lambda^n e^{-\lambda \sum_{i=1}^{n} y_i},$$
which completes the proof.

It is now easy to show that $\mathrm{Var}(\hat{D}_d)$ does not exist. From Theorem 3, conditional on $N_d = n > 0$, $\sum_{i=1}^{N_d} Y_i$ has a Gamma distribution with parameters $n$ and $\lambda$. Thus, using (2.12) and (3.2),
$$E(\hat{D}_d^2 \mid N_d = n) = \begin{cases} 0, & n = 0, 1, \\[4pt] \dfrac{\lambda^2 n^2 (n - 1)}{4L_0^2 (n - 2)}, & n \ge 2. \end{cases}$$
Also, since $N_d$ is the number of sightings in a transect of length $L_0$, it follows from Theorem 1 that $N_d$ has a Poisson distribution with parameter $\theta L_0$.
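Thus
$$E(\hat{D}_d^2) = E_{N_d} E(\hat{D}_d^2 \mid N_d) = \sum_{n=2}^{\infty} \frac{\lambda^2 n^2 (n - 1)}{4L_0^2 (n - 2)} \cdot \frac{e^{-\theta L_0} (\theta L_0)^n}{n!} = +\infty,$$
showing that the variance for the estimate $\hat{D}_d$ defined in (3.2) is infinite. In fact, as long as $P(N_d = 2) > 0$, the variance of $\hat{D}_d$ cannot be finite.

The problem of infinite variance for $\hat{D}_d$ can be overcome by replacing $\hat{D}_d$ with $\hat{D}_g$, where
$$\hat{D}_g = \begin{cases} 0, & \text{if } n = 0, 1, 2, \\[4pt] \dfrac{n(n - 1)}{2L_0 \sum_{i=1}^{n} Y_i}, & \text{if } n \ge 3. \end{cases} \qquad (3.3)$$
Note that the estimate $\hat{D}_g$ differs from $\hat{D}_d$ only when $n = 2$. Since any estimate of the density based on only 2 sightings should be effectively 0, the above modification does not seem to be unreasonable. We will now proceed to derive expressions for the mean and variance of $\hat{D}_g$, which are needed in the sequel.

3.2.1 The Mean and Variance of $\hat{D}_g$

We will first examine $E(\hat{D}_g)$. Recall from Theorem 3 that, conditional on $N_d = n$, $n > 0$, $\sum_{i=1}^{N_d} Y_i$ has a Gamma distribution with parameters $n$ and $\lambda$. Thus
$$E\left( \frac{1}{\sum_{i=1}^{N_d} Y_i} \,\Big|\, N_d = n \right) = \frac{\lambda}{n - 1}, \quad n > 1,$$
and
$$E(\hat{D}_g \mid N_d = n) = \begin{cases} 0, & n = 0, 1, 2, \\[4pt] \dfrac{n\lambda}{2L_0}, & n \ge 3. \end{cases}$$
Now since $N_d$ is distributed as a Poisson random variable with parameter $\theta L_0$, it follows that
$$E(\hat{D}_g) = E_{N_d} E(\hat{D}_g \mid N_d) = \sum_{n=3}^{\infty} \frac{n\lambda}{2L_0} \cdot \frac{e^{-\theta L_0} (\theta L_0)^n}{n!} = \frac{\theta\lambda}{2} \left\{ 1 - e^{-\theta L_0} (1 + \theta L_0) \right\}.$$
Substituting the left hand side of (2.5) for $\theta\lambda/2$ in the above yields
$$E(\hat{D}_g) = D \left\{ 1 - e^{-\theta L_0} (1 + \theta L_0) \right\},$$
and after writing $\mu = \theta L_0$, the expected number of sightings in a transect of length $L_0$, we get
$$E(\hat{D}_g) = D \left\{ 1 - e^{-\mu} (1 + \mu) \right\}. \qquad (3.4)$$
Thus, $\hat{D}_g$ is not strictly unbiased, but the bias arises because there is a positive probability of obtaining samples of size 1 or 2. However, even for moderate values of $\mu$, the bias in $\hat{D}_g$ will be small since $e^{-\mu} (1 + \mu)$ tends to zero exponentially fast. For example, if $\mu = 10$, the relative bias is only .05%.

Next we will look at $\mathrm{Var}(\hat{D}_g)$. Again since, conditional on $N_d = n$, $n > 0$, $\sum_{i=1}^{N_d} Y_i$ is distributed as a Gamma random variable with parameters $n$ and $\lambda$, we know that
$$E\left( \frac{1}{\left(\sum_{i=1}^{N_d} Y_i\right)^2} \,\Big|\, N_d = n \right) = \frac{\lambda^2}{(n - 1)(n - 2)}, \quad n > 2,$$
and
$$E(\hat{D}_g^2 \mid N_d = n) = \begin{cases} 0, & \text{if } n = 0, 1, 2, \\[4pt] \dfrac{\lambda^2 n^2 (n - 1)}{4L_0^2 (n - 2)}, & \text{if } n \ge 3. \end{cases}$$
Therefore,
$$E(\hat{D}_g^2) = E_{N_d} E(\hat{D}_g^2 \mid N_d) = \frac{\lambda^2}{4L_0^2} \sum_{n=3}^{\infty} \frac{n^2 (n - 1)}{n - 2} \cdot \frac{e^{-\mu} \mu^n}{n!}, \qquad (3.5)$$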
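and we can write
$$\mathrm{Var}(\hat{D}_g) = \frac{\lambda^2}{4L_0^2} \sum_{n=3}^{\infty} \frac{n^2 (n - 1)}{n - 2} \cdot \frac{e^{-\mu} \mu^n}{n!} - D^2 \left\{ 1 - e^{-\mu} (1 + \mu) \right\}^2. \qquad (3.6)$$

An approximation to $\mathrm{Var}(\hat{D}_g)$ valid for large values of $\mu$ may be derived in a manner analogous to the method used by Gates et al. (1968). After writing
$$\frac{n^2 (n - 1)}{n - 2} = n^2 + n + 2 + \frac{4}{n - 2},$$
it is easy to see that for $n \ge 3$,
$$n^2 + n + 2 < \frac{n^2 (n - 1)}{n - 2} \le n^2 + n + 6.$$
Thus, lower and upper bounds for $E(\hat{D}_g^2)$ are
$$LB = \frac{\lambda^2}{4L_0^2} \sum_{n=3}^{\infty} (n^2 + n + 2) \frac{e^{-\mu} \mu^n}{n!} \le E(\hat{D}_g^2) \le \frac{\lambda^2}{4L_0^2} \sum_{n=3}^{\infty} (n^2 + n + 6) \frac{e^{-\mu} \mu^n}{n!} = UB.$$
Now
$$UB - LB = \frac{\lambda^2}{L_0^2} \sum_{n=3}^{\infty} \frac{e^{-\mu} \mu^n}{n!} = \frac{\lambda^2}{L_0^2} \left( 1 - e^{-\mu} - \mu e^{-\mu} - \frac{\mu^2 e^{-\mu}}{2} \right).$$
Upon using the relationships $D = \theta\lambda/2$ and $\mu = \theta L_0$, we get
$$UB - LB = \frac{4D^2}{\mu^2} \left( 1 - e^{-\mu} - \mu e^{-\mu} - \frac{\mu^2 e^{-\mu}}{2} \right),$$
which tends to 0 as $\mu \to \infty$. Thus, a reasonable approximation for $E(\hat{D}_g^2)$ is
$$E(\hat{D}_g^2) \approx \frac{UB + LB}{2} = \frac{\lambda^2}{4L_0^2} \sum_{n=3}^{\infty} (n^2 + n + 4) \frac{e^{-\mu} \mu^n}{n!} = \frac{D^2}{\mu^2} \left[ \mu^2 + 2\mu + 4 - e^{-\mu} (4 + 6\mu + 5\mu^2) \right]. \qquad (3.7)$$
From (3.7) an approximation for $\mathrm{Var}(\hat{D}_g)$ is
$$\mathrm{Var}(\hat{D}_g) \approx D^2 \left\{ 1 + \frac{2}{\mu} + \frac{4}{\mu^2} - e^{-\mu} \left( 5 + \frac{6}{\mu} + \frac{4}{\mu^2} \right) \right\} - D^2 \left\{ 1 - 2e^{-\mu} (1 + \mu) + e^{-2\mu} (1 + \mu)^2 \right\}$$
$$= D^2 \left\{ \frac{2}{\mu} + \frac{4}{\mu^2} - e^{-\mu} \left( 3 - 2\mu + \frac{6}{\mu} + \frac{4}{\mu^2} \right) - e^{-2\mu} (1 + \mu)^2 \right\}.$$
Now, as $\mu$ increases, the terms involving $e^{-\mu}$ and $e^{-2\mu}$ will tend to 0 much faster than $\frac{2}{\mu} + \frac{4}{\mu^2}$, so that for large $\mu$ we have the approximation
$$\mathrm{Var}(\hat{D}_g) \approx D^2 \left( \frac{2}{\mu} + \frac{4}{\mu^2} \right). \qquad (3.8)$$
We are now in a position to derive the mean and variance of $\hat{D}_{CP}$.

3.3 Expected Value of $\hat{D}_{CP}$

Recall that in the combined sampling scheme both $N$, the number of animals seen, and $L$, the distance travelled before termination of sampling, are random variables. Thus, the expected value of $\hat{D}_{CP}$ can be found directly using
$$E(\hat{D}_{CP}) = E_N E(\hat{D}_{CP} \mid N).$$
However, before proceeding along these lines it will be helpful to have the following theorems.

Theorem 4. Let $N$ be the random variable representing the number of animals seen using the combined sampling method. Then under the assumptions stated in Section 2.2.1,
$$P(N = n) = \begin{cases} \dfrac{e^{-\mu} \mu^n}{n!}, & n = 0, 1, \ldots, N_0 - 1, \\[6pt] \displaystyle\sum_{m=N_0}^{\infty} \dfrac{e^{-\mu} \mu^m}{m!}, & n = N_0, \end{cases}$$
where $\mu = \theta L_0$ is the expected number of animals sighted along a transect of length $L_0$.

Proof: For $n < N_0$, the event $\{N = n\}$ is equivalent to the event {exactly $n$ sightings occur in $(0, L_0]$}. By Theorem 1, the number of sightings in $(0, L_0]$ is Poisson with parameter $\theta L_0$.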
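Hence,
$$P(N = n) = \frac{e^{-\mu} \mu^n}{n!}, \quad n = 0, 1, \ldots, N_0 - 1.$$
The case $N = N_0$ follows since the event $\{N = N_0\}$ is equivalent to the event {at least $N_0$ sightings occur in $(0, L_0]$}.

The following three theorems establish some useful relationships among the random variables $N$, $L_N$ and $Y_1, Y_2, \ldots, Y_N$, where $N$ is as defined in Theorem 4, $L_N$ represents the total length travelled on the transect when the $N$th animal is sighted and $Y_i$, $i = 1, 2, \ldots, N$, represents the right angle distance to the $i$th animal seen.

Theorem 5. Under the assumptions stated in Section 2.2.1, the conditional p.d.f. of $L_N$ given $N = n > 0$ is
$$f_{L_N}(\ell \mid N = n) = \begin{cases} \dfrac{n \ell^{n-1}}{L_0^n}, & 0 < \ell \le L_0,\ n < N_0, \\[6pt] \dfrac{\theta^{N_0} \ell^{N_0 - 1} e^{-\theta\ell}}{\Gamma(N_0)\, P(N = N_0)}, & 0 < \ell \le L_0,\ n = N_0, \end{cases}$$
where
$$P(N = N_0) = \int_0^{L_0} \frac{\theta^{N_0} \ell^{N_0 - 1} e^{-\theta\ell}}{\Gamma(N_0)}\,d\ell.$$

Proof: First we will consider the case when $n < N_0$. By Theorem 1, seeing $n < N_0$ animals is equivalent to observing $n$ Poisson events in the interval $(0, L_0]$. Therefore (see Bhat, 1972, p. 129), the joint density of $L_1, L_2, \ldots, L_N$ conditional on the occurrence of $N = n$ Poisson events in $(0, L_0]$ is
$$f_{L_1 L_2 \cdots L_N}(\ell_1, \ell_2, \ldots, \ell_n \mid N = n) = \frac{n!}{L_0^n}, \quad 0 < \ell_1 < \ell_2 < \cdots < \ell_n \le L_0,$$
and the marginal density of $L_N$ conditional on $N = n$ is
$$f_{L_N}(\ell \mid N = n) = \frac{n \ell^{n-1}}{L_0^n}, \quad 0 < \ell \le L_0.$$

Next, we will consider the case where $N = N_0$. Define $T_i$ to be the random variable corresponding to the distance travelled on the transect between the $(i-1)$st and $i$th sighting. Then the $N_0$th observation is made at
$$L_{N_0} = \sum_{i=1}^{N_0} T_i.$$
Now in the combined sampling approach, we will see $N = N_0$ animals if and only if the distance
$$L_{N_0} = \sum_{i=1}^{N_0} T_i \le L_0.$$
Therefore, if $0 < \ell \le L_0$, then
$$P(L_N \le \ell \mid N = N_0) = \frac{P(L_{N_0} \le \ell,\ N = N_0)}{P(N = N_0)} = \frac{P\left( \sum_{i=1}^{N_0} T_i \le \ell \right)}{P(N = N_0)}.$$
Now, since the sightings are Poisson events by Theorem 1, the random variable
$$L_{N_0} = \sum_{i=1}^{N_0} T_i$$
has a Gamma distribution with parameters $N_0$ and $\theta$. Thus,
$$f_{L_N}(\ell \mid N = N_0) = \frac{\theta^{N_0} \ell^{N_0 - 1} e^{-\theta\ell}}{\Gamma(N_0)\, P(N = N_0)}, \quad 0 < \ell \le L_0. \qquad (3.9)$$
Now, by Theorem 4, we have
$$P(N = N_0) = \sum_{j=N_0}^{\infty} \frac{e^{-\theta L_0} (\theta L_0)^j}{j!} = \int_0^{\theta L_0} \frac{z^{N_0 - 1} e^{-z}}{\Gamma(N_0)}\,dz = \int_0^{L_0} \frac{\theta^{N_0} \ell^{N_0 - 1} e^{-\theta\ell}}{\Gamma(N_0)}\,d\ell.$$
Substituting $P(N = N_0)$ into (3.9) above completes the proof.

Theorem 6. Under the assumptions stated in Section 2.2.1, conditional on $N = n > 0$, the random variables $Y_1, Y_2, \ldots, Y_N$ and $L_N$ are independent.

Proof: First consider the case $N = N_0$. Let $\ell > 0$ and $y_i > 0$ for $i = 1, 2, \ldots, N$. We want to show that
$$P(Y_i \le y_i,\ L_N \le \ell,\ i = 1, 2, \ldots, N \mid N = N_0) = P(Y_i \le y_i,\ i = 1, 2, \ldots, N_0)\, P(L_N \le \ell \mid N = N_0).$$
Note that the event $\{N = N_0\}$ is equivalent to the event $\{L_{N_0} \le L_0\}$, so that we can write
$$P(Y_i \le y_i,\ L_N \le \ell,\ i = 1, \ldots, N \mid N = N_0) = P(Y_i \le y_i,\ L_{N_0} \le \ell,\ i = 1, 2, \ldots, N_0 \mid N = N_0)$$
$$= P(Y_i \le y_i,\ L_{N_0} \le \ell,\ i = 1, 2, \ldots, N_0 \mid L_{N_0} \le L_0) = \frac{P(Y_i \le y_i,\ L_{N_0} \le \ell,\ L_{N_0} \le L_0,\ i = 1, 2, \ldots, N_0)}{P(L_{N_0} \le L_0)}.$$
Now by Theorem 2, $Y_1, Y_2, \ldots, Y_{N_0}$ and $L_{N_0}$ are independent. Consequently,
$$\frac{P(Y_i \le y_i,\ L_{N_0} \le \ell,\ L_{N_0} \le L_0,\ i = 1, 2, \ldots, N_0)}{P(L_{N_0} \le L_0)} = \frac{P(Y_i \le y_i,\ i = 1, 2, \ldots, N_0)\, P(L_{N_0} \le \ell,\ L_{N_0} \le L_0)}{P(L_{N_0} \le L_0)} = P(Y_i \le y_i,\ i = 1, 2, \ldots, N_0)\, P(L_{N_0} \le \ell \mid N = N_0).$$

Now consider the case $N = n < N_0$, and let $\ell$ and $y_i$ be defined as before. Also, define $X_N$ to be the actual length travelled to see $N$ animals when the combined sampling method is used, that is,
$$X_N = \begin{cases} L_0, & N < N_0, \\ L_{N_0}, & N = N_0. \end{cases}$$
Then for $n < N_0$, the events $\{N = n\}$, $\{X_n = L_0\}$ and $\{L_n \le L_0 < L_{n+1}\}$ are equivalent. Thus,
$$P(Y_i \le y_i,\ L_N \le \ell,\ i = 1, 2, \ldots, N \mid N = n) = P(Y_i \le y_i,\ L_n \le \ell,\ i = 1, 2, \ldots, n \mid X_n = L_0)$$
$$= P(Y_i \le y_i,\ L_n \le \ell,\ i = 1, 2, \ldots, n \mid L_n \le L_0 < L_{n+1}) = \frac{P(Y_i \le y_i,\ L_n \le \ell,\ L_n \le L_0 < L_{n+1},\ i = 1, 2, \ldots, n)}{P(L_n \le L_0 < L_{n+1})}.$$
Again by Theorem 2, for $N = n$, $Y_1, Y_2, \ldots, Y_n$ and $L_n$, $L_{n+1}$ are independent, so that
$$\frac{P(Y_i \le y_i,\ L_n \le \ell,\ L_n \le L_0 < L_{n+1},\ i = 1, 2, \ldots, n)}{P(L_n \le L_0 < L_{n+1})} = \frac{P(Y_i \le y_i,\ i = 1, 2, \ldots, n)\, P(L_n \le \ell,\ L_n \le L_0 < L_{n+1})}{P(L_n \le L_0 < L_{n+1})} = P(Y_i \le y_i,\ i = 1, 2, \ldots, n)\, P(L_N \le \ell \mid N = n).$$

Theorem 7. Under the assumptions stated in Section 2.2.1, conditional on $N = n > 0$, the random variables $Y_1, Y_2, \ldots, Y_N$ are independently, identically distributed with common density
$$f_Y(y) = \lambda e^{-\lambda y}, \quad y > 0,\ \lambda > 0.$$
Consequently, conditional on $N = n > 0$, the random variable $\sum_{i=1}^{N} Y_i$ has a Gamma distribution with parameters $n$ and $\lambda$.

Proof: The case $N = n$ when $0 < n < N_0$ is handled exactly as in the proof of Theorem 3. For the case $N = N_0$, the event $\{N = N_0\}$ is equivalent to the event $\{L_{N_0} \le L_0\}$, so that
$$f_{Y_1 Y_2 \cdots Y_N}(y_1, y_2, \ldots, y_N \mid N = N_0) = f_{Y_1 Y_2 \cdots Y_{N_0}}(y_1, y_2, \ldots, y_{N_0} \mid L_{N_0} \le L_0).$$
Now by Theorem 2, $Y_1, Y_2, \ldots, Y_{N_0}$ and $L_{N_0}$ are mutually independent and
$$f_Y(y_i) = \frac{g(y_i)}{c}.$$
Consequently,
$$f_{Y_1 Y_2 \cdots Y_{N_0}}(y_1, y_2, \ldots, y_{N_0} \mid N = N_0) = \prod_{i=1}^{N_0} f_Y(y_i) = \frac{1}{c^{N_0}} \prod_{i=1}^{N_0} g(y_i),$$
and substituting $g(y_i) = e^{-\lambda y_i}$ completes the proof.

We are now ready to determine the expected value of $\hat{D}_{CP}$ given $N = n$. For $n = 0, 1, 2$,
$$E(\hat{D}_{CP} \mid N = n) = E(\hat{D}_g \mid N = n) = 0. \qquad (3.10)$$
Next consider the values $3 \le n < N_0$.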
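Recall from Theorem 7 that, conditional on $N=n>0$, $\sum_{i=1}^{N} Y_i$ has a Gamma distribution with parameters $n$ and $\lambda$. Then using expressions (3.1) and (3.3) it follows that

$$E(\hat{D}_{CP} \mid N=n) = E(\hat{D}_g \mid N=n) = E\!\left[\frac{N(N-1)}{2L_0\sum_{i=1}^{N} Y_i}\ \Big|\ N=n\right] = \frac{n\lambda}{2L_0}. \qquad (3.11)$$

Finally, for $N=N_0$ it follows from Theorem 5, Theorem 6, Theorem 7, and expressions (3.1) and (3.3) that

$$E(\hat{D}_{CP} \mid N=N_0) = E(\hat{D}_u \mid N=N_0) = \frac{(N_0-1)^2}{2}\,E\!\left[\frac{1}{\sum_{i=1}^{N_0} Y_i}\ \Big|\ N=N_0\right] E\!\left[\frac{1}{L_N}\ \Big|\ N=N_0\right] = \frac{(N_0-1)\lambda}{2P(N=N_0)}\int_0^{L_0}\frac{\theta^{N_0}\,\ell^{\,N_0-2}e^{-\theta\ell}}{\Gamma(N_0)}\,d\ell.$$

Then using the transformation $z=\theta\ell$ we get

$$E(\hat{D}_{CP} \mid N=N_0) = \frac{\theta\lambda}{2P(N=N_0)}\sum_{n=N_0-1}^{\infty}\frac{e^{-\theta L_0}(\theta L_0)^n}{n!}. \qquad (3.12)$$

We can now evaluate the expected value of $\hat{D}_{CP}$. Using Theorem 4 and expressions (2.5), (3.10), (3.11), and (3.12), we find that

$$E(\hat{D}_{CP}) = E_N E(\hat{D}_{CP} \mid N) = \sum_{n=3}^{N_0-1}\frac{n\lambda}{2L_0}\,P(N=n) + \frac{\theta\lambda}{2}\sum_{n=N_0-1}^{\infty}\frac{e^{-\mu}\mu^n}{n!} = D\left[1-e^{-\mu}(1+\mu)\right], \qquad (3.13)$$

where $\mu=\theta L_0$. Thus $\hat{D}_{CP}$ is a biased estimate for the density. Note that the bias here is equal to the bias for the modified estimate, $\hat{D}_g$, in direct sampling. This is as expected since in the combined sampling procedure, we are simply choosing the estimate that corresponds to the reason for terminating sampling. If we stop sampling after seeing the $N_0$th animal, then the inverse sampling estimate is used, and, likewise, if sampling stops after travelling the distance $L_0$, then the direct sampling estimate is used.

3.4 Variance of $\hat{D}_{CP}$

An expression for the variance of $\hat{D}_{CP}$ can be found directly using the formula

$$\mathrm{Var}(\hat{D}_{CP}) = E(\hat{D}_{CP}^2) - \{E(\hat{D}_{CP})\}^2. \qquad (3.14)$$

In the preceding section we derived $E(\hat{D}_{CP})$, so that our problem reduces to evaluating $E(\hat{D}_{CP}^2)$. Proceeding along the same lines as in Section 3.3, we quickly find

$$E(\hat{D}_{CP}^2 \mid N=n) = 0, \qquad n = 0, 1, 2, \qquad (3.15)$$

$$E(\hat{D}_{CP}^2 \mid N=n) = E(\hat{D}_g^2 \mid N=n) = \frac{\lambda^2 n^2(n-1)}{4L_0^2(n-2)}, \qquad 3\le n\le N_0-1, \qquad (3.16)$$

and

$$E(\hat{D}_{CP}^2 \mid N=N_0) = E(\hat{D}_u^2 \mid N=N_0) = \frac{(N_0-1)^4}{4}\,E\!\left[\frac{1}{\left(\sum_{i=1}^{N_0} Y_i\right)^2}\ \Big|\ N=N_0\right] E\!\left[\frac{1}{L_N^2}\ \Big|\ N=N_0\right]$$

$$= \frac{(N_0-1)^3\lambda^2}{4(N_0-2)\,P(N=N_0)}\int_0^{L_0}\frac{\theta^{N_0}\,\ell^{\,N_0-3}e^{-\theta\ell}}{\Gamma(N_0)}\,d\ell = \frac{\lambda^2\theta^2(N_0-1)^2}{4(N_0-2)^2\,P(N=N_0)}\sum_{n=N_0-2}^{\infty}\frac{e^{-\theta L_0}(\theta L_0)^n}{n!}. \qquad (3.17)$$

Then, using Theorem 4, expressions (2.5), (3.15), (3.16), and (3.17) and letting $\mu=\theta L_0$, it follows that

$$E(\hat{D}_{CP}^2) = E_N E(\hat{D}_{CP}^2 \mid N) = \sum_{n=3}^{N_0-1}\frac{\lambda^2 n^2(n-1)}{4L_0^2(n-2)}\,\frac{e^{-\mu}\mu^n}{n!} + \frac{\lambda^2\theta^2(N_0-1)^2}{4(N_0-2)^2}\sum_{n=N_0-2}^{\infty}\frac{e^{-\mu}\mu^n}{n!}$$

$$= \frac{\lambda^2\theta^2}{4}\sum_{n=3}^{N_0-1}\frac{n}{n-2}\,\frac{e^{-\mu}\mu^{n-2}}{(n-2)!} + \frac{\lambda^2\theta^2}{4}\left(\frac{N_0-1}{N_0-2}\right)^2\sum_{n=N_0-2}^{\infty}\frac{e^{-\mu}\mu^n}{n!},$$

and, therefore,

$$E(\hat{D}_{CP}^2) = D^2\left\{\sum_{n=1}^{N_0-3}\frac{n+2}{n}\,\frac{e^{-\mu}\mu^n}{n!} + \left(\frac{N_0-1}{N_0-2}\right)^2\sum_{n=N_0-2}^{\infty}\frac{e^{-\mu}\mu^n}{n!}\right\}. \qquad (3.18)$$

An expression for the variance of $\hat{D}_{CP}$ is now evident. Using (3.13), (3.14), and (3.18), we get

$$\mathrm{Var}(\hat{D}_{CP}) = D^2\left\{\sum_{n=1}^{N_0-3}\frac{n+2}{n}\,\frac{e^{-\mu}\mu^n}{n!} + \left(\frac{N_0-1}{N_0-2}\right)^2\sum_{n=N_0-2}^{\infty}\frac{e^{-\mu}\mu^n}{n!} - \left[1-e^{-\mu}(1+\mu)\right]^2\right\}, \qquad (3.19)$$

where $\mu=\theta L_0$. Note that

$$\lim_{L_0\to\infty}\mathrm{Var}(\hat{D}_{CP}) = \left[\left(\frac{N_0-1}{N_0-2}\right)^2 - 1\right]D^2$$

and

$$\lim_{N_0\to\infty}\mathrm{Var}(\hat{D}_{CP}) = D^2\left\{\sum_{n=1}^{\infty}\frac{n+2}{n}\,\frac{e^{-\mu}\mu^n}{n!} - \left[1-e^{-\mu}(1+\mu)\right]^2\right\}.$$

After some simple algebraic manipulations and using the relationship $D=\theta\lambda/2$, one can easily show that the limit as $L_0\to\infty$ and the limit as $N_0\to\infty$ are equal to the $\mathrm{Var}(\hat{D}_u)$ given in (2.13) and the $\mathrm{Var}(\hat{D}_g)$ given in (3.6), respectively. These limiting values are as expected, since letting $L_0\to\infty$ in the combined sampling approach is equivalent to using inverse sampling, while letting $N_0\to\infty$ is equivalent to using direct sampling.

We will now show that $\mathrm{Var}(\hat{D}_{CP})$ can be expressed as a function of both $\mathrm{Var}(\hat{D}_u)$ and $\mathrm{Var}(\hat{D}_g)$ given in (2.13) and (3.6), respectively. Writing the equation in this form will then lead directly to an approximation for $\mathrm{Var}(\hat{D}_{CP})$.

First note that (3.19) can be rewritten as

$$\frac{\mathrm{Var}(\hat{D}_{CP})}{D^2} = \sum_{n=1}^{N_0-3}\frac{n+2}{n}\,\frac{e^{-\mu}\mu^n}{n!} + \left(\frac{N_0-1}{N_0-2}\right)^2\sum_{n=N_0-2}^{\infty}\frac{e^{-\mu}\mu^n}{n!} - \left[1-e^{-\mu}(1+\mu)\right]^2. \qquad (3.20)$$

Adding and subtracting the terms

$$\sum_{n=N_0-2}^{\infty}\frac{n+2}{n}\,\frac{e^{-\mu}\mu^n}{n!} \qquad\text{and}\qquad \sum_{n=N_0-2}^{\infty}\frac{e^{-\mu}\mu^n}{n!}$$

on the right hand side of (3.20) yields

$$\frac{\mathrm{Var}(\hat{D}_{CP})}{D^2} = \left\{\sum_{n=1}^{\infty}\frac{n+2}{n}\,\frac{e^{-\mu}\mu^n}{n!} - \left[1-e^{-\mu}(1+\mu)\right]^2\right\} + \frac{2N_0-3}{(N_0-2)^2}\sum_{n=N_0-2}^{\infty}\frac{e^{-\mu}\mu^n}{n!} - \sum_{n=N_0-2}^{\infty}\frac{2}{n}\,\frac{e^{-\mu}\mu^n}{n!}.$$

Now, after multiplying both sides of the previous expression by $D^2$, using the relationship $D=\theta\lambda/2$, and substituting the expressions for $\mathrm{Var}(\hat{D}_u)$ and $\mathrm{Var}(\hat{D}_g)$ given in (2.13) and (3.6), respectively, we see that

$$\mathrm{Var}(\hat{D}_{CP}) = \mathrm{Var}(\hat{D}_g) + \mathrm{Var}(\hat{D}_u)\sum_{n=N_0-2}^{\infty}\frac{e^{-\mu}\mu^n}{n!} - 2D^2\sum_{n=N_0-2}^{\infty}\frac{e^{-\mu}\mu^n}{n\cdot n!}. \qquad (3.21)$$

Therefore, an approximation for $\mathrm{Var}(\hat{D}_{CP})$ can be simply obtained by using the approximation for $\mathrm{Var}(\hat{D}_g)$ given in (3.8).

3.5 Maximum Likelihood Justification for $\hat{D}_{CP}$

In Section 3.1 we stated that the estimate

$$\hat{D}_{CP} = \begin{cases} \hat{D}_g, & N<N_0, \\ \hat{D}_u, & N=N_0, \end{cases}$$

could be justified using the maximum likelihood procedure. To show this, we first need the joint density function for $Y_1, Y_2, \ldots, Y_N$, $L_N$ and $N$, i.e., $f_{Y,L_N,N}(y,\ell,n)$. By Theorem 6, $f_{Y,L_N,N}(\cdot)$ can be written as

$$f_{Y,L_N,N}(y,\ell,n) = f_Y(y \mid N=n)\,f_{L_N}(\ell \mid N=n)\,P(N=n).$$

The functional form for $f_{Y,L_N,N}(\cdot)$ is now evident. Using Theorems 4 and 5 and recalling from Theorem 7 that

$$f_Y(y \mid N=n) = \lambda^n e^{-\lambda\sum_{i=1}^{n} y_i}, \qquad \lambda>0,$$

we obtain

$$f_{Y,L_N,N}(y,\ell,n) = \begin{cases} \lambda^n e^{-\lambda\sum_{i=1}^{n} y_i}\;\dfrac{n\,\ell^{\,n-1}}{L_0^{\,n}}\;\dfrac{e^{-\theta L_0}(\theta L_0)^n}{n!}, & N=n<N_0, \\[2ex] \lambda^{N_0} e^{-\lambda\sum_{i=1}^{N_0} y_i}\;\dfrac{\theta^{N_0}\,\ell^{\,N_0-1}e^{-\theta\ell}}{(N_0-1)!}, & N=N_0. \end{cases} \qquad (3.22)$$

As shown in Section 2.3, the maximum likelihood estimate for $D$ is given by

$$\hat{D} = \frac{\hat{\theta}\hat{\lambda}}{2},$$

where $\hat{\theta}$ and $\hat{\lambda}$ are maximum likelihood estimates for $\theta$ and $\lambda$, respectively. Finding maximum likelihood estimates for $\theta$ and $\lambda$ is now straightforward. Taking the natural logarithm of the likelihood function and setting the partial derivatives with respect to $\theta$ and $\lambda$ equal to 0 yields, for $n>0$,

$$\hat{\theta} = \begin{cases} N/L_0, & N=n<N_0, \\ N_0/L_N, & N=N_0, \end{cases} \qquad\text{and}\qquad \hat{\lambda} = \frac{N}{\sum_{i=1}^{N} Y_i}, \quad N=1,\ldots,N_0.$$

Thus, a maximum likelihood estimate for $D$ using combined sampling would be

$$\hat{D} = \begin{cases} \dfrac{N^2}{2L_0\sum_{i=1}^{N} Y_i}, & N=n<N_0, \\[2ex] \dfrac{N_0^2}{2L_N\sum_{i=1}^{N_0} Y_i}, & N=N_0. \end{cases}$$

Our estimate $\hat{D}_{CP}$ is obtained by correcting the estimates $\hat{\theta}$ and $\hat{\lambda}$ for bias and noting that the variance of the bias-corrected estimate of $\lambda$, given $N=n$, exists only for values of $N>2$.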
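The bias and variance results of Sections 3.3 and 3.4 are easy to evaluate numerically. The following is a minimal sketch, not part of the original dissertation, that computes the relative mean $E(\hat{D}_{CP})/D = 1-e^{-\mu}(1+\mu)$ from (3.13) and the relative variance $\mathrm{Var}(\hat{D}_{CP})/D^2$ from (3.19), and then checks the latter against the decomposition into the direct-sampling variance, the inverse-sampling variance weighted by a Poisson tail probability, and a correction term. The function names and the truncation point `terms` for the infinite sums are assumptions made for illustration.

```python
import math

def poisson_pmf(n, mu):
    """P(N = n) for a Poisson count with mean mu > 0 (log scale avoids overflow)."""
    return math.exp(-mu + n * math.log(mu) - math.lgamma(n + 1))

def poisson_tail(k, mu):
    """P(N >= k), computed as one minus the lower partial sum."""
    return 1.0 - sum(poisson_pmf(n, mu) for n in range(k))

def rel_mean(mu):
    """E(D_CP) / D from (3.13), with mu = theta * L0."""
    return 1.0 - math.exp(-mu) * (1.0 + mu)

def rel_var(mu, n0):
    """Var(D_CP) / D^2 from (3.19); requires n0 >= 3."""
    head = sum((n + 2) / n * poisson_pmf(n, mu) for n in range(1, n0 - 2))
    tail = ((n0 - 1) / (n0 - 2)) ** 2 * poisson_tail(n0 - 2, mu)
    return head + tail - rel_mean(mu) ** 2

def rel_var_decomposed(mu, n0, terms=400):
    """Same quantity via the three-part decomposition; infinite sums are
    truncated at `terms`, which is ample for moderate mu."""
    var_g = sum((n + 2) / n * poisson_pmf(n, mu)
                for n in range(1, terms + 1)) - rel_mean(mu) ** 2
    var_u = (2 * n0 - 3) / (n0 - 2) ** 2
    corr = 2.0 * sum(poisson_pmf(n, mu) / n for n in range(n0 - 2, terms + 1))
    return var_g + var_u * poisson_tail(n0 - 2, mu) - corr
```

For a very long transect (large $\mu$) `rel_var(mu, n0)` approaches $(2N_0-3)/(N_0-2)^2$, the inverse sampling limit noted in Section 3.4, while `rel_mean(mu)` approaches 1, so the bias disappears.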
CHAPTER IV
DENSITY ESTIMATION FOR CLUSTERED POPULATIONS

4.1 Introduction

The estimation procedures developed in Chapters II and III are based on the assumption that the sightings of animals are independent events. These methods would be applicable to animal populations that are generally made up of solitary individuals, such as ruffed grouse, snowshoe hare and gila monster. However, there are other types of animals which aggregate into coveys, schools and other tight groups. Animals behaving in this way will be said to belong to clustered populations. Some examples of clustered populations are bobwhite quail, gray partridge and porpoise. In these cases the assumption of independent sightings is certainly not valid, and a different procedure would have to be used.

The line transect method could be easily generalized to provide estimates for clustered populations. As noted by Anderson et al. (1976, p. 12), if we amend the assumptions in Section 2.2.1 so that they refer to clusters of animals rather than individual animals, then the results of Chapters II and III are directly applicable to the estimation of the cluster density, $D_c$. The estimate for $D_c$ will be based on the right angle distances to the clusters from the random line transect. In the case where the number of animals in every sighted cluster can be determined without error, an estimate for the population density $D$ is given by

$$\hat{D} = \hat{D}_c\,\bar{s},$$

where $\hat{D}_c$ is the estimate for $D_c$ and $\bar{s}$ is the average size of the observed clusters.

Some criticisms of the approach outlined in the preceding paragraph are possible. First of all, it may not be possible to determine the distance to a cluster as easily (or as accurately) as the distance to an animal. How will this distance be defined?
Secondly, the simple modification of the assumptions in Section 2.2.1, obtained by replacing the word "animal" by the word "cluster," would imply that the probability of sighting a cluster depends only on its right angle distance from the line. This may not be a reasonable assumption since the probability of sighting a larger cluster is likely to be greater than the probability of sighting a smaller cluster. Finally, the sighting of a cluster may not necessarily mean that all of the animals comprising the cluster are seen and counted by the observer. In this case, a more reasonable assumption would be to let the probability of sighting an animal belonging to a cluster depend on the distance to the cluster as well as the true cluster size.

In this chapter we shall propose a density estimate for a clustered population by assuming, among other things, that it is possible to determine the distance to the center of the cluster from the line transect. An estimation procedure will then be developed using a model in which the observer's count of the number of animals in a cluster is regarded as a random variable with a probability distribution depending upon the right angle distance and the size of the cluster.

4.2 Assumptions

The density estimate that we will develop is based on the inverse sampling approach outlined in Section 2.1, with one minor modification. In clustered populations the plan is to continue sampling along a randomly placed transect until a prespecified number, $N_c$, of clusters (rather than animals) are seen. As each cluster is sighted, the following information is recorded:

1. the right angle distance, $y$, from the transect to the center of the cluster;
2. the observed number of animals, $s$, in the cluster (this may be less than the true size of the cluster);
3. the actual distance, $\ell$, travelled by the observer to sight $N_c$ clusters.
The sampling procedure described above may be used to construct an estimate of the population density under the following set of assumptions. These assumptions closely parallel those of Section 2.2.1 with the exception that they are now phrased in terms of clusters rather than individual animals.

B1. The clusters are randomly distributed with rate (density) $D_c$ over the area of interest, $A$.

B2. The clusters are independently distributed over $A$, i.e., given two disjoint regions of area, $\delta A_1$ and $\delta A_2$, P($n_1$ clusters are in $\delta A_1$ and $n_2$ clusters are in $\delta A_2$) = P($n_1$ clusters are in $\delta A_1$) P($n_2$ clusters are in $\delta A_2$).

B3. Clusters are fixed, i.e., there is no confusion over clusters moving during sampling and none are counted twice.

B4. There exists a probability mass function $p(\cdot)$ defined on the set of positive integers, such that $p(r)$ is the probability that $r$ is the true size of a cluster located at a right angle distance, $y$, from the transect. Note that $p(r)$ is independent of $y$. In probability notation, if $R$ and $Y$ denote the random variables representing the true cluster size and the right angle distance to the cluster, respectively, then

$$P(R=r \mid Y=y) = p(r), \qquad r = 1, 2, \ldots \qquad (4.1)$$

B5. The probability of observing a cluster depends only on the size of the cluster and the distance from the transect to the cluster.

B6. There exists a non-negative function $h(\cdot)$ defined on $[0,\infty)$ such that $0\le h(\cdot)\le 1$, $h(0)=1$, and the probability of observing $s$ animals belonging to a cluster of size $r\ge s$ located at a right angle distance $y$ from the transect is $\binom{r}{s}[h(y)]^s[1-h(y)]^{r-s}$. That is, if $Y$ and $S$ denote the random variables representing the right angle distance to a cluster and the observed number of animals in a cluster, respectively, then

$$P(S=s \mid R=r,\ Y=y) = \binom{r}{s}[h(y)]^s[1-h(y)]^{r-s}. \qquad (4.2)$$

Closer examination of assumption B6 shows that we are now allowing the probability of observing a cluster to depend on both the right angle distance, $y$, and the true cluster size, $r$.
To see this, first let $C$ be the event that a cluster is observed. Then the probability of observing a cluster of size $r$ located at distance $y$ from the transect is

$$P(C \mid R=r,\ Y=y) = \sum_{s=1}^{r} P(S=s \mid R=r,\ Y=y) = 1 - P(S=0 \mid R=r,\ Y=y) = 1 - [1-h(y)]^r, \qquad (4.3)$$

which clearly depends on both $y$ and $r$.

The assumption B6 also satisfies the reasonable requirement that for a fixed right angle distance $y>0$ and $r_1<r_2$,

$$P(C \mid R=r_1,\ Y=y) \le P(C \mid R=r_2,\ Y=y).$$

This follows immediately from equation (4.3). Note that

$$P(C \mid R=r_1,\ Y=y) = 1 - [1-h(y)]^{r_1} \qquad\text{and}\qquad P(C \mid R=r_2,\ Y=y) = 1 - [1-h(y)]^{r_2}.$$

Now, since $0\le h(y)\le 1$, it is clear that $P(C \mid R=r_1,\ Y=y) \le P(C \mid R=r_2,\ Y=y)$.

One final note with regard to assumption B6 is in order. In the case where every cluster has size 1, i.e., $P(R=1)=1$, the probability of sighting a cluster located at a right angle distance $y$ is simply $h(y)$. This is quickly seen by setting $r=1$ in (4.3). Thus, under these circumstances, $h(y)$ has the same interpretation as $g(y)$ defined in Section 2.2.1, that is, $h(y)$ is the conditional probability of sighting an animal at distance $y$ given there is an animal at $y$.

4.3 General Form of the Likelihood Function

We will use the maximum likelihood procedure to obtain an estimate for $D$, the animal population density. To obtain the likelihood function, we first need an expression for the probability density function $f_{S,Y,L}(s,y,\ell)$, where $S=(S_1,S_2,\ldots,S_{N_c})$ is the vector of random variables representing the actual number of animals seen in the clusters, $Y=(Y_1,Y_2,\ldots,Y_{N_c})$ is the vector of random variables representing the right angle distances from the clusters to the transect and $L$ is the random variable representing the total length travelled on the transect to see $N_c$ clusters. Upon writing

$$f_{S,Y,L}(s,y,\ell) = f_{S\mid Y,L}(s \mid y,\ell)\,f_{Y\mid L}(y \mid \ell)\,f_L(\ell), \qquad (4.4)$$

it is seen that specifying the joint probability density function for $S$, $Y$ and $L$ is equivalent to specifying the three functions on the right hand side of (4.4).
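The binomial sighting model of assumption B6 and the resulting cluster detection probability (4.3) can be checked with a few lines of code. This is a small illustrative sketch, not part of the original text; the function names are assumptions, and `h` stands for the value $h(y)$ at some fixed distance.

```python
from math import comb

def prob_observe_s(s, r, h):
    """P(S = s | R = r, Y = y) from (4.2), where h = h(y) lies in [0, 1]."""
    return comb(r, s) * h ** s * (1.0 - h) ** (r - s)

def prob_cluster_seen(r, h):
    """P(C | R = r, Y = y) from (4.3): at least one of the r animals is seen."""
    return 1.0 - (1.0 - h) ** r

# Summing (4.2) over s = 1, ..., r reproduces (4.3), and larger clusters
# are at least as likely to be detected as smaller ones at the same distance.
h_value = 0.3
seen_by_size = [prob_cluster_seen(r, h_value) for r in range(1, 8)]
```

With `h_value = 0.3`, the entries of `seen_by_size` increase with the cluster size, which is the monotonicity property noted after (4.3), and the first entry equals `h_value`, the single-animal case.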
The density functions $f_{Y\mid L}(y \mid \ell)$ and $f_L(\ell)$ may be derived in a manner analogous to that used in Section 2.2. Let $g_c(y)$ denote the probability of sighting a cluster located at a right angle distance $y$ from the transect, that is,

$$g_c(y) = P(\text{observe a cluster} \mid Y=y).$$

Since sighting a cluster located at a distance $y$ is equivalent to observing at least one animal belonging to the cluster, we can write

$$g_c(y) = \sum_{s=1}^{\infty} P(\text{observe } s \text{ animals} \mid Y=y) = \sum_{s=1}^{\infty} P(S=s \mid Y=y).$$

Now, for $s\ge 1$,

$$P(S=s \mid Y=y) = \sum_{r=s}^{\infty} P(S=s \mid R=r,\ Y=y)\,P(R=r \mid Y=y).$$

By assumption B4, it follows that $Y$ and $R$ are independent random variables. Thus, using (4.1) and (4.2) we get

$$P(S=s \mid Y=y) = \sum_{r=s}^{\infty}\binom{r}{s}[h(y)]^s[1-h(y)]^{r-s}\,p(r).$$

Therefore,

$$g_c(y) = \sum_{s=1}^{\infty}\sum_{r=s}^{\infty}\binom{r}{s}[h(y)]^s[1-h(y)]^{r-s}\,p(r). \qquad (4.5)$$

Now, according to assumption B6, $h(0)=1$, so that

$$g_c(0) = \sum_{s=1}^{\infty} p(s) = 1.$$

Therefore, the function $g_c(y)$ plays a role similar to the role of $g(y)$ in Section 2.2. Consequently, by regarding a "cluster" as an "animal," the results of Section 2.2 can be applied to clustered populations in a straightforward manner. Let $N_c(\ell)$ denote the random variable representing the number of clusters seen when travelling a distance $\ell$ on the transect. Then, by Theorem 1, $N_c(\ell)$ is a Poisson process with parameter $\theta^*\ell$, where $\theta^*$ is the expected number of clusters seen per unit length of transect. Also, from Theorem 1 we see that the respective analogs to equations (2.2) and (2.1) are

$$D_c = \frac{\theta^*}{2c^*}, \qquad (4.6)$$

where $D_c$ is the density of clusters, and

$$c^* = \int_0^{\infty} g_c(y)\,dy. \qquad (4.7)$$

Furthermore, Theorem 2 gives us the results that $L$ and $Y$ are mutually independent random variables, $L$ is distributed as a Gamma random variable with parameters $N_c$ and $\theta^*$, and the conditional density of $Y$ given $L=\ell$ is

$$f_{Y\mid L}(y \mid \ell) = \frac{1}{(c^*)^{N_c}}\prod_{i=1}^{N_c} g_c(y_i). \qquad (4.8)$$

Now, assumption B5 implies that the number of animals actually observed in a cluster depends only on the right angle distance to the cluster, $Y$, and the size of the cluster, $R$.
Thus, $S$ is independent of $L$, and since $Y$ is also independent of $L$ it follows that

$$f_{S\mid Y,L}(s \mid y,\ell) = f_{S\mid Y}(s \mid y) = \prod_{i=1}^{N_c} P(S_i=s_i \mid Y_i=y_i). \qquad (4.9)$$

We can now write an expression for the likelihood function $L(\theta^*, p(\cdot), h(\cdot); s, y, \ell)$. Using (4.4), (4.8) and (4.9) and recalling that $L$ has a Gamma distribution with parameters $N_c$ and $\theta^*$, we obtain

$$L(\theta^*, p(\cdot), h(\cdot); s, y, \ell) = \prod_{i=1}^{N_c} P(S_i=s_i \mid Y_i=y_i)\times\frac{\prod_{i=1}^{N_c} g_c(y_i)}{(c^*)^{N_c}}\times\frac{(\theta^*)^{N_c}\,\ell^{\,N_c-1}e^{-\theta^*\ell}}{\Gamma(N_c)}. \qquad (4.10)$$

4.4 Estimation of $D$ when $p(\cdot)$ and $h(\cdot)$ Have Specific Forms

For a clustered population with a cluster density, $D_c$, the animal population density may be defined as $D = D_c\,\nu$, where $\nu = E(R)$ is the expected cluster size. Upon using the expression for $D_c$ given in (4.6), we get

$$D = \frac{\theta^*\nu}{2c^*}, \qquad (4.11)$$

so that maximum likelihood estimation of $D$ can be carried out by using (4.11) in the likelihood function presented in (4.10). Since the random variables $S$ and $Y$ are independent of $L$, it is easily seen from (4.10) that the maximum likelihood estimate of $\theta^*$ corrected for bias is

$$\hat{\theta}^* = \frac{N_c-1}{\ell}. \qquad (4.12)$$

However, finding estimates of $\nu$ and $c^*$ can be quite difficult depending upon the nature of the functions $p(\cdot)$ and $h(\cdot)$. Very likely, one has to resort to some iterative technique such as the Newton-Raphson method (see Korn and Korn, 1968, eqn. (20.2-31)) to solve the likelihood equation.

It is apparent that there exist a wide variety of functions which satisfy the requirements of $p(\cdot)$ and $h(\cdot)$. The appropriate choice in a particular problem depends on the nature of the population under investigation. In this work we will consider the functions

$$p(r) = \frac{e^{-a}a^r}{r!\,(1-e^{-a})}, \qquad a>0,\ r = 1, 2, \ldots \qquad (4.13)$$

and

$$h(y) = e^{-\lambda^* y}, \qquad \lambda^*>0,\ y>0. \qquad (4.14)$$

It is easily seen that $p(\cdot)$ given by (4.13) represents a truncated Poisson distribution. The expected cluster size $\nu$ is therefore given by

$$\nu = \frac{a}{1-e^{-a}}. \qquad (4.15)$$

The limiting case $a=0$ corresponds to a population in which the cluster size is 1 with probability 1. Thus, $a=0$ corresponds to the model in Section 2.2.
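As a quick numerical illustration of (4.13) and (4.15), the sketch below (not from the original text; function names are illustrative assumptions) evaluates the zero-truncated Poisson probabilities, confirms that they sum to one, and compares the directly computed mean cluster size with $a/(1-e^{-a})$.

```python
import math

def p_size(r, a):
    """Zero-truncated Poisson probability (4.13) for true cluster size r >= 1."""
    return math.exp(-a) * a ** r / (math.factorial(r) * -math.expm1(-a))

def mean_size(a):
    """Expected cluster size nu from (4.15); expm1 keeps small a accurate."""
    return a / -math.expm1(-a)

a_value = 2.5
sizes = range(1, 61)                      # 60 terms is ample for moderate a
total_prob = sum(p_size(r, a_value) for r in sizes)
direct_mean = sum(r * p_size(r, a_value) for r in sizes)
```

As $a$ decreases toward 0, `mean_size(a)` approaches 1, which matches the remark that the limiting case corresponds to clusters of size 1.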
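The choice for $h(\cdot)$ is based on the fact that when $a=0$, $h(\cdot)$ may be interpreted as the function $g(\cdot)$ defined in Chapter II. Because $g(y)=e^{-\lambda y}$ seems to be a popular choice for $g(\cdot)$, we feel that $h(y)=e^{-\lambda^* y}$ is a reasonable choice for $h(\cdot)$.

The likelihood function may now be regarded as a function of $\theta^*$, $a$ and $\lambda^*$, and maximum likelihood estimation of

$$D = \frac{\theta^*\nu}{2c^*}$$

may be accomplished by expressing $\nu$ and $c^*$ as functions of $a$ and $\lambda^*$. We have already seen the form of $\nu$ in equation (4.15). To derive an expression for $c^*$ we proceed as follows. Recall from (4.5) that

$$g_c(y) = \sum_{s=1}^{\infty} P(S=s \mid Y=y),$$

where

$$P(S=s \mid Y=y) = \sum_{r=s}^{\infty}\binom{r}{s}[h(y)]^s[1-h(y)]^{r-s}\,p(r).$$

Now using (4.13) and (4.14) in the above equation, we get

$$P(S=s \mid Y=y) = \sum_{r=s}^{\infty}\binom{r}{s}e^{-\lambda^* y s}\left(1-e^{-\lambda^* y}\right)^{r-s}\frac{e^{-a}a^r}{r!\,(1-e^{-a})} = \frac{\left(a e^{-\lambda^* y}\right)^s e^{-a}}{s!\,(1-e^{-a})}\sum_{r=s}^{\infty}\frac{\left[a\left(1-e^{-\lambda^* y}\right)\right]^{r-s}}{(r-s)!}$$

$$= \frac{\left(a e^{-\lambda^* y}\right)^s e^{-a e^{-\lambda^* y}}}{s!\,(1-e^{-a})}. \qquad (4.16)$$

Then, substituting for $P(S=s \mid Y=y)$, we get

$$g_c(y) = \frac{e^{-a e^{-\lambda^* y}}}{1-e^{-a}}\sum_{s=1}^{\infty}\frac{\left(a e^{-\lambda^* y}\right)^s}{s!} = \frac{1-e^{-a e^{-\lambda^* y}}}{1-e^{-a}}. \qquad (4.17)$$

Therefore, using (4.7) and (4.17), we get

$$c^* = \int_0^{\infty} g_c(y)\,dy = \frac{1}{1-e^{-a}}\int_0^{\infty}\left(1-e^{-a e^{-\lambda^* y}}\right)dy. \qquad (4.18)$$

To evaluate $c^*$ note that

$$\int_0^{\infty}\left(1-e^{-a e^{-\lambda^* y}}\right)dy = \lim_{\beta\to\infty}\left\{\beta - \int_0^{\beta} e^{-a e^{-\lambda^* y}}\,dy\right\}. \qquad (4.19)$$

By letting $t=e^{-\lambda^* y}$ in the integral on the right hand side of (4.19), we can show

$$\int_0^{\beta} e^{-a e^{-\lambda^* y}}\,dy = \frac{1}{\lambda^*}\int_{e^{-\lambda^*\beta}}^{1}\frac{e^{-at}}{t}\,dt = \beta + \frac{1}{\lambda^*}\sum_{j=1}^{\infty}\frac{(-1)^j a^j\left(1-e^{-j\lambda^*\beta}\right)}{j\cdot j!},$$

and upon substituting into (4.19), we get

$$\int_0^{\infty}\left(1-e^{-a e^{-\lambda^* y}}\right)dy = \lim_{\beta\to\infty}\frac{1}{\lambda^*}\sum_{j=1}^{\infty}\frac{(-1)^{j+1}a^j\left(1-e^{-j\lambda^*\beta}\right)}{j\cdot j!}.$$

Since the sum above is absolutely convergent, we can take the limit inside the sum and obtain

$$\int_0^{\infty}\left(1-e^{-a e^{-\lambda^* y}}\right)dy = \frac{1}{\lambda^*}\sum_{j=1}^{\infty}\frac{(-1)^{j+1}a^j}{j\cdot j!}.$$

Then, using (4.18), it follows that

$$c^* = \frac{\alpha(a)}{\lambda^*(1-e^{-a})}, \qquad (4.20)$$

where

$$\alpha(a) = \sum_{j=1}^{\infty}\frac{(-1)^{j+1}a^j}{j\cdot j!}. \qquad (4.21)$$

We can now write the likelihood function in terms of $\theta^*$, $\lambda^*$ and $a$. Using equations (4.10), (4.16), (4.20) and (4.21), we get

$$L(\theta^*,\lambda^*,a;s,y,\ell) = \frac{a^{\sum_{i=1}^{N_c}s_i}\;e^{-\lambda^*\sum_{i=1}^{N_c}y_i s_i}\;e^{-a\sum_{i=1}^{N_c}e^{-\lambda^* y_i}}}{(1-e^{-a})^{N_c}\prod_{i=1}^{N_c}s_i!}\times\frac{(\lambda^*)^{N_c}\prod_{i=1}^{N_c}\left(1-e^{-a e^{-\lambda^* y_i}}\right)}{[\alpha(a)]^{N_c}}\times\frac{(\theta^*)^{N_c}\,\ell^{\,N_c-1}e^{-\theta^*\ell}}{\Gamma(N_c)}. \qquad (4.22)$$

Using the likelihood function given in (4.22), we can now obtain an estimate for the population density $D$. Recall from (4.11) that we can write

$$D = \frac{\theta^*\nu}{2c^*}.$$

After substituting for $c^*$ and $\nu$ using equations (4.15), (4.20) and (4.21), the expression for $D$ becomes

$$D = \frac{\theta^*\lambda^* a}{2\alpha(a)}.$$

Thus, an estimate for $D$ would be

$$\hat{D} = \frac{\hat{\theta}^*\hat{\lambda}^*\hat{a}}{2\alpha(\hat{a})},$$

where $\hat{\theta}^*$, $\hat{\lambda}^*$ and $\hat{a}$ are maximum likelihood estimates for $\theta^*$, $\lambda^*$ and $a$, respectively. As noted earlier in this section, $S$ and $Y$ are independent of $L$ so that $\theta^*$ can be estimated using equation (4.12). However, this still leaves us the problem of estimating $\lambda^*$ and $a$. Instead of estimating $\lambda^*$ and $a$ separately, we can reparameterize the likelihood equation in (4.22) by letting

$$\rho = \frac{\lambda^* a}{\alpha(a)} \qquad\text{and}\qquad a = a.$$

Then, our estimate for $D$ becomes

$$\hat{D} = \frac{\hat{\theta}^*\hat{\rho}}{2}. \qquad (4.23)$$

The advantage of this reparameterization is that it makes use of the fact that $L$ is independent of both $S$ and $Y$. Thus, the estimate for $D$ given in (4.23) is now, up to the constant factor $1/2$, the product of two independent estimates: $\hat{\theta}^*$, which depends on $L$ alone, and $\hat{\rho}$, which depends on $S$ and $Y$. As a result the variance of $\hat{D}$ can now be found easily. Using the formula (see Goodman, 1960) for the variance of the product of two independent estimates, we get

$$\mathrm{Var}(\hat{D}) = \frac{1}{4}\left\{E^2(\hat{\theta}^*)\,\mathrm{Var}(\hat{\rho}) + E^2(\hat{\rho})\,\mathrm{Var}(\hat{\theta}^*) + \mathrm{Var}(\hat{\theta}^*)\,\mathrm{Var}(\hat{\rho})\right\}. \qquad (4.24)$$

Since $L$ is distributed as a Gamma random variable with parameters $N_c$ and $\theta^*$, exact expressions for $E(\hat{\theta}^*)$ and $\mathrm{Var}(\hat{\theta}^*)$ can be obtained using (2.4) and (2.11), i.e.,

$$E(\hat{\theta}^*) = \theta^* \qquad\text{and}\qquad \mathrm{Var}(\hat{\theta}^*) = \frac{\theta^{*2}}{N_c-2}. \qquad (4.25)$$

Expressions for the variance and expected value of $\hat{\rho}$ can become quite complicated. An iterative scheme would be needed to find the solutions for $\rho$ and $a$ that would maximize the reparameterized version of the likelihood function given in (4.22). There are computer programs available that can provide maximum likelihood estimates for $\rho$ and $a$ along with numerical approximations for the variance-covariance matrix of the estimates.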
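The series expression for $c^*$ can be sanity-checked by integrating the sighting function directly. The sketch below is an illustration rather than the author's program: it evaluates $\alpha(a)$ from the alternating series in (4.21), the closed form of $g_c(y)$ from (4.17), and compares a midpoint-rule approximation of the integral (4.7) with $\alpha(a)/[\lambda^*(1-e^{-a})]$ from (4.20). The function names, the integration limit `upper` and the step count are assumptions chosen for the example.

```python
import math

def alpha(a, terms=60):
    """Series (4.21): sum over j >= 1 of (-1)^(j+1) a^j / (j * j!)."""
    total, term = 0.0, 1.0
    for j in range(1, terms + 1):
        term *= a / j                      # term now equals a^j / j!
        total += (-1) ** (j + 1) * term / j
    return total

def g_c(y, a, lam):
    """Closed form (4.17) for the probability of sighting a cluster at distance y."""
    return (1.0 - math.exp(-a * math.exp(-lam * y))) / (1.0 - math.exp(-a))

def c_star_series(a, lam):
    """c* from (4.20)."""
    return alpha(a) / (lam * (1.0 - math.exp(-a)))

def c_star_numeric(a, lam, upper=400.0, steps=200000):
    """Midpoint-rule approximation to the integral (4.7), truncated at `upper`."""
    width = upper / steps
    return width * sum(g_c((k + 0.5) * width, a, lam) for k in range(steps))
```

For example, with `a = 2.0` and `lam = 0.1` the two versions of $c^*$ should agree to several decimal places; the truncation at `upper` is harmless here because $g_c(y)$ decays like $e^{-\lambda^* y}$.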
In the next section we will demonstrate the use of one such program with a set of hypothetical data.

4.5 A Worked Example

In this section we will present a worked example to demonstrate the use of a computer program to find the estimate $\hat{D}$ and its approximate variance. Because we are not aware of any real data that have been collected according to the sampling plan described in Section 4.2, we shall use an artificial set of data in the example.

Suppose that sampling was continued until $N_c = 25$ clusters were sighted, and that a transect length of $\ell = 25$ miles was needed to sight the 25 clusters. Suppose further that the observed right angle distances and the cluster sizes were as follows, where the first number in the pair is the right angle distance, $y$, measured in yards and the second number in the pair is the corresponding cluster size, $s$:

(1,1), (3,2), (7,1), (10,1), (2,3)
(5,5), (4,1), (7,2), (15,1), (22,1)
(6,1), (3,6), (2,1), (12,1), (28,3)
(9,2), (18,1), (36,7), (17,6), (5,1)
(4,1), (3,1), (8,2), (3,4), (13,1).

As noted in Section 4.4, an estimate for $\theta^*$ is

$$\hat{\theta}^* = \frac{N_c-1}{\ell} = .96,$$

and an estimate for the variance of $\hat{\theta}^*$ is

$$\widehat{\mathrm{Var}}(\hat{\theta}^*) = \frac{\hat{\theta}^{*2}}{N_c-2} = .040.$$

In order to estimate $\rho$, the reparameterized version of the likelihood function given in (4.22) will have to be maximized. The Fortran subroutine ZXMIN, found in IMSL (1979), may be used for this purpose. This program uses a quasi-Newton iterative procedure to find the minimum of a function. Thus, we first need to take the negative of the log likelihood function before we can use this subroutine to our advantage. On output, this subroutine not only provides the values at which the function is minimized, but also provides numerical estimates for the second partial derivatives of the function evaluated at the minimization point.
Thus, when used with the negative of the log likelihood function this program will provide the maximum likelihood estimates, $\hat{\rho}$ and $\hat{a}$, as well as the matrix of negative second partial derivatives of the log likelihood, $\ln L(\cdot)$, evaluated at $\hat{\rho}$ and $\hat{a}$. We will denote this matrix by

$$V = -\begin{pmatrix} \dfrac{\partial^2\ln L(\cdot)}{\partial a^2} & \dfrac{\partial^2\ln L(\cdot)}{\partial a\,\partial\rho} \\[2ex] \dfrac{\partial^2\ln L(\cdot)}{\partial\rho\,\partial a} & \dfrac{\partial^2\ln L(\cdot)}{\partial\rho^2} \end{pmatrix}_{a=\hat{a},\ \rho=\hat{\rho}}.$$

For our data, the use of the subroutine ZXMIN with initial values $a_1 = 2.24$ and $\rho_1 = .16$ yielded $\hat{a} = 2.844$, $\hat{\rho} = .0907$, and

$$V = \begin{pmatrix} 7.687 & -161.229 \\ -161.229 & 5098.985 \end{pmatrix}.$$

The initial value used for $a$ was the mean of the observed cluster sizes, i.e.,

$$\bar{s} = \frac{1}{25}\sum_{i=1}^{25} s_i.$$

Since our model does not assume all animals belonging to a cluster are seen, $\bar{s}$ would underestimate the expected cluster size, i.e.,

$$\bar{s} < E(R) = \frac{a}{1-e^{-a}}.$$

Thus, $\bar{s}$ seems to be a good starting value for $a$. In choosing an initial value for $\rho$, first recall that

$$\rho = \frac{\lambda^* a}{\alpha(a)},$$

where $\alpha(a)$ is given in equation (4.21). Since our initial value for $a$ is $\bar{s}$, all we need is a starting value for $\lambda^*$. If every animal in the cluster was seen with probability 1, the density of clusters would be estimated by the method described in Chapter II. In this case, the maximum likelihood estimate for $\lambda^*$ would be $1/\bar{y}$, where

$$\bar{y} = \frac{1}{25}\sum_{i=1}^{25} y_i.$$

Thus, as the initial value for $\rho$ we used

$$\rho_1 = \frac{\bar{s}}{\bar{y}\,\alpha(\bar{s})}.$$

The estimate for the density can now be calculated. Using (4.23) and substituting the values we obtained for $\hat{\theta}^*$ and $\hat{\rho}$ (with $\hat{\rho}$ converted from a per-yard to a per-mile basis), we get

$$\hat{D} = 76.7 \text{ animals/square mile}.$$

Now if we can obtain a large sample approximation for the variance of $\hat{\rho}$, then we can use (4.24) as an approximation for the variance of $\hat{D}$. Now, under the usual regularity conditions, $V$ will be a large sample approximation to the inverse of the variance-covariance matrix of $\hat{a}$ and $\hat{\rho}$. Furthermore, the approximate variance of $\hat{D}$ can be obtained from equation (4.24) after substituting the element in the matrix $V^{-1}$ corresponding to the approximate variance of $\hat{\rho}$ along with the other appropriate quantities. Straightforward calculations show that

$$\sqrt{\widehat{\mathrm{Var}}(\hat{D})} = 26.2 \text{ animals/square mile}.$$
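The arithmetic of this worked example can be retraced with a short script. The sketch below is not the author's Fortran program and does not call ZXMIN; it takes the reported estimates $\hat{a} = 2.844$, $\hat{\rho} = .0907$ and the reported matrix $V$ as given, and recomputes the starting values, the density estimate and its approximate standard error. It assumes that distances in yards are converted with 1760 yards per mile, that the variance of the product carries the factor $1/4$ implied by $\hat{D} = \hat{\theta}^*\hat{\rho}/2$, and that the log likelihood is the $(S, Y)$ part of (4.22) written in terms of $a$ and $\rho$. All function and variable names are illustrative.

```python
import math

# (right angle distance in yards, observed cluster size) for the 25 clusters
DATA = [(1, 1), (3, 2), (7, 1), (10, 1), (2, 3), (5, 5), (4, 1), (7, 2),
        (15, 1), (22, 1), (6, 1), (3, 6), (2, 1), (12, 1), (28, 3), (9, 2),
        (18, 1), (36, 7), (17, 6), (5, 1), (4, 1), (3, 1), (8, 2), (3, 4),
        (13, 1)]
N_C, LENGTH_MILES, YARDS_PER_MILE = 25, 25.0, 1760.0

def alpha(a, terms=60):
    """Series (4.21)."""
    total, term = 0.0, 1.0
    for j in range(1, terms + 1):
        term *= a / j
        total += (-1) ** (j + 1) * term / j
    return total

def log_lik(a, rho, data=DATA):
    """Log of the (S, Y) part of (4.22), with lambda* = rho * alpha(a) / a."""
    alpha_a = alpha(a)
    lam = rho * alpha_a / a
    total = 0.0
    for y, s in data:
        x = a * math.exp(-lam * y)
        total += s * math.log(x) - x - math.lgamma(s + 1)   # log of (4.16) numerator
        total -= math.log(1.0 - math.exp(-a))               # truncation constant
        total += math.log(lam) + math.log(1.0 - math.exp(-x)) - math.log(alpha_a)
    return total

theta_hat = (N_C - 1) / LENGTH_MILES                 # (4.12)
var_theta = theta_hat ** 2 / (N_C - 2)               # (4.25) with theta* estimated
s_bar = sum(s for _, s in DATA) / N_C
y_bar = sum(y for y, _ in DATA) / N_C
rho_start = s_bar / (y_bar * alpha(s_bar))           # starting value for rho

a_hat, rho_hat = 2.844, 0.0907                       # estimates reported in the text
V = [[7.687, -161.229], [-161.229, 5098.985]]        # reported information matrix
det_v = V[0][0] * V[1][1] - V[0][1] * V[1][0]
var_rho = V[0][0] / det_v                            # (rho, rho) element of V inverse

d_hat = theta_hat * rho_hat / 2.0 * YARDS_PER_MILE   # animals per square mile
var_d = (YARDS_PER_MILE / 2.0) ** 2 * (theta_hat ** 2 * var_rho
                                       + rho_hat ** 2 * var_theta
                                       + var_theta * var_rho)
sd_d = math.sqrt(var_d)
```

Under these assumptions `d_hat` and `sd_d` come out close to the 76.7 and 26.2 animals per square mile quoted in the text, and `log_lik(a_hat, rho_hat)` exceeds `log_lik(s_bar, rho_start)`, as expected if the reported values maximize the likelihood.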
The use of this Fortran subroutine required a minimal amount of programming to enter the appropriate likelihood function. It was run using the computer facilities of the Northeast Regional Data Center located in Gainesville, Florida. Less than two seconds of CPU time was needed for the estimates to converge to values that agreed to four significant digits on two successive iterations.

BIBLIOGRAPHY

Anderson, D. R., Laake, J. L., Crain, B. R., and Burnham, K. P. (1976), Guidelines for Line Transect Sampling of Biological Populations, Logan: Utah Cooperative Wildlife Research Unit.

Anderson, D. R., and Pospahala, R. S. (1970), "Correction of Bias in Belt Transect Studies of Immotile Objects," Journal of Wildlife Management, 34, 141-146.

Barr, A. J., Goodnight, J. H., Sall, J. P., and Helwig, J. T. (1976), A User's Guide to SAS 76, Raleigh: SAS Institute.

Bhat, U. N. (1972), Elements of Applied Stochastic Processes, New York: John Wiley & Sons.

Burnham, K. P., and Anderson, D. R. (1976), "Mathematical Models for Nonparametric Inferences from Line Transect Data," Biometrics, 32, 325-336.

Crain, B. R., Burnham, K. P., Anderson, D. R., and Laake, J. L. (1978), A Fourier Series Estimator of Population Density for Line Transect Sampling, Logan: Utah State University Press.

Eberhardt, L. L. (1968), "A Preliminary Appraisal of Line Transects," Journal of Wildlife Management, 32, 82-88.

Gates, C. E., Marshall, W. H., and Olson, D. P. (1968), "Line Transect Method of Estimating Grouse Population Densities," Biometrics, 24, 135-145.

Goodman, L. A. (1960), "On the Exact Variance of Products," Journal of the American Statistical Association, 55, 708-713.

Hayne, D. W. (1949), "An Examination of the Strip Census Method for Estimating Animal Populations," Journal of Wildlife Management, 13, 145-157.

IMSL (1979), The IMSL Library, Seventh ed., Vol. 3, Houston: International Mathematical and Statistical Libraries, Inc.

Korn, G. A., and Korn, T. M. (1968), Mathematical Handbook for Scientists and Engineers, Second ed., New York: McGraw-Hill.

Leopold, A. (1933), Game Management, New York: Charles Scribner's Sons.

Lindgren, B. W. (1968), Statistical Theory, Second ed., New York: Macmillan.

Loftsgaarden, D. O., and Quesenberry, C. P. (1965), "A Nonparametric Estimate of a Multivariate Density Function," Annals of Mathematical Statistics, 36, 1049-1051.

Pielou, E. C. (1969), An Introduction to Mathematical Ecology, New York: John Wiley & Sons.

Pollock, K. H. (1978), "A Family of Density Estimators for Line Transect Sampling," Biometrics, 34, 475-478.

Robinette, W. L., Jones, D. A., Gashwiler, J. S., and Aldous, C. M. (1954), "Methods for Censusing Winter-Lost Deer," North American Wildlife Conference Transactions, 19, 511-524.

Robinette, W. L., Loveless, C. M., and Jones, D. A. (1974), "Field Tests of Strip Census Methods," Journal of Wildlife Management, 38, 81-96.

Seber, G. A. F. (1973), The Estimation of Animal Abundance and Related Parameters, London: Griffin.

Sen, A. R., Tourigny, J., and Smith, G. E. J. (1974), "On the Line Transect Sampling Method," Biometrics, 30, 329-340.

Smith, M. H., Gardner, R. H., Gentry, J. B., Kaufman, D. W., and O'Farrel, M. H. (1975), Small Mammals: Their Productivity and Population Dynamics, International Biological Program.

Webb, W. L. (1942), "Notes on a Method of Censusing Snowshoe Hare Populations," Journal of Wildlife Management, 6, 67-69.

BIOGRAPHICAL SKETCH

John Anthony Ondrasik was born on August 17, 1951, in New Brunswick, New Jersey. Shortly thereafter his parents moved to Palmerton, Pennsylvania, where he grew up and attended high school. After graduation in June, 1969, he entered Bucknell University in Lewisburg, Pennsylvania, and received the degree of Bachelor of Science with a major in mathematics in June, 1973. It was during his studies at Bucknell that he became interested in statistics through the influence of the late Professor Paul Benson.
In September, 1973, he matriculated in the graduate school at the University of Florida and received the degree Master of Statistics in 1975. While pursuing his graduate studies, he worked for the Department of Statistics as an assistant in their biostatistics consulting unit. In November, 1978, he accepted the position of biostatistician with Boehringer Ingelheim, Ltd. John Ondrasik is married to the former Anntoinette M. Lucia. Currently they reside in Danbury, Connecticut.


POPULATION DENSITY ESTIMATION USING LINE TRANSECT SAMPLING

BY JOHN A. ONDRASIK

A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA 1979

To Toni For Her Love and Support

ACKNOWLEDGMENTS

I would like to thank my adviser, Dr. P. V. Rao, for his guidance and assistance throughout the course of this research. His patience and thoughtful advice during the writing of this dissertation is sincerely appreciated. I would also like to thank Dr. Dennis D. Wackerly for the help and encouragement that he provided during my years at the University of Florida.

Special thanks go to my family for the moral support they provided during the pursuit of this degree. I am especially grateful to my wife, Toni, whose love and understanding made it possible for me to finish this project. Her patience and sacrifices will never be forgotten.

Finally, I want to express my thanks to Mrs. Edna Larrick for her excellent job of typing this manuscript despite the time constraints involved.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
ABSTRACT
CHAPTER
I INTRODUCTION
  1.1 Literature Review
  1.2 Density Estimation Using Line Transects
  1.3 Summary of Results
II DENSITY ESTIMATION USING THE INVERSE SAMPLING PROCEDURE
  2.1 Introduction
  2.2 A General Model Based on Right Angle Distances and Transect Length
    2.2.1 Assumptions
    2.2.2 Derivation of the Likelihood Function
  2.3 A Parametric Density Estimate
    2.3.1 Maximum Likelihood Estimate for $D$
    2.3.2 Unbiased Estimate for $D$
    2.3.3 Variance of $\hat{D}_u$
    2.3.4 Sample Size Determination Using $\hat{D}_u$
  2.4 Nonparametric Density Estimate
    2.4.1 The Nonparametric Model for Estimating $D$
    2.4.2 An Estimate for $f_Y(0)$
    2.4.3 Approximations for the Mean and Variance of $\hat{f}_Y(0)$
    2.4.4 A Monte Carlo Study
    2.4.5 The Expected Value and Variance for a Nonparametric Estimate of $D$
    2.4.6 Sample Size Determination Using $\hat{D}_N$
III DENSITY ESTIMATION BASED ON A COMBINATION OF INVERSE AND DIRECT SAMPLING
  3.1 Introduction
  3.2 Gates Estimate
    3.2.1 The Mean and Variance of $\hat{D}_g$
  3.3 Expected Value of $\hat{D}_{CP}$
  3.4 Variance of $\hat{D}_{CP}$
  3.5 Maximum Likelihood Justification for $\hat{D}_{CP}$
IV DENSITY ESTIMATION FOR CLUSTERED POPULATIONS
  4.1 Introduction
  4.2 Assumptions
  4.3 General Form of the Likelihood Function
  4.4 Estimation of $D$ when $p(\cdot)$ and $h(\cdot)$ Have Specific Forms
  4.5 A Worked Example
BIBLIOGRAPHY
BIOGRAPHICAL SKETCH

LIST OF TABLES

1 Number of animals, $N_0$, that must be sighted to guarantee the estimate, $\hat{D}_u$, has coefficient of variation, $CV(\hat{D}_u)$
2 Forms proposed for the function, $g(y)$
3 Results of Monte Carlo Study using $g_1(y) = e^{-\lambda y}$
4 Results of Monte Carlo Study using $g_2(y) = 1-y$
5 Results of Monte Carlo Study using $g_3(y) = 1-y^2$
6 Number of animals, $N_0$, that must be sighted to guarantee the estimate $\hat{D}_N$ has coefficient of variation, $CV(\hat{D}_N)$

Abstract of Dissertation Presented to the Graduate Council of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

POPULATION DENSITY ESTIMATION USING LINE TRANSECT SAMPLING

By John A. Ondrasik

December 1979

Chairman: Pejaver V. Rao
Major Department: Statistics

The use of line transect methods in estimating animal and plant population densities has recently been receiving increased attention in the literature.
Many of the density estimates which are currently available are based only on the right angle distances from the sighted objects to a randomly placed transect of known length. This type of sampling, wherein an observer is required to travel along a line transect of some predetermined length, will be referred to as the direct sampling method. In contrast, one can use an inverse sampling plan which will allow the observer to terminate sampling as soon as he sights a prespecified number of animals.

An obvious advantage of an inverse sampling plan is that sampling is terminated as soon as the required number of objects are sighted. A disadvantage is the possibility that sampling may not terminate in any reasonable period of time. Consequently, a third sampling plan, in which sampling stops as soon as either a prespecified number of objects are sighted or a prespecified length of the transect is traversed, is of practical interest. Such a sampling plan will be referred to as the combined sampling method.

The objective of this dissertation is to develop density estimation techniques suitable for both inverse and combined sampling plans. In Chapter II, both a parametric and a nonparametric estimate for the population density are developed using the inverse sampling approach. We will show that a primary advantage of estimation using inverse sampling is the fact that these estimates can be expressed as the product of two independent random variables. This representation not only enables us to obtain the expected value and variance of our estimates easily, but also leads to a simple criterion for sample size determination.

In Chapter III, we derive a parametric density estimate that is suitable for the combined sampling method. This estimate will be shown to be asymptotically unbiased. An approximation to the variance of this estimate is also provided.
The density estimates developed in Chapters II and III are based on the assumption that the sightings of animals are independent events. In Chapter IV we relax this assumption and develop an estimation procedure using inverse sampling that can be applied to clustered populations, that is, populations composed of small groups or "clusters" of objects.

CHAPTER I
INTRODUCTION

1.1 Literature Review

Our objective in this dissertation is to examine the problem of density estimation in animal and plant populations. The demand for new and more efficient population density estimates has grown quite rapidly in the past few years. Anderson et al. (1976, p. 1) give a good assessment of the present situation and provide some reasons for the renewed interest in this subject in the following paragraph:

The need to accurately and precisely estimate the size or density of biological populations has increased dramatically in recent years. This has been due largely to ecological problems created by the effects of man's rapidly increasing population. Within the past decade, we have witnessed numerous data gathering activities related to the Environmental Impact Statement (NEPA) process or Biological Monitoring programs. Environmental programs related to phosphate, uranium and coal mining and the extraction of shale oil typically require estimates of the size or density of biological populations. The Endangered Species Act has focused attention on the lack of techniques to estimate population size. It now appears that hundreds of species of plants may be protected under the Act, and, therefore, we will need information on the size of some plant populations. Estimation of the size of biological populations was a major objective of the International Biological Program (IBP) (Smith et al. 1975).
Finally, we mention that the ability to estimate population size or density is fundamental to efficient wildlife and habitat management and many important studies in basic ecological research.

The estimation of population size has always been a very interesting and complex problem. For a recent review of the general subject area see Seber (1973). Although many of the methods described in Seber's book are quite useful, they are frequently very expensive and time consuming. Estimation methods based on capture-recapture studies would fall into this category. A further problem with many estimation methods is that they are based on models requiring very restrictive assumptions which severely limit their use in analyzing and interpreting the data.

For these reasons and others, line transect sampling schemes are becoming more and more popular. This method of sampling requires an observer to travel along a line transect that has been randomly placed through the area containing the population under study and to record certain measurements whenever a member of the population is sighted. There are several density estimation techniques available using line transect data; however, the full potential is yet to be realized. Density estimation through line transects is typically practical, rapid and inexpensive for a wide variety of populations.

Published references to line transect studies date back to the method used by King (see Leopold, 1933) in the estimation of ruffed grouse populations. Since that time, numerous papers investigating line transect models have appeared, e.g., Webb (1942), Hayne (1949), Robinette et al. (1954), Gates et al. (1968), Anderson and Pospahala (1970), Sen et al. (1974), Burnham and Anderson (1976) and Crain et al. (1978).
Since it is commonly assumed by these authors that the objects being sampled are fixed with respect to the transect, line transect models are best suited for either immotile populations, flushing populations (populations where the animal being observed makes a conspicuous response upon the approach of the observer) or slow moving populations. Examples of such populations are: (i) immotile birds' nests, dead deer and plants, (ii) flushing grouse, pheasants and quail, and (iii) slow moving desert tortoise and gila monster. The degree to which line transect methods can be applied to more motile populations, such as deer and hare, will depend on the degree to which the basic assumptions are met. In any case, one should proceed cautiously when using these models for motile populations.

Despite the wide applicability of line transect methods, the estimation problem has only recently begun to receive rigorous treatment and attention from a statistical standpoint. Gates et al. (1968) were the first to develop a density estimation procedure within a standard statistical framework. After making certain assumptions with regard to the probability of sighting an animal located at a given right angle distance from the transect, they rigorously derived a population density estimate. In addition, they were the first authors to provide an explicit form for the approximate sampling variance of their density estimate.

While the assumptions of Gates et al. (1968) concerning the probability of sighting an animal did work well for the ruffed grouse populations they were studying, it is clear that the validity of their assumptions will be quite crucial in establishing the validity of their density estimates. If the collected data fail to substantiate their assumptions, large biases could occur in the estimates, as seen in Robinette et al. (1974). As a result, Sen et al. (1974) and Pollock (1978) relaxed the assumptions of Gates et al.
(1968) by using more general forms for the sighting probability, while Burnham and Anderson (1976) developed a nonparametric approach as a means of providing a more robust estimation procedure.

In the following sections, we will outline the general problem of density estimation using line transects, give our approach to the solution of this problem and summarize the results found in the remainder of this work.

1.2 Density Estimation Using Line Transects

The line transect method is simply a means of sampling from some unknown population of objects that are spatially distributed. In the context of animal or plant population density estimation, these objects take the form of mammals, birds, plants, nests, etc., which are distributed over a particular area of interest. From this point on, our references will always be to animal populations with the understanding that the estimation methods we describe are applicable to all populations which satisfy the necessary assumptions.

In the line transect sampling procedure, a line is randomly placed across an area, A, that contains the unknown population of interest. An observer follows the transect and records one or more of the following three pieces of information for each animal sighted:
(i) The radial distance, r, from the observer to the animal.
(ii) The right angle distance, y, from the animal to the line transect.
(iii) The sighting angle, $\theta$, between the line transect and the line joining the observer to the point at which the animal is sighted.
These measurements are illustrated in Figure 1.

Figure 1. Measurements recorded using line transect sampling. (Z is the position of an observer when an animal is sighted at X. XP is the line from the animal perpendicular to the transect.)

In this work, we shall consider the problem of estimating population density using only the right angle distances.
Because estimates depending only on right angle distances are easy and economical to use, such estimates have become very popular over the past several years.

Before any estimation procedure based on right angle distances can be formulated, certain assumptions regarding the population of interest must be made. A set of assumptions used by several workers in the area is detailed in Section 2.2.1. One of the key assumptions in this set is that the probability of sighting an animal located at a right angle distance, y, from the transect can be represented by some nonincreasing function g(y), which satisfies the equality g(0) = 1. This function is simply a mathematical tool for dealing with the fact that animals located closer to the line transect will be seen more readily than animals located further away from the transect. An alternative method of dealing with this phenomenon is given by Anderson and Pospahala (1970).

If g(y) is assumed to have some specific functional form determined by some unknown parameters, then the estimate is said to be parametric. On the other hand, if g(y) is left unspecified except for the requirements that it is nonincreasing and g(0) = 1, then the estimate is said to be nonparametric.

Seber (1973) has shown that any density estimate based on right angle distances will have the form
$$\hat{D}_s = \frac{N}{2L_0\hat{c}},$$
where N is a random variable representing the number of animals seen in a line transect of length $L_0$ and $\hat{c}$ is an estimate for c, a parameter which depends on g(y) through the relation
$$c = \int_0^{\infty} g(y)\,dy.$$
By noting that the density is simply the number of animals present per unit of area, it is clear that c can be interpreted as one-half of the effective width of the strip actually covered by the observer as he moves along the transect. Further examination of $\hat{D}_s$ also points out that estimating the parameter c is the key to the estimation problem.
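To make the role of c concrete, the following is a minimal sketch of an estimate of the form $N/(2L_0\hat{c})$. It is not taken from the dissertation: it assumes, purely for illustration, the exponential sighting function $g(y) = e^{-\lambda y}$, for which $c = 1/\lambda$ and the mean of the sighted right angle distances estimates c. The function name `seber_form_density` and the data are hypothetical.

```python
def seber_form_density(distances, transect_length):
    """Density estimate N / (2 * L0 * c_hat) from right angle distances.

    Illustrative assumption: g(y) = exp(-lam * y), so c = 1/lam and the
    sighted distances are exponential with mean c. Hence c_hat is taken
    to be the sample mean of the distances.
    """
    n = len(distances)
    total = sum(distances)
    if n == 0 or total <= 0 or transect_length <= 0:
        raise ValueError("need at least one positive distance and a positive length")
    c_hat = total / n                      # estimated half effective strip width
    return n / (2.0 * transect_length * c_hat)


# Hypothetical data: 5 sightings (metres) on a 1000 m transect.
ys = [2.0, 5.0, 1.0, 8.0, 4.0]
d_hat = seber_form_density(ys, 1000.0)
print(d_hat)  # animals per square metre
```

Here the effective strip has total width $2\hat{c}$ = 8 m, so the 5 sightings are attributed to an area of 8000 square metres.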
At this time, we would like to point out that the range for the right angle distance, y, is allowed to go from 0 to $+\infty$, as seen in the integral on the right hand side of the equation for c. In practice, since we are considering only a finite area, A, there will most certainly be a maximum observation distance, W, perpendicular to the transect. However, if W is large enough so that the approximation
$$\int_0^{W} g(y)\,dy \approx \int_0^{\infty} g(y)\,dy \qquad (1.1)$$
is reasonable, then letting y range in the interval $[0, +\infty)$ will not cause any real problems. In practical terms, this means that the probability of observing an animal located beyond the boundary, W, should be essentially zero. In most real life situations, W can be chosen large enough so that the approximation given in (1.1) is valid. Thus, in the chapters which follow, we will implicitly assume that relation (1.1) holds for the density estimates that we develop.

Both parametric and nonparametric models have been used to derive an estimate for the parameter c, and, consequently, for the population density. In both cases, the estimate for c turns out to be a function of the observed right angle distances. In the parametric case, $\hat{c}$ will simply be a function of the parameters that define the function chosen for g(y). Examples of parametric estimates are found in Gates et al. (1968), Sen et al. (1974) and Pollock (1978).

Estimation using the nonparametric model is more complicated. Burnham and Anderson (1976) have shown that estimating $1/c$ is equivalent to estimating $f_Y(0)$, where $f_Y(\cdot)$ is the conditional probability density function for right angle distance given an animal is sighted. Thus, the problem of finding a nonparametric estimate for the population density reduces to the problem of estimating a density function at a given point. Unfortunately, this problem has not received much attention in the literature.
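Both points above can be checked numerically in a simple case. The sketch below assumes the exponential form $g(y) = e^{-\lambda y}$ (an illustrative choice, with a hypothetical value of $\lambda$); for it, c over $[0, W]$ has a closed form, the relative error of approximation (1.1) is $e^{-\lambda W}$, and $f_Y(0) = g(0)/c = 1/c = \lambda$. The helper names are not from the original text.

```python
import math


def c_truncated(lam, w):
    """Integral of g(y) = exp(-lam * y) over [0, w], in closed form."""
    return (1.0 - math.exp(-lam * w)) / lam


def f_y_at_zero(c):
    """Value at y = 0 of the conditional density f_Y(y) = g(y) / c, using g(0) = 1."""
    return 1.0 / c


lam = 0.5                 # hypothetical rate of decline in sightability
c_full = 1.0 / lam        # integral of g over [0, infinity)

for w in (2.0, 10.0, 20.0):
    rel_err = (c_full - c_truncated(lam, w)) / c_full   # equals exp(-lam * w)
    print(f"W = {w:5.1f}   relative error in c = {rel_err:.2e}")

print("f_Y(0) =", f_y_at_zero(c_full))
```

The relative error falls off exponentially in W, which is the sense in which a sufficiently large W makes relation (1.1) harmless.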
Burnham and Anderson (1976) suggest four possible estimates for $f_Y(0)$, but the sampling variances associated with these estimates have not been established. Crain et al. (1978) have also considered the problem of estimating $f_Y(0)$. They derive an estimate using a Fourier series expansion to approximate the conditional probability density function $f_Y(y)$. Although their procedure does not lead to a simple estimate, they do provide an approximation to its sampling variance.

The line transect method and the corresponding population density estimates so far described require the observer to travel a predetermined distance, $L_0$, along the transect. This method will be called the direct sampling method. An alternative to the direct method is the inverse sampling method, wherein sampling is terminated as soon as a specified number, $N_0$, of animals are sighted. Clearly, in the direct method, the number of animals seen is a random variable and the total length travelled is a fixed quantity, while in the inverse sampling method, the total length travelled is the random variable and the number of animals that must be seen is fixed.

The main focus of this work will be to develop density estimation techniques that are based on the inverse sampling method. In addition, we will consider the density estimation problem when a combination of the inverse and direct sampling plans is used.

1.3 Summary of Results

In Chapter II we derive two estimates for the population density, D, using an inverse sampling scheme. The set of assumptions which justify the use of these estimates is similar to those used by Gates et al. (1968) and several others. The estimates, denoted $\hat{D}_I$, are functions of $N_0$, L and $\hat{c}$, where $N_0$ is the number of animals that must be seen before sampling terminates, L is a random variable representing the length travelled on the transect and $\hat{c}$ is as previously defined.
Note the similarity of $\hat{D}_I$ to $\hat{D}_s$ given in Section 1.2. The only difference between the two estimates is that in $\hat{D}_I$ the random variables are L and $\hat{c}$, while in $\hat{D}_s$ they are N and $\hat{c}$. However, this difference gives the inverse sampling method a theoretical advantage over the direct sampling method. The random variables L and $\hat{c}$ will be seen to be independent while N and $\hat{c}$ are not. Thus, the estimate $\hat{D}_I$ is the product of two independent random variables, a fact which not only allows us to obtain its expected value and variance easily, but also leads to a simple criterion for sample size determination.

Both a parametric and a nonparametric estimate for the animal population density are developed in Chapter II. In deriving the parametric estimate, the functional form assumed for g(y) is identical to the one used by Gates et al. (1968). Our parametric density estimate is shown to be unbiased and the exact variance of this estimate is also provided. In the nonparametric case we propose an estimate for $f_Y(0)$ using the method developed by Loftsgaarden and Quesenberry (1965). We then use heuristic arguments to show that the corresponding density estimate is asymptotically unbiased, and derive a large sample approximation for its variance.

The inverse sampling method does have one drawback when there is little information available concerning the population to be studied, namely, there exists the possibility that an observer might have to cover a very long transect to sight $N_0$ animals. To overcome this problem, we develop a parametric density estimate in Chapter III that is based on a combination of the inverse and the direct sampling procedures. In the combined sampling scheme, sampling is terminated when either a prespecified number, $N_0$, of animals are sighted or when a prespecified length, $L_0$, has been travelled along the transect. Thus, in combined sampling both the length travelled and the number of animals seen will be random variables.
In deriving the density estimate based on the combined sampling method, we again use the functional form for g(y) proposed by Gates et al. (1968). This estimate is shown to be asymptotically unbiased. In addition, an approximate variance for this density estimate is provided.

The density estimates developed in Chapters II and III are based on the assumption that the sightings of animals will be independent events. Gates et al. (1968) showed that this assumption failed to hold for the animal population they were studying. In Chapter IV we relax this assumption, and develop an estimate based on inverse sampling that can be applied to clustered populations, that is, populations in which the animals aggregate into small groups or "clusters." Since the estimation procedure developed will require the use of a high-speed computer, the last section of Chapter IV is devoted to a worked example to illustrate the computations that would be involved.

CHAPTER II
DENSITY ESTIMATION USING THE INVERSE SAMPLING PROCEDURE

2.1 Introduction

In this chapter we shall propose estimates for animal population density based on an inverse sampling procedure. Unlike the direct sampling method considered by Gates et al. (1968), the inverse sampling procedure specifies the number of animals that must be sighted before the sampling can be terminated. Thus, in the inverse case the number of animals sighted will be a fixed rather than a random quantity. A precise formulation of the inverse sampling method is as follows:
1. Place a line at random across the area, A, to be sampled.
2. Specify a fixed number, $N_0$, and sample along the line transect until $N_0$ animals are observed.

As one proceeds along the transect, certain measurements will be made. These will be denoted by $y_1, y_2, \ldots, y_{N_0}$ and $\ell$, where $y_i$ is the right angle distance from the i-th animal observed to the transect and $\ell$ is the total distance travelled along the transect during the observation period.
A visual depiction of these measurements is given in Figure 2.

Figure 2. Measurements recorded using inverse sampling.

2.2 A General Model Based on Right Angle Distances and Transect Length

The estimates for the density, D, that we will develop are based on the right angle distances, $y_1, y_2, \ldots, y_{N_0}$, and the total distance, $\ell$, travelled along the transect. Two possible approaches to the estimation of D merit consideration. First, recall that density is defined as the number of animals present per unit of area, or equivalently the rate at which animals are distributed over some specific area. Therefore, we can write
$$D = \frac{N'}{A},$$
where A is the area of interest and $N'$ is the total number of animals present in A. In the direct sampling approach the estimation of D is most often accomplished by first estimating $N'$ and then dividing by A. Seber (1973) shows that any estimate of $N'$ based on direct sampling has the form
$$\hat{N}' = \frac{NA}{2L_0\hat{c}},$$
where N is a random variable denoting the number of animals seen, $L_0$ is the length of the transect and $\hat{c}$ is an estimate of c, a parameter which depends on the probability of sighting an animal given its right angle distance from the transect. Note that in Seber's estimate, N is random and $L_0$ is fixed. It follows, then, that Seber's estimate for D does not depend explicitly on A and has the form
$$\hat{D}_s = \frac{N}{2L_0\hat{c}}.$$
Therefore, the estimate of D is independent of the actual size of A, a property that any reasonable estimate of D should possess.

As an alternative, D itself can be regarded as the basic parameter of interest and estimates for D can be derived directly. This is the approach taken by Burnham and Anderson (1976) and the one that we will follow in developing our estimates.
2.2.1 Assumptions

The form of any estimate of D, the animal population density, will depend upon the type of assumptions we can make regarding the distribution of the animals to be censused and the nature of the observations that will be made. The assumptions our estimates will be based on are as follows:
A1. The animals are randomly distributed with rate or density D over the area of interest A, i.e., the probability of a given animal being in a particular region of area $\delta A$ is $\delta A / A$.
A2. The animals are independently distributed over A, i.e., given two disjoint regions of area $\delta A_1$ and $\delta A_2$, P($n_1$ animals are in $\delta A_1$ and $n_2$ animals are in $\delta A_2$) = P($n_1$ animals are in $\delta A_1$) P($n_2$ animals are in $\delta A_2$).
A3. The probability of sighting an animal depends only on its distance from the transect. In addition, there exists a function g(y) giving the conditional probability of observing an animal given its right angle distance, y, from the transect. In probability notation, g(y) = P(observing an animal | y).
A4. g(0) = 1, i.e., animals on the line are seen with probability one.
A5. Animals are fixed, i.e., there is no confusion over animals moving during sampling and none are counted twice.

2.2.2 Derivation of the Likelihood Function

We will use the maximum likelihood procedure to obtain an estimate for D. The joint density function we are interested in is
$$f_{\mathbf{Y},L}(\mathbf{y}, \ell; N_0),$$
where $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_{N_0})$ is the vector of random variables representing the right angle distances, L is the random variable representing the total length travelled, and $N_0$ is the specified number of animals to be seen before sampling terminates. Since the dependence of the joint density on $N_0$ is implicit throughout the rest of this chapter, it will be dropped from our notation for convenience. Thus, from now on we will denote the density as $f_{\mathbf{Y},L}(\mathbf{y}, \ell)$, and all other expressions depending on $N_0$ in this manner will be handled accordingly.
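The sampling scheme and assumptions A1 to A5 can be mimicked by a small simulation. The sketch below is an illustration under stated assumptions rather than anything given in the dissertation: animals are generated as a homogeneous Poisson field inside a strip of half-width w about the transect (standing in for a maximum observation distance W), the sighting function is taken to be exponential, and all parameter values and the function name `simulate_inverse_sampling` are hypothetical.

```python
import math
import random


def simulate_inverse_sampling(density, n0, g, w, rng):
    """Walk along a transect until n0 animals have been sighted.

    Animals form a homogeneous Poisson field with the given density inside
    a strip of half-width w (A1, A2). An animal at right angle distance y
    is sighted with probability g(y), independently of the others (A3, A4),
    and animals do not move (A5). Returns (distances, length_travelled).
    """
    rate = 2.0 * w * density          # animals per unit transect length in the strip
    position = 0.0
    distances = []
    while len(distances) < n0:
        position += rng.expovariate(rate)   # gap along the line to the next animal
        y = rng.uniform(0.0, w)             # its right angle distance
        if rng.random() < g(y):             # is this animal sighted?
            distances.append(y)
    # Sampling stops at the n0-th sighting, so the current position is the length travelled.
    return distances, position


rng = random.Random(1979)                   # fixed seed for repeatability
lam = 0.2                                   # hypothetical sighting parameter
ys, ell = simulate_inverse_sampling(
    density=0.01, n0=25, g=lambda y: math.exp(-lam * y), w=50.0, rng=rng
)
print(len(ys), ell)
```

In such a run the number of sightings is fixed at $N_0$ = 25 by construction, while the length travelled varies from run to run, which is exactly the reversal of roles that distinguishes inverse from direct sampling.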
The following two theorems will be very useful in the derivation of the likelihood function.

Theorem 1: Let $N(\ell)$ denote the number of animals sighted in the interval $(0, \ell]$ along the transect. Then, $N(\ell)$ is a Poisson process, and for some $\theta > 0$,
$$P\{N(\ell) = n\} = \frac{e^{-\theta\ell}(\theta\ell)^n}{n!}, \qquad n = 0, 1, 2, \ldots$$
Note that the quantity $\theta\ell$ equals the expected number of animals sighted per segment of length $\ell$.

Proof: In order to show that $N(\ell)$ is a Poisson process, we will show that the assumptions in Section 2.2.1 imply the postulates necessary for a Poisson process given in Lindgren (1968, p. 162). First, consider two disjoint intervals, $\ell_1$ and $\ell_2$, along the transect and the corresponding areas, $A(\ell_1)$ and $A(\ell_2)$, enclosed by lines perpendicular to the transect as shown in Figure 3.

Figure 3. Two disjoint areas along the transect.

Now let $N_1$ and $N_2$ be random variables representing the total number of animals that occupy $A(\ell_1)$ and $A(\ell_2)$, respectively. By definition, $N(\ell_1)$ and $N(\ell_2)$ are the number of animals sighted in $A(\ell_1)$ and $A(\ell_2)$, respectively. We know from assumption A2 that $N_1$ and $N_2$ are independent, and from assumption A3 that sighting an animal depends only on its distance from the transect. Thus, $N(\ell_1)$, which depends solely on $N_1$ and the distances to the $N_1$ animals from the transect, is independent of $N(\ell_2)$, i.e., the numbers of sightings that occur in two disjoint intervals along the transect are independent.

Next we will show that for every $\ell > m \ge 0$ and any $h > 0$, $N(\ell) - N(m)$ and $N(\ell + h) - N(m + h)$ are identically distributed. First, note that the effective area sampled in seeing $N(\ell) - N(m)$ animals and $N(\ell + h) - N(m + h)$ animals is equal to $A(\ell - m)$, as seen in Figure 4.

Figure 4. Effective area sampled in seeing $N(\ell) - N(m)$ animals and $N(\ell + h) - N(m + h)$ animals.
Therefore, by assumptions A1, A2 and A3, and since the transect is dropped at random, it follows that
$$P\{N(\ell) - N(m) = j\} = P\{N(\ell + h) - N(m + h) = j\}, \qquad j = 0, 1, 2, \ldots$$

Next we must show that for every $\ell > 0$ and some $\theta > 0$,
$$P\{N(\ell) = 1\} = \theta\ell + o(\ell), \quad \text{as } \ell \to 0, \qquad \text{where } \lim_{\ell \to 0} \frac{o(\ell)}{\ell} = 0.$$
Again let $A(\ell)$ be the area defined by $\ell$ on the transect. Now define $B_i$ to be the event $\{N(\ell) = i\}$ and $E_j$ to be the event that there are exactly j animals in area $A(\ell)$. Then it follows that
$$P(B_1) = \sum_{j=1}^{\infty} P(B_1 E_j) = \sum_{j=1}^{\infty} P(B_1 \mid E_j) P(E_j).$$
Under assumptions A1 and A2, Pielou (1969, p. 81) has shown that
$$P(E_j) = \frac{e^{-DA(\ell)}[DA(\ell)]^j}{j!}, \qquad j = 0, 1, 2, \ldots$$
Also, under assumptions A1, A2 and A3, Seber (1973, Eq. (2.6)) has shown that
$$P(B_1 \mid E_1) = \frac{2c\ell}{A(\ell)}, \qquad \text{where} \quad c = \int_0^{\infty} g(y)\,dy. \qquad (2.1)$$
Therefore, we can write
$$P(B_1) = 2cD\ell\, e^{-DA(\ell)} + \sum_{j=2}^{\infty} P(B_1 \mid E_j) P(E_j),$$
and if we show
$$\sum_{j=2}^{\infty} P(B_1 \mid E_j) P(E_j) = o(\ell)$$
the proof will be complete. Note that
$$\sum_{j=2}^{\infty} P(B_1 \mid E_j) P(E_j) \le \sum_{j=2}^{\infty} \frac{e^{-DA(\ell)}[DA(\ell)]^j}{j!} = DA(\ell) \sum_{j=2}^{\infty} \frac{e^{-DA(\ell)}[DA(\ell)]^{j-1}}{j!} \le DA(\ell) \sum_{j=2}^{\infty} \frac{e^{-DA(\ell)}[DA(\ell)]^{j-1}}{(j-1)!} = DA(\ell)\left[1 - e^{-DA(\ell)}\right].$$
For any finite area A, $A(\ell)$ is $O(\ell)$, that is, $\lim_{\ell \to 0} A(\ell)/\ell \le K$ for some $K > 0$. Therefore, as $\ell \to 0$,
$$\sum_{j=2}^{\infty} P(B_1 \mid E_j) P(E_j) = o(\ell),$$
and, upon writing
$$\theta = 2cD, \qquad (2.2)$$
we get, as $\ell \to 0$, $P(B_1) = \theta\ell + o(\ell)$.

Finally, we need to show that for every $\ell > 0$,
$$\sum_{n > 1} P\{N(\ell) = n\} = o(\ell), \quad \text{as } \ell \to 0.$$
Note that for all $n > 1$, we can write
$$P(B_n) = \sum_{j=n}^{\infty} P(B_n E_j) = \sum_{j=n}^{\infty} P(B_n \mid E_j) P(E_j).$$
Again, by using the fact that $A(\ell)$ is $O(\ell)$, it is easy to show that $P(B_n) = o(\ell)$, as $\ell \to 0$, and $N(\ell)$ satisfies the four conditions necessary for a Poisson process.

Before proceeding to the second theorem, we need to define the following random variables. Let $T_i$ denote the random variable corresponding to the distance travelled on the transect between sightings of the $(i-1)$-th and $i$-th animals, $i = 1, 2, \ldots, N_0$. Then the total distance travelled is given by
$$L = \sum_{i=1}^{N_0} T_i.$$
The following theorem establishes the independence of $\mathbf{Y}$ and $T_1, T_2, \ldots, T_{N_0}$ for the case $N_0 = 2$, and this fact enables us to derive the joint density function, $f_{\mathbf{Y},L}(\mathbf{y}, \ell)$.

Theorem 2. The random variables $T_1$, $T_2$, $Y_1$ and $Y_2$ are mutually independent.

Proof: In order to establish the independence of $T_1$, $T_2$, $Y_1$ and $Y_2$ we will derive the joint density $f_{T_1,T_2,Y_1,Y_2}(t_1, t_2, y_1, y_2)$ and show that it can be factored into four functions, each depending on only one of the random variables of interest. Let $y_1, y_2, t_1, t_2, h_1, h_2, g_1$ and $g_2$ be non-negative real numbers such that ...

Figure 5. Areas defined by $y_1, y_2, t_1, t_2, g_1, g_2, h_1$ and $h_2$.

Now let
$$P(h_1, g_1, h_2, g_2) = P(t_1 < T_1 \le t_1 + h_1,\; y_1 < Y_1 \ldots$$
... the event $\{N(t_1 + t_2) - N(t_1 + h_1) = 0\}$, ... the event $\{N(t_1 + t_2 + h_2) - N(t_1 + t_2) = 1\}$ and ...
$$P(S_{2j}) = \frac{e^{-2Dg_1h_1}(2Dg_1h_1)^j}{j!}, \qquad j = 0, 1, 2.$$
By assumption A3, $P(S_2 \mid S_{21}) = g(y_1')$, for some $y_1'$ ...

In the same manner, we can show that the independence established in Theorem 2 will hold for any finite number of sightings, $N_0$. In this case, if $\mathbf{T} = (T_1, T_2, \ldots, T_{N_0})$ and $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_{N_0})$, then (2.3) becomes
$$f_{\mathbf{T},\mathbf{Y}}(\mathbf{t}, \mathbf{y}) = 2^{N_0} D^{N_0} e^{-\theta \sum_{i=1}^{N_0} t_i} \prod_{i=1}^{N_0} g(y_i).$$
Upon using equation (2.2) in $f_{\mathbf{T},\mathbf{Y}}(\mathbf{t}, \mathbf{y})$, we get
$$f_{\mathbf{T},\mathbf{Y}}(\mathbf{t}, \mathbf{y}) = \theta^{N_0} e^{-\theta \sum_{i=1}^{N_0} t_i}\, c^{-N_0} \prod_{i=1}^{N_0} g(y_i).$$
Thus, the marginal distributions for $T_i$ and $Y_i$ are
$$f_{Y_i}(y_i) = \frac{g(y_i)}{c}, \quad y_i \ge 0, \qquad \text{and} \qquad f_{T_i}(t_i) = \theta e^{-\theta t_i}, \quad t_i > 0.$$
Therefore, $T_1, T_2, \ldots, T_{N_0}$ are independent, identically distributed (iid) as Exponential random variables with parameter $\theta$, and
$$L = \sum_{i=1}^{N_0} T_i$$
has a Gamma distribution with parameters $N_0$ and $\theta$, i.e.,
$$f_L(\ell) = \frac{\theta^{N_0} \ell^{N_0 - 1} e^{-\theta\ell}}{\Gamma(N_0)}, \qquad \ell > 0.$$
Furthermore, L is independent of $\mathbf{Y}$.

The likelihood function for the estimation of $\theta$ and c can now be obtained by taking the product of $f_L(\ell)$ and $f_{\mathbf{Y}}(\mathbf{y})$, i.e.
,
$$L(\theta, c; \mathbf{y}, \ell) = \frac{\theta^{N_0} \ell^{N_0 - 1} e^{-\theta\ell}}{\Gamma(N_0)}\; c^{-N_0} \prod_{i=1}^{N_0} g(y_i). \qquad (2.4)$$

We will now outline how one can estimate D, the animal population density, from the likelihood function given in (2.4). As noted earlier, D is related to $\theta$ and c by equation (2.2), i.e.,
$$D = \frac{\theta}{2c}.$$
Thus, the maximum likelihood estimate for D would be
$$\hat{D} = \frac{\hat{\theta}}{2\hat{c}},$$
where $\hat{\theta}$ and $\hat{c}$ are maximum likelihood estimates of $\theta$ and c, respectively, obtained from (2.4). Note that the estimate $\hat{D}$ is the ratio of two mutually independent random variables, one depending on L alone and the other depending on $\mathbf{Y}$ alone. This property will be found to be very useful when evaluating the moments of $\hat{D}$.

We have now set the framework necessary for deriving an estimate of D. In the next section we shall obtain an estimate for D assuming that g(y) has a particular parametric form.

2.3 A Parametric Density Estimate

Any estimate for D that is derived after assuming an explicit function for g(y) will be called a parametric estimate. Gates et al. (1968), using direct sampling, derived an estimate for D assuming $g(y) = e^{-\lambda y}$. Using this same function for g(y), we will derive the corresponding estimate based on inverse sampling.

2.3.1 Maximum Likelihood Estimate for D

To estimate D we need to estimate both $\theta$ and c from the likelihood function (2.4). In this case
$$g(y) = e^{-\lambda y}, \qquad y \ge 0, \; \lambda > 0,$$
so that
$$c = \int_0^{\infty} e^{-\lambda y}\,dy = \frac{1}{\lambda}.$$
Substituting for c in (2.2) yields
$$D = \frac{\theta\lambda}{2}. \qquad (2.5)$$
Also, by substituting for c in (2.4), the likelihood function becomes
$$L(\theta, \lambda; \mathbf{y}, \ell) = \lambda^{N_0} e^{-\lambda \sum_{i=1}^{N_0} y_i}\; \frac{\theta^{N_0} \ell^{N_0 - 1} e^{-\theta\ell}}{\Gamma(N_0)}, \qquad \ell \ge 0, \; y_i \ge 0. \qquad (2.6)$$
The joint maximum likelihood estimates for $\theta$ and $\lambda$ can now be easily obtained. The natural logarithm of the likelihood function is
$$\ln L(\theta, \lambda; \mathbf{y}, \ell) = N_0 \ln\lambda - \lambda \sum_{i=1}^{N_0} y_i + N_0 \ln\theta + (N_0 - 1)\ln\ell - \theta\ell - \ln\Gamma(N_0).$$
Taking the partial derivatives with respect to $\theta$ and $\lambda$ yields
$$\frac{\partial \ln L}{\partial \theta} = \frac{N_0}{\theta} - \ell, \qquad \frac{\partial \ln L}{\partial \lambda} = \frac{N_0}{\lambda} - \sum_{i=1}^{N_0} y_i.$$
Setting these equal to 0 yields
$$\hat{\theta} = \frac{N_0}{\ell} \qquad \text{and} \qquad \hat{\lambda} = \frac{N_0}{\sum_{i=1}^{N_0} y_i}.$$
Substituting these estimates for $\theta$ and $\lambda$ in (2.5), the maximum likelihood estimate for D is seen to be
$$\hat{D} = \frac{\hat{\theta}\hat{\lambda}}{2} = \frac{N_0^2}{2\ell \sum_{i=1}^{N_0} y_i}.$$
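The maximum likelihood computation of Section 2.3.1 amounts to two ratios and a product. As a minimal sketch, assuming the exponential form $g(y) = e^{-\lambda y}$ used in this section, the function below returns $\hat{\theta} = N_0/\ell$, $\hat{\lambda} = N_0/\sum y_i$ and $\hat{D} = \hat{\theta}\hat{\lambda}/2$; the function name and the data are hypothetical.

```python
def inverse_sampling_mle(distances, length):
    """Maximum likelihood estimates under inverse sampling with g(y) = exp(-lam * y).

    distances : right angle distances y_1, ..., y_N0 of the sighted animals
    length    : total distance travelled along the transect
    Returns (theta_hat, lam_hat, d_hat).
    """
    n0 = len(distances)
    total = sum(distances)
    if n0 == 0 or total <= 0 or length <= 0:
        raise ValueError("need at least one positive distance and a positive length")
    theta_hat = n0 / length          # sightings per unit length of transect
    lam_hat = n0 / total             # reciprocal of the mean right angle distance
    d_hat = theta_hat * lam_hat / 2.0
    return theta_hat, lam_hat, d_hat


# Hypothetical data: N0 = 5 sightings after travelling 400 m.
ys = [2.0, 5.0, 1.0, 8.0, 4.0]
theta_hat, lam_hat, d_hat = inverse_sampling_mle(ys, 400.0)
print(theta_hat, lam_hat, d_hat)
```

The returned density estimate agrees with the closed form $N_0^2 / (2\ell\sum y_i)$, here 25 / (2 x 400 x 20).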
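2.3.2 Unbiased Estimate for D

The expected value of the estimate $\hat{D}$, developed in Section 2.3.1, is
$$E(\hat{D}) = E\left(\frac{\hat{\theta}\hat{\lambda}}{2}\right) = \frac{1}{2} E(\hat{\theta}) E(\hat{\lambda}),$$
since $\hat{\theta}$ and $\hat{\lambda}$ are independent. Using the fact that L has a Gamma distribution with parameters $N_0$ and $\theta$, we obtain
$$E(\hat{\theta}) = E\left(\frac{N_0}{L}\right) = \frac{N_0\theta}{N_0 - 1}.$$
To derive an expression for $E(\hat{\lambda})$, first recall that $Y_1, \ldots, Y_{N_0}$ are iid with the common density
$$f_Y(y) = \frac{g(y)}{c} = \lambda e^{-\lambda y}, \qquad y > 0.$$
Therefore, $\sum_{i=1}^{N_0} Y_i$ is distributed as a Gamma random variable with parameters $N_0$ and $\lambda$, and
$$E(\hat{\lambda}) = E\left(\frac{N_0}{\sum_{i=1}^{N_0} Y_i}\right) = \frac{N_0\lambda}{N_0 - 1}.$$
Independence of $\hat{\theta}$ and $\hat{\lambda}$ now yields
$$E(\hat{D}) = \frac{1}{2} E(\hat{\theta}) E(\hat{\lambda}) = \left(\frac{N_0}{N_0 - 1}\right)^2 \frac{\theta\lambda}{2} = \left(\frac{N_0}{N_0 - 1}\right)^2 D,$$
so that
$$\hat{D}_u = \left(\frac{N_0 - 1}{N_0}\right)^2 \hat{D} = \frac{(N_0 - 1)^2}{2L \sum_{i=1}^{N_0} Y_i}$$
is an unbiased estimate for D.

2.3.3 Variance of $\hat{D}_u$

Due to the independence of L and $\sum_{i=1}^{N_0} Y_i$, the variance of $\hat{D}_u$ can be derived directly. We have
$$\mathrm{Var}(\hat{D}_u) = \mathrm{Var}\left[\frac{(N_0 - 1)^2}{2L \sum Y_i}\right] = \frac{(N_0 - 1)^4}{4}\, \mathrm{Var}\left[\frac{1}{L \sum Y_i}\right] = \frac{(N_0 - 1)^4}{4} \left\{ E\left[\frac{1}{(L \sum Y_i)^2}\right] - \left(E\left[\frac{1}{L \sum Y_i}\right]\right)^2 \right\}.$$
Since L and $\sum Y_i$ are independent, it follows that
$$\mathrm{Var}(\hat{D}_u) = \frac{(N_0 - 1)^4}{4} \left\{ E\left(\frac{1}{L^2}\right) E\left(\frac{1}{(\sum Y_i)^2}\right) - \left[E\left(\frac{1}{L}\right) E\left(\frac{1}{\sum Y_i}\right)\right]^2 \right\}.$$
For a Gamma random variable with parameters $N_0$ and $\theta$,
$$E\left(\frac{1}{L}\right) = \frac{\theta}{N_0 - 1}, \qquad E\left(\frac{1}{L^2}\right) = \frac{\theta^2}{(N_0 - 1)(N_0 - 2)}, \quad N_0 > 2,$$
with the corresponding expressions in $\lambda$ for $\sum Y_i$, so that
$$\mathrm{Var}(\hat{D}_u) = D^2 \left[\frac{(N_0 - 1)^2}{(N_0 - 2)^2} - 1\right] = \frac{(2N_0 - 3)\, D^2}{(N_0 - 2)^2}, \qquad N_0 > 2.$$

2.3.4 Sample Size Determination Using $\hat{D}_u$

... The coefficient of variation of an estimate $\hat{D}$ is
$$CV = \frac{\sigma_{\hat{D}}}{E(\hat{D})},$$
where $\sigma_{\hat{D}}$ and $E(\hat{D})$ denote, respectively, the standard deviation and the expected value of the estimate $\hat{D}$. As one can see immediately, small values of CV are desirable since this indicates that the estimate has a small standard deviation relative to its expected value.

With the inverse sampling method, the value of $N_0$ needed to guarantee a preset value, C, for the coefficient of variation of $\hat{D}_u$ can be calculated easily. Using (2.7) and (2.13) we see that, for $N_0 > 2$,
$$CV(\hat{D}_u) = \frac{\sqrt{2N_0 - 3}}{N_0 - 2}.$$
Then, setting $C = CV(\hat{D}_u)$, it is easily shown that $N_0$ is the root of the quadratic equation
$$C^2 N_0^2 - (4C^2 + 2) N_0 + 4C^2 + 3 = 0.$$
Solving for $N_0$ yields the two roots
$$N_0 = \frac{(2C^2 + 1) \pm \sqrt{C^2 + 1}}{C^2}.$$
Since the variance of $\hat{D}_u$ exists only for $N_0 > 2$, the required sample size is the smallest integer not less than the larger root,
$$N_0 = \frac{(2C^2 + 1) + \sqrt{C^2 + 1}}{C^2}.$$
For example, if C = .25, then $N_0$ = 35. Table 1 gives values of $N_0$ corresponding to coefficients of variation ranging from .1 to .5.

Table 1. Number of animals, $N_0$, that must be sighted to guarantee that the estimate, $\hat{D}_u$, has coefficient of variation, CV($\hat{D}_u$).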
u CV(D^) N, .50 11 .40 15 .30 25 .25 35 .20 53 .15 92 .10 203 2 .4 Nonparametric Density Estimate In this section we will consider a nonparametric estimate for the population density, D, using inverse sampling. In contrast to the parametric approach used in Section 2,3, the nonparametric approach leaves the function g(y), which represents the probability of observing an animal given its right angle distance, unspecified. In Section 2.2,2 we showed that an estimate for D is given by 2c where n and c are the estimates for 0, the expected number of sightings per unit length of the transect, and c defined as c = g(y)dy, -"o PAGE 43 35 If g(y) is completely specified, except for some parameters, then the problem of estimating D reduces to the problem of estimating 6 and the parameters in g(y). In Section 2 . 3 we considered the specific case g(y) = e-^y. A drawback to this approach, where we specify a functional form for g(y), is that the function chosen must take into account the inherent detection difficulties that are present when a particular animal species is being sampled. If one examines the various forms that have been suggested for g(y), one quickly becomes aware of the problem of finding a form that is flexible enough to accommodate the many possibilities which exist. Some of the functions that have been proposed for g(y) are presented in Table 2. As seen in the table, the suggestions for g(y) represent a number of different shapes in an effort to reflect the nature of the animal being sampled and the type of ground cover being searched. Because of the problems that can arise in choosing a function for g(y), Burnham and Anderson (1976) considered a nonparametric approach as a means of avoiding the need for the specification of g(y). Leaving g(y) unspecified will allow the estimation procedure to depend on the observations tlKiL arc actually made, not on any particular model. 
Thus, a nonparametric model might provide a more robust estimation method, that is, an estimation method that could be applied to a much wider class of animal species.

Table 2. Forms proposed for the function, $g(y)$.

Function    Author
$g(y) = e^{-\lambda y}$, $\lambda > 0$    Gates et al. (1968)
[...]    Eberhardt (1968)
[...]

Now, if $g(y)$ is left unspecified, then an estimate for $1/c$ may be obtained along the same lines Burnham and Anderson (1976) used in the case of direct sampling. By assumption A4, $g(0) = 1$, so that

$$f_Y(0) = \frac{g(0)}{c} = \frac{1}{c}.$$

Hence, $1/c$ equals the value of $f_Y(\cdot)$ evaluated at $y = 0$, where $f_Y(\cdot)$ is the probability density function for the right angle distance, $Y$, given an animal is seen. The problem of finding a nonparametric estimate for $1/c$, therefore, reduces to the problem of finding an estimate, $\hat{f}_Y(0)$, for $f_Y(0)$. An estimate for $D$ will then be given by

$$\hat{D}_N = \frac{\hat{\theta}\,\hat{f}_Y(0)}{2}, \tag{2.14}$$

where $\hat{\theta}$ may be taken as the maximum likelihood estimate derived in Section 2.3.1, corrected for bias. That is,

$$\hat{\theta} = \frac{N_0-1}{L},$$

where we have replaced $N_0$ by $N_0-1$ to remove the bias.

2.4.2 An Estimate for $f_Y(0)$

Burnham and Anderson (1976) suggested four possible methods for estimating $f_Y(0)$, but we are not aware of any work which investigates the theoretical properties of any of these estimates. Loftsgaarden and Quesenberry (1965) considered a density function estimate based on the observation that

$$f_Y(x) = \lim_{h\to 0}\frac{F_Y(x+h) - F_Y(x-h)}{2h},$$

where $F_Y(\cdot)$ is the cumulative distribution function. For the purpose of estimating $f_Y(0)$, their estimate takes the form

$$\hat{f}_Y(0) = \left\{\sqrt{N_0}\;Y_{([\sqrt{N_0}+1])}\right\}^{-1}, \tag{2.15}$$

where $[\sqrt{N_0}+1]$ is the value of $\sqrt{N_0}+1$ rounded off to the nearest integer and $Y_{(j)}$ is the $j$th order statistic of the sample $y_1, y_2, \ldots, y_{N_0}$. Loftsgaarden and Quesenberry (1965) showed that $\hat{f}_Y(0)$ as given in (2.15) is a consistent estimate, provided $f_Y(\cdot)$ is a positive and continuous probability density function. One nice property of $\hat{f}_Y(0)$ is that it can be easily calculated from the data.
However, evaluation of the moments of this estimate does present some problems. In fact, the mean and the variance may not even exist in some cases. But, whenever $[\sqrt{N_0}+1] \ge 3$, i.e., whenever $N_0 \ge 4$, the variance of $\hat{f}_Y(0)$ is finite, as shown in the following theorem.

Theorem 3. Let $Y_1, Y_2, \ldots, Y_n$ be a set of independent, identically distributed random variables, representing the right angle distances, with continuous probability density function (p.d.f.)

$$f_Y(y) = \frac{g(y)}{c}, \qquad y \ge 0.$$

Also, let $Y_{(r)}$ be the $r$th order statistic. Then

$$E\!\left(\frac{1}{Y_{(r)}^2}\right) < +\infty$$

for every integer $r$ such that $3 \le r \le n$.

Proof: The density function for $Y_{(r)}$ is

$$h_r(y) = n\binom{n-1}{r-1}F_Y^{\,r-1}(y)\,[1-F_Y(y)]^{n-r} f_Y(y),$$

where [...]

2.4.3 Approximations for the Mean and Variance of $\hat{f}_Y(0)$

Let $F(\cdot)$ and $F^{-1}(\cdot)$ denote the cumulative distribution function and its inverse for the random variable $Y$, the right angle distance. Also, let $r = [\sqrt{N}+1]$, $U_r = F(Y_{(r)})$, and [...]

Taking the limit as $N$ tends to infinity and noting that $F^{-1}(0)$ is 0 and $u = F(y)$ yields

$$\lim_{N\to\infty} E\{\hat{f}_Y(0)\} = \left[\left.\frac{dF^{-1}(u)}{du}\right|_{u=0}\right]^{-1} = \left.\frac{dF(y)}{dy}\right|_{y=0} = f_Y(0). \tag{2.18}$$

Thus for large $N$, $\hat{f}_Y(0)$ is approximately unbiased.

An approximation for the variance of $\hat{f}_Y(0)$ is found in a similar fashion. Using (2.17) we get

$$\mathrm{Var}\{\phi(U_r)\} \approx \left[\left.\frac{d\phi(u)}{du}\right|_{u=p_r}\right]^2 \mathrm{Var}(U_r).$$

Evaluating the derivative yields [...] so that [...]

Therefore, as $N \to \infty$ we have

$$\lim_{N\to\infty}\sqrt{N}\,\{\mathrm{Var}\,\hat{f}_Y(0)\} = f_Y^2(0),$$

so that an approximation for the variance, when $N$ is large, is given by

$$\mathrm{Var}\{\hat{f}_Y(0)\} \approx \frac{f_Y^2(0)}{\sqrt{N}}. \tag{2.19}$$

As stated earlier, the expressions obtained for the expected value and variance of $\hat{f}_Y(0)$ are only approximations. Their adequacy for practical purposes may be evaluated by a Monte Carlo study involving various specific forms for the p.d.f., $f_Y(\cdot)$.
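A minimal sketch of such a check is given here, assuming the exponential form $f_Y(y) = \lambda e^{-\lambda y}$ (so that $f_Y(0) = \lambda$) and the reading $\hat{f}_Y(0) = 1/(\sqrt{N}\,Y_{(r)})$ with $r = [\sqrt{N}+1]$ for (2.15). The function names, the seed, and the choice $\lambda = 10$ are illustrative assumptions and are not taken from the original study, which was run in SAS.

```python
import math
import random


def f_hat_zero(distances):
    """Estimate f_Y(0) from right angle distances, following the form of (2.15)."""
    n = len(distances)
    # [sqrt(n) + 1]: sqrt(n) + 1 rounded off to the nearest integer
    r = int(math.floor(math.sqrt(n) + 1 + 0.5))
    y_r = sorted(distances)[r - 1]  # r-th order statistic (1-indexed)
    return 1.0 / (math.sqrt(n) * y_r)


def monte_carlo(n, lam=10.0, reps=2000, seed=1):
    """Simulate reps samples of size n from an exponential f_Y (an assumption)."""
    rng = random.Random(seed)
    estimates = [
        f_hat_zero([rng.expovariate(lam) for _ in range(n)])
        for _ in range(reps)
    ]
    mean = sum(estimates) / reps
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / reps)
    return {
        "mean": mean,
        "percent_bias": 100.0 * (mean - lam) / lam,  # f_Y(0) = lam here
        "empirical_sd": sd,
        "approx_sd": lam / n ** 0.25,  # square root of (2.19)
    }


if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, monte_carlo(n))
```

Comparing `empirical_sd` with `approx_sd` over several sample sizes gives a rough indication of how well (2.19) holds for this one assumed shape of $f_Y(\cdot)$; other shapes would need their own sampler.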
In the next section we will look at the results of just this kind of simulation study.

2.4.4 A Monte Carlo Study

A Monte Carlo study was used to examine the approximations for $E\{\hat{f}_Y(0)\}$ and $\mathrm{Var}\{\hat{f}_Y(0)\}$ presented in Section 2.4.3. Three possible shapes for $f_Y(\cdot)$ were used in the study. Since the shape of $f_Y(\cdot)$ depends solely on the choice of $g(y)$, the functions

$$g_1(y) = e^{-10y}, \quad y > 0,$$

$$g_2(y) = 1 - y, \quad 0 \le y \le 1,$$

and

$$g_3(y) = 1 - y^2, \quad 0 \le y \le 1,$$

[...] standard deviation, $\sigma_e$, of $\hat{f}_Y(0)$ given in equation (2.15) as follows. Let $\hat{f}_{iY}(0)$ denote the estimate from the $i$th sample, $i = 1, 2, \ldots, 2000$. Then

$$\mu_e = \frac{1}{2000}\sum_{i=1}^{2000}\hat{f}_{iY}(0), \qquad B_e = 100\left(\frac{\mu_e - f_Y(0)}{f_Y(0)}\right),$$

and $\sigma_e$ is the standard deviation of the 2000 values $\hat{f}_{iY}(0)$ about $\mu_e$. All of the necessary computing was performed under release 76.6B of SAS (see Barr et al., 1976) at the Northeast Regional Data Center located at the University of Florida.

The results of the study, along with the approximate standard deviations,

$$\sigma_T = \frac{f_Y(0)}{N^{1/4}},$$

are presented in Tables 3, 4, and 5. As can be seen from the tables, the estimate of $f_Y(0)$ has a negative bias for most samples, generally of a magnitude less than 10% of the true value. The ratio $\sigma_T/\sigma_e$ is also within 10% of one for almost all samples considered. This is even true for the smaller sample sizes, $n < 45$. Also, when considering the smaller sample sizes, the ratio was for the most part greater than one. Based on the results of this simulation, we feel that, in practice, the approximations obtained for the expected value and variance of $\hat{f}_Y(0)$ would perform adequately.

Table 3. [...]

Table 5. Results of Monte Carlo Study using $g_3(y) = 1 - y^2$. [...]

[...] where

$$\hat{\theta} = \frac{N_0-1}{L} \qquad\text{and}\qquad \hat{f}_Y(0) = \frac{1}{\sqrt{N_0}\;Y_{([\sqrt{N_0}+1])}}.$$

Then, upon substituting the appropriate expressions for the moments of $\hat{\theta}$ and $\hat{f}_Y(0)$ into the above equations, we get

$$E(\hat{D}_N) \approx D, \tag{2.21}$$

and

$$\mathrm{Var}(\hat{D}_N) \approx D^2\,\frac{\sqrt{N_0}+1}{N_0+2}. \tag{2.22}$$

2.4.6 Sample Size Determination Using $\hat{D}_N$

We can now determine the approximate value of $N_0$ that is needed to guarantee some preset value for the coefficient of variation of $\hat{D}_N$, $CV(\hat{D}_N)$. These values for $N_0$ can then be compared to the corresponding values for $N_0$ (see Table 1) that are needed to ensure the same coefficient of variation with the parametric estimate, $\hat{D}_u$. Using (2.21) and (2.22), we see that an approximation for the coefficient of variation of $\hat{D}_N$ is

$$CV(\hat{D}_N) \approx \frac{(\sqrt{N_0}+1)^{1/2}}{(N_0+2)^{1/2}},$$

and by setting $C = CV(\hat{D}_N)$, one can easily show that $\sqrt{N_0}$ is the root of the quadratic equation

$$C^2N_0 - \sqrt{N_0} + 2C^2 - 1 = 0.$$

Solving for $\sqrt{N_0}$ yields the two roots

$$\sqrt{N_0} = \frac{1 \pm \left(1 - 4C^2(2C^2-1)\right)^{1/2}}{2C^2},$$

and since $\left(1 - 4C^2(2C^2-1)\right)^{1/2} > 1$ whenever $C^2 < \tfrac{1}{2}$, the required sample size for values of $C < .5$ is

$$N_0 = \left[\frac{1 + \left(1 - 4C^2(2C^2-1)\right)^{1/2}}{2C^2}\right]^2.$$

For example, if $C = .25$, then $N_0 = 284$. Table 6 gives values for $N_0$ corresponding to coefficients of variation ranging from .2 to .5.

Table 6. Number of animals, $N_0$, that must be sighted to guarantee that the estimate, $\hat{D}_N$, has coefficient of variation $CV(\hat{D}_N)$. [...]

CHAPTER III
DENSITY ESTIMATION BASED ON A COMBINATION OF INVERSE AND DIRECT SAMPLING

3.1 Introduction

When sampling a population by means of line transects, it is important to keep in mind that the transect length that can be covered by an observer will be finite. This poses a problem for the inverse sampling plan since there will exist the possibility of not seeing the specified number of animals within the entire length of the transect. Therefore, it seems reasonable to develop a sampling scheme that would employ a rule which allows one to stop when either a specified number, $N_0$, of animals are seen or a fixed distance, $L_0$, has been travelled on the transect. In this chapter we will consider a sampling plan which combines the inverse sampling procedure discussed in Chapter II and the direct sampling procedure of Gates et al. (1968).
More precisely, we will define the combined sampling method as follows:

1. Place a line at random across the area, $A$, to be sampled.
2. Specify a fixed number of animals, $N_0 > 2$, and a fixed transect length, $L_0$, and then continue sampling along the transect until either $N_0$ animals are seen or a distance, $L_0$, has been travelled.

Since the above method merely incorporates the individual stopping rules from the inverse and direct sampling methods, it seems reasonable to use the estimate

$$\hat{D}_{CP} = \begin{cases}\hat{D}_u & \text{if } N = N_0,\\ \hat{D}_g & \text{if } N < N_0,\end{cases} \tag{3.1}$$

where $N$ is a random variable corresponding to the actual number of animals sighted using combined sampling, $\hat{D}_u$ is the inverse sampling estimator given in (2.7) and $\hat{D}_g$ is an estimator appropriate for the direct sampling case. In other words, the combined sampling procedure uses the inverse sampling estimate if sampling terminates after $N_0$ animals are seen and the direct sampling estimate if sampling terminates after travelling a distance $L_0$. In Section 3.5 we will also show that $\hat{D}_{CP}$ has a maximum likelihood justification.

Before proceeding to derive the mean and variance for $\hat{D}_{CP}$, we need an estimate appropriate for the direct sampling case.

3.2 Gates Estimate

Based on the direct sampling approach and assuming $g(y) = e^{-\lambda y}$, $\lambda > 0$, Gates et al. (1968) developed the estimate

$$\hat{D}_d = \begin{cases} 0, & n = 0, 1,\\[4pt] \dfrac{n(n-1)}{2L_0\sum_{i=1}^{n} y_i}, & n \ge 2,\end{cases} \tag{3.2}$$

where $L_0$ is the fixed length of the transect, $n$ is an observed value for the random variable $N_d$, the number of animals seen using direct sampling, and $y_i$ is an observed value for the random variable $Y_i$, $i = 1, 2, \ldots, n$, the right angle distance to the $i$th animal seen. In what follows, we shall show that the variance of $\hat{D}_d$ is not finite. First, we need a result concerning the joint density of the $Y_i$, $i = 1, 2, \ldots, N_d$, conditional on $N_d$.

Theorem 3. Under the assumptions stated in Section 2.2.1, conditional on $N_d = n > 0$, the random variables $Y_1, Y_2, \ldots, Y_n$ are independently, identically distributed with common density

$$f_Y(y) = \lambda e^{-\lambda y}, \qquad y > 0,\; \lambda > 0.$$

Consequently, conditional on $N_d = n > 0$, the random variable $\sum_{i=1}^{N_d} Y_i$ has a Gamma distribution with parameters $n$ and $\lambda$.

Proof: We want to show that for $y_i > 0$, $i = 1, 2, \ldots, n$,

$$f_{Y_1,\ldots,Y_n \mid N_d}(y_1,\ldots,y_n \mid n) = \lambda^n e^{-\lambda\sum_{i=1}^{n} y_i}.$$

Recall that in the direct sampling procedure, the total length travelled, $L_0$, is fixed, and define $L_n$ to be the random variable representing the total length travelled on the transect when the $n$th animal is sighted. Then the events $(N_d = n)$ and $(L_n \le L_0 < L_{n+1})$ are equivalent, so that the conditional density of $Y_1,\ldots,Y_n$ given $N_d = n$ equals their conditional density given $L_n \le L_0 < L_{n+1}$. Now by Theorem 2, $Y_1, Y_2, \ldots, Y_n$, $L_n$ and $L_{n+1}$ are mutually independent, and

$$f_Y(y) = \frac{g(y)}{c}.$$

Consequently,

$$f_{Y_1,\ldots,Y_n \mid N_d}(y_1,\ldots,y_n \mid n) = \prod_{i=1}^{n}\frac{g(y_i)}{c} = \prod_{i=1}^{n}\lambda e^{-\lambda y_i},$$

which completes the proof.

It is now easy to show that $\mathrm{Var}(\hat{D}_d)$ does not exist. From Theorem 3, conditional on $N_d = n > 0$, $\sum_{i=1}^{N_d} Y_i$ has a Gamma distribution with parameters $n$ and $\lambda$. Thus, using (2.12) and (3.2),

$$E(\hat{D}_d^2 \mid N_d = n) = \begin{cases} 0, & n = 0, 1,\\[2pt] \infty, & n = 2,\\[2pt] \dfrac{n^2(n-1)\lambda^2}{4L_0^2(n-2)}, & n > 2.\end{cases}$$

Also, since $N_d$ is the number of sightings in a transect of length $L_0$, it follows from Theorem 2 that $N_d$ has a Poisson distribution with parameter $\theta L_0$. Thus

$$E(\hat{D}_d^2) = E_{N_d}E(\hat{D}_d^2 \mid N_d) = \frac{\lambda^2}{4L_0^2}\sum_{n=2}^{\infty}\frac{n^2(n-1)}{(n-2)}\,\frac{e^{-\theta L_0}(\theta L_0)^n}{n!} = \infty,$$

showing that the variance for the estimate $\hat{D}_d$, defined in (3.2), is infinite. In fact, as long as $P(N_d = 2) > 0$, the variance of $\hat{D}_d$ cannot be finite.

The problem of infinite variance for $\hat{D}_d$ can be overcome by replacing $\hat{D}_d$ with $\hat{D}_g$, where

$$\hat{D}_g = \begin{cases} 0 & \text{if } n = 0, 1, 2,\\[4pt] \dfrac{n(n-1)}{2L_0\sum_{i=1}^{n} Y_i} & \text{if } n \ge 3.\end{cases} \tag{3.3}$$

Note that the estimate, $\hat{D}_g$, differs from $\hat{D}_d$ only when $n = 2$. Since any estimate of the density based on only 2 sightings should be effectively 0, the above modification does not seem to be unreasonable. We will now proceed to derive expressions for the mean and variance of $\hat{D}_g$, which are needed in the sequel.
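The two direct sampling estimates can be compared with a short sketch. It assumes the forms of (3.2) and (3.3) as read above; the function names and the sample distances in the usage lines are hypothetical and serve only as an illustration.

```python
import math


def gates_estimate(distances, transect_length):
    """Gates et al. (1968) estimate as in (3.2): zero when n = 0 or 1."""
    n = len(distances)
    if n < 2:
        return 0.0
    return n * (n - 1) / (2.0 * transect_length * sum(distances))


def modified_gates_estimate(distances, transect_length):
    """Modified estimate as in (3.3): also set to zero when n = 2."""
    if len(distances) < 3:
        return 0.0
    return gates_estimate(distances, transect_length)


def conditional_second_moment(n, lam, transect_length):
    """E(D_d^2 | N_d = n) for the unmodified estimate under g(y) = exp(-lam * y)."""
    if n < 2:
        return 0.0
    if n == 2:
        return math.inf  # the term that makes Var(D_d) infinite
    return n ** 2 * (n - 1) * lam ** 2 / (4.0 * transect_length ** 2 * (n - 2))


if __name__ == "__main__":
    two_sightings = [1.0, 3.0]         # hypothetical right angle distances
    three_sightings = [1.0, 2.0, 3.0]  # hypothetical right angle distances
    print(gates_estimate(two_sightings, 10.0))
    print(modified_gates_estimate(two_sightings, 10.0))
    print(modified_gates_estimate(three_sightings, 10.0))
    print(conditional_second_moment(2, 2.0, 10.0))
```

The two estimates agree for every sample except those with exactly two sightings, which is the only case in which the conditional second moment is infinite.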
3.2.1 The Mean and Variance of $\hat{D}_g$

We will first examine $E(\hat{D}_g)$. Recall from Theorem 3 that, conditional on $N_d = n$, $n > 0$, $\sum_{i=1}^{N_d} Y_i$ has a Gamma distribution with parameters $n$ and $\lambda$. Thus

$$E\!\left(\frac{1}{\sum_{i=1}^{N_d} Y_i}\,\Big|\, N_d = n\right) = \frac{\lambda}{n-1}, \qquad n > 1,$$

and

$$E(\hat{D}_g \mid N_d = n) = \begin{cases} 0, & n = 0, 1, 2,\\[4pt] \dfrac{n\lambda}{2L_0}, & n \ge 3.\end{cases}$$

Now since $N_d$ is distributed as a Poisson random variable with parameter $\theta L_0$, it follows that

$$E(\hat{D}_g) = E_{N_d}E(\hat{D}_g \mid N_d) = \frac{\lambda}{2L_0}\sum_{n=3}^{\infty}\frac{n\,e^{-\theta L_0}(\theta L_0)^n}{n!} = \frac{\theta\lambda}{2}\left\{1 - e^{-\theta L_0}(1+\theta L_0)\right\}.$$

Substituting the left hand side of (2.5) for $\theta\lambda/2$ in the above yields

$$E(\hat{D}_g) = D\left\{1 - e^{-\theta L_0}(1+\theta L_0)\right\},$$

and after writing $\mu = \theta L_0$, the expected number of sightings in a transect of length $L_0$, we get

$$E(\hat{D}_g) = D\left\{1 - e^{-\mu}(1+\mu)\right\}. \tag{3.4}$$

Thus, $\hat{D}_g$ is not strictly unbiased, but the bias arises because there is a positive probability of obtaining samples of size 1 or 2. However, even for moderate values of $\mu$, the bias in $\hat{D}_g$ will be small since $e^{-\mu}(1+\mu)$ tends to zero exponentially fast. For example, if $\mu = 10$, the relative bias is only .05%.

Next we will look at $\mathrm{Var}(\hat{D}_g)$. Again since, conditional on $N_d = n$, $n > 0$, $\sum_{i=1}^{N_d} Y_i$ is distributed as a Gamma random variable with parameters $n$ and $\lambda$, we know that

$$E\!\left[\left(\frac{1}{\sum_{i=1}^{N_d} Y_i}\right)^{2}\,\Big|\, N_d = n\right] = \frac{\lambda^2}{(n-1)(n-2)}, \qquad n > 2,$$

and

$$E(\hat{D}_g^2 \mid N_d = n) = \begin{cases} 0 & \text{if } n = 0, 1, 2,\\[4pt] \dfrac{n^2(n-1)\lambda^2}{4L_0^2(n-2)} & \text{if } n \ge 3.\end{cases}$$

Therefore,

$$E(\hat{D}_g^2) = E_{N_d}E(\hat{D}_g^2 \mid N_d) = \frac{\lambda^2}{4L_0^2}\sum_{n=3}^{\infty}\frac{n^2(n-1)}{(n-2)}\,\frac{e^{-\mu}\mu^n}{n!}, \tag{3.5}$$

and we can write

$$\mathrm{Var}(\hat{D}_g) = \frac{\lambda^2}{4L_0^2}\sum_{n=3}^{\infty}\frac{n^2(n-1)}{(n-2)}\,\frac{e^{-\mu}\mu^n}{n!} - D^2\left(1 - e^{-\mu}(1+\mu)\right)^2. \tag{3.6}$$

An approximation to $\mathrm{Var}(\hat{D}_g)$ valid for large values of $\mu$ may be derived in a manner analogous to the method used by Gates et al. (1968). After writing [...] it is easy to see that for $n \ge 3$ [...] Thus, lower and upper bounds for $E(\hat{D}_g^2)$ are

$$LB = \frac{\lambda^2}{4L_0^2}\sum_{n=3}^{\infty}(n^2+n+2)\,\frac{e^{-\mu}\mu^n}{n!}$$

[...]

From (3.7) an approximation for $\mathrm{Var}(\hat{D}_g)$ is

$$\mathrm{Var}(\hat{D}_g) \approx D^2\left\{1 + \frac{2}{\mu} + \frac{4}{\mu^2} - e^{-\mu}\left(5 + \frac{6}{\mu} + \frac{4}{\mu^2}\right)\right\} - D^2\left\{1 - 2e^{-\mu}(1+\mu) + e^{-2\mu}(1+\mu)^2\right\}$$

$$= D^2\left\{\frac{2}{\mu} + \frac{4}{\mu^2} - e^{-\mu}\left(3 - 2\mu + \frac{6}{\mu} + \frac{4}{\mu^2}\right) - e^{-2\mu}(1+\mu)^2\right\}.$$

Now, as $\mu$ increases, the terms involving $e^{-\mu}$ and $e^{-2\mu}$ will tend to 0 much faster than $\frac{2}{\mu} + \frac{4}{\mu^2}$, so that for large $\mu$ we have the approximation

$$\mathrm{Var}(\hat{D}_g) \approx D^2\left(\frac{2}{\mu} + \frac{4}{\mu^2}\right). \tag{3.8}$$

We are now in a position to derive the mean and variance of $\hat{D}_{CP}$.

3.3 Expected Value of $\hat{D}_{CP}$

Recall that in the combined sampling scheme both $N$, the number of animals seen, and $L$, the distance travelled before termination of sampling, are random variables. Thus, the expected value of $\hat{D}_{CP}$ can be found directly using

$$E(\hat{D}_{CP}) = E_N E(\hat{D}_{CP} \mid N).$$

However, before proceeding along these lines it will be helpful to have the following theorems.

Theorem 4. Let $N$ be the random variable representing the number of animals seen using the combined sampling method. Then under the assumptions stated in Section 2.2.1,

$$P(N = n) = \begin{cases}\dfrac{e^{-\mu}\mu^n}{n!}, & n = 0, 1, \ldots, N_0-1,\\[8pt] \displaystyle\sum_{k=N_0}^{\infty}\frac{e^{-\mu}\mu^k}{k!}, & n = N_0,\end{cases}$$

where $\mu = \theta L_0$ is the expected number of animals sighted along a transect of length $L_0$.

Proof: For $n$ [...]

Theorem 5. Under the assumptions stated in Section 2.2.1, the conditional p.d.f. of $L_N$ given $N = n > 0$ is

$$f_{L_N}(\ell \mid N = n) = \begin{cases}\dfrac{n\,\ell^{\,n-1}}{L_0^{\,n}}, & 0 < \ell < L_0,\; n = 1, 2, \ldots, N_0-1,\\[8pt] \dfrac{1}{P(N = N_0)}\,\dfrac{\theta^{N_0}\ell^{N_0-1}e^{-\theta\ell}}{\Gamma(N_0)}, & 0 < \ell < L_0,\; n = N_0,\end{cases}$$

where

$$P(N = N_0) = \int_0^{L_0}\frac{\theta^{N_0}\ell^{N_0-1}e^{-\theta\ell}}{\Gamma(N_0)}\,d\ell.$$

Proof: First we will consider the case when $n$ [...] Now in the combined sampling approach, we will see $N = N_0$ animals if and only if the distance $L = \sum_{i=1}^{N_0} T_i$ [...]

Theorem 6. Under the assumptions stated in Section 2.2.1, conditional on $N = n > 0$, the random variables $Y_1, Y_2, \ldots, Y_n$ and $L_N$ are independent.

Proof: First consider the case $N = N_0$. Let $\ell \ge 0$ and $y_i \ge 0$ for $i = 1, 2, \ldots, N_0$. We want to show that [...] Now consider the case $N = n$ [...]

[...]

$$f_Y(y) = \lambda e^{-\lambda y}, \qquad y > 0,\; \lambda > 0.$$

Consequently, conditional on $N = n > 0$, the random variable $\sum_{i=1}^{n} Y_i$ has a Gamma distribution with parameters $n$ and $\lambda$.

Proof: The case $N = n$ [...]

We are now ready to determine the expected value of $\hat{D}_{CP}$ given $N = n$. For $n = 0, 1, 2$,

$$E(\hat{D}_{CP} \mid N = n) = E(\hat{D}_g \mid N = n) = 0. \tag{3.10}$$

Next consider the values 3 [...]

We can now evaluate the expected value of $\hat{D}_{CP}$. Using Theorem 4 and expressions (2.5), (3.10), (3.11), and (3.12), we find that

$$E(\hat{D}_{CP}) = E_N E(\hat{D}_{CP} \mid N) = \sum_{n=3}^{N_0-1}\frac{n\lambda}{2L_0}\,P(N = n) + \frac{\theta\lambda}{2}\sum_{n=N_0-1}^{\infty}\frac{e^{-\mu}\mu^n}{n!} = D\left[1 - e^{-\mu}(1+\mu)\right], \tag{3.13}$$

where $\mu = \theta L_0$. Thus $\hat{D}_{CP}$ is a biased estimate for the density. Note that the bias here is equal to the bias for the modified estimate, $\hat{D}_g$, in direct sampling. This is as expected since in the combined sampling procedure, we are simply choosing the estimate that corresponds to the reason for terminating sampling. If we stop sampling after seeing the $N_0$th animal, then the inverse sampling estimate is used, and, likewise, if sampling stops after travelling the distance $L_0$, then the direct sampling estimate is used.

3.4 Variance of $\hat{D}_{CP}$

An expression for the variance of $\hat{D}_{CP}$ can be found directly using the formula

$$\mathrm{Var}(\hat{D}_{CP}) = E(\hat{D}_{CP}^2) - \left[E(\hat{D}_{CP})\right]^2. \tag{3.14}$$

In the preceding section we derived $E(\hat{D}_{CP})$, so that our problem reduces to evaluating $E(\hat{D}_{CP}^2)$. Proceeding along the same lines as in Section 3.3, we quickly find

$$E(\hat{D}_{CP}^2 \mid N = n,\; n = 0, 1, 2) = 0,$$

[...]

$$E(\hat{D}_{CP}^2) = D^2\left[\sum_{n=1}^{N_0-3}\frac{(n+2)}{n}\,\frac{\mu^n e^{-\mu}}{n!} + \left(\frac{N_0-1}{N_0-2}\right)^2\sum_{n=N_0-2}^{\infty}\frac{e^{-\mu}\mu^n}{n!}\right]. \tag{3.18}$$

An expression for the variance of $\hat{D}_{CP}$ is now evident. Using (3.13), (3.14), and (3.18), we get

$$\mathrm{Var}(\hat{D}_{CP}) = D^2\left\{\sum_{n=1}^{N_0-3}\frac{(n+2)}{n}\,\frac{\mu^n e^{-\mu}}{n!} + \left(\frac{N_0-1}{N_0-2}\right)^2\sum_{n=N_0-2}^{\infty}\frac{e^{-\mu}\mu^n}{n!} - \left[1 - e^{-\mu}(1+\mu)\right]^2\right\}, \tag{3.19}$$

where $\mu = \theta L_0$. Note that

$$\lim_{L_0\to\infty}\mathrm{Var}(\hat{D}_{CP}) = D^2\left[\left(\frac{N_0-1}{N_0-2}\right)^2 - 1\right]$$

and

$$\lim_{N_0\to\infty}\mathrm{Var}(\hat{D}_{CP}) = D^2\left\{\sum_{n=1}^{\infty}\frac{(n+2)}{n}\,\frac{\mu^n e^{-\mu}}{n!} - \left[1 - e^{-\mu}(1+\mu)\right]^2\right\}.$$

After some simple algebraic manipulations and using the relationship $D = \theta\lambda/2$, one can easily show that the limit as $L_0 \to \infty$ and the limit as $N_0 \to \infty$ are equal to the $\mathrm{Var}(\hat{D}_u)$ given in (2.13) and the $\mathrm{Var}(\hat{D}_g)$ given in (3.6), respectively. These limiting values are as expected, since letting $L_0 \to \infty$ in the combined sampling approach is equivalent to using inverse sampling, while letting $N_0 \to \infty$ is equivalent to using direct sampling.

We will now show that $\mathrm{Var}(\hat{D}_{CP})$ can be expressed as a function of both $\mathrm{Var}(\hat{D}_u)$ and $\mathrm{Var}(\hat{D}_g)$ given in (2.13) and (3.6), respectively.
Writing the equation in this form will then lead directly to an approximation for $\mathrm{Var}(\hat{D}_{CP})$.

First note that (3.19) can be rewritten as

$$\frac{\mathrm{Var}(\hat{D}_{CP})}{D^2} = \sum_{n=1}^{N_0-3}\frac{(n+2)}{n}\,\frac{\mu^n e^{-\mu}}{n!} + \left(\frac{N_0-1}{N_0-2}\right)^2\sum_{n=N_0-2}^{\infty}\frac{e^{-\mu}\mu^n}{n!} - \left[1 - e^{-\mu}(1+\mu)\right]^2. \tag{3.20}$$

Adding and subtracting the terms

$$\sum_{n=N_0-2}^{\infty}\frac{(n+2)}{n}\,\frac{\mu^n e^{-\mu}}{n!} \qquad\text{and}\qquad \sum_{n=N_0-2}^{\infty}\frac{e^{-\mu}\mu^n}{n!}$$

to the right hand side of (3.20) yields

$$\frac{\mathrm{Var}(\hat{D}_{CP})}{D^2} = \sum_{n=1}^{\infty}\frac{(n+2)}{n}\,\frac{\mu^n e^{-\mu}}{n!} - \left[1 - e^{-\mu}(1+\mu)\right]^2 + \left[\left(\frac{N_0-1}{N_0-2}\right)^2 - 1\right]\sum_{n=N_0-2}^{\infty}\frac{e^{-\mu}\mu^n}{n!} + \sum_{n=N_0-2}^{\infty}\frac{e^{-\mu}\mu^n}{n!} - \sum_{n=N_0-2}^{\infty}\frac{(n+2)}{n}\,\frac{\mu^n e^{-\mu}}{n!}$$

$$= \sum_{n=1}^{\infty}\frac{(n+2)}{n}\,\frac{\mu^n e^{-\mu}}{n!} - \left[1 - e^{-\mu}(1+\mu)\right]^2 + \frac{2N_0-3}{(N_0-2)^2}\sum_{n=N_0-2}^{\infty}\frac{e^{-\mu}\mu^n}{n!} - \sum_{n=N_0-2}^{\infty}\frac{2}{n}\,\frac{\mu^n e^{-\mu}}{n!}.$$

Now, after multiplying both sides of the previous expression by $D^2$, using the relationship $D = \theta\lambda/2$ and substituting the expressions for $\mathrm{Var}(\hat{D}_u)$ and $\mathrm{Var}(\hat{D}_g)$ given in (2.13) and (3.6), respectively, we see that

$$\mathrm{Var}(\hat{D}_{CP}) = \left[\sum_{n=N_0-2}^{\infty}\frac{e^{-\mu}\mu^n}{n!}\right]\mathrm{Var}(\hat{D}_u) + \mathrm{Var}(\hat{D}_g) - D^2\sum_{n=N_0-2}^{\infty}\frac{2}{n}\,\frac{\mu^n e^{-\mu}}{n!}. \tag{3.21}$$

Therefore, an approximation for $\mathrm{Var}(\hat{D}_{CP})$ can be simply obtained by using the approximation for $\mathrm{Var}(\hat{D}_g)$ given in (3.8).

3.5 Maximum Likelihood Justification for $\hat{D}_{CP}$

In Section 3.1 we stated that the estimate $\hat{D}_{CP}$ [...] where $\hat{\theta}$ and $\hat{\lambda}$ are maximum likelihood estimates for $\theta$ and $\lambda$, respectively. Finding maximum likelihood estimates for $\theta$ and $\lambda$ is now straightforward. Taking the natural logarithm of the likelihood function and setting the partial derivatives with respect to $\theta$ and $\lambda$ equal to 0 yields, for $n > 0$, [...]

CHAPTER IV
DENSITY ESTIMATION FOR CLUSTERED POPULATIONS

4.1 Introduction

The estimation procedures developed in Chapters II and III are based on the assumption that the sightings of animals are independent events. These methods would be applicable to animal populations that are generally made up of solitary individuals, such as ruffed grouse, snowshoe hare and gila monster. However, there are other types of animals which aggregate into coveys, schools and other tight groups. Animals behaving in this way will be said to belong to clustered populations.
Some examples of clustered populations are bobwhite quail, gray partridge and porpoise. In these cases the assumption of independent sightings is certainly not valid, and a different procedure would have to be used.

The line transect method could be easily generalized to provide estimates for clustered populations. As noted by Anderson et al. (1976, p. 12), if we amend the assumptions in Section 2.2.1 so that they refer to clusters of animals rather than individual animals, then the results of Chapters II and III are directly applicable to the estimation of the cluster density, $D_c$. The estimate for $D_c$ will be based on the right angle distances to the clusters from the random line transect. In the case where the number of animals in every sighted cluster can be determined without error, an estimate for the population density $D$ is given by

$$\hat{D} = \hat{D}_c\,\bar{s},$$

where $\hat{D}_c$ is the estimate for $D_c$ and $\bar{s}$ is the average size of the observed clusters.

Some criticisms of the approach outlined in the preceding paragraph are possible. First of all, it may not be possible to determine the distance to a cluster as easily (or as accurately) as the distance to an animal. How will this distance be defined? Secondly, the simple modification of the assumptions in Section 2.2.1, obtained by replacing the word "animal" by the word "cluster," would imply that the probability of sighting a cluster depends only on its right angle distance from the line. This may not be a reasonable assumption since the probability of sighting a larger cluster is likely to be greater than the probability of sighting a smaller cluster. Finally, the sighting of a cluster may not necessarily mean that all of the animals comprising the cluster are seen and counted by the observer. In this case, a more reasonable assumption would be to let the probability of sighting an animal belonging to a cluster depend on the distance to the cluster as well as the true cluster size.
In this chapter we shall propose a density estimate for a clustered population by assuming, among other things, that it is possible to determine the distance to the center of the cluster from the line transect. An estimation procedure will then be developed using a model in which the observer's count of the number of animals in a cluster is regarded as a random variable with a probability distribution depending upon the right angle distance and the size of the cluster.

4.2 Assumptions

The density estimate that we will develop is based on the inverse sampling approach outlined in Section 2.1, with one minor modification. In clustered populations the plan is to continue sampling along a randomly placed transect until a prespecified number, $N_c$, of clusters (rather than animals) are seen. The following information is recorded:

1. the right angle distance, $y$, from the transect to the center of each sighted cluster;
2. the observed number of animals, $s$, in each sighted cluster (this may be less than the true size of the cluster);
3. the actual distance, $\ell$, travelled by the observer to sight $N_c$ clusters.

The sampling procedure described above may be used to construct an estimate of the population density under the following set of assumptions. These assumptions closely parallel those of Section 2.2.1 with the exception that they are now phrased in terms of clusters rather than individual animals.

B1. The clusters are randomly distributed with rate (density) $D_c$ over the area of interest, $A$.

B2. The clusters are independently distributed over $A$, i.e., given two disjoint regions of area, $\delta A_1$ and $\delta A_2$,

$$P(n_1 \text{ clusters are in } \delta A_1 \text{ and } n_2 \text{ clusters are in } \delta A_2) = P(n_1 \text{ clusters are in } \delta A_1)\,P(n_2 \text{ clusters are in } \delta A_2).$$

B3. Clusters are fixed, i.e., there is no confusion over clusters moving during sampling and none are counted twice.

B4. There exists a probability mass function $p(\cdot)$ defined on the set of positive integers, such that $p(r)$ is the probability that $r$ is the true size of a cluster located at a right angle distance, $y$, from the transect. Note that $p(r)$ is independent of $y$. In probability notation, if $R$ and $Y$ denote the random variables representing the true cluster size and the right angle distance to the cluster, respectively, then

$$P(R = r \mid Y = y) = p(r), \qquad r = 1, 2, \ldots \tag{4.1}$$

B5. The probability of observing a cluster depends only on the size of the cluster and the distance from the transect to the cluster.

B6. There exists a non-negative function $h(\cdot)$ defined on $[0, \infty)$ such that $0 \le h(\cdot) \le 1$, $h(0) = 1$, and the probability of observing $s$ animals belonging to a cluster of size $r \ge s$ located at a right angle distance $y$ from the transect is $\binom{r}{s}[h(y)]^s[1-h(y)]^{r-s}$. That is, if $Y$ and $S$ denote the random variables representing the right angle distance to a cluster and the observed number of animals in a cluster, respectively, then

$$P(S = s \mid R = r, Y = y) = \binom{r}{s}[h(y)]^s[1-h(y)]^{r-s}. \tag{4.2}$$

Closer examination of assumption B6 shows that we are now allowing the probability of observing a cluster to depend on both the right angle distance, $y$, and the true cluster size, $r$. To see this, first let $C$ be the event that a cluster is observed. Then the probability of observing a cluster of size $r$ located at distance $y$ from the transect is

$$P(C \mid R = r, Y = y) = \sum_{s=1}^{r}P(S = s \mid R = r, Y = y) = 1 - P(S = 0 \mid R = r, Y = y) = 1 - [1-h(y)]^r, \tag{4.3}$$

which clearly depends on both $y$ and $r$.

The assumption B6 also satisfies the reasonable requirement that for a fixed right angle distance $y > 0$ and $r_1 < r_2$,

$$P(C \mid R = r_1, Y = y) < P(C \mid R = r_2, Y = y).$$

This follows immediately from equation (4.3). Note that

$$P(C \mid R = r_1, Y = y) = 1 - [1-h(y)]^{r_1},$$

and

$$P(C \mid R = r_2, Y = y) = 1 - [1-h(y)]^{r_2}.$$

Now, since [...] the probability of sighting a cluster located at a right angle distance $y$ is simply $h(y)$. This is quickly seen by setting $r = 1$ in (4.3).
Thus, under these circumstances, $h(y)$ has the same interpretation as $g(y)$ defined in Section 2.2.1, that is, $h(y)$ is the conditional probability of sighting an animal at distance $y$ given there is an animal at $y$.

4.3 General Form of the Likelihood Function

We will use the maximum likelihood procedure to obtain an estimate for $D$, the animal population density. To obtain the likelihood function, we first need an expression for the probability density function $f_{\mathbf{S},\mathbf{Y},L}(\mathbf{s},\mathbf{y},\ell)$, where $\mathbf{S} = (S_1, S_2, \ldots, S_{N_c})$ is the vector of random variables representing the actual number of animals seen in the clusters, $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_{N_c})$ is the vector of random variables representing the right angle distances from the clusters to the transect and $L$ is the random variable representing the total length travelled on the transect to see $N_c$ clusters. Upon writing

$$f_{\mathbf{S},\mathbf{Y},L}(\mathbf{s},\mathbf{y},\ell) = P(\mathbf{S} = \mathbf{s} \mid \mathbf{Y} = \mathbf{y}, L = \ell)\,f_{\mathbf{Y}\mid L}(\mathbf{y} \mid \ell)\,f_L(\ell), \tag{4.4}$$

it is seen that specifying the joint probability density function for $\mathbf{S}$, $\mathbf{Y}$ and $L$ is equivalent to specifying the three functions on the right hand side of (4.4).

The density functions $f_{\mathbf{Y}\mid L}(\mathbf{y} \mid \ell)$ and $f_L(\ell)$ can be derived in a manner analogous to that used in Section 2.2. Let $g_c(y)$ denote the probability of sighting a cluster located at a right angle distance $y$ from the transect, that is,

$$g_c(y) = P(\text{observe a cluster} \mid Y = y).$$

Since sighting a cluster located at a distance $y$ is equivalent to observing at least one animal belonging to the cluster, we can write

$$g_c(y) = \sum_{s=1}^{\infty}P(\text{observe } s \text{ animals} \mid Y = y) = \sum_{s=1}^{\infty}P(S = s \mid Y = y).$$

Now, for $s \ge 1$,

$$P(S = s \mid Y = y) = \sum_{r=s}^{\infty}P(S = s \mid R = r, Y = y)\,P(R = r \mid Y = y).$$

By assumption B4, it follows that $Y$ and $R$ are independent random variables. Thus, using (4.1) and (4.2) we get

$$P(S = s \mid Y = y) = \sum_{r=s}^{\infty}\binom{r}{s}[h(y)]^s[1-h(y)]^{r-s}\,p(r).$$

Therefore,

$$g_c(y) = \sum_{s=1}^{\infty}\sum_{r=s}^{\infty}\binom{r}{s}[h(y)]^s[1-h(y)]^{r-s}\,p(r). \tag{4.5}$$

Now, according to assumption B6, $h(0) = 1$, so that

$$g_c(0) = \sum_{s=1}^{\infty}p(s) = 1.$$

Therefore, the function $g_c(y)$ plays a role similar to the role of $g(y)$ in Section 2.2.
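The double sum in (4.5) can be checked numerically. The sketch below assumes, purely for illustration, the truncated Poisson $p(r)$ and the exponential $h(y)$ that the text adopts later in (4.13) and (4.14); the function names, the truncation point `r_max`, and the parameter values are hypothetical. For these forms the sum collapses to the closed expression that the text derives as (4.17), which the sketch uses as a cross-check.

```python
import math


def p_truncated_poisson(r, alpha):
    """Zero-truncated Poisson mass function, the form of (4.13)."""
    return math.exp(-alpha) * alpha ** r / (
        math.factorial(r) * (1.0 - math.exp(-alpha))
    )


def g_c_double_sum(y, alpha, lam, r_max=60):
    """Evaluate (4.5) directly, truncating both sums at r_max (an assumption)."""
    h = math.exp(-lam * y)  # h(y) of the form (4.14)
    total = 0.0
    for s in range(1, r_max + 1):
        for r in range(s, r_max + 1):
            total += (
                math.comb(r, s)
                * h ** s
                * (1.0 - h) ** (r - s)
                * p_truncated_poisson(r, alpha)
            )
    return total


def g_c_closed_form(y, alpha, lam):
    """Closed form of g_c(y) for these choices, as in (4.17)."""
    h = math.exp(-lam * y)
    return (1.0 - math.exp(-alpha * h)) / (1.0 - math.exp(-alpha))


if __name__ == "__main__":
    for y in (0.0, 0.5, 1.0, 2.0):
        print(y, g_c_double_sum(y, 3.0, 0.5), g_c_closed_form(y, 3.0, 0.5))
```

At $y = 0$ both evaluations return 1, in agreement with $g_c(0) = 1$, and the values decrease as $y$ grows, as one expects of a sighting probability.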
Consequently, by regarding a "cluster" as an "animal," the results of Section 2.2 can be applied to clustered populations in a straightforward manner. Let $N_c(\ell)$ denote the random variable representing the number of clusters seen when travelling a distance $\ell$ on the transect. Then, by Theorem 1, $N_c(\ell)$ is a Poisson process with parameter $\theta^*$, where $\theta^*\ell$ is the expected number of clusters seen when travelling along a transect of length $\ell$. Also, from Theorem 1 we see that the respective analogs to equations (2.2) and (2.1) are

$$D_c = \frac{\theta^*}{2c^*}, \tag{4.6}$$

where $D_c$ is the density of clusters, and

$$c^* = \int_0^\infty g_c(y)\,dy. \tag{4.7}$$

Furthermore, Theorem 2 gives us the results that $L$ and $\mathbf{Y}$ are mutually independent random variables, $L$ is distributed as a Gamma random variable with parameters $N_c$ and $\theta^*$, and the conditional density of $\mathbf{Y}$ given $L = \ell$ is

$$f_{\mathbf{Y}\mid L}(\mathbf{y} \mid \ell) = f_{\mathbf{Y}}(\mathbf{y}) = \frac{\prod_{i=1}^{N_c} g_c(y_i)}{(c^*)^{N_c}}. \tag{4.8}$$

Now, assumption B5 implies that the number of animals actually observed in a cluster depends only on the right angle distance to the cluster, $Y$, and the size of the cluster, $R$. Thus, $\mathbf{S}$ is independent of $L$, and since $\mathbf{Y}$ is also independent of $L$ it follows that

$$P(\mathbf{S} = \mathbf{s} \mid \mathbf{Y} = \mathbf{y}, L = \ell) = \prod_{i=1}^{N_c}P(S_i = s_i \mid Y_i = y_i). \tag{4.9}$$

We can now write an expression for the likelihood function $L(\theta^*, p(\cdot), h(\cdot); \mathbf{s}, \mathbf{y}, \ell)$. Using (4.4), (4.8) and (4.9) and recalling that $L$ has a Gamma distribution with parameters $N_c$ and $\theta^*$, we obtain

$$L(\theta^*, p(\cdot), h(\cdot); \mathbf{s}, \mathbf{y}, \ell) = \prod_{i=1}^{N_c}P(S_i = s_i \mid Y_i = y_i)\;\frac{\prod_{i=1}^{N_c} g_c(y_i)\,(\theta^*)^{N_c}\,\ell^{N_c-1}\,e^{-\theta^*\ell}}{(c^*)^{N_c}\,\Gamma(N_c)}. \tag{4.10}$$

4.4 Estimation of D when $p(\cdot)$ and $h(\cdot)$ Have Specific Forms

For a clustered population with a cluster density, $D_c$, the animal population density may be defined as

$$D = D_c\,\nu,$$

where $\nu = E(R)$ is the expected cluster size. Upon using the expression for $D_c$ given in (4.6), we get

$$D = \frac{\theta^*\nu}{2c^*}, \tag{4.11}$$

so that maximum likelihood estimation of $D$ can be carried out by using (4.11) in the likelihood function presented in (4.10).

Since the random variables $\mathbf{S}$ and $\mathbf{Y}$ are independent of $L$, it is easily seen from (4.10) that the maximum likelihood estimate of $\theta^*$ corrected for bias is

$$\hat{\theta}^* = \frac{N_c-1}{L}. \tag{4.12}$$

However, finding estimates of $\nu$ and $c^*$ can be quite difficult depending upon the nature of the functions $p(\cdot)$ and $h(\cdot)$. Very likely, one has to resort to some iterative technique such as the Newton-Raphson method (see Korn and Korn, 1968, eqn. (20.2-31)) to solve the likelihood equation.

It is apparent that there exist a wide variety of functions which satisfy the requirements of $p(\cdot)$ and $h(\cdot)$. The appropriate choice in a particular problem depends on the nature of the population under investigation. In this work we will consider the functions

$$p(r) = \frac{e^{-\alpha}\alpha^r}{r!\,(1-e^{-\alpha})}, \qquad \alpha > 0,\; r = 1, 2, \ldots, \tag{4.13}$$

and

$$h(y) = e^{-\lambda^* y}, \qquad \lambda^* > 0,\; y > 0. \tag{4.14}$$

It is easily seen that $p(\cdot)$ given by (4.13) represents a truncated Poisson distribution. The expected cluster size $\nu$ is therefore given by

$$\nu = \frac{\alpha}{1-e^{-\alpha}}. \tag{4.15}$$

The limiting case $\alpha = 0$ corresponds to a population in which the cluster size is 1 with probability 1. Thus, $\alpha = 0$ corresponds to the model in Section 2.2.

The choice for $h(\cdot)$ is based on the fact that when $\alpha = 0$, $h(\cdot)$ may be interpreted as the function $g(\cdot)$ defined in Chapter II. Because $g(y) = e^{-\lambda y}$ seems to be a popular choice for $g(\cdot)$, we feel that $h(y) = e^{-\lambda^* y}$ is a reasonable choice for $h(\cdot)$.

The likelihood function may now be regarded as a function of $\theta^*$, $\alpha$ and $\lambda^*$, and maximum likelihood estimation of $D = \theta^*\nu/(2c^*)$ may be accomplished by expressing $\nu$ and $c^*$ as functions of $\alpha$ and $\lambda^*$. We have already seen the form of $\nu$ in equation (4.15).
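To derive an expression for $c^*$ we proceed as follows. Recall from (4.5) that

$$g_c(y) = \sum_{s=1}^{\infty} P(S=s\mid Y=y),$$

where

$$P(S=s\mid Y=y) = \sum_{r=s}^{\infty}\binom{r}{s}[h(y)]^{s}[1-h(y)]^{r-s}\,p(r).$$

Now using (4.13) and (4.14) in the above equation, we get

$$
\begin{aligned}
P(S=s\mid Y=y) &= \sum_{r=s}^{\infty}\binom{r}{s}\,e^{-\lambda^* y s}\left(1-e^{-\lambda^* y}\right)^{r-s}\frac{e^{-\alpha}\alpha^{r}}{r!\,(1-e^{-\alpha})} \\
&= \frac{(\alpha e^{-\lambda^* y})^{s}\,e^{-\alpha}}{s!\,(1-e^{-\alpha})}\sum_{r=s}^{\infty}\frac{[\alpha(1-e^{-\lambda^* y})]^{r-s}}{(r-s)!} \\
&= \frac{(\alpha e^{-\lambda^* y})^{s}\,e^{-\alpha e^{-\lambda^* y}}}{s!\,(1-e^{-\alpha})}.
\end{aligned} \tag{4.16}
$$

Then, substituting for $P(S=s\mid Y=y)$, we get

$$g_c(y) = \frac{e^{-\alpha e^{-\lambda^* y}}}{1-e^{-\alpha}}\sum_{s=1}^{\infty}\frac{(\alpha e^{-\lambda^* y})^{s}}{s!} = \frac{1-e^{-\alpha e^{-\lambda^* y}}}{1-e^{-\alpha}}. \tag{4.17}$$

Therefore, using (4.7) and (4.17), we get

$$c^* = \int_0^\infty g_c(y)\,dy = \int_0^\infty \frac{1-e^{-\alpha e^{-\lambda^* y}}}{1-e^{-\alpha}}\,dy. \tag{4.18}$$

To evaluate $c^*$, note that

$$\int_0^\infty \left(1-e^{-\alpha e^{-\lambda^* y}}\right)dy = \lim_{x\to\infty}\left\{x - \int_0^x e^{-\alpha e^{-\lambda^* y}}\,dy\right\}. \tag{4.19}$$

By letting $t = e^{-\lambda^* y}$ in the integral on the right hand side of (4.19), we can show

$$\int_0^x e^{-\alpha e^{-\lambda^* y}}\,dy = \frac{1}{\lambda^*}\int_{e^{-\lambda^* x}}^{1}\frac{e^{-\alpha t}}{t}\,dt = x + \frac{1}{\lambda^*}\sum_{j=1}^{\infty}\frac{(-1)^{j}\alpha^{j}\left(1-e^{-\lambda^* x j}\right)}{j\cdot j!},$$

and upon substituting into (4.19), we get

$$\int_0^\infty \left(1-e^{-\alpha e^{-\lambda^* y}}\right)dy = \lim_{x\to\infty}\frac{1}{\lambda^*}\sum_{j=1}^{\infty}\frac{(-1)^{j+1}\alpha^{j}\left(1-e^{-\lambda^* x j}\right)}{j\cdot j!}.$$

Since the sum above is absolutely convergent, we can take the limit inside the sum and obtain

$$\int_0^\infty \left(1-e^{-\alpha e^{-\lambda^* y}}\right)dy = \frac{1}{\lambda^*}\sum_{j=1}^{\infty}\frac{(-1)^{j+1}\alpha^{j}}{j\cdot j!}.$$

Then, using (4.18), it follows that

$$c^* = \frac{a(\alpha)}{\lambda^*(1-e^{-\alpha})}, \tag{4.20}$$

where

$$a(\alpha) = \sum_{j=1}^{\infty}\frac{(-1)^{j+1}\alpha^{j}}{j\cdot j!}. \tag{4.21}$$

We can now write the likelihood function in terms of $\theta^*$, $\lambda^*$ and $\alpha$. Using equations (4.10), (4.16), (4.20) and (4.21), we get

$$
L(\theta^*,\lambda^*,\alpha;\mathbf{s},\mathbf{y},\ell) = \frac{\alpha^{\sum_{i=1}^{N_c}s_i}\;e^{-\lambda^*\sum_{i=1}^{N_c}y_i s_i}\;e^{-\alpha\sum_{i=1}^{N_c}e^{-\lambda^* y_i}}}{\prod_{i=1}^{N_c}s_i!\;(1-e^{-\alpha})^{N_c}}\cdot\frac{(\lambda^*)^{N_c}\prod_{i=1}^{N_c}\left(1-e^{-\alpha e^{-\lambda^* y_i}}\right)(\theta^*)^{N_c}\,\ell^{N_c-1}\,e^{-\theta^*\ell}}{[a(\alpha)]^{N_c}\;\Gamma(N_c)}. \tag{4.22}
$$

Using the likelihood function given in (4.22), we can now obtain an estimate for the population density $D$. Recall from (4.11) that we can write

$$D = \frac{\theta^*\nu}{2c^*}.$$

After substituting for $c^*$ and $\nu$ using equations (4.15), (4.20) and (4.21), the expression for $D$ becomes

$$D = \frac{\theta^*\lambda^*\alpha}{2a(\alpha)}.$$

Thus, an estimate for $D$ would be

$$\hat D = \frac{\hat\theta^*\hat\lambda^*\hat\alpha}{2a(\hat\alpha)},$$

where $\hat\theta^*$, $\hat\lambda^*$ and $\hat\alpha$ are maximum likelihood estimates for $\theta^*$, $\lambda^*$ and $\alpha$, respectively. As noted earlier in this section, $\mathbf{S}$ and $\mathbf{Y}$ are independent of $L$ so that $\theta^*$ can be estimated using equation (4.12). However, this still leaves us the problem of estimating $\lambda^*$ and $\alpha$. Instead of estimating $\lambda^*$ and $\alpha$ separately, we can reparameterize the likelihood equation in (4.22) by letting

$$\rho = \frac{\lambda^*\alpha}{a(\alpha)} \quad\text{and}\quad \alpha = \alpha.$$

Then, our estimate for $D$ becomes

$$\hat D = \frac{\hat\theta^*\hat\rho}{2}. \tag{4.23}$$

The advantage of this reparameterization is that it makes use of the fact that $L$ is independent of both $\mathbf{S}$ and $\mathbf{Y}$. Thus, the estimate for $D$ given in (4.23) is now the product of two independent estimates, $\hat\theta^*$ which depends on $L$ alone and $\hat\rho$ which depends on $\mathbf{S}$ and $\mathbf{Y}$. As a result the variance of $\hat D$ can now be found easily. Using the formula (see Goodman, 1960) for the variance of the product of two independent estimates, we get

$$\operatorname{Var}(\hat D) = \frac{1}{4}\left[E^2(\hat\theta^*)\operatorname{Var}(\hat\rho) + E^2(\hat\rho)\operatorname{Var}(\hat\theta^*) + \operatorname{Var}(\hat\theta^*)\operatorname{Var}(\hat\rho)\right]. \tag{4.24}$$

Since $L$ is distributed as a Gamma random variable with parameters $N_c$ and $\theta^*$, exact expressions for $E(\hat\theta^*)$ and $\operatorname{Var}(\hat\theta^*)$ can be obtained using (2.4) and (2.11), i.e.,

$$E(\hat\theta^*) = \theta^* \quad\text{and}\quad \operatorname{Var}(\hat\theta^*) = \frac{(\theta^*)^2}{N_c-2}. \tag{4.25}$$

Expressions for the variance and expected value of $\hat\rho$ can become quite complicated. An iterative scheme would be needed to find the solutions for $\rho$ and $\alpha$ that would maximize the reparameterized version of the likelihood function given in (4.22). There are computer programs available that can provide maximum likelihood estimates for $\rho$ and $\alpha$ along with numerical approximations for the variance-covariance matrix of the estimates. In the next section we will demonstrate the use of one such program with a set of hypothetical data.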
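The closed form (4.20) can be checked against a direct numerical integration of (4.17). The Python sketch below does this under stated assumptions: the function names, the midpoint rule, the truncation point of the integral and the parameter values $\alpha = 2$, $\lambda^* = 0.1$ are all illustrative choices and are not part of the original derivation.

```python
import math

def a_series(alpha: float, tol: float = 1e-15, max_terms: int = 200) -> float:
    """a(alpha) from (4.21): sum over j >= 1 of (-1)^(j+1) alpha^j / (j * j!).
    Adequate for moderate alpha; large alpha would suffer from cancellation."""
    total = 0.0
    term = 1.0  # running value of alpha^j / j!
    for j in range(1, max_terms + 1):
        term *= alpha / j
        contribution = term / j
        total += contribution if j % 2 == 1 else -contribution
        if contribution < tol:
            break
    return total

def g_c(y: float, alpha: float, lam: float) -> float:
    """g_c(y) from (4.17), written with expm1 for numerical accuracy."""
    return math.expm1(-alpha * math.exp(-lam * y)) / math.expm1(-alpha)

def c_star(alpha: float, lam: float) -> float:
    """Closed form (4.20)."""
    return a_series(alpha) / (lam * (1.0 - math.exp(-alpha)))

def c_star_numeric(alpha: float, lam: float, upper: float = 300.0, n: int = 100_000) -> float:
    """Midpoint-rule approximation to (4.18), truncated at `upper` (an assumption)."""
    h = upper / n
    return h * sum(g_c((k + 0.5) * h, alpha, lam) for k in range(n))

def density(theta: float, lam: float, alpha: float) -> float:
    """D = theta* lambda* alpha / (2 a(alpha)), the simplified form of (4.11)."""
    return theta * lam * alpha / (2.0 * a_series(alpha))

alpha, lam = 2.0, 0.1  # illustrative values only
print(c_star(alpha, lam), c_star_numeric(alpha, lam))
```

The two printed values should agree to several decimal places, and `density` should match $\theta^*\nu/(2c^*)$ computed from (4.15) and (4.20).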
4.5 A Worked Example

In this section we will present a worked example to demonstrate the use of a computer program to find the estimate $\hat D$ and its approximate variance. Because we are not aware of any real data that have been collected according to the sampling plan described in Section 4.2, we shall use an artificial set of data in the example.

Suppose that sampling was continued until $N_c = 25$ clusters were sighted, and that a transect length of $\ell = 25$ miles was needed to sight the 25 clusters. Suppose further that the observed right angle distances and the cluster sizes were as follows, where the first number in the pair is the right angle distance, $y$, measured in yards and the second number in the pair is the corresponding cluster size, $s$:

(1,1), (3,2), (7,1), (10,1), (2,3),
(5,5), (4,1), (7,2), (15,1), (22,1),
(6,1), (3,6), (2,1), (12,1), (28,3),
(9,2), (18,1), (36,7), (17,6), (5,1),
(4,1), (3,1), (8,2), (3,4), (13,1).

As noted in Section 4.4, an estimate for $\theta^*$ is

$$\hat\theta^* = \frac{N_c-1}{\ell} = \frac{24}{25} = .96,$$

and an estimate for the variance of $\hat\theta^*$ is

$$\widehat{\operatorname{Var}}(\hat\theta^*) = \frac{(\hat\theta^*)^2}{N_c-2} = .0401.$$

In order to estimate $\rho$, the reparameterized version of the likelihood function given in (4.22) will have to be maximized. The Fortran subroutine ZXMIN, found in IMSL (1979), may be used for this purpose. This program uses a quasi-Newton iterative procedure to find the minimum of a function. Thus, we first need to take the negative of the likelihood function before we can use this subroutine to our advantage. On output, this subroutine not only provides the values at which the function is minimized, but also provides numerical estimates for the second partial derivatives of the function evaluated at the minimization point. Thus, when used with the negative of the log-likelihood function this program will provide the maximum likelihood estimates, $\hat\rho$ and $\hat\alpha$, as well as the matrix of negative second partial derivatives of the log-likelihood, $\ln L(\cdot)$, evaluated at $\hat\rho$ and $\hat\alpha$.
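The first steps of the example can be retraced in a few lines. The sketch below is written in Python rather than the Fortran and IMSL setup described in the text, and every name in it (`DATA`, `a_series`, `neg_log_lik`) is hypothetical. The objective function is one reading of (4.22) after substituting $\lambda^* = \rho\,a(\alpha)/\alpha$ and dropping the factors that do not involve $\alpha$ or $\rho$; it is an assumption about what would be passed to a minimizer, not the author's program.

```python
import math

# (y, s) pairs from the worked example: y in yards, s = observed cluster size.
DATA = [(1, 1), (3, 2), (7, 1), (10, 1), (2, 3),
        (5, 5), (4, 1), (7, 2), (15, 1), (22, 1),
        (6, 1), (3, 6), (2, 1), (12, 1), (28, 3),
        (9, 2), (18, 1), (36, 7), (17, 6), (5, 1),
        (4, 1), (3, 1), (8, 2), (3, 4), (13, 1)]
N_C = len(DATA)   # 25 clusters
ELL = 25.0        # transect length in miles

theta_hat = (N_C - 1) / ELL                 # (4.12); 0.96 as in the text
var_theta_hat = theta_hat**2 / (N_C - 2)    # (4.25) with theta* replaced by its estimate

def a_series(alpha: float, max_terms: int = 200) -> float:
    """a(alpha) from (4.21)."""
    total, term = 0.0, 1.0
    for j in range(1, max_terms + 1):
        term *= alpha / j               # alpha^j / j!
        total += term / j if j % 2 == 1 else -term / j
    return total

def neg_log_lik(alpha: float, rho: float, data=DATA) -> float:
    """Negative log of the (alpha, rho) part of (4.22); constants are dropped."""
    n = len(data)
    a = a_series(alpha)
    lam = rho * a / alpha               # back-transform to lambda*
    log_lik = n * math.log(lam) - n * math.log(a) - n * math.log(-math.expm1(-alpha))
    for y, s in data:
        u = alpha * math.exp(-lam * y)
        log_lik += s * math.log(alpha) - lam * y * s - u + math.log(-math.expm1(-u))
    return -log_lik

print(theta_hat, round(var_theta_hat, 4))
print(neg_log_lik(2.0, 0.1))  # objective at an arbitrary illustrative point
```

A function of this shape could be handed to any general-purpose minimizer; no claim is made here that doing so reproduces the reported estimates digit for digit.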
We will denote the matrix of negative second partial derivatives by

$$V^{-1} = -\begin{pmatrix} \dfrac{\partial^2\ln L(\cdot)}{\partial\alpha^2} & \dfrac{\partial^2\ln L(\cdot)}{\partial\alpha\,\partial\rho} \\[2ex] \dfrac{\partial^2\ln L(\cdot)}{\partial\rho\,\partial\alpha} & \dfrac{\partial^2\ln L(\cdot)}{\partial\rho^2} \end{pmatrix}_{\alpha=\hat\alpha,\ \rho=\hat\rho}.$$

For our data, the use of the subroutine ZXMIN with initial values $\alpha_1 = 2.24$ and $\rho_1 = .16$ yielded $\hat\alpha = 2.844$, $\hat\rho = .0907$, and

$$V^{-1} = \begin{pmatrix} 7.687 & -161.229 \\ -161.229 & 5098.985 \end{pmatrix}.$$

The initial value used for $\alpha$ was the mean of the observed cluster sizes, i.e.,

$$\alpha_1 = \bar s = \frac{1}{25}\sum_{i=1}^{25} s_i = 2.24.$$

Since our model does not assume all animals belonging to a cluster are seen, $\bar s$ would underestimate the expected cluster size, i.e.,

$$\bar s < E(R) = \frac{\alpha}{1-e^{-\alpha}}.$$

Thus, $\bar s$ seems to be a good starting value for $\alpha$. In choosing an initial value for $\rho$, first recall that

$$\rho = \frac{\lambda^*\alpha}{a(\alpha)},$$

where $a(\alpha)$ is given in equation (4.21). Since our initial value for $\alpha$ is $\bar s$, all we need is a starting value for $\lambda^*$. If every animal in the cluster was seen with probability 1, the density of clusters would be estimated by the method described in Chapter II. In this case, the maximum likelihood estimate for $\lambda^*$ would be $1/\bar y$ where

$$\bar y = \frac{1}{25}\sum_{i=1}^{25} y_i.$$

Thus, as the initial value for $\rho$ we used

$$\rho_1 = \frac{\bar s}{\bar y\,a(\bar s)}.$$

The estimate for the density can now be calculated. Using (4.23), substituting the values we obtained for $\hat\theta^*$ and $\hat\rho$, and converting the right angle distances from yards to miles (1 mile = 1760 yards), we get

$$\hat D = 76.7 \text{ animals/square mile}.$$

Now if we can obtain a large sample approximation for the variance of $\hat\rho$, then we can use (4.24) as an approximation for the variance of $\hat D$. Under the usual regularity conditions, $V^{-1}$ will be a large sample approximation to the inverse of the variance-covariance matrix of $\hat\alpha$ and $\hat\rho$. Furthermore, the approximate variance of $\hat D$ can be obtained from equation (4.24) after substituting the element of the matrix $V$ corresponding to the approximate variance of $\hat\rho$ along with the other appropriate quantities. Straightforward calculations show that

$$\sqrt{\operatorname{Var}(\hat D)} \approx 26.2 \text{ animals/square mile}.$$

The use of this Fortran subroutine required a minimal amount of programming to enter the appropriate likelihood function.
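The final arithmetic of the example can be retraced from the reported quantities. The following Python sketch inverts the reported $2\times 2$ matrix, applies (4.23) and (4.24) with the expectations replaced by the estimates, and converts from yards to miles. The variable names and the helper `invert_2x2` are illustrative assumptions; the inputs are the rounded values printed in the text, so the results can be expected to agree with the quoted 76.7 and 26.2 only up to rounding.

```python
import math

YARDS_PER_MILE = 1760.0

theta_hat = 0.96          # (4.12) with N_c = 25 and ell = 25 miles
var_theta_hat = 0.0401    # estimated variance of theta_hat quoted in the text
rho_hat = 0.0907          # reported estimate of rho (per yard)

# Reported matrix of negative second partial derivatives, ordered (alpha, rho).
V_INV = [[7.687, -161.229],
         [-161.229, 5098.985]]

def invert_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

V = invert_2x2(V_INV)
var_rho_hat = V[1][1]     # large-sample variance of rho_hat

# (4.23): theta_hat is per mile and rho_hat is per yard,
# so d_hat is in animals per (mile * yard) before conversion.
d_hat = theta_hat * rho_hat / 2.0

# (4.24), with E(theta_hat) and E(rho_hat) replaced by the estimates.
var_d_hat = 0.25 * (theta_hat**2 * var_rho_hat
                    + rho_hat**2 * var_theta_hat
                    + var_theta_hat * var_rho_hat)

d_hat_sq_mile = d_hat * YARDS_PER_MILE
sd_d_hat_sq_mile = math.sqrt(var_d_hat) * YARDS_PER_MILE
print(d_hat_sq_mile, sd_d_hat_sq_mile)
```

Reading the variance of $\hat\rho$ from the lower-right element of $V$ assumes the matrix is ordered as $(\alpha, \rho)$, which is the ordering suggested by the displayed second derivatives.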
The subroutine was run using the computer facilities of the Northeast Regional Data Center located in Gainesville, Florida. Less than two seconds of CPU time was needed for the estimates to converge to values that agreed to four significant digits on two successive iterations.

BIBLIOGRAPHY

Anderson, D. R., Laake, J. L., Crain, B. R., and Burnham, K. P. (1976), Guidelines for Line Transect Sampling of Biological Populations, Logan: Utah Cooperative Wildlife Research Unit.

Anderson, D. R., and Pospahala, R. S. (1970), "Correction of Bias in Belt Transect Studies of Immotile Objects," Journal of Wildlife Management, 34, 141-146.

Barr, A. J., Goodnight, J. H., Sall, J. P., and Helwig, J. T. (1976), A User's Guide to SAS 76, Raleigh: SAS Institute.

Bhat, U. N. (1972), Elements of Applied Stochastic Processes, New York: John Wiley & Sons.

Burnham, K. P., and Anderson, D. R. (1976), "Mathematical Models for Nonparametric Inferences from Line Transect Data," Biometrics, 32, 325-336.

Crain, B. R., Burnham, K. P., Anderson, D. R., and Laake, J. L. (1978), A Fourier Series Estimator of Population Density for Line Transect Sampling, Logan: Utah State University Press.

Eberhardt, L. L. (1968), "A Preliminary Appraisal of Line Transects," Journal of Wildlife Management, 32, 82-88.

Gates, C. E., Marshall, W. H., and Olson, D. P. (1968), "Line Transect Method of Estimating Grouse Population Densities," Biometrics, 24, 135-145.

Goodman, L. A. (1960), "On the Exact Variance of Products," Journal of the American Statistical Association, 55, 708-713.

Hayne, D. W. (1949), "An Examination of the Strip Census Method for Estimating Animal Populations," Journal of Wildlife Management, 13, 145-157.

IMSL (1979), The IMSL Library, Seventh ed., Vol. 3, Houston: International Mathematical and Statistical Libraries, Inc.

Korn, G. A., and Korn, T. M. (1968), Mathematical Handbook for Scientists and Engineers, Second ed., New York: McGraw-Hill.

Leopold, A. (1933), Game Management, New York: Charles Scribner's Sons.

Lindgren, B. W. (1968), Statistical Theory, Second ed., New York: Macmillan.

Loftsgaarden, D. O., and Quesenberry, C. P. (1965), "A Nonparametric Estimate of a Multivariate Density Function," Annals of Mathematical Statistics, 36, 1049-1051.

Pielou, E. C. (1969), An Introduction to Mathematical Ecology, New York: John Wiley & Sons.

Pollock, K. H. (1978), "A Family of Density Estimators for Line Transect Sampling," Biometrics, 34, 475-478.

Robinette, W. L., Jones, D. A., Gashwiler, J. S., and Aldous, C. M. (1954), "Methods for Censusing Winter-Lost Deer," North American Wildlife Conference Transactions, 19, 511-524.

Robinette, W. L., Loveless, C. M., and Jones, D. A. (1974), "Field Tests of Strip Census Methods," Journal of Wildlife Management, 38, 81-96.

Seber, G. A. F. (1973), The Estimation of Animal Abundance and Related Parameters, London: Griffin.

Sen, A. R., Tournigny, J., and Smith, G. E. J. (1974), "On the Line Transect Sampling Method," Biometrics, 30, 329-340.

Smith, M. H., Gardner, R. H., Gentry, J. B., Kaufman, D. W., and O'Farrel, M. H. (1975), Small Mammals: Their Productivity and Population Dynamics, International Biological Program.

Webb, W. L. (1942), "Notes on a Method of Censusing Snowshoe Hare Populations," Journal of Wildlife Management, 6, 67-69.

BIOGRAPHICAL SKETCH

John Anthony Ondrasik was born on August 17, 1951, in New Brunswick, New Jersey. Shortly thereafter his parents moved to Palmerton, Pennsylvania, where he grew up and attended high school. After graduation in June, 1969, he entered Bucknell University in Lewisburg, Pennsylvania, and received the degree of Bachelor of Science with a major in mathematics in June, 1973. It was during his studies at Bucknell that he became interested in statistics through the influence of the late Professor Paul Benson.
In September, 1973, he matriculated in the graduate school at the University of Florida and received the degree Master of Statistics in 1975. While pursuing his graduate studies, he worked for the Department of Statistics as an assistant in their biostatistics consulting unit. In November, 1978, he accepted the position of biostatistician with Boehringer Ingelheim, Ltd.

John Ondrasik is married to the former Anntoinette M. Lucia. Currently they reside in Danbury, Connecticut.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Pejaver V. Rao, Chairman
Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Dennis D. Wackerly
Associate Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Richard L. Scheaffer
Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Ramon C. Littell
Associate Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Wayne R. Marion
Assistant Professor of Forest Resources and Conservation

This dissertation was submitted to the Graduate Faculty of the Department of Statistics in the College of Liberal Arts and Sciences and to the Graduate Council, and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

December 1979

Dean, Graduate School