
Citation 
 Permanent Link:
 https://ufdc.ufl.edu/UF00097680/00001
Material Information
 Title:
 On some problems of estimation and prediction for nonstationary time series
 Added title page title:
 Prediction for nonstationary time series
 Creator:
 McClave, James Thomas, 1944 ( Dissertant )
Chanda, Kamal C. ( Thesis advisor )
Saw, John G. ( Thesis advisor )
Rao, Pejaver V. ( Reviewer )
Pop-Stojanovic, Zoran R. ( Reviewer )
 Place of Publication:
 Gainesville, Fla.
 Publisher:
 University of Florida
 Publication Date:
 1971
 Copyright Date:
 1971
 Language:
 English
 Physical Description:
 viii, 103 leaves. : ill. ; 28 cm.
Subjects
 Subjects / Keywords:
 Approximation ( jstor )
Estimation methods ( jstor )
Estimators ( jstor )
Logistic growth ( jstor )
Logistics ( jstor )
Maximum likelihood estimates ( jstor )
Maximum likelihood estimations ( jstor )
Population estimates ( jstor )
Statistical estimation ( jstor )
Time series ( jstor )
Dissertations, Academic -- Statistics -- UF
Statistics thesis Ph. D
Time-series analysis ( lcsh )
 Genre:
 bibliography ( marcgt )
nonfiction ( marcgt )
Notes
 Abstract:
 Many techniques are available for estimating the parameters of
linear stationary time series. The effect of some of these estimators
on the least squares predictor for some future value of the series is
examined. We have obtained approximations for the increase in prediction
error due to parameter estimation in several cases for which no
exact expression could be found.
An efficient estimator for the parameters in a first order
autoregressive-moving average model is developed by making use of a
linear function of the autocorrelations. For more general models we
conclude that efficient estimation is so difficult to attain that first
consideration in many estimation prediction problems should be given
to ease of calculation.
Numerous unsolved estimation and prediction problems remain
for nonstationary time series. We consider the logistic growth process,
which is used extensively as an economic and population growth
model, in detail. Current estimation procedures for the logistic
process' parameters make no reference to an error structure. We propose a probability structure consistent with the realistic
properties of the series. We then use this structure to obtain estimators
from three different observational standpoints:
(1) that of observation at equidistant time points,
(2) that of continuous observation, and
(3) that of arrival time observation.
For observation types (1) and (2) we have used a modification
of maximum likelihood procedures to obtain estimators having most of
the usual properties associated with maximum likelihood estimators.
A computer program was written for type (1) to solve the intractable
estimation equations, using the Newton-Raphson procedure. Observation
of arrival times is shown not to lead to any useful estimation procedures.
We also examine the effect of the estimators calculated from
the observations taken at equal time intervals (type (1) above) on the
error of prediction.
The procedure developed for observation type (1) is then compared
by means of example to an estimation procedure developed by Rhodes.
The logistic model is also fitted by each method to the population of
conterminous United States. The estimates are then used to obtain
predictions of the population of conterminous United States in 1970.
We conclude from the results that the effort required to calculate the
maximum likelihood estimates is worthwhile. We further conjecture that
the methods developed may be applicable to other growth processes.
 Thesis:
Thesis -- University of Florida, 1971.
 Bibliography:
Bibliography: leaves 100-102.
 General Note:
 Manuscript.
 General Note:
 Vita.
Record Information
 Source Institution:
 University of Florida
 Holding Location:
 University of Florida
 Rights Management:
 Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for nonprofit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
 Resource Identifier:
 022281066 ( AlephBibNum )
13618620 ( OCLC ) ACZ2431 ( NOTIS )

Full Text 
ON SOME PROBLEMS OF ESTIMATION AND PREDICTION
FOR NONSTATIONARY TIME SERIES
By
JAMES THOMAS McCLAVE
A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF
THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1971
TO MY FATHER, MY MOTHER
AND MY WIFE
FOR THEIR LOVE, UNDERSTANDING
AND FINANCIAL SUPPORT
ACKNOWLEDGMENTS
I wish to express my heartfelt thanks to Dr. K. C. Chanda for
his expert and patient guidance in this effort.
I also want to express my gratitude to Dr. J. G. Saw for
proofreading this dissertation and for his helpful comments; and to
my office partner, Dr. J. Shuster, for many helpful discussions.
Special thanks go to my wife, Mary Jay, for her loving support,
financial and otherwise.
Finally, I wish to thank Mrs. Edna Larrick for the magnificent
job of turning the rough draft I gave her into this typing masterpiece.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

I STATEMENT OF THE PROBLEM
1.1 Introduction
1.2 History -- Stationary Processes
1.3 History -- Nonstationary Processes
1.4 Summary of Results

II LINEAR STATIONARY MODELS
2.1 Introduction
2.2 The Autoregressive Sequence
2.3 The Moving Average Sequence
2.4 The Hybrid Model

III THE LOGISTIC GROWTH PROCESS
3.1 Introduction
3.2 Probability Distribution of {X(t)}
3.3 Equispaced Observation
3.4 Continuous Observation in [0,T]
3.5 Arrival Time Observation
3.6 Prediction

IV AN EMPIRICAL COMPARISON
4.1 Introduction
4.2 Rhodes' Estimators
4.3 Sampling Properties of Rhodes' Estimators
4.4 Monte Carlo Experiments

BIBLIOGRAPHY
BIOGRAPHICAL SKETCH
LIST OF TABLES

2.1 Tables of Efficiencies for Y_0 = Σ_{j=1}^k d_j C_j
4.1 Generated and Expected Sequence for n = 40, a = 10.0, and p = 0.8
4.2 Maximum Likelihood Estimates and Rhodes' Estimates for Data in Table 4.1 for n = 40, a = 10.0 and p = 0.8
4.3 Generated and Expected Sequence for n = 100, a = 5.0, and p = 0.8
4.4 Maximum Likelihood Estimates and Rhodes' Estimates for Data in Table 4.3 for n = 100, a = 5.0 and p = 0.8
4.5 Generated and Expected Sequence for n = 400, a = 3.0, and p = 0.8
4.6 Maximum Likelihood Estimates and Rhodes' Estimates for Data in Table 4.5 for n = 400, a = 3.0 and p = 0.8
4.7 Summary of Results of Study
LIST OF FIGURES

4.1 Generated and Expected Sequence for n = 40, a = 10.0, and p = 0.8
4.2 Generated and Expected Sequence for n = 100, a = 5.0, and p = 0.8
4.3 Generated and Expected Sequence for n = 400, a = 3.0, and p = 0.8
Abstract of Dissertation Presented to the
Graduate Council of the University of Florida in Partial Fulfillment
of the Requirements for the Degree of Doctor of Philosophy
ON SOME PROBLEMS OF ESTIMATION AND PREDICTION
FOR NONSTATIONARY TIME SERIES
By
James Thomas McClave
March, 1971
Chairman: Dr. Kamal C. Chanda
Major Department: Statistics
Many techniques are available for estimating the parameters of
linear stationary time series. The effect of some of these estimators
on the least squares predictor for some future value of the series is
examined. We have obtained approximations for the increase in prediction
error due to parameter estimation in several cases for which no
exact expression could be found.
An efficient estimator for the parameters in a first order
autoregressive-moving average model is developed by making use of a
linear function of the autocorrelations. For more general models we
conclude that efficient estimation is so difficult to attain that first
consideration in many estimation prediction problems should be given
to ease of calculation.
Numerous unsolved estimation and prediction problems remain
for nonstationary time series. We consider the logistic growth process,
which is used extensively as an economic and population growth
model, in detail. Current estimation procedures for the logistic
process' parameters make no reference to an error structure.
We propose a probability structure consistent with the realistic
properties of the series. We then use this structure to obtain estimators
from three different observational standpoints:
(1) that of observation at equidistant time points,
(2) that of continuous observation, and
(3) that of arrival time observation.
For observation types (1) and (2) we have used a modification
of maximum likelihood procedures to obtain estimators having most of
the usual properties associated with maximum likelihood estimators.
A computer program was written for type (1) to solve the intractable
estimation equations, using the Newton-Raphson procedure. Observation
of arrival times is shown not to lead to any useful estimation procedures.
We also examine the effect of the estimators calculated from
the observations taken at equal time intervals (type (1) above) on the
error of prediction.
The procedure developed for observation type (1) is then compared
by means of example to an estimation procedure developed by Rhodes.
The logistic model is also fitted by each method to the population of
conterminous United States. The estimates are then used to obtain
predictions of the population of conterminous United States in 1970.
We conclude from the results that the effort required to calculate the
maximum likelihood estimates is worthwhile. We further conjecture that
the methods developed may be applicable to other growth processes.
CHAPTER I
STATEMENT OF THE PROBLEM
1.1 Introduction
Consider the following two types of time series:
1. {X(t); t ∈ J_{a,b}}, where J_{a,b} is the set of integers
{a, a+1, ..., b}. This is the discrete time series.
2. {X(t); t ∈ I_{a,b}}, where I_{a,b} is the interval (open or closed)
between the real numbers a and b. This is the continuous
time series.
Suppose an observation is made on a given series, either obtaining
an observation set of the kind {x(S), x(S+1), ..., x(T)}, with S and
T integers such that J_{S,T} ⊂ J_{a,b}, from the first or second type;
or of the kind {x(t); t ∈ I_{S,T}}, with I_{S,T} ⊂ I_{a,b}, from the second type.
The problem to be considered is that of finding the least squares predictor
of X(T+m) (with m an integer if the series is discrete, and T+m ≤ b)
based upon the observation. Thus, we want to find that function of the
observation set, say x̂(m), which minimizes the mean square error of
prediction.
The first step toward the solution of this problem is to construct
a parametric representation of the series. One must make use of all available
information about {X(t)} in order to place the problem in a parametric
framework. This parametrization may vary from specification of the first
and/or second moments of the series to complete specification of the
probability distribution at each point t.
This study of the series' genesis will usually place the process
into one of two subclasses:
(a) Weakly stationary, hereafter referred to as stationary,
series; that is to say, those whose first two moments are,
for every t,

E{X(t)} = μ, (1.1.1)

independent of t; and

E{X(t+s)X(t)} = γ_s, (1.1.2)

a function of s only.
(b) Nonstationary series, the complement of subclass (a).
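The stationarity conditions (1.1.1) and (1.1.2) can be checked empirically on a simulated series. A minimal sketch, with illustrative assumptions not taken from the text (an AR(1) recursion with coefficient 0.5 and a two-window comparison): the sample mean and lag-s sample covariances computed over an early stretch and a late stretch of a stationary series should agree to within sampling error, no matter when the observations are taken.

```python
import random

random.seed(1)

# Simulate a stationary series (illustrative AR(1): X_t = 0.5 X_{t-1} + Z_t,
# Z_t i.i.d. standard normal), then check (1.1.1)-(1.1.2): the sample mean and
# the lag-s sample covariance should not depend on where they are computed.
def simulate(n, alpha=0.5):
    x, out = 0.0, []
    for _ in range(n):
        x = alpha * x + random.gauss(0.0, 1.0)
        out.append(x)
    return out

def sample_cov(x, s):
    m = sum(x) / len(x)
    return sum((x[t] - m) * (x[t + s] - m) for t in range(len(x) - s)) / (len(x) - s)

series = simulate(200_000)
early, late = series[:100_000], series[100_000:]
for s in (0, 1, 2):
    print(s, round(sample_cov(early, s), 2), round(sample_cov(late, s), 2))
```

For this illustrative AR(1) the theoretical values are γ_0 = 4/3, γ_1 = 2/3, γ_2 = 1/3; both windows reproduce them up to sampling error, which is the second order homogeneity the text describes.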
If the series satisfies the conditions of stationarity, the
prediction problem is simplified. The minimum mean square error predictor
depends on the first two moments of the time series. Since for
stationary processes the second moment depends only upon the "lag" or
time difference of the random variables, and, since the first moment is
independent of time, all sequences of observations which are of equal
lengths yield the same information about the series' first two moments,
no matter when in the series' development the observations are taken.
This second order homogeneity simplifies the problem to the extent that
all further parametrization pertains to the description of μ and γ_s,
from which the predictor will be calculated.
On the other hand, if the series is nonstationary, the problem
is more complex because of the nature of this subclass. We will divide
the nonstationary subclass into two categories:
(i) those for which a simple transformation exists which transforms
the process into a stationary time series, and
(ii) those for which either no such transformation exists, or at
least no simple transformation can be found.
As to the first category, the nature of the transformation
depends, of course, on the reason for the series' nonstationarity.
As a simple example, suppose {X(t)} is a continuous time process with

E{X(t)} = m(t) = θ + βt, (1.1.3)

E{[X(t) − m(t)][X(t+s) − m(t+s)]} = γ_s. (1.1.4)

Then the process {Y(t)}, formed by taking

Y(t) = X(t) − m(t), (1.1.5)

will be stationary.
This particular kind of transformation is popularly referred to
as removal of the trend from {X(t)}; it can be accomplished for simple
linear trends like that given by (1.1.3), or for more complex seasonal
or harmonic trends common to many economic time series. The end result
is the same; this further parametrization has permitted us to reduce the
problem to that of stationary series prediction, for which solutions
are known.
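For the linear trend of (1.1.3), the transformation (1.1.5) amounts to estimating θ and β by ordinary least squares and subtracting the fitted line. A minimal sketch (the helper name detrend and the noise-free test series are illustrative assumptions):

```python
# Fit m(t) = theta + beta*t by ordinary least squares and subtract it,
# as in (1.1.5); Y(t) = X(t) - m_hat(t) then carries no linear trend.
def detrend(x):
    n = len(x)
    t_bar = (n - 1) / 2.0
    x_bar = sum(x) / n
    beta = (sum((t - t_bar) * (x[t] - x_bar) for t in range(n))
            / sum((t - t_bar) ** 2 for t in range(n)))
    theta = x_bar - beta * t_bar
    return [x[t] - (theta + beta * t) for t in range(n)], theta, beta

# On a noise-free line the fit is exact and the residuals vanish.
resid, theta, beta = detrend([3.0 + 0.5 * t for t in range(10)])
print(round(theta, 6), round(beta, 6))  # 3.0 0.5
```

With noise added, the residual series estimates Y(t) and can then be treated by the stationary methods the text refers to.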
However, it is perhaps within the second category that one
encounters the most complex prediction problems in time series. Without
the homogeneity of the first two moments, the nonstationary process'
analysis will certainly depend on when the observation is taken; and the
underlying probability structure of the series can be extremely complicated.
In short, general methods cannot be used to solve the prediction
problem for nonstationary processes with satisfactory results. Each
series must be considered individually, its evolution thoroughly researched,
a parametrization made, and then an optimal prediction procedure
developed. One can hope that this systematic method will apply to
a general group of processes, but it is extremely doubtful that the actual
procedure developed will apply to any but the specific series being
investigated.
1.2 HistoryStationary Processes
The literature on the prediction problem for the stationary time
series is voluminous. That the area is extremely attractive follows from
the above; most results have wide application within the subclass of
stationary processes, and the chances that a problem is mathematically
tractable are enhanced by stationarity. In addition, of course, stationary
processes have great practical significance.
Problems of least squares prediction for stochastic processes
were first considered by Kolmogorov [20], using Hilbert space geometry.
This treatment has been adapted to the stationary processes by several
authors, including Doob [9], Yaglom [36], and Parzen [24]. Hence the
prediction problem is essentially solved for all stationary time series;
the known general methods need only be applied to the specific stationary
process being considered. Many authors have contributed to this end,
among them Doob [9], Yaglom [36], Whittle [33,34], Box and Jenkins [3,4],
and Jenkins and Watts [17].
The particular parametrization proposed often results in an
estimation problem which must be solved prior to making the prediction.
This is illustrated by three of the most commonly hypothesized discrete
time series: the autoregressive, moving average, and mixed
autoregressive-moving average (hybrid) models. Each receives wide application
in engineering and economic research, and each entails parameter
estimation. These estimation problems have been considered by Mann and
Wald [21], Whittle [33], Durbin [10,11,12], Walker [28,29,30,31],
Hannan [16], and Box and Jenkins [1]. Some of the estimation procedures
are rather complicated to apply; but, for the most part, satisfactory
estimation methods exist for these linear stationary models.
The literature is nearly devoid of research exploring the effect
of parameter estimation on the error of least squares predictors. Even
though the error increase due to parameter estimation will usually be of
a smaller order than the mean square error, the precise nature of this
increase needs to be studied. Box and Jenkins [4] do present a brief
discussion of this problem for one simple time series model, but to my
knowledge the subject has not been mentioned elsewhere. This problem
will be considered at some length in Chapter II.
1.3 HistoryNonstationary Processes
All the work on the prediction problem for nonstationary time
series is divided into two distinct groups: that which deals with trend
removal in order to make the process stationary, and that which does not.
We shall consider only the latter in this dissertation; the interested
reader may consult Grenander and Rosenblatt [15] and Box and Jenkins [1]
for a complete treatment of trend removal.
Kolmogorov's Hilbert space solution to the prediction problem
is applicable to nonstationary processes. The solution depends mainly
on the second order properties of the process. Since the second moments
are inherently more complex for nonstationary processes than those for
stationary processes, the result of the application of Kolmogorov's
method to a nonstationary time series usually does not have general
application within the class of nonstationary processes. For this
reason one finds that most of the estimation and prediction research in
this area has dealt with specific process types; for example, queuing
processes, renewal processes, birth-death processes, and growth
(evolutive) processes. The list is not exhaustive; each class has
received attention in the literature, much of it devoted to the estimation
and prediction problems.
Specifically, considering the birth-death and growth processes,
both usually Markov processes, most of the research has been concentrated
on homogeneous processes; that is, those with transition probabilities
independent of time. Feller [13] gives a rather complete introduction
to the birth-death processes, defining the underlying probability
structure of them. Then David Kendall [18] uses generating functions
to obtain solutions for the probability distribution of these series at
any time t. Moran [23] partly solves the problem of estimation for
homogeneous birth-death processes, and both Moran [22] and Kendall [19]
propose several methods for estimating the parameters of the transition
probabilities of some simple nonhomogeneous birth-death and growth
processes.
Since the publication of these papers, research in this area has
been sparse at best; and the prediction problem has received surprisingly
little attention from statisticians. Biologists and economists
(Rhodes [25], Tintner [26], and Granger [14]) have taken some interest in
this area, since it includes prediction of animal population size and
economic forecasting. Unfortunately, these problems are often treated
with little reference to underlying probability structures. The results
rely more on intuition than mathematics, and hence the predictions are
often unsatisfactory.
An example of this kind of treatment is given by Rhodes [25].
Rhodes considers the problem of estimating human population size without
using any probabilistic arguments. His early paper was an indication of
things to come. In all fairness, though, early researchers had a legitimate
excuse for steering away from these arguments, for they often make
prediction problems for nonhomogeneous Markov processes mathematically
intractable. However, in this age of the electronic computer, many
formerly insoluble problems can be treated iteratively and solutions
obtained.
1.4 Summary of Results
The results are divided into two distinct categories: first, in
Chapter II the effect of estimation on the error of prediction for linear
stationary time series is considered. For several models an exact expression
for the error increase due to parameter estimation is obtained; in
many cases for which this is not possible, approximations for this
increase are given. For the mixed autoregressivemoving average model,
a new estimation procedure is examined with respect to the prediction
problem.
In Chapter III the logistic growth process, a nonstationary,
Markov time series, is considered in detail. A realistic probabilistic
structure is assigned to this process, and then the parameters involved
are estimated from several different observational standpoints. Finally,
the prediction problem is considered for this process, and the effect
of parameter estimation on the mean square error of prediction is
examined.
The methods developed for the logistic process may be applicable
to other nonstationary processes. Nevertheless, the logistic process
has application in describing animal population growth and certain economic
growth patterns, so that the results may be useful in themselves.
Chapter IV is devoted to a comparison of an estimation procedure
(for the logistic process) derived by Rhodes [25] with one of those
derived in this paper. The analytical intractability of the estimators'
sampling properties calls for numerical comparison, so that conclusions
are tentative. The purpose of this Monte Carlo study is to determine
whether the extra effort which the probabilistic methods require is
worthwhile, or whether one should continue to estimate the parameters,
using methods like those proposed by Rhodes; that is to say, those which
rely mainly on intuitive appeal and their ease of calculation.
CHAPTER II
LINEAR STATIONARY MODELS
2.1 Introduction
Many volumes have been written about estimation and prediction
problems for linear stationary time series, but there is little in the
literature which considers the effect of estimation upon prediction.
That is, the increase in the mean square error of prediction due to
parameter estimation has not been examined.
Knowledge about the increase in prediction error for various
predictors may be useful. If the increase is small for several differ
ent estimators, one might choose the estimator easiest to calculate.
On the other hand, one would spare no effort in finding that estimator
which least increased the error of prediction if that increase were
known to be significant.
Our purpose in this chapter is to consider this estimation
prediction problem for the three basic linear stationary time series:
autoregressive, moving average, and a mixture of these (hybrid). Only
discrete time series will be examined; since the stationarity of the
process implies that the mean is mathematically independent of time,
this mean will be assumed known and equal to zero. All results remain
asymptotically true for the case of an unknown mean for the series
formed by subtracting the arithmetic mean from each of the observed
values. To pursue this would be to stray from the objective; the interested
reader is referred to Bartlett [1] and Box and Jenkins [1].
Finally, the sequence {Z_t; t = 0, ±1, ±2, ...} will be used throughout
to denote a stationary series whose first two moments are the following:

E{Z_t} = 0, (2.1.1)

E{Z_t Z_{t-j}} = σ_Z² if j = 0, and 0 if j ≠ 0. (2.1.2)

From time to time comment will be made on the case where Z_t is normally
distributed. But unless specified, no distributional assumptions are
made.
2.2 The Autoregressive Sequence
If {X_t; t ∈ J_{-∞,∞}} is a stationary time series satisfying the
relationship

X_t + α_1 X_{t-1} + ... + α_p X_{t-p} = Z_t (2.2.1)

for every t, then {X_t} is said to be a p-th order autoregressive sequence.
The problem of estimating the set of parameters {α_j; j ∈ J_{1,p}}
is analogous to the classical regression problem. Defining γ_j as the
j-step covariance of {X_t} and ρ_j as the j-step correlation, that is
to say

γ_j = E{X_{t+j} X_t} (2.2.2)

and

ρ_j = γ_j γ_0⁻¹, (2.2.3)

we get

Σ_{j=1}^p α_j ρ_{j-r} = −ρ_r (2.2.4)

for r ∈ J_{1,p} by multiplying (2.2.1) by X_{t-r} and taking expected
values. We use the fact that Z_t and X_{t-r} are uncorrelated when r > 0, which
will be shown later. Equation (2.2.4) gives rise to the estimating
equations

Σ_{j=1}^p α̂_j C_{j-r} = −C_r (2.2.5)

for r ∈ J_{1,p}, where

C_j = N⁻¹ Σ_{t=1}^{N-|j|} x_t x_{t+|j|} (2.2.6)

is the sample covariance function for the observed sequence
{x_t; t ∈ J_{1,N}}. Mann and Wald [21] have shown that the errors
{α̂_j − α_j; j ∈ J_{1,p}} of the estimators given by (2.2.5) are asymptotically
normally distributed with null mean and variance-covariance matrix
(Nγ_0)⁻¹ σ_Z² Ω⁻¹, where Ω is a (p × p) matrix whose element in the
(i,j) cell is

(Ω)_{ij} = ρ_{i-j}. (2.2.7)

The vector of estimates whose transpose is α̂ᵀ = (α̂_1, α̂_2, ..., α̂_p) is
shown to be the maximum likelihood estimator of αᵀ = (α_1, α_2, ..., α_p) if
the distribution of Z_t is assumed normal for each t.
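As a concrete illustration of (2.2.5)-(2.2.6), the estimating equations can be solved numerically for a simulated second order sequence. This is a sketch, not the dissertation's own program; the simulated coefficients and the small Gaussian-elimination solver are illustrative assumptions.

```python
import random

random.seed(2)

# Sample covariance C_j of (2.2.6).
def sample_cov(x, j):
    n = len(x)
    return sum(x[t] * x[t + j] for t in range(n - j)) / n

# Solve the estimating equations (2.2.5): sum_j a_j C_{j-r} = -C_r, r = 1..p,
# for the model X_t + a_1 X_{t-1} + ... + a_p X_{t-p} = Z_t.
def yule_walker(x, p):
    c = [sample_cov(x, j) for j in range(p + 1)]
    aug = [[c[abs(j - r)] for j in range(1, p + 1)] + [-c[r]]
           for r in range(1, p + 1)]
    for i in range(p):  # Gaussian elimination on the small augmented system
        aug[i] = [v / aug[i][i] for v in aug[i]]
        for k in range(p):
            if k != i:
                aug[k] = [vk - aug[k][i] * vi for vk, vi in zip(aug[k], aug[i])]
    return [row[p] for row in aug]

# Simulate X_t - 1.1 X_{t-1} + 0.3 X_{t-2} = Z_t (so a_1 = -1.1, a_2 = 0.3).
x, x1, x2 = [], 0.0, 0.0
for _ in range(100_000):
    v = 1.1 * x1 - 0.3 * x2 + random.gauss(0.0, 1.0)
    x.append(v)
    x2, x1 = x1, v
print([round(a, 2) for a in yule_walker(x, 2)])  # close to [-1.1, 0.3]
```

The matrix of the system has (r,j) entry C_{|j-r|}, the sample analogue of the Ω of (2.2.7) scaled by γ_0, which is why the same quantities appear in the Mann-Wald covariance matrix.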
In order to predict the value of X_{N+m} using the observations
x_1, x_2, ..., x_N, we will use the function which minimizes the mean square
error of prediction. That is, the optimal predictor is x̂_N^(1)(m), which
minimizes

MSE(x̂_N(m)) = E{[X_{N+m} − x̂_N^(1)(m)]²}. (2.2.8)

But this implies that

x̂_N^(1)(m) = E{X_{N+m} | (X_j = x_j; j ∈ J_{1,N})}. (2.2.9)

Substitution of (2.2.1) into (2.2.9) yields

x̂_N^(1)(m) = E{Z_{N+m} − Σ_{j=1}^p α_j X_{N+m-j} | (X_j = x_j; j ∈ J_{1,N})}
= −Σ_{j=1}^p α_j x̂_N^(1)(m−j), (2.2.10)

where

x̂_N^(1)(j) = E{X_{N+j} | (X_k = x_k; k ∈ J_{1,N})} if j > 0,
x̂_N^(1)(j) = x_{N+j} if j ≤ 0. (2.2.11)

Thus one can calculate the m-step predictor, x̂_N^(1)(m), using the recursive
relation (2.2.10).
This, however, requires that the autoregressive parameters be
known. When they are not, we may logically use

x̂_N^(2)(m) = −Σ_{j=1}^p α̂_j x̂_N^(2)(m−j). (2.2.12)

To be determined is the increase in the error of prediction caused by
using x̂_N^(2)(m) instead of x̂_N^(1)(m).
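The recursion (2.2.10)-(2.2.11) is easy to mechanize once the coefficients are known; a short sketch (the function name and the AR(1) check are illustrative assumptions):

```python
# m-step least squares predictor for X_t + a_1 X_{t-1} + ... + a_p X_{t-p} = Z_t,
# via the recursion x_hat(m) = -sum_j a_j x_hat(m-j), where x_hat(j) = x_{N+j}
# for j <= 0 as in (2.2.11).
def predict(x_obs, a, m):
    hist = list(x_obs)  # hist[-1] is x_N; each prediction is appended in turn
    for _ in range(m):
        hist.append(-sum(a[j] * hist[-1 - j] for j in range(len(a))))
    return hist[-1]

# For p = 1 with a_1 = -alpha, iterating the recursion m times collapses
# to alpha**m * x_N.
alpha, x_n = 0.8, 2.5
assert abs(predict([x_n], [-alpha], 3) - alpha ** 3 * x_n) < 1e-12
```

The same routine gives the estimated predictor of (2.2.12) when the fitted coefficients are passed in place of the true ones.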
Case 1: p = m = 1
For the sake of simplicity, let α = −α_1, so that {X_t} satisfies
the relationship

X_t − αX_{t-1} = Z_t (2.2.13)

for all t. Using the Mann-Wald estimate for α, one has

α̂ = C_1 C_0⁻¹ (2.2.14)

and

Var(α̂) ≈ σ_Z² (Nγ_0)⁻¹. (2.2.15)
Rewriting (2.2.13) and using the fact that the stationarity of {X_t}
implies |α| < 1, one obtains

X_t = α(αX_{t-2} + Z_{t-1}) + Z_t
= α^k X_{t-k} + Σ_{j=0}^{k-1} α^j Z_{t-j} (2.2.16)

for any positive integer k. We can extend this to the limit in the
mean square sense as k → ∞. That is, one can write

X_t = Σ_{j=0}^∞ α^j Z_{t-j}, (2.2.17)

since

E[X_t²] = Σ_{j=0}^∞ α^{2j} σ_Z²
= (1−α²)⁻¹ σ_Z² (2.2.18)

is finite. Note that the left side of (2.2.18) is, by definition, γ_0,
and

γ_k = E[X_{t+k} X_t]
= E[Σ_{j=0}^∞ α^j Z_{t+k-j} Σ_{j=0}^∞ α^j Z_{t-j}]
= α^k γ_0. (2.2.19)

Thus

ρ_k = γ_k γ_0⁻¹ = α^k. (2.2.20)

In (2.2.19) and (2.2.20) we assume k is a positive integer; that γ_k = γ_{-k}
is easily shown.
The minimum mean square error predictor of X_{N+1} is given by
setting p = m = 1 in (2.2.10), whereupon

x̂_N^(1)(1) = αx_N. (2.2.21)

The prediction error is given by

MSE_1 = E{[X_{N+1} − x̂_N^(1)(1)]²}
= E{(X_{N+1} − αX_N)²}
= σ_Z². (2.2.22)
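A quick Monte Carlo check of (2.2.22) (the seed, sample size, and the choice α = 0.7 are arbitrary illustrative assumptions): with α known, the averaged squared one-step error along a long realization settles near σ_Z² = 1.

```python
import random

random.seed(3)

# Average the squared one-step prediction error (X_{t+1} - alpha*X_t)^2 along
# one long realization of X_t = alpha*X_{t-1} + Z_t with Z_t standard normal.
alpha, n = 0.7, 200_000
x, total = 0.0, 0.0
for _ in range(n):
    x_next = alpha * x + random.gauss(0.0, 1.0)
    total += (x_next - alpha * x) ** 2
    x = x_next
print(round(total / n, 2))  # near sigma_Z^2 = 1
```

The check is exact in expectation because the one-step error with known α is just Z_{t+1} itself, whatever the value of α.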
If α is unknown, the predictor corresponding to (2.2.12) is

x̂_N^(2)(1) = α̂ x_N. (2.2.23)

Using the fact that [X_{N+1} − x̂_N^(1)(1)] is uncorrelated with the sequence
{X_j; j ∈ J_{1,N}}, one has

MSE_2 = E{[X_{N+1} − x̂_N^(2)(1)]²}
= MSE_1 + E{X_N²(α̂ − α)²}. (2.2.24)

Thus the increase in the error of prediction due to estimating α is

MSE_2 − MSE_1 = E{X_N²(α̂ − α)²}. (2.2.25)
Suppose the estimator α̂ is calculated using only the first k
observations in the sequence {x_j; j ∈ J_{1,N}}, where k << N; that is to say,
we form the sample covariances in (2.2.3) using only k observations.
Since ρ_j decreases exponentially, the {X_j; j ∈ J_{1,k}} and X_N are virtually
uncorrelated, and hence so are α̂ and X_N. A first approximation for
(2.2.25) is thus

MSE_2 − MSE_1 ≈ E{X_N²} E{(α̂ − α)²}
= γ_0 · k⁻¹(1−α²) = k⁻¹ σ_Z². (2.2.26)

Box and Jenkins [4] present an argument similar to the above. The
degree to which the approximation (2.2.26) holds is impossible to ascertain
from these arguments.
To obtain a more exact expression for (2.2.25), first define

δ_j = C_j − γ_j (2.2.27)

for j ∈ J_{0,1}. Then we can write

α̂ − α = C_1 C_0⁻¹ − γ_1 γ_0⁻¹
= (δ_1 + γ_1)(δ_0 + γ_0)⁻¹ − γ_1 γ_0⁻¹
= (δ_1 − αδ_0)(δ_0 + γ_0)⁻¹, (2.2.28)

using the fact that γ_1 − αγ_0 = 0. Then

E{(α̂ − α)²} = E{(δ_1 − αδ_0)²(δ_0 + γ_0)⁻²}
= E{γ_0⁻²(δ_1 − αδ_0)²(1 − 2δ_0 γ_0⁻¹ + 3(δ_0 γ_0⁻¹)² − ...)}
= E{γ_0⁻²(δ_1 − αδ_0)²} + o(N⁻¹), (2.2.29)

since δ_0 and δ_1 have sampling errors whose order of magnitude is N^{-1/2}
(Bartlett [1]). Thus, retaining terms of order N⁻¹ and higher, (2.2.25)
becomes

E{X_N²(α̂ − α)²} = γ_0⁻² E{X_N²(δ_1 − αδ_0)²} + o(N⁻¹). (2.2.30)
We now state

Theorem 2.2.1:
If {X_t; t ∈ J_{-∞,∞}} is an autoregressive sequence of order one,
then

MSE_2 − MSE_1 = N⁻¹{κ_Z(2) + (1+α²)⁻¹ α² κ_Z(4) κ_Z⁻¹(2)} + o(N⁻¹), (2.2.31)

where κ_Z(r) is the r-th cumulant of the distribution of {Z_t}.

Proof:
Referring to (2.2.30), we must evaluate
(i) E{X_N² δ_0²},
(ii) E{X_N² δ_1²}, and
(iii) E{X_N² δ_0 δ_1}.
(i) We will evaluate the first term in detail in order to make
the methods clear. First, note that

E{X_N² δ_0²} = E{X_N²(C_0 − γ_0)²}
= E{X_N²(N⁻¹ Σ_{t=1}^N X_t² − γ_0)²}
= N⁻² E{X_N²(Σ_{t=1}^N X_t²)²} − 2N⁻¹ γ_0 E{X_N² Σ_{t=1}^N X_t²}
+ γ_0² E{X_N²}. (2.2.32)

Now

E{X_N²(Σ_{t=1}^N X_t²)²} = Σ_{t=1}^N E{X_N² X_t⁴}
+ 2 Σ_{t=2}^N Σ_{s=1}^{t-1} E{X_N² X_t² X_s²}. (2.2.33)
From (2.2.16), for t > s,

X_t = α^{t-s} X_s + Σ_{k=0}^{t-s-1} α^k Z_{t-k}
= α^{t-s} X_s + Y_{s,t}, (2.2.34)

where

Y_{s,t} = Σ_{k=0}^{t-s-1} α^k Z_{t-k}. (2.2.35)

We now write

ν_{s,t}^(k) = E{Y_{s,t}^k}, (2.2.36)

defining

ν_{s,s}^(k) = 0 (since Y_{s,s} = 0) (2.2.37)

for all k ∈ J_{1,∞}. Hence we obtain from (2.2.34), for t ≤ N,

E{X_N² X_t⁴} = E(E{X_N² X_t⁴ | X_t})
= α^{2(N-t)} E{X_t⁶} + ν_{t,N}^(2) E{X_t⁴}. (2.2.38)
Furthermore, for s ≤ t ≤ N,

E{X_N² X_t² X_s²} = E(E{X_N² X_t² X_s² | X_t, X_s})
= α^{2(N-t)} E{[α^{t-s} X_s + Y_{s,t}]⁴ X_s²}
+ ν_{t,N}^(2) E{[α^{t-s} X_s + Y_{s,t}]² X_s²}
= α^{2(N-t)+4(t-s)} E{X_s⁶}
+ [6α^{2(N-t)+2(t-s)} ν_{s,t}^(2) + α^{2(t-s)} ν_{t,N}^(2)] E{X_s⁴}
+ 4α^{2(N-t)+(t-s)} ν_{s,t}^(3) E{X_s³}
+ [α^{2(N-t)} ν_{s,t}^(4) + ν_{t,N}^(2) ν_{s,t}^(2)] E{X_s²}. (2.2.39)
Before we can write (2.2.32) in terms of the cumulants of {Z_t}, we need
to note that the r-th cumulant of the {X_t} process is, using (2.2.17),

κ_X(r) = (1 − α^r)⁻¹ κ_Z(r), (2.2.40)

and hence that

E{X_s⁶} = (1−α⁶)⁻¹ κ_Z(6) + 15[(1−α⁴)(1−α²)]⁻¹ κ_Z(4) κ_Z(2)
+ 10(1−α³)⁻² κ_Z²(3) + 15(1−α²)⁻³ κ_Z³(2), (2.2.41)

E{X_s⁴} = (1−α⁴)⁻¹ κ_Z(4) + 3(1−α²)⁻² κ_Z²(2), (2.2.42)

E{X_s³} = (1−α³)⁻¹ κ_Z(3), (2.2.43)

and

E{X_s²} = (1−α²)⁻¹ κ_Z(2). (2.2.44)

We will also need, from (2.2.35) and (2.2.36),

ν_{s,t}^(4) = Σ_{j=0}^{t-s-1} α^{4j} [κ_Z(4) + 3κ_Z²(2)]
+ 6 Σ_{i=1}^{t-s-1} Σ_{j=0}^{i-1} α^{2(i+j)} κ_Z²(2)
= (1−α⁴)⁻¹ (1−α^{4(t-s)}) κ_Z(4)
+ 3(1−α²)⁻² (1−α^{2(t-s)})² κ_Z²(2), (2.2.45)

ν_{s,t}^(3) = Σ_{j=0}^{t-s-1} α^{3j} κ_Z(3)
= (1−α³)⁻¹ (1−α^{3(t-s)}) κ_Z(3), (2.2.46)

and

ν_{s,t}^(2) = Σ_{j=0}^{t-s-1} α^{2j} κ_Z(2)
= (1−α²)⁻¹ (1−α^{2(t-s)}) κ_Z(2). (2.2.47)
We are now prepared to obtain (2.2.39) in terms of the cumulants of
{Z_t}; the result of substitution of (2.2.41)-(2.2.47) into (2.2.39) is

E{X_N² X_t² X_s²} = (1−α⁶)⁻¹ α^{2(N-s)+2(t-s)} κ_Z(6)
+ [(1−α⁴)(1−α²)]⁻¹ [8α^{2(N-s)+2(t-s)} + 5α^{2(N-s)}
+ α^{2(N-t)} + α^{2(t-s)}] κ_Z(4) κ_Z(2)
+ (1−α³)⁻² [6α^{2(N-s)+2(t-s)} + 4α^{2N-s-t}] κ_Z²(3)
+ (1−α²)⁻³ [1 + 10α^{2(N-s)} + 2α^{2(N-t)}
+ 2α^{2(t-s)}] κ_Z³(2). (2.2.48)

We now substitute (2.2.48) into (2.2.33) to get

E{X_N²(Σ_{t=1}^N X_t²)²} = (1−α⁶)⁻¹ κ_Z(6) [Σ_{t=0}^{N-1} α^{2t}
+ 2 Σ_{t=1}^{N-1} Σ_{s=1}^{t} α^{2s+2t}]
+ [(1−α⁴)(1−α²)]⁻¹ κ_Z(4) κ_Z(2) {N + 16 Σ_{t=1}^{N-1} Σ_{s=1}^{t} α^{2s+2t}
+ Σ_{t=0}^{N-2} α^{2t} [2(N−t−1) + α²(2N+8t)] + 14 Σ_{t=0}^{N} α^{2t}}
+ (1−α³)⁻² κ_Z²(3) {10 Σ_{t=0}^{N} α^{2t} + 12 Σ_{s=2}^{N-1} Σ_{t=1}^{s} α^{2t+2s}
+ 8 Σ_{t=1}^{N-1} Σ_{s=0}^{t-1} α^{t+s}}
+ (1−α²)⁻³ κ_Z³(2) {N² + 2N + Σ_{t=0}^{N-2} α^{2t} [4(N−t−1)
+ α²(4N+16t)]}. (2.2.49)
The second term on the right side of (2.2.32) is similarly evaluated to
obtain

E{X_N² Σ_{t=1}^N X_t²} = (1−α⁴)⁻¹ κ_Z(4) Σ_{t=1}^N α^{2(N-t)}
+ (1−α²)⁻² κ_Z²(2) [N + 2 Σ_{t=1}^N α^{2(N-t)}]
= (1−α⁴)⁻¹ κ_Z(4) Σ_{t=0}^{N-1} α^{2t}
+ (1−α²)⁻² κ_Z²(2) [N + 2 Σ_{t=0}^{N-1} α^{2t}]. (2.2.50)

Putting (2.2.49) and (2.2.50) into (2.2.32) yields

E{X_N² δ_0²} = N⁻¹{[(1−α⁴)(1−α²)²]⁻¹ (1+α²) κ_Z(4) κ_Z(2)
+ 2(1−α²)⁻⁴ (1+α²) κ_Z³(2)} + o(N⁻¹). (2.2.51)

(ii) We proceed in the manner described above to obtain

E{X_N² δ_1²} = N⁻¹{2[(1−α⁴)(1−α²)²]⁻¹ α⁴ κ_Z(4) κ_Z(2)
+ (1−α²)⁻⁴ (1+4α²−α⁴) κ_Z³(2)} + o(N⁻¹). (2.2.52)

(iii) Similarly, we find

E{X_N² δ_0 δ_1} = N⁻¹{2[(1−α⁴)(1−α²)²]⁻¹ α³ κ_Z(4) κ_Z(2)
+ 4(1−α²)⁻⁴ α κ_Z³(2)} + o(N⁻¹). (2.2.53)
Finally, from (2.2.30) and (2.2.51) (2.2.53),
MSE2 MSE1 = (1c2) 2 (2) [EfX2 2 2caE(XN .51
2 2 1 N 0
+ a E[ 2 A bI
= N{ (x(2) (1+2 )1 a2 Z(4) < (2)] + o(N1),
((2.2.31))
which completes the proof of the theorem.
Note that if {Z_t} is a normal process, κ_Z(4) is zero, so that
  MSE_2 − MSE_1 = N^{−1} κ_Z(2) + o(N^{−1})
    = N^{−1} σ_Z^2 + o(N^{−1}).  (2.2.54)
Thus we have the rather remarkable result that, when {Z_t} is a normal
process, X_N and α̂ are uncorrelated when terms of lower order than N^{−1}
are ignored. That is,
  MSE_2 − MSE_1 = E{X_N^2} E{(δα̂)^2} + o(N^{−1}).  (2.2.55)
Case 2: p = 1, m general
Using (2.2.9) and (2.2.16),
  x̂_N^{(1)}(m) = E{Σ_{j=0}^{m−1} α^j Z_{N+m−j} + α^m X_N/(X_j = x_j; j ∈ J_{1,N})}
    = α^m x_N.  (2.2.56)
Then
  MSE_1 = E{[X_{N+m} − x̂_N^{(1)}(m)]^2}
    = σ_Z^2(1−α^{2m})(1−α^2)^{−1}.  (2.2.57)
Assuming α unknown, one forms the predictor
  x̂_N^{(2)}(m) = α̂^m x_N.  (2.2.58)
As in Case 1,
  MSE_2 − MSE_1 = E{[x̂_N^{(1)}(m) − x̂_N^{(2)}(m)]^2}
    = E{X_N^2[(α + δα̂)^m − α^m]^2},  (2.2.59)
writing
  δα̂ = α̂ − α.
Upon expanding (2.2.59) and remembering that δα̂ has sampling error of
order N^{−1/2}, we have
  MSE_2 − MSE_1 = E{X_N^2[(δα̂)^m + mα(δα̂)^{m−1} + ... + mα^{m−1}(δα̂)]^2}
    = m^2 α^{2(m−1)} E{X_N^2(δα̂)^2} + o(N^{−1}).  (2.2.60)
Using Theorem 2.2.1 and assuming Z_t normal, this becomes
  MSE_2 − MSE_1 = N^{−1} m^2 α^{2(m−1)} σ_Z^2 + o(N^{−1}).  (2.2.61)
If the normality assumption is dropped, the fourth order cumulant must
be included in (2.2.61).
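The limiting increase (2.2.61) is easy to evaluate numerically; the sketch below simply codes the formula (the function name is illustrative, not from the text):

```python
# Numerical illustration of (2.2.61): the large-sample increase in
# m-step prediction MSE for an AR(1) with estimated parameter,
#   MSE2 - MSE1 ~ N^{-1} m^2 alpha^{2(m-1)} sigma_Z^2   (normal Z_t).
def ar1_mse_increase(N, m, alpha, sigma2):
    return m * m * alpha ** (2 * (m - 1)) * sigma2 / N

# The increase shrinks with N and, for |alpha| < 1, eventually with m:
for m in (1, 2, 5):
    print(m, ar1_mse_increase(100, m, 0.5, 1.0))
```

For α = 0.5 the m = 1 and m = 2 increases coincide (m²α^{2(m−1)} = 1 in both cases), after which the α^{2(m−1)} factor dominates.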
Case 3: p general, m = 1
The optimal predictor here is
  x̂_N^{(1)}(1) = −Σ_{t=1}^{p} α_t x_{N+1−t},  (2.2.62)
with prediction error
  MSE_1 = E{[X_{N+1} − x̂_N^{(1)}(1)]^2}
    = E{(X_{N+1} + Σ_{j=1}^{p} α_j X_{N+1−j})^2}
    = σ_Z^2.  (2.2.63)
The predictor becomes, when the parameters are unknown,
  x̂_N^{(2)}(1) = −Σ_{j=1}^{p} α̂_j x_{N+1−j}.  (2.2.64)
The increase in prediction error is given by
  MSE_2 − MSE_1 = E{[x̂_N^{(2)}(1) − x̂_N^{(1)}(1)]^2}
    = E{(δα̂)^T X X^T (δα̂)},  (2.2.65)
using matrix notation with X^T = (X_N, X_{N−1}, ..., X_{N+1−p}). The evaluation of
(2.2.65) is intractable; Theorem 2.2.1 leads us to conjecture that, for
normal Z_t, X and (δα̂) are uncorrelated (ignoring terms of order less
than N^{−1}). In this case
  MSE_2 − MSE_1 ≈ γ_0 E{(δα̂)^T Σ (δα̂)},
where X X^T has been replaced by its expected value γ_0 Σ. It follows from the
results of Mann and Wald [21] that, asymptotically,
  (δα̂) ~ N_p(φ_p, N^{−1} σ_Z^2 (γ_0 Σ)^{−1}),  (2.2.66)
where N_p denotes a p-variate normal distribution and φ_p a null vector of
order p. Thus
  MSE_2 − MSE_1 = γ_0[(Nγ_0)^{−1} σ_Z^2 p] + o(N^{−1})
    = N^{−1} p σ_Z^2 + o(N^{−1}),  (2.2.67)
using the fact that N(δα̂)^T[σ_Z^2(γ_0 Σ)^{−1}]^{−1}(δα̂) has a chi-square distribution
with p degrees of freedom. Note that expression (2.2.67) is a simple
generalization of expression (2.2.54) for the case p = 1.
Case 4: p, m general
In this most general case, the predictor is (2.2.10), where
the values x̂_N^{(1)}(j) are determined recursively. When {α̂_j; j ∈ J_{1,p}} are
substituted for {α_j; j ∈ J_{1,p}}, one obtains x̂_N^{(2)}(m), given by (2.2.12).
We are interested in calculating
  MSE_2 − MSE_1 = E{[Σ_{j=1}^{p} (α̂_j x̂_N^{(2)}(m−j) − α_j x̂_N^{(1)}(m−j))]^2}.  (2.2.68)
All attempts to obtain (2.2.68) in closed form have been unsuccessful.
That the term is of order N^{−1} follows from the properties of (δα̂).
Further conclusions might be possible with the aid of Monte Carlo
experiments.
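A Monte Carlo experiment of the kind suggested is easy to set up; the sketch below does it for the simplest case p = 1 (where the limit (2.2.61) is known and can serve as a check). All function names are illustrative, and the least-squares estimator used is one natural choice, not necessarily the one analyzed in Theorem 2.2.1:

```python
# Monte Carlo sketch of the increase in prediction error due to
# parameter estimation, for an AR(1).  The thesis gives the limit
#   MSE2 - MSE1 ~ m^2 * alpha^(2(m-1)) * sigma_Z^2 / N   (normal Z_t).
import random

def simulate_ar1(alpha, sigma, n, rng):
    """n observations of X_t = alpha*X_{t-1} + Z_t, stationary start."""
    x = rng.gauss(0.0, sigma / (1.0 - alpha * alpha) ** 0.5)
    out = []
    for _ in range(n):
        x = alpha * x + rng.gauss(0.0, sigma)
        out.append(x)
    return out

def estimate_alpha(xs):
    """Least-squares estimator: sum x_{t-1} x_t / sum x_{t-1}^2."""
    num = sum(a * b for a, b in zip(xs, xs[1:]))
    den = sum(a * a for a in xs[:-1])
    return num / den

def excess_prediction_mse(alpha, sigma, n, m, reps, seed=1):
    """Average of X_N^2 (alpha_hat^m - alpha^m)^2 over replications."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(reps):
        xs = simulate_ar1(alpha, sigma, n, rng)
        a_hat = estimate_alpha(xs)
        acc += (xs[-1] * (a_hat ** m - alpha ** m)) ** 2
    return acc / reps

alpha, sigma, n, m = 0.5, 1.0, 200, 2
theory = m * m * alpha ** (2 * (m - 1)) * sigma ** 2 / n
empirical = excess_prediction_mse(alpha, sigma, n, m, reps=2000)
print(theory, empirical)  # the two should be of comparable size
```

The agreement is only to first order in N^{−1}, so the empirical value fluctuates around the theoretical one.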
In summary, the autoregressive process allows one to calculate
estimators having many satisfying properties, including relative ease
of calculation. Furthermore, the effect of these estimators on the
error of prediction becomes negligible when N is large. Exactly how
large N must be is dependent upon p and m. We have obtained precise
expressions for the increase in prediction error for several special
cases.
2.3 The Moving Average Sequence
If {X_t; t ∈ J_{−∞,∞}} is a stationary time series satisfying the
relationship
  X_t = Z_t + β_1 Z_{t−1} + ... + β_q Z_{t−q}  (2.3.1)
for every t, then {X_t} is said to be a qth order moving average process.
The problem of parameter estimation for the moving average
sequence is complicated by the fact that the sample covariances, C_j, do
not provide efficient estimators of the covariance function of {X_t}, {γ_j}.
(Efficiency is defined with respect to the variance of the maximum
likelihood estimator when Z_t is assumed normal.) The following is
a brief history of the estimation problem.
Whittle [33] obtained maximum likelihood equations for the
moving average process, assuming Z_t to be normal. Estimators were not
obtained from these equations, however; even iterative methods have
proven unsatisfactory for solving the likelihood equations. The search
thus began for estimators of calculable form and with high efficiency.
We note that
  γ_j = E{X_{t+j} X_t}
    = E{(Σ_{k=0}^{q} β_k Z_{t+j−k})(Σ_{k=0}^{q} β_k Z_{t−k})}
    = σ_Z^2 Σ_{k=0}^{q−j} β_k β_{k+j}  if j ∈ J_{0,q}
    = 0  if j ∈ J_{q+1,∞},  (2.3.2)
defining β_0 = 1 and noting that γ_j = γ_{−j}. One can obtain estimates
of {β_j; j ∈ J_{1,q}} by substituting C_j for γ_j in (2.3.2) for j ∈ J_{1,q}.
However, the lack of efficiency of C_j for γ_j has a predictable effect
on these estimators. Whittle [33] has shown that for q = 1 and β_1 = 0.5,
the efficiency of this method is about 0.3.
Another representation of {X_t} is the following:
  X_t + Σ_{j=1}^{∞} c_j X_{t−j} = Z_t,  (2.3.3)
where
  c_r = Σ η_1^{j_1} η_2^{j_2} ··· η_q^{j_q},  (2.3.4)
the summation extending over all {j_s ≥ 0; s ∈ J_{1,q}} with Σ_s j_s = r,
and where {η_j; j ∈ J_{1,q}} are the (distinct) roots of
  z^q + β_1 z^{q−1} + ... + β_q = 0.  (2.3.5)
If the roots of (2.3.5) are not all different, the representation is
not affected; only the definition of the parameter set {c_j; j ∈ J_{1,∞}}
differs slightly. Also, in order that (2.3.3) be a valid representation
(in the mean square sense), we must require that all roots of
(2.3.5) lie inside the unit circle. This restriction is not serious;
in fact, it insures that the representation (2.3.1) is unique.
Thus {X_t} may be considered an autoregressive process of infinite
order. Durbin [10] proposes that, since it is easily shown that
c_r decreases exponentially in r, we use the representation
  X_t + Σ_{j=1}^{k} c_j X_{t−j} ≈ Z_t,  (2.3.6)
where k is chosen large, but still k << N. The parameters {c_j; j ∈ J_{1,k}}
may then be estimated using methods discussed in 2.2; finally, (2.3.4)
and (2.3.5) are used to transform {ĉ_j; j ∈ J_{1,k}} to estimators of
{β_j; j ∈ J_{1,q}}. Durbin shows the estimators can be made efficient by
choosing k (and hence N) large. All properties derived assume N infinite
and k large enough that relationships (2.3.3) and (2.3.6) are nearly
equivalent. The computation of the estimators requires considerable
effort.
Walker [31] finds linear functions of the sample correlations
{r_j = C_j C_0^{−1}; j ∈ J_{1,N}} which provide asymptotically efficient estimators
of the correlation function {ρ_j = γ_j γ_0^{−1}; j ∈ J_{1,q}}. He then
substitutes these estimators into (2.3.2) (expressing (2.3.2) in terms
of correlations rather than covariances), obtaining asymptotically
efficient estimators of {β_j; j ∈ J_{1,q}}. The efficiency depends primarily
on the number of correlations used in the calculation of the
estimators and the number of observations, N. The weights assigned to
the sample correlations used in these linear estimators are determined
iteratively, so calculations may be tedious.
The most recent contribution is due to Hannan [16]. He uses
the spectral representation of {X_t} to obtain parameter estimators.
Though his work is too complicated to summarize here, it should be
noted that the estimators are shown to be asymptotically normal with
variance-covariance matrix identical to that of the maximum likelihood
estimators. The procedure involves an iteration which, by his own
examples, may be slow and thus expensive. His Monte Carlo
experiments also showed inexplicable bias in his estimates.
By now one realizes that the estimation problem for the moving
average time series is anything but simple; but the author's objective
is to obtain an estimator suitable for use in a predictor. The predictor
which minimizes the mean square error of prediction is
  x̂_N^{(1)}(m) = E{X_{N+m}/(X_j = x_j; j ∈ J_{1,N})}.  (2.3.7)
If the entire history of {X_t} were known, that is, if the observation
were {x_j; j ∈ J_{−∞,N}}, we would have
  x̂_N^{(1)}(m) = E{Σ_{j=0}^{q} β_j Z_{N+m−j}/(X_j = x_j; j ∈ J_{−∞,N})}.  (2.3.8)
Note that {X_j = x_j; j ∈ J_{−∞,N}} uniquely determines {Z_j = z_j; j ∈ J_{−∞,N}},
and conversely, by using (2.3.1) and (2.3.3). Thus
  E{Z_{N+k}/(X_j = x_j; j ∈ J_{−∞,N})} = E{Z_{N+k}/(Z_j = z_j; j ∈ J_{−∞,N})}
    = 0,  (2.3.9)
for k ∈ J_{1,∞}, since the {Z_t} process is an uncorrelated sequence.
Thus (2.3.8) becomes
  x̂_N^{(1)}(m) = Σ_{j=m}^{q} β_j z_{N+m−j}.  (2.3.10)
Remembering that c_r decreases exponentially in r, we can approximate
z_{N−k} by
  z_{N−k} ≈ x_{N−k} + Σ_{j=1}^{N−k−1} c_j x_{N−k−j}  (2.3.11)
for use in (2.3.10). Henceforth we assume that the entire history of
the {X_t} series is known up to time N so that the discussion is not
needlessly complicated.
Case 1: q = m = 1
From (2.3.10),
  x̂_N^{(1)}(1) = βz_N,  (2.3.12)
writing β = β_1. The error is
  MSE_1 = E{[X_{N+1} − x̂_N^{(1)}(1)]^2}
    = E{(X_{N+1} − βZ_N)^2}
    = σ_Z^2.  (2.3.13)
If β is unknown, and one uses the estimator β̂, we have
  x̂_N^{(2)}(1) = β̂ẑ_N,  (2.3.14)
where from (2.3.3)
  Z_t = X_t + Σ_{j=1}^{∞} (−β)^j X_{t−j},  (2.3.15)
and so
  ẑ_N = x_N + Σ_{j=1}^{∞} (−β̂)^j x_{N−j}
    ≈ x_N + Σ_{j=1}^{N−1} (−β̂)^j x_{N−j}.  (2.3.16)
From previous arguments we know that
  MSE_2 − MSE_1 = E{(β̂Ẑ_N − βZ_N)^2}.  (2.3.17)
If we assume that β̂ has sampling error of order N^{−1/2}, which
is the case for each of the estimation procedures considered above,
we have
  MSE_2 − MSE_1 = E{[Z_N(δβ̂) + β(δẐ_N) + (δβ̂)(δẐ_N)]^2}
    = E{[Z_N(δβ̂)]^2} + β^2 E{(δẐ_N)^2} + o(N^{−1}),  (2.3.18)
where δβ̂ = β̂ − β and δẐ_N = Ẑ_N − Z_N.
Note that it follows from (2.3.9) that if we use only {x_j; j ∈ J_{1,N−1}}
in calculating β̂, then β̂ and Z_N are uncorrelated. Then
  E{[Z_N(δβ̂)]^2} = E{Z_N^2} E{(δβ̂)^2}
    = σ_Z^2 E{(δβ̂)^2}.  (2.3.19)
The second term on the right-hand side of (2.3.18) follows from
  E{(δẐ_N)^2} = E{[Σ_{j=1}^{N−1} (−1)^j (β̂^j − β^j) X_{N−j}]^2}
    = E{[Σ_{j=1}^{N−1} (−1)^j j β^{j−1} (δβ̂) X_{N−j}]^2} + o(N^{−1}).  (2.3.20)
The value of (2.3.20) depends on the specific form of β̂. However, the
expression will be complicated for all estimates considered. One can
obtain an approximation for (2.3.20) by replacing (δβ̂)^2 by its expected
value to get
  E{(δẐ_N)^2} ≈ E{(δβ̂)^2}[γ_0 Σ_{j=1}^{∞} j^2 β^{2(j−1)} − 2γ_1 Σ_{j=1}^{∞} j(j+1) β^{2j−1}]
    = E{(δβ̂)^2} σ_Z^2 (1−β^2)^{−1}.  (2.3.21)
Substitution of (2.3.19) and (2.3.21) into (2.3.18) yields
  MSE_2 − MSE_1 = E{(δβ̂)^2} σ_Z^2 [1 + β^2(1−β^2)^{−1}]
    = E{(δβ̂)^2} σ_Z^2 (1−β^2)^{−1}.  (2.3.22)
Whittle [33] has shown that the maximum likelihood estimator for β has
asymptotic variance N^{−1}(1−β^2), so that, at best, the increase in the
mean square error of prediction is given by
  MSE_2 − MSE_1 = N^{−1} σ_Z^2.  (2.3.23)
A strong argument can be made for using the inefficient estimator
given by substituting sample correlations into (2.3.2) and solving for β;
the estimation equation is thus
  r_1 = C_1 C_0^{−1} = β̂(1+β̂^2)^{−1}.  (2.3.24)
Only one of the solutions to this quadratic will be less than unity.
The estimator is easy to calculate, and even if the efficiency is only
0.1, the additional error given by (2.3.22) is negligible for large N.
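Solving the quadratic (2.3.24) for the invertible root can be sketched as follows (function name illustrative; note that |r_1| ≤ 1/2 is the attainable range for an MA(1)):

```python
# Moment estimator for an MA(1): solve r1 = beta/(1 + beta^2)
# for the root with |beta| < 1, as in (2.3.24).
import math

def ma1_beta_from_r1(r1):
    """Return the invertible solution of r1 = beta/(1+beta^2)."""
    if r1 == 0.0:
        return 0.0
    if abs(r1) > 0.5:
        raise ValueError("|r1| must not exceed 0.5 for an MA(1)")
    # beta^2 - beta/r1 + 1 = 0  ->  beta = (1 - sqrt(1 - 4 r1^2)) / (2 r1)
    return (1.0 - math.sqrt(1.0 - 4.0 * r1 * r1)) / (2.0 * r1)

# Example: beta = 0.5 gives r1 = 0.5/1.25 = 0.4; inverting recovers it.
print(ma1_beta_from_r1(0.4))  # -> 0.5 (up to rounding)
```

The discarded second root is the reciprocal 1/β̂, which violates invertibility.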
Case 2: q, m general (m ≤ q)
Using the m-step predictor given in (2.3.10), the mean square
error is
  MSE_1 = E{[X_{N+m} − Σ_{j=0}^{q−m} β_{j+m} Z_{N−j}]^2}
    = σ_Z^2(1 + Σ_{j=1}^{m−1} β_j^2).  (2.3.25)
When the estimator
  x̂_N^{(2)}(m) = Σ_{j=m}^{q} β̂_j ẑ_{N+m−j}  (2.3.26)
is used, the increase in prediction error is given by
  MSE_2 − MSE_1 = E{[x̂_N^{(1)}(m) − x̂_N^{(2)}(m)]^2}
    = E{[Σ_{j=m}^{q} (Z_{N+m−j}(δβ̂_j) + β_j(δẐ_{N+m−j}))]^2} + o(N^{−1}).  (2.3.27)
By using only {x_j; j ∈ J_{1,N−q}} in the calculation of {β̂_j; j ∈ J_{1,q}},
the cross product terms in (2.3.27) vanish, leaving
  MSE_2 − MSE_1 = E{[Σ_{j=m}^{q} Z_{N+m−j}(δβ̂_j)]^2} + E{[Σ_{j=m}^{q} β_j(δẐ_{N+m−j})]^2}
    + o(N^{−1}).  (2.3.28)
The first term on the right-hand side of (2.3.28) is easily
evaluated as
  E{[Σ_{j=m}^{q} Z_{N+m−j}(δβ̂_j)]^2} = σ_Z^2 Σ_{j=m}^{q} E{(δβ̂_j)^2}.  (2.3.29)
The second term, however, is complicated by the fact that the {X_t}
process is q-dependent; that is to say, {γ_j; j ∈ J_{−q,q}} are all nonzero.
If we apply the methods used in Case 1,
  E{[Σ_{j=m}^{q} β_j(δẐ_{N+m−j})]^2} = E{[Σ_{j=m}^{q} β_j Σ_{i=1}^{∞} (δĉ_i) X_{N+m−j−i}]^2},  (2.3.30)
where {ĉ_i; i ∈ J_{1,∞}} are obtained from (2.3.4), using {η̂_j; j ∈ J_{1,q}}
as the roots of
  z^q + β̂_1 z^{q−1} + ... + β̂_q = 0.  (2.3.31)
All that could be gleaned from (2.3.30) is the fact that an
increase in q will result in an increase in the prediction-error
increase given by (2.3.27). But since no more concise expression could
be obtained for (2.3.30), no approximation was obtained for (2.3.27).
The argument for using estimators which are easily calculated,
but perhaps inefficient, still applies, of course. But the determination
of N in order that the increase be bounded by a given quantity must be
accomplished by numerical methods.
2.4 The Hybrid Model
If {X_t; t ∈ J_{−∞,∞}} is a stationary time series satisfying the
relationship
  X_t + α_1 X_{t−1} + ... + α_p X_{t−p} = Z_t + β_1 Z_{t−1} + ... + β_q Z_{t−q}  (2.4.1)
for every t, then {X_t} is said to be a hybrid model of order (p,q).
Estimation of the parameter set {α_1, ..., α_p; β_1, ..., β_q} for the
hybrid model is the most difficult of those considered. The fact that
the sample covariance function is an inefficient estimator of the true
covariance function contributes to this difficulty. The maximum likelihood
estimators are again intractable, but their asymptotic variance-covariance
matrix can be used for calculations of efficiency.
Durbin [11,12] again makes use of an infinite order autoregressive
representation of {X_t}. The problems with the procedure are similar
to those mentioned in 2.3: the choice of truncation point for the
autoregressive representation, the choice of N, and the cumbersome
iterative procedure used in the calculations. However, the efficiency
of the estimators can be made as close to unity as desired by taking
the truncation point and N large.
Walker [32] approaches the problem via the covariance function;
by finding an efficient estimator of this function one can efficiently
estimate the hybrid model parameters. The following relationships are
used:
  E{(Σ_{j=0}^{p} α_j X_{t−j})(Σ_{j=0}^{p} α_j X_{s−j})} = E{(Σ_{j=0}^{q} β_j Z_{t−j})(Σ_{j=0}^{q} β_j Z_{s−j})}  (2.4.2)
implies that
  Σ_{j=0}^{p} Σ_{i=0}^{p} α_j α_i γ_{(t−s)+(i−j)} = σ_Z^2 Σ_{j=|t−s|}^{q} β_j β_{j−|t−s|},  (2.4.3)
where α_0 = β_0 = 1 and |t−s| ≤ q. This system of equations gives
{α_j; j ∈ J_{1,p}} and {β_j; j ∈ J_{1,q}} in terms of the covariance function
{γ_j}. In order that the solutions to (2.4.3) be unique, one must require
that the roots of
  z^q + β_1 z^{q−1} + ... + β_q = 0  (2.4.4)
have modulus less than unity.
Also, after multiplication of (2.4.1) by X_{t−r} for r ∈ J_{q+1,q+p},
taking expected values yields
  Σ_{j=0}^{p} α_j γ_{r−j} = 0,  (2.4.5)
using the fact that X_t is uncorrelated with the sequence
{Z_{t+j}; j ∈ J_{1,∞}}. Walker finds efficient estimators of {γ_j}, and then
uses (2.4.3) and (2.4.5) to calculate estimates of the hybrid model
parameters. An iterative procedure is required, but the resulting
estimators are asymptotically normal and fully efficient.
Hannan [16] achieves the same goal by a different method:
that of estimating the spectral density and using this to get estimates
of the parameters. The effort required is substantial, but the estimators
are asymptotically normal and efficient.
The predictor of X_{N+m} given the observation {x_j; j ∈ J_{−∞,N}} is
  x̂_N^{(1)}(m) = E{X_{N+m}/(X_j = x_j; j ∈ J_{−∞,N})}
    = −Σ_{j=1}^{p} α_j x̂_N^{(1)}(m−j) + Σ_{j=m}^{q} β_j z_{N+m−j}  if m ∈ J_{1,q}
    = −Σ_{j=1}^{p} α_j x̂_N^{(1)}(m−j)  if m ∈ J_{q+1,∞},  (2.4.6)
where
  x̂_N^{(1)}(j) = x̂_N^{(1)}(j)  if j ∈ J_{1,∞}
    = x_{N+j}  if j ∈ J_{−∞,0}.  (2.4.7)
Note that the values of {z_j; j ∈ J_{N+m−q,N}} are calculated from
{x_j; j ∈ J_{−∞,N}}, using the infinite order autoregressive representation.
When only {x_j; j ∈ J_{1,N}} is observed, we may approximate these z's
by truncating the autoregressive representation after N terms.
When the parameters are unknown, the predictor used is
  x̂_N^{(2)}(m) = −Σ_{j=1}^{p} α̂_j x̂_N^{(2)}(m−j) + Σ_{j=m}^{q} β̂_j ẑ_{N+m−j}  if m ∈ J_{1,q}
    = −Σ_{j=1}^{p} α̂_j x̂_N^{(2)}(m−j)  if m ∈ J_{q+1,∞},  (2.4.8)
where x̂_N^{(2)}(j) is calculated just as x̂_N^{(1)}(j) with the parameters replaced
by their estimators, and ẑ_j is calculated using the autoregressive
representation with estimated parameters.
Case 1: p = q = m = 1
The time series {X_t; t ∈ J_{−∞,∞}} satisfies
  X_t − αX_{t−1} = Z_t + βZ_{t−1},  (2.4.9)
writing α = −α_1 and β = β_1 in (2.4.1).
From (2.4.6),
  x̂_N^{(1)}(1) = αx_N + βz_N.  (2.4.10)
The autoregressive representation of Z_N is
  Z_N = Σ_{j=0}^{∞} (−β)^j (X_{N−j} − αX_{N−j−1})
    = X_N − (α+β) Σ_{j=1}^{∞} (−β)^{j−1} X_{N−j}.  (2.4.11)
Noticing that the autoregressive parameters (the coefficients of the
X_t's) on the right-hand side of (2.4.11) decrease exponentially,
we will use
  z_N ≈ x_N − (α+β) Σ_{j=1}^{N−1} (−β)^{j−1} x_{N−j}  (2.4.12)
in (2.4.10) to calculate x̂_N^{(1)}(1). The mean square error of prediction
for (2.4.10) is
  MSE_1 = E{[X_{N+1} − x̂_N^{(1)}(1)]^2}
    = E{[X_{N+1} − (αX_N + βZ_N)]^2}
    = σ_Z^2.  (2.4.13)
When α and β are estimated by α̂ and β̂, respectively, we use
  x̂_N^{(2)}(1) = α̂x_N + β̂ẑ_N.  (2.4.14)
The increase in prediction error is
  MSE_2 − MSE_1 = E{[αX_N + βZ_N − (α̂X_N + β̂Ẑ_N)]^2}.  (2.4.15)
The term Ẑ_N greatly complicates this expression; but if we assume
Ẑ_N ≈ Z_N,
  MSE_2 − MSE_1 ≈ E{[(α̂−α)X_N + (β̂−β)Z_N]^2}
    = γ_0 E{(α̂−α)^2} + σ_Z^2 E{(β̂−β)^2}
    + 2σ_Z^2 E{(α̂−α)(β̂−β)},  (2.4.16)
where
  (i) X_N and α̂ are assumed approximately uncorrelated;
  (ii) (α̂,β̂) are calculated using only {x_j; j ∈ J_{1,N−1}} so that
β̂ and Z_N are uncorrelated; and, using the fact that {X_t} can be represented
as an infinite order moving average process,
  (iii) E{X_N Z_N} = E{[Z_N + Σ_{j=1}^{∞} e_j Z_{N−j}] Z_N}
    = σ_Z^2.  (2.4.17)
To obtain {e_j; j ∈ J_{1,∞}} in terms of (α,β), note that
  X_t = Σ_{j=0}^{∞} α^j (Z_{t−j} + βZ_{t−j−1})
    = Z_t + (α+β) Σ_{j=1}^{∞} α^{j−1} Z_{t−j},  (2.4.18)
so that e_j = (α+β)α^{j−1}.
One can derive an estimator for α from equation (2.4.5):
  γ_r − αγ_{r−1} = 0  (2.4.19)
for every r ∈ J_{2,∞}. With this in mind, we define Σ* as the asymptotic
(k×k) covariance matrix with (i,j) element
  σ*_{ij} = E{(δC*_i)(δC*_j)},  (2.4.20)
where
  C*_i = C_{i+1} C_i^{−1}  (2.4.21)
for i ∈ J_{1,k}. By letting
  δC_{i+1} = C_{i+1} − γ_{i+1}
    = C_{i+1} − αγ_i  (2.4.22)
and
  δC_i = C_i − γ_i,  (2.4.23)
we have
  δC*_j = C*_j − α = (C_{j+1} − αC_j) C_j^{−1}
    = (δC_{j+1} − αδC_j)(δC_j + γ_j)^{−1}
    = γ_j^{−1}(δC_{j+1} − αδC_j) + o_p(N^{−1/2}).  (2.4.24)
But Bartlett [1] has shown that C_j is consistent for γ_j, so that
(2.4.24) implies that C*_j is consistent for α.
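The consistency of the ratio estimators C*_j = C_{j+1}/C_j can be checked numerically. The sketch below, with all function names illustrative, uses an equally weighted average of the C*_j rather than the optimally weighted combination developed in Theorem 2.4.1 below (a deliberate simplification):

```python
# Numerical check that C*_j = C_{j+1}/C_j is consistent for alpha in
# the hybrid model X_t - alpha*X_{t-1} = Z_t + beta*Z_{t-1}.
# Equal weights are used here instead of the optimal weights d_0.
import random

def simulate_arma11(alpha, beta, sigma, n, rng):
    xs, x, z_prev = [], 0.0, 0.0
    for _ in range(n + 200):             # discard burn-in below
        z = rng.gauss(0.0, sigma)
        x = alpha * x + z + beta * z_prev
        z_prev = z
        xs.append(x)
    return xs[200:]

def sample_cov(xs, j):
    m = sum(xs) / len(xs)
    return sum((xs[t] - m) * (xs[t + j] - m)
               for t in range(len(xs) - j)) / len(xs)

def alpha_ratio_estimate(xs, k=3):
    """Average of C_{j+1}/C_j over j = 1, ..., k."""
    cs = [sample_cov(xs, j) for j in range(1, k + 2)]
    return sum(cs[j + 1] / cs[j] for j in range(k)) / k

rng = random.Random(3)
xs = simulate_arma11(0.6, 0.4, 1.0, 20000, rng)
print(alpha_ratio_estimate(xs))  # should be near alpha = 0.6
```

With the γ_s of (2.4.25) decaying geometrically at rate α, each ratio converges to α as N grows.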
Furthermore, from (2.4.18) we find
  γ_s = σ_Z^2[1 + (α+β)^2(1−α^2)^{−1}]  if s = 0
    = σ_Z^2 a α^{s−1}  if s ≠ 0,  (2.4.25)
where
  a = (α+β)(1+αβ)(1−α^2)^{−1}.  (2.4.26)
Defining
  λ_{r,r+s} = E{(δC_r)(δC_{r+s})},  (2.4.27)
Bartlett [1] has shown that
  Nλ_{r,r+s} = γ_0^2 Σ_{j=−∞}^{∞} (ρ_j ρ_{j+s} + ρ_{j−r} ρ_{j+r+s}) + o(1).  (2.4.28)
Remembering that ρ_j = γ_j γ_0^{−1}, we use (2.4.25) to get
  Σ_j ρ_j ρ_{j+s} = 1 + 2a^2(1−α^2)^{−1}  if s = 0
    = α^{s−2}[2a^2 α^2(1−α^2)^{−1} + (s−1)a^2 + 2aα]  if s ≠ 0.  (2.4.29)
We now state
Theorem 2.4.1:
The estimator
  α̂_0 = Σ_{j=1}^{k} d_{j0} C*_j,  (2.4.30)
where
  d_0^T = (d_{10}, d_{20}, ..., d_{k0})  (2.4.31)
    = φ^T Σ*^{−1}[φ^T Σ*^{−1} φ]^{−1},  (2.4.32)
with φ^T = (1,1,...,1), is consistent and has minimum mean square
error among the estimators
  α̂ = Σ_{j=1}^{k} d_j C*_j  (2.4.33)
with d^T φ = 1. This error is given by
  MSE(α̂_0) = [φ^T Σ*^{−1} φ]^{−1},  (2.4.34)
where
  σ*_{r,r+s} = N^{−1}(a α^r)^{−2}(1 + α^2 − 4aα + 2a^2)  if s = 0
    = N^{−1}(a α^r)^{−2}(2aα − a^2 − α^2)  if s = 1
    = N^{−1}(a α^r)^{−2}(1−α^2)α^s  if s ∈ J_{2,∞}.  (2.4.35)
Proof:
That α̂_0 is consistent for α follows immediately from the fact
that C*_j is consistent for α ((2.4.24)) and d_0^T φ = 1.
Now if α̂ is of form (2.4.33),
  MSE(α̂) = d^T Σ* d.  (2.4.36)
In order to minimize this with the restriction d^T φ = 1, we form the
Lagrange equation
  L = d^T Σ* d − 2λ(d^T φ − 1).  (2.4.37)
But ∂L/∂d = 0 implies
  d_0^T Σ* − λφ^T = 0.  (2.4.38)
Using d_0^T φ = 1, it follows that
  λ = [φ^T Σ*^{−1} φ]^{−1}  (2.4.39)
and hence
  d_0^T = φ^T Σ*^{−1}[φ^T Σ*^{−1} φ]^{−1},  (2.4.40)
as was to be shown.
Substituting (2.4.40) into (2.4.36), we obtain
  MSE(α̂_0) = [φ^T Σ*^{−1} φ]^{−1}.  (2.4.41)
Finally, noting from (2.4.24) that
  σ*_{r,r+s} = E{(δC*_r)(δC*_{r+s})}
    = (γ_r γ_{r+s})^{−1}[λ_{r+1,r+s+1} − α(λ_{r,r+s+1} + λ_{r+1,r+s}) + α^2 λ_{r,r+s}],  (2.4.42)
substitution of (2.4.28) and (2.4.29) into (2.4.42) yields (2.4.35),
which completes the proof.
Table 2.4.1 gives efficiencies for α̂_0 for several values of
α and β; three values of k (the number of sample covariances used in
the calculation of α̂_0) are given. All calculations were carried out
on the University of Florida's IBM 360 computer. Also, efficiencies are
calculated with respect to the variance of the maximum likelihood
estimator of α given by Whittle [appendix of 35] as
  v_W = N^{−1}(1−α^2)(1+αβ)^2(α+β)^{−2}.  (2.4.43)
The table shows that the efficiency is especially sensitive to
changes in β; as β tends to unity, it becomes necessary to use more
sample covariances to achieve high efficiency.
Table 2.4.1
Efficiencies of α̂_0 = Σ_{j=1}^{k} d_{j0} C*_j

  k = 2                    β
            .1     .3     .5     .7     .9
    .1    0.999  0.999  0.999  1.000  1.000
    .3    0.952  0.961  0.972  0.983  0.994
  α .5    0.791  0.840  0.888  0.934  0.979
    .7    0.591  0.687  0.780  0.871  0.958
    .9    0.427  0.551  0.680  0.809  0.937

  k = 4                    β
            .1     .3     .5     .7     .9
    .1    1.000  1.000  1.000  1.000  1.000
    .3    0.999  0.999  0.999  1.000  1.000
  α .5    0.967  0.975  0.983  0.990  0.997
    .7    0.830  0.877  0.918  0.954  0.986
    .9    0.632  0.731  0.819  0.897  0.968

  k = 6                    β
            .1     .3     .5     .7     .9
    .1    1.000  1.000  1.000  1.000  1.000
    .3    1.000  1.000  1.000  1.000  1.000
  α .5    0.996  0.997  0.998  0.999  1.000
    .7    0.931  0.951  0.968  0.982  0.995
    .9    0.746  0.820  0.882  0.935  0.980
In practice, one would need to estimate Σ* (using (2.4.42)) in order
to calculate α̂_0. Thus, though the effort of calculation is somewhat
reduced, an iterative method must be used. We can obtain an upper
bound to the mean square error of prediction for
  x̂_N^{(2)}(1) = α̂_0 x_N + β̂ẑ_N  (2.4.44)
by considering
  x̂_N^{(3)}(1) = α̂_0 x_N.  (2.4.45)
The corresponding increase in prediction error is
  MSE_3 − MSE_1 = E{[(α̂_0 − α)X_N + βZ_N]^2}
    ≈ σ_Z^2[β^2 + K^{−1}(1+αβ)^2],  (2.4.46)
where we assume K observations {x_j; j ∈ J_{1,K}} have been used in the
calculation of α̂_0, with K << N, in order that the assumption that X_N
and α̂_0 are uncorrelated is more plausible.
In fact, the use of x̂_N^{(3)}(m) as a predictor of X_{N+m} may not be
unreasonable as long as β is small, thus saving the rather tedious
calculation of ẑ_N.
Case 2: p = q = 1; m general
The m-step predictor is given by
  x̂_N^{(1)}(m) = α^m x_N + α^{m−1} β z_N,  (2.4.47)
with mean square error
  MSE_1 = σ_Z^2  if m = 1
    = σ_Z^2[1 + (α+β)^2(1−α^{2(m−1)})(1−α^2)^{−1}]  if m ∈ J_{2,∞}.  (2.4.48)
If we use
  x̂_N^{(2)}(m) = α̂^m x_N + α̂^{m−1} β̂ ẑ_N,  (2.4.49)
the increase in prediction error will be
  MSE_2 − MSE_1 = E{[(α̂^m − α^m)X_N + (α̂^{m−1}β̂ − α^{m−1}β)Ẑ_N]^2}
    = E{[mα^{m−1}(δα̂)X_N + (α^{m−1}(δβ̂) + (m−1)α^{m−2}β(δα̂))Z_N]^2} + o(N^{−1})
    = m^2 α^{2(m−1)}(γ_0 + σ_Z^2) E{(δα̂)^2} + α^{2(m−1)} σ_Z^2 E{(δβ̂)^2}
    + 2m α^{2(m−1)}(1+β) σ_Z^2 E{(δα̂)(δβ̂)},  (2.4.50)
where we have replaced Ẑ_N by Z_N, and assumed (α̂,β̂) to be uncorrelated
with X_N.
We can again obtain an upper bound for this rather complicated
expression by considering
  x̂_N^{(3)}(m) = α̂_0^m x_N.  (2.4.51)
Then MSE_2 − MSE_1 is bounded above by
  σ_Z^2 α^{2(m−1)}[β^2 + K^{−1} m^2(1+αβ)^2].  (2.4.52)
In addition to the comments applied to this quantity in Case 1, notice
that the increase becomes less significant as m is increased.
Case 3: p general; q = m = 1
The representation of {X_t; t ∈ J_{−∞,∞}} is now, from (2.4.1),
  X_t + α_1 X_{t−1} + ... + α_p X_{t−p} = Z_t + βZ_{t−1}.  (2.4.53)
The predictor is
  x̂_N^{(1)}(1) = −Σ_{j=1}^{p} α_j x_{N+1−j} + βz_N,  (2.4.54)
and the error of prediction is
  MSE_1 = E{[X_{N+1} + Σ_{j=1}^{p} α_j X_{N+1−j} − βZ_N]^2}
    = σ_Z^2.  (2.4.55)
When {α_j; j ∈ J_{1,p}} are unknown, estimators can be obtained
from (2.4.2) and (2.4.5); we have
  Σ_{j=0}^{p} α̂_j C_{r−j} = 0  (2.4.56)
for r ∈ J_{2,p+1} (α̂_0 = 1), and (2.4.3) for estimating the parameters.
One can also extend the method presented in Theorem 2.4.1 to this case,
but the analysis becomes increasingly complicated as p increases. The
determination of estimates from (2.4.56) and (2.4.3) is probably easiest,
and the order of the error increase is N^{−1} even though the estimators
are inefficient.
The cases with p, q, and m general are analytically intractable
insofar as evaluation of the increase in prediction error due to parameter
estimation is concerned. But the study of the simple cases has shown
that primary consideration should be given to the construction of
estimators which are consistent and have mean square error of order
N^{−1}. As long as N is large, high efficiency is not a necessary
condition for small mean square error of prediction.
CHAPTER III
THE LOGISTIC GROWTH PROCESS
3.1 Introduction
Let {X(t); t ∈ I_{0,∞}} be a time series such that
  E{X(t)/X(0) = n} = n(1+α)(1+αp^t)^{−1},  (3.1.1)
where α > 0, n ∈ J_{1,∞}, and p ∈ (0,1). Then {X(t)} is said to be a
growth process following the logistic law, or simply a logistic growth
process. This process receives wide application in animal population
models and various economic models; the representation varies slightly,
depending upon the particular application, but with a little adjustment
each can be written as (3.1.1).
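The behaviour of the mean function (3.1.1) can be sketched directly (function name and parameter values are illustrative only):

```python
# The logistic mean function (3.1.1):
#   E{X(t) | X(0) = n} = n (1 + alpha) / (1 + alpha * p**t),
# with alpha > 0 and 0 < p < 1.  As t -> infinity the mean rises from
# n and saturates at the ceiling n * (1 + alpha).
def logistic_mean(t, n, alpha, p):
    return n * (1.0 + alpha) / (1.0 + alpha * p ** t)

print(logistic_mean(0, 10, 4.0, 0.5))                # -> 10.0 (initial size)
print(round(logistic_mean(50, 10, 4.0, 0.5), 6))     # near the ceiling 50
```

The parameter p controls how quickly the curve climbs toward its ceiling; α controls the ratio of ceiling to initial size.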
The problems to be considered are:
(i) the estimation of a and p in (3.1.1), and
(ii) the prediction of future values of X(t) when a and p
are estimated.
Several observational possibilities will be considered. The
process may be observed periodically, at equidistant time points; it
may be observed continuously during a given time interval; or only
the times at which the process increases (assuming all increases are
discrete) may be observed.
3.2 Probability Distribution of {X(t)}
The probability structure of a growth process may be specified
in any of an infinite number of ways. But in reality these processes
are often used with no reference to such a structure; the parameter
estimation problem is considered without reference to error.
We shall use the following probabilistic model for {X(t)}:
let {X(t); t ∈ I_{0,∞}} be a Markov process with
  P{X(t+δt) = j/X(t) = i} = iλ(t)δt + o(δt)  if j = i+1
    = 1 − iλ(t)δt + o(δt)  if j = i
    = o(δt)  otherwise,  (3.2.1)
where δt is an infinitesimal time interval. This model obviously
requires the assumption that {X(t)} grows in equivalent discrete units
(economic units, population units, etc.), and that the probability of
a one-unit increase at any given time is proportional to the size of
the process at that time. The reader will recognize (3.2.1) as the
common definition given a birth-death process with the death rate zero
for all t. Kendall [19] has shown that the above representation is
the birth-death model which minimizes the variance of {X(t)}.
Defining
  p_{ij}(s,t) = P{X(t) = j/X(s) = i},  (3.2.2)
an application of the Kolmogorov equations yields
  p_{ij}(s,t+δt) = (j−1)λ(t)δt p_{i,j−1}(s,t) + [1 − jλ(t)δt] p_{ij}(s,t) + o(δt)  if j ∈ J_{i+1,∞}
    = [1 − iλ(t)δt] p_{ij}(s,t) + o(δt)  if j = i.  (3.2.3)
Thus
  ∂/∂t [p_{ij}(s,t)] = (j−1)λ(t) p_{i,j−1}(s,t) − jλ(t) p_{ij}(s,t)  if j ∈ J_{i+1,∞}
    = −jλ(t) p_{ij}(s,t)  if j = i.  (3.2.4)
Multiplication of (3.2.4) by j and summation over all j gives
  Σ_{j=i}^{∞} j ∂/∂t [p_{ij}(s,t)] = λ(t)[Σ_{j=i+1}^{∞} j(j−1) p_{i,j−1}(s,t) − Σ_{j=i}^{∞} j^2 p_{ij}(s,t)]
    = λ(t) Σ_{j=i}^{∞} j p_{ij}(s,t).  (3.2.5)
Defining
  m_i(s,t) = Σ_{j=i}^{∞} j p_{ij}(s,t),  (3.2.6)
we rewrite (3.2.5), obtaining
  ∂/∂t [m_i(s,t)] = λ(t) m_i(s,t).  (3.2.7)
Solving,
  m_i(s,t) = i exp{∫_s^t λ(u) du},  (3.2.8)
using the initial condition
  m_i(s,s) = i.  (3.2.9)
The function m_i(s,t) shall be called the mean function of the growth
process {X(t)}, since obviously
  m_i(s,t) = E{X(t)/X(s) = i}.  (3.2.10)
From (3.1.1) we know that
  m_n(0,t) = n(1+α)(1+αp^t)^{−1},  (3.2.11)
where n is the number of units at time zero. Equating (3.2.11) and
(3.2.8) (for s = 0) implies that, for the logistic growth process,
  λ(t) = αθ(α + p^{−t})^{−1},  (3.2.12)
where
  θ = −log p.  (3.2.13)
We now state
Theorem 3.2.1:
If {X(t); t ∈ I_{0,∞}} is a logistic growth process whose probability
structure is (3.2.1), then
  P{X(t) = j/X(s) = i} = C(j−1, j−i) T_{s,t}^i (1 − T_{s,t})^{j−i}  if j ∈ J_{i,∞}
    = 0  otherwise,  (3.2.14)
where C(·,·) denotes the binomial coefficient and
  T_{s,t} = (1 + αp^t)(1 + αp^s)^{−1}.  (3.2.15)
Proof:
The proof is a special case of that given by Kendall [19] for
the general birth-death process. We let λ(t) be defined by (3.2.12) and
μ(t), the death rate, be zero to obtain the density given in (3.2.14).
The rigorous proof of this is given for the general case in many places,
and so will not be reproduced here.
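Sample paths of the pure-birth specification (3.2.1) with the logistic rate (3.2.12) can be generated by thinning, since λ(t) is decreasing in t and is therefore bounded by λ(0). The sketch below is illustrative (all names are assumptions, not from the text), and checks the simulated mean against (3.1.1):

```python
# Simulate the logistic pure-birth process of (3.2.1) by thinning:
# each of the X(t) individuals splits at rate
#   lambda(t) = alpha*theta / (alpha + p**(-t)),  theta = -log(p),
# which decreases in t, so lambda(0) bounds it from above.
import math
import random

def lam(t, alpha, p):
    theta = -math.log(p)
    return alpha * theta / (alpha + p ** (-t))

def simulate_logistic_growth(n, alpha, p, t_end, rng):
    """Return X(t_end) given X(0) = n."""
    t, x = 0.0, n
    lam_max = lam(0.0, alpha, p)         # dominating individual rate
    while True:
        t += rng.expovariate(x * lam_max)   # next proposed event
        if t >= t_end:
            return x
        if rng.random() < lam(t, alpha, p) / lam_max:
            x += 1                          # accepted birth

rng = random.Random(7)
alpha, p, n, t_end = 4.0, 0.5, 10, 6.0
mean = sum(simulate_logistic_growth(n, alpha, p, t_end, rng)
           for _ in range(2000)) / 2000.0
theory = n * (1 + alpha) / (1 + alpha * p ** t_end)   # (3.1.1)
print(round(theory, 2), round(mean, 2))  # the two should be close
```

Since x is constant between events, the homogeneous proposal rate x·λ(0) dominates the true population rate x·λ(t), which is what makes the thinning step valid.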
3.3 Equispaced Observation
Assume that N observations are made on the logistic growth
process {X(t); t ∈ I_{0,∞}} at equispaced intervals of length Δ. Without
losing any generality in the results to be presented, we henceforth
assume Δ = 1. Thus {X(j); j ∈ J_{1,N}} will be observed, with X(0) = n
known.
Writing the set of observations as {x_j; j ∈ J_{1,N}}, the likelihood
function for this set is
  L[α,p; (x_j; j ∈ J_{1,N})] = P{(X_j = x_j; j ∈ J_{1,N})/(α,p,n)}.  (3.3.1)
Using the fact that {X(t)} is a Markov process and applying Theorem
3.2.1,
  L[α,p; (x_j; j ∈ J_{1,N})] = Π_{j=1}^{N} P{X(j) = x_j/X(j−1) = x_{j−1}}
    = Π_{j=1}^{N} C(x_j − 1, x_j − x_{j−1}) T_{j−1,j}^{x_{j−1}} (1 − T_{j−1,j})^{x_j − x_{j−1}},  (3.3.2)
where x_0 = n.
Taking the logarithm of (3.3.2), and replacing T_{j−1,j} by the
defining equation (3.2.15), one gets
  log L = Σ_{j=1}^{N} log C(x_j − 1, x_j − x_{j−1}) + Σ_{j=1}^{N} x_{j−1}[log(1+αp^j) − log(1+αp^{j−1})]
    + Σ_{j=1}^{N} (x_j − x_{j−1})[log α(p^{j−1} − p^j) − log(1+αp^{j−1})],  (3.3.3)
writing L = L[α,p; (x_j; j ∈ J_{1,N})] for simplicity. Differentiation of
(3.3.3) with respect to α yields
  ∂ log L/∂α = n[p(1+αp)^{−1} − α^{−1}]
    + Σ_{j=1}^{N−1} x_j[p^{j+1}(1+αp^{j+1})^{−1} − p^{j−1}(1+αp^{j−1})^{−1}]
    + x_N[α^{−1} − p^{N−1}(1+αp^{N−1})^{−1}];  (3.3.4)
and differentiation with respect to p gives
  ∂ log L/∂p = n[(1−p)^{−1} + α(1+αp)^{−1}]
    + Σ_{j=1}^{N−1} x_j[(j+1)αp^j(1+αp^{j+1})^{−1} − p^{−1} − (j−1)αp^{j−2}(1+αp^{j−1})^{−1}]
    + x_N[(N−1)p^{−1} − (1−p)^{−1} − (N−1)αp^{N−2}(1+αp^{N−1})^{−1}].  (3.3.5)
Before considering (3.3.4) and (3.3.5) at greater length, let
{Y_i(t); t ∈ I_{0,∞}}, i ∈ J_{1,n}, be n independent logistic processes, where
Y_i(0) = 1 for each i, and each of the n processes shares the same parameters
(α,p) as {X(t)}. Defining, for s < t,
  H_{Y_i}(s,t; z,w) = Σ_{m=1}^{∞} Σ_{k=m}^{∞} P{Y_i(s) = m; Y_i(t) = k} z^k w^m,  (3.3.6)
one has
  H_{Y_i}(s,t; z,w) = Σ_{m=1}^{∞} P{Y_i(s) = m/Y_i(0) = 1} w^m Σ_{k=m}^{∞} P{Y_i(t) = k/Y_i(s) = m} z^k
    = Σ_{m=1}^{∞} T_{0,s}(1−T_{0,s})^{m−1} w^m Σ_{k=m}^{∞} C(k−1, k−m) T_{s,t}^m (1−T_{s,t})^{k−m} z^k
    = T_{0,s} δ(w,z;T_{s,t})[1 − (1−T_{0,s}) δ(w,z;T_{s,t})]^{−1},  (3.3.7)
where
  δ(w,z;T_{s,t}) = T_{s,t} wz[1 − z(1−T_{s,t})]^{−1}.  (3.3.8)
Similarly, defining
  H_X(s,t; z,w) = Σ_{m=n}^{∞} Σ_{k=m}^{∞} P{X(s) = m; X(t) = k/X(0) = n} z^k w^m,  (3.3.9)
one has
  H_X(s,t; z,w) = Σ_{m=n}^{∞} C(m−1, m−n) T_{0,s}^n (1−T_{0,s})^{m−n} w^m Σ_{k=m}^{∞} C(k−1, k−m) T_{s,t}^m (1−T_{s,t})^{k−m} z^k
    = [T_{0,s} δ(w,z;T_{s,t})]^n [1 − (1−T_{0,s}) δ(w,z;T_{s,t})]^{−n},  (3.3.10)
which implies that
  H_X(s,t; z,w) = [H_{Y_i}(s,t; z,w)]^n = Π_{i=1}^{n} H_{Y_i}(s,t; z,w).  (3.3.11)
Hence
  X(t) = Σ_{i=1}^{n} Y_i(t)  (t ∈ I_{0,∞}).  (3.3.12)
If we rewrite (3.3.3) for the random sequence
{X(j) = X_j; j ∈ J_{1,N}}, we get, apart from an additive term which does
not involve (α,p),
  log L(α,p; X_1, ..., X_N) = n φ_0(α,p) + Σ_{j=1}^{N} X_j φ_j(α,p),  (3.3.13)
where
  φ_j(α,p) = log{(1+αp)[α(1−p)]^{−1}}  if j = 0
    = log{(1+αp^{j+1})[p(1+αp^{j−1})]^{−1}}  if j ∈ J_{1,N−1}
    = log{αp^{N−1}(1−p)(1+αp^{N−1})^{−1}}  if j = N.  (3.3.14)
But in view of (3.3.12), this can be written
  log L(α,p; Y_1, Y_2, ..., Y_n) = n φ_0(α,p) + Σ_{j=1}^{N} Σ_{i=1}^{n} Y_{i,j} φ_j(α,p)
    = Σ_{i=1}^{n} z_i,  (3.3.15)
where
  Y_i^T = (Y_{i,1}, Y_{i,2}, ..., Y_{i,N}),  (3.3.16)
  Y_{i,j} = Y_i(j),  (3.3.17)
and
  z_i = φ_0(α,p) + Σ_{j=1}^{N} φ_j(α,p) Y_{i,j}.  (3.3.18)
It follows from the discussion leading up to (3.3.12) that
  z_i = log p(y_i; α,p),  (3.3.19)
where
  p(y_i; α,p) = P{Y_{i,1} = y_{i,1}, ..., Y_{i,N} = y_{i,N}}.
We are now ready to state and prove
Theorem 3.3.1: The equations
  ∂ log L(α,p; X_1, X_2, ..., X_N)/∂α = 0,
  ∂ log L(α,p; X_1, ..., X_N)/∂p = 0  (3.3.20)
have a solution (α̂,p̂) which is weakly consistent for (α,p). Furthermore,
the asymptotic distribution of [√n(α̂−α), √n(p̂−p)] as n → ∞ is
normal, with null mean vector and variance-covariance matrix Σ, where
  Σ^{−1} = [σ^{11} σ^{12}; σ^{12} σ^{22}],  (3.3.21)
with
  σ^{11} = p^2(1+αp)^{−2} − α^{−2}
    + Σ_{j=1}^{N−1} (1+α)(1+αp^j)^{−1}[p^{2(j+1)}(1+αp^{j+1})^{−2} − p^{2(j−1)}(1+αp^{j−1})^{−2}]
    + (1+α)(1+αp^N)^{−1}[α^{−2} − p^{2(N−1)}(1+αp^{N−1})^{−2}],  (3.3.22)
  σ^{12} = −(1+αp)^{−2}
    + Σ_{j=1}^{N−1} (1+α)(1+αp^j)^{−1}[(j−1)p^{j−2}(1+αp^{j−1})^{−2} − (j+1)p^j(1+αp^{j+1})^{−2}]
    + (1+α)(1+αp^N)^{−1}(N−1)p^{N−2}(1+αp^{N−1})^{−2},  (3.3.23)
and
  σ^{22} = α^2(1+αp)^{−2} − (1−p)^{−2}
    + Σ_{j=1}^{N−1} (1+α)(1+αp^j)^{−1}[(j+1)^2 α^2 p^{2j}(1+αp^{j+1})^{−2} − j(j+1)αp^{j−1}(1+αp^{j+1})^{−1}
    − p^{−2} + (j−1)(j−2)αp^{j−3}(1+αp^{j−1})^{−1} − (j−1)^2 α^2 p^{2(j−2)}(1+αp^{j−1})^{−2}]
    + (1+α)(1+αp^N)^{−1}[(N−1)(N−2)αp^{N−3}(1+αp^{N−1})^{−1}
    − (N−1)^2 α^2 p^{2(N−2)}(1+αp^{N−1})^{−2} + (1−p)^{−2} + (N−1)p^{−2}].  (3.3.24)
Proof:
Write p = p(y_i; θ_1, θ_2), θ_1 = α, and θ_2 = p for convenience.
Chanda [5] has shown that the following three conditions are sufficient
for the theorem to be true:
  (i) For all (θ_1,θ_2) ∈ Ω = {(0,∞) × (0,1)},
∂ log p/∂θ_i, ∂^2 log p/∂θ_i ∂θ_j, and ∂^3 log p/∂θ_i ∂θ_j ∂θ_k exist for almost all y_i,
and for all i,j,k ∈ J_{1,2}.
  (ii) Defining F(y) to be a summable function when
  Σ_{y_i} F(y_i) < ∞,  (3.3.25)
where Σ_{y_i} is the summation over almost all y_i, we must
have, for all (θ_1,θ_2) ∈ Ω,
  |∂p/∂θ_i| < F_i(y),  |∂^2 p/∂θ_i ∂θ_j| < F_{ij}(y),  and  |∂^3 log p/∂θ_i ∂θ_j ∂θ_k| < H_{ijk}(y_i),
where F_i(y) and F_{ij}(y) are summable functions, and
  Σ_{y_i} H_{ijk}(y_i) p(y_i; θ_1, θ_2) < M,
M being a finite positive constant. This condition
must be satisfied for i,j,k ∈ J_{1,2}.
  (iii) For all (θ_1,θ_2) ∈ Ω, the matrix J = ((J_{i,j}(θ_1,θ_2))),
where
  J_{i,j}(θ_1,θ_2) = Σ_{y_i} (∂ log p/∂θ_i)(∂ log p/∂θ_j) p(y_i; θ_1, θ_2),
is positive definite and |J| is finite.
  (i) Condition (i) will be established as soon as
∂φ_r/∂θ_i, ∂^2 φ_r/∂θ_i ∂θ_j, and ∂^3 φ_r/∂θ_i ∂θ_j ∂θ_k are shown to exist for r ∈ J_{0,N} and
i,j,k ∈ J_{1,2}. This follows from the form of log p(y_i; α,p), which
from (3.3.18) is
  log p(y_i; α,p) = Σ_{j=0}^{N} φ_j(α,p) y_{i,j},  (3.3.26)
where we define y_{i,0} = 1.
But by inspection of (3.3.14), one sees that φ_r(α,p) is the
logarithm of a continuous finite function for all (α,p) ∈ Ω. Thus
it is easily shown that derivatives up to third order exist in Ω.
(ii) It follows from (3.3.26) that
  p(y_i; α,ρ) = exp [Σ_{j=0}^N c_j(α,ρ) y_{i,j}].   (3.3.27)
Thus
  ∂p_i/∂θ_j = [Σ_{r=0}^N (∂c_r/∂θ_j) y_{i,r}] exp [Σ_{j=0}^N c_j(α,ρ) y_{i,j}]
   = [Σ_{r=0}^N (∂c_r/∂θ_j) y_{i,r}] p(y_i; θ_1,θ_2),   (3.3.28)
which is bounded by the summable function
  F_j(y_i) = [Σ_{r=0}^N |∂c_r/∂θ_j|] y_{i,N} p(y_i; θ_1,θ_2),   (3.3.29)
for from (i) the terms ∂c_r/∂θ_j are finite, and E{y_{i,N}} < ∞.
Similarly,
  ∂^2 p_k/∂θ_i∂θ_j = (∂/∂θ_i){[Σ_{r=0}^N (∂c_r/∂θ_j) y_{k,r}] p_k(y_k; θ_1,θ_2)}
   = {Σ_{r=0}^N (∂^2 c_r/∂θ_i∂θ_j) y_{k,r}
   + [Σ_{r=0}^N (∂c_r/∂θ_i) y_{k,r}][Σ_{s=0}^N (∂c_s/∂θ_j) y_{k,s}]} p_k(y_k; θ_1,θ_2),   (3.3.30)
which requires (i) and a finite second order moment of y_{k,N} in order to be bounded by the summable function
  F_{ij}(y_k) = {[Σ_{r=0}^N |∂^2 c_r/∂θ_i∂θ_j|] y_{k,N}
   + [Σ_{r=0}^N Σ_{s=0}^N |∂c_r/∂θ_i| |∂c_s/∂θ_j|] y_{k,N}^2} p(y_k; θ_1,θ_2).   (3.3.31)
Finally,
  ∂^3 log p_s/∂θ_i∂θ_j∂θ_k = Σ_{r=0}^N (∂^3 c_r/∂θ_i∂θ_j∂θ_k) y_{s,r}   (3.3.32)
is bounded by
  H_{ijk}(y_s) = y_{s,N} Σ_{r=0}^N |∂^3 c_r/∂θ_i∂θ_j∂θ_k|,   (3.3.33)
which has finite expected value by (i) and the finite expected value of y_{s,N}. Thus (ii) is established.
(iii) Chanda has shown that J will be positive definite as long as p is a density function. That |J| is finite follows from the fact that (∂ log p/∂θ_i)(∂ log p/∂θ_j) is finite (from (i)) for all (θ_1,θ_2) ∈ Ω.
Thus the proof of Theorem 3.3.1 is complete. A closed form has not been found for the solution to (3.3.20). Some numerical results using an iterative procedure will be presented in Chapter IV. Also, we attempted to establish stronger consistency properties for (α̂,ρ̂), using results of Billingsley [2] and Wald [27], but have had no success as yet.
3.4 Continuous Observation in [0,T]
Assume that the logistic growth process {X(t); t ∈ I_{0,∞}} is observed continuously in [0,T] = I_{0,T}. That is, {X(t); t ∈ I_{0,T}} will be observed, with X(0) = n known.
The problem of estimating α and ρ will be approached as follows: first, divide the interval I_{0,T} into N equal intervals of infinitesimal length h. Assume that observations will be made only at the end of each of the N intervals; that is, {X(jh); j ∈ J_{1,N}} will be observed, where Nh = T. From (3.3.2) we can write the likelihood function for such an observation:
  L[α,ρ; (x_jh; j ∈ J_{1,N})]
   = Π_{j=1}^N C(x_jh − 1, x_{(j−1)h} − 1) (T_{(j−1)h,jh})^{x_{(j−1)h}} (1 − T_{(j−1)h,jh})^{x_jh − x_{(j−1)h}},   (3.4.1)
where x_0 = n and T_{s,t} is defined in (3.2.15). Note that to obtain the continuous observation in I_{0,T}, one must let h → 0 and N → ∞ such that Nh = T. We thus want to consider the properties of log {L[α,ρ; (X_jh; j ∈ J_{0,N})]} as h → 0.
We first define the concept of "limit in mean" (l.i.m.). If a sequence of random variables {Z_1, Z_2, …, Z_m, …} is such that
  lim_{m→∞} E[(Z_m − Z)^2] = 0,   (3.4.2)
then {Z_m} is said to approach Z as a limit in the mean square sense, or simply as a limit in mean. We will write
  l.i.m._{m→∞} Z_m = Z.   (3.4.3)
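As an illustration of this definition (not part of the original text), the sample mean of i.i.d. variables converges to its expectation as a limit in mean, since E[(Z̄_m − μ)^2] = σ^2/m → 0. A minimal numerical check, with all parameter choices mine:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, reps = 2.0, 1.0, 20000

def mean_square_error(m):
    # Monte Carlo estimate of E[(Zbar_m - mu)^2], where Zbar_m is the
    # mean of m i.i.d. N(mu, sigma^2) variables; the exact value is sigma^2/m.
    samples = rng.normal(mu, sigma, size=(reps, m))
    zbar = samples.mean(axis=1)
    return float(np.mean((zbar - mu) ** 2))

# E[(Zbar_m - mu)^2] -> 0 as m grows, so l.i.m. Zbar_m = mu in the sense of (3.4.2).
for m in (10, 100):
    print(m, mean_square_error(m))
```

The printed errors fall roughly as 1/m, which is exactly what (3.4.2) requires of a sequence converging in mean square.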
We now state
Theorem 3.4.1:
The limit in mean of log {L[α,ρ; (X_jh; j ∈ J_{0,N})]} as h → 0 (Nh = T) does not exist. However, for fixed (α_0,ρ_0) ∈ Ω,
  l.i.m._{h→0, Nh=T} {log L[α,ρ; (X_jh; j ∈ J_{0,N})] − log L[α_0,ρ_0; (X_jh; j ∈ J_{0,N})]}
   = ∫_0^T (log ξ_t − log ξ_t^0) dX(t) + ∫_0^T X(t)[ξ_t^0 − ξ_t] dt,   (3.4.4)
where we define
  ξ_t = αβρ^t (1+αρ^t)^(-1),   (3.4.5)
  ξ_t^0 = α_0 β_0 ρ_0^t (1+α_0 ρ_0^t)^(-1).   (3.4.6)
The stochastic integrals on the right-hand side of (3.4.4) are defined in the usual sense, and remember that β = −log ρ (β_0 = −log ρ_0).
Before proving Theorem 3.4.1, we will need the following lemmas.
Lemma 3.4.1:
  lim_{h→0, Nh=T} {Σ_{j=1}^N log T_{(j−1)h,jh}} = −∫_0^T ξ_t dt.   (3.4.7)
The limit
  lim_{h→0, Nh=T} {Σ_{j=1}^N log (1 − T_{(j−1)h,jh})}
does not exist, but for any fixed (α_0,ρ_0) ∈ Ω,
  lim_{h→0, Nh=T} {Σ_{j=1}^N h log [(1 − T_{(j−1)h,jh})(1 − T_{(j−1)h,jh}^0)^(-1)]}
   = ∫_0^T (log ξ_t − log ξ_t^0) dt,   (3.4.8)
writing (s < t)
  T_{s,t}^0 = (1 + α_0 ρ_0^t)(1 + α_0 ρ_0^s)^(-1).   (3.4.9)
Proof of Lemma 3.4.1:
Using (3.2.15),
  T_{(j−1)h,jh} = (1 + αρ^{jh})(1 + αρ^{(j−1)h})^(-1)
   = (1 + αρ^{jh})[1 + αρ^{jh} exp(βh)]^(-1)
   = {1 + (1 + αρ^{jh})^(-1) αρ^{jh}(exp[βh] − 1)}^(-1)
   = [1 + h ξ_jh + O(h^2)]^(-1).   (3.4.10)
Hence
  log T_{(j−1)h,jh} = −log [1 + h ξ_jh + O(h^2)]
   = −h ξ_jh + O(h^2),   (3.4.11)
from which (3.4.7) follows immediately.
But from (3.4.10),
  1 − T_{(j−1)h,jh} = [h ξ_jh + O(h^2)][1 + h ξ_jh + O(h^2)]^(-1),   (3.4.12)
which implies that lim_{h→0, Nh=T} {Σ log (1 − T_{(j−1)h,jh})} does not exist. Now
  (1 − T_{(j−1)h,jh})(1 − T_{(j−1)h,jh}^0)^(-1)
   = [h ξ_jh + O(h^2)][1 + h ξ_jh^0 + O(h^2)][h ξ_jh^0 + O(h^2)]^(-1)[1 + h ξ_jh + O(h^2)]^(-1)
   = (ξ_jh / ξ_jh^0)[1 + O(h)].   (3.4.13)
We substitute (3.4.13) into the left-hand side of (3.4.8) to complete the proof.
Lemma 3.4.2:
If we define, for s ≤ t ≤ u,
  R_s(t,u) = C_s(t,u) − m_{x_s}(s,t) m_{x_s}(s,u),   (3.4.14)
where
  C_s(t,u) = E[X(u) X(t)/(X(s) = x_s)]   (3.4.15)
and m_{x_s}(s,t) is defined in (3.2.10), then
  R_s(t,u) = x_s α(ρ^s − ρ^t)(1 + αρ^s)[(1 + αρ^t)(1 + αρ^u)]^(-1).   (3.4.16)
Proof of Lemma 3.4.2:
Using (3.2.26),
  C_s(t,u) = E{X(t) E[X(u)/(X(t) = x_t)]/(X(s) = x_s)}
   = (1 + αρ^t)(1 + αρ^u)^(-1) C_s(t,t).   (3.4.17)
And from Theorem 3.2.1,
  E{[X(t) − x_s][X(t) − x_s − 1]/(X(s) = x_s)}
   = Σ_{j=0}^∞ j(j−1) C(x_s+j−1, x_s−1) (T_{s,t})^{x_s} (1 − T_{s,t})^j
   = (x_s+1) x_s (T_{s,t})^(-2) (1 − T_{s,t})^2.   (3.4.18)
Thus
  C_s(t,t) = (x_s+1) x_s (T_{s,t})^(-2) (1 − T_{s,t})^2
   + (2x_s+1) E[X(t)/(X(s) = x_s)] − x_s(x_s+1)
   = x_s (T_{s,t})^(-2) (x_s + 1 − T_{s,t}).   (3.4.19)
Substitution of (3.4.19) into (3.4.17) gives
  C_s(t,u) = x_s (T_{s,t} T_{s,u})^(-1) (x_s + 1 − T_{s,t}).   (3.4.20)
Then using (3.4.20) in (3.4.14) yields
  R_s(t,u) = x_s (T_{s,t} T_{s,u})^(-1) (1 − T_{s,t}),   (3.4.21)
which is (3.4.16) upon substitution of (3.2.15) in (3.4.21).
Lemma 3.4.3:
The stochastic integral
  I_1 = l.i.m._{h→0, Nh=T} Σ_{j=1}^N X(jh) ξ_jh h   (3.4.22)
exists in the mean square sense for all (α,ρ) ∈ Ω. We shall write this limiting random variable
  I_1 = ∫_0^T X(t) ξ_t dt.   (3.4.23)
Proof of Lemma 3.4.3:
Using a result from Cramér and Leadbetter [7], sufficient conditions for the existence of I_1 are that R_0(t,u) be of bounded variation in {I_{0,T} × I_{0,T}} and that
  J_1 = ∫_0^T ∫_0^T ξ_t ξ_u R_0(t,u) dt du   (3.4.24)
exist. Since, from (3.4.16),
  0 ≤ R_0(t,u) ≤ x_0 α(1+α),   (3.4.25)
the first of these conditions is satisfied. And since ξ_t is finite for any (α,ρ) ∈ Ω and all t ∈ I_{0,T}, the integral (3.4.24) exists. Thus the lemma is proved.
Lemma 3.4.4:
The stochastic integral
  I_2 = l.i.m._{h→0, Nh=T} Σ_{j=1}^N (log ξ_jh)[X(jh) − X((j−1)h)]   (3.4.26)
exists in the mean square sense for all (α,ρ) ∈ Ω. We shall write this limiting random variable
  I_2 = ∫_0^T (log ξ_t) dX(t).   (3.4.27)
Proof of Lemma 3.4.4:
Again from Cramér and Leadbetter [7], sufficient conditions for the existence of I_2 are that R_0(t,u) be of bounded variation and that
  J_2 = ∫_0^T ∫_0^T (log ξ_t)(log ξ_u) d_t d_u [R_0(t,u)]   (3.4.28)
exist. The first of these has already been shown, and by differentiation of R_0(t,u), one has
  J_2 = x_0 ∫_0^T ∫_0^T (log ξ_t)(log ξ_u)[αβ(1+α)]^2 ρ^{t+u} [(1+αρ^t)(1+αρ^u)]^(-2) dt du.   (3.4.29)
The integrand is finite in Ω and {I_{0,T} × I_{0,T}}, which completes the proof.
Proof of Theorem 3.4.1:
From equation (3.4.1),
  log L[α,ρ; (X_jh; j ∈ J_{0,N})]
   = Σ_{j=1}^N log C(X_jh − 1, X_{(j−1)h} − 1)
   + Σ_{j=1}^N X_{(j−1)h} log T_{(j−1)h,jh}
   + Σ_{j=1}^N (X_jh − X_{(j−1)h}) log [1 − T_{(j−1)h,jh}].   (3.4.30)
The third term on the right-hand side of (3.4.30) has no limit in the mean square sense because of the first part of Lemma 3.4.1. We thus consider
  log L[α,ρ; (X_jh; j ∈ J_{0,N})] − log L[α_0,ρ_0; (X_jh; j ∈ J_{0,N})]
   = Σ_{j=1}^N X_{(j−1)h} (log T_{(j−1)h,jh} − log T_{(j−1)h,jh}^0)
   + Σ_{j=1}^N (X_jh − X_{(j−1)h}) log [(1 − T_{(j−1)h,jh})(1 − T_{(j−1)h,jh}^0)^(-1)]   (3.4.31)
for any (α_0,ρ_0) ∈ Ω.
Using Lemmas 3.4.3 and 3.4.4, we have
  log L*[α,ρ;α_0,ρ_0; (X(t); t ∈ I_{0,T})]
   = ∫_0^T X(t)(ξ_t^0 − ξ_t) dt + ∫_0^T (log ξ_t − log ξ_t^0) dX(t),   (3.4.32)
writing
  log L*[α,ρ;α_0,ρ_0; (X(t); t ∈ I_{0,T})]
   = l.i.m._{h→0, Nh=T} {log L[α,ρ; (X_jh; j ∈ J_{0,N})] − log L[α_0,ρ_0; (X_jh; j ∈ J_{0,N})]}.   (3.4.33)
This completes the proof of the theorem. The idea of resolving the limit problem by considering the ratio of likelihood functions, as in (3.4.31), is from Bartlett [1].
By using the expression (3.3.12), we may rewrite (3.4.32) to obtain
  log L*[α,ρ;α_0,ρ_0; {X(t); t ∈ I_{0,T}}] = Σ_{i=1}^n Z_{T,i},   (3.4.34)
where
  Z_{T,i} = ∫_0^T Y_i(t)(ξ_t^0 − ξ_t) dt + ∫_0^T (log ξ_t − log ξ_t^0) dY_i(t).   (3.4.35)
{Z_{T,i}; i ∈ J_{1,n}} is a sequence of independent, identically distributed random variables. We will be interested in
  ∂ log L*/∂α = Σ_{i=1}^n {∫_0^T [α^(-1) − ρ^t(1+αρ^t)^(-1)] dY_i(t)
   − β ∫_0^T ρ^t(1+αρ^t)^(-2) Y_i(t) dt}   (3.4.36)
and
  ∂ log L*/∂β = Σ_{i=1}^n {β^(-1) ∫_0^T (1+αρ^t−βt)(1+αρ^t)^(-1) dY_i(t)
   − α ∫_0^T ρ^t(1+αρ^t−βt)(1+αρ^t)^(-2) Y_i(t) dt}.   (3.4.37)
Since L* is not the sum of probability densities, as is the usual case for likelihood functions, we will need the following
Lemma 3.4.5:
  E{∂ log L*/∂α} = E{∂ log L*/∂β} = 0.   (3.4.38)
Proof:
From (3.4.36),
  E{∂ log L*/∂α} = Σ_{i=1}^n {∫_0^T [α^(-1) − ρ^t(1+αρ^t)^(-1)] d_t m_1(0,t)
   − β ∫_0^T ρ^t(1+αρ^t)^(-2) m_1(0,t) dt}
   = Σ_{i=1}^n {(α+1)β ∫_0^T ρ^t(1+αρ^t)^(-3) dt − (α+1)β ∫_0^T ρ^t(1+αρ^t)^(-3) dt}
   = 0,   (3.4.39)
using the fact that (Cramér–Leadbetter [7])
  E{dY_i(t)} = d_t E{Y_i(t)} = d_t [m_1(0,t)].   (3.4.40)
Similarly,
  E{∂ log L*/∂β} = Σ_{i=1}^n {β^(-1) ∫_0^T (1+αρ^t−βt)(1+αρ^t)^(-1) d_t m_1(0,t)
   − α ∫_0^T ρ^t(1+αρ^t−βt)(1+αρ^t)^(-2) m_1(0,t) dt}
   = Σ_{i=1}^n {α(α+1) ∫_0^T (1+αρ^t−βt)ρ^t(1+αρ^t)^(-3) dt
   − α(α+1) ∫_0^T (1+αρ^t−βt)ρ^t(1+αρ^t)^(-3) dt}
   = 0,   (3.4.41)
which completes the proof.
We are now ready to state and prove
Theorem 3.4.2:
The equations
  ∂ log L*[α,β;α_0,β_0; {X(t); t ∈ I_{0,T}}]/∂α = 0,
  ∂ log L*[α,β;α_0,β_0; {X(t); t ∈ I_{0,T}}]/∂β = 0   (3.4.42)
have a solution (α*, β*) which is weakly consistent for (α,β). Furthermore, the asymptotic distribution of [√n(α*−α); √n(β*−β)] as n → ∞ is normal, with null mean vector and variance-covariance matrix Σ*, where
  Σ* = ( σ*_11   σ*_12 )
       ( σ*_12   σ*_22 ).   (3.4.43)
Now,
  σ*_11 = Δ^(-2) (ω_22^2 λ_11 + ω_12^2 λ_22 − 2ω_22 ω_12 λ_12),   (3.4.44)
  σ*_22 = Δ^(-2) (ω_12^2 λ_11 + ω_11^2 λ_22 − 2ω_11 ω_12 λ_12),   (3.4.45)
and
  σ*_12 = Δ^(-2) [−ω_12 ω_22 λ_11 − ω_11 ω_12 λ_22 + (ω_12^2 + ω_11 ω_22) λ_12],   (3.4.46)
with, writing θ_1 = α and θ_2 = β,
  ω_ij = −n^(-1) E{∂^2 log L*/∂θ_i∂θ_j},
  λ_ij = n^(-1) E{(∂ log L*/∂θ_i)(∂ log L*/∂θ_j)},   (3.4.47)
  Δ = ω_11 ω_22 − ω_12^2.   (3.4.48)
Proof:
Writing
  log {p_i*[α,β;α_0,β_0; (Y_i(t); t ∈ I_{0,T})]}
   = ∫_0^T Y_i(t)(ξ_t^0 − ξ_t) dt + ∫_0^T (log ξ_t − log ξ_t^0) dY_i(t),   (3.4.49)
we have from (3.4.34) that
  log L*[α,β;α_0,β_0; {X(t); t ∈ I_{0,T}}]
   = Σ_{i=1}^n log p_i*[α,β;α_0,β_0; (Y_i(t); t ∈ I_{0,T})].   (3.4.50)
As mentioned prior to the statement of Lemma 3.4.5, we cannot regard {p_i*; i ∈ J_{1,n}} as a sequence of probability density functions. However, as we shall see, this is not essential to the proof of the theorem. In the proofs given by Cramér [6] for the single parameter case and Chanda [5] for the multiparameter case, the only property of the density function crucial to the proofs is
  E{∂ log f/∂θ} = 0,   (3.4.51)
where f is the density function and θ some parameter of f. The property holds as soon as |∂f/∂θ| is bounded by an integrable function (over the sample space).
But in the present case we have, from Lemma 3.4.5,
  E{∂ log L*/∂θ_j} = Σ_{i=1}^n E{∂ log p_i*/∂θ_j} = 0,   (3.4.52)
which implies
  E{∂ log p_i*/∂θ_j} = 0,   (3.4.53)
for j ∈ J_{1,2}. Thus the assumption that p_i* is a probability density and, in fact, the requirement that |∂p_i*/∂θ_j| be integrable (or summable in this case) are unnecessary.
We now must establish the three conditions given in the proof of Theorem 3.3.1 to complete the proof of this theorem.
(i) To see that log p_i* is differentiable up to the third order for almost all y_i, we need only note that ξ_t and log ξ_t are differentiable to that order; this is easily shown using (3.4.5).
(ii) As mentioned, we no longer need the summability condition for |∂p_i*/∂θ_j|. The additional requirement that |∂^2 p_i*/∂θ_j∂θ_k| be bounded by a summable function is used such that
  E{∂^2 p_i*/∂θ_j∂θ_k} = 0,   (3.4.54)
and hence
  E{∂^2 log p_i*/∂θ_j∂θ_k} = −E{(∂ log p_i*/∂θ_j)(∂ log p_i*/∂θ_k)}.   (3.4.55)
But we shall not make use of the property (3.4.55), so it will not be necessary to have |∂^2 p_i*/∂θ_j∂θ_k| bounded by a summable function.
Thus, we need only establish the condition that |∂^3 log p_i*/∂θ_j∂θ_k∂θ_l| < H_{j,k,l}(y_i), where E{|H_{j,k,l}(y_i)|} < M, some finite, positive constant.
Using (3.4.49),
  ∂^3 log p_i*/∂θ_j∂θ_k∂θ_l = −∫_0^T (∂^3 ξ_t/∂θ_j∂θ_k∂θ_l) Y_i(t) dt
   + ∫_0^T {ξ_t^(-1) (∂^3 ξ_t/∂θ_j∂θ_k∂θ_l)
   − ξ_t^(-2) [(∂ξ_t/∂θ_j)(∂^2 ξ_t/∂θ_k∂θ_l) + (∂ξ_t/∂θ_k)(∂^2 ξ_t/∂θ_j∂θ_l) + (∂ξ_t/∂θ_l)(∂^2 ξ_t/∂θ_j∂θ_k)]
   + 2ξ_t^(-3) (∂ξ_t/∂θ_j)(∂ξ_t/∂θ_k)(∂ξ_t/∂θ_l)} dY_i(t).   (3.4.56)
Since, for every (θ_1,θ_2) ∈ Ω, we can easily bound ξ_t and its derivatives used in (3.4.56) for all t ∈ I_{0,T}, we can find H_{j,k,l}(y_i) as soon as E{∫_0^T Y_i(t) dt} < ∞ and E{∫_0^T dY_i(t)} < ∞, which follows from the proof of Lemma 3.4.5.
(iii) The third condition required for Theorem 3.4.2 to be true is that J = ((J_{j,k}(θ_1,θ_2))) be positive definite and that |J| be finite, where
  J_{j,k}(θ_1,θ_2) = E{(∂ log p_i*/∂θ_j)(∂ log p_i*/∂θ_k)} = λ_jk.   (3.4.57)
From (3.4.36) and (3.4.37),
  nλ_11 = ∫_0^T ∫_0^T [α^(-1) − ρ^t(1+αρ^t)^(-1)][α^(-1) − ρ^u(1+αρ^u)^(-1)] d_{t,u}R_0(t,u)
   + β^2 ∫_0^T ∫_0^T ρ^{t+u} (1+αρ^t)^(-2) (1+αρ^u)^(-2) R_0(t,u) dt du
   − 2β ∫_0^T ∫_0^T [α^(-1) − ρ^t(1+αρ^t)^(-1)] ρ^u(1+αρ^u)^(-2) d_t R_0(t,u) du,   (3.4.58)
  nλ_12 = β^(-1) ∫_0^T ∫_0^T [α^(-1) − ρ^t(1+αρ^t)^(-1)](1+αρ^u−βu)(1+αρ^u)^(-1) d_{t,u}R_0(t,u)
   + αβ ∫_0^T ∫_0^T ρ^{t+u}(1+αρ^u−βu)(1+αρ^t)^(-2)(1+αρ^u)^(-2) R_0(t,u) dt du
   − ∫_0^T ∫_0^T {α[α^(-1) − ρ^t(1+αρ^t)^(-1)] ρ^u(1+αρ^u−βu)(1+αρ^u)^(-2)
   + (1+αρ^t−βt)(1+αρ^t)^(-1) ρ^u(1+αρ^u)^(-2)} d_t R_0(t,u) du,   (3.4.59)
and
  nλ_22 = β^(-2) ∫_0^T ∫_0^T (1+αρ^t−βt)(1+αρ^u−βu)[(1+αρ^t)(1+αρ^u)]^(-1) d_{t,u}R_0(t,u)
   + α^2 ∫_0^T ∫_0^T ρ^{t+u}(1+αρ^t−βt)(1+αρ^u−βu)[(1+αρ^t)(1+αρ^u)]^(-2) R_0(t,u) dt du
   − 2αβ^(-1) ∫_0^T ∫_0^T (1+αρ^t−βt)(1+αρ^u−βu)(1+αρ^t)^(-1) ρ^u(1+αρ^u)^(-2) d_t R_0(t,u) du.   (3.4.60)
For all (α,β) ∈ Ω, R_0(t,u), given by Lemma 3.4.2, is finite, as are its derivatives. Thus λ_ij is the integral of a finite integrand over a finite region, and so is finite. This completes the proof of the theorem.
Thus we have shown that (α*, β*) have the properties usually associated with maximum likelihood estimators, losing only the property that
  E{∂^2 log p_i*/∂θ_j∂θ_k} = −E{(∂ log p_i*/∂θ_j)(∂ log p_i*/∂θ_k)},   (3.4.61)
that is to say,
  ω_jk = λ_jk.   (3.4.62)
When this property holds, the form of Σ* is greatly simplified; since we cannot assume (3.4.62) here, calculation of Σ* is tedious. In addition to the λ_ij's given by (3.4.58)–(3.4.60), we will need
  ω_11 = (α+1)β ∫_0^T ρ^t [α(1+αρ^t)^4]^(-1) dt,   (3.4.63)
  ω_12 = (α+1) ∫_0^T (1+αρ^t)^(-2) [ρ^t(1+αρ^t)^(-1) − βtρ^t(1+αρ^t)^(-2)] dt,   (3.4.64)
and
  ω_22 = α(α+1)β^(-1) ∫_0^T ρ^t [(1+αρ^t)^(-1) − βt(1+αρ^t)^(-2)]^2 dt.   (3.4.65)
No closed form has been found for the solution to the estimation equations (3.4.36) and (3.4.37), nor could a more convenient form be found for the elements of Σ*. It is conjectured that iterative numerical methods can be used, but we leave this particular topic open to further research.
3.5 Arrival Time Observation
Assume that instead of observing the logistic process {X(t)} itself, we observe only the times at which the process increases; that is, the "arrival times" are observed. Observation is terminated when the time of the Nth increase has been recorded. Hence we will observe {T_j; j ∈ J_{1,N}}, where
  X(t) is  < n + j  if t < T_j,
           ≥ n + j  if t ≥ T_j,   (3.5.1)
where X(0) = n.
The distribution of this sequence of random variables is easily obtained using Theorem 3.2.1. For any θ ≥ 0,
  P{T_{j+k} − T_j ≤ θ/(T_i = t_i; i ∈ J_{1,j})}
   = P{X(t_j + θ) − X(t_j) ≥ k/(T_i = t_i; i ∈ J_{1,j})}
   = P{X(t_j + θ) − X(t_j) ≥ k/C},   (3.5.2)
where C is the event
  X(t) = n          if t < t_1,
  X(t) = n + 1      if t_1 ≤ t < t_2,
  …
  X(t) = n + j − 1  if t_{j−1} ≤ t < t_j,
  X(t) = n + j      if t = t_j.   (3.5.3)
Since {X(t)} is Markov, we have
  P{T_{j+k} − T_j ≤ θ/(T_i = t_i; i ∈ J_{1,j})}
   = P{X(t_j+θ) − X(t_j) ≥ k/(X(t_j) = n + j)}
   = P{X(t_j+θ) ≥ n + j + k/(X(t_j) = n + j)}
   = Σ_{i=k}^∞ C(n+j+i−1, n+j−1) (T_{t_j,t_j+θ})^{n+j} (1 − T_{t_j,t_j+θ})^i.   (3.5.4)
Since (3.5.4) is a function of t_j only, it follows that {T_j; j ∈ J_{1,N}} is a Markov sequence, and from (3.5.4) the distribution function is given by
  F_{T_{j+1}}(θ; t_j) = P{T_{j+1} ≤ t_j + θ/(T_j = t_j)}
   = 1 − (T_{t_j,t_j+θ})^{n+j},   (3.5.5)
requiring θ ≥ 0.
To follow the procedure suggested by Moran [22] for several different birth and death processes would be to calculate the likelihood function
  L(α,ρ; t_1,…,t_N) = Π_{j=1}^{N−1} f_{T_{j+1}}(α,ρ; t_j),   (3.5.6)
where
  f_{T_{j+1}}(θ; t_j) = (d/dθ)[F_{T_{j+1}}(θ; t_j)],   (3.5.7)
which exists in Ω since T_{t_j,t_j+θ}, and thus F_{T_{j+1}}(θ; t_j), is absolutely continuous for θ > 0.
However, Moran's approach can be shown unsatisfactory for this case by observing that, from (3.5.5),
  lim_{θ→∞} F_{T_{j+1}}(θ; t_j)
   = lim_{θ→∞} {1 − [(1+αρ^{t_j+θ})(1+αρ^{t_j})^(-1)]^{n+j}}
   = 1 − (1+αρ^{t_j})^{−(n+j)}.   (3.5.8)
Hence there is a nonzero probability that, given T_j = t_j, T_{j+1} will not be finite. Thus E{T_{j+1}/(T_j = t_j)} does not exist, and the likelihood function given by (3.5.6) is not useful for estimating (α,ρ). Perhaps a conditional likelihood could be used to get estimation equations. Further study of the problem is needed.
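The size of the defect is easy to compute from (3.5.8): given T_j = t_j, the probability that the (j+1)st arrival never occurs is (1+αρ^{t_j})^{−(n+j)}. A short sketch of this calculation follows; the parameter values are illustrative only, not taken from the text:

```python
def prob_no_next_arrival(alpha, rho, t_j, n_plus_j):
    # P{T_{j+1} = infinity | T_j = t_j} = (1 + alpha*rho^t_j)^-(n+j),
    # i.e. one minus the limit of F in (3.5.8) as theta -> infinity.
    return (1.0 + alpha * rho ** t_j) ** (-n_plus_j)

# Late in the process (large t_j) the chance that no further jump ever
# occurs approaches one, which is why E[T_{j+1} | T_j = t_j] cannot exist.
for t in (1, 5, 20):
    print(t, prob_no_next_arrival(alpha=10.0, rho=0.8, t_j=t, n_plus_j=45))
```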
3.6 Prediction
Assume that N equispaced observations will be made on the logistic growth process {X(t); t ∈ I_{0,∞}}, as in 3.3. Thus, based upon observations taken on {X(j); j ∈ J_{1,N}}, we will predict the value of the time series at time N + m, that is, X(N+m).
Restricting our attention to the predictor which minimizes the mean square error of prediction, we define
  x_N^(1)(m) = E{X(N+m)/(X(j) = x_j; j ∈ J_{1,N})}
   = E{X(N+m)/(X(N) = x_N)}
   = m_{x_N}(N, N+m)
   = x_N(1+αρ^N)(1+αρ^{N+m})^(-1),   (3.6.1)
using (3.2.10). Then x_N^(1)(m) minimizes the mean square error of prediction, which is
  MSE_1 = E{[X(N+m) − x_N^(1)(m)]^2}
   = E{R_{X_N}(N+m, N+m)}
   = E{X_N α(1+αρ^N)(ρ^N − ρ^{N+m})(1+αρ^{N+m})^(-2)}
   = nα(1+α)ρ^N(1−ρ^m)(1+αρ^{N+m})^(-2),   (3.6.2)
where X(0) = n.
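In computational form, (3.6.1) and (3.6.2) are direct to evaluate; the sketch below simply transcribes them (the numerical parameter values are my choices, not the author's):

```python
def predictor(x_N, N, m, alpha, rho):
    # Least squares predictor (3.6.1): E[X(N+m) | X(N) = x_N].
    return x_N * (1 + alpha * rho ** N) / (1 + alpha * rho ** (N + m))

def mse1(n, N, m, alpha, rho):
    # Minimum mean square error of prediction (3.6.2), with X(0) = n.
    return (n * alpha * (1 + alpha) * rho ** N * (1 - rho ** m)
            / (1 + alpha * rho ** (N + m)) ** 2)

# MSE1 is proportional to n and decays like rho^N as N grows.
print(mse1(n=40, N=10, m=5, alpha=10.0, rho=0.8))
print(mse1(n=40, N=20, m=5, alpha=10.0, rho=0.8))
```

These two prints illustrate observations 1 and 2 made later in this section: the error grows with the initial size n and shrinks geometrically in N.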
If α and ρ are unknown, consider the predictor
  x_N^(2)(m) = x_N(1 + α̂ρ̂^N)(1 + α̂ρ̂^{N+m})^(-1).   (3.6.3)
Denoting the mean square error of x_N^(2)(m) by MSE_2, it follows that
  MSE_2 − MSE_1 = E{[x_N^(2)(m) − x_N^(1)(m)]^2}
   = E{X_N^2 [(1 + α̂ρ̂^N)(1 + α̂ρ̂^{N+m})^(-1) − (1 + αρ^N)(1 + αρ^{N+m})^(-1)]^2}.   (3.6.4)
Define
  U = 1 + α̂ρ̂^N,   (3.6.5)
  V = 1 + α̂ρ̂^{N+m},   (3.6.6)
  δα = α̂ − α,   (3.6.7)
  δρ = ρ̂ − ρ.   (3.6.8)
Then
  δU = (1 + α̂ρ̂^N) − (1 + αρ^N)
   = α̂ρ̂^N − αρ^N
   = (δα + α)(δρ + ρ)^N − αρ^N
   = ρ^N δα + Nαρ^{N−1} δρ + O_p(n^(-1)),   (3.6.9)
making use of the fact that the maximum likelihood estimators are weakly consistent. Similarly,
  δV = (1 + α̂ρ̂^{N+m}) − (1 + αρ^{N+m})
   = ρ^{N+m} δα + (N+m)αρ^{N+m−1} δρ + O_p(n^(-1)).   (3.6.10)
Hence
  UV^(-1) = [δU + (1 + αρ^N)][δV + (1 + αρ^{N+m})]^(-1)
   = (1 + αρ^N)(1 + αρ^{N+m})^(-1) + (1 + αρ^{N+m})^(-1) δU
   − (1 + αρ^N)(1 + αρ^{N+m})^(-2) δV + O_p(n^(-1)).   (3.6.11)
Now we can get an approximation for (3.6.4) by taking
  MSE_2 − MSE_1 = E{X_N^2 [UV^(-1) − (1+αρ^N)(1+αρ^{N+m})^(-1)]^2}
   ≈ E{X_N^2} E{[UV^(-1) − (1+αρ^N)(1+αρ^{N+m})^(-1)]^2}
   = C_0(N,N)(1+αρ^{N+m})^(-2) E{[δU − (1+αρ^N)(1+αρ^{N+m})^(-1) δV]^2}
   = ρ^{2N} n[n + α(1−ρ^N)(1+α)^(-1)](1+α)^2 [(1+αρ^N)(1+αρ^{N+m})]^(-2)
   × {σ^11 [1 − (1+αρ^N)(1+αρ^{N+m})^(-1) ρ^m]^2
   + 2σ^12 αρ^(-1) [N − (2N+m)ρ^m(1+αρ^N)(1+αρ^{N+m})^(-1)
   + (N+m)ρ^{2m}(1+αρ^N)^2(1+αρ^{N+m})^(-2)]
   + σ^22 α^2 ρ^(-2) [N − (N+m)(1+αρ^N)(1+αρ^{N+m})^(-1) ρ^m]^2}.   (3.6.12)
Perhaps (3.6.12) is only a crude approximation for (3.6.4), but inspection of MSE_1 (3.6.2) and the approximation for MSE_2 − MSE_1 (3.6.12) leads to several interesting observations:
1. MSE_1 is proportional to ρ^N; that is, MSE_1 decreases exponentially as N increases. This is intuitively plausible, since the correlation between X(N) and X(N+m) is given by
  Corr[X(N), X(N+m)] = R_0(N,N+m)[R_0(N,N) R_0(N+m,N+m)]^(-1/2)
   = [(1−ρ^N)(1−ρ^{N+m})^(-1)]^(1/2).   (3.6.13)
Hence the correlation is a monotonically increasing function of N, and it follows that X(N) will contain more information about X(N+m) when N is large; thus the decrease in prediction error.
2. MSE_1 is proportional to n. Thus, contrary to the error associated with the maximum likelihood estimators of the logistic parameters, the error of prediction increases with the initial size of the process. This is perhaps partially explained by the fact that the process variance at any time is proportional to n (Lemma 3.4.2); heuristically, the population or process size is less deterministic for large populations.
3. MSE_2 − MSE_1 is proportional to ρ^{2N}. Hence the additional prediction error incurred by parameter estimation also decreases as N increases, and at a faster rate than MSE_1.
4. MSE_2 − MSE_1 is proportional to n.
Observation (4) is surprising if true; but it depends most directly upon the accuracy of approximation (3.6.12). Thus, though this may be just another of the logistic process' strange characteristics, one must question the credibility of (3.6.12). Rather, one expects the increase in prediction error to be of order one, following the general pattern that a decrease in estimator variance produces a corresponding decrease in the ratio of MSE_2 − MSE_1 to MSE_1.
CHAPTER IV
AN EMPIRICAL COMPARISON
4.1 Introduction
Rhodes [25] has given a method of estimating the logistic parameters which receives wide application. We present his method in this chapter, with an attempt to examine the estimators' sampling properties. These shall be shown to be analytically intractable; thus a Monte Carlo experiment is performed in order to compare these estimators with those presented in the last chapter. Finally, the logistic model will be fitted to the population of the conterminous United States, using both Rhodes' and maximum likelihood estimation procedures.
4.2 Rhodes' Estimators
Recall that the mean function for the logistic growth process, given by (3.2.12), is
  E{X(t)/(X(0) = n)} = m_n(0,t) = n(1+α)(1+αρ^t)^(-1).   (3.2.12)
Upon inspection, it is seen that
  [m_n(0,t+1)]^(-1) = ρ[m_n(0,t)]^(-1) + (1−ρ)[n(1+α)]^(-1).   (4.2.1)
Rhodes thus defines
  Y(t) = [X(t)]^(-1)   (4.2.2)
and, observing the random variables Y(j) = Y_j, j ∈ J_{1,N}, he finds estimators for ρ and γ in
  y_{t+1} = ρ y_t + γ,   t ∈ J_{1,N−1},   (4.2.3)
where
  γ = (1−ρ)[n(1+α)]^(-1)   (4.2.4)
and y_t is the observed value of Y_t.
The estimators, obtained by minimizing
  S = Σ_{t=1}^{N−1} [y_{t+1} − (ρ y_t + γ)]^2,   (4.2.5)
are
  ρ̂_r = [Σ_{t=1}^{N−1} (y_t − ȳ_1)(y_{t+1} − ȳ_2)][Σ_{t=1}^{N−1} (y_t − ȳ_1)^2]^(-1)   (4.2.6)
and
  γ̂_r = ȳ_2 − ρ̂_r ȳ_1,   (4.2.7)
where
  ȳ_1 = (N−1)^(-1) Σ_{t=1}^{N−1} y_t   (4.2.8)
and
  ȳ_2 = (N−1)^(-1) Σ_{t=1}^{N−1} y_{t+1}.   (4.2.9)
Using (4.2.4), one gets an estimate for α:
  α̂_r = (1 − ρ̂_r)(n γ̂_r)^(-1) − 1.   (4.2.10)
Rhodes then shows by application that one obtains reasonably
close agreement between projection and reality, using these estimated
values. To my knowledge no probabilistic study of the properties of
these estimators has been made.
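Computationally, (4.2.5)–(4.2.10) amount to an ordinary least squares fit of y_{t+1} on y_t, with y_t the reciprocal of the observed count. A sketch of the calculation follows; the synthetic input series is generated from the mean function itself (my construction, not Rhodes'), so the fit recovers the generating values exactly:

```python
def rhodes_estimates(x, n):
    # x: observed series x_1, ..., x_N; n = X(0).
    # Least squares fit of y_{t+1} = rho*y_t + gamma with y_t = 1/x_t  (4.2.3).
    y = [1.0 / v for v in x]
    y1, y2 = y[:-1], y[1:]
    m1 = sum(y1) / len(y1)
    m2 = sum(y2) / len(y2)
    rho = (sum((a - m1) * (b - m2) for a, b in zip(y1, y2))
           / sum((a - m1) ** 2 for a in y1))            # (4.2.6)
    gamma = m2 - rho * m1                                # (4.2.7)
    alpha = (1 - rho) / (n * gamma) - 1                  # (4.2.10)
    return alpha, rho

# Noise-free input: x_t = m_n(0,t) = n(1+alpha)/(1+alpha*rho^t)  (3.2.12).
n, alpha, rho = 40, 10.0, 0.8
x = [n * (1 + alpha) / (1 + alpha * rho ** t) for t in range(1, 16)]
print(rhodes_estimates(x, n))
```

Because the reciprocal means satisfy the linear recursion (4.2.1) exactly, the fitted slope and intercept reproduce ρ and γ, and hence α, up to rounding; on an actual realization of the process the estimates carry sampling error, as Table 4.2 illustrates.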
4.3 Sampling Properties of Rhodes' Estimators
Define, for 0 ≤ s ≤ t and u > 0,
  β_n(s,t) = E{Y(t)/(Y(s) = n^(-1))},   (4.3.1)
  g_n(s,t) = E{Y^2(t)/(Y(s) = n^(-1))},   (4.3.2)
and
  η_n(s,t,t+u) = E{Y(t+u) Y(t)/(Y(s) = n^(-1))}.   (4.3.3)
Now fix s, t and u, and write β_n, g_n and η_n for (4.3.1), (4.3.2) and (4.3.3), respectively. We now prove
Theorem 4.3.1:
(i)   β_n = v^(-n) [log T − Σ_{j=1}^{n−1} j^(-1) v^j];   (4.3.4)
(ii)  g_n = v^(-n) {(12)^(-1) [3[cos^(-1)(1−T)]^2 − 6π cos^(-1)(1−T) + 2π^2]
      − Σ_{j=1}^{n−1} j^(-1) [log T − Σ_{k=1}^{j−1} k^(-1) v^k]};   (4.3.5)
and
(iii) η_{n+1} = v^(-1) [η_n − n^(-1) β_n*],   (4.3.6)
writing
  T = (1+αρ^t)(1+αρ^s)^(-1),   (4.3.7)
  β_n* = β_n(s, t+u),   (4.3.8)
and defining
  v = T^(-1)(T − 1).   (4.3.9)
Proof:
It follows from Theorem 3.2.1 that
  P{Y(t) = j^(-1)/(Y(s) = n^(-1))} = P{X(t) = j/(X(s) = n)}
   = C(j−1, n−1) T^n (1−T)^{j−n}   if j = n, n+1, …,
   = 0   otherwise.   (4.3.10)
Thus
  β_n = Σ_{j=n}^∞ j^(-1) C(j−1, n−1) T^n (1−T)^{j−n}
   = Σ_{j=0}^∞ (j+n)^(-1) C(j+n−1, n−1) T^n (1−T)^j,   (4.3.11)
and
  β_{n+1} = Σ_{j=n+1}^∞ j^(-1) C(j−1, n) T^{n+1} (1−T)^{j−n−1}
   = n^(-1) v^(-1) Σ_{j=0}^∞ [n(j+n)^(-1) − 1] C(j+n−1, n−1) T^n (1−T)^j
   = v^(-1)(β_n − n^(-1)).   (4.3.12)
Notice that
  β_1 = E{Y(t)/(Y(s) = 1)} = Σ_{j=1}^∞ j^(-1) T(1−T)^{j−1}
   = v^(-1) log T.   (4.3.13)
Now, defining the generating function
  P_β(z) = Σ_{n=1}^∞ β_n z^n,   (4.3.14)
we have, using (4.3.12) and (4.3.13),
  P_β(z) = β_1 z + v^(-1) z P_β(z) − v^(-1) z Σ_{n=1}^∞ n^(-1) z^n.   (4.3.15)
Rearranging,
  P_β(z) = (1 − v^(-1) z)^(-1) v^(-1) z [log T + log (1−z)],   (4.3.16)
which implies that
  β_n = v^(-n) [log T − Σ_{j=1}^{n−1} j^(-1) v^j],
as was to be shown.
(ii) Similarly,
  g_n = Σ_{j=0}^∞ (j+n)^(-2) C(j+n−1, n−1) T^n (1−T)^j,   (4.3.17)
and
  g_{n+1} = Σ_{j=1}^∞ (j+n)^(-2) C(j+n−1, n) T^{n+1} (1−T)^{j−1}
   = n^(-1) v^(-1) Σ_{j=0}^∞ (j+n)^(-1) [n(j+n)^(-1) − 1] C(j+n−1, n−1) T^n (1−T)^j
   = v^(-1)(g_n − n^(-1) β_n).   (4.3.18)
Noticing that, from a result in Davis [8],
  g_1 = E{Y^2(t)/(Y(s) = 1)} = Σ_{j=1}^∞ j^(-2) T(1−T)^{j−1}
   = (12)^(-1) v^(-1) {3[cos^(-1)(1−T)]^2 − 6π cos^(-1)(1−T) + 2π^2},   (4.3.19)
we define the generating function
  P_g(z) = Σ_{n=1}^∞ g_n z^n.   (4.3.20)
It then follows from (4.3.18) that
  P_g(z) = g_1 z + v^(-1) z [P_g(z) − Σ_{n=1}^∞ n^(-1) β_n z^n],   (4.3.21)
which implies that
  g_n = v^(-(n−1)) g_1 − Σ_{j=1}^{n−1} j^(-1) v^{j−n} β_j.   (4.3.22)
Upon substitution of (4.3.19) and (4.3.4), this is (4.3.5).
(iii) From (4.3.3),
  η_n = Σ_{j=0}^∞ Σ_{k=0}^∞ [(j+n)(k+j+n)]^(-1)
   × P{X(t+u) = k+j+n/(X(t) = j+n)} P{X(t) = j+n/(X(s) = n)},   (4.3.23)
and
  η_{n+1} = Σ_{j=1}^∞ Σ_{k=0}^∞ [(j+n)(k+j+n)]^(-1)
   × P{X(t+u) = k+j+n/(X(t) = j+n)} P{X(t) = j+n/(X(s) = n+1)}
   = n^(-1) v^(-1) Σ_{j=0}^∞ Σ_{k=0}^∞ (k+j+n)^(-1) [n(j+n)^(-1) − 1]
   × P{X(t+u) = k+j+n/(X(t) = j+n)} P{X(t) = j+n/(X(s) = n)}
   = v^(-1) [η_n − n^(-1) β_n*],   (4.3.24)
as was to be shown. This completes the proof of Theorem 4.3.1.
The reason generating functions were not employed to further resolve (4.3.24) is that no closed form could be found for η_1.
It is not difficult to use Theorem 4.3.1 to obtain approximations for the first order sampling properties of Rhodes' estimators. The expressions are tedious, the computing process was extremely time consuming (and hence expensive), and the few numerical results obtained were nonsensical. This is probably due to rounding errors resulting from the great number of calculations involved, or perhaps from the inaccuracy of approximations made in order to obtain these expected values.
Second order properties of Rhodes' estimators would involve fourth order moments of {Y(t)}, which are intractable. However, one can perform Monte Carlo experiments to at least compare the estimators for specific processes.
4.4 Monte Carlo Experiments
A computer program was written for the IBM 360 which
(1) simulates a logistic growth process given n, α, ρ and N;
(2) uses this sequence to calculate the maximum likelihood estimates given in 3.3, using the Newton-Raphson iterative procedure;
(3) calculates Σ̂, the asymptotic variance-covariance matrix for these estimators;
(4) calculates Rhodes' estimates for α and ρ, using the method given in 4.2.
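The 1971 program itself is not reproduced here, but step (1) is easy to sketch from the transition law of Theorem 3.2.1: given X((j−1)h) = x, the increment to X(jh) is negative binomial, counting failures before x successes each with success probability T_{(j−1)h,jh}. A modern sketch using numpy (my choice of tools, with unit spacing h = 1 assumed):

```python
import numpy as np

def simulate_logistic(n, alpha, rho, N, rng):
    # One path X(1), ..., X(N) of the logistic growth process, X(0) = n.
    x, path = n, []
    for j in range(1, N + 1):
        # Transition success probability T_{j-1,j} = (1+alpha*rho^j)/(1+alpha*rho^(j-1)).
        T = (1 + alpha * rho ** j) / (1 + alpha * rho ** (j - 1))
        # X(j) - X(j-1) is the number of failures before x successes, prob T each.
        x += rng.negative_binomial(x, T)
        path.append(x)
    return path

rng = np.random.default_rng(1)
path = simulate_logistic(n=40, alpha=10.0, rho=0.8, N=30, rng=rng)
mean_fn = [40 * 11 / (1 + 10.0 * 0.8 ** t) for t in range(1, 31)]
print(path[:5], [round(m, 2) for m in mean_fn[:5]])
```

Averaging many such paths reproduces the mean function m_n(0,t) = n(1+α)(1+αρ^t)^(-1) of (3.2.12); a single path, like the one in Table 4.1, wanders around it with variance proportional to n.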
The results of several of the experiments are presented. The format is as follows: first, a table is given which lists the generated logistic sequence and the sequence of expected values. This table is followed by a graph depicting both the generated sequence and the expected sequence. Finally, a second table gives the maximum likelihood estimates and the corresponding asymptotic variance-covariance matrix, using the first third, first two-thirds, and all of the generated sequence, respectively. Rhodes' estimators are also calculated for each case.
Three such experiments are presented, each with a different value of n and α. We have conducted several other experiments, and those presented here are exemplary. Changes in ρ do not significantly affect the results, except that rounding errors in the computations affected the maximum likelihood estimates for ρ (ρ̂). These cases are practically uninteresting, though, since for small ρ the process quickly approaches an asymptotic value, a characteristic uncommon to most economic and other real sequences.
Note that Rhodes' estimates are unreliable when N is small, whereas the effect of N on the maximum likelihood estimates is negligible. In addition, we could obtain asymptotic confidence intervals for the parameters, using the maximum likelihood estimates and Σ̂. We also note that the bias in α̂ seems to depend primarily on the relationship
Table 4.1
Generated and Expected Sequence for n = 40, α = 10.0, and ρ = 0.8

 i    x_i   m_n(0,i)      i    x_i   m_n(0,i)
 1     55     59.46      16    267    359.13
 2     58     71.90      17    276    372.84
 3     72     86.34      18    291    384.58
 4     84    102.88      19    296    394.52
 5     94    121.50      20    302    402.84
 6    110    142.07      21    307    409.76
 7    128    164.32      22    309    415.48
 8    144    187.86      23    313    420.16
 9    164    212.18      24    314    423.98
10    176    236.69      25    318    427.09
11    192    260.79      26    319    429.61
12    211    283.92      27    321    431.65
13    225    305.60      28    321    433.30
14    244    325.48      29    324    434.62
15    262    343.35      30    326    435.68
Fig. 4.1. Generated and Expected Sequence for n = 40, α = 10.0, and ρ = 0.8. [Plot of X(t) against t, 0 ≤ t ≤ 30, vertical scale 200–400; open circles mark observed values, solid dots expected values.]
Table 4.2
Maximum Likelihood Estimates, Σ̂, and Rhodes' Estimates for Data in Table 4.1 for n = 40, α = 10.0 and ρ = 0.8

N = 10:  α̂ = 5.909,  ρ̂ = 0.794
         Σ̂ = ( 81.242    0.603   )
             (  0.603    0.00493 )
         α̂_r = 69.464,  ρ̂_r = 0.873

N = 20:  α̂ = 7.134,  ρ̂ = 0.810
         Σ̂ = ( 2.801     0.0170      )
             ( 0.0170    0.182×10^-3 )
         α̂_r = 9.747,  ρ̂_r = 0.851

N = 30:  α̂ = 7.045,  ρ̂ = 0.808
         Σ̂ = ( 3.247     0.0162      )
             ( 0.0162    0.112×10^-3 )
         α̂_r = 8.290,  ρ̂_r = 0.845
Table 4.3
Generated and Expected Sequence for n = 100, α = 5.0, and ρ = 0.8

 i    x_i   m_n(0,i)      i    x_i   m_n(0,i)
 1    146    142.86      16    645    539.28
 2    179    168.54      17    655    550.42
 3    225    196.85      18    668    559.67
 4    265    227.41      19    677    567.30
 5    304    259.66      20    682    573.55
 6    339    292.89      21    686    578.65
 7    386    326.29      22    691    582.80
 8    425    359.05      23    696    586.16
 9    472    390.40      24    697    588.88
10    509    419.73      25    698    591.07
11    534    446.56      26    700    592.83
12    566    470.63      27    702    594.25
13    591    491.84      28    702    595.39
14    607    510.24      29    702    596.31
15    629    525.98      30    703    597.04
Fig. 4.2. Generated and Expected Sequence for n = 100, α = 5.0, and ρ = 0.8. [Plot of X(t) against t, vertical scale 150–700; open circles mark observed values, solid dots expected values.]
Full Text 
PAGE 1
ON SOME PROBLEMS OF ESTIMATION AND PREDICTION FOR NONSTATIONARY TIME SERIES By JAMES THOMAS Mc CLAVE A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 1971
PAGE 2
TO MY FATHER, MY MOTHER AND MY WIFE FOR THEIR LOVE, UNDERSTANDING AND FINANCIAL SUPPORT
PAGE 3
ACKNOUXEDGMENTS I wish to express my heartfelt thanks to Dr. K. C. Chanda for his expert and patient guidance in this effort. I also want to express my gratitude to Dr. J. G. Saw for proofreading this dissertation and for his helpful comments; and to my office partner, Dr. J. Shuster, for many helpful discussions. Special thanks go to my wife, Mary Jay, for her loving support, financial and otherwise. Finally, I wish to thank Mr& Edna Larrick for the magnificent job of turning the rough draft I gave her into this typing masterpiece.
PAGE 4
TABLE OF CONTENTS ACKNOWLEDGMENTS LIST OF TABLES . v LIST OF FIGURES vi ABSTRACT vii CHAPTER I STATEMENT OF THE PROBLEM 1 1.1 Introduction 1 1.2 History Â— Stationary Processes 4 1.3 History Â— Nonstationary Processes 5 1.4 Summary of Results 7 II LINEAR STATIONARY MODELS 9 2. 1 Introduction 9 2.2 The Autoregressive Sequence 10 2.3 The Moving Average Sequence 24 2.4 The Hybrid Model 32 III THE LOGISTIC GROWTH PROCESS 46 3.1 Introduction 46 3.2 Probability Distribution of fX(t) ] 47 3.3 Equispaced Observation 50 3.4 Continuous Observation in [0,T] 58 3.5 Arrival Time Observation 73 3.6 Prediction 75 IV AN EMPIRICAL COMPARISON 80 4. 1 Introduction 80 4.2 Rhodes' Estimators 80 4.3 Sampling Properties of Rhodes' Estimators 82 4.4 Monte Carlo Experiments 86 BIBLIOGRAPHY 100 BIOGRAPHICiVL SKETCH 103
PAGE 5
LIST OF TABLES Table 2.1 Tables of Efficiencies for q= E d C. j = l "^ ^ 4,1 Generated and Expected Sequence for n = 40 , a = 10.0, and p = 0.8 88 4.2 Maximum Likelihood Estimates, Z and Rhodes' Estimates for Data in Table 4.1 for n = 40, a= 10.0 and p = 0.8 . . 90 4.3 Generated and Expected Sequence for n = 100, a = 5.0, and p = 0.8 91 4.4 Maximum Likelihood Estimates, Z and Rhodes' Estimates for Data in Table 4.3 for n = 100, ck= 5.0 and p = 0.8 . . 92 4.5 Generated and Expected Sequence for n = 400, q! 3.0, and p = 0.8 94 4.6 Maximum Likelihood Estimates, E and Rhodes' Estimates for Data in Table 4.5 for n = 400, a = 3.0 and p = 0.8 . . 96 4.7 Summary of Results of Study
PAGE 6
LIST OF FIGURES Figiire Page 4.1 Generated and Expected Sequence for n = 40, q= 10.0, and p = 0. 8 89 4.2 Generated and Expected Sequence for n = 100, a= 5.0, and p = 0.8 92 4.3 Generated and Expected Sequence for n 400, a 3.0, and p = 0. 8 95
PAGE 7
Abstract of Dissertation Presented to the Graduate Council of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy ON SOIVEE PROBLEMS OF ESTIMATION AND PREDICTION FOR NONSTATIONARY TIME SERIES By James Thomas McClave March, 1971 Chairman: Dr. Kamal C. Chanda Major Department: Statistics Many techniques are available for estimating the parameters of linear stationary time series. The effect of some of these estimators on the least squares predictor for some future value of the series is examined. We have obtained approximations for the increase in prediction error due to parameter estimation in several cases for which no exact expression could be found. An efficient estimator for the parameters in a first order autoregressivemoving average model is developed by making use of a linear function of the autocorrelations. For more general models we conclude that efficient estimation is so difficult to attain that first consideration in many estimation prediction problems should be given to ease of calculation. Numerous unsolved estimation and prediction problems remain for nonstationary time series. We consider the logistic growth process, which is used extensively as an economic and population growth model, in detail. Current estimation procedures for the logistic process' parameters make no reference to an error structure.
PAGE 8
We propose a probability structure consistent with the realistic properties of the series. We then use this structure to obtain estimators from three different observational standpoints: (1) that of observation at equidistant time points, (2) that of continuous observation, and (3) that of arrival time observation. For observation types (1) and (2) we have used a modification of maximum likelihood procedures to obtain estimators having most of the usual properties associated with maximum likelihood estimators. A computer program was written for type (1) to solve the intractable estimation equations, using the NewtonRhapson procedure. Observation of arrival times is shown not to lead to any useful estimation procedures. We also examine the effect of the estimators calculated from the observations taken at equal time intervals (type (1) above) on the error of prediction. The procedure developed for observation type (1) is then compared by means of example to an estimation procedure developed by Rhodes. The logistic model is also fitted by each method to the population of conterminous United States. The estimates are then used to obtain predictions of the population of conterminous United States in 1970. We conclude from the results that the effort required to calculate the maximum likelihood estimates is worthwhile. We further conjecture that the methods developed may be applicable to other growth processes.
CHAPTER I
STATEMENT OF THE PROBLEM

1.1 Introduction

Consider the following two types of time series:

1. {X(t); t ∈ J_{a,b}}, where J_{a,b} is the set of integers {a, a+1, ..., b}. This is the discrete time series.

2. {X(t); t ∈ I_{a,b}}, where I_{a,b} is the interval (open or closed) between the real numbers a and b. This is the continuous time series.

Suppose an observation is made on a given series, either obtaining an observation set of the kind {x(S), x(S+1), ..., x(T)}, with S and T integers such that J_{S,T} ⊂ J_{a,b}, from the first or second type; or of the kind {x(t); t ∈ I_{S,T}}, with I_{S,T} ⊂ I_{a,b}, from the second type. The problem to be considered is that of finding the least squares predictor of X(T+m) (with m an integer if the series is discrete, and T+m ≤ b) based upon the observation. Thus, we want to find that function of the observation set, say x̂_T(m), which minimizes the mean square error of prediction.

The first step toward the solution of this problem is to construct a parametric representation of the series. One must make use of all available information about {X(t)} in order to place the problem in a parametric framework. This parametrization may vary from specification of the first
and/or second moments of the series to complete specification of the probability distribution at each point t. This study of the series' genesis will usually place the process into one of two subclasses:

(a) Weakly stationary, hereafter referred to as stationary, series; that is to say, those whose first two moments are, for every t,

E{X(t)} = μ, (1.1.1)

independent of t; and

E{X(t+s) X(t)} = γ_s, (1.1.2)

a function of s only.

(b) Nonstationary series, the complement of subclass (a).

If the series satisfies the conditions of stationarity, the prediction problem is simplified. The minimum mean square error predictor depends on the first two moments of the time series. Since for stationary processes the second moment depends only upon the "lag" or time difference of the random variables, and since the first moment is independent of time, all sequences of observations which are of equal lengths yield the same information about the series' first two moments, no matter when in the series' development the observations are taken. This second order homogeneity simplifies the problem to the extent that all further parametrization pertains to the description of μ and γ_s, from which the predictor will be calculated.

On the other hand, if the series is nonstationary, the problem is more complex because of the nature of this subclass. We will divide the nonstationary subclass into two categories:
(i) those for which a simple transformation exists which transforms the process into a stationary time series, and

(ii) those for which either no such transformation exists, or at least no simple transformation can be found.

As to the first category, the nature of the transformation depends, of course, on the reason for the series' nonstationarity. As a simple example, suppose {X(t)} is a continuous time process with

E{X(t)} = μ(t) = α + βt, (1.1.3)

E{[X(t) - μ(t)][X(t+s) - μ(t+s)]} = γ_s. (1.1.4)

Then the process {Y(t)}, formed by taking

Y(t) = X(t) - μ(t), (1.1.5)

will be stationary. This particular kind of transformation is popularly referred to as removal of the trend from {X(t)}; it can be accomplished for simple linear trends like that given by (1.1.3), or for more complex seasonal or harmonic trends common to many economic time series. The end result is the same; this further parametrization has permitted us to reduce the problem to that of stationary series prediction, for which solutions are known.

However, it is perhaps within the second category that one encounters the most complex prediction problems in time series. Without the homogeneity of the first two moments, the nonstationary process' analysis will certainly depend on when the observation is taken; and the underlying probability structure of the series can be extremely complicated.

In short, general methods cannot be used to solve the prediction problem for nonstationary processes with satisfactory results. Each
series must be considered individually, its evolution thoroughly researched, a parametrization made, and then an optimal prediction procedure developed. One can hope that this systematic method will apply to a general group of processes, but it is extremely doubtful that the actual procedure developed will apply to any but the specific series being investigated.

1.2 History: Stationary Processes

The literature on the prediction problem for the stationary time series is voluminous. That the area is extremely attractive follows from the above; most results have wide application within the subclass of stationary processes, and the chances that a problem is mathematically tractable are enhanced by stationarity. In addition, of course, stationary processes have great practical significance.

Problems of least squares prediction for stochastic processes were first considered by Kolmogorov [20], using Hilbert space geometry. This treatment has been adapted to the stationary processes by several authors, including Doob [9], Yaglom [36], and Parzen [24]. Hence the prediction problem is essentially solved for all stationary time series; the known general methods need only be applied to the specific stationary process being considered. Many authors have contributed to this end, among them Doob [9], Yaglom [36], Whittle [33,34], Box and Jenkins [3,4], and Jenkins and Watts [17].

The particular parametrization proposed often results in an estimation problem which must be solved prior to making the prediction. This is illustrated by three of the most commonly hypothesized discrete time series: the autoregressive, moving average, and mixed
autoregressive-moving average (hybrid) models. Each receives wide application in engineering and economic research, and each entails parameter estimation. These estimation problems have been considered by Mann and Wald [21], Whittle [33], Durbin [10,11,12], Walker [28,29,30,31], Hannan [16], and Box and Jenkins [4]. Some of the estimation procedures are rather complicated to apply; but, for the most part, satisfactory estimation methods exist for these linear stationary models.

The literature is nearly devoid of research exploring the effect of parameter estimation on the error of least squares predictors. Even though the error increase due to parameter estimation will usually be of a smaller order than the mean square error, the precise nature of this increase needs to be studied. Box and Jenkins [4] do present a brief discussion of this problem for one simple time series model, but to my knowledge the subject has not been mentioned elsewhere. This problem will be considered at some length in Chapter II.

1.3 History: Nonstationary Processes

All the work on the prediction problem for nonstationary time series is divided into two distinct groups: that which deals with trend removal in order to make the process stationary, and that which does not. We shall consider only the latter in this dissertation; the interested reader may consult Grenander and Rosenblatt [15] and Box and Jenkins [4] for a complete treatment of trend removal.

Kolmogorov's Hilbert space solution to the prediction problem is applicable to nonstationary processes. The solution depends mainly on the second order properties of the process. Since the second moments
are inherently more complex for nonstationary processes than those for stationary processes, the result of the application of Kolmogorov's method to a nonstationary time series usually does not have general application within the class of nonstationary processes. For this reason one finds that most of the estimation and prediction research in this area has dealt with specific process types; for example, queuing processes, renewal processes, birth-death processes, and growth (evolutive) processes. The list is not exhaustive; each class has received attention in the literature, much of it devoted to the estimation and prediction problems.

Specifically, considering the birth-death and growth processes, both usually Markov processes, most of the research has been concentrated on homogeneous processes; that is, those with transition probabilities independent of time. Feller [13] gives a rather complete introduction to the birth-death processes, defining their underlying probability structure. David Kendall [18] then uses generating functions to obtain solutions for the probability distribution of these series at any time t. Moran [23] partly solves the problem of estimation for homogeneous birth-death processes, and both Moran [22] and Kendall [19] propose several methods for estimating the parameters of the transition probabilities of some simple nonhomogeneous birth-death and growth processes.

Since the publication of these papers, research in this area has been sparse at best; and the prediction problem has received surprisingly little attention from statisticians. Biologists and economists
(Rhodes [25], Tinter [26], and Granger [14]) have taken some interest in this area, since it includes prediction of animal population size and economic forecasting. Unfortunately, these problems are often treated with little reference to underlying probability structures. The results rely more on intuition than mathematics, and hence the predictions are often unsatisfactory. An example of this kind of treatment is given by Rhodes [25], who considers the problem of estimating human population size without using any probabilistic arguments. His early paper was an indication of things to come. In all fairness, though, early researchers had a legitimate excuse for steering away from these arguments, for they often make prediction problems for nonhomogeneous Markov processes mathematically intractable. However, in this age of the electronic computer, many formerly insoluble problems can be treated iteratively and solutions obtained.

1.4 Summary of Results

The results are divided into two distinct categories. First, in Chapter II the effect of estimation on the error of prediction for linear stationary time series is considered. For several models an exact expression for the error increase due to parameter estimation is obtained; in many cases for which this is not possible, approximations for this increase are given. For the mixed autoregressive-moving average model, a new estimation procedure is examined with respect to the prediction problem.
In Chapter III the logistic growth process, a nonstationary Markov time series, is considered in detail. A realistic probabilistic structure is assigned to this process, and then the parameters involved are estimated from several different observational standpoints. Finally, the prediction problem is considered for this process, and the effect of parameter estimation on the mean square error of prediction is examined. The methods developed for the logistic process may be applicable to other nonstationary processes. Nevertheless, the logistic process has application in describing animal population growth and certain economic growth patterns, so that the results may be useful in themselves.

Chapter IV is devoted to a comparison of an estimation procedure (for the logistic process) derived by Rhodes [25] with one of those derived in this paper. The analytical intractability of the estimators' sampling properties calls for numerical comparison, so that conclusions are tentative. The purpose of this Monte Carlo study is to determine whether the extra effort which the probabilistic methods require is worthwhile, or whether one should continue to estimate the parameters using methods like those proposed by Rhodes; that is to say, those which rely mainly on intuitive appeal and their ease of calculation.
CHAPTER II
LINEAR STATIONARY MODELS

2.1 Introduction

Many volumes have been written about estimation and prediction problems for linear stationary time series, but there is little in the literature which considers the effect of estimation upon prediction. That is, the increase in the mean square error of prediction due to parameter estimation has not been examined. Knowledge about the increase in prediction error for various predictors may be useful. If the increase is small for several different estimators, one might choose the estimator easiest to calculate. On the other hand, one would spare no effort in finding the estimator which least increased the error of prediction if that increase were known to be significant.

Our purpose in this chapter is to consider this estimation-prediction problem for the three basic linear stationary time series: autoregressive, moving average, and a mixture of these (hybrid). Only discrete time series will be examined; since the stationarity of the process implies that the mean is mathematically independent of time, this mean will be assumed known and equal to zero. All results remain asymptotically true for the case of an unknown mean for the series formed by subtracting the arithmetic mean from each of the observed values. To pursue this would be to stray from the objective; the interested reader is referred to Bartlett [1] and Box and Jenkins [4].
Finally, the sequence {Z_t; t = 0, ±1, ...} will be used throughout to denote a stationary series whose first two moments are the following:

E{Z_t} = 0, (2.1.1)

E{Z_t Z_{t+j}} = σ² if j = 0, and = 0 if j ≠ 0. (2.1.2)

From time to time comment will be made on the case where Z_t is normally distributed. But unless specified, no distributional assumptions are made.

2.2 The Autoregressive Sequence

If {X_t; t ∈ J_{-∞,∞}} is a stationary time series satisfying the relationship

X_t + α_1 X_{t-1} + ... + α_p X_{t-p} = Z_t (2.2.1)

for every t, then {X_t} is said to be a p-th order autoregressive sequence. The problem of estimating the set of parameters {α_j; j ∈ J_{1,p}} is analogous to the classical regression problem. Defining γ_j as the j-step covariance of {X_t} and ρ_j as the j-step correlation, that is to say

γ_j = E{X_{t+j} X_t} (2.2.2)

and

ρ_j = γ_j γ_0^{-1}, (2.2.3)

we get

Σ_{j=1}^p α_j γ_{j-r} = -γ_r (2.2.4)

for r ∈ J_{1,∞} by multiplying (2.2.1) by X_{t+r} and taking expected values. We use the fact that Z_t and X_{t+r} are uncorrelated when r < 0, which
will be shown later. Equation (2.2.4) gives rise to the estimating equations

Σ_{j=1}^p α̂_j C_{j-r} = -C_r (2.2.5)

for r ∈ J_{1,p}, where

C_j = N^{-1} Σ_{t=1}^{N-j} x_t x_{t+j} (2.2.6)

is the sample covariance function for the observed sequence {x_t; t ∈ J_{1,N}}. Mann and Wald [21] have shown that the estimators {α̂_j; j ∈ J_{1,p}} given by (2.2.5) are asymptotically normally distributed with null mean and variance-covariance matrix N^{-1} σ² (γ_0 Σ)^{-1}, where Σ is a (p × p) matrix whose element in the (i,j) cell is

Σ_{ij} = ρ_{i-j}. (2.2.7)

The vector of estimates whose transpose is α̂^T = (α̂_1, α̂_2, ..., α̂_p) is shown to be the maximum likelihood estimator of α^T = (α_1, α_2, ..., α_p) if the distribution of Z_t is assumed normal for each t.

In order to predict the value of X_{N+m} using the observations x_1, x_2, ..., x_N, we will use the function which minimizes the mean square error of prediction. That is, the optimal predictor is x̂_N^{(1)}(m), which minimizes

MSE{x̂_N^{(1)}(m)} = E{[X_{N+m} - x̂_N^{(1)}(m)]²}. (2.2.8)

But this implies that

x̂_N^{(1)}(m) = E{X_{N+m}/(X_j = x_j; j ∈ J_{1,N})}. (2.2.9)
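The estimating equations (2.2.5) are linear in the α̂_j and are easily solved numerically. The sketch below is our own illustration, not part of the original text; the function names, the simulated first order model, and all numerical values are assumptions made purely for demonstration:

```python
import numpy as np

def sample_cov(x, j):
    """C_j = N^{-1} * sum_t x_t x_{t+j}, with the mean assumed known and zero."""
    N = len(x)
    return np.dot(x[:N - j], x[j:]) / N

def yule_walker(x, p):
    """Solve sum_j alphahat_j C_{j-r} = -C_r, r = 1..p, in the sign
    convention X_t + alpha_1 X_{t-1} + ... + alpha_p X_{t-p} = Z_t."""
    c = np.array([sample_cov(x, j) for j in range(p + 1)])
    C = np.array([[c[abs(j - r)] for j in range(1, p + 1)]
                  for r in range(1, p + 1)])          # Toeplitz matrix of C_{j-r}
    return np.linalg.solve(C, -c[1:p + 1])

# Simulated AR(1) with X_t - 0.6 X_{t-1} = Z_t, i.e. alpha_1 = -0.6 (assumed values).
rng = np.random.default_rng(0)
z = rng.standard_normal(5000)
x = np.empty_like(z)
x[0] = z[0]
for t in range(1, len(z)):
    x[t] = 0.6 * x[t - 1] + z[t]
alpha_hat = yule_walker(x, 1)       # should be near -0.6
```

With a series of this length the sampling error of α̂_1 is roughly N^{-1/2}(1-α²)^{1/2} ≈ 0.011, consistent with (2.2.15).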
Substitution of (2.2.1) into (2.2.9) yields

x̂_N^{(1)}(m) = E{[Z_{N+m} - Σ_{j=1}^p α_j X_{N+m-j}]/(X_j = x_j; j ∈ J_{1,N})}
= -Σ_{j=1}^p α_j x̂_N^{(1)}(m-j), (2.2.10)

where

x̂_N^{(1)}(j) = E{X_{N+j}/(X_k = x_k; k ∈ J_{1,N})} if j > 0, and x̂_N^{(1)}(j) = x_{N+j} if j ≤ 0. (2.2.11)

Thus one can calculate the m-step predictor, x̂_N^{(1)}(m), using the recursive relation (2.2.10). This, however, requires that the autoregressive parameters be known. When they are not, we may logically use

x̂_N^{(2)}(m) = -Σ_{j=1}^p α̂_j x̂_N^{(2)}(m-j). (2.2.12)

To be determined is the increase in the error of prediction caused by using x̂_N^{(2)}(m) instead of x̂_N^{(1)}(m).

Case 1: p = m = 1

For the sake of simplicity, let α = -α_1, so that {X_t} satisfies the relationship

X_t - αX_{t-1} = Z_t (2.2.13)

for all t. Using the Mann-Wald estimate for α, one has

α̂ = C_1 C_0^{-1} (2.2.14)

and

Var(α̂) ≈ N^{-1} σ² γ_0^{-1}. (2.2.15)
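The recursion (2.2.10)-(2.2.11) can be sketched in a few lines; this is an illustrative sketch of ours (the function name and example values are assumptions), not part of the original text:

```python
def ar_predict(x, alphas, m):
    """m-step predictor via the recursion (2.2.10):
    xhat(k) = -sum_j alphas[j-1] * xhat(k-j), with xhat(k) = x_{N+k} for k <= 0.
    `alphas` follow the convention X_t + alpha_1 X_{t-1} + ... + alpha_p X_{t-p} = Z_t."""
    p = len(alphas)
    hist = list(x)                    # hist[-1] plays the role of x_N
    for _ in range(m):
        nxt = -sum(alphas[j] * hist[-1 - j] for j in range(p))
        hist.append(nxt)              # predicted values feed back into the recursion
    return hist[-1]

# For p = 1 with alpha_1 = -a the recursion collapses to a**m * x_N (cf. Case 2 below).
pred = ar_predict([1.0, 2.0], [-0.5], 3)   # 0.5**3 * 2.0 = 0.25
```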
Rewriting (2.2.13) and using the fact that the stationarity of {X_t} implies |α| < 1, one obtains

X_t = α^k X_{t-k} + Σ_{j=0}^{k-1} α^j Z_{t-j} (2.2.16)

for any positive integer k. We can extend this to the limit in the mean square sense as k → ∞. That is, one can write

X_t = Σ_{j=0}^∞ α^j Z_{t-j}, (2.2.17)

so that

E{X_t²} = (1-α²)^{-1} σ² (2.2.18)

is finite. Note that the left side of (2.2.18) is, by definition, γ_0, and

γ_k = E{X_{t+k} X_t} = E{[Σ_{j=0}^∞ α^j Z_{t+k-j}][Σ_{j=0}^∞ α^j Z_{t-j}]} = α^k γ_0. (2.2.19)

Thus

ρ_k = γ_k γ_0^{-1} = α^k. (2.2.20)

In (2.2.19) and (2.2.20) we assume k is a positive integer; that γ_{-k} = γ_k is easily shown.
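The geometric decay of the covariances in (2.2.18)-(2.2.20) is easy to confirm by simulation. The following is our own illustrative check (the parameter value, seed, and series length are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
a, n = 0.8, 200000
z = rng.standard_normal(n)                      # sigma^2 = 1
x = np.empty(n)
x[0] = z[0]
for t in range(1, n):                           # X_t = a X_{t-1} + Z_t
    x[t] = a * x[t - 1] + z[t]

gamma0 = np.dot(x, x) / n                       # near (1 - a**2)**-1, cf. (2.2.18)
rho1 = np.dot(x[:-1], x[1:]) / np.dot(x, x)     # near a,    cf. (2.2.20)
rho2 = np.dot(x[:-2], x[2:]) / np.dot(x, x)     # near a**2, cf. (2.2.20)
```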
The minimum mean square error predictor of X_{N+1} is given by setting p = m = 1 in (2.2.10), whereupon

x̂_N^{(1)}(1) = αx_N. (2.2.21)

The prediction error is given by

MSE_1 = E{(X_{N+1} - αX_N)²} = σ². (2.2.22)

If α is unknown, the predictor corresponding to (2.2.12) is

x̂_N^{(2)}(1) = α̂x_N. (2.2.23)

Using the fact that [X_{N+1} - x̂_N^{(1)}(1)] is uncorrelated with the sequence {X_j; j ∈ J_{1,N}}, one has

MSE_2 = MSE_1 + E{X_N²(α̂-α)²}. (2.2.24)

Thus the increase in the error of prediction due to estimating α is

MSE_2 - MSE_1 = E{X_N²(α̂-α)²}. (2.2.25)

Suppose the estimator α̂ is calculated using only the first k observations in the sequence {x_j; j ∈ J_{1,N}}, where k << N; that is to say, we form the sample covariances in (2.2.14) using only k observations. Since ρ_j decreases exponentially, the {X_j; j ∈ J_{1,k}} and X_N are virtually uncorrelated, and hence so are α̂ and X_N. A first approximation for (2.2.25) is thus
E{X_N²(α̂-α)²} ≈ E{X_N²} Var(α̂) = γ_0 k^{-1}(1-α²) = k^{-1} σ². (2.2.26)

Box and Jenkins [4] present an argument similar to the above. The degree to which the approximation (2.2.26) holds is impossible to ascertain from these arguments. To obtain a more exact expression for (2.2.25), first define

δ_j = C_j - γ_j (2.2.27)

for j ∈ J_{0,1}. Then we can write

α̂ - α = C_1 C_0^{-1} - α = (δ_0 + γ_0)^{-1}(δ_1 - αδ_0), (2.2.28)

using the fact that γ_1 - αγ_0 = 0. Then

E{(α̂-α)²} = E{(δ_1 - αδ_0)²(δ_0 + γ_0)^{-2}}
= E{γ_0^{-2}(δ_1 - αδ_0)²(1 - 2δ_0γ_0^{-1} + 3δ_0²γ_0^{-2} - ...)}
= E{γ_0^{-2}(δ_1 - αδ_0)²} + o(N^{-1}), (2.2.29)

since C_0 and C_1 have sampling errors whose order of magnitude is N^{-1/2} (Bartlett [1]). Thus, retaining terms of order N^{-1} and higher, (2.2.25) becomes

E{X_N²(α̂-α)²} ≈ γ_0^{-2} E{X_N²(δ_1 - αδ_0)²}. (2.2.30)

We now state

Theorem 2.2.1: If {X_t; t ∈ J_{-∞,∞}} is an autoregressive sequence of order one, then
MSE_2 - MSE_1 = N^{-1}{κ_2(2) + (1 + α²)^{-1} α² κ_2(4) κ_2^{-1}(2)} + o(N^{-1}), (2.2.31)

where κ_2(r) is the r-th cumulant of the distribution of {Z_t}.

Proof: Referring to (2.2.30), we must evaluate

(i) E{X_N² δ_0²}, (ii) E{X_N² δ_1²}, and (iii) E{X_N² δ_0 δ_1}.

(i) We will evaluate the first term in detail in order to make the methods clear. First, note that

E{X_N² δ_0²} = E{X_N²(N^{-1} Σ_{t=1}^N X_t² - γ_0)²}
= E{N^{-2} X_N²(Σ_{t=1}^N X_t²)² - 2N^{-1} γ_0 X_N² Σ_{t=1}^N X_t² + γ_0² X_N²}. (2.2.32)

Now

E{X_N²(Σ_{t=1}^N X_t²)²} = Σ_{t=1}^N E{X_N² X_t⁴} + 2 Σ_{t=2}^N Σ_{s=1}^{t-1} E{X_N² X_s² X_t²}. (2.2.33)

From (2.2.16), for t > s,

X_t = α^{t-s} X_s + Σ_{k=0}^{t-s-1} α^k Z_{t-k} = α^{t-s} X_s + Y_{s,t}, (2.2.34)
where

Y_{s,t} = Σ_{k=0}^{t-s-1} α^k Z_{t-k}. (2.2.35)

We now write

V_{s,t}^{(k)} = E{Y_{s,t}^k}, (2.2.36)

defining

V_{s,s}^{(k)} = Y_{s,s} = 0 (2.2.37)

for all k ∈ J_{1,∞}. Hence we obtain from (2.2.34), for t ≤ N,

E{X_N² X_t²} = α^{2(N-t)} E{X_t⁴} + V_{t,N}^{(2)} E{X_t²}. (2.2.38)

Furthermore, for s ≤ t ≤ N,

E{X_N² X_s² X_t²} = E{[α^{N-s} X_s + α^{N-t} Y_{s,t} + Y_{t,N}]² [α^{t-s} X_s + Y_{s,t}]² X_s²}
= α^{2(N-t)+4(t-s)} E{X_s⁶} + [6α^{2(N-t)+2(t-s)} V_{s,t}^{(2)} + α^{2(t-s)} V_{t,N}^{(2)}] E{X_s⁴}
+ 4α^{2(N-t)+(t-s)} V_{s,t}^{(3)} E{X_s³} + [α^{2(N-t)} V_{s,t}^{(4)} + V_{s,t}^{(2)} V_{t,N}^{(2)}] E{X_s²}. (2.2.39)

Before we can write (2.2.32) in terms of the cumulants of {Z_t}, we need to note that the r-th cumulant of the {X_t} process is, using (2.2.17),
κ_1(r) = (1-α^r)^{-1} κ_2(r), (2.2.40)

and hence that

E{X_s⁶} = (1-α⁶)^{-1} κ_2(6) + 15[(1-α²)(1-α⁴)]^{-1} κ_2(4) κ_2(2) + 10(1-α³)^{-2} κ_2²(3) + 15(1-α²)^{-3} κ_2³(2), (2.2.41)

E{X_s⁴} = (1-α⁴)^{-1} κ_2(4) + 3(1-α²)^{-2} κ_2²(2), (2.2.42)

E{X_s³} = (1-α³)^{-1} κ_2(3), (2.2.43)

and

E{X_s²} = (1-α²)^{-1} κ_2(2). (2.2.44)

We will also need, from (2.2.35) and (2.2.36),

V_{s,t}^{(4)} = Σ_{j=0}^{t-s-1} α^{4j}[κ_2(4) + 3κ_2²(2)] + 6 Σ_{i=1}^{t-s-1} Σ_{j=0}^{i-1} α^{2i+2j} κ_2²(2)
= (1-α⁴)^{-1}(1-α^{4(t-s)}) κ_2(4) + 3(1-α²)^{-2}(1-α^{2(t-s)})² κ_2²(2), (2.2.45)

V_{s,t}^{(3)} = Σ_{j=0}^{t-s-1} α^{3j} κ_2(3) = (1-α³)^{-1}(1-α^{3(t-s)}) κ_2(3), (2.2.46)

and

V_{s,t}^{(2)} = Σ_{j=0}^{t-s-1} α^{2j} κ_2(2) = (1-α²)^{-1}(1-α^{2(t-s)}) κ_2(2). (2.2.47)

We are now prepared to obtain (2.2.39) in terms of the cumulants of {Z_t}; the result of substitution of (2.2.41)-(2.2.47) into (2.2.39) is
E{X_N² X_s² X_t²} = (1-α⁶)^{-1} α^{2(N-s)+2(t-s)} κ_2(6)
+ [(1-α²)(1-α⁴)]^{-1}[8α^{2(N-s)+2(t-s)} + 5α^{2(N-s)} + α^{2(N-t)} + α^{2(t-s)}] κ_2(4) κ_2(2)
+ (1-α³)^{-2}[6α^{2(N-s)+2(t-s)} + 4α^{2N-s-t}] κ_2²(3)
+ (1-α²)^{-3}[1 + 10α^{2(N-s)} + 2α^{2(N-t)} + 2α^{2(t-s)}] κ_2³(2). (2.2.48)

We now substitute (2.2.48) into (2.2.33) to get

E{X_N²(Σ_{t=1}^N X_t²)²} = (1-α⁶)^{-1} κ_2(6)[Σ_{t=0}^{N-1} α^{2t} + 2 Σ_{t=2}^N Σ_{s=1}^{t-1} α^{2(N-s)+2(t-s)}]
+ [(1-α²)(1-α⁴)]^{-1} κ_2(4) κ_2(2){N + 16 Σ_{t=2}^N Σ_{s=1}^{t-1} α^{2(N-s)+2(t-s)} + Σ_{t=0}^{N-2} α^{2t}[2(N-t-1) + α²(2N+8t+8)] + 14 Σ_{t=0}^{N-1} α^{2t}}
+ (1-α³)^{-2} κ_2²(3){10 Σ_{t=0}^{N-1} α^{2t} + 12 Σ_{t=2}^N Σ_{s=1}^{t-1} α^{2(N-s)+2(t-s)} + 8 Σ_{t=2}^N Σ_{s=1}^{t-1} α^{2N-s-t}}
+ (1-α²)^{-3} κ_2³(2){N² + 2N + Σ_{t=0}^{N-2} α^{2t}[4(N-t-1) + α²(4N+16t+16)] + 12 Σ_{t=0}^{N-1} α^{2t}}. (2.2.49)

The second term on the right side of (2.2.32) is similarly evaluated to obtain
E{X_N² Σ_{t=1}^N X_t²} = Σ_{t=1}^N {α^{2(N-t)} E{X_t⁴} + V_{t,N}^{(2)} E{X_t²}}
= Σ_{t=1}^N {(1-α⁴)^{-1} α^{2(N-t)} κ_2(4) + (1-α²)^{-2} κ_2²(2)[1 + 2α^{2(N-t)}]}
= (1-α⁴)^{-1} κ_2(4) Σ_{t=0}^{N-1} α^{2t} + (1-α²)^{-2} κ_2²(2)[N + 2 Σ_{t=0}^{N-1} α^{2t}]. (2.2.50)

Putting (2.2.49) and (2.2.50) into (2.2.32) yields

E{X_N² δ_0²} = N^{-1}{[(1-α⁴)(1-α²)²]^{-1}(1+α²) κ_2(4) κ_2(2) + 2(1-α²)^{-4}(1+α²) κ_2³(2)} + o(N^{-1}). (2.2.51)

(ii) We proceed in the manner described above to obtain

E{X_N² δ_1²} = N^{-1}{2[(1-α⁴)(1-α²)²]^{-1} α² κ_2(4) κ_2(2) + (1-α²)^{-4}(1+4α²-α⁴) κ_2³(2)} + o(N^{-1}). (2.2.52)

(iii) Similarly, we find

E{X_N² δ_0 δ_1} = N^{-1}{[(1-α⁴)(1-α²)²]^{-1} α(1+α²) κ_2(4) κ_2(2) + 4(1-α²)^{-4} α κ_2³(2)} + o(N^{-1}). (2.2.53)

Finally, from (2.2.30) and (2.2.51)-(2.2.53),

MSE_2 - MSE_1 = (1-α²)² κ_2^{-2}(2)[E{X_N² δ_1²} - 2αE{X_N² δ_0 δ_1} + α²E{X_N² δ_0²}]
= N^{-1}{κ_2(2) + (1+α²)^{-1} α² κ_2(4) κ_2^{-1}(2)} + o(N^{-1}), ((2.2.31))

which completes the proof of the theorem.
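The theorem can be checked numerically. The following Monte Carlo sketch is our own illustration (model parameters, sample sizes, and seed are arbitrary assumptions): for normal Z the fourth cumulant vanishes, so by (2.2.31) the increase in one-step prediction error should be close to σ²/N.

```python
import numpy as np

rng = np.random.default_rng(1)
a, N, reps = 0.5, 200, 10000
z = rng.standard_normal((reps, N + 1))            # kappa_2(2) = sigma^2 = 1
x = np.empty_like(z)
x[:, 0] = z[:, 0]
for t in range(1, N + 1):                         # X_t = a X_{t-1} + Z_t
    x[:, t] = a * x[:, t - 1] + z[:, t]

obs = x[:, :N]                                    # the N observed values per replicate
# alpha-hat = C_1 / C_0, as in (2.2.14), one estimate per replicate
a_hat = np.sum(obs[:, :-1] * obs[:, 1:], axis=1) / np.sum(obs * obs, axis=1)

mse_known = np.mean((x[:, N] - a * obs[:, -1]) ** 2)    # predictor (2.2.21)
mse_est = np.mean((x[:, N] - a_hat * obs[:, -1]) ** 2)  # predictor (2.2.23)
increase = mse_est - mse_known                    # theory: about 1/N = 0.005
```

The observed increase fluctuates around 1/N with Monte Carlo error of roughly 0.001 at these settings.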
Note that if {Z_t} is a normal process, κ_2(4) is zero, so that

MSE_2 - MSE_1 = N^{-1} κ_2(2) + o(N^{-1}) = N^{-1} σ² + o(N^{-1}). (2.2.54)

Thus we have the rather remarkable result that, when {Z_t} is a normal process, X_N² and (α̂-α)² are uncorrelated when terms of lower order than N^{-1} are ignored. That is,

MSE_2 - MSE_1 = E{X_N²} E{(α̂-α)²} + o(N^{-1}). (2.2.55)

Case 2: p = 1, m general

Using (2.2.9) and (2.2.16),

x̂_N^{(1)}(m) = E{[α^m X_N + Σ_{j=0}^{m-1} α^j Z_{N+m-j}]/(X_j = x_j; j ∈ J_{1,N})} = α^m x_N. (2.2.56)

Then

MSE_1 = E{(X_{N+m} - α^m X_N)²} = σ²(1-α^{2m})(1-α²)^{-1}. (2.2.57)

Assuming α unknown, one forms the predictor

x̂_N^{(2)}(m) = α̂^m x_N. (2.2.58)

As in Case 1,

MSE_2 - MSE_1 = E{[x̂_N^{(1)}(m) - x̂_N^{(2)}(m)]²} = E{X_N²[(δα + α)^m - α^m]²}, (2.2.59)

writing δα = α̂ - α.
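Since the m-step error is Σ_{j=0}^{m-1} α^j Z_{N+m-j}, the closed form in (2.2.57) is simply a finite geometric sum. A quick numerical check (our own sketch; the values of α, σ², and m are assumed for illustration):

```python
# The m-step prediction error variance two ways: term by term, and via (2.2.57).
a, sigma2, m = 0.7, 2.0, 5
mse_sum = sigma2 * sum(a ** (2 * j) for j in range(m))       # sum of a^{2j} terms
mse_closed = sigma2 * (1 - a ** (2 * m)) / (1 - a ** 2)      # closed form (2.2.57)
```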
Upon expanding (2.2.59) and remembering that δα has sampling error of order N^{-1/2}, we have

MSE_2 - MSE_1 = E{X_N²[(δα)^m + mα(δα)^{m-1} + ... + mα^{m-1}(δα)]²}
= m² α^{2(m-1)} E{X_N²(δα)²} + o(N^{-1}). (2.2.60)

Using Theorem 2.2.1 and assuming Z_t normal, this becomes

MSE_2 - MSE_1 = N^{-1} m² α^{2(m-1)} σ² + o(N^{-1}). (2.2.61)

If the assumption is dropped, the fourth order cumulant must be included in (2.2.61).

Case 3: p general, m = 1

The optimal predictor here is

x̂_N^{(1)}(1) = -Σ_{j=1}^p α_j x_{N+1-j}, (2.2.62)

with prediction error

MSE_1 = E{[X_{N+1} - x̂_N^{(1)}(1)]²} = E{(X_{N+1} + Σ_{j=1}^p α_j X_{N+1-j})²} = σ². (2.2.63)

The predictor becomes, when the parameters are unknown,

x̂_N^{(2)}(1) = -Σ_{j=1}^p α̂_j x_{N+1-j}. (2.2.64)

The increase in prediction error is given by

MSE_2 - MSE_1 = E{[x̂_N^{(2)}(1) - x̂_N^{(1)}(1)]²} = E{(δα)^T X X^T (δα)}, (2.2.65)
using matrix notation with X^T = (X_N, X_{N-1}, ..., X_{N-p+1}). The evaluation of (2.2.65) is intractable; Theorem 2.2.1 leads us to conjecture that, for normal Z_t, X and (δα) are uncorrelated (ignoring terms of order less than N^{-1}). In this case

MSE_2 - MSE_1 ≈ γ_0 E{(δα)^T Σ (δα)},

where X X^T has been replaced by its expected value. It follows from the results of Mann and Wald [21] that, asymptotically,

√N (δα) ~ N_p{φ_p, σ²(γ_0 Σ)^{-1}}, (2.2.66)

where N_p denotes a p-variate normal distribution and φ_p a null vector of order p. Thus

MSE_2 - MSE_1 = γ_0 N^{-1} σ² γ_0^{-1} p + o(N^{-1}) = N^{-1} p σ² + o(N^{-1}), (2.2.67)

using the fact that N(δα)^T [σ²(γ_0 Σ)^{-1}]^{-1} (δα) has a chi-square distribution with p degrees of freedom. Note that expression (2.2.67) is a simple generalization of expression (2.2.54) for the case p = 1.

Case 4: p, m general

In this most general case, the predictor is (2.2.10), where the values x̂_N^{(1)}(m-j) are determined recursively. When {α̂_j; j ∈ J_{1,p}} are substituted for {α_j; j ∈ J_{1,p}}, one obtains x̂_N^{(2)}(m), given by (2.2.12). We are interested in calculating

MSE_2 - MSE_1 = E{[Σ_{j=1}^p (α_j x̂_N^{(1)}(m-j) - α̂_j x̂_N^{(2)}(m-j))]²}. (2.2.68)

All attempts to obtain (2.2.68) in closed form have been unsuccessful. That the term is of order N^{-1} follows from the properties of (δα).
Further conclusions might be possible with the aid of Monte Carlo experiments.

In summary, the autoregressive process allows one to calculate estimators having many satisfying properties, including relative ease of calculation. Furthermore, the effect of these estimators on the error of prediction becomes negligible when N is large. Exactly how large N must be depends upon p and m. We have obtained precise expressions for the increase in prediction error for several special cases.

2.3 The Moving Average Sequence

If {X_t; t ∈ J_{-∞,∞}} is a stationary time series satisfying the relationship

X_t = Z_t + β_1 Z_{t-1} + ... + β_q Z_{t-q} (2.3.1)

for every t, then {X_t} is said to be a q-th order moving average process. The problem of parameter estimation for the moving average sequence is complicated by the fact that the sample covariances, C_j, do not provide efficient estimators of the covariance function of {X_t}, γ_j. (Efficiency is defined with respect to the variance of the maximum likelihood estimator when Z_t is assumed normal.) The following is a brief history of the estimation problem.

Whittle [33] obtained maximum likelihood equations for the moving average process assuming Z_t to be normal. Estimators were not obtained from these equations, however; even iterative methods have proven unsatisfactory for solving the likelihood equations. The search thus began for estimators of calculable form and with high efficiency.
We note that

γ_j = E{X_{t+j} X_t} = E{(Σ_{k=0}^q β_k Z_{t+j-k})(Σ_{k=0}^q β_k Z_{t-k})}
= σ² Σ_{k=0}^{q-j} β_k β_{k+j} if j ∈ J_{0,q}, and γ_j = 0 if j ∈ J_{q+1,∞}, (2.3.2)

defining β_0 = 1 and noting that γ_{-j} = γ_j. One can obtain estimates for {β_j; j ∈ J_{1,q}} by substituting C_j for γ_j in (2.3.2) for j ∈ J_{0,q}. However, the lack of efficiency of C_j for γ_j has a predictable effect on these estimators. Whittle [33] has shown that for q = 1 and β_1 = 0.5, the efficiency of this method is about 0.3.

Another representation of {X_t} is the following:

X_t + Σ_{j=1}^∞ α_j X_{t-j} = Z_t, (2.3.3)

where

α_j = Σ_{i=1}^q c_i η_i^j, (2.3.4)

with the c_i constants determined by the roots, and where {η_i; i ∈ J_{1,q}} are the (distinct) roots of

z^q + β_1 z^{q-1} + ... + β_q = 0. (2.3.5)
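For q = 1, the moment relation in (2.3.2) gives ρ_1 = γ_1 γ_0^{-1} = β(1+β²)^{-1}, so the substitution estimator amounts to solving a quadratic and keeping the root inside the unit circle. The following is our own minimal sketch of that inversion (the function name is an assumption); it requires |r_1| ≤ 1/2, which holds for any MA(1) correlation:

```python
import math

def ma1_moment_estimate(r1):
    """Invert r_1 = beta / (1 + beta**2), keeping the root with |beta| < 1.
    Valid for |r1| <= 0.5, the attainable range for an MA(1) lag-one correlation."""
    if r1 == 0.0:
        return 0.0
    disc = 1.0 - 4.0 * r1 * r1          # discriminant of r1*b^2 - b + r1 = 0
    return (1.0 - math.sqrt(disc)) / (2.0 * r1)

b = ma1_moment_estimate(0.4)            # 0.4 = 0.5 / (1 + 0.25), so b is near 0.5
```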
If the roots of (2.3.5) are not all different, the representation is not affected; only the definition of the parameter set {α_j; j ∈ J_{1,∞}} differs slightly. Also, in order that (2.3.3) be a valid representation (in the mean square sense), we must require that all roots of (2.3.5) lie inside the unit circle. This restriction is not serious; in fact, it insures that the representation (2.3.1) is unique. Thus {X_t} may be considered an autoregressive process of infinite order.

Durbin [10] proposes that, since it is easily shown that α_r decreases exponentially in r, we use the representation

X_t + Σ_{j=1}^k α_j X_{t-j} ≈ Z_t, (2.3.6)

where k is chosen large, but still k << N. The parameters {α_j; j ∈ J_{1,k}} may then be estimated using methods discussed in 2.2; finally, (2.3.4) and (2.3.5) are used to transform {α̂_j; j ∈ J_{1,k}} to estimators of {β_j; j ∈ J_{1,q}}. Durbin shows the estimators can be made efficient by choosing k (and hence N) large. All properties derived assume N infinite and k large enough that relationships (2.3.3) and (2.3.6) are nearly equivalent. The computation of the estimators requires considerable effort.

Walker [31] finds linear functions of the sample correlations {r_j = C_j C_0^{-1}; j ∈ J_{1,N}} which provide asymptotically efficient estimators of the correlation function {ρ_j = γ_j γ_0^{-1}; j ∈ J_{1,q}}. He then
substitutes these estimators into (2.3.2) (expressing (2.3.2) in terms of correlations rather than covariances), obtaining asymptotically efficient estimators of {β_j; j ∈ J_{1,q}}. The efficiency depends primarily on the number of correlations used in the calculation of the estimators and the number of observations, N. The weights assigned to the sample correlations used in these linear estimators are determined iteratively, so calculations may be tedious.

The most recent contribution is due to Hannan [16]. He uses the spectral representation of {X_t} to obtain parameter estimators. Though his work is too complicated to summarize here, it should be noted that the estimators are shown to be asymptotically normal with variance-covariance matrix identical to that of the maximum likelihood estimators. The procedure does involve an iterative step which, by his own examples, may be slow and thus expensive. His Monte Carlo experiments also showed inexplicable bias in his estimates.

By now one realizes that the estimation problem for the moving average time series is anything but simple; but the author's objective is to obtain an estimator suitable for use in a predictor. The predictor which minimizes the mean square error of prediction is

x̂_N^{(1)}(m) = E{X_{N+m}/(X_j = x_j; j ∈ J_{1,N})}. (2.3.7)
If the entire history of {X_t} were known, that is, if the observation were {x_j; j ∈ J_{-∞,N}}, we would have

x̂_N^{(1)}(m) = E{Σ_{j=0}^q β_j Z_{N+m-j}/(X_j = x_j; j ∈ J_{-∞,N})}. (2.3.8)

Note that {X_j = x_j; j ∈ J_{-∞,N}} uniquely determines {Z_j = z_j; j ∈ J_{-∞,N}}, and conversely, by using (2.3.1) and (2.3.3). Thus

E{Z_{N+k}/(X_j = x_j; j ∈ J_{-∞,N})} = E{Z_{N+k}/(Z_j = z_j; j ∈ J_{-∞,N})} = 0 (2.3.9)

for k ∈ J_{1,∞}, since the {Z_t} process is an uncorrelated sequence. Thus (2.3.8) becomes

x̂_N^{(1)}(m) = Σ_{j=m}^q β_j z_{N+m-j}. (2.3.10)

Remembering that α_r decreases exponentially in r, we can approximate z_{N-k} by

ẑ_{N-k} ≈ x_{N-k} + Σ_{j=1}^{N-k-1} α_j x_{N-k-j} (2.3.11)

for use in (2.3.10). Henceforth we assume that the entire history of the {X_t} series is known up to time N so that the discussion is not needlessly complicated.
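The reconstruction (2.3.11) can be made concrete for q = 1, where α_j = (-β)^j and the infinite expansion is equivalent to the one-term recursion Z_t = X_t - βZ_{t-1}. The sketch below is our own illustration (function name and starting value z_0 = 0 are assumptions); since |β| < 1, the error introduced by the assumed starting value dies out geometrically:

```python
def ma1_residuals(x, beta, z0=0.0):
    """Reconstruct z_t from X_t = Z_t + beta * Z_{t-1} via Z_t = X_t - beta * Z_{t-1}.
    The assumed starting value z0 contributes an error of order beta**t at step t."""
    z = []
    prev = z0
    for xt in x:
        prev = xt - beta * prev
        z.append(prev)
    return z

x = [1.0, 0.5, -0.25]
z = ma1_residuals(x, 0.5)     # z = [1.0, 0.0, -0.25]
pred = 0.5 * z[-1]            # one-step predictor beta * z_N, cf. (2.3.10) with q = m = 1
```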
Case 1: q = m = 1

From (2.3.10),

x̂_N^{(1)}(1) = βz_N, (2.3.12)

writing β = β_1. The error is

MSE_1 = E{(X_{N+1} - βZ_N)²} = σ². (2.3.13)

If β is unknown, and one uses the estimator β̂, we have

x̂_N^{(2)}(1) = β̂ẑ_N, (2.3.14)

where from (2.3.3)

Z_t = X_t + Σ_{j=1}^∞ (-β)^j X_{t-j}, (2.3.15)
and so

ẑ_N = x_N + Σ_{j=1}^∞ (-β̂)^j x_{N-j}. (2.3.16)

From previous arguments we know that

MSE_2 - MSE_1 = E{(β̂ẑ_N - βZ_N)²}. (2.3.17)

If we assume that β̂ has sampling error of order N^{-1/2}, which is the case for each of the estimation procedures considered above, we have

MSE_2 - MSE_1 = E{[βZ_N - (δβ + β)(δz_N + Z_N)]²}
= E{Z_N²(δβ)²} + β² E{(δz_N)²} + o(N^{-1}), (2.3.18)

where δβ = β̂ - β and δz_N = ẑ_N - Z_N.

Note that it follows from (2.3.9) that if we use {x_j; j ∈ J_{1,N-1}} in calculating β̂, then β̂ and Z_N are uncorrelated. Then

E{Z_N²(δβ)²} = E{Z_N²} E{(δβ)²} = σ² E{(β̂-β)²}. (2.3.19)

The second term on the right-hand side of (2.3.18) follows from

E{(δz_N)²} = E{[Σ_{j=1}^∞ (-1)^j (β̂^j - β^j) X_{N-j}]²}
= E{[Σ_{j=1}^∞ (-1)^j j β^{j-1} (δβ) X_{N-j}]²} + o(N^{-1}). (2.3.20)

The value of (2.3.20) depends on the specific form of β̂. However, the expression will be complicated for all estimates considered. One can
PAGE 39
MSE_1 = E{[X_{N+m} - \sum_{j=m}^{q} \beta_j Z_{N+m-j}]^2} = \sigma_z^2 (1 + \sum_{j=1}^{m-1} \beta_j^2) .    (2.3.25)

When the estimator

\hat x_N^{(2)}(m) = \sum_{j=m}^{q} \hat\beta_j \hat z_{N+m-j}    (2.3.26)

is used, the increase in prediction error is given by

MSE_2 - MSE_1 = E{[\hat x_N^{(2)}(m) - x_N^{(1)}(m)]^2}
              = E{[\sum_{j=m}^{q} {Z_{N+m-j}(\delta\beta_j) + \beta_j(\delta Z_{N+m-j})}]^2} + o(N^{-1}) .    (2.3.27)

By using only {x_j; j ∈ J_{1,N-q}} in the calculation of {\hat\beta_j; j ∈ J_{1,q}}, the cross-product terms in (2.3.27) vanish, leaving

MSE_2 - MSE_1 = E{[\sum_{j=m}^{q} Z_{N+m-j}(\delta\beta_j)]^2} + E{[\sum_{j=m}^{q} \beta_j \delta Z_{N+m-j}]^2} + o(N^{-1}) .    (2.3.28)

The first term on the right-hand side of (2.3.28) is easily evaluated as

E{[\sum_{j=m}^{q} Z_{N+m-j}(\delta\beta_j)]^2} = \sigma_z^2 \sum_{j=m}^{q} E{(\beta_j - \hat\beta_j)^2} .    (2.3.29)

The second term, however, is complicated by the fact that the {X_t} process is q-dependent; that is to say, {\gamma_j; j ∈ J_{-q,q}} are all nonzero. If we apply the methods used in Case 1,

E{[\sum_{j=m}^{q} \beta_j \delta Z_{N+m-j}]^2} \approx E{[\sum_{j=m}^{q} \beta_j \sum_{\ell=1}^{\infty} (\hat\alpha_\ell - \alpha_\ell) X_{N+m-j-\ell}]^2} ,    (2.3.30)
where {\alpha_j; j ∈ J_{1,\infty}} are obtained from (2.3.4), using {\eta_j; j ∈ J_{1,q}} as the roots of

z^q + \beta_1 z^{q-1} + ... + \beta_q = 0 .    (2.3.31)

All that could be gleaned from (2.3.30) is the fact that an increase in q will result in an increase in the prediction-error increase given by (2.3.27). But since no more concise expression could be obtained for (2.3.30), no approximation was obtained for (2.3.27). The argument for using the estimators which are easily calculated, but perhaps inefficient, still applies, of course. But determination of N in order that the increase be bounded by a given quantity must be accomplished by using numerical methods.

2.4 The Hybrid Model

If {X_t; t ∈ J_{-\infty,\infty}} is a stationary time series satisfying the relationship

X_t + \alpha_1 X_{t-1} + ... + \alpha_p X_{t-p} = Z_t + \beta_1 Z_{t-1} + ... + \beta_q Z_{t-q}    (2.4.1)

for every t, then {X_t} is said to be a hybrid model of order (p,q). Estimation of the parameter set {\alpha_1,...,\alpha_p; \beta_1,...,\beta_q} for the hybrid model is the most difficult of those considered. The fact that the sample covariance function is an inefficient estimator of the true covariance function contributes to this difficulty. The maximum likelihood estimators are again intractable, but their asymptotic variance-covariance matrix can be used for calculations of efficiency.

Durbin [11,12] again makes use of an infinite-order autoregressive representation of {X_t}. The problems with the procedure are similar to those mentioned in 2.3: the choice of truncation point for the
autoregressive representation, the choice of N, and the cumbersome iterative procedure used in the calculations. However, the efficiency of the estimators can be made as close to unity as desired by taking the truncation point and N large.

Walker [32] approaches the problem via the covariance function; by finding an efficient estimator of this function one can efficiently estimate the hybrid model parameters. The following relationships are used:

E{(\sum_{j=0}^{p} \alpha_j X_{t-j})(\sum_{i=0}^{p} \alpha_i X_{s-i})} = E{(\sum_{j=0}^{q} \beta_j Z_{t-j})(\sum_{i=0}^{q} \beta_i Z_{s-i})}    (2.4.2)

implies that

\sum_{j=0}^{p} \sum_{i=0}^{p} \alpha_i \alpha_j \gamma_{(t-s)+(i-j)} = \sigma_z^2 \sum_{j=|t-s|}^{q} \beta_j \beta_{j-|t-s|} ,    (2.4.3)

where \alpha_0 = \beta_0 = 1 and |t-s| ≤ q. This system of equations gives {\alpha_j; j ∈ J_{1,p}} and {\beta_j; j ∈ J_{1,q}} in terms of the covariance function {\gamma_j}. In order that the solutions to (2.4.3) be unique, one must require that the roots of

z^q + \beta_1 z^{q-1} + ... + \beta_q = 0    (2.4.4)

have modulus less than unity. Also, multiplication of (2.4.1) by X_{t-r} for r ∈ J_{q+1,q+p} and taking expected values yields

\sum_{j=0}^{p} \alpha_j \gamma_{r-j} = 0 ,    (2.4.5)
using the fact that X_t is uncorrelated with the sequence {Z_{t+j}; j ∈ J_{1,\infty}}. Walker finds efficient estimators of {\gamma_j}, and then uses (2.4.3) and (2.4.5) to calculate estimates of the hybrid model parameters. An iterative procedure is required, but the resulting estimators are asymptotically normal and fully efficient.

Hannan [16] achieves the same goal by a different method: that of estimating the spectral density and using this to get estimates of the parameters. The effort required is substantial, but the estimators are asymptotically normal and efficient.

The predictor of X_{N+m} given the observations {x_j; j ∈ J_{-\infty,N}} is

x_N^{(1)}(m) = E{X_{N+m}/(X_j = x_j; j ∈ J_{-\infty,N})}
             = -\sum_{j=1}^{p} \alpha_j x_N^{(1)}(m-j) + \sum_{j=m}^{q} \beta_j z_{N+m-j}   if m ∈ J_{1,q}
             = -\sum_{j=1}^{p} \alpha_j x_N^{(1)}(m-j)   if m ∈ J_{q+1,\infty} ,    (2.4.6)

where

x_N^{(1)}(j) = E{X_{N+j}/(X_i = x_i; i ∈ J_{-\infty,N})}   if j ∈ J_{1,\infty}
             = x_{N+j}   if j ∈ J_{-\infty,0} .    (2.4.7)

Note that the values of {z_{N-j}; j ∈ J_{0,q-1}} are calculated from {x_j; j ∈ J_{-\infty,N}}, using the infinite-order autoregressive representation. When only {x_j; j ∈ J_{1,N}} is observed, we may approximate {z_{N-j}; j ∈ J_{0,q-1}} by truncating the autoregressive representation after N terms.
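Two of the computational devices used so far are easily sketched in code: the moment estimate obtained by solving the quadratic (2.3.24), and the truncated autoregressive inversion used to approximate the residuals, here specialized to the first-order moving average (2.3.15). This is a minimal illustration; the function names are our own, not from the text.

```python
import math

def ma1_from_autocorr(r1):
    """Solve the estimating equation (2.3.24), r1 = beta/(1 + beta^2),
    for the root with modulus less than unity; a real solution
    requires |r1| <= 1/2."""
    if abs(r1) > 0.5:
        raise ValueError("no real MA(1) solution: need |r1| <= 1/2")
    if r1 == 0.0:
        return 0.0
    return (1.0 - math.sqrt(1.0 - 4.0 * r1 * r1)) / (2.0 * r1)

def residuals_ma1(x, beta):
    """Truncated AR(infinity) inversion, as in (2.3.15):
    z_t is approximated by sum_{j=0}^{t} (-beta)^j x_{t-j},
    using only the observed history x_0, ..., x_t."""
    return [sum((-beta) ** j * x[t - j] for j in range(t + 1))
            for t in range(len(x))]
```

If the series really starts from a zero pre-sample innovation, the truncated inversion recovers the residuals exactly; otherwise the error decays like beta^t, which is the exponential decay invoked after (2.3.11).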
When the parameters are unknown, the predictor used is

\hat x_N^{(2)}(m) = -\sum_{j=1}^{p} \hat\alpha_j \hat x_N^{(2)}(m-j) + \sum_{j=m}^{q} \hat\beta_j \hat z_{N+m-j}   if m ∈ J_{1,q}
                  = -\sum_{j=1}^{p} \hat\alpha_j \hat x_N^{(2)}(m-j)   if m ∈ J_{q+1,\infty} .    (2.4.8)
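The recursion (2.4.6)/(2.4.8) can be mechanized directly: future innovations are replaced by zero, past predictors feed back into later steps, and stored residuals supply the moving-average terms. The sketch below is our own illustration (the function name and list-based bookkeeping are assumptions, not from the text), using the sign convention of (2.4.1).

```python
def arma_forecast(alpha, beta, x, z, m):
    """m-step predictors for the hybrid model (2.4.1),
    X_t + sum_j alpha_j X_{t-j} = Z_t + sum_j beta_j Z_{t-j},
    via the recursion (2.4.6): future Z's are set to zero, and
    x/z hold the observed series and its (approximate) residuals."""
    p, q = len(alpha), len(beta)
    preds = []
    for step in range(1, m + 1):
        acc = 0.0
        for j in range(1, p + 1):
            # predictor of X_{N+step-j}: a previous forecast, or an observation
            past = preds[step - j - 1] if step - j >= 1 else x[len(x) + (step - j) - 1]
            acc -= alpha[j - 1] * past
        for j in range(step, q + 1):
            acc += beta[j - 1] * z[len(z) + (step - j) - 1]
        preds.append(acc)
    return preds
```

For m beyond q the moving-average sum is empty and the recursion reduces to the pure autoregressive form, as in the second branch of (2.4.8).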
Case 1: p = q = m = 1

Writing the (1,1) model as X_t = \alpha X_{t-1} + Z_t + \beta Z_{t-1}, the one-step predictor x_N^{(1)}(1) = \alpha x_N + \beta z_N is used in (2.4.10) to calculate x_N^{(1)}(m). The mean square error of prediction for (2.4.10) is

MSE_1 = E{[X_{N+1} - x_N^{(1)}(1)]^2} = \sigma_z^2 .    (2.4.13)

When \alpha and \beta are estimated by \hat\alpha and \hat\beta, respectively, we use

\hat x_N^{(2)}(1) = \hat\alpha x_N + \hat\beta \hat z_N .    (2.4.14)

The increase in prediction error is

MSE_2 - MSE_1 = E{[\alpha X_N + \beta Z_N - (\hat\alpha X_N + \hat\beta \hat Z_N)]^2} .    (2.4.15)

The term \hat Z_N greatly complicates this expression; but if we assume \hat Z_N \approx Z_N,

MSE_2 - MSE_1 \approx E{[(\delta\hat\alpha)X_N]^2} + E{[(\delta\hat\beta)Z_N]^2} + 2\sigma_z^2 E{(\delta\hat\alpha)(\delta\hat\beta)} ,    (2.4.16)

where (i) X_N and \hat\alpha are assumed approximately uncorrelated; (ii) (\hat\alpha,\hat\beta) are calculated using only {x_j; j ∈ J_{1,N-1}}, so that \hat\beta and Z_N are uncorrelated; and, using the fact that {X_t} can be represented as an infinite-order moving average process,

(iii)  E{X_N Z_N} = E{[Z_N + \sum_{j=1}^{\infty} \psi_j Z_{N-j}] Z_N} = \sigma_z^2 .    (2.4.17)
To obtain {\psi_j; j ∈ J_{1,\infty}} in terms of (\alpha,\beta), note that

X_t = Z_t + (\alpha+\beta) \sum_{j=1}^{\infty} \alpha^{j-1} Z_{t-j} .    (2.4.18)

One can derive an estimator for \alpha from the equation (2.4.5):

\gamma_r - \alpha \gamma_{r-1} = 0    (2.4.19)

for every r ∈ J_{2,\infty}. With this in mind, we define \Sigma_k as an asymptotic (k × k) covariance matrix with (i,j) element

\sigma_{i,j} = E{(\delta C_i^*)(\delta C_j^*)} ,    (2.4.20)

where

C_i^* = C_{i+1} C_i^{-1}    (2.4.21)

for i ∈ J_{1,k}. By letting

\delta C_i^* = C_{i+1} C_i^{-1} - \gamma_{i+1} \gamma_i^{-1} = C_i^* - \alpha    (2.4.22)

and

\delta C_i = C_i - \gamma_i ,    (2.4.23)

we have

\delta C_i^* = (\delta C_{i+1} + \gamma_{i+1})(\delta C_i + \gamma_i)^{-1} - \alpha = \gamma_i^{-1}(\delta C_{i+1} - \alpha \delta C_i) + o_p(N^{-1/2}) .    (2.4.24)

But Bartlett [1] has shown that C_i is consistent for \gamma_i, so that (2.4.24) implies that C_i^* is consistent for \alpha.
Furthermore, from (2.4.18) we find

\gamma_s = \sigma_z^2 (1 + 2\alpha\beta + \beta^2)(1-\alpha^2)^{-1}   if s = 0
         = \sigma_z^2 a \alpha^{|s|-1}   if s ≠ 0,    (2.4.25)

where

a = (\alpha+\beta)(1+\alpha\beta)(1-\alpha^2)^{-1} .    (2.4.26)

Defining w_{r,r+s} = E{(\delta C_r)(\delta C_{r+s})}, Bartlett [1] has shown that

w_{r,r+s} = N^{-1} \gamma_0^2 \sum_{j=-\infty}^{\infty} (\rho_j \rho_{j+s} + \rho_{j+r+s} \rho_{j-r}) + o(N^{-1}) .    (2.4.27)

Remembering that \rho_j = \gamma_j \gamma_0^{-1} = b\alpha^{|j|-1} for j ≠ 0, with b = (\alpha+\beta)(1+\alpha\beta)(1+2\alpha\beta+\beta^2)^{-1}, we use (2.4.25) to get

\sum_{j=-\infty}^{\infty} \rho_j \rho_{j+s} = 1 + 2b^2(1-\alpha^2)^{-1}   if s = 0    (2.4.28)

\sum_{j=-\infty}^{\infty} \rho_j \rho_{j+s} = b\alpha^{s-2}[2\alpha + (s-1)b + 2b\alpha^2(1-\alpha^2)^{-1}]   if s ∈ J_{1,\infty} .    (2.4.29)

We now state

Theorem 2.4.1: The estimator

\hat\alpha_0 = \sum_{j=1}^{k} d_{j0} C_j^* ,    (2.4.30)

where

d_0^T = (d_{10}, d_{20}, ..., d_{k0})    (2.4.31)

and

d_0 = \Sigma_k^{-1} \eta [\eta^T \Sigma_k^{-1} \eta]^{-1} ,    (2.4.32)
with \eta^T = (1,1,...,1)_{1×k}, is consistent and has minimum mean square error among the estimators

\hat\alpha = \sum_{j=1}^{k} d_j C_j^*    (2.4.33)

with d^T \eta = 1. This error is given by

MSE(\hat\alpha_0) = [\eta^T \Sigma_k^{-1} \eta]^{-1} ,    (2.4.34)

where the elements of \Sigma_k are

\sigma_{r,r+s} = (\gamma_r \gamma_{r+s})^{-1} [w_{r+1,r+s+1} - \alpha w_{r,r+s+1} - \alpha w_{r+1,r+s} + \alpha^2 w_{r,r+s}] ,    (2.4.35)

the explicit values for s = 0, s = 1, and s ∈ J_{2,\infty} following upon substitution of (2.4.27)-(2.4.29).

Proof: That \hat\alpha_0 is consistent for \alpha follows immediately from the fact that C_j^* is consistent for \alpha (2.4.24) and d_0^T \eta = 1. Now if \hat\alpha is of the form (2.4.33),

MSE{\hat\alpha} = d^T \Sigma_k d .    (2.4.36)

In order to minimize this with the restriction d^T \eta = 1, we form the Lagrange equation

L = d^T \Sigma_k d - 2\lambda(d^T \eta - 1) .    (2.4.37)

But \partial L/\partial d = 0 implies

\Sigma_k d - \lambda \eta = 0 .    (2.4.38)
Using d^T \eta = 1, it follows that

\lambda = [\eta^T \Sigma_k^{-1} \eta]^{-1}    (2.4.39)

and hence

d_0 = \Sigma_k^{-1} \eta [\eta^T \Sigma_k^{-1} \eta]^{-1} ,    (2.4.40)

as was to be shown. Substituting (2.4.40) into (2.4.36), we obtain

MSE{\hat\alpha_0} = [\eta^T \Sigma_k^{-1} \eta]^{-1} .    (2.4.41)

Finally, noting from (2.4.24) that

\sigma_{r,r+s} = E{(\delta C_r^*)(\delta C_{r+s}^*)} = (\gamma_r \gamma_{r+s})^{-1} [w_{r+1,r+s+1} - \alpha w_{r,r+s+1} - \alpha w_{r+1,r+s} + \alpha^2 w_{r,r+s}] ,    (2.4.42)

substitution of (2.4.28) and (2.4.29) into (2.4.42) yields (2.4.35), which completes the proof.

Table 2.4.1 gives efficiencies for \hat\alpha_0 for several values of \alpha and \beta; three values of k (the number of sample covariances used in the calculation of \hat\alpha_0) are given. All calculations were carried out on the University of Florida's IBM 360 computer. The efficiencies are calculated with respect to the variance of the maximum likelihood estimator of \alpha given by Whittle [appendix of 35] as

W = N^{-1}(1-\alpha^2)(1+\alpha\beta)^2 (\alpha+\beta)^{-2} .    (2.4.43)

The table shows that the efficiency is especially sensitive to changes in \beta; as \beta tends to unity, it becomes necessary to use more sample covariances to achieve high efficiency.
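The constrained minimization in the proof of Theorem 2.4.1 is a routine linear-algebra computation; the following sketch (our own illustration, with assumed function names) computes the optimal weights (2.4.40) and the minimum mean square error (2.4.41) for any supplied positive-definite \Sigma_k.

```python
import numpy as np

def optimal_weights(sigma):
    """Minimum-MSE weights among linear combinations sum_j d_j C*_j with
    sum_j d_j = 1 (Theorem 2.4.1):
        d0  = Sigma^{-1} eta / (eta' Sigma^{-1} eta)      (2.4.40)
        MSE = 1 / (eta' Sigma^{-1} eta)                   (2.4.41)
    where eta is the vector of ones."""
    k = sigma.shape[0]
    eta = np.ones(k)
    s_inv_eta = np.linalg.solve(sigma, eta)
    mse = 1.0 / (eta @ s_inv_eta)   # also the Lagrange multiplier (2.4.39)
    return mse * s_inv_eta, mse
```

With equal, uncorrelated errors (\Sigma = I) the weights reduce to the simple average 1/k, and the MSE falls like 1/k; correlation between the C*_j shifts weight toward the less noisy lags.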
Table 2.4.1: Efficiencies for \hat\alpha_0 = \sum_{j=1}^{k} d_{j0} C_j^*, for k = 2, 3, 4.
In practice, one would need to estimate \beta (using 2.4.2) in order to calculate \hat\alpha_0. Thus, though the effort of calculation is somewhat reduced, an iterative method must be used.

We can obtain an upper bound to the mean square error of prediction for

\hat x_N^{(2)}(m) = \hat\alpha_0 x_N + \hat\beta \hat z_N    (2.4.44)

by considering

\hat x_N^{(3)}(m) = \hat\alpha_0 x_N .    (2.4.45)

The corresponding increase in prediction error is

MSE_3 - MSE_1 = E{[(\hat\alpha_0 - \alpha)X_N + \beta Z_N]^2} \approx \sigma_z^2 [\beta^2 + K^{-1}(1+\alpha\beta)^2] ,    (2.4.46)

where we assume K observations {x_j; j ∈ J_{1,K}} have been used in the calculation of \hat\alpha_0, with K << N, in order that the assumption that X_N and \hat\alpha_0 are uncorrelated is more plausible.

In fact, the use of \hat x_N^{(3)}(m) as a predictor of X_{N+m} may not be unreasonable as long as \beta is small, thus saving the rather tedious calculation of \hat z_N.

Case 2: p = q = 1; m general

The m-step predictor is given by

x_N^{(1)}(m) = \alpha^m x_N + \alpha^{m-1} \beta z_N ,    (2.4.47)

with mean square error

MSE_1 = \sigma_z^2   if m = 1
      = \sigma_z^2 [1 + (\alpha+\beta)^2 (1-\alpha^{2(m-1)})(1-\alpha^2)^{-1}]   if m ∈ J_{2,\infty} .    (2.4.48)
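Equation (2.4.48) follows from summing the squared moving-average weights of (2.4.18) over the first m lags; the cross-check below is our own illustrative sketch (function names assumed), written for the Case 2 convention X_t = \alpha X_{t-1} + Z_t + \beta Z_{t-1}.

```python
def arma11_mstep_mse(alpha, beta, m, sigma2=1.0):
    """Closed form (2.4.48) for the m-step prediction MSE of the
    ARMA(1,1) model X_t = alpha X_{t-1} + Z_t + beta Z_{t-1}."""
    if m == 1:
        return sigma2
    return sigma2 * (1.0 + (alpha + beta) ** 2
                     * (1.0 - alpha ** (2 * (m - 1))) / (1.0 - alpha ** 2))

def mse_from_psi(alpha, beta, m, sigma2=1.0):
    """Same quantity from first principles: MSE = sigma^2 sum_{j<m} psi_j^2,
    with psi_0 = 1 and psi_j = (alpha+beta) alpha^(j-1) from (2.4.18)."""
    psi = [1.0] + [(alpha + beta) * alpha ** (j - 1) for j in range(1, m)]
    return sigma2 * sum(w * w for w in psi)
```

As m grows the geometric series saturates, so the m-step MSE approaches the process variance; this is why the estimation-error increase in (2.4.52) below matters less for large m.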
If we use

\hat x_N^{(2)}(m) = \hat\alpha^m x_N + \hat\alpha^{m-1} \hat\beta \hat z_N ,    (2.4.49)

the increase in prediction error will be

MSE_2 - MSE_1 = E{[(\hat\alpha^m - \alpha^m)X_N + (\hat\alpha^{m-1}\hat\beta - \alpha^{m-1}\beta)\hat Z_N]^2}
              = E{[m\alpha^{m-1}(\delta\hat\alpha)X_N + (\alpha^{m-1}(\delta\hat\beta) + (m-1)\alpha^{m-2}\beta(\delta\hat\alpha))Z_N]^2} + o(N^{-1}) ,    (2.4.50)

where we have replaced \hat Z_N by Z_N, and assumed (\hat\alpha,\hat\beta) to be uncorrelated with X_N; the expectation is then evaluated term by term as in Case 1, using E{X_N^2} = \gamma_0 and E{X_N Z_N} = \sigma_z^2.

We can again obtain an upper bound for this rather complicated expression by considering

\hat x_N^{(3)}(m) = \hat\alpha_0^m x_N .    (2.4.51)

Then MSE_3 - MSE_1 is bounded above by

MSE_3 - MSE_1 = E{[\hat\alpha_0^m X_N - (\alpha^m X_N + \alpha^{m-1}\beta Z_N)]^2} \approx \sigma_z^2 [\alpha^{2(m-1)}\beta^2 + K^{-1} m(1+\alpha\beta)^2] .    (2.4.52)

In addition to the comments applied to this quantity in Case 1, notice that the increase becomes less significant as m is increased.
Case 3: p general; q = m = 1

The representation of {X_t; t ∈ J_{-\infty,\infty}} is now, from (2.4.1),

X_t + \alpha_1 X_{t-1} + ... + \alpha_p X_{t-p} = Z_t + \beta Z_{t-1} .    (2.4.53)

The predictor is

x_N^{(1)}(1) = -\sum_{j=1}^{p} \alpha_j x_{N+1-j} + \beta z_N ,    (2.4.54)

and the error of prediction is

MSE_1 = E{[X_{N+1} + \sum_{j=1}^{p} \alpha_j X_{N+1-j} - \beta Z_N]^2} = \sigma_z^2 .    (2.4.55)

When {\alpha_j, \beta; j ∈ J_{1,p}} are unknown, estimators can be obtained from (2.4.2) and (2.4.5); we have

\sum_{j=0}^{p} \hat\alpha_j C_{r-j} = 0    (2.4.56)

for r ∈ J_{2,p+1} (\hat\alpha_0 = 1), together with (2.4.3), for estimating the parameters. One can also extend the method presented in Theorem 2.4.1 to this case, but the analysis is increasingly complicated as p increases. The determination of estimates from (2.4.56) and (2.4.3) is probably easiest, and the order of the error increase is N^{-1}, even though the estimators are inefficient.

The cases with p, q, and m general are analytically intractable insofar as evaluation of the increase in prediction error due to parameter estimation is concerned. But the study of the simple cases has shown
that primary consideration should be given to the construction of estimators which are consistent and have mean square error of order N^{-1}. As long as N is large, high efficiency is not a necessary condition for small mean square error of prediction.
CHAPTER III
THE LOGISTIC GROWTH PROCESS

3.1 Introduction

Let {X(t); t ∈ I_{0,\infty}} be a time series such that

E{X(t)/X(0) = n} = n(1+\alpha)(1+\alpha\rho^t)^{-1} ,    (3.1.1)

where \alpha > 0, n ∈ J_{1,\infty}, and \rho ∈ (0,1). Then {X(t)} is said to be a growth process following the logistic law, or simply a logistic growth process. This process receives wide application in animal population models and various economic models; the representation varies slightly, depending upon the particular application, but with a little adjustment each can be written as (3.1.1).

The problems to be considered are: (i) the estimation of \alpha and \rho in (3.1.1), and (ii) the prediction of future values of X(t) when \alpha and \rho are estimated. Several observational possibilities will be considered. The process may be observed periodically, at equidistant time points; it may be observed continuously during a given time interval; or only the times at which the process increases (assuming all increases are discrete) may be observed.
3.2 Probability Distribution of {X(t)}

The probability structure of a growth process may be specified in any of an infinite number of ways. But in reality these processes are often used with no reference to such a structure; the parameter estimation problem is considered without reference to error.

We shall use the following probabilistic model for {X(t)}: let {X(t); t ∈ I_{0,\infty}} be a Markov process with

P{X(t+\delta t) = j/X(t) = i} = i\lambda(t)\delta t + o(\delta t)   if j = i+1
                              = 1 - i\lambda(t)\delta t + o(\delta t)   if j = i
                              = o(\delta t)   otherwise,    (3.2.1)

where \delta t is an infinitesimal time interval. This model obviously requires the assumption that {X(t)} grows in equivalent discrete units (economic units, population units, etc.), and that the probability of a one-unit increase at any given time is proportional to the size of the process at that time. The reader will recognize (3.2.1) as the common definition of a birth-death process with the death rate zero for all t. Kendall [19] has shown that the above representation is the birth-death model which minimizes the variance of {X(t)}.

Defining

p_{ij}(s,t) = P{X(t) = j/X(s) = i} ,    (3.2.2)

an application of the Kolmogorov equations yields

p_{ij}(s,t+\delta t) = (j-1)\lambda(t)\delta t p_{i,j-1}(s,t) + [1 - j\lambda(t)\delta t] p_{ij}(s,t) + o(\delta t)   if j ∈ J_{i+1,\infty}
                     = [1 - i\lambda(t)\delta t] p_{ii}(s,t) + o(\delta t)   if j = i.    (3.2.3)
Thus

\partial/\partial t [p_{ij}(s,t)] = (j-1)\lambda(t) p_{i,j-1}(s,t) - j\lambda(t) p_{ij}(s,t)   if j ∈ J_{i+1,\infty}
                                  = -j\lambda(t) p_{ij}(s,t)   if j = i.    (3.2.4)

Multiplication of (3.2.4) by j and summation over all j gives

\sum_{j=i}^{\infty} j \partial/\partial t [p_{ij}(s,t)] = \lambda(t)[\sum_{j=i+1}^{\infty} j(j-1) p_{i,j-1}(s,t) - \sum_{j=i}^{\infty} j^2 p_{ij}(s,t)]
                                                       = \lambda(t) \sum_{j=i}^{\infty} j p_{ij}(s,t) .    (3.2.5)

Defining

m_i(s,t) = \sum_{j=i}^{\infty} j p_{ij}(s,t) ,    (3.2.6)

we rewrite (3.2.5), obtaining

\partial/\partial t [m_i(s,t)] = \lambda(t) m_i(s,t) .    (3.2.7)

Solving,

m_i(s,t) = i exp{\int_s^t \lambda(u) du} ,    (3.2.8)

using the initial condition

m_i(s,s) = i .    (3.2.9)

The function m_i(s,t) shall be called the mean function of the growth process {X(t)}, since obviously

m_i(s,t) = E{X(t)/X(s) = i} .    (3.2.10)
From (3.1.1) we know that

m_n(0,t) = n(1+\alpha)(1+\alpha\rho^t)^{-1} ,    (3.2.11)

where n is the number of units at time zero. Equating (3.2.11) and (3.2.8) (for s = 0) implies that, for the logistic growth process,

\lambda(t) = \alpha\beta\rho^t (1+\alpha\rho^t)^{-1} ,    (3.2.12)

where

\beta = -log \rho .    (3.2.13)

We now state

Theorem 3.2.1: If {X(t); t ∈ I_{0,\infty}} is a logistic growth process whose probability structure is (3.2.1), then

p_{ij}(s,t) = C(j-1, i-1) T_{s,t}^i (1-T_{s,t})^{j-i}   if j ∈ J_{i,\infty}
            = 0   otherwise,    (3.2.14)

where

T_{s,t} = (1+\alpha\rho^t)(1+\alpha\rho^s)^{-1} .    (3.2.15)

Proof: The proof is a special case of that given by Kendall [19] for the general birth-death process. We let \lambda(t) be defined by (3.2.12) and \mu(t), the death rate, be zero to obtain the density given in (3.2.14). The rigorous proof is given for the general case in many places, and so will not be reproduced here.
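Theorem 3.2.1 says that, given X(s) = i, the increment X(t) - i is negative binomial with success probability T_{s,t}, so the conditional mean is i/T_{s,t}, which is exactly the mean function (3.2.8)/(3.2.10). The sketch below (our own illustration; names assumed) evaluates (3.2.14) and checks both properties numerically.

```python
from math import comb

def transition_prob(i, j, s, t, alpha, rho):
    """p_{ij}(s,t) of Theorem 3.2.1:
    C(j-1, i-1) T^i (1-T)^{j-i}, with T = (1+alpha rho^t)/(1+alpha rho^s)."""
    if j < i:
        return 0.0
    T = (1.0 + alpha * rho ** t) / (1.0 + alpha * rho ** s)
    return comb(j - 1, i - 1) * T ** i * (1.0 - T) ** (j - i)
```

Because (1-T)^{j-i} decays geometrically, truncating the support after a few hundred terms is enough to confirm that the probabilities sum to one and reproduce the conditional mean i/T.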
3.3 Equispaced Observation

Assume that N observations are made on the logistic growth process {X(t); t ∈ I_{0,\infty}} at equispaced intervals of length \Delta. Without losing any generality in the results to be presented, we henceforth assume \Delta = 1. Thus {X(j); j ∈ J_{1,N}} will be observed, with X(0) = n known.

Writing the set of observations as {x_j; j ∈ J_{1,N}}, the likelihood function for this set is

L[\alpha,\rho; (x_j; j ∈ J_{1,N})] = P{(X_j = x_j; j ∈ J_{1,N})/\alpha,\rho} .    (3.3.1)

Using the fact that {X(t)} is a Markov process and applying Theorem 3.2.1 (writing C(a,b) for the binomial coefficient),

L[\alpha,\rho; (x_j; j ∈ J_{1,N})] = \prod_{j=1}^{N} P{X(j) = x_j/X(j-1) = x_{j-1}}
 = \prod_{j=1}^{N} C(x_j - 1, x_{j-1} - 1) T_{j-1,j}^{x_{j-1}} (1-T_{j-1,j})^{x_j - x_{j-1}} ,    (3.3.2)

where x_0 = n. Taking the logarithm of (3.3.2), and replacing T_{j-1,j} by the defining equation (3.2.15), one gets

log L = \sum_{j=1}^{N} log C(x_j - 1, x_{j-1} - 1) + \sum_{j=1}^{N} x_{j-1}[log(1+\alpha\rho^j) - log(1+\alpha\rho^{j-1})]
      + \sum_{j=1}^{N} (x_j - x_{j-1})[log \alpha(\rho^{j-1} - \rho^j) - log(1+\alpha\rho^{j-1})] ,    (3.3.3)
writing L = L[\alpha,\rho; (x_j; j ∈ J_{1,N})] for simplicity. Differentiation of (3.3.3) with respect to \alpha yields

\partial log L/\partial\alpha = \sum_{j=1}^{N} {x_{j-1}[\rho^j(1+\alpha\rho^j)^{-1} - \rho^{j-1}(1+\alpha\rho^{j-1})^{-1}] + (x_j - x_{j-1})[\alpha^{-1} - \rho^{j-1}(1+\alpha\rho^{j-1})^{-1}]} ,    (3.3.4)

and differentiation with respect to \rho gives

\partial log L/\partial\rho = \sum_{j=1}^{N} {x_{j-1}[j\alpha\rho^{j-1}(1+\alpha\rho^j)^{-1} - (j-1)\alpha\rho^{j-2}(1+\alpha\rho^{j-1})^{-1}]
 + (x_j - x_{j-1})[((j-1)\rho^{j-2} - j\rho^{j-1})(\rho^{j-1} - \rho^j)^{-1} - (j-1)\alpha\rho^{j-2}(1+\alpha\rho^{j-1})^{-1}]} .    (3.3.5)

Before considering (3.3.4) and (3.3.5) at greater length, let {Y_i(t); t ∈ I_{0,\infty}, i ∈ J_{1,n}} be n independent logistic processes, where Y_i(0) = 1 for each i, and each of the n processes shares the same parameters (\alpha,\rho) as {X(t)}. Defining, for s < t,

H_{Y_i}(s,t; z,w) = \sum_{m=1}^{\infty} \sum_{k=m}^{\infty} P{Y_i(s) = m; Y_i(t) = k} z^k w^m ,    (3.3.6)

one has

H_{Y_i}(s,t; z,w) = \sum_{m=1}^{\infty} P{Y_i(s) = m/Y_i(0) = 1} w^m \sum_{k=m}^{\infty} P{Y_i(t) = k/Y_i(s) = m} z^k
 = \sum_{m=1}^{\infty} T_{0,s}(1-T_{0,s})^{m-1} w^m \sum_{k=m}^{\infty} C(k-1, m-1) T_{s,t}^m (1-T_{s,t})^{k-m} z^k
 = T_{0,s} \delta(w,z;T_{s,t})[1 - (1-T_{0,s})\delta(w,z;T_{s,t})]^{-1} ,    (3.3.7)
where

\delta(w,z;T_{s,t}) = T_{s,t} wz [1 - z(1-T_{s,t})]^{-1} .    (3.3.8)

Similarly, defining

H_X(s,t; z,w) = \sum_{m=n}^{\infty} \sum_{k=m}^{\infty} P{X(s) = m; X(t) = k/X(0) = n} z^k w^m ,    (3.3.9)

one has

H_X(s,t; z,w) = \sum_{m=n}^{\infty} C(m-1, n-1) T_{0,s}^n (1-T_{0,s})^{m-n} w^m \sum_{k=m}^{\infty} C(k-1, m-1) T_{s,t}^m (1-T_{s,t})^{k-m} z^k
             = {T_{0,s} \delta(w,z;T_{s,t})[1 - (1-T_{0,s})\delta(w,z;T_{s,t})]^{-1}}^n ,    (3.3.10)

which implies that

H_X(s,t; z,w) = [H_{Y_i}(s,t; z,w)]^n = \prod_{i=1}^{n} H_{Y_i}(s,t; z,w) .    (3.3.11)

Hence, in distribution,

X(t) = \sum_{i=1}^{n} Y_i(t)   (t ∈ I_{0,\infty}) .    (3.3.12)

If we rewrite (3.3.3) for the random sequence {X(j) = X_j; j ∈ J_{1,N}}, we get

log L(\alpha,\rho; X_1,...,X_N) = n\varphi_0(\alpha,\rho) + \sum_{j=1}^{N} X_j \varphi_j(\alpha,\rho) ,    (3.3.13)
where

\varphi_0(\alpha,\rho) = log{(1+\alpha\rho)[\alpha(1-\rho)]^{-1}} ,
\varphi_j(\alpha,\rho) = log{(1+\alpha\rho^{j+1})[\rho(1+\alpha\rho^{j-1})]^{-1}}   if j ∈ J_{1,N-1} ,
\varphi_N(\alpha,\rho) = log{\alpha\rho^{N-1}(1-\rho)(1+\alpha\rho^{N-1})^{-1}} ,    (3.3.14)

apart from an additive term not involving \alpha or \rho. But in view of (3.3.12), this can be written

log L(\alpha,\rho; Y_1,...,Y_n) = n\varphi_0(\alpha,\rho) + \sum_{j=1}^{N} \sum_{i=1}^{n} Y_{i,j} \varphi_j(\alpha,\rho) = \sum_{i=1}^{n} Z_i ,    (3.3.15)

where

Y_i = (Y_{i,1}, Y_{i,2}, ..., Y_{i,N}) ,    (3.3.16)

Y_{i,j} = Y_i(j), and

Z_i = \varphi_0(\alpha,\rho) + \sum_{j=1}^{N} \varphi_j(\alpha,\rho) Y_{i,j} .    (3.3.17)

It follows from the discussion leading up to (3.3.12) that

Z_i = \varphi_0(\alpha,\rho) + \sum_{j=1}^{N} \varphi_j(\alpha,\rho) y_{i,j} = log p(y_i; \alpha,\rho) ,    (3.3.18)

where

p(y_i; \alpha,\rho) = P{Y_{i,1} = y_{i,1}, ..., Y_{i,N} = y_{i,N}} .    (3.3.19)
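The regrouping that takes (3.3.3) into the linear form (3.3.13)-(3.3.14) is an Abel-type rearrangement of the coefficients of x_0, ..., x_N; it can be verified mechanically. The sketch below (our own illustration; function names assumed) compares the parameter-dependent part of (3.3.3) with the decomposition n\varphi_0 + \sum x_j \varphi_j.

```python
import math

def phi(j, N, a, r):
    """Coefficients phi_j(alpha, rho) of (3.3.14); a = alpha, r = rho."""
    if j == 0:
        return math.log((1 + a * r) / (a * (1 - r)))
    if j < N:
        return math.log((1 + a * r ** (j + 1)) / (r * (1 + a * r ** (j - 1))))
    return math.log(a * r ** (N - 1) * (1 - r) / (1 + a * r ** (N - 1)))

def loglik_direct(x, n, a, r):
    """Parameter-dependent part of (3.3.3); x[0] = x_1, ..., x[N-1] = x_N."""
    xs = [n] + list(x)
    total = 0.0
    for j in range(1, len(xs)):
        total += xs[j - 1] * (math.log(1 + a * r ** j)
                              - math.log(1 + a * r ** (j - 1)))
        total += (xs[j] - xs[j - 1]) * (math.log(a * (r ** (j - 1) - r ** j))
                                        - math.log(1 + a * r ** (j - 1)))
    return total

def loglik_decomposed(x, n, a, r):
    """The same quantity via (3.3.13)-(3.3.14)."""
    N = len(x)
    return n * phi(0, N, a, r) + sum(x[j - 1] * phi(j, N, a, r)
                                     for j in range(1, N + 1))
```

The binomial-coefficient term of (3.3.3) is omitted on both sides, since it does not involve (\alpha, \rho) and so drops out of the estimation equations.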
We are now ready to state and prove

Theorem 3.3.1: The equations

\partial log L(\alpha,\rho; X_1,...,X_N)/\partial\alpha = 0 ,
\partial log L(\alpha,\rho; X_1,...,X_N)/\partial\rho = 0    (3.3.20)

have a solution (\hat\alpha,\hat\rho) which is weakly consistent for (\alpha,\rho). Furthermore, the asymptotic distribution of [\sqrt n(\hat\alpha - \alpha); \sqrt n(\hat\rho - \rho)] as n → \infty is normal, with null mean vector and variance-covariance matrix \Sigma, where

\Sigma^{-1} = [ \sigma^{11}  \sigma^{12} ; \sigma^{12}  \sigma^{22} ] ,    (3.3.21)

with

\sigma^{11} = \rho^2(1+\alpha\rho)^{-2} - \alpha^{-2}
 + \sum_{j=1}^{N-1} (1+\alpha)(1+\alpha\rho^j)^{-1}[\rho^{2(j+1)}(1+\alpha\rho^{j+1})^{-2} - \rho^{2(j-1)}(1+\alpha\rho^{j-1})^{-2}]
 + (1+\alpha)(1+\alpha\rho^N)^{-1}[\alpha^{-2} - \rho^{2(N-1)}(1+\alpha\rho^{N-1})^{-2}] ,    (3.3.22)

\sigma^{12} = -(1+\alpha\rho)^{-2}
 + \sum_{j=1}^{N-1} (1+\alpha)(1+\alpha\rho^j)^{-1}[(j-1)\rho^{j-2}(1+\alpha\rho^{j-1})^{-2} - (j+1)\rho^j(1+\alpha\rho^{j+1})^{-2}]
 + (1+\alpha)(1+\alpha\rho^N)^{-1}[(N-1)\rho^{N-2}(1+\alpha\rho^{N-1})^{-2}] ,    (3.3.23)
and

\sigma^{22} = \alpha^2(1+\alpha\rho)^{-2} - (1-\rho)^{-2}
 + \sum_{j=1}^{N-1} (1+\alpha)(1+\alpha\rho^j)^{-1}{-\rho^{-2} + [(j+1)^2\alpha^2\rho^{2j}(1+\alpha\rho^{j+1})^{-2} - j(j+1)\alpha\rho^{j-1}(1+\alpha\rho^{j+1})^{-1}]
 + [(j-1)(j-2)\alpha\rho^{j-3}(1+\alpha\rho^{j-1})^{-1} - (j-1)^2\alpha^2\rho^{2j-4}(1+\alpha\rho^{j-1})^{-2}]}
 + (1+\alpha)(1+\alpha\rho^N)^{-1}{(N-1)\rho^{-2} + (1-\rho)^{-2} + (N-1)(N-2)\alpha\rho^{N-3}(1+\alpha\rho^{N-1})^{-1} - (N-1)^2\alpha^2\rho^{2N-4}(1+\alpha\rho^{N-1})^{-2}} .    (3.3.24)

Proof: Write p = p(y_i; \alpha,\rho), \theta_1 = \alpha, and \theta_2 = \rho for convenience. Chanda [5] has shown that the following three conditions are sufficient for the theorem to be true:

(i) For all (\theta_1,\theta_2) ∈ \Omega = {(0,\infty) × (0,1)}, the derivatives \partial log p/\partial\theta_i, \partial^2 log p/\partial\theta_i\partial\theta_j, and \partial^3 log p/\partial\theta_i\partial\theta_j\partial\theta_k exist for almost all y_i and for all i,j,k ∈ J_{1,2}.

(ii) Defining F(·) to be a summable function when

\sum_{y_i} F(y_i) < \infty ,    (3.3.25)

where \sum_{y_i} is the summation over almost all y_i, we must have, for all (\theta_1,\theta_2) ∈ \Omega,

|\partial p/\partial\theta_i| ≤ F_i(y),   |\partial^2 p/\partial\theta_i\partial\theta_j| ≤ F_{ij}(y),   |\partial^3 log p/\partial\theta_i\partial\theta_j\partial\theta_k| ≤ H_{ijk}(y),

where F_i(y) and F_{ij}(y) are summable functions, and
56 ^Â± M being a finite positive constant. This condition must be satisfied for i,j,k e J . (iii) For all (6,,92^ ^^' *^Â® matrix J = KJ^ .(Q ,Q^))\ where jr^ 1 J is positive definite and (j is finite. (i) Condition (i) will be established as soon as irÂ— Â— , TTrÂ— =Â— Â— , and ^^ . sA are shov,Ti to exist for r e J^ Â„, and Se. ' 36. d9. de.oe.dO, 0,N 1 1 J 1 J k i,j,k Â€ J . This follows from the form of log p(y.;Q',p), which from (3.3. 18) is log p(y. ;cv,p)= E
PAGE 65
Thus, writing y_{k,0} = 1,

|\partial p_k/\partial\theta_i| = |\sum_{r=0}^{N} (\partial\varphi_r/\partial\theta_i) y_{k,r}| p(y_k; \theta_1,\theta_2) ,    (3.3.28)

which is bounded by the summable function

F_i(y_k) = [\sum_{r=0}^{N} |\partial\varphi_r/\partial\theta_i|] y_{k,N} p(y_k; \theta_1,\theta_2) ,    (3.3.29)

for from (i) the terms \partial\varphi_r/\partial\theta_i are finite, and E{y_{k,N}} < \infty. Similarly,

|\partial^2 p_k/\partial\theta_i\partial\theta_j| ≤ {\sum_{r=0}^{N} |\partial^2\varphi_r/\partial\theta_i\partial\theta_j| y_{k,N} + \sum_{r=0}^{N} \sum_{s=0}^{N} |\partial\varphi_r/\partial\theta_i| |\partial\varphi_s/\partial\theta_j| y_{k,N}^2} p(y_k; \theta_1,\theta_2) ,    (3.3.30)

which requires (i) and a finite second-order moment of y_{k,N} in order to be bounded by the summable function

F_{ij}(y_k) = {\sum_{r=0}^{N} |\partial^2\varphi_r/\partial\theta_i\partial\theta_j| y_{k,N} + \sum_{r=0}^{N} \sum_{s=0}^{N} |\partial\varphi_r/\partial\theta_i| |\partial\varphi_s/\partial\theta_j| y_{k,N}^2} p(y_k; \theta_1,\theta_2) .    (3.3.31)

Finally,

|\partial^3 log p_s/\partial\theta_i\partial\theta_j\partial\theta_k| = |\sum_{r=0}^{N} (\partial^3\varphi_r/\partial\theta_i\partial\theta_j\partial\theta_k) y_{s,r}|    (3.3.32)
is bounded by

H_{ijk}(y_s) = y_{s,N} \sum_{r=0}^{N} |\partial^3\varphi_r/\partial\theta_i\partial\theta_j\partial\theta_k| ,    (3.3.33)

which has finite expected value by (i) and the finite expected value of y_{s,N}. Thus (ii) is established.

(iii) Chanda has shown that J will be positive definite as long as p is a density function. That |J| is finite follows from the fact that E{(\partial log p/\partial\theta_i)(\partial log p/\partial\theta_j)} is finite (from (i)) for all (\theta_1,\theta_2) ∈ \Omega. Thus the proof of Theorem 3.3.1 is complete.

A closed form has not been found for the solution to (3.3.20). Some numerical results using an iterative procedure will be presented in Chapter IV. Also, we attempted to establish stronger consistency properties for (\hat\alpha,\hat\rho), using results of Billingsley [2] and Wald [27], but have had no success as yet.

3.4 Continuous Observation in [0,T]

Assume that the logistic growth process {X(t); t ∈ I_{0,\infty}} is observed continuously in [0,T] = I_{0,T}. That is, {X(t); t ∈ I_{0,T}} will be observed, with X(0) = n known. The problem of estimating \alpha and \rho will be approached as follows: first, divide the interval I_{0,T} into N equal intervals of infinitesimal length h. Assume that observations will be made only at the end of each of the N intervals; that is, {X(jh); j ∈ J_{0,N}} will be observed, where Nh = T. From (3.3.2) we can write the likelihood function for such an observation:

L[\alpha,\rho; (x_{jh}; j ∈ J_{0,N})] = \prod_{j=1}^{N} C(x_{jh}-1, x_{(j-1)h}-1) T_{(j-1)h,jh}^{x_{(j-1)h}} (1-T_{(j-1)h,jh})^{x_{jh}-x_{(j-1)h}} ,    (3.4.1)
where x_0 = n and T_{s,t} is defined in (3.2.15). Note that to obtain the continuous observation in I_{0,T}, one must let h → 0 and N → \infty such that Nh = T. We thus want to consider the properties of log{L[\alpha,\rho; (X_{jh}; j ∈ J_{0,N})]} as h → 0.

We first define the concept of "limit in mean" (l.i.m.). If a sequence of random variables {Z_1, Z_2, ..., Z_m, ...} is such that

lim_{m → \infty} E{(Z_m - Z)^2} = 0 ,    (3.4.2)

then {Z_m} is said to approach Z as a limit in the mean square sense, or simply as a limit in mean. We will write

l.i.m._{m → \infty} Z_m = Z .    (3.4.3)

We now state

Theorem 3.4.1: The limit in mean of log{L[\alpha,\rho; (X_{jh}; j ∈ J_{0,N})]} as h → 0 (Nh = T) does not exist. However, for fixed (\alpha_0,\rho_0) ∈ \Omega,

l.i.m._{h → 0, Nh = T} {log L[\alpha,\rho; (X_{jh}; j ∈ J_{0,N})] - log L[\alpha_0,\rho_0; (X_{jh}; j ∈ J_{0,N})]}
 = \int_0^T (log \Phi_t - log \Phi_t^0) dX(t) + \int_0^T X(t)[\Phi_t^0 - \Phi_t] dt ,    (3.4.4)

where we define

\Phi_t = \alpha\beta\rho^t (1+\alpha\rho^t)^{-1} ,    (3.4.5)

\Phi_t^0 = \alpha_0\beta_0\rho_0^t (1+\alpha_0\rho_0^t)^{-1} .    (3.4.6)

The stochastic integrals on the right-hand side of (3.4.4) are defined in the usual sense, and remember \beta = -log \rho.

Before proving Theorem 3.4.1, we will need the following lemmas.
Lemma 3.4.1: The limit lim_{h → 0, Nh = T} \sum_{j=1}^{N} log(1 - T_{(j-1)h,jh}) does not exist, but

lim_{h → 0, Nh = T} \sum_{j=1}^{N} log T_{(j-1)h,jh} = -\int_0^T \Phi_t dt ,    (3.4.7)

and, for any fixed (\alpha_0,\rho_0) ∈ \Omega,

lim_{h → 0, Nh = T} h \sum_{j=1}^{N} log[(1-T_{(j-1)h,jh})(1-T^0_{(j-1)h,jh})^{-1}] = \int_0^T (log \Phi_t - log \Phi_t^0) dt ,    (3.4.8)

writing (s < t)

T^0_{s,t} = (1+\alpha_0\rho_0^t)(1+\alpha_0\rho_0^s)^{-1} .    (3.4.9)

Proof of Lemma 3.4.1: Using (3.2.15),

T_{(j-1)h,jh} = (1+\alpha\rho^{jh})[1+\alpha\rho^{(j-1)h}]^{-1} = (1+\alpha\rho^{jh})[1+\alpha\rho^{jh} exp(\beta h)]^{-1}
             = {1 + (1+\alpha\rho^{jh})^{-1}[\alpha\rho^{jh}(exp(\beta h) - 1)]}^{-1} = [1 + h\Phi_{jh} + O(h^2)]^{-1} .    (3.4.10)
Hence

log T_{(j-1)h,jh} = -h\Phi_{jh} + O(h^2) ,    (3.4.11)

from which (3.4.7) follows immediately. But from (3.4.10)

1 - T_{(j-1)h,jh} = [h\Phi_{jh} + O(h^2)][1 + h\Phi_{jh} + O(h^2)]^{-1} ,    (3.4.12)

which implies that lim_{h → 0, Nh = T} \sum log(1-T_{(j-1)h,jh}) does not exist. Now

(1-T_{(j-1)h,jh})(1-T^0_{(j-1)h,jh})^{-1} = [h\Phi_{jh}+O(h^2)][1+h\Phi^0_{jh}+O(h^2)][h\Phi^0_{jh}+O(h^2)]^{-1}[1+h\Phi_{jh}+O(h^2)]^{-1}
                                          = \Phi_{jh}(\Phi^0_{jh})^{-1}[1 + O(h)] .    (3.4.13)

We substitute (3.4.13) into the left-hand side of (3.4.8) to complete the proof.

Lemma 3.4.2: If we define, for s < t ≤ u,

R_s(t,u) = C_s(t,u) - m_{x_s}(s,t) m_{x_s}(s,u) ,    (3.4.14)

where

C_s(t,u) = E{X(u)X(t)/(X(s) = x_s)}    (3.4.15)

and m_{x_s}(s,t) is defined in (3.2.10), then

R_s(t,u) = x_s \alpha(\rho^s - \rho^t)(1+\alpha\rho^s)[(1+\alpha\rho^t)(1+\alpha\rho^u)]^{-1} .    (3.4.16)
Proof of Lemma 3.4.2: Using (3.2.14),

C_s(t,u) = E{X(t) E[X(u)/X(t) = x_t]/(X(s) = x_s)} = (1+\alpha\rho^t)(1+\alpha\rho^u)^{-1} C_s(t,t) .    (3.4.17)

And from Theorem 3.2.1,

E{[X(t) - x_s][X(t) - x_s - 1]/(X(s) = x_s)} = x_s(x_s+1)(1-T_{s,t})^2 (T_{s,t})^{-2} .    (3.4.18)

Thus

C_s(t,t) = x_s(x_s+1)(1-T_{s,t})^2 (T_{s,t})^{-2} + (2x_s+1) E{X(t)/(X(s) = x_s)} - x_s(x_s+1)
         = x_s (T_{s,t})^{-2} (x_s + 1 - T_{s,t}) .    (3.4.19)

Substitution of (3.4.19) into (3.4.17) gives

C_s(t,u) = x_s (T_{s,t} T_{s,u})^{-1} (x_s + 1 - T_{s,t}) .    (3.4.20)

Then using (3.4.20) in (3.4.14) yields

R_s(t,u) = x_s (T_{s,u} T_{s,t})^{-1} (1 - T_{s,t}) ,    (3.4.21)

which is (3.4.16) upon substitution of (3.2.15) in (3.4.21).
Lemma 3.4.3: The stochastic integral

I_1 = l.i.m._{h → 0, Nh = T} h \sum_{j=1}^{N} X(jh) \Phi_{jh}    (3.4.22)

exists in the mean square sense for all (\alpha,\rho) ∈ \Omega. We shall write this limiting random variable

I_1 = \int_0^T X(t) \Phi_t dt .    (3.4.23)

Proof of Lemma 3.4.3: Using a result from Cramér and Leadbetter [7], sufficient conditions for the existence of I_1 are that R_0(t,u) be of bounded variation in [I_{0,T} × I_{0,T}] and that

J_1 = \int_0^T \int_0^T \Phi_t \Phi_u R_0(t,u) dt du    (3.4.24)

exist. Since, from (3.4.16),

R_0(t,u) ≤ x_0 \alpha(1+\alpha) ,    (3.4.25)

the first of these conditions is satisfied. And since \Phi_t is finite for any (\alpha,\rho) ∈ \Omega and all t ∈ I_{0,T}, the integral (3.4.24) exists. Thus the lemma is proved.

Lemma 3.4.4: The stochastic integral

I_2 = l.i.m._{h → 0, Nh = T} \sum_{j=1}^{N} (log \Phi_{jh})[X(jh) - X((j-1)h)]    (3.4.26)

exists in the mean square sense for all (\alpha,\rho) ∈ \Omega.
We shall write this limiting random variable

I_2 = \int_0^T (log \Phi_t) dX(t) .    (3.4.27)

Proof of Lemma 3.4.4: Again from Cramér and Leadbetter [7], sufficient conditions for the existence of I_2 are that R_0(t,u) be of bounded variation and that

J_2 = \int_0^T \int_0^T (log \Phi_t)(log \Phi_u) d_{t,u}[R_0(t,u)]    (3.4.28)

exist. The first of these has already been shown, and by differentiation of R_0(t,u), one has

J_2 = \int_0^T \int_0^T (log \Phi_t)(log \Phi_u) x_0 [\alpha\beta(1+\alpha)]^2 \rho^{t+u} [(1+\alpha\rho^t)(1+\alpha\rho^u)]^{-2} dt du .    (3.4.29)

The integrand is finite in \Omega and {I_{0,T} × I_{0,T}}, which completes the proof.

Proof of Theorem 3.4.1: From equation (3.4.1),

log L[\alpha,\rho; (X_{jh}; j ∈ J_{0,N})] = \sum_{j=1}^{N} log C(X_{jh}-1, X_{(j-1)h}-1) + \sum_{j=1}^{N} X_{(j-1)h} log T_{(j-1)h,jh}
 + \sum_{j=1}^{N} (X_{jh} - X_{(j-1)h}) log(1-T_{(j-1)h,jh}) .    (3.4.30)

The third term on the right-hand side of (3.4.30) has no limit in the mean square sense because of the first part of Lemma 3.4.1. We thus consider
log L[\alpha,\rho; (X_{jh}; j ∈ J_{0,N})] - log L_0[\alpha_0,\rho_0; (X_{jh}; j ∈ J_{0,N})]
 = \sum_{j=1}^{N} X_{(j-1)h} [log T_{(j-1)h,jh} - log T^0_{(j-1)h,jh}]
 + \sum_{j=1}^{N} (X_{jh} - X_{(j-1)h}) log[(1-T_{(j-1)h,jh})(1-T^0_{(j-1)h,jh})^{-1}]    (3.4.31)

for any (\alpha_0,\rho_0) ∈ \Omega; the combinatorial factors cancel in the difference. Using Lemmas 3.4.3 and 3.4.4, we have

log L^*{\alpha,\rho;\alpha_0,\rho_0; (X(t); t ∈ I_{0,T})} = \int_0^T X(t)(\Phi_t^0 - \Phi_t) dt + \int_0^T (log \Phi_t - log \Phi_t^0) dX(t) ,    (3.4.32)

writing

log L^*[\alpha,\rho;\alpha_0,\rho_0; (X(t); t ∈ I_{0,T})] = l.i.m._{h → 0, Nh = T} {log L[\alpha,\rho; (X_{jh}; j ∈ J_{0,N})] - log L_0[\alpha_0,\rho_0; (X_{jh}; j ∈ J_{0,N})]} .    (3.4.33)

This completes the proof of the theorem.

The idea of resolving the limit problem by considering the ratio of likelihood functions, as in (3.4.31), is from Bartlett [1]. By using the expression (3.3.12), we may rewrite (3.4.32) to obtain

log L^*[\alpha,\rho;\alpha_0,\rho_0; (X(t); t ∈ I_{0,T})] = \sum_{j=1}^{n} Z_j^* ,    (3.4.34)

where

Z_j^* = \int_0^T Y_j(t)(\Phi_t^0 - \Phi_t) dt + \int_0^T (log \Phi_t - log \Phi_t^0) dY_j(t) .    (3.4.35)
{Z_j^*; j ∈ J_{1,n}} is a sequence of independent, identically distributed random variables. We will be interested in the derivatives, evaluated at (\alpha_0,\rho_0) = (\alpha,\rho):

\partial log L^*/\partial\alpha = \int_0^T [\alpha(1+\alpha\rho^t)]^{-1} dX(t) - \beta \int_0^T X(t) \rho^t (1+\alpha\rho^t)^{-2} dt    (3.4.36)

and

\partial log L^*/\partial\rho = \int_0^T (\beta t - 1 - \alpha\rho^t)[\beta\rho(1+\alpha\rho^t)]^{-1} dX(t) - \alpha \int_0^T X(t) \rho^{t-1}(\beta t - 1 - \alpha\rho^t)(1+\alpha\rho^t)^{-2} dt .    (3.4.37)

We now state

Lemma 3.4.5:

E{\partial log L^*/\partial\alpha} = E{\partial log L^*/\partial\rho} = 0 .    (3.4.38)

Proof: From (3.4.36),

E{\partial log L^*/\partial\alpha} = \sum_{i=1}^{n} {\int_0^T [\alpha(1+\alpha\rho^t)]^{-1} d_t m_1(0,t) - \beta \int_0^T \rho^t (1+\alpha\rho^t)^{-2} m_1(0,t) dt}
 = \sum_{i=1}^{n} {\beta(1+\alpha) \int_0^T \rho^t (1+\alpha\rho^t)^{-3} dt - \beta(1+\alpha) \int_0^T \rho^t (1+\alpha\rho^t)^{-3} dt} = 0 ,    (3.4.39)
using the fact that (Cramér and Leadbetter [7])

E{dY_i(t)} = d_t E{Y_i(t)} = d_t[m_1(0,t)] .    (3.4.40)

Similarly,

E{\partial log L^*/\partial\rho} = \sum_{i=1}^{n} {\int_0^T (\beta t - 1 - \alpha\rho^t)[\beta\rho(1+\alpha\rho^t)]^{-1} d_t m_1(0,t)
 - \alpha \int_0^T \rho^{t-1}(\beta t - 1 - \alpha\rho^t)(1+\alpha\rho^t)^{-2} m_1(0,t) dt} = 0 ,    (3.4.41)

which completes the proof.

We are now ready to state and prove

Theorem 3.4.2: The equations

\partial log L^*{\alpha,\rho;\alpha_0,\rho_0; [X(t); t ∈ I_{0,T}]}/\partial\alpha = 0 ,
\partial log L^*{\alpha,\rho;\alpha_0,\rho_0; [X(t); t ∈ I_{0,T}]}/\partial\rho = 0    (3.4.42)

have a solution (\alpha^*,\rho^*) which is weakly consistent for (\alpha,\rho). Furthermore, the asymptotic distribution of [\sqrt n(\alpha^* - \alpha); \sqrt n(\rho^* - \rho)] as n → \infty is normal, with null mean vector and variance-covariance matrix \Sigma^*, where
\Sigma^* = [ \sigma^*_{11}  \sigma^*_{12} ; \sigma^*_{12}  \sigma^*_{22} ] .    (3.4.43)

Now,

\sigma^*_{11} = \Delta^{-2}(\omega_{11}\lambda_{22}^2 + \omega_{22}\lambda_{12}^2 - 2\omega_{12}\lambda_{12}\lambda_{22}) ,    (3.4.44)

\sigma^*_{22} = \Delta^{-2}(\omega_{22}\lambda_{11}^2 + \omega_{11}\lambda_{12}^2 - 2\omega_{12}\lambda_{11}\lambda_{12}) ,    (3.4.45)

\sigma^*_{12} = \Delta^{-2}[\omega_{12}(\lambda_{11}\lambda_{22} + \lambda_{12}^2) - \lambda_{12}(\lambda_{11}\omega_{22} + \lambda_{22}\omega_{11})] ,    (3.4.46)

with, writing \theta_1 = \alpha and \theta_2 = \rho,

\lambda_{ij} = -E{\partial^2 log p_i^*/\partial\theta_i\partial\theta_j},   \omega_{ij} = E{(\partial log p_i^*/\partial\theta_i)(\partial log p_i^*/\partial\theta_j)} ,    (3.4.47)

and

\Delta = \lambda_{11}\lambda_{22} - \lambda_{12}^2 .    (3.4.48)

Proof: Writing

log{p_i^*[\alpha,\rho;\alpha_0,\rho_0; (y_i(t); t ∈ I_{0,T})]} = \int_0^T y_i(t)(\Phi_t^0 - \Phi_t) dt + \int_0^T (log \Phi_t - log \Phi_t^0) dy_i(t) ,    (3.4.49)

we have from (3.4.34) that

log L^*[\alpha,\rho;\alpha_0,\rho_0; (x(t); t ∈ I_{0,T})] = \sum_{i=1}^{n} log p_i^*[\alpha,\rho;\alpha_0,\rho_0; (y_i(t); t ∈ I_{0,T})] .    (3.4.50)
As mentioned prior to the statement of Lemma 3.4.5, we cannot regard {p_i^*; i ∈ J_{1,n}} as a sequence of probability density functions. However, as we shall see, this is not essential to the proof of the theorem. In the proofs given by Cramér [6] for the single-parameter case and Chanda [5] for the multiparameter case, the only property of the density function crucial to the proofs is

E{\partial log f/\partial\theta} = 0 ,    (3.4.51)

where f is the density function and \theta some parameter of f. The property holds as soon as |\partial f/\partial\theta| is bounded by an integrable function (over the sample space). But in the present case we have, from Lemma 3.4.5,

\sum_{i=1}^{n} E{\partial log p_i^*/\partial\theta_j} = 0 ,    (3.4.52)

which implies

E{\partial log p_i^*/\partial\theta_j} = 0    (3.4.53)

for j ∈ J_{1,2}. Thus the assumption that p_i^* is a probability density, and, in fact, the requirement that |\partial p_i^*/\partial\theta_j| be integrable (or summable in this case), are unnecessary.

We now must establish the three conditions given in the proof of Theorem 3.3.1 to complete the proof of this theorem.

(i) To see that log p_i^* is differentiable up to the third order for almost all y_i, we need only note that \Phi_t and log \Phi_t are differentiable to that order; this is easily shown using (3.4.5).
70 (ii) As mentioned, we no longer need the summability condition Sp. 3 p for r^The additional requirement that  ^  be o w . 8 . o 9, J J k bounded by a summable function is used such that 2 * 3^ r ., S'p. [f ^3= and hence E 2 * * rd log p.. S log P * . foe) = 4(^e) (^^sF^)} Â• *^^"' J k But we shall not make use of the property (3.4.55), so it will not be ^2 * o p. necessary to have f , s bounded by a summable function. Thus, we need only establish the condition that Nr;:Â— =r;rÂ— w^ ^* J k X J k i .3 7 r 1 ^ ^t 2/ ^ ^t \ /^^t\ (3.4.56) j k .e j k" 2 ^'^t y^^tn ,2 ^y^ ^t vde.se, ase, y vae, aejvae. j k .c>a Since, for every (6 , 9Â„) e Q, we can easily bound Â§ and its derivatives 1 J t used in (3.4.56) for all t el, we can find H. Z^) ^s soon as T T ^fj y.(t)dt3 and EfJ dy.(t)} < Â«, which follows from the proof of Lemma 3.4.5
(iii) The third condition required for Theorem 3.4.2 to be true is that Λ = ((λ_ij(θ_1, θ_2))) is positive definite and that |Λ| is finite, where

    λ_ij = E{(∂ log L/∂θ_i)(∂ log L/∂θ_j)} .                               (3.4.57)

From (3.4.36) and (3.4.37), each λ_ij is a sum of double integrals over I_{0,T} × I_{0,T}: one integral against d²R_0(t,u), one against d_t R_0(t,u) du, and one against R_0(t,u) dt du. The integrands are products of two bounded score factors — factors of the type ρ^t(1+αρ^t)^{-1} arising from differentiation with respect to α, and of the type tρ^{t-1}(1+αρ^t)^{-1} from differentiation with respect to ρ. The resulting expressions for λ_11, λ_12 and λ_22 are (3.4.58), (3.4.59) and (3.4.60) respectively.
For all (α,ρ) ∈ Ω, R_0(t,u), given by Lemma 3.4.2, is finite, as are its derivatives. Thus each λ_ij is the integral of a finite integrand over a finite interval, and so is finite. This completes the proof of the theorem.

Thus we have shown that (α̂, ρ̂) have the properties usually associated with maximum likelihood estimators, losing only the property that

    E{∂² log p*_i/∂θ_j∂θ_k} = −E{(∂ log p*_i/∂θ_j)(∂ log p*_i/∂θ_k)} ,      (3.4.61)

that is to say,

    ω_jk = λ_jk .                                                           (3.4.62)

When this property holds, the form of Σ* is greatly simplified; since we cannot assume (3.4.62) here, calculation of Σ* is tedious. In addition to the λ_ij's given by (3.4.58)-(3.4.60), we will need the ω_ij's; each is a single integral over I_{0,T} of the same score factors, carrying the factor (α+1), and the expressions for ω_11, ω_12 and ω_22 are (3.4.63), (3.4.64) and (3.4.65) respectively.
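Once numerical values of the ω_ij and λ_ij have been obtained (by quadrature, say), assembling Σ* from (3.4.44)-(3.4.46) amounts to forming the "sandwich" product Ω^{-1}ΛΩ^{-1}, with Ω = (ω_ij) and Λ = (λ_ij). A minimal sketch of this assembly (the numerical entries below are illustrative only, not evaluated integrals):

```python
import numpy as np

def sandwich_covariance(omega, lam):
    """Assemble Sigma* = Omega^{-1} Lambda Omega^{-1} for the 2x2 case;
    elementwise this reproduces the formulas (3.4.44)-(3.4.46)."""
    omega = np.asarray(omega, dtype=float)
    lam = np.asarray(lam, dtype=float)
    omega_inv = np.linalg.inv(omega)
    return omega_inv @ lam @ omega_inv

# illustrative symmetric matrices standing in for (omega_ij) and (lambda_ij)
omega = np.array([[2.0, 0.3], [0.3, 1.5]])
lam = np.array([[1.2, 0.4], [0.4, 0.9]])
sigma = sandwich_covariance(omega, lam)

# elementwise check against (3.4.44):
# sigma*_11 = Delta^{-2}(w12^2 l22 + w22^2 l11 - 2 w12 w22 l12)
delta = omega[0, 0] * omega[1, 1] - omega[0, 1] ** 2
sigma11 = (omega[0, 1] ** 2 * lam[1, 1] + omega[1, 1] ** 2 * lam[0, 0]
           - 2.0 * omega[0, 1] * omega[1, 1] * lam[0, 1]) / delta ** 2
```

The matrix form makes clear why (3.4.62) simplifies matters: when Ω = Λ the product collapses to Ω^{-1}.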
No closed form has been found for the solution to the estimation equations (3.4.36) and (3.4.37), nor could a more convenient form be found for the elements of Σ*. It is conjectured that iterative numerical methods can be used, but we leave this particular topic open to further research.

3.5 Arrival Time Observation

Assume that instead of observing the logistic process {X(t)} itself, we observe only the times at which the process increases; that is, the "arrival times" are observed. Observation is terminated when the time of the N-th increase has been recorded. Hence we will observe

    {T_j ; j ∈ J_{1,N}} ,                                                   (3.5.1)

where T_j is the time of the j-th increase, so that X(T_j) = n + j. Then, for θ > 0,

    P{T_{j+k} − T_j ≤ θ | (T_i = t_i; i ∈ J_{1,j})}
        = P{X(t_j + θ) − X(t_j) ≥ k | C_j} ,                                (3.5.2)

where
    C_j = { x(t) = n          if t < t_1 ;
            x(t) = n + 1      if t_1 ≤ t < t_2 ;
            ...
            x(t) = n + j − 1  if t_{j-1} ≤ t < t_j ;
            x(t) = n + j      if t = t_j } .                                (3.5.3)

Since {X(t)} is Markov, we have

    P{T_{j+k} − T_j ≤ θ | (T_i = t_i; i ∈ J_{1,j})}
        = P{X(t_j + θ) − X(t_j) ≥ k | (X(t_j) = n + j)}
        = P{X(t_j + θ) ≥ n + j + k | (X(t_j) = n + j)}
        = Σ_{i=k}^∞ C(n+j+i−1, i) [τ_{t_j, t_j+θ}]^{n+j} [1 − τ_{t_j, t_j+θ}]^i ,   (3.5.4)

writing τ_{t,u} = (1 + αρ^u)(1 + αρ^t)^{-1}. Since (3.5.4) is a function of t_j only, it follows that {T_j ; j ∈ J_{1,N}} is a Markov sequence, and from (3.5.4) the distribution function is given by

    F_{T_{j+1}}(θ; t_j) = P{T_{j+1} ≤ t_j + θ | T_j = t_j}
        = 1 − [ (1 + αρ^{t_j+θ})(1 + αρ^{t_j})^{-1} ]^{n+j} ,               (3.5.5)

requiring θ ≥ 0. To follow the procedure suggested by Moran [22] for several different birth and death processes would be to calculate the likelihood function

    L(α,ρ; t_1, ..., t_N) = Π_j f_{T_{j+1}}(α,ρ; t_j, θ) ,                  (3.5.6)

where

    f_{T_{j+1}}(α,ρ; t_j, θ) = (∂/∂θ) F_{T_{j+1}}(θ; t_j) ,                 (3.5.7)
which exists in Ω since T_{j+1}, and thus F_{T_{j+1}}(θ; t_j), is absolutely continuous for θ ≥ 0. However, Moran's approach can be shown unsatisfactory for this case by observing that, from (3.5.5),

    lim_{θ→∞} F_{T_{j+1}}(θ; t_j)
        = lim_{θ→∞} { 1 − [ (1+αρ^{t_j+θ})(1+αρ^{t_j})^{-1} ]^{n+j} }
        = 1 − (1 + αρ^{t_j})^{-(n+j)} .                                     (3.5.8)

Hence there is a nonzero probability that, given T_j = t_j, T_{j+1} will not be finite. Thus E{T_{j+1} | (T_j = t_j)} does not exist, and the likelihood function given by (3.5.6) is not useful for estimating (α,ρ). Perhaps a conditional likelihood could be used to get estimation equations. Further study of the problem is needed.

3.6 Prediction

Assume that N equispaced observations will be made on the logistic growth process {X(t); t ∈ I_{0,N}}, as in 3.3. Thus, based upon observations taken on {X(j); j ∈ J_{1,N}}, we will predict the value of the time series at time N + m, that is, X(N+m). Restricting our attention to the predictor which minimizes the mean square error of prediction, we define

    x̂_N^{(1)}(m) = E{X(N+m) | [X(j) = x_j; j ∈ J_{1,N}]}
        = E{X(N+m) | (X(N) = x_N)}
        = m_{x_N}(N, N+m)
        = x_N (1 + αρ^N)(1 + αρ^{N+m})^{-1} ,                               (3.6.1)
using (3.2.10). Then x̂_N^{(1)}(m) minimizes the mean square error of prediction, which is

    MSE_1 = E{[X(N+m) − x̂_N^{(1)}(m)]²}
        = E{R_{X(N)}(N+m, N+m)}
        = E{X(N) α(1+αρ^N)(ρ^N − ρ^{N+m})(1+αρ^{N+m})^{-2}}
        = nα(1+α) ρ^N (1 − ρ^m)(1 + αρ^{N+m})^{-2} ,                        (3.6.2)

where X(0) = n. If α and ρ are unknown, consider the predictor

    x̂_N^{(2)}(m) = x_N (1 + α̂ρ̂^N)(1 + α̂ρ̂^{N+m})^{-1} .                      (3.6.3)

Denoting the mean square error of x̂_N^{(2)}(m) by MSE_2, it follows that

    MSE_2 − MSE_1 = E{[x̂_N^{(2)}(m) − x̂_N^{(1)}(m)]²}
        = E{X_N² [(1+α̂ρ̂^N)(1+α̂ρ̂^{N+m})^{-1} − (1+αρ^N)(1+αρ^{N+m})^{-1}]²} .   (3.6.4)

Define

    U = 1 + α̂ρ̂^N ,                                                         (3.6.5)

    V = 1 + α̂ρ̂^{N+m} ,                                                     (3.6.6)

    δα = α̂ − α ,                                                           (3.6.7)

and

    δρ = ρ̂ − ρ .                                                           (3.6.8)
Then

    δU = (1 + α̂ρ̂^N) − (1 + αρ^N) = α̂ρ̂^N − αρ^N
       = (δα + α)(δρ + ρ)^N − αρ^N
       = ρ^N δα + Nαρ^{N-1} δρ + o_p(N^{-1/2}) ,                            (3.6.9)

making use of the fact that the maximum likelihood estimators are weakly consistent. Similarly,

    δV = (1 + α̂ρ̂^{N+m}) − (1 + αρ^{N+m})
       = ρ^{N+m} δα + (N+m)αρ^{N+m-1} δρ + o_p(N^{-1/2}) .                  (3.6.10)

Hence

    UV^{-1} = [δU + (1 + αρ^N)][δV + (1 + αρ^{N+m})]^{-1}
       = (1+αρ^N)(1+αρ^{N+m})^{-1} + (1+αρ^{N+m})^{-1} δU
         − (1+αρ^N)(1+αρ^{N+m})^{-2} δV + o_p(N^{-1/2}) .                   (3.6.11)

Now we can get an approximation for (3.6.4) by taking

    MSE_2 − MSE_1 = E{X_N² [UV^{-1} − (1+αρ^N)(1+αρ^{N+m})^{-1}]²}
       ≈ E(X_N²) E{[UV^{-1} − (1+αρ^N)(1+αρ^{N+m})^{-1}]²}
       = C_0(N,N)(1+αρ^{N+m})^{-2} E{[δU − (1+αρ^N)(1+αρ^{N+m})^{-1} δV]²}
       = ρ^{2N} n[n + α(1−ρ^N)(1+α)^{-1}](1+α)² [(1+αρ^N)(1+αρ^{N+m})]^{-2}
         × { σ_11 [1 − (1+αρ^N)(1+αρ^{N+m})^{-1} ρ^m]²
             + 2σ_12 αρ^{-1} [N − (2N+m)ρ^m(1+αρ^N)(1+αρ^{N+m})^{-1}
                                + (N+m)ρ^{2m}(1+αρ^N)²(1+αρ^{N+m})^{-2}]
             + σ_22 α²ρ^{-2} [N − (N+m)(1+αρ^N)(1+αρ^{N+m})^{-1} ρ^m]² } ,  (3.6.12)
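For fixed parameter values these expressions are easy to evaluate; the following sketch codes (3.6.2), (3.6.3) and the approximation (3.6.12) directly. The covariance entries s11, s12, s22 stand for the elements of Σ, and the numerical values used are illustrative only (function names are mine):

```python
def predictor(x_n, alpha, rho, N, m):
    # least squares predictor (3.6.3), with estimates replaced by known values
    return x_n * (1 + alpha * rho ** N) / (1 + alpha * rho ** (N + m))

def mse1(n, alpha, rho, N, m):
    # prediction error (3.6.2) of the known-parameter predictor
    return (n * alpha * (1 + alpha) * rho ** N * (1 - rho ** m)
            / (1 + alpha * rho ** (N + m)) ** 2)

def mse_increase(n, alpha, rho, N, m, s11, s12, s22):
    # approximation (3.6.12) to MSE_2 - MSE_1, the extra error due to estimation
    c = (1 + alpha * rho ** N) / (1 + alpha * rho ** (N + m))
    pref = (rho ** (2 * N) * n * (n + alpha * (1 - rho ** N) / (1 + alpha))
            * (1 + alpha) ** 2
            / ((1 + alpha * rho ** N) * (1 + alpha * rho ** (N + m))) ** 2)
    quad = (s11 * (1 - c * rho ** m) ** 2
            + 2 * s12 * (alpha / rho)
              * (N - (2 * N + m) * c * rho ** m + (N + m) * (c * rho ** m) ** 2)
            + s22 * (alpha / rho) ** 2 * (N - (N + m) * c * rho ** m) ** 2)
    return pref * quad

# illustrative values in the spirit of the Monte Carlo runs of Chapter IV
print(predictor(300.0, 10.0, 0.8, 20, 5))
print(mse1(40, 10.0, 0.8, 20, 5))
print(mse_increase(40, 10.0, 0.8, 20, 5, 2.801, 0.0170, 0.182e-3))
```

Note that the quadratic form in braces is nonnegative whenever Σ is positive definite, so the approximation always attributes a positive extra error to estimation.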
Perhaps (3.6.12) is only a crude approximation for (3.6.4), but inspection of MSE_1 (3.6.2) and the approximation for MSE_2 − MSE_1 (3.6.12) leads to several interesting observations:

1. MSE_1 is proportional to ρ^N; that is, MSE_1 decreases exponentially as N increases. This is intuitively plausible, since the correlation between X(N) and X(N+m) is given by

    Corr{X(N), X(N+m)} = R_0(N, N+m)[R_0(N,N) R_0(N+m, N+m)]^{-1/2}
        = [(1 − ρ^N)(1 − ρ^{N+m})^{-1}]^{1/2} .                             (3.6.13)

Hence the correlation is a monotonically increasing function of N, and it follows that X(N) will contain more information about X(N+m) when N is large; thus the decrease in prediction error.

2. MSE_1 is proportional to n. Thus, contrary to the error associated with the maximum likelihood estimators of the logistic parameters, the error of prediction increases with the initial size of the process. This is perhaps partially explained by the fact that the process variance at any time is proportional to n (Lemma 3.4.2); heuristically, the population or process size is less deterministic for large populations.

3. MSE_2 − MSE_1 is proportional to ρ^{2N}. Hence the additional prediction error incurred by parameter estimation also decreases as N increases, and at a faster rate than MSE_1.

4. MSE_2 − MSE_1 is proportional to n.

Observation (4) is surprising if true; but it depends most directly upon the accuracy of approximation (3.6.12). Thus, though
this may be just another of the logistic process' strange characteristics, one must question the credibility of (3.6.12). Rather, one expects the increase in prediction error to be of order one, following the general pattern that a decrease in estimator variance produces a corresponding decrease in the ratio of MSE_2 − MSE_1 to MSE_1.
CHAPTER IV
AN EMPIRICAL COMPARISON

4.1 Introduction

Rhodes [25] has given a method of estimating the logistic parameters which receives wide application. We present his method in this chapter, with an attempt to examine the estimators' sampling properties. These shall be shown to be analytically intractable; thus a Monte Carlo experiment is performed in order to compare these estimators with those presented in the last chapter. Finally, the logistic model will be fitted to the population of the conterminous United States, using both Rhodes' and maximum likelihood estimation procedures.

4.2 Rhodes' Estimators

Recall that the mean function for the logistic growth process, given by (3.2.12), is

    E{X(t) | X(0) = n} = m_n(0,t) = n(1+α)(1+αρ^t)^{-1} .                   (3.2.12)

Upon inspection, it is seen that

    [m_n(0,t+1)]^{-1} = ρ[m_n(0,t)]^{-1} + (1−ρ)[n(1+α)]^{-1} .             (4.2.1)

Rhodes thus defines

    Y(t) = [X(t)]^{-1} ,                                                    (4.2.2)
and, observing the random variables (Y(j) = Y_j) Y_1, Y_2, ..., Y_N, he finds estimators for ρ and γ in

    y_{t+1} = ρ y_t + γ + ε_{t+1} ,                                         (4.2.3)

where

    γ = (1−ρ)[n(1+α)]^{-1} ,                                                (4.2.4)

and y_t is the observed value of Y_t. The estimators, obtained by minimizing

    S = Σ_{t=1}^{N-1} [y_{t+1} − (ρ y_t + γ)]² ,                            (4.2.5)

are

    ρ̂_R = [Σ_{t=1}^{N-1} (y_t − ȳ_1)(y_{t+1} − ȳ_2)] [Σ_{t=1}^{N-1} (y_t − ȳ_1)²]^{-1}   (4.2.6)

and

    γ̂_R = ȳ_2 − ρ̂_R ȳ_1 ,                                                  (4.2.7)

where

    ȳ_1 = (N−1)^{-1} Σ_{t=1}^{N-1} y_t                                      (4.2.8)

and

    ȳ_2 = (N−1)^{-1} Σ_{t=1}^{N-1} y_{t+1} .                                (4.2.9)

Using (4.2.4), one gets an estimate for α:

    α̂_R = (1−ρ̂_R)(n γ̂_R)^{-1} − 1 .                                         (4.2.10)

Rhodes then shows by application that one obtains reasonably close agreement between projection and reality, using these estimated values. To my knowledge no probabilistic study of the properties of these estimators has been made.
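Rhodes' procedure is ordinary least squares on the reciprocal series, and (4.2.6)-(4.2.10) transcribe directly into a few lines (function and variable names are mine):

```python
def rhodes_estimates(x, n):
    """Rhodes' estimators (4.2.6)-(4.2.10) from observations x = (x_1, ..., x_N)
    of the logistic process, with X(0) = n."""
    y = [1.0 / v for v in x]                        # reciprocal series (4.2.2)
    y_lag, y_lead = y[:-1], y[1:]
    k = len(y_lag)
    ybar1 = sum(y_lag) / k                          # (4.2.8)
    ybar2 = sum(y_lead) / k                         # (4.2.9)
    num = sum((a - ybar1) * (b - ybar2) for a, b in zip(y_lag, y_lead))
    den = sum((a - ybar1) ** 2 for a in y_lag)
    rho_r = num / den                               # (4.2.6)
    gamma_r = ybar2 - rho_r * ybar1                 # (4.2.7)
    alpha_r = (1.0 - rho_r) / (n * gamma_r) - 1.0   # (4.2.10)
    return alpha_r, rho_r

# on the noiseless mean sequence (3.2.12) the recursion (4.2.1) holds exactly,
# so the estimators recover the true parameters
alpha, rho, n = 5.0, 0.8, 100
x = [n * (1 + alpha) / (1 + alpha * rho ** t) for t in range(1, 21)]
a_r, r_r = rhodes_estimates(x, n)
```

The check on the mean sequence is only a consistency sanity test; on a realized (random) path the estimates behave as studied in the sections below.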
4.3 Sampling Properties of Rhodes' Estimators

Define, for s ≤ t and u > 0,

    β_n(s,t) = E{Y(t) | Y(s) = n^{-1}} ,                                    (4.3.1)

    ξ_n(s,t) = E{Y²(t) | Y(s) = n^{-1}} ,                                   (4.3.2)

and

    η_n(s,t,t+u) = E{Y(t+u) Y(t) | Y(s) = n^{-1}} .                         (4.3.3)

Now fix s, t and u, and write β_n, ξ_n and η_n for (4.3.1), (4.3.2) and (4.3.3), respectively. We now prove

Theorem 4.3.1:

    (i)   β_n = v^n log τ − Σ_{j=1}^{n-1} j^{-1} v^{n-j} ;                  (4.3.4)

    (ii)  ξ_n = v^n { (1/6)[3(cos^{-1}(1−τ))² − 6π cos^{-1}(1−τ) + 2π²]
                  − Σ_{j=1}^{n-1} j^{-1} [log τ − Σ_{k=1}^{j-1} k^{-1} v^{-k}] } ;   (4.3.5)

and

    (iii) η_{n+1} = v[η_n − n^{-1} β*_n] ,                                  (4.3.6)

writing and defining

    τ = (1 + αρ^t)(1 + αρ^s)^{-1} ,                                         (4.3.7)

    β*_n = β_n(s, t+u) ,                                                    (4.3.8)

    v = τ(τ − 1)^{-1} .                                                     (4.3.9)
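Before turning to the proof, the closed form (i) admits a quick numerical check: under Theorem 3.2.1 the transition law is P{X(t) = j | X(s) = n} = C(j−1, n−1) τ^n (1−τ)^{j−n} for j ≥ n, so β_n can also be obtained by direct summation of the series. A sketch (parameter values arbitrary):

```python
import math
from math import comb

def beta_closed(n, tau):
    # closed form (4.3.4): beta_n = v^n log(tau) - sum_{j=1}^{n-1} v^{n-j}/j
    v = tau / (tau - 1.0)
    return v ** n * math.log(tau) - sum(v ** (n - j) / j for j in range(1, n))

def beta_series(n, tau, terms=4000):
    # direct summation of E{Y(t) | Y(s) = 1/n} under the transition law
    return sum(comb(j - 1, n - 1) * tau ** n * (1.0 - tau) ** (j - n) / j
               for j in range(n, n + terms))

for n in (1, 2, 5):
    assert abs(beta_closed(n, 0.6) - beta_series(n, 0.6)) < 1e-9
```

For n = 1 the two sides reduce to the elementary series Σ j^{-1} τ(1−τ)^{j−1} = v log τ, which is (4.3.13) below.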
Proof: It follows from Theorem 3.2.1 that

    P{Y(t) = j^{-1} | Y(s) = n^{-1}} = P{X(t) = j | X(s) = n}
        = C(j−1, n−1) τ^n (1−τ)^{j−n} ,   j = n, n+1, ... ,
        = 0 ,                             otherwise.                        (4.3.10)

Thus

    β_{n+1} = Σ_{j=0}^∞ (j+n+1)^{-1} C(j+n, n) τ^{n+1} (1−τ)^j .            (4.3.11)

But

    β_{n+1} = n^{-1} v Σ_{j=0}^∞ [n(j+n)^{-1} − 1] C(j+n−1, n−1) τ^n (1−τ)^j
            = v(β_n − n^{-1}) .                                             (4.3.12)

Notice that

    β_1 = E{Y(t) | Y(s) = 1} = Σ_{j=1}^∞ j^{-1} τ (1−τ)^{j−1} = v log τ .   (4.3.13)
Now, defining the generating function

    P_β(z) = Σ_{n=1}^∞ β_n z^n ,                                            (4.3.14)

we have, using (4.3.12) and (4.3.13),

    P_β(z) = β_1 z + vz P_β(z) − vz Σ_{n=1}^∞ n^{-1} z^n .                  (4.3.15)

Rearranging,

    P_β(z) = (1 − vz)^{-1} vz [log τ + log(1−z)] ,                          (4.3.16)

which implies that

    β_n = v^n log τ − Σ_{j=1}^{n-1} j^{-1} v^{n-j} ,                        (4.3.4)

as was to be shown.

(ii) Similarly,

    ξ_{n+1} = Σ_{j=0}^∞ (j+n+1)^{-2} C(j+n, n) τ^{n+1} (1−τ)^j ,            (4.3.17)

and

    ξ_{n+1} = n^{-1} v Σ_{j=0}^∞ (j+n)^{-1} [n(j+n)^{-1} − 1] C(j+n−1, n−1) τ^n (1−τ)^j
            = v[ξ_n − n^{-1} β_n] .                                         (4.3.18)
Noticing that, from a result in Davis [8],

    ξ_1 = E{Y²(t) | Y(s) = 1} = Σ_{j=1}^∞ j^{-2} τ (1−τ)^{j−1}
        = (1/6) v {3[cos^{-1}(1−τ)]² − 6π cos^{-1}(1−τ) + 2π²} ,            (4.3.19)

we define the generating function

    P_ξ(z) = Σ_{n=1}^∞ ξ_n z^n .                                            (4.3.20)

It then follows from (4.3.18) that

    P_ξ(z) = ξ_1 z + vz [P_ξ(z) − Σ_{n=1}^∞ n^{-1} β_n z^n] ,               (4.3.21)

which implies that

    ξ_n = v^{n-1} ξ_1 − Σ_{j=1}^{n-1} j^{-1} v^{n-j} β_j .                  (4.3.22)

Upon substitution of (4.3.19) and (4.3.4), this is (4.3.5).

(iii) From (4.3.3),

    η_n = Σ_{j=0}^∞ Σ_{k=0}^∞ [(j+n)(k+j+n)]^{-1}
          × P{X(t+u) = k+j+n | X(t) = j+n} P{X(t) = j+n | X(s) = n} .       (4.3.23)

Then

    η_{n+1} = Σ_{j=1}^∞ Σ_{k=0}^∞ [(j+n)(k+j+n)]^{-1}
          × P{X(t+u) = k+j+n | X(t) = j+n} P{X(t) = j+n | X(s) = n+1}
    = n^{-1} v Σ_{j=0}^∞ Σ_{k=0}^∞ (k+j+n)^{-1} [n(j+n)^{-1} − 1]
      × P{X(t+u) = k+j+n | X(t) = j+n} P{X(t) = j+n | X(s) = n}
    = v[η_n − n^{-1} β*_n] ,                                                (4.3.24)

as was to be shown. This completes the proof of Theorem 4.3.1.

The reason generating functions were not employed to further resolve (4.3.24) is that no closed form could be found for η_n.

It is not difficult to use Theorem 4.3.1 to obtain approximations for the first order sampling properties of Rhodes' estimators. The expressions are tedious, the computing process was extremely time consuming (and hence expensive), and the few numerical results obtained were nonsensical. This is probably due to rounding errors resulting from the great number of calculations involved, or perhaps from the inaccuracy of approximations made in order to obtain these expected values. Second order properties of Rhodes' estimators would involve fourth order moments of {Y(t)}, which are intractable. However, one can perform Monte Carlo experiments to at least compare the estimators for specific processes.

4.4 Monte Carlo Experiments

A computer program was written for the IBM 360 which

(1) simulates a logistic growth process given n, α, ρ and N;

(2) uses this sequence to calculate the maximum likelihood estimates given in 3.3, using the Newton-Raphson iterative procedure;
(3) calculates Σ, the asymptotic variance-covariance matrix for these estimators;

(4) calculates Rhodes' estimates for α and ρ, using the method given in 4.2.

The results of several of the experiments are presented. The format is as follows: first, a table is given which lists the generated logistic sequence and the sequence of expected values. This table is followed by a graph depicting both the generated sequence and the expected sequence. Finally, a second table gives the maximum likelihood estimates and the corresponding asymptotic variance-covariance matrix, using the first third, first two-thirds, and all of the generated sequence, respectively. Rhodes' estimators are also calculated for each case.

Three such experiments are presented, each with a different value of n and α. We have conducted several other experiments, and those presented here are exemplary. Changes in ρ do not significantly affect the results, except that rounding errors in the computations affected the maximum likelihood estimates for small ρ. These cases are practically uninteresting, though, since for small ρ the process quickly approaches an asymptotic value, a characteristic uncommon to most economic and other real sequences.

Note that Rhodes' estimates are unreliable when N is small, whereas the effect of N on the maximum likelihood estimates is negligible. In addition, we could obtain asymptotic confidence intervals for the parameters, using the maximum likelihood estimates and Σ. We also note that the bias in α̂ seems to depend primarily on the relationship
Table 4.1. Generated and expected sequence for n = 40, α = 10.0, and ρ = 0.8. [Columns: i, generated value x(i), expected value m_n(0,i).]
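Sequences like the one summarized in Table 4.1 can be generated without simulating individual birth events: by Theorem 3.2.1 the increment X(t) − X(t−1), given X(t−1), is negative binomial with success probability τ = (1+αρ^t)(1+αρ^{t-1})^{-1}. A sketch of such a generator (numpy's parametrization of negative_binomial — number of failures before the required successes — is assumed; names are mine):

```python
import numpy as np

def simulate_logistic(n, alpha, rho, N, rng=None):
    """Generate X(1), ..., X(N) of the logistic growth process with X(0) = n,
    using the negative binomial form of the one-step transition distribution."""
    rng = np.random.default_rng(rng)
    x = n
    path = []
    for t in range(1, N + 1):
        # success probability for the step t-1 -> t; lies in (0, 1) since rho < 1
        tau = (1.0 + alpha * rho ** t) / (1.0 + alpha * rho ** (t - 1))
        x = x + rng.negative_binomial(x, tau)   # X(t) = X(t-1) + NB(X(t-1), tau)
        path.append(x)
    return path

path = simulate_logistic(40, 10.0, 0.8, 30, rng=1)
```

The expected sequence m_n(0,t) = n(1+α)(1+αρ^t)^{-1} can be tabulated alongside for comparison, as in Table 4.1.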
Fig. 4.1. Generated and expected sequence for n = 40, α = 10.0, and ρ = 0.8. [Plot of X(t) against t, 0 ≤ t ≤ 30; open circles: observed values, filled circles: expected values.]
Table 4.2. Maximum likelihood estimates, Σ, and Rhodes' estimates for the data in Table 4.1 (n = 40, α = 10.0, ρ = 0.8).

    N = 10:  α̂ = 5.909,  ρ̂ = 0.794;  α̂_R = 69.464,  ρ̂_R = 0.873;
             Σ = ( 81.242   0.603   )
                 (  0.603   0.00493 )

    N = 20:  α̂ = 7.134,  ρ̂ = 0.810;  α̂_R = 9.747,  ρ̂_R = 0.851;
             Σ = ( 2.801    0.0170        )
                 ( 0.0170   0.182 × 10⁻³  )

    N = 30:  [entries not recovered]
Table 4.3. Generated and expected sequence for n = 100, α = 5.0, and ρ = 0.8. [Columns: i, generated value x(i), expected value m_n(0,i).]
Fig. 4.2. Generated and expected sequence for n = 100, α = 5.0, and ρ = 0.8. [Plot of X(t) against t; open circles: observed values, filled circles: expected values.]
Table 4.4. Maximum likelihood estimates, Σ, and Rhodes' estimates for the data in Table 4.3 (n = 100, α = 5.0, ρ = 0.8). [Rows for N = 10, 20, 30; entries not recovered.]
Table 4.5. Generated and expected sequence for n = 400, α = 3.0, and ρ = 0.8. [Columns: i, generated value x(i), expected value m_n(0,i).]
Fig. 4.3. Generated and expected sequence for n = 400, α = 3.0, and ρ = 0.8. [Plot of X(t) against t, 0 ≤ t ≤ 30; open circles: observed values, filled circles: expected values.]
Table 4.6. Maximum likelihood estimates, Σ, and Rhodes' estimates for the data in Table 4.5 (n = 400, α = 3.0, ρ = 0.8). [Rows for N = 10, 20, 30; entries not recovered.]
of the generated to the expected sequence. That is, when the generated sequence falls short of the expected sequence, α̂ tends to underestimate α.

We conclude by fitting the logistic model to the population of the conterminous United States. We use the census results at each decade from 1800 to 1960 as the data to which the model is fitted, treating each decade as one unit of time. Using the population given by the 1800 census as X(0) = n, we calculate the maximum likelihood and Rhodes' estimates for α and ρ for two sequences: that made up of the populations from 1810 to 1910 and that of those from 1810 to 1960. We then use the estimates based upon the latter sequence to predict the population of the conterminous United States in 1970 by three methods: (1) that using (3.6.3) and the maximum likelihood estimates, (2) that using (3.6.3) and Rhodes' estimates, and (3) that using the recursive relation (4.2.3) and Rhodes' estimates.

Table 4.7 summarizes the results of this study. The data are given, and then the maximum likelihood estimates, Rhodes' estimates, and an estimate of Σ, the asymptotic variance-covariance matrix for (α̂, ρ̂), are presented for the two aforementioned sequences. We complete the table by giving the predictions for the 1970 population in the order presented above.

Note the rather significant discrepancy in the estimates for the two sequences, illustrating a point made by Granger [14]: the logistic model is rarely useful in making long-range predictions. However, as shown by the prediction of the 1970 population, the model has application when one confines extrapolations to shorter time spans.
Table 4.7. Summary of results of study. [(a) Census populations of the conterminous United States by decade, 1800-1960; (b) maximum likelihood estimates, Rhodes' estimates, and an estimate of Σ for the sequences 1810-1910 and 1810-1960; (c) predictions of the 1970 population by the three methods described in the text.]
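Method (3) above amounts to a one-step extrapolation of the fitted recursion (4.2.3) on the reciprocal scale; schematically (the numerical values below are hypothetical placeholders, not the census figure or the fitted estimates of Table 4.7):

```python
def predict_next_reciprocal(x_n, rho_hat, gamma_hat):
    # one-step extrapolation of (4.2.3): y_{N+1} = rho_hat * y_N + gamma_hat,
    # reported back on the original scale as x_{N+1} = 1 / y_{N+1}
    y_next = rho_hat / x_n + gamma_hat
    return 1.0 / y_next

# hypothetical values for illustration only
x_1960 = 178.0                    # population in millions (placeholder)
rho_hat, gamma_hat = 0.9, 5.0e-4  # placeholder fitted values
print(predict_next_reciprocal(x_1960, rho_hat, gamma_hat))
```

With γ̂ > 0 the extrapolated reciprocals decrease toward γ̂/(1−ρ̂), so repeated application drives the projection toward the fitted asymptote — the behavior that limits the model to short-range extrapolation.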
The extra effort required to calculate the maximum likelihood estimates seems to be worthwhile. One is encouraged to try to apply the methods used for the logistic process to other evolutive models whose parameters are currently estimated without reference to probability structure. Further research into this area is contemplated.
BIBLIOGRAPHY

1. Bartlett, M. S. (1955). An Introduction to Stochastic Processes. Cambridge University Press.
2. Billingsley, P. (1961). Statistical Inference for Markov Processes. University of Chicago Press.
3. Box, G. E. P. and Jenkins, G. M. (1968). Some Recent Advances in Forecasting and Control, I. Applied Statistics 17 91-108.
4. Box, G. E. P. and Jenkins, G. M. (1970). Time Series Analysis (Forecasting and Control). Holden-Day, San Francisco.
5. Chanda, K. C. (1954). A Note on the Consistency and Maxima of the Roots of Likelihood Equations. Biometrika 41 56-61.
6. Cramér, H. (1946). Mathematical Methods of Statistics. Princeton University Press.
7. Cramér, H. and Leadbetter, M. R. (1967). Stationary and Related Stochastic Processes. Wiley, New York.
8. Davis, H. F. (1963). Fourier Series and Orthogonal Functions. Allyn and Bacon, London.
9. Doob, J. L. (1953). Stochastic Processes. Wiley, New York.
10. Durbin, J. (1960). Efficient Estimation of Parameters in Moving-Average Models. Biometrika 46 306-316.
11. Durbin, J. (1960). The Fitting of Time Series Models. Review of the International Statistical Institute 28 233-244.
12. Durbin, J. (1961). Efficient Fitting of Linear Models for Continuous Stationary Time Series from Discrete Data. Bulletin of the International Statistical Institute 38 273-281.
13. Feller, W. (1950). An Introduction to Probability Theory and Its Applications, Vol. 1. Wiley, New York.
14. Granger, C. W. (1964). Spectral Analysis of Economic Time Series. Princeton University Press.
15. Grenander, U. and Rosenblatt, M. (1957). Statistical Analysis of Stationary Time Series. Wiley, New York.
16. Hannan, E. J. (1969). The Estimation of Mixed Moving Average-Autoregressive Systems. Biometrika 56 579-593.
17. Jenkins, G. M. and Watts, D. G. (1968). Spectral Analysis and Its Applications. Holden-Day, San Francisco.
18. Kendall, D. G. (1948). On the Generalized "Birth and Death" Process. Ann. Math. Statist. 19 1-15.
19. Kendall, D. G. (1949). Stochastic Processes and Population Growth. Journ. Roy. Statist. Soc. B 11 230-264.
20. Kolmogorov, A. (1941). Stationary Sequences in Hilbert Space. Bull. Math. Univ. Moscow 2 1-40.
21. Mann, H. B. and Wald, A. (1943). On the Statistical Treatment of Linear Stochastic Difference Equations. Econometrica 11 173-220.
22. Moran, P. A. P. (1951). Estimation Methods for Evolutive Processes. Journ. Roy. Statist. Soc. B 13 141-146.
23. Moran, P. A. P. (1953). Estimation of the Parameters of a Birth and Death Process. Journ. Roy. Statist. Soc. B 15 241-245.
24. Parzen, E. (1962). Stochastic Processes. Holden-Day, San Francisco.
25. Rhodes, E. C. (1940). Population Mathematics, III. Journ. Roy. Statist. Soc. 103 362-387.
26. Tintner, G. (1952). Econometrics. Wiley, New York.
27. Wald, A. (1949). Note on the Consistency of the Maximum Likelihood Estimate. Ann. Math. Statist. 20 595-601.
28. Walker, A. M. (1954). The Asymptotic Distribution of Serial Correlation Coefficients for Autoregressive Processes with Dependent Residuals. Proc. Camb. Phil. Soc. 50 60-64.
29. Walker, A. M. (1960). Some Consequences of Superimposed Error in Time Series Analysis. Biometrika 47 33-43.
30. Walker, A. M. (1961). On Durbin's Formula for the Limiting Generalized Variance of a Sample of Consecutive Observations from a Moving Average Process. Biometrika 48 197-199.
31. Walker, A. M. (1961). Large Sample Estimation of Parameters for Moving Average Models. Biometrika 48 343-357.
32. Walker, A. M. (1962). Large Sample Estimation of Parameters for Autoregressive Processes with Moving Average Residuals. Biometrika 49 117-131.
33. Whittle, P. (1953). Estimation and Information in Stationary Time Series. Arkiv för Matematik 2 423-434.
34. Whittle, P. (1963). Prediction and Regulation. English Universities Press, London.
35. Wold, H. (1954). A Study in the Analysis of Stationary Time Series (2nd Ed.). Almqvist and Wiksell, Uppsala.
36. Yaglom, A. M. (1962). An Introduction to the Theory of Stationary Random Functions. Prentice-Hall, Englewood Cliffs, New Jersey.
BIOGRAPHICAL SKETCH

James Thomas McClave was born on December 16, 1944, in Uhrichsville, Ohio. He was graduated from Wintersville High School in June, 1962. In September of that year he enrolled in Bucknell University, receiving the degree of Bachelor of Science with a major in Physics in June, 1966. Since an interest in the study of Statistics was initiated by Mr. Paul Benson at Bucknell, the writer entered the University of Florida Graduate School in June, 1966. Mr. McClave has worked as a consultant and teaching assistant for the Department of Statistics since that time, simultaneously pursuing his work towards the degree of Doctor of Philosophy.

James Thomas McClave is married to the former Mary Jay Johnson. He is a member of the Institute of Mathematical Statistics.
I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

    Kamal C. Chanda, Chairman
    Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
This dissertation was submitted to the Dean of the College of Arts and Sciences and to the Graduate Council, and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

March, 1971

    Dean, Graduate School

