ON SOME PROBLEMS OF ESTIMATION AND PREDICTION
FOR NONSTATIONARY TIME SERIES
By
JAMES THOMAS McCLAVE
A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF
THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1971
TO MY FATHER, MY MOTHER
AND MY WIFE
FOR THEIR LOVE, UNDERSTANDING
AND FINANCIAL SUPPORT
ACKNOWLEDGMENTS
I wish to express my heartfelt thanks to Dr. K. C. Chanda for
his expert and patient guidance in this effort.
I also want to express my gratitude to Dr. J. G. Saw for
proofreading this dissertation and for his helpful comments; and to
my office partner, Dr. J. Shuster, for many helpful discussions.
Special thanks go to my wife, Mary Jay, for her loving support,
financial and otherwise.
Finally, I wish to thank Mrs. Edna Larrick for the magnificent
job of turning the rough draft I gave her into this typing masterpiece.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER
  I    STATEMENT OF THE PROBLEM
       1.1  Introduction
       1.2  History - Stationary Processes
       1.3  History - Nonstationary Processes
       1.4  Summary of Results
  II   LINEAR STATIONARY MODELS
       2.1  Introduction
       2.2  The Autoregressive Sequence
       2.3  The Moving Average Sequence
       2.4  The Hybrid Model
  III  THE LOGISTIC GROWTH PROCESS
       3.1  Introduction
       3.2  Probability Distribution of {X(t)}
       3.3  Equispaced Observation
       3.4  Continuous Observation in [0,T]
       3.5  Arrival Time Observation
       3.6  Prediction
  IV   AN EMPIRICAL COMPARISON
       4.1  Introduction
       4.2  Rhodes' Estimators
       4.3  Sampling Properties of Rhodes' Estimators
       4.4  Monte Carlo Experiments
BIBLIOGRAPHY
BIOGRAPHICAL SKETCH
LIST OF TABLES

2.1  Tables of Efficiencies for $\hat{\gamma}_0 = \sum_{j=1}^{k} d_j C_j$
4.1  Generated and Expected Sequence for n = 40, a = 10.0, and p = 0.8
4.2  Maximum Likelihood Estimates, E, and Rhodes' Estimates for Data in Table 4.1 for n = 40, a = 10.0, and p = 0.8
4.3  Generated and Expected Sequence for n = 100, a = 5.0, and p = 0.8
4.4  Maximum Likelihood Estimates, E, and Rhodes' Estimates for Data in Table 4.3 for n = 100, a = 5.0, and p = 0.8
4.5  Generated and Expected Sequence for n = 400, a = 3.0, and p = 0.8
4.6  Maximum Likelihood Estimates, E, and Rhodes' Estimates for Data in Table 4.5 for n = 400, a = 3.0, and p = 0.8
4.7  Summary of Results of Study
LIST OF FIGURES

4.1  Generated and Expected Sequence for n = 40, a = 10.0, and p = 0.8
4.2  Generated and Expected Sequence for n = 100, a = 5.0, and p = 0.8
4.3  Generated and Expected Sequence for n = 400, a = 3.0, and p = 0.8
Abstract of Dissertation Presented to the
Graduate Council of the University of Florida in Partial Fulfillment
of the Requirements for the Degree of Doctor of Philosophy
ON SOME PROBLEMS OF ESTIMATION AND PREDICTION
FOR NONSTATIONARY TIME SERIES
By
James Thomas McClave
March, 1971
Chairman: Dr. Kamal C. Chanda
Major Department: Statistics
Many techniques are available for estimating the parameters of
linear stationary time series. The effect of some of these estimators
on the least squares predictor for some future value of the series is
examined. We have obtained approximations for the increase in prediction
error due to parameter estimation in several cases for which no exact
expression could be found.
An efficient estimator for the parameters in a first order
autoregressive-moving average model is developed by making use of a
linear function of the autocorrelations. For more general models we
conclude that efficient estimation is so difficult to attain that first
consideration in many estimation-prediction problems should be given
to ease of calculation.
Numerous unsolved estimation and prediction problems remain
for nonstationary time series. We consider the logistic growth process,
which is used extensively as an economic and population growth model,
in detail. Current estimation procedures for the logistic process'
parameters make no reference to an error structure.
We propose a probability structure consistent with the realistic
properties of the series. We then use this structure to obtain
estimators from three different observational standpoints:
(1) that of observation at equidistant time points,
(2) that of continuous observation, and
(3) that of arrival time observation.
For observation types (1) and (2) we have used a modification
of maximum likelihood procedures to obtain estimators having most of
the usual properties associated with maximum likelihood estimators.
A computer program was written for type (1) to solve the intractable
estimation equations, using the Newton-Raphson procedure. Observation
of arrival times is shown not to lead to any useful estimation
procedures. We also examine the effect of the estimators calculated from
the observations taken at equal time intervals (type (1) above) on the
error of prediction.
The procedure developed for observation type (1) is then compared
by means of example to an estimation procedure developed by Rhodes.
The logistic model is also fitted by each method to the population of
conterminous United States. The estimates are then used to obtain
predictions of the population of conterminous United States in 1970.
We conclude from the results that the effort required to calculate the
maximum likelihood estimates is worthwhile. We further conjecture that
the methods developed may be applicable to other growth processes.
CHAPTER I
STATEMENT OF THE PROBLEM
1.1 Introduction
Consider the following two types of time series:
1. $\{X(t); t \in J_{a,b}\}$, where $J_{a,b}$ is the set of integers
   $\{a, a+1, \ldots, b\}$. This is the discrete time series.
2. $\{X(t); t \in I_{a,b}\}$, where $I_{a,b}$ is the interval (open or closed)
   between the real numbers a and b. This is the continuous
   time series.
Suppose an observation is made on a given series, either obtaining
an observation set of the kind $\{x(S), x(S+1), \ldots, x(T)\}$, with S and
T integers such that $J_{S,T} \subset J_{a,b}$, from the first or second type;
or of the kind $\{x(t); t \in I_{S,T}\}$, with $I_{S,T} \subset I_{a,b}$, from the second type.
The problem to be considered is that of finding the least squares predictor
of $X(T+m)$ (with m an integer if the series is discrete, and $T+m \le b$)
based upon the observation. Thus, we want to find that function of the
observation set, say $\hat{x}_T(m)$, which minimizes the mean square error of
prediction.
The first step toward the solution of this problem is to construct
a parametric representation of the series. One must make use of all
available information about $\{X(t)\}$ in order to place the problem in a
parametric framework. This parametrization may vary from specification of the first
and/or second moments of the series to complete specification of the
probability distribution at each point t.
This study of the series' genesis will usually place the process
into one of two subclasses:
(a) Weakly stationary, hereafter referred to as stationary,
    series; that is to say, those whose first two moments are,
    for every t,

    E\{X(t)\} = \mu,    (1.1.1)

    independent of t; and

    E\{X(t+s)X(t)\} = \gamma_s,    (1.1.2)

    a function of s only.

(b) Nonstationary series, the complement of subclass (a).
If the series satisfies the conditions of stationarity, the
prediction problem is simplified. The minimum mean square error
predictor depends on the first two moments of the time series. Since for
stationary processes the second moment depends only upon the "lag" or
time difference of the random variables, and, since the first moment is
independent of time, all sequences of observations which are of equal
lengths yield the same information about the series' first two moments,
no matter when in the series' development the observations are taken.
This second order homogeneity simplifies the problem to the extent that
all further parametrization pertains to the description of $\mu$ and $\gamma_s$,
from which the predictor will be calculated.
On the other hand, if the series is nonstationary, the problem
is more complex because of the nature of this subclass. We will divide
the nonstationary subclass into two categories:
(i) those for which a simple transformation exists which
    transforms the process into a stationary time series, and
(ii) those for which either no such transformation exists, or at
least no simple transformation can be found.
As to the first category, the nature of the transformation
depends, of course, on the reason for the series' nonstationarity.
As a simple example, suppose $\{X(t)\}$ is a continuous time process with

    E\{X(t)\} = m(t) = \theta + \beta t,    (1.1.3)

    E\{X(t)X(t+s)\} = \gamma_s.    (1.1.4)

Then the process $\{Y(t)\}$, formed by taking

    Y(t) = X(t) - m(t),    (1.1.5)

will be stationary.

This particular kind of transformation is popularly referred to
as removal of the trend from $\{X(t)\}$; it can be accomplished for simple
linear trends like that given by (1.1.3), or for more complex seasonal
or harmonic trends common to many economic time series. The end result
is the same; this further parametrization has permitted us to reduce the
problem to that of stationary series prediction, for which solutions
are known.
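As a concrete illustration of the trend-removal transformation (1.1.5), the following Python sketch fits the linear trend of (1.1.3) by ordinary least squares and subtracts it. The series, its parameter values, and the AR(1) disturbance are hypothetical choices made only for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated series with linear trend m(t) = theta + beta*t plus a stationary
# AR(1) disturbance; theta, beta, and the AR coefficient are illustrative.
t = np.arange(200.0)
theta, beta = 5.0, 0.3
noise = np.zeros(200)
for i in range(1, 200):
    noise[i] = 0.7 * noise[i - 1] + rng.normal()
x = theta + beta * t + noise

# Estimate the trend by least squares and remove it, as in (1.1.5):
# y(t) = x(t) - m_hat(t) should then be (approximately) stationary.
b_hat, a_hat = np.polyfit(t, x, 1)   # fitted slope and intercept
y = x - (a_hat + b_hat * t)

# The detrended series has roughly constant mean over time.
first, second = y[:100].mean(), y[100:].mean()
```

The two half-sample means of the detrended series are close to one another, which is the second order homogeneity that the stationary theory then exploits.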
However, it is perhaps within the second category that one
encounters the most complex prediction problems in time series. Without
the homogeneity of the first two moments, the nonstationary process'
analysis will certainly depend on when the observation is taken; and the
underlying probability structure of the series can be extremely
complicated. In short, general methods cannot be used to solve the prediction
problem for nonstationary processes with satisfactory results. Each
series must be considered individually, its evolution thoroughly
researched, a parametrization made, and then an optimal prediction
procedure developed. One can hope that this systematic method will apply to
a general group of processes, but it is extremely doubtful that the actual
procedure developed will apply to any but the specific series being
investigated.
1.2 History - Stationary Processes
The literature on the prediction problem for the stationary time
series is voluminous. That the area is extremely attractive follows from
the above; most results have wide application within the subclass of
stationary processes, and the chances that a problem is mathematically
tractable are enhanced by stationarity. In addition, of course,
stationary processes have great practical significance.
Problems of least squares prediction for stochastic processes
were first considered by Kolmogorov [20], using Hilbert space geometry.
This treatment has been adapted to the stationary processes by several
authors, including Doob [9], Yaglom [36], and Parzen [24]. Hence the
prediction problem is essentially solved for all stationary time series;
the known general methods need only be applied to the specific stationary
process being considered. Many authors have contributed to this end,
among them Doob [9], Yaglom [36], Whittle [33,34], Box and Jenkins [3,4],
and Jenkins and Watts [17].
The particular parametrization proposed often results in an
estimation problem which must be solved prior to making the prediction.
This is illustrated by three of the most commonly hypothesized
discrete time series: the autoregressive, moving average, and mixed
autoregressive-moving average (hybrid) models. Each receives wide
application in engineering and economic research, and each entails parameter
estimation. These estimation problems have been considered by Mann and
Wald [21], Whittle [33], Durbin [10,11,12], Walker [28,29,30,31],
Hannan [16], and Box and Jenkins [1]. Some of the estimation procedures
are rather complicated to apply; but, for the most part, satisfactory
estimation methods exist for these linear stationary models.
The literature is nearly devoid of research exploring the effect
of parameter estimation on the error of least squares predictors. Even
though the error increase due to parameter estimation will usually be of
a smaller order than the mean square error, the precise nature of this
increase needs to be studied. Box and Jenkins [4] do present a brief
discussion of this problem for one simple time series model, but to my
knowledge the subject has not been mentioned elsewhere. This problem
will be considered at some length in Chapter II.
1.3 History - Nonstationary Processes
All the work on the prediction problem for nonstationary time
series is divided into two distinct groups: that which deals with trend
removal in order to make the process stationary, and that which does not.
We shall consider only the latter in this dissertation; the interested
reader may consult Grenander and Rosenblatt [15] and Box and Jenkins [1]
for a complete treatment of trend removal.
Kolmogorov's Hilbert space solution to the prediction problem
is applicable to nonstationary processes. The solution depends mainly
on the second order properties of the process. Since the second moments
are inherently more complex for nonstationary processes than those for
stationary processes, the result of the application of Kolmogorov's
method to a nonstationary time series usually does not have general
application within the class of nonstationary processes. For this
reason one finds that most of the estimation and prediction research in
this area has dealt with specific process types; for example, queuing
processes, renewal processes, birth-death processes, and growth
(evolutive) processes. The list is not exhaustive; each class has
received attention in the literature, much of it devoted to the
estimation and prediction problems.
Specifically, considering the birth-death and growth processes,
both usually Markov processes, most of the research has been
concentrated on homogeneous processes; that is, those with transition
probabilities independent of time. Feller [13] gives a rather complete
introduction to the birth-death processes, defining their underlying
probability structure. Then David Kendall [18] uses generating functions
to obtain solutions for the probability distribution of these series at
any time t. Moran [23] partly solves the problem of estimation for
homogeneous birth-death processes, and both Moran [22] and Kendall [19]
propose several methods for estimating the parameters of the transition
probabilities of some simple nonhomogeneous birth-death and growth
processes.
Since the publication of these papers, research in this area has
been sparse at best; and the prediction problem has received
surprisingly little attention from statisticians. Biologists and economists
(Rhodes [25], Tintner [26], and Granger [14]) have taken some interest in
this area, since it includes prediction of animal population size and
economic forecasting. Unfortunately, these problems are often treated
with little reference to underlying probability structures. The results
rely more on intuition than mathematics, and hence the predictions are
often unsatisfactory.
An example of this kind of treatment is given by Rhodes [25].
Rhodes considers the problem of estimating human population size without
using any probabilistic arguments. His early paper was an indication of
things to come. In all fairness, though, early researchers had a
legitimate excuse for steering away from these arguments, for they often make
prediction problems for nonhomogeneous Markov processes mathematically
intractable. However, in this age of the electronic computer, many
formerly insoluble problems can be treated iteratively and solutions
obtained.
1.4 Summary of Results
The results are divided into two distinct categories: first, in
Chapter II the effect of estimation on the error of prediction for linear
stationary time series is considered. For several models an exact
expression for the error increase due to parameter estimation is obtained; in
many cases for which this is not possible, approximations for this
increase are given. For the mixed autoregressivemoving average model,
a new estimation procedure is examined with respect to the prediction
problem.
In Chapter III the logistic growth process, a nonstationary,
Markov time series, is considered in detail. A realistic probabilistic
structure is assigned to this process, and then the parameters involved
are estimated from several different observational standpoints. Finally,
the prediction problem is considered for this process, and the effect
of parameter estimation on the mean square error of prediction is
examined.
The methods developed for the logistic process may be applicable
to other nonstationary processes. Nevertheless, the logistic process
has application in describing animal population growth and certain
economic growth patterns, so that the results may be useful in themselves.
Chapter IV is devoted to a comparison of an estimation procedure
(for the logistic process) derived by Rhodes [25] with one of those
derived in this paper. The analytical intractability of the estimators'
sampling properties calls for numerical comparison, so that conclusions
are tentative. The purpose of this Monte Carlo study is to determine
whether the extra effort which the probabilistic methods require is
worthwhile, or whether one should continue to estimate the parameters,
using methods like those proposed by Rhodes; that is to say, those which
rely mainly on intuitive appeal and ease of calculation.
CHAPTER II
LINEAR STATIONARY MODELS
2.1 Introduction
Many volumes have been written about estimation and prediction
problems for linear stationary time series, but there is little in the
literature which considers the effect of estimation upon prediction.
That is, the increase in the mean square error of prediction due to
parameter estimation has not been examined.
Knowledge about the increase in prediction error for various
predictors may be useful. If the increase is small for several differ
ent estimators, one might choose the estimator easiest to calculate.
On the other hand, one would spare no effort in finding that estimator
which least increased the error of prediction if that increase were
known to be significant.
Our purpose in this chapter is to consider this estimation-prediction
problem for the three basic linear stationary time series:
autoregressive, moving average, and a mixture of these (hybrid). Only
discrete time series will be examined; since the stationarity of the
process implies that the mean is mathematically independent of time,
this mean will be assumed known and equal to zero. All results remain
asymptotically true in the case of an unknown mean for the series
formed by subtracting the arithmetic mean from each of the observed
values. To pursue this would be to stray from the objective; the
interested reader is referred to Bartlett [1] and Box and Jenkins [1].
Finally, the sequence $\{Z_t; t = 0, \pm 1, \pm 2, \ldots\}$ will be used throughout
to denote a stationary series whose first two moments are the following:

    E\{Z_t\} = 0,    (2.1.1)

    E\{Z_t Z_{t-j}\} = \sigma_Z^2 if j = 0;  0 if j \neq 0.    (2.1.2)
From time to time comment will be made on the case where $Z_t$ is normally
distributed. But unless specified, no distributional assumptions are
made.
2.2 The Autoregressive Sequence
If $\{X_t; t \in J_{-\infty,\infty}\}$ is a stationary time series satisfying the
relationship

    X_t + \alpha_1 X_{t-1} + \cdots + \alpha_p X_{t-p} = Z_t    (2.2.1)

for every t, then $\{X_t\}$ is said to be a pth order autoregressive sequence.

The problem of estimating the set of parameters $\{\alpha_j; j \in J_{1,p}\}$
is analogous to the classical regression problem. Defining $\gamma_j$ as the
j-step covariance of $\{X_t\}$ and $\rho_j$ as the j-step correlation, that is
to say

    \gamma_j = E\{X_{t+j} X_t\}    (2.2.2)

and

    \rho_j = \gamma_j \gamma_0^{-1},    (2.2.3)

we get

    \sum_{j=1}^{p} \alpha_j \gamma_{j-r} = -\gamma_r    (2.2.4)

for $r \in J_{1,p}$ by multiplying (2.2.1) by $X_{t-r}$ and taking expected values.
We use the fact that $Z_t$ and $X_{t-r}$ are uncorrelated when $r > 0$, which
will be shown later. Equation (2.2.4) gives rise to the estimating
equations

    \sum_{j=1}^{p} \hat{\alpha}_j C_{j-r} = -C_r    (2.2.5)

for $r \in J_{1,p}$, where

    C_j = N^{-1} \sum_{t=1}^{N-|j|} x_t x_{t+|j|}    (2.2.6)

is the sample covariance function for the observed sequence
$\{x_t; t \in J_{1,N}\}$. Mann and Wald [21] have shown that the estimators
$\{\hat{\alpha}_j; j \in J_{1,p}\}$ given by (2.2.5) are asymptotically normally distributed
with null mean and variance-covariance matrix $(N\gamma_0)^{-1}\sigma_Z^2\Sigma^{-1}$, where
$\Sigma$ is a $(p \times p)$ matrix whose element in the (i,j) cell is

    (\Sigma)_{ij} = \rho_{i-j}.    (2.2.7)

The vector of estimates whose transpose is $\hat{\alpha}^T = (\hat{\alpha}_1, \hat{\alpha}_2, \ldots, \hat{\alpha}_p)$ is
shown to be the maximum likelihood estimator of $\alpha^T = (\alpha_1, \alpha_2, \ldots, \alpha_p)$ if
the distribution of $Z_t$ is assumed normal for each t.
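The estimating equations (2.2.5)-(2.2.6) can be carried out numerically, as in the following Python sketch. The AR(2) coefficients and sample size are illustrative values chosen for the example, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(1)

# AR(2) in the sign convention of (2.2.1): X_t + a1*X_{t-1} + a2*X_{t-2} = Z_t.
# Illustrative true values (characteristic roots 0.5 and 0.6, so stationary).
a1, a2 = -1.1, 0.3
N = 20000
x = np.zeros(N)
for t in range(2, N):
    x[t] = -a1 * x[t - 1] - a2 * x[t - 2] + rng.normal()

def C(j, x):
    """Sample covariance (2.2.6): C_j = N^{-1} sum_{t=1}^{N-|j|} x_t x_{t+|j|}."""
    j = abs(j)
    n = len(x)
    return x[: n - j] @ x[j:] / n

# Estimating equations (2.2.5): sum_j alpha_hat_j C_{j-r} = -C_r, r = 1..p.
p = 2
A = np.array([[C(j - r, x) for j in range(1, p + 1)] for r in range(1, p + 1)])
b = -np.array([C(r, x) for r in range(1, p + 1)])
alpha_hat = np.linalg.solve(A, b)
```

For a long simulated realization, `alpha_hat` recovers the true coefficients to within sampling error of order $N^{-1/2}$, as the Mann-Wald result indicates.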
In order to predict the value of $X_{N+m}$ using the observations
$x_1, x_2, \ldots, x_N$, we will use the function which minimizes the mean square
error of prediction. That is, the optimal predictor is $\hat{x}_N^{(1)}(m)$ which
minimizes

    MSE(\hat{x}_N^{(1)}(m)) = E\{[X_{N+m} - \hat{x}_N^{(1)}(m)]^2\}.    (2.2.8)

But this implies that

    \hat{x}_N^{(1)}(m) = E\{X_{N+m} \mid X_j = x_j;\ j \in J_{1,N}\}.    (2.2.9)

Substitution of (2.2.1) into (2.2.9) yields

    \hat{x}_N^{(1)}(m) = E\{Z_{N+m} - \sum_{j=1}^{p}\alpha_j X_{N+m-j} \mid X_j = x_j;\ j \in J_{1,N}\}
                       = -\sum_{j=1}^{p} \alpha_j \hat{x}_N^{(1)}(m-j),    (2.2.10)

where

    \hat{x}_N^{(1)}(j) = E\{X_{N+j} \mid X_k = x_k;\ k \in J_{1,N}\} if j > 0;
                         x_{N+j} if j \le 0.    (2.2.11)

Thus one can calculate the m-step predictor, $\hat{x}_N^{(1)}(m)$, using the recursive
relation (2.2.10).
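The recursion (2.2.10)-(2.2.11) translates directly into code. The following Python sketch is a minimal implementation; the function and variable names are ours, not the dissertation's:

```python
def predict(alpha, x, m):
    """m-step least squares predictor for the AR model (2.2.1),
    X_t + alpha_1 X_{t-1} + ... + alpha_p X_{t-p} = Z_t,
    via the recursion (2.2.10)-(2.2.11):
      xhat_N(j) = x_{N+j}                       for j <= 0 (observed values),
      xhat_N(j) = -sum_k alpha_k xhat_N(j - k)  for j > 0.
    `x` must hold at least p observed values, ordered oldest to newest.
    """
    p = len(alpha)
    hist = list(x[-p:])                  # xhat_N(j) for j = -p+1, ..., 0
    for _ in range(m):
        # alpha_1 pairs with the most recent value, alpha_p with the oldest.
        nxt = -sum(a * h for a, h in zip(alpha, reversed(hist[-p:])))
        hist.append(nxt)
    return hist[-1]
```

For the first order case with $\alpha_1 = -\alpha$, the recursion reproduces the familiar $\hat{x}_N^{(1)}(m) = \alpha^m x_N$ of (2.2.56) below.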
This, however, requires that the autoregressive parameters be
known. When they are not, we may logically use

    \hat{x}_N^{(2)}(m) = -\sum_{j=1}^{p} \hat{\alpha}_j \hat{x}_N^{(2)}(m-j).    (2.2.12)

To be determined is the increase in the error of prediction caused by
using $\hat{x}_N^{(2)}(m)$ instead of $\hat{x}_N^{(1)}(m)$.
Case 1: p = m = 1

For the sake of simplicity, let $\alpha = -\alpha_1$, so that $\{X_t\}$ satisfies
the relationship

    X_t - \alpha X_{t-1} = Z_t    (2.2.13)

for all t. Using the Mann-Wald estimate for $\alpha$, one has

    \hat{\alpha} = C_1 C_0^{-1}    (2.2.14)

and

    Var(\hat{\alpha}) \approx \sigma_Z^2 (N\gamma_0)^{-1}.    (2.2.15)

Rewriting (2.2.13) and using the fact that the stationarity of $\{X_t\}$
implies $|\alpha| < 1$, one obtains

    X_t = \alpha^2 X_{t-2} + Z_t + \alpha Z_{t-1}
        = \cdots
        = \alpha^k X_{t-k} + \sum_{j=0}^{k-1} \alpha^j Z_{t-j}    (2.2.16)

for any positive integer k. We can extend this to the limit in the
mean square sense as $k \to \infty$. That is, one can write

    X_t = \sum_{j=0}^{\infty} \alpha^j Z_{t-j},    (2.2.17)

since

    E\{X_t^2\} = \sigma_Z^2 \sum_{j=0}^{\infty} \alpha^{2j} = (1-\alpha^2)^{-1}\sigma_Z^2    (2.2.18)

is finite. Note that the left side of (2.2.18) is, by definition, $\gamma_0$,
and

    \gamma_k = E\{X_{t+k} X_t\}
             = E\{(\sum_{j=0}^{\infty}\alpha^j Z_{t+k-j})(\sum_{j=0}^{\infty}\alpha^j Z_{t-j})\}
             = \alpha^k \gamma_0.    (2.2.19)

Thus

    \rho_k = \gamma_k \gamma_0^{-1} = \alpha^k.    (2.2.20)

In (2.2.19) and (2.2.20) we assume k is a positive integer; that $\gamma_{-k} = \gamma_k$
is easily shown.
The minimum mean square error predictor of $X_{N+1}$ is given by
setting p = m = 1 in (2.2.10), whereupon

    \hat{x}_N^{(1)}(1) = \alpha x_N.    (2.2.21)

The prediction error is given by

    MSE_1 = E\{[X_{N+1} - \hat{x}_N^{(1)}(1)]^2\}
          = E\{(X_{N+1} - \alpha X_N)^2\}
          = \sigma_Z^2.    (2.2.22)

If $\alpha$ is unknown, the predictor corresponding to (2.2.12) is

    \hat{x}_N^{(2)}(1) = \hat{\alpha} x_N.    (2.2.23)

Using the fact that $[X_{N+1} - \hat{x}_N^{(1)}(1)]$ is uncorrelated with the sequence
$\{X_j; j \in J_{1,N}\}$, one has

    MSE_2 = E\{[X_{N+1} - \hat{x}_N^{(2)}(1)]^2\}
          = MSE_1 + E\{X_N^2(\hat{\alpha} - \alpha)^2\}.    (2.2.24)

Thus the increase in the error of prediction due to estimating $\alpha$ is

    MSE_2 - MSE_1 = E\{X_N^2(\hat{\alpha} - \alpha)^2\}.    (2.2.25)

Suppose the estimator $\hat{\alpha}$ is calculated using only the first k
observations in the sequence $\{x_j; j \in J_{1,N}\}$, where k << N; that is to say,
we form the sample covariances in (2.2.6), using only k observations.
Since $\rho_j$ decreases exponentially, the $\{X_j; j \in J_{1,k}\}$ and $X_N$ are virtually
uncorrelated, and hence so are $\hat{\alpha}$ and $X_N$. A first approximation for
(2.2.25) is thus

    MSE_2 - MSE_1 \approx E\{X_N^2\} E\{(\hat{\alpha}-\alpha)^2\}
                  = \gamma_0 k^{-1}(1-\alpha^2) = k^{-1}\sigma_Z^2.    (2.2.26)

Box and Jenkins [4] present an argument similar to the above. The
degree to which the approximation (2.2.26) holds is impossible to
ascertain from these arguments.
To obtain a more exact expression for (2.2.25), first define

    \delta_j = C_j - \gamma_j    (2.2.27)

for $j \in J_{0,\infty}$. Then we can write

    \hat{\alpha} - \alpha = C_1 C_0^{-1} - \alpha
                          = (\delta_1 - \alpha\delta_0)(\delta_0 + \gamma_0)^{-1},    (2.2.28)

using the fact that $\gamma_1 - \alpha\gamma_0 = 0$. Then

    E\{(\hat{\alpha}-\alpha)^2\} = E\{(\delta_1 - \alpha\delta_0)^2(\delta_0 + \gamma_0)^{-2}\}
        = E\{\gamma_0^{-2}(\delta_1 - \alpha\delta_0)^2(1 - 2\delta_0\gamma_0^{-1} + 3(\delta_0\gamma_0^{-1})^2 - \cdots)\}
        = E\{\gamma_0^{-2}(\delta_1 - \alpha\delta_0)^2\} + o(N^{-1}),    (2.2.29)

since $\delta_0$ and $\delta_1$ have sampling errors whose order of magnitude is $N^{-1/2}$
(Bartlett [1]). Thus, retaining terms of order $N^{-1}$ and higher, (2.2.25)
becomes

    E\{X_N^2(\hat{\alpha}-\alpha)^2\} = \gamma_0^{-2} E\{X_N^2(\delta_1 - \alpha\delta_0)^2\} + o(N^{-1}).    (2.2.30)
We now state

Theorem 2.2.1:

If $\{X_t; t \in J_{-\infty,\infty}\}$ is an autoregressive sequence of order one,
then

    MSE_2 - MSE_1 = N^{-1}\{\kappa_Z(2) - (1+\alpha^2)^{-1}\alpha^2\kappa_Z(4)\kappa_Z^{-1}(2)\} + o(N^{-1}),    (2.2.31)

where $\kappa_Z(r)$ is the rth cumulant of the distribution of $\{Z_t\}$.

Proof:
Referring to (2.2.30), we must evaluate

    (i)   E\{X_N^2 \delta_0^2\},
    (ii)  E\{X_N^2 \delta_1^2\}, and
    (iii) E\{X_N^2 \delta_0 \delta_1\}.

(i) We will evaluate the first term in detail in order to make
the methods clear. First, note that

    E\{X_N^2 \delta_0^2\} = E\{X_N^2 (C_0 - \gamma_0)^2\}
        = E\{X_N^2 (N^{-1}\sum_{t=1}^{N} X_t^2 - \gamma_0)^2\}
        = N^{-2} E\{X_N^2 (\sum_{t=1}^{N} X_t^2)^2\}
          - 2N^{-1}\gamma_0 E\{X_N^2 \sum_{t=1}^{N} X_t^2\}
          + \gamma_0^2 E\{X_N^2\}.    (2.2.32)

Now

    E\{X_N^2 (\sum_{t=1}^{N} X_t^2)^2\} = \sum_{t=1}^{N} E\{X_N^2 X_t^4\}
        + 2\sum_{t=2}^{N}\sum_{s=1}^{t-1} E\{X_N^2 X_t^2 X_s^2\}.    (2.2.33)
From (2.2.16), for t > s,

    X_t = \alpha^{t-s} X_s + \sum_{k=0}^{t-s-1} \alpha^k Z_{t-k}
        = \alpha^{t-s} X_s + Y_{s,t},    (2.2.34)

where

    Y_{s,t} = \sum_{k=0}^{t-s-1} \alpha^k Z_{t-k}.    (2.2.35)

We now write

    \nu_{s,t}^{(k)} = E\{Y_{s,t}^k\},    (2.2.36)

defining

    \nu_{s,s}^{(k)} = Y_{s,s} = 0    (2.2.37)

for all $k \in J_{1,\infty}$. Hence we obtain from (2.2.34), for $t \le N$,

    E\{X_N^2 X_t^4\} = E\{E[X_N^2 X_t^4 \mid X_t]\}
        = \alpha^{2(N-t)} E\{X_t^6\} + \nu_{t,N}^{(2)} E\{X_t^4\}.    (2.2.38)

Furthermore, for $s \le t \le N$,

    E\{X_N^2 X_t^2 X_s^2\} = E\{E[X_N^2 X_t^2 X_s^2 \mid X_t, X_s]\}
        = \alpha^{2(N-t)} E\{[\alpha^{t-s} X_s + Y_{s,t}]^4 X_s^2\}
          + \nu_{t,N}^{(2)} E\{[\alpha^{t-s} X_s + Y_{s,t}]^2 X_s^2\}
        = \alpha^{2(N-t)+4(t-s)} E\{X_s^6\}
          + [6\alpha^{2(N-t)+2(t-s)}\nu_{s,t}^{(2)} + \alpha^{2(t-s)}\nu_{t,N}^{(2)}] E\{X_s^4\}
          + 4\alpha^{2(N-t)+(t-s)}\nu_{s,t}^{(3)} E\{X_s^3\}
          + [\alpha^{2(N-t)}\nu_{s,t}^{(4)} + \nu_{s,t}^{(2)}\nu_{t,N}^{(2)}] E\{X_s^2\}.    (2.2.39)
Before we can write (2.2.32) in terms of the cumulants of $\{Z_t\}$, we need
to note that the rth cumulant of the $\{X_t\}$ process is, using (2.2.17),

    \kappa_X(r) = (1-\alpha^r)^{-1}\kappa_Z(r),    (2.2.40)

and hence that

    E\{X_s^6\} = (1-\alpha^6)^{-1}\kappa_Z(6) + 15[(1-\alpha^4)(1-\alpha^2)]^{-1}\kappa_Z(4)\kappa_Z(2)
                 + 10(1-\alpha^3)^{-2}\kappa_Z^2(3) + 15(1-\alpha^2)^{-3}\kappa_Z^3(2),    (2.2.41)

    E\{X_s^4\} = (1-\alpha^4)^{-1}\kappa_Z(4) + 3(1-\alpha^2)^{-2}\kappa_Z^2(2),    (2.2.42)

    E\{X_s^3\} = (1-\alpha^3)^{-1}\kappa_Z(3),    (2.2.43)

and

    E\{X_s^2\} = (1-\alpha^2)^{-1}\kappa_Z(2).    (2.2.44)

We will also need, from (2.2.35) and (2.2.36),

    \nu_{s,t}^{(4)} = \sum_{j=0}^{t-s-1}\alpha^{4j}[\kappa_Z(4) + 3\kappa_Z^2(2)]
                      + 6\sum_{i=1}^{t-s-1}\sum_{j=0}^{i-1}\alpha^{2(i+j)}\kappa_Z^2(2)
                    = (1-\alpha^4)^{-1}(1-\alpha^{4(t-s)})\kappa_Z(4)
                      + 3(1-\alpha^2)^{-2}(1-\alpha^{2(t-s)})^2\kappa_Z^2(2),    (2.2.45)

    \nu_{s,t}^{(3)} = \sum_{j=0}^{t-s-1}\alpha^{3j}\kappa_Z(3)
                    = (1-\alpha^3)^{-1}(1-\alpha^{3(t-s)})\kappa_Z(3),    (2.2.46)

    \nu_{s,t}^{(2)} = \sum_{j=0}^{t-s-1}\alpha^{2j}\kappa_Z(2)
                    = (1-\alpha^2)^{-1}(1-\alpha^{2(t-s)})\kappa_Z(2).    (2.2.47)
We are now prepared to obtain (2.2.39) in terms of the cumulants of
$\{Z_t\}$; the result of substitution of (2.2.41)-(2.2.47) into (2.2.39) is

    E\{X_N^2 X_t^2 X_s^2\} = (1-\alpha^6)^{-1}\alpha^{2(N-s)+2(t-s)}\kappa_Z(6)
        + [(1-\alpha^4)(1-\alpha^2)]^{-1}[8\alpha^{2(N-s)+2(t-s)} + 5\alpha^{2(N-s)}
          + \alpha^{2(N-t)} + \alpha^{2(t-s)}]\kappa_Z(4)\kappa_Z(2)
        + (1-\alpha^3)^{-2}[6\alpha^{2(N-s)+2(t-s)} + 4\alpha^{2N-s-t}]\kappa_Z^2(3)
        + (1-\alpha^2)^{-3}[1 + 10\alpha^{2(N-s)} + 2\alpha^{2(N-t)}
          + 2\alpha^{2(t-s)}]\kappa_Z^3(2).    (2.2.48)
We now substitute (2.2.48) into (2.2.33) to get

    E\{X_N^2(\sum_{t=1}^{N} X_t^2)^2\}
        = (1-\alpha^6)^{-1}\kappa_Z(6)[\sum_{t=0}^{N-1}\alpha^{2t}
            + 2\sum_{t=1}^{N-1}\sum_{s=1}^{t}\alpha^{2s+2t}]
        + [(1-\alpha^4)(1-\alpha^2)]^{-1}\kappa_Z(4)\kappa_Z(2)\{N
            + 16\sum_{t=1}^{N-1}\sum_{s=1}^{t}\alpha^{2s+2t}
            + \sum_{t=0}^{N-2}2(N-t-1)\alpha^{2t}
            + \sum_{t=1}^{N-1}(2N+8t)\alpha^{2t}
            + 14\sum_{t=0}^{N-1}\alpha^{2t}\}
        + (1-\alpha^3)^{-2}\kappa_Z^2(3)\{10\sum_{t=0}^{N-1}\alpha^{2t}
            + 12\sum_{t=1}^{N-1}\sum_{s=1}^{t}\alpha^{2s+2t}
            + 8\sum_{t=1}^{N-1}\sum_{s=0}^{t-1}\alpha^{t+s}\}
        + (1-\alpha^2)^{-3}\kappa_Z^3(2)\{N^2 + 2N
            + 12\sum_{t=0}^{N-1}\alpha^{2t}
            + \sum_{t=0}^{N-2}4(N-t-1)\alpha^{2t}
            + \sum_{t=1}^{N-1}(4N+16t)\alpha^{2t}\}.    (2.2.49)
The second term on the right side of (2.2.32) is similarly evaluated, to
obtain

    E\{X_N^2 \sum_{t=1}^{N} X_t^2\} = \sum_{t=1}^{N} E\{X_N^2 X_t^2\}
        = (1-\alpha^4)^{-1}\kappa_Z(4)\sum_{t=0}^{N-1}\alpha^{2t}
          + (1-\alpha^2)^{-2}\kappa_Z^2(2)[N + 2\sum_{t=0}^{N-1}\alpha^{2t}].    (2.2.50)
Putting (2.2.49) and (2.2.50) into (2.2.32) yields

    E\{X_N^2\delta_0^2\} = N^{-1}\{[(1-\alpha^4)(1-\alpha^2)^2]^{-1}(1+\alpha^2)\kappa_Z(4)\kappa_Z(2)
        + 2(1-\alpha^2)^{-4}(1+\alpha^2)\kappa_Z^3(2)\} + o(N^{-1}).    (2.2.51)

(ii) We proceed in the manner described above to obtain

    E\{X_N^2\delta_1^2\} = N^{-1}\{2[(1-\alpha^4)(1-\alpha^2)^2]^{-1}\alpha^4\kappa_Z(4)\kappa_Z(2)
        + (1-\alpha^2)^{-4}(1+4\alpha^2-\alpha^4)\kappa_Z^3(2)\} + o(N^{-1}).    (2.2.52)

(iii) Similarly, we find

    E\{X_N^2\delta_0\delta_1\} = N^{-1}\{[(1-\alpha^4)(1-\alpha^2)^2]^{-1}\alpha(1+\alpha^2)\kappa_Z(4)\kappa_Z(2)
        + 4(1-\alpha^2)^{-4}\alpha\kappa_Z^3(2)\} + o(N^{-1}).    (2.2.53)

Finally, from (2.2.30) and (2.2.51)-(2.2.53),

    MSE_2 - MSE_1 = (1-\alpha^2)^2\kappa_Z^{-2}(2)[E\{X_N^2\delta_1^2\}
        - 2\alpha E\{X_N^2\delta_1\delta_0\} + \alpha^2 E\{X_N^2\delta_0^2\}]
        = N^{-1}\{\kappa_Z(2) - (1+\alpha^2)^{-1}\alpha^2\kappa_Z(4)\kappa_Z^{-1}(2)\} + o(N^{-1}),    ((2.2.31))

which completes the proof of the theorem.

Note that if $\{Z_t\}$ is a normal process, $\kappa_Z(4)$ is zero, so that

    MSE_2 - MSE_1 = N^{-1}\kappa_Z(2) + o(N^{-1})
                  = N^{-1}\sigma_Z^2 + o(N^{-1}).    (2.2.54)

Thus we have the rather remarkable result that, when $\{Z_t\}$ is a normal
process, $X_N$ and $\hat{\alpha}$ are uncorrelated when terms of lower order than $N^{-1}$
are ignored. That is,

    MSE_2 - MSE_1 = E\{X_N^2\} E\{(\hat{\alpha}-\alpha)^2\} + o(N^{-1}).    (2.2.55)
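Result (2.2.54) lends itself to a Monte Carlo check. The following Python sketch, with illustrative values of $\alpha$, N, and the replication count, estimates the sample analogue of the error increase (2.2.25) for normal $Z_t$ and compares it with $\sigma_Z^2/N$:

```python
import numpy as np

rng = np.random.default_rng(2)

alpha, N, reps = 0.6, 200, 20000

# Simulate `reps` independent AR(1) paths X_t = alpha*X_{t-1} + Z_t, as in
# (2.2.13), with Z_t standard normal, started from the stationary distribution.
x = np.empty((reps, N))
x[:, 0] = rng.normal(size=reps) / np.sqrt(1 - alpha**2)
for t in range(1, N):
    x[:, t] = alpha * x[:, t - 1] + rng.normal(size=reps)

# Moment estimator (2.2.14), alpha_hat = C_1 / C_0, computed per path.
c0 = np.sum(x * x, axis=1) / N
c1 = np.sum(x[:, :-1] * x[:, 1:], axis=1) / N
alpha_hat = c1 / c0

# Sample analogue of (2.2.25): E{X_N^2 (alpha_hat - alpha)^2}, which (2.2.54)
# predicts to be close to sigma_Z^2 / N = 1/200 for normal Z.
increase = np.mean(x[:, -1] ** 2 * (alpha_hat - alpha) ** 2)
```

With these settings, `increase * N` comes out near 1, in agreement with (2.2.54).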
Case 2: p = 1, m general

Using (2.2.9) and (2.2.16),

    \hat{x}_N^{(1)}(m) = E\{\sum_{j=0}^{m-1}\alpha^j Z_{N+m-j} + \alpha^m X_N \mid X_j = x_j;\ j \in J_{1,N}\}
                       = \alpha^m x_N.    (2.2.56)

Then

    MSE_1 = E\{[X_{N+m} - \hat{x}_N^{(1)}(m)]^2\}
          = \sigma_Z^2 (1-\alpha^{2m})(1-\alpha^2)^{-1}.    (2.2.57)

Assuming $\alpha$ unknown, one forms the predictor

    \hat{x}_N^{(2)}(m) = \hat{\alpha}^m x_N.    (2.2.58)

As in Case 1,

    MSE_2 - MSE_1 = E\{[\hat{x}_N^{(1)}(m) - \hat{x}_N^{(2)}(m)]^2\}
                  = E\{X_N^2[(\delta_\alpha + \alpha)^m - \alpha^m]^2\},    (2.2.59)

writing

    \delta_\alpha = \hat{\alpha} - \alpha.

Upon expanding (2.2.59) and remembering that $\delta_\alpha$ has sampling error of
order $N^{-1/2}$, we have

    MSE_2 - MSE_1 = E\{X_N^2[\delta_\alpha^m + m\alpha\delta_\alpha^{m-1} + \cdots + m\alpha^{m-1}\delta_\alpha]^2\}
                  = m^2\alpha^{2(m-1)} E\{X_N^2\delta_\alpha^2\} + o(N^{-1}).    (2.2.60)

Using Theorem 2.2.1 and assuming $Z_t$ normal, this becomes

    MSE_2 - MSE_1 = N^{-1} m^2 \alpha^{2(m-1)} \sigma_Z^2 + o(N^{-1}).    (2.2.61)

If the assumption is dropped, the fourth order cumulant must be included
in (2.2.61).
Case 3: p general, m = 1

The optimal predictor here is

    \hat{x}_N^{(1)}(1) = -\sum_{t=1}^{p} \alpha_t x_{N+1-t},    (2.2.62)

with prediction error

    MSE_1 = E\{[X_{N+1} - \hat{x}_N^{(1)}(1)]^2\}
          = E\{(X_{N+1} + \sum_{j=1}^{p}\alpha_j X_{N+1-j})^2\}
          = \sigma_Z^2.    (2.2.63)

The predictor becomes, when the parameters are unknown,

    \hat{x}_N^{(2)}(1) = -\sum_{j=1}^{p} \hat{\alpha}_j x_{N+1-j}.    (2.2.64)

The increase in prediction error is given by

    MSE_2 - MSE_1 = E\{[\hat{x}_N^{(2)}(1) - \hat{x}_N^{(1)}(1)]^2\}
                  = E\{(\hat{\alpha}-\alpha)^T X X^T (\hat{\alpha}-\alpha)\},    (2.2.65)

using matrix notation with $X^T = (X_N, X_{N-1}, \ldots, X_{N-p+1})$. The evaluation of
(2.2.65) is intractable; Theorem 2.2.1 leads us to conjecture that, for
normal $Z_t$, $X$ and $(\hat{\alpha}-\alpha)$ are uncorrelated (ignoring terms of order less
than $N^{-1}$). In this case

    MSE_2 - MSE_1 \approx \gamma_0 E\{(\hat{\alpha}-\alpha)^T \Sigma (\hat{\alpha}-\alpha)\},

where $X X^T$ has been replaced by its expected value $\gamma_0\Sigma$. It follows from the
results of Mann and Wald [21] that, asymptotically,

    (\hat{\alpha}-\alpha) \sim N_p(\varphi_p, (N\gamma_0)^{-1}\sigma_Z^2\Sigma^{-1}),    (2.2.66)

where $N_p$ denotes a p-variate normal distribution and $\varphi_p$ a null vector of
order p. Thus

    MSE_2 - MSE_1 = \gamma_0[(N\gamma_0)^{-1}\sigma_Z^2 p] + o(N^{-1})
                  = N^{-1} p \sigma_Z^2 + o(N^{-1}),    (2.2.67)

using the fact that $N\gamma_0\sigma_Z^{-2}(\hat{\alpha}-\alpha)^T\Sigma(\hat{\alpha}-\alpha)$ has an asymptotic chi-square
distribution with p degrees of freedom. Note that expression (2.2.67) is a simple
generalization of expression (2.2.54) for the case p = 1.
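The conclusion (2.2.67) can likewise be checked by simulation. The sketch below uses illustrative AR(2) coefficients, with the estimating equations (2.2.5) written out in closed form for p = 2, and compares the one-step error increase with $p\sigma_Z^2/N$:

```python
import numpy as np

rng = np.random.default_rng(4)

# AR(2) in the sign convention of (2.2.1); coefficients are illustrative.
a1, a2, N, reps = -1.1, 0.3, 200, 20000
x = np.zeros((reps, N))
for t in range(2, N):
    x[:, t] = -a1 * x[:, t - 1] - a2 * x[:, t - 2] + rng.normal(size=reps)

def C(j):
    """Sample covariance (2.2.6), computed for every replication at once."""
    return np.sum(x[:, : N - j] * x[:, j:], axis=1) / N

c0, c1, c2 = C(0), C(1), C(2)

# Closed-form solution of the estimating equations (2.2.5) for p = 2:
#   alpha1*C0 + alpha2*C1 = -C1,   alpha1*C1 + alpha2*C0 = -C2.
det = c0**2 - c1**2
d1 = c1 * (c2 - c0) / det - a1     # sampling error in alpha_1
d2 = (c1**2 - c0 * c2) / det - a2  # sampling error in alpha_2

# Sample analogue of the one-step error increase (2.2.65); the conjectured
# result (2.2.67) puts it near p * sigma_Z^2 / N = 2/200.
increase = np.mean((d1 * x[:, -1] + d2 * x[:, -2]) ** 2)
```

With these settings, `increase * N / p` comes out near 1, as the conjecture suggests.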
Case 4: p, m general

In this most general case, the predictor is (2.2.10), where
the values $\hat{x}_N^{(1)}(j)$ are determined recursively. When $\{\hat{\alpha}_j; j \in J_{1,p}\}$ are
substituted for $\{\alpha_j; j \in J_{1,p}\}$, one obtains $\hat{x}_N^{(2)}(m)$, given by (2.2.12).
We are interested in calculating

    MSE_2 - MSE_1 = E\{[\sum_{j=1}^{p}(\hat{\alpha}_j\hat{x}_N^{(2)}(m-j) - \alpha_j\hat{x}_N^{(1)}(m-j))]^2\}.    (2.2.68)

All attempts to obtain (2.2.68) in closed form have been unsuccessful.
That the term is of order $N^{-1}$ follows from the properties of $(\hat{\alpha}-\alpha)$.
Further conclusions might be possible with the aid of Monte Carlo
experiments.
In summary, the autoregressive process allows one to calculate
estimators having many satisfying properties, including relative ease
of calculation. Furthermore, the effect of these estimators on the
error of prediction becomes negligible when N is large. Exactly how
large N must be is dependent upon p and m. We have obtained precise
expressions for the increase in prediction error for several special
cases.
2.3 The Moving Average Sequence
If $\{X_t; t \in J_{-\infty,\infty}\}$ is a stationary time series satisfying the
relationship

    X_t = Z_t + \beta_1 Z_{t-1} + \cdots + \beta_q Z_{t-q}    (2.3.1)

for every t, then $\{X_t\}$ is said to be a qth order moving average process.
The problem of parameter estimation for the moving average
sequence is complicated by the fact that the sample covariances, C., do
not provide efficient estimators of the covariance function of f[Xtl, y
(Efficiency is defined with respect to the variance of the maximum
likelihood estimator when Zt is assumed normal.) The following is
a brief history of the estimation problem.
Whittle [33] obtained maximum likelihood equations for the
moving average process assuming Z to be normal. Estimators were not
obtained from these equations, however; even iterative methods have
proven unsatisfactory for solving the likelihood equations. The search
thus began for estimators of calculable form and with high efficiency.
We note that
\gamma_j = E(X_{t+j} X_t)
        = E{(\sum_{k=0}^q \beta_k Z_{t+j-k})(\sum_{k=0}^q \beta_k Z_{t-k})}
        = \sigma_Z^2 \sum_{k=0}^{q-j} \beta_k \beta_{k+j}   if j \in J_{0,q}
        = 0   if j \in J_{q+1,\infty},   (2.3.2)
defining \beta_0 = 1 and noting that \gamma_j = \gamma_{-j}. One can obtain estimates for {\beta_j; j \in J_{1,q}} by substituting C_j for \gamma_j in (2.3.2) for j \in J_{0,q}. However, the lack of efficiency of C_j for \gamma_j has a predictable effect on these estimators. Whittle [33] has shown that for q = 1 and \beta_1 = 0.5, the efficiency of this method is about 0.3.
Another representation of {X_t} is the following:
X_t + \sum_{j=1}^\infty \alpha_j X_{t-j} = Z_t,   (2.3.3)
where
\alpha_r = \sum \tau_1^{j_1} \tau_2^{j_2} ... \tau_q^{j_q},   (2.3.4)
the sum extending over all {j_s \ge 0; s \in J_{1,q}} with j_1 + j_2 + ... + j_q = r, and where {\tau_j; j \in J_{1,q}} are the (distinct) roots of
z^q + \beta_1 z^{q-1} + ... + \beta_q = 0.   (2.3.5)
If the roots of (2.3.5) are not all different, the representation is not affected; only the definition of the parameter set {\alpha_j; j \in J_{1,\infty}} differs slightly. Also, in order that (2.3.3) be a valid representation (in the mean square sense), we must require that all roots of (2.3.5) lie inside the unit circle. This restriction is not serious; in fact, it insures that the representation (2.3.1) is unique.
Thus {X_t} may be considered an autoregressive process of infinite order. Durbin [10] proposes that, since it is easily shown that \alpha_r decreases exponentially in r, we use the representation
X_t + \sum_{j=1}^k \alpha_j X_{t-j} \approx Z_t,   (2.3.6)
where k is chosen large, but still k << N. The parameters {\alpha_j; j \in J_{1,k}} may then be estimated using methods discussed in 2.2; finally, (2.3.4) and (2.3.5) are used to transform {\hat{\alpha}_j; j \in J_{1,k}} to estimators of {\beta_j; j \in J_{1,q}}. Durbin shows the estimators can be made efficient by choosing k (and hence N) large. All properties derived assume N infinite and k large enough that relationships (2.3.3) and (2.3.6) are nearly equivalent. The computation of the estimators requires considerable effort.
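A minimal sketch of this two-stage idea follows; the code and its settings are illustrative assumptions, not Durbin's procedure verbatim. It fits a long autoregression to data from a first order moving average, takes the regression residuals as proxies for the unobserved Z_t, and regresses X_t on the lagged residual to estimate \beta_1:

```python
import numpy as np

rng = np.random.default_rng(1)

# data from an MA(1): X_t = Z_t + beta*Z_{t-1}
beta, N = 0.5, 5000
z = rng.standard_normal(N + 1)
x = z[1:] + beta * z[:-1]

# stage 1: fit a long autoregression of order k << N by least squares
k = 20
Y = x[k:]
X = np.column_stack([x[k-j:-j] for j in range(1, k + 1)])
a_hat = np.linalg.lstsq(X, Y, rcond=None)[0]

# stage 2: the AR residuals approximate the unobserved Z_t;
# regress X_t on the lagged residual to estimate beta
resid = Y - X @ a_hat                   # proxies for z_k, ..., z_{N-1}
beta_hat = (x[k+1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])
print(beta_hat)                         # close to beta = 0.5
```

The truncation error is of order \beta^k, which is negligible here since |\beta| < 1 and k = 20.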
Walker [31] finds linear functions of the sample correlations {r_j = C_j C_0^{-1}; j \in J_{1,N}} which provide asymptotically efficient estimators of the correlation function {\rho_j = \gamma_j \gamma_0^{-1}; j \in J_{1,q}}. He then substitutes these estimators into (2.3.2) (expressing (2.3.2) in terms of correlations rather than covariances), obtaining asymptotically efficient estimators of {\beta_j; j \in J_{1,q}}. The efficiency depends primarily on the number of correlations used in the calculation of the estimators and the number of observations, N. The weights assigned to the sample correlations used in these linear estimators are determined iteratively, so calculations may be tedious.
The most recent contribution is due to Hannan [16]. He uses the spectral representation of {X_t} to obtain parameter estimators. Though his work is too complicated to summarize here, it should be noted that the estimators are shown to be asymptotically normal with variance-covariance matrix identical to that of the maximum likelihood estimators. The method does involve an iterative procedure which, by his own examples, may be slow and thus expensive. His Monte Carlo experiments also showed inexplicable bias in his estimates.
By now one realizes that the estimation problem for the moving average time series is anything but simple; but the author's objective is to obtain an estimator suitable for use in a predictor. The predictor which minimizes the mean square error of prediction is
\hat{x}_N^{(1)}(m) = E{X_{N+m}/(X_j = x_j; j \in J_{1,N})}.   (2.3.7)
If the entire history of {X_t} were known, that is, if the observation were {x_j; j \in J_{-\infty,N}}, we would have
\hat{x}_N^{(1)}(m) = E{\sum_{j=0}^q \beta_j Z_{N+m-j}/(X_j = x_j; j \in J_{-\infty,N})}.   (2.3.8)
Note that {X_j = x_j; j \in J_{-\infty,N}} uniquely determines {Z_j = z_j; j \in J_{-\infty,N}}, and conversely, by using (2.3.1) and (2.3.3). Thus
E{Z_{N+k}/(X_j = x_j; j \in J_{-\infty,N})} = E{Z_{N+k}/(Z_j = z_j; j \in J_{-\infty,N})} = 0,   (2.3.9)
for k \in J_{1,\infty}, since the {Z_t} process is an uncorrelated sequence. Thus (2.3.8) becomes
\hat{x}_N^{(1)}(m) = \sum_{j=m}^q \beta_j z_{N+m-j}.   (2.3.10)
Remembering that \alpha_r decreases exponentially in r, we can approximate z_{N-k} by
z_{N-k} \approx x_{N-k} + \sum_{j=1}^{N-k-1} \alpha_j x_{N-k-j}   (2.3.11)
for use in (2.3.10). Henceforth we assume that the entire history of the {X_t} series is known up to time N so that the discussion is not needlessly complicated.
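For q = 1 the truncated representation (2.3.11) can be evaluated directly. The short sketch below (parameter values are invented for illustration) recovers z_N from the observed series and checks it against the innovation that actually generated the data:

```python
import numpy as np

def z_recover(x, beta):
    # z_N ~ x_N + sum_{j>=1} (-beta)^j x_{N-j}, the truncated form of (2.3.11)
    N = len(x)
    w = (-beta) ** np.arange(N)         # weights for x_N, x_{N-1}, ..., x_1
    return w @ x[::-1]

rng = np.random.default_rng(2)
beta, N = 0.6, 500
z = rng.standard_normal(N + 1)
x = z[1:] + beta * z[:-1]               # MA(1) sample path
zN_hat = z_recover(x, beta)
pred = beta * zN_hat                    # the one-step predictor (2.3.12)
print(zN_hat, z[-1])
```

The recovery error is exactly (-\beta)^N z_0, which for |\beta| = 0.6 and N = 500 is far below machine precision.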
Case 1: q = m = 1
From (2.3.10),
\hat{x}_N^{(1)}(1) = \beta z_N,   (2.3.12)
writing \beta = \beta_1. The error is
MSE_1 = E{[X_{N+1} - \hat{X}_N^{(1)}(1)]^2}
      = E{(X_{N+1} - \beta Z_N)^2}
      = \sigma_Z^2.   (2.3.13)
If \beta is unknown, and one uses the estimator \hat{\beta}, we have
\hat{x}_N^{(2)}(1) = \hat{\beta} \hat{z}_N,   (2.3.14)
where from (2.3.3)
Z_t = X_t + \sum_{j=1}^\infty (-\beta)^j X_{t-j},   (2.3.15)
and so
\hat{z}_N = x_N + \sum_{j=1}^{N-1} (-\hat{\beta})^j x_{N-j}.   (2.3.16)
From previous arguments we know that
MSE_2 - MSE_1 = E{(\hat{\beta} \hat{Z}_N - \beta Z_N)^2}.   (2.3.17)
If we assume that \hat{\beta} has sampling error of order N^{-1/2}, which is the case for each of the estimation procedures considered above, we have
MSE_2 - MSE_1 = E{[\beta Z_N - (\beta + \delta\beta)(Z_N + \delta Z_N)]^2}
             = E{[Z_N (\delta\beta)]^2} + \beta^2 E{(\delta Z_N)^2} + o(N^{-1}),   (2.3.18)
where \delta\beta = \hat{\beta} - \beta and \delta Z_N = \hat{Z}_N - Z_N.
Note that it follows from (2.3.9) that if we use {x_j; j \in J_{1,N-1}} in calculating \hat{\beta}, then \hat{\beta} and Z_N are uncorrelated. Then
E{Z_N^2 (\delta\beta)^2} = E{Z_N^2} E{(\delta\beta)^2}
                        = \sigma_Z^2 E{(\hat{\beta} - \beta)^2}.   (2.3.19)
The second term on the right-hand side of (2.3.18) follows from
E{(\delta Z_N)^2} = E{[\sum_{j=1}^\infty (-1)^j (\hat{\beta}^j - \beta^j) X_{N-j}]^2}
                 = E{[\sum_{j=1}^\infty (-1)^j j \beta^{j-1} (\delta\beta) X_{N-j}]^2} + o(N^{-1}).   (2.3.20)
The value of (2.3.20) depends on the specific form of \hat{\beta}. However, the expression will be complicated for all estimates considered. One can obtain an approximation for (2.3.20) by replacing (\delta\beta)^2 by its expected value to get
E{(\delta Z_N)^2} \approx E{(\delta\beta)^2} [\gamma_0 \sum_{j=1}^\infty j^2 \beta^{2(j-1)} - 2\gamma_1 \sum_{j=1}^\infty j(j+1) \beta^{2j-1}]
                 = E{(\delta\beta)^2} \sigma_Z^2 (1 - \beta^2)^{-1}.   (2.3.21)
Substitution of (2.3.19) and (2.3.21) into (2.3.18) yields
MSE_2 - MSE_1 = E{(\hat{\beta} - \beta)^2} \sigma_Z^2 [1 + \beta^2 (1 - \beta^2)^{-1}]
             = E{(\hat{\beta} - \beta)^2} \sigma_Z^2 (1 - \beta^2)^{-1}.   (2.3.22)
Whittle [33] has shown that the maximum likelihood estimator for \beta has asymptotic variance N^{-1}(1 - \beta^2), so that, at best, the increase in the mean square error of prediction is given by
MSE_2 - MSE_1 = N^{-1} \sigma_Z^2.   (2.3.23)
A strong argument can be made for using the inefficient estimator given by substituting sample correlations into (2.3.2) and solving for \beta; the estimation equation is thus
r_1 = C_1 C_0^{-1} = \beta(1 + \beta^2)^{-1}.   (2.3.24)
Only one of the solutions to this quadratic will be less than unity. The estimator is easy to calculate, and even if the efficiency is only 0.1, the additional error given by (2.3.22) is negligible for large N.
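To illustrate, here is a small sketch (with invented illustrative values) that forms r_1 from simulated data and takes the root of the quadratic \beta^2 r_1 - \beta + r_1 = 0 with modulus less than one:

```python
import numpy as np

def ma1_moment_estimate(x):
    # solve r1 = beta/(1+beta^2) for the root with |beta| < 1, as in (2.3.24)
    x = np.asarray(x) - np.mean(x)
    c0 = x @ x / len(x)
    c1 = x[1:] @ x[:-1] / len(x)
    r1 = c1 / c0
    if abs(r1) >= 0.5:                  # |r1| <= 1/2 for any MA(1); guard sampling noise
        return float(np.sign(r1))
    return (1 - np.sqrt(1 - 4 * r1**2)) / (2 * r1)

rng = np.random.default_rng(3)
beta, N = 0.5, 4000
z = rng.standard_normal(N + 1)
x = z[1:] + beta * z[:-1]
est = ma1_moment_estimate(x)
print(est)                              # near 0.5, but with inflated variance
```

The estimator is inefficient, as noted above, but its sampling error is still of order N^{-1/2}, which is all that the prediction-error analysis requires.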
Case 2: q, m general (m < q)
Using the m-step predictor given in (2.3.10), the mean square error is
MSE_1 = E{[X_{N+m} - \sum_{j=0}^{q-m} \beta_{j+m} Z_{N-j}]^2}
      = \sigma_Z^2 (1 + \sum_{j=1}^{m-1} \beta_j^2).   (2.3.25)
When the estimator
\hat{x}_N^{(2)}(m) = \sum_{j=m}^q \hat{\beta}_j \hat{z}_{N+m-j}   (2.3.26)
is used, the increase in prediction error is given by
MSE_2 - MSE_1 = E{[\hat{x}_N^{(1)}(m) - \hat{x}_N^{(2)}(m)]^2}
             = E{[\sum_{j=m}^q (Z_{N+m-j}(\delta\beta_j) + \beta_j (\delta Z_{N+m-j}))]^2} + o(N^{-1}).   (2.3.27)
By using only {x_j; j \in J_{1,N-q}} in the calculation of {\hat{\beta}_j; j \in J_{1,q}}, the cross product terms in (2.3.27) vanish, leaving
MSE_2 - MSE_1 = E{[\sum_{j=m}^q Z_{N+m-j}(\delta\beta_j)]^2} + E{[\sum_{j=m}^q \beta_j (\delta Z_{N+m-j})]^2} + o(N^{-1}).   (2.3.28)
The first term on the right-hand side of (2.3.28) is easily evaluated as
E{[\sum_{j=m}^q Z_{N+m-j}(\delta\beta_j)]^2} = \sigma_Z^2 \sum_{j=m}^q E{(\hat{\beta}_j - \beta_j)^2}.   (2.3.29)
The second term, however, is complicated by the fact that the {X_t} process is q-dependent; that is to say, {\gamma_j; j \in J_{-q,q}} are all nonzero. If we apply the methods used in Case 1,
E{[\sum_{j=m}^q \beta_j (\delta Z_{N+m-j})]^2} = E{[\sum_{j=m}^q \beta_j \sum_{\ell=1}^\infty (\delta\alpha_\ell) X_{N+m-j-\ell}]^2},   (2.3.30)
where {\delta\alpha_\ell = \hat{\alpha}_\ell - \alpha_\ell; \ell \in J_{1,\infty}} are obtained from (2.3.4), using {\hat{\tau}_j; j \in J_{1,q}} as the roots of
z^q + \hat{\beta}_1 z^{q-1} + ... + \hat{\beta}_q = 0.   (2.3.31)
All that could be gleaned from (2.3.30) is the fact that an increase in q will result in an increase in the prediction error increase given by (2.3.27). Since no more concise expression could be obtained for (2.3.30), no approximation was obtained for (2.3.27). The argument for using the estimators which are easily calculated, but perhaps inefficient, still applies, of course. But determination of N in order that the increase be bounded by a given quantity must be accomplished by using numerical methods.
2.4 The Hybrid Model
If {X_t; t \in J_{-\infty,\infty}} is a stationary time series satisfying the relationship
X_t + \alpha_1 X_{t-1} + ... + \alpha_p X_{t-p} = Z_t + \beta_1 Z_{t-1} + ... + \beta_q Z_{t-q}   (2.4.1)
for every t, then {X_t} is said to be a hybrid model of order (p,q).
Estimation of the parameter set {\alpha_1, ..., \alpha_p; \beta_1, ..., \beta_q} for the hybrid model is the most difficult of those considered. The fact that the sample covariance function is an inefficient estimator of the true covariance function contributes to this difficulty. The maximum likelihood estimators are again intractable, but their asymptotic variance-covariance matrix can be used for calculations of efficiency.
Durbin [11,12] again makes use of an infinite order autoregressive representation of {X_t}. The problems with the procedure are similar to those mentioned in 2.3: the choice of truncation point for the autoregressive representation, the choice of N, and the cumbersome iterative procedure used in the calculations. However, the efficiency of the estimators can be made as close to unity as desired by taking the truncation point and N large.
Walker [32] approaches the problem via the covariance function; by finding an efficient estimator of this function one can efficiently estimate the hybrid model parameters. The following relationships are used:
E{(\sum_{j=0}^p \alpha_j X_{t-j})(\sum_{j=0}^p \alpha_j X_{s-j})} = E{(\sum_{j=0}^q \beta_j Z_{t-j})(\sum_{j=0}^q \beta_j Z_{s-j})}   (2.4.2)
implies that
\sum_{j=0}^p \sum_{i=0}^p \alpha_i \alpha_j \gamma_{(t-s)+(i-j)} = \sigma_Z^2 \sum_{j=|t-s|}^q \beta_j \beta_{j-|t-s|},   (2.4.3)
where \alpha_0 = \beta_0 = 1, and |t-s| \le q. This system of equations gives {\alpha_j; j \in J_{1,p}} and {\beta_j; j \in J_{1,q}} in terms of the covariance function {\gamma_j}. In order that the solutions to (2.4.3) be unique, one must require that the roots of
z^q + \beta_1 z^{q-1} + ... + \beta_q = 0   (2.4.4)
have modulus less than unity.
Also, after multiplication of (2.4.1) by X_{t-r} for r \in J_{q+1,q+p}, taking expected values yields
\sum_{j=0}^p \alpha_j \gamma_{r-j} = 0,   (2.4.5)
using the fact that X_t is uncorrelated with the sequence {Z_{t+j}; j \in J_{1,\infty}}. Walker finds efficient estimators of {\gamma_j}, and then uses (2.4.3) and (2.4.5) to calculate estimates of the hybrid model parameters. An iterative procedure is required, but the resulting estimators are asymptotically normal and fully efficient.
Hannan [16] achieves the same goal by a different method: that of estimating the spectral density and using this to get estimates of the parameters. The effort required is substantial, but the estimators are asymptotically normal and efficient.
The predictor of X_{N+m} given the observation {x_j; j \in J_{-\infty,N}} is
\hat{x}_N^{(1)}(m) = E{X_{N+m}/(X_j = x_j; j \in J_{-\infty,N})}
  = -\sum_{j=1}^p \alpha_j \hat{x}_N^{(1)}(m-j) + \sum_{j=m}^q \beta_j z_{N+m-j}   if m \in J_{1,q}
  = -\sum_{j=1}^p \alpha_j \hat{x}_N^{(1)}(m-j)   if m \in J_{q+1,\infty},   (2.4.6)
where, by convention,
\hat{x}_N^{(1)}(j) = x_{N+j}   if j \in J_{-\infty,0}.   (2.4.7)
Note that the values of {z_{N-j}; j \in J_{0,q-1}} are calculated from {x_j; j \in J_{-\infty,N}}, using the infinite order autoregressive representation. When only {x_j; j \in J_{1,N}} is observed, we may approximate {z_{N-j}; j \in J_{0,q-1}} by truncating the autoregressive representation after N terms.
When the parameters are unknown, the predictor used is
\hat{x}_N^{(2)}(m)
  = -\sum_{j=1}^p \hat{\alpha}_j \hat{x}_N^{(2)}(m-j) + \sum_{j=m}^q \hat{\beta}_j \hat{z}_{N+m-j}   if m \in J_{1,q}
  = -\sum_{j=1}^p \hat{\alpha}_j \hat{x}_N^{(2)}(m-j)   if m \in J_{q+1,\infty},   (2.4.8)
where \hat{x}_N^{(2)}(j) is calculated just as \hat{x}_N^{(1)}(j) with the parameters replaced by their estimators, and \hat{z}_j is calculated using the autoregressive representation with estimated parameters.
Case 1: p = q = m = 1
The time series {X_t; t \in J_{-\infty,\infty}} satisfies
X_t - \alpha X_{t-1} = Z_t + \beta Z_{t-1},   (2.4.9)
writing \alpha = -\alpha_1 and \beta = \beta_1 in (2.4.1). From (2.4.6),
\hat{x}_N^{(1)}(1) = \alpha x_N + \beta z_N.   (2.4.10)
The autoregressive representation of Z_N is
Z_N = X_N + \sum_{j=1}^\infty (-\beta)^{j-1}(-\beta - \alpha) X_{N-j}
    = X_N + (1 + \alpha\beta^{-1}) \sum_{j=1}^\infty (-\beta)^j X_{N-j}.   (2.4.11)
Noticing that the autoregressive parameters (the coefficients of the X_t's) on the right-hand side of (2.4.11) are decreasing exponentially, we will use
\hat{z}_N = x_N + (1 + \hat{\alpha}\hat{\beta}^{-1}) \sum_{j=1}^{N-1} (-\hat{\beta})^j x_{N-j}   (2.4.12)
in (2.4.10) to calculate \hat{x}_N^{(2)}(1). The mean square error of prediction for (2.4.10) is
MSE_1 = E{[X_{N+1} - \hat{X}_N^{(1)}(1)]^2}
      = E{[X_{N+1} - (\alpha X_N + \beta Z_N)]^2}
      = \sigma_Z^2.   (2.4.13)
When \alpha and \beta are estimated by \hat{\alpha} and \hat{\beta}, respectively, we use
\hat{x}_N^{(2)}(1) = \hat{\alpha} x_N + \hat{\beta} \hat{z}_N.   (2.4.14)
The increase in prediction error is
MSE_2 - MSE_1 = E{[\alpha X_N + \beta Z_N - (\hat{\alpha} X_N + \hat{\beta} \hat{Z}_N)]^2}.   (2.4.15)
The term \hat{Z}_N greatly complicates this expression; but if we assume \hat{Z}_N \approx Z_N,
MSE_2 - MSE_1 \approx E{[(\alpha - \hat{\alpha}) X_N + (\beta - \hat{\beta}) Z_N]^2}
             = \gamma_0 E{(\hat{\alpha} - \alpha)^2} + \sigma_Z^2 E{(\hat{\beta} - \beta)^2}
               + 2\sigma_Z^2 E{(\hat{\alpha} - \alpha)(\hat{\beta} - \beta)},   (2.4.16)
where
(i) X_N and \hat{\alpha} are assumed approximately uncorrelated;
(ii) (\hat{\alpha}, \hat{\beta}) are calculated using only {x_j; j \in J_{1,N-1}}, so that \hat{\beta} and Z_N are uncorrelated; and, using the fact that {X_t} can be represented as an infinite order moving average process,
(iii) E{X_N Z_N} = E{[Z_N + \sum_{j=1}^\infty c_j Z_{N-j}] Z_N} = \sigma_Z^2.   (2.4.17)
To obtain {c_j; j \in J_{1,\infty}} in terms of (\alpha,\beta), note that
X_t = Z_t + \sum_{j=1}^\infty (\alpha^j Z_{t-j} + \beta\alpha^{j-1} Z_{t-j})
    = Z_t + (\alpha + \beta) \sum_{j=1}^\infty \alpha^{j-1} Z_{t-j}.   (2.4.18)
One can derive an estimator for \alpha from the equation (2.4.5):
\gamma_r - \alpha \gamma_{r-1} = 0   (2.4.19)
for every r \in J_{2,\infty}. With this in mind, we define \Sigma^* as an asymptotic (k x k) covariance matrix with (i,j) element
\sigma^*_{ij} = \lim_{N \to \infty} N E{(C^*_i - \alpha)(C^*_j - \alpha)},   (2.4.20)
where
C^*_i = C_{i+1} C_i^{-1}   (2.4.21)
for i \in J_{1,k}. By letting
\delta C^*_i = C_{i+1} C_i^{-1} - \gamma_{i+1} \gamma_i^{-1} = C^*_i - \alpha   (2.4.22)
and
\delta C_i = C_i - \gamma_i,   (2.4.23)
we have
\delta C^*_j = (C_{j+1} - \alpha C_j) C_j^{-1}
            = (\delta C_{j+1} - \alpha \delta C_j)(\delta C_j + \gamma_j)^{-1}
            = \gamma_j^{-1}(\delta C_{j+1} - \alpha \delta C_j) + o_p(N^{-1/2}).   (2.4.24)
But Bartlett [1] has shown that C_j is consistent for \gamma_j, so that (2.4.24) implies that C^*_j is consistent for \alpha.
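The consistency of the ratios C_{j+1} C_j^{-1} for \alpha is easy to check by simulation. The sketch below (parameter values are illustrative assumptions) computes the first few ratios from a simulated hybrid (1,1) series:

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, beta, N = 0.7, 0.4, 20000

# simulate X_t - alpha X_{t-1} = Z_t + beta Z_{t-1}
z = rng.standard_normal(N + 200)
x = np.zeros(N + 200)
for t in range(1, N + 200):
    x[t] = alpha * x[t-1] + z[t] + beta * z[t-1]
x = x[200:]                              # drop the burn-in

def C(j):
    # sample covariance at lag j
    xc = x - x.mean()
    return xc[j:] @ xc[:len(xc) - j] / len(xc)

cstar = [C(j + 1) / C(j) for j in range(1, 5)]
print(cstar)                             # each ratio is close to alpha = 0.7
```

Each ratio is a separate consistent estimator of \alpha; the theorem that follows combines them with optimal weights.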
Furthermore, from (2.4.18) we find
\rho_s = 1   if s = 0
       = a \alpha^{s-1}   if s \in J_{1,\infty},   (2.4.25)
where
a = (\alpha + \beta)(1 + \alpha\beta)(1 + 2\alpha\beta + \beta^2)^{-1}.   (2.4.26)
Defining
\lambda_{r,r+s} = E{(\delta C_r)(\delta C_{r+s})},   (2.4.27)
Bartlett [1] has shown that
N \lambda_{r,r+s} = \gamma_0^2 \sum_{j=-\infty}^\infty (\rho_j \rho_{j+s} + \rho_{j-r} \rho_{j+r+s}) + o(1).   (2.4.28)
Remembering that \rho_j = \gamma_j \gamma_0^{-1}, we use (2.4.25) to get
\sum_{j=-\infty}^\infty \rho_j \rho_{j+s} = 1 + 2a^2(1 - \alpha^2)^{-1}   if s = 0
  = \alpha^{s-2} a [2a\alpha^2(1 - \alpha^2)^{-1} + (s-1)a + 2\alpha]   if s \in J_{1,\infty},   (2.4.29)
the sum being symmetric in s.
We now state
Theorem 2.4.1:
The estimator
\hat{\alpha}_0 = \sum_{j=1}^k d_{j0} C^*_j,   (2.4.30)
where
d_0^T = (d_{10}, d_{20}, ..., d_{k0})   (2.4.31)
     = 1^T \Sigma^{*-1} [1^T \Sigma^{*-1} 1]^{-1},   (2.4.32)
with 1^T = (1,1,...,1), is consistent and has minimum mean square error among the estimators
\hat{\alpha} = \sum_{j=1}^k d_j C^*_j   (2.4.33)
with d^T 1 = 1. This error is given by
MSE(\hat{\alpha}_0) = [1^T \Sigma^{*-1} 1]^{-1},   (2.4.34)
where
N \sigma^*_{r,r+s} = (a \alpha^{r-1})^{-2} (1 + \alpha^2 - 4a\alpha + 2a^2)   if s = 0
  = a^{-2} \alpha^{-(2r-1)} (2a - \alpha - a^2\alpha)   if s = 1
  = \alpha^{-2r} (1 - \alpha^2)   if s \in J_{2,\infty}.   (2.4.35)
Proof:
That \hat{\alpha}_0 is consistent for \alpha follows immediately from the fact that C^*_j is consistent for \alpha (2.4.24) and d_0^T 1 = 1.
Now if \hat{\alpha} is of form (2.4.33),
MSE{\hat{\alpha}} = d^T \Sigma^* d.   (2.4.36)
In order to minimize this with the restriction d^T 1 = 1, we form the Lagrange equation
L = d^T \Sigma^* d - 2\lambda(d^T 1 - 1).   (2.4.37)
But \partial L/\partial d = 0 implies
d_0^T \Sigma^* - \lambda 1^T = 0.   (2.4.38)
Using d_0^T 1 = 1, it follows that
\lambda = [1^T \Sigma^{*-1} 1]^{-1}   (2.4.39)
and hence
d_0^T = 1^T \Sigma^{*-1} [1^T \Sigma^{*-1} 1]^{-1},   (2.4.40)
as was to be shown.
Substituting (2.4.40) into (2.4.36), we obtain
MSE(\hat{\alpha}_0) = [1^T \Sigma^{*-1} 1]^{-1}.   (2.4.41)
Finally, noting from (2.4.24) that
\sigma^*_{r,r+s} = E{(\delta C^*_r)(\delta C^*_{r+s})}
  = \gamma_r^{-1} \gamma_{r+s}^{-1} (\lambda_{r+1,r+s+1} - \alpha \lambda_{r,r+s+1} - \alpha \lambda_{r+1,r+s} + \alpha^2 \lambda_{r,r+s}),   (2.4.42)
substitution of (2.4.28) and (2.4.29) into (2.4.42) yields (2.4.35), which completes the proof.
Table 2.4.1 gives efficiencies for \hat{\alpha}_0 for several values of \alpha and \beta; three values of k (the number of sample covariances used in the calculation of \hat{\alpha}_0) are given. All calculations were carried out on the University of Florida's IBM 360 computer. Also, efficiencies are calculated with respect to the variance of the maximum likelihood estimator of \alpha given by Whittle [appendix of 35] as
\sigma_W^2 = N^{-1}(1 - \alpha^2)(1 + \alpha\beta)^2 (\alpha + \beta)^{-2}.   (2.4.43)
The table shows that the efficiency is especially sensitive to changes in \beta; as \beta tends to unity, it becomes necessary to use more sample covariances to achieve high efficiency.
Table 2.4.1
Efficiencies of \hat{\alpha}_0 = \sum_{j=1}^k d_{j0} C^*_j

k = 2                      \alpha
            .1      .3      .5      .7      .9
      .1  0.999   0.999   0.999   1.000   1.000
      .3  0.952   0.961   0.972   0.983   0.994
\beta .5  0.791   0.840   0.888   0.934   0.979
      .7  0.591   0.687   0.780   0.871   0.958
      .9  0.427   0.551   0.680   0.809   0.937

k = 4                      \alpha
            .1      .3      .5      .7      .9
      .1  1.000   1.000   1.000   1.000   1.000
      .3  0.999   0.999   0.999   1.000   1.000
\beta .5  0.967   0.975   0.983   0.990   0.997
      .7  0.830   0.877   0.918   0.954   0.986
      .9  0.632   0.731   0.819   0.897   0.968

k = 6                      \alpha
            .1      .3      .5      .7      .9
      .1  1.000   1.000   1.000   1.000   1.000
      .3  1.000   1.000   1.000   1.000   1.000
\beta .5  0.996   0.997   0.998   0.999   1.000
      .7  0.931   0.951   0.968   0.982   0.995
      .9  0.746   0.820   0.882   0.935   0.980
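The entries of Table 2.4.1 can be recomputed from (2.4.34), (2.4.35), and (2.4.43). The sketch below builds \Sigma^* elementwise and reproduces, for example, the k = 2 entry at \alpha = \beta = 0.5 (the code itself is a modern illustration, not the original IBM 360 program):

```python
import numpy as np

def sigma_star(alpha, beta, k):
    # N times the asymptotic covariance matrix of (C*_1, ..., C*_k), from (2.4.35)
    a = (alpha + beta) * (1 + alpha * beta) / (1 + 2 * alpha * beta + beta**2)
    S = np.empty((k, k))
    for r in range(1, k + 1):
        for c in range(r, k + 1):
            s = c - r
            if s == 0:
                v = (a * alpha**(r - 1))**-2 * (1 + alpha**2 - 4*a*alpha + 2*a**2)
            elif s == 1:
                v = a**-2 * alpha**-(2*r - 1) * (2*a - alpha - a**2 * alpha)
            else:
                v = alpha**(-2 * r) * (1 - alpha**2)
            S[r-1, c-1] = S[c-1, r-1] = v
    return S

def efficiency(alpha, beta, k):
    S = sigma_star(alpha, beta, k)
    one = np.ones(k)
    mse = 1.0 / (one @ np.linalg.solve(S, one))    # (2.4.34), in units of 1/N
    whittle = (1 - alpha**2) * (1 + alpha*beta)**2 / (alpha + beta)**2  # (2.4.43)
    return whittle / mse

print(round(efficiency(0.5, 0.5, 2), 3))   # table entry 0.888
```

Agreement with the tabulated values provides a numerical check on the closed form (2.4.35).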
In practice, one would need to estimate \Sigma^* (using (2.4.2)) in order to calculate \hat{\alpha}_0. Thus, though the effort of calculation is somewhat reduced, an iterative method must be used. We can obtain an upper bound to the mean square error of prediction for
\hat{x}_N^{(2)}(1) = \hat{\alpha} x_N + \hat{\beta} \hat{z}_N   (2.4.44)
by considering
\hat{x}_N^{(3)}(1) = \hat{\alpha}_0 x_N.   (2.4.45)
The corresponding increase in prediction error is
MSE_3 - MSE_1 = E{[(\hat{\alpha}_0 - \alpha) X_N + \beta Z_N]^2}
             \approx \sigma_Z^2 [\beta^2 + K^{-1}(1 + \alpha\beta)^2],   (2.4.46)
where we assume K observations {x_j; j \in J_{1,K}} have been used in the calculation of \hat{\alpha}_0, with K << N, in order that the assumption that X_N and \hat{\alpha}_0 are uncorrelated is more plausible.
In fact, the use of \hat{x}_N^{(3)}(m) as a predictor of X_{N+m} may not be unreasonable as long as \beta is small, thus saving the rather tedious calculation of \hat{z}_N.
Case 2: p = q = 1; m general
The m-step predictor is given by
\hat{x}_N^{(1)}(m) = \alpha^m x_N + \alpha^{m-1} \beta z_N,   (2.4.47)
with mean square error
MSE_1 = \sigma_Z^2   if m = 1
      = \sigma_Z^2 [1 + (\alpha + \beta)^2 (1 - \alpha^{2(m-1)})(1 - \alpha^2)^{-1}]   if m \in J_{2,\infty}.   (2.4.48)
If we use
\hat{x}_N^{(2)}(m) = \hat{\alpha}^m x_N + \hat{\alpha}^{m-1} \hat{\beta} \hat{z}_N,   (2.4.49)
the increase in prediction error will be
MSE_2 - MSE_1 = E{[(\hat{\alpha}^m - \alpha^m) X_N + (\hat{\alpha}^{m-1}\hat{\beta} - \alpha^{m-1}\beta) \hat{Z}_N]^2}
 = E{[m\alpha^{m-1}(\delta\alpha) X_N + (\alpha^{m-1}(\delta\beta) + (m-1)\alpha^{m-2}\beta(\delta\alpha)) Z_N]^2} + o(N^{-1})
 = m^2 \alpha^{2(m-1)} \gamma_0 E{(\delta\alpha)^2} + \alpha^{2(m-2)} \sigma_Z^2 E{[\alpha(\delta\beta) + (m-1)\beta(\delta\alpha)]^2}
   + 2m \alpha^{2m-3} \sigma_Z^2 E{(\delta\alpha)[\alpha(\delta\beta) + (m-1)\beta(\delta\alpha)]},   (2.4.50)
where we have replaced \hat{Z}_N by Z_N, assumed (\hat{\alpha},\hat{\beta}) to be uncorrelated with X_N, and used (2.4.17).
We can again obtain an upper bound for this rather complicated expression by considering
\hat{x}_N^{(3)}(m) = \hat{\alpha}_0^m x_N.   (2.4.51)
Then MSE_3 - MSE_1 is bounded above by
\sigma_Z^2 \alpha^{2(m-1)} [\beta^2 + K^{-1} m (1 + \alpha\beta)^2].   (2.4.52)
In addition to the comments applied to this quantity in Case 1, notice that the increase becomes less significant as m is increased.
Case 3: p general; q = m = 1
The representation of {X_t; t \in J_{-\infty,\infty}} is now, from (2.4.1),
X_t + \alpha_1 X_{t-1} + ... + \alpha_p X_{t-p} = Z_t + \beta Z_{t-1}.   (2.4.53)
The predictor is
\hat{x}_N^{(1)}(1) = -\sum_{j=1}^p \alpha_j x_{N+1-j} + \beta z_N,   (2.4.54)
and the error of prediction is
MSE_1 = E{[X_{N+1} - (-\sum_{j=1}^p \alpha_j X_{N+1-j} + \beta Z_N)]^2}
      = \sigma_Z^2.   (2.4.55)
When {\alpha_j; j \in J_{1,p}} are unknown, estimators can be obtained from (2.4.2) and (2.4.5); we have
\sum_{j=0}^p \hat{\alpha}_j C_{r-j} = 0   (2.4.56)
for r \in J_{2,p+1} (\hat{\alpha}_0 = 1), and (2.4.3) for estimating the parameters. One can also extend the method presented in Theorem 2.4.1 to this case, but the analysis is increasingly complicated as p increases. The determination of estimates from (2.4.56) and (2.4.3) is probably easiest, and the order of the error increase is N^{-1} even though the estimators are inefficient.
The cases with p, q, and m general are analytically intractable insofar as evaluation of the increase in prediction error due to parameter estimation is concerned. But the study of the simple cases has shown that primary consideration should be given to the construction of estimators which are consistent and have mean square error of order N^{-1}. As long as N is large, high efficiency is not a necessary condition for small mean square error of prediction.
CHAPTER III
THE LOGISTIC GROWTH PROCESS
3.1 Introduction
Let {X(t); t \in I_{0,\infty}} be a time series such that
E{X(t)/X(0) = n} = n(1 + \alpha)(1 + \alpha p^t)^{-1},   (3.1.1)
where \alpha > 0, n \in J_{1,\infty}, and p \in (0,1). Then {X(t)} is said to be a growth process following the logistic law, or simply a logistic growth process. This process receives wide application in animal population models and various economic models; the representation varies slightly, depending upon the particular application, but with a little adjustment each can be written as (3.1.1).
The problems to be considered are:
(i) the estimation of a and p in (3.1.1), and
(ii) the prediction of future values of X(t) when a and p
are estimated.
Several observational possibilities will be considered. The
process may be observed periodically, at equidistant time points; it
may be observed continuously during a given time interval; or only
the times at which the process increases (assuming all increases are
discrete) may be observed.
3.2 Probability Distribution of {X(t)}
The probability structure of a growth process may be specified in any of an infinite number of ways. But in reality these processes are often used with no reference to such a structure; the parameter estimation problem is considered without reference to error.
We shall use the following probabilistic model for {X(t)}: let {X(t); t \in I_{0,\infty}} be a Markov process with
P{X(t + \delta t) = j/X(t) = i}
  = i\lambda(t)\delta t + o(\delta t)   if j = i + 1
  = 1 - i\lambda(t)\delta t + o(\delta t)   if j = i
  = o(\delta t)   otherwise,   (3.2.1)
where \delta t is an infinitesimal time interval. This model obviously requires the assumption that {X(t)} grows in equivalent discrete units (economic units, population units, etc.), and that the probability of a one-unit increase at any given time is proportional to the size of the process at that time. The reader will recognize (3.2.1) as the common definition given a birth-death process with the death rate zero for all t. Kendall [19] has shown that the above representation is the birth-death model which minimizes the variance of {X(t)}.
Defining
p_{ij}(s,t) = P{X(t) = j/X(s) = i},   (3.2.2)
an application of the Kolmogorov equations yields
p_{ij}(s,t+\delta t)
  = (j-1)\lambda(t)\delta t p_{i,j-1}(s,t) + [1 - j\lambda(t)\delta t] p_{ij}(s,t) + o(\delta t)   if j \in J_{i+1,\infty}
  = [1 - j\lambda(t)\delta t] p_{ij}(s,t) + o(\delta t)   if j = i.   (3.2.3)
Thus
\partial[p_{ij}(s,t)]/\partial t
  = (j-1)\lambda(t) p_{i,j-1}(s,t) - j\lambda(t) p_{ij}(s,t)   if j \in J_{i+1,\infty}
  = -j\lambda(t) p_{ij}(s,t)   if j = i.   (3.2.4)
Multiplication of (3.2.4) by j and summation over all j gives
\sum_{j=i}^\infty j \partial[p_{ij}(s,t)]/\partial t
  = \lambda(t)[\sum_{j=i+1}^\infty j(j-1) p_{i,j-1}(s,t) - \sum_{j=i}^\infty j^2 p_{ij}(s,t)]
  = \lambda(t) \sum_{j=i}^\infty j p_{ij}(s,t).   (3.2.5)
Defining
m_i(s,t) = \sum_{j=i}^\infty j p_{ij}(s,t),   (3.2.6)
we rewrite (3.2.5), obtaining
\partial[m_i(s,t)]/\partial t = \lambda(t) m_i(s,t).   (3.2.7)
Solving,
m_i(s,t) = i exp{\int_s^t \lambda(u) du},   (3.2.8)
using the initial condition
m_i(s,s) = i.   (3.2.9)
The function m_i(s,t) shall be called the mean function of the growth process {X(t)}, since obviously
m_i(s,t) = E{X(t)/X(s) = i}.   (3.2.10)
From (3.1.1) we know that
m_n(0,t) = n(1 + \alpha)(1 + \alpha p^t)^{-1},   (3.2.11)
where n is the number of units at time zero. Equating (3.2.11) and (3.2.8) (for s = 0) implies that, for the logistic growth process,
\lambda(t) = -\theta \alpha p^t (1 + \alpha p^t)^{-1},   (3.2.12)
where
\theta = log p.   (3.2.13)
We now state
Theorem 3.2.1:
If {X(t); t \in I_{0,\infty}} is a logistic growth process whose probability structure is (3.2.1), then
P{X(t) = j/X(s) = i} = \binom{j-1}{i-1} T_{s,t}^i (1 - T_{s,t})^{j-i}   if j \in J_{i,\infty}
                     = 0   otherwise,   (3.2.14)
where
T_{s,t} = (1 + \alpha p^t)(1 + \alpha p^s)^{-1}.   (3.2.15)
Proof:
The proof is a special case of that given by Kendall [19] for the general birth-death process. We let \lambda(t) be defined by (3.2.12) and \mu(t), the death rate, be zero to obtain the density given in (3.2.14). The rigorous proof of this is given for the general case in many places, and so will not be reproduced here.
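Theorem 3.2.1 says that, given X(s) = i, the increment X(t) - i is negative binomial with parameters i and T_{s,t}, which gives a direct way to simulate the process. A brief sketch (parameter values are illustrative assumptions) checks the simulated mean against (3.1.1):

```python
import numpy as np

rng = np.random.default_rng(5)

def T(s, t, alpha, p):
    # T_{s,t} = (1 + alpha p^t)/(1 + alpha p^s), as in (3.2.15)
    return (1 + alpha * p**t) / (1 + alpha * p**s)

def sample_path(n, alpha, p, times):
    # X(t) | X(s) = i  is  i + NegBin(i, T_{s,t}) by Theorem 3.2.1
    x, s, path = n, 0.0, []
    for t in times:
        x = x + rng.negative_binomial(x, T(s, t, alpha, p))
        path.append(x)
        s = t
    return path

alpha, p, n = 4.0, 0.5, 10
times = np.arange(1, 11)
paths = np.array([sample_path(n, alpha, p, times) for _ in range(4000)])
mean_t5 = paths[:, 4].mean()
print(mean_t5, n * (1 + alpha) / (1 + alpha * p**5))   # compare with (3.1.1)
```

Note that numpy's `negative_binomial(i, T)` draws the number of failures before the i-th success, which is exactly the increment j - i of (3.2.14).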
3.3 Equispaced Observation
Assume that N observations are made on the logistic growth process {X(t); t \in I_{0,\infty}} at equispaced intervals of length \Delta. Without losing any generality in the results to be presented, we henceforth assume \Delta = 1. Thus {X(j); j \in J_{1,N}} will be observed, with X(0) = n known.
Writing the set of observations as {x_j; j \in J_{1,N}}, the likelihood function for this set is
L[\alpha,p; (x_j; j \in J_{1,N})] = P{(X_j = x_j; j \in J_{1,N})/(\alpha,p,n)}.   (3.3.1)
Using the fact that {X(t)} is a Markov process and applying Theorem 3.2.1,
L[\alpha,p; (x_j; j \in J_{1,N})] = \prod_{j=1}^N P{X(j) = x_j/X(j-1) = x_{j-1}}
  = \prod_{j=1}^N \binom{x_j - 1}{x_{j-1} - 1} T_{j-1,j}^{x_{j-1}} (1 - T_{j-1,j})^{x_j - x_{j-1}},   (3.3.2)
where x_0 = n.
Taking the logarithm of (3.3.2), and replacing T_{j-1,j} by the defining equation (3.2.15), one gets
log L = \sum_{j=1}^N log \binom{x_j - 1}{x_{j-1} - 1} + \sum_{j=1}^N x_{j-1}[log(1 + \alpha p^j) - log(1 + \alpha p^{j-1})]
      + \sum_{j=1}^N (x_j - x_{j-1})[log \alpha(p^{j-1} - p^j) - log(1 + \alpha p^{j-1})],   (3.3.3)
writing L = L[\alpha,p; (x_j; j \in J_{1,N})] for simplicity. Differentiation of (3.3.3) with respect to \alpha yields
\partial log L/\partial\alpha = n[p(1 + \alpha p)^{-1} - \alpha^{-1}]
  + \sum_{j=1}^{N-1} x_j[p^{j+1}(1 + \alpha p^{j+1})^{-1} - p^{j-1}(1 + \alpha p^{j-1})^{-1}]
  + x_N[\alpha^{-1} - p^{N-1}(1 + \alpha p^{N-1})^{-1}];   (3.3.4)
and differentiation with respect to p gives
\partial log L/\partial p = n[(1 - p)^{-1} + \alpha(1 + \alpha p)^{-1}]
  + \sum_{j=1}^{N-1} x_j[(j+1)\alpha p^j (1 + \alpha p^{j+1})^{-1} - p^{-1} - (j-1)\alpha p^{j-2}(1 + \alpha p^{j-1})^{-1}]
  + x_N[(N-1)p^{-1} - (1 - p)^{-1} - (N-1)\alpha p^{N-2}(1 + \alpha p^{N-1})^{-1}].   (3.3.5)
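A closed-form solution of the likelihood equations is not available (see the remark following the proof of Theorem 3.3.1), but the log likelihood (3.3.3) is straightforward to maximize numerically. The sketch below (data simulated under illustrative parameter values; `scipy` is assumed available, and a general-purpose optimizer stands in for the iterative scheme of Chapter IV) does exactly this:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(6)

alpha0, p0, n, N = 4.0, 0.5, 1000, 10
# simulate x_1, ..., x_N (x_0 = n) using Theorem 3.2.1
x = [n]
for j in range(1, N + 1):
    T = (1 + alpha0 * p0**j) / (1 + alpha0 * p0**(j-1))
    x.append(x[-1] + rng.negative_binomial(x[-1], T))
x = np.array(x)

def negloglik(theta):
    # minus log L of (3.3.3); the binomial coefficient is written with gammaln
    a, p = theta
    ll = 0.0
    for j in range(1, N + 1):
        T = (1 + a * p**j) / (1 + a * p**(j-1))
        ll += (gammaln(x[j]) - gammaln(x[j-1]) - gammaln(x[j] - x[j-1] + 1)
               + x[j-1] * np.log(T) + (x[j] - x[j-1]) * np.log(1 - T))
    return -ll

res = minimize(negloglik, x0=np.array([2.0, 0.4]),
               bounds=[(1e-3, 50.0), (1e-3, 0.999)])
print(res.x)   # close to (alpha0, p0) when n is large
```

The asymptotics of Theorem 3.3.1 are in n, the number of initial units, which is why a large n is used here.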
Before considering (3.3.4) and (3.3.5) at greater length, let {Y_i(t); t \in I_{0,\infty}}, i \in J_{1,n}, be n independent logistic processes, where Y_i(0) = 1 for each i, and each of the n processes shares the same parameters (\alpha,p) as {X(t)}. Defining, for s < t,
H_{Y_i}(s,t; z,w) = \sum_{m=1}^\infty \sum_{k=m}^\infty P{Y_i(s) = m; Y_i(t) = k} z^k w^m,   (3.3.6)
one has
H_{Y_i}(s,t; z,w) = \sum_{m=1}^\infty P{Y_i(s) = m/Y_i(0) = 1} w^m \sum_{k=m}^\infty P{Y_i(t) = k/Y_i(s) = m} z^k
  = \sum_{m=1}^\infty T_{0,s}(1 - T_{0,s})^{m-1} w^m \sum_{k=m}^\infty \binom{k-1}{m-1} T_{s,t}^m (1 - T_{s,t})^{k-m} z^k
  = T_{0,s} \delta(w,z; T_{s,t})[1 - (1 - T_{0,s})\delta(w,z; T_{s,t})]^{-1},   (3.3.7)
where
\delta(w,z; T_{s,t}) = T_{s,t} wz[1 - z(1 - T_{s,t})]^{-1}.   (3.3.8)
Similarly, defining
H_X(s,t; z,w) = \sum_{m=n}^\infty \sum_{k=m}^\infty P{X(s) = m; X(t) = k/X(0) = n} z^k w^m,   (3.3.9)
one has
H_X(s,t; z,w) = \sum_{m=n}^\infty \binom{m-1}{n-1} T_{0,s}^n (1 - T_{0,s})^{m-n} w^m \sum_{k=m}^\infty \binom{k-1}{m-1} T_{s,t}^m (1 - T_{s,t})^{k-m} z^k
  = [T_{0,s} \delta(w,z; T_{s,t})]^n [1 - (1 - T_{0,s})\delta(w,z; T_{s,t})]^{-n},   (3.3.10)
which implies that
H_X(s,t; z,w) = [H_{Y_i}(s,t; z,w)]^n.   (3.3.11)
Hence, for each t \in I_{0,\infty},
X(t) = \sum_{i=1}^n Y_i(t)   (in distribution).   (3.3.12)
If we rewrite (3.3.3) for the random sequence {X(j) = X_j; j \in J_{1,N}}, we get, apart from an additive term which does not involve (\alpha,p),
log L(\alpha,p; X_1, ..., X_N) = n \phi_0(\alpha,p) + \sum_{j=1}^N X_j \phi_j(\alpha,p),   (3.3.13)
where
\phi_j(\alpha,p) = log{(1 + \alpha p)[\alpha(1 - p)]^{-1}}   if j = 0
  = log{(1 + \alpha p^{j+1})[p(1 + \alpha p^{j-1})]^{-1}}   if j \in J_{1,N-1}
  = log{\alpha p^{N-1}(1 - p)(1 + \alpha p^{N-1})^{-1}}   if j = N.   (3.3.14)
But in view of (3.3.12), this can be written
log L(\alpha,p; Y_1, Y_2, ..., Y_n) = n \phi_0(\alpha,p) + \sum_{j=1}^N \sum_{i=1}^n Y_{i,j} \phi_j(\alpha,p)
  = \sum_{i=1}^n z_i,   (3.3.15)
where
Y_i^T = (Y_{i,1}, Y_{i,2}, ..., Y_{i,N}),   (3.3.16)
Y_{i,j} = Y_i(j).   (3.3.17)
It follows from the discussion leading up to (3.3.12) that
z_i = \phi_0(\alpha,p) + \sum_{j=1}^N \phi_j(\alpha,p) Y_{i,j}
    = log p(y_i; \alpha,p),   (3.3.18)
where
p(y_i; \alpha,p) = P{Y_{i,1} = y_{i,1}, ..., Y_{i,N} = y_{i,N}}.   (3.3.19)
We are now ready to state and prove
Theorem 3.3.1: The equations
\partial log L(\alpha,p; X_1, X_2, ..., X_N)/\partial\alpha = 0
\partial log L(\alpha,p; X_1, X_2, ..., X_N)/\partial p = 0   (3.3.20)
have a solution (\hat{\alpha},\hat{p}) which is weakly consistent for (\alpha,p). Furthermore, the asymptotic distribution of [\sqrt{n}(\hat{\alpha} - \alpha), \sqrt{n}(\hat{p} - p)] as n \to \infty is normal, with null mean vector and variance-covariance matrix \Sigma, where
\Sigma^{-1} = (\sigma^{11}  \sigma^{12})
              (\sigma^{12}  \sigma^{22}),   (3.3.21)
with
\sigma^{11} = p^2(1 + \alpha p)^{-2} - \alpha^{-2}
  + \sum_{j=1}^{N-1} (1 + \alpha)(1 + \alpha p^j)^{-1}[p^{2(j+1)}(1 + \alpha p^{j+1})^{-2} - p^{2(j-1)}(1 + \alpha p^{j-1})^{-2}]
  + (1 + \alpha)(1 + \alpha p^N)^{-1}[\alpha^{-2} - p^{2(N-1)}(1 + \alpha p^{N-1})^{-2}],   (3.3.22)
\sigma^{12} = -(1 + \alpha p)^{-2}
  + \sum_{j=1}^{N-1} (1 + \alpha)(1 + \alpha p^j)^{-1}[(j-1)p^{j-2}(1 + \alpha p^{j-1})^{-2} - (j+1)p^j(1 + \alpha p^{j+1})^{-2}]
  + (1 + \alpha)(1 + \alpha p^N)^{-1}(N-1)p^{N-2}(1 + \alpha p^{N-1})^{-2},   (3.3.23)
and
\sigma^{22} = \alpha^2(1 + \alpha p)^{-2} - (1 - p)^{-2}
  + \sum_{j=1}^{N-1} (1 + \alpha)(1 + \alpha p^j)^{-1}{(j+1)^2\alpha^2 p^{2j}(1 + \alpha p^{j+1})^{-2} - j(j+1)\alpha p^{j-1}(1 + \alpha p^{j+1})^{-1} - p^{-2}
      + (j-1)(j-2)\alpha p^{j-3}(1 + \alpha p^{j-1})^{-1} - (j-1)^2\alpha^2 p^{2(j-2)}(1 + \alpha p^{j-1})^{-2}}
  + (1 + \alpha)(1 + \alpha p^N)^{-1}{[(N-1)(N-2)\alpha p^{N-3}
      - (N-1)^2\alpha^2 p^{2(N-2)}(1 + \alpha p^{N-1})^{-1}](1 + \alpha p^{N-1})^{-1} + (1 - p)^{-2} + (N-1)p^{-2}}.   (3.3.24)
Proof:
Write p = p(y_i; \alpha,p), \theta_1 = \alpha, and \theta_2 = p for convenience. Chanda [5] has shown that the following three conditions are sufficient for the theorem to be true:
(i) For all (\theta_1,\theta_2) \in \Omega = {(0,\infty) x (0,1)},
\partial log p/\partial\theta_i, \partial^2 log p/\partial\theta_i\partial\theta_j, and \partial^3 log p/\partial\theta_i\partial\theta_j\partial\theta_k exist for almost all y_i, and for all i,j,k \in J_{1,2}.
(ii) Defining F(y) to be a summable function when
\sum_{y_i} F(y_i) < \infty,   (3.3.25)
where \sum_{y_i} is the summation over almost all y_i, we must have, for all (\theta_1,\theta_2) \in \Omega,
|\partial p/\partial\theta_i| < F_i(y),  |\partial^2 p/\partial\theta_i\partial\theta_j| < F_{ij}(y),  and  |\partial^3 log p/\partial\theta_i\partial\theta_j\partial\theta_k| < H_{ijk}(y),
where F_i(y) and F_{ij}(y) are summable functions, and
\sum_{y_i} H_{ijk}(y_i) p(y_i; \theta_1,\theta_2) < M,
M being a finite positive constant. This condition must be satisfied for i,j,k \in J_{1,2}.
(iii) For all (\theta_1,\theta_2) \in \Omega, the matrix J = ((J_{i,j}(\theta_1,\theta_2))), where
J_{i,j}(\theta_1,\theta_2) = \sum_{y_1} (\partial log p/\partial\theta_i)(\partial log p/\partial\theta_j) p(y_1; \theta_1,\theta_2),
is positive definite and |J| is finite.
(i) Condition (i) will be established as soon as c_r, \partial c_r/\partial\theta_i, \partial^2 c_r/\partial\theta_i\partial\theta_j, and \partial^3 c_r/\partial\theta_i\partial\theta_j\partial\theta_k are shown to exist for r \in J_{0,N} and i,j,k \in J_{1,2}. This follows from the form of log p(y_i; \alpha,p), which from (3.3.18) is
log p(y_i; \alpha,p) = \sum_{j=0}^N c_j(\alpha,p) y_{i,j},   (3.3.26)
where we define y_{i,0} = 1.
But by inspection of (3.3.14), one sees that c_r(\alpha,p) is the logarithm of a continuous finite function for all (\alpha,p) \in \Omega. Thus it is easily shown that derivatives up to third order exist in \Omega.
(ii) It follows from (3.3.26) that
p(y_i; \alpha,p) = exp{\sum_{j=0}^N c_j(\alpha,p) y_{i,j}}.   (3.3.27)
Thus
\partial p/\partial\theta_j = [\sum_{r=0}^N (\partial c_r/\partial\theta_j) y_{i,r}] exp{\sum_{r=0}^N c_r y_{i,r}}
  = [\sum_{r=0}^N (\partial c_r/\partial\theta_j) y_{i,r}] p(y_i; \theta_1,\theta_2),   (3.3.28)
which is bounded by the summable function
F_j(y_i) = [\sum_{r=0}^N |\partial c_r/\partial\theta_j|] y_{i,N} p(y_i; \theta_1,\theta_2),   (3.3.29)
for from (i) the terms \partial c_r/\partial\theta_j are finite, and E{y_{i,N}} < \infty.
Similarly,
\partial^2 p/\partial\theta_i\partial\theta_j = [\sum_{r=0}^N (\partial^2 c_r/\partial\theta_i\partial\theta_j) y_{k,r}] p(y_k; \theta_1,\theta_2)
  + [\sum_{r=0}^N (\partial c_r/\partial\theta_i) y_{k,r}][\sum_{s=0}^N (\partial c_s/\partial\theta_j) y_{k,s}] p(y_k; \theta_1,\theta_2),   (3.3.30)
which requires (i) and a finite second order moment of y_{k,N} in order to be bounded by the summable function
F_{ij}(y_k) = [\sum_{r=0}^N |\partial^2 c_r/\partial\theta_i\partial\theta_j|] y_{k,N} p(y_k; \theta_1,\theta_2)
  + [\sum_{r=0}^N \sum_{s=0}^N |\partial c_r/\partial\theta_i| |\partial c_s/\partial\theta_j|] y_{k,N}^2 p(y_k; \theta_1,\theta_2).   (3.3.31)
Finally,
\partial^3 log p/\partial\theta_i\partial\theta_j\partial\theta_k = \sum_{r=0}^N (\partial^3 c_r/\partial\theta_i\partial\theta_j\partial\theta_k) y_{s,r}   (3.3.32)
is bounded by
H_{ijk}(y_s) = y_{s,N} \sum_{r=0}^N |\partial^3 c_r/\partial\theta_i\partial\theta_j\partial\theta_k|,   (3.3.33)
which has finite expected value by (i) and the finite expected value of y_{s,N}. Thus (ii) is established.
(iii) Chanda has shown that J will be positive definite as long as p is a density function. That |J| is finite follows from the fact that (\partial log p/\partial\theta_i)(\partial log p/\partial\theta_j) is finite (from (i)) for all (\theta_1,\theta_2) \in \Omega.
Thus the proof of Theorem 3.3.1 is complete. A closed form has not been found for the solution to (3.3.20). Some numerical results using an iterative procedure will be presented in Chapter IV. Also, we attempted to establish stronger consistency properties for (\hat{\alpha},\hat{p}), using results of Billingsley [2] and Wald [27], but have had no success as yet.
3.4 Continuous Observation in [0,T]
Assume that the logistic growth process {X(t); t \in I_{0,\infty}} is observed continuously in [0,T] = I_{0,T}. That is, {X(t); t \in I_{0,T}} will be observed, with X(0) = n known.
The problem of estimating \alpha and p will be approached as follows: first, divide the interval I_{0,T} into N equal intervals of infinitesimal length h. Assume that observations will be made only at the end of each of the N intervals; that is, {X(jh); j \in J_{1,N}} will be observed, where Nh = T. From (3.3.2) we can write the likelihood function for such an observation:
L[\alpha,p; (x_{jh}; j \in J_{1,N})] = \prod_{j=1}^N \binom{x_{jh} - 1}{x_{(j-1)h} - 1} T_{(j-1)h,jh}^{x_{(j-1)h}} (1 - T_{(j-1)h,jh})^{x_{jh} - x_{(j-1)h}},   (3.4.1)
where x_0 = n and T_{s,t} is defined in (3.2.15). Note that to obtain the continuous observation in I_{0,T}, one must let h \to 0 and N \to \infty such that Nh = T. We thus want to consider the properties of log{L[\alpha,p; (X_{jh}; j \in J_{0,N})]} as h \to 0.
We first define the concept of "limit in mean" (l.i.m.). If a sequence of random variables {Z_1, Z_2, ..., Z_m, ...} is such that
\lim_{m \to \infty} E{(Z_m - Z)^2} = 0,   (3.4.2)
then {Z_m} is said to approach Z as a limit in the mean square sense, or simply as a limit in mean. We will write
l.i.m._{m \to \infty} Z_m = Z.   (3.4.3)
We now state
Theorem 3.4.1:
The limit in mean of log{L[\alpha,p; (X_{jh}; j \in J_{0,N})]} as h \to 0 (Nh = T) does not exist. However, for fixed (\alpha_0,p_0) \in \Omega,
l.i.m._{h \to 0, Nh=T} {log L[\alpha,p; (X_{jh}; j \in J_{0,N})] - log L[\alpha_0,p_0; (X_{jh}; j \in J_{0,N})]}
  = \int_0^T (log \xi_t - log \xi_t^0) dX(t) + \int_0^T X(t)[\xi_t^0 - \xi_t] dt,   (3.4.4)
where we define
\xi_t = -\theta \alpha p^t (1 + \alpha p^t)^{-1}   (3.4.5)
and
\xi_t^0 = -\theta_0 \alpha_0 p_0^t (1 + \alpha_0 p_0^t)^{-1}.   (3.4.6)
The stochastic integrals on the right-hand side of (3.4.4) are defined in the usual sense, and remember \theta = log p, so that \xi_t = \lambda(t).
Before proving Theorem 3.4.1, we will need the following lemmas.
Lemma 3.4.1:
\lim_{h \to 0, Nh=T} \sum_{j=1}^N log T_{(j-1)h,jh} = -\int_0^T \xi_t dt.   (3.4.7)
The limit
\lim_{h \to 0, Nh=T} \sum_{j=1}^N log(1 - T_{(j-1)h,jh})
does not exist, but for any fixed (\alpha_0,p_0) \in \Omega,
\lim_{h \to 0, Nh=T} \sum_{j=1}^N h log[(1 - T_{(j-1)h,jh})(1 - T^0_{(j-1)h,jh})^{-1}]
  = \int_0^T (log \xi_t - log \xi_t^0) dt,   (3.4.8)
writing (s < t)
T^0_{s,t} = (1 + \alpha_0 p_0^t)(1 + \alpha_0 p_0^s)^{-1}.   (3.4.9)
Proof of Lemma 3.4.1:
Using (3.2.15),
T_{(j-1)h,jh} = (1 + \alpha p^{jh})(1 + \alpha p^{(j-1)h})^{-1}
  = (1 + \alpha p^{jh})[1 + \alpha p^{jh} exp(-\theta h)]^{-1}
  = [1 + (1 + \alpha p^{jh})^{-1} \alpha p^{jh}(exp(-\theta h) - 1)]^{-1}
  = [1 + h \xi_{jh} + O(h^2)]^{-1}.   (3.4.10)
Hence
log T_{(j-1)h,jh} = -log[1 + h \xi_{jh} + O(h^2)] = -h \xi_{jh} + O(h^2),   (3.4.11)
from which (3.4.7) follows immediately.
But from (3.4.10),
1 - T_{(j-1)h,jh} = [h \xi_{jh} + O(h^2)][1 + h \xi_{jh} + O(h^2)]^{-1},   (3.4.12)
which implies that \lim_{h \to 0} \sum_{j=1}^N log(1 - T_{(j-1)h,jh}) does not exist. Now
(1 - T_{(j-1)h,jh})(1 - T^0_{(j-1)h,jh})^{-1}
  = [h \xi_{jh} + O(h^2)][1 + h \xi^0_{jh} + O(h^2)][h \xi^0_{jh} + O(h^2)]^{-1}[1 + h \xi_{jh} + O(h^2)]^{-1}
  = \xi_{jh}(\xi^0_{jh})^{-1}[1 + O(h)].   (3.4.13)
We substitute (3.4.13) into the left-hand side of (3.4.8) to complete the proof.
Lemma 3.4.2:
If we define, for s ≤ t ≤ u,

    R_s(t,u) = C_s(t,u) − m_{x_s}(s,t) m_{x_s}(s,u),                     (3.4.14)

where

    C_s(t,u) = E[X(u)X(t)/(X(s) = x_s)]                                  (3.4.15)

and m_{x_s}(s,t) is defined in (3.2.10), then

    R_s(t,u) = x_s α(ρ^s − ρ^t)(1 + αρ^s)[(1 + αρ^t)(1 + αρ^u)]^{-1}.    (3.4.16)
Proof of Lemma 3.4.2:
Using (3.2.26),

    C_s(t,u) = E{X(t) E[X(u)/(X(t) = x_t)] /(X(s) = x_s)}
             = (1 + αρ^t)(1 + αρ^u)^{-1} C_s(t,t).                       (3.4.17)

And from Theorem 3.2.1,

    E{[X(t) − x_s][X(t) − x_s − 1]/(X(s) = x_s)}
        = Σ_{j=0}^∞ j(j−1) \binom{x_s+j−1}{j} (T_{s,t})^{x_s} (1 − T_{s,t})^j
        = (x_s+1) x_s (1 − T_{s,t})² (T_{s,t})^{-2}.                     (3.4.18)

Thus

    C_s(t,t) = (x_s+1) x_s (1 − T_{s,t})² (T_{s,t})^{-2}
                 + (2x_s+1) E[X(t)/(X(s) = x_s)] − x_s(x_s+1)
             = x_s (T_{s,t})^{-2} (x_s + 1 − T_{s,t}).                   (3.4.19)

Substitution of (3.4.19) into (3.4.17) gives

    C_s(t,u) = x_s (T_{s,t} T_{s,u})^{-1} (x_s + 1 − T_{s,t}).           (3.4.20)

Then using (3.4.20) in (3.4.14) yields

    R_s(t,u) = x_s (T_{s,t} T_{s,u})^{-1} (1 − T_{s,t}),                 (3.4.21)

which is (3.4.16) upon substitution of (3.2.15) in (3.4.21).
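Since, given X(s) = x_s, the increment X(t) − x_s is negative binomial with parameters x_s and T_{s,t} (Theorem 3.2.1), the moment formula (3.4.19) can be checked by direct summation of the series. A minimal sketch, with arbitrary test values for α, ρ, s, t and x_s (assumptions, not values from the text):

```python
import math

# Series check of (3.4.19): E[X(t)^2 | X(s)=x_s] should equal
# x_s * T^{-2} * (x_s + 1 - T), with T = T_{s,t} = (1+a r^t)/(1+a r^s).
a, r = 4.0, 0.7
s, t, xs = 0.5, 1.5, 6
T = (1 + a * r**t) / (1 + a * r**s)

m1 = m2 = 0.0
for j in range(2000):  # truncated series; the tail is geometrically small
    p = math.comb(xs + j - 1, j) * T**xs * (1 - T)**j
    m1 += (xs + j) * p
    m2 += (xs + j)**2 * p

assert abs(m1 - xs / T) < 1e-9                       # conditional mean x_s/T
assert abs(m2 - xs * (xs + 1 - T) / T**2) < 1e-9     # (3.4.19)
```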
Lemma 3.4.3:
The stochastic integral

    I_1 = l.i.m._{h→0, Nh=T} Σ_{j=1}^N X(jh) ξ_{jh} h                    (3.4.22)

exists in the mean square sense for all (α,ρ) ∈ Ω. We shall write this
limiting random variable

    I_1 = ∫_0^T X(t) ξ_t dt.                                             (3.4.23)
Proof of Lemma 3.4.3:
Using a result from Cramer and Leadbetter [7], sufficient conditions
for the existence of I_1 are that R_0(t,u) be of bounded variation
in [0,T] × [0,T] and that

    J_1 = ∫_0^T ∫_0^T ξ_t ξ_u R_0(t,u) dt du                             (3.4.24)

exist. Since, from (3.4.16),

    0 ≤ R_0(t,u) ≤ x_0 α(1 + α),                                         (3.4.25)

the first of these conditions is satisfied. And since ξ_t is finite for
any (α,ρ) ∈ Ω and all t ∈ [0,T], the integral (3.4.24) exists. Thus
the lemma is proved.
Lemma 3.4.4:
The stochastic integral

    I_2 = l.i.m._{h→0, Nh=T} Σ_{j=1}^N (log ξ_{jh})[X(jh) − X((j−1)h)]   (3.4.26)

exists in the mean square sense for all (α,ρ) ∈ Ω. We shall write this
limiting random variable

    I_2 = ∫_0^T (log ξ_t) dX(t).                                         (3.4.27)
Proof of Lemma 3.4.4:
Again from Cramer and Leadbetter [7], sufficient conditions for
the existence of I_2 are that R_0(t,u) be of bounded variation and that

    J_2 = ∫_0^T ∫_0^T (log ξ_t)(log ξ_u) d_t d_u [R_0(t,u)]              (3.4.28)

exist. The first of these has already been shown, and by differentia-
tion of R_0(t,u), one has

    J_2 = x_0 [αβ(1+α)]² ∫_0^T ∫_0^T (log ξ_t)(log ξ_u) ρ^{t+u}
              × [(1 + αρ^t)(1 + αρ^u)]^{-2} dt du.                       (3.4.29)

The integrand is finite in Ω and [0,T] × [0,T], which completes the
proof.
Proof of Theorem 3.4.1:
From equation (3.4.1),

    log L[α,ρ; (X_{jh}; j ∈ J_{0,N})]
        = Σ_{j=1}^N log \binom{X_{jh} − 1}{X_{jh} − X_{(j-1)h}}
          + Σ_{j=1}^N X_{(j-1)h} log T_{(j-1)h,jh}
          + Σ_{j=1}^N (X_{jh} − X_{(j-1)h}) log[1 − T_{(j-1)h,jh}].      (3.4.30)

The third term on the right-hand side of (3.4.30) has no limit in the
mean square sense because of Lemma 3.4.1, while the first term does not
involve (α,ρ). We thus consider

    log L[α,ρ; (X_{jh}; j ∈ J_{0,N})] − log L[α_0,ρ_0; (X_{jh}; j ∈ J_{0,N})]
        = Σ_{j=1}^N X_{(j-1)h} (log T_{(j-1)h,jh} − log T_{(j-1)h,jh}^0)
          + Σ_{j=1}^N (X_{jh} − X_{(j-1)h})
                log[(1 − T_{(j-1)h,jh})(1 − T_{(j-1)h,jh}^0)^{-1}]       (3.4.31)

for any (α_0,ρ_0) ∈ Ω. Using Lemmas 3.4.3 and 3.4.4, we have

    log L*[α,ρ; α_0,ρ_0; (X(t); t ∈ [0,T])]
        = ∫_0^T X(t)(ξ_t^0 − ξ_t) dt
          + ∫_0^T (log ξ_t − log ξ_t^0) dX(t),                           (3.4.32)

writing

    log L*[α,ρ; α_0,ρ_0; (X(t); t ∈ [0,T])]
        = l.i.m._{h→0, Nh=T} {log L[α,ρ; (X_{jh}; j ∈ J_{0,N})]
              − log L[α_0,ρ_0; (X_{jh}; j ∈ J_{0,N})]}.                  (3.4.33)

This completes the proof of the theorem. The idea of resolving the
limit problem by considering the ratio of likelihood functions, as in
(3.4.31), is from Bartlett [1].
By using the expression (3.3.12), we may rewrite (3.4.32)
to obtain

    log L*[α,ρ; α_0,ρ_0; (X(t); t ∈ [0,T])] = Σ_{i=1}^n Z_{T,i},         (3.4.34)

where

    Z_{T,i} = ∫_0^T Y_i(t)(ξ_t^0 − ξ_t) dt
              + ∫_0^T (log ξ_t − log ξ_t^0) dY_i(t).                     (3.4.35)

{Z_{T,i}; i ∈ J_{1,n}} is a sequence of independent, identically distributed
random variables. We will be interested in

    ∂ log L*/∂α = Σ_{i=1}^n { ∫_0^T [α^{-1} − ρ^t(1 + αρ^t)^{-1}] dY_i(t)
                      − β ∫_0^T ρ^t(1 + αρ^t)^{-2} Y_i(t) dt }           (3.4.36)

and

    ∂ log L*/∂β = Σ_{i=1}^n { β^{-1} ∫_0^T (1 + αρ^t − βt)(1 + αρ^t)^{-1} dY_i(t)
                      − α ∫_0^T ρ^t(1 + αρ^t − βt)(1 + αρ^t)^{-2} Y_i(t) dt }.   (3.4.37)
Since log L* is not the sum of logarithms of probability densities, as is
the usual case for likelihood functions, we will need the following

Lemma 3.4.5:

    E{∂ log L*/∂α} = E{∂ log L*/∂β} = 0.                                 (3.4.38)
Proof:
From (3.4.36),

    E{∂ log L*/∂α}
        = Σ_{i=1}^n { ∫_0^T [α^{-1} − ρ^t(1 + αρ^t)^{-1}] d_t m_1(0,t)
              − β ∫_0^T ρ^t(1 + αρ^t)^{-2} m_1(0,t) dt }
        = Σ_{i=1}^n { (1+α)β ∫_0^T ρ^t(1 + αρ^t)^{-3} dt
              − (1+α)β ∫_0^T ρ^t(1 + αρ^t)^{-3} dt }
        = 0,                                                             (3.4.39)

using the fact that (Cramer and Leadbetter [7])

    E{dY_i(t)} = d_t E{Y_i(t)} = d_t[m_1(0,t)].                          (3.4.40)

Similarly,

    E{∂ log L*/∂β}
        = Σ_{i=1}^n { β^{-1} ∫_0^T (1 + αρ^t − βt)(1 + αρ^t)^{-1} d_t m_1(0,t)
              − α ∫_0^T ρ^t(1 + αρ^t − βt)(1 + αρ^t)^{-2} m_1(0,t) dt }
        = Σ_{i=1}^n { α(1+α) ∫_0^T (1 + αρ^t − βt)ρ^t(1 + αρ^t)^{-3} dt
              − α(1+α) ∫_0^T (1 + αρ^t − βt)ρ^t(1 + αρ^t)^{-3} dt }
        = 0,                                                             (3.4.41)

which completes the proof.
We are now ready to state and prove

Theorem 3.4.2:
The equations

    ∂ log L*[α,β; α_0,β_0; (X(t); t ∈ [0,T])]/∂α = 0,
    ∂ log L*[α,β; α_0,β_0; (X(t); t ∈ [0,T])]/∂β = 0                      (3.4.42)

have a solution (α*, β*) which is weakly consistent for (α,β). Further-
more, the asymptotic distribution of [√n(α* − α), √n(β* − β)] as n → ∞ is
normal, with null mean vector and variance-covariance matrix

    Σ* = [ σ*_11   σ*_12 ]
         [ σ*_12   σ*_22 ] ,

where

    σ*_11 = Δ^{-2}(ω_22² λ_11 − 2ω_12 ω_22 λ_12 + ω_12² λ_22),           (3.4.43)
    σ*_22 = Δ^{-2}(ω_12² λ_11 − 2ω_11 ω_12 λ_12 + ω_11² λ_22),           (3.4.44)

and

    σ*_12 = Δ^{-2}[−ω_12 ω_22 λ_11 + (ω_11 ω_22 + ω_12²) λ_12
                   − ω_11 ω_12 λ_22],                                    (3.4.45)

with, writing θ_1 = α and θ_2 = β,

    ω_ij = −n^{-1} E{∂² log L*/∂θ_i ∂θ_j},                               (3.4.46)
    λ_ij = n^{-1} E{(∂ log L*/∂θ_i)(∂ log L*/∂θ_j)},                     (3.4.47)
    Δ = ω_11 ω_22 − ω_12².                                               (3.4.48)

Proof:
Writing

    log{p*_i[α,β; α_0,β_0; (Y_i(t); t ∈ [0,T])]}
        = ∫_0^T Y_i(t)(ξ_t^0 − ξ_t) dt
          + ∫_0^T (log ξ_t − log ξ_t^0) dY_i(t),                         (3.4.49)

we have from (3.4.34) that

    log L*[α,β; α_0,β_0; (X(t); t ∈ [0,T])]
        = Σ_{i=1}^n log p*_i[α,β; α_0,β_0; (Y_i(t); t ∈ [0,T])].         (3.4.50)
As mentioned prior to the statement of Lemma 3.4.5, we cannot regard
{p*_i; i ∈ J_{1,n}} as a sequence of probability density functions. However,
as we shall see, this is not essential to the proof of the theorem.
In the proofs given by Cramer [6] for the single parameter case
and Chanda [5] for the multiparameter case, the only property of the
density function crucial to the proofs is

    E{∂ log f/∂θ} = 0,                                                   (3.4.51)

where f is the density function and θ some parameter of f. The property
holds as soon as |∂f/∂θ| is bounded by an integrable function (over the
sample space). But in the present case we have, from Lemma 3.4.5,

    E{∂ log L*/∂θ_j} = Σ_{i=1}^n E{∂ log p*_i/∂θ_j} = 0,                 (3.4.52)

which implies

    E{∂ log p*_i/∂θ_j} = 0                                               (3.4.53)

for j ∈ J_{1,2}. Thus the assumption that p*_i is a probability density,
and, in fact, the requirement that |∂p*_i/∂θ_j| be integrable (or summable
in this case) are unnecessary.
We now must establish the three conditions given in the proof of
Theorem 3.3.1 to complete the proof of this theorem.
(i) To see that log p*_i is differentiable up to the third
order for almost all y_i, we need only note that ξ_t
and log ξ_t are differentiable to that order; this is
easily shown using (3.4.5).

(ii) As mentioned, we no longer need the summability condition
for |∂p*_i/∂θ_j|. The additional requirement that |∂²p*_i/∂θ_j∂θ_k| be
bounded by a summable function is used so that

    E{∂²p*_i/∂θ_j∂θ_k} = 0,                                              (3.4.54)

and hence

    E{∂² log p*_i/∂θ_j∂θ_k} = −E{(∂ log p*_i/∂θ_j)(∂ log p*_i/∂θ_k)}.    (3.4.55)

But we shall not make use of the property (3.4.55), so it will not be
necessary to have |∂²p*_i/∂θ_j∂θ_k| bounded by a summable function.
Thus, we need only establish the condition that |∂³ log p*_i/∂θ_j∂θ_k∂θ_l|
< H_{j,k,l}(y_i), where E[H_{j,k,l}(Y_i)] < M, some finite, positive constant.
Using (3.4.49),

    ∂³ log p*_i/∂θ_j∂θ_k∂θ_l
        = −∫_0^T (∂³ξ_t/∂θ_j∂θ_k∂θ_l) Y_i(t) dt
          + ∫_0^T { ξ_t^{-1} ∂³ξ_t/∂θ_j∂θ_k∂θ_l
              − ξ_t^{-2} [ (∂ξ_t/∂θ_j)(∂²ξ_t/∂θ_k∂θ_l)
                  + (∂ξ_t/∂θ_k)(∂²ξ_t/∂θ_j∂θ_l)
                  + (∂ξ_t/∂θ_l)(∂²ξ_t/∂θ_j∂θ_k) ]
              + 2ξ_t^{-3} (∂ξ_t/∂θ_j)(∂ξ_t/∂θ_k)(∂ξ_t/∂θ_l) } dY_i(t).   (3.4.56)

Since, for every (θ_1,θ_2) ∈ Ω, we can easily bound ξ_t and its derivatives
used in (3.4.56) for all t ∈ [0,T], we can find H_{j,k,l}(y_i) as soon as
E[∫_0^T Y_i(t) dt] and E[∫_0^T dY_i(t)] < ∞, which follows from the proof of
Lemma 3.4.5.
(iii) The third condition required for Theorem 3.4.2 to be true
is that J = ((λ_ij(θ_1,θ_2))) is positive definite and that
|J| is finite, where

    λ_ij(θ_1,θ_2) = n^{-1} E{(∂ log L*/∂θ_i)(∂ log L*/∂θ_j)}.            (3.4.57)

From (3.4.36) and (3.4.37),

    nλ_11 = ∫_0^T ∫_0^T [α(1+αρ^t)]^{-1}[α(1+αρ^u)]^{-1} d_{t,u} R_0(t,u)
        + β² ∫_0^T ∫_0^T ρ^{t+u}[(1+αρ^t)(1+αρ^u)]^{-2} R_0(t,u) dt du
        − 2β ∫_0^T ∫_0^T [α(1+αρ^t)]^{-1} ρ^u(1+αρ^u)^{-2}
              d_t R_0(t,u) du,                                           (3.4.58)

    nλ_12 = (αβ)^{-1} ∫_0^T ∫_0^T (1+αρ^t)^{-1}(1+αρ^u−βu)(1+αρ^u)^{-1}
              d_{t,u} R_0(t,u)
        + αβ ∫_0^T ∫_0^T ρ^{t+u}(1+αρ^u−βu)[(1+αρ^t)(1+αρ^u)]^{-2}
              R_0(t,u) dt du
        − ∫_0^T ∫_0^T { ρ^u(1+αρ^u−βu)(1+αρ^t)^{-1}(1+αρ^u)^{-2}
              + (1+αρ^t−βt)(1+αρ^t)^{-1} ρ^u(1+αρ^u)^{-2} }
              d_t R_0(t,u) du,                                           (3.4.59)

and

    nλ_22 = β^{-2} ∫_0^T ∫_0^T (1+αρ^t−βt)(1+αρ^u−βu)
              [(1+αρ^t)(1+αρ^u)]^{-1} d_{t,u} R_0(t,u)
        + α² ∫_0^T ∫_0^T ρ^{t+u}(1+αρ^t−βt)(1+αρ^u−βu)
              [(1+αρ^t)(1+αρ^u)]^{-2} R_0(t,u) dt du
        − 2αβ^{-1} ∫_0^T ∫_0^T (1+αρ^t−βt)(1+αρ^u−βu) ρ^u
              (1+αρ^t)^{-1}(1+αρ^u)^{-2} d_t R_0(t,u) du.                (3.4.60)

For all (α,β) ∈ Ω, R_0(t,u), given by Lemma 3.4.2, is finite, as are its
derivatives. Thus λ_ij is the integral of a finite integrand over a
finite interval, and so is finite. This completes the proof of the
theorem.
Thus we have shown that (α*, β*) have the properties usually asso-
ciated with maximum likelihood estimators, losing only the property that

    E{∂² log p*_i/∂θ_j∂θ_k} = −E{(∂ log p*_i/∂θ_j)(∂ log p*_i/∂θ_k)},    (3.4.61)

that is to say,

    ω_jk = λ_jk.                                                         (3.4.62)

When this property holds, the form of Σ* is greatly simplified; since we
cannot assume (3.4.62) here, calculation of Σ* is tedious. In addition
to the λ_ij's given by (3.4.58)-(3.4.60), we will need

    ω_11 = (α+1)β ∫_0^T ρ^t [α(1+αρ^t)^4]^{-1} dt,                       (3.4.63)

    ω_12 = (α+1) ∫_0^T (1+αρ^t)^{-2} [ρ^t(1+αρ^t)^{-1}
               − βtρ^t(1+αρ^t)^{-2}] dt,                                 (3.4.64)

and

    ω_22 = α(α+1)β^{-1} ∫_0^T ρ^t [(1+αρ^t)^{-1} − βt(1+αρ^t)^{-2}]² dt. (3.4.65)

No closed form has been found for the solution to the estimation
equations obtained by setting (3.4.36) and (3.4.37) equal to zero, nor
could a more convenient form be found for the elements of Σ*. It is
conjectured that iterative numerical methods can be used, but we leave
this particular topic open to further research.
3.5 Arrival Time Observation
Assume that instead of observing the logistic process {X(t)}
itself, we observe only the times at which the process increases; that is,
the "arrival times" are observed. Observation is terminated when the
time of the Nth increase has been recorded. Hence we will observe
{T_j; j ∈ J_{1,N}}, where

    X(t) is  { < n + j   if t < T_j,
             { ≥ n + j   if t ≥ T_j,                                     (3.5.1)

where X(0) = n.
The distribution of this sequence of random variables is easily
obtained using Theorem 3.2.1. For any θ ≥ 0,

    P{T_{j+k} − T_j ≤ θ/(T_i = t_i; i ∈ J_{1,j})}
        = P{X(t_j + θ) − X(t_j) ≥ k/(t_i; i ∈ J_{1,j})}
        = P{X(t_j + θ) − X(t_j) ≥ k/C},                                  (3.5.2)

where

    C:  X(t) = n          if t < t_1,
        X(t) = n + 1      if t_1 ≤ t < t_2,
        ...
        X(t) = n + j − 1  if t_{j-1} ≤ t < t_j,
        X(t) = n + j      if t = t_j.                                    (3.5.3)

Since {X(t)} is Markov, we have

    P{T_{j+k} − T_j ≤ θ/(T_i = t_i; i ∈ J_{1,j})}
        = P{X(t_j + θ) − X(t_j) ≥ k/(X(t_j) = n + j)}
        = P{X(t_j + θ) ≥ n + j + k/(X(t_j) = n + j)}
        = Σ_{i=k}^∞ \binom{n+j+i−1}{i} (T_{t_j,t_j+θ})^{n+j}
              (1 − T_{t_j,t_j+θ})^i.                                     (3.5.4)

Since (3.5.4) is a function of t_j only, it follows that {T_j; j ∈ J_{1,N}}
is a Markov sequence, and from (3.5.4) the distribution function is
given by

    F_{T_{j+1}}(θ; t_j) = P{T_{j+1} ≤ t_j + θ/(T_j = t_j)}
        = 1 − (T_{t_j,t_j+θ})^{n+j},                                     (3.5.5)

requiring θ ≥ 0.
To follow the procedure suggested by Moran [22] for several dif-
ferent birth and death processes, one would calculate the likelihood
function

    L(α,ρ; t_1,...,t_N) = Π_{j=1}^{N-1} f_{T_{j+1}}(t_{j+1} − t_j; t_j), (3.5.6)

where

    f_{T_{j+1}}(θ; t_j) = (d/dθ)[F_{T_{j+1}}(θ; t_j)],                   (3.5.7)
which exists in Ω since F_{T_{j+1}}(θ; t_j) is absolutely continuous
for θ > 0.
However, Moran's approach can be shown unsatisfactory for this
case by observing that, from (3.5.5),

    lim_{θ→∞} F_{T_{j+1}}(θ; t_j)
        = lim_{θ→∞} {1 − [(1 + αρ^{t_j+θ})(1 + αρ^{t_j})^{-1}]^{n+j}}
        = 1 − (1 + αρ^{t_j})^{-(n+j)}.                                   (3.5.8)

Hence there is a nonzero probability that, given T_j = t_j, T_{j+1} will
not be finite. Thus E[T_{j+1}/(T_j = t_j)] does not exist, and the likeli-
hood function given by (3.5.6) is not useful for estimating (α,ρ).
Perhaps a conditional likelihood could be used to get estimation
equations. Further study of the problem is needed.
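The defect described by (3.5.8) can be seen numerically: F_{T_{j+1}}(θ; t_j) plateaus strictly below one as θ grows. A small sketch (the parameter values are arbitrary assumptions):

```python
# Illustration of (3.5.8): the waiting time to the next jump is infinite
# with positive probability, so E[T_{j+1} | T_j = t_j] does not exist.
a, r, n, j, tj = 2.0, 0.9, 10, 3, 5.0

def F(theta):
    # F_{T_{j+1}}(theta; t_j) = 1 - [(1+a r^{t_j+theta})/(1+a r^{t_j})]^{n+j}
    Tst = (1 + a * r ** (tj + theta)) / (1 + a * r ** tj)
    return 1.0 - Tst ** (n + j)

limit = 1.0 - (1 + a * r ** tj) ** (-(n + j))  # right-hand side of (3.5.8)
assert F(50.0) < F(200.0) < limit < 1.0        # increasing, plateau below 1
assert abs(F(200.0) - limit) < 1e-6
```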
3.6 Prediction
Assume that N equispaced observations will be made on the
logistic growth process {X(t); t ∈ [0,∞)}, as in 3.3. Thus, based upon
observations taken on {X(j); j ∈ J_{1,N}}, we will predict the value of the
time series at time N + m, that is, X(N+m).
Restricting our attention to the predictor which minimizes the
mean square error of prediction, we define

    x_N^{(1)}(m) = E[X(N+m)/(X(j) = x_j; j ∈ J_{1,N})]
        = E[X(N+m)/(X(N) = x_N)]
        = m_{x_N}(N, N+m)
        = x_N (1 + αρ^N)(1 + αρ^{N+m})^{-1},                             (3.6.1)

using (3.2.10). Then x_N^{(1)}(m) minimizes the mean square error of pre-
diction, which is

    MSE_1 = E{[X(N+m) − x_N^{(1)}(m)]²}
        = E{R_{X(N)}(N+m, N+m)}
        = E{X(N) α(ρ^N − ρ^{N+m})(1 + αρ^N)(1 + αρ^{N+m})^{-2}}
        = n(1+α)αρ^N(1 − ρ^m)(1 + αρ^{N+m})^{-2},                        (3.6.2)

where X(0) = n.
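The closed form (3.6.2) can be checked against the law of total variance: given X(N) = x, the increment to X(N+m) is negative binomial, so Var[X(N+m)/(X(N) = x)] = x(1 − T)/T² with T = T_{N,N+m}, and MSE_1 = E[X(N)] (1 − T)/T². A sketch with arbitrary test values (assumptions, not values from the text):

```python
# Check of (3.6.2) via the law of total variance.
a, r, n, N, m = 6.0, 0.85, 40, 12, 4

T = (1 + a * r ** (N + m)) / (1 + a * r ** N)        # T_{N,N+m}
mean_XN = n * (1 + a) / (1 + a * r ** N)             # E[X(N)] = m_n(0,N)
mse1_total_var = mean_XN * (1 - T) / T ** 2          # E{Var[X(N+m)|X(N)]}
mse1_closed = n * (1 + a) * a * r ** N * (1 - r ** m) / (1 + a * r ** (N + m)) ** 2
assert abs(mse1_total_var - mse1_closed) < 1e-9
```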
If α and ρ are unknown, consider the predictor

    x_N^{(2)}(m) = x_N (1 + α̂ρ̂^N)(1 + α̂ρ̂^{N+m})^{-1}.                  (3.6.3)

Denoting the mean square error of x_N^{(2)}(m) by MSE_2, it follows that

    MSE_2 − MSE_1 = E{[x_N^{(2)}(m) − x_N^{(1)}(m)]²}
        = E{X²(N)[(1 + α̂ρ̂^N)(1 + α̂ρ̂^{N+m})^{-1}
              − (1 + αρ^N)(1 + αρ^{N+m})^{-1}]²}.                        (3.6.4)

Define

    U = 1 + α̂ρ̂^N,                                                       (3.6.5)
    V = 1 + α̂ρ̂^{N+m},                                                   (3.6.6)
    δα = α̂ − α,                                                          (3.6.7)
    δρ = ρ̂ − ρ.                                                          (3.6.8)

Then

    δU = (1 + α̂ρ̂^N) − (1 + αρ^N)
       = α̂ρ̂^N − αρ^N
       = (δα + α)(δρ + ρ)^N − αρ^N
       = ρ^N δα + Nαρ^{N-1} δρ + O_p(n^{-1}),                            (3.6.9)

making use of the fact that the maximum likelihood estimators are weakly
consistent. Similarly,

    δV = (1 + α̂ρ̂^{N+m}) − (1 + αρ^{N+m})
       = ρ^{N+m} δα + (N+m)αρ^{N+m-1} δρ + O_p(n^{-1}).                  (3.6.10)

Hence

    UV^{-1} = [δU + (1 + αρ^N)][δV + (1 + αρ^{N+m})]^{-1}
        = (1 + αρ^N)(1 + αρ^{N+m})^{-1} + (1 + αρ^{N+m})^{-1} δU
          − (1 + αρ^N)(1 + αρ^{N+m})^{-2} δV + O_p(n^{-1}).              (3.6.11)

Now we can get an approximation for (3.6.4) by taking

    MSE_2 − MSE_1 = E{X²(N)[UV^{-1} − (1 + αρ^N)(1 + αρ^{N+m})^{-1}]²}
        ≈ E{X²(N)} E{[UV^{-1} − (1 + αρ^N)(1 + αρ^{N+m})^{-1}]²}
        = C_0(N,N)(1 + αρ^{N+m})^{-2}
              E{[δU − (1 + αρ^N)(1 + αρ^{N+m})^{-1} δV]²}
        = n[n + α(1 − ρ^N)(1 + α)^{-1}](1 + α)² ρ^{2N}
              [(1 + αρ^N)(1 + αρ^{N+m})]^{-2}
          × { σ*_11 [1 − (1 + αρ^N)(1 + αρ^{N+m})^{-1} ρ^m]²
              + 2σ*_12 αρ^{-1} [N − (2N+m)ρ^m(1 + αρ^N)(1 + αρ^{N+m})^{-1}
                  + (N+m)ρ^{2m}(1 + αρ^N)²(1 + αρ^{N+m})^{-2}]
              + σ*_22 α²ρ^{-2} [N − (N+m)(1 + αρ^N)(1 + αρ^{N+m})^{-1} ρ^m]² }.   (3.6.12)
Perhaps (3.6.12) is only a crude approximation for (3.6.4), but inspec-
tion of MSE_1 (3.6.2) and the approximation for MSE_2 − MSE_1 (3.6.12) leads
to several interesting observations:

1. MSE_1 is proportional to ρ^N; that is, MSE_1 decreases exponen-
tially as N increases. This is intuitively plausible, since the corre-
lation between X(N) and X(N+m) is given by

    Corr[X(N), X(N+m)] = R_0(N, N+m)[R_0(N,N) R_0(N+m, N+m)]^{-1/2}
        = [(1 − ρ^N)(1 − ρ^{N+m})^{-1}]^{1/2}.                           (3.6.13)

Hence the correlation is a monotonically increasing function of N, and
it follows that X(N) will contain more information about X(N+m) when
N is large; thus the decrease in prediction error.

2. MSE_1 is proportional to n. Thus, contrary to the error
associated with the maximum likelihood estimators of the logistic param-
eters, the error of prediction increases with the initial size of the
process. This is perhaps partially explained by the fact that the
process variance at any time is proportional to n (Lemma 3.4.2);
heuristically, the population or process size is less deterministic for
large populations.

3. MSE_2 − MSE_1 is proportional to ρ^{2N}. Hence the additional
prediction error incurred by parameter estimation also decreases as
N increases, and at a faster rate than MSE_1.

4. MSE_2 − MSE_1 is proportional to n.

Observation (4) is surprising if true; but it depends most
directly upon the accuracy of approximation (3.6.12). Thus, though
this may be just another of the logistic process' strange characteristics,
one must question the credibility of (3.6.12). Rather, one expects the
increase in prediction error to be of order one, following the general
pattern that a decrease in estimator variance produces a corresponding
decrease in the ratio of MSE_2 − MSE_1 to MSE_1.
CHAPTER IV
AN EMPIRICAL COMPARISON
4.1 Introduction
Rhodes [25] has given a method of estimating the logistic
parameters which receives wide application. We present his method in
this chapter, with an attempt to examine the estimators' sampling prop-
erties. These shall be shown to be analytically intractable; thus
a Monte Carlo experiment is performed in order to compare these esti-
mators with those presented in the last chapter. Finally, the logistic
model will be fitted to the population of the conterminous United States,
using both Rhodes' and maximum likelihood estimation procedures.
4.2 Rhodes' Estimators
Recall that the mean function for the logistic growth process,
given by (3.2.12), is

    E[X(t)/(X(0) = n)] = m_n(0,t) = n(1+α)(1 + αρ^t)^{-1}.               (3.2.12)

Upon inspection, it is seen that

    [m_n(0,t+1)]^{-1} = ρ[m_n(0,t)]^{-1} + (1−ρ)[n(1+α)]^{-1}.           (4.2.1)

Rhodes thus defines

    Y(t) = [X(t)]^{-1}                                                   (4.2.2)

and, observing the random variables Y_1, Y_2, ..., Y_N (Y_j = Y(j)), he finds
estimators for ρ and γ in

    y_{t+1} = ρy_t + γ + v_t,    t ∈ J_{1,N-1},                          (4.2.3)

where

    γ = (1−ρ)[n(1+α)]^{-1}                                               (4.2.4)

and y_t is the observed value of Y_t.
The estimators, obtained by minimizing

    S = Σ_{t=1}^{N-1} [y_{t+1} − (ρy_t + γ)]²,                           (4.2.5)

are

    ρ_r = [Σ_{t=1}^{N-1} (y_{t+1} − ȳ_2)(y_t − ȳ_1)]
              [Σ_{t=1}^{N-1} (y_t − ȳ_1)²]^{-1}                          (4.2.6)

and

    γ_r = ȳ_2 − ρ_r ȳ_1,                                                 (4.2.7)

where

    ȳ_1 = (N−1)^{-1} Σ_{t=1}^{N-1} y_t                                   (4.2.8)

and

    ȳ_2 = (N−1)^{-1} Σ_{t=1}^{N-1} y_{t+1}.                              (4.2.9)

Using (4.2.4), one gets an estimate for α:

    α_r = (1 − ρ_r)(nγ_r)^{-1} − 1.                                      (4.2.10)

Rhodes then shows by application that one obtains reasonably
close agreement between projection and reality, using these estimated
values. To my knowledge no probabilistic study of the properties of
these estimators has been made.
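Rhodes' procedure is ordinary least squares on the reciprocal series, and when the observations lie exactly on the mean curve m_n(0,t) the relation (4.2.1) holds without error, so (ρ_r, α_r) recover (ρ, α) exactly. A minimal sketch (function and variable names are my own, not Rhodes'):

```python
# Sketch of (4.2.3)-(4.2.10): least squares on y_t = 1/x_t.
def rhodes(x, n):
    y = [1.0 / v for v in x]                      # reciprocal series, (4.2.2)
    y1, y2 = y[:-1], y[1:]                        # y_t and y_{t+1}
    K = len(y1)
    yb1, yb2 = sum(y1) / K, sum(y2) / K
    rho = sum((b - yb2) * (a - yb1) for a, b in zip(y1, y2)) \
          / sum((a - yb1) ** 2 for a in y1)       # slope, (4.2.6)
    gamma = yb2 - rho * yb1                       # intercept, (4.2.7)
    alpha = (1 - rho) / (n * gamma) - 1           # invert (4.2.4), i.e. (4.2.10)
    return rho, alpha

n, alpha, rho = 100, 5.0, 0.8
x = [n * (1 + alpha) / (1 + alpha * rho ** t) for t in range(1, 31)]  # exact mean curve
rho_r, alpha_r = rhodes(x, n)
assert abs(rho_r - rho) < 1e-8 and abs(alpha_r - alpha) < 1e-8
```

With noisy data from a simulated process, the same function gives the Rhodes estimates compared in the Monte Carlo experiments of 4.4.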
4.3 Sampling Properties of Rhodes' Estimators
Define, for 0 ≤ s ≤ t and u > 0,

    β_n(s,t) = E[Y(t)/(Y(s) = n^{-1})],                                  (4.3.1)
    η_n(s,t) = E[Y²(t)/(Y(s) = n^{-1})],                                 (4.3.2)

and

    τ_n(s,t,t+u) = E[Y(t+u)Y(t)/(Y(s) = n^{-1})].                        (4.3.3)

Now fix s, t and u, and write β_n, η_n and τ_n for (4.3.1), (4.3.2) and
(4.3.3), respectively. We now prove
Theorem 4.3.1:
Writing

    T = (1 + αρ^t)(1 + αρ^s)^{-1},                                       (4.3.7)
    β*_n = β_n(s, t+u),                                                  (4.3.8)

and defining

    ν = T(T − 1)^{-1},                                                   (4.3.9)

we have:

(i)   β_n = ν^n log T − Σ_{j=1}^{n-1} j^{-1} ν^{n-j};                    (4.3.4)

(ii)  η_n = ν^{n-1} η_1 − ν^n Σ_{j=1}^{n-1} j^{-1} [log T
                − Σ_{k=1}^{j-1} k^{-1} ν^{-k}],
      where η_1 = T(1−T)^{-1} Σ_{j=1}^∞ j^{-2} (1−T)^j;                  (4.3.5)

and

(iii) τ_{n+1} = ν[τ_n − n^{-1} β*_n].                                    (4.3.6)
Proof:
It follows from Theorem 3.2.1 that

    P{Y(t) = j^{-1}/(Y(s) = n^{-1})} = P{X(t) = j/(X(s) = n)}
        = \binom{j−1}{j−n} T^n (1−T)^{j-n},    j = n, n+1, ...,          (4.3.11)

and zero otherwise.

(i) Thus

    β_n = Σ_{j=n}^∞ j^{-1} \binom{j−1}{j−n} T^n (1−T)^{j-n}
        = Σ_{j=0}^∞ (j+n)^{-1} \binom{j+n−1}{n−1} T^n (1−T)^j,           (4.3.12)

and

    β_{n+1} = Σ_{j=n+1}^∞ j^{-1} \binom{j−1}{n} T^{n+1} (1−T)^{j-n-1}
        = n^{-1} ν Σ_{j=0}^∞ [n(j+n)^{-1} − 1] \binom{j+n−1}{n−1} T^n (1−T)^j
        = ν(β_n − n^{-1}).                                               (4.3.13)

Notice that

    β_1 = E[Y(t)/(Y(s) = 1)] = Σ_{j=1}^∞ j^{-1} T(1−T)^{j-1}
        = ν log T.                                                       (4.3.14)

Now, defining the generating function

    P_β(z) = Σ_{n=1}^∞ β_n z^n,                                          (4.3.15)

we have, using (4.3.13) and (4.3.14),

    P_β(z) = β_1 z + νz P_β(z) − νz Σ_{n=1}^∞ n^{-1} z^n.                (4.3.16)

Rearranging,

    P_β(z) = (1 − νz)^{-1} νz [log T + log(1−z)],

which implies that

    β_n = ν^n log T − Σ_{j=1}^{n-1} j^{-1} ν^{n-j},

as was to be shown.

(ii) Similarly,

    η_n = Σ_{j=0}^∞ (j+n)^{-2} \binom{j+n−1}{n−1} T^n (1−T)^j,           (4.3.17)

and

    η_{n+1} = Σ_{j=0}^∞ (j+n+1)^{-2} \binom{j+n}{n} T^{n+1} (1−T)^j
        = n^{-1} ν Σ_{j=0}^∞ (j+n)^{-1} [n(j+n)^{-1} − 1]
              \binom{j+n−1}{n−1} T^n (1−T)^j
        = ν(η_n − n^{-1} β_n).                                           (4.3.18)

Noticing that

    η_1 = E[Y²(t)/(Y(s) = 1)] = Σ_{j=1}^∞ j^{-2} T(1−T)^{j-1}
        = T(1−T)^{-1} Σ_{j=1}^∞ j^{-2} (1−T)^j,                          (4.3.19)

a dilogarithmic sum which may be evaluated from the tables in Davis [8],
we define the generating function

    P_η(z) = Σ_{n=1}^∞ η_n z^n.                                          (4.3.20)

It then follows from (4.3.18) that

    P_η(z) = η_1 z + νz [P_η(z) − Σ_{n=1}^∞ n^{-1} β_n z^n],             (4.3.21)

which implies that

    η_n = ν^{n-1} η_1 − Σ_{j=1}^{n-1} j^{-1} β_j ν^{n-j}.                (4.3.22)

Upon substitution of (4.3.19) and (4.3.4), this is (4.3.5).
(iii) From (4.3.3),

    τ_n = Σ_{j=0}^∞ Σ_{k=0}^∞ [(j+n)(k+j+n)]^{-1}
              × P{X(t+u) = k+j+n/(X(t) = j+n)}
                P{X(t) = j+n/(X(s) = n)},                                (4.3.23)

and

    τ_{n+1} = Σ_{j=0}^∞ Σ_{k=0}^∞ [(j+n+1)(k+j+n+1)]^{-1}
              × P{X(t+u) = k+j+n+1/(X(t) = j+n+1)}
                P{X(t) = j+n+1/(X(s) = n+1)}
        = n^{-1} ν Σ_{j=0}^∞ Σ_{k=0}^∞ (k+j+n)^{-1} [n(j+n)^{-1} − 1]
              × P{X(t+u) = k+j+n/(X(t) = j+n)}
                P{X(t) = j+n/(X(s) = n)}
        = ν[τ_n − n^{-1} β*_n],                                          (4.3.24)

as was to be shown. This completes the proof of Theorem 4.3.1.
The reason generating functions were not employed to further
resolve (4.3.24) is that no closed form could be found for τ_1.
It is not difficult to use Theorem 4.3.1 to obtain approximations
for the first order sampling properties of Rhodes' estimators. The
expressions are tedious, the computing process was extremely time con-
suming (and hence expensive), and the few numerical results obtained
were nonsensical. This is probably due to rounding errors resulting
from the great number of calculations involved, or perhaps from the
inaccuracy of approximations made in order to obtain these expected
values.
Second order properties of Rhodes' estimators would involve
fourth order moments of {Y(t)}, which are intractable. However, one
can perform Monte Carlo experiments to at least compare the estimators
for specific processes.
4.4 Monte Carlo Experiments
A computer program was written for the IBM 360 which
(1) simulates a logistic growth process given n, α, ρ and N;
(2) uses this sequence to calculate the maximum likelihood
estimates given in 3.3, using the Newton-Raphson iterative
procedure;
(3) calculates Σ̂, the asymptotic variance-covariance matrix
for these estimators;
(4) calculates Rhodes' estimates for α and ρ, using the method
given in 4.2.
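Step (1) can be sketched as follows. By Theorem 3.2.1 the one-step transition is negative binomial, which can be simulated by summing geometric waiting times; the helper below is my own construction for illustration, not the IBM 360 program described in the text.

```python
import math, random

# Simulate the logistic growth process at unit spacing: given X(t)=x, the
# increment to X(t+1) is the number of failures before x successes with
# success probability T_t = (1 + a*r^{t+1})/(1 + a*r^t).
def simulate_logistic(n, a, r, N, seed=1):
    random.seed(seed)
    x, path = n, [n]
    for t in range(N):
        T = (1 + a * r ** (t + 1)) / (1 + a * r ** t)
        # each geometric draw: failures before one success, prob T
        jumps = sum(int(math.log(1.0 - random.random()) / math.log(1 - T))
                    for _ in range(x))
        x += jumps
        path.append(x)
    return path

path = simulate_logistic(n=40, a=10.0, r=0.8, N=30)   # settings of Table 4.1
means = [40 * 11.0 / (1 + 10.0 * 0.8 ** t) for t in range(31)]  # m_n(0,t)
assert all(b >= a for a, b in zip(path, path[1:]))    # pure birth: nondecreasing
assert path[0] == 40 and path[-1] < 10 * means[-1]    # crude sanity bound
```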
The results of several of the experiments are presented. The format is
as follows: first, a table is given which lists the generated logistic
sequence and the sequence of expected values. This table is followed
by a graph depicting both the generated sequence and the expected
sequence. Finally, a second table gives the maximum likelihood esti-
mates and the corresponding asymptotic variance-covariance matrix,
using the first third, first two-thirds, and all of the generated
sequence, respectively. Rhodes' estimators are also calculated for
each case.
Three such experiments are presented, each with a different
value of n and α. We have conducted several other experiments, and
those presented here are exemplary. Changes in ρ do not significantly
affect the results, except that rounding errors in the computations
affected the maximum likelihood estimate for ρ (ρ̂). These cases
are practically uninteresting, though, since for small ρ the process
quickly approaches an asymptotic value, a characteristic uncommon to
most economic and other real sequences.
Note that Rhodes' estimates are unreliable when N is small,
whereas the effect of N on the maximum likelihood estimates is negli-
gible. In addition, we could obtain asymptotic confidence intervals
for the parameters, using the maximum likelihood estimates and Σ̂. We
also note that the bias in α̂ seems to depend primarily on the relationship
Table 4.1
Generated and Expected Sequence for n = 40, α = 10.0, and ρ = 0.8

     i    x_i   m_n(0,i)        i    x_i   m_n(0,i)
     1     55     59.46        16    267    359.13
     2     58     71.90        17    276    372.84
     3     72     86.34        18    291    384.58
     4     84    102.88        19    296    394.52
     5     94    121.50        20    302    402.84
     6    110    142.07        21    307    409.76
     7    128    164.32        22    309    415.48
     8    144    187.86        23    313    420.16
     9    164    212.18        24    314    423.98
    10    176    236.69        25    318    427.09
    11    192    260.79        26    319    429.61
    12    211    283.92        27    321    431.65
    13    225    305.60        28    321    433.30
    14    244    325.48        29    324    434.62
    15    262    343.35        30    326    435.68
[Figure] Fig. 4.1. Generated and Expected Sequence for n = 40, α = 10.0,
and ρ = 0.8 (o observed value; • expected value).
Table 4.2
Maximum Likelihood Estimates and Rhodes' Estimates for Data in
Table 4.1 (n = 40, α = 10.0 and ρ = 0.8)

    N = 10:  α̂ = 5.909,   ρ̂ = 0.794;    Σ̂ = [ 81.242   0.603    ]
                                              [  0.603   0.00493  ]
             α_r = 69.464,  ρ_r = 0.873

    N = 20:  α̂ = 7.134,   ρ̂ = 0.810;    Σ̂ = [ 2.801    0.0170        ]
                                              [ 0.0170   0.182 × 10⁻³  ]
             α_r = 9.747,   ρ_r = 0.851

    N = 30:  α̂ = 7.045,   ρ̂ = 0.808;    Σ̂ = [ 3.247    0.0162        ]
                                              [ 0.0162   0.112 × 10⁻³  ]
             α_r = 8.290,   ρ_r = 0.845
Table 4.3
Generated and Expected Sequence for n = 100, α = 5.0, and ρ = 0.8

     i    x_i   m_n(0,i)        i    x_i   m_n(0,i)
     1    146    142.86        16    645    539.28
     2    179    168.54        17    655    550.42
     3    225    196.85        18    668    559.67
     4    265    227.41        19    677    567.30
     5    304    259.66        20    682    573.55
     6    339    292.89        21    686    578.65
     7    386    326.29        22    691    582.80
     8    425    359.05        23    696    586.16
     9    472    390.40        24    697    588.88
    10    509    419.73        25    698    591.07
    11    534    446.56        26    700    592.83
    12    566    470.63        27    702    594.25
    13    591    491.84        28    702    595.39
    14    607    510.24        29    702    596.31
    15    629    525.98        30    703    597.04
[Figure] Fig. 4.2. Generated and Expected Sequence for n = 100, α = 5.0,
and ρ = 0.8 (o observed value; • expected value).