BAYESIAN MODELING OF NONSTATIONARITY IN
NORMAL AND LOGNORMAL PROCESSES WITH
APPLICATIONS IN CVP ANALYSIS AND
LIFE TESTING MODELS
By
JORGE IVAN VELEZ-AROCHO
A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF
THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1978
Copyright 1978
by
Jorge Ivan Velez-Arocho
This dissertation stands as a symbol of love
to my wife, Angie, and to my daughter,
Angeles Maria, without whose understanding,
patience and willingness to accept
sacrifice this investigation
would have been quite
impossible.
ACKNOWLEDGMENTS
I would like to acknowledge my full indebtedness to those
people who gave their interest, time and effort to making this
dissertation possible.
To Dr. Christopher B. Barry who has been my advisor and my
friend, I wish to express my gratitude and deepest appreciation for the
support he has given me throughout the development of this study. He
criticized but tolerated my mistakes and encouraged my good performance.
His intelligent guidance, extraordinary competence, and friendly attitude
have been a source of inspiration and encouragement for me.
I am especially grateful to Dr. Antal Majthay for his sincere
advice and assistance during the supervision of my doctoral program and
the preparation of this dissertation. I admire and am inspired by his
unreserved dedication to excellence in education. He will always be
remembered as one of the most valuable models of excellent teaching.
The other members of my committee, Dr. Tom Hodgson and Dr. Zoran
Pop-Stojanovic, have each in his own way contributed to the successful
completion of this work. Appreciation is extended to each for his
individual efforts and expressed concern for my progress. Although not on
my committee, I would also like to express appreciation to Dr. Gary
Koehler, whose support and encouragement came when they were badly
needed.
To Omar Ruiz, Dean of the School of Business Administration
of the University of Puerto Rico at Mayaguez, I am particularly grateful
for his understanding, confidence and cooperation during my leave of
absence from that institution. Completion of this study was only pos
sible because of the combined financial support of the University of
Puerto Rico, the University of Florida and Peter Eckrich and Sons Co..
Their continuing support is sincerely appreciated.
I am indebted to Dr. Conrad Doenges, Chairman of the Department
of Finance of the University of Texas at Austin, for his interest and
help and to the many members of the Finance faculty for their interest
during my period of research at the University of Texas. Special thanks
go to Nettie Webb for her warm friendship and continuous secretarial
assistance to my wife.
It is difficult to adequately convey the support my family has
provided. My parents, Jorge Velez and Elba Lucrecia Arocho, and my
brothers and sisters provided understanding and moral assistance for
which I will always be grateful. Their high expectations and constant
encouragement have been a powerful factor in shaping my desire to pursue
this degree.
Most of all a gratitude which cannot be expressed in words
goes to my loving wife, Angie, for her patience and persistence in
typing this dissertation and for her wonderful attitude throughout
the entire arduous process.
TABLE OF CONTENTS

                                                                      Page

ACKNOWLEDGMENTS ..................................................      iv

LIST OF APPENDIX TABLES ..........................................      ix

LIST OF FIGURES ..................................................       x

ABSTRACT .........................................................      xi

Chapter

ONE     INTRODUCTION
        1.1  Introduction
        1.2  Summary of Results and Overview of Dissertation

TWO     SURVEY OF PERTINENT LITERATURE
        2.1  Cost-Volume-Profit (CVP) Analysis
        2.2  Life Testing Models
             2.2.1  Introduction
             2.2.2  Some Common Life Distributions
             2.2.3  Traditional Approach to Life Testing Inferences
             2.2.4  Bayesian Techniques in Life Testing
        2.3  Modeling of Nonstationary Processes

THREE   NONSTATIONARITY IN NORMAL AND LOGNORMAL PROCESSES
        3.1  Introduction
        3.2  Bayesian Analysis of Normal and Lognormal Processes
        3.3  Nonstationary Model for Normal and Lognormal Means
             3.3.1  μ is Unknown and σ² is Known
             3.3.2  μ and σ² Both Unknown
             3.3.3  Stationary Versus Nonstationary Results
        3.4  Conclusion

FOUR    LIMITING RESULTS AND PREDICTION INTERVALS FOR
        NONSTATIONARY NORMAL AND LOGNORMAL PROCESSES
        4.1  Introduction
        4.2  Special Properties and Limiting Results Under
             Nonstationarity ....................................      86
             4.2.1  Limiting Behavior of m' and n' When μ is
                    the Only Unknown Parameter ..................      86
             4.2.2  Limiting Behavior of m', n', v' and d'
                    When Both Parameters μ and σ² are Unknown ...      95
        4.3  Prediction Intervals for Normal, Student, Lognormal
             and Log-Student Distributions .....................     103
        4.4  Conclusion ........................................     117

FIVE    NONSTATIONARITY IN CVP AND STATISTICAL LIFE ANALYSIS ...     119
        5.1  Introduction ......................................     119
        5.2  Nonstationarity in Cost-Volume-Profit Analysis ....     120
             5.2.1  Existing Analysis ..........................     120
             5.2.2  Nonstationary Bayesian CVP Model ...........     122
             5.2.3  Extensions to the Nonstationary Bayesian
                    CVP Model ..................................     136
        5.3  Nonstationarity in Statistical Life Analysis ......     140
             5.3.1  Existing Analysis ..........................     140
             5.3.2  A Life Testing Model Under Nonstationarity .     141
        5.4  Conclusion ........................................     148

SIX     CONCLUSIONS, LIMITATIONS AND FURTHER STUDY .............     150
        6.1  Summary ...........................................     150
        6.2  Limitations .......................................     152
        6.3  Suggestions for Further Research ..................     155

APPENDIXES

I       Bayesian Analysis of Normal and Lognormal Processes ....     160

II      Nonstationary Models for the Exponential Distribution ..     172

III     Algorithm to Determine Prediction Intervals for Lognormal
        and Log-Student Distributions ..........................     185

LIST OF REFERENCES .............................................     198

BIOGRAPHICAL SKETCH ............................................     213
LIST OF APPENDIX TABLES
Table Page
1.      Predictive Intervals for Some Lognormal Predictive
        Distributions ............................................. 191

2.      Predictive Intervals for Some Log-Student Predictive
        Distributions ............................................. 192
LIST OF FIGURES
Figure Page
1. Life Characteristics of Some Systems...................... 21
AIII.1 Predictive Distribution.................................. 186
AIII.2 Predictive Distribution .................................. 187
AIII.3 Predictive Distribution.................................. 188
AIII.4 Predictive Distribution.................................. 189
Abstract of Dissertation Presented to the Graduate Council
of the University of Florida in Partial Fulfillment of the Requirements
for the Degree of Doctor of Philosophy
BAYESIAN MODELING OF NONSTATIONARITY IN
NORMAL AND LOGNORMAL PROCESSES WITH
APPLICATIONS IN CVP ANALYSIS AND
LIFE TESTING MODELS
By
Jorge Ivan Velez-Arocho
June 1978
Chairman: Christopher B. Barry
Major Department: Management
Probability models applied by decision makers in a wide variety
of contexts must be able to provide inferences under conditions of change.
A stochastic process whose probabilistic properties change through time
can be described as a nonstationary process. In this dissertation a model
involving normal and lognormal processes is developed for handling a
particular form of nonstationarity within a Bayesian framework. Two
uncertainty conditions are considered: in one the location parameter, μ,
is assumed to be unknown and the spread parameter, σ, is assumed to be
known; and in the other both parameters are assumed to be unknown.
Comparing the nonstationary model with the stationary one, it is shown that:
1. more uncertainty (of a particular definition) is present
under nonstationarity than under stationarity;
2. since the variance of a lognormal distribution, V(x), is a
function of μ and σ², nonstationarity in μ means that both the mean and
variance of the random variable, x, are nonstationary so that the lognormal
case provides a generalization of the normal results;
and
3. as additional observations are collected uncertainty about
stochastically-varying parameters is never entirely eliminated.
The asymptotic behavior of the model has important implications
for the decision maker. An implication of the stationary Bayesian model
for normal and lognormal processes is that as additional observations are
collected, parameter uncertainty is reduced and (in the limit) eliminated
altogether. In contrast, for the nonstationary model considered in this
dissertation the following inferential results are obtained:
1. for the case of a lognormal or normal model, a particular form
of stochastic parameter variation implies a treatment of data involving
the use of all observations in a differential weighting scheme;
and
2. random parameter variation produces important differences in
the limiting behavior of the prior and predictive distributions since
under nonstationarity the limiting values of the parameters of the
posterior and predictive distributions cannot be determined clearly.
Practical implications of the results for the areas of
Cost-Volume-Profit Analysis and life testing are discussed with emphasis on
the predictive distribution for the outcome of a future observation from
the data generating process. It is emphasized that Cost-Volume-Profit
(CVP) and life testing models ideally should include the changing
character of the process by allowing for changes in the parametric
description of the process through time. Failure to recognize
nonstationarity when it is present has a number of implications in the
CVP and life-testing
contexts that are explored in the dissertation. For example, inferences
are improperly obtained if the nonstationarity is ignored, and prediction
interval coverage probabilities are overstated since uncertainty is
greater (in a particular sense) when nonstationarity is present.
CHAPTER ONE
INTRODUCTION
1.1 Introduction
Uncertainty is an essential and intrinsic part of the human
condition. The opinions we express, the conclusions we reach and the
decisions we make are often based on beliefs concerning the probability
of uncertain events such as the result of an experiment, the future value
of an investment or the number of units to be sold next year. If manage
ment, for instance, were certain about what circumstances would exist at
a given time, the preparation of a forecast would be a trivial matter.
Virtually all situations faced by management involve uncertainty, however,
and judgments must be made and information must be gathered to reduce
this uncertainty and its effects. One of the functions of applied mathe
matics is to provide information which may be used in making decisions
or forming judgments about unknown quantities.
Several early studies by econometricians and statisticians
examined the problem of constructing a model whose output is as
close as possible to the observed data from the real system and which
reflects all the uncertainty that the decision maker has. Mathematical
models for statistical problems, for instance, have some element of un
certainty incorporated in the form of a probability measure. The model
usually involves the formulation of a probability distribution of the
uncertain quantities. This element of uncertainty is carried through
the analysis to the inferences drawn. The equations that form the mathe
matical model are usually specified to within a number of parameters
or coefficients which must be estimated. The unknown parameters are
usually assumed to be constant and the problem of model identification
is reduced to one of constant parameter estimation.
There are several reasons for suspecting that the parameters
of many models constructed by engineers and econometricians are not
constant but in fact time-varying. For instance, it has become
increasingly clear that to assume that behavioral and technological
relationships are stable over time is, in many cases, completely
untenable on the basis
of economic theory. Several recent studies provide support for the claim
that the parameters of distributions of stockpricerelated variables may
change over time [see Barry and Winkler (1976)]. In engineering, particu
larly in reliability theory, the origins of parameter variation are usually
not very hard to pinpoint. Component wear, variation in inputs or compo
nent failure are some very common reasons for parameter variations. The
major objective of construction of engineering models is control and regu
lation of the real system modeled. Therefore, much of the research in
that area has concentrated on devising ways to make the output of the
model insensitive to parameter variation. Similarly, in forecasting models
for economic variables, researchers have had great concern with time varying
parameters of the distributions of interest. In this area the problem of
varying parameters has received increased attention because there is
increasing evidence that the common regression assumption of stable
parameters often appears invalid.
In this dissertation we plan to study a particular type of random
parameter variation which is likely to be applicable when nonstationarity
over time is present. The modeling of nonstationarity that we are going to
present assumes that successive values in time of the unknown parameter
are related in a stochastic manner; i.e., the parameter variation includes
a component which is a realization of some random process. For purposes
of estimation we are interested in specific realizations of the random
process. When the process generating the unknown parameter is a nonsta
tionary process over time the decision maker should be concerned with
a sequence of values of the parameter instead of a single value as in
the usual stationary model; i.e., inferences and decisions concerning the
parameter should reflect the fact that it is changing over time.
If the values of an unknown parameter over time are related
in a stochastic manner, a formal analysis of the situation requires
some assumptions about the stochastic relationship. For the model of
nonstationarity that we develop in this dissertation, the specification
of the stochastic relationship between values of the parameter is suf
ficient. Moreover it is assumed that this relationship is stationary
(usually referred to as secondorder stationarity) in the sense that the
stochastic relationship is the same for any pair of consecutive values
of the unknown parameter.
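As a concrete sketch of such a stochastic relationship, the fragment below simulates a normal process whose mean follows a random walk with stationary normal shocks. The function name and parameter values are purely illustrative, not the dissertation's notation; it is one simple form that the assumed second-order-stationary relationship between consecutive parameter values can take.

```python
import random

def simulate_nonstationary_normal(n, mu0, tau, sigma, seed=7):
    """Draw x_t ~ N(mu_t, sigma^2), where the mean itself evolves as
    mu_t = mu_(t-1) + e_t with e_t ~ N(0, tau^2): the stochastic
    relationship between consecutive parameter values is the same in
    every period (stationary shocks), while mu_t itself wanders."""
    rng = random.Random(seed)
    mu, xs, mus = mu0, [], []
    for _ in range(n):
        mu += rng.gauss(0.0, tau)        # parameter shifts between periods
        xs.append(rng.gauss(mu, sigma))  # observation at the current mean
        mus.append(mu)
    return xs, mus

xs, mus = simulate_nonstationary_normal(200, mu0=100.0, tau=0.5, sigma=2.0)
```

A decision maker observing only `xs` faces a sequence of parameter values `mus` rather than a single fixed mean.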
We want to gain more precise information about the structure
of the timevarying parameters and to obtain estimated relationships
that are suitable for forecasting. The model to be developed makes it
possible to draw inferences about the structure of the relationship at
every point in time. There are problems in accounting, life testing theory,
finance and a variety of other areas that can benefit from nonstationary
parameter estimation techniques.
1.2 Summary of Results and Overview of Dissertation
The goals of this dissertation are to develop a rigorous model
for handling nonstationarity within a Bayesian framework, to compare
inferences from stationary and nonstationary models, and to investigate
inferential applications in the areas of Cost-Volume-Profit Analysis and
life testing models involving nonstationarity. Probably the most important
advantage of the new work to be presented in this dissertation is the
increased versatility it adds to the nonstationary Bayesian model derived
by Winkler and Barry (1973). The new results enlarge the range of real and
important problems involving univariate and multivariate nonstationary
normal and lognormal processes which can be handled. Another advantage
is the simplicity of the updating methods for the efficient handling of
the estimation of unknown parameters and the prediction of the outcome
of a future sample.
A survey of the most relevant literature is provided in Chapter
Two to set the stage for the new developments in the remainder of the
dissertation. In this survey we present an overview of probabilistic
Cost-Volume-Profit (CVP) Analysis and discuss the most important articles
that deal with CVP under conditions of uncertainty. The review of the
literature includes a section on life testing models emphasizing the
Bayesian techniques used in life testing. It is emphasized that most
of the research done in these two areas neglects the problem of nonsta
tionarity. A special section is presented to discuss some important
articles about modeling nonstationary processes.
As is mentioned in Chapter Two, most research concerned with
the normal and lognormal distributions has considered only stationary
situations. That is, the parameters and distributions used are assumed
to remain the same in all periods. In Chapter Three we develop a Bayesian
model of nonstationarity for normal and lognormal processes. In it we
describe essential features of the Bayesian analysis of normal and
lognormal processes under nonstationarity, such as the prior, posterior
and predictive distributions. Two uncertainty conditions are considered in
this chapter: in one the location parameter, μ, is assumed to be unknown
and the spread parameter, σ, is assumed to be known; and in the other,
both parameters are assumed to be unknown. Comparing the nonstationary
model with the stationary one it is shown that:
1. more uncertainty (of a particular definition) is present
under nonstationarity than under stationarity;
2. since the variance of a lognormal distribution, V(x), is a
function of μ and σ², nonstationarity in μ means that both the mean and
variance of the random variable, x, are nonstationary, so that the
lognormal case provides a generalization of the normal results;
and,
3. as additional observations are collected, uncertainty about
stochastically-varying parameters is never entirely eliminated.
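The third result can be illustrated numerically for the known-σ² case under a hypothetical random-walk mean with shock variance τ² (an assumed form, chosen only for illustration): each period the prior variance is inflated by τ² before conditioning on the new observation, so the posterior variance settles at a positive fixed point instead of shrinking toward zero. A minimal sketch:

```python
def update(m, v, x, sigma2, tau2):
    """One period of conjugate normal updating when the mean drifts:
    diffuse the prior variance by tau2, then condition on x."""
    v_pred = v + tau2                # parameter drift adds uncertainty
    k = v_pred / (v_pred + sigma2)   # weight given to the new observation
    return m + k * (x - m), (1.0 - k) * v_pred

m, v = 0.0, 10.0
for x in [1.2, 0.8, 1.1, 0.9, 1.0] * 40:          # 200 observations
    m, v = update(m, v, x, sigma2=1.0, tau2=0.25)
# With tau2 = 0 the posterior variance v would shrink toward zero; with
# tau2 > 0 it converges to the positive root of
#     v = (v + tau2) * sigma2 / (v + tau2 + sigma2),
# here roughly 0.39 -- parameter uncertainty is never eliminated.
```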
The results discussed in Chapter Three have to do with the
period-to-period effects of random parameter variation upon the posterior
and predictive distributions. However, the asymptotic behavior of the model has
important implications for the decision maker. An implication of the sta
tionary Bayesian model for normal and lognormal processes is that as addi
tional observations are collected parameter uncertainty is reduced and
(in the limit) eliminated altogether. Such an implication is inconsistent
with observed real world behavior largely because the conditions under
which inferences are made typically change across time. The common dictum
[see Dickinson (1974)] has been to eliminate some observations in the case
of changing parameters so that only those most recent observations are
considered. In Chapter Four we show that:
1. for the case of a lognormal or normal model, a particular
form of stochastic parameter variation implies a treatment of data
involving the use of all observations in a differential weighting scheme,
and,
2. random parameter variation produces important differences
in the limiting behavior of the prior and predictive distributions since
under nonstationarity the limiting values of some of the parameters of the
posterior and predictive distributions cannot be determined clearly.
One objective of this dissertation is to develop Bayesian
prediction intervals for future observations that come from normal and
lognormal data generating processes. In Chapter Four we address the problem
of constructing prediction intervals for normal, Student, lognormal and
log-Student distributions. It is pointed out that these intervals are
easy to construct for the normal and Student distributions but rather
difficult for the lognormal and log-Student distributions. An
algorithm is presented to compute the Bayesian prediction intervals for
the lognormal and log-Student distributions. Bayesian prediction intervals
under nonstationarity are compared with classical, certainty equivalent
and Bayesian stationary intervals.
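The contrast can be sketched as follows, assuming (for illustration) a normal predictive distribution N(m, v + σ²) for the next observation, with v the posterior variance of the mean. The equal-tailed interval is immediate in the normal case and transfers to the lognormal case by exponentiation; the shortest (highest-density) lognormal interval is asymmetric and is what requires the numerical algorithm. Names and arguments here are illustrative.

```python
import math
from statistics import NormalDist

def normal_pred_interval(m, v, sigma2, coverage=0.95):
    """Equal-tailed interval for the next observation of a normal
    process: the predictive distribution is N(m, v + sigma2)."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    s = math.sqrt(v + sigma2)
    return m - z * s, m + z * s

def lognormal_pred_interval(m, v, sigma2, coverage=0.95):
    """Equal-tailed interval for a lognormal outcome x = exp(y):
    exponentiate the normal interval for y. The highest-density
    interval is shorter and asymmetric and needs a numerical search."""
    lo, hi = normal_pred_interval(m, v, sigma2, coverage)
    return math.exp(lo), math.exp(hi)
```

Greater predictive variance (v + σ² inflated by nonstationarity) widens both kinds of interval, which is the comparison drawn in the text.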
In Chapter Five we discuss the application of the results of
Chapters Three and Four concerning nonstationarity to the area of CVP
analysis and life testing models. Practical implications of our results
for these two areas are discussed with emphasis on the predictive dis
tribution for the outcome of a future observation from the data generating
process. It is emphasized that CVP and life testing models ideally
should include the changing character of the process by allowing for
changes in the parametric description of the process through time. It
is shown that, for the case of normal and lognormal data generating
processes under a particular form of stochastic parameter variation, the
presence of nonstationarity produces greater uncertainty for the decision
maker. Nonstationarity implies greater uncertainty, which is reflected
by an increase in the predictive variance of profits for CVP models,
by an increase in the predictive variance of life length for life testing
models, and by an increase in the width of intervals required to attain
particular coverage probabilities.
Chapter Six provides conclusions, limitations and suggestions
for further research. Since stationarity assumptions are often quite
unrealistic, it is concluded in that chapter that the introduction of
possible nonstationarity greatly increases the realism and applicability
of statistical inference methods, in particular of Bayesian procedures.
CHAPTER TWO
SURVEY OF PERTINENT LITERATURE
The primary purpose of the research in this dissertation is
to present a Bayesian model of nonstationarity in normal and lognormal
processes with applications in Cost-Volume-Profit analysis and life
testing models. A survey of the most relevant literature is provided
in the chapter and will serve to set the stage for the new developments
in the remainder of the thesis.
In this survey, three areas are covered. In Section 2.1 we present
an overview of probabilistic Cost-Volume-Profit (CVP) analysis and
discuss the most important articles that deal with CVP under conditions
of uncertainty. In Section 2.2 we discuss life testing models with an
emphasis on the exponential, gamma, Weibull and lognormal models. The
review of the literature includes a special section on Bayesian techniques
used in life testing. Finally in Section 2.3 a survey is presented of
some important articles about modeling nonstationary processes.
2.1 Cost-Volume-Profit (CVP) Analysis
Management requires realistic and accurate information to
aid in decision making. Cost-Volume-Profit (CVP) analysis is a widely
accepted generator of information useful in decision making processes.
CVP analysis essentially consists in examining the relationship
between changes in volume (output) and changes in profit. The
fundamental assumption in all types of CVP decisions is that the firm,
or a department or other type of costing unit, possesses a fixed set
of resources that commits the firm to a certain level of fixed costs
for at least a shortrun period. The decision problem facing a manager
is to determine the most efficient and productive use of this fixed
set of resources relative to output levels and output mixes. The scope
of CVP analysis ranges from determination of the optimal output level
for a singleproduct department to the determination of optimal output
mix of a large multiproduct firm. All these decisions rely on simple
relationships between changes in revenues and costs and changes in
output levels or mixes. All CVP analyses are characterized by their
emphasis on cost and revenue behavior over various ranges of output
levels and mixes.
The determination of the selling price of a product is a
complex matter that is often affected by forces partially or entirely
beyond the control of management. Nevertheless, management must formu
late pricing policies within the bounds permitted by the market place.
Accounting can play an important role in the development of policy
by supplying management with special reports on the relative profit
ability of its various products, the probable effects of contemplated
changes in selling price and other CVP relationships.
The unit cost of producing a commodity is affected by such
factors as the inherent nature of the product, the efficiency of oper
ations, and the volume of production. An increase in the quantity
produced is ordinarily accompanied by a decrease in unit cost,
provided the volume attained remains within the limits of plant capacity.
Quantitative data relating to the effect on income of changes in
unit selling price, sales volume, production volume, production costs,
and operating expenses help management to improve the relationships
among these variables. If a change in selling price appears to be de
sirable or, because of competitive pressure, unavoidable, the possible
effect of the change on sales volume and product cost needs to be
considered.
A mathematical expression of the profit equation of CVP
analysis is:
(2.1.1)    Z = Q(P - V) - F,
where Z = total profits,
Q = sales volume in units,
P = unit selling price,
V = unit variable cost,
and F = total fixed costs.
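Equation (2.1.1), together with the break-even volume it implies, can be evaluated directly; a minimal sketch, with function names and numbers invented for illustration:

```python
def cvp_profit(q, p, v, f):
    """Deterministic CVP profit: Z = Q(P - V) - F."""
    return q * (p - v) - f

def breakeven_volume(p, v, f):
    """Volume at which profit is zero: Q* = F / (P - V)."""
    return f / (p - v)

# Example: 10,000 units at a $15 unit price, $9 unit variable cost,
# and $40,000 total fixed costs.
z = cvp_profit(10_000, 15.0, 9.0, 40_000.0)       # 10,000*(15-9) - 40,000
q_star = breakeven_volume(15.0, 9.0, 40_000.0)    # 40,000 / 6
```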
This accounting model of analysis has been traditionally
used by the management accountant in profit planning. This use, however,
typically ignores the uncertainty associated with the firm's operation,
thus severely limiting its applicability. During the past 12
years, accountants have attempted to resolve this problem by intro
ducing stochastic aspects into the analysis.
The applicability of probabilistic models for this analysis
has been claimed because of the realism of such models, i.e., deci
sions are always accompanied by uncertainty. Thus, the ideal model
is one that gives a probability distribution of the criterion variable,
profit, and that fully recognizes the uncertainty faced by the firm.
The realism of such a model is dependent on logical assumptions for
the input variables and rigorous methodology in obtaining the output
distribution. Further, we hope that the model can accommodate a wide
range of uses. For example, the capability to handle dependence among
input variables adds a highly useful dimension.
Jaedicke and Robichek (1964) first introduced risk into the
model. They assumed the following relation among the means
(2.1.2)    E(Z) = E(Q) [E(P) - E(V)] - E(F),
where E(.) denotes mathematical expectation.
In addition they assumed that the key variables were all normally
distributed and that the resulting profit is also normally distributed.
Thus, by computing the mean value and standard deviation of the re
sulting profit function, various probabilistic measures of profit
can be obtained. This model has been depicted as a limit analysis,
since the assumptions of independent model parameters and normality
of the resulting profit function are not true except in
limiting cases. According to Ferrara, Hayya and Nachman (1972), the
product of two normally and independently distributed variables will
approximate normality if the sum of the two coefficients of variation
is less than or equal to .12.
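A small Monte Carlo sketch of the Jaedicke-Robichek setup makes the relation among the means checkable: with independent normal inputs, simulated profits average out to E(Q)[E(P) - E(V)] - E(F). All parameter values below are invented for illustration.

```python
import random
import statistics

def simulate_profit(n, mq, sq, mp, sp, mv, sv, f, seed=1):
    """Monte Carlo draws of Z = Q(P - V) - F with independent normal
    Q, P and V and deterministic F (the Jaedicke-Robichek setup)."""
    rng = random.Random(seed)
    return [rng.gauss(mq, sq) * (rng.gauss(mp, sp) - rng.gauss(mv, sv)) - f
            for _ in range(n)]

zs = simulate_profit(20_000, mq=10_000, sq=500, mp=15.0, sp=0.5,
                     mv=9.0, sv=0.3, f=40_000.0)
# Under independence E(Z) = E(Q)[E(P) - E(V)] - E(F) = 20,000 here,
# so the sample mean should land near that value.
mean_profit = statistics.mean(zs)
```

The simulated profit distribution can also be inspected directly, rather than assumed normal, which is the concern raised by Ferrara, Hayya and Nachman.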
Others have confronted the same problem of how to identify
the resulting profit distribution when it is not close to a normal
distribution. They have noted that it is often difficult to obtain
analytical expressions for the product of random variables. Because
the appropriate distributional forms for the product of the variable
functions may not be known, Buzby (1974) suggests the application of
Tchebycheff's theorem to stochastic Cost-Volume-Profit analysis. This
theorem, however, permits the analyst to derive only some very crude
bounds on the probabilities of interest, so its value as a
decision-making tool is limited. Liao (1975) illustrated how model sampling
(also called distribution sampling) coupled with a curvefitting
technique can be used to overcome the above problems associated
with stochastic CVP analysis. In his paper, the illustration of the
proposed approach to stochastic CVP analysis is first developed through
a consideration of the Jaedicke-Robichek problem, wherein the model
parameters are independent and normally distributed. After that, the
illustration problem is modified to accommodate dependent and nonnormal
variates in the problem.
Hilliard and Leitch (1975) developed a model for CVP analysis
assuming a more tractable distribution for the inputs of the equation.
It allows for dependent relationships and permits a rigorous deriva
tion of the distribution of profit. The problems of assuming price and
quantity to be independent are pointed out. The authors also pointed
out that assuming sales to be normally distributed implies a positive
probability of negative sales.
Probabilities and tolerance intervals for the Hilliard and
Leitch model are obtained from tables of the normal distribution.
The only assumptions required for the model are (1) quantity and
contribution margin are lognormally distributed random variables
and (2) fixed costs are deterministic. The assumption that sales
quantity and contribution margin are bivariate lognormally distributed
eliminates the possibility of negative sales and of selling prices
below variable costs, and it has the nice additional property that the
product of two bivariate lognormal random variables is also lognormal.
Thus, we can allow for uncertainty in price and quantity and still
have a closed form expression for the probability distribution of
gross profits. Hilliard and Leitch cannot assume that price and
variable costs are marginally lognormally distributed and have
contribution margin also be lognormally distributed. Similarly, if fixed
costs are assumed to be lognormally distributed too, net profits will
not be lognormally distributed.
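The closure property holds because the logarithm turns the product into a sum of bivariate normal variables, and sums of bivariate normals are normal. A sketch of the parameter arithmetic (function names are illustrative):

```python
import math

def lognormal_product_params(mu_q, s2_q, mu_m, s2_m, rho):
    """If (ln Q, ln M) is bivariate normal, then ln(QM) = ln Q + ln M
    is normal, so gross profit QM is again lognormal. Returns the
    normal parameters (mean, variance) of ln(QM), with rho the
    correlation between ln Q and ln M."""
    mu = mu_q + mu_m
    s2 = s2_q + s2_m + 2.0 * rho * math.sqrt(s2_q * s2_m)
    return mu, s2

def lognormal_mean(mu, s2):
    """E(X) for X lognormal with ln X ~ N(mu, s2)."""
    return math.exp(mu + s2 / 2.0)
```

No such closure holds for sums: ln(P - V) is not the sum of ln P and ln(-V), which is why marginally lognormal price and variable cost cannot yield a lognormal contribution margin, and why adding lognormal fixed costs destroys lognormality of net profit.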
Adar, Barnea and Lev (1977) presented a model for CVP
analysis under uncertainty that combines the probability characteristics
of the environment variables with the risk preferences of decision
makers. The approach is based on recently suggested economic models
of the firm's optimal output decision under uncertainty, which were
modified within the mean-standard deviation framework to provide for
a cost-volume-utility analysis allowing management to: (1) determine
optimal output, (2) consider the desirability of alternative plans
involving changes in fixed and variable costs, expected price and
uncertainty of price and technology changes and (3) determine the
economic consequences of fixed cost variances.
Dickinson (1974) addresses the problem of CVP analysis under
uncertainty by examining the reliability of using the usual methods
of estimating the means and variances of the past distributions of
sales demand. He emphasized that, when the expectation and variance
of profits are estimated from past data, it is important to differen
tiate between what, in fact, are estimated and what are true values
of the parameters. In other words, he pointed out that the estimated
expectation of profits, Ê(π), reflects estimation risk and is not
equal to E(π). Classical confidence intervals were used for the
expected value of profits, E(π), for the variance of profits, Var(π),
and for probabilities of various profit levels. However, Dickinson
misinterpreted the classical confidence intervals that he obtains in
his paper. When a classicist constructs a 90 percent confidence interval
for μ, for example, he would state that in the long run, 90 percent
of all such intervals will contain the true value of μ. The classical
statement is based on longrun frequency considerations. The classicist
is absolutely opposed to the interpretation that the 90 percent refers
to the probability that the true universe mean lies within the specified
interval. In the eyes of a classicist, a unique true value exists for
the universe mean, and therefore the value of the universe mean can
not be treated as a random variable. Dickinson's paper also illus
trates the difficulty of obtaining the probability statements of
greatest interest to management in a classical approach. His analysis
is only able to provide confidence intervals of probabilities of
profit levels rather than the profit level probabilities themselves.
The problem of parameter uncertainty has been neglected by
those who have studied CVP analysis under uncertainty. In the
Bayesian approach, uncertainty regarding the parameters of probability
models is reflected in prior and posterior probability statements
regarding the parameters. Marginal distributions of variables which
depend on those parameters may be obtained by integrating out the
distribution of the parameters, thereby obtaining predictive distri
butions [see Roberts (1965) and Zellner (1971)] of the quantities of
interest to the manager. These predictive distributions permit one
to make valid probability statements regarding the important quan
tities, such as profits.
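The predictive idea can be made concrete with a small numerical sketch. The following is purely illustrative (the profit figures, the normal model with known process variance, and the conjugate normal prior on the unknown mean are all assumptions, not results from the studies cited above); it integrates out uncertainty about the mean of profits and compares a predictive probability statement with its plug-in counterpart, which ignores estimation risk:

```python
import math
import random

random.seed(1)

# Hypothetical setting: monthly profit is normal with unknown mean mu and
# known standard deviation sigma; a conjugate normal prior is placed on mu.
sigma = 10.0
data = [random.gauss(50.0, sigma) for _ in range(12)]  # simulated "observed" profits

m0, s0 = 40.0, 20.0              # prior: mu ~ N(m0, s0^2)
n = len(data)
xbar = sum(data) / n
prec = 1 / s0**2 + n / sigma**2                    # posterior precision of mu
m_n = (m0 / s0**2 + n * xbar / sigma**2) / prec    # posterior mean of mu
s_n = math.sqrt(1 / prec)                          # posterior std. dev. of mu

# Predictive distribution of the next profit: N(m_n, sigma^2 + s_n^2).
# Integrating out the parameter inflates the variance beyond sigma^2.
pred_sd = math.sqrt(sigma**2 + s_n**2)

def norm_cdf(x, mu, sd):
    return 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))

breakeven = 45.0
p_pred = 1 - norm_cdf(breakeven, m_n, pred_sd)   # valid probability statement
p_plug = 1 - norm_cdf(breakeven, xbar, sigma)    # certainty-equivalent plug-in
```

The predictive standard deviation always exceeds the process standard deviation, which is exactly the estimation risk that the plug-in calculation suppresses.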
Nonstationarity is another important aspect related to CVP
analysis that no one has considered. In a world that is continually
changing, it is important to recognize that the parameters that
describe a process at a particular point in time may not do so at
a later point in time. In the case of the variable sales, for instance,
experience shows that it is typically affected by a variety of eco
nomic and political events. Thus, a CVP model ideally should include
the changing character of the process by allowing for changes in the
parametric description of the process through time. Failure to recog
nize the nonstationary conditions may result in misleading inferences.
In this dissertation the problem of CostVolumeProfit analy
sis will be considered from a Bayesian viewpoint, and inferences under
a special case of nonstationarity will be considered. Also the Bayesian
results under nonstationarity will be compared with those results
that can be obtained under a stationary Bayesian model, and the Baye
sian model will be compared with some alternative approaches.
2.2 Life Testing Models
2.2.1 Introduction
The development of recent technology has given special impor
tance to several problems concerning the improvement of the effective
ness of devices of various kinds. It is often important to impose
extraordinarily high standards on the performance of these devices,
since a failure in the performance could bring disastrous consequences.
The quality of production plays an important role in today's life. An
interruption in the operation of a regulating device can lead not
only to deterioration in the quality of a manufactured product but
also to damage of the industrial process. From a purely economic view
point high reliability is desirable to reduce costs. However, since
it is costly to achieve high reliability, there is a tradeoff. The
failure of a part or component results not only in the loss of the
failed item but often results in the loss (at least temporarily) of
some larger assembly or system of which it is part. There are nu
merous examples in which failures of components have caused losses
of millions of dollars and personal losses. The space program is an
excellent example where even the lives of some astronauts were lost
due to failure in the system. The following authors have considered
the statistical theory of reliability and provide a good set of re
ferences on the subject: Mendenhall (1958), Buckland (1960), Birnbaum
(1962), Govindarajulu (1964), Mann, Schaefer and Singpurwalla (1973),
and Canfield and Borgman (1975).
Reliability theory is the discipline that deals with procedures
to ensure the maximum effectiveness of manufactured articles
and that develops methods of evaluating the quality of systems from
known qualities of their component parts. A large number of problems
in reliability theory have a mathematical character and require the
use of mathematical tools and the development of new ones for their
solution. Areas like probability theory and mathematical statistics
are necessary to solve some of the problems found in reliability
theory. No matter how hard the company works to maintain constant
conditions during a production process, fluctuations in the production
factors lead to a significant variation in the properties of the
finished products. In addition, articles are subjected to different
conditions in the course of their use. To maintain and to increase
the reliability of a system or of an article requires both material
expenditures and scientific research.
Statistical theory and methodology have played an influen
tial role in the development of reliability theory since the publi
cation of the paper by Epstein and Sobel (1953). Four statistical
concepts provide the basis for estimating relevant parameters and
testing hypotheses about the life characteristic of the subject
matter. These concepts are:
(i) the distribution function of some variable which is a
direct or indirect measure of the response (life time) to usage in
a particular environment;
(ii) the associated probability density (or frequency)
function;
(iii) the survival probability function; and
(iv) the conditional failure rate.
A failure distribution provides a mathematical description
of the length of life of a device, structure or material. Consider
a piece of equipment which has been in a given environment, e. The
fatigue life of this piece of equipment is defined to be the length
of time, T(e), this piece of equipment operates before it fails. Full
information about e would fully determine T(e), so that given e, T(e)
would not be random. One source of randomness in life is in uncertainty
about the environment, i.e., T(e) is a random variable because e is
random. Equipment has different survival characteristics depending on
the conditions under which it is operated, and e provides a statement
of what conditions are but does not determine T(e) fully.
The reliability of an operating system is defined as the
probability that the system will perform satisfactorily within
specified conditions over a given future time period when the system
starts operating at some time origin. Different distributions can
be distinguished according to their failure rate function, which
is known in the literature of reliability as a hazard rate [see
Barlow and Proschan (1965)]. The hazard rate (denoted by h), which
is a function of time, gives the conditional density of failure at
time t, with the hypothesis that the unit has been functioning
without failure up to that point in time. The conditional failure rate
is defined as:
(2.2.1) h(t) = f(t)/[1 − F(t)] = f(t)/R(t),
where
(2.2.2) F(t) = Prob(T ≤ t) = ∫₋∞ᵗ f(s) ds
is the probability that an observed value of T will be less than or
equal to an assigned number t. The reliability function (also called
the survival function) of the random variable T gives the probability
that T will exceed t and is defined by
(2.2.3) R(t) = 1 − F(t) = Prob(T > t).
The probability density function of the random variable T, f(t),
0 < t < ∞, is known as the failure density function of the device.
It can be shown that the conditional failure rate and the distribu
tion function of a random variable are related by
(2.2.4) F(t) = 1 − exp[−∫₀ᵗ h(s) ds].
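Relation (2.2.4) is easy to check numerically. The sketch below assumes, purely for illustration, a Weibull hazard with shape 2 and scale 1 (a distribution discussed in Section 2.2.2.3), and recovers the distribution function by integrating the hazard rate:

```python
import math

# Illustrative Weibull hazard: shape n = 2, scale sigma = 1, location 0.
n_shape, sigma = 2.0, 1.0

def h(t):
    # hazard rate h(t) = (n/sigma) * (t/sigma)**(n-1)
    return (n_shape / sigma) * (t / sigma) ** (n_shape - 1)

def F_exact(t):
    # Weibull distribution function
    return 1 - math.exp(-(t / sigma) ** n_shape)

def F_from_hazard(t, steps=20000):
    # trapezoidal approximation of F(t) = 1 - exp(-integral_0^t h(s) ds)
    dt = t / steps
    integral = sum(0.5 * (h(i * dt) + h((i + 1) * dt)) * dt
                   for i in range(steps))
    return 1 - math.exp(-integral)

# the two routes to F(t) agree to high accuracy
assert abs(F_exact(1.5) - F_from_hazard(1.5)) < 1e-9
```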
The causes of failure can be categorized into three basic
types. It is recognized, however, that there may be more than one
contributing cause to a particular failure and that, in some cases,
there may be no completely clearcut distinction between some of the
causes. The three classes of failure are infant mortalities, or
early failures, random failures and wearout failures. The behavior
of the hazard rate as a function of time is sometimes known as the
hazard function or life characteristic of the system. For a typical
system that may experience any of the three previously described types
of failure, the life characteristic will appear as in Figure 1. The
representation of the life characteristic has been classically referred
to as the "bathtub curve", wherein the three segments of the curve
represent the three time periods of initial, chance and wearout failure.
[Figure 1. Life characteristics of some systems: the hazard rate
plotted against time, with segments for the initial, random and
wearout failure periods.]
The initial failure period is characterized by a high hazard rate
shortly after time t=0 and a gradual reduction during the initial
period of operation. During the chance failure period, the hazard
rate is constant and generally lower than during the initial period.
The cause of this failure is attributed to unusual and unpredictable
environmental conditions occurring during the operating time of the
system or of the device. The hazard rate increases during the wearout
period. This failure is associated with the gradual depletion of a
material or an accumulation of shocks and so on.
In the following subsections we will consider the general
properties of some widely used life distributions, the assessment
and use of those distributions, and the literature related to Bayesian
methods in life testing.
2.2.2 Some Common Life Distributions
2.2.2.1 The Exponential Distribution
In the case of a constant failure rate the distribution of
life is exponential. This case has received the most emphasis in the
literature, since, in spite of theoretical limitations, it presents
attractive statistical properties and is highly tractable. Data
arising from life tests under laboratory or service conditions are
often found to conform to the exponential distribution.
An acceptable justification for assuming an exponential
distribution in life studies was initially presented by Davis
(1952). More recently Barlow and Proschan (1965) have advanced a mathe
matical argument to support the plausibility of the exponential dis
tribution as the failure law of complex equipment. The random variable
T has an exponential distribution if it has a probability density
function of the form
(2.2.5) fT(t) = σ⁻¹ exp[−(t−θ)/σ], t > θ, σ > 0.
The mean and variance of T are (θ + σ) and σ², respectively. In most
applications θ is taken as zero. For this distribution, the physical
interpretation of a constant hazard function is that, irrespective of
the time elapsed since the start of operation of a system, the
probability that the system fails in the next time interval dt,
given that it has survived to time t, is independent of the elapsed
time t and is constant.
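This constant-hazard interpretation is the memoryless property, which a short sketch can verify directly (the scale value below is an arbitrary illustration): the conditional survival probability over an interval dt is the same no matter how much time has already elapsed.

```python
import math

sigma = 2.0  # illustrative scale parameter (mean life), with theta = 0

def R(t):
    # reliability function of the exponential distribution
    return math.exp(-t / sigma)

# P(T > t + dt | T > t) = R(t + dt)/R(t) does not depend on the elapsed t
dt = 0.5
cond_survival = [R(t + dt) / R(t) for t in (0.0, 1.0, 5.0, 20.0)]
assert all(abs(c - cond_survival[0]) < 1e-12 for c in cond_survival)
```

Each conditional probability equals exp(−dt/σ), regardless of the starting time.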
2.2.2.2 The Gamma Distribution
An extremely useful distribution in fatigue and wearout
studies is the gamma distribution. It also has a very important rela
tionship to the exponential distribution, namely, that the sum of n
independent and identically distributed (i.i.d.) exponential random
variables with common parameters θ=0 and σ is a random variable that
has a gamma distribution with parameters n and σ. Hence, the exponen
tial distribution is a special case of the gamma with n=1.
The random variable T has a gamma distribution if its pro
bability density function is of the form,
(2.2.6) fT(t) = {(t−θ)^(n−1) exp[−(t−θ)/σ]} / [σⁿ Γ(n)]; n > 0,
σ > 0,
t > θ.
The standard form of the distribution is obtained by putting σ=1 and
θ=0, giving
(2.2.7) fT(t) = [t^(n−1) exp(−t)] / Γ(n), t > 0;
where the gamma function, denoted Γ, is a mapping of the interval
(0,∞) into itself and is defined by
(2.2.8) Γ(n) = ∫₀^∞ t^(n−1) exp(−t) dt.
The probability distribution function of (2.2.7) is
(2.2.9) Prob[T ≤ t] = [Γ(n)]⁻¹ ∫₀ᵗ x^(n−1) exp(−x) dx.
Since a distribution of the form given in equation (2.2.6)
can be obtained from standardized distributions, as in equation (2.2.7),
by the linear transformation t=(t'−θ)/σ, there is no difficulty in
deriving formulas for moments, generating functions, etc., for equation
(2.2.6) from those for equation (2.2.7).
One of the most important properties of the distribution is
the reproductive property; if T1 and T2 are independent random variables
each having a distribution of the form (2.2.7), possibly with different
values n', n'' of n but with common values of σ and θ, then (T₁ + T₂)
also has a distribution of this form, with the same values of σ and
θ, and with n = n' + n''.
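The relationship between the exponential and gamma distributions noted at the start of this subsection can be illustrated by simulation. The sketch below uses arbitrary illustrative values of n and σ and checks that the sum of n i.i.d. exponential variables has the gamma mean nσ and variance nσ²:

```python
import random

random.seed(0)
sigma, n_terms = 2.0, 5      # illustrative scale and number of summands
N = 100_000

# each draw is a sum of n i.i.d. exponential(scale = sigma) variables,
# which should follow a gamma distribution with parameters n and sigma
sums = [sum(random.expovariate(1 / sigma) for _ in range(n_terms))
        for _ in range(N)]

mean = sum(sums) / N
var = sum((s - mean) ** 2 for s in sums) / N

# gamma(n, sigma) has mean n*sigma and variance n*sigma**2
assert abs(mean - n_terms * sigma) < 0.1
assert abs(var - n_terms * sigma ** 2) < 1.0
```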
2.2.2.3 The Weibull Distribution
The Weibull distribution was developed by W. Weibull (1951)
of Sweden and used for problems involving fatigue lives of materials.
Three parameters are required to uniquely define a particular Weibull
distribution. Those three parameters are the scale parameter σ, the
shape parameter n and the location parameter θ.
A random variable T has a Weibull distribution if there are
values of the parameters n (>0), σ (>0) and θ such that
(2.2.10) Y = [(t−θ)/σ]ⁿ
has the exponential distribution with probability density function
(2.2.11) fY(y) = exp(−y), y > 0.
The probability density function of T is given by
(2.2.12) fT(t) = (n/σ)[(t−θ)/σ]^(n−1) exp{−[(t−θ)/σ]ⁿ}, t > θ.
The standard Weibull distribution is obtained by putting σ=1 and
θ=0. The value zero for θ is by far the most frequently used, espe
cially in representing distributions of life times.
The Weibull distribution has cumulative distribution function
(2.2.13) FT(t) = 1 − exp{−[(t−θ)/σ]ⁿ},
and its mean and variance are
(2.2.14) E(t) = σΓ(1 + [1/n])
and
(2.2.15) Var(t) = σ²{Γ(1 + [2/n]) − Γ²(1 + [1/n])},
respectively. For the two parameter Weibull distribution we have that
the reliability and hazard functions are
(2.2.16) RT(t) = exp[−(t/σ)ⁿ]
and
(2.2.17) hT(t) = (n/σ)(t/σ)^(n−1).
When n=1, the hazard function is a constant. Thus the exponential dis
tribution is a special case of the Weibull distribution with n=1.
2.2.2.4 The Lognormal Distribution
The lognormal distribution is also a very popular distribution
in describing wearout failures. This model was developed as a physical
or, more appropriately biological, model associated with the theory
of proportionate effects [see Aitchison and Brown (1957) for a full
description of the distribution, its properties, and its developments].
Briefly, if a random variable is supposed to represent the magnitudes
at successive points of time of, for example, a fatigue crack or the
growth of biological organisms, and the change between any pair of
successive steps or stages is a random proportion of the previous
size, then asymptotically the distribution of the random variable is
lognormal [see Kapteyn (1903)]. This theoretical result imparted some
plausibility to the lognormal distribution for failure problems. Let
t₁ < t₂ < ... < tₙ be a sequence of random variables that denote the
sizes of a fatigue crack at successive stages of its growth. It is
assumed that the crack growth at stage i, tᵢ − tᵢ₋₁, is randomly
proportional to the size of the crack, tᵢ₋₁, and that the item fails
when the crack reaches tₙ. Let tᵢ − tᵢ₋₁ = ηᵢ tᵢ₋₁, i = 1, 2, ..., n,
where ηᵢ is a random variable. The ηᵢ are assumed to be independently
distributed random variables that need not have a common distribution
for all i's when n is large but that need to be lognormally distrib
uted otherwise. Thus,
ηᵢ = (tᵢ − tᵢ₋₁)/tᵢ₋₁, i = 1, 2, ..., n.
Mann, Schaefer and Singpurwalla (1973) show that ln tₙ, the life
length of the item, for large n, is asymptotically normally distri
buted, and hence tₙ has a lognormal distribution.
If there is a number γ such that
(2.2.18) Z = ln(t−γ)
is normally distributed, then the distribution of t is said to be
lognormal. The distribution of t can be defined by the equation,
(2.2.19) U = θ + δ ln(t−γ),
where U is a unit normal variable and θ, δ and γ are parameters. The
probability density function of T is defined by
(2.2.20) fT(t) = δ[(t−γ)√(2π)]⁻¹ exp[−{θ + δ ln(t−γ)}²/2], t > γ.
An alternative, more familiar notation replaces θ and δ by the
expected value μ and standard deviation σ of Z = ln(t−γ). The two
sets of parameters are related by the equations,
(2.2.21) μ = −θ/δ
and
(2.2.22) σ = δ⁻¹,
so that the distribution of t can be defined by
(2.2.23) U = [ln(t−γ) − μ]/σ
and the probability density function of T by
(2.2.24) fT(t) = [(t−γ)√(2π)σ]⁻¹ exp[−{ln(t−γ) − μ}²/2σ²], t > γ.
In many applications, γ is known (or assumed) to be zero.
This important case has been given the name two parameter lognormal
distribution. The mean and variance of the two parameter distribution
are given by
(2.2.25) E(t) = exp[μ + (σ²/2)],
and
(2.2.26) Var(t) = exp(2μ) ω(ω − 1),
where ω = exp(σ²).
In addition, the value tα such that Pr(t ≤ tα) = α is related to the
corresponding percentile, Uα, of the unit normal distribution by the
relation,
(2.2.27) tα = exp(μ + Uα σ).
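Equations (2.2.25) and (2.2.27) are easy to check by simulation. The sketch below uses illustrative values of μ and σ (with γ = 0) and compares Monte Carlo estimates of the mean and the median with the closed forms:

```python
import math
import random

random.seed(2)
mu, s = 1.0, 0.5    # illustrative parameters of Z = ln t (gamma = 0)
N = 200_000
draws = sorted(math.exp(random.gauss(mu, s)) for _ in range(N))

# (2.2.25): E(t) = exp(mu + sigma**2 / 2)
mean_mc = sum(draws) / N
assert abs(mean_mc - math.exp(mu + s * s / 2)) < 0.05

# (2.2.27) with alpha = 0.5 (U_alpha = 0): the median is exp(mu)
median_mc = draws[N // 2]
assert abs(median_mc - math.exp(mu)) < 0.05
```

Note that the mean exceeds the median, reflecting the right skewness of the lognormal distribution.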
Applications of the lognormal distribution have appeared in
many diverse areas, e.g., environmental health [see Dixon (1937) and
Hill (1963)], air pollution control [see Singpurwalla (1971, 1972)
and Larsen (1969)], and other areas like economics and insurance
claims [see Wilson and Worcester (1945)]. Application of the
distribution is not only based on empirical observation, but in some cases is supported
by theoretical arguments.
For example, such arguments have been made in the distribution
of particle sizes in natural aggregates and in the closely related
distribution of dust concentration in industrial atmospheres [see
Tomlinson (1957) and Oldham (1965)]. The lognormal distribution has
also been found to be a serious competitor to the Weibull distribution
in representing life time distributions for manufactured products.
Among our references, Adams (1962), Ansley (1967), Epstein (1947,
1948), Farewell and Prentice (1977), Govindarajulu (1977), Goldthwaite
(1961), Gupta (1962), Hald (1952) and Nowick and Berry (1961) refer
to this topic. Other applications in quality control are described
by Ferrell (1958), Morrison (1958) and Rohn (1959). Many of these
applications are also referenced by Aitchison and Brown (1957),
Finney (1941) and Gupta et al. (1974).
2.2.3 Traditional Approach to Life Testing Inferences
In life testing theory we find a large number of random quan
tities. In most cases we do not know the distributions and theoretical
characteristics; our aim is to estimate some of these quantities. This
is usually accomplished with the aid of observations on the random
variables. According to the laws of large numbers, an "exact" deter
mination of a probability, an expected value, etc., would require an
"infinite" number of observations. Having samples of finite size,
we can do no more than estimate the theoretical values in question.
The sample characteristics, or statistics, serve the purpose of sta
tistical estimation. For a good estimation of theoretical quantities,
a fairly large sample is sometimes needed. In many practical situations
the following two types of estimation problems arise. A certain quan
tity, say θ, which is, from the statistical point of view, a theo
retical quantity, has to be determined by means of measurement. Such a
quantity may be, for example, the electrical resistance of a given
device, the life of a given product, etc. The result T of the mea
suring procedure is a random variable whose distribution depends on
θ and perhaps on additional quantities. That is, we have to estimate
the parameter θ out of a sample T₁, T₂, ..., Tₙ taken on T. In the
other case, the quantity in question is a random variable itself
and in such cases we are interested in the (theoretical) average
value, or the dispersion of T, etc. This means that we have to es
timate the expected value E(T) or Var(T), and perhaps other (constant)
quantities that can be expressed with the aid of the distribution
function of T, like the reliability function. More often for lifetime
distributions, the quantity of interest is a distribution percentile,
also known as the reliable life of the item to be tested, corresponding
to some specified population survival proportion; or it is the pop
ulation proportion surviving at least a specified time, say S
For the classical statistician, the unknown parameter θ is
considered to be a constant. In estimating a constant value there
are various aspects to consider. If we wish to have an estimator
whose value can be used instead of the unknown parameter in formulas
[certainty equivalent (CE) approach], then the estimator should
have one given value. In this case we speak of point estimation. But
knowing that our estimator is subject to error, sometimes we would
like to have some information on the average deviation from the
value. In this case we have to construct an interval that contains
the unknown parameter, at least with high probability, or give a
measure of the variability of the estimator (such as the standard
error of the estimate). Most of the literature about the traditional
approach to life testing inferences is focused on two areas; one
relates to point and interval estimation procedures for lifetime
distributions and the other relates to methods of testing statisti
cal hypotheses in reliability (known as "reliability demonstration
tests").
The classical approach to point estimation in life testing
inferences emphasizes that a good estimator should have properties
like unbiasedness, efficiency, consistency and sufficiency [see
Dubey (1968), Bartlett (1937) and Weiss (1961)]. Two methods, the
method of moments and method of maximum likelihood, are frequently
used to yield estimators with as many as possible of the previously
mentioned properties. Under various sampling assumptions, the maxi
mum likelihood estimators of the parameters were obtained for the
following distributions: gamma [see Choi and Wette (1969) and Harter
and Moore (1965)]; Weibull [see Bain (1972), Billman et al. (1971),
Cohen (1965), Englehardt (1975), Haan and Beer (1967), Lemon (1975)
and Rockette et al. (1973)]; exponential [see Deemer and Votaw (1955),
ElSayyad (1967) and Epstein (1957)]; and for the normal and lognormal
[see Cohen (1951), Harter and Moore (1966), Lambert (1964) and Tallis
and Young (1962)]. The traditional approach also includes some linear
estimation properties like Best Linear Unbiased (BLU) and Best Linear
Invariance (BLI).
Interval estimation procedures have also been developed for
the parameters of the life distributions. Examples include Bain and
Englehardt (1973), Epstein (1961), Harter (1964) and Mann (1968).
Point or interval estimators for functions of the life distributions,
such as reliable life, reliability function, hazard rate, etc. were
obtained by substituting for the unknown parameters the point or inter
val estimators obtained for them [see Johns and Lieberman (1966),
Bartholomew (1963), Grubbs (1971), Harris and Singpurwalla (1968, 1969),
Lawless (1971, 1972), Likes (1967), Mann (1969a, 1969b, 1970), Varde
(1969) and Linhart (1965)].
Testing reliability hypotheses is the second major area of
research in the classical approach to life testing. By means of
the methods referenced previously, a test statistic is selected,
regions of acceptance and rejection are set up, and risks of in
correct decisions are calculated. In addition it is emphasized
that the risks of incorrect decisions are specified before the
sample is obtained, and in this case n, the sample size, is gene
rally to be determined. Some of the references in this area include
[Epstein (1960), Epstein and Sobel (1955), Kumar and Patel (1971),
Lilliefors (1967, 1969), Sobel and Tischendorf (1959), Thoman et al.
(1969, 1970) and Fercho and Ringer (1972)].
A large part of the statistical problem in reliability in
volves the estimation of parameters in failure models. Each of the
methods of obtaining point estimates previously referenced has
certain statistical properties that make it desirable, at least
from a theoretical viewpoint. Not surprisingly, point estimates
are often made (particularly in reliability) because decisions are
to be based on them. The consequences of the decisions based on the
estimates often involve money, or, more generally, some form of
utility. Hence the decision maker is more interested in the practi
cal consequence of the estimate than in its theoretical properties.
In particular, he may be interested in making estimates that mini
mize the expected loss (cost), but this can not be accomplished in
general with classical methodology because the methodology does not
admit probability distributions of the parameters.
2.2.4 Bayesian Techniques in Life Testing
The nonBayesian (classical) approach to estimation con
siders an unknown parameter as fixed. This means that classical in
terval estimation and hypothesis testing must lean on inductive
reasoning either through the likelihood function or the sampling
distributions. In point estimation, the classical approach must
depend on estimates the criteria for which often are not based on
the practical consequences of the estimates. On the other hand, Bayes
procedures assume a prior distribution on the parameter space, that
is, consider the parameter as a random variable, and, hence, the pos
terior distribution is available. This creates the possibility of
a whole new class of criteria for estimation, namely, minimization
of expected loss, probability intervals and others.
In view of the difficulty in assessing utility or costs of
complex reliability problems, in previous studies Bayesian methods
have been used primarily to provide a means of combining previous
data (expressed as the prior distribution) with observed data
(expressed in the likelihood function) to obtain estimates of parame
ters by using the posterior density. However, it must be emphasized that
Bayesian methods are perfectly general in providing whatever the
reliability problem demands.
There is a loss function that is rather popular in Bayesian
analysis and gives simple results. Suppose that θ̂ is an estimate of
θ and that the loss function is
(2.2.28) L(θ̂,θ) = (θ̂ − θ)².
This function states that the loss is equal to the square of the
distance of θ̂ from θ. The Bayes approach is to select the estimate
of θ that minimizes the expected loss with respect to the posterior
distribution. The estimate that accomplishes this is the posterior
mean, that is,
(2.2.29) θ̂ = E(θ | t₁, t₂, ..., tₙ; P),
where P represents prior information. The above loss function is often
called the quadratic loss function and the posterior mean is termed
the Bayes estimate. If the loss function is of the form
(2.2.30) L(θ̂,θ) = |θ̂ − θ|,
the estimate of θ that minimizes the expected loss is the median of
the posterior distribution. Canfield (1970) developed a Bayesian es
timate of reliability for the exponential case using this loss function.
The resulting estimate is seen to be the MVUE of reliability when the
prior is flat. A third and simple case is the asymmetric linear,
(2.2.31) L(θ̂,θ) = kₒ(θ̂ − θ) if θ̂ > θ,
kᵤ(θ − θ̂) if θ̂ ≤ θ.
The estimate of θ that minimizes the expected loss is the kᵤ/(kᵤ + kₒ)
fractile [see Raiffa and Schlaifer (1961)]. Beyond these three
simple cases, things become difficult in regard to loss function for
two reasons:
(i) difficulties in assessing a realistic loss function
and
(ii) mathematical intractability.
The expected loss is generally a random variable a priori since it
depends on the as yet unobserved sample data. The unconditional ex
pectation (with respect to the sample) of the expected loss is called
the "Bayes risk" and is minimized by the Bayes estimate.
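The three loss functions can be compared on a posterior represented by simulation draws. In the sketch below a normal posterior is assumed purely for illustration; the posterior mean, the posterior median and the 0.75 fractile are each confirmed to have lower expected loss than nearby alternatives under their respective loss functions:

```python
import random

random.seed(3)
# hypothetical posterior for theta, represented by sorted Monte Carlo draws
draws = sorted(random.gauss(10.0, 2.0) for _ in range(50_001))
M = len(draws)

def expected_loss(est, loss):
    # expected posterior loss of the estimate, approximated over the draws
    return sum(loss(est, th) for th in draws) / M

post_mean = sum(draws) / M           # minimizes quadratic loss (2.2.28)
post_median = draws[M // 2]          # minimizes absolute loss (2.2.30)
fractile_75 = draws[int(0.75 * M)]   # minimizes (2.2.31) with k_o=1, k_u=3

quad = lambda e, t: (e - t) ** 2
asym = lambda e, t: 1 * (e - t) if e > t else 3 * (t - e)

for cand in (post_mean + 0.3, post_mean - 0.3):
    assert expected_loss(post_mean, quad) <= expected_loss(cand, quad)
for cand in (fractile_75 + 0.3, fractile_75 - 0.3):
    assert expected_loss(fractile_75, asym) <= expected_loss(cand, asym)
```

With kᵤ three times kₒ, underestimation is penalized more heavily, so the optimal estimate is pulled above the posterior median toward the 0.75 fractile.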
Bayes methods have been used in a variety of areas of re
liability. Most uses can be characterized as point or interval esti
mation of parameters of life distributions or of reliability functions.
Examples include Breipohl et al. (1965), who studied the be
havior of a family of Bayesian posterior distributions. In addition
the properties of the mean of the posterior distribution as a point
estimate and a method for constructing confidence intervals were
given. The problem of hypothesis testing was considered, among others,
by MacFarland (1972). He provided a simple exposition of the rudi
ments of applying Bayes equation to hypotheses concerning relia
bility.
The Bayesian approach has also been applied to parameter
estimation and reliability estimation of some known distributions
like gamma, Poisson, lognormal and others. Lwin and Singh (1974)
considered a Bayesian analysis of a twoparameter gamma model in
life testing context with special emphasis on estimation of the
reliability function. The Poisson distribution has received the
attention of Canavos (1972, 1973). In the first article a smooth
empirical Bayes estimator is derived for the hazard rate. The re
liability function is also estimated either by using the empirical
Bayes estimate of the parameters, or by obtaining the expectation
of the reliability function. Results indicate a significant reduc
tion in mean squared error of the empirical Bayes estimates over
the maximum likelihood estimates. A similar result was also derived
for the exponential distribution by Lemon (1972) and by Martz (1975).
Next, Canavos developed Bayesian procedures for life testing with
respect to a random intensity parameter. Bayes estimators were
derived for the Poisson parameters and reliability function based
on uniform and gamma prior distributions. Again, as expected, the
Bayes estimators have mean squared errors (MSE) that are appreciably
smaller than those of the minimum variance unbiased estimator (MVUE)
and of the maximum likelihood estimator (MLE).
Zellner (1971) has studied the Bayesian estimation of the
parameters of the lognormal distribution. Employing a flat prior,
Zellner found that the minimum MSE estimators of the parameters are the
optimal Bayesian estimators when a relative squared error loss
function is used.
The Weibull and exponential distributions have received most
of the attention of authors who have studied life distributions
from a Bayesian viewpoint. The Weibull process with unknown scale
parameter is taken as a model by Soland (1968) for Bayesian decision
theory. The family of natural conjugate prior distributions for the
scale parameter is used in prior and posterior analysis. In addition,
preposterior analysis is given for an acceptance sampling problem
with utility linear in the unknown mean of the Weibull process. Soland
(1969) extended the analysis by treating both the shape and scale
parameters as unknown, but as was previously known it is not possi
ble to find a family of continuous joint distributions on the two
parameters that is closed under sampling, so a family of prior dis
tributions is used that places continuous distributions on the scale
parameter and discrete distributions on the shape parameter. Prior
and posterior analysis are examined and seen to be no more difficult
than for the case in which only the scale parameter is treated as
unknown, but preposterior analysis and determination of optimal sampling
plans are considerably more complicated in this case.
In Bury (1972), a twoparameter Weibull distribution is
assumed to be an appropriate statistical life model. A Bayesian decision
model is constructed around a conjugate probability density function
for the Weibull hazard rate. Since a single sufficient statistic of
fixed dimensionality does not exist for the Weibull model, Bury was
able to consider only two sampling plans in his preposterior analysis:
obtain one further observation or terminate testing. Bury points out
that small sample Bayesian analysis tends to be more accurate than
classical analysis because of the additional prior information utilized
in the analysis. Bayes credible bounds for the scale parameter and
for the reliability function are derived by Papadopoulos and Tsokos
(1975).
Reliability data often include information that the failure
event has not yet occurred for some items, while observations of
complete lifetimes are available for other items. Cozzolino (1974)
addressed this problem from a Bayesian point of view, considering
density functions that have failure rate functions consisting of a known
function multiplied by an unknown scale factor. It is shown that a gamma
family of priors is conjugate for the unknown scale parameter for both
complete and incomplete experiments. A very flexible and convenient
model results from the assumption of a piecewise constant failure rate function.
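The flavor of this gamma-conjugate update can be illustrated with a small sketch. The base failure-rate function is taken to be constant here, so lifetimes are exponential with an unknown rate; the prior parameters, true rate, sample size and censoring time are all illustrative assumptions, not values from Cozzolino's paper.

```python
import random

random.seed(1)

# Gamma-conjugate update for an unknown failure rate when some lifetimes
# are censored: failures add to the shape parameter, while every item's
# time on test (complete or censored) adds to the rate parameter.
true_rate = 0.5
alpha, beta = 2.0, 4.0           # gamma prior parameters (assumed values)

failures = 0                      # number of complete lifetimes observed
total_time = 0.0                  # total time on test (complete + censored)
censor_at = 3.0                   # items still alive at t = 3 are censored

for _ in range(40):
    t = random.expovariate(true_rate)
    if t < censor_at:
        failures += 1
        total_time += t           # complete lifetime contributes its value
    else:
        total_time += censor_at   # incomplete lifetime contributes exposure only

# The posterior is again gamma, so both kinds of data enter one update.
post_alpha = alpha + failures
post_beta = beta + total_time
post_mean = post_alpha / post_beta
print(post_alpha, round(post_beta, 2), round(post_mean, 3))
```

The posterior mean gravitates toward the true rate as either failures or accumulated exposure grow, which is exactly the convenience Cozzolino exploits for incomplete experiments.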
Life tests that are terminated at preassigned time points or
after a preassigned number of failures are sometimes found in reliabil
ity theory. Bhattacharya (1967) provided a Bayesian analysis of the
exponential model based on this kind of life test. He showed that the
reliability estimate for a diffuse prior (which is uniform over the
entire positive line) closely resembles the classical MVUE, and he
considered the role of prior quasi-densities¹ when a life tester has
no prior information. Bhattacharya points out that the use of constant
density over the positive real line has been suggested to express
ignorance but that it causes problems. For example, it cannot be
interpreted as a probability density since it assigns infinite measure
to the parameter space. [See Box and Tiao (1972).]
A paper by Dunsmore (1974) stands out from among the other
Bayesian papers in life testing and is particularly pertinent to
the life testing application in this thesis. This article is an
important exception because it carries the Bayesian approach to its
natural conclusions by determining prediction intervals for future
¹If g(θ) is any function defined on the parameter space Θ such
that g(θ) ≥ 0 ∀ θ ∈ Θ, then g(θ) is called a prior
quasi-density.
observations in life testing using the concept of the Bayesian pre
dictive distribution. One objective of prediction is to provide some
estimate either point or interval, for future observations of an
experiment F based on the results obtained from an informative experi
ment E. As we mentioned before, the classical approach to prediction
involves the use of tolerance regions. [See Aitchison (1966), Folks
and Browne (1975), Guenther et al. (1976) and Hewett and Moeschberger
(1976)]. In these we obtain a prediction interval only, and the
measure of confidence refers to the repetitions of the whole experimental
situation. The Bayesian approach on the other hand, allows us to
incorporate further information which might be available through a
prior distribution and leads to a more natural interpretation.
Let t₁, t₂, ..., tₙ be a random sample from a distribution with
probability density function p(t|θ), (t ∈ T; θ ∈ Θ), and let y₁, y₂, ..., yₙ
be a second independent random sample of "future" observations from
a distribution with probability density function p(y|θ), (y ∈ Y; θ ∈ Θ).
Our aim is to make predictions about some function of y₁, y₂, ..., yₙ.
The Bayesian approach assumes that a prior density function p(θ),
(θ ∈ Θ), is available that measures our uncertainty about the value of θ.
If the information in E is summarized by a sufficient statistic t,¹
then a posterior distribution p(θ|t) is available.

¹Such a sufficient statistic will always exist since, for
example, t could be the vector (t₁, t₂, ..., tₙ).

Suppose now that we wish to predict some statistic y defined on y₁, y₂, ..., yₙ. Then
the predictive density function for y is given by

(2.2.32)    p(y|t) = ∫Θ p(y|θ) p(θ|t) dθ.

A Bayesian prediction interval of cover β is then defined as an
interval I such that

(2.2.33)    P(I|t) = ∫I p(y|t) dy = β.
[See, for example, Aitchison and Sculthorpe (1965), Aitchison (1966)
and Guttman (1970).] It should be emphasized that in the Bayesian
approach the complete inferential statement about y is given by the
predictive density function p(y|t). Any prediction interval is only
a summary of the full description p(y|t).
In general there will be many intervals I that satisfy (2.2.33).
Dunsmore considers most plausible Bayesian prediction intervals
(commonly known as highest posterior density (HPD) intervals) of cover
β, which have the form

(2.2.34)    I = {y : p(y|t) ≥ λ},

where λ is determined by P(I|t) = β.
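This construction can be made concrete numerically. The sketch below assumes an exponential lifetime whose rate has a gamma(a, b) posterior (illustrative values, not Dunsmore's data); the predictive density then works out to p(y|t) = a·bᵃ/(b + y)ᵃ⁺¹, which is decreasing in y, so the HPD interval of cover β is simply [0, y*] with y* the β-quantile of the predictive distribution.

```python
# HPD prediction interval in the exponential/gamma setting (an assumed
# example).  With a gamma(a, b) posterior for the exponential failure
# rate, the predictive density of a future lifetime y is
#     p(y|t) = a * b**a / (b + y)**(a + 1),
# monotone decreasing, so the HPD interval of cover beta is [0, y_star].
a, b = 6.0, 10.0        # posterior parameters (hypothetical values)
beta = 0.90             # desired cover

# Invert the predictive survival function (b / (b + y))**a = 1 - beta.
y_star = b * ((1.0 - beta) ** (-1.0 / a) - 1.0)

# Check the cover by integrating the predictive density numerically.
n = 200000
h = y_star / n
cover = sum(a * b**a / (b + (i + 0.5) * h) ** (a + 1) * h for i in range(n))
print(round(y_star, 3), round(cover, 4))
```

The numerical integral of the predictive density over [0, y*] reproduces the nominal cover β, confirming that the interval is a summary of, not a substitute for, the full predictive density.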
In conclusion we might say that the uses of Bayesian methods
in life testing have been limited. However, in those cases where Bayes
estimators have been found, they performed better, according to classical
criteria, than the conventional ones. The use of loss functions
has not been analyzed deeply for the reasons mentioned before; namely,
that the loss function is usually complex and unknown, and that even
when the loss function is known the Bayes estimate is sometimes difficult
to find. Some of these problems will be solved with the development
of mathematical theory and probably with the development of
computer systems. Only the Dunsmore paper fully used the Bayesian
methodology to obtain prediction intervals that consider all available
information and fully recognize the remaining parameter uncertainty.

¹It is implicitly assumed in (2.2.32) that, conditional on
θ, y and t are independent.
All of the papers discussed in the previous section con
sidered a stationary situation. That is, the known parameters and
the distributions used are assumed to remain the same across all
time periods. It would be of value to study the nonstationary case,
where the parameters are changing in time and possibly the distributions
could also change in time. It is important to recognize,
however, that the problems now faced under the stationarity
assumption will probably be greater when that assumption is relaxed.
Nevertheless, this dynamic system is well worth investigating.
2.3 Modeling of Nonstationary Processes
For many real world data generating processes the assump
tion of stationarity is questionable. Take for instance life testing
models. When it is assumed that the life of certain commodities
follows a lognormal distribution, for example, the stationarity
assumption could be expected to hold over short periods of time; but
in most cases it would be expected that over a lengthy period,
stationarity would be a doubtful assumption. If the model represents
the life of perishable products, like food for example, then it
would be expected that environmental factors like heat and humidity
could change and affect the characteristics of the life distribution
of the product or affect the input factors used in the manufacturing
process. Furthermore, the wearout of the machines used in the manu
facture of the products could cause changes in the quality of the pro
ducts and hence in the parameters of the life distributions.
Random parameter variation is surely a reasonable assumption
when we are concerned with economic variables, like those
used in CostVolumeProfit analysis. A wide spectrum of circumstances
could be mentioned where the economic environment is gradually
affected. For example, the level of economic development changes
gradually in a country and consequently brings gradual changes in
related variables like income, consumption and price. Also, consumers'
tastes and preferences evolve relatively slowly as social and economic
conditions change and as new marketing channels or techniques are
developed. The gradual increase in technology available to the indus
try and to the government may produce changes that are not dramatic
but that will have some influence in any particular period of time.
In other words, it seems reasonable to assume that in at least some
situations the distribution functions of variables, like sales, price
or costs, could be gradually changing in time. It is important to
emphasize that we are referring to gradual changes, the effects of
which are not perfectly predictable in advance for a particular period.
If a data generating process characterized by some parameter
θ is nonstationary, then it is not particularly realistic to make
inferences and decisions concerning θ as if θ took on only a single
value. Instead we should be concerned with a sequence θ₁, θ₂, ... of
values of θ corresponding to different time periods, assuming the
characteristics of the process vary across time but are relatively
constant within a given period. Some researchers have studied this
problem with particular stochastic processes.
Chernoff and Zacks (1964) studied what they called a "tracking"
problem. Observations are taken on the successive positions of an
object traveling on a path, and it is desired to estimate its current
position. If the path is smooth, regression estimates seem appropriate.
However, if the path is subjected to occasional changes in direction,
regression will give misleading results. Their objective was to arrive
at a simple formula which implicitly accounts for possible changes in
direction and discounts observations taken before the latest change.
Successive observations were assumed to be taken on n independently
and normally distributed random variables with means μ₁, μ₂, ..., μₙ.
Each mean is equal to the preceding mean except when an occasional
change takes place. The object is to estimate the current mean μₙ.
They studied the problem from a Bayesian point of view and made the
following assumptions: the time points of change obey an arbitrary
specified a priori probability distribution; the amounts of change
in the means (when changes take place) are independently and normally
distributed random variables with zero mean; and the current mean
μₙ is a normally distributed random variable with zero mean. Using
a quadratic loss function and a uniform prior distribution for μ₁ on
the whole real line they derived a Bayes estimator of μₙ. In addition
they derived the minimum variance linear unbiased (MVLU)
estimator of μₙ. Comparing both estimators they found that although
the MVLU estimator is considerably simpler than the Bayes estimator,
when the expected number of changes in the mean is neither zero nor
n−1 the Bayes estimator is more efficient than the MVLU.
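The data-generating process Chernoff and Zacks study can be sketched directly: each mean equals its predecessor except at occasional random change points, where a zero-mean normal shift is added. The change probability and variances below are illustrative assumptions, not values from their paper.

```python
import random

random.seed(3)

# Simulate the Chernoff-Zacks "tracking" process: a sequence of means
# that stays constant except for occasional zero-mean normal changes,
# observed through additive normal noise.  All constants are illustrative.
def simulate(n, p_change=0.1, change_sd=2.0, obs_sd=1.0, mu0=0.0):
    mu, mus, obs = mu0, [], []
    for _ in range(n):
        if random.random() < p_change:       # occasional change in the mean
            mu += random.gauss(0.0, change_sd)
        mus.append(mu)                       # latent current mean
        obs.append(mu + random.gauss(0.0, obs_sd))  # noisy observation
    return mus, obs

mus, obs = simulate(50)
changes = sum(1 for a, b in zip(mus, mus[1:]) if a != b)
print(len(obs), changes)
```

Estimating the final latent mean from such data is exactly the problem for which the Bayes and MVLU estimators above were compared: observations before the latest change carry stale information and ought to be discounted.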
Chernoff and Zacks studied an alternative problem in which the
prior distribution of time points of change is such that there is at
most one change. This problem leads to a relatively simple Bayes esti
mator. However, difficulties may arise if this estimator is applied
when there are actually two (or more) changes. The suggested technique
starts at the end of a series, searches back for a change in mean and
then estimates the mean value of the series forward from the point at
which such a change is assumed to have occurred. They designed a procedure
to test whether a change in mean has occurred and found a simpler test
than the one used by Page (1954, 1955). Most of the results appearing in
this paper were derived in a previous paper by Barnard (1959) in a some
what different manner, but the general results are essentially the same.
The previous paper by Chernoff and Zacks motivated some
research in the following years. Mustafi (1968) considered a situa
tion in which a random variable is observed sequentially over time
and the distribution of this random variable is subjected to a possible
change at every point in the sequence. The study of this problem
is centered about the model introduced by Chernoff and Zacks.
Three aspects of the problem were considered by Mustafi. First he
considered the problem of estimating the current value of the mean
on the basis of a set of observations taken up to the present. Chernoff
and Zacks assumed that certain parameters occurring in the model were
known. Mustafi then derives a procedure for estimating the current
value of the mean on the basis of a set of observations taken at
successive time points when nothing is known about the other parame
ters occurring in the model. Second Mustafi estimated the various
points of change in the framework of an empirical Bayes procedure and
used an idea similar to that of Taimiter (1966) to derive a sequence
of tests to be applied at each stage. Third he considers n independent
observations of a random variable that belong to the one parameter
exponential family taken at successive time points. He examines the
problem of testing the equality of these n parameters against the
alternative that the parameter has changed r times at some unknown
points, where r is some finite positive integer less than n. He
developed a test procedure generalizing the techniques used by Kander
and Zacks (1966) and Page (1955).
Hinich and Farley (1966) also studied the problem of estima
tion models for time series with nonstationary means. They assumed
a model similar to the one developed by Chernoff and Zacks except
that they assumed that the number of points of change per unit time
are Poisson distributed with a known shift rate parameter. They found
an estimator for the mean which is unbiased and efficient. It also
turned out to be a linear combination of the vector of observations.
The Farley-Hinich technique attempts to estimate jointly the level
of the mean at the beginning of a series as well as the size of the
change (if any).
Farley and Hinich in a later paper (1970) compared the method
developed in (1966) with the one presented by Chernoff and Zacks (1964)
and later generalized by Mustafi (1968). Some ways were examined to
systematically track time series which may contain small stochastic
mean shifts as well as random measurement errors. A "small" shift
is one which is small relative to measurement error. Three approaches
were tested with artificial data, by means of Monte Carlo methods,
using mean shifts which were rather small, that is, mean shifts which
were half the magnitude of random measurement error variance. Several
false starts with actual marketing data showed that there was an
identification problem in providing an adequate test of the procedures'
performance, and artificial data of known configuration provided a
more natural starting point. Two techniques (one developed by the
authors and the other by Chernoff and Zacks) involved formal estimation
under the assumption that there was at most one discrete jump in a
data record of fixed length of the type often stored in an information
system. Both techniques performed reasonably well when the rate of
shift occurrence was known, but both techniques are very sensitive
to prior specification of the rate at which shifts occur in
terms of both classes of errors, that is, missing shifts which
occur and identifying "shifts" which do not occur. Knowing the
shift rate precisely and knowing that more than one shift in a record
is extremely unlikely are two very severe restrictions for many
applications. A simpler filter technique was tested similarly with more
promising results in terms of avoiding both classes of errors. The
filter approach involved first smoothing the series and then imple
menting ad hoc decision rules based on consecutive occurrences of
smoothed values falling outside a predetermined range around the
moving average.
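The filter heuristic just described can be sketched in a few lines. The smoothing weight, band width, required run length, and the use of a cumulative mean in place of a windowed moving average are all illustrative simplifications, not the authors' exact rules.

```python
# Sketch of the ad hoc filter approach: exponentially smooth the series,
# then flag a shift when several consecutive smoothed values fall outside
# a fixed band around the running mean.  All constants are assumptions.
def detect_shift(series, alpha=0.3, band=1.0, run=3):
    smoothed = None
    level_sum = 0.0
    consecutive = 0
    flags = []
    for i, x in enumerate(series):
        smoothed = x if smoothed is None else alpha * x + (1 - alpha) * smoothed
        level_sum += x
        moving_avg = level_sum / (i + 1)          # crude running average
        if abs(smoothed - moving_avg) > band:     # smoothed value leaves band
            consecutive += 1
        else:
            consecutive = 0
        if consecutive >= run:                    # sustained excursion: flag it
            flags.append(i)
    return flags

data = [0.1, -0.2, 0.0, 0.3, -0.1, 4.0, 4.2, 3.9, 4.1, 4.0]
print(detect_shift(data))
```

Requiring a run of consecutive out-of-band values is what guards against both classes of errors: single noisy points do not trigger a flag, while a genuine level shift eventually produces a sustained excursion.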
Harrison and Stevens have produced two important papers about
Bayesian forecasting using nonstationary models. In the first of these
papers (1971), they described a new approach to shortterm forecasting
based on Bayesian principles in conjunction with a multistate data
generating process. The various states correspond to the occurrence of
transient errors and step changes in trend and slope. The performance
of conventional systems, like the growth models of Holt (1957), Brown
(1963) and Box-Jenkins (1970), is often upset by the occurrence of
changes in trend and slope or transients. In Harrison and Stevens'
approach events of this nature are modelled explicitly, and succes
sive data points are used to calculate the posterior probabilities
of such events at each instant of time.
In the second paper (1976), Harrison and Stevens describe a
more general approach to forecasting. The principles of Bayesian fore
casting are discussed and the formal inclusion of the "forecaster"
in the forecasting system is emphasized as a major feature. The criti
cal distinction is that between a statistical forecasting method and
a forecasting system. The former transforms input data into output
information in a purely mechanical way. The latter, however, includes
people: the person responsible for the forecast and all the people
concerned with using the forecasts and supplying information relevant
to the resulting actions. It is necessary that people can communicate
their information to the method and that the method clearly communi
cates the uncertain information in such a way that it is readily
interpreted and accepted by decision makers. The basic model, called
by them "the dynamic linear model", is defined together with Kalman
filter recurrence relations and a number of model formulations are
given based on their result. They first phrase the models in terms
of their "natural" parameters and structure, and then translate them
into the dynamic linear model form. Some of the models discussed by
them are: a) regression models, b) the steady model, c) the linear
growth model, d) the general polynomial models, e) seasonal models,
f) autoregressive models, and g) moving average models.
Multiprocess models introduce uncertainty as to the underlying
model itself, and this approach is described in a more general
fashion than in their 1971 paper. In the 1976 paper they present a
Bayesian approach to forecasting which not only includes many con
ventional methods, as presented before, but possesses a remarkable
range of additional facilities, not the least being its ability to
respond effectively in the startup situation where no prior data
history (as distinct from information) is available. The essential
foundations of the method are:
(a) a parametric (or state space) model, as distinct from
a functional model;
(b) probabilistic information on the parameters at any given
time;
(c) a sequential model definition which describes how the
parameters change in time, both systematically and as a result of
random shocks;
and
(d) uncertainty as to the underlying model itself, as be
tween a number of discrete alternatives.
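The simplest of Harrison and Stevens' formulations, the steady model, already exhibits foundations (a)-(c): the observation is y_t = θ_t + v_t and the parameter drifts as θ_t = θ_{t-1} + w_t. A minimal sketch of the Kalman-filter recurrences for this model, assuming the observation variance V and system variance W are known, is:

```python
# Kalman-filter recurrences for the steady model
#     y_t = theta_t + v_t,   theta_t = theta_{t-1} + w_t,
# with known variances V and W (illustrative values).  (m, C) is the
# current posterior mean and variance for the drifting parameter theta.
def steady_model_filter(obs, m0=0.0, c0=100.0, V=1.0, W=0.1):
    m, C = m0, c0
    history = []
    for y in obs:
        R = C + W                # prior variance for theta_t: drift adds W
        A = R / (R + V)          # adaptive coefficient (Kalman gain)
        m = m + A * (y - m)      # revise mean toward the new observation
        C = (1 - A) * R          # posterior variance; stays strictly positive
        history.append((m, C))
    return history

for m, C in steady_model_filter([1.0, 1.2, 0.9, 1.1, 5.0]):
    print(round(m, 3), round(C, 3))
```

Because W > 0 is injected at every step, the posterior variance C settles at a positive limit rather than shrinking to zero; parameter uncertainty is never completely removed, which is the characteristic nonstationary behavior this dissertation exploits later.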
Kamat (1976) developed a smoothed Bayes control procedure for
controlling the output of a production process when the quality charac
teristic is continuous with a linear shift in its basic level. The
procedure uses Bayesian estimation with exponential smoothing for
updating the necessary parameter estimates. The application of the
procedure to real life data is illustrated with an example. Applications
of the traditional x̄-chart and the cumulative sum control chart
to the same data are also illustrated for comparison.
In Chapter Three of this dissertation we develop a Bayesian
model of nonstationarity for normal and lognormal processes. We build
our results directly on two papers, Winkler and Barry (1973) and Barry
and Winkler (1976). In the first paper they developed a Bayesian model
for nonstationary means in a multinormal datagenerating process and
demonstrated that the presence of nonstationary means can have an impact
upon the uncertainty associated with a given random variable that has
a normal distribution. Moreover, the nonstationary model considered by
them seems to have more realistic properties than the corresponding
stationary model. For example, they found that in the nonstationary
model the recent observations are given more weight than the distant
ones in determining the mean of the distribution at any given time,
and the uncertainty about the parameters of the process is never
completely removed. Barry and Winkler (1976) were concerned with the
effects of nonstationarity on portfolio decisions. The use of a Bayesian
approach to statistical inference and decision provides a convenient
framework for studying the problem of changing parameters, both in
terms of forecasting security prices and in terms of portfolio decision
making. In this thesis a number of extensions to their results are
made, thereby removing some of the restrictiveness of their results,
and applications are considered in the areas of CVP analysis and life
testing.
CHAPTER THREE
NONSTATIONARITY IN NORMAL AND LOGNORMAL PROCESSES
3.1 Introduction
The normal distribution is considered by many to be an important
distribution. The earliest workers regarded the distribution
only as a convenient approximation to the binomial distribution. However,
with the work of Laplace and Gauss its broader theoretical importance
spread. The normal distribution became widely and uncritically accepted
as the basis of much practical statistical work. More recently a more
critical spirit has developed, with more attention being paid to systems
of "skew (asymmetric) frequency curves". This critical spirit has per
sisted, but is offset by developments in both theory and practice. The
normal distribution has a unique position in probability theory, and can
be used as an approximation to many other distributions. In real world
problems, "normal theory" can frequently be applied, with small risk of
serious error, even when substantially nonnormal distributions correspond more
closely to observed values. This allows us to take advantage of the elegant
nature and extensive supporting numerical tables of normal theory. Most
theoretical arguments for the use of the normal distribution are based on
forms of central limit theorems. These theorems state conditions under
which the distribution of standardized sums of random variables tends to
a unit normal distribution as the number of variables in the sum increases;
that is, they give conditions sufficient to ensure an asymptotic unit normal
distribution.
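The central limit effect is easy to exhibit numerically. The sketch below standardizes sums of Uniform(0, 1) variables, which have mean 1/2 and variance 1/12, so (sum − n/2)/√(n/12) should behave approximately like a unit normal variate; the sample sizes are arbitrary choices.

```python
import random

random.seed(0)

# Standardized sums of uniform variables: Uniform(0,1) has mean 1/2 and
# variance 1/12, so (sum - n/2) / sqrt(n/12) is approximately N(0, 1).
n, reps = 48, 20000
standardized = [
    (sum(random.random() for _ in range(n)) - n / 2) / (n / 12) ** 0.5
    for _ in range(reps)
]

mean = sum(standardized) / reps
var = sum(z * z for z in standardized) / reps
within_1 = sum(abs(z) < 1 for z in standardized) / reps  # N(0,1): about 0.683
print(round(mean, 3), round(var, 3), round(within_1, 3))
```

The sample mean, variance, and the proportion within one standard deviation all come out close to their unit normal values (0, 1, and 0.683), even though the summands themselves are flat rather than bell-shaped.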
The normal distribution, for the reasons given above, has
been widely used, and enumerating the fields of application would be
lengthy and not really informative. However, we do emphasize that the
normal distribution is almost always used as an approximation, either
to a theoretical or an unknown distribution. The normal distribution
is well suited to this because its theoretical analysis is fully worked
out and often simple in form. Where these conditions are not fulfilled
substitutes for normal distributions should be sought. Even when nor
mal distributions are not used results corresponding to "normal theory"
are often useful as standards of comparison.
The use of normal distributions when the coefficient of variation
is large presents many difficulties in some applications. For instance,
observed values more than twice the mean would then imply the existence
of observations with negative values. Frequently this is a logical absurdity.
The lognormal distribution, as defined in equation 2.2.20, is in at least
one important respect a more realistic representation of distributions
of characters that cannot assume negative values than is the normal distri
bution. A normal distribution assigns positive probability to such events,
while the lognormal distribution does not. The use of the lognormal distri
bution has been investigated as a possible solution to this problem [see
Cohen (1951), Galton (1879), Jenkins (1932) and Yuan (1933)]. In a
review of the literature Gaddum (1945) found that the lognormal
distribution could be used to describe several processes. In Chapter Two
we presented a list of some of the applications of this distribution
to real life problems. Among those applications we emphasized its
use in CostVolumeProfit analysis and in life testing models. Fur
thermore, by taking the spread parameter small enough, it is possible
to construct a lognormal distribution closely resembling any normal
distribution. Hence, even if a normal distribution is felt to be really
appropriate, it might be replaced by a suitable lognormal distribution.
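This closeness is easy to check by simulation. A lognormal(μ, σ) variate is exp(μ + σZ) with Z unit normal, and for small σ, exp(μ + σZ) ≈ e^μ(1 + σZ), i.e. nearly a normal variate with mean e^μ and standard deviation σe^μ. The parameter values below are illustrative only.

```python
import math
import random

random.seed(2)

# With a small spread parameter the lognormal is nearly symmetric:
# mean close to e^mu, standard deviation close to sigma * e^mu, and
# skewness close to zero.  mu and sigma here are illustrative choices.
mu, sigma = math.log(50.0), 0.02
draws = [random.lognormvariate(mu, sigma) for _ in range(50000)]

m = sum(draws) / len(draws)
s = (sum((x - m) ** 2 for x in draws) / len(draws)) ** 0.5
skew = sum(((x - m) / s) ** 3 for x in draws) / len(draws)
print(round(m, 2), round(s, 2), round(skew, 3))
```

With σ = 0.02 the sample mean sits at about e^μ = 50, the standard deviation near σe^μ = 1, and the skewness is negligible, so such a lognormal is operationally indistinguishable from the corresponding normal while still excluding negative values.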
As was mentioned in Chapter Two, most research concerned with
the normal and lognormal distributions has considered only stationary
situations. That is, the parameters (known or assumed to be known)
and distributions used are assumed to remain the same in the future.
In this third chapter we intend to build a nonstationary model for
normal and lognormal processes from a Bayesian point of view. Section
3.2 sets the stage for the development of the nonstationary model. In
it, we describe essential features of the Bayesian analysis of normal
and lognormal processes including prior, posterior and predictive dis
tributions. Two uncertainty situations are considered in this section;
in one the shift parameter, μ, is assumed to be unknown and the spread
parameter, σ, is assumed to be known; and in the other, both parameters
are assumed to be unknown. In Section 3.3, we develop a particular non
stationary model for the shift parameter of the lognormal distribution,
again under the same two uncertainty situations, and provide a com
parison of the results with a stationary model.
3.2 Bayesian Analysis of Normal and Lognormal Processes
Before the last decade, most of the Bayesian research dealing
with problems of statistical inference and decisions concerning a parameter
θ assumed that θ takes on a single value; such models are called
stationary models. For example, θ may represent the proportion of
defective items produced by a certain manufacturing process, the mean
monthly profits of a given company, the mean life of a manufactured
product, and so on. In each case θ is assumed to be fixed but unknown.
A formal Bayesian statistical analysis articulates the evidence of a
sample to be analyzed with evidence other than that of the sample; it
is felt that there usually is prior evidence. The nonsample evidence
is assessed judgmentally or subjectively and is expressed in proba
bilistic terms, by means of: (1) a data distribution that specifies
the probability of any sample result conditional on certain parameters;
and (2) a prior distribution that expresses our uncertainty about the
parameters. When judgment, in the form of the assessment of a likelihood
function to apply to the data, is combined with the evidence of a
sample, we have the likelihood function of the sample. The likelihood
function of the sample is combined with the prior distribution via
Bayes' theorem to produce a posterior distribution for the parameters
of the data distribution, and this is the typical output of a formal
Bayesian analysis. If we assume that the prior distribution, for the
parameters of the data distribution, is continuous then we may express
Bayes' theorem as
(3.2.1)    f(θ|x, τ) = f(θ|τ) f(x|θ) / f(x|τ),
where
    x denotes the vector of sample observations,
    θ represents all the unknown parameters,
and
    τ represents the known parameters of the prior
    distribution of θ.
We can interpret f(x|θ) in two ways: (1) for given θ, f(x|θ)
gives the distribution of the random vector x̃; (2) for given x, f(x|θ),
as a function of θ, together with all positive multiples, is in the
usual usage the likelihood function of the sample.
The prior probability of the sample, f(x|τ), is computed from

(3.2.2)    f(x|τ) = ∫Θ f(θ|τ) f(x|θ) dθ,

from which we see that f(x|τ) can be interpreted as the expected
value of the likelihood in the light of the prior distribution. Alternatively,
f(x|τ) can be interpreted as the marginal distribution of
the random vector x̃ with respect to the joint distribution

(3.2.3)    f(x, θ|τ) = f(θ|τ) f(x|θ).

Since (3.2.2) can be computed in advance of the sample for any x,
we shall frequently refer to the marginal distribution of x̃ as the
predictive distribution implied by the specified prior distribution
and data distribution.
If we have a posterior distribution f(θ|x) and if a future
random vector x̃ is to come from f(x̃|θ), which may or may not be
the same data distribution as in (3.2.2), we may compute

(3.2.4)    f(x̃|x) = ∫Θ f(θ|x) f(x̃|θ) dθ.

We refer to the distribution so defined as the predictive distribution
of a future sample implied by the posterior distribution. It must be
understood that (3.2.2) and (3.2.4) are but two instances of the same
relationship; sometimes it is worth distinguishing the practical problems
arising when predictions refer to the present sample from those
arising in connection with predictions about a future sample, that is,
a "not-yet-observed" sample. The revision of the prior distribution
gives the statistician a method for drawing inferences about θ, the
uncertain expression, quantity or parameter of interest, and for decisions
related to θ.
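The prior-posterior-predictive chain of (3.2.1)-(3.2.4) can be sketched in the conjugate normal case with known sampling variance; the prior parameters and data below are illustrative assumptions.

```python
# Conjugate sketch of the chain in (3.2.1)-(3.2.4) for a normal process
# with known variance: with a normal prior (mean m0, variance v0) for the
# unknown process mean, the posterior and the predictive distribution of
# a future observation are again normal.  All numbers are illustrative.
def normal_update(m0, v0, data, var):
    n = len(data)
    xbar = sum(data) / n
    v1 = 1.0 / (1.0 / v0 + n / var)       # posterior variance of the mean
    m1 = v1 * (m0 / v0 + n * xbar / var)  # posterior mean
    return m1, v1

m0, v0, var = 0.0, 25.0, 4.0              # prior parameters; known variance
data = [9.8, 10.4, 10.1, 9.9]
m1, v1 = normal_update(m0, v0, data, var)

# Predictive distribution of a future observation, as in (3.2.4): normal
# again, with the remaining parameter uncertainty v1 added to the
# sampling variance var.
pred_mean, pred_var = m1, v1 + var
print(round(m1, 3), round(v1, 3), round(pred_var, 3))
```

Note that the predictive variance exceeds the sampling variance by exactly the posterior variance of the mean: the predictive distribution carries the remaining parameter uncertainty, which is the feature a prediction interval built from (3.2.4) inherits.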
In general then we may say that the term Bayesian refers to
any use or user of prior distributions on a parameter space (although
there is some nonparametric Bayesian material also) with the associated
application of Bayes' theorem in the analysis of an inferential
or decision problem under uncertainty. Such an analysis rests on the
belief that in most practical situations the statistician will pos
sess some subjective a priori information concerning the probable
values of the parameter. This information may often be reasonably
summarized and formalized by the choice of a suitable prior dis
tribution on the parameter space. The fact that the decision maker
cannot specify every detail of his prior distribution by direct assessment
means that there will often be considerable latitude in the
choice of the family of distributions to be used, even though the
selection of a particular member within the chosen family will
usually be wholly determined by the decision maker's expressed beliefs
or betting odds. Three characteristics are particularly desirable for
a family of prior distributions:
(i) analytical tractability in three aspects; namely
a) it should be reasonably easy to determine the
posterior distribution resulting from a given prior and sample,
b) it should be possible to express in convenient
form the expectations of some simple utility functions with respect
to any member of it,
and
c) the family should be closed in the sense that if
the prior is a member of it, the posterior will also be a member of it;
(ii) the family should be rich, so that there will exist a
member of it capable of expressing the decision maker's prior beliefs
or at least approximating them well;
and
(iii) it should be parametrizable in a manner which can
readily be interpreted, so that it will be easy to verify that the
chosen member of the family is really in close agreement with the
decision maker's prior judgments about 0 and not a mere artifact
agreeing with one or two quantitative summarizations of these judg
ments.
A family of prior densities which gives rise to posteriors
belonging to the same family is very useful inasmuch as one aspect
of mathematical tractability is maintained, and this property has
been termed "closure under sampling". For densities which admit
sufficient statistics of fixed dimensionality, a concept to be
explained later, Raiffa and Schlaifer (1961) have considered a
method of generating prior densities on the parameter space that
possess the "closure under sampling" property. A family of such
densities has been called by them a "natural conjugate family".
To define the concepts of sufficient statistic and sufficient sta
tistic of fixed dimensionality, consider a statistical problem in
which a large amount of experimental data has been collected. The
treatment of the data is often simplified if the statistician
computes a few numerical values, or statistics, and considers these
values as summaries of the relevant information in the data. In some
problems, a statistical analysis that is based on these few sum
mary values can be just as effective as any analysis that could be
based on all observed values. If the summaries are fully informative
they are known as sufficient statistics. Formally, suppose that θ is
a parameter which takes a value in the space Θ. Also suppose that x
is a random variable, or random vector, which takes values in the
sample space S. We shall let f(·|θ₀) denote the conditional probability
density function (p.d.f.) of x when θ = θ₀ (θ₀ ∈ Θ). It is
assumed that the observed value of x will be available for making
inferences and decisions related to the parameter θ. We call any
function T of the observations x a statistic. Loosely speaking,
a statistic T is called a sufficient statistic if, for any prior
distribution of θ, its posterior distribution depends on the observed
value of x only through T(x). More formally, for any prior
p.d.f. g(θ) and any observed value x ∈ S, let g(·|x) denote the posterior
p.d.f. of θ, assuming for simplicity that for every value of
x ∈ S and every prior p.d.f. g, the posterior g(·|x) exists and is
specified by Bayes' theorem. Then it is said that a statistic
T is sufficient for the family of p.d.f.'s f(·|θ), θ ∈ Θ, if
g(·|x₁) = g(·|x₂) for any prior p.d.f. g and any two points x₁ ∈ S
and x₂ ∈ S such that T(x₁) = T(x₂).
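To make the definition concrete, consider a normal process with known variance, anticipating the conjugate analysis of Section 3.3: the pair T(x) = (n, Σxᵢ) is sufficient, so any two samples sharing T(x) must yield the same posterior whatever the prior. The following minimal Python sketch (the function name posterior_params is hypothetical, introduced here only for illustration) verifies this numerically:

```python
def posterior_params(m_prior, n_prior, xs):
    """Posterior (m'', n'') for the mean of a normal process with known
    variance, under a conjugate normal prior with parameters (m', n').
    The data enter only through the statistic T(x) = (len(xs), sum(xs)),
    whose dimensionality s = 2 does not depend on the sample size."""
    n, s = len(xs), sum(xs)
    n_post = n_prior + n
    m_post = (n_prior * m_prior + s) / n_post
    return m_post, n_post

# Two different samples with the same value of T(x) ...
x1 = [1.0, 2.0, 3.0]   # T = (3, 6.0)
x2 = [0.5, 2.5, 3.0]   # T = (3, 6.0)
# ... yield identical posteriors, whatever the prior:
assert posterior_params(0.0, 2, x1) == posterior_params(0.0, 2, x2)
```

The sketch also illustrates fixed dimensionality: however many observations are collected, only the two numbers (n, Σxᵢ) need be retained.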
Now, consider only data generating processes which generate
independent and identically distributed random variables x₁, x₂, ...
such that, for any n and any (x₁, x₂, ..., xₙ), there exists a sufficient
statistic. Sufficient statistics of fixed dimensionality are
those statistics T such that T(x₁, x₂, ..., xₙ) = T = (T₁, T₂, ..., Tₛ),
where a particular value Tᵢ is a real number and the dimensionality
s of T does not depend on n. Independently of how many elements we
sample, only s statistics are needed.
Raiffa and Schlaifer (1961) present the following method for
developing the natural conjugate prior for a given likelihood function:
(i) Let the density function of θ be g, where g denotes either
a prior or a posterior density, and let k be another function on Θ
such that

(3.2.5) g(θ) = k(θ) / ∫_Θ k(θ) dθ .

Then we shall write

(3.2.6) g(θ) ∝ k(θ)

and say that k is a kernel of the density of θ.
(ii) Let the likelihood of x given θ be l(x|θ), and suppose
that P and k are functions on x such that, for all x and θ,

(3.2.7) l(x|θ) = k(x|θ) P(x).

Then we shall say that k(x|θ) is a kernel of the likelihood of x
given θ and that P(x) is a residue of this likelihood.
(iii) Let the prior distribution of the random variable θ
have a density g'. For any x such that l*(x|g') = ∫_Θ l(x|θ) g'(θ) dθ > 0,
it follows from Bayes' theorem that the posterior distribution of θ has
a density g" whose value at θ for the given x is

(3.2.8) g"(θ|x) = g'(θ) l(x|θ) N(x),

where

N(x) = [∫_Θ g'(θ) l(x|θ) dθ]⁻¹.

(iv) Now let k' denote a kernel of the prior density of θ. It
follows from the definitions of k and l and of the symbol ∝ that
the Bayes formula can be written

(3.2.9) g"(θ|x) = g'(θ) l(x|θ) N(x)
= k'(θ) [∫_Θ k(θ) dθ]⁻¹ k(x|θ) P(x) N(x),

so that

g"(θ|x) ∝ k'(θ) k(x|θ),

where the value of the constant of proportionality for the given x,

(3.2.10) P(x) N(x) [∫_Θ k(θ) dθ]⁻¹,

can always be determined by the condition

(3.2.11) ∫_Θ g"(θ|x) dθ = 1, whenever the integral exists.
Before we begin our presentation of a basic Bayesian analysis
of normal and lognormal processes we want to emphasize that caution
should be exercised in the application of the method developed by
Raiffa and Schlaifer, as is pointed out by Box and Tiao (1972). According
to them it is often appropriate to analyze data from a scientific
investigation on the assumption that the likelihood dominates the prior, for
two reasons:
(i) a scientific investigation is not usually undertaken unless
information supplied by the investigation is likely to be considerably
more precise than information already available, that is unless it is
likely to increase knowledge by a substantial amount. Therefore analysis
with priors which are dominated by the likelihood often realistically
represents the true inferential situation.
(ii) Even when a scientist holds strong prior beliefs about the
value of a parameter 0, nevertheless, in reporting the results it would
usually be appropriate and most convincing to his colleagues if he
analyzed the data against a reference prior which is dominated by the
likelihood. He could say that, irrespective of what he or anyone else
believed to begin with, the posterior distribution represented what
someone who a priori knew very little about θ should believe in the light
of the data. Reference priors in general mean standard priors domi
nated by the likelihood. [See Dickey (1973) for a general discussion
of Bayesian methods in scientific reporting.]
In general a prior which is dominated by the likelihood is one
which does not change very much over the region in which the likelihood
is appreciable and does not assume large values outside that range. We
shall refer to a prior distribution which has these properties as a
locally uniform prior. There are some difficulties, however, associated
with locally uniform priors. The choice of a prior to characterize a
situation where "nothing" (or, more realistically, little) is known a
priori has long been, and still is, a matter of dispute. Bayes
tentatively suggested that where such knowledge was lacking concerning the
nature of the prior distribution, it might be regarded as uniform. There
is an objection to Bayes' postulate. If the distribution of a continuous
parameter θ were taken to be locally uniform, then the distribution of
log θ or some other transformation of θ (which might provide equally
sensible bases for parametrizing the problem) would not be locally
uniform. Thus, application of Bayes' postulate to different
transformations of θ would lead to posterior distributions from the same
data which were inconsistent with the notion that nothing is known
about θ or functions of θ. This argument is of course correct, but
the arbitrariness of the choice of parametrization does not by it
self mean that we should not employ Bayes postulate in practice.
Box and Tiao (1972) present an argument for choosing a
particular metric in terms of which a locally uniform prior can be
regarded as noninformative about the parameters. It is important to
bear in mind that one can never be in a state of complete ignorance;
further, the statement "knowing little a priori" can only have
meaning relative to the information provided by the experiment. A prior
distribution is supposed to represent knowledge about parameters
before the outcome of a projected experiment is known. Thus, the main
issue is how to select a prior which provides little information rela
tive to what is expected to be provided by the intended experiment.
3.3 Nonstationary Model for Normal and Lognormal Means
It was emphasized in Section 2.3 that for many real world
data generating processes the assumption of stationarity is
questionable. Random parameter variation could be a reasonable assumption when
we are concerned with life testing models or with economic variables.
For example, in life testing models, when it is assumed that the life
of certain parts follows a lognormal distribution, the stationarity
assumption could be expected to hold over short periods of time; but
in most cases it would be expected that for a lengthy period,
stationarity would be a doubtful assumption. Similarly, in other areas like
Cost-Volume-Profit analysis it is doubtful that the stationarity
assumption will hold over long periods of time. Variables like sales,
costs, and contribution margin are affected by economic, political
and environmental factors. In particular it was pointed out that we
are interested in gradual changes, the effects of which are not perfectly
predictable in advance for a particular period.
If a data generating process characterized by some parameter
θ is nonstationary, then it is potentially misleading to make inferences
and decisions concerning θ as if θ only took on a single value. Instead
we should be concerned with a sequence θ₁, θ₂, ... of values of θ
corresponding to different time periods, assuming the characteristics of
the process may vary across time. Several methods have been proposed
to study stochastic parameter variation [see Chernoff and Zacks (1964)
and Harrison and Stevens (1976)]. Some have claimed that a reasonable
approach to the effects of gradual change might be to model the
parameters of nonstationary distributions as if they undergo independent
random shifts through time [see Barry (1976), Carter (1972), and
Kamat (1976)]. Specifically they suggest the use of a model that
assumes that the mean of the distribution has a linear shift. In those
papers, it is clearly demonstrated that when it is assumed that the
process represented by the model is normal, this linear random shift
model allows analytical comparisons to be drawn if it is assumed that
the successive increments in the process mean are drawn independently
from a normal population with mean u and variance σₑ². We intend to
use the same approach in this dissertation. Two cases are considered:
µ unknown and σ² known; and both µ and σ² unknown.
3.3.1 µ is Unknown and σ² is Known
For a process that has a normal density function with unknown
parameter µ, Raiffa and Schlaifer (1961) show that the natural conjugate
prior is normal with parameters m' and σ²/n'. (See Appendix I
for the details of their exposition.) From the prior distribution on
µ₀ and with a sequence of n independent observations (x₁, x₂, ..., xₙ)
from the normal process under consideration [N(µ,σ²)], the posterior
distribution in period zero is obtained. If the sample yields sufficient
statistics m and n, then the posterior distribution is normal with
parameters n″ and m″ given by

(3.3.1) n₀″ = n₀′ + n,

and

(3.3.2) m₀″ = (n₀′m₀′ + nm)/(n₀′ + n).

If the mean of the distribution does not change from period to period
except by the effect of the sample information, then each posterior can
be thought of as a prior with respect to the following sample. Thus, the
posterior distribution on µ₀ is the prior distribution on µ₁; i.e.,

(3.3.3) f″(µ₀|m₀″, σ²/n₀″) = f′(µ₁|m₁′, σ²/n₁′),

where

(3.3.4) m₁′ = m₀″,

and

(3.3.5) n₁′ = n₀″.
In general, if we assume that a fixed sample of size n is employed
every time a sample is taken and if we assume that the mean is
stationary except by the effect of the sample information, then in any
given period t the posterior distribution is normal with parameters
nₜ″ and mₜ″ given by

(3.3.6) nₜ″ = nₜ′ + n,

and

(3.3.7) mₜ″ = (nₜ′mₜ′ + nm)/(nₜ′ + n).

This inferential model is called a stationary model since it assumes
that neither the distribution nor the parameters change from period
to period. In this case it assumes that µₜ takes on the same value
in every period and that f′(µₜ) represents the information available
about that value as of the start of the tth period.
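The stationary revision (3.3.6)-(3.3.7) can be sketched as a simple recursion in which each posterior becomes the next period's prior; the function name update and the sample values below are purely illustrative:

```python
def update(m_prior, n_prior, m_sample, n_sample):
    """One period of the stationary model, eqs. (3.3.6)-(3.3.7):
    n'' = n' + n,  m'' = (n' m' + n m)/(n' + n)."""
    n_post = n_prior + n_sample
    m_post = (n_prior * m_prior + n_sample * m_sample) / n_post
    return m_post, n_post

# Under stationarity each posterior is the next period's prior, so t
# periods of samples of size n simply accumulate: n'_t = n'_0 + t*n.
m, n = 0.0, 1                       # prior parameters m'_0, n'_0
for m_bar in [1.0, 1.2, 0.8]:       # three sample means, each of size 4
    m, n = update(m, n, m_bar, 4)
assert n == 1 + 3 * 4
```

Note that n grows without bound under stationarity, so the prior variance σ²/nₜ′ shrinks toward zero; this is the behavior the nonstationary model below will overturn.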
Suppose now that the process generating the observations
undergoes a mean shift between successive periods. In particular,
inferences about the mean of a normal process are considered when the
parameter µ shifts from period to period, with the shifts governed by an
independent normal process. Formally, consider a data generating
process that generates nₜ observations xₜ₁, xₜ₂, ..., xₜₙ during time
period t according to a normal process with parameters µₜ and σ².
Assume that the parameter σ² is known and does not change over time,
whereas µₜ is not known and may vary over time. In particular, values
of the parameter for successive time periods are related as

(3.3.8) µₜ₊₁ = µₜ + eₜ₊₁, t = 1, 2, ...,

where eₜ₊₁ is a normal "random shock" term independent of µₜ with
known mean u and variance σₑ². That is, µₜ behaves as a random walk.
The mean in any period t is equal to the mean in the previous period
plus an increment e, which has a normal distribution with known
mean and variance.
Before the sample is taken at time t, we assume that a prior
density function could be assessed that represents judgment (based
on past experience, past information, etc.) concerning the probabilities
for the possible values of µₜ. If the prior distribution of µₜ at the
beginning of time period t is represented by f′(µₜ), and a sample of
size nₜ during period t yields xₜ = (xₜ₁, ..., xₜₙ), then the prior
distribution of µₜ can be revised. Furthermore, at the end of time
period t (the beginning of time period t+1), the data generating
process is governed by a new mean µₜ₊₁, so it is necessary to use the
posterior distribution of µₜ and the relation (3.3.8) to determine
the prior distribution of µₜ₊₁.
In order to determine the distribution of the parameter µₜ₊₁,
a well-known theorem could be used. It says that the convolution g(z)
of two normal distributions with parameters (µ₁,σ₁²) and (µ₂,σ₂²)
gives a distribution which is normal with mean (µ₁ + µ₂) and variance
(σ₁² + σ₂²), i.e.,

(3.3.9) g(z) = f_N(z|µ₁ + µ₂, σ₁² + σ₂²)

[see Mood et al. (1974)]. Thus the distribution of µₜ₊₁ is normal,
i.e.,

(3.3.10) f_N(µₜ₊₁|mₜ″ + u, (σ²/nₜ″) + σₑ²), −∞ < µₜ₊₁ < ∞,
−∞ < mₜ″ + u < ∞,
(σ²/nₜ″) + σₑ² > 0.
We could find a simpler expression if we realize that, since σ² and
σₑ² are positive, there must exist nₛ such that

(3.3.11) σₑ² = σ²/nₛ, or nₛ = σ²/σₑ².

In other words, the disturbance variance is a multiple of the process
variance. The prior distribution of the mean after t periods then
simplifies to

(3.3.12) f_N(µₜ₊₁|mₜ″ + u, σ²(nₜ″ + nₛ)/(nₜ″nₛ)),

or

(3.3.13) f′(µₜ₊₁|mₜ₊₁′, σ²/nₜ₊₁′),

where

(3.3.14) mₜ₊₁′ = mₜ″ + u,

and

(3.3.15) nₜ₊₁′ = nₜ″nₛ/(nₜ″ + nₛ) < nₜ″.

The inequality stated above can be interpreted as showing that the
presence of nonstationarity produces greater uncertainty (variance)
at the start of period t+1 than would be present under stationarity,
because in the stationary case nₜ₊₁′ = nₜ″. If we assume that a change
in the mean occurs between every two consecutive periods then we could
repeat the previous procedure each time a change occurs to determine
the new prior distribution.
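The passage from the posterior at the end of period t to the prior at the start of period t+1, eqs. (3.3.14)-(3.3.15), can be sketched as follows; the function name shock_prior and the numerical values are hypothetical illustrations:

```python
def shock_prior(m_post, n_post, u, n_s):
    """Propagate the posterior on mu_t through the random shock (3.3.8):
    m'_{t+1} = m''_t + u                          (3.3.14)
    n'_{t+1} = n''_t n_s / (n''_t + n_s) < n''_t  (3.3.15)
    where n_s = sigma^2 / sigma_e^2."""
    return m_post + u, n_post * n_s / (n_post + n_s)

m_next, n_next = shock_prior(m_post=1.2, n_post=5, u=0.1, n_s=20.0)
assert n_next == 5 * 20.0 / 25.0   # harmonic-type combination
assert n_next < 5                  # nonstationarity inflates prior variance
```

The combination rule in (3.3.15) is of the same form as combining two precisions in parallel: n′ can never exceed either nₜ″ or nₛ, which is the algebraic source of the inequality discussed above.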
For a process that has a lognormal density function as defined
in (A1.14), it was shown in Appendix I that, when the unknown parameter
is µ, the natural conjugate prior is normal. Thus, the revision
of the prior distribution in any given period is identical to the revision
in the normal case [see equations (3.3.6) and (3.3.7)] except that m
is defined as the sample mean of the natural logarithms of the observed
x values. Furthermore, the procedure presented before to represent
changes in the mean, µ, of the normal distribution can be used to model
changes in the shift parameter µ of the lognormal distribution. The
normality of the natural conjugate prior, in this case, allows us to
use the formulas (3.3.8)-(3.3.15) to study the behavior of the prior
distribution of µ after t periods of time.
Since the variance V(x) of the lognormal random variable x is
a function of µ and σ² in the lognormal case, nonstationarity in µ
means that both the mean and the variance of x are nonstationary, so
that the lognormal case provides a generalization of the normal results.
3.3.2 µ and σ² Both Unknown
The results of the previous section can be extended to the case
of unknown mean and variance. The joint natural conjugate prior density
function for µ and σ² is a normal-gamma-2 function, as was shown in
Appendix I, given by

(3.3.16) f′_Nγ2(µ,σ²|m′,v′,n′,d′) =
[√n′/(σ√(2π))] exp[−(n′/(2σ²))(µ − m′)²]
· [(d′v′/2)^(d′/2)/Γ(d′/2)] (σ²)^(−(d′+2)/2) exp[−d′v′/(2σ²)].
Given a prior from this family and assuming that information is
available from a normal (or lognormal) process through a sample of
observations x₁, x₂, ..., xₙ, it is possible to obtain a posterior distribution
of the two parameters µ and σ². It was shown in Appendix I that the
posterior distribution is also normal-gamma-2, i.e., f″_Nγ2(µ,σ²|m″,v″,n″,d″),
where

(3.3.17) m″ = (n′m′ + nm)/(n′ + n),

(3.3.18) v″ = [d′v′ + n′m′² + dv + nm² − n″m″²]/(d′ + n),

(3.3.19) n″ = n′ + n,

and

(3.3.20) d″ = d′ + n.
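Equations (3.3.17)-(3.3.20) can be collected into a single updating routine; the function name ng2_update and the numerical prior below are hypothetical, and (m, v, n, d) denote the sample mean, sample variance, sample size, and degrees of freedom:

```python
def ng2_update(m1, v1, n1, d1, m, v, n, d):
    """Posterior normal-gamma-2 parameters, eqs. (3.3.17)-(3.3.20),
    from prior parameters (m1, v1, n1, d1) and sample summaries
    (m, v, n, d)."""
    n2 = n1 + n                                    # (3.3.19)
    m2 = (n1 * m1 + n * m) / n2                    # (3.3.17)
    v2 = (d1 * v1 + n1 * m1**2 + d * v + n * m**2
          - n2 * m2**2) / (d1 + n)                 # (3.3.18)
    d2 = d1 + n                                    # (3.3.20)
    return m2, v2, n2, d2

# Example: a diffuse prior (m'=0, v'=1, n'=1, d'=1) combined with a
# sample of size 4 having mean 2 and variance 1 (d = 3):
m2, v2, n2, d2 = ng2_update(0.0, 1.0, 1, 1, m=2.0, v=1.0, n=4, d=3)
assert (n2, d2) == (5, 5) and v2 > 0
```

All four posterior parameters again depend on the data only through the fixed-dimensional sufficient statistic (m, v, n, d), which is what keeps the family closed under sampling.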
It is clear from (3.3.16) that the joint distribution of µ
and σ² is the product of a conditional density for µ given σ² and the
marginal density of σ², i.e.,

(3.3.21) f″_Nγ2(µ,σ²|m″,v″,n″,d″) = f″_N(µ|σ²,n″,m″) f″_γ2(σ²|v″,d″).

The marginal density of σ² does not depend on µ. Now consider the case
of nonstationary µ as in the previous section. The independence of the
marginal distribution of σ² from µ will be an important factor in our
results below.
At the end of period t (the beginning of time period t+1) the
posterior distribution of µ and σ² could be used in conjunction with
the relation between µₜ and the random shock eₜ₊₁ to get the joint
prior distribution at the beginning of period t+1. As before, the random
shock model to be considered is µₜ₊₁ = µₜ + eₜ₊₁. We make the
assumption that although σ² is unknown, it is known that e's variance,
σₑ², is 1/nₛ times the unknown process variance, σ². As before, assuming
that µₜ has a posterior distribution with parameters (mₜ″, σ²/nₜ″) and that
e is distributed normally with parameters (u, σ²/nₛ), it was shown in
Appendix I that the convolution z (z = µ + e) has a conditional density
given by

(3.3.22) g(z) = f_N(z|mₜ″ + u, σ²[(1/nₜ″) + (1/nₛ)]).

Note that this density is conditional on σ², as is the conjugate
prior of µ. Thus, the prior density of µₜ₊₁, at the beginning of period
t+1 after the random shock has occurred, is given by

(3.3.23) f′_N(µₜ₊₁|mₜ″ + u, σ²[(nₛ + nₜ″)/(nₜ″nₛ)]).
Since σ² is assumed constant, f_γ2(σ²) does not change but
equals the posterior distribution at the end of period t. Hence, the
joint distribution at the beginning of period t+1 is given by

(3.3.24) f_Nγ2(µₜ₊₁,σ²) = f_N(µₜ₊₁|mₜ″ + u, σ²[(nₛ + nₜ″)/(nₛnₜ″)]) f_γ2(σ²|dₜ″,vₜ″).

If we let

(3.3.25) mₜ₊₁′ = mₜ″ + u,

(3.3.26) nₜ₊₁′ = nₜ″nₛ/(nₛ + nₜ″),

(3.3.27) dₜ₊₁′ = dₜ″,

and

(3.3.28) vₜ₊₁′ = vₜ″,

then the distribution of µ and σ² could be written as

(3.3.29) f′_Nγ2(µₜ₊₁,σ²) = f′_N(µₜ₊₁|mₜ₊₁′, σ²/nₜ₊₁′) f′_γ2(σ²|dₜ₊₁′,vₜ₊₁′).
The revision could be continued since the prior distribution
at the beginning of period t+1 is still a normal-gamma-2 distribution.
At any time t, the process mean is not known with certainty, but the
information from the samples collected up to time t provides an
indication of µₜ. Before the sample is taken at time t, we assume that
one is capable of assessing a prior density function that represents
our judgment (based on past experience, past information, etc.)
concerning the probabilities for the possible values of µₜ and σ². In
effect, one views (µₜ,σ²) as a pair of random variables to which we
have assigned a probability density function; in this case a normal-gamma-2
with parameters m′, n′, v′ and d′. The sample results at time
t can be described in terms of the sufficient statistics mₜ, nₜ, vₜ
and dₜ: sample mean, sample size, sample variance and degrees of
freedom needed to determine vₜ, respectively. Using these sample results,
a new posterior distribution could be obtained which is normal-gamma-2.
The tractability of the model is maintained when a natural conjugate
prior is used and a shift model of the form (3.3.8) is assumed for the
changes of the parameter µ between two consecutive periods. Hence,
after t periods of time the joint distribution of µ and σ² is normal-gamma-2;
that is,

(3.3.30) f_Nγ2(µₜ₊₁, σ²|mₜ₊₁′, nₜ₊₁′, dₜ₊₁′, vₜ₊₁′),

where

(3.3.31) dₜ₊₁′ = d₁′ + tn,

(3.3.32) nₜ₊₁′ = (nₜ′ + n)nₛ/[(nₜ′ + n) + nₛ],

(3.3.33) vₜ₊₁′ = [dₜ′vₜ′ + nₜ′mₜ′² + dv + nm² − nₜ″mₜ″²]/(dₜ′ + n),

and

(3.3.34) mₜ₊₁′ = (nₜ′mₜ′ + nm)/(nₜ′ + n) + u.

In this manner, a sequence of prior and posterior distributions for
successive µₜ may be obtained as successive values of the random vector
xₜ = (xₜ₁, ..., xₜₙ) are observed.
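The period-by-period alternation of conjugate update and random shock described by (3.3.25)-(3.3.28) can be simulated directly. The sketch below (function name step and all numerical values hypothetical; constant sample summaries are used only to isolate the parameter dynamics) shows the qualitative behavior studied in Chapter Four: nₜ′ remains bounded by nₛ while dₜ′ grows without bound:

```python
def step(m1, v1, n1, d1, m, v, n, d, u, n_s):
    """One period under nonstationarity: conjugate update with the
    sample [eqs. (3.3.17)-(3.3.20)], then propagation through the
    random shock [eqs. (3.3.25)-(3.3.28)]."""
    # posterior at the end of period t
    n2 = n1 + n
    m2 = (n1 * m1 + n * m) / n2
    v2 = (d1 * v1 + n1 * m1**2 + d * v + n * m**2
          - n2 * m2**2) / (d1 + n)
    d2 = d1 + n
    # prior at the start of period t+1: mean drifts by u, n shrinks
    return m2 + u, v2, n2 * n_s / (n2 + n_s), d2

params = (0.0, 1.0, 1, 1)          # m'_1, v'_1, n'_1, d'_1
for _ in range(50):                # fifty periods of identical samples
    params = step(*params, m=1.0, v=1.0, n=4, d=3, u=0.0, n_s=20.0)
m_t, v_t, n_t, d_t = params
assert n_t < 20.0                  # n'_t cannot exceed n_s
assert d_t == 1 + 50 * 4           # d'_t = d'_1 + t n, eq. (3.3.31)
```

Because nₜ′ stays bounded, the prior variance of µ never converges to zero under nonstationarity, in contrast with the stationary model; this is the mechanism behind the limiting results of Section 4.2.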
For the process that has a lognormal density function as defined
in (A1.14), it was shown before that when both parameters are unknown
the joint natural conjugate prior is normal-gamma-2. Thus, the revision
of the prior distribution in any given period is identical to the revision
in the normal case. Furthermore, the procedure presented previously
to represent changes in the mean, µ, of the normal distribution could
be used to model changes in the shift parameter of the lognormal.
The fact that both normal and lognormal distributions have a joint
natural conjugate prior which is normal-gamma-2 allows us to use the
formulas (3.3.30)-(3.3.34) to study the behavior of the prior
distribution of µ and σ² after t periods.
3.3.3 Stationary Versus Nonstationary Results
Stationary conditions, in the context of our discussion, imply
that there is no shift in the mean, µ, of the distribution; that is,
eₜ = 0 and consequently u and σₑ² are both zero. Successive values of µₜ
are the same across time, i.e., µ₁ = µ₂ = ... For the case when
only µ is unknown, this implies that equation (3.3.10) becomes

(3.3.35) f_N(µₜ₊₁|mₜ″ + 0, (σ²/nₜ″) + 0),

or

(3.3.36) f_N[µₜ₊₁|mₜ″, (σ²/nₜ″)].

Under stationarity, then, the prior distribution of µₜ₊₁ at the start
of period t+1 is the same as the posterior distribution of µₜ at the
end of period t. In the case of nonstationarity with no drift, u=0;
in other words, the distribution of e is normal with mean 0 and
variance σₑ². For this case it is clear that for a given posterior
distribution of µₜ at time t, the only difference between the prior
distribution of µₜ₊₁ under stationarity (see equation 3.3.36) and the
prior distribution of µₜ₊₁ under nonstationarity (see equation 3.3.10)
is the variance term. The prior variance of µₜ₊₁ under stationarity
is

(3.3.37) Var_S(µₜ₊₁) = σ²/nₜ₊₁′ = σ²/nₜ″;

whereas the prior variance of µₜ₊₁ under nonstationarity is

(3.3.38) Var_N(µₜ₊₁) = σ²/nₜ₊₁′ = (σ²/nₜ″) + (σ²/nₛ)
= σ²[(1/nₜ″) + (1/nₛ)].

As expected, the incorporation of the nonstationary condition has
caused an increase in the variance of the prior distribution. The
variance increased by an amount σ²/nₛ; that is, by an amount equal
to the variance of the distribution of successive increments in the
process mean. For the stationary case

(3.3.39) [1/nₜ₊₁′] = [1/nₜ″],

and for the nonstationary case,

(3.3.40) [1/nₜ₊₁′] = [(1/nₜ″) + (1/nₛ)].

Thus, equivalently, we could say that for a given posterior
distribution of µₜ at time t, the only difference between the prior
distributions of µₜ₊₁ is that the term nₜ₊₁′ is larger
with the stationary condition. When u ≠ 0, mₜ′ is always changing and,
therefore, there is a difference in mean and variance.
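The variance comparison in (3.3.37)-(3.3.40) amounts to a one-line arithmetic identity, checked numerically below with illustrative values (σ² = 4, nₜ″ = 10, nₛ = 25, all hypothetical):

```python
sigma2, n_post, n_s = 4.0, 10, 25.0  # sigma^2, n''_t, n_s = sigma^2/sigma_e^2

var_stationary = sigma2 / n_post                      # eq. (3.3.37)
var_nonstationary = sigma2 * (1/n_post + 1/n_s)       # eq. (3.3.38)

# The gap is exactly the shock variance sigma_e^2 = sigma^2 / n_s:
assert abs(var_nonstationary - var_stationary - sigma2/n_s) < 1e-12
assert var_nonstationary > var_stationary
```

Reading the same identity in terms of (3.3.39)-(3.3.40): the nonstationary model adds the constant 1/nₛ to the reciprocal of the prior precision count every period, which is what keeps uncertainty from vanishing.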
Stationary conditions, in the case when both µ and σ² are
unknown, imply that in any given period t+1 the joint prior density
for µ and σ² is a normal-gamma-2 of the form given in equations
(3.3.30)-(3.3.34). That is,

(3.3.41) f_Nγ2(µₜ₊₁,σ²|mₜ₊₁′,vₜ₊₁′,nₜ₊₁′,dₜ₊₁′) = f_N(µₜ₊₁|mₜ₊₁′, σ²/nₜ₊₁′) f_γ2(σ²|dₜ₊₁′,vₜ₊₁′),

where

(3.3.42) mₜ₊₁′ = mₜ″,

(3.3.43) vₜ₊₁′ = vₜ″,

(3.3.44) nₜ₊₁′ = nₜ″,

and

(3.3.45) dₜ₊₁′ = dₜ″.

Under stationarity, then, the joint prior distribution of µ and σ²
at the start of period t+1 is the same as the posterior distribution
of µₜ and σ² at the end of period t. Since the distribution of σ²
does not depend on µ, only on the parameters d and v, we could model
changes in µ. These changes in the mean only affect the function
f′(µₜ₊₁|mₜ₊₁′, σ²/nₜ₊₁′) in equation (3.3.41). In fact, the effect
of the nonstationarity assumption on f′(µₜ₊₁) is identical to the
effect of nonstationarity on the prior distribution in the case
when only µ was the unknown parameter. In the case of nonstationarity
with no drift, i.e., u=0, for a given posterior distribution of µₜ and
σ² at time t, the joint prior density function for µ and σ² is similar
to the stationary counterpart, as given in equation (3.3.41), except
for the fact that the variance of f′_N(µₜ₊₁|mₜ₊₁′, σ²/nₜ₊₁′) is larger
than the variance of f(µₜ₊₁|mₜ₊₁′, σ²/nₜ₊₁′) in the stationary case.
In other words, σ²/nₜ₊₁′ in the stationary case is smaller than σ²/nₜ₊₁′
in the nonstationary case.
The nonstationarity assumption also affects the predictive
distribution. For the case when µ is the unknown parameter and the
data generating process is normal, assume that after t periods we have
a posterior distribution f″(µₜ) which is normal with mean mₜ″ and
variance σ²/nₜ″. The predictive distribution at the end of period t
was shown in equation (A1.12) to be normal with mean

(3.3.46) Eₜ(xₜ) = mₜ″,

and variance

(3.3.47) Varₜ(xₜ) = σ²[(1 + nₜ″)/nₜ″] = σ²[1 + (1/nₜ″)].

If the process is stationary then the predictive distribution of the
random variable of interest at the beginning of period t+1 is the same
as the distribution we had at the end of period t, i.e., N(mₜ″, σ²[(1+nₜ″)/nₜ″]).
However, if we assume the nonstationary condition, the prior
distribution of µ at the start of period t+1 has a different mean and a
different variance. Consequently the predictive distribution changes in
mean and variance between consecutive time periods. In other words,
Eₜ₊₁(xₜ₊₁) is always changing depending on the stochastic change of
the mean µₜ₊₁. In the case of nonstationarity with no drift, i.e., u=0,
for a given posterior distribution of µₜ at time t, the only
difference between the predictive distribution of xₜ₊₁ under stationarity
and the predictive distribution of xₜ₊₁ under nonstationarity is the
variance term. The variance of xₜ₊₁ under stationarity, at the start
of time period t+1, is

(3.3.48) Varₜ₊₁(xₜ₊₁) = σ²[(1 + nₜ₊₁′)/nₜ₊₁′] = σ²[1 + (1/nₜ₊₁′)].

It was stated previously that the parameter nₜ₊₁′ is smaller when µ
is unknown and nonstationary than when µ is unknown but stationary.
Hence, as expected, the variance of the predictive distribution,
Varₜ₊₁(xₜ₊₁), is larger when µ is nonstationary. This has some
implications for the determination of prediction intervals, which
we will discuss in detail in Chapter Four. Nonstationarity implies
greater uncertainty, which is reflected by an increase in the
measure of uncertainty, variance.
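The effect on prediction intervals can be anticipated with a short sketch based on (3.3.48); the function name predictive_interval and the numerical inputs are hypothetical, and 1.96 is the usual standard normal quantile for a 95% interval:

```python
import math

def predictive_interval(m, n_prime, sigma2, z=1.96):
    """95% prediction interval for x_{t+1} from the normal predictive
    distribution with mean m and variance sigma^2 [1 + 1/n'], eq. (3.3.48)."""
    sd = math.sqrt(sigma2 * (1 + 1 / n_prime))
    return m - z * sd, m + z * sd

# Nonstationarity shrinks n'_{t+1} per (3.3.15), so the interval widens:
lo_s, hi_s = predictive_interval(1.0, 10.0, 4.0)        # stationary: n' = n'' = 10
lo_n, hi_n = predictive_interval(1.0, 10*25/35, 4.0)    # n' = n'' n_s/(n'' + n_s)
assert hi_n - lo_n > hi_s - lo_s
```

The widening is modest when nₛ is large (small shock variance) and pronounced when nₛ is small, which is the trade-off examined further in Chapter Four.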
For the case when both µ and σ² are the unknown parameters
and the data generating process is normal, assume that after t
periods we have a posterior distribution f″(µₜ,σ²) which is
normal-gamma-2 with parameters mₜ″, nₜ″, vₜ″ and dₜ″. The predictive
distribution at the end of period t was shown in equation (A1.33)
to be Student with mean

(3.3.49) Eₜ(xₜ) = mₜ″, dₜ″ > 1,

and variance

(3.3.50) Varₜ(xₜ) = [vₜ″(nₜ″ + 1)/nₜ″][dₜ″/(dₜ″ − 2)], dₜ″ > 2.

Again, if the process is stationary then the predictive distribution
at the beginning of period t+1 is the same as the distribution that
we had at the end of period t, i.e., ST(mₜ″, [vₜ″(nₜ″+1)/nₜ″][dₜ″/(dₜ″ − 2)]).
When we assume the nonstationary condition, the joint prior
distribution of µ and σ² at the start of period t+1 changes from its
original form at the end of period t. The specific random model we
are assuming causes the parameters m and n of the distribution of
µ to change from the end of period t to the start of period t+1.
Therefore the predictive distribution f′ₜ₊₁(xₜ₊₁) has a different
mean and variance than f″ₜ(xₜ). In the case of nonstationarity with
no drift, i.e., u=0, for a given posterior distribution of µₜ and σ²
at time t, the only difference between the predictive distribution of
xₜ₊₁ under stationarity vis-a-vis nonstationarity is the variance term.
Observing equation (3.3.50) closely we note that the effect of
nonstationarity is the same as in all previous cases; that is, the parameter
nₜ₊₁′ is smaller when µ is nonstationary and therefore the variance is
larger. In this case, since both µ and σ² are unknown, at the end of
period t our estimate of the variance is vₜ″, which includes all the
information that we have available at the time, including sample
information.
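The Student predictive variance (3.3.50) and its dependence on n″ can be sketched directly; the function name student_predictive_var and the inputs are hypothetical:

```python
def student_predictive_var(v_post, n_post, d_post):
    """Variance of the Student predictive distribution, eq. (3.3.50):
    [v''(n'' + 1)/n''][d''/(d'' - 2)], defined only for d'' > 2."""
    if d_post <= 2:
        raise ValueError("predictive variance undefined for d'' <= 2")
    return (v_post * (n_post + 1) / n_post) * (d_post / (d_post - 2))

# A smaller n (more uncertainty about mu, as under nonstationarity)
# gives a larger predictive variance, for the same v'' and d'':
assert student_predictive_var(1.0, 5, 10) > student_predictive_var(1.0, 10, 10)
```

As dₜ″ grows the factor dₜ″/(dₜ″ − 2) tends to one, so in the limit the predictive variance is governed by vₜ″ and the bounded parameter nₜ″, a point taken up in Chapter Four.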
A comparison of stationary versus nonstationary results when
the data generating process is lognormal moves along the same lines as
for the normal process. For the case where the unknown parameter is
µ, the nonstationarity condition causes an increase in the variance
and in the mean of the normal prior distribution, which causes an
increase in the mean and variance of the lognormal predictive
distribution. Similarly, for the case when both parameters are unknown, the
condition causes an increase in mean and variance in the prior
distribution of µ and a change in the joint prior distribution of µ and σ²,
which affects the log-Student predictive distribution. The log-Student
predictive distribution has infinite mean and variance, which are not
affected by the nonstationary condition.
3.4 Conclusion
In this chapter we modeled nonstationarity in the mean of
normal and lognormal processes under two uncertainty assumptions.
The model is built upon the Bayesian analysis of normal processes
of Raiffa and Schlaifer (1961) and upon the analysis of nonstationary
means of normal processes, for unknown µ, of Barry (1973). We extended
the nonstationary results of Barry (1973) to the lognormal distribution.
The variance of the lognormal distribution is given by

(3.4.1) Var(x) = ω(ω − 1) e^(2µ),

where ω = exp(σ²). Since V(x) is a function of µ and σ² in the
lognormal case, nonstationarity in µ means that both mean and variance
of x are nonstationary, so that the lognormal case provides a
generalization of the normal results. Furthermore, we developed the
nonstationary model for the mean of normal and lognormal processes for
the case when both parameters, µ and σ², are unknown. For each group
of assumptions we noted that, in every time period t, the uncertainty
is never fully eliminated from the model.
In Chapter Two we emphasized that the exponential distribution
was often used to represent life testing models. All the
research in the area of life testing where this distribution has
been used has assumed stationary conditions for the parameters of
the model and for the model itself. Appendix II shows the Bayesian
modeling of nonstationarity for the parameters of an exponential
distribution using random shock models. Only under very trivial
assumptions does the analysis yield tractable and consequently useful
results. On the other hand, as was shown in this chapter, the normal
and lognormal distributions provide results that are especially
tractable.
In any given period t, the prior, posterior and predictive
distributions depend on the parameters mₜ and nₜ when only µ is
unknown, and on the parameters mₜ, nₜ, vₜ and dₜ when both µ and
σ² are unknown. Under the nonstationarity conditions, these parameters
change from period to period not only because new information
becomes available through the sample, but because of the additional
uncertainty involving the shifts in the parameter µ. To make better
use of these distributions the decision maker must know how they are
evolving through time. Management requires realistic and accurate
information to aid in decision making. For instance, the decision
maker can be interested in knowing how the variance of the
distribution of the mean, µ, changes across time. Furthermore, since one
of the objectives of the user of the distribution is to construct
prediction intervals for the process variable, he can be interested
in knowing how the variance of the predictive distribution behaves
as the number of observed periods increases. We will address this
problem in detail in Chapter Four through the study of the limiting
behavior of the parameters mₜ, nₜ, vₜ and dₜ. In addition, attention
will be focused on the methods of constructing prediction intervals
for the normal, Student, lognormal and log-Student distributions
under various uncertainty conditions.
CHAPTER FOUR
LIMITING RESULTS AND PREDICTION INTERVALS FOR NONSTATIONARY
NORMAL AND LOGNORMAL PROCESSES
4.1 Introduction
In Chapter Three we emphasized that for many real world data
generating processes the assumption of stationarity is questionable and
stochastic parameter variation seems to be a reasonable assumption. If
a data generating process characterized by some parameter is
nonstationary, then it is potentially misleading to make inferences and decisions
concerning the parameter as if it only took on a single value. We should
be concerned with a sequence of values of the parameter corresponding to
different time periods. It was shown in Chapter Three that if we use a
particular stochastic model we can model nonstationarity for the shift
parameter of normal and lognormal processes from a Bayesian viewpoint,
under two uncertainty conditions, and that we can obtain tractable
results. In particular, values of the parameter for successive time
periods are assumed to be related as
(4.1.1) µₜ₊₁ = µₜ + eₜ₊₁, t = 1, 2, ...,

where eₜ₊₁ is a normal "random shock" term independent of µₜ with known
mean u and variance σₑ². The mean in any period t is equal to the mean
in the previous period plus an increment e, which has a normal
distribution with known mean.
Comparing the stationary with the nonstationary processes, we
pointed out that when the data generating process is normal or
lognormal and the unknown parameter is µ, the nonstationary condition
causes in any given period t an increase in the variance of the normal
prior distribution. This causes an increase in the variance of the normal
predictive distribution for normal processes and causes an increase
in the mean and variance of the lognormal predictive distribution for
lognormal processes. When both parameters, µ and σ², are unknown, a
similar result is found for the prior and predictive distributions of
the normal and lognormal data generating processes.
The results discussed in Chapter Three have to do with the
period to period effects of random parameter variation upon the prior
and predictive distributions. However, the asymptotic behavior of the
model also has important implications for the decision maker. For instance,
when μ is the only unknown parameter, under constant parameters uncertainty
about μ is eventually eliminated, since n'_t increases without
bound and the sequence of prior variances (σ²/n'_t) converges to zero.
Hence the distribution of μ eventually will be unaffected by further
samples. On the other hand, shifting parameters can increase the uncertainty
under which a decision must be made, since parameter variation reduces the
information content that past samples offer for the actual situation. Increases
in uncertainty caused by stochastic parameter variation have important
implications for the decision maker, since his decisions depend upon
the uncertainty under which they are made. Similarly, random parameter
variation produces important differences in the limiting behavior of
the prior and predictive distributions when μ and σ² are the unknown
parameters. In Section 4.2 we study the limiting behavior of the parameters
m'_t, v'_t, n'_t, and d'_t of the prior and predictive distributions for
the normal and lognormal data generating processes. In addition we discuss
the implications of these limiting results for inferences and
decisions based on the posterior and predictive distributions.
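The contrast just described can be checked numerically. The sketch below, under illustrative parameter values that are assumptions rather than values from the text, iterates the normal conjugate variance update: under stationarity the prior variance σ²/n'_t shrinks toward zero, while under random shifts the shock variance σ²_e is added back each period and the prior variance settles at a strictly positive limit.

```python
# Illustrative values (assumptions, not from the text)
sigma2, var_e, n = 4.0, 0.25, 5
prior_var_stat = prior_var_shift = sigma2 / 2.0   # common initial prior variance

for _ in range(200):
    # posterior variance after a sample of size n (normal conjugate update)
    post_stat = 1.0 / (1.0 / prior_var_stat + n / sigma2)
    post_shift = 1.0 / (1.0 / prior_var_shift + n / sigma2)
    prior_var_stat = post_stat              # stationary: posterior becomes prior
    prior_var_shift = post_shift + var_e    # shifting: shock variance is added

print(round(prior_var_stat, 6), round(prior_var_shift, 4))
```

Under stationarity the prior precision grows by n/σ² each period, so the variance behaves like σ²/(n'_1 + tn) and vanishes; under shifts the sequence converges to the positive fixed point of v → (1/v + n/σ²)⁻¹ + σ²_e, so uncertainty about μ is never eliminated.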
In any period t, all the information contained in the initial
prior distribution and in subsequent samples is fully reflected in the
posterior and the predictive distributions. In some applications, partial
summaries of the information are of special importance. One important
way to partially summarize the information contained in the posterior
distribution is to quote one or more intervals which contain a stated
amount of probability. Often the problem itself will dictate certain
limits which are of special interest. A rather different situation
occurs when there are no limits of special interest, but an interval
is needed to show a range over which "most of the probability lies".
One objective of this thesis is to develop Bayesian prediction
intervals for future observations that come from normal and lognormal
data generating processes. In particular, we are interested in most
plausible Bayesian prediction intervals of cover P, as defined in Section
2.2. In Section 4.3 we discuss the problem of constructing prediction
intervals for normal, Student, lognormal and logStudent distributions.
It is pointed out that it is easy to construct these intervals for the
normal and Student distributions but rather difficult for the lognormal
and logStudent distributions. An algorithm is presented to compute the
Bayesian prediction intervals for the lognormal and logStudent
distributions. In addition, we discuss the relationship that exists
between Bayesian prediction intervals under nonstationarity
and classical certainty equivalent and Bayesian stationary intervals.
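The normal case is worth previewing here. Since the normal predictive density is symmetric and unimodal, the most plausible (shortest) interval of cover P coincides with the central equal-tails interval. A minimal sketch, with illustrative parameter values that are assumptions rather than values from the text:

```python
from statistics import NormalDist

def normal_prediction_interval(m, sigma2, n_post, P=0.95):
    """Central interval of cover P for the normal predictive density
    f_N(x | m'', sigma^2 (1 + n'')/n'')."""
    var_pred = sigma2 * (1 + n_post) / n_post   # predictive variance
    z = NormalDist().inv_cdf((1 + P) / 2)       # upper-tail standard normal quantile
    half = z * var_pred ** 0.5
    return m - half, m + half

lo, hi = normal_prediction_interval(m=100.0, sigma2=4.0, n_post=20, P=0.95)
print(round(lo, 2), round(hi, 2))   # 95.98 104.02
```

For the lognormal and logStudent predictive distributions the density is skewed, so the shortest interval of cover P is no longer equal-tailed, which is why the algorithmic treatment of Section 4.3 is needed.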
4.2 Special Properties and Limiting Results
Under Nonstationarity
4.2.1 Limiting Behavior of m'_t and n'_t When μ Is the Only Unknown Parameter
For a process that has a normal density function with unknown
parameter μ, Raiffa and Schlaifer (1961) show that the natural conjugate
prior distribution is normal with parameters m' and σ²/n'. In Section
3.3 we pointed out that if the mean, μ, of the data generating process
does not change from period to period except by the effect of the sample
information, then each posterior can be thought of as a prior with
respect to a subsequent sample. In general, if we assume that a sample
of size n_t is employed every time a sample is taken [which yields a
statistic m_t = (Σ_{i=1}^{n_t} x_i)/n_t] and if we assume that the mean μ is
stationary, then in any given period t the posterior distribution of μ
is normal with parameters n''_t and m''_t given by

(4.2.1)  n''_t = n'_t + n_t

and

(4.2.2)  m''_t = (n'_t m'_t + n_t m_t)/(n'_t + n_t).
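In code, the update in (4.2.1) and (4.2.2) is simply a precision-weighted average of the prior mean and the sample mean; the numerical values below are illustrative:

```python
def posterior_parameters(n_prior, m_prior, n_t, m_t):
    """Normal conjugate update with known process variance:
    (4.2.1) n''_t = n'_t + n_t
    (4.2.2) m''_t = (n'_t m'_t + n_t m_t) / (n'_t + n_t)"""
    n_post = n_prior + n_t
    m_post = (n_prior * m_prior + n_t * m_t) / n_post
    return n_post, m_post

print(posterior_parameters(n_prior=4, m_prior=10.0, n_t=6, m_t=12.0))  # (10, 11.2)
```

The posterior mean lies between the prior mean and the sample mean, pulled toward whichever carries more equivalent observations.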
In order to study the limiting values of n'_t and m'_t under stationary
conditions, we have to characterize the posterior and predictive
distributions after t periods of time have elapsed. Since the limiting
results under nonstationary means will be based on a fixed sample size
each period, we will make the same assumption for the stationary
limiting results, that is, n_t = n, ∀t. In period one, for a process that has
a normal density function with unknown parameter μ, i.e., f_N(x|μ),
the natural conjugate prior is normal with mean m'_1 and variance σ²/n'_1,
i.e., f_N(μ|m'_1, σ²/n'_1). If a sample of size n from a normal process yields
the sufficient statistics m_1 and n, then the posterior and predictive
distributions at the end of period one are given by

(4.2.3)  f''_N[μ | (n'_1 m'_1 + n m_1)/(n'_1 + n), σ²/(n'_1 + n)] = f''_N(μ | m''_1, σ²/n''_1)

or

         = f'_N(μ | m'_2, σ²/n'_2),

and

(4.2.4)  f_N(x | m''_1, σ²(1 + n''_1)/n''_1),

respectively.
In period two, if a sample is taken from a normal process that
yields the sufficient statistics m_2 and n, then the posterior and predictive
distributions at the end of the period are given by

(4.2.5)  f''_N[μ | (n'_1 m'_1 + n(m_1 + m_2))/(n'_1 + 2n), σ²/(n'_1 + 2n)] = f''_N(μ | m''_2, σ²/n''_2)

or

         = f'_N(μ | m'_3, σ²/n'_3),

and

(4.2.6)  f_N(x | m''_2, σ²(1 + n''_2)/n''_2),

respectively.
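The recursion in (4.2.3) through (4.2.6) can be carried forward mechanically: each posterior becomes the next prior, so after t periods with fixed sample size n we have n''_t = n'_1 + tn, and the posterior mean weights the initial prior mean against the accumulated sample means, as the bracketed term in (4.2.5) suggests. A sketch with illustrative numbers:

```python
n1, m1 = 4, 10.0                 # initial prior parameters n'_1, m'_1
n = 5                            # fixed sample size each period
sample_means = [11.0, 9.5, 10.4]

# iterate the conjugate update: each posterior serves as the next prior
n_cur, m_cur = n1, m1
for m_t in sample_means:
    n_cur, m_cur = n_cur + n, (n_cur * m_cur + n * m_t) / (n_cur + n)

t = len(sample_means)
assert n_cur == n1 + t * n       # n''_t = n'_1 + t*n

# closed form generalizing the bracketed term of (4.2.5) to t periods
m_closed = (n1 * m1 + n * sum(sample_means)) / (n1 + t * n)
print(n_cur, round(m_cur, 6), round(m_closed, 6))
```

The agreement of the iterated and closed-form means confirms that, under stationarity, the order in which samples arrive is irrelevant; only their total count and sum matter.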