A BAYESIAN ANALYSIS OF MODEL SPECIFICATION UNCERTAINTY
IN FORECASTING AND CONTROL
By
PAUL GEORGE BENSON
A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF
THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
Even though you now know the true
model of the process and have no
need for this dissertation
... this is for you, Dad.
ACKNOWLEDGMENTS
For his technical insights, constant encouragement, unflagging
optimism, patience, and personal commitment to me and this dissertation
I am deeply indebted to my friend and major advisor, Dr. Christopher B.
Barry. Exposure to his enthusiasm for and approach to research has
sparked my interest in research and made me aware of career paths I
might otherwise have overlooked.
Dr. Ira Horowitz has made numerous helpful technical and editorial
suggestions. More importantly, at the time of my greatest personal need
his door was always open. His emotional support and understanding will
never be forgotten.
I am grateful to the other members of my committee, Dr. Roger D.
Blair, Dr. H. Russell Fogler, and Dr. James T. McClave, for not only
their comments and constructive criticisms of this dissertation, but for
their encouraging words and advice throughout my tenure at Florida. I
am fortunate to have such friends.
Dr. William Mendenhall has been both friend and advisor for many
years. It was he who originally aroused my interest in combining a
major in quantitative management with a minor in statistics. I have
benefited both personally and professionally from his advice, support,
and concern for my well being.
I particularly want to thank Dr. Max R. Langham for the interest
and confidence he showed in me during my first two years at Florida.
His words of encouragement made me feel capable of succeeding in a
doctoral program and convinced me to continue on for a doctorate.
My colleagues David L. Hill and Ronald E. Shiffler took time from
their own dissertations to listen to and comment on my ideas. I particu
larly want to thank Dave for his many very helpful suggestions.
My thanks to Kathy Jarboe at the University of Florida and Diane
Berube at the University of Minnesota for the excellent typing support
they provided while I was drafting chapters of this dissertation. Thanks
also to Kathy for cheerfully functioning as a go-between and red-tape
cutter at the University of Florida after I moved to Minnesota.
I am particularly indebted to Pat Kaluza for professionally
creating the final typed draft of this dissertation. Her willingness
to meet tight deadlines and ability to smile even after facing the
painful notation of Chapter V will be remembered.
My sincere thanks to Elizabeth Wells for patiently listening to and
advising me on a potentially all-consuming personal problem. Our many
long talks did much to free my mind for work on this dissertation.
The thoughtfulness of friends Sharron K. Duncan and Susan G. Benson,
and new friends Thomas R. Lundstedt, Charlotte (Char) A. Lundstedt, and
Claris E. Loomis helped me to successfully negotiate the transition from
sunny Florida to frozen Minnesota last winter. Claris deserves special
mention. Her concern for my happiness and interest in my work helped
speed my return to work on this dissertation. Had we not met when we
did, this dissertation might still be a collection of partially finished
chapter drafts.
My first course in statistics was at Bucknell University in 1964.
The text we used was by Dr. Mendenhall, the instructor was my father.
No single course or approach to teaching has influenced me more.
Throughout my life, but particularly during the past two and a half
years, my father has been a great source of strength and inspiration.
Even though absent, he was present. The completion of this disser
tation is as much his accomplishment as it is mine.
Finally, my thanks to my mother for not letting me set my goals
too low or letting me give up before attaining them. Your many, many
sacrifices have been and are appreciated.
TABLE OF CONTENTS

                                                                     Page

ACKNOWLEDGMENTS                                                       iii

ABSTRACT                                                             viii

Chapter

I.   INTRODUCTION                                                       1

     I.1  Statistical Models                                            2
     I.2  Model Specification Uncertainty                               3
     I.3  The Bayesian Approach to Inference and Decision Making        5
          I.3.1  The Predictive Distribution                            5
          I.3.2  The Posterior Distribution                             7
     I.4  Chapter Outline and Preview of Results                        7

II.  HYPOTHESIS TESTING, BAYESIAN MODEL SELECTION, AND
     BAYESIAN MODEL COMPARISON                                         11

     II.1  Harold Jeffreys: Hypothesis Testing                         11
     II.2  Harry V. Roberts: Comparing Forecasters                     13
     II.3  Martin S. Geisel: Bayesian Model Comparison
           and Selection                                               16

III. FORECASTING WITH AND WITHOUT REGARD FOR MODEL
     SPECIFICATION UNCERTAINTY                                         25

     III.1  A Comparison of the Predictive Variances Generated
            by the Bayesian Mixed Model Distribution and the
            Bayesian Model Selection Procedure                         27
     III.2  Forecasting: Bayesian Model Comparison Versus
            Bayesian Model Selection and the Maximize-R² Rule          36
            III.2.1  The Bayesian Model Selection Procedure (BMS)      37
            III.2.2  The Maximize-R² Rule                              39
            III.2.3  The Bayesian Model Comparison Procedure (BMC)     42
            III.2.4  Model Space and Assumptions                       45
            III.2.5  The Treatment of Model Specification
                     Uncertainty                                       48
            III.2.6  Risk Specification                                50
            III.2.7  A Comparison of Expected Losses                   57
            III.2.8  Implications for Point Estimation                 59
            III.2.9  Implications for Interval Estimation              62

IV.  MODEL SPECIFICATION UNCERTAINTY IN SINGLE-PERIOD
     ECONOMIC CONTROL PROBLEMS                                         68

     IV.1  The Economic Control Problem                                69
     IV.2  Model Space and Assumptions                                 74
     IV.3  Single-Period Certainty-Equivalent Control                  76
           IV.3.1  Certainty-Equivalent Control Using the
                   BMMP Distribution                                   78
           IV.3.2  Risk Specification in Certainty-Equivalent
                   Control                                             82
     IV.4  Optimal Single-Period Control                               87
           IV.4.1  Optimal BMC Control                                 89
           IV.4.2  Optimal BMC Control When Instrument Use
                   Costs Are Considered                                94
           IV.4.3  Certainty-Equivalent BMC Control Solutions
                   When Instrument Use Costs Are Considered           104
           IV.4.4  Risk Specification in Optimal BMC Control          107
           IV.4.5  BMC Control When More Complicated Models
                   Are Included in the Model Space                    108

V.   BAYESIAN MODEL SWITCHING                                         112

     V.1  Bayesian Model Switching Methodology                        112
     V.2  Special Case: Two Normal Models                             122

VI.  CONCLUDING COMMENTS AND SUGGESTIONS FOR FURTHER
     RESEARCH                                                         139

     VI.1  Research Difficulties Encountered                          141
     VI.2  Shortcomings of the BMC Procedure and Suggestions
           for Further Research                                       143
     VI.3  Suggestions for Further Research in the Areas of
           Economic Control and Model Nonstationarity                 145

BIBLIOGRAPHY                                                          148

BIOGRAPHICAL SKETCH                                                   151
Abstract of Dissertation Presented to the Graduate Council
of the University of Florida in Partial Fulfillment
of the Requirements for the Degree of Doctor of Philosophy
A BAYESIAN ANALYSIS OF MODEL SPECIFICATION UNCERTAINTY
IN FORECASTING AND CONTROL
By
Paul George Benson
August 1977
Chairman: Christopher B. Barry
Major Department: Management
The use of statistical models for forecasting and economic control
has received widespread attention in recent years. Most of this
attention has been focused on the problems caused by uncertainty
concerning the parameters of a given model, whereas little attention
has been paid to the problems caused by uncertainty concerning the
specification of the model itself. In this dissertation Bayesian
methodology is employed to treat model specification uncertainty in
forecasting and control environments. The implications of forecasting
with and without formal regard for model specification uncertainty are
explored via a comparison of the recommended methodology and alternative
methods which involve the selection of a single model. The recommended
methodology is applied to single-period economic control problems. In
particular, certainty-equivalent and optimal analytic solutions are
found for problems in which there exist two viable alternative linear
models of the data-generating process, each with a different instrument
and no intercept term. Solutions are obtained for situations in which
control is cost-free and in which various instrument-use cost functions
are known. Finally, a Bayesian procedure for modeling and making
inferences about particular nonstationary data-generating processes is
introduced. This procedure characterizes data as being generated by
different statistical models in different time periods, with the switch
between models controlled by some random process.
CHAPTER I
INTRODUCTION
The use of statistical models for forecasting and economic control
has received widespread attention in recent years. Most of this
attention has been focused on the problems caused by uncertainty
concerning the parameters of a given model. As a result, much has been
written about parameter specification and estimation and their
decision-making implications, whereas little analytical attention has
been paid to the problems caused by uncertainty concerning the
specification of the model itself. The implications of this type of
uncertainty for forecasting and control are virtually unexplored. That
these implications are significant and worth exploring has been
expressed by Pierce:

     Another area of uncertainty has to do with our models. ...
     The problem lies not only with uncertainty concerning the
     true value of model parameters, but also with the structure
     of models themselves. ... We have found that with some
     relatively minor changes in the specification of our
     quarterly model ... we can importantly alter its policy
     multipliers.1
J. L. Pierce, "Quantitative Analysis for Decisions at the Federal
Reserve," Annals of Economic and Social Measurement, 3 (1974), 19.
In this dissertation the Bayesian Model Comparison procedure
developed by Geisel1 from the work of Roberts2 is advocated as a method
for formally treating model specification uncertainty in forecasting
and control problems. The implications of forecasting with and without
regard for model specification uncertainty are examined, and the
Bayesian Model Comparison procedure is applied to simple single-period
economic control problems.

The following sections of this chapter introduce definitions and
discuss concepts that will be referred to throughout the remainder of
the dissertation.
I.1 Statistical Models

Throughout this dissertation the term "model" refers to a
parametric statistical characterization of a data-generating process
composed of both deterministic and random components. The general
linear model used in regression analysis is an example of such a
characterization. Each such model describes the data-generating process
via a family of probability density functions in which each member of
the family depends on a finite number of parameters, probability density
functions over the parameters, and predetermined values of a specified
set of variables upon which it has been hypothesized that the
data-generating process depends.

1Martin S. Geisel, "Comparing and Choosing Among Parametric
Statistical Models: A Bayesian Analysis with Macroeconomic Applications"
(Ph.D. dissertation, University of Chicago, 1971).

2Harry V. Roberts, "Probabilistic Prediction," Journal of the
American Statistical Association, 60 (March, 1965), 50-62.
Statistical models are used to describe the stochastic behavior
of a data-generating process. Decision makers use them "as if" they
were actually generating the data of interest. Any reference to a
model as being the "true" or "correct" model of a data-generating
process should not be taken literally. A model is referred to as
being the "true" model only insofar as it behaves "as if" it were
generating the observed data.
I.2 Model Specification Uncertainty

The statistical models discussed in the previous section
explicitly admit uncertainty about the data-generating process through
their parameters and random error terms. These two sources of
uncertainty will be referred to as parameter uncertainty and random
error (or residual) uncertainty. Random error is present in each model
since the deterministic component of the model cannot realistically be
expected to account for all factors influencing a realization of the
data-generating process. Parameter uncertainty is present since a
model's parameters are typically not observable and must be estimated
from sample data. Being explicitly present in a statistical model,
these two types of uncertainty and their implications for decision
making have received considerable attention in the literature.1
Thus, it is well-known that the appropriate use of a statistical model
in decision making requires the consideration and treatment of both
parameter and random error uncertainty.
1References are provided in Chapters III and IV.
When a decision maker is uncertain as to the functional form of
his model and/or is uncertain as to the set of variables upon which
the data-generating process depends, model specification uncertainty
is said to be present. Model specification uncertainty and its
decision-making implications have received little attention in the
literature.1 As a result, model specification uncertainty is typically
ignored or assumed away in the statistical analysis of data-generating
processes that precedes decision making. The usual procedure is for the
decision maker to utilize sample information to aid in the selection of
a model from a set of models he believes to be viable alternative
representations of the data-generating process. The chosen model is
then assumed to appropriately represent the data-generating process,
and the decision maker bases his decisions on the information provided
by this model. Such a procedure can formally consider only parameter
and random error uncertainty. Depending on the particular model
selection procedure utilized, model specification uncertainty is either
completely ignored or suboptimally treated. The result is that some or
all of the information provided about the data-generating process by
the set of models which were not chosen, but were believed to be viable,
is lost. This loss is analogous to the information loss that would
occur if the decision maker assumed he knew the parameters of a given
model and made his decisions without acknowledging parameter
uncertainty. Chapters III and IV will discuss in detail the
decision-making implications of the information loss caused by failing
to treat model specification uncertainty.
1An interesting exception is the recent paper by M. Brenner, "The
Effect of Model Misspecification on Tests of the Efficient Market
Hypothesis," Journal of Finance, 32 (1977), 57-66. There are other
exceptions as well.
I.3 The Bayesian Approach to Inference and Decision Making

In this dissertation uncertainty is dealt with via Bayesian
inferential procedures. This section briefly reviews the methodology
of Bayesian inference.1
I.3.1 The Predictive Distribution

Decisions frequently hinge on the future outcome of a
data-generating process. In such cases decision makers typically use a
statistical model to characterize the data-generating process. If
model specification uncertainty is negligible and the parameters of
the model are known, then the decision maker can feel secure in basing
his decision on the information provided him by his model. However, if
the parameters of the model are unknown the model should be altered to
reflect the decision maker's uncertainty concerning the parameters.
This can be accomplished by treating the parameters as random variables,
utilizing a probability distribution over the parameters to reflect the
decision maker's parameter uncertainty, and computing the marginal
distribution of future realizations of the data-generating process,
i.e., the distribution of future realizations which is not conditioned
on the model's parameters.
Suppose the decision maker's statistical model describes the
data-generating process via the sampling distribution f(y_F|θ), where
y_F is a future value of the data-generating process (y_F ∈ Y) and the
parameters of the data-generating process are represented by θ (θ ∈ Θ).
Then, if the decision maker's parameter uncertainty can be described by
a probability distribution g'(θ), the decision maker can compute the
marginal distribution of future realizations of the data-generating
process as follows:

     f(y_F) = ∫_Θ g'(θ)f(y_F|θ)dθ.                               (1.1)

1For a thorough discussion of the methodology reviewed in this
section see Howard Raiffa and Robert Schlaifer, Applied Statistical
Decision Theory (Cambridge, Mass.: The M.I.T. Press, 1961).
This distribution is referred to as a predictive distribution.
If the decision maker is able to obtain a sample from the
data-generating process of interest he may update his distribution of θ
to reflect the sample information. Then, utilizing his revised
distribution of θ, he may recompute his predictive distribution of y_F
so that it too reflects the sample information. The revision of g'(θ)
is accomplished via Bayes' Rule:

     f"(θ|y) = g'(θ)f(y|θ) / ∫_Θ g'(θ)f(y|θ)dθ.                  (1.2)
The function g'(θ) is called the decision maker's prior distribution of
θ since it was established prior to obtaining the sample y. The
function f(y|θ) is a likelihood function. It describes the likelihood
of the given sample result, y, for different values of θ. The function
f"(θ|y) is the decision maker's revised distribution of θ. It is called
a posterior distribution since it was computed following the receipt of
sample information. The posterior distribution reflects all the
information about θ currently available to the decision maker. This
information may be incorporated into his predictive distribution of y_F
as follows:

     f(y_F|y) = ∫_Θ f"(θ|y)f(y_F|θ)dθ.                           (1.3)

It is from this distribution that needed information about future
observations of the data-generating process should be extracted. As
more sample and/or subjective information about the process becomes
available, the decision maker can formally revise his predictive
distribution to reflect that information by repeatedly applying the
above procedure.
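The cycle just described, revising g'(θ) via (1.2) and then recomputing the predictive distribution via (1.3), can be sketched numerically. The Python fragment below is an illustration only, not part of the dissertation: it assumes a normal data-generating process with known residual variance and a conjugate normal prior on the mean θ, so that every distribution stays normal and the integrals in (1.1) and (1.3) have closed forms. All numerical values are invented.

```python
# Conjugate sketch of (1.1)-(1.3): normal data-generating process with
# known variance sigma_sq, normal prior g'(theta). Illustrative only.

def posterior(mu0, tau0_sq, sigma_sq, y):
    """Bayes' Rule (1.2) for a normal prior and normal likelihood:
    returns the mean and variance of the revised distribution of theta."""
    n = len(y)
    ybar = sum(y) / n
    prec = 1.0 / tau0_sq + n / sigma_sq          # posterior precision
    mu_n = (mu0 / tau0_sq + n * ybar / sigma_sq) / prec
    return mu_n, 1.0 / prec

def predictive(mu, tau_sq, sigma_sq):
    """Predictive distribution (1.1)/(1.3): integrating the normal
    likelihood against a normal distribution over theta gives a normal
    with the parameter uncertainty added to the residual variance."""
    return mu, sigma_sq + tau_sq

sigma_sq = 4.0                # known residual variance
mu0, tau0_sq = 0.0, 9.0       # prior g'(theta)

# Prior predictive, eq. (1.1): no sample observed yet.
m1, v1 = predictive(mu0, tau0_sq, sigma_sq)

# Observe a sample, revise theta via (1.2), recompute via (1.3).
y = [1.8, 2.4, 2.1, 1.9]
mu_n, tau_n_sq = posterior(mu0, tau0_sq, sigma_sq, y)
m2, v2 = predictive(mu_n, tau_n_sq, sigma_sq)

print(f"prior predictive:     mean {m1:.3f}, variance {v1:.3f}")
print(f"posterior predictive: mean {m2:.3f}, variance {v2:.3f}")
```

Note how the predictive variance in each step is the residual variance plus the current variance of θ; the sample shrinks the latter, so the posterior predictive distribution is tighter than the prior predictive one.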
I.3.2 The Posterior Distribution

There are three inputs to Bayes' Rule: (1) the decision maker's
prior information about θ expressed via g'(θ); (2) sample observations
from the data-generating process; and (3) the choice of the functional
form of the data-generating process, i.e., the choice of a likelihood
function. The output of Bayes' Rule is an inferential statement about
θ in the form of a probability distribution, f"(θ|y). A decision maker
interested in obtaining information about a parameter of the
data-generating process should compute f"(θ|y). The function f"(θ|y)
can stand alone as an inferential statement about θ, or it can be used
to determine point and interval estimates of θ. As more sample and/or
subjective information about the data-generating process becomes
available, Bayes' Rule can be reapplied to revise f"(θ|y). The
sequential application of Bayes' Rule permits the decision maker to
formally learn about θ over time.
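This sequential learning can be illustrated with another small conjugate sketch; the model (a Bernoulli data-generating process with a Beta prior on its success probability θ) and the data are assumptions made purely for illustration, not a model used in the dissertation.

```python
# Sequential application of Bayes' Rule, sketched with a conjugate Beta
# prior on the success probability theta of a Bernoulli process.
# Model and numbers are illustrative only.

def update(alpha, beta, y):
    """One application of Bayes' Rule: a Beta(alpha, beta) prior plus a
    Bernoulli observation y in {0, 1} yields a Beta posterior."""
    return alpha + y, beta + (1 - y)

alpha, beta = 1.0, 1.0        # uniform prior g'(theta)
sample = [1, 0, 1, 1, 0, 1]

# Revise one observation at a time: each posterior becomes the prior
# for the next observation.
for y in sample:
    alpha, beta = update(alpha, beta, y)

# The same posterior results from a single batch update, so sequential
# and all-at-once revision agree here.
a_batch = 1.0 + sum(sample)
b_batch = 1.0 + len(sample) - sum(sample)
assert (alpha, beta) == (a_batch, b_batch)

print(f"posterior mean of theta: {alpha / (alpha + beta):.3f}")
```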
I.4 Chapter Outline and Preview of Results

Typically econometric forecasting and control models are developed
and used without formally considering the full impact of model
specification uncertainty. The usual procedure is to (1) utilize a
model selection technique to choose one model from a set of alternative
competing models to characterize the data-generating process, and
(2) assume the chosen model to be the correct model of the
data-generating process and use it to forecast and/or control the
process. Such procedures either ignore or do not fully consider the
information about the data-generating process contributed by the models
that were proposed as being viable but were not selected by the model
selection procedure. Further, in assuming the chosen model is the
correct model of the process, the forecaster or controller is behaving
as though he faces a lesser degree of uncertainty than is really the
case. Thus, in utilizing model selection procedures, forecasters and
controllers are simultaneously discarding relevant information about
the data-generating process and behaving as if they have more
information than is actually possessed.
This dissertation advocates the use of the Roberts/Geisel Bayesian
Model Comparison procedure as a means of comprehensively treating model
specification uncertainty and avoiding such contradictory behavior.
The Bayesian Model Comparison procedure and its origins are described
in Chapter II. Chapter II also describes a Bayesian model selection
procedure referred to herein as the Bayesian Model Selection procedure.

In Chapter III, the effects of forecasting with and without regard
for model specification uncertainty are examined by comparing forecasts
determined via the Bayesian Model Comparison procedure (BMC) with those
yielded by a Bayesian procedure which fails to appropriately consider
model specification uncertainty, the Bayesian Model Selection procedure
(BMS). The following results are derived:
1. If the variance of the decision maker's predictive distribution
   is used to measure forecast risk, and a decision maker forecasts
   via the BMS procedure rather than the BMC procedure, the risk he
   takes in predicting future values of the data-generating process
   is misspecified.

2. A decision maker's posterior expected loss from using a forecast
   derived via the BMC procedure is less than his posterior expected
   loss from forecasting via the BMS procedure.

3. Point estimates derived via BMS are frequently misplaced.

4. The reliability of credible intervals derived via the BMS
   procedure may be misspecified.
In Chapter IV, the BMC procedure is applied to simple single-period
economic control problems. In particular, certainty-equivalent and
optimal analytic solutions are found for the case of two competing
linear models each with a different instrument (controllable variable)
and no intercept term. The following results are obtained:

1. The BMC certainty-equivalent control solution is to set both
   instruments as if each instrument's respective model were in
   fact the true model of the data-generating process.

2. If the variance of the controller's predictive distribution is
   used to measure control risk, and certainty-equivalent control
   is utilized, it can be shown that under certain circumstances
   the BMS approach to control always understates the control risk
   involved.

3. The optimal BMC control solution is to set both instruments as
   if each instrument's respective model were in fact the true
   model of the process. Since optimal BMC control treats model
   specification uncertainty, parameter uncertainty, and residual
   uncertainty, whereas certainty-equivalent control treats only
   model specification uncertainty, the optimal BMC control
   solution differs from the BMC certainty-equivalent control
   solution.

Certainty-equivalent and optimal BMC control solutions for cases where
instrument use costs are known are also derived in Chapter IV.
In Chapter V, a procedure for handling model nonstationarity is
introduced. Called Bayesian Model Switching, this procedure was
suggested by anomalies observed in sequences of posterior model
probabilities generated by the BMS and BMC procedures. The Bayesian
Model Switching procedure characterizes the data-generating process in
a manner similar to Quandt's switching regression regimes.1

Chapter VI contains an overview of the dissertation, a discussion
of the shortcomings of the Bayesian Model Comparison and Bayesian
Model Switching procedures, and suggestions for future work in the
area of model specification uncertainty.

1R. E. Quandt, "A New Approach to Estimating Switching Regressions,"
Journal of the American Statistical Association, 67 (March, 1972),
306-310.
CHAPTER II
HYPOTHESIS TESTING, BAYESIAN MODEL SELECTION, AND
BAYESIAN MODEL COMPARISON
The Bayesian Model Comparison approach to handling model
specification uncertainty in decision-making problems has its origins
in the hypothesis testing work of Harold Jeffreys1 and is a direct
spin-off of a Bayesian procedure developed by Harry Roberts2 for
combining expert opinions. Martin Geisel3 adapted Roberts' work for
use in econometrics and in so doing formalized the Bayesian Model
Comparison and Bayesian Model Selection procedures. The contributions
of Jeffreys, Roberts, and Geisel to the existing Bayesian Model
Comparison and Bayesian Model Selection procedures are discussed in
this chapter.
II.1 Harold Jeffreys: Hypothesis Testing4

In considering two mutually exclusive and exhaustive hypotheses
about the parameter vector θ of a probability density function,
Jeffreys suggests that the decision maker should place prior probability
masses on each of the hypotheses. The probabilities should be
consistent with the decision maker's prior information and,
consequently, prior beliefs about the appropriateness of each of the
hypotheses. Thus, if the two hypotheses H_0 and H_1 are exhaustive and
nonoverlapping their prior probabilities P'(H_0) and P'(H_1) would be
assessed, and must sum to one. If H_0 and H_1 are a priori equally
likely, P'(H_0) = P'(H_1). It is assumed that given H_0 a future sample
result y has probability density function f(y|H_0), and that given H_1
is true, y's probability density function is f(y|H_1). Then, using
Bayes' Rule, the posterior probability that, say, H_0 is the appropriate
hypothesis is

     P"(H_0|y) = P'(H_0)f(y|H_0) / [P'(H_0)f(y|H_0)
                 + P'(H_1)f(y|H_1)],                             (2.1)

and P"(H_1|y) = 1 - P"(H_0|y). After determining P"(H_0|y) and
P"(H_1|y), the decision maker can choose as the more appropriate
hypothesis the one with the higher posterior probability. Or, if the
decision maker can economically determine the losses involved from
choosing an incorrect hypothesis, he can use P"(H_0|y) and P"(H_1|y)
to determine the expected loss of choosing H_0 or H_1 and then select
as being the more appropriate the hypothesis that minimizes his
expected loss.

1Harold Jeffreys, Theory of Probability (London: Oxford University
Press, 1961), Chapters 4 and 5.

2Roberts, pp. 50-62.

3Geisel, pp. 1-45.

4Jeffreys, Chapters 4 and 5.
More formally, if H_0 is θ = θ_0 and H_1 is θ = θ_1, where θ_0 and
θ_1 are particular values of the parameter vector (i.e., two simple
hypotheses), then (2.1) would be

     P"(H_0|y) = P"(θ=θ_0|y) = P'(θ=θ_0)f(y|θ=θ_0) /
                 [P'(θ=θ_0)f(y|θ=θ_0) + P'(θ=θ_1)f(y|θ=θ_1)].    (2.2)

If H_0 is θ ∈ Ψ_1 and H_1 is θ ∈ Ψ_2, where Ψ_1 and Ψ_2
(Ψ_1 ∪ Ψ_2 = Θ) are mutually exclusive and exhaustive sets (i.e., H_0
and H_1 are two composite hypotheses), then it is necessary for the
decision maker to assess a prior pdf for θ over Ψ_1, P'(θ|θ∈Ψ_1), and
another for θ over Ψ_2, P'(θ|θ∈Ψ_2). Then (2.1) would be

     P"(H_0|y) = P"(θ∈Ψ_1|y) = P'(θ∈Ψ_1)f(y|θ∈Ψ_1) / f(y)        (2.3)

where

     f(y|θ∈Ψ_1) = ∫_{Ψ_1} P'(θ|θ∈Ψ_1)f(y|θ, θ∈Ψ_1)dθ

and

     f(y) = Σ_{i=1,2} P'(θ∈Ψ_i)f(y|θ∈Ψ_i).
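As a numerical illustration of (2.1) and (2.2), consider two simple hypotheses about the mean of a unit-variance normal process. The example and all of its numbers are invented for illustration; they are not drawn from Jeffreys or from the dissertation.

```python
import math

# Numerical sketch of (2.1)/(2.2): two simple hypotheses about the mean
# of a normal process with known unit variance. Values are invented.

def normal_pdf(y, mean):
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2 * math.pi)

def likelihood(sample, mean):
    """f(y|H): joint density of an i.i.d. sample under one simple
    hypothesis."""
    p = 1.0
    for y in sample:
        p *= normal_pdf(y, mean)
    return p

theta0, theta1 = 0.0, 1.0       # H0: theta = 0,  H1: theta = 1
prior0 = prior1 = 0.5           # a priori equally likely
sample = [0.9, 1.3, 0.7]

f0 = likelihood(sample, theta0)
f1 = likelihood(sample, theta1)

# Bayes' Rule, eq. (2.1): posterior probability that H0 is appropriate.
post0 = prior0 * f0 / (prior0 * f0 + prior1 * f1)
post1 = 1.0 - post0

print(f"posterior probabilities: H0 {post0:.3f}, H1 {post1:.3f}")
```

Since the sample mean is close to 1, the posterior mass shifts toward H_1; with a known loss for each kind of incorrect choice, these two probabilities would also yield the expected loss of choosing either hypothesis.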
The next section discusses Harry Roberts' important extension of
Jeffreys' work.
II.2 Harry V. Roberts: Comparing Forecasters1

Roberts was concerned with "reconciling conflicting expert
interpretations of the same data."2 Building on Jeffreys' work, Roberts
devised a method for discriminating among a set of alternative
parametric statistical models each of which purports to describe some
random process of interest. This Bayesian discrimination procedure
will be discussed in detail in the next section.

It will be assumed that person C knows nothing about a particular
data-generating process f(y|θ), but wishes, for example, to predict
future y values and is, therefore, interested in learning about the
process. Persons A and B possess knowledge about the same process.
A and B express their knowledge about f(y|θ) via the data distributions

1Roberts, pp. 50-62.

2Ibid., p. 55.
f(y|θ, A) and f(y|θ, B), respectively, and their prior distributions on
the parameter θ, g(θ|A) and g(θ|B). θ may be a vector. For expository
purposes, only two individuals will be assumed to possess knowledge
about the process, and all probability distributions of this section
will be assumed to be discrete.1

C's prior distribution for the parameter θ may be expressed as:

     g'(θ) = P'(A)g(θ|A) + P'(B)g(θ|B).                          (2.4)

P'(A) and P'(B) sum to one and may be thought of as C's probability
assessment of the accuracy of A's judgment and B's judgment,
respectively. If C had some knowledge about the reliability of opinions
expressed by A and B he might tend to respect the opinion of one, say A,
more than the other, and so assign P'(A) > P'(B). If C knew nothing
about either A or B it would be appropriate for him to assess
P'(A) = P'(B) = .5. C can then learn about f(y|θ) by combining his
thoughts (if any) about A and B (reflected in P'(A) and P'(B)) with the
opinions expressed by A and B about f(y|θ) (represented by g(θ|A),
f(y|θ, A), g(θ|B), and f(y|θ, B)) as in (2.4), and by using sample
information to revise (2.4). Thus it is C's posterior distribution of
θ, g"(θ|y), that C should use in predicting y. Roberts' development of
g"(θ|y) is outlined in the next paragraph.

1For another approach to the use of expert opinion see Peter
Morris, "Decision Analysis Expert Use," Management Science, 20 (May,
1974), 1233-41, and "Combining Expert Judgments: A Bayesian Approach,"
Management Science, 23 (March, 1977), 679-693.
Following Roberts, let λ index the opinions of A and B, i.e., when
λ = λ_A reference is being made to person A, and when λ = λ_B reference
is being made to person B. With C's prior distribution for λ denoted
by P'(λ), C's joint prior distribution for λ and θ is denoted:

     g'(θ,λ) = P'(λ)g(θ|λ).                                      (2.5)

Accordingly, C's marginal prior distribution for θ is denoted by

     g'(θ) = Σ_λ P'(λ)g(θ|λ).                                    (2.6)

Equations (2.6) and (2.4) are equivalent. C's joint posterior
distribution of λ and θ is obtained via Bayes' Rule as follows:

     h"(λ,θ|y) = h'(λ,θ)f(y|λ,θ) / Σ_λ Σ_θ h'(λ,θ)f(y|λ,θ)
               = P'(λ)g(θ|λ)f(y|λ,θ) / f(y).                     (2.7)

f(y|λ,θ) represents the likelihood of observing the sample result y
given particular values for λ and θ. f(y) is the marginal distribution
of the data. Then, recognizing that g(θ|λ)f(y|λ,θ) = f(y|λ)g(θ|y,λ),
g"(θ|y) is obtained from (2.7) as follows:

     g"(θ|y) = Σ_λ P'(λ)g(θ|λ)f(y|λ,θ) / f(y)
             = Σ_λ [P'(λ)f(y|λ) / f(y)] g(θ|λ,y)                 (2.8)
             = Σ_λ P(λ|y)g(θ|λ,y).

Thus C's posterior distribution of θ is a weighted average of A's
posterior distribution of θ and B's posterior distribution of θ.
Roberts points out that if (2.7) is summed over θ instead of λ,
as was done in (2.8), the marginal posterior distribution of λ is
obtained:

     P(λ|y) = P'(λ)f(y|λ) / f(y).                                (2.9)

Roberts notes that in statistical discrimination problems where it is
assumed that y is generated by either f(y|λ_A) or f(y|λ_B), P'(λ_i)
(i=A,B) may be interpreted as the discriminator's prior probability
that f(y|λ_i) generates y, and P(λ_i|y) may be interpreted as the
discriminator's posterior probability that f(y|λ_i) generates y.
Roberts suggests that discrimination between these two alternative
generating processes should be accomplished via examination of the
posterior odds ratio, P(λ_A|y)/P(λ_B|y).

Roberts' interpretation of P(λ_i|y), and his suggested procedure
for discriminating among alternative statistical models, were
formalized by Geisel in his Bayesian Model Selection and Bayesian Model
Comparison schemes. Geisel's extension of Roberts' work is discussed
in the next section.
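Roberts' scheme, eqs. (2.4)-(2.9), can be sketched with a small discrete example. Everything below — the three-point support of θ, the two experts' priors, and the Bernoulli likelihood — is invented for illustration and is not taken from Roberts' paper.

```python
# Discrete sketch of Roberts' scheme, eqs. (2.4)-(2.9): C mixes the
# opinions of experts A and B over a parameter theta, observes data,
# and obtains the posterior expert weights P(lambda|y) and the mixture
# posterior g''(theta|y). Priors and likelihood are invented.

thetas = [0.2, 0.5, 0.8]          # support of theta (a success rate)

# Each expert's prior g(theta|lambda) over the three values.
g = {"A": [0.7, 0.2, 0.1], "B": [0.1, 0.2, 0.7]}
prior_lam = {"A": 0.5, "B": 0.5}  # C knows nothing about A or B

# Sample: 7 successes in 10 Bernoulli trials. The binomial coefficient
# is omitted since it cancels in Bayes' Rule.
def lik(theta, s=7, n=10):
    return theta ** s * (1 - theta) ** (n - s)

# f(y|lambda): each expert's marginal likelihood for the sample.
f_y_lam = {lam: sum(p * lik(t) for p, t in zip(g[lam], thetas))
           for lam in g}
f_y = sum(prior_lam[lam] * f_y_lam[lam] for lam in g)     # f(y)

# Eq. (2.9): posterior weight on each expert.
post_lam = {lam: prior_lam[lam] * f_y_lam[lam] / f_y for lam in g}

# Eq. (2.8): C's posterior for theta is the weighted average of the
# experts' individual posteriors g(theta|lambda, y).
post_theta = [sum(post_lam[lam] * g[lam][i] * lik(thetas[i]) / f_y_lam[lam]
                  for lam in g)
              for i in range(len(thetas))]

print("posterior expert weights:", post_lam)
print("posterior odds, A vs. B:", post_lam["A"] / post_lam["B"])
print("g''(theta|y):", [round(p, 3) for p in post_theta])
```

Because the sample favors a high success rate, the posterior weight shifts toward expert B, who placed most prior mass on θ = 0.8; the posterior odds ratio P(λ_A|y)/P(λ_B|y) is the discrimination statistic Roberts suggests.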
II.3 Martin S. Geisel: Bayesian Model Comparison and Selection1

Geisel's work was concerned with Bayesian procedures for comparing
and choosing among parametric statistical models. His procedure for
comparing models will be referred to as the Bayesian Model Comparison
(BMC) approach. His procedure for choosing one model from among a set
of competing models uses the same methodology as the BMC approach but
for different purposes. Consequently, the latter procedure will be
referred to here as the Bayesian Model Selection (BMS) procedure.

1Geisel, pp. 1-45.
Suppose the decision maker feels that any one of N alternative
models could represent the data-generating process of interest to him.
Denote by P'(Mi), i = 1,2,...,N, the decision-maker's prior probability
that Mi, the ith model, is an accurate representation of the
data-generating process. If the decision maker assesses P'(Mi) > 0,
then the model should be included in the set of N models. It follows
that Σ_{i=1}^{N} P'(Mi) = 1. The unknown vector of parameters of Mi is
denoted by θi, i = 1,...,N, where θi ∈ Θi. The decision maker's
knowledge about θi is described via a prior density function, g'(θi|Mi).
If Mi were known to be the true model and its parameters were
known to be, say, θi, the data-generating process could be completely
characterized by the density function f(y|θi,Mi,Di), where y is the
random variable of interest to the decision maker.1 In the forecasting
problems of Chapter III, Di, which may be a vector, will be the
explanatory variables of Mi and will be used to help forecast future
values of y. In the economic control problems of Chapter IV, Di will
be the independent variables of Mi and will be under the control of the
decision maker. Once y has been observed, f(y|θi,Mi,Di), viewed as
a function of θi, Mi and Di, is a likelihood function and can be used
to make inferences from the data about the correct model and about the
parameters of all the models.

1y may be vector-valued, but in order to simplify the notation
and discussion to follow it is assumed that y is a scalar.
As new information is received about the data-generating process
being modeled, i.e., as y is observed, the prior distribution on the
parameters of Mi should be revised to reflect this new information.
Revision for a single model is accomplished exactly as if the parameter
distribution of a known data-generating process with unknown
parameters were being revised. Applying Bayes' Rule yields

g"(θi|Mi,y,Di) = g'(θi|Mi)f(y|θi,Mi,Di)/f(y|Mi,Di)    (2.10)

where

f(y|Mi,Di) = ∫_{Θi} g'(θi|Mi)f(y|θi,Mi,Di) dθi.    (2.11)
The function g"(ei Mi,y,Di) is the posterior distribution of ei.
Given Mi and Di, and before observing y, f(yNMi,Di) is commonly
called the predictive density function of y. It is the distribution
of future realizations of the datagenerating process conditioned on
Mi being the correct model of the process and unconditioned on ei,
the parameters of Mi. Having observed y, f(yMiN,Di) may be thought
of as a "model likelihood" since it compares the relative likelihood
of the data, y, across models. Utilizing these model likelihood
Bayes' Rule is invoked a second time to revise the prior model
probabilities:
P"(Mily,D) = P'(Mi)f(ylMi,Di)/f(yID) (2.12)
where
N
f(y(D) = P'(Mi)f(yIMi,Di). (2.13)
i=l
P"(Mi y,D)is the posterior probability that Mi is the correct model.
f(ylD) is a predictive distribution, a distribution of future
realizations of the data unconditioned on a particular model being
the correct model. D, written without a subscript, is a vector
comprised of the set of decision variables, Di, i=1,2,...,N from all
N models.
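As a concrete sketch of (2.11)-(2.13), the following assumes two normal models with known residual variance and a conjugate normal prior on each model's mean, so that the model likelihood f(y|Mi,Di) of (2.11) has the closed form N(mi, σ² + τi²); all numerical values are hypothetical.

```python
import math

# An illustrative sketch of (2.12)-(2.13). Two normal models for y with
# known residual variance 1.0 and a conjugate N(m_i, tau_i^2) prior on
# each model's mean, so the model likelihood f(y|M_i) of (2.11) is
# itself normal: N(m_i, 1 + tau_i^2). All numbers are hypothetical.

def normal_pdf(y, mu, var):
    return math.exp(-0.5 * (y - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

models = {  # prior mean m_i and prior variance tau_i^2 of each model's mean
    "M1": {"m": 0.0, "tau2": 1.0},
    "M2": {"m": 3.0, "tau2": 1.0},
}
prior_probs = {"M1": 0.5, "M2": 0.5}   # P'(M_i)

def posterior_model_probs(y, prior_probs, models, resid_var=1.0):
    # f(y|M_i): predictive density of y under model i, equation (2.11)
    model_lik = {k: normal_pdf(y, v["m"], resid_var + v["tau2"])
                 for k, v in models.items()}
    f_y = sum(prior_probs[k] * model_lik[k] for k in models)   # (2.13)
    return {k: prior_probs[k] * model_lik[k] / f_y for k in models}  # (2.12)

post = posterior_model_probs(0.2, prior_probs, models)
print(post)
```

With the observation y = 0.2 lying near M1's prior mean, the posterior probability shifts toward M1, as (2.12) requires.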
After observing y and revising the prior distributions on Mi
and θi, the posterior probability distributions reflect all the
information the decision maker has about the set of models and their
parameters. Any prior information is reflected in the prior
distributions, P'(Mi) and g'(θi|Mi). The sample evidence, y, is
incorporated through the likelihood function, f(y|θi,Mi,Di). As
additional information in the form of further observations of y is
obtained, it may be reflected in new posterior distributions that are
obtainable via revision of the existing posteriors (which, relative to
the latest data, are called priors) derived in (2.10) and (2.12) above.
As long as the data-generating process does not change over time,
the application of (2.10) and (2.12) to successive sets of new data
permits the decision maker to "learn from experience" about which
model of the process is the most appropriate. When the data may be
generated by different models in different time periods, successive
application of the probability revision procedures in this section
would be inappropriate. This problem and an approach to handling it
are discussed in Chapter V.
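The "learning from experience" that successive application of (2.12) permits can be sketched as follows. For simplicity the parameters of both models are treated as known (hypothetical values), so only the model probabilities are revised, with each posterior serving as the prior for the next observation.

```python
import math

# A minimal sketch of learning from experience via repeated application
# of (2.12). The parameters of both models are taken as known
# (hypothetical values), so only the model probabilities are revised.

def normal_pdf(y, mu, sigma=1.0):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

means = {"M1": 0.0, "M2": 2.0}     # assumed known parameters of each model
probs = {"M1": 0.5, "M2": 0.5}     # prior model probabilities P'(M_i)

data = [0.1, -0.4, 0.3, 0.2, -0.1]  # hypothetical observations, near M1's mean

for y in data:
    joint = {m: probs[m] * normal_pdf(y, means[m]) for m in probs}
    f_y = sum(joint.values())
    probs = {m: joint[m] / f_y for m in probs}  # posterior becomes new prior

print(probs)  # P"(M1|data) comes to dominate
```

After only five observations near M1's mean, nearly all posterior probability concentrates on M1, illustrating how the data eventually identify the generating model when the process is stable over time.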
The above procedure can be used by a decision maker who is uncertain
about the appropriate form of a random process to select a single model
to represent that process. He can accomplish this by
choosing from his original set of N competing models the one with the
highest posterior probability or, if losses associated with choosing
the incorrect model are known or can be estimated, by selecting the
model that minimizes his posterior expected loss. The use of posterior
model probabilities for model selection is the procedure referred to
in this dissertation as Bayesian Model Selection (BMS).
The decision-maker's posterior model probabilities indicate that
he is uncertain of the form of the random process. Thus, any decision
procedure based on a chosen model fails to appropriately treat model
specification uncertainty. Geisel points out that if the posterior
probability of a model is positive, then the model contributes to our
knowledge of future observations of the random process of interest and
there is no theoretical reason to neglect this contribution. Hence,
any decision procedure that involves selecting a single model from
among a set of competing models ignores relevant information, and,
computation costs and other complexities aside, can only be viewed
as an approximation to an optimal procedure.
The key to utilizing all the information contained in the set
of competing models relative to future observations of the random
process lies in the use of the predictive density function derived
in (2.13) above and repeated here in more detail:
1See A. M. Faden and G. C. Rausser, "Econometric Policy Model
Construction: The Post-Bayesian Approach," Annals of Economic and
Social Measurement, 5 (1976), 349-362.
f(y|D) = Σ_{i=1}^{N} P'(Mi)[∫_{Θi} f(y|θi,Mi,Di)g'(θi|Mi) dθi]    (2.14)

       = Σ_{i=1}^{N} P'(Mi)f(y|Mi,Di).
f(y|D) is a weighted average of the predictive densities of y for each
of the N models (referred to below as model predictives). It is this
distribution that the decision maker should use to characterize the
random process upon which his decision hinges and about whose form he
is uncertain. This distribution will herein be referred to as a
Bayesian Mixed Model Predictive (BMMP) distribution. The process of
computing and analyzing posterior probabilities and the associated BMMP
distribution is called "comparing models" by Geisel and is referred to
herein as the Bayesian Model Comparison (BMC) procedure.
Suppose that y has been observed and that the decision maker is
interested in making a decision that relates to some future value, yF,
of the random variable. If the decision maker knew the correct model,
say Mi, and its parameters, say θi, then his distribution of yF would
be f(yF|θi,Mi,DFi) and his decision would depend on this distribution.
DFi is used to denote values of the decision variables of model i
associated with yF. But the decision maker knows neither the correct
model nor its parameters. What he does know is summarized in
P"(Mi|y,D) and f(yF|Mi,y,Di). Thus, his distribution of yF should be
a BMMP conditioned on the data already observed, y and D:

f(yF|y,D,DF) = Σ_{i=1}^{N} P"(Mi|y,D)[∫_{Θi} f(yF|θi,Mi,DFi)g"(θi|Mi,y,Di) dθi]    (2.15)

             = Σ_{i=1}^{N} P"(Mi|y,D)f(yF|Mi,y,Di,DFi).

This BMMP is a function of all N competing models and thus enables the
decision maker to choose a course of action in light of all available
information relating to yF.
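A minimal numerical sketch of the BMMP in (2.15): the predictive density of yF is a posterior-probability-weighted mixture of the model predictives. Here the component predictives are simply assumed to be normal with hypothetical means and variances; in practice each would come from integrating out θi as in (2.15).

```python
import math

# Sketch of the BMMP of (2.15) as a two-component normal mixture.
# The posterior model probabilities and the component predictives'
# means/variances are hypothetical stand-ins for quantities that would
# be computed from data.

def normal_pdf(y, mu, var):
    return math.exp(-0.5 * (y - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

post_probs = [0.7, 0.3]                  # P"(M_i|y,D), assumed already computed
predictives = [(1.0, 2.0), (4.0, 1.0)]   # (mean, variance) of f(y_F|M_i,y,D_i,D_Fi)

def bmmp_density(y_F):
    return sum(p * normal_pdf(y_F, mu, var)
               for p, (mu, var) in zip(post_probs, predictives))

# The BMMP is a proper density (integrates to one), though it is
# generally skewed or bimodal rather than any single model's predictive.
dy = 0.01
grid = [i * dy for i in range(-1000, 1000)]
approx_mass = sum(bmmp_density(y) for y in grid) * dy
print(approx_mass)
```

The grid integration confirms the mixture has total mass one, so the BMMP can be used directly for expected-loss calculations over yF.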
Even when the BMMP is the distribution (model) that the decision
maker should use to characterize the random process in question, there
can be at least three reasons for selecting a single model via the
Bayesian Model Selection procedure:
1) In comparing alternative theories or hypotheses
it may be desirable to choose the one with the
most substantive content.
2) It may be more convenient to approximate the
random process with a simple model.
3) The use of a BMMP may prove too costly. In general,
the computation of a BMMP involves the combination
of its components via extensive numerical methods.
Geisel shows that under certain assumptions, Bayesian Model
Selection provides a Bayesian interpretation for the classical
procedure of choosing from a set of models the one with the lowest
estimated residual variance, s², or highest coefficient of
determination, R². Given a set of normal regression models each of which
has the same number of parameters, given diffuse prior distributions
over the models and the parameters of the models, and given a symmetric
loss function with respect to the choice of an incorrect model, Geisel
shows that the procedure of choosing a model with the highest posterior
model probability, P"(Mi|y,D), is equivalent to the procedure of
selecting the model with the lowest s² or highest R².1 This result is
very similar to a result derived by Thornber.2 Thornber, however,
uses as priors on the parameters of the models those suggested by
Jeffreys' invariance theory,3 whereas Geisel's priors on the parameters
of the models take the form of multinormal and inverted gamma
distributions. These results will be discussed in more detail in
Chapter III.
Another important Geisel result that will be drawn upon is his
proof that given, say, M1 is the true model in the set of N competing
models, then as sample evidence accumulates (i.e., as n → ∞),
P"(M1|y,D) → 1 and the BMMP → f(yF|M1,y,D1).4 Thus, if the decision
maker could wait long enough, the data he would observe would tell him
with near certainty which of the N models was generating the data.
This result will be discussed in more detail in Chapter III.
1Ibid., pp. 24-37.

2E. H. Thornber, "Applications of Decision Theory to Econometrics"
(Ph.D. dissertation, University of Chicago, 1966), Chapter 2.

3For discussion of Jeffreys' invariance theory see Arnold Zellner,
An Introduction to Bayesian Inference in Econometrics (New York:
Wiley, 1971), pp. 41-53.

4Geisel, p. 23.
In the next chapter some of the consequences of forecasting with
and without the use of the Bayesian Model Comparison procedure are
explored. Particular attention is paid to the comparison of the
Bayesian Model Comparison procedure and the Bayesian Model Selection
procedure.
CHAPTER III
FORECASTING WITH AND WITHOUT REGARD
FOR MODEL SPECIFICATION UNCERTAINTY
If a decision maker is uncertain as to which one of N random
processes is generating future values of a random variable upon which
the effectiveness of his current decision depends, Geisel contends1
that the decision maker should use the Bayesian Mixed Model Predictive
(BMMP) distribution of the Bayesian Model Comparison (BMC) procedure
to reflect the information he has concerning the process of interest.
His justification for this approach rests primarily on the following
statement:2
Note again that this procedure does not select one
model as "true" or "best" and eliminate the rest.
If the probabilistic weight of a model is positive
it contributes to our knowledge of the future
observations and there is no reason to neglect this
contribution. Thus, any decision theoretic procedure
which is designed to eliminate some of the
models is viewed as an approximation which is used
for reasons of simplicity of view or to reduce the
cost of computation.
This chapter explores some of the consequences of forecasting with and
without the use of the Bayesian Model Comparison procedure and, in so
doing, attempts to justify more rigorously the advocacy of the BMC
procedure for use in decision-making problems in which model
specification uncertainty is present. The chapter attempts to explain why
1Geisel, Chapter II.

2Ibid., p. 19.
it would frequently be worth the extra cost to use the BMC approach
rather than approaches which, though perhaps simpler and less costly,
fail to fully reflect model specification uncertainty and the totality
of information the decision maker has concerning the process of
interest.
In this chapter, forecasting via the Bayesian Model Comparison
procedure will be compared to forecasting via the Bayesian Model
Selection procedure and the maximize-R² rule. It is shown that when
model specification uncertainty exists, of these three procedures
only the BMC procedure optimally handles the information the decision
maker has concerning the data-generating process whose future values
he wants to forecast. More specifically, it is shown that if a decision
maker forecasts via the BMS procedure, the risk he takes in predicting
future values of the random process of interest is misspecified. It is
also shown that the decision-maker's posterior expected loss from using
a BMC forecast is less than his posterior expected loss from using a
BMS forecast. The last two sections of this chapter compare the
effectiveness of point and interval forecasts generated via the BMC
procedure with those generated via the BMS procedure. It is shown that
BMS point estimates are typically misplaced and that the reliability of
BMS credible intervals may be misspecified.
The following section introduces notation which will be used in the
remainder of the chapter and examines the relationship between the
predictive variance of y as defined by a BMMP distribution and the
predictive variance of y as defined by the model selected by the BMS
procedure.
III.1 A Comparison of the Predictive Variances Generated by the
Bayesian Mixed Model Distribution and the Bayesian Model
Selection Procedure

Much of the analysis in Section III.2 draws on the relative sizes
of the predictive variance of y as defined by a Bayesian Mixed Model
Predictive distribution, V(BMMP), and the predictive variance of y as
defined by a Bayesian Model Selection Predictive distribution, V(BMSP).1
Accordingly, to avoid awkward digressions in Section III.2, this section
will be devoted to a comparison of V(BMMP) and V(BMSP).
It was shown in equations (2.14) and (2.15) that the BMMP is a
weighted average of the predictive densities of y for each of N
alternative models. Equation (2.15) is repeated here:

f(yF|y,D,DF) = Σ_{i=1}^{N} P"(Mi|y,D)[∫_{Θi} f(yF|θi,Mi,DFi)g"(θi|Mi,y,Di) dθi]

             = Σ_{i=1}^{N} P"(Mi|y,D)f(yF|Mi,y,Di,DFi).    (3.1)

The function f(yF|Mi,y,Di,DFi) will be referred to as a "model
predictive." Recalling equation (2.11), a model predictive is a
distribution of realizations from the data-generating process
conditioned on 1) Mi being the correct model of the process; 2) previous
observations of y, the dependent variable of interest, and Di, the
decision variable; and 3) DFi, the value of the decision variable with
which the next y to be observed, yF, is associated. Thus, if the
Bayesian Model Selection procedure chooses, say, Mi, it is Mi's
predictive distribution, f(yF|Mi,y,Di,DFi), that is being chosen to
characterize future observations of y, yF. It is the variance of this
predictive distribution that is referred to as V(BMSP). In general, the
mean and variance of the predictive distribution generated by Mi will be
denoted by μi and σi², respectively. The mean and variance of a BMMP
will be denoted by μ and σ² (or V(BMMP)), respectively.

1V(BMMP) and V(BMSP) are formally defined below.
It is shown below that

μ = P"(M1|y,D)μ1 + ... + P"(MN|y,D)μN    (3.2)

and

σ² = P"(M1|y,D)[σ1² + (μ1 − μ)²] + ...
   + P"(MN|y,D)[σN² + (μN − μ)²].    (3.3)

To demonstrate, first note that μ can be obtained by definition as

μ = ∫ yF f(yF|y,D,DF) dyF.    (3.4)

Substituting (3.1) for f(yF|y,D,DF) in (3.4) yields

μ = ∫ yF [Σ_{i=1}^{N} P"(Mi|y,D)f(yF|Mi,y,Di,DFi)] dyF.    (3.5)

With the expansion of the sum in equation (3.5), (3.2) is obtained:

μ = ∫ yF [P"(M1|y,D)f(yF|M1,y,D1,DF1)
    + ... + P"(MN|y,D)f(yF|MN,y,DN,DFN)] dyF

  = P"(M1|y,D) ∫ yF f(yF|M1,y,D1,DF1) dyF
    + ... + P"(MN|y,D) ∫ yF f(yF|MN,y,DN,DFN) dyF

  = P"(M1|y,D)μ1 + ... + P"(MN|y,D)μN.    (3.6)
To obtain an expression for the predictive variance of the BMMP,
note that by definition

V(BMMP) = σ² = ∫ (yF − μ)² f(yF|y,D,DF) dyF.    (3.7)

Substituting (3.1) for f(yF|y,D,DF) in equation (3.7) yields

σ² = ∫ (yF − μ)² Σ_{i=1}^{N} P"(Mi|y,D)f(yF|Mi,y,Di,DFi) dyF.    (3.8)

The following is obtained by expanding the sum in equation (3.8):

σ² = Σ_{i=1}^{N} P"(Mi|y,D) ∫ (yF − μ)² f(yF|Mi,y,Di,DFi) dyF

   = Σ_{i=1}^{N} P"(Mi|y,D) ∫ (yF² − 2μyF + μ²)f(yF|Mi,y,Di,DFi) dyF.

Working with the ith term of this sum, the following is obtained:

P"(Mi|y,D){Ei(yF²) − 2μEi(yF) + μ²}.    (3.9)

Noting that Ei(yF²) = σi² + [Ei(yF)]² and Ei(yF) = μi, (3.9) becomes

P"(Mi|y,D){σi² + μi² − 2μμi + μ²}.    (3.10)

The three right-hand terms inside the braces of (3.10) may be
factored, yielding

P"(Mi|y,D){σi² + (μi − μ)²}.

Thus, σ² may be written as follows:

σ² = Σ_{i=1}^{N} P"(Mi|y,D){σi² + (μi − μ)²}.    (3.11)

This is the same as equation (3.3). Defining P"(Mi) = P"(Mi|y,D),
(3.11) can be rewritten as follows:

σ² = Σ_{i=1}^{N} P"(Mi)σi² + Σ_{i=1}^{N} P"(Mi)(μi − μ)².    (3.12)
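Equations (3.2) and (3.3) can be checked numerically. The sketch below uses hypothetical component means, variances, and posterior probabilities, computes μ and σ² from the formulas, and cross-checks them against direct numerical integration of a normal-mixture BMMP.

```python
import math

# Numerical check of (3.2) and (3.3)/(3.12) with hypothetical values:
# the BMMP mean is the weighted mean of the model means, and the BMMP
# variance is the weighted within-model variance plus the weighted
# squared dispersion of the model means about mu.

P = [0.6, 0.4]          # P"(M_i|y,D)
mus = [0.0, 3.0]        # mu_i
sig2 = [1.0, 2.0]       # sigma_i^2

mu = sum(p * m for p, m in zip(P, mus))                               # (3.2)
var = sum(p * (s + (m - mu) ** 2) for p, m, s in zip(P, mus, sig2))   # (3.3)

# Cross-check against direct integration of the normal mixture.
def normal_pdf(y, m, v):
    return math.exp(-0.5 * (y - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)

def mix(y):
    return sum(p * normal_pdf(y, m, v) for p, m, v in zip(P, mus, sig2))

dy = 0.001
grid = [i * dy for i in range(-15000, 20000)]
mu_num = sum(y * mix(y) for y in grid) * dy
var_num = sum((y - mu_num) ** 2 * mix(y) for y in grid) * dy
print(mu, var, mu_num, var_num)
```

Both routes agree: the formula values (μ = 1.2, σ² = 3.56 for these inputs) match the integrated moments of the mixture to within the grid's discretization error.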
Having defined V(BMMP) and V(BMSP), it is now possible to compare
their magnitudes. Assuming, as will be done for the remainder of this
dissertation unless otherwise noted, that the decision-maker's model
space contains only two models, M1 and M2,1 the relative magnitudes of
V(BMMP) and V(BMSP) will be examined for each of the following cases:

CASE I: σ1² = σ2²

CASE II: σ1² < σ2² and BMS chooses M1

CASE III: σ1² < σ2² and BMS chooses M2.

For convenience, P"(Mi) will be used in place of P"(Mi|y,D) in the
discussion and proofs of these cases and the lemmas that follow.
THEOREM 1: If σ1² = σ2², then V(BMMP) > V(BMSP).

PROOF: When N = 2,

σ² = P"(M1)σ1² + P"(M2)σ2² + P"(M1)(μ1 − μ)² + P"(M2)(μ2 − μ)²,

and when σ1² = σ2², V(BMSP) = σ1² = σ2².

1This assumption is made in order to simplify the analysis which
follows. For a more precise explanation of this assumption, see
Section III.2.4.

Since by definition 0 < P"(M1), P"(M2) < 1 and P"(M1) + P"(M2) = 1,
it follows trivially that when σ1² = σ2²,

P"(M1)σ1² + P"(M2)σ2² = σ1² = σ2².

Thus, if

P"(M1)(μ1 − μ)² + P"(M2)(μ2 − μ)² ≥ 0,

V(BMMP) ≥ V(BMSP). Since (μ1 − μ)² and (μ2 − μ)² are nonnegative,

P"(M1)(μ1 − μ)² + P"(M2)(μ2 − μ)² ≥ 0

and V(BMMP) ≥ V(BMSP). Unless μ1 = μ2, in which case μ1 = μ2 = μ,
P"(M1)(μ1 − μ)² + P"(M2)(μ2 − μ)² is strictly greater than zero and
V(BMMP) is strictly greater than V(BMSP).1
THEOREM 2: If σ1² < σ2² and BMS chooses M1, then V(BMMP) ≥ V(BMSP).

PROOF: Refer to the proof of Theorem 1. Since σ1² < σ2²,

P"(M1)σ1² + P"(M2)σ2² ≥ σ1².

From the proof of Case I,

P"(M1)(μ1 − μ)² + P"(M2)(μ2 − μ)² ≥ 0.

Thus, it follows that

P"(M1)σ1² + P"(M2)σ2² + P"(M1)(μ1 − μ)² + P"(M2)(μ2 − μ)² ≥ σ1²,

i.e., V(BMMP) ≥ V(BMSP). However, V(BMMP) equals V(BMSP) only if
P"(M1) = 1. But, if P"(M1) = 1, there exists no model specification
uncertainty. Thus, when model specification uncertainty exists,
V(BMMP) is strictly greater than V(BMSP).

1This dissertation is not concerned with special cases in which
μ1 = μ2 and σ1² = σ2².
THEOREM 3: If σ1² < σ2² and BMS chooses M2, then V(BMMP) may be less
than, equal to, or greater than V(BMSP).

PROOF: Refer to Theorem 1. Whenever P"(M2) ≠ 1,

P"(M1)σ1² + P"(M2)σ2² < σ2².

Therefore,

σ² = P"(M1)σ1² + P"(M2)σ2² + P"(M1)(μ1 − μ)² + P"(M2)(μ2 − μ)² ⋛ σ2²,

depending on the size of P"(M1)(μ1 − μ)² + P"(M2)(μ2 − μ)².
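The three cases can be illustrated numerically (all values are hypothetical). The function below computes V(BMMP) from (3.12) for two models; V(BMSP) is simply the selected model's predictive variance.

```python
# Illustrative check of Theorems 1-3 with hypothetical two-model values.
# v_bmmp implements (3.12) for N = 2; V(BMSP) is the predictive variance
# of whichever model BMS selects.

def v_bmmp(p1, mu1, mu2, s1, s2):
    p2 = 1.0 - p1
    mu = p1 * mu1 + p2 * mu2
    return p1 * (s1 + (mu1 - mu) ** 2) + p2 * (s2 + (mu2 - mu) ** 2)

# Case I: equal variances -> V(BMMP) > V(BMSP) whenever mu1 != mu2.
print(v_bmmp(0.6, 0.0, 2.0, 1.0, 1.0), "> 1.0")

# Case II: s1 < s2 and BMS picks M1 -> V(BMMP) > V(BMSP) = s1.
print(v_bmmp(0.6, 0.0, 2.0, 1.0, 3.0), "> 1.0")

# Case III: s1 < s2 and BMS picks M2 -> V(BMSP) = s2; either order occurs.
print(v_bmmp(0.9, 0.0, 0.1, 1.0, 3.0), "vs 3.0")  # close means: below s2
print(v_bmmp(0.5, 0.0, 6.0, 1.0, 3.0), "vs 3.0")  # distant means: above s2
```

The last two lines show the indeterminacy of Case III directly: with nearly equal means the mixture variance falls below σ2², while widely separated means push it above σ2².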
Perhaps the most important thing that Theorems 1, 2, and 3 reveal
is that if model specification uncertainty exists, V(BMMP) ≠ V(BMSP),
except for uninteresting cases. This fact will be referred to repeatedly
throughout Chapters III and IV. As will be seen in Section III.2,
the inability to order V(BMMP) and V(BMSP) in Case III poses no problem
with respect to comparing the relative merits of the BMC and BMS
procedures as aids to forecasting. It does, however, make it difficult
to identify whether the measure of forecast-risk provided the decision
maker by the BMS procedure (defined in Section III.2.6 to be V(BMSP))
understates or overstates the actual forecast-risk faced by the decision
maker. This problem is discussed in Section III.2.6. In Sections III.2.6
and IV.3.2, it is shown that situations exist in which only Case I
applies, so that Case III may never arise.
The following three lemmas and the discussion that follows them
are useful for helping to order V(BMMP) and V(BMSP) in situations
in which Case III applies. The first provides a necessary and
sufficient condition for V(BMMP) to be greater than V(BMSP).

LEMMA 1: If σ1² < σ2², μ1 ≠ μ2, and BMS chooses M2, then
V(BMMP) > V(BMSP) if and only if

P"(M2|y,D) > (σ2² − σ1²)/(μ1 − μ2)².
PROOF: 1. If V(BMMP) > V(BMSP), it must be shown that

P"(M2) > (σ2² − σ1²)/(μ1 − μ2)².

Since V(BMMP) = σ² = P"(M1)σ1² + P"(M2)σ2² + P"(M1)(μ1 − μ)²
+ P"(M2)(μ2 − μ)²

and V(BMSP) = σ2², V(BMMP) > V(BMSP) is the same as

P"(M1)σ1² + P"(M2)σ2² + P"(M1)(μ1 − μ)² + P"(M2)(μ2 − μ)² > σ2².    (3.13)

Subtracting P"(M1)σ1² + P"(M2)σ2² from both sides of (3.13) yields

P"(M1)(μ1 − μ)² + P"(M2)(μ2 − μ)² > σ2² − [P"(M1)σ1² + P"(M2)σ2²].    (3.14)

From (3.2) it is known that μ = P"(M1)μ1 + P"(M2)μ2. Let the rhs of
(3.14) equal R, and define P1 = P"(M1) and P2 = P"(M2). Then
substituting for μ in (3.14) yields

P1(μ1 − P1μ1 − P2μ2)² + P2(μ2 − P1μ1 − P2μ2)² > R.    (3.15)

Noting that P2 = 1 − P1, (3.15) can be written

P1(P2μ1 − P2μ2)² + P2(P1μ2 − P1μ1)² > R.    (3.16)

Factoring P2 out of the first term on the lhs of (3.16) and P1 out
of the second term on the lhs yields

P1P2²(μ1 − μ2)² + P2P1²(μ2 − μ1)² > R.    (3.17)

Noting that P1 + P2 = 1, and that (μ1 − μ2)² = (μ2 − μ1)²,
(3.17) becomes

P1P2(μ1 − μ2)² > σ2² − (P1σ1² + P2σ2²) = P1(σ2² − σ1²).    (3.18)

Dividing both sides of this inequality by P1(μ1 − μ2)² yields the
desired result:

P2 > (σ2² − σ1²)/(μ1 − μ2)².

2. If P2 > (σ2² − σ1²)/(μ1 − μ2)², then V(BMMP) > V(BMSP), i.e.,

P1σ1² + P2σ2² + P1(μ1 − μ)² + P2(μ2 − μ)² > σ2².

A reversal of the steps in the first half of the proof leads
immediately to this result.
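Lemma 1's threshold can be checked numerically with hypothetical values: with σ1² < σ2² and BMS choosing M2, V(BMMP) exceeds V(BMSP) = σ2² exactly when P"(M2) exceeds (σ2² − σ1²)/(μ1 − μ2)².

```python
# Numerical check of Lemma 1 (hypothetical values). v_bmmp implements
# (3.12) for two models as a function of P"(M2).

def v_bmmp(p2, mu1, mu2, s1, s2):
    p1 = 1.0 - p2
    mu = p1 * mu1 + p2 * mu2
    return p1 * (s1 + (mu1 - mu) ** 2) + p2 * (s2 + (mu2 - mu) ** 2)

mu1, mu2, s1, s2 = 0.0, 2.0, 1.0, 3.0
threshold = (s2 - s1) / (mu1 - mu2) ** 2   # Lemma 1's rhs; 0.5 here

for p2 in (0.3, 0.5, 0.7):
    exceeds = v_bmmp(p2, mu1, mu2, s1, s2) > s2
    print(p2, exceeds)
```

Below the threshold (P2 = 0.3) the BMMP variance falls short of σ2², at the threshold (P2 = 0.5) the two are exactly equal, and above it (P2 = 0.7) the strict inequality of the lemma holds.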
Lemma 1 can be combined with Theorem 2 to form a necessary and
sufficient condition for V(BMMP) to be greater than V(BMSP) when,
say, σ1² < σ2², regardless of which model BMS selects.

LEMMA 2: If σ1² < σ2², then V(BMMP) > V(BMSP) if and only if

a) Model 1 is selected by BMS,

or

b) Model 2 is selected by BMS, μ1 ≠ μ2, and

P"(M2|y,D) > (σ2² − σ1²)/(μ1 − μ2)².

PROOF: Lemma 2 results from combining Theorem 2 and Lemma 1, and
its proof follows directly from their proofs.
It is clear that V(BMMP) > V(BMSP) whenever condition a or b
of Lemma 2 is satisfied. Thus, upon examining condition b the
following can be said:

1. Other things equal, the greater the distance between the
means, μ1 and μ2, of the predictive distributions of the
two models in question, the smaller is the rhs of the
inequality of condition b, and the more likely it is
that V(BMMP) > V(BMSP).

2. Other things equal, the closer in size are the predictive
variances, σ1² and σ2², the smaller is the rhs of the
inequality of condition b, and the more likely it is
that condition b holds, i.e., the more likely it is
that V(BMMP) > V(BMSP).

Both these statements apply regardless of which model is chosen by
BMS, i.e., whether it be the model with the lower or higher
predictive variance.
As an example of how statements one and two might help determine
the relationship between V(BMMP) and V(BMSP), the following is
offered. Suppose the decision-maker's prior information about y
leads him to believe that the predictive variances of both models
are roughly equal, but that their predictive means differ
significantly. By Theorems 1 and 2 and statements one and two above,
the decision maker should consider it more likely that V(BMMP) exceeds
V(BMSP) than if he believed, say, that μ1 and μ2 were about the same
size. This follows since a) if σ1² in fact equals σ2², then by Theorem 1
V(BMMP) > V(BMSP); b) if, say, σ1² < σ2² and the BMS procedure chooses
M1, then Theorem 2 applies and V(BMMP) > V(BMSP); and c) if σ1² < σ2²
and the BMS procedure chooses M2, then Theorem 3 applies and the
decision-maker's prior information about σ1², σ2², μ1 and μ2, in concert
with statements one and two above, indicates that it is more likely that
V(BMMP) exceeds V(BMSP) than if, say, the decision maker thought μ1 and
μ2 were about the same size.
The next section utilizes the results of this section in comparing
the effectiveness of the BMC, BMS, and maximize-R² approaches to
forecasting.
III.2 Forecasting: Bayesian Model Comparison Versus
Bayesian Model Selection and the Maximize-R² Rule
Most forecasting procedures handle model specification uncertainty
suboptimally. Typically, a forecaster proposes a number of alternative
statistical models as possible candidates to represent the
data-generating process whose future value he is interested in
predicting and then, via some model screening procedure, eliminates all
but one model.1
1For a discussion of various classical and Bayesian model screening
procedures, see Kenneth M. Gaver and Martin S. Geisel, "Discriminating
among Alternative Models: Bayesian and Non-Bayesian Methods," Chapter
Two in Paul Zarembka (ed.), Frontiers in Econometrics (New York:
Academic Press, 1974), pp. 49-77.
In this section, forecasting as accomplished via two model
screening procedures, Bayesian Model Selection (BMS) and the classical
maximize-R² rule approach (max-R²), is compared to forecasting as
handled by a procedure that optimally considers model specification
uncertainty, the Bayesian Model Comparison approach (BMC). Before
proceeding with the comparison, a brief review of BMS, max-R², and
BMC is in order.
III.2.1 The Bayesian Model Selection Procedure (BMS)

Bayesian Model Selection was discussed in some detail in Chapter
II. Briefly, it requires the following:

1. The specification of a set of N alternative statistical
models, each of which purports to represent the data-generating
process of interest to the forecaster.

2. The assessment of a prior probability mass function over
the set of N models, P'(Mi), i=1,2,...,N.

3. The assessment of prior probability density functions over
the parameters of each model, g'(θi|Mi), i=1,2,...,N.

4. The specification of a likelihood function for each model,
f(y|θi,Mi,Di), i=1,2,...,N.1

5. The computation of posterior probabilities for the models,
P"(Mi|y,D), i=1,2,...,N.

The posterior model probabilities are often used to select one model
from among the set of N models to represent the data-generating

1When thought of as a function of y with θi, Mi, and Di given,
f(y|θi,Mi,Di) is model i.
process of interest to the forecaster. The usual procedure is to
select the model with the highest posterior model probability. In
the event that the forecaster can estimate the loss that results from
choosing an inappropriate model and can do this for each of the N
models, he can compute his expected loss from choosing each model
and select the model which yields the lowest expected loss.
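The expected-loss variant of selection can be sketched as follows, using hypothetical posterior probabilities and a hypothetical loss table L[i][j] (the loss from choosing model i when model j is true). Note that the minimum-expected-loss choice need not be the model with the highest posterior probability.

```python
# A sketch of selection by minimum posterior expected loss. Both the
# posterior probabilities and the loss table are hypothetical; L[i][j]
# is the loss from choosing model i when model j is true, L[i][i] = 0.

post = [0.5, 0.3, 0.2]         # P"(M_i|y,D) for three models
L = [
    [0.0, 9.0, 2.0],           # losses if M1 is chosen
    [1.0, 0.0, 6.0],           # losses if M2 is chosen
    [3.0, 1.0, 0.0],           # losses if M3 is chosen
]

expected_loss = [sum(L[i][j] * post[j] for j in range(3)) for i in range(3)]
best = min(range(3), key=lambda i: expected_loss[i])
print(expected_loss, "choose M%d" % (best + 1))
```

Here M1 has the highest posterior probability, yet its large loss when M2 is actually true makes M2 the minimum-expected-loss choice, illustrating why the two selection criteria can disagree.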
It should be noted that BMS may also be used for reasons other
than for the selection of a single model from among a set of N
models. For example, if N is large, BMS can be used to reduce the
number of models in the model space to a number that can be more
easily and inexpensively dealt with by a procedure such as BMC.
This can be accomplished by eliminating all models from
consideration whose posterior model probability is, say, less than some
α, 0 < α < 1. In this dissertation, however, BMS will be regarded as
a procedure for selecting a single model from among N alternative
models.
The forecaster who uses BMS essentially handles his forecasting
problem in a two-step sequence: first, a single model is chosen to
represent the datagenerating process; second, under the assumption
that the chosen model is in fact a "true" reflection of the data
generating process, the forecaster addresses his prediction problem.
1Actually, the decision-maker must be able to determine the
loss from choosing model i when model j is the true model, i ≠ j.
There are N(N − 1) such losses.
III.2.2 The Maximize-R² Rule

The maximize-R² rule1 is frequently used to choose one from among
a set of alternative competing linear statistical models whose
explanatory variables are nonrandom. The usual procedure is to estimate
the parameters of each of the alternative models, compute each model's
coefficient of determination, R², and then select as being the best
representation of the data-generating process the model with the
highest R². Forecasting is then carried out utilizing the chosen model
as if it were in fact the true model.

It is important to reiterate the well-known fact that R² is
inversely related to s², the estimate of the dependent variable's
residual variance. A maximize-R² rule is therefore equivalent to
a minimize-s² rule. In other words, the model with the maximum R²
is also the model with the minimum s².
Geisel2 and Thornber3 have shown that under certain conditions
model selection as accomplished via the max-R² rule is equivalent to
the Bayesian Model Selection procedure. The conditions are the
following:

1. The loss structure with respect to the selection of an
incorrect model is symmetric. That is, if the loss from
choosing Mi when Mj is true is represented by Lij, then
Lij = Lkℓ for all i,j,k,ℓ = 1,2,...,N, with i ≠ j and
k ≠ ℓ, and Lij, Lkℓ > 0.

1For a more detailed discussion of the max-R² rule see Gaver
and Geisel, pp. 52-53.

2Geisel, pp. 24-37.

3Thornber, Chapter 2.
2. P'(M1) = P'(M2) = ... = P'(MN), i.e., the prior model
probabilities are equal.

3. The statistical models in question are normal regression
models, each of which has the same number of parameters.
The parameters of each are its regression coefficients,
usually denoted by β's, and its residual variance, σi².
That each model has the same number of coefficients implies
that each model has the same number of independent (explanatory)
variables.

4. The prior density function for the parameters, βi and σi²,
is diffuse.

Geisel and Thornber used different forms for the diffuse prior
density function for the parameters βi and σi², but both showed that
selection of the model with the highest posterior probability is
equivalent to selection of the model with the lowest s². Since the
model with the lowest s² also has the highest R², Geisel and Thornber
have shown that selection of a model via the BMS procedure is
equivalent to selection via the maximize-R² rule.
Since a model's R² can be increased simply by adding more
"explanatory" variables to the model, a maximize-R̄² rule is frequently
used in place of the maximize-R² rule. R̄² is defined as follows:1

1See Gaver and Geisel, pp. 52-54.

R̄² = 1 − [(n − 1)/(n − k)](1 − R²),

where n is the sample size and k is the number of explanatory
variables. The addition of variables will increase the model's R̄²,
the adjusted coefficient of determination, if and only if the F
statistic for the hypothesis that the added variables' coefficients are
all zero is greater than one.1 Geisel showed that in the two-model case,
model selection via the BMS procedure can be made equivalent to
selection via the maximize-R̄² rule if the relationships between the
parameters of M1 and M2 are appropriately specified.2 The required
parameter relationships are, unfortunately, somewhat nonsensical. There
are no known intuitively meaningful sets of assumptions under which the
BMS procedure and the maximize-R̄² rule are equivalent.
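The R̄²/F-statistic relationship stated above can be verified with hypothetical sample numbers: adding one regressor raises R̄² exactly when its F statistic exceeds one. The sketch follows the text's convention that k counts the explanatory variables, so the residual degrees of freedom are n − k.

```python
# Hypothetical sample quantities (n, R^2 values, and parameter counts
# are invented for illustration).

def r_bar2(r2, n, k):
    # R-bar^2 = 1 - ((n - 1)/(n - k)) * (1 - R^2)
    return 1.0 - (n - 1.0) / (n - k) * (1.0 - r2)

n = 30
r2_small, k_small = 0.60, 3      # base model
r2_big, k_big = 0.62, 4          # base model plus one added regressor

# F statistic for the hypothesis that the added coefficient is zero
# (one restriction, residual df of the larger model in the denominator)
F = (r2_big - r2_small) / ((1.0 - r2_big) / (n - k_big))

improved = r_bar2(r2_big, n, k_big) > r_bar2(r2_small, n, k_small)
print(round(F, 3), improved)
```

With this R² gain the F statistic exceeds one and R̄² rises; shrinking the gain (e.g., R² = 0.606 for the larger model) drops F below one and R̄² falls, consistent with the if-and-only-if claim.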
In the remainder of this chapter the four conditions listed
above apply, unless noted otherwise. Thus, to avoid redundancy, the
maximize-R² rule will not be discussed directly in what follows
but will be addressed indirectly through comments about the equivalent
selection procedure, BMS. Since the BMS and maximize-R² procedures
are equivalent only in that they select the same model,
only comments concerning the fact that the BMS procedure actually
chooses a model, or comments about which model it chooses, also apply
to the maximize-R² procedure.
1John B. Edwards, "The Relationship Between the F-Test and R̄²,"
The American Statistician, 23 (December, 1969), p. 28.

2Geisel, pp. 41-45.
111.2.3 The Bayesian Model Comparison Procedure (BMC)
The Bayesian Model Comparison procedure was discussed in detail in
Chapter II. Briefly, it requires the following:
1. The specification of a set of N alternative statistical models, each of which purports to represent the data-generating process of interest to the forecaster.
2. The assessment of a prior probability mass function over the set of N models, P'(Mi), i=1,...,N.
3. The assessment of prior probability density functions over the parameters of each model, g'(θi|Mi), i=1,...,N.
4. The specification of a likelihood function for each model, f(y|θi,Mi,Di), i=1,...,N.
5. The computation of posterior probabilities for the models (referred to as model probabilities), P"(Mi|y,D), i=1,2,...,N.
6. The computation of the marginal distribution of future values of the data-generating process. (This distribution, as noted earlier, is a predictive distribution. It will be referred to herein as the Bayesian Mixed Model Predictive (BMMP).)
The first five requirements are the same as the five requirements of the Bayesian Model Selection procedure. It is the sixth requirement that distinguishes the Bayesian Model Comparison procedure from the Bayesian Model Selection procedure. Instead of choosing one of the N models, as does the BMS procedure, BMC models the data-generating process of interest with the BMMP distribution.
Recalling (2.14), the BMMP distribution is defined as follows:

    f(y|D) = Σ_{i=1}^{N} P'(Mi) [∫ f(y|θi,Mi,Di) g'(θi|Mi) dθi]    (3.19)

           = Σ_{i=1}^{N} P'(Mi) f(y|Mi,Di).    (3.20)

All the terms denoted in (3.19) and (3.20) were defined in Chapter II, and the distributions denoted in (3.19) and (3.20) were redefined in the six requirements above.
After observing realizations of the data-generating process in question, the BMMP takes the form presented in (2.15):

    f(yF|y,D,DF) = Σ_{i=1}^{N} P"(Mi|y,D) [∫ f(yF|θi,Mi,DFi) g"(θi|Mi,y,Di) dθi]    (3.21)

                 = Σ_{i=1}^{N} P"(Mi|y,D) f(yF|Mi,y,Di,DFi).    (3.22)

Recall that D = (D1,D2,...,DN)', where Di is a vector containing the values of model i's explanatory variables that correspond to the most recently observed y value. DF = (DF1,DF2,...,DFN)', where DFi is a vector containing the values of model i's explanatory variables at the time the next y value is to be generated. From (3.20) or (3.22), it can be seen that a BMMP distribution is a weighted average, or mixture, of each model's predictive density of yF, f(yF|Mi,y,Di,DFi).
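The mixture in (3.20) and (3.22) can be sketched numerically. The following is a minimal illustration, not part of the dissertation's analysis: the component predictives are taken to be Student distributions (as they are under this chapter's normal-regression assumptions), and all weights, degrees of freedom, means, and scales are assumed values chosen for the example.

```python
# Sketch of the BMMP mixture in (3.22): the predictive density is a
# probability-weighted average of the models' individual predictive densities.
from math import gamma, sqrt, pi

def student_t_pdf(x, df, loc, scale):
    """Density of a location-scale Student-t distribution."""
    z = (x - loc) / scale
    c = gamma((df + 1) / 2) / (gamma(df / 2) * sqrt(df * pi) * scale)
    return c * (1 + z * z / df) ** (-(df + 1) / 2)

def bmmp_pdf(x, weights, dfs, locs, scales):
    """f(yF|y,D,DF) = sum_i P"(Mi|y,D) * f(yF|Mi,y,Di,DFi), as in (3.22)."""
    return sum(w * student_t_pdf(x, df, m, s)
               for w, df, m, s in zip(weights, dfs, locs, scales))

# Two competing models M1, M2 with assumed posterior probabilities .7 and .3:
weights = [0.7, 0.3]
density = bmmp_pdf(1.0, weights, dfs=[10, 10], locs=[1.0, 3.0], scales=[0.5, 0.5])
```

Evaluating bmmp_pdf over a grid of yF values traces out the weighted average of the component densities, which is all that (3.22) asserts.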
The implications of parameter and residual uncertainty for prediction and decision making have been given considerable attention. See, for example, any of the following: Theil, Fisher, Brainard, Leland, Basu, Zellner, Barry and Horowitz, and/or Waud.1 As noted in Chapter II, the BMC procedure considers residual, parameter, and model specification uncertainty. Accordingly, if each model in the set of N competing models is viewed as a possible "parameter value" for the process of interest, the BMC procedure may be thought of as a means for extending the parametric analysis of prediction and decision-making problems to include consideration of the possibly widely differing predictive and decision-making implications of the competing models. Thus, just as a Bayesian can extend predictive analysis by explicitly allowing for parameter uncertainty instead of just using parameter estimates, the BMC procedure extends parametric analysis by explicitly considering model specification uncertainty.
A forecaster using the BMC procedure rather than, say, the BMS procedure does not have to unnaturally divide the forecasting problem into two parts. He does not have to first select a model from the set
1H. Theil, Economic Forecasts and Policy (Amsterdam: North-Holland, 1961). Walter D. Fisher, "Estimation in the Linear Decision Model," International Economic Review, 3 (January, 1962): 1-29. William Brainard, "Uncertainty and the Effectiveness of Policy," American Economic Review, 57 (May, 1967): 411-425. H. Leland, "The Theory of the Firm Facing Uncertain Demand," American Economic Review, 62 (1972): 278-291. A. Basu, "Economic Regulation Under Parameter Uncertainty" (Ph.D. dissertation, Economics Department, Stanford University, 1973). Zellner, Chapters II, III, and XI. Christopher B. Barry and Ira Horowitz, "Risk and Economic Policy Decisions," Public Finance, 30 (no. 2, 1975): 153-165. Roger Waud, "Asymmetric Policymaker Utility Functions and Optimal Policy Under Uncertainty," Econometrica, 44 (January, 1976): 53-66.
of N competing models and then, assuming the chosen model to be the correct model of the process, proceed with his forecasting. Instead, he computes the BMMP distribution for his set of models and uses it directly to determine, say, point or interval predictions for future values of y. The forecaster's BMMP distribution reflects his residual, parameter, and model specification uncertainty, and any predictions that he makes using his BMMP are made in light of all three types of uncertainty and with the use of information bearing on any and all of them. This point will be discussed in greater detail in Section III.2.5.
The next section sets forth the specific assumptions under which the BMC and BMS procedures will be compared in the remainder of the chapter.
III.2.4 Model Space and Assumptions
The comparison of the BMC and BMS procedures (and, indirectly, the maximize-R̄² rule) that follows will be based on the following assumptions:
1. The decision maker (forecaster) behaves as if he believes that one or the other of the following two models is an accurate representation of the random process of interest, but he is unsure which model is appropriate:

    M1: y = β1X + ε;
    M2: y = β2Z + δ.

y is the variable whose future value the forecaster is interested in predicting. X and Z are two different explanatory variables. X and Z are random, but their values associated with the next y to be generated are known prior to y's observation. β1 and β2 are unknown parameters. ε and δ are the usual normally distributed error terms, each with mean zero and unknown variance, σε² and σδ², respectively. It is also assumed that cov(β1,ε) = cov(β2,δ) = cov(ε,δ) = 0. Thus, M1 and M2 are normal univariate regression models which, to keep the number of each model's unknown parameters at two, have been forced through the origin. Since the values of the explanatory and dependent variables can always be scaled so that M1 and M2 pass through the origin, no generality is lost by using models without intercept terms. Care must be taken, however, to interpret results in the appropriate units.
2. The random process of interest to the forecaster is stationary.
3. X and Z are uncorrelated and only the explanatory variable in the true model affects y. Thus, if M1 were the true model, β2 would be zero. If neither M1 nor M2 were the true model, it may be that β1 = β2 = 0.
4. In comparing the BMC and BMS procedures, it will be assumed that the forecaster may have prior information about the parameters of M1 and M2. Since model selection via the BMS procedure and the maximize-R̄² rule are equivalent only if the forecaster has no prior information about the parameters of the models, any comments made about the BMS procedure under this assumption do not apply to the maximize-R̄² rule.1
Note that in assumption one above the residual variance of each model is assumed to be unknown. It would be unrealistic to assume the residual variance to be known when the correct model of the process is not known. Further, if σε² and σδ² were known, or were assumed to be known, and the correct model was known to be either M1 or M2, the correct model could be selected by the forecaster with probability one and there would be no need for procedures such as BMC or BMS.
To illustrate, consider the following argument. For a given X value the conditional variance of y, σ²y|X, is σε². For a given value of Z the conditional variance of y, σ²y|Z, is σδ². The marginal variance of y (i.e., y's variance unconditioned on X), σy², as described by M1 is β1²σX² + σε², and the marginal variance of y as described by M2 is β2²σZ² + σδ². If M1 were in fact the true model, then

    σy² = β1²σX² + σε²,
    σ²y|X = σε²,

and

    β2 = 0.

Since β2 = 0, the marginal variance of y as described by M2 is simply σδ². Thus, since the marginal variance of y is now known to be β1²σX² + σε², it follows that σδ² = β1²σX² + σε². This says that when M1 is the true model σε² < σδ². Consequently, if it is assumed that σε² and σδ²
1The specific conditions under which the BMS and the maximize-R̄² approaches to model selection are equivalent were listed in Section III.2. Only assumption four of this section affects their equivalency.
are known, the model with the lower residual variance can be identified with probability one as being the true model.
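The argument above can be checked by simulation. In this sketch, all numbers are assumed for illustration: data are generated from M1 with β1 = 2 and σε² = 1; the residual variance about the true M1 line estimates σε², while the variance about the best M2 fit (β2 tends to zero, since Z is uncorrelated with y) estimates β1²σX² + σε².

```python
# Illustration: when M1 is true, M2's best achievable residual variance is
# Var(y) = b1^2 * Var(X) + Var(eps), which exceeds M1's residual variance.
import random

random.seed(1)
b1, sd_eps = 2.0, 1.0            # assumed true slope and error s.d.
n = 20000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [b1 * x + random.gauss(0, sd_eps) for x in xs]

# Residual variance under M1 (true slope used, for illustration only):
res_m1 = [y - b1 * x for x, y in zip(xs, ys)]
var_m1 = sum(r * r for r in res_m1) / n

# Under M2 with an uncorrelated Z, beta2 -> 0, so the residuals are y itself:
var_m2 = sum(y * y for y in ys) / n

# var_m1 estimates sd_eps^2 = 1; var_m2 estimates b1^2 + sd_eps^2 = 5.
```

With the residual variances known, the rule in the text amounts to picking the model whose residual variance is the smaller of the two.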
In the next section the BMS and BMC procedures are compared with respect to how well each accounts for a forecaster's model specification uncertainty.
III.2.5 The Treatment of Model Specification Uncertainty
Assuming that the random process of interest is stationary and that one of a proposed set of alternative models is a true representation of the process, Geisel has shown that in the limit the BMMP and BMS predictive distributions are the same.1 Thus, in the limit, the BMC and BMS approaches to forecasting are equivalent. This result is demonstrated below.
Recalling (2.15), a BMMP can be written as a weighted average of model predictives:

    f(yF|y,D,DF) = Σ_{i=1}^{N} P"(Mi|y,D) f(yF|Mi,y,Di,DFi).    (3.23)

Each of the individual model predictives, f(yF|Mi,y,Di,DFi), is the distribution that would be used to characterize the random process in question if the BMS procedure chose Mi.
Geisel has shown that if Mi is in fact the true model, then as sample evidence accumulates (i.e., as n → ∞), P"(Mi|y,D) approaches one.2 It follows trivially that as n approaches infinity, f(yF|y,D,DF) approaches
1Geisel, pp. 22-23.
2Ibid.
f(yF|Mi,y,Di,DFi). Thus, since the distribution yielded by the BMC procedure to forecast future values of y is f(yF|y,D,DF), and that yielded by the BMS procedure for forecasting purposes is f(yF|Mi,y,Di,DFi), in the limit the BMC and BMS procedures are equivalent forecasting procedures. This unsurprising result says that in the limit, under the assumed conditions, truth is obtained, i.e., the accumulated data would indicate with certainty the model that had been generating the data. If such were the case, everybody would ultimately use the same, correct, model to predict future values of y.
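Geisel's limit result can be illustrated with a deliberately stripped-down example: two fully specified models with known parameters, rather than the regression models of this chapter, so that the posterior model probability reduces to a likelihood ratio. The setup and sample size below are assumptions made for the illustration.

```python
# As data from the true model accumulate, the posterior model probability
# P"(M1|y) approaches one. Simplified setting: M1 is y ~ N(0,1) (true),
# M2 is y ~ N(1,1), with equal prior model probabilities.
import math
import random

random.seed(0)

def log_norm_pdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

data = [random.gauss(0, 1) for _ in range(500)]          # generated by M1
log_l1 = sum(log_norm_pdf(x, 0, 1) for x in data)        # log likelihood under M1
log_l2 = sum(log_norm_pdf(x, 1, 1) for x in data)        # log likelihood under M2

# Posterior probability of M1 by Bayes' rule (equal priors cancel):
post_m1 = 1 / (1 + math.exp(log_l2 - log_l1))
```

With even a few hundred observations, post_m1 is effectively one, which is the sense in which the BMC and BMS predictive distributions coincide in the limit.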
In both the BMS and BMC procedures the forecaster or decision maker proposes a set of N models, each of which he believes might correctly represent the random process whose future values he is interested in predicting. Theoretically, if he assesses a nonzero probability for a particular model, that model should be included in his model space. In both the BMS and BMC procedures the forecaster assesses a prior probability mass function over the N models in his model space. By so doing the forecaster is formally acknowledging the fact that he is uncertain as to the correct model. He is thus faced with a forecasting problem in which model specification uncertainty is present and must be dealt with.
By selecting one of the N models and assuming it to be true, the BMS approach to forecasting yields predictions that do not appropriately reflect the forecaster's model specification uncertainty. The BMMP of the BMC procedure, however, by utilizing all N model predictives and their associated model probabilities, acknowledges the forecaster's model specification uncertainty and yields predictions that do reflect this uncertainty. Forecasting via the BMS procedure should therefore be regarded as an approximation to the "optimal" approach to forecasting offered by the BMC procedure.
In the next section of this chapter the risk involved in forecasting via the BMC procedure is compared to that involved in forecasting via the BMS procedure. These risks are measured by V(BMMP) and V(BMSP), respectively.
III.2.6 Risk Specification
Forecasts are frequently used as inputs to decision-making problems. For example, predicted new-car demand might be used by an auto manufacturer in determining the rate and timing of automobile production, as well as the size of his labor force. Much of the risk taken by a decision maker in making a decision that utilizes a forecast stems from the possibility of forecasting error. If, for example, the forecasted new-car demand errs on the high side, both the manufacturer and many of his distributors might be burdened with an excess stock of cars, leading to unnecessarily high inventory costs. The risk passed on to a decision maker by a forecaster, called here forecast-risk, will be assumed to be adequately measured in terms of the variance of the forecaster's predictive distribution. Such an assumption would be appropriate, for example, if losses associated with forecast errors were proportional to the squared error of the forecast.
A forecaster who utilizes the BMS or BMC procedure is admitting that he is uncertain of the specification of the process whose future values he wishes to predict. It has been noted above that this uncertainty is fully reflected in a BMMP distribution but not in a Bayesian Model Selection Predictive (BMSP) distribution. Thus, unless V(BMMP) equals V(BMSP), or unless no model specification uncertainty exists, V(BMSP) is an inappropriate measure of forecast-risk, either understating or overstating it as V(BMMP) > V(BMSP) or V(BMMP) < V(BMSP). Thus, decisions that utilize a prediction arrived at via the BMS procedure will have been made under the assumption that the risk involved is either less than or greater than it is in reality. The BMS procedure, therefore, has the potential to provide the decision maker with information that may lead him to generate inappropriate and excessively costly decisions.
As seen in Cases I, II, and III of Section III.1, V(BMMP) may be
greater than or less than V(BMSP). In certain situations it is more
likely that V(BMMP) is greater than V(BMSP), and in others it is
always the case that V(BMMP) is greater than V(BMSP). Such situations
will be discussed below.
It was noted in Section III.2.2 that a model's posterior probability is inversely related to its estimated residual variance, Si², and, therefore, directly related to its coefficient of determination, Ri². Thus, if M1's posterior probability is high relative to M2's posterior probability, then S1² is low relative to S2², and R1² is high relative to R2². If such were the case, it could be said that the accumulated evidence supports M1 rather than M2 as being the more likely data-generating source. Accordingly, a forecaster might be tempted to invoke the BMS procedure or the maximize-R̄² rule and choose M1 and its predictive distribution with which to forecast yF. But in such cases it is more likely that V(BMMP) > V(BMSP) than it would be if the evidence did not so clearly support one model or the other.1 This is explained below.
CLAIM: If (σ2² − σ1²)/(μ2 − μ1)² remains constant, then the larger the difference in P"(M1) and P"(M2), the more likely it is that V(BMMP) > V(BMSP).
DISCUSSION: Zellner has shown that for a normal regression model (see the assumptions of Section III.2.4) with diffuse prior information on the parameters of the model, V(BMSP), also denoted σi², is defined as follows:2

    σi² = [(n − 1)Si²/(n − 3)] [DFi²/Σ_{j=1}^{n} Dji² + 1]    (3.24)
1From (3.12) it can be seen that when, say, P"(Mi) is close to
one, the difference between V(BMMP) and V(BMSP) is of no practical
significance. Under such circumstances a comparison of V(BMMP)
and V(BMSP) serves little purpose.
2Zellner, pp. 72-74.
where n is the sample size, i.e., the number of y values observed to date; the Dji's are the values of model i's independent (explanatory) variable, Di, observed to date; DFi is the value of Di that corresponds to the next y value generated by the process in question; and Si² is the estimated residual variance of model i. It can be seen from (3.24) that σi² is proportional to Si².
It is known from Geisel's work that P"(Mi) is inversely proportional to Si².1 Thus, the larger is |P"(M1) − P"(M2)|, the larger is |S1² − S2²|.
Conditions 2a and 2b of Section III.1 provide necessary and sufficient conditions for V(BMMP) > V(BMSP). The conditions are that if, say, σ1² < σ2², then P"(M1) must be greater than .5 or P"(M2) must be greater than (σ2² − σ1²)/(μ2 − μ1)². Thus, other things equal,2 if P"(M1) < .5, then the larger is [P"(M2) − P"(M1)], the more likely it is that P"(M2) satisfies either condition 2a or 2b, i.e., the more likely it is that V(BMMP) > V(BMSP). Of course, if P"(M1) > .5, then V(BMMP) is greater than V(BMSP) regardless of how large [P"(M1) − P"(M2)] is.
The phrase "other things equal" used above refers specifically to the ratio of (σ2² − σ1²) to (μ2 − μ1)². What is being said is that given two model selection situations in which the absolute value of the ratio of (σ2² − σ1²) to (μ2 − μ1)² is the same in both, but in the first situation |P"(M1) − P"(M2)| is larger than it is in the
1See Section III.2.2.
2The "other things" are clarified in the next paragraph.
second, then it is more likely in the first situation that
V(BMMP) > V(BMSP).
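The conditions discussed above rest on the standard moment formulas for a two-component mixture: V(BMMP) = P"(M1)σ1² + P"(M2)σ2² + P"(M1)P"(M2)(μ1 − μ2)², which is the form used again below in Section III.2.9. A small numerical check, with all moments assumed for illustration, can be sketched as follows:

```python
# Check of the two-model variance comparison: V(BMSP) is the predictive
# variance of the model BMS selects (the one with the higher posterior
# probability), and V(BMMP) is the mixture variance.

def v_bmmp(p1, m1, s1, m2, s2):
    """Variance of the two-model mixture predictive (s1, s2 are variances)."""
    p2 = 1 - p1
    return p1 * s1 + p2 * s2 + p1 * p2 * (m1 - m2) ** 2

m1, s1 = 0.0, 1.0   # assumed predictive mean and variance of M1 (s1 < s2)
m2, s2 = 2.0, 3.0   # assumed predictive mean and variance of M2

# If P"(M1) > .5, BMS picks the lower-variance model, and V(BMMP) > V(BMSP):
assert v_bmmp(0.6, m1, s1, m2, s2) > s1

# If BMS picks M2, V(BMMP) > s2 exactly when P"(M2) > (s2 - s1)/(m2 - m1)^2:
threshold = (s2 - s1) / (m2 - m1) ** 2   # equals 0.5 for these numbers
assert v_bmmp(1 - 0.8, m1, s1, m2, s2) > s2    # P"(M2) = .8, above threshold
assert v_bmmp(1 - 0.51, m1, s1, m2, s2) > s2   # P"(M2) = .51, above threshold
```

Pushing P"(M2) below the threshold makes the mixture variance fall below s2, which is the sense in which a larger probability gap makes V(BMMP) > V(BMSP) more likely.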
This claim can be supported from another angle. Since σi² is proportional to Si², it can be said that the smaller, say, S1² is in relation to S2², the more likely it is that σ1² < σ2². By the Geisel result discussed in Section III.2.2, the smaller is S1² in relation to S2², the larger is P"(M1) in relation to P"(M2). Thus, the smaller S1² is in relation to S2², the more likely it is that the model with the lower predictive variance will be chosen by the BMS procedure. Therefore, by Theorem 1 of Section III.1, the more likely it is that V(BMMP) is greater than V(BMSP).
There is a special forecasting case worth noting in which V(BMMP)
is greater than V(BMSP) no matter which model the BMS procedure
chooses. It is a result of the following lemma.
LEMMA 3: If XF²/Σ_{j=1}^{n} Xj² = ZF²/Σ_{j=1}^{n} Zj², then the model with the lower estimated residual variance, Si², also has the lower predictive variance, σi².
PROOF: Proof of this lemma follows directly from the definition of σi². Recalling (3.24) and the model space assumptions of Section III.2.4, σ1² and σ2² are defined as follows:

    σ1² = [(n − 1)S1²/(n − 3)] [XF²/Σ_{j=1}^{n} Xj² + 1],    (3.25)

    σ2² = [(n − 1)S2²/(n − 3)] [ZF²/Σ_{j=1}^{n} Zj² + 1].    (3.26)

Thus, since n, the sample size, is a constant, and XF²/Σ_{j=1}^{n} Xj² is assumed equal to ZF²/Σ_{j=1}^{n} Zj², σ1² and σ2² are proportional to S1² and S2², respectively.
Since the model chosen by BMS has the smaller estimated residual variance, by Lemma 3 it also has the lower predictive variance. Thus, if Lemma 3 holds, by Theorem 2 of Section III.1, V(BMMP) > V(BMSP). In this special case, a decision maker using a forecast obtained via the BMS procedure would be making a decision that fails to recognize the full extent of the uncertainty involved in the outcome of his decision.
Under the assumptions of Section III.2.4, Zellner has shown that the posterior expected value of the residual variance of, say, Model 1 is1

    E"(σε²) = (n − 1)S1²/(n − 3),    (3.27)

and Raiffa and Schlaifer have shown2 that the posterior variance of, say, β1 is

    V"(β1) = [(n − 1)S1²/(n − 3)]/Σ_{j=1}^{n} Xj².    (3.28)

Thus, recalling (3.25), the predictive variance of model 1 may be written
1Zellner, p. 62.
2Howard Raiffa and Robert Schlaifer, Applied Statistical Decision Theory (Cambridge, Mass.: The M.I.T. Press, 1961), pp. 349-55.
    σ1² = V"(β1)XF² + E"(σε²).    (3.29)
The following lemma, based on the above facts, is offered to further explain the relationship between V(BMMP) and V(BMSP):
LEMMA 4: If E"(σε²) < E"(σδ²), Σ_{j=1}^{n} Xj² ≥ Σ_{j=1}^{n} Zj², and XF² < ZF², then V"(β1) < V"(β2) and σ1² < σ2².
PROOF: From equation (3.27) it can be seen that E"(σε²) and E"(σδ²) are proportional to S1² and S2², respectively. Thus, E"(σε²) < E"(σδ²) means that S1² < S2². From (3.28) it can be seen that V"(β1) and V"(β2) are inversely related to Σ_{j=1}^{n} Xj² and Σ_{j=1}^{n} Zj², respectively. Consequently, if S1² < S2² and Σ_{j=1}^{n} Xj² ≥ Σ_{j=1}^{n} Zj², it can be seen from (3.28) that V"(β1) < V"(β2). Thus, since V"(β1) < V"(β2), E"(σε²) < E"(σδ²), and XF² < ZF², it follows from equation (3.29) that σ1² < σ2².
If the conditions of Lemma 4 are fulfilled, the model selected by the BMS procedure will have the lower predictive variance and, by Theorem 2 of Section III.1, V(BMMP) > V(BMSP). Thus, as is the case when Lemma 3 holds, a decision maker using a forecast obtained via the BMS procedure would be making a decision which fails to recognize the full extent of the uncertainty involved in the outcome of his decision.
1S1² and S2² could, of course, be substituted for E"(σε²) and E"(σδ²), respectively, but one of the goals of this lemma is to explain the relationship of V(BMMP) and V(BMSP) via the, perhaps, more easily interpretable definition of σ1²: σ1² = V"(β1)XF² + E"(σε²).
The next section examines the decision maker's posterior expected losses from utilizing BMS- and BMC-generated predictions of yF.
III.2.7 A Comparison of Expected Losses
Given a loss function, sample y values, and a predictive distribution of yF, a forecaster can find an optimal point estimate for yF by minimizing the decision maker's posterior expected loss:

    min over ŷF of ∫ L(ŷF,yF) f(yF|y,D,DF) dyF.    (3.30)

It is well known that if a quadratic loss function is used in (3.30), the solution to the minimization problem is the mean of f(yF|y,D,DF). If the forecaster chooses to forecast via the BMS procedure, he would utilize a model predictive, f(yF|Mi,y,Di,DFi), to solve (3.30). The solution to (3.30) and his point estimate for yF would therefore be the mean of his model predictive, μi. If he chooses to forecast via the BMC procedure, he would use a BMMP, f(yF|y,D,DF), to solve (3.30), and his solution and point estimate would be the mean of the BMMP, μ.
As has been mentioned several times earlier in this chapter, however, a forecaster who opts for forecasting via BMS is not making use of all the available information about yF. The Bayesian Mixed Model Predictive (BMMP) of the BMC procedure reflects all the available information, whereas a BMSP is merely an approximation to the BMMP. Therefore, the appropriate predictive distribution to use in (3.30) is a BMMP. Consequently, the optimal solution to (3.30) is μ, the mean of the BMMP, i.e., ŷF = μ. Only if the forecaster and/or decision maker assess a probability of one for a particular model being the true model of yF's process would a single model predictive provide full information to the forecaster and/or decision maker and, hence, an optimal solution to (3.30).1
Since the appropriate distribution to use in solving (3.30) is a BMMP, the decision maker's posterior expected loss using a BMS forecast, μi, is greater than his posterior expected loss using a BMC forecast, μ:

    EL(μi) = ∫ L(yF,μi) f(yF|y,D,DF) dyF > EL(μ) = ∫ L(yF,μ) f(yF|y,D,DF) dyF.    (3.31)

This follows from the fact that it is μ, and not μi, that minimizes

    ∫ L(yF,ŷF) f(yF|y,D,DF) dyF.    (3.32)

When P"(Mi) > 0, i=1,2, then only if μ1 = μ2 would, say, μ1 minimize (3.32), since then μ1 = μ2 = P"(M1)μ1 + P"(M2)μ2 = μ. Of course, if for some i P"(Mi) = 1, then μi = μ also. But in the context of this dissertation, this case is of no interest.
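Under quadratic loss the inequality in (3.31) has a closed form: the expected loss of any point forecast a against the BMMP is V(BMMP) + (a − μ)², so EL(μi) − EL(μ) = (μi − μ)². A numerical sketch, with assumed model probabilities and component moments, can make this concrete:

```python
# Expected quadratic loss of a point forecast `a` against a two-model
# mixture predictive: E[(yF - a)^2] = sum_i w_i * (var_i + (mean_i - a)^2).

def expected_quadratic_loss(a, weights, means, variances):
    """E[(yF - a)^2] under a mixture with the given component moments."""
    return sum(w * (v + (m - a) ** 2)
               for w, m, v in zip(weights, means, variances))

# Assumed posterior model probabilities and model-predictive moments:
weights, means, variances = [0.7, 0.3], [1.0, 3.0], [0.5, 0.5]

mu = sum(w * m for w, m in zip(weights, means))            # BMC point forecast
el_bmc = expected_quadratic_loss(mu, weights, means, variances)
el_bms = expected_quadratic_loss(means[0], weights, means, variances)
# el_bms exceeds el_bmc by exactly (means[0] - mu)^2, as in (3.31).
```

The gap grows with the squared distance between the selected model's mean and the mixture mean, so BMS forecasting is costliest precisely when the competing models disagree most.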
Let C(BMC) and C(BMS) stand for the costs required to forecast with BMC and BMS, respectively.2 Then, assuming that the decision maker's loss function and the cost functions C(BMC) and C(BMS) can be
1Note that when P"(Mi) = 1, the BMSP and BMMP distributions are the same.
2In general C(BMC) and C(BMS) cannot be computed without going through the actual computations required by the BMC and BMS procedures.
meaningfully compared, if experimentation with the BMC and BMS procedures shows that in general

    EL(μi) − EL(μ) > C(BMC) − C(BMS),

it is materially as well as theoretically advantageous for the forecaster to use the BMC procedure rather than the BMS procedure.
Future values of a random variable are typically predicted using point or interval estimates. The implications of making point and interval estimates via the BMS procedure as opposed to the BMC procedure are discussed in the next two sections.
III.2.8 Implications for Point Estimation
The point estimate of a future value of some random process will be denoted by ŷF. The use of loss functions to determine optimal point estimates was discussed in the preceding section of this chapter. If a loss function can be specified by the forecaster and/or decision maker, it should be used to determine ŷF. Frequently, however, loss functions are too costly to develop and predictions must be made without the information that a loss function provides. In such cases forecasters usually examine yF's predictive distribution and choose a measure of its central tendency as their estimate of yF. Their logic is that central tendency measures are usually in the high density region of the distribution and will not err significantly even if the actual yF falls in a tail of yF's predictive distribution. Further, it is well known that commonly-used loss functions often result in mean, median, or modal estimates of parameters.
In the preceding section, it was noted that if the BMS procedure and a quadratic loss function are utilized for forecasting, ŷF = μi. However, even if a BMS forecaster does not have a loss function with which to work, he might again choose the mean of the chosen model predictive, μi, as his point estimate of yF. In either of these cases, if the BMS procedure chooses M1 and μ1 ≠ μ2,1 then, for reasons explained below, it can be said that the forecaster's point estimate is inappropriately high or low with probability one. For example, if μ1 < μ, and μ1 is used by a BMS forecaster to predict yF, μ1 is said to be an inappropriately low prediction of yF.
Suppose it is the next y value that the forecaster would like to predict. By assessing nonzero model probabilities for M1 and M2, as is done in both the BMS and BMC procedures, the forecaster/decision maker is acknowledging that he believes the next observation could be generated by either M1 or M2. A prediction of the next yF value should acknowledge this uncertainty. But forecasting procedures that utilize the BMS procedure do not optimally account for this sort of uncertainty (model specification uncertainty) because they do not appropriately reflect the possibility that a rejected model may be the true model. Thus, in the example of the preceding paragraph, μ1 is said to be an inappropriately low forecast because it does not appropriately reflect the fact that yF may be generated by M2.
1Since μ = P"(M1)μ1 + P"(M2)μ2 and P"(M1), P"(M2) > 0, μ1 ≠ μ2 means that μ1 ≠ μ and μ2 ≠ μ.
Forecasts made utilizing the BMC procedure do reflect model specification uncertainty. μ, the mean of the BMMP distribution, is an example of a BMC-generated prediction. As can be seen by examining its definition, μ reflects the belief that yF may be generated by either M1 or M2:

    μ = P"(M1)μ1 + P"(M2)μ2.

Since the decision maker's predictive distribution is a mixture of the model predictives, his optimal estimator will arise from the mixture as well, and in this case will be μ. It is just as appropriate to use μ when model specification uncertainty exists as it is to use, say, μ1 when it is known that yF will be generated by M1.
If a forecaster's loss function is asymmetric, the mean of yF's predictive distribution would not be appropriate for forecasting yF. Suppose his losses are best represented by an asymmetric linear loss function and model specification uncertainty exists. Then his optimal point estimate for yF would be a fractile of yF's BMMP distribution. A BMS forecaster utilizing an asymmetric linear loss function would use a fractile of the BMSP distribution. If the asymmetric linear loss function describes losses from underestimating yF as being greater than losses from overestimating yF, the BMS forecaster's point estimate would be a fractile of the BMSP distribution which is greater than the
1If the linear loss function were symmetric, the optimal point estimate would be the median of the BMMP.
mean.1 In such cases the BMS forecaster may seriously underestimate yF and incur a large loss while thinking he is protecting against such an occurrence. Suppose μ1 < μ, and the BMS procedure selects M1. Then, if the forecaster chooses to estimate yF with a fractile of M1's BMSP distribution which is less than μ, say, the .7 fractile, the BMSP reflects his probability of underestimating yF as being only .3. But, if the .7 fractile of the BMSP distribution is less than μ, the BMMP distribution reflects his probability of underestimating yF as being greater than .5. Thus, a BMS forecaster may believe he is protecting against underestimating yF when in fact he has a higher probability of an underestimate than an overestimate.
The results of this section were generated via a comparison of BMC and BMS forecasts. It should be noted, however, that point estimates determined by any procedure which utilizes a single model that has been selected from a set of viable models will typically be misplaced. This is due to the fact that use of a single model, however selected, has the effect of ignoring information provided by those remaining models which have positive posterior probability.
III.2.9 Implications for Interval Estimation
The procedure of predicting that a future value of a random process will take on a value between two specified real numbers with
1Raiffa and Schlaifer, p. 345, have shown that the predictive distributions for yF yielded by M1 and M2 are Student. Since the Student distribution is unimodal and symmetric, its mean and median are equal.
some positive probability is referred to as Bayesian interval estimation. The interval represented by the two given numbers is called a credible interval. Often, a Bayesian will choose as his credible interval a Highest Posterior Density (HPD) region.1 Denoting yF's predictive distribution as f(yF|y), an interval I in the domain of yF is called a HPD region of content 1 − α if

    a) P(yF ∈ I) = 1 − α;
    b) yF1 ∈ I and yF2 ∉ I implies f(yF1|y) ≥ f(yF2|y).2
BMS interval forecasts of yF are determined from the predictive distribution of yF generated by the model chosen by the BMS procedure, i.e., a Bayesian Model Selection Predictive (BMSP). BMC interval forecasts of yF are determined from the appropriate Bayesian Mixed Model Predictive (BMMP).
Recall that under the assumptions of Section III.2.4, M1 and M2 define unimodal, symmetric distributions (Student distributions). Accordingly, a HPD credible interval determined from Mi's BMSP will be centered at μi. Thus, when model specification uncertainty exists and μ1 ≠ μ2, the midpoint of a BMS credible interval is inappropriately high or low in the same sense as BMS point estimates were in the
1Bayesian methods for optimal interval estimates exist when, as in the case of point estimation, appropriate loss functions may be specified. See R. L. Winkler, "A Decision-Theoretic Approach to Interval Estimation," Journal of the American Statistical Association, 67 (1972), 187-191.
2George E. P. Box and George C. Tiao, Bayesian Inference in Statistical Analysis (Reading, MA: Addison-Wesley, 1973), p. 123.
preceding section. The discussion of this phenomenon with respect to point estimates in the preceding section applies equally well here.
Under the assumptions that M1 and M2 are normal regression models, μ1 ≠ μ2, and P'(M1), P'(M2) > 0 (see Section III.2.4), the BMMP distribution is bimodal. Accordingly, an HPD BMC credible region will frequently consist of two intervals: one with midpoint μ1, the other with midpoint μ2. Interval forecasts that are comprised of more than one interval will be referred to as split-interval forecasts or split credible intervals. An HPD split credible interval serves to warn a decision maker that it is highly probable that yF will take on a value in one of two or more noncontiguous regions.
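A split HPD region of the kind described above can be computed on a grid by keeping the highest-density cells until the desired content is reached. The sketch below uses normal components in place of the chapter's Student predictives, with assumed means, scales, and model probabilities:

```python
# Grid-based HPD region for a bimodal two-model mixture predictive.
import math

def mixture_pdf(x, w1, m1, s1, m2, s2):
    """Two-component normal mixture density (stand-in for the BMMP)."""
    def npdf(x, m, s):
        return math.exp(-((x - m) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
    return w1 * npdf(x, m1, s1) + (1 - w1) * npdf(x, m2, s2)

def hpd_region(pdf, lo, hi, n, content):
    """Keep the highest-density grid cells until their probability mass
    reaches `content`; return the kept cell indices and the cell width."""
    dx = (hi - lo) / n
    cells = sorted(((pdf(lo + (j + 0.5) * dx), j) for j in range(n)), reverse=True)
    kept, mass = [], 0.0
    for dens, j in cells:
        kept.append(j)
        mass += dens * dx
        if mass >= content:
            break
    return sorted(kept), dx

# Assumed setup: well-separated model predictives with equal probabilities.
cells, dx = hpd_region(lambda x: mixture_pdf(x, 0.5, 0.0, 0.5, 4.0, 0.5),
                       -3.0, 7.0, 2000, 0.95)
# Count contiguous runs of kept cells; a bimodal mixture yields two intervals.
runs = 1 + sum(1 for a, b in zip(cells, cells[1:]) if b != a + 1)
```

For these assumed values the 95 percent HPD region splits into two intervals, one around each model's predictive mean, which is exactly the warning a split credible interval conveys to the decision maker.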
The following two lemmas demonstrate how a credible interval formed using a BMSP can be misleading when model specification uncertainty exists. In Lemma 5, the intersection of the BMSPs of M1 and M2 between their modes is referred to as the intermodal intersection. The yF value that corresponds to the intermodal intersection will be denoted ỹF.
LEMMA 5: Let μ1 ≠ μ2 and suppose BMS chooses model i. If the length of a credible interval formed using the BMSP is less than or equal to 2|μi − ỹF|, then the BMS credible interval overstates the probability that it will cover yF.
PROOF: Recall that the BMMP is a mixture of predictive
distributions generated by M1 and M2:

    f(y_F | y, D, D_F) = P"(M1 | y, D) f(y_F | M1, y, D_1, D_F1)
                       + P"(M2 | y, D) f(y_F | M2, y, D_2, D_F2).

Thus,

    f(y_F | M_i, y, D_i, D_Fi) ≠ f(y_F | y, D, D_F)

when P"(M_i | y, D) ≠ 1. If, say, μ_1 < μ_2 and y_F < ȳ_F, then

    f(y_F | M1, y, D_1, D_F1) > f(y_F | y, D, D_F),

since below the intermodal intersection M1's predictive density
exceeds M2's. Thus, the probability of an interval centered on μ_1 of
length less than 2|μ_1 − ȳ_F| containing y_F is greater when the
probability is evaluated via f(y_F | M1, y, D_1, D_F1) rather than
f(y_F | y, D, D_F).
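The overstatement in Lemma 5 can be checked with a small numeric sketch. The values below are illustrative assumptions (normal densities stand in for the Student predictive forms, and BMS is taken to have chosen M1); ȳ_F is located by bisecting on the crossing point of the weighted component densities.

```python
import math

# Illustrative numbers, not from the dissertation: mu1 < mu2, unit
# predictive variances, and BMS assumed to select M1.
P1, P2 = 0.7, 0.3
mu1, mu2 = 0.0, 5.0

def ncdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def interval_prob(lo, hi, mean):   # P(lo < y_F < hi) under N(mean, 1)
    return ncdf(hi - mean) - ncdf(lo - mean)

def wdens(y, p, m):                # weighted component density
    return p * math.exp(-0.5 * (y - m) ** 2)

# Locate ybar_F, the intermodal intersection, by bisection.
lo, hi = mu1, mu2
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if wdens(mid, P1, mu1) > wdens(mid, P2, mu2) else (lo, mid)
ybar = 0.5 * (lo + hi)

# A BMS interval centered at mu1 with length < 2|mu1 - ybar_F|:
h = 0.9 * abs(mu1 - ybar)
claimed = interval_prob(mu1 - h, mu1 + h, mu1)                     # under M1 alone
actual = P1 * claimed + P2 * interval_prob(mu1 - h, mu1 + h, mu2)  # under the BMMP
print(claimed > actual)  # True: the BMS interval overstates its coverage
```

Because the interval lies entirely below ȳ_F, M1's density dominates on it, so the coverage probability computed from the BMSP exceeds the coverage probability computed from the mixture, as the lemma asserts.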
If the conditions of Lemma 5 are fulfilled, the probability of a
BMS credible interval covering y_F is actually smaller than claimed by
the forecaster using the BMS credible interval. Thus, the BMS credible
interval overstates the probability of y_F being covered and therefore
understates the risk involved in using the interval forecast for
decision-making purposes. Notice that since f(y_F | y, D, D_F) >
f(y_F | M1, y, D_1, D_F1) when y_F > ȳ_F, it is unclear whether a BMS
credible interval of length greater than 2|μ_1 − ȳ_F| understates or
overstates the probability that it will cover y_F.

LEMMA 6: If μ_1 = μ_2 and the BMS procedure chooses the model with the
higher (lower) predictive variance, then a BMS credible interval
understates (overstates) the probability that it will cover y_F.
PROOF: Theorem 3 of Section III.1 showed that if the BMS
procedure chooses the model with the higher predictive variance, then
V(BMMP) may be less than, greater than, or equal to V(BMSP). Recall
that

    V(BMMP) = σ̄² = P"(M1 | y, D) σ_1² + P"(M2 | y, D) σ_2²
              + P"(M1 | y, D)(μ_1 − μ̄)² + P"(M2 | y, D)(μ_2 − μ̄)²

and

    E(BMMP) = μ̄ = P"(M1 | y, D) μ_1 + P"(M2 | y, D) μ_2.

Thus, μ_1 = μ_2 implies that μ_1 = μ_2 = μ̄ and

    σ̄² = P"(M1 | y, D) σ_1² + P"(M2 | y, D) σ_2².

Therefore, if the BMS procedure chooses the model with the higher
predictive variance, V(BMMP) < V(BMSP). Under the assumptions of this
chapter, a BMSP distribution is Student and, therefore, unimodal and
symmetric. Accordingly, a 95 percent credible interval, say, formed
using the BMSP distribution will be wider than a 95 percent credible
interval formed using the BMMP distribution. It follows that the
probability of y_F being covered by a BMMP (i.e., BMC) credible
interval of the same size as a 95 percent BMS credible interval is
greater than .95. Thus, it may be said that when the conditions of
Lemma 6 are fulfilled, a BMS credible interval understates its
probability of covering y_F.
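The direction of the error in Lemma 6 can also be verified numerically. The sketch below assumes equal predictive means, normal densities in place of the Student forms, and BMS choosing the higher-variance model; every number is illustrative.

```python
import math

# Illustrative sketch: equal predictive means (mu1 = mu2) but model 1
# has the higher predictive standard deviation, and BMS picks model 1.
P1, P2 = 0.5, 0.5
s1, s2 = 2.0, 1.0            # predictive standard deviations, s1 > s2

def ncdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

z = 1.959963985              # two-sided 95 percent normal quantile
half = z * s1                # half-length of the 95 percent BMS interval

def central_prob(s):         # P(|y_F - mu| < half) when the sd is s
    return ncdf(half / s) - ncdf(-half / s)

claimed = central_prob(s1)                              # .95, as BMS reports
actual = P1 * central_prob(s1) + P2 * central_prob(s2)  # under the BMMP
print(round(claimed, 3), actual > 0.95)  # 0.95 True: coverage understated
```

The interval sized for the wide component covers the narrow component almost surely, so its true BMMP coverage exceeds the nominal .95, matching the "understates" branch of the lemma.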
In this chapter, it has been shown that when model specification
uncertainty is present, the appropriate distribution with which to
characterize a data-generating process is the BMMP of the BMC
procedure. Failure to use the BMMP when model specification
uncertainty exists results in two interesting and seemingly
contradictory effects. First, in using a single model, however
selected, information provided by the remaining models which have
positive posterior probability is ignored. Second, in ignoring
available information about model specification uncertainty, the
forecaster behaves in many cases as if he is facing a lesser degree of
uncertainty than is actually the case. Thus, the forecaster
simultaneously discards relevant information and behaves as if he
possesses more information than is actually possessed.1 This
phenomenon was noted in both point and interval forecasting
situations.
In the next chapter, the BMC procedure is applied to
single-period economic control problems.

1Christopher B. Barry and P. George Benson, "Specification
Uncertainty in Economic Forecasting and Control Models," University of
Minnesota, Graduate School of Business Administration, Working Paper
No. 35 (February, 1977), p. 7.
CHAPTER IV
MODEL SPECIFICATION UNCERTAINTY IN SINGLE-PERIOD
ECONOMIC CONTROL PROBLEMS
In Chapter III the consequences of forecasting with and without
considering model specification uncertainty were examined. Given the
existence of model specification uncertainty, it was concluded that
the BMC procedure was an appropriate procedure to utilize in
predicting future values of a random process. In this chapter, the
BMC procedure is applied to single-period economic control problems.
In particular, the BMC procedure will be used to find both
certainty-equivalent and optimal analytic solutions to single-period
control problems. In both cases, control solutions will be derived
which take into consideration costs that may be incurred by a
controller as a result of his employing a particular instrument
(controllable variable) to help control a random process.

By using the BMC procedure to solve economic control problems,
control solutions need not be artificially conditioned on the
assumption that a particular econometric model is in fact an accurate
characterization of the process whose control is desired. Instead, a
controller's model specification uncertainty is reflected in his
control solutions, i.e., in his decisions concerning the levels or
rates at which to set his controllable variables. By explicitly
recognizing model specification uncertainty and including it through
the BMC procedure as part of the economic control problem, the
controller is appropriately specifying the risk that control entails.

In the following section, the economic control problem is
defined, references to previous work in this area are cited, and the
integration of the BMC procedure and the economic control problem is
discussed.
IV.1 The Economic Control Problem

The problem of effecting the outcome of some economic
data-generating process, such as the GNP, rate of inflation, or
unemployment rate, is referred to as an economic control problem.
More specifically, given an econometric model of the data-generating
process of interest, a single-period economic control problem involves
determining settings for the model's instruments (controllable
variables) in one time period such that in the next time period the
model's dependent variable (a desideratum or policy objective) is
close to a specified target value or within a specified target
interval. Controlling values for the model's instruments are
determined by optimizing an objective or criterion function that is
typically a function of the difference between the target and realized
values of the dependent variable.

An economic control problem may be expressed as:

    min_X ∫ L(y − y*) f(y | X) dy                               (4.1)

1When control of a dependent variable is desired over more than
one time period, the problem is referred to as a multiperiod control
problem. For a discussion of multiperiod control, see Zellner,
336-354.
where y is the dependent variable whose control is desired, y* is the
value or target the controller would like y to attain next period, and
X is the model's vector-valued instrument. L(y − y*) is a loss
function that describes the losses incurred by the controller as a
result of y not equalling the target, y*, or not falling in the target
interval.1 The loss values may be viewed as opportunity losses or
"social costs". The function f(y | X) is the predictive distribution
of future values of y as determined by the econometric model used to
characterize y. Thus, in this case control of y is effected by
setting X in the current time period so as to minimize next period's
expected loss.

If sample information about the process is available, the control
problem is still solved by minimizing expected loss, but the
expectation is taken with respect to a predictive distribution that
reflects the sample information. Letting y and X now refer to
observed data points and y_F and X_F refer to the control-period
values of the target and control variable, respectively, the problem
becomes

    min_{X_F} ∫ L(y_F − y_F*) f(y_F | y, X, X_F) dy_F.          (4.2)

1For a discussion of the sensitivity of control to the form of
the loss function, see Arnold Zellner and Martin Geisel, "Sensitivity
of Control to Uncertainty and Form of the Criterion Function," in
D. G. Watts (ed.), The Future of Statistics (New York: Academic Press,
1968), 269-283.
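A problem of the form (4.2) can be solved numerically once a predictive distribution is in hand. Under a quadratic loss, the expected loss at a setting decomposes into squared bias plus predictive variance, which the following sketch minimizes by grid search; the posterior quantities used are hypothetical.

```python
# A minimal numeric sketch of problem (4.2) under assumed values: the
# predictive mean of y_F is b*XF (b the posterior slope mean) and the
# predictive variance grows with XF^2, so expected quadratic loss is
# squared bias plus variance. All numbers here are hypothetical.
b = 2.0          # posterior mean of the slope
v = 0.25         # posterior variance of the slope
s2 = 1.0         # residual contribution to the predictive variance
y_star = 10.0    # target y_F*

def expected_loss(xf):
    return (b * xf - y_star) ** 2 + (v * xf * xf + s2)

# Grid search for the loss-minimizing instrument setting.
xf_opt = min((i * 0.001 for i in range(10001)), key=expected_loss)
print(round(xf_opt, 3))  # about 4.706, i.e. b*y_star/(b*b + v)
```

Note that the minimizer falls short of the naive setting y*/b = 5: the variance term, which grows with the setting, pulls the optimal instrument value toward zero. This foreshadows the comparison of certainty-equivalent and optimal solutions later in the chapter.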
In the single-period control problem described above, it is
assumed that the controller knows the correct econometric model of the
random process he wishes to control. Accordingly, in solving his
control problem, the controller has only to contend with parameter and
residual uncertainty, not model specification uncertainty. Much work
has been done on such problems by, for example, Fisher, Brainard,
Leland, Basu, and Zellner.2 The more complicated multiperiod control
problem, in which it is assumed that the controller knows the correct
econometric model of the process he desires to control, has also
received attention. See, for example, Aoki, Prescott, Zellner,
Taylor, and Chow.3 The approaches to single- and multiperiod control
of the

1For a more complete discussion of control problems, see Zellner,
319-359.

2Walter D. Fisher, "Estimation in the Linear Decision Model,"
International Economic Review, 3 (January, 1962), 1-29. William
Brainard, "Uncertainty and the Effectiveness of Policy," American
Economic Review, 57 (May, 1967), 411-425. H. Leland, "The Theory of
the Firm Facing Uncertain Demand," American Economic Review, 62
(1972), 278-291. A. Basu, "Economic Regulation Under Parameter
Uncertainty" (Ph.D. dissertation, Economics Department, Stanford
University, 1973). Arnold Zellner, An Introduction to Bayesian
Inference in Econometrics (New York: John Wiley and Sons, 1971),
319-336.

3Masanao Aoki, Optimization of Stochastic Systems (New York:
Academic Press, 1967); Edward C. Prescott, "Adaptive Decision Rules
for Macro Economic Planning" (Ph.D. dissertation, Graduate School of
Industrial Administration, Carnegie-Mellon University, 1967); Edward
C. Prescott, "The Multi-Period Control Problem Under Uncertainty,"
Econometrica, 40 (November, 1972), 1043-58; Zellner, pp. 336-54; John
B. Taylor, "Asymptotic Properties of Multiperiod Control Rules in the
Linear Regression Model," Institute for Mathematical Studies in the
Social Sciences, Stanford University, Technical Report No. 79,
December, 1972; Gregory C. Chow, "Effect of Uncertainty on Optimal
Control Policies," International Economic Review, 14 (October, 1973),
632-645; Gregory C. Chow, "A Solution to Optimal Control of Linear
Systems with Unknown Parameters," Econometric Research Program,
Princeton University, Research Memorandum No. 157, December, 1973.
above-mentioned authors are theoretically appropriate only if the
controller can assert with probability one that the model he has
chosen to represent the process whose control is desired is in fact
the correct representation of the process. If the controller can make
such a statement, then in solving his control problem he has only to
contend with the model's parameter and residual uncertainty. If,
however, he specifies the chosen model's appropriateness with a model
probability less than one, he is acknowledging the existence of model
specification uncertainty. Theoretically, if model specification
uncertainty exists, it should be dealt with in control problems. It
should not be ignored or assumed away via some model selection
procedure such as Bayesian Model Selection.1 Control procedures that
fail to consider model specification uncertainty when it exists are
not optimal procedures. Such procedures, in the sense of Chapter III,
misspecify the uncertainty involved in controlling y and, therefore,
the risk faced by the controller in using them to set the rate or
level of his instruments.

Model specification uncertainty has not been explicitly
considered in the control literature. Since it may have an impact
upon optimal control solutions, it merits consideration. That the
consideration of model specification uncertainty in control contexts
is important and warrants

1For a discussion of several model selection procedures that are
frequently used to establish econometric models of processes whose
control is desired, see Gaver and Geisel, "Discriminating Among
Alternative Models: Bayesian and Non-Bayesian Methods," pp. 49-77.
a great deal of attention has been expressed by Pierce:

    Another area of uncertainty has to do with our
    models. I want to stress this because users of
    control theory often tend to take models as given
    and work out solutions without seriously
    questioning the reasonableness of the models. This
    tendency is not very harmful when one is working
    on technique. However, there is a real danger
    of giving more credence to model results than
    they deserve, especially if a particular policy
    trajectory is highly influenced by the choice
    of a model.1

He goes on to say:

    The problem lies not only with uncertainty
    concerning the true value of the model parameters,
    but also with the structure of the models
    themselves.2
By utilizing the Bayesian Model Comparison procedure to develop a
Bayesian Mixed Model Predictive distribution for the process whose
single-period control is desired, a controller can determine settings
for his instruments in light of residual, parameter, and model
specification uncertainty.3 When single-period control is desired,
the solution to the following minimization problem provides optimal
settings for the controller's instruments, D_F:

    min_{D_F} ∫ L(y_F − y_F*) f(y_F | y, D, D_F) dy_F.          (4.3)

1J. L. Pierce, "Quantitative Analysis for Decisions at the
Federal Reserve," Annals of Economic and Social Measurement, 3 (1974),
19.

2Ibid.

3That a BMMP in fact reflects model specification uncertainty was
discussed in Chapter III.
Recall that D = (X,Z)' and D_F = (X_F,Z_F)'. The function
f(y_F | y, D, D_F) is the controller's BMMP for the data-generating
process. All other terms in (4.3) are as previously defined. The
only difference between (4.3) and (4.1) or (4.2) is the use of a
Bayesian Mixed Model Predictive in (4.3) rather than a predictive
distribution determined from a single model.1 Since all relevant
major forms of uncertainty (residual, parameter, and model
specification uncertainty) are reflected in (4.3) and, therefore,
influence its solution, it is said that (4.3) provides optimal
settings for D_F.

In the next section of this chapter, assumptions are presented
under which various single-period control solutions are obtained using
the BMC procedure in the remainder of the chapter.
IV.2 Model Space and Assumptions

In the remainder of this chapter, solutions will be derived for
single-period control problems based on the following assumptions:

1. The decision maker (controller) believes that one or the
   other of the following two models is an accurate
   representation of a data-generating process to be
   controlled, but he is unsure which one is correct:

       M1: y = β_1 X + ε ;
       M2: y = β_2 Z + δ .                                      (4.4)

   y is the target variable, and X and Z are two different
   nonrandom explanatory variables, instruments

1In what follows, control problems that deal with a single model
will be expressed as in (4.1) or (4.2).

   over which the controller has complete control. β_1 and β_2
   are unknown parameters. ε and δ are the usual normally
   distributed error terms, each with zero mean and unknown
   variance, σ_ε² and σ_δ², respectively. It is also assumed
   that cov(β_1,ε) = cov(β_2,δ) = cov(ε,δ) = 0. Thus, M1 and
   M2 are normal univariate regression models which, to keep
   the number of each model's unknown parameters at two, have
   been forced through the origin.

2. The data-generating process over which control is desired
   is stationary.

3. X and Z are uncorrelated, and only the controllable
   variable in the true model affects y. Thus, if M1 were the
   true model, β_2 would be zero. If neither M1 nor M2 were
   the true model, it may be that β_1 = β_2 = 0.

4. The controller's loss function is a quadratic loss function
   of the form

       L(y_F − y_F*) = K(y_F − y_F*)²

   where K is a constant. In what follows, K is set equal to
   one without loss of generality.
Aside from the change in emphasis from forecasting to control, and the
assumption that X and Z are controllable variables, the above
assumptions are similar to those under which the Bayesian Model
Comparison and Bayesian Model Selection procedures were compared in
Chapter III (see Section III.2.4).
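The model space above can be made concrete with a small simulation. The sketch below assumes, for illustration only, that M1 is the true model (so β_2 = 0, per assumption 3) and fits both through-origin regressions to the same simulated sample; all parameter values are hypothetical.

```python
import random

# A small simulation of the model space (4.4) under assumed values:
# M1 is treated as the true model, and both through-origin
# regressions are fit to the same data.
random.seed(1)
n, beta1 = 200, 1.5
X = [random.uniform(0.0, 5.0) for _ in range(n)]
Z = [random.uniform(-5.0, 5.0) for _ in range(n)]   # uncorrelated with X
y = [beta1 * x + random.gauss(0.0, 1.0) for x in X]

def slope(w):   # least-squares slope of a regression through the origin
    return sum(a * b for a, b in zip(w, y)) / sum(a * a for a in w)

b1, b2 = slope(X), slope(Z)
print(abs(b1 - beta1) < 0.2, abs(b2) < 0.3)  # b1 near beta1, b2 near zero
```

With X truly driving y, the fitted slope on X recovers β_1 while the fitted slope on the irrelevant instrument Z hovers near zero, which is the situation the posterior model probabilities of the BMC procedure are meant to summarize.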
In the next section, certainty-equivalent solutions to
single-period control problems will be derived under the above
assumptions with the use of the BMC procedure.
IV.3 Single-Period Certainty-Equivalent Control

If, in attempting to control y's value next period, the
controller behaves this period as if E(y) is the value of y that will
occur with certainty next period, then E(y) is said to be a "certainty
equivalent" for y.1 When the process which generates y is known or
assumed to be known, the single-period control problem under parameter
and residual uncertainty is reduced to a deterministic problem. If
the process which generates y is not known, but is believed to be best
represented by one of N alternative models, the single-period control
problem under model specification, parameter, and residual uncertainty
reduces to one of control under model specification uncertainty alone.
In this section, single-period control solutions are derived for the
controller who admits to model specification uncertainty and behaves
as if E(y) will occur with certainty next period.

The use of the certainty equivalent E(y) for y reduces models 1
and 2 of Section IV.2 to the following:

    M1: E(y | X) = E(β_1) X + E(ε) = b_1" X ;
                                                                (4.5)
    M2: E(y | Z) = E(β_2) Z + E(δ) = b_2" Z .

1For a definition and discussion of certainty equivalence, see
Herbert A. Simon, "Dynamic Programming Under Uncertainty with a
Quadratic Criterion Function," Econometrica, 24 (1956), 74-81; and/or
C. Holt, J. F. Muth, F. Modigliani, and H. A. Simon, Planning
Production, Inventories and Work Force (Englewood Cliffs, N.J.:
Prentice-Hall, 1960), Chapter 6.
In (4.5) the parameter and residual uncertainty of M1 and M2 are
treated as if they do not exist. Thus, if neither M1 nor M2 is known
or assumed to be the true model, it is only necessary to deal with
model specification uncertainty.
From (4.2), assuming that M1 is the true model, the single-period
control problem is solved by determining:

    min_{X_F} ∫ L(y_F − y_F*) f(y_F | y, X, X_F) dy_F
        = min_{X_F} E[L(y_F − y_F*) | y, X, X_F].               (4.6)

The solution to (4.6) yields the controller's minimum expected loss
under M1's predictive distribution of y_F. If the loss function is
quadratic, as is assumed for the remainder of this chapter, (4.6)
becomes:

    min_{X_F} E[(y_F − y_F*)² | y, X, X_F].                     (4.7)

The use by the controller of E(y_F | y, X, X_F) as a certainty
equivalent for y_F reduces (4.7) to the following:

    min_{X_F} [E(y_F | y, X, X_F) − y_F*]².                     (4.8)

Note that (4.8) contains no random terms. Thus, (4.8) is minimized by
the value of X_F that sets E(y_F | y, X, X_F) equal to y_F*. From
(4.5) it can be seen that E(y_F | y, X, X_F) = E(β_1 | y, X) X_F +
E(ε | y) = b_1" X_F. Thus, the appropriate setting for X_F is one
such that b_1" X_F = y_F*. Accordingly, X_F should be set equal to
y_F*/b_1". This is the single-period certainty-equivalent solution
when it is assumed that model 1 generates y_F.1 Similarly, when model
2 is assumed to generate y_F, the single-period certainty-equivalent
solution is to set Z_F equal to y_F*/b_2".2
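The arithmetic of these certainty-equivalent settings is simple enough to state in a few lines. The posterior slope means below are assumed for illustration; they are not values from the text.

```python
# Sketch of the single-model certainty-equivalent settings, using
# assumed posterior slope means b1'' and b2'' (hypothetical values).
y_star = 12.0          # target y_F*
b1, b2 = 1.5, 0.8      # posterior means of beta1 and beta2

XF_ce = y_star / b1    # makes E(y_F | M1) = b1''*XF equal y_F*
ZF_ce = y_star / b2    # makes E(y_F | M2) = b2''*ZF equal y_F*
print(XF_ce, ZF_ce)    # 8.0 15.0
```

Each setting simply inverts the certainty-equivalent mean relation b" X_F = y_F*, which is why these solutions are so much cheaper to compute than full optimal control.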
Single-period certainty-equivalent control solutions in which a
particular model is assumed to generate y_F are derived assuming the
mean of y_F's predictive distribution is the value of y_F that will
occur with certainty next period. This is equivalent to assuming that
β_1 = b_1" and ε = 0. y_F's predictive variance is ignored in the
certainty-equivalent solution. Consequently, such solutions are not
optimal but are only approximations to optimal solutions, as explained
by Zellner.3 In general, since certainty-equivalent control problems
ignore y_F's predictive variance and, therefore, parameter and
residual uncertainty, their solutions are much easier and less costly
to obtain than are optimal control solutions. Consequently,
certainty-equivalent control may at times provide the controller with
an attractive alternative to full-scale optimal control.
IV.3.1 Certainty-Equivalent Control Using the BMMP Distribution

By using the Bayesian Model Comparison procedure's Bayesian Mixed
Model Predictive as y_F's predictive distribution, single-period
certainty-equivalent control solutions can be derived which reflect
the controller's model specification uncertainty concerning M1 and M2
of

1This solution can also be found in Zellner, pp. 320-322.

2Notice that these certainty-equivalent solutions make the
control target, y_F*, the mean of y_F's predictive distribution.

3Zellner, pp. 322-324.

the previous section. This approach to certainty-equivalent control
also does not explicitly consider parameter and residual uncertainty
and is, therefore, also suboptimal. But by enabling the controller to
solve his control problems in light of any model specification
uncertainty, this approach may improve the effectiveness of
certainty-equivalent control solutions. As will be seen below,
single-period BMC certainty-equivalent control, as it will be called,
requires little more computational effort than the
certainty-equivalent control solutions derived above in which
specification uncertainty was not treated.
The BMC certainty-equivalent control solution can be obtained
from the full-scale BMC control problem of (4.3). (4.3) is repeated
here and the BMC certainty-equivalent control solution is derived
below:

    min_{D_F} ∫ L(y_F − y_F*) f(y_F | y, D, D_F) dy_F.          (4.9)

Recall from (3.1) that for the two-model case the BMMP distribution
would be expressed as

    f(y_F | y, D, D_F) = P"(M1 | y, X) f(y_F | M1, y, X, X_F)
                       + P"(M2 | y, Z) f(y_F | M2, y, Z, Z_F).  (4.10)

In this case, D and D_F are vectors of control variables: D = (X,Z)'
and D_F = (X_F,Z_F)'. Accordingly, (4.9) may be written

    min_{D_F} [P"(M1 | y, X) ∫ L(y_F − y_F*) f(y_F | M1, y, X, X_F) dy_F
             + P"(M2 | y, Z) ∫ L(y_F − y_F*) f(y_F | M2, y, Z, Z_F) dy_F]. (4.11)

Under the assumption that the loss function is quadratic, (4.11) may
be rewritten

    min_{D_F} [P"(M1 | y, X) E[(y_F − y_F*)² | M1, y, X, X_F]
             + P"(M2 | y, Z) E[(y_F − y_F*)² | M2, y, Z, Z_F]]. (4.12)

The use by the controller of E(y_F | M1, y, X, X_F) = C_1 and
E(y_F | M2, y, Z, Z_F) = C_2 as certainty equivalents for y_F in M1
and M2, respectively, means that y_F is no longer treated as being
random. Consequently, (4.12) reduces to

    min_{D_F} [P"(M1 | y, X)(C_1 − y_F*)²
             + P"(M2 | y, Z)(C_2 − y_F*)²].                     (4.13)

Because the right-hand term inside the brackets of (4.13) is not a
function of X_F, and the left-hand term is not a function of Z_F, the
vector optimizing (4.13), D_F*, may be found by minimizing each of the
terms within the brackets separately. Thus, in order to find D_F*,
the single-period BMC certainty-equivalent control solution, the
following two problems must be solved:

    min_{X_F} P"(M1 | y, X)(C_1 − y_F*)²                        (4.14)

and

    min_{Z_F} P"(M2 | y, Z)(C_2 − y_F*)².                       (4.15)

Noting that P"(M1 | y, X) is not a function of X_F, and that
P"(M2 | y, Z) is not a function of Z_F, (4.14) and (4.15) reduce to
the following:

    min_{X_F} (C_1 − y_F*)² ;                                   (4.16)

    min_{Z_F} (C_2 − y_F*)².                                    (4.17)
Notice that (4.16) is the same as (4.8). Thus, for example, in order
to solve (4.16), X_F should be set equal to y_F*/b_1". Thus,
D_F* = (y_F*/b_1", y_F*/b_2")'. In words, the BMC
certainty-equivalent control solution is to set X_F as if M1 were in
fact the model generating y_F and to set Z_F as if M2 were the true
model.
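The separability argument behind D_F* can be checked numerically: minimizing (4.13) over each instrument alone recovers the two certainty-equivalent settings. The posterior quantities below are illustrative assumptions.

```python
# Numeric check that the separable objective (4.13) is minimized by
# setting each instrument at its certainty-equivalent value; the
# posterior quantities below are hypothetical.
p1, p2 = 0.65, 0.35        # posterior model probabilities
b1, b2 = 1.5, 0.8          # posterior slope means
y_star = 12.0              # target y_F*

def objective(xf, zf):     # (4.13) with C1 = b1*xf and C2 = b2*zf
    return p1 * (b1 * xf - y_star) ** 2 + p2 * (b2 * zf - y_star) ** 2

grid = [i * 0.01 for i in range(2001)]          # settings in [0, 20]
xf_star = min(grid, key=lambda x: objective(x, 0.0))
zf_star = min(grid, key=lambda z: objective(0.0, z))
print(abs(xf_star - y_star / b1) < 1e-6, abs(zf_star - y_star / b2) < 1e-6)
```

Because neither bracketed term in (4.13) involves the other instrument, each one-dimensional search lands on y_F*/b_1" and y_F*/b_2" regardless of the posterior model probabilities.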
The rationale behind the BMC control solution is that since the
controller is unsure of which of the two control instruments, X_F or
Z_F, affects y_F, and since he believes that only one of them actually
affects y, he should use them both fully in attempting to attain
y_F*. Due to the restrictive assumptions under which it was derived,
this solution is somewhat unrealistic. A more realistic solution
would account for the possibility that (1) costs might be incurred for
the use of an instrument, especially for the use of an inappropriate
instrument; (2) both instruments might affect y_F; (3) the instruments
interact in some manner; and/or (4) the process generating y may be
nonstationary. The first of these more realistic cases will be
discussed with respect to optimal BMC control in Section IV.4. At

1For a solution to how to account for the cost of changing the
setting of an instrument in the optimal single-period control problem
in which a particular model is assumed to generate y_F, see Zellner,
pp. 324-325.

that time, the appropriate optimal BMC and BMC certainty-equivalent
control solutions for various cases in which instrument use costs are
involved will be derived. Case (2) above is discussed in Section
IV.4.5, and an approach to case (4) is discussed in Chapter V.
IV.3.2 Risk Specification in Certainty-Equivalent Control

Even though a controller may behave as though the expected value
of y_F is certain to occur next period, he should not ignore the risk
involved in his choosing to do so. This risk may be represented by
y_F's predictive variance. The larger y_F's predictive variance, the
more likely that L(y_F − y_F*) = (y_F − y_F*)² will be large. Thus,
the controller can use y_F's predictive variance as a measure of the
risk involved in his attempt to attain y_F*. If the risk appears too
great, the controller may choose a different control method, perhaps
optimal single-period control (discussed in Section IV.4), since it
considers the size of y_F's predictive variance in determining
settings for the controller's policy instruments.
If the controller knows that a particular model, say M1, will
generate y_F, then, recalling (3.24), y_F's predictive variance, and a
measure of the risk being taken by the controller, is

    σ_1² = [(n − 1)S_1²/(n − 3)] (1 + X_F²/Σx_i²)               (4.18)

where x_i is the ith sample observation of X and the sum runs from
i = 1 to n. Notice that σ_1² is a function of the controller's
instrument setting, X_F. Consequently, since the control method
chosen affects X_F, it also influences the size of σ_1². In the case
of certainty-equivalent control, X_F = y_F*/b_1" and the predictive
variance is

    σ_1² = [(n − 1)S_1²/(n − 3)] [1 + (y_F*/b_1")²/Σx_i²].      (4.19)

If the controller acknowledges model specification uncertainty
and chooses to control via BMC certainty-equivalent control, then,
recalling (3.12), y_F's predictive variance is

    σ̄² = P"(M1 | y, X) σ_1² + P"(M2 | y, Z) σ_2²
        + P"(M1 | y, X)(μ_1 − μ̄)² + P"(M2 | y, Z)(μ_2 − μ̄)².    (4.20)

μ_i is the mean of y_F's predictive distribution as characterized by
model i, and μ̄ is the mean of the BMMP distribution for y_F.
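The relationship between the single-model and mixed risk measures can be computed directly from (4.18)-(4.20). The summary statistics below are hypothetical, chosen only to show the between-model spread that (4.20) adds.

```python
# Sketch of the risk measures (4.18)-(4.20) under assumed summary
# statistics (all numbers hypothetical). sigma_bar2 is the BMMP
# variance of (4.20): a mixture of the within-model predictive
# variances plus a between-model spread term.
n = 20
S1_sq, S2_sq = 1.2, 1.0        # residual variance estimates
sum_x2, sum_z2 = 50.0, 40.0    # sums of squared instrument observations
XF, ZF = 4.0, 6.0              # instrument settings
mu1, mu2 = 6.0, 4.8            # predictive means under M1 and M2
p1, p2 = 0.55, 0.45            # posterior model probabilities

def pred_var(S_sq, setting, sum_sq):    # equation (4.18)
    return (n - 1) * S_sq / (n - 3) * (1.0 + setting ** 2 / sum_sq)

s1sq = pred_var(S1_sq, XF, sum_x2)
s2sq = pred_var(S2_sq, ZF, sum_z2)
mu_bar = p1 * mu1 + p2 * mu2
sigma_bar2 = (p1 * s1sq + p2 * s2sq
              + p1 * (mu1 - mu_bar) ** 2 + p2 * (mu2 - mu_bar) ** 2)
print(sigma_bar2 > min(s1sq, s2sq))  # True: mixing cannot hide the risk
```

Since σ̄² is a probability-weighted average of σ_1² and σ_2² plus a nonnegative spread term, it always exceeds the smaller single-model variance, which is the quantitative content of the risk-understatement results that follow.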
Equation (4.18) provides an appropriate risk measure only if the
controller is certain that a particular model will generate y_F. If
he utilizes the BMS procedure to choose a model for y_F, he is
acknowledging that he is uncertain of the form of the process
generating y_F. Consequently, (4.18) is not an appropriate measure of
his risk. If the BMS procedure chooses, say, M1, σ_1² understates the
risk involved in his attempt to attain y_F*. The following lemma is
needed to prove this statement.
LEMMA 7: Let n > 3. When BMS and the max-R² rule provide equivalent
methods for choosing between M1 and M2 (see Section III.2.2), and
single-period certainty-equivalent control is applied to the model
chosen by BMS, say, model 1, then V(BMSP) = σ_1² ≤ V(BMMP) = σ̄².

PROOF: Suppose the BMS procedure chooses M1, and M1 is used to
control y_F. The certainty-equivalent control solution is
X_F = y_F*/b_1". Accordingly, σ_1² is as shown in (4.19). Raiffa and
Schlaifer show that

    b_1" = Σx_i y_i / Σx_i² ,

where x_i and y_i may be the ith sample observation of X and y, or
reflect prior information about β_1 in a form equivalent to sample
observations.1 Substituting for b_1" in (4.19) yields

    σ_1² = [(n − 1)S_1²/(n − 3)] [1 + y_F*² Σx_i²/(Σx_i y_i)²]. (4.21)

S_1², the estimated residual variance, is, by definition,

    S_1² = [Σy_i² − (Σx_i y_i)²/Σx_i²]/(n − 1).                 (4.22)

1Howard Raiffa and Robert Schlaifer, Applied Statistical Decision
Theory (Cambridge, Mass.: M.I.T. Press, 1961), p. 343.
Thus, a necessary and sufficient condition for S_1² = S_2² is

    (Σx_i y_i)²/Σx_i² = (Σz_i y_i)²/Σz_i² ,                     (4.23)

where z_i is the ith sample observation of Z. Accordingly, if
S_1² = S_2², then Σx_i²/(Σx_i y_i)² = Σz_i²/(Σz_i y_i)² and, noting
σ_1²'s definition in (4.21), σ_1² = σ_2².

If it can be shown that ∂σ_1²/∂S_1² > 0, then it can also be said
that σ_1² > σ_2² when S_1² > S_2². That ∂σ_1²/∂S_1² > 0 is
demonstrated in the next paragraph.

Noting that

    (Σx_i y_i)²/Σx_i² = Σy_i² − (n − 1)S_1² > 0,                (4.24)

(4.21) can be rewritten

    σ_1² = [(n − 1)S_1²/(n − 3)] [1 + y_F*²/(Σy_i² − (n − 1)S_1²)]. (4.25)

Taking the partial derivative of (4.25) with respect to S_1² yields

    ∂σ_1²/∂S_1² = (n − 1)² S_1² y_F*²/{(n − 3)[Σy_i² − (n − 1)S_1²]²}
                + (n − 1) y_F*²/{(n − 3)[Σy_i² − (n − 1)S_1²]}
                + (n − 1)/(n − 3).                              (4.26)

By (4.24), the denominators and, therefore, the first two terms on the
rhs of (4.26) are positive when n > 3. The third term on the rhs of
(4.26) is also obviously positive if n > 3. Consequently,
∂σ_1²/∂S_1² > 0.
Under the conditions of this lemma, the BMS procedure selects the
model with the lower S² (higher R²). Thus, since σ_1² = σ_2² when
S_1² = S_2², and σ_1² > σ_2² when S_1² > S_2², the BMS procedure also
selects the model with the lower predictive variance. Recall Theorem
2 of Chapter III, in which it was shown that if the BMS procedure
chooses the model with the lower predictive variance, then
V(BMMP) ≥ V(BMSP). Accordingly, by Theorem 2, the desired result is
obtained.1
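The monotonicity established in the proof is easy to confirm numerically. The sketch below evaluates (4.25) over a range of admissible S_1² values under assumed sample quantities and checks that it is increasing.

```python
# Numeric check of the monotonicity driving Lemma 7: with the
# certainty-equivalent setting substituted in, sigma_1^2 of (4.25) is
# increasing in S_1^2 on its admissible range (sample quantities
# below are assumed for illustration).
n = 20
sum_y2 = 100.0        # sum of the squared sample observations of y
yF_star_sq = 9.0      # (y_F*)^2

def sigma1_sq(S1_sq):  # equation (4.25)
    return (n - 1) * S1_sq / (n - 3) * (
        1.0 + yF_star_sq / (sum_y2 - (n - 1) * S1_sq))

# (4.24) requires sum_y2 - (n - 1)*S1_sq > 0, i.e. S1_sq < 100/19.
vals = [sigma1_sq(0.5 + 0.5 * k) for k in range(9)]   # S1_sq in [0.5, 4.5]
print(all(b > a for a, b in zip(vals, vals[1:])))     # True: increasing
```

The increasing relationship is what lets the proof translate "BMS picks the lower S²" into "BMS picks the lower predictive variance," after which Theorem 2 delivers the result.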
If model specification uncertainty exists, the BMMP distribution
of the BMC procedure is the appropriate distribution with which to
characterize y_F; any other procedure for determining the predictive
distribution will fail to include relevant information. Accordingly,
when model specification uncertainty exists, the appropriate measure
of the controller's risk is σ̄², not σ_1². As shown in Lemma 7, σ_1²
is less than σ̄² and therefore understates the controller's risk.

In this section, certainty-equivalent solutions have been
considered, but certainty-equivalent solutions are not fully optimal
in

1Recall that Theorem 2 showed that V(BMMP) ≥ V(BMSP). It was
noted, however, that V(BMMP) = V(BMSP) only when one or the other of
P'(M1) and P'(M2) equalled one. But neither of these cases involves
model specification uncertainty and, therefore, they are not of
interest in this dissertation. Therefore, under the conditions of
Lemma 7, V(BMMP) > V(BMSP) in cases of interest.

general. In the next section, optimal single-period BMC control
solutions will be derived.
IV.4 Optimal Single-Period Control

A control procedure will be referred to as providing an optimal
solution to a control problem if it explicitly recognizes all existing
major forms of uncertainty and utilizes the information provided by
them in its solution to the control problem. Thus, for example, for a
control procedure and its solution to be called optimal when the
controller knows the form of the model generating y, but does not know
the parameter values of the model, the procedure need only consider
residual and parameter uncertainty. However, should specification
uncertainty concerning the model be present as well, the procedure
would have to consider residual uncertainty, parameter uncertainty,
and model specification uncertainty. As discussed in Section IV.3,
the certainty-equivalent approach to economic control problems treats
residual and parameter uncertainty suboptimally and, unless BMC
certainty-equivalent control procedures are used, also treats model
specification uncertainty suboptimally. In this section, optimal
control solutions, i.e., solutions that appropriately treat residual,
parameter, and model specification uncertainty, will be derived using
the BMC procedure. These solutions will be referred to as "optimal
BMC control solutions."
Before proceeding with the derivation of optimal BMC control
solutions, mention should be made of the optimal control solution for
the case in which the controller knows the form of the model
generating y, but not its parameters. Assuming M1 is the true model,
and employing a quadratic loss function, Zellner shows that the
optimal solution to (4.2) is1

    X_F = y_F* / [Σx_i y_i/Σx_i² + (n − 1)S_1²/((n − 3)Σx_i y_i)]. (4.27)

Equation (4.27) may be rewritten so that its relationship to the
certainty-equivalent solution to this problem may be examined:

    X_F = (y_F*/b_1") · 1/[1 + (n − 1)S_1²/((n − 3) b_1"² Σx_i²)]. (4.28)

Recall from Section IV.3 that the certainty-equivalent solution is
X_F = y_F*/b_1". Thus, as Zellner has noted, the
certainty-equivalent solution is just the first factor on the rhs of
(4.28).2 Zellner has shown that as the precision of the estimation of
β_1 improves (i.e., as the posterior variance of β_1 decreases),

    (n − 1)S_1²/[(n − 3) b_1"² Σx_i²] → 0

and, accordingly, the second factor on the rhs of (4.28) approaches 1.
Thus, if b_1" is a very precise estimate of β_1, the
certainty-equivalent solution is approximately (4.28).1 Zellner has
also demonstrated that the use of the certainty-equivalent solution
leads to higher expected losses than the use of (4.28).2

1Zellner, pp. 320-322.

2Ibid.
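The gap between the optimal and certainty-equivalent settings can be computed directly. The sample moments below are assumed for illustration; the expected-loss function is the squared bias plus predictive variance, with posterior slope variance v = (n − 1)S_1²/[(n − 3)Σx_i²].

```python
# Sketch comparing Zellner's optimal setting (4.27) with the
# certainty-equivalent setting y_F*/b1'' under assumed sample moments
# (all numbers hypothetical).
n = 20
sum_x2, sum_xy = 50.0, 75.0    # so b1'' = 75/50 = 1.5
S1_sq = 1.2
y_star = 12.0                  # target y_F*

b1 = sum_xy / sum_x2
XF_ce = y_star / b1                                              # = 8.0
XF_opt = y_star / (b1 + (n - 1) * S1_sq / ((n - 3) * sum_xy))    # (4.27)

# Expected quadratic loss at a setting: squared bias plus predictive
# variance, with posterior slope variance v.
v = (n - 1) * S1_sq / ((n - 3) * sum_x2)
def exp_loss(xf):
    return (b1 * xf - y_star) ** 2 + v * xf * xf + (n - 1) * S1_sq / (n - 3)

print(XF_opt < XF_ce)                        # the optimal rule shrinks XF
print(exp_loss(XF_opt) < exp_loss(XF_ce))    # and attains lower expected loss
```

The second factor of (4.28) is strictly less than one whenever the posterior slope variance is positive, so the optimal rule always sets the instrument more conservatively than the certainty-equivalent rule, and by construction it achieves the lower expected loss.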
IV.4.1 Optimal BMC Control

The optimal BMC control solution is obtained by minimizing
expected loss over X_F and Z_F using the BMMP distribution of y_F.
This problem was stated in (4.3) and is repeated here for convenience:

    min_{D_F} ∫ L(y_F − y_F*) f(y_F | y, D, D_F) dy_F.          (4.29)

(4.29) is solved below for the two-model case (see the assumptions of
Section IV.2) under study in this dissertation.

Substituting (4.10) for f(y_F | y, D, D_F) in (4.29), the
minimization problem becomes

    min_{D_F} [P"(M1 | y, X) ∫ L(y_F − y_F*) f(y_F | M1, y, X, X_F) dy_F
             + P"(M2 | y, Z) ∫ L(y_F − y_F*) f(y_F | M2, y, Z, Z_F) dy_F]. (4.30)

1Ibid.

2Zellner, pp. 322-324.

Recalling that β_1 and ε, and β_2 and δ, are assumed to be
independent, the following transformations of variables can be made in
the first and second terms of (4.30), respectively, so that (4.30) may
be written in a more convenient form:

    y_F = β_1 X_F + ε ;
    y_F = β_2 Z_F + δ .

Thus, utilizing a quadratic loss function for L(y_F − y_F*), (4.30)
may be written

    min_{D_F} {P"(M1 | y, X) E[(y_F* − (β_1 X_F + ε))² | y, X, X_F]
             + P"(M2 | y, Z) E[(y_F* − (β_2 Z_F + δ))² | y, Z, Z_F]},  (4.31)

where the expectations are taken over (β_1, ε) and (β_2, δ),
respectively. It can be seen that, as in the case of the BMC
certainty-equivalent control problem, (4.31) separates into two
minimization problems,

    min_{X_F} E[(y_F* − (β_1 X_F + ε))² | y, X, X_F]            (4.32)

and

    min_{Z_F} E[(y_F* − (β_2 Z_F + δ))² | y, Z, Z_F].           (4.33)

Recall that (4.2) is the mathematical statement of the control problem
when it is known that M1 will generate y_F. After the transformation
of variables noted above, (4.2) and (4.32) are the same. Thus, the
solution to (4.32) will be the same as that derived by Zellner for
(4.2). Except that it is M2 that is known to be generating y_F in