Permanent Link: http://ufdc.ufl.edu/UF00097481/00001
 Material Information
Title: A Bayesian analysis of model specification uncertainty in forecasting and control /
Physical Description: ix, 152 leaves : ; 28 cm.
Language: English
Creator: Benson, Paul George, 1946-
Publication Date: 1977
Copyright Date: 1977
 Subjects
Subject: Bayesian statistical decision theory   ( lcsh )
Economic forecasting   ( lcsh )
Management thesis Ph. D   ( lcsh )
Dissertations, Academic -- Management -- UF   ( lcsh )
Genre: bibliography   ( marcgt )
non-fiction   ( marcgt )
 Notes
Thesis: Thesis--University of Florida.
Bibliography: Bibliography: leaves 148-150.
Additional Physical Form: Also available on World Wide Web
General Note: Typescript.
General Note: Vita.
Statement of Responsibility: by Paul George Benson.
 Record Information
Bibliographic ID: UF00097481
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
Resource Identifier: alephbibnum - 000209989
oclc - 04164547
notis - AAX6808










A BAYESIAN ANALYSIS OF MODEL SPECIFICATION UNCERTAINTY
IN FORECASTING AND CONTROL













By

PAUL GEORGE BENSON


A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF
THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY





UNIVERSITY OF FLORIDA

























Even though you now know the true
model of the process and have no
need for this dissertation
... this is for you, Dad.













ACKNOWLEDGEMENTS


For his technical insights, constant encouragement, unflagging

optimism, patience, and personal commitment to me and this dissertation

I am deeply indebted to my friend and major advisor, Dr. Christopher B.

Barry. Exposure to his enthusiasm for and approach to research has

sparked my interest in research and made me aware of career paths I

might otherwise have overlooked.

Dr. Ira Horowitz has made numerous helpful technical and editorial

suggestions. More importantly, at the time of my greatest personal need

his door was always open. His emotional support and understanding will

never be forgotten.

I am grateful to the other members of my committee, Dr. Roger D.

Blair, Dr. H. Russell Fogler, and Dr. James T. McClave, for not only

their comments and constructive criticisms of this dissertation, but for

their encouraging words and advice throughout my tenure at Florida. I

am fortunate to have such friends.

Dr. William Mendenhall has been both friend and advisor for many

years. It was he who originally aroused my interest in combining a

major in quantitative management with a minor in statistics. I have

benefited both personally and professionally from his advice, support,

and concern for my well being.









I particularly want to thank Dr. Max R. Langham for the interest

and confidence he showed in me during my first two years at Florida.

His words of encouragement made me feel capable of succeeding in a doc-

toral program and convinced me to continue on for a doctorate.

My colleagues David L. Hill and Ronald E. Shiffler took time from

their own dissertations to listen to and comment on my ideas. I particu-

larly want to thank Dave for his many very helpful suggestions.

My thanks to Kathy Jarboe at the University of Florida and Diane

Berube at the University of Minnesota for the excellent typing support

they provided while I was drafting chapters of this dissertation. Thanks

also to Kathy for cheerfully functioning as a go-between and red-tape

cutter at the University of Florida after I moved to Minnesota.

I am particularly indebted to Pat Kaluza for professionally

creating the final typed draft of this dissertation. Her willingness

to meet tight deadlines and ability to smile even after facing the pain-

ful notation of Chapter V will be remembered.

My sincere thanks to Elizabeth Wells for patiently listening to and

advising me on a potentially all-consuming personal problem. Our many

long talks did much to free my mind for work on this dissertation.

The thoughtfulness of friends Sharron K. Duncan and Susan G. Benson,

and new friends Thomas R. Lundstedt, Charlotte (Char) A. Lundstedt, and

Claris E. Loomis helped me to successfully negotiate the transition from

sunny Florida to frozen Minnesota last winter. Claris deserves special

mention. Her concern for my happiness and interest in my work helped










speed my return to work on this dissertation. Had we not met when we

did, this dissertation might still be a collection of partially finished

chapter drafts.

My first course in statistics was at Bucknell University in 1964.

The text we used was by Dr. Mendenhall, the instructor was my father.

No single course or approach to teaching has influenced me more.

Throughout my life, but particularly during the past two and a half

years, my father has been a great source of strength and inspiration.

Even though absent, he was present. The completion of this disser-

tation is as much his accomplishment as it is mine.

Finally, my thanks to my mother for not letting me set my goals

too low or letting me give up before attaining them. Your many, many

sacrifices have been and are appreciated.












TABLE OF CONTENTS


                                                                    Page

ACKNOWLEDGMENTS ...................................................  iii

ABSTRACT .......................................................... viii

Chapter

I.    INTRODUCTION ..................................................  1
      I.1  Statistical Models .......................................  2
      I.2  Model Specification Uncertainty ..........................  3
      I.3  The Bayesian Approach to Inference and Decision ..........  5
           I.3.1  The Predictive Distribution .......................  5
           I.3.2  The Posterior Distribution ........................  7
      I.4  Chapter Outline and Preview of Results ...................  7

II.   HYPOTHESIS TESTING, BAYESIAN MODEL SELECTION, AND
      BAYESIAN MODEL COMPARISON ..................................... 11
      II.1  Harold Jeffreys: Hypothesis Testing ..................... 11
      II.2  Harry V. Roberts: Comparing Forecasters ................. 13
      II.3  Martin S. Geisel: Bayesian Model Comparison
            and Selection ........................................... 16

III.  FORECASTING WITH AND WITHOUT REGARD FOR MODEL
      SPECIFICATION UNCERTAINTY ..................................... 25
      III.1  A Comparison of the Predictive Variances
             Generated by the Bayesian Mixed Model
             Distribution and the Bayesian Model Selection
             Procedure .............................................. 27
      III.2  Forecasting: Bayesian Model Comparison Versus
             Bayesian Model Selection and the Maximize-R² Rule ...... 36
             III.2.1  The Bayesian Model Selection Procedure (BMS) .. 37
             III.2.2  The Maximize-R² Rule .......................... 39
             III.2.3  The Bayesian Model Comparison Procedure (BMC) . 42
             III.2.4  Model Space and Assumptions ................... 45
             III.2.5  The Treatment of Model Specification
                      Uncertainty ................................... 48
             III.2.6  Risk Specification ............................ 50
             III.2.7  A Comparison of Expected Losses ............... 57
             III.2.8  Implications for Point Estimation ............. 59
             III.2.9  Implications for Interval Estimation .......... 62

IV.   MODEL SPECIFICATION UNCERTAINTY IN SINGLE-PERIOD
      ECONOMIC CONTROL PROBLEMS ..................................... 68
      IV.1  The Economic Control Problem ............................ 69
      IV.2  Model Space and Assumptions ............................. 74
      IV.3  Single-Period Certainty-Equivalent Control .............. 76
            IV.3.1  Certainty-Equivalent Control Using
                    the BMMP Distribution ........................... 78
            IV.3.2  Risk Specification in Certainty-
                    Equivalent Control .............................. 82
      IV.4  Optimal Single-Period Control ........................... 87
            IV.4.1  Optimal BMC Control ............................. 89
            IV.4.2  Optimal BMC Control When Instrument
                    Use Costs Are Considered ........................ 94
            IV.4.3  Certainty-Equivalent BMC Control
                    Solutions When Instrument Use Costs
                    Are Considered .................................. 104
            IV.4.4  Risk Specification in Optimal BMC Control ....... 107
            IV.4.5  BMC Control When More Complicated Models
                    Are Included in the Model Space ................. 108

V.    BAYESIAN MODEL SWITCHING ...................................... 112
      V.1  Bayesian Model Switching Methodology ..................... 112
      V.2  Special Case: Two Normal Models .......................... 122

VI.   CONCLUDING COMMENTS AND SUGGESTIONS FOR FURTHER
      RESEARCH ...................................................... 139
      VI.1  Research Difficulties Encountered ....................... 141
      VI.2  Shortcomings of the BMC Procedure and
            Suggestions for Further Research ........................ 143
      VI.3  Suggestions for Further Research in the Areas
            of Economic Control and Model Nonstationarity ........... 145

BIBLIOGRAPHY ........................................................ 148

BIOGRAPHICAL SKETCH ................................................. 151











Abstract of Dissertation Presented to the Graduate Council
of the University of Florida in Partial Fulfillment
of the Requirements for the Degree of Doctor of Philosophy



A BAYESIAN ANALYSIS OF MODEL SPECIFICATION UNCERTAINTY
IN FORECASTING AND CONTROL

By

Paul George Benson

August 1977

Chairman: Christopher B. Barry
Major Department: Management


The use of statistical models for forecasting and economic control

has received widespread attention in recent years. Most of this atten-

tion has been focused on the problems caused by uncertainty concerning

the parameters of a given model, whereas little attention has been paid

to the problems caused by uncertainty concerning the specification of

the model itself. In this dissertation Bayesian methodology is employed

to treat model specification uncertainty in forecasting and control

environments. The implications of forecasting with and without formal

regard for model specification uncertainty are explored via a comparison

of the recommended methodology and alternative methods which involve the

selection of a single model. The recommended methodology is applied to

single-period economic control problems. In particular, certainty-

equivalent and optimal analytic solutions are found for problems in

which there exist two viable alternative linear models of the data-

generating process, each with a different instrument and no intercept












term. Solutions are obtained for situations in which control is

cost-free and in which various instrument-use cost functions are known.

Finally, a Bayesian procedure for modeling and making inferences about

particular nonstationary data-generating processes is introduced. This

procedure characterizes data as being generated by different statistical

models in different time periods with the switch between models con-

trolled by some random process.














CHAPTER I

INTRODUCTION



The use of statistical models for forecasting and economic
control has received widespread attention in recent years. Most of
this attention has been focused on the problems caused by uncertainty
concerning the parameters of a given model. As a result, much has
been written about parameter specification and estimation and their
decision-making implications, whereas little analytical attention
has been paid to the problems caused by uncertainty concerning the
specification of the model itself. The implications of this type of
uncertainty for forecasting and control are virtually unexplored. That
these implications are significant and worth exploring has been
expressed by Pierce:

    Another area of uncertainty has to do with our models . . . .
    The problem lies not only with uncertainty concerning the true
    value of model parameters, but also with the structure of models
    themselves. . . . We have found that with some relatively minor
    changes in the specification of our quarterly model . . . we can
    importantly alter its policy multipliers.¹



¹J. L. Pierce, "Quantitative Analysis for Decisions at the Federal
Reserve," Annals of Economic and Social Measurement, 3 (1974), 1-9.









In this dissertation the Bayesian Model Comparison procedure

developed by Geisel¹ from the work of Roberts² is advocated as a method

for formally treating model specification uncertainty in forecasting

and control problems. The implications for forecasting with and

without regard for model specification uncertainty are examined, and

the Bayesian Model Comparison procedure is applied to simple single-

period economic control problems.

The following sections of this chapter introduce definitions and

discuss concepts that will be referred to throughout the remainder

of the dissertation.


I.1 Statistical Models

Throughout this dissertation the term "model" refers to a para-

metric statistical characterization of a data-generating process

composed of both deterministic and random components. The general

linear model used in regression analysis is an example of such a charac-

terization. Each such model describes the data-generating process via

a family of probability density functions in which each member of the

family depends on a finite number of parameters, probability density

functions over the parameters, and predetermined values of a specified

set of variables upon which it has been hypothesized that the data-

generating process depends.


¹Martin S. Geisel, "Comparing and Choosing Among Parametric Sta-
tistical Models: A Bayesian Analysis with Macroeconomic Applications"
(Ph.D. dissertation, University of Chicago, 1971).

²Harry V. Roberts, "Probabilistic Prediction," Journal of the
American Statistical Association, 60 (March, 1965), 50-62.











Statistical models are used to describe the stochastic behavior

of a data-generating process. Decision makers use them "as if" they

were actually generating the data of interest. Any reference to a

model as being the "true" or "correct" model of a data-generating

process should not be taken literally. A model is referred to as

being the "true" model only insofar as it behaves "as if" it were

generating the observed data.


I.2 Model Specification Uncertainty

The statistical models discussed in the previous section ex-

plicitly admit uncertainty about the data-generating process through

their parameters and random error terms. These two sources of uncer-

tainty will be referred to as parameter uncertainty and random error

(or residual) uncertainty. Random error is present in each model

since the deterministic component of the model cannot realistically

be expected to account for all factors influencing a realization of

the data-generating process. Parameter uncertainty is present since

a model's parameters are typically not observable and must be estimated

from sample data. Being explicitly present in a statistical model,

these two types of uncertainty and their implications for decision

making have received considerable attention in the literature.

Thus, it is well-known that the appropriate use of a statistical model

in decision making requires the consideration and treatment of both

parameter and random error uncertainty.


References are provided in Chapters III and IV.








When a decision maker is uncertain as to the functional form of

his model and/or is uncertain as to the set of variables upon which

the data-generating process depends, model specification uncertainty

is said to be present. Model specification uncertainty and its decision-

making implications have received little attention in the literature.

As a result, model specification uncertainty is typically ignored or

assumed away in the statistical analysis of data-generating processes

that precedes decision-making. The usual procedure is for the decision-

maker to utilize sample information to aid in the selection of a model

from a set of models he believes to be viable alternative representa-

tions of the data-generating process. The chosen model is then assumed

to appropriately represent the data-generating process,and the decision

maker bases his decisions on the information provided by this model.

Such a procedure can formally consider only parameter and random error

uncertainty. Depending on the particular model selection procedure

utilized, model specification uncertainty is either completely ignored

or suboptimally treated. The result is that some or all of the informa-

tion provided about the data-generating process by the set of models

which were not chosen, but were believed to be viable, is lost. This

loss is analogous to the information loss that would occur if the deci-

sion maker assumed he knew the parameters of a given model and made his

decisions without acknowledging parameter uncertainty. Chapters III and

IV will discuss in detail the decision-making implications of the infor-

mation loss caused by failing to treat model specification uncertainty.


¹An interesting exception is the recent paper by M. Brenner, "The
Effect of Model Misspecification on Tests of the Efficient Market
Hypothesis," Journal of Finance, 32 (1977), 57-66. There are other
exceptions as well.










I.3 The Bayesian Approach to Inference and Decision Making

In this dissertation uncertainty is dealt with via Bayesian
inferential procedures. This section briefly reviews the methodology
of Bayesian inference.¹

I.3.1 The Predictive Distribution

Decisions frequently hinge on the future outcome of a data-

generating process. In such cases decision makers typically use a

statistical model to characterize the data-generating process. If

model specification uncertainty is negligible and the parameters of

the model are known, then the decision maker can feel secure in basing

his decision on the information provided him by his model. However, if

the parameters of the model are unknown the model should be altered to

reflect the decision-maker's uncertainty concerning the parameters.

This can be accomplished by treating the parameters as random variables,

utilizing a probability distribution over the parameters to reflect the

decision-maker's parameter uncertainty, and computing the marginal

distribution of future realizations of the data-generating process,

i.e., the distribution of future realizations which is not conditioned

on the model's parameters.

Suppose the decision-maker's statistical model describes the data-
generating process via the sampling distribution f(yF|θ), where yF is
a future value of the data-generating process (yF ∈ Y) and the
parameters of the data-generating process are represented by θ
(θ ∈ Θ). Then, if the decision-maker's parameter uncertainty can be
described by a probability distribution g'(θ), the decision maker can
compute the marginal distribution of future realizations of the data-
generating process as follows:

    f(yF) = ∫Θ g'(θ)f(yF|θ)dθ.                                    (1.1)

This distribution is referred to as a predictive distribution.

¹For a thorough discussion of the methodology reviewed in this section
see Howard Raiffa and Robert Schlaifer, Applied Statistical Decision
Theory (Cambridge, Mass.: The M.I.T. Press, 1961).

If the decision maker is able to obtain a sample from the data-
generating process of interest he may update his distribution of θ to
reflect the sample information. Then, utilizing his revised distri-
bution of θ, he may recompute his predictive distribution of yF so
that it too reflects the sample information. The revision of g'(θ) is
accomplished via Bayes' Rule:

    f"(θ|y) = g'(θ)f(y|θ) / ∫Θ g'(θ)f(y|θ)dθ.                     (1.2)

The function g'(θ) is called the decision-maker's prior distribution of
θ since it was established prior to obtaining the sample y. The
function f(y|θ) is a likelihood function. It describes the likelihood
of the given sample result, y, for different values of θ. The function
f"(θ|y) is the decision-maker's revised distribution of θ. It is called
a posterior distribution since it was computed following the receipt of
sample information. The posterior distribution reflects all the infor-
mation about θ currently available to the decision maker. This infor-
mation may be incorporated into his predictive distribution of yF as
follows:











    f(yF|y) = ∫Θ f"(θ|y)f(yF|θ)dθ.                                (1.3)

It is from this distribution that needed information about future

observations of the data-generating process should be extracted. As

more sample and/or subjective information about the process becomes

available, the decision maker can formally revise his predictive dis-

tribution to reflect that information by repeatedly applying the

above procedure.
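As a rough illustration of how (1.1) and (1.3) can be carried out, the sketch below approximates a predictive distribution on a discrete grid of θ values. The normal sampling model, the prior parameters, and the helper names are hypothetical choices made for the illustration; they are not taken from the dissertation.

```python
import numpy as np

def npdf(x, mu, sd):
    """Normal density; used for both the prior g'(θ) and the likelihood f(yF|θ)."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Discrete grid over the unknown mean θ of a normal process with known sd = 1.
theta = np.linspace(-5.0, 5.0, 401)
g_prior = npdf(theta, 0.0, 2.0)
g_prior /= g_prior.sum()          # normalize to a discrete prior over the grid

def predictive(y_f, weights):
    """Grid version of (1.1)/(1.3): f(yF) = sum over θ of w(θ) f(yF|θ)."""
    return float(np.sum(weights * npdf(y_f, theta, 1.0)))

# Under these choices the prior predictive is close to a normal with
# variance 2² + 1² = 5, so its density peaks near 1/sqrt(2π·5) ≈ 0.178.
print(predictive(0.0, g_prior))
```

The same `predictive` function yields the posterior predictive (1.3) when it is handed posterior rather than prior weights, which is exactly the revision step described above.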


I.3.2 The Posterior Distribution

There are three inputs to Bayes' Rule: (1) the decision-maker's
prior information about θ expressed via g'(θ); (2) sample observations
from the data-generating process; and (3) the choice of the functional
form of the data-generating process, i.e., the choice of a likelihood
function. The output of Bayes' Rule is an inferential statement
about θ in the form of a probability distribution, f"(θ|y). A
decision maker interested in obtaining information about a param-
eter of the data-generating process should compute f"(θ|y). The
function f"(θ|y) can stand alone as an inferential statement about θ,
or it can be used to determine point and interval estimates of θ. As
more sample and/or subjective information about the data-generating
process becomes available, Bayes' Rule can be reapplied to revise
f"(θ|y). The sequential application of Bayes' Rule permits the
decision maker to formally learn about θ over time.
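The sequential learning just described can be sketched numerically. The grid, the normal likelihood, and the observation values below are hypothetical; the point is only that each application of Bayes' Rule (1.2) reweights the current distribution of θ by the likelihood of the new observation and renormalizes.

```python
import numpy as np

def npdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

theta = np.linspace(-5.0, 5.0, 401)   # discrete grid over the parameter θ
dist = npdf(theta, 0.0, 2.0)          # start from the prior g'(θ)
dist /= dist.sum()

# Sequential Bayes' Rule: multiply by the likelihood f(y|θ) of each new
# observation and renormalize; the output of one step serves as the prior
# for the next, so the decision maker learns about θ over time.
for y in [1.2, 0.8, 1.5, 1.1]:
    dist *= npdf(y, theta, 1.0)
    dist /= dist.sum()

posterior_mean = float(np.sum(theta * dist))
print(posterior_mean)   # pulled from the prior mean 0 toward the sample mean
```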


I.4 Chapter Outline and Preview of Results

Typically econometric forecasting and control models are developed










and used without formally considering the full impact of model specifi-

cation uncertainty. The usual procedure is to (1) utilize a model se-

lection technique to choose one model from a set of alternative compe-

ting models to characterize the data-generating process, and (2) assume

the chosen model to be the correct model of the data-generating process

and use it to forecast and/or control the process. Such procedures

either ignore or do not fully consider the information about the data-

generating process contributed by the models that were proposed as

being viable but were not selected by the model selection procedure.

Further, in assuming the chosen model is the correct model of the

process, the forecaster or controller is behaving as though he faces a

lesser degree of uncertainty than is really the case. Thus, in utilizing

model selection procedures, forecasters and controllers are simultane-

ously discarding relevant information about the data-generating process

and behaving as if they have more information than is actually possessed.

This dissertation advocates the use of the Roberts/Geisel Bayesian

Model Comparison Procedure as a means of comprehensively treating model

specification uncertainty and avoiding such contradictory behavior. The

Bayesian Model Comparison Procedure and its origins are described in

Chapter II. Chapter II also describes a Bayesian model selection pro-

cedure referred to herein as the Bayesian Model Selection Procedure.

In Chapter III, the effects of forecasting with and without regard

for model specification uncertainty are examined by comparing forecasts

determined via the Bayesian Model Comparison procedure (BMC) with those

yielded by a Bayesian procedure which fails to appropriately consider










model specification uncertainty, the Bayesian Model Selection procedure

(BMS). The following results are derived:

1. If the variance of the decision-maker's predictive distribution

is used to measure forecast-risk, and a decision maker fore-

casts via the BMS procedure rather than the BMC procedure,

the risk he takes in predicting future values of the data-

generating process is misspecified.

2. A decision-maker's posterior expected loss from using a fore-

cast derived via the BMC procedure is less than his posterior

expected loss from forecasting via the BMS procedure.

3. Point estimates derived via BMS are frequently misplaced.

4. The reliability of credible intervals derived via the BMS

procedure may be misspecified.
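Result 1 can be made concrete with a toy calculation. Treating the BMC predictive distribution as a probability-weighted mixture of the per-model predictives, the model probabilities, means, and variances below are hypothetical numbers chosen only to show that when the models' predictive means disagree, the mixture variance exceeds the variance reported by the single model that BMS selects.

```python
# Hypothetical posterior model probabilities and per-model predictive moments.
p = [0.6, 0.4]   # P(model i | data)
m = [2.0, 3.0]   # predictive mean under each model
v = [1.0, 1.5]   # predictive variance under each model

# Moments of the mixture predictive:
#   mean = sum_i p_i m_i,   var = sum_i p_i (v_i + m_i^2) - mean^2.
bmc_mean = sum(pi * mi for pi, mi in zip(p, m))
bmc_var = sum(pi * (vi + mi ** 2) for pi, vi, mi in zip(p, v, m)) - bmc_mean ** 2

# BMS keeps only the highest-probability model and reports its variance,
# ignoring the between-model spread of the predictive means.
bms_var = v[p.index(max(p))]

print(bmc_var, bms_var)   # the BMS figure understates the forecast risk
```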


In Chapter IV, the BMC procedure is applied to simple single-

period economic control problems. In particular, certainty-equivalent

and optimal analytic solutions are found for the case of two competing

linear models each with a different instrument (controllable variable)

and no intercept term. The following results are obtained:

1. The BMC certainty-equivalent control solution is to set both

instruments as if each instrument's respective model were in

fact the true model of the data-generating process.

2. If the variance of the controller's predictive distribution

is used to measure control-risk, and certainty-equivalent

control is utilized, it can be shown that under certain

circumstances the BMS approach to control always understates










the control-risk involved.

3. The optimal BMC control solution is to set both instruments as

if each instrument's respective model were in fact the true

model of the process. Since optimal BMC control treats model

specification uncertainty, parameter uncertainty, and residual

uncertainty, whereas certainty-equivalent control treats only

model specification uncertainty, the optimal BMC control solu-

tion differs from the BMC certainty-equivalent control solu-

tion.


Certainty-equivalent and optimal BMC control solutions for cases where

instrument use costs are known are also derived in Chapter IV.

In Chapter V, a procedure for handling model nonstationarity is

introduced. Called Bayesian Model Switching, this procedure was

suggested by anomalies observed in sequences of posterior model proba-

bilities generated by the BMS and BMC procedures. The Bayesian Model

Switching procedure characterizes the data-generating process in a

manner similar to Quandt's switching regression regimes.1

Chapter VI contains an overview of the dissertation, a discussion

of the shortcomings of the Bayesian Model Comparison and Bayesian

Model Switching procedures, and suggestions for future work in the

area of model specification uncertainty.



¹R. E. Quandt, "A New Approach to Estimating Switching Regressions,"
Journal of the American Statistical Association, 67 (March, 1972),
306-310.












CHAPTER II

HYPOTHESIS TESTING, BAYESIAN MODEL SELECTION, AND
BAYESIAN MODEL COMPARISON


The Bayesian Model Comparison approach to handling model specifi-

cation uncertainty in decision-making problems has its origins in the

hypothesis testing work of Harold Jeffreys¹ and is a direct spin-off of

a Bayesian procedure developed by Harry Roberts² for combining expert

opinions. Martin Geisel³ adapted Roberts' work for use in econometrics

and in so doing formalized the Bayesian Model Comparison and Bayesian

Model Selection procedures. The contributions of Jeffreys, Roberts,

and Geisel to the existing Bayesian Model Comparison and Bayesian Model

Selection procedures are discussed in this chapter.

II.1 Harold Jeffreys: Hypothesis Testing⁴

In considering two mutually exclusive and exhaustive hypotheses

about the parameter vector e of a probability density function,

Jeffreys suggests that the decision maker should place prior probability

masses on each of the hypotheses. The probabilities should be con-

sistent with the decision maker's prior information and, consequently,


¹Harold Jeffreys, Theory of Probability (London: Oxford University
Press, 1961), Chapters 4 and 5.
²Roberts, pp. 50-62.
³Geisel, pp. 1-45.
⁴Jeffreys, Chapters 4 and 5.









prior beliefs about the appropriateness of each of the hypotheses.
Thus, if the two hypotheses H and H1 are exhaustive and nonoverlapping
their prior probabilities P'(H ) and P'(HI) would be assessed, and must
sum to one. If H and H1 are a prior equally likely, P'(H ) = P'(H1).
It is assumed that given H a future sample result y has probability
density function f(ylH ), and that given H1 is true, y's probability
density function is f(ylH1). Then, using Bayes' Rule, the posterior
probability that, say, H is the appropriate hypothesis is
P'(H )f(y H ) )
P"(Hy) = P'(Ho)f(y|H ) + P'(H )f(yH) (2.1)

and P"(H1|y) = 1 - P"(H0|y). After determining P"(H0|y) and P"(H1|y),
the decision maker can choose as the more appropriate hypothesis the one
with the higher posterior probability. Or, if the decision maker can
economically determine the losses involved from choosing an incorrect
hypothesis, he can use P"(H0|y) and P"(H1|y) to determine the expected
loss of choosing H0 or H1 and then select as being the more appropriate
the hypothesis that minimizes his expected loss.
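With hypothetical numbers, the revision in (2.1) can be carried out directly; the prior and likelihood values below are illustrative only:

```python
# Illustration of the revision in (2.1): two exhaustive, nonoverlapping
# hypotheses H0 and H1. All numbers are hypothetical.
def posterior_h0(prior_h0, lik_h0, lik_h1):
    """P"(H0|y) by Bayes' Rule, equation (2.1)."""
    prior_h1 = 1.0 - prior_h0
    return prior_h0 * lik_h0 / (prior_h0 * lik_h0 + prior_h1 * lik_h1)

# Suppose the sample result y is three times as likely under H1 as under H0,
# and the two hypotheses are a priori equally likely.
p_h0 = posterior_h0(prior_h0=0.5, lik_h0=0.1, lik_h1=0.3)
p_h1 = 1.0 - p_h0   # P"(H1|y)
```

Because the likelihood ratio is 1:3 and the priors are equal, the posterior probabilities simply mirror the likelihood ratio.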
More formally, if H0 is θ = θ0 and H1 is θ = θ1, where θ0 and θ1
are particular values of the parameter vector (i.e., two simple
hypotheses), then (2.1) would be

P"(H0|y) = P"(θ=θ0|y) = P'(θ=θ0)f(y|θ=θ0) / [P'(θ=θ0)f(y|θ=θ0) + P'(θ=θ1)f(y|θ=θ1)]       (2.2)

If H0 is θ ∈ Ψ1 and H1 is θ ∈ Ψ2, where Ψ1 and Ψ2 (Ψ1 ∪ Ψ2 = Θ) are
mutually exclusive and exhaustive sets (i.e., H0 and H1 are two
composite hypotheses), then it is necessary for the decision maker to










assess a prior pdf for θ over Ψ1, P'(θ|θ∈Ψ1), and another for θ over Ψ2,
P'(θ|θ∈Ψ2). Then (2.1) would be

P"(H0|y) = P"(θ∈Ψ1|y) = P'(θ∈Ψ1)f(y|θ∈Ψ1) / f(y)       (2.3)

where f(y|θ∈Ψi) = ∫_Ψi P'(θ|θ∈Ψi)f(y|θ, θ∈Ψi)dθ

and f(y) = Σ_{i=1}^{2} P'(θ∈Ψi)f(y|θ∈Ψi).
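A discrete sketch of the composite-hypothesis case (2.3): the integral over Ψi becomes a sum, here over a hypothetical four-point parameter space split into Ψ1 and Ψ2, with a single Bernoulli observation y = 1. All values are illustrative:

```python
# Hypothetical discrete version of (2.3): theta takes four values,
# Psi1 = {0.2, 0.4} and Psi2 = {0.6, 0.8}; y is Bernoulli(theta), y = 1 observed.
def marginal_lik(prior_within, lik):
    """f(y | theta in Psi_i): likelihood averaged over the within-set prior."""
    return sum(p * l for p, l in zip(prior_within, lik))

lik1 = [0.2, 0.4]                        # f(y=1 | theta) for theta in Psi1
lik2 = [0.6, 0.8]                        # f(y=1 | theta) for theta in Psi2
f_y_H0 = marginal_lik([0.5, 0.5], lik1)  # f(y | theta in Psi1)
f_y_H1 = marginal_lik([0.5, 0.5], lik2)  # f(y | theta in Psi2)

prior_H0 = prior_H1 = 0.5
f_y = prior_H0 * f_y_H0 + prior_H1 * f_y_H1   # marginal f(y)
post_H0 = prior_H0 * f_y_H0 / f_y             # P"(theta in Psi1 | y), per (2.3)
```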

The next section discusses Harry Roberts' important extension of

Jeffreys' work.

II.2 Harry V. Roberts: Comparing Forecasters1

Roberts was concerned with "reconciling conflicting expert inter-

pretations of the same data."2 Building on Jeffreys' work, Roberts

devised a method for discriminating among a set of alternative para-

metric statistical models each of which purports to describe some

random process of interest. This Bayesian discrimination procedure

will be discussed in detail in the next section.

It will be assumed that person C knows nothing about a particular

data-generating process f(y|θ), but wishes, for example, to predict

future y values and is, therefore, interested in learning about the

process. Persons A and B possess knowledge about the same process.

A and B express their knowledge about f(y|θ) via the data distributions



1Roberts, pp. 50-62.
2Ibid., p. 55.










f(y|θ, A) and f(y|θ, B), respectively, and their prior distributions on
the parameter θ, g(θ|A) and g(θ|B). θ may be a vector. For expository

purposes, only two individuals will be assumed to possess knowledge

about the process, and all probability distributions of this section

will be assumed to be discrete.1

C's prior distribution for the parameter θ may be expressed as:

g'(θ) = P'(A)g(θ|A) + P'(B)g(θ|B).       (2.4)

P'(A) and P'(B) sum to one and may be thought of as C's probability

assessment of the accuracy of A's judgment and B's judgment, respec-

tively. If C had some knowledge about the reliability of opinions

expressed by A and B he might tend to respect the opinion of one,

say A, more than the other, and so assign P'(A) > P'(B). If C knew

nothing about either A or B it would be appropriate for him to assess

P'(A) = P'(B) = .5. C can then learn about f(y|θ) by combining his

thoughts (if any) about A and B (reflected in P'(A) and P'(B)) with the

opinions expressed by A and B about f(y|θ) (represented by g(θ|A),

f(y|θ, A), g(θ|B), and f(y|θ, B)) as in (2.4), and by using sample
information to revise (2.4). Thus it is C's posterior distribution of

θ, g"(θ|y), that C should use in predicting y. Roberts' development of

g"(θ|y) is outlined in the next paragraph.




1For another approach to the use of expert opinion see Peter
Morris, "Decision Analysis Expert Use," Management Science, 20 (May,
1974), 1233-41 and "Combining Expert Judgments: A Bayesian Approach,"
Management Science, 23 (March, 1977), 679-693.









Following Roberts, let λ index the opinions of A and B, i.e., when
λ = λA reference is being made to person A, and when λ = λB reference
is being made to person B. With C's prior distribution for λ denoted
by P'(λ), C's joint prior distribution for λ and θ is denoted:

g'(θ,λ) = P'(λ)g(θ|λ).       (2.5)

Accordingly, C's marginal prior distribution for θ is denoted by

P'(θ) = Σ_λ P'(λ)g(θ|λ).       (2.6)

Equations (2.6) and (2.4) are equivalent. C's joint posterior distri-
bution of λ and θ is obtained via Bayes' Rule as follows:

h"(λ,θ|y) = h'(λ,θ)f(y|λ,θ) / Σ_λ Σ_θ h'(λ,θ)f(y|λ,θ) = P'(λ)g(θ|λ)f(y|λ,θ) / f(y).       (2.7)

f(y|λ,θ) represents the likelihood of observing the sample result y
given particular values for λ and θ. f(y) is the marginal distribution
of the data. Then, recognizing that g(θ|λ)f(y|λ,θ) = f(y|λ)g(θ|y,λ),
g"(θ|y) is obtained from (2.7) as follows:

g"(θ|y) = Σ_λ P'(λ)g(θ|λ)f(y|λ,θ) / f(y)       (2.8)

        = Σ_λ [P'(λ)f(y|λ) / f(y)] g(θ|λ,y)

        = Σ_λ P(λ|y)g(θ|λ,y).

Thus C's posterior distribution of θ is a weighted average of A's
posterior distribution of θ and B's posterior distribution of θ.
Roberts points out that if (2.7) is summed over θ instead of λ,
as was done in (2.8), the marginal posterior distribution of λ is
obtained:










P(λ|y) = P'(λ)f(y|λ) / f(y).       (2.9)

Roberts notes that in statistical discrimination problems where it is

assumed that y is generated by either f(y|λA) or f(y|λB), P'(λi) (i=A,B)

may be interpreted as the discriminator's prior probability that f(y|λi)

generates y, and P(λi|y) may be interpreted as the discriminator's

posterior probability that f(y|λi) generates y. Roberts suggests that

discrimination between these two alternative generating processes should

be accomplished via examination of the posterior odds ratio,
P(λA|y) / P(λB|y).
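The posterior odds ratio can be sketched numerically. The marginal likelihoods f(y|λA) and f(y|λB) below are hypothetical stand-ins for the quantities that would come from each expert's model:

```python
# Hypothetical sketch of Roberts' discrimination via the posterior odds
# ratio P(lambda_A|y)/P(lambda_B|y) in (2.9). Priors and marginal
# likelihoods are illustrative numbers only.
def posterior_odds(prior_a, prior_b, marg_lik_a, marg_lik_b):
    """Posterior odds of A's model versus B's model; f(y) cancels."""
    return (prior_a * marg_lik_a) / (prior_b * marg_lik_b)

odds = posterior_odds(0.5, 0.5, marg_lik_a=0.08, marg_lik_b=0.02)
# odds greater than one favors f(y|lambda_A) as the generator of y
```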
Roberts' interpretation of P(λi|y), and his suggested procedure for
discriminating among alternative statistical models, were formalized by

Geisel in his Bayesian Model Selection and Bayesian Model Comparison

schemes. Geisel's extension of Roberts' work is discussed in the next

section.

II.3 Martin S. Geisel: Bayesian Model Comparison and Selection1

Geisel's work was concerned with Bayesian procedures for comparing
and choosing among parametric statistical models. His procedure for

comparing models will be referred to as the Bayesian Model Comparison
(BMC) approach. His procedure for choosing one model from among a set

of competing models uses the same methodology as the BMC approach but
for different purposes. Consequently, the latter procedure will be

referred to here as the Bayesian Model Selection (BMS) procedure.


1Geisel, pp. 1-45.










Suppose the decision maker feels that any one of N alternative

models could represent the data-generating process of interest to him.

Denote by P'(Mi), i=1,2,...,N the decision-maker's prior proba-

bility that Mi, the ith model, is an accurate representation of the

data-generating process. If the decision maker assesses P'(Mi) > 0,

then the model should be included in the set of N models. It follows

that Σ_{i=1}^{N} P'(Mi) = 1. The unknown vector of parameters of Mi is denoted

by θi, i=1,...,N, where θi ∈ Θi. The decision maker's knowledge about

θi is described via a prior density function, g'(θi|Mi).
If Mi were known to be the true model and its parameters were

known to be θi, the data-generating process could be completely charac-

terized by the density function f(y|θi, Mi, Di), where y is the random

variable of interest to the decision maker.1 In the forecasting

problems of Chapter III, Di, which may be a vector, will be the

explanatory variables of Mi and will be used to help forecast future

values of y. In the economic control problems of Chapter IV, Di will

be the independent variables of Mi and will be under the control of the

decision maker. Once y has been observed, f(y|θi, Mi, Di), viewed as

a function of θi, Mi and Di, is a likelihood function and can be used

to make inferences from the data about the correct model and about the

parameters of all the models.




1y may be vector-valued, but in order to simplify the notation
and discussion to follow it is assumed that y is a scalar.










As new information is received about the data-generating process

being modeled, i.e., as y is observed, the prior distribution on the
parameters of Mi should be revised to reflect this new information.

Revision for a single model is accomplished exactly as if the parame-
ter distribution of a known data-generating process with unknown

parameters were being revised. Applying Bayes' Rule yields

g"(θi|Mi,y,Di) = g'(θi|Mi)f(y|θi,Mi,Di) / f(y|Mi,Di)       (2.10)

where

f(y|Mi,Di) = ∫_Θi g'(θi|Mi)f(y|θi,Mi,Di)dθi.       (2.11)

The function g"(θi|Mi,y,Di) is the posterior distribution of θi.

Given Mi and Di, and before observing y, f(y|Mi,Di) is commonly

called the predictive density function of y. It is the distribution

of future realizations of the data-generating process conditioned on

Mi being the correct model of the process and unconditioned on θi,

the parameters of Mi. Having observed y, f(y|Mi,Di) may be thought
of as a "model likelihood" since it compares the relative likelihood
of the data, y, across models. Utilizing these model likelihoods,

Bayes' Rule is invoked a second time to revise the prior model
probabilities:

P"(Mi|y,D) = P'(Mi)f(y|Mi,Di) / f(y|D)       (2.12)

where

f(y|D) = Σ_{i=1}^{N} P'(Mi)f(y|Mi,Di).       (2.13)

P"(Mi|y,D) is the posterior probability that Mi is the correct model.

f(y|D) is a predictive distribution, a distribution of future

realizations of the data unconditioned on a particular model being

the correct model. D, written without a subscript, is a vector

comprised of the set of decision variables, Di, i=1,2,...,N, from all

N models.
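The two-stage revision in (2.10)-(2.13) can be sketched with discrete parameter grids, in which case the integral in (2.11) becomes a sum. The two Bernoulli-type models and all numbers below are hypothetical:

```python
# Hypothetical discrete sketch of (2.10)-(2.13): two models, each with a
# two-point parameter grid; observing y revises both the parameter priors
# (2.10) and the model probabilities (2.12).
def revise(prior_models, param_priors, param_liks):
    model_liks = []     # f(y|Mi), equation (2.11), by summation
    param_posts = []    # g"(theta_i|Mi,y), equation (2.10)
    for g, lik in zip(param_priors, param_liks):
        f_y_m = sum(p * l for p, l in zip(g, lik))
        model_liks.append(f_y_m)
        param_posts.append([p * l / f_y_m for p, l in zip(g, lik)])
    f_y = sum(pm * fm for pm, fm in zip(prior_models, model_liks))          # (2.13)
    model_posts = [pm * fm / f_y for pm, fm in zip(prior_models, model_liks)]  # (2.12)
    return model_posts, param_posts

model_posts, param_posts = revise(
    prior_models=[0.5, 0.5],
    param_priors=[[0.5, 0.5], [0.5, 0.5]],
    param_liks=[[0.2, 0.4], [0.6, 0.8]],   # f(y|theta_i, Mi) at the observed y
)
```

With these numbers the data favor the second model, and within each model the parameter value that better fits the data gains posterior weight.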

After observing y and revising the prior distributions on Mi

and θi, the posterior probability distributions reflect all the

information the decision maker has about the set of models and their

parameters. Any prior information is reflected in the prior distri-

butions, P'(Mi) and g'(θi|Mi). The sample evidence, y, is incorpo-

rated through the likelihood function, f(y|Mi,θi,Di). As additional

information in the form of further observations of y is obtained, it

may be reflected in new posterior distributions that are obtainable

via revision of the existing posteriors (which, relative to the latest

data, are called priors) derived in (2.10) and (2.12) above.

As long as the data-generating process does not change over time,

the application of (2.10) and (2.12) to successive sets of new data

permits the decision maker to "learn from experience" about which

model of the process is the most appropriate. When the data may be

generated by different models in different time periods, successive

application of the probability revision procedures in this section

would be inappropriate. This problem and an approach to handling it

are discussed in Chapter V.
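The "learning from experience" just described amounts to applying (2.12) repeatedly, with each posterior serving as the next prior. A minimal sketch with hypothetical per-period model likelihoods:

```python
# Hypothetical sketch of sequential revision: repeated application of
# (2.12), one observation at a time; each posterior becomes the next prior.
def update_models(priors, liks):
    """One application of (2.12) given per-model likelihoods f(y|Mi)."""
    f_y = sum(p * l for p, l in zip(priors, liks))
    return [p * l / f_y for p, l in zip(priors, liks)]

# Data generated by M2: each new y is more likely under M2 than under M1.
probs = [0.5, 0.5]
for lik in [[0.2, 0.6], [0.3, 0.7], [0.1, 0.5]]:
    probs = update_models(probs, lik)
# probs[1] grows toward one as the evidence accumulates
```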

The above procedure can be used to select a single model to

represent a random process by a decision maker who is uncertain about

the appropriate form of that process. He can accomplish this by










choosing from his original set of N competing models the one with the

highest posterior probability or, if losses associated with choosing

the incorrect model are known or can be estimated, by selecting the

model that minimizes his posterior expected loss. The use of posterior

model probabilities for model selection is the procedure referred to

in this dissertation as Bayesian Model Selection (BMS).

The decision-maker's posterior model probabilities indicate that

he is uncertain of the form of the random process. Thus, any decision

procedure based on a chosen model fails to appropriately treat model

specification uncertainty. Geisel points out that if the posterior

probability of a model is positive, then the model contributes to our

knowledge of future observations of the random process of interest and

there is no theoretical reason to neglect this contribution. Hence,

any decision procedure that involves selecting a single model from

among a set of competing models ignores relevant information, and,

computation costs and other complexities aside, can only be viewed

as an approximation to an optimal procedure.1

The key to utilizing all the information contained in the set

of competing models relative to future observations of the random

process lies in the use of the predictive density function derived

in (2.13) above and repeated here in more detail:



1See A. M. Faden and G. C. Rausser, "Econometric Policy Model
Construction: The Post-Bayesian Approach," Annals of Economic and
Social Measurement, 5(1976), 349-362.









f(y|D) = Σ_{i=1}^{N} P'(Mi)[∫_Θi f(y|θi,Mi,Di)g'(θi|Mi)dθi]       (2.14)

       = Σ_{i=1}^{N} P'(Mi)f(y|Mi,Di).


f(y|D) is a weighted average of the predictive densities of y for each
of the N models (referred to below as model predictives). It is this

distribution that the decision maker should use to characterize the

random process upon which his decision hinges and about whose form he

is uncertain. This distribution will herein be referred to as a

Bayesian Mixed Model Predictive (BMMP) distribution. The process of

computing and analyzing posterior probabilities and the associated BMMP

distribution is called "comparing models" by Geisel and is referred to

herein as the Bayesian Model Comparison (BMC) procedure.
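As a sketch of (2.14), the BMMP density at a point is simply the probability-weighted average of the model predictive densities. The two normal model predictives and the weights below are hypothetical:

```python
import math

def normal_pdf(mu, sigma):
    """Return a normal density function with the given mean and std. dev."""
    return lambda y: math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def bmmp_density(model_probs, model_predictives, y):
    """f(y|D) at the point y: a weighted average of model predictives, per (2.14)."""
    return sum(p * f(y) for p, f in zip(model_probs, model_predictives))

# Two hypothetical normal model predictives mixed with weights 0.7 and 0.3:
mix = bmmp_density([0.7, 0.3], [normal_pdf(0.0, 1.0), normal_pdf(2.0, 1.0)], y=0.0)
```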

Suppose that y has been observed and that the decision maker is

interested in making a decision that relates to some future value, yF,

of the random variable. If the decision maker knew the correct model,

say Mi, and its parameters, say θi, then his distribution of yF would

be f(yF|θi,Mi,DFi) and his decision would depend on this distribution.

DFi is used to denote values of the decision variables of model i
associated with yF. But the decision maker knows neither the correct

model nor its parameters. What he does know is summarized in

P"(Mi|y,D) and f(yF|Mi,y,Di). Thus, his distribution of yF should be

a BMMP conditioned on the data already observed, y and D:









f(yF|y,D,DF) = Σ_{i=1}^{N} P"(Mi|y,D)[∫_Θi f(yF|θi,Mi,DFi)g"(θi|Mi,y,Di)dθi]       (2.15)

             = Σ_{i=1}^{N} P"(Mi|y,D)f(yF|Mi,y,Di,DFi).

This BMMP is a function of all N competing models and thus enables the

decision maker to choose a course of action in light of all available

information relating to yF.

Even when the BMMP is the distribution (model) that the decision

maker should use to characterize the random process in question, there

can be at least three reasons for selecting a single model via the

Bayesian Model Selection procedure:

1) In comparing alternative theories or hypotheses

it may be desirable to choose the one with the

most substantive content.

2) It may be more convenient to approximate the

random process with a simple model.

3) The use of a BMMP may prove too costly. In

general, the computation of a BMMP involves

the combination of its components via exten-

sive numerical methods.

Geisel shows that under certain assumptions, Bayesian Model

Selection provides a Bayesian interpretation for the classical pro-

cedure of choosing from a set of models the one with the lowest

estimated residual variance, s2, or highest coefficient of determi-

nation, R2. Given a set of normal regression models each of which










has the same number of parameters, given diffuse prior distributions

over the models and the parameters of the models, and given a symmetric

loss function with respect to the choice of an incorrect model, Geisel

shows that the procedure of choosing a model with the highest posterior

model probability, P"(Mi|y,D), is equivalent to the procedure of

selecting the model with the lowest s2 or highest R2.1 This result is

very similar to a result derived by Thornber.2 Thornber, however,

uses as priors on the parameters of the models those suggested by

Jeffreys' invariance theory,3 whereas Geisel's priors on the parameters

of the models take the form of multinormal and inverted gamma-2 dis-

tributions. These results will be discussed in more detail in

Chapter III.

Another important Geisel result that will be drawn upon is his

proof that given, say, M1 is the true model in the set of N competing

models, as sample evidence accumulates (i.e., n→∞), P"(M1|y,D)→1 and

the BMMP → f(yF|M1,y,D1).4 Thus, if the decision maker could wait

long enough, the data he would observe would tell him with near

certainty which of the N models was generating the data. This result

will be discussed in more detail in Chapter III.

1Ibid., pp. 24-37.

2E. H. Thornber, "Applications of Decision Theory to Econometrics"
(Ph.D. dissertation, University of Chicago, 1966), Chapter 2.

3For discussion of Jeffreys' invariance theory see Arnold Zellner,
An Introduction to Bayesian Inference in Econometrics (New York:
Wiley, 1971), pp. 41-53.

4Geisel, p. 23.










In the next chapter some of the consequences of forecasting with

and without the use of the Bayesian Model Comparison procedure are

explored. Particular attention is paid to the comparison of the

Bayesian Model Comparison procedure and the Bayesian Model Selection

procedure.











CHAPTER III

FORECASTING WITH AND WITHOUT REGARD
FOR MODEL SPECIFICATION UNCERTAINTY


If a decision maker is uncertain as to which one of N random

processes is generating future values of a random variable upon which

the effectiveness of his current decision depends, Geisel contends1

that the decision maker should use the Bayesian Mixed Model Predictive

(BMMP) distribution of the Bayesian Model Comparison (BMC) procedure

to reflect the information he has concerning the process of interest.

His justification for this approach rests primarily on the following

statement:2

Note again that this procedure does not select one
model as "true" or "best" and eliminate the rest.
If the probabilistic weight of a model is positive
it contributes to our knowledge of the future ob-
servations and there is no reason to neglect this
contribution. Thus, any decision theoretic pro-
cedure which is designed to eliminate some of the
models is viewed as an approximation which is used
for reasons of simplicity of view or to reduce the
cost of computation.

This chapter explores some of the consequences of forecasting with and

without the use of the Bayesian Model Comparison procedure and, in so

doing, attempts to more rigorously justify advocacy of the BMC pro-

cedure for use in decision-making problems in which model specifi-

cation uncertainty is present. The chapter attempts to explain why

1Geisel, Chapter II.
2Ibid., p. 19.










it would frequently be worth the extra cost to use the BMC approach

rather than approaches which, though perhaps simpler and less costly,

fail to fully reflect model specification uncertainty and the totality

of information the decision maker has concerning the process of

interest.

In this chapter, forecasting via the Bayesian Model Comparison

procedure will be compared to forecasting via the Bayesian Model

Selection procedure and the maximize-R2 rule. It is shown that when

model specification uncertainty exists, of these three procedures

only the BMC procedure optimally handles the information the decision

maker has concerning the data-generating process whose future values

he wants to forecast. More specifically, if a decision maker forecasts

via the BMS procedure, it is shown that the risk he takes in predicting

future values of the random process of interest is misspecified. It is

also shown that the decision-maker's posterior expected loss from using

a BMC forecast is less than his posterior expected loss from using a

BMS forecast. The last two sections of this chapter compare the effec-

tiveness of point and interval forecasts generated via the BMC procedure

with those generated via the BMS procedure. It is shown that BMS point

estimates are typically misplaced and that the reliability of BMS

credible intervals may be misspecified.

The following section introduces notation which will be used in the

remainder of the chapter and examines the relationship between the pre-

dictive variance of y as defined by a BMMP distribution, and the predic-

tive variance of y as defined by the model selected by the BMS procedure.










III.1 A Comparison of the Predictive Variances Generated by the
Bayesian Mixed Model Predictive Distribution and the Bayesian
Model Selection Procedure

Much of the analysis in Section III.2 draws on the relative sizes

of the predictive variance of y as defined by a Bayesian Mixed Model

Predictive distribution, V(BMMP), and the predictive variance of y as

defined by a Bayesian Model Selection Predictive distribution, V(BMSP).1

Accordingly, to avoid awkward digressions in Section III.2, this section

will be devoted to a comparison of V(BMMP) and V(BMSP).

It was shown in equations (2.14) and (2.15) that the BMMP is a

weighted average of the predictive densities of y for each of N alter-

native models. Equation (2.15) is repeated here:

f(yF|y,D,DF) = Σ_{i=1}^{N} P"(Mi|y,D)[∫_Θi f(yF|θi,Mi,DFi)g"(θi|Mi,y,Di)dθi]

             = Σ_{i=1}^{N} P"(Mi|y,D)f(yF|Mi,y,Di,DFi).       (3.1)

The function f(yF|Mi,y,Di,DFi) will be referred to as a "model predic-

tive." Recalling equation (2.11), a model predictive is a distribution

of realizations from the data-generating process conditioned on

1) Mi being the correct model of the process; 2) previous observations

of y, the dependent variable of interest, and Di, the decision vari-

able; and 3) DFi, the value of the decision variable with which the

next y to be observed, yF, is associated. Thus, if the Bayesian Model

Selection procedure chooses, say, Mi, it is Mi's predictive distri-

bution, f(yF|Mi,y,Di,DFi), that is being chosen to characterize future

observations of y, yF. It is the variance of this predictive

1V(BMMP) and V(BMSP) are formally defined below.









distribution that is referred to as V(BMSP). In general, the mean and
variance of the predictive distribution generated by Mi will be
denoted by μi and σi², respectively. The mean and variance of a BMMP
will be denoted by μ and σ² (or V(BMMP)), respectively.
It is shown below that

μ = P"(M1|y,D)μ1 + ... + P"(MN|y,D)μN       (3.2)

and

σ² = P"(M1|y,D)[σ1² + (μ1 - μ)²] + ...

   + P"(MN|y,D)[σN² + (μN - μ)²].       (3.3)

To demonstrate, first note that μ can be obtained by definition as

μ = ∫ yF f(yF|y,D,DF)dyF.       (3.4)

Substituting (3.1) for f(yF|y,D,DF) in (3.4) yields

μ = ∫ yF [Σ_{i=1}^{N} P"(Mi|y,D)f(yF|Mi,y,Di,DFi)]dyF.       (3.5)

With the expansion of the sum in equation (3.5), (3.2) is obtained:

μ = ∫ yF [P"(M1|y,D)f(yF|M1,y,D1,DF1)

  + ... + P"(MN|y,D)f(yF|MN,y,DN,DFN)]dyF

  = P"(M1|y,D) ∫ yF f(yF|M1,y,D1,DF1)dyF

  + ... + P"(MN|y,D) ∫ yF f(yF|MN,y,DN,DFN)dyF

  = P"(M1|y,D)μ1 + ... + P"(MN|y,D)μN.       (3.6)










To obtain an expression for the predictive variance of the BMMP,
note that by definition

V(BMMP) = σ² = ∫ (yF - μ)² f(yF|y,D,DF)dyF.       (3.7)

Substituting (3.1) for f(yF|y,D,DF) in equation (3.7) yields

σ² = ∫ (yF - μ)² Σ_{i=1}^{N} P"(Mi|y,D)f(yF|Mi,y,Di,DFi)dyF.       (3.8)

The following is obtained by expanding the sum in equation (3.8):

σ² = Σ_{i=1}^{N} P"(Mi|y,D) ∫ (yF - μ)² f(yF|Mi,y,Di,DFi)dyF

   = Σ_{i=1}^{N} P"(Mi|y,D) ∫ (yF² - 2μyF + μ²)f(yF|Mi,y,Di,DFi)dyF.

Working with the ith term of this sum, the following is obtained:

P"(Mi|y,D){Ei(yF²) - 2μEi(yF) + μ²}.       (3.9)

Noting that Ei(yF²) = σi² + [Ei(yF)]² and Ei(yF) = μi, (3.9) becomes

P"(Mi|y,D){σi² + μi² - 2μμi + μ²}.       (3.10)

The three right-hand terms inside the brackets of (3.10) may be
factored yielding:

P"(Mi|y,D){σi² + (μi - μ)²}.

Thus, σ² may be written as follows:

σ² = Σ_{i=1}^{N} P"(Mi|y,D){σi² + (μi - μ)²}.       (3.11)









This is the same as equation (3.3). Defining P"(Mi) = P"(Mi|y,D),

(3.11) can be rewritten as follows:

σ² = Σ_{i=1}^{N} P"(Mi)σi² + Σ_{i=1}^{N} P"(Mi)(μi - μ)².       (3.12)
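Equations (3.2) and (3.12) can be checked numerically. The component means, variances, and posterior model probabilities below are hypothetical:

```python
# Hypothetical check of (3.2) and (3.12): the BMMP mean and variance from
# the component (model-predictive) means and variances.
def bmmp_moments(probs, means, variances):
    """Return (mu, sigma^2) of the mixed predictive, per (3.2) and (3.12)."""
    mu = sum(p * m for p, m in zip(probs, means))
    var = sum(p * v for p, v in zip(probs, variances)) \
        + sum(p * (m - mu) ** 2 for p, m in zip(probs, means))
    return mu, var

mu, var = bmmp_moments(probs=[0.6, 0.4], means=[1.0, 3.0], variances=[2.0, 2.0])
```

Note that even with equal component variances, the spread between the component means adds a between-model term to the mixture variance.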

Having defined V(BMMP) and V(BMSP), it is now possible to compare

their magnitudes. Assuming, as will be done for the remainder of this

dissertation unless otherwise noted, that the decision-maker's model

space contains only two models, M1 and M2,1 the relative magnitudes of

V(BMMP) and V(BMSP) will be examined for each of the following cases:

CASE I: σ1² = σ2²

CASE II: σ1² < σ2² and BMS chooses M1

CASE III: σ1² < σ2² and BMS chooses M2.

For convenience, P"(Mi) will be used in place of P"(Mi|y,D) in the

discussion and proofs of these cases and the lemmas that follow.

2 2
THEOREM 1: If σ1² = σ2², then V(BMMP) ≥ V(BMSP).

PROOF: When N = 2,

σ² = P"(M1)σ1² + P"(M2)σ2² + P"(M1)(μ1 - μ)² + P"(M2)(μ2 - μ)²,

and when σ1² = σ2², V(BMSP) = σ1² = σ2².

Since by definition 0 ≤ P"(M1), P"(M2) ≤ 1, it follows trivially that
when σ1² = σ2²,

P"(M1)σ1² + P"(M2)σ2² = σ1² = σ2².

Thus, if

P"(M1)(μ1 - μ)² + P"(M2)(μ2 - μ)² ≥ 0,

V(BMMP) ≥ V(BMSP). Since (μ1 - μ)² and (μ2 - μ)² are nonnegative,

P"(M1)(μ1 - μ)² + P"(M2)(μ2 - μ)² ≥ 0

and V(BMMP) ≥ V(BMSP). Unless μ1 = μ2, in which case μ1 = μ2 = μ,
P"(M1)(μ1 - μ)² + P"(M2)(μ2 - μ)² is strictly greater than zero and
V(BMMP) is strictly greater than V(BMSP).1
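Theorem 1 can also be checked numerically: with equal component variances, the mixture variance (3.11) never falls below the common component variance. The random draws below are purely illustrative:

```python
import random

# Hypothetical numerical check of Theorem 1: when sigma1^2 = sigma2^2 = s,
# the mixed-model variance (3.11) is never below V(BMSP) = s.
random.seed(0)

def mixture_variance(p1, mu1, mu2, s1sq, s2sq):
    """V(BMMP) for two components, per (3.11)."""
    mu = p1 * mu1 + (1 - p1) * mu2
    return p1 * (s1sq + (mu1 - mu) ** 2) + (1 - p1) * (s2sq + (mu2 - mu) ** 2)

ok = True
for _ in range(1000):
    p1 = random.random()
    mu1, mu2 = random.uniform(-5, 5), random.uniform(-5, 5)
    s = random.uniform(0.1, 4.0)   # common component variance
    if mixture_variance(p1, mu1, mu2, s, s) < s - 1e-12:
        ok = False
```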

THEOREM 2: If σ1² < σ2² and BMS chooses M1, then V(BMMP) ≥ V(BMSP).

PROOF: Refer to the proof of Theorem 1. Since σ1² < σ2²,

P"(M1)σ1² + P"(M2)σ2² ≥ σ1².

From the proof of Case I,

P"(M1)(μ1 - μ)² + P"(M2)(μ2 - μ)² ≥ 0.

Thus, it follows that

P"(M1)σ1² + P"(M2)σ2² + P"(M1)(μ1 - μ)² + P"(M2)(μ2 - μ)² ≥ σ1²,


1This dissertation is not concerned with special cases in which
μ1 = μ2 and σ1² = σ2².









i.e., V(BMMP) ≥ V(BMSP). However, V(BMMP) equals V(BMSP) only if

P"(M1) = 1. But, if P"(M1) = 1, there exists no model specification

uncertainty. Thus, when model specification uncertainty exists,

V(BMMP) is strictly greater than V(BMSP).

THEOREM 3: If σ1² < σ2² and BMS chooses M2, then V(BMMP) may be
less than, equal to, or greater than V(BMSP).

PROOF: Refer to Theorem 1. Whenever P"(M2) ≠ 1,

P"(M1)σ1² + P"(M2)σ2² < σ2².

Therefore,

σ² = P"(M1)σ1² + P"(M2)σ2² + P"(M1)(μ1 - μ)² + P"(M2)(μ2 - μ)²

may fall below, at, or above σ2², depending on the size of
P"(M1)(μ1 - μ)² + P"(M2)(μ2 - μ)².

Perhaps the most important thing that Theorems 1, 2, and 3 reveal
is that if model specification uncertainty exists, V(BMMP) ≠ V(BMSP),

except for uninteresting cases. This fact will be referred to repeatedly

throughout Chapters III and IV. As will be seen in Section III.2,

the inability to order V(BMMP) and V(BMSP) in Case III poses no problem

with respect to comparing the relative merits of the BMC and BMS proce-

dures as aids to forecasting. It does, however, make it difficult to
identify whether the measure of forecast-risk provided the decision maker by

the BMS procedure (defined in Section III.2.6 to be V(BMSP)) understates

or overstates the actual forecast-risk faced by the decision maker.
This problem is discussed in Section III.2.6. In Sections III.2.6 and
IV.3.2, it is shown that Case III may never arise, since situations

exist in which only Case I applies.

The following three lemmas and the discussion that follows them
are useful for helping to order V(BMMP) and V(BMSP) in situations

in which Case III applies. The first provides a necessary and
sufficient condition for V(BMMP) to be greater than V(BMSP).

LEMMA 1: If σ1² < σ2², μ1 ≠ μ2, and BMS chooses M2, then

V(BMMP) > V(BMSP) if and only if

P"(M2|y,D) > (σ2² - σ1²) / (μ1 - μ2)².

PROOF: 1. If V(BMMP) > V(BMSP), it must be shown that

P"(M2) > (σ2² - σ1²) / (μ1 - μ2)².

Since V(BMMP) = σ² = P"(M1)σ1² + P"(M2)σ2² + P"(M1)(μ1 - μ)²

+ P"(M2)(μ2 - μ)²

and V(BMSP) = σ2², V(BMMP) > V(BMSP) is the same as

P"(M1)σ1² + P"(M2)σ2² + P"(M1)(μ1 - μ)² + P"(M2)(μ2 - μ)² > σ2².       (3.13)

Subtracting P"(M1)σ1² + P"(M2)σ2² from both sides of (3.13) yields

P"(M1)(μ1 - μ)² + P"(M2)(μ2 - μ)² > σ2² - [P"(M1)σ1² + P"(M2)σ2²].       (3.14)

From (3.2) it is known that μ = P"(M1)μ1 + P"(M2)μ2. Let the rhs of
(3.14) equal R, and define P1 = P"(M1) and P2 = P"(M2). Then substi-
tuting for μ in (3.14) yields

P1(μ1 - P1μ1 - P2μ2)² + P2(μ2 - P1μ1 - P2μ2)² > R.       (3.15)

Noting that P2 = 1 - P1, (3.15) can be written

P1(μ1P2 - μ2P2)² + P2(μ2P1 - μ1P1)² > R.       (3.16)

Factoring P2 out of the first term on the lhs of (3.16) and P1 out
of the second term on the lhs yields

P1P2P2(μ1 - μ2)² + P1P1P2(μ2 - μ1)² > R.       (3.17)

Noting that P1 + P2 = 1, and that P1P2(μ1 - μ2)² = P1P2(μ2 - μ1)²,
(3.17) becomes

P1P2(μ1 - μ2)² > σ2² - (P1σ1² + P2σ2²) = P1(σ2² - σ1²).       (3.18)

Dividing both sides of this inequality by P1(μ1 - μ2)² yields the
desired result:

P2 > (σ2² - σ1²) / (μ1 - μ2)².

2. If P2 > (σ2² - σ1²) / (μ1 - μ2)², then V(BMMP) > V(BMSP), i.e.,

P1σ1² + P2σ2² + P1(μ1 - μ)² + P2(μ2 - μ)² > σ2².

A reversal of the steps in the first half of the proof leads
immediately to this result.
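The necessary and sufficient condition of Lemma 1 can be verified on hypothetical numbers; with σ1² = 1, σ2² = 2, μ1 = 0, and μ2 = 2, the threshold is 0.25:

```python
# Hypothetical check of Lemma 1: with sigma1^2 < sigma2^2 and BMS choosing
# M2, V(BMMP) > V(BMSP) exactly when P"(M2) exceeds the threshold
# (sigma2^2 - sigma1^2)/(mu1 - mu2)^2. All numbers are illustrative.
def v_bmmp(p2, mu1, mu2, s1sq, s2sq):
    """Two-component V(BMMP), per (3.11)."""
    p1 = 1 - p2
    mu = p1 * mu1 + p2 * mu2
    return p1 * (s1sq + (mu1 - mu) ** 2) + p2 * (s2sq + (mu2 - mu) ** 2)

mu1, mu2, s1sq, s2sq = 0.0, 2.0, 1.0, 2.0
threshold = (s2sq - s1sq) / (mu1 - mu2) ** 2        # = 0.25

above = v_bmmp(0.30, mu1, mu2, s1sq, s2sq) > s2sq   # P"(M2) above threshold
below = v_bmmp(0.20, mu1, mu2, s1sq, s2sq) > s2sq   # P"(M2) below threshold
```

With P"(M2) = 0.30 the mixture variance exceeds V(BMSP) = σ2² = 2, and with P"(M2) = 0.20 it falls short, as the lemma predicts.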
Lemma 1 can be combined with Theorem 2 to form a necessary and
sufficient condition for V(BMMP) to be greater than V(BMSP) when,
say, σ1² < σ2², regardless of which model BMS selects.

LEMMA 2: If σ1² < σ2², then V(BMMP) > V(BMSP) if and only if
a) Model 1 is selected by BMS,
or

b) Model 2 is selected by BMS, μ1 ≠ μ2, and

P"(M2|y,D) > (σ2² - σ1²) / (μ1 - μ2)².

PROOF: Lemma 2 results from combining Theorem 2 and Lemma 1, and

its proof follows directly from their proofs.

It is clear that V(BMMP) > V(BMSP) whenever condition a or b

of Lemma 2 is satisfied. Thus, upon examining condition b the

following can be said:

1. Other things equal, the greater the distance between the

means, μ1 and μ2, of the predictive distributions of the

two models in question, the smaller is the rhs of the

inequality of condition b, and the more likely it is

that V(BMMP) > V(BMSP).

2. Other things equal, the closer in size are the predictive

variances, σ1² and σ2², the smaller is the rhs of the

inequality of condition b, and the more likely it is

that condition b holds, i.e., the more likely it is

that V(BMMP) > V(BMSP).

Both these statements apply regardless of which model is chosen by

BMS, i.e., whether it be the model with the lower or higher pre-

dictive variance.

As an example of how statements one and two might help determine

the relationship between V(BMMP) and V(BMSP), the following is

offered. Suppose the decision-maker's prior information about y

leads him to believe that the predictive variances of both models

are roughly equal, but that their predictive means differ

significantly. By Theorems 1 and 2 and statements one and two above,

the decision maker should consider it more likely that V(BMMP) exceeds

V(BMSP) than if he believed, say, that μ1 and μ2 were about the same
size. This follows since a) if σ1² in fact equals σ2², then by Theorem 1

V(BMMP) > V(BMSP); b) if, say, σ1² < σ2² and the BMS procedure chooses

M1, then Theorem 2 applies and V(BMMP) > V(BMSP); and c) if σ1² < σ2²

and the BMS procedure chooses M2, then Theorem 3 applies and the de-
cision-maker's prior information about σ1², σ2², μ1 and μ2 in concert

with statements one and two above indicates that it is more likely that

V(BMMP) exceeds V(BMSP) than if, say, the decision maker thought μ1 and

μ2 were about the same size.
The next section utilizes the results of this section in comparing

the effectiveness of the BMC, BMS, and maximize-R2 approaches to

forecasting.


III.2 Forecasting: Bayesian Model Comparison Versus
Bayesian Model Selection and the Maximize-R2 Rule

Most forecasting procedures handle model specification uncertainty

suboptimally. Typically, a forecaster proposes a number of alternative

statistical models as possible candidates to represent the data-

generating process whose future value he is interested in predicting

and then, via some model screening procedure, eliminates all but one

model.1

1For a discussion of various classical and Bayesian model screening
procedures, see Kenneth M. Gaver and Martin S. Geisel, "Discriminating
among Alternative Models: Bayesian and Non-Bayesian Methods," Chapter
Two in Paul Zarembka (ed.), Frontiers in Econometrics (New York:
Academic Press, 1974), pp. 49-77.










In this section, forecasting as accomplished via two model-

screening procedures, Bayesian Model Selection (BMS) and the classi-

cal maximize-R2 rule approach (max-R2), is compared to forecasting as

handled by a procedure that optimally considers model specification

uncertainty, the Bayesian Model Comparison approach (BMC). Before

proceeding with the comparison a brief review of BMS, max-R2, and

BMC is in order.


III.2.1 The Bayesian Model Selection Procedure (BMS)

Bayesian Model Selection was discussed in some detail in Chapter

II. Briefly, it requires the following:

1. The specification of a set of N alternative statistical

models each of which purports to represent the data-

generating process of interest to the forecaster.

2. The assessment of a prior probability mass function over

the set of N models, P'(Mi), i=1,2,...,N.

3. The assessment of prior probability density functions over

the parameters of each model,g'(ei Mi), i=1,2,...,N.

4. The specification of a likelihood function for each model,

f(y|θi,Mi,Di), i=1,2,...,N.¹

5. The computation of posterior probabilities for the models,

P"(Mi|y,D), i=1,2,...,N.

The posterior model probabilities are often used to select one model

from among the set of N models to represent the data-generating

¹When thought of as a function of y with θi, Mi, and Di given,
f(y|θi,Mi,Di) is model i.










process of interest to the forecaster. The usual procedure is to

select the model with the highest posterior model probability. In

the event that the forecaster can estimate the loss that results from

choosing an inappropriate model and can do this for each of the N

models, he can compute his expected loss from choosing each model

and select the model which yields the lowest expected loss.
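The five requirements above reduce to a short computation once the marginal likelihoods f(y|Mi,Di) are in hand. The following sketch illustrates both selection rules; the prior probabilities, marginal likelihoods, and loss table in the example call are invented for illustration and are not taken from the dissertation.

```python
def bms_select(prior, marginal_lik, loss=None):
    """Bayesian Model Selection over N candidate models (steps 2-5 above).

    prior        : prior model probabilities P'(Mi)
    marginal_lik : marginal likelihoods f(y|Mi,Di) of the observed data
    loss         : optional N x N table; loss[i][j] is the loss from
                   choosing Mi when Mj is the true model (diagonal zero)
    """
    # Posterior model probabilities P"(Mi|y,D) by Bayes' rule.
    post = [p * l for p, l in zip(prior, marginal_lik)]
    total = sum(post)
    post = [w / total for w in post]
    if loss is None:
        # Usual rule: select the model with the highest posterior probability.
        return max(range(len(post)), key=post.__getitem__), post
    # Expected-loss rule: E[loss | choose Mi] = sum over j of loss[i][j] P"(Mj|y,D).
    exp_loss = [sum(lij * pj for lij, pj in zip(row, post)) for row in loss]
    return min(range(len(exp_loss)), key=exp_loss.__getitem__), post

choice, post = bms_select([0.5, 0.5], [0.012, 0.004])   # posterior (0.75, 0.25)
```

Note that with an asymmetric loss table the expected-loss rule can select a model other than the one with the highest posterior probability.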

It should be noted that BMS may also be used for reasons other

than for the selection of a single model from among a set of N

models. For example, if N is large, BMS can be used to reduce the

number of models in the model space to a number that can be more

easily and inexpensively dealt with by a procedure such as BMC.

This can be accomplished by eliminating all models from considera-

tion whose posterior model probability is, say, less than some α,

0 < α < 1. In this dissertation, however, BMS will be regarded as

a procedure for selecting a single model from among N alternative

models.

The forecaster who uses BMS essentially handles his forecasting

problem in a two-step sequence: first, a single model is chosen to

represent the data-generating process; second, under the assumption

that the chosen model is in fact a "true" reflection of the data-

generating process, the forecaster addresses his prediction problem.





¹Actually the decision maker must be able to determine the
loss from choosing model i when model j is the true model, i ≠ j.
There are N(N−1) such losses.










III.2.2 The Maximize-R² Rule

The maximize-R² rule is frequently used to choose one from among

a set of alternative competing linear statistical models whose explana-

tory variables are nonrandom. The usual procedure is to estimate the

parameters of each of the alternative models, compute each model's

coefficient of determination, R², and then select as being the best

representation of the data-generating process the model with the

highest R2. Forecasting is then carried out utilizing the chosen model

as if it were in fact the true model.

It is important to reiterate the well-known fact that R² is

inversely related to S², the estimate of the dependent variable's

residual variance. A maximize-R² rule is therefore equivalent to

a minimize-S² rule. In other words, the model with the maximum R²

is also the model with the minimum S².

Geisel² and Thornber³ have shown that under certain conditions

model selection as accomplished via the max-R² rule is equivalent to

the Bayesian Model Selection procedure. The conditions are the

following:

1. The loss structure with respect to the selection of an

incorrect model is symmetric. That is, if the loss from


¹For a more detailed discussion of the max-R² rule see Gaver
and Geisel, pp. 52-53.

²Geisel, pp. 24-37.

³Thornber, Chapter 2.










choosing Mi when Mj is true is represented by Lij, then

Lij = Lkl for all i,j,k,l = 1,2,...,N, with i ≠ j and

k ≠ l, and Lij, Lkl > 0.

2. P'(M1) = P'(M2) = ... = P'(MN), i.e., the prior model

probabilities are equal.

3. The statistical models in question are normal regression

models each of which has the same number of parameters.

The parameters of each are its regression coefficients,

usually denoted by β's, and its residual variance, σ².

That each model has the same number of coefficients implies

that each model has the same number of independent (explanatory)

variables.

4. The prior density function for the parameters, βi and σi²,

is diffuse.

Geisel and Thornber used different forms for the diffuse prior

density function for the parameters βi and σi², but both showed that

selection of the model with the highest posterior probability is

equivalent to selection of the model with the lowest S². Since the

model with the lowest S² also has the highest R², Geisel and Thornber

have shown that selection of a model via the BMS procedure is equiva-

lent to selection via the maximize-R² rule.

Since a model's R² can be increased simply by adding more

"explanatory" variables to the model, a maximize-R̄² rule is frequently

used in place of the maximize-R² rule. R̄² is defined as follows:¹

¹See Gaver and Geisel, pp. 52-54.












R̄² = R² − [(k−1)/(n−k)](1 − R²),

where n is the sample size and k is the number of explanatory vari-

ables. The addition of variables will increase the model's R̄², the

adjusted coefficient of determination, if and only if the F statistic

for the hypothesis that the added variables' coefficients are all zero

is greater than one.¹ Geisel² showed that in the two-model case, model

selection via the BMS procedure can be made equivalent to selection

via the maximize-R̄² rule if the relationships between the parameters

of M1 and M2 are appropriately specified. The required parameter

relationships are, unfortunately, somewhat nonsensical. There are

no known intuitively meaningful sets of assumptions under which the

BMS procedure and the maximize-R̄² rule are equivalent.
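The two statistics just discussed are mechanical to compute. The sketch below handles the simple one-regressor-with-intercept case; the helper functions and the figures used in the examples are illustrative assumptions, not material from the dissertation.

```python
def r_squared(y, x):
    """Coefficient of determination R2 for a simple regression y = a + b x + e."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx                      # least-squares slope
    a = my - b * mx                    # least-squares intercept
    ss_res = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """Adjusted coefficient: R2 - [(k-1)/(n-k)](1 - R2),
    n = sample size, k = number of explanatory variables."""
    return r2 - (k - 1) / (n - k) * (1.0 - r2)
```

Because the penalty term grows with k, adding variables raises the adjusted coefficient only when the fit improves enough, which is the F-statistic condition cited in the text.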

In the remainder of this chapter the four conditions listed

above apply, unless noted otherwise. Thus, to avoid redundancy, the

maximize-R² rule will not be discussed directly in what follows

but will be addressed indirectly through comments about the equiva-

lent selection procedure, BMS. Since the BMS and maximize-R² pro-

cedures are equivalent only in that they select the same model,

only comments concerning the fact that the BMS procedure actually

chooses a model, or comments about which model it chooses, also apply

to the maximize-R² procedure.


¹John B. Edwards, "The Relationship Between the F-Test and R̄²,"
The American Statistician, 23 (December, 1969), p. 28.

²Geisel, pp. 41-45.










III.2.3 The Bayesian Model Comparison Procedure (BMC)

The Bayesian Model Comparison procedure was discussed in detail in

Chapter II. Briefly, it requires the following:

1. The specification of a set of N alternative statistical

models, each of which purports to represent the data-

generating process of interest to the forecaster.

2. The assessment of a prior probability mass function over

the set of N models, P'(Mi), i=1,...,N.

3. The assessment of prior probability density functions

over the parameters of each model, g'(θi|Mi), i=1,...,N.

4. The specification of a likelihood function for each model,

f(y|θi,Mi,Di), i=1,...,N.

5. The computation of posterior probabilities for the models

(referred to as model probabilities), P"(Mi|y,D), i=1,2,...,N.

6. The computation of the marginal distribution of future values

of the data-generating process. (This distribution, as

noted earlier, is a predictive distribution. It will be re-

ferred to herein as the Bayesian Mixed Model Predictive

(BMMP).)

The first five requirements are the same as the five requirements

of the Bayesian Model Selection procedure. It is the sixth requirement

that distinguishes the Bayesian Model Comparison procedure from the

Bayesian Model Selection procedure. Instead of choosing one of the N

models,as does the BMS procedure, BMC models the data-generating

process of interest with the BMMP distribution.










Recalling (2.14), the BMMP distribution is defined as follows:

             N
f(y|D) =     Σ  P'(Mi)[∫ f(y|θi,Mi,Di)g'(θi|Mi)dθi]              (3.19)
            i=1

             N
       =     Σ  P'(Mi)f(y|Mi,Di).                                (3.20)
            i=1

All the terms denoted in (3.19) and (3.20) were defined in Chapter II,

and the distributions denoted in (3.19) and (3.20) were redefined in

the six requirements above.

After observing realizations of the data-generating process in

question, the BMMP takes the form presented in (2.15):

                 N
f(yF|y,D,DF) =   Σ  P"(Mi|y,D)[∫ f(yF|θi,Mi,DFi)g"(θi|Mi,y,Di)dθi]   (3.21)
                i=1

                 N
             =   Σ  P"(Mi|y,D)f(yF|Mi,y,Di,DFi).                     (3.22)
                i=1

Recall that D = (D1,D2,...,DN)', where Di is a vector containing the

values of model i's explanatory variables that correspond to the most

recently observed y value. DF = (DF1,DF2,...,DFN)', where DFi is a

vector containing the values of model i's explanatory variables at

the time the next y value is to be generated. From (3.20) or (3.22),

it can be seen that a BMMP distribution is a weighted average -- or

mixture -- of each model's predictive density of yF, f(yF|Mi,y,Di,DFi).
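In code, the mixing in (3.22) is only a few lines. The sketch below uses Gaussian stand-ins for the model predictives (under the diffuse priors assumed later in the chapter they are actually Student-t densities); the weights, means, and scales are illustrative assumptions.

```python
import math

def normal_pdf(y, m, s):
    """Density of a Normal(m, s) variate at y; a stand-in model predictive."""
    return math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def bmmp_density(y, post, means, sds):
    """BMMP density at y: the P"(Mi)-weighted mixture of model predictives."""
    return sum(p * normal_pdf(y, m, s) for p, m, s in zip(post, means, sds))

def bmmp_mean(post, means):
    """Mean of the mixture: mu = sum over i of P"(Mi) mu_i."""
    return sum(p * m for p, m in zip(post, means))
```

Point or interval forecasts are then read directly off this one distribution, rather than off whichever single model predictive BMS would have selected.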










The implications of parameter and residual uncertainty for pre-

diction and decision making have been given considerable attention.

See, for example, any of the following: Theil, Fisher, Brainard,

Leland, Basu, Zellner, Barry and Horowitz, and/or Waud.¹ As noted

in Chapter II, the BMC procedure considers residual, parameter, and

model specification uncertainty. Accordingly, if each model in the

set of N competing models is viewed as a possible "parameter value"

for the process of interest, the BMC procedure may be thought of as a

means for extending the parametric analysis of prediction and decision-

making problems to include consideration of the possibly widely dif-

fering predictive and decision-making implications of the competing

models. Thus, just as a Bayesian can extend predictive analysis by

explicitly allowing for parameter uncertainty instead of just using

parameter estimates, the BMC procedure extends parametric analysis

by explicitly considering model specification uncertainty.

A forecaster using the BMC procedure rather than, say, the BMS

procedure, does not have to unnaturally divide the forecasting problem

into two parts. He does not have to first select a model from the set

¹H. Theil, Economic Forecasts and Policy (Amsterdam: North-
Holland, 1961). Walter D. Fisher, "Estimation in the Linear Decision
Model," International Economic Review, 3 (January, 1972): 1-29.
William Brainard, "Uncertainty and the Effectiveness of Policy,"
American Economic Review, 57 (May, 1967): 411-25. H. Leland, "The
Theory of the Firm Facing Uncertain Demand," American Economic Review,
62 (1972): 278-291. A. Basu, "Economic Regulation Under Parameter
Uncertainty" (Ph.D. dissertation, Economics Department, Stanford Uni-
versity, 1973). Zellner, Chapters II, III, and XI. Christopher B.
Barry and Ira Horowitz, "Risk and Economic Policy Decisions," Public
Finance 30 (no. 2, 1975): 153-165. Roger Waud, "Asymmetric Policy-
maker Utility Functions and Optimal Policy Under Uncertainty,"
Econometrica, 44 (January, 1976): 53-66.










of N competing models and then, assuming the chosen model to be the

correct model of the process, proceed with his forecasting. He

computes the BMMP distribution for his set of models and uses it

directly to determine, say, point or interval predictions for future

values of y. The forecaster's BMMP distribution reflects his residual,

parameter, and model specification uncertainty, and any predictions

that he makes using his BMMP are made in light of all three types of

uncertainty and with the use of information bearing on any and all of

them. This point will be discussed in greater detail in Section

III.2.5.

The next section sets forth the specific assumptions under which

the BMC and BMS procedures will be compared in the remainder of the

chapter.


III.2.4 Model Space and Assumptions

The comparison of the BMC and BMS procedures (and indirectly

max-R²) that follows will be based on the following assumptions:

1. The decision maker (forecaster) behaves as if he believes

that one or the other of the following two models is an

accurate representation of the random process of interest,

but he is unsure which model is appropriate:

M1: y = β1X + ε;

M2: y = β2Z + δ.

y is the variable whose future value the forecaster is

interested in predicting. X and Z are two different

explanatory variables. X and Z are random, but their











values associated with the next y to be generated are known

prior to y's observation. β1 and β2 are unknown parameters.

ε and δ are the usual normally distributed error terms,

each with mean zero and unknown variance, σε² and σδ², res-

pectively. It is also assumed that cov(β1,ε) = cov(β2,δ) =

cov(ε,δ) = 0. Thus, M1 and M2 are normal univariate regres-

sion models which, to keep the number of each model's unknown

parameters at two, have been forced through the origin. Since

the values of the explanatory and dependent variables can

always be scaled so that M1 and M2 pass through the origin,

no generality is lost by using models without intercept

terms. Care must be taken, however, to interpret results

in the appropriate units.

2. The random process of interest to the forecaster is

stationary.

3. X and Z are uncorrelated and only the explanatory variable

in the true model affects y. Thus, if M1 were the true

model, β2 would be zero. If neither M1 nor M2 were the

true model, it may be that β1 = β2 = 0.

4. In comparing the BMC and BMS procedures, it will be assumed

that the forecaster may have prior information about the

parameters of M1 and M2. Since model selection via the BMS

procedure and the maximize-R² rule are equivalent only if

the forecaster has no prior information about the parameters










of the models, any comments made about the BMS procedure

under this assumption do not apply to the maximize-R² rule.¹
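A small simulation consistent with these assumptions (all numbers invented for illustration) shows the machinery at work: data are generated by M1, both no-intercept regressions are fitted, and M1 comes out with the smaller estimated residual variance, so a minimize-S² (equivalently, maximize-R²) rule would select it.

```python
import random

random.seed(0)
n = 200
X = [random.gauss(2.0, 1.0) for _ in range(n)]   # explanatory variable of M1
Z = [random.gauss(1.0, 1.0) for _ in range(n)]   # explanatory variable of M2
beta1 = 1.5
y = [beta1 * x + random.gauss(0.0, 1.0) for x in X]   # M1 true, so beta2 = 0

def s2_through_origin(y, d):
    """Estimated residual variance of a no-intercept regression y = b d + e
    (divisor n - 1, one fitted coefficient; a modeling convention here)."""
    b = sum(di * yi for di, yi in zip(d, y)) / sum(di * di for di in d)
    return sum((yi - b * di) ** 2 for di, yi in zip(d, y)) / (len(y) - 1)

s2_1 = s2_through_origin(y, X)   # near the true error variance of 1
s2_2 = s2_through_origin(y, Z)   # much larger: Z does not explain y
```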

Note that in assumption one above the residual variance of each

model is assumed to be unknown. It would be unrealistic to assume

the residual variance to be known when the correct model of the process

is not known. Further, if σε² and σδ² were known, or were assumed to be

known, and the correct model was known to be either M1 or M2, the

correct model could be selected by the forecaster with probability one

and there would be no need for procedures such as BMC or BMS.

To illustrate, consider the following argument. For a given X

value the conditional variance of y, σ²y|X, is σε². For a given value

of Z the conditional variance of y, σ²y|Z, is σδ². The marginal variance

of y (i.e., y's variance unconditioned on X), σy², as described by M1

is β1²σX² + σε², and the marginal variance of y as described by M2 is

β2²σZ² + σδ². If M1 were in fact the true model, then

σy² = β1²σX² + σε²,

σ²y|X = σε²,

and

β2 = 0.

Since β2 = 0, the marginal variance of y as described by M2 is simply

σδ². Thus, since the marginal variance of y is now known to be

β1²σX² + σε², it follows that σδ² = β1²σX² + σε². This says that when M1 is

the true model σε² < σδ². Consequently, if it is assumed that σε² and σδ²


¹The specific conditions under which the BMS and the max-R²
approaches to model selection are equivalent were listed in Section
III.2.2. Only assumption four of this section affects their equivalency.










are known, the model with the lower residual variance can be identi-

fied with probability one as being the true model.

In the next section the BMS and BMC procedures are compared with

respect to how well each accounts for a forecaster's model specifi-

cation uncertainty.


III.2.5 The Treatment of Model Specification Uncertainty

Assuming that the random process of interest is stationary and

that one of a proposed set of alternative models is a true repre-

sentation of the process, Geisel has shown that in the limit the BMMP

and BMS predictive distributions are the same.¹ Thus, in the limit,

the BMC and BMS approaches to forecasting are equivalent. This

result is demonstrated below.

Recalling (2.15), a BMMP can be written as a weighted average of

model predictives:

                 N
f(yF|y,D,DF) =   Σ  P"(Mi|y,D)f(yF|Mi,y,Di,DFi).                     (3.23)
                i=1

Each of the individual model predictives, f(yF|Mi,y,Di,DFi), is the

distribution that would be used to characterize the random process

in question if the BMS procedure chose Mi.

Geisel² has shown that if Mi is in fact the true model, then as

sample evidence accumulates (i.e., as n → ∞) P"(Mi|y,D) approaches one.

It follows trivially that as n approaches ∞, f(yF|y,D,DF) approaches

¹Geisel, pp. 22-23.

²Ibid.











f(yF|Mi,y,Di,DFi). Thus, since the distribution yielded by the BMC

procedure to forecast future values of y is f(yF|y,D,DF), and that

yielded by the BMS procedure for forecasting purposes is

f(yF|Mi,y,Di,DFi), in the limit the BMC and BMS procedures are

equivalent forecasting procedures. This unsurprising result says

that in the limit, under the assumed conditions, truth is obtained,

i.e., the accumulated data would indicate with certainty the model

that had been generating the data. If such were the case, everybody

would ultimately use the same--correct--model to predict future

values of y.
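Geisel's limit result can also be seen numerically. The toy sketch below uses two fully specified normal models with known variances (simpler than the unknown-variance regression models of this chapter, but with the same limiting behavior of the posterior model probabilities); after a few thousand observations generated by M1, P"(M1|y) is essentially one. All of the numbers are illustrative assumptions.

```python
import math
import random

random.seed(1)

# Two candidate models for y, with equal prior model probabilities:
#   M1: y ~ Normal(1.0, 1)     M2: y ~ Normal(1.5, 1)
def log_pdf(y, m):
    """Log density of Normal(m, 1) at y."""
    return -0.5 * (y - m) ** 2 - 0.5 * math.log(2.0 * math.pi)

means = [1.0, 1.5]
log_post = [math.log(0.5), math.log(0.5)]
for _ in range(2000):
    y = random.gauss(1.0, 1.0)      # data actually generated by M1
    log_post = [lp + log_pdf(y, m) for lp, m in zip(log_post, means)]

# Normalize in log space to avoid underflow; P"(M1|y) approaches one.
z = max(log_post)
weights = [math.exp(lp - z) for lp in log_post]
post = [w / sum(weights) for w in weights]
```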

In both the BMS and BMC procedures the forecaster or decision

maker proposes a set of N models each of which he believes might

correctly represent the random process whose future values he is

interested in predicting. Theoretically, if he assesses a nonzero

probability for a particular model, that model should be included

in his model space. In both the BMS and BMC procedures the fore-

caster assesses a prior probability mass function over the N models

in his model space. By so doing the forecaster is formally

acknowledging the fact that he is uncertain as to the correct model.

He is thus faced with a forecasting problem in which model specifi-

cation uncertainty is present and must be dealt with.










By selecting one of the N models and assuming it to be true, the

BMS approach to forecasting yields predictions that do not appropri-

ately reflect the forecaster's model specification uncertainty. The

BMMP of the BMC procedure, however, by utilizing all N model predic-

tives and their associated model probabilities acknowledges the

forecaster's model specification uncertainty and yields predictions

that do reflect this uncertainty. Forecasting via the BMS procedure

should therefore be regarded as an approximation to the "optimal"

approach to forecasting offered by the BMC procedure.

In the next section of this chapter the risk involved in fore-

casting via the BMC procedure is compared to that involved in fore-

casting via the BMS procedure. These risks are measured by V(BMMP)

and V(BMSP), respectively.


III.2.6 Risk Specification

Forecasts are frequently used as inputs to decision-making

problems. For example, predicted new-car demand might be used by an

auto manufacturer in determining the rate and timing of automobile

production, as well as the size of his labor force. Much of the risk

taken by a decision maker in making a decision that utilizes a fore-

cast stems from the possibility of forecasting error. If, for

example, the forecasted new-car demand errs on the high side, both the

manufacturer and many of his distributors might be burdened with an

excess stock of cars, leading to unnecessarily high inventory costs.

The risk passed on to a decision maker by a forecaster, called here

forecast-risk, will be assumed to be adequately measured in terms of










the variance of the forecaster's predictive distribution. Such an

assumption would be appropriate, for example, if losses associated

with forecast errors were proportional to the squared error of the

forecast.

A forecaster who utilizes the BMS or BMC procedure is admitting

that he is uncertain of the specification of the process whose future

values he wishes to predict. It has been noted above that this uncer-

tainty is fully reflected in a BMMP distribution but not in a

Bayesian Model Selection Predictive (BMSP) distribution. Thus,

unless V(BMMP) equals V(BMSP), or no model specification uncertainty

exists, V(BMSP) is an inappropriate measure of forecast-risk, either

under- or overstating it as V(BMMP) > V(BMSP) or V(BMMP) < V(BMSP).

Thus, the decisions that utilize a prediction arrived at via the BMS

procedure will have been made under the assumption that the risk

involved is either less than or greater than it is in reality. The BMS

procedure, therefore, has the potential to provide the decision maker

with information that may lead him to generate inappropriate and

excessively costly decisions.

As seen in Cases I, II, and III of Section III.1, V(BMMP) may be

greater than or less than V(BMSP). In certain situations it is more

likely that V(BMMP) is greater than V(BMSP), and in others it is

always the case that V(BMMP) is greater than V(BMSP). Such situations

will be discussed below.
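Both directions can be seen numerically from the mixture form of the BMMP, whose variance follows the usual mixture-variance identity V(BMMP) = Σ P"(Mi)[σi² + (μi − μ)²] with μ = Σ P"(Mi)μi. The posterior probabilities, means, and variances below are invented for illustration.

```python
def v_bmmp(post, means, variances):
    """Variance of the BMMP mixture: sum of P"(Mi)[var_i + (mu_i - mu)^2]."""
    mu = sum(p * m for p, m in zip(post, means))
    return sum(p * (v + (m - mu) ** 2) for p, m, v in zip(post, means, variances))

# Case A: BMS selects M1 (posterior 0.8), which also has the lower
# predictive variance; the mixture variance then exceeds V(BMSP) = 1.0.
case_a = v_bmmp([0.8, 0.2], [10.0, 12.0], [1.0, 4.0])

# Case B: BMS again selects M1, but M1 now has the higher predictive
# variance and the component means agree; here V(BMMP) < V(BMSP) = 4.0.
case_b = v_bmmp([0.8, 0.2], [10.0, 10.0], [4.0, 1.0])
```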

It was noted in Section III.2.2 that a model's posterior prob-
ability is inversely related to its estimated residual variance, Si²,










and, therefore, directly related to its coefficient of determination,

Ri². Thus, if M1's posterior probability is high relative to M2's pos-

terior probability, then S1² is low relative to S2², and R1² is high rela-

tive to R2². If such were the case, it could be said that the accumu-

lated evidence supports M1 rather than M2 as being the more likely

data-generating source. Accordingly, a forecaster might be tempted to

invoke the BMS procedure or the maximize-R² rule and choose M1 and its

predictive distribution with which to forecast yF. But in such cases

it is more likely that V(BMMP) > V(BMSP) than it would be if the evi-

dence did not so clearly support one model or the other.¹ This is

explained below.
CLAIM: If |σ2² − σ1²| / (μ2 − μ1)² remains constant, the larger the

difference in P"(M1) and P"(M2), the more likely that V(BMMP) > V(BMSP).

DISCUSSION: Zellner has shown that for a normal regression model

(see the assumptions of Section III.2.4) with diffuse prior information

on the parameters of the model, V(BMSP), also denoted σi², is defined as

follows:²

σi² = [(n−1)Si²/(n−3)][DFi²/(Σ Dji²) + 1],                           (3.24)
J=1

¹From (3.12) it can be seen that when, say, P"(Mi) is close to
one, the difference between V(BMMP) and V(BMSP) is of no practical
significance. Under such circumstances a comparison of V(BMMP)
and V(BMSP) serves little purpose.

²Zellner, pp. 72-74.










where n is the sample size, i.e., the number of y values observed to
date; the Dji's are the values of model i's independent (explanatory)
variable, Di, observed to date; DFi is the value of Di that corresponds
to the next y value generated by the process in question; Si² is the
estimated residual variance of model i. It can be seen from (3.24)
that σi² is proportional to Si².
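Equation (3.24) transcribes directly into code; the arguments in the example call are arbitrary illustrative values, chosen only to make the arithmetic easy to check by hand.

```python
def predictive_variance(s2_i, d_obs, d_future, n):
    """V(BMSP) for model i under diffuse priors, transcribing (3.24):
    var_i = [(n-1) S_i^2 / (n-3)] [D_Fi^2 / (sum of D_ji^2) + 1]."""
    ssd = sum(d * d for d in d_obs)            # sum of D_ji^2, j = 1,...,n
    return (n - 1) * s2_i / (n - 3) * (d_future ** 2 / ssd + 1.0)

# With s2_i = 2, four unit-valued observations, D_Fi = 2, n = 4:
# (3*2/1) * (4/4 + 1) = 12.
v = predictive_variance(2.0, [1.0, 1.0, 1.0, 1.0], 2.0, 4)
```

Doubling Si² doubles the result, which is the proportionality the text notes.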
It is known from Geisel's work that P"(Mi) is inversely pro-
portional to Si².¹ Thus, the larger |P"(M1) − P"(M2)|, the larger is

|S1² − S2²|.
Conditions 2a and 2b of Section III.1 provide necessary and suf-

ficient conditions for V(BMMP) > V(BMSP). The conditions are that if,

say, σ1² < σ2², then P"(M1) must be greater than .5 or P"(M2) must be

greater than (σ2² − σ1²)/(μ2 − μ1)². Thus, other things equal,² if

P"(M1) < .5, then the larger is [P"(M2) − P"(M1)], the more likely it

is that P"(M2) satisfies either condition 2a or 2b, i.e., the more likely

it is that V(BMMP) > V(BMSP). Of course if P"(M1) > .5, then V(BMMP) is

greater than V(BMSP) regardless of how large [P"(M1) − P"(M2)] is.
The phrase "other things equal" used above refers specifically

to the ratio of |σ2² − σ1²| to (μ2 − μ1)². What is being said is that

given two model selection situations in which the absolute value of

the ratio of (σ2² − σ1²) to (μ2 − μ1)² is the same in both, but that in

the first situation |P"(M1) − P"(M2)| is larger than it is in the


¹See Section III.2.2.

²The "other things" are clarified in the next paragraph.









second, then it is more likely in the first situation that

V(BMMP) > V(BMSP).

This claim can be supported from another angle. Since σi² is

proportional to Si², it can be said that the smaller, say, S1² is in

relation to S2², the more likely it is that σ1² < σ2². By the Geisel

result discussed in Section III.2.2, the smaller is S1² in relation

to S2², the larger is P"(M1) in relation to P"(M2). Thus, the smaller

S1² is in relation to S2², the more likely it is that the model with

the lower predictive variance will be chosen by the BMS procedure.

Therefore, by Theorem 1 of Section III.1, the more likely it is that

V(BMMP) is greater than V(BMSP).

There is a special forecasting case worth noting in which V(BMMP)

is greater than V(BMSP) no matter which model the BMS procedure

chooses. It is a result of the following lemma.
LEMMA 3: If XF²/(Σ Xj²) = ZF²/(Σ Zj²), where each sum runs over

j=1,...,n, then the model with the lower estimated residual

variance, Si², also has the lower predictive variance, σi².

PROOF: Proof of this lemma follows directly from the definition

of σi². Recalling (3.24) and the model space assumptions of Section

III.2.4, σ1² and σ2² are defined as follows:

σ1² = [(n−1)S1²/(n−3)][XF²/(Σ Xj²) + 1],                             (3.25)












σ2² = [(n−1)S2²/(n−3)][ZF²/(Σ Zj²) + 1].                             (3.26)

Thus, since n, the sample size, is a constant, and XF²/(Σ Xj²) is

assumed equal to ZF²/(Σ Zj²), σ1² and σ2² are proportional to S1² and

S2², respectively.

Since the model chosen by BMS has the smaller estimated residual

variance, by Lemma 3 it also has the lower predictive variance. Thus, if

Lemma 3 holds, by Theorem 2 of Section III.1, V(BMMP) > V(BMSP). In

this special case, a decision maker using a forecast obtained via the

BMS procedure would be making a decision that fails to recognize the

full extent of the uncertainty involved in the outcome of his decision.

Under the assumptions of Section III.2.4, Zellner has shown that

the posterior expected value of the residual variance of, say, Model 1

is¹

E"(σε²) = (n−1)S1²/(n−3),                                            (3.27)

and Raiffa and Schlaifer have shown² that the posterior variance of,

say, β1 is

V"(β1) = [(n−1)S1²/(n−3)] / (Σ Xj²).                                 (3.28)
Thus, recalling (3.25), the predictive variance of model 1 may be

written

¹Zellner, p. 62.

²Howard Raiffa and Robert Schlaifer, Applied Statistical Decision
Theory (Cambridge, Mass.: The M.I.T. Press, 1961), pp. 349-55.









σ1² = V"(β1)XF² + E"(σε²).                                           (3.29)
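As a check, (3.27) and (3.28) recombine through (3.29) into exactly the predictive variance (3.25); the sketch below verifies the identity for arbitrary illustrative values of S1², Σ Xj², n, and XF.

```python
def posterior_pieces(s2, ssx, n):
    """E"(sigma^2) and V"(beta1) under diffuse priors, per (3.27)-(3.28)."""
    e_sigma2 = (n - 1) * s2 / (n - 3)      # (3.27)
    v_beta = e_sigma2 / ssx                # (3.28), ssx = sum of Xj^2
    return e_sigma2, v_beta

# (3.29): predictive variance = V"(beta1) XF^2 + E"(sigma^2), which should
# match the direct formula (3.25) term by term. Illustrative values:
s2, ssx, n, xF = 2.0, 50.0, 23, 5.0
e_s2, v_b = posterior_pieces(s2, ssx, n)
via_3_29 = v_b * xF ** 2 + e_s2
via_3_25 = (n - 1) * s2 / (n - 3) * (xF ** 2 / ssx + 1.0)
```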

The following lemma, based on the above facts, is offered to

further explain the relationship between V(BMMP) and V(BMSP):

LEMMA 4: If E"(σε²) < E"(σδ²), Σ Xj² ≥ Σ Zj², and XF² < ZF², then

V"(β1) < V"(β2) and σ1² < σ2².¹

PROOF: From equation (3.27) it can be seen that E"(σε²) and E"(σδ²)

are proportional to S1² and S2², respectively. Thus, E"(σε²) < E"(σδ²)

means that S1² < S2². From (3.28) it can be seen that V"(β1) and V"(β2)

are inversely related to Σ Xj² and Σ Zj², respectively. Consequently,

if S1² < S2² and Σ Xj² ≥ Σ Zj², it can be seen from (3.28) that

V"(β1) < V"(β2). Thus, since V"(β1) < V"(β2), E"(σε²) < E"(σδ²), and

XF² < ZF², it follows from equation (3.29) that σ1² < σ2².

If the conditions of Lemma 4 are fulfilled, the model selected

by the BMS procedure will have the lower predictive variance and by The-

orem 2 of Section III.1, V(BMMP) > V(BMSP). Thus, as is the case

when Lemma 3 holds, a decision maker using a forecast obtained via the

BMS procedure would be making a decision which fails to recognize the

full extent of the uncertainty involved in the outcome of his decision.

¹S1² and S2² could, of course, be substituted for E"(σε²) and E"(σδ²),
respectively, but one of the goals of this lemma is to explain the
relationship of V(BMMP) and V(BMSP) via the, perhaps, more easily
interpretable definition of σ1²: σ1² = V"(β1)XF² + E"(σε²).










The next section examines the decision-maker's posterior expected

losses from utilizing BMS- and BMC-generated predictions of yF.

III.2.7 A Comparison of Expected Losses

Given a loss function, sample y values, and a predictive distri-

bution of y, a forecaster can find an optimal point estimate for y

by minimizing the decision-maker's posterior expected loss:


          ∞
   min    ∫  L(yF,ŷF)f(yF|y,D,DF)dyF.                                (3.30)
    ŷF   −∞

It is well known that if a quadratic loss function is used in (3.30),

the solution to the minimization problem is the mean of f(yF|y,D,DF).

If the forecaster chooses to forecast via the BMS procedure he would

utilize a model predictive, f(yF|Mi,y,Di,DFi), to solve (3.30). The

solution to (3.30) and his point estimate for yF would therefore be

the mean of his model predictive, μi. If he chooses to forecast via

the BMC procedure, he would use a BMMP, f(yF|y,D,DF), to solve (3.30)

and his solution and point estimate would be the mean of the BMMP, μ.

As has been mentioned several times earlier in this chapter, however,

a forecaster who opts for forecasting via BMS is not making use of

all the available information about yF. The Bayesian Mixed Model

Predictive (BMMP) of the BMC procedure reflects all the available

information, whereas a BMSP is merely an approximation to the BMMP.

Therefore, the appropriate predictive distribution to use in (3.30) is

a BMMP. Consequently, the optimal solution to (3.30) is μ, the mean

of the BMMP, i.e., ŷF = μ. Only if the forecaster and/or decision









maker assess a probability of one for a particular model being the
true model of yF's process would a single model predictive provide
full information to the forecaster and/or decision maker and, hence,
an optimal solution to (3.30).1
Since the appropriate distribution to use in solving (3.30) is a

BMMP, the decision-maker's posterior expected loss using a BMS fore-
cast, μi, is greater than his posterior expected loss using a BMC
forecast, μ:

EL(μi) = ∫ L(yF,μi)f(yF|y,D,DF)dyF

       > EL(μ) = ∫ L(yF,μ)f(yF|y,D,DF)dyF.                           (3.31)


This follows from the fact that it is μ, and not μi, that minimizes

∫ L(yF,ŷF)f(yF|y,D,DF)dyF.                                           (3.32)

When P"(Mi) > 0, i=1,2, then only if μ1 = μ2 would, say, μ1, minimize
(3.32), since then μ1 = μ2 = P"(M1)μ1 + P"(M2)μ2 = μ. Of course if for
some i P"(Mi) = 1, then μi = μ also. But in the context of this disser-
tation, this case is of no interest.
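Under quadratic loss the inequality in (3.31) follows from the identity E[(yF − a)²] = V(BMMP) + (a − μ)² for expectations taken under the BMMP, so the penalty for forecasting with a component mean μi instead of μ is exactly (μi − μ)². A numerical sketch, with the weights and moments invented for illustration:

```python
# Posterior model probabilities, component predictive means and variances
# (illustrative assumptions only).
post, mu_i, var_i = [0.6, 0.4], [10.0, 14.0], [2.0, 2.0]

mu = sum(p * m for p, m in zip(post, mu_i))                   # BMC forecast
v_bmmp = sum(p * (v + (m - mu) ** 2)
             for p, m, v in zip(post, mu_i, var_i))           # mixture variance

def expected_sq_loss(a):
    """Posterior expected quadratic loss of point forecast a under the BMMP:
    E[(yF - a)^2] = V(BMMP) + (a - mu)^2."""
    return v_bmmp + (a - mu) ** 2
```

The minimum of `expected_sq_loss` is at a = μ, so the BMS forecast μ1 carries the extra expected loss (μ1 − μ)².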
Let C(BMC) and C(BMS) stand for the costs required to forecast
with BMC and BMS, respectively.² Then, assuming that the decision

maker's loss function and the cost functions C(BMC) and C(BMS) can be

¹Note that when P"(Mi) = 1, the BMSP and BMMP distributions are the
same.

²In general C(BMC) and C(BMS) cannot be computed without going
through the actual computations required by the BMC and BMS pro-
cedures.











meaningfully compared, if experimentation with the BMC and BMS pro-

cedures shows that in general

EL(μi) − EL(μ) > C(BMC) − C(BMS),

it is materially as well as theoretically advantageous for the fore-

caster to use the BMC procedure rather than the BMS procedure.

Future values of a random variable are typically predicted using

point or interval estimates. The implications of making point and

interval estimates via the BMS procedure as opposed to the BMC pro-

cedure are discussed in the next two sections.


III.2.8 Implications for Point Estimation

The point estimate of a future value of some random process will

be denoted by ŷF. The use of loss functions to determine optimal

point estimates was discussed in the preceding section of this chapter.

If a loss function can be specified by the forecaster and/or decision

maker, it should be used to determine ŷF. Frequently, however, loss

functions are too costly to develop and predictions must be made

without the information that a loss function provides. In such cases

forecasters usually examine yF's predictive distribution and choose a

measure of its central tendency as their estimate of yF. Their logic

is that central tendency measures are usually in the high density region

of the distribution and will not err significantly even if the actual

yF falls in a tail of yF's predictive distribution. Further, it is










well known that commonly-used loss functions often result in mean,

median, or modal estimates of parameters.

In the preceding section, it was noted that if the BMS procedure

and a quadratic loss function are utilized for forecasting, ŷF = μi.

However, even if a BMS forecaster does not have a loss function with

which to work, he might again choose the mean of the chosen model pre-

dictive, μi, as his point estimate of yF. In either of these cases, if

the BMS procedure chooses M1 and μ1 ≠ μ2,¹ then, for reasons explained

below, it can be said that the forecaster's point estimate is inappro-

priately high or low with probability one. For example, if μ1 < μ, and

μ1 is used by a BMS forecaster to predict yF, μ1 is said to be an

inappropriately low prediction of yF.

Suppose it is the next y value that the forecaster would like to

predict. By assessing nonzero model probabilities for M1 and M2, as is

done in both the BMS and BMC procedures, the forecaster/decision maker

is acknowledging that he believes the next observation could be gene-

rated by either M1 or M2. A prediction of the next yF value should

acknowledge this uncertainty. But forecasting procedures that utilize

the BMS procedure do not optimally account for this sort of uncertainty

(model specification uncertainty) because they do not appropriately

reflect the possibility that a rejected model may be the true model.

Thus, in the example of the preceding paragraph, μ1 is said to be an inappropriately low forecast because it does not appropriately reflect the fact that yF may be generated by M2.

1Since μ = P″(M1)μ1 + P″(M2)μ2 and P″(M1), P″(M2) > 0, μ1 ≠ μ2
means μ1 ≠ μ and μ2 ≠ μ.










Forecasts made utilizing the BMC procedure do reflect model specification uncertainty. μ, the mean of the BMMP distribution, is an example of a BMC-generated prediction. As can be seen by examining its definition, μ reflects the belief that yF may be generated by either M1 or M2:

μ = P″(M1)μ1 + P″(M2)μ2.

Since the decision maker's predictive distribution is a mixture of the model predictives, his optimal estimator will arise from the mixture as well, and in this case will be μ. It is just as appropriate to use μ when model specification uncertainty exists as it is to use, say, μ1 when it is known that yF will be generated by M1.
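The mixture mean can be computed in one line; the sketch below generalizes the two-model definition to any number of models. The posterior model probabilities and model predictive means are hypothetical.

```python
def bmmp_mean(post_probs, model_means):
    """Mean of the Bayesian Mixed Model Predictive:
    mu = sum over i of P''(Mi) * mu_i."""
    assert abs(sum(post_probs) - 1.0) < 1e-12  # probabilities must sum to 1
    return sum(p * m for p, m in zip(post_probs, model_means))

# Hypothetical two-model case: P''(M1) = .75, P''(M2) = .25, mu1 = 100, mu2 = 130
mu = bmmp_mean([0.75, 0.25], [100.0, 130.0])
print(mu)  # → 107.5, pulled toward the more probable model's mean
```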

If a forecaster's loss function is asymmetric, the mean of YF's

predictive distribution would not be appropriate for forecasting yF.

Suppose his losses are best represented by an asymmetric linear loss

function and model specification uncertainty exists. Then his optimal

point estimate for YF would be a fractile of YF's BMMP distribution.

A BMS forecaster utilizing an asymmetric linear loss function would use a fractile of the BMSP distribution. If the asymmetric linear loss function describes losses from underestimating yF as being greater than losses from overestimating yF, the BMS forecaster's point estimate would be a fractile of the BMSP distribution which is greater than the

1If the linear loss function were symmetric, the optimal point
estimate would be the median of the BMMP.










mean.1 In such cases the BMS forecaster may seriously underestimate yF and incur a large loss while thinking he is protecting against such an occurrence. Suppose μ1 < μ, and the BMS procedure selects M1. Then, if the forecaster chooses to estimate yF with a fractile of M1's BMSP distribution which is less than μ, say, the .7 fractile, the BMSP reflects his probability of underestimating yF as being only .3. But, if the .7 fractile of the BMSP distribution is less than μ, the BMMP distribution reflects his probability of underestimating yF as being greater than .5. Thus, a BMS forecaster may believe he is protecting against underestimating yF when in fact he has a higher probability of an underestimate than an overestimate.
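This phenomenon can be checked numerically. The sketch below uses normal distributions as dependency-free stand-ins for the Student predictives; the model means, standard deviations, and posterior probabilities are all hypothetical.

```python
import math

def norm_cdf(x, mu, sigma):
    """Normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def norm_ppf(q, mu, sigma, lo=-1e6, hi=1e6):
    """Inverse cdf by bisection (crude but dependency-free)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid, mu, sigma) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical normal stand-ins for the two Student model predictives
mu1, s1 = 100.0, 5.0          # selected model M1 (its mean lies below mu)
mu2, s2 = 130.0, 5.0          # rejected model M2
p1, p2 = 0.6, 0.4             # posterior model probabilities

# BMS estimate intended to guard against underestimation: .7 fractile of M1's BMSP
y_hat = norm_ppf(0.7, mu1, s1)

# Probability of underestimating yF evaluated under the BMMP (the mixture)
under = 1.0 - (p1 * norm_cdf(y_hat, mu1, s1) + p2 * norm_cdf(y_hat, mu2, s2))
print(round(under, 3))  # → 0.58: an underestimate is more likely than not
```

The BMSP claims only a .3 chance of underestimation, yet under the mixture the chance exceeds .5, exactly the trap described in the text.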

The results of this section were generated via a comparison of

BMC and BMS forecasts. It should be noted, however, that point esti-

mates determined by any procedure which utilizes a single model that

has been selected from a set of viable models will typically be mis-

placed. This is due to the fact that use of a single model, however

selected, has the effect of ignoring information provided by those

remaining models which have positive posterior probability.


III.2.9 Implications for Interval Estimation

The procedure of predicting that a future value of a random

process will take on a value between two specified real numbers with
1Raiffa and Schlaifer, p. 345, have shown that the predictive
distributions for yF yielded by M1 and M2 are Student. Since the
Student distribution is unimodal and symmetric, its mean and median
are equal.










some positive probability is referred to as Bayesian interval estimation. The interval represented by the two given numbers is called a credible interval. Often, a Bayesian will choose as his credible interval a Highest Posterior Density (HPD) region.1 Denoting yF's predictive distribution as f(yF|y), an interval I in the domain of yF is called a HPD region of content 1 - α if

a) P(yF ∈ I) = 1 - α;

b) yF1 ∈ I and yF2 ∉ I implies f(yF1|y) ≥ f(yF2|y).2

BMS interval forecasts of yF are determined from the predictive

distribution of yF generated by the model chosen by the BMS procedure,

i.e., a Bayesian Model Selection Predictive (BMSP). BMC interval forecasts of yF are determined from the appropriate Bayesian Mixed Model Predictive (BMMP).
Recall that under the assumptions of Section III.2.4, M1 and M2 define unimodal, symmetric distributions (Student distributions). Accordingly, a HPD credible interval determined from Mi's BMSP will be centered at μi. Thus, when model specification uncertainty exists and μ1 ≠ μ2, the midpoint of a BMS credible interval is inappropriately high or low in the same sense as BMS point estimates were in the

1Bayesian methods for optimal interval estimates exist when, as in
the case of point estimation, appropriate loss functions may be speci-
fied. See R. L. Winkler, "A Decision-Theoretic Approach to Interval
Estimation," Journal of the American Statistical Association, 67 (1972),
187-191.
2George E. P. Box and George C. Tiao, Bayesian Inference in
Statistical Analysis (Reading, MA: Addison-Wesley, 1973), p. 123.










preceding section. The discussion of this phenomenon with respect to

point estimates in the preceding section applies equally well here.

Under the assumptions that M1 and M2 are normal regression models, μ1 ≠ μ2, and P′(M1), P′(M2) > 0 (see Section III.2.4), the BMMP distribution is bimodal. Accordingly, an HPD BMC credible region will frequently consist of two intervals; one with midpoint μ1, the other with midpoint μ2.
midpoint P2. Interval forecasts that are comprised of more than one

interval will be referred to as split-interval forecasts or split

credible intervals. An HPD split credible interval serves to warn a

decision maker that it is highly probable that yF will take on a value

in one of two or more noncontiguous regions.
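A split HPD region can be found numerically by density thresholding: keep the highest-density points until they hold the desired probability mass. The sketch below uses two normal components as a stand-in for the bimodal BMMP; all weights, means, and scales are hypothetical.

```python
import math

def mix_pdf(y, p1, mu1, s1, p2, mu2, s2):
    """Density of a two-component normal mixture (stand-in for the BMMP)."""
    def npdf(y, mu, s):
        return math.exp(-0.5 * ((y - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))
    return p1 * npdf(y, mu1, s1) + p2 * npdf(y, mu2, s2)

# Hypothetical well-separated models, so the mixture is clearly bimodal
p1, mu1, s1 = 0.5, 100.0, 3.0
p2, mu2, s2 = 0.5, 130.0, 3.0

# Crude HPD by thresholding on a grid: collect the highest-density grid
# cells until they account for 95 percent of the probability mass.
grid = [80 + 0.05 * i for i in range(1401)]          # covers 80 .. 150
mass = [(mix_pdf(y, p1, mu1, s1, p2, mu2, s2) * 0.05, y) for y in grid]
mass.sort(reverse=True)
kept, total = [], 0.0
for m, y in mass:
    if total >= 0.95:
        break
    total += m
    kept.append(y)
kept.sort()

# Count disjoint intervals: a jump larger than one grid step starts a new one
gaps = sum(1 for a, b in zip(kept, kept[1:]) if b - a > 0.051)
print(gaps + 1)  # → 2: the 95 percent HPD region splits into two intervals
```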

The following two lemmas demonstrate how a credible interval formed using a BMSP can be misleading when model specification uncertainty exists. In Lemma 5, the intersection of the BMSPs of M1 and M2 between their modes is referred to as the inter-modal intersection. The yF value that corresponds to the inter-modal intersection will be denoted ỹF.

LEMMA 5: Let μ1 ≠ μ2 and suppose BMS chooses model i. If the length of a credible interval formed using the BMSP is less than or equal to 2|μi - ỹF|, then the BMS credible interval overstates the probability that it will cover yF.

PROOF: Recall that the BMMP is a mixture of predictive distributions generated by M1 and M2:

f(yF|y,D,DF) = P″(M1|y,D)f(yF|M1,y,D1,DF1) + P″(M2|y,D)f(yF|M2,y,D2,DF2).

Thus, f(yF|Mi,y,Di,DFi) may be greater than, less than, or equal to f(yF|y,D,DF). When P″(Mi|y,D) ≠ 1 and, say, μ1 < μ2 and yF < ỹF, then

f(yF|M1,y,D1,DF1) > f(yF|y,D,DF).

Thus, the probability of an interval centered on μ1 of length less than 2|μ1 - ỹF| containing yF is greater when the probability is evaluated via f(yF|M1,y,D1,DF1) rather than f(yF|y,D,DF).

If the conditions of Lemma 5 are fulfilled, the probability of a BMS credible interval covering yF is actually smaller than claimed by the forecaster using the BMS credible interval. Thus, the BMS credible interval overstates the probability of yF being covered and therefore understates the risk involved in using the interval forecast for decision-making purposes. Notice that since f(yF|y,D,DF) > f(yF|M1,y,D1,DF1) when yF > ỹF, it is unclear whether a BMS credible interval of length greater than 2|μ1 - ỹF| understates or overstates the probability that it will cover yF.
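Lemma 5 can be illustrated numerically: an interval centered at the selected model's mean claims more coverage under that model's BMSP than it actually has under the mixture. Normal distributions again stand in for the Student predictives, and every number below is hypothetical.

```python
import math

def norm_cdf(x, mu, s):
    """Normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (s * math.sqrt(2.0))))

# Hypothetical normal stand-ins for the two model predictives
p1, mu1, s1 = 0.6, 100.0, 5.0     # selected model M1
p2, mu2, s2 = 0.4, 130.0, 5.0     # rejected model M2

# Interval centered on mu1, half-length 8: total length 16, well under
# 2 * |mu1 - intermodal intersection| (the intersection lies near 115 here)
half = 8.0
lo, hi = mu1 - half, mu1 + half

cover_bmsp = norm_cdf(hi, mu1, s1) - norm_cdf(lo, mu1, s1)
cover_bmmp = (p1 * (norm_cdf(hi, mu1, s1) - norm_cdf(lo, mu1, s1))
              + p2 * (norm_cdf(hi, mu2, s2) - norm_cdf(lo, mu2, s2)))
print(cover_bmsp > cover_bmmp)  # → True: the BMS interval overstates coverage
```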

LEMMA 6: If μ1 = μ2 and the BMS procedure chooses the model with the higher (lower) predictive variance, then a BMS credible interval understates (overstates) the probability that it will cover yF.

PROOF: Theorem 3 of Section III.1 showed that if the BMS procedure chooses the model with the higher predictive variance, then V(BMMP) may be less than, greater than, or equal to V(BMSP). Recall that

V(BMMP) = P″(M1|y,D)σ1² + P″(M2|y,D)σ2² + P″(M1|y,D)(μ1 - μ)² + P″(M2|y,D)(μ2 - μ)²

and

E(BMMP) = μ = P″(M1|y,D)μ1 + P″(M2|y,D)μ2.

Thus, μ1 = μ2 implies that μ1 = μ2 = μ and

V(BMMP) = P″(M1|y,D)σ1² + P″(M2|y,D)σ2².

Therefore, if the BMS procedure chooses the model with the higher predictive variance, V(BMMP) < V(BMSP). Under the assumptions of this chapter, a BMSP distribution is Student and, therefore, unimodal and symmetric. Accordingly, a 95 percent credible interval, say, formed using the BMSP distribution will be wider than a 95 percent credible interval formed using the BMMP distribution. It follows that the probability of yF being covered by a BMMP (i.e., BMC) credible interval of the same size as a 95 percent BMS credible interval is greater than .95. Thus, it may be said that when the conditions of Lemma 6 are fulfilled, a BMS credible interval understates its probability of covering yF.
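The variance decomposition used in this proof can be checked directly. The helper and numbers below are hypothetical; the point is that with equal means the between-model spread vanishes, leaving a weighted average of the two model variances.

```python
def bmmp_variance(p1, mu1, v1, p2, mu2, v2):
    """V(BMMP): posterior-probability-weighted model variances plus the
    spread of the model means around the mixture mean mu."""
    mu = p1 * mu1 + p2 * mu2
    return p1 * v1 + p2 * v2 + p1 * (mu1 - mu) ** 2 + p2 * (mu2 - mu) ** 2

# Lemma 6 setting (hypothetical): equal predictive means, unequal variances
v = bmmp_variance(0.5, 100.0, 9.0, 0.5, 100.0, 25.0)
print(v)  # → 17.0, strictly between the two model variances
```

So a BMS forecaster who keeps the variance-25 model works with a wider distribution than the mixture warrants, and his credible interval understates its coverage probability.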

In this chapter, it has been shown that when model specification

uncertainty is present, the appropriate distribution with which to











characterize a data-generating process is the BMMP of the BMC proce-

dure. Failure to use the BMMP when model specification uncertainty

exists results in two interesting and seemingly contradictory effects.

First, in using a single model, however selected, information provided

by the remaining models which have positive posterior probability is

ignored. Second, in ignoring available information about model speci-

fication uncertainty, the forecaster behaves in many cases as if he

is facing a lesser degree of uncertainty than is actually the case.

Thus, the forecaster simultaneously discards relevant information

and behaves as if he possesses more information than is actually

possessed.1 This phenomenon was noted in both point and interval

forecasting situations.

In the next chapter, the BMC procedure is applied to single-

period economic control problems.
















1Christopher B. Barry and P. George Benson, "Specification Uncer-
tainty in Economic Forecasting and Control Models," University of
Minnesota, Graduate School of Business Administration, Working Paper
No. 35 (February, 1977), p. 7.










CHAPTER IV

MODEL SPECIFICATION UNCERTAINTY IN SINGLE-PERIOD
ECONOMIC CONTROL PROBLEMS


In Chapter III the consequences of forecasting with and without

considering model specification uncertainty were examined. Given the

existence of model specification uncertainty, it was concluded that the

BMC procedure was an appropriate procedure to utilize in predicting

future values of a random process. In this chapter, the BMC procedure

is applied to single-period economic control problems. In particular,

the BMC procedure will be used to find both certainty-equivalent and

optimal analytic solutions to single-period control problems. In both

cases, control solutions will be derived which take into consideration

costs that may be incurred by a controller as a result of his employing

a particular instrument (controllable variable) to help control a

random process.

By using the BMC procedure to solve economic control problems,

control solutions need not be artificially conditioned on the assump-

tion that a particular econometric model is in fact an accurate

characterization of the process whose control is desired. Instead, a

controller's model specification uncertainty is reflected in his con-

trol solutions, i.e., in his decisions concerning the levels or rates

at which to set his controllable variables. By explicitly recognizing

model specification uncertainty and including it through the BMC











procedure as part of the economic control problem, the controller is

appropriately specifying the risk that control entails.

In the following section, the economic control problem is defined,

references to previous work in this area are cited, and the integra-

tion of the BMC procedure and the economic control problem is discussed.

IV.1 The Economic Control Problem

The problem of influencing the outcome of some economic data-generating process, such as GNP, the rate of inflation, or the unemployment rate, is referred to as an economic control problem. More specifically,

given an econometric model of the data-generating process of interest,

a single-period economic control problem involves determining settings

for the model's instruments -- controllable variables -- in one time

period such that in the next time period the model's dependent variable

-- a desideratum or policy objective -- is close to a specified target

value or within a specified target interval. Controlling values for

the model's instruments are determined by optimizing an objective or

criterion function that is typically a function of the difference

between the target and realized values of the dependent variable.

An economic control problem may be expressed as:

min_X ∫ L(y - y*) f(y|X) dy (4.1)




1When control of a dependent variable is desired over more than
one time period, the problem is referred to as a multiperiod control
problem. For a discussion of multiperiod control, see Zellner,
336-354.










where y is the dependent variable whose control is desired, y* is the

value or target the controller would like y to attain next period, and

X is the model's vector-valued instrument. L(y - y*) is a loss function that describes the losses incurred by the controller as a result of y not equalling the target, y*, or not falling in the target interval.1 The loss values may be viewed as opportunity losses or "social costs". The function f(y|X) is the predictive distribution of future values of y as determined by the econometric model used to characterize y. Thus, in this case control of y is effected by setting X in the

current time period so as to minimize next period's expected loss.

If sample information about the process is available, the control

problem is still solved by minimizing expected loss, but the expecta-

tion is taken with respect to a predictive distribution that reflects

the sample information. Letting y and X now refer to observed data

points and YF and XF refer to the control-period values of the target

and control variable, respectively, the problem becomes

min_XF ∫ L(yF - yF*) f(yF|y,X,XF) dyF. (4.2)







1For a discussion of the sensitivity of control to the form of
the loss function, see Arnold Zellner and Martin Geisel, "Sensitivity
of Control to Uncertainty and Form of the Criterion Function," in
D. G. Watts (ed.), The Future of Statistics (New York: Academic Press,
1968), 269-283.












In the single-period control problem described above, it is

assumed that the controller knows the correct econometric model of the random process he wishes to control. Accordingly, in solving his

control problem, the controller has only to contend with parameter and

residual uncertainty, not model specification uncertainty. Much work

has been done on such problems by, for example, Fisher, Brainard,

Leland, Basu, and Zellner.2 The more complicated multiperiod control

problem, in which it is assumed that the controller knows the correct

econometric model of the process he desires to control, has also re-

ceived attention. See, for example, Aoki, Prescott, Zellner, Taylor,

and Chow.3 The approaches to single and multiperiod control of the

1For a more complete discussion of control problems, see Zellner,
319-359.

2Walter D. Fisher, "Estimation in the Linear Decision Model,"
International Economic Review, 3 (January, 1962), 1-29. William
Brainard, "Uncertainty and the Effectiveness of Policy," American
Economic Review, 57 (May, 1967), 411-425. H. Leland, "The Theory of
the Firm Facing Uncertain Demand," American Economic Review, 62 (1972),
278-291. A. Basu, "Economic Regulation Under Parameter Uncertainty"
(Ph.D. dissertation, Economics Department, Stanford University, 1973).
Arnold Zellner, An Introduction to Bayesian Inference in Econometrics
(New York: John Wiley and Sons, 1971), 319-336.
3Masanao Aoki, Optimization of Stochastic Systems (New York: Aca-
demic Press, 1967); Edward C. Prescott, "Adaptive Decision Rules for
Macro Economic Planning" (Ph.D. dissertation, Graduate School of Indus-
trial Administration, Carnegie-Mellon University, 1967); Edward C.
Prescott, "The Multi-Period Control Problem Under Uncertainty," Econo-
metrica, 40 (November, 1972), 1043-58; Zellner, pp. 336-54; John B.
Taylor, "Asymptotic Properties of Multiperiod Control Rules in the
Linear Regression Model," Institute for Mathematical Studies in the
Social Sciences, Stanford University, Technical Report No. 79, December,
1972; Gregory C. Chow, "Effect of Uncertainty on Optimal Control Poli-
cies," International Economic Review, 14 (October, 1973), 632-645;
Gregory C. Chow, "A Solution to Optimal Control of Linear Systems with
Unknown Parameters," Econometric Research Program, Princeton University,
Research Memorandum No. 157, December, 1973.










above-mentioned authors are theoretically appropriate only if the

controller can assert with probability one that the model he has

chosen to represent the process whose control is desired is in fact

the correct representation of the process. If the controller can make

such a statement, then in solving his control problem he only has to

contend with the model's parameter and residual uncertainty. If, how-

ever, he specifies the chosen model's appropriateness with a model

probability less than one, he is acknowledging the existence of model

specification uncertainty. Theoretically, if model specification

uncertainty exists, it should be dealt with in control problems. It

should not be ignored or assumed away via some model selection proce-

dure such as Bayesian Model Selection.1 Control procedures that fail to consider model specification uncertainty when it exists are not optimal procedures. Such procedures, in the sense of Chapter III, misspecify the uncertainty involved in controlling y and, therefore, the risk faced by the controller in using them to set the rate or level of his instruments.

Model specification uncertainty has not been explicitly considered

in the control literature. Since it may have an impact upon optimal

control solutions, it merits consideration. That the consideration of

model specification uncertainty in control contexts is important and warrants



1For a discussion of several model selection procedures that are
frequently used to establish econometric models of processes whose
control is desired, see Gaver and Geisel, "Discriminating Among Alter-
native Models: Bayesian and Non-Bayesian Methods," pp. 49-77.










a great deal of attention has been expressed by Pierce:


Another area of uncertainty has to do with our
models. I want to stress this because users of
control theory often tend to take models as given
and work out solutions without seriously ques-
tioning the reasonableness of the models. This
tendency is not very harmful when one is working
on technique. However, there is a real danger
of giving more credence to model results than
they deserve, especially if a particular policy
trajectory is highly influenced by the choice
of a model.1


He goes on to say:


The problem lies not only with uncertainty concerning
the true value of the model parameters, but also
with the structure of the models themselves.2


By utilizing the Bayesian Model Comparison procedure to develop

a Bayesian Mixed Model Predictive distribution for the process whose

single-period control is desired, a controller can determine settings

for his instruments in light of residual, parameter, and model speci-

fication uncertainty.3 When single-period control is desired, the

solution to the following minimization problem provides optimal

settings for the controller's instruments, DF:


min_DF ∫ L(yF,yF*) f(yF|y,D,DF) dyF. (4.3)


1J. L. Pierce, "Quantitative Analysis for Decisions at the Federal
Reserve," Annals of Economic and Social Measurement, 3 (1974), 1-9.
2Ibid.
3That a BMMP in fact reflects model specification uncertainty was
discussed in Chapter III.











Recall that D = (X,Z)′ and DF = (XF,ZF)′. The function f(yF|y,D,DF) is the controller's BMMP for the data-generating process.

terms in (4.3) are as previously defined. The only difference between

(4.3) and (4.1) or (4.2) is the use of a Bayesian Mixed Model Predic-

tive in (4.3) rather than a predictive distribution determined from a

single model.1 Since all relevant major forms of uncertainty (residual, parameter, and model specification uncertainty) are reflected in (4.3) and, therefore, influence its solution, it is said that (4.3) provides optimal settings for DF.
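Under a quadratic loss, (4.3) can be minimized numerically once the mixture predictive is specified. The sketch below is a simplified illustration, not the full solution: it holds the model predictive variances fixed (they actually depend on the instrument settings) and uses hypothetical slopes `b1`, `b2` for the two model predictive means, so the expected loss separates into a term in XF and a term in ZF.

```python
# Hypothetical posterior model probabilities, predictive-mean slopes,
# predictive variances (held fixed for the sketch), and target
p1, p2 = 0.6, 0.4
b1, b2 = 2.0, 4.0
v1, v2 = 1.0, 1.5
y_star = 120.0

def expected_loss(xf, zf):
    """Mixture expectation of (yF - y*)^2: each model contributes its
    squared bias at the target plus its predictive variance."""
    return (p1 * ((b1 * xf - y_star) ** 2 + v1)
            + p2 * ((b2 * zf - y_star) ** 2 + v2))

# Coarse grid search over instrument settings in [0, 100] at step 0.1
best = min(((expected_loss(x / 10, z / 10), x / 10, z / 10)
            for x in range(0, 1001) for z in range(0, 1001)),
           key=lambda t: t[0])
print(best[1], best[2])  # → 60.0 30.0, i.e. XF = y*/b1 and ZF = y*/b2
```

The grid minimum lands exactly where the analytic certainty-equivalent solution derived later in the chapter puts it.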

In the next section of this chapter, assumptions are presented

under which various single-period control solutions are obtained using

the BMC procedure in the remainder of the chapter.


IV.2 Model Space and Assumptions

In the remainder of this chapter, solutions will be derived for

single-period control problems based on the following assumptions:

1. The decision maker (controller) believes that one or

the other of the following two models is an accurate

representation of a data-generating process to be con-

trolled, but he is unsure which one is correct:

M1: y = β1X + ε;

M2: y = β2Z + δ. (4.4)

y is the target variable, and X and Z are two dif-

ferent nonrandom explanatory variables, instruments

1In what follows, control problems that deal with a single model
will be expressed as in (4.1) or (4.2).











over which the controller has complete control. β1 and β2 are unknown parameters. ε and δ are the usual normally distributed error terms, each with zero mean and unknown variance, σε² and σδ², respectively. It is also assumed that cov(β1,ε) = cov(β2,δ) = cov(ε,δ) = 0. Thus, M1 and M2 are normal univariate regression models which, to keep the number of each model's unknown parameters at two, have been forced through the origin.

2. The data-generating process over which control is desired

is stationary.

3. X and Z are uncorrelated, and only the controllable variable in the true model affects y. Thus, if M1 were the true model, β2 would be zero. If neither M1 nor M2 were the true model, it may be that β1 = β2 = 0.

4. The controller's loss function is a quadratic loss function of the form

L(yF,yF*) = K(yF - yF*)²

where K is a constant. In what follows, K is set equal to one without loss of generality.

Aside from the change in emphasis from forecasting to control, and the

assumption that X and Z are controllable variables, the above assump-

tions are similar to those under which the Bayesian Model Comparison

and Bayesian Model Selection procedures were compared in Chapter III

(see Section 111.2.4).

In the next section, certainty-equivalent solutions to single










period control problems will be derived under the above assumptions

with the use of the BMC procedure.


IV.3 Single-Period Certainty-Equivalent Control

If in attempting to control y's value next period, the controller

behaves this period as if E(y) is the value of y that will occur with

certainty next period, then E(y) is said to be a "certainty equivalent"

for y.1 When the process which generates y is known or assumed to be

known, the single-period control problem under parameter and residual

uncertainty is reduced to a deterministic problem. If the process

which generates y is not known, but is believed to be best represented

by one of N alternative models, the single-period control problem under

model specification, parameter, and residual uncertainty reduces to one

of control under model specification uncertainty alone. In this sec-

tion, single-period control solutions are derived for the controller

who admits to model specification uncertainty and behaves as if E(y)

will occur with certainty next period.

The use of the certainty equivalent E(y) for y reduces models 1 and 2 of Section IV.2 to the following:

M1: Ey|X(y) = Eβ1(β1)X + Eε(ε) = b1″X;

M2: Ey|Z(y) = Eβ2(β2)Z + Eδ(δ) = b2″Z. (4.5)


For a definition and discussion of certainty equivalence, see
Herbert A. Simon, "Dynamic Programming Under Uncertainty with a Qua-
dratic Criterion Function," Econometrica, 24 (1956), 74-81; and/or
C. Holt, J. F. Muth, F. Modigliani, and H. A. Simon, Planning Produc-
tion, Inventories and Work Force (Englewood Cliffs, N.J.: Prentice-
Hall, 1960), Chapter 6.









In (4.5) the parameter and residual uncertainty of M1 and M2 are treated as if they do not exist. Thus, if neither M1 nor M2 is known or assumed to be the true model, it is only necessary to deal with model specification uncertainty.

From (4.2), assuming that M1 is the true model, the single-period control problem is solved by determining:

min_XF ∫ L(yF,yF*) f(yF|y,X,XF) dyF = min_XF E_yF|y,X,XF L(yF,yF*). (4.6)

The solution to (4.6) yields the controller's minimum expected loss under M1's predictive distribution of yF. If the loss function is quadratic, as is assumed for the remainder of this chapter, (4.6) becomes:

min_XF E_yF|y,X,XF (yF - yF*)². (4.7)

The use by the controller of E_yF|y,X,XF(yF) as a certainty equivalent for yF reduces (4.7) to the following:

min_XF [E_yF|y,X,XF(yF) - yF*]². (4.8)

Note that (4.8) contains no random terms. Thus, (4.8) is minimized by the value of XF that sets E_yF|y,X,XF(yF) equal to yF*. From (4.5) it can be seen that E_yF|y,X,XF(yF) = E_β1|y(β1)XF + E_ε|y(ε) = b1″XF. Thus, the appropriate setting for XF is one such that b1″XF = yF*. Accordingly, XF should be set equal to yF*/b1″. This is the single-period certainty-equivalent solution when it is assumed that model 1 generates










yF.1 Similarly, when model 2 is assumed to generate yF, the single-period certainty-equivalent solution is to set ZF equal to yF*/b2″.2

Single-period certainty-equivalent control solutions in which a particular model is assumed to generate yF are derived assuming the mean of yF's predictive distribution is the value of yF that will occur with certainty next period. This is equivalent to assuming that β1 = b1″ and ε = 0. yF's predictive variance is ignored in the certainty-equivalent solution. Consequently, such solutions are not

optimal but are only approximations to optimal solutions, as explained by Zellner.3 In general, since certainty-equivalent control problems

ignore yF's predictive variance and, therefore, parameter and residual

uncertainty, their solutions are much easier and less costly to obtain

than are optimal control solutions. Consequently, certainty-equivalent

control may at times provide the controller with an attractive alter-

native to full-scale optimal control.
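The single-model certainty-equivalent rule above is one line of arithmetic once the posterior mean of β is in hand. The sketch below assumes a diffuse prior, so that the posterior mean b1″ coincides with the least-squares slope of a regression through the origin; the sample data and target are hypothetical.

```python
def ce_setting(x, y, y_target):
    """Certainty-equivalent setting for a through-origin regression model:
    choose XF so that b * XF = y*, where b = sum(x*y) / sum(x^2) is the
    least-squares slope (and, under a diffuse prior, the posterior mean
    of beta)."""
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return y_target / b

# Hypothetical sample generated with beta roughly 2; target y* = 10
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
print(round(ce_setting(x, y, 10.0), 3))  # → 5.025, close to 10 / 2
```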


IV.3.1 Certainty-Equivalent Control Using the BMMP Distribution

By using the Bayesian Model Comparison procedure's Bayesian Mixed

Model Predictive as YF's predictive distribution, single-period cer-

tainty-equivalent control solutions can be derived which reflect the

controller's model specification uncertainty concerning M1 and M2 of


1This solution can also be found in Zellner, pp. 320-322.

2Notice that these certainty-equivalent solutions make the control
target, y*, the mean of yF's predictive distribution.

3Zellner, pp. 322-324.









the previous section. This approach to certainty-equivalent control
also does not explicitly consider parameter and residual uncertainty
and is, therefore, also suboptimal. But by enabling the controller to
solve his control problems in light of any model specification uncer-
tainty, this approach may improve the effectiveness of certainty-
equivalent control solutions. As will be seen below, single-period
BMC certainty-equivalent control, as it will be called, requires little
more computational effort than the certainty-equivalent control solu-
tions derived above in which specification uncertainty was not treated.
The BMC certainty-equivalent control solution can be obtained from
the full-scale BMC control problem of (4.3). (4.3) is repeated here
and the BMC certainty-equivalent control solution is derived below:

min_DF ∫ L(yF,yF*) f(yF|y,D,DF) dyF. (4.9)

Recall from (3.1) that for the two-model case the BMMP distribution
would be expressed as


f(yF|y,D,DF) = P″(M1|y,X) f(yF|M1,y,X,XF)

+ P″(M2|y,Z) f(yF|M2,y,Z,ZF). (4.10)


In this case, D and DF are vectors of control variables: D = (X,Z)′ and DF = (XF,ZF)′. Accordingly, (4.9) may be written

min_DF [P″(M1|y,X) ∫ L(yF,yF*) f(yF|M1,y,X,XF) dyF

+ P″(M2|y,Z) ∫ L(yF,yF*) f(yF|M2,y,Z,ZF) dyF]. (4.11)









Under the assumption that the loss function is quadratic, (4.11) may be rewritten

min_DF [P″(M1|y,X) E_yF|M1,y,X,XF (yF - yF*)² + P″(M2|y,Z) E_yF|M2,y,Z,ZF (yF - yF*)²]. (4.12)

The use by the controller of E_yF|M1,y,X,XF(yF) = C1 and E_yF|M2,y,Z,ZF(yF) = C2 as certainty equivalents for yF in M1 and M2, respectively, means that yF is no longer treated as being random. Consequently, (4.12) reduces to

min_DF [P″(M1|y,X)(C1 - yF*)² + P″(M2|y,Z)(C2 - yF*)²]. (4.13)
DF
Because the right-hand term inside the brackets of (4.13) is not a
function of XF, and the left-hand term is not a function of ZF, the
vector optimizing (4.13), DF*, may be found by minimizing each of the
terms within the brackets separately. Thus, in order to find DF*,
the single-period BMC certainty-equivalent control solution, the
following two problems must be solved:

min_XF P″(M1|y,X)(C1 - yF*)² (4.14)

and

min_ZF P″(M2|y,Z)(C2 - yF*)². (4.15)

Noting that P″(M1|y,X) is not a function of XF, and that P″(M2|y,Z) is not a function of ZF, (4.14) and (4.15) reduce to the following:










min_XF (C1 - yF*)²; (4.16)

min_ZF (C2 - yF*)². (4.17)


Notice that (4.16) is the same as (4.8). Thus, for example, in order to solve (4.16), XF should be set equal to yF*/b1″. Thus, DF* = (yF*/b1″, yF*/b2″)′. In words, the BMC certainty-equivalent control solution is to set XF as if M1 were in fact the model generating yF and to set ZF as if M2 were the true model.
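The resulting rule reduces to two one-line settings. The helper below is illustrative; the posterior means and target are hypothetical.

```python
def bmc_ce_solution(b1, b2, y_target):
    """BMC certainty-equivalent control: DF* = (y*/b1'', y*/b2'')', i.e.
    set each instrument as if its own model were the true one."""
    return (y_target / b1, y_target / b2)

# Hypothetical posterior means b1'' = 2 and b2'' = 4, target y* = 120
print(bmc_ce_solution(2.0, 4.0, 120.0))  # → (60.0, 30.0)
```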
The rationale behind the BMC control solution is that since the controller is unsure which of the two control instruments, XF or ZF, affects yF, and since he believes that only one of them actually affects y, he should use them both fully in attempting to attain yF*.

Due to the restrictive assumptions under which it was derived, this

solution is somewhat unrealistic. A more realistic solution would

account for the possibility that (1) costs might be incurred for the

use of an instrument, especially for the use of an inappropriate

instrument; (2) both instruments might affect yF; (3) the instruments

interact in some manner; and/or (4) the process generating y may be

nonstationary. The first of these more realistic cases will be

discussed with respect to optimal BMC control in Section IV.4. At


1For a discussion of how to account for the cost of changing the
setting of an instrument in the optimal single-period control problem
in which a particular model is assumed to generate yF, see Zellner,
pp. 324-325.








that time, the appropriate optimal BMC and BMC certainty-equivalent

control solutions for various cases in which instrument use costs are

involved will be derived. Case (2) above is discussed in Section

IV.4.5, and an approach to case (4) is discussed in Chapter V.

IV.3.2 Risk Specification in Certainty-Equivalent Control

Even though a controller may behave as though the expected value
of y_F is certain to occur next period, he should not ignore the risk
involved in choosing to do so. This risk may be represented by
y_F's predictive variance. The larger y_F's predictive variance, the
more likely that L(y_F, y_F^*) = (y_F - y_F^*)^2 will be large. Thus, the

controller can use YF's predictive variance as a measure of the risk

involved in his attempt to attain yF*. If the risk appears too great,

the controller may choose a different control method, perhaps optimal

single-period control (discussed in Section IV.4), since it considers

the size of YF's predictive variance in determining settings for the

controller's policy instruments.
If the controller knows that a particular model, say M_1, will
generate y_F, then, recalling (3.24), y_F's predictive variance, and a measure
of the risk being taken by the controller, is

\sigma_1^2 = \frac{(n-1)S_1^2}{(n-3)} \left(1 + \frac{X_F^2}{\sum_{i=1}^{n} x_i^2}\right) \qquad (4.18)

where x_i is the ith sample observation of X. Notice that \sigma_1^2 is a function
of the controller's instrument setting, X_F. Consequently, since the con-
trol method chosen affects X_F, it also influences the size of \sigma_1^2.
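The dependence of the risk measure (4.18) on the instrument setting can be seen in a small numerical sketch (illustrative names only; it assumes n > 3):

```python
# Sketch of the predictive variance (4.18):
# sigma_1^2 = [(n-1)S_1^2/(n-3)] * (1 + X_F^2 / sum of x_i^2).
# Assumes n > 3; all names are illustrative.

def predictive_variance(x_f, s1_sq, x_sample):
    n = len(x_sample)
    sum_x_sq = sum(x * x for x in x_sample)
    return ((n - 1) * s1_sq / (n - 3)) * (1.0 + x_f ** 2 / sum_x_sq)

x_sample = [1.0, 2.0, 3.0, 4.0, 5.0]
# A larger instrument setting implies a larger predictive variance, hence more risk:
assert predictive_variance(4.0, 1.0, x_sample) > predictive_variance(2.0, 1.0, x_sample)
```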

In the case of certainty-equivalent control, X_F = y_F^*/b_1'' and the
predictive variance is

\sigma_1^2 = \frac{(n-1)S_1^2}{(n-3)} \left(1 + \frac{y_F^{*2}}{b_1''^2 \sum_{i=1}^{n} x_i^2}\right). \qquad (4.19)

If the controller acknowledges model specification uncertainty
and chooses to control via BMC certainty-equivalent control, then,
recalling (3.12), y_F's predictive variance is

\bar{\sigma}^2 = P''(M_1 \mid y, X)\sigma_1^2 + P''(M_2 \mid y, Z)\sigma_2^2 + P''(M_1 \mid y, X)(\bar{y}_1 - \bar{y})^2 + P''(M_2 \mid y, Z)(\bar{y}_2 - \bar{y})^2, \qquad (4.20)

where \bar{y}_i is the mean of y_F's predictive distribution as characterized by
model i, and \bar{y} is the mean of the BMMP distribution for y_F.
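Equation (4.20) has the familiar form of a mixture variance: the probability-weighted within-model variances plus the probability-weighted squared deviations of the model means from the BMMP mean. A minimal sketch (hypothetical names):

```python
# Sketch of the BMMP predictive variance (4.20): within-model variances
# plus between-model spread, weighted by the posterior model probabilities
# p1 = P''(M_1|y,X) and p2 = P''(M_2|y,Z), with p1 + p2 = 1 assumed.
# All names are illustrative.

def bmmp_variance(p1, var1, mean1, p2, var2, mean2):
    bmmp_mean = p1 * mean1 + p2 * mean2  # y-bar, the BMMP mean
    return (p1 * var1 + p2 * var2
            + p1 * (mean1 - bmmp_mean) ** 2
            + p2 * (mean2 - bmmp_mean) ** 2)

# When the two models disagree about the mean of y_F, the BMMP variance
# exceeds either within-model variance:
v = bmmp_variance(0.5, 1.0, 0.0, 0.5, 1.0, 2.0)
print(v)  # 2.0
```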
Equation (4.18) provides an appropriate risk measure only if the
controller is certain that a particular model will generate YF. If he
utilizes the BMS procedure to choose a model for yF, he is acknow-
ledging that he is uncertain of the form of the process generating

y_F. Consequently, (4.18) is not an appropriate measure of his risk.
If the BMS procedure chooses, say, M_1, \sigma_1^2 understates the risk in-
volved in his attempt to attain y_F^*. The following lemma is needed
to prove this statement.

LEMMA 7. Let n > 3. When BMS and the max-R^2 rule provide equivalent
methods for choosing between M_1 and M_2 (see Section III.2.2), and
single-period certainty-equivalent control is applied to the model
chosen by BMS, say, model 1, then V(BMSP) = \sigma_1^2 \leq V(BMMP) = \bar{\sigma}^2.

PROOF: Suppose the BMS procedure chooses M_1, and M_1 is used to
control y_F. The certainty-equivalent control solution is X_F = y_F^*/b_1''.
Accordingly, \sigma_1^2 is as shown in (4.19). Raiffa and Schlaifer show that

b_1'' = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2},

where x_i and y_i may be the ith sample observations of X and y, or may
reflect prior information about \beta_1 in a form equivalent to sample
observations.^1 Substituting for b_1'' in (4.19) yields

\sigma_1^2 = \frac{(n-1)S_1^2}{(n-3)} \left[1 + \frac{y_F^{*2} \sum_{i=1}^{n} x_i^2}{\left(\sum_{i=1}^{n} x_i y_i\right)^2}\right]. \qquad (4.21)


S_1^2, the estimated residual variance, is, by definition,

S_1^2 = \frac{1}{(n-1)} \left[\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} x_i y_i\right)^2}{\sum_{i=1}^{n} x_i^2}\right]. \qquad (4.22)

Thus, a necessary and sufficient condition for S_1^2 = S_2^2 is



Howard Raiffa and Robert Schlaifer, Applied Statistical Decision
Theory (Cambridge, Mass.: M.I.T. Press, 1961), p. 343.










\frac{\left(\sum_{i=1}^{n} x_i y_i\right)^2}{\sum_{i=1}^{n} x_i^2} = \frac{\left(\sum_{i=1}^{n} z_i y_i\right)^2}{\sum_{i=1}^{n} z_i^2}, \qquad (4.23)

where z_i is the ith sample observation of Z. Accordingly, if S_1^2 = S_2^2,
then (4.23) holds and, noting \sigma_1^2's definition in (4.21), \sigma_1^2 = \sigma_2^2.
If it can be shown that \partial\sigma_1^2/\partial S_1^2 > 0, then it can also be said that \sigma_2^2 >
\sigma_1^2 when S_2^2 > S_1^2. That \partial\sigma_1^2/\partial S_1^2 > 0 is demonstrated in the next paragraph.


Noting that

\frac{\left(\sum_{i=1}^{n} x_i y_i\right)^2}{\sum_{i=1}^{n} x_i^2} = \sum_{i=1}^{n} y_i^2 - (n-1)S_1^2 > 0, \qquad (4.24)

(4.21) can be rewritten

\sigma_1^2 = \frac{(n-1)S_1^2}{(n-3)} \left[1 + \frac{y_F^{*2}}{\sum_{i=1}^{n} y_i^2 - (n-1)S_1^2}\right]. \qquad (4.25)


Taking the partial derivative of (4.25) with respect to S_1^2 yields

\frac{\partial\sigma_1^2}{\partial S_1^2} = \frac{(n-1)y_F^{*2}}{(n-3)\left[\sum_{i=1}^{n} y_i^2 - (n-1)S_1^2\right]} + \frac{(n-1)^2 S_1^2 y_F^{*2}}{(n-3)\left[\sum_{i=1}^{n} y_i^2 - (n-1)S_1^2\right]^2} + \frac{(n-1)}{(n-3)}. \qquad (4.26)










By (4.24), the denominator and, therefore, the entire first term on the
rhs of (4.26) is positive when n > 3. The second and third terms on
the rhs of (4.26) are also obviously both positive if n > 3. Conse-
quently, \partial\sigma_1^2/\partial S_1^2 > 0.

Under the conditions of this lemma, the BMS procedure selects the
model with the lower S^2 (higher R^2). Thus, since \sigma_1^2 = \sigma_2^2 when S_1^2 = S_2^2,
and \sigma_2^2 > \sigma_1^2 when S_2^2 > S_1^2, the BMS procedure also selects the model with
the lower predictive variance. Recall Theorem 2 of Chapter III, in
which it was shown that if the BMS procedure chooses the model with
the lower predictive variance, then V(BMMP) \geq V(BMSP). Accordingly,
by Theorem 2, the desired result is obtained.^1
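The monotonicity established in the proof can also be confirmed numerically via (4.25). The fragment below is an illustrative check only, not part of the proof; names are hypothetical, and it assumes n > 3 and the positivity condition (4.24).

```python
# Numerical check (not a proof) that sigma_1^2 in (4.25) increases in S_1^2.
# Assumes n > 3 and sum(y_i^2) > (n-1)*S_1^2, per (4.24). Names are illustrative.

def sigma1_sq(s1_sq, n, sum_y_sq, y_target):
    # (4.25): [(n-1)S_1^2/(n-3)] * [1 + y_F*^2 / (sum y_i^2 - (n-1)S_1^2)]
    return ((n - 1) * s1_sq / (n - 3)) * (
        1.0 + y_target ** 2 / (sum_y_sq - (n - 1) * s1_sq))

n, sum_y_sq, y_target = 10, 100.0, 3.0
values = [sigma1_sq(s, n, sum_y_sq, y_target) for s in (1.0, 2.0, 5.0)]
assert values[0] < values[1] < values[2]  # monotone increasing in S_1^2
```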

If model specification uncertainty exists, the BMMP distribution
of the BMC procedure is the appropriate distribution with which to
characterize y_F; any other procedure for determining the predictive
distribution will fail to include relevant information. Accordingly,
when model specification uncertainty exists, the appropriate measure
of the controller's risk is \bar{\sigma}^2, not \sigma_1^2. As shown in Lemma 7, \sigma_1^2 is
less than \bar{\sigma}^2 and therefore understates the controller's risk.

In this section, certainty-equivalent solutions have been con-
sidered, but certainty-equivalent solutions are not fully optimal in

^1Recall that Theorem 2 showed that V(BMMP) \geq V(BMSP). It was
noted, however, that V(BMMP) = V(BMSP) only when one or the other of
P'(M_1) and P'(M_2) equalled one. But neither of these cases involves
model specification uncertainty and, therefore, they are not of interest
in this dissertation. Therefore, under the conditions of Lemma 7,
V(BMMP) > V(BMSP) in cases of interest.










general. In the next section, optimal single-period BMC control

solutions will be derived.


IV.4 Optimal Single-Period Control

A control procedure will be referred to as providing an optimal

solution to a control problem if it explicitly recognizes all existing

major forms of uncertainty and utilizes the information provided by

them in its solution to the control problem. Thus, for example, for

a control procedure and its solution to be called optimal when the

controller knows the form of the model generating y, but does not

know the parameter values of the model, the procedure need only con-

sider residual and parameter uncertainty. However, should specifica-

tion uncertainty concerning the model be present as well, the procedure

would have to consider residual uncertainty, parameter uncertainty,

and model specification uncertainty. As discussed in Section IV.3, the

certainty-equivalent approach to economic control problems treats

residual and parameter uncertainty suboptimally and, unless BMC

certainty-equivalent control procedures are used, also treats model

specification uncertainty suboptimally. In this section, optimal

control solutions, i.e., solutions that appropriately treat residual,

parameter and model specification uncertainty, will be derived using

the BMC procedure. These solutions will be referred to as "optimal

BMC control solutions."

Before proceeding with the derivation of optimal BMC control

solutions, mention should be made of the optimal control solution for

the case in which the controller knows the form of the model generating










y_F, but not its parameters. Assuming M_1 is the true model, and em-
ploying a quadratic loss function, Zellner shows that the optimal
solution to (4.2) is^1

X_F = \frac{y_F^*}{\dfrac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} + \dfrac{(n-1)S_1^2}{(n-3)\sum_{i=1}^{n} x_i y_i}}. \qquad (4.27)

Equation (4.27) may be rewritten so that its relationship to the
certainty-equivalent solution to this problem may be examined:

X_F = \frac{y_F^*}{b_1''} \left[\frac{1}{1 + \dfrac{(n-1)S_1^2}{(n-3)\,b_1''^2 \sum_{i=1}^{n} x_i^2}}\right]. \qquad (4.28)

Recall from Section IV.3 that the certainty-equivalent solution is
X_F = y_F^*/b_1''.

Thus, as Zellner has noted, the certainty-equivalent solution is just
the first term on the rhs of (4.28).^2 Zellner has shown that as the
precision of the estimation of \beta_1 improves (i.e., as the posterior
variance of \beta_1 decreases),

^1Zellner, pp. 320-322.
^2Ibid.











\frac{(n-1)S_1^2}{(n-3)\,b_1''^2 \sum_{i=1}^{n} x_i^2} \to 0

and, accordingly, the second term on the rhs of (4.28) approaches 1.
Thus, if b_1'' is a very precise estimate of \beta_1, the certainty-equivalent
solution is approximately equal to (4.28).^1 Zellner has also demonstrated
that the use of the certainty-equivalent solution leads to higher expected
losses than the use of (4.28).^2
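The relationship between (4.28) and the certainty-equivalent solution can be sketched numerically. In the fragment below (hypothetical names; n > 3 is assumed), the bracketed factor in (4.28) appears as a shrinkage multiplier that lies strictly between 0 and 1 whenever S_1^2 > 0.

```python
# Sketch comparing the optimal setting (4.28) with the certainty-equivalent
# setting y_F*/b1''. The bracketed factor in (4.28) shrinks the
# certainty-equivalent setting toward zero; the two coincide as the
# posterior variance of beta_1 vanishes. Assumes n > 3; names are illustrative.

def optimal_xf(y_target, b1_post_mean, s1_sq, x_sample):
    n = len(x_sample)
    sum_x_sq = sum(x * x for x in x_sample)
    shrink = 1.0 / (1.0 + (n - 1) * s1_sq / ((n - 3) * b1_post_mean ** 2 * sum_x_sq))
    return (y_target / b1_post_mean) * shrink

def ce_xf(y_target, b1_post_mean):
    return y_target / b1_post_mean

x_sample = [1.0, 2.0, 3.0, 4.0, 5.0]
# With a noisy estimate of beta_1, the optimal setting is more conservative:
assert optimal_xf(10.0, 2.0, 5.0, x_sample) < ce_xf(10.0, 2.0)
# As S_1^2 -> 0 (a very precise b1''), the two solutions coincide:
assert abs(optimal_xf(10.0, 2.0, 0.0, x_sample) - ce_xf(10.0, 2.0)) < 1e-12
```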

IV.4.1 Optimal BMC Control
The optimal BMC control solution is obtained by minimizing ex-
pected loss over X_F and Z_F using the BMMP distribution of y_F. This
problem was stated in (4.3) and is repeated here for convenience:

\min_{D_F} \int L(y_F, y_F^*)\, f(y_F \mid y, D, D_F)\, dy_F. \qquad (4.29)

(4.29) is solved below for the two-model case (see the assumptions of
Section IV.2) under study in this dissertation.

Substituting (4.10) for f(y_F \mid y, D, D_F) in (4.29), the minimization
problem becomes

\min_{D_F} \left[P''(M_1 \mid y, X) \int L(y_F, y_F^*)\, f(y_F \mid M_1, y, X, X_F)\, dy_F + P''(M_2 \mid y, Z) \int L(y_F, y_F^*)\, f(y_F \mid M_2, y, Z, Z_F)\, dy_F\right]. \qquad (4.30)



^1Ibid.
^2Zellner, pp. 322-324.









Recalling that \beta_1 and \epsilon, and \beta_2 and \delta, are assumed to be independent,
the following transformations of variables can be made in the first
and second terms of (4.30), respectively, so that (4.30) may be written
in a more convenient form:

y_F = \beta_1 X_F + \epsilon,
y_F = \beta_2 Z_F + \delta.

Thus, utilizing a quadratic loss function for L(y_F, y_F^*), (4.30) may
be written

\min_{D_F} \left\{P''(M_1 \mid y, X)\, E_{\beta_1, \epsilon \mid y, X, X_F}\left[y_F^* - (\beta_1 X_F + \epsilon)\right]^2 + P''(M_2 \mid y, Z)\, E_{\beta_2, \delta \mid y, Z, Z_F}\left[y_F^* - (\beta_2 Z_F + \delta)\right]^2\right\}. \qquad (4.31)

It can be seen that, as in the case of the BMC certainty-equivalent
control problem, (4.31) separates into two minimization problems,

\min_{X_F} E_{\beta_1, \epsilon \mid y, X, X_F}\left[y_F^* - (\beta_1 X_F + \epsilon)\right]^2 \qquad (4.32)

and

\min_{Z_F} E_{\beta_2, \delta \mid y, Z, Z_F}\left[y_F^* - (\beta_2 Z_F + \delta)\right]^2. \qquad (4.33)

Recall that (4.2) is the mathematical statement of the control problem
when it is known that M_1 will generate y_F. After the transformation
of variables noted above, (4.2) and (4.32) are the same. Thus, the
solution to (4.32) will be the same as that derived by Zellner for
(4.2). Except that it is M_2 that is known to be generating y_F in



