A knowledge-intensive machine-learning approach to the principal-agent problem


Material Information

A knowledge-intensive machine-learning approach to the principal-agent problem
Physical Description:
xvi, 220 leaves : ill. ; 29 cm.
Garimella, Kiran K
Publication Date:
1993

Subjects / Keywords:
Decision making   ( lcsh )
Machine learning   ( lcsh )
Agency Theory / Expertensystem / Lernprozess / Theorie
Decision and Information Sciences thesis Ph. D
Dissertations, Academic -- Decision and Information Sciences -- UF
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )


Thesis (Ph. D.)--University of Florida, 1993.
Includes bibliographical references (leaves 206-218).
Additional Physical Form:
Also available online.
Statement of Responsibility:
by Kiran K. Garimella.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 030202441
oclc - 30381670

Full Text






To my mother, Dr. Seeta Garimella


I thank Prof. Gary Koehler, chairman of the DIS department, a guru to me in the
deepest sense of the word who made it possible for me to grow intellectually and
experience the richness and fulfillment of an active mind.

I also want to thank Prof. Selcuk Erenguc for encouraging me at all times; Prof.
Harold Benson who taught me care, caution, and clarity in thinking by patiently teaching
me proof techniques in mathematics; Prof. David E.M. Sappington for giving me
invaluable lessons, by his teaching and example, on research techniques, for writing
papers and books that are replete with elegance and clarity, and for ensuring that my
research is meaningful and interesting from an economist's perspective; Prof. Sanford
V. Berg, for providing valuable suggestions in agency theory; and Prof. Richard Elnicki,
Prof. Antal Majthay, and Prof. Ira Horowitz for their advice and help with the research.

I thank Prof. Malay Ghosh, Department of Statistics, and Prof. Scott
McCullough, Department of Mathematics, for their guidance in statistics and
mathematics.


I also thank the administrative staff of the DIS department for helping me in
numerous ways and making my work extremely pleasant.

I thank my wife, Raji, for her patience and understanding while I put in long and
erratic hours.

I cannot conclude without expressing my deepest sense of gratitude to my mother,
Dr. Seeta Garimella, who constantly encouraged me in ways too numerous to recount
and made it possible for me to pursue my studies in the land of my dreams.


ACKNOWLEDGMENTS

LIST OF TABLES

ABSTRACT

1 OVERVIEW

2.1 Introduction
2.2 Expert Systems
2.3 Machine Learning
2.3.1 Introduction
2.3.2 Definitions and Paradigms
2.3.3 Probably Approximately Close Learning

3 GENETIC ALGORITHMS

3.1 Introduction
3.2 The Michigan Approach
3.3 The Pitt Approach

4 THE MAXIMUM ENTROPY PRINCIPLE

4.1 Historical Introduction
4.2 Examples

5 THE PRINCIPAL-AGENT PROBLEM

5.1 Introduction
5.1.1 The Agency Relationship
5.1.2 The Technology Component of Agency
5.1.3 The Information Component of Agency
5.1.4 The Timing Component of Agency
5.1.5 Limited Observability, Moral Hazard, and Monitoring
5.1.6 Informational Asymmetry, Adverse Selection, and Screening
5.1.7 Efficiency of Cooperation and Incentive Compatibility
5.1.8 Agency Costs
5.2 Formulation of the Principal-Agent Problem
5.3 Main Results in the Literature
5.3.1 Model 1: The Linear-Exponential-Normal Model
5.3.2 Model 2
5.3.3 Model 3
5.3.4 Model 4: Communication under Asymmetry
5.3.5 Model G: Some General Results

6 METHODOLOGICAL ANALYSIS

7 MOTIVATION THEORY

8 RESEARCH FRAMEWORK

9 MODEL 3

9.1 Introduction
9.2 An Implementation and Study
9.3 Details of Experiments
9.3.1 Rule Representation
9.3.2 Inference Method
9.3.3 Calculation of Satisfaction
9.3.4 Genetic Learning Details
9.3.5 Statistics Captured for Analysis
9.4 Results
9.5 Analysis of Results

10 REALISTIC AGENCY MODELS

10.1 Characteristics of Agents
10.2 Learning with Specialization and Generalization
10.3 Notation and Conventions
10.4 Model 4: Discussion of Results
10.5 Model 5: Discussion of Results
10.6 Model 6: Discussion of Results
10.7 Model 7: Discussion of Results
10.8 Comparison of the Models
10.9 Examination of Learning

11 CONCLUSION

12 FUTURE RESEARCH

12.1 Nature of the Agency
12.2 Behavior and Motivation Theory
12.3 Machine Learning
12.4 Maximum Entropy

APPENDIX FACTOR ANALYSIS

REFERENCES

BIOGRAPHICAL SKETCH


LIST OF TABLES

9.1: Characterization of Agents

9.2: Iteration of First Occurrence of Maximum Fitness

9.3: Learning Statistics for Fitness of Final Knowledge Bases

9.4: Entropy of Final Knowledge Bases and Closeness to the Maximum

9.5: Frequency (as Percentage) of Values of Compensation Variables in the Final
Knowledge Base in Experiment 1

9.6: Range, Mean and Standard Deviation of Values of Compensation Variables
in the Final Knowledge Base in Experiment 1

9.7: Correlation Analysis of Values of Compensation Variables in the Final
Knowledge Base in Experiment 1

9.8: Factor Analysis (Principal Components Method) of the Final Knowledge
Base of Experiment 1

9.9: Factor Analysis (Principal Components Method) of the Final Knowledge
Base of Experiment 1 - Factor Pattern

9.10: Experiment 1 - Varimax Rotation

9.11: Frequency (as Percentage) of Values of Compensation Variables in the
Final Knowledge Base in Experiment 2

9.12: Range, Mean and Standard Deviation of Values of Compensation Variables
in the Final Knowledge Base in Experiment 2

9.13: Correlation Analysis of Values of Compensation Variables in the Final
Knowledge Base in Experiment 2

9.14: Factor Analysis (Principal Components Method) of the Final Knowledge
Base of Experiment 2 - Eigenvalues of the Correlation Matrix

9.15: Factor Analysis (Principal Components Method) of the Final Knowledge
Base of Experiment 2 - Factor Pattern

9.16: Factor Analysis (Principal Components Method) of the Final Knowledge
Base of Experiment 2 - Varimax Rotated Factor Pattern

9.17: Frequency (as Percentage) of Values of Compensation Variables in the
Final Knowledge Base in Experiment 3

9.18: Range, Mean and Standard Deviation of Values of Compensation Variables
in the Final Knowledge Base in Experiment 3

9.19: Correlation Analysis of Values of Compensation Variables in the Final
Knowledge Base in Experiment 3

9.20: Factor Analysis (Principal Components Method) of the Final Knowledge
Base of Experiment 3 - Eigenvalues of the Correlation Matrix

9.21: Factor Analysis (Principal Components Method) of the Final Knowledge
Base of Experiment 3 - Factor Pattern

9.22: Factor Analysis (Principal Components Method) of the Final Knowledge
Base of Experiment 3 - Varimax Rotated Factor Pattern

9.23: Frequency (as Percentage) of Values of Compensation Variables in the
Final Knowledge Base in Experiment 4

9.24: Range, Mean and Standard Deviation of Values of Compensation Variables
in the Final Knowledge Base in Experiment 4

9.25: Correlation Analysis of Values of Compensation Variables in the Final
Knowledge Base in Experiment 4

9.26: Factor Analysis (Principal Components Method) of the Final Knowledge
Base of Experiment 4 - Eigenvalues of the Correlation Matrix

9.27: Factor Analysis (Principal Components Method) of the Final Knowledge
Base of Experiment 4 - Factor Pattern

9.28: Factor Analysis (Principal Components Method) of the Final Knowledge
Base of Experiment 4 - Varimax Rotated Factor Pattern

9.29: Frequency (as Percentage) of Values of Compensation Variables in the
Final Knowledge Base in Experiment 5

9.30: Range, Mean and Standard Deviation of Values of Compensation Variables
in the Final Knowledge Base in Experiment 5

9.31: Correlation Analysis of Values of Compensation Variables in the Final
Knowledge Base in Experiment 5

9.32: Factor Analysis (Principal Components Method) of the Final Knowledge
Base of Experiment 5 - Eigenvalues of the Correlation Matrix

9.33: Factor Analysis (Principal Components Method) of the Final Knowledge
Base of Experiment 5 - Factor Pattern

9.34: Factor Analysis (Principal Components Method) of the Final Knowledge
Base of Experiment 5 - Varimax Rotated Factor Pattern

9.35: Summary of Factor Analytic Results for the Five Experiments

9.36: Expected Factor Identification of Compensation Variables for the Five
Experiments Derived from the Direct Factor Analytic Solution

9.37: Expected Factor Identification of Compensation Variables for the Five
Experiments Derived from the Varimax Rotated Factor Analytic Solution

9.38: Expected Factor Identification of Behavioral and Risk Variables for the
Five Experiments Derived from the Direct Factor Pattern

9.39: Expected Factor Identification of Behavioral and Risk Variables for the
Five Experiments Derived from Varimax Rotated Factor Analytic Solution

10.1: Correlation of LP and CP with Simulation Statistics (Model 4)

10.2: Correlation of LP and CP with Compensation Offered to Agents
(Model 4)

10.3: Correlation of LP and CP with Compensation in the Principal's Final KB
(Model 4)

10.4: Correlation of LP and CP with the Movement of Agents (Model 4)

10.5: Correlation of LP with Agent Factors (Model 4)

10.6: Correlation of LP and CP with Agents' Satisfaction (Model 4)

10.7: Correlation of LP and CP with Agents' Satisfaction at Termination
(Model 4)

10.8: Correlation of LP and CP with Agency Interactions (Model 4)

10.9: Correlation of LP with Rule Activation (Model 4)

10.10: Correlation of LP with Rule Activation in the Final Iteration (Model 4)

10.11: Correlation of LP and CP with Principal's Satisfaction and Least Squares
(Model 4)

10.12: Correlation of Agent Factors with Agent Satisfaction (Model 4)

10.13: Correlation of Principal's Satisfaction with Agent Factors (Model 4)

10.14: Correlation of Principal's Satisfaction with Agents' Satisfaction
(Model 4)

10.15: Correlation of Principal's Last Satisfaction with Agents' Last Satisfaction
(Model 4)

10.16: Correlation of Principal's Factor with Agent Factors (Model 4)

10.17: Correlation of LP and CP with Simulation Statistics (Model 5)

10.18: Correlation of LP and CP with Compensation Offered to Agents
(Model 5)

10.19: Correlation of LP and CP with Compensation in the Principal's Final
Knowledge Base (Model 5)

10.20: Correlation of LP and CP with the Movement of Agents (Model 5)

10.21: Correlation of LP with Agent Factors (Model 5)

10.22: Correlation of LP and CP with Agents' Satisfaction (Model 5)

10.23: Correlation of LP and CP with Agents' Satisfaction at Termination
(Model 5)

10.24: Correlation of LP and CP with Agency Interactions (Model 5)

10.25: Correlation of LP with Rule Activation (Model 5)

10.26: Correlation of LP with Rule Activation in the Final Iteration (Model 5)

10.27: Correlation of LP and CP with Payoffs from Agents (Model 5)

10.28: Correlation of LP and CP with Principal's Satisfaction, Principal's Factor
and Least Squares (Model 5)

10.29: Correlation of Agent Factors with Agent Satisfaction (Model 5)

10.30: Correlation of Principal's Satisfaction with Agent Factors (Model 5)

10.31: Correlation of Principal's Satisfaction with Agents' Satisfaction
(Model 5)

10.32: Correlation of Principal's Last Satisfaction with Agents' Last Satisfaction
(Model 5)

10.33: Correlation of Principal's Satisfaction with Outcomes from Agents
(Model 5)

10.34: Correlation of Principal's Factor with Agents' Factors (Model 5)

10.35: Correlation of LP and CP with Simulation Statistics (Model 6)

10.36: Correlation of LP and CP with Compensation Offered to Agents
(Model 6)

10.37: Correlation of LP and CP with Compensation in the Principal's Final
Knowledge Base (Model 6)

10.38: Correlation of LP and CP with the Movement of Agents (Model 6)

10.39: Correlation of LP and CP with Agent Factors (Model 6)

10.40: Correlation of LP and CP with Agents' Satisfaction (Model 6)

10.41: Correlation of LP and CP with Agents' Satisfaction at Termination
(Model 6)

10.42: Correlation of LP and CP with Agency Interactions (Model 6)

10.43: Correlation of LP and CP with Rule Activation (Model 6)

10.44: Correlation of LP and CP with Rule Activation in the Final Iteration
(Model 6)

10.45: Correlation of LP and CP with Principal's Satisfaction and Least Squares
(Model 6)

10.46: Correlation of Agents' Factors with Agents' Satisfaction (Model 6)

10.47: Correlation of Principal's Satisfaction with Agents' Factors and Agents'
Satisfaction (Model 6)

10.48: Correlation of Principal's Factor with Agents' Factor (Model 6)

10.49: Correlation of LP and CP with Simulation Statistics (Model 7)

10.50: Correlation of LP and CP with Compensation Offered to Agents
(Model 7)
10.51: Correlation of LP and CP with Compensation in the Principal's Final
Knowledge Base (Model 7)

10.52: Correlation of LP and CP with the Movement of Agents (Model 7)

10.53: Correlation of LP with Agent Factors (Model 7)

10.54: Correlation of LP and CP with Agents' Satisfaction (Model 7)

10.55: Correlation of LP and CP with Agents' Satisfaction at Termination
(Model 7)

10.56: Correlation of LP and CP with Agency Interactions (Model 7)

10.57: Correlation of LP and CP with Rule Activation (Model 7)

10.58: Correlation of LP with Rule Activation in the Final Iteration (Model 7)

10.59: Correlation of LP and CP with Payoffs from Agents (Model 7)

10.60: Correlation of LP and CP with Principal's Satisfaction (Model 7)

10.61: Correlation of Agent Factors with Agent Satisfaction (Model 7)

10.62: Correlation of Principal's Satisfaction with Agent Factors (Model 7)

10.63: Correlation of Principal's Satisfaction with Agents' Satisfaction
(Model 7)

10.64: Correlation of Principal's Last Satisfaction with Agents' Last Satisfaction
(Model 7)

10.65: Correlation of Principal's Satisfaction with Outcomes from Agents
(Model 7)

10.66: Correlation of Principal's Factor with Agents' Factor (Model 7)

10.67: Comparison of Models

10.68: Probability Distributions for Models 4, 5, 6, and 7


Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy



Kiran K. Garimella

August 1993

Chairperson: Gary J. Koehler
Major Department: Decision and Information Sciences

The objective of the research is to explore an alternative approach to the solution
of the principal-agent problem, which is extremely important since it is applicable in
almost all business environments. The problem has traditionally been addressed within
the optimization-analytical framework. However, there is a clearly recognized need for
techniques that allow the incorporation of behavioral and motivational characteristics of
the agent and the principal that influence their selection of effort and payment levels.

The alternative proposed is a knowledge-intensive, machine-learning approach,
where all the relevant knowledge and the constraints of the problem are taken into
account in the form of knowledge bases.

Genetic algorithms are employed for learning, supplemented in later models by
specialization and generalization operators. A number of models are studied in order of
increasing complexity and realism. Initial studies are presented that provide
counterexamples to traditional agency theory and that emphasize the need for going
beyond the traditional framework. The new framework is more robust, easily extensible
in a modular manner, and yields contracts tailored to the behavioral characteristics of
individual agents.

Factor analysis of final knowledge bases after extensive learning shows that
elements of compensation besides basic pay and share of output play a greater role in
characterizing good contracts. The learning algorithms tailor contracts to the behavioral
and motivational characteristics of individual agents. Further, perfect information did
not yield the highest satisfaction, nor did the complete absence of information yield the
least satisfaction. This calls into question the traditional agency wisdom that
more information is always desirable.

Further models examine the effect of two different policies by which the principal
evaluates agents' performance: individualized (discriminatory) evaluation versus
relative (nondiscriminatory) evaluation. The results suggest guidelines for employing
different types of models to simulate different agency environments.


The basic research addressed by this dissertation is the theory and application of
machine learning to assist in the solution of decision problems in business. Much of the
earlier research in machine learning was devoted to addressing specific and ad hoc
problems, or to filling a gap or making up for some deficiency in an existing framework,
usually motivated by developments in expert systems and statistical pattern recognition.
The first applications were to technical problems such as knowledge acquisition, coping
with a changing environment and filtering of noise (where filtering and optimal control
were considered inadequate because of poorly understood domains), data or knowledge
reduction (where the usual statistical theory is inadequate to express the symbolic
richness of the underlying domain), and scene and pattern analysis (where the classical
statistical techniques fail to take into account pertinent prior information; see, for
example, Jaynes, 1986a).

The initial research was concerned with gaining an understanding of learning in
extremely simple toy world models, such as checkers (Samuel, 1963), the SHRDLU blocks
world (Winograd, 1972), and various discovery systems. The insights gained by such
research soon influenced serious applications.

The underlying domains of most of the early applications were relatively well
structured, whether they were the stylized rules of checkers and chess or the digitized
images of visual sensors. Our research focus is on importing these ideas into the area
of business decision making.

Genetic algorithms, a relatively new paradigm of machine learning, deal with
adaptive processes modeled on ideas from natural genetics. Genetic algorithms use the
ideas of parallelism, randomized search, fitness criteria for individuals, and the formation
of new exploratory solutions using reproduction, survival and mutation. The concept is
extremely elegant, powerful, and easy to work with from the viewpoint of the amount
of knowledge necessary to start the search for solutions.
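
As a minimal sketch of these ideas (an illustration only, not the implementation
used in this research), the following Python fragment evolves fixed-length bit strings.
The bit-string encoding, the one-point crossover step, the parameter values, and the
function name `evolve` are assumptions made for the example; the fitness criterion is
supplied by the caller.

```python
import random


def evolve(fitness, n_bits=20, pop_size=30, generations=60,
           mutation_rate=0.02, seed=0):
    """Return the fittest bit string found by a simple generational GA.

    `fitness` must map a list of 0/1 values to a non-negative number.
    `pop_size` is assumed to be even so that parents pair up exactly.
    """
    rng = random.Random(seed)
    # Randomized starting population: no domain knowledge is needed.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Reproduction: fitter individuals are sampled as parents more often.
        weights = [fitness(ind) + 1e-9 for ind in pop]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        # Crossover (assumed operator): splice each pair at a random cut point.
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            cut = rng.randrange(1, n_bits)
            children.append(a[:cut] + b[cut:])
            children.append(b[:cut] + a[cut:])
        # Mutation: flip each bit with a small probability.
        pop = [[bit ^ 1 if rng.random() < mutation_rate else bit
                for bit in child] for child in children]
        # Survival: the best individual seen so far is never lost.
        champion = max(pop, key=fitness)
        if fitness(champion) > fitness(best):
            best = champion
        else:
            pop[0] = list(best)
    return best
```

Calling `evolve(sum)` searches for the string with the most ones. Only the fitness
function carries problem knowledge, which is the sense in which little knowledge is
needed to start the search.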

A related issue is maximum entropy. The Maximum Entropy Principle is an
extension of Bayesian theory and is founded on two other principles: the Desideratum of
Consistency and Maximal-Noncommitment. While Bayesian analysis begins by assuming
a prior, the Maximum Entropy Principle seeks distributions that maximize the Shannon
entropy and at the same time satisfy whatever constraints may apply. The justification
for using Shannon entropy comes from the works of Bernoulli, Laplace, Jeffreys, and
Cox on the one hand, and from the works of Maxwell, Boltzmann, Gibbs, and Shannon
on the other; the principle has been extensively championed by Jaynes and is only just
now penetrating into economic analysis.
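
To make the idea concrete, consider an illustration that is not drawn from this
dissertation: a six-sided die about which only the long-run mean is known. Among all
distributions with that mean, the one of maximum Shannon entropy has the exponential
form p(x) proportional to exp(lambda * x), and the sketch below finds lambda by
bisection. The function names, the search bracket, and the tolerance are assumptions
chosen for the example.

```python
import math


def maxent_die(target_mean, faces=(1, 2, 3, 4, 5, 6), tol=1e-12):
    """Maximum-entropy distribution over `faces` with the given mean."""

    def dist(lam):
        # Exponential-family form implied by a single mean constraint.
        weights = [math.exp(lam * x) for x in faces]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(p * x for p, x in zip(dist(lam), faces))

    # mean(lam) increases monotonically in lam, so bisection applies.
    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)
```

With a target mean of 3.5 the constraint carries no information beyond symmetry and
the result is the uniform distribution; with a mean of 4.5 the probabilities rise
steadily toward the higher faces and the entropy is lower.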

Under the maximum entropy technique, the task of updating priors based on data
is now subsumed under the general goal of maximizing entropy of distributions given any
and all applicable constraints, where the data (or sufficient statistics on the data) play the
role of constraints. Maximum entropy is related to machine learning by the fact that the
initial distributions (or assumptions) used in a learning framework, such as genetic
algorithms, may be maximum entropy distributions. A topic of research interest is the
development of machine learning algorithms or frameworks that are robust with respect
to maximum entropy. In other words, deviation of initial distributions from maximum
entropy distributions should not have any significant effect on the learning algorithms (in
the sense of departure from good solutions).

The overall goal of the research is to present an integrated methodology involving
machine learning with genetic algorithms in knowledge bases and to illustrate its use by
application to an important problem in business. The principal-agent problem was
chosen for the following reasons: it is widespread, important, nontrivial, and fairly
general, so that different models of the problem can be investigated; and
information-theoretic considerations play a crucial role in it. Moreover, a fair amount of
interest in the problem has been generated among researchers in economics, finance,
accounting, and game theory, whose predominant approach to the problem is that of
constrained optimization. Several analytical insights have been generated, which should
serve as points of comparison to results that are expected from our new methodology.

The most important component of the new proposed methodology is information
in the form of knowledge bases, coupled with strength of performance of the individual
pieces of knowledge. These knowledge bases, the associated strengths, their relation to
one another, and their role in the scheme of things are derived from the individuals' prior
knowledge and from the theory of human behavior and motivation. These knowledge
bases contain, for example, information about the agent's characteristics and pattern of
behavior under different compensation schemes; in other words, they deal with the issues
of hidden characteristics and induced effort or behavior. Given the expected behavior
pattern of an agent, a related research issue is the study of the effect of using
distributions that have maximum entropy with respect to the expected behavior.

Trial compensation schemes, which come from the specified knowledge bases, are
presented to the agents. Upon acceptance of the contract and realization of the output,
the actual performance of the agent (in terms of output or the total welfare) is evaluated,
and the associated compensation schemes are assigned proportional credit. Periodically,
iterations of the genetic algorithm are used to create a new knowledge base that
enriches the current one.
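
The credit-assignment step can be sketched as follows. The sketch assumes that
"proportional credit" means each tried scheme's strength grows by its share of the total
realized output; that reading, the fields of the hypothetical `Scheme` record, and the
learning rate are illustrative assumptions rather than the specification used in the
later chapters.

```python
from dataclasses import dataclass


@dataclass
class Scheme:
    """One trial compensation scheme in the knowledge base (hypothetical fields)."""
    base_pay: float
    output_share: float
    strength: float = 1.0


def assign_credit(schemes, outcomes, learning_rate=0.1):
    """Raise the strength of each tried scheme in proportion to its output.

    `outcomes` maps the index of a scheme that an agent accepted to the
    output realized under it. Untried schemes keep their strength.
    """
    total = sum(outcomes.values())
    if total <= 0:
        return  # nothing was produced, so there is no credit to share
    for index, output in outcomes.items():
        schemes[index].strength += learning_rate * output / total


def strongest(schemes, k):
    """Pick the k strongest schemes, e.g. as parents for the next GA iteration."""
    return sorted(schemes, key=lambda s: s.strength, reverse=True)[:k]
```

Under these assumptions, strengths accumulated over several rounds of contracting
would serve as the fitness values when the genetic algorithm is next invoked.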

Chapter 2 begins with an introduction to artificial intelligence, expert systems,
and machine learning. Chapter 3 describes genetic algorithms. Chapter 4 covers the
origin of the Maximum Entropy Principle and its formulation. Chapter 5 surveys the
principal-agent problem, presenting a few basic models along with some of the main
results of the research.

Chapter 6 examines the traditional methodology used in attacking the
principal-agent problem and proposes measures to cover its inadequacies. One of the basic
assumptions of the economic theory--the assumption of risk attitudes and utility--is
circumvented by directly dealing with the knowledge-based models of the agent and the
principal. To this end, Chapter 7 takes a brief look at some of the ideas from behavior
and motivation theory.

Chapter 8 describes the basic research model. Elements of behavior and
motivation theory and knowledge bases are incorporated. A research strategy to study
agency problems is proposed. The periodic use of genetic algorithms to enrich the
knowledge bases and to carry out learning is suggested. An overview of the research
models, all of which incorporate many features of the basic model, is presented.

Chapter 9 describes Model 3 in detail. Chapter 10 introduces Models 4 through
7 and describes each in detail. Chapter 11 provides a summary of the results of Chapters
9 and 10. Directions for future research are covered in Chapter 12.


2.1 Introduction

The use of artificial intelligence in a computerized world is as revolutionary as
the use of computers is in a manual world. One can make computers intelligent in the
same sense as man is intelligent. The various techniques for doing this make up the
subject of artificial intelligence. At the present state of the art, computers are at
last being designed to compete with man on his own ground on something like equal
terms. To put it another way, computers have traditionally acted as convenient tools
in areas where man is known to be deficient or inefficient, namely, doing complicated
arithmetic very quickly, or making many copies of data (e.g., files and reports).

Learning new things, discovering facts, conjecturing, evaluating and judging
complex issues (for example, consulting), using natural languages, analyzing and
understanding complex sensory inputs such as sound and light, and planning for future
action are mental processes that are peculiar to man (and to a lesser extent, to some
animals). Artificial intelligence is the science of simulating or mimicking these mental
processes in a computer.

The benefits are immediately obvious. First, computers already fill some of the
gaps in human skills; second, artificial intelligence fills some of the gaps from which
computers themselves suffer (i.e., human mental processes). While the full simulation of
the human brain is a distant dream, limited application of this idea has already produced
favorable results.


Speech-understanding problems were investigated with the help of the HEARSAY
system (Erman et al., 1980, 1981; and Hayes-Roth and Lesser, 1977). The faculty of
vision relates to pattern recognition and classification and analysis of scenes. These
problems are especially encountered in robotics (Paul, 1981). Speech recognition
coupled with natural language understanding, as in the limited system SHRDLU
(Winograd, 1973), can find immediate uses in intelligent secretary systems that can help
in data management and correspondence associated with business.

An area that is commercially viable in large business environments that involve
manufacturing and any other physical treatment of objects is robotics. This is a proven
area of artificial intelligence application, but is not yet cost effective for small business.
Several robot manufacturers have a good order book position. For a detailed survey, see,
for example, Engelberger (1980).

An interesting viewpoint on the application of artificial intelligence to industry and
business is that presented by decision analysis theory. Decision analysis helps managers
to decide between alternative options, to assess risk and uncertainty in a better way than
before, and to carry out conflict management when there are conflicts among objectives.
Certain operations research techniques are also incorporated, for example, the fair
allocation of resources so as to optimize returns. Decision analysis is treated in Fishburn
(1981), Lindley (1971), Keeney (1984) and Keeney and Raiffa (1976). In most
applications of expert systems, concepts of decision analysis find expression (Phillips,
1986). Manual application of these techniques is not cost effective, whereas their use
in certain expert systems, which go by the generic name of Decision Analysis Expert
Systems, leads to quick solutions of what were previously thought to be intractable
problems (Conway, 1986). Several systems have been proposed that range from
scheduling to strategy planning. See, for example, Williams (1986).

2.2 Expert Systems

The most fascinating and economically justifiable area of artificial intelligence is
the development of expert systems. These are computer systems that are designed to
provide expert advice in any area. The kind of information that distinguishes an expert
from a nonexpert forms the central idea in any expert system. This is perhaps the only
area that provides concrete and conclusive proof of the power of artificial intelligence
techniques. Many expert systems are commercially viable and motivate diverse sources
of funding for research into artificial intelligence. An expert system incorporates many
of the techniques of artificial intelligence, and a positive response to artificial intelligence
depends on the reception of expert systems by informed laymen.

To construct an expert system, the knowledge engineer works with an expert in

the domain and extracts knowledge of relevant facts, rules, rules-of-thumb, exceptions

to standard theory, and so on. This is a difficult task and is known variously as

knowledge acquisition or mining. Because of the complex nature of the knowledge and

the ways humans store knowledge, this is bound to be a bottleneck to the development


of the expert system. This knowledge is codified in the form of several rules and

heuristics. Validation and verification runs are conducted on problems of sufficient

complexity to see that the expert system does indeed model the thinking of the expert.

In the task of building expert systems, the knowledge engineer is helped by several tools,


The net result of the activity of knowledge mining is a knowledge base. An

inference system or engine acts on this knowledge base to solve problems in the domain

of the expert system. An important characteristic of expert systems is the ability to

justify and explain their line of reasoning. This is to create credibility during their use.

In order to do this, they must have a reasonably sophisticated input/output system.
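
The separation between a knowledge base, an inference engine, and an explanation facility can be illustrated with a minimal sketch. It assumes a simple forward-chaining scheme over propositional facts; the rule names, the facts, and the functions `forward_chain` and `explain` are hypothetical and are not taken from any of the systems cited in this chapter.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A rule is (name, antecedent facts, consequent fact); all names are hypothetical.
Rule = Tuple[str, FrozenSet[str], str]


def forward_chain(facts: Set[str], rules: List[Rule]) -> Tuple[Set[str], Dict[str, str]]:
    """Fire rules until no new fact can be derived.

    Returns the closure of the facts and, for each derived fact,
    the name of the rule that produced it (used for explanations).
    """
    known = set(facts)
    derived_by: Dict[str, str] = {}
    changed = True
    while changed:
        changed = False
        for name, antecedents, consequent in rules:
            if consequent not in known and antecedents <= known:
                known.add(consequent)
                derived_by[consequent] = name
                changed = True
    return known, derived_by


def explain(fact: str, rules: List[Rule], derived_by: Dict[str, str]) -> List[str]:
    """Trace the line of reasoning that led to `fact`."""
    if fact not in derived_by:
        return [f"{fact}: given"]
    name = derived_by[fact]
    antecedents = next(a for n, a, _ in rules if n == name)
    lines = [f"{fact}: by {name} from {sorted(antecedents)}"]
    for a in sorted(antecedents):
        lines.extend(explain(a, rules, derived_by))
    return lines


# A toy knowledge base, invented purely for illustration.
RULES: List[Rule] = [
    ("R1", frozenset({"fever", "infection_marker"}), "bacterial_infection"),
    ("R2", frozenset({"bacterial_infection"}), "recommend_culture"),
]
```

Calling `forward_chain({"fever", "infection_marker"}, RULES)` derives both consequents, and `explain` then reports which rule produced each conclusion. A practical system would add certainty factors and conflict resolution, which this sketch omits.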

Some of the typical problems handled by expert systems in the areas of business,

industry, and technology are presented in Feigenbaum and McCorduck (1983) and Mitra

(1986). Important cases where expert systems are brought in to handle the problems are

1. Capturing, replicating, and distributing expertise.

2. Fusing the knowledge of many experts.

3. Managing complex problems and amplifying expertise.

4. Managing knowledge.

5. Gaining a competitive edge.

As examples of successful expert systems, one can consider MYCIN, designed

to diagnose infectious diseases (Shortliffe, 1976); DENDRAL, for interpretation of

molecular spectra (Buchanan and Feigenbaum, 1978); PROSPECTOR, for geological

studies (Duda et al., 1979; Hart, 1978); and WHY, for teaching geography (Stevens and


Collins, 1977). For a more exhaustive treatment, see, for example Stefik et al. (1982),

Barr and Feigenbaum (1981, 1982), Cohen and Feigenbaum (1982), and Barr et al.


2.3 Machine Learning

2.3.1 Introduction

One of the key limitations of computers as envisaged by early researchers is the

fact that they must be told in explicit detail how to solve every problem. In other words,

they lack the capacity to learn from experience and improve their performance with time.

Even in most expert systems today, there is only some weak form of implicit learning,

such as learning by being told, rote memorizing, and checking for logical consistency.

The task of machine learning research is to make up for this inadequacy by incorporating

learning techniques into computers.

The abstract goals of machine learning research are broadly

1. To construct learning algorithms that enable computers to learn.

2. To construct learning algorithms that enable computers to learn in the same way

as humans learn.

In both cases, the functional goals of machine learning research are as follows:

1. To use the learning algorithms in application domains to solve nontrivial problems.


2. To gain a better understanding of how humans learn, and the details of human

cognitive processes.


When the goal is to come up with paradigms that can be used to solve problems,

several subsidiary goals can be proposed:

1. To see if the learning algorithms do indeed perform better than humans do in

similar situations.

2. To see if the learning algorithms come up with solutions that are intuitively

meaningful for humans.

3. To see if the learning algorithms come up with solutions that are in some way

better or less expensive than some alternative methodology.

It is undeniable that humans possess cognitive skills that are superior not only to

other animals but also to most learning algorithms that are in existence today. It is true

that some of these algorithms perform better than humans in some limited and highly

formalized situations involving carefully modeled problems, just as the simplex method

consistently produces solutions superior to those possible by a human being. However,

and this is the crucial issue, humans are quick to adopt different strategies and solve

problems that are ill-structured, ill-defined, and not well understood, for which there

does not exist any extensive domain theory, and that are characterized by uncertainty,

noise, or randomness. Moreover, in many cases, it seems more important to humans to

find solutions to problems that satisfy some constraints rather than to optimize some

"function." At the present state of the art, we do not have a consistent, coherent and

systematic theory of what these constraints are. These constraints are usually understood

to be behavioral or motivational in nature.


Recent research has shown that it is also undeniable that humans perform very poorly in

the following respects:

* they do not solve problems in probability theory correctly;

* while they are good at deciding cogency of information, they are poor at judging

relevance (see Raiffa, accident witnesses, etc.);

* they lack statistical sophistication;

* they find it difficult to detect contradictions in long chains of reasoning;

* they find it difficult to avoid bias in inference and in fact may not be able to

identify it.

(See for example, Einhorn, 1982; Kahneman and Tversky, 1982a, 1982b, 1982c, 1982d;

Lichtenstein et al., 1982; Nisbett et al., 1982; Tversky and Kahneman, 1982a, 1982b,

1982c, 1982d.)

Tversky and Kahneman (1982a) classify, for example, several misconceptions in

probability theory as follows:

* insensitivity to prior probability of outcomes;

* insensitivity to sample size;

* misconceptions of chance;

* insensitivity to predictability;

* the illusion of validity;

* misconceptions of regression.


The above inadequacies on the part of humans pertain to higher cognitive

thinking. It goes without saying that humans are poor at manipulating numbers quickly,

and are subject to physical fatigue and lack of concentration when involved in mental

activity for a long time. Computers are, of course, subject to no such limitations.

It is important to note that these inadequacies usually do not lead to disastrous

consequences in most everyday circumstances. However, the complexity of the modern

world gives rise to intricate and substantial problems, solutions to which forbid

inadequacies of the above type.

Machine learning must be viewed as an integrated research area that seeks to

understand the learning strategies employed by humans, incorporate them into learning

algorithms, remove any cognitive inadequacies faced by humans, investigate the

possibility of better learning strategies, and characterize the solutions yielded by such

research in terms of proof of correctness, convergence to optimality (where meaningful),

robustness, graceful degradation, intelligibility, credibility, and plausibility.

Such an integrated view does not see the different goals of machine learning

research as separate and clashing; insights in one area have implications for another.

For example, insights into how humans learn help spot their strengths and weaknesses,

which motivates research into how to incorporate the strengths into algorithms and how

to cover up the weaknesses; similarly, discovering solutions from machine learning

algorithms that are at first nonintuitive to humans motivates deeper analysis of the

domain theory and of the human cognitive processes in order to come up with at least

plausible explanations.

2.3.2 Definitions and Paradigms

Any activity that improves performance or skills with time may be defined as

learning. This includes motor skills and general problem-solving skills. This is a highly

functional definition of learning and may be objected to on the grounds that humans learn

even in a context that does not demand action or performance. However, the functional

definition may be justified by noting that performance can be understood as improvement

in knowledge and acquisition of new knowledge or cognitive skills that are potentially

usable in some context to improve actions or enable better decisions to be taken.

Learning may be characterized by several criteria. Most paradigms fall under

more than one category. Some of these are

1. Involvement of the learner.

2. Sources of knowledge.

3. Presence and role of a teacher.

4. Access to an oracle (learning from internally generated examples).

5. Learning "richness."

6. Activation of learning:

(a) systematic;

(b) continuous;

(c) periodic or random;

(d) background;

(e) explicit or external (also known as intentional);

(f) implicit (also known as incidental);

(g) call on success; and

(h) call on failure.

When classified by the criterion of the learner's involvement, the standard is the

degree of activity or passivity of the learner. The following paradigms of learning are

classified by this criterion, in increasing order of learner control:

1. Learning by being told (learner only needs to memorize by rote);

2. Learning by instruction (learner needs to abstract, induce, or integrate to some

extent, and then store it);

3. Learning by examples (learner needs to induce to a great extent the correct

concept, examples of which are supplied by the instructor);

4. Learning by analogy (learner needs to abstract and induce to a greater degree in

order to learn or solve a problem by drawing the analogy. This implies that the

learner already has a store of cases against which he can compare the analogy and

that he knows how to abstract and induce knowledge);

5. Learning by observation and discovery (here the role of the learner is greatest;

the learner needs to focus on only the relevant observations, use principles of

logic and evidence, apply some value judgments, and discover new knowledge

either by using induction or deduction).

The above learning paradigms may also be classified on the basis of richness of

knowledge. Under this criterion, the focus is on the richness of the resulting knowledge,

which may be independent of the involvement of the learner. The spectrum of learning


is from "raw data" to simple functions, complicated functions, simple rules, complex

knowledge bases, semantic nets, scripts, and so on.

One fundamental distinction can be made from observation of human learning.

The most widespread form of human learning is incidental learning. The learning

process is incidental to some other cognitive process. Perception of the world, for

example, leads to formation of concepts, classification of objects in classes or primitives,

the discovery of the abstract concepts of number, similarity, and so on (see for example,

Rand, 1967). These activities are not indulged in deliberately. As opposed to incidental

learning, we have intentional learning, where there is a deliberate and explicit effort to

learn. The study of human learning processes from the standpoint of implicit or explicit

cognition is the main subject of research in psychological learning. (See for example,

Anderson, 1980; Craik and Tulving, 1975; Glass and Holyoak, 1986; Hasher and Zacks,

1979; Hebb, 1961; Mandler, 1967; Reber, 1967; Reber, 1976; Reber and Allen, 1978;

Reber et al., 1980).

A useful paradigm for the area of expert systems might be learning through

failure. The explanation facility ensures that the expert system knows why it is correct

when it is correct, but it needs to know why it is wrong when it is wrong, if it must

improve performance with time. Failure analysis helps in focussing on deficient areas

of knowledge.

Research in machine learning raises several wider epistemological issues such as

hierarchy of knowledge, contextuality, integration, conditionality, abstraction, and

reduction. The issue of hierarchy arises in induction of decision trees (see for example,


Quinlan, 1979; Quinlan, 1986; Quinlan, 1990); contextuality arises in learning semantics,

as in conceptual dependency (see for example, Schank, 1972; Schank and Colby, 1973),

learning by analogy (see for example, Buchanan et al., 1977; Dietterich and Michalski,

1979), and case-based reasoning (Riesbeck and Schank, 1989); integration is fundamental

to forming relationships, as in semantic nets (Quillian, 1968; Anderson and Bower, 1973;

Anderson, 1976; Norman, et al., 1975; Schank and Abelson, 1977), and frame-based

learning (see for example, Minsky, 1975); abstraction deals with formation of universals

or classes, as in classification (see for example, Holland, 1975), and induction of

concepts (see for example, Mitchell, 1977; Mitchell, 1979; Valiant, 1984; Haussler,

1988); reduction arises in the context of deductive learning (see for example, Newell

and Simon, 1956; Lenat, 1977), conflict resolution (see for example, McDermott and

Forgy, 1978), and theorem-proving (see for example, Nilsson, 1980). For an excellent

treatment of these issues from a purely epistemological viewpoint, see for example Rand

(1967) and Peikoff (1991).

In discussing real-world examples of learning, it is difficult or meaningless to look

for one single paradigm or knowledge representation scheme as far as learning is

concerned. Similarly, there could be multiple teachers: humans, oracles, and an

accumulated knowledge that acts as an internal generator of examples.

In analyzing learning paradigms, it is useful to look at at least three aspects, since

they each have a role in making the others possible:

1. Knowledge representation scheme.

2. Knowledge acquisition scheme.

3. Learning scheme.

At the present time, we do not yet have a comprehensive classification of learning

paradigms and their systematic integration into a theory. One of the first attempts in this

direction was taken by Michalski, Carbonell, and Mitchell (1983).

An extremely interesting area of research in machine learning that will have far-

reaching consequences for such a theory of learning is multistrategy systems, which try

to combine one or more paradigms or types of learning based on domain problem

characteristics or to try a different paradigm when one fails. See for example Kodratoff

and Michalski (1990). One may call this type of research meta-learning research,

because the focus is not simply on rules and heuristics for learning, but on rules and

heuristics for learning paradigms. Here are some simple learning heuristics, for example:


LH1: Given several "isa" relationships, find out about relations between the properties.

(For example, the observation that "Socrates is a man" motivates us to find out

why Socrates should indeed be classified as a man, i.e., to discover that the

common properties are "rational animal" and several physical properties.)

LH2: When an instance causes an existing heuristic with certainty to be revised

downwards, ask for causes.

LH3: When an instance that was thought to belong to a concept or class but later turns

out not to belong to it, find out what it does belong to.

LH4: If X isa Y1 and X isa Y2, then find the relationship between Y1 and Y2, and

check for consistency. (This arises in learning by using semantic nets).

LH5: Given an implication, find out if it is also an equivalence.

LH6: Find out if any two or more properties are semantically the same, the opposite,

or unrelated.

LH7: If an object possesses two or more properties simultaneously from the same class

or similar classes, check for contradictions, or rearrange classes hierarchically.

LH8: An isa-tree in a semantic net creates an isa-tree with the object as a parent; find

out in which isa-tree the parent object occurs as a child.

We can contrast these with meta-rules or meta-heuristics. A meta-rule is also a

rule which says something about another rule. It is understood that meta-rules are watch-

dog rules that supervise the firing of other rules. Each learning paradigm has a set of

rules that will lead to learning under that paradigm. We can have a set of meta-rules for

learning if we have a learning system that has access to several paradigms of learning

and if we are concerned with what paradigm to select at any given time. Learning meta-

rules help the learner to pick a particular paradigm because the learner has knowledge

of the applicability of particular paradigms given the nature and state of a domain or

given the underlying knowledge-base representation schema.

The following are examples of meta-rules in learning:

ML1: If several instances of a domain-event occur,

then use generalization techniques.

ML2: If an event or class of events occurs a number of times with little or no change on

each occurrence,

then use induction techniques.


ML3: If a problem description similar to the problem on hand exists in a different

domain or situation and that problem has a known solution,

then use learning-by-analogy techniques.

ML4: If several facts are known about a domain including axioms and production rules,

then use deductive learning techniques.

ML5: If undefined variables or unknown variables are present and no other learning rule

was successful,

then use the learning-from-instruction paradigm.

In all cases of learning, meta-rules dictate learning strategies, whether explicitly as in a

multi-strategy system, or implicitly as when the researcher or user selects a paradigm.
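
The meta-rules ML1 to ML5 can be read as a small dispatcher that inspects the state of the domain and proposes learning paradigms. The following is only a sketch under stated assumptions: the `DomainState` fields, the threshold `min_instances` standing in for "several instances," and the reading of ML5 as "no other meta-rule applied" are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class DomainState:
    """Hypothetical summary of what the learner knows about the domain."""
    instances_seen: int = 0
    repeated_with_little_change: bool = False
    analogous_solved_problem: bool = False
    has_axioms_and_rules: bool = False
    has_undefined_variables: bool = False


def select_paradigms(state: DomainState, min_instances: int = 2) -> list:
    """Apply ML1-ML5 in order and return the paradigms they suggest."""
    chosen = []
    if state.instances_seen >= min_instances:          # ML1
        chosen.append("generalization")
    if state.repeated_with_little_change:              # ML2
        chosen.append("induction")
    if state.analogous_solved_problem:                 # ML3
        chosen.append("learning-by-analogy")
    if state.has_axioms_and_rules:                     # ML4
        chosen.append("deductive learning")
    # ML5 (assumed reading): fall back only when no other meta-rule applied.
    if state.has_undefined_variables and not chosen:
        chosen.append("learning-from-instruction")
    return chosen
```

A multistrategy system would go further and try the next paradigm when the selected one fails, which requires feedback from the learning algorithms themselves.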

Just as in expert systems, the learning strategy may be either goal directed or

knowledge directed. Goal-directed learning proceeds as follows:

1. Meta-rules select learning paradigm(s).

2. Learner imposes the learning paradigm on the knowledge base.

3. The structure of the knowledge base and the characteristics of the paradigm

determine the representation scheme.

4. The learning algorithm(s) of the paradigm(s) execute(s).

Knowledge directed learning, on the other hand, proceeds as follows:

1. The learner examines the available knowledge base.

2. The structure of the knowledge base limits the extent and type of learning, which

is determined by the meta-rules.

3. The learner chooses an appropriate representation scheme.

4. The learning algorithm(s) of the chosen learning paradigm(s) execute(s).

2.3.3 Probably Approximately Close Learning

Early research on inductive inference dealt with supervised learning from

examples (see for example, Michalski, 1983; Michalski, Carbonell, and Mitchell, 1983).

The goal was to learn the correct concept by looking at both positive and negative

examples of the concept in question. These examples were provided in one of two ways:

either the learner obtained them by observation, or they were provided to the learner by

some external instructor. In both cases, the class to which each example belonged was

conveyed to the learner by the instructor (supervisor, or oracle). The examples provided

to the learner were drawn from a population of examples or instances. This is the

framework underlying early research in inductive inference (see for example, Quinlan,

1979; Quinlan, 1986; Angluin and Smith, 1983).

Probably Approximately Close Identification (or PAC-ID for short) is a powerful

machine-learning methodology that seeks inductive solutions in a supervised

nonincremental learning environment. It may be viewed as a multiple-criteria learning

problem in which there are at least three major objectives:

(1) to derive (or induce) the correct solution, concept or rule, which is as close as we

please to the optimal (which is unknown);

(2) to achieve as high a degree of confidence as we please that the solution so derived

above is in fact as close to the optimal as we intended;

(3) to ensure that the "cost" of achieving the above two objectives is "reasonable."


PAC-ID therefore replaces the original research direction in inductive machine

learning (seeking the true solution) by the more practical goal of seeking solutions close

to the true one in polynomial time. The technique has been applied to certain classes of

concepts, such as conjunctive normal forms (CNF). Estimates of the necessary

distribution-independent sample sizes are derived based on the error and confidence criteria; the

sample sizes are found to be polynomial in some factor such as the number of attributes.

Applications to science and engineering have been demonstrated.
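
To make the dependence of sample size on the error and confidence criteria concrete, the sketch below evaluates a standard bound for a learner that outputs a hypothesis consistent with the sample, drawn from a finite hypothesis class H: m >= (ln|H| + ln(1/delta)) / epsilon examples suffice. As a simpler stand-in for the CNF case mentioned above, it counts conjunctions over n Boolean attributes, taking |H| = 3^n (each attribute appears positively, negated, or not at all). The function names are hypothetical.

```python
import math


def pac_sample_size(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Examples sufficient for a consistent learner over a finite hypothesis class.

    Uses the standard bound m >= (ln|H| + ln(1/delta)) / epsilon.
    """
    if hypothesis_count < 1 or not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("need |H| >= 1 and 0 < epsilon, delta < 1")
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)


def conjunction_sample_size(n_attributes: int, epsilon: float, delta: float) -> int:
    """Bound for conjunctions over n Boolean attributes, counting 3**n hypotheses."""
    return pac_sample_size(3 ** n_attributes, epsilon, delta)
```

With 10 attributes, error 0.1, and confidence parameter 0.05, the bound gives 140 examples. Since ln(3^n) = n ln 3, the bound grows only linearly in the number of attributes, and it holds whatever distribution generates the examples.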

The pioneering work on PAC-ID was by Valiant (1984, 1985) who proposed the

idea of finding approximate solutions in polynomial time. The ideas of characterizing

the notion of approximation by using the concept of functional complexity of the

underlying hypothesis spaces, introducing confidence in the closeness to optimality, and

obtaining results that are independent of the underlying probability distribution with

which the supervisory examples are generated (by nature or by the supervisor), compose

the direction of the latest research. (See for example, Haussler, 1988; Haussler, 1990a;

Haussler, 1990b; Angluin, 1987; Angluin, 1988; Angluin and Laird, 1988; Blumer,

Ehrenfeucht, Haussler, and Warmuth, 1989; Pitt and Valiant, 1988; and Rivest, 1987).

The theoretical foundations for the mathematical ideas of learning convergence

with high confidence are mainly derived from ideas in statistics, probability, statistical

decision theory, and fractal theory. (See for example, Vapnik, 1982; Vapnik and

Chervonenkis, 1971; Dudley, 1978; Dudley, 1984; Dudley, 1987; Kolmogorov and

Tihomirov, 1961; Kullback, 1959; Mandelbrot, 1982; Pollard, 1984; Weiss and

Kulikowski, 1991).


3.1 Introduction

Genetic classification algorithms are learning algorithms that are modeled on the

lines of natural genetics (Holland, 1975). Specifically, they use operators such as

reproduction, crossover, mutation, and fitness functions. Genetic algorithms make use

of inherent parallelism of chromosome populations and search for better solutions through

randomized exchange of chromosome material and mutation. The goal is to improve the

gene pool with respect to the fitness criterion from generation to generation.

In order to use the idea of genetic algorithms, problems must be appropriately

modeled. The parameters or attributes that constitute an individual of the population

must be specified. These parameters are then coded. The simulation begins with a

random generation of an initial population of chromosomes, and the fitness of each is

calculated. Depending on the problem and the type of convergence desired, it may be

decided to keep the population size constant or varying across iterations of the simulation.


Using the population of an iteration, individuals are selected randomly according

to their fitness level to survive intact or to mate with other similarly selected individuals.

For mating members, a crossover point is randomly determined (an individual with n


attributes has n-1 crossover points), and the individuals exchange their "strings," thus

forming new individuals. It may so happen that the new individuals are exactly the same

as the parents. In order to introduce a certain amount of richness into the population,

a mutation operator is applied to the bits in the individual

strings, changing each bit independently with an extremely low probability. After mating, survival, and mutation, the

fitness of each individual in the new population is calculated. Since the probability of

survival and mating is dependent on the fitness level, more fit individuals have a higher

probability of passing on their genetic material.
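
The cycle described above (fitness-proportional selection, single-point crossover, and low-probability mutation) can be sketched as follows. This is an illustrative outline only: the bit-string coding, the parameter values (`p_cross`, `p_mut`, population size, number of generations), and all function names are assumptions, and the fitness function must return strictly positive values for the selection step to work.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[int]


def select(population: List[Chromosome], fitnesses: List[float], rng: random.Random) -> Chromosome:
    """Roulette-wheel selection: pick with probability proportional to fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Single-point crossover; a string of n bits has n - 1 crossover points."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(c: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """Flip each bit independently with a small probability."""
    return [1 - bit if rng.random() < rate else bit for bit in c]


def evolve(fitness: Callable[[Chromosome], float], n_bits: int = 16, pop_size: int = 30,
           generations: int = 40, p_cross: float = 0.7, p_mut: float = 0.01,
           seed: int = 0) -> Chromosome:
    """Run a constant-size population for a fixed number of generations."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        fitnesses = [fitness(c) for c in population]
        next_pop: List[Chromosome] = []
        while len(next_pop) < pop_size:
            a = select(population, fitnesses, rng)
            b = select(population, fitnesses, rng)
            if rng.random() < p_cross:      # mate; otherwise both survive intact
                a, b = crossover(a, b, rng)
            next_pop.extend([mutate(a, p_mut, rng), mutate(b, p_mut, rng)])
        population = next_pop[:pop_size]
    return max(population, key=fitness)
```

For example, `evolve(lambda c: 1 + sum(c))` searches for a string with many 1 bits; the added constant keeps every fitness value positive.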

Another factor plays a role in determining the average fitness of the population.

Portions of the chromosome, called genes or features, act as determinants of qualities of

the individual. Since in mating, the crossover point is chosen randomly, those genes that

are shorter in length are more likely to survive a crossover and thus be carried from

generation to generation. This has important implications for modeling a problem and

will be mentioned in the chapter on research directions.
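
The effect of gene length can be quantified under a simple assumption: the gene occupies contiguous positions and the crossover point is uniform over the n-1 possible points. A gene of length L is then split only if the point falls at one of its L-1 interior points, so it passes through one crossover unbroken with probability 1 - (L-1)/(n-1). The helper below (a hypothetical name) evaluates this; it ignores the chance that the mate happens to carry the same bits.

```python
from fractions import Fraction


def survival_probability(gene_length: int, n_attributes: int) -> Fraction:
    """Probability that a contiguous gene is not split by one single-point crossover."""
    if n_attributes < 2 or not 1 <= gene_length <= n_attributes:
        raise ValueError("need n_attributes >= 2 and 1 <= gene_length <= n_attributes")
    interior_points = gene_length - 1
    return 1 - Fraction(interior_points, n_attributes - 1)
```

For a chromosome of 10 attributes, a gene of length 4 survives with probability 2/3, whereas a gene spanning the whole chromosome is always split.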

The power of genetic algorithms (henceforth, GAs) derives from the following:


1. It is only necessary to know enough about the problem to identify the

essential attributes of the solution (or "individual"); the researcher can work

in comparative ignorance of the actual combinations of attribute values that

may denote qualities of the individual.

2. Excessive knowledge cannot harm the algorithm; the simulation may be

started with any extra knowledge the researcher may have about the problem,


such as his beliefs about which combinations play an important role. In such

cases, the simulation may start with the researcher's population and not a

random population; if it turns out that the whole or some part of this

knowledge is incorrect or irrelevant, then the corresponding individuals get

low fitness values and hence have a high probability of eventually

disappearing from the population.

3. The remarks in point 2 above apply in the case of mutation also. If mutation

gives rise to a useless feature, that individual gets a low fitness value and

hence has a low probability of remaining in the population for a long time.

4. Since GAs use many individuals, the probability of getting stuck at local

optima is minimized.

According to Holland (1975), there are essentially four ways in which genetic

algorithms differ from optimization techniques:

1. GAs manipulate codings of attributes directly.

2. They conduct search from a population and not from a single point.

3. It is not necessary to know or assume extra simplifications in order to

conduct the search; GAs conduct the search "blindly." It must be noted

however, that randomized search does not imply directionless search.

4. The search is conducted using stochastic operators (random selection

according to fitness) and not by using deterministic rules.


There are two important models for GAs in learning. One is the Pitt approach,

and the other is the Michigan approach. The approaches differ in the way they define

individuals and the goals of the search process.

3.2 The Michigan Approach

The knowledge base of the researcher or the user constitutes the genetic

population, in which each rule is an individual. The antecedents and consequents of each

rule form the chromosome. Each rule denotes a classifier or detector of a particular

signal from the environment. Upon receipt of a signal, one or more rules fire,

depending on the signal satisfying the antecedent clauses. Depending on the success of

the action taken or the consequent value realized, those rules that contributed to the

success are rewarded, and those rules that supported a different consequent value or

action are punished. This process of assigning reward or punishment is called credit assignment.


Eventually, rules that are correct classifiers get high reward values, and their

proposed action when fired carries more weight in the overall decision of selecting an

action. The credit assignment problem is the problem of how to allocate credit (reward

or punishment). One approach is the bucket-brigade algorithm (Holland, 1986).
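
The reward and punishment scheme can be sketched in a deliberately simplified form. The code below is not the bucket-brigade algorithm; it assumes a flat additive update of rule strengths, and the `Classifier` structure, the `reward` and `penalty` values, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Classifier:
    condition: Dict[str, int]   # antecedent: attribute -> required value
    action: str                 # consequent
    strength: float = 1.0       # accumulated reward

    def matches(self, signal: Dict[str, int]) -> bool:
        return all(signal.get(k) == v for k, v in self.condition.items())


def decide(rules: List[Classifier], signal: Dict[str, int]) -> str:
    """Pick the action with the largest total strength among the rules that fire."""
    votes: Dict[str, float] = {}
    for r in rules:
        if r.matches(signal):
            votes[r.action] = votes.get(r.action, 0.0) + r.strength
    if not votes:
        raise ValueError("no rule fires for this signal")
    return max(votes, key=votes.get)


def assign_credit(rules: List[Classifier], signal: Dict[str, int], correct_action: str,
                  reward: float = 0.2, penalty: float = 0.2) -> None:
    """Reward fired rules that proposed the correct action; punish the others that fired."""
    for r in rules:
        if r.matches(signal):
            if r.action == correct_action:
                r.strength += reward
            else:
                r.strength = max(0.0, r.strength - penalty)
```

Rules that do not fire on a signal are left untouched, so credit flows only to the classifiers that took part in the decision.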

The Michigan approach may be combined with the usual genetic operators to

investigate other rules that may not have been considered by the researcher.

3.3 The Pitt Approach

The Pitt Approach, by De Jong (see for example, De Jong, 1988), considers the

whole knowledge base as one individual. The simulation starts with a collection of

knowledge bases. The operation of crossover works by randomly dichotomizing two

parent knowledge bases (selected at random) and mixing the dichotomized portions across

the parents to obtain two new knowledge bases. The Pitt approach may be used when

the researcher has available to him a panel of experts or professionals, each of whom

provides one knowledge base for some decision problem at hand. The crossover operator

therefore enables one to consider combinations of the knowledge of the individuals, a

process that resembles a brainstorming session. This is similar to a group decision-

making approach. The final knowledge base or bases that perform well empirically

would then constitute a collection of rules obtained from the best rules of the original

expertise, along with some additional rules that the expert panel did not consider before.
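
Crossover at the level of whole knowledge bases can be sketched as below, assuming each knowledge base is an ordered list of rules and each parent is cut at one random point. The list-of-strings encoding and the name `kb_crossover` are hypothetical.

```python
import random
from typing import List, Tuple

KnowledgeBase = List[str]  # each string stands for one rule (hypothetical encoding)


def kb_crossover(kb_a: KnowledgeBase, kb_b: KnowledgeBase,
                 rng: random.Random) -> Tuple[KnowledgeBase, KnowledgeBase]:
    """Dichotomize each parent at a random point and swap the tails."""
    cut_a = rng.randint(0, len(kb_a))
    cut_b = rng.randint(0, len(kb_b))
    child_1 = kb_a[:cut_a] + kb_b[cut_b:]
    child_2 = kb_b[:cut_b] + kb_a[cut_a:]
    return child_1, child_2
```

Between them, the two children contain exactly the rules of the two parents, so crossover alone only recombines existing expertise; new rules can enter only through mutation or other operators.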

The Michigan approach will be used in this research to simulate learning on one

knowledge base.


4.1 Historical Introduction

The principle of maximum entropy was championed by E.T. Jaynes in the 1950s

and has gained many adherents since. There are a number of excellent papers by E.T.

Jaynes explaining the rationale and philosophy of the maximum entropy principle. The

discussion of the principle essentially follows Jaynes (1982, 1983, 1986a, 1986b, and


The maximum entropy principle may be viewed as "a natural extension and

unification of two separate lines of development . The first line is identified with the

names Bernoulli, Laplace, Jeffreys, Cox; the second with Maxwell, Boltzmann, Gibbs,

Shannon." (Jaynes, 1983).

The question of approaching any decision problem with some form of prior

information is historically known as the Principle of Insufficient Reason (so named by

James Bernoulli in 1713). Jaynes (1983) suggests the name Desideratum of Consistency,

which may be formally stated as follows:

(1) a probability assignment is a way of describing a certain state of knowledge;

i.e., probability is an epistemological concept, not a metaphysical one;

(2) when the available evidence does not favor any one alternative among others,

then the state of knowledge is described correctly by assigning equal

probabilities to all the alternatives;

(3) suppose A is an event or occurrence for which some favorable cases out of

some set of possible cases exist. Suppose also that all the cases are equally

likely. Then, the probability that A will occur is the ratio of the number of

cases favorable to A to the total number of equally possible cases. This idea

is formally expressed as

\[ \Pr[A] = \frac{M}{N} = \frac{\text{Number of cases favorable to } A}{\text{Number of equally possible cases}} \]

In cases where Pr[] is difficult to estimate (such as when the number of cases is

infinite or impossible to find out), Bernoulli's weak law of large numbers may be

applied, where

\[ \Pr[A] = \frac{M}{N} = \frac{\text{Number of cases favorable to } A}{\text{Total number of equally likely cases}} \approx \frac{\text{Number of times } A \text{ occurs}}{\text{Number of trials}} \]


Limit theorems in statistics show that given (M,N) as the true state of nature, the

observed frequency f(m,n) = m/n approaches Pr[A] = P(M,N) = M/N as the number

of trials increases.


The reverse problem consists of estimating P(M,N) by f(m,n). For example, the

probability of seeing m successes in n trials when each trial is independent with

probability of success p, is given by the binomial distribution:

\[ P\left(m \mid n, \tfrac{M}{N}\right) = P(m \mid n, p) = \binom{n}{m} p^{m} (1-p)^{n-m}. \]

The inverse problem would then consist of finding Pr[M] given (m,N,n). This problem

was given a solution by Bayes in 1763 as follows: Given (m,n), then

\[ \Pr\left[p < \frac{M}{N} < p + dp\right] = P(dp \mid m, n) = \frac{(n+1)!}{m!\,(n-m)!}\, p^{m} (1-p)^{n-m}\, dp, \]

which is the Beta distribution.
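
As a numerical check on Bayes' solution, the sketch below evaluates the density (n+1)!/(m!(n-m)!) p^m (1-p)^(n-m), which is the Beta(m+1, n-m+1) density, and confirms by simple quadrature that it integrates to one and has mean (m+1)/(n+2). The function names and the midpoint-rule integrator are illustrative choices, not part of the original development.

```python
import math
from typing import Callable


def posterior_density(p: float, m: int, n: int) -> float:
    """Density (n+1)!/(m!(n-m)!) * p**m * (1-p)**(n-m) after m successes in n trials."""
    coeff = math.factorial(n + 1) / (math.factorial(m) * math.factorial(n - m))
    return coeff * p ** m * (1 - p) ** (n - m)


def posterior_mean(m: int, n: int) -> float:
    """Mean of Beta(m + 1, n - m + 1), namely (m + 1) / (n + 2)."""
    return (m + 1) / (n + 2)


def integrate(f: Callable[[float], float], steps: int = 10000) -> float:
    """Midpoint rule on [0, 1]; adequate for a smooth polynomial density."""
    h = 1.0 / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h
```

With no data (m = n = 0) the density is identically 1, the uniform distribution on [0, 1]; this is the prior that the formula presupposes.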

These ideas were generalized and put into the form they are today, known as the

Bayes' theorem, by Laplace as follows: When there is an event E with possible causes

Ci, and given prior information I and the observation E, the probability that a particular

cause Ci caused the event E is given by

P(Ci[E,) = 1 ^
EP(El C1) P(CI|J)
S_, p (EI C ) P (Cjl- 1T

which result has been called "learning by experience" (Jaynes, 1978).
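A minimal sketch of this rule for a finite list of causes follows. The function name `posterior_over_causes` and the numbers in the example are hypothetical, chosen only to show how an observation shifts belief away from the prior.

```python
def posterior_over_causes(priors, likelihoods):
    """Return P(C_i | E, I) given P(C_i | I) and P(E | C_i) for each cause."""
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    evidence = sum(joint)  # the denominator: sum over j of P(E|C_j) P(C_j|I)
    return [j / evidence for j in joint]

# Three hypothetical causes: the first is most probable a priori,
# but the observation E is much better explained by the other two.
priors = [0.7, 0.2, 0.1]
likelihoods = [0.1, 0.5, 0.9]
posterior = posterior_over_causes(priors, likelihoods)
print(posterior)
```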

The contributions of Laplace were rediscovered by Jeffreys around 1939 and in

1946 by Cox who, for the first time, set out to study the "possibility of constructing a

consistent set of mathematical rules for carrying out plausible, rather than deductive,

reasoning." (Jaynes, 1983).


According to Cox, the fundamental result of mathematical inference may be

described as follows: Suppose A, B, and C represent propositions, AB the proposition

"Both A and B are true", and ¬A the negation of A. Then, the consistent rules of

combination are:

$$P(AB \mid C) = P(A \mid BC)\, P(B \mid C), \quad \text{and}$$

$$P(A \mid B) + P(\neg A \mid B) = 1.$$

Thus, "Cox proved that any method of inference in which we represent degrees of

plausibility by real numbers, is necessarily either equivalent to Laplace's, or

inconsistent." (Jaynes, 1983).

The second line of development starts with James Clerk Maxwell in the 1850s

who, in trying to find the probability distribution for the velocity direction of spherical

molecules after impact, realized that knowledge of the meaning of the physical

parameters of any system constituted extremely relevant prior information. The

development of the concept of entropy maximization started with Boltzmann who

investigated the distribution of molecules in a conservative force field in a closed system.

Given that there are N molecules in the closed system, the total energy E remains

constant irrespective of the distribution of the molecules inside the system. All positions

and velocities are not equally likely. The problem is to find the most probable

distribution of the molecules. Boltzmann partitioned the phase space of position and

momentum into a discrete number of cells $R_k$, where $1 \le k \le s$. These cells were

assumed to be such that the k-th cell is a region which is small enough so that the energy

of a molecule as it moves inside that region does not change significantly, but which is


also so large that a large number Nk of molecules can be accommodated in it. The

problem of Boltzmann then reduces to the problem of finding the best prediction of Nk

for any given k in 1,...,s.

The numbers Nk are called the occupation numbers. The number of ways a given

set of occupation numbers will be realized is given by the multinomial coefficient

$$W(\{N_k\}) = \frac{N!}{N_1!\, N_2! \cdots N_s!}. \qquad (1)$$

The constraints are given by

$$E = \sum_{k=1}^{s} N_k E_k, \quad \text{and}$$

$$N = \sum_{k=1}^{s} N_k.$$

Since each set $\{N_k\}$ of occupation numbers represents a possible distribution, the

problem is equivalently expressed as finding the most probable set of occupation numbers

from the many possible sets. Using Stirling's approximation of factorials

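$$n! \approx \sqrt{2 \pi n}\, \left(\frac{n}{e}\right)^{n}$$

in equation (1) yields

$$\log W = -N \sum_{k=1}^{s} \left(\frac{N_k}{N}\right) \log \left(\frac{N_k}{N}\right). \qquad (2)$$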

The right hand side of (2) is the familiar Shannon entropy formula for the

distribution specified by probabilities which are approximated by the frequencies Nk/N,

k = 1, ..., s. In fact, in the limit as N goes to infinity,

$$\lim_{N \to \infty} \frac{1}{N} \log W = -\sum_{k=1}^{s} \left(\frac{N_k}{N}\right) \log \left(\frac{N_k}{N}\right) = H.$$

Distributions of higher entropy therefore have higher multiplicity. In other words,

Nature is likely to realize them in more ways. If $W_1$ and $W_2$ are two distributions, with

corresponding entropies of $H_1$ and $H_2$, then the ratio $W_2/W_1$ is the relative preference of

$W_2$ over $W_1$. Since $W_2/W_1 \approx \exp[N(H_2 - H_1)]$, when N becomes large (such as the

Avogadro number), the relative preference "becomes so overwhelming that exceptions

to it are never seen; and we call it the Second Law of Thermodynamics." (Jaynes, 1982).
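The link between multiplicity and entropy can be checked numerically. The sketch below computes log W exactly through the log-gamma function and compares (1/N) log W with the entropy of the frequencies N_k/N. The function names and the occupation numbers are illustrative assumptions.

```python
from math import lgamma, log

def log_multiplicity(occupation):
    """log of N! / (N_1! N_2! ... N_s!), using lgamma(n + 1) = log(n!)."""
    N = sum(occupation)
    return lgamma(N + 1) - sum(lgamma(Nk + 1) for Nk in occupation)

def entropy(occupation):
    """Entropy of the frequencies N_k / N (natural logarithm)."""
    N = sum(occupation)
    return -sum((Nk / N) * log(Nk / N) for Nk in occupation if Nk > 0)

skewed = [5000, 3000, 2000]
even = [3334, 3333, 3333]
for occ in (skewed, even):
    N = sum(occ)
    # (1/N) log W is already close to H for N = 10000.
    print(occ, log_multiplicity(occ) / N, entropy(occ))
```

The more even set of occupation numbers has both the higher entropy and the larger multiplicity, as the text states.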

The problem may now be expressed in terms of constrained optimization as


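$$\max_{\{N_k\}} \; \log W = -N \sum_{k=1}^{s} \left(\frac{N_k}{N}\right) \log \left(\frac{N_k}{N}\right)$$

subject to

$$\sum_{k=1}^{s} N_k E_k = E, \quad \text{and}$$

$$\sum_{k=1}^{s} N_k = N.$$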

The solution yields surprisingly rich results which would not be attainable even

if the individual trajectories of all the molecules in the closed spaces were calculated.

The efficiency of the method reveals that in fact, such voluminous calculations would

have canceled each other out, and were actually irrelevant to the problem. A similar

idea is seen in the chapter on genetic algorithms, where ignorance can be seemingly


exploited and irrelevant information, even if assumed, would be eliminated from the


The technique has been used in artificial intelligence (see for example, [Lippman,

1988; Jaynes, 1991; Kane, 1991]), and in solving problems in business and economics

(see for example, [Jaynes, 1991; Grandy, 1991; Zellner, 1991]).

4.2 Examples

We will see how the principle is used in solving problems involving some type

of prior information which is used as a constraint on the problem. For simplicity, we

will deal with problems involving one random variable $\theta$ having n values, and call the

associated probabilities pi. For all the problems, the goal is to choose a probability

distribution from among many possible ones which has the maximum entropy.

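No prior information whatsoever. The problem may be formulated using the Lagrange multiplier $\lambda$ for the single constraint as:

$$\max_{\{p_i\}} \; g(\{p_i\}) = -\sum_{i=1}^{n} p_i \ln p_i + \lambda \left( \sum_{i=1}^{n} p_i - 1 \right).$$

The solution is obtained as follows:

$$\begin{aligned}
\frac{\partial g}{\partial p_i} &= -1 - \ln p_i + \lambda = 0 \\
\Rightarrow \; \ln p_i &= \lambda - 1 \\
\Rightarrow \; p_i &= e^{\lambda - 1} \quad \forall\, i = 1, \ldots, n; \\
\sum_{i=1}^{n} p_i &= 1 \\
\Rightarrow \; \sum_{i=1}^{n} e^{\lambda - 1} &= 1 \\
\Rightarrow \; n e^{\lambda - 1} &= 1 \\
\Rightarrow \; n p_i &= 1 \\
\Rightarrow \; p_i &= \frac{1}{n} \quad \forall\, i = 1, \ldots, n.
\end{aligned}$$

Hence, $p_i = 1/n$, $i = 1, \ldots, n$, is the MaxEnt assignment, which confirms the intuition on the non-informative prior.

Suppose the expected value of $\theta$ is $\mu_0$. We have two constraints in this problem: the first is the usual constraint on the probabilities summing to one; the second is the given information that the expected value of $\theta$ is $\mu_0$. We use the Lagrange multipliers $\lambda_1$ and $\lambda_2$ for the two constraints respectively. The problem statement follows:

$$\max_{\{p_i\}} \; g(\{p_i\}) = -\sum_{i=1}^{n} p_i \ln p_i + \lambda_1 \left( \sum_{i=1}^{n} p_i - 1 \right) + \lambda_2 \left( \sum_{i=1}^{n} \theta_i p_i - \mu_0 \right).$$

This can be solved in the usual way by taking partial derivatives of $g(\cdot)$ w.r.t. $p_i$, $\lambda_1$, and $\lambda_2$, and equating them to zero. We obtain:

$$p_i = e^{\lambda_1 - 1 + \lambda_2 \theta_i}, \quad \text{and}$$

$$\sum_{i=1}^{n} \theta_i e^{\lambda_2 \theta_i} = \mu_0 \sum_{i=1}^{n} e^{\lambda_2 \theta_i}.$$

Writing $x = e^{\lambda_2}$, we get

$$\sum_{i=1}^{n} \theta_i x^{\theta_i} = \mu_0 \sum_{i=1}^{n} x^{\theta_i}, \quad \text{i.e.,} \quad \sum_{i=1}^{n} (\theta_i - \mu_0)\, x^{\theta_i} = 0,$$

which is a polynomial in x, whose roots can be determined numerically.

For example, let n = 3, $\theta$ take values {1,2,3}, $\mu_0$ = 1.25. Solving as above and taking the appropriate roots, we obtain

$\lambda_1$ = 2.2752509, $\lambda_2$ = -1.5132312, giving

$p_1$ = 0.7882, $p_2$ = 0.1736, and $p_3$ = 0.0382.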
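The mean-constrained case can be solved numerically in a few lines. The sketch below does not find polynomial roots; as a swapped-in technique it bisects on the second multiplier, using the fact that the mean of an exponential-family distribution increases with that multiplier. The function name `maxent_given_mean` and the search bounds are assumptions, and the bounds suit only values of moderate size.

```python
from math import exp, log

def maxent_given_mean(values, mu0, lo=-50.0, hi=50.0, iters=200):
    """MaxEnt probabilities over `values` with expected value mu0.
    Returns (lam1, lam2, probs) with probs[i] = exp(lam1 - 1 + lam2 * values[i])."""
    if not min(values) < mu0 < max(values):
        raise ValueError("mu0 must lie strictly between the extreme values")

    def mean(lam2):
        weights = [exp(lam2 * v) for v in values]
        return sum(v * w for v, w in zip(values, weights)) / sum(weights)

    # mean(lam2) is increasing in lam2, so plain bisection finds the root.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean(mid) < mu0:
            lo = mid
        else:
            hi = mid
    lam2 = (lo + hi) / 2
    # The normalization constraint fixes lam1 once lam2 is known.
    lam1 = 1 - log(sum(exp(lam2 * v) for v in values))
    probs = [exp(lam1 - 1 + lam2 * v) for v in values]
    return lam1, lam2, probs

lam1, lam2, probs = maxent_given_mean([1, 2, 3], 1.25)
print(lam1, lam2, [round(p, 4) for p in probs])
```

For the values {1,2,3} and an expected value of 1.25, the multipliers agree with those quoted in the text, and the resulting probabilities sum to one and reproduce the required mean.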

Partial knowledge of probabilities. Suppose we know $p_i$, $i = 1, \ldots, k$. Since we have n-1 degrees of freedom in choosing $p_i$, assume $k \le n-2$ to make the example non-trivial. Then, the problem may be formulated as:

$$\max_{\{p_i\}} \; g(\{p_i\}) = -\sum_{i=k+1}^{n} p_i \ln p_i + \lambda \left( \sum_{i=k+1}^{n} p_i + q - 1 \right),$$

$$\text{where } q = \sum_{i=1}^{k} p_i.$$

Solving, we obtain

$$p_i = \frac{1 - q}{n - k} \quad \forall\, i = k+1, \ldots, n.$$

This is again fairly intuitive: the remaining probability 1-q is distributed non-informatively over the rest of the probability space. For example, if n = 4, $p_1$ = 0.5, and $p_2$ = 0.3, then k = 2, q = 0.8, and $p_3 = p_4$ = (1 - 0.8)/(4 - 2) = 0.2/2 = 0.1.

Note that the first case is a special case of the last one, with q = k = 0.
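The closed-form result for partially known probabilities is easy to state in code. In this sketch the function name `maxent_with_known_probs` is a hypothetical helper; it spreads the remaining mass 1 - q evenly over the n - k unknown values, and the empty-knowledge call recovers the uniform assignment of the first case.

```python
def maxent_with_known_probs(known, n):
    """Given the first k probabilities, fill in the remaining n - k values
    with the MaxEnt assignment (1 - q) / (n - k), where q = sum(known)."""
    k = len(known)
    q = sum(known)
    if k >= n or not 0 <= q <= 1:
        raise ValueError("need k < n and a known mass q in [0, 1]")
    rest = (1 - q) / (n - k)
    return list(known) + [rest] * (n - k)

print(maxent_with_known_probs([0.5, 0.3], 4))  # last two entries are about 0.1 each
print(maxent_with_known_probs([], 4))          # special case q = k = 0: uniform
```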

The technique can be extended to cover prior knowledge expressed in the form

of probabilistic knowledge bases by using two key MaxEnt solutions: non-informativeness

(as covered in the last example above), and statistical independence of two random

variables given no knowledge to the contrary (in other words, given two probability

distributions f and g over two random variables X and Y respectively, and no further

information, the MaxEnt joint probability distribution h over X × Y is obtained as h = fg).



5.1 Introduction

5.1.1 The Agency Relationship

The principal-agent problem arises in the context of the agency relationship in

social interaction. The agency relationship occurs when one party, the agent, contracts

to act as a representative of another party, the principal, in a particular domain of

decision problems.

The principal-agent problem is a special case of a dynamic two-person game. The

principal has available to her a set of possible compensation schemes, out of which she

must select one that both motivates the agent and maximizes her welfare. The agent also

must choose a compensation scheme which maximizes his welfare, and he does so by

accepting or rejecting the compensation schemes presented to him by the principal. Each

compensation package he considers implicitly influences him to choose a particular

(possibly complex) action or level of effort. Every action has associated with it certain

disutilities to the agent, in that he must expend a certain amount of effort and/or expense.

It is reasonable to assume that the agent will reject outright any compensation package

which yields less than that which can be obtained elsewhere in the market. This

assumption is in turn based on the assumptions that the agent is knowledgeable about his

"reservation constraint", and that he is free to act in a rational manner. The assumption

of rationality also applies to the principal. After agreeing to a contract, the agent

proceeds to act on behalf of the principal, which in due course yields a certain outcome.

The outcome is not only dependent on the agent's actions but also on exogenous factors.

Finally the outcome, when expressed in monetary terms, is shared between the principal

and the agent in the manner decided upon by the selected compensation plan.

The specific ways in which the agency relationship differs from the usual

employer-employee relationship are (Simon, 1951):

(1) The agent does not recognize the authority of the principal over specific tasks the

agent must do to realize the output.

(2) The agent does not inform the principal about his "area of acceptance" of

desirable work behavior.

(3) The work behavior of the agent is not directly (or costlessly) observable by the principal.


Some of the first contributions to the analysis of principal-agent problems can be

found in Simon (1951), Alchian & Demsetz (1972), Ross (1973), Stiglitz (1974), Jensen

& Meckling (1976), Shavell (1979a, 1979b), Holmstrom (1979, 1982), Grossman & Hart

(1983), Rees (1985), Pratt & Zeckhauser (1985), and Arrow (1986).

There are three critical components in the principal-agent model: the technology,

the informational assumptions, and the timing. Each of these three components is

described below.

5.1.2 The Technology Component of Agency

The technology component deals with the type and number of variables involved

(for example, production variables, technology parameters, factor prices, etc.), the type

and the nature of functions defined on these variables (for example, the type of utility

functions, the presence of uncertainty and hence the existence of probability distribution

functions, continuity, differentiability, boundedness, etc.), the objective function and the

type of optimization (maximization or minimization), the decision criteria on which

optimization is carried out (expected utility, weighted welfare measures, etc.), the nature

of the constraints, and so on.

5.1.3 The Information Component of Agency

The information component deals with the private information sources of the

principal and the agent, and information which is public (i.e. known to both the parties

and costlessly verifiable by a third party, such as a court). This component of the model

addresses the question, "who knows what?". The role of the informational assumption

in agency is as follows:

(a) it determines how the parties act and make decisions (such as offer payment

schemes or choose effort levels),

(b) it makes it possible to identify or design communication structures,

(c) it determines what additional information is necessary or desirable for

improved decision making, and


(d) it enables the computation of the cost of maintaining or establishing

communication structures, or the cost of obtaining additional information.

For example, one usual assumption in the principal-agent literature is that the

agent's reservation level is known to both parties. As another example of the way in

which additional information affects the decisions of the principal, note that the principal,

in choosing a set of compensation schemes for presenting to the agent, wishes to

maximize her welfare. It is in her interest, therefore, to make the agent accept a payment

scheme which induces him to choose an effort level that will yield a desired level of

output (taking into consideration exogenous risk). The principal would be greatly

assisted in her decision making if she had knowledge of the "function" which induces the

agent to choose an effort level based on the compensation scheme, and also knowledge

of the hidden characteristics of the agent such as his utility of income, disutility of effort,

risk attitude, reservation constraint, etc. Similarly, the agent would be able to take better

decisions if he were more aware of his risk attitude, disutility of effort and exogenous

factors. Any information, even if imperfect, would reduce either the magnitude or the

variance of risk or both. However, better information for the agent does not always

imply that the agent will choose an act or effort level that is also optimal for the

principal. In some cases, the total welfare of the agency may be reduced as a result

(Christensen, 1981).

The gap in information may be reduced by employing a system of messages from

the agent to the principal. This system of messages may be termed a "communication

structure" (Christensen, 1981). The agent chooses his action by observing a signal from

his private information system after he accepts a particular compensation scheme from

the principal subject to its satisfying the reservation constraint. This signal is caused by

the combination of the compensation scheme, an estimate of exogenous risk by the agent

based on his prior information or experience, and the agent's knowledge of his risk

attitude and disutility of action. The communication structure agreed upon by both the

principal and the agent allows the agent to send a message to the principal. It is to be

noted that the agency contract can be made contingent on the message, which is jointly

observable by both the parties. The compensation scheme considers the message(s) as

one (some) of the factors in the computation of the payment to the agent, the other of

course being the output caused by the agent's action. Usually, formal communication

is not essential, as the principal can just offer the agent a menu of compensation

schemes, and allow the agent to choose one element of the menu.

5.1.4 The Timing Component of Agency

Timing deals with the sequence of actions taken by the principal and the agent,

and the time when they commit themselves to specific decisions (for example, the agent

may choose an effort level before or after observing some signal about exogenous risk).

Below is one example of timing (T denotes time):

T1. The principal selects a particular compensation scheme from a set of possible

compensation schemes.

T2. The agent accepts or rejects the suggested compensation scheme depending on

whether it satisfies his reservation constraint or not.

T3. The agent chooses an action or effort level from a set of possible actions or effort levels.


T4. The outcome occurs as a function of the agent's actions and exogenous factors

which are unknown or known only with uncertainty.

Another example of timing is when a communication structure with signals and

messages is involved (Christensen, 1981):

T1. The principal designs a compensation scheme.

T2. Formation of the agency contract.

T3. The agent observes a signal.

T4. The agent chooses an act and sends a message to the principal.

T5. The output occurs from the agent's act and exogenous factors.

Variations in the principal-agent problems are caused by changes in one or more

of these components. For example, some principal-agent problems are characterized by

the fact that the agent may not be able to enforce the payment commitments of the

principal. This situation occurs in some of the relationships in the context of regulation.

Another is the possibility of renegotiation or review of the contract at some future date.

Agency theory, dealing with the above market structure, gives rise to a variety

of problems caused by the presence of factors such as the influence of externalities,

limited observability, asymmetric information, and uncertainty (Gjesdal, 1982).

5.1.5 Limited Observability, Moral Hazard, and Monitoring

An important characteristic of principal-agent problems, limited observability of

the agent's actions, gives rise to moral hazard. Moral hazard is a situation in which one

party (say, the agent) may take actions detrimental to the principal and which cannot be

perfectly and/or costlessly observed by the principal (see for example, [Holmstrom,

1979]). Formally, perfect observation might very well impose "infinite" costs on the

principal. The problem of unobservability is usually addressed by designing monitoring

systems or signals which act as estimators of the agent's effort. The selection of

monitoring signals and their value is discussed for the case of costless signals in Harris

and Raviv (1979), Holmstrom (1979), Shavell (1979), Gjesdal (1982), Singh (1985), and

Blickle (1987). Costly signals are discussed for three cases in Blickle (1987).

On determining the appropriate monitoring signals, the principal invites the agent

to select a compensation scheme from a class of compensation schemes which she, the

principal, compiles. Suppose the principal determines monitoring signals $s_1, \ldots, s_n$, and

has a compensation scheme $c(q, s_1, \ldots, s_n)$, where q is the output, which the agent

accepts. There is no agreement between the principal and the agent as to the level of the

effort e. Since the signals $s_i$, i = 1, ..., n determine the payoff and the effort level e of

the agent (assuming the signals have been chosen carefully), the agent is thereby induced

to an effort level which maximizes the expected utility of his payoff (or some other

decision criterion). The only decision still in the agent's control is the choice of how

much payoff he wants; the assumption is that the agent is rational in an economic sense.

The principal's residuum is the output q less the compensation c(·). The principal

structures the compensation scheme c(·) in such a way as to maximize the expected

utility of her residuum (or some other decision criterion). In this manner, the principal

induces desirable work behavior in the agent.

It has been observed that "the source of moral hazard is not unobservability but

the fact that the contract cannot be conditioned on effort. Effort is noncontractible."

(Rasmusen, 1989). This is true when the principal observes shirking on the part of the

agent but is unable to prove it in a court of law. However, this only implies that a

contract on effort is imperfectly enforceable. Moral hazard may be alleviated in cases

where effort is contracted, and where both limited observability and a positive probability

of proving non-compliance exist.

5.1.6 Informational Asymmetry, Adverse Selection, and Screening

Adverse selection arises in the presence of informational asymmetry which causes

the two parties to act on different sets of information. When perfect sharing of

information is present and certain other conditions are satisfied, first-best solutions are

feasible (Sappington and Stiglitz, 1987). Typically, however, adverse selection exists.

While the effect of moral hazard makes itself felt when the agent is taking actions

(say, production or sales), adverse selection affects the formation of the relationship, and

may give rise to inefficient (in the second-best sense) contracts. In the information-

theoretic approach, we can think of both being caused by lack of information. This is

variously referred to as the dissimilarity between private information systems of the agent

and the firm, or the unobservability or ignorance of "hidden characteristics" (in the latter

sense, moral hazard is caused by "hidden effort or actions").

In the theory of agency, the hidden characteristic problem is addressed by

designing various sorting and screening mechanisms, or communication systems that pass

signals or messages about the hidden characteristics (of course, the latter can also be used

to solve the moral hazard problem).

On the one hand, the screening mechanisms can be so arranged as to induce the

target party to select by itself one of the several alternative contracts (or "packages").

The selection would then reveal some particular hidden characteristic of the party. In

such cases, these mechanisms are called "self-selection" devices. See, for example,

Spremann (1987) for a discussion of self-selection contracts designed to reveal the agent's

risk attitude. On the other hand, the screening mechanisms may be used as indirect

estimators of the hidden characteristics, as when aptitude tests and interviews are used

to select agents.

The significance of the problem caused by the asymmetry of information is related

to the degree of lack of trust between the parties to the agency contract which, however,

may be compensated for by observation of effort. However, most real life situations

involving an agency relationship of any complexity are characterized not only by a lack

of trust but also by a lack of observability of the agent's effort. The full context to the

concept of information asymmetry is the fact that each party in the agency relationship

is either unaware or has only imperfect knowledge of certain factors which are better

known to the other party.

5.1.7 Efficiency of Cooperation and Incentive Compatibility

In the absence of asymmetry of information, both principal and agent would

cooperatively determine both the payoff and the effort or work behavior of the agent.

Subsequently, the "game" would be played cooperatively between the principal and the

agent. This would lead to an efficient agreement termed the first-best design of

cooperation. First-best solutions are often absent not merely because of the presence of

externalities but mainly because of adverse selection and moral hazard (Spremann, 1987).

Let F = { (c,e) }, where compensation c and effort e satisfy the principal's and

the agent's decision criteria respectively. In other words, F is the set of first-best

designs of cooperation, also called efficient designs with respect to the principal-agent

decision criteria. Now, suppose that the agent's action e is induced as above by a

function I: I(c) = e. Let S = { (c, I(c)) }; i.e., S denotes the set of designs feasible

under information asymmetry. If it were not the case that F ∩ S = ∅, then efficient

designs of cooperation would be easily induced by the principal. Situations where this

occurs are said to be incentive compatible. In all other cases, the principal has available

to her only second-best designs of cooperation, which are defined as those schemes that

arise in the presence of information asymmetry.
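On finite sets, the incentive-compatibility test reduces to a set intersection. The sketch below is purely illustrative: the inducement function, the compensation values, and the set of first-best designs are hypothetical toy data, and the helper names are not from the original.

```python
def feasible_designs(compensations, induced_effort):
    """S = {(c, I(c))}: the designs reachable under information asymmetry."""
    return {(c, induced_effort(c)) for c in compensations}

def incentive_compatible_designs(first_best, feasible):
    """F intersected with S; a nonempty result means the situation is
    incentive compatible."""
    return first_best & feasible

# Toy data: the agent works hard only if paid at least 10.
induced = lambda c: "high" if c >= 10 else "low"
S = feasible_designs([5, 10], induced)
F = {(10, "high")}            # assumed set of first-best designs
print(incentive_compatible_designs(F, S))
```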

5.1.8 Agency Costs

There are three types of agency costs (Schneider, 1987):

(1) the cost of monitoring the hidden effort of the agent,

(2) the bonding costs of the agent, and

(3) the residual loss, defined as the monetary equivalent of the loss in welfare of the

principal caused by the actions taken by the agent which are non-optimal with

respect to the principal.

Agency costs may be interpreted in the following two ways:

(1) they may be used to measure the "distance" between the first-best and the second-

best designs;

(2) they may be looked upon as the value of information necessary to achieve second-

best designs which are arbitrarily close to the first-best designs.

Obviously, the value of perfect information should be considered as an upper

bound on the agency costs (see for example, [Jensen and Meckling, 1976]).

5.2 Formulation of the Principal-Agent Problem

The following notation and definitions will be used throughout:

D: the set of decision criteria, such as {maximin, minimax, maximax, minimin,

minimax regret, expected value, expected loss, ...}. We use $A \in D$.

$A_P$: the decision criterion of the principal.

$A_A$: the decision criterion of the agent.

$U_P$: the principal's utility function.

$U_A$: the agent's utility function.

C: the set of all compensation schemes. We use $c \in C$.

E: the set of actions or effort levels of the agent. We use $e \in E$.

$\theta$: a random variable denoting the true state of nature.

$\theta_P$: a random variable denoting the principal's estimate of the state of nature.

$\theta_A$: a random variable denoting the agent's estimate of the state of nature.

q: output realized from the agent's actions (and possibly the state of nature).

$q_P$: monetary equivalent of the principal's residuum. Note that $q_P = q - c(\cdot)$,

where c may depend on the output and possibly other variables.

Output/outcome. The goal or purpose of the agency relationship, such as sales,

services or production, is called the output or the outcome.

Public knowledge/information. Knowledge or information known to both the

principal and the agent, and also a third enforcement party, is termed public knowledge

or information. A contract in agency can be based only on public knowledge (i.e.

observable output or signals).

Private knowledge/information. Knowledge or information known to either the

principal or the agent but not both is termed private knowledge or information.

State of nature. Any events, happenings, occurrences or information which are

not in the control of the principal or the agent and which affect the output of the agency

directly through the technology constitute the state of nature.

Compensation. The economic incentive to the agent to induce him to participate

in the agency is called the compensation. This is also called wage, payment or reward.

Compensation scheme. The package of benefits and output sharing rules or

functions that provide compensation to the agent is called the compensation scheme.

Also called contract, payment function or compensation function.


The word "scheme" is used here instead of "function" since complicated

compensation packages will be considered as an extension later on. In the literature, the

word "scheme" may be seen, but it is used in the sense of "function", and several nice

properties are assumed for the function (such as continuity, differentiability, and so on).

Depending on the contract, the compensation may be negative, i.e., a penalty for the agent.

Typical components of the compensation functions considered in the literature are rent

(fixed and possibly negative), and share of the output.

The principal's residuum. The economic incentive to the principal to engage in

the agency is the principal's residuum. The residuum is the output (expressed in

monetary terms) less the compensation to the agent. Hence, the principal is sometimes

called the residual claimant.

Payoff. Both the agent's compensation and the principal's residuum are called

the payoffs.

Reservation welfare (of the agent). The monetary equivalent of the best of the

alternative opportunities (with other competing principals, if any) available to the agent

is known as the reservation welfare of the agent. Accordingly, it is the minimum

compensation that induces an agent to accept the contract, but not necessarily induce him

to his best effort level. Also known as reservation utility or individual utility, it is

variously denoted in the literature as m or U.

Disutility of effort. The cost of inputs which the agent must supply himself when

he expends effort contributes to disutility, and hence is called the disutility of effort.


Individual rationality constraint (IRC). The agent's (expected) utility of net

compensation (compensation from the principal less his disutility of effort) must be at

least as high as his reservation welfare. This constraint is also called the participation constraint.


When a contract violates the individual rationality constraint, the agent rejects it

and prefers unemployment instead. Such a contract is not necessarily "bad", since

different individuals have different levels of reservation welfare. For example,

financially independent individuals may have higher than usual reservation welfare levels,

and might very well prefer leisure to work even when contracts are attractive to most

other people.

Incentive compatibility constraint (ICC). A contract will be acceptable to the

agent if it satisfies his decision criterion on compensation, such as maximization of

expected utility of net compensation. This constraint is called the incentive compatibility constraint.


Development of the problem: Model 1. We develop the problem from simple

cases involving the least possible assumptions on the technology and informational

constraints, to those having sophisticated assumptions. Corresponding models from the

literature are reviewed briefly in section 1.3.

A. Technology:

(a) fixed compensation, C = set of fixed compensations, $U \in C$;

(b) output q = q(e); assume q(0) = 0;

(c) existence of nonseparable utility functions;

(d) decision criterion: maximization of utility;

(e) no uncertainty in the state of nature.

B. Public information:

(a) compensation scheme, c;

(b) range of possible outputs, Q;

(c) U.

Information private to the principal: $U_P$.

Information private to the agent:

(a) $U_A$;

(b) disutility of effort, d;

(c) range of effort levels, E.

C. Timing:

(1) the principal makes an offer of fixed wage c;

(2) the agent either rejects or accepts the offer;

(3) if he accepts it, he exerts effort level e;

(4) output q(e) results;

(5) sharing of output according to contract.

D. Payoffs:

Case 1: Agent rejects contract, i.e. e = 0;

$\pi_P = U_P[q(e)] = U_P[q(0)] = U_P[0]$.

$\pi_A = U_A[U]$.

Case 2: Agent accepts contract;

$\pi_P = U_P[q(e) - c]$.

$\pi_A = U_A[c - d(e)]$.

E. The principal's problem:

(M1.P1) $\max_{c \in C} \max_{q \in Q} U_P[q - c]$

such that

$c \ge U$. (IRC)

Suppose $C^* \subseteq C$ is the solution set of M1.P1. The principal picks $c^* \in C^*$ and offers

it to the agent.

The agent's problem:

(M1.A1) For a given $c^*$,

$\max_{e \in E} U_A[c^* - d(e)]$.

Suppose $E^* \subseteq E$ is the solution set of M1.A1. The agent selects $e^* \in E^*$.

F. The solution:

(a) the principal offers $c^* \in C^*$ to the agent;

(b) the agent accepts the contract;

(c) the agent exerts effort $e^*(c^*) \in E^*$;

(d) output $q(e^*(c^*))$ occurs;

(e) payoffs:

$\pi_P = U_P[q(e^*(c^*)) - c^*]$;

$\pi_A = U_A[c^* - d(e^*(c^*))]$.


1. The agent accepts the contract in F.b since IRC is present in M1.P1, and $C^*$

is nonempty since $U \in C$.

2. Effort of the agent is a function of the offered compensation.

3. Since one of the informational assumptions was that the principal does not

know the agent's utility function, U is a compensation rather than the agent's

utility of compensation, so $U_A(U)$ is meaningful.
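On finite sets, the two optimization problems of Model 1 can be solved by enumeration. The sketch below is an illustration under stated assumptions: both utility functions are taken to be the identity, and the sets C, Q, E, the disutility table, and the reservation welfare are hypothetical toy values. The function names `solve_principal` and `solve_agent` are likewise introduced only for this example.

```python
def solve_principal(C, Q, U_P, reservation):
    """M1.P1: maximize U_P[q - c] over c in C and q in Q, subject to
    c >= reservation (IRC). Returns the solution set C*.
    Raises ValueError if no c satisfies the IRC."""
    acceptable = [c for c in C if c >= reservation]
    value = {c: max(U_P(q - c) for q in Q) for c in acceptable}
    best = max(value.values())
    return [c for c in acceptable if value[c] == best]

def solve_agent(c_star, E, U_A, d):
    """M1.A1: for a given c*, maximize U_A[c* - d(e)] over e in E.
    Returns the solution set E*."""
    value = {e: U_A(c_star - d(e)) for e in E}
    best = max(value.values())
    return [e for e in E if value[e] == best]

identity = lambda x: x                      # assumed utility functions
disutility = {0: 0, 1: 1, 2: 3}.__getitem__  # assumed d(e)

C_star = solve_principal(C=[2, 3, 4], Q=[0, 5, 10], U_P=identity, reservation=3)
E_star = solve_agent(C_star[0], E=[0, 1, 2], U_A=identity, d=disutility)
print(C_star, E_star)
```

With a fixed wage, the agent in this toy instance maximizes his net compensation by choosing the effort level with the lowest disutility.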

G. Variations:

1. The principal offers $C^*$ to the agent instead of a $c^* \in C^*$. The agent's problem

then becomes:

(M1.A2) $\max_{c^* \in C^*} \max_{e \in E} U_A[c^* - d(e)]$.

The first three steps in the solution then become:

(a) the principal offers $C^*$ to the agent;

(b) the agent accepts the contract;

(c) the agent picks an effort level $e^*$ which is a solution to M1.A2 and reports

the corresponding $c^*$ (or its index if appropriate) to the principal.

2. The agent may decide to solve an additional problem: from among two or more

competing optimal effort levels, he may wish to select a minimum effort level.

Then, his problem would be:

(M1.A3) $\min_{e^*} d(e^*)$

such that

$e^* \in \arg\max_{e \in E} U_A[c^* - d(e)]$.

Let $E = \{e_1, e_2, e_3\}$,

$C^* = \{c_1, c_2, c_3\}$.

$c_1(q(e_1)) = 5$, $d(e_1) = 2$;

$c_2(q(e_2)) = 6$, $d(e_2) = 3$;

$c_3(q(e_3)) = 6$, $d(e_3) = 4$;

The net compensation to the agent in choosing the three effort levels is 3, 3, and

2 respectively. Assuming d(e) is monotone increasing in e, the agent chooses $e_1$

over $e_2$, and so prefers compensation $c_1$ to $c_2$.
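The tie-breaking step of M1.A3 can be sketched directly with the numbers of this example. The agent's utility function is assumed here to be the identity, and the function name `least_costly_optimal_effort` is a hypothetical helper.

```python
def least_costly_optimal_effort(compensation, disutility, U_A=lambda x: x):
    """Among the effort levels that maximize U_A[c - d(e)], return the one
    with the smallest disutility, together with the full set of maximizers."""
    utility = {e: U_A(compensation[e] - disutility[e]) for e in compensation}
    best = max(utility.values())
    optimal = [e for e in utility if utility[e] == best]
    return min(optimal, key=lambda e: disutility[e]), optimal

compensation = {"e1": 5, "e2": 6, "e3": 6}   # c_i(q(e_i)) from the example
disutility = {"e1": 2, "e2": 3, "e3": 4}     # d(e_i) from the example
choice, optimal = least_costly_optimal_effort(compensation, disutility)
print(choice, optimal)
```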

3. We assumed U is public knowledge. If this were not so, then the agent has to

test all offers to see if they are at least as high as the utility of his reservation

welfare. The two problems then become:

(M1.P2) $\max_{c \in C} \max_{q \in Q} U_P[q - c]$

(M1.A4) $\max_{e \in E} U_A[c^* - d(e)]$

such that

$c^* \ge U_A[U]$, (IRC)

$c^* \in \arg\max$ M1.P2.

In this case, there is a distinct possibility of the agent rejecting an offer of the


4. Note that in most realistic situations, a distinction must be made between the reservation welfare and the agent's utility of the reservation welfare. Otherwise, merely using IRC with the reservation welfare in M1.P1 may not satisfy the agent's constraint. On the other hand, $U = U_A(U)$ implies knowledge of $U_A$ by the principal, a complication which yields a completely different model.

When $U \neq U_A(U)$, the following two problems occur:

(M1.P3) $\max_{c \in C} \max_{q \in Q} U_P(q - c)$

such that

$c \geq U$.

(M1.A5) $\max_{e \in E} U_A(c^* - d(e))$

such that

$c^* \geq U_A(U)$, (IRC)

$c^* \in \arg\max$ M1.P3.

In other words, the principal solves her problem the best way she can, and hopes

the solution is acceptable to the agent.

5. Negotiation. Negotiation of a contract can occur in two contexts:

(a) when there is no solution to the initial problem, the agent may communicate

to the principal his reservation welfare, and the principal may design new

compensation schemes or revise her old schemes so that a solution may be

found. This type of negotiation also occurs in the case of problems M1.P3

and M1.A5.

(b) The principal may offer $c^* \in \arg\max_{c \in C}$ M1.P1. The agent either accepts it or does not; if he does not, then the principal may offer another optimal contract, if any. This interaction may continue until either the agent accepts some compensation scheme or the principal runs out of optimal compensation schemes.
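The sequential-offer negotiation in (b) amounts to a simple loop over the principal's optimal set. The following Python sketch is only an illustration: the contract labels, the compensation figures, and the acceptance thresholds are hypothetical, and the agent's private test is passed in as a function because the principal does not know it.

```python
def negotiate(optimal_contracts, agent_accepts):
    """Offer contracts in order; return the first one accepted, or None."""
    for contract in optimal_contracts:
        if agent_accepts(contract):
            return contract
    return None


# Hypothetical optimal set: (label, compensation received by the agent).
optimal_contracts = [("c1", 3), ("c2", 5), ("c3", 4)]

# Hypothetical private acceptance thresholds of two different agents.
accepted = negotiate(optimal_contracts, lambda c: c[1] >= 4)
no_deal = negotiate(optimal_contracts, lambda c: c[1] >= 10)

print(accepted, no_deal)
```

A return value of `None` corresponds to the principal running out of optimal offers without agreement.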


Development of the problem: Model 2. This model differs from the first by

incorporating uncertainty in the state of nature, and conditioning the compensation

functions on the output.

A. Technology:

(a) presence of uncertainty in the state of nature;

(b) compensation scheme c = c(q);

(c) output $q = q(e,\theta)$;

(d) existence of known utility functions for the agent and the principal;

(e) disutility of effort for the agent is monotone increasing in effort e;

B. Public information:

(a) presence of uncertainty, and range of $\theta$;

(b) output function q;

(c) payment functions c;

(d) range of effort levels of the agent.

Information private to the principal:

(a) the principal's utility function;

(b) the principal's estimate of the state of nature.

Information private to the agent:

(a) the agent's utility function;

(b) the agent's estimate of the state of nature;

(c) disutility of effort;

(d) reservation welfare;

C. Timing:

(a) the principal determines the set of all compensation schemes that maximize

her expected utility;

(b) the principal presents this set to the agent as the set of offered contracts;

(c) the agent picks from this set of compensation schemes a compensation scheme that maximizes his net compensation, and a corresponding effort level;


(d) a state of nature occurs;

(e) an output results;

(f) sharing of the output takes place as contracted.

D. Payoffs:

Case 1: Agent rejects contract, i.e. $e = 0$;

$\pi_P = U_P[q(e,\theta)] = U_P[q(0,\theta)]$.

$\pi_A = U_A[U]$.

Case 2: Agent accepts contract;

$\pi_P = U_P[q(e,\theta) - c(q)]$.

$\pi_A = U_A[c(q) - d(e)]$.

E. The principal's problem:
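(M2.P) $\max_{c \in C} \max_{e \in E} E_{\theta_P} U_P[q(e,\theta) - c(q(e,\theta))]$

where the expectation $E(\cdot)$ is given by (assuming the usual regularity conditions)

$\int_{\underline{\theta}}^{\bar{\theta}} U_P[q(e,\theta) - c(q(e,\theta))] f_{\theta_P}(\theta)\, d\theta$,

$\theta \in [\underline{\theta}, \bar{\theta}]$, and $f_{\theta_P}(\theta)$ is the distribution assigned by the principal.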

The agent's problem:
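(M2.A) $\max_{c \in C} \max_{e \in E} E_{\theta_A} U_A[c(q(e,\theta)) - d(e)]$

subject to

$E_{\theta_A}[c(q(e,\theta)) - d(e)] \geq U$, (IRC)

$c \in \arg\max$ (M2.P),

where the expectation $E(\cdot)$ is given as usual by

$\int_{\underline{\theta}}^{\bar{\theta}} U_A[c(q(e,\theta)) - d(e)] f_{\theta_A}(\theta)\, d\theta$.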

F. The solution:

(a) The agent selects $c^* \in C$, and a corresponding effort $e^*$ which is a solution to M2.A;

(b) a state of nature $\theta$ occurs;

(c) output $q(e^*,\theta)$ is generated;

(d) payoffs:

$\pi_P = U_P[q(e^*,\theta) - c^*(q(e^*,\theta))]$;

$\pi_A = U_A[c^*(q(e^*,\theta)) - d(e^*)]$.
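To make the role of the two private estimates of the state of nature concrete, here is a minimal Python sketch of Model 2 on a discrete grid. Everything numerical in it is an assumption introduced for illustration: three states of nature, the two probability assessments, the output function $q = e + \theta$, the disutility $d(e) = e^2$, a single offered scheme $c(q) = 0.5q$, risk-neutral utilities for both parties, and a reservation welfare of 0.3.

```python
# Hypothetical discrete states of nature and the two private assessments.
thetas = [0.0, 1.0, 2.0]
f_principal = [0.2, 0.5, 0.3]   # principal's estimate of theta
f_agent = [0.5, 0.3, 0.2]       # agent's (different) estimate of theta
U_RES = 0.3                     # hypothetical reservation welfare


def q(e, theta):
    return e + theta            # assumed output function


def d(e):
    return e ** 2               # assumed disutility of effort


def c(output):
    return 0.5 * output         # one assumed compensation scheme


def expected(values, probs):
    return sum(v * p for v, p in zip(values, probs))


def principal_payoff(e):
    """E[q - c(q)] under the principal's own distribution (risk neutral)."""
    return expected([q(e, t) - c(q(e, t)) for t in thetas], f_principal)


def agent_payoff(e):
    """E[c(q) - d(e)] under the agent's own distribution (risk neutral)."""
    return expected([c(q(e, t)) - d(e) for t in thetas], f_agent)


efforts = [0.0, 0.25, 0.5, 0.75, 1.0]
e_star = max(efforts, key=agent_payoff)      # agent's problem for this scheme
irc_holds = agent_payoff(e_star) >= U_RES    # individual rationality check
print(e_star, irc_holds, principal_payoff(e_star))
```

The same effort level is valued by each party under its own distribution, which is why the principal's and the agent's expectations carry different subscripts in M2.P and M2.A.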

Development of the problem: Model 3. In this model, the strongest possible

assumption is made about information available to the principal: the principal has

complete knowledge of the utility function of the agent, his disutility of effort, and his

reservation welfare. Accordingly, the principal is able to make an offer of compensation

which satisfies the decision criterion of the agent and his constraints. In other words,

the two problems are treated as one. The assumptions are as in model 2, so only the

statement of the problem will be given below.

The problem:
$\max_{c \in C,\, e^* \in E} E\, U_P[q(e^*,\theta) - c(q(e^*,\theta))]$

subject to

$E\, U_A[c(q(e^*,\theta)) - d(e^*)] \geq U$, (IRC)

$e^* \in \arg\max \{\max_{e \in E,\, c \in C} E\, U_A[c(q(e,\theta)) - d(e)]\}$. (ICC)

5.3 Main Results in the Literature

Several results from basic agency models will be presented using the framework

established in the development of the problem. The following will be presented for each model: Technology, Information, Timing, Payoffs, and Results.


It must be noted that the literature rarely presents such an explicit format; rather,

several assumptions are often buried within the results, or implied or just not stated.

Only by trying an algorithmic formulation is it possible to unearth unspecified

assumptions. In many cases, some of the factors are assumed for the sake of formal

completeness, even though the original paper neither mentions nor uses those factors in

its results. This type of modeling is essential when the algorithms are implemented

subsequently using a knowledge-intensive methodology.

One recurrent example of incomplete specification is the treatment of the agent's

individual rationality constraint (IRC). The principal has to pick a compensation which

satisfies IRC. However, some consistency in using IRC is necessary. The agent's

reservation welfare U is also a compensation (albeit a default one). The agent must

check one of two constraints to verify that the offered compensation indeed meets his

reservation welfare:

$c \geq U$ or $U_A(c) \geq U_A(U)$.

If the principal picks a compensation which satisfies $c \geq U$, it is not necessary that $U_A(c) \geq U_A(U)$ be also satisfied. However, using $U_A(c) \geq U$ for the IRC, where $U$ is treated "as if" it were $U_A(U)$, implies knowledge of the agent's utility on the part of the principal.

The difference between the two situations is of enormous significance if the

purpose of analysis is to devise solutions to real-world problems. In the literature, this

distinction is conveniently overlooked. If all such vagueness in the technological, informational and temporal assumptions were to be systematically eliminated, the analysis

might change in a way not intended in the original literature. Hence, the main results

in the literature will be presented as they are.

5.3.1 Model 1: The Linear-Exponential-Normal Model

The name of this model (Spremann, 1987) derives from the nature of three crucial parameters: the payoff functions are linear, the utility functions are exponential, and the exogenous risk has a normal distribution. Below is a full description.

Technology:

(a) compensation is the sum of a fixed rent $r$ and a share $s$ of the output $q$: $c(q) = r + sq$;

(b) presence of uncertainty in the state of nature, denoted by $\theta$, where $\theta \sim N(0, \sigma^2)$;

(c) the set of effort levels of the agent, $E = [0,1]$; effort is induced by compensation;

(d) output $q \equiv q(e,\theta) \equiv e + \theta$;

(e) the agent's disutility of effort is $d \equiv d(e) \equiv e^2$;

(f) the principal's utility $U_P$ is linear (the principal is risk neutral);

(g) the agent has constant risk aversion $\alpha > 0$, and his utility is $U_A(w) = -\exp(-\alpha w)$, where $w$ is his net compensation (also called the wealth);

(h) the certainty equivalent of wealth, denoted $V$, is defined as: $V(w) = U^{-1}[E_\theta(U(w))]$, where $U$ denotes the utility function, $E_\theta$ is the expectation with respect to $\theta$; as usual, subscripts $P$ or $A$ on $V$ denote the principal or the agent respectively;

(i) the decision criterion is maximization of expected utility.

Public information:

(a) compensation scheme c(q; r,s);

(b) output q;

(c) distribution of 0;

(d) agent's reservation welfare U;

(e) agent's risk aversion $\alpha$.

Information private to the principal:

Utility of residuum, Up.

Information private to the agent:

(a) selection of effort given the compensation;

(b) utility of welfare;

(c) disutility of effort.


Timing:

(a) the principal offers a contract $(r,s)$ to the agent;

(b) the agent's effort e is induced by the compensation scheme;

(c) a state of nature occurs;

(d) the agent's effort and the state of nature give rise to output;

(e) sharing of the output takes place.

Payoffs:

$\pi_P = U_P[q - (r + sq)]$
$= U_P[e(r,s) + \theta_0 - (r + s(e(r,s) + \theta_0))]$

$\pi_A = U_A[r + sq - d(e(r,s))]$
$= U_A[r + s(e(r,s) + \theta_0) - d(e(r,s))]$,

where $e(r,s)$ is the function which induces effort based on compensation, and $\theta_0$ is the realized state of nature.



Results:

Result 1.1: The optimal effort level of the agent given a compensation scheme $(r,s)$ is denoted $e^*$, and is obtained by straightforward maximization to yield:

$e^* = e^*(r,s) = s/2$.

This shows that the rent r and the reservation welfare U have no impact on the selection

of the agent's effort.
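The "straightforward maximization" behind Result 1.1 can be sketched as follows. The derivation assumes that $\theta$ is normal with mean zero and variance $\sigma^2$, and it uses the standard fact that an exponential utility with risk aversion $\alpha$ applied to a normally distributed wealth has certainty equivalent equal to the mean minus $\alpha/2$ times the variance; the symbol $V_A$ for the agent's certainty equivalent follows the definition of $V$ given in the technology assumptions.

```latex
\begin{align*}
w &= r + s(e + \theta) - e^2,
  \qquad E_\theta[w] = r + se - e^2,
  \qquad \operatorname{Var}_\theta[w] = s^2\sigma^2, \\
V_A(e) &= r + se - e^2 - \tfrac{\alpha}{2}\, s^2\sigma^2, \\
\frac{\partial V_A}{\partial e} &= s - 2e = 0
  \;\Longrightarrow\; e^* = \frac{s}{2},
  \qquad \frac{\partial^2 V_A}{\partial e^2} = -2 < 0 .
\end{align*}
```

Neither $r$ nor $U$ appears in the first-order condition, which is why they do not affect the chosen effort. Substituting $e^* = s/2$ back gives $V_A = r + \frac{s^2}{4}(1 - 2\alpha\sigma^2)$, the quantity that the individual rationality constraint compares with $U$.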

Result 1.2: A necessary and sufficient condition for IRC to be satisfied for a given compensation scheme $(r,s)$ is:

$r \geq U - \frac{s^2}{4}(1 - 2\alpha\sigma^2)$.

Result 1.3: The optimal compensation scheme for the principal is $c^* = (r^*, s^*)$, where

$s^* = \frac{1}{1 + 2\alpha\sigma^2}$ and $r^* = U - \frac{1 - 2\alpha\sigma^2}{4}\, s^{*2}$.

Corollary 1.3: The agent's optimal effort given a compensation scheme $(r^*, s^*)$ is (using result 1.1):

$e^* = \frac{1}{2(1 + 2\alpha\sigma^2)}$.


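Result 1.4: Suppose $2\alpha\sigma^2 > 1$. Then, an increase in share $s$ requires an increase in rent $r$ (in order to satisfy IRC).

To see this, suppose we increase the share $s$ by $\delta$, so that $s' = s + \delta$, $0 < \delta < 1 - s$, and let $r = U - \frac{s^2}{4}(1 - 2\alpha\sigma^2)$ be the rent at which IRC binds for share $s$. From Result 1.2, for IRC to hold at $s'$ we need a rent $r'$ with

$r' \geq U - \frac{s'^2}{4}(1 - 2\alpha\sigma^2)$
$= U - \frac{(s + \delta)^2}{4}(1 - 2\alpha\sigma^2)$
$= U - \frac{1 - 2\alpha\sigma^2}{4}[s^2 + 2s\delta + \delta^2]$
$= U - \frac{s^2}{4}(1 - 2\alpha\sigma^2) - \frac{2s\delta + \delta^2}{4}(1 - 2\alpha\sigma^2)$
$= r - \frac{2s\delta + \delta^2}{4}(1 - 2\alpha\sigma^2)$
$> r$ (since $1 < 2\alpha\sigma^2$).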

Result 1.5: The welfare attained by the agent is $U$, while the principal's welfare is given by:

$\frac{s^*}{4} - U$.

Result 1.6: The principal prefers agents with lower risk aversion. This is

immediate from the fact that the principal's welfare is decreasing in the agent's risk

aversion for a given $\sigma^2$ and $U$.

Result 1.7: Fixed fee arrangements are non-optimal, no matter how large the

agent's risk aversion. This is immediate from the fact that

$s^* = \frac{1}{1 + 2\alpha\sigma^2} > 0 \quad \forall \alpha > 0$.

Result 1.8: It is the connection between unobservability of the agent's effort and

his risk aversion that excludes first-best solutions.
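The closed-form results of this model lend themselves to a quick numerical check. The Python sketch below is an illustration under stated assumptions, not a reproduction of Spremann's derivation: it takes the agent's effort $s/2$, sets the rent at the level where IRC binds, $r = U - \frac{s^2}{4}(1 - 2\alpha\sigma^2)$, treats the principal as risk neutral with $E[\theta] = 0$, and uses hypothetical parameter values. A grid search over the share then recovers $s^* = 1/(1 + 2\alpha\sigma^2)$ and the principal's welfare $s^*/4 - U$.

```python
def agent_effort(s):
    """Result 1.1: effort induced by share s."""
    return s / 2.0


def binding_rent(s, alpha, sigma2, u_res):
    """Result 1.2 with equality: the smallest rent that satisfies IRC."""
    return u_res - (s ** 2 / 4.0) * (1.0 - 2.0 * alpha * sigma2)


def principal_welfare(s, alpha, sigma2, u_res):
    """Expected residuum (1 - s) * e - r of a risk-neutral principal."""
    return (1.0 - s) * agent_effort(s) - binding_rent(s, alpha, sigma2, u_res)


# Hypothetical parameter values, chosen only for illustration.
alpha, sigma2, u_res = 2.0, 0.5, 1.0

s_closed = 1.0 / (1.0 + 2.0 * alpha * sigma2)          # Result 1.3
grid = [i / 1000.0 for i in range(1001)]               # shares in [0, 1]
s_grid = max(grid, key=lambda s: principal_welfare(s, alpha, sigma2, u_res))

welfare = principal_welfare(s_closed, alpha, sigma2, u_res)
welfare_closed = s_closed / 4.0 - u_res                # Result 1.5

# Result 1.6: a less risk-averse agent (alpha = 1) leaves the principal better off.
s_low = 1.0 / (1.0 + 2.0 * 1.0 * sigma2)
welfare_low_alpha = principal_welfare(s_low, 1.0, sigma2, u_res)

print(s_closed, s_grid, welfare, welfare_closed, welfare_low_alpha)
```

The grid maximizer agrees with the closed form up to the grid spacing, and lowering the risk aversion raises the principal's welfare, in line with Result 1.6.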

5.3.2 Model 2

This model (Gjesdal, 1982) deals with two problems:

(a) choosing an information system, and

(b) designing a sharing rule based on the information system.


Technology:

(a) presence of uncertainty, $\theta$;

(b) finite effort set of the agent; effort has several components, and is hence

treated as a vector;

(c) output $q$ is a function of the agent's effort and the state of nature $\theta$; the

range of output levels is finite;

(d) presence of a finite number of public signals;

(e) presence of a set of public information systems (i.e. signals), including non-

informative and randomized systems, the output being treated as one of the

informative information systems;

(f) costlessness of public information systems;

(g) compensation schemes are based on signals about effort or output or both.

Public information:

(a) distribution of the state of nature, $\theta$;

(b) output levels;

(c) common information systems which are non-informative and randomizing;

(d) $U_A$.

Information private to the principal: utility function, $U_P$.

Information private to the agent: disutility of effort.


Timing:

(a) principal offers contract based on observable public information systems,

including the output;

(b) agent chooses action;

(c) signals from the specified public information systems are observed;

(d) agent gets paid on the basis of the signal;

(e) a state of nature occurs;

(f) output is observed;

(g) principal keeps the residuum.

Special technological assumptions: Some of these assumptions are used in only some of

the results; other results are obtained by relaxing them.

(a) The joint probability distribution function on output, signals, and actions is

twice-differentiable in effort, and the marginal effects on this distribution of

the different components of effort are independent.

(b) The principal's utility function $U_P$ is thrice differentiable, increasing, and


(c) The agent's utility function UA is separable, with the function on the

compensation scheme (or sharing rule as it is known) being increasing and

concave, and the function on the effort being concave.


Results:

Result 2.1: There exists a marginal incentive informativeness condition which is

essentially sufficient for marginal value given a signal information system Y. When

information about the output is replaced by signals about the output and/or the agent's

effort, marginal incentive informativeness is no longer a necessary condition for marginal

value since an additional information system Z may be valuable as information about

both the output and the effort.


Result 2.2: Information systems having no marginal insurance value but having

marginal incentive informativeness may be used to improve risk sharing, as for example,

when the signals which are perfectly correlated with output on the agent's effort are

completely observable.

Result 2.3: Under the assumptions of result 2.2, when the output alone is

observed, it must be used for both incentives and insurance. If the effort is observed as

well, then a contract may consist of two parts: one part is based on the effort, and takes

care of incentives; the other part is based on output, and so takes care of risk-sharing.

For example, consider auto insurance. The principal (the insurer) cannot observe

the actions taken by the driver (such as care, caution and good driving habits) to avoid

collisions. However, any positive signals of effort can be the basis of discounts on

insurance premiums, as for example when the driver has proof of regular maintenance

and safety check up for the vehicle or undergoes safe driving courses. Also factors such

as age, marital status and expected usage are taken into account. The "output" in this

case is the driving history, which can be used for risk-sharing; another indicator of risk

which may be used is the locale of usage (country lanes or heavy city traffic). This

example motivates result 2.4, a corollary to results 2.2 and 2.3.

Result 2.4: Information systems having no marginal incentive informativeness

but having marginal insurance value may be used to offer improved incentives.

Result 2.5: If the uncertainty in the informative signal system is influenced by

the choices of the principal and the agent, then such information systems may be used

for control in decentralized decision-making.

5.3.3 Model 3

Holmstrom's model (Holmstrom, 1979) examines the role of imperfect

information under two conditions: (i) when the compensation scheme is based on output

alone, and (ii) when additional information is used. The assumptions about technology,

information and timing are more or less standard, as in the earlier models. The model

specifically uses the following:

(a) In the first part of the model, almost all information is public; in the second

part, asymmetry is brought in by assuming extra knowledge on the part of

the agent.

(b) Output is a function of the agent's effort and state of nature: $q = q(e,\theta)$, and $\partial q/\partial e > 0$.

(c) The agent's utility function is separable in compensation and effort, where

UA(c) is defined on compensation, and d(e) is the disutility defined on effort.

(d) Disutility of effort d(e) is increasing in effort.

(e) The agent is risk averse, so that $U_A'' < 0$.

(f) The principal is weakly risk averse, so that $U_P'' \leq 0$.

(g) Compensation is based on output alone.

(h) Knowledge of the probability distribution on the state of nature $\theta$ is public.

(i) Timing: The agent chooses effort before the state of nature is observed.

The problem:

(P) $\max_{c \in C,\, e \in E} E[U_P(q - c(q))]$

such that

$E[U_A(c(q), e)] \geq U$, (IRC)

$e \in \arg\max_{e' \in E} E[U_A(c(q), e')]$. (ICC)

To obtain a workable formulation, two further assumptions are made:

(a) There exists a distribution induced on output and effort by the state of nature, denoted $F(q,e)$, where $q = q(e,\theta)$. Since $\partial q/\partial e > 0$ by assumption, it implies $\partial F(q,e)/\partial e \leq 0$. For a given $e$, assume $\partial F(q,e)/\partial e < 0$ for some range of values $q$.

(b) $F$ has density function $f(q,e)$, where (denoting $f_e = \partial f/\partial e$) $f_e$ and $f_{ee}$ are well defined for all $(q,e)$.

The ICC constraint in (P) is replaced by its first order condition using f, and the

following formulation is obtained:

(P') $\max_{c \in C,\, e \in E} \int U_P(q - c(q)) f(q,e)\, dq$

such that

$\int [U_A(c(q)) - d(e)] f(q,e)\, dq \geq U$, (IRC')

$\int U_A(c(q)) f_e(q,e)\, dq = d'(e)$. (ICC')
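The step from the incentive constraint (ICC) to its first-order form (ICC') can be written out as below. This is a sketch under the separability assumption on the agent's utility and the assumption that differentiation under the integral sign is permitted; the first-order condition is a necessary condition for the agent's optimum and is not, in general, sufficient.

```latex
\begin{align*}
E[U_A(c(q), e)]
  &= \int \bigl[U_A(c(q)) - d(e)\bigr] f(q,e)\, dq
   = \int U_A(c(q))\, f(q,e)\, dq \;-\; d(e)
   \qquad \Bigl(\text{since } \textstyle\int f(q,e)\, dq = 1\Bigr), \\
\frac{\partial}{\partial e}\, E[U_A(c(q), e)]
  &= \int U_A(c(q))\, f_e(q,e)\, dq \;-\; d'(e) \;=\; 0 .
\end{align*}
```

Rearranging the last line gives exactly (ICC'): the marginal expected utility from shifting the output distribution equals the marginal disutility of effort.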


Result 3.1: Let $\lambda$ and $\mu$ be the Lagrange multipliers for IRC' and ICC' in (P') respectively. Then, the optimal compensation schemes are characterized as follows:

$\frac{U_P'(q - c(q))}{U_A'(c(q))} = \lambda + \mu\, \frac{f_e(q,e)}{f(q,e)}$,

where $\underline{c}$ is the agent's wealth, and $\bar{c}$ is the principal's wealth plus the output (these form the lower and upper bounds). If the equality in the above characterization does not hold, then $c(q) = \underline{c}$ or $\bar{c}$ depending on the direction of inequality.

Result 3.2: Under the given assumptions and the characterization in result 3.1, $\mu > 0$; this is equivalent to saying that the principal prefers the agent increase his effort

given a second-best compensation scheme as in the above result 3.1. The second-best

solution is strictly inferior to a first-best solution.

Result 3.3: $f_e/f$ is interpreted as a benefit-cost ratio for deviation from optimal risk sharing. Result 3.1 states that such deviation must be proportional to this ratio taking individual risk aversion into account. From Result 3.2, incentives for increased effort are preferable to the principal. The following compensation scheme accomplishes this (where $c_F(q)$ denotes the first-best solution for a given $\lambda$):

$c(q) > c_F(q)$, if the marginal return on effort is positive to the agent;

$c(q) < c_F(q)$, otherwise.

Result 3.4: Intuitively, the agent carries excess responsibility for the output. This

is implied by result 3.3 and the assumptions on the induced distribution f.

A previous assumption is now modified as follows: Compensation c is a function

of output and some other signal y which is public knowledge. Associated with this is a

joint distribution F(q,y,e) (as above), with f(q,y,e) the corresponding density function.


Result 3.5: An extension of result 3.1 on the characterization of optimal

compensation schemes is as follows:

$\frac{U_P'(q - c(q,y))}{U_A'(c(q,y))} = \lambda + \mu\, \frac{f_e(q,y,e)}{f(q,y,e)}$,

where $\lambda$ and $\mu$ are as in result 3.1.

Result 3.6: Any informative signal, no matter how noisy it is, has a positive value

if costlessly obtained and administered into the contract.

Note: This result is based on rigorous definitions of value and informativeness of signals

(Holmstrom, 1979).

In the second part of this model, an assumption is made about additional

knowledge of the state of nature revealed to the agent alone, denoted z. This introduces

asymmetry into the model. The timing is as follows:

(a) the principal offers a contract $c$ based on the output and an observed signal $y$;

(b) the agent accepts the contract;

(c) the agent observes a signal $z$ about $\theta$;

(d) the agent chooses an effort level;

(e) a state of nature occurs;

(f) agent's effort and state of nature yield an output;

(g) sharing of output takes place.

We can think of the signal y as information about the state of nature which both

parties share and agree upon, and the signal z as special post-contract information about

the state of nature received by the agent alone.

For example, a salesman's compensation may be some combination of percentage

of orders and a fixed fee. If both the salesman and his manager agree that the economy

is in a recession, the manager may offer a year-long contract which does not penalize the

salesman for poor sales, but offers above subsistence level fixed fee to motivate loyalty

to the firm on the part of the salesman, and a clause thrown in which transfers a larger

share of output than normal to the agent (i.e. incentives for extra effort in a time of recession).


Now suppose the salesman, as he sets out on his rounds, discovers that the

economy is in an upswing, and that his orders are being filled with little effort on his

part. Then the agent may continue to exert little effort, realize high output, and get a higher share of output in addition to a higher initial fixed fee as his compensation.

In the case of asymmetric information, the problem is formulated as follows:

(PA) $\max_{c(q,y) \in C,\ e(z) \in E} \int U_P(q - c(q,y))\, f(q,y \mid z, e(z))\, p(z)\, dq\, dy\, dz$

such that

$\int U_A(c(q,y))\, f(q,y \mid z, e(z))\, p(z)\, dq\, dy\, dz - \int d(e(z))\, p(z)\, dz \geq U$, (IRC)

$e(z) \in \arg\max_{e' \in E} \int U_A(c(q,y))\, f(q,y \mid z, e')\, dq\, dy - d(e') \quad \forall z$, (ICC)

where $p(z)$ is the marginal density of $z$, and $d(e(z))$ is the disutility of effort $e(z)$.

Let $\lambda$ and $\mu(z)p(z)$ be the Lagrange multipliers for (IRC) and (ICC) in (PA) respectively.

Result 3.7: The extension of result 3.1 on the characterization of optimal

compensation schemes to the problem (PA) is:

$\frac{U_P'(q - c(q,y))}{U_A'(c(q,y))} = \lambda + \frac{\int \mu(z)\, f_e(q,y \mid z, e(z))\, p(z)\, dz}{\int f(q,y \mid z, e(z))\, p(z)\, dz}$.

The interpretation of result 3.7 is similar to that of result 3.1. Analogous to result 3.2, $\mu(z) \neq 0$, and $\mu(z) < 0$ for some $z$ and $\mu(z) > 0$ for other $z$, which implies, as in result 3.2, that result 3.7 characterizes solutions which are second-best.

5.3.4 Model 4: Communication under Asymmetry

This model (Christensen, 1981) attempts an analysis similar to model 3, and

includes communication structures in the agency. The special assumptions are as follows:


(a) There is a set of messages M that the agent uses to communicate with the

principal; compensation is based on the output and the message picked by

the agent; hence, the message is public knowledge.

(b) There is a set of signals about the environment; the agent chooses his effort

level based on the signal he observes; the agent also selects his compensation

scheme at this time by selecting an appropriate message to communicate to

the principal; selection of the message is based on the effort.

(c) Uncertainty is with respect to the signals observed by the agent; the distribution characterizing this uncertainty is public knowledge; the joint density is defined on output and signal conditioned on the effort:

$f(q,\xi \mid e) = f(q \mid \xi, e) \cdot f(\xi)$.

(d) Both parties are Savage(1954)-rational.

(e) The principal's utility of wealth is $U_P$, with weak risk-aversion; in particular, $U_P' > 0$ and $U_P'' \leq 0$.

(f) The agent's utility of wealth is separable into $U_A$ defined on compensation and disutility of effort. The agent has positive marginal utility for money, and he is strictly risk-averse; i.e. $U_A' > 0$, $U_A'' < 0$, and $d' > 0$.


Timing:

(a) The principal and the agent determine the set of compensation schemes, based on the output and the message sent to the principal by the agent; the principal is committed to this set of compensation schemes;

(b) the agent accepts the compensation scheme if it satisfies his reservation welfare;

(c) the agent observes a signal $\xi$;

(d) the agent picks an effort level based on $\xi$;

(e) the agent sends a message m to the principal; this causes a compensation

scheme from the contracted set to be chosen;

(f) output occurs;

(g) sharing of output takes place.

Note that in the timing, (d) and (e) could be interchanged in this model without affecting


The following is the principal's problem:

(P) Find $(c^*(q,m), e^*(\xi,m), m^*(\xi))$ such that $c^* \in C$, $e^* \in E$, and $m^* \in M$ solves:

$\max_{c(q,m),\, e(\xi),\, m(\xi)} E[U_P(q - c(q,m))]$

such that

$E[U_A(c(q,m)) - d(e)] \geq U$, (IRC)

$e(\xi) \in \arg\max_{e' \in E} E[U_A(c(q,m(\xi))) - d(e') \mid \xi]$ (self-selection of action),

$m(\xi) \in \arg\max_{m' \in M} E[U_A(c(q,m')) - d(e(\xi,m')) \mid \xi]$ (self-selection of message),

where $e(\xi,m)$ is the optimal act given that $\xi$ is observed and $m$ is reported.

The following assumptions are used for analyzing the problem in the above formulation:

(a) $U_P(\cdot)$ and $U_A(\cdot) - d(\cdot)$ are concave and twice continuously differentiable in all arguments.

(b) Compensation functions are piecewise continuous and differentiable a.e. ($\xi$).

(c) The density function f is twice differentiable a.e.

(d) Regularity conditions enable differentiation under the integral sign.

(e) Existence of an optimal solution is assumed.


Result 4.1: The following is a characterization of optimal functions:

Up,(q-c "(q,&)) = e .-( l--)) p(tq,E le(E))
= I *
U (c *(q,E)) Aq,E le *(E)) f(q,E le *(E)

where $\lambda$, $\mu(\xi)$, and $\rho(\xi)$ are the Lagrange multipliers for the three constraints in (P), respectively.


5.3.5 Model G: Some General Results

Result G.1 (Wilson, 1968). Suppose that both the principal and the agent are risk averse having linear risk tolerance functions with the same slope, and the disutility of the agent's effort is constant. Then the optimal sharing rule is a non-constant function of the output.

Result G.2. In addition to the assumptions of result G.1, also suppose that the agent's effort has negative marginal utility. Let $c_1(q)$ be a sharing rule (or compensation scheme) which is linear in the output $q$, and let $c_2(q) = k$ be a constant sharing rule. Then, $c_1$ dominates $c_2$.

The two results above deal with conditions when observation of the output is

useful. Suppose Y is a public information system that conveys information about the

output. So, compensation schemes can be based on Y alone. The value of Y, denoted

$W(Y)$ (following model 1), is defined as: $W(Y) = \max_{c \in C} E\, U_P[q - c(y)]$, subject to IRC and ICC. Let $Y'$ denote a non-informative signal. Then, the two results yield a ranking of informativeness: $W(Y) > W(Y')$. When $Q$ is an information system denoting perfect observability of the output $q$, and the timing of the agency relationship is as in model 1 (i.e. payment is made to the agent after observing the output), then $W(Q) > W(Y)$ as well.



The solution to the principal-agent problem is influenced by the way the model

itself is set up in the literature. Highly specialized assumptions, which are necessary in

order to use the optimization technique, contribute a certain amount of bias. As an

analogy, one may note that a linear regression model assumes implicit bias by seeking

solutions only among linear relationships between the variables; a correlation coefficient

of zero therefore implies only that the variables are not linearly correlated, not that they

are not correlated. Examples of such specialized assumptions abound in the literature,

a small but typical sample of which are detailed in the models presented in Chapter 5.
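The regression analogy can be made concrete with a small Python sketch. The data are hypothetical and chosen only to illustrate the point: $y$ is an exact function of $x$, yet the Pearson correlation coefficient, which measures only linear association, is zero. The helper `pearson` is written out by hand so that the example has no dependencies.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: y is completely determined by x, but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

r = pearson(xs, ys)
print(r)
```

A model class restricted to linear relationships would report no association here, even though the dependence is perfect; the bias lies in the class of functions searched, not in the data.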

The consequences of using the optimization methodology are primarily twofold.

Firstly, much of the pertinent information that is available to the principal, the agent and

the researcher must be ignored, since this information deals with variables which are not

easily quantifiable, or which can only be ranked nominally, such as those that deal with

behavioral and motivational characteristics of the agent and the prior beliefs of the agent

and principal (regarding the task at hand, the environment, and other exogenous

variables). Most of this knowledge takes the form of rules that link antecedents to consequents and carry associated certainty factors.


Secondly, a certain amount of bias is introduced into the model by requiring that

the functions involved in the constraints satisfy some properties, such as differentiability,

monotone likelihood ratio, and so on. It must be noted that many of these properties are

reasonable and meaningful from the standpoint of accepted economic theory. However,

standard economic theory itself relies heavily on concepts such as utility and risk

aversion in order to explain the behavior of economic agents. Such assumptions have

been criticized on the grounds that individuals violate them; for example, it is known that

individuals sometimes violate properties of the von Neumann-Morgenstern utility functions.

Decision theory addressing economic problems also uses concepts such as utility, risk,

loss, and regret, and relies on classical statistical inference procedures. However, real-life individuals are rarely consistent in their inference; they often lack statistical sophistication and are unreliable in probability calculations. Several references to support this view are

cited in Chapter 2. If the term "rational man" as used in economic theory means that

individuals act as if they were sophisticated and infallible (in terms of method and not

merely content), then economic analysis might very well yield erroneous solutions.

Consider, as an example, the treatment of compensation schemes in the literature.

They are assumed to be quite simple, either being linear in the output, or involving a

fixed element called the rent (see Chapter 5 for details). In practice, compensation

schemes are fairly comprehensive and involved. They cover as many contingencies as

possible, provide for a variety of payment and reward criteria, specify grievance

procedures, termination, promotion, varieties of fringe benefits, support services, access

to company resources, and so on.

The set of all compensation schemes is in fact a set of knowledge bases consisting

of the following components (B.R. Ellig, 1982):

(1) Compensation policies/strategies of the principal;

(2) Knowledge of the structure of the compensation plans, which means specific rules

concerning short-term incentives linked to partial realization of expected output,

long-term incentives linked to full realization of expected output, bonus plans

linked to realizing more than the expected output, disutilities linked to

underachievement, and rules specifying injunctions to the agent to refrain from

activities that may result in disutilities to the principal (if any).

There are various elements in a compensation scheme, which can be classified as

financial and non-financial:

Financial elements of compensation

1. Base Pay (periodic).

2. Commission or Share of Output.

3. Bonus (annual or on special occasions).

4. Long Term Income (lump sum payments at termination).

5. Benefits (insurance, etc.).

6. Stock Participation.

7. Non-taxable or tax-sheltered values.

Nonfinancial elements of compensation

1. Company Environment.

2. Work Environment.