A KNOWLEDGE-INTENSIVE MACHINE-LEARNING APPROACH TO THE PRINCIPAL-AGENT PROBLEM

By KIRAN K. GARIMELLA

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA 1993

To my mother, Dr. Seeta Garimella

ACKNOWLEDGMENTS

I thank Prof. Gary Koehler, chairman of the DIS department, a guru to me in the deepest sense of the word, who made it possible for me to grow intellectually and experience the richness and fulfillment of an active mind. I also want to thank Prof. Selcuk Erenguc for encouraging me at all times; Prof. Harold Benson, who taught me care, caution, and clarity in thinking by patiently teaching me proof techniques in mathematics; Prof. David E.M. Sappington, for giving me invaluable lessons, by his teaching and example, on research techniques, for writing papers and books that are replete with elegance and clarity, and for ensuring that my research is meaningful and interesting from an economist's perspective; Prof. Sanford V. Berg, for providing valuable suggestions in agency theory; and Prof. Richard Elnicki, Prof. Antal Majthay, and Prof. Ira Horowitz for their advice and help with the research. I thank Prof. Malay Ghosh, Department of Statistics, and Prof. Scott McCullough, Department of Mathematics, for their guidance in statistics and mathematics. I also thank the administrative staff of the DIS department for helping me in numerous ways and making my work extremely pleasant. I thank my wife, Raji, for her patience and understanding while I put in long and erratic hours. I cannot conclude without expressing my deepest sense of gratitude to my mother, Dr. Seeta Garimella, who constantly encouraged me in ways too numerous to recount and made it possible for me to pursue my studies in the land of my dreams.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
ABSTRACT
1 OVERVIEW
2 EXPERT SYSTEMS AND MACHINE LEARNING
  2.1 Introduction
  2.2 Expert Systems
  2.3 Machine Learning
    2.3.1 Introduction
    2.3.2 Definitions and Paradigms
    2.3.3 Probably Approximately Close Learning
3 GENETIC ALGORITHMS
  3.1 Introduction
  3.2 The Michigan Approach
  3.3 The Pitt Approach
4 THE MAXIMUM ENTROPY PRINCIPLE
  4.1 Historical Introduction
  4.2 Examples
5 THE PRINCIPAL-AGENT PROBLEM
  5.1 Introduction
    5.1.1 The Agency Relationship
    5.1.2 The Technology Component of Agency
    5.1.3 The Information Component of Agency
    5.1.4 The Timing Component of Agency
    5.1.5 Limited Observability, Moral Hazard, and Monitoring
    5.1.6 Informational Asymmetry, Adverse Selection, and Screening
    5.1.7 Efficiency of Cooperation and Incentive Compatibility
    5.1.8 Agency Costs
  5.2 Formulation of the Principal-Agent Problem
  5.3 Main Results in the Literature
    5.3.1 Model 1: The Linear-Exponential-Normal Model
    5.3.2 Model 2
    5.3.3 Model 3
    5.3.4 Model 4: Communication under Asymmetry
    5.3.5 Model G: Some General Results
6 METHODOLOGICAL ANALYSIS
7 MOTIVATION THEORY
8 RESEARCH FRAMEWORK
9 MODEL 3
  9.1 Introduction
  9.2 An Implementation and Study
  9.3 Details of Experiments
    9.3.1 Rule Representation
    9.3.2 Inference Method
    9.3.3 Calculation of Satisfaction
    9.3.4 Genetic Learning Details
    9.3.5 Statistics Captured for Analysis
  9.4 Results
  9.5 Analysis of Results
10 REALISTIC AGENCY MODELS
  10.1 Characteristics of Agents
  10.2 Learning with Specialization and Generalization
  10.3 Notation and Conventions
  10.4 Model 4: Discussion of Results
  10.5 Model 5: Discussion of Results
  10.6 Model 6: Discussion of Results
  10.7 Model 7: Discussion of Results
  10.8 Comparison of the Models
  10.9 Examination of Learning
11 CONCLUSION
12 FUTURE RESEARCH
  12.1 Nature of the Agency
  12.2 Behavior and Motivation Theory
  12.3 Machine Learning
  12.4 Maximum Entropy
APPENDIX: FACTOR ANALYSIS
REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

9.1: Characterization of Agents
9.2: Iteration of First Occurrence of Maximum Fitness
9.3: Learning Statistics for Fitness of Final Knowledge Bases
9.4: Entropy of Final Knowledge Bases and Closeness to the Maximum
9.5: Frequency (as Percentage) of Values of Compensation Variables in the Final Knowledge Base in Experiment 1
9.6: Range, Mean and Standard Deviation of Values of Compensation Variables in the Final Knowledge Base in Experiment 1
9.7: Correlation Analysis of Values of Compensation Variables in the Final Knowledge Base in Experiment 1
9.8: Factor Analysis (Principal Components Method) of the Final Knowledge Base of Experiment 1: Eigenvalues of the Correlation Matrix
9.9: Factor Analysis (Principal Components Method) of the Final Knowledge Base of Experiment 1: Factor Pattern
9.10: Experiment 1: Varimax Rotation
9.11: Frequency (as Percentage) of Values of Compensation Variables in the Final Knowledge Base in Experiment 2
9.12: Range, Mean and Standard Deviation of Values of Compensation Variables in the Final Knowledge Base in Experiment 2
9.13: Correlation Analysis of Values of Compensation Variables in the Final Knowledge Base in Experiment 2
9.14: Factor Analysis (Principal Components Method) of the Final Knowledge Base of Experiment 2: Eigenvalues of the Correlation Matrix
9.15: Factor Analysis (Principal Components Method) of the Final Knowledge Base of Experiment 2: Factor Pattern
9.16: Factor Analysis (Principal Components Method) of the Final Knowledge Base of Experiment 2: Varimax Rotated Factor Pattern
9.17: Frequency (as Percentage) of Values of Compensation Variables in the Final Knowledge Base in Experiment 3
9.18: Range, Mean and Standard Deviation of Values of Compensation Variables in the Final Knowledge Base in Experiment 3
9.19: Correlation Analysis of Values of Compensation Variables in the Final Knowledge Base in Experiment 3
9.20: Factor Analysis (Principal Components Method) of the Final Knowledge Base of Experiment 3: Eigenvalues of the Correlation Matrix
9.21: Factor Analysis (Principal Components Method) of the Final Knowledge Base of Experiment 3: Factor Pattern
9.22: Factor Analysis (Principal Components Method) of the Final Knowledge Base of Experiment 3: Varimax Rotated Factor Pattern
9.23: Frequency (as Percentage) of Values of Compensation Variables in the Final Knowledge Base in Experiment 4
9.24: Range, Mean and Standard Deviation of Values of Compensation Variables in the Final Knowledge Base in Experiment 4
9.25: Correlation Analysis of Values of Compensation Variables in the Final Knowledge Base in Experiment 4
9.26: Factor Analysis (Principal Components Method) of the Final Knowledge Base of Experiment 4: Eigenvalues of the Correlation Matrix
9.27: Factor Analysis (Principal Components Method) of the Final Knowledge Base of Experiment 4: Factor Pattern
9.28: Factor Analysis (Principal Components Method) of the Final Knowledge Base of Experiment 4: Varimax Rotated Factor Pattern
9.29: Frequency (as Percentage) of Values of Compensation Variables in the Final Knowledge Base in Experiment 5
9.30: Range, Mean and Standard Deviation of Values of Compensation Variables in the Final Knowledge Base in Experiment 5
9.31: Correlation Analysis of Values of Compensation Variables in the Final Knowledge Base in Experiment 5
9.32: Factor Analysis (Principal Components Method) of the Final Knowledge Base of Experiment 5: Eigenvalues of the Correlation Matrix
9.33: Factor Analysis (Principal Components Method) of the Final Knowledge Base of Experiment 5: Factor Pattern
9.34: Factor Analysis (Principal Components Method) of the Final Knowledge Base of Experiment 5: Varimax Rotated Factor Pattern
9.35: Summary of Factor Analytic Results for the Five Experiments
9.36: Expected Factor Identification of Compensation Variables for the Five Experiments Derived from the Direct Factor Analytic Solution
9.37: Expected Factor Identification of Compensation Variables for the Five Experiments Derived from the Varimax Rotated Factor Analytic Solution
9.38: Expected Factor Identification of Behavioral and Risk Variables for the Five Experiments Derived from the Direct Factor Pattern
9.39: Expected Factor Identification of Behavioral and Risk Variables for the Five Experiments Derived from the Varimax Rotated Factor Analytic Solution
10.1: Correlation of LP and CP with Simulation Statistics (Model 4)
10.2: Correlation of LP and CP with Compensation Offered to Agents (Model 4)
10.3: Correlation of LP and CP with Compensation in the Principal's Final KB (Model 4)
10.4: Correlation of LP and CP with the Movement of Agents (Model 4)
10.5: Correlation of LP with Agent Factors (Model 4)
10.6: Correlation of LP and CP with Agents' Satisfaction (Model 4)
10.7: Correlation of LP and CP with Agents' Satisfaction at Termination (Model 4)
10.8: Correlation of LP and CP with Agency Interactions (Model 4)
10.9: Correlation of LP with Rule Activation (Model 4)
10.10: Correlation of LP with Rule Activation in the Final Iteration (Model 4)
10.11: Correlation of LP and CP with Principal's Satisfaction and Least Squares (Model 4)
10.12: Correlation of Agent Factors with Agent Satisfaction (Model 4)
10.13: Correlation of Principal's Satisfaction with Agent Factors (Model 4)
10.14: Correlation of Principal's Satisfaction with Agents' Satisfaction (Model 4)
10.15: Correlation of Principal's Last Satisfaction with Agents' Last Satisfaction (Model 4)
10.16: Correlation of Principal's Factor with Agent Factors (Model 4)
10.17: Correlation of LP and CP with Simulation Statistics (Model 5)
10.18: Correlation of LP and CP with Compensation Offered to Agents (Model 5)
10.19: Correlation of LP and CP with Compensation in the Principal's Final Knowledge Base (Model 5)
10.20: Correlation of LP and CP with the Movement of Agents (Model 5)
10.21: Correlation of LP with Agent Factors (Model 5)
10.22: Correlation of LP and CP with Agents' Satisfaction (Model 5)
10.23: Correlation of LP and CP with Agents' Satisfaction at Termination (Model 5)
10.24: Correlation of LP and CP with Agency Interactions (Model 5)
10.25: Correlation of LP with Rule Activation (Model 5)
10.26: Correlation of LP with Rule Activation in the Final Iteration (Model 5)
10.27: Correlation of LP and CP with Payoffs from Agents (Model 5)
10.28: Correlation of LP and CP with Principal's Satisfaction, Principal's Factor and Least Squares (Model 5)
10.29: Correlation of Agent Factors with Agent Satisfaction (Model 5)
10.30: Correlation of Principal's Satisfaction with Agent Factors (Model 5)
10.31: Correlation of Principal's Satisfaction with Agents' Satisfaction (Model 5)
10.32: Correlation of Principal's Last Satisfaction with Agents' Last Satisfaction (Model 5)
10.33: Correlation of Principal's Satisfaction with Outcomes from Agents (Model 5)
10.34: Correlation of Principal's Factor with Agents' Factors (Model 5)
10.35: Correlation of LP and CP with Simulation Statistics (Model 6)
10.36: Correlation of LP and CP with Compensation Offered to Agents (Model 6)
10.37: Correlation of LP and CP with Compensation in the Principal's Final Knowledge Base (Model 6)
10.38: Correlation of LP and CP with the Movement of Agents (Model 6)
10.39: Correlation of LP and CP with Agent Factors (Model 6)
10.40: Correlation of LP and CP with Agents' Satisfaction (Model 6)
10.41: Correlation of LP and CP with Agents' Satisfaction at Termination (Model 6)
10.42: Correlation of LP and CP with Agency Interactions (Model 6)
10.43: Correlation of LP and CP with Rule Activation (Model 6)
10.44: Correlation of LP and CP with Rule Activation in the Final Iteration (Model 6)
10.45: Correlation of LP and CP with Principal's Satisfaction and Least Squares (Model 6)
10.46: Correlation of Agents' Factors with Agents' Satisfaction (Model 6)
10.47: Correlation of Principal's Satisfaction with Agents' Factors and Agents' Satisfaction (Model 6)
10.48: Correlation of Principal's Factor with Agents' Factor (Model 6)
10.49: Correlation of LP and CP with Simulation Statistics (Model 7)
10.50: Correlation of LP and CP with Compensation Offered to Agents (Model 7)
10.51: Correlation of LP and CP with Compensation in the Principal's Final Knowledge Base (Model 7)
10.52: Correlation of LP and CP with the Movement of Agents (Model 7)
10.53: Correlation of LP with Agent Factors (Model 7)
10.54: Correlation of LP and CP with Agents' Satisfaction (Model 7)
10.55: Correlation of LP and CP with Agents' Satisfaction at Termination (Model 7)
10.56: Correlation of LP and CP with Agency Interactions (Model 7)
10.57: Correlation of LP and CP with Rule Activation (Model 7)
10.58: Correlation of LP with Rule Activation in the Final Iteration (Model 7)
10.59: Correlation of LP and CP with Payoffs from Agents (Model 7)
10.60: Correlation of LP and CP with Principal's Satisfaction (Model 7)
10.61: Correlation of Agent Factors with Agent Satisfaction (Model 7)
10.62: Correlation of Principal's Satisfaction with Agent Factors (Model 7)
10.63: Correlation of Principal's Satisfaction with Agents' Satisfaction (Model 7)
10.64: Correlation of Principal's Last Satisfaction with Agents' Last Satisfaction (Model 7)
10.65: Correlation of Principal's Satisfaction with Outcomes from Agents (Model 7)
10.66: Correlation of Principal's Factor with Agents' Factor (Model 7)
10.67: Comparison of Models
10.68: Probability Distributions for Models 4, 5, 6, and 7

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

A KNOWLEDGE-INTENSIVE MACHINE-LEARNING APPROACH TO THE PRINCIPAL-AGENT PROBLEM

By Kiran K. Garimella
August 1993
Chairperson: Gary J. Koehler
Major Department: Decision and Information Sciences

The objective of the research is to explore an alternative approach to the solution of the principal-agent problem, which is extremely important since it is applicable in almost all business environments. The problem has traditionally been addressed within an analytical optimization framework. However, there is a clearly recognized need for techniques that allow the incorporation of the behavioral and motivational characteristics of the agent and the principal that influence their selection of effort and payment levels. The alternative proposed is a knowledge-intensive, machine-learning approach, in which all the relevant knowledge and the constraints of the problem are taken into account in the form of knowledge bases. Genetic algorithms are employed for learning, supplemented in later models by specialization and generalization operators. A number of models are studied in order of increasing complexity and realism. Initial studies are presented that provide counterexamples to traditional agency theory and that emphasize the need for going beyond the traditional framework. The new framework is more robust, is easily extensible in a modular manner, and yields contracts tailored to the behavioral characteristics of individual agents. Factor analysis of final knowledge bases after extensive learning shows that elements of compensation besides basic pay and share of output play a greater role in characterizing good contracts. The learning algorithms tailor contracts to the behavioral and motivational characteristics of individual agents. Further, perfect information did not yield the highest satisfaction, nor did the complete absence of information yield the least. This calls into question the traditional agency wisdom that more information is always desirable. Other models study the effect of two different policies by which the principal evaluates agents' performance: individualized (discriminatory) evaluation versus relative (nondiscriminatory) evaluation. The results suggest guidelines for employing different types of models to simulate different agency environments.

CHAPTER 1
OVERVIEW

The basic research addressed by this dissertation is the theory and application of machine learning to assist in the solution of decision problems in business. Much of the earlier research in machine learning was devoted to addressing specific and ad hoc problems, or to filling a gap or making up for some deficiency in an existing framework, usually motivated by developments in expert systems and statistical pattern recognition.
The first applications were to technical problems such as knowledge acquisition, coping with a changing environment and the filtering of noise (where filtering and optimal control were considered inadequate because of poorly understood domains), data or knowledge reduction (where the usual statistical theory is inadequate to express the symbolic richness of the underlying domain), and scene and pattern analysis (where the classical statistical techniques fail to take into account pertinent prior information; see, for example, Jaynes, 1986a). The initial research was concerned with gaining an understanding of learning in extremely simple toy-world models, such as checkers (Samuel, 1963), the SHRDLU blocks world (Winograd, 1972), and various discovery systems. The insights gained by such research soon influenced serious applications. The underlying domains of most of the early applications were relatively well structured, whether they were the stylized rules of checkers and chess or the digitized images of visual sensors. Our research focus is on importing these ideas into the area of business decision-making.

Genetic algorithms, a relatively new paradigm of machine learning, deal with adaptive processes modeled on ideas from natural genetics. Genetic algorithms use the ideas of parallelism, randomized search, fitness criteria for individuals, and the formation of new exploratory solutions through reproduction, survival, and mutation. The concept is extremely elegant, powerful, and easy to work with from the viewpoint of the amount of knowledge necessary to start the search for solutions.

A related issue is maximum entropy. The Maximum Entropy Principle is an extension of Bayesian theory and is founded on two other principles: the Desideratum of Consistency and Maximal Noncommitment. While Bayesian analysis begins by assuming a prior, the Maximum Entropy Principle seeks distributions that maximize the Shannon entropy and at the same time satisfy whatever constraints may apply. The justification for using Shannon entropy comes from the works of Bernoulli, Laplace, Jeffreys, and Cox on the one hand, and from the works of Maxwell, Boltzmann, Gibbs, and Shannon on the other; the principle has been extensively championed by Jaynes and is only just now penetrating into economic analysis. Under the maximum entropy technique, the task of updating priors based on data is now subsumed under the general goal of maximizing the entropy of distributions given any and all applicable constraints, where the data (or sufficient statistics on the data) play the role of constraints. Maximum entropy is related to machine learning by the fact that the initial distributions (or assumptions) used in a learning framework, such as genetic algorithms, may be maximum entropy distributions. A topic of research interest is the development of machine learning algorithms or frameworks that are robust with respect to maximum entropy. In other words, deviation of initial distributions from maximum entropy distributions should not have any significant effect on the learning algorithms (in the sense of departure from good solutions).
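The principle is developed formally in Chapter 4. As a concrete illustration (not taken from the dissertation), the following sketch computes the maximum entropy distribution for a die whose long-run mean is constrained to 4.5, Jaynes's well-known "Brandeis dice" example. The constrained maximizer has the exponential form p_k proportional to exp(lam * k), with lam found here by bisection:

    import math

    def maxent_die(target_mean, faces=6, tol=1e-10):
        """Maximum entropy distribution on faces 1..faces with a fixed mean:
        p_k proportional to exp(lam * k); lam is found by bisection, since
        the constrained mean is strictly increasing in lam."""
        def mean(lam):
            w = [math.exp(lam * k) for k in range(1, faces + 1)]
            return sum(k * wk for k, wk in zip(range(1, faces + 1), w)) / sum(w)
        lo, hi = -50.0, 50.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if mean(mid) < target_mean:
                lo = mid
            else:
                hi = mid
        lam = (lo + hi) / 2
        w = [math.exp(lam * k) for k in range(1, faces + 1)]
        z = sum(w)
        return [wk / z for wk in w]

    p = maxent_die(4.5)
    print([round(pk, 4) for pk in p])
    print("mean =", round(sum(k * pk for k, pk in enumerate(p, 1)), 4))

With no mean constraint the solution collapses to the uniform distribution, which is exactly the Maximal Noncommitment idea: impose nothing beyond what the constraints state.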
The overall goal of the research is to present an integrated methodology involving machine learning with genetic algorithms in knowledge bases, and to illustrate its use by application to an important problem in business. The principal-agent problem was chosen for the following reasons: it is widespread, important, nontrivial, and fairly general, so that different models of the problem can be investigated, and information-theoretic considerations play a crucial role in the problem. Moreover, a fair amount of interest in the problem has been generated among researchers in economics, finance, accounting, and game theory, whose predominant approach to the problem is that of constrained optimization. Several analytical insights have been generated, which should serve as points of comparison for the results expected from our new methodology.

The most important component of the new proposed methodology is information in the form of knowledge bases, coupled with the strength of performance of the individual pieces of knowledge. These knowledge bases, the associated strengths, their relation to one another, and their role in the scheme of things are derived from the individuals' prior knowledge and from the theory of human behavior and motivation. The knowledge bases contain, for example, information about the agent's characteristics and pattern of behavior under different compensation schemes; in other words, they deal with the issues of hidden characteristics and induced effort or behavior. Given the expected behavior pattern of an agent, a related research issue is the study of the effect of using distributions that have maximum entropy with respect to the expected behavior. Trial compensation schemes, which come from the specified knowledge bases, are presented to the agents. Upon acceptance of the contract and realization of the output, the actual performance of the agent (in terms of output or total welfare) is evaluated, and the associated compensation schemes are assigned proportional credit. Periodically, iterations of the genetic algorithm are used to create a new knowledge base that enriches the current one.
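As a rough illustration of this interaction cycle (a sketch under assumptions, not the dissertation's implementation; the rule representation, effort response, technology, and credit function below are hypothetical placeholders):

    import random

    class Rule:
        """Hypothetical compensation-scheme rule with an accumulated strength."""
        def __init__(self, base_pay, share):
            self.base_pay = base_pay   # fixed component of compensation
            self.share = share         # fraction of output paid to the agent
            self.strength = 1.0        # credit accumulated from past use

    def agency_loop(rules, agent_effort, iterations=100, ga_period=10):
        """Offer schemes, observe output, assign proportional credit, and
        periodically enrich the rule base with a genetic step."""
        for t in range(1, iterations + 1):
            rule = random.choices(rules, weights=[r.strength for r in rules])[0]
            effort = agent_effort(rule)             # agent responds to the offer
            output = effort + random.gauss(0, 0.1)  # noisy technology
            welfare = output - (rule.base_pay + rule.share * output)
            rule.strength += max(welfare, 0)        # proportional credit
            if t % ga_period == 0:
                rules = enrich(rules)               # genetic iteration; see Chapter 3
        return rules

    def enrich(rules):
        # Placeholder: a concrete crossover/mutation step is sketched in Chapter 3.
        return rules

    rules = [Rule(0.2, s / 10) for s in range(1, 10)]
    rules = agency_loop(rules, agent_effort=lambda r: r.share)  # effort rises with share
    print(max(r.strength for r in rules))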
Chapter 2 begins with an introduction to artificial intelligence, expert systems, and machine learning. Chapter 3 describes genetic algorithms. Chapter 4 covers the origin of the Maximum Entropy Principle and its formulation. Chapter 5 surveys the principal-agent problem; a few basic models are presented, along with some of the main results in the literature. Chapter 6 examines the traditional methodology used in attacking the principal-agent problem and proposes measures to cover its inadequacies. One of the basic assumptions of the economic theory (the assumption of risk attitudes and utility) is circumvented by dealing directly with the knowledge-based models of the agent and the principal. To this end, Chapter 7 takes a brief look at some of the ideas from behavior and motivation theory. Chapter 8 describes the basic research model. Elements of behavior and motivation theory and knowledge bases are incorporated, a research strategy to study agency problems is proposed, and the periodic use of genetic algorithms to enrich the knowledge bases and to carry out learning is suggested. An overview of the research models, all of which incorporate many features of the basic model, is presented. Chapter 9 describes Model 3 in detail. Chapter 10 introduces Models 4 through 7 and describes each in detail. Chapter 11 provides a summary of the results of Chapters 9 and 10. Directions for future research are covered in Chapter 12.

CHAPTER 2
EXPERT SYSTEMS AND MACHINE LEARNING

2.1 Introduction

The use of artificial intelligence in a computerized world is as revolutionary as the use of computers in a manual world. One can make computers intelligent in the same sense as man is intelligent; the various techniques for doing this compose the body of the subject of artificial intelligence. At the present state of the art, computers are at last being designed to compete with man on his own ground on something like equal terms. To put it another way, computers have traditionally acted as convenient tools in areas where man is known to be deficient or inefficient, namely, doing complicated arithmetic very quickly or making many copies of data (i.e., files, reports, etc.). Learning new things, discovering facts, conjecturing, evaluating and judging complex issues (for example, consulting), using natural languages, analyzing and understanding complex sensory inputs such as sound and light, and planning for future action are mental processes that are peculiar to man (and, to a lesser extent, to some animals). Artificial intelligence is the science of simulating or mimicking these mental processes in a computer.

The benefits are immediately obvious. First, computers already fill some of the gaps in human skills; second, artificial intelligence fills some of the gaps that computers themselves suffer (i.e., human mental processes). While the full simulation of the human brain is a distant dream, limited application of this idea has already produced favorable results. Speech-understanding problems were investigated with the help of the HEARSAY system (Erman et al., 1980, 1981; Hayes-Roth and Lesser, 1977). The faculty of vision relates to pattern recognition and the classification and analysis of scenes; these problems are especially encountered in robotics (Paul, 1981). Speech recognition coupled with natural language understanding, as in the limited system SHRDLU (Winograd, 1973), can find immediate uses in intelligent secretary systems that can help in the data management and correspondence associated with business.

An area that is commercially viable in large business environments involving manufacturing and other physical treatment of objects is robotics. This is a proven area of artificial intelligence application, but it is not yet cost effective for small business. Several robot manufacturers have a good order book position. For a detailed survey see, for example, Engelberger (1980).

An interesting viewpoint on the application of artificial intelligence to industry and business is that presented by decision analysis theory. Decision analysis helps managers to decide between alternative options, to assess risk and uncertainty better than before, and to carry out conflict management when there are conflicts among objectives. Certain operations research techniques are also incorporated, for example, the fair allocation of resources to optimize returns. Decision analysis is treated in Fishburn (1981), Lindley (1971), Keeney (1984), and Keeney and Raiffa (1976). In most applications of expert systems, concepts of decision analysis find expression (Phillips, 1986). Manual application of these techniques is not cost effective, whereas their use in certain expert systems, which go by the generic name of Decision Analysis Expert Systems, leads to quick solutions of what were previously thought to be intractable problems (Conway, 1986). Several systems have been proposed that range from scheduling to strategy planning.
See, for example, Williams (1986).

2.2 Expert Systems

The most fascinating and economically justifiable area of artificial intelligence is the development of expert systems. These are computer systems that are designed to provide expert advice in some area; the kind of information that distinguishes an expert from a nonexpert forms the central idea in any expert system. This is perhaps the only area that provides concrete and conclusive proof of the power of artificial intelligence techniques. Many expert systems are commercially viable and motivate diverse sources of funding for research into artificial intelligence. An expert system incorporates many of the techniques of artificial intelligence, and a positive response to artificial intelligence depends on the reception of expert systems by informed laymen.

To construct an expert system, the knowledge engineer works with an expert in the domain and extracts knowledge of relevant facts, rules, rules of thumb, exceptions to standard theory, and so on. This is a difficult task and is known variously as knowledge acquisition or knowledge mining. Because of the complex nature of the knowledge and the ways humans store knowledge, this is bound to be a bottleneck in the development of the expert system. The extracted knowledge is codified in the form of rules and heuristics. Validation and verification runs are conducted on problems of sufficient complexity to see that the expert system does indeed model the thinking of the expert. In the task of building expert systems, the knowledge engineer is helped by several tools, such as EMYCIN, EXPERT, OPS5, ROSIE, GURU, etc. The net result of the activity of knowledge mining is a knowledge base. An inference system, or engine, acts on this knowledge base to solve problems in the domain of the expert system. An important characteristic of expert systems is the ability to justify and explain their line of reasoning, which creates credibility during their use; to do this, they must have a reasonably sophisticated input/output system. A minimal sketch of this knowledge base plus inference engine arrangement appears at the end of this section.

Some of the typical problems handled by expert systems in the areas of business, industry, and technology are presented in Feigenbaum and McCorduck (1983) and Mitra (1986). Important cases where expert systems are brought in to handle problems are:
1. Capturing, replicating, and distributing expertise.
2. Fusing the knowledge of many experts.
3. Managing complex problems and amplifying expertise.
4. Managing knowledge.
5. Gaining a competitive edge.

As examples of successful expert systems, one can consider MYCIN, designed to diagnose infectious diseases (Shortliffe, 1976); DENDRAL, for the interpretation of molecular spectra (Buchanan and Feigenbaum, 1978); PROSPECTOR, for geological studies (Duda et al., 1979; Hart, 1978); and WHY, for teaching geography (Stevens and Collins, 1977). For a more exhaustive treatment see, for example, Stefik et al. (1982), Barr and Feigenbaum (1981, 1982), Cohen and Feigenbaum (1982), and Barr et al. (1989).
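The promised sketch: a forward-chaining inference engine over a rule base, with a trace of fired rules standing in for the explanation facility. The rule syntax and the domain facts are invented for the example; production systems such as OPS5 are far richer than this.

    # Each rule: (set of antecedent facts, consequent fact). The engine fires
    # rules whose antecedents are all in working memory, adding consequents
    # until no new facts can be derived.
    RULES = [
        ({"fever", "infection"}, "prescribe_antibiotic"),   # invented domain rules
        ({"rash", "fever"}, "infection"),
    ]

    def forward_chain(facts, rules):
        facts = set(facts)
        derivation = []          # fired-rule trace: a crude explanation facility
        changed = True
        while changed:
            changed = False
            for antecedents, consequent in rules:
                if antecedents <= facts and consequent not in facts:
                    facts.add(consequent)
                    derivation.append((antecedents, consequent))
                    changed = True
        return facts, derivation

    facts, why = forward_chain({"rash", "fever"}, RULES)
    print(facts)                 # includes derived 'infection', 'prescribe_antibiotic'
    for ante, cons in why:
        print(cons, "because", sorted(ante))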
2.3 Machine Learning

2.3.1 Introduction

One of the key limitations of computers as envisaged by early researchers is the fact that they must be told in explicit detail how to solve every problem. In other words, they lack the capacity to learn from experience and improve their performance with time. Even in most expert systems today, there is only some weak form of implicit learning, such as learning by being told, rote memorizing, and checking for logical consistency. The task of machine learning research is to make up for this inadequacy by incorporating learning techniques into computers.

The abstract goals of machine learning research are broadly:
1. To construct learning algorithms that enable computers to learn.
2. To construct learning algorithms that enable computers to learn in the same way as humans learn.

In both cases, the functional goals of machine learning research are as follows:
1. To use the learning algorithms in application domains to solve nontrivial problems.
2. To gain a better understanding of how humans learn, and of the details of human cognitive processes.

When the goal is to come up with paradigms that can be used to solve problems, several subsidiary goals can be proposed:
1. To see if the learning algorithms do indeed perform better than humans in similar situations.
2. To see if the learning algorithms come up with solutions that are intuitively meaningful to humans.
3. To see if the learning algorithms come up with solutions that are in some way better or less expensive than some alternative methodology.

It is undeniable that humans possess cognitive skills that are superior not only to those of other animals but also to most learning algorithms in existence today. It is true that some of these algorithms perform better than humans in some limited and highly formalized situations involving carefully modeled problems, just as the simplex method consistently produces solutions superior to those possible by a human being. However, and this is the crucial issue, humans are quick to adopt different strategies and to solve problems that are ill-structured, ill-defined, and not well understood, for which there does not exist any extensive domain theory, and that are characterized by uncertainty, noise, or randomness. Moreover, in many cases it seems more important to humans to find solutions to problems that satisfy some constraints rather than to optimize some "function." At the present state of the art, we do not have a consistent, coherent, and systematic theory of what these constraints are; they are usually understood to be behavioral or motivational in nature.

Recent research has shown that it is also undeniable that humans perform very poorly in the following respects:
* they do not solve problems in probability theory correctly;
* while they are good at deciding the cogency of information, they are poor at judging relevance (see Raiffa, accident witnesses, etc.);
* they lack statistical sophistication;
* they find it difficult to detect contradictions in long chains of reasoning;
* they find it difficult to avoid bias in inference, and in fact may not be able to identify it.
(See, for example, Einhorn, 1982; Kahneman and Tversky, 1982a, 1982b, 1982c, 1982d; Lichtenstein et al., 1982; Nisbett et al., 1982; Tversky and Kahneman, 1982a, 1982b, 1982c, 1982d.)

Tversky and Kahneman (1982a) classify, for example, several misconceptions in probability theory as follows:
* insensitivity to prior probability of outcomes;
* insensitivity to sample size;
* misconceptions of chance;
* insensitivity to predictability;
* the illusion of validity;
* misconceptions of regression.

The above inadequacies on the part of humans pertain to higher cognitive thinking. It goes without saying that humans are poor at manipulating numbers quickly, and are subject to physical fatigue and lack of concentration when involved in mental activity for a long time. Computers are, of course, subject to no such limitations.
It is important to note that these inadequacies usually do not lead to disastrous consequences in most everyday circumstances. However, the complexity of the modern world gives rise to intricate and substantial problems, solutions to which forbid inadequacies of the above type. Machine learning must be viewed as an integrated research area that seeks to understand the learning strategies employed by humans, incorporate them into learning algorithms, remove the cognitive inadequacies faced by humans, investigate the possibility of better learning strategies, and characterize the solutions yielded by such research in terms of proof of correctness, convergence to optimality (where meaningful), robustness, graceful degradation, intelligibility, credibility, and plausibility. Such an integrated view does not see the different goals of machine learning research as separate and clashing; insights in one area have implications for another. For example, insights into how humans learn help spot their strengths and weaknesses, which motivates research into how to incorporate the strengths into algorithms and how to cover up the weaknesses; similarly, discovering solutions from machine learning algorithms that are at first nonintuitive to humans motivates deeper analysis of the domain theory and of the human cognitive processes in order to come up with at least plausible explanations.

2.3.2 Definitions and Paradigms

Any activity that improves performance or skills with time may be defined as learning. This includes motor skills and general problem-solving skills. This is a highly functional definition of learning and may be objected to on the grounds that humans learn even in contexts that do not demand action or performance. However, the functional definition may be justified by noting that performance can be understood as improvement in knowledge and the acquisition of new knowledge or cognitive skills that are potentially usable in some context to improve actions or enable better decisions.

Learning may be characterized by several criteria, and most paradigms fall under more than one category. Some of these criteria are:
1. Involvement of the learner.
2. Sources of knowledge.
3. Presence and role of a teacher.
4. Access to an oracle (learning from internally generated examples).
5. Learning "richness."
6. Activation of learning: (a) systematic; (b) continuous; (c) periodic or random; (d) background; (e) explicit or external (also known as intentional); (f) implicit (also known as incidental); (g) call on success; and (h) call on failure.

When classified by the criterion of the learner's involvement, the standard is the degree of activity or passivity of the learner. The following paradigms of learning are classified by this criterion, in increasing order of learner control:
1. Learning by being told (the learner only needs to memorize by rote);
2. Learning by instruction (the learner needs to abstract, induce, or integrate to some extent, and then store the result);
3. Learning by examples (the learner needs to induce, to a great extent, the correct concept, examples of which are supplied by the instructor);
4. Learning by analogy (the learner needs to abstract and induce to a greater degree in order to learn or solve a problem by drawing the analogy; this implies that the learner already has a store of cases against which he can compare the analogy, and that he knows how to abstract and induce knowledge);
5. Learning by observation and discovery (here the role of the learner is greatest; the learner needs to focus on only the relevant observations, use principles of logic and evidence, apply some value judgments, and discover new knowledge by using either induction or deduction).

The above learning paradigms may also be classified on the basis of richness of knowledge. Under this criterion, the focus is on the richness of the resulting knowledge, which may be independent of the involvement of the learner. The spectrum of learning runs from "raw data" to simple functions, complicated functions, simple rules, complex knowledge bases, semantic nets, scripts, and so on.

One fundamental distinction can be made from observation of human learning. The most widespread form of human learning is incidental learning, in which the learning process is incidental to some other cognitive process. Perception of the world, for example, leads to the formation of concepts, the classification of objects into classes or primitives, the discovery of the abstract concepts of number, similarity, and so on (see, for example, Rand, 1967). These activities are not indulged in deliberately. As opposed to incidental learning, we have intentional learning, where there is a deliberate and explicit effort to learn. The study of human learning processes from the standpoint of implicit or explicit cognition is the main subject of research in psychological learning. (See, for example, Anderson, 1980; Craik and Tulving, 1975; Glass and Holyoak, 1986; Hasher and Zacks, 1979; Hebb, 1961; Mandler, 1967; Reber, 1967; Reber, 1976; Reber and Allen, 1978; Reber et al., 1980.)

A useful paradigm for the area of expert systems might be learning through failure. The explanation facility ensures that the expert system knows why it is correct when it is correct, but it needs to know why it is wrong when it is wrong if it is to improve performance with time. Failure analysis helps in focusing on deficient areas of knowledge.

Research in machine learning raises several wider epistemological issues, such as hierarchy of knowledge, contextuality, integration, conditionality, abstraction, and reduction. The issue of hierarchy arises in the induction of decision trees (see, for example,
In discussing realworld examples of learning, it is difficult or meaningless to look for one single paradigm or knowledge representation scheme as far as learning is concerned. Similarly, there could be multiple teachers: humans, oracles, and an accumulated knowledge that acts as an internal generator of examples. In analyzing learning paradigms, it is useful to look at least three aspects, since they each have a role in making the others possible: 1. Knowledge representation scheme. 2. Knowledge acquisition scheme. 3. Learning scheme. At the present time, we do not yet have a comprehensive classification of learning paradigms and their systematic integration into a theory. One of the first attempts in this direction was taken by Michalski, Carbonell, and Mitchell (1983). An extremely interesting area of research in machine learning that will have far reaching consequences for such a theory of learning is multistrategy systems, which try to combine one or more paradigms or types of learning based on domain problem characteristics or to try a different paradigm when one fails. See for example Kodratoff and Michalski (1990). One may call this type of research metalearning research, because the focus is not simply on rules and heuristics for learning, but on rules and heuristics for learning paradigms. Here are some simple learning heuristics, for example: LH1: Given several "isa" relationships, find out about relations between the properties. (For example, the observation that "Socrates is a man" motivates us to find out why Socrates should indeed be classified as a man, i.e., to discover that the common properties are "rational animal" and several physical properties.) LH2: When an instance causes an existing heuristic with certainty to be revised downwards, ask for causes. LH3: When an instance that was thought to belong to a concept or class but later turns out not to belong to it, find out what it does belong to. LH4: If X isa Yl and X isa Y2, then find the relationship between Yl and Y2, and check for consistency. (This arises in learning by using semantic nets). LH5: Given an implication, find out if it is also an equivalence. LH6: Find out if any two or more properties are semantically the same, the opposite, or unrelated. LH7: If an object possesses two or more properties simultaneously from the same class or similar classes, check for contradictions, or rearrange classes hierarchically. LH8: An isatree in a semantic net creates an isatree with the object as a parent; find out in which isatree the parent object occurs as a child. We can contrast these with metarules or metaheuristics. A metarule is also a rule which says something about another rule. It is understood that metarules are watch dog rules that supervise the firing of other rules. Each learning paradigm has a set of rules that will lead to learning under that paradigm. We can have a set of metarules for learning if we have a learning system that has access to several paradigms of learning and if we are concerned with what paradigm to select at any given time. Learning meta rules help the learner to pick a particular paradigm because the learner has knowledge of the applicability of particular paradigms given the nature and state of a domain or given the underlying knowledgebase representation schema. The following are examples of metarules in learning: ML1: If several instances of a domainevent occur, then use generalization techniques. 
ML2: If an event or class of events occurs a number of times with little or no change on each occurrence, then use induction techniques.
ML3: If a problem description similar to the problem on hand exists in a different domain or situation, and that problem has a known solution, then use learning-by-analogy techniques.
ML4: If several facts are known about a domain, including axioms and production rules, then use deductive learning techniques.
ML5: If undefined or unknown variables are present and no other learning rule was successful, then use the learning-from-instruction paradigm.

In all cases of learning, metarules dictate learning strategies, whether explicitly, as in a multistrategy system, or implicitly, as when the researcher or user selects a paradigm. Just as in expert systems, the learning strategy may be either goal directed or knowledge directed. Goal-directed learning proceeds as follows:
1. Metarules select the learning paradigm(s).
2. The learner imposes the learning paradigm on the knowledge base.
3. The structure of the knowledge base and the characteristics of the paradigm determine the representation scheme.
4. The learning algorithm(s) of the paradigm(s) execute(s).

Knowledge-directed learning, on the other hand, proceeds as follows:
1. The learner examines the available knowledge base.
2. The structure of the knowledge base limits the extent and type of learning, which is determined by the metarules.
3. The learner chooses an appropriate representation scheme.
4. The learning algorithm(s) of the chosen learning paradigm(s) execute(s).

2.3.3 Probably Approximately Close Learning

Early research on inductive inference dealt with supervised learning from examples (see, for example, Michalski, 1983; Michalski, Carbonell, and Mitchell, 1983). The goal was to learn the correct concept by looking at both positive and negative examples of the concept in question. These examples were provided in one of two ways: either the learner obtained them by observation, or they were provided to the learner by some external instructor. In both cases, the class to which each example belonged was conveyed to the learner by the instructor (supervisor, or oracle). The examples provided to the learner were drawn from a population of examples or instances. This is the framework underlying early research in inductive inference (see, for example, Quinlan, 1979; Quinlan, 1986; Angluin and Smith, 1983).

Probably Approximately Close Identification (PACID for short) is a powerful machine-learning methodology that seeks inductive solutions in a supervised, nonincremental learning environment. It may be viewed as a multiple-criteria learning problem in which there are at least three major objectives: (1) to derive (or induce) the correct solution, concept, or rule, which is as close as we please to the optimal (which is unknown); (2) to achieve as high a degree of confidence as we please that the solution so derived is in fact as close to the optimal as we intended; (3) to ensure that the "cost" of achieving the above two objectives is "reasonable." PACID therefore replaces the original research direction in inductive machine learning (seeking the true solution) with the more practical goal of seeking solutions close to the true one in polynomial time. The technique has been applied to certain classes of concepts, such as conjunctive normal forms (CNF).
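To make the error and confidence criteria concrete, the standard sample-size bound for a finite hypothesis space (a textbook PAC result, not specific to this dissertation) can be computed directly; the concept class and numbers below are illustrative, using Boolean conjunctions rather than general CNF to keep the counting simple:

    import math

    def pac_sample_size(ln_hypothesis_count, epsilon, delta):
        """Standard PAC bound for a finite hypothesis space H:
        m >= (1/epsilon) * (ln|H| + ln(1/delta)) labeled examples suffice
        for a consistent learner to achieve error below epsilon with
        probability at least 1 - delta, for any example distribution."""
        return math.ceil((ln_hypothesis_count + math.log(1 / delta)) / epsilon)

    # Conjunctions over n Boolean attributes: each attribute appears
    # positively, negatively, or not at all, so |H| = 3**n, ln|H| = n ln 3.
    n = 20
    m = pac_sample_size(n * math.log(3), epsilon=0.05, delta=0.01)
    print(m)   # sample size linear in n: polynomial, as the text notes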
Estimates of the necessary distribution-independent sample sizes are derived based on the error and confidence criteria; the sample sizes are found to be polynomial in some factor such as the number of attributes. Applications to science and engineering have been demonstrated. The pioneering work on PACID was by Valiant (1984, 1985), who proposed the idea of finding approximate solutions in polynomial time. The ideas of characterizing the notion of approximation by using the concept of the functional complexity of the underlying hypothesis spaces, introducing confidence in the closeness to optimality, and obtaining results that are independent of the underlying probability distribution with which the supervisory examples are generated (by nature or by the supervisor) compose the direction of the latest research (see, for example, Haussler, 1988; Haussler, 1990a; Haussler, 1990b; Angluin, 1987; Angluin, 1988; Angluin and Laird, 1988; Blumer, Ehrenfeucht, Haussler, and Warmuth, 1989; Pitt and Valiant, 1988; Rivest, 1987). The theoretical foundations for the mathematical ideas of learning convergence with high confidence are mainly derived from ideas in statistics, probability, statistical decision theory, and fractal theory (see, for example, Vapnik, 1982; Vapnik and Chervonenkis, 1971; Dudley, 1978; Dudley, 1984; Dudley, 1987; Kolmogorov and Tihomirov, 1961; Kullback, 1959; Mandelbrot, 1982; Pollard, 1984; Weiss and Kulikowski, 1991).

CHAPTER 3
GENETIC ALGORITHMS

3.1 Introduction

Genetic classification algorithms are learning algorithms modeled on the lines of natural genetics (Holland, 1975). Specifically, they use operators such as reproduction, crossover, and mutation, together with fitness functions. Genetic algorithms make use of the inherent parallelism of chromosome populations and search for better solutions through the randomized exchange of chromosome material and through mutation. The goal is to improve the gene pool, with respect to the fitness criterion, from generation to generation.

In order to use the idea of genetic algorithms, problems must be appropriately modeled. The parameters or attributes that constitute an individual of the population must be specified and then coded. The simulation begins with the random generation of an initial population of chromosomes, and the fitness of each is calculated. Depending on the problem and the type of convergence desired, the population size may be kept constant or allowed to vary across iterations of the simulation. Using the population of an iteration, individuals are selected randomly, according to their fitness level, to survive intact or to mate with other similarly selected individuals. For mating members, a crossover point is randomly determined (an individual with n attributes has n-1 crossover points), and the individuals exchange their "strings," thus forming new individuals.
Portions of the chromosome, called genes or features, act as determinants of qualities of the individual. Since in mating, the crossover point is chosen randomly, those genes that are shorter in length are more likely to survive a crossover and thus be carried from generation to generation. This has important implications for modeling a problem and will be mentioned in the chapter on research directions. The power of genetic algorithms (henceforth, GAs) derives from the following features: 1. It is only necessary to know enough about the problem to identify the essential attributes of the solution (or "individual"); the researcher can work in comparative ignorance of the actual combinations of attribute values that may denote qualities of the individual. 2. Excessive knowledge cannot harm the algorithm; the simulation may be started with any extra knowledge the researcher may have about the problem, 25 such as his beliefs about which combinations play an important role. In such cases, the simulation may start with the researcher's population and not a random population; if it turns out that the whole or some part of this knowledge is incorrect or irrelevant, then the corresponding individuals get low fitness values and hence have a high probability of eventually disappearing from the population. 3. The remarks in point 2 above apply in the case of mutation also. If mutation gives rise to a useless feature, that individual gets a low fitness value and hence has a low probability of remaining in the population for a long time. 4. Since GAs use many individuals, the probability of getting stuck at local optima is minimized. According to Holland (1975), there are essentially four ways in which genetic algorithms differ from optimization techniques: 1. GAs manipulate codings of attributes directly. 2. They conduct search from a population and not from a single point. 3. It is not necessary to know or assume extra simplifications in order to conduct the search; GAs conduct the search "blindly." It must be noted however, that randomized search does not imply directionless search. 4. The search is conducted using stochastic operators (random selection according to fitness) and not by using deterministic rules. 26 There are two important models for GAs in learning. One is the Pitt approach, and the other is the Michigan approach. The approaches differ in the way they define individuals and the goals of the search process. 3.2 The Michigan Approach The knowledge base of the researcher or the user constitutes the genetic population, in which each rule is an individual. The antecedents and consequents of each rule form the chromosome. Each rule denotes a classifier or detector of a particular signal from the environment. Upon receipt of a signal, one or more rules fire, depending on the signal satisfying the antecedent clauses. Depending on the success of the action taken or the consequent value realized, those rules that contributed to the success are rewarded, and those rules that supported a different consequent value or action are punished. This process of assigning reward or punishment is called credit assignment. Eventually, rules that are correct classifiers get high reward values, and their proposed action when fired carries more weight in the overall decision of selecting an action. The credit assignment problem is the problem of how to allocate credit (reward or punishment). One approach is the bucketbrigade algorithm (Holland, 1986). 
The Michigan approach may be combined with the usual genetic operators to investigate other rules that may not have been considered by the researcher.

3.3 The Pitt Approach

The Pitt approach, due to De Jong (see, for example, De Jong, 1988), considers the whole knowledge base as one individual. The simulation starts with a collection of knowledge bases. The crossover operation works by randomly dichotomizing two parent knowledge bases (selected at random) and mixing the dichotomized portions across the parents to obtain two new knowledge bases. The Pitt approach may be used when the researcher has available a panel of experts or professionals, each of whom provides one knowledge base for some decision problem at hand. The crossover operator therefore enables one to consider combinations of the knowledge of the individuals, a process that resembles a brainstorming session. This is similar to a group decision-making approach. The final knowledge base or bases that perform well empirically would then constitute a collection of rules obtained from the best rules of the original expertise, along with some additional rules that the expert panel did not consider before. The Michigan approach will be used in this research to simulate learning on one knowledge base.

CHAPTER 4
THE MAXIMUM ENTROPY PRINCIPLE

4.1 Historical Introduction

The principle of maximum entropy was championed by E.T. Jaynes in the 1950s and has gained many adherents since. There are a number of excellent papers by E.T. Jaynes explaining the rationale and philosophy of the maximum entropy principle. The discussion of the principle here essentially follows Jaynes (1982, 1983, 1986a, 1986b, and 1991).

The maximum entropy principle may be viewed as "a natural extension and unification of two separate lines of development ... The first line is identified with the names Bernoulli, Laplace, Jeffreys, Cox; the second with Maxwell, Boltzmann, Gibbs, Shannon." (Jaynes, 1983).

The question of approaching any decision problem with some form of prior information is historically known as the Principle of Insufficient Reason (so named by James Bernoulli in 1713). Jaynes (1983) suggests the name Desideratum of Consistency, which may be formally stated as follows:
(1) a probability assignment is a way of describing a certain state of knowledge; i.e., probability is an epistemological concept, not a metaphysical one;
(2) when the available evidence does not favor any one alternative among others, then the state of knowledge is described correctly by assigning equal probabilities to all the alternatives;
(3) suppose A is an event or occurrence for which some favorable cases out of some set of possible cases exist. Suppose also that all the cases are equally likely. Then the probability that A will occur is the ratio of the number of cases favorable to A to the total number of equally possible cases. This idea is formally expressed as

$$\Pr[A] = \frac{M}{N} = \frac{\text{number of cases favorable to } A}{\text{number of equally possible cases}}.$$

In cases where $\Pr[A]$ is difficult to estimate directly (such as when the number of cases is infinite or impossible to find out), Bernoulli's weak law of large numbers may be applied: the observed frequency

$$f(m,n) = \frac{m}{n} = \frac{\text{number of times } A \text{ occurs}}{\text{number of trials}}$$

serves as an estimate of $\Pr[A] = M/N$. Limit theorems in statistics show that, given $(M,N)$ as the true state of nature, the observed frequency $f(m,n) = m/n$ approaches $\Pr[A] = P(M,N) = M/N$ as the number of trials increases.
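A short simulation makes this convergence concrete. The sketch below is illustrative only; the true ratio M/N = 0.3 and the trial counts are arbitrary choices.

    import random

    M_OVER_N = 0.3                      # assumed true probability of A
    for n in (10, 100, 10_000, 1_000_000):
        m = sum(random.random() < M_OVER_N for _ in range(n))
        print(f"n = {n:>9,}: f(m, n) = {m / n:.4f}")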
The reverse problem consists of estimating $P(M,N)$ by $f(m,n)$. For example, the probability of seeing m successes in n trials, when each trial is independent with probability of success p, is given by the binomial distribution:

$$P(m \mid n, p) = \binom{n}{m} p^m (1-p)^{n-m}.$$

The inverse problem would then consist of finding $\Pr[M]$ given $(m, N, n)$. This problem was given a solution by Bayes in 1763 as follows: given $(m,n)$,

$$\Pr[\,p < M/N \le p + dp \mid m, n\,] = \frac{(n+1)!}{m!\,(n-m)!}\, p^m (1-p)^{n-m}\, dp,$$

which is the Beta distribution. These ideas were generalized and put into the form they have today, known as Bayes' theorem, by Laplace, as follows: when there is an event E with possible causes $C_1, C_2, \ldots$, and given prior information I and the observation E, the probability that a particular cause $C_i$ caused the event E is given by

$$P(C_i \mid E, I) = \frac{P(E \mid C_i, I)\, P(C_i \mid I)}{\sum_j P(E \mid C_j, I)\, P(C_j \mid I)},$$

a result which has been called "learning by experience" (Jaynes, 1978).

The contributions of Laplace were rediscovered around 1939 by Jeffreys and in 1946 by Cox who, for the first time, set out to study the "possibility of constructing a consistent set of mathematical rules for carrying out plausible, rather than deductive, reasoning." (Jaynes, 1983). According to Cox, the fundamental result of mathematical inference may be described as follows. Suppose A, B, and C represent propositions, AB the proposition "both A and B are true," and $\neg A$ the negation of A. Then the consistent rules of combination are:

$$P(AB \mid C) = P(A \mid BC)\, P(B \mid C), \qquad P(A \mid B) + P(\neg A \mid B) = 1.$$

Thus, "Cox proved that any method of inference in which we represent degrees of plausibility by real numbers, is necessarily either equivalent to Laplace's, or inconsistent." (Jaynes, 1983).

The second line of development starts with James Clerk Maxwell in the 1850s who, in trying to find the probability distribution for the velocity direction of spherical molecules after impact, realized that knowledge of the meaning of the physical parameters of any system constituted extremely relevant prior information.

The development of the concept of entropy maximization started with Boltzmann, who investigated the distribution of molecules in a conservative force field in a closed system. Given that there are N molecules in the closed system, the total energy E remains constant irrespective of the distribution of the molecules inside the system. All positions and velocities are not equally likely. The problem is to find the most probable distribution of the molecules. Boltzmann partitioned the phase space of position and momentum into a discrete number of cells $R_k$, $1 \le k \le s$. These cells were assumed to be such that the kth cell is a region small enough that the energy of a molecule moving inside it does not change significantly, but large enough that a large number $N_k$ of molecules can be accommodated in it. Boltzmann's problem then reduces to the problem of finding the best prediction of $N_k$ for any given k in 1, ..., s. The numbers $N_k$ are called the occupation numbers. The number of ways a given set of occupation numbers can be realized is given by the multinomial coefficient

$$W(N_1, \ldots, N_s) = \frac{N!}{N_1!\, N_2! \cdots N_s!}. \tag{1}$$

The constraints are

$$E = \sum_{k=1}^{s} N_k E_k \qquad \text{and} \qquad N = \sum_{k=1}^{s} N_k.$$

Since each set $\{N_k\}$ of occupation numbers represents a possible distribution, the problem is equivalently expressed as finding the most probable set of occupation numbers from the many possible sets.
Using Stirling's approximation of factorials, $n! \approx \sqrt{2\pi n}\,(n/e)^n$, in equation (1) yields

$$\log W \approx -N \sum_{k=1}^{s} \frac{N_k}{N} \log \frac{N_k}{N}. \tag{2}$$

The right-hand side of (2) is, up to the factor N, the familiar Shannon entropy formula for the distribution specified by probabilities approximated by the frequencies $N_k/N$, $k = 1, \ldots, s$. In fact, in the limit as N goes to infinity,

$$\lim_{N \to \infty} \frac{1}{N} \log W = -\sum_{k=1}^{s} \frac{N_k}{N} \log \frac{N_k}{N} = H.$$

Distributions of higher entropy therefore have higher multiplicity; in other words, Nature is likely to realize them in more ways. If $W_1$ and $W_2$ are two distributions with corresponding entropies $H_1$ and $H_2$, then the ratio $W_2/W_1$ is the relative preference of $W_2$ over $W_1$. Since $W_2/W_1 \approx \exp[N(H_2 - H_1)]$, when N becomes large (such as the Avogadro number), the relative preference "becomes so overwhelming that exceptions to it are never seen; and we call it the Second Law of Thermodynamics." (Jaynes, 1982).

The problem may now be expressed in terms of constrained optimization as follows:

$$\max_{\{N_k\}} \; \log W \approx -N \sum_{k=1}^{s} \frac{N_k}{N} \log \frac{N_k}{N} \qquad \text{subject to} \qquad \sum_{k=1}^{s} N_k E_k = E \quad \text{and} \quad \sum_{k=1}^{s} N_k = N.$$

The solution yields surprisingly rich results which would not be attainable even if the individual trajectories of all the molecules in the closed system were calculated. The efficiency of the method reveals that such voluminous calculations would in fact have canceled each other out, and were actually irrelevant to the problem. A similar idea is seen in the chapter on genetic algorithms, where ignorance can seemingly be exploited and irrelevant information, even if assumed, would be eliminated from the solution. The technique has been used in artificial intelligence (see, for example, [Lippman, 1988; Jaynes, 1991; Kane, 1991]) and in solving problems in business and economics (see, for example, [Jaynes, 1991; Grandy, 1991; Zellner, 1991]).

4.2 Examples

We will see how the principle is used in solving problems involving some type of prior information, which enters as a constraint on the problem. For simplicity, we deal with problems involving one random variable $\theta$ taking n values $\theta_1, \ldots, \theta_n$, and call the associated probabilities $p_i$. For all the problems, the goal is to choose, from among the many possible probability distributions, the one which has maximum entropy.

No prior information whatsoever. The problem may be formulated using the Lagrange multiplier $\lambda$ for the single constraint as:

$$\max_{\{p_i\}} g(\{p_i\}) = -\sum_{i=1}^{n} p_i \ln p_i + \lambda \left( \sum_{i=1}^{n} p_i - 1 \right).$$

The solution is obtained as follows:

$$\frac{\partial g}{\partial p_i} = -\ln p_i - 1 + \lambda = 0 \;\Rightarrow\; p_i = e^{\lambda - 1} \quad \forall\, i = 1, \ldots, n,$$

and the constraint $\sum_{i=1}^{n} p_i = 1$ then gives $n e^{\lambda - 1} = 1$. Hence $p_i = 1/n$, $i = 1, \ldots, n$, is the MaxEnt assignment, which confirms the intuition on the noninformative prior.

Suppose the expected value of $\theta$ is known to be $\mu_0$. We have two constraints in this problem: the first is the usual constraint that the probabilities sum to one; the second is the given information that the expected value of $\theta$ is $\mu_0$. We use the Lagrange multipliers $\lambda_1$ and $\lambda_2$ for the two constraints respectively. The problem statement follows:

$$\max_{\{p_i\}} g(\{p_i\}) = -\sum_{i=1}^{n} p_i \ln p_i + \lambda_1 \left( \sum_{i=1}^{n} p_i - 1 \right) + \lambda_2 \left( \sum_{i=1}^{n} \theta_i p_i - \mu_0 \right).$$

This can be solved in the usual way by taking the partial derivatives of g with respect to $p_i$, $\lambda_1$, and $\lambda_2$ and equating them to zero. We obtain

$$p_i = e^{\lambda_1 - 1} e^{\lambda_2 \theta_i}, \qquad \sum_{i=1}^{n} \theta_i e^{\lambda_2 \theta_i} = \mu_0 \sum_{i=1}^{n} e^{\lambda_2 \theta_i}.$$

Writing $x = e^{\lambda_2}$, we get

$$\sum_{i=1}^{n} (\theta_i - \mu_0)\, x^{\theta_i} = 0,$$

which is a polynomial in x whose roots can be determined numerically. For example, let n = 3, $\theta$ take values {1, 2, 3}, and $\mu_0 = 1.25$. Solving as above and taking the appropriate root, we obtain $\lambda_1 \approx 2.2752509$ and $\lambda_2 \approx -1.5132312$, giving $p_1 \approx 0.7882$, $p_2 \approx 0.1736$, and $p_3 \approx 0.0382$.
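The worked example above can be checked numerically. The sketch below (using NumPy, with the quadratic reduction worked out by hand) recovers the same probabilities.

    import numpy as np

    theta = np.array([1.0, 2.0, 3.0])
    mu0 = 1.25

    # The stationarity condition gives p_i proportional to x**theta_i, where x
    # solves sum_i (theta_i - mu0) * x**theta_i = 0; dividing out one factor
    # of x leaves the quadratic 7x**2 + 3x - 1 = 0.
    x = (-3 + np.sqrt(37)) / 14        # the appropriate (positive) root
    p = x ** theta
    p /= p.sum()

    print(p)                # approximately [0.7882, 0.1736, 0.0382]
    print(p @ theta)        # 1.25, the imposed expected value
    print(np.log(x))        # lambda_2, approximately -1.5132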
Partial knowledge of probabilities. Suppose we know $p_i$, $i = 1, \ldots, k$. Since we have $n-1$ degrees of freedom in choosing the $p_i$, assume $k \le n-2$ to make the example nontrivial. Then the problem may be formulated as:

$$\max_{\{p_i\}} g(\{p_i\}) = -\sum_{i=k+1}^{n} p_i \ln p_i + \lambda \left( \sum_{i=k+1}^{n} p_i + q - 1 \right), \qquad \text{where } q = \sum_{i=1}^{k} p_i.$$

Solving, we obtain

$$p_i = \frac{1-q}{n-k}, \qquad i = k+1, \ldots, n.$$

This is again fairly intuitive: the remaining probability $1-q$ is distributed noninformatively over the rest of the probability space. For example, if n = 4, $p_1 = 0.5$, and $p_2 = 0.3$, then k = 2, q = 0.8, and $p_3 = p_4 = (1 - 0.8)/(4 - 2) = 0.2/2 = 0.1$. Note that the first case is a special case of the last one, with q = k = 0.

The technique can be extended to cover prior knowledge expressed in the form of probabilistic knowledge bases by using two key MaxEnt solutions: noninformativeness (as covered in the last example above), and statistical independence of two random variables given no knowledge to the contrary (in other words, given two probability distributions f and g over two random variables X and Y respectively, and no further information, the MaxEnt joint probability distribution h over $X \times Y$ is the product $h(x,y) = f(x)\,g(y)$).

CHAPTER 5
THE PRINCIPAL-AGENT PROBLEM

5.1 Introduction

5.1.1 The Agency Relationship

The principal-agent problem arises in the context of the agency relationship in social interaction. The agency relationship occurs when one party, the agent, contracts to act as a representative of another party, the principal, in a particular domain of decision problems. The principal-agent problem is a special case of a dynamic two-person game. The principal has available to her a set of possible compensation schemes, out of which she must select one that both motivates the agent and maximizes her welfare. The agent must also choose a compensation scheme which maximizes his welfare, and he does so by accepting or rejecting the compensation schemes presented to him by the principal. Each compensation package he considers implicitly influences him to choose a particular (possibly complex) action or level of effort. Every action has associated with it certain disutilities to the agent, in that he must expend a certain amount of effort and/or expense. It is reasonable to assume that the agent will reject outright any compensation package which yields less than what can be obtained elsewhere in the market. This assumption is in turn based on the assumptions that the agent is knowledgeable about his "reservation constraint," and that he is free to act in a rational manner. The assumption of rationality also applies to the principal. After agreeing to a contract, the agent proceeds to act on behalf of the principal, which in due course yields a certain outcome. The outcome depends not only on the agent's actions but also on exogenous factors. Finally, the outcome, when expressed in monetary terms, is shared between the principal and the agent in the manner decided upon by the selected compensation plan.

The specific ways in which the agency relationship differs from the usual employer-employee relationship are (Simon, 1951):
(1) The agent does not recognize the authority of the principal over the specific tasks the agent must do to realize the output.
(2) The agent does not inform the principal about his "area of acceptance" of desirable work behavior.
(3) The work behavior of the agent is not directly (or costlessly) observable by the principal.
Some of the first contributions to the analysis of principal-agent problems can be found in Simon (1951), Alchian & Demsetz (1972), Ross (1973), Stiglitz (1974), Jensen & Meckling (1976), Shavell (1979a, 1979b), Holmstrom (1979, 1982), Grossman & Hart (1983), Rees (1985), Pratt & Zeckhauser (1985), and Arrow (1986).

There are three critical components in the principal-agent model: the technology, the informational assumptions, and the timing. Each of these three components is described below.

5.1.2 The Technology Component of Agency

The technology component deals with the type and number of variables involved (for example, production variables, technology parameters, factor prices, etc.), the type and nature of the functions defined on these variables (for example, the type of utility functions, the presence of uncertainty and hence the existence of probability distribution functions, continuity, differentiability, boundedness, etc.), the objective function and the type of optimization (maximization or minimization), the decision criteria on which optimization is carried out (expected utility, weighted welfare measures, etc.), the nature of the constraints, and so on.

5.1.3 The Information Component of Agency

The information component deals with the private information sources of the principal and the agent, and with information which is public (i.e., known to both parties and costlessly verifiable by a third party, such as a court). This component of the model addresses the question, "who knows what?". The role of the informational assumptions in agency is as follows:
(a) they determine how the parties act and make decisions (such as offer payment schemes or choose effort levels);
(b) they make it possible to identify or design communication structures;
(c) they determine what additional information is necessary or desirable for improved decision making; and
(d) they enable the computation of the cost of maintaining or establishing communication structures, or the cost of obtaining additional information.

For example, one usual assumption in the principal-agent literature is that the agent's reservation level is known to both parties. As another example of the way in which additional information affects the decisions of the principal, note that the principal, in choosing a set of compensation schemes to present to the agent, wishes to maximize her welfare. It is in her interest, therefore, to make the agent accept a payment scheme which induces him to choose an effort level that will yield a desired level of output (taking into consideration exogenous risk). The principal would be greatly assisted in her decision making if she had knowledge of the "function" which induces the agent to choose an effort level based on the compensation scheme, and also knowledge of the hidden characteristics of the agent, such as his utility of income, disutility of effort, risk attitude, reservation constraint, etc. Similarly, the agent would be able to make better decisions if he were more aware of his risk attitude, his disutility of effort, and exogenous factors. Any information, even if imperfect, would reduce the magnitude or the variance of risk, or both. However, better information for the agent does not always imply that the agent will choose an act or effort level that is also optimal for the principal; in some cases, the total welfare of the agency may be reduced as a result (Christensen, 1981). The gap in information may be reduced by employing a system of messages from the agent to the principal.
This system of messages may be termed a "communication structure" (Christensen, 1981). The agent chooses his action by observing a signal from his private information system after he accepts a particular compensation scheme from the principal, subject to its satisfying the reservation constraint. This signal is caused by the combination of the compensation scheme, an estimate of exogenous risk by the agent based on his prior information or experience, and the agent's knowledge of his risk attitude and disutility of action. The communication structure agreed upon by both the principal and the agent allows the agent to send a message to the principal. It is to be noted that the agency contract can be made contingent on the message, which is jointly observable by both parties. The compensation scheme considers the message(s) as one (or some) of the factors in the computation of the payment to the agent, the other of course being the output caused by the agent's action. Usually, formal communication is not essential, as the principal can simply offer the agent a menu of compensation schemes and allow the agent to choose one element of the menu.

5.1.4 The Timing Component of Agency

Timing deals with the sequence of actions taken by the principal and the agent, and the times when they commit themselves to specific decisions (for example, the agent may choose an effort level before or after observing some signal about exogenous risk). Below is one example of timing (T denotes time):
T1. The principal selects a particular compensation scheme from a set of possible compensation schemes.
T2. The agent accepts or rejects the suggested compensation scheme depending on whether it satisfies his reservation constraint or not.
T3. The agent chooses an action or effort level from a set of possible actions or effort levels.
T4. The outcome occurs as a function of the agent's actions and exogenous factors which are unknown or known only with uncertainty.

Another example of timing is when a communication structure with signals and messages is involved (Christensen, 1981):
T1. The principal designs a compensation scheme.
T2. Formation of the agency contract.
T3. The agent observes a signal.
T4. The agent chooses an act and sends a message to the principal.
T5. The output occurs from the agent's act and exogenous factors.

Variations in principal-agent problems are caused by changes in one or more of these components. For example, some principal-agent problems are characterized by the fact that the agent may not be able to enforce the payment commitments of the principal; this situation occurs in some of the relationships in the context of regulation. Another is the possibility of renegotiation or review of the contract at some future date. Agency theory, dealing with the above market structure, gives rise to a variety of problems caused by the presence of factors such as the influence of externalities, limited observability, asymmetric information, and uncertainty (Gjesdal, 1982).

5.1.5 Limited Observability, Moral Hazard, and Monitoring

An important characteristic of principal-agent problems, limited observability of the agent's actions, gives rise to moral hazard. Moral hazard is a situation in which one party (say, the agent) may take actions which are detrimental to the principal and which cannot be perfectly and/or costlessly observed by the principal (see, for example, [Holmstrom, 1979]). Indeed, perfect observation might very well impose "infinite" costs on the principal.
The problem of unobservability is usually addressed by designing monitoring systems or signals which act as estimators of the agent's effort. The selection of monitoring signals and their value is discussed for the case of costless signals in Harris and Raviv (1979), Holmstrom (1979), Shavell (1979), Gjesdal (1982), Singh (1985), and Blickle (1987); costly signals are discussed for three cases in Blickle (1987). Having determined the appropriate monitoring signals, the principal invites the agent to select a compensation scheme from a class of compensation schemes which she, the principal, compiles. Suppose the principal determines monitoring signals $s_1, \ldots, s_n$ and has a compensation scheme $c(q, s_1, \ldots, s_n)$, where q is the output, which the agent accepts. There is no agreement between the principal and the agent as to the level of the effort e. Since the signals $s_i$, $i = 1, \ldots, n$, determine the payoff and the effort level e of the agent (assuming the signals have been chosen carefully), the agent is thereby induced to an effort level which maximizes the expected utility of his payoff (or some other decision criterion). The only decision still in the agent's control is the choice of how much payoff he wants; the assumption is that the agent is rational in the economic sense. The principal's residuum is the output q less the compensation $c(\cdot)$. The principal structures the compensation scheme $c(\cdot)$ in such a way as to maximize the expected utility of her residuum (or some other decision criterion). In this manner, the principal induces desirable work behavior in the agent.

It has been observed that "the source of moral hazard is not unobservability but the fact that the contract cannot be conditioned on effort. Effort is noncontractible." (Rasmusen, 1989). This is true when the principal observes shirking on the part of the agent but is unable to prove it in a court of law. However, this only implies that a contract on effort is imperfectly enforceable. Moral hazard may be alleviated in cases where effort is contracted, and where both limited observability and a positive probability of proving noncompliance exist.

5.1.6 Informational Asymmetry, Adverse Selection, and Screening

Adverse selection arises in the presence of informational asymmetry, which causes the two parties to act on different sets of information. When perfect sharing of information is present and certain other conditions are satisfied, first-best solutions are feasible (Sappington and Stiglitz, 1987). Typically, however, adverse selection exists. While the effect of moral hazard makes itself felt when the agent is taking actions (say, production or sales), adverse selection affects the formation of the relationship, and may give rise to inefficient (in the second-best sense) contracts. In the information-theoretic approach, we can think of both as being caused by a lack of information. This is variously referred to as the dissimilarity between the private information systems of the agent and the firm, or the unobservability or ignorance of "hidden characteristics" (in the latter sense, moral hazard is caused by "hidden effort or actions"). In the theory of agency, the hidden characteristic problem is addressed by designing various sorting and screening mechanisms, or communication systems that pass signals or messages about the hidden characteristics (of course, the latter can also be used to solve the moral hazard problem).
On the one hand, the screening mechanisms can be so arranged as to induce the target party to select by itself one of several alternative contracts (or "packages"). The selection would then reveal some particular hidden characteristic of the party. In such cases, these mechanisms are called "self-selection" devices; see, for example, Spremann (1987) for a discussion of self-selection contracts designed to reveal the agent's risk attitude. On the other hand, the screening mechanisms may be used as indirect estimators of the hidden characteristics, as when aptitude tests and interviews are used to select agents.

The significance of the problem caused by the asymmetry of information is related to the degree of lack of trust between the parties to the agency contract, which, however, may be compensated for by observation of effort. However, most real-life situations involving an agency relationship of any complexity are characterized not only by a lack of trust but also by a lack of observability of the agent's effort. The full context of the concept of information asymmetry is the fact that each party in the agency relationship is either unaware of or has only imperfect knowledge of certain factors which are better known to the other party.

5.1.7 Efficiency of Cooperation and Incentive Compatibility

In the absence of asymmetry of information, both principal and agent would cooperatively determine both the payoff and the effort or work behavior of the agent. Subsequently, the "game" would be played cooperatively between the principal and the agent. This would lead to an efficient agreement termed the first-best design of cooperation. First-best solutions are often absent not merely because of the presence of externalities but mainly because of adverse selection and moral hazard (Spremann, 1987).

Let $F = \{(c,e)\}$, where compensation c and effort e satisfy the principal's and the agent's decision criteria respectively. In other words, F is the set of first-best designs of cooperation, also called efficient designs with respect to the principal-agent decision criteria. Now suppose that the agent's action e is induced, as above, by a function I: $I(c) = e$. Let $S = \{(c, I(c))\}$; i.e., S denotes the set of designs feasible under information asymmetry. If it were not the case that $F \cap S = \emptyset$, then efficient designs of cooperation would be easily induced by the principal. Situations where this occurs are said to be incentive compatible. In all other cases, the principal has available to her only second-best designs of cooperation, which are defined as those schemes that arise in the presence of information asymmetry.

5.1.8 Agency Costs

There are three types of agency costs (Schneider, 1987):
(1) the cost of monitoring the hidden effort of the agent;
(2) the bonding costs of the agent; and
(3) the residual loss, defined as the monetary equivalent of the loss in welfare of the principal caused by actions of the agent which are nonoptimal with respect to the principal.

Agency costs may be interpreted in the following two ways: (1) they may be used to measure the "distance" between the first-best and the second-best designs; (2) they may be looked upon as the value of the information necessary to achieve second-best designs which are arbitrarily close to the first-best designs. Obviously, the value of perfect information should be considered an upper bound on the agency costs (see, for example, [Jensen and Meckling, 1976]).
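The toy computation below, which is not from the dissertation, makes the notions of first-best design, induced second-best design, and residual loss concrete on a discrete example; the contracts, effort levels, output function, and reservation level are all arbitrary assumptions.

    import itertools

    efforts = [0.0, 1.0, 2.0]
    contracts = {"flat": lambda q: 1.0, "share": lambda q: 0.5 * q}
    RESERVATION = 0.0

    def output(e): return 2.0 * e
    def agent_u(name, e): return contracts[name](output(e)) - e**2 / 2
    def principal_u(name, e): return output(e) - contracts[name](output(e))

    # First-best: effort is contractible, so pick the best (c, e) pair
    # satisfying the participation (individual rationality) constraint.
    feasible = [(n, e) for n, e in itertools.product(contracts, efforts)
                if agent_u(n, e) >= RESERVATION]
    fb_name, fb_e = max(feasible, key=lambda ne: principal_u(*ne))

    # Second-best: the agent chooses his own effort (the function I above).
    def induced(name):
        return max(efforts, key=lambda e: agent_u(name, e))

    sb_name = max(contracts, key=lambda n: principal_u(n, induced(n)))

    fb = principal_u(fb_name, fb_e)
    sb = principal_u(sb_name, induced(sb_name))
    print("first-best :", fb_name, fb_e, fb)
    print("second-best:", sb_name, induced(sb_name), sb)
    print("residual loss:", fb - sb)   # one of the agency costs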
5.2 Formulation of the Principal-Agent Problem

The following notation and definitions will be used throughout:

D: the set of decision criteria, such as {maximin, minimax, maximax, minimin, minimax regret, expected value, expected loss, ...}. We use $\lambda \in D$.
$\lambda_P$: the decision criterion of the principal.
$\lambda_A$: the decision criterion of the agent.
$U_P$: the principal's utility function.
$U_A$: the agent's utility function.
C: the set of all compensation schemes. We use $c \in C$.
E: the set of actions or effort levels of the agent. We use $e \in E$.
$\theta$: a random variable denoting the true state of nature.
$\theta_P$: a random variable denoting the principal's estimate of the state of nature.
$\theta_A$: a random variable denoting the agent's estimate of the state of nature.
q: the output realized from the agent's actions (and possibly the state of nature).
$q_P$: the monetary equivalent of the principal's residuum. Note that $q_P = q - c(\cdot)$, where c may depend on the output and possibly other variables.

Output/outcome. The goal or purpose of the agency relationship, such as sales, services, or production, is called the output or the outcome.

Public knowledge/information. Knowledge or information known to both the principal and the agent, and also to a third enforcement party, is termed public knowledge or information. A contract in agency can be based only on public knowledge (i.e., observable output or signals).

Private knowledge/information. Knowledge or information known to either the principal or the agent but not both is termed private knowledge or information.

State of nature. Any events, happenings, occurrences, or information which are not in the control of the principal or the agent and which affect the output of the agency directly through the technology constitute the state of nature.

Compensation. The economic incentive to the agent to induce him to participate in the agency is called the compensation. It is also called the wage, payment, or reward.

Compensation scheme. The package of benefits and output-sharing rules or functions that provide compensation to the agent is called the compensation scheme. It is also called the contract, payment function, or compensation function. The word "scheme" is used here instead of "function" since complicated compensation packages will be considered as an extension later on. In the literature, the word "scheme" may be seen, but it is used in the sense of "function," and several nice properties are assumed for the function (such as continuity, differentiability, and so on). Depending on the contract, the compensation may be negative, i.e., a penalty for the agent. Typical components of the compensation functions considered in the literature are a rent (fixed and possibly negative) and a share of the output.

The principal's residuum. The economic incentive to the principal to engage in the agency is the principal's residuum. The residuum is the output (expressed in monetary terms) less the compensation to the agent. Hence, the principal is sometimes called the residual claimant.

Payoff. Both the agent's compensation and the principal's residuum are called payoffs.

Reservation welfare (of the agent). The monetary equivalent of the best of the alternative opportunities (with other competing principals, if any) available to the agent is known as the reservation welfare of the agent. Accordingly, it is the minimum compensation that induces an agent to accept the contract, though it does not necessarily induce him to his best effort level.
Also known as the reservation utility or individual utility, it is variously denoted in the literature as m or $\bar{U}$.

Disutility of effort. The cost of the inputs which the agent must supply himself when he expends effort contributes to his disutility, and hence is called the disutility of effort.

Individual rationality constraint (IRC). The agent's (expected) utility of net compensation (compensation from the principal less his disutility of effort) must be at least as high as his reservation welfare. This constraint is also called the participation constraint. When a contract violates the individual rationality constraint, the agent rejects it and prefers unemployment instead. Such a contract is not necessarily "bad," since different individuals have different levels of reservation welfare. For example, financially independent individuals may have higher than usual reservation welfare levels, and might very well prefer leisure to work even when the contracts offered are attractive to most other people.

Incentive compatibility constraint (ICC). A contract will be acceptable to the agent if it satisfies his decision criterion on compensation, such as maximization of the expected utility of net compensation. This constraint is called the incentive compatibility constraint.

Development of the problem: Model 1. We develop the problem from simple cases involving the fewest possible assumptions on the technology and informational constraints to those having sophisticated assumptions. Corresponding models from the literature are reviewed briefly in section 5.3.

A. Technology: (a) fixed compensation; C is a set of fixed compensations, with $\bar{U} \in C$; output $q = q(e)$; assume $q(0) = 0$; existence of nonseparable utility functions; decision criterion: maximization of utility; no uncertainty in the state of nature.

B. Public information: (a) the compensation scheme, c; (b) the range of possible outputs, Q; (c) $\bar{U}$.
Information private to the principal: $U_P$.
Information private to the agent: (a) $U_A$; (b) the disutility of effort, d; (c) the range of effort levels, E.

C. Timing: (1) the principal makes an offer of a fixed wage c; (2) the agent either rejects or accepts the offer; (3) if he accepts it, he exerts effort level e; (4) output $q(e)$ results; (5) the output is shared according to the contract.

D. Payoffs:
Case 1: The agent rejects the contract, i.e., $e = 0$: $\pi_P = U_P[q(e)] = U_P[q(0)] = U_P[0]$; $\pi_A = U_A[\bar{U}]$.
Case 2: The agent accepts the contract: $\pi_P = U_P[q(e) - c]$; $\pi_A = U_A[c - d(e)]$.

E. The principal's problem:
(M1.P1) $\max_{c \in C} \max_{q \in Q} U_P[q - c]$ such that $c \ge \bar{U}$. (IRC)
Suppose $C^* \subseteq C$ is the solution set of M1.P1. The principal picks $c^* \in C^*$ and offers it to the agent.
The agent's problem:
(M1.A1) For a given $c^*$, $\max_{e \in E} U_A[c^* - d(e)]$.
Suppose $E^* \subseteq E$ is the solution set of M1.A1. The agent selects $e^* \in E^*$.

F. The solution: (a) the principal offers $c^* \in C^*$ to the agent; (b) the agent accepts the contract; (c) the agent exerts effort $e^*(c^*) \in E^*$; (d) output $q(e^*(c^*))$ occurs; (e) payoffs: $\pi_P = U_P[q(e^*(c^*)) - c^*]$; $\pi_A = U_A[c^* - d(e^*(c^*))]$.

Notes:
1. The agent accepts the contract in F.(b) since IRC is present in M1.P1, and $C^*$ is nonempty since $\bar{U} \in C$.
2. The effort of the agent is a function of the offered compensation.
3. Since one of the informational assumptions was that the principal does not know the agent's utility function, $\bar{U}$ is a compensation rather than the agent's utility of compensation, so $U_A[\bar{U}]$ is meaningful.

G. Variations:
1. The principal offers $C^*$ to the agent instead of a single $c^* \in C^*$. The agent's problem then becomes:
(M1.A2) $\max_{c^* \in C^*} \max_{e \in E} U_A[c^* - d(e)]$.
The first three steps in the solution then become: (a) the principal offers $C^*$ to the agent; (b) the agent accepts the contract; (c) the agent picks an effort level $e^*$ which is a solution to M1.A2 and reports the corresponding $c^*$ (or its index, if appropriate) to the principal.

2. The agent may decide to solve an additional problem: from among two or more competing optimal effort levels, he may wish to select a minimum effort level. Then his problem would be:
(M1.A3) $\min_{e^*} d(e^*)$ such that $e^* \in \arg\max_{e \in E} U_A[c^* - d(e)]$.
Example: Let $E = \{e_1, e_2, e_3\}$ and $C^* = \{c_1, c_2, c_3\}$. Suppose $c_1(q(e_1)) = 5$, $d(e_1) = 2$; $c_2(q(e_2)) = 6$, $d(e_2) = 3$; $c_3(q(e_3)) = 6$, $d(e_3) = 4$. The net compensation to the agent from choosing the three effort levels is 3, 3, and 2 respectively. Assuming $d(e)$ is monotone increasing in e, the agent prefers $e_1$ to $e_2$, and so prefers compensation $c_1$ to $c_2$.

3. We assumed $\bar{U}$ is public knowledge. If this were not so, then the agent would have to test all offers to see if they are at least as high as the utility of his reservation welfare. The two problems then become:
(M1.P2) $\max_{c \in C} \max_{q \in Q} U_P[q - c]$, and
(M1.A4) $\max_{e \in E} U_A[c^* - d(e)]$ such that $U_A[c^*] \ge U_A[\bar{U}]$ (IRC), $c^* \in \arg\max$ M1.P2.
In this case, there is a distinct possibility of the agent rejecting an offer of the principal.

4. Note that in most realistic situations, a distinction must be made between the reservation welfare and the agent's utility of the reservation welfare. Otherwise, merely using IRC with the reservation welfare in M1.P1 may not satisfy the agent's constraint. On the other hand, $\bar{U} = U_A(\bar{U})$ implies knowledge of $U_A$ by the principal, a complication which yields a completely different model. When $\bar{U} \neq U_A(\bar{U})$, the following two problems occur:
(M1.P3) $\max_{c \in C} \max_{q \in Q} U_P(q - c)$ such that $c \ge \bar{U}$.
(M1.A5) $\max_{e \in E} U_A(c^* - d(e))$ such that $U_A(c^*) \ge U_A(\bar{U})$ (IRC), $c^* \in \arg\max$ M1.P3.
In other words, the principal solves her problem the best way she can, and hopes the solution is acceptable to the agent.
Information private to the agent: (a) the agent's utility function; (b) the agent's estimate of the state of nature; (c) disutility of effort; (d) reservation welfare; C. Timing: (a) the principal determines the set of all compensation schemes that maximize her expected utility; (b) the principal presents this set to the agent as the set of offered contracts; (c) the agent picks from this set of compensation schemes a compensation scheme that maximizes his net compensation, and a corresponding effort level; (d) a state of nature occurs; (e) an output results; (f) sharing of the output takes place as contracted. D. Payoffs: Case 1: Agent rejects contract, i.e. e = 0; rp = Up[q(e,0)] = Up[q(0,0)]. 7KA = UA[U]. Case 2: Ir. 7rA Agent accepts contract; = Up[q(e,0) c(q)]. = UA[c(q) d(e)]. E. The principal's problem: (M2.P) MaxCEC MaxoE EP Up [q(e, 0) c(q(e,Q))] where the expectation E(.) is given by (assuming the usual regularity conditions) 0 f Up[q(e,0) c(q(e,0))] f (0) dO fo ep where 0 E [0, U], and f(O) is the distribution assigned by the principal. The agent's problem: (M2.A) Maxcec MaxeE E@A UA[c(q(e,O) ) d(e)] subject to EeA[c(q(e,O)) d(e)] >U, (IRC) c e argmax(M2.P). where the expectation E(.) is given as usual by U f UA[q(e,0) c(q(e,O))] f (0) dO. 0 0, F. The solution: (a) The agent selects c* E C, and a corresponding effort e* which is a solution to M2.A; (b) a state of nature 0 occurs; (c) output q(e*,0) is generated; (d) payoffs: ,rp = Up[q(e',0) c'(q(e*,0))]; 7rA = UA[c'(q(e',0)) d(e*)]. Development of the problem: Model 3. In this model, the strongest possible assumption is made about information available to the principal: the principal has complete knowledge of the utility function of the agent, his disutility of effort, and his reservation welfare. Accordingly, the principal is able to make an offer of compensation which satisfies the decision criterion of the agent and his constraints. In other words, the two problems are treated as one. The assumptions are as in model 2, so only the statement of the problem will be given below. The problem: MaxEc, ee E Up[q(e*',Q) c(q(e*,O))] subject to E UA[c(q(e*,O)) d(e*)] > U, (IRC) e* E argmax {MaxEE, cEc E U[c(q(e,O) ) d(e)] } (ICC) 5.3 Main Results in the Literature Several results from basic agency models will be presented using the framework established in the development of the problem. The following will be presented for each model: Technology, Information, Timing, Payoffs, and Results. It must be noted that the literature rarely presents such an explicit format; rather, several assumptions are often buried within the results, or implied or just not stated. Only by trying an algorithmic formulation is it possible to unearth unspecified assumptions. In many cases, some of the factors are assumed for the sake of formal completeness, even though the original paper neither mentions nor uses those factors in its results. This type of modeling is essential when the algorithms are implemented subsequently using a knowledgeintensive methodology. One recurrent example of incomplete specification is the treatment of the agent's individual rationality constraint (IRC). The principal has to pick a compensation which satisfies IRC. However, some consistency in using IRC is necessary. The agent's reservation welfare U is also a compensation (albeit a default one). The agent must 63 check one of two constraints to verify that the offered compensation indeed meets his reservation welfare: c > U or UA(c)  UA(U). 
If the principal picks a compensation which satisfies c > U, it is not necessary that UA(C) > UA(U) be also satisfied. However, using UA(C) > U for the IRC, where U is treated "as if" it were UA(U), implies knowledge of the agent's utility on the part of the principal. The difference between the two situations is of enormous significance if the purpose of analysis is to devise solutions to realworld problems. In the literature, this distinction is conveniently overlooked. If all such vagueness in the technological, informational and temporal assumptions was to be systematically eliminated, the analysis might change in a way not intended in the original literature. Hence, the main results in the literature will be presented as they are. 5.3.1 Model 1: The LinearExponentialNormal Model This name of the model (Spremann, 1987) derives from the nature of three crucial parameters: the payoff functions are linear, the utility functions are exponential, and the exogenous risk has a normal distribution. Below is a full description. Technology: (a) compensation is the sum of a fixed rent r and a share s of the output q: c(q) = r + sq; 64 (b) presence of uncertainty in the state of nature, denoted by 9, where 0  N(O,o2); (c) the set of effort levels of the agent, E = [O,1]; effort is induced by compensation; (d) output q = q(e,O) = e + 0; (e) the agent's disutility of effort is d d(e) e2; (f) the principal's utility Up is linear (the principal is risk neutral); (g) the agent has constant risk aversion ca > 0, and his utility is UA(W) = exp(uw), where w is his net compensation (also called the wealth); (h) the certainty equivalent of wealth, denoted V, is defined as: V(w) = U[E(U(w))], where U denotes the utility function, E0 is the expectation with respect to 0; as usual, subscripts P or A on V denote the principal or the agent respectively; (i) the decision criterion is maximization of expected utility. Public information: (a) compensation scheme c(q; r,s); (b) output q; (c) distribution of 0; (d) agent's reservation welfare U; (e) agent's risk aversion a. Information private to the principal: Utility of residuum, Up. Information private to the agent: (a) selection of effort given the compensation; (b) utility of welfare; (c) disutility of effort. Timing: (a) the principal offers a contract (r,s) to the agent; (b) the agent's effort e is induced by the compensation scheme; (c) a state of nature occurs; (d) the agent's effort and the state of nature give rise to output; (e) sharing of the output takes place. Payoffs: p = Up[q (r + sq)] = Up[e(r,s) + 0 (r + s(e(r,s) + 0o))] 7rA = UA[r + sq d(e(r,s))] = U^[r + s(e(r,s) + 0) d(e(r,s))], where e(r,s) is the function which induces effort based on compensation, and 0o is the realized state of nature. 66 Results: Result 1.1: The optimal effort level of the agent given a compensation scheme (r,s) is denoted e*, and is obtained by straightforward maximization to yield: e* = e*(r,s) = s/2. This shows that the rent r and the reservation welfare U have no impact on the selection of the agent's effort. Result 1.2: A necessary and sufficient condition for IRC to be satisfied for a given compensation scheme (r,s) is: j S2 (1 2a(12) r U S( 00 4 Result 1.3: The optimal compensation scheme for the principal is c* = (r*,s*), where s* = 1and 1 + 2ao* = 1 2go2 4s *2 Corollary 1.3: The agent's optimal effort given a compensation scheme (r*,s") is (using result 1.1): e 1 2 (1 + 2ao2) 67 Result 1.4: Suppose 2ao? > 1. 
Result 1.4: Suppose $2\alpha\sigma^2 > 1$. Then an increase in the share s requires an increase in the rent r (in order to satisfy IRC). To see this, suppose we increase the share s by $\delta$, with $0 < \delta < 1 - s$. From Result 1.2, for IRC to hold at the new share $s + \delta$ we need

$$r \ge \bar{U} - \frac{(s+\delta)^2 (1 - 2\alpha\sigma^2)}{4} = \left[ \bar{U} - \frac{s^2 (1 - 2\alpha\sigma^2)}{4} \right] + \frac{(2s\delta + \delta^2)(2\alpha\sigma^2 - 1)}{4},$$

and the last term is positive because $2\alpha\sigma^2 > 1$; i.e., the minimum acceptable rent rises.

Result 1.5: The welfare attained by the agent is $\bar{U}$, while the principal's welfare is given by

$$\frac{1}{4} s^* - \bar{U}.$$

Result 1.6: The principal prefers agents with lower risk aversion. This is immediate from the fact that the principal's welfare is decreasing in the agent's risk aversion for given $\sigma^2$ and $\bar{U}$.

Result 1.7: Fixed-fee arrangements are nonoptimal, no matter how large the agent's risk aversion. This is immediate from the fact that

$$s^* = \frac{1}{1 + 2\alpha\sigma^2} > 0 \quad \forall\, \alpha > 0.$$

Result 1.8: It is the connection between the unobservability of the agent's effort and his risk aversion that excludes first-best solutions.

5.3.2 Model 2

This model (Gjesdal, 1982) deals with two problems: (a) choosing an information system, and (b) designing a sharing rule based on the information system.

Technology:
(a) presence of uncertainty, $\theta$;
(b) a finite effort set for the agent; effort has several components, and is hence treated as a vector;
(c) the output q is a function of the agent's effort and the state of nature $\theta$; the range of output levels is finite;
(d) presence of a finite number of public signals;
(e) presence of a set of public information systems (i.e., signals), including noninformative and randomized systems, the output being treated as one of the informative information systems;
(f) costlessness of the public information systems;
(g) compensation schemes are based on signals about the effort or the output or both.

Public information: (a) the distribution of the state of nature, $\theta$; (b) the output levels; (c) the common information systems which are noninformative and randomizing; (d) $U_A$.
Information private to the principal: the utility function, $U_P$.
Information private to the agent: the disutility of effort.

Timing: (a) the principal offers a contract based on observable public information systems, including the output; (b) the agent chooses an action; (c) signals from the specified public information systems are observed; (d) the agent is paid on the basis of the signal; (e) a state of nature occurs; (f) the output is observed; (g) the principal keeps the residuum.

Special technological assumptions (some of these assumptions are used in only some of the results; other results are obtained by relaxing them):
(a) The joint probability distribution function on output, signals, and actions is twice differentiable in effort, and the marginal effects of the different components of effort on this distribution are independent.
(b) The principal's utility function $U_P$ is twice differentiable, increasing, and concave.
(c) The agent's utility function $U_A$ is separable, with the function on the compensation scheme (or sharing rule, as it is known) being increasing and concave, and the function on the effort being concave.

Results:

Result 2.1: There exists a marginal incentive informativeness condition which is essentially sufficient for marginal value given a signal information system Y. When information about the output is replaced by signals about the output and/or the agent's effort, marginal incentive informativeness is no longer a necessary condition for marginal value, since an additional information system Z may be valuable as information about both the output and the effort.
Result 2.2: Information systems having no marginal insurance value but having marginal incentive informativeness may be used to improve risk sharing, as, for example, when signals about the agent's effort which are perfectly correlated with the output are completely observable.

Result 2.3: Under the assumptions of Result 2.2, when the output alone is observed, it must be used for both incentives and insurance. If the effort is observed as well, then a contract may consist of two parts: one part is based on the effort and takes care of incentives; the other part is based on the output and so takes care of risk sharing. For example, consider auto insurance. The principal (the insurer) cannot observe the actions taken by the driver (such as care, caution, and good driving habits) to avoid collisions. However, any positive signals of effort can be the basis of discounts on insurance premiums, as, for example, when the driver has proof of regular maintenance and safety checkups for the vehicle or undergoes safe-driving courses. Factors such as age, marital status, and expected usage are also taken into account. The "output" in this case is the driving history, which can be used for risk sharing; another indicator of risk which may be used is the locale of usage (country lanes or heavy city traffic). This example motivates Result 2.4, a corollary to Results 2.2 and 2.3.

Result 2.4: Information systems having no marginal incentive informativeness but having marginal insurance value may be used to offer improved incentives.

Result 2.5: If the uncertainty in the informative signal system is influenced by the choices of the principal and the agent, then such information systems may be used for control in decentralized decision making.

5.3.3 Model 3

Holmstrom's model (Holmstrom, 1979) examines the role of imperfect information under two conditions: (i) when the compensation scheme is based on output alone, and (ii) when additional information is used. The assumptions about technology, information, and timing are more or less standard, as in the earlier models. The model specifically uses the following:
(a) In the first part of the model, almost all information is public; in the second part, asymmetry is brought in by assuming extra knowledge on the part of the agent.
(b) The output is a function of the agent's effort and the state of nature: $q = q(e,\theta)$, with $\partial q/\partial e > 0$.
(c) The agent's utility function is separable in compensation and effort, where $U_A(c)$ is defined on compensation and $d(e)$ is the disutility defined on effort.
(d) The disutility of effort $d(e)$ is increasing in effort.
(e) The agent is risk averse, so that $U_A'' < 0$.
(f) The principal is weakly risk averse, so that $U_P'' \le 0$.
(g) Compensation is based on output alone.
(h) Knowledge of the probability distribution on the state of nature $\theta$ is public.
(i) Timing: the agent chooses effort before the state of nature is observed.

The problem:
(P) $\max_{c \in C,\, e \in E} E[U_P(q - c(q))]$
such that
$E[U_A(c(q)) - d(e)] \ge \bar{U}$, (IRC)
$e \in \arg\max_{e' \in E} E[U_A(c(q)) - d(e')]$. (ICC)

To obtain a workable formulation, two further assumptions are made:
(a) There exists a distribution induced on output and effort by the state of nature, denoted $F(q,e)$, where $q = q(e,\theta)$. Since $\partial q/\partial e > 0$ by assumption, it follows that $\partial F(q,e)/\partial e \le 0$; for a given e, assume $\partial F(q,e)/\partial e < 0$ for some range of values of q.
(b) F has a density function $f(q,e)$, where (denoting $f_e = \partial f/\partial e$) f and $f_e$ are well defined for all (q,e).
The ICC constraint in (P) is replaced by its first-order condition using $f_e$, and the following formulation is obtained:
(P') $\max_{c \in C,\, e \in E} \int U_P(q - c(q))\, f(q,e)\, dq$
such that
$\int [U_A(c(q)) - d(e)]\, f(q,e)\, dq \ge \bar{U}$, (IRC')
$\int U_A(c(q))\, f_e(q,e)\, dq = d'(e)$. (ICC')

Results:

Result 3.1: Let $\lambda$ and $\mu$ be the Lagrange multipliers for IRC' and ICC' in (P') respectively. Then the optimal compensation schemes are characterized as follows:

$$\frac{U_P'(q - c(q))}{U_A'(c(q))} = \lambda + \mu\, \frac{f_e(q,e)}{f(q,e)}$$

for $\underline{c} \le c(q) \le \overline{c}$, where $\underline{c}$ is the agent's wealth and $\overline{c}$ is the principal's wealth plus the output (these form the lower and upper bounds). If the equality in the above characterization does not hold, then $c(q) = \underline{c}$ or $\overline{c}$, depending on the direction of the inequality.

Result 3.2: Under the given assumptions and the characterization in Result 3.1, $\mu > 0$; this is equivalent to saying that the principal prefers that the agent increase his effort, given a second-best compensation scheme as in Result 3.1. The second-best solution is strictly inferior to a first-best solution.

Result 3.3: The ratio $f_e/f$ is interpreted as a benefit-cost ratio for deviation from optimal risk sharing. Result 3.1 states that such deviation must be proportional to this ratio, taking individual risk aversion into account. From Result 3.2, incentives for increased effort are preferable to the principal. The following compensation scheme accomplishes this (where $c_F(q)$ denotes the first-best solution for a given $\lambda$): $c(q) > c_F(q)$ if the marginal return on effort is positive to the agent; $c(q) < c_F(q)$ otherwise.

Result 3.4: Intuitively, the agent carries excess responsibility for the output. This is implied by Result 3.3 and the assumptions on the induced distribution f.

A previous assumption is now modified as follows: the compensation c is a function of the output and some other signal y which is public knowledge. Associated with this is a joint distribution $F(q,y,e)$ (as above), with $f(q,y,e)$ the corresponding density function.

Result 3.5: The extension of Result 3.1 on the characterization of optimal compensation schemes is as follows:

$$\frac{U_P'(q - c(q,y))}{U_A'(c(q,y))} = \lambda + \mu\, \frac{f_e(q,y,e)}{f(q,y,e)},$$

where $\lambda$ and $\mu$ are as in Result 3.1.

Result 3.6: Any informative signal, no matter how noisy, has a positive value if costlessly obtained and administered into the contract. Note: this result is based on rigorous definitions of the value and informativeness of signals (Holmstrom, 1979).

In the second part of this model, an assumption is made about additional knowledge of the state of nature revealed to the agent alone, denoted z. This introduces asymmetry into the model. The timing is as follows: (a) the principal offers a contract c based on the output and an observed signal y; (b) the agent accepts the contract; (c) the agent observes a signal z about $\theta$; (d) the agent chooses an effort level; (e) a state of nature occurs; (f) the agent's effort and the state of nature yield an output; (g) sharing of the output takes place.

We can think of the signal y as information about the state of nature which both parties share and agree upon, and the signal z as special post-contract information about the state of nature received by the agent alone. For example, a salesman's compensation may be some combination of a percentage of orders and a fixed fee.
If both the salesman and his manager agree that the economy is in a recession, the manager may offer a year-long contract which does not penalize the salesman for poor sales, but offers an above-subsistence fixed fee to motivate loyalty to the firm on the part of the salesman, with a clause thrown in which transfers a larger share of output than normal to the agent (i.e., incentives for extra effort in a time of recession). Now suppose the salesman, as he sets out on his rounds, discovers that the economy is in an upswing, and that his orders are being filled with little effort on his part. Then the agent may continue to exert little effort, realize high output, and collect a higher share of output in addition to a higher initial fixed fee as his compensation.

In the case of asymmetric information, the problem is formulated as follows:
(PA) $\max_{c(q,y) \in C,\, e(z) \in E} \int U_P(q - c(q,y))\, f(q,y \mid z, e(z))\, p(z)\, dq\, dy\, dz$
such that
$\int U_A(c(q,y))\, f(q,y \mid z, e(z))\, p(z)\, dq\, dy\, dz - \int d(e(z))\, p(z)\, dz \ge \bar{U}$, (IRC)
$e(z) \in \arg\max_{e' \in E} \int U_A(c(q,y))\, f(q,y \mid z, e')\, dq\, dy - d(e')$ for all z, (ICC)
where $p(z)$ is the marginal density of z and $d(e(z))$ is the disutility of the effort $e(z)$. Let $\lambda$ and $\mu(z)p(z)$ be the Lagrange multipliers for (IRC) and (ICC) in (PA) respectively.

Result 3.7: The extension of Result 3.1 on the characterization of optimal compensation schemes to the problem (PA) is:

$$\frac{U_P'(q - c(q,y))}{U_A'(c(q,y))} = \lambda + \frac{\int \mu(z)\, f_e(q,y \mid z, e(z))\, p(z)\, dz}{\int f(q,y \mid z, e(z))\, p(z)\, dz}.$$

The interpretation of Result 3.7 is similar to that of Result 3.1. Analogous to Result 3.2, $\mu(z) \neq 0$, with $\mu(z) < 0$ for some z and $\mu(z) > 0$ for other z, which implies, as in Result 3.2, that Result 3.7 characterizes solutions which are second-best.

5.3.4 Model 4: Communication under Asymmetry

This model (Christensen, 1981) attempts an analysis similar to Model 3, and includes communication structures in the agency. The special assumptions are as follows:
(a) There is a set of messages M that the agent uses to communicate with the principal; compensation is based on the output and the message picked by the agent; hence, the message is public knowledge.
(b) There is a set of signals about the environment; the agent chooses his effort level based on the signal he observes; the agent also selects his compensation scheme at this time by selecting an appropriate message to communicate to the principal; the selection of the message is based on the effort.
(c) Uncertainty is with respect to the signals observed by the agent; the distribution characterizing this uncertainty is public knowledge; the joint density is defined on the output and the signal $\xi$, conditioned on the effort: $f(q, \xi \mid e) = f(q \mid \xi, e)\, f(\xi)$.
(d) Both parties are Savage (1954)-rational.
(e) The principal's utility of wealth is $U_P$, with weak risk aversion; in particular, $U_P' > 0$ and $U_P'' \le 0$.
(f) The agent's utility of wealth is separable into $U_A$, defined on compensation, and the disutility of effort. The agent has positive marginal utility for money, and he is strictly risk averse; i.e., $U_A' > 0$, $U_A'' < 0$, and $d' > 0$.

Timing: (a) the principal and the agent determine the set of compensation schemes, based on the output and the message sent to the principal by the agent; the principal is committed to this set of compensation schemes; (b) the agent accepts the compensation scheme if it satisfies his reservation welfare; (c) the agent observes a signal $\xi$; (d) the agent picks an effort level based on $\xi$; (e) the agent sends a message m to the principal; this causes a compensation scheme from the contracted set to be chosen; (f) the output occurs; (g) sharing of the output takes place.
The following is the principal's problem:

(P) Find $(c^*(q,m),\, e^*(\xi,m),\, m^*(\xi))$ such that $c^* \in C$, $e^* \in E$, and $m^* \in M$ solve:

$\max_{c(q,m),\, e(\xi,m),\, m(\xi)} E[U_P(q - c(q,m))]$

such that

$E[U_A(c(q,m)) - d(e)] \ge \bar{U}$,  (IRC)

$e(\xi) \in \arg\max_{e' \in E} E[U_A(c(q,m(\xi))) - d(e') \mid \xi]$  (self-selection of action),

$m(\xi) \in \arg\max_{m' \in M} E[U_A(c(q,m')) - d(e(\xi,m')) \mid \xi]$  (self-selection of message),

where $e(\xi,m)$ is the optimal act given that $\xi$ is observed and $m$ is reported.

The following assumptions are used for analyzing the problem in the above formulation: (a) $U_P(\cdot)$ and $U_A(\cdot) - d(\cdot)$ are concave and twice continuously differentiable in all arguments. (b) Compensation functions are piecewise continuous and differentiable a.e. in $q$. (c) The density function $f$ is twice differentiable a.e. (d) Regularity conditions enable differentiation under the integral sign. (e) Existence of an optimal solution is assumed.

Result:

Result 4.1: The following is a characterization of optimal functions:

$$\frac{U_P'(q - c^*(q,\xi))}{U_A'(c^*(q,\xi))} = \lambda + \frac{\mu(\xi)\, f_e(q,\xi \mid e(\xi)) + \rho(\xi)\, f_e(q,\xi \mid e^*(\xi))}{f(q,\xi \mid e^*(\xi))},$$

where $\lambda$, $\mu(\xi)$, and $\rho(\xi)$ are the Lagrange multipliers for the three constraints in (P), respectively.

5.3.5 Model G: Some General Results

Result G.1 (Wilson, 1968): Suppose that both the principal and the agent are risk-averse, having linear risk-tolerance functions with the same slope, and that the disutility of the agent's effort is constant. Then the optimal sharing rule is a nonconstant function of the output.

Result G.2: In addition to the assumptions of Result G.1, suppose also that the agent's effort has negative marginal utility. Let $c_1(q)$ be a sharing rule (or compensation scheme) which is linear in the output $q$, and let $c_2(q) = k$ be a constant sharing rule. Then $c_1$ dominates $c_2$.

The two results above deal with conditions under which observation of the output is useful. Suppose $Y$ is a public information system that conveys information about the output, so that compensation schemes can be based on $Y$ alone. The value of $Y$, denoted $W(Y)$ (following model 1), is defined as $W(Y) = \max_{c \in C} E\, U_P[q - c(y)]$, subject to IRC and ICC. Let $Y'$ denote a noninformative signal. Then the two results yield a ranking of informativeness: $W(Y) > W(Y')$. When $Q$ is an information system denoting perfect observability of the output $q$, and the timing of the agency relationship is as in model 1 (i.e., payment is made to the agent after observing the output), then $W(Q) > W(Y)$ as well.

CHAPTER 6
METHODOLOGICAL ANALYSIS

The solution to the principal-agent problem is influenced by the way the model itself is set up in the literature. Highly specialized assumptions, which are necessary in order to use the optimization technique, contribute a certain amount of bias. As an analogy, one may note that a linear regression model builds in an implicit bias by seeking solutions only among linear relationships between the variables; a correlation coefficient of zero therefore implies only that the variables are not linearly correlated, not that they are unrelated. Examples of such specialized assumptions abound in the literature, a small but typical sample of which is detailed in the models presented in Chapter 5.
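The point of the regression analogy can be seen in a few lines of code (an illustration added here, using arbitrary simulated data): a variable that is completely determined by another can still show a near-zero linear correlation.

```python
import numpy as np

# Illustration of the regression analogy: y is a deterministic (nonlinear)
# function of x, yet the Pearson correlation coefficient is near zero.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=10_000)
y = x ** 2                       # perfect dependence, but not linear

r = np.corrcoef(x, y)[0, 1]      # Pearson r measures linear association only
print(f"Pearson r = {r:.3f}")    # prints a value close to 0
```

Just as the zero coefficient here says nothing about the true relationship, an agency model restricted to a particular functional family can only find solutions within that family.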
The consequences of using the optimization methodology are primarily of two kinds.

Firstly, much of the pertinent information that is available to the principal, the agent, and the researcher must be ignored, since this information deals with variables which are not easily quantifiable, or which can be classified only nominally, such as those that deal with the behavioral and motivational characteristics of the agent and the prior beliefs of the agent and the principal (regarding the task at hand, the environment, and other exogenous variables). Most of this knowledge takes the form of rules linking antecedents to consequents, with associated certainty factors.

Secondly, a certain amount of bias is introduced into the model by requiring that the functions involved in the constraints satisfy certain properties, such as differentiability, the monotone likelihood ratio property, and so on. It must be noted that many of these properties are reasonable and meaningful from the standpoint of accepted economic theory. However, standard economic theory itself relies heavily on concepts such as utility and risk aversion in order to explain the behavior of economic agents. Such assumptions have been criticized on the grounds that individuals violate them; for example, it is known that individuals sometimes violate properties of von Neumann-Morgenstern utility functions. Decision theory addressing economic problems also uses concepts such as utility, risk, loss, and regret, and relies on classical statistical inference procedures. However, real-life individuals are rarely consistent in their inferences, often lack statistical sophistication, and are unreliable in probability calculations. Several references supporting this view are cited in Chapter 2. If the term "rational man" as used in economic theory means that individuals act as if they were sophisticated and infallible (in terms of method and not merely content), then economic analysis might very well yield erroneous solutions.

Consider, as an example, the treatment of compensation schemes in the literature. They are assumed to be quite simple, either being linear in the output or involving a fixed element called the rent (see Chapter 5 for details). In practice, compensation schemes are fairly comprehensive and involved. They cover as many contingencies as possible, provide for a variety of payment and reward criteria, and specify grievance procedures, termination, promotion, varieties of fringe benefits, support services, access to company resources, and so on.

The set of all compensation schemes is in fact a set of knowledge bases consisting of the following components (B.R. Ellig, 1982): (1) compensation policies/strategies of the principal; (2) knowledge of the structure of the compensation plans, meaning specific rules concerning short-term incentives linked to partial realization of the expected output, long-term incentives linked to full realization of the expected output, bonus plans linked to realizing more than the expected output, disutilities linked to underachievement, and rules specifying injunctions to the agent to refrain from activities that may result in disutilities to the principal (if any). (A small rule-based sketch of this view follows the lists below.)

There are various elements in a compensation scheme, which can be classified as financial and nonfinancial:

Financial elements of compensation
1. Base Pay (periodic).
2. Commission or Share of Output.
3. Bonus (annual or on special occasions).
4. Long-Term Income (lump-sum payments at termination).
5. Benefits (insurance, etc.).
6. Stock Participation.
7. Nontaxable or tax-sheltered values.

Nonfinancial elements of compensation
1. Company Environment.
2. Work Environment.
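To illustrate the knowledge-base view of compensation schemes described above, the sketch below encodes a few compensation rules as antecedent-consequent pairs with certainty factors. The specific rules, thresholds, and certainty values are hypothetical; they are not drawn from Ellig (1982) or from any model in Chapter 5.

```python
from dataclasses import dataclass
from typing import Callable

# A minimal sketch of a compensation scheme as a rule base. All rules and
# numbers are hypothetical illustrations.

@dataclass
class Rule:
    antecedent: Callable[[dict], bool]   # condition on observed facts
    consequent: str                      # compensation action implied
    certainty: float                     # certainty factor attached to the rule

RULES = [
    Rule(lambda f: f["output"] >= 1.0 * f["expected"],
         "award annual bonus", 0.9),
    Rule(lambda f: f["output"] >= 0.5 * f["expected"],
         "pay short-term incentive", 0.8),
    Rule(lambda f: f["output"] < 0.5 * f["expected"],
         "withhold commission share", 0.6),
]

def fire(facts: dict) -> list[tuple[str, float]]:
    """Return (consequent, certainty) for every rule whose antecedent holds."""
    return [(r.consequent, r.certainty) for r in RULES if r.antecedent(facts)]

print(fire({"output": 120.0, "expected": 100.0}))
# -> [('award annual bonus', 0.9), ('pay short-term incentive', 0.8)]
```

A scheme represented this way can cover arbitrarily many contingencies (bonus plans, injunctions, fringe benefits) simply by adding rules, which is precisely the kind of structure the closed-form compensation functions of Chapter 5 cannot accommodate.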