Citation

Material Information

Title:
Automated knowledge acquisition using inductive learning application to Mutual Fund classification
Creator:
Norris, Robert Clayton
Publication Date:
Language:
English
Physical Description:
ix, 217 leaves : ill. ; 29 cm.

Subjects

Subjects / Keywords:
Assets ( jstor )
Capitalization ( jstor )
Datasets ( jstor )
Debt ( jstor )
Decision trees ( jstor )
Error rates ( jstor )
Machine learning ( jstor )
Mutual funds ( jstor )
Price earnings ratio ( jstor )
Ratings ( jstor )
Decision and Information Sciences thesis, Ph.D ( lcsh )
Dissertations, Academic -- Decision and Information Sciences -- UF ( lcsh )
Genre:
bibliography ( marcgt )
non-fiction ( marcgt )

Notes

Thesis:
Thesis (Ph.D.)--University of Florida, 1997.
Bibliography:
Includes bibliographical references (leaves 155-216).
General Note:
Typescript.
General Note:
Vita.
Statement of Responsibility:
by Robert Clayton Norris.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Resource Identifier:
028715168 ( ALEPH )
48449355 ( OCLC )

Full Text

AUTOMATED KNOWLEDGE ACQUISITION
USING INDUCTIVE LEARNING:
APPLICATION TO MUTUAL FUND CLASSIFICATION

By

ROBERT CLAYTON NORRIS, JR.

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1997

by

Robert Clayton Norris, Jr.

To my wife, Suzanne

and my daughter, Christina

ACKNOWLEDGEMENTS

I wish to thank Dr. Gary J. Koehler, the chairman of my committee, and Dr.

Robert C. Radcliffe, the external member, who gave me their time, support, guidance,

and patience throughout this research. Dr. Koehler provided the initial concept of the

research topic in the area of artificial intelligence and finance. Dr. Radcliffe provided the

idea of studying the Morningstar rating system. I wish to thank Dr. Richard Elnicki and

Dr. Patrick Thompson for serving on my committee and for their help and advice over

these years.

I wish to thank Dr. H. Russell Fogler who has always shown an interest in my

research. I also would like to thank Dean Earle C. Traynham and Dr. Robert C. Pickhardt

of the College of Business Administration, University of North Florida, for their support.

I wish to thank my wife, Suzanne, for her love, assistance, and understanding as I

worked on this most important undertaking. I would also like to thank my aunt and

uncle, Lillian and Jack Norris, for encouraging me to continue my education. Finally, I

would like to thank my late father for his advice and guidance over the years and my late

mother for her love.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS

ABSTRACT

CHAPTERS

1 INTRODUCTION

1.1 Background
1.2 Research Problem
1.3 Purpose
1.4 Motivation
1.5 Chapter Organization

2 LITERATURE REVIEW

2.1 Historical Overview of Machine Learning
2.1.1 Brief History of AI Research on Learning
2.1.2 Four Perspectives on Learning
2.2 AI and Financial Applications
2.2.1 Expert Systems
2.2.2 Neural Networks and Financial Applications
2.2.3 Genetic Algorithms and Financial Applications
2.3 The C4.5 Learning System
2.3.1 Brief History of C4.5
2.3.2 C4.5 Algorithms
2.3.3 Limitations of C4.5
2.3.4 C4.5 Financial Applications
2.4 Linear Discriminant Analysis
2.4.1 Overview
2.4.2 Limitations of LDA
2.5 Logistic Regression (Logit)
2.6 Summary

3 DOMAIN PROBLEM

3.1 Overview of Mutual Fund Ratings Systems
3.1.1 Morningstar, Inc. Overview
3.1.2 Morningstar Rating System
3.1.3 Review and Criticism of the Morningstar Rating System
3.1.4 Investment Managers Use of Ratings
3.1.5 Performance Persistence in Mutual Funds
3.1.6 Review of Yearly Variation of Morningstar Ratings
3.2 Problem Specification

4 CLASSIFICATION OF MUTUAL FUNDS BY RATINGS

4.1 Research Goals
4.2 Research Caveats
4.3 Example Databases
4.4 Brief Overview of the Research Phases
4.5 Phase 1 Classifying 1993 Funds
4.5.1 Methodology
4.5.2 Results
4.5.3 Conclusions
4.6 Phase 2 1993 Data with Derived Features
4.6.1 Methodology
4.6.2 Results for the Regular Dataset
4.6.3 Results for the Derived Features Dataset
4.6.4 Conclusions
4.7 Phase 3 Comparing 5-Star and 3-Star Classifications
4.7.1 Methodology
4.7.2 Results
4.7.3 Conclusions
4.8 Phase 4 Crossvalidation with C4.5
4.8.1 Methodology
4.8.2 Results
4.8.3 Conclusions
4.9 Overall Summary

5 PREDICTION OF MUTUAL FUND RATINGS AND RATINGS CHANGES

5.1 Phase 5 Predicting Ratings with a Common Feature Vector Over Two Years
5.1.1 Methodology
5.1.2 Results
5.1.3 Conclusions
5.2 Phase 6 Predicting Matched Mutual Fund Rating Changes
5.2.1 Methodology
5.2.2 Results for 1994 Data Predicting 1995 Ratings
5.2.3 Results for 1995 Data Predicting 1996 Ratings
5.2.4 Conclusions
5.3 Phase 7 Predicting Unmatched Mutual Fund Ratings
5.3.1 Methodology
5.3.2 Results for 1994 Data Predicting 1995 Ratings
5.3.3 Results for 1995 Data Predicting 1996 Ratings
5.3.4 Conclusions
5.4 Overall Summary

6 SUMMARY AND FUTURE RESEARCH

APPENDICES

A DESCRIPTION OF MUTUAL FUND FEATURES

B PHASE 1 CLASSIFICATION FEATURES

C PHASE 2 CLASSIFICATION FEATURES

D PHASE 3 CLASSIFICATION FEATURES

E BEST CLASSIFICATION TREES FROM PHASES 1-4

REFERENCES

BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

AUTOMATED KNOWLEDGE ACQUISITION
USING INDUCTIVE LEARNING:
APPLICATION TO MUTUAL FUND CLASSIFICATION

By

Robert Clayton Norris, Jr.

December 1997

Chairman: Dr. Gary J. Koehler
Major Department: Decision and Information Sciences

This research uses an inductive learning methodology that builds decision trees,

Quinlan's C4.5, to classify mutual funds according to the Morningstar Mutual Fund five

class or star rating system and predict the mutual fund ratings one year in the future. In

the first part of the research, I compare the performance of C4.5, Logistic Regression and

Linear Discriminant Analysis in classifying mutual funds according to the Morningstar

rating system. As the size of the training set increases, so does C4.5's performance

versus the two statistical methods. Overall, C4.5 performed as well as Logistic

Regression in classifying the mutual funds and outperformed Linear Discriminant

Analysis. This part of the research also explored the ability of C4.5 to classify equity

mutual funds that were unrated by Morningstar. The results suggested that, with the

proper features and a modification to the Morningstar five class rating system to three

classes, unrated mutual funds could be classified with a 30% error.

Anecdotal evidence suggested that investors purchase mutual funds by ratings and

have an expectation that the rating will stay the same or improve. The second part of the

research used a training set of one year to construct a decision tree with C4.5 to predict

the ratings of mutual funds one year in the future. The testing set consisted of examples

from the prediction year in question and the predictions were compared to the actual

ratings for that year. The results were that, with the necessary feature vector, five-star

fund ratings could be predicted with 65% accuracy. With a modification to the rating

system changing it to three stars, predicted mutual fund ratings were 75% accurate.

This research also identifies features that are useful for classifying mutual

funds by the Morningstar rating system and for the prediction of fund ratings.

CHAPTER 1
INTRODUCTION

1.1 Background

Glancing through a copy of Technical Analysis of Stocks & Commodities

magazine, you find a great deal of information about artificial intelligence (AI) and the

selection of stocks and commodities for portfolios. In the February 1997 issue of the

magazine, the Traders' Glossary even defines the term neural network. However, it is

difficult to find scientific research about AI systems used on Wall Street since usually

they are proprietary and could provide a competitive advantage to the investment firm.

Five years ago AI use in financial applications was just beginning to be noticed.

For example, the Wall Street Journal of October 27, 1992 carried this story about the use of

artificial intelligence to select stocks for a mutual fund portfolio:

"Bradford Lewis, manager of Fidelity Investment Inc.'s Fidelity
Disciplined Equity Fund, has found a way to out-perform standard indices
using neural network software. A neural network is an artificial
intelligence program that copies the workings of the human brain. The
mutual fund, which invests in the same businesses as the Standard and
Poor's 500 stock index (S & P 500), has won over the index by 2.3 to 5.6
percent for three years running and is maintaining its performance in FY
1993...Lewis checks with analysts at Fidelity to double-check his results,
but sometimes when he buys stock contrary to the computer's advice, he
loses money." (McGough, 1992, p. C1)

Academic research concerning mutual funds and artificial intelligence is

relatively new since only three studies were cited in the literature from 1986 to the

present. Chiang et al. (1996) described a neural network that used historical economic

information to predict the end-of-year Net Asset Value and that outperformed regression

models. A second study (Lettau, 1994) used genetic algorithms to simulate the actions of

investors buying and selling mutual funds. The third paper studied portfolio decisions of

boundedly rational agents by modeling learning with a Genetic Algorithm (Lettau, 1997)

and had nothing to do with mutual fund ratings, the topic of this research.

The difficulty of conducting research about working AI systems requires

researchers to design systems for study that would be of interest to the investor and

practitioner. This leaves much room for applied research to develop systems that could,

for example, classify mutual funds according to one of several popular rating systems. If

we could classify mutual funds according to a rating system, why not go one step further

and try to predict the ratings over some fixed horizon? Classification and prediction of

mutual fund ratings are the essence of our research with an inductive learning program

called C4.5, the direct descendant of ID3 (Quinlan, 1993, p. vii).

Machine learning (ML) is a branch of artificial intelligence concerned with

automated knowledge acquisition, and inductive learning is a strategy of ML that seeks to

produce generally applicable rules from the examination of a number of examples (Trippi

and Lee, 1996). Since inductively generated rules could be used for classification

problems, a common concern is the performance relative to other existing models for

classification, such as linear discriminant analysis (LDA) and logistic regression (Logit).

Unlike LDA and Logit, inductive learning makes no a priori assumptions about forms of

the distribution of data (e.g., a normal distribution) or the relationship between the

features or variables describing each mutual fund. C4.5 builds decision trees to classify

the examples, which consist of the features and an indicator function for class

membership. It then uses the best decision tree produced using a training set of examples

to classify unseen examples in a testing set to determine the accuracy of the tree. C4.5

also has the ability to translate the decision tree into a ruleset that could be used as an

alternative means for classification.
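The train-then-test procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Quinlan's C4.5: it fits a one-level tree (a single threshold test chosen by information gain) on a training set and measures the error rate on unseen test examples. The feature name `three_year_return`, the class labels, and all data values are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_threshold(examples, feature):
    """Return (threshold, gain) for the binary split on `feature`
    with the largest information gain; threshold is None if no split helps."""
    base = entropy([cls for _, cls in examples])
    values = sorted({feats[feature] for feats, _ in examples})
    best = (None, 0.0)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold between adjacent values
        left = [cls for feats, cls in examples if feats[feature] <= t]
        right = [cls for feats, cls in examples if feats[feature] > t]
        remainder = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(examples)
        gain = base - remainder
        if gain > best[1]:
            best = (t, gain)
    return best

def majority(labels):
    """Most frequent class label."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(examples, feature):
    """Build a one-level decision tree (a 'stump') from training examples."""
    t, _ = best_threshold(examples, feature)
    if t is None:  # no useful split: always predict the majority class
        default = majority([cls for _, cls in examples])
        return lambda feats: default
    left = majority([c for f, c in examples if f[feature] <= t])
    right = majority([c for f, c in examples if f[feature] > t])
    return lambda feats: left if feats[feature] <= t else right

def error_rate(classify, examples):
    """Fraction of examples the classifier labels incorrectly."""
    wrong = sum(1 for feats, cls in examples if classify(feats) != cls)
    return wrong / len(examples)

# Hypothetical training and testing sets (feature vector, class label).
TRAIN = [({"three_year_return": 2.0}, "low"),
         ({"three_year_return": 4.0}, "low"),
         ({"three_year_return": 9.0}, "high"),
         ({"three_year_return": 12.0}, "high")]
TEST = [({"three_year_return": 3.0}, "low"),
        ({"three_year_return": 10.0}, "high"),
        ({"three_year_return": 7.0}, "low")]

stump = train_stump(TRAIN, "three_year_return")
test_error = error_rate(stump, TEST)
```

The stump separates the toy training set perfectly but misclassifies one of the three unseen test examples, which is why accuracy is judged on a testing set rather than on the examples used to build the tree.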

We have selected the C4.5 decision tree generating program for this research for

several reasons. First, as our literature review will show, decision tree programs,

specifically ID3 and its successors, have been used for a variety of financial applications

for over 10 years (Braun and Chandler, 1987), with good results. Second,

decision trees provide practitioners with a way of understanding how an individual

example was classified. They can start at the root and see the values used for the features

in partitioning the examples into their respective classes. The decision tree may be

complex and difficult to understand but each example can be explained with it. Other AI

programs used for financial applications do not have this capability. Neural networks

inherently lack explanatory capability (Trippi and Lee, 1996). Genetic algorithms are

complex systems that are considered hard to design and analyze (Goldberg, 1994). Third,

C4.5 processes discrete, continuous and nominal valued features without transformation

while this is not possible for neural networks and genetic algorithms.

1.2 Research Problem

The number of common stocks far exceeded the number of mutual funds for

many years after the enactment of the Investment Company Act of 1940. After all, the

acquisition of mutual funds was a way of selecting a professional portfolio manager to

pick stocks for you and, by 1976, there were 452 funds. Today we have the situation

where the number of equity and bond funds exceeds the number of stocks on the New

York Stock Exchange. For many investors, selecting a fund has become a difficult

decision. Over the years rating services have appeared to aid the investor in their

decision of which mutual funds to buy and sell.

There is contemporary evidence, presented later in this study, that investors

appear to be buying mutual funds based on the ratings. This occurs despite the rating

services' disclaimer that their ratings evaluate historical performance and are not a

predictor of future performance. We also found that the rating services do not rate all the

mutual funds. Morningstar, the major mutual fund rating service, does not evaluate a

mutual fund for a rating unless it has a three-year financial history. For example, the

June 30, 1996, Morningstar Principia CD-ROM showed Morningstar was tracking 3,794

equity mutual funds but had rated only 1,848 funds or less than half. Chapter 3 discusses

how Morningstar rates mutual funds.

Being able to classify equity mutual funds according to the rules of a rating

system would be an important ability. There may be relationships among the variety of

data, financial and otherwise, that would permit the classification of mutual funds not yet

rated by Morningstar. In addition, if we could classify mutual funds with a high degree

of accuracy to a known rating system, there could be relationships among the data that

permit predicting the future rating of a mutual fund already rated by Morningstar.

In developing a process to classify mutual funds and predict their ratings, we

could also automate the identification of important features that aid the classification

process. In addition, we could use decision trees to develop a knowledge base of the

relationships between these features.

This investigation of inductive learning using decision trees for classification and

prediction consists of two parts. The first part evaluates the performance of C4.5 in

classifying mutual funds against two statistical techniques used for classification in the

field of finance, LDA and Logit. The second part evaluates the ability of C4.5 to predict

mutual fund ratings by comparing the performance of C4.5 to actual ratings and

conducting statistical tests of goodness-of-fit on the predicted and actual ratings

distributions. The results are analyzed to gain insights into the relationships among the

data used for classification and prediction.

The benefits of this research could be extended to studying other mutual fund

rating systems and other types of mutual funds such as bond and hybrid funds. This

problem is of interest because the domain theory is not well developed. In such

problems, data-driven search techniques, such as inductive learning, have been found to

provide particularly good results.

1.3 Purpose

The approach to this study is empirical. Beginning with the research problem, a

number of experiments were designed using an AI methodology and current statistical

classification techniques to find a solution to the classification of mutual funds and the

prediction of their ratings.

The major goals of the study are as follows:

Demonstrate the relevance of inductive learning techniques in solving a
real-world problem.

Investigate relationships in the domain data that could contribute to
understanding the rating system.

Investigate the application of an existing methodology to a new domain.

1.4 Motivation

Interest in the use of AI techniques in various business domains has grown very

rapidly and this is evident in the field of investment analysis. We have already identified

the use of neural networks in selecting portfolios for mutual funds and predicting the end-

of-year Net Asset Value. Additionally, traditional statistical methods are often used in

conjunction with, or in competition with, AI techniques. However, statistical methods

rely upon assumptions about the underlying distribution of the data that are usually

ignored or assumed away. AI methodologies make no such assumptions and can be

applied without invalidating the model.

Exploring the application of induction to mutual funds classification and

prediction will prove to be very useful and provide an area of research for strategic and

competitive use of artificial intelligence in information systems.

1.5 Chapter Organization

Chapter 2 of this study reviews the literature on artificial intelligence and its

financial applications. The chapter begins with an overview of artificial intelligence and

machine learning research, a review of AI and financial applications, followed by

definitions of learning and other formal concepts. Then we discuss the induction process

of decision trees for classification and prediction. The chapter ends with a discussion of

LDA and Logit classification.

Chapter 3 provides an overview of the problem domain. It reviews mutual fund

rating systems, followed by an analysis of the problem, the hypotheses to be tested, and

the benefits of the research effort. This chapter ends with a figure mapping out the

experimental design.

Chapters 4 and 5 provide details of the experimental methodology, results and

analysis, and the conclusions we draw from the results. Chapter 4 involves classifying

mutual funds using C4.5, LDA, and Logit. Chapter 5 involves predicting mutual fund

ratings one year in the future using C4.5.

Finally, Chapter 6 provides a summary and conclusion of the research and

discusses extensions of this work.

CHAPTER 2
LITERATURE REVIEW

2.1 Historical Overview of Machine Learning

The field of machine learning concerns computer programs that can imitate the

learning behavior of humans (Natarajan, 1991). Learning is the improvement of

performance in some environment through the acquisition of knowledge resulting from

experience in that environment (Langley, 1996). From the very beginning of artificial

intelligence, researchers have sought to understand the process of learning and to create

computer programs that can learn (Cohen and Feigenbaum, 1982). Two reasons for

studying learning are to understand the process and to provide computers with the ability to

learn.

2.1.1 Brief History of AI Research on Learning

AI research on learning started in the late 1950s with work on self-organizing

systems that modified themselves to adapt to their environments. Through the use of

feedback and a given set of stimuli, the researchers thought the systems would evolve. Most

of these first attempts did not produce systems of any complexity or intelligence (Cohen and

Feigenbaum, 1982).

In the 1960s, AI research turned to knowledge-based problem solving and natural

language understanding. Workers adopted the view that learning is a complex and difficult

process, and that a learning system could not learn high-level concepts by starting without

any knowledge at all. This viewpoint resulted in some researchers studying simple

problems in great detail and led others to incorporate large amounts of domain knowledge

into learning systems so they could explore high-level concepts (Cohen and Feigenbaum,

1982).

A third stage of learning research, searching for ways to acquire knowledge for

expert systems, focuses on all forms of learning, including advice-taking and learning from

analogies. This stage began in earnest in the late 1970s (Feigenbaum et al., 1988).

An expert system imitates the intellectual activities that make a human an expert in

an area such as financial applications. The key elements of a traditional expert system are a

user interface, knowledge base, an inference engine, explanation capability, and a

knowledge acquisition system (Trippi and Lee, 1996). The knowledge base consists of

facts, rules, and heuristic knowledge supplied by an expert who may be assisted in this task

by a knowledge engineer. Knowledge representation formalizes and organizes the

knowledge using IF-THEN production rules (Feigenbaum et al., 1988). Other

representations of knowledge, such as frames, may be used.

Figure 2.1: Basic Structure of an Expert System.

The inference engine uses the knowledge base plus facts provided by the user to

draw inferences in making a recommendation. The system can chain the IF-THEN rules

together from a set of initial conditions moving to a conclusion. This approach to problem

solving is called forward chaining. If the conclusion is known but the path to that

conclusion is not known, then reasoning backwards, or backward chaining, is used

(Feigenbaum et al., 1988).
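Forward chaining as described above can be sketched as a loop that keeps firing IF-THEN rules until no new fact is derived. This is a minimal illustration, not the inference engine of any particular expert system; the rule contents and fact names below are hypothetical.

```python
def forward_chain(rules, facts):
    """Fire IF-THEN rules until no new fact can be derived.

    rules: list of (premises, conclusion) pairs, where premises is a set of facts.
    facts: iterable of initially known facts supplied by the user.
    Returns the set of all facts that can be inferred.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all of its IF-conditions are already known.
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known

# Hypothetical knowledge base of production rules.
RULES = [
    ({"low_expenses", "high_return"}, "attractive_fund"),
    ({"attractive_fund", "low_risk"}, "recommend_buy"),
]

derived = forward_chain(RULES, {"low_expenses", "high_return", "low_risk"})
```

Here the first rule establishes `attractive_fund`, which in turn lets the second rule conclude `recommend_buy`. Backward chaining would instead start from `recommend_buy` and work back through the rules to the facts that must hold.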

Because an expert system uses uncertain or heuristic knowledge, its credibility is

often in question. The explanation capability is available to explain to the user how a

particular fact was inferred or why a particular question was asked. This capability can be

used to find incorrect rules in the knowledge base (Feigenbaum et al., 1988).

2.1.2 Four Perspectives on Learning

With this brief overview of AI research and learning, it is now important to turn to

the four perspectives on learning itself. Simon defined learning "as any process by which a

system improves its performance" (Cohen and Feigenbaum, 1982, p. 326). This assumes

that the system has a task that it is attempting to perform and it may improve its

performance in two ways: applying new methods and knowledge, or improving existing

methods and knowledge.

Expert systems researchers take a more limited view of learning by saying it is "the

acquisition of explicit knowledge" (Cohen and Feigenbaum, 1982). Expert systems usually

represent knowledge as a collection of rules and this viewpoint means that acquired

knowledge should be explicit so that it can be verified, modified, and explained.

A third view is that learning is skill acquisition. Researchers in AI and cognitive

psychology have sought to understand the kinds of knowledge needed to perform a task

skillfully.

A fourth view of learning comes from the collective fields of science and focuses on

theory formation, hypothesis formation, and inductive inference.

Simon's perspective of learning has been the most useful for machine learning

development and Cohen and Feigenbaum (1982) have modeled a learning system consisting

of the environment, a learning element, a knowledge base, and a performance element.

[Figure: Environment -> Learning Element -> Knowledge Base -> Performance Element]

Figure 2.2: A Simple Model of Learning Systems.

The environment supplies some information to the learning element, the learning

element uses this information to make improvements in an explicit knowledge base, and the

performance element uses the knowledge base to perform its task. Information gained

during attempts to perform the task can serve as feedback to the learning element. This

simple model allows us to classify learning systems according to how they fit into these four

functional elements.

From these four perspectives and with the availability of a learning model, AI

researchers have developed four learning situations: rote learning, learning by being told,

learning from examples or induction, and learning by analogy.

2.1.2.1 Rote learning

Rote learning is memorization of the problem and the solution. New knowledge is

saved to be retrieved later. However, rote learning is useful only if it takes less time to

retrieve the desired item than it does to recompute it. Rote learning is not very useful in a

dynamic environment since a basic assumption is that information acquired today will be

valid in the future (Cohen and Feigenbaum, 1982).
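The trade-off described above, storing a result so that retrieval is cheaper than recomputation, can be sketched with a minimax search that memorizes every position it evaluates. This is an illustrative sketch only: the toy take-away game (players alternately remove one to three stones, and whoever takes the last stone wins) and all names are assumptions, not taken from any system discussed here.

```python
def make_rote_player():
    """Minimax for a toy take-away game with a rote memory of evaluated positions."""
    memory = {}              # position -> backed-up minimax value
    stats = {"computed": 0}  # how many positions had to be searched

    def value(stones):
        """Return +1 if the player to move can force a win, -1 otherwise."""
        if stones in memory:          # rote recall: no search needed
            return memory[stones]
        stats["computed"] += 1
        if stones == 0:
            result = -1               # the opponent just took the last stone
        else:
            # A move is as good for us as the resulting position is bad for the opponent.
            result = max(-value(stones - take)
                         for take in (1, 2, 3) if take <= stones)
        memory[stones] = result       # memorize the backed-up value
        return result

    return value, memory, stats

value, memory, stats = make_rote_player()
first = value(20)                     # searches positions 0..20 once each
searched_once = stats["computed"]
second = value(20)                    # answered from memory, nothing recomputed
```

The stored values remain useful only because the rules of the game do not change, which mirrors the assumption that information acquired today will still be valid in the future.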

An example of a rote learning system is Samuel's Checkers Player that evaluated

possible moves by conducting a minimax game-tree search and was able to improve its

performance by memorizing every board position it evaluated. Cohen and Feigenbaum

(1982) describe how the system could not search the 10^40 possible moves in checkers and

evaluated just a few moves into the future, choosing the move that would lead to the best

position. The look-ahead search portion of Samuel's program served as the environment. It

supplied the learning element with board positions and their backed-up minimax values.

The learning element simply stored these board positions and indexed them for rapid

retrieval. The program became capable of playing a very good opening game. Rote

learning did not improve the middle game since the number of possible moves was greater.

At the end game, the system would wander since each possible solution, winning the game,

performance. Research on advice-taking systems has followed two major paths: 1) systems

that accept abstract, high-level advice and convert it into rules to guide a performance

element, and 2) systems that develop sophisticated tools that make it easier for the expert to

transform their own expertise into rules. Five processes identified to convert expert advice

into program performance were as follows:

1. Request advice from the expert,

2. Interpret or assimilate the advice into an internal representation,

3. Operationalize or convert the advice into a usable form,

4. Integrate advice correctly into the knowledge base, and

5. Evaluate the resulting actions of the performance element.

The principal shortcoming of learning by taking advice was that the various methods were

quite specific to the task and generalization would require substantial effort.

This approach was used in building knowledge-based expert systems such as

MYCIN, which acted as a medical consultant system aiding in the diagnosis of patients with

bacteremia or meningitis infections (Barr and Feigenbaum, 1981). The system carried on an

interactive dialogue with a physician and was capable of explaining its reasoning. MYCIN

had a knowledge acquisition subsystem, TEIRESIAS, which helped expert physicians

expand or modify the rule base.

2.1.2.3 Learning by analogy

A third approach to learning is by analogy. If a system has an analogous knowledge

base, it may be able to improve its performance on a related task by recognizing the

analogies and transferring relevant knowledge to another knowledge base specific to the

task (Cohen and Feigenbaum, 1982). An example of this approach in actual use is the AM

computer program written by Douglas Lenat that discovers concepts in elementary

mathematics and set theory. In searching the rule space, AM may employ one of 40

heuristics described as reasoning by analogy. Cohen and Feigenbaum (1982) reported little

research in this area.

2.1.2.4 Learning by example

Learning by example, or inductive learning, requires a program to reason from

specific instances to general rules that can guide the actions of the performance element

(Cohen and Feigenbaum, 1982). The researcher presents the learning element with very

low-level information, in the form of a specific situation, and the appropriate behavior for the

performance element in that situation. The program generalizes this information to obtain

general rules of behavior. An important early paper on induction described the two-space

view of learning from examples:

Simon and Lea (1974)...describe the problem of learning from examples as
the problem of using training instances, selected from some space of possible
instances, to guide a search for general rules. They call the space of possible
training instances the instance space and the space of possible general rules
the rule space. Furthermore, Simon and Lea point out that an intelligent
program might select its own training instances by actively searching the
instance space in order to resolve some ambiguity about the rules in the rule
space (Cohen and Feigenbaum, 1982, p. 360).

Simon and Lea viewed the learning system as moving back and forth between an instance

space and a rule space until it converged on the desired rule.

Many different approaches to learning-by-example, such as neural networks and

genetic algorithms, have been developed and used in financial applications. In the

remainder of this chapter we will review the use of AI in financial applications by

discussing the use of expert systems and describing neural networks, genetic algorithms, and

inductive learning systems.

2.2 AI and Financial Applications

2.2.1 Expert Systems

We have previously defined expert systems in 2.1.1. Artificial intelligence has been

applied to business and finance since the early 1980s, starting with expert systems

(Schreiber, 1984). Expert systems were used for production, management, sales, and

finance. For example, an investment banking firm was using an expert system in its

international brokerage operations to manage foreign institutional portfolios (Wilson and

Koehler, 1986). Hansen and Messier (1986) suggested the use of an expert system for

auditing advanced computer systems while Culbertson (1987) provided an overview

showing how expert systems could also be used in accounting. Sena and Smith

(1987) developed an expert system that asked questions about oil company financial

statements and made a judgement about whether the statements were within industry

norms.

By 1988, expert systems were being used in a number of companies. Texas

Instruments, Du Pont, IBM, American Express, Canon, Fujitsu, and Northrop were

showcased in The Rise of the Expert Company (Feigenbaum et al., 1988). Expert

systems could provide internal cost savings, improve product quality control, improve the

consistency of decision making, preserve knowledge, and restructure business to enlarge

customer choice. This book reported on 139 expert systems in use at a variety of

companies in the agriculture, communications, computers, construction, financial,

manufacturing, mining, medical, and transportation industries. The financial applications

included internal audit risk assessment systems, sales tax advising, risk analysis for

underwriting, portfolio management, credit authorization, income tax advising, financial

statement analysis, mortgage loan analysis, and foreign exchange options analysis.

The use of expert systems for commercial loan decisions is described in Duchessi

et al. (1988). The late 1980s saw a rise in bankruptcies in the Savings & Loan industry,

so expert systems were used to conduct financial analyses of potential failures in that industry

(Elmer and Borowski, 1988). Shaw and Gentry (1988) described an improvement in the

design of expert systems: the ability to enhance performance by reacting to a changing

environment. Their MARBLE system used an inductive learning capability to update the

80 decision rules that evaluated business loans.

Expert systems were also moving into the area of stock trading. Laurance (1988)

described a trading expert system using 30 rules and a buy-and-hold strategy that was

superior to the Standard & Poor's 500 Index, a benchmark for manager performance.

information based on current market conditions. In Holsapple et al. (1988) it was pointed

out that the business world was slow to accept expert systems. Notably, unsatisfactory

results were achieved in finance due to unrealistic expectations and managerial mistakes.

The authors went on to mention that the current technology was inadequate for

applications requiring insight, creativity, and intuition; however, it could be used for

financial decision support systems.

Recognizing that arbitrageurs could take advantage of the discrepancies between

the futures market and the stock market, an expert system for program trading was

proposed (Chen and Liang, 1989). This system had the ability to update its rule base with

a learning mechanism and human observations about the markets. Miller (1990) provided

an overview of financial expert systems technology, including a discussion of the

problem domain, heuristics, and the architecture, and discussed the rule structure and

logic of an expert system.

Expert systems have also been applied to assessing audit risks and evaluating

client economic performance (Graham et al., 1991). Coopers & Lybrand developed a

system to assist in audit planning. The field of mortgage guaranty insurance underwriting

has also been able to harness expert systems (Gluch-Rucys and Walker, 1991). United

Guaranty Residential Insurance Co. implemented an expert system that could assist

underwriters with 75% of mortgage insurance applications. The determination of

corporate tax status and liabilities is another application of expert systems to finance (Jih

and Patterson, 1992). The STAX system used the Guru expert system shell to calculate

taxes and determine the tax consequences of different corporate filing statuses.

2.2.2 Neural Networks and Financial Applications

Neural networks originated as a model of how the brain works:

McCulloch and Pitts formulated the first neural network model [McCulloch
43]. It featured digital neurons but no ability to learn. The work of another
psychologist, Donald Hebb, introduced the idea of Hebbian learning (as
detailed in Organization of Behavior [Hebb 49]), which states that changes
in synaptic strengths (connections between neurons) are proportional to the
activations of the neurons. This was a formal basis for the creation of neural
networks with the ability to learn (Blum, 1992, p. 4).

Real neurons, such as the one shown in Figure 2.3, consist of a cell body, one axon (a protuberance that

delivers the neuron's output to connections with other neurons), and many dendrites which

receive inputs from axons of other neurons (Winston, 1992). A neuron does nothing unless

the collective influence of all its inputs reaches a threshold level. When that happens, the

neuron produces a full-strength output in the form of an electrical pulse that travels down the

axon to the next cell, which is separated from it by the synapse. Whenever this happens, the

neuron is said to fire. Stimulation at some synapses may cause a neuron to fire while

stimulation at others may discourage the neuron from firing. There is mounting evidence

that learning takes place near synapses (Winston, 1992).

Figure 2.3: Real Neuron. (Labeled parts: dendrites, nucleus, axon, cell body.)

In the neural network, multipliers, adders, and thresholds replace the neuron

(Winston, 1992). Neural networks do not model much of the character of real neurons. A

simulated neuron simply adds up a weighted sum of its inputs and fires whenever the

threshold level is reached.

The development of neural networks was seriously derailed in 1969 by the

publication of Perceptrons by Marvin Minsky and Seymour Papert which pointed out

limitations in the prevailing memory model at that time (Blum, 1992). It wasn't until 1986

with the development of backpropagation (explained later on), permitting the training of

multi-layer neural networks, that neural networks became a practical tool for solving

problems that would be quite difficult using conventional computer science techniques.

Most neural networks consist of an input layer, a hidden or processing layer, and an

output layer. The hidden layer may itself comprise more than one layer. Figure 2.4 is an example

of a multi-layer neural network. In this figure, x_i, h_i, and o_i represent unit activation levels

of input, hidden, and output units.

Figure 2.4: A Multilayer Neural Network.

In Figure 2.5, we show a simple neural network. Input signals are received from the

node's links, assigned weights (the ws), and added. The value of the node, Y, is the sum of

all the weighted input signals. This value is compared with the threshold activation level of

the node. When the value meets the threshold level, the node transmits a signal to its

neighboring nodes.

Figure 2.5: An Artificial Neuron.
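
The threshold behavior described above can be illustrated with a minimal sketch. The function name, the weights, and the threshold value below are assumptions chosen for illustration; they are not taken from Figure 2.5.

```python
def neuron_output(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs meets the threshold, else 0."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Y is the sum of all weighted input signals.
    y = sum(w * x for w, x in zip(weights, inputs))
    # The node transmits a signal only when Y meets the threshold level.
    return 1 if y >= threshold else 0


# Illustrative values only: with these weights the node acts like a logical AND.
and_weights = [0.6, 0.6]
and_threshold = 1.0
outputs = [neuron_output([a, b], and_weights, and_threshold)
           for a in (0, 1) for b in (0, 1)]
print(outputs)  # [0, 0, 0, 1]
```

Only the input pair (1, 1) produces a weighted sum (1.2) that meets the threshold, so only that pair causes the node to fire.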

Each unit in one layer is connected in the forward direction to every unit in the next

layer. Activations flow from the input layer through the hidden layer, then on to the output

layer. The knowledge of the network is encoded in the weights on connections between

units. The existence of hidden units allows the network to develop complex feature

detectors, or internal representations (Rich and Knight, 1991).

Neural networks learn by supervised training and self-organizing training (Winston,

1992). In supervised training, which we explain here, the network is given a set of examples

(x, y), where y is the correct response for x.

In a multilayered neural network, the output nodes detect the errors. These errors

are propagated back to the nodes in the previous layer, and the process is repeated until the

input layer is reached. An effective algorithm that learns in this fashion (adjusting the

weights incrementally toward reducing the errors to within some threshold) is the

backpropagation algorithm (Rumelhart and McClelland, 1986). The discovery of this

algorithm was largely responsible for the renewal of interest in neural networks in the mid-

1980s, after a decade of dormancy (Trippi and Lee, 1996).

The backpropagation neural network typically starts out with a random set of

weights. The network adjusts its weights each time it sees an example (x,y). Each example

requires two stages: a forward pass and a backward pass. The forward pass involves

presenting a sample input, x, to the network and letting activations flow until they reach the

output layer. During the backward pass, the network's actual output from the forward pass

is compared with the correct response, y, and error estimates are computed for the output

units. The weights connected to the output units are adjusted in order to reduce those errors

(Rich and Knight, 1991).
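
The forward and backward passes can be sketched as follows. This is a minimal illustration rather than the implementation used in any cited study: the class name, the sigmoid activation, the squared-error measure, the learning rate, and the toy training set (logical OR) are all assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNetwork:
    """One hidden layer and one output unit, all with sigmoid activations."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # Start from a random set of weights; the last entry in each row is a bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        self.w_output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        """Forward pass: activations flow from the inputs to the output unit."""
        xs = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xs)))
                  for row in self.w_hidden]
        hs = hidden + [1.0]
        output = sigmoid(sum(w * v for w, v in zip(self.w_output, hs)))
        return hidden, output

    def train_example(self, x, y, rate=0.5):
        """Backward pass: compare the output with y and adjust the weights."""
        hidden, output = self.forward(x)
        # Error estimate for the output unit.
        delta_out = (y - output) * output * (1.0 - output)
        # Propagate the error back to each hidden unit (uses the old weights).
        delta_hidden = [delta_out * self.w_output[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, v in enumerate(hidden + [1.0]):
            self.w_output[j] += rate * delta_out * v
        xs = list(x) + [1.0]
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(xs):
                row[i] += rate * delta_hidden[j] * v


def total_error(net, examples):
    return sum((y - net.forward(x)[1]) ** 2 for x, y in examples)


# Toy training set of (x, y) examples: logical OR.
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
net = TinyNetwork(n_inputs=2, n_hidden=2)
error_before = total_error(net, examples)
for _ in range(2000):
    for x, y in examples:
        net.train_example(x, y)
error_after = total_error(net, examples)
```

Comparing `error_before` with `error_after` shows the incremental weight adjustments reducing the squared error on the training set.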

1. Neural networks excel at taking data presented to them and determining

what data are relevant. Irrelevant data simply have such low connection

strength to all of the output neurons that they have no effect.

2. Because of the abundance of input factors, noise in the data is not as much

of a problem with neural networks.

3. Each synapse in a neural net model can be its own processor. There are no

time dependencies among synapses in the same layer. Thus, neural networks

exhibit inherent parallelism.

4. Training may require thousands of evolutions.

5. Backpropagation, which uses gradient descent, can get stuck in local

minima or become unstable.

6. Excess weights may lead to overfitting of the data.

Some consider neural network training to be an art that requires trial-and-error (Winston,

1992).

The use of neural networks for financial applications occurred after the use of

expert systems in this area. Dutta and Shekhar (1988) proposed using a neural network to

predict bond ratings. They trained a neural network using ten features they felt were

representative of bond ratings and had thirty bond issues in the training set. They tested

the network against seventeen bonds and the neural network outperformed regression

analysis.

Miller (1990) devoted no more than six pages in his book to explaining the neural

network concept without identifying possible applications. Hawley et al. (1990) outlined

the advantages and disadvantages of neural networks vs. expert systems. They also

included potential applications such as: financial simulation, financial forecasting,

financial valuation, assessing bankruptcy risk, portfolio management, pricing Initial

Public Offerings, identifying arbitrage opportunities, performing technical analysis to

predict the short-term movements in stock prices, and performing fundamental analysis to

evaluate stocks.

Coats and Fant (1991) used neural networks to forecast financial distress in

businesses. The neural network correctly forecast 91% of the distressed firms as

distressed and 96% of the healthy firms as healthy. This is in contrast to multiple

discriminant analysis correctly identifying 72% of the distressed firms and 89% of the

healthy firms. Neural networks have also been used in credit scoring, the

procedures used to grant or deny credit (Jensen, 1992). Applicant characteristics were the input nodes

and three categories of payment history were the output nodes. The neural network was

trained with 125 credit applicants whose loan outcomes were known. Correct

classifications were made on 76% of the testing sample. Neural networks have also been

used in predicting savings and loan company failures (Salchenberger et al., 1992) and

bank failures (Tam and Kiang, 1992).

Trippi and DeSieno (1992) described trading Standard and Poor's 500 index

futures with a neural network. Their system consisted of several trained networks plus a

set of rules for combining network results to generate a composite recommendation for

the current day's position. The training period spanned 1,168 days from January 1986 to

June 1990. The test period covered 106 days from December 1990 through May 1991

and the system outperformed a passive investment strategy in the index.

Kryzanowski et al. (1993) provided a neural network with historical and current

accounting data and macroeconomic data to discriminate between stocks having superior

future returns and inferior future returns. On 149 test cases the system correctly

classified 66.4% of the stocks.

Piramuthu et al. (1993) studied ways of improving the performance of the

backpropagation algorithm. They noted that backpropagation uses steepest-gradient

search for hill climbing. In essence, it is a linear method and they developed a quadratic

method to improve convergence of the algorithm. They compared the results of

predicting bankruptcy by several types of neural networks to the performance of NEWQ

(Hansen et al., 1993), ID3, and Probit. The training set consisted of 56 randomly selected

examples and the testing set was the remaining 46 examples. Overall, the

backpropagation neural network algorithms performed better than ID3, NEWQ, and

Probit, although the run times of the neural networks were much longer than those of the other

methods.

Yoon et al. (1994) brought together neural networks and the rule-based expert

system. The motivation for this study was to highlight the advantages and overcome the

disadvantages of the two approaches used separately. The primary advantage of rule-

based expert systems was the readability of the process since it uses explicit rules. A

disadvantage to developing such an expert system is the difficulty of developing those

rules. The authors used an artificial neural network as the knowledge base of the expert

system. The connection weights of the neural network specify the decision rules in an

implicit manner. The explanation module is a rule-based system in which knowledge

implicitly encoded in the neural network has been translated into an "IF-THEN" format.

The training and testing sets consisted of 76 companies each. The neural network system

achieved a correct classification rate of 76%. In comparison, a Multivariate Discriminant

Analysis model correctly classified only 63% of the data.

Hutchinson et al. (1994) proposed using a neural network for pricing and hedging

derivative securities. They took as inputs the primary economic variables that influenced

the derivative's price and defined the derivative price to be the output into which the

neural network maps the inputs. When properly trained, the network "becomes" the

derivative pricing formula. The neural network would provide a nonparametric pricing

method. It was adaptive and responded to structural changes in the data-generating

processes in ways that parametric models could not, and it was flexible enough to

encompass a wide range of derivative securities. The disadvantage of this approach was

that large quantities of data were required, meaning that this would be inappropriate for

thinly traded derivatives or new instruments. Overall, the system achieved error levels

similar to those of the Black-Scholes formula (Black and Scholes, 1973) used for pricing

the derivatives.

Trading on the Edge (Deboeck, 1994) reviewed the use of neural networks for

securities trading. It provided an overview of neural network techniques, explained the

need for pre-processing financial data, discussed using neural networks for predicting the

direction of the Tokyo stock exchange, and described a neural network for trading U.S.

Treasury notes. The Tokyo stock exchange neural network had a 62.1% correct

prediction rate after being put in service in September 1989. A major benefit was that it

reduced the number of trades needed to implement a hedging position, which saved on

commissions. The Treasury notes neural network was evaluated based on the number of

recommended trades, the average profit and loss per trade, and the maximum gains,

losses, and drawdowns. In each case the system provided a higher average profit than

that achieved during the same period. It was noted, however, that the system performed

better when trained on a specific two-year period than when trained on data from a longer

period. We will mention this book again in our discussion of genetic algorithms.

Jain and Nag (1995) developed a neural network for pricing initial public

offerings (IPO). They noted that a vast body of empirical evidence suggested that such

offerings were underpriced by as much as 15% and this represented a huge loss to the

issuer. In developing the model, 276 new issues were used for training the network and

276 new issues were used for testing the network. They used 11 input features

representing a broad spectrum of financial indicators: the reputation of the investment

banker, the log of the gross proceeds, the extent of ownership retained by the original

entrepreneurs, the inverse of sales in millions of dollars in the year prior to the IPO,

capital expenditures over assets, capital expenditures over sales, operating return over

assets, operating return over sales, operating cash flow over assets, operating cash flow

over sales, and asset turnover. The results showed that the neural network generated

market price distributions that outperformed the pricing of investment bankers.

Neural networks have also been used to predict the targets of investigation for

fraudulent financial reporting by the Securities and Exchange Commission (Kwon and

Feroz, 1996). The network outperformed Logit and the study showed that non-financial

information could provide more predictive information than the financial information

alone.

Another study compared the performance of neural networks to LDA and Logit

scoring models for the credit union environment (Desai et al., 1996). The study

determined that neural networks outperformed the other methods in correctly classifying

the percentage of bad loans. If the performance measure was correctly classifying good

and bad loans, then logistic regression was comparable to the neural network.

Hobbs and Bourbakis (1996) studied the success of a neural network computer

model to predict the price of a stock, given the fluctuations in the rest of the market that

day. Based on the neural network's prediction, the program then measured its success by

simulating buying or selling that stock, based on whether the market's price was

determined to be overvalued or undervalued. The program consistently averaged over a

20% annual return and was tested over six years with several stocks.

Two books that focused on the use of neural networks for investing were

Trippi and Turban (1996b), a collection of journal articles written by others from 1988 to

1995, and Trippi and Lee (1996), a revision of an earlier book they published in 1992. The latter

book reviews modern portfolio theory, provides an overview of AI in investment

management, discusses machine learning and neural networks, and describes integrating

knowledge with databases.

A final study concerned using a neural network to forecast mutual fund end-of-year

net asset value (Chiang et al., 1996). Fifteen economic variables for 101 U.S. mutual funds

were identified as input to three models: a neural network, a linear regression model, and a

nonlinear regression model. The models were developed using a dataset covering the

six-year period from 1981 to 1986 and were evaluated using the actual 1986 Net Asset Values.

The predictions by the neural network had the lowest error rate of the three models.

2.2.3 Genetic Algorithms and Financial Applications

Genetic Algorithms (GAs) are search algorithms based on the mechanics of natural

selection and natural genetics (Holland, 1975). They have been shown to be effective at

exploring large and complex spaces in an adaptive way, guided by the equivalent biological

mechanisms of reproduction, crossover, and mutation. GAs have been used for machine

learning applications, including classification and prediction tasks, to evolve weights for

neural networks, and rules for learning classifier systems (Mitchell, 1997).

Genetic Algorithms combine survival-of-the-fittest among string structures with a

structured, yet randomized, information exchange to form a search algorithm with some of

the innovative flair of human search. The strings are referred to as chromosomes and they

are composed of genes (a feature on the chromosome) which have values referred to as

alleles (Goldberg, 1989).

In every generation, three operators create a new set of chromosomes: selection,

crossover, and mutation. The selection operator selects chromosomes in the population for

reproduction based on a fitness function that assigns a score (fitness) to each chromosome in

the current population. The fitness of a chromosome depends on how well that chromosome

solves the problem at hand (Mitchell, 1997). The fitter the chromosome, the greater the

probability for it to be selected to reproduce. The crossover operator randomly chooses a

locus and exchanges the chromosomal subsequences before and after that locus to create

two offspring. The mutation operator randomly flips some of the bits in a chromosome.

Mutation can occur at each bit position with some very small probability. While

randomized, Genetic Algorithms are no simple random walk. They efficiently exploit

historical information to speculate on new search points with expected improved

performance (Goldberg, 1989).

By way of explanation, we provide a simple Genetic Algorithm with a fitness

function that we want to maximize, for example, the real-valued one dimensional function:

$$f(y) = y + |\sin(32y)|, \quad 0 \le y < \pi$$

(Riolo, 1992). The candidate solutions are values of y, which are encoded as bit strings

representing real numbers. The fitness calculation translates a given bit string x into a real

number y and then evaluates the function at that value (Mitchell, 1997). The fitness of a

string is the function value at that point.
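
The fitness calculation just described can be sketched as follows. The linear scaling of the bit string into the interval [0, pi) and the function names are assumptions made for illustration; the source specifies only that a bit string is translated into a real number y before the function is evaluated.

```python
import math


def decode(bits):
    """Map a bit string to a real number y in [0, pi) (assumed encoding)."""
    return int(bits, 2) / (2 ** len(bits)) * math.pi


def fitness(bits):
    """Fitness of a chromosome: the function value f(y) = y + |sin(32y)|."""
    y = decode(bits)
    return y + abs(math.sin(32 * y))


# An all-zero string decodes to y = 0, whose fitness is 0.
print(fitness("00000000"))  # 0.0
```

Longer bit strings give a finer grid of candidate y values over the same interval.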

Reproduction step. Individual strings are copied according to their objective function

values, f(y). Copying strings according to their fitness value means that strings with a

higher value have a higher probability of contributing one or more offspring in the next

generation. This operator is an artificial version of natural selection (Goldberg, 1989).

Crossover step. After reproduction, crossover may proceed in two steps. First, members

of the newly reproduced strings in the mating pool are mated at random. Second, each pair

of strings could undergo crossing over as shown below; however, not all strings mate.

Consider strings A1 and A2:

A1 = 1011|0101
A2 = 1110|0000

The separator | indicates the uniformly, randomly selected crossover site. The resulting

crossover yields two new strings, where the prime (') means the strings are part of the new

generation:

A'1 = 10110000
A'2 = 11100101

The mechanics of reproduction and crossover are surprisingly simple, involving random

number generation, string copies, and some partial string exchanges.

Mutation step. Mutation has been referred to as bit flipping. This operator randomly

changes 0s to 1s, and vice versa. When used sparingly, as recommended, with reproduction

and crossover, it is an insurance policy against premature loss of important string values

(Goldberg, 1989).
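
The three operators can be sketched in a few lines. The function names and the use of roulette-wheel (fitness-proportionate) selection are illustrative assumptions; the crossover call reproduces the worked example for the strings 10110101 and 11100000 with the crossover site after the fourth bit.

```python
import random


def select(population, fitness_values, rng):
    """Fitness-proportionate selection of one string (weights must not all be 0)."""
    return rng.choices(population, weights=fitness_values, k=1)[0]


def crossover(parent_a, parent_b, site):
    """Single-point crossover: swap the substrings after the crossover site."""
    child_a = parent_a[:site] + parent_b[site:]
    child_b = parent_b[:site] + parent_a[site:]
    return child_a, child_b


def mutate(chromosome, rate, rng):
    """Flip each bit independently with a small probability `rate`."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[bit] if rng.random() < rate else bit
                   for bit in chromosome)


# The worked crossover example, with the site after bit 4.
a1, a2 = "10110101", "11100000"
new_a1, new_a2 = crossover(a1, a2, 4)
print(new_a1, new_a2)  # 10110000 11100101
```

In practice the crossover site would be drawn at random, for example with `rng.randrange(1, len(parent_a))`, and the mutation rate would be kept very small.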

With the production of a new generation, the system evaluates the fitness function

for the maximum fitness of the artificial gene pool. A Genetic Algorithm is typically

iterated for anywhere from 50 to 500 or more generations (Mitchell, 1997). One stopping

criterion for GAs is convergence of the chromosome population, defined as when 95% of the

chromosomes in the population all contain the same value or, more loosely, when the GA

has stopped finding new, better solutions (Heitkoetter and Beasley, 1997). Other stopping

criteria concern the utilization of resources (computer time, etc.). The entire set of

generations is called a run and, at the end of a run, there are often one or more highly fit

chromosomes in the population.
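
A complete run, including both kinds of stopping criteria, might look like the sketch below. All parameter values (population size, string length, generation limit, crossover and mutation rates) and the bit-string encoding are assumptions for illustration, and the fitness function is the example function f(y) = y + |sin(32y)| with y scaled into [0, pi).

```python
import math
import random


def fitness(bits):
    """f(y) = y + |sin(32y)| with the bit string scaled into [0, pi)."""
    y = int(bits, 2) / (2 ** len(bits)) * math.pi
    return y + abs(math.sin(32 * y))


def converged(population, share=0.95):
    """True when at least `share` of the chromosomes hold the same value."""
    most_common = max(population.count(c) for c in set(population))
    return most_common >= share * len(population)


def flip_bits(chromosome, rate, rng):
    """Mutation: flip each bit independently with probability `rate`."""
    return "".join(("1" if b == "0" else "0") if rng.random() < rate else b
                   for b in chromosome)


def run_ga(pop_size=40, length=12, max_generations=200,
           crossover_rate=0.6, mutation_rate=0.002, seed=0):
    rng = random.Random(seed)
    population = ["".join(rng.choice("01") for _ in range(length))
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    generations_run = 0
    for _ in range(max_generations):      # resource-based stopping criterion
        # A tiny constant keeps every selection weight strictly positive.
        weights = [fitness(c) + 1e-9 for c in population]
        offspring = []
        while len(offspring) < pop_size:
            a, b = rng.choices(population, weights=weights, k=2)
            if rng.random() < crossover_rate:
                site = rng.randrange(1, length)
                a, b = a[:site] + b[site:], b[:site] + a[site:]
            offspring.extend(flip_bits(c, mutation_rate, rng) for c in (a, b))
        population = offspring[:pop_size]
        best = max(population + [best], key=fitness)
        generations_run += 1
        if converged(population):         # convergence stopping criterion
            break
    return best, generations_run


best, generations_run = run_ga()
```

The function returns the fittest chromosome seen during the run together with the number of generations actually executed.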

Goldberg (1989) describes the development of a classifier system using genetic

algorithms. The backbone of a classifier system is its rule and message system, a type of

production or rule-based system. The rules are of the form if <condition> then <action>;

however, in classifier systems, conditions and actions are restricted to be fixed-length

strings. Classifier systems use parallel rule activation, whereas expert systems use serial

rule activation.

Mitchell (1997) noted that Genetic Algorithms are used for evolving rule-based

systems, e.g., classifier systems, in which incremental learning (and remembering what has

already been learned) is important and in which members of the population collectively

solve the problem at hand. This is often accomplished using the Steady-State population

selection operator in which only a few chromosomes of the least fit individuals are replaced

by offspring resulting from crossover and mutation of the fittest individuals.

GAs have been proposed to work in conjunction with other machine learning

systems, such as neural networks (Kuncheva, 1993). In this sketch of an AI application, the

neural network is set up to provide a trading recommendation, for example, stay long, stay

short, or stay out of the market. The Genetic Algorithm is used to estimate the weights for a

neural network that optimizes the user-defined performance objectives and meets user-

defined constraints or risk limits. For example, they used a fitness function of the average

annual return achieved over three years.

A GA was applied to a portfolio merging problem of maximizing the return/risk

ratio with the added constraint of satisficing expected return (Edelson and Gargano, 1995).

The original problem was recast as a goal programming problem so that GAs could be used.

The results obtained with the GAs were comparable to those calculated by quadratic

programming techniques. The use of the goal programming conversion reduced the number

of generations to obtain convergence from 6,527 down to 780.

Mahfoud and Mani (1995) developed a procedure for extending GAs from

optimization problems to classification and prediction so that they could predict individual

stock performance. They describe the use of a niching method that permits the GA to

converge around multiple solutions or niches, instead of the traditional single point in the

solution space. The analogy in the financial forecasting case is that different rules within the

same GA population can perform forecasting for different sets of market and individual

company conditions, contexts, or situations. The niching method was used to predict the

direction of a randomly selected MidCap stock from the Standard & Poor's 400. The GA

correctly predicted the stock's direction relative to the market 47.6% of the time, produced

no prediction 45.8% of the time, and incorrectly predicted the direction relative to the

market 6.6% of the time. The no prediction state is equivalent to the stock being equally

likely to go in either direction.

Trippi and Lee (1996) describe a Genetic Algorithm used for a stock market

trading rule generation system. Buy and sell rules were represented by 20-element bit

strings to examine a solution space of 554,496 possible combinations. The GA was run

using a crossover rate of 0.6 and a mutation rate of 0.002. In 10 experiments of 10 trials

each using different starting strings, the average monthly returns of the best rule parameters

ranged from 6.04 to 7.52 percent, ignoring transaction costs. Although these results were

converged upon quickly by the GA, they did not differ much from optimal rules that were

obtained by a time-consuming exhaustive search.

Genetic Algorithms were also used to optimize the topology of a neural network

that predicted a stock's systematic risk, using the financial statements of 67 German

corporations from the period 1967 to 1986 (Wittkemper and Steiner, 1996).

Additionally, in two studies related to mutual funds but not rating systems, GAs were

used to simulate adaptive learning in a simple static financial market designed to exhibit

behavior very similar to that of mutual fund investors (Lettau, 1994, 1997).

2.3 The C4.5 Learning System

Research on learning is composed of diverse subfields. At one extreme, adaptive

systems monitor their own performance and attempt to improve it by adjusting internal

parameters. A quite different approach sees learning as the acquisition of structured

knowledge in the form of concepts, or classification rules (Quinlan, 1986). A primary task

studied in machine learning has been developing classification rules from examples (also

called supervised learning). In this task, a learning algorithm receives a set of training

examples, each labeled as belonging to a particular class. The goal of the algorithm is to

produce a classification rule for correctly assigning new examples to these classes. For

instance, examples could be a vector of descriptive values or features of mutual funds. The

classes could be the Morningstar Mutual Fund ratings and the task of the learning system is

to produce a rule (or a set of rules) for predicting with high accuracy the rating for new

mutual funds.

We will focus on the data-driven approach of decision trees, specifically the C4.5

system (Quinlan, 1993). We present a brief history of C4.5, the algorithms used, the

limitations of the system, and examples of its use.

2.3.1 Brief History of C4.5

C4.5 traces its roots back to CLS (Concept Learning System), a learning algorithm

devised by Earl Hunt (Hunt et al., 1966). It solved single-concept learning tasks and used

the learned concepts to classify new examples. CLS constructed a decision tree that

attempted to minimize the cost of classifying an object (Quinlan, 1986). This cost had two

components: the measurement cost of determining the value of property A exhibited by the

object, and the misclassification cost of deciding that the object belongs to class J when its

real class was K.

The immediate predecessor of C4.5 was ID3 and it used a feature vector

representation to describe training examples. A distinguishing aspect of the feature vector is

that it may take on continuous real values as well as discrete symbolic or numeric values

(Cohen and Feigenbaum, 1982). Concepts are represented as decision trees. We classify an

example by starting at the root of the tree and making tests and following branches until a

node is reached that indicates the class. For example, Figure 2.4 shows a decision tree

whose first test is the Expert Opinion on a stock, with symbolic values Good and Bad. We call this node the

root of the decision tree. The tree branches to Price/Earnings (P/E) if the Expert Opinion is

Good and Price/Book (P/B) if the Expert Opinion is Bad. If the Expert Opinion is Good and

the P/E is > 3, then we classify the stock as Expected Return = High. If the P/E of the stock

is < 2, then we classify the stock as Expected Return = Medium. If the Expert Opinion is

Bad and the P/B is < 3, then we classify the stock as Expected Return = Medium; for P/B >

4, then Expected Return = Low.

Figure 2.4: Decision Tree Example.
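The traversal described for Figure 2.4 can be sketched in a few lines of Python. This is only an illustration of following branches from the root to a leaf; the function name and the handling of values that fall between the stated thresholds (for example, a P/E between 2 and 3) are our assumptions, since the text gives no branch for those ranges.

```python
from typing import Optional


def classify_stock(expert_opinion: str, pe: float, pb: float) -> Optional[str]:
    """Walk the example tree from the root (Expert Opinion) down to a leaf.

    Returns the Expected Return class, or None when the text defines no branch.
    """
    if expert_opinion == "Good":
        # Good opinion: the next test is on Price/Earnings.
        if pe > 3:
            return "High"
        if pe < 2:
            return "Medium"
    elif expert_opinion == "Bad":
        # Bad opinion: the next test is on Price/Book.
        if pb < 3:
            return "Medium"
        if pb > 4:
            return "Low"
    # Assumption: ranges not covered by the description are left unclassified.
    return None


print(classify_stock("Good", pe=3.5, pb=1.0))
print(classify_stock("Bad", pe=1.0, pb=5.0))
```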

Decision trees are inherently disjunctive, since each branch leaving a decision node

corresponds to a separate disjunctive case. The left-hand side of the decision tree in Figure

2.4 for high expected return is equivalent to the predicate calculus expression:

\[ [\text{Expert Opinion}(x, \text{Good}) \lor \text{Expert Opinion}(x, \text{Bad})] \land [\text{P/E}(x, > 3) \lor \text{P/E}(x, < 2)] \]

Consequently, decision trees can be used to represent disjunctive concepts (Cohen and

Feigenbaum, 1982).

ID3 was designed for the learning situation in which there are many features and the

training set contains many examples, but where a reasonably good decision tree is required

without much computation. It has generally been found to construct simple decision trees,

but the approach it uses cannot guarantee that better trees have not been overlooked

(Quinlan, 1986). This will be discussed in more detail in Section 4.2.

The crux of the problem for ID3 was how to form a decision tree for an arbitrary

collection C of examples. If C was empty or contained only examples of one class, the

simplest decision tree was just a leaf labeled with the class. Otherwise, let T be any test on

an example with possible outcomes {O1, O2, ..., Ow}. Each example in C would give one of

these outcomes for T, so T produced a partition {C1, C2, ..., Cw} of C with Ci containing

those examples having outcome Oi. If each subset Ci was replaced by a decision tree for Ci,

the result would be a decision tree for all of C. Moreover, so long as two or more Ci's are

non-empty, each Ci is smaller than C. In the worst case, this divide-and-conquer strategy

would yield single-example subsets that satisfied the one-class requirement for a leaf. Thus,

if a test could always be found that gave a non-trivial partition of any set of examples, this

procedure could always produce a decision tree that correctly classifies each example in C

(Quinlan, 1986).

The choice of test was crucial for ID3 if the decision tree was to be simple and ID3

used an information-based method that depended on two assumptions. Let C contain p

examples of class P and n of class N. The assumptions were:

(1) Any correct decision tree for C will classify examples in the same proportion

as their representation in C. An arbitrary example will be determined to

belong to class P with probability p/(p+n) and to class N with probability

n/(p+n).

(2) When a decision tree is used to classify an example, it returns a class. A

decision tree can thus be regarded as a source of a message 'P' or 'N', with

the expected information needed to generate this message given by

\[ I(p, n) = -\frac{p}{p+n} \log_2 \frac{p}{p+n} - \frac{n}{p+n} \log_2 \frac{n}{p+n} \]

If feature A with values {A1, A2, ..., Av} is used for the root of the decision tree, it will

partition C into {C1, C2, ..., Cv} where Ci contains those examples in C that have value Ai of

A. Let Ci contain pi examples of class P and ni of class N. The expected information

required for the subtree for Ci is I(pi, ni). The expected information required for the tree

with A as root is then obtained as the weighted average

\[ E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n} I(p_i, n_i) \]

where the weight for the ith branch is the proportion of the examples in C that belong to Ci.

The information gained by branching on A is, therefore

\[ \mathrm{gain}(A) = I(p, n) - E(A) \]

One approach would be to choose a feature to branch on which gains the most information.

ID3 examines all candidate features and chooses A to maximize gain(A), forms the tree as

above, and then uses the same process recursively to form decision trees for the residual

subsets C1, C2, ..., Cv (Quinlan, 1986).
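As a worked illustration of the quantities I(p, n), E(A), and gain(A), the following Python sketch computes them for a two-class problem. The function names and the example counts are hypothetical and chosen only for illustration; the convention that a zero count contributes zero bits is an assumption made to avoid taking the logarithm of zero.

```python
import math


def info(p: int, n: int) -> float:
    """I(p, n): expected bits needed to signal class P or N."""
    total = p + n
    bits = 0.0
    for count in (p, n):
        if count > 0:  # assumption: a zero count contributes 0 bits
            frac = count / total
            bits -= frac * math.log2(frac)
    return bits


def expected_info(branches) -> float:
    """E(A): weighted average of I(p_i, n_i) over branches [(p_i, n_i), ...]."""
    p = sum(pi for pi, _ in branches)
    n = sum(ni for _, ni in branches)
    return sum((pi + ni) / (p + n) * info(pi, ni) for pi, ni in branches)


def gain(branches) -> float:
    """gain(A) = I(p, n) - E(A)."""
    p = sum(pi for pi, _ in branches)
    n = sum(ni for _, ni in branches)
    return info(p, n) - expected_info(branches)


# Hypothetical feature with three values splitting 9 P and 5 N examples.
branches = [(2, 3), (4, 0), (3, 2)]
print(round(info(9, 5), 3), round(expected_info(branches), 3), round(gain(branches), 3))
```

ID3 would evaluate gain(A) in this way for every candidate feature and branch on the one with the largest value.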

The worth of ID3's feature-selecting greedy heuristic can be assessed by how well

the trees express real relationships between class and features as demonstrated by the

accuracy with which they classify examples other than those in the training set. A

straightforward method of assessing this predictive accuracy is to use only part of the given

set of examples as a training set and to check the resulting decision tree on the remainder or

testing set.
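A minimal sketch of this holdout assessment is given below. The split fraction, the random seed, and the helper names are assumptions for illustration, and the stand-in classifier is a hand-written rule, not an induced decision tree.

```python
import random


def holdout_split(examples, test_fraction=0.3, seed=0):
    """Shuffle the examples and return (training set, testing set)."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(classifier, labeled_examples):
    """Fraction of (features, label) pairs that the classifier labels correctly."""
    correct = sum(1 for features, label in labeled_examples
                  if classifier(features) == label)
    return correct / len(labeled_examples)


# Hypothetical data: one numeric feature, class P when the value is even.
examples = [((i,), "P" if i % 2 == 0 else "N") for i in range(10)]
train, test = holdout_split(examples)


def rule(features):
    """Stand-in for a tree that would be induced from the training set."""
    return "P" if features[0] % 2 == 0 else "N"


print(len(train), len(test), accuracy(rule, test))
```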

Quinlan (1986) carried out several experiments to test ID3. In one domain of 1.4

million chess positions, using 49 binary-valued features in the feature vector, the decision

tree correctly classified 84% of the holdout sample. Using simpler domains of the chess

problem, correct classification was 98% of the holdout sample.

2.3.2 C4.5 Algorithms

C4.5 is an improved version of ID3 that provides the researcher with the ability to

prune the decision tree to improve classification of noisy data. It also provides a subsystem

to transform decision trees into classification rules. This system of computer programs

constructs classification models similar to ID3 by discovering and analyzing patterns found

in the examples provided to it. Not all classification tasks lend themselves to this inductive

approach and Quinlan (1993) reviews the essential requirements:

Feature-value description: All information about an example must be

expressible in terms of a fixed collection of properties or features. Each

feature may be either discrete or continuous, but the features used to describe

an example may not vary from one example to another.

Predefined classes: The categories to which the examples are to be assigned

must have been established beforehand. This is the supervised learning

model.

Discrete classes: The classes are sharply delineated. An example belongs to

only one class.

Sufficient data: Inductive generalization proceeds by identifying patterns in

data. The approach fails if valid, robust patterns cannot be distinguished

from chance coincidences. As this differentiation usually depends on

statistical tests of one kind or another, there must be sufficient examples to

allow these tests to be effective.

"Logical" classification models: The programs construct only classifiers that

can be expressed as decision trees or sets of rules. These forms essentially

restrict the description of a class to a logical expression whose primitives are

statements about the values of particular features.

Figure 2.5 presents the schematic diagram of the C4.5 system algorithm (Quinlan et

al., 1987). We will discuss several algorithms concerning the evaluation tests carried out by

C4.5, the handling of unknown feature values, and pruning decision trees to improve

classification accuracy on the testing set.

Most decision tree construction methods are nonbacktracking, greedy algorithms.

A greedy algorithm chooses the best path at the time of the test although this may later be

shown suboptimal. Therefore, as noted before, a greedy algorithm is not guaranteed to

provide an optimal solution (Cormen et al., 1990).

C4.5

    repeat several times:

        GROW:
            initialize working set
            repeat
                FORM TREE for working set:
                    if stopping criterion is satisfied,
                        choose best class
                    otherwise,
                        choose best feature test
                        divide working set accordingly
                        invoke FORM TREE on subsets
                test on remainder of training set
                [...] to working set
            until no improvement possible

        PRUNE:
            while decision tree contains subtrees that are
            both complex and of marginal benefit,
                replace subtree by leaf

    select most promising pruned tree

Figure 2.5: Schematic Diagram of C4.5.

2.3.2.1 Gain criterion and gain ratio criterion

C4.5 provides two means of evaluating the heuristic test used by the divide-and-

conquer algorithm:

the gain criterion which was used in ID3

the gain ratio criterion

The information theory underpinning the gain criterion has been summarized by

Quinlan (1993, p. 21) as, "The information conveyed by a message depends on its

probability and can be measured in bits as minus the logarithm to base 2 of that probability."

The probability that a randomly drawn example from a set S of examples belongs to some

class Cj is

\[ \frac{\mathrm{freq}(C_j, S)}{|S|} \]

and the information it conveys is

\[ -\log_2 \left( \frac{\mathrm{freq}(C_j, S)}{|S|} \right) \text{ bits.} \]

We define the expected information from such a message pertaining to class membership by

summing over the classes in proportion to their frequencies in S (Quinlan, 1993),

\[ \mathrm{info}(S) = -\sum_{j=1}^{k} \frac{\mathrm{freq}(C_j, S)}{|S|} \times \log_2 \left( \frac{\mathrm{freq}(C_j, S)}{|S|} \right) \text{ bits.} \]

When applied to the set of training cases, T, info(T) measures the average amount of

information needed to identify the class of a case in T.

Now consider a similar measurement after partitioning T in accordance with the n

outcomes of a test X. The expected information requirement can be found as the weighted

sum over the subsets, as

\[ \mathrm{info}_X(T) = \sum_{i=1}^{n} \frac{|T_i|}{|T|} \times \mathrm{info}(T_i). \]

The quantity

\[ \mathrm{gain}(X) = \mathrm{info}(T) - \mathrm{info}_X(T) \]

measures the information that is gained by partitioning T in accordance with the test X

(Quinlan, 1993). The gain criterion, then, selects a test to maximize this information gain.

The gain ratio criterion was developed to rid the gain criterion of its bias in favor

of tests with many outcomes (Quinlan, 1993). The bias can be rectified by normalizing the

gain. By analogy with the definition of info(S), we have

\[ \mathrm{split\ info}(X) = -\sum_{i=1}^{n} \frac{|T_i|}{|T|} \times \log_2 \left( \frac{|T_i|}{|T|} \right). \]

This represents the potential information generated by dividing the training set, T, into n

subsets, whereas the information gain measures the information relevant to classification

that arises from the same division. Then,

gain ratio(X) = gain(X) / split info(X)

expresses the proportion of information generated by the split that is useful, i.e., that appears

helpful for classification (Quinlan, 1993). If the split is near-trivial, split information will be

small and this ratio will be unstable. To avoid this, the gain ratio criterion selects a test to

maximize the ratio above, subject to the constraint that the information gain must be large--

at least as great as the average gain over all tests examined.
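The gain, split information, and gain ratio can be illustrated with a short Python sketch. It assumes each test is represented simply by the lists of class labels falling into its outcome subsets; the function names, the small numerical tolerance in the average-gain check, and the example data are our own and are not taken from the C4.5 programs.

```python
import math
from collections import Counter


def info(labels):
    """info(S): expected bits needed to identify the class of a case in S."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_and_ratio(subsets):
    """Return (gain, split info, gain ratio) for a test with subsets T_1..T_n."""
    all_labels = [lab for s in subsets for lab in s]
    total = len(all_labels)
    info_x = sum(len(s) / total * info(s) for s in subsets if s)
    gain = info(all_labels) - info_x
    split_info = -sum(len(s) / total * math.log2(len(s) / total) for s in subsets if s)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, split_info, ratio


def choose_test(candidates):
    """Maximize gain ratio among tests whose gain is at least the average gain."""
    scores = {name: gain_and_ratio(s) for name, s in candidates.items()}
    avg_gain = sum(g for g, _, _ in scores.values()) / len(scores)
    # Assumption: a tiny tolerance guards against floating-point ties.
    eligible = {k: v for k, v in scores.items() if v[0] >= avg_gain - 1e-12}
    return max(eligible, key=lambda k: eligible[k][2])


# Hypothetical tests on 14 cases (9 of class P, 5 of class N).
candidates = {
    "three_way": [["P", "P", "N", "N", "N"], ["P"] * 4, ["P", "P", "P", "N", "N"]],
    "two_way": [["P"] * 6 + ["N"] * 2, ["P"] * 3 + ["N"] * 3],
}
g, s, r = gain_and_ratio(candidates["three_way"])
print(round(g, 3), round(s, 3), round(r, 3), choose_test(candidates))
```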

Mingers (1989) performed an empirical comparison of selection measures used for

decision tree induction, reviewing Quinlan's information measure (1979), the χ²

contingency table statistic, using probabilities rather than the χ², the GINI index of

diversity developed by Breiman et al. (1984), the Gain-ratio measure as discussed above,

and the Marshall correction factor, which can be applied to any of the previous measures

and favors features which split the examples evenly and avoid those which produce small

splits. Mingers evaluated these measures on four datasets and concluded that the predictive

accuracy of induced decision trees is not sensitive to the goodness of split measure.

However, the choice of measure does significantly influence the size of the unpruned trees.

Quinlan's Gain-ratio generated the smallest trees, whereas χ² produced the largest. An

additional study (Buntine and Niblett, 1992) confirmed Mingers' results while taking issue

with his use of random selection as a comparison to the various methods he studied.

Fayad and Irani (1992) reviewed the ability of ID3 to classify datasets with

continuous-valued features. Such a feature is handled by sorting all the values for that

feature and then partitioning it into two intervals using the gain or gain ratio criterion. They

determined that the algorithm used by ID3 for finding a binary partition for a continuous-

valued feature will always partition the data on a boundary point.
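The sort-and-partition procedure for a continuous-valued feature can be sketched as follows. The sketch uses the gain criterion and places the candidate threshold at the midpoint between adjacent distinct values; the midpoint choice, the function names, and the data are assumptions for illustration.

```python
import math
from collections import Counter


def info(labels):
    """Expected bits needed to identify the class of a case."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) of the best binary partition of a continuous feature."""
    pairs = sorted(zip(values, labels))
    all_labels = [lab for _, lab in pairs]
    base = info(all_labels)
    best_gain, best_cut = 0.0, None
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        left, right = all_labels[:i], all_labels[i:]
        info_x = (len(left) * info(left) + len(right) * info(right)) / len(pairs)
        if base - info_x > best_gain:
            # Assumption: report the midpoint between the two adjacent values.
            best_gain, best_cut = base - info_x, (lo + hi) / 2
    return best_cut, best_gain


# Hypothetical P/E values with their classes.
pe = [1.0, 1.5, 2.5, 3.5, 4.0, 5.0]
classes = ["Medium", "Medium", "Medium", "High", "High", "High"]
cut, cut_gain = best_threshold(pe, classes)
print(cut, cut_gain)
```

In this example the chosen cut falls between the last Medium case and the first High case, that is, on a boundary between classes.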

2.3.2.2 Handling unknown feature values

The above algorithms assume that the outcome of a test for any example can be

determined. In many cases of classification research, unknown features appear due to

missed determinations, etc. In the absence of some procedure to evaluate unknown features,

entire examples would have to be discarded, much the same as for missing data in LDA and

Logit.

C4.5 improves upon the definition of gain to accommodate unknown feature values

(Quinlan, 1993). It calculates the apparent gain from looking at examples with known

values of the relevant feature, multiplied by the fraction of such cases in the training set.

Expressed mathematically this is

\[ \mathrm{gain}(X) = \text{probability A is known} \times (\mathrm{info}(T) - \mathrm{info}_X(T)) \]

Similarly, the definition of split info(X) can be altered by regarding the examples with

unknown values as an additional group. If a test has n outcomes, its split information is

computed as if the test divided the cases into n + 1 subsets (Quinlan, 1993).
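The two adjustments can be illustrated with the sketch below, in which an unknown feature value is represented by None. The representation, the function names, and the example data are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def info(labels):
    """Expected bits needed to identify the class of a case."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_and_split_info_with_unknowns(values, labels):
    """Gain scaled by the known fraction; split info with unknowns as an extra group."""
    total = len(values)
    known = [(v, lab) for v, lab in zip(values, labels) if v is not None]
    if not known:
        return 0.0, 0.0
    groups = defaultdict(list)
    for v, lab in known:
        groups[v].append(lab)
    # Apparent gain is computed from the cases with known values only.
    info_x = sum(len(g) / len(known) * info(g) for g in groups.values())
    gain = (len(known) / total) * (info([lab for _, lab in known]) - info_x)
    # Split info treats the unknown cases as subset n + 1.
    sizes = [len(g) for g in groups.values()]
    if total > len(known):
        sizes.append(total - len(known))
    split_info = -sum(s / total * math.log2(s / total) for s in sizes)
    return gain, split_info


# Hypothetical data: 14 cases, one with an unknown feature value.
values = ["a"] * 5 + ["b"] * 3 + ["c"] * 5 + [None]
labels = ["P", "P", "N", "N", "N"] + ["P"] * 3 + ["P", "P", "P", "N", "N"] + ["P"]
g, s = gain_and_split_info_with_unknowns(values, labels)
print(round(g, 3), round(s, 3))
```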

2.3.2.3 Pruning decision trees

The recursive partitioning method of constructing decision trees continues to

subdivide the set of training cases until each subset in the partition contains cases of a single

class, or until no test offers any improvement (Quinlan, 1986). The result is often a very

complex tree that "overfits the data" by inferring more structure than is justified in the

training cases. Two approaches to improving the results of classification are prepruning or

construction-time pruning, and postpruning. C4.5 uses postpruning.

In prepruning, the typical approach is to look at the best way of splitting a subset and

to assess the split from the point of view of statistical significance, information gain, or error

reduction. If this assessment falls below some threshold, the division is rejected and the tree

for the subset is just the appropriate leaf. Prepruning methods have a weakness in that the

criterion to stop expanding a tree is being made on local information alone. It is possible

that descendent nodes of a node may have better discriminating power (Kim and Koehler,

1995).

C4.5 allows the tree to grow through the divide-and-conquer algorithm and then it is

pruned. C4.5 performs pessimistic error rate pruning developed by Quinlan (1987) and uses

only the training set from which the tree is built. An estimate is made of the error caused by

replacing a subtree with a leaf node. If the error is greater with the leaf node, the subtree

remains and vice-versa. Michie (1989) noted that tests on a number of practical problems

gave excellent results with this form of pruning. Mingers (1989) also reported that pruning

improved the ability of decision tree classification.
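The leaf-versus-subtree comparison can be sketched as follows. The sketch assumes one common reading of pessimistic pruning: training errors are corrected by adding one half per leaf, and the subtree is replaced when the corrected error of the leaf is within one standard error of the corrected error of the subtree. These details are our assumption and are not necessarily what the C4.5 programs compute; all names and numbers are hypothetical.

```python
import math


def should_prune(subtree_errors, n_leaves, leaf_errors, n_cases):
    """Decide whether to replace a subtree by a leaf (assumed pessimistic rule).

    subtree_errors: training errors summed over the leaves of the subtree
    n_leaves:       number of leaves in the subtree
    leaf_errors:    training errors if the subtree became a single leaf
    n_cases:        training cases reaching the root of the subtree
    """
    # Assumption: continuity correction of 1/2 per leaf.
    corrected_subtree = subtree_errors + n_leaves / 2
    corrected_leaf = leaf_errors + 0.5
    std_err = math.sqrt(corrected_subtree * (n_cases - corrected_subtree) / n_cases)
    return corrected_leaf <= corrected_subtree + std_err


# Hypothetical subtree covering 100 cases with 6 leaves and 5 training errors.
print(should_prune(subtree_errors=5, n_leaves=6, leaf_errors=10, n_cases=100))
print(should_prune(subtree_errors=5, n_leaves=6, leaf_errors=15, n_cases=100))
```

Only counts from the training set are used, which is consistent with the statement above that no separate pruning set is required.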

2.3.3 Limitations of C4.5

Like any classifier, a decision tree specifies how a description space is to be carved

up into regions associated with the classes. When the task is such that class regions are not

hyperrectangles, the best that a decision tree can do is approximate the regions by

hyperrectangles. This is illustrated in Figure 2.6 below in which the classification region is

defined better by the triangular region on the left versus the rectangular regions that would

be used by C4.5 (Quinlan, 1993).

Figure 2.6: Real and Approximate Divisions for an Artificial Task.

Michie (1987, 1989) identified other limitations of ID3 descendants. The former

reference discusses the pruning of decision trees when the data are inconclusive, an issue also

noted in Quinlan (1993). Data are inconclusive when the features used in describing a

set of examples are not sufficient to specify exactly one outcome or class for each example.

In the latter reference, the use of a feature vector for large domains, such as medical

diagnosis, is discussed.

2.3.4 C4.5 Financial Applications

Braun and Chandler (1987) performed the earliest reported business application

research of rule-induction classification with a variant of ID3, known as ACLS. Using a

database of 80 examples from an investment expert's predictions, they used ACLS to

formulate rules to predict not only the expert's prediction of the market but to predict the

actual market movement. Using 108 examples, ACLS correctly predicted actual market

movement 64.4% of the time. The expert correctly predicted market movement 60.2% of

the time.

Rules for loan default and bankruptcy were developed using a commercial variant of

ID3 (Messier, Jr. and Hansen, 1988). The investigators were surprised with the small

decision tree that was developed and how it correctly classified the testing set with 87.5%

accuracy. The ID3 results of the bankruptcy data were favorably compared to LDA.

Miller (1990) discussed the use of classification trees, similar to ID3, for credit

evaluation systems. Chung and Silver (1992) compared Logit, ID3, and Genetic Algorithms

to the outcomes of experts for graduate admissions and bidder selection. The three methods

performed comparably on the graduate admissions problem but significantly differently on the

bidder selection problem where the GA had the superior performance. One conclusion of

the study is that the nature of the problem-solving task matters. A review of the data

showed that the bidder selection problem had a feature that made it difficult for ID3 to build

the decision tree to a high degree of accuracy.

An application of ID3 that is pertinent to our present study is for stock screening and

portfolio construction (Tam, 1991). To demonstrate the effectiveness of the inductive

approach, trading rules were inferred from eight features. Three portfolios were constructed

from three rules each year, and their performance was compared to market standards. In

every case, the portfolios outperformed the Standard & Poor's 500 Index.

In Kattan et al. (1993), human judgment was compared to the machine learning

techniques of ID3, regression trees (Breiman et al., 1984), a back-propagation neural

network, and LDA. Human subjects were put into teams and were allowed to induce rules

from historical data under ideal conditions, such as adequate time and opportunity to sort the

data as desired. The task at hand was to emulate the decisions made by a bank officer when

processing checking account overdrafts. A sample of 340 useable observations was

gathered for the experiment. The results on multiple holdout samples indicated that human

judgment, regression trees, and ID3 were equally accurate and outperformed the neural

network.

Fogler (1995) discussed the strengths and weaknesses of using classification tree

programs, such as ID3, to explain nonlinear patterns in stocks. The algorithm sequentially

maximizes the explanatory power at each branch, looking forward only one step at a time.

Additionally, he notes that in larger problems, the classification trees might differ. This

paper also reviews financial applications of Neural Nets, Genetic Algorithms, Fuzzy Logic,

and Chaos.

Harries and Horn (1995) researched the use of strategies to enhance C4.5 to deal

with concept drift and non-determinism in a time series domain. An aim of this study was

to demonstrate that machine learning is capable of providing useful predictive strategies

in financial prediction. For short term financial prediction, a successful prediction rate of

60% is considered the minimum useful to domain experts. Their results implied that

machine learning can exceed this target with the use of new techniques. By trading off

coverage for accuracy, they were able to minimize the effect of both noise and concept

drift.

Trippi and Lee (1996) suggest that inductive learning algorithms such as ID3 could

be used to generate rules that classify stocks and bonds into grades. They note that when

used for classification problems, inductive learning algorithms compete well with neural

network approaches.

The classification performance of twenty-two decision tree, nine statistical, and

two neural network methods was recently compared in terms of prediction error,

computational time, and the number of terminal nodes for decision trees using thirty-two

datasets (Lim et al., 1997). The datasets were obtained from the University of California at

Irvine Repository of Machine Learning Databases. It was found that a majority of the

methods, including C4.5, LDA, and Logit, had similarly low prediction error rates in the

sense that differences in their error rates were not statistically significant.

2.4 Linear Discriminant Analysis

2.4.1 Overview

Linear discriminants, first studied by Fisher (1936), are the most common form of

classifier, and are quite simple in structure (Weiss and Kulikowski, 1991). The name

indicates that a linear combination of the evidence will be used to separate or discriminate

among the classes and to select the class assignment for an unseen case. For a problem

involving d features, this means geometrically that the separating surface between the

sample will be a (d-1) dimensional hyperplane.

The general form for any linear classifier is given as follows:

\[ w_1 e_1 + w_2 e_2 + \cdots + w_d e_d - w_0 \]

where (e1, e2, ..., ed) is the feature vector, d is the number of features, and the wi are constants

that must be estimated. Intuitively, we can think of the linear discriminant as a scoring

function that adds to or subtracts from each observation, weighing some observations more

than others and yielding a final total score. The class selected, C1, is the one with the highest

score (Weiss and Kulikowski, 1991).
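The scoring interpretation can be illustrated with a short sketch, assuming one weight vector and one constant w0 per class, with w0 subtracted from the weighted sum. The weights below are invented for illustration and are not estimated from data.

```python
def linear_score(features, weights, w0):
    """Weighted sum of the features minus the constant w0 (assumed form)."""
    return sum(w * e for w, e in zip(weights, features)) - w0


def classify(features, class_params):
    """Select the class with the highest score."""
    return max(class_params,
               key=lambda c: linear_score(features, *class_params[c]))


# Hypothetical (weights, w0) for two classes and one observation with d = 2.
class_params = {
    "A": ([1.0, 0.5], 0.5),
    "B": ([0.2, 1.0], 0.0),
}
observation = [2.0, 1.0]
print(classify(observation, class_params))
```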

2.4.2 Limitations of LDA

In classical LDA there are some limits on the statistical properties which the

discriminating variables are allowed to have. No variable may be a linear combination of

other discriminating variables. A "linear combination" is the sum of one or more variables

which may have been weighted by constant terms. Thus, one may not use either the sum or

the average of several variables along with all those variables. Likewise, two variables

which are perfectly correlated cannot be used at the same time. Another requirement is that

the population covariance matrices are equal for each group (Klecka, 1980).

Another assumption for classical LDA is that each group is drawn from a population

that has a multivariate normal distribution. Such a distribution exists when each variable has

a normal distribution about fixed values on all the others. This permits the precise

computation of tests of significance and probabilities of group membership. When this

assumption is violated, the computed probabilities are not exact but they may still be quite

useful if interpreted with caution.

It should be noted that there are many generalizations of LDA, including the Linear

Programming variant, that don't have these restrictions (Koehler, 1989); however, many

financial applications studies don't seem to be concerned about the restrictions. Karels and

Prakash (1987), in their study of the use of discriminant analysis for bankruptcy prediction,

noted that violating the multivariate normality constraint was the rule rather than the

exception in finance and economics. A more recent study of neural network classification

vs. discriminant analysis (Lacher et al., 1995, p. 54) noted that the multivariate normal

constraint, and others, are incompatible with the complex nature and interrelationships of

financial ratios. Discriminant analysis techniques have proven better in financial

classification problems until recently when new artificial intelligence classification

procedures were developed.

2.5 Logistic Regression (Logit)

Logit techniques are well-described in the literature (Altman et al., 1981), (Johnston,

1972), and (Judge et al., 1980). If we incorrectly specify a model as linear, the statistical

properties derived under the linearity assumption will not, in general, hold. The obvious

solution to this problem is to specify a nonlinear probability model in place of the linear

model (Sestito and Dillon, 1994). Logistic regression uses a nonlinear probability model

that investigates the relationship between the response probability and the explanatory

features. It is useful in classification problems involving nonnormal population distributions

and noncontinuous features. Studies have shown that the normal linear discriminant and

logistic regression usually give similar results (Weiss and Kulikowski, 1991).

Logistic regression calculates the probability of membership in a class. The model

has the form:

\[ \mathrm{logit}(p) = \log \left( \frac{p}{1 - p} \right) = \alpha + \beta' x \]

where p = Pr(Y = 1 | x) is the response probability to be modeled, α is the intercept

parameter, and β is the vector of slope parameters (SAS Institute, 1992). Output from the

analysis is used to calculate the probability of membership in a class.
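Inverting the logit gives the response probability directly, as the following sketch shows. The intercept and slope values are hypothetical and are not estimates from any dataset.

```python
import math


def response_probability(x, alpha, beta):
    """p = Pr(Y = 1 | x), obtained by inverting logit(p) = alpha + beta'x."""
    z = alpha + sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log odds of the probability p."""
    return math.log(p / (1.0 - p))


# Hypothetical intercept and slope parameters for two explanatory features.
alpha, beta = -0.5, [0.8, -0.3]
p = response_probability([1.0, 2.0], alpha, beta)
print(round(p, 4), round(logit(p), 4))
```

An example would then be assigned to the class Y = 1 when p exceeds a chosen cutoff, such as 0.5.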

Several recent financial application studies compared Logit to machine learning or

other AI techniques such as Case-Based Reasoning (CBR). Dwyer (1992) compared the

performance of Logit and nonparametric Discriminant Analysis to two types of Neural

Networks in predicting corporate bankruptcies. Bankruptcy data drawn from a ten-year time

horizon was input into each of the four models, with predictive ability tested at one, three,

and five years prior to bankruptcy filing. The results suggested that Logit and the

backpropagation Neural Network were generally evenly matched as prediction techniques.

Hansen et al. (1993) studied a difficult audit decision problem requiring expertise

and they compared the performance of ID3, Logit, and a new machine learning algorithm

called NEWQ. While NEWQ performed best with 15 errors, Logit produced 16 errors and

ID3 had 18 errors out of 80 examples.

Logit was compared to a CBR system called ReMind (Bryant, 1996) for predicting

corporate bankruptcies. The database used in the study consisted of nonbankrupt and

bankrupt firms in a 20:1 ratio. Logit outperformed ReMind and one conclusion of the study

was that the sample size of bankrupt firms was too small for ReMind to work well.

2.6 Summary

This chapter has reviewed the literature of machine learning, inductive learning,

C4.5 and ID3, and the statistical techniques that we propose using in our research. In

Chapter 3 we focus on the domain problem of classifying mutual funds, identify our

hypotheses for further study, explain the statistical tests used to verify them, and conclude

with a discussion of the benefits of this research.

CHAPTER 3
DOMAIN PROBLEM

In this chapter we will provide an overview of mutual fund ratings systems and

provide background on Morningstar, Inc., the mutual fund rating company of interest to

this research. We will describe the Morningstar rating system, mention observations and

criticisms of the Morningstar rating system, and discuss how investment professionals

use rating systems. We will also review research about the persistence of mutual fund

performance and identify an interesting relationship between the average mutual fund

rating one year and the succeeding year's rating and average one-year return. This

chapter will end with a specification of the problems to be studied in this research.

3.1 Overview of Mutual Fund Ratings Systems

Mutual funds are open-end investment companies that buy and sell their shares to

individuals or corporations. Mutual fund buy and sell transactions occur between the fund

and the investor, and do not take place on a secondary market such as the New York Stock

Exchange. Since asset holdings are restricted to various forms of marketable securities, the

total market value of the fund's assets is relatively easy to calculate at the end of each day's

trading. The market value per share of a given mutual fund is equal to the total market value

of its assets divided by the number of shares of stock the fund has outstanding. We refer to

this value as the fund's Net Asset Value (NAV) and it is the price at which the fund will buy

or sell shares to the public (Radcliffe, 1994).
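The NAV calculation is a single division, sketched below with hypothetical figures.

```python
def net_asset_value(total_market_value, shares_outstanding):
    """NAV per share: total market value of assets divided by shares outstanding."""
    if shares_outstanding <= 0:
        raise ValueError("shares_outstanding must be positive")
    return total_market_value / shares_outstanding


# Hypothetical fund: $250 million of securities and 10 million shares outstanding.
nav = net_asset_value(250_000_000, 10_000_000)
print(nav)
```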

Peter Lynch (1993), until recently manager of the large Fidelity Magellan mutual

fund, wrote that, "...mutual funds were supposed to take the confusion out of investing--no

more worrying about which stock to pick." The growth in the number of mutual funds has

been quite astounding. The November 13, 1995 issue of Business Week noted:

"Still, many investors can't resist them. Equities have been on a long bull
run, and the number of new funds keeps growing--474 have been added to
Morningstar Inc.'s 6,730-fund database so far this year. The temptation to
invest in newbies is understandable, since 61% of all equity mutual funds are
less than three years old, according to CDA/Wiesenberger, a Rockville (Md.)
mutual-fund data service. In fact, nearly one-third of all money flowing into
equity mutual funds in the past 12 months went to those with less than a five-
year track record, says State University of New York at Buffalo finance
professor Charles Trzcinka." (Dunkin, 1995, p.160)

In 1976, 452 mutual funds existed, and this number had grown only to 812 mutual funds managing $241.9 billion in assets by 1987. According to the Investment Company Institute (ICI), there are 2,855 stock mutual funds today managing $2.13 trillion of assets. This is comparable to the number of stocks on the New York Stock Exchange. The ICI is an association that represents investment companies. Its membership includes 5,951 open-end investment companies or mutual funds, 449 closed-end investment companies, and 10 sponsors of unit investment trusts. Its mutual fund members have assets of about $3.056 trillion, accounting for approximately 95% of total industry assets, and have over 38 million individual shareholders. Moreover, the growth continues. The August 28, 1997 issue of the Wall Street Journal noted that investors put a net $26.56 billion into stock funds in July and that net bond inflows for July were $4.21 billion.

Mutual fund rating services, similar to stock and bond rating services, have been in existence since 1940. According to the January 7, 1993 Wall Street Journal, Wiesenberger Financial Services was the oldest mutual fund performance tracking company and had been rating funds since the Investment Company Act of 1940, which authorized the creation of mutual funds. This company merged with CDA Investment Technologies Inc. into CDA/Wiesenberger (CDA/W) in 1991. CDA/W provides a monthly report to subscribers listing performance, portfolio characteristics, and risk and dividend statistics on mutual funds (CDA/Wiesenberger, 1993). The company determines the CDA Rating of mutual funds, a proprietary measure, which is a composite percentile rating from 1 (best) to 99 (worst), based on the fund's performance over the past four market cycles. Two up cycles and two down cycles are used if available; however, at least two cycles (one up and one down) are required for a rating. According to CDA/W, the best-rated funds will be those that have done well in different market environments, whose recent performance continues strong, and whose comparative results have not fluctuated wildly over varying time periods. In determining the CDA Rating, they give extra weight to recent performance (latest 12 months) and penalize funds for inconsistency. CDA/W does not provide the methodology for determining the CDA Rating in their newsletter.

The newest rating service is the Value Line Mutual Fund Survey, which rates mutual funds by a proprietary rating system. Fund risk is rated from one (safest) to five (most volatile) by Value Line. They also provide an overall rating for the fund on a scale of 1 to 5. Value Line also prints a one-page summary of the fund's performance.

Lipper Analytical Services publishes mutual fund indexes in the Wall Street Journal and Barron's, and a Mutual Fund Scorecard in the Wall Street Journal.
Lipper's scorecard does not provide a rating for funds but lists the top 15 and bottom 10 performers based on total return over 4 weeks, 52 weeks, and 5 years. The Wall Street Journal mutual fund list ranks mutual funds by investment objective from A to E (units of 20% each) for total return. The list includes 4 week, 13 week, 26 week, 39 week, 1 year, 3 year, 4 year, and 5 year returns. In 1996, Lipper was tracking 4,555 stock mutual funds.

3.1.1 Morningstar, Inc. Overview

Morningstar Mutual Funds is a mutual fund rating service started in April 1984. Its first publication was the quarterly Mutual Fund Sourcebook for stock equity funds. Within two years it was publishing Mutual Fund Values, featuring the one-page analysis of funds that was to become the firm's cornerstone product. It also added bond funds to its coverage. In November 1985, Business Week asked Morningstar to provide data for a new mutual fund issue. Business Week insisted upon a fund rating system, and development work on the magazine's rating system paved the way for Morningstar's own 5-star rating system, which was introduced early in 1986 (Leckey, 1997). Morningstar sales went from $750,000

in its third year of operation to $11 million in its seventh year. Morningstar publishes the following software and print products:

Software Published Monthly

Morningstar Ascent: Software for the do-it-yourself investor with a database of 7,800 funds.
Morningstar Stock Tools: Online stock newsletter that lets you screen, rank, and create model portfolios from a database of 7,900 stocks.
Morningstar Principia and Principia Plus: Software for investment professionals providing data on mutual funds, closed-end funds, and variable annuities. Principia Plus also features a portfolio developer and advanced analytics.

Print

Morningstar Mutual Funds: In-depth data and analysis on more than 1,600 funds, published every other week.
Morningstar No-Load Funds: A detailed look at nearly 700 no- and low-load funds, published every four weeks.
Morningstar Investor: A 48-page monthly publication featuring articles and information on 500 mutual funds.
Morningstar Mutual Fund 500: A year-end synopsis of 500 of the best funds.
Morningstar Variable Annuity/Life Performance Report: One monthly guide that covers the variable annuity universe.

The Chicago-based firm has become the preeminent source of fund information for investors. Charles A. Jaffe, the Boston Globe financial reporter, said in an August 6, 1996 article that more than 95 percent of all money flowing into funds goes to those carrying Morningstar's four- and five-star ratings. According to the February 24, 1997 Wall Street Journal, eighty percent of Morningstar's client base is made up of financial planners and brokers, who say that the firm's star ratings are a big factor in selling funds.

3.1.2 Morningstar Rating System

In June 1994, Morningstar had over 4,371 mutual funds in its total universe of funds on CD-ROM. Of these, 2,342 (54%) having three or more years of performance data were rated according to the Morningstar one-star to five-star rating system.
Only 1,052 (24%) of the funds were equity mutual funds based on a self-identified investment objective. By June 1995, the number of funds had increased by 51% to 6,584, with 2,871 (44%) having Morningstar ratings and 1,234 (19%) rated as equity funds. The July 1996 Morningstar Principia had 1,583 rated equity mutual funds. Thus, less than half of all rated funds are equity funds, and this represents only part of all the funds in the Morningstar database.

The main criterion Morningstar uses for including a fund in the biweekly newsletter is that the fund be listed on the NASDAQ (National Association of Securities Dealers Automated Quotations system). Other factors that enter this determination are the cooperation of the fund group, the space limitations of the publication, the asset value of the fund, and the investor interest in the fund (Morningstar, 1992).

Domestic stock funds, taxable-bond funds, tax-free bond funds, and international stock funds are each rated separately. These are referred to as the fund classes. Morningstar includes hybrid funds in the domestic stock universe. To determine the star rating, Morningstar analysts calculate Morningstar Risk and Morningstar Return. They then determine the relative placement of a fund in the rating system by subtracting Morningstar Risk from Morningstar Return (Morningstar, 1992) and ordering the funds from highest to lowest.

A fund's Morningstar Risk is determined by subtracting the 3-month Treasury bill return from each month's return by the fund. The months with a negative value are summed, and the total losses are divided by the total number of months in the rating period (36, 60, or 120). Morningstar compares a fund's average monthly loss to those of all equity funds by dividing each value by the average for this class of funds, which sets the average Morningstar Risk to 1.00. The resulting risk value expresses how risky the fund is relative to the average fund.
For example, a mutual fund with a Morningstar Risk rating of 1.30 is 30% more risky than the average mutual fund.

Morningstar Return is the fund's total return in excess of the Treasury bill rate, adjusted for all loads (sales commissions) and management fees applied by the fund. Morningstar (1992) asserts that loads will clearly affect a three-year star rating more than a ten-year one. Unless the fund's load is substantially different from those of its competitors, the effect will not be unduly pronounced. The main reason for including the load in the rating process is so that investors can compare the Morningstar Return numbers for load and no-load funds. For example, in the June 1995 Morningstar CD-ROM database there were 2,360 mutual funds with a mean front-end load of 4.33%. Of these, 940 were equity funds having a mean front-end load of 4.75%. Morningstar assumes the full load was paid on front-end loads. Investors who hit the load fund's breakpoints will receive higher than published returns. Deferred sales charges and redemption fees are included in the calculation by assuming the investor sold the fund at the end of the particular rating period. For the three-year period Morningstar typically charges half of the fund's maximum deferred charge and, for the ten-year period, they ignore the deferred charges.

Each calculated return value is divided by the average Morningstar Return, which sets the average Morningstar Return to 1.00 and allows quick comparisons between different mutual funds. The interpretation of Morningstar Return is similar to Morningstar Risk: a Morningstar Return of 1.30 means that the fund is returning 30% more than the average fund.

The result of Morningstar Return minus Morningstar Risk is the Risk-Adjusted Performance (RAP). Morningstar calculates it for each fund class for three years, five years and ten years, if the financial data are available for those periods.
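
The Risk, Return, and RAP calculations described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Morningstar's implementation: the function names, the four-month toy return series, and the excess-return figures are all hypothetical, and treating each monthly shortfall as a positive loss is an assumption about the sign convention.

```python
def average_monthly_loss(fund_returns, tbill_returns):
    """Sum the months in which the fund trailed the T-bill, then divide by ALL months."""
    excess = [f - t for f, t in zip(fund_returns, tbill_returns)]
    total_loss = sum(-e for e in excess if e < 0)  # assumption: losses counted as positive
    return total_loss / len(excess)


def normalize_to_class_average(values):
    """Divide each value by the class mean so the class average becomes 1.00."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]


# Hypothetical monthly returns (%) for three funds; a real rating period
# would contain 36, 60, or 120 months.
tbill = [0.5, 0.5, 0.5, 0.5]
funds = {
    "A": [2.0, -1.0, 0.5, 1.0],
    "B": [0.0, 0.0, 1.0, 1.0],
    "C": [-1.5, 1.0, 1.0, 1.0],
}
losses = [average_monthly_loss(r, tbill) for r in funds.values()]
risk = normalize_to_class_average(losses)

# Hypothetical load-adjusted returns in excess of the T-bill rate (%).
excess_returns = [6.0, 4.0, 2.0]
ret = normalize_to_class_average(excess_returns)

# Risk-Adjusted Performance: Morningstar Return minus Morningstar Risk.
rap = [r - k for r, k in zip(ret, risk)]
```

In this toy universe fund A has exactly the class-average risk (1.00) and a normalized return of 1.50, so its RAP is 0.50.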
Based on the number of years of data available, a weighted average is calculated to report the overall Morningstar Risk-Adjusted Rating:

1. If three years are available, Morningstar uses 100% of the 3-year RAP.
2. If five years of data are available, they use 40% of the 3-year RAP and 60% of the 5-year RAP.
3. If ten years of data are available, they use 20% of the 3-year RAP, 30% of the 5-year RAP, and 50% of the 10-year RAP.

Morningstar orders the results from highest to lowest and distributes the star ratings by a symmetric normal curve. The top 10% of funds receive five stars (highest), the next 22.5% receive four stars, the middle 35% receive three stars, the lower 22.5% receive two stars, and the bottom 10% receive one star.

3.1.3 Review and Criticism of the Morningstar Rating System

The Morningstar ratings are published in the biweekly Morningstar Mutual Fund newsletter and the ratings are updated monthly. Morningstar suggests that an investor could develop their own rating system by revising the weights for the data, for example, to emphasize risk more than return. In practice, this is difficult to do since Morningstar publishes the detailed quantitative data on only a select sample of 130 mutual funds every two weeks in the newsletter. A subscriber to the newsletter will only see the quantitative values needed for developing their own rating system about twice a year unless they subscribe to the more expensive CD-ROM database that provides all the necessary information on a monthly basis.

Morningstar (1992) states that its rating system is a purely quantitative method for evaluating mutual funds and they only use objective data to determine the Morningstar rating. They tell investors to use it only as a screening device and not a predictor for future performance. However, in the early years of the star rating system Morningstar labeled 5-star funds "buy" and 1-star funds "sell", a practice that was dropped in 1990 (Leckey, 1997).
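
Returning briefly to the mechanics of Section 3.1.2, the period weighting and the star allocation can be sketched as follows, together with the kind of reweighting an investor might try when building a personal rating system. This is a hedged illustration only: the function names, the `risk_weight` parameter, the treatment of ties, and the rounding at percentile boundaries are assumptions, and the 40-fund universe is hypothetical.

```python
def custom_rap(norm_return, norm_risk, risk_weight=1.0):
    """Return minus (weighted) risk; risk_weight > 1 emphasizes risk (hypothetical knob)."""
    return norm_return - risk_weight * norm_risk


def overall_rap(rap_3yr, rap_5yr=None, rap_10yr=None):
    """Weighted RAP using the 100 / 40-60 / 20-30-50 period weights from the text."""
    if rap_5yr is not None and rap_10yr is not None:
        return 0.2 * rap_3yr + 0.3 * rap_5yr + 0.5 * rap_10yr
    if rap_5yr is not None:
        return 0.4 * rap_3yr + 0.6 * rap_5yr
    return rap_3yr


def assign_stars(scores):
    """Map {fund: overall RAP} to {fund: stars} with a 10/22.5/35/22.5/10 split."""
    # Cumulative cutoffs in tenths of a percent, paired with the star awarded.
    cutoffs = [(100, 5), (325, 4), (675, 3), (900, 2), (1000, 1)]
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest RAP first
    n = len(ranked)
    stars = {}
    for position, fund in enumerate(ranked, start=1):
        # Integer comparison avoids floating-point trouble at the boundaries.
        stars[fund] = next(s for limit, s in cutoffs if position * 1000 <= limit * n)
    return stars


# Hypothetical universe of 40 funds whose score simply equals their index.
example_scores = {f"fund{i:02d}": float(i) for i in range(40)}
example_stars = assign_stars(example_scores)
```

With 40 funds the allocation gives 4, 9, 14, 9, and 4 funds in the five-star through one-star groups, matching the stated percentages.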
Two studies of the Morningstar rating have shown that the 5-star system is not predictive of fund performance, according to Leckey (1997). Mark Hulbert, editor of The Hulbert Financial Digest, which tracks investment letters, used the Morningstar ratings to invest $10,000 in 5-star funds. He sold the mutual funds when their rating declined and,

over a five-year period, this trading system failed to beat the Wilshire 5000 index.

Morningstar argued that the 5-star funds should not be seen as a portfolio but Hulbert

disagreed.

A study by Lipper Analytical Services looked at how the 5-star funds performed

over the next twelve months when purchased at the beginning of 1990, 1991, 1992, and

1993. Lipper reported that a majority of 5-star stock funds did worse in the rest of the year

than the average stock fund (Leckey, 1997).

Morningstar has conducted a study that shows that stock funds rated 5-stars in 1986,

when the rating system started, posted respectable or better results during the next nine

years. Conversely, more than a dozen 1-star stock funds have performed so badly as to be

merged out of existence (Leckey, 1997). Another study performed by Morningstar showed

that a statistically significant majority of funds that received 4- and 5-star ratings in 1987

maintained those high ratings a decade later. In her commentary, Morningstar senior analyst

Laura Lallos noted that, "By the standards of what it sets out to do--separating the long-term

winners from the losers--the rating is actually quite successful" (Harrell, 1997).

3.1.4 Investment Managers' Use of Ratings

Ratings possess a considerable value to the investment community as indicated by

the extensive use made of them by many institutional and individual investors over a long

period (Teweles and Bradley, 1987). Even organizations with extensive staffs use them in

cross-checking their investigations. They are a quick, easy reference available to most

investors and, when used with care, they are a valuable source of information to supplement

other data.

The Morningstar approach to rating mutual funds with five classes is similar to the

classification of stocks in the securities industry today. Elton and Gruber (1987) found that

the stockbroker making an investment decision quite often receives a list of stocks with a

ranking on each (usually from one to five) and perhaps some partial risk information. If

stocks are ranked (grouped) from 1 to 5, stocks ranked in group 1 are best buys, a 2 is a buy,

3 is a hold, 4 is a sell, while a 5 is a definite sell. Of 40 financial institutions they surveyed,

80% stated that the data the brokerage community and/or their analysts supplied to the

portfolio managers were in the form of grouped data. The institutions using grouped data

reported that 50% grouped them on expected return, 30% on risk-adjusted return, 10% on

expected deviations from a Capital Asset Pricing Model, and 10% responded they did not

even know the basis of the grouping.

Fama (1991) noted that the Value Line Investment Survey publishes weekly

rankings of 1,700 common stocks into five groups. Group 1 has the best return prospects

and group 5 the worst. There is evidence that, adjusted for risk and size of the company,

group 1 stocks have higher average returns than group 5 stocks for horizons out to one year.

A study of the Banker's Trust Company stock forecast system (Elton et al., 1986)

used the ranking of stocks on a five-point scale from 33 brokerage firms. A rating of 1 or 2

was a buy recommendation, 3 was neutral, and 4 and 5 were sell recommendations.

Approximately 700 stock analysts used the system over the three-year period of the study.

We made two observations from this study:

1. On average, changes occurred to 11% of the classifications every month.

2. Table 3.1 shows that, over the three-year period of the study, the distribution

of the monthly average of 9,977 stock forecast ratings was in a skewed curve.

Table 3.1: Distribution of Stock Ratings

Rating 1981 1982 1983 Overall
1 17.4% 14.9% 14.7% 15.8%
2 37.6% 29.4% 30.3% 32.5%
3 32.9% 36.8% 40.6% 38.0%
4 10.6% 10.7% 11.6% 11.2%
5 1.5% 2.5% 2.8% 2.4%

The use of rankings for investment instruments has a long history in the financial

community and the investment community readily accepts them in recommending broker

sales to customers.

3.1.5 Performance Persistence in Mutual Funds

Brokers usually base their recommendation to purchase a financial instrument on

historical performance. Investors flock to well-performing mutual funds and common

stocks based on the anticipation of continued performance. This continued performance, if

it does exist, is referred to as performance persistence and a review of the literature provides

and relative ranking were useful in predicting future performance, particularly for raw

returns. A later study (Brown and Goetzmann, 1995) demonstrated that the relative

performance pattern of funds depended upon the period observed and was correlated across

managers. A year-by-year decomposition showed that persistence of return was due to a

common investment strategy and not standard stylistic categories and risk-adjustment

procedures.

Another study of performance persistence (Manly, 1995) showed that relative

performance of no-load, growth-oriented mutual funds persisted for the near term with the

strongest evidence for a one-year evaluation horizon. The difference in risk-adjusted

performance between the top and bottom octile portfolios was six to eight percent per year.

Malkiel (1990) observed that while performance rankings always show many funds

beating the averages--some by significant amounts--the problem is that there is no

consistency to the performances. He felt that no scientific evidence has yet been assembled

to indicate that the investment performance of professionally managed portfolios as a group

has been any better than that of randomly selected portfolios. A later study (Malkiel, 1995)

noted that performance persistence existed in the 1970s but not in the 1980s.

Grinblatt and Titman (1992) identified the difficulty of detecting superior

performance by mutual fund managers by analyzing total returns since the fund manager

may be able to charge higher load fees or expenses. They constructed gross returns for a

sample of mutual funds for December 31, 1974, to December 31, 1984. They concluded

that superior performance by fund managers might exist, particularly among aggressive-

growth and growth funds and those funds with the smallest net asset values. However,

funds with the smallest net asset values have the highest expenses so that actual returns, net

of expenses, will not exhibit abnormal performance.

Patel et al. (1991) found that past performance of a mutual fund had an effect

on cash flows. A one-percentage-point return higher than the average fund's return implies a

$200,000 increased flow in the next year (where the median fund's size is $80 million and

the median flow is $21 million). This performance effect is based on investors' belief that a managed fund with a superior past will perform better than individuals. Another confirmatory study is Phelps (1995), which showed that sophisticated and unsophisticated investors were chasing prior-year returns. Was this a good investment strategy? Yes. Risk-adjusted fund-specific performance was significantly positively related to past performance during the earlier half of Phelps' sample, 1985-89.

Three more recent studies show that persistence is a real phenomenon while disagreeing on whether fund managers are responsible for it. Carhart (1997) stated that common factors in stock returns and investment expenses almost completely explain persistence in equity mutual funds' mean and risk-adjusted returns. He argues against fund managers being skilled portfolio managers but does not deny persistence when it comes to strong underperformance by the worst-return mutual funds. Elton et al. (1996) examined predictability for stock mutual funds using risk-adjusted return. They found that past performance is predictive of future risk-adjusted return. A combination of actively managed portfolios was formed with the same risk as a portfolio of index funds. The actively managed funds were found to have a small, but statistically significant, positive risk-adjusted return during a period where mutual funds in general had negative risk-adjusted returns. Phelps and Detzel (1997) claim that they confirmed the persistence of returns from 1985-89, but that it disappeared when risk was properly controlled for or when the more recent past was examined.

With these differing results about performance persistence, we would expect that investors would rely on performance and rating systems as a way of investing wisely. The popularity of rating systems attests to their use.
The ability to predict mutual fund ratings, and improvements or declines in ratings, from one month (the Morningstar rating cycle) to one year would be an important tool in developing an investment plan.

3.1.6 Review of Yearly Variation of Morningstar Ratings

We studied matched mutual funds, funds with no name change over a one-year period, from 1993 to 1996 to determine the relationship between the Morningstar rating of one year, and the average one-year return and Morningstar rating of the succeeding year. If some relationship existed, it would show why it would be interesting to know the predicted Morningstar rating one year in the future. The 1993-94 period had 770 matched funds, 1994-95 had 934 matched funds, and 1995-96 had 1,059 matched funds.

In Figure 3.1, the Morningstar 5-star ratings for 1993 appear as the data points on the graph. The 1994 Morningstar ratings are the x-axis. The y-axis is the one-year average return percentage. The line connecting the data points shows, for example, the one-year average return percentage for a fund with a 2-Star rating in 1993 if it increased or decreased its star rating in 1994. To be more specific, the average fund with a 1993 2-Star rating would have a 2% one-year return if it stayed 2-Star. If this average fund increased to 3-Star it had a 15% one-year return. If the average fund decreased to 1-Star it had a -7% one-year return.

Figure 3.1 also shows that 1993 2-Star and 3-Star funds that retained their Morningstar rating in 1994 had the same one-year average return, approximately 2%. As noted in the preceding paragraph, the 1993 2-Star fund that increased its rating had an average 15% one-year return. However, the 1993 3-Star fund that went up to 4-Star in 1994 only had an average 10% one-year return. This effect for one-year average returns was more distinct in Figure 3.2, using a three-star rating system for rating the mutual funds.
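
The tabulation behind these figures, the mean one-year return for each pair of starting and ending ratings, can be sketched as below. The function name and the four sample rows are hypothetical placeholders chosen only to echo the 2-Star example in the text; they are not the matched-fund data used in the study.

```python
from collections import defaultdict


def mean_return_by_transition(funds):
    """funds: iterable of (rating_year1, rating_year2, one_year_return_pct) tuples."""
    totals = defaultdict(lambda: [0.0, 0])  # (old, new) -> [sum of returns, count]
    for old, new, ret in funds:
        cell = totals[(old, new)]
        cell[0] += ret
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical matched funds: (1993 stars, 1994 stars, one-year return %).
sample = [
    (2, 2, 1.0),
    (2, 2, 3.0),
    (2, 3, 15.0),
    (2, 1, -7.0),
]
means = mean_return_by_transition(sample)
```

Each line in the figures corresponds to one starting rating, with the ending rating on the x-axis and the value from `means` on the y-axis.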
The relationship between average return and rating changes was found to exist for all three years studied. Of course, it should be noted that these are years when the stock market increased year-by-year.

[Figure 3.1: 1993 Ratings and 1994 Ratings for the Five-Star Rating System. Panel title: 1993 Rating vs. One-Year Average Return and 1994 Rating. x-axis: 1994 rating (1-Star to 5-Star); y-axis: one-year average return (%); one line per 1993 rating.]

[Figure 3.2: 1993 Ratings and 1994 Ratings for the Three-Star System. Panel title: 1993 Rating vs One-Year Average Return and 1994 Rating. x-axis: 1994 rating (1-Star to 3-Star); y-axis: one-year average return (%); one line per 1993 rating.]

[Figure 3.3: 1994 Ratings and 1995 Ratings for the Five-Star System. Panel title: 1994 Ratings vs One-Year Average Return and 1995 Rating. x-axis: 1995 rating (1-Star to 5-Star); y-axis: one-year average return (%); one line per 1994 rating.]

[Figure 3.4: 1994 Ratings and 1995 Ratings for the Three-Star System. Panel title: 1994 Ratings vs. One-Year Return and 1995 Rating. x-axis: 1995 rating (1-Star to 3-Star); y-axis: one-year return (%); one line per 1994 rating.]

Figures 3.2, 3.4 and 3.6 for the 3-Star rating system show that it was better to hold 2-Star rated funds that maintained their rating or improved to 3-Star than to own 3-Star funds that declined to 2-Star. The average one-year return on these declining funds was less than that of a 2-Star fund.

[Figure 3.5: 1995 Ratings and 1996 Ratings for the Five-Star System. Panel title: 1995 Ratings vs One-Year Average Return and 1996 Rating. x-axis: 1996 rating (1-Star to 5-Star); y-axis: one-year average return (%); one line per 1995 rating.]

[Figure 3.6: 1995 Ratings and 1996 Ratings for the Three-Star Rating System. Panel title: 1995 Ratings vs One-Year Return and 1996 Rating. x-axis: 1996 rating (1-Star to 3-Star); y-axis: one-year return (%); one line per 1995 rating.]

Based upon this evidence, for the period studied, it would have been interesting and profitable to have a prediction of the Morningstar mutual fund ratings one year in the future. The graphs indicate that over a one-year period the Morningstar rating goes up when average returns increase and declines when average returns go down. This is somewhat surprising given the 3-year, 5-year, and 10-year return data used by Morningstar in calculating its ratings and the moderating effect it should have on one-year rating changes.

3.2 Problem Specification

Our review indicates that there are two issues of interest associated with the Morningstar Mutual Fund rating system:

1) Due to the rapidly increasing number of mutual funds, Morningstar rates approximately half the mutual funds in their database because unrated funds do not have three years of financial data for Morningstar to calculate a rating. Classifying unrated funds could be useful information for investors wanting to know how these funds compare to Morningstar-rated mutual funds, and

2) Anecdotal evidence exists that investors buy mutual funds that will maintain or improve their Morningstar rating, meaning investors have high regard for the Morningstar rating system. The ability to predict Morningstar mutual fund ratings one year in advance would be useful information for planning an investment portfolio.

Based on the concept of performance persistence we would expect that mutual fund data could have information that would indicate a fund would continue to maintain or improve its Morningstar rating. It could be due to average returns remaining the same or improving, or a combination of features. Likewise, if the average return or these features decrease, we would expect the rating of the mutual fund to decrease.
We will determine the ability of C4.5 to classify mutual funds versus LDA and Logit to demonstrate that unrated funds can be classified with this technology. Success with classification by C4.5 will provide a foundation for our study of the prediction of Morningstar mutual fund ratings. Therefore, our research hypotheses are as follows:

1. C4.5 can classify mutual funds as well as LDA and Logit, and
2. C4.5 can predict mutual fund ratings changes one year in the future compared to an investment strategy of the fund maintaining the same rating one year in the future.

Figure 3.7 shows a map of the experimental design of this study. In Chapter 4, we compare the classification of mutual funds by C4.5 to LDA and Logit to test hypothesis 1. In Chapter 5, we perform experiments with C4.5 to predict mutual fund ratings and ratings changes one year hence to test hypothesis 2. We conducted these studies using two rating systems: the standard Morningstar 5-Star rating system; and a new 3-Star rating system based on merging 1-Star and 2-Star ratings, and the 4-Star and 5-Star ratings into two new ratings. The 3-Star system was designed to reduce the classification error caused by the small number of funds rated 1-Star and 5-Star.

Experimental Plan: Research Phases

Chapter 4: Classifying Funds with C4.5, LDA and Logit
  Phase 1: Classifying 1993 Funds
  Phase 2: Classifying with Derived Features
  Phase 3: Comparing 3-Star and 5-Star Classification
  Phase 4: Crossvalidation by C4.5 with Large Feature Vector

Chapter 5: Predicting Fund Ratings with C4.5
  Phase 5: Predicting with a Common Feature Vector
  Phase 6: Predicting Matched Mutual Fund Ratings
  Phase 7: Predicting Unmatched Mutual Fund Ratings

Figure 3.7: Overview of Research Phases.

CHAPTER 4
CLASSIFICATION OF MUTUAL FUNDS BY RATINGS

The research design for this study divides into two parts.
In this chapter, we determined that C4.5 classifies mutual funds by their Morningstar rating as well as Linear Discriminant Analysis (LDA) and Logistic Regression (Logit). In Chapter 5, C4.5 predicted future Morningstar mutual fund ratings and ratings changes.

4.1 Research Goals

The research that we have conducted has several broad research goals:

1) Demonstrate the use of decision trees as a useful classification technique for mutual funds,
2) Improve our understanding of the domain, the Morningstar rating system, from the standpoint of the relationship between ratings and the features or attributes used to describe the mutual funds, and
3) Develop a knowledge base and several decision trees that can be used to predict mutual fund ratings.

4.2 Research Caveats

Before reviewing the research methodology of these phases it should be noted that C4.5, and also the stepwise forms of LDA and Logit in this research, use a greedy heuristic for selecting a feature to classify an example. As noted earlier in Chapter 2, a greedy algorithm always makes the choice that, at the moment, looks best. Greedy algorithms do not always yield optimal solutions but, for many problems, they often do.

Another caveat is that the decision tree that is generated by the algorithm to correctly classify examples in the training set is only one of the many possible trees that could classify better or worse (Quinlan, 1990). These two points mean that the order of feature selection in these three methods of classification is not a measure of their value to the classification process. Specifically, in the case of C4.5, more than one decision tree could exist that would classify the training set as well as the one selected by it. Nevertheless, if a feature is consistently selected in our samples we feel that this is an indication of its importance to the classification process.

4.3 Example Databases

Morningstar, Inc.
approved the use of their data for the experiments conducted as part of this research. We selected examples from the Morningstar Mutual Funds OnDisc CD-ROM for April 1993, July 1994, and July 1995, and the Morningstar Principia for July 1996. An example consists of various features describing a mutual fund and the Morningstar rating of 1-Star to 5-Star. A complete listing of the available features is in Appendix A. In each of the following phases we will list the features that made up the feature vector for each dataset.

4.4 Brief Overview of the Research Phases

Phase 1 consisted of classifying the April 1993 Morningstar data with C4.5, LDA, and Logit. In Phase 2 we added new features derived from the April 1993 data to determine if this improved classification. Phase 3 of this study used the July 1994 Morningstar database to study improvements to classification caused by increased sample size (more mutual funds now qualified for a Morningstar rating). We also consolidated the Morningstar rating system into three ratings instead of five ratings and explained the reason for doing this in Section 4.7. Phase 4 departed from comparing C4.5 with LDA and Logit and tested the ability of C4.5 to classify funds using a large number of features, fifty, by crossvalidation with the 5-Star and 3-Star rating systems. We also studied the effect of three-year features on classification. The results of this experiment were used to design the final three phases of our research presented in Chapter 5.

4.5 Phase 1 Classifying 1993 Funds

4.5.1 Methodology

In this phase, we performed three separate classifications of the April 1993 Morningstar data using C4.5, LDA, and Logit. First, we obtained examples of equity mutual funds with complete data from the selected database. This meant that the examples did not have missing feature values.
Equity mutual fund features used in the classification process were selected by two criteria: (1) minimizing the expected correlation among certain features since LDA required that the features not be highly correlated (Klecka, 1980), and (2) excluding total and average return features beyond the first year (three, five, and ten year values are used by Morningstar in calculating Morningstar return) to eliminate correlation with the Morningstar classification. We tested for correlation among features and Percentage Rank All Funds had a highly negative mean correlation of -0.85 with Total Return. The mean correlation was 0.64 for Total Return and Percentage Rank Funds by Objective. Percentage Rank Funds by Objective had a mean correlation of 0.75 with Percentage Rank All Funds. Thus, we used the Total Return feature in the dataset and excluded those correlating features. Factor analysis was not used to reduce this set to independent uncorrelated factors due to its underlying assumption that the variates are multivariate normal (Morrison, 1990). We show later in this chapter that most features are not univariate normal and, therefore, not multivariate normal. Manly (1995) considers univariate normality a minimum requirement before testing for multivariate normality. In addition, the data consisted of five classification groupings (the Morningstar stars). Finding a single orthogonal transformation such that the factors are simultaneously uncorrelated for the five groupings would require that the transformations be the same and differ only by the sampling error (Flury and Riedwyl, 1988). This would be very difficult to measure and would be of suspect value. C4.5 did not require an assumption about an underlying distribution of the data or the correlation between features (Quinlan, 1993). However, others (Han et al., 1996) have identified decreased classification accuracy of ID3, the C4.5 precursor, as correlation among explanatory variables increased. 
Therefore, the data were not ideal for any of the three classification systems.

The twenty-four continuous features used for this phase are presented in Table 4.1. We selected 555 equity mutual funds having no missing feature values from the Morningstar database. The funds included four investment objectives: Aggressive Growth, Equity-Income, Growth, and Growth-Income.

Table 4.1: Classification Features for Phase 1

Yield                    Return on Assets
Year-to-Date Return      Debt % Total Capitalization
3-Month Total Return     Median Market Capitalization
Year-1 Total Return      Cash %
Alpha                    Natural Resources Sector
Beta                     Industrial Products Sector
R-Squared                Consumer Durables Sector
Total Assets             Non-Durables Sector
Expense Ratio            Retail Trade Sector
Turnover                 Services Sector
P/E Ratio                Financial Services Sector
P/B Ratio                Manager Tenure

We used the procedure described by Weiss and Kulikowski (1991), referred to as the train-and-test paradigm, for estimating the true error rate and performance of the three systems in classifying the mutual funds. We randomly sampled the 555 funds using a uniform distribution to produce a training set of 370 examples and a testing set of 185 examples. We calculated the error rate according to the following formula:

$$\text{error rate} = \frac{\text{number of errors}}{\text{number of cases}}$$

We computed a lower bound for the size of the training set with twenty-four continuous features for a two-class decision tree induction sufficient to guarantee an error level, ε, within a specified degree of confidence, δ, for binary data (Kim and Koehler, 1995). With ε = 0.1 and δ = 0.01 the sample size required was 378. This would be appropriate because our data consisted of continuous features upon which C4.5 would perform binary splits. Our classification problem, however, had five classes, so we required five times as many examples for training (Langley, 1996, p. 31) or 1,892 examples using the above parameters.
However, we only had 370 examples and had to modify the error and confidence parameters to ε = 0.35 and δ = 0.115. This meant that given the 370 examples, we were 88.5% confident that the output of the algorithm was 65% correct for the five classes or star ratings.

We processed each of the twenty training datasets with the SAS stepwise discriminant analysis procedure, STEPDISC. A major assumption for this statistical procedure is that the features are multivariate normal with a common covariance matrix. Using the one-sample two-tailed Kolmogorov-Smirnov Test we determined that of the 22 features only Debt % of Total Capitalization (p = 0.396) and Return on Assets (p = 0.044) were univariate normally distributed and concluded that the data were not multivariate normal. The classification features determined by STEPDISC then were processed with the SAS DISCRIM procedure to compute the discriminant function based on the selected features. The result was a classification of the mutual funds into the five classes or Morningstar ratings for the training and testing sets. In addition, the SAS system produced a confusion matrix for each set of examples that compared the classification determined by the system to the actual Morningstar rating. We made an error count for each holdout set.

Next, we processed the training set with the SAS LOGISTIC procedure and this fitted a linear logistic regression model for ordinal response data by the method of maximum likelihood.
The approach selected for this procedure was the stepwise selection of features and, using the following equations, the system performed a probability calculation to determine the classification of each example:

\[ \operatorname{logit}(p) = \text{intercept} + \sum_i \text{parameter}_i \times \text{feature value}_i \]

where p was the probability of the classification and was defined by:

\[ p = \frac{e^{\operatorname{logit}(p)}}{1 + e^{\operatorname{logit}(p)}} \]

We considered the first rating classification with a probability greater than 50% to be the determined rating and we compared this to the actual Morningstar rating. A count was made of the misclassified examples to determine the classification errors of Logit. We determined a classification rating for every example for the training set and testing set and all were included in the error count.

C4.5 classified the training set by building the classification decision tree using the Gain Ratio Criterion, described in Section 2.3.2.1, and the test examples were then classified with the best tree. We used the default settings for C4.5 described in Quinlan (1993) with the exception that any test used in the classification tree must have at least two outcomes with a minimum of x examples. Quinlan (1993) recommends higher values for x for noisy data. We incremented this value in units between 5 and 25 for the experiments. The C4.5 program calculated the error rate for the training set and the testing set and then produced a Confusion Matrix of the classification of the test set by comparing it to the actual Morningstar rating from the example.

4.5.2 Results

Logit performed best with mean classification errors of 33.5% (62 mean errors per sample) over the twenty samples. C4.5 followed with 37.2% mean classification errors (69 mean errors per sample) and LDA had 39.6% mean classification errors (73 mean errors). Figure 4.1 shows the error rate for classification for each of the 20 samples (labeled A through T).
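The Logit decision rule described in the methodology above (compute a probability from the logit, then take the first rating whose probability exceeds 50%) can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes a cumulative-logit form with one intercept per rating boundary and a shared set of feature parameters, and every intercept and coefficient value shown is hypothetical rather than taken from the fitted SAS models.

```python
import math


def inverse_logit(z):
    """p = e^z / (1 + e^z), evaluated in a numerically stable way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classify(features, intercepts, coefficients, threshold=0.5):
    """Return the first rating whose probability exceeds the threshold.

    intercepts   -- one hypothetical intercept per rating boundary,
                    in increasing order
    coefficients -- hypothetical parameters, one per feature
    Ratings are numbered from 1; if no boundary exceeds the threshold,
    the last rating is returned.
    """
    linear = sum(b * x for b, x in zip(coefficients, features))
    for rating, intercept in enumerate(intercepts, start=1):
        if inverse_logit(intercept + linear) > threshold:
            return rating
    return len(intercepts) + 1


# Hypothetical model: four boundaries (five ratings), one feature.
example_intercepts = [-2.0, -1.0, 1.0, 2.0]
example_coefficients = [1.0]
```

With these illustrative numbers, a feature value of 0 is assigned the third rating, because the third boundary is the first whose probability exceeds one half.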
We performed an Analysis of Variance (ANOVA) test with the null hypothesis of equal means on the number of classification errors. With F = 30 (p = 0.0), for df1 = 2 and df2 = 57, we rejected the null hypothesis, meaning that the three methods did not classify the funds equivalently.

[Figure 4.1: C4.5, LDA, and Logit Classification Errors for April 1993 Morningstar Data. Errors are plotted for each sample, A through T.]

Table 4.2: C4.5 Classification Features for Phase 1.

Feature                        Frequency   Feature                          Frequency
Expense Ratio                  20          Year 1 Total Return              4
Alpha                          19          YTD Total Return                 4
Yield                          14          Cash %                           3
Median Market Capitalization   13          Debt % of Total Capitalization   3
Assets                         11          Industrial Products Sector       3
Return on Assets               10          Natural Resources Sector         3
R-Squared                      9           Nondurables Sector               3
P/B Ratio                      8           Manager Tenure                   2
Beta                           5           P/E Ratio                        2
Consumer Durables Sector       4           Retail Sector                    2
Turnover                       4           Service Sector                   2

In Table 4.2, we list the features selected by C4.5 for the classification process and the selection frequency. While frequency is not an absolute measure of classification importance, we gained useful knowledge about how often a feature was selected. On average, each decision tree consisted of 7.4 features with a median of 7.5 features.

LDA and Logit select each feature at most once, although either procedure may later discard a selected feature because of its stepwise nature. Tables 4.3 and 4.4 display not only the selection frequency of the features but also their position in the classification process. Both values, taken together, provide some indication of the classification importance of a feature to the 20 samples. For example, while the Services Sector has an average position of 6.9, it has a position standard deviation (σ) of 2.7 and it was used in only 7 samples. Year-1 Total Return, while having an Average Position of 7.2, has a much smaller σ and was used in classifying all 20 samples. Thus, we considered Year-1 Total Return to be more important to the classification process than the Services Sector feature. On the other hand, we would be suspicious of a classification starting out with Natural Resources Sector rather than Alpha, Beta, or Assets. LDA used a mean of 12.2 features to classify each sample (median = 12.0). Logit required the fewest features with a mean of 5.9 features for classifying the training set (median = 6.0).

Table 4.3: LDA Classification Features.

Feature                          Average Position   σ     Frequency
Alpha                            1.0                0.0   20
R-Squared                        2.0                0.0   20
Beta                             3.0                0.0   20
Assets                           4.8                2.1   20
Year 1 Total Return              7.2                1.9   20
Turnover                         6.3                1.9   19
Retail Sector                    7.6                1.3   16
Expense Ratio                    7.9                2.4   16
Return on Assets                 8.1                2.0   14
Cash %                           9.4                2.6   13
Debt % of Total Capitalization   8.9                2.5   10
Consumer Durables Sector         10.4               2.1   8
P/E Ratio                        11.0               1.3   8
Services Sector                  6.9                2.7   7
P/B Ratio                        9.4                3.2   7
Financial Sector                 12.4               1.5   5
Industrial Products Sector       9.8                2.6   4
Median Market Capitalization     11.3               3.3   4
Yield                            10.0               2.0   3
Nondurables Sector               11.3               0.6   3
YTD Total Return                 12.0               0.0   3
Natural Resources Sector         12.7               1.5   3

Table 4.4: Logit Classification Features.

Feature                          Average Position   σ     Frequency
Alpha                            1.0                0.0   20
R-Squared                        2.0                0.0   20
Beta                             3.0                0.0   20
Assets                           4.3                0.6   20
Expense Ratio                    5.2                0.8   13
Debt % of Total Capitalization   5.3                0.5   11
Median Market Capitalization     6.3                0.6   3
Financials Sector                6.0                0.0   3
Natural Resources Sector         8.0                1.4   2
P/E Ratio                        6.0                1.4   2
Return on Assets                 5.0                0.0   2
Yield                            5.0                0     1
Retail Sector                    8.0                0     1

A listing of Phase 1 features used for classification appears in Appendix B. In Table 4.5, we provide a consensus list of the features that the three programs selected consistently for classification. This listing represents the features most frequently selected by all three classification methodologies.
Table 4.5: Consensus List of Classification Features for Phase 1

Feature
Alpha
R-Squared
Beta
Assets
Expense Ratio
Debt % of Total Capitalization
Return on Assets

4.5.3 Conclusions

The results showed that Logit had fewer classification errors than C4.5 and LDA. Logit performed better than C4.5 in seventeen of the twenty samples. Also, the three classification algorithms used a very limited and comparable mix of the twenty-four features to classify the twenty testing sets. After reviewing the error and confidence factors, we concluded that a larger sample size could improve the performance of C4.5.

4.6 Phase 2 - 1993 Data with Derived Features

4.6.1 Methodology

The second phase of this study used the Morningstar April 1993 database to determine if new features, derived from the database, improved the classification of mutual funds. The derived features included approximating Morningstar Return minus Morningstar Risk (not published by Morningstar in 1993), reversing the weights on the approximation of Morningstar Return minus Morningstar Risk, and the Treynor Index. The Treynor Performance Index, T_p, is a measure of relative performance not calculated by Morningstar and is defined as follows:

\[ T_p = \frac{R_p - R_F}{\beta_p} \]

where R_p is the portfolio return, R_F is the risk-free rate of return, and β_p is the portfolio's beta or the nondiversifiable past risk. The Treynor Performance Index treats only that portion of a portfolio's historical risk that is important to investors, as estimated by β_p, and neglects any diversifiable risk (Radcliffe, 1994). T_p was calculated by subtracting the three-year Mean Treasury Bill rate from the Morningstar Three-year Annualized Return for the mutual fund, and dividing this value by the three-year β of the mutual fund in the Morningstar database.

We constructed the other new features also using data from Morningstar. The April 1993 Morningstar CD-ROM did not provide the actual Morningstar Return values but did provide Morningstar Risk.
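The Treynor Performance Index calculation described above reduces to a single arithmetic step, sketched here in Python. The function name and the sample inputs are hypothetical; they are not values from the Morningstar database.

```python
def treynor_index(portfolio_return, risk_free_rate, beta):
    """Treynor index: (portfolio return - risk-free rate) / beta.

    In the procedure described in the text, portfolio_return would be
    the three-year annualized return, risk_free_rate the three-year
    mean Treasury Bill rate, and beta the fund's three-year beta.
    """
    if beta == 0:
        raise ValueError("beta must be non-zero")
    return (portfolio_return - risk_free_rate) / beta


# Hypothetical fund: 12% annualized return, 4% T-bill rate, beta 1.25.
example_tp = treynor_index(12.0, 4.0, 1.25)
```

For these illustrative inputs, the excess return of 8 percentage points divided by a beta of 1.25 gives an index of 6.4.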
To approximate Morningstar Return the 3-year, 5-year, and 10-year Average Returns were used as surrogates in the Morningstar Return minus Risk formulas. The difference between them and the actual Morningstar Return for those periods was that Morningstar deducted the effects of mutual fund loads and redemption fees from these average returns. Morningstar made assumptions about the fund loads and redemption fees that would make it very difficult for anyone to construct a precise determination of the load-adjusted return.

In developing the Reversed Weight feature, we reversed the weights used by Morningstar in their Risk-Adjusted Return formulas and applied these to the calculation of five-year and ten-year Return minus Risk data. For five-year old mutual funds, Morningstar used 40% of the three-year value and 60% of the five-year value and we reversed them. For ten-year old mutual funds, Morningstar used 20% of the three-year data, 30% of the five-year data, and 50% of the ten-year data. We reversed these weights to 50% for three-year data, 30% for five-year data, and 20% for ten-year data.

We prepared a dataset consisting of 20 random samples. By increasing the number of Investment Objectives from the four used in Phase 1 to the twelve domestic equity investment objectives we were able to increase the total number of examples to 784. The Regular Dataset had 23 features listed in Table 4.6. We constructed a similar dataset, including the three derived features or 26 features, and referred to it as the Derived Dataset. This provided a training set of 523 examples and a testing set of 261 examples. A lower bound for the size of the 5-Star training set, using the method of Phase 1, was calculated at 523 examples for the Regular Dataset with ε = 0.35 and δ = 0.078, and 523 examples for the Derived Dataset with ε = 0.35 and δ = 0.111. The measurement of interest for this Phase was the fewest classification errors using C4.5, LDA, and Logit.
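The weight reversal used for the Reversed Weight feature can be sketched in Python as below. The weights are those stated in the text; the dictionary layout, the function names, and the sample Return minus Risk values are assumptions for illustration, and the sketch does not reproduce Morningstar's load adjustments.

```python
# Weights by fund age (years), keyed by the period (years) they apply to,
# as stated in the text for Morningstar's risk-adjusted formulas.
MORNINGSTAR_WEIGHTS = {
    5: {3: 0.40, 5: 0.60},
    10: {3: 0.20, 5: 0.30, 10: 0.50},
}


def reverse_weights(weights):
    """Reverse the order of the weights across the periods."""
    periods = sorted(weights)
    values = [weights[p] for p in periods]
    return dict(zip(periods, reversed(values)))


def weighted_score(value_by_period, weights):
    """Weighted sum of per-period Return minus Risk values."""
    return sum(weights[p] * value_by_period[p] for p in weights)


REVERSED_WEIGHTS = {age: reverse_weights(w)
                    for age, w in MORNINGSTAR_WEIGHTS.items()}

# Hypothetical Return minus Risk values for a ten-year-old fund.
sample = {3: 1.0, 5: 2.0, 10: 4.0}
regular = weighted_score(sample, MORNINGSTAR_WEIGHTS[10])
reversed_score = weighted_score(sample, REVERSED_WEIGHTS[10])
```

The reversed scheme gives the most recent three-year period the largest weight, which is the contrast with the published weighting that the derived feature was meant to test.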
We processed the training sets and testing sets through the same SAS procedures used in Phase 1, as well as C4.5. Table 4.6 lists the features common to the Regular and Derived datasets.

Table 4.6: Phase 2 Common Features for Classification by C4.5, LDA, and Logit.

Yield                  Return on Assets
YTD Return             Debt % Total Capitalization
Year 1 Total Return    Median Mkt. Capitalization
Alpha                  Cash % of Holdings
Beta                   Natural Resources Sector
R-Squared              Industrial Products Sector
Total Assets           Consumer Durables Sector
Expense Ratio          Non-Durables Sector
Turnover               Retail Trade Sector
P/E Ratio              Services Sector
P/B Ratio              Financial Services Sector
Manager Tenure

4.6.2 Results for the Regular Dataset

The mean classification errors over the twenty samples for the Regular Dataset were as follows: C4.5 was 36.6% (95.6 mean errors), LDA was 37.7% (98.5 mean errors), and Logit was 35.8% (93.4 mean errors). Figure 4.2 shows the errors by sample. Logit had fewer errors than C4.5 in fifteen samples. However, an ANOVA test with a null hypothesis of equal means was performed on the number of errors and calculated F = 1.35 (p = 0.267) for df1 = 2 and df2 = 57. Thus, we failed to reject the null hypothesis and we assumed the three methods classified the mutual funds equivalently.

[Figure 4.2: C4.5, LDA, and Logit Classification Errors for 1993 Morningstar Data for the Regular Dataset. Errors are plotted for each sample, A through T.]

The features selected by C4.5 and their frequency are displayed in Table 4.7. C4.5 used an average of 8.9 features per sample with a median of 8.0 features.

Table 4.7: C4.5 Feature Selection for the Regular Dataset.
Feature                          Frequency   Feature                      Frequency
Alpha                            20          Natural Resources Sector     6
Assets                           20          Service Sector               5
Industrial Products Sector       16          Cash %                       4
Median Market Capitalization     13          Turnover                     4
R-Squared                        13          YTD Total Return             4
SEC Yield                        13          Beta                         3
Expense Ratio                    12          Consumer Durables Sector     3
Return on Assets                 10          P/B Ratio                    3
Debt % of Total Capitalization   9           Financial Sector             1
P/E Ratio                        8           Manager Tenure               1
Year 1 Total Return              8           Nondurables                  1
                                             Retail Sector                1

Table 4.8: LDA Feature Selection for the Regular Dataset.

Features                         Average Position   σ     Frequency
Alpha                            1.0                0.0   20
R-Squared                        2.0                0.0   20
Beta                             3.0                0.0   20
Assets                           5.0                1.0   20
Industrial Products Sector       5.7                1.7   20
Debt % of Total Capitalization   7.7                2.7   20
YTD Total Return                 7.9                2.7   18
P/B Ratio                        8.1                2.4   14
Retail Sector                    10.0               1.8   14
Cash %                           11.0               2.8   14
Year 1 Total Return              11.0               2.3   14
Turnover                         9.2                1.8   13
P/E Ratio                        11.0               4.0   10
Expense Ratio                    11.0               3.4   10
Finance Sector                   11.0               3.2   9
Service Sector                   11.0               2.6   8
Return on Assets                 13.0               3.3   8
Consumer Durables                12.0               2.3   7
Natural Resources Sector         5.8                3.1   6
Manager Tenure                   14.0               1.2   6
SEC Yield                        12.0               2.1   3
Non-Durables Sector              13.0               0     1

Table 4.8 summarizes the feature selection positioning, the standard deviation of the position (σ), and the frequency with which LDA used the feature for classification. LDA required an average of 13 features to classify the samples.

The features most frequently selected by Logit are listed in Table 4.9. Alpha, R-Squared, Beta, and Assets were selected for every sample. Logit used an average of 7.3 features to classify the 20 samples.

Table 4.9: Logit Feature Selection for the Regular Dataset.
Features                         Average Position   σ     Frequency
Alpha                            1.0                0.0   20
R-Squared                        2.0                0.0   20
Beta                             3.0                0.0   20
Assets                           4.6                0.9   20
YTD Total Return                 6.2                0.6   13
Return on Assets                 5.5                1.4   11
Debt % of Total Capitalization   4.8                0.6   10
Consumer Durables Sector         6.4                1.6   8
Expense Ratio                    5.7                0.8   6
Industrial Products Sector       6.5                1.3   4
Manager Tenure                   7.3                0.6   3
P/B Ratio                        7.7                0.6   3
P/E Ratio                        6.0                1.4   2
Retail Sector                    7.0                1.4   2
Median Market Capitalization     7.0                      1
SEC Yield                        8.0                      1
Year 1 Total Return              8.0                      1

Table 4.10: Consensus List for Regular Features.

Features
Alpha
R-Squared
Assets
Beta
YTD Total Return

Table 4.10 lists the consensus features used with high regularity by the three methodologies classifying the Regular Dataset. There was little agreement beyond these five features.

4.6.3 Results for the Derived Features Dataset

C4.5 had a mean classification error rate of 20.9% (54.6 mean errors), LDA performed with a mean error rate of 26.7% (69.7 mean errors) and Logit had a mean error rate of 20.5% (53.6 mean errors). Figure 4.3 shows the classification errors for the samples. While Logit had the fewest errors, its advantage over C4.5 was not statistically significant. We performed an ANOVA test with a null hypothesis of equal means for the twenty samples and calculated F = 52.7 (p = 0.0) for df1 = 2 and df2 = 57, and we rejected the null hypothesis. A separate t-test with a null hypothesis of equal means for C4.5 vs. Logit had a p-value of 0.40, so we failed to reject that null hypothesis and considered the mean errors of C4.5 and Logit to be equal. Therefore, C4.5 and Logit classified the mutual funds equally well and LDA performed the worst.

[Figure 4.3: C4.5, LDA, and Logit Classification Errors with Derived Features. Errors are plotted for each sample, A through T.]

The features most frequently selected by C4.5 are listed in Table 4.11. On average, the number of features used for classification declined from the 8.9 used for the Regular Dataset to 4.4, with a median of 4.0. The Treynor Index was only selected twice.
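The ANOVA comparisons reported in this chapter (three methods, twenty samples each, hence 2 and 57 degrees of freedom) rest on the one-way F statistic, which can be sketched in plain Python as follows. The function name and the toy error counts are assumptions for illustration; the p-value, which requires the F distribution, is deliberately left out of this sketch.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups -- a list of lists, one list of error counts per method.
    F is the between-group mean square divided by the within-group
    mean square.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data only: three methods with three samples each.
toy_f, toy_df1, toy_df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

With three groups of twenty samples, the same function returns degrees of freedom of 2 and 57, matching the values quoted with each F statistic in the text.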
Table 4.11: C4.5 Derived Feature Selection.

Feature                             Frequency   Feature                        Frequency
Return minus Risk                   19          Median Market Capitalization   2
Assets                              17          R-Square                       2
SEC Yield                           9           Treynor Index                  2
Reversed Weight Return minus Risk   5           Cash %                         1
Alpha                               4           Manager Tenure                 1
Beta                                4           Nondurables Sector             1
Debt % of Total Capitalization      4           P/B Ratio                      1
Consumer Durables Sector            3           P/E Ratio                      1
Turnover                            3           Retail Sector                  1
Expense Ratio                       2           Services Sector                1
Finance Sector                      2           Year 1 Total Return            1
Industrial Sector                   2

In Table 4.12 we see that LDA used more features to classify these datasets than did C4.5 and Logit. Fifteen features (mean and median) were required for classification. Surprisingly, it even used the derived Reversed-Weight Return minus Risk feature in all twenty samples. The Treynor Index was used for 12 samples.

Table 4.13 shows that Logit used a mean of 4.8 features for classification (median = 4.5). Table 4.14 lists the consensus features used with regularity by the three classification methodologies. Return minus Risk was the most commonly used feature and dominated the selection of other features. Both C4.5 and Logit had few features that were used in more than ten samples. LDA used many more features with poor results.

Table 4.12: LDA Derived Feature Selection.
Features                            Average Position   σ     Frequency
Return minus Risk                   1.1                0.2   20
R-Squared                           2.0                0.2   20
Industrial Products Sector          3.8                1.5   20
Assets                              5.1                1.6   20
Alpha                               6.4                2.2   20
Reversed-Weight Return minus Risk   6.9                2.0   20
Beta                                7.8                1.8   20
Expense Ratio                       9.6                2.7   19
Debt % of Total Capitalization      11                 2.6   18
Return on Assets                    8.7                3.2   17
Turnover                            9.4                2.3   14
Cash %                              13                 3.6   12
Treynor Index                       10.0               2.9   12
Consumer Durables Sector            13                 2.1   11
Manager Tenure                      14                 1.4   10
Retail Sector                       12                 3.0   9
P/B Ratio                           12                 3.9   8
YTD Total Return                    6.6                5.3   7
Service Sector                      13                 3.5   7
Yield                               12                 4.2   5
Year 1 Total Return                 13                 2.5   5
Natural Resources Sector            14                 2.5   5
P/E Ratio                           15                 0.8   4
Median Market Capitalization        15                 6.4   2
Finance Sector                      15                 2.8   2
Non-Durable Sector                  13                       1

Table 4.13: Logit Derived Feature Selection.

Features                         Average Position   σ     Frequency
Return minus Risk                1.0                0.0   20
Beta                             2.2                0.5   19
Manager Tenure                   3.7                1.1   14
Assets                           3.1                0.8   11
Natural Resources Sector         4.4                0.9   11
Expense Ratio                    4.1                0.9   9
R-Squared                        4.5                0.6   4
Consumer Durables Sector         6.0                1.0   3
Alpha                            6.0                      1
Retail Sector                    5.0                      1
Debt % of Total Capitalization   4.0                      1
P/B Ratio                        2.0                      1

Table 4.14: Consensus List for Derived Features

Features
Return minus Risk
Assets
Beta
Manager Tenure

4.6.4 Conclusions

The classification features for each sample are in Appendix C. A review of the features selected by C4.5 and Logit showed that a small number of the 23 available features were used to classify the Regular Dataset and the Derived Dataset. Increased sample size reduced the classification errors of C4.5 to where it was equivalent to Logit.

Using the derived feature of Return minus Risk resulted in improved classification versus the Regular Dataset. We observed that when this feature was present in the feature vector, C4.5 and Logit used substantially fewer features for classification. The derived features of Reversed Weight Return minus Risk and the Treynor Index were not selected for classification by Logit and were seldom used by C4.5.
The interesting result of the second phase was the disappointing performance of LDA with the derived features. We concluded that the derived features introduced either more noise than acceptable or a degree of nonlinearity, or that LDA was affected by high correlation among features. Since we had violated the constraints of multivariate normality and multicollinearity for this classification methodology, it was impossible to determine the exact cause of this failure.

4.7 Phase 3 - Comparing 5-Star and 3-Star Classification

4.7.1 Methodology

Phase 3 of this research tested the idea that we could improve, i.e., lower, classification error rates by combining the five Morningstar ratings into three using the following scheme: (1) Morningstar ratings 1-Star and 2-Star became new rating 1-Star, (2) Morningstar rating 3-Star became new rating 2-Star, and (3) Morningstar ratings 4-Star and 5-Star became new rating 3-Star. We proposed this variation of the standard Morningstar rating system to increase the number of examples at each end of the Morningstar scale. An examination of the data from the previous phases showed that a higher error rate was occurring in 1-Star and 5-Star Morningstar ratings than for 2-Star, 3-Star, or 4-Star. Morningstar rating classifications 1-Star and 5-Star each represented 10% or less of the examples in the datasets. Additionally, previously cited material showed that investors mostly purchased 4- and 5-Star mutual funds. Combining the individual ratings in this manner still permitted segmenting the 4- and 5-Star funds from the funds with lower ratings.

Minitab was used to generate ten uniformly random training sets and testing sets from the July 1994 Morningstar CD-ROM database. Thirty-two features were selected for the classification problem and are listed in Table 4.15. The number of complete examples extracted from the equity mutual funds database was 999, resulting in 666 training set examples and 333 testing set examples.
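The three-class regrouping scheme described above amounts to a simple lookup, sketched here in Python. The mapping follows the scheme stated in the text; the names and the sample list of ratings are hypothetical.

```python
from collections import Counter

# Five-star Morningstar rating -> collapsed three-star rating.
FIVE_TO_THREE = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}


def collapse_rating(stars):
    """Map a 1-5 star rating onto the three-class scheme."""
    if stars not in FIVE_TO_THREE:
        raise ValueError("rating must be an integer from 1 to 5")
    return FIVE_TO_THREE[stars]


# Hypothetical ratings, only to show the end classes gaining examples.
sample_ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5]
collapsed = [collapse_rating(s) for s in sample_ratings]
class_counts = Counter(collapsed)
```

In this toy list the sparse 1-Star and 5-Star classes are absorbed into end classes of three examples each, which is the effect the regrouping was intended to have on the real datasets.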
We calculated a lower bound on the required number of training examples. For ε = 0.30 and δ = 0.13 we required 669 training examples for the three-class experiment. With ε = 0.30 and δ = 0.1925 we required 668 training examples for the five-class experiment.
TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT

CHAPTERS

1 INTRODUCTION
  1.1 Background
  1.2 Research Problem
  1.3 Purpose
  1.4 Motivation
  1.5 Chapter Organization

2 LITERATURE REVIEW
  2.1 Historical Overview of Machine Learning
    2.1.1 Brief History of AI Research on Learning
    2.1.2 Four Perspectives on Learning
  2.2 AI and Financial Applications
    2.2.1 Expert Systems
    2.2.2 Neural Networks and Financial Applications
    2.2.3 Genetic Algorithms and Financial Applications
  2.3 The C4.5 Learning System
    2.3.1 Brief History of C4.5
    2.3.2 C4.5 Algorithms
    2.3.3 Limitations of C4.5
    2.3.4 C4.5 Financial Applications
  2.4 Linear Discriminant Analysis
    2.4.1 Overview
    2.4.2 Limitations of LDA
  2.5 Logistic Regression (Logit)
  2.6 Summary

3 DOMAIN PROBLEM
  3.1 Overview of Mutual Fund Ratings Systems
    3.1.1 Morningstar, Inc. Overview
    3.1.2 Morningstar Rating System
    3.1.3 Review and Criticism of the Morningstar Rating System
    3.1.4 Investment Managers Use of Ratings
    3.1.5 Performance Persistence in Mutual Funds
    3.1.6 Review of Yearly Variation of Morningstar Ratings
  3.2 Problem Specification

4 CLASSIFICATION OF MUTUAL FUNDS BY RATINGS
  4.1 Research Goals
  4.2 Research Caveats
  4.3 Example Databases
  4.4 Brief Overview of the Research Phases
  4.5 Phase 1 - Classifying 1993 Funds
    4.5.1 Methodology
    4.5.2 Results
    4.5.3 Conclusions
  4.6 Phase 2 - 1993 Data with Derived Features
    4.6.1 Methodology
    4.6.2 Results for the Regular Dataset
    4.6.3 Results for the Derived Features Dataset
    4.6.4 Conclusions
  4.7 Phase 3 - Comparing 5-Star and 3-Star Classifications
    4.7.1 Methodology
    4.7.2 Results
    4.7.3 Conclusions
  4.8 Phase 4 - Crossvalidation with C4.5
    4.8.1 Methodology
    4.8.2 Results
    4.8.3 Conclusions
  4.9 Overall Summary

5 PREDICTION OF MUTUAL FUND RATINGS AND RATINGS CHANGES
  5.1 Phase 5 - Predicting Ratings with a Common Feature Vector Over Two Years
    5.1.1 Methodology
    5.1.2 Results
    5.1.3 Conclusions
  5.2 Phase 6 - Predicting Matched Mutual Fund Rating Changes
    5.2.1 Methodology
    5.2.2 Results for 1994 Data Predicting 1995 Ratings
    5.2.3 Results for 1995 Data Predicting 1996 Ratings
    5.2.4 Conclusions
  5.3 Phase 7 - Predicting Unmatched Mutual Fund Ratings
    5.3.1 Methodology
    5.3.2 Results for 1994 Data Predicting 1995 Ratings
    5.3.3 Results for 1995 Data Predicting 1996 Ratings
    5.3.4 Conclusions
  5.4 Overall Summary

6 SUMMARY AND FUTURE RESEARCH

APPENDICES
  A DESCRIPTION OF MUTUAL FUND FEATURES
  B PHASE 1 CLASSIFICATION FEATURES
  C PHASE 2 CLASSIFICATION FEATURES
  D PHASE 3 CLASSIFICATION FEATURES
  E BEST CLASSIFICATION TREES FROM PHASES 1-4

REFERENCES
BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

AUTOMATED KNOWLEDGE ACQUISITION USING INDUCTIVE LEARNING: APPLICATION TO MUTUAL FUND CLASSIFICATION

By

Robert Clayton Norris, Jr.

December 1997

Chairman: Dr. Gary J. Koehler
Major Department: Decision and Information Sciences

This research uses an inductive learning methodology that builds decision trees, Quinlan's C4.5, to classify mutual funds according to the Morningstar Mutual Fund five class or star rating system and predict the mutual fund ratings one year in the future. In the first part of the research, I compare the performance of C4.5, Logistic Regression and Linear Discriminant Analysis in classifying mutual funds according to the Morningstar rating system. As the size of the training set increases, so does C4.5's performance versus the two statistical methods. Overall, C4.5 performed as well as Logistic Regression in classifying the mutual funds and outperformed Linear Discriminant Analysis.
This part of the research also explored the ability of C4.5 to classify equity mutual funds that were unrated by Morningstar. The results suggested that, with the proper features and a modification to the Morningstar five class rating system to three classes, unrated mutual funds could be classified with a 30% error.

Anecdotal evidence suggested that investors purchase mutual funds by ratings and have an expectation that the rating will stay the same or improve. The second part of the research used a training set of one year to construct a decision tree with C4.5 to predict the ratings of mutual funds one year in the future. The testing set consisted of examples from the prediction year in question and the predictions were compared to the actual ratings for that year. The results were that, with the necessary feature vector, five-star fund ratings could be predicted with 65% accuracy. With a modification to the rating system changing it to three stars, predicted mutual fund ratings were 75% accurate.

This research also identifies features that are useful for classifying mutual funds by the Morningstar rating system and for the prediction of fund ratings.

CHAPTER 1
INTRODUCTION

1.1 Background

Glancing through a copy of Technical Analysis of Stocks & Commodities magazine, you find a great deal of information about artificial intelligence (AI) and the selection of stocks and commodities for portfolios. In the February 1997 issue of the magazine, the Traders' Glossary even defines the term neural network. However, it is difficult to find scientific research about AI systems used on Wall Street since usually they are proprietary and could provide a competitive advantage to the investment firm.
Five years ago AI use in financial applications was just beginning to be noticed. For example, consider this story in the Wall Street Journal of October 27, 1992 about the use of artificial intelligence to select stocks for a mutual fund portfolio:

"Bradford Lewis, manager of Fidelity Investment Inc.'s Fidelity Disciplined Equity Fund, has found a way to out-perform standard indices using neural network software. A neural network is an artificial intelligence program that copies the workings of the human brain. The mutual fund, which invests in the same businesses as the Standard and Poor's 500 stock index (S & P 500), has won over the index by 2.3 to 5.6 percent for three years running and is maintaining its performance in FY 1993 . . . Lewis checks with analysts at Fidelity to double-check his results, but sometimes when he buys stock contrary to the computer's advice, he loses money." (McGough, 1992, p. C1)

Academic research concerning mutual funds and artificial intelligence is relatively new since only three studies were cited in the literature from 1986 to the present. Chiang et al. (1996) described a neural network that used historical economic information to predict the end-of-year Net Asset Value and that outperformed regression models. A second study (Lettau, 1994) used genetic algorithms to simulate the actions of investors buying and selling mutual funds. The third paper studied portfolio decisions of boundedly rational agents by modeling learning with a Genetic Algorithm (Lettau, 1997) and had nothing to do with mutual fund ratings, the topic of this research.

The difficulty of conducting research about working AI systems requires researchers to design systems for study that would be of interest to the investor and practitioner. This leaves much room for applied research to develop systems that could, for example, classify mutual funds according to one of several popular rating systems.
If we could classify mutual funds according to a rating system, why not go one step further and try to predict the ratings over some fixed horizon? Classification and prediction of mutual fund ratings are the essence of our research with an inductive learning program called C4.5, the direct descendant of ID3 (Quinlan, 1993, p. vii).

Machine learning (ML) is a branch of artificial intelligence concerned with automated knowledge acquisition, and inductive learning is a strategy of ML that seeks to produce generally applicable rules from the examination of a number of examples (Trippi and Lee, 1996). Since inductively generated rules could be used for classification problems, a common concern is the performance relative to other existing models for classification, such as linear discriminant analysis (LDA) and logistic regression (Logit). Unlike LDA and Logit, inductive learning makes no a priori assumptions about forms of the distribution of data (e.g., a normal distribution) or the relationship between the features or variables describing each mutual fund. C4.5 builds decision trees to classify the examples, which consist of the features and an indicator function for class membership. It then uses the best decision tree produced using a training set of examples to classify unseen examples in a testing set to determine the accuracy of the tree. C4.5 also has the ability to translate the decision tree into a ruleset that could be used as an alternative means for classification.

We have selected the C4.5 decision tree generating program for this research for several reasons. First, as our literature review will show, decision tree programs, specifically ID3 and its successors, have been used for a variety of financial applications over 10 years (Braun and Chandler, 1987) and good results were achieved. Second, decision trees provide practitioners with a way of understanding how an individual example was classified.
They can start at the root and see the values used for the features in partitioning the examples into their respective classes. The decision tree may be complex and difficult to understand, but each example can be explained with it. Other AI programs used for financial applications do not have this capability. Neural networks inherently lack explanatory capability (Trippi and Lee, 1996). Genetic algorithms are complex systems that are considered hard to design and analyze (Goldberg, 1994). Third, C4.5 processes discrete, continuous, and nominal valued features without transformation, while this is not possible for neural networks and genetic algorithms.

1.2 Research Problem

The number of common stocks far exceeded the number of mutual funds for many years after the enactment of the Investment Company Act of 1940. After all, the acquisition of mutual funds was a way of selecting a professional portfolio manager to pick stocks for you and, by 1976, there were 452 funds. Today we have the situation where the number of equity and bond funds exceeds the number of stocks on the New York Stock Exchange. For many investors, selecting a fund has become a difficult decision. Over the years rating services have appeared to aid investors in deciding which mutual funds to buy and sell. There is contemporary evidence, presented later in this study, that investors appear to be buying mutual funds based on the ratings. This occurs despite the rating services' disclaimer that their ratings evaluate historical performance and are not a predictor of future performance.

We also found that the rating services do not rate all the mutual funds. Morningstar, the major mutual fund rating service, does not evaluate a mutual fund for a rating unless it has a three-year financial history. For example, the June 30, 1996, Morningstar Principia CD-ROM showed Morningstar was tracking 3,794 equity mutual funds but had rated only 1,848 funds, or less than half.
Chapter 3 discusses how Morningstar rates mutual funds.

Being able to classify equity mutual funds according to the rules of a rating system would be an important ability. There may be relationships among the variety of data, financial and otherwise, that would permit the classification of mutual funds not yet rated by Morningstar. In addition, if we could classify mutual funds with a high degree of accuracy to a known rating system, there could be relationships among the data that permit predicting the future rating of a mutual fund already rated by Morningstar. In developing a process to classify mutual funds and predict their ratings, we could also automate the identification of important features that aid the classification process. In addition, we could use decision trees to develop a knowledge base of the relationships between these features.

This investigation of inductive learning using decision trees for classification and prediction consists of two parts. The first part evaluates the performance of C4.5 in classifying mutual funds against two statistical techniques used for classification in the field of finance, LDA and Logit. The second part evaluates the ability of C4.5 to predict mutual fund ratings by comparing the performance of C4.5 to actual ratings and conducting statistical tests of goodness-of-fit on the predicted and actual ratings distributions. The results are analyzed to gain insights into the relationships among the data used for classification and prediction. The benefits of this research could be extended to studying other mutual fund rating systems and other types of mutual funds such as bond and hybrid funds.

This problem is of interest because the domain theory is not well developed. In such problems, data-driven search techniques, such as inductive learning, have been found to provide particularly good results.

1.3 Purpose

The approach to this study is empirical.
Beginning with the research problem, a number of experiments were designed using an AI methodology and current statistical classification techniques to find a solution to the classification of mutual funds and the prediction of their ratings. The major goals of the study are as follows:

• Demonstrate the relevance of inductive learning techniques in solving a real world problem.
• Investigate relationships in the domain data that could contribute to understanding the rating system.
• Investigate the application of an existing methodology to a new domain.

1.4 Motivation

Interest in the use of AI techniques in various business domains has grown very rapidly, and this is evident in the field of investment analysis. We have already identified the use of neural networks in selecting portfolios for mutual funds and predicting the end-of-year Net Asset Value. Additionally, traditional statistical methods are often used in conjunction with, or in competition with, AI techniques. However, statistical methods rely upon assumptions about the underlying distribution of the data that are usually ignored or assumed away. AI methodologies make no such assumptions and can be applied without invalidating the model. Exploring the application of induction to mutual fund classification and prediction will prove to be very useful and provide an area of research for strategic and competitive use of artificial intelligence in information systems.

1.5 Chapter Organization

Chapter 2 of this study reviews the literature on artificial intelligence and its financial applications. The chapter begins with an overview of artificial intelligence and machine learning research and a review of AI and financial applications, followed by definitions of learning and other formal concepts. Then we discuss the induction process of decision trees for classification and prediction. The chapter ends with a discussion of LDA and Logit classification.

Chapter 3 provides an overview of the problem domain.
It reviews mutual fund rating systems, followed by an analysis of the problem, the hypotheses to be tested, and the benefits of the research effort. This chapter ends with a figure mapping out the experimental design.

Chapters 4 and 5 provide details of the experimental methodology, results and analysis, and the conclusions we draw from the results. Chapter 4 involves classifying mutual funds using C4.5, LDA, and Logit. Chapter 5 involves predicting mutual fund ratings one year in the future using C4.5. Finally, Chapter 6 provides a summary and conclusion of the research and discusses extensions of this work.

CHAPTER 2
LITERATURE REVIEW

2.1 Historical Overview of Machine Learning

The field of machine learning concerns computer programs that can imitate the learning behavior of humans (Natarajan, 1991). Learning is the improvement of performance in some environment through the acquisition of knowledge resulting from experience in that environment (Langley, 1996). From the very beginning of artificial intelligence, researchers have sought to understand the process of learning and to create computer programs that can learn (Cohen and Feigenbaum, 1982). Two reasons for studying learning are to understand the process and to provide computers with the ability to learn.

2.1.1 Brief History of AI Research on Learning

AI research on learning started in the late 1950s with work on self-organizing systems that modified themselves to adapt to their environments. Through the use of feedback and a given set of stimuli, the researchers thought the systems would evolve. Most of these first attempts did not produce systems of any complexity or intelligence (Cohen and Feigenbaum, 1982).

In the 1960s, AI research turned to knowledge-based problem solving and natural language understanding. Workers adopted the view that learning is a complex and difficult process, and that a learning system could not learn high-level concepts by starting without any knowledge at all.
This viewpoint resulted in some researchers studying simple problems in great detail and led others to incorporate large amounts of domain knowledge into learning systems so they could explore high-level concepts (Cohen and Feigenbaum, 1982).

A third stage of learning research, searching for ways to acquire knowledge for expert systems, focuses on all forms of learning, including advice-taking and learning from analogies. This stage began in earnest in the late 1970s (Feigenbaum et al., 1988).

An expert system imitates the intellectual activities that make a human an expert in an area such as financial applications. The key elements of a traditional expert system are a user interface, a knowledge base, an inference engine, an explanation capability, and a knowledge acquisition system (Trippi and Lee, 1996). The knowledge base consists of facts, rules, and heuristic knowledge supplied by an expert who may be assisted in this task by a knowledge engineer. Knowledge representation formalizes and organizes the knowledge using IF-THEN production rules (Feigenbaum et al., 1988). Other representations of knowledge, such as frames, may be used.

Figure 2.1: Basic Structure of an Expert System.

The inference engine uses the knowledge base plus facts provided by the user to draw inferences in making a recommendation. The system can chain the IF-THEN rules together from a set of initial conditions moving to a conclusion. This approach to problem solving is called forward chaining. If the conclusion is known but the path to that conclusion is not known, then reasoning backwards, or backward chaining, is used (Feigenbaum et al., 1988).

Because an expert system uses uncertain or heuristic knowledge, its credibility is often in question. The explanation capability is available to explain to the user how a particular fact was inferred or why a particular question was asked. This capability can be used to find incorrect rules in the knowledge base (Feigenbaum et al., 1988).
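The forward and backward chaining described above can be illustrated with a minimal sketch. The rule base, the fact names, and the function names below are hypothetical and chosen only for illustration; they are not taken from any system cited here, and the backward chainer assumes the rules contain no cycles.

```python
# Hypothetical IF-THEN rules: (set of premises, conclusion). Illustrative only.
RULES = [
    ({"high_return", "low_risk"}, "strong_fund"),
    ({"strong_fund", "low_expenses"}, "recommend_buy"),
]


def forward_chain(facts, rules):
    """Start from the known facts and fire rules until nothing new is inferred."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all of its premises are already known.
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def backward_chain(goal, facts, rules):
    """Start from the goal and work backwards to the supplied facts.

    Assumes an acyclic rule base; cyclic rules would recurse forever.
    """
    if goal in facts:
        return True
    return any(
        all(backward_chain(p, facts, rules) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )


user_facts = {"high_return", "low_risk", "low_expenses"}
inferred = forward_chain(user_facts, RULES)
goal_reached = backward_chain("recommend_buy", user_facts, RULES)
```

Forward chaining derives every conclusion reachable from the user's facts, while backward chaining only explores the rules that could establish the stated goal.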
2.1.2 Four Perspectives on Learning

With this brief overview of AI research and learning, it is now important to turn to the four perspectives on learning itself. Simon defined learning "as any process by which a system improves its performance" (Cohen and Feigenbaum, 1982, p. 326). This assumes that the system has a task that it is attempting to perform and that it may improve its performance in two ways: applying new methods and knowledge, or improving existing methods and knowledge.

Expert systems researchers take a more limited view of learning by saying it is "the acquisition of explicit knowledge" (Cohen and Feigenbaum, 1982). Expert systems usually represent knowledge as a collection of rules, and this viewpoint means that acquired knowledge should be explicit so that it can be verified, modified, and explained.

A third view is that learning is skill acquisition. Researchers in AI and cognitive psychology have sought to understand the kinds of knowledge needed to perform a task skillfully.

A fourth view of learning comes from the collective fields of science and focuses on theory formation, hypothesis formation, and inductive inference.

Simon's perspective of learning has been the most useful for machine learning development, and Cohen and Feigenbaum (1982) have modeled a learning system consisting of the environment, a learning element, a knowledge base, and a performance element.

Figure 2.2: A Simple Model of Learning Systems.

The environment supplies some information to the learning element, the learning element uses this information to make improvements in an explicit knowledge base, and the performance element uses the knowledge base to perform its task. Information gained during attempts to perform the task can serve as feedback to the learning element. This simple model allows us to classify learning systems according to how they fit into these four functional elements.
From these four perspectives and with the availability of a learning model, AI researchers have developed four learning situations: rote learning, learning by being told, learning from examples or induction, and learning by analogy.

2.1.2.1 Rote learning

Rote learning is memorization of the problem and the solution. New knowledge is saved to be retrieved later. However, rote learning is useful only if it takes less time to retrieve the desired item than it does to recompute it. Rote learning is not very useful in a dynamic environment since a basic assumption is that information acquired today will be valid in the future (Cohen and Feigenbaum, 1982).

An example of a rote learning system is Samuel's Checkers Player, which evaluated possible moves by conducting a minimax game-tree search and was able to improve its performance by memorizing every board position it evaluated. Cohen and Feigenbaum (1982) describe how the system could not search the 10^40 possible moves in checkers and evaluated just a few moves into the future, choosing the move that would lead to the best position. The look-ahead search portion of Samuel's program served as the environment. It supplied the learning element with board positions and their backed-up minimax values. The learning element simply stored these board positions and indexed them for rapid retrieval.

The program became capable of playing a very good opening game. Rote learning did not improve the middle game since the number of possible moves was greater. At the end game, the system would wander since each possible solution, winning the game, had comparable values.

2.1.2.2 Learning by taking advice

Learning by taking advice focuses on converting expert advice into expert performance.
Research on advice-taking systems has followed two major paths: 1) systems that accept abstract, high-level advice and convert it into rules to guide a performance element, and 2) systems that develop sophisticated tools that make it easier for experts to transform their own expertise into rules. Five processes identified to convert expert advice into program performance were as follows:

1. Request advice from the expert,
2. Interpret or assimilate the advice into an internal representation,
3. Operationalize or convert the advice into a usable form,
4. Integrate advice correctly into the knowledge base, and
5. Evaluate the resulting actions of the performance element.

The principal shortcoming of learning by taking advice was that the various methods were quite specific to the task, and generalization would require substantial effort.

This approach was used in building knowledge-based expert systems such as MYCIN, which acted as a medical consultant system aiding in the diagnosis of patients with bacteremia or meningitis infections (Barr and Feigenbaum, 1981). The system carried on an interactive dialogue with a physician and was capable of explaining its reasoning. MYCIN had a knowledge acquisition subsystem, TEIRESIAS, which helped expert physicians expand or modify the rule base.

2.1.2.3 Learning by analogy

A third approach to learning is by analogy. If a system has an analogous knowledge base, it may be able to improve its performance on a related task by recognizing the analogies and transferring relevant knowledge to another knowledge base specific to the task (Cohen and Feigenbaum, 1982). An example of this approach in actual use is the AM computer program written by Douglas Lenat, which discovers concepts in elementary mathematics and set theory. In searching the rule space, AM may employ one of 40 heuristics described as reasoning by analogy. Cohen and Feigenbaum (1982) reported little research in this area.
2.1.2.4 Learning by example

Learning by example, or inductive learning, requires a program to reason from specific instances to general rules that can guide the actions of the performance element (Cohen and Feigenbaum, 1982). The researcher presents the learning element with very low level information, in the form of a specific situation, and the appropriate behavior for the performance element in that situation. The program generalizes this information to obtain general rules of behavior. An important early paper on induction described the two-space view of learning from examples:

Simon and Lea (1974) describe the problem of learning from examples as the problem of using training instances, selected from some space of possible instances, to guide a search for general rules. They call the space of possible training instances the instance space and the space of possible general rules the rule space. Furthermore, Simon and Lea point out that an intelligent program might select its own training instances by actively searching the instance space in order to resolve some ambiguity about the rules in the rule space (Cohen and Feigenbaum, 1982, p. 360).

Simon and Lea viewed the learning system as moving back and forth between an instance space and a rule space until it converged on the desired rule.

Many different approaches to learning-by-example, such as neural networks and genetic algorithms, have been developed and used in financial applications. In the remainder of this chapter we will review the use of AI in financial applications by discussing the use of expert systems and describing neural networks, genetic algorithms, and inductive learning systems.

2.2 AI and Financial Applications

2.2.1 Expert Systems

We have previously defined expert systems in Section 2.1.1. Artificial intelligence has been applied to business and finance since the early 1980s, starting with expert systems (Schreiber, 1984). Expert systems were used for production, management, sales, and finance.
For example, an investment banking firm was using an expert system in its international brokerage operations to manage foreign institutional portfolios (Wilson and Koehler, 1986). Hansen and Messier (1986) suggested the use of an expert system for auditing advanced computer systems, while Culbertson (1987) provided an overview showing how expert systems could also be used in accounting. Sena and Smith (1987) developed an expert system that asked questions about oil company financial statements and judged whether the statements were within industry norms.

By 1988, expert systems were being used in a number of companies. Texas Instruments, Du Pont, IBM, American Express, Canon, Fujitsu, and Northrop were showcased in The Rise of the Expert Company (Feigenbaum et al., 1988). Expert systems could provide internal cost savings, improve product quality control, improve the consistency of decision making, preserve knowledge, and restructure business to enlarge customer choice. This book reported on 139 expert systems in use at a variety of companies in the agriculture, communications, computers, construction, financial, manufacturing, mining, medical, and transportation industries. The financial applications included internal audit risk assessment systems, sales tax advising, risk analysis for underwriting, portfolio management, credit authorization, income tax advising, financial statement analysis, mortgage loan analysis, and foreign exchange options analysis.

The use of expert systems for commercial loan decisions is described in Duchessi et al. (1988). The late 1980s saw a wave of bankruptcies in the savings and loan industry, so expert systems were used to conduct financial analyses of potential failures (Elmer and Borowski, 1988).

Shaw and Gentry (1988) described an improvement in the design of expert systems: the ability to enhance performance by reacting to a changing environment.
Their MARBLE system used an inductive learning capability to update the 80 decision rules that evaluated business loans.

Expert systems were also moving into the area of stock trading. Laurance (1988) described a trading expert system using 30 rules and a buy-and-hold strategy that was superior to the Standard & Poor's 500 Index, a benchmark for manager performance. Arend (1988) reported on an expert trading system that constantly adapted to new information based on current market conditions.

Holsapple et al. (1988) pointed out that the business world was slow to accept expert systems. Notably, unsatisfactory results were achieved in finance due to unrealistic expectations and managerial mistakes. The authors went on to mention that the current technology was inadequate for applications requiring insight, creativity, and intuition; however, it could be used for financial decision support systems.

Recognizing that arbitrageurs could take advantage of the discrepancies between the futures market and the stock market, Chen and Liang (1989) proposed an expert system for program trading. This system had the ability to update its rule base with a learning mechanism and human observations about the markets.

Miller (1990) provided an overview of financial expert systems technology, including a discussion of the problem domain, heuristics, and the architecture, and discussed the rule structure and logic of an expert system.

Expert systems have also been applied to assessing audit risks and evaluating client economic performance (Graham et al., 1991). Coopers & Lybrand developed a system to assist in audit planning.

The field of mortgage guaranty insurance underwriting has also been able to harness expert systems (Gluch-Rucys and Walker, 1991). United Guaranty Residential Insurance Co. implemented an expert system that could assist underwriters with 75% of mortgage insurance applications.
The determination of corporate tax status and liabilities is another application of expert systems to finance (Jih and Patterson, 1992). The STAX system used the Guru expert system shell to calculate taxes and determine the tax consequences of different corporate filing statuses.

2.2.2 Neural Networks and Financial Applications

Neural networks originated as a model of how the brain works:

McCulloch and Pitts formulated the first neural network model [McCulloch 43]. It featured digital neurons but no ability to learn. The work of another psychologist, Donald Hebb, introduced the idea of Hebbian learning (as detailed in Organization of Behavior [Hebb 49]), which states that changes in synaptic strengths (connections between neurons) are proportional to the activations of the neurons. This was a formal basis for the creation of neural networks with the ability to learn (Blum, 1992, p. 4).

Real neurons, such as the one in Figure 2.3, consist of a cell body, one axon (a protuberance that delivers the neuron's output to connections with other neurons), and many dendrites, which receive inputs from axons of other neurons (Winston, 1992). A neuron does nothing unless the collective influence of all its inputs reaches a threshold level. When that happens, the neuron produces a full-strength output in the form of an electrical pulse that travels down the axon to the next cell, which is separated from it by the synapse. Whenever this happens, the neuron is said to fire. Stimulation at some synapses may cause a neuron to fire, while stimulation at others may discourage the neuron from firing. There is mounting evidence that learning takes place near synapses (Winston, 1992).

Figure 2.3: Real Neuron.

In the neural network, multipliers, adders, and thresholds replace the neuron (Winston, 1992). Neural networks do not model much of the character of real neurons. A simulated neuron simply adds up a weighted sum of its inputs and fires whenever the threshold level is reached.
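The behavior of a simulated neuron can be made concrete with a small sketch. The function name, the example weights, and the threshold values are assumptions chosen for illustration; representing a discouraging (inhibitory) synapse as a negative weight is also an assumption of this sketch rather than a detail given in the sources cited above.

```python
def neuron_fires(inputs, weights, threshold):
    """Return True when the weighted sum of the inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return total >= threshold


# With two unit weights and a threshold of 2, the neuron fires only when
# both inputs are active (it behaves like a logical AND).
both_active = neuron_fires([1, 1], [1.0, 1.0], 2.0)
one_active = neuron_fires([1, 0], [1.0, 1.0], 2.0)

# Assumed convention: a negative weight discourages firing.
inhibited = neuron_fires([1, 1], [1.0, -1.0], 1.0)
```

Changing only the weights or the threshold changes which input patterns make the neuron fire, which is why learning in such networks amounts to adjusting those values.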
The development of neural networks was seriously derailed in 1969 by the publication of Perceptrons by Marvin Minsky and Seymour Papert, which pointed out limitations in the prevailing memory model at that time (Blum, 1992). It wasn't until 1986, with the development of backpropagation (explained later on), permitting the training of multi-layer neural networks, that neural networks became a practical tool for solving problems that would be quite difficult using conventional computer science techniques.

Most neural networks consist of an input layer, a hidden or processing layer, and an output layer. The hidden layer may be more than one layer itself. Figure 2.4 is an example of a multi-layer neural network. In this figure, x_i, h_i, and o_i represent unit activation levels of input, hidden, and output units.

In Figure 2.5, we show a simple neural network. Input signals are received from the node's links, assigned weights (the ws), and added. The value of the node, Y, is the sum of all the weighted input signals. This value is compared with the threshold activation level of the node. When the value meets the threshold level, the node transmits a signal to its neighboring nodes.

Figure 2.5: An Artificial Neuron.

Each unit in one layer is connected in the forward direction to every unit in the next layer. Activations flow from the input layer through the hidden layer, then on to the output layer. The knowledge of the network is encoded in the weights on connections between units. The existence of hidden units allows the network to develop complex feature detectors, or internal representations (Rich and Knight, 1991).

Neural networks learn by supervised training and self-organizing training (Winston, 1992). In supervised training, which we explain here, the network is given a set of examples (x, y), where y is the correct response for x. In a multilayered neural network, the output nodes detect the errors.
These errors are propagated back to the nodes in the previous layer, and the process is repeated until the input layer is reached. An effective algorithm that learns in this fashion (adjusting the weights incrementally toward reducing the errors to within some threshold) is the backpropagation algorithm (Rumelhart and McClelland, 1986). The discovery of this algorithm was largely responsible for the renewal of interest in neural networks in the mid-1980s, after a decade of dormancy (Trippi and Lee, 1996).

The backpropagation neural network typically starts out with a random set of weights. The network adjusts its weights each time it sees an example (x, y). Each example requires two stages: a forward pass and a backward pass. The forward pass involves presenting a sample input, x, to the network and letting activations flow until they reach the output layer. During the backward pass, the network's actual output from the forward pass is compared with the correct response, y, and error estimates are computed for the output units. The weights connected to the output units are adjusted in order to reduce those errors (Rich and Knight, 1991).

The advantages and disadvantages of neural networks are as follows:

1. Neural networks excel at taking data presented to them and determining what data are relevant. Irrelevant data simply have such low connection strength to all of the output neurons that they have no effect.
2. Because of the abundance of input factors, noise in the data is not as much of a problem with neural networks.
3. Each synapse in a neural net model can be its own processor. There are no time dependencies among synapses in the same layer. Thus, neural networks exhibit inherent parallelism.
4. Training may require thousands of evolutions.
5. Back propagation, which uses gradient descent, can get stuck in local minima or become unstable.
6. Excess weights may lead to overfitting of the data.
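The forward pass and backward pass described before this list can be sketched for a network with one hidden layer. This is a minimal illustration under stated assumptions, not the implementation used in any study cited here: the sigmoid activation, the squared-error measure, the learning rate of 0.5, the bias weights, and all function names are choices made for this sketch.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in, n_hidden, seed=0):
    """Start from small random weights; the last weight of each unit is a bias."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return hidden, output


def forward(net, x):
    """Forward pass: activations flow from the inputs to the single output unit."""
    hidden, output = net
    xs = list(x) + [1.0]
    h = [sigmoid(sum(w * v for w, v in zip(unit, xs))) for unit in hidden]
    o = sigmoid(sum(w * v for w, v in zip(output, h + [1.0])))
    return h, o


def train_example(net, x, y, rate=0.5):
    """One forward and one backward pass for example (x, y).

    Returns the squared error measured before the weights are adjusted.
    """
    hidden, output = net
    h, o = forward(net, x)
    # Error estimate at the output unit (squared error, sigmoid derivative).
    delta_o = (y - o) * o * (1.0 - o)
    # Propagate the error back to the hidden units using the current weights.
    delta_h = [delta_o * output[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    # Adjust the weights incrementally in the direction that reduces the error.
    hs = h + [1.0]
    for j in range(len(output)):
        output[j] += rate * delta_o * hs[j]
    xs = list(x) + [1.0]
    for j, unit in enumerate(hidden):
        for i in range(len(unit)):
            unit[i] += rate * delta_h[j] * xs[i]
    return (y - o) ** 2


net = init_network(n_in=2, n_hidden=2)
first_error = train_example(net, [1.0, 0.0], 1.0)
for _ in range(200):
    last_error = train_example(net, [1.0, 0.0], 1.0)
```

Repeating the two passes on the same example drives the error on that example down; in practice the passes cycle over a whole training set, which is where the local minima and overfitting noted in the list can arise.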
Some consider neural network training to be an art that requires trial-and-error (Winston, 1992).

The use of neural networks for financial applications occurred after the use of expert systems in this area. Dutta and Shekhar (1988) proposed using a neural network to predict bond ratings. They trained a neural network using ten features they felt were representative of bond ratings and had thirty bond issues in the training set. They tested the network against seventeen bonds, and the neural network outperformed regression analysis.

Miller (1990) devoted no more than six pages in his book to explaining the neural network concept without identifying possible applications. Hawley et al. (1990) outlined the advantages and disadvantages of neural networks vs. expert systems. They also included potential applications such as financial simulation, financial forecasting, financial valuation, assessing bankruptcy risk, portfolio management, pricing initial public offerings, identifying arbitrage opportunities, performing technical analysis to predict the short-term movements in stock prices, and performing fundamental analysis to evaluate stocks.

Coats and Fant (1991) used neural networks to forecast financial distress in businesses. The neural network correctly forecast 91% of the distressed firms as distressed and 96% of the healthy firms as healthy. This is in contrast to multiple discriminant analysis, which correctly identified 72% of the distressed firms and 89% of the healthy firms.

Neural networks have also been used in credit scoring, the procedures used to grant or deny credit (Jensen, 1992). Applicant characteristics were the input nodes and three categories of payment history were the output nodes. The neural network was trained with 125 credit applicants whose loan outcomes were known. Correct classifications were made on 76% of the testing sample.
Neural networks have also been used in predicting savings and loan company failures (Salchenberger et al., 1992) and bank failures (Tam and Kiang, 1992).

Trippi and DeSieno (1992) described trading Standard and Poor's 500 index futures with a neural network. Their system consisted of several trained networks plus a set of rules for combining network results to generate a composite recommendation for the current day's position. The training period spanned 1,168 days from January 1986 to June 1990. The test period covered 106 days from December 1990 through May 1991, and the system outperformed a passive investment strategy in the index.

Kryzanowski et al. (1993) provided a neural network with historical and current accounting data and macroeconomic data to discriminate between stocks having superior future returns and those having inferior future returns. On 149 test cases the system correctly classified 66.4% of the stocks.

Piramuthu et al. (1993) studied ways of improving the performance of the backpropagation algorithm. They noted that back propagation uses the steepest gradient search for hill climbing. In essence, it is a linear method, and they developed a quadratic method to improve convergence of the algorithm. They compared the results of predicting bankruptcy by several types of neural networks to the performance of NEWQ (Hansen et al., 1993), ID3, and Probit. The training set consisted of 56 randomly selected examples and the testing set was the remaining 46 examples. Overall, the backpropagation neural network algorithms performed better than ID3, NEWQ, and Probit, although the run times of the neural networks were much longer than those of the other methods.

Yoon et al. (1994) brought together neural networks and the rule-based expert system. The motivation for this study was to highlight the advantages and overcome the disadvantages of the two approaches used separately.
The primary advantage of rule-based expert systems was the readability of the process, since it uses explicit rules. A disadvantage to developing such an expert system is the difficulty of developing those rules. The authors used an artificial neural network as the knowledge base of the expert system. The connection weights of the neural network specify the decision rules in an implicit manner. The explanation module is a rule-based system in which knowledge implicitly encoded in the neural network has been translated into an "IF-THEN" format. The training and testing sets consisted of 76 companies each. The neural network system achieved a correct classification of 76%. In comparison, a Multivariate Discriminant Analysis model classified the data correctly only 63% of the time.

Hutchinson et al. (1994) proposed using a neural network for pricing and hedging derivative securities. They took as inputs the primary economic variables that influenced the derivative's price and defined the derivative price to be the output into which the neural network maps the inputs. When properly trained, the network "becomes" the derivative pricing formula. The neural network would provide a nonparametric pricing method. It was adaptive and responded to structural changes in the data-generating processes in ways that parametric models could not, and it was flexible enough to encompass a wide range of derivative securities. The disadvantage of this approach was that large quantities of data were required, meaning that this would be inappropriate for thinly traded derivatives or new instruments. Overall, the system achieved error levels similar to those of the Black-Scholes formula (Black and Scholes, 1973) used for pricing the derivatives.

Trading on the Edge (Deboeck, 1994) reviewed the use of neural networks for securities trading.
It provided an overview of neural network techniques, explained the need for pre-processing financial data, discussed using neural networks for predicting the direction of the Tokyo stock exchange, and described a neural network for trading U.S. Treasury notes. The Tokyo stock exchange neural network had a 62.1% correct prediction rate after being put in service in September 1989. A major benefit was that it reduced the number of trades needed to implement a hedging position, which saved on commissions. The Treasury notes neural network was evaluated based on the number of recommended trades, the average profit and loss per trade, and the maximum gains, losses, and drawdowns. In each case the system provided a higher average profit than that achieved during the same period. It was noted, however, that the system performed better when trained on a specific two-year period than when trained on data from a longer period. We will mention this book again in our discussion of genetic algorithms.

Jain and Nag (1995) developed a neural network for pricing initial public offerings (IPOs). They noted that a vast body of empirical evidence suggested that such offerings were underpriced by as much as 15%, which represented a huge loss to the issuer. In developing the model, 276 new issues were used for training the network and 276 new issues were used for testing it. They used 11 input features representing a broad spectrum of financial indicators: the reputation of the investment banker, the log of the gross proceeds, the extent of ownership retained by the original entrepreneurs, the inverse of sales in millions of dollars in the year prior to the IPO, capital expenditures over assets, capital expenditures over sales, operating return over assets, operating return over sales, operating cash flow over assets, operating cash flow over sales, and asset turnover.
The results showed that the neural network generated market price distributions that outperformed the pricing of investment bankers.

Neural networks have also been used to predict the targets of investigation for fraudulent financial reporting by the Securities and Exchange Commission (Kwon and Feroz, 1996). The network outperformed Logit, and the study showed that non-financial information could provide more predictive information than the financial information alone. Another study compared the performance of neural networks to LDA and Logit scoring models for the credit union environment (Desai et al., 1996). The study determined that neural networks outperformed the other methods in correctly classifying the percentage of bad loans. If the performance measure was correctly classifying both good and bad loans, then logistic regression was comparable to the neural network.

Hobbs and Bourbakis (1996) studied the success of a neural network computer model in predicting the price of a stock, given the fluctuations in the rest of the market that day. Based on the neural network's prediction, the program then measured its success by simulating buying or selling that stock, depending on whether the market's price was determined to be overvalued or undervalued. The program consistently averaged an annual return of over 20% and was tested over six years with several stocks.

Two books that focused on the use of neural networks for investing were Trippi and Turban (1996b), a collection of journal articles written by others from 1988 to 1995, and Trippi and Lee (1996), a revision of an earlier book they published in 1992. The latter book reviews modern portfolio theory, provides an overview of AI in investment management, discusses machine learning and neural networks, and describes integrating knowledge with databases.

A final study concerned using a neural network to forecast mutual fund end-of-year net asset value (Chiang et al., 1996). Fifteen economic variables for 101 U.S.
mutual funds were identified as input to three models: a neural network, a linear regression model, and a nonlinear regression model. The models were developed using a dataset that covered the six-year period from 1981 to 1986 and were evaluated using the actual 1986 Net Asset Values. The predictions by the neural network had the lowest error rate of the three models.

2.2.3 Genetic Algorithms and Financial Applications

Genetic Algorithms (GAs) are search algorithms based on the mechanics of natural selection and natural genetics (Holland, 1975). They have been shown to be effective at exploring large and complex spaces in an adaptive way, guided by the equivalent biological mechanisms of reproduction, crossover, and mutation. GAs have been used for machine learning applications, including classification and prediction tasks, to evolve weights for neural networks, and to evolve rules for learning classifier systems (Mitchell, 1997).

Genetic Algorithms combine survival-of-the-fittest among string structures with a structured, yet randomized, information exchange to form a search algorithm with some of the innovative flair of human search. The strings are referred to as chromosomes and they are composed of genes (a feature on the chromosome) which have values referred to as alleles (Goldberg, 1989).

In every generation, three operators create a new set of chromosomes: selection, crossover, and mutation. The selection operator selects chromosomes in the population for reproduction based on a fitness function that assigns a score (fitness) to each chromosome in the current population. The fitness of a chromosome depends on how well that chromosome solves the problem at hand (Mitchell, 1997). The fitter the chromosome, the greater the probability that it will be selected to reproduce. The crossover operator randomly chooses a locus and exchanges the chromosomal subsequences before and after that locus to create two offspring.
The mutation operator randomly flips some of the bits in a chromosome. Mutation can occur at each bit position with some very small probability. While randomized, Genetic Algorithms are no simple random walk. They efficiently exploit historical information to speculate on new search points with expected improved performance (Goldberg, 1989).

By way of explanation, we provide a simple Genetic Algorithm with a fitness function that we want to maximize, for example, the real-valued one-dimensional function

$$f(y) = y + |\sin(32y)|, \quad 0 \le y < \pi$$

(Riolo, 1992). The candidate solutions are values of y, which are encoded as bit strings representing real numbers. The fitness calculation translates a given bit string x into a real number y and then evaluates the function at that value (Mitchell, 1997). The fitness of a string is the function value at that point.

In the reproduction step, individual strings are copied according to their objective function values, f(y). Copying strings according to their fitness value means that strings with a higher value have a higher probability of contributing one or more offspring to the next generation. This operator is an artificial version of natural selection (Goldberg, 1989).

In the crossover step, which follows reproduction, crossover may proceed in two steps. First, members of the newly reproduced strings in the mating pool are mated at random. Second, each pair of strings may undergo crossing over as shown below; however, not all strings mate. Consider strings A1 and A2:

    A1 = 1 0 1 1 | 0 1 0 1
    A2 = 1 1 1 0 | 0 0 0 0

The separator | indicates the uniformly, randomly selected crossover site. The resulting crossover yields two new strings, where the prime (') means the strings are part of the new generation:

    A'1 = 1 0 1 1 0 0 0 0
    A'2 = 1 1 1 0 0 1 0 1

The mechanics of reproduction and crossover are surprisingly simple, involving random number generation, string copies, and some partial string exchanges.

Mutation has been referred to as bit flipping.
This operator randomly changes 0s to 1s, and vice versa. When used sparingly, as recommended, with reproduction and crossover, it is an insurance policy against premature loss of important string values (Goldberg, 1989).

With the production of a new generation, the system evaluates the fitness function for the maximum fitness of the artificial gene pool. A Genetic Algorithm is typically iterated for anywhere from 50 to 500 or more generations (Mitchell, 1997). One stopping criterion for GAs is convergence of the chromosome population, defined as when 95% of the chromosomes in the population all contain the same value or, more loosely, when the GA has stopped finding new, better solutions (Heitkoetter and Beasley, 1997). Other stopping criteria concern the utilization of resources (computer time, etc.). The entire set of generations is called a run and, at the end of a run, there are often one or more highly fit chromosomes in the population.

Goldberg (1989) describes the development of a classifier system using genetic algorithms. The backbone of a classifier system is its rule and message system, a type of production or rule-based system. The rules are of the form "if <condition> then <action>"; however, in classifier systems, conditions and actions are restricted to be fixed-length strings. Classifier systems use parallel rule activation, whereas expert systems use serial rule activation.

Mitchell (1997) noted that Genetic Algorithms are used for evolving rule-based systems, e.g., classifier systems, in which incremental learning (and remembering what has already been learned) is important and in which members of the population collectively solve the problem at hand. This is often accomplished using the Steady-State population selection operator, in which only a few chromosomes of the least fit individuals are replaced by offspring resulting from crossover and mutation of the fittest individuals.
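The selection, crossover, and mutation steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any cited author. It maximizes the example function f(y) = y + |sin(32y)|, and the interval 0 <= y < pi, the 16-bit encoding, the population size, the crossover and mutation rates, and all function names are assumptions chosen for illustration.

```python
import math
import random

BITS = 16        # assumed chromosome length
POP_SIZE = 50    # assumed population size (even, so offspring come in pairs)
P_CROSS = 0.6    # assumed probability that a selected pair mates
P_MUT = 0.002    # assumed per-bit mutation probability


def decode(bits):
    """Translate a bit string (list of 0/1) into a real number y in [0, pi)."""
    value = int("".join(map(str, bits)), 2)
    return math.pi * value / (2 ** BITS)


def fitness(bits):
    """Fitness is the function value f(y) = y + |sin(32 y)| at the decoded point."""
    y = decode(bits)
    return y + abs(math.sin(32 * y))


def select(population, scores, rng):
    """Fitness-proportionate selection of one parent."""
    return rng.choices(population, weights=scores, k=1)[0]


def crossover(a, b, rng):
    """Single-point crossover at a uniformly chosen site; not all pairs mate."""
    if rng.random() < P_CROSS:
        site = rng.randint(1, BITS - 1)
        return a[:site] + b[site:], b[:site] + a[site:]
    return a[:], b[:]


def mutate(bits, rng):
    """Flip each bit independently with a small probability."""
    return [1 - b if rng.random() < P_MUT else b for b in bits]


def run_ga(generations=100, seed=0):
    """Run the GA and return (y, f(y)) for the fittest chromosome seen."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(BITS)]
                  for _ in range(POP_SIZE)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        next_gen = []
        while len(next_gen) < POP_SIZE:
            a = select(population, scores, rng)
            b = select(population, scores, rng)
            child1, child2 = crossover(a, b, rng)
            next_gen.extend([mutate(child1, rng), mutate(child2, rng)])
        population = next_gen
        best = max(population + [best], key=fitness)
    return decode(best), fitness(best)
```

Calling `run_ga()` returns the decoded value of y and its fitness for the best chromosome encountered during the run. The sketch stops after a fixed number of generations rather than testing for convergence of the population.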
GAs have been proposed to work in conjunction with other machine learning systems, such as neural networks (Kuncheva, 1993). In this sketch of an AI application, the neural network is set up to provide a trading recommendation, for example, stay long, stay short, or stay out of the market. The Genetic Algorithm is used to estimate the weights for a neural network that optimizes the user-defined performance objectives and meets user-defined constraints or risk limits. For example, they used a fitness function of the average annual return achieved over three years.

A GA was applied to a portfolio merging problem of maximizing the return/risk ratio with the added constraint of satisficing expected return (Edelson and Gargano, 1995). The original problem was recast as a goal programming problem so that GAs could be used. The results obtained with the GAs were comparable to those calculated by quadratic programming techniques. The use of the goal programming conversion reduced the number of generations needed to obtain convergence from 6,527 to 780.

Mahfoud and Mani (1995) developed a procedure for extending GAs from optimization problems to classification and prediction so that they could predict individual stock performance. They describe the use of a niching method that permits the GA to converge around multiple solutions or niches, instead of the traditional single point in the solution space. The analogy in the financial forecasting case is that different rules within the same GA population can perform forecasting for different sets of market and individual company conditions, contexts, or situations. The niching method was used to predict the direction of a randomly selected MidCap stock from the Standard & Poor's 400. The GA correctly predicted the stock's direction relative to the market 47.6% of the time, produced no prediction 45.8% of the time, and incorrectly predicted the direction relative to the market 6.6% of the time.
The no-prediction state is equivalent to the stock being equally likely to go in either direction.

Trippi and Lee (1996) describe a Genetic Algorithm used for a stock market trading rule generation system. Buy and sell rules were represented by 20-element bit strings to examine a solution space of 554,496 possible combinations. The GA was run using a crossover rate of 0.6 and a mutation rate of 0.002. In 10 experiments of 10 trials each using different starting strings, the average monthly returns of the best rule parameters ranged from 6.04 to 7.52 percent, ignoring transaction costs. Although the GA converged on these results quickly, they did not differ much from the optimal rules that were obtained by a time-consuming exhaustive search.

Genetic Algorithms were used to optimize the topology of a neural network that predicted a stock's systematic risk, using the financial statements of 67 German corporations from the period 1967 to 1986 (Wittkemper and Steiner, 1996). Additionally, in two studies related to mutual funds but not rating systems, GAs were used to simulate adaptive learning in a simple static financial market designed to exhibit behavior very similar to that of mutual fund investors (Lettau, 1994; Lettau, 1997).

2.3 The C4.5 Learning System

Research on learning is composed of diverse subfields. At one extreme, adaptive systems monitor their own performance and attempt to improve it by adjusting internal parameters. A quite different approach sees learning as the acquisition of structured knowledge in the form of concepts, or classification rules (Quinlan, 1986).

A primary task studied in machine learning has been developing classification rules from examples (also called supervised learning). In this task, a learning algorithm receives a set of training examples, each labeled as belonging to a particular class. The goal of the algorithm is to produce a classification rule for correctly assigning new examples to these classes.
For instance, examples could be vectors of descriptive values or features of mutual funds. The classes could be the Morningstar Mutual Fund ratings, and the task of the learning system is to produce a rule (or a set of rules) for predicting with high accuracy the rating for new mutual funds.

We will focus on the data-driven approach of decision trees, specifically the C4.5 system (Quinlan, 1993). We present a brief history of C4.5, the algorithms used, the limitations of the system, and examples of its use.

2.3.1 Brief History of C4.5

C4.5 traces its roots back to CLS (Concept Learning System), a learning algorithm devised by Earl Hunt (Hunt et al., 1966). It solved single-concept learning tasks and used the learned concepts to classify new examples. CLS constructed a decision tree that attempted to minimize the cost of classifying an object (Quinlan, 1986). This cost had two components: the measurement cost of determining the value of property A exhibited by the object, and the misclassification cost of deciding that the object belongs to class J when its real class was K.

The immediate predecessor of C4.5 was ID3, which used a feature vector representation to describe training examples. A distinguishing aspect of the feature vector is that it may take on continuous real values as well as discrete symbolic or numeric values (Cohen and Feigenbaum, 1982). Concepts are represented as decision trees. We classify an example by starting at the root of the tree, making tests, and following branches until we arrive at a node that indicates the class.

For example, Figure 2.4 shows a decision tree that first tests an expert's opinion on a stock, which takes the symbolic values Good and Bad. We call this node the root of the decision tree. The tree branches to Price/Earnings (P/E) if the Expert Opinion is Good and Price/Book (P/B) if the Expert Opinion is Bad. If the Expert Opinion is Good and the P/E is > 3, then we classify the stock as Expected Return = High.
If the P/E of the stock is < 2, then we classify the stock as Expected Return = Medium. If the Expert Opinion is Bad and the P/B is < 3, then we classify the stock as Expected Return = Medium; for P/B > 4, Expected Return = Low.

Figure 2.4: Decision Tree Example.

Decision trees are inherently disjunctive, since each branch leaving a decision node corresponds to a separate disjunctive case. The left-hand side of the decision tree in Figure 2.4 for high expected return is equivalent to the predicate calculus expression:

$$[\text{Expert Opinion}(x, \text{Good}) \lor \lnot\,\text{Expert Opinion}(x, \text{Bad})] \land [\text{P/E}(x, > 3) \lor \text{P/E}(x, < 2)]$$

Consequently, decision trees can be used to represent disjunctive concepts (Cohen and Feigenbaum, 1982).

ID3 was designed for the learning situation in which there are many features and the training set contains many examples, but where a reasonably good decision tree is required without much computation. It has generally been found to construct simple decision trees, but the approach it uses cannot guarantee that better trees have not been overlooked (Quinlan, 1986). This will be discussed in more detail in Section 4.2.

The crux of the problem for ID3 was how to form a decision tree for an arbitrary collection C of examples. If C was empty or contained only examples of one class, the simplest decision tree was just a leaf labeled with the class. Otherwise, let T be any test on an example with possible outcomes $\{O_1, O_2, \ldots, O_w\}$. Each example in C would give one of these outcomes for T, so T produced a partition $\{C_1, C_2, \ldots, C_w\}$ of C with $C_i$ containing those examples having outcome $O_i$. If each subset $C_i$ was replaced by a decision tree for $C_i$, the result would be a decision tree for all of C. Moreover, so long as two or more of the $C_i$ are non-empty, each $C_i$ is smaller than C.
In the worst case, this divide-and-conquer strategy would yield single-example subsets that satisfied the one-class requirement for a leaf. Thus, if a test could always be found that gave a non-trivial partition of any set of examples, this procedure could always produce a decision tree that correctly classifies each example in C (Quinlan, 1986).

The choice of test was crucial for ID3 if the decision tree was to be simple, and ID3 used an information-based method that depended on two assumptions. Let C contain p examples of class P and n of class N. The assumptions were:

(1) Any correct decision tree for C will classify examples in the same proportion as their representation in C. An arbitrary example will be determined to belong to class P with probability $p/(p+n)$ and to class N with probability $n/(p+n)$.

(2) When a decision tree is used to classify an example, it returns a class. A decision tree can thus be regarded as a source of a message 'P' or 'N', with the expected information needed to generate this message given by

$$I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}$$

If feature A with values $\{A_1, A_2, \ldots, A_v\}$ is used for the root of the decision tree, it will partition C into $\{C_1, C_2, \ldots, C_v\}$ where $C_i$ contains those examples in C that have value $A_i$ of A. Let $C_i$ contain $p_i$ examples of class P and $n_i$ of class N. The expected information required for the subtree for $C_i$ is $I(p_i, n_i)$. The expected information required for the tree with A as root is then obtained as the weighted average

$$E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\, I(p_i, n_i)$$

where the weight for the ith branch is the proportion of the examples in C that belong to $C_i$. The information gained by branching on A is, therefore,

$$\text{gain}(A) = I(p, n) - E(A)$$

One approach would be to choose the feature to branch on that gains the most information.
ID3 examines all candidate features and chooses A to maximize gain(A), forms the tree as above, and then uses the same process recursively to form decision trees for the residual subsets $C_1, C_2, \ldots, C_v$ (Quinlan, 1986).

The worth of ID3's feature-selecting greedy heuristic can be assessed by how well the trees express real relationships between class and features, as demonstrated by the accuracy with which they classify examples other than those in the training set. A straightforward method of assessing this predictive accuracy is to use only part of the given set of examples as a training set and to check the resulting decision tree on the remainder, or testing set.

Quinlan (1986) carried out several experiments to test ID3. In one domain of 1.4 million chess positions, using 49 binary-valued features in the feature vector, the decision tree correctly classified 84% of the holdout sample. Using simpler domains of the chess problem, correct classification was 98% of the holdout sample.

2.3.2 C4.5 Algorithms

C4.5 is an improved version of ID3 that provides the researcher with the ability to prune the decision tree to improve classification of noisy data. It also provides a subsystem to transform decision trees into classification rules. This system of computer programs constructs classification models similar to those of ID3 by discovering and analyzing patterns found in the examples provided to it. Not all classification tasks lend themselves to this inductive approach, and Quinlan (1993) reviews the essential requirements:

Feature-value description: All information about an example must be expressible in terms of a fixed collection of properties or features. Each feature may be either discrete or continuous, but the features used to describe an example may not vary from one example to another.

Predefined classes: The categories to which the examples are to be assigned must have been established beforehand. This is the supervised learning model.

Discrete classes: The classes are sharply delineated. An example belongs to only one class.

Sufficient data: Inductive generalization proceeds by identifying patterns in data. The approach fails if valid, robust patterns cannot be distinguished from chance coincidences. As this differentiation usually depends on statistical tests of one kind or another, there must be sufficient examples to allow these tests to be effective.

"Logical" classification models: The programs construct only classifiers that can be expressed as decision trees or sets of rules. These forms essentially restrict the description of a class to a logical expression whose primitives are statements about the values of particular features.

Figure 2.5 presents the schematic diagram of the C4.5 system algorithm (Quinlan et al., 1987). We will discuss several algorithms concerning the evaluation tests carried out by C4.5, the handling of unknown feature values, and pruning decision trees to improve classification accuracy on the testing set.

Most decision tree construction methods are nonbacktracking, greedy algorithms. A greedy algorithm chooses the best path at the time of the test, although this may later be shown to be suboptimal. Therefore, as noted before, a greedy algorithm is not guaranteed to provide an optimal solution (Cormen et al., 1990).

    C4.5
      - repeat several times:
          GROW:
            - initialize working set
            - repeat
                FORM TREE for working set:
                  - if stopping criterion is satisfied,
                      - choose best class
                    otherwise,
                      - choose best feature test
                      - divide working set accordingly
                      - invoke FORM TREE on subsets
                - test on remainder of training set
                - add some misclassified items to working set
              until no improvement possible
          PRUNE:
            while decision tree contains subtrees that are both complex and of marginal benefit,
              - replace subtree by leaf
      - select most promising pruned tree

Figure 2.5: Schematic Diagram of C4.5.
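The FORM TREE step of Figure 2.5 follows the divide-and-conquer recursion described for ID3. The sketch below is a minimal Python illustration for discrete features that selects tests with ID3's gain criterion. It omits the GROW loop over working sets, pruning, continuous features, and unknown values. The function names, the tuple-and-dictionary tree representation, and the toy stock data are assumptions made for illustration and are not part of C4.5.

```python
import math
from collections import Counter


def entropy(labels):
    """Expected information (in bits) needed to identify the class of a case."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def partition(rows, labels, feature):
    """Split the cases by the value each one has for the given feature."""
    subsets = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = subsets.setdefault(row[feature], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    return subsets


def gain(rows, labels, feature):
    """Information gained by branching on the feature."""
    total = len(labels)
    expected = sum(len(sub_labels) / total * entropy(sub_labels)
                   for _, sub_labels in partition(rows, labels, feature).values())
    return entropy(labels) - expected


def form_tree(rows, labels, features):
    """Return a class label (leaf) or a (feature, {value: subtree}) node."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping criterion: only one class left, or no features left to test.
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: gain(rows, labels, f))
    if gain(rows, labels, best) <= 0:
        return majority  # no test offers any improvement
    remaining = [f for f in features if f != best]
    branches = {value: form_tree(sub_rows, sub_labels, remaining)
                for value, (sub_rows, sub_labels)
                in partition(rows, labels, best).items()}
    return (best, branches)


def classify(tree, row):
    """Follow branches from the root until a leaf (class label) is reached."""
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[row[feature]]
    return tree


# Hypothetical training cases, loosely modeled on the stock example above.
rows = [
    {"opinion": "Good", "pe": "high"},
    {"opinion": "Good", "pe": "low"},
    {"opinion": "Bad", "pe": "high"},
    {"opinion": "Bad", "pe": "low"},
]
labels = ["High", "Medium", "Low", "Low"]
tree = form_tree(rows, labels, ["opinion", "pe"])
```

On this toy data the root test is `opinion`, because it yields the larger gain, and `classify(tree, {"opinion": "Good", "pe": "high"})` follows the Good branch and then the P/E test to reach a leaf. A feature value not seen during training raises a `KeyError` in this sketch.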
2.3.2.1 Gain criterion and gain ratio criterion

C4.5 provides two means of evaluating the heuristic test used by the divide-and-conquer algorithm: the gain criterion, which was used in ID3, and the gain ratio criterion.

The information theory underpinning the gain criterion has been summarized by Quinlan (1993, p. 21) as, "The information conveyed by a message depends on its probability and can be measured in bits as minus the logarithm to base 2 of that probability." The probability that a randomly drawn example from a set S of examples belongs to some class $C_j$ is

$$\frac{\text{freq}(C_j, S)}{|S|}$$

and the information it conveys is

$$-\log_2\left(\frac{\text{freq}(C_j, S)}{|S|}\right) \text{ bits.}$$

We define the expected information from such a message pertaining to class membership by summing over the $k$ classes in proportion to their frequencies in S (Quinlan, 1993),

$$\text{info}(S) = -\sum_{j=1}^{k} \frac{\text{freq}(C_j, S)}{|S|} \times \log_2\left(\frac{\text{freq}(C_j, S)}{|S|}\right) \text{ bits.}$$

When applied to the set of training cases, T, info(T) measures the average amount of information needed to identify the class of a case in T. Now consider a similar measurement after partitioning T in accordance with the n outcomes of a test X. The expected information requirement can be found as the weighted sum over the subsets, as

$$\text{info}_X(T) = \sum_{i=1}^{n} \frac{|T_i|}{|T|} \times \text{info}(T_i).$$

The quantity

$$\text{gain}(X) = \text{info}(T) - \text{info}_X(T)$$

measures the information that is gained by partitioning T in accordance with the test X (Quinlan, 1993). The gain criterion, then, selects a test to maximize this information gain.

The gain ratio criterion was developed to eliminate the gain criterion's bias in favor of tests with many outcomes (Quinlan, 1993). The bias can be rectified by the following set of equations. By analogy with the definition of info(S), we have

$$\text{split info}(X) = -\sum_{i=1}^{n} \frac{|T_i|}{|T|} \times \log_2\left(\frac{|T_i|}{|T|}\right).$$

This represents the potential information generated by dividing the training set, T, into n subsets, whereas the information gain measures the information relevant to classification that arises from the same division.
Then,

$$\text{gain ratio}(X) = \frac{\text{gain}(X)}{\text{split info}(X)}$$

expresses the proportion of information generated by the split that is useful, i.e., that appears helpful for classification (Quinlan, 1993). If the split is near-trivial, split information will be small and this ratio will be unstable. To avoid this, the gain ratio criterion selects a test to maximize the ratio above, subject to the constraint that the information gain must be large, at least as great as the average gain over all tests examined.

Mingers (1989) performed an empirical comparison of selection measures used for decision tree induction, reviewing Quinlan's information measure (1979), the $\chi^2$ contingency table statistic, using probabilities rather than the $\chi^2$ statistic, the GINI index of diversity developed by Breiman et al. (1984), the Gain-ratio measure as discussed above, and the Marshall correction factor, which can be applied to any of the previous measures and which favors features that split the examples evenly and avoids those that produce small splits. Mingers evaluated these measures on four datasets and concluded that the predictive accuracy of induced decision trees is not sensitive to the goodness-of-split measure. However, the choice of measure does significantly influence the size of the unpruned trees. Quinlan's Gain-ratio generated the smallest trees, whereas $\chi^2$ produced the largest. An additional study (Buntine and Niblett, 1992) confirmed Mingers' results while taking issue with his use of random selection as a comparison to the various methods he studied.

Fayad and Irani (1992) reviewed the ability of ID3 to classify datasets with continuous-valued features. Such a feature is handled by sorting all the values for that feature and then partitioning it into two intervals using the gain or gain ratio criterion. They determined that the algorithm used by ID3 for finding a binary partition for a continuous-valued feature will always partition the data on a boundary point.
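To make the difference between the two criteria concrete, the following Python sketch computes info(T), the weighted information after a test X, gain(X), split info(X), and gain ratio(X) for a test with discrete outcomes. It is an illustration under stated assumptions: the function names and the four-case dataset are hypothetical, and the additional constraint that the gain be at least as great as the average gain over all tests is not implemented.

```python
import math
from collections import Counter, defaultdict


def info(labels):
    """Average information (bits) needed to identify the class of a case."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def split_by_outcome(outcomes, labels):
    """Group class labels by the outcome of test X for each case."""
    subsets = defaultdict(list)
    for outcome, label in zip(outcomes, labels):
        subsets[outcome].append(label)
    return list(subsets.values())


def info_x(outcomes, labels):
    """Weighted sum of info over the subsets produced by test X."""
    total = len(labels)
    return sum(len(s) / total * info(s)
               for s in split_by_outcome(outcomes, labels))


def gain(outcomes, labels):
    """Information gained by partitioning the cases with test X."""
    return info(labels) - info_x(outcomes, labels)


def split_info(outcomes):
    """Potential information of the split itself (same formula as info,
    applied to the outcomes of the test instead of the classes)."""
    return info(outcomes)


def gain_ratio(outcomes, labels):
    """Proportion of the split's information that is useful for classification."""
    si = split_info(outcomes)
    return gain(outcomes, labels) / si if si > 0 else 0.0


# Hypothetical data: four cases, two classes.
labels = ["A", "A", "B", "B"]
binary_test = ["x", "x", "y", "y"]   # two outcomes, separates the classes
id_like_test = [1, 2, 3, 4]          # one outcome per case

print(gain(binary_test, labels), gain_ratio(binary_test, labels))
print(gain(id_like_test, labels), gain_ratio(id_like_test, labels))
```

Both tests achieve the same gain on this data because each produces pure subsets, but the test with one outcome per case has twice the split information, so its gain ratio is half that of the two-outcome test. This is the bias toward many-outcome tests that the gain ratio criterion is meant to correct.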
2.3.2.2 Handling unknown feature values

The above algorithms assume that the outcome of a test for any example can be determined. In many cases of classification research, unknown features appear due to missed determinations, etc. In the absence of some procedure to evaluate unknown features, entire examples would have to be discarded, much the same as for missing data in LDA and Logit.

C4.5 improves upon the definition of gain to accommodate unknown feature values (Quinlan, 1993). It calculates the apparent gain from looking at examples with known values of the relevant feature, multiplied by the fraction of such cases in the training set. Expressed mathematically, this is

$$\text{gain}(X) = \text{probability } A \text{ is known} \times (\text{info}(T) - \text{info}_X(T))$$

Similarly, the definition of split info(X) can be altered by regarding the examples with unknown values as an additional group. If a test has n outcomes, its split information is computed as if the test divided the cases into n + 1 subsets (Quinlan, 1993).

2.3.2.3 Pruning decision trees

The recursive partitioning method of constructing decision trees continues to subdivide the set of training cases until each subset in the partition contains cases of a single class, or until no test offers any improvement (Quinlan, 1986). The result is often a very complex tree that "overfits the data" by inferring more structure than is justified in the training cases.

Two approaches to improving the results of classification are prepruning, or construction-time pruning, and postpruning. C4.5 uses postpruning. In prepruning, the typical approach is to look at the best way of splitting a subset and to assess the split from the point of view of statistical significance, information gain, or error reduction. If this assessment falls below some threshold, the division is rejected and the tree for the subset is just the appropriate leaf. Prepruning methods have a weakness in that the decision to stop expanding a tree is made on local information alone.
It is possible that descendant nodes of a node may have better discriminating power (Kim and Koehler, 1995).

C4.5 allows the tree to grow through the divide-and-conquer algorithm and then prunes it. C4.5 performs pessimistic error rate pruning, developed by Quinlan (1987), which uses only the training set from which the tree is built. An estimate is made of the error caused by replacing a subtree with a leaf node. If the error is greater with the leaf node, the subtree remains, and vice versa. Michie (1989) noted that tests on a number of practical problems gave excellent results with this form of pruning. Mingers (1989) also reported that pruning improved the classification ability of decision trees.

2.3.3 Limitations of C4.5

Like any classifier, a decision tree specifies how a description space is to be carved up into regions associated with the classes. When the task is such that class regions are not hyperrectangles, the best that a decision tree can do is approximate the regions by hyperrectangles. This is illustrated in Figure 2.6 below, in which the classification region is defined better by the triangular region on the left than by the rectangular regions that would be used by C4.5 (Quinlan, 1993).

Figure 2.6: Real and Approximate Divisions for an Artificial Task.

Michie (1987, 1989) identified other limitations of ID3 descendants. The former reference mentioned the pruning of decision trees when the data are inconclusive, a point also mentioned in Quinlan (1993). Data are inconclusive when the features used in describing a set of examples are not sufficient to specify exactly one outcome or class for each example. In the latter reference, the use of a feature vector for large domains, such as medical diagnosis, is discussed.

2.3.4 C4.5 Financial Applications

Braun and Chandler (1987) performed the earliest reported business application research of rule-induction classification with a variant of ID3, known as ACLS.
Using a database of 80 examples from an investment expert's predictions, they used ACLS to formulate rules to predict not only the expert's prediction of the market but also the actual market movement. Using 108 examples, ACLS correctly predicted actual market movement 64.4% of the time. The expert correctly predicted market movement 60.2% of the time.

Rules for loan default and bankruptcy were developed using a commercial variant of ID3 (Messier, Jr. and Hansen, 1988). The investigators were surprised by the small decision tree that was developed and by how it correctly classified the testing set with 87.5% accuracy. The ID3 results on the bankruptcy data compared favorably to LDA. Miller (1990) discussed the use of classification trees, similar to ID3, for credit evaluation systems.

Chung and Silver (1992) compared Logit, ID3, and Genetic Algorithms to the outcomes of experts for graduate admissions and bidder selection. The three methods performed comparably on the graduate admissions problem but significantly differently on the bidder selection problem, where the GA had the superior performance. One conclusion of the study is that the nature of the problem-solving task matters. A review of the data showed that the bidder selection problem had a feature that made it difficult for ID3 to build the decision tree to a high degree of accuracy.

An application of ID3 that is pertinent to our present study is stock screening and portfolio construction (Tam, 1991). To demonstrate the effectiveness of the inductive approach, trading rules were inferred from eight features. Three portfolios were constructed from three rules each year, and their performance was compared to market standards. In every case, the portfolios outperformed the Standard & Poor's 500 Index.

In Kattan et al. (1993), human judgment was compared to the machine learning techniques of ID3, regression trees (Breiman et al., 1984), a back-propagation neural network, and LDA.
Human subjects were put into teams and were allowed to induce rules from historical data under ideal conditions, such as adequate time and opportunity to sort the data as desired. The task at hand was to emulate the decisions made by a bank officer when processing checking account overdrafts. A sample of 340 usable observations was gathered for the experiment. The results on multiple holdout samples indicated that human judgment, regression trees, and ID3 were equally accurate and outperformed the neural network.
Fogler (1995) discussed the strengths and weaknesses of using classification tree programs, such as ID3, to explain nonlinear patterns in stocks. The algorithm sequentially maximizes the explanatory power at each branch, looking forward only one step at a time. Additionally, he notes that in larger problems, the classification trees might differ. This paper also reviews financial applications of Neural Nets, Genetic Algorithms, Fuzzy Logic, and Chaos.
Harries and Horn (1995) researched the use of strategies to enhance C4.5 to deal with concept drift and non-determinism in a time series domain. An aim of this study was to demonstrate that machine learning is capable of providing useful predictive strategies in financial prediction. For short-term financial prediction, a successful prediction rate of 60% is considered the minimum useful to domain experts. Their results implied that machine learning can exceed this target with the use of new techniques. By trading off coverage for accuracy, they were able to minimize the effect of both noise and concept drift.
Trippi and Lee (1996) suggest that inductive learning algorithms such as ID3 could be used to generate rules that classify stocks and bonds into grades. They note that when used for classification problems, inductive learning algorithms compete well with neural network approaches.
The classification performance of twenty-two decision tree, nine statistical, and two neural network methods was recently compared in terms of prediction error, computational time, and the number of terminal nodes for decision trees using thirty-two datasets (Lim et al., 1997). The datasets were obtained from the University of California at Irvine Repository of Machine Learning Databases. It was found that a majority of the methods, including C4.5, LDA, and Logit, had similarly low prediction error rates in the sense that differences in their error rates were not statistically significant.
2.4 Linear Discriminant Analysis
2.4.1 Overview
Linear discriminants, first studied by Fisher (1936), are the most common form of classifier, and are quite simple in structure (Weiss and Kulikowski, 1991). The name indicates that a linear combination of the evidence will be used to separate or discriminate among the classes and to select the class assignment for an unseen case. For a problem involving d features, this means geometrically that the separating surface between the samples will be a (d-1)-dimensional hyperplane.
The general form for any linear classifier is given as follows:
$$w_1 e_1 + w_2 e_2 + \cdots + w_d e_d - w_0$$
where $(e_1, e_2, \ldots, e_d)$ is the feature vector, $d$ is the number of features, and the $w_i$ are constants that must be estimated. Intuitively, we can think of the linear discriminant as a scoring function that adds to or subtracts from each observation, weighing some observations more than others and yielding a final total score. The class selected, Q, is the one with the highest score (Weiss and Kulikowski, 1991).
2.4.2 Limitations of LDA
In classical LDA there are some limits on the statistical properties which the discriminating variables are allowed to have. No variable may be a linear combination of other discriminating variables. A "linear combination" is the sum of one or more variables which may have been weighted by constant terms.
Thus, one may not use either the sum or the average of several variables along with all those variables. Likewise, two variables which are perfectly correlated cannot be used at the same time. Another requirement is that the population covariance matrices are equal for each group (Klecka, 1980).
Another assumption for classical LDA is that each group is drawn from a population that has a multivariate normal distribution. Such a distribution exists when each variable has a normal distribution about fixed values on all the others. This permits the precise computation of tests of significance and probabilities of group membership. When this assumption is violated, the computed probabilities are not exact but they may still be quite useful if interpreted with caution.
It should be noted that there are many generalizations of LDA, including the Linear Programming variant, that do not have these restrictions (Koehler, 1989); however, many financial application studies do not seem to be concerned about the restrictions. Karels and Prakash (1987), in their study of the use of discriminant analysis for bankruptcy prediction, noted that violating the multivariate normality constraint was the rule rather than the exception in finance and economics. A more recent study of neural network classification vs. discriminant analysis (Lacher et al., 1995, p. 54) noted that the multivariate normal constraint, and others, are incompatible with the complex nature and interrelationships of financial ratios. Discriminant analysis techniques had proven better in financial classification problems until recently, when new artificial intelligence classification procedures were developed.
2.5 Logistic Regression (Logit)
Logit techniques are well described in the literature (Altman et al., 1981; Johnston, 1972; Judge et al., 1980). If we incorrectly specify a model as linear, the statistical properties derived under the linearity assumption will not, in general, hold.
The obvious solution to this problem is to specify a nonlinear probability model in place of the linear model (Sestito and Dillon, 1994). Logistic regression uses a nonlinear probability model that investigates the relationship between the response probability and the explanatory features. It is useful in classification problems involving nonnormal population distributions and noncontinuous features. Studies have shown that the normal linear discriminant and logistic regression usually give similar results (Weiss and Kulikowski, 1991).
Logistic regression calculates the probability of membership in a class. The model has the form:
$$\operatorname{logit}(p) = \log\left(\frac{p}{1 - p}\right) = \alpha + \beta' x$$
where $p = \Pr(Y = 1 \mid x)$ is the response probability to be modeled, $\alpha$ is the intercept parameter, and $\beta$ is the vector of slope parameters (SAS Institute, 1992). Output from the analysis is used to calculate the probability of membership in a class.
Several recent financial application studies compared Logit to machine learning or other AI techniques such as Case-Based Reasoning (CBR). Dwyer (1992) compared the performance of Logit and nonparametric Discriminant Analysis to two types of Neural Networks in predicting corporate bankruptcies. Bankruptcy data drawn from a ten-year time horizon was input into each of the four models, with predictive ability tested at one, three, and five years prior to bankruptcy filing. The results suggested that Logit and the backpropagation Neural Network were generally evenly matched as prediction techniques.
Hansen et al. (1993) studied a difficult audit decision problem requiring expertise and they compared the performance of ID3, Logit, and a new machine learning algorithm called NEWQ. While NEWQ performed best with 15 errors, Logit produced 16 errors and ID3 had 18 errors out of 80 examples. Logit was compared to a CBR system called ReMind (Bryant, 1996) for predicting corporate bankruptcies.
The database used in the study consisted of nonbankrupt and bankrupt firms in a 20:1 ratio. Logit outperformed ReMind, and one conclusion of the study was that the sample size of bankrupt firms was too small for ReMind to work well.
2.6 Summary
This chapter has reviewed the literature of machine learning, inductive learning, C4.5 and ID3, and the statistical techniques that we propose using in our research. In Chapter 3 we focus on the domain problem of classifying mutual funds, identify our hypotheses for further study, explain the statistical tests used to verify them, and conclude with a discussion of the benefits of this research.
CHAPTER 3
DOMAIN PROBLEM
In this chapter we will provide an overview of mutual fund ratings systems and provide background on Morningstar, Inc., the mutual fund rating company of interest to this research. We will describe the Morningstar rating system, mention observations and criticisms of the Morningstar rating system, and discuss how investment professionals use rating systems. We will also review research about the persistence of mutual fund performance and identify an interesting relationship between the average mutual fund rating one year and the succeeding year's rating and average one-year return. This chapter will end with a specification of the problems to be studied in this research.
3.1 Overview of Mutual Fund Ratings Systems
Mutual funds are open-end investment companies that sell their shares to, and buy them back from, individuals or corporations. Mutual fund buy and sell transactions occur between the fund and the investor, and do not take place on a secondary market such as the New York Stock Exchange. Since asset holdings are restricted to various forms of marketable securities, the total market value of the fund's assets is relatively easy to calculate at the end of each day's trading.
The market value per share of a given mutual fund is equal to the total market value of its assets divided by the number of shares of stock the fund has outstanding. We refer to this value as the fund's Net Asset Value (NAV), and it is the price at which the fund will buy or sell shares to the public (Radcliffe, 1994).
Peter Lynch (1993), until recently manager of the large Fidelity Magellan mutual fund, wrote that, "...mutual funds were supposed to take the confusion out of investing--no more worrying about which stock to pick." The growth in the number of mutual funds has been quite astounding. The November 13, 1995 issue of Business Week noted:
"Still, many investors can't resist them. Equities have been on a long bull run, and the number of new funds keeps growing--474 have been added to Morningstar Inc.'s 6,730-fund database so far this year. The temptation to invest in newbies is understandable, since 61% of all equity mutual funds are less than three years old, according to CDA/Wiesenberger, a Rockville (Md.) mutual-fund data service. In fact, nearly one-third of all money flowing into equity mutual funds in the past 12 months went to those with less than a five-year track record, says State University of New York at Buffalo finance professor Charles Trzcinka." (Dunkin, 1995, p. 160)
In 1976, 452 mutual funds existed, and this number had only grown to 812 mutual funds managing $241.9 billion in assets by 1987. According to the Investment Company
Institute (ICI), there are 2,855 stock mutual funds today managing $2.13 trillion of assets. This is comparable to the number of stocks on the New York Stock Exchange. The ICI is an association that represents investment companies. Its membership includes 5,951 open-end investment companies or mutual funds, 449 closed-end investment companies, and 10 sponsors of unit investment trusts. Its mutual fund members have assets of about $3.056
trillion, accounting for approximately 95% of total industry assets, and have over 38 million
individual shareholders. Moreover, the growth rate continues. The August 28, 1997 issue
of the Wall Street Journal noted that investors put a net $26.56 billion into stock funds in July and net bond inflows for July were $4.21 billion.
Mutual fund rating services, similar to stock and bond rating services, have been in
existence since 1940. According to the January 7, 1993 Wall Street Journal, Wiesenberger
Financial Services was the oldest mutual fund performance tracking company and had been
rating funds since the Investment Company Act of 1940, which authorized the creation of
mutual funds. This company merged with CDA Investment Technologies Inc. to form
CDA/Wiesenberger (CDA/W) in 1991.
CDA/W provides a monthly report to subscribers listing performance, portfolio
characteristics, and risk and dividend statistics on mutual funds (CDA/Wiesenberger, 1993).
The company determines the CDA Rating of mutual funds, a proprietary measure, which is
a composite percentile rating from 1 (best) to 99 (worst), based on the fund's performance
over the past four market cycles. Two up cycles and two down cycles are used if available;
however, at least two cycles (one up and one down) are required for a rating. According to
CDA/W, the best-rated funds will be those that have done well in different market
environments, whose recent performance continues strong, and whose comparative results
have not fluctuated wildly over varying time periods. In determining the CDA Rating, they
give extra weight to recent performance (latest 12 months), and penalize funds for
inconsistency. CDA/W does not provide the methodology for determining the CDA Rating.
The newest rating service is the Value Line Mutual Fund Survey that rates mutual
funds by a proprietary rating system. Fund risk is rated from one (safest) to five (most
volatile) by Value Line. They also provide an overall rating for the fund on a scale of 1 to 5.
Value Line also prints a one-page summary of the fund's performance.
Lipper Analytical Services publishes mutual fund indexes in the Wall Street Journal
and Barron's, and a Mutual Fund Scorecard in the Wall Street Journal. Lipper's scorecard
does not provide a rating for funds but lists the top 15 and bottom 10 performers based on
total return over 4 weeks, 52 weeks, and 5 years. The Wall Street Journal mutual fund list
ranks mutual funds by investment objective as A, B, C, D, or E (units of 20% each) for
total return. The list includes 4 week, 13 week, 26 week, 39 week, 1 year, 3 year, 4 year,
and 5 year returns. In 1996, Lipper was tracking 4,555 stock mutual funds.
3.1.1 Morningstar, Inc. Overview
Morningstar Mutual Funds is a mutual fund rating service started in April 1984. Its
first publication was the quarterly Mutual Fund Sourcebook for stock equity funds. Within
two years it was publishing Mutual Fund Values, featuring the one-page analysis of funds
that was to become the firm's cornerstone product. It also added bond funds to its coverage.
In November 1985, Business Week asked Morningstar to provide data for a new mutual
fund issue. Business Week insisted upon a fund rating system, and development work on
the magazine's rating system paved the way for Morningstar's own 5-star rating system,
which was introduced early in 1986 (Leckey, 1997). Morningstar sales went from $750,000 in its third year of operation to $11 million in its seventh year.
Morningstar publishes the following software and print products:
Software Published Monthly
Morningstar Ascent: Software for the do-it-yourself investor with a database of 7,800 funds.
Morningstar Stock Tools: Online stock newsletter that lets you screen, rank, and create
model portfolios from a database of 7,900 stocks.
Morningstar Principia and Principia Plus: Software for investment professionals providing
data on mutual funds, closed-end funds, and variable annuities. The Principia Plus also
features a portfolio developer and advanced analytics.
Print
Morningstar Mutual Funds: In-depth data and analysis on more than 1,600 funds that is
published every other week.
Morningstar No-Load Funds: A detailed look at nearly 700 no- and low-load funds that is
published every four weeks.
Morningstar Investor: A 48-page monthly publication featuring articles and information on
500 mutual funds.
Morningstar Mutual Fund 500: A year-end synopsis of 500 of the best funds.
Morningstar Variable Annuity/Life Performance Report: One monthly guide that covers the
variable annuity universe.
The Chicago-based firm has become the preeminent source of fund information for
investors. Charles A. Jaffe, the Boston Globe financial reporter, said in an August 6, 1996
article, that more than 95 percent of all money flowing into funds goes to those carrying
Morningstar's four- and five-star ratings. According to the February 24, 1997 Wall Street
Journal, eighty percent of Morningstar's client base is made up of financial planners and
brokers, who say that the firm's star ratings are a big factor in selling funds.
3.1.2 Morningstar Rating System
In June 1994, Morningstar had over 4,371 mutual funds in its total universe of funds
on CD-ROM. Of these, 2,342 (54%) having three or more years of performance data were
rated according to the Morningstar one-star to five-star rating system. Only 1,052 (24%) of
the funds were equity mutual funds based on a self-identified investment objective. By June
1995, the number of funds had increased by 51% to 6,584 with 2,871 (44%) having
Morningstar ratings and 1,234 (19%) rated as equity funds. The July 1996 Morningstar
Principia had 1,583 rated equity mutual funds. Thus, less than half of all rated funds are
equity funds and this represents only part of all the funds in the Morningstar database.
The main criterion Morningstar uses for including a fund in the biweekly newsletter is
that the fund be listed on the NASDAQ (National Association of Securities Dealers
Automated Quotations system). Other factors that enter this determination are the
cooperation of the fund group, the space limitations of the publication, the asset value of the
fund, and the investor interest in the fund (Morningstar, 1992).
Domestic stock funds, taxable-bond funds, tax-free bond funds, and international
stock funds are each rated separately. These are referred to as the fund classes. Morningstar
includes hybrid funds in the domestic stock universe. To determine the star rating,
Morningstar analysts calculate Morningstar Risk and Morningstar Return. They then
determine the relative placement of a fund in the rating system by subtracting Morningstar
Risk from Morningstar Return (Morningstar, 1992) and ordering the funds from highest to
lowest.
A fund's Morningstar Risk is determined by subtracting the 3-month Treasury bill
return from each month's return by the fund. The months with a negative value are summed,
and the total loss is divided by the total number of months in the rating period (36, 60, or
120). The average monthly loss for a fund is then compared to those of all equity funds by
dividing each value by the average for this class of funds, which sets the average
Morningstar Risk to 1.00. The resulting value expresses how risky the fund is relative to
the average fund. For example, a mutual fund with a Morningstar Risk rating of 1.30 is
30% riskier than the average mutual fund.
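The risk calculation described above can be sketched in a few lines of Python. This is only an illustration of the steps as stated in the text, not Morningstar's implementation: the function names, the two fund names, and the three-month return series are hypothetical, and the short series stands in for the 36, 60, or 120 months of a real rating period.

```python
def average_monthly_loss(fund_returns, tbill_returns):
    """Average monthly shortfall versus the T-bill return, in percentage points."""
    if not fund_returns or len(fund_returns) != len(tbill_returns):
        raise ValueError("need two non-empty monthly series of equal length")
    excess = [f - t for f, t in zip(fund_returns, tbill_returns)]
    # Only months that trailed the T-bill count as losses, but the divisor is
    # every month in the rating period, not just the losing months.
    total_loss = sum(-e for e in excess if e < 0)
    return total_loss / len(excess)


def relative_risk(avg_loss_by_fund):
    """Divide each fund's average loss by the class average, so the mean is 1.00."""
    class_average = sum(avg_loss_by_fund.values()) / len(avg_loss_by_fund)
    if class_average <= 0:
        raise ValueError("class average loss must be positive to normalize")
    return {name: loss / class_average for name, loss in avg_loss_by_fund.items()}


# Hypothetical three-month series (percent per month), for illustration only.
tbill = [0.4, 0.4, 0.4]
losses = {
    "Fund A": average_monthly_loss([2.0, -1.0, 0.5], tbill),
    "Fund B": average_monthly_loss([0.4, 0.0, 0.2], tbill),
}
risk = relative_risk(losses)
for name, value in risk.items():
    print(f"{name}: relative risk = {value:.2f}")
```

In this made-up two-fund class, Fund A comes out above 1.00 and Fund B below it, which matches the reading given in the text: a value above 1.00 means the fund lost more, on average, than its class.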
Morningstar Return is the fund's total return, adjusted for all loads (sales
commissions) and management fees applied by the fund, in excess of the Treasury bill
rate. Morningstar (1992) asserts that loads will clearly affect a three-year star
rating more than a ten-year one. Unless the fund's load is substantially different from those
of its competitors, the effect will not be unduly pronounced. The main reason for including
the load in the rating process is so that investors can compare the Morningstar Return
numbers for load and no-load funds. For example, in the June 1995 Morningstar CD-ROM
database there were 2,360 mutual funds with a mean front-end load of 4.33%. Of these, 940
were equity funds having a mean front-end load of 4.75%.
Morningstar assumes the full load was paid on front-end loads. Investors who hit
the load fund's breakpoints will receive higher than published returns. Deferred sales
charges and redemption fees are included in the calculation by assuming the investor sold
the fund at the end of the particular rating period. For the three-year period Morningstar
typically charges half of the fund's maximum deferred charge and, for the ten-year period,
they ignore the deferred charges. The average value of the Morningstar Return is divided
into each calculated return value, resulting in the average Morningstar Return being set to
1.00 to allow quick comparisons between different mutual funds. The interpretation of
Morningstar Return is similar to Morningstar Risk: a Morningstar Return of 1.30 means that
the fund is returning 30% more than the average fund.
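A rough Python sketch of the return side follows. The text does not give the exact formula for applying a load, so the adjustment below (deducting the full front-end load from the initial investment) is an assumption made for illustration, and deferred charges and redemption fees are not modeled. The function names and the two example funds are hypothetical.

```python
def load_adjusted_excess_return(total_return, tbill_return, front_load=0.0):
    """Period return in excess of T-bills after paying the full front-end load.

    All arguments are decimal fractions over the whole rating period.
    """
    # Assumption: the load comes off the initial investment, so only
    # (1 - front_load) of each dollar earns the fund's total return.
    ending_value = (1.0 - front_load) * (1.0 + total_return)
    return (ending_value - 1.0) - tbill_return


def relative_return(excess_by_fund):
    """Divide each fund's excess return by the class average, so the mean is 1.00."""
    class_average = sum(excess_by_fund.values()) / len(excess_by_fund)
    if class_average <= 0:
        raise ValueError("class average excess return must be positive to normalize")
    return {name: value / class_average for name, value in excess_by_fund.items()}


# Hypothetical funds over one rating period, for illustration only.
excess = {
    "Load fund": load_adjusted_excess_return(0.50, 0.10, front_load=0.05),
    "No-load fund": load_adjusted_excess_return(0.30, 0.10),
}
scores = relative_return(excess)
for name, value in scores.items():
    print(f"{name}: relative return = {value:.2f}")
```

Because both the risk and return figures are scaled to a class average of 1.00, they can be subtracted from one another directly, which is what the Risk-Adjusted Performance calculation relies on.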
The result of Morningstar Return minus Morningstar Risk is the Risk-Adjusted
Performance (RAP). Morningstar calculates it for each fund class for three years, five years,
and ten years, if the financial data are available for those periods. Based on the number of
years of data available, a weighted average is calculated to report the overall Morningstar
rating:
1. If three years of data are available, Morningstar uses 100% of the 3-year RAP.
2. If five years of data are available, they use 40% of the 3-year RAP and 60% of the
5-year RAP.
3. If ten years of data are available, they use 20% of the 3-year RAP, 30% of the
5-year RAP, and 50% of the 10-year RAP.
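The three weighting rules listed above can be written as a small Python function. The weights are taken from the text; the function name, its signature, and the example RAP values are hypothetical and serve only to show the arithmetic.

```python
def overall_rap(rap_3yr, rap_5yr=None, rap_10yr=None):
    """Weighted Risk-Adjusted Performance using the weights listed above."""
    if rap_10yr is not None:
        if rap_5yr is None:
            raise ValueError("a 10-year RAP implies a 5-year RAP is also available")
        return 0.20 * rap_3yr + 0.30 * rap_5yr + 0.50 * rap_10yr
    if rap_5yr is not None:
        return 0.40 * rap_3yr + 0.60 * rap_5yr
    return rap_3yr


# RAP for one period is the relative return minus the relative risk.
# The figures below are made up for illustration.
rap_3 = 1.30 - 1.00    # 3-year:  return 1.30, risk 1.00
rap_5 = 1.10 - 1.00    # 5-year:  return 1.10, risk 1.00
rap_10 = 0.90 - 1.10   # 10-year: return 0.90, risk 1.10

print(round(overall_rap(rap_3), 4))                 # three years of data only
print(round(overall_rap(rap_3, rap_5), 4))          # five years of data
print(round(overall_rap(rap_3, rap_5, rap_10), 4))  # ten years of data
```

Note how the longest available period always carries the largest weight, so a fund with a strong recent record but a weak ten-year record can still end up with a negative overall value.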

Morningstar orders the results from highest to lowest and distributes the star ratings by a
symmetric normal curve. The top 10% of funds receive five stars (highest), the next 22.5%
receive four stars, the middle 35% receive three stars, the next 22.5% receive two stars, and
the bottom 10% receive one star.
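The percentile bands for the star ratings can be sketched as follows. The five-, four-, three-, and two-star bands are from the text; the one-star band (the bottom 10%) is inferred here from the other four bands summing to 90%. How Morningstar breaks ties or rounds at band boundaries is not stated, so the handling below is an assumption, and the function name and fund labels are hypothetical.

```python
from collections import Counter


def assign_stars(rap_by_fund):
    """Rank funds by RAP (highest first) and map rank percentiles to stars."""
    # Cumulative upper bounds: top 10%, next 22.5%, middle 35%, next 22.5%,
    # and (inferred) bottom 10%.
    cutoffs = [(0.10, 5), (0.325, 4), (0.675, 3), (0.90, 2), (1.00, 1)]
    ranked = sorted(rap_by_fund, key=rap_by_fund.get, reverse=True)
    n = len(ranked)
    stars = {}
    for position, name in enumerate(ranked, start=1):
        percentile = position / n  # fraction of funds ranked at or above this one
        for upper, star in cutoffs:
            # Small tolerance guards against floating-point noise at a boundary.
            if percentile <= upper + 1e-12:
                stars[name] = star
                break
    return stars


# Forty hypothetical funds with strictly decreasing RAP values.
funds = {f"F{i:02d}": -float(i) for i in range(40)}
ratings = assign_stars(funds)
print(sorted(Counter(ratings.values()).items(), reverse=True))
```

With forty funds the bands work out to 4, 9, 14, 9, and 4 funds for five through one stars, which mirrors the symmetric shape described in the text.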
3.1.3 Review and Criticism of the Morningstar Rating System
The Morningstar ratings are published in the biweekly Morningstar Mutual Fund
newsletter and the ratings are updated monthly. Morningstar suggests that an investor could
develop their own rating system by revising the weights for the data, for example, to
emphasize risk more than return. In practice, this is difficult to do since Morningstar
publishes the detailed quantitative data on a select sample of 130 mutual funds every two
weeks in the newsletter. A subscriber to the newsletter will only see the quantitative values
needed for developing their own rating system about twice a year unless they subscribe to
the more expensive CD-ROM database that provides all the necessary information on a
monthly basis.
Morningstar (1992) states that its rating system is a purely quantitative method for
evaluating mutual funds and they only use objective data to determine the Morningstar
rating. They tell investors to use it only as a screening device and not a predictor for future
performance. However, in the early years of the star rating system Morningstar labeled
5-star funds "buy" and 1-star funds "sell", a practice that was dropped in 1990 (Leckey, 1997).
Two studies of the Morningstar rating have shown that the 5-star system is not
predictive of fund performance, according to Leckey (1997). Mark Hulbert, editor of The
Hulbert Financial Digest, which tracks investment letters, used the Morningstar ratings to
invest $10,000 in 5-star funds. He sold the mutual funds when their rating declined and, over a five-year period, this trading system failed to beat the Wilshire 5000 index. Morningstar argued that the 5-star funds should not be seen as a portfolio, but Hulbert disagreed. A study by Lipper Analytical Services looked at how the 5-star funds performed over the next twelve months when purchased at the beginning of 1990, 1991, 1992, and 1993. Lipper reported that a majority of 5-star stock funds did worse in the rest of the year than the average stock fund (Leckey, 1997).
Morningstar has conducted a study that shows that stock funds rated 5-stars in 1986, when the rating system started, posted respectable or better results during the next nine years. Conversely, more than a dozen 1-star stock funds have performed so badly as to be merged out of existence (Leckey, 1997). Another study performed by Morningstar showed that a statistically significant majority of funds that received 4- and 5-star ratings in 1987 maintained those high ratings a decade later. In her commentary, Morningstar senior analyst Laura Lallos noted that, "By the standards of what it sets out to do--separating the long-term winners from the losers--the rating is actually quite successful" (Harrell, 1997).
3.1.4 Investment Managers' Use of Ratings
Ratings possess a considerable value to the investment community as indicated by the extensive use made of them by many institutional and individual investors over a long period (Teweles and Bradley, 1987). Even organizations with extensive staffs use them in cross-checking their investigations. They are a quick, easy reference available to most investors and, when used with care, they are a valuable source of information to supplement other data.
The Morningstar approach to rating mutual funds with five classes is similar to the classification of stocks in the securities industry today.
Elton and Gruber (1987) found that the stockbroker making an investment decision quite often receives a list of stocks with a ranking on each (usually from one to five) and perhaps some partial risk information. If stocks are ranked (grouped) from 1 to 5, stocks ranked in group 1 are best buys, a 2 is a buy, 3 is a hold, 4 is a sell, while a 5 is a definite sell. Of 40 financial institutions they surveyed, 80% stated that the data the brokerage community and/or their analysts supplied to the portfolio managers were in the form of grouped data. The institutions using grouped data reported that 50% grouped them on expected return, 30% on risk-adjusted return, 10% on expected deviations from a Capital Asset Pricing Model, and 10% responded they did not even know the basis of the grouping.
Fama (1991) noted that the Value Line Investment Survey publishes weekly rankings of 1,700 common stocks into five groups. Group 1 has the best return prospects and group 5 the worst. There is evidence that, adjusted for risk and size of the company, group 1 stocks have higher average returns than group 5 stocks for horizons out to one year.
A study of the Banker's Trust Company stock forecast system (Elton et al., 1986) used the ranking of stocks on a five-point scale from 33 brokerage firms. A rating of 1 or 2 was a buy recommendation, 3 was neutral, and 4 and 5 were sell recommendations. Approximately 700 stock analysts used the system over the three-year period of the study. We made two observations from this study:
1. On average, changes occurred to 11% of the classifications every month.
2. Table 3.1 shows that, over the three-year period of the study, the distribution of the monthly average of 9,977 stock forecast ratings was in a skewed curve favoring buy recommendations.
Table 3.1: Distribution of Stock Ratings
Rating    1981     1982     1983     Overall
1         17.4%    14.9%    14.7%    15.8%
2         37.6%    29.4%    30.3%    32.5%
3         32.9%    36.8%    40.6%    38.0%
4         10.6%    10.7%    11.6%    11.2%
5          1.5%     2.5%     2.8%     2.4%
Rankings for investment instruments have a long history of use in the financial community, and the investment community readily accepts them in recommending broker sales to customers.
3.1.5 Performance Persistence in Mutual Funds
Brokers usually base their recommendation to purchase a financial instrument on historical performance. Investors flock to well-performing mutual funds and common stocks based on the anticipation of continued performance. This continued performance, if it does exist, is referred to as performance persistence, and a review of the literature provides mixed results about this strategy.
Goetzmann and Ibbotson (1994) showed that past returns and relative ranking were useful in predicting future performance, particularly for raw returns. A later study (Brown and Goetzmann, 1995) demonstrated that the relative performance pattern of funds depended upon the period observed and was correlated across managers. A year-by-year decomposition showed that persistence of return was due to a common investment strategy and not standard stylistic categories and risk-adjustment procedures.
Another study of performance persistence (Manly, 1995) showed that relative performance of no-load, growth-oriented mutual funds persisted for the near term with the strongest evidence for a one-year evaluation horizon. The difference in risk-adjusted performance between the top and bottom octile portfolios was six to eight percent per year.
Malkiel (1990) observed that while performance rankings always show many funds beating the averages--some by significant amounts--the problem is that there is no consistency to the performances.
He felt that no scientific evidence has yet been assembled to indicate that the investment performance of professionally managed portfolios as a group has been any better than that of randomly selected portfolios. A later study (Malkiel, 1995) noted that performance persistence existed in the 1970s but not in the 1980s.
Grinblatt and Titman (1992) identified the difficulty of detecting superior performance by mutual fund managers by analyzing total returns, since the fund manager may be able to charge higher load fees or expenses. They constructed gross returns for a sample of mutual funds for December 31, 1974, to December 31, 1984. They concluded that superior performance by fund managers might exist, particularly among aggressive-growth and growth funds and those funds with the smallest net asset values. However, funds with the smallest net asset values have the highest expenses so that actual returns, net of expenses, will not exhibit abnormal performance.
In Patel et al. (1991), past performance of a mutual fund was found to have an effect on cash flows. A one-percentage-point return higher than the average fund's return implies a $200,000 increased flow in the next year (where the median fund's size is $80 million and the median flow is $21 million). This performance effect is based on investors' belief that a
managed fund with a superior past will perform better than individuals.
Another confirmatory study, Phelps (1995), showed that sophisticated and
unsophisticated investors were chasing prior-year returns. Was this a good investment
strategy? Yes. Risk-adjusted fund-specific performance was significantly positively related
to past performance during the earlier half of Phelps' sample for 1985-89.
Three more recent studies show that persistence is a real phenomenon while
disagreeing on whether fund managers are responsible for it. Carhart (1997) stated that
common factors in stock returns and investment expenses almost completely explain
persistence in equity mutual funds' mean and risk-adjusted returns. He argues against fund
managers being skilled portfolio managers but does not deny persistence when it comes to
strong underperformance by the worst-return mutual funds. Elton et al. (1996) examined
predictability for stock mutual funds using risk-adjusted return. They found that past
performance is predictive of future risk-adjusted return. A combination of actively managed
portfolios was formed with the same risk as a portfolio of index funds. The actively
managed funds were found to have a small, but statistically significant, positive risk-adjusted
return during a period where mutual funds in general had negative risk-adjusted returns. Phelps
and Detzel (1997) claim that they confirmed the persistence of returns from 1985-89 but it
disappeared when risk was properly controlled for, or the more recent past was examined.
With these differing results about performance persistence, we would expect
investors to rely on performance and rating systems as a way of investing wisely. The
popularity of rating systems attests to their use. The ability to predict mutual fund ratings,
and improvements or declines in ratings, from one month (the Morningstar rating cycle) to
one year would be an important tool in developing an investment plan.
3.1.6 Review of Yearly Variation of Morningstar Ratings
We studied matched mutual funds, that is, funds with no name change over a one-year
period, from 1993 to 1996 to determine the relationship between the Morningstar rating
of one year, and the average one-year return and Morningstar rating of the succeeding
year. If some relationship existed, it would show why it would be interesting to know the
predicted Morningstar rating one year in the future. The 1993-94 period had 770
matched funds, 1994-95 had 934 matched funds, and 1995-96 had 1,059 matched funds.
In Figure 3.1, the Morningstar 5-Star ratings for 1993 appear as the data points on
the graph. The 1994 Morningstar ratings are on the x-axis. The y-axis is the one-year
average return percentage. The line connecting the data points shows, for example, the
one-year average return percentage for a fund with a 2-Star rating in 1993 if it increased
or decreased its star rating in 1994. To be more specific, the average fund with a 1993
2-Star rating had a 2% one-year return if it stayed 2-Star. If this average fund
increased to 3-Star it had a 15% one-year return. If the average fund decreased to 1-Star
it had a -7% one-year return.
Figure 3.1 also shows that 1993 2-Star and 3-Star funds that retained their
Morningstar rating in 1994 had the same one-year average return, approximately 2%. As
noted in the preceding paragraph, the 1993 2-Star fund that increased its rating had an
average 15% one-year return. However, the 1993 3-Star fund that went up to 4-Star in 1994
only had an average 10% one-year return.
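The tabulation behind Figures 3.1 through 3.6 can be sketched in a few lines of Python. This is a minimal sketch, not the procedure actually used in the study: the function name and the six fund records are hypothetical, and the records were chosen only so that their averages reproduce the 2-Star figures quoted above.

```python
from collections import defaultdict


def mean_return_by_transition(funds):
    """Average one-year return for each (prior rating, next rating) pair.

    `funds` is an iterable of (prior_rating, next_rating, one_year_return_pct).
    """
    totals = defaultdict(lambda: [0.0, 0])  # (prior, next) -> [sum of returns, count]
    for prior, nxt, ret in funds:
        cell = totals[(prior, nxt)]
        cell[0] += ret
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical matched funds: (1993 rating, 1994 rating, one-year return in %).
sample_funds = [
    (2, 1, -8.0), (2, 1, -6.0),   # 2-Star funds that fell to 1-Star
    (2, 2, 1.0), (2, 2, 3.0),     # 2-Star funds that stayed 2-Star
    (2, 3, 14.0), (2, 3, 16.0),   # 2-Star funds that rose to 3-Star
]
transition_means = mean_return_by_transition(sample_funds)
```

Each key of `transition_means` corresponds to one data point on a line of the figure: the prior-year rating selects the line and the following-year rating selects the position on the x-axis.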

This effect for one-year average returns was more distinct in Figure 3.2, using a
three-star rating system for rating the mutual funds. The relationship between average
return and rating changes was found to exist for all three years studied. Of course, it
should be noted that these are years when the stock market increased year-by-year.
Figure 3.1: 1993 Ratings and 1994 Ratings for the Five-Star Rating System.
[Chart: "1993 Rating vs. One-Year Average Return and 1994 Rating". x-axis: 1994 rating, 1-Star to 5-Star; y-axis: one-year average return (%); one series per 1993 rating.]
Figure 3.2: 1993 Ratings and 1994 Ratings for the Three-Star System.
[Chart: "1993 Rating vs. One-Year Average Return and 1994 Rating". x-axis: 1994 rating; series: 1993 1-Star, 2-Star, and 3-Star.]

Figure 3.3: 1994 Ratings and 1995 Ratings for the Five-Star System.
[Chart: "1994 Ratings vs. One-Year Average Return and 1995 Rating". x-axis: 1995 rating, 1-Star to 5-Star; series: 1994 1-Star through 5-Star.]
Figure 3.4: 1994 Ratings and 1995 Ratings for the Three-Star System.
[Chart: "1994 Ratings vs. One-Year Return and 1995 Rating". x-axis: 1995 rating; series: 1994 1-Star, 2-Star, and 3-Star.]
Figures 3.2, 3.4, and 3.6 for the 3-Star rating system show that it was better to hold
2-Star rated funds that maintained their rating or improved to 3-Star than to
own 3-Star funds that declined to 2-Star. The average one-year return on the declining
funds was less than that of a 2-Star fund.
Figure 3.5: 1995 Ratings and 1996 Ratings for the Five-Star System.
[Chart: "1995 Ratings vs. One-Year Average Return and 1996 Rating". x-axis: 1996 rating; series: 1995 1-Star through 5-Star.]
Figure 3.6: 1995 Ratings and 1996 Ratings for the Three-Star Rating System.
[Chart: "1995 Ratings vs. One-Year Return and 1996 Rating". x-axis: 1996 rating; series: 1995 1-Star, 2-Star, and 3-Star.]

Based upon this evidence, for the period studied, it would have been interesting and
profitable to have a prediction of the Morningstar mutual fund ratings one year in the future.
The graphs indicate that over a one-year period the Morningstar rating goes up when
average returns increase and declines when average returns go down.
This is somewhat surprising given the 3-year, 5-year, and 10-year return data used by
Morningstar in calculating its ratings and the moderating effect they should have on one-year
rating changes.
3.2 Problem Specification
Our review indicates that there are two issues of interest associated with the
Morningstar Mutual Fund rating system:
1) Due to the rapidly increasing number of mutual funds, Morningstar rates
approximately half the mutual funds in its database because unrated funds
do not have three years of financial data for Morningstar to calculate a rating.
Classifying unrated funds could be useful information for investors wanting
to know how these funds compare to Morningstar-rated mutual funds, and
2) Anecdotal evidence exists that investors buy mutual funds that will
maintain or improve their Morningstar rating, meaning investors have high
regard for the Morningstar rating system. The ability to predict Morningstar
mutual fund ratings one year in advance would be useful information for
planning an investment portfolio.
Based on the concept of performance persistence we would expect that mutual fund
data could have information that would indicate a fund would continue to maintain or
improve its Morningstar rating. It could be due to average returns remaining the same or
improving, or a combination of features. Likewise, if the average return or these features
decrease, we would expect the rating of the mutual fund to decrease.
We will determine the ability of C4.5 to classify mutual funds versus LDA and Logit
to demonstrate that unrated funds can be classified with this technology. Success with
classification by C4.5 will provide a foundation for our study of the prediction of
Morningstar mutual fund ratings. Therefore, our research hypotheses are as follows:
1. C4.5 can classify mutual funds as well as LDA and Logit, and
2. C4.5 can predict mutual fund ratings changes one year in the future, compared to an
investment strategy that assumes the fund maintains the same rating one year in the future.
Figure 3.7 shows a map of the experimental design of this
study. In Chapter 4, we compare the classification of mutual funds by C4.5 to LDA and
Logit to test hypothesis 1. In Chapter 5, we perform experiments with C4.5 to predict
mutual fund ratings and ratings changes one year hence to test hypothesis 2. We conducted
these studies using two rating systems: the standard Morningstar 5-Star rating system, and a
new 3-Star rating system based on merging the 1-Star and 2-Star ratings, and the 4-Star and
5-Star ratings, into two new ratings. The 3-Star system was designed to reduce the
classification error caused by the small number of funds rated 1-Star and 5-Star.
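The consolidation into a 3-Star system amounts to a simple relabeling. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the merged low ratings become the new 1-Star, the old 3-Star becomes the new 2-Star, and the merged high ratings become the new 3-Star, which is the reading suggested by the discussion of Figures 3.2, 3.4, and 3.6.

```python
def to_three_star(five_star_rating):
    """Map a 5-Star rating onto the consolidated 3-Star scale.

    Assumed mapping (not spelled out in the text): {1, 2} -> 1, {3} -> 2, {4, 5} -> 3.
    """
    if five_star_rating not in (1, 2, 3, 4, 5):
        raise ValueError("rating must be an integer from 1 to 5")
    if five_star_rating <= 2:
        return 1
    if five_star_rating == 3:
        return 2
    return 3


three_star_labels = [to_three_star(r) for r in (1, 2, 3, 4, 5)]
```

Merging the two sparsely populated extreme classes into their neighbors gives each of the three resulting classes more training examples.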

Figure 3.7: Overview of Research Phases.
[Diagram: "Experimental Plan".]

CHAPTER 4
CLASSIFICATION OF MUTUAL FUNDS BY RATINGS
The research design for this study divides into two parts. In this chapter, we
determine that C4.5 classifies mutual funds by their Morningstar rating as well as Linear
Discriminant Analysis (LDA) and Logistic Regression (Logit). In Chapter 5, C4.5
predicts future Morningstar mutual fund ratings and ratings changes.
4.1 Research Goals
The research that we have conducted has several broad research goals:
1) Demonstrate the use of decision trees as a useful classification technique for
mutual funds,
2) Improve our understanding of the domain, the Morningstar rating system,
from the standpoint of the relationship between ratings and the features or
attributes used to describe the mutual funds, and
3) Develop a knowledge base and several decision trees that can be used to
predict mutual fund ratings.
4.2 Research Caveats
Before reviewing the research methodology of these phases it should be noted that
C4.5, and also the stepwise forms of LDA and Logit in this research, use a greedy
heuristic for selecting a feature to classify an example. As noted earlier in
Chapter 2, a greedy algorithm always makes the choice that, at the moment, looks best.
Greedy algorithms do not always yield optimal solutions but, for many problems, they
often do.
Another caveat is that the decision tree that is generated by the algorithm to
correctly classify examples in the training set is only one of the many possible trees that
could classify better or worse (Quinlan, 1990). These two points mean that the order of
feature selection in these three methods of classification is not a measure of their value to
the classification process. Specifically, in the case of C4.5, more than one decision tree
could exist that would classify the training set as well as the one selected by it.
Nevertheless, if a feature is consistently selected in our samples we feel that this is an
indication of its importance to the classification process.
4.3 Example Databases
Morningstar, Inc. approved the use of their data for the experiments conducted as
part of this research. We selected examples from the Morningstar Mutual Funds OnDisc
CD-ROM for April 1993, July 1994, and July 1995, and the Morningstar Principia for
July 1996. An example consists of various features describing a mutual fund and the
Morningstar rating of 1-Star to 5-Star. A complete listing of the available features is in
Appendix A. In each of the following phases we will list the features that made up the
feature vector for each dataset.
4.4 Brief Overview of the Research Phases
Phase 1 consisted of classifying the April 1993 Morningstar data with C4.5, LDA,
and Logit. In Phase 2 we added new features derived from the April 1993 data to determine
if this improved classification. Phase 3 of this study used the July 1994 Morningstar
database to study improvements to classification caused by increased sample size (more
mutual funds now qualified for a Morningstar rating). We also consolidated the
Morningstar rating system into three ratings instead of five; the reason
for doing this is explained in Section 4.7. Phase 4 departed from comparing C4.5 with LDA
and Logit and tested the ability of C4.5 to classify funds using a large number of features
(fifty) by cross-validation with the 5-Star and 3-Star rating systems. We also studied the
effect of three-year features on classification. The results of this experiment were used to
design the final three phases of our research presented in Chapter 5.
4.5 Phase 1 - Classifying 1993 Funds
4.5.1 Methodology
In this phase, we performed three separate classifications of the April 1993
Morningstar data using C4.5, LDA, and Logit. First, we obtained examples of equity
mutual funds with complete data from the selected database. This meant that the
examples did not have missing feature values.
Equity mutual fund features used in the classification process were selected by
two criteria:
(1) minimizing the expected correlation among certain features since LDA
required that the features not be highly correlated (Klecka, 1980), and
(2) excluding total and average return features beyond the first year (three, five,
and ten year values are used by Morningstar in calculating Morningstar return) to
eliminate correlation with the Morningstar classification.
We tested for correlation among features. Percentage Rank All Funds had a
highly negative mean correlation of -0.85 with Total Return. The mean correlation was
0.64 for Total Return and Percentage Rank Funds by Objective. Percentage Rank Funds
by Objective had a mean correlation of 0.75 with Percentage Rank All Funds. Thus, we
used the Total Return feature in the dataset and excluded those correlating features.
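A pairwise correlation check of this kind can be sketched with the Pearson coefficient. This is only an illustration: the text does not say which correlation measure or software was used, and the four data points below are invented to mimic a return feature paired with a percentile rank in which a lower number means a better rank.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values: total return (%) and percentile rank among all funds.
total_return = [12.0, 8.0, 3.0, -1.0]
pct_rank_all = [5, 30, 60, 90]
r = pearson(total_return, pct_rank_all)
```

A strongly negative `r`, as here, is the pattern that would lead to keeping only one feature of the pair.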
Factor analysis was not used to reduce this set to independent uncorrelated factors
due to its underlying assumption that the variates are multivariate normal (Morrison,
1990). We show later in this chapter that most features are not univariate normal and,
therefore, not multivariate normal. Manly (1995) considers univariate normality a
minimum requirement before testing for multivariate normality. In addition, the data
consisted of five classification groupings (the Morningstar stars). Finding a single
orthogonal transformation such that the factors are simultaneously uncorrelated for the
five groupings would require that the transformations be the same and differ only by the
sampling error (Flury and Riedwyl, 1988). This would be very difficult to measure and
would be of suspect value.
C4.5 did not require an assumption about an underlying distribution of the data or
the correlation between features (Quinlan, 1993). However, others (Han et al., 1996)
have identified decreased classification accuracy of ID3, the C4.5 precursor, as
correlation among explanatory variables increased. Therefore, the data just were not
ideal for any of the three classification systems.
The twenty-four continuous features used for this phase are presented in Table 4.1.
We selected 555 equity mutual funds having no missing feature values from the
Morningstar database. The funds included four investment objectives: Aggressive
Growth, Equity-Income, Growth, and Growth-Income.
Table 4.1: Classification Features for Phase 1.
Yield                      Return on Assets
Year-to-Date Return        Debt % Total Capitalization
3-Month Total Return       Median Market Capitalization
Year-1 Total Return        Cash %
Alpha                      Natural Resources Sector
Beta                       Industrial Products Sector
R-Squared                  Consumer Durables Sector
Total Assets               Non-Durables Sector
Expense Ratio              Services Sector
Turnover                   Financial Services Sector
P/E Ratio                  Manager Tenure
P/B Ratio
We used the procedure described by Weiss and Kulikowski (1991), referred
to as the train-and-test paradigm, for estimating the true error rate and performance of the
three systems in classifying the mutual funds. We randomly sampled the 555 funds using
a uniform distribution to produce a training set of 370 examples and a testing set of 185
examples. We calculated the error rate according to the following formula:
\[ \text{error rate} = \frac{\text{number of errors}}{\text{number of cases}} \]
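The train-and-test split and the error-rate formula can be sketched as follows. This is a minimal illustration rather than the study's actual sampling code: the function names and the seed are hypothetical, and integer fund identifiers stand in for the real examples.

```python
import random


def train_test_split(examples, n_train, seed):
    """Shuffle uniformly at random and split into training and testing sets."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def error_rate(predicted, actual):
    """Number of misclassified cases divided by the number of cases."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = sum(1 for p, a in zip(predicted, actual) if p != a)
    return errors / len(actual)


fund_ids = list(range(555))                      # stand-ins for the 555 funds
train_set, test_set = train_test_split(fund_ids, n_train=370, seed=0)
```

With a testing set of 185 funds, 62 errors correspond to an error rate of 62/185, or about 33.5%.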
We computed a lower bound for the size of the training set with twenty-four
continuous features for a two-class decision tree induction sufficient to guarantee an error
level, ε, within a specified degree of confidence, δ, for binary data (Kim and Koehler,
1995). With ε = 0.1 and δ = 0.01 the sample size required was 378. This would be
appropriate because our data consisted of continuous features upon which C4.5 would
perform binary splits.
Our classification problem, however, had five classes, so we required five times
as many examples for training (Langley, 1996, p. 31) or 1,892 examples using the above
parameters. However, we only had 370 examples and had to modify the error and
confidence parameters to ε = 0.35 and δ = 0.115. This meant that given the 370
examples, we were 88.5% confident that the output of the algorithm was 65% correct for
the five classes or star ratings.
We processed each of the twenty training datasets with the SAS stepwise
discriminant analysis procedure, STEPDISC. A major assumption for this statistical
procedure is that the features are multivariate normal with a common covariance matrix.
Using the one-sample two-tailed Kolmogorov-Smirnov Test we determined that of the 22
features only Debt % of Total Capitalization (p = 0.396) and Return on Assets (p =
0.044) were univariate normally distributed and concluded that the data were not
multivariate normal.
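The one-sample Kolmogorov-Smirnov statistic compares a feature's empirical distribution with a normal distribution. The sketch below computes only the statistic D, not the p-values reported above, and it assumes that the normal mean and standard deviation are estimated from the sample; the text does not say how the reference distribution was specified. Both datasets are invented for illustration.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    d = 0.0
    for i, v in enumerate(sorted(values)):
        cdf = normal_cdf(v, mean, std)
        # The empirical CDF jumps from i/n to (i+1)/n at v; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


symmetric_sample = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]   # roughly bell-shaped
skewed_sample = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 100.0]       # one extreme outlier
d_symmetric = ks_statistic_normal(symmetric_sample)
d_skewed = ks_statistic_normal(skewed_sample)
```

A larger D indicates a poorer fit to normality. Turning D into a p-value requires the Kolmogorov distribution, and estimating the parameters from the same sample makes the standard p-value conservative.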
The classification features determined by STEPDISC then were processed with
the SAS DISCRIM procedure to compute the discriminant function based on the selected
features. The result was a classification of the mutual funds into the five classes or
Morningstar ratings for the training and testing sets. In addition, the SAS system
produced a confusion matrix for each set of examples that compared the classification
determined by the system to the actual Morningstar rating. We made an error count for
each holdout set.
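A confusion matrix and the error count derived from it can be sketched as below. This illustrates the idea only and is not the SAS output format; the function name and the four example ratings are hypothetical.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Hypothetical star ratings for four holdout funds.
actual_ratings = [1, 2, 3, 3]
predicted_ratings = [1, 3, 3, 2]
matrix = confusion_matrix(actual_ratings, predicted_ratings, classes=[1, 2, 3])

# Errors are all cases off the main diagonal.
total_cases = sum(sum(row) for row in matrix)
error_count = total_cases - sum(matrix[i][i] for i in range(len(matrix)))
```

The off-diagonal cells also show which ratings are confused with which, something a single error count hides.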
Next, we processed the training set with the SAS LOGISTIC procedure, which
fitted a linear logistic regression model for ordinal response data by the method of
maximum likelihood. The approach selected for this procedure was the stepwise
selection of features and, using the following equations, the system performed a
probability calculation to determine the classification of each example:
\[ \operatorname{logit}(p) = \text{intercept} + \sum_{j} \text{parameter}_j \times \text{feature value}_j \]
where p was the probability of the classification and was defined by:
\[ p = \frac{e^{\operatorname{logit}(p)}}{1 + e^{\operatorname{logit}(p)}} \]

We considered the first rating classification with a probability greater than 50% to be the
determined rating and we compared this to the actual Morningstar rating. A count was
made of the misclassified examples to determine the classification errors of Logit. We
determined a classification rating for every example in the training set and testing set
and all were included in the error count.
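The classification rule just described can be sketched as follows. This is one possible reading, not the SAS implementation: it assumes a cumulative-logit model with one intercept per rating boundary and a shared set of parameters, and it assumes the ratings are scanned from 1-Star upward. The function names and all numeric values are hypothetical.

```python
import math


def inverse_logit(z):
    """p = e^z / (1 + e^z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classify_ordinal(intercepts, parameters, feature_values, threshold=0.5):
    """Return the first rating whose cumulative probability exceeds the threshold.

    `intercepts` holds one value per boundary between adjacent ratings, in
    increasing order, so four intercepts describe five ratings.
    """
    linear = sum(b * x for b, x in zip(parameters, feature_values))
    for rating, intercept in enumerate(intercepts, start=1):
        if inverse_logit(intercept + linear) > threshold:
            return rating
    return len(intercepts) + 1  # no boundary exceeded: the last rating


# Hypothetical fitted values for a single-feature model.
intercepts = [-2.0, -1.0, 0.5, 2.0]
parameters = [1.0]
rating_mid = classify_ordinal(intercepts, parameters, [0.0])
rating_low = classify_ordinal(intercepts, parameters, [3.0])
rating_high = classify_ordinal(intercepts, parameters, [-5.0])
```

Under these assumptions every example receives exactly one rating, which matches the statement that all examples were included in the error count.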
C4.5 classified the training set by building the classification decision tree using
the Gain Ratio Criterion, described in Section 2.3.2.1, and the test examples were then
classified with the best tree. We used the default settings for C4.5
described in Quinlan (1993) with the exception that any test used in the classification tree
must have at least two outcomes with a minimum of x examples. Quinlan (1993)
recommends higher values of x for noisy data. We incremented this value in units
between 5 and 25 for the experiments.
The C4.5 program calculated the error rate for the training set and the testing set
and then produced a confusion matrix of the classification of the test set by comparing it
to the actual Morningstar rating from the example.
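The gain ratio for a binary split on a continuous feature, together with the minimum-examples constraint x, can be sketched as below. This is a simplified illustration and not Quinlan's code: C4.5 applies further rules when choosing among candidate tests, and the function names, the `min_cases` parameter name, and the four-example dataset are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold, min_cases):
    """Gain ratio of the split value <= threshold versus value > threshold.

    Returns 0.0 when either branch has fewer than `min_cases` examples,
    mirroring the requirement of at least two outcomes with x examples each.
    """
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    if min(len(left), len(right)) < max(1, min_cases):
        return 0.0
    n = len(labels)
    weights = (len(left) / n, len(right) / n)
    gain = entropy(labels) - weights[0] * entropy(left) - weights[1] * entropy(right)
    split_info = -sum(w * math.log2(w) for w in weights)
    return gain / split_info


# Hypothetical feature values and ratings for four funds.
feature = [1.0, 2.0, 3.0, 4.0]
rating = ["low", "low", "high", "high"]
ratio_clean = gain_ratio(feature, rating, threshold=2.0, min_cases=2)
ratio_blocked = gain_ratio(feature, rating, threshold=1.0, min_cases=2)
ratio_uneven = gain_ratio(feature, rating, threshold=1.0, min_cases=1)
```

Raising `min_cases` rules out splits that isolate only a handful of examples, which is why larger values are suggested for noisy data.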
4.5.2 Results
Logit performed best with a mean classification error of 33.5% (62 mean errors
per sample) over the twenty samples. C4.5 followed with a 37.2% mean classification
error (69 mean errors per sample) and LDA had a 39.6% mean classification error (73
mean errors per sample). Figure 4.1 shows the classification error rate for each of the 20
samples (labeled A through T). We performed an Analysis of Variance (ANOVA) test with the
null hypothesis of equal means on the number of classification errors. With F = 30 (p = 0.0)
for df1 = 2 and df2 = 57, we rejected the null hypothesis, meaning that the three methods did
not classify the funds equivalently.
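The one-way ANOVA F statistic and its degrees of freedom can be sketched as follows. The function name and the small dataset are hypothetical and only illustrate the calculation; with three methods and twenty samples each, the degrees of freedom work out to 3 - 1 = 2 and 60 - 3 = 57, as reported above.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical error counts for three methods, three samples each.
f_small, df1_small, df2_small = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])

# Three methods with twenty samples each give the degrees of freedom in the text.
twenty = [[60 + i % 5 for i in range(20)],
          [68 + i % 5 for i in range(20)],
          [72 + i % 5 for i in range(20)]]
_, df1, df2 = one_way_anova_f(twenty)
```

The p-value follows from comparing F with the F distribution on these degrees of freedom, a step omitted here.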
Figure 4.1: C4.5, LDA, and Logit Classification Errors for April 1993 Morningstar Data.
Table 4.2: C4.5 Classification Features for Phase 1.
Feature                          Frequency   Feature                          Frequency
Expense Ratio                    20          Year 1 Total Return              4
Alpha                            19          YTD Total Return                 4
Yield                            14          Cash %                           3
Median Market Capitalization     13          Debt % of Total Capitalization   3
Assets                           11          Industrial Products Sector       3
Return on Assets                 10          Natural Resources Sector         3
R-Squared                        9           Nondurables Sector               3
P/B Ratio                        8           Manager Tenure                   2
Beta                             5           P/E Ratio                        2
Consumer Durables Sector         4           Retail Sector                    2
Turnover                         4           Service Sector                   2
In Table 4.2, we list the features selected by C4.5 for the classification process
and the selection frequency. While frequency is not an absolute measure of classification
importance, we gained useful knowledge about how often a feature was selected. On
average, each decision tree consisted of 7.4 features with a median of 7.5 features.
LDA and Logit select each feature at most once, although the procedures may later
discard a feature due to the stepwise nature of both procedures. Tables 4.3 and 4.4
display not only the selection frequency of the features but also their position in the
classification process. Both values, taken together, provide some indication of the
classification importance of a feature to the 20 samples. For example, while the Services
Sector has an average position of 6.9, it has a position standard deviation (σ) of 2.7 and it
was used in only 7 samples. Year-1 Total Return, while having an average position of
7.2, has a much smaller σ and was used in classifying all 20 samples. Thus, we
considered Year-1 Total Return to be more important to the classification process than
the Services Sector feature. On the other hand, we would be suspicious of a classification
starting out with Natural Resources Sector rather than Alpha, Beta, or Assets.
Table 4.3: LDA Classification Features.
Feature                          Average Position   σ     Frequency
Alpha                            1.0                0.0   20
R-Squared                        2.0                0.0   20
Beta                             3.0                0.0   20
Assets                           4.8                2.1   20
Year 1 Total Return              7.2                1.9   20
Turnover                         6.3                1.9   19
Retail Sector                    7.6                1.3   16
Expense Ratio                    7.9                2.4   16
Return on Assets                 8.1                2.0   14
Cash %                           9.4                2.6   13
Debt % of Total Capitalization   8.9                2.5   10
Consumer Durables Sector         10.4               2.1   8
P/E Ratio                        11.0               1.3   8
Services Sector                  6.9                2.7   7
P/B Ratio                        9.4                3.2   7
Financial Sector                 12.4               1.5   5
Industrial Products Sector       9.8                2.6   4
Median Market Capitalization     11.3               3.3   4
Yield                            10.0               2.0   3
Nondurables Sector               11.3               0.6   3
YTD Total Return                 12.0               0.0   3
Natural Resources Sector         12.7               1.5   3
LDA used a mean of 12.2 features to classify each sample (median = 12.0). Logit
required the fewest features with a mean of 5.9 features for classifying the training set
(median = 6.0).
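The three summary columns in the LDA and Logit feature tables (average position, σ, and frequency) can be computed from the per-sample selection order. The sketch below is an assumption about how such a summary might be produced, not the procedure actually used: it uses the sample standard deviation, which is undefined for a feature selected only once, and the three selection lists are invented.

```python
import statistics


def selection_summary(selections):
    """Map feature -> (average position, position std. dev. or None, frequency).

    `selections` holds one ordered list of selected features per sample.
    """
    positions = {}
    for ordered_features in selections:
        for position, feature in enumerate(ordered_features, start=1):
            positions.setdefault(feature, []).append(position)
    summary = {}
    for feature, pos_list in positions.items():
        spread = statistics.stdev(pos_list) if len(pos_list) > 1 else None
        summary[feature] = (statistics.mean(pos_list), spread, len(pos_list))
    return summary


# Hypothetical stepwise selection orders for three samples.
samples = [
    ["Alpha", "Beta", "Assets"],
    ["Alpha", "Assets", "Beta"],
    ["Alpha", "Beta"],
]
summary = selection_summary(samples)
```

A feature with a low average position, a small σ, and a high frequency is one that the stepwise procedure picks early and consistently.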
Table 4.4: Logit Classification Features.
Feature                          Average Position   σ     Frequency
Alpha                            1.0                0.0   20
R-Squared                        2.0                0.0   20
Beta                             3.0                0.0   20
Assets                           4.3                0.6   20
Expense Ratio                    5.2                0.8   13
Debt % of Total Capitalization   5.3                0.5   11
Median Market Capitalization     6.3                0.6   3
Financials Sector                6.0                0.0   3
Natural Resources Sector         8.0                1.4   2
P/E Ratio                        6.0                1.4   2
Return on Assets                 5.0                0.0   2
Yield                            5.0                0     1
Retail Sector                    8.0                0     1
A listing of Phase 1 features used for classification appears in Appendix B. In Table 4.5,
we provide a consensus list of the features that the three programs selected consistently
for classification. This listing represents the features most frequently selected by all three
classification methodologies.

Table 4.5: Consensus List of Classification Features for Phase 1.
Feature
Alpha
R-Squared
Beta
Assets
Expense Ratio
Debt % of Total Capitalization
Return on Assets
4.5.3 Conclusions
The results showed that Logit had fewer classification errors than C4.5 and LDA.
Logit performed better than C4.5 in seventeen of the twenty samples. Also, the three
classification algorithms used a very limited and comparable mix of the twenty-four
features to classify the twenty testing sets. After reviewing the error and confidence
factors, we concluded that a larger sample size could improve the performance of C4.5.
4.6 Phase 2 - 1993 Data with Derived Features
4.6.1 Methodology
The second phase of this study used the Morningstar April 1993 database to
determine if new features, derived from the database, improved the classification of
mutual funds. The derived features included an approximation of Morningstar Return minus
Morningstar Risk (not published by Morningstar in 1993), a version of this approximation
with the weights reversed, and the Treynor Index.
The Treynor Performance Index, T_p, is a measure of relative performance not
calculated by Morningstar and is defined as follows:
\[ T_p = \frac{R_p - R_F}{\beta_p} \]
where R_p is the portfolio return, R_F is the risk-free rate of return, and β_p is the portfolio's
beta or the nondiversifiable past risk. The Treynor Performance Index treats only that
portion of a portfolio's historical risk that is important to investors, as estimated by β_p, and
neglects any diversifiable risk (Radcliffe, 1994).
T_p was calculated by subtracting the three-year Mean Treasury Bill rate from the
Morningstar Three-year Annualized Return for the mutual fund, and dividing this value
by the three-year β of the mutual fund in the Morningstar database.
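The calculation just described is a single line of arithmetic. In the sketch below the function name and the numeric inputs are hypothetical; only the formula itself comes from the text.

```python
def treynor_index(annualized_return, risk_free_rate, beta):
    """(portfolio return - risk-free rate) / beta, all measured over the same period."""
    if beta == 0:
        raise ValueError("the Treynor index is undefined for a beta of zero")
    return (annualized_return - risk_free_rate) / beta


# Hypothetical fund: 12% three-year annualized return, 4% mean Treasury Bill
# rate over the same three years, and a three-year beta of 1.6.
t_p = treynor_index(annualized_return=12.0, risk_free_rate=4.0, beta=1.6)
```

Because the excess return is divided by beta, two funds with the same excess return rank differently when one of them took on more nondiversifiable risk to earn it.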
We constructed the other new features also using data from Morningstar. The
April 1993 Morningstar CD-ROM did not provide the actual Morningstar Return values
but did provide Morningstar Risk. To approximate Morningstar Return, the 3-year, 5-year,
and 10-year Average Returns were used as surrogates in the Morningstar Return
minus Risk formulas. The difference between them and the actual Morningstar Return
for those periods was that Morningstar deducted the effects of mutual
fund loads and redemption fees that would make it very difficult for anyone to construct a
In developing the Reversed Weight feature, we reversed the weights used by
Morningstar in their Risk-Adjusted Return formulas and applied these to the calculation
of five-year and ten-year Return minus Risk data. For five-year old mutual funds,
Morningstar used 40% of the three-year value and 60% of the five-year value and we
reversed them. For ten-year old mutual funds, Morningstar used 20% of the three-year
data, 30% of the five-year data, and 50% of the ten-year data. We reversed these weights
to 50% for three-year data, 30% for five-year data, and 20% for ten-year data.
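The standard and reversed weightings can be compared with a short sketch. The weights are those stated above; the dictionary layout, the function name, and the Return minus Risk values are hypothetical.

```python
# Weights keyed by fund age in years, then by the period (in years) being weighted.
STANDARD_WEIGHTS = {
    5: {3: 0.40, 5: 0.60},
    10: {3: 0.20, 5: 0.30, 10: 0.50},
}
REVERSED_WEIGHTS = {
    5: {3: 0.60, 5: 0.40},
    10: {3: 0.50, 5: 0.30, 10: 0.20},
}


def weighted_return_minus_risk(values_by_period, weights):
    """Weighted sum of the per-period Return minus Risk values."""
    return sum(weight * values_by_period[period] for period, weight in weights.items())


# Hypothetical Return minus Risk values for one ten-year-old fund.
values = {3: 1.0, 5: 2.0, 10: 3.0}
standard_10 = weighted_return_minus_risk(values, STANDARD_WEIGHTS[10])
reversed_10 = weighted_return_minus_risk(values, REVERSED_WEIGHTS[10])
```

Reversing the weights emphasizes the most recent three years, so a fund whose recent record is weaker than its long-run record scores lower under the reversed scheme, as in this example.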

We prepared a dataset consisting of 20 random samples. By increasing the
number of Investment Objectives from the four used in Phase 1 to the twelve domestic
equity investment objectives we were able to increase the total number of examples to
784. The Regular Dataset had the 23 features listed in Table 4.6. We constructed a similar
dataset that added the three derived features, for 26 features in all, and referred to it as the
Derived Dataset. Each sample provided a training set of 523 examples and a testing set of
261 examples. A lower bound for the size of the 5-Star training set, using the method of
Phase 1, was calculated at 523 examples for the Regular Dataset with ε = 0.35 and δ =
0.078, and 523 examples for the Derived Dataset with ε = 0.35 and δ = 0.111.
The measurement of interest for this phase was the fewest classification errors
using C4.5, LDA, and Logit. We processed the training sets and testing sets through the
same SAS procedures used in Phase 1, as well as C4.5. Table 4.6 lists the features
common to the Regular and Derived datasets.
Table 4.6: Phase 2 Common Features for Classification by C4.5, LDA, and Logit.
Yield                      Return on Assets
YTD Return                 Debt % Total Capitalization
Year 1 Total Return        Median Mkt. Capitalization
Alpha                      Cash % of Holdings
Beta                       Natural Resources Sector
R-Squared                  Industrial Products Sector
Total Assets               Consumer Durables Sector
Expense Ratio              Non-Durables Sector
Turnover                   Services Sector
P/E Ratio                  Financial Services Sector
P/B Ratio                  Manager Tenure

4.6.2 Results for the Regular Dataset
The mean classification errors over the twenty samples for the Regular Dataset
were as follows: C4.5 was 36.6% (95.6 mean errors), LDA was 37.7% (98.5 mean errors),
and Logit was 35.8% (93.4 mean errors). Figure 4.2 shows the errors by sample. Logit
had fewer errors than C4.5 in fifteen samples.
However, an ANOVA test with a null hypothesis of equal means was performed
on the number of errors and yielded F = 1.35 (p = 0.267) for df1 = 2 and df2 = 57. Thus, we
failed to reject the null hypothesis and we assumed the three methods classified the
mutual funds equivalently.
Figure 4.2: C4.5, LDA, and Logit Classification Errors for 1993 Morningstar Data for the
Regular Dataset.
The features selected by C4.5 and their frequency are displayed in Table 4.7.
C4.5 used an average of 8.9 features per sample with a median of 8.0 features.

Table 4.7: C4.5 Feature Selection for the Regular Dataset.
Feature                          Frequency   Feature                          Frequency
Alpha                            20          Natural Resources Sector         6
Assets                           20          Service Sector                   5
Industrial Products Sector       16          Cash %                           4
Median Market Capitalization     13          Turnover                         4
R-Squared                        13          YTD Total Return                 4
SEC Yield                        13          Beta                             3
Expense Ratio                    12          Consumer Durables Sector         3
Return on Assets                 10          P/B Ratio                        3
Debt % of Total Capitalization   9           Financial Sector                 1
P/E Ratio                        8           Manager Tenure                   1
Year 1 Total Return              8           Nondurables                      1
                                             Retail Sector                    1
Table 4.8: LDA Feature Selection for the Regular Dataset.
Features                         Average Position   σ     Frequency
Alpha                            1.0                0.0   20
R-Squared                        2.0                0.0   20
Beta                             3.0                0.0   20
Assets                           5.0                1.0   20
Industrial Products Sector       5.7                1.7   20
Debt % of Total Capitalization   7.7                2.7   20
YTD Total Return                 7.9                2.7   18
P/B Ratio                        8.1                2.4   14
Retail Sector                    10.0               1.8   14
Cash %                           11.0               2.8   14
Year 1 Total Return              11.0               2.3   14
Turnover                         9.2                1.8   13
P/E Ratio                        11.0               4.0   10
Expense Ratio                    11.0               3.4   10
Finance Sector                   11.0               3.2   9
Service Sector                   11.0               2.6   8
Return on Assets                 13.0               3.3   8
Consumer Durables                12.0               2.3   7
Natural Resources Sector         5.8                3.1   6
Manager Tenure                   14.0               1.2   6
SEC Yield                        12.0               2.1   3
Non-Durables Sector              13.0               0     1

Table 4.8 summarizes the feature selection positioning, the standard deviation of
the position (σ), and the frequency with which LDA used the feature for classification.
LDA required an average of 13 features to classify the samples.
The features most frequently selected by Logit are listed in Table 4.9. Alpha, R-
Squared, Beta, and Assets were selected for every sample. Logit used an average of 7.3
features to classify the 20 samples.
Table 4.9: Logit Feature Selection for the Regular Dataset.
Features                         Average Position   σ     Frequency
Alpha                            1.0                0.0   20
R-Squared                        2.0                0.0   20
Beta                             3.0                0.0   20
Assets                           4.6                0.9   20
YTD Total Return                 6.2                0.6   13
Return on Assets                 5.5                1.4   11
Debt % of Total Capitalization   4T                 0.6   10
Consumer Durables Sector         6.4                1.6   8
Expense Ratio                    5.7                0.8   6
Industrial Products Sector       6.5                1.3   4
Manager Tenure                   7.3                0.6   3
P/B Ratio                        7.7                0.6   3
P/E Ratio                        6.0                1.4   2
Retail Sector                    7.0                1.4   2
Median Market Capitalization     7.0                      1
SEC Yield                        8.0                      1
Year 1 Total Return              8.0                      1
Table 4.10: Consensus List for Regular Features.
Features
Alpha
R-Squared
Assets
Beta
YTD Total Return

Table 4.10 lists the consensus features used with high regularity by the three
methodologies classifying the Regular Dataset. There was little agreement beyond these
five features.
4.6.3 Results for the Derived Features Dataset
C4.5 had a mean classification error rate of 20.9% (54.6 mean errors), LDA
had a mean error rate of 26.7% (69.7 mean errors), and Logit had a mean error
rate of 20.5% (53.6 mean errors). Figure 4.3 shows the classification
errors for the samples. While Logit performed best, its advantage over C4.5 was not
statistically significant.
We performed an ANOVA test with the null hypothesis of equal means for the twenty
samples and calculated F = 52.7 (p = 0.0) for df1 = 2 and df2 = 57, and we rejected the null
hypothesis. A separate t-test with a null hypothesis of equal means for C4.5 vs. Logit had
a p-value of 0.40, so we failed to reject that null hypothesis and considered the mean errors of
C4.5 and Logit to be equal. Therefore, C4.5 and Logit classified the mutual funds
equally well and LDA performed the worst.
Figure 4.3: C4.5, LDA, and Logit Classification Errors with Derived Features.

The features most frequently selected by C4.5 are listed in Table 4.11. On
average, the number of features used for classification declined to 4.4 (from 8.9 for the
Regular Dataset), with a median of 4.0. The Treynor Index was only selected twice.
Table 4.11: C4.5 Derived Feature Selection.
Feature                             Frequency   Feature                        Frequency
Return minus Risk                   19          Median Market Capitalization   2
Assets                              17          R-Squared                      2
SEC Yield                           9           Treynor Index                  2
Reversed Weight Return minus Risk   5           Cash %                         1
Alpha                               4           Manager Tenure                 1
Beta                                4           Nondurables Sector             1
Debt % of Total Capitalization      4           P/B Ratio                      1
Consumer Durables Sector            3           P/E Ratio                      1
Turnover                            3           Retail Sector                  1
Expense Ratio                       2           Services Sector                1
Finance Sector                      2           Year 1 Total Return            1
Industrial Sector                   2
In Table 4.12 we see that LDA used more features to classify
these datasets than did C4.5 and Logit. Fifteen features (mean and median) were required
for classification. Surprisingly, it even used the derived Reversed-Weight Return minus
Risk feature in all twenty samples. The Treynor Index was used for 12 samples.
Table 4.13 shows that Logit used a mean of 4.8 features for
classification (median = 4.5).
Table 4.14 lists the consensus features used with regularity by the three
classification methodologies. Return minus Risk was the most commonly used feature
and dominated the selection of other features. Both C4.5 and Logit had few features that
were used in more than ten samples. LDA used many more features with poor results.

Table 4.12: LDA Derived Feature Selection.
Features                            Average Position   σ     Frequency
Return minus Risk                   1.1                0.2   20
R-Squared                           2.0                0.2   20
Industrial Products Sector          3.8                1.5   20
Assets                              5.1                1.6   20
Alpha                               6.4                2.2   20
Reversed-Weight Return minus Risk   6.9                2.0   20
Beta                                7.8                1.8   20
Expense Ratio                       9.6                2.7   19
Debt % of Total Capitalization      11                 2.6   18
Return on Assets                    8.7                3.2   17
Turnover                            9.4                2.3   14
Cash %                              13                 3.6   12
Treynor Index                       10.0               2.9   12
Consumer Durables Sector            13                 2.1   11
Manager Tenure                      14                 1.4   10
Retail Sector                       12                 3.0   9
P/B Ratio                           12                 3.9   8
YTD Total Return                    6.6                5.3   7
Service Sector                      13                 3.5   7
Yield                               12                 4.2   5
Year 1 Total Return                 13                 2.5   5
Natural Resources Sector            14                 2.5   5
P/E Ratio                           15                 0.8   4
Median Market Capitalization        15                 6.4   2
Finance Sector                      15                 2.8   2
Non-Durable Sector                  13                       1
4.6.4 Conclusions
The classification features for each sample are in Appendix C. A review of the
features selected by C4.5 and Logit showed that a small number of the 23 available
features were used to classify the Regular Dataset and the Derived Dataset. Increased
sample size reduced the classification errors of C4.5 to where it was equivalent to Logit.

Table 4.13: Logit Derived Feature Selection.
Features                         Average Position   σ     Frequency
Return minus Risk                1.0                0.0   20
Beta                             2.2                0.5   19
Manager Tenure                   3.7                1.1   14
Assets                           3.1                0.8   11
Natural Resources Sector         4.4                0.9   11
Expense Ratio                    4.1                0.9   9
R-Squared                        4.5                0.6   4
Consumer Durables Sector         6.0                1.0   3
Alpha                            6.0                      1
Retail Sector                    5.0                      1
Debt % of Total Capitalization   4.0                      1
P/B Ratio                        2.0                      1
Table 4.14: Consensus List for Derived Features
Features
Return minus Risk
Assets
Beta
Manager Tenure
Using the derived feature of Return minus Risk resulted in improved
classification versus the Regular Dataset. We observed that when this feature was present
in the feature vector, C4.5 and Logit used substantially fewer features for classification.
The derived features of Reversed Weight Return minus Risk and the Treynor Index were
not selected for classification by Logit and were seldom used by C4.5.
The interesting result of the second phase was the disappointing performance of
LDA with the derived features. We concluded that the derived features introduced either
more noise than acceptable or a degree of nonlinearity, or that LDA was affected by high
correlation among features. Since we had violated the constraints of multivariate
normality and multicollinearity for this classification methodology, it was impossible to
determine the exact cause of this failure.
4.7 Phase 3 - Comparing 5-Star and 3-Star Classification
4.7.1 Methodology
Phase 3 of this research tested the idea that we could improve classification error
rates, i.e., lower them, by combining the five Morningstar ratings into three using the
following scheme:
(1) Morningstar ratings 1-Star and 2-Star became new rating 1-Star,
(2) Morningstar rating 3-Star became new rating 2-Star, and
(3) Morningstar ratings 4-Star and 5-Star became new rating 3-Star.
We proposed this variation of the standard Morningstar rating system to increase the
number of examples at each end of the Morningstar scale. An examination of the data
from the previous phases showed that a higher error rate was occurring in the 1-Star and
5-Star Morningstar ratings than in the 2-Star, 3-Star, or 4-Star ratings. The 1-Star and
5-Star classifications each represented 10% or less of the examples in the
datasets. Additionally, previously cited material showed that investors mostly purchased
4- and 5-Star mutual funds. Combining the individual ratings in this manner still
permitted segmenting the 4- and 5-Star funds from the funds with lower ratings.
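The regrouping scheme above amounts to a simple lookup. The following is a minimal sketch for illustration only; the names FIVE_TO_THREE and to_three_star are hypothetical and were not part of the original experiments.

```python
# Illustrative sketch of the 5-Star to 3-Star regrouping described above.
# FIVE_TO_THREE and to_three_star are hypothetical names chosen for this example.
FIVE_TO_THREE = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}


def to_three_star(morningstar_rating: int) -> int:
    """Map a Morningstar rating (1 to 5) onto the proposed 3-Star scale."""
    if morningstar_rating not in FIVE_TO_THREE:
        raise ValueError(f"rating must be 1 to 5, got {morningstar_rating}")
    return FIVE_TO_THREE[morningstar_rating]


if __name__ == "__main__":
    # Show the new rating for each original rating.
    for rating in range(1, 6):
        print(rating, "->", to_three_star(rating))
```

Because the two extreme classes are merged with their neighbors, each end of the new scale holds more examples than either of the original extreme classes.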
Minitab was used to generate ten uniformly random training sets and testing sets
from the July 1994 Morningstar CD-ROM database. Thirty-two features were selected
for the classification problem and are listed in Table 4.15. The number of complete
examples extracted from the equity mutual funds database was 999, resulting in 666
training set examples and 333 testing set examples. We calculated a lower bound on the
required number of training examples. For ε = 0.30 and δ = 0.13 we required 669
training examples for the three-class experiment. With ε = 0.30 and δ = 0.1925 we
required 668 training examples for the five-class experiment.
Table 4.15: Phase 3 Features.
3 Month Total Return             Financials Sector
6 Month Total Return             Industrial Cyclicals Sector
Year 1 Total Return              Consumer Durables Sector
Alpha                            Consumer Staples Sector
R-Squared                        Services Sector
Distributed Yield                Retail Sector
Income Ratio                     Health Sector
Turnover                         Technology Sector
P/E Ratio                        Cash %
P/B Ratio                        Stocks %
Return on Assets                 Foreign Stock %
Debt % of Total Capitalization   Maximum Sales Charge
Median Market Capitalization     Expense Ratio
Morningstar Style                Assets
Utilities Sector                 Manager Tenure
Energy Sector                    Minimum Initial Purchase
Table 4.16 shows that we created four datasets from the July 1994 Morningstar
CD-ROM database for the classification experiments. Two datasets using the
Morningstar 5-Star rating were prepared to test against the new 3-Star rating system.
Table 4.16: Phase 3 Datasets
Dataset   Description
1         Morningstar 5-Star rating and all features excluding Alpha and R-Squared
2         Morningstar 5-Star rating and all features
3         New 3-Star rating and all features excluding Alpha and R-Squared
4         New 3-Star rating and all features
We excluded Alpha and R-Squared from two of the datasets to determine the
accuracy of classifying mutual funds absent three-year data. This is the minimum
financial reporting period required for a Morningstar rating and this test would show how
well we could classify funds not having three-year data, or unrated funds. We performed
a separate test of classifying mutual funds from the July 1994 database by only Alpha and
had an error rate of 49%. We concluded that Alpha randomly classifies the mutual funds
using the 5-star rating system.
We conducted the classification experiments in this phase similarly to Phases 1 and
2 using SAS STEPDISC and DISCRIM, SAS Logit, and C4.5. The C4.5 sensible tests
value, i.e., the number of examples used in a classification test, now varied from 10 to 30.
4.7.2 Results
Classification of the datasets, by all three methods, showed improving results
from dataset 1 to dataset 4. Table 4.17 is a summary of the results and it shows poor
classification results for dataset 1, 5-Star ratings absent three-year data.
Table 4.17: Summary of Phase 3 Mean Classification Error Rate for Four Datasets
Dataset   C4.5    LDA     Logit
1         53.4%   56.4%   53.8%
2         40.6%   44.3%   42.3%
3         42.0%   40.4%   42.0%
4         30.2%   33.2%   30.5%
Dataset 1 was not analyzed further due to poor classification. The sample classification
features for the four datasets are in Appendix D.
Table 4.18 is a comparison of the classification errors for the three methodologies.
We conducted an ANOVA test with null hypothesis of equal means for the classification
errors listed in this table. The null hypothesis was rejected with F = 9.47 for df1 = 2 and
df2 = 27 with p = 0.001. This statistic showed that the results of the three methodologies
were not equivalent. We performed a separate t-test for equal means for C4.5 vs. Logit
with p = 0.019 and this showed that C4.5 performed significantly better than Logit.
Figures 4.4, 4.5, and 4.6 graph the sample results of C4.5,
LDA, and Logit classification error rates, or the error percentage. From Figure 4.6 it was
observed that Logit performed well due to lower classification errors for 3-Star funds
than the other mutual fund ratings. Approximately one-third of all mutual funds are rated
3-Stars.
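The F statistic reported for Dataset 2 can be reproduced from the per-sample error counts in Table 4.18. The sketch below uses the textbook one-way ANOVA formula in plain Python; it is not the statistical package used for the original analysis, and the function name one_way_anova_f is hypothetical. The p-value is omitted because computing it would require an F-distribution routine.

```python
# One-way ANOVA F statistic from the error counts listed in Table 4.18.
# Sketch only: one_way_anova_f is a hypothetical helper, not the original tooling.
errors = {
    "C4.5":  [132, 130, 133, 136, 145, 139, 144, 120, 134, 140],
    "LDA":   [152, 148, 161, 144, 135, 150, 149, 143, 149, 145],
    "Logit": [147, 138, 140, 142, 140, 140, 150, 134, 139, 139],
}


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df1, df2 = one_way_anova_f(list(errors.values()))
print(f"F = {f_stat:.2f}, df1 = {df1}, df2 = {df2}")
```

With three methods and ten samples each, the degrees of freedom are 2 and 27, matching the values quoted in the text.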
Table 4.18: Comparison of Classification Errors for C4.5, LDA, and Logit for Dataset 2.
Sample   C4.5    LDA     Logit
A        132     152     147
B        130     148     138
C        133     161     140
D        136     144     142
E        145     135     140
F        139     150     140
G        144     149     150
H        120     143     134
I        134     149     139
J        140     145     139
Mean     135.3   147.6   140.9
Figure 4.4: C4.5 Classification Error Rate of Dataset 2.
Table 4.19 provides a breakdown of mean classification errors by Star rating for
the ten samples. This demonstrates how C4.5 was better able to classify 1-Star and
5-Star funds than LDA and Logit. This was of interest because the 1-Star funds have the
lowest Morningstar Risk-Adjusted Return and the 5-Star funds have the highest.
Figure 4.5: LDA Classification Error Rate for Dataset 2.
The results for classifying Dataset 3, the first with the modified 3-Star rating, are
displayed in Table 4.20. We performed an ANOVA test with null hypothesis of equal
means and it showed F = 1.33 for df1 = 2 and df2 = 27, with p = 0.28. Thus, we could not
reject the hypothesis that C4.5, LDA, and Logit performed the classification task equally
well, although LDA had the fewest errors.
Figures 4.7, 4.8, and 4.9 show the classification error rates of the individual
samples for C4.5, LDA, and Logit, respectively, by the 3-Star classification. In Figure
4.8 we see that LDA was particularly good at classifying 3-Star funds.
Table 4.19: Dataset 2 Summary of Mean Classification Error Rate by Star Rating.
        1 Star   2 Stars   3 Stars   4 Stars   5 Stars
C4.5    31.7%    43.5%     37.9%     46.5%     32.9%
LDA     39.0%    39.2%     47.9%     49.5%     31.0%
Logit   45.2%    47.6%     32.0%     46.4%     56.9%
Table 4.20: Comparison of Classification Errors for C4.5, LDA, and Logit for Dataset 3.
Sample   C4.5    LDA     Logit
A        131     132     146
B        146     141     142
C        140     144     143
D        153     152     140
E        135     129     145
F        134     136     130
G        139     132     143
H        128     127     134
I        137     121     136
J        143     132     138
Mean     138.6   134.6   139.7
Figure 4.7: C4.5 Classification Error Rate of Dataset 3.
Figure 4.8: LDA Classification Error Rate of Dataset 3.
Figure 4.9 shows that Logit classified 2-Star funds with fewer errors than 1- and
3-Star funds. Since each of the classes had approximately the same number of examples
we assume that merging Morningstar classes to form the new ratings provided sufficient
noise to cause Logit to poorly classify the 1- and 3-Star funds. Table 4.21 shows the
extent of the Logit classification difficulty since the 1- and 3-Star classification error
rates are the same as random chance.
Table 4.21 also shows that C4.5 and LDA performed much better than Logit at
classifying by individual star ratings although the previous ANOVA statistical test
showed all three classification methods performed equivalently.
Table 4.21: Dataset 3 Summary of Mean Classification Error Rate by Star Rating.
        1 Star   2 Stars   3 Stars
C4.5    44.3%    43.2%     38.6%
LDA     42.6%    42.7%     36.6%
Logit   52.9%    25.7%     50.0%
The analysis of Dataset 4 completes Phase 3. Table 4.22 shows the results of
classifying the 10 samples. We performed an ANOVA test with null hypothesis of equal
means and determined F = 5.35 for df1 = 2 and df2 = 27, or p = 0.011, and rejected the null
hypothesis. We assumed that LDA performed the worst. A separate t-test for equal means
for C4.5 and Logit had a p-value of 0.43 and so we failed to reject the null hypothesis of
equal means for these two procedures. We assumed that they classified the samples
equivalently although C4.5 had the fewest mean errors.
C4.5 had difficulty classifying 2-Star funds in Figure 4.10.
LDA was rather balanced among the three ratings in Figure 4.11. In Figure 4.12, we see
that Logit classified 3-Star funds best among the three ratings. Table 4.23 summarizes
the mean classification error rates by star rating and we see that C4.5 made fewer errors
classifying 1-Star and 3-Star mutual funds than did LDA and Logit. This was of interest,
once again, because 1-Star mutual funds have the lowest Risk-Adjusted Return and
3-Star funds have the highest. Thus, it would be of more practical benefit to identify the
extreme ratings, as C4.5 does, and so avoid underperforming funds and identify
high-performance funds, than it would be to identify the middle 2-Star funds, as LDA and
Logit do.
Table 4.22: Classification Errors for C4.5, LDA, and Logit for Dataset 4.
Sample   C4.5    LDA     Logit
A        97      113     108
B        94      117     91
C        101     116     104
D        102     122     98
E        96      103     102
F        92      104     105
G        112     120     121
H        97      108     97
I        101     93      92
J        104     109     98
Mean     99.6    110.5   101.6
Figure 4.10: C4.5 Classification Error Rate of Dataset 4.
Figure 4.11: LDA Classification Error Rate of Dataset 4.
Figure 4.12: Logit Classification Error Rate of Dataset 4.
Table 4.23: Dataset 4 Summary of Mean Classification Error Rate by Star Rating
        1-Star   2-Star   3-Star
C4.5    24.4%    40.0%    23.7%
LDA     32.2%    36.0%    31.1%
Logit   32.1%    30.7%    28.5%
4.7.3 Conclusions
Although not statistically significant, C4.5 had fewer errors than Logit in classifying three
out of four datasets. With respect to LDA, C4.5 had significantly fewer errors in two out
of three datasets we reviewed.
Datasets 1 and 2 showed that the Morningstar 5-Star rating system requires, at a
minimum, Alpha and R-Squared to provide a good level of accuracy in classification.
Recall that the classification results for Dataset 1 were slightly worse than random.
However, with the Alpha and R-Squared features in Dataset 2, C4.5 had the best mean
error rate of 40.6%. Therefore, we concluded that we could not classify mutual funds
with the 5-Star rating system unless three-year features are present in the dataset.
Dataset 3, with the 3-Star classification system and no three-year features,
suggested it would be possible to classify unrated mutual funds but the results were not
great. The three methods performed equivalently well with a mean classification error
rate of 41.3%. This is just below the 40% threshold Harries and Horn (1995) described as
being useful for domain experts.
Dataset 4, with the Alpha and R-Squared three-year features and 3-Star rating
system, had a mean classification error rate by the three procedures of 31.2%. We
concluded from these results that we should be able to correctly classify unrated mutual
funds with a reasonable degree of accuracy.
4.8 Phase 4 - Crossvalidation with C4.5
4.8.1 Methodology
In the previous experiments with C4.5, we used 30 or fewer features and
compared its classification performance to LDA and Logit. However, in Phase 4 we used
C4.5 and not LDA and Logit. This phase is a classification experiment using 50 features
available in the 1994 Morningstar Mutual Fund database that we have listed in Table
4.24. We tested whether the presence of many features would cause confusion for C4.5
in classifying the mutual funds.
The C4.5 system includes a subroutine for performing classification experiments
by the method of crossvalidation (Quinlan, 1993). Two programs and a shell script
permit dividing the set of examples into N subsets. N-1 subsets are used in developing
the classification tree and the remaining subset becomes the test set. The N subsets are
tested individually and the results are averaged to determine the classification error rate.
Weiss and Kulikowski (1991) describe this methodology as an elegant and
straightforward technique for estimating classifier error rates. They note that for large
samples, 10-fold crossvalidation seems to be adequate and accurate for estimating error
rates. Quinlan (1993) also suggests using 10-fold crossvalidation, noting that the average
error rate over the unseen test sets is a good predictor of the error rate of a model built
from all the data. Our experiment used the 10-fold crossvalidation method.
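The N-fold procedure described above can be sketched in a few lines. This is an illustration of the general technique only and is not the crossvalidation programs distributed with C4.5; the function names, the shuffling seed, and the way examples are dealt into folds are all assumptions made for this example.

```python
# Minimal sketch of N-fold crossvalidation index splitting.
# Not the C4.5 crossvalidation programs; names and the seed are assumptions.
import random


def crossvalidation_folds(n_examples, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into folds whose sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def mean_error_rate(fold_errors, fold_sizes):
    """Average the per-fold error rates to estimate the classifier error rate."""
    rates = [e / s for e, s in zip(fold_errors, fold_sizes)]
    return sum(rates) / len(rates)


splits = list(crossvalidation_folds(1052, n_folds=10))
print([len(test) for _, test in splits])
```

With 1,052 examples and ten folds, each test subset holds 105 or 106 examples and each training set holds the remaining 946 or 947.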
Table 4.24: Phase 4 Features.
Investment Objective                Morningstar Investment Style
3 Month Total Return                Utilities Sector
3 Month Percentage Rank Objective   Energy Sector
6 Month Total Return                Financials Sector
6 Month Percentage Rank Objective   Industrial Cyclicals Sector
1 Year Total Return                 Consumer Durables Sector
1 Year Percentage Rank Objective    Consumer Staples Sector
Annual Return 1993                  Services Sector
Annual Return 1992                  Retail Sector
Alpha (3 Year)                      Health Sector
Beta (3 Year)                       Technology Sector
R-Squared (3 Year)                  Cash %
Standard Deviation (3 Year)         Stocks %
SEC Yield (30 Day)                  Bonds %
Distributed Yield (12 Month)        Preferred %
Income Ratio                        Other %
Turnover                            Foreign Stock %
Potential Capital Gains Exposure    Price/Earnings Ratio
Deferred Charge                     Price/Book Ratio
12B-1 Fees                          Return on Assets
Manager Tenure                      Debt % of Total Capitalization
Minimum Initial Purchase            Median Market Capitalization
Fund Inception Date                 Assets
Shareholder Report Rating           Expense Ratio
Maximum Sales Charge
Seven datasets were created for the crossvalidation experiments and they are
listed in Table 4.25. Datasets 1 and 2 were to determine the classification performance
using all Table 4.24 features and the 3-Star and 5-Star rating systems. We created
Dataset 3 to serve as a benchmark for comparison to the other six datasets. Datasets 4, 5,
6, and 7 were to determine if classification performance was improved by including a
feature for one or both yearly Annual Returns while not including the three-year features
such as Alpha, Beta, and R-Squared.
Table 4.25: Phase 4 Dataset Description.
Dataset   Features                                            Rating System
1         All Table 4.24 Features                             5-Star
2         All Table 4.24 Features                             3-Star
3         Current Year Features                               3-Star
4         Current Year and Annual Return for 1992 and 1993    5-Star
5         Current Year and Annual Return for 1992 and 1993    3-Star
6         Current Year and Annual Return 1992                 5-Star
7         Current Year and Annual Return 1992                 3-Star
The measurement of interest was the mean number of classification errors for the
ten crossvalidation subsets. A total of 1,052 examples were used in this experiment with
the average training set having 947 examples and the average testing set having 105
examples. This was a much larger training set than the previous phases.
We calculated a lower bound on the required number of training examples using
fifty features. For ε = 0.30 and δ = 0.195 we required 953 training examples for the
3-Star datasets. With ε = 0.30 and δ = 0.231 we required 957 training examples for the
5-Star datasets.
4.8.2 Results
We present the results of this experiment in Table 4.26. Datasets 1 and 2 showed
that reducing the number of ratings from 5-Star to 3-Star resulted in a 25.7%
improvement in classification. Dataset 3 showed that classifying mutual funds without
three-year features such as Alpha, Beta, etc., was equivalent to Dataset 1. In other words,
the 5-Star rating systems require three-year features but 3-Star rating systems do not
require those features.
Table 4.26: Mean Classification Errors and Error Rates for Phase 4.
Dataset   Mean Classification Errors
1         43.5 (41.4%)
2         32.3 (30.7%)
3         43.6 (41.4%)
4         55.0 (52.3%)
5         39.6 (37.6%)
6         58.4 (55.5%)
7         42.3 (40.2%)
Dataset 4 classified funds very poorly and confirmed the requirement for three-
year features for 5-Star rating systems. We had added the Annual Return 1992 and
Annual Return 1993 features to this dataset and got random results. Dataset 6 was a
variation on Dataset 4, dropping the Annual Return 1993 feature, and our results were
similar. Datasets 5 and 7 show that classification performance can be improved by using
a 3-Star rating system compared to Datasets 4 and 6.
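The 25.7% improvement quoted above appears to be the relative reduction in mean classification errors between two datasets in Table 4.26. The sketch below works that arithmetic through under this assumption; the helper name relative_improvement is hypothetical.

```python
# Relative reduction in mean classification errors between two datasets.
# Inputs are the mean errors from Table 4.26; the helper name is hypothetical,
# and reading "improvement" as a relative reduction is an assumption.
def relative_improvement(baseline_errors, new_errors):
    """Percentage reduction in errors relative to the baseline."""
    return 100.0 * (baseline_errors - new_errors) / baseline_errors


dataset_1 = 43.5  # all features, 5-Star
dataset_2 = 32.3  # all features, 3-Star
dataset_3 = 43.6  # current year features, 3-Star
dataset_5 = 39.6  # current year plus Annual Return 1992 and 1993, 3-Star

print(round(relative_improvement(dataset_1, dataset_2), 1))
print(round(relative_improvement(dataset_3, dataset_5), 1))
```

The same calculation applied to Datasets 3 and 5 gives the effect of adding the two Annual Return features under the 3-Star system.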
4.8.3 Conclusions
The first conclusion we draw from these results is that three classes or 3-Star
ratings provide better correct classification than 5-Star ratings. This was expected.
Features such as Alpha, R-Squared, and Beta improved classification and are necessary to
have good classification with a 5-Star rating system.
A second conclusion was that classification error rates between the training and
testing set methodology, in Phases 1 through 3, and crossvalidation, have produced
comparable results. Specifically, the mean C4.5 error rate in Phase 3, Dataset 4 was
30.2%. A similar dataset in Phase 4, Dataset 2, had an error rate of 30.7%. Thus, from
this similar result it appeared that C4.5 did not overfit the decision tree while classifying
mutual funds in the presence of a large number of features in Phase 4.
Our final conclusion for Phase 4 was that Annual Return features in Dataset 5
provided a slight improvement in classification of mutual funds versus Dataset 3. We
used two of the three years of financial data (Annual Return 92 and 93) to classify the 3-
star ratings for mutual funds in 1994 with a 37.6% error rate. This was a 9.2%
improvement in classification over Dataset 3 in which these features were not present.
Thus, we should be able to classify unrated mutual funds more accurately by having one
or two years of the Morningstar-required three years of financial data.
4.9 Overall Summary
This chapter focused primarily on classifying mutual funds with C4.5, LDA, and
Logit and comparing their performance. It is important to recall that in Phase 1 of our
study, Logit outperformed C4.5 and LDA. In Phase 2, C4.5, Logit and LDA performed
equivalently for the Regular Dataset and C4.5 and Logit outperformed LDA for the
Derived Dataset. In Phase 3, C4.5 had fewer errors than Logit in three out of four
datasets; however, this was not statistically significant. C4.5 outperformed LDA in three
of the four Phase 3 datasets. In other words, as the number of training examples
increased from 370 to 523 to 666, the performance of C4.5 improved versus Logit and
LDA.
From the results of Phase 1, we concluded that a larger sample size could improve
the performance of C4.5. Phase 2 results for the Regular Dataset were similar to Phase 1.
The classification errors ranged around 33% - 35% for both phases which was in line
with our proposed error, or ε, value. However, when we included the derived feature for
Risk-Adjusted Return in the second part of Phase 2, classification error rates declined to
20%-22%.
Phase 3 of this chapter is quite important since it compared the performance of the
Morningstar 5-Star rating system to a proposed 3-Star rating system. C4.5 and Logit
were determined to be statistically equivalent in error results. Datasets 2 and 3 showed
that we could classify mutual funds without three-year features, if we used a 3-Star rating
system. This result suggested the ability to classify mutual funds not rated by
Morningstar with an error rate of approximately 42%.
The fourth, and final, phase of this chapter used crossvalidation as a classification
technique for C4.5. We favorably compared one dataset from Phase 3 with 30 features to
one from Phase 4 with 50 features. This indicated that the mutual fund features did not
confuse C4.5. Another result of this phase was that we could reduce classification errors
by approximately 10% by including features that represented one or two years of financial
data, out of the three years required for a Morningstar rating.
Examples of the decision trees generated for Phases 1 through 4 of this chapter
are found in Appendix E.
Now that we have shown that C4.5 classified mutual funds equivalently to Logit and
outperformed LDA, in Chapter 5 we use C4.5 to predict Morningstar Mutual Fund ratings
and ratings changes one year in the future.

CHAPTER 5
PREDICTION OF MUTUAL FUND RATINGS AND RATINGS CHANGES
In the previous chapter, we compared the performance of C4.5 to LDA and Logit in
classifying mutual funds. In this chapter, we discuss several experiments conducted in three
phases using only C4.5 to predict mutual fund ratings and rating changes one year in the
future. The performance of C4.5 in predicting fund ratings and ratings changes will be
compared to actual ratings for the period in question and an investment strategy that mutual
funds maintain their rating for a period of one year.
Phase 5 concerned predicting mutual fund ratings over two one-year periods using
a common feature vector. We used a set of twenty-eight features common to the 1993,
1994, and 1995 Morningstar Mutual Fund databases to predict mutual fund ratings for
1994 and 1995 using the 5-Star and 3-Star rating systems.
In Phase 6 we used features newly published by Morningstar in 1994, along with
existing features, to predict the ratings and ratings changes of matched mutual funds.
Matched mutual funds have the same name from one year to the next. Nine hundred thirty-
four matched fund examples from 1994 were used to predict the 1995 mutual fund ratings
and rating changes with the 5-Star and 3-Star rating systems. We used 1,059 matched fund
examples from 1995 to predict ratings and rating changes for 1996 with the two rating
systems. We tested the C4.5 predicted ratings to see if they had an independent distribution
from the actual ratings. We also tested the predicted 5-Star rating distribution for normality,
a characteristic of the Morningstar 5-Star rating system.
Table 5.1: Characteristics of the Average Mutual Fund by Year.
Characteristic                       1993      1994      1995      1996
Number of Mutual Funds               909       1,052     1,234     1,583
Average Rating                       3.0       3.08      3.03      3.01
Alpha                                -0.58     2.75      1.02      -0.90
Beta                                 0.91      0.84      0.87      0.90
R-Squared                            69        57.4      52        53
Std. Deviation 3 Years               4.7       12.5      NA        NA
YTD Total Return                     4.9%      -4.98%    13.9%     9.98%
1 Year Total Return                  12.3%     4.1%      17.5%     22.1%
3 Year Annualized Return             11.6%     10.9%     12.6%     14.4%
5 Year Annualized Return             12.6%     9.7%      10.6%     14.6%
10 Year Annualized Return            11.9%     12.7%     13.3%     11.7%
Yield                                1.2%      1.15%     1.06%     1.0%
Expense Ratio                        1.49      1.39      1.40      1.40
Price/Earnings Ratio                 21.9      22.4      20.3      24.1
Price/Book Ratio                     4.1       3.1       3.4       4.3
Net Assets $MM                       $454      $579.4    $631.3    $746.5
Median Market Capitalization $MM     $5,394    $4,600    $6,785    $9,043
Return on Assets                     6.98      7.6       NA        NA
Turnover %                           85.1      78        67%       69%
Cash %                               8.3       8.99      8.0       7.9
Stock %                              87.1      86.4      87.6      88.5
Bonds %                              2.0       1.8       2.8       2.5
Manager Tenure in Years              6.3       5.6       5.6       5.2
Minimum Initial Purchase             $15,940   $30,378   $37,576   $99,314
12B-1 Fees                           0.24%     0.23%     0.23%     0.24%
Deferred Charge                      0.44%     0.40%     0.45%     0.48%
[label missing]                      2.48%     2.30%     2.14%     1.92%
Industrial Cyclicals %               13.4%     18.0%     15.8%     17.3%
Consumer Products %                  9.7%      7.6%      7.0%      5.2%
Services %                           13.0%     12.4%     11.5%     13.0%
Utilities %                          10.3%     10.0%     8.7%      5.9%
Financial Services %                 14.6%     16.5%     16.1%     16.7%
Phase 7 is similar to Phase 6 except we predicted mutual fund ratings one year in
the future with unmatched funds, i.e., the training and testing sets were unbalanced.
Thus, we could predict ratings for more funds than Phase 6.
Table 5.1 lists characteristics of the average mutual fund in the Morningstar
databases used for these experiments. The data are dynamic on a year-to-year basis.
5.1 Phase 5 - Predicting Ratings with a Common Feature Vector Over Two Years
5.1.1 Methodology
We constructed six example datasets with a common feature vector using the
Morningstar Mutual Funds databases for 1993, 1994, and 1995. The goal was to determine
how well one feature vector could predict ratings over a multi-year period. Three of the
datasets used the Morningstar 5-Star rating system and three used the 3-Star rating system
described in Chapter 4. Table 5.2 lists the 28 features used in the experiment.
Table 5.2: Phase 5 Features.
Total Return YTD               Consumer Staples Sector
Total Return 12 Months         Retail Sector
Alpha                          Services Sector
Beta                           Utilities Sector
R-Squared                      Financials Sector
Standard Deviation-3 Year      Minimum Initial Purchase
Net Assets                     12b-1 Fees
Expense Ratio                  Deferred Charges
Turnover                       %Cash
Manager Tenure                 %Stocks
Price/Earnings Ratio           %Bonds
Price/Book Ratio               Industrial Cyclicals Sector
Median Market Capitalization   Consumer Durables Sector
Distributed Yield
The 1993 dataset had 909 examples, the 1994 dataset had 1,052 examples, and the
1995 dataset had 1,234 examples. We computed the lower bound for the size of the 1993
training set to be 911 with ε = 0.25 and δ = 0.075. The lower bound for the size of the
1994 training set was 1,052 with ε = 0.25 and δ = 0.035.
C4.5 constructed decision trees using the 1993 dataset as the training set and the
1994 dataset as the testing set. We then repeated this methodology with the 1994 data as
the training set and the 1995 data as the testing set. The measurement of interest was the
number of correctly predicted mutual fund ratings.
5.1.2 Results
The results for datasets using the 5-Star rating system were very poor. The 1993
training examples correctly predicted 40% of the 1994 mutual fund ratings. The 1994
training examples correctly predicted 50% of the 1995 ratings. We did no further
examination with the results of the 5-Star rating system.
Table 5.3 shows results were much better using the 3-Star
rating system with datasets having three-year features such as Alpha, Beta, R-Squared,
and Standard Deviation. To estimate how well C4.5 correctly predicted the fund ratings,
we developed a dataset of matched mutual funds for 1993 and 1994 (772 mutual funds),
and 1994 and 1995 (930 mutual funds). In Table 5.4, we determined that 58.7% of these
772 matched funds for 1993-94 had unchanged ratings over the one-year period. The
matched funds are representative of 73% of the funds in the 1994 testing set of 1,052
examples. From this evidence, we assumed that these results are what would be expected
from the entire testing set. In other words, an investment strategy of mutual fund ratings
being maintained for one year would result in 618 fund ratings being correctly predicted.
C4.5 correctly predicted 641 fund ratings. This is 23 better than the investment strategy
or a 3.7% improvement.
Table 5.3: Number and Percent Correctly Predicted Mutual Funds, 3-Star System.
Prediction Year   1 Star        2 Star        3 Star        Weighted Prediction
1994              168 (54.9%)   193 (49.2%)   280 (79.1%)   60.9%
1995              295 (76.4%)   248 (57.7%)   210 (50.2%)   61.0%
Table 5.4: Distribution of Matched Mutual Funds with Unchanged Ratings After One Year.
Period         1 Star        2 Star        3 Star        Weighted Actual %
1993 to 1994   193 (64.8%)   151 (55.7%)   109 (53.7%)   58.7%
1994 to 1995   188 (74.3%)   207 (58.8%)   239 (73.5%)   68.2%
We performed a similar determination for the 1994-1995 period using the data in
Tables 5.3 and 5.4. There were 930 matched funds and 68.2% of these funds had
unchanged ratings from 1994 to 1995. The 930 matched funds are 75% of the 1,234 funds
in the 1995 testing set. However, C4.5 correctly predicted ratings for 753 of the mutual
funds or 61.0%. We expected that the investment strategy would correctly predict 842
mutual funds or 68.2%. Thus, C4.5 performed 10.6% worse than the investment strategy.
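The comparison between C4.5 and the unchanged-ratings investment strategy can be worked through as follows. This is a sketch of the arithmetic only, using the figures reported in Tables 5.3 and 5.4; the function and variable names are hypothetical.

```python
# Compare C4.5 predictions with the "ratings stay unchanged" investment strategy.
# Figures come from Tables 5.3 and 5.4; all names here are hypothetical.
def compare_to_strategy(test_set_size, unchanged_rate, c45_correct):
    """Return (strategy hits, C4.5 minus strategy, percent difference vs. strategy)."""
    # Extrapolate the matched-fund unchanged rate to the whole testing set.
    strategy_correct = round(test_set_size * unchanged_rate)
    difference = c45_correct - strategy_correct
    percent = 100.0 * difference / strategy_correct
    return strategy_correct, difference, percent


cases = [
    (1994, 1052, 0.587, 641),  # 1993 training set, 1994 testing set
    (1995, 1234, 0.682, 753),  # 1994 training set, 1995 testing set
]
for year, size, rate, correct in cases:
    strategy, diff, pct = compare_to_strategy(size, rate, correct)
    print(year, strategy, diff, f"{pct:+.1f}%")
```

A positive percentage means C4.5 predicted more ratings correctly than the strategy would be expected to; a negative percentage means it predicted fewer.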
Figures 5.1 and 5.2 display the best performing decision trees for predicting the
mutual fund ratings. The root of the tree in Figure 5.1 is the feature Alpha. To
understand the decision rules of the tree you read downward. For example, one
prediction rule would be:

IF Alpha > -3.04 AND Industrial Cyclicals > 30.9 THEN Fund is 1-Star
A more complex rule example shows how C4.5 selected Alpha twice as the best feature
in the decision tree construction process:
IF Alpha > -3.04 AND Industrial Cyclicals <= 30.9 AND Alpha <= -1.19 AND
R-Squared > 69 THEN Fund is 2-Star
Alpha <= -3.04 : 1-Star
Alpha > -3.04 :
| Industrial Cyclicals > 30.9 : 1-Star
| Industrial Cyclicals <= 30.9 :
| | Alpha <= -1.19 :
| | | R-Squared <= 69 : 1-Star
| | | R-Squared > 69 : 2-Star
| | Alpha > -1.19 :
| | | Alpha > 3.86 : 3-Star
| | | Alpha <= 3.86 :
| | | | R-Squared <= 52 : 2-Star
| | | | R-Squared > 52 :
| | | | | Beta <= 0.68 : 3-Star
| | | | | Beta > 0.68 :
| | | | | | Expense Ratio <= 0.72 : 3-Star
| | | | | | Expense Ratio > 0.72 :
| | | | | | | Alpha <= 0.94 :
| | | | | | | | Standard Deviation > 4.27 : 2-Star
| | | | | | | | Standard Deviation <= 4.27 :
| | | | | | | | | Net Assets <= 159.1 : 2-Star
| | | | | | | | | Net Assets > 159.1 : 3-Star
| | | | | | | Alpha > 0.94 :
| | | | | | | | R-Squared <= 77 : 2-Star
| | | | | | | | R-Squared > 77 :
| | | | | | | | | Standard Deviation <= 3.78 : 3-Star
| | | | | | | | | Standard Deviation > 3.78 :
| | | | | | | | | | Front Load <= 4 : 3-Star
| | | | | | | | | | Front Load > 4 : 2-Star
Figure 5.1: Decision Tree for Predicting 1994 Mutual Fund Ratings.
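Reading the tree downward is equivalent to evaluating a chain of nested conditions. The sketch below transcribes Figure 5.1 in that way, based on our reading of the printed indentation; the function name and the dictionary keys are hypothetical labels for this example and are not C4.5 output.

```python
# Figure 5.1 transcribed as nested conditionals (a sketch, not C4.5 output).
# The dictionary keys are hypothetical field names; thresholds are from the figure.
def predict_1994_rating(f):
    """Return the predicted 3-Star rating (1, 2, or 3) for a feature dict f."""
    if f["alpha"] <= -3.04:
        return 1
    if f["industrial_cyclicals"] > 30.9:
        return 1
    if f["alpha"] <= -1.19:
        return 1 if f["r_squared"] <= 69 else 2
    if f["alpha"] > 3.86:
        return 3
    if f["r_squared"] <= 52:
        return 2
    if f["beta"] <= 0.68:
        return 3
    if f["expense_ratio"] <= 0.72:
        return 3
    if f["alpha"] <= 0.94:
        if f["std_dev"] > 4.27:
            return 2
        return 2 if f["net_assets"] <= 159.1 else 3
    if f["r_squared"] <= 77:
        return 2
    if f["std_dev"] <= 3.78:
        return 3
    return 3 if f["front_load"] <= 4 else 2


# A made-up fund that follows the more complex rule quoted in the text.
fund = {"alpha": -2.0, "industrial_cyclicals": 10.0, "r_squared": 75,
        "beta": 1.0, "expense_ratio": 1.0, "std_dev": 4.0,
        "net_assets": 100.0, "front_load": 0.0}
print(predict_1994_rating(fund))
```

Each early return corresponds to one leaf of the tree, so a feature such as Alpha can be tested several times along a single path.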
5.1.3 Conclusions
We concluded that we could not predict mutual fund ratings over the 1993 to
1995 period using a common feature vector. If C4.5 is to predict mutual fund ratings
better than an investment strategy of maintaining the same ratings over a one-year period,
we will have to identify new features to include in the feature vector.
Alpha <= -0.77 :
| Net Assets <= 12.1 : 1-Star
| Net Assets > 12.1 :
| | Distributed Yield <= 0 : 1-Star
| | Distributed Yield > 0 :
| | | Alpha <= -3.98 : 1-Star
| | | Alpha > -3.98 :
| | | | Retail > 6.1 : 2-Star
| | | | Retail <= 6.1 :
| | | | | Expense Ratio <= 1.07 : 2-Star
| | | | | Expense Ratio > 1.07 : 1-Star
Alpha > -0.77 :
| P/E Ratio > 35.74 : 1-Star
| P/E Ratio <= 35.74 :
| | Alpha > 7.05 : 3-Star
| | Alpha <= 7.05 :
| | | Net Assets > 1055.7 : 3-Star
| | | Net Assets <= 1055.7 :
| | | | R-Squared <= 60 :
| | | | | Alpha <= 3.54 :
| | | | | | Industrial Cyclicals <= 12.2 : 2-Star
| | | | | | Industrial Cyclicals > 12.2 : 1-Star
| | | | | Alpha > 3.54 :
| | | | | | R-Squared <= 24 : 2-Star
| | | | | | R-Squared > 24 :
| | | | | | | Consumer Durables <= 4.5 : 3-Star
| | | | | | | Consumer Durables > 4.5 : 2-Star
| | | | R-Squared > 60 :
| | | | | Alpha <= 2.96 : 2-Star
| | | | | Alpha > 2.96 : 3-Star
Figure 5.2: Decision Tree for Predicting 1995 Mutual Fund Ratings.
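Read as code, a C4.5 tree such as the one in Figure 5.2 is a chain of nested threshold tests. The sketch below transcribes only the Alpha > -0.77 branch as we read it from the figure. The function name, the dictionary keys, and the decision to raise an error for the other branch are illustrative assumptions and are not part of the original study.

```python
def predict_1995_rating(fund):
    """Follow the Alpha > -0.77 branch of Figure 5.2 for one fund.

    `fund` is a dict of feature values; the key names are hypothetical.
    """
    if fund["alpha"] <= -0.77:
        # The other half of the tree is not transcribed in this sketch.
        raise ValueError("Alpha <= -0.77 branch is not covered here")
    if fund["pe_ratio"] > 35.74:
        return "1-Star"
    if fund["alpha"] > 7.05:
        return "3-Star"
    if fund["net_assets"] > 1055.7:
        return "3-Star"
    if fund["r_squared"] > 60:
        return "2-Star" if fund["alpha"] <= 2.96 else "3-Star"
    # R-Squared <= 60 from here on.
    if fund["alpha"] <= 3.54:
        return "2-Star" if fund["industrial_cyclicals"] <= 12.2 else "1-Star"
    if fund["r_squared"] <= 24:
        return "2-Star"
    return "3-Star" if fund["consumer_durables"] <= 4.5 else "2-Star"


# Made-up feature values, used only to exercise the function.
example = {"alpha": 1.5, "pe_ratio": 22.0, "net_assets": 300.0,
           "r_squared": 85, "industrial_cyclicals": 10.0,
           "consumer_durables": 3.0}
print(predict_1995_rating(example))  # 2-Star
```

Each `if` corresponds to one test line in the figure, so the depth of the printed tree maps directly onto the order in which the conditions are checked.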
5.2 Phase 6: Predicting Matched Mutual Fund Rating Changes
5.2.1 Methodology
The experiments in this phase add to the feature vector several new features first
published by Morningstar in 1994. Among the new features are Morningstar Return for
Year 3, Year 5, and Year 10; Morningstar Average Rating; and Months Rated as of the date
of publication of the database. We also included the previously published Morningstar Risk
for Year 3, Year 5 and Year 10. We constructed four example datasets using 60 features
from the 1994, 1995 and 1996 Morningstar databases; the features are listed in Table 5.5.
Table 5.5: Sixth Phase Features
Morningstar Average Rating
Months Rated
Morningstar Risk 3 Year
Morningstar Risk 5 Year
Morningstar Risk 10 Year
Morningstar Return 3 Year
Morningstar Return 5 Year
Morningstar Return 10 Year
YTD Total Return
Median Market Capitalization
YTD % Rank Objective
Investment Style-Equity
3 Month Total Return
% Cash
3 Month % Rank Objective
% Stocks
12 Month Total Return
% Bonds
12 Month % Rank Objective
% Other
3 Year Annualized Return
% Foreign Stock
3 Year % Rank Objective
Industrial Cyclicals Sector
5 Year Annualized Return
Consumer Staples Sector
5 Year % Rank Objective
Consumer Durables Sector
10 Year Annualized Return
Retail Sector
10 Year % Rank Objective
Services Sector
15 Year Annualized Return
Utilities Sector
15 Year % Rank Objective
Financials Sector
1994 Annual Return
Energy Sector
1993 Annual Return
Health Sector
1992 Annual Return
Technology Sector
Alpha
Price/Earnings Ratio
Beta
Price/Book Ratio
Standard Deviation 3 Year
12b-1 Fees
Standard Deviation 5 Year
Deferred Charges
Standard Deviation 10 Year
SEC Yield
Management Tenure
Turnover
Minimum Initial Purchase
Potential Capital Gains Exposure
Expense Ratio
5 Year Earnings Gain
Net Assets
These experiments used matched mutual funds for training and testing the
decision tree. We manually matched the 1994 training dataset by mutual fund name to
the 1995 testing dataset (934 examples each). We also matched the 1995 training dataset
by fund name to the 1996 testing dataset (1,059 examples each).
We calculated a lower bound on the number of training examples required for the
experiments in this phase. These are listed in Table 5.6.
Table 5.6: Error and Confidence Parameters for Lower Bound on Training Examples.
Year and Rating System     ε        δ         Training Examples
1994-95 5-Star             0.35     0.1925    895
1994-95 3-Star             0.35     0.164     937
1995-96 5-Star             0.35     0.1855    1,059
1995-96 3-Star             0.35     0.155     1,064
There were several measurements of interest for these experiments:
1) The number of correct predictions for mutual fund ratings compared to
the actual ratings of the preceding year and current year,
2) χ² Goodness-of-Fit tests to determine if the C4.5 predicted ratings
distribution is statistically similar to the preceding year actual ratings
distribution and/or the current year actual ratings distribution,
3) A χ² Goodness-of-Fit test to determine if the 5-Star C4.5 predicted
ratings distribution is statistically similar to the standard Morningstar
distribution of 10%-22.5%-35%-22.5%-10%, and
4) Tests of normality (Kolmogorov-Smirnov test) for the 5-Star ratings
distributions.

5.2.2 Results for 1994 Data Predicting 1995 Ratings
5.2.2.1 The 5-Star rating system
The C4.5 program, using the 1994 training set, output the Confusion Matrix in Table
5.7, which shows the 1995 5-Star ratings that C4.5 predicted for the matched funds versus
the actual 1995 ratings. Of the 78 mutual funds rated 1-Star in 1995 (the row total), C4.5
correctly predicted 63 ratings. For the 2-Star funds, C4.5 correctly predicted 99 ratings.
Overall, there were 602 correct predictions out of 934 mutual funds, or 64.5% correct.
Table 5.7: Five-Star Rating System Confusion Matrix, 1995 Matched Funds Actual Ratings
versus C4.5 Predicted Mutual Fund Ratings.
Predicted As =>      1-Star   2-Star   3-Star   4-Star   5-Star
Actual 95 1-Star       63       13        2        0        0
Actual 95 2-Star       17       99       65       13        0
Actual 95 3-Star        1       24      228       78        2
Actual 95 4-Star        0        2       72      153       16
Actual 95 5-Star        0        0        8       19       59
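The overall accuracy quoted above is the sum of the diagonal of Table 5.7 divided by the number of funds. A minimal sketch of that arithmetic follows; the variable and function names are ours and are only illustrative.

```python
# Rows are actual 1995 ratings, columns are C4.5 predictions (Table 5.7).
confusion = [
    [63, 13, 2, 0, 0],
    [17, 99, 65, 13, 0],
    [1, 24, 228, 78, 2],
    [0, 2, 72, 153, 16],
    [0, 0, 8, 19, 59],
]


def accuracy(matrix):
    """Return (correct, total, fraction correct) for a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct, total, correct / total


correct, total, fraction = accuracy(confusion)
actual_totals = [sum(row) for row in confusion]  # totals "by going across"
print(correct, total, f"{fraction:.1%}")  # 602 934 64.5%
print(actual_totals)                      # [78, 194, 333, 243, 86]
```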
Table 5.8 lists the predicted and actual ratings changes from 1994 to 1995. To read
this table, start at the upper left-hand side where it says 1994 Rating. C4.5 predicted that the
funds rated 1-Star in 1994 would, in 1995, be rated 48 1-Star, 16 2-Star, and 4 3-Star. These
funds were actually rated in 1995 by Morningstar as 55 1-Star, 12 2-Star, and 1 3-Star.
Using Table 5.8 we can determine how well C4.5 predicted the one-year rating
changes for the funds. This is quite different from Table 5.7, which only looks at the
classification of the 1995 predicted and 1995 actual ratings data, without regard to the 1994
rating. To measure the accuracy of the predicted rating changes vs. the actual rating
changes, we summed the absolute value of the differences between predicted and actual
ratings for 1995. There were 148 rating change differences between the predicted and actual
distributions. Thus, C4.5 predicted 84.2% (786 of 934) of the ratings changes from 1994 to 1995.
Table 5.8: Five-Star Rating System, 1994 Rating vs. 1995 Actual Rating Changes and
C4.5 Predicted Rating Changes for Matched Funds.
                              1995 Ratings
1994 Rating                   1-Star   2-Star   3-Star   4-Star   5-Star
1-Star   Predicted              48       16        4        0        0
1-Star   Actual                 55       12        1        0        0
2-Star   Predicted              20       75       75       15        0
2-Star   Actual                 19      102       58        6        0
3-Star   Predicted               5       44      212       87        5
3-Star   Actual                  3       61      204       82        3
4-Star   Predicted               3        5       72      134       31
4-Star   Actual                  1       14       63      133       34
5-Star   Predicted               0        2       11       29       41
5-Star   Actual                  0        5        7       22       49
Ratings Distribution
Predicted 1995                  76      142      374      265       77
Actual 1995                     78      194      333      243       86
Actual 1994                     68      184      355      244       83
Table 5.8 was also used to determine how many funds maintained their rating over
the one-year period according to the following formula:
Funds Maintaining Rating = 1994 1-Star Funds Rated 1995 Actual 1-Star + 1994 2-Star
Funds Rated 1995 Actual 2-Star + 1994 3-Star Funds Rated
1995 Actual 3-Star + 1994 4-Star Funds Rated 1995 Actual
4-Star + 1994 5-Star Funds Rated 1995 Actual 5-Star
We determined that 543 funds, or 58.1%, maintained their fund rating for the one-year
period. Doing a similar addition for the predicted ratings, we counted that C4.5 predicted
503 funds, or 53.9%, would maintain their rating. Thus, we could say that C4.5 correctly
predicted 40 fewer funds than the investment strategy of maintaining ratings for one year.
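The two counts taken from Table 5.8 (the sum of absolute differences between predicted and actual cells, and the diagonal sum of the actual cells) can be sketched as follows. The variable names are ours, and the 4-Star cell of the 3-Star actual row is illegible in our copy; the value 82 used here is an assumption, chosen because it is the only value consistent with the 4-Star actual total of 243.

```python
# Table 5.8: rows are 1994 ratings (1-Star..5-Star), columns are 1995 ratings.
predicted = [
    [48, 16, 4, 0, 0],
    [20, 75, 75, 15, 0],
    [5, 44, 212, 87, 5],
    [3, 5, 72, 134, 31],
    [0, 2, 11, 29, 41],
]
actual = [
    [55, 12, 1, 0, 0],
    [19, 102, 58, 6, 0],
    [3, 61, 204, 82, 3],  # 82 is an assumed reading of a damaged cell
    [1, 14, 63, 133, 34],
    [0, 5, 7, 22, 49],
]

total = sum(map(sum, actual))

# Sum of |predicted - actual| over every cell of the table.
differences = sum(
    abs(p - a)
    for p_row, a_row in zip(predicted, actual)
    for p, a in zip(p_row, a_row)
)

# Funds that kept their 1994 rating in 1995 sit on the diagonal.
maintained = sum(actual[i][i] for i in range(len(actual)))

print(total, differences, f"{1 - differences / total:.1%}")  # 934 148 84.2%
print(maintained, f"{maintained / total:.1%}")               # 543 58.1%
```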
Figure 5.3 is a graph of the data in Table 5.8. It shows how closely the actual 1995
and predicted 1995 rating changes match.
Actual and C4.5 Predicted Ratings Changes
for 1994 to 1995
Figure 5.3: Distribution of Five-Star Actual Ratings Changes and C4.5 Predicted Ratings
Changes, 1994 to 1995
We performed three χ² Goodness-of-Fit tests on the ratings distributions in Table
5.8. We tested the 1994 actual ratings to 1995 actual ratings and determined that, with p =
0.53 for d.f. = 4, the two distributions were equivalent. We tested the 1994 actual ratings to
1995 predicted ratings and determined that, with p = 0.007 for d.f. = 4, the two distributions
were not equivalent. Finally, we tested the 1995 actual ratings to the 1995 predicted ratings
and determined that, with p = 0.0002 for d.f. = 4, the two distributions were not equivalent.
We performed a χ² Goodness-of-Fit test on the 1995 predicted ratings versus the
standard Morningstar rating distribution for 934 examples. If the predicted distribution fit
the standard Morningstar distribution, we could assume that any random distribution similar
to the Morningstar distribution should be able to predict ratings just as well. The χ² result for
the Morningstar vs. the C4.5 predicted distribution was p < 0.001 for d.f. = 4 and we concluded
that the distributions were not equivalent.
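As an illustration of this test, the sketch below converts the standard 10%-22.5%-35%-22.5%-10% shares into expected counts for 934 funds and compares them with the Predicted 1995 row of Table 5.8. The text does not say which software produced the reported p-values; the closed-form tail probability used here is the standard identity for a chi-square variable with 4 degrees of freedom, and the function names are ours.

```python
import math


def chi_square_gof(observed, expected):
    """Pearson chi-square statistic for two matching lists of counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_sf_df4(x):
    """Upper-tail probability of a chi-square variable with 4 degrees of freedom."""
    return math.exp(-x / 2) * (1 + x / 2)


predicted_1995 = [76, 142, 374, 265, 77]           # Table 5.8, Predicted 1995
standard_share = [0.10, 0.225, 0.35, 0.225, 0.10]  # 1-Star .. 5-Star
n = sum(predicted_1995)
expected = [share * n for share in standard_share]

statistic = chi_square_gof(predicted_1995, expected)
p_value = chi_square_sf_df4(statistic)
print(round(statistic, 1), p_value < 0.001)  # 49.3 True
```

A statistic this far above the d.f. = 4 critical values agrees with the reported p < 0.001.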
We tested the three distributions (actual 1994, actual 1995, and predicted 1995) for
normality using the standard Morningstar 10%-22.5%-35%-22.5%-10% distribution as the
reference distribution. All were determined to be normally distributed with p > 0.15 for the
Kolmogorov-Smirnov test.
Year 3 Morningstar Risk <= 1.6 :
| Year 3 Rank by Objective <= 30 :
| | Year 3 Annualized Return <= 16.86 :
| | | Year 5 Annualized Return > 11.32 : 4-Star
| | | Year 5 Annualized Return <= 11.32 :
| | | | Morningstar Average Rating <= 3.5 : 3-Star
| | | | Morningstar Average Rating > 3.5 : 4-Star
| | Year 3 Annualized Return > 16.86 :
| | | Year 5 Rank by Objective <= 17 : 5-Star
| | | Year 5 Rank by Objective > 17 : 4-Star
| Year 3 Rank by Objective > 30 :
| | Morningstar Average Rating <= 2.9 :
| | | Year 3 Morningstar Return <= 0.48 :
| | | | Morningstar Average Rating <= 1.9 : 1-Star
| | | | Morningstar Average Rating > 1.9 : 2-Star
| | | Year 3 Morningstar Return > 0.48 :
| | | | Year 5 Morningstar Return <= 0.51 : 2-Star
| | | | Year 5 Morningstar Return > 0.51 : 3-Star
| | Morningstar Average Rating > 2.9 :
| | | Morningstar Average Rating <= 3.8 :
| | | | Year 5 Morningstar Return <= 0.81 :
| | | | | Year 3 Morningstar Risk <= 0.82 : 3-Star
| | | | | Year 3 Morningstar Risk > 0.82 :
| | | | | | Alpha <= 0.28 : 2-Star
| | | | | | Alpha > 0.28 : 3-Star
| | | | Year 5 Morningstar Return > 0.81 :
| | | | | Bonds% <= 0 : 3-Star
| | | | | Bonds% > 0 : 4-Star
| | | Morningstar Average Rating > 3.8 :
| | | | YTD Rank by Objective <= 72 : 4-Star
| | | | YTD Rank by Objective > 72 : [illegible]-Star
Year 3 Morningstar Risk > 1.6 :
| Year 5 Morningstar Return <= 0.46 : 1-Star
| Year 5 Morningstar Return > 0.46 : 2-Star
Figure 5.4: Decision Tree for 1994-1995 5-Star Matched Funds Prediction.

Figure 5.4 is the decision tree with the best predictive power and it has 35 nodes.
By trial-and-error variation we determined that any test used in the decision tree had to
have at least two outcomes with 21 or more examples.
Table 5.9, on the following page, shows the eleven features selected by C4.5 for
building the tree. Selection Frequency counts each time a feature is used at a node (the tree
lists the feature twice for a node to show the values). Leaf Node Frequency counts each
node with a rating. The three-year and five-year features were dominant. We expected this
given the Morningstar rating formula discussed in Chapter 3. The decision tree, similar to
the Morningstar formula, alternated between risk and return features or surrogates like Rank
by Objective. Leaf node frequency counts showed that the five-year features were lower in
the tree than the three-year features.
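The two counts reported in Table 5.9 can be computed by walking the tree. The sketch below does so for one fragment of Figure 5.4 (the subtree under Year 3 Rank by Objective <= 30, as we read it from the figure). The tuple representation and the function name are assumptions made for illustration; C4.5 itself does not expose its tree in this form.

```python
from collections import Counter

# A test node is (feature, threshold, low_branch, high_branch), where
# low_branch is followed when value <= threshold. A leaf is a rating string.
fragment = (
    "Year 3 Annualized Return", 16.86,
    ("Year 5 Annualized Return", 11.32,
        ("Morningstar Average Rating", 3.5, "3-Star", "4-Star"),
        "4-Star"),
    ("Year 5 Rank by Objective", 17, "5-Star", "4-Star"),
)


def count_features(node, selection=None, leaf=None):
    """Count tests per feature and the leaves hanging directly off each test."""
    selection = Counter() if selection is None else selection
    leaf = Counter() if leaf is None else leaf
    if isinstance(node, str):  # a bare leaf has nothing to count
        return selection, leaf
    feature, _threshold, low, high = node
    selection[feature] += 1
    for child in (low, high):
        if isinstance(child, str):
            leaf[feature] += 1
        else:
            count_features(child, selection, leaf)
    return selection, leaf


selection, leaf = count_features(fragment)
for feature in selection:
    print(feature, selection[feature], leaf[feature])
```

For the features that appear only in this fragment (Year 3 Annualized Return, Year 5 Annualized Return, and Year 5 Rank by Objective), the counts agree with the corresponding rows of Table 5.9.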
Table 5.9: Features Selected for Predicting 1995 Matched Fund 5-Star Ratings.
Feature                        Selection Frequency   Leaf Node Frequency
Alpha                                   1                     2
Bonds%                                  1                     2
Morningstar Average Rating              4                     4
Year 3 Annualized Return                1                     0
Year 3 Morningstar Return               1                     0
Year 3 Morningstar Risk                 2                     1
Year 3 Rank by Objective                1                     0
Year 5 Annualized Return                1                     1
Year 5 Morningstar Return               3                     4
Year 5 Rank by Objective                1                     2
YTD Rank by Objective                   1                     2
5.2.2.2 The 3-Star rating system
The Confusion Matrix results for the second experiment, predicting the ratings
from 1994 to 1995 for matched mutual funds classified by the 3-Star rating system, are
shown in Table 5.10. C4.5 made 703 correct
predictions out of 934 mutual funds, or 75.3% correct. We determined from a manual
count that 635 funds, or 68.0%, maintained their rating over the one-year period. Thus,
C4.5 outperformed our benchmark investment strategy of funds maintaining their rating
for one year. We noted the improvement over the 5-Star rating system in the previous
experiment; however, we decided not to determine the individual rating changes since
they would be a small improvement from the sum of the individual ratings in Table 5.8.
Table 5.10: Three-Star Rating System Confusion Matrix, 1995 Matched Funds Actual
Ratings versus C4.5 Predicted Mutual Fund Ratings.
Predicted As =>      1-Star   2-Star   3-Star
Actual 95 1-Star      202       65        5
Actual 95 2-Star       31      239       63
Actual 95 3-Star        2       65      262
The C4.5 predicted ratings distribution is determined by summing the vertical
values in Table 5.10. C4.5 predicted 235 1-Star funds, 369 2-Star funds, and 330 3-Star
funds. We performed a χ² test of Goodness-of-Fit to compare the actual 1995 ratings
distribution, summing Table 5.10 horizontally, to the C4.5 predicted ratings distribution.
We calculated χ² = 8.93 for d.f. = 2, or p = 0.0115. Thus, the distribution of the C4.5
predicted ratings is not equivalent to the 1995 actual ratings distribution. We tested this
predicted ratings distribution for normality with the Kolmogorov-Smirnov test and the
distribution was normal with p > 0.15.
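The χ² value above can be reproduced from Table 5.10 by treating the actual 1995 row totals as the expected counts and the C4.5 column totals as the observed counts. The sketch below is one way to carry out that arithmetic; the variable names are ours, and the p-value uses the exact tail formula for 2 degrees of freedom rather than whatever software was originally used.

```python
import math

# Table 5.10: rows are actual 1995 ratings, columns are C4.5 predictions.
confusion = [
    [202, 65, 5],
    [31, 239, 63],
    [2, 65, 262],
]

actual = [sum(row) for row in confusion]           # horizontal sums
predicted = [sum(col) for col in zip(*confusion)]  # vertical sums

# Pearson statistic with the actual counts as the expected distribution.
chi2 = sum((p - a) ** 2 / a for p, a in zip(predicted, actual))

# With 2 degrees of freedom the upper-tail probability is exp(-chi2 / 2).
p_value = math.exp(-chi2 / 2)

print(predicted)                          # [235, 369, 330]
print(round(chi2, 2), round(p_value, 4))  # 8.93 0.0115
```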
Figure 5.5 is the decision tree with the best predictive power and has 75 nodes.
By trial-and-error we determined that any test used in the tree had to have at least two
outcomes with 5 or more examples, resulting in a larger tree compared to Figure 5.4.

Table 5.11 lists the features selected for building this decision tree and their
frequency of selection. Twenty-two of sixty features were selected for the decision tree.
Again, we see the dominance of the three-year and five-year features. However, we now
have twice as many features compared to the previous list in Table 5.9. Also, the three-year
features are located at the leaf nodes whereas in the previous 5-Star experiment they
were distinctly absent from this location in the tree. This is a very different decision tree
from Figure 5.4 in more than just the number of nodes.
Table 5.11: Features Selected for Predicting 1995 Matched Fund 3-Star Ratings.
Feature                        Selection Frequency   Leaf Node Frequency
12B-1 Fees                              1                     1
Alpha                                   2                     3
Annual Return 1992                      1                     0
Bonds%                                  1                     1
Foreign Stock%                          1                     0
Front Load                              1                     2
Month 3 Total Return                    1                     0
Months Rated                            2                     4
Morningstar Average Rating              5                     3
Stocks%                                 2                     3
Utilities Sector                        1                     2
YTD Rank by Objective                   1                     0
Year 3 Annualized Return                1                     0
Year 3 Morningstar Return               3                     3
Year 3 Morningstar Risk                 2                     2
Year 3 Rank by Objective                1                     0
Year 3 Standard Deviation               1                     2
Year 5 Annualized Return                2                     2
Year 5 Earnings Gain                    1                     2
Year 5 Morningstar Return               3                     3
Year 5 Rank by Objective                2                     4
Year 10 Rank by Objective               1                     1

Morningstar Average Rating <= 2.9 :
Year 3 Morningstar Return <= 0.97 :
Year 3 Rank by Objective <= 62 :
| Year 5 Morningstar Return > 0.48
| Year 5 Morningstar Return <=0.48 :
| | Months Rated <= 22 : 2-star
| | Months Rated > 22 : 1-star
Year 3 Rank by Objective > 62 :
| Morningstar Average Rating <= 2.5 :
| Morningstar Average Rating > 2.5 :
| | Year 3 Morningstar Return <= 0.52
| | Year 3 Morningstar Return > 0.52
Year 3 Morningstar Return > 0.97 :
Year 3 Morningstar Risk > 1.71 : 1-star
Year 3 Morningstar Risk <= 1.71 :
Morningstar Average Rating <= 1.7 :
| Year 5 Rank by Objective <= 57 :
| Year 5 Rank by Objective > 57 :
Morningstar Average Rating > 1.7
| Alpha <= 6.2 : 2-star
| Alpha > 6.2 :
| | Year 5 Annualized Return
Year 5 Annualized Return
2-star
1-star
1-star
2-star
2-star
1-star
12B-1 Fees
12B-1 Fees
| Stocks%
| Stocks%
<=
>
<=
>
0.25 :
0.25 :
91.32 :
91.32 :
> 2.
<= 11.16
> 11.16
3-Star
3-Star
2-star
2-star
Morningstar Average Rating
Year 3 Annualized Return <= 13.1 :
Morningstar Average Rating <= 3.7 :
Month 3 Total Return <= -7.53 :
Year 5 Morningstar Return <= 1.
Year 5 Morningstar Return > 1.
Month 3 Total Return > -7.53 :
Annual Return 1992 <= -4.15 :
Year 5 Earnings Gain <= 8.09
Year 5 Earnings Gain > 8.09
Annual Return 1992 > -4.15 :
Year 5 Morningstar Return <=
Year 3 Morningstar Risk <=
Year 3 Morningstar Risk >
I
27
27
â– star
star
1-star
2-star
0.97
0.83
0.83
<= 72
> 72
Year 5 Rank by Objective
| Year 5 Rank by Objective
I | Front Load <= 4 : 2-star
I | Front Load > 4 : 1-star
Year 5 Morningstar Return > 0.97
| Foreign Stock% <= 6.93 :
2-star
2-star
Figure 5.5: Decision Tree for 1994-1995 3-Star Matched Funds Prediction.

| | | | | | | Alpha <= 3.03 : 2-star
| | | | | | | Alpha > 3.03 : 3-Star
| | | | | | Foreign Stock% > 6.93 :
| | | | | | | Stocks% > 93.5 : 2-star
| | | | | | | Stocks% <= 93.5 :
| | | | | | | | Year 5 Rank by Objective <= 34 : 3-Star
| | | | | | | | Year 5 Rank by Objective > 34 :
| | | | | | | | | Year 3 Standard Deviation <= 10.35 : 3-Star
| | | | | | | | | Year 3 Standard Deviation > 10.35 : 2-star
| | Morningstar Average Rating > 3.7 :
| | | YTD Rank by Objective <= 74 :
| | | | Year 3 Morningstar Return <= 0.44 : 2-star
| | | | Year 3 Morningstar Return > 0.44 :
| | | | | Year 10 Rank by Objective <= 40 : 3-Star
| | | | | Year 10 Rank by Objective > 40 :
| | | | | | Bonds% > 2.08 : 3-Star
| | | | | | Bonds% <= 2.08 :
| | | | | | | Months Rated <= 91 : 3-Star
| | | | | | | Months Rated > 91 : 2-star
| | | YTD Rank by Objective > 74 :
| | | | Utilities Sector <= 1.4 : 1-star
| | | | Utilities Sector > 1.4 : 2-star
| Year 3 Annualized Return > 13.1 :
| | Year 5 Annualized Return > 10.91 : 3-Star
| | Year 5 Annualized Return <= 10.91 :
| | | Morningstar Average Rating <= 3.4 : 2-star
| | | Morningstar Average Rating > 3.4 : 3-Star
Figure 5.5, continued.

5.2.3 Results for 1995 Data Predicting 1996 Ratings
5.2.3.1 The 5-Star rating system
The Confusion Matrix for the third experiment, predicting ratings and rating
changes from 1995 to 1996 with matched mutual funds and the 5-Star rating system, is
shown in Table 5.12. There were 688 correct predictions out of 1,059 mutual funds or
65.0% correct. These results are similar to Section 5.2.2.1, which had 64.5% correct
predictions.
Table 5.12: Five-Star Rating System Confusion Matrix, 1996 Matched Funds Actual
Ratings versus C4.5 Predicted Mutual Fund Ratings.
Predicted As =>      1-Star   2-Star   3-Star   4-Star   5-Star
Actual 96 1-Star       61       30        2        0        0
Actual 96 2-Star       16      131       80        2        0
Actual 96 3-Star        1       40      301       37        2
Actual 96 4-Star        0        5       90      167        8
Actual 96 5-Star        0        0       12       46       28
Table 5.13 shows, for funds with a given rating in 1995, how they were actually rated in
1996 and how C4.5 predicted they would be rated in 1996. Using the method described in
Section 5.2.2.1 we determined that there were 186 rating change differences between the
predicted and actual ratings distributions. Thus, C4.5 correctly predicted 873 rating changes
or 82.4%.
As mentioned earlier, we also determined from Table 5.13 that 654 mutual funds
maintained the same rating from 1995 by adding the actual values on the diagonal from
1-Star to 5-Star. C4.5 predicted that 582 mutual funds would maintain their existing
rating. This is 72 fewer, or 11.0% worse, than the investment strategy of maintaining
ratings over one year.

Table 5.13: Five-Star Rating System, 1995 Rating vs. 1996 Actual Rating Changes and
C4.5 Predicted Rating Changes for Matched Funds.
                              1996 Ratings
1995 Rating                   1-Star   2-Star   3-Star   4-Star   5-Star
1-Star   Predicted              54       18        5        0        0
1-Star   Actual                 64       21        0        0        0
2-Star   Predicted              23       90       80       17        0
2-Star   Actual                 23      142       64        8        2
3-Star   Predicted               6       50      240       97        6
3-Star   Actual                  4       51      239       65        5
4-Star   Predicted               3        6       82      152       35
4-Star   Actual                  0       13       67      158       28
5-Star   Predicted               0        2       13       33       47
5-Star   Actual                  2        2       11       39       51
Ratings Distribution
Predicted 1996                  86      166      420      299       88
Actual 1996                     93      229      381      270       86
Actual 1995                     85      239      364      266      105
Three χ² Goodness-of-Fit tests were performed on the distribution of the 1995
actual ratings, the 1996 actual ratings, and the 1996 predicted ratings to determine if the
distributions were equivalent. The 1995 actual ratings and 1996 actual ratings
distributions were equivalent with p = 0.243 for d.f. = 4. However, for comparing the
1995 actual ratings distribution to the 1996 predicted ratings and for comparing the 1996
actual ratings distribution to the 1996 predicted ratings, p < 0.001 for d.f. = 4 for both
tests. Thus, these distributions were not equivalent.
We performed a χ² Goodness-of-Fit test comparing the predicted ratings
distribution to the standard Morningstar distribution. We determined the distributions
were not equivalent with p < 0.001 for d.f. = 4. We also tested the three distributions for
normality using the standard Morningstar distribution as a reference. The Kolmogorov-Smirnov
test determined all three were normally distributed with p > 0.15.
Figure 5.6, on the following page, is a graph of the data in Table 5.13 and shows
how closely the C4.5 predicted ratings changes tracked the actual changes.
Figure 5.7 is the decision tree with the best predictive power and has 77 nodes.
By trial-and-error, we determined that the best tree required any test used in the tree to
have at least two outcomes with 10 or more examples.
Figure 5.6: Distribution of Five-Star Actual Rating Changes and C4.5 Predicted Ratings
Changes, 1995 to 1996.
Table 5.14 lists the 17 features selected for building this decision tree and their
frequency of selection. The tree in Figure 5.7 generally follows the pattern of the tree in
Figure 5.4 of alternating between return and risk features. However, the leaf node
frequency shows that the Year 3 features are more likely to be leaf nodes in this tree.

Morningstar Average Rating <= 1.9 :
| Year 3 Morningstar Risk <= 1.08 : 2-Star
| Year 3 Morningstar Risk > 1.08 :
| | Month 12 Total Return <= 27.15 : 1-Star
| | Month 12 Total Return > 27.15 : 2-Star
Morningstar Average Rating > 1.9 :
| Year 3 Morningstar Return <= 1.62 :
| | Month 12 Total Return <= 2.04 :
| | | Year 3 Morningstar Risk <= 1.91 : 2-Star
| | | Year 3 Morningstar Risk > 1.91 : 1-Star
| | Month 12 Total Return > 2.04 :
| | | Morningstar Average Rating <= 3 :
| | | | Year 3 Annualized Return <= 9.6 :
| | | | | Year 3 Morningstar Risk <= 0.7 : 3-Star
| | | | | Year 3 Morningstar Risk > 0.7 : 2-Star
| | | | Year 3 Annualized Return > 9.6 :
| | | | | Year 3 Morningstar Risk <= 1.12 :
| | | | | | Year 5 Morningstar Return <= 0.77 :
| | | | | | | Other% > 0.9 : 3-Star
| | | | | | | Other% <= 0.9 :
| | | | | | | | Months Rated <= 23 : 3-Star
| | | | | | | | Months Rated > 23 :
| | | | | | | | | Turnover > 76 : 2-Star
| | | | | | | | | Turnover <= 76 :
| | | | | | | | | | Year 5 Morningstar Return <= 0.61 : 2-Star
| | | | | | | | | | Year 5 Morningstar Return > 0.61 : 3-Star
| | | | | | Year 5 Morningstar Return > 0.77 :
| | | | | | | Year 5 Morningstar Return <= 1.22 : 3-Star
| | | | | | | Year 5 Morningstar Return > 1.22 :
| | | | | | | | Year 3 Morningstar Return <= 1.09 : 3-Star
| | | | | | | | Year 3 Morningstar Return > 1.09 : 4-Star
| | | | | Year 3 Morningstar Risk > 1.12 :
| | | | | | Year 3 Morningstar Return <= 0.92 : 2-Star
| | | | | | Year 3 Morningstar Return > 0.92 :
| | | | | | | Beta <= 1.1 : 2-Star
| | | | | | | Beta > 1.1 : 3-Star
| | | Morningstar Average Rating > 3 :
| | | | Annual Return 1994 <= -10 : 2-Star
| | | | Annual Return 1994 > -10 :
| | | | | Year 3 Morningstar Return <= 1.07 :
| | | | | | Net Assets > 5571.3 : 4-Star
| | | | | | Net Assets <= 5571.3 :
| | | | | | | Year 3 Morningstar Risk <= 0.99 :
| | | | | | | | Year 5 Morningstar Return > 1.21 : 4-Star
| | | | | | | | Year 5 Morningstar Return <= 1.21 :
| | | | | | | | | Year 10 Annualized Return <= 13.64 : 3-Star
| | | | | | | | | Year 10 Annualized Return > 13.64 :
| | | | | | | | | | Months Rated <= 84 : 3-Star
| | | | | | | | | | Months Rated > 84 : 4-Star
Figure 5.7: Decision Tree for 1995-1996 5-Star Matched Funds Prediction.

| | | | | | | Year 3 Morningstar Risk > 0.99 :
| | | | | | | | Months Rated <= 77 : 2-Star
| | | | | | | | Months Rated > 77 : 3-Star
| | | | | Year 3 Morningstar Return > 1.07 :
| | | | | | Annual Return 1993 > 32.08 : 3-Star
| | | | | | Annual Return 1993 <= 32.08 :
| | | | | | | Year 5 Morningstar Return > 1.4 : 4-Star
| | | | | | | Year 5 Morningstar Return <= 1.4 :
| | | | | | | | Year 3 Standard Deviation <= 9.1 : 4-Star
| | | | | | | | Year 3 Standard Deviation > 9.1 : 3-Star
| Year 3 Morningstar Return > 1.62 :
| | Year 3 Morningstar Risk > 1.5 : 3-Star
| | Year 3 Morningstar Risk <= 1.5 :
| | | Morningstar Average Rating > 4.7 : 5-Star
| | | Morningstar Average Rating <= 4.7 :
| | | | Net Assets > 1933.7 : 5-Star
| | | | Net Assets <= 1933.7 :
| | | | | Months Rated <= 114 :
| | | | | | Year 5 Morningstar Return <= 2.19 : 4-Star
| | | | | | Year 5 Morningstar Return > 2.19 :
| | | | | | | Net Assets > 360.7 : 5-Star
| | | | | | | Net Assets <= 360.7 :
| | | | | | | | Alpha <= 7.1 : 4-Star
| | | | | | | | Alpha > 7.1 : 5-Star
| | | | | Months Rated > 114 :
| | | | | | 12B-1 Fees <= 0.1 : 3-Star
| | | | | | 12B-1 Fees > 0.1 : 4-Star
Figure 5.7, continued.

Table 5.14: Features Selected for Predicting 1996 Matched Fund 5-Star Ratings.
Feature                        Selection Frequency   Leaf Node Frequency
12B-1 Fees                              1                     2
Alpha                                   1                     2
Annual Return 1993                      1                     1
Annual Return 1994                      1                     1
Beta                                    1                     2
Month 12 Total Return                   2                     2
Months Rated                            4                     5
Morningstar Average Rating              3                     1
Net Assets                              3                     3
Other%                                  1                     1
Turnover                                1                     1
Year 3 Annualized Return                1                     0
Year 3 Morningstar Return               4                     3
Year 3 Morningstar Risk                 6                     6
Year 3 Standard Deviation               1                     2
Year 5 Morningstar Return               6                     6
Year 10 Annualized Return               1                     1
5.2.3.2 The 3-Star rating system
The Confusion Matrix for the fourth, and final, experiment in this
phase is shown in Table 5.15. C4.5 correctly predicted the ratings of 799 out of 1,059
mutual funds rated in 1996, or 75.4%. These results are comparable to Section 5.2.2.2.
A separate manual count determined that 765 funds, or 72.2%, maintained their rating for
one year according to the 3-Star rating system. Thus, the C4.5 predictions narrowly
outperformed our benchmark investment strategy.
Table 5.15: Three-Star Rating System Confusion Matrix, 1996 Actual versus C4.5
Predicted Mutual Fund Ratings.
Predicted As =>      1-Star   2-Star   3-Star
Actual 96 1-Star      221       98        3
Actual 96 2-Star       31      268       82
Actual 96 3-Star        4       42      310

We noted the improvement in prediction of the 3-Star system over the 5-Star
rating system. Again, we decided not to determine the rating changes since they would
be a slight improvement over the sum of the individual ratings in Table 5.13.
We performed a χ² Goodness-of-Fit test to compare the actual 1996 ratings to the
C4.5 predicted ratings and we calculated χ² = 19.71 for d.f. = 2, or p = 0.0001. Thus, the
two distributions were not equivalent. We tested the C4.5 predicted ratings distribution
for normality with the Kolmogorov-Smirnov test and the distribution was normal with
p = 0.131.
Figure 5.8 is the decision tree with the best predictive power and has 45 nodes.
By trial-and-error we determined that any test used in the tree had to have at least two
outcomes with 13 or more examples. Table 5.18 lists the features selected for building
this tree and their frequency of selection. Compared to Table 5.14, Months Rated and
Morningstar Average Rating have a reduced role in the decision tree.
Table 5.18: Features Selected for Predicting 1996 Matched Fund 3-Star Ratings.
Feature                        Selection Frequency   Leaf Node Frequency
Annual Return 1994                      2                     2
Foreign Stock%                          1                     1
Month 3 Rank by Objective               1                     2
Month 12 Total Return                   1                     1
Months Rated                            1                     2
Morningstar Average Rating              1                     0
Technology Sector                       1                     2
Year 3 Annualized Return                2                     2
Year 3 Morningstar Return               2                     0
Year 3 Morningstar Risk                 2                     1
Year 3 Standard Deviation               1                     1
Year 5 Annualized Return                2                     4
Year 5 Morningstar Return               4                     4
Year 10 Morningstar Return              1                     1

3-Star
1-star
1-star
1.22
1.22 :
3-Star
2-star
1-star
2-star
2-star
Month 12 Total Return <= 2.04 : 1-star
Month 12 Total Return > 2.04 :
Morningstar Average Rating <= 3 :
Year 3 Annualized Return <= 9.6 : 1-star
Year 3 Annualized Return > 9.6 :
Year 3 Annualized Return > 21.78
Year 3 Annualized Return <= 21.78
Year 3 Morningstar Risk > 1.28
Year 3 Morningstar Risk <= 1.28
Year 3 Morningstar Return <= 1.09 :
I Year 3 Standard Deviation > 11.9
| Year 3 Standard Deviation <= 11.9
| | Year 5 Morningstar Return <= 0.
| | Year 5 Morningstar Return > 0.
Year 3 Morningstar Return > 1.09 :
| Year 5 Morningstar Return <=
| Year 5 Morningstar Return >
| | Technology Sector <= 18.8
I | Technology Sector > 18.8
Morningstar Average Rating > 3 :
Year 3 Morningstar Return <= 1.05 :
Annual Return 1994 <= -9.97 : 1-star
Annual Return 1994 > -9.97 :
Year 3 Morningstar Risk <= 0.99 :
Morningstar Average Rating <= 3.9 :
| Year 10 Morningstar Return > 1.11 : 3-Star
| Year 10 Morningstar Return <= 1.11 :
I | Year 5 Morningstar Return <= 1 : 2-star
| | Year 5 Morningstar Return > 1 :
I | | Month 3 Rank by Objective <= 56
I | | Month 3 Rank by Objective > 56
Morningstar Average Rating > 3.9 :
I Year 5 Annualized Return <= 10.99
| Year 5 Annualized Return > 10.99
Year 3 Morningstar Risk > 0.99 :
Months Rated <= 77 : 1-star
Months Rated > 77 : 2-star
Year 3 Morningstar Return > 1.05 :
Foreign Stock% > 65 : 2-star
Foreign Stock% <= 65 :
I Annual Return 1994 > -2.26 : 3-Star
| Annual Return 1994 <= -2.26 :
I | Year 5 Annualized Return <= 12.85 :
I | Year 5 Annualized Return > 12.85 :
2-star
3-Star
2-star
3-Star
2-star
3-Star
Figure 5.8: Decision Tree for 1995-1996 3-Star Matched Funds Prediction.

5.2.4 Conclusions
We drew several conclusions from the results of the experiments in this phase.
First, we concluded that C4.5 could predict mutual fund ratings one year in the future
using a decision tree trained on the preceding year's data. Using the 3-Star rating system,
it performed marginally better at predicting ratings one year in the future than an
investment strategy of assuming that mutual fund ratings were maintained for a one-year
period. There appears to be information in the mix of 60 features that permits predicting
the ratings. The 5-Star rating system slightly underperformed this benchmark.
The 5-Star rating system predicted ratings with 65% accuracy and the 3-Star
rating system we devised for these experiments predicted fund ratings at 75% accuracy.
As we mentioned previously, the 3-Star rating prediction problem is an easier task.
However, recalling the confidence and error parameters, δ and ε, discussed in Section
5.2.1, we were 80% (1 - δ) confident that, given the size of the training set, our results
would be 65% (1 - ε) correct for the 5-Star rating system. We were approximately 73%
confident that the 3-Star rating system would be 75% correct.
Our second conclusion was that, using matched mutual funds, C4.5 predicted
5-Star mutual fund ratings changes one year in the future with approximately 80%
accuracy. It is important to mention that these were matched funds because of expected
claims of survivorship bias having an effect on the results. However, we felt it not
necessary to control for this effect since we were only comparing one-year, and not
multi-year, periods. If our interest as an investment manager is in predicting ratings
changes, either improvements or declines, then these results are encouraging.

Our third conclusion was that the χ² Goodness-of-Fit tests of the ratings
distributions showed that C4.5 was not mirroring the original training set.
Our final conclusion was that C4.5 created unique, normally-distributed
predictions of the mutual fund ratings that should perform better than a randomly
generated set of ratings within the standard Morningstar parameters.
The next phase of this study predicts ratings for unmatched training and testing
sets of mutual funds.
5.3 Phase 7: Predicting Unmatched Mutual Fund Ratings
5.3.1 Methodology
Phase 7 used a methodology similar to Phase 6 for C4.5 to predict the mutual fund
ratings of unmatched mutual funds, i.e., the testing set has more mutual funds than the
training set. We prepared a training set of 1,052 mutual fund examples from 1994, from
which C4.5 built a decision tree to predict the ratings of a testing set of 1,234 examples
from 1995. Another training set of 1,234 mutual funds from 1995 was prepared to
predict the ratings of a testing set of 1,583 mutual fund examples from 1996. We used
both the 5-Star rating system and 3-Star rating system approaches, so there were four
datasets. The features used in the feature vector of this phase are listed in Table 5.5.
Note the additional 182 funds in the 1995 testing set and the 349 additional funds
in the 1996 testing set. During the course of the business year, many funds consolidate or
go out of business altogether. We did not do a count to determine the exact number of
new funds. There could be more new funds than the previously cited numbers.
We reported the results of the best predicting decision tree. The measurement of
interest was found by using a Confusion Matrix to count the mutual fund ratings correctly
predicted by C4.5 compared to the actual mutual fund ratings for that year. In other
words, we compared the actual 1995 mutual fund ratings to the predicted 1995 ratings,
and did the same count for the 1996 results.
We tested the actual and predicted distributions for equivalence to the standard
Morningstar distribution of 10%-22.5%-35%-22.5%-10% using the χ² Goodness-of-Fit
test. We also compared the actual and predicted distributions to each other with the same
test to assure they were unique. Finally, we tested the two distributions for normality
using the Kolmogorov-Smirnov test.
We calculated the following error and confidence parameters for these
experiments concerning unmatched funds:
Table 5.19: Error and Confidence Parameters for Lower Bound on Training Examples.
Year and Rating System    ε       δ         Training Examples
1994-95 5-Star            0.35    0.1855    1,059
1994-95 3-Star            0.25    0.268     1,059
1995-96 5-Star            0.35    0.178     1,234
1995-96 3-Star            0.25    0.257     1,234
5.3.2 Results for 1994 Data Predicting 1995 Ratings
5.3.2.1 5-Star rating system
The results of our first experiment, predicting the 5-Star ratings of the 1995
unmatched funds, are in a Confusion Matrix in Table 5.20. It shows that C4.5 correctly
predicted the ratings of 790 out of 1,234 unmatched funds, an accuracy of 64.0%. This is
comparable to results obtained in Sections 5.2.2.1 and 5.2.3.1.

Table 5.20: Five-Star Rating System Confusion Matrix, 1995 Unmatched Funds Actual
Ratings versus C4.5 Predicted Mutual Fund Ratings.
Predicted As=>      1-Star   2-Star   3-Star   4-Star   5-Star
Actual 95 1-Star        48       59        1        1        1
Actual 95 2-Star         5      191       61       16        3
Actual 95 3-Star         0       74      289       60        7
Actual 95 4-Star         0        1       84      180       42
Actual 95 5-Star         0        1        5       23       82
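As a cross-check on the accuracy quoted for Table 5.20, the count of correct predictions can be recomputed from the diagonal of the confusion matrix. The following is a minimal sketch, not part of the original study; the function and variable names are illustrative, and the matrix is transcribed from the table as printed.

```python
# Rows are actual 1995 ratings (1-Star..5-Star); columns are C4.5 predictions.
# Values transcribed from Table 5.20.
TABLE_5_20 = [
    [48, 59, 1, 1, 1],
    [5, 191, 61, 16, 3],
    [0, 74, 289, 60, 7],
    [0, 1, 84, 180, 42],
    [0, 1, 5, 23, 82],
]


def accuracy(matrix):
    """Return (correct, total, correct / total) for a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct, total, correct / total


def marginals(matrix):
    """Return (actual_counts, predicted_counts): the row sums and column sums."""
    actual = [sum(row) for row in matrix]
    predicted = [sum(col) for col in zip(*matrix)]
    return actual, predicted


correct, total, acc = accuracy(TABLE_5_20)
actual_counts, predicted_counts = marginals(TABLE_5_20)
print(f"{correct} of {total} correct ({acc:.1%})")
print("actual distribution:   ", actual_counts)
print("predicted distribution:", predicted_counts)
```

The row and column sums are the actual and predicted ratings distributions that the Goodness-of-Fit tests compare.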
We performed three χ² Goodness-of-Fit tests to determine if the 1995 actual and
1995 predicted ratings distributions were equivalent to the standard Morningstar
distribution and to determine if the predicted 1995 ratings distribution was equivalent to
the actual 1995 distribution. The results were:
1) The actual 1995 distribution was equivalent to the standard Morningstar
distribution, with p = 0.232 for d.f. = 4,
2) The predicted 1995 distribution did not fit the Morningstar distribution,
with p < 0.001 for d.f. = 4, and
3) The actual 1995 ratings distribution was not equivalent to the predicted
1995 ratings distribution, with p < 0.001 for d.f. = 4.
We also performed a Kolmogorov-Smirnov test of normality on the actual 1995 and
predicted 1995 distributions. Both were normally distributed with p > 0.15.
Figure 5.9 is the decision tree with the best predictive power and it has 31 nodes.
By trial-and-error, we determined that any test used in the training phase of the decision
tree had to have at least two outcomes with 30 or more examples.
The results in Table 5.21 show how often features were selected in Figure 5.9 and
how often they were used for leaf nodes. Only seven features were used for this decision
tree, with the most prominent being Average Morningstar Rating. The consistency of
Year 3 features in the decision tree indicates the importance of this information to the
rating process for all the decision trees in this chapter of our research.
Year 3 Morningstar Return > 2.11 : 5-star
Year 3 Morningstar Return <=2.11 :
| Average Morningstar Rating <=2.1 :
| | Year 3 Annualized Return <=3.35 : 1-star
| | Year 3 Annualized Return >3.35 :
| | | Year 3 Morningstar Risk <= 1.6 : 2-star
| | | Year 3 Morningstar Risk > 1.6 : 1-star
| Average Morningstar Rating > 2.1 :
| | Year 3 Morningstar Return <= 0.32 :
| | | Average Morningstar Rating <= 3.1 : 2-star
| | | Average Morningstar Rating > 3.1 : 3-star
| | Year 3 Morningstar Return >0.32 :
| | | Average Morningstar Rating <=3.8 :
| | | | Year 3 Morningstar Risk > 1.31 : 2-star
| | | | Year 3 Morningstar Risk <= 1.31 :
| | | | | Year 3 Rank by Objective <= 30 :
| | | | | | Year 5 Morningstar Return <= 1.06 : 3-star
| | | | | | Year 5 Morningstar Return > 1.06 : 4-star
| | | | | Year 3 Rank by Objective > 30 :
| | | | | | Average Morningstar Rating > 2.7 : 3-star
| | | | | | Average Morningstar Rating <=2.7 :
| | | | | | | Year 3 Morningstar Return <= 0.86 : 2-star
| | | | | | | Year 3 Morningstar Return > 0.86 : 3-star
| | | Average Morningstar Rating > 3.8 :
| | | | Year 3 Rank by Objective <= 27 :
| | | | | Average Morningstar Rating <= 4.1 : 4-star
| | | | | Average Morningstar Rating > 4.1 : 5-star
| | | | Year 3 Rank by Objective > 27 :
| | | | | YTD Total Return <= -7.13 : 3-star
| | | | | YTD Total Return > -7.13 : 4-star
Figure 5.9: Decision Tree for 1994-1995 5-Star Unmatched Funds Prediction.
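To make the reading of Figure 5.9 concrete, the tree can be transcribed as nested data and walked for a single fund. This is an illustrative sketch only, not the C4.5 output format: the tuple layout and function names are assumptions, and the "Year 3 Rank by Objective > 27" branch is read as an internal test (it has two YTD Total Return branches beneath it), which is the reading that yields the 31 nodes stated in the text.

```python
# Internal node: (feature, threshold, subtree_if_value_le, subtree_if_value_gt).
# Leaf: an integer star rating. Thresholds transcribed from Figure 5.9.
TREE_5_9 = (
    "Year 3 Morningstar Return", 2.11,
    ("Average Morningstar Rating", 2.1,
        ("Year 3 Annualized Return", 3.35,
            1,
            ("Year 3 Morningstar Risk", 1.6, 2, 1)),
        ("Year 3 Morningstar Return", 0.32,
            ("Average Morningstar Rating", 3.1, 2, 3),
            ("Average Morningstar Rating", 3.8,
                ("Year 3 Morningstar Risk", 1.31,
                    ("Year 3 Rank by Objective", 30,
                        ("Year 5 Morningstar Return", 1.06, 3, 4),
                        ("Average Morningstar Rating", 2.7,
                            ("Year 3 Morningstar Return", 0.86, 2, 3),
                            3)),
                    2),
                ("Year 3 Rank by Objective", 27,
                    ("Average Morningstar Rating", 4.1, 4, 5),
                    ("YTD Total Return", -7.13, 3, 4))))),
    5,
)


def predict(tree, fund):
    """Walk the tree for one fund (a dict of feature name -> value)."""
    while not isinstance(tree, int):
        feature, threshold, if_le, if_gt = tree
        tree = if_le if fund[feature] <= threshold else if_gt
    return tree


def count_nodes(tree):
    """Count internal tests plus leaves."""
    if isinstance(tree, int):
        return 1
    _, _, if_le, if_gt = tree
    return 1 + count_nodes(if_le) + count_nodes(if_gt)


# Hypothetical fund values, chosen only to exercise one path through the tree.
example_fund = {
    "Year 3 Morningstar Return": 0.50,
    "Average Morningstar Rating": 4.0,
    "Year 3 Rank by Objective": 10,
}
print(predict(TREE_5_9, example_fund), "stars;", count_nodes(TREE_5_9), "nodes")
```

Only the features on the path actually taken need to be present in the dictionary, which mirrors how few features the tree consults for any one fund.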
Table 5.21: Features Selected for Predicting 1995 Unmatched Fund 5-Star Ratings.
Feature                       Selection Frequency   Leaf Node Frequency
Average Morningstar Rating             6                     5
Year 3 Annualized Return               1                     1
Year 3 Morningstar Return              3                     3
Year 3 Morningstar Risk                2                     3
Year 3 Rank by Objective               3                     0
Year 5 Morningstar Return              1                     2
YTD Total Return                       1                     2

5.3.2.2 3-Star rating system
In the second experiment, we predicted the 1995 ratings using the 3-Star rating
system used in Section 5.2. The results are shown in the Confusion Matrix in Table 5.22.
C4.5 correctly predicted the ratings of 934 of 1,234 unmatched mutual funds, or an
accuracy of 75.7%.
Table 5.22: Three-Star Rating System Confusion Matrix, 1995 Unmatched Funds Actual
Ratings versus C4.5 Predicted Mutual Fund Ratings.
Predicted As=>      1-Star   2-Star   3-Star
Actual 95 1-Star       302       76        8
Actual 95 2-Star        52      288       90
Actual 95 3-Star         0       74      344
The χ² Goodness-of-Fit tests were performed. First, we tested the standard
Morningstar distribution and the actual 1995 distribution and, with p = 0.524 for d.f. = 2,
determined that they were equivalent. Second, we tested the Morningstar distribution and
the predicted 1995 ratings distribution and, with p = 0.0075 for d.f. = 2, determined that
they were not equivalent. The third test compared the actual 1995 distribution and the
predicted 1995 distribution and, with p = 0.124 for d.f. = 2, we determined that they were
equivalent. We expected that they would not be equivalent and concluded that the result
may be caused by the few degrees of freedom of the test. We also performed the
Kolmogorov-Smirnov test for normality on the actual and predicted distributions. Both
were normal with p > 0.15 and p = 0.099, respectively.
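The three Goodness-of-Fit comparisons can be sketched from the row and column totals of Table 5.22. This is a hedged illustration rather than the procedure actually used in the study: it assumes a Pearson χ² statistic, assumes the 3-Star reference distribution is Morningstar's 10%-22.5%-35%-22.5%-10% with the 1- and 2-Star classes merged and the 4- and 5-Star classes merged (32.5%-35%-32.5%), and assumes the third test uses the actual counts as the expected counts. The p-value helper uses a closed form that is valid only for even degrees of freedom.

```python
import math


def chi_square_stat(observed, expected):
    """Pearson chi-square statistic for observed versus expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_even_df(stat, df):
    """Upper-tail p-value of the chi-square distribution (even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = stat / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


actual = [386, 430, 418]     # row totals of Table 5.22 (actual 1995 ratings)
predicted = [354, 438, 442]  # column totals of Table 5.22 (C4.5 predictions)
n = sum(actual)

# Assumed 3-Star reference distribution (see lead-in).
reference = [0.325 * n, 0.35 * n, 0.325 * n]

p_actual_vs_reference = chi_square_p_even_df(chi_square_stat(actual, reference), 2)
p_predicted_vs_reference = chi_square_p_even_df(chi_square_stat(predicted, reference), 2)
p_predicted_vs_actual = chi_square_p_even_df(chi_square_stat(predicted, actual), 2)

for label, p in [
    ("actual vs reference", p_actual_vs_reference),
    ("predicted vs reference", p_predicted_vs_reference),
    ("predicted vs actual", p_predicted_vs_actual),
]:
    print(f"{label}: p = {p:.4f}")
```

Under these assumptions the three p-values come out near 0.52, 0.0075, and 0.12, in line with the values reported for the 3-Star tests.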
Figure 5.10 is the decision tree with the best predictive power and it has 61 nodes.
Through several iterations, we determined that any test used in the training phase of the
decision tree had to have at least two outcomes with eight or more examples.

Average Morningstar Rating <= 2.9 :
| Year 3 Morningstar Return <= 0.97 :
| | Year 3 Rank by Objective <= 57 :
| | | R-Squared <= 72 : 1-star
| | | R-Squared > 72 : 2-star
| | Year 3 Rank by Objective > 57 :
| | | Average Morningstar Rating <= 2.4 : 1-star
| | | Average Morningstar Rating > 2.4 :
| | | | Year 3 Morningstar Return <= 0.52 : 1-star
| | | | Year 3 Morningstar Return > 0.52 : 2-star
| Year 3 Morningstar Return > 0.97 :
| | Year 3 Morningstar Risk > 1.71 : 1-star
| | Year 3 Morningstar Risk <= 1.71 :
| | | Average Morningstar Rating <= 1.4 : 1-star
| | | Average Morningstar Rating > 1.4 :
| | | | Year 3 Morningstar Return > 2.66 : 3-star
| | | | Year 3 Morningstar Return <= 2.66 :
| | | | | Year 3 Rank by Objective <= 6 : 3-star
| | | | | Year 3 Rank by Objective > 6 :
| | | | | | Year 5 Annualized Return <= 11.41 : 2-star
| | | | | | Year 5 Annualized Return > 11.41 :
| | | | | | | Average Morningstar Rating <= 2.3 : 2-star
| | | | | | | Average Morningstar Rating > 2.3 : 3-star
Average Morningstar Rating > 2.9 :
| Year 3 Annualized Return <= 13.14 :
| | Month 3 Total Return <= -7.69 :
| | | Year 3 Rank by Objective <= 66 : 2-star
| | | Year 3 Rank by Objective > 66 : 1-star
| | Month 3 Total Return > -7.69 :
| | | Average Morningstar Rating <= 3.7 :
| | | | P/E Ratio > 30.1 : 1-star
| | | | P/E Ratio <= 30.1 :
| | | | | Year 3 Annualized Return <= 10.61 :
| | | | | | Year 5 Morningstar Return > 0.5 : 2-star
| | | | | | Year 5 Morningstar Return <= 0.5 :
| | | | | | | Year 3 Standard Deviation <= 10.8 : 2-star
| | | | | | | Year 3 Standard Deviation > 10.8 : 1-star
| | | | | Year 3 Annualized Return > 10.61 :
| | | | | | Year 10 Morningstar Return <= 1.22 :
| | | | | | | Year 3 Morningstar Risk <= 0.58 : 3-star
| | | | | | | Year 3 Morningstar Risk > 0.58 : 2-star
| | | | | | Year 10 Morningstar Return > 1.22 :
| | | | | | | Months Rated <= 84 : 2-star
| | | | | | | Months Rated > 84 : 3-star
| | | Average Morningstar Rating > 3.7 :
| | | | YTD Total Return <= -10.93 : [rating illegible in source]
| | | | YTD Total Return > -10.93 :
| | | | | Year 3 Morningstar Return <= 0.44 : 2-star
| | | | | Year 3 Morningstar Return > 0.44 :
| | | | | | P/B Ratio > 3.83 : 2-star
| | | | | | P/B Ratio <= 3.83 :
| | | | | | | Year 10 Rank by Objective <= 40 : 3-star
| | | | | | | Year 10 Rank by Objective > 40 :
| | | | | | | | Months Rated <= 99 : 3-star
| | | | | | | | Months Rated > 99 : 2-star
| Year 3 Annualized Return > 13.14 :
| | Average Morningstar Rating > 3.4 : 3-star
| | Average Morningstar Rating <= 3.4 :
| | | Year 5 Morningstar Return <= 1.43 : 2-star
| | | Year 5 Morningstar Return > 1.43 : 3-star
Figure 5.10: Decision Tree for 1994-1995 3-Star Unmatched Funds Prediction.

Table 5.23 shows the 16 features selected by C4.5 for the decision tree. Average
Morningstar Rating, Months Rated, and the Year 3 features dominate the decision tree.
Table 5.23: Features Selected for Predicting 1995 Unmatched Fund 3-Star Ratings.
Feature                       Selection Frequency   Leaf Node Frequency
Average Morningstar Rating             6                     5
Month 3 Total Return                   1                     0
Months Rated                           2                     4
P/B Ratio                              1                     1
P/E Ratio                              1                     1
R-Squared                              1                     2
Year 3 Annualized Return               2                     0
Year 3 Morningstar Return              4                     4
Year 3 Morningstar Risk                2                     3
Year 3 Rank by Objective               3                     3
Year 3 Standard Deviation              1                     2
Year 5 Morningstar Return              2                     3
Year 10 Morningstar Return             1                     0
Year 10 Rank by Objective              1                     1
YTD Total Return                       1                     2
5.3.3 Results for 1995 Data Predicting 1996 Ratings
5.3.3.1 5-Star rating system
The results of our third experiment in this phase, predicting the 5-Star ratings of
the 1996 unmatched funds, are presented in the Confusion Matrix in Table 5.24. C4.5
correctly predicted the 1996 ratings of 1,053 out of 1,583 mutual funds, an accuracy of
66.5%. This is comparable to other 5-Star rating system prediction rates.
We performed three χ² Goodness-of-Fit tests. In the first one we determined that
the actual 1996 ratings distribution and the standard Morningstar distribution were
equivalent with p = 0.258 and d.f. = 4. The second test compared the predicted 1996
ratings distribution and the standard Morningstar distribution. With p < 0.001 for d.f. = 4
we determined that the distributions were not equivalent. The third test compared the
actual 1996 distribution and the predicted 1996 distribution and they were determined to
be not equivalent with p < 0.001 for d.f. = 4. The Kolmogorov-Smirnov test of normality
showed that the actual 1996 and predicted 1996 distributions were normal with p > 0.15.
Table 5.24: Five-Star Rating System Confusion Matrix, 1996 Unmatched Funds Actual
Ratings versus C4.5 Predicted Mutual Fund Ratings.
Predicted As=>      1-Star   2-Star   3-Star   4-Star   5-Star
Actual 96 1-Star       127       21        3        0        0
Actual 96 2-Star        34      178      113       22        0
Actual 96 3-Star         1       50      395      103        9
Actual 96 4-Star         0        3       71      263       50
Actual 96 5-Star         0        1        1       48       90
Figure 5.11 is the decision tree with the best predictive power and it has 35 nodes.
By trial and error, we determined that any test used in the training phase of the decision
tree had to have at least two outcomes with 29 or more examples.
Table 5.25 lists the seven features selected by C4.5 for building the decision tree
and their frequency of selection. The Average Morningstar Rating feature is the most
dominant, followed by the Year 3 features. However, the Year 5 features were most often
at the leaf nodes. It is surprising that such a small tree with so few features predicted
the ratings so well.

Year 3 Morningstar Return <= 1.07 :
| Year 3 Morningstar Risk <= 1.05 :
| | Morningstar Average Rating <= 2.9 :
| | | Year 3 Annualized Return <= 9.51 : 2-star
| | | Year 3 Annualized Return > 9.51 :
| | | | Year 5 Morningstar Return <= 0.7 : 2-star
| | | | Year 5 Morningstar Return > 0.7 : 3-star
| | Morningstar Average Rating > 2.9 :
| | | Morningstar Average Rating <= 3.6 : 3-star
| | | Morningstar Average Rating > 3.6 :
| | | | Year 5 Morningstar Return <= 0.89 : 3-star
| | | | Year 5 Morningstar Return > 0.89 : 4-star
| Year 3 Morningstar Risk > 1.05 :
| | Morningstar Average Rating <= 1.9 : 1-star
| | Morningstar Average Rating > 1.9 :
| | | Morningstar Average Rating > 3.3 : 3-star
| | | Morningstar Average Rating <= 3.3 :
| | | | Year 3 Annualized Return <= 6.92 : 1-star
| | | | Year 3 Annualized Return > 6.92 : 2-star
Year 3 Morningstar Return > 1.07 :
| Year 3 Morningstar Risk <= 1.29 :
| | Year 3 Annualized Return <= 16.77 :
| | | Morningstar Average Rating > 3.2 : 4-star
| | | Morningstar Average Rating <= 3.2 :
| | | | Year 5 Morningstar Return <= 1.19 : 3-star
| | | | Year 5 Morningstar Return > 1.19 : 4-star
| | Year 3 Annualized Return > 16.77 :
| | | Morningstar Average Rating > 4.1 : 5-star
| | | Morningstar Average Rating <= 4.1 :
| | | | Year 5 Rank by Objective <= 13 : 5-star
| | | | Year 5 Rank by Objective > 13 : 4-star
| Year 3 Morningstar Risk > 1.29 :
| | Technology Sector <= 12.5 : 2-star
| | Technology Sector > 12.5 : 4-star
Figure 5.11: Decision Tree for 1995-1996 5-Star Unmatched Funds Prediction.
Table 5.25: Features Selected for Predicting 1996 Unmatched Fund 5-Star Ratings.
Feature                       Selection Frequency   Leaf Node Frequency
Average Morningstar Rating             6                     5
Year 3 Annualized Return               3                     3
Year 3 Morningstar Return              1                     0
Year 3 Morningstar Risk                2                     0
Year 5 Morningstar Return              3                     6
Year 5 Rank by Objective               1                     2
Technology Sector                      1                     2

5.3.3.2 3-Star rating system
The fourth experiment used C4.5 to predict the 3-Star ratings of the 1996
unmatched mutual funds. The results are in the Confusion Matrix in Table 5.26,
which compares the actual 1996 fund ratings to the predicted 1996 ratings. C4.5 correctly
predicted the ratings of 1,206 of 1,583 funds, an accuracy of 76.2%. This performance is
comparable to the other 3-Star rating experiments in Phase 6.
Table 5.26: Three-Star Rating System Confusion Matrix, 1996 Unmatched Funds Actual
Ratings versus C4.5 Predicted Mutual Fund Ratings.
Predicted As=>      1-Star   2-Star   3-Star
Actual 96 1-Star       292      203        3
Actual 96 2-Star        19      472       67
Actual 96 3-Star         2       83      442
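Overall accuracy hides how the errors in Table 5.26 are spread across classes. The sketch below computes per-class recall (the share of each actual class predicted correctly) and precision (the share of each predicted class that was correct) from the table as printed. These two measures are not reported in the study; they are added here only as an illustration, and the names are ours.

```python
# Rows are actual 1996 ratings (1-Star..3-Star); columns are C4.5 predictions.
# Values transcribed from Table 5.26.
TABLE_5_26 = [
    [292, 203, 3],
    [19, 472, 67],
    [2, 83, 442],
]


def per_class_recall(matrix):
    """Diagonal count divided by the row total, for each actual class."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def per_class_precision(matrix):
    """Diagonal count divided by the column total, for each predicted class."""
    columns = list(zip(*matrix))
    return [col[i] / sum(col) for i, col in enumerate(columns)]


recall = per_class_recall(TABLE_5_26)
precision = per_class_precision(TABLE_5_26)
for star, (r, p) in enumerate(zip(recall, precision), start=1):
    print(f"{star}-Star: recall {r:.1%}, precision {p:.1%}")
```

Read this way, the table shows that most of the misclassified funds are actual 1-Star funds predicted as 2-Star (203 of 498).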
We performed the χ² Goodness-of-Fit tests on the actual 1996 and predicted 1996
ratings distributions. First, comparing the standard Morningstar ratings distribution,
modified for the 3-Star rating system, to the actual 1996 ratings distribution, we
determined them to be equivalent with p = 0.656 for d.f. = 2. Second, comparing the
standard Morningstar ratings distribution to the predicted 1996 ratings distribution, we
determined them to be not equivalent with p < 0.001 for d.f. = 2. The third comparison
for equivalency was the actual 1996 ratings distribution to the predicted 1996 ratings
distribution. We determined them to be not equivalent with p < 0.001 for d.f. = 2.
Kolmogorov-Smirnov tests of normality showed the actual 1996 and predicted 1996
ratings distributions were normal with p > 0.15.
Figure 5.12 is the decision tree with the best predictive capability for the
unmatched funds using the 3-Star rating system. It has 83 nodes and, by trial and error,
we determined that any test used in the training phase of the decision tree had to have at
least two outcomes with 7 or more examples.
Table 5.27 lists the nineteen features used by C4.5 in building the decision tree. It
also lists how many times features were selected for use in the decision tree and how
often the features were used in leaf nodes. The major features are Month 12 Total
Return, Morningstar Average Rating, Months Rated, and Year 3 Morningstar Return.
Year 3 and Year 5 features also were selected for building this tree and they are similar to
those selected for the development of the other trees in this research.
Table 5.27: Features Selected for Predicting 1996 Unmatched Fund 3-Star Ratings.
Feature                       Selection Frequency   Leaf Node Frequency
Alpha                                  2                     2
Annual Return 1994                     1                     1
Beta                                   1                     1
Energy Sector                          2                     2
Front Load                             1                     1
Health Sector                          1                     2
Month 12 Total Return                  4                     4
Months Rated                           3                     5
Morningstar Average Rating             6                     5
Other %                                1                     1
Stocks %                               1                     2
Year 3 Annualized Return               1                     0
Year 3 Morningstar Return              4                     3
Year 3 Morningstar Risk                1                     0
Year 3 Standard Deviation              3                     4
Year 5 Annualized Return               3                     1
Year 5 Morningstar Return              2                     3
Year 5 Morningstar Risk                2                     3
Year 10 Morningstar Return             2                     1

Morningstar Average Rating <= 3 :
| Month 12 Total Return <= 7.71 : 1-star
| Month 12 Total Return > 7.71 :
| | Year 3 Annualized Return <= 9.6 :
| | | Energy Sector <= 10.6 :
| | | | Year 5 Morningstar Risk > 0.9 : 1-star
| | | | Year 5 Morningstar Risk <= 0.9 :
| | | | | Front Load > 2 : 1-star
| | | | | Front Load <= 2 :
| | | | | | Morningstar Average Rating <= 2.6 : 1-star
| | | | | | Morningstar Average Rating > 2.6 : 2-star
| | | Energy Sector > 10.6 :
| | | | Morningstar Average Rating > 2.7 : 2-star
| | | | Morningstar Average Rating <= 2.7 :
| | | | | Stocks% <= 91.4 : 2-star
| | | | | Stocks% > 91.4 : 1-star
| | Year 3 Annualized Return > 9.6 :
| | | Alpha <= 5.3 :
| | | | Year 3 Standard Deviation <= 11.9 :
| | | | | Year 5 Annualized Return <= 9.33 :
| | | | | | Months Rated <= 24 : 2-star
| | | | | | Months Rated > 24 :
| | | | | | | Other% <= 0 : [rating missing in source]
| | | | | | | Other% > 0 : 2-star
| | | | | Year 5 Annualized Return > 9.33 :
| | | | | | Year 3 Morningstar Return <= 1.09 : 2-star
| | | | | | Year 3 Morningstar Return > 1.09 :
| | | | | | | Year 5 Morningstar Return <= 1.22 : 2-star
| | | | | | | Year 5 Morningstar Return > 1.22 :
| | | | | | | | Year 3 Standard Deviation <= 9.7 : 3-star
| | | | | | | | Year 3 Standard Deviation > 9.7 : 2-star
| | | | Year 3 Standard Deviation > 11.9 :
| | | | | Year 3 Morningstar Return <= 0.97 : 1-star
| | | | | Year 3 Morningstar Return > 0.97 :
| | | | | | Year 10 Morningstar Return > 0.91 : 2-star
| | | | | | Year 10 Morningstar Return <= 0.91 :
| | | | | | | Months Rated <= 78 : 2-star
| | | | | | | Months Rated > 78 : 1-star
| | | Alpha > 5.3 :
| | | | Morningstar Average Rating <= 1.8 : 2-star
| | | | Morningstar Average Rating > 1.8 :
| | | | | Month 12 Total Return > 37.91 : 3-star
| | | | | Month 12 Total Return <= 37.91 :
| | | | | | Energy Sector <= 1.2 : 3-star
| | | | | | Energy Sector > 1.2 : 2-star
Morningstar Average Rating > 3 :
| Year 3 Morningstar Return <= 1.05 :
| | Month 12 Total Return <= 1.92 : 1-star
| | Month 12 Total Return > 1.92 :
| | | Morningstar Average Rating <= 3.9 :
| | | | Year 3 Morningstar Return <= 0.11 : 1-star
| | | | Year 3 Morningstar Return > 0.11 :
| | | | | Year 3 Morningstar Risk <= 0.82 :
| | | | | | Year 10 Morningstar Return <= 1.1 :
| | | | | | | Year 5 Morningstar Return <= 0.98 : 2-star
| | | | | | | Year 5 Morningstar Return > 0.98 :
| | | | | | | | Year 5 Morningstar Risk <= 0.68 : 3-star
| | | | | | | | Year 5 Morningstar Risk > 0.68 : 2-star
| | | | | | Year 10 Morningstar Return > 1.1 :
| | | | | | | Months Rated <= 87 : 2-star
| | | | | | | Months Rated > 87 : 3-star
| | | | | Year 3 Morningstar Risk > 0.82 :
| | | | | | Beta > 0.8 : 2-star
| | | | | | Beta <= 0.8 :
| | | | | | | Health Sector <= 5.5 : 2-star
| | | | | | | Health Sector > 5.5 : 1-star
| | | Morningstar Average Rating > 3.9 :
| | | | Alpha <= -2.7 : 2-star
| | | | Alpha > -2.7 : 3-star
| Year 3 Morningstar Return > 1.05 :
| | Month 12 Total Return <= 6.36 : 2-star
| | Month 12 Total Return > 6.36 :
| | | Year 5 Morningstar Return <= 1.19 :
| | | | Morningstar Average Rating > 3.8 : 3-star
| | | | Morningstar Average Rating <= 3.8 :
| | | | | Year 3 Standard Deviation <= 8.6 : 3-star
| | | | | Year 3 Standard Deviation > 8.6 : 2-star
| | | Year 5 Morningstar Return > 1.19 :
| | | | Annual Return 1994 > -3.36 : 3-star
| | | | Annual Return 1994 <= -3.36 :
| | | | | Year 5 Morningstar Return <= 1.44 : 2-star
| | | | | Year 5 Morningstar Return > 1.44 : 3-star
Figure 5.12: Decision Tree for 1995-1996 3-Star Unmatched Funds Prediction.

5.3.4 Conclusions
The experiments in this phase lead us to the conclusion that C4.5 can predict the
ratings of unmatched mutual funds one year in the future by using decision trees trained
on the previous year's examples. Based upon the error and confidence parameters we
calculated for this phase, we are approximately 80% (1 - δ) confident that the 5-Star
predicted ratings were 65% (1 - ε) correct. For the 3-Star rating system, we are
approximately 75% (1 - δ) confident that the predicted ratings are 75% (1 - ε) correct. In
other words, relative to the 5-Star rating system, the 3-Star rating system trades some
confidence for higher accuracy.
We also concluded that C4.5 correctly identified the role of the three-year features
in the rating process. This is a continuation of an observation from Phase 6.
5.4 Overall Summary
This chapter focused on predicting mutual fund ratings and ratings changes with
C4.5 using the 5-Star and the 3-Star rating systems. In the fifth phase we attempted to
predict mutual fund ratings one year in the future using a common 28-feature vector for
1993, 1994, and 1995 Morningstar data. Our results for the two rating systems could best
be described as mixed compared to an investment strategy of assuming the ratings did not
change over a one-year period. These results also lead us to conclude that more features
could improve the accuracy of the C4.5 predictions.
In the sixth phase, 60 features were included in the feature vector from the
Morningstar data for 1994, 1995, and 1996 to predict matched mutual fund ratings.
Morningstar published some of these features for the first time in 1994. The results
showed that C4.5 could predict mutual fund ratings and ratings changes. C4.5 produced
an independent distribution of ratings that was also normally distributed, similar to the
Morningstar Mutual Funds five-star rating system. In other words, C4.5 was not
mirroring the preceding year's ratings distribution or the current year's
actual ratings distribution. The 5-Star rating system predictions for matched funds
underperformed an investment strategy of assuming the rating will be the same one year
in the future. The 3-Star rating system outperformed this benchmark strategy.
In the seventh, and final, phase, C4.5 predicted the ratings of unmatched mutual
funds. The results showed that C4.5 did a good job of predicting the ratings of
unmatched mutual funds and that C4.5 created an independent normal distribution of
fund ratings.
In the next chapter, we will present our overall conclusions of this research and our
directions for future research.

CHAPTER 6
SUMMARY AND FUTURE RESEARCH
In this report, we have described a new application of the artificial intelligence
technique of machine learning to a very complex problem concerning the classification
and prediction of mutual fund ratings. Our research consisted of two parts: comparing
the classification capability of C4.5, the machine learning program, to statistical methods,
and comparing predicted mutual fund ratings to a benchmark investment strategy.
Logit and LDA have been applied to financial classification problems over a
period of years and gained acceptance from academic researchers. It was necessary to
test C4.5 against these statistical methods to prove that the machine learning system
could classify mutual funds as well as they could.
The first phase of this research used Morningstar mutual fund data to test C4.5,
Logit, and LDA. Logit was the clear winner in this first face-off and the results showed
that the three systems could classify mutual funds with reasonable success. We
concluded from the results that C4.5 could improve its performance with a larger training
set.
In the second phase, we compared the three systems' performance in classifying
mutual funds after we added some new features that we derived from the Morningstar
data and increased the number of examples in the training set. We also
observed the effect the derived features had on classification.
The results of the baseline test for this phase were a statistical dead heat for the
three systems. In the second test, classifying the examples with the derived features,
C4.5 and Logit tied while leaving LDA far behind. One of the derived
features was the Treynor Performance Index and it provided no improvement to the
classification of mutual funds. Another derived feature, a surrogate for the Morningstar
Risk-adjusted Return, did improve classification by C4.5 and Logit. The third derived
feature played the role of a confounding factor and it had no impact on classification by
the two winners.
The third phase devised a merger of the Morningstar rating system to simplify the
classification task. Instead of five rating classes or stars, we used three. Part of the logic
behind this suggestion was the anecdotal evidence that investors were mostly buying 4-
and 5-Star funds, so why not combine them. We also combined 1- and 2-Star funds into
a separate rating class. We tested the classification performance of the three systems
against the 5-Star and 3-Star rating systems.
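The class merger described above can be written as a simple mapping from 5-Star to 3-Star ratings. The sketch below is illustrative only; the names are ours. It also applies the same merger to the standard Morningstar distribution, which is an assumption about how the distribution was "modified for the 3-Star rating system" in Chapter 5.

```python
# 1- and 2-Star become class 1, 3-Star becomes class 2, 4- and 5-Star become class 3.
FIVE_TO_THREE = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}


def to_three_star(rating):
    """Map a single 5-Star rating (1..5) to its 3-Star class (1..3)."""
    return FIVE_TO_THREE[rating]


def merge_counts(five_star_values):
    """Merge a list of five per-class values (1-Star first) into three."""
    merged = [0, 0, 0]
    for star, value in enumerate(five_star_values, start=1):
        merged[FIVE_TO_THREE[star] - 1] += value
    return merged


# Actual 1995 rating counts: the row totals of Table 5.20.
actual_1995_five_star = [110, 276, 430, 307, 111]
actual_1995_three_star = merge_counts(actual_1995_five_star)

# Standard Morningstar distribution, merged the same way (assumption).
three_star_distribution = merge_counts([0.10, 0.225, 0.35, 0.225, 0.10])

print(actual_1995_three_star)
print([round(share, 3) for share in three_star_distribution])
```

Merging the Table 5.20 row totals in this way reproduces the row totals of Table 5.22 (386, 430, and 418), which is consistent with the two tables describing the same 1,234 funds.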
A second goal of this research phase was to identify the features needed to
classify unrated funds. A large number of funds are unrated because Morningstar
requires three years of financial data before a rating can be assigned to a fund. Some
funds are unrated by Morningstar because they are small or have not attracted much
investor attention. Being able to classify them according to the Morningstar rating
system, or the 3-Star variant, without using proprietary Morningstar data would be an
important accomplishment of this research.
The training set for this phase was larger than in the prior two phases, and C4.5 and
Logit were not statistically different in their performance. This time, however, C4.5 had
fewer errors than Logit in classifying most of the datasets. The results of classifying the
testing set, or unseen examples, suggested that it would be possible to classify unrated
mutual funds by the Morningstar rating system. Classification with the 3-Star rating
system resulted in fewer errors than the 5-Star system. This phase concluded our testing
of the three classification systems.
Phase 4 of the research used the method of crossvalidation to test the ability of
C4.5 to classify mutual funds with a large number of features in the feature vector. We
had good results from this phase of the research and applied them to the remainder of our
research.
The final three phases concerned predicting the Morningstar ratings one year in
the future. The general research methodology had C4.5 build a decision tree using
examples from one year and then we predicted ratings using examples from the
succeeding year. We compared the predicted ratings to the actual Morningstar ratings for
the prediction year. We also compared the predictions to an investment strategy of
assuming mutual funds maintain their rating over the one-year period.
In Phase 5, we tested to see if we could devise a common feature vector to predict
Morningstar ratings over two one-year periods. The mixed results, compared to the
investment strategy, indicated that we needed more features to achieve good predictions.
The sixth phase of the research tested the ability of C4.5 to predict the ratings
of matched mutual funds. By only using funds that existed over the one-year period, we
were able to test not only the accuracy of the predictions but also the accuracy of the
ratings changes. Our results showed that the 5-Star rating system had good prediction
results but narrowly failed to outperform the benchmark investment strategy. The 3-Star
rating system outperformed the investment strategy. We had good success in predicting
rating changes by the mutual funds.
The seventh, and final, phase of our research tested the ability of C4.5 to develop
decision trees that could predict the ratings for unmatched funds. Now we were
predicting ratings on 180 or more funds that the decision tree had not seen in the training
set. Our results were similar to Phase 6 and we concluded that the prediction process
worked well.
Of the two tasks at hand, classification and prediction, we feel that classification
is the more important. It would be desirable, and easier, for investors to be able to classify
unrated mutual funds than to try to predict fund ratings. The reason for this is that
predictions of fund ratings would be based on predictions of mutual fund financial data,
whereas classification of unrated funds would be performed using hard financial data.
Classifying unrated mutual funds would probably be more believable to the skeptical
financial investor. On the other hand, with all the firms predicting financial data today
and in the future, it should be possible to improve upon the generation of predicted
mutual fund financial data to incorporate into a model to predict mutual fund ratings.
The major importance of the prediction phase of this research is that we have
demonstrated that there is some relationship between the data provided by Morningstar
and their rating system, such that we could predict future ratings with a reasonable degree
of accuracy. If investors are truly chasing Morningstar ratings, this is good news.
Our experiences and results have also identified opportunities for future research.
First, we chose to study equity mutual funds and the methodology could be logically
extended to bond mutual funds, hybrid funds, and mutual funds outside this country.
Bond funds, in particular, have a slightly different set of features than equities.
Second, in obtaining good results we frequently felt frustrated with the wide
variation on the C4.5 sensible tests parameter. As can be seen in Chapter 5, every result
had a different sensible test setting to produce the best predicting decision tree.
Consistency in determining this setting is key to developing a stable and workable
methodology for classification and prediction. Further research is needed to determine a
minimal range for this parameter or published results to give practitioners a good
jumping off point.
Third, research could isolate the features and interactions that aid in predicting the
mutual fund ratings.
Fourth, having demonstrated that C4.5 classifies mutual funds by rating as well as
Logit and better than LDA, another extension of this research is to classify mutual funds
with other artificial intelligence systems.

APPENDIX A
DESCRIPTION OF MUTUAL FUND FEATURES
Equity Fund Investment Objectives: Morningstar bases a fund's objective on either the
wording in the prospectus issued by the fund's advisor, or, to a lesser extent, the manner
in which the fund is marketed. The objectives are defined as follows:
Aggressive Growth: seeks rapid growth of capital, often through investment in
smaller companies and with investment techniques involving greater-than-average
risk, such as frequent trading, leveraging, and short-selling.
Equity-Income: seeks current income by investing at least 65% of its assets in
equity securities with above-average yields.
Growth: seeks capital appreciation by investing primarily in equity securities.
Growth and Income: seeks growth of capital and current income as near-equal
objectives, primarily by investing in equity securities with above-average yields and
some potential for appreciation.
Europe Stock: generally invests at least 65% of assets in equity securities of
European issuers.
Foreign Stock: invests primarily in equity securities of issuers located outside the
United States.
Pacific Stock: invests primarily in issuers located in countries of the Pacific Basin.
World Stock: invests primarily in equity securities of issuers located throughout
the world, maintaining a percentage of assets in the United States.
Specialty-Financial: seeks capital appreciation by investing primarily in equity
securities of financial services companies.
Specialty-Health: seeks capital appreciation by investing primarily in equity
securities of healthcare companies, including drug manufacturers, hospitals, and
biotechnology firms.
Specialty-Natural Resources: seeks capital appreciation by investing primarily in
equity securities of companies involved in the exploration, distribution, or
processing of natural resources.
Specialty-Precious Metals: seeks capital appreciation by investing primarily in
equity securities of companies engaged in the mining, distribution, or processing of
precious metals.
Specialty-Technology: seeks capital appreciation by investing primarily in equity
securities of companies engaged in the development, distribution, or servicing of
technology-related equipment or processes.
Specialty-Utilities: seeks capital appreciation by investing primarily in equity
securities of public utilities.
Small Company: seeks capital appreciation by investing primarily in stocks of small
companies as determined by market capitalization.
Yield: a fund's income return on capital investment for the past twelve months, expressed
as a percentage. This refers only to distributions of dividends from stocks.
SEC Yield: a standardized figure that the Securities and Exchange Commission requires
funds to use when mentioning yield in their advertisements. An annualized calculation
based on a trailing 30-day period, SEC Yield will often differ significantly from
Morningstar's other yield figure, which reflects trailing 12-month distributed yield. In
companies, the figure often reflects the yield from the period two months previous to the
date of the Morningstar report.
Assets: The fund assets in millions of dollars at the end of the most recently reported
month.
Three Month, Six Month, and One Year Total Returns; Three, Five, and Ten Year
Average Returns: Morningstar calculates total returns by taking the change in investment
value, assuming reinvestment of all income and capital-gains distributions (plus any other
miscellaneous distributions) during the period, and dividing by the initial investment value.
Morningstar does not adjust the total returns for sales charges or redemption fees. The
total returns do account for management, administrative, and 12b-1 fees, and other costs
automatically deducted from fund assets. Total returns for periods longer than one year
are expressed in terms of compounded average annual returns.
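The total-return definition can be illustrated with a minimal Python sketch. The function names and the sample dollar values are hypothetical, and the ending value is assumed to already include all reinvested distributions; this is an illustration of the arithmetic, not the vendor's actual procedure.

```python
def total_return(initial_value, ending_value):
    """Fractional total return over a period.

    Assumes ending_value already reflects reinvestment of all income and
    capital-gains distributions (an assumption of this sketch).
    """
    return (ending_value - initial_value) / initial_value


def annualized_return(cumulative_return, years):
    """Compounded average annual return equivalent to a multi-year cumulative return."""
    return (1.0 + cumulative_return) ** (1.0 / years) - 1.0


# Hypothetical example: $10,000 grows to $13,310 over three years.
three_year_total = total_return(10_000.0, 13_310.0)
three_year_average = annualized_return(three_year_total, 3)
print(f"cumulative: {three_year_total:.3f}, annualized: {three_year_average:.3f}")
```

Expressing multi-year figures as compounded annual rates makes a three-year and a ten-year record directly comparable.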
Percentile Rank All: a fund's total return is ranked against the returns over the same
period for all funds tracked by Morningstar.
Percentile Rank Objective: compares the fund's total return with all funds that have the
same investment objective.

Morningstar Return (3, 5, and 10 Year): Equity funds are rated separately from other
types of funds. The 3 Year, 5 Year and 10 Year Average Returns are adjusted for
maximum front-end loads and an appropriate level of redemption fees, if any. The
Morningstar Return is then calculated by setting the Equity class average to 1.00 with all
other funds relative to this value. The overall Morningstar Return is based on a weighted
average of the three periods; the 10-year number accounts for 50%, the five-year value
accounts for 30% and the three-year value for 20%. If only a five-year period is available,
the five-year value will account for 60%, and the three-year number, 40%. If only three
years of data are available, the three-year value serves as the overall rating. Morningstar
does not calculate a Morningstar Return for funds with less than three full years of
performance data. In July 1994, Morningstar changed the calculation of Morningstar
Return to base it on excess return. Funds will now have the T-bill rate of return
subtracted from their load-adjusted total returns. This change has the effect of magnifying
good results; top-performing funds look that much better under the new methodology.
Similarly, poor-performing funds score that much worse than before, reducing the chance
that the mere presence of a low risk score could push their ratings up to very high levels.
The use of excess returns not only puts Morningstar's ratings calculations in line with
standard academic practice, but also places more emphasis on the return half of the
risk/return calculation than before.
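The period weighting described above (50/30/20 with ten years of data, 60/40 with five, and the three-year value alone otherwise) can be sketched as follows. The function name and the sample scores are hypothetical; only the weights come from the text.

```python
def overall_score(three_year, five_year=None, ten_year=None):
    """Combine per-period scores using the weights quoted in the text.

    Returns None when no three-year score exists, mirroring the statement
    that no overall figure is calculated with under three years of data.
    """
    if three_year is None:
        return None
    if five_year is not None and ten_year is not None:
        return 0.5 * ten_year + 0.3 * five_year + 0.2 * three_year
    if five_year is not None:
        return 0.6 * five_year + 0.4 * three_year
    return three_year


# Hypothetical per-period scores (class average = 1.00).
print(overall_score(1.0, 1.2, 1.4))  # ten years of history available
print(overall_score(1.0, 1.2))       # only five years available
print(overall_score(1.0))            # only three years available
```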
Morningstar Risk (3, 5, and 10 Year): This is a comparison of a fund's risk level relative to
other funds in the Equity class. Unlike traditional risk measures, which see both greater-
than-expected and less-than-expected returns as added risk, the Morningstar proprietary
measure focuses only on the downside. Because an investor can earn a certain return from
T-bills without incurring any risk, they define any monthly return less than the T-bill as
negative. Thus, to calculate Morningstar Risk, they subtract the three-month T-bill return
from each month's return by the fund. They then total the figures for the months wherein
the figure is negative, and divide the total losses (not the number of losing months) by the
total number of months in the rating period. The average monthly loss is then compared
with those of all equity funds. The class average is calculated and set equal to 1.00. The
resulting Morningstar Risk expresses in percentage points how risky the fund is relative to
the average fund. For multiple periods, Morningstar Risk is weighted the exact same way
Morningstar Return is weighted.
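The downside-only calculation can be sketched in Python as below. The function names and sample returns are hypothetical. Two points are assumptions of this sketch rather than statements from the text: losses are reported as positive magnitudes, and the comparison with the class is taken to be a simple ratio to the class average.

```python
def average_monthly_shortfall(fund_returns, tbill_returns):
    """Average monthly loss relative to T-bills.

    Sums only the months in which the fund trailed the T-bill return and
    divides by the total number of months (not the number of losing months).
    Losses are expressed as positive magnitudes (sketch assumption).
    """
    excess = [f - t for f, t in zip(fund_returns, tbill_returns)]
    total_losses = sum(-e for e in excess if e < 0)
    return total_losses / len(excess)


def relative_risk(fund_shortfall, class_shortfalls):
    """Fund shortfall scaled so that the class average equals 1.00 (sketch assumption)."""
    class_average = sum(class_shortfalls) / len(class_shortfalls)
    return fund_shortfall / class_average


# Hypothetical four-month record with a flat 0.4% monthly T-bill return.
fund = [0.02, -0.01, 0.00, 0.03]
tbill = [0.004, 0.004, 0.004, 0.004]
shortfall = average_monthly_shortfall(fund, tbill)
print(shortfall)
print(relative_risk(shortfall, [0.0045, 0.0090, 0.0135]))
```

Note that the flat month (0.00) still counts as a loss here because it trails the T-bill return.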
Minimum Initial Purchase: the smallest investment accepted for establishing a new
account.
Shareholder Report Grade: the Morningstar evaluation of the quality of the report the
fund sends to the shareholder. Grades are A+, A, A-, B+, B, B-, C+, C, D, and F.
Date of Inception: the date on which the fund commenced operation.

Alpha: measures the difference between a fund's actual returns and its expected
performance over three years. A positive alpha indicates that the fund has performed
better than its beta would predict. The formula used by Morningstar is as follows:
$$\alpha = (\mathrm{Fund} - \mathrm{Treasury}) - [\beta \times (\mathrm{Bench} - \mathrm{Treasury})]$$
where Bench is the total return of the appropriate benchmark index, Treasury is the return
on three-month Treasury Bills, $\beta$ is the fund's beta as calculated below, and Fund is the
fund's total return.
Beta: a measure of a fund's sensitivity to market movements over the last three years. It
measures the relationship between a fund's excess return over Treasury Bills and the
excess return of the benchmark index. By definition, the beta of the benchmark index is
1.00. Accordingly, a fund with a beta of 1.10 is expected to perform 10% better, after
deducting the Treasury Bill rate, than the index in up markets and 10% worse in down
markets. Beta is defined as:
$$\beta_i = \frac{\rho_{iM}\,\sigma_i}{\sigma_M}$$
where $\beta_i$ is the beta of security i, $\sigma_i$ represents the standard deviation of the returns of
security i, $\sigma_M$ is the standard deviation of the returns of the market portfolio, and $\rho_{iM}$ is the
correlation coefficient of security i to the market portfolio.
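The alpha and beta definitions can be tied together with a small Python sketch. It assumes the standard textbook identity in which beta equals the correlation coefficient times the ratio of the security's standard deviation to the market's, built from the three quantities named in the definition. All function names and sample figures are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_std(values):
    """Sample standard deviation (n - 1 denominator)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (sample_std(xs) * sample_std(ys))


def beta(security_returns, market_returns):
    """Beta as correlation times the ratio of standard deviations (assumed identity)."""
    rho = correlation(security_returns, market_returns)
    return rho * sample_std(security_returns) / sample_std(market_returns)


def alpha(fund, bench, treasury, fund_beta):
    """Alpha per the formula in the text: excess return minus beta times benchmark excess."""
    return (fund - treasury) - fund_beta * (bench - treasury)


# Hypothetical excess returns: the fund moves 1.1 times as much as the market.
market = [0.01, -0.02, 0.03, 0.00]
fund_series = [1.1 * r for r in market]
b = beta(fund_series, market)
print(round(b, 6))
print(round(alpha(0.12, 0.10, 0.04, b), 6))
```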
R-Squared: the percentage of the fund's movements over the last three years that are
explained by movements in its benchmark index. An R-Squared of 100 means that all
movements of a fund are completely explained by movements in the index.
Standard Deviation (3 Year, 5 Year, and 10 Year): a statistical measure of the range of
performance within which a fund's total returns have fallen over the identified period.
Expense Ratio: the percentage of assets deducted each fiscal year for fund operating
expenses including 12b-1 fees, management fees, administrative fees, operating costs, and
all other asset-based costs incurred by the fund, except brokerage costs. Sales charges are
not included in the expense ratio.
Turnover Ratio: the fund's level of trading activity. This publicly reported figure is
calculated by the fund in accordance with SEC regulations. A fund divides the lesser of
purchases or sales by the fund's average monthly assets.
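As a rough illustration of the turnover definition, the following Python sketch divides the lesser of purchases or sales by average monthly assets. The function name and the dollar figures are hypothetical, and the sketch ignores any regulatory detail beyond the one-sentence description given here.

```python
def turnover_ratio(purchases, sales, monthly_assets):
    """Lesser of purchases or sales divided by average monthly assets."""
    average_assets = sum(monthly_assets) / len(monthly_assets)
    return min(purchases, sales) / average_assets


# Hypothetical figures in millions of dollars.
print(turnover_ratio(purchases=40.0, sales=25.0, monthly_assets=[90.0, 100.0, 110.0]))
```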
Net Assets: the year-end net assets of the mutual fund.
Price/Earnings Ratio: the weighted average of the price/earnings ratios of the stocks in a
fund's portfolio. Each stock's P/E ratio is calculated by dividing the current price of the
stock by its trailing 12 months' earnings per share.
Price/Book Ratio: the weighted average of the price/book ratios of all the stocks in a
fund's portfolio. The P/B ratio of a company is calculated by dividing the market price of
its stock by the company's per-share book value. Stocks with negative book values are
excluded from this calculation. In computing this average, Morningstar weights each
portfolio holding by the percentage of equity assets it represents, so that larger positions
have proportionately greater influence on the final P/B.
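A position-weighted P/B average with the exclusion rule can be sketched as follows. The function name and holdings are hypothetical. Renormalizing the weights over the included holdings, and also dropping holdings with a zero book value to avoid division by zero, are assumptions of this sketch that the text does not spell out.

```python
def weighted_price_to_book(holdings):
    """Position-weighted average P/B ratio.

    holdings: iterable of (weight, price, book_value_per_share) tuples.
    Holdings with non-positive book value are excluded, and the remaining
    weights are renormalized (both are sketch assumptions).
    """
    included = [(w, price / book) for w, price, book in holdings if book > 0]
    total_weight = sum(w for w, _ in included)
    if total_weight == 0:
        return None
    return sum(w * pb for w, pb in included) / total_weight


# Hypothetical portfolio: the third holding has a negative book value.
portfolio = [(0.5, 20.0, 10.0), (0.3, 30.0, 10.0), (0.2, 15.0, -5.0)]
print(weighted_price_to_book(portfolio))
```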
Five-Year Earnings Growth Percent: a measure of the trailing five-year annualized
earnings growth record of the stocks in the portfolio. This number is weighted such that
larger positions in the portfolio count proportionately more than lesser positions. Stocks
with losses during the past five years or stocks that lack a five-year record of
accomplishment are excluded from this calculation.
Return on Assets: a measure of the after-tax and after-debt-service profitability of the
company. This is the weighted average of the portfolio calculated from the companies'
net earnings over the trailing 12 months and their total assets.
Debt Percent of Total Capitalization: the weighted average of the amount of capital that
companies derive from long-term debt, as opposed to equity.
Median Market Capitalization: the median stock market capitalization of the companies in
an equity portfolio. This figure gives a measure of the size of the companies in which the
fund invests.
Style Box: a tool designed by Morningstar to help investors understand a fund's true
investment strategy. The Equity Style Box is a matrix with Median Market Capitalization
on the vertical axis (values: Small, Medium, and Large) and Investment Style on the
horizontal axis (values: Value-oriented, Blend, and Growth-oriented). A Growth-oriented
portfolio will mostly contain companies that the portfolio manager believes have the
potential to increase earnings faster than the rest of the market. A Value-oriented
portfolio focuses on stocks that the manager believes are undervalued in price and whose
worth the market will eventually recognize. In order to classify funds by Investment Style,
Morningstar takes a stock portfolio's average price/earnings ratio relative to the average
Standard & Poor's 500 Index and adds to it the portfolio's average price/book figure
relative to that of the S&P 500. By definition, the S&P 500 scores 2.00 under this system.
The sum of the relative ratios is then placed into one of the three style categories. Funds
with a combined value less than 1.75 are considered value funds. Portfolios with a
combined value between 1.75 and 2.25 are considered blend, and any funds above 2.25 are
classified as growth funds. For the size categorization, Morningstar classifies Median
Market Capitalizations less than $1 billion as Small company, funds greater than $1 billion
but less than $5 billion are Medium, and funds exceeding $5 billion are classified as Large.
Equity Style Boxes are useful for evaluating only the stock portion of a fund's portfolio.
Funds with a significant mix of stock, bonds, and cash may have a substantial portion of
their portfolios left out of the Equity Style Box calculation.
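The two-axis Style Box classification can be sketched in Python using the thresholds quoted above. The function names and sample inputs are hypothetical. The text does not say how exact boundary values (a score of exactly 1.75 or 2.25, or a capitalization of exactly $1 billion or $5 billion) are treated; this sketch assigns them to the middle category as an assumption.

```python
def investment_style(portfolio_pe, portfolio_pb, index_pe, index_pb):
    """Sum of relative P/E and relative P/B; the index itself scores 2.00."""
    score = portfolio_pe / index_pe + portfolio_pb / index_pb
    if score < 1.75:
        return "Value"
    if score <= 2.25:  # boundary handling is a sketch assumption
        return "Blend"
    return "Growth"


def size_category(median_market_cap_billions):
    """Size axis from median market capitalization, in billions of dollars."""
    if median_market_cap_billions < 1.0:
        return "Small"
    if median_market_cap_billions <= 5.0:  # boundary handling is a sketch assumption
        return "Medium"
    return "Large"


def style_box(portfolio_pe, portfolio_pb, index_pe, index_pb, median_market_cap_billions):
    """Return the (size, style) cell of the Equity Style Box."""
    return (
        size_category(median_market_cap_billions),
        investment_style(portfolio_pe, portfolio_pb, index_pe, index_pb),
    )


# Hypothetical small-cap fund trading at richer multiples than the index.
print(style_box(24.0, 4.4, 20.0, 4.0, 0.8))
# A portfolio identical to the index scores 2.00 and lands in Blend.
print(style_box(20.0, 4.0, 20.0, 4.0, 12.0))
```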
Sector Weightings: this statistic shows the percentage of the fund's equity assets invested
in each of the major industry classifications. For 1993 Morningstar classified these as
Natural Resources, Industrial Products, Consumer Durables, Nondurables, Retail Trade,
Utilities, Services, Transportation, Finance, and Multi-industry. For 1994, Morningstar
changed the sectors to ten industries: Utilities, Energy, Finance, Industrial Cyclicals,
Consumer Durables, Consumer Staples, Services, Retail, Health, and Technology.
Front Load Charge: the maximum sales charge of the fund expressed as a percentage of
the offering price of the fund.
Deferred Sales Charge: a percentage of the lesser of the value of the shares at the time of
purchase or their value at the time of sale.
12b-1 Fees: the maximum annual charge deducted from fund assets to pay for distribution
and marketing costs. The SEC caps the 12b-1 fee at 1.00% annually.
Cash%, Stock%, Bonds%, Preferred%, Other%, and Foreign Stock%: the percentage of
the portfolio in each of these classifications. Mutual funds sometimes redeem stock and
hold cash or other financial instruments.
Manager Tenure: the length of time in years that an individual has been the fund manager.

APPENDIX B
PHASE 1 CLASSIFICATION FEATURES
Table B.1: C4.5 Classification Features.
Sample
Classification Features
A
Year 1 Total Return, Alpha, Expense Ratio, Retail Sector, P/E Ratio, Cash%,
Beta, ROA, Yield
B
Alpha, Yield, Debt % of Total Capitalization, Expense Ratio, Return on
Assets, Turnover
C
Alpha, Median Market Capitalization, Expense Ratio, Assets, Industrial
Products Sector, Consumer Durables Sector, Yield
D
Year 1 Total Return, Yield, R-Square, P/B Ratio, Expense Ratio, Median
Market Capitalization, Assets, Natural Resources Sector
E
Alpha, Yield, Expense Ratio, Assets, YTD Total Return, Median Market
Capitalization
F
Alpha, Yield, ROA, Expense Ratio, Median Market Capitalization, Consumer
Durables Sector, PB Ratio, Cash%
G
Alpha, Yield, R-Square, Expense Ratio, Debt% of Total Cap, Median Market
Capitalization, ROA, Year 1 Total Return, R-Square, Non-Durables Sector,
P/B Ratio, YTD Total Return
H
Alpha, R-Square, Expense Ratio, Yield, Retail Sector, Assets, PB Ratio
I
Alpha, Expense Ratio, Median Market Capitalization, Debt% of Total
Capitalization
J
Alpha, Expense Ratio, YTD Total Return, Yield, ROA, Nondurables Sector,
Assets
K
Alpha, R-Square, Yield, Natural Resource Sector, Year 1 Total Return, Beta,
Expense Ratio, Assets, Manager Tenure
L
Alpha, Expense Ratio, ROA, P/B Ratio
M
Alpha, R-Square, Expense Ratio, Turnover, Median Market Capitalization,
Manager Tenure, ROA
N
Alpha, Expense Ratio, Median Market Capitalization
O
Alpha, R-Square, PE Ratio, Median Market Capitalization, Turnover, Expense
Ratio, Service Sector, Assets, P/B Ratio, Consumer Durables Sector, Non-
Durables Sector
P
Alpha, R-Square, Median Market Capitalization, Expense Ratio, Assets, Yield,
P/B Ratio, YTD Total Return, Natural Resources Sector
Q
Alpha, R-Square, Median Market Capitalization, Beta, Expense Ratio, Assets,
P/B Ratio
R
Alpha, Median Market Capitalization, Expense Ratio, ROA, Yield, Industrial
Products Sector, Turnover, Cash%
S
Alpha, R-Square, Yield, Expense Ratio, Industrial Products Sector, Beta,
ROA, Service Sector, Assets
T
Alpha, Expense Ratio, ROA, Assets, Beta, Median Market Capitalization,
Consumer Durables Sector, Yield
Table B.2: Stepwise Discriminant Analysis Classification Features.
Sample
Classification Features by Order of Selection
A
Alpha, R-Square, Beta, Assets, Turnover, Retail Sector, Year 1 Total Return,
ROA, Expense Ratio, Cash %, Debt % of Total Capitalization
B
Alpha, R-Square, Beta, Assets, Return on Assets, Turnover, Year 1 Total
Return, Yield, Debt % of Total Capitalization, Industrial Products Sector,
Consumer Durables Sector, P/E Ratio
C
Alpha, R-Square, Beta, Assets, Turnover, Expense Ratio, Retail Sector, Debt
% of Total Capitalization, ROA, Year 1 Total Return, Natural Resources
Sector, Cash %, Financials Sector, Consumer Durables Sector, Median Market
Capitalization
D
Alpha, R-Square, Beta, Services Sector, Assets, Year 1 Total Return,
Turnover, ROA, P/E Ratio, Cash %, Median Market Capitalization
E
Alpha, R-Square, Beta, Assets, Turnover, Retail Sector, Expense Ratio, Debt
% of Total Capitalization, Year 1 Total Return, Financials Sector, P/E Ratio,
YTD Total Return
F
Alpha, R-Square, Beta, Assets, Year 1 Total Return, ROA, Turnover, Retail
Sector, Services Sector
G
Alpha, R-Square, Beta, Turnover, Assets, ROA, Year 1 Total Return, Retail
Sector, Cash %, Expense Ratio, P/E Ratio, Yield
H
Alpha, R-Square, Beta, Assets, Expense Ratio, Cash %, Retail Sector, Year 1
Total Return, ROA, Consumer Durables Sector, Turnover, Median Market
Capitalization
I
Alpha, R-Square, Beta, Assets, Expense Ratio, Turnover, Year 1 Total Return,
Retail Sector, ROA, P/B Ratio, Financials Sector
J
Alpha, R-Square, Beta, Assets, Turnover, P/B Ratio, Year 1 Total Return,
Retail Sector, Expense Ratio, ROA
K
Alpha, R-Square, Beta, Assets, Return on Assets, Year 1 Total Return, Retail
Sector, Expense Ratio, Turnover, Consumer Durables, Cash %, P/E Ratio
L
Alpha, R-Square, Beta, Assets, Turnover, P/B Ratio, Year 1 Total Return,
Debt % of Total Capitalization, Expense Ratio, Cash %, Industrial Products
Sector, Nondurables Sector
M
Alpha, R-Square, Beta, Assets, Turnover, Expense Ratio, Debt % of Total
Capitalization, Cash %, Retail Sector, Yield, Year 1 Total Return, ROA
N
Alpha, R-Square, Beta, Year 1 Total Return, Turnover, Services Sector,
Median Market Capitalization, Retail Sector, Expense Ratio, P/B Ratio,
Assets, YTD Total Return, Natural Resources Sector, Financials Sector, Debt
% of Total Capitalization
O
Alpha, R-Square, Beta, Assets, Year 1 Total Return, Industrial Products
Sector, Consumer Durables Sector, Expense Ratio, Retail Sector, Turnover,
Nondurables Sector, Services Sector, P/E Ratio, Natural Resources Sector,
P/B Ratio
P
Alpha, R-Square, Beta, Assets, Services Sector, Turnover, Retail Sector,
Expense Ratio, Year 1 Total Return, P/B Ratio, Cash %, Consumer Durables
Sector, Financials Sector
Q
Alpha, R-Square, Beta, Assets, Retail Sector, Cash %, Turnover, Debt % of
Total Capitalization, Return on Assets, Year 1 Total Return
R
Alpha, R-Square, Beta, Expense Ratio, Cash %, Debt % of Total
Capitalization, Year 1 Total Return, P/B Ratio, Consumer Durables Sector,
Retail Sector, Assets
S
Alpha, R-Square, Beta, Assets, Turnover, Services Sector, Year 1 Total
Return, ROA, Debt % of Total Capitalization, Cash %, Expense Ratio, YTD
Total Return
T
Alpha, R-Square, Beta, Assets, Year 1 Total Return, Services Sector,
Turnover, Retail Sector, ROA, Consumer Durables Sector, Nondurable Sector,
Expense Ratio, Cash%
Table B.3: Logistic Regression Classification Features.
Sample
Classification Features by Order of Selection
A
Alpha, R-Square, Beta, Assets, Yield
B
Alpha, R-Square, Beta, Assets, ROA, Median Market Capitalization
C
Alpha, R-Square, Beta, Assets, Expense Ratio, Debt % of Total Capitalization,
Median Market Capitalization
D
Alpha, R-Square, Beta, Assets, Expense Ratio, Financials Sector, Natural
Resources Sector
E
Alpha, R-Square, Beta, Assets, Expense Ratio, Debt % of Total Capitalization
F
Alpha, R-Square, Beta, Assets
G
Alpha, R-Square, Beta, Assets, P/E Ratio, Expense Ratio
H
Alpha, R-Square, Beta, Assets
I
Alpha, R-Square, Beta, Expense Ratio, Debt % of Total Capitalization, Assets
J
Alpha, R-Square, Beta, Expense Ratio, Assets, Debt % of Total Capitalization
K
Alpha, R-Square, Beta, Assets, ROA, Median Market Capitalization
L
Alpha, R-Square, Beta, Assets, Debt % of Total Capitalization, Expense Ratio
M
Alpha, R-Square, Beta, Assets, Debt % of Total Capitalization, Expense Ratio
N
Alpha, R-Square, Beta, Assets, Debt % of Total Capitalization, Expense Ratio
O
Alpha, R-Square, Beta, Expense Ratio, Debt % of Total Capitalization, Assets,
P/E Ratio, Retail Sector, Natural Resources Sector
P
Alpha, R-Square, Beta, Assets, Expense Ratio, Financials Sector
Q
Alpha, R-Square, Beta, Assets, Debt % of Total Capitalization
R
Alpha, R-Square, Beta, Assets, Debt % of Total Capitalization
S
Alpha, R-Square, Beta, Assets, Debt % of Total Capitalization, Expense Ratio
T
Alpha, R-Square, Beta, Assets, Expense Ratio, Financials Sector

APPENDIX C
PHASE 2 CLASSIFICATION FEATURES
Table C.1: C4.5 Classification with Regular Features.
Sample
Classification Features
A
Alpha, R-Square, SEC Yield, Industrial Products Sector, Expense Ratio, YTD
Total Return, Services Sector, Assets, Consumer Durables Sector
B
Alpha, YTD Total Return, SEC Yield, Industrial Products Sector, Debt % of
Total Capitalization, P/E Ratio, Median Market Capitalization, Turnover,
Assets, ROA, Expense Ratio, Services Sector, Natural Resources Sector
C
Alpha, R-Square, Industrial Products Sector, ROA, P/E Ratio, Expense Ratio,
Assets, SEC Yield
D
Alpha, YTD Total Return, SEC Yield, Industrial Products Sector, Cash%,
Median Market Capitalization, Year 1 Total Return, ROA, Assets, Debt % of
Total Capitalization, Non-Durables Sector, P/B Ratio, Turnover
E
R-Square, Alpha, Year 1 Total Return, Industrial Products Sector, Beta,
Expense Ratio, Assets
F
Alpha, P/E Ratio, Median Market Capitalization, SEC Yield, Industrial Products
Sector, ROA, Services Sector, Year 1 Total Return, Assets, Beta, Cash%,
Natural Resources Sector
G
Alpha, SEC Yield, Industrial Products Sector, ROA, Assets, Median Market
Capitalization, R-Square
H
Alpha, SEC Yield, Debt % of Total Capitalization, ROA, P/E Ratio, Services
Sector, Assets, Natural Resources Sector
I
Natural Resources Sector, Alpha, R-Square, SEC Yield, Industrial Products
Sector, Expense Ratio, Turnover, Beta, ROA
J
Alpha, R-Square, Industrial Products Sector, Services Sector, Expense Ratio,
Debt % of Total Capitalization, Median Market Capitalization, Assets
K
Alpha, R-Square, Debt % of Total Capitalization, Industrial Products Sector,
Assets, Expense Ratio, Year 1 Total Return, Median Market Capitalization
L
Alpha, R-Square, Industrial Products Sector, SEC Yield, Debt % of Total
Capitalization, Assets, ROA
M
Alpha, SEC Yield, Industrial Products Sector, Assets, Natural Resources
Sector, Median Market Capitalization
N
Alpha, R-Square, Beta, Debt % of Total Capitalization, ROA, P/E Ratio,
Cash%, Expense Ratio, Assets, Year 1 Total Return, Consumer Durables
Sector, Median Market Capitalization, YTD Total Return, Finance Sector
O
Alpha, SEC Yield, Debt % of Total Capitalization, P/E Ratio, Assets, Median
Market Capitalization
P
Alpha, R-Square, Industrial Products Sector, P/B Ratio, SEC Yield, Consumer
Durables Sector, Assets, Manager Tenure, Expense Ratio
Q
Alpha, R-Square, Debt % of Total Capitalization, P/B Ratio, Expense Ratio,
Median Market Capitalization, Assets
R
R-Square, Alpha, Industrial Products Sector, ROA, Turnover, P/E Ratio,
Expense Ratio, Assets, Median Market Capitalization
S
Alpha, Cash%, Industrial Products Sector, Year 1 Total Return, Expense
Ratio, ROA, Retail Trade Sector, Assets
T
Alpha, R-Square, Industrial Products Sector, SEC Yield, P/E Ratio, Year 1
Total Return, Turnover, Assets, Median Market Capitalization, Natural
Resources Sector
Table C.2: C4.5 Classification with Derived Features.
Sample
Classification Features
A
Return-Risk, Reversed Weight Return-Risk, Expense Ratio, Assets, SEC Yield,
Finance Sector
B
Return-Risk, SEC Yield, Assets, Reversed Weight Return-Risk
C
Return-Risk, Industrial Products Sector, Year 1 Total Return, Reversed Weight
Return-Risk, Assets, Consumer Durables Sector, SEC Yield, Finance Sector,
Median Market Capitalization, R-Square
D
Return-Risk, Assets
E
Return-Risk, Industrial Products Sector, Expense Ratio, Reversed Weight
Return-Risk, Manager Tenure, Beta, Turnover, Services Sector
F
Return-Risk, Assets, SEC Yield, Debt % of Total Capitalization
G
Return-Risk, Assets
H
Return-Risk, Reversed Weight Return-Risk, Assets, Beta
I
Return-Risk, SEC Yield, Assets
J
Return-Risk, Assets, Beta, Median Market Capitalization, P/E Ratio
K
Return-Risk, Assets
L
Return-Risk, Assets, Debt % of Total Capitalization
M
Return-Risk, Assets
N
Return-Risk, P/B Ratio
O
Return-Risk, Turnover, Retail Trade Sector, Assets, Expense Ratio, Non-
Durables Sector, Assets, SEC Yield, Consumer Durables
P
Return-Risk, Assets
Q
Return-Risk, Treynor Index, R-Square, Beta, Alpha, Consumer Durables
Sector, SEC Yield, Assets, Debt % of Total Capitalization
R
Return-Risk, Assets, Treynor Index, Reversed Weight Return-Risk, SEC Yield
S
Return-Risk, Cash%, SEC Yield, Assets
T
Return-Risk, Debt % of Total Capitalization, Turnover
Table C.3: Stepwise Discriminant Analysis with Regular Features.
Sample
Classification Features by Order of Selection
A
Alpha, R-Square, Beta, Natural Resources Sector, Industrial Products Sector,
Assets, P/B Ratio, Services Sector, Consumer Durables Sector, Cash %,
Expense Ratio, P/E Ratio, Retail Sector, Debt % of Total Capitalization
B
Alpha, R-Square, Beta, Natural Resources Sector, Debt % of Total
Capitalization, Assets, Industrial Products Sector, Turnover, Financials Sector,
P/B Ratio, Cash %, Retail Sector, Consumer Durables Sector, Expense Ratio,
YTD Total Return, Year 1 Total Return, P/E Ratio, Services Sector, ROA
C
Alpha, R-Square, Beta, Industrial Products Sector, Assets, Debt % of Total
Capitalization, YTD Total Return, Cash %, P/E Ratio, Financials Sector,
Turnover
D
Alpha, R-Square, Beta, Assets, Debt % of Total Capitalization, YTD Total
Return, Industrial Products Sector, Turnover, Retail Sector, Year 1 Total
Return, Yield
E
Alpha, R-Square, Beta, Assets, Industrial Products Sector, P/B Ratio, YTD
Total Return, Debt % of Total Capitalization, Year 1 Total Return, Cash %,
Expense Ratio, ROA, Nondurables Sector, Yield, Manager Tenure
F
Alpha, R-Square, Beta, Industrial Products Sector, Assets, P/E Ratio,
Financials Sector, Debt % of Total Capitalization, YTD Total Return, Year 1
Total Return, Retail Sector, Expense Ratio, Cash %
G
Alpha, R-Square, Beta, Assets, Industrial Products Sector, Natural Resources
Sector, Services Sector, Cash %, Consumer Durables Sector, P/B Ratio, Retail
Sector, P/E Ratio, ROA, Debt % of Total Capitalization
H
Alpha, R-Square, Beta, Natural Resources Sector, Debt % of Total
Capitalization, Services Sector, Expense Ratio, Industrial Products Sector, P/B
Ratio, YTD Total Return, Turnover, Year 1 Total Return, Consumer Durables
Sector, ROA
I
Alpha, R-Square, Beta, Natural Resources Sector, Industrial Products Sector,
Assets, Retail Sector, Debt % of Total Capitalization, Turnover, YTD Total
Return, P/B Ratio, Manager Tenure
J
Alpha, R-Square, Beta, YTD Total Return, Assets, Debt % of Total
Capitalization, Expense Ratio, Retail Sector, Year 1 Total Return, Industrial
Products Sector, Cash %, P/B Ratio
K
Alpha, R-Square, Beta, Debt % of Total Capitalization, Industrial Products
Sector, Assets, YTD Total Return, Year 1 Total Return, P/B Ratio, Retail
Sector, ROA, Turnover, Manager Tenure, Financials Sector, Services Sector,
Expense Ratio
L
Alpha, R-Square, Beta, Assets, Industrial Products Sector, Debt % of Total
Capitalization, YTD Total Return, Cash %, Services Sector, Retail Sector, P/E
Ratio, Expense Ratio
M
Alpha, R-Square, Beta, Assets, Industrial Products Sector, Debt % of Total
Capitalization, Turnover, YTD Total Return, P/B Ratio, ROA, Year 1 Total
Return, Retail Sector
N
Alpha, R-Square, Beta, Assets, P/B Ratio, Industrial Products Sector,
Turnover, ROA, Debt % of Total Capitalization, YTD Total Return, Year 1
Total Return, Services Sector, Cash %, Manager Tenure, P/E Ratio, Financials
Sector
O
Alpha, R-Square, Beta, Industrial Products Sector, YTD Total Return, Debt
% of Total Capitalization, Assets, P/B Ratio, Retail Sector, Turnover, P/E
Ratio, ROA, Year 1 Total Return, Financials Sector, Manager Tenure, Median
Market Capitalization
P
Alpha, R-Square, Beta, Assets, Industrial Products Sector, YTD Total Return,
Debt % of Total Capitalization, Year 1 Total Return, Retail Sector, Turnover,
P/E Ratio, Financials Sector, Manager Tenure, Cash %, Consumer Durables
Sector
Q
Alpha, R-Square, Beta, Assets, Industrial Products Sector, P/E Ratio,
Financials Sector, Turnover, Debt % of Total Capitalization, YTD Total
Return, Retail Sector, Year 1 Total Return, Cash %
R
Alpha, R-Square, Beta, Industrial Products Sector, Assets, P/B Ratio, YTD
Total Return, Turnover, Debt % of Total Capitalization, Cash %, Expense
Ratio, Consumer Durables Sector
S
Alpha, R-Square, Beta, Cash %, P/B Ratio, Assets, Industrial Products Sector,
Retail Sector, YTD Total Return, Debt % of Total Capitalization, Year 1
Total Return, Expense Ratio, SEC Yield
T
Alpha, R-Square, Beta, YTD Total Return, P/B Ratio, Assets, Industrial
Products Sector, Debt % of Total Capitalization, Turnover, Financials Sector,
Services Sector, Natural Resources Sector, Consumer Durables Sector, Cash
%
Table C.4: Stepwise Discriminant Analysis with Derived Features
Sample
Classification Features by Order of Selection
A
Return - Risk, R-Square, Industrial Products Sector, Alpha, RevWgtReturn -
Risk, Beta, Assets, Treynor Index, Expense Ratio, ROA, Debt % of Total
Capitalization, Turnover, Consumer Durables Sector
B
Return - Risk, R-Square, Industrial Products Sector, Assets, ROA,
RevWgtReturn - Risk, Alpha, Beta, Debt % of Total Capitalization, Turnover,
Treynor Index, Cash %, Expense Ratio, SEC Yield, Retail Sector, Consumer
Durables Sector, Services Sector
C
Return - Risk, R-Square, Industrial Products Sector, Treynor Index, YTD
Total Return, ROA, Assets, Expense Ratio, Year 1 Total Return, Alpha,
RevWgtReturn - Risk, Beta, Debt % of Total Capitalization, Natural
Resources Sector, Cash %, Manager Tenure, SEC Yield
D
Return - Risk, R-Square, Assets, RevWgtReturn - Risk, Alpha, Beta, Expense
Ratio, Turnover, Industrial Products Sector, Retail Sector, SEC Yield, ROA,
Year 1 Total Return, Debt % of Total Capitalization
E
Return - Risk, R-Square, Industrial Products Sector, Assets, Alpha,
RevWgtReturn - Risk, Beta, Expense Ratio, Treynor Index, ROA, Debt % of
Total Capitalization, YTD Total Return, P/B Ratio, Cash %, Manager Tenure,
Consumer Durables Sector, Services Sector, Natural Resources Sector
F
Return - Risk, R-Square, Industrial Products Sector, Assets, ROA, Expense
Ratio, Alpha, RevWgtReturn - Risk, Beta, Debt % of Total Capitalization,
Retail Sector, Cash %, Treynor Index, Natural Resources Sector, P/E Ratio
G
Return - Risk, R-Square, YTD Total Return, Assets, ROA, Industrial Products
Sector, Expense Ratio, Alpha, RevWgtReturn - Risk, Beta, Treynor Index,
Cash %, Services Sector, Natural Resources Sector, Consumer Durables
Sector
H
Return - Risk, R-Square, YTD Total Return, Industrial Products Sector,
Assets, Treynor Index, Beta, Services Sector, Turnover, RevWgtReturn -
Risk, Alpha, Expense Ratio, Debt % of Total Capitalization, ROA, Manager
Tenure, Consumer Durables Sector
I
Return - Risk, R-Square, Industrial Products Sector, Alpha, RevWgtReturn -
Risk, Beta, Assets, Expense Ratio, Turnover, Retail Sector, Consumer
Durables Sector, Debt % of Total Capitalization, ROA, P/B Ratio, Year 1
Total Return
J
Return - Risk, R-Square, Assets, Expense Ratio, Industrial Products Sector,
Alpha, RevWgtReturn - Risk, Beta, Retail Sector, Turnover, Treynor Index,
Consumer Durables Sector, Services Sector, P/E Ratio
K
R-Square, Return - Risk, Alpha, RevWgtReturn - Risk, Industrial Products
Sector, Debt % of Total Capitalization, Assets, Turnover, Beta, Median
Market Capitalization, Expense Ratio, Manager Tenure, ROA, P/B Ratio,
Year 1 Total Return, YTD Total Return, Financials Sector
L
Return - Risk, R-Square, Industrial Products Sector, Assets, ROA, Debt % of
Total Capitalization, Alpha, RevWgtReturn - Risk, Beta, Cash %, Expense
Ratio, Treynor Index, Nondurables Sector, Manager Tenure, P/B Ratio
M
Return - Risk, R-Square, P/B Ratio, Industrial Products Sector, Assets,
RevWgtReturn - Risk, Alpha, Beta, ROA, Debt % of Total Capitalization,
Turnover, Treynor Index, Expense Ratio, Year 1 Total Return
N
Return - Risk, R-Square, Industrial Products Sector, Alpha, RevWgtReturn -
Risk, Beta, Assets, Expense Ratio, Turnover, ROA, P/B Ratio, Debt % of
Total Capitalization, Manager Tenure, Cash %, P/E Ratio
O
Return - Risk, R-Square, Industrial Products Sector, Alpha, RevWgtReturn -
Risk, Beta, ROA, Assets, Expense Ratio, Turnover, Natural Resources Sector,
Consumer Durables Sector, Manager Tenure, P/B Ratio, Debt % of Total
Capitalization, P/E Ratio, Retail Sector, Cash %, Median Market
Capitalization
P
Return - Risk, R-Square, Industrial Products Sector, YTD Total Return,
Assets, Alpha, RevWgtReturn - Risk, Beta, ROA, Debt % of Total
Capitalization, Retail Sector, Treynor Index, Expense Ratio, Manager Tenure,
Turnover, Services Sector, Cash %
Q
Return - Risk, R-Square, YTD Total Return, Industrial Products Sector, ROA,
Assets, Debt % of Total Capitalization, Turnover, RevWgtReturn - Risk,
Alpha, Beta, Expense Ratio, Treynor Index, Cash %, Retail Sector, Manager
Tenure
R
Return - Risk, R-Square, Industrial Products Sector, Assets, Alpha,
RevWgtReturn - Risk, Beta, Turnover, Cash %, ROA, Consumer Durables
Sector, Expense Ratio, Debt % of Total Capitalization, P/B Ratio

Table C.4 (continued).
Sample
Classification Features by Order of Selection
S
Return - Risk, R-Square, Industrial Products Sector, Assets, Cash %, Beta,
Alpha, RevWgtReturn - Risk, Retail Sector, Debt % of Total Capitalization,
Expense Ratio, Consumer Durables Sector, Manager Tenure, SEC Yield
T
Return - Risk, R-Square, Industrial Products Sector, Assets, Turnover, SEC
Yield, Beta, Alpha, RevWgtReturn - Risk, Services Sector, Consumer
Durables Sector, Debt % of Total Capitalization, Financials Sector
Table C.5: Logistic Regression Analysis with Regular Features
Sample
Classification Features by Order of Selection
A
Alpha, R-Square, Beta, Assets, P/E Ratio, Industrial Products Sector
B
Alpha, R-Square, Beta, Assets, Industrial Products Sector, YTD Total Return,
ROA, Retail Sector, Consumer Durables Sector
C
Alpha, R-Square, Beta, Assets, Debt % of Total Capitalization, Consumer
Durables Sector
D
Alpha, R-Square, Beta, Assets, Debt % of Total Capitalization, YTD Total
Return, ROA
E
Alpha, R-Square, Beta, Assets, Consumer Durables Sector, Debt % of Total
Capitalization, Manager Tenure, SEC Yield
F
Alpha, R-Square, Beta, Assets, ROA, YTD Total Return
G
Alpha, R-Square, Beta, Assets, Debt % of Total Capitalization, Consumer
Durables Sector
H
Alpha, R-Square, Beta, ROA, Expense Ratio, Assets, YTD Total Return, Year
1 Total Return
I
Alpha, R-Square, Beta, Assets, Debt % of Total Capitalization, Retail Sector,
Consumer Durables Sector, P/B Ratio
J
Alpha, R-Square, Beta, Assets, Expense Ratio, ROA, YTD Total Return
K
Alpha, R-Square, Beta, Debt % of Total Capitalization, Expense Ratio, YTD
Total Return, Assets, ROA
L
Alpha, R-Square, Beta, Assets, Debt % of Total Capitalization, Expense Ratio,
YTD Total Return, P/B Ratio
M
Alpha, R-Square, Beta, Debt % of Total Capitalization, Assets, YTD Total
Return
N
Alpha, R-Square, Beta, Consumer Durables Sector, ROA, Assets, YTD Total
Return, Industrial Products Sector
O
Alpha, R-Square, Beta, Assets, YTD Total Return, ROA, Industrial Products
Sector
P
Alpha, R-Square, Beta, Assets, ROA, YTD Total Return, Manager Tenure

Table C.5 (continued).
Sample
Classification Features by Order of Selection
Q
Alpha, R-Square, Beta, Debt % of Total Capitalization, Assets, Expense Ratio,
P/E Ratio, Manager Tenure
R
Alpha, R-Square, Beta, Assets, Debt % of Total Capitalization, YTD Total
Return, P/B Ratio, Consumer Durables Sector
S
Alpha, R-Square, Beta, ROA, Assets, Consumer Durables Sector, Median
Market Capitalization
T
Alpha, R-Square, Beta, ROA, Assets, YTD Total Return, Expense Ratio
Table C.6: Logistic Regression Analysis with Derived Features.
Sample
Classification Features by Order of Selection
A
Return - Risk, Beta, Manager Tenure, Natural Resources Sector
B
Return - Risk, Beta, Manager Tenure
C
Return - Risk, P/B Ratio, Assets, Expense Ratio
D
Return - Risk, Beta, Assets, Expense Ratio
E
Return - Risk, Assets, Expense Ratio, Beta, Natural Resources Sector,
Manager Tenure, Consumer Durables Sector
F
Return - Risk, Beta, Assets, Expense Ratio, Natural Resources Sector
G
Return - Risk, Beta, Assets, Expense Ratio, Natural Resources Sector,
Manager Tenure
H
Return - Risk, Beta, Manager Tenure, Debt % of Total Capitalization
I
Return - Risk, Beta, Expense Ratio, Assets, Retail Sector, Consumer Durables
Sector
J
Return - Risk, Beta, Manager Tenure
K
Return - Risk, Beta, Manager Tenure, Natural Resources Sector, R-Square,
Alpha
L
Return - Risk, Assets, Beta, Manager Tenure, Expense Ratio, Natural
Resources Sector
M
Return - Risk, Beta, Assets, Manager Tenure, Natural Resources Sector,
Expense Ratio
N
Return - Risk, Beta, Assets, R-Square
O
Return - Risk, Beta, Manager Tenure, Natural Resources Sector
P
Return - Risk, Beta, Manager Tenure, Expense Ratio, Assets
Q
Return - Risk, Beta, Manager Tenure, Natural Resources Sector
R
Return - Risk, Beta, Assets, R-Square, Consumer Durables Sector

Table C.6 (continued).
Sample
Classification Features by Order of Selection
S
Return - Risk, Beta, Natural Resources Sector, Manager Tenure
T
Return - Risk, Beta, Natural Resources Sector, Manager Tenure, R-Square
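The selections in Table C.6 can be summarized by counting how often each feature enters the logistic regression model across the twenty samples. The following is a minimal sketch of that tally, not part of the original analysis; the rows are transcribed from the table, with the sample labels I, O and Q read from their alphabetical positions, and the variable names are illustrative assumptions.

```python
from collections import Counter

# Rows transcribed from Table C.6 (sample label: features in order of selection).
TABLE_C6 = """\
A: Return - Risk, Beta, Manager Tenure, Natural Resources Sector
B: Return - Risk, Beta, Manager Tenure
C: Return - Risk, P/B Ratio, Assets, Expense Ratio
D: Return - Risk, Beta, Assets, Expense Ratio
E: Return - Risk, Assets, Expense Ratio, Beta, Natural Resources Sector, Manager Tenure, Consumer Durables Sector
F: Return - Risk, Beta, Assets, Expense Ratio, Natural Resources Sector
G: Return - Risk, Beta, Assets, Expense Ratio, Natural Resources Sector, Manager Tenure
H: Return - Risk, Beta, Manager Tenure, Debt % of Total Capitalization
I: Return - Risk, Beta, Expense Ratio, Assets, Retail Sector, Consumer Durables Sector
J: Return - Risk, Beta, Manager Tenure
K: Return - Risk, Beta, Manager Tenure, Natural Resources Sector, R-Square, Alpha
L: Return - Risk, Assets, Beta, Manager Tenure, Expense Ratio, Natural Resources Sector
M: Return - Risk, Beta, Assets, Manager Tenure, Natural Resources Sector, Expense Ratio
N: Return - Risk, Beta, Assets, R-Square
O: Return - Risk, Beta, Manager Tenure, Natural Resources Sector
P: Return - Risk, Beta, Manager Tenure, Expense Ratio, Assets
Q: Return - Risk, Beta, Manager Tenure, Natural Resources Sector
R: Return - Risk, Beta, Assets, R-Square, Consumer Durables Sector
S: Return - Risk, Beta, Natural Resources Sector, Manager Tenure
T: Return - Risk, Beta, Natural Resources Sector, Manager Tenure, R-Square
"""

# Parse each row into an ordered list of feature names.
selections = {}
for line in TABLE_C6.strip().splitlines():
    sample, features = line.split(": ", 1)
    selections[sample] = features.split(", ")

# How many samples select each feature, and which feature is selected first.
counts = Counter(f for feats in selections.values() for f in feats)
first_picks = Counter(feats[0] for feats in selections.values())

for feature, n in counts.most_common():
    print(f"{feature}: {n}/{len(selections)}")
```

The same tally could be applied to any of the other tables in Appendices C and D by substituting the transcribed rows.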

APPENDIX D
PHASE 3 CLASSIFICATION FEATURES
Table D.1: C4.5 Classification Features for Dataset 1.
Sample
Classification Features
A
P/E Ratio, Assets, Year 1 Total Return, P/E Ratio, Services Sector, Foreign
Stock %
B
P/E Ratio, Assets, Consumer Staples Sector, Year 1 Total Return, Industrial
Cyclicals Sector, Median Market Capitalization, Foreign Stock %, 3 Month
Total Return
C
P/E Ratio, Year 1 Total Return, Assets, Median Market Capitalization, P/B
Ratio, Financials Sector, ROA, Technology Sector, Manager Tenure, Expense
Ratio
D
P/E Ratio, Year 1 Total Return, Assets, Utilities Sector, Median Market
Capitalization, Foreign Stock %, 6 Month Total Return, Income Ratio
E
Assets, P/E Ratio, Year 1 Total Return, Utilities Sector, Expense Ratio,
Foreign Stock %, Industrial Cyclicals Sector, 3 Month Total Return, Income
Ratio, Manager Tenure
F
P/E Ratio, Assets, Year 1 Total Return, Expense Ratio, Foreign Stock %,
Utilities Sector, Median Market Capitalization
G
Industrial Cyclicals Sector, Assets, Distributed Yield, ROA, Energy Sector,
Consumer Durables Sector, Year 1 Total Return, P/E Ratio, Retail Sector, 3
Month Total Return, Utilities Sector, Median Market Capitalization, Cash %,
Turnover, Health Sector, Expense Ratio, Foreign Stock %, P/B Ratio
H
Assets, Industrial Cyclicals Sector, Year 1 Total Return, Foreign Stock %, P/E
Ratio, Health Sector, Expense Ratio
I
P/E Ratio, Assets, Median Market Capitalization, Year 1 Total Return,
Income Ratio, Foreign Stock %, Expense Ratio, Distributed Yield, Health
Sector, Stocks %
J
Assets, P/E Ratio, Year 1 Total Return, Expense Ratio, Foreign Stock %, 6
Month Total Return, Median Market Capitalization, Debt % of Total
Capitalization

Table D.2: Stepwise Discriminant Analysis Classification Features for Dataset 1.
Sample
Classification Features by Order of Selection
A
Year 1 Total Return, Foreign Stock %, Assets, P/E Ratio, Style, Expense
Ratio, Utilities Sector, 6 Month Total Return, Financial Sector, Technology
Sector, Debt % of Total Capitalization, 3 Month Total Return, Retail Sector,
Manager Tenure, Stocks %, Cash %
B
P/E Ratio, Year 1 Total Return, Foreign Stock %, Assets, Style, Expense
Ratio, 6 Month Total Return, Industrial Cyclicals Sector, Stocks %, Maximum
Sales Charge, Health Sector, Consumer Staples Sector, Income Ratio,
Services Sector, Manager Tenure, Retail Sector, Median Market Capitalization
C
P/E Ratio, Year 1 Total Return, Foreign Stock %, Assets, Style, Expense
Ratio, Industrial Cyclicals Sector, Consumer Staples Sector, Turnover, 6
Month Total Return, Services Sector, Debt % of Total Capitalization,
Maximum Sales Charge, 3 Month Total Return, Financial Sector, Technology
Sector, Retail Sector, Utilities Sector
D
P/E Ratio, Year 1 Total Return, Expense Ratio, Assets, Style, Foreign Stock
%, Consumer Staples Sector, Industrial Cyclicals Sector, Income Ratio, 6
Month Total Return, Financial Sector, Debt % of Total Capitalization, Maximum
Sales Charge, Retail Sector, Services Sector, Manager Tenure
E
P/E Ratio, Year 1 Total Return, Expense Ratio, Assets, Industrial Cyclicals
Sector, Style, Foreign Stock %, Financial Sector, Debt % of Total
Capitalization, Maximum Sales Charge, Services Sector, Median Market
Capitalization, Manager Tenure, 6 Month Total Return, Utilities Sector,
Health Sector, ROA
F
Industrial Cyclicals Sector, Year 1 Total Return, Foreign Stock %, Median
Market Capitalization, Assets, Expense Ratio, P/E Ratio, 6 Month Total
Return, Debt % of Total Capitalization, Income Ratio, Services Sector,
Energy Sector, Health Sector, Maximum Sales Charge
G
P/E Ratio, Year 1 Total Return, Expense Ratio, Industrial Cyclicals Sector,
Style, Assets, Foreign Stock %, Debt % of Total Capitalization, 3 Month
Total Return, Consumer Durables Sector, Turnover, Technology Sector,
Health Sector, Manager Tenure, Retail Sector
H
P/E Ratio, Year 1 Total Return, Expense Ratio, Industrial Cyclicals Sector,
Assets, Style, Foreign Stock %, 6 Month Total Return, Debt % of Total
Capitalization, Technology Sector, Retail Sector, Financial Sector, Utilities
Sector, Maximum Sales Charge, Manager Tenure, Consumer Durables Sector, Stocks
%, Cash %
I
P/E Ratio, Year 1 Total Return, Foreign Stock %, Style, Assets, Expense
Ratio, Industrial Cyclicals Sector, 6 Month Total Return, Debt % of Total
Capitalization, Stocks %, Retail Sector, Turnover, Maximum Sales Charge,
Consumer Staples Sector

Table D.2 (continued).
Sample
Classification Features by Order of Selection
J
Expense Ratio, P/E Ratio, Year 1 Total Return, Assets, Style, Foreign Stock
%, 6 Month Total Return, Debt % of Total Capitalization, Industrial Cyclicals
Sector, Maximum Sales Charge, Consumer Staples Sector, Manager Tenure,
Turnover, Services Sector, Retail Sector, Income Ratio
Table D.3: Logistic Regression Classification Features for Dataset 1.
Sample
Classification Features by Order of Selection
A
Year 1 Total Return, Foreign Stock %, Assets, P/E Ratio, Utilities Sector,
Technology Sector, Financials Sector, Expense Ratio, Debt % of Total
Capitalization, 6 Month Total Return, Maximum Sales Charge
B
Year 1 Total Return, Foreign Stock %, Assets, P/E Ratio, Maximum Sales
Charge, Expense Ratio, Industrial Cyclicals Sector, Health Sector, 6 Month
Total Return, Stocks %, Services Sector, Median Market Capitalization
C
Year 1 Total Return, Foreign Stock %, Assets, P/E Ratio, Consumer Staples
Sector, Industrial Cyclicals Sector, Maximum Sales Charge, Expense Ratio, 6
Month Total Return, Debt % of Total Capitalization, Services Sector, Stocks
%
D
Year 1 Total Return, P/E Ratio, Expense Ratio, Foreign Stock %, Consumer
Staples Sector, Maximum Sales Charge, Industrial Cyclicals Sector, 6 Month
Total Return, Debt % of Total Capitalization, Financials Sector
E
Assets, Year 1 Total Return, Foreign Stock %, Industrial Cyclicals Sector,
Expense Ratio, Median Market Capitalization, Services Sector, Health Sector,
Maximum Sales Charge, 6 Month Total Return, Debt % of Total
Capitalization, Financials Sector
F
Year 1 Total Return, Foreign Stock %, Assets, P/E Ratio, Industrial Cyclicals
Sector, Maximum Sales Charge, Median Market Capitalization, Expense
Ratio, Utilities Sector, 6 Month Total Return, Financials Sector, Debt % of
Total Capitalization
G
Year 1 Total Return, Foreign Stock %, Industrial Cyclicals Sector, Assets, P/E
Ratio, Median Market Capitalization, Expense Ratio, Utilities Sector, 3 Month
Total Return, Financials Sector, Maximum Sales Charge, Debt % of Total
Capitalization, ROA
H
P/E Ratio, Year 1 Total Return, Expense Ratio, Industrial Cyclicals Sector,
Foreign Stock %, Assets, 6 Month Total Return, Debt % of Total
Capitalization, Maximum Sales Charge, Consumer Staples Sector, Health
Sector, Energy Sector, Services Sector

Table D.3 (continued).
Sample
Classification Features by Order of Selection
I
Year 1 Total Return, Foreign Stock %, Assets, P/E Ratio, Utilities Sector,
Financials Sector, 6 Month Total Return, Expense Ratio, Stocks %, Debt % of
Total Capitalization, Technology Sector, Maximum Sales Charge
J
Year 1 Total Return, Foreign Stock %, Assets, P/E Ratio, Expense Ratio,
Median Market Capitalization, Utilities Sector, Financials Sector, 6 Month
Total Return, Maximum Sales Charge, Debt % of Total Capitalization,
Technology Sector
Table D.4: C4.5 Classification Features for Dataset 2.
Sample
Classification Features
A
Alpha, R-Square, Assets, Distributed Yield, P/E Ratio, Stocks %, Median
Market Capitalization
B
Alpha, R-Square, P/B Ratio, Assets
C
R-Square, Alpha, Assets, 6 Month Total Return, Foreign Stock %, ROA,
Turnover
D
R-Square, Alpha, P/E Ratio, Assets, Median Market Capitalization
E
R-Square, Alpha, Assets, Minimum Purchase
F
R-Square, Alpha, Assets, Distributed Yield, P/E Ratio, Minimum Purchase
G
R-Square, Alpha, Assets, Financial Sector, Foreign Stock %, Technology
Sector, Debt % of Total Capitalization, Consumer Durables Sector, ROA,
Energy Sector
H
R-Square, Alpha, Assets, Turnover, P/E Ratio, Turnover, Financial Sector,
Expense Ratio, 6 Month Total Return, Health Sector, Consumer Durables
Sector, Technology Sector, Minimum Purchase
I
R-Square, Alpha, Assets, Median Market Capitalization, Turnover
J
R-Square, Alpha, Assets, P/E Ratio, Expense Ratio
Table D.5: Stepwise Discriminant Analysis Classification Features for Dataset 2.
Sample
Classification Features by Order of Selection
A
Alpha, R-Square, Industrial Cyclicals Sector, Assets, Year 1 Total Return,
Utilities Sector, P/E Ratio, Stocks %, Expense Ratio, Turnover, Manager
Tenure, Cash %, Retail Sector

Table D.5 (continued).
Sample
Classification Features by Order of Selection
B
Alpha, R-Square, Industrial Cyclicals Sector, Assets, P/E Ratio, Expense
Ratio, Year 1 Total Return, Maximum Sales Charge, 6 Month Total Return,
Stocks %, Retail Sector, Utilities Sector, Manager Tenure, Consumer Staples
Sector, Turnover, Income Ratio
C
Alpha, R-Square, Industrial Cyclicals Sector, Assets, P/B Ratio, Year 1 Total
Return, Utilities Sector, P/E Ratio, Expense Ratio, Health Sector, Maximum
Sales Charge, Turnover, 6 Month Total Return, Foreign Stock %, Financial
Sector, Services Sector, Debt % of Total Capitalization, Manager Tenure,
Retail Sector, Stocks %
D
Alpha, R-Square, Industrial Cyclicals Sector, Assets, Year 1 Total Return,
Income Ratio, P/E Ratio, 6 Month Total Return, Expense Ratio, Turnover,
Health Sector, Debt % of Total Capitalization, Maximum Sales Charge,
Manager Tenure, Services Sector, Retail Sector, Utilities Sector, Consumer
Staples Sector
E
Alpha, R-Square, Industrial Cyclicals Sector, Assets, Year 1 Total Return,
Utilities Sector, P/E Ratio, Maximum Sales Charge, Health Sector, Manager
Tenure, Retail Sector, Foreign Stock %, Debt % of Total Capitalization,
Technology Sector, 6 Month Total Return, P/B Ratio, Style, Consumer
Durables Sector
F
Alpha, R-Square, Industrial Cyclicals Sector, Year 1 Total Return, Assets, P/E
Ratio, Utilities Sector, Expense Ratio, Median Market Capitalization, 6 Month
Total Return, Health Sector, Debt % of Total Capitalization, Retail Sector,
Maximum Sales Charge, Technology Sector, P/B Ratio
G
Alpha, R-Square, Industrial Cyclicals Sector, Assets, Debt % of Total
Capitalization, Year 1 Total Return, Expense Ratio, Turnover, Health Sector,
Foreign Stock %, Technology Sector, Retail Sector, Manager Tenure, Style,
Consumer Durables Sector
H
Alpha, R-Square, Industrial Cyclicals Sector, Assets, P/E Ratio, Year 1 Total
Return, Utilities Sector, Maximum Sales Charge, Turnover, Retail Sector,
Manager Tenure, 6 Month Total Return, Services Sector, Income Ratio,
Foreign Stock %, Expense Ratio
I
Alpha, R-Square, Industrial Cyclicals Sector, Assets, Expense Ratio, Year 1
Total Return, Stocks %, P/E Ratio, Income Ratio, Maximum Sales Charge,
Turnover, Style, Manager Tenure, Retail Sector, Debt % of Total
Capitalization, 6 Month Total Return
J
Alpha, R-Square, Industrial Cyclicals Sector, Assets, Year 1 Total Return, 6
Month Total Return, Expense Ratio, Utilities Sector, Maximum Sales Charge,
Stocks %, P/E Ratio, Health Sector, Turnover, Manager Tenure, Income
Ratio, P/B Ratio, Retail Sector

Table D.6: Logistic Regression Classification Features for Dataset 2.
Sample
Classification Features by Order of Selection
A
Alpha, R-Square, Industrial Cyclicals Sector, Assets, Year 1 Total Return,
Manager Tenure, Expense Ratio, P/E Ratio, Stocks %, Services Sector
B
Alpha, R-Square, Industrial Cyclicals Sector, Assets, P/E Ratio, Maximum
Sales Charge, Stocks %, Retail Sector, Year 1 Total Return, Expense Ratio
C
Alpha, R-Square, Industrial Cyclicals Sector, Assets, P/B Ratio, Maximum
Sales Charge, Year 1 Total Return, P/E Ratio, Health Sector, Expense Ratio,
Stocks %, Financials Sector
D
Alpha, R-Square, Industrial Cyclicals Sector, Assets, Year 1 Total Return, P/E
Ratio, Expense Ratio, Health Sector, Maximum Sales Charge, Stocks %, 6
Month Total Return, ROA
E
Alpha, R-Square, Industrial Cyclicals Sector, Assets, Maximum Sales Charge,
Services Sector, P/E Ratio, Health Sector, Year 1 Total Return, Manager
Tenure, Debt % of Total Capitalization
F
Alpha, R-Square, Industrial Cyclicals Sector, Assets, Year 1 Total Return, P/E
Ratio, Maximum Sales Charge, Health Sector, Expense Ratio
G
Alpha, R-Square, Industrial Cyclicals Sector, Assets, Year 1 Total Return, P/E
Ratio, Expense Ratio, Health Sector, Debt % of Total Capitalization,
Maximum Sales Charge, Foreign Stock %, Style, Income Ratio
H
Alpha, R-Square, Industrial Cyclicals Sector, Assets, P/E Ratio, Maximum
Sales Charge, Year 1 Total Return, Manager Tenure, Health Sector, Utilities
Sector
I
Alpha, R-Square, Industrial Cyclicals Sector, Assets, Maximum Sales Charge,
Year 1 Total Return, P/E Ratio, Stocks %, Expense Ratio
J
Alpha, R-Square, Industrial Cyclicals Sector, Assets, Maximum Sales Charge,
Year 1 Total Return, P/E Ratio, Expense Ratio, Health Sector, Debt % of
Total Capitalization, 3 Month Total Return, Manager Tenure
Table D.7: C4.5 Classification Features for Dataset 3.
Sample
Classification Features
A
Assets, P/E Ratio, Year 1 Total Return, P/B Ratio, ROA, Financials Sector,
Manager Tenure, Foreign Stock %, Industrial Cyclicals Sector, Maximum
Sales Charge
B
P/E Ratio, Assets, Year 1 Total Return, Foreign Stock %, Stocks %, Services
Sector, 3 Month Total Return, 6 Month Total Return, Median Market
Capitalization
C
P/E Ratio, Assets, Year 1 Total Return, ROA, P/B Ratio, Median Market
Capitalization, Turnover, Financial Sector

Table D.7 (continued).
Sample
Classification Features
D
P/E Ratio, Assets, 3 Month Total Return, Year 1 Total Return, Financial
Sector, Expense Ratio, 6 Month Total Return, Income Ratio, Foreign Stock %
E
P/E Ratio, Assets, Year 1 Total Return, Median Market Capitalization, Energy
Sector, Turnover, Financial Sector, Income Ratio, Health Sector, Utilities
Sector, Minimum Purchase
F
Assets, P/E Ratio, Year 1 Total Return, Stocks %, P/E Ratio, Expense Ratio,
Industrial Cyclicals Sector, Consumer Staples Sector
G
P/E Ratio, Assets, Year 1 Total Return, Median Market Capitalization,
Foreign Stock %, Consumer Staples Sector, Energy Sector, Income Ratio,
Turnover, Debt % of Total Capitalization, Stocks %
H
P/E Ratio, Assets, Year 1 Total Return, Energy Sector, Foreign Stock %, 3
Month Total Return, Median Market Capitalization, Stocks %
I
P/E Ratio, Assets, Year 1 Total Return, Financial Sector, Income Ratio,
Foreign Stock %, Health Sector
J
P/E Ratio, Assets, Year 1 Total Return, Foreign Stock %, 6 Month Total
Return, 3 Month Total Return, Turnover, Median Market Capitalization
Table D.8: Stepwise Discriminant Analysis Classification Features for Dataset 3.
Sample
Classification Features by Order of Selection
A
Year 1 Total Return, P/E Ratio, Expense Ratio, Assets, Median Market
Capitalization, Foreign Stock %, Utilities Sector, Financials Sector, Retail
Sector, Stocks %, Cash %, Technology Sector, Debt % of Total
Capitalization, 6 Month Total Return, Maximum Sales Charge
B
P/E Ratio, Year 1 Total Return, Assets, Foreign Stock %, Style, Expense
Ratio, Stocks %, Maximum Sales Charge, Consumer Staples Sector, Debt %
of Total Capitalization, Energy Sector, Turnover, Health Sector, Industrial
Cyclicals Sector, 6 Month Total Return, Services Sector
C
P/E Ratio, Year 1 Total Return, Assets, Foreign Stock %, Style, Expense
Ratio, Consumer Staples Sector, Utilities Sector, Maximum Sales Charge,
Retail Sector, Financials Sector, 3 Month Total Return, Income Ratio,
Turnover
D
Year 1 Total Return, P/E Ratio, Expense Ratio, Assets, Consumer Staples
Sector, Foreign Stock %, Income Ratio, Maximum Sales Charge, Style,
Financials Sector, 6 Month Total Return, Turnover, Utilities Sector, Manager
Tenure
E
Assets, P/E Ratio, Year 1 Total Return, Foreign Stock %, Expense Ratio,
Style, Financials Sector, Utilities Sector, Median Market Capitalization,
Maximum Sales Charge, Turnover, 3 Month Total Return

Table D.8 (continued).
Sample
Classification Features by Order of Selection
F
P/E Ratio, Year 1 Total Return, Foreign Stock %, Assets, Median Market
Capitalization, Expense Ratio, Utilities Sector, Maximum Sales Charge,
Financials Sector, Retail Sector, 6 Month Total Return, Turnover
G
Year 1 Total Return, P/E Ratio, Expense Ratio, Assets, Style, Foreign Stock
%, Utilities Sector, Turnover, Financials Sector, Median Market
Capitalization, 3 Month Total Return
H
Expense Ratio, Year 1 Total Return, P/E Ratio, Assets, Foreign Stock %,
Median Market Capitalization, Utilities Sector, Maximum Sales Charge, 6
Month Total Return, Financials Sector, Stocks %, Cash %, Retail Services,
Style
I
P/E Ratio, Year 1 Total Return, Expense Ratio, Assets, Foreign Stock %,
Style, Stocks %, Retail Sector, Debt % of Total Capitalization, Consumer
Staples Sector, Maximum Sales Charge, 6 Month Total Return, Financials
Sector, Turnover
J
Year 1 Total Return, Foreign Stock %, Assets, Style, P/E Ratio, Expense
Ratio, Consumer Staples Sector, Maximum Sales Charge, Debt % of Total
Capitalization, 6 Month Total Return, Turnover, Stocks %, Financials Sector,
Utilities Sector, Income Ratio
Table D.9: Logistic Regression Classification Features for Dataset 3.
Sample
Classification Features by Order of Selection
A
Year 1 Total Return, Foreign Stock %, Utilities Sector, Assets, Financials
Sector, Maximum Sales Charge, Expense Ratio, P/E Ratio, Stocks %, Cash %,
Technology Sector, Debt % of Total Capitalization, Median Market
Capitalization
B
Year 1 Total Return, Foreign Stock %, Assets, P/E Ratio, Maximum Sales
Charge, Consumer Staples Sector, Expense Ratio, Stocks %, Health Sector,
Industrial Cyclicals Sector, 6 Month Total Return
C
Assets, P/E Ratio, Year 1 Total Return, Foreign Stock %, Utilities Sector,
Consumer Staples Sector, Maximum Sales Charge, Financials Sector, Retail
Sector, Expense Ratio
D
Year 1 Total Return, Foreign Stock %, Assets, P/E Ratio, Expense Ratio,
Consumer Staples Sector, Maximum Sales Charge, Utilities Sector, Financials
Sector, 6 Month Total Return
E
Assets, Year 1 Total Return, Foreign Stock %, Maximum Sales Charge,
Utilities Sector, Financials Sector, 3 Month Total Return, Expense Ratio

Table D.9 (continued).
Sample
Classification Features by Order of Selection
F
Year 1 Total Return, Foreign Stock %, Assets, P/E Ratio, Maximum Sales
Charge, Utilities Sector, Median Market Capitalization, Expense Ratio,
Financials Sector, 6 Month Total Return
G
Year 1 Total Return, Foreign Stock %, Utilities Sector, Assets, Financials
Sector, 3 Month Total Return, Expense Ratio, Maximum Sales Charge, P/E
Ratio, Median Market Capitalization
H
Expense Ratio, Year 1 Total Return, P/E Ratio, Foreign Stock %, Assets,
Industrial Cyclicals Sector, Utilities Sector, Median Market Capitalization,
Maximum Sales Charge, 6 Month Total Return, Health Sector, Stocks %
I
Year 1 Total Return, Foreign Stock %, Assets, P/E Ratio, Utilities Sector,
Financials Sector, Expense Ratio, Stocks %, 6 Month Total Return, Debt % of
Total Capitalization, Technology Sector
J
Year 1 Total Return, Foreign Stock %, Assets, Utilities Sector, Maximum
Sales Charge, Financials Sector, 3 Month Total Return, Expense Ratio
Table D.10: C4.5 Classification Features for Dataset 4.
Sample
Classification Features
A
Alpha, Assets, P/E Ratio, 3 Month Total Return, P/B Ratio, Expense Ratio,
Median Market Capitalization, R-Square, Foreign Stock %, Consumer
Durables Sector, Health Sector
B
Alpha, P/E Ratio, P/B Ratio, Maximum Sales Charge, Assets, R-Square,
Foreign Stock %
C
P/E Ratio, Alpha, Assets, Distributed Yield, Turnover, 6 Month Total Return,
P/B Ratio, R-Square, Industrial Cyclicals Sector
D
P/E Ratio, Alpha, Median Market Capitalization, Assets, Expense Ratio,
R-Square
E
Alpha, P/E Ratio, Assets, Stocks %, R-Square, 3 Month Total Return,
Services Sector, Minimum Purchase, Industrial Cyclicals Sector
F
Alpha, Median Market Capitalization, P/E Ratio, R-Square
G
Alpha, Assets, P/E Ratio, Technology Sector, Expense Ratio, Turnover,
R-Square
H
Alpha, P/E Ratio, Expense Ratio, R-Square, Assets, Stocks %, Median Market
Capitalization
I
Alpha, P/E Ratio, Assets, R-Square
J
Alpha, Income Ratio, P/E Ratio, P/B Ratio, Assets, R-Square, Year 1 Total
Return, Median Market Capitalization

Table D.11: Stepwise Discriminant Analysis Classification Features for Dataset 4.
Sample
Classification Features by Order of Selection
A
Alpha, R-Square, Industrial Cyclicals Sector, Assets, P/B Ratio, Expense
Ratio, Stocks %, Cash %, Year 1 Total Return, Turnover, Utilities Sector,
Retail Sector, Maximum Sales Charge
B
Alpha, R-Square, Industrial Cyclicals Sector, Assets, Stocks %, Maximum
Sales Charge, Debt % of Total Capitalization, P/E Ratio, Expense Ratio,
Turnover, Energy Sector
C
Alpha, R-Square, Industrial Cyclicals Sector, Assets, P/B Ratio, Maximum
Sales Charge, Health Sector, Utilities Sector, Year 1 Total Return, P/E Ratio,
Retail Sector, Stocks %, Consumer Staples Sector, Expense Ratio, Foreign
Stock %
D
Alpha, R-Square, Industrial Cyclicals Sector, Assets, Income Ratio, Year 1
Total Return, Turnover, Health Sector, P/E Ratio, Maximum Sales Charge,
Expense Ratio, Consumer Staples Sector, Manager Tenure
E
Alpha, R-Square, Industrial Cyclicals Sector, Assets, P/B Ratio, Health Sector,
Foreign Stock %, Maximum Sales Charge, Turnover, Utilities Sector, Year 1
Total Return, P/E Ratio, Expense Ratio
F
Alpha, R-Square, Industrial Cyclicals Sector, Assets, P/B Ratio, Maximum
Sales Charge, Health Sector, Median Market Capitalization, Expense Ratio,
Year 1 Total Return, Utilities Sector, Retail Sector, Turnover, P/E Ratio
G
Alpha, R-Square, Industrial Cyclicals Sector, Assets, Expense Ratio,
Turnover, P/B Ratio, Health Sector, Stocks %, Style, P/E Ratio, Year 1 Total
Return, Foreign Stock %
H
Alpha, R-Square, Industrial Cyclicals Sector, Assets, P/E Ratio, Expense
Ratio, Year 1 Total Return, Maximum Sales Charge, Stocks %, Utilities
Sector, Turnover, Health Sector, Retail Sector, Foreign Stock %
I
Alpha, R-Square, Industrial Cyclicals Sector, Assets, P/B Ratio, Stocks %,
Maximum Sales Charge, Style, Retail Sector, Expense Ratio, Turnover, P/E
Ratio
J
Alpha, R-Square, Assets, Industrial Cyclicals Sector, P/B Ratio, Maximum
Sales Charge, Turnover, Year 1 Total Return, Stocks %, Utilities Sector,
Health Sector, 3 Month Total Return, Expense Ratio, Manager Tenure
Table D.12: Logistic Regression Classification Attributes for Dataset 4.
Sample
Classification Attributes in Order of Selection
A
Alpha, R-Square, Industrial Sector, Assets, P/B Ratio, Maximum Sales
Charge, Stocks %, Expense Ratio, Year 1 Total Return
B
Alpha, R-Square, Industrial Sector, Expense Ratio, P/E Ratio, Assets,
Maximum Sales Charge, Stocks %

Table D.12 (continued).
Sample
Classification Attributes in Order of Selection
C
Alpha, R-Square, Industrial Sector, Assets, P/E Ratio, Year 1 Total Return,
Maximum Sales Charge, Stocks %, Expense Ratio, Health Sector, P/B Ratio,
Retail Sector
D
Alpha, R-Square, Industrial Sector, Assets, Health Sector, Technology Sector,
P/E Ratio, Maximum Sales Charge, Expense Ratio, Year 1 Total Return,
Consumer Staples Sector, Stocks %
E
Alpha, R-Square, Industrial Sector, Assets, P/E Ratio, Maximum Sales
Charge, Health Sector, 3 Month Total Return
F
Alpha, R-Square, Industrial Sector, Expense Ratio, Assets, P/B Ratio, Health
Sector, Retail Sector, P/E Ratio, Year 1 Total Return, Turnover, Maximum
Sales Charge
G
Alpha, R-Square, Industrial Sector, Expense Ratio, Assets, P/E Ratio, Year 1
Total Return, Foreign Stock %, Consumer Durables Sector, Turnover,
Maximum Sales Charge, Health Sector
H
Alpha, R-Square, Industrial Sector, Assets, P/E Ratio, Maximum Sales
Charge, Year 1 Total Return, Expense Ratio
I
Alpha, R-Square, Industrial Sector, Expense Ratio, P/B Ratio, Stocks %,
Assets, Maximum Sales Charge
J
Alpha, R-Square, Industrial Sector, Assets, Maximum Sales Charge, P/E
Ratio, Year 1 Total Return, Expense Ratio

APPENDIX E
BEST CLASSIFICATION TREES FROM PHASES 1-4
C4.5 decision tree generator
Options:
File stem
Trees evaluated on unseen cases
Sensible test requires 2 branches with >=11 cases
Read 370 cases (24 attributes) from FE2PID.data
Decision Tree:
alpha <= -3.04 :
| alpha <= -6.41 : 1
| alpha > -6.41 :
| | Yield <= 0.05 : 2
| | Yield > 0.05 :
| | | ROA <= 7.09 : 3
| | | ROA > 7.09 : 2
alpha > -3.04 :
| alpha <= 3.52 :
| | alpha <= -1.8 : 3
| | alpha > -1.8 :
| | | alpha <= 0.36 :
| | | | Expense_Ratio <= 0.74 : 4
| | | | Expense_Ratio > 0.74 : 3
| | | alpha > 0.36 :
| | | | MedMktCap <= 1489 : 3
| | | | MedMktCap > 1489 :
| | | | | Cons_Durable <= 2.9 : 4
| | | | | Cons_Durable > 2.9 :
| | | | | | Cash% <= 3.4 : 3
| | | | | | Cash% > 3.4 : 4
| alpha > 3.52 :
| | Yield <= 0.26 : 4
| | Yield > 0.26 :
| | | PB-ratio <= 3.06 : 4
| | | PB-ratio > 3.06 : 5
Figure E.1: Phase 1 C4.5 Tree from Sample F.

Evaluation on training data (370 items):
  Before Pruning        After Pruning
  Size  Errors          Size  Errors        Estimate
  51    97(26.2%)       27    106(28.6%)    (36.5%)
Evaluation on test data (185 items):
  Before Pruning        After Pruning
  Size  Errors          Size  Errors        Estimate
  51    70(37.8%)       27    58(31.4%)     (36.5%)
(a) (b) (c) (d) (e) <-classified as
4 1 (a): class 1
2 12 9 1 (b): class 2
2 70 9 2 (c): class 3
17 37 1 (d): class 4
14 4 (e): class 5
Figure E.1 (continued).
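To make the tree in Figure E.1 easier to trace by hand, its branches can be written as nested conditionals. The sketch below is an illustration only, not the C4.5 program itself; the function name `classify_fund` and the dictionary input are assumptions, while the attribute names and thresholds are copied from the printed tree.

```python
def classify_fund(f):
    """Walk the Figure E.1 tree; `f` maps attribute names to numeric values."""
    if f["alpha"] <= -3.04:
        if f["alpha"] <= -6.41:
            return 1
        if f["Yield"] <= 0.05:
            return 2
        return 3 if f["ROA"] <= 7.09 else 2
    if f["alpha"] <= 3.52:
        if f["alpha"] <= -1.8:
            return 3
        if f["alpha"] <= 0.36:
            return 4 if f["Expense_Ratio"] <= 0.74 else 3
        if f["MedMktCap"] <= 1489:
            return 3
        if f["Cons_Durable"] <= 2.9:
            return 4
        return 3 if f["Cash%"] <= 3.4 else 4
    # alpha > 3.52
    if f["Yield"] <= 0.26:
        return 4
    return 4 if f["PB-ratio"] <= 3.06 else 5


# Hypothetical fund (values invented for illustration, not taken from the data set).
example = {"alpha": 1.0, "MedMktCap": 2000, "Cons_Durable": 5.0, "Cash%": 2.0}
print(classify_fund(example))
```

Only the attributes on the path actually followed need to be present in the input, which mirrors how a single case descends the tree.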

C4.5 decision tree generator
Options:
File stem
Sensible test requires 2 branches with >=17 cases
Read 523 cases (23 attributes) from GEQIPLN.data
Decision Tree:
Alpha <= -2.72 :
| Alpha <= -5.76 : 1
| Alpha > -5.76 :
| | Yield <= 0.98 : 2
| | Yield > 0.98 : 3
Alpha > -2.72 :
| Alpha <= 4.01 :
| | Indust_Prod > 25.6 : 2
| | Indust_Prod <= 25.6 :
| | | Alpha <= -1.34 : 3
| | | Alpha > -1.34 :
| | | | ROA > 9.78 : 3
| | | | ROA <= 9.78 :
| | | | | Assets > 1244.65 : 4
| | | | | Assets <= 1244.65 :
| | | | | | Alpha <= 0.37 :
| | | | | | | Median_Mkt_Cap <= 7913 : 3
| | | | | | | Median_Mkt_Cap > 7913 : 4
| | | | | | Alpha > 0.37 :
| | | | | | | R-Square <= 76 : 3
| | | | | | | R-Square > 76 : 4
| Alpha > 4.01 :
| | Yield <= 0.03 : 4
| | Yield > 0.03 : 5
Evaluation on training data (523 items):
  Before Pruning        After Pruning
  Size  Errors          Size  Errors        Estimate
  37    163(31.2%)      25    167(31.9%)    (38.1%)
Figure E.2: Phase 2 C4.5 Decision Tree for Sample G for Regular Features.

Evaluation on test data (261 items):
  Before Pruning        After Pruning
  Size  Errors          Size  Errors        Estimate
  37    90(34.5%)       25    86(33.0%)     (38.1%)
(a) (b) (c) (d) (e) <-classified as
17 3 (a): class 1
9 24 9 2 (b): class 2
2 72 20 2 (c): class 3
1 19 42 6 (d): class 4
3 10 20 (e): class 5
Figure E.2 (continued).
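The percentages reported in Figure E.2 follow directly from the error counts and the number of items evaluated. A minimal sketch of that arithmetic is given here as a consistency check; the helper name `error_rate` is an assumption, and the counts are those printed in the figure.

```python
def error_rate(errors, items):
    """Error percentage rounded to one decimal place, the precision of the printed figures."""
    return round(100.0 * errors / items, 1)


# (errors, items) pairs as reported in Figure E.2.
reported = {
    ("training", "before pruning"): (163, 523),
    ("training", "after pruning"): (167, 523),
    ("test", "before pruning"): (90, 261),
    ("test", "after pruning"): (86, 261),
}

for (data, stage), (errors, items) in reported.items():
    print(f"{data:8s} {stage:15s} {errors}/{items} = {error_rate(errors, items)}%")
```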

C4.5 decision tree generator
Options:
File stem
Sensible test requires 2 branches with >=14 cases
Read 523 cases (26 attributes) from JEQIALL.data
Decision Tree:
Return-Risk <= -0.18 :
| Return-Risk <= -1.16 : 1
| Return-Risk > -1.16 : 2
Return-Risk > -0.18 :
| Return-Risk <= 0.57 :
| | Return-Risk > 0.39 : 4
| | Return-Risk <= 0.39 :
| | | Return-Risk <= 0.03 :
| | | | Assets > 121.28 : 3
| | | | Assets <= 121.28 :
| | | | | Beta <= 0.98 : 2
| | | | | Beta > 0.98 : 3
| | | Return-Risk > 0.03 :
| | | | Return-Risk <= 0.3 : 3
| | | | Return-Risk > 0.3 :
| | | | | Median_Mkt_Cap > 6635 : 4
| | | | | Median_Mkt_Cap <= 6635 :
| | | | | | P/E_Ratio <= 21.26 : 3
| | | | | | P/E_Ratio > 21.26 : 4
| Return-Risk > 0.57 :
| | Return-Risk > 0.71 : 5
| | Return-Risk <= 0.71 :
| | | Assets <= 410.22 : 4
| | | Assets > 410.22 : 5
Evaluation on training data (523 items):
  Before Pruning        After Pruning
  Size  Errors          Size  Errors        Estimate
  33    76(14.5%)       25    78(14.9%)     (20.1%)
Figure E.3: Phase 2 C4.5 Decision Tree for Sample J for Derived Features.

Evaluation on test data (261 items):
  Before Pruning        After Pruning
  Size  Errors          Size  Errors        Estimate
  33    50(19.2%)       25    48(18.4%)     (20.1%)
(a)
(b) (c)
(d) (e)
<-classified as
16
5
(a) :
class 1
1
32 4
(b) :
class 2
7 78
12
(c) :
class 3
14
58 5
(d) :
class 4
29
(e) :
class 5
Figure E.3~continued.

C4.5 decision tree generator
Options:
File stem
Sensible test requires 2 branches with >=23 cases
Read 666 cases (30 attributes) from AR999I.data
Decision Tree:
P/E_Ratio >35.46 : 1
P/E_Ratio <= 35.46 :
| NetAssets <= 1189.6 :
| | NetAssets <=6.8 : 1
| | NetAssets >6.8 :
| | | yltotret <= -1.86 :
| | | | NetAssets <= 132.5 :
| | | | | P/E_Ratio <=22.09 : 2
| | | | | P/E_Ratio >22.09 : 1
| | | | NetAssets > 132.5 :
| | | | | Services <=15.5 : 3
| | | | | Services >15.5 : 2
| | | yltotret > -1.86 :
| | | | yltotret <=7.08 : 3
| | | | yltotret >7.08 :
| | | | | ForeignStock <=27.1 : 4
| | | | | ForeignStock >27.1 : 3
| NetAssets > 1189.6 :
| | yltotret <=6.31 : 4
| | yltotret > 6.31 : 5
Evaluation on training data (666 items):
Before Pruning After Pruning
Size Errors Size Errors Estimate
39 312(46.8%) 21 320(48.0%) (52.8%)
Evaluation on test data (333 items):
Before Pruning After Pruning
Size Errors Size Errors Estimate
39 182(54.7%) 21 173(52.0%) (52.8%)
Figure E.4: Phase 3 C4.5 Decision Tree for Sample A for Dataset 1.

(a) (b) (c) (d) (e) <-classified as
20 4 (a): class 1
16 13 38 2 1 (b): class 2
6 9 98 15 2 (c): class 3
1 2 45 27 10 (d): class 4
1 10 11 2 (e): class 5
Figure E.4 (continued).

C4.5 decision tree generator
Options:
File stem
Sensible test requires 2 branches with >=13 cases
Read 666 cases (32 attributes) from HR999IAR.data
Decision Tree:
R-Square <= 1 : 1
R-Square > 1 :
| Alpha <= 7.57 :
| | Alpha <= -4.02 :
| | | R-Square <= 74 : 1
| | | R-Square > 74 : 2
| | Alpha > -4.02 :
| | | Alpha <= -0.77 :
| | | | NetAssets <= 28 : 2
| | | | NetAssets > 28 :
| | | | | Turnover <= 22 : 3
| | | | | Turnover > 22 :
| | | | | | Financials <= 11 : 2
| | | | | | Financials > 11 :
| | | | | | | NetAssets <= 212.7 : 2
| | | | | | | NetAssets > 212.7 : 3
| | | Alpha > -0.77 :
| | | | Alpha > 6.85 : 4
| | | | Alpha <= 6.85 :
| | | | | P/E_Ratio <= 28.15 :
| | | | | | ExpRatio > 2.4 : 2
| | | | | | ExpRatio <= 2.4 :
| | | | | | | m6totret <= -13.1 : 3
| | | | | | | m6totret > -13.1 :
| | | | | | | | NetAssets > 1018.6 : 4
| | | | | | | | NetAssets <= 1018.6 :
| | | | | | | | | R-Square <= 24 : 3
| | | | | | | | | R-Square > 24 :
| | | | | | | | | | Alpha <= 3.03 :
| | | | | | | | | | | R-Square <= 62 :
| | | | | | | | | | | | Alpha <= 1.03 : 2
| | | | | | | | | | | | Alpha > 1.03 : 3
| | | | | | | | | | | R-Square > 62 :
| | | | | | | | | | | | ConsDur <= 2.5 : 4
| | | | | | | | | | | | ConsDur > 2.5 : 3
| | | | | | | | | | Alpha > 3.03 :
| | | | | | | | | | | Health > 12.8 : 3
| | | | | | | | | | | Health <= 12.8 :
Figure E.5: Phase 3 C4.5 Decision Tree for Sample H for Dataset 2.

| | | | | | | | | | | | Technology > 12.7 : 4
| | | | | | | | | | | | Technology <= 12.7 :
| | | | | | | | | | | | | MinPurchase > 1000 : 4
| | | | | | | | | | | | | MinPurchase <= 1000 :[S1]
| | | | | P/E_Ratio > 28.15 :
| | | | | | Alpha <= 3.81
| | | | | | Alpha > 3.81 :
| Alpha > 7.57 :
| | NetAssets > 1329.6 : 5
| | NetAssets <= 1329.6 :
| | | Alpha <= 11.09 : 4
| | | Alpha > 11.09 : 5
Subtree [S1]
MedMktCap <= 5038 : 3
MedMktCap > 5038 : 4
Evaluation on training data (666 items):
Before Pruning After Pruning
Size Errors Size Errors Estimate
57 189(28.4%) 53 189(28.4%) (36.6%)
Evaluation on test data (333 items):
Before Pruning After Pruning
Size Errors Size Errors Estimate
57 121(36.3%) 53 120(36.0%) (36.6%)
(a) (b) (c) (d) (e) <-classified as
15 3 (a): class 1
4 45 15 2 (b): class 2
3 16 86 28 2 (c): class 3
1 2 19 53 7 (d): class 4
18 14 (e): class 5
Figure E.5 (continued).

C4.5 decision tree generator
Options:
File stem
Sensible test requires 2 branches with >=24 cases
Read 666 cases (30 attributes) from HR999I3.data
Decision Tree:
P/E_Ratio >34.77 : 1
P/E_Ratio <= 34.77 :
| NetAssets <=13.2 : 1
| NetAssets > 13.2 :
| | yltotret <= -0.59 :
| | | NetAssets > 214.9 : 2
| | | NetAssets <= 214.9 :
| | | | Energy > 10 : 1
| | | | Energy <= 10 :
| | | | | m3totret <= -4.32 : 1
| | | | | m3totret > -4.32 : 2
| | yltotret > -0.59 :
| | | NetAssets > 1008.2 : 3
| | | NetAssets <= 1008.2 :
| | | | ForeignStock <=59.8 :
| | | | | MedMktCap > 4161 : 2
| | | | | MedMktCap <= 4161 :
| | | | | | Stocks% <=77.5 : 3
| | | | | | Stocks% >77.5 :
| | | | | | | yltotret >7.4 : 3
| | | | | | | yltotret <=7.4 :
| | | | | | | | m3totret <= -2.41 : 3
| | | | | | | | m3totret > -2.41 : 2
| | | | ForeignStock > 59.8 :
| | | | | yltotret <= 16.91 : 1
| | | | | yltotret > 16.91 : 2
Evaluation on training data (666 items):
Before Pruning After Pruning
Size Errors Size Errors Estimate
31 229(34.4%) 27 228(34.2%) (39.8%)
Figure E.6: Phase 3 C4.5 Decision Tree for Sample H for Dataset 3.

Evaluation on test data (333 items):
Before Pruning After Pruning
Size Errors Size Errors Estimate
31 134(40.2%) 27 128(38.4%) (39.8%)
(a) (b) (c) <-classified as
42 33 9 (a): class 1
19 90 26 (b): class 2
4 37 73 (c): class 3
Figure E.6 (continued).

C4.5 decision tree generator
Options:
File stem
Sensible test requires 2 branches with >=28 cases
Read 666 cases (32 attributes) from FR999IA3.data
Decision Tree:
Alpha <= -0.77 :
| MedMktCap <= 4327 : 1 (81.0/17.0)
| MedMktCap > 4327 :
| | NetAssets <= 122.2 : 1 (49.0/19.9)
| | NetAssets > 122.2 : 2 (30.0/9.3)
Alpha > -0.77 :
| P/E_Ratio > 34.5 : 1 (28.0/8.2)
| P/E_Ratio <=34.5 :
| | Alpha > 6.85 : 3 (104.0/16.1)
| | Alpha <= 6.85 :
| | | NetAssets > 1055.7 : 3 (62.0/15.9)
| | | NetAssets <= 1055.7 :
| | | | Alpha <=3.36 :
| | | | | R-Square <= 59 : 1 (58.0/28.1)
| | | | | R-Square > 59 : 2 (133.0/41.2)
| | | | Alpha > 3.36 :
| | | | | R-Square <= 58 : 2 (72.0/36.4)
| | | | | R-Square > 58 : 3 (49.0/17.8)
Evaluation on training data (666 items):
Before Pruning After Pruning
Size Errors Size Errors Estimate
23 183(27.5%) 19 180(27.0%) (31.5%)
Evaluation on test data (333 items):
Before Pruning After Pruning
Size Errors Size Errors Estimate
23 94(28.2%) 19 92(27.6%) (31.5%)
Figure E.7: Phase 3 C4.5 Decision Tree for Sample F for Dataset 4.

(a) (b) (c) <-classified as
80 16 2 (a): class 1
19 83 27 (b): class 2
2 26 78 (c): class 3
Figure E.7 (continued).
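
The error counts in the evaluation tables can be checked against the confusion matrices, since every off-diagonal cell is a misclassified test case. The following is a minimal Python sketch of that check, using the three-class test matrix of Figure E.7 as transcribed in this appendix. The function name `error_summary` is illustrative only and is not part of C4.5.

```python
def error_summary(matrix):
    """Return (total cases, errors, error rate in percent) for a square
    confusion matrix whose rows are actual classes and whose columns are
    the classes assigned by the tree."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    errors = total - correct
    return total, errors, round(100.0 * errors / total, 1)


# Test-set confusion matrix from Figure E.7 (rows: actual class 1, 2, 3).
FIGURE_E7 = [
    [80, 16, 2],
    [19, 83, 27],
    [2, 26, 78],
]

total, errors, rate = error_summary(FIGURE_E7)
print(total, errors, rate)  # 333 92 27.6
```

The result agrees with the after-pruning test evaluation reported for Figure E.7: 92 errors on 333 items (27.6%).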

C4.5 decision tree generator
Options:
File stem
Sensible test requires 2 branches with >=27 cases
Read 947 cases (50 attributes) from XDF.data
Decision Tree:
R-Square <= 2 : 1
R-Square > 2 :
| Alpha >8.11 : 5
| Alpha <= 8.11 :
| | Alpha <= -3.71 :
| | | R-Square <= 68 : 1
| | | R-Square > 68 : 2
| | Alpha > -3.71 :
| | | Alpha <= -0.77 :
| | | | R-Square <=56 : 2
| | | | R-Square >56 :
| | | | | NetAssets <= 154.8 : 2
| | | | | NetAssets > 154.8 : 3
| | | Alpha > -0.77 :
| | | | Alpha >7.05 : 4
| | | | Alpha <=7.05 :
| | | | | AnnualRet92 <= -6.54 : 2
| | | | | AnnualRet92 > -6.54 :
| | | | | | NetAssets > 1055.7 : 4
| | | | | | NetAssets <= 1055.7 :
| | | | | | | R-Square <= 60 :
| | | | | | | | Alpha <= 1.06 : 2
| | | | | | | | Alpha > 1.06 : 3
| | | | | | | R-Square > 60 :
| | | | | | | | Alpha <= 3.04 : 3
| | | | | | | | Alpha >3.04 : 4
Evaluation on training data (947 items):
Before Pruning After Pruning
Size Errors Size Errors Estimate
33 328(34.6%) 27 331(35.0%) (39.5%)
Figure E.8: Phase 4 C4.5 Decision Tree for Tree8 for the New1052 Cross-validation
Dataset.

Evaluation on test data (105 items):
Before Pruning After Pruning
Size Errors Size Errors Estimate
33 39(37.1%) 27 36(34.3%) (39.5%)
(a) (b) (c) (d) (e) <-classified as
5 3 (a): class 1
3 11 (b): class 2
6 30 2 2 (c): class 3
1 6 15 4 (d): class 4
1 8 (e): class 5
Figure E.8 (continued).

C4.5 decision tree generator
Options:
File stem
Sensible test requires 2 branches with >=47 cases
Read 947 cases (50 attributes) from XDF.data
Decision Tree:
Alpha <= -0.77 :
| Alpha <= -3.98 : 1
| Alpha > -3.98 :
| | R-Square <= 63 : 1
| | R-Square > 63 :
| | | NetAssets <= 108 : 1
| | | NetAssets > 108 : 2
Alpha > -0.77 :
| AnnualRet92 <= -7.62 : 1
| AnnualRet92 > -7.62 :
| | Alpha > 7.05 : 3
| | Alpha <= 7.05 :
| | | NetAssets > 1055.7 : 3
| | | NetAssets <= 1055.7 :
| | | | Alpha <= 3.07 :
| | | | | R-Square <= 62 : 1
| | | | | R-Square > 62 : 2
| | | | Alpha > 3.07 :
| | | | | R-Square <= 57 : 2
| | | | | R-Square > 57 : 3
Evaluation on training data (947 items):
Before Pruning After Pruning
Size Errors Size Errors Estimate
23 269(28.4%) 21 261(27.6%) (31.4%)
Evaluation on test data (105 items):
Before Pruning After Pruning
Size Errors Size Errors Estimate
23 28(26.7%) 21 26(24.8%) (31.4%)
Figure E.9: Phase 4 C4.5 Decision Tree for Tree2 for the New1052A Cross-validation
Dataset.

(a) (b) (c) <-classified as
24 6 1 (a): class 1
8 26 5 (b): class 2
6 29 (c): class 3
Figure E.9 (continued).
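
A C4.5 tree listing reads as a set of nested tests: each extra level of indentation is a further condition on the same case, and the number after the final colon is the predicted class. As an illustration, the tree of Figure E.9 can be written as a Python function. This is a sketch that assumes the nesting implied by the listing (11 leaves, consistent with the reported pruned size of 21 nodes). The function name, argument names, and the sample values below are hypothetical and do not come from the dataset.

```python
def classify_e9(alpha, r_square, net_assets, annual_ret92):
    """Predict the class (1, 2 or 3) by following the Figure E.9 tree."""
    if alpha <= -0.77:
        if alpha <= -3.98:
            return 1
        if r_square <= 63:
            return 1
        return 1 if net_assets <= 108 else 2
    # Alpha > -0.77
    if annual_ret92 <= -7.62:
        return 1
    if alpha > 7.05:
        return 3
    if net_assets > 1055.7:
        return 3
    # Alpha <= 7.05 and NetAssets <= 1055.7
    if alpha <= 3.07:
        return 1 if r_square <= 62 else 2
    return 2 if r_square <= 57 else 3


# Hypothetical fund: Alpha 4.0, R-Square 80, NetAssets 500, AnnualRet92 5.0.
print(classify_e9(alpha=4.0, r_square=80, net_assets=500.0, annual_ret92=5.0))  # 3
```

Written this way, each root-to-leaf path corresponds to one production rule of the kind discussed for knowledge acquisition.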

C4.5 decision tree generator
Options:
File stem < NEW1052B.tree0>
Sensible test requires 2 branches with >=33 cases
Read 946 cases (43 attributes) from XDF.data
Decision Tree:
P/E_Ratio > 35.74 : 1
P/E_Ratio <= 35.74 :
| ylpctrkobj <= 84 :
| | NetAssets > 1008.2 : 3
| | NetAssets <= 1008.2 :
| | | ylpctrkobj <= 12 : 3
| | | ylpctrkobj > 12 :
| | | | Financials >32.7 : 3
| | | | Financials <= 32.7 :
| | | | | Financials <= 0.4 : 1
| | | | | Financials > 0.4 :
| | | | | | NetAssets <= 28.3 : 1
| | | | | | NetAssets > 28.3 :
| | | | | | | m3totret > -2.39 : 2
| | | | | | | m3totret <= -2.39 :
| | | | | | | | yltotret <= -0.54 : 2
| | | | | | | | yltotret > -0.54 : 3
| ylpctrkobj > 84 :
| | NetAssets <= 212.7 : 1
| | NetAssets > 212.7 : 2
Evaluation on training data (946 items):
Before Pruning After Pruning
Size Errors Size Errors Estimate
33 329(34.8%) 21 331(35.0%) (38.9%)
Evaluation on test data (106 items):
Before Pruning After Pruning
Size Errors Size Errors Estimate
33 35(33.0%) 21 39(36.8%) (38.9%)
Figure E.10: Phase 4 C4.5 Decision Tree for Tree0 for the New1052B Cross-validation
Dataset.

(a) (b) (c) <-classified as
18 9 4 (a): class 1
5 22 13 (b): class 2
2 6 27 (c): class 3
Figure E.10 (continued).

C4.5 decision tree generator
Options:
File stem < NEW1052C3.tree0>
Sensible test requires 2 branches with >=35 cases
Read 946 cases (45 attributes) from XDF.data
Decision Tree:
P/E_Ratio >35.74 : 1
P/E_Ratio <= 35.74 :
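| ylpctrkobj <= 84 :
| | AnnualRet92 > 20.65 : 3
| | AnnualRet92 <= 20.65 :
| | | NetAssets > 1008.2 : 3
| | | NetAssets <= 1008.2 :
| | | | NetAssets <= 12.1 : 1
| | | | NetAssets > 12.1 :
| | | | | AnnualRet93 <= 8.69 :
| | | | | | AnnualRet92 <= 4.59 : 1
| | | | | | AnnualRet92 > 4.59 : 2
| | | | | AnnualRet93 > 8.69 :
| | | | | | AnnualRet92 <= -5.36 : 1
| | | | | | AnnualRet92 > -5.36 :
| | | | | | | ylpctrkobj <= 12 : 3
| | | | | | | ylpctrkobj > 12 :
| | | | | | | | AnnualRet92 <= 3.43 : 2
| | | | | | | | AnnualRet92 > 3.43 :
| | | | | | | | | AnnualRet93 > 20.1 : 3
| | | | | | | | | AnnualRet93 <= 20.1 :
| | | | | | | | | | MedMktCap > 4327 : 2
| | | | | | | | | | MedMktCap <= 4327 :
| | | | | | | | | | | AnnualRet92 <= 8.9 :
| | | | | | | | | | | AnnualRet92 > 8.9 :
| | | | | | | | | | | | ROA <= 7.06 : 2
| | | | | | | | | | | | ROA > 7.06 : 3
| ylpctrkobj > 84 :
| | NetAssets <= 212.7 : 1
| | NetAssets > 212.7 : 2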
Evaluation on training data (946 items):
Before Pruning After Pruning
Size Errors Size Errors Estimate
31 298(31.5%) 31 298(31.5%) (36.4%)
Figure E.11: Phase 4 C4.5 Decision Tree for Tree0 for the New1052C3 Cross-validation
Dataset.

Evaluation on test data (106 items):
Before Pruning After Pruning
Size Errors Size Errors Estimate
31 30(28.3%) 31 30(28.3%) (36.4%)
(a) (b) (c) <-classified as
22 6 3 (a): class 1
7 24 9 (b): class 2
5 30 (c): class 3
Figure E.11 (continued).

Table E.1: Term Identification Key for Decision Trees.
Term                      Description
Alpha                     Alpha
AnnualRet92               Annual Return for 1992
AnnualRet93               Annual Return for 1993
ConsDurable or ConsDur    Consumer Durables Sector
Energy                    Energy Sector
ExpenseRatio or ExpRatio  Expense Ratio
Financials                Financial Sector
ForeignStock              Foreign Stock %
Health                    Health Sector
IndustProd                Industrial Products Sector
m3totret                  3 Month Total Return
m6totret                  6 Month Total Return
MedMktCap                 Median Market Capitalization
MinPurchase               Minimum Initial Purchase
NetAssets                 Assets
PBRatio                   P/B Ratio
P/ERatio                  P/E Ratio
Return - Risk             Artificial Attribute for Morningstar Return - Risk
ROA                       Return on Assets
R-Square                  R-Square
Services                  Services Sector
Stocks                    Stocks %
Technology                Technology Sector
Turnover                  Turnover
ylpctrkobj                Year 1 Percentage Rank by Objective
yltotret                  Year 1 Total Return
y3mstarrisk               Year 3 Morningstar Risk
Yield                     SEC Yield

REFERENCES
Altman, E., Avery, R., Eisenbeis, R., & Sinkey, J (1981). Application of
Classification Techniques in Business. Banking and Finance. Greenwich, CN:
JAI Press, Inc.
Arend, M. (1988). Expert system is AEâ€™s latest commodity. Wall Street Computer
Review. 5(9), 24,101-102.
Barr, A., & Feigenbaum, E.A. (eds.) (1981). The Handbook of Artificial Intelligence
Black, F., & Scholes, N. (1973). The pricing of options and corporate liabilities.
Journal of Political Economy. 81,637-659.
Blum, A. (1992). Neural Networks in C++. New York, NY: John Wiley & Sons,
Inc.
Braun, H., & Chandler, J.S. (1987). Predicting stock market behavior through rule
induction: An application of the Learning-from-Example approach. Decision
Sciences. 18,415-429.
Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and
Regression Trees. Belmont, CA: Wadsworth International Group
Brown, S.J., & Goetzmann, W.N. (1995). Performance persistence. Journal of
Finance. 50(2), 679-698.
Bryant, S.M. (1996) A Case-Based Reasoning Approach to Bankruptcy Prediction
Modeling. The Louisiana State University, unpublished dissertation.
Buntine, W., & Niblett, T. (1992). A Further Comparison of Splitting Rules for
Decision-Tree Induction. Machine Learning. 8, 75-85.
Carhart, M.M. (1997). On persistence in mutual fund performance. Journal of
Finance. 52(1), 57-82.
CDA/Wiesenberger. (1993). Mutual Funds Update: Subscriber's Guide. Rockville,
MD: CDA Investment Technologies, Inc.
Chen, K.C., & Liang, T P (1989). PROTRADER: an expert system for program
208

209
Chiang, W.C., Urban, T.L., & Baldridge, G W. (1996). A neural-network approach to
mutual fund net asset value forecasting. Omega-International Journal of
Management Science. 24(2), 205-215.
Chung, H.-M.M., & Silver, M S. (1992). Rule-Based Expert Systems and Linear
Models: An Empirical Comparison of Learning-By-Examples Methods.
Decision Sciences. 23(3), 687-707.
Coats, P.K., & Fant, L.F. (1991). A neural network approach to forecasting financial
distress. Journal of Business Forecasting. 10(4), 9-12.
Cohen, P.R., & Feigenbaum, E.A. (eds.) (1982). The Flandbook of Artificial
Cormen, T.H., Leisserson, C.E., & Rivest, R.L. (1990) Introduction to Algorithms
Cambridge, MA: The MIT Press.
Culbertson, W.Y. (1987). Expert systems in finance. Corporate Accounting. 5(2),
47-50.
Deboeck, G.J. (ed.) (1994) Trading on the Edge. New York: John Wiley & Sons,
Inc.
Desai, V.S., Crook, J.N., & Overstreet, G.AJr. (1996). A comparison of neural
networks and linear scoring models in the credit union environment. European
Journal of Operational Research, 95(1), 24-37.
Duchessi, P., Shawky, H., & Seagle, J.P. (1988). A knowledge-engineered system for
commercial loan decisions. Financial Management. 17(3), 57-65.
Dunkin, A. (1995) Mutual funds: Don't be dazzled by first-year fireworks. Business
Week. 11/13/95, 160-161.
Dutta, S.,& Shekhar, S. (1988) Bond-rating: a non-conservative application of neural
networks. Proceedings of the 1988 IEEE International Conference on Neural
Networks.
Dwyer, M.M.D (1992). A Comparison of Statistical Techniques and Artificial Neural
Network Models in Corporate Bankruptcy Prediction. The University of
Edelson, W ,& Gargano, M L. (1995) A genetic algorithm approach to optimizing
portfolio merging problems. The Third International Conference on Artificial
Intelligence Applications on Wall Street: Gaithersburg, MD: Software
Engineering Press.
Elmer, P.J, & Borowski, D M. (1988) An expert system approach to financial
analysis: the case of S&L bankruptcy. Financial Management. 17(3), 66-76.

210
Elton, E.J., & Gruber, M.J. (1987). Portfolio Analysis with Partial Information: The
Case of Grouped Data. Management Science. 33(101. 1238-1246.
Elton, E.J., Gruber, M.J., & Blake, C.R. (1996). The persistence of risk-adjusted
mutual fund performance. Journal of Business. 69(2), 133-157.
Elton, E.J., Gruber, M.J., & Grossman, S. (1986). Discrete Expectational Data and
Portfolio Performance. The Journal of Finance. 41(3), 699-714.
Fama, E.F. (1991). Efficient Capital Markets: II. The Journal of Finance. 46(5),
1575-1617.
Fayyad, U.M., & Irani, K.B (1992). On the Flandling of Continuous-Valued
Attributes in Decision Tree Generation. Machine Learning. 8, 87-102.
Feigenbaum, E.A., McCorduck, P., & Nii, H P. (1988). The Rise of the Expert
Company. New York, NY: Times Books.
Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems.
Annals of Eugenics. 7, 179-188.
Flury, B , & Riedwyl, H. (1988). Multivariate Statistics. London, UK: Chapman and
Hall.
Fogler, H R. (1995). Investment analysis and new quantitative tools The Journal of
Portfolio Management. 21(4), 39-48.
Gluch-Rucys, M., & Walker, D. (1991). Underwriting intelligence. Mortgage
Banking. 52(3), 60-66.
Goetzmann, W.N., & Ibbotson, R.G. (1994). Do winners repeat? Patterns in mutual
fund return behavior. Journal of Portfolio Management. 20(2), 9-18.
Goldberg, D.E. (1989). Genetic Algorithms in Search. Optimization, and Machine
Goldberg, D.E. (1994). Genetic and evolutionary algorithms come of age.
Communications of the ACM. 37(3), 113-119.
Graham, L.E., Damens, J., & Van Ness, G. (1991). Developing Risk Advisor: an
expert system for risk identification. Auditing: A Journal of Practice & Theory,
10(1), 69-96.
Han, L, Chandler, J.S., & Liang, T.P. (1996). The impact of measurement scale and
correlation structure on classification performance of inductive learning and
statistical methods. Expert Systems with Applications. 10(21. 209-221.

211
Hansen, J V., Koehler, G.J, Messier, Jr.W.F., & Mutchler, J.F. (1993). Developing
knowledge structures: A comparison of a qualitative-response model and two
machine-learning algorithms. Decision Support Systems. 10(2), 235-243.
Hansen, J.V., & Messier, Jr.W.F. (1986). A knowledge-based expert system for
auditing advanced computer systems. European Journal of Operational Research,
26(3), 371-379.
Harrell, D. (1997). The Star Rating. Electronic Citation:
http://www.momingstar.com.
Harries, M.,& Horn, K. Detecting concept drift in financial time series prediction
using symbolic machine learning. Eighth Australian Joint Conference on
Artificial Intelligence:
Hawley, D.D., Johnson, J.D., & Raina, D (1990). Artificial neural systems: a new
tool for financial decision-making. Financial Analysts Journal. 46(6), 63-72.
Heitkoetter, J. and Beasley, D. (1997). The Hitchhiker's Guide to Evolutionary
Computation. Heitkoetter, J. and Beasley, D Genetic Algorithms FAQ.
Electronic Citation: http://www.cis.ohio-state.edu/hypertext/faq/usenet/ai-
faq/genetic/part6/faq html
Hobbs, A.,& Bourbakis, N.G. (1995). A neurofuzzy arbitrage simulator for stock
investing. In Anonymous. Proceedings of the IEEE/IAFE 1995 Computational
Intelligence for Financial Engineering: New York, NY, IEEE
Holland, J.H. (1975). Adaptation in Natural and Artificial Systems. Ann Arbor, MI:
University of Michigan Press.
Holsapple, C.W., Tam, K Y, & Whinston, A.B. (1988) Adapting expert system
technology to financial management. Financial Management. 17(3), 12-22.
Hunt, E., Marin, J., & Stone, P (1966). Experiments in Induction New York:
Hutchinson, J.M., Lo, A W, & Poggio, T. (1994). A nonparametric approach to
pricing and hedging derivative securities via learning networks. Journal of
Finance. 49(3), 851-889.
Jain, B.A., & Nag, B.N (1995). Artificial neural network models for pricing initial
public offerings. Decision Sciences. 26(3), 283-302.
Jensen, H.L. (1992). Using neural networks for credit scoring. Managerial Finance.
18(6), 15-26.
Jih, W.K., & Patterson, S. (1992). An expert prototype that determines corporate tax
status and liabilities. Financial & Accounting Systems. 7(4), 15-19.

212
Johnston, J. (1972). Econometric Methods . (2nd ed ). New York, NY: McGraw-
Hill.
Judge, G.G, Griffiths, W.E., Hill, R. C., & Lee, T.-C. (1980). The Theory and
Practice of Econometrics. New York, NY: John Wiley and Sons.
Karels, G.V., & Prakash, A.J. (1987). Multivariate normality and forecasting of
573-593.
Kattan, M.W., Adams, D.A., & Parks, M S. (1993). A comparison of Machine
Learning with human judgment. Journal of Management Information Systems.
9(4), 37-57.
Kim, H., & Koehler, G.J. (1995). Theory and practice of decision tree induction.
Omega. International Journal of Management Science. 23(6), 637-652.
Klecka, W.R. (1980). Discriminant Analysis. Beverly Hills, CA: Sage Publications.
Koehler, G.J. (1989). Characterization of unacceptable solutions in LP Discriminant
Analysis. Decision Sciences. 20, 239-257.
Kryzanowski, L., Galler, M., & Wright, D W. (1993). Using artificial neural networks
to pick stocks. Financial Analysts Journal. 4914). 21-27.
Kuncheva, L. (1993), Genetic algorithm for feature selection for parallel classifiers
Information Processing Letters. 46(4), 163-168.
Kwon, T.M., & Feroz, E.H (1996). A multilayered perceptron approach to prediction
of the SECâ€™s investigation targets IEEE Transactions on Systems. Man, and
Cybernetics. 7, 1286-1290.
Lacher, R.C., Coats, P.K., Sharma, S.C., & Fant, L.F. (1995). A neural network for
classifying the financial health of a firm. European Journal of Operational
Research. 85, 53-65.
Langley, P. (1996) Elements of Machine Learning. San Francisco, CA: Morgan
Kaufman Publishers, Inc.
Laurance, R. (1988). Bold new theory could make investors' day. Wall Street
Computer Review. 5(9), 8-12.
Leckey, A. (1997). The Momingstar Approach to Investing. New York, NY: Warner
Books, Inc.
Lettau, M. (1994). Essays on Adaptive Learning in Macroeconomics and Finance.
Princeton University, unpublished dissertation.

213
Lettau, M. (1997). Explaining the facts with adaptive agents: The case of mutual fund
flows. Journal of Economic Dynamics & Control. 21(7), 1117-1147.
Lim, T., Loh, W., & Shih, Y. (1997). An Empirical Comparison of Decision Trees
and Other Classification Methods. University of Wisconsin, unpublished
Technical Report 979.
Lynch, P., & Rothchild, J. (1993). Beating the Street. New York, NY: Simon &
Schuster.
Mahfoud, S.,& Mani, G. (1995), Genetic algorithms for predicting individual stock
performance. In Anonymous The Third International Conference on Artificial
Intelligence Applications on Wall Street: Gaithersburg, MD: Software
Engineering Press.
Malkiel, B.G. (1990). A Random Walk Down Wall Street (5 ed). New York, NY:
W. W. Norton & Company.
Malkiel, B.G. (1995). Returns from investing in equity mutual funds 1971 to 1991.
Journal of Finance. 50(2), 549-572.
Manly, B.F.J. (1995). Multivariate Statistical Methods: A Primer. (Second ed ).
Chapman & Hall.
McGough, R. (1992). Fidelity's Bradford Lewis takes aim at indexes with his 'neural
network'computer program. The Wall Street Journal. 10/27/92, p.Cl.
Messier, W.F., Jr., & Hansen, J.V. (1988). Inducing Rules for Expert System
Development: An Example Using Default and Bankruptcy Data. Management
Science. 34(12), 1403-1415.
Michie, D. (1987). Current Developments in Expert Systems. In J. R. Quinlan (Ed ),
Publishing Company.
Michie, D. (1989). Problems of Computer-Aided Concept Formation. In J. R. Quinlan
(Ed ), Applications of Expert Systems. Vol. 2. (pp. 310-333). Reading, MA:
Wesley Publishing Company
Mingers, J. (1989). An Empirical Comparison of Selection Measures for Decision-
Tree Induction. Machine Learning. 3,319-342.
Mitchell, M. (1997) An Introduction to Genetic Algorithms. Cambridge, MA: The
MIT Press.

214
Morningstar (1992). Mominestar Mutual Funds: User's Guide. Chicago, IL:
Momingstar.
Morrison, D.F. (1990). Multivariate Statistical Methods. (3 ed). New York:
McGraw-Hill, Inc.
Natarajan, B.K. (1991). Machine Learning: A Theoretical Approach. San Mateo,
CA: Morgan Kaufmann Publishers, Inc.
Patel, ]., Zeckhauser, R., & Hendricks, D (1991). The Rationality Struggle:
Illustrations from Financial Markets The American Economic Review. 81(2),
232-236.
Phelps, S. (1995). The Determinants of Mutual Fund Flows. Unpublished work,
University of Central Florida
Phelps, S., & Detzel, L. (1997). The nonpersistence of mutual fund performance.
Quarterly Journal of Business & Economics, 36(2), 55-69.
Piramuthu, S., Kuan, C., & Shaw, M.J. (1993). Learning algorithms for neural-net
decision support. ORSA Journal on Computing. 5(4), 361-373
Quinlan, J.R. (1979). Discovering Rules of Induction From Large Collections of
Examples. In D. Michie (Ed ), Expert Systems in the Micro Electronic Age, (pp.
168-201). Edinburgh, Scotland: Edinburgh University Press.
Quinlan, J.R (1986). Induction of Decision Trees Machine Learning. 1. 81-106
Quinlan, J.R (1987). Simplifying Decision Trees. International Journal of Man-
Machine Studies. 27, 221-234.
Quinlan, J.R. (1990). Decision Trees and Decisionmaking. IEEE Transactions on
Systems. Man, and Cybernetics. 20(2), 339-346.
Quinlan, J.R. (1993). C4.5: Progams for Machine Learning. San Mateo, CA:
Morgan Kaufmann Publishers, Inc.
Quinlan, J.R., Compton, P.J., Horn, K.A., & Lazarus, L. (1987). Inductive
Knowledge Acquisition: A Case Study. In J. R. Quinlan (Ed ), Applications of
Company
Radcliffe, R.C. (1994). Investment: Concepts. Analysis, and Strategy. (4 ed ). New
York, NY: HarperCollins College Publishers.
Rich, E., & Knight, K. (1991). Artificial Intelligence. (Second ed ). New York:
McGraw-Hill, Inc.

215
Riolo, R.L (1992). Survival of the fittest bits. Scientific American. 267(11. 114-116.
Rumelhart, D.E., & McClelland, J.L. (1986). Parallel Distributed Processing.
Cambridge, MA: MIT Press.
Salchenberger, L.M., Cinar, E.M., & Lash, N.A. (1992). Neural networks: a new tool
for predicting thrift failures. Decision Sciences. 23(4), 899-916.
SAS Institute (1992) SAS/STAT User's Guide. Version 6. (4 ed.) Cary, NC: SAS
Institute Inc.
Schreiber, N. (1984). Artificial intelligence in finance: a challenge to put mind into
matter. Wall Street Computer Review. 12(October), 75-77.
Sena, J.A., & Smith, L.M. (1987). A sample expert system for financial statement
analysis. Journal of Accounting & EDP. 3(2), 15-22.
Sestito, S., & Dillon, T.S. (1994). Automated Knowledge Acquisition. New York:
Prentice Hall, Inc.
Shaw, M.J., & Gentry, J.A. (1988). Using an expert system with inductive learning to
evaluate business loans. Financial Management. Autumn, 45-56.
Smyth, P., & Goodman, R.M. (1992). An Information Theoretic Approach to Rule
Induction from Databases. IEEE Transactions on Knowledge and Data
Engineering. 4(4), 301-316.
Tam, K.Y. (1991) Applying rule induction to stock screening. Proceedings of the First
International Conference on Artificial Intelligence Applications on Wall Street.
Los Alamitos, CA: The IEEE Computer Society Press.
Tam, K.Y., & Kiang, M.Y. (1992). Managerial applications of neural networks: The
case of bank failure predictions. Management Science. 38(7), 926-947.
Teweles, R.J., & Bradley, E.S. (1987). The Stock Market. (5 ed.) New York, NY:
John Wiley & Sons.
Trippi, R.R., & DeSieno, D. (1992). Trading equity index futures with a neural
network. Journal of Portfolio Management. 19(1), 27-33.
Trippi, R R., & Lee, J.K (1996). Artificial Intelligence in Finance & Investing. (Rev.
ed). Chicago, IL: Irwin Professional Publishing.
Trippi, R.R., & Turban, E. (1996). Neural Networks in Finance and Investing.
(Revised ed ). Chicago,IL: Irwin Professional Publishing
Weiss, S.M., & Kulikowski, C. A. (1991). Computer Systems That Learn. San Mateo,
CA: Morgan Kaufmann Publishers, Inc.

216
Wilson, C.J., & Koehler, G.J. (1986). Pros & cons of expert systems. Business
Software Review. 5(12), 38-42.
Wesley Publishing Company, Inc.
Wittkemper, H , & Steiner, M. (1996). Using neural networks to forecast the
systematic risk of stocks. European Journal of Operational Research. 90(3), 577-
588
Yoon, Y., Guimaraes, T., & Swales, G. (1994). Integrating artificial neural networks
with rule-based expert systems. Decision Support Systems. 11(5), 497-507.

BIOGRAPHICAL SKETCH
Robert C. Norris, Jr. received his Bachelor of Science degree in chemistry in 1970
from Frostburg State University in Maryland. He received the Master of Business
Administration in 1979 from George Mason University in Virginia. He will receive the
Doctor of Philosophy in decision and information sciences, with specialization in
management information systems, in December 1997, from the University of Florida.
Bob has worked extensively in local government in Northern Virginia in a variety
of positions leading up to City Manager of Fairfax, where he won a statewide award for
establishing a commuter bus service to the District of Columbia. He also served as the
Executive Director of the City of Fairfax Industrial Development Authority. He worked
domestically and internationally for Public Administration Service as a consultant for
information systems and their management, and has been employed as a Visiting
Assistant Professor of MIS at the University of North Florida.
His research interests are in the application of artificial intelligence to finance and
the strategic use of information systems in organizations.

I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.
Gary J. Koehler, Chairman
John B. Higdon Eminent Scholar of Decision and
Information Sciences
I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.
Associate Professor of Finance, Insurance, and Real
Estate
I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.
Richard A. Elnicki
Professor of Decision and Information Sciences
I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.
Patrick A. Thompson
Lecturer of Decision and Information Sciences
This dissertation was submitted to the Graduate Faculty of the Department of
and to the Graduate School and was accepted as partial fulfillment of the requirements for
the degree of Doctor of Philosophy.
December 1997
