Using decision trees and feature construction to describe changing consumer life-styles and expectations

Material Information

Title: Using decision trees and feature construction to describe changing consumer life-styles and expectations
Creator: Major, Raymond L., 1955-
Physical Description: xv, 250 leaves : ill. ; 29 cm.
Genre: bibliography (marcgt); theses (marcgt); non-fiction (marcgt)
Notes: Thesis (Ph. D.)--University of Florida, 1994. Includes bibliographical references (leaves 242-249).
Statement of Responsibility: by Raymond L. Major.

Record Information

Source Institution: University of Florida
Rights Management: All applicable rights reserved by the source institution and holding location.
Resource Identifier: aleph - 002019644; notis - AKK7093; oclc - 32801377

Full Text







Copyright 1994


Raymond L. Major

To my mother, Pearl Simmons Major, for all her encouragement and support, and to my

deceased father, John Willie Major, for his wisdom in helping me face the

challenges of life.


I am deeply indebted to many people who helped my dream become a reality.

First of all, my sincerest thanks go to Dr. Israel Tribble Jr. and all of his staff at the

Florida Education Fund. Their guidance, moral, and especially financial support truly

helped me cope with many of the problems and frustrations I encountered during my

graduate experience at the University of Florida.

I am also thankful to all of my committee members: Professors Gary Koehler and

Selwyn Piramuthu for serving as chair and cochair of my committee; Professors Selcuk

Erenguc and Pat Thompson for their guidance and support; and Professor Dave Denslow

for his friendship and helpful insights.

Thanks go to members in the College of Business who have helped me in one way

or another. Professors Harold Benson and Richard Elnicki helped me in adapting to the

world of academia. Professor Henry Tosi in the Department of Management helped me

make that important decision to enter a Ph.D. program. I will always be grateful for his

friendship. Professor Sanford Berg in the Department of Economics helped me see the

beauty of doing quality research. Thanks also go to the Director and staff of the Bureau

of Economic and Business Research who provided many of the resources I required for

performing my research.

Thanks are also due to Dr. Max Parker in the College of Education. His

accomplishments and enthusiasm had a significant influence on me. Dr. Roderick

McDavis, also of the College of Education, was extremely helpful in my acclimation to

the culture of the University of Florida's Graduate School--'Thanks Rod'.

Finally, my most sincere thanks go to my family members. I am most grateful

to my daughters, Deborah and Brenda. Their love, understanding and support truly

helped in making my graduate experience as a single parent a very positive one. I am

also thankful to be blessed with very supportive and encouraging siblings: Sam, Jimmie,

Lamar, Larry, and Sherrian. I thank them for sharing my agonies and ecstasies.


ACKNOWLEDGEMENTS ..........................

LIST OF TABLES ............................................... x

LIST OF FIGURES ............................................ xii

ABSTRACT .................................................... xiv

CHAPTER 1 INTRODUCTION ..................... 1
1.1 The 1990-1991 Recession ................................. 1
1.2 Survey Measures of Consumer Confidence ................... 4
1.2.1 National and Statewide Business-Surveys .............. 4
1.2.2 BEBR Survey Data ............................. 6
1.2.3 Consumer Confidence Metrics ............. .......... 11
1.3 Consumer Expectations and Buying Plans .................... 15
1.4 AI and Feature Construction ............................. 17
1.4.1 Time-Complexity of Feature Construction ............. .20
1.4.2 DUALTREE Feature Construction ................... 22
      Dual decision trees ............................. 24
1.4.3 ID3/C4.5 Decision Trees Using BEBR Sample Data ...... 29
1.5 Thesis and Objectives .................................. 42
1.5.1 Problem Definition ........ ....... ............... 43
1.5.2 Problem Resolution ............................. 43
1.5.3 Implementation and Experimentation ................... 44
1.6 Dissertation Outline ............ ............ ............ 45

2.1 Consumer Consumption of Durable Goods .................... 48
2.1.1 Estimating a Demand for Commodities ................ 49
2.2 Survey Data of Consumer Households ....................... 52
2.2.1 BEBR Survey of Consumer Confidence ............... 54
      BEBR index components ........................... 55
2.3 Describing Purchasers of Durable Goods .................... 56
2.3.1 Experimental Design using the BEBR Business Surveys .. 61

CHAPTER 3 MACHINE LEARNING ................... 64
3.1 Background in Artificial Intelligence ........................ 64
3.2 Reasoning Systems .................................... 64
3.3 Machine Learning ..................................... 66
3.3.1 Views of Learning .................... ......... 66
3.3.2 A Model of a Learning Machine .................... 67
3.3.3 Learning Strategies .............................. 69
3.3.4 Learning Theories .............................. 72
3.3.5 Learning Algorithms ............................. 74
      Representation of learned concepts .............. 74
      Incremental and Non-Incremental Learning ........ 75
      Dealing with uncertainty ........................ 76
      Learning single or multiple concepts ............ 76
      Algorithm's search strategy ..................... 76
      Concept formation goals ......................... 77
      Application domain .............................. 79
      Criterion ....................................... 80
3.4 Concept Description Languages ............................. 80
3.4.1 Binary Trees .................................. 83
3.4.2 Decision Trees ................................. 85
3.4.3 Decision Lists ................................. 86
3.5 C4.5 Machine Learning Programs .......................... 87

4.1 Decision Trees and Feature Construction .................... 90
4.2 Feature-Construction Algorithms .......................... .92
4.2.1 Complexity Measures ............................ 93
4.2.2 CITRE ...................................... 95
4.2.3 FRINGE and Dual FRINGE ....................... 96
      FRINGE feature construction .................. 98
4.3 Time Complexity Models ................................ 99
4.3.1 Probabilistic Models ............................ 100
4.3.2 Algorithmic Models ........................... 101
      Bounded rank decision trees .................. 102
4.3.3 Research Model ............................... 105
      Tree-construction component .................. 107
4.4 Finding New Features ................................. 110
4.4.1 Searching a Feature Space ................. ...... 110
4.5 Feature-Representation Models .............. ......... 116
4.5.1 OCCAM's RAZOR ................ .............. 117
4.5.2 MDLP ......................................... 118
4.5.3 Boolean Formulae ............................. 119
      Representation classes ....................... 120
      Computation models ........................... 120

4.6 Feature-Construction Models ............................ 122
4.6.1 Exhaustive Approach ........ ................... 122
4.6.2 Binary Tree Construction ........................ .125
4.7 Dual Trees .......................................... 126
4.7.1 Properties of Dual Trees ......................... 127
4.8 DUALTREE Feature Construction ........................ 129
4.8.1 Feature Construction ............................ 130
4.8.2 Validation and Verification .................. 131
      Procedural framework ......................... 134
      Claims and proofs ............................ 135
4.8.3 Extensions for DUALTREE ...................... 137
      Binarizing nominal and continuous data ....... 137
      Forming features with binarized data ......... 139

5.1 DUALTREE's Representation Model ...................... 144
5.1.1 DUALTREE's Adjacency-Structure ........... ..... 145
5.1.2 Searching and Sorting Feature Names ................ 146
5.2 Graph Processing in DUALTREE ......................... 148
5.3 Feature Construction with DUALTREE ..................... 149
5.3.1 Building Class Successors .................... 150
      Finding features ............................. 152
5.3.2 Building the Dual of the Dual ................ 154
      Finding terminal features .................... 156
5.3.3 Finding New Features ........................... 157
5.4 DUALTREE's Time Complexity ......................... 158

CHAPTER 6 EXPERIMENTS .................... 159
6.1 Experimental Design .................................. 160
6.1.1 Experimental Technique ......................... 161
6.1.2 Presentation of Results .......................... 163
6.2 Feature Construction Using Binary Data .................. 163
6.2.1 DNF Functions Test Results ................... 166
      Useful features of DNF functions ............. 169
6.2.2 Multiplexor and Parity Functions Test Results .......... 173
      Useful features of multiplexor and parity functions .... 176
6.2.3 Summary of Results for Binary Data ................ 177
6.3 Feature Construction Using Nominal Data ................... 182
6.3.1 Test Results using Nominal Data ................... 185
6.4 Feature Construction With Continuous Data .................. 189
6.4.1 Results using Continuous Data ..................... 190
6.5 DUALTREE Descriptions of Consumer Life-Styles ............. 192
6.5.1 Empirical Design and Method .................... 193

6.5.2 Demographic Descriptions of Financial Conditions ....... 195
      Descriptions of 'the same' and unsure consumers ........ 201
6.5.3 Describing Consumer Buying Plans ............ ..... 209
6.6 Complexity Results .................. ................. 214

CHAPTER 7 CONCLUSIONS .................... 219
7.1 Summary .......................................... 219
7.2 Attainment of Goals .................................. 222
7.2.1 Problem Definition ............................. 222
7.2.2 Problem Resolution ............................ 224
7.2.3 Implementation and Experiments ................... 226
7.3 Future Research ..................................... 228
7.3.1 Improved Model Development ..................... 228
7.3.2 Problems in the Study of Feature Construction ......... 229



REFERENCE LIST ........................................... 242

BIOGRAPHICAL SKETCH .................................... 250




















LIST OF TABLES

1.1 Distribution of respondents' answers for buying a car ........ 38
2.1 Income and age distributions of financial confidence ......... 59
2.2 Sex and party distributions of confidence .................... 61
6.1 Boolean target functions ..................................... 165
6.2 Class distributions for binary data-sets ..................... 166
6.3 C4.5 results using DUALTREE's features ....................... 168
6.4 Feature-formation results for DNF functions .................. 170
6.5 C4.5 results using multiplexor and parity functions .......... 175
6.6 Multiplexor and parity feature-formation results ............. 177
6.7 C4.5 results using nominal data-sets ......................... 186
6.8 Feature-formation results using nominal data-sets ............ 188
6.9 Continuous and nominal data results .......................... 192
6.10 Training and test set class-distributions ................... 194
6.11 C4.5 results using BEBR data-sets ........................... 196
6.12 Confusion matrices for the four data sets ................... 198
6.13 Features in a virtual tree for the SAME class ............... 202
6.14 Demographic descriptions of consumer buying plans ........... 204
6.15 Descriptions of 'better' and unsure respondents ............. 206















Other 'better' and unsure consumer-descriptions ............... 207

Other consumer descriptions regarding buying plans ............ 208

Sample distributions using GBTIME ....................... 212

More C4.5 results for the BEBR data ....................... 213

Confusion matrices for consumer views on buying ............. 214

A list of consumer descriptions ........................... 215

Complexity results .................................... 218


LIST OF FIGURES

Figure 1.1 Measures of consumer confidence ......................... 5

Figure 1.2 Two measures of personal financial expectations ................ 6

Figure 1.3 Consumer expectations of personal finances ................... 8

Figure 1.4 Percentages of financially unsure respondents .................. 9

Figure 1.5 Expectations of national conditions and buying plans ............ 10

Figure 1.6 Indexes determined by 'don't know' combinations .............. 12

Figure 1.7 Consumer car purchase plans ............................. 16

Figure 1.8 A decision tree and its dual .............................. 25

Figure 1.9 The re-oriented dual tree ................................ 26

Figure 1.10 Forming features in the dual tree ................... ...... 29

Figure 1.11 A dual tree after feature construction ....................... 30

Figure 1.12 The dual of the dual tree ................................ 31

Figure 1.13 Decision trees with rank equal to 1 ......................... 32

Figure 1.14 A decision tree for the BEBR data .... ................... 39

Figure 1.15 A decision tree using DUALTREE features ............... ... 41

Figure 2.1 CCI's for Jan '92 May '93 ............................... 56

Figure 2.2 Consumers' financial confidence during 1992 .................. 58

Figure 3.1 Learning machine model ................................. 68

Figure 3.2 Learning algorithm considerations

Figure 4.1 A decision tree

Figure 4.2 A reduced decision tree with rank = 1

Figure 4.3 3rd smallest decision trees of rank 1

Figure 4.4 The re-oriented dual tree

Figure 6.1 Comparison of features used to features formed

Figure 6.2 Comparison of edges with features to total edges

Figure 6.3 Feature formation-rates of multiplexor and parity functions

Figure 6.4 Edge-usage results using multiplexor and parity functions

Figure 6.5 Performance results for binary data-sets

Figure 6.6 Tests using DUALTREE features

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

USING DECISION TREES AND FEATURE CONSTRUCTION TO DESCRIBE
CHANGING CONSUMER LIFE-STYLES AND EXPECTATIONS

By

Raymond L. Major

August 1994

Chairperson: Gary J. Koehler
Major Department: Decision and Information Sciences

Using artificial intelligence methods to acquire expert knowledge inductively is

a key area of interest in Expert-Systems Development. This dissertation investigates the

theoretical properties of feature-construction learning algorithms and uses them to develop

an empirical model to examine several issues related to the 1990-1991 recession.

Empirical results show that feature construction can improve the performance of an

induced decision tree. We develop an analytical model of learning with feature

construction. Our model characterizes the time complexity of learning boolean functions

with polynomial size DNF expressions, when bounded-rank decision trees are used as a

concept description language. Results show that limiting the number of new features may

improve the computational efficiency of feature construction. Our procedure uses the dual

of a decision tree when forming new features. We then use our empirical model to: (1)


describe changes in consumer life-styles and expectations for time periods associated with

the 1990-1991 recession, and, (2) show that current practice for creating quantitative

measures of consumer confidence is sometimes inappropriately used. Finally, we examine

tradeoffs between expert comprehensibility and formal power, when choosing a

representation to use in expert-system applications.


CHAPTER 1
INTRODUCTION

Knowledge acquisition is a process currently undergoing extensive research by

many information scientists. In this dissertation we do two things. First, we develop a

new method for feature construction. Next, we use our new method to create a

knowledge base of information associated with a period of recent economic activity in the

United States. Economists and business analysts may use this information to better

understand certain decision criteria used by a diverse group of consumers. Before

describing a way to build the knowledge base, we first discuss the kind of knowledge we

need and how this information can be used.

1.1 The 1990-1991 Recession

Economists and business analysts are currently exploring many questions related

to the 1990-1991 recession and the recovery period following it. The National Bureau

of Economic Research designates the last two quarters of 1990 and the first quarter of

1991 as a period of negative growth for the U.S. economy (Blanchard 1993; Hall 1993).

Suggested causes for the recent recession include price shocks, higher tax rates, a decrease

in defense spending, the end of the Cold War, consumer depression, and the Iraqi invasion

of Kuwait. However, the literature suggests that a shock to consumption largely

determined the recessionary episode (Blanchard 1993; Hall 1993; Hansen and Prescott



1993; Perry and Schultze 1993). Negative consumption shocks decrease consumption of

market goods and services below trend. Perry and Schultze (1993) state:

We have been able to tag the recent recession and subsequent sluggish recovery
as clearly unusual in that--unlike its predecessors--it was not primarily driven by
a combination of policy changes and autoregressive responses by other forces
weakening total demand. We have pinpointed the weakness in consumption as the
most important locus of negative shocks, and have suggested that it arose in part
from the depressing effect on consumer confidence stemming from weak
employment growth and from the unusual prevalence of permanent--as contrasted
with temporary--layoffs. (193)

The literature leaves many questions concerning the consumption shock unanswered.

Most researchers use traditional statistical methods in their empirical models for studying

various questions regarding the recent recessionary episode. These models usually require

(1) quantitative information in the form of time series or cross-sectional data, and (2)

estimates of all unknown parameters. However, using these models, it is sometimes

difficult to examine important questions, such as how changes in attitudes of consumers

before, during, and after the recession varied by age, sex, and income. Additionally,

several researchers suggest that established models are not helpful for exploring questions

such as whether changing demographics and life-cycle factors leading to lower savings

rates are partly responsible for the slow recovery (Hall 1993; Hansen and Prescott 1993).

One reason for the frailty of statistical models is that when they are applied to 15

to 20 variables and the interrelationships between them, the maintained assumptions

required for estimation are implausible. Palies and Philip (1989) give additional

disadvantages associated with traditional statistical models including the following: (1)

the analysis can be qualitative or based on heuristic rules; (2) they are of limited use for

very short term forecasting; and (3) classifying the variables into endogenous, exogenous


and out of model variables depends on the feasibility of computing the resulting models

rather than on economic theory. Concerning economic models based on neoclassical

demand theory, Hodgson (1992) suggests that neoclassical economics is deficient because

of its narrow, utilitarian base, and because of its general treatment of time and analysis

of economic processes. Hall (1993) suggests a need for using empirical models without

the neoclassical curvature conditions to examine the recent recession. Moreover, Gianotti

(1989) states that the trend toward the formalization of less structured problems and the

increased emphasis on individual attitudes and expectations create a need for new

methods to represent and manipulate symbolic knowledge.

I propose to develop an empirical model of learning, using Artificial Intelligence

methods and techniques. A major hypothesis of this research is that this system can

produce useful descriptions for examining questions such as the ones previously

mentioned, using data collected from business surveys of consumer attitudes and

expectations. Our model offers the advantage of being able to examine the

interrelationships among a relatively large set of variables or attributes. Additionally,

Palies and Philip (1989) state that a knowledge-based approach gives a framework

allowing for (1) the explicit description of the economic agents' process and the

economists' behavior, and, (2) dealing with the quantitative and qualitative scaled

variables. Palies and Philip (1989) overcome certain limitations of econometric models

by linking the models with an expert system to explain and compute several exogenous

variables, taking into account the endogenous variables influencing them.

1.2 Survey Measures of Consumer Confidence

Many researchers use surveys of consumer attitudes and expectations as a source

of information for studying the recessionary episode (Blanchard 1993; Hall 1993; Perry

and Schultze 1993). Gianotti (1989) describes business surveys as qualitative and

quantitative data reporting the opinion of economic entities (i.e. firms, families, etc.)

about the past trend, the current status, and the expected short-term variations of several

key indicators. Survey data of consumer attitudes and expectations is normally used to

create an index of consumer confidence to predict changes in consumer purchase-rates of

durable goods (Juster 1959). One usually obtains an index-value by taking the mean of

several component-values.
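This averaging step can be sketched in a few lines of Python. The component names and values below are invented for illustration only; they are not BEBR's actual components or figures:

```python
# Composite confidence index as the mean of component index-values.
# The five components and their values here are hypothetical.
components = {
    "current_personal_finances": 95.0,
    "expected_personal_finances": 102.0,
    "buying_conditions": 88.0,
    "us_outlook_1yr": 79.0,
    "us_outlook_5yr": 84.0,
}

# The index-value is simply the mean of the component-values.
composite_index = sum(components.values()) / len(components)
print(composite_index)  # 89.6
```

Any weighting scheme beyond the plain mean would change the result; the survey producers discussed below each apply their own component forms.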

1.2.1 National and Statewide Business-Surveys

Today, several indices of consumer confidence regularly appear in a variety of

business publications such as Business Week and the Wall Street Journal. Three widely

publicized national-measures of consumer confidence are available from (1) the University

of Michigan, (2) the Conference Board, and (3) ABC News and Money magazine. A

statewide measure is the Consumer Confidence Index (CCI) published monthly by the

Bureau of Economic and Business Research (BEBR) at the University of Florida.

Figure 1.1 shows the BEBR's CCI and the University of Michigan's Index of Consumer

Sentiment (ICS) from the first quarter of 1989 through the last quarter of 1992. We see

that both measures track fairly well together and drop from high to low levels. The

survey and the procedure for constructing the index employed by the BEBR, are patterned

[Chart omitted: Consumer Confidence, Jan '89 to Dec '92, quarterly; series include the BEBR CCI and Michigan's ICS.]

Figure 1.1 Measures of consumer confidence

after the University of Michigan's national index. Thus, we may infer from Figure 1.1

that the confidence of Floridians is representative of the confidence of consumers

nationally. Figure 1.2 shows the index-component-values for the future-financial-

condition component, used in constructing each respective index. From the figure, we

observe that the two curves behave similarly during the recessionary episode and have a

high degree of correlation between them throughout the time period. This information

strengthens our previous conclusion concerning the representativeness of Floridians. We


[Chart omitted: Jan '89 to Dec '92, quarterly; series: BEBR's component and Michigan's component.]

Figure 1.2 Two measures of personal financial expectations

use the BEBR's survey data over the time period from 1989 through 1992 for analysis

in this research.

1.2.2 BEBR Survey Data

To construct its composite index, the BEBR uses five components. Three of these

are indicators of consumer's personal finances and buying plans, and the other two are

indicators of consumer expectations of the national economy. The survey questions for


these components have four alternative answers: 'better', 'same', 'worse', and 'don't

know', or 'good', 'uncertain', 'bad', and 'don't know'. Figure 1.3 shows the percentage

of answers given by respondents for the first two component questions. The figure shows

a curious change in the percentages of unsure respondents during the last three quarters

of 1990--there is a sudden change in the percentage of respondents answering 'don't

know'. This shift is very pronounced on the first component since the level of the curve

before and after the episode is around zero. The phenomenon appears to be longer lasting

for respondents who were unsure of their future financial condition. We see that for the

second component in Figure 1.3, the percentage of unsure respondents stayed above its

average level preceding the episode, until the second quarter of 1991. Also, for both

components during the episode, the percentage of unsure respondents seems to be

negatively correlated with both the percentage of respondents who felt that their financial

condition will remain unchanged, and those respondents expecting to be financially better

off. We can infer from this discussion that useful information related to the episode may

be gathered by examining factors related to respondents who were unsure of their current

and future financial conditions. We choose to focus on consumer expectations of personal

finances--and not buying plans or national expectations--to see how certain demographic

descriptions of consumer households changed over the time period. Figure 1.4 again

shows the percentages of respondents who were unsure of both their current and future

financial conditions from the first quarter of 1990 to the third quarter of 1991. The figure

shows a high degree of correlation between the two curves. Thus we can say that the

'don't know' category for these components may contain key 'bits' of information which


[Charts omitted: response percentages, Jan '89 to Dec '92, quarterly; series include SAME, WORSE, and DON'T KNOW.]

Figure 1.3 Consumer expectations of personal finances

[Chart omitted: Financially Unsure Respondents, Jan '90 to Jul '91, quarterly.]

Figure 1.4 Percentages of financially unsure respondents

may be potentially informative of the ensuing economic downturn. Incidentally, the

percentages of unsure respondents for the other three components, shown in Figure 1.5, do

not undergo rapid changes in 1990. However, for these groups of answers, we observe an

interesting pattern in the percentage-levels for respondents whose expectations were good

vs. those who had bad expectations. We see that this phenomenon appears both in the

episodal period and the following recovery period.

[Charts omitted: US 1-Year Condition, US 5-Year Condition, and buying plans, Jan '89 to Dec '92, quarterly; series: GOOD, UNCERTAIN, BAD, DON'T KNOW.]

Figure 1.5 Expectations of national conditions and buying plans


1.2.3 Consumer Confidence Metrics

Given the previous discussion, a question we examine in this dissertation concerns

the current practice for constructing a composite index such as BEBR's CCI. The

respondents' answers are qualitative. This data must be quantified for inclusion in most

quantitative models based on traditional statistical techniques, and common practice for

quantifying the qualitative data employs the use of a balance score (Katona and Mueller

1956; Juster 1959; Didow et al. 1983; Gianotti 1989; Niemira 1992). The procedure

works as follows. First, we must express the survey results as the percentages of

respondents choosing one of three possible alternatives--Better, Same, or Worse (or,

Good, Uncertain, Bad). Let us denote the percentage of respondents answering 'Better'

(or 'Good') as P+; the percentage of respondents answering 'Worse' (or 'Bad') as P-; and let

P= represent the percentage of respondents answering 'Same' (or 'Uncertain'). The

balance score is then (P+ - P-), or the difference between the percentages of two out of

three categories of answers to a set of questions. An index is usually constructed by

computing the balance score, and perhaps adding some constant and/or error term.

Niemira (1992) gives the forms used for computing the three national measures. They

are as follows: (1) the University of Michigan uses a balance plus 100, or (P+ - P-) +

100; (2) assuming P+ ≠ P-, the Conference Board measure is given by [P+ / (P+ - P-)];

and (3) the ABC News poll uses just (P+ - P-). We notice two shortfalls associated with

these approaches. First, a requirement for computing a balance score in this way is that

P+, P=, and P- must sum to 100. Given that there are four categories of responses,

there are two alternatives for satisfying this requirement. The first is that one can simply


discard the data for the 'don't know' category. We have shown that for the time period

we want to study, this category contains potentially useful information! A second

alternative is to group the 'don't know' category with one of the remaining ones for

computing the percentages. Figure 1.6 shows a re-calculated BEBR index associated with

each respective grouping of the 'don't know' category with the other three categories.

[Chart omitted: Re-calculated Confidence Index, combining 'don't know' with other categories, 1989-1992, quarterly; series: Better + Don't Know, Same + Don't Know, Worse + Don't Know.]

Figure 1.6 Indexes determined by 'don't know' combinations


We observe from the figure that the re-calculated index-values are reasonably correlated,

and may be higher or lower, depending on the particular combination we chose.

Grouping the 'don't know' category with the 'worse' category, for example, gives index-

values that are never higher than those obtained when we group the 'don't know' category

with the 'same' category. We also observe from Figure 1.6 that the vertical distance

between the curves before and during the episode, is significantly larger than the

corresponding distances during the recovery period. Some researchers propose that the

'unsure' respondents may resemble the pessimistic replies in their effect (Katona and

Mueller 1956). Didow et al. (1983) propose that consumer unsureness may in fact have

an optimistic or pessimistic connotation with respect to overall confidence. Common

practice for computing a balance score groups the 'don't know' responses with the 'same'

responses. From a previous discussion, we saw that these two categories appear to be

negatively correlated. This means that if we were to study a combined group for them,

then any key 'bits' of information they may contain individually may become masked in

the combined group. Hence, we use neither of these two alternatives and instead prefer

to study each category separately.
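A minimal Python sketch of the two regrouping alternatives just discussed; the response shares are invented, and the index form used is the Michigan-style balance plus 100 described in Section 1.2.3:

```python
# Hypothetical response shares (percent) for one survey question.
raw = {"better": 30.0, "same": 40.0, "worse": 20.0, "dont_know": 10.0}

def michigan_style_index(p_plus, p_minus):
    """Balance score plus 100: (P+ - P-) + 100."""
    return (p_plus - p_minus) + 100.0

# Alternative 1: discard 'don't know' and renormalize so shares sum to 100.
kept = {k: v for k, v in raw.items() if k != "dont_know"}
scale = 100.0 / sum(kept.values())
renormalized = {k: v * scale for k, v in kept.items()}

# Alternative 2: merge 'don't know' into each other category in turn and
# observe how the resulting index-value depends on the chosen grouping.
for target in ("better", "same", "worse"):
    merged = dict(raw)
    merged[target] += merged.pop("dont_know")
    print(target, michigan_style_index(merged["better"], merged["worse"]))
```

With these shares, grouping 'don't know' with 'worse' yields an index (100) no higher than grouping it with 'same' (110), mirroring the pattern observed in Figure 1.6.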

A second shortcoming associated with balance scores is that they completely

ignore the percentage of respondents who feel the 'same', or P=. Many researchers

debate this method because ignoring P= implies that different percentage weights can

result in the same score (Didow et al. 1983; Gianotti 1989). By choosing to examine

each of the four categories individually, we immediately gain some advantages over

commonly used approaches. We do not undertake the task, in this work, of determining


an appropriate set of weights to use with the four categories for the purpose of

determining a consumer confidence metric. Didow et al. (1983) studied the problem of

finding a set of weights for the four categories (i.e., 'better', 'same', 'worse', 'don't know'),

and give results from using an alternating least-squares optimal scaling model for

evaluating scales based on mixed metric responses. The authors used data from two

national surveys of consumer finances conducted by the University of Michigan's Survey

Research Center in 1971 and 1973. Using the 'PRINCIPALS' (principal components

analysis via alternating least squares) algorithm in their study, Didow et al. (1983) give

empirical results having inconsistencies with Michigan's ICS construction procedure, and

demonstrate the potential of using their alternating least-squares optimal scaling model

for developing a better measure of consumer confidence.

In this research, we seek useful demographic descriptions for the four categories

representing the respondents' expectations of their future financial condition. We prefer

descriptions in terms of such attributes as age, income, and party affiliation. Many

researchers commonly use demographics as independent variables in their analytical and

empirical models (Ketkar and Cho 1982; Wagner and Hanna 1983; Kent 1992; Morwitz

et al. 1993; Sawtelle 1993). Next to income, attributes such as age, marital status,

employment status, etc., can considerably influence consumer-household expenditures for

market goods and services (Wagner and Hanna 1983). Thus, an hypothesis we propose

is: given useful demographic descriptions for the four categories of responses

representing the respondents' future financial condition, descriptions for the 'don't know'

category are unique, in that they are unlike any of the remaining three, to a certain


extent, and should not be combined with one of the remaining three to examine certain

changes taking place during the 1990-1991 recession. 'Useful' descriptions, in this

context, are descriptions produced by a credible reasoning procedure, that are also easily

interpretable by humans.

1.3 Consumer Expectations and Buying Plans

Economists and business analysts also examine data associated with changes in

consumer demand for automobiles occurring during the episode. Hall (1993) suggests

that consumers' unwillingness to buy automobiles was a significant factor associated with

the consumption shock. Perry and Schultze (1993) state that during the recovery period,

motor vehicle purchases is an area where consumer spending was substantially over-

predicted. A question in the BEBR survey asks whether anyone in the household plans

to buy a car or truck. Figure 1.7 shows the percentages of respondent-answers for the

alternative-answers of 'yes', 'maybe', and 'don't know'. From the figure we see that,

during the recessional period, the percentage of 'yes' respondents dropped about six

percentage points before leveling off about five points lower. A similar analysis of the

percentage of 'no' respondents showed that it increased by ten points before leveling off

at a level five points higher. From these observations, and given our previous discussion

concerning information for the 'don't know' category, a plausible hypothesis is that

changes taking place in the 'don't know' category of consumer expectations for personal

finances, buying plans, and national conditions, best explain the change in consumers'

unwillingness to purchase cars. Since this category is not considered in most methods


for computing a consumer confidence metric, this may explain why consumer spending

was over-predicted for motor vehicle purchases.

[Figure: percentages of 'YES', 'MAYBE', and 'DON'T KNOW' answers to the car-purchase question, plotted quarterly from Q1 1989 (Jan '89) through Q4 1992 (Dec '92).]

Figure 1.7 Consumer car purchase plans

In order to examine this premise, we require information relating the respondents' plans

for buying a car to their financial, buying-plan, and national expectations. Thus,

there are four guiding principles for this research on how to obtain the information we

seek: (1) the domain knowledge is induced from the observed data; (2) the knowledge is

obtained using a credible reasoning mechanism; (3) a decision theoretic approach is used


to form descriptions; and (4) these descriptions are easily interpretable by humans. A

working premise of this dissertation is that we can successfully use Artificial Intelligence

(AI) methods and techniques to produce the information we require. Researchers in many

different research communities employ different perspectives and methods for using AI.

The next section describes approaches used in our research.

1.4 AI and Feature Construction

AI-based knowledge-acquisition procedures usually examine examples of solved

cases and give general decision rules in terms of a pre-defined structure. Knowledge

acquisition is commonly referred to in the field of machine learning as concept learning

where the fundamental goal is, for example, to extract descriptions that best describe the

sample data. Weiss and Kulikowski (1991) make three key points of interest: (1) machine

learning methods can give solutions in formats easily understood and more compatible

with human reasoning; (2) from the perspective of minimizing average error rates,

learning systems can be viewed as attempts to approximate Bayes rule; and (3) decision

trees are currently the most highly developed machine-learning technique for partitioning

samples into a set of covering decision rules.

The ID3 algorithm developed by J. R. Quinlan is an extensively studied technique

for inducing a decision tree from a set of examples. Quinlan (1990a) states:

Decision trees provide a powerful formalism for representing
comprehensible, accurate classifiers. The top-down method of constructing them
is computationally undemanding. When they are used, information regarding
attribute values is sought only as required, making them attractive in a diagnostic
context. They have also been found useful as components of intelligent systems...


We use decision trees as a general model for the learning system we develop. A decision

tree is a structure consisting of nodes and branches where each node represents a test or

decision. There is a branch attached to the node for every possible outcome of the test.

Thus, performing the test gives a partition of two or more disjoint sets covering the

outcomes of the test. The tree branches to another node, according to the outcome of the

test, until a leaf or terminal node is reached. The terminal leaves correspond to sets of

the same class or category.
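The structure just described can be sketched in a few lines of code. The classes and example tree below are purely illustrative (the attribute indices and class labels are hypothetical, not the BEBR trees discussed later):

```python
# Minimal decision-tree sketch: an internal node tests one boolean attribute
# and has a branch for each outcome; a leaf carries a class label.

class Leaf:
    def __init__(self, label):
        self.label = label

class Node:
    def __init__(self, attr, on_false, on_true):
        self.attr = attr          # index of the boolean attribute to test
        self.on_false = on_false  # branch followed when the test is 0
        self.on_true = on_true    # branch followed when the test is 1

def classify(tree, instance):
    """Branch on each test outcome until a terminal node is reached."""
    while isinstance(tree, Node):
        tree = tree.on_true if instance[tree.attr] else tree.on_false
    return tree.label

# Hypothetical concept (x1 AND x2): class 'P' only when both attributes hold.
tree = Node(0, Leaf('N'), Node(1, Leaf('N'), Leaf('P')))
print(classify(tree, [1, 1]))  # -> P
print(classify(tree, [1, 0]))  # -> N
```

Every instance reaches exactly one leaf, which is the partitioning property relied on throughout this chapter.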

Quinlan (1993) gives two shortcomings related to using decision trees: (1) they

can be cumbersome, complex, and inscrutable due to the specific context established by

the outcomes of tests at antecedent nodes; and, (2) the structure of the tree may cause

individual subconcepts to be fragmented, or appearing twice or more in the tree in a way

making the tree harder to interpret. Quinlan (1993) also gives two ways to avoid this

'replication' problem: create more task-specific attributes, or, use a different structure for

representing the knowledge (i.e., Production Rules).

Pagallo (1990) develops an algorithm that creates more task-specific attributes in

a way such that decision trees constructed using these attributes avoid the replication

problem. The replication problem is a representation shortcoming where duplications of

decision-sequences, or patterns, exist for determining truth settings (Pagallo 1990). The

idea of creating new task-specific attributes is commonly referred to as feature

construction. Feature construction is a technique for creating new features which are

combinations of the existing attributes. This is a type of representation change, where

each term, or feature, in the concept description is a function of the initial prime


attributes. A major area of focus in this research seeks to improve feature construction.

In our research, we form features that are conjuncts of attributes.
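A conjunctive feature of this kind is easy to sketch; the attribute indices below are hypothetical and serve only to show the representation change:

```python
# Feature construction by conjunction: a new feature is the AND of a chosen
# subset of the existing boolean attributes.

def make_conjunct(attrs):
    """Return a feature that is true when every listed attribute is true."""
    return lambda instance: all(instance[a] for a in attrs)

instance = [1, 0, 1, 1, 0]       # values of the prime attributes x1..x5
f = make_conjunct([0, 2, 3])     # new feature: x1 AND x3 AND x4
g = make_conjunct([0, 1])        # new feature: x1 AND x2
print(f(instance), g(instance))  # -> True False
```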

Feature construction algorithms use an empirical-learning-method to construct new

features from the initial prime attributes. Pagallo (1990) developed an approach,

FRINGE, which was well received by many researchers. The heart of the FRINGE

algorithm for learning DNF concepts works by combining attributes along the positive

paths of a decision tree (Pagallo 1990; Pagallo and Haussler 1990). One drawback to

constructing features in this way is that the learning algorithm requires a long time to

process the examples. FRINGE works by forming features from a given decision tree and

uses the existing and new features to build a decision tree for the next iteration of the

algorithm. Testing large numbers of features when constructing the decision tree

increases the time of each iteration, and this significantly raises the total running time of

the algorithm. Adding to the dilemma is that Pagallo (1990) reports that in one

experiment, only 10% of the total number of new features were actually used by the

algorithm. She suggests two ways to improve the computational efficiency of FRINGE.

The first is, at each iteration, to remove the features that are not useful to the learning

task from the feature set. Pagallo (1990) focuses on using this approach and refers to it

as feature pruning. A second approach is to limit the number of new features included

in the feature-set used to construct the decision tree. This dissertation focuses on this

latter approach.
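A much-simplified sketch of this idea follows. It pairs only the last two tests on each path to a positive leaf, which is one reading of 'combining attributes along the positive paths'; it is an illustration in the spirit of FRINGE, not Pagallo's exact algorithm, and the example tree is hypothetical:

```python
# Trees are nested tuples (attr, false-branch, true-branch); leaves are
# '+' or '-'. New features pair the final two tests on each positive path.

def positive_paths(tree, path=()):
    """Yield the (attr, outcome) test sequence of every path to a '+' leaf."""
    if not isinstance(tree, tuple):
        if tree == '+':
            yield path
        return
    attr, on_false, on_true = tree
    yield from positive_paths(on_false, path + ((attr, 0),))
    yield from positive_paths(on_true, path + ((attr, 1),))

def fringe_features(tree):
    """Form a conjunct from the last two tests on each positive path."""
    feats = set()
    for path in positive_paths(tree):
        if len(path) >= 2:
            feats.add(tuple(sorted(path[-2:])))
    return feats

# Hypothetical tree for (x1 AND x2) OR (x1 AND NOT x2 AND x3):
tree = (1, '-', (2, (3, '-', '+'), '+'))
print(fringe_features(tree))
```

Iterating this step, and rebuilding the tree with the enlarged feature set each time, is what makes the total running time sensitive to how many new features are admitted.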

1.4.1 Time-Complexity of Feature Construction

In this work we focus on the time complexity of feature-construction learning-

algorithms using decision trees as a concept description language. We also examine the

ranks of decision trees. The rank of a binary decision tree is an indicator of the

conciseness of the tree. Note that a binary decision tree is an approximate representation

of a target concept. The tree is concise if, for example, each decision variable in the tree

is a term in the target concept. We regard a decision tree having a rank of one as being

(1) a concise representation of a target concept, and (2) devoid of the 'replication'

problem. Trees of higher ranks reflect a more complex representation for a given

concept. We prefer concise representations, thus, a general goal for forming features is

that we form features that aid in reducing the ranks of decision trees for a target concept.
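The rank just described can be computed recursively from the definition used by Ehrenfeucht and Haussler: a leaf has rank 0, and an internal node whose subtrees have ranks r0 and r1 has rank max(r0, r1) when they differ and r0 + 1 when they are equal. A small sketch, with hypothetical example trees:

```python
# Trees are nested tuples (attr, false-subtree, true-subtree); anything
# else counts as a leaf of rank 0.

def rank(tree):
    if not isinstance(tree, tuple):
        return 0
    r0, r1 = rank(tree[1]), rank(tree[2])
    return max(r0, r1) if r0 != r1 else r0 + 1

# A right-branching 'decision list' stays at rank 1 no matter how deep:
chain = (1, '-', (2, '-', (3, '-', '+')))
print(rank(chain))  # -> 1

# A complete tree over two levels of tests has rank 2:
full = (1, (2, '-', '+'), (2, '+', '-'))
print(rank(full))   # -> 2
```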

Ehrenfeucht and Haussler (1989) give a polynomial learning algorithm for Boolean

Trees, FIND(S,r), that when given a sample, S, of a Boolean function over n Boolean

variables, produces a bounded rank decision tree of rank r that is consistent with S, or

fails. Ehrenfeucht and Haussler (1989) state:

We define the rank of a decision tree and exhibit a learning algorithm that for any
target function f represented by a decision tree of rank at most r on n Boolean
variables, and any distribution P on {0,1}^n, produces with probability at least 1 -
δ, a hypothesis (represented as a decision tree of rank at most r) that has error at
most ε. For any fixed rank r, the number of random examples and computation
time required for this algorithm is polynomial in n and linear in 1/ε and log(1/δ).

The time complexity of the algorithm is:


Lemma 3 [Time Complexity of FIND(S,r), Ehrenfeucht and Haussler (1989)].

For any nonempty sample S of a function on X, and r ≥ 0, the time of FIND(S,r)
is O(|S|(n+1)^(2r)). (238)

This is a very key result, and the analytical model we develop in this research extends

this result by presenting a model of the time complexity of an algorithm as we add new

features. We propose adding j new features constructed using time FC(j) such that:

(n + 1)^(2r) > (n + 1 + j)^(2(r-i)) + FC(j)

where we get a new decision tree of rank (r-i).

For our analysis, we separate the complexity of our model into two factors: (1) the

tree-construction-factor, or (n+1+j)^(2(r-i)), and, (2) the feature-construction-factor, or FC(j).

Ideally, we desire that the sum of the individual factors reduce the order of the time

complexity resulting from simply building a decision tree using the initial prime

attributes. Considering the tree-construction-factor, we use Taylor's Theorem to develop

a model allowing us to determine the maximum number of new features we can add to

an existing set of features, such that the order of the time complexity for building a new

tree with this new set of features is no greater than the order of the time complexity

resulting from using the initial features to build the given tree. A key assumption in our

model is that the new features we create will in fact be used by the tree-construction

heuristics, giving a new decision tree of lesser rank. Thus, another hypothesis proposed

in this dissertation is as follows: we can build a decision tree within the standard time

complexity by adding at most j new and useful features to the existing set of features.


Useful features are features that are likely to be selected by the heuristics used to

construct the decision tree.
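As a purely numeric illustration of the inequality above, the sketch below searches for the largest number of new features j that satisfies it, assuming the new features reduce the rank by i = 1 and taking a hypothetical linear construction cost FC(j) = 10j (the form of FC here is an assumption for illustration only):

```python
# Find the largest j with (n+1+j)**(2*(r-i)) + FC(j) < (n+1)**(2*r),
# i.e., adding j features (and dropping the rank by i) stays within the
# original tree-construction bound.

def max_new_features(n, r, fc, i=1):
    bound = (n + 1) ** (2 * r)
    j = 0
    while (n + 1 + j + 1) ** (2 * (r - i)) + fc(j + 1) < bound:
        j += 1
    return j

fc = lambda j: 10 * j  # hypothetical linear feature-construction cost
print(max_new_features(n=5, r=2, fc=fc))  # -> 26
```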

1.4.2 DUALTREE Feature Construction

Our analysis hinges on being able to create 'useful' features. Indeed, a key

premise is that we are able to construct decision trees of smaller ranks. If we cannot

construct such trees after creating a limited number of features, then the features may not

be 'useful' for our purposes. Considering this problem, this dissertation also focuses on

developing a procedure to construct minimal feature-sets in a computationally efficient

way. Furthermore, the procedure must preserve the basic structure of the subsets

represented by the internal nodes of a decision tree. We propose such a procedure based

on Beckman's (1980) method for finding an automaton with the smallest number of states

that accepts precisely the same set of tapes as a given non-deterministic finite automaton.

The author adopts Webster's definition of an automaton which is "a machine or control

mechanism designed to follow automatically a predetermined sequence of operations or

respond to encoded instructions." Our procedure combines certain concepts from the

theories of sets and categories with Beckman's method and represents a different approach

to feature construction.

A decision tree constructed using boolean attributes characterizes a classification

procedure that associates a unique leaf of the tree with any object, even one not in the

original training set. Further, the leaves of the tree partition the space of objects into

disjoint categories (Quinlan and Rivest 1989). One appealing aspect of category theory


is that we can construct universal descriptions using category-theoretic terms (Blass 1984).

Examples of concepts with universal descriptions include the set of natural numbers,

power set, Cartesian product, and the logical connectives and quantifiers. The category-

theoretic framework captures the important structural properties of these descriptions since

we view objects in the category as generalized sets (Blass 1984). A key idea used in our

work relates to the concept of 'duality' and its meaning for categories. Forming the dual

of a category amounts to keeping the same objects but twisting the structure of the links

between the objects. For example, when the links between the objects are 'directed'

links, we 'twist' the structure by simply 'reversing' the directions of all links. We use

this concept to develop our idea of the dual of a decision tree. Krishnan (1981) presents

the following view of the principle of duality for categories:

Usually we visualize a category as a class of points for the objects, and a
class of arrows for the morphisms, each arrow going from the point that is its
domain to the point that is its codomain. For finite categories these diagrams can
be drawn on paper. The dual of a category is then pictured as one with all the
names for the objects as well as morphisms unchanged and only the direction of
each arrow reversed. (39)

We refer to our procedure for forming new features as DUALTREE, since it works by

constructing the dual of a decision tree. After obtaining the dual of a decision tree, the

DUALTREE procedure forms new features in a manner commonly called 'subset

construction' in the literature on automata theory. We refer to it as 'feature construction'

in the sequel.

We next illustrate our procedure using a simple decision tree. First we describe

how to form the dual of a decision tree, and then show how DUALTREE forms new

features. We propose that forming new features in this way results in having a feature-set


of lower cardinality to use for building a decision tree. Finally, we show results from

using our procedure and an initial decision tree for a subset of the BEBR sample data we

analyze in this work.

Dual decision trees

To show how to form the dual of a tree, assume we are given a bounded-rank

decision tree of rank r, constructed using n boolean attributes or features. Now, we make

the following assignments. Let the subsets determined by the internal nodes along a path

from the root node to a leaf, represent the objects for a category. Let the edges of the

path represent the morphisms for the category. A path, which is a set of vertices and

edges, consists of predecessor and successor vertices where the directions of the edges

'point' to the successor vertex. Reversing the directions of these morphisms amounts to

saying that, for a given path, each edge will point to the predecessor vertex if the order

in which the vertices are listed remains unchanged. We form the dual of a tree using this

idea of 'reversing' the directions. Figure 1.8 shows both a decision tree (constructed

using five boolean attributes) and its dual. For the dual tree, only the directions of the

edges differ while the orientation of the nodes and the node-labels remain the same.

Also, references for several of the nodes in the tree have changed. Using this bottom-up

approach for constructing the dual tree, we may acquire several root nodes but only one

terminal node, or vice versa, depending on the structure of the initial tree. With the

exception of the sole terminal node, each node in the dual tree has exactly one child and

zero or two parent nodes.

[Figure: the initial decision tree and, below it, the panel labeled 'The dual tree'.]

Figure 1.8 A decision tree and its dual

Note also for this example that the decision tree includes the

'replication problem'. This representational complexity stems from the duplication of

tests in the tree to determine if an instance satisfies a term. Hence, our tree is not a

concise decision tree.

We show the dual tree redrawn in Figure 1.9 so that the root nodes are at the top.

In Figure 1.9 we see that every node, except the roots, has exactly two parents. Also,

considering the two edges pointing to a node, the '0-edge' comes from the left and the


'1-edge' comes from the right.

[Figure: the dual tree redrawn with the root nodes '0' and '1' at the top.]

Figure 1.9 The re-oriented dual tree

For now, we adopt this rule for ordering the two parents

of a node. For DUALTREE feature-construction, we make the following set assignments

for R--the set of root nodes, and E--the set of edges, using the dual tree:

(1) R= {0,1}

(2) E = {(0,0,x3), (0,0,x5), (0,1,x4), (x3,0,x2), (x2,1,x4), (x4,1,x5), (1,1,x2), (1,1,x5),

(x3,0,x1), (x5,1,x1)}
The set E consists of triples denoting the edges in the dual tree. The first and third

elements of the triple represent the predecessor and successor nodes, respectively. The

second element of the triple is the value of the test-function that this edge, or triple,

corresponds to. For example, the first element of E, (0,0,x3), represents the left-most edge

in the lower half of Figure 1.9. Our triple informs us that if we are at node '0' and the

value of the test function is '0', then, for this instance, we go to node x3. The cardinality

of E is 10 and the last two edges listed are the terminal edges of the tree (i.e., the

successor is a terminal node).

Continuing with the construction, we create a new tree having a root node for each

element of R by constructing all possible successor nodes for the roots. Let the successor

node be the set of elements that are possible successors of any element of the predecessor

node, for a given value of the function. Note that predecessor nodes may themselves be

a 'set' of elements. So, for example, we create the successor nodes for the root nodes of

'0' and '1' with the following steps. First, we form the successor for the first element

of R, using a function-value of 0. Upon examining the edges in E, we find two edges

beginning with '0' that have a function-value of 0. The successor node is the union of


the ending node for each of these two edges, or x3 and x5. Thus, given a function-value

of 0, the successor for '0' is {x3,x5}. For a function-value of 1, the successor of '0' is

just {x4}. Continuing in this fashion we find, for example, that given a function-value

of 1, the successor of '1' is {x5,x2}. The successor for any predecessor--function-value

pair not in E is defined to be the null node. Figure 1.10 shows the features formed after

constructing all successors for the dual tree. Observe that negated attributes are not

included in any of the features. Figure 1.11 shows a dual tree equivalent to our initial

dual tree.
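The successor construction itself is straightforward to sketch over edge triples. Since several entries of E are garbled in the source, the edge set below is only the clearly legible subset of the worked example, which is enough to reproduce the successors computed above:

```python
# Subset ('feature') construction over a dual tree given as edge triples
# (predecessor, test-value, successor). A new node is the set of all
# possible successors of the current node's elements for a given value.

R = {'0', '1'}                       # root nodes of the dual tree
E = {('0', 0, 'x3'), ('0', 0, 'x5'), ('0', 1, 'x4'),
     ('1', 1, 'x2'), ('1', 1, 'x5'),
     ('x3', 0, 'x2'), ('x3', 0, 'x1')}

def successor(node, value):
    """Union of the ending nodes of all edges leaving `node` on `value`."""
    return frozenset(s for (p, v, s) in E if p in node and v == value)

print(successor({'0'}, 0))  # the union {x3, x5}
print(successor({'0'}, 1))  # just {x4}
print(successor({'1'}, 1))  # {x2, x5}
```

An empty frozenset plays the role of the null node for any predecessor/function-value pair not in E.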

Next, we construct the dual of the dual-tree in Figure 1.11. Performing this action

produces a tree once again having features for the roots and classes as leaves, and this

orientation allows us to interpret the tree as we are normally accustomed. Figure 1.12

shows that the dual of the dual tree has six terminal paths to a negative leaf, and three

terminal paths to a positive leaf. These are the same totals as the respective paths in the

initial decision tree.

The final step for the procedure is to perform feature construction on the dual of

the dual tree to again reduce the inherent structure of the tree. Using our procedure, we

form six new features that are conjuncts of the initial attributes. The new features are:

x1x3   x1x2   x2x5

Thus, we add six new features to the set of five attributes resulting in a feature-set

containing eleven elements. Finally, using our new feature-set, Figure 1.13 shows several

equivalent trees for the initial decision tree, also having a bounded rank equal to one.

[Figure: forming features in the dual tree; a blank space represents the null node.]

Figure 1.10 Forming features in the dual tree

This completes the illustration of our procedure. Before showing a decision tree for the

sample data we study, we first discuss how we acquire the initial decision tree used by

DUALTREE to form new features.

1.4.3 ID3/C4.5 Decision Trees Using BEBR Sample Data

A standard decision tree algorithm creates an hypothesis by recursively selecting

which attribute to place at a node and partitioning the set of examples according to their

values for the test attribute. The successive divisions of the set of examples proceed until

all the subsets consist of cases belonging to a single class, or the subsets satisfy some

other terminating condition.

[Figure: a dual tree after feature construction; null nodes are not shown.]

Figure 1.11 A dual tree after feature construction

Decision tree algorithms using a greedy method are

nonbacktracking since once a test has been selected to partition the current set of

examples, the choice is irreversible and other attributes or features are not considered.

Also, a common goal of greedy algorithms is not to infer more structure for the target

concept than is justified by the set of examples. In other words, we prefer not to

construct very complex trees that 'overfit' the data. Our purpose here is not to discuss

the theory of decision trees since it is well documented in the literature. Instead, we

focus on several of the major concepts of interest.

[Figure: the dual of the dual tree.]

Figure 1.12 The dual of the dual tree

A key step in building a decision tree is choosing the best attribute for a node so

that the test at the node gives good partitions. When one chooses the best attribute based

on how well the available attributes separate the classes or categories, Breiman et al.

(1984) refer to this as 'goodness of split'. Using the number of leaves in a tree as a

measure of its size, Mingers (1989) provides an empirical comparison of several

'goodness of split' measures and shows that the choice of a measure affects the size of

a tree but not its accuracy, which remains essentially the same even when attributes are

selected at random.

Figure 1.13 Decision trees with rank equal to 1

Note that the number of leaves in a tree corresponds to the number

of distinct 'rules' contained within the decision tree. Also, another way to measure the

size of a tree is by counting the number of nodes in the tree. For our purposes, we prefer

smaller trees since large trees may be too complex for humans to easily understand.

Mingers' (1989) tests show that the 'gain-ratio measure', developed by Quinlan (1986),


generally leads to 'smaller' trees. The gain-ratio criterion chooses the test which

maximizes the proportion of information generated by the split that appears helpful for

classification, subject to the constraint of also giving high information gain (Quinlan 1993).


Quinlan (1986, 1993) proposed an evaluation function based on a formula from

information theory that measures the theoretical information content of a code. The value

of this measure depends on the likelihood of the various possible messages. If they are

equally likely, then we have a case representing the greatest amount of uncertainty and

the information gained will be the greatest. The less equal the probabilities, the less

information there is to gain. The information-based method is based on two assumptions.

Using Quinlan's (1986) notation, let C be a collection of p objects of class P and n

objects of class N. Quinlan's (1986) assumptions are:

(1) Any correct decision tree for C will classify objects in the same proportion
as their representation in C. An arbitrary object will be determined to
belong to class P with probability p/(p+n) and to class N with probability n/(p+n).

(2) When a decision tree is used to classify an object, it returns a class. A
decision tree can thus be regarded as a classification 'P' or 'N', with the
expected information needed to classify an object given by

I(p,n) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))

Now, suppose attribute A is used as the root for a decision tree over C. This tree

partitions C into a certain number of smaller collections, each denoted by C_i. Let each C_i

contain p_i objects of class P and n_i objects of class N. The expected information required

for the subtree for C_i is I(p_i,n_i). The expected information required for the tree with A

as its root is then determined by the weighted average given by

E(A) = sum_{i=1}^{v} ((p_i + n_i)/(p + n)) I(p_i, n_i)

where the weight for the ith branch is the proportion of the objects in C that belong to

C_i. Thus, the information gained by branching on A is determined by

gain(A) = I(p,n) - E(A).

A drawback to using this measure is that it has a strong bias in favor of tests with

many outcomes. Quinlan (1993) rectifies this bias by using a type of normalization in

which the apparent gain attributable to tests with many outcomes is adjusted. Hence, the

gain-ratio measure is a variant of the information-gain measure that incorporates the idea

that an attribute itself can have some information value. The amount of information value

for an attribute depends on the distribution of examples among the attribute's possible

values. The less evenly spread its values, the less information in the attribute. Noting

that an efficient measure should convert as much as possible of the attribute's information

value into the classification procedure, Quinlan (1993) computes the ratio of the gain in

information from using the attribute, A, to the information value of the attribute itself.

Thus, a gain-ratio is given by

gain-ratio(A) = (I(p,n) - E(A)) / ( - sum_{i=1}^{v} ((p_i + n_i)/(p + n)) log2((p_i + n_i)/(p + n)) )

The value in the denominator has a high score if the examples are spread evenly between

the attribute values and a low one if they are not. Thus, we can say, for example, that

the gain-ratio measure favors attributes with a small number of values.
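The measures above translate directly into code. The sketch below is a plain transcription of I(p,n), E(A), gain, and gain-ratio for the two-class case; the split used at the end is a hypothetical, perfectly even and perfectly separating one:

```python
import math

def info(p, n):
    """I(p,n): expected information, in bits, to classify an object."""
    total = p + n
    bits = 0.0
    for k in (p, n):
        if k:
            bits -= (k / total) * math.log2(k / total)
    return bits

def gain_ratio(p, n, subsets):
    """Gain divided by the split information of the partition."""
    total = p + n
    e = sum((pi + ni) / total * info(pi, ni) for pi, ni in subsets)
    split_info = -sum((pi + ni) / total * math.log2((pi + ni) / total)
                      for pi, ni in subsets)
    return (info(p, n) - e) / split_info

# An even two-way split that separates the classes exactly:
# gain = 1 bit and split information = 1 bit, so the ratio is 1.0.
print(gain_ratio(4, 4, [(4, 0), (0, 4)]))  # -> 1.0
```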

Quinlan's C4.5 program, an elaboration of ID3, uses the gain ratio measure to

build decision trees. This algorithm is well studied and many researchers cite favorable

results from its use. The 'information' or 'entropy' based heuristics used by the

procedure generally produce simpler decision trees, especially when the sample sizes are

small and/or there are many different outcomes for the possible tests. This dissertation

does not focus on the heuristics used for constructing decision trees. However, given that

(1) the problem of finding a decision tree with the minimum expected number of tests is

NP-complete (Hyafil and Rivest 1976), and, (2) a vast amount of literature suggests that

ID3/C4.5 can be a useful component of an intelligent system since its decision-tree-

heuristics are supported by many theoretical arguments, we can say, for example, that

the ID3/C4.5 Program represents a reasonable or practical approach for extracting

symbolic information from a set of examples. Because humans interpret symbolic

knowledge more readily than they do a collection of numbers (e.g., the outputs of

statistical classifiers and neural networks), we infer from this discussion that the C4.5

program (Quinlan 1993) for building decision trees represents a practical reasoning mechanism for inducing

domain knowledge from sample data.


Quinlan (1993) gives a detailed overview of C4.5's implementation. In this

section we illustrate results of using our DUALTREE procedure, given a decision tree

produced by C4.5 on a subset of the BEBR sample data of plans to purchase an

automobile. Our purpose here is to show how DUALTREE contributes to constructing

a smaller decision tree and not to discuss implications of the trees. For this illustration,

we want to start with a small tree in order to keep the discussion tractable. To keep the

initial tree small, we use a subset of the BEBR survey intentional-data for automobile

purchase plans, and perform the following two actions. First, we eliminate all of the

'NO' respondents. This action helps reduce the size of a tree because we have one less

category or class to describe (i.e., leaves have a fewer number of possible values). Next,

we combine the quarterly data so that the time dimension is given as the 'first half' or

'second half' of each respective year. This helps to produce smaller trees since we are

decreasing the total number of test-outcomes when testing this attribute while building the

tree (i.e., instead of sixteen consecutive quarters--or sixteen links to other nodes--we have

eight consecutive semi-annual periods). We use this subset of data for input to C4.5

where the class or category for each instance is simply the respondents' answer of 'YES',

'MAYBE', or 'DK' (i.e., Don't Know). The attributes consist of the five component-

questions whose values are 'Better', 'Same', 'Worse', 'Don't Know', or 'Good',

'Uncertain', 'Bad', 'Don't Know'. Following are the BEBR survey questions and the

attribute assignment used for each component-question of the composite index. Note that

they are given in the order in which they are read to the respondents.

CURFIN ==> Current-Personal Expectations

We are interested in how people are getting along financially these days. Would
you say that you (and your family living there) are better off or worse off
financially than you were a year ago?

FUTFIN ==> Future-Personal Expectations

Now, looking ahead--do you think that a year from now you (and your family
living there) will be better off financially, or worse off, or just about the same as now?

USFUFI ==> U.S. 1-Year Condition

Now turning to business conditions in the country as a whole--do you think that
during the next 12 months we'll have good times financially, or bad times, or what?

USNEX5 ==> U.S. 5-Years Condition

Looking ahead, which would you say is more likely--that in the country as a
whole we'll have continuous good times during the next five years or so, or that
we will have periods of widespread unemployment or depression, or what?

GBTIME ==> Household Buying Expectations

About the big things people buy for their homes--such as furniture, a refrigerator,
stove, television, and things like that. Generally speaking, do you think now is
a good or a bad time for people to buy major household items?

Table 1.1 shows the frequencies for the class labels. Considering Table 1.1 we

can say, for example, that a decision tree consisting of a single leaf labeled YES,

misclassifies about thirty percent of the 3910 instances. A tree produced by the C4.5

program for our subset of data is shown in Figure 1.14. This tree has a total of 53 nodes

and misclassifies roughly thirty percent of the sample.

The tree in Figure 1.14 shows that there are sixteen different descriptions, or

terminal paths, for the YES class and eleven descriptions for the MAYBE class. The

Table 1.1 Distribution of respondents' answers for buying a car

Class = Will Buy a Car?    Frequency    Percent
Yes                             2755       70.5
Maybe                           1084       27.7
Don't Know                        71        1.8
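The single-leaf baseline discussed in the surrounding text follows directly from these frequencies; a quick arithmetic check:

```python
# Class frequencies from Table 1.1 (plans to buy a car).
counts = {"Yes": 2755, "Maybe": 1084, "Don't Know": 71}
total = sum(counts.values())     # 3910 instances in the subset
majority = max(counts.values())  # 2755 'Yes' answers

# A tree consisting of a single leaf labeled with the majority class
# misclassifies every instance that is not 'Yes':
error_rate = 1 - majority / total
print(f"{error_rate:.1%}")  # about thirty percent
```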

numbers shown next to the leaves indicate, for example, how many instances reached this
leaf / how many of the instances are misclassified by the leaf. C4.5 insists on having at
least two outcomes with a minimum number of cases for any test used in the tree. This
avoids the use of near-trivial tests which typically lead to odd trees with little predictive
power (Quinlan 1993). C4.5 uses various heuristics for assigning classes to terminal
nodes representing subsets that do not contain the minimum number of cases. Other
criteria also exist for deciding not to partition a subset any further. Typical ones are
based on assessing the split from the perspective of statistical significance, information
gain, or 'error reduction' (Breiman et al. 1984; Mingers 1989; Quinlan 1993).
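The minimum-cases rule can be sketched as a filter over a candidate test's outcome subsets. A minimal sketch; the default of two cases mirrors C4.5's -m option, but treat the constant here as an assumption:

```python
MIN_CASES = 2  # assumed default; C4.5 exposes this via its -m option

def worth_testing(partition, min_cases=MIN_CASES):
    """Keep a candidate test only if at least two of its outcome subsets
    receive the minimum number of training cases."""
    return sum(len(subset) >= min_cases for subset in partition) >= 2

print(worth_testing([["c1", "c2", "c3"], ["c4"]]))  # False: near-trivial split
print(worth_testing([["c1", "c2"], ["c3", "c4"]]))  # True
```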
Figure 1.14 shows that all of the leaves in the tree misclassify at least one of the



Figure 1.14 A decision tree for the BEBR data

instances. Thus, there are no 'perfect' leaves in the tree. Quinlan (1993) suggests that

elements of 'randomness' are introduced in a method that chooses a particular test from

several equally promising ones, when the tests are selected based on examinations of

small subsets of cases. Thus, our attribute-tests may be regarded as imperfect since the


attributes do not capture all of the information relevant to classification. C4.5's stopping

criterion focuses on having a significant number of cases at each leaf so that the tree

reveals the structure of a domain and has good 'predictive' power.

We now use DUALTREE to form new features and for this illustration, we prefer

features that are conjuncts of at most three attributes--again to help keep the illustration

simple. For now, we are interested in results from using DUALTREE, and not the actual

features given by the procedure. Using our procedure, we identify fifteen new features

that are conjuncts of at most three attributes, for inclusion in this example. The fifteen

new features we add to the set of initial attributes are:

[(CURFIN=Same ?) & (FUTFIN=Don't Know ?)]
[(CURFIN=Same ?) & (USNEX5=Uncertain ?)]
[(FUTFIN=Better ?) & (USFUFI=Don't Know ?)]
[(FUTFIN=Better ?) & (USNEX5=Uncertain ?)]
[(FUTFIN=Don't Know ?) & (USFUFI=Uncertain ?)]
[(FUTFIN=Don't Know ?) & (TIME=2nd '91 ?)]
[(USNEX5=Better ?) & (GBTIME=Good ?)]
[(USNEX5=Uncertain ?) & (TIME=2nd '91 ?)]
[(USNEX5=Uncertain ?) & (TIME=1st '92 ?)]
[(CURFIN=Same ?) & (USNEX5=Uncertain ?) & (TIME=1st '92 ?)]
[(CURFIN=Same ?) & (FUTFIN=Don't Know ?) & (USFUFI=Uncertain ?)]
[(FUTFIN=Better ?) & (USNEX5=Bad ?) & (GBTIME=Good ?)]
[(FUTFIN=Same ?) & (GBTIME=Uncertain ?) & (TIME=1st '92 ?)]
[(FUTFIN=Don't Know ?) & (USFUFI=Uncertain ?) & (TIME=1st '92 ?)]
[(FUTFIN=Don't Know ?) & (USNEX5=Uncertain ?) & (TIME=2nd '91 ?)]
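Each entry above is a conjunction of attribute-value tests. A minimal sketch of how such conjuncts can be materialized as new boolean attributes before rerunning the tree builder (the record layout and helper name are hypothetical, not the DUALTREE implementation):

```python
def add_conjunct(instances, tests):
    """Append to each instance a boolean attribute that is the
    conjunction of the given (attribute, value) tests."""
    name = " & ".join(f"{attr}={val}?" for attr, val in tests)
    for inst in instances:
        inst[name] = all(inst.get(attr) == val for attr, val in tests)
    return name

data = [
    {"CURFIN": "Same", "FUTFIN": "Don't Know", "USNEX5": "Uncertain"},
    {"CURFIN": "Better", "FUTFIN": "Better", "USNEX5": "Good"},
]
col = add_conjunct(data, [("CURFIN", "Same"), ("FUTFIN", "Don't Know")])
print(data[0][col], data[1][col])  # True False
```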

We show a decision tree in Figure 1.15 resulting from using the new features along with

the initial attributes. The tree in Figure 1.15 still misclassifies about thirty percent of the

Figure 1.15 A decision tree using DUALTREE features

instances but only has thirteen nodes--a reduction of forty nodes from the initial tree. The

figure also shows that the tree contains three of the new features. Thus, in this example,

we have shown that including fifteen additional features in the set of features for building

a decision tree reduces the size of the tree by a factor of four, without significantly

increasing the misclassification rate of the tree. Also, the tree in Figure 1.15 has a rank

equal to one. Decision trees having a rank of one are generally easier to comprehend

since it is easier to keep track of the outcomes of tests at the antecedent nodes. This

completes our illustration of how DUALTREE aids in building smaller decision trees.
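The rank mentioned above can be computed recursively. The sketch below assumes Ehrenfeucht and Haussler's definition for binary trees (a leaf has rank zero; an internal node takes the larger child rank, incremented when the two child ranks tie), which may differ in detail from the definition used elsewhere in this work:

```python
def rank(tree):
    """Rank of a binary decision tree given as nested tuples
    (test, left_subtree, right_subtree); leaves are class labels."""
    if not isinstance(tree, tuple):  # a leaf
        return 0
    _test, left, right = tree
    r_left, r_right = rank(left), rank(right)
    return r_left + 1 if r_left == r_right else max(r_left, r_right)

# A decision-list shape (every test has a leaf child) has rank one:
t = ("CURFIN=Same?", "YES", ("FUTFIN=Better?", "MAYBE", "YES"))
print(rank(t))  # 1
```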

1.5 Thesis and Objectives

The general goal of this research is to use AI methods and techniques for

analyzing business-survey-data. We state the main thesis as follows:

PRIMARY THESIS: Empirical models of learning based on Artificial

Intelligence methods and techniques represent systems that provide a

useful approach for examining certain research questions related to the

1990-1991 recession, using data collected from business surveys of

consumer attitudes and expectations. Models such as these offer a new

tool for processing the information contained in business surveys.

In support of this thesis I direct my research around three activities: (1) defining the time-

complexity problem of feature construction, (2) developing a procedure to help solve the

problem, and, (3) designing, implementing, and testing the procedure and methods using

a given sample of data. The subtheses of these three activities and their specific

objectives are as follows:

1.5.1 Problem Definition

Subthesis: We can build a decision tree within the standard time complexity, by

adding at most j new and useful features to the existing set of features. Useful features

are features that are likely to be selected by the heuristics used to construct the decision

tree.

-Define 'feature construction' and develop an analytical model showing how the
temporal behavior of the algorithm changes as we add additional features.

-Identify the difficulties and conditions for improving the computational
efficiency of feature construction.

-Establish general conditions that must be satisfied by any approach for
resolving the problem.

1.5.2 Problem Resolution

Subthesis: The DUALTREE procedure for forming useful features produces feature-

sets having practical sizes. We use "useful" in the sense that the features it creates are

likely to be used by tree-construction heuristics (i.e., given two features having high

information gain for a subset of the instances, the gain-ratio criterion selects the one

giving the higher proportion of split-information). A feature-set has a practical size if,

for example, it does not contain 2^n features when using a set of n primitive attributes.


-Identify and examine suitable methods for forming features.

-Use the conditions given by the problem definition to explore new
approaches to feature construction in a computationally efficient way.

-Identify a procedure for constructing 'useful' features such that the time
required to construct a new decision tree using these features and features
from a given decision tree is of an order lower than the time used to
produce the given tree.
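The gain-ratio selection referred to in the subthesis can be sketched numerically, assuming Quinlan's standard definitions (gain ratio = information gain divided by the split information of the partition):

```python
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    counts = {label: labels.count(label) for label in set(labels)}
    return -sum((c / n) * log2(c / n) for c in counts.values())

def gain_ratio(labels, partition):
    """partition: one sublist of labels per outcome of the candidate test."""
    n = len(labels)
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in partition)
    # Split information is the entropy of the outcome sizes themselves:
    split_info = entropy([i for i, p in enumerate(partition) for _ in p])
    return gain / split_info if split_info else 0.0

labels = ["YES"] * 6 + ["MAYBE"] * 4
part = [["YES"] * 5 + ["MAYBE"], ["YES"] + ["MAYBE"] * 3]
print(round(gain_ratio(labels, part), 3))
```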

1.5.3 Implementation and Experimentation

Subthesis: An empirical learning model using Decision Trees and DUALTREE feature

construction, provides 'useful' descriptions from the BEBR business surveys, allowing us

to test the following hypotheses:

(1) Given demographic descriptions for the four categories of responses
representing the respondents' future financial condition, descriptions for the 'don't
know' category are unique, in that they are unlike any of the remaining three, to
a certain extent, and should not be combined with one of the remaining three to
examine certain changes taking place during the 1990-1991 recession.

(2) Changes taking place in the 'don't know' category of consumer
expectations for personal finances, buying plans, and the national
economy, best explain the change in consumers' unwillingness to purchase
cars. Since this category is not considered in most methods for computing
a consumer-confidence-metric, this may explain why consumer spending
was over-predicted for motor vehicle purchases.


-Encode an algorithm using C++ for DUALTREE.

-Implement and empirically test the two hypotheses using the BEBR sample data.

-Analyze the results quantitatively and determine the relative worth of the
proposed methods.

1.6 Dissertation Outline

The following chapters describe the results of my research. Chapter 2 presents a

conceptual model of consumer-household-consumption and its relationship to consumer

attitudes and well-being. Chapter 3 introduces basic definitions for machine learning,

decision trees, and reviews the operation of the C4.5 Program. In Chapter 4 we present our

analytical model of the time complexity of building decision trees using feature

construction. We describe our DUALTREE algorithm for forming features in Chapter 5,

and discuss experimental results and conclusions in Chapters 6 and 7, respectively.


An objective of this dissertation is to construct a knowledge base of information

related to consumer spending using AI concepts and techniques. Economists and business

analysts may use this information to better understand decision criteria used by a diverse

group of consumers. Pau et al. (1989) describe how the exposure of economists, banks,

and management departments to AI through knowledge based systems, natural language

analysis, or symbolic programming environments, has increased since the mid-1980s.

Their work supplies a structured collection of known projects, organizations involved, and

tools/methods used, in a variety of applications of AI in Economics and Management.

Additionally, the authors list several areas posing unresolved challenges to AI, such as

policy analysis, public services, and forecasting. Economic forecasting, for example, is

an activity resulting in a set of predictions produced by a forecaster or forecasting method

pertaining to estimations of present or future demand along with present or future need.

This activity also requires a model of decision-making by consumer-households with

respect to the choice of goods and services used in living, along with other relationships

and activities stemming from their choices (Cochrane and Bell 1956).

This dissertation focuses on using machine-learning methods to obtain meaningful

descriptions of consumer-households consuming different amounts of durable goods and

services such as major household appliances, houses and automobiles, or, having different



levels of discretionary and postponable expenditures. We use the terms family,

household, and consumer interchangeably to refer to the concept of a consumer

household. A consumer household is a financially independent entity in which one or

more people live together who pool their income to make joint expenditure decisions.

Financial independence is determined by the three major expense categories of housing,

food, and other living expenses (U.S. Bureau of Labor Statistics 1989). Katona and

Mueller (1956) suggest that short-term changes in people's appraisals of trends in their

economic welfare can be attributed in large part to variations in business conditions as

they are perceived by and affect the individual household. Also, economists regard

changes in consumer preferences, attitudes and expectations as a type of scientific data

which is as reliable as changes in income, price, and the like (Katona and Mueller 1956;

Juster 1959; Katona 1960; Juster 1964). Using survey data of consumer attitudes and

expectations, we seek useful descriptions of consumer households having different

appraisals of trends in their financial situation, such as 'better off', 'the same', 'worse

off', and 'don't know'. Additionally, we prefer descriptions in terms of household or

consumer characteristics such as race, income, party affiliation, age, sex, occupation, and

the like. The following sections discuss (1) the flow of durable market goods and

services within the household, and, (2) approaches for using survey data to aid in

determining a demand function for durable goods.

2.1 Consumer Consumption of Durable Goods

Katona (1960) applies two key propositions to the study of consumer motives,

habits, attitudes and expectations on consumer spending. They are that demand depends

on income and confidence, and that changes in confidence are measurable. Examining

questions concerning the subjective saliency of consumer needs and the transformation

of these needs into demand, the author reports that consumers' discretionary expenditures

are a function of several consumer attitudes such as one's view of their personal financial

situation, what happens to other members of the household and community, and what

happens to the country. A major thesis of Katona and Mueller (1956) was that consumer

demand, especially for durable goods, is a function of both ability to buy as measured by

data on income, assets, debts and the like, and, willingness to buy as measured by

attitudinal and expectational questions in surveys.

This dissertation examines consumer motives, attitudes and expectations, and their

relationships to consumer spending. We use AI concepts and techniques to develop an

empirical model for investigating responses to the attitudinal and expectational questions

found in business surveys. Advantages of using a knowledge-based approach over the

commonly used traditional statistical techniques include (1) AI provides a framework for

working with a large number of attributes which may have very complex relationships,

(2) AI methods provide a framework for dealing with both the quantitative and qualitative

scaled attributes, and, (3) AI provides a framework allowing for the explicit description

of the economic agent's process and the economists' behavior in terms of a given set of

attributes--for example, by developing an AI-based system that integrates an existing


econometric model with an 'expert' knowledge base to explain and compute many of the

factors that are outside the scope of the model. The next sections describe an approach

we use to study household consumption of commodities, incorporating business surveys

of consumer preferences, attitudes and expectations. Following this, we discuss the

survey data and the research questions we investigate in this dissertation.

2.1.1 Estimating a Demand for Commodities

Economists and business analysts labor to determine a demand function for

durable goods and services, with respect to the household-sector--business-sector

relationship. We assume that human resources, income, wealth and price all constrain

consumption. Also, other characteristics of income, as well as the amount of income,

influence household consumption choices--namely (1) regularity and certainty of income

may affect the proportion of income used for current consumption, (2) expectations

regarding future income may affect the savings rate and willingness to pay for current

consumption with credit, and, (3) sources of income and the number of earners may affect

decisions about income use (Cochrane and Bell 1956; Juster et al. 1981; Magrabi et al.

1991). Considering consumer expectations regarding their future financial status, we want

to examine how these expectations changed before, during, and after the 1990-1991

recession. To do this, we require 'representative' descriptions of consumer-households

for different comparative financial states such as better off, the same, worse off, and don't

know, in terms of a fixed set of consumer-characteristics or attributes, such as age,

income, and party affiliation. Our empirical analysis requires a model capable of (1)


handling a large number of attributes, and, (2) providing results that are easily

interpretable by humans.

Magrabi et al. (1991) review several theoretical approaches used to study

household consumption of commodities. These include utility functions, consumption and

savings functions, household production theory, life-style and life-quality approaches and

others. Their results reveal no single logically coherent theory adequate for the analysis

of all aspects of household consumption behavior. The authors highlight the strengths of

many existing theories and conceptual constructs, and also describe research models that

use combined concepts of two or more theoretical approaches. The literature offers

several key results associated with research models based on many of these concepts. For

example, according to Suranyi-Unger (1977), the majority of Americans adhere, to a

greater or lesser extent, to some institutionalized common life-style. He suggests that

such life-style groups or 'standard classes' may be identified either with respect to the

similarity of their behavioral patterns (e.g., spending patterns), or with respect to their

demographic characteristics. Mitchell (1983) offers a comprehensive classification of life-

style types, based in part on developmental psychology drawing on Maslow's hierarchy

of needs. A unique way of life represented by each type is described in terms of

demographics, attitudes, financial status, and use or ownership of selected consumer

goods. Our aim here is to show the significance of demographic data in models based

on many of the theories. Indeed, Ketkar and Cho (1982) show that various demographic

factors such as the age of the household head, his/her educational attainment, the


employment status of the household head and spouse, the household's race and region of

location, all determine expenditure patterns in the United States.

Recall that economic forecasting is an activity resulting in a set of predictions of

present or future demand along with present or future need. Economists and business

analysts use various econometric models to develop forecasts. Business surveys are

commonly incorporated in their forecasting models that give an averaged view of the

economy (Katona and Mueller 1956; Juster 1966; Zarnowitz 1967; Gianotti 1989; Palies

and Philip 1989). Business surveys consist of a relatively systematic standardized

approach to collecting the information for each category of answers to a set of questions.

Knowledge about how to properly construct and administer the survey instrument is found

in Rossi et al. (1983) and will not be discussed here.

A general rule for interpreting surveys is that an answer is a function of the

question (Katona and Mueller 1956). When the set of questions is designed to elicit

consumer plans or intentions to buy certain goods, Juster (1966) interprets these plans or

intentions to buy as reflecting the respondent's estimate of the probability that the item

will be purchased within the specified time period. The survey instrument used in this

research is designed to examine the interrelationships of consumer purchases and buying

intentions to consumer attitudes and expectations. The following sections describe the

demographic data used in our work, along with the attitudinal surveys containing this data.


2.2 Survey Data of Consumer Households

Fluctuations in the ratio of consumer purchases of durable goods to disposable

income is of key interest to economists concerned with upswings and downswings in

business activity (Juster 1959). The earliest surveys eliciting anticipatory data to use for

predicting the demand for durable goods began in 1945 by the Survey Research Center

(SRC) at the University of Michigan (Juster 1966). Generally, economists use survey

data to create an index of consumer attitudes associated with subsequent changes in

purchases of durable goods (Juster 1959). This dissertation uses survey data to study

certain psychological factors of consumers as a function of their responses found in

business surveys. This data represents, among other things, the fixed characteristics or

factual information about consumers such as age, education, income, and party affiliation.

Pioneering research on using survey data of consumer attitudes was performed at

the University of Michigan's SRC by Katona and Mueller (1956). Noting that consumer

demand depends on both ability to buy and willingness to buy, they claimed that changes

in consumer optimism strongly influence the rate of consumer spending on discretionary

or postponable items. Katona and Mueller (1956) did not, however, simply ask people

whether they planned to buy automobiles or major household appliances. Instead they

created a composite index of consumer sentiment using their surveys, incorporating

several indicators measuring changes in consumer buying intentions, attitudes and

expectations. Two key reasons for using several components to determine consumer

sentiment are (1) buying inclinations may depend on a variety of attitudes, and, (2)


answers to single questions are unreliable, depending upon personal circumstances, the

mood of respondents and question wording. Katona and Mueller (1956) also state:

The need for preparing a summary measure of changes in consumer attitudes
became particularly clear when recently calculations were published which
compared changes in the answers to single attitudinal questions with aggregate
durable goods sales. This procedure assumes that each individual attitude, taken
in isolation, must have a specific relation, unchanged over time, to consumer
behavior. The unitary nature of psychological wholes composed of divergent
parts, as well as the multiplicity of human motivations (some motives reinforcing
one another and others conflicting with one another) are disregarded. As Gestalt
theory has shown, a part or item may change its meaning and function according
to the whole to which it belongs. (92)

Another significant contributor is F. Thomas Juster (1959). Juster's early research

showed that (1) expectational and financial variables are more closely associated with

short-horizon and definite buying plans than with longer-horizon and indefinite ones, and,

(2) it is difficult to determine which expectational and financial factors are most closely

associated with buying plans and purchases because of strong interrelationships among

the variables. Another key research result given by Juster (1966) confirmed the finding

by Katona and Mueller (1956) that surveys of consumer intentions to buy are inefficient

predictors of purchase rates because they do not provide accurate estimates of mean

purchase probability. His experiments verified the hypothesis that the basic predictors of

purchase rates given by an intentions survey--the proportions of intenders (respondents

who answer 'yes') and nonintenders (respondents who answer 'no') in the sample--are

inefficient predictors because the mean purchase probabilities of intenders and

nonintenders vary over time. These results led him to develop purchase probability

surveys, which are still used to create indexes of consumer confidence.
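Juster's inefficiency argument can be restated with a line of arithmetic: with the intender share held fixed, the implied purchase rate still moves whenever the groups' mean purchase probabilities drift over time (all figures below are invented for illustration):

```python
def implied_purchase_rate(intender_share, p_intenders, p_nonintenders):
    """Aggregate purchase rate implied by an intentions survey, given the
    mean purchase probability of intenders and of nonintenders."""
    return (intender_share * p_intenders
            + (1 - intender_share) * p_nonintenders)

# Two periods with the same 20% intender share but drifting probabilities:
r1 = implied_purchase_rate(0.20, 0.60, 0.10)
r2 = implied_purchase_rate(0.20, 0.45, 0.05)
print(r1, r2)  # the intender share alone cannot distinguish the periods
```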


Today, several indices of consumer confidence regularly appear in a variety of

business publications. These include the Consumer Confidence Index (CCI) published

by the Bureau of Economic and Business Research (BEBR) at the University of Florida,

and the Index of Consumer Sentiment (ICS) and the Index of Consumer Expectations at the

University of Michigan. The respective indexes of consumer confidence usually appear

monthly, and these two use identical components for creating the index. We obtained the

data used in this research from the BEBR survey of consumer confidence.

Eichhorn et al. (1978) define an economic index as:

DEFINITION [Economic Index, (Eichhorn et al. 1978)]

An economic index is an economic measure, i.e., a function

F: D → [0,1] ⊆ ℜ

which maps, on the one hand, a set D of economically interesting objects into the set ℜ
of real numbers and which satisfies, on the other hand, a system of economically
relevant conditions (for instance, monotonicity and homogeneity or homotheticity
conditions). The form of these conditions depends on the economic information which
we want to obtain from the particular measure. (3)

For the previous definition, [0,1] represents the set of real numbers between zero and one

inclusively. The next section describes the BEBR survey and the various components

used to construct the BEBR's composite confidence-index. We then describe the

consumer-household characteristics or attributes used in our work.
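As a toy instance of this definition (emphatically not the BEBR or Michigan formula), consider a measure that maps a distribution of survey replies into [0,1] and is monotone in the share of favorable replies:

```python
def toy_index(n_better, n_same, n_worse):
    """Map a reply distribution to [0,1]; 'same' replies count half.
    Monotone: moving a reply from 'worse' to 'better' never lowers it."""
    n = n_better + n_same + n_worse
    return (n_better + 0.5 * n_same) / n

# Applied to the under-$25K reply shares reported in Table 2.1:
print(toy_index(35.39, 49.72, 14.89))
```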

2.2.1 BEBR Survey of Consumer Confidence

The BEBR is an applied research center in the College of Business Administration

at the University of Florida. Founded in 1929, its primary role is to conduct applied


research focusing on the State of Florida. The BEBR Survey Program, starting in 1983,

administers a monthly sample survey of 500-600 households in Florida. The sample of

households is generated through random-digit dialing of telephone numbers throughout

the state. The numbers are called to identify a household and an adult (18 or older)

respondent. The survey is designed to collect data on consumer attitudes about various

business and economic conditions, and, the demographic and socioeconomic

characteristics of Floridians. The survey and the procedure for constructing the index are

patterned after the University of Michigan's national index.

Figure 2.1 shows the trend of both the BEBR's CCI, and the University of

Michigan's ICS, for the time period January 1992 - May 1993. The graph shows that the

indices track fairly well together; thus, we may infer that the confidence of Floridians is

a reasonable proxy of consumer confidence across the nation. Note that this supports our

previous conclusion regarding similar data for the time period January 1989 - December

1992 (see Chapter 1). Next, we discuss the BEBR-survey questions used for eliciting

consumer preferences, attitudes and expectations.

BEBR index components

Five components are used in calculating the BEBR's CCI. Three of these are

indicators of consumer expectations for personal finances and buying plans, and the other

two are indicators of consumer expectations of the national economy. Appendix A shows

the survey questions, variable assignments, and range of responses for each component

of the composite index as well as for the demographic information. The index-

component-questions are given in the order in which they are read to the respondents.

Figure 2.1 CCI's for Jan '92 - May '93

2.3 Describing Purchasers of Durable Goods

A primary purpose of this research is to develop and evaluate a tool for examining

data found in business surveys. Using the survey data, we seek an empirical model that

gives useful descriptions of consumers having higher inclinations to purchase durable

goods, assuming that consumers in the 'better' category have higher levels of

discretionary and postponable consumer expenditures. Additionally, we want to explore


how major motivational forces (i.e., age, income, party affiliation) influence consumer

purchases of durable goods during the short-term horizon when unforeseen or imperfectly

foreseen events occur.

As an illustration of the way the BEBR survey data is used, in the BEBR Florida

Consumer Confidence Index press release dated June 3, 1992, Dave Denslow, a University

of Florida Research Economist, reports on various results associated with the survey data.

He states:

The stronger demand for housing stems from rising employment and falling
mortgage rates. Along with it comes greater optimism about near-term prospects
for the national economy. The share of respondents expecting the national
economy to revive during the coming year rose to 41 percent in May, up from 36
in April. (1)

This example illustrates a case where we see a connection made between 'home-buying

plans' and consumer expectations for the national economy. As another example,

reporting on a 'sagging' confidence index, in the press release dated August 3, 1992,

Economist Denslow states:

By itself the July change is trivial. More troubling is the way confidence has
stalled. Our index rebounded from the 60s in January, but except for the 81
registered in April it has been stuck in the 70s ever since. Only if it climbs well
into the 80s can we expect consumer spending to surge. (1)

For this case, a connection is made between 'climbing consumer confidence' and a

'surge' in consumer spending.

To illustrate how the data may be used in other ways, let's consider people's

replies to the question, "Now, looking ahead--do you think that a year from now you (and

your family living there) will be better off financially, or worse off, or just about the

same as now?". Note that Figure 2.1 shows a fairly stable confidence index from

FUT-CONF => BEBR future-financial-component confidence-value
Figure 2.2 Consumers' financial confidence during 1992

January through September of 1992. Also during this time, consumers witnessed a

considerable amount of intense debate regarding the candidates for the 1992 Presidential

Elections. Figure 2.2 shows consumer confidence of their future financial situation, along

with answer percentages for the respondents' replies. We see that, for the time period,

consumers' level of confidence for their future-financial-state remained fairly stable. Note

that for Figure 2.2 and the following graphics, FUT-CONF represents the confidence level

or value for the 'future-financial-component' of the BEBR composite index.

Table 2.1 Income and age distributions of financial confidence

Income        FUT-CONF   % Better   % Same   % Worse   # obs
< $25K           91.2      35.39     49.72     14.89    1424
$25K-$45K        98.7      42.12     46.45     11.43    1339
$45K-$75K       102.1      45.55     44.17     10.28     652
> $75K          111.3      54.87     37.99      7.14     308


FUT-CONF (# obs) by income and age:

Income        18-44 yrs      45-65 yrs     > 65 yrs
$25K-$45K     109.3 (843)    87.4 (300)    70.7 (196)
$45K-$75K     108.9 (418)    90.1 (179)    70.7 (55)
> $75K        119.3 (188)    98.6 (95)     99.7 (25)



Table 2.1 shows consumer confidence for their future financial situation, for this

time period, distributed by income and age groups of consumers. The table shows a

confidence level, based on our question, and, the distribution of respondents' replies.

Considering the income categories shown on the table, we see that, for Jan '92 through

Sep '92, in general, as consumers' income levels go up, they reported higher levels of

confidence. For the three age groups shown in the table, we observe that as consumers

got older, their levels of confidence decreased. In the lower part of Table 2.1 we see

future financial confidence levels for the survey data grouped by income and age. An




interesting observation is that consumers between the ages of 18 and 44 reported a fairly

high level of confidence regarding their future financial condition, regardless of their

income level!
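The cross-tabulations behind Tables 2.1 and 2.2 amount to computing reply percentages within each demographic group. The following sketch illustrates the computation on a small invented sample; it is not the BEBR code, and the records and labels are illustrative assumptions only.

```python
# Sketch (not the BEBR code) of a reply-percentage cross-tabulation:
# for each income group, the share of each financial-expectation reply.
from collections import Counter, defaultdict

# Invented (income group, reply) records standing in for survey responses.
records = [
    ("< $25K", "better"), ("< $25K", "same"), ("< $25K", "worse"),
    ("> $75K", "better"), ("> $75K", "better"), ("> $75K", "same"),
]

counts = defaultdict(Counter)
for income, reply in records:
    counts[income][reply] += 1

# Percentage of each reply within each income group.
pct = {
    income: {reply: 100.0 * n / sum(c.values()) for reply, n in c.items()}
    for income, c in counts.items()
}
print(round(pct["> $75K"]["better"], 2))   # -> 66.67
```

The same grouping applied to a second attribute (e.g., age) yields the two-way cross-tabulations in the lower halves of the tables.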

As another example, Table 2.2 shows distributions of financial confidence using

the sex and party affiliation of consumers. From the table, essentially, male consumers

reported higher levels of future financial confidence than female consumers. Also, of the

three party affiliations of consumers, Republicans reported the highest levels of

confidence, and Democrats reported the lowest confidence levels. Note from the cross-

tabulation in the lower half of Table 2.2 that consumers who were members of the

Democratic party reported essentially the same level of future-financial-confidence,

regardless of their sex!

Recall from Chapter 1 that we proposed two hypotheses regarding the BEBR

business surveys and 'useful' descriptions of consumer-households. For the first

hypothesis, we need consumer-household demographic descriptions for the four categories

of responses representing the respondents' future financial condition, in order to see if

descriptions for the 'don't know' category are unique, during the recent recession. For

the second, we need descriptions of consumer purchase plans for automobiles, in terms

of their attitudes and expectations (i.e., CURFIN, FUTFIN, USFUFI, USNEX5, and

GBTIME), keeping the four categories of answers mutually exclusive. Given that we

have obtained such descriptions, the next section describes an approach for testing our


Table 2.2 Sex and party distributions of confidence

Sex        Confidence   % Better   % Same   % Worse   # obs
MALE           99.0       43.74     43.54     12.73    1980
FEMALE         93.7       36.31     51.33     12.35    2550

Confidence (# obs) cross-tabulated by sex and party affiliation (only one row recoverable):

...           100.7 (738)    89.9 (822)    92.5 (591)

2.3.1 Experimental Design using the BEBR Business Surveys

We used eleven fixed attributes of consumer-households in our attempt to model

consumers' perceived future financial conditions. The consumer

information captured by our attributes includes: (1) age, (2) level of education, (3)

employment status, (4) number of people living in the household, (5) family annual

income, (6) marital status, (7) household residence in a metropolitan statistical area, (8)

job category, (9) political party affiliation, (10) racial background, and, (11) sex.








These demographic attributes are used in many of the econometric models found in the

literature. Appendix A lists the alternative values for the demographic data as well as the

values for the 'time' attribute. For the 'TIME' dimension, we examine the survey

responses for one quarter preceding the recessionary episode, the quarters of the recession,

and one quarter following it (i.e., the recovery period). This time span represents an

event where the CCI swung from a relatively high level to a relatively low level, and

back again to a relatively high level. Also, the 'regional' information for a consumer-

household is captured using the 'MSA' attribute. The 'MSA' value reflects whether the

consumer resides in a Metropolitan Statistical Area. For the BEBR data, these are

geographic units for economic analysis located in the state of Florida; however, the

concept of metropolitan areas has a national interpretation.

We want to test one dependent variable for each of the hypotheses given in

Chapter 1 regarding the BEBR data. For the first hypothesis, the dependent variable is

given by the respondent's answer to the future-financial-condition question on the survey

(i.e., FUTFIN). Keeping the categories of answers mutually exclusive, (i.e., 'better',

'same', 'worse', and 'don't know'), we test to see if descriptions for the 'don't know'

category are unique. Note that in the previous illustrations regarding the BEBR survey

data, the 'don't know' answers were combined with the 'same' answers when we

formed the future-financial-component confidence-value (FUT-CONF).

The dependent variable for the second hypothesis is given by the respondent's

answer to the survey question asking whether anyone in the household plans to buy a car

or truck. The independent variables for this case are: CURFIN, FUTFIN, USFUFI,


USNEX5, and GBTIME. These attributes represent the attitudes and expectations of

consumers. Given descriptions of consumers' intentions to purchase automobiles over the

recessionary and recovery periods, we want to study these descriptions to see if they

represent plausible ones.

The following chapters describe an empirical tool we developed that produces the

type of consumer-household descriptions we desire. We discuss the experimental results

given by our model using the BEBR data in the chapter describing our experiments. In

the final chapter, we discuss the significance of our results.


3.1 Background in Artificial Intelligence

The field of Artificial Intelligence (AI) focuses on designing or describing systems

normally associated with activities undertaken by humans. Thus, the fundamental

interests of AI research include modeling activities such as understanding natural (i.e.,

human) languages, problem solving, and more.

One way to describe Artificial Intelligence is by using references to its active areas

of research and to the many applications developed in the field. The AI applications

developed to date help subdivide the field into the following disciplines:

-Languages and Environments for AI
-Natural Language Understanding and Semantic Modeling
-Modeling Human Performance
-Automated Reasoning and Theorem Proving
-Game Playing
-Planning and Robotics
-Pattern Recognition

Reasoning systems are the focus of this dissertation.

3.2 Reasoning Systems

Initial work in AI reasoning systems during the 1950's and 1960's was largely

unsuccessful and too ambitious for the computing models and equipment of that era. As


a result, researchers refocused their efforts and concentrated on search methods and

knowledge representation.

Researchers achieved many advances in search and knowledge representation in

the 1970's. However, the application areas (such as medicine, chemistry, mathematics

and the like) were still too ambitious.

A leading researcher, Edward Feigenbaum, suggested that developers limit

reasoning systems to areas where they can meaningfully capture and apply human

expertise. This suggestion resulted in the birth of Expert Systems. Expert-systems

development is currently a very active area yielding significant returns for using human-

expert knowledge in the form of computer code. In the mid-1980's, many Expert System

(ES) Shells became commercially available. An ES shell contains software to (1)

maintain a Knowledge-base (KB), (2) reason with the knowledge to solve a problem, and,

(3) communicate with the user. It does not contain any specific knowledge about a

domain of interest--hence the term 'shell'. One must obtain the knowledge and load it

into an expert system shell. The process of knowledge acquisition has undergone

extensive research.

The most common methods of knowledge acquisition involve interviewing

methods. A Knowledge Engineer (KE) interviews the Domain Experts (DE). The KE

then translates the information into the form of knowledge representation needed by the

ES shell.

Knowledge-Acquisition is a time-consuming process that has bedeviled many

attempts at fielding ES applications. Feigenbaum (1983) states that the "knowledge-


engineering-bottleneck" severely limits the practical development of knowledge-based

systems. He associates the 'bottleneck' with using both domain experts and computer

engineers to build expert systems. This practice is both costly and difficult. Many AI

practitioners are now dissolving the bottleneck by developing programs which begin with

a minimal amount of information, if any, and 'learn' on their own.

This research focuses on Knowledge-Acquisition techniques using Decision Trees

and Decision Lists. The remaining sections of this chapter describe Machine Learning,

Decision Trees, and Decision Lists.

3.3 Machine Learning

There are two major approaches to studying learning. Cognitive scientists try to

develop theories and models of learning observable in humans and other animals.

Researchers in Artificial Intelligence develop theories and models of any type of learning.

These theories and models do not necessarily involve living organisms. The focus of this

research is on such machine-oriented theories and models of learning.

3.3.1 Views of Learning

Cohen and Feigenbaum (1982) list four views of learning: (1) any process by

which a system improves its performance; (2) the acquisition of explicit knowledge; (3)

skill acquisition; and (4) theory formation and discovery.

The first, improving performance on a given problem, is the most studied form of

learning. Valiant (1984) describes learning as a 'phenomenon' of knowledge acquisition


in the absence of explicit programming. This view of learning grew out of research in

problem solving.

The acquisition of explicit knowledge is a more limited view of task performance.

Skill acquisition refers to the phenomenon whereby one becomes more proficient at a task

with practice. Finally, theory formation and discovery views learning as the process

of scientific discovery of principles and theories.

Our focus is on the first two views of learning. Hence, we will view Machine

Learning as developing information processing systems which expand their knowledge

base and exhibit improved performance.

3.3.2 A Model of a Learning Machine

Figure 3.1 shows the components of a learning machine (Cohen and Feigenbaum

1982). The task of the performance element is the focus of the learning system. The

learning element tries to improve the performance element. It also bridges any

information gaps between the environment and the performance element.

The environment consists of all information required by the machine. This is the

application's domain. Many applications use the closed world assumption. This states

that anything not derivable from the given facts is either irrelevant or false.
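A minimal sketch of the closed world assumption (the facts here are invented for illustration): a query absent from the knowledge base is treated as false rather than unknown.

```python
# Closed-world assumption sketch: a fact not contained in (or derivable
# from) the knowledge base is treated as false, not as unknown.
kb = {"bird(tweety)", "flies(tweety)"}

def holds(fact):
    # Closed world: membership decides truth; absence means False.
    return fact in kb

print(holds("flies(tweety)"), holds("flies(rocky)"))   # -> True False
```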

The learning element will sample the environment to acquire knowledge that will

improve the machine's performance. We call this sample a training set. The training set

provides a sample of instances from the domain of interest. The space of all possible

instances defines the instance space.

Figure 3.1 Learning machine model

The knowledge base contains the facts and rules derived by the learning element.

The management of a large knowledge base is also a problem of AI research. The

problems are similar to designing a database management system. The knowledge must

be easily searched, retrieved, modified and stored. The 'organized information', or

knowledge, can be specific, general, procedural, declarative, exact or fuzzy. Procedural

knowledge, for example, exists as a set of instructions used to solve a problem. Typical

knowledge representation methods include frames, constraints, production rules, and


mathematical logic. Production rules, for example, are rule-based schemes which use

procedural knowledge. The procedure for solving a problem in a given domain exists as

a set of 'if...then...' rules.
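As a minimal sketch of procedural knowledge encoded as production rules (the rule set and fact names are invented, not drawn from any system discussed here), each rule pairs a condition with a conclusion, and firing one rule can enable others:

```python
# Production-rule sketch: each rule is a (condition, conclusion) pair.
# Forward chaining applies rules until no new facts can be derived.
rules = [
    (lambda f: "unemployed" in f, "income-risk"),
    (lambda f: "income-risk" in f and "high-debt" in f, "poor-credit-risk"),
]

def forward_chain(facts):
    """Repeatedly fire rules; facts derived by one rule may trigger later rules."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for condition, conclusion in rules:
            if condition(facts) and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(sorted(forward_chain({"unemployed", "high-debt"})))
```

Here the second rule can only fire after the first has added 'income-risk' to the working set of facts.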

The learning element consists of the various procedures and functions employed

to expand the knowledge base or improve the machine's ability to perform one or more

tasks. It functions as an 'automated' knowledge acquisition tool within the learning

machine model to delimit the rule space. One of its important functions is hypothesis

formation. Through trial and error, the learning element revises the current hypothesis

in response to the data contained in the sample from the instance space. It generates

these hypotheses using a learning strategy.

Hypotheses are evaluated by the feedback provided to the learning element. This

feedback normally consists of results from comparing the hypothesis to some 'oracle'.

Finally, effective rule assessment by the learning element requires transparency of the

performance element. Transparency in this case refers to the learning element's ability

to trace the actions of the performance element.
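The interaction among these components can be rendered as a toy loop (a sketch of Figure 3.1, not any particular system; the threshold concept and function names are invented):

```python
# Toy rendering of the learning-machine model: the learning element
# samples the environment, forms a hypothesis (the knowledge base), and
# feedback compares the performance element's answers with the labels.
environment = [(x, x > 5) for x in range(10)]   # labeled instance space

def learn(training_set):
    """Learning element: hypothesize the smallest positive threshold."""
    positives = [x for x, label in training_set if label]
    return min(positives) if positives else float("inf")

def perform(threshold, x):
    """Performance element: apply the learned concept to an instance."""
    return x >= threshold

threshold = learn(environment)   # here the training set is the whole space
feedback = sum(perform(threshold, x) == label for x, label in environment)
print(threshold, feedback)   # -> 6 10
```

The feedback count (all ten instances classified correctly) is the kind of signal the learning element would use to accept or revise its hypothesis.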

3.3.3 Learning Strategies

There are several basic learning paradigms or learning strategies. Of particular

interest are (Cohen and Feigenbaum 1982):

Rote Learning is a simple type of learning similar to memorization. One retrieves
stored knowledge from the system when necessary. However, storage
space can become a problem for a relatively large knowledge base.

Learning by Instruction or Being Told is a type of learning requiring a
transformation of information into knowledge structures and operations


suitable for use by the machine. The problem area involves interpreting
and assimilating both system requests and advice into a machine-usable
form. One must also integrate the knowledge into a knowledge base.

Learning from Examples is a form of learning that performs inductive
information processing on a set of examples (a training set).

Learning by Analogy is a form of learning consisting of two steps. The first
step involves building a knowledge base. The second step is an analogical
mapping where one performs deductive inference on new problems based
on their similarities to the existing knowledge base.

Learning can take place in two broad settings: 'supervised' or 'unsupervised'.

In supervised learning, a teacher (or an all-knowing oracle) is present. The presence of

a teacher removes ambiguities from the training set. The machine can then learn much

more rapidly and efficiently. In unsupervised learning, the learning system has no

instructor but must acquire knowledge on its own. The training set may be full of

ambiguities which the learning algorithm must resolve on its own.

The four strategies for learning reflect a decreasing reliance on supervision and an

increasing complexity of the inference process. For example, in rote learning, the teacher

directly supplies information. No inference is needed. Learning by analogy involves

little supervision but requires a complex inference capability.

Within each general strategy, we employ different inference mechanisms to

varying degrees. The main inference mechanisms are deduction and induction.

Deduction moves from general truths to specific cases whereas induction moves from

specific cases to generalizations.

Deductive information processing is 'truth preserving'. All 'truths' classified by

the deduced information are implied by the initial information. Hence, new information


'preserves' the facts contained in old information. Deriving specific facts from general

rules or developing new rules from old ones are deductive procedures.

Inductive information processing is 'falsity preserving'. The induced information

correctly categorizes all fallacies contained in the initial knowledge. Using raw data or

examples to establish laws, rules, or general patterns are examples of inductive information processing.


Learning by Instruction and Learning from Examples are (arguably) the two most

appropriate strategies for knowledge acquisition aimed at expediting the construction of

knowledge-based applications. Learning by instruction takes two common forms. In one,

there is a computer-based system which aids an expert or knowledge-engineer in building

and testing a knowledge-base. TEIRESIAS (Davis 1982) was the first illustration of this

approach. Others are discussed in "Knowledge Acquisition for Knowledge-Based

Systems: Notes on the State-of-the-Art" (Boose and Gaines 1989). The focus of this work

is on machine learning, independent of interaction with an expert.

The second type of learning by instruction is called Explanation-based

Generalization (Mitchell, Keller, and Kedar-Cabelli 1986; Dejong and Mooney 1986;

O'Rorke 1989; Flann and Dietterich 1989). Here, a source provides initial knowledge that

may not be directly usable. One then uses deduction to obtain more directly applicable

information. The deduction provides a 'proof' of the desired goal using the starting

knowledge. This proof becomes an explanation that can be generalized to more directly

usable 'compiled' knowledge. Hence, Explanation-based Generalization is a useful tool


for machine-learning in an area where a formal theory or wealth of deeper knowledge

may exist.

Learning from examples is an induction process. The two extremes of learning

from examples bracket the range between supervised and unsupervised learning. Under

supervised learning, the teacher may classify the training set into disjoint sets of examples

of a concept. Yet another form of supervised learning lets the learning element query the

teacher to determine what particular examples illustrate. In unsupervised learning, the

learning element examines the training set to discern features that may impact the

performance element.

The remainder of this work focuses on learning from examples. Furthermore, we

will restrict our discussion to supervised learning. In unsupervised learning, the inference

task is more difficult, although there are many available methods, which include neural

nets (Kohonen 1988), cluster analysis (Cooley and Lohnes 1971), and others (Michalski

and Stepp 1983). Work in supervised learning from examples has focused on learning

theories and learning algorithms.

3.3.4 Learning Theories

Over the past decade there has been an explosive growth in the theory of learning.

Much of this work can be attributed to two seminal ideas: Mitchell's Version Space

(Mitchell 1982) and Valiant's PAC (Probably approximately correct) learning (Valiant

1984). Haussler (1988) has been a prolific source of many important results.


Learning theory considers three aspects of concept formation. They are concept

accuracy, storage efficiency, and computational efficiency. While a discourse on the

state of machine-learning theory is beyond the scope of this dissertation, an acceptable

learning algorithm must operate within reasonable storage and time limitations to produce

an acceptably accurate concept. 'Reasonable' usually means some polynomially bounded function of the size of the inputs.


Concept accuracy shows 'how well' the system learns. This is the percentage of

instances correctly classified by the learned concept. This measure is well suited for

description or classification tasks. However, it may not be appropriate for a pattern-

matching task.

Storage efficiency indicates 'how costly' it is for the system to learn. Memory

is a resource that system developers must manage well. Thus, superior memory

management for a given task demonstrates improved performance.

Computational efficiency reveals 'how long' the system takes to learn. A

desirable property for any application is that the computational process by which the

machine learns takes only a small number of steps. This is normally a relative measure. Two

algorithms exhibit similar performance if both learn using a number of computational

steps on the same order of magnitude. On the other hand, a single algorithm performs

poorly if its computation time is some exponential function of a combination of its inputs.

3.3.5 Learning Algorithms

A learning algorithm inputs a sample and outputs a concept or a 'FAIL' message.

Of course, the hope is that the output concept is 'close' to the true (target) concept.



Figure 3.2 Learning algorithm considerations

There are many variations that a given algorithm may consider. Figure 3.2 shows

eight such considerations. In this section we list and discuss several of these

considerations.

Representation of learned concepts

In AI, learning is viewed as concept formation. In its simplest view, a concept

is an equivalence class of instances. The description of the class in some language


provides the class description. Each class contains the subset of instances which are

members of the class. In supervised learning, the teacher or oracle identifies the correct

class for each member. For example, suppose we had a collection of geometric figures

(e.g. squares, rectangles, right triangles, isosceles triangles, lines, circles, ellipses, dots,

spheres, cubes, pyramids, ...). This instance space, (the collection of figures), can be said

to contain the object-classes of rectangular objects, triangular objects, circular objects, and

linear objects. A more general object-class for this same instance space is polygon.

Squares and pyramids are 'positive' instances of a polygon whereas circles and lines

represent 'negative' instances of a polygon. A possible concept for this polygon class

might be 'number of sides > 2'. For this case the representation language must detail the

'side' object for a polygon and the operation of 'counting' the sides.
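A sketch of this polygon concept over a toy instance space (the side-count encodings below are invented stand-ins for a real representation language):

```python
# Toy instance space: each figure is encoded by its number of sides.
# Curved figures are encoded with 0 sides; a pyramid is loosely encoded
# by the sides of its base. These encodings are illustrative assumptions.
instances = {
    "square": 4,
    "right triangle": 3,
    "pyramid": 4,
    "circle": 0,
    "line": 1,
}

def is_polygon(n_sides):
    """Learned concept: 'number of sides > 2'."""
    return n_sides > 2

# Classify every instance into a positive or negative example.
labels = {name: is_polygon(n) for name, n in instances.items()}
print(labels)
```

Squares, triangles, and pyramids come out as positive instances; circles and lines as negative ones, matching the class membership described above.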

As another example, consider the concept of 'a person who is a poor credit risk'.

This can be the set of all people who will default on a loan. We might describe this class

using some language such as:

-People who are unemployed
-People who are heavily indebted relative to their
income source
-Younger people who have a police record

Incremental and Non-Incremental Learning

Another consideration is whether the algorithm will operate in an 'incremental'

or 'non-incremental' learning mode. Incremental learning algorithms revise the current

hypothesis after sequentially examining each instance in the training set. Re-examining

any previous instance is not possible. The structure of the training set and the order in

which the instances arrive are factors which impact the efficiency of the algorithms. Non-


incremental learning algorithms examine the entire training set as a whole for creating

concepts. They re-examine the training instances as many times as needed to revise the

hypothesis.

Dealing with uncertainty

An important issue related to an algorithm is its ability to handle uncertainty.

Uncertainty has two sources--residual variation and noise. Unrecorded extraneous factors

which affect the results represent residual variation. Conflicts in the training set,

misclassification errors and measurement errors are all examples of noise. The presence

of noise increases the computational complexity of many learning algorithms.

Learning single or multiple concepts

Recall the earlier example involving geometric figures. Finding a concept to

determine the 'polygon' class represents single-concept learning. The single concept--

(number of sides > 2)--classifies all of the figures into positive or negative examples of

a polygon. Finding concepts to identify the classes of rectangular objects, triangular

objects, circular objects, linear objects, etc., is the focus of multiple-concept learning

algorithms.

Algorithm's search strategy

There are many approaches to implementing a learning algorithm. However, inductive

learning usually involves a search strategy. Using a good search method improves the

chances of getting a solution quickly. The most important issue associated with search

is the space to be searched--the 'state space'. This is the set of all possible concepts.


A 'state' represents a discrete 'examination point'. The state space contains the set of all

possible states along with implied states using various defined operators.

Operators represent ways to 'move' to successor states, or general rules of

inference to create new assertions from existing ones. A third issue related to search is

the 'control strategy' used for guiding the search. This is simply the search strategy.

Several general methods are available for searching. One can also classify learning

algorithms according to the searching technique they use.

Data-driven methods use the training instances (data) to drive the search. These

algorithms simply specify the search order for examining the search space. Depth-first

and Breadth-first are two common data-driven search techniques. These are 'blind' search

techniques because they use exhaustive approaches and will not see the goal until they

get there.
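Depth-first and breadth-first differ only in which end of the frontier they pop from, as this generic sketch shows (the integer state space and goal are invented for illustration):

```python
from collections import deque

def blind_search(start, goal, successors, breadth_first=True):
    """Exhaustive, data-driven search: nothing about the goal guides
    the order of examination; only the frontier discipline differs."""
    frontier = deque([start])
    visited = {start}
    while frontier:
        # Breadth-first pops the oldest state; depth-first pops the newest.
        state = frontier.popleft() if breadth_first else frontier.pop()
        if state == goal:
            return state
        for nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(nxt)
    return None

# Toy state space: integer states; operators are 'add 1' and 'double',
# capped so the space stays finite and the search terminates.
succ = lambda s: [s + 1, 2 * s] if s < 100 else []
print(blind_search(1, 10, succ))   # -> 10
```

Both disciplines eventually reach the goal here, but they examine the states in very different orders, which is what makes them 'blind'.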

Model-driven methods use an a priori model to guide the search. The designer

incorporates background or domain knowledge in the model to increase the efficiency of

the search. One uses knowledge of the goal state to eliminate certain areas of the search

space from examination. The knowledge may be heuristic or definitive.

Concept formation goals

The goal behind the specific desire to learn a concept may be a consideration.

Four common goals are classification, description, pattern matching and prediction.

Researchers commonly use one of these four tasks to develop new techniques and

examine variations of existing techniques. Also, intellectual processes such as learning

and reasoning inherently involve these same goals.


Classification is a standard goal. Given a new or unknown object, the system

must classify the object into a positive or negative instance of a learned concept.

Description is a core task for many learning algorithms. Given a set of positive

and negative instances of some concept, the system must learn the concept that describes

all of the positive examples. Recall that a learned concept, 'number of sides > 2',

described a set of geometric figures consisting of rectangles, squares, cubes, pyramids,

etc., and not circles, spheres, cones, etc.

Pattern Matching is a task commonly found in the 'adaptive' systems developed

for many engineering applications. These systems are adaptive because of the dynamic

or constantly changing domain-characteristics associated with them. Given an input

object, the system must first establish a pattern for the object. Next, the system

determines whether the object's pattern matches any of the existing patterns currently

stored in the knowledge base. For example, suppose we want a system that translates a

handwritten sentence from English to Japanese. One of the first tasks to perform is

identifying each letter and/or word of the sentence. Recall that most people have a

unique handwriting style. Hence, the appearance of any given handwritten letter or word

varies from person to person. However, everyone forms their letters according to some

standard template--the alphabet. The input to the system is the handwritten letter. The

output is the correctly identified alphabet the handwritten letter represents. The task is

to map the input to the proper output.
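A minimal sketch of such a mapping (the 2-D templates below are invented stand-ins for letter shapes, not a real handwriting system): classify an input pattern by its nearest stored template.

```python
# Template-matching sketch: classify an input vector by the stored
# template with the smallest squared Euclidean distance. The templates
# are invented 2-D stand-ins for letter shapes.
templates = {"A": (0.0, 1.0), "B": (1.0, 0.0)}

def match(pattern):
    """Return the name of the nearest template to the input pattern."""
    return min(
        templates,
        key=lambda k: sum((p - t) ** 2 for p, t in zip(pattern, templates[k])),
    )

print(match((0.1, 0.9)))   # -> A
```

A handwritten 'A' that deviates slightly from the canonical shape still maps to the 'A' template, capturing the variation across writing styles.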

Prediction is another traditional goal. Using a learned concept and a sequence

of instances, the system must predict other likely examples of the sequence.

Application domain

Many AI applications show potential payoffs for developing and using intelligent

computer programs. AI practitioners do not develop programs that act 'just like a

human'. Instead, they develop programs which incorporate their understanding of the

intellectual process to solve problems. Performing this process is laborious, even by the

experts, in domains where the knowledge is incomplete and not well defined. Choosing

the conceptual primitives for these types of domains is a difficult process. On the other

hand, there are other domains for which large amounts of knowledge exist. Learning,

thinking and reasoning in these domains are straightforward processes and the conceptual

primitives are easily specified.

AI practitioners also focus on constructing knowledge bases that are domain

independent. This means that a low level of coupling exists between the performance

element and the environment. This makes it possible to 'interchange' knowledge bases

without making major changes to the system design. One can then study the performance

of a given learning paradigm using several representation schemes. A key to achieving

domain independence is separating the domain-specific knowledge from the knowledge

representation. The representation determines the inferences, relations, and computational

objects available to the machine. The language also affects the process of acquiring and

organizing the knowledge in the knowledge base.

To help separate the representation scheme from the application medium, Davis

(1982) stratifies knowledge into three distinct levels. The first level contains object level

knowledge. This level focuses on knowledge about objects in the application's domain.


The next level concerns the conceptual building blocks of the knowledge representations.

It details the various tools and techniques for acquiring and manipulating the knowledge.

The third level describes the conceptual primitives behind representations in general. This

represents 'meta-knowledge' or knowledge about the objects and structures of the

representation language itself.

Criterion

Another consideration necessary to choose or develop a learning algorithm is to

elaborate the measure of usefulness to be employed. In other words, at what point will

the algorithm's output be useful enough to adequately solve a problem? For example, in

PAC learning, one tries to determine a concept that, with probability at least 1 - δ, has

error no greater than ε. In neural net learning, a common criterion is to minimize the

sum of squared errors between the target outputs and the network's learned outputs.
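In standard PAC notation (with ε the accuracy parameter and δ the confidence parameter), the criterion asks the learner to output a hypothesis h satisfying

```latex
\Pr\big[\operatorname{error}(h) \le \varepsilon\big] \;\ge\; 1 - \delta
```

where error(h) is the probability that h misclassifies a randomly drawn instance.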

Of these eight considerations, representation of learned concepts, search strategies,

and usefulness criterion are of special interest for this work. The next sections describe

concept description languages and briefly review a standard algorithm used in our research.


3.4 Concept Description Languages

The language chosen to represent the concept is critical to the learning process.

The language may not be expressive enough to exactly capture the concept. Or, it may

be too complicated for a human to understand or use.


One taxonomy of learning may focus on the concept description language. Some

common examples of concept descriptions include:

1. Linear equations,

2. Non-linear equations,
-Neural nets

3. Decision Trees,

4. Conjunctive equations,

5. Disjunctive equations,

6. K-DNF equations,

7. K-CNF equations, and

8. Decision Lists (Rivest 1987).
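A decision list in Rivest's (1987) sense is an ordered sequence of test/label pairs with a default label at the end; the first test that succeeds determines the classification. A minimal sketch (the tests over boolean feature vectors are invented):

```python
# Decision-list sketch: try each (test, label) pair in order; the first
# test that succeeds decides the classification, with a default at the end.
def make_decision_list(pairs, default):
    def classify(x):
        for test, label in pairs:
            if test(x):
                return label
        return default
    return classify

# Toy tests over boolean feature vectors x = (a, b, c).
dl = make_decision_list(
    [(lambda x: x[0] and x[1], True),    # a AND b  -> positive
     (lambda x: not x[2], False)],       # NOT c    -> negative
    default=True,
)
print(dl((1, 1, 0)), dl((0, 0, 0)), dl((0, 0, 1)))   # -> True False True
```

The ordering matters: (1, 1, 0) satisfies both tests, but the earlier one wins, which is exactly what distinguishes a decision list from an unordered set of rules.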

Concept formation is integrally tied to the desired representation of the knowledge-base.

Restrictions placed on the language for representing learned concepts achieve a

balance between the expressiveness and efficiency of the system. For example, most

people prefer simpler explanations to complex ones, whereas highly efficient routines tend

to produce complex expressions. This 'bias' limits the ability of the program to learn

only the required concepts, but it also increases the efficiency of the search. Considering

inductive learning, Haussler (1988) states:

The most prevalent form of inductive bias is the restriction of the
hypothesis space to only concepts that can be expressed in some limited
concept description language, e.g. concepts described by logical
expressions involving only conjunctions. A still stronger bias can be
obtained by also introducing an a priori preference ordering on hypotheses,
e.g. by preferring hypotheses that have shorter descriptions in the given
description language. (178)

There are a number of different research issues in knowledge representation. They

range from the semantics of the language itself, to the kinds of hierarchies and

inheritances supported by the scheme. Additionally, there are issues concerning the kinds

of knowledge and how to distinguish them. Following is a brief discussion of several of

these key issues.

Barr and Feigenbaum (1981) list four kinds of knowledge represented in AI

systems. They are object knowledge, event knowledge, performance knowledge and

meta-knowledge. Object knowledge represents facts about objects in the world around

us. Event knowledge indicates what we know about actions and events in the world. For

example, a well known event is that the sun rises in the morning. Performance

knowledge consists of knowledge about how to do things. Finally, meta-knowledge

pertains to knowledge about knowledge.

Additionally, the authors introduce three characteristics useful for comparing

different representation schemes--scope, understandability, and modularity. Scope refers

to the amount of information, or level of detail, used to describe objects and events.

Understandability relates to how well humans comprehend the information in its present

form (i.e., the data structure). This is important not only for acquiring knowledge from

the experts but also for interacting with and giving explanations to the users. Modularity

concerns itself with the degree of autonomy for adding, deleting, or modifying individual

chunks of knowledge in the system. The amount of interaction between the various

database entries depends on the representation scheme and data structure used.


This research focuses on using Decision Trees and Decision Lists as concept

description languages. Specifically, we explore learning DNF concepts using binary

decision trees, and the extension of this approach to representing the concept as a decision list. We desire

to develop a framework for feature construction, or methods of enlarging the set of

primitive attributes with additional attributes, or features, that we construct using

combinations of the primitives. The approach uses ideas from the fields of machine

learning, pattern recognition, and category theory. The following two sections describe

the notation and/or terms used in the sequel.
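The idea of enlarging the primitive attributes with constructed features can be sketched concretely. The pairwise-conjunction scheme and function name below are purely illustrative; they are not the construction method developed in this research.

```python
# Sketch of feature construction: enlarge a set of primitive Boolean
# attributes with features built as conjunctions of pairs of primitives.
# The pairwise-AND scheme here is one simple illustration of the idea.

from itertools import combinations

def construct_conjunctive_features(example):
    """Append x_i AND x_j for every pair of primitives to the example."""
    new_feats = [a & b for a, b in combinations(example, 2)]
    return list(example) + new_feats

x = (1, 0, 1)                        # three primitive attributes
print(construct_conjunctive_features(x))
# -> [1, 0, 1, 0, 1, 0]: the primitives followed by
#    x1&x2 = 0, x1&x3 = 1, x2&x3 = 0
```

A learner run on the enlarged attribute set can then express a conjunction of primitives as a single constructed feature, which is the motivation for feature construction in the first place.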

3.4.1 Binary Trees

We begin this section with a brief overview of terms and properties associated

with binary trees. A research goal is to find equivalent binary trees (i.e., trees giving the

same decision) that generally satisfy some size-optimality rule. A complete discussion

of the following concepts is found in Sedgewick (1992) and Safavian et al. (1991).

A vertex is a simple object, (or node), having a name or label. Two vertices are

connected by an edge. A nonempty collection of vertices and edges satisfying various

requirements is a tree. A list of distinct vertices, where successive vertices are connected

by edges, describes a path in the tree. The defining property of a tree is that there exists

exactly one path between any two nodes in the tree. We refer to a tree by its one

designated root node. Hence, any node in the tree defines a subtree consisting of the

node--as a root node--and the nodes below it.


With the exception of the root node, each node in the tree has exactly one parent,

or node above it, and zero or more children, or nodes directly below it. If the order of

the children is specified for the nodes, then we have an ordered tree. Leaves, or terminal

nodes, are nodes without any children. Nonterminal nodes have at least one child.

Additionally, terminal nodes and nonterminal nodes are sometimes called external nodes

and internal nodes. Each external node and internal node has an associated external path

length and internal path length. The external/internal path length is the sum of the

lengths for the paths from each external/internal node to the root.

The number of nodes on the path from any node in the tree to the root, (excluding

the node), defines the level of the node. The maximum level among all nodes in the tree

defines the height of the tree.
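These two definitions translate directly into a short recursive computation. The node class below is a minimal illustration, not an implementation from this research; by convention an empty subtree is given height -1 so that a single node has height 0.

```python
# Minimal sketch of the level/height definitions above. The root sits
# at level 0, its children at level 1, and the height of the tree is
# the maximum level among all of its nodes.

class Node:
    def __init__(self, label, left=None, right=None):
        self.label, self.left, self.right = label, left, right

def height(root):
    """Maximum level over all nodes; a lone node has height 0."""
    if root is None:
        return -1  # empty subtree: one level below a childless node
    return 1 + max(height(root.left), height(root.right))

# 'a' at level 0; 'b' and 'c' at level 1; 'd' at level 2
t = Node('a', Node('b', Node('d')), Node('c'))
print(height(t))  # -> 2
```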

A binary tree is an ordered tree having both internal and external nodes such that

every internal node has exactly two children. Furthermore, the terms left child and right

child refer to the ordered children of an internal node. An empty binary tree has one

external node and no internal nodes. A binary tree where internal nodes completely fill

every level, except for possibly the last, is called a full binary tree. A complete binary

tree is a full binary tree when only external nodes appear at the two greatest levels.

Furthermore, all nodes on the maximum level appear to the left. Note that the external

nodes of a binary tree only serve as placeholders. The major focus of constructing binary

trees is to 'structure' the internal nodes according to some scheme.

Finally, we have the following well-defined properties associated with trees.

PROPERTIES [Tree Properties, Sedgewick (1992)]

4.1 There is exactly one path connecting any two nodes in a tree.

4.2 A tree with N nodes has N - 1 edges.

4.3 A binary tree with N internal nodes has N + 1 external nodes.

4.4 The external path length of any binary tree with N internal nodes is
2N greater than the internal path length.

4.5 The height of a full binary tree with N internal nodes is about log2 N. (38-39)
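Property 4.3 in particular is easy to check mechanically: if every internal node of a binary tree has exactly two children and external nodes are represented as empty placeholders, counting both kinds of node always yields one more external node than internal. The node class below is illustrative only.

```python
# Quick check of Property 4.3: a binary tree with N internal nodes has
# N + 1 external nodes. External placeholder nodes are modeled as None.

class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def count_internal_external(root):
    """Return (internal, external) node counts; None is an external node."""
    if root is None:
        return (0, 1)
    li, le = count_internal_external(root.left)
    ri, re = count_internal_external(root.right)
    return (1 + li + ri, le + re)

t = Node(Node(None, Node()), Node())      # 4 internal nodes
internal, external = count_internal_external(t)
print(internal, external)                 # -> 4 5
assert external == internal + 1           # Property 4.3 holds
```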

3.4.2 Decision Trees

This research focuses on representing Boolean functions using binary decision

trees. A Boolean function is a function of Boolean variables (i.e., elements of the set

{0,1}) or literals (a Boolean variable or its negation). A binary decision tree consists of

both internal and external nodes. Each internal node represents a comparison of two

objects and has an edge for each outcome. Each external node, or leaf, represents a result.

Pagallo (1990) shows that every Boolean function has a Decision Tree or Disjunctive

Normal Form (DNF) representation. Other results of using Decision Trees, or DNF

equations, are found in Rivest (1987) and Haussler (1988).
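The correspondence between a decision tree and a DNF formula can be made concrete: each root-to-leaf path ending in a 1-leaf contributes one conjunction of literals, and the disjunction of these conjunctions is a DNF representation of the same function, in the spirit of the Pagallo (1990) result cited above. The tuple encoding of trees below is an illustrative choice, not notation from this research.

```python
# Sketch: evaluate a binary decision tree over Boolean variables, and
# read a DNF formula off its 1-leaves (one conjunction per path).
# A tree is either a leaf value 0/1 or a tuple (var_index, 0-branch, 1-branch).

def evaluate(tree, x):
    """Follow the branch selected by x at each internal node."""
    if tree in (0, 1):
        return tree
    i, zero_branch, one_branch = tree
    return evaluate(one_branch if x[i] else zero_branch, x)

def dnf_terms(tree, path=()):
    """Collect one conjunction (a list of (index, value) literals) per 1-leaf."""
    if tree == 1:
        return [list(path)]
    if tree == 0:
        return []
    i, zero_branch, one_branch = tree
    return (dnf_terms(zero_branch, path + ((i, 0),)) +
            dnf_terms(one_branch, path + ((i, 1),)))

# f = x1 XOR x2 as a decision tree: test x1 at the root, then x2
tree = (0, (1, 0, 1), (1, 1, 0))
print(evaluate(tree, (1, 0)))  # -> 1
print(dnf_terms(tree))         # two terms: (~x1 & x2) or (x1 & ~x2)
```

Note that the extracted DNF can have one term per 1-leaf, so a small tree may still yield a formula with many terms; the reverse translation, from DNF to a small tree, is the harder direction and motivates the feature-construction ideas in this research.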

Next, we develop a practical approach we need for examining both decision trees

and decision lists. We use the formal definitions of decision trees and the functions they

represent as found in Ehrenfeucht and Haussler (1989). Hence, let us adopt the following


Let V_n = {v_1, ..., v_n} be a set of n Boolean variables. Let X_n =
{0,1}^n. The class T_n of decision trees (over V_n) is defined recursively as