The role of fluid, crystallized and creative abilities in the prediction of scores on essay and objective tests

Material Information

Title: The role of fluid, crystallized and creative abilities in the prediction of scores on essay and objective tests
Physical Description: x, 108 leaves ; 28 cm.
Language: English
Creator: Legg, Sue M., 1940-
Publication Date: 1978

Subjects

Subjects / Keywords: Intelligence tests (lcsh); Creative ability (lcsh); Prediction of scholastic success (lcsh)
Genre: bibliography (marcgt); theses (marcgt); non-fiction (marcgt)

Notes

Thesis: Thesis--University of Florida.
Bibliography: Includes bibliographical references (leaves 98-106).
Statement of Responsibility: by Sue M. Legg.
General Note: Typescript.
General Note: Vita.

Record Information

Source Institution: University of Florida
Rights Management: All applicable rights reserved by the source institution and holding location.
Resource Identifier: aleph - 000079878; notis - AAJ5188; oclc - 04993707
System ID: AA00003912:00001

Full Text


THE ROLE OF FLUID, CRYSTALLIZED AND CREATIVE
ABILITIES IN THE PREDICTION OF SCORES
ON ESSAY AND OBJECTIVE TESTS


By

SUE M. LEGG


A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL
OF THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY


UNIVERSITY OF FLORIDA

1978


Copyright 1978

by

Sue M. Legg

ACKNOWLEDGMENTS

I would like to express my appreciation to Dr. William B. Ware, Chairman of my Doctoral Committee, for his guidance throughout my graduate education. For his critical appraisal of this manuscript, I am very grateful. The direction and encouragement of the members of my Committee, Dr. Linda Crocker and Dr. James Wattenbarger, are gratefully acknowledged. I would also like to thank Dr. John Bengston and Dr. Bob Burton Brown for their interest in this project and their willingness to serve as unofficial members of the Committee.

Dr. Jeaninne Webb, Director of the Office of Instructional Resources, has provided both encouragement and opportunity. The years I have spent working in Testing and Evaluation in her office have enhanced my graduate training beyond measure. A special thank you to Mrs. Susan Anderson for the preparation of this manuscript.

The professional and personal support of my husband, Keith, has made this dissertation possible, and the patience of my family has sustained me.

TABLE OF CONTENTS

                                                           PAGE

ACKNOWLEDGMENTS ............................................ iii

LIST OF TABLES ............................................... v

ABSTRACT ................................................... vii

CHAPTER I: INTRODUCTION ...................................... 1

CHAPTER II: REVIEW OF THE LITERATURE ........................ 14

CHAPTER III: METHOD ......................................... 35

CHAPTER IV: RESULTS ......................................... 50

CHAPTER V: DISCUSSION AND CONCLUSIONS ....................... 73

CHAPTER VI: SUMMARY ......................................... 84

APPENDIX A: EXAMPLES OF CONCRETE AND ABSTRACT ITEMS ......... 91

APPENDIX B: ITEM ANALYSIS OF THE FINAL OBJECTIVE TEST ....... 94

APPENDIX C: QUESTIONNAIRE ................................... 97

BIBLIOGRAPHY ................................................ 98

BIOGRAPHICAL SKETCH ........................................ 107

LIST OF TABLES

TABLE                                                      PAGE

 1  Items and Time Allotted to Each Test on the Cattell
    Culture Fair Intelligence Test, Scale Three ........... 38

 2  McGraw-Hill Reading Test, Summary of Items and KR20
    Reliabilities ......................................... 39

 3  Number of Items in Categories of Bloom's Taxonomy of
    Educational Objectives ................................ 40

 4  General Academic Background ........................... 52

 5  Summary of Items about Test Format .................... 53

 6  Test Loadings on Rotated Principal Axes ............... 54

 7  Correlation Matrix of Predictor and Criterion
    Variables ............................................. 55

 8  Summary of the Descriptive Statistics for the
    Predictor and Criterion Variables: Raw Scores ......... 56

 9  Mean Scores on the Predictor Variables by Category of
    Z Scores on the Essay and Objective Tests ............. 58

10  Summary of the Results of the Regression for Cross
    Validation ............................................ 59

11  Summary of the Changes in R2 due to the Addition of
    Predictor Variables using Objective Test Scores as
    the Criterion ......................................... 62

12  Summary of the Changes in R2 due to the Predictor
    Variables with Essay Test Scores as the Criterion ..... 62

13  Standardized Beta Weights for Predicting Essay and
    Objective Test Scores ................................. 63

14  Summary of the Descriptive Statistics for the
    Criterion Variables ................................... 64

15  Summary of the Changes in R2 due to the Addition of
    Predictor Variables using Concrete Item Scores as
    the Criterion ......................................... 65

16  Summary of the Changes in R2 due to the Addition of
    Predictor Variables using Abstract Item Scores as
    the Criterion ......................................... 66

17  Standardized Beta Weights for the Regression of
    Concrete and Abstract Items on Measures of Fluid and
    Crystallized Abilities ................................ 67

18  Summary of Statistical Tests .......................... 68

19  Standardized Discriminant Function Coefficients ....... 68

20  Univariate Tests for Significant Differences Among
    the Groups ............................................ 69

21  Group Centroids ....................................... 69

22  Percentage of Cases Correctly Classified .............. 70

Abstract of Dissertation Presented to the
Graduate Council of the University of Florida
in Partial Fulfillment of the Requirements for the
Degree of Doctor of Philosophy

THE ROLE OF FLUID, CRYSTALLIZED AND CREATIVE
ABILITIES IN THE PREDICTION OF SCORES
ON ESSAY AND OBJECTIVE TESTS

By

Sue M. Legg

August 1978

Chairman: William B. Ware
Major Department: Foundations of Education

In recent years, psychometricians and cognitive psychologists have begun to conceptualize research questions which span both fields. Concern about the nature of the trait being measured and its relationship to differential cognitive abilities has grown. One of the questions which has arisen from this area of research involves the degree of relationship between types of test items and the mental processes which are required to succeed on the items. However, little empirical research has been done relating differential cognitive abilities to success on essay and objective tests. This specific problem is the focus of this study.

In this study it was anticipated that a combination of student and test attributes would help to explain why many students score differently on essay and objective examinations which were designed to measure the same course content. Two sets of variables were characterized as student attributes. The first set included patterns of fluid, crystallized and creative abilities which predicted test performance. The other set included the previous experience of the students with the subject matter and the test format.

Student attributes alone were not likely to explain the variation in performance on essay and objective tests. Attributes of the tests themselves could influence student performance. Therefore, the test items were classified according to their required intellectual process. The essay item was designed to measure the ability of the students to synthesize the material and to express any creative insights into the conceptual relationships. One half of the objective test items required the abstract reasoning ability associated with fluid intelligence. The remaining items were designed to measure the ability to comprehend and analyze the subject matter; these skills were more closely related to the crystallized abilities in Cattell's framework. Items associated with fluid ability were labeled abstract, and items which required crystallized abilities were labeled concrete.

This study was designed to evaluate the argument that a broad, unstructured essay measured abstract and creative reasoning abilities while objective tests tended to measure more crystallized abilities. The counterargument that differences in scores on essay and objective tests could be attributed to differences in the abilities required to succeed on abstract and concrete objective test items was also evaluated. The investigation was conducted in four stages. A preliminary phase of the study involved a description of the sample and a factor analytic investigation of the independence of a creativity dimension. Next, the relative contribution of the ability measures to the prediction of performance on the essay and objective tests was compared using multiple regression techniques. The contribution of fluid and crystallized ability to the prediction of concrete and abstract items was compared using multiple regression techniques in the third stage of the analysis. The final phase of the study was an investigation of the premise that a combination of fluid, crystallized and creative abilities and scores on concrete and abstract items could be used to classify students as better on essays or better on objective tests. Discriminant function and classification analyses were conducted to answer this question.

A separate creativity dimension was established. However, there were no significant differences in the regression weights of the equations used to predict success on essay and objective tests or the equations written to predict success on abstract and concrete items. The discriminant function which separated students who scored higher on the essay examination from those who scored higher on the objective examination reached statistical significance. Students who scored higher on the objective examination tended to earn higher scores on the abstract items associated with fluid intelligence. Students who scored higher on the essay test tended to earn higher scores on the concrete objective test items.


CHAPTER I

INTRODUCTION


Traditionally, one measure of a man has been the elegance of his prose. Yet, the vagaries inherent in the scoring of essays were recognized as early as 1912 in studies conducted by Starch and Elliott. Essays were not found to be valid indicators of future academic achievement, and pressure mounted for the adoption of new types of examinations which were objective in form and for which a sophisticated technical theory of test construction and scoring was being developed.

The use of objective examinations did not resolve the conflict about the validity of testing. The validity of either examination form could not be established unless the reliability of both essay and objective tests was improved. Psychometricians have tended to concentrate upon improving the technical quality of the tests. Through research they sought to identify and reduce sources of measurement error to improve the reliability of test scores. The validity questions were investigated through correlational studies of essay and objective tests and other criteria such as related class exercises. Seldom were psychometricians interested in the intellectual processes which the tests were purported to measure.

The intellectual processes involved in learning became the concern of the cognitive psychologists. Their studies represented attempts to identify and classify the mental abilities which constitute intelligence and relate to specific learning tasks. As Carroll (1974) noted, the psychometricians and cognitive psychologists pursued two practically non-overlapping areas of research, both of which were critically involved with the validity of testing.


The Problem

In recent years, psychometricians and cognitive psychologists have begun to conceptualize research questions which span both fields (e.g., Hunt et al., 1975; Horn, 1976; Cronbach, 1975; Stenhouse, 1976). Concern about the nature of the trait being measured and its relationship to differential cognitive abilities has grown. The literature on aptitude-treatment interactions (ATI) is replete with examples of these studies. One of the questions which has arisen from this area of research concerns the degree of relationship between types of test items and the mental processes which are required to succeed on the items. The relationship between different mental abilities and the ability to solve various types of analogy items was investigated by Whitely (1976). However, relatively little empirical research has been done which relates differential abilities to success on essay and objective tests. This specific problem is the focus of this study.

The Measurement Approach to the Problem

Prior to an examination of the relationships between cognitive abilities and test formats, several measurement problems must be addressed. Early comparison studies between essay and objective examinations yielded substantial correlations between scores on essay and objective tests, but the uncorrelated variance may have been due either to a difference in the function of the tests or to measurement error (Weidemann & Newens, 1933; Vernon, 1959, 1962; Andrews, 1968; Godshalk, Swineford, & Coffman, 1966). Attempts to verify the unique contribution of essay tests by Godshalk et al. (1966), Modu (1972), and Andrews (1968) had contradictory results.

Vernon (1961) stated that it was logical that scores from essay and objective tests over the same content would be correlated. Even though essay questions may have been designed to measure higher level mental processes, evidence indicated that some essays contained substantial factual level information, and objective tests may have directly or indirectly measured understanding and thinking processes. However, both types of tests were imperfect measures.

Measurement errors in objective tests could be traced to a variety of sources: ambiguities due to directions, choice of foils, relevance to course material, and assessment of complex thinking. Vernon cited studies in which objective tests developed by different instructors covering the same material and administered to the same students correlated only about 0.50. Corrections for differences in difficulty of the items and scoring techniques did not account for the low magnitude of these correlations. Vernon concluded that there may be a trade off between the lower reliability of the essay and the validity problems of the objective tests. Moreover, "since the errors which reduce the validities are different it follows that they measure somewhat different aspects of ability" (Vernon, 1961, p. 228).
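
Vernon's point about errors that reduce validities can be made concrete with the classical correction for attenuation, which estimates how strongly two tests would correlate if neither score were degraded by measurement error. The following sketch is purely illustrative: the observed correlation and the reliabilities are hypothetical values in the spirit of the studies cited, not figures from this study.

    import math

    def disattenuate(r_xy, r_xx, r_yy):
        # Spearman's correction for attenuation: the correlation two
        # tests would show if neither contained measurement error.
        return r_xy / math.sqrt(r_xx * r_yy)

    # Hypothetical figures (not from this study): an observed correlation
    # of .50 between an objective test with reliability .90 and an essay
    # test with reliability .60.
    print(round(disattenuate(0.50, 0.90, 0.60), 2))  # 0.68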

Vernon argued that if the value of the essay as a measurement device was that it tended to give greater opportunity for the measurement of the more complex mental processes, the essay item should be broad and not highly structured. Essays should also be marked for imagination, fluency and unusual understanding. While short, tightly structured essays were more reliably scored, multiple essays of this type may be only poor substitutes for well constructed objective test items.

The results of attempts to identify skills unique to essay examinations have been inconclusive. Yet, the evidence that some students consistently differ in their ability to take essay and objective tests has grown. French reported at the 1962 ETS invitational conference on testing that SAT scores were related to the mechanics of writing but not to the style of writing. French asked: "Isn't this what English teachers have been telling us psychometricians all along--that essay tests of writing ability measure something that objective tests do not measure? They seem to be right . . ." (French, 1962, p. 27). Biggs and Braun (1972) reported that it appeared likely that the students scoring in the middle of the distribution were the most severely affected. Students at either end of the distribution tend to perform about the same on either test format. The problem, as Coffman (1971) suggested, is to find valid criteria to document and explain these differences in ability.

The Cognitive Approach to the Problem

The ability to write essays may be more than a combination of knowledge of the content and good form; it may also include particular cognitive abilities. A theoretical framework in which to place an investigation of this premise has been developed by Cattell. The differentiation of mental abilities into crystallized and fluid analytic abilities was presented in Cattell and Butcher (1968). These forms of intelligence were second order correlated factors of Guilford's primary abilities. Fluid intelligence represents an analytic ability measured by culture free, non-verbal tests, while crystallized ability is related to those skills taught in a particular culture. Cattell and Butcher stated that, "The extent to which the skills will be correlated will be a function both of the extent to which they are dependent upon the single gf ability, and of the extent to which they have received the same length of training and the same opportunities to improve" (p. 21).

The distinction between these abilities is clarified in the following passage: "Crystallized ability . . . loads more highly those cognitive performances in which certain initial intelligent judgments have become crystallized as habits. That is to say, fluid general ability, which is in some ways the more fundamental of the two, has at some time been applied in this field, and the individual, by memorizing former responses, is enabled to make further new judgments. . . . Fluid general ability, on the other hand, shows more in tests requiring adaptations to entirely new situations where crystallized skills are of no advantage because they do not apply to the particular data" (p. 19).

An extension of the fluid and crystallized ability framework which included a separate creativity dimension was advocated by Horn (1976). The literature surrounding the controversy over the existence of an independent creativity dimension has been reviewed in Chapter II. The relevance of creativity in the scoring of essays is apparent. However, the existence of an independent creativity factor for these data was established in order to include creativity in the conceptual framework of this study.

Support for the association between ability differences and test format can be found in an article by Snow (1976). This article reviewed research on studies of aptitude treatment interactions and focused attention upon two complex hypotheses which Snow believed deserved the most study. The first complex was identified as the Ax-Ai-Ac complex, which asserts that individual differences in anxiety (Ax), achievement via independence (Ai), and achievement via conformity (Ac) interact with instructional treatments differing in their degree of teacher structure and student participation. The summary of studies investigating this complex supports the general hypothesis that anxious, conforming students perform better with structured learning while independent, confident students do less well in structured learning situations.

In the study conducted by Biggs and Braun (1972) these personality constructs were related to differences in achievement on essay and objective test formats. Scores from five tests were factor analyzed, and essay and objective test factors emerged. Again the best students did well on both measures, but middle range students did worse than expected on the objective examinations. It appeared that students who excelled in structured situations did better on objective tests and more independent students performed better on essay tests.

In most aptitude-treatment interaction studies, the personality constructs are related to general ability measures. However, as Snow stated, most ATI research is based upon an undifferentiated general intelligence. Since general intelligence is itself a complex of fluid-analytic intelligence (Gf) and crystallized-verbal intelligence (Gc), a better understanding of the differences in achievement measured by essay and objective tests may be obtained by focusing upon differentiated ability rather than upon differences in personality constructs.


Purpose of the Study

Student performance on examinations over the same subject matter has often varied depending upon the format of the test. In this study it was anticipated that a combination of student and test attributes would help explain why many students score differently on essay and objective examinations which were designed to measure understanding of the same course content.

The purpose of this study was to examine the contribution of student and test attributes to the prediction of performance on essay and objective tests. Two sets of variables were characterized as student attributes. The first set included measures of fluid, crystallized and creative abilities which predict test performance. The other set concerned the previous experience of the students with the subject matter and the test format.

Student attributes alone were not likely to explain the variation in performance on essay and objective tests. Attributes of the tests themselves could influence student performance. Therefore, the test items were classified according to their required intellectual process. The essay item was designed to measure the ability of the students to synthesize the material and to express any creative insights into the conceptual relationships relevant to the question. The objective test was equally divided between items which required the abstract reasoning associated with fluid intelligence and items designed to measure the more crystallized abilities of comprehension and analysis. Items associated with fluid ability were labeled abstract, and items which required crystallized abilities were labeled concrete.

This study was designed to evaluate the argument that a broad, unstructured essay measures abstract and creative reasoning abilities, and that differences in the scores on the essay and objective test could be attributed to differences in the abilities required to succeed on abstract and concrete objective test items.

The following questions have directed this inquiry.

1. Is there a linear relationship between the essay and objective test scores?

2. What are the patterns of fluid, crystallized and creative abilities which can be used to predict scores on the essay and objective tests?

3. Are different patterns of abilities predictive of success on essay and objective tests?

4. Is there a difference in the patterns of abilities which are related to successful performance on concrete and abstract test items?

5. Will linear combinations of fluid, crystallized and creative abilities and scores on concrete and abstract items predict a difference in scores between essay and objective tests?

The questions directing the inquiry have been reworded to test the following statistical hypotheses:

1. The multiple correlation coefficients describing the regression of essay and objective test scores on fluid, crystallized and creative abilities are equal to zero.

   a. The measure of fluid ability will not increase the accuracy of prediction beyond the variance predicted by crystallized abilities for essay and objective test scores.

   b. Creativity test scores will not increase the accuracy of prediction of essay and objective test scores beyond the variance predicted by measures of fluid and crystallized abilities.

   c. The previous experience with test format and number of related courses will not significantly increase the variance explained in objective and essay test scores by fluid, crystallized, and creative abilities.

   d. The interaction of crystallized abilities and creativity will not significantly increase the variance explained in objective and essay test scores by fluid, crystallized and creative abilities, and the experience variables.

   e. There are no significant differences in the regression weights for measures of fluid, crystallized and creative abilities in the prediction of success on essay and objective tests.

2. The multiple correlation coefficients representing the regression of scores of concrete and abstract objective test items on fluid and crystallized abilities are zero.

   a. The measures of fluid ability will not increase the accuracy of prediction of concrete and abstract objective test items beyond the variance predicted by the measures of crystallized ability.

   b. There are no significant differences in the regression weights for measures of fluid and crystallized abilities in the prediction of success on concrete and abstract items.

3. There are no significant differences in the pattern of abilities for students who score high on objective tests, high on essay tests, or equally on essay and objective tests.
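
Hypotheses of this form are conventionally tested by comparing nested regression models: a predictor is added to a reduced model, and the resulting change in R2 is evaluated with an F statistic. The sketch below illustrates the form such a test takes for hypothesis 1a; the data and effect sizes are hypothetical stand-ins, not the measures analyzed in Chapter IV.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 168  # the sample size reported in Chapter III

    # Hypothetical ability scores; in the study these would be the
    # crystallized and fluid measures described in Chapter III.
    crystallized = rng.normal(size=n)
    fluid = rng.normal(size=n)
    essay = 0.5 * crystallized + 0.2 * fluid + rng.normal(size=n)

    # Reduced model: crystallized ability only.
    reduced = sm.OLS(essay, sm.add_constant(crystallized)).fit()
    # Full model: crystallized plus fluid ability (hypothesis 1a).
    full = sm.OLS(
        essay, sm.add_constant(np.column_stack([crystallized, fluid]))
    ).fit()

    # F test for the increment in R-squared between the nested models.
    f_stat, p_value, df_diff = full.compare_f_test(reduced)
    print(full.rsquared - reduced.rsquared, f_stat, p_value)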


Significance of the Study

The dual purpose of testing as a teaching device and an assessment tool has long been recognized. But the effect of testing upon learning and the validity of tests as measures of learning are continually being debated. Traditional measurement research related to this debate has centered upon the elimination of errors due to the construction and scoring of tests. In spite of the technical quality of current tests, educators have become increasingly concerned with the clarification of what is being tested. The extensive publications on criterion referenced testing and the current emphasis on functional skills assessment are testimony to this interest.

A further step in the examination of the validity of a test is to uncover the relationship between the form of the measurement and the trait being tested (Campbell & Fiske, 1959; Cronbach & Meehl, 1955). The manner in which an ability is assessed may influence which mental functions are used by students in the testing situation. Moreover, systematic differences in the type of intelligence required to succeed on different test formats may obscure the validity of the measurements of students' abilities.

This study represents an attempt to strengthen the tie between the academic disciplines of measurement and cognitive psychology. An insight into the form of testing and the ability being measured may provide new directions for validity studies.


Organization of the Study

The rationale for this inquiry is explained in Chapter I. The literature pertaining to the theoretical framework of the study and relevant empirical studies is discussed in Chapter II. Chapter III contains a description of the methods used to empirically investigate the relationship between scores on essay and objective tests and fluid and crystallized ability. Results of this study are reported in Chapter IV. The discussion of the results, with the conclusions and implications of the findings, is presented in Chapter V. The final chapter includes a summary of the study.

CHAPTER II

REVIEW OF THE LITERATURE


The literature review has been organized around the following topics: comparative validity of essay and objective tests, cognitive and creative components of intelligence, problems in the scoring of essays, and problems in estimating the reliability of essays. These topics are interrelated. An assessment of the validity of essays or objective tests is incomplete unless the mental abilities being tested are understood. However, in order to document the validity of the tests which measure these abilities, a minimum level of reliability of the instrument had to be assured. Methods to score essays and to establish the reliabilities of those scores have been developed in answer to this need.

From this review of the literature a theoretical framework was constructed. Patterns of fluid and crystallized intelligence and creativity were related to reliably scored essay and objective tests. Factors in the construction of the examinations, as well as differences in the experience of the students with the course content and the test formats, have been included in the design of the study.

Comparative Validity Studies

Comparisons between essay and objective examinations have been designed to assess whether or not the two formats measured the same type of achievement. These studies were generally conducted by an analysis of the correlations between the essay scores and scores on an objective test covering the same content. The reliabilities and correlation coefficients were compared. The results indicated that substantial correlations existed between the two types of tests. However, as noted in the introduction, the portion of the variance which did not correlate may have been due to a difference in the function of the tests or to measurement error (Weidemann & Newens, 1933; Vernon, 1959, 1962; Andrews, 1968; Godshalk, Swineford, & Coffman, 1966). These conclusions were reached because the reliabilities of the objective tests were higher than the essay test reliability. However, reliability for factual type questions was higher than for analysis level items. Therefore, the more complex the objective examination, the less reliable was the score.

Cieutat (1960) did correlate factual and application objective test items with factual and application short answer exercises. The correlation for the factual tests was .62, and the correlation for the application tests dropped to r = .47. Studies of this nature support the belief that factual items are more valid than applied items.

Noyes (1963) analyzed the essay scores of 646 eleventh and twelfth grade students. The purpose of the study was to predict writing ability from various combinations of short objective tests, each measuring a different aspect of writing ability. The fact that the essay tests correlated more highly with each other than with the multiple choice test indicated that the essays may have been measuring a different type of achievement. The study was criticized because an analysis of the intercorrelations of the objective tests may have altered the conclusions (Andrews, 1968).

Comparisons of the relative validity of essay and objective tests were enhanced by the use of multiple regression techniques. College Board studies by Godshalk et al. (1966) found that twenty minute essay scores did make a unique contribution to the predictive validity of a one hour objective English composition test when the criterion was the score on a two and one half hour essay examination. But in a later study by Modu (1972) it was concluded that very little new information was gained by including a twenty minute essay in an American history objective test. Moreover, the essay scores may have reflected factual knowledge rather than more complex processes such as application or synthesis.

Study Methods

A completely different line of inquiry into the relationship between the two test formats dealt with questions about the way in which students learned and retained information. The influence of test format on study methods for examinations has been the focus of several studies. Observational and questionnaire data reported in articles by Terry (1933) and Meyer (1936) suggested that students were concerned with detail when preparing for objective examinations. The Meyer study concluded that achievement on tests of either format was greater when students expected an essay examination. Similar results were reported by Katona (1940) and Weidemann and Newens (1933). However, Vallance (1947) and French (1956) questioned whether either method of testing had any effect on the retention of content over time. No reliable differences in achievement were found by Hakstian (1971) between students who expected either an objective or an essay test and were given the opposite. The results of these studies make any definitive statement on the value of a particular format for promoting good study habits questionable at best.

In another study, scores for both essay and multiple choice tests were increased by allowing open book examinations (Feldhusen, 1961). Results from questionnaires revealed that students believed that open book examinations reduced rote memorization and promoted learning during testing, regardless of the form of the test.

Learning Styles

Recent articles by Biggs (1970) and Biggs and Braun (1972) have reported the results of investigations into the relationship between test format preference and personality characteristics. Biggs established that a stable relationship existed between personality characteristics and study behavior. These characteristics were correlated with the results from a factor analysis of the scores from two essays and three objective tests. The essay test scores loaded on one factor, and the objective test scores loaded on the other factor. The implication of these findings was that students did differ in their ability to take essay and objective tests. Therefore, the grading procedure for a course could discriminate against some students on the basis of their learning style. For example, if an instructor weighted all examinations equally and gave mixed format tests, the course grade for a student may have depended upon the particular combination of test formats.
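
The factor-analytic logic behind this finding can be illustrated with a short sketch. The data below are synthetic, built so that two "essay" scores and three "objective" scores each reflect a separate underlying ability; the rotated loadings then display the format-specific pattern Biggs reported. This is an illustration of the method, not a reanalysis of his data.

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(1)
    n = 200
    essay_ability = rng.normal(size=n)
    objective_ability = rng.normal(size=n)

    # Two essay scores and three objective scores, each a noisy measure
    # of the corresponding synthetic ability.
    def noise():
        return 0.5 * rng.normal(size=n)

    scores = np.column_stack([
        essay_ability + noise(), essay_ability + noise(),
        objective_ability + noise(), objective_ability + noise(),
        objective_ability + noise(),
    ])

    # Two rotated factors; the essay tests should load on one and the
    # objective tests on the other. (rotation="varimax" requires a
    # reasonably recent scikit-learn.)
    fa = FactorAnalysis(n_components=2, rotation="varimax").fit(scores)
    print(fa.components_.T)  # rows are tests, columns are factors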

Biggs and Braun termed equal weight scoring of tests for a course grade the union model of scoring. Scores based upon the union model were related to types of study behavior. This comparison revealed that union scoring favored students who were dependent on the instructor, organized, tended toward rote learning, and could relate information. The alternative scoring model was the disjunction model, in which individual students were tested by the specific strategy which worked best for them. Biggs and Braun rescored the same set of examinations using this model and found that students who were independent did better. It should be noted that the best students did well with either model. Thus, the discrimination in scoring affected the middle range of the distribution of scores.

Mental abilities and test format. Several attempts have been made to discover the nature of the difference in mental function measured by the essay and objective test format (Huddleston, 1954; Cast, 1940; Vernon, 1962; Rothkopf & Thurner, 1970). These studies correlated the essay and objective scores with a criterion. Lee and Symonds (1933) reported that objective tests correlated more highly with intelligence than did essay tests; the higher correlations may, in fact, have been due to the higher reliability of the multiple choice tests. Andrews (1968) stated that multiple choice and essay tests correlated more highly with each other than with any other assignment except the total grade assigned for several journal reviews.

The studies previously cited were attempts to verify intra-individual differences in a student's ability to succeed on both essay and objective examinations. There was a consensus that essay and objective tests may tap somewhat different abilities under certain conditions. The length of the essay and the complexity of the required thinking process were important considerations. Relatively few studies have attempted to describe the cognitive abilities which might have a bearing upon the problem.

Much of the current interest in differential cognitive abilities has been generated by Guilford's (1967) Structure of Intellect model. This model resulted from a factor analytic investigation of a variety of thinking abilities. Memory, cognition, convergent and divergent production, and evaluation factors emerged from the analyses of tests of primary mental abilities. The development of these factors in individuals was thought to depend upon a combination of innate and environmental influences.

The cognition factor included an element which could aid in the interpretation of the differences in success on various test formats. Guilford identified a type of cognition in which the implications of actions were recognized. This ability was of two types, concrete and abstract foresight. Guilford stated that foresight was an important ability for the political strategist or policy maker. Thus, the extent to which the objective test items measure the same cognitive abilities as well as the same content would be important in determining the comparability of the two test formats.

The Production factor in the Structure of Intellect model is particularly relevant to this study. The individual tendency toward convergent or divergent thinking may have a bearing on success on different test formats. Guilford (1956) explained as follows:

     In convergent thinking, there is usually one conclusion or
     answer that is regarded as unique, and thinking is channelled
     or controlled in the direction of that answer. In tests of
     the convergent thinking factors, there is one keyed answer to
     each item. Multiple choice tests are well adapted to the
     measurement of these abilities. In divergent thinking, on the
     other hand, there is much searching or going off in various
     directions. This is most clearly seen when there is no unique
     conclusion. (p. 274)

Studies which linked patterns of cognitive abilities and success on different item formats have been encouraged by Carroll (1974) and Messick (1972). Whitely (1976) followed the suggestion to study the item in order to further knowledge about abilities and learning. Whitely had two purposes for her study of the analogy item. One purpose was to determine if the relational concepts which are the basis for analogies influence the specific cognitive aptitudes reflected in analogy item performance. The second purpose was to discover whether or not individual differences in the ability to solve analogies could be attributed to individual differences in processing relationships. Whitely found that success on particular types of analogies was related to specific cognitive abilities. However, the kinds of relationships tested on different forms of analogy tests were typically not controlled in the item selection process.

Whitely used the French (1951) kit of primary mental abilities in her analysis. However, Cronbach (1975) advocated the use of broad ability theories in studies of individual differences in learning. Broad ability constructs are of two basic types (Horn, 1976). One type is commensurate with Cattell's (1971) formulation of fluid and crystallized ability; the other includes the hierarchical theories such as Vernon's (1950) verbal-numerical-educational factor and a practical-mechanical-spatial-physical factor. Differences in the development of each theory relate in part to the extent of their dependence upon factor analytic techniques. Fluid and crystallized abilities stem directly from second order factor analyses of Guilford's primary mental abilities, whereas hierarchical formulations include some variables in broad ability constructs on the basis of theory rather than empirical evidence.

Criticisms of Cattell's theory by Humphreys (1967) were methodological in nature. Humphreys did not dispute the existence of the broad ability constructs. Rather, he criticized the inclusion of some near random variables in the correlation matrix of primary mental abilities. He also questioned the decision on the number of factors to rotate. In a reanalysis of the second order factors, Humphreys concluded that intellectual speed and personality factors should be identified along with fluid and crystallized abilities.

There are other researchers who argue that a creativity dimension is independent of general ability factors (e.g., Cropley, 1972; Rossman & Horn, 1972; Kogan, 1971; Murphy, 1973; Torrance, 1970; Wallach & Kogan, 1965). Creativity has eluded precise definition, which may be one reason for the controversy about its measurement. Yet, empirical indicators of creativity labeled as verbal productive thinking consistently recur in the literature. Measures of verbal productive thinking are tests of originality, fluency and flexibility such as those found in the Torrance Tests of Creativity. Similar tests which measure these abilities loaded on the convergent-divergent production factor in the Structure of Intellect model.

An investigation into the relationship of fluid and crystallized abilities and verbal productive thinking was conducted by Vernon (1972). In a review of the study, Horn (1976) reported that verbal productive thinking, with crystallized intelligence partialled out, was not a useful predictor of grades, teacher ratings of imagination or originality, or peer sociometric evaluations. Verbal productive thinking did contribute to the prediction of scores on essays and stories beyond the variance due to fluid and crystallized ability.

In a review of the literature on mental abilities, Horn (1976) stated that while verbal productive thinking was largely independent of intelligence, there was doubt about the extent to which it measured real life creativity. He suggested that studies be designed to show the difference in the pattern of predictions for verbal productive thinking and intelligence. He conjectured that when achievement in literary comprehension or critical reading was the dependent variable, a stepwise multiple regression procedure would select crystallized intelligence first, then fluid intelligence, followed by a little verbal productive thinking. While it is not hypothesized that students who score differently on essay and objective tests are necessarily "real life creatives," deviations from the pattern of regression weights suggested by Horn could indicate that the convergent-divergent thinking factor operates differently on students who score differently on essay and objective tests.
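
Horn's conjecture concerns the order in which predictors enter a forward stepwise regression. The sketch below illustrates that selection procedure on synthetic data whose construction simply mirrors the conjectured ordering; the variable names and weights are hypothetical, not estimates from any study cited here.

    import numpy as np

    def r_squared(X, y):
        # R-squared from an ordinary least squares fit with an intercept.
        X1 = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        resid = y - X1 @ beta
        return 1.0 - resid.var() / y.var()

    rng = np.random.default_rng(2)
    n = 168
    predictors = {
        "crystallized": rng.normal(size=n),
        "fluid": rng.normal(size=n),
        "verbal_productive": rng.normal(size=n),
    }
    # Synthetic criterion built to mirror Horn's conjectured ordering.
    y = (0.6 * predictors["crystallized"] + 0.3 * predictors["fluid"]
         + 0.1 * predictors["verbal_productive"] + rng.normal(size=n))

    entered = []
    remaining = list(predictors)
    while remaining:
        # Enter the predictor that adds the most to R-squared at this step.
        best = max(remaining, key=lambda name: r_squared(
            np.column_stack([predictors[p] for p in entered + [name]]), y))
        entered.append(best)
        remaining.remove(best)
    print(entered)  # expected here: crystallized, fluid, verbal_productive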

David Stenhouse (1976) advanced the argument that creative students would tend to perform better on objective tests than on essay tests. There were two criticisms of objective testing which Stenhouse countered to build his case. First, the opinion that objective tests were merely verbal recall exercises was discounted by using the language game concept (Wittgenstein, 1953). A language game requires a student to understand how words fit together, how they relate to physical objects, and how objects can be used. As knowledge increases, the language becomes more packed with meaning. Thus, the "enigma of expert inarticulateness" can occur (Stenhouse, p. 171). The student may have a plethora of information from which it is difficult to choose quickly. Thus, objective test formats can test the ability of the student to understand the relevant language game, limited only by the ability of the examiner to formulate effective questions. Stenhouse explained the fact that students often are not able to explain their choices in objective examinations by noting that these students have an efficient subconscious upon which they are willing to rely. This explanation is consistent with those creativity studies in which creative students tend to rely on the subconscious and to be high risk takers and self confident (Barron, 1969; Bloomberg, 1973).

Stenhouse suggested that creative students may be characterized in multi-modal test situations by rapid completion of objective items and short essay responses. Stenhouse argues:

     Thus, a good objective test by the fact that it actually
     provides the answers and the candidate has only to select the
     appropriate one, allows an individual whose learning has not
     been thorough in the usual quantitative sense but whose powers
     of discrimination and judgment are high, to score well in
     relation to his essay type score. (p. 177)

Scoring the Essay

The procedures for scoring essay examinations evolved from suggestions by Sims (1931, 1933) and Cochran and Weidemann (1939). According to their procedure, the papers were sorted into from three to five groups after a cursory reading. The essays were then reread and shifted as necessary to the appropriate group. Each question was read separately following a review of the text and lecture materials relevant to the topic. The scoring process included a listing of the main points for an ideal answer. These points were weighted in order of importance. The development of a key against which to judge essay content was an attempt to turn readers away from the distractions caused by spelling, grammatical or other errors. This systematic approach to the marking of essays was intended to reduce the variation in ratings among readers. However, studies contrasting various approaches to marking compositions found that no matter which method was used, the marks of individual examiners diverged widely (Cast, 1939; Hartog & Rhodes, 1936).

Stalnaker (1938) found that weighting the questions for difficulty did not increase score reliability for essay examinations. Correlations between weighted and unweighted scores on the College Board examinations were nearly perfect.

Later studies improved the marking procedures by selecting representative essays to illustrate selected points on the rating scale (Gosling, 1966). A chief examiner was also added to spot check the rating process to insure adherence to standards. Correlations between the marks of the chief examiner and the average of the marks of the other raters were extremely high. The scoring situation was unique in that there were 10 raters who had considerable experience using the rating instrument on those particular topics.

Attempts to minimize sources of measurement error due to rater unreliability led to studies comparing global or "holistic" rating with the analytical procedure. Global scoring yielded an assessment of the essay expressed as a single score. The score represented an integration of all of the criteria used to judge the essay. Analytical scoring gave a composite score in which the various elements were scored separately and summed for a total evaluation. These methods were related to the different perceptions of the nature and function of the essay. Sims (1931) wrote about the functional type essay which ". . . reveals information regarding the structure, dynamics and functioning of the student's mental life as it has been modified by a particular set of learning experiences" (p. 17). Sims (1933) also defined essay examinations as projective techniques. The examinee was confronted with a situation into which he "projected" his personality and drew upon his experiences and values in formulating his response. The latitude allowed in the response was related to the appropriate scoring technique.

The more complex the process of responding to an item, the more difficult it was to develop a suitable key. The danger in analytical scoring was that rating would be too narrowly focused. An essay rated poor when the score consisted of a sum of the component elements in a scoring key could be ranked very high under global scoring. Global scoring permitted an assessment of the interrelationships of the various components of a good essay. In practice it appeared that there was little difference in scoring reliability between the two methods (Coward, 1950; Coffman & Kurfman, 1968). Coward did discover that two global ratings were finished in the time that one analytical rating was made, with equally reliable results. Therefore, it would appear that for most purposes global scoring is preferable.

Problems in Estimating Reliability

A major source of error in estimating the reliability of essay examinations is scoring error. Readers differ in the standards they apply, in their preference for writing styles, and in their allocations of grades (French, 1962; Coffman, 1971). Some raters grade more severely than others, or they may tend to distribute grades differently across the scale. The lack of interrater reliability has been extensively documented (Hartog & Rhodes, 1936; Finlayson, 1951; Vernon & Millican, 1954; Pearson, 1955; Noyes, 1963; Coffman & Kurfman, 1968). Not only do raters differ from each other, the same rater does not mark the same paper consistently over time (Marshall, 1967). One of the higher rates of agreement on ranking the same essays at two separate times was an eighty-nine percent agreement in a study by Phillips (1975). This study did not deal with essays specifically, but with the marking of open ended exercises in the National Assessment of Educational Progress.
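
Two of the indices that recur in these reliability studies, the correlation between raters' marks and the percentage of exact agreement, are simple to compute. The sketch below uses made-up ratings purely to show the calculations.

    import numpy as np

    # Made-up marks from two raters on the same ten papers.
    rater_a = np.array([4, 3, 5, 2, 4, 3, 1, 5, 2, 4])
    rater_b = np.array([4, 2, 5, 3, 4, 3, 2, 4, 2, 4])

    interrater_r = np.corrcoef(rater_a, rater_b)[0, 1]  # correlation index
    percent_agreement = 100.0 * np.mean(rater_a == rater_b)  # exact matches

    print(round(interrater_r, 2), percent_agreement)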

The order in which essays were read has also affected the consistency of scoring. Stalnaker (1936) grouped essays by quality following an initial screening of the papers. Raters were assigned groups of essays arranged according to a pre-determined pattern of poor quality essays and superior ones. The grades for average papers were depressed when they followed good papers. Conversely, average papers received high marks when they were read after poor papers. A follow up study by Hales and Tokar (1975) reached similar conclusions. Therefore, both studies recommended that the marking procedure take into account the possibility that the order in which papers are turned in by students may be related to the quality of the examinations. For example, students of similar ability may sit together or may tend to require a similar length of time to complete the essay. Storey (1968) developed three nearly identical paragraphs which were judged to be excellent, fair and poor by two panels of raters. The paragraphs were submitted to 261 teachers in homogeneous groups. Scores were distributed similarly for each group of essays, and no significant differences were found in the means or standard deviations. The grades awarded reflected teacher set rather than the value inherent in the paragraphs.

Early studies of the effect of handwriting on the evaluation of essays were made by James (1927) and Sheppard (1929). The same compositions were recopied, and the quality of the penmanship was related to the marks given. Chase (1968) reexamined the question in a study of the effect of handwriting quality, spelling accuracy and the use of a scoring key. Significant differences in marking were related to handwriting but not to spelling or the use of the scoring key. Handwriting quality did not have an apparent relation to the grade on the first of the two essays, but substantial differences did occur when the second essay was marked. It was suggested by the author that the readers may have lost patience after grading the first essay and lowered marks on the second in frustration. Markham (1976) used a classification analysis and found that various teacher characteristics (age, experience, levels taught, or degree held) had no significant influence on the score given to essays written by elementary children. However, an analysis of variance of the scores indicated that scores varied significantly when the dependent variable was a ranking of the essays on the basis of handwriting quality. Different results were reported by Marshall (1972), who compared four levels of composition errors and four writing treatments (typed, neat, fair, poor) in a four by four factorial analysis of variance design. In this study, 16 forms of an essay examination which were identical in content but different in the number of errors and neatness were graded by 480 classroom teachers. In this case, no significant differences were found in the analysis.

Another effect on the reliability of essays was the error associated with the choice of topic. Ruch (1929) compared the reliability of eighth grade examinations in 16 subjects with the reliability of marking the examinations. The same paper read twice by two readers had an average correlation of .62. The scores for two papers written by the same student and read by the same person were correlated at a lower level (.43). Consistent results were also reported by Young (1962), Swineford (1964), and Gustav (1968). However, Wiseman and Wrigley (1958) reported that differences between the average scores for children selecting different essay topics could be accounted for by differences in the ability of the children. The authors concluded that the use of essay examinations where children were allowed to choose a topic from a set of topics was not likely to introduce any error in the marking. The choice of topic served as a mechanism for the children to sort themselves out and made no real difference in the final distribution of scores.

The degree to which scores on essays reflect content knowledge or composition skill has been the subject of much research. Scannell and Marshall (1966) and Marshall (1967) used a factorial analysis of variance to investigate the relationship between essay scores and three levels of grammatical errors and four types of composition errors. It was reported that spelling and grammatical errors reduced marks, but punctuation errors did not. Following a principal components analysis of composition errors, Slotnick (1972) compared very high and very low papers. These papers could be distinguished by quality of thought, spelling, range of vocabulary, word choice, sentence structure, emphasis, and paragraph organization. Diederich, French and Carlton (1961) identified five characteristics of essays which contributed to the variability in grades assigned to 300 essays by a cross section of businessmen, teachers and scientists. A factor analysis of essay and Scholastic Aptitude Test scores resulted in five factors. These factors were ideas, form, flavor, mechanics, and wording. The SAT scores were similar to the scores assigned by those readers weighting mechanics and word skills heavily. The SAT scores were unrelated to scores by readers who graded more heavily on ideas.

Fosvedt (1965) selected five criteria for evaluating English compositions from lists drawn from the National Association of Teachers of English, testing services, and journal articles. This approach to the identification of criteria differed from those mentioned above, in which criteria were deduced from an examination of the relationship between the types of errors made and the scores. Fosvedt validated five criteria: coherence and logic, development of ideas, diction, emphasis, and organizing through sentence structure and paragraphing. The criteria were ranked by a panel of ten judges. Twenty themes were evaluated on a three point scale on each of the five criteria. An analysis of variance on the average grades assigned resulted in significant differences in the grades assigned by teachers and among the criteria. The conclusion reached by the author was that even though teachers believed that criteria for judging essays were important, they failed to apply the criteria consistently.

Summary

The results of attempts to measure abilities uniquely

measured by essay examinations have been inconclusive.

The acknowledged difficulty in reliably scoring essays

has tended to attenuate correlation coefficients between

the scores and the criteria. Yet, the belief that essays

measure a unique combination of skills persists. The

task as Coffman (1971) suggested, is to find valid

criteria to document these skills.





33

The process of scoring essays has been refined by

the large testing companies. An adequate number of raters

along with several samples of writing tests contribute

to the reliability of the results for essay examinations.

The mechanics of scoring include training the raters in

the use of the criteria for evaluating the essays, sorting

the papers into categories prior to a final reading, and

spot checking the final reading to insure that similar

standards are being applied.

The content versus style dilemma has not been

resolved, although it appeared that spelling errors and

poor handwriting often had a negative effect on ratings.

Several researchers have investigated the components

upon which content and style ratings were based. While

the criteria that raters may have used were consistent

across studies, the relationship between these criteria

and scoring was not clear.

The unique contribution of essay examinations to

the measurement of achievement has not been clearly

demonstrated. While comparison studies did indicate

that some students score differently on the two types of

examinations, it was not clear to what ability the

difference could be attributed.

Differences in cognitive abilities may be related

to success on specific types of psychometric items.

The debate is between those researchers who associate

fluid ability with creativity and success on essay

examinations, and those who link creativity, objective




examinations and fluid ability. Crystallized ability is

important in any examination. Where success on test

formats is uneven, differential cognitive abilities may

provide an explanation. However, the difference in the

cognitive requirements within an objective test may be

as great as the difference between essay and objective

test items.












CHAPTER III

METHOD


This study was designed to assess the contribution of

fluid, crystallized and creative abilities to the prediction

of success on essay and objective examinations. The investi-

gation was conducted in four stages. A preliminary phase of

the study involved a description of the sample and an investi-

gation of the independence of a creativity dimension. Next,

the relative importance of the ability measures for the

prediction of performance on the essay and objective tests was

compared. The contribution of fluid and crystallized ability

to the prediction of concrete and abstract items was compared

in the third stage of the analysis. It was anticipated that

differences in the cognitive levels of the objective test

items would help to explain the differences in the students'

ability to succeed on essay and objective tests. The final

phase of the study was an investigation of the premise that

a combination of fluid, crystallized and creative abilities

and scores on concrete and abstract items could be used to

classify students as better on essays or better on objective

tests. The subjects for this study were 168 students enrolled

in an introductory course in political science. A description

of the subjects, instruments, and statistical analyses is

presented in this chapter.








The Sample

Students enrolled in an introductory political science

course were the subjects for this study. There were 171

students enrolled in the course, and 3 students chose not

to participate in the study. The remaining students

received a bonus for participation to be used in the event

their course grade was in question. Listwise deletion

of cases due to missing data was used.

The course emphasized the study of international

relations, and the lectures and examinations stressed

conceptual understanding more than factual knowledge.

Since the course fulfilled the general education require-

ment, students were generally sophomores drawn from a

variety of departments within the university.


The Instruments

Five instruments were administered: a questionnaire,

the Cattell Culture Fair Intelligence Test: Scale Three,

the McGraw-Hill Basic Skills Reading Test, the Torrance

Tests of Creativity, and a final two hour examination. A

two hour essay and objective midterm examination was also

administered which gave students experience with the format

of the items used on the final examination.

The questionnaire included eight items drawn from

suggestions by Coffman (1971). The information obtained

from the questionnaire covered the past experience of the

students with other political science courses, with the

instructor, and with essay and objective test formats.







In addition, students were asked to state their test

format preference, if any.

To assess fluid intelligence, the Cattell Culture

Fair Intelligence Test was administered during the second

week of class. Both sections of the test were given to

provide greater score reliability. The test included

100 items in the 4 different subtests: series, classi-

fications, matrices, and conditions. The manual reported

that there were high intercorrelations among the subtests,

and the test represented a valid general ability factor.

Scale Three of the Cattell test was specifically designed

to discriminate among very intelligent high school and

college students.

Reliability for Scale Three was reported at rxx = .91

for college undergraduates as an immediate test-retest

coefficient. Test-retest stability coefficients obtained

by increasing the interval between testing were reported

around .80.

This test measures the "relation eduction capacity . . .

in quite different fields of content, that is, verbal,

numerical, spatial, and social skills" (Cattell, p. 6).

These abilities are largely culture free and represent

fluid intelligence. The number of items and testing time

for the four sections of the Cattell test have been

reported in Table 1.







Table 1

Items and Time Allotted to Each Test on the Cattell
Culture Fair Intelligence Test, Scale Three



Time
Test No. of Items (in minutes)


Series 26 6

Classifications 28 8

Matrices 26 6

Conditions 20 5



The McGraw-Hill Basic Skills Reading Test, Form A,

was used as a measure of crystallized ability. This test

included scores for reading rate, retention of information,

skimming and scanning, and paragraph comprehension. A

total score was obtained by summing the three part scores;

reading rate scores were not included. Only the reading

comprehension and the retention of information scores were

used in the study. The test was intended to assess specific

skills in reading which are relevant to academic success

in college. The number of items in each subtest, the

testing time, and the total number of items have been

included in Table 2. The internal consistency of the

test was calculated at r = .89 using the Kuder Richardson

formula twenty. However, the speed factor in the skimming

and scanning section may have inflated the reliability

coefficient.







Table 2

McGraw-Hill Reading Test
Summary of Items and KR20 Reliabilities


No. of Time
Items KR20 (in minutes)


Part I Reading Rate and 20 .65 16
Comprehension

Part II Skimming and 30 .88 10
Scanning

Part III Paragraph 30 .76 40
Comprehension

TOTALS 80 66
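
For the modern reader, the Kuder Richardson formula
twenty cited above can be sketched as follows. This is a
minimal Python illustration, assuming a matrix of
dichotomously scored item responses; it is not part of the
original analysis.

    import numpy as np

    def kr20(responses):
        # responses: 0/1 matrix with rows = examinees, columns = items
        k = responses.shape[1]                         # number of items
        p = responses.mean(axis=0)                     # proportion passing each item
        q = 1.0 - p
        total_var = responses.sum(axis=1).var(ddof=1)  # variance of total scores
        return (k / (k - 1.0)) * (1.0 - (p * q).sum() / total_var)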



Measures of reading skills are nearly pure measures of

crystallized abilities (Horn, 1968). Thus, the Cattell

test and the McGraw-Hill Reading Test represent the fluid

and crystallized ability variables. The Torrance Tests

of Creativity were selected as measures of verbal produc-

tive thinking or creativity. Torrance (1974) defined

creativity as:

a process of becoming sensitive to problems,
deficiencies, gaps in knowledge, missing elements,
disharmonies, and so on; identifying the diffi-
culty; searching for solutions, making guesses or
formulating hypotheses about the deficiencies;
testing and retesting these hypotheses and
possibly modifying and retesting them; and finally
communicating the results. (p. 8)

To assess these qualities, Torrance included seven

activities which were scored for fluency, flexibility

and originality. The rationale for the selection of

these activities was explained in the technical manual







(Torrance, 1968). The test required 45 minutes to

administer. It was scored according to the directions

given in the scoring manual.

The final examination for the course was a two part

examination requiring two hours to complete. The first

part of the test was a 50 item multiple choice test

developed and pretested by the instructor the previous

year. The test was relatively difficult and was designed

to assess the students' abilities to comprehend and

analyze the course content.

Items in the objective portion of the examination

were categorized in two ways. First, the instructor

grouped the items according to Bloom's (1956) taxonomy

of Educational Objectives. The classification of the items

by the instructor was reviewed by the investigator and

agreement was reached on the appropriate category for

each item. This categorization ensured that the items

would not be primarily factual. The number of items

which represented each category was included in Table 3.

Table 3

Number of Items in Categories of Bloom's
Taxonomy of Educational Objectives



Category No. of Items


Knowledge 6

Comprehension 8

Analysis 7

Synthesis 29






The second dimension along which items were grouped

was a concrete-abstract differentiation. Concrete items

were defined as those items which related directly to

information given in lectures or the text. These items

were designed to assess the students' ability to analyze

material or ideas specifically emphasized in course

materials. It was anticipated that these items would

favor the conscientious student with strong crystallized

abilities. A second type of item was written in which

the connection with specific course materials was less

direct. These items required the student to recognize

the relevant concepts and make generalizations based

upon an understanding of their interrelationships. These

broader, more global questions were expected to require

higher fluid abilities. The instructor classified 22

items as concrete and 23 items as abstract. Five

items were deleted due to low discrimination. Exam-

ples of concrete and abstract items can be found in

Appendix A.


Scoring

The objective test was machine scored using a National

Computer System's scanner. The items were analyzed using

Test Grader II, a computer program adapted from a program

written at the University of Wisconsin. Test Grader II

provided the descriptive test statistics, including item

difficulties, discrimination and point bi-serial correla-

tions with total test score. Internal consistency was






computed using the Kuder Richardson formula twenty. The

results of the item analysis have been included in Appendix

B.
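
Test Grader II itself is not reproduced here, but the
statistics it reported can be sketched in modern form. The
following Python fragment is illustrative only (the function
and variable names are hypothetical, not the original
program); it computes item difficulty, an upper-lower
discrimination index, and the point biserial correlation of
each item with the total score.

    import numpy as np

    def item_stats(responses):
        # responses: 0/1 matrix with rows = students, columns = items
        total = responses.sum(axis=1)
        difficulty = responses.mean(axis=0)
        # point biserial: correlation of each item with the total score
        pbis = np.array([np.corrcoef(responses[:, j], total)[0, 1]
                         for j in range(responses.shape[1])])
        # discrimination: difference in difficulty between the top and
        # bottom 27 percent of students ranked on total score
        n = len(total)
        cut = int(n * 0.27)
        order = np.argsort(total)
        low, high = order[:cut], order[-cut:]
        discrimination = responses[high].mean(axis=0) - responses[low].mean(axis=0)
        return difficulty, discrimination, pbis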

The essay test was scored by the instructor on two

separate occasions. Global scoring was used. The score

scale was the traditional scale from A for an excellent

paper to E for a failing paper. The alphabetic grades

were assigned points on a numeric scale. Each alphabetic

category was subdivided into four groups; thus, the total

numeric scale ranged from one to twenty points. Scoring

reliability was assessed by correlating the scores on

two separate readings of the essay.
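
As an illustration of the score scale, the twenty point
numeric scale can be generated from the alphabetic marks as
follows. The sketch assumes the lowest subdivision of an E
maps to one point and the highest subdivision of an A maps
to twenty; the exact orientation used by the instructor is
not documented, so the mapping is hypothetical.

    GRADES = "EDCBA"   # low to high

    def essay_points(letter, subdivision):
        # letter in 'A'..'E'; subdivision 1 (low) through 4 (high)
        return GRADES.index(letter) * 4 + subdivision

    # essay_points('E', 1) -> 1; essay_points('A', 4) -> 20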

The global scores assigned by the instructor were

based upon these general criteria:

1. Understanding the dilemma posed by the question

2. Synthesis of diverse material and relevance of

examples

3. Discussion and analysis of conceptual issues

4. Originality of perspective

Writing style was a factor in scoring to the degree that

good style enhanced the effectiveness of the argument.

However, a conscious effort was made to discount grammatical

and spelling errors as well as poor penmanship. The essays

were shuffled and the names obscured to reduce scoring

bias.







Analysis

This study was designed to compare the contribution

of fluid, crystallized, and creative abilities to the

prediction of success on essay and objective tests.

Reading, abstract reasoning, and a creativity test were

administered along with a questionnaire. A one hour

essay test and a fifty item objective test over the same

course content were also administered. Subjects for this

study were 168 students enrolled in a political science

course.

The analysis of the data was completed in four phases.

The preliminary phase involved a description of the relevant

characteristics of the population. In addition, a factor

analysis was conducted to validate the existence of a

separate creativity dimension. The contribution of the

ability measures to the prediction of success on essay and

objective tests was analyzed in three ways. Next, the

extent to which the ability measures would predict scores

on essay and objective tests was established. Separate

multiple regression analyses for each test format were

compared. The third phase was a consideration of the pos-

sibility that success on abstract and concrete objective

test items would require a different pattern of abilities.

Multiple regression was used to analyze the contribution of

the ability measures to the prediction of success on the

concrete and abstract items. The respective regression

weights were compared to determine their similarity.






The premise investigated in the final phase of the

study was that differences in abilities could be used

to predict relative standing on essay and objective tests.

A discriminant function procedure was used to analyze

the classification of students as better on essays, better

on objective tests or the same on both test formats.

Preliminary Analyses

The analyses of the data began with a description of

the background characteristics of the sample. The descrip-

tion section was important for two reasons: the sample

was not randomly drawn, and previous educational experiences

of the students were expected to influence the results of

the study.

Another preliminary analysis was conducted to examine

the interrelationship of the variables. The existence of

a creativity dimension separate from dimensions of fluid

and crystallized abilities has been debated in the literature.

One hypothesis of this study was that a creativity dimension

would operate differently on essay and objective tests.

Therefore, to establish that a separate creativity dimension

existed for this sample, a factor analysis of the scores

from the reading, Torrance, Cattell and class examinations

was conducted. The computer program was from the Statistical

Package for the Social Sciences (SPSS). Since the purpose

of the analysis was to examine the underlying structure of

the variables, a principal axes solution was selected.

Multiple correlation coefficients were inserted in the

diagonal of the correlation matrix, and iterations were







conducted to arrive at the communality estimates. The

four principal factors with eigen values greater than 1.0

were extracted. The four orthogonal factors were rotated

according to the varimax criterion.
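
For reference, the principal axes procedure with a varimax
rotation can be sketched compactly in modern terms. The
Python fragment below illustrates the general method only;
it is not the SPSS routine used in the study, and
refinements such as Kaiser normalization and convergence
checks are omitted.

    import numpy as np

    def principal_axes(R, n_factors, iters=50):
        # R: correlation matrix; squared multiple correlations serve
        # as the initial communality estimates in the diagonal
        R = R.copy()
        h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
        for _ in range(iters):
            np.fill_diagonal(R, h2)
            vals, vecs = np.linalg.eigh(R)
            top = np.argsort(vals)[::-1][:n_factors]
            loadings = vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))
            h2 = (loadings ** 2).sum(axis=1)   # revised communalities
        return loadings

    def varimax(loadings, iters=100):
        # orthogonal rotation maximizing the variance of squared loadings
        p, k = loadings.shape
        rot = np.eye(k)
        for _ in range(iters):
            L = loadings @ rot
            u, s, vt = np.linalg.svd(
                loadings.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p))
            rot = u @ vt
        return loadings @ rot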

Predicting Essay and Objective Test Scores

Multiple regression equations were developed to test

the hypothesis that measures of fluid, crystallized and

creative abilities predict success on essay and objective

tests. The predictor variables included in this analysis

were retention, comprehension, originality, fluid intel-

ligence and major field of study, and experience with

essay tests. The originality score was the only score

selected from the Torrance Test for two reasons. First

of all, the course instructor stated that originality was

an important consideration in his scoring of the essay

examinations. The second reason originality scores were

selected came from the criticisms of creativity tests

in the literature. In a critique of the Torrance Tests

of Creativity, Harvey, Hoffmeister, Coates, and White

(1970) stated that the originality scores were consistent

across the different activities included in the test.

The fluency and flexibility scores varied with each

activity. Therefore, the use of the originality test fit

the purposes of the study and was judged to be a reliable

measure.

Two separate regression equations were written. One

equation specified the final essay scores as the dependent

variable. The second equation specified the final objective







test scores as the dependent variable. The regression

procedure from SPSS was used to complete the analysis.

The order of entry of the variables was predetermined as

follows: crystallized abilities, fluid ability, originality,

major, experience and the interactions. Tests of signifi-

cance for the increase in the sums of squares due to each

step in the regression were made. The regression model to

be tested was written as follows:

Ŷ = b1X1 + b2X2 + b3X3 + b4X4 + b5X5 + b6X6 +

    b7X1X4 + b8X2X4 + E

where:

Ŷ = predicted scores on the essay or objective tests

X1 = reading retention

X2 = reading comprehension

X3 = fluid intelligence

X4 = originality

X5 = major field

X6 = previous experience with test formats

X1X4 = interaction of retention and originality

X2X4 = interaction of comprehension and originality
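
The test of significance at each step is the conventional
F ratio for the increment in R2. A minimal sketch of the
computation (Python; the argument names are hypothetical):

    def r2_change_F(r2_full, r2_reduced, n, k_full, k_added):
        # F for the gain in R-squared when k_added predictors enter a
        # model that then contains k_full predictors, fit on n subjects
        df_num = k_added
        df_den = n - k_full - 1
        return ((r2_full - r2_reduced) / df_num) / ((1.0 - r2_full) / df_den)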

The comparison of the beta weights for the two equations

was accomplished by the Biomedical Package, program 011V

(Dixon, 1973). The statistical procedure has been described

in the section on profile analysis in Morrison (1967). The

level of statistical significance was set with alpha at .05.






The hypothesis to be tested was that the differences

in the regression weights would be simultaneously equal

to zero.

H0: (B11 - B21) = (B12 - B22) = (B13 - B23) =

(B14 - B24) = 0.

A cross validation sample of 100 cases was randomly

drawn from the population. Scores based upon the predic-

tion weights of the screening sample were correlated with

the observed criterion scores of the calibration sample.

The shrinkage in R2 was estimated and the samples were

recombined.
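
In outline, the cross validation amounted to fitting
ordinary least squares weights on the screening sample,
applying them to the hold-out cases, and comparing the
resulting squared correlation with the original R2. A rough
Python sketch under those assumptions:

    import numpy as np

    def cross_validated_r2(X_screen, y_screen, X_holdout, y_holdout):
        # fit OLS weights (with intercept) on the screening sample
        A = np.column_stack([np.ones(len(X_screen)), X_screen])
        b, *_ = np.linalg.lstsq(A, y_screen, rcond=None)
        # apply the weights to the hold-out sample and correlate
        pred = np.column_stack([np.ones(len(X_holdout)), X_holdout]) @ b
        r = np.corrcoef(pred, y_holdout)[0, 1]
        return r ** 2   # compare with the screening-sample R2 for shrinkage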

Predicting Scores on Concrete and Abstract Items

The second phase of the study included an examination

of the objective test items to determine whether or not

students differed in the ability to succeed on the concrete

and abstract items. A further question involved the

possibility that success on a particular category of

objective test item would be related to high scores on

the ability measures. Two regression equations were

written with the scores on the concrete and abstract items

as the dependent variables respectively. Independent

variables were the two reading scores and the fluid

intelligence score. Regression weights were again compared

using the C matrix of the Biomedical Computer program

011V.







Predicting Differences in Success Across Test Formats

The pattern of the regression weights would not

necessarily be the same for students who score the same on

essay and objective tests and students who score differently

on the two test formats. To pursue this possibility, three

groups were formed. The high essay group included students

whose scores on the essay test were at least one standard

z score above their scores on the objective test. The

high objective test group was defined as those students

whose objective test scores were at least one standard

z score higher than their essay test scores. The third

group included the remaining students whose scores on the

two tests were within one z score. A discriminant

function analysis using the computer program from SPSS

was conducted. The four ability measures of fluid intelli-

gence, retention, comprehension and creativity were

used to predict group membership with the direct solution

option of the SPSS computer program.
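
The grouping rule can be stated compactly. The sketch
below assumes the essay and objective scores have already
been converted to z scores; it illustrates the rule
described above and is not the original program.

    def format_group(z_essay, z_objective):
        diff = z_essay - z_objective
        if diff >= 1.0:
            return "high essay"        # essay at least one z score higher
        if diff <= -1.0:
            return "high objective"    # objective at least one z score higher
        return "comparable"            # scores within one z score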


Summary

The study was designed to assess the contribution of

fluid, crystallized and creative abilities to the predic-

tion of success on essay and objective tests. To aid in

the interpretation of the results, a questionnaire was

administered. The items on the questionnaire provided

data on the students' prior experience with the essay and

objective test formats. Other information such as student

major and previous courses from the instructor was gathered.






The data were analyzed in several stages. First, the

existence of creativity as a separate factor from fluid and

crystallized intelligence was tested using a principal

axes factor analysis. Multiple regression analyses were

conducted to compare the prediction of success on the

essay and objective tests by the four ability measures

and the measures of experience. The regression weights

were compared using the C matrix of BMD 011V.

The second phase included an examination of the

possibility that concrete and abstract objective test

items would require different patterns of abilities in

order to succeed on the items. Multiple regression

analysis was used to investigate the relationships between

the item categories and the ability patterns.

A discriminant function analysis was used to study

the contribution of the ability variables to the predic-

tion of the relative position of students on the essay

and objective tests. Students were categorized into three

groups. Those students whose essay test scores exceeded

their objective test scores by at least one standard z

score comprised the high essay group. The high objective

group included students whose objective test scores exceeded

their essay test scores by at least one standard z score.

The third group was composed of the remaining students

whose scores on the two test formats did not deviate by

more than one standard z score.













CHAPTER IV

RESULTS

This study was an investigation of the relationship

between fluid, crystallized and creative abilities and

success on essay and objective examinations. The hypotheses

generated to compare the patterns of abilities which are

related to success on the two test formats were stated

below. Specific hypotheses related to each general

hypothesis have been listed.

1. The multiple correlation coefficients representing

the regression of essay and objective test scores on fluid,

crystallized and creative abilities are equal to zero.

a. The measure of fluid ability will not increase

the accuracy of prediction beyond the variance pre-

dicted by crystallized abilities for essay and

objective test scores.

b. Creativity test scores will not increase

the accuracy of prediction of essay and objective

test scores beyond the variance predicted by measures

of fluid and crystallized abilities.

c. The previous experience with test format and

number of related courses will not significantly increase

the variance explained in objective and essay test scores

by fluid, crystallized, and creative abilities.








d. The interaction of crystallized abilities

and creativity will not significantly increase the

variance explained in objective and essay test scores

by fluid, crystallized and creative abilities, and

the experience variables.

e. There are no significant differences in the

regression weights for measures of fluid, crystallized

and creative abilities in the prediction of success

on essay and objective tests.

2. The multiple correlation coefficients representing

the regression of scores of concrete and abstract objective

test items on fluid and crystallized abilities are zero.

a. The measures of fluid ability will not

increase the accuracy of prediction of concrete and

abstract objective test items beyond the variance

predicted by the measures of crystallized ability.

b. There are no significant differences in

the regression weights for measures of fluid and

crystallized abilities in the prediction of success

on concrete and abstract items.

3. There are no significant differences in the

pattern of abilities for students who score high on

objective tests, high on essay tests, or equally on

essay and objective tests.


Results from Analysis of Questionnaire Data

The questionnaire served a dual purpose. It pro-

vided information which the instructor routinely gathered






about the background of the students. Items which were

specifically relevant to this study were imbedded in the

questionnaire. The items relating to the general academic

background of the students were reported in Table 4.

Table 4

General Academic Background



N %


Age
20-under 114 67
21-25 45 26
26-over 11 7

Prior course from instructor?
Yes 14 8
No 156 92

Number of previous political science courses?
None 59 35
One 39 23
Two 27 16
Three or more 45 26

Are you a transfer student?
No 90 53
From 2-year school 60 35
From 4-year school 20 12


Two of the facts from this portion of the questionnaire

were unexpected. The number of transfer students in the

course was nearly 50 percent of the total number of students

enrolled in the course. The fact that one fourth of the

class had taken at least three courses in the field was

unexpected, because the course was an introductory course.

The responses to the other items fit the general pattern

of enrollment at the university.






Students were asked to state which test format they

preferred and which format was more difficult. An item

was also included which asked students how frequently they

tended to write essay and objective examinations. The

results for these questions were included in Table 5.

Table 5

Summary of Items about Test Format



N %

Which test format do you prefer?
Essay 60 35
Objective 64 38
No preference 45 26
Other 1 1

Have you written essay tests in other courses?
Usually 53 31
About half of the time 82 48
Seldom 34 20
Never 1 1

Which test format is more difficult for you?
Essay 48 28
Objective 51 30
Neither 71 42



Factor Analysis

A correlation matrix was generated by the scores of

the reading rate for the difficult passage, reading reten-

tion, comprehension, the fluency, flexibility, and origi-

nality scores, the Cattell test of fluid intelligence, and

the midterm and final course examinations. A principal axis

factor analysis of the correlation matrix was conducted.

Four factors with eigen values greater than 1.0 emerged.

The four factor matrix was rotated according to the varimax







criterion. The resulting factors were labeled creativity,

fluid and crystallized abilities, final examination, and

midterm examination. The factor loadings and the percen-

tage of common variance accounted for was reported in

Table 6.

Table 6

Test Loadings on Rotated Principal Axes


                          Fluid &
                        Crystallized   Final    Midterm
            Creativity  Intelligence   Exam      Exam
                 I           II         III       IV


Reading Rate      -.11       .99        .14      -.01

Retention         -.11       .88        .18       .05

Skimming &
Scanning           .17       .83        .00       .09

Comprehension     -.04       .03        .30       .00

Fluency            .66       .04        .03       .00

Originality        .55       .06        .02       .02

Flexibility        .62       .11        .06       .10

Midterm
Objective          .03       .02       -.02       .74

Midterm Essay      .06       .15       -.02       .36

Final Objective    .02       .31        .74       .05

Final Essay       -.02       .11        .50      -.06

Cattell            .03       .50        .03       .08


Eigen Values      2.63      1.83        .74       .50

Proportion of      .44       .27        .17       .12
common variance
accounted for







Predicting Success on Essay
and Objective Tests

The Pearson Product Moment correlations among the

predictor and criterion variables have been reported in

Table 7.

Table 7

Correlation Matrix of Predictor
and Criterion Variables


                 Ret.   Comp.  Cat.   Orig.  F.Essay  F.Obj.  Con.   Abs.


Retention 1.00 .48 .31 -.08 .12 .37 .27 .36

Comprehension 1.00 .33 .06 .27 .40 .33 .37

Cattell 1.00 .09 .05 .20 .06 .27

Originality 1.00 .10 .11 .09 .11

Final Essay 1.00 .43 .46 .30

Final Objective 1.00 .89 .89

Concrete 1.00 .57

Abstract 1.00


The final objective and final essay test scores were

plotted against each other. The plot of the scores indi-

cated that differences in success on the two formats were

not related to the achievement level of the student on

either examination.

The means, standard deviations of the predictor and

criterion variables were reported in Table 8. Reliabilities

for the essay and objective tests were also included in

Table 8. Scaled scores for the retention, comprehension

and originality tests were based on a mean of 50 and a





standard deviation of 10 points. Scores for the Cattell

test were scaled based upon a mean of zero and a standard

deviation of one.
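
Both scalings are linear transformations of the raw
scores. A minimal sketch (Python; the helper names are
hypothetical):

    import numpy as np

    def z_scale(x):
        # mean 0, standard deviation 1
        return (x - x.mean()) / x.std(ddof=1)

    def t_scale(x):
        # mean 50, standard deviation 10
        return 50.0 + 10.0 * z_scale(x)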

Table 8

Summary of the Descriptive Statistics for
the Predictor and Criterion Variables:
Scaled Scores


Standard
Mean Deviation Reliability

Retention 54 9.36

Comprehension 52 8.79

Originality 64 12.01

Cattell .83 .74

Essay 12.15 4.57 .84

Objective 26.38 5.44 .75


The score ranges on the essay and the objective tests

were divided into four groups. The groups represented

scores from high to low on each test with the division of

the score ranges by standard deviation units. Table 9

includes the mean score on the independent variables by

category of essay and objective test score. This table

was designed to correspond approximately to the alphabetic

marks the students received on the examinations. Thus,

mean scores on the independent variables for students who

received various marks on each examination can be

compared.




















































Table 9

Mean Scores on the Predictor
Variables by Category of Z Scores on
the Essay and Objective Tests


1 2 3 4


Essay Test

Reading Rate 55.3 52.9 52.8 49.8

Retention 56.8 53.3 54.3 49.1

Comprehension 55.9 52.6 49.8 46.0

Originality 67.4 62.2 65.2 56.4

Cattell .98 .77 .80 .79


Objective Test

Reading Rate 55.8 53.3 52.8 50.1

Retention 59.4 55.2 52.4 50.2

Comprehension 56.1 53.8 48.6 47.9

Originality 63.4 67.0 62.3 65.1

Cattell 1.15 .99 .58 .73


Hypothesis 1

The multiple correlation coefficients representing the

regression of essay and objective test scores on fluid,

crystallized and creative abilities are equal to zero.

This hypothesis was rejected; the multiple correlation

coefficients for predicting the objective and essay test

scores were significantly different from zero. The multiple

correlation coefficient for predicting objective test

scores was .46, [F (4,143) = 8.10, p < .05]. The multiple

correlation coefficient for predicting essay test scores







by the ability measures was .29, [F (4,143) = 2.75,

p < .05].

The cross validation was conducted using the regression

weights predicting scores on the objective test for a

randomly selected group of 100 students. The correlation

of the predicted scores and the obtained scores for the

validation sample was .39. The two samples were merged

for the remaining analyses.

The results of the regression of the objective test

scores on the ability variables for the cross validation

have been reported in Table 10.

Table 10

Summary of the Results of the
Regression for Cross Validation



R R2 Simple R Beta

Retention .27 .07 .27 .192

Comprehension .31 .10 .25 .210

Cattell .35 .13 .12 -.031

Originality .35 .13 .15 .176


Hypothesis 1a. The measure of fluid ability will not

increase the accuracy of prediction beyond the variance pre-

dicted by crystallized abilities for essay and objective

test scores.

The crystallized abilities of comprehension and reten-

tion made a statistically significant contribution to the







explained variance in the objective test scores.

[R = .44, F (2,148) = 18.20, p < .05]. However, the

increase in the R2 for the measure of fluid ability did

not approach statistical significance [ΔR2 = .001, F

(1,149) = .22, p > .05].

The comprehension and retention measures also made a

significant contribution to the explained variance in

the essay test scores. The multiple correlation coeffi-

cient for the prediction of scores on the essay by the

measures of crystallized ability was .27, [F (2,148) =

5.67, p < .05]. The measure of fluid ability did not make

a statistically significant increase in the sums of squares

related to essay test scores [ΔR2 = .002, F (1,149) = .27,

p > .05].

Hypothesis 1b. Creativity test scores will not

increase the accuracy of prediction of essay and objective

test scores beyond the variance predicted by measures

of fluid and crystallized abilities.

The increase in the sums of squares due to the addi-

tion of the measure of creativity did not reach statistical

significance [ΔR2 = .004, F (1,149) = .66, p > .05] in the

equation written to predict objective test scores.

The increase in the sums of squares due to the addition

of the measure of creativity in the prediction of essay

test scores was not significant [ΔR2 = .009, F (1,149) =

1.39, p > .05].

Hypothesis 1c. The previous experience with test

format and number of related courses will not significantly







increase the variance explained in objective and essay

test scores by fluid, crystallized, and creative abilities.

The increase in R2 due to the number of related

courses in the field and the previous experience with

essay tests was not significant for the equation with

objective test scores as the dependent variable [ΔR2 = .03,

F (2,145) = 2.92, p > .05]. The increase in R2 for the

number of related courses and previous experience with

essay tests was also not significant for the equation which

specified the essay test scores as the dependent variable

[ΔR2 = .03, F (2,145) = 2.68, p > .05].

Hypothesis 1d. The interaction of crystallized

abilities and creativity will not significantly increase

the variance explained in objective and essay test scores

by fluid, crystallized and creative abilities, and the

experience variables.

The increase in the R2 due to the interaction was

not significant [ΔR2 = .06, F (4,143) = 1.66, p > .05] for

the objective test as the dependent variable or for the

essay test as the dependent variable [ΔR2 = .04, F

(4,143) = 1.37, p > .05].

A summary of the contribution of the predictor variables

to the explained variance in the objective test scores

was presented in Table 11.







Table 11

Summary of the Changes in R2 due to the
Addition of Predictor Variables using Objective
Test Scores as the Criterion


R R2 R2 Change Simple R

Comprehension .40 .16 .16 .40

Retention .44 .20 .04 .37

Cattell .45 .20 .00 .20

Originality .46 .21 .01 .11


Standard error = .90


The summary of the contribution of the predictor vari-

ables to the explained variance in the essay test scores was

included in Table 12.

Table 12

Summary of the Changes in R2 due to the
Predictor Variables with Essay Test Scores as
the Criterion



R R2 R2 Change Simple R

Comprehension .27 .07 .07 .27

Retention .27 .07 .00 .12

Cattell .27 .07 .00 .05

Originality .29 .08 .01 .10


Standard error = .97








Hypothesis 1e. There are no significant differences

in the regression weights for measures of fluid, crystallized

and creative abilities in the prediction of success on essay

and objective tests.

The overall hypothesis of no significant differences

in the regression weights was supported in the prediction

of essay and objective test scores by measures of fluid,

crystallized and creative abilities [F (4,147) = 2.19,

p > .05]. The standardized regression weights have been

reported in Table 13.

Table 13

Standardized Beta Weights for
Predicting Essay and Objective Test Scores



Dependent Independent Variables
Retention Comprehension Cattell Originality


Essay .020 .270 -.055 .095

Objective .241 .266 .025 .110



Predicting Scores on Concrete and
Abstract Objective Test Items

The abilities required to succeed on an objective test

could be linked to the cognitive level of the objective

test item included in the test. The items were categorized

as abstract or concrete depending upon the degree of

generalization required to respond to the item. The

predictor variables (retention, comprehension, and fluid

ability) were used to predict scores on the two categories

of items. The means and standard deviations for the






predictor and the criterion variables have been reported

in Table 14.

Table 14

Summary of the Descriptive
Statistics for the Criterion Variables



No. of Items Mean S.D.


Concrete Items 22 12.59 3.10

Abstract Items 23 12.08 3.09


Hypothesis 2

The multiple correlation coefficients representing

the regression of scores of concrete and abstract objective

test items on fluid and crystallized abilities are zero.

Since originality was not postulated to affect scores

on types of objective test items, only the retention,

comprehension and Cattell scores were used in the analysis.

The multiple correlation coefficient for the regression

of concrete items on the independent variables was .36

which was significant, F (3,147) = 7.19, p < .05. The

regression of the abstract item scores on the three inde-

pendent variables produced a multiple correlation coeffi-

cient of .45 which reached statistical significance,

F (3,147) = 12.19, p < .05.

Hypothesis 2a. The measures of fluid ability will not

increase the accuracy of prediction of concrete and abstract

objective test items beyond the variance predicted by the

measures of crystallized ability.







The Cattell measure of fluid ability did not

improve the prediction of scores on concrete items. In

fact, the inclusion of fluid ability in the model slightly

increased the standard error of prediction. The increase

in R2 due to fluid ability scores was ΔR2 = .005, F (1,149) =

.80, p > .05. Thus, a model including only the comprehension

and retention scores was adequate. The multiple correlation

coefficient for the reduced model was .35, F (2,148) =

10.40, p < .05. The standard error was .95.

The inclusion of the Cattell scores in the equation

written to predict scores on the abstract items did reduce

the standard error of prediction, but the increase in sums

of squares was not significant, ΔR2 = .02, F (1,149) =

2.86, p > .05.

A summary of the contribution of the predictor

variables to the explained variance in the objective test

scores was presented in Table 15.

Table 15

Summary of the Changes in R2 due to the
Addition of Predictor Variables using Concrete
Item Scores as the Criterion



R R2 R2 Change Simple R


Comprehension .33 .11 .11 .33

Retention .35 .12 .02 .27

Cattell .36 .13 .00 .06


Standard error = .95







The summary of the contribution of the predictor

variables to the explained variance in the abstract item

scores was included in Table 16.

Table 16

Summary of the Changes in R2 due to the
Addition of Predictor Variables using Abstract
Item Scores as the Criterion



R R2 R2 Change Simple R

Comprehension .37 .14 .14 .37

Retention .43 .18 .04 .36

Cattell .45 .20 .02 .27


Standard error = .91


Hypothesis 2b. There are no significant differences in

the regression weights for measures of fluid and crystal-

lized abilities in the prediction of success on concrete

and abstract items.

The hypothesis of no significant differences in the

comparable regression weights for the concrete and abstract

items was supported, F (3,147) = 1.95, p > .05. The

standardized beta weights for the two equations have been

reported in Table 17.







Table 17

Standardized Beta Weights for the
Regression of Concrete and Abstract Items on
Measures of Fluid and Crystallized Abilities


Comprehension Retention Cattell

Concrete Items .274 .161 -.074

Abstract Items .229 .213 .134



Predicting Relative Position
on Essay and Objective Tests

The discriminant function analysis was designed to

test the degree to which fluid, crystallized and creative

abilities would predict the classification of students.

The three categories included students whose essay test

scores were one z score higher than their objective test

scores as one group. The second group included students

whose objective test scores were one z score higher than

their essay test scores. The third group was composed of

all students whose scores on the two forms of the examina-

tion were within one z score.
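
In outline, the canonical discriminant functions reported
below are the eigenvectors of the between-groups variation
relative to the pooled within-groups variation. The Python
fragment is a rough sketch of that computation under
standard assumptions; it is not the SPSS routine, and the
standardization of coefficients and the significance tests
are omitted.

    import numpy as np

    def discriminant_functions(X, labels):
        # X: subjects by predictors; labels: group membership codes
        grand_mean = X.mean(axis=0)
        m = X.shape[1]
        Sw = np.zeros((m, m))   # within-groups sums of squares and products
        Sb = np.zeros((m, m))   # between-groups sums of squares and products
        for g in np.unique(labels):
            Xg = X[labels == g]
            centered = Xg - Xg.mean(axis=0)
            Sw += centered.T @ centered
            d = (Xg.mean(axis=0) - grand_mean)[:, None]
            Sb += len(Xg) * (d @ d.T)
        vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
        order = np.argsort(vals.real)[::-1]
        return vals.real[order], vecs.real[:, order]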

Hypothesis 3

There are no significant differences in the pattern

of abilities for students who score high on objective tests,

high on essay tests, or equally on essay and objective tests.

The variables were entered in a direct solution; the

combination of variables was expected to predict differences

in success on the two test formats. The summary of the

statistical tests of significance has been presented in








Table 18. The first discriminant function was significant,

and its eigen value accounted for 79 percent of the between

group variance.

Table 18

Summary of Statistical Tests


Disc.   Eigen    Rel.      Canonical     Wilks
Func.   Value    Percent   Correlation   Lambda    Chi-Square   D.F.   Sig.

1 .195 78.96 .40 .795 33.356 12 .001

2 .052 21.04 .22 .951 7.385 5 --


The weights for the discriminant functions have been

reported in Table 19.

Table 19

Standardized Discriminant
Function Coefficients



Function 1 Function 2

Retention .1602 -.3586

Comprehension -.2309 .7200

Originality -.0571 -.3687

Cattell -.1162 -.7956

Abstract 1.0384 .0455

Concrete -.0209 .1326


The univariate tests for differences among the groups

on the independent variables found significant differences

on the concrete and abstract items. The results of the

univariate tests have been reported in Table 20.








Table 20

Univariate Tests for Significant
Differences Among the Groups



F


Retention 2.17

Comprehension .96

Originality .46

Cattell 2.02

Concrete Items 3.92*

Abstract Items 13.47*


*p < .05

The centroids of the groups aided in the interpretation

of the results. Group one, high on objective tests, had

a higher average score on the first discriminant function.

(See Table 21). The high essay group had a negative average

score. The third group scored between the high essay and

high objective test groups.

Table 21

Group Centroids


Group        D.F. I       D.F. II

1 .5215 -.3170

2 -.6951 -.1911

3 .0639 .1869






The results of the classification analysis indicated

that group membership could be predicted for 45.7 percent

of the cases. However, much of the error in prediction

was attributed to the third group. (See Table 22).

Table 22

Percentage of Cases
Correctly Classified


Predicted Predicted Predicted
Group N Group 1 Group 2 Group 3

1 N 32 21 4 7

% 66 12 22

2 N 32 5 19 8

% 16 59 25

3 N 87 25 33 29

% 29 38 33



Summary

The results have been summarized for each phase of

the study. The initial phase was a description of the

characteristics of the sample and an assessment of the

independence of the creativity dimension. It was reported

that approximately one fourth of the students had taken at

least three other courses in the field. Also, almost one

half of the class were transfer students. Students were

nearly evenly divided on test format preference. Roughly

a third of the students preferred either objective or

essay tests. The remaining third expressed no preference.







The independence of the creativity dimension was

established through a principal axis factor analysis.

The creativity factor accounted for 44 percent of the

common variance. The ability and achievement measures

had near zero loadings on the creativity factor.

The regressions of essay and objective test scores

on the ability measures had similar results. The multiple

correlation coefficients reached statistical significance.

Crystallized ability had the highest correlation with

the essay and objective test scores. The increase in

the sums of squares for fluid and creative abilities

did not reach statistical significance in either equation.

The comparison of the respective regression weights for

the equations predicting essay and objective test scores

resulted in no significant difference.

The comprehension and retention measures of crystal-

lized ability made a significant contribution to the

explained variance in the prediction of concrete and abstract

objective test items. The Cattell measure of fluid ability

did not make a significant increase in the explained

variance in either equation. The comparison of the weights

for the equations predicting concrete and abstract item

scores failed to reveal a significant difference.

A difference in the configuration of the pattern of

abilities was found between those students who scored higher

on essays and those who scored higher on the objective test.

Students who scored higher on objective tests tended to

get more abstract items correct. Abstract items correlated







higher with the Cattell measure of fluid ability than

did the concrete items. The correlation of concrete item

scores with the Cattell test was near zero. Separate

univariate tests for differences among the groups on each

independent variable resulted in significant differences

in achievement on concrete and abstract items. Students

who scored higher on the objective test had a high positive

mean on the discriminant function which differentiated

the high essay and high objective groups. On that function

the abstract items carried a high positive weight, while

the concrete items carried a low negative weight.













CHAPTER V

DISCUSSION AND CONCLUSIONS


The problem addressed in this study was to deter-

mine the extent to which measures of fluid, crystallized

and creative abilities would predict success on essay

and objective examinations. Several factors could

influence the results of the study: the characteristics

of the sample, the interrelationships of the ability

measures, and the scoring procedures. These factors

were dealt with in the first phase of the analysis.

The remainder of the study was concerned with the iden-

tification and comparison of the patterns of abilities

which predict success or relative success on essay and

objective tests. The probability that differences existed

in the abilities required to succeed on particular types

of objective test items was also investigated.


Preliminary Phase

Characteristics of the Sample

There were two major concerns about the sample.

The possibility existed that students who had previous

courses from the instructor or in the department would

be advantaged. Also, students who did not have experience

writing a particular type of examination could be dis-

advantaged. Coffman (1971) stressed that these concerns








must be met in research on essay examinations. Even

though a substantial number of students had previous

courses in political science, knowledge of related

content in the field did not significantly improve the

prediction of essay or objective test scores. Moreover,

training to write essays or objective tests did not

contribute to the prediction of essay or objective test

scores. These findings do not mean that the variables

are no longer important. The examinations used in this

course may have been unusual for the students regardless

of their past experience with examinations or the sub-

ject matter. The course description stated that the

ability to integrate ideas and to analyze current events

in international affairs was stressed. The essay item

was a statement by the Secretary of State which required

students to recognize and analyze the relevant issues.

Since the topic was broad and not tightly structured for

the student (as Vernon, 1961, suggested) a different type

of essay item would undoubtedly alter the results.

Another consideration was the novelty of the objective

items for the students. Many students commented that they

found the test to be unusual and interesting in itself.

While other instructors in the department used objective

tests, they did not use the same style of objective test

items. Therefore, the effect of previous experience of

students with either item format was reduced. If a

number of students had taken other courses from the

instructor, then the effect of these variables may have







been more substantial. However, only 14 students indicated

that they had taken other courses from the same instructor.

Interrelationship of the Variables

A factor analysis of the three creativity scores,

the reading test scores, the measure of fluid ability

and the scores from the course examinations was conducted.

The major purpose of the analysis was to determine the

independence of a creativity dimension. The four factor

solution was remarkably unambiguous. The creativity

factor accounted for the greatest proportion of variance.

There were near zero loadings of the measures of fluid

and crystallized abilities on the creativity factor.

Therefore, the conceptualization of creativity as a

separate dimension was warranted for these data.

The fact that the fluid and crystallized abilities

loaded on the same factor was also consistent with the

literature. It is only in second order factor analyses

that fluid and crystallized abilities separate.

The two factors representing the midterm and final

examinations may have been expected to have grouped the

two essays and the two objective examinations. However,

the similarity in content tested outweighed the similarity

due to test format. Moreover, the crystallized ability

tests correlated more highly with the final examination

than with the midterm examination. The lack of familiarity

with the format of the test items on the midterm may

account for the change in the correlations. For example,

the reading comprehension score had a zero loading on







the midterm factor which increased to a loading of .30

on the final test factor. Both the midterm and the final

examinations included essay and objective items.


Predicting Success on Essay and Objective Tests

The intercorrelations among the variables were

moderate. The correlations of the originality variable

with the remaining variables were particularly low, and

in most cases failed to reach statistical significance.

The very low magnitude of these correlations was not

expected. A recurring charge in the literature was that

creativity measures represented a form of verbal intel-

ligence. There were, however, some interesting dif-

ferences in the patterns of means (See Table 8) which

helped to explain the low correlations. Originality

scores were generally high across the first three grade

categories of the essay test, but fell sharply for the

lowest grade category. Originality scores remained

fairly constant across all grade categories on the

objective test.

Multiple Regression Analysis

The regression weights for the essay and objective

test scores on the measures of fluid, crystallized and

creative abilities were not significantly different.

Crystallized ability scores were important in predicting

both essay and objective test scores. The measures of

fluid and creative abilities did not improve the prediction

of either essay or objective test scores. The multiple








correlation coefficient was higher for the regression of

objective test scores than for essay test scores.

The plot of the residuals indicated that the assump-

tion of homogeneity of error variance was violated. Most

of the error in prediction was in the middle of the

distribution. The concentration of error could be due

to scoring error in the essay or to less than perfect

reliability in the objective test. An equally plausible

explanation is that other factors such as motivation or

study habits have a greater effect for the middle ability

student than for students at either end of the spectrum.

The possibility that the originality measure may have

had a curvilinear relationship with the essay or the

objective test scores was considered. However, the pattern

of means showed an irregular rather than a curvilinear

relationship. Part of the explanation for this irregu-

larity may have been due to an anomaly of the sample.

There were twelve black students in the sample; two thirds

of these students earned exceptionally high scores on the

originality measure. However, all but one of these

students had lower than average scores on the reading

measures and average or lower grades on the examinations.

With so few students, the suggestion that a cultural

difference was expressed in the originality score must

remain speculative. Further investigation of

this possibility may be warranted.

These findings did not completely explain the rela-

tionship between essay and objective tests. First, the







magnitude of each multiple correlation coefficient was

not high even though they reached statistical significance.

Second, the extent to which students differed in success

on essay and objective tests has not been explicated. The

plot of essay versus objective test scores illustrated

that differences in success occurred all along both

distributions of scores. That is, a substantial number

of students scored higher on one examination than on the

other. This pattern was true for high and low scoring

students on either test. This finding did not coincide

with the Biggs and Braun (1972) study which suggested

that students in the middle of the distribution of scores

were the most likely to be affected by unequal success

on essay and objective tests.


Predicting Scores on Concrete and
Abstract Objective Test Items

The abilities required to succeed on an objective

test depend upon the cognitive abilities required by the

items. Different sets of items may require different

patterns of abilities. Thus, the debate about the

association of fluid, crystallized and creative abilities

with success on essay and objective tests may hinge upon

the nature of the items. The essay item was broadly

structured and required analysis of the course content

to succeed. The objective test items were of two types:

concrete and abstract.

The regression analyses of the concrete and abstract

items produced similar results. Fluid ability did not







significantly improve the prediction of the concrete or

the abstract item scores. Comprehension and retention

were required for both sets of items. This finding was

somewhat unexpected, because the Cattell abstract reasoning

test correlated zero with the scores on the concrete items

and .27 with the scores on the abstract items. Thus,

there was a modest confirmation that the abstract items

were in fact measuring an abstract reasoning ability.

Undoubtedly, the failure of the Cattell measure to enter

the equation was due to its correlation with retention

and comprehension. There was not enough unique variance

due to fluid ability once the crystallized abilities

entered the equation. Differences in the regression

weights did approach statistical significance. Perhaps

a further refinement of the items would substantiate

the difference at a conventional level of statistical

significance.

It was anticipated that students with strong crystal-

lized abilities would tend to do well on the concrete

items; however, the Pearson Product Moment correlations

between the two crystallized abilities of comprehension

and retention were higher for abstract items than for

concrete items. The correlation between the scores on

the concrete items and the essay test were higher than

the correlation between the abstract items and the essay

test. Thus, a new relationship is beginning to emerge.

It may be that this essay tended to measure the ability

of the students to comprehend and relate the course material.







Thus, students who did better on the essay tended to do

better on concrete items which were specifically tied

to course materials. These items were as difficult for

the class as the abstract items. (See Table 14).

Therefore, it cannot be stated that the differences are

simply an artifact of the grading standards or the diffi-

culty of the items. The relationship also cannot be

attributed solely to memory of factual material. Neither

the essay nor the set of concrete items was designed to

test factual knowledge.


Predicting Relative Success on
Essay and Objective Tests

The final question to be addressed was whether or

not a linear combination of fluid, crystallized and

creative abilities would differentiate groups of students

who score higher on one test format. The students were

grouped as higher on the essay, higher on the objective

test, or the same on both tests. The discriminant function

analysis resulted in one statistically significant function.

This function differentiated the high essay and high

objective groups. Success on abstract items and retention

were the variables which had the highest positive weights

on the discriminant function. The high objective group

had a moderately high mean on the function. The high

essay group had a negative mean. Therefore, the high

objective group was characterized as stronger on abstract

reasoning and retention while the high essay group was

weaker in these abilities. The comprehension and







creativity scores did not differentiate the essay and

objective groups. Those students who scored equally well

on both test formats could not be predicted by the ability

variables. This group scored between the essay and objec-

tive test groups on the discriminant function and was

equally likely to belong to either group.

The fact that differences between the high essay

and high objective groups were found on the concrete and

the abstract items was not surprising. The difference

was, in part, a function of the categorization of the

three groups. Students higher on essays necessarily had

lower objective test scores. However, it was the abstract

items which differentiated the groups on the discriminant

function.


Summary

The overall pattern of the regression of essay and

objective test scores on the fluid, crystallized and

creative ability scores did not result in statistically

significant differences. Crystallized ability scores made

the largest contribution to explained variance in both

equations. Fluid and creative abilities did not improve

the prediction of either essay or objective test scores.

The regression of the scores on abstract and concrete

items on measures of fluid and crystallized abilities did

not result in significant differences in the overall

pattern of the regression weights. Comprehension and

retention scores were retained in the model for each equation.







The prediction of relative standing on essay and

objective tests resulted in a linear combination of

variables which differentiated students who scored

higher on essay tests from those who scored higher on

objective tests. Students who scored higher on objec-

tive tests tended to receive higher scores on the

abstract objective test items and on the retention

measure.


Implications of the Study

This study found no significant differences in the

pattern of abilities which predict success on essay and

objective tests. However, the investigation was limited

to the comparison of an unstructured essay and an objec-

tive test which was designed to measure complex mental

processes. Variations in the structure of the essay or

objective test items may offer further insight into the

relationship between the two test formats. Moreover,

further refinement of the essay and objective test items

could clarify the reasons for the lack of homogeneity

in error variance. Another possible avenue of research

would be to investigate cultural differences reflected

in scores on creativity measures in the event that these

differences may have obscured the relationship between

the tests.

No differences in the comparison of regression weights

were found, but students did differ in their ability to

succeed on essay and objective tests. One third of the







students scored substantially higher on either the essay

or the objective test. The differences in success were

related to success on a particular type of item. Students

who scored well on the objective test had higher scores

on abstract objective items and on retention. Students

who were more successful on the essay also tended to do

well on the concrete objective items. This finding

should be verified both within and across

disciplines.

An analysis of the writing ability of the students

could clarify the error in prediction. It may be that

the ability to organize and express ideas clearly was a

deciding factor in the scoring of essays in the middle of

the distribution. Part of the folklore in grading essays

is that it is relatively easy to differentiate the

excellent and poor papers. The problem in scoring is to

separate the average papers from those which are good

but not outstanding.













CHAPTER VI

SUMMARY


This study was an investigation of the hypothesis

that different cognitive abilities were measured by

essay and objective tests. The study compared a broad,

unstructured essay with an objective test consisting of

concrete and abstract items. The rationale for the study

was that differences in students' achievement on essay

and objective tests could be explained by a combination

of student and test variables. The student variables

were of two types. The first set of variables included

measures of fluid, crystallized and creative abilities

(Cattell, 1963; Rossman & Horn, 1972). The other set

of variables included the students' previous experience

with essay examinations and related courses.

The attributes of the tests would also be related

to differences in success on the two formats. Of partic-

ular concern were the reliabilities of the examinations

and the cognitive level of the items. A 50-item objective

test was developed using classical test development

procedures; reliability was reported as a measure of

internal consistency. The objective test items were

categorized as concrete when they were related directly

to information given in lectures or the text. The items







were designed to assess the students' ability to analyze

material or ideas drawn from the course content. It was

anticipated that these items would favor students with

stronger crystallized abilities. A second type of item

was written in which the link to specific course content

was less direct. These items were labeled abstract.

The abstract items required that the students recognize

the relevant concepts and make generalizations based

upon an understanding of their interrelationships. These

broader items were expected to require more fluid ability.
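
For a dichotomously scored objective test, internal consistency of the kind reported here is commonly estimated with the Kuder-Richardson formula 20. A minimal sketch follows, assuming a 0/1 item-response matrix with one row per student; whether this particular coefficient was the one computed for the study is an assumption.

    # Kuder-Richardson formula 20 for a matrix of 0/1 item responses;
    # 'responses' is assumed to have shape (n_students, n_items).
    import numpy as np

    def kr20(responses: np.ndarray) -> float:
        k = responses.shape[1]                         # number of items
        p = responses.mean(axis=0)                     # item difficulties
        q = 1.0 - p
        total_var = responses.sum(axis=1).var(ddof=1)  # variance of totals
        return (k / (k - 1)) * (1.0 - (p * q).sum() / total_var)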

The reliability of essay tests has generally been

lower than that of objective tests. Reduced reliabilities have

been traced to problems in the construction of the essay

item and to problems in scoring. The one-hour essay item

for the final examination was designed to maximize the

difference between the essay and the objective test

(Vernon, 1961). The essay item was broad and not highly

structured, and global scoring was used. The criteria

for scoring included: understanding the dilemma posed

by the question, synthesis of diverse material and rele-

vance of examples, discussion and analysis of conceptual

issues, and originality of perspective. Reliability of

scoring was established by correlating the scores from

two separate readings of the essays.
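
The scoring reliability is the Pearson correlation between the two readings; if the reported essay score was the average of the two readings, the Spearman-Brown formula gives the reliability of that composite. A minimal sketch, with hypothetical variable names:

    # Inter-rater reliability from two independent readings of the essays.
    from scipy.stats import pearsonr

    def scoring_reliability(first_reading, second_reading):
        r, _ = pearsonr(first_reading, second_reading)
        # Spearman-Brown step-up for the average of the two readings.
        stepped_up = (2 * r) / (1 + r)
        return r, stepped_up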

The subjects for the study were 168 students enrolled

in an introductory course in political science. Each

student was asked to complete a questionnaire and four

examinations. The questionnaire was designed to gather






background information from the students. The examination

used to measure crystallized ability was the McGraw-Hill

Reading Test. Fluid ability was measured with the Cattell

Culture Fair Intelligence Test: Scale Three. The Torrance

Tests of Creativity were also administered. The final

examination was a one-hour essay and a fifty-item

multiple-choice test.

The investigation of the contribution of fluid,

crystallized and creative abilities to the prediction of

success on essay and objective tests was conducted in

four stages. The preliminary phase of the study involved

a description of the sample and an investigation of the

independence of a creativity dimension. The questionnaire

data revealed that there were more students who had already

taken other political science courses than was expected

for an introductory course. This fact was expected to

influence the outcome of the study. However, many of

the students had transferred from other institutions;

thus, only a few students had taken previous courses from the

instructor and were familiar with the testing procedure.

The contribution of these student variables to the explained

variance in essay and objective test scores did not reach

statistical significance at the level specified for the

study.

The other concern in the preliminary phase of the

study was to document the independence of a creativity

dimension for these data. Part of the controversy sur-

rounding models of the organization of human abilities







has been the role of creativity in the models. The

creativity factor was included in the study on the basis

of a factor analytic investigation of the underlying

relationships between the ability and achievement variables.

A principal axis analysis was conducted and four factors

with eigenvalues greater than one emerged. Forty percent

of the variance was explained by the creativity factor.

A fluid and crystallized ability factor and midterm and

final examination factors were revealed in this analysis.

The midterm and final examinations did not load on the

same factor. One reason that these course examinations

did not correlate more highly was the novelty of the items

for the students. The midterm examination served as a

vehicle for providing some familiarity with the item

formats. Factor loadings of the ability measures on the

midterm examination factor were near zero, whereas the crystal-

lized ability scores did correlate moderately with the

factor for the final examination.
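
A principal axis analysis of this kind amounts to an iterated eigendecomposition of the correlation matrix with communality estimates on the diagonal, retaining the factors whose eigenvalues exceed one. The following numpy sketch illustrates the procedure; it is a generic implementation, not a reconstruction of the analysis actually run for this study.

    # A sketch of iterated principal axis factoring on a correlation
    # matrix R (variables: ability and achievement scores).
    import numpy as np

    def principal_axis(R: np.ndarray, n_iter: int = 50):
        # Kaiser criterion: retain factors with eigenvalues greater than one.
        n_factors = int((np.linalg.eigvalsh(R) > 1.0).sum())
        # Initial communalities: squared multiple correlations.
        h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
        for _ in range(n_iter):
            reduced = R.copy()
            np.fill_diagonal(reduced, h2)          # reduced correlation matrix
            vals, vecs = np.linalg.eigh(reduced)
            top = np.argsort(vals)[::-1][:n_factors]
            loadings = vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))
            h2 = (loadings ** 2).sum(axis=1)       # updated communalities
        return loadings, h2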

The second phase of the study was a comparison of

the regression of essay and objective test scores on the

measures of crystallized, fluid and creative abilities.

The increase in R² for fluid and creative abilities beyond

the variance explained by crystallized abilities was not

statistically significant in either equation. The inter-

action of crystallized ability measures and creativity

was also not significant. The differences in the beta

weights for the equations written to predict the essay

and objective test scores were tested to see if they were







simultaneously equal to zero. The differences in the

beta weights were not significantly different from zero.
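
Because both test scores came from the same students, one convenient way to test whether the two sets of weights are simultaneously equal is to regress the difference between the standardized essay and objective scores on the same predictors; if the weights are equal in the two equations, every slope in the difference equation is zero, so the overall F-test of that regression tests all the differences at once. A sketch with hypothetical column names, offered as an illustration rather than as the procedure actually used:

    # Testing whether the essay and objective regression weights differ,
    # via a regression of the difference between standardized scores.
    import pandas as pd
    import statsmodels.formula.api as smf

    def test_weight_differences(scores: pd.DataFrame):
        z = lambda s: (s - s.mean()) / s.std(ddof=1)
        scores = scores.assign(diff=z(scores["essay"]) - z(scores["objective"]))
        model = smf.ols("diff ~ comprehension + retention + fluid + creativity",
                        data=scores).fit()
        # Overall F: are all slope differences simultaneously zero?
        return model.fvalue, model.f_pvalue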

The third stage of the analysis was an investigation

of the relationship of crystallized and fluid abilities

to scores on concrete and abstract objective test items.

This analysis was conducted because the relationship

between scores on essay and objective tests could be a

function of the cognitive level of the objective test

items. However, fluid ability did not make a signifi-

cant contribution beyond the variance explained by

crystallized ability even though the abstract item

scores did correlate moderately with the fluid ability

measure. Concrete item scores had a near zero correla-

tion with the fluid ability measure. The differences

in the beta weights for the equations with the concrete

and abstract item scores as the dependent variables were

not significantly different from zero. Further refine-

ment of the items may substantiate a difference in the

abilities required to succeed on the concrete and

abstract items.

The purpose of the final analysis was to investigate

differences in fluid, crystallized and creative abilities

for students who scored higher on the essay, higher on

the objective test, or the same on both examinations.

The combination of the fluid, crystallized and creative

ability scores and scores on concrete and abstract items

was expected to differentiate students who performed

better on the essays from those who performed better on






the objective test. A discriminant function was found to

differentiate the high essay from the high objective

score group. Higher scores on the abstract items charac-

terized the high objective group. The high essay group

did relatively better on the concrete items than the

abstract items. Therefore, the hypothesis of this study

that fluid ability would be more closely associated with

success on essay examinations was not supported. Students

who scored higher on the objective test items were more

successful on abstract items than on concrete items.

Students who performed equally well on both examinations

were equally successful on the concrete and abstract

items.

































APPENDIX A

EXAMPLES OF CONCRETE AND ABSTRACT ITEMS