Citation
The relationship between test anxiety and statistical measures of person fit on achievement tests

Material Information

Title:
The relationship between test anxiety and statistical measures of person fit on achievement tests
Creator:
Schmitt, Alicia P., 1952-
Publication Date:
Language:
English
Physical Description:
ix, 65 leaves : ill. ; 28 cm.

Subjects

Subjects / Keywords:
Achievement tests ( jstor )
Anxiety ( jstor )
Consistent estimators ( jstor )
Correlations ( jstor )
Educational research ( jstor )
Mathematics ( jstor )
Statistics ( jstor )
Test anxiety ( jstor )
Test scores ( jstor )
Test theory ( jstor )
Dissertations, Academic -- Foundations of Education -- UF
Educational tests and measurements ( lcsh )
Foundations of Education thesis Ph. D
Intelligence tests ( lcsh )
Test anxiety ( lcsh )
Genre:
bibliography ( marcgt )
non-fiction ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 1984.
Bibliography:
Bibliography: leaves 61-64.
General Note:
Typescript.
General Note:
Vita.
Statement of Responsibility:
by Alicia P. Schmitt.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
The University of Florida George A. Smathers Libraries respect the intellectual property rights of others and do not claim any copyright interest in this item. This item may be protected by copyright but is made available here under a claim of fair use (17 U.S.C. §107) for non-profit research and educational purposes. Users of this work have responsibility for determining copyright status prior to reusing, publishing or reproducing this item for purposes other than what is allowed by fair use or other copyright exemptions. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder. The Smathers Libraries would like to learn more about this item and invite individuals or organizations to contact the RDS coordinator (ufdissertations@uflib.ufl.edu) with any additional information they can provide.
Resource Identifier:
030595770 ( ALEPH )
12015502 ( OCLC )

Downloads

This item has the following downloads:


Full Text











THE RELATIONSHIP BETWEEN TEST ANXIETY AND STATISTICAL
MEASURES OF PERSON FIT ON ACHIEVEMENT TESTS By



ALICIA P. SCHMITT


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF
THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY












UNIVERSITY OF FLORIDA


1984




























To my father, who taught me
the value of education













ACKNOWLEDGMENTS


I would like to express my sincere appreciation to Dr. Linda

Crocker, chair of my doctoral committee. Her standard of excellence and sound advice have guided my doctoral studies. She has always provided invaluable opportunities for learning. I am also extremely grateful to Dr. James Algina who was continually available for consultation and guidance, and to Dr. Marvin E. Shaw who gave freely of his patient and quiet encouragement. I am grateful to each of

these members of my doctoral committee for helping me reach this point in my career.

My husband, Jeff, deserves special recognition since he lived with and stood by me through this special time in my life. He was always encouraging and helpful. This degree belongs to him as much as to me. To my sister, Amelia, I am indebted for providing a consistent model of perseverance and strength. I am also grateful to my family,

friends, and co-workers, who always encouraged me and knew that I would finish, and to Adele Koehler, my typist, who always, with a smile, helped me meet my deadlines.

Finally, I thank the Alachua County School Board for providing the data used in this study.


iii














TABLE OF CONTENTS


Page

ACKNOWLEDGEMENTS.............................................. iii

LIST OF TABLES................................................ vi

LIST OF FIGURES............................................... vii

ABSTRACT...................................................... viii

CHAPTER

I INTRODUCTION.......................................... 1

Statement of the Problem.............................. 1
Purpose of the Study.................................. 2
Theoretical Rationale................................. 3
Definition of Technical Terms......................... 5
Student-Problem Table (S-P table)................ 6
Modified Caution Index (MCI)..................... 7
Personal Biserial Correlation (PB)............... 7
Norm Conformity Index (NCI)...................... 8
Rasch Person-Fit Statistic (Rx²)................. 9
Extended Caution Index (ECI)..................... 10
Assumptions........................................... 11
Educational Significance.............................. 11
Summary ............................................... 12

II REVIEW OF LITERATURE.................................. 14

Person-Fit Measures.................................. 14
Historical Background............................14
Types of Person-Fit Indices...................... 16
Comparative Studies of Person-Fit Indices........ 18
Person-Fit Indices Under Study................... 20
Test Anxiety.......................................... 21
Summary ............................................... 23

III METHODOLOGY........................................... 25

Examinees............................................. 25
Instruments........................................... 26
Creation of the Data File............................. 27
Calculation of Person-Fit Statistics.................. 27
Analysis.............................................. 28
Summary ............................................... 29


iv








Page

IV RESULTS............................................... 31

Descriptive Statistics................................ 32
Relationship Between Person-Fit, Ability,
and Test Anxiety.................................... 36
Relationship Between Person-Fit and a Linear
Combination of Ability, Test Anxiety, and
Their Interaction................................... 36
Internal Consistency of Person-Fit Statistics......... 47
Summary............................................... 50

V DISCUSSION............................................ 52

Relationships Among Person-Fit Statistics............. 52
Relationship Between Person-Fit, Ability, and
Test Anxiety........................................ 53
Reliability of Person-Fit Measures.................... 58
Summary............................................... 59

REFERENCES.................................................... 61

BIOGRAPHICAL SKETCH........................................... 65


vi














LIST OF TABLES


Table Page

1 S-P Table for Five Examinees and Six Items
(Ideal Pattern)........................................ 6

2 Descriptive Statistics for Person-Fit Measures by
Grade and Ability Test................................. 33

3 Descriptive Statistics for Ability Tests and Test
Anxiety by Grade....................................... 34

4 Person-Fit Measures Intercorrelations by Grade
and Ability Test....................................... 35

5 Correlations Between Person-Fit Measures and Total
Scores on Corresponding Ability Test and Test
Anxiety................................................ 37

6 Percentage of Variance (R2) in Person-Fit
Explained by the Combination of Examinee Ability,
Test Anxiety, and Their Interaction.................... 38

7 Significant Ability and Test Anxiety Interactions
and R2 Increases for Person-Fit Measures............... 41

8 Significant Main Effect of Ability as Predictor
of Person-Fit Measures................................. 48

9 Corrected Split-Half Reliability Estimates for
Person-Fit Measures by Grade and Ability Test.......... 49














LIST OF FIGURES


Figure Page

1. Relationship between the modified caution index
and test anxiety for seventh-grade examinees at
different reading ability levels....................... 42

2. Relationship between the modified caution index
and test anxiety for eighth-grade examinees at
different science ability levels....................... 43

3. Relationship between the personal biserial and test
anxiety for eighth-grade examinees at different
science ability levels................................. 44

4. Relationship between the norm conformity index
and test anxiety for eighth-grade examinees at
different science ability levels....................... 45

5. Relationship between the extended caution index
and test anxiety for eighth-grade examinees at
different science ability levels....................... 46


vii











Abstract of Dissertation Presented to the Graduate School of
the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy


THE RELATIONSHIP BETWEEN TEST ANXIETY AND STATISTICAL
MEASURES OF PERSON FIT ON ACHIEVEMENT TESTS by

Alicia P. Schmitt

December, 1984

Chair: Linda M. Crocker
Major Department: Foundations of Education

The purpose of this study was to investigate the relationship

between an examinee's level of test anxiety and each of five different person-fit statistics and to establish if this relationship is dependent on ability level. A secondary interest was to determine the relationship among person-fit indices within and across different subject areas of a standardized achievement test and to assess the internal consistency of person-fit indices.

An existing data set was analyzed to explore the nature of these relationships for the modified caution index, the personal biserial correlation, the norm conformity index, the Rasch person-fit index, and the extended caution index. Achievement test scores on the

reading, mathematics, and science subtests of the Metropolitan Achievement Test (MAT) and scores on the Test Anxiety Scale for Adolescents were used as estimates of ability and anxiety. The item scores and total test scores of 225 seventh-graders and 188 eighth-graders of a metropolitan middle school comprised this data set.


viii








Intercorrelations among the measures of person-fit were in

the .80s to .90s within same-subject content areas. Across subject content areas little or no relationship was found. Low to moderate correlations were obtained between person-fit indices and their

corresponding ability scores (|.00| to |.50|) and test anxiety (|.02| to |.22|). For four of the measures of person fit, on one or more of the subject tests, a significant proportion of variance was explained by the linear combination of ability, test anxiety, and their interaction. In these cases ability levels moderate the relationship between person-fit measures and test anxiety; for lower-ability

examinees the relationship is direct, but for higher-ability examinees the relationship is inverse. Only the Rasch person-fit index was consistently unaffected by this interaction. Corrected split-half

reliability estimates of person-fit indices were low (.20 to .56), indicating little consistency of the trait.

According to these results, the potential uses of person-fit

indices are questionable at this time. More research is needed before these measures can be recommended for routine use in interpretation of

achievement test scores for individual examinees.


ix














CHAPTER I
INTRODUCTION



Statement of the Problem


Although total scores have been consistently used as the basis to

evaluate educational achievement, analysis of item-response patterns can contribute additional information that may be useful in the interpretation of an overall score. Analysis of response patterns can be based on two dimensions: item difficulty and examinee ability.

Ability is typically estimated by the total score on the test of interest, and item difficulty, by the proportion of examinees answering the item correctly. If the items are arranged in ascending

order of difficulty, an examinee with a given ability should answer items correctly until the point where his or her ability matches the difficulty of the items, and miss each item thereafter. Deviations from the expected response pattern occur when the pattern of passed and missed items is not consistent. If a person misses easier items

but then responds correctly to harder items, there is deviation from the expected response and misfit occurs.

With the introduction of the scalogram technique, Guttman (1941, 1950) was one of the first social scientists to suggest that some persons respond consistently to a given set of ordered stimuli (test items) while others do not. Under Guttman's scale theory, a response pattern where a student passing a more difficult item also responds


-1-





-2-


correctly to all easier items is called a perfect simplex, and the scale or test under such a situation is called a perfect scale.

During the late 1970s and early 1980s there has been a resurgence of interest in using information provided by response patterns. A

number of person-fit statistics have been developed to provide a measure of an individual examinee's deviation from the expected response pattern to a given set of items. Although some studies have shown that indices of person-fit are highly correlated (Harnisch & Linn, 1981; Rudner, 1983), attempts to identify causes of person misfit (or even personality or demographic correlates of it) have remained mainly speculative. Some researchers, such as Frary (1982) and Harnisch and Linn (1981), have suggested that one factor which may contribute to person-misfit on cognitive tests is test anxiety, but prior to this study there has been no empirical investigation to test this hypothesis.



Purpose of the Study


The present exploratory study was designed to investigate the nature of the relationship between measures of person-fit and test anxiety. For each of five selected indices of fit (modified caution index, personal biserial correlation, norm conformity index, Rasch person-fit index, and an extended caution index), the following questions were asked:

1. What is the degree of linear relationship between test anxiety and an examinee's level of misfit?





-3-


2. What is the degree of linear relationship between ability (as defined by performance on the current achievement test) and level of misfit?

3. To what extent is variance in person-misfit explained by a linear combination of the variables: ability level, test anxiety, and their interaction?
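The third question amounts to a hierarchical regression: fit misfit on ability and anxiety, add their product term, and compare the variance explained. The following Python sketch illustrates that logic with synthetic data and hypothetical variable names; it is not the analysis code used in the study.

```python
# Illustrative sketch of the hierarchical-regression question: how much
# variance in a person-fit index is explained by ability, test anxiety,
# and their interaction. All data here are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n = 200
ability = rng.normal(50, 10, n)   # stand-in for an achievement subtest score
anxiety = rng.normal(20, 5, n)    # stand-in for a test-anxiety scale score
misfit = 0.3 - 0.002 * ability + 0.004 * anxiety + rng.normal(0, 0.05, n)

def r_squared(X, y):
    """R^2 from an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

r2_main = r_squared(np.column_stack([ability, anxiety]), misfit)
r2_full = r_squared(np.column_stack([ability, anxiety, ability * anxiety]), misfit)
print(f"R2 (main effects): {r2_main:.3f}")
print(f"R2 increase from adding the interaction: {r2_full - r2_main:.3f}")
```

The increase in R² after adding the product term is the quantity of interest: a significant increase indicates that ability moderates the anxiety-misfit relationship.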

A secondary interest in this investigation was to explore the degree of relationship among the five selected person-fit indices within and across subject area tests of an achievement battery and to estimate their internal consistencies. This information was considered important because the tests used in this case were subtests

from a well-known nationally norm-referenced, standardized achievement test battery. Earlier studies of the interrelationship and

reliability of person-fit indices have typically been based upon state minimal competency examinations (Harnisch & Linn, 1981) or locally developed teacher-made tests (Frary, 1982).



Theoretical Rationale


To date most research on test anxiety has considered primarily

the effects on examinees' total test score. Recently, Frary (1982) and Harnisch and Linn (1981) have suggested that test anxiety may be a factor which contributes to erratic performance of an examinee within

a given test (e.g., missing relatively easy items, while answering more difficult items correctly). Careless errors and lack of concentration by high-test-anxious individuals could change the pattern of item responses from the pattern that would be expected.





-4-


Two theories predict the effect of anxiety on performance.

According to the cognitive attentional theory of test anxiety, highly anxious students attend to self-relevant variables instead of to task-relevant variables, negatively affecting their performance (Wine, 1980). In an analysis of Spielberger's (1966, 1971) extension of Spence-Taylor drive theory, Heinrich and Spielberger (1982) make several predictions about the effect of ability and anxiety on performance of tasks with varying levels of difficulty that seem relevant to this study of test anxiety and person-fit. These are as follows:

1. For subjects with superior intelligence, high anxiety
will facilitate performance on most learning tasks. While high anxiety may initially cause performance decrements on
very difficult tasks, it will eventually facilitate the performance of bright subjects as they progress through
the task and correct responses become dominant.

2. For subjects of average intelligence, high anxiety will facilitate performance on simple tasks and, later in learning, on tasks of moderate difficulty. On very
difficult tasks, high anxiety will generally lead to
performance decrements.

3. For low intelligence subjects, high anxiety may facilitate performance on simple tasks that have been mastered.
However, performance decrements will generally be
associated with high anxiety on difficult tasks, especially in the early stages of learning. (Heinrich & Spielberger,
1982, p. 147)

According to these predictions, response patterns and person-fit

statistics will be different for high, average, and low ability examinees depending on their anxiety levels. The predicted effect for high and low anxious students at these three ability levels would be

1. Subjects with high ability and high test anxiety would be

expected to fail hard items initially, but since examinees receive no feedback during testing, correct responses will not








be expected to become dominant. These examinees would continue to

have occasional difficulty on harder items, but their high levels of test anxiety would facilitate performance on easier items. Moderate to low misfit would be expected. For subjects with high ability and

low test anxiety, interference in performance of difficult items is not expected. Due to less attentional interference, use of test-taking strategies might also be more accessible. These students might be more open to guessing on harder items. Moderate-high misfit could be expected.

2. For subjects with average ability, high anxiety will help with easy to moderately difficult items, but will interfere with harder items. A low to moderate misfit would be predicted.

Similarly, low test anxiety is not expected to differentially affect item responses for average-ability examinees.

3. For low ability subjects, high anxiety may help with the easier items but will interfere with performance on more difficult items. If these examinees do not feel very confident in their knowledge, high anxiety might not help but instead disrupt their concentration, leading them to answer in a more erratic manner. In this last case, higher misfit might occur. When low ability subjects are also low in

test anxiety no interference is expected; these students will probably direct their attention to the easier items they master. A low to moderate misfit is expected.



Definition of Technical Terms


Definitions and formulas required to explain major technical terms used in this study are as follows:


-5-





-6-


Student-Problem Table (S-P table)


The S-P table is used to organize test information into a matrix of zeros and ones. The rows in this matrix represent the students ranked from highest to lowest according to total test score. The

columns represent the items arranged from left to right in ascending order of difficulty. Correct responses are represented by ones and incorrect responses are represented by zeros. Assuming that items are arranged in increasing order of difficulty (from easy to hard), a concordant response pattern is one in which an examinee answers the

items correctly until he or she reaches an item that is too difficult and answers the items incorrectly from then on. If all examinees had concordant response patterns, the S-P matrix would have all ones in the upper left-hand corner, and all zeros in the lower right-hand corner. A short illustrative table of the ideal response pattern is presented as Table 1.



Table 1

S-P Table for Five Examinees and Six Items (Ideal Pattern)


                       Item
Examinee i    1    2    3    4    5    6


1 1 1 1 1 1 0

2 1 1 1 1 0 0

3 1 1 1 0 0 0

4 1 1 0 0 0 0

5 1 0 0 0 0 0
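The two sorting steps that produce an S-P table can be sketched as follows; the function name and data are illustrative, not from the dissertation.

```python
# A minimal sketch of S-P table construction: examinees (rows) sorted by
# descending total score, items (columns) by ascending difficulty
# (i.e., descending number of correct responses).
import numpy as np

def sp_table(responses):
    """Sort a 0/1 response matrix into S-P form."""
    responses = np.asarray(responses)
    row_order = np.argsort(-responses.sum(axis=1), kind="stable")
    col_order = np.argsort(-responses.sum(axis=0), kind="stable")
    return responses[np.ix_(row_order, col_order)]

# The ideal (Guttman) pattern of Table 1 is already in S-P form:
ideal = np.array([[1, 1, 1, 1, 1, 0],
                  [1, 1, 1, 1, 0, 0],
                  [1, 1, 1, 0, 0, 0],
                  [1, 1, 0, 0, 0, 0],
                  [1, 0, 0, 0, 0, 0]])
assert (sp_table(ideal) == ideal).all()
```

With all concordant response patterns, the sorted matrix shows the ones in the upper left-hand corner and the zeros in the lower right-hand corner, as described above.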





-7-


Modified Caution Index (MCI)


Harnisch and Linn (1981) introduced the modified caution index as a modification of an earlier caution index proposed by Sato in 1975 (cited in Harnisch & Linn, 1981). The MCI has a lower bound of 0 and an upper bound of 1. The higher the value of the index, the more divergent is the person's response pattern. This index is computed with data arranged into an S-P table, using the following formula:


$$\mathrm{MCI} = \frac{\displaystyle\sum_{j=1}^{n_{i.}} (1 - U_{ij})\,n_{.j} \;-\; \sum_{j=n_{i.}+1}^{J} U_{ij}\,n_{.j}}{\displaystyle\sum_{j=1}^{n_{i.}} n_{.j} \;-\; \sum_{j=J+1-n_{i.}}^{J} n_{.j}} \qquad (1)$$


where i is the examinee index in the S-P matrix, j is the item index,

Uij is 1 if examinee i answers item j correctly and 0 if examinee i answers item j incorrectly, ni. is the total number of correct responses for examinee i, n.j is the number of correct responses to item j, and J is the total number of items.
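Assuming the items are already sorted from easiest to hardest, equation (1) can be computed as in the following illustrative sketch (the names and data are hypothetical):

```python
# A sketch of the modified caution index for one examinee. Assumes items
# are sorted easiest-first; n_dot_j holds the number of examinees
# answering each item correctly.
import numpy as np

def modified_caution_index(u_i, n_dot_j):
    """u_i: 0/1 response vector for one examinee, items easiest-first."""
    k = int(u_i.sum())              # n_i., the examinee's total score
    J = len(u_i)
    # errors on the k easiest items, minus successes on the remaining items,
    # each weighted by the item's number of correct responses
    num = ((1 - u_i[:k]) * n_dot_j[:k]).sum() - (u_i[k:] * n_dot_j[k:]).sum()
    # normalizer: best-fitting minus worst-fitting weighted patterns
    den = n_dot_j[:k].sum() - n_dot_j[J - k:].sum()
    return num / den

n_dot_j = np.array([5, 4, 3, 2, 1])   # item correct-counts, easiest first
print(modified_caution_index(np.array([1, 1, 1, 0, 0]), n_dot_j))  # Guttman-consistent: 0.0
print(modified_caution_index(np.array([0, 0, 1, 1, 1]), n_dot_j))  # reversed pattern: 1.0
```

The two example patterns show the stated bounds: a perfectly concordant pattern yields 0 and a fully reversed pattern of the same total score yields 1.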



Personal Biserial Correlation (PB)


Donlon and Fischer (1968) proposed this correlation. The

coefficient obtained represents the correlation between a person's item response and the item's difficulty value. Donlon and Fischer define item difficulty as the proportion of examinees who respond incorrectly to an item. Large values correspond to difficult items and small values correspond to easy items. A positive correlation





-8-


represents good fit, indicating that a person tends to answer correctly items that are easy for the group and miss the more difficult items. Low or negative correlations represent more divergent response patterns. The formula to compute this correlation is



$$\mathrm{PB} = \frac{\bar{Q}_r - \bar{Q}_c}{S_{Q_r}} \cdot \frac{J_r'}{Y} \qquad (2)$$



where Q̄r is the mean item difficulty for the items answered, Q̄c is the mean item difficulty for the items answered correctly, SQr is the standard deviation of the difficulties of the items answered, Jr' is the number of items answered correctly divided by the number of items answered, and Y is the ordinate of the standard normal curve at the point separating the proportions Jr' and (1 - Jr').


Norm Conformity Index (NCI)
This index was developed by Tatsuoka and Tatsuoka (1980). The NCI indicates the degree of concordance to a group response pattern where items are arranged in descending order of difficulty, from hardest to easiest. Values of this index may range from -1 to 1. The

smaller or more negative the index, the more divergent is the individual's response pattern in comparison to the group norm. This index is undefined for either perfect or zero scores. Let S denote

the row vector of a person's response pattern; let S̄' denote the transpose of the complement of S; and let N = S̄'S. The formula to compute this index is





-9-


$$\mathrm{NCI} = \frac{2U_a}{U} - 1 \qquad (3)$$



where Ua = Σi Σj>i nij is the sum of the above-diagonal elements of N, and U = Σi Σj nij is the sum of all the elements of N.
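Equation (3) can be sketched as follows, assuming the response vector has its items sorted hardest-first as the NCI definition requires; the code and example vectors are illustrative.

```python
# A sketch of the norm conformity index. With items hardest-first, a
# conforming pattern has its misses before its passes, so all nonzero
# elements of N fall above the diagonal.
import numpy as np

def norm_conformity_index(s):
    """s: 0/1 response vector with items in descending order of difficulty."""
    s = np.asarray(s)
    N = np.outer(1 - s, s)        # N[i, j] = (1 - s_i) * s_j
    U_a = np.triu(N, k=1).sum()   # sum of above-diagonal elements
    U = N.sum()                   # sum of all elements (0 for perfect/zero scores)
    return 2 * U_a / U - 1        # undefined when U == 0

print(norm_conformity_index([0, 0, 1, 1]))   # perfectly conforming: 1.0
print(norm_conformity_index([1, 1, 0, 0]))   # fully reversed: -1.0
```

The perfect and zero score cases leave U = 0, which is exactly the undefined case noted above.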


Rasch Person-Fit Statistic (Rx²)


The Rx², also referred to as the Maximum Likelihood Procedure (MAX)

(Wright & Panchapakesan, 1969) and as the weighted total fit mean square (Rudner, 1983), was adopted for use with the one-parameter Rasch model

and is calculated using the BICAL program (Wright, Mead, & Bell, 1979). This index has a high value for an examinee who has a response

pattern that is inconsistent with the examinee's score and the Rasch model measure of item difficulty. The following formula is used to compute this index:



$$R_{x^2} = \frac{\displaystyle\sum_{j=1}^{J} (U_{ij} - P_{ij})^2}{\displaystyle\sum_{j=1}^{J} P_{ij}(1 - P_{ij})} \qquad (4)$$



where Uij is the response of examinee i to item j and Pij is the probability of a correct response for examinee i on item j as predicted by the Rasch model:


$$P_{ij} = e^{(\theta_i - b_j)} \big/ \left[1 + e^{(\theta_i - b_j)}\right]$$

where θi is the ability estimate for examinee i and bj is the difficulty estimate for item j.
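Equation (4) can be sketched as follows, with Rasch probabilities computed from assumed ability and item-difficulty estimates; the values are illustrative and are not BICAL output.

```python
# A sketch of the weighted total fit mean square for one examinee under
# the one-parameter Rasch model. Ability and difficulty values are
# illustrative stand-ins for calibrated estimates.
import numpy as np

def rasch_p(theta, b):
    """Rasch model probability of a correct response."""
    return np.exp(theta - b) / (1 + np.exp(theta - b))

def rasch_fit(u, theta, b):
    """u: 0/1 responses; theta: ability estimate; b: item difficulties."""
    p = rasch_p(theta, np.asarray(b, dtype=float))
    return ((u - p) ** 2).sum() / (p * (1 - p)).sum()

b = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # items easiest to hardest
expected = np.array([1, 1, 1, 0, 0])         # pattern consistent with theta = 0
aberrant = np.array([0, 0, 1, 1, 1])         # same total score, reversed pattern
print(rasch_fit(expected, 0.0, b) < rasch_fit(aberrant, 0.0, b))  # True
```

As the text states, the index takes a high value when the response pattern is inconsistent with the examinee's score and the model's item difficulties: the reversed pattern here yields a much larger fit statistic than the concordant one.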





-10-


Extended Caution Index (ECI)


The extended indices have been proposed by Tatsuoka and Linn (1981, 1983). They describe the extended indices as linear transformations of the distance between a person's response pattern and a theoretical curve. In the case of ECI, this curve is the group

response curve (GRC), which is "an average function of N different Person Response Curves" (Tatsuoka & Linn, 1981, p. 10). For the ECI, probabilities of success calculated through item response theory logistic models are substituted for the zeros and ones in the S-P table. For the

purpose of this study, the Rasch one-parameter logistic model will be used to calculate these probabilities. The formula for the ECI is



$$\mathrm{ECI} = 1 - \frac{\displaystyle\sum_{j=1}^{J} (Y_{ij} - P_{i.})(\bar{Y}_{.j} - P_{..})}{\displaystyle\sum_{j=1}^{J} (P_{ij} - \bar{P}_{i.})(\bar{Y}_{.j} - P_{..})} \qquad (5)$$


where Yij is the response of examinee i to item j, Ȳ.j is the mean response across examinees to item j, Pi. is the proportion of correct responses of examinee i, P.. is the total proportion of correct responses, Pij is the probability of a correct response for examinee i on item j according to the Rasch model, and P̄i. = Σj Pij / J is the mean predicted probability of success for examinee i.

This formula is the ratio of two covariances. The higher the value of the ECI, the greater the variation from the expected response pattern. This index is also undefined for perfect or zero scores, since the denominator would become zero and the value infinite.
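The ECI, computed as a ratio of two covariance-like sums, can be sketched as follows; the data and probability estimates are illustrative, and the exact form of the published index may differ in detail.

```python
# A sketch of an extended caution index as a ratio of two covariances:
# the person's observed responses against the group response curve,
# over the person's model probabilities against the same curve.
import numpy as np

def extended_caution_index(y_i, p_i, y_bar_j, p_dotdot):
    """y_i: examinee's 0/1 responses; p_i: Rasch probabilities for this
    examinee; y_bar_j: mean response per item (group response curve);
    p_dotdot: overall proportion of correct responses."""
    num = ((y_i - y_i.mean()) * (y_bar_j - p_dotdot)).sum()
    den = ((p_i - p_i.mean()) * (y_bar_j - p_dotdot)).sum()
    return 1 - num / den   # undefined when the denominator is zero

ybar = np.array([0.9, 0.7, 0.5, 0.3, 0.1])   # items easiest to hardest
p = np.array([0.8, 0.6, 0.5, 0.4, 0.2])      # model probabilities, one examinee
conforming = extended_caution_index(np.array([1, 1, 1, 0, 0]), p, ybar, 0.5)
aberrant = extended_caution_index(np.array([0, 0, 1, 1, 1]), p, ybar, 0.5)
print(conforming < aberrant)  # the aberrant pattern yields the larger index
```

Consistent with the description above, the reversed pattern produces a larger ECI than the concordant pattern with the same total score.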





-11-


Assumptions


The following underlying assumptions were held for this study:

1. Standardized testing situations are capable of inducing test anxiety among students who would normally be test anxious.

2. Students responded truthfully on the self-report instrument used to assess their level of test anxiety.

3. Total score on the achievement test used can be taken as an

estimate of the examinee's ability (substituting for any external measure of ability, such as an I.Q. or academic aptitude test score).

4. Each subtest of the achievement test measures a fairly

unidimensional trait (i.e., achievement in reading or mathematics or science). This assumption is critical for the indices using the Rasch

statistics, but it also underlies the assumptions made to calculate the other indices.



Educational Significance


The potential value of person-fit indices has been cited by Frary (1982), Harnisch (1983), Harnisch and Linn (1981), Levine and Rubin (1979), Rudner (1983), and Van der Flier (1982). These writers suggest that these indices could be useful for the following purposes:

1. To identify individuals for whom the test is inappropriate or invalid. Total test score interpretation can be misleading for examinees who come from different experiential backgrounds or take the test under different motivational dispositions, e.g., test anxiety.

2. To identify groups with different instructional practices or histories, which could change the difficulty of the items, e.g., schooling differences.





-12-


3. To identify items that are inadequate for particular groups of examinees.

Presently person-fit indices are considered to be at a state of development where more research is needed to investigate their psychometric properties and establish their applicability. The reasons why some people are misfits are not clear. If test anxiety can be identified as a factor associated with person misfit, then the

interpretive value of person-fit statistics would be enhanced. Another pragmatic contribution of this study is to extend the body of

research of person-fit statistics by providing information about 1) the agreement of person fit classifications across different subject matter content areas, as measured by the subtests of the

Metropolitan Achievement Test, and 2) the degree of agreement of person-fit classifications by different indices.



Summary


Analysis of item-response patterns provides information not

contained in a total test score. Although the idea of using response

pattern information probably originated when Guttman (1941) introduced the scalogram technique, it has not been until the late 1970s and early 1980s that a strong interest in person-fit statistics has developed.

Person-fit statistics quantify the degree of deviation of an

examinee's response pattern from the expected response pattern. The development and application of person-fit indices is at a fledgling stage. More research is needed to investigate their psychometric properties and establish their applicability. Attempts to identify





-13-


causes of person-misfit have remained mainly speculative. Recently Frary (1982) and Harnisch and Linn (1981) have suggested that test anxiety may be one factor that can explain erratic performance of an examinee within a given test (e.g., missing easy items while answering more difficult items correctly). According to drive theory the relationship between test anxiety and performance is moderated by level of ability and task difficulty. Performance on specific items

might not only be dependent on the item's difficulty and the examinee's ability but also on the examinee's test anxiety.

The primary purpose of this study was to establish 1) the degree

of linear relationship between test anxiety and an examinee's level of misfit, 2) the degree of linear relationship between ability (total score on achievement test) and level of misfit, and 3) the extent that variance in person-misfit can be explained by a linear combination of ability level, test anxiety, and their interaction. Five different indices of person-fit were used in this study: the MCI, PB, NCI, Rx2, and ECI.

A secondary interest was to investigate the relationship among the five selected person-fit indices within and across subtests of a norm-referenced achievement battery and to estimate their internal consistencies.













CHAPTER II
REVIEW OF LITERATURE



The two central aspects of this study are person-fit statistics and test anxiety. These two topics provide the major themes for the organization of the literature review presented in this chapter.


Person-Fit Measures


Historical Background


During the late 1970s and early 1980s there has been an

increasing interest in the development and application of statistical indices to identify examinees with aberrant item-response patterns. Proponents of person-fit statistics indicate that these indices add to

the information provided by total scores and can also be used to identify potentially inaccurate total scores (Frary, 1982; Harnisch, 1983; Harnisch & Linn, 1981; Rudner, 1983).

This trend toward using information from item-response patterns is not new. According to Gaier and Lee (1953)

one of the most promising trends in current psychometric
research is an increasing concern with methods of
evaluating patterns of test scores and test responses . . . our initial hypothesis is that consideration of
response configurations will yield more fruitful results
than the usual methods of reporting merely the total score for a test . . . a total score may thus carry
considerably less diagnostic significance than a direct
and detailed analysis of test responses per se.
(p. 140)


-14-




-15-


Guttman (1941, 1950) was one of the first writers to suggest that some persons respond consistently to a given set of ordered stimuli (test items) while others do not. According to Guttman (1950), "a person who endorses a more extreme statement . . . should endorse all less extreme statements if the statements are to be considered a scale" (p. 62). Guttman's description of the basic procedure for the scalogram technique of scale analysis is very similar to Sato's S-P chart construction. He states that there are two basic steps in the scalogram pattern formation. These are

first, the questions are ranked in order of "difficulty" with the "hardest" questions, i.e., the ones that fewest persons got right, placed first and with the other questions following in decreasing order of "difficulty." Second, the people are ranked in order of "knowledge" with the "most informed" persons, i.e., those who got all questions right, placed first, the other individuals following in decreasing order of "knowledge." (Guttman, 1950, p. 70)

Sato's S-P chart is also a two-dimensional matrix in which the rows represent the students ranked from highest to lowest according to total test score (cited in Harnisch & Linn, 1981). The columns represent the items arranged from left to right in ascending order of difficulty. Construction of a scalogram pattern and an S-P table follows the same two steps. Once the responses are organized in this fashion, a concordant response pattern is defined as the case in which an examinee answers the items correctly until he or she reaches an item that is too difficult and answers all items incorrectly thereafter. Some disruption of a perfect pattern can occur. As the response pattern deviates more from the expected pattern, the degree of aberrance increases. Visual identification of aberrant or erratic

response patterns becomes increasingly more difficult as the number of







items in a test increases. The number of possible response patterns

multiplies as the number of items increases. With the recent introduction of a variety of statistics to measure the degree of

deviation from a typical response pattern, there is a renewed interest in using response pattern information.
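The two-step sorting that underlies both the scalogram and the S-P chart is mechanical enough to sketch in a few lines of code. The following is a generic illustration, not part of the original study (whose programs were written in SAS):

```python
def sp_table(matrix):
    """Sort a 0/1 item-response matrix into S-P form:
    rows (examinees) by total score, most able first;
    columns (items) by difficulty, easiest first."""
    n_items = len(matrix[0])
    # Item difficulty estimated as the proportion answering correctly (p value)
    p = [sum(row[j] for row in matrix) / len(matrix) for j in range(n_items)]
    col_order = sorted(range(n_items), key=lambda j: -p[j])  # easiest first
    rows = sorted(matrix, key=sum, reverse=True)             # highest score first
    return [[row[j] for j in col_order] for row in rows]
```

In a perfectly concordant table every row is a run of 1s followed by 0s; departures from that staircase are what the person-fit indices quantify.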



Types of Person-Fit Indices


Indices measuring the degree of unusual response patterns can be categorized into three major types: norm-comparison indices, goodness-of-fit indices, and extended indices.

Norm-comparison indices, which are based on observed patterns of right and wrong answers and are calculated with summary statistics based on the norm group, include Sato's caution index (cited in Harnisch & Linn, 1981), the modified caution index (Harnisch & Linn, 1981), the agreement, disagreement, and dependability indices proposed by Kane and Brennan (1980), the U' index by Van der Flier (1977), the personal biserial by Donlon and Fischer (1968), and the norm-conformity index by Tatsuoka and Tatsuoka (1980). Van der Flier's U' index and Tatsuoka and Tatsuoka's norm-conformity index have been reported to have a perfect negative relationship (Harnisch & Linn, 1981). Norm-comparison indices are calculated by using information organized in an S-P table. They indicate the degree of aberrance from

the expected response pattern, when examinee ability is defined as the total observed score on the test.

Goodness-of-fit or "appropriateness" indices are based on item response theory (IRT) (Levine & Rubin, 1979). As with norm-comparison indices, goodness-of-fit indices are also based on the expected







response pattern for an examinee at a given ability level. The distinction is that for goodness-of-fit indices a more sophisticated definition of "ability" is employed. Instead of simply equating ability with the examinee's observed raw score on the test, ability is defined in terms of his or her estimated score on a theoretical latent continuum underlying test performance. There are two popular IRT models that estimate examinee abilities based on the latent trait underlying test performance. For Rasch's one-parameter logistic model, examinee ability estimates are determined as a function of item difficulty parameters. A widely used computer program, BICAL, written by Wright et al. (1979), provides examinee ability estimates, item difficulty parameter estimates, and a person-fit statistic (Rx2) which "indicates how well the individual's item response pattern and the Rasch model fit" (Rudner, 1982, p. 4).

The second widely used IRT model is Birnbaum's three-parameter logistic model (Lord & Novick, 1968, Ch. 17), for which examinee ability estimates are determined as a function of item difficulty, item discrimination, and guessing parameters. Levine and Rubin (1979) developed three types of appropriateness indices based on Birnbaum's three-parameter logistic model. These approaches are the marginal probability, the likelihood ratio, and the estimated ability variation indices. A practical limitation in using these procedures arises from the large sample sizes usually recommended to obtain stable estimates from the three-parameter model (Hambleton & Cook, 1977). Levine and Rubin devised a simulation of item response data on the Scholastic Aptitude Test (SAT) to conform to normal or aberrant response patterns. Their findings indicate that all three types of







goodness-of-fit indices demonstrate the capability to detect aberrance when present.
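For orientation, the item response functions of these two models can be written out directly. The sketch below uses the standard logistic forms; the parameter names (theta for ability, b for difficulty, a for discrimination, c for guessing) follow common IRT notation and are not taken from the study itself:

```python
import math

def rasch_p(theta, b):
    """One-parameter (Rasch) model: the probability of a correct response
    depends only on ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def three_pl_p(theta, a, b, c):
    """Birnbaum three-parameter model: adds item discrimination a and a
    lower asymptote (guessing parameter) c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
```

When ability equals item difficulty, the Rasch model gives a probability of .5, while the three-parameter model gives a value above .5 by an amount determined by the guessing parameter.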

Extended caution indices have been proposed by Tatsuoka and Linn (1981, 1983) as a link between the norm-comparison and the goodness-of-fit indices. They linked Sato's S-P theory and item response theory by replacing the original observed zeros and ones of the item scores with IRT probabilities of passing the items. These

probabilities were then used in the calculation of the caution indices. Five variations of the extended caution index were created. These extended caution indices are defined as "linear transformations

of the covariance or correlation between a person's response pattern and a theoretical curve" (Tatsuoka & Linn, 1983, p. 95). Their findings support the effectiveness of the extended indices in

identifying examinees who use erroneous rules in answering arithmetic test problems. Tatsuoka and Linn (1983) point out that these indices can have instrumental utility by identifying students who consistently make errors because of misconceptions.



Comparative Studies of Person-Fit Indices


There have been several studies in which the relationship and effectiveness of person-fit indices have been compared. Harnisch and Linn (1981) made a comparative analysis of ten norm-comparison indices. Using mathematics and reading tests from a statewide assessment program, they also examined school and regional differences. The intercorrelations between these indices ranged from |.13 to .99| for mathematics and from |.34 to .96| for reading. They found that Kane and Brennan's (1980) agreement index had the lowest correlation







with the other indices, but had the highest correlation (.99) with total score. The modified caution index (MCI) was found to have the lowest correlation with total score (-.02 for mathematics and -.21 for reading). Harnisch and Linn (1981) found significant school and regional differences in students' response patterns as measured by the MCI.

Rudner (1982, 1983) evaluated nine indices; four were norm-comparison indices, while five were goodness-of-fit indices. He generated data by simulating examinees and their responses through Birnbaum's three-parameter model. Response patterns were altered to simulate spuriously high or low respondents. Findings indicated that the norm-comparison indices (point biserial correlation, PB, NCI, and MCI) and the weighted total fit mean square, or Rx2, were highly intercorrelated (|.77 to .99|). The goodness-of-fit indices using Birnbaum's three-parameter model and the unweighted total fit mean square had lower intercorrelations (|.17 to .80|). Validity of the indices was tested by observing how sensitive they were to assessment accuracy. The MCI and the NCI identified comparable proportions of examinees with aberrant response patterns. According to Rudner, "these two approaches were the most stable of the statistics" (Rudner, 1983, p. 217). In general, indices based on IRT showed better detection rates of aberrant response patterns than the norm-comparison indices.

Frary (1982), using teacher-made multiple-choice tests, compared three person-fit measures: the Rx2, the MCI, and a weighted choice index. In the weighted choice index, distractor choice is considered as part of the estimation of person-fit. The Rx2 and the MCI were







found to be highly correlated (.75). The smallest relationship between any two of the three indices was between the Rx2 and the weighted choice index (.42). In this study Frary was the first to compute and report person-fit internal consistency estimates. He found low and even negative split-half coefficients for the person-fit measures (Frary, 1982).



Person-Fit Indices Under Study


The present study is the first to include all three different

types of person-fit statistics (i.e., the norm-comparison indices, the goodness-of-fit indices, and the extended indices). The five indices under study are the modified caution index (MCI), the personal

biserial correlation (PB), the norm-conformity index (NCI), the Rasch person-fit statistic (Rx2), and the extended caution index (ECI).

The MCI was chosen for use in this study because it was found to

be least related to total test scores (Harnisch & Linn, 1981) and is considered stable with short and long tests (Rudner, 1983). The PB was selected because it has been in use for a longer period of time

than more recent indices and is generally associated with classical test theory. Its computations are relatively simple and it has been found to be very efficient with shorter classroom tests (Rudner, 1983). For these reasons the PB could be useful to a larger number of practitioners. The NCI has been found to correlate with total score somewhat higher than other indices (Harnisch & Linn, 1981), but it has nevertheless been recurrently used in different research studies

(Harnisch & Linn, 1981; Rudner, 1982, 1983; Van der Flier, 1977, 1982). The NCI and the MCI are considered to be the most applicable







and stable under situations with long and short tests and spuriously

high or low scores (Rudner, 1983).

The Rx2 index was selected for this study due to the availability of the BICAL computer program. The convenience of having the Rx2 computations given as part of the output from the BICAL computer program makes the Rx2 index more usable to practitioners. The Rx2 is a goodness-of-fit or appropriateness type of person-fit index. It uses the Rasch one-parameter logistic model to estimate ability and item difficulty. Appropriateness indices requiring use of the three-parameter logistic model were not feasible for the present study because of the larger sample size recommended to obtain consistent ability parameter estimates (Hambleton & Cook, 1977).

The ECI represents a link between norm-comparison indices and goodness-of-fit indices. Since no comparisons of the ECI with other indices or computations with actual data are available in the literature, this index was included to evaluate its relationship to the other indices.

It is noteworthy that most previous studies of multiple measures of person-fit have focused primarily upon intercorrelations among these indices, without investigating how they correlate with measures of any trait other than achievement itself, as measured by the test. The present study is somewhat broader in scope, since it investigates how these indices relate to another variable, test anxiety.



Test Anxiety


Most research on test anxiety has considered the effects of test anxiety on total score. According to Tryon (1980), test anxiety research findings present a consistent moderate negative correlation between test anxiety and total score measures of achievement. High-test-anxious individuals tend to score lower on classroom and aptitude tests (Alpert & Haber, 1960; Harper, 1974; Mandler & Sarason, 1952; I. Sarason, 1963, 1975; Spielberger, Gonzalez, Taylor, Algaze, & Anton, 1978).

Several researchers have tried to explain why test anxiety

affects performance. According to the cognitive attentional theory of

test anxiety (CATTA), introduced by Sarason (1960) and extended by Wine (1971, 1980), the "major cognitive characteristics of test anxious persons are negative self-preoccupation, and attention to evaluative cues to the detriment of test cues" (Wine, 1980, p. 371). This misdirection of attention, both in the pre-stages of evaluation

(study phase) and the test-taking situation, may limit coding, retention, and retrieval of information by high-test-anxious individuals. Difficulty of the task (e.g., difficult items) is expected to negatively affect attention. Thus according to CATTA, performance of test-anxious persons will be negatively affected.

The Spence-Taylor drive theory also predicts the effect of

anxiety on performance of tasks with varying levels of difficulty. Heinrich and Spielberger (1982) summarize these predictions according to the difficulty of the task. They explain that for high anxious students the performance of a task is dependent on its difficulty.

High anxiety may facilitate performance on easy tasks, interfere with performance on harder tasks, and be dependent on the stage of learning for tasks of intermediate difficulty. Heinrich and Spielberger (1982) explain the relationship between performance and the learning stage.






According to these authors, "high anxiety will be detrimental to performance early in learning when the strength of correct responses is weak relative to competing error tendencies. Later in learning, high anxiety will begin to facilitate performance as correct responses are strengthened and error tendencies are extinguished" (Heinrich & Spielberger, 1982, p. 146).

Varying ability levels and their relationship with anxiety and

task difficulty are also considered by the Spence-Taylor drive theory. According to Spielberger (1971) the effect of anxiety on subjects with different ability levels will be subject to the task difficulty and the learning stage considered.

These two theories, the CATTA and the Spence-Taylor drive theory,

point to the possibility that test anxiety might have an effect at the item level and that this effect might be dependent on ability level. Person-fit statistics measure the deviation from an expected response pattern. According to the theory of person-fit, in a good fit to a

response pattern "high ability examinees are expected to get few easy items wrong," while "low ability examinees are expected to get few difficult items right" (Rudner, 1983, p. 207). If test anxiety affects performance at the item level, it might be a factor which contributes to erratic performance for high anxiety examinees.



Summary


Literature pertinent to person-fit indices and test anxiety has

been reviewed in this chapter. Recent literature on person-fit measures shows an increasing interest in the development and application of these indices. The idea of using information provided







by response patterns is not new; it was first introduced by Guttman (1941) with the scalogram technique. During the late 1970s and early 1980s a number of different person-fit statistics were developed as measures of the degree of deviation from a typical response pattern. These indices can be categorized into three major types: norm-comparison indices, goodness-of-fit indices, and extended indices.

Five person-fit indices were selected for use in this study.

Three of these indices are norm comparison indices (MCI, PB, and NCI),

one is a goodness-of-fit index (Rx2), and one belongs to the extended category of indices (ECI). Three major research studies comparing person-fit indices were reviewed. Harnisch and Linn (1981) compared ten norm-comparison indices and concluded that the MCI seemed to be the most promising due to its lower correlation with total score. Rudner (1983) evaluated four norm-comparison indices and five goodness-of-fit indices. He found that goodness-of-fit indices showed better detection rates of aberrant response patterns. Frary (1982)

contributed to the development of person-fit indices by being the first to study their internal consistency. He found low split-half reliabilities.

Test anxiety has been suggested as a factor that could affect

performance within a test (Frary, 1982; Harnisch & Linn, 1981). Two theories, the CATTA and the Spence-Taylor drive theory, predict that high test anxiety will negatively affect performance. These two theories point to the possibility that test anxiety might have an effect at the item level and that this effect might be dependent on ability level. Test anxiety could thus be a factor that contributes to person-misfit.













CHAPTER III
METHODOLOGY



The present study was designed to investigate the relationship

between an examinee's level of test anxiety and each of five different person-fit statistics and to establish whether this relationship is dependent on ability level. A second purpose was to investigate the correlation of person-fit indices within and across different subject areas of a standardized achievement test battery and to assess the internal consistency of person-fit indices. An existing data set was analyzed to explore the nature of these relationships. A description of the examinee group, instruments, data-file creation, and data analysis methods is presented in this chapter.



Examinees


The data pool used in this study consisted of test scores and

item responses from 225 seventh-graders and 188 eighth-graders from a metropolitan middle school in north central Florida. There was an almost even distribution of boys and girls at each grade level. Approximately 70% of the examinees were white and 30% were black at each grade level. The school population is heterogeneous with respect to socio-economic level.








Instruments


The Test Anxiety Scale for Adolescents (TASA) (Schmitt & Crocker, 1982) was used to measure test anxiety as a trait. This instrument is a modified version of the 37-item Mandler-Sarason Test Anxiety Scale (Sarason, 1972). The modified scale consists of 31 true-false items and is designed for use with examinees in middle school or junior-high grades. Unlike most other test anxiety scales for children, all items on the TASA deal exclusively with examinee feelings about tests. Sample items include

"I worry just before getting a test back"; and

"Sometimes on a test I just can't think."

Schmitt and Crocker (1982) have found the factor structure of the TASA to be fairly similar to that reported for the adult test anxiety scales. They reported a KR20 of .87 as a total score reliability estimate for these middle school examinees.
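The KR20 reported here is the standard Kuder-Richardson Formula 20 internal consistency coefficient. A minimal generic sketch of its computation (not the authors' code) is:

```python
def kr20(responses):
    """KR-20 internal consistency for a 0/1 response matrix
    (rows = examinees, columns = items)."""
    n, k = len(responses), len(responses[0])
    # Item p values (proportion correct) and total-score variance
    p = [sum(r[j] for r in responses) / n for j in range(k)]
    totals = [sum(r) for r in responses]
    mean = sum(totals) / n
    var_total = sum((t - mean) ** 2 for t in totals) / n  # population variance
    return (k / (k - 1)) * (1 - sum(pj * (1 - pj) for pj in p) / var_total)
```

The coefficient approaches 1 as item-level variance becomes a small share of total-score variance, i.e., as the items hang together.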

The Metropolitan Achievement Test (MAT) subscales (Form KS) were used to measure achievement in reading, mathematics, and science (Prescott, Balow, Hogan, & Farr, 1978). The TASA was administered in

March, 1981, approximately two weeks prior to the school-wide administration of standardized achievement tests. The MAT was administered by school staff as part of the school district's regular testing program approximately two weeks after the test anxiety scale was given. The range of item difficulties of the MAT subscales for the seventh and eighth grades is respectively: .21 to .99 and .28 to .99 for reading; .20 to .97 and .31 to .97 for mathematics; and .29 to .94 and .34 to .98 for science.






Creation of the Data File


The test anxiety scores, coded with student ID number but no other identifying information, were obtained in conjunction with a University of Florida College of Education inservice training project for school personnel on identifying and counseling test-anxious students. The researcher later obtained a set of MAT test item responses for students with those same ID numbers from the county school district testing office. This data file also contained some demographic information (i.e., sex, race, and grade level). The data file used for analysis in this study was created by matching the two examinee data files on student ID number and merging the files. Thirty-seven of the students' records (13 of the seventh graders and 24 of the eighth graders) in the merged file were later deleted because it was found that these students had been tested out of grade level, rather than on the test form taken by their grade peers.



Calculation of Person-Fit Statistics


For each examinee on each subtest, five different indices of person-fit were calculated. These were the modified caution index, the personal biserial correlation, the norm-conformity index, the Rasch person-fit index, and the extended caution index. To create the

data file containing the five person-fit indices for each MAT subtest at each grade level, the original item-examinee response matrix was used. Each examinee's response* to each item was coded 0 or 1 in this matrix. The data on this matrix were used to


*Blanks or omitted responses were treated as incorrect responses and
assigned a 0 value.






1. compute total scores to get an ability estimate for each student;

2. compute a mean score for each item as estimate of item

difficulty (p value); and

3. reorganize data into an S-P matrix, by sorting by total score, and by item difficulty.

The resulting matrix had students organized by ability (from most able to least able) and items organized by difficulty (from easiest to hardest). This S-P matrix was used to calculate the five person-fit indices, using computer programs written by this author for each person-fit statistic. Refer to Chapter I for the definitions of the formulas. These statistics were programmed using the Statistical Analysis System (SAS) package (Helwig & Council, 1979). The accuracy of each programmed computation was tested using the dummy data set given by Harnisch and Linn (1981, p. 136).
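As an illustration of what one of these computations involves (the study's own programs were written in SAS, and the exact formulas appear in Chapter I), a rough Python sketch of the modified caution index, assuming the Harnisch and Linn (1981) definition, might look like:

```python
def modified_caution_index(responses, p_values):
    """MCI for one examinee: 0 for a perfect Guttman pattern,
    1 for a completely reversed pattern.
    responses: the examinee's 0/1 item scores;
    p_values: group proportion correct for each item."""
    k = sum(responses)
    n = len(responses)
    if k == 0 or k == n:
        return 0.0  # all-right or all-wrong patterns carry no misfit information
    order = sorted(range(n), key=lambda j: -p_values[j])   # easiest first
    observed = sum(p_values[j] * responses[j] for j in range(n))
    guttman = sum(p_values[j] for j in order[:k])    # k easiest items right
    reversed_g = sum(p_values[j] for j in order[-k:])  # k hardest items right
    return (guttman - observed) / (guttman - reversed_g)
```

An examinee whose k correct answers fall on the k easiest items gets an MCI of 0; one whose correct answers fall on the k hardest items gets 1.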



Analysis


Means, standard deviations, and minimum and maximum values by grade were computed for each person-fit statistic, ability measure, and test anxiety score. For the person-fit indices and ability measures these descriptive statistics were calculated for each subtest of the MAT (reading, mathematics, and science). Correlations among fit statistics, between person-fit and test anxiety, and between person-fit and ability measures were calculated.

To investigate the relationship between examinees' level of test anxiety and degree of person-fit and to study if this relationship is dependent upon ability level, a linear multiple regression analysis






was used. In this analysis person-fit measures were the dependent

variables and ability (reading, mathematics, or science) and TASA were the continuous independent variables. The model used for each ability measure and person-fit index is



Y' = b0 + b1X1 + b2X2 + b3X1X2



where Y' = person-fit predicted by the model, b0 = intercept value, b1 = regression slope for the ability independent variable, X1 = ability, estimated from total score, b2 = regression slope for the TASA independent variable, X2 = TASA score, b3 = regression slope for the interaction of ability and TASA, and X1X2 = interaction between ability and TASA.
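Fitting this model amounts to ordinary least squares on a design matrix with an added product column. A generic sketch follows (variable names are illustrative, not from the study):

```python
import numpy as np

def fit_interaction_model(y, ability, tasa):
    """OLS fit of person-fit (y) on ability, TASA, and their interaction.
    Returns the coefficient vector (b0, b1, b2, b3) and the model R^2."""
    # Columns: intercept, ability, TASA, ability-by-TASA product
    X = np.column_stack([np.ones(len(y)), ability, tasa, ability * tasa])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return coef, r2
```

Testing the interaction term (b3) against zero is what establishes whether the anxiety-misfit relationship depends on ability level.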

To estimate the internal consistency of person-fit indices, items for each MAT subtest at each grade level were divided into odd and even subtests. The original sequential-test-item-number was used for this split. Odd-item and even-item person-fit statistics were computed by following the sequence of steps previously described. The fit index for the odd items was correlated with the fit index for the even items and the resulting correlation was corrected using the Spearman-Brown formula to obtain an internal consistency estimate for the full-length test.
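The correlation and step-up correction just described use the standard Pearson and Spearman-Brown formulas; a brief illustrative sketch (again, not the study's SAS programs) is:

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_half_reliability(odd_fit, even_fit):
    """Correlate odd-item and even-item fit indices, then apply the
    Spearman-Brown correction to estimate full-length-test reliability."""
    r = pearson(odd_fit, even_fit)
    return 2 * r / (1 + r)
```

For example, a half-test correlation of .50 steps up to a full-length estimate of .67.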



Summary


A linear multiple regression analysis was used to investigate the relationship between an examinee's level of test anxiety and each of five different person-fit statistics and to study if this






relationship is dependent on ability level. The five person-fit indices included in this study were the modified caution index, the personal biserial correlation, the norm-conformity index, the Rasch person-fit index, and the extended caution index. A data set of 225 seventh-grade and 188 eighth-grade examinees' responses to the Test Anxiety Scale for Adolescents and to the Metropolitan Achievement Test reading, mathematics, and science subscales was used to compute person-fit statistics and explore the nature of these relationships. Correlations of person-fit statistics between and within the different subject area tests were examined. Person-fit split-half reliabilities were also computed for each index for each content area and grade level.













CHAPTER IV
RESULTS



The present study was undertaken to answer the following questions for each of five selected indices of person-fit:

1. What is the degree of linear relationship between test anxiety and an examinee's level of misfit?

2. What is the degree of linear relationship between ability (as defined by performance on the current test) and level of misfit?

3. To what extent is variance in person misfit explained by a linear combination of the variables: ability level, test anxiety, and their interaction?

Analyses were also performed to explore the degree of relationship among the five selected person-fit indices and to determine their split-half reliabilities.

The results of the analyses presented in this chapter have been

organized beginning with results of simpler analyses and proceeding to those of greater complexity. The issue of the degree of relationship among the five person-fit indices within and across subtests of the MAT is addressed in the next section, Descriptive Statistics. Data relevant to questions 1 and 2 (dealing with the bivariate relationships between test anxiety and misfit and between ability level and misfit) are presented next. Results of multiple regression analyses, relevant to question 3 are presented in the third section of this chapter. The final section of this chapter contains the results








of the investigation of the internal consistency of person-fit indices.



Descriptive Statistics


The mean, standard deviation, and minimum and maximum values for person-fit measures by grade and ability subtest are presented in Table 2. Means and standard deviations for each person-fit index are very similar across grades and ability subject tests. For the reading test, person-fit measures show the greatest dispersion between the minimum and maximum scores. For this subtest of the MAT, the maximum score observed for the personal biserial correlation in both the seventh and eighth grades was greater than one. This was not a computation error but may be ascribed to sampling error or violation of the assumption of an underlying normal distribution for the dichotomous item response variable. Lord and Novick discuss conditions under which biserial correlations may exceed 1.00 (Lord & Novick, 1968, p. 339). Descriptive statistics for ability tests and test anxiety for the seventh and eighth grades are presented in Table 3.

Person-fit measures' intercorrelations for each grade level are presented in Table 4. Seventh-grade intercorrelations are shown above the diagonal and eighth-grade intercorrelations below the diagonal. Results of these correlations indicate that person-fit measures are highly related within the same subject test area. These correlations ranged from |.78 to .99| and are all significant at an alpha level of .0001. However, across tests in different subject areas, there was little or no relationship between the same person-fit measure. The Rasch








Table 2

Descriptive Statistics for Person-Fit Measures by Grade and Ability Test




[The values in this table are largely illegible in the scanned copy. For each grade (7 and 8) and each MAT subtest (reading, mathematics, science), the table reports the mean, standard deviation, minimum, and maximum of the five person-fit measures (MCI, PB, NCI, Rx2, ECI).]
















Table 3

Descriptive Statistics for Ability Tests and Test Anxiety by Grade

                    Grade 7                      Grade 8
Test           M      SD   Min.  Max.       M      SD   Min.  Max.

Reading      35.59  11.62    9    54      39.47  10.48   15    54
Mathematics  30.32   9.44   13    49      33.02   8.88    8    48
Science      33.43   9.87    9    53      35.02   9.13   14    52
TASA         15.05   6.66    1    29      13.75   6.23    1    27









Table 4

Person-Fit Measures Intercorrelations by Grade and Ability Test

[The correlation values in this table are unrecoverable from the scanned copy. The table presents the intercorrelations among the five person-fit measures (MCI, PB, NCI, Rx2, ECI) within and across the reading, mathematics, and science subtests, with seventh-grade results above the diagonal and eighth-grade results below the diagonal. *Significant at alpha = .05. **Significant at alpha = .01.]





person-fit measure (Rx2) had the lowest relationship to other indices within the same subject test. Interestingly, the Rx2 is the only index with significant intercorrelations with other indices across subject areas.



Relationship Between Person-Fit, Ability, and Test Anxiety


Correlations between person-fit measures, total ability measures of reading, mathematics, and science, and test anxiety for each grade are presented in Table 5. Correlations between person-fit measures and their corresponding total ability scores ranged from |.00 to .50|. In general the personal biserial (PB) index had the lowest correlation with total score. This lower correlation was consistently observed across subject test areas and grade levels. Correlation values between person-fit measures and test anxiety ranged from |.02 to .22|. The highest correlations between person-fit measures and test anxiety occurred for seventh-grade science person-fit values and for eighth-grade reading person-fit values.



Relationship Between Person-Fit and a Linear Combination
of Ability, Test Anxiety, and Their Interaction


Multiple regression results for the model with ability, test anxiety, and their interaction are summarized in Table 6. Values of R2 for the model at each test area are reported. Models with significant ability and test anxiety interactions are identified by having their corresponding R2 underlined. For the seventh-grade sample, a significant proportion of variance in the modified caution index was explained for each of the three subject area tests by the







Table 5

Correlations Between Person-Fit Measures and Total Scores on Corresponding Ability Test and Test Anxiety

                    Grade 7               Grade 8
Person-Fit      Total     TASA       Total     TASA

Reading Test
MCI              .28**   -.18**       .32**     .07
PB              -.05      .10         .18*     -.15*
NCI             -.03      .03         .20*     -.15*
Rx2             -.11     -.02        -.16*      .17*
ECI              .02     -.08        -.05       .14

Mathematics Test
MCI             -.16*     .03        -.09       .06
PB               .04      .04         .01      -.03
NCI              .20**   -.05         .17*     -.03
Rx2             -.32**    .10        -.30**     .11
ECI             -.23**    .05        -.20**     .08

Science Test
MCI             -.25**    .17*       -.40**     .03
PB               .08     -.09         .29**     .02
NCI              .22**   -.16*        .44**    -.03
Rx2             -.39*     .22**      -.50**     .08
ECI             -.28**    .18**      -.50**     .04

Note: Higher misfit is represented by higher values on the MCI, Rx2, and ECI and by lower values on the PB and NCI.

*Significant at alpha = .05.

**Significant at alpha = .01.







Table 6

Percentage of Variance (R2) in Person-Fit Explained by the Combination of Examinee Ability, Test Anxiety, and Their Interaction

                        Person-Fit
Test            MCI     PB      NCI     Rx2     ECI

Seventh Grade

Reading         .11*    .04*    .01     .02     .03
Mathematics     .04*    .03     .05*    [?]     .06**
Science         .07**   .01     .06*    [?]**   .09**

Eighth Grade

Reading         .12**   .05*    .05     .05*    .02
Mathematics     .04     .04     .06*    .10**   .07**
Science         .23**   .13**   [?]**   .27**   .27*

Note: Some entries are illegible in the source; unrecoverable values are marked [?]. Underlined R2 values indicate a significant ability and test anxiety interaction at α = .05.

*Significant at α = .05.

**Significant at α = .01.






linear* combination of test anxiety, ability, and their interaction. Only in reading, however, was the percentage of variance explained greater than 10%, and in this case the interaction of ability and test anxiety was significant. For the personal biserial, the norm-conformity index, and the extended caution index, although several significant R2 values were observed, these were all less than .10. Thus it is difficult to interpret these relationships as being substantially important. For the Rasch index of person-fit, substantial proportions of variation were explained by the model in the areas of mathematics and science. No significant interaction effect of ability and test anxiety was detected in either of these cases.

For the eighth grade, the modified caution index again appears to

be sensitive to variation in examinees' level of test anxiety, ability, and their interaction. The significant R2 values exceeded 10% for this index in reading and science. Science had the largest R2 (.23) and a significant interaction between test anxiety and ability.

For the personal biserial, the norm-conformity index, and the extended caution index, significant R2 values greater than .10 were found only in the area of science, and for each of these cases, the interaction between ability and test anxiety contributed significantly to the overall model. Significant (and meaningful) R2 values for the Rasch X2 index were found for the areas of science and mathematics without any interaction between ability and test anxiety.




*Curvilinear relationships between person-fit indices and ability and TASA measures were tested and found not significant.






For further interpretations of these results, the nature of the

interaction effect of test anxiety and ability on person-fit was examined.

Significant interactions were followed up by plotting the

relationship between person-fit measures and TASA at selected ability levels. The formula to calculate the regression line at each ability level is



Y' = b0 + b1X1 + (b2 + b3X1)X2



At any value of ability (X1) a predicted person-fit measure

(Y') was calculated for different points of TASA (X2). The intercept of this model equals (b0 + b1X1) and the slope equals (b2 + b3X1). Table 7 reports the slope and intercept estimates which were used in plotting these lines for all cases in which R2 exceeded .10.
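This plotting procedure amounts to evaluating a family of simple regression lines of person-fit on TASA, one line per fixed ability level. A minimal sketch in Python; the coefficients b0 through b3 are illustrative placeholders, not the fitted estimates from Table 7:

```python
# Sketch of how the regression lines at fixed ability levels were generated.
# Coefficients are illustrative placeholders, not the study's estimates.
b0, b1, b2, b3 = 0.03, 0.006, 0.007, -0.0002

def predicted_fit(ability, tasa):
    """Predicted person-fit: Y' = b0 + b1*X1 + (b2 + b3*X1)*X2."""
    intercept = b0 + b1 * ability          # (b0 + b1*X1)
    slope = b2 + b3 * ability              # (b2 + b3*X1)
    return intercept + slope * tasa

# One line per ability group; the slope's sign flips once ability crosses
# the point where b2 + b3*X1 = 0 (here, ability = -b2/b3 = 35).
for ability in (12.35, 35.0, 53.02):
    ys = [predicted_fit(ability, t) for t in (1.73, 15.05, 28.37)]
    print(ability, [round(y, 3) for y in ys])
```

With these placeholder values the line is positively sloped for low-ability groups, flat near ability 35, and negatively sloped for high-ability groups, which is the interaction pattern described below.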

The regression lines resulting from these computations are shown in Figures 1-5. It should be noted that Figures 2-5 are based on a single grade-level and a single subject area. These plots depict the nature of the interaction of ability and test anxiety on various indices of person fit. Although the points of intersection of these

lines may vary, the same general pattern of relationship between person-fit and test anxiety occurs in all cases. Namely, this pattern of interaction can be characterized as follows:

1. For examinees in the average ability ranges, there is little or no relationship between test anxiety and person-misfit. (Note the "flat" slope of the regression line for Group E in all figures.)

2. As examinee ability level increases (see lines for Groups F, G, and H), the slope of the regression lines increases, generally







Table 7

Significant Ability and Test Anxiety Interactions and Increases
for Person-Fit Measures

[Most entries in this table are illegible in the source. For each model with a significant ability and test anxiety interaction (seventh-grade reading MCI; eighth-grade science MCI, PB, NCI, and ECI), the table reports the intercept and slope parameter estimates, standard errors, t statistics, and the R2 increase due to the interaction term. The most legible block follows; unrecoverable values are marked [?].]

Seventh Grade--Reading (MCI)

Parameter             Estimate    Standard Error       t       R2 Increase
Intercept               .0294         .0643           [?]
Slope-Ability           .0058         .0016          3.74**
Slope-TASA              .0071         .0035          2.02*
Slope-Ability*TASA     -.0002         .0001         -2.52*        .025
(Model R2 = .11*)


Figure 1. Relationship between the modified caution index and test anxiety for seventh-grade examinees at different reading ability levels. [Figure: one regression line per reading-ability group, A (12.35) through H (53.02); vertical axis is the MCI from .10 to .38 (higher values indicate more misfit), horizontal axis is the TASA score from 1.73 to 28.37.]




Figure 2. Relationship between the modified caution index and test anxiety for eighth-grade examinees at different science ability levels. [Figure: one regression line per science-ability group, A (16.76) through H (48.72); vertical axis is the MCI (higher values indicate more misfit), horizontal axis is the TASA score from 1.29 to 26.21.]





Figure 3. Relationship between the personal biserial and test anxiety for eighth-grade examinees at different science ability levels. [Figure: one regression line per science-ability group, A (16.76) through H (48.72); vertical axis is the PB (lower values indicate more misfit), horizontal axis is the TASA score from 1.29 to 26.21.]




Figure 4. Relationship between the norm conformity index and test anxiety for eighth-grade examinees at different science ability levels. [Figure: one regression line per science-ability group, A (16.76) through H (48.72); vertical axis is the NCI (lower values indicate more misfit), horizontal axis is the TASA score from 1.29 to 26.21.]





Figure 5. Relationship between the extended caution index and test anxiety for eighth-grade examinees at different science ability levels. [Figure: one regression line per science-ability group, A (16.76) through H (48.72); vertical axis is the ECI (higher values indicate more misfit), horizontal axis is the TASA score from 1.29 to 26.21.]


indicating an increasing negative relationship between test anxiety and person-fit. Specifically, high-ability, low-anxious examinees show more misfit than high-ability, high-anxious examinees.

3. As examinee ability decreases (see lines for Groups C, B, and A), the opposite effect occurs. Namely, low-ability, low-anxious examinees show less misfit than low-ability, high-anxious examinees.

When no interactions were significant, only ability main effects were significant. Table 8 presents the Type I sums of squares, which

measure the incremental sums of squares for the model as each variable was added. Ability main effects were significant in the models for the Rasch person-fit index on mathematics and science at both the seventh and eighth grades. There was also a significant main effect of reading ability for the model with the modified caution index at the eighth grade.
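Type I (sequential) sums of squares can be obtained by fitting the model in stages and recording the drop in residual sum of squares as each term enters. A minimal sketch with simulated data; the variable names and generating values are hypothetical, not the study's data:

```python
import numpy as np

# Simulated data standing in for the study's variables (values hypothetical)
rng = np.random.default_rng(0)
n = 200
ability = rng.normal(30, 8, n)
tasa = rng.normal(12, 5, n)
y = 0.3 - 0.004 * ability + rng.normal(0, 0.02, n)  # person-fit index

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

ones = np.ones(n)
# Predictors entered sequentially: ability, then TASA, then their product
designs = [np.column_stack([ones]),
           np.column_stack([ones, ability]),
           np.column_stack([ones, ability, tasa]),
           np.column_stack([ones, ability, tasa, ability * tasa])]

prev = rss(designs[0], y)
for name, X in zip(["ability", "TASA", "ability*TASA"], designs[1:]):
    cur = rss(X, y)
    print(f"Type I SS for {name}: {prev - cur:.4f}")  # incremental SS
    prev = cur
```

Because each term's sum of squares is the improvement over the terms already entered, Type I values depend on the entry order: here ability absorbs the shared variance before TASA and the interaction are tested.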



Internal Consistency of Person-Fit Statistics


Corrected split-half internal consistency reliability

coefficients for person-fit measures by grade and subject content area are reported in Table 9.

For the seventh grade sample, the internal consistency estimates

ranged from .23 to .56. The highest person-fit split-half reliability estimates were found consistently for the reading subtest. For the eighth grade the range of values was from .23 to .39, with a slight trend for reading to have the higher values.
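The corrected coefficients are odd-even half-score correlations stepped up with the Spearman-Brown formula. A minimal sketch, using simulated item responses for illustration rather than the MAT data:

```python
import numpy as np

def corrected_split_half(responses):
    """Spearman-Brown corrected odd-even split-half reliability.

    responses: examinees x items matrix of scored item values.
    """
    odd = responses[:, 0::2].sum(axis=1)   # items 1, 3, 5, ...
    even = responses[:, 1::2].sum(axis=1)  # items 2, 4, 6, ...
    r_half = np.corrcoef(odd, even)[0, 1]
    return 2 * r_half / (1 + r_half)       # Spearman-Brown step-up

# Simulated 0/1 item scores for illustration only
rng = np.random.default_rng(1)
theta = rng.normal(size=300)                       # examinee abilities
p = 1 / (1 + np.exp(-(theta[:, None] - rng.normal(size=40))))
scores = (rng.random((300, 40)) < p).astype(int)
print(round(corrected_split_half(scores), 2))
```

The same routine applies to person-fit measures by computing each index separately on the odd and even half-tests before correlating.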

Among person-fit indices, the Rx2 index has the highest internal

consistency estimates for the eighth grade (ranging from .31 to .39), but for the seventh grade, no person-fit index was consistently more








Table 8

Significant Main Effect of Ability as Predictor of Person-Fit Measures

[Some entries are illegible in the source; unrecoverable values are marked [?].]

Dependent                              Type I
Variable   Source            df   Sum of Squares   Mean Square      F       R2

Seventh Grade--Mathematics

Rx2        Model              3        .50             .17         [?]     .11*
           Mathematics        1        .47             .47        25.30*
           TASA               1        .01             .01          .47
           Mathematics*TASA   1        .02             .02         1.12
           Error            221       4.01             .02

Seventh Grade--Science

Rx2        Model              3        .50             .17        13.73*   .15*
           Science            1        .49             .49        39.95*
           TASA               1        .01             .01         1.21
           Science*TASA       1        .00             .00          .01
           Error            221       2.70             .01

Eighth Grade--Reading

MCI        Model              3        .31             .10         7.37*   .12*
           Reading            1        .27             .27        20.3*
           TASA               1        .04             .04         3.28
           Reading*TASA       1        .00             .00          .17
           Error            180       2.36             .01

Eighth Grade--Mathematics

Rx2        Model              3        .44             .15         7.21*   .13*
           Mathematics        1        .38             .38         [?]
           TASA               1        .01             .01          .25
           Mathematics*TASA   1        .06             .06         [?]
           Error            184       3.36             .02

Eighth Grade--Science

Rx2        Model              3        .57             .19         [?]     [?]
           Science            1        .53             .53        53.35*
           TASA               1        .01             .01         [?]
           Science*TASA       1        .00             .00         [?]
           Error            [?]       1.31             .01

*Significant at α = .05.








Table 9

Corrected Split-Half Reliability Estimates for Person-Fit Measures by Grade and Ability Test

[Some entries are illegible in the source; unrecoverable digits are marked [?].]

                        Person-Fit
Test            MCI     PB      NCI     Rx2     ECI

Seventh Grade

Reading         .56     .46     .49     .45     [?]
Mathematics     .28     .29     .25     .2[?]   .29
Science         .25     .20     .23     .29     .25

Eighth Grade

Reading         .35     .37     .31     .39     .33
Mathematics     .29     .37     .33     .29     .26
Science         .25     .23     .26     .32     .25






reliable than the others. Overall, with the exception of the reading subtest, the reliabilities of the person-fit indices for the mathematics and science subtests appear consistent across the various indices and are relatively low.


Summary


Results can be summarized as follows:

1. Descriptive statistics for person-fit measures, test anxiety, and ability scores were very similar across subject content areas and grades.

2. Intercorrelations between person-fit measures showed that

these measures are highly related within the same subject content area. Across subject areas, little or no relationship was found.

3. Correlations between person-fit measures and their

corresponding total ability score ranged from |.00| to |.50|. The PB index had the lowest correlation with total score.

4. Correlations between person-fit measures and test anxiety

scores ranged from |.02| to |.22|. In science, four of the five indices were significantly related to test anxiety scores for seventh graders. For eighth graders in reading, three of the five indices were significantly related to test anxiety.

5. A significant proportion of variance in person-fit measures

was explained by the linear combination of test anxiety, ability, and their interaction, for the seventh and eighth grade reading MCI index and for the eighth grade science MCI, PB, NCI, and ECI. Regression lines depicting the nature of these interactions were presented for

selected ability levels. Significant and meaningful R2 values






(greater than R2 = .10) for the Rasch person-fit index were found for the areas of science and mathematics without any interaction between ability and test anxiety.

6. Corrected split-half internal consistency estimates for the person-fit indices ranged from .20 to .56.













CHAPTER V
DISCUSSION



This study was conducted to investigate the relationship between

examinees' level of person-fit and test anxiety, and to study the effect of ability on this relationship. Five person-fit indices were

calculated for seventh- and eighth-grade students who had taken a test-anxiety self-report measure and the reading, mathematics, and science subtests of the MAT.

Discussion of results will focus on findings about (1) the interrelationship among the five person-fit indices, (2) the relationship between person-fit, test anxiety, and ability, and

(3) the reliability of the five person-fit statistics under study.



Relationships Among Person-Fit Statistics


Intercorrelations among measures of person-fit were quite high within same-subject content areas. The correlation values ranged from |.78| to |.99|. These high intercorrelations among person-fit statistics confirm previous research findings by Harnisch and Linn (1981) and Rudner (1983). In particular, Harnisch and Linn found intercorrelations among the MCI, PB, and NCI that ranged from |.93| to |.97| for mathematics tests and from |.89| to |.95| for reading tests. Rudner's intercorrelations among the MCI, PB, NCI, and the Rx2 for the simulated SAT test ranged from |.80| to |.99| and








from |.77| to |.99| for the simulated teacher-made biology test. These consistently high intercorrelations among the person-fit indices under study indicate that they seem to be measuring a common construct.

Relatively low correlations were found for any given index (MCI, PB, NCI, Rx2, and ECI) across the reading, mathematics, and science tests. These correlations ranged from -.03 to .24. There seems to be no persistence of misfit across the different tests. These results call into question the notion that a tendency to misfit is a stable trait which consistently manifests itself in examinee performance across various subject areas. Frary's (1982) correlations of the same person-fit index across several tests were also very low.

These results have two practical implications for interpreting person-fit results: if an examinee is identified as having poor person-fit by one index, another index will probably also identify him or her as a misfit on the same test; however, it cannot be concluded that he or she will be a misfit on another test.



Relationship Between Person-Fit, Ability, and Test Anxiety


Low to moderate correlation values were obtained between person-fit indices and their corresponding total ability scores. The Rasch person-fit index was the index with the highest correlations with total mathematics and science ability scores. For the reading test,

the MCI had the highest correlations with total score, but this correlation was positive while a negative correlation would have been expected. The reading subtest was not typical of a power test, since most examinees did not finish all items. More able examinees probably






attempted more items, passing end items that were more "difficult" than some items which they had missed earlier in the test. This caused more able students to get higher misfit classifications, as can be seen in Figure 1. The only person-fit index that did not seem to

be affected by the speededness of the reading test was the Rx2. Although its relationship to total reading score was low, it was in the direction expected.

The PB index had the lowest correlation with total scores.

Contrary to the present findings, Harnisch and Linn (1981) found that the PB was one of the indices that had a high relationship with total

score on the reading test (.63). The relationship of the PB to the math total score in the Harnisch and Linn study was nevertheless somewhat lower (.28) than for some other indices.

Correlation values between person-fit measures and test anxiety

were relatively low. These correlations ranged from |.02| to |.22|. The Rx2 index had the highest correlations with test anxiety. The only case in which the Rx2 correlated lowest with test anxiety was the seventh-grade reading test. This low correlation is ascribed to the speeded nature of the reading test, which was more noticeable at the seventh grade. These correlation values are disappointingly low from a practical perspective; however, one point that might be considered

is that these observed correlations may have been attenuated by the extremely low reliability of the person-fit measures and by having a general or trait measure of test anxiety instead of a state-specific measure. As an exploratory exercise one could speculate about the nature of this relationship if person-fit could be more reliably measured. A correction for attenuation method (Magnusson, 1966,






p. 148) was used to estimate what the correlation between person-fit and test anxiety would be if these measures were perfectly reliable. The corrected correlations between person-fit measures and test anxiety ranged from |.03| to |.44|. Even with the correction for attenuation, most of these correlations are still relatively low, and there is little evidence offered by the present study to support the notion that higher reliabilities can be achieved in person-fit measures.
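The attenuation correction divides the observed correlation by the square root of the product of the two measures' reliabilities. A minimal sketch; the person-fit reliability of .30 is typical of the values in Table 9, while the TASA reliability of .85 is an assumed illustrative value, not one reported in this study:

```python
from math import sqrt

def disattenuate(r_xy, rel_x, rel_y):
    """Correct an observed correlation for unreliability in both measures."""
    return r_xy / sqrt(rel_x * rel_y)

# Illustrative values: observed r = .22 (the study's maximum),
# person-fit reliability ~ .30, assumed TASA reliability ~ .85
print(round(disattenuate(0.22, 0.30, 0.85), 2))  # → 0.44
```

With these inputs the corrected value lands near the upper end of the |.03| to |.44| range reported above, illustrating how modest the relationship remains even under perfect measurement.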

In order to explore the extent to which variance in person misfit can be explained by a linear combination of the variables--ability level, test anxiety, and their interaction--this linear multiple regression model was tested for each person-fit statistic at each grade level and for each ability measure. Because of the large sample size, a number of fairly small R2 values were statistically significant, so only cases with a meaningful percentage of explained variance (larger than 10%) were considered.

The significant interactions between ability and test anxiety

demonstrate that ability levels moderate the relationship between person-fit measures and test anxiety. For lower-ability examinees a direct relationship between person-fit measures and test anxiety was found. For higher-ability examinees this relationship was found to be inverse.

Figures 1 and 2 present the two general pictures of these

interactions. Figure 1 shows the relationship between the modified caution index and test anxiety as measured by TASA for seventh-grade examinees with different reading levels. The nature of the interaction is the same as previously described, but higher ability level students have higher overall MCI values (more misfit). For all






the other significant interactions, which were for person-fit measures of science at the eighth grade, lower ability level students have higher values of misfit. Figure 2 shows this for the MCI. The difference found for these two content areas might be due to the performance of this sample on the reading subtest. More than 10 percent of the seventh graders missed the last fifteen items of the reading test, making it appear more like a speeded test than a power test. Higher ability examinees probably attempted more end items, increasing their probability of receiving higher-misfit classifications.

The cognitive-attentional theory of test anxiety, the Spence-Taylor drive theory, and previous person-fit research findings suggest interpretive explanations of the interaction results. According to Tobias (1980) and Weinstein, Cubberly, and Richardson (1982), high-test-anxious students will perform worse on difficult materials than low-test-anxious individuals, while with easier material little difference between anxiety levels is expected. In reporting results about cognitive coping behavior and anxiety, Houston

(1982) suggests that "highly trait-anxious (and test anxious) individuals tend to lack organized ways of coping with stress and

instead ruminate about themselves and the situation in which they find themselves" (p. 198).

Because high-test-anxious students' performance on more difficult items is more affected by their anxiety, these examinees will have a harder time coping as they reach items whose difficulty approximates their ability level. Other testing strategies such as

test wiseness would not be readily available due to lack of concentration. Examinees with high ability levels and low test






anxiety could take advantage of test-taking skills and answer items correctly beyond their ability level. Since these items would not be answered with the same degree of certainty as easier items, more deviation from the expected response pattern could occur and higher misfit would result. For examinees with high ability and high test anxiety, coping and test-taking skills could be blocked, making them consistently miss harder items. Lower misfit values would occur for this group. Examinees with lower ability and lower test anxiety levels would answer items correctly to the point where they reach items at their ability level and then miss the harder items. Some misfit could occur due to attempts at harder items. For examinees

with low ability and high test anxiety, distracting thoughts might interfere with performance on almost all items. Even easy items (relative to their ability level) could be missed; this sporadic answering pattern would classify this group as high misfits.

These interaction effects between ability and test anxiety seem

to appear more consistently for science, especially at the eighth grade. One possible explanation is that the standardized science test fits the curriculum less well than the reading or math tests. The mean item difficulty for the science test is lower, indicating a harder test. Examinees taking the science examination might find themselves in a more ambiguous and hence more anxiety-producing situation.

These findings are primarily of theoretical interest to those who may be interested in learning more about the constructs of test anxiety or person-fit. At best the combination of ability, test

anxiety, and their interaction appear to account for only about one-fourth of the variance in person-fit indices, and the increments in R2






obtained by adding the interaction term to the regression model were small. The interaction term accounted for only about two percent of the R2 values.



Reliability of Person-Fit Measures


Corrected odd-even split-half reliability estimates of person-fit indices were low. These coefficients ranged from .20 to .56. These internal consistency estimates were highest for the person-fit indices computed for the seventh-grade reading test. Part of the reason for the higher reliabilities could be ascribed to the speeded nature of

the reading test at this grade level, since the original sequential test item number was used to split the test into odd-even subtests. Magnusson (1966) cautions against using split-half methods on timed tests. He states, "the time limit has the effect that in reliability computations with split-half methods the test's reliability tends to be overestimated" (p. 114).

Frary (1982) analyzed the internal consistency of person-fit

indices and also found low and even negative split-half coefficients. His findings led him to conclude that unexpected responses to a small number of items contributed to high misfit classifications and that there was little consistency in the specific items contributing most to misfit.

These findings certainly seem to call into question the notion of person-fit as a stable trait that can be reliably measured and also question the potential utility of these indices. Frary summarizes this concern when he suggests "that use of person-fit measures for any decision-making purpose, especially with respect to individual






examinees, should be undertaken only with extreme caution and that substantial additional research will be required before they can be used routinely" (Frary, 1982, p. 17).



Summary


Intercorrelations among person-fit statistics were quite high within same-subject content areas, but not across different-subject tests. It can be generalized that if an examinee is

identified as having poor person-fit by one index, another index will probably identify this examinee as a misfit on the same test, but it cannot be expected that this examinee will also be a misfit on another test.

Significant interactions between ability and test anxiety

demonstrate that ability levels moderate the relationship between person-fit measures and test anxiety. For lower-ability examinees a direct relationship between person-fit measures and test anxiety was found. For higher-ability examinees this relationship was inverse.

The cognitive-attentional theory of test anxiety and the Spence-Taylor drive theory suggest interpretive explanations for these interaction results. These findings are of theoretical interest to those interested in learning more about the constructs of test anxiety and person-fit. At best the combination of ability, test anxiety, and their interaction appears to account for only about one-fourth of the variance in person-fit indices.

Internal consistency (split-half) reliabilities were relatively low and the present study offers little evidence to support the notion that higher reliability of person-fit indices could be achieved.






According to these results, the potential uses of person-fit indices are questionable at this time. More research is needed before person-fit indices can be recommended as a routine measure in achievement tests.













REFERENCES


Alpert, R., & Haber, R.N. (1960). Anxiety in academic achievement
situations. Journal of Abnormal and Social Psychology, 61, 207-215.

Donlon, T.F., & Fischer, F.E. (1968). An index of an individual's
agreement with group-determined item difficulties. Educational
and Psychological Measurement, 28, 105-113.

Frary, R.B. (1982). A comparison among person-fit measures. Paper
presented at the annual meeting of the American Educational
Research Association, New York.

Frary, R.B., & Giles, M.B. (1980). Multiple-choice test bias due to
answering strategy variations. Paper presented at the annual
meeting of the National Council on Measurement in Education,
Boston, Mass.

Gaier, E.L., & Lee, M.C. (1953). Pattern analysis: The configural
approach to predictive measurement. Psychological Bulletin, 50,
140-148.

Guttman, L. (1941). The quantification of a class of attributes: A
theory and method of scale construction. In P. Horst, P. Wallin, & L. Guttman (Eds.), The prediction of personal adjustment. New
York: Social Science Research Council, Committee on Social
Adjustment.

Guttman, L. (1950). The basis for scalogram analysis. In S.
Stouffer, L. Guttman, E. Suchman, P. Lazarsfeld, S. Star, & J.
Clausen (Eds.), Measure and prediction (Vol. 6). Princeton,
N.J.: Princeton University Press, 60-90.

Hambleton, R.K., & Cook, L.L. (1977). Latent trait models and their
use in the analysis of educational test data. Journal of
Educational Measurement, 14, 75-96.

Harnisch, D.L. (1983). Item response patterns: Application for
educational practice. Journal of Educational Measurement, 20,
191-206.

Harnisch, D.L., & Linn, R.L. (1981). Analysis of item response
patterns: Questionable test data and dissimilar curriculum practices. Journal of Educational Measurement, 18, 133-146.








Harper, F.B.W. (1974). The comparative validity of the Mandler-Sarason Test Anxiety Questionnaire and the Achievement Anxiety Test. Educational and Psychological Measurement, 34, 961-966.

Heinrich, D.L., & Spielberger, C.D. (1982). Anxiety and complex
learning. In H.W. Krohne & L. Laux (Eds.), Achievement, stress,
and anxiety. Washington, D.C.: Hemisphere.

Helwig, J.T., & Council, K.A. (Eds.). (1979). SAS user's guide, 1979
edition. Raleigh, N.C.: SAS Institute.

Houston, K. (1982). Trait anxiety and cognitive coping behavior. In
I.G. Sarason & C.D. Spielberger (Eds.), Achievement, stress, and
anxiety. Washington, D.C.: Hemisphere.

Kane, M.T., & Brennan, R.L. (1980). Agreement coefficients as
indices of dependability for domain-referenced tests. Applied
Psychological Measurement, 4, 105-126.

Levine, M.V., & Rubin, D.B. (1979). Measuring the appropriateness of
multiple choice test scores. Journal of Educational Statistics,
4, 269-290.

Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental
test scores. Reading, Mass.: Addison-Wesley.

Magnusson, D. (1966). Test theory. Reading, Mass.: Addison-Wesley.

Mandler, G., & Sarason, S.B. (1952). A study of anxiety and
learning. Journal of Abnormal and Social Psychology, 47, 166-173.

Prescott, G.A., Balow, I.H., Hogan, T.P., & Farr, R.C. (1978).
Teacher's manual for administering and interpreting Metropolitan Achievement Tests (Advanced 1, Forms JS and KS). New York: The
Psychological Corporation.

Rudner, L.M. (1982). Individual assessment accuracy. Paper
presented at the annual meeting of the American Educational
Research Association, New York.

Rudner, L.M. (1983). Individual assessment accuracy. Journal of
Educational Measurement, 20, 207-220.

Sarason, I.G. (1960). Empirical findings and theoretical problems in
the use of anxiety scales. Psychological Bulletin, 57, 403-415.

Sarason, I.G. (1963). Test anxiety and intellectual performance.
Journal of Abnormal and Social Psychology, 66, 73-75.

Sarason, I.G. (1972). Experimental approaches to test anxiety:
Attention and the use of information. In C.D. Spielberger (Ed.),
Anxiety: Current trends in theory and research (Vol. 2). New
York: Academic Press.






Sarason, I.G. (1975). Anxiety and self-preoccupation. In I.G.
Sarason & E.D. Spielberger (Eds.), Stress and anxiety (Vol. 2).
Washington, D.C.: Hemisphere.

Schmitt, A.P., & Crocker, L. (1982). Test anxiety and its components
for middle school students. Journal of Early Adolescence, 2,
267-275.

Spielberger, C.D. (1966). Theory and research on anxiety. In C.D.
Spielberger (Ed.), Anxiety and behavior. New York: Academic
Press.

Spielberger, C.D. (1971). Trait-state anxiety and motor behavior.
Journal of Motor Behavior, 3, 265-279.

Spielberger, C.D., Gonzalez, H.P., Taylor, C.J., Algaze, B., & Anton,
W.D. (1978). Examination stress and test anxiety. In C.D.
Spielberger & I.G. Sarason (Eds.), Stress and anxiety (Vol. 5).
New York: Hemisphere/Wiley.

Tatsuoka, K., & Linn, R.L. (1981). Indices for detecting unusual
item response patterns in personnel testing: Links between direct and item-response-theory approaches (Research Report
81-5). Urbana: University of Illinois, Computer-Based Education
Research Laboratory.

Tatsuoka, K., & Linn, R.L. (1983). Indices for detecting unusual
patterns: Links between two general approaches and potential
applications. Applied Psychological Measurement, 7, 81-96.

Tatsuoka, K., & Tatsuoka, M.M. (1980). Detection of aberrant
response patterns and their effect on dimensionality (Research
Report 80-4). Urbana: University of Illinois, Computer-Based
Education Research Laboratory.

Tobias, S. (1980). Anxiety and instruction. In I.G. Sarason (Ed.),
Test anxiety: Theory, research, and applications. Hillsdale,
N.J.: Lawrence Erlbaum Associates.

Tryon, G.S. (1980). The measurement and treatment of test anxiety.
Review of Educational Research, 50, 343-372.

Van der Flier, H. (1977). Environmental factors and deviant response
patterns. In Y.H. Poortinga (Ed.), Basic problems in cross-cultural psychology. Lisse, Netherlands: Swets & Zeitlinger.

Van der Flier, H. (1982). Deviant response patterns and
comparability of test scores. Journal of Cross-Cultural
Psychology, 13, 267-298.

Weinstein, C.E., Cubberly, W.E., & Richardson, F.C. (1982). The
effects of test anxiety on learning at superficial and deep
levels of processing. Contemporary Educational Psychology, 7,
107-112.






Wine, J.D. (1971). Test anxiety and direction of attention.
Psychological Bulletin, 76, 92-104.


Wine, J.D. (1980). Cognitive-attentional theory of test anxiety. In
I.G. Sarason (Ed.), Test anxiety: Theory, research, and
applications. Hillsdale, N.J.: Lawrence Erlbaum Associates.

Wright, B.D., Mead, R., & Bell, S. (1979). BICAL: Calibrating items
with the Rasch model (RM-23b). Chicago: University of Chicago.


Wright, B.D., & Panchapakesan, N.A. (1969). A procedure for
sample-free item analysis. Educational and Psychological Measurement,
29, 23-48.













BIOGRAPHICAL SKETCH


Alicia P. Schmitt was born in Havana, Cuba, on September 28, 1952. She immigrated to the United States in 1961 and moved to Puerto Rico in 1962. In 1970 she graduated from high school and entered the University of Puerto Rico, graduating in 1973 and in 1975 with a bachelor's and master's degree in psychology. From 1975 to 1977 she worked as Evaluation Coordinator for Project Follow Through and taught evening courses at the University of Puerto Rico.

Alicia moved to Gainesville in 1977 and began her doctoral program at the University of Florida in the fall of 1978. While in graduate school she held a variety of assistantships. For a period of two years, she served as research consultant for the Research Clinic of the College of Education. As part of the duties in other assistantships she developed and taught the Independent Study by Correspondence Course in Measurement and Evaluation in Education; assisted in measurement, research, and statistics courses in the College of Education; and worked as research assistant for a graduate school dean.

In 1983 she became Assistant Director of Testing and Evaluation for the Office of Instructional Resources, University of Florida. In this capacity she administered the College of Education Basic Skills Testing Program and analyzed institutional studies used for educational planning. Alicia has currently accepted a position with Educational Testing Service and will be moving to Princeton, New Jersey.











I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation, and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.


Linda M. Crocker, Chair
Professor of Foundations of Education







I certify that I have read this study and that in my opinion it
conforms to acceptable standards of scholarly presentation, and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.



James J. Algina
Associate Professor of Foundations of Education







I certify that I have read this study and that in my opinion it
conforms to acceptable standards of scholarly presentation, and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.




Professor of Psychology









This dissertation was submitted to the Graduate Faculty of the College of Education and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

December, 1984


Chairman, Department of Foundations of
Education


Dean, College of Education


Dean for Graduate Studies and
Research





THE RELATIONSHIP BETWEEN TEST ANXIETY AND STATISTICAL
MEASURES OF PERSON FIT ON ACHIEVEMENT TESTS

By

ALICIA P. SCHMITT

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1984


To my father, who taught me the value of education

ACKNOWLEDGMENTS

I would like to express my sincere appreciation to Dr. Linda Crocker, chair of my doctoral committee. Her standard of excellence and sound advice have guided my doctoral studies. She has always provided invaluable opportunities for learning. I am also extremely grateful to Dr. James Algina, who was continually available for consultation and guidance, and to Dr. Marvin E. Shaw, who gave freely of his patient and quiet encouragement. I am grateful to each of these members of my doctoral committee for helping me reach this point in my career.

My husband, Jeff, deserves special recognition since he lived with and stood by me through this special time in my life. He was always encouraging and helpful. This degree belongs to him as much as to me. To my sister, Amelia, I am indebted for providing a consistent model of perseverance and strength. I am also grateful to my family, friends, and co-workers, who always encouraged me and knew that I would finish, and to Adele Koehler, my typist, who always, with a smile, helped me meet my deadlines.

Finally, I thank the Alachua County School Board for providing the data used in this study.

TABLE OF CONTENTS

                                                             Page

ACKNOWLEDGMENTS                                               iii

LIST OF TABLES                                                 vi

LIST OF FIGURES                                               vii

ABSTRACT                                                     viii

CHAPTER

   I  INTRODUCTION                                              1

      Statement of the Problem                                  1
      Purpose of the Study                                      2
      Theoretical Rationale                                     3
      Definition of Technical Terms                             5
         Student-Problem Table (S-P Table)                      6
         Modified Caution Index (MCI)                           7
         Personal Biserial Correlation (PB)                     7
         Norm Conformity Index (NCI)                            8
         Rasch Person-Fit Statistic (Rx²)                       9
         Extended Caution Index (ECI)                          10
      Assumptions                                              11
      Educational Significance                                 11
      Summary                                                  12

  II  REVIEW OF LITERATURE                                     14

      Person-Fit Measures                                      14
         Historical Background                                 14
         Types of Person-Fit Indices                           16
         Comparative Studies of Person-Fit Indices             18
         Person-Fit Indices Under Study                        20
      Test Anxiety                                             21
      Summary                                                  23

 III  METHODOLOGY                                              25

      Examinees                                                25
      Instruments                                              26
      Creation of the Data File                                27
      Calculation of Person-Fit Statistics                     27
      Analysis                                                 28
      Summary                                                  29

                                                             Page

  IV  RESULTS                                                  31

      Descriptive Statistics                                   32
      Relationship Between Person-Fit, Ability, and
         Test Anxiety                                          36
      Relationship Between Person-Fit and a Linear
         Combination of Ability, Test Anxiety, and Their
         Interaction                                           36
      Internal Consistency of Person-Fit Statistics            47
      Summary                                                  50

   V  DISCUSSION                                               52

      Relationships Among Person-Fit Statistics                52
      Relationship Between Person-Fit, Ability, and
         Test Anxiety                                          53
      Reliability of Person-Fit Measures                       58
      Summary                                                  59

REFERENCES                                                     61

BIOGRAPHICAL SKETCH                                            65

LIST OF TABLES

Table                                                        Page

1  S-P Table for Five Examinees and Six Items (Ideal
      Pattern)                                                  6

2  Descriptive Statistics for Person-Fit Measures by Grade
      and Ability Test                                         33

3  Descriptive Statistics for Ability Tests and Test
      Anxiety by Grade                                         34

4  Person-Fit Measures Intercorrelations by Grade and
      Ability Test                                             35

5  Correlations Between Person-Fit Measures and Total
      Scores on Corresponding Ability Test and Test Anxiety    37

6  Percentage of Variance (R²) in Person-Fit Explained by
      the Combination of Examinee Ability, Test Anxiety,
      and Their Interaction                                    38

7  Significant Ability and Test Anxiety Interactions and
      R² Increases for Person-Fit Measures                     41

8  Significant Main Effect of Ability as Predictor of
      Person-Fit Measures                                      48

9  Corrected Split-Half Reliability Estimates for
      Person-Fit Measures by Grade and Ability Test            49

LIST OF FIGURES

Figure                                                       Page

1. Relationship between the modified caution index and test
      anxiety for seventh-grade examinees at different
      reading ability levels                                   42

2. Relationship between the modified caution index and test
      anxiety for eighth-grade examinees at different
      science ability levels                                   43

3. Relationship between the personal biserial and test
      anxiety for eighth-grade examinees at different
      science ability levels                                   44

4. Relationship between the norm conformity index and test
      anxiety for eighth-grade examinees at different
      science ability levels                                   45

5. Relationship between the extended caution index and test
      anxiety for eighth-grade examinees at different
      science ability levels                                   46

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

THE RELATIONSHIP BETWEEN TEST ANXIETY AND STATISTICAL
MEASURES OF PERSON FIT ON ACHIEVEMENT TESTS

by Alicia P. Schmitt

December, 1984

Chair: Linda M. Crocker
Major Department: Foundations of Education

The purpose of this study was to investigate the relationship between an examinee's level of test anxiety and each of five different person-fit statistics and to establish if this relationship is dependent on ability level. A secondary interest was to determine the relationship among person-fit indices within and across different subject areas of a standardized achievement test and to assess the internal consistency of person-fit indices. An existing data set was analyzed to explore the nature of these relationships for the modified caution index, the personal biserial correlation, the norm conformity index, the Rasch person-fit index, and the extended caution index. Achievement test scores on the reading, mathematics, and science subtests of the Metropolitan Achievement Test (MAT) and scores on the Test Anxiety Scale for Adolescents were used as estimates of ability and anxiety. The item scores and total test scores of 225 seventh-graders and 188 eighth-graders of a metropolitan middle school comprised this data set.

Intercorrelations among the measures of person fit were in the .80s to .90s within same-subject content areas. Across subject content areas little or no relationship was found. Low to moderate correlations were obtained between person-fit indices and their corresponding ability scores (|.00| to |.50|) and test anxiety (|.02| to |.22|). For four of the measures of person fit, on one or more of the subject tests, a significant proportion of variance was explained by the linear combination of ability, test anxiety, and their interaction. In these cases ability levels moderate the relationship between person-fit measures and test anxiety; for lower-ability examinees the relationship is direct, but for higher-ability examinees the relationship is inverse. Only the Rasch person-fit index was consistently unaffected by this interaction. Corrected split-half reliability estimates of person-fit indices were low (.20 to .56), indicating little consistency of the trait. According to these results, the potential uses of person-fit indices are questionable at this time. More research is needed before these measures can be recommended for routine use in interpretation of achievement test scores for individual examinees.

CHAPTER I
INTRODUCTION

Statement of the Problem

Although total scores have been consistently used as the basis to evaluate educational achievement, analysis of item-response patterns can contribute additional information that may be useful in the interpretation of an overall score. Analysis of response patterns can be based on two dimensions: item difficulty and examinee ability. Ability is typically estimated by the total score on the test of interest, and item difficulty, by the proportion of examinees answering the item correctly. If the items are arranged in ascending order of difficulty, an examinee with a given ability should answer items correctly until the point where his or her ability matches the difficulty of the items, and miss each item thereafter. Deviations from the expected response pattern occur when the pattern of passed and missed items is not consistent. If a person misses easier items but then responds correctly to harder items, there is deviation from the expected response and misfit occurs.

With the introduction of the scalogram technique, Guttman (1941, 1950) was one of the first social scientists to suggest that some persons respond consistently to a given set of ordered stimuli (test items) while others do not. Under Guttman's scale theory, a response pattern where a student passing a more difficult item also responds

correctly to all easier items, is called a perfect simplex, and the scale or test under such situation is called a perfect scale.

During the late 1970s and early 1980s there has been a resurgence of interest in using information provided by response patterns. A number of person-fit statistics have been developed to provide a measure of an individual examinee's deviation from the expected response pattern to a given set of items. Although some studies have shown that indices of person fit are highly correlated (Harnisch & Linn, 1981; Rudner, 1983), attempts to identify causes of person misfit (or even personality or demographic correlates of it) have remained mainly speculative. Some researchers, such as Frary (1982) and Harnisch and Linn (1981), have suggested that one factor which may contribute to person misfit on cognitive tests is test anxiety, but prior to this study there has been no empirical investigation to test this hypothesis.

Purpose of the Study

The present exploratory study was designed to investigate the nature of the relationship between measures of person fit and test anxiety. For each of five selected indices of fit (modified caution index, personal biserial correlation, norm conformity index, Rasch person-fit index, and an extended caution index), the following questions were asked:

1. What is the degree of linear relationship between test anxiety and an examinee's level of misfit?

2. What is the degree of linear relationship between ability (as defined by performance on the current achievement test) and level of misfit?

3. To what extent is variance in person misfit explained by a linear combination of the variables: ability level, test anxiety, and their interaction?

A secondary interest in this investigation was to explore the degree of relationship among the five selected person-fit indices within and across subject area tests of an achievement battery and to estimate their internal consistencies. This information was considered important because the tests used in this case were subtests from a well-known, nationally norm-referenced, standardized achievement test battery. Earlier studies of the interrelationship and reliability of person-fit indices have typically been based upon state minimal competency examinations (Harnisch & Linn, 1981) or locally developed teacher-made tests (Frary, 1982).

Theoretical Rationale

To date most research on test anxiety has considered primarily the effects on examinees' total test scores. Recently, Frary (1982) and Harnisch and Linn (1981) have suggested that test anxiety may be a factor which contributes to erratic performance of an examinee within a given test (e.g., missing relatively easy items, while answering more difficult items correctly). Careless errors and lack of concentration by high-test-anxious individuals could change the pattern of item responses from the pattern that would be expected.

Two theories predict the effect of anxiety on performance. According to the cognitive-attentional theory of test anxiety, highly anxious students attend to self-relevant variables instead of to task-relevant variables, negatively affecting their performance (Wine, 1980). In an analysis of Spielberger's (1966, 1971) extension of Spence-Taylor drive theory, Heinrich and Spielberger (1982) make several predictions about the effect of ability and anxiety on performance of tasks with varying levels of difficulty that seem relevant to this study of test anxiety and person fit. These are as follows:

1. For subjects with superior intelligence, high anxiety will facilitate performance on most learning tasks. While high anxiety may initially cause performance decrements on very difficult tasks, it will eventually facilitate the performance of bright subjects as they progress through the task and correct responses become dominant.

2. For subjects of average intelligence, high anxiety will facilitate performance on simple tasks and, later in learning, on tasks of moderate difficulty. On very difficult tasks, high anxiety will generally lead to performance decrements.

3. For low intelligence subjects, high anxiety may facilitate performance on simple tasks that have been mastered. However, performance decrements will generally be associated with high anxiety on difficult tasks, especially in the early stages of learning. (Heinrich & Spielberger, 1982, p. 147)

According to these predictions, response patterns and person-fit statistics will be different for high, average, and low ability examinees depending on their anxiety levels. The predicted effect for high and low anxious students at these three ability levels would be

1. Subjects with high ability and high test anxiety would be expected to initially fail hard items, but since during testing conditions, examinees receive no feedback, correct responses will not

be expected to become dominant. These examinees would continue to have occasional difficulty on harder items, but their high levels of test anxiety would facilitate performance on easier items. Moderate to low misfit would be expected. For subjects with high ability and low test anxiety, interference in performance of difficult items is not expected. Due to less attentional interference, use of test-taking strategies might also be more accessible. These students might be more open to guessing on harder items. Moderate to high misfit could be expected.

2. For subjects with average ability, high anxiety will help with easy to moderately difficult items, but will interfere with harder items. A low to moderate misfit would be predicted. Similarly, low test anxiety is not expected to differentially affect item responses for average-ability examinees.

3. For low ability subjects, high anxiety may help with the easier items but will interfere with performance on more difficult items. If these examinees do not feel very confident in their knowledge, high anxiety might not help but instead disrupt their concentration, leading them to answer in a more spurious manner. In this last case, higher misfit might occur. When low ability subjects are also low in test anxiety no interference is expected; these students will probably direct their attention to the easier items they master. A low to moderate misfit is expected.

Definition of Technical Terms

Definitions and formulas required to explain major technical terms used in this study are as follows:

Student-Problem Table (S-P Table)

The S-P table is used to organize test information into a matrix of zeros and ones. The rows in this matrix represent the students ranked from highest to lowest according to total test score. The columns represent the items arranged from left to right in ascending order of difficulty. Correct responses are represented by ones and incorrect responses are represented by zeros. Assuming that items are arranged in increasing order of difficulty (from easy to hard), a concordant response pattern is one in which an examinee answers the items correctly until he or she reaches an item that is too difficult and answers the items incorrectly from then on. If all examinees had concordant response patterns, the S-P matrix would have all ones in the upper left-hand corner and all zeros in the lower right-hand corner. A short illustrative table of the ideal response pattern is presented as Table 1.

Table 1
S-P Table for Five Examinees and Six Items (Ideal Pattern)

                        Item j
Examinee i    1    2    3    4    5    6
    1         1    1    1    1    1    0
    2         1    1    1    1    0    0
    3         1    1    1    0    0    0
    4         1    1    0    0    0    0
    5         1    0    0    0    0    0
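As an illustration (not part of the original dissertation), the two-step construction of an S-P table described above can be sketched in Python. The function name and data are hypothetical; the sketch assumes a complete 0/1 response matrix with no omitted items.

```python
import numpy as np

def sp_table(responses):
    """Arrange a 0/1 response matrix as an S-P table: rows sorted by
    descending total score, columns sorted by descending item p-value,
    so the easiest items appear at the left."""
    responses = np.asarray(responses)
    row_order = np.argsort(-responses.sum(axis=1), kind="stable")
    col_order = np.argsort(-responses.sum(axis=0), kind="stable")
    return responses[np.ix_(row_order, col_order)]

# The ideal (Guttman) pattern of Table 1, entered in scrambled row order
data = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
]
print(sp_table(data))  # rows and columns fall into the pattern of Table 1
```

With a concordant data set such as this one, the sorted matrix shows the ones in the upper left-hand corner and the zeros in the lower right-hand corner.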

Modified Caution Index (MCI)

Harnisch and Linn (1981) introduced the modified caution index as a modification of an earlier caution index proposed by Sato in 1975 (cited in Harnisch & Linn, 1981). The MCI has a lower bound of 0 and an upper bound of 1. The higher the value of the index, the more divergent is the person's response pattern. This index is computed with data arranged into an S-P table, using the following formula:

$$\mathrm{MCI}_i \;=\; \frac{\displaystyle\sum_{j=1}^{n_{i.}} (1 - u_{ij})\,n_{.j} \;-\; \sum_{j=n_{i.}+1}^{J} u_{ij}\,n_{.j}}{\displaystyle\sum_{j=1}^{n_{i.}} n_{.j} \;-\; \sum_{j=J+1-n_{i.}}^{J} n_{.j}} \tag{1}$$

where i is the examinee index in the S-P matrix, j is the item index, u_ij is 1 if examinee i answers item j correctly and 0 if examinee i answers item j incorrectly, n_i. is the total number of correct responses for examinee i, and n_.j is the number of correct responses to item j.

Personal Biserial Correlation (PB)

Donlon and Fischer (1968) proposed this correlation. The coefficient obtained represents the correlation between a person's item responses and the items' difficulty values. Donlon and Fischer define item difficulty as the proportion of examinees who respond incorrectly to an item. Large values correspond to difficult items and small values correspond to easy items. A positive correlation

represents good fit, indicating that a person tends to answer correctly items that are easy for the group and miss the more difficult items. Low or negative correlations represent more divergent response patterns. The formula to compute this correlation is

$$\mathrm{PB} \;=\; \frac{\bar{Q}_r - \bar{Q}_c}{S_{Q_r}} \cdot \frac{J_r'}{Y} \tag{2}$$

where Q̄_r is the mean item difficulty for items answered, Q̄_c is the mean item difficulty for items answered correctly, S_Q_r is the standard deviation for Q_r, J_r' is the number of items answered correctly divided by the number of items answered, and Y is the ordinate from the standard normal curve at the point separating the proportions J_r' and (1 - J_r').

Norm Conformity Index (NCI)

This index was developed by Tatsuoka and Tatsuoka (1980). The NCI indicates the degree of concordance to a group response pattern where items are arranged in descending order of difficulty, from hardest to easiest. Values of this index may range from -1 to 1. The smaller or more negative the index, the more divergent is the individual's response pattern in comparison to the group norm. This index is undefined for either perfect or zero scores. Let S denote the row vector of a person's response pattern; let T denote the transpose of the complement of S; and let N = TS. The formula to compute this index is

$$\mathrm{NCI} \;=\; 2\left(\frac{U_a}{U_b}\right) - 1 \tag{3}$$

where U_a = Σ_i Σ_j n_ij, the sum of the above-diagonal elements of N, and U_b is the sum of all the elements of N.

Rasch Person-Fit Statistic (Rx²)

The Rx², also referred to as Maximum Likelihood Procedure (MAX) (Wright & Panchapakesan, 1969) and as weighted total fit mean square (Rudner, 1983), was adopted for use with the one-parameter Rasch model and is calculated using the BICAL program (Wright, Mead, & Bell, 1979). This index has a high value for an examinee who has a response pattern that is inconsistent with the examinee's score and the Rasch model measure of item difficulty. The following formula is used to compute this index:

$$Rx_i^2 \;=\; \sum_{j=1}^{J} \frac{(u_{ij} - P_{ij})^2}{P_{ij}\,(1 - P_{ij})} \tag{4}$$

where u_ij is the response of examinee i to item j and P_ij is the probability of a correct response for examinee i on item j as predicted by the Rasch model:

$$P_{ij} \;=\; \frac{e^{(\theta_i - b_j)}}{1 + e^{(\theta_i - b_j)}}$$
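As an illustration (again, not part of the original dissertation), the NCI computation can be sketched in Python. The function below is a hypothetical helper; it assumes a single 0/1 response vector whose items are already ordered from hardest to easiest, as the definition requires.

```python
import numpy as np

def nci(s):
    """Norm conformity index for one response vector `s` (0/1),
    items ordered hardest to easiest; a sketch of N = TS with
    U_a the above-diagonal sum and U_b the total sum."""
    s = np.asarray(s)
    n = np.outer(1 - s, s)        # N: complement (as column) times pattern
    u_a = np.triu(n, k=1).sum()   # sum of above-diagonal elements
    u_b = n.sum()                 # sum of all elements
    return 2 * u_a / u_b - 1      # undefined for perfect or zero scores

print(nci([0, 0, 0, 1, 1, 1]))   # Guttman-consistent pattern -> 1.0
print(nci([1, 1, 1, 0, 0, 0]))   # fully reversed pattern -> -1.0
```

The two example vectors confirm the stated range: a person who misses only the hardest items obtains the maximum of 1, while a person who passes only the hardest items obtains the minimum of -1.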

Extended Caution Index (ECI)

The extended indices have been proposed by Tatsuoka and Linn (1981, 1983). They describe the extended indices as linear transformations of the distance between a person's response pattern and a theoretical curve. In the case of the ECI, this curve is the group response curve (GRC), which is "an average function of N different Person Response Curves" (Tatsuoka & Linn, 1981, p. 10). For the ECI, probabilities of success, calculated through item-response-theory logistic models, substitute for the zeros or ones in the S-P table. For purposes of this study, the Rasch one-parameter logistic model will be used to calculate these probabilities. The formula for the ECI is

$$\mathrm{ECI}_i \;=\; 1 - \frac{\displaystyle\sum_{j=1}^{J} (Y_{ij} - P_{i.})(Y_{.j} - P_{..})}{\displaystyle\sum_{j=1}^{J} \Bigl(P_{ij} - \textstyle\sum_j P_{ij}/J\Bigr)(Y_{.j} - P_{..})} \tag{5}$$

where Y_ij is the response of examinee i to item j, Y_.j is the sum of responses across examinees for item j, P_i. is the proportion of correct responses of examinee i, P_.. is the total proportion of correct responses, P_ij is the probability of correct response for each examinee i on item j according to the Rasch model, and ΣP_ij/J is the mean predicted probability of success for examinee i. This formula is computed as the ratio of two covariances. The higher the value of the ECI, the more variation from the expected response pattern. This index is also limited by perfect or zero scores; the denominator would become zero and the value infinite.
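The Rasch probabilities that enter the person-fit statistics above can be sketched in Python as follows. This is an illustrative reimplementation, not the BICAL program itself; `rasch_p` and `rasch_fit` are hypothetical names, and `rasch_fit` returns a simple sum of squared standardized residuals rather than BICAL's weighted mean-square form.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the one-parameter
    Rasch model, P = e^(theta - b) / (1 + e^(theta - b))."""
    return math.exp(theta - b) / (1 + math.exp(theta - b))

def rasch_fit(responses, theta, difficulties):
    """Sum of squared standardized residuals for one examinee,
    in the spirit of the Rx2 person-fit statistic."""
    total = 0.0
    for u, b in zip(responses, difficulties):
        p = rasch_p(theta, b)
        total += (u - p) ** 2 / (p * (1 - p))
    return total

# An average-ability examinee who passes the easy items and fails the
# hard ones fits better (lower value) than one with the reversed pattern.
b = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(rasch_fit([1, 1, 1, 0, 0], 0.0, b) < rasch_fit([0, 0, 0, 1, 1], 0.0, b))  # True
```

The comparison at the end mirrors the text: an aberrant pattern (easy items missed, hard items passed) inflates the statistic even though both examinees have the same total score.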

Assumptions

The following underlying assumptions were held for this study:

1. Standardized testing situations are capable of inducing test anxiety among students who would normally be test anxious.

2. Students responded truthfully on the self-report instrument used to assess their level of test anxiety.

3. Total score on the achievement test used can be taken as an estimate of the examinee's ability (substituting for any external measure of ability, such as an I.Q. or academic aptitude test score).

4. Each subtest of the achievement test measures a fairly unidimensional trait (i.e., achievement in reading or mathematics or science). This assumption is critical for the indices using the Rasch statistics, but it also underlies the assumptions made to calculate the other indices.

Educational Significance

The potential value of person-fit indices has been cited by Frary (1982), Harnisch (1983), Harnisch and Linn (1981), Levine and Rubin (1979), Rudner (1983), and Van der Flier (1982). These writers suggest that these indices could be useful for the following purposes:

1. To identify individuals for whom the test is inappropriate or invalid. Total test score interpretation can be misleading for examinees who come from different experiential backgrounds or take the test under different motivational dispositions, e.g., test anxiety.

2. To identify groups with different instructional practices or histories, which could change the difficulty of the items, e.g., schooling differences.

3. To identify items that are inadequate for particular groups of examinees.

Presently person-fit indices are considered to be at a state of development where more research is needed to investigate their psychometric properties and establish their applicability. The reasons why some people are misfits are not clear. If test anxiety can be identified as a factor associated with person misfit, then the interpretive value of person-fit statistics would be enhanced.

Another pragmatic contribution of this study is to extend the body of research on person-fit statistics by providing information about 1) the agreement of person-fit classifications across different subject matter content areas, as measured by the subtests of the Metropolitan Achievement Test, and 2) the degree of agreement of person-fit classifications by different indices.

Summary

Analysis of item-response patterns provides information not contained in a total test score. Although the idea of using response pattern information probably originated when Guttman (1941) introduced the scalogram technique, it has not been until the late 1970s and early 1980s that a strong interest in person-fit statistics has developed. Person-fit statistics quantify the degree of deviation of an examinee's response pattern from the expected response pattern. The development and application of person-fit indices is at a fledgling stage. More research is needed to investigate their psychometric properties and establish their applicability. Attempts to identify

causes of person misfit have remained mainly speculative. Recently Frary (1982) and Harnisch and Linn (1981) have suggested that test anxiety may be one factor that can explain erratic performance of an examinee within a given test (e.g., missing easy items while answering more difficult items correctly). According to drive theory, the relationship between test anxiety and performance is moderated by level of ability and task difficulty. Performance on specific items might not only be dependent on the item's difficulty and the examinee's ability but also on the examinee's test anxiety.

The primary purpose of this study was to establish 1) the degree of linear relationship between test anxiety and an examinee's level of misfit, 2) the degree of linear relationship between ability (total score on achievement test) and level of misfit, and 3) the extent that variance in person misfit can be explained by a linear combination of ability level, test anxiety, and their interaction. Five different indices of person fit were used in this study: the MCI, PB, NCI, Rx², and ECI. A secondary interest was to investigate the relationship among the five selected person-fit indices within and across subtests of a norm-referenced achievement battery and to estimate their internal consistencies.
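Chapter I defined the modified caution index in terms of an S-P table. As an illustration not found in the dissertation itself, the computation can be sketched in Python; the function name is hypothetical, and the sketch assumes a table already sorted with rows in descending score order and columns with the easiest items first.

```python
import numpy as np

def mci(u):
    """Modified caution index (Harnisch & Linn, 1981) for each row of an
    S-P table `u`. 0 indicates a perfectly Guttman-consistent pattern;
    1 indicates a maximally divergent pattern."""
    u = np.asarray(u, dtype=float)
    n_dot_j = u.sum(axis=0)              # correct answers per item
    values = []
    for row in u:
        n_i = int(row.sum())             # examinee's number-correct score
        if n_i == 0 or n_i == len(row):  # index undefined at the extremes
            values.append(0.0)
            continue
        easy, hard = n_dot_j[:n_i], n_dot_j[n_i:]
        num = ((1 - row[:n_i]) * easy).sum() - (row[n_i:] * hard).sum()
        den = easy.sum() - n_dot_j[-n_i:].sum()
        values.append(num / den if den else 0.0)
    return np.array(values)
```

For example, in a table where every examinee but one shows the ideal pattern, the Guttman-consistent rows receive an MCI of 0, while an examinee who misses the easiest items and passes the hardest ones receives a value near the maximum of 1.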

CHAPTER II
REVIEW OF LITERATURE

The two central aspects of this study are person-fit statistics and test anxiety. These two topics provide the major themes for the organization of the literature review presented in this chapter.

Person-Fit Measures

Historical Background

During the late 1970s and early 1980s there has been an increasing interest in the development and application of statistical indices to identify examinees with aberrant item-response patterns. Proponents of person-fit statistics indicate that these indices add to the information provided by total scores and can also be used to identify potentially inaccurate total scores (Frary, 1982; Harnisch, 1983; Harnisch & Linn, 1981; Rudner, 1983). This trend toward using information from item-response patterns is not new. According to Gaier and Lee (1953),

one of the most promising trends in current psychometric research is an increasing concern with methods of evaluating patterns of test scores and test responses . . . our initial hypothesis is that consideration of response configurations will yield more fruitful results than the usual methods of reporting merely the total score for a test . . . a total score may thus carry considerably less diagnostic significance than a direct and detailed analysis of test responses per se. (p. 140)

Guttman (1941, 1950) was one of the first writers to suggest that some persons respond consistently to a given set of ordered stimuli (test items) while others do not. According to Guttman (1950), "a person who endorses a more extreme statement . . . should endorse all less extreme statements if the statements are to be considered a scale" (p. 62). Guttman's description of the basic procedure for the scalogram technique of scale analysis is very similar to Sato's S-P chart construction. He states that there are two basic steps in the scalogram pattern formation. These are

first, the questions are ranked in order of "difficulty" with the "hardest" questions, i.e., the ones that fewest persons got right, placed first and with the other questions following in decreasing order of "difficulty." Second, the people are ranked in order of "knowledge" with the "most informed" persons, i.e., those who got all questions right, placed first, the other individuals following in decreasing order of "knowledge." (Guttman, 1950, p. 70)

Sato's S-P chart is also a two-dimensional matrix where the rows represent the students ranked from highest to lowest according to total test score (cited in Harnisch & Linn, 1981). The columns represent the items arranged from left to right in ascending order of difficulty. Construction of a scalogram pattern and an S-P table follow the same two steps. Once the responses are organized in this fashion, a concordant response pattern is defined as the case when an examinee answers the items correctly until he or she reaches an item that is too difficult and answers all items incorrectly thereafter. Some disruption of a perfect pattern can happen. As the response pattern deviates more from the expected pattern, the degree of aberrance increases. Visual identification of aberrant or erratic response patterns becomes increasingly more difficult as the number of

items in a test increases. The number of possible response patterns multiplies as the number of items increases. With the recent introduction of a variety of statistics to measure the degree of deviation from a typical response pattern, there is a renewed interest in using response pattern information.

Types of Person-Fit Indices

Indices measuring the degree of unusual response patterns can be categorized into three major types: norm-comparison indices, goodness-of-fit indices, and extended indices.

Norm-comparison indices, which are based on observed patterns of right and wrong answers and are calculated with summary statistics based on the norm group, include Sato's caution index (cited in Harnisch & Linn, 1981), the modified caution index (Harnisch & Linn, 1981), the agreement, disagreement, and dependability indices proposed by Kane and Brennan (1980), the U' index by Van der Flier (1977), the personal biserial by Donlon and Fischer (1968), and the norm conformity index by Tatsuoka and Tatsuoka (1980). Van der Flier's U' index and Tatsuoka and Tatsuoka's norm conformity index have been reported to have a perfect negative relationship (Harnisch & Linn, 1981). Norm-comparison indices are calculated by using information organized in an S-P table. They indicate the degree of aberrance from the expected response pattern, when examinee ability is defined as the total observed score on the test.

Goodness-of-fit or "appropriateness" indices are based on item response theory (IRT) (Levine & Rubin, 1979). As with norm-comparison indices, goodness-of-fit indices are also based on the expected


response pattern for an examinee at a given ability level. The distinction is that for goodness-of-fit indices a more sophisticated definition of "ability" is employed. Instead of simply equating ability with the examinee's observed raw score on the test, ability is defined in terms of his or her estimated score on a theoretical latent continuum underlying test performance. There are two popular IRT models that estimate examinee abilities based on the latent trait underlying test performance. For Rasch's one-parameter logistic model, examinee ability estimates are determined as a function of item difficulty parameters. A widely used computer program, BICAL, written by Wright et al. (1979), provides examinee ability estimates, item difficulty parameter estimates, and a person-fit statistic (Rx²) which "indicates how well the individual's item response pattern and the Rasch model fit" (Rudner, 1982, p. 4). The second widely used IRT model is Birnbaum's three-parameter logistic model (Lord & Novick, 1968, Ch. 17), for which examinee ability estimates are determined as a function of item difficulty, item discrimination, and guessing parameters. Levine and Rubin (1979) developed three types of appropriateness indices based on Birnbaum's three-parameter logistic model. These approaches are the marginal probability, the likelihood ratios, and the estimated ability variation indices. A practical limitation in using these procedures arises from the large sample sizes usually recommended to obtain stable estimates from the three-parameter model (Hambleton & Cook, 1977). Levine and Rubin devised a simulation of item response data on the Scholastic Aptitude Test (SAT) to conform to normal or aberrant response patterns. Their findings indicate that all three types of


goodness-of-fit indices demonstrate the capability to detect aberrance when present.

Extended caution indices have been proposed by Tatsuoka and Linn (1981, 1983) as a link between the norm-comparison and the goodness-of-fit indices. They linked Sato's S-P theory and item response theory by replacing the original observed zeros and ones of the item scores with IRT probabilities of passing the items. These probabilities were then used in the calculation of the caution indices. Five variations of the extended caution index were created. These extended caution indices are defined as "linear transformations of the covariance or correlation between a person's response pattern and a theoretical curve" (Tatsuoka & Linn, 1983, p. 95). Their findings support the effectiveness of the extended indices in identifying examinees who use erroneous rules in answering arithmetic test problems. Tatsuoka and Linn (1983) point out that these indices can have instrumental utility by identifying students who consistently make errors because of misconceptions.

Comparative Studies of Person-Fit Indices

There have been several studies in which the relationship and effectiveness of person-fit indices have been compared. Harnisch and Linn (1981) made a comparative analysis of ten norm-comparison indices. Using mathematics and reading tests from a statewide assessment program, they also examined school and regional differences. The intercorrelations between these indices ranged from |.13 to .99| for mathematics and from |.34 to .96| for reading. They found that Kane and Brennan's (1980) agreement index had the lowest correlation


with the other indices, but had the highest correlation (.99) with total score. The modified caution index (MCI) was found to have the lowest correlation with total score (-.02 for mathematics and -.21 for reading). Harnisch and Linn (1981) found significant school and regional differences in students' response patterns as measured by the MCI.

Rudner (1982, 1983) evaluated nine indices; four were norm-comparison indices, while five were goodness-of-fit indices. He generated data by simulating examinees and their responses through Birnbaum's three-parameter model. Response patterns were altered to simulate spuriously high or low respondents. Findings indicated that the norm-comparison indices (point biserial correlation, PB, NCI, and MCI) and the weighted total fit mean square or Rx² were highly intercorrelated (|.77 to .99|). The goodness-of-fit indices using Birnbaum's three-parameter model and the unweighted total fit mean square had lower intercorrelations (|.17 to .80|). Validity of the indices was tested by observing how sensitive they were to assessment accuracy. The MCI and the NCI identified comparable proportions of examinees with aberrant response patterns. According to Rudner, "these two approaches were the most stable of the statistics" (Rudner, 1983, p. 217). In general, indices based on IRT showed better detection rates of aberrant response patterns than the norm-comparison indices.

Frary (1982), using teacher-made multiple-choice tests, compared three person-fit measures: the Rx², the MCI, and a weighted choice index. In the weighted choice index, distractor choice is considered as part of the estimation of person-fit. The Rx² and the MCI were


found to be highly correlated (.75). The smallest relationship between any two of the three indices was between the Rx² and the weighted choice index (.42). In this study Frary was the first to compute and report person-fit internal consistency estimates. Low and even negative split-half coefficients of the person-fit measures in his study were found (Frary, 1982).

Person-Fit Indices Under Study

The present study is the first to include all three different types of person-fit statistics (i.e., the norm-comparison indices, the goodness-of-fit indices, and the extended indices). The five indices under study are the modified caution index (MCI), the personal biserial correlation (PB), the norm-conformity index (NCI), the Rasch person-fit statistic (Rx²), and the extended caution index (ECI). The MCI was chosen for use in this study because it was found to be least related to total test scores (Harnisch & Linn, 1981) and is considered stable with short and long tests (Rudner, 1983). The PB was selected because it has been in use for a longer period of time than more recent indices and is generally associated with classical test theory. Its computations are relatively simple and it has been found to be very efficient with shorter classroom tests (Rudner, 1983). For these reasons the PB could be useful to a larger number of practitioners. The NCI has been found to correlate with total score somewhat higher than other indices (Harnisch & Linn, 1981), but it has nevertheless been recurrently used in different research studies (Harnisch & Linn, 1981; Rudner, 1982, 1983; Van der Flier, 1977, 1982). The NCI and the MCI are considered to be the most applicable


and stable under situations with long and short tests and spuriously high or low scores (Rudner, 1983). The Rx² index was selected as part of the indices used in this study due to the availability of the BICAL computer program. The convenience of having the Rx² computations given as part of the output from the BICAL computer program makes the Rx² index more usable to practitioners. The Rx² is a goodness-of-fit or appropriateness type of person-fit index. It uses the Rasch one-parameter logistic model to estimate ability and item difficulty. Appropriateness indices requiring use of the three-parameter logistic model were not feasible for the present study because of the larger sample size recommended to get consistent ability parameter estimates (Hambleton & Cook, 1977). The ECI represents a link between norm-comparison indices and goodness-of-fit indices. Since no comparisons of the ECI with other indices or computations with actual data are available in the literature, this index was included to evaluate its relationship to the other indices. It is probably noteworthy that most previous studies of multiple measures of person-fit have focused primarily upon intercorrelations among these indices without investigating how they correlate with measures of any trait other than achievement itself, as measured by the test. The present study is somewhat broader in scope, since it investigates how these indices relate to another variable, test anxiety.

Test Anxiety

Most research on test anxiety has considered the effects of test anxiety on total score. According to Tryon (1980) test anxiety


research findings present a consistent moderate negative correlation between test anxiety and total score measures of achievement. High-test-anxious individuals tend to score lower on classroom and aptitude tests (Alpert & Haber, 1960; Harper, 1974; Mandler & Sarason, 1952; I. Sarason, 1963, 1975; Spielberger, Gonzalez, Taylor, Algaze, & Anton, 1978). Several researchers have tried to explain why test anxiety affects performance. According to the cognitive attentional theory of test anxiety (CATTA), introduced by Sarason (1960) and extended by Wine (1971, 1980), the "major cognitive characteristics of test anxious persons are negative self-preoccupation, and attention to evaluative cues to the detriment of test cues" (Wine, 1980, p. 371). This misdirection of attention, both in the pre-stages of evaluation (study phase) and the test-taking situation, may limit coding, retention, and retrieval of information by high-test-anxious individuals. Difficulty of the task (e.g., difficult items) is expected to negatively affect attention. Thus, according to CATTA, performance of test-anxious persons will be negatively affected.

The Spence-Taylor drive theory also predicts the effect of anxiety on performance of tasks with varying levels of difficulty. Heinrich and Spielberger (1982) summarize these predictions according to the difficulty of the task. They explain that for high-anxious students the performance of a task is dependent on its difficulty. High anxiety may facilitate performance on easy tasks, interfere with performance on harder tasks, and be dependent on the stage of learning for tasks of intermediate difficulty. Heinrich and Spielberger (1982) explain the relationship between performance and the learning stage.


According to these authors, "high anxiety will be detrimental to performance early in learning when the strength of correct responses is weak relative to competing error tendencies. Later in learning, high anxiety will begin to facilitate performance as correct responses are strengthened and error tendencies are extinguished" (Heinrich & Spielberger, 1982, p. 146). Varying ability levels and their relationship with anxiety and task difficulty are also considered by the Spence-Taylor drive theory. According to Spielberger (1971) the effect of anxiety on subjects with different ability levels will be subject to the task difficulty and the learning stage considered. These two theories, the CATTA and the Spence-Taylor drive theory, point to the possibility that test anxiety might have an effect at the item level and that this effect might be dependent on ability level. Person-fit statistics measure the deviation from an expected response pattern. According to the theory of person-fit, in a good fit to a response pattern "high ability examinees are expected to get few easy items wrong," while "low ability examinees are expected to get few difficult items right" (Rudner, 1983, p. 207). If test anxiety affects performance at the item level, it might be a factor which contributes to erratic performance for high-anxiety examinees.

Summary

Literature pertinent to person-fit indices and test anxiety has been reviewed in this chapter. Recent literature on person-fit measures shows an increasing interest in the development and application of these indices. The idea of using information provided


by response patterns is not new; it was first introduced by Guttman (1941) with the scalogram technique. During the late 1970s and early 1980s a number of different person-fit statistics were developed as measures of the degree of deviation from a typical response pattern. These indices can be categorized into three major types: norm-comparison indices, goodness-of-fit indices, and extended indices. Five person-fit indices were selected for use in this study. Three of these indices are norm-comparison indices (MCI, PB, and NCI), one is a goodness-of-fit index (Rx²), and one belongs to the extended category of indices (ECI). Three major research studies comparing person-fit indices were reviewed. Harnisch and Linn (1981) compared ten norm-comparison indices, and concluded that the MCI seemed to be the most promising due to its lower correlation with total score. Rudner (1983) evaluated four norm-comparison indices and five goodness-of-fit indices. He found that goodness-of-fit indices showed better detection rates of aberrant response patterns. Frary (1982) contributed to the development of person-fit indices by being the first to study their internal consistency. He found low split-half reliabilities. Test anxiety has been suggested as a factor that could affect performance within a test (Frary, 1982; Harnisch & Linn, 1981). Two theories, the CATTA and the Spence-Taylor drive theory, predict that high test anxiety will negatively affect performance. These two theories point to the possibility that test anxiety might have an effect at the item level and that this effect might be dependent on ability level. Test anxiety could thus be a factor that contributes to person-misfit.
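The response-pattern logic reviewed in this chapter can be made concrete with a short sketch. The code below is illustrative only (it is not the programs used in the study): it sorts a 0/1 response matrix into S-P order (most able examinee first, easiest item first) and counts each examinee's deviations from the perfect Guttman pattern implied by his or her total score.

```python
# Illustrative sketch, not the study's programs: S-P ordering and
# deviations from a perfect Guttman response pattern.

def sp_order(responses):
    """Sort rows (examinees) by descending total score and columns (items)
    by descending p value (easiest item first), as in an S-P table."""
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    p_values = [sum(row[j] for row in responses) / len(responses)
                for j in range(n_items)]
    rows = sorted(range(len(responses)), key=lambda i: -totals[i])
    cols = sorted(range(n_items), key=lambda j: -p_values[j])
    return [[responses[i][j] for j in cols] for i in rows]

def guttman_errors(row):
    """Count disagreements between a response pattern (easiest item first)
    and the perfect pattern for the same total score: all 1s, then all 0s."""
    total = sum(row)
    ideal = [1] * total + [0] * (len(row) - total)
    return sum(r != i for r, i in zip(row, ideal))

matrix = sp_order([[0, 1, 1], [1, 1, 0], [1, 1, 1]])
print(matrix)                                   # -> [[1, 1, 1], [1, 0, 1], [1, 1, 0]]
print([guttman_errors(row) for row in matrix])  # -> [0, 2, 0]
```

The middle examinee misses an easy item while passing a harder one, so two cell-wise disagreements with the ideal pattern are counted; norm-comparison indices quantify exactly this kind of departure.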


CHAPTER III
METHODOLOGY

The present study was designed to investigate the relationship between an examinee's level of test anxiety and each of five different person-fit statistics and to establish if this relationship is dependent on ability level. A second purpose was to investigate the correlation of person-fit indices within and across different subject areas of a standardized achievement test battery and to assess the internal consistency of person-fit indices. An existing data set was analyzed to explore the nature of these relationships. A description of the examinee group, instruments, data-file creation, and data analysis methods is presented in this chapter.

Examinees

The data pool used in this study consisted of test scores and item responses from 225 seventh-graders and 188 eighth-graders from a metropolitan middle school in north central Florida. There was an almost even distribution of boys and girls at each grade level. Approximately 70% of the examinees were white and 30% were black at each grade level. The school population is heterogeneous with respect to socio-economic level.


Instruments

The Test Anxiety Scale for Adolescents (TASA) (Schmitt & Crocker, 1982) was used to measure test anxiety as a trait. This instrument is a modified version of the 37-item Mandler-Sarason Test Anxiety Scale (Sarason, 1972). This scale consists of 31 true-false items and is designed for use with examinees in middle school or junior-high grades. Unlike most other test anxiety scales for children, all items on the TASA deal exclusively with examinee feelings about tests. Sample items include "I worry just before getting a test back" and "Sometimes on a test I just can't think." Schmitt and Crocker (1982) have found the factor structure of the TASA to be fairly similar to that reported for the adult test anxiety scales. They reported a KR-20 of .87 as a total score reliability estimate for these middle school examinees.

The Metropolitan Achievement Test (MAT) subscales (Form KS) were used to measure achievement in reading, mathematics, and science (Prescott, Balow, Hogan, & Farr, 1978). The TASA was administered in March, 1981, approximately two weeks prior to the school-wide administration of standardized achievement tests. The MAT was administered by school staff as part of the school district's regular testing program approximately two weeks after the test anxiety scale was given. The range of item difficulties of the MAT subscales for the seventh and eighth grades is respectively: .21 to .99 and .28 to .99 for reading; .20 to .97 and .31 to .97 for mathematics; and .29 to .94 and .34 to .98 for science.
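The KR-20 total-score reliability coefficient mentioned above is computed from the item p values and the total-score variance. A minimal sketch follows, using a made-up 0/1 response matrix rather than the TASA data:

```python
def kr20(matrix):
    """KR-20 = (k/(k-1)) * (1 - sum(p*q) / var(total)), for a matrix of
    0/1 item scores (rows = examinees, columns = items)."""
    n, k = len(matrix), len(matrix[0])
    p = [sum(row[j] for row in matrix) / n for j in range(k)]
    totals = [sum(row) for row in matrix]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n  # population variance
    return (k / (k - 1)) * (1 - sum(pj * (1 - pj) for pj in p) / var_t)

# Illustrative 4-item, 4-examinee matrix (not the study's data):
data = [[1, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]]
print(round(kr20(data), 2))  # -> 0.84
```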


Creation of the Data File

The test anxiety scores, coded with student ID number but no other identifying information, were obtained in conjunction with a University of Florida College of Education inservice training project for school personnel on identifying and counseling test-anxious students. The researcher later obtained a set of MAT test item responses for students with those same ID numbers from the county school district testing office. This data file also contained some demographic information (i.e., sex, race, and grade level). The data file used for analysis in this study was created by matching the two examinee data files on student ID number and merging the files. Thirty-seven of the students' records (13 of the seventh graders and 24 of the eighth graders) in the merged file were deleted later, because it was found that these students had been tested out-of-grade level, rather than on the test form taken by their grade peers.

Calculation of Person-Fit Statistics

For each examinee on each subtest, five different indices of person-fit were calculated. These were the modified caution index, the personal biserial correlation, the norm-conformity index, the Rasch person-fit index, and the extended caution index. To create the data file containing the five person-fit indices for each MAT subtest at each grade level, the original item-examinee response matrix was used. Each examinee's response* to each item was coded 0 or 1 in this matrix. The data on this matrix were used to

*Blanks or omitted responses were treated as incorrect responses and assigned a 0 value.


1. compute total scores to get an ability estimate for each student;
2. compute a mean score for each item as an estimate of item difficulty (p value); and
3. reorganize the data into an S-P matrix, by sorting by total score and by item difficulty.

The resulting matrix had students organized by ability (from most able to least able) and items organized by difficulty (from easiest to hardest). This S-P matrix was used to calculate the five person-fit indices, using computer programs written by this author for each person-fit statistic. Refer to Chapter I for definitions of the formulas. These statistics were programmed using the Statistical Analysis System (SAS) package (Helwig & Council, 1979). The accuracy of each programmed computation was tested using the dummy data set given by Harnisch and Linn (1981, p. 136).

Analysis

Means, standard deviations, and minimum and maximum values by grade were computed for each person-fit statistic, ability measure, and test anxiety score. For the person-fit indices and ability measures these descriptive statistics were calculated for each subtest of the MAT (reading, mathematics, and science). Correlations among fit statistics, between person-fit and test anxiety, and between person-fit and ability measures were calculated. To investigate the relationship between examinees' level of test anxiety and degree of person-fit and to study if this relationship is dependent upon ability level, a linear multiple regression analysis


was used. In this analysis person-fit measures were the dependent variables and ability (reading, mathematics, or science) and TASA were the continuous independent variables. The model used for each ability measure and person-fit index is

Y' = b0 + b1X1 + b2X2 + b3X1X2

where
Y' = person-fit predicted by the model,
b0 = intercept value,
b1 = regression slope for the ability independent variable,
X1 = ability, estimated from total score,
b2 = regression slope for the TASA independent variable,
X2 = TASA,
b3 = regression slope for the interaction of ability and TASA, and
X1X2 = interaction between ability and TASA.

To estimate the internal consistency of person-fit indices, items for each MAT subtest at each grade level were divided into odd and even subtests. The original sequential test item number was used for this split. Odd-item and even-item person-fit statistics were computed by following the sequence of steps previously described. The fit index for the odd items was correlated with the fit index for the even items, and the resulting correlation was corrected using the Spearman-Brown formula to obtain an internal consistency estimate for the full-length test.

Summary

A linear multiple regression analysis was used to investigate the relationship between an examinee's level of test anxiety and each of five different person-fit statistics and to study if this


relationship is dependent on ability level. The five person-fit indices included in this study were the modified caution index, the personal biserial correlation, the norm-conformity index, the Rasch person-fit index, and the extended caution index. A data set of 225 seventh-grade and 188 eighth-grade examinees' responses to the Test Anxiety Scale for Adolescents and to the Metropolitan Achievement Test reading, mathematics, and science subscales was used to compute person-fit statistics and explore the nature of these relationships. Correlations of person-fit statistics between and within the different subject area tests were examined. Person-fit split-half reliabilities were also computed for each index for each content area and grade level.
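The Spearman-Brown step-up used for the split-half reliabilities works as follows; this is a minimal sketch, not the study's SAS code:

```python
def spearman_brown(r_half):
    """Project an odd-even (half-test) correlation to full test length:
    r_full = 2r / (1 + r)."""
    return 2 * r_half / (1 + r_half)

print(spearman_brown(0.5))   # -> 0.6666...
print(spearman_brown(-0.2))  # a negative half-test correlation stays negative
```

Note that a negative odd-even correlation remains negative after the correction, which is how the low and even negative person-fit reliabilities reported by Frary (1982) can arise.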


CHAPTER IV
RESULTS

The present study was undertaken to answer the following questions for each of five selected indices of person-fit:

1. What is the degree of linear relationship between test anxiety and an examinee's level of misfit?
2. What is the degree of linear relationship between ability (as defined by performance on the current test) and level of misfit?
3. To what extent is variance in person misfit explained by a linear combination of the variables: ability level, test anxiety, and their interaction?

Analyses were also performed to explore the degree of relationship among the five selected person-fit indices and to determine their split-half reliabilities. The results of the analyses presented in this chapter have been organized beginning with results of simpler analyses and proceeding to those of greater complexity. The issue of the degree of relationship among the five person-fit indices within and across subtests of the MAT is addressed in the next section, Descriptive Statistics. Data relevant to questions 1 and 2 (dealing with the bivariate relationships between test anxiety and misfit and between ability level and misfit) are presented next. Results of multiple regression analyses, relevant to question 3, are presented in the third section of this chapter. The final section of this chapter contains the results


of the investigation of the internal consistency of person-fit indices.

Descriptive Statistics

The mean, standard deviation, and minimum and maximum values for person-fit measures by grade and ability subtests are presented in Table 2. Means and standard deviations for each person-fit index are very similar across grades and ability subject tests. For the reading test, person-fit measures show the greatest dispersion between the minimum and maximum scores. For this subtest of the MAT, the maximum score observed for the personal biserial correlation in both the seventh and eighth grade was greater than one. This was not a computation error but may be ascribed to sampling error or violation of the assumption of an underlying normal distribution for the dichotomous item response variable. Lord and Novick discuss conditions under which biserial correlations may exceed 1.00 (Lord & Novick, 1968, p. 339). Descriptive statistics for ability tests and test anxiety for the seventh and eighth grades are presented in Table 3. Person-fit measures' intercorrelations for each grade level are presented in Table 4. Seventh-grade intercorrelations are shown above the diagonal and eighth-grade intercorrelations below the diagonal. Results of these correlations indicate that person-fit measures are highly related within the same subject test area. These correlations ranged from |.78 to .99| and are all significant at an alpha level of .0001. However, across tests in different subject areas, there was little or no relationship between the same person-fit measure. The Rasch


Table 2
Descriptive Statistics for Person-Fit Measures by Grade and Ability Test

[The body of Table 2 is not legible in this copy. It reports the mean, standard deviation, and minimum and maximum values of the MCI, PB, NCI, Rx², and ECI on the reading, mathematics, and science tests for the seventh and eighth grades; footnotes give sample sizes of approximately 224-225 for the seventh grade and 184-188 for the eighth grade, varying by subtest.]
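Lord and Novick's point about out-of-range biserials can be illustrated with the standard biserial formula, which rescales the mean difference on the continuous variable by the normal-curve ordinate at the dichotomy's split. This is an illustrative sketch with made-up numbers, not the study's personal-biserial computation:

```python
from statistics import NormalDist, mean, pstdev

# Hedged illustration (not the dissertation's data): the biserial assumes a
# normal latent variable underlies the 0/1 split; when that assumption is
# violated the coefficient can exceed 1.0.
def biserial(y, x):
    """y: continuous scores; x: 0/1 dichotomy for the same cases."""
    y1 = [yi for yi, xi in zip(y, x) if xi == 1]
    y0 = [yi for yi, xi in zip(y, x) if xi == 0]
    p = len(y1) / len(y)
    q = 1.0 - p
    nd = NormalDist()
    ordinate = nd.pdf(nd.inv_cdf(p))  # height of N(0,1) at the p/q split
    return (mean(y1) - mean(y0)) / pstdev(y) * (p * q / ordinate)

# A markedly non-normal y pushes the coefficient past 1.0:
r = biserial([1.0, 1.0, 1.0, 0.0], [1, 1, 1, 0])
print(round(r, 2))  # -> 1.36, exceeding 1.0
```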


Table 3
Descriptive Statistics for Ability Tests and Test Anxiety by Grade

[The body of Table 3 is not legible in this copy.]


Table 4
Intercorrelations of Person-Fit Measures by Grade (Seventh Grade Above the Diagonal, Eighth Grade Below)

[The body of Table 4 is not legible in this copy.]


person-fit measure (Rx²) had the lowest relationship to other indices within the same subject test. Interestingly, Rx² is the only index with significant intercorrelations with other indices across subject areas.

Relationship Between Person-Fit, Ability, and Test Anxiety

Correlations between person-fit measures, total ability measures of reading, mathematics, and science, and test anxiety for each grade are presented in Table 5. Correlations between person-fit measures and their corresponding total ability scores ranged from |.00 to .50|. In general the personal biserial (PB) index had the lowest correlation with total score. This lower correlation was consistently observed across subject test areas and grade level. Correlation values between person-fit measures and test anxiety ranged from |.02 to .22|. The highest correlation between person-fit measures and test anxiety occurred for seventh-grade science person-fit values and for eighth-grade reading person-fit values.

Relationship Between Person-Fit and a Linear Combination of Ability, Test Anxiety, and Their Interaction

Multiple regression results for the model with ability, test anxiety, and their interaction are summarized in Table 6. Values of R² for the model at each test area are reported. Models with significant ability and test anxiety interactions are identified by having their corresponding R² underlined. For the seventh-grade sample, a significant proportion of variance in the modified caution index was explained for each of the three subject area tests by the


Table 5
Correlations Between Person-Fit Measures and Total Scores on Corresponding Ability Test and Test Anxiety

                    Seventh Grade        Eighth Grade
Person-Fit         Total     TASA       Total     TASA
Reading Test
  MCI               .23**     .18**      .32**     .07
  PB               -.05       .10        .18*     -.15*
  NCI              -.03       .03        .20*     -.15*
  Rx²              -.11      -.02        .16*      .17*
  ECI               .02      -.08        .05       .14
Mathematics Test
  MCI              -.15*      .03        .09       .06
  PB                .04       .04        .01      -.03
  NCI               .20**    -.05        .17*     -.08
  Rx²              -.32**     .10        .30**     .11
  ECI              -.23**     .05        .20**     .08
Science Test
  MCI              -.25**     .17*       .45**     .03
  PB                .08      -.09        .29**     .02
  NCI               .22**    -.16*       .44**    -.03
  Rx²              -.39**     .22**      .50**     .08
  ECI              -.28**     .18**      .50**     .04

Note: Higher misfit is represented by higher values on the MCI, Rx², and ECI and by lower values on the PB and NCI.
*Significant at α = .05. **Significant at α = .01.


Table 6
Percentage of Variance (R²) in Person-Fit Explained by the Combination of Examinee Ability, Test Anxiety, and Their Interaction

                        Person-Fit
Test             MCI     PB      NCI     Rx²     ECI
Seventh Grade
  Reading        .11**   .04*    .01     .02     .03
  Mathematics    .04*    .03     .05*    .11**   .06**
  Science        .07**   .01     .06**   .16**   .09**
Eighth Grade
  Reading        .12**   .05*    .05     .05*    .02
  Mathematics    .04     .04     .06*    .10**   .07**
  Science        .23**   .13**   .21**   .27**   .25**

*Significant at α = .05. **Significant at α = .01. Underlined R² values in the original mark models with a significant ability and test anxiety interaction at α = .05.


linear* combination of test anxiety, ability, and their interaction. Only in reading, however, was the percentage of variance explained greater than 10%, and in this case the interaction of ability and test anxiety was significant. For the personal biserial, the norm-conformity index, and the extended caution index, although several significant R² values were observed, these were all less than .10. Thus it is difficult to interpret these relationships as being substantially important. For the Rasch index of person-fit, substantial proportions of variation were explained by the model in the areas of math and science. No significant interaction effect of ability and test anxiety was detected in either of these cases. For the eighth grade, the modified caution index again appears to be sensitive to variation in examinees' level of test anxiety, ability, and their interaction. The significant R² values exceeded 10% for this index in reading and science. Science had the largest R² (.23) and a significant interaction between test anxiety and ability. For the personal biserial, the norm-conformity index, and the extended caution index, significant R² values greater than .10 were found only in the area of science, and for each of these cases the interaction between ability and test anxiety contributed significantly to the overall model. Significant (and meaningful) R² values for the Rx² index were found for the areas of science and mathematics without any interaction between ability and test anxiety.

*Curvilinear relationships between person-fit indices and ability and TASA measures were tested and found not significant.
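The model that produced these R² values, Y' = b0 + b1X1 + b2X2 + b3X1X2, can be fit by ordinary least squares as sketched below. The data and coefficients are simulated for illustration (the study used SAS on real scores), and numpy is assumed to be available.

```python
import numpy as np

# Simulated data: ability (X1), TASA (X2), and a person-fit index built
# from illustrative coefficients plus noise. Nothing here is the study's data.
rng = np.random.default_rng(0)
n = 200
ability = rng.normal(35.0, 8.0, n)
tasa = rng.normal(14.0, 6.0, n)
noise = rng.normal(0.0, 0.05, n)
misfit = 0.25 - 0.002 * ability + 0.007 * tasa - 0.0002 * ability * tasa + noise

# Design matrix: intercept, ability, TASA, ability x TASA interaction.
X = np.column_stack([np.ones(n), ability, tasa, ability * tasa])
b, *_ = np.linalg.lstsq(X, misfit, rcond=None)   # b = [b0, b1, b2, b3]

pred = X @ b
r2 = 1 - np.sum((misfit - pred) ** 2) / np.sum((misfit - np.mean(misfit)) ** 2)
print(np.round(b, 4), round(float(r2), 2))
```

Testing whether b3 differs from zero is what the "significant interaction" entries in Table 6 correspond to.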


For further interpretation of these results, the nature of the interaction effect of test anxiety and ability on person-fit was examined. Significant interactions were followed up by plotting the relationship between person-fit measures and TASA at selected ability levels. The formula to calculate the regression line at each ability level is

Y' = b0 + b1X1 + (b2 + b3X1)X2

At any value of ability (X1) a predicted person-fit measure (Y') was calculated for different points of TASA (X2). The intercept of this model equals (b0 + b1X1) and the slope equals (b2 + b3X1). Table 7 reports the slope and intercept estimates which were used in plotting these lines for all cases in which R² exceeded .10. The regression lines resulting from these computations are shown in Figures 1-5. It should be noted that Figures 2-5 are based on a single grade level and a single subject area. These plots depict the nature of the interaction of ability and test anxiety on various indices of person-fit. Although the points of intersection of these lines may vary, the same general pattern of relationship between person-fit and test anxiety occurs in all cases. Namely, this pattern of interaction can be characterized as follows:

1. For examinees in the average ability ranges, there is little or no relationship between test anxiety and person-misfit. (Note the "flat" slope of the regression line for Group E in all figures.)

2. As examinee ability level increases (see lines for Groups F, G, and H), the slope of the regression lines increases, generally


Table 7
Significant Ability and Test Anxiety Interactions and R² Increases for Person-Fit Measures

Dependent
Variable   Parameter            Estimate   Std. Error      t      R² Increase

Seventh Grade, Reading
  MCI      Intercept              .0543                    .46
           Slope-Ability          .0058      .0016        3.74**
           Slope-TASA             .0071      .0035        2.02*
           Slope-Ability*TASA    -.0002      .0001       -2.52*      .026
           (Model R² = .11**; F(3,220) = 3.30**)

Eighth Grade, Science
  MCI      Intercept              .3126      .0593        5.22**
           Slope-Ability         -.0013      .0015        -.85
           Slope-TASA             .0064      .0037        1.73
           Slope-Ability*TASA    -.0002      .0001       -2.15*      .020
           (Model R² = .23**; F(3,182) = 18.00**)

  PB       Intercept              .5323      .1199        4.44**
           Slope-Ability         -.0020      .0031        -.65
           Slope-TASA            -.0157      .0075       -2.22*
           Slope-Ability*TASA     .0005      .0002        2.66**     .024
           (Model R² = .13**; F(3,182) = 9.06**)

  NCI      Intercept              .3727      .1210        3.08**
           Slope-Ability          .0021      .0031         .55
           Slope-TASA            -.0144      .0075       -1.91
           Slope-Ability*TASA     .0004      .0002        2.33*      .023
           (Model R² = .23**; F(3,182) = 17.33**)

  ECI      Intercept              .2291      .2573
           Slope-Ability         -.0053
           Slope-TASA             .0327
           Slope-Ability*TASA                .0004
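The plotting computation behind Table 7 and Figures 1-5 reduces to evaluating the simple intercept (b0 + b1·X1) and simple slope (b2 + b3·X1) at each chosen ability level. A sketch using the eighth-grade science MCI estimates from Table 7 and the ability levels used for plotting:

```python
# Regression model: Y' = b0 + b1*ability + (b2 + b3*ability)*TASA
# Coefficients: eighth-grade science MCI estimates as reported in Table 7.
b0, b1, b2, b3 = 0.3126, -0.0013, 0.0064, -0.0002

# Ability levels A-H used for the science plots; E marks the sample mean.
levels = {"A": 16.76, "B": 21.33, "C": 25.89, "D": 30.46,
          "E": 35.02, "F": 39.59, "G": 44.15, "H": 48.72}

for name, ability in levels.items():
    intercept = b0 + b1 * ability          # simple intercept (b0 + b1*X1)
    slope = b2 + b3 * ability              # simple slope of TASA (b2 + b3*X1)
    print(f"{name}: intercept = {intercept:.4f}, slope = {slope:.4f}")
```

With these estimates the simple slope is positive at the lowest ability level, near zero at the mean, and negative at the highest level, which is exactly the crossover pattern described in the text.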


[Figure 1. Relationship between the modified caution index and test anxiety for seventh-grade examinees at different reading ability levels. The plot shows MCI (from .10, less misfit, to .38, more misfit) against TASA score (1.73 to 28.37), with regression lines for reading ability levels A = 12.35, B = 18.16, C = 23.97, D = 29.78, E = 35.59 (mean), F = 41.40, G = 47.21, H = 53.02.]
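The modified caution index plotted in Figure 1 is not defined in this chapter; for reference, a sketch of the usual formulation (Sato's caution index as modified by Harnisch and Linn, 1981). The index is 0 when an examinee passes exactly the n easiest items (n = number-correct score) and approaches 1 as the response pattern reverses:

```python
import numpy as np

def modified_caution_index(responses, p_values):
    """Modified caution index for one examinee (Harnisch-Linn formulation).

    responses: 0/1 item-response vector; p_values: group proportions-correct.
    Returns 0 for a perfect Guttman pattern, 1 for a fully reversed pattern.
    """
    order = np.argsort(p_values)[::-1]       # easiest (highest p) items first
    u = np.asarray(responses, float)[order]
    p = np.asarray(p_values, float)[order]
    n = int(u.sum())
    if n == 0 or n == len(u):
        return 0.0                           # perfect or zero scores fit trivially
    guttman = p[:n].sum()                    # weighted sum, Guttman-consistent pattern
    reversed_sum = p[-n:].sum()              # weighted sum, fully reversed pattern
    return (guttman - (u * p).sum()) / (guttman - reversed_sum)

p = np.array([.9, .8, .7, .5, .3, .2])       # illustrative item p-values
print(modified_caution_index([1, 1, 1, 0, 0, 0], p))  # Guttman-consistent
print(modified_caution_index([0, 0, 0, 1, 1, 1], p))  # fully reversed
```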


[Figure 2. Relationship between the modified caution index and test anxiety for eighth-grade examinees at different science ability levels. The plot shows MCI (from .14, less misfit, to .40, more misfit) against TASA score (1.29 to 26.21), with regression lines for science ability levels A = 16.76, B = 21.33, C = 25.89, D = 30.46, E = 35.02 (mean), F = 39.59, G = 44.15, H = 48.72.]


[Figure 3. Relationship between the personal biserial index and test anxiety for eighth-grade examinees at different science ability levels; only the vertical axis (.52 to .72, less misfit at the top) is recoverable from the source.]

[Figure 4. Relationship between the norm conformity index and test anxiety for eighth-grade examinees at different science ability levels. The plot shows NCI (.61 to .77, less misfit at the top) against TASA score (1.29 to 26.21).]


[Figure 5. Relationship between the extended caution index and test anxiety for eighth-grade examinees at different science ability levels. The plot shows ECI (from -.80, less misfit, to .56, more misfit) against TASA score (1.29 to 26.21), with regression lines for science ability levels A = 16.76, B = 21.33, C = 25.89, D = 30.46, E = 35.02 (mean), F = 39.59, G = 44.15, H = 48.72.]


indicating an increasing negative relationship between test anxiety and person-fit. Specifically, high-ability, low-anxious examinees show more misfit than high-ability, high-anxious examinees.

3. As examinee ability decreases (see the lines for Groups C, B, and A), the opposite effect occurs: low-ability, low-anxious examinees show less misfit than low-ability, high-anxious examinees.

Where no interactions were significant, only ability main effects were significant. Table 8 presents the Type I sums of squares, which measure the incremental sum of squares for the model as each variable was added. Ability main effects were significant in the models for the Rasch person-fit index in mathematics and science at both the seventh and eighth grades. There was also a significant main effect of reading ability in the model for the modified caution index at the eighth grade.

Internal Consistency of Person-Fit Statistics

Corrected split-half internal consistency reliability coefficients for the person-fit measures, by grade and subject content area, are reported in Table 9. For the seventh-grade sample, the internal consistency estimates ranged from .23 to .56; the highest split-half reliability estimates were found consistently for the reading subtest. For the eighth grade, the values ranged from .23 to .39, with a slight trend for reading to have the higher values. Among person-fit indices, the Rx² index had the highest internal consistency estimates for the eighth grade (ranging from .31 to .39), but for the seventh grade no person-fit index was consistently more


Table 8
Significant Main Effect of Ability as Predictor of Person-Fit Measures

Dependent
Variable   Source              df   Type I SS   Mean Square      F        R²

Seventh Grade, Mathematics
  Rx²      Model                3      .50          .17         3.20**   .11*
           Mathematics          1      .47          .47        25.30**
           TASA                 1      .01          .01          .47
           Mathematics*TASA     1      .02          .02         1.12
           Error              221     4.01          .02

Seventh Grade, Science
  Rx²      Model                3      .50          .17        13.73**
           Science              1      .49          .49        39.35**
           TASA                 1      .01          .01         1.21
           Science*TASA         1      .00          .00          .01
           Error              221     2.70          .01

Eighth Grade, Reading
  MCI      Model                3      .31          .10         7.37**
           Reading              1      .27          .27        20.38**
           TASA                 1      .04          .04         3.33
           Reading*TASA         1      .00          .00          .17
           Error              180     2.36          .01

Eighth Grade, Mathematics
  Rx²      Model                3      .44          .15         7.01**   .10*
           Mathematics          1      .33          .33        18.06**
           TASA                 1      .01          .01
           Mathematics*TASA     1      .05          .05         2.73
           Error              184     3.36          .02

Eighth Grade, Science
  Rx²      Model                3
           Science              1      .53          .53        53.32**
           TASA                 1      .01
           Science*TASA         1      .02
           Error              182                   .01

*Significant at .05. **Significant at .01.


Table 9
Corrected Split-Half Reliability Estimates for Person-Fit Measures by Grade and Ability Test

Test            MCI    PB     NCI    Rx²    ECI

Seventh Grade
  Reading       .56    .56    .49    .45    .29
  Mathematics   .28    .28    .18    .11    .25
  Science       .25    .20    .22    .19

Eighth Grade
  Reading       .35    .37    .31    .33    .33
  Mathematics   .29           .33    .39    .36
  Science       .25    .23    .26    .31    .25


reliable than the others. Overall, with the exception of the reading subtest, the reliabilities of the person-fit indices for the mathematics and science subtests appear consistent across the various indices, and are relatively low.

Summary

Results can be summarized as follows:

1. Descriptive statistics for person-fit measures, test anxiety, and ability scores were very similar across subject content areas and grades.

2. Intercorrelations between person-fit measures showed that these measures are highly related within the same subject content area. Across subject areas, little or no relationship was found.

3. Correlations between person-fit measures and their corresponding total ability scores ranged from |.00 to .50|. The PB index had the lowest correlation with total score.

4. Correlations between person-fit measures and test anxiety scores ranged from |.02 to .22|. In science, four of the five indices were significantly related to test anxiety scores for seventh graders. For eighth graders in reading, three of the five indices were significantly related to test anxiety.

5. A significant proportion of variance in person-fit measures was explained by the linear combination of test anxiety, ability, and their interaction for the seventh- and eighth-grade reading MCI and for the eighth-grade science MCI, PB, NCI, and ECI. Regression lines depicting the nature of these interactions were presented for selected ability levels. Significant and meaningful R² values


(greater than R² = .10) for the Rasch person-fit index were found for science and mathematics, without any interaction between ability and test anxiety.

6. Corrected split-half internal consistency estimates for the person-fit indices ranged from .20 to .56.
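The corrected split-half estimates in point 6 follow the standard odd-even procedure: correlate the two half-test scores, then step the correlation up to full test length with the Spearman-Brown formula, r' = 2r/(1 + r). A minimal sketch with synthetic half-test scores (the study's actual split used original item position):

```python
import numpy as np

def corrected_split_half(half_a, half_b):
    """Split-half reliability with the Spearman-Brown correction to full length."""
    r_half = np.corrcoef(half_a, half_b)[0, 1]
    return 2 * r_half / (1 + r_half)

rng = np.random.default_rng(1)
true_fit = rng.normal(size=200)                      # latent person-fit level (synthetic)
odd = true_fit + rng.normal(scale=1.5, size=200)     # noisy odd-item half scores
even = true_fit + rng.normal(scale=1.5, size=200)    # noisy even-item half scores
print(f"corrected split-half estimate: {corrected_split_half(odd, even):.2f}")
```

With half-test noise this large, the stepped-up coefficient stays well below 1, which mirrors the low reliabilities reported for the person-fit indices.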


CHAPTER V
DISCUSSION

This study was conducted to investigate the relationship between examinees' level of person-fit and test anxiety, and to study the effect of ability on this relationship. Five person-fit indices were calculated for seventh- and eighth-grade students who had taken a test-anxiety self-report measure and the reading, mathematics, and science subtests of the MAT. Discussion of the results will focus on findings about (1) the interrelationships among the five person-fit indices, (2) the relationship between person-fit, test anxiety, and ability, and (3) the reliability of the five person-fit statistics under study.

Relationships Among Person-Fit Statistics

Intercorrelations among measures of person-fit were quite high within same-subject content areas, with correlation values ranging from |.78 to .99|. These high intercorrelations among person-fit statistics confirm previous research findings by Harnisch and Linn (1981) and Rudner (1983). In particular, Harnisch and Linn found intercorrelations among the MCI, PB, and NCI that ranged from |.93 to .97| for mathematics tests and from |.89 to .95| for reading tests. Rudner's intercorrelations among the MCI, PB, NCI, and the Rx² for the simulated SAT test ranged from |.80 to .99| and


from |.77 to .99| for the simulated teacher-made biology test. These consistently high intercorrelations among the person-fit indices under study indicate that they seem to be measuring a common construct.

Relatively low correlations were found for any given index (MCI, PB, NCI, Rx², and ECI) across the reading, mathematics, and science tests. These correlations ranged from -.03 to .24. There seems to be no persistence of misfit across the different tests. These results call into question the notion that a tendency to misfit is a stable trait which consistently manifests itself in examinee performance across various subject areas. Frary's (1982) correlations of the same person-fit index across several tests were also very low.

These results lead to two practical points that need to be considered when interpreting person-fit results: if an examinee is identified as having poor person-fit by one index, another index will probably also identify him or her as a misfit on the same test; but it cannot be concluded that he or she will also be a misfit on another test.

Relationship Between Person-Fit, Ability, and Test Anxiety

Low to moderate correlation values were obtained between person-fit indices and their corresponding total ability scores. The Rasch person-fit index had the highest correlations with total mathematics and science ability scores. For the reading test, the MCI had the highest correlations with total score, but this correlation was positive where a negative correlation would have been expected. The reading subtest was not typical of a power test, since most examinees did not finish all items. More able examinees probably


attempted more items, passing end items that were more "difficult" than some items they had missed earlier in the test. This caused more able students to receive higher misfit classifications, as can be seen in Figure 1. The only person-fit index that did not seem to be affected by the speededness of the reading test was the Rx². Although its relationship to total reading score was low, it was in the expected direction. The PB index had the lowest correlations with total scores. Contrary to the present findings, Harnisch and Linn (1981) found that the PB was one of the indices with a high relationship to total score on the reading test (.63). The relationship of the PB to the mathematics total score in the Harnisch and Linn study was nevertheless somewhat lower (.28) than for some other indices.

Correlation values between person-fit measures and test anxiety were relatively low, ranging from |.02 to .22|. The Rx² index had the highest correlations with test anxiety. The only case in which the Rx² correlated lowest with test anxiety was the seventh-grade reading test. This low correlation is ascribed to the speeded nature of the reading test, which was more noticeable at the seventh grade. These correlation values are disappointingly low from a practical perspective; however, the observed correlations may have been attenuated by the extremely low reliability of the person-fit measures and by the use of a general or trait measure of test anxiety instead of a state-specific measure. As an exploratory exercise, one could speculate about the nature of this relationship if person-fit could be measured more reliably. A correction for attenuation method (Magnusson, 1966,


p. 148) was used to estimate what the correlation between person-fit and test anxiety would be if these measures were perfectly reliable. The corrected correlations between person-fit measures and test anxiety ranged from |.03 to .44|. Even with the correction for attenuation, most of these correlations are still relatively low, and there is little evidence in the present study to support the notion that higher reliabilities can be achieved for person-fit measures.

To explore the extent to which variance in person misfit can be explained by a linear combination of the variables (ability level, test anxiety, and their interaction), this linear multiple regression model was tested for each person-fit statistic at each grade level and for each ability measure. Because of the large sample size, a number of fairly small R² values were statistically significant, so only cases with a meaningful percentage of explained variance (larger than 10%) were considered.

The significant interactions between ability and test anxiety demonstrate that ability levels moderate the relationship between person-fit measures and test anxiety. For lower-ability examinees a direct relationship between person-fit measures and test anxiety was found; for higher-ability examinees this relationship was inverse. Figures 1 and 2 present the two general pictures of these interactions. Figure 1 shows the relationship between the modified caution index and test anxiety as measured by TASA for seventh-grade examinees at different reading levels. The nature of the interaction is as previously described, but higher-ability students have higher overall MCI values (more misfit). For all


the other significant interactions, which were for person-fit measures of science at the eighth grade, lower-ability students have higher values of misfit; Figure 2 shows this for the MCI. The difference between these two content areas might be due to the performance of this sample on the reading subtest. More than 10 percent of the seventh graders missed the last fifteen items of the reading test, making it appear more like a speeded test than a power test. Higher-ability examinees probably attempted more end items, increasing their probability of receiving higher misfit classifications.

The cognitive-attentional theory of test anxiety, the Spence-Taylor drive theory, and previous person-fit research findings suggest interpretive explanations of the interaction results. According to Tobias (1980) and Weinstein, Cubberly, and Richardson (1982), high-test-anxious students perform worse than low-test-anxious individuals on difficult material, while with easier material little difference between anxiety levels is expected. In reporting results on cognitive coping behavior and anxiety, Houston (1982) suggests that "highly trait-anxious (and test anxious) individuals tend to lack organized ways of coping with stress and instead ruminate about themselves and the situation in which they find themselves" (p. 198). Since high-test-anxious students' performance on more difficult items is more affected by their anxiety, these examinees will have a harder time coping as they reach items whose difficulty approximates their ability level. Other testing strategies, such as test-wiseness, would not be readily available due to lack of concentration. Examinees with high ability levels and low test


anxiety could take advantage of test-taking skills and answer items correctly beyond their ability level. Since these items would not be answered with the same degree of certainty as easier items, more deviation from the expected response pattern could occur and higher misfit would result. For examinees with high ability and high test anxiety, coping and test-taking skills could be blocked, making them consistently miss harder items; lower misfit values would occur for this group. Examinees with lower ability and lower test anxiety would answer items correctly up to the point where they reach items at their ability level and then miss the harder items; some misfit could occur due to attempts at harder items. For examinees with low ability and high test anxiety, distracting thoughts might interfere with performance on almost all items. Even easy items (relative to their ability level) could be missed, and this sporadic answering pattern would classify this group as high misfits.

These interaction effects between ability and test anxiety appear more consistently for science, especially at the eighth grade. One possible explanation is that the standardized science test fits the curriculum less well than the reading or mathematics tests. The mean item difficulty for the science test is lower, indicating a harder test. Examinees taking the science examination might find themselves in a more ambiguous and hence more anxiety-producing situation.

These findings are primarily of theoretical interest to those who wish to learn more about the constructs of test anxiety or person-fit. At best, the combination of ability, test anxiety, and their interaction appears to account for only about one-fourth of the variance in person-fit indices, and the increments in R²


obtained by adding the interaction term to the regression model were small: the interaction term accounted for only about two percent of the explained variance.

Reliability of Person-Fit Measures

Corrected odd-even split-half reliability estimates of the person-fit indices were low, with coefficients ranging from .20 to .56. These internal consistency estimates were highest for the person-fit indices computed for the seventh-grade reading test. Part of the reason for these higher reliabilities could be the speeded nature of the reading test at this grade level, since the original sequential test item number was used to split the test into odd-even subtests. Magnusson (1966) cautions against using split-half methods on timed tests: "the time limit has the effect that in reliability computations with split-half methods the test's reliability tends to be overestimated" (p. 114).

Frary (1982) analyzed the internal consistency of person-fit indices and also found low and even negative split-half coefficients. His findings led him to conclude that unexpected responses to a small number of items contributed to high misfit classifications and that there was little consistency in the specific items contributing most to misfit.

These findings certainly seem to call into question the notion of person-fit as a stable trait that can be reliably measured, and they also question the potential utility of these indices. Frary summarizes this concern when he suggests "that use of person-fit measures for any decision-making purpose, especially with respect to individual


examinees, should be undertaken only with extreme caution and that substantial additional research will be required before they can be used routinely" (Frary, 1982, p. 17).

Summary

Correlations among person-fit statistics were quite high within same-subject content areas, but not across tests in different subjects. It can be generalized that if an examinee is identified as having poor person-fit by one index, another index will probably also identify this examinee as a misfit on the same test, but it cannot be expected that this examinee will also be a misfit on another test.

Significant interactions between ability and test anxiety demonstrate that ability levels moderate the relationship between person-fit measures and test anxiety. For lower-ability examinees a direct relationship between person-fit measures and test anxiety was found; for higher-ability examinees this relationship was inverse. The cognitive-attentional theory of test anxiety and the Spence-Taylor drive theory suggest interpretive explanations for these interaction results.

These findings are of theoretical interest to those interested in learning more about the constructs of test anxiety and person-fit. At best the combination of ability, test anxiety, and their interaction appears to account for only about one-fourth of the variance in person-fit indices. Internal consistency (split-half) reliabilities were relatively low, and the present study offers little evidence that higher reliability of person-fit indices could be achieved.
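The correction for attenuation discussed in this chapter (Magnusson, 1966, p. 148) estimates what a correlation would be if both measures were perfectly reliable: divide the observed correlation by the square root of the product of the two reliabilities. A sketch with illustrative numbers (the reliabilities below are assumptions for the example, not values from the study):

```python
import math

def disattenuate(r_xy, rel_x, rel_y):
    """Correlation corrected for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Illustrative: an observed person-fit/test-anxiety correlation of .22,
# with assumed reliabilities of .30 (person-fit index) and .85 (anxiety scale).
corrected = disattenuate(0.22, 0.30, 0.85)
print(f"corrected correlation: {corrected:.2f}")
```

Even generous corrections of this kind leave most of the observed correlations in the low-to-moderate range, which is the point made in the discussion above.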


According to these results, the potential uses of person-fit indices are questionable at this time. More research is needed before person-fit indices can be recommended as a routine measure on achievement tests.


REFERENCES

Alpert, R., & Haber, R.N. (1960). Anxiety in academic achievement situations. Journal of Abnormal and Social Psychology, 61, 207-215.

Donlon, T.F., & Fischer, F.E. (1968). An index of an individual's agreement with group-determined item difficulties. Educational and Psychological Measurement, 28, 105-113.

Frary, R.B. (1982). A comparison among person-fit measures. Paper presented at the annual meeting of the American Educational Research Association, New York.

Frary, R.B., & Giles, M.B. (1980). Multiple-choice test bias due to answering strategy variations. Paper presented at the annual meeting of the National Council on Measurement in Education, Boston, Mass.

Gaier, E.L., & Lee, M.C. (1953). Pattern analysis: The configural approach to predictive measurement. Psychological Bulletin, 50, 140-148.

Guttman, L. (1941). The quantification of a class of attributes: A theory and method of scale construction. In P. Horst, P. Wallin, & L. Guttman (Eds.), The prediction of personal adjustment. New York: Social Science Research Council, Committee on Social Adjustment.

Guttman, L. (1950). The basis for scalogram analysis. In S. Stouffer, L. Guttman, E. Suchman, P. Lazarsfeld, S. Star, & J. Clausen (Eds.), Measurement and prediction (Vol. 6). Princeton, N.J.: Princeton University Press, 60-90.

Hambleton, R.K., & Cook, L.L. (1977). Latent trait models and their use in the analysis of educational test data. Journal of Educational Measurement, 14, 75-96.

Harnisch, D.L. (1983). Item response patterns: Applications for educational practice. Journal of Educational Measurement, 20, 191-206.

Harnisch, D.L., & Linn, R.L. (1981). Analysis of item response patterns: Questionable test data and dissimilar curriculum practices. Journal of Educational Measurement, 18, 133-146.


Harper, F.B.W. (1974). The comparative validity of the Mandler-Sarason Test Anxiety Questionnaire and the Achievement Anxiety Test. Educational and Psychological Measurement, 34, 961-966.

Heinrich, D.L., & Spielberger, C.D. (1982). Anxiety and complex learning. In H.W. Krohne & L. Laux (Eds.), Achievement, stress, and anxiety. Washington, D.C.: Hemisphere.

Helwig, J.T., & Council, K.A. (Eds.). (1979). SAS user's guide, 1979 edition. Raleigh, N.C.: SAS Institute.

Houston, K. (1982). Trait anxiety and cognitive coping behavior. In I.G. Sarason & C.D. Spielberger (Eds.), Achievement, stress, and anxiety. Washington, D.C.: Hemisphere.

Kane, M.T., & Brennan, R.L. (1980). Agreement coefficients as indices of dependability for domain-referenced tests. Applied Psychological Measurement, 4, 105-126.

Levine, M.V., & Rubin, D.B. (1979). Measuring the appropriateness of multiple-choice test scores. Journal of Educational Statistics, 4, 269-290.

Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Reading, Mass.: Addison-Wesley.

Magnusson, D. (1966). Test theory. Reading, Mass.: Addison-Wesley.

Mandler, G., & Sarason, S.B. (1952). A study of anxiety and learning. Journal of Abnormal and Social Psychology, 47, 166-173.

Prescott, G.A., Balow, I.H., Hogan, T.P., & Farr, R.C. (1978). Teacher's manual for administering and interpreting Metropolitan Achievement Tests (Advanced 1, Forms JS and KS). New York: The Psychological Corporation.

Rudner, L.M. (1982). Individual assessment accuracy. Paper presented at the annual meeting of the American Educational Research Association, New York.

Rudner, L.M. (1983). Individual assessment accuracy. Journal of Educational Measurement, 20, 207-220.

Sarason, I.G. (1960). Empirical findings and theoretical problems in the use of anxiety scales. Psychological Bulletin, 57, 403-415.

Sarason, I.G. (1963). Test anxiety and intellectual performance. Journal of Abnormal and Social Psychology, 66, 73-75.

Sarason, I.G. (1972). Experimental approaches to test anxiety: Attention and the use of information. In C.D. Spielberger (Ed.), Anxiety: Current trends in theory and research (Vol. 2). New York: Academic Press.


Sarason, I.G. (1975). Anxiety and self-preoccupation. In I.G. Sarason & C.D. Spielberger (Eds.), Stress and anxiety (Vol. 2). Washington, D.C.: Hemisphere.

Schmitt, A.P., & Crocker, L. (1982). Test anxiety and its components for middle school students. Journal of Early Adolescence, 2, 267-275.

Spielberger, C.D. (1966). Theory and research on anxiety. In C.D. Spielberger (Ed.), Anxiety and behavior. New York: Academic Press.

Spielberger, C.D. (1971). Trait-state anxiety and motor behavior. Journal of Motor Behavior, 3, 265-279.

Spielberger, C.D., Gonzalez, H.P., Taylor, C.J., Algaze, B., & Anton, W.D. (1978). Examination stress and test anxiety. In C.D. Spielberger & I.G. Sarason (Eds.), Stress and anxiety (Vol. 5). New York: Hemisphere/Wiley.

Tatsuoka, K., & Linn, R.L. (1981). Indices for detecting unusual item response patterns in personnel testing: Links between direct and item-response-theory approaches (Research Report 81-5). Urbana: University of Illinois, Computer-Based Education Research Laboratory.

Tatsuoka, K., & Linn, R.L. (1983). Indices for detecting unusual patterns: Links between two general approaches and potential applications. Applied Psychological Measurement, 7, 81-96.

Tatsuoka, K., & Tatsuoka, M.M. (1980). Detection of aberrant response patterns and their effect on dimensionality (Research Report 80-4). Urbana: University of Illinois, Computer-Based Education Research Laboratory.

Tobias, S. (1980). Anxiety and instruction. In I.G. Sarason (Ed.), Test anxiety: Theory, research, and applications. Hillsdale, N.J.: Lawrence Erlbaum Associates.

Tryon, G.S. (1980). The measurement and treatment of test anxiety. Review of Educational Research, 50, 343-372.

Van der Flier, H. (1977). Environmental factors and deviant response patterns. In Y.H. Poortinga (Ed.), Basic problems in cross-cultural psychology. Lisse, Netherlands: Swets & Zeitlinger.

Van der Flier, H. (1982). Deviant response patterns and comparability of test scores. Journal of Cross-Cultural Psychology, 13, 267-298.

Weinstein, C.E., Cubberly, W.E., & Richardson, F.C. (1982). The effects of test anxiety on learning at superficial and deep levels of processing. Contemporary Educational Psychology, 7, 107-112.


Wine, J.D. (1971). Test anxiety and direction of attention. Psychological Bulletin, 76, 92-104.

Wine, J.D. (1980). Cognitive-attentional theory of test anxiety. In I.G. Sarason (Ed.), Test anxiety: Theory, research, and applications. Hillsdale, N.J.: Lawrence Erlbaum Associates.

Wright, B.D., Mead, R., & Bell, S. (1979). BICAL: Calibrating items with the Rasch model (RM-23b). Chicago: University of Chicago.

Wright, B.D., & Panchapakesan, N. (1969). A procedure for sample-free item analysis. Educational and Psychological Measurement, 29, 23-48.


BIOGRAPHICAL SKETCH

Alicia P. Schmitt was born in Havana, Cuba, on September 28, 1952. She immigrated to the United States in 1961 and moved to Puerto Rico in 1962. In 1970 she graduated from high school and entered the University of Puerto Rico, graduating in 1973 and 1975 with bachelor's and master's degrees in psychology. From 1975 to 1977 she worked as Evaluation Coordinator for Project Follow Through and taught evening courses at the University of Puerto Rico.

Alicia moved to Gainesville in 1977 and began her doctoral program at the University of Florida in the fall of 1978. While in graduate school she held a variety of assistantships. For two years she served as research consultant for the Research Clinic of the College of Education. As part of her duties in other assistantships she developed and taught the Independent Study by Correspondence Course in Measurement and Evaluation in Education; assisted in measurement, research, and statistics courses in the College of Education; and worked as a research assistant for a graduate school dean.

In 1983 she became Assistant Director of Testing and Evaluation for the Office of Instructional Resources, University of Florida. In this capacity she administered the College of Education Basic Skills Testing Program and analyzed institutional studies used for educational planning. Alicia has accepted a position with Educational Testing Service and will be moving to Princeton, New Jersey.


I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation, and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Linda M. Crocker, Chair
Professor of Foundations of Education

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation, and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

James J. Algina
Associate Professor of Foundations of Education

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation, and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Professor of Psychology


This dissertation was submitted to the Graduate Faculty of the College of Education and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

December, 1984

Dean for Graduate Studies and Research