Citation
Development of the disruptive student behavior scale

Material Information

Title:
Development of the disruptive student behavior scale
Creator:
Moses, William L., 1936-
Place of Publication:
Gainesville, Fla.
Publisher:
University of Florida
Copyright Date:
1986
Language:
English

Subjects

Subjects / Keywords:
Child psychology ( jstor )
Classrooms ( jstor )
Educational evaluation ( jstor )
High school students ( jstor )
Psychology ( jstor )
Psychometrics ( jstor )
Rating scales ( jstor )
Schools ( jstor )
Students ( jstor )
Teachers ( jstor )
City of Tallahassee ( local )

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright William L. Moses. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Resource Identifier:
AEJ3538 ( ltuf )
15111825 ( oclc )
0029678715 ( ALEPH )

Full Text











DEVELOPMENT OF THE
DISRUPTIVE STUDENT BEHAVIOR SCALE









BY

WILLIAM L. MOSES


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN
PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA


MAY 1986


Copyright 1986

by

William L. Moses


To Billy



Who would have been proud


ACKNOWLEDGEMENTS


I wish to express appreciation to my parents for their

love, understanding, and help; my committee chairperson,

Dr. McDavis, for his counsel and encouragement; my

committee members, Dr. Ziller for his confidence in my

ability to work independently and Dr. Loesch for stepping

into the breach and contributing so much so quickly while

continuing his friendship and support; and my employers at

Pasco-Hernando Community College for their financial

support.

Special thanks go to my friend and colleague, Dr. Tom

Floyd, who listened for hours and encouraged for years,

and to my friends and lovers who were usually supportive,

sometimes distracting, and always worth it.


TABLE OF CONTENTS


ACKNOWLEDGMENTS

LIST OF TABLES

ABSTRACT

CHAPTER ONE  INTRODUCTION
    Statement of the Problem
    Purpose of the Study
    Need for the Study
    Significance of the Study
    Definition of Terms
    Organization of the Study

CHAPTER TWO  REVIEW OF LITERATURE
    Definition of Disruptive School Behavior (DSB)
    Identification, Assessment, and Placement
    Rating Scale Development
    Psychometric Properties of Rating Scales
    Uses of Behavior Rating Scales
    Summary

CHAPTER THREE  METHODOLOGY
    Research Questions
    Construction of the DSBS
    Validation of the DSBS
    Reliability of the DSBS
    Field Study
    Data Analyses
    Validity
    Reliability
    Limitations

CHAPTER FOUR  RESULTS AND DISCUSSION
    Results
    The Severity Factor
    The Samples
    Research Question One
    Research Question Two
    Research Question Three
    Research Question Four
    Summary
    Discussion

CHAPTER FIVE  CONCLUSIONS, IMPLICATIONS, SUMMARY, AND RECOMMENDATIONS
    Conclusions
    Implications
    Summary
    Recommendations

APPENDICES
    A  CONSTRUCT DEVELOPMENT STUDY
    B  BEHAVIORS COLLECTED FROM DISCIPLINARY RECORDS
    C  ORAL INSTRUCTIONS FOR THE EDITING STUDY
    D  ITEMS DEVELOPED FROM CONTENT VALIDATION STUDY
    E  INSTRUCTIONS FOR CONTENT VALIDATION STUDY
    F  THE DISRUPTIVE STUDENT BEHAVIOR SCALE (DSBS)
    G  INSTRUCTIONS FOR SEVERITY FACTOR STUDY
    H  SCORING TEMPLATE FOR THE DSBS
    I  SUMMARY OF TEACHER RATINGS ON THE DSBS
    J  PRESCRIPTIVE PROFILE WORKSHEET FOR THE DSBS
    K  INSTRUCTIONS FOR THE PILOT STUDY
    L  RATERS EVALUATION OF THE DSBS
    M  ASSIGNMENT OF CONSTRUCTS BY ITEM NUMBER

REFERENCES

BIOGRAPHICAL SKETCH













LIST OF TABLES


1.  Domains of Student Life Influenced by the School Experience

2.  Potential Adverse Consequences of DSBS Behaviors

3.  Rating Form Distribution by Demographic Categories--Norming Group

4.  Rating Form Distribution by Demographic Categories--Disruptive Group

5.  Frequency of Observed DSBS Behaviors by Constructs

6.  DSBS Constructs by Number

7.  Assignment of Proposed Scale Items to Constructs for Content Validation

8.  Follow-up Study for Assignment of Proposed Scale Items to Constructs

9.  Comparison of Disruptiveness Ratings by Teachers and Non-teaching Personnel

10. DSBS Ratings and z-scores for the Disruptive Group

11. DSBS Ratings and z-scores for the Norming Group

12. Test-Retest Correlations













Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial
Fulfillment of the Requirements for the
Degree of Doctor of Philosophy



DEVELOPMENT OF THE
DISRUPTIVE STUDENT BEHAVIOR SCALE


By


William L. Moses


May 1986


Chairperson: Roderick McDavis, Ph.D.
Major Department: Counselor Education


Disruptive behavior is currently seen by both educa-

tors and the public as a major problem in American

education. A procedure for quantitatively assessing

disruptive behavior in schools is required to show a need

for intervention programs and to select students for

placement in either special education or alternative

education programs. The purpose of this study was to

develop and validate an instrument, the Disruptive Student

Behavior Scale (DSBS). The DSBS is intended for use in

assessing quantitatively the disruptive school behaviors
of middle and junior high students referred for placement

in special education and alternative education programs.

This study investigated the position that disruptive

school behavior (DSB) can best be described in terms of

its type, frequency, and severity. The use of teachers as

observers and raters of disruptive school behavior is

discussed. Using teacher-generated behavioral statements

from disciplinary referrals to better describe DSB is

suggested. A review of various rating scale development

procedures attempted by business, industry, and government

is summarized.

A set of 10 constructs was selected to define DSB.

Scale items were developed from referral statements on

disciplinary records in a junior high school. A severity

factor was incorporated into the scoring system so that

behaviors rated as more detrimental to the student were

given a higher DSBS rating.

The DSBS was field tested in a public middle school.

Students in a norming group and a criterion, or disrup-

tive, group were rated by their classroom teachers using

the DSBS. A norm for disruptive behavior for the target

school was calculated and a criterion for classifying a

student as disruptive was established.

Results indicated the DSBS could identify the crite-

rion group of disruptive students, classify individual








students as disruptive, and exclude non-disruptive

students from the disruptive group. A follow-up study

suggested the results were consistent over time for all

DSBS ratings except those at the lowest end of the scale.














CHAPTER ONE

INTRODUCTION



The public school system in the United States has

been assigned a major role in socializing and enculturating

American youth (Filipczak, 1978). The U.S. Supreme Court

in its 1954 landmark civil rights decision (Brown v. Board

of Education of Topeka, 74 S.Ct. 686, 691) described

education as "a principal instrument in awakening the

child to cultural values, in preparing him for later

professional training, and in helping him to adjust

normally to his environment." The materialistic emphasis

of American society and culture ordains that the educa-

tional institution at all levels be driven by the broadly

defined goal of career success for its graduates (Bell,

1984; DiPrete, 1981, p. 199; National Education Associa-

tion (NEA), 1975, p. 108).

Unfortunately, a significant number of students are

detoured from this goal when educators describe them as

displaying behaviors inappropriate to the school environ-

ment and not attributable to legally-defined mental or

emotional handicaps. Suspensions, expulsions, and assign-

ments to alternative programs are evidence of failure by

the educational system to effect students' adherence to










current social norms and culturally-specified behaviors.

The consequences to the schools for this failure include

loss of both funds and credibility, neither of which the

educational system has in sufficient quantity to squander.

Attempts to correct this failure to convey effec-

tively norms and behaviors have included both exceptional

child education and alternative schooling programs. The

Education for All Handicapped Children Act of 1975 (P.L.

94-142)(Department of Health, Education, and Welfare,

1977) effectively administered the coup de grace to

exceptional child education approaches in Florida by

failing to include a category appropriate to disruptive

behavior (Florida Department of Education, 1975, 1985).

Alternative programs frequently fail to provide for

selection and discharge criteria, rendering evaluation

virtually impossible (Pinellas County School District,

1982). A primary reason for failure to specify behavioral

criteria for alternative schooling programs is the lack of

appropriate instruments for quantifying disruptive

behavior (Salvia & Ysseldyke, 1981, pp. 8, 9).

Inadequacies of existing behavioral assessment

instruments include failure to provide for local norming,

inclusion of inappropriate items, omission of the severity

factor, and inadequacy of prescriptive information

(Mesinger, 1982). An instrument providing both a

theoretical and a pragmatic rationale for identifying











disruptive students is a requirement for reconsidering the

inclusion of this category in special education legisla-

tion and enhancing the credibility of alternative

education programs (Reeves, Perkins, & Hollon, 1978).



Statement of the Problem

Disruptive behavior in the public school system is

not a new phenomenon (Garibaldi, 1979). That it remains a

problem is emphasized by Robert J. Rubel in introducing a

collection of papers on crime and violence in public

schools:

The issue in the 1980's no longer centers on
whether or not violence in American schools is
serious; the issue no longer centers on whether
violence is increasing or decreasing; the issue
no longer centers on technical anomalies concern-
ing under- or over-reporting of incidents. In
the debate of the 1980's, the primary issue
before large proportions of our urban schools
(and sizeable numbers of our suburban and even
rural schools) revolves around the continued
viability of American education as it existed a
generation ago. (1980, p. 5)

The U. S. government has acknowledged the existence

of disruptive behavior by awarding federal grants for

alternative education pilot programs (Law Enforcement

Assistance Administration, 1979; Moses, 1976).

Included in definitions of disruptive school

behavior (DSB) are such varied activities as talking,

hitting, yelling (Mayer & Butterworth, 1979); defy-

ing rules and procedures (Walker, 1979); aggressive

behavior which interrupts the instructional program










(Foley, 1982); and conduct disorders (American Psychiatric

Association, 1980, pp. 45-50). Forness and Cantwell (1982)

and Forness, Sinclair, and Russell (1984) have identified

these categories as likely to be ineligible for special

education services under P.L. 94-142.

The U.S. government (Department of Health, Education,

and Welfare, 1977), in implementing P.L. 94-142, specif-

ically denied services to the "socially maladjusted."

Florida law provides essentially the same restrictions

(State Board of Education Rule 6A-6.3016), although Bower

(1982), whose research (Bower, 1958) formed the basis for

the P.L. 94-142 definition of emotionally disturbed,

called this exclusion "contradictory in intent and content

with the research from which it came" (1982, p. 60).

The need for alternative education services for

disruptive students seems supported by reports of the

widespread existence of DSB. Individuals and institutions

reporting on the continuing crisis in school discipline

include the California Department of Education (1973),

the National Education Association (1975), the U.S.

Congress (Bayh, 1975; Tygart, 1980), the Michigan

Department of Education (Vergon & Williams, 1978), the

National Institute of Education (Feldhusen, 1978), Cross

and Kohl (1978), Duke (1978), the New York State United

Teachers (1979), and the National Education Association











(1980). The Safe School Study Report to Congress (National

Institute of Education, 1978) indicated 5,000 teacher

assaults per month occurred across the nation. The Gallup

Poll on Education (Gallup, 1984) continues to report lack

of student discipline as the number one concern of

Americans about the public school system.

In Florida, the Governor's Task Force on Disrupted

Youth (GTFDY) found 17,983 student-days lost to suspen-

sions over a 2-year period in the 10 school districts

studied (GTFDY, 1973, p. 11). An analysis of conduct code

violations in Duval County, Florida, schools for 1980-1981

revealed more than 33,000 violations resulting in 13,679

days lost from school (Moses, 1981).

The aversive consequences of chronic DSB for students

include lowered self-esteem and functioning level (Caliste,

1979); dropping out and underemployment (Grise, 1980; NEA,

1975; Safer, Heaton, & Parker, 1981); alienation

(Garbarino, 1980; Moyer & Motta, 1982); and criminal

activity (Edwards, Roundtree, Kent, & Parker, 1981;

Mitchell & Rosa, 1981). Likewise, from the perspective of

the school system DSB is undesirable, involving excessive

teacher attention (Rubel, 1977, Chap. 1), litigation

(Lufler, 1982), vandalism costs (Goldstein, Apter, &

Harootunian, 1984), teacher stress (Pettegrew & Wolf,

1982), and weakened public support (Amos, 1980). Conse-

quences for the community include criminal actions and










psychiatric referrals (Faretra, 1981). Levin (1972) esti-

mated the expense of inadequate education to be about 6

billion dollars a year (1972 dollars) for costs associated

with welfare and crime.

Researchers have identified the middle and junior

high school age student as particularly prone to behavior

disorder (Geiger & Turiel, 1983; Loeber, 1982; Nielsen &

Gerber, 1979; Quay, 1978). These studies suggest the

middle and junior high schools as a focus for identifying

and remediating disruptive school behavior. Unfortu-

nately, no adequate instruments are available specifically

for this population (Mesinger, 1982). Instruments

developed from clinical populations contain some items

irrelevant to the non-clinical population in the public

schools (Quay & Peterson, 1967). Instruments offered with

norms developed from research samples and no procedure for

developing local norms for disruptive behavior do not

consider the placement needs of local school districts

(Messick, 1980). Levels of disruptive behavior that can

be managed within the regular school environment vary

across settings because of differences in such factors as

facilities, experience of teachers and administrators, and

school board policies.

Current instruments fail to consider the widely

differing consequences of specific disruptive acts (Kane &

Bernardin, 1982). Some possible effects of this omission











may be to group together students whose behaviors differ

widely in their severity, to encourage conceptualizing all

disruptive behavior as equally deleterious, and to base

placement decisions on personal judgments about the seri-

ousness of a particular type of behavior. Neither does

any available instrument provide procedures for creating a

prescriptive profile of a student based on the authors'

conceptual model of disruptive school behavior (Salvia &

Ysseldyke, 1981). This failure may seriously limit the

interpretation and application of rating scale results.



Purpose of the Study

The purpose of this study was to develop and validate

an instrument, the Disruptive Student Behavior Scale

(DSBS). The DSBS would be used to assess quantitatively

the disruptive school behaviors of students referred for

placement in either special education or alternative

education programs.



Need for the Study

Salvia and Ysseldyke (1981, pp. 443, 444, 450) have

called for norm-referenced instruments to support

placement decisions, evaluate student progress, evaluate

programs, provide intervention suggestions, and help

parents understand their children's abilities in relation

to other students. Reeves et al. (1978) called for











reliable instruments to use in placing handicapped

children. Also, Camp (1981) notes that

there is very little current, objective,
research-based information in existence to help
identify specific student behavior problems
occurring in the schools. A need exists for
research of this nature to quantitatively
establish the actual, current situation with
regard to student discipline problems in the
public secondary schools. (p. 48)

Presumably, these calls for reliable and valid instruments

apply both to special education and alternative schooling

programs, as both to some degree remove the student from

mainstream classroom activities. However, the Florida law

(State Board of Education Rule 6A-6.3017) providing for

special education programs for the socially maladjusted

was repealed July 24, 1981.

"Educational alternative programs" were created in

Florida in 1978 (Florida Statute 230.2315) specifically to

reduce disruptive behavior and truancy. Florida Statute

229.565 provides for the evaluation of "procedures for

identification and placement of students in educational

alternative programs." As an example of practice, in 1982

the alternative education program in the Pinellas County

School District did not require quantitative behavioral

assessment prior to placement.

Studies, however, have identified problems in using

subjective criteria for alternative education placement.

Disagreements in ranking behaviors (Pisarra & Giblette,











1981), value systems (Messick, 1980), labels applied to

students (Leyser & Abrams, 1982), teaching experience

(Rubel, 1977, p.51), level of frustration (Walker &

Holland, 1979), race (Arnove & Strout, 1978; Bennett &

Harris, 1982; Florida DOE, 1983; Goldsmith, 1982;

Mesinger, 1982), sex (Bennett & Harris, 1982), and

socioeconomic status (Arnove & Strout, 1978; NEA, 1975)

are variables that may confound perceptions of disruptive

behavior.

One way to help neutralize these confounding vari-

ables is to use quantitative measures. A review of

current literature indicates that appropriate instruments

may not exist. After a major study of alternative educa-

tion programs, Mesinger (1982) was unable to recommend

even one instrument for use in selecting students.

Messick (1964, 1965, 1980) argued against applying to

local environments behavioral norms developed elsewhere.

Stott, Marston, and Neill (1975, p. 8), Wodarski and Pedi

(1978, p. 480), and Quay and Peterson (1975, 1979) advised

the setting of local norms. However, no instrument

located in this review provides a specific procedure for

determining local norms.

Another advantage of locally-developed norms is the

opportunity to compute the mean DSB level for individual

schools. Intervention program entry and exit criteria may

be defined by the deviation of an individual student's











mean DSB score from the school mean. This may provide the

type of quantitative assessment required by state (SBE

Rule 6A-6.3016) and federal (P.L. 94-142) law for special

education placement and may meet the need noted by

Mesinger (1982) for quantitative instruments to assist in

selecting students for alternative education programs.
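
A minimal sketch of this local-norming idea follows; the ratings, cutoff, and
function names are invented for illustration and are not the scoring rules
developed later in this study. The local norm is simply the mean and standard
deviation of DSBS ratings for a school's norming group, and a referred
student's rating is expressed as a deviation (z-score) from that mean.

    # Illustrative sketch only: invented ratings, names, and cutoff.
    from statistics import mean, stdev

    def local_norm(norming_ratings):
        """School-specific norm: mean and standard deviation of DSBS ratings."""
        return mean(norming_ratings), stdev(norming_ratings)

    def deviation_from_norm(student_rating, school_mean, school_sd):
        """A student's DSBS rating expressed as a z-score against the local norm."""
        return (student_rating - school_mean) / school_sd

    norming_group = [12, 8, 15, 10, 9, 14, 11, 7, 13, 10]   # hypothetical ratings
    m, sd = local_norm(norming_group)
    z = deviation_from_norm(34, m, sd)                       # hypothetical referred student
    print("exceeds entry criterion" if z > 2.0 else "within local norm")

Under such a scheme, the entry and exit criteria named above would simply be
cutoffs on z (for example, entering a program when z exceeds the cutoff and
exiting when it falls back within the local norm).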

A major need in intervention programs is prescriptive

information (Lovitt, 1967, p. 238; Spivack & Swift, 1977).

However, many instruments do not provide operationally-

defined items which are useful in the classroom. For

example, the Behavior Problem Checklist (Quay & Peterson,

1979) items used to identify conduct problem students

include "restlessness," "disruptiveness," and "irresponsi-

bility." These items originally were taken from the files

of a child guidance clinic (Quay, 1977).

Defining disruptive behavior on the dimensions of

type, frequency, and severity has received support from

numerous sources (American Psychiatric Association, 1980,

p. 45; Bernardin, LaShells, Smith, & Alvares, 1976; Camp,

1980, 1981; Grosek, 1979; Taylor, Warren, & Slocumb,

1979). Criticisms of assessment procedures not incorporat-

ing a severity factor have been made by Kane and Bernardin

(1982) and Pisarra and Giblette (1981). Nevertheless, no

instrument was located which specifically recommended

using a severity factor in assessing disruptive school

behavior.











An instrument which provides for quantifying DSB may

help to protect students from placement in school programs

according to inappropriate criteria. To be most effec-

tive, the instrument should include provisions for

establishing locally-determined placement norms, for

comparing with those norms the scores of individual

students, for providing prescriptive information, and for

systematically considering the type, frequency, and

severity of the disruptive behaviors.



Significance of the Study

This study investigated the theoretical position that

disruptive school behavior (DSB) can best be described

in terms of its type, frequency, and severity. Theoretical

considerations in the use of teachers as observers and

raters of disruptive school behavior were discussed.

The feasibility of using teacher-generated behavioral

statements from disciplinary referrals to better specify

the parameters of DSB was suggested. A review of various

rating scale development procedures attempted by business,

industry, and government were summarized.

The instrument developed by this study will initially

be most appropriate as a research tool for conducting

studies of DSB. The availability of a process for

establishing local norms for DSB may facilitate local

research studies in evaluating the effectiveness of











disciplinary measures, in-service training, and alterna-

tive education programs. This study will likely suggest

additional areas for other investigations.

The identification of disruptive students for inter-

ventions is not standardized. This instrument may assist

in establishing quantitative criteria for selection,

placement, and treatment of disruptive students. This, in

turn, may lead to recognition of DSB as a category for

exceptional student education funding.

A major premise in much of the literature concerning

DSB is the role of school personnel in exacerbating

disruptive behavior. It may be that an instrument which

provides a behavioral profile of the disruptive student

will suggest goals for in-service training programs.



Definition of Terms

For the purposes of this study, the following

definitions apply:

Alternative education program. An educational

procedure which provides intervention outside the regular

classroom for students exhibiting some predetermined level

of disruptive or disinterested school behavior.

Disruptive school behavior (DSB). Behavior that

disrupts the learning of self and/or others and is not

attributable to severe emotional disturbance or other

exceptional education categories.











Delinquent behavior. Behavior by persons under 18

years of age which violates laws and regulations pertain-

ing to them.

Exceptional child (student) education programs.

Programs which receive additional funding in order to

better serve the needs of students meeting governmental

guidelines for special assistance.

Experienced teachers. Full-time, regular classroom

teachers who have held that position at least two academic

years.

Expulsions. Removal from school for at least the

remainder of the school year.

Locally developed norms. Criteria for comparing an

individual student's DSB with the expected DSB of a

specific reference population in the local school or

community.

Maladaptive social behavior. Behavior not of organic

origin which would be judged by impartial observers to be

inappropriate for the social situation and which ulti-

mately results in aversive consequences for the person

exhibiting the behavior.

Method bias. The influence on ratings of the type of

rating method used.

Non-quantitative assessment. See Qualitative assessment.

Qualitative assessment. Evaluation based on individ-

ual opinion and lacking a systematic basis.











Quantitative assessment. The use of numbers in

describing behavior so that a higher number indicates a

higher level of the behavior.

Severity. A prediction, stated quantitatively, of

the potentially detrimental consequences a disruptive

behavior would likely have for a student.

Special education programs. See Exceptional child

education programs.

Suspensions. Temporary removal from the regular

educational program of a school, usually involving

exclusion from school facilities for a specified number of

days.



Organization of the Study

There are four remaining chapters in this

dissertation. Chapter Two will present a review of the

literature related to the development of an instrument to

assess disruptive school behavior (DSB). Specifically,

consideration will be given to disruptive behavior in the

schools, existing assessment methods, rating scale develop-

ment, the psychometric properties of rating scales, and

the possible uses of results from a disruptive behavior

rating scale.

Chapter Three will present the methodology employed

in the development, validation, and field testing of the

Disruptive Student Behavior Scale (DSBS). Included are











the research questions, information on the population,

procedures used in developing the scale, pilot testing,

data analyses, and possible limitations of the study.

Chapter Four will present the results of this study,

including the data and the information inferred from the

data. An explanation of the results will be given and

they will be related to past research.

Chapter Five will include conclusions from this

study, along with implications for theory, research,

practice, and training. A summary of the entire study

will be presented, followed by recommendations for addi-

tional research.














CHAPTER TWO

REVIEW OF LITERATURE



This study requires an investigation of the history

and current status of attempts to define disruptive

behavior in public schools; identification, assessment,

and placement efforts directed toward disruptive students;

rating scale development procedures; research into the

psychometric properties of rating scales; and the use by

schools of results obtained from rating scales. Accord-

ingly, this chapter will review research and opinion

covering both theoretical and applied considerations

relating to these topics.



Definition of Disruptive School Behavior (DSB)

According to Camp (1981), the major issue in student

discipline in the secondary schools is how to describe

quantitatively the kinds of disruptive behavior currently

occurring. Summarizing a 1978 survey of state directors

of special education, Hirshoren and Heller (1979) reported

that while individual states define emotional disturbance

consistently, there is considerable variation in the kinds

of children so identified. That is, children meeting

program criteria in one state appeared to be excluded in











another. Much has been written in an attempt to resolve

this situation. A review of the literature suggests the

emergence of five discrete perspectives: (a) empirical,

(b) clinical, (c) conceptual, (d) educational, and (e)

school.

The empirical approach of applying factor analysis

(Cattell, 1978; Gorsuch, 1974) to a variety of items has

resulted in the identification of some common behaviors

associated with disruptive school behavior and has contri-

buted to defining DSB (Achenbach, 1978; Achenbach &

Edelbrock, 1978; Edelbrock, 1979; Peterson, 1961; Quay,

1964, 1978; Quay & Peterson, 1967). However, researchers

utilizing the empirical approach have included a broad

range of behaviors, including many which identify delin-

quency and personality disorders (Freemont & Wallbrown,

1979), and so the scales developed from these studies have

limited application for school personnel in defining the

specific category of DSB.

The classification of disorders contained in the

Diagnostic and Statistical Manual of Mental Disorders,

3/e. (DSM-III) (American Psychiatric Association, 1980)

and research studies incorporating these classifications

and descriptions exemplify clinical efforts to define

disruptive school behavior. Hewett and Forness (1982)

pointed to the necessity of finding a common frame of

reference between educational and psychiatric diagnoses in











order for school personnel to accurately interpret

clinical reports. Forness and Cantwell (1982) concluded

that the respective diagnostic systems of psychiatry and

special education remain dissimilar. Likewise, other

studies (Loeber, 1982; Werry, Methuen, Fitzpatrick, &

Dixon, 1983) failed to find support for the use of

psychiatric diagnoses to assign students to special

education programs.

The conceptual approach utilizes experience,

research, and opinion in formulating descriptions of what

is usually referred to in this perspective as "problem

behavior" (Jessor & Jessor, 1977, p. 4). Cullinan (1975),

Howell (1978), and Richard Jessor (1982) are among those

applying a psychosocial conceptualization of problem

behavior to the study of adolescent behavior. Neverthe-

less, while the conceptual perspective gives support to

the notion of comparing the behavior of an individual

student with the behavior of peers before declaring the

student to be deviant, this perspective fails to provide

specific criteria for making such a comparison.

The educational perspective includes the definitions

contained in federal and state statutes, guidelines

proposed by governmental agencies, and district codes of

student conduct. In 1977, the U.S. government, without

defining the term, specifically excluded the socially

maladjusted student from receiving exceptional child











education services under P.L. 94-142. The term "socially

maladjusted" is not defined in the latest Florida guide-

lines for providing special education for exceptional

students (Florida DOE, 1985). The U.S. Bureau of Educa-

tion for the Handicapped has sponsored the compilation of

a manual on behavior disorders (Yard, 1977). However,

these items are too general for use in a quantitative

instrument.

Codes of student conduct contain lists of behaviors

for which punishment may be administered. Offenses listed

in the codes may be violations of either school rules

(e.g., inappropriate display of affection) (Duval County

Public Schools, 1980) or of law (e.g., vandalism) (Pinellas

County Schools, 1983). While these offenses must be

considered in defining disruptive school behavior, they

exclude many of the disruptive behaviors frequently

occurring within the classroom. Federal, state, and local

guidelines seem insufficient for operationally defining

DSB specifically enough to be useful in a selection

instrument.

The school perspective focuses on the interactions of

students, teachers, and administrators within schools.

Disruptive school behavior is seen as a product of these

interactions. H. M. Walker, author of The Walker Problem

Behavior Identification Checklist (1970), described the

acting-out child as one who usually defies rules and











ignores classroom procedures, is difficult to manage,

avoids failure by attempting little academic work, and

alienates teachers and other students by behaving

aversively.

Specific behaviors often include hitting, yelling,

leaving seat, arguing, having temper tantrums, and provok-

ing others and often lead to confrontations. These

confrontations may be verbal, physical, or both. Acting-

out behavior may occur in the classroom, in nonclassroom

areas, or both. Walker (1970) proposed that acting-out

children are differentiated from other students by the

frequency, or quantity, of these behaviors, not by the

type of behaviors. Thus, a measuring instrument must

provide for a frequency component.

Camp (1981) explored the types of behavior considered

to be disciplinary problems, the perceived degree of

severity of these behaviors, and the frequency with which

these behaviors were observed. Camp found that the types

of behaviors rated most serious were rarely observed and

concluded that the most serious problem may be the

frequent, though mild, behaviors that undermine student

and teacher morale. A study of 21 secondary school

administrators' attitudes toward aggressive behavior

suggested that suspensions were awarded according to the

administrators' attitudes toward the referred behavior,











rather than according to a consistent standard for the

school district (Pisarra & Giblette, 1981).

An evaluation of literature of the school perspective

suggests that DSB can be defined in terms which students,

teachers, and administrators understand; that the three

factors of type, severity, and frequency need to be

considered; and that measures of DSB need to be standard-

ized. In this section five perspectives for defining

disruptive school behavior were presented. Each perspec-

tive offers some assistance in differentiating this

category from other behavioral categories. There appears

to be support for an instrument which operationally

defines types of behaviors occurring throughout the school

environment, assigns a quantity to each descriptive item

based on the perceived frequency of occurrence and sever-

ity, and provides for comparing the score of an individual

student to a predetermined norm for that environment.
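
As a rough illustration of the scoring logic implied by this summary, an
item's contribution might be the product of a rater's frequency rating and a
fixed severity weight, with the item scores summed and the total compared to
the environment's norm. The behaviors, weights, and combination rule below are
invented for illustration and are not the weighting scheme adopted for the
DSBS.

    # Illustrative sketch only: invented behaviors, severity weights, and ratings.
    SEVERITY_WEIGHT = {
        "talks out of turn": 1,
        "leaves seat without permission": 1,
        "defies teacher directions": 3,
        "hits another student": 5,
    }

    def total_dsb_score(frequency_ratings):
        """Sum of (frequency rating x severity weight) over all rated behaviors."""
        return sum(freq * SEVERITY_WEIGHT[behavior]
                   for behavior, freq in frequency_ratings.items())

    student = {"talks out of turn": 4, "leaves seat without permission": 3,
               "defies teacher directions": 1, "hits another student": 0}
    print(total_dsb_score(student))   # this total would then be compared to the local norm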



Identification, Assessment, and Placement

"Measurement is the construction of a model of some

property of the world" (Fraser, 1980, p. 27) and in

education this property is often the behavior of a

student. One role of the model provided by a measure is

to give accurate prescriptive information for planning

interventions with students (Forness, 1983). Several

studies have suggested this is being performed











inadequately (Greenwood, Walker, & Hops, 1977; Schenck,

1980; Sinclair, 1980; Sinclair & Kheifets, 1982; Spivack &

Swift, 1973; Strain, Cooke, & Apolloni, 1976).

Fraser (1980) acknowledged that psychological mea-

surement has been regarded as being quantitatively and

qualitatively of a lower order than physical measurement.

To achieve improvement, Ysseldyke and Marston (1982) have

argued for the use of direct observations of target behav-

iors by either teachers or trained observers. However,

Jones, Reid, and Patterson (1975) found observer reli-

ability varied inversely with the complexity of the

behaviors being observed.

Attempts to improve the validity of observations have

included such sophisticated approaches as Multidimensional

Scaling (MDS) (Torgerson, 1958). Sanson-Fisher and

Mulligan (1977), using adolescent student models, found

only marginal improvement for this technique over ratings

by classroom teachers. A comparison of a computer-driven

program for selecting behavioral/emotional disorders with

two expert psychologists' selections indicated no mean-

ingful differences existed (McDermott & Hale, 1982).

Weinrott (1979) summarized studies that indicated global

ratings could be significantly influenced by expectations,

while post hoc ratings of the same children by the same

raters when recorded on an instrument accurately reflected

discrete behavioral events. Gaynor and Gaynor (1976)











argued for instruments written to define behaviors so they

may be described quantitatively by teachers.

Beltramini (1982) suggested that scale-item content

is more important than other variables in obtaining reli-

able and valid results. A review by Albaum, Best, and

Hawkins (1981) of measurement literature found evidence to

support the use of from five to seven categories on Likert-

type scales, with no significant losses in reliability,

validity, or discrimination when compared with instruments

using more intervals. Fewer intervals sometimes resulted

in a loss of discriminative power and validity. It

appears that teachers using instruments which operation-

ally describe disruptive behaviors can be effective post

hoc raters and are able to provide reliable and valid

identification of disruptive school behavior (Edelbrock,

1979; Gresham, 1982; O'Leary & Johnson, 1979).

A review of current assessment techniques suggests

the emergence of a quantitative/qualitative dichotomy,

which will now be explored. In two reviews (Spivack &

Swift, 1973, 1977) of instruments for measuring secondary

school classroom behaviors no instrument was located which

limits its focus to disruptive school behavior, uses only

behaviorally-stated items, and provides for calculating

local norms. Descriptions follow of representative

instruments currently in use.











The Behavior Problem Checklist (BPC) (Quay & Peterson,

1967, 1975, 1979) is a 55-item scale of behavioral traits

developed from a review of clinical records of kinder-

garten through eighth grade students referred for

psychiatric treatment (Quay, 1977). The items were

assigned by factor analysis to four scales plus a grouping

suggestive of psychosis. Epstein, Cullinan, and Rosemier

(1983, p. 172) and Gresham (1982, p. 137) reported that

the BPC is one of the behavior rating scales most widely

used in school studies.

The BPC has been used extensively both as a research

device (Eaves, 1975; Jacob, Grounds, & Haley, 1982;

Kelley, 1981; Touliatos & Lindholm, 1981) and in selecting

students for interventions (Algozzine, 1977; Balow, 1979;

R. Bower, 1969; Gerard, 1970; Ingram, Gerard, Quay, &

Levinson, 1970; McCarthy & Paraskevapoulas, 1969). Jacob

et al. (1982) reported that reviews of studies utilizing

the BPC suggested reliability and validity issues in need

of further study. The inability of the BPC to provide

other than broadband classifications has been noted

(Achenbach & Edelbrock, 1978).

Comprehensive normative data are not available for

the BPC for adolescents (Kelley, 1981). In an investiga-

tion of the effects of race on BPC ratings, Eaves (1975)

found that white teachers consistently rated black

students higher than white students on three of the











subscales. Black teachers showed no such bias. Eaves

(1975) concluded this bias could have a major effect on

the reported norms for the BPC. Touliatos and Lindholm

(1981) found that grade level, sex, and social class had a

significant effect on BPC ratings. However, differences

between schools and teachers contributed more variance in

the BPC ratings than grade, sex, and social class.

Touliatos and Lindholm (1981) suggested that Quay and

Peterson's (1967) recommendations be followed and

individual assessment be based on norms calculated for

particular schools and individual teachers.

Spivack and Swift (1973) concluded that the BPC was a

reasonably reliable measurement tool. Potential users

were cautioned, however, that most items are not specifi-

cally observable, but more like labels which imply

behaviors and designate traits. Likewise, Stott (1971,

p. 232) cited certain BPC items as requiring a teacher to

make inferences about students' feelings (e.g., "feelings

of inferiority"), being vague or ambiguous (e.g., "oddness,

bizarre behavior"), and relating to behaviors unobservable

by a teacher (e.g., "stays out late at night," "bed

wetting"). This review has identified several areas of

the BPC for which additional research has been suggested.

The Behavior Rating Profile (BRP) (Brown & Hammill,

1978) is composed of five rating scales and a sociogram.

Three of the scales (60 items) are completed by the target











student, one (30 items) by the teacher, and one (30 items)

by parents. The sociogram is a peer nominating techni-

que. The student scales provide self-ratings of behaviors

at home, at school, and with peers.

The BRP is based on an ecological approach which,

according to the authors, recognizes that students'

behaviors are dependent on the settings in which they

occur. Its purposes are the identification of students

with behavior problems and the differentiations among

learning disabled, emotionally disturbed, and behaviorally

disordered students in grades 1-12. Each of the six

measures is described as independent and individually

normed, allowing any scale to be used alone or in conjunc-

tion with any of the others.

The BRP manual (Brown & Hammill, 1978) reports

internal consistency reliability coefficients exceeding

.80. Concurrent validity was investigated by correlating

the BRP with measures obtained from other rating scales.

Adequate construct and content validity also are reported

by the authors. Norms are provided using scale scores

with means of 10 and standard deviations of 3, with scores

from 7 to 13 considered to be in the normal range.
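
For readers unfamiliar with this metric, the 7-to-13 normal range is simply
one standard deviation on either side of the scale mean of 10. The snippet
below shows the generic standard-score transformation that produces scaled
scores of this kind; the raw-score statistics are invented, and the BRP
manual should be consulted for its actual conversion procedure.

    # Generic standard-score transformation (mean 10, SD 3); raw-score statistics are invented.
    def scaled_score(raw, raw_mean, raw_sd, scale_mean=10, scale_sd=3):
        z = (raw - raw_mean) / raw_sd
        return scale_mean + scale_sd * z

    print(scaled_score(raw=42, raw_mean=36, raw_sd=6))   # 13.0, the top of the normal range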

One study (Reisberg, Fudell, & Hudson, 1982) of

behavior disordered students indicated that regular

classroom teachers gave higher ratings than special

educators (X=8.85 vs. X=6.87). Thus, norms may vary











according to the type of respondent (e.g., regular teacher

or special education teacher). Also, students' self-

ratings were inflated relative to other respondents'

ratings. Other investigators have noted problems

associated with attempts at multiple and self-ratings.

Lessing and her associates (Lessing & Clarke, 1982;

Lessing, Williams, & Gil, 1982; Lessing, Williams, &

Revelle, 1981) have reported on their unsuccessful

attempts to develop parallel checklists for use by

parents, teachers, and clinicians in psychiatric diag-

noses. Lobitz and Johnson (1975) found low correlations

between parent ratings and observed behaviors. Variables

confounding self-ratings include halo effect (Holzbach,

1978), social desirability (Dunnett, Koun, & Barber, 1981;

Seidman, Rappaport, Kramer, Linney, Herzberger, & Alden,

1979), and lack of self-knowledge (Beitchman & Raman,

1979).

Ledingham, Younger, Schwartzman, and Bergeron (1982)

investigated teacher, peer, and self-ratings of 801

elementary school students. Self-ratings yielded the

lowest ratings for deviant behavior, aggression, and

withdrawal and the highest ratings for likability. Accu-

racy of self-evaluation has been found to be positively

correlated with high intelligence, high achievement

status, and internal locus of control, characteristics not

usually associated with DSB (Dunnett et al., 1981).











Reported research using the Behavior Rating Profile is

sparse. Additional verification of the assumptions of

equivalency of norms within respondent categories and the

validity of the self-report scales seems indicated.

The Bristol Social Adjustment Guides, 5/e. (BSAG)

(Stott, 1972) consist of 110 behaviorally-stated items

from which teachers select those descriptive of a

student's behavior in the month prior to the rating. The

items were originally developed in 1955 from clinical

observations of children aged 6 to 14 and modified by

classroom teachers (Stott & Sykes, 1956). A primary goal

was to incorporate context into the behavioral descrip-

tions (Stott, 1971).

The BSAG has been used extensively in clinical and

research studies (Davis, Butler, & Goldstein, 1972;

McDermott, 1980; Stott, 1978; Stott & Wilson, 1977).

Reliability and validity data were obtained through

extensive research (Stott et al., 1975) but are not

reported in a manner that is easily abstracted. Normative

data are available only for elementary school populations

(Stott, 1972). More recent research (McDermott, 1980,

1981; McDermott & Hale, 1982) has questioned the

specificity of the core syndromes of the BSAG and called

for further investigation of construct and predictive

validities (Hale & Zuckerman, 1981). At present, it











appears that not all of the core syndromes of the BSAG

have the specificity required in an instrument to be used

in educational placement.

The Hahneman High School Behavior Rating Scale (HHSB)

is a 13-factor, 45-item scale published in 1971 (Spivack &

Swift, 1971). The HHSB items were developed from observa-

tions of actual classroom behaviors, operationally stated

in educational terms. The items cover both academic and

interpersonal issues and can be rated by teachers in the

classroom. The intent is to provide prescriptive informa-

tion (Spivack & Swift, 1977). The factor scores for each

student are found by adding the raw scores for the three

or four items comprising each factor. These scores are

then combined into a profile, which is used to classify

students on the basis of their ability to adapt to total

classroom demands.

According to the authors (Spivack & Swift, 1973),

validity studies suggest consistent and significant

relationships between factor scores and academic grades.

No data are available on test-retest or interrater reli-

ability (Spivack & Swift, 1973). Norms are available

separately for suburban and urban samples. The HHSB is

limited as a selection device for special education pro-

grams by lack of reliability data, use of only three or

four items per factor, and overlapping among profile

descriptions.











The Behavior Evaluation Scale (BES) (McCarney, Leigh,

& Cornbleet, 1983) is a 52-item rating scale for use by

school personnel. Each item is assigned to a subscale

associated with one of the five characteristics of the

Bower (1958) definition of behavior disorders used in

Public Law 94-142. The BES was developed to aid in

diagnosis, placement, and program planning under federal

guidelines. Since federal criteria specifically exclude

the "socially maladjusted" student, the BES is inappropri-

ate for assessing DSB.

The Portland Problem Behavior Checklist (PPBC)

(Waksman & Loveland, 1980) was developed to aid in

assessment, evaluation, and intervention planning for

school children. The 29 items cover teacher-rated

behaviors for grade levels K-12. Norms are not avail-

able. Items are very generally stated (e.g., aggressive-

physical, destructive) and are rated on a scale of 0 (no

problem) to 5 (severe). It is not clear if this is a

rating of frequency of behavior or severity of the

consequences of the behavior. These features of the PPBC

would seem to limit the preciseness and reduce the confi-

dence level of quantitative scores intended to support

evaluation and placement for professional services.

The Pupil Classroom Behavior Scale (PCBS) (Dayton,

1967) is a 24-item, teacher-administered rating scale

intended to measure the effectiveness of special education











services for students displaying inappropriate classroom

behaviors. Most items are behaviorally stated and yield a

profile of three factors, achievement orientation, socio-

academic creativity, and socio-cooperativeness. Dayton

(1967) suggested using the scales for research on groups

rather than to describe individual students. Norms are

not available. Spivack and Swift (1973) concluded that

the PCBS is flawed by having overlapping items in the

factors and lacking data to support a relationship between

scale scores and emotional adjustment.

The 36-item Conners Teachers' Rating Scale (CTRS)

(Conners, 1969) has been used primarily in clinical diagno-

sis of children, particularly in the area of hyperactivity

(Goyette, Conners, & Ulrich, 1978). It does, however,

cover a wide range of school problem behaviors (Roberts,

Milich, Loney, & Caputo, 1981). There appears to be a

high intercorrelation between the Conduct Problem and

Hyperactivity subscales, limiting the usefulness of the

CTRS in identifying DSB.

The Brief Behavior Rating Scale (BBRS) (Kahn &

Ribner, 1982) was developed from the Devereux series of

rating scales (Spivack, Haimes, & Spotts, 1967). A cross-

validation study (Kahn & Ribner, 1982) reported that 61%

of a socially maladjusted group and 27% of an emotionally

handicapped group were correctly identified. These

results suggest that additional development is needed











to obtain support for the discriminant validity of the

BBRS.

Some of the most complete research in instrument

development has been conducted in attempts to improve the

diagnosis of clinical populations in the school environ-

ment. Although these efforts are not directly comparable

to the intent of the present study, six instruments having

potential interest to researchers working in the school

setting will be summarized.

The Child Behavior Check List (CBCL) (Achenbach,

1978) contains 118 behavior problem items and 20 social

competence items. Parallel forms exist for parents and

teachers. A review by Achenbach and Edelbrock (1978) of

empirical attempts to derive syndromes of child behavior

problems concluded with the recommendation that these

efforts be linked to the existing mental health system.

Recent efforts by these researchers and their associates

(Edelbrock & Achenbach, 1980; Reed & Edelbrock, 1983)

continue to pursue this objective. At present the

applicability of this instrument for educational measure-

ment is limited.

The role of parent observations in describing chil-

dren's behavior is formalized in the Louisville Behavior

Check Lists (Miller, 1967, 1980). A study (Tarte, Vernon,

Luke, & Clark, 1982) confirmed the validity of parent

observations of clinical symptoms in their children.











The items require inferences and judgments by raters.

Eight subscales were created through factor analysis and

although several appear to relate to school activities

(e.g., hyperactivity, antisocial), the content of

individual items comprising the subscales renders them

only marginally useful for school assessments.

The Children's Behaviour Questionnaire (Rutter, 1967)

was developed for teachers' use in screening large numbers

of school children for psychiatric assessment. Many

of the 26 items are vaguely stated and some appear to

require inferences by the rater. The two subscales are

labeled neurotic and antisocial, terms which lack direct

application to the school setting.

The Devereux Adolescent Behavior Rating Scale

(Spivack et al., 1967) was developed to measure behavior

requiring professional intervention. The subscales are

oriented to clinical diagnosis and offer little specific

information for use in placement decisions.

The Pupil Behavior Inventory: 7-12 Grades (Vinter,

Sarri, Vorwaller, & Schafer, 1966) is a 34-item, teacher-

administered rating scale intended to furnish information

on students referred for agency treatment. Behavioral

items were collected from teachers, screened and factor-

analyzed, and grouped into five factors. Lack of data on

reliability, validity, and norms suggests caution in











using this instrument to select students for special

services (Spivack & Swift, 1973).

The Mooney Problem Check List (MPCL) (Mooney, 1942),

has been widely used by counselors to identify problems of

individuals seeking counseling or to explore the problem

profile of a group of students (Sundberg, 1961). However,

two studies (Joshi, 1964; Stewart & Deiker, 1976) of the

underlying factors of the MPCL scales have identified only

a single general factor. The MPCL may be further limited

by utilizing items generated from problems mentioned by

high school students in 1942.

Several instruments designed for other populations

include behaviors often used in descriptions of disruptive

school behavior. The Adolescent Behavioral Classification

Project instrument (Dreger, 1980) was developed for

assessing problems of institutionalized adolescents. An

analysis of the first-order factors indicates some common-

alities with both the Hahnemann High School Behavior

Rating Scale (Spivack & Swift, 1977) and Achenbach and

Edelbrock's (1978) syndromes, but many are couched in

clinical terms that have little or no relevance to the

classroom setting.

Ostrov and associates (Ostrov, Marohn, Offer, Curtiss,

& Feczko, 1980) developed and validated the Adolescent

Antisocial Behavior Check List (AABCL) for delinquents

housed in an institutional treatment setting. The authors











called for modification of the instrument for use in other

settings; however, extensive rewriting of items would seem

to be required.

The Jesness Inventory (Jesness, 1972) was created to

measure attitude change in youthful offenders undergoing

treatment. One study (Graham, 1981) found the Jesness

Inventory did not have the power to discriminate between

non-adjudicated and normal populations and thus would not

be useful in a school setting. The Jesness Inventory

appears best suited for research (Buros, 1978, pp.

876-878).

The Jesness Behavior Checklist (JBC) (Jesness, 1970)

is also a measure of delinquent behavior. The reliability

and validity of this instrument have been questioned and

the JBC is recommended only for research purposes (Buros,

1978, pp. 873-876).

Non-quantitative assessment often uses nonsystematic

observations to provide the information from which judg-

ments will be made. Judgments about individuals are

required in all assessment. Inaccurate, biased, or sub-

jective judgments can be misleading and harmful (Salvia &

Ysseldyke, 1981). The Russell Sage Foundation Conference

Guidelines (Goslin, 1969) and the 1974 Family Educational

Rights and Privacy Act (P.L. 93-380--the Buckley amend-

ment) established guidelines for the proper collection,

maintenance, and dissemination of data concerning students.











For data to be used in making judgments, they must be

verified. For standardized tests, this verification is

implicit in the psychometric qualities of the instrument.

For observational data, verification requires confirmation

by persons other than the original observers (Salvia &

Ysseldyke, 1981). When the observation is nonsystematic,

verification may be difficult to establish and support and

the assessment and resulting evaluation may be open to

challenge.

After a classroom teacher nominates a child for

evaluation for exceptional child education services, that

teacher's observation is verified by required legal proce-

dures (P.L. 94-142). There may be no such procedures for

other interventions. The Duval County, Florida, School

District has used teacher and principal nominations as the

criteria for admittance and dismissal from a program to

intervene with students displaying inappropriate social

behaviors (Duval County Public Schools, 1980). Short-term

suspensions in many school districts do not require hear-

ings and are based solely on a judgment by the school

principal (Lines, 1972; Pisarra & Giblette, 1981).

Subjective assessment practices such as these may

allow extraneous variables to influence judgments

(Poulton, 1976). Four such variables are bias, the influ-

ence of observer expectations, inaccurate perceptions,

and vagueness of the criteria for intervention.











Pupil characteristics were found by Ysseldyke and

Marston (1982) to influence rater bias. Variables

contributing to bias include perceived physical attrac-

tiveness (Ross & Salvia, 1975); sex, socioeconomic status,

and reason for referral (Matusek & Oakland, 1979;

Ysseldyke & Algozzine, 1982; Ysseldyke, Algozzine, Regan,

& McGue, 1979, 1981); race (Florida Department of

Education Report on Public Schools, 1983; Sikes, 1975);

type of behavior displayed by the student (Algozzine,

1980); and the theoretical orientation of the observer

(Messick, 1980; Salvia & Ysseldyke, 1981).

Erickson (1974) and Shuller and McNamara (1976) found

naive observers' reports coincided with experimenter-

induced expectancies about problem behavior. After

observing decisions made by educators, Weinrott (1979);

Ysseldyke, Algozzine, and Richey (1982); and Algozzine and

Ysseldyke (1981) speculated that these judgments were

influenced by an expectancy factor created by the

situation itself. A more direct measure of expectation

was reported on by Green and Brydon (1975). They found

teachers' attitudes were much more favorable toward

middle-income children than low-income children and that

43% of teachers' comments about black children were

negative as opposed to 17% of comments about white

children.













Dunlap and Dillard (1980) investigated 164 school

principals' perceptions of the factors indicative of

emotional disturbance in children. The factor least

chosen by the principals was the one considered by the

researchers most predictive of emotional disturbance.

The vagueness of criteria for suspension in one

school district was investigated by Pisarra and Giblette

(1981). They found the criterion to be improper conduct,

which was not further defined. The researchers concluded

that a student reported for fighting would be suspended,

possibly suspended, or not suspended depending on the

individual administrator who had jurisdiction.

A few of the possible sources of error in nonsystem-

atic observation leading to inaccurate, biased, or

subjective judgments have been presented to suggest their

ubiquitous nature and the necessity of providing for

systematic observations in judgments leading to educa-

tional placement decisions.



Rating Scale Development

Designing a rating scale requires addressing four

major issues: (a) what to measure (parameters), (b) how

to measure (item content and format), (c) how to record

(response format), and (d) how to interpret the results

(statistical analysis). Literature pertaining to these

issues will be reviewed in this section.











In a frequently cited longitudinal study of deviant

behavior, Robins (1966) found the variables of type of

behavior, frequency of occurrences, and severity of

consequences to be indicators of future behavior pat-

terns. More recent studies supporting these criteria

include those of Kohn, Koretzky, and Haft (1979); Camp

(1980); Forness and Cantwell (1982); Gresham (1982);

Loeber (1982); and a United States Department of Justice

report (1982, p. 1).

The types of behavior to be measured by a rating

scale are determined by its authors, who must consider

content, sources, format, number, and order of presenta-

tion of the items to be included. Halo effects, or the

tendency to rate individuals holistically (Thorndike,

1920, p. 25; Willingham & Jones, 1958), were found by

Cooper (1981, 1983) to be reduced by having more specific

item content. Kreitler and Kreitler (1981) demonstrated

that items deemed irrelevant by raters tended to be scored

neutrally, thus limiting the derived information. Never-

theless, scales for rating disruptive behavior sometimes

include prosocial behavior content (Miller, 1980).

However, Deno (1979) suggested that to observe non-

disruptive behavior ignores the purpose of these ratings,

i.e., to determine whether inappropriate behaviors are

actually excessive. Schriesheim and Hill (1981) mixed

positive and negative statements on a questionnaire and











concluded that the effect was to reduce response validity.

Many scales do limit their items to problem behaviors

(DiPrete, 1981; Duke, 1978; Governor's

Task Force on Disrupted Youth, 1974; Spivack & Swift, 1966;

Walker, 1979, p. 55), although not necessarily school

problems. Camp (1980) suggested that only school problems

directly observable by teachers and/or administrators be

included in scales for rating disruptive school behavior.

Logically, items taken from the setting in which the

ratings will be made best meet the criteria for relevant

content. Smith and Kendall (1963) used this premise in

devising Behavioral Expectation Scales (BES). Numerous

examples exist of the application of this premise in

education (Brown & Hammill, 1978; Camp, 1980; Duval County

School Board, 1979; Ross, Lacey, & Parton, 1965; Sherry,

1979; Spivack & Swift, 1977; Stott et al., 1975), mental

health (Kaufman, Swan, & Wood, 1979; Kohn et al., 1979;

Lachar & Gdowski, 1979; Miller, 1980) and industry (Vance,

Kuhnert, & Farr, 1978).

Item format refers to the various forms used in

presenting the information to which the rater is asked to

respond. It is often related to response format, which

refers to the methods of collecting information from the

raters. Response format literature will be presented in

the section covering the frequency characteristic.











Four types of item formats are currently in use in

behavioral rating scales. Behavioral Observation Scales

(BOS) describe the target behavior in specific terms that

require direct observation at the time the rating is made

(Latham & Wexley, 1977). Behaviorally Anchored Rating

Scales (BARS) provide a specific description of a behavior

for each successive rating point (anchor) of an item and

assess cumulative behavior over some time period (Smith &

Kendall, 1963). The Mixed Standard Scale (MSS) uses sev-

eral scales, with three levels of behavioral description

for each trait to be measured, and randomizes the order of

presentation (Blanz & Ghiselli, 1972).

Summated rating scales (Edwards, 1957), referred to

as Likert scales (LT) (Likert, 1932) or graphic rating scales

(Waters, Reardon, & Edwards, 1982), present for each item

one statement that may be specific or general. Likert

scales have been used with both direct and deferred obser-

vation. BOS scales are developed using summated rating

procedures (Likert, 1932), while BARS and MSS use the

Thurstone (Thurstone & Chave, 1929) scale development

process (Bruvold, 1969).

Conflicting conclusions have resulted from numerous

investigations into the advantages and disadvantages of

these scale formats. Fay and Latham (1982) found BOS to

be superior to BARS in rating video-taped behavior during

job interviews. However, Murphy, Martin, and Garcia











(1982) questioned the theoretical basis for BOS and found

evidence to suggest that BOS tapped recall for behavior

traits as well as immediate observation. Several studies

(Hom, DeNisi, Kinicki, & Bannister, 1982; Ivancevich,

1980; Keaveny & McGann, 1975; Lee, Malone, & Greco, 1981)

failed to find significant advantages for the BARS format

over summated rating scales or other alternative methods

(Jacobs, Kafry, & Zedeck, 1980; Kingstrom & Bass, 1981;

Schwab, Heneman, & DeCotiis, 1975).

In opposition to MSS theory, Finley, Osburn, Dubin,

and Jeanneret (1977) found evidence to suggest that an

obvious scale format may be superior to a hidden contin-

uum. Dickinson and Zellinger (1980) compared MSS, BARS,

and LT formats and found MSS produced less method bias,

BARS produced as much discriminant validity as MSS and

provided the best feedback to ratees, and LT scales were

easiest to understand and use. When Bruvold (1969) tested

the application of summated scales (Likert, 1932) and

successive interval scales (Edwards & Thurstone, 1952) to

the same data set, no significant differences were found

between the two scaling methods. According to Bernardin

and Smith (1981), one explanation may be that scale

constructors have deviated from the original procedures

(Smith & Kendall, 1963) in developing BARS instruments.

In addition to the Thurstone and Likert scaling

procedures, a third method is available. According to











Edwards (1957, p. 172), a Guttman, or cumulative, scale

(Guttman, 1944, 1945, 1947a, 1947b) requires that the construct

to be measured be unidimensional. Since disruptive school

behavior consists of many discrete behaviors, a Guttman

scale is not suitable for the instrument developed in this

study. At present, it appears that no item format is

superior enough to warrant relinquishing the clarity of

understanding and ease of use (Dickinson & Zellinger, 1980)

of the Likert scale, which presents one descriptive item

at a time to which the rater assigns a quantitative value

from a given range of values.

In determining the number of items to include in a

rating scale, some researchers (Quay & Peterson, 1967,

1979; Spivack & Swift, 1971; Stott, 1972) have relied on

factor analysis, using an arbitrarily chosen factor score

as the cut-off score. Edwards (1957) suggested an intui-

tive approach, utilizing 20-25 items that discriminate

between the groups at the extremes of the scale. A

comprehensive study (Achenbach & Edelbrock, 1978) of 18

rating scales found the range of items to be from 36 to

287 (median = 68 items; mean = 90.4 items). Of the 6

scales intended for use by teachers, 4 contained fewer

than 50 items and 2 contained between 50 and 100 items.

In a study of preferred scale length, Meredith (1981)

found half of the respondents preferred from 20 to 40

items, with 25 the median preferred length. In another










study, Meredith (1975) found a 52-item scale was judged

too long. Seidman and his associates (Seidman et al.,

1979) concluded their 46-item Teacher Behavior Description

Form was too cumbersome and reduced it to 23 items. While

item complexity is probably a factor (Meredith, 1981),

this review suggests a scale using no more than 40 items

would probably be acceptable to most teachers.

The ordering of items within a scale has been

suggested as a possible source of leniency error, halo

effect, and impaired discriminant validity (Blanz &

Ghiselli, 1972). Schriesheim and DeNisi (1980) and

Schriesheim (1981b) found that grouping according to

constructs rather than randomizing questionnaire items

resulted in impaired discriminant validity. Increased

leniency response bias was also found when items were

grouped (Schriesheim, 1981a).

Dickinson and Zellinger (1980) concluded that a

randomized scale contributed as much discriminant validity

as an ordered scale while displaying less method bias. In

a comparison of randomized and grouped scales, the

randomized scale engendered as much convergent and

discriminant validity (Waters et al., 1982). Thus, a

randomized order of presentation seems indicated.
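
One simple way to randomize item order while still avoiding contiguous items from the same construct can be sketched as follows (Python; the items and construct labels are hypothetical): the items are shuffled and then inspected in a single forward pass that swaps any item sharing a construct with its predecessor.

import random

# Hypothetical (item, construct) pairs; actual items would come from the scale.
items = [("talked out of turn", "defiance"),
         ("struck another student", "aggression"),
         ("pushed a classmate", "aggression"),
         ("refused to follow directions", "defiance"),
         ("left seat without permission", "disruption")]

random.shuffle(items)                     # initial random ordering
for i in range(1, len(items)):            # inspect and rearrange adjacent repeats
    if items[i][1] == items[i - 1][1]:
        for j in range(i + 1, len(items)):
            if items[j][1] != items[i - 1][1]:
                items[i], items[j] = items[j], items[i]
                break
# Note: a single pass of this kind may still leave repeats if one
# construct supplies most of the items.
print([label for label, construct in items])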

Obtaining a meaningful measure of the frequency of

target behaviors requires attention to the variables of

response format, length of the observation period, and










type and number of raters. According to Tzeng (1983),

four response formats are most frequently cited in the

literature. They can be differentiated in terms of two

psychometric criteria. First, the existence of a neutral

response option defines the free choice format. Absence

of a neutral rating option defines the forced choice

format. Second, categorical (qualitative) ratings answer

the question "Does the ratee fit this category?" while

discriminatory (quantitative) ratings answer the question

"To what degree does the ratee fit?"

Tzeng (1983) criticized forced choice measures for

their omission of a valid response category, i.e., uncer-

tainty or neutrality of the raters' perceptions. King,

Hunter, and Schmidt (1980) concluded that a forced choice

format was ineffective in reducing rater halo. Dunnette

(1963, p. 96) reported that rater resistance to forced

choice formats led to their abandonment.

Categorical, or qualitative, formats used in

checklists cannot detect relative differences in degree

between two behaviors performed by the same ratee or

between the same behaviors among ratees (Tzeng, 1983).

Johnson, Smith, and Tucker (1982) found less response

skewness on a 5-point Likert discriminatory scale compared

to a yes/?/no categorical format. A zero-based discrimina-

tory, free choice response format seems most appropriate

(Likert, 1932). The absence of a behavior can be indicated










by the 0 position or, if present, the perceived frequency

can be indicated by choosing a value from the remainder of

the scale (Edwards, 1957).

The number of value choices permitted to the rater is

a critical issue. If few points are used, some information

may be lost, but the scales are less ambiguous for the

rater. If there are too many points, the discrimination

may be too fine for the rater to make. Albaum et al.

(1981) attempted to show superiority for a continuous

scale format, but concluded that equivalent aggregate

measurements were obtained from a 5-category, discrete

rating scale.

Likewise, Bernardin et al. (1976) and Bardo and

Yeager (1982) failed to find continuous scales superior to

discrete scales. The superiority of a 5-point, discrete

rating scale has been suggested by Cowen, Dorr, Clarfield,

Kreling, McWilliams, Pokracki, Pratt, Terrell, and Wilson

(1973); Lissitz and Green (1975); McKelvie (1978); Neumann

and Neumann (1981); and Broadbent, Cooper, Fitzgerald, and

Parkes (1982).

Conversely, Bardo and his associates (Bardo & Yeager,

1982; Bardo, Yeager, & Klingsporn, 1982) found obtained

means and variances closer to the expected values for

4-point scales over 5- and 7-point scales. These results

appear contrary to most other studies. Edwards (1957, pp.










150-151) gives Likert's original statistical rationale for

the use of a 5-point scale, anchored with the integers 0

through 4, and the summation of scores for individual

items as a total score for each ratee. Current research

provides no compelling evidence for departing from this

original format.

An anchor, e.g., "always," "sometimes," "never," is

usually associated with each scale point of a Likert-type

summated rating scale (Pohl, 1981). While a variety of

anchors has been used, the basis for the selection is

often not stated (Beatty, Schneier, & Beatty, 1977;

Broadbent et al., 1982; Camp, 1980; Cowen et al., 1973;

Hunter, Hunter, & Lopis, 1979; Kassin & Wrightsman, 1983;

Moses, 1974; Siegel, Dragovich, & Marholin, 1976; Solomon

& Kendall, 1977; White, 1977).

Several studies have investigated the assumptions

involved in the selection of one popular set of anchors:

always, often, occasionally, seldom, and never. Parducci

(1968), Chase (1969), and Pepper and Prytulak (1974) con-

cluded that the meanings of anchor words were influenced

by context. The effects of individual differences among

raters on their interpretations of anchor words were

demonstrated by Helson (1969) and Goocher (1965). These

studies suggested that the above anchors may not define

perceptually equal intervals along the rating continuum.











Four studies (Bass, Cascio, & O'Connor, 1974;

Schriesheim & Schriesheim, 1974, 1978; Spector, 1976) have

sought to select five anchor words that would be perceived

by raters as defining equally spaced rating intervals.

However, the most definitive study appears to be Pohl's

(1981) partial replication of the Bass et al. (1974) and

Schriesheim and Schriesheim (1974, 1978) studies. Using

responses from 164 college students, Pohl (1981) calcu-

lated the means and standard deviations for 39 expressions

of frequency.

Comparing these with the theoretical mean responses

for a 5-point equal interval scale, Pohl (1981) derived

the response set of always, quite often, sometimes, very

infrequently, and none of the time. The calculated mean

(26.71) for the mid-point term "sometimes" differed signi-

ficantly (p < .001) from the theoretical mean (29.05), but

nevertheless was the value closest to the optimal for a

5-point scale. The other calculated values were not

significantly different from the theoretical profile.

Thus, with the exception of the mid-point term, it appears

that the anchors produced by the Pohl (1981) study

adequately defined equal-appearing intervals on a 5-point

rating scale.
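
The selection logic can be sketched as follows (Python); the rater responses, scale range, and theoretical values shown are hypothetical, and only the five final expressions are used as candidates rather than Pohl's full set of 39.

import statistics

# Hypothetical magnitude ratings of candidate frequency expressions.
candidate_means = {
    "always": statistics.mean([49, 50, 48, 50]),
    "quite often": statistics.mean([38, 41, 40, 37]),
    "sometimes": statistics.mean([27, 25, 28, 26]),
    "very infrequently": statistics.mean([12, 10, 11, 9]),
    "none of the time": statistics.mean([0, 1, 0, 0]),
}
# Hypothetical theoretical values for a 5-point equal-interval scale.
theoretical = [0, 12.5, 25, 37.5, 50]

# For each theoretical scale point, choose the expression whose observed
# mean lies closest to it.
anchors = [min(candidate_means, key=lambda w: abs(candidate_means[w] - t))
           for t in theoretical]
print(anchors)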

The length of the period for which behaviors are to

be rated has been little studied. For instance, the

manual for the Behavior Problem Checklist (Quay &











Peterson, 1967, 1975, 1979) does not specify for the rater

the inclusive time period to be considered in rating the

listed behaviors. The authors of the Devereux Elementary

School Behavior Rating Scales (Spivack & Swift, 1966)

instructed their raters to "consider recent and current

behavior" (p. 75). The same authors (Spivack & Swift,

1977), in developing the Hahnemann High School Behavior

Rating Scale, instructed teachers to base ratings on

behavior observed "over the past month" (p. 300).

A study (Hinton, Webster, & O'Neill, 1978) of hospi-

talized clinical patients used a 6-week time period. An

investigation (Beatty et al., 1977) of performance rating

in a data processing firm utilized three assessment

periods of two months each for a total of six months. In

a study of several response formats, Broadbent et al.

(1982) used a 6-month inclusive time period. However, in

none of these studies was a rationale given for selection

of the time period.

Two attempts at aggregating measures over specific

time periods have provided more precise instructions to

the rater. Cowen et al. (1973) defined each of five

rating points in terms of the inclusive time periods to be

considered when aggregating occurrences of behavior. For

example, the fourth anchor point, often, was defined as

"you have seen this behavior more often than once a week











but less often than daily" (p. 16). Camp (1980) used the

following scale:

Frequency of occurrence

0 Never observed
1 Once or more in semester
2 Once or more monthly
3 Once or more weekly
4 Once or more daily (p. 11)

The work of Seymour Epstein (1980), in support of the

stability over time of personality traits, bears directly

on the issue of aggregating behavior ratings over some

time period. Epstein (1980) stated that "stability can be

demonstrated as long as the behavior in question is

averaged over a sufficient number of occurrences" (p.

791). In testing this hypothesis, Epstein conducted four

studies in which he used, among other types, ratings

performed in classrooms by teachers. Epstein suggested

aggregating behavior over subjects, stimulus situations,

time, and modes of measurement in order to establish

predictive reliability and validity (p. 797).

Ratings of middle and junior high school students by

their teachers in different courses would meet the

conditions of subjects and situations. Epstein (1980)

suggested that ratings at a single time following multiple

or extended observations represent an intuitive averaging

that has the "potential for producing highly replicable

and valid results" (p. 802). Harrop (1979) also

challenged the common assumptions (Fay & Latham, 1982;











Latham, Fay, & Saari, 1979) that coding of directly

observed behaviors produced superior results to aggregat-

ing behaviors over time.

A related concern in the assessment of school-related

behavior is selection of the time of year in which the

ratings will be made. Several studies (Cowen et al.,

1973; Epstein et al., 1983; Larrivee & Bourque, 1980)

recommend allowing student behavior and teacher percep-

tions to stabilize. Supporting these decisions are data

from the Texas Junior High School Study (Evertson,

Anderson, & Brophy, 1979).

Evertson and Veldman (1981) found a moderate but

steady increase in serious misbehavior over the course of

the school year and an increase in general misbehavior in

April. Evertson and Veldman (1981) concluded that short-

term studies should avoid ratings made either early or

late in the school year. The available literature seems

to suggest the feasibility of aggregating behaviors over

time periods specified in the rating scale instructions

and after teachers have had at least two months to observe

student behavior.

Deciding on the most appropriate type of rater to use

in assessing children's behavior has long been a problem.

In 1965, Ross et al. recognized the potential usefulness

of teacher ratings. Teachers' ratings have been found to

be more accurate than peer ratings of classroom behaviors











(Bailey, Bender, & Montgomery, 1983); other school profes-

sionals' ratings (Bower & Lambert, 1971, p. 143; Fremont

& Wallbrown, 1979); and institutional child care workers'

ratings (Kohn et al., 1979) and to be equivalent to the

ratings obtained by a multidimensional scaling technique

(MDS) applied to classroom behavior (Sanson-Fisher &

Mulligan, 1977).

A number of researchers have found support for

teacher ratings as appropriate measures of general class-

room behaviors (Solomon & Kendall, 1977), social behavior

(Loranger, Lacroix, & Kaley, 1982), assertive vs. aggres-

sive behavior (Roberts & Jenkins, 1982), acting out

behavior (Walker, 1970), and behavior that would likely

result in referrals for exceptional child education (Dean,

1980; Epstein et al., 1983; Horne & Larrivee, 1979; Lahey,

Green, & Forehand, 1980; McKinney & Forman, 1982; Roberts

et al., 1981).

Not all studies have yielded positive results. Morris

and Arrant (1978) found that regular classroom teachers

tended to see more behavior problems in students referred

for evaluation than did school psychologists. A study

(Kazdin, Esveldt-Dawson, & Loar, 1983) of psychiatric

inpatient children found extra-class raters' evaluations

of overt classroom behaviors to correspond more closely to

direct observational data than did teachers' ratings.

However, teachers were more accurate than the extra-class











raters in identifying hyperactive children using a behav-

ior checklist. Overall, the evidence suggests strong

support for the use of teachers as raters of classroom

behavior.

An associated issue is the use of multiple raters to

increase reliability and reduce halo effect (Epstein,

1980). Ratings of students commonly are obtained from all

teachers having direct classroom contact (Linton & Chavez,

1979; Wixson, 1980). This procedure could result in as

few as one or perhaps as many as seven ratings, depending

on the grade level and local practice.

More recent research efforts have focused on

empirically determining the most effective number of

raters. Prinz and Kent (1978) increased from 1 to 4 the

number of raters of parent-adolescent interactions in a

clinical setting and reported increased reliabilities.

Both reliability and concurrent validity of clinical

judgments were shown to increase when the number of judges

was increased from one to ten (Horowitz, Inouye, &

Siegelman, 1979). Strahan (1980) extended the Horowitz et

al. (1979) study and concluded that after using four

raters, adding additional ones contributed little to

measurement effectiveness. Another study (Green, Bigelow,

O'Brien, Stahl, & Wyatt, 1977) of inpatient clinical

behaviors found little improvement when using more than

four raters.











Although in general agreement with the above studies,

a cautionary note was added by Kenny and Berman (1980),

who pointed out that if raters are completely unreliable,

increasing their numbers will not increase reliability.

The number of teachers usually available in a middle or

junior high school to serve as raters would appear to be

adequate to contribute to both improved reliability and

concurrent validity.

Various classifications of severity have been adopted

in school settings. Student conduct codes typically use

some method of indicating seriousness of offenses, such as

"serious misconduct" (Pinellas County Schools, 1983, p. 7)

and "minor, intermediate, and major" (Duval County Public

Schools, 1980, p. 16). Researchers (Pisarra & Giblette,

1981) have used categories emphasizing the targets of the

behavior (e.g., offenses against persons, offenses against

state laws). Teachers often focus on specific behaviors

(e.g., use of drugs, striking teacher) (Camp, 1981) and

administrators have used a combination of both (National

School Public Relations Association, 1973).

There is little consensus on the number of levels to

be used in assigning degrees of severity. Taylor et al.

(1979) used levels ranging from 1 (not very severe) to 4

(extremely severe). Camp (1980) used 0 for "not con-

cerned" through 4 for "extremely concerned." In an

earlier study, Moses (1974) used three levels, 1 (mild), 2











(moderate), and 3 (severe) in asking mental health and

criminal justice professionals to rate a list of problem

behaviors. To use too many levels may imply a degree of

confidence in discrimination not supported by the subjec-

tive nature of such ratings.

Not all rating scale authors and researchers accept

the necessity for including a severity rating (Searls,

Isett, & Bowders, 1981; Spivack & Swift, 1977). Even

when, as in the Behavior Problem Checklist (Quay &

Peterson, 1967, 1975, 1979), a severity factor is provided

for, the author does not always recommend its use. How-

ever, at the practitioners' level the degree of severity

of behaviors is a major concern.

Algozzine (1979), using items characteristic of

several behavior rating scales, developed the Disturbing

Behavior Checklist which asks teachers to rate the degree

of disturbance they experience as a result of different

student behaviors. This suggests a consequence to the

teacher based not on the frequency of the behavior, but on

the type and severity. After noting irregularities and

lower reliabilities, Taylor et al. (1979) had teachers

rate for severity 26 items of Part Two of the Adaptive

Behavior Scale (ABS) (Nihira, Foster, Shellhaas, & Leland,

1969). Teachers were able both to categorize behaviors

and rate them in terms of severity, leading Taylor et al.

(1979) to conclude that this additional information would











be useful in refining the scale and adding to its clinical

efficacy.

Inasmuch as the instrument developed in this study is

intended to have locally developed norms, the statistical

techniques used in the norming procedure and the comparison

of individual scores to the derived local norms are not

complex. While some more recent studies have focused on

problems associated with such common procedures as the

calculation of measures of central tendency (Mosteller &

Tukey, 1977; Stavig, 1978, 1982), many researchers

continue to rely on descriptive statistics utilizing raw

scores, arithmetic means, standard deviations, and stan-

dard scores.

White (1977) compared individual students' scores on

classroom behavior to the computed mean score for five

classes of "Follow Through" program students in order to

identify immature students. In a business setting, Fay

and Latham (1982) used means and standard deviations in

comparing scores obtained using two different rating

methods. A study (Lyness & Cornelius, 1982) comparing

judgment strategies and ratings of college instructors

supported the use of a rating scale composed of discrete

items, with an overall rating calculated by weighting the

items and summing the weighted scores. To obtain mean

sub-scores for subjects, Algozzine (1980) summed scores

across the items defining each of four factors of











disturbing behaviors and used means and standard devia-

tions in analyzing the results.

The cited studies seem to support the use of descrip-

tive statistics in both obtaining individual scores (i.e.,

sum of weighted ratings) and deriving a local norm (i.e.,

mean) from ratings of a representative sample of a total

population. Salvia and Ysseldyke (1981, chap. 4) offer

definitions of common terms for descriptive statistics

applied to assessment.
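
A minimal sketch of these procedures, assuming hypothetical weighted total scores from a representative local sample, follows (Python).

import statistics

# Hypothetical weighted total scores (sum of frequency x severity over items)
# for a representative sample of students in one school.
sample_scores = [12, 7, 25, 3, 18, 9, 14, 6, 21, 11]

local_mean = statistics.mean(sample_scores)    # the local norm
local_sd = statistics.pstdev(sample_scores)    # variability of the local sample

student_score = 27                             # one referred student's weighted total
z_score = (student_score - local_mean) / local_sd   # standing relative to the norm
print(local_mean, round(local_sd, 2), round(z_score, 2))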



Psychometric Properties of Rating Scales

Historically, rating techniques have aroused contro-

versy over estimations of validity and reliability (Ryan,

1958). Validity is the relevance of the scale to the

variables being measured. Most sources recognize three

types of validity, i.e., content, criterion-related or

concurrent, and construct (American Psychological

Association, 1966; Cronbach, 1970; Kerlinger, 1972).

Reliability is the accuracy or precision of a measuring

instrument and has usually been classified as

temporal, inter-rater, or internal (Cronbach, 1970).

However, investigations (Epstein, 1980) into the

effects of situations on behavior have recently introduced

a fourth consideration, situational reliability, or the

consistency of behavior across settings. The development

of norms against which to compare results obtained from











individual administrations of rating scales is another

area of active investigation (Mendelsohn & Erdwins, 1978;

Messick, 1980). Research on these issues is reviewed in

this section.

Content validity refers to the relevance and repre-

sentativeness of the items used in construction of a scale

(Epstein, 1980). Often, this is determined by obtaining

judgments from experts not otherwise involved in the scale

construction (DiStefano, Pryer, & Erffmeyer, 1983; Jones

et al., 1975, p. 83; Lawshe, 1975; Thorne, 1978).

Kreitler and Kreitler (1981) found that item content

determined the rater's perception of the central theme of

an instrument. Items not perceived as relevant to the

central theme tended to be given neutral responses, thus

limiting the information contributed by the rater.

Criterion-related validity is studied by comparing

scores obtained from an instrument with one or more

external criteria of the variable being measured

(Kerlinger, 1972, p. 459). Criterion-related validity

encompasses both concurrent and predictive qualities

(Epstein, 1980). The comparison of scale results with an

independent judgment or diagnosis of a subject is an

example of an attempt at estimating criterion-related

validity. If the judgment or diagnosis confirms the scale

indications, the inference may be drawn that the scale is

in agreement with the concurrent diagnosis and is











predictive that others given a similar rating would also

be diagnosed similarly (Kohn et al., 1979; Mendelsohn &

Erdwins, 1978).

In one validation study, Harris, Kreil, and Orpet

(1977) used the school principal, guidance counselor, and

two teachers as judges in selecting both disruptive and

prosocial students for rating by the Behavior Coding

System (Patterson, Ray, Shaw, & Cobb, 1969). In develop-

ing the Pittsburgh Adjustment Survey Scales (Ross et al.,

1965), school principals were used to nominate adjusted,

withdrawn, and aggressive students for rating by their

teachers and scale results were compared with these

nominations.

According to Kerlinger (1972, p. 461) and Cronbach

(1970), the significance of construct validity is its

concern with the theory behind the variable being

measured. Guion (1977) argues that construct validity

integrates both content and criterion considerations.

Likewise, the usefulness of content and concurrent

validity is questioned by Sanson-Fisher and Mulligan

(1977) and construct validity is supported.

A definition of construct validity as the process of

ascribing meaning to scores is offered by Stenner and

Smith (1982). Messick (1980) broadens the concept of

validity to include both test interpretation and test

use. Messick (1980) describes construct validity as











"interpretive meaningfulness" (p. 1015) and suggests that

it rests on four bases: convergent and discriminant

validity, ethical interpretation, relevance and utility

for the specific application, and the consequences follow-

ing use of the instrument.

To be interpretable, a rating scale must be reli-

able. That is, a scale must produce similar results when

applied to the same person over several administrations,

the instrument must be relatively free of errors of mea-

surement, and the results must closely approximate the

"true" value of the variable for the person being rated

(Cronbach, 1970; Kerlinger, 1972).

Typically, test-retest data are compiled for varying

time periods between administrations. The correlation

between the two obtained scores is used to justify esti-

mations of temporal stability and, in the case of rating

scales, intra-rater reliability. Examples of reported

test-retest intervals include one week (Duval County

School Board, 1979; Quay, 1977), two weeks (Mendelsohn &

Erdwins, 1978; Russell, Lankford, & Grinnell, 1981) and

two years (Quay, 1977). However, Masterson (1968) pointed

out that low test-retest correlation coefficients may

reflect the transitory nature of the measured variable and

suggested high coefficients of internal consistency may be

more indicative of reliability for some instruments.
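
For example, temporal stability might be estimated from two administrations as follows (Python; the scores are hypothetical, and statistics.correlation requires Python 3.10 or later).

import statistics

# Hypothetical total scores for the same five students rated twice,
# one week apart, by the same rater.
first_administration = [10, 4, 22, 15, 7]
second_administration = [12, 5, 19, 16, 6]

# The Pearson correlation between the two administrations serves as the
# test-retest (intra-rater) reliability estimate.
stability = statistics.correlation(first_administration, second_administration)
print(round(stability, 2))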











Internal consistency has often been estimated by

inter-item and item-total analysis (Edwards, 1957;

Kerlinger, 1972). In these procedures, an individual's

rating on one item is compared with the rating on all

other items or with the total score from the scale or

subscale to estimate the degree to which each item is

similar to the other items. Item analysis may be impor-

tant in reducing errors of measurement attributable to the

composition of the instrument (Benson & Clark, 1982).

However, internal consistency may not provide good reliabi-

lity estimation for a rating scale assessing constructs

comprised of many discrete behaviors (Kerlinger, 1972).
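
An item-total analysis of this kind can be sketched as follows (Python; the 0-4 ratings are hypothetical, and each item is correlated with the sum of the remaining items).

import statistics

# Hypothetical ratings: rows are ratees, columns are scale items (0-4).
ratings = [
    [0, 1, 0, 2],
    [3, 4, 2, 3],
    [1, 2, 1, 1],
    [4, 3, 4, 4],
    [0, 0, 1, 0],
]

for item in range(len(ratings[0])):
    item_scores = [row[item] for row in ratings]
    rest_of_scale = [sum(row) - row[item] for row in ratings]  # total minus the item
    r = statistics.correlation(item_scores, rest_of_scale)     # corrected item-total r
    print(item + 1, round(r, 2))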

Some research (Rosenthal & Jacobson, 1968; Sulzbacher,

1973) into observer bias has suggested that beliefs about

ratees may affect rater perceptions and, consequently, the

reliability of the ratings. In three studies (O'Leary &

Kent, 1973; Shuller & McNamara, 1976; Siegel et al., 1976)

of disruptive classroom behavior, while biasing informa-

tion experimentally introduced was found to influence

global ratings, it had no significant effect upon results

obtained from behaviorally stated scales. Siegel et al.

(1976) suggested that behaviorally specific items reduce

bias and improve inter-rater and intra-rater reliability.

The degrees of agreement among different raters on

measures of the same subjects at the same time in the same

setting have been used to indicate the inter-rater











reliability of an instrument (Cronbach, 1970). Also, the

agreement among different raters of subjects in the same

settings at different times has been used for the same

purpose (Cronbach, 1970). In middle and junior high

schools, these conditions do not usually occur naturally.

Fortunately, investigations of trait consistency in

subjects (Abikoff, Gittelman, & Klein, 1980; Epstein,

1980; Mischel, 1969) have encouraged the comparisons of

ratings by different raters over the same elapsed time

periods, but for different settings and situations,

conditions which do occur naturally in the secondary

school setting.

Epstein (1980) concluded that subjects do manifest

trait consistency, if aggregation techniques are applied

in assessing behaviors. Epstein (1980) suggested aggre-

gation over raters (e.g., teachers), situations (e.g.,

classrooms), occasions (e.g., class periods), and measures

(e.g., disciplinary records). Epstein further suggested

that when single ratings are made after extended periods

of observation, these ratings are similar to aggregated

ratings in that they represent an intuitive averaging of

ratings over many observations. Thus, reliability may be

improved by combining different teachers' ratings of the

same student over the same portion of the school year.
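
Such aggregation might be sketched as follows (Python; the 0-4 ratings by three teachers of one student are hypothetical). Item ratings are averaged across raters before summing.

import statistics

# Hypothetical ratings of one student on four items by three teachers.
teacher_ratings = [
    [2, 4, 1, 3],   # teacher A
    [3, 4, 0, 2],   # teacher B
    [2, 3, 1, 3],   # teacher C
]

# Average each item across raters, then sum the item means; the aggregate
# smooths the idiosyncrasies of any single rater.
item_means = [statistics.mean(values) for values in zip(*teacher_ratings)]
aggregated_total = sum(item_means)
print([round(m, 2) for m in item_means], round(aggregated_total, 2))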

According to Cooper (1981), perhaps the most ubiqui-

tous challenge to inter-rater reliability is halo error











(Thorndike, 1920) or the tendency of a rater to allow

overall impressions of an individual to influence judgment

of specific areas of behavior (Holzbach, 1978). Attempts

(Landy, Vance, Barnes-Farrell, & Steele, 1980; Landy,

Vance, & Barnes-Farrell, 1982) to statistically control

for halo effects have apparently not succeeded (Harvey,

1982; Hulin, 1982; Mossholder & Giles, 1983; Murphy, 1982).

One exploration of ways to reduce halo error resulted in a

restatement of classic advice: do not use rating cate-

gories that are imprecise and overlapping (Cooper, 1983).

In an extensive review of the literature, Cooper (1981)

concluded that of nine methods currently employed to reduce

halo effect, all leave residual illusory halo.

Studies of variables affecting reliability have iden-

tified several other challenges to the accuracy of school

behavior ratings. The sex of the teacher was found in two

studies (Levine, 1977; Silvern, 1978) to be correlated

with ratings of classroom behavior, with male teachers

consistently reporting lower levels of disruptive behav-

ior. Teachers' ratings seemed to be influenced by special

education labels in one study (Fogel & Nelson, 1983). In

two studies (Marwit, 1982; Marwit, Marwit, & Walker, 1978),

perceived unattractiveness of students has been shown to

correlate with higher ratings of disruptive behavior.

While challenges to reliability from a variety of

sources have been observed, several studies (Bernardin &










Pence, 1980; Fay & Latham, 1982; Latham, Wexley, & Pursell,

1975; Madle, Neisworth, & Kurtz, 1980; Pursell, Dossett, &

Latham, 1980) have suggested that training in the use of

rating scales may be effective in reducing errors of

measurement. This review of studies of validity and

reliability has identified some sources of and counter-

measures for errors of measurement. Next, studies of the

variables affecting the norming of rating scales will be

reviewed.

Several writers have shown concern for the relation-

ship between behavior and the context in which it occurs.

The social value of a test, according to Messick (1980),

is determined by its instrumental value for a particular

setting. Willems (1975) stated that few phenomena have

meaning independent of the context in which they occur.

Likewise, researchers were cautioned by Dickinson (1978)

to evaluate behavior only in an environmental context.

Epstein (1980) referred to the "extreme situational

specificity of behavior" (p. 794) and warned that experi-

ments conducted in a single situation cannot be relied on

to generalize across even minor variations in stimulus

conditions. Others supporting this psychosocial approach

include Sherif (1954); Erickson (1963), quoted in Tinto,

Paclilio, and Cullen (1978); Salvia and Ysseldyke (1981,

p. 378); and Zammuto, London, and Rowland (1982).










Schools were described by Garbarino (1980) as "con-

texts for behavior and development" (p. 19). Some of the

characteristics of schools which may influence levels and

interpretations of disruptive behavior are size of enroll-

ment (DiPrete, 1981, p. 86; Garbarino, 1980; Kowalski,

Adams, & Gundlach, 1983); public or private administration

(DiPrete, 1981, p. 81); control orientation (e.g., human-

istic vs. custodial) (Deibert & Hoy, 1977; Gaynor &

Gaynor, 1976); degree of person-environment fit (Kulka,

Klingel, & Mann, 1980); traditional vs. open classrooms

(Solomon & Kendall, 1975); length of faculty tenure

(DiPrete, 1981, p. 107); socioeconomic level of the host

community (Kowalski et al., 1983); and region of the

country (DiPrete, 1981, p. xx; Kowalski et al., 1983).

Researchers advocating the use of local norms for behav-

ioral measurements include Fremont and Wallbrown (1979);

Mendelsohn and Erdwins (1978); Quay and Peterson (1967);

Smith (1976); Walker and Hops (1976); and Wallbrown,

Wallbrown, and Blaha (1976).

The effects of sex, age, race, and socioeconomic

status on ratings of disruptive behavior have been

frequently studied. The types of disruptive behavior

displayed in both educational and clinical settings have

not been found to be significantly different for the

variables of sex (Behar & Stewart, 1984; Epstein et al.,

1983; Morris & Arrant, 1978; Stott et al., 1975, p. 166),










age (Behar & Stewart, 1984; Ghodsian, Fogelman, Lambert, &

Tibbenham, 1980; Stott et al., 1975, p. 83), race (Gajar &

Hale, 1982), or socioeconomic status (Behar & Stewart,

1984; Stott et al., 1975, p. 97). Thus, providing for

separate norms for these variables seems unnecessary in

any scale rating only disruptive behaviors.



Uses of Behavior Rating Scales

Bailey et al. (1983) supported the use of rating

scales in program planning and evaluation. Likewise, the

lack of effective measurement devices was seen by Hirshoren

and Heller (1979) as limiting the evaluation of program

effectiveness. Mesinger (1982) called for the use of

appropriate measurement devices in providing services for

deviant youth within the public school setting. Cooper

(1983), Peed and Pinsker (1978), and Beatty et al. (1977)

have suggested providing rating scale results to ratees to

influence behavior changes. Using rating scales to pro-

vide a standardized description of behavioral problems has

been suggested (Edelbrock & Achenbach, 1978).

In a study comparing resource room delivery models,

Wixson (1980) used a behavior rating scale in developing

and evaluating intervention programs for various cate-

gories of handicapped children. Morton Bortner (Buros,

1978, p. 493), reviewing the AAMD Adaptive Behavior Scale,

pointed out its usefulness for evaluating the progress of











individuals and evaluating program goals. The Duval

County School Board (1979) used a locally constructed

behavior checklist to evaluate their grant-funded program

for disruptive students.

Several programs which retained students in their

regular classrooms have used behavior scales for evalua-

tion purposes. Walker and Holland (1979) and Linton and

Chavez (1979) developed and used rating scales for this

purpose in elementary and junior high schools, respec-

tively. The Hahnemann High School Behavior Rating Scale

(Spivack & Swift, 1977) was intended to provide teachers

with a practical means of describing disruptive classroom

behavior to parents and other school personnel. In a

study of junior high school truants, Nielsen and Gerber

(1979) used a behavior rating scale to match school inter-

ventions with student needs.

A quantitative measure of disruptive behavior was

developed by Mendelsohn and Erdwins (1978) to assist

community agencies in devising programs for expelled

students. Haskell (1979) developed a method of quantify-

ing clinical behavior in institutional settings to provide

a basis for planning individual programs and evaluating

results. McSweeney and Trout (1979) used the Jesness

Behavior Checklist (Jesness, 1970) to evaluate the social

progress of deviant children in a wilderness camp pro-

gram. Five reasons for obtaining measures of students










are offered by Salvia and Ysseldyke (1981): "Screening,

placement, program planning, program evaluation, and

assessment of individual programs" (p. 14). Behavior

rating scales have been used to obtain measures for each

of these needs.



Summary

This review has identified five approaches in the

literature to defining disruptive school behavior (DSB). A

conceptualization of DSB based on the interactions of

students, teachers, and administrators within the school

setting was suggested as most relevant for the development

of an instrument to quantify DSB.

Psychometric challenges to the use of rating scales

for identifying behavioral characteristics were consid-

ered. Research was cited to suggest that teachers using

reliable and valid scales could accurately identify DSB.

Nineteen instruments available for assessing problem

behaviors were reviewed. None appeared to meet the

psychometric criteria required for educational placement

decisions. Possible sources of error in nonsystematic

observations were presented with the suggestion that

inaccurate, biased, or subjective judgments may result.

Type, frequency, and severity of behaviors were

related to item content, item format, and response

format. Support was found for the inclusion of these











measurement parameters in assessing DSB. The use of

descriptive statistics in current research for obtaining

individual behavior ratings and deriving local norms was

demonstrated.

A review of the sources of error in measurement was

conducted and counter-measures for improving validity and

reliability estimations were suggested. A number of

variables affecting the norming of rating scales were

investigated. Research evidence rejected separate norms

based on gender, race, or socioeconomic status. The

effective use of behavioral instruments in a variety of

settings was documented, suggesting the suitability of

such a device for describing students who display DSB.














CHAPTER THREE

METHODOLOGY



The purpose of this study was to develop and validate

an instrument, the Disruptive Student Behavior Scale

(DSBS). The DSBS is intended to be used to assess

quantitatively the disruptive school behaviors of students

referred for placement in either special education or

alternative education programs. This chapter presents the

research questions, defines the target population,

presents a plan for constructing the scale, describes

procedures for a pilot study, details statistical tests

and procedures for the data analyses, and discusses

possible limitations of the study.



Research Questions

1. Does the content of the DSBS represent behaviors

recognized and accepted by educators as occurring

in and disruptive to the school environment?

2. In the judgment of experts, does the DSBS contain

an equitable distribution of items descriptive of

the underlying theoretical constructs that

identify disruptive students and discriminate them

from non-disruptive students?












3. To what degree does the DSBS demonstrate

criterion, convergent, and discriminant validity?

4. To what degree does the DSBS provide ratings which

are stable over time?



Construction of the DSBS

The following plan is a modification of a suggested

procedure (Benson & Clark, 1982) for rating scale construc-

tion. A review of disruptive school behavior (DSB)

literature provided a research base for defining the

constructs comprising DSB. A total of 303 descriptive

items and 22 categories were found in 36 studies. After

eliminating duplications and items not pertaining directly

to DSB, 56 items remained. Combining similar categories

resulted in a total of 13 potential categories of behav-

iors associated with DSB.

In a project conducted by the Research Committee of

the Psychological Services Department of the Duval County,

Florida School District, the 56 items and 13 categories

were presented to 16 teachers of middle school students

enrolled in a behavior management program for disruptive

students. The rating group was composed of 10 females and

6 males, and all had at least two years' full-time teaching

experience. Group members were instructed to assign each

item to one or more of the 13 categories. Instructions

and results are reproduced in Appendix A.











The judges' ratings and comments resulted in the

retention of 10 categories, which were considered to be

one set of constructs which could be used in identifying

DSB. A tentative definition of each construct was

formulated using the descriptive items assigned by the

teachers. An attempt was made to verify the inclu-

siveness of these derived constructs. A frequency

distribution was prepared for all of the conduct code

violations reported for chronic violators in a sample of

Duval County, Florida, elementary, middle/junior high,

high, and alternative schools (Moses, 1981). Of 7717

behavior violations, 7686, or 99.6%, were included within

the definitions of the proposed constructs.

The items, as taken from the studies and used in

developing the constructs, were not considered specific

enough for use in a quantitative rating scale. However, a

readily available pool of potential items was located in

the disciplinary referral records of an inner-city junior

high school in a metropolitan Florida school district.

Verbatim transcriptions were made of the reasons recorded

on the referral forms by teachers when sending students to

the deans. All active folders for the 1980-1981 school

year were reviewed. A total of 395 items, including dupli-

cations, were recorded without regard for gender, age,

race, or grade level. Combining obvious duplications and











similarities resulted in 66 items (Appendix B) to be

considered for inclusion in a scale for rating DSB.

All of the 66 items were then presented individually

to six male and five female volunteers, experienced

secondary school regular classroom teachers from suburban

Florida middle and junior high schools. Instructions are

reproduced in Appendix C. These teachers were asked to

verify the specificity of the items and edit those consid-

ered ambiguous. This review yielded 40 items for possible

use on an instrument. These items were then stated in the

past tense to reflect the intention to measure students'

past behavior (Appendix D). This preliminary study indi-

cated the feasibility of using research-based constructs

and teacher-generated items as the basis for a rating scale

for disruptive school behavior.

In order to reduce halo and leniency errors, it has

been suggested (Blanz & Ghiselli, 1972) that a scale be

arranged so that items from the same construct will not be

contiguous. Accordingly, items were initially randomly

ordered, then inspected and rearranged to meet this criterion (Appendix M). Research studies previously cited

suggested that in addition to specifying the type, a

quantitative measure of disruptive behavior must provide

for rating both frequency and severity. Frequency rating

was provided for by the choice of response format selected

for the instrument. The literature review suggested the











suitability of a 5-point, equal-interval, summated rating

scale (Likert, 1932) using the following anchors:

0 None of the time

1 Very infrequently

2 Sometimes

3 Quite often

4 Always (Pohl, 1981, p. 239)

The rating scale (Appendix F) utilized this response

format.

The severity rating for each scale item was estab-

lished with assistance from the faculty, staff, and

administrators of two alternative schools located in two

metropolitan Florida school districts. From their experi-

ence with disruptive students, these educators were

particularly aware of the consequences for students who

display DSB. Respondents were selected from volunteers,

including the principal and assistant principal, school

psychologist, social worker, educational evaluator, and

faculty members. This group contained both males and

females in approximately equal numbers. All had more than

two years' experience working with disruptive students.

The school experience may be conceptualized as influ-

encing the social, personal, and academic domains of a

student's life. Each of these domains may be subdivided

to facilitate closer study of the consequences of the

school experience (See Table 1). One way for educators to











assign a severity factor to a disruptive activity is to

have them estimate which domains of student life would

likely be affected adversely by that particular behavior.

Instructions for this procedure are reproduced in Appendix

G. The number of adverse consequences assigned by at

least 50% of the raters, divided by a constant of three to

keep the numbers small and with fractions rounded up to

the next whole number, gave a severity rating of 1, 2, or

3 to each of the items on the rating scale. Results are

reported in Chapter Four.
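As an illustration of the arithmetic just described, a minimal sketch follows; the domain count shown is hypothetical, not one of the tallies reported in Chapter Four.

```python
import math

def severity_factor(adverse_domains: int) -> int:
    """Severity rating for one DSBS item.

    adverse_domains is the number of domains of student life judged by at
    least 50% of the raters to be adversely affected by the behavior.  The
    count is divided by a constant of 3, and fractions are rounded up to
    the next whole number, yielding a severity rating of 1, 2, or 3.
    """
    # Assumption: each item was assigned to at least one domain, so the
    # count falls between 1 and 9 (three domains and their subdivisions).
    return math.ceil(adverse_domains / 3)

# Hypothetical example: a behavior judged to affect five domains
print(severity_factor(5))  # -> 2
```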

A scoring template incorporating the severity factor

was prepared for the DSBS (Appendix H). This template has

five holes, one corresponding to each possible frequency

rating (i.e., 0, 1, 2, 3, 4) for each rating scale item.

Through the holes are read the rater's mark (X) indicating

the frequency rating assigned. Above each hole is printed

a number which is the product of that frequency rating and

the previously determined severity factor for that item.

Thus, the weighted score for that item may be read by the

scorer directly from the scoring template and recorded on

the DSBS rating form beside each item.
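A short sketch of the weighting the template performs is given below; the frequency and severity values shown are hypothetical.

```python
def weighted_item_score(frequency: int, severity: int) -> int:
    """Weighted score for one DSBS item: the frequency rating marked by the
    teacher (0-4) multiplied by the item's predetermined severity factor (1-3)."""
    return frequency * severity

# Hypothetical examples: an item rated "Sometimes" (2) with severity 3,
# and an item rated "Quite often" (3) with severity 1.
print(weighted_item_score(2, 3))  # -> 6
print(weighted_item_score(3, 1))  # -> 3
```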

These item scores were then added to give the page

score and form score (see Appendix F) and recorded onto a

summary sheet (Appendix I). The Summary of Teacher

Ratings form (Appendix I) contains for each student the

DSBS rating; the deviation, in z-scores, from the local














[Table 1. Domains of Student Life Influenced by the School Experience. The body of the table is not legible in this transcription.]











norm; a comparison of ratings by each teacher; and the

basis for constructing a DSBS profile for prescriptive use

(Appendix J). These data are intended to provide local

school authorities with criteria for estimating the devia-

tion of any student's rating from the local DSB norm and

are intended to assist in determining a student's need for

an intervention program. The DSBS is normed locally

within each school district. Norms from this study are

reported in Chapter Four for information, but are not to

be used as criteria for judgments about students in other

settings.



Validation of the DSBS

To assure content validity, the 40 items and the 10

constructs developed from this preliminary study were

presented to a group of 24 teachers with instructions to

assign each item to a construct category or to no category.

The instructions are reproduced in Appendix E. Each judge

had at least two years of regular classroom teaching

experience in a middle or junior high school. Thirteen

male and 11 female teachers participated. The judges were

also asked to verify the specificity of the retained items

and reword those considered ambiguous. Revisions were

made as suggested and confirmed by a follow-up study using

another group of eight similarly-qualified teachers.











As described in the field study section, at a Florida

middle school a criterion group of disruptive students was

selected by nomination by seven non-teaching school person-

nel, including two deans, three guidance counselors, and

two administrators. Students in the disruptive group were

ranked numerically on a continuum from non- to severely

disruptive, based on subjective ratings from all the

nominating personnel. DSBS ratings from teachers were

compared to these subjective ratings to determine how well

high DSBS teacher ratings correlated with high levels of

disruptiveness as perceived by non-teaching school person-

nel.

To estimate how well the DSBS identified the disrup-

tive group, the mean of DSBS ratings for the disruptive

group students was compared with the mean of DSBS ratings

for a norming group representing a sample, stratified by

grade, of the school population. If the DSBS demonstrated

agreement with the concurrent judgments of disruptiveness

made by non-teaching school officials, a prima facie case would be made for predicting that students in other

settings identified by the DSBS as disruptive would also

be judged disruptive by non-teaching school officials.

Messick (1980) described construct validity as based

on convergent and discriminant validity, ethical

interpretation, relevance and utility for the specific

application, and the consequences following use of the











instrument. Convergent validity requires that the DSBS be

able to identify all students who are considered exces-

sively disruptive. To demonstrate satisfactory convergent

validity, the DSBS ratings of 100% of the students in the

disruptive group would have to be significantly above the

local DSB norm. The disruptive group ratings are reported

in Chapter Four.

Discriminant validity requires that the DSBS be able

to reject those students who are not considered exces-

sively disruptive. To demonstrate satisfactory discrimi-

nant validity, the DSBS ratings of only those students in

the disruptive group, or eligible for inclusion, could be

significantly above the local DSB norm. Ratings of the

norming group are reported in Chapter Four.

Ethical interpretation of DSBS ratings requires an

understanding of both the theoretical and practical

concepts underlying development of this instrument. There-

fore, a manual will be prepared before the DSBS is offered

for research use. Relevance was supported by the theoreti-

cal basis on which the 10 constructs were chosen to define

DSB for this study. Utility was provided by the proce-

dures used to select appropriate items, score the forms,

interpret the ratings, and present the results. The conse-

quences of using the DSBS cannot be predicted until it is

thoroughly researched. The intent is to improve the

validity of the selection process for programs assisting

disruptive students.











Reliability of the DSBS

The DSBS rating for each student is an aggregate of

scores from at least four teachers. A test-retest measure

compared two DSB ratings obtained from individual teachers.

Fourteen days after the receipt of teacher ratings, a

follow-up rating by the same teachers of approximately 10%

of both the norming and disruptive groups was made. These

results are reported in Chapter Four.

The internal consistency of the DSBS was protected by

choosing only items previously used by teachers to

describe DSB. Item analysis is not an effective technique

for establishing reliability of individual administrations

of the DSBS. Patterns of disruptive behavior are often

narrow and stereotypical, while the DSBS contains items

descriptive of a broad range of possible behaviors. Thus,

item scores were not likely to correlate with each other.

No attempt was made to assess interrater reliability.

Classroom settings are conceptualized as discrete environ-

ments, whose norms for behavior are determined by the

personality of the teacher. The behavior of interest is

each student's interaction with the set of teachers as a whole, not with any individual teacher.



Field Study

The purpose of the field study was to identify and

correct any problems, actual or potential, with item











content, response format, or administration and scoring

procedures of the DSBS. Following a successful field

study, the instrument may be offered to the profession for

further research and development (Benson & Clark, 1980).

Accordingly, the operational goal of this present effort

was to conduct a field study to determine the readiness of

the DSBS for use as a research instrument.

The target population consisted of students enrolled

in grades six through nine (i.e., middle and junior high

school grades) in public schools anywhere in the United

States. No restrictions were placed on age, gender, race

or socioeconomic status. The selection criteria for the

host school were a heterogeneous ethnic population, an

urban or suburban location, public middle (grades 6, 7, 8)

or junior high (grades 7, 8, 9) school status, random

assignment of students to basic courses, and an average

daily attendance figure of at least 500 students. Special

schools, such as alternative schools and special education

centers, were not considered.

A public middle school meeting these criteria was

located in a predominately urban school district on the

west coast of Florida. The student enrollment was

approximately 76% white, 22% black, and 2% Asian- and

Hispanic-American, with an average daily attendance of

733. Socioeconomic status was said by the principal to be

primarily upper-lower class and lower-middle class.











For the norming group, a sample consisting of 90 stu-

dents was selected using one English and one mathematics

class, with randomly-assigned enrollments, at each grade

level. A total of six classes containing 203 students and

ranging from 32 through 35 students each were sampled.

The numbers 1 through 35 were written on individual slips

of paper, and 15 numbers were drawn randomly using the replacement procedure. For each class, the students whose class

roll numbers matched the 15 randomly selected numbers were

included in the norming group.

The disruptive group was selected by nomination by

non-teaching school personnel, who were asked to list the

names of all of the excessively disruptive students

encountered during the current school year. It was thus

possible for a student's name to be included in both the

norming and the disruptive groups. The nominating process

initially produced a group of 64 students. After a confer-

ence among the raters, this group was reduced to 36

students.

All students finally nominated into the disruptive

group were assigned to one of four levels of disruptive-

ness (none, mild, moderate, or severe) by each nominating

person working independently. Nominated students were

assigned a numerical rating according to the following

scale:











Level of Disruptiveness Rating

None 0

Mild 1

Moderate 2

Severe 3

Students were ranked according to the average of these rat-

ings. This ranking permitted the correlation, reported in

Chapter Four, of levels of disruptiveness between the DSBS

results and the qualitative assessments by school person-

nel for each disruptive group student.
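A brief sketch of this ranking step, using hypothetical nominees and ratings, is shown below.

```python
# Hypothetical independent ratings (0 = none, 1 = mild, 2 = moderate,
# 3 = severe) from the seven nominating staff members for three students.
nominee_ratings = {
    "Student A": [3, 2, 3, 3, 2, 3, 3],
    "Student B": [1, 2, 1, 0, 1, 1, 2],
    "Student C": [2, 3, 2, 2, 3, 2, 2],
}

def average_rating(ratings):
    """Average of the nominators' disruptiveness ratings for one student."""
    return sum(ratings) / len(ratings)

# Students are ranked from most to least disruptive by their average rating.
ranking = sorted(nominee_ratings,
                 key=lambda s: average_rating(nominee_ratings[s]),
                 reverse=True)
print(ranking)  # -> ['Student A', 'Student C', 'Student B']
```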

Schedules for the sample students were obtained from

school records. No contact was made with any student.

Training of all participating teachers took place in a

meeting at which a DSBS form for each period of a sample

student's current schedule was distributed. Appendix K

contains these instructions. The purpose of the study was

explained and a date and procedure for returning the forms

agreed upon. Emphasis was placed on the need to respond

to only the behaviors actually mentioned on the instrument

and to perform the ratings independently of other teachers.

Provision was made for a faculty member to either answer

or refer questions that might arise during the rating

period.

Teachers not submitting all their DSBS forms by the

agreed upon date were contacted and reminded of the

importance of their participation. Upon receipt of at











least four completed DSBS forms for each student, the DSBS

rating for that student was calculated. At least four

scorable forms, totaling 622, were received for 108 stu-

dents, 76 in the norming group and 32 in the disruptive

group. The scoring template (Appendix H) provided for

calculating item scores weighted for severity.

The item scores were totaled to produce a form score,

which was entered on the Summary of Teacher Ratings form

(Appendix I). This summary form contains spaces for the

student's name, grade, age, and sex; school name; evalua-

tor's name and title; individual form scores; each rater's

name, subject, and class period; and calculation of the

student's DSBS rating and z-score. Each sample student's

form scores were summed to give a total score. The total

score was divided by the number of raters to yield the

average score, which is the student's DSBS rating.

After the DSBS ratings for all the norming group stu-

dents were calculated, the mean DSBS rating and standard

deviation for the group were obtained. This mean of the

means is the local DSBS rating, or norm, for the target

school. The local DSBS norm was subtracted from each student's DSBS rating, giving that student's deviation from the

local norm.

Dividing this deviation by the local standard devia-

tion gave the number of standard deviation units, or

z-score, by which the student's DSBS rating differed from the











local DSBS norm. The criterion of two standard deviation

units above the local DSBS norm translates to a disrup-

tiveness score higher than approximately 98% of the

predicted scores from the school population. The distri-

bution of scores obtained from the norming group was

inspected to assure the existence of sufficient variance

to make the z-scores meaningful.
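The computations described in the preceding paragraphs can be summarized in a short sketch. All values below are hypothetical, and the use of the population form of the standard deviation is an assumption not specified in the text.

```python
from statistics import mean, pstdev

def dsbs_rating(form_scores):
    """A student's DSBS rating: the sum of form scores from at least four
    teachers divided by the number of raters (the average form score)."""
    return sum(form_scores) / len(form_scores)

def z_score(rating, local_norm, local_sd):
    """Deviation of a DSBS rating from the local norm, in standard deviation units."""
    return (rating - local_norm) / local_sd

# Hypothetical norming-group ratings, one per student
norming_ratings = [12.0, 15.5, 9.0, 20.0, 11.5, 14.0]
local_norm = mean(norming_ratings)   # the "mean of the means" for the school
local_sd = pstdev(norming_ratings)   # population form shown; the sample form is an alternative

student_rating = dsbs_rating([30, 42, 38, 35])  # four hypothetical teacher form scores
flagged = z_score(student_rating, local_norm, local_sd) >= 2.0  # criterion used in this study
```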

A reliability check was performed. Fourteen days

after all the rating forms were collected, approximately

10% of the students from both the norming and disruptive

groups were selected to be representative of the range of

scores. New forms were submitted to the original raters

for rerating the same students and the results compared.

These results are reported in Chapter Four. After comple-

tion of the data analyses, all participants were invited

to a meeting to discuss the results, offer comments, and

receive appreciation for their participation.


Data Analyses

Validity

To establish content validity, 24 expert judges

assigned proposed DSBS items to construct categories.

Results of the judges' assignments were totaled for each

item. An item was dropped if not assigned to at least one

category by each judge. If this content validation pro-

cedure had resulted either in fewer than 30 items being











assigned to at least one construct or in having a con-

struct with fewer than three items assigned by 80% of the

respondents, additional items would have been constructed and

validated to meet these criteria. The judges' item assign-

ments are reported in Chapter Four.
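A compact sketch of the retention rule and the two fallback criteria described above is given below; the data structures (a per-item count of assigning judges and a per-construct count of qualifying items) are illustrative assumptions, not the forms actually used in the study.

```python
def retained_items(judges_assigning, n_judges=24):
    """Keep an item only if every judge assigned it to at least one
    construct category.  judges_assigning maps each item number to the
    number of judges who assigned it to some category."""
    return [item for item, n in judges_assigning.items() if n == n_judges]

def criteria_met(n_retained, items_per_construct):
    """The two conditions described above: at least 30 items assigned to a
    construct, and no construct with fewer than three items assigned by
    80% of the respondents."""
    return n_retained >= 30 and all(n >= 3 for n in items_per_construct.values())

# Hypothetical check
counts = {i: 24 for i in range(1, 38)}        # 37 items assigned by all judges
per_construct = {c: 4 for c in range(1, 11)}  # 4 qualifying items per construct
print(criteria_met(len(retained_items(counts)), per_construct))  # -> True
```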

To ascertain how well the DSBS identified the

disruptive group, the t-test was used to estimate the

significance of the difference between the means of the

norming group and disruptive group. An obtained prob-

ability level of .05 or less was considered evidence of

statistical significance. The magnitude of the difference

between the means was used to evaluate the practical

significance of the instrument and its potential for

identifying disruptive students. These results are

reported and discussed in Chapter Four.
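A minimal sketch of such a comparison, using hypothetical ratings and the scipy library, follows; whether equal variances were assumed is not stated in the text, so Welch's form is shown as one reasonable choice.

```python
from scipy.stats import ttest_ind

# Hypothetical DSBS ratings; the study's actual samples are reported in Chapter Four.
norming_group = [10.2, 14.8, 9.5, 12.0, 11.3, 13.7, 8.9, 15.0]
disruptive_group = [28.4, 35.1, 22.9, 40.2, 31.0, 26.5]

t_stat, p_value = ttest_ind(disruptive_group, norming_group, equal_var=False)
significant = p_value <= .05  # the probability level adopted in this study
print(t_stat, p_value, significant)
```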

To estimate convergent validity for the DSBS, the DSBS

rating for each disruptive group member was compared with

the mean DSBS rating of the norming group. For the

purposes of this study, a DSBS rating of at least two

z-scores above the norming group mean was accepted as

evidence that the DSBS had correctly identified a disrup-

tive group member. The standard error of the mean was

used to include students when evaluating borderline cases.

The criterion for satisfactory convergent validity was the

correct identification of 100% of the disruptive group.











Discriminant validity also was estimated by using

ratings, means, and z-scores. The DSBS rating for each

norming group member was compared with the mean of that

group. Any norming group member whose DSBS rating

exceeded the mean by at least two z-scores was considered

identified by the DSBS as excessively disruptive. Identi-

fied cases, not members of or eligible for the disruptive

group, were considered challenges to the discriminant

validity of the DSBS. All cases not meeting the construct

validity criteria were investigated. Construct validity

results are reported and discussed in Chapter Four.



Reliability

The Pearson product-moment correlation statistic was

used to compare the original ratings on approximately 10%

of the completed forms with follow-up ratings made after

14 days. Individual coefficients of at least .80 were set

arbitrarily to establish an acceptable level of test-

retest reliability.
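A minimal sketch of this test-retest comparison, with hypothetical ratings from one teacher, follows.

```python
from scipy.stats import pearsonr

# Hypothetical original and 14-day follow-up ratings from one teacher
original = [16, 4, 22, 9, 30, 12, 7]
retest   = [14, 6, 20, 11, 28, 13, 9]

r, _ = pearsonr(original, retest)
acceptable = r >= .80  # the criterion set for acceptable test-retest reliability
print(round(r, 3), acceptable)
```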


Limitations

1. The school for the field study was selected based on

the willingness of both the school and the faculty to cooperate. This may have mitigated problems that

would occur in a less favorable environment.











2. Teacher resistance and/or concerns about this type of

research may have biased or limited teachers' participation.

3. The study was limited to exploration and the results

are not intended to generalize beyond the administra-

tion and scoring procedures. Specifically, the

calculated DSB norm is valid only for this school.

4. No provision was made to assess the possible effects

of grade and sex on DSB norms. Studies have indicated

the influences are not significant, but at some point

this should be investigated.

5. The disruptive sample group was likely composed of

students who had been referred to the dean. The same

teachers who referred these students to the dean may

have rated their behaviors, raising the possibility of bias.

6. The use of expert judges in the validation procedures

may have introduced personal bias into the items used

on the instrument.














CHAPTER FOUR

RESULTS AND DISCUSSION



The purpose of this study was to develop and validate

an instrument, the Disruptive Student Behavior Scale

(DSBS). The study focused on identifying components of

disruptive school behavior as perceived by middle and

junior high school teachers and constructing an instrument

to quantify these behaviors. To accomplish this, an

instrument was constructed using behaviors taken from

disciplinary referrals and field tested on a representa-

tive sample of students from a Florida middle school.

Teacher ratings for a norm group and a disruptive group

were collected and analyzed as outlined in Chapter Three.

These results are reported in this chapter.



Results

The Severity Factor

Results of the assignment of potential adverse conse-

quences resulting from DSBS behaviors are reported in

Table 2. Twenty packets containing 40 DSBS items and an

instruction sheet were distributed and 16 were returned.

At least 50%, or 8, of the raters had to assign a DSBS item

to a particular domain before that domain was















[Table 2. Potential Adverse Consequences of DSBS Behaviors. The body of the table is not legible in this transcription.]




Full Text

PAGE 1

DEVELOPMENT OF THE DISRUPTIVE STUDENT BEHAVIOR SCALE BY WILLIAM L. MOSES A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA MAY 1986

PAGE 2

Copyright 1986 by W il 1 i am L. Moses

PAGE 3

To Billy Who would have been proud

PAGE 4

ACKNOWLEDGEMENTS I wish to express appreciation to my parents for their love, understanding, and help; my committee chairperson, Dr. McDavis, for his counsel and encouragement; my committee members, Dr. Ziller for his confidence in my ability to work independently and Dr. Loesch for stepping into the breach and contributing so much so quickly while continuing his friendship and support; and my employers at Pasco-Hernando Community College for their financial support. Special thanks go to my friend and colleague, Dr. Tom Floyd, who listened for hours and encouraged for years, and to my friends and lovers who were usually supportive, sometimes distracting, and always worth it. iv

PAGE 5

TABLE OF CONTENTS ACKN OWLEDG MEN TS LIST OF TABLES. ABSTRACT. iv vii . . Viii CHAPTER ONE INTRODUCTION Statement of the Problem . Purpose of the Study Need for the Study . Significance of the Study. . Definition of Terms. . Organization of the Study. CHAPTER TWO REVIEW OF LITERATURE . Definition of Disruptive School Behavior (DSB) Identification, Assessment, and Placement Rating Scale Development .. Psychometric Properties of Rating Scales Uses of Behavior Rating Scales Summary. . . . . . . . . . . . . . . . . . CHAPTER THREE METHODOLOGY Research Questions. Construction of the DSBS . Val i da ti on of the DSB S Reliability of the DSBS. Fi el d Study . . . . . . . . . . . . . . . . . . . . Data Analyses. . . •.. Validity . . Reliability Limitations. 1 3 7 7 1 1 12 14 16 16 21 38 57 66 68 70 70 71 77 80 80 85 85 87 87 CHAPTER FOUR RESULTS AND DISCUSSION. 89 Results. . . . . . . . . . . . . . . . 89 The Severity Factor 89 The Samples. . 92 Research Question One. 93 Research Question Two . 98 Research Question Three. 100 Research Question Four. . 110 Summary . . . 117 Discussion .................... 113 V

PAGE 6

TABLE OF CONTENTS CONTINUED .E.au. CHAPTER FIVE CONCLUSIONS, INPLICATIONS, SUMMARY, AND RECOMMENDATIONS. . . . . . •..•...... 118 Conclusions •................... 118 Implications ......•....•....... 119 Summary. . . . . . . . . . . . . . . 121 Recommendations. . . . . . •..... 123 APPENDICES A CONSTRUCT DEVELOPMENT STUDY . . ........ 124 B BEHAVIORS COLLECTED FROM DISCIPLINARY RECORDS. 130 C ORAL INSTRUCTIONS FOR THE EDITING STUDY .... 133 D ITEMS DEVELOPED FROM CONTENT VALIDATION STUDY . 134 E INSTRUCTIONS FOR CONTENT VALIDATION STUDY ... 137 F THE DISRUPTIVE STUDENT BEHAVIOR SCALE (DSBS) .. 138 G INSTRUCTIONS FOR SEVERITY FACTOR STUDY ..... 144 H SCORING TEMPLATE FOR THE DSBS ......... 145 I SUMMARY OF TEACHER RATINGS ON THE DSBS ..... 151 J PRESCRIPTIVE PROFILE WORKSHEET FOR THE DSB S . . 1 52 K INSTRUCTIONS FOR THE PILOT STUDY ........ 158 L RATERS EVALUATION OF THE DSBS ..•...... 160 M ASSIGNMENT OF CONSTRUCTS BY ITEM NUMBER .... 161 REFERENCES ..... . 162 BIOGRAPHICAL SKETCH . 1 93 vi

PAGE 7

Table 1. 2. 3. 4. LIST OF TABLES Domains of Student Life Influenced by the School Experience . ..•. Potential Adverse Consequences of DSBS Behaviors Rating Form Distribution by Demographic Categories--Norming Group Rating Form Distribution by Demographic Ca tego r i es--Di srupti v e Group. . 76 90 94 95 5. Frequency of Observed DSBS Behaviors by Constructs. . . . . . . . . 97 6. DSBS Constructs by Number 99 7. Assignment of Proposed Scale Items to Constructs for Content Validation. 101 8. Follow-up Study for Assignment of Proposed 9. Sc a 1 e It em s t o Co n st r u ct s . . . . 1 0 3 Comparison of Disruptiveness Ratings by Teachers and Non-teaching Personnel. 106 10. DSBS Ratings and z-scores for the Disruptive 11. 1 2. Group . . . . . . . . . . . . 107 DSBS Ratings and z-scores for the Norming Group Test-Retest Correlations vii 108 1 1 2

PAGE 8

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy DEVELOPMENT OF THE DISRUPTIVE STUDENT BEHAVIOR SCALE By William L. Moses May 1 986 Chairperson: Roderick McDavis, Ph.D. Major Department: Counselor Education Disruptive behavior is currently seen by both educa tors and the public as a major problem in American education. A procedure for quantitatively assessing disruptive behavior in schools is required to show a need for intervention programs and to select students for placement in either special education or alternative education programs. The purpose of this study was to develop and validate an instrument, the Disruptive Student Behavior Seale (DSBS). The DSBS is intended for use in assessing quantitatively the disruptive school behaviors Viii

PAGE 9

of middle and junior high students referred for placement in special education and alternative education programs. This study investigated the position that disruptive school behavior (DSB) can best be described in terms of its type, frequency, and severity. The use of teachers as observers and raters of disruptive school behavior is discussed. Using teacher-generated behavioral statements from disciplinary referrals to better describe DSB is suggested. A review of various rating scale development procedures attempted by business, industry, and government is summarized. A set of 10 constructs was selected to define DSB. Scale items were developed from referral statements on disciplinary records in a junior high school. A severity factor was incorporated into the scoring system so that behaviors rated as more detrimental to the student were given a higher DSBS rating. The DSBS was field tested in a public middle school. Students in a norming group and a criterion, or disrup tive, group were rated by their classroom teachers using the DSBS. A norm for disruptive behavior for the target school was calculated and a criterion for classifying a student as disruptive was established. Results indicated the DSBS could identify the crite rion group of disruptive students, classify individual ix

PAGE 10

students as disruptive, and exclude non-disruptive students from the disruptive group. A follow-up study suggested the results were consistent over time for all DSBS ratings except those at the lowest end of the scale. X

PAGE 11

CHAPTER ONE INTRODUCTION The public school system in the United States has been assigned a major role in socializing and enculturating American youth (Filipczak, 1978). The U.S. Supreme Court in its 1954 landmark civil rights decision (Brown v. Board of Education of Topeka, 74 S.Ct. 686, 691) described education as "a principal instrument in awakening the child to cultural values, in preparing him for later professional training, and in helping him to adjust normally to his environment." The materialistic emphasis of American society and culture ordains that the educa tional institution at all levels be driven by the broadly defined goal of career success for its graduates (Bell, 1984; DiPrete, 1981, p. 199; National Education Associa tion (NEA), 1975, p. 108). Unfortunately, a significant number of students are detoured from this goal when educators describe them as displaying behaviors inappropriate to the school environ ment and not attributable to legally-defined mental or emotional handicaps. Sus pensions, expulsions, and assign ments to alternative programs are evidence of failure by the educational system to effect students' adherence to

PAGE 12

2 current social norms and culturally-specified behaviors. The consequences to the schools for this failure include loss of both funds and credibility, neither of which the educational system has in sufficient quantity to squander. Attempts to correct this failure to convey effec tively norms and behaviors have included both exceptional child education and alternative schooling programs. The Education for All Handicapped Children Act of 1975 (P.L. 94-142)(Department of Health, Education, and Welfare, 1977) effectively administered them .de. grace to exceptional child education approaches in Florida by failing to include a category appropriate to disruptive behavior (Florida Department of Education, 1975, 1985). Alternative programs frequently fail to provide for selection and discharge criteria, rendering evaluation virtually impossible (Pinellas County School District, 1982). A primary reason for failure to specify behavioral criteria for alternative schooling programs is the lack of appropriate instruments for quantifying disruptive behavior (Salvia & Ysseldyke, 1981, pp. 8, 9). Inadequacies of existing behavioral assessment instruments include failure to provide for local norming, inclusion of inappropriate items, omission of the severity factor, and inadequacy of prescriptive information (Mesinger, 1982). An instrument providing both a theoretical and a pragmatic rationale for identifying

PAGE 13

3 disruptive students is a requirement for reconsidering the inclusion of this category in special education legisla tion and enhancing the credibility of alternative education programs (Reeves, Perkins, & Hollon, 1978). Statement of the Problem Disruptive behavior in the public school system is not a new phenomenon (Garibaldi, 1979). That it remains a problem is emphasized by Robert J. Rubel in introducing a collection of papers on crime and violence in public schools: The issue in the 1980 1 s no longer centers on whether or not violence in American schools is serious; the issue no longer centers on whether violence is increasing or decreasing; the issue no longer centers on technical anomalies concern ing underor over-reporting of incidents. In the debate of the 1980 1 s, the primary issue before large proportions of our urban schools (and sizeable numbers of our suburban and even rural schools) revolves around the continued viability of American education as it existed a generation ago. (1980, p. 5) The U. S. government has acknowledged the existence of disruptive behavior by awarding federal grants for alternative education pilot programs (Law Enforcement Assistance Administration, 1979; Moses, 1976). Included in definitions of disruptive school behavior (DSB) are such varied activities as talking, hitting, yelling (Mayer & Butterworth, 1979); defy ing rules and procedures (Walker, 1979); aggressive behavior which interrupts the instructional program

PAGE 14

4 (Foley, 1982); and conduct disorders (American Psychiatric Association, 1980 pp. 45-50). Forness and Cantwell (1982) and Forness, Sinclair, and Russell (1984) have identified these categories as likely to be ineligible for special education services under P.L. 94-142. The U.S. government (Department of Health, Education, and Welfare, 1977), in implementing P.L. 94-142, specif ically denied services to the "socially maladjusted." Florida law provides essentially the same restrictions (State Board of Education Rule 6A-6.3016), although Bower (1982), whose research (Bower, 1958) formed the basis for the P.L. 94-142 definition of emotionally disturbed, called this exclusion "contradictory in intent and content with •.. the research from which it came" (1982, p. 6 0) The need for alternative education services for disruptive students seems supported by reports of the widespread existence of DSB. Individuals and institutions reporting on the continuing crisis in school discipline include the Cal if orni a De pa rtm ent of Edu ca ti on ( 197 3) , the National Education Association (1975), the U.S. Congress (Bayh, 1975; Tygart, 1980), the Michigan Department of Education (Vergon & Williams, 1978), the National Institute of Education (Feldhusen, 1978), Cross and Kohl (1978), Duke (1978), the New York State United Teachers (1979), and the National Education Association

PAGE 15

5 (1980). The Safe School Study Report to Congress (National Institute of Education, 1978) indicated 5,000 teacher assaults per month occurred across the nation. The Gallup Poll on Education (Gallup, 1984) continues to report lack of student discipline as the number one concern of Americans about the public school system. In Florida, the Governor's Task Force on Disrupted Youth (GTFDY) found 17,983 student-days lost to suspen sions over a 2-year period in the 10 school districts studied (GTFDY, 1973, p. 11). An analysis of conduct code violations in Duval County, Florida, schools for 1980-1981 revealed more than 33,000 violations resulting in 13,679 days lost from school (Moses, 1981). The aversive consequences of chronic DSB for students include lowered self-esteem and functioning level (Caliste, 1979); dropping out and underemployment (Grise, 1980; NEA, 1975; Safer, Heaton, & Parker, 1981); alienation (Garbarino, 1980; Moyer & Motta, 1982); and criminal act iv it y ( E dw a rd s , Ro u n d t r e e , Ke n t , & Pa r k e r , 1 9 8 1 ; Mitchell & Rosa, 1981). Likewise, from the perspective of the school system DSB is undesirable, involving excessive teacher attention (Rubel, 1977, Chap. 1), litigation (Lufler, 1982), vandalism costs (Goldstein, Apter, & Harootunian, 1984), teacher stress (Pettegrew & Wolf, 1982), and weakened public support (Amos, 1980). Conse quences for the community include criminal actions and

PAGE 16

6 psychiatric referrals (Faretra, 1981). Levin (1972) esti mated the expense of inadequate education to be about 6 billion dollars a year (1972 dollars) for costs associated with welfare and crime. Researchers have identified the middle and junior high school age student as particularly prone to behavior disorder (Geiger & Turiel, 1983; Loeber, 1982; Nielsen & Gerber, 1979; Quay, 1978). These studies suggest the middle and junior high schools as a focus for identifying and remedi ati ng disruptive school behavior. Unf ortu natel y, no adequate instruments are available specifically for this population (Mesinger, 1982). Instruments developed from clinical populations contain some items irrelevant to the non-clinical population in the public schools (Quay & Peterson, 1967). Instruments offered with norms developed from research samples and no procedure for developing local norms for disruptive behavior do not consider the placement needs of local school districts (Messick, 1980). Levels of disruptive behavior that can be managed within the regular school environment vary across settings because of differences in such factors as facilities, experience of teachers and administrators, and school board policies. Current instruments fail to consider the widely differing consequences of specific disruptive acts (Kane & Bernardin, 1982). Some possible effects of this omission

PAGE 17

7 may be to group together students whose behaviors differ widely in their severity, to encourage conceptualizing all disruptive behavior as equally deleterious, and to base placement decisions on personal judgments about the seri ousness of a pa rti cul ar type of behavior. Neither does any available instrument provide procedures for creating a prescriptive profile of a student based on the authors' conceptual model of disruptive school behavior (Salvia & Ysseldyke, 1981). This failure may seriously limit the interpretation and application of rating scale results. Purpose of the Study The purpose of this study was to develop and validate an instrument, the Disruptive Student Behavior Scale CDSBS). The DSBS would be used to assess quantitatively the disruptive school behaviors of students referred for placement in either special education or alternative education programs. Need for the Study Salvia and Ysseldyke (1981, pp. 443, 444, 450) have called for norm-referenced instruments to support placement decisions, evaluate student progress, evaluate programs, provide intervention suggestions, and help parents understand their children's abilities in relation to other students. Reeves et al. ( 1978) called for

PAGE 18

8 reliable instruments to use in placing handicapped children. Also, Camp . (1981) notes that there is very little current, objective, research-based information in existence to help identify specific student behavior problems occurring in the schools. A need exists for research of this nature to quantitatively establish the actual, current situation with regard to student discipline problems in the public secondary schools. (p. 48) Presumably, these calls for reliable and valid instruments apply both to special education and alternative schooling programs, as both to some degree remove the student from mainstream classroom activities. However, the Florida law (State Board of Education Rule 6A-6.3017) providing for special education programs for the socially maladjusted was repeal e d J ul y 2 4 , 1 981 . "Educational alternative programs" were created in Florida in 1978 (Florida Statute 230.2315) specifically to reduce disruptive behavior and truancy. Florida Statute 229.565 provides for the evaluation of "procedures for identification and placement of students in educational alternative programs." As an example of practice, in 1982 the alternative education program in the Pinellas County School District did not require quantitative behavioral assessment prior to placement. Studies, however, have identified problems in using subjective criteria for alternative education placement. Disagreements in ranking behaviors (Pisarra & Giblette,

PAGE 19

9 1981), value systems (Messick, 1980), labels applied to students (Leyser & Abrams, 1982), teaching experience (Rubel, 1977, p.51), level of frustration (Walker & Holland, 1979), race (Arnove & Strout, 1978; Bennett & Harris, 1982; Florida DOE, 1983; Goldsmith, 1982; Mesinger, 1982), sex (Bennett & Harris, 1982), and socioeconomic status (Arnove & Strout, 1978; NEA, 1975) are variables that may confound perceptions of disruptive behavior. One way to help neutralize these confounding vari ables is to use quantitative measures. A review of current literature indicates that appropriate instruments may not exist. After a major study of alternative educa tion programs, Mesinger (1982) was unable to recommend even one instrument for use in selecting students. Messick (1964, 1965, 1980) argued against applying to local environments behavioral norms developed elsewhere. Stott, Marston, and Neill ( 1975, p. 8), Wodarski and Pedi (1978, p. 480), and Quay and Peterson (1975, 1979) advised the setting of local norms. However, no instrument located in this review provides a specific procedure for determining local norms. Another advantage of locally-developed norms is the opportunity to compute the mean DSB level for individual schools. Intervention program entry and exit criteria may be defined by the deviation of an individual student's

PAGE 20

10 mean DSB score from the school mean. This may provide the type of quantitative assessment required by state (SBE R ul e 6 A6 3 0 1 6 ) a n d f e de r al ( P L. 9 41 4 2 ) 1 aw for s p e c i al education placement and may meet the need noted by Mesinger (1982) for quantitative instruments to assist in selecting students for alternative education programs. A major need in intervention programs is prescriptive information (Lovitt, 1967 p. 238; Spivack & Swift, 1977). However, many instruments do not provide operationally defined i terns which are useful in the classroom. For example, the Behavior Problem Checklist (Quay & Peterson, 1979) items used to identify conduct problem students include "restlessness," "disruptiveness," and "irresponsi bility." These items originally wer~ taken from the files of a child guidance clinic (Quay, 1977). Defining disruptive behavior on the dimensions of type, frequency, and severity has received support from numerous sources (American Psychiatric Association, 1980, p. 45; Bernardin, LaShells, Smith, & Alvares, 1976; Camp, 1980, 1981; Grosek, 1979; Taylor, Warren, & Slocumb, 1979). Criticisms of assessment procedures not incorporat ing a severity factor have been mr3de by Kane and Bernardin (1982) and Pisarra and Giblette (1981). Nevertheless, no instrument was located which specifically recommended using a severity factor in assessing disruptive school behavior.

PAGE 21

11 An instrument which provides for quantifying DSB may help to protect students from placement in school programs according to inappropriate criteria. To be most effec tive, the instrument should include provisions for establishing locally-determined placement norms, for comparing with those norms the scores of individual students, for providing prescriptive information, and for sy st emati cal 1 y considering the type, frequency, and severity of the disruptive behaviors. Significance of the Study This study investigated the theoretical position that disruptive school behavior (DSB) can best be described in terms of its type, frequency, and severity. Theoretical considerations in the use of teachers as observers and raters of disruptive school behavior were discussed. The feasibility of using teacher-generated behavioral statements from disciplinary referrals to better specify the parameters of DSB was suggested. A review of various rating scale development procedures attempted by business, industry, and government were summarized. The instrument developed by this study will initially be most appropriate as a research tool for conducting studies of DSB. The availability of a process for establishing local norms for DSB may facilitate local research studies in evaluating the effectiveness of

PAGE 22

12 disciplinary measures, in-service training, and alterna tive education programs. This study will likely suggest additional areas for other investigations. The identification of disruptive students for inter ventions is not standardized. This instrument may assist in establishing quantitative criteria for selection, placement, and treatment of disruptive students. This, in turn, may lead to recognition of DSB as a category for exceptional student education funding. A major premise in much of the literature concerning DSB is the role of school personnel in exacerbating disruptive behavior. It may be that an instrument which provides a behavioral profile of the disruptive student will suggest goals for in-service training programs. Definition of Terms For the purposes of this study, the following definitions apply: Alternative education program. An educational procedure which provides intervention outside the regular classroom for students exhibiting some predetermined level of disruptive or disinterested school behavior. Disruptive school behavior CDSB). Behavior that disrupts the learning of self and/or others and is not attributable to severe emotional disturbance or other exceptional education categories.

PAGE 23

13 DelinQuent behavior. Behavior by persons under 18 years of age which violates laws and regulations pertain ing to them. Exceptional child (student) education programs. Programs which receive additional funding in order to better serve the needs of students meeting governmental guidelines for special assistance. Experienced teachers. Full-time, regular classroom teachers who have held that position at least two academic years. Expulsions. Removal from school for at 1 east the remainder of the school year. Locally developed norms. Criteria for comparing an individual student's DSB with the expected DSB of a specific reference population in the local school or community. Maladaptive social behavior. Behavior not of organic origin which would be judged by impartial observers to be inappropriate for the social situation and which ulti mately results in aversive consequences for the person exhibiting the behavior. Method bias. The influence on ratings of the type of rating method used. Non-guantitative assessment. See Qualitative assessment. Qualitative assessment. Evaluation based on individual opinion and lacking a systematic basis.

PAGE 24

14 Quantitative assessment. The use of numbers in describing behavior so that a higher number indicates a higher 1 ev el of the be h av i or Severity. A prediction, stated quantitatively, of the potentially detrimental consequences a disruptive behavior would likely have for a student. Special education programs. See Exceptional child education programs. Suspensions. Temporary removal from the regular educational program of a school, usually involving exclusion from school facilities for a specified number of days. Organization of the Study There are four remaining chapters in this dissertation. Chapter Two will present a review of the literature related to the development of an instrument to assess disruptive school behavior (DSB). Specifically, consider~tion will be given to disruptive behavior in the schools, existing assessment methods, rating scale develop ment, the psychometric properties of rating scales, and the possible uses of results from a disruptive behavior rating scale . Chapter Three will present the methodology employed in the development, validation, and field testing of the Disruptive Student Behavior Scale (DSBS). Included are

PAGE 25

15 the research questions, information on the population, procedures used in developing the scale, pilot testing, data analyses, and possible limitations of the study. Chapter Four will present the results of this study, including the data and the information inferred from the data. An explanation of the results will be given and they will be related to past research. Chapter Five will include conclusions from this study, along with implications for theory, research, practice, and training. A summary of the entire study will be presented, followed by recommendations for addi tional research.

PAGE 26

CHAPTER TWO REVIEW OF LITERATURE This study requires an investigation of the history and current status of attempts to define disruptive behavior in public schools; identification, assessment, and placement efforts directed toward disruptive students; rating scale development procedures; research into the psychometric properties of rating scales; and the use by schools of results obtained from rating scales. Accord ingly, this chapter will review research and opinion covering both theoretical and applied considerations relating to these topics. Definition of Disruptive School Behavior CDSB) According to Camp (1981), the major issue in student discipline in the secondary schools is how to describe quantitatively the kinds of disruptive behavior currently occurring. Summarizing a 1978 survey of state direct ors of special education, Hirshoren and Heller (1979) reported that while individual states define emotional disturbance consistently, there is considerable variation in the kinds of children so identified. That is, children meeting program criteria in one state appeared to be excluded in 16

PAGE 27

17 another. Much has been written in an attempt to resolve this situation. A review of the literature suggests the emergence of five discrete perspectives: (a) empirical, (b) clinical, (c) conceptual, (d) educational, and (e) school. The empirical approach of applying factor analysis (Cattell, 1978; Gorsuch, 1974) to a variety of items has resulted in the identification of some common behaviors associated with disruptive school behavior and has contri buted to defining DSB (Achenbach, 1978; Achenbach & Edelbrock, 1978; Edelbrock, 1979; Peterson, 1961; Quay, 1964, 1978; Quay & Peterson, 1967). However, researchers utilizing the empirical approach have included a broad range of behaviors, including many which identify delin quency and personality disorders (Freemont & Wallbrown, 1979), and so the scales developed from these studies have limited application for school personnel in defining the specific category of DSB. The classification of disorders contained in the Diagnostic and Statistical Manual of Mental Disorders, 3/e. (DSM-III) (American Psychiatric Association, 1980) and research studies incorporating these classifications and descriptions exemplify clinical efforts to define disruptive school behavior. Hewett and Forness (1982) pointed to the necessity of finding a common frame of reference between educational and psychiatric diagnoses in

PAGE 28

18 order for school personnel to accurately interpret clinical reports. Forness and Ca ntw el 1 ( 1982) concluded that the respective diagnostic systems of psychiatry and special education remain dissimilar. Likewise, other studies (Loeber, 1982; Werry, Methuen, Fitzpatrick, & Dixon, 1983) failed to find support for the use of psychiatric diagnoses to assign students to special education programs. The conceptual approach utilizes experience, research, and opinion in formulating descriptions of what is usually referred to in this perspective as "problem behavior" (Jessor & Jessor, 1977, p. 4). Cullinan (1975), Howell (1978), and Richard Jessor (1982) are among those applying a psychosocial conceptualization of problem behavior to the study of adolescent behavior . Nev erthe less, while the conceptual perspective gives support to the notion of comparing the behavior of an individual student with the behavior of peers before declaring the student to be deviant, this perspective fails to provide specific criteria for making such a comparison . The educational perspective includes the definitions contained in federal and state statutes, guidelines proposed by governmental agencies, and district codes of student conduct. In 1 977, the U.S. government, with out defining the term, specifically excluded the socially maladjusted student from receiving exceptional child

PAGE 29

19 education services under P.L. 94-142. The term "socially maladjusted" is not defined in the 1 ate st Fl or ida guide1 ines for providing special education for exceptional students (Florida DOE, 1985). The U.S. Bureau of Educa tion for the Handicapped has sponsored the compilation of a manual on behavior disorders (Yard, 1977). However, these items are too general for use in a quantitative i n st rum en t Codes of student conduct contain lists of behaviors for which punishment may be administered. Offenses 1 isted in the codes may be violations of either school rules (e.g., inappropriate display of affection) (Duval County Public Schools, 1980) or of law (e.g., vandalism) (Pinellas County Schools, 1983). While these offenses must be considered in defining disruptive school behavior, they exclude many of the disruptive behaviors frequently occurring within the classroom. Federal, state, and local guidelines seem insufficient for operationally defining DSB specifically enough to be useful in a selection instrument. The school perspective focuses on the interactions of students, teachers, and administrators within schools. Disruptive school behavior is seen as a product of these interactions. H. M. Walker, author of The Walker Problem Behavior Identification Checklist (1970), described the acting-out child as one who usually defies rules and

PAGE 30

20 ignores classroom procedures, is difficult to manage, avoids failure by attempting little academic work, and alienates teachers and other students by behaving aversively. Specific behaviors often include hitting, yelling, leaving seat, arguing, having temper tantrums, and provok ing others and often lead to confrontations. These confrontations may be verbal, physical, or both. Acting out behavior may occur in the classroom, in nonclassroom areas, or both. Walker (1970) proposed that acting-out children are differentiated from other students by the frequency, or quantity, of these behaviors, not by the type of behaviors. Thus, a measuring instrument must provide for a frequency component. Camp (1981) explored the types of behavior considered to be disciplinary problems, the perceived degree of severity of these behaviors, and the frequency with which these behaviors were observed. Camp found that the types of behaviors rated most serious were rarely observed and concluded that the most serious problem may be the frequent, though mild, behaviors that undermine student and teacher morale. A study of 21 secondary school administrators' attitudes toward aggressive behavior suggested that suspensions were awarded according to the administrators' attitudes toward the referred behavior,

PAGE 31

21 rather than according to a consistent standard for the school district (Pisarra & Giblette, 1981). An evaluation of literature of the school perspective suggests that DSB can be defined in terms which students, teachers, and administrators understand; that the three factors of type, severity, and frequency need to be considered; and that measures of DSB need to be standard ized. In this section five perspectives for defining disruptive school behavior were presented. Each perspec tive offers some assistance in differentiating this category from other behavioral categories. There appears to be support for an instrument which operationally defines types of behaviors occurring throughout the school environment, assigns a quantity to each descriptive item based on the perceived frequency of occurrence and sever ity, and provides for comparing the score of an individual student to a predetermined norm for that environment. Identification, Assessment, and Placement "Measurement is the construction of a model of some property of the world" (Fraser, 1980, p. 27) and in education this property is often the behavior of a student. One role of the model provided by a measure is to give accurate prescriptive information for planning interventions with students (Forness, 1983). Several studies have suggested this is being performed

PAGE 32

22 inadequately (Greenwood, Walker, & Hops, 1977; Schenck, 1980; Sinclair, 1980; Sinclair & Kheifets, 1982; Spivack & Swift, 1973; Strain, Cooke, & Apolloni, 1976). Fraser (1980) acknowledged that psychological mea surement has been regarded as being quantitatively and qualitatively of a lower order than physical measurement. To achieve improvement, Ysseldyke and Marston (1982) have argued for the use of direct observations of target behav iors by either teachers or trained observers. However, Jones, Reid, and Patterson (1975) found observer reli ability varied inversely with the complexity of the behaviors being observed. Attempts to improve the validity of observations have included such sophisticated approaches as Multidimensional Scaling (MDS) (Torgerson, 1958). Sanson-Fisher and Mulligan (1977), using adolescent student models, found only marginal improvement for this technique over ratings by classroom teachers. A comparison of a computer-driven program for selecting behavioral/emotional disorders with two expert psychologists' selections indicated no mean ingful differences existed (McDermott & Hale, 1982). Weinrott (1979) summarized studies that indicated global ratings could be significantly influenced by expectations, while post hoc ratings of the same children by the same raters when recorded on an instrument accurately reflected discrete behavioral events. Gaynor and Gaynor (1976)

PAGE 33

23 argued for instruments written to define behaviors so they may be described quantitatively by teachers. Beltramini (1982) suggested that scale-item content is more important than other variables in obtaining reli able and valid results. A review by Albaum, Best, and Hawkins (1981) of measurement literature found evidence to support the use of from five to seven categories on Likert type scales, with no significant losses in reliability, validity, or discrimination when compared with instruments using more intervals. Fewer intervals sometimes resulted in a loss of discriminative power and validity. It appears that teachers using instruments which operation ally describe disruptive behaviors can be effective post hoc raters and are . able to provide reliable and val id identification of disruptive school behavior (Edelbrock, 1979; Gresham, 1982; O'Leary & Johnson, 1979). A review of current assessment techniques suggests the emergence of a quantitative/qualitative dichotomy, which will now be explored. In two reviews (Spivack & Swift, 1973, 1977) of instruments for measuring secondary school classroom behaviors no instrument was located which limits its focus to disruptive school behavior, uses only behaviorally-stated items, and provides for calculating local norms. Descriptions follow of representative instruments currently in use.


The Behavior Problem Checklist (BPC) (Quay & Peterson, 1967, 1975, 1979) is a 55-item scale of behavioral traits developed from a review of clinical records of kindergarten through eighth grade students referred for psychiatric treatment (Quay, 1977). The items were assigned by factor analysis to four scales plus a grouping suggestive of psychosis. Epstein, Cullinan, and Rosemier (1983, p. 172) and Gresham (1982, p. 137) reported that the BPC is one of the behavior rating scales most widely used in school studies. The BPC has been used extensively both as a research device (Eaves, 1975; Jacob, Grounds, & Haley, 1982; Kelley, 1981; Touliatos & Lindholm, 1981) and in selecting students for interventions (Algozzine, 1977; Balow, 1979; R. Bower, 1969; Gerard, 1970; Ingram, Gerard, Quay, & Levinson, 1970; McCarthy & Paraskevapoulas, 1969).

Jacob et al. (1982) reported that reviews of studies utilizing the BPC suggested reliability and validity issues in need of further study. The inability of the BPC to provide other than broadband classifications has been noted (Achenbach & Edelbrock, 1978). Comprehensive normative data are not available for the BPC for adolescents (Kelley, 1981). In an investigation of the effects of race on BPC ratings, Eaves (1975) found that white teachers consistently rated black students higher than white students on three of the


25 subscales. Black teachers showed no such bias. Eaves (1975) concluded this bias could have a major effect on the reported norms for the BPC. Touliatos and Lindholm ( 1981) found that grade level, sex, and social class had a significant effect on BPC ratings. However, differences between schools and teachers contributed more variance in the BPC ratings than grade, sex, and social class. Touliatos and Lindholm (1981) suggested that Quay and Peterson's (1967) recommendations be followed and individual assessment be based on norms calculated for particular schools and individual teachers. Spivack and Swift (1973) concluded that the BPC was a reasonably reliable measurement tool. Potential users were cautioned, however, that most items are not specifi cally observable, but more like labels which imply behaviors and designate traits. Likewise, Stott (1971, p. 232) cited certain BPC items as requiring a teacher to make inferences about students' feelings (e.g., "feelings of inferiority"), being vague or ambiguous (e.g., "oddness, bizarre behavior"), and relating to behaviors unobservable by a teacher (e.g., "stays out late at night," "bed wetting"). This review has identified several areas of the BPC for which additional research has been suggested. The Behavior Rating Profile (BRP) (Brown & Hammill, 1978) is composed of five rating scales and a sociogram. Three of the scales (60 items) are completed by the target


26 student, one (30 items) by the teacher, and one (30 items) by parents. The sociogram is a peer nominating techni que. The student scales provide self-ratings of behaviors at home, at school, and with peers. The BRP is based on an ecological approach which, according to the authors, recognizes that students' behaviors are dependent on the settings in which they occur. Its purposes are the identification of students with behavior problems and the differentiations among learning disabled, emotionally disturbed, and behaviorally disordered students in grades 1-12. Each of the six measures is described as independent and individually normed, allowing any scale to be used alone or in conjunc tion with any of the others. The BRP manual (Brown & Hammiil, 1978) reports internal consistency reliability coefficients exceeding .80. Concurrent validity was investigated by correlating the BRP with measures obtained from other rating scales. Adequate construct and content validity also are reported by the authors. Norms are provided using scale scores with means of 10 and standard deviations of 3, with scores from 7 to 13 considered to be in the normal range. One study (Reisberg, Fudell, & Hudson, 1982) of behavior disordered students indicated that regular classroom teachers gave higher ratings than special educators (X:8.85 vs. X:6.87). Thus, norms may vary


according to the type of respondent (e.g., regular teacher or special education teacher). Also, students' self-ratings were inflated relative to other respondents' ratings.

Other investigators have noted problems associated with attempts at multiple and self-ratings. Lessing and her associates (Lessing & Clarke, 1982; Lessing, Williams, & Gil, 1982; Lessing, Williams, & Revelle, 1981) have reported on their unsuccessful attempts to develop parallel checklists for use by parents, teachers, and clinicians in psychiatric diagnoses. Lobitz and Johnson (1975) found low correlations between parent ratings and observed behaviors. Variables confounding self-ratings include halo effect (Holzbach, 1978), social desirability (Dunnett, Koun, & Barber, 1981; Seidman, Rappaport, Kramer, Linney, Herzberger, & Alden, 1979), and lack of self-knowledge (Beitchman & Raman, 1979). Ledingham, Younger, Schwartzman, and Bergeron (1982) investigated teacher, peer, and self-ratings of 801 elementary school students. Self-ratings yielded the lowest ratings for deviant behavior, aggression, and withdrawal and the highest ratings for likability. Accuracy of self-evaluation has been found to be positively correlated with high intelligence, high achievement status, and internal locus of control, characteristics not usually associated with DSB (Dunnett et al., 1981).


Reported research using the Behavior Rating Profile is sparse. Additional verification of the assumptions of equivalency of norms within respondent categories and the validity of the self-report scales seems indicated.

The Bristol Social Adjustment Guides, 5th edition (BSAG) (Stott, 1972), consist of 110 behaviorally-stated items from which teachers select those descriptive of a student's behavior in the month prior to the rating. The items were originally developed in 1955 from clinical observations of children aged 6 to 14 and modified by classroom teachers (Stott & Sykes, 1956). A primary goal was to incorporate context into the behavioral descriptions (Stott, 1971). The BSAG has been used extensively in clinical and research studies (Davis, Butler, & Goldstein, 1972; McDermott, 1980; Stott, 1978; Stott & Wilson, 1977). Reliability and validity data were obtained through extensive research (Stott et al., 1975) but are not reported in a manner that is easily abstracted. Normative data are available only for elementary school populations (Stott, 1972). More recent research (McDermott, 1980, 1981; McDermott & Hale, 1982) has questioned the specificity of the core syndromes of the BSAG and called for further investigation of construct and predictive validities (Hale & Zuckerman, 1981). At present, it


appears that not all of the core syndromes of the BSAG have the specificity required in an instrument to be used in educational placement.

The Hahnemann High School Behavior Rating Scale (HHSB) is a 13-factor, 45-item scale published in 1971 (Spivack & Swift, 1971). The HHSB items were developed from observations of actual classroom behaviors, operationally stated in educational terms. The items cover both academic and interpersonal issues and can be rated by teachers in the classroom. The intent is to provide prescriptive information (Spivack & Swift, 1977). The factor scores for each student are found by adding the raw scores for the three or four items comprising each factor. These scores are then combined into a profile, which is used to classify students on the basis of their ability to adapt to total classroom demands. According to the authors (Spivack & Swift, 1973), validity studies suggest consistent and significant relationships between factor scores and academic grades. No data are available on test-retest or interrater reliability (Spivack & Swift, 1973). Norms are available separately for suburban and urban samples. The HHSB is limited as a selection device for special education programs by lack of reliability data, use of only three or four items per factor, and overlapping among profile descriptions.


The Behavior Evaluation Scale (BES) (McCarney, Leigh, & Cornbleet, 1983) is a 52-item rating scale for use by school personnel. Each item is assigned to a subscale associated with one of the five characteristics of the Bower (1958) definition of behavior disorders used in Public Law 94-142. The BES was developed to aid in diagnosis, placement, and program planning under federal guidelines. Since federal criteria specifically exclude the "socially maladjusted" student, the BES is inappropriate for assessing DSB.

The Portland Problem Behavior Checklist (PPBC) (Waksman & Loveland, 1980) was developed to aid in assessment, evaluation, and intervention planning for school children. The 29 items cover teacher-rated behaviors for grade levels K-12. Norms are not available. Items are very generally stated (e.g., aggressive physical, destructive) and are rated on a scale of 0 (no problem) to 5 (severe). It is not clear whether this is a rating of the frequency of behavior or of the severity of the consequences of the behavior. These features of the PPBC would seem to limit the precision and reduce the confidence level of quantitative scores intended to support evaluation and placement for professional services.

The Pupil Classroom Behavior Scale (PCBS) (Dayton, 1967) is a 24-item, teacher-administered rating scale intended to measure the effectiveness of special education


services for students displaying inappropriate classroom behaviors. Most items are behaviorally stated and yield a profile of three factors: achievement orientation, socio-academic creativity, and socio-cooperativeness. Dayton (1967) suggested using the scales for research on groups rather than to describe individual students. Norms are not available. Spivack and Swift (1973) concluded that the PCBS is flawed by having overlapping items in the factors and lacking data to support a relationship between scale scores and emotional adjustment.

The 36-item Conners Teachers' Rating Scale (CTRS) (Conners, 1969) has been used primarily in clinical diagnosis of children, particularly in the area of hyperactivity (Goyette, Conners, & Ulrich, 1978). It does, however, cover a wide range of school problem behaviors (Roberts, Milich, Loney, & Caputo, 1981). There appears to be a high intercorrelation between the Conduct Problem and Hyperactivity subscales, limiting the usefulness of the CTRS in identifying DSB.

The Brief Behavior Rating Scale (BBRS) (Kahn & Ribner, 1982) was developed from the Devereux series of rating scales (Spivack, Haimes, & Spotts, 1967). A cross-validation study (Kahn & Ribner, 1982) reported that 61% of a socially maladjusted group and 27% of an emotionally handicapped group were correctly identified. These results suggest that additional development is needed


32 to obtain support for the discriminant validity of the BBRS. Some of the most complete research in instrument development has been conducted in attempts to improve the diagnosis of clinical populations in the school environ ment. Although these efforts are not directly comparable to the intent of the present study, six instruments having potential interest to researchers working in the school setting will be summarized. The Child Behavior Check List (CBCL) (Achenbach, 1978) contains 118 behavior problem items and 20 social competence items. Parallel forms exist for parents and teachers. A review by Achenbach and Edelbrock (1978) of empirical attempts to derive syndromes of child behavior problems concluded with the recommendation that these efforts be linked to the existing mental health system. Recent efforts by these researchers and their associates (Edelbrock & Achenbach, 1980; Reed & Edelbrock, 1983) continue to pursue this objective. At present the applicability of this instrument for educational measure ment is 1 imi ted. The role of parent observations in describing chil dren's behavior is formalized in the Louisville Behavior Check Lists (Miller, 1967, 1980). A study (Tarte, Vernon, Luke, & Clark, 1982) confirmed the validity of parent observations of clinical symptoms in their children.


The items require inferences and judgments by raters. Eight subscales were created through factor analysis, and although several appear to relate to school activities (e.g., hyperactivity, antisocial), the content of the individual items comprising the subscales renders them only marginally useful for school assessments.

The Children's Behaviour Questionnaire (Rutter, 1967) was developed for teachers' use in screening large numbers of school children for psychiatric assessment. Many of the 26 items are vaguely stated and some appear to require inferences by the rater. The two subscales are labeled neurotic and antisocial, terms which lack direct application to the school setting.

The Devereux Adolescent Behavior Rating Scale (Spivack et al., 1967) was developed to measure behavior requiring professional intervention. The subscales are oriented to clinical diagnosis and offer little specific information for use in placement decisions.

The Pupil Behavior Inventory: 7-12 Grades (Vinter, Sarri, Vorwaller, & Schafer, 1966) is a 34-item, teacher-administered rating scale intended to furnish information on students referred for agency treatment. Behavioral items were collected from teachers, screened and factor analyzed, and grouped into five factors. Lack of data on reliability, validity, and norms suggests caution in


using this instrument to select students for special services (Spivack & Swift, 1973).

The Mooney Problem Check List (MPCL) (Mooney, 1942) has been widely used by counselors to identify problems of individuals seeking counseling or to explore the problem profile of a group of students (Sundberg, 1961). However, two studies (Joshi, 1964; Stewart & Deiker, 1976) of the underlying factors of the MPCL scales have identified only a single general factor. The MPCL may be further limited by utilizing items generated from problems mentioned by high school students in 1942.

Several instruments designed for other populations include behaviors often used in descriptions of disruptive school behavior. The Adolescent Behavioral Classification Project instrument (Dreger, 1980) was developed for assessing problems of institutionalized adolescents. An analysis of the first-order factors indicates some commonalities with both the Hahnemann High School Behavior Rating Scale (Spivack & Swift, 1977) and Achenbach and Edelbrock's (1978) syndromes, but many are couched in clinical terms that have little or no relevance to the classroom setting.

Ostrov and associates (Ostrov, Marohn, Offer, Curtiss, & Feczko, 1980) developed and validated the Adolescent Antisocial Behavior Check List (AABCL) for delinquents housed in an institutional treatment setting. The authors


called for modification of the instrument for use in other settings; however, extensive rewriting of items would seem to be required.

The Jesness Inventory (Jesness, 1972) was created to measure attitude change in youthful offenders undergoing treatment. One study (Graham, 1981) found the Jesness Inventory did not have the power to discriminate between non-adjudicated and normal populations and thus would not be useful in a school setting. The Jesness Inventory appears best suited for research (Buros, 1978, pp. 876-878). The Jesness Behavior Checklist (JBC) (Jesness, 1970) is also a measure of delinquent behavior. The reliability and validity of this instrument have been questioned, and the JBC is recommended only for research purposes (Buros, 1978, pp. 873-876).

Non-quantitative assessment often uses nonsystematic observations to provide the information from which judgments will be made. Judgments about individuals are required in all assessment. Inaccurate, biased, or subjective judgments can be misleading and harmful (Salvia & Ysseldyke, 1981). The Russell Sage Foundation Conference Guidelines (Goslin, 1969) and the 1974 Family Educational Rights and Privacy Act (P.L. 93-380, the Buckley amendment) established guidelines for the proper collection, maintenance, and dissemination of data concerning students.


For data to be used in making judgments, they must be verified. For standardized tests, this verification is implicit in the psychometric qualities of the instrument. For observational data, verification requires confirmation by persons other than the original observers (Salvia & Ysseldyke, 1981). When the observation is nonsystematic, verification may be difficult to establish and support, and the assessment and resulting evaluation may be open to challenge.

After a classroom teacher nominates a child for evaluation for exceptional child education services, that teacher's observation is verified by required legal procedures (P.L. 94-142). There may be no such procedures for other interventions. The Duval County, Florida, School District has used teacher and principal nominations as the criteria for admittance to and dismissal from a program to intervene with students displaying inappropriate social behaviors (Duval County Public Schools, 1980). Short-term suspensions in many school districts do not require hearings and are based solely on a judgment by the school principal (Lines, 1972; Pisarra & Giblette, 1981).

Subjective assessment practices such as these may allow extraneous variables to influence judgments (Poulton, 1976). Four such variables are bias, the influence of observer expectations, inaccurate perceptions, and vagueness of the criteria for intervention.


Pupil characteristics were found by Ysseldyke and Marston (1982) to influence rater bias. Variables contributing to bias include perceived physical attractiveness (Ross & Salvia, 1975); sex, socioeconomic status, and reason for referral (Matusek & Oakland, 1979; Ysseldyke & Algozzine, 1982; Ysseldyke, Algozzine, Regan, & McGue, 1979, 1981); race (Florida Department of Education Report on Public Schools, 1983; Sikes, 1975); type of behavior displayed by the student (Algozzine, 1980); and the theoretical orientation of the observer (Messick, 1980; Salvia & Ysseldyke, 1981).

Erickson (1974) and Shuller and McNamara (1976) found that naive observers' reports coincided with experimenter-induced expectancies about problem behavior. After observing decisions made by educators, Weinrott (1979); Ysseldyke, Algozzine, and Richey (1982); and Algozzine and Ysseldyke (1981) speculated that these judgments were influenced by an expectancy factor created by the situation itself. A more direct measure of expectation was reported by Green and Brydon (1975). They found teachers' attitudes were much more favorable toward middle-income children than toward low-income children and that 43% of teachers' comments about black children were negative as opposed to 17% of comments about white children.


38 Dunlap and Dillard (1980) investigated 164 school principals' perceptions of the factors indicative of emotional disturbance in children. The factor least chosen by the principals was the one considered by the researchers most predictive of emotional disturbance. The vagueness of criteria for suspension in one school district was investigated by Pisarra and Giblette (1981 ). They found the criterion to be improper conduct, which was not further defined. The researchers concluded that a student reported for fighting would be suspended, possibly suspended, or not suspended depending on the individual administrator who had jurisdiction. A few of the possible sources of error in nonsystem atic observation leading to inaccurate, biased, or subjective judgments have been presented to suggest their ubiquitous nature and the necessity of providing for systematic observations in judgments leading to educa tional placement decisions. Rating Scale Development Designing a rating scale requires addressing four major issues: (a) what to measure (parameters), (b) how to measure (item content and format), (c) how to record (response format), and (d) how to interpret the results (statistical analysis). Literature pertaining to these issues will be reviewed in this section.


In a frequently cited longitudinal study of deviant behavior, Robins (1966) found the variables of type of behavior, frequency of occurrences, and severity of consequences to be indicators of future behavior patterns. More recent studies supporting these criteria include those of Kohn, Koretzky, and Haft (1979); Camp (1980); Forness and Cantwell (1982); Gresham (1982); Loeber (1982); and a United States Department of Justice report (1982, p. 1).

The types of behavior to be measured by a rating scale are determined by its author(s), who must consider content, sources, format, number, and order of presentation of the items to be included. Halo effects, or the tendency to rate individuals holistically (Thorndike, 1920, p. 25; Willingham & Jones, 1958), were found by Cooper (1981, 1983) to be reduced by more specific item content. Kreitler and Kreitler (1981) demonstrated that items deemed irrelevant by raters tended to be scored neutrally, thus limiting the derived information. Nevertheless, scales for rating disruptive behavior sometimes include prosocial behavior content (Miller, 1980). However, Deno (1979) suggested that observing nondisruptive behavior ignores the purpose of these ratings, i.e., to determine whether inappropriate behaviors are actually excessive. Schriesheim and Hill (1981) mixed positive and negative statements on a questionnaire and


40 concluded that the effect was to reduce response validity. Many scales do limit their items to behaviors that focus on problem behavior (DiPrete, 1981; Duke, 1978; Governor's Task Force on Disrupted Youth, 1974; Spivack & Swift, 1966; Walker, 1979, p. 55), although not necessarily school problems. Camp (1980) suggested that only school problems directly observable by teachers and/or administrators be included in scales for rating disruptive school behavior. Logically, items taken from the setting in which the ratings will be made best meet the criteria for relevant content. Smith and Kendall (1963) used this premise in devising Behavioral Expectation Scales (BES). Numerous examples exist of the application of this premise in education (Brown & Hammill, 1978; Camp, 1980; Duval County School Board, 1979; Ross, Lacey, & Parton, 1965; Sherry, 1979; Spivack & Swift, 1977; Stott et al., 1975), mental health (Kaufman, Swan, & Wood, 1979; Kohn et al., 1979; Lachar & Gdowski, 1979; Miller, 1980) and industry (Vance, Kuhnert, & Farr, 1978). Item format refers to the various forms used in presenting the information to which the rater is asked to respond. It is often related to response format, which refers to the methods of collecting information from the raters. Response format literature will be presented in the section covering the frequency characteristic.


Four types of item formats are currently in use in behavioral rating scales. Behavioral Observation Scales (BOS) describe the target behavior in specific terms that require direct observation at the time the rating is made (Latham & Wexley, 1977). Behaviorally Anchored Rating Scales (BARS) provide a specific description of a behavior for each successive rating point (anchor) of an item and assess cumulative behavior over some time period (Smith & Kendall, 1963). The Mixed Standard Scale (MSS) uses several scales, with three levels of behavioral description for each trait to be measured, and randomizes the order of presentation (Blanz & Ghiselli, 1972). Summated rating scales (Edwards, 1957), referred to as Likert scales (LT) (Likert, 1932) or graphic rating scales (Waters, Reardon, & Edwards, 1982), present for each item one statement that may be specific or general. Likert scales have been used with both direct and deferred observation. BOS scales are developed using summated rating procedures (Likert, 1932), while BARS and MSS use the Thurstone (Thurstone & Chave, 1929) scale development process (Bruvold, 1969).

Conflicting conclusions have resulted from numerous investigations into the advantages and disadvantages of these scale formats. Fay and Latham (1982) found BOS to be superior to BARS in rating videotaped behavior during job interviews. However, Murphy, Martin, and Garcia


42 (1982) questioned the theoretical basis for BOS and found evidence to suggest that BOS tapped recall for behavior traits as well as immediate observation. Several studies (Hom, DeNisi, Kinicki, & Bannister, 1982; Ivancevich, 1980; Keaveny & McGann, 1975; Lee, Malone, & Greco, 1981) failed to find significant advantages for the BARS format over summated rating scales or other alternative methods (Jacobs, Kafry, & Zedeck, 1980; Kingstrom & Bass, 1981; Schwab, Heneman, & DeCotiis, 1975). In opposition to MSS theory, Finley, Osburn, Dubin, and Jeanneret (1977) found evidence to suggest that an obvious scale format may be superior to a hidden contin uum. Dickinson and Zellinger ( 1980) compared MSS, BARS, and LT formats and found MSS produced less method bias, BARS produced as much discriminant validity as MSS and provided the best feedback to ratees, and LT scales were easiest to understand and use. When Bruvold (1969) tested the application of summated scales (Likert, 1932) and successive interval scales (Edwards & Thurstone, 1952) to the same data set, no significant differences were found between the two scaling methods. According to Bernardin and Smith (1981), one explanation may be that scale constructors have deviated from the original procedures (Smith & Kendall, 1963) in developing BARS instruments. In addition to the Thurstone and Likert scaling procedures, a third method is available. According to


43 Edwards (1957, p. 172), a Guttman (1944, 1945, 1947a, 1947b), or cumulative scale, requires that the construct to be measured be unidimensional. Since disruptive school behavior consists of many discrete behaviors, a Guttman scale is not suitable for the instrument developed in this study. At present, it appears that no item format is superior enough to warrant relinquishing the clarity of understanding and ease of use (Dickinson & Zellinger, 1980) of the Likert scale, which presents one descriptive item at a time to which the rater assigns a quantitative value from a given range of values. In determining the number of items to include in a rating scale, some researchers (Quay & Peterson, 1967, 1979; Spivack & Swift, 1971; Stott, 1972) have relied on factor analysis, using an arbitrarily chosen factor score as the cut-off score. Edwards (1957) suggested an intui tive approach, utilizing 20-25 items that discriminate between the groups at the extremes of the scale. A comprehensive study (Achenbach & Edelbrock, 1978) of 18 rating scales found the range of items to be from 36 to 287 (median = 68 items; mean = 90.4 items). Of the 6 scales intended for use by teachers, 4 contained fewer than 50 items and 2 between 50 and 100 items. In a study of preferred scale length, Meredith (1981) found half of the respondents preferred from 20 to 40 items, with 25 the median preferred length. In another


study, Meredith (1975) found a 52-item scale was judged too long. Seidman and his associates (Seidman et al., 1979) concluded their 46-item Teacher Behavior Description Form was too cumbersome and reduced it to 23 items. While item complexity is probably a factor (Meredith, 1981), this review suggests a scale using no more than 40 items would probably be acceptable to most teachers.

The ordering of items within a scale has been suggested as a possible source of leniency error, halo effect, and impaired discriminant validity (Blanz & Ghiselli, 1972). Schriesheim and DeNisi (1980) and Schriesheim (1981b) found that grouping questionnaire items according to constructs, rather than randomizing them, resulted in impaired discriminant validity. Increased leniency response bias was also found when items were grouped (Schriesheim, 1981a). Dickinson and Zellinger (1980) concluded that a randomized scale contributed as much discriminant validity as an ordered scale while displaying less method bias. In a comparison of randomized and grouped scales, the randomized scale engendered as much convergent and discriminant validity as the grouped scale (Waters et al., 1982). Thus, a randomized order of presentation seems indicated.

Obtaining a meaningful measure of the frequency of target behaviors requires attention to the variables of response format, length of the observation period, and


type and number of raters. According to Tzeng (1983), four response formats are most frequently cited in the literature. They can be differentiated in terms of two psychometric criteria. First, the existence of a neutral response option defines the free choice format; absence of a neutral rating option defines the forced choice format. Second, categorical (qualitative) ratings answer the question "Does the ratee fit this category?" while discriminatory (quantitative) ratings answer the question "To what degree does the ratee fit?"

Tzeng (1983) criticized forced choice measures for their omission of a valid response category, i.e., uncertainty or neutrality of the raters' perceptions. King, Hunter, and Schmidt (1980) concluded that a forced choice format was ineffective in reducing rater halo. Dunnette (1963, p. 96) reported that rater resistance to forced choice formats led to their abandonment. Categorical, or qualitative, formats used in checklists cannot detect relative differences in degree between two behaviors performed by the same ratee or between the same behaviors among ratees (Tzeng, 1983). Johnson, Smith, and Tucker (1982) found less response skewness on a 5-point Likert discriminatory scale compared to a yes/?/no categorical format. A zero-based discriminatory, free choice response format seems most appropriate (Likert, 1932). The absence of a behavior can be indicated


by the 0 position or, if present, the perceived frequency can be indicated by choosing a value from the remainder of the scale (Edwards, 1957).

The number of value choices permitted to the rater is a critical issue. If few points are used, some information may be lost, but the scales are less ambiguous for the rater. If there are too many points, the discriminations may be too fine for the rater to make. Albaum et al. (1981) attempted to show superiority for a continuous scale format, but concluded that equivalent aggregate measurements were obtained from a 5-category, discrete rating scale. Likewise, Bernardin et al. (1976) and Bardo and Yeager (1982) failed to find continuous scales superior to discrete scales. The superiority of a 5-point, discrete rating scale has been suggested by Cowen, Dorr, Clarfield, Kreling, McWilliams, Pokracki, Pratt, Terrell, and Wilson (1973); Lissitz and Green (1975); McKelvie (1978); Neumann and Neumann (1981); and Broadbent, Cooper, Fitzgerald, and Parkes (1982). Conversely, Bardo and his associates (Bardo & Yeager, 1982; Bardo, Yeager, & Klingsporn, 1982) found obtained means and variances closer to the expected values for 4-point scales than for 5- and 7-point scales. These results appear contrary to most other studies. Edwards (1957, pp.


150-151) gives Likert's original statistical rationale for the use of a 5-point scale, anchored with the integers 0 through 4, and the summation of scores for individual items as a total score for each ratee. Current research provides no compelling evidence for departing from this original format.

An anchor, e.g., "always," "sometimes," "never," is usually associated with each scale point of a Likert-type summated rating scale (Pohl, 1981). While a variety of anchors has been used, the basis for the selection is often not stated (Beatty, Schneier, & Beatty, 1977; Broadbent et al., 1982; Camp, 1980; Cowen et al., 1973; Hunter, Hunter, & Lopis, 1979; Kassin & Wrightsman, 1983; Moses, 1974; Siegel, Dragovich, & Marholin, 1976; Solomon & Kendall, 1977; White, 1977). Several studies have investigated the assumptions involved in the selection of one popular set of anchors: always, often, occasionally, seldom, and never. Parducci (1968), Chase (1969), and Pepper and Prytulak (1974) concluded that the meanings of anchor words were influenced by context. The effects of individual differences among raters on their interpretations of anchor words were demonstrated by Helson (1969) and Goocher (1965). These studies suggested that the above anchors may not define perceptually equal intervals along the rating continuum.
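Before turning to studies of particular anchor sets, the summated scoring Likert proposed can be made concrete. The following minimal sketch is purely illustrative: the item wording and ratings are hypothetical and are not drawn from any instrument reviewed in this chapter, and the anchor labels anticipate the set derived by Pohl (1981) that is discussed next.

    # Minimal illustration of Likert-style summated scoring with 0-4 anchors.
    # Item wording and ratings are hypothetical.
    ANCHORS = {0: "none of the time", 1: "very infrequently", 2: "sometimes",
               3: "quite often", 4: "always"}

    def total_score(item_ratings):
        """Sum the 0-4 ratings across items to obtain one total score per ratee."""
        for rating in item_ratings.values():
            if rating not in ANCHORS:
                raise ValueError("each rating must be an integer from 0 to 4")
        return sum(item_ratings.values())

    # One teacher's hypothetical ratings of one student on three items.
    ratings = {"talks out during seatwork": 3,
               "leaves seat without permission": 2,
               "throws objects in class": 0}
    print(total_score(ratings))   # prints 5

The appeal of this format lies in its simplicity: the rater handles one descriptive item at a time, and the total is nothing more than the sum of the assigned values.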


Four studies (Bass, Cascio, & O'Conner, 1974; Schriesheim & Schriesheim, 1974, 1978; Spector, 1976) have sought to select five anchor words that would be perceived by raters as defining equally spaced rating intervals. However, the most definitive study appears to be Pohl's (1981) partial replication of the Bass et al. (1974) and Schriesheim and Schriesheim (1974, 1978) studies. Using responses from 164 college students, Pohl (1981) calculated the means and standard deviations for 39 expressions of frequency. Comparing these with the theoretical mean responses for a 5-point equal-interval scale, Pohl (1981) derived the response set of always, quite often, sometimes, very infrequently, and none of the time. The calculated mean (26.71) for the mid-point term "sometimes" differed significantly (p < .001) from the theoretical mean (29.05), but it was nevertheless the value closest to the optimum for a 5-point scale. The other calculated values were not significantly different from the theoretical profile. Thus, with the exception of the mid-point term, it appears that the anchors produced by the Pohl (1981) study adequately defined equal-appearing intervals on a 5-point rating scale.

The length of the period for which behaviors are to be rated has been little studied. For instance, the manual for the Behavior Problem Checklist (Quay &


49 Peterson, 1967, 1975, 1979) does not specify for the rater the inclusive time period to be considered in rating the listed behaviors. The authors of the Devereux Elementary School Behavior Rating Scales (Spivack & Swift, 1966) instructed their raters to "consider recent and current behavior" (p. 75). The same authors (Spivack & Swift, 1977), in developing the Hahnemann High School Behavior Rating Scale, instructed teachers to base ratings on behavior observed "over the past month" (p. 300). A study (Hinton, Webster, & O'Neill, 1978) of hospi talized clinical patients used a 6-week time period. An investigation (Beatty et al., 1977) of performance rating in a data processing firm utilized three assessment periods of two months each for a total of six months. In a study of several response formats, Broadbent et al. (1982) used a 6-month inclusive time period. However, in none of these studies was a rationale given for selection of the time period. Two attempts at aggregating measures over specific time periods have provided more precise instructions to the rater. Cowen et al. (1973) defined each of five rating points in terms of the inclusive time periods to be considered when aggregating occurrences of behavior. For example, the fourth anchor point, often, was defined as "you have seen this behavior more often than once a week


but less often than daily" (p. 16). Camp (1980) used the following scale:

Frequency of occurrence
0  Never observed
1  Once or more in semester
2  Once or more monthly
3  Once or more weekly
4  Once or more daily (p. 11)

The work of Seymour Epstein (1980), in support of the stability over time of personality traits, bears directly on the issue of aggregating behavior ratings over some time period. Epstein (1980) stated that "stability can be demonstrated as long as the behavior in question is averaged over a sufficient number of occurrences" (p. 791). In testing this hypothesis, Epstein conducted four studies in which he used, among other types, ratings performed in classrooms by teachers. Epstein suggested aggregating behavior over subjects, stimulus situations, time, and modes of measurement in order to establish predictive reliability and validity (p. 797). Ratings of middle and junior high school students by their teachers in different courses would meet the conditions of subjects and situations. Epstein (1980) suggested that ratings at a single time following multiple or extended observations represent an intuitive averaging that has the "potential for producing highly replicable and valid results" (p. 802). Harrop (1979) also challenged the common assumptions (Fay & Latham, 1982;


51 Latham, Fay, & Saari, 1979) that coding of directly observed behaviors produced superior results to aggregat ing behaviors over time. A related concern in the assessment of school-related behavior is selection of the time of year in which the ratings will be made. Several studies (Cowen et al., 1973; Epstein, et al., 1983; Larrivee & Bourque, 1980) recommend allowing student behavior and teacher percep tions to stabilize. Supporting these decisions are data from the Texas Junior High School Study (Evertsen, Anderson, & Brophy, 1979). Evertsen and Veldman (1981) found a moderate but steady increase in serious misbehavior over the course of the school year and an increase in general misbehavior in April. Evertsen and Veldman (1981) concluded that short term studies should avoid ratings made either early or late in the school year. The available literature seems to suggest the feasibility of aggregating behaviors over time periods specified in the rating scale instructions and after teachers have had at least two months to observe student behavior. Deciding on the most appropriate type of rater to use in assessing children's behavior has long been a problem. In 1965, Ross et al. recognized the potential usefulness of teacher ratings. Teacher's ratings have been found to be more accurate than peer ratings of classroom behaviors


(Bailey, Bender, & Montgomery, 1983); other school professionals' ratings (Bower & Lambert, 1971, p. 143; Fremont & Wallbrown, 1979); and institutional child care workers' ratings (Kohn et al., 1979); and to be equivalent to the ratings obtained by a multidimensional scaling technique (MDS) applied to classroom behavior (Sanson-Fisher & Mulligan, 1977).

A number of researchers have found support for teacher ratings as appropriate measures of general classroom behaviors (Solomon & Kendall, 1977), social behavior (Loranger, Lacroix, & Kaley, 1982), assertive vs. aggressive behavior (Roberts & Jenkins, 1982), acting out behavior (Walker, 1970), and behavior that would likely result in referrals for exceptional child education (Dean, 1980; Epstein et al., 1983; Horne & Larrivee, 1979; Lahey, Green, & Forehand, 1980; McKinney & Forman, 1982; Roberts et al., 1981).

Not all studies have yielded positive results. Morris and Arrant (1978) found that regular classroom teachers tended to see more behavior problems in students referred for evaluation than did school psychologists. A study (Kazdin, Esveldt-Dawson, & Loar, 1983) of psychiatric inpatient children found extra-class raters' evaluations of overt classroom behaviors to correspond more closely to direct observational data than did teachers' ratings. However, teachers were more accurate than the extra-class


raters in identifying hyperactive children using a behavior checklist. Overall, the evidence suggests strong support for the use of teachers as raters of classroom behavior.

An associated issue is the use of multiple raters to increase reliability and reduce halo effect (Epstein, 1980). Ratings of students commonly are obtained from all teachers having direct classroom contact (Linton & Chavez, 1979; Wixson, 1980). This procedure could result in as few as one or as many as seven ratings, depending on the grade level and local practice. More recent research efforts have focused on empirically determining the most effective number of raters. Prinz and Kent (1978) increased from one to four the number of raters of parent-adolescent interactions in a clinical setting and reported increased reliabilities. Both reliability and concurrent validity of clinical judgments were shown to increase when the number of judges was increased from one to ten (Horowitz, Inouye, & Siegelman, 1979). Strahan (1980) extended the Horowitz et al. (1979) study and concluded that after four raters, adding more contributed little to measurement effectiveness. Another study (Green, Bigelow, O'Brien, Stahl, & Wyatt, 1977) of inpatient clinical behaviors found little improvement when using more than four raters.
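Although none of the studies cited here frame their results this way, classical test theory offers a rough illustration of why the gains from adding raters taper off. Under the Spearman-Brown formula, and assuming roughly parallel raters, the projected reliability of a mean rating rises quickly at first and then flattens; the sketch below is an illustration only, not an analysis performed in any of the cited studies.

    # Illustration only: projected reliability of the mean of k parallel
    # raters, given a single-rater reliability r, via the Spearman-Brown
    # formula r_k = k * r / (1 + (k - 1) * r).
    def mean_rater_reliability(r, k):
        return k * r / (1 + (k - 1) * r)

    for k in (1, 2, 4, 7, 10):
        print(k, round(mean_rater_reliability(0.50, k), 2))
    # With r = .50 the projected values are .50, .67, .80, .88, and .91;
    # most of the gain is realized by about four raters, a pattern
    # consistent with Strahan's (1980) conclusion.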


Although generally in agreement with the above studies, Kenny and Berman (1980) added a cautionary note, pointing out that if raters are completely unreliable, increasing their number will not increase reliability. The number of teachers usually available in a middle or junior high school to serve as raters would appear to be adequate to contribute to both improved reliability and concurrent validity.

Various classifications of severity have been adopted in school settings. Student conduct codes typically use some method of indicating the seriousness of offenses, such as "serious misconduct" (Pinellas County Schools, 1983, p. 7) and "minor, intermediate, and major" (Duval County Public Schools, 1980, p. 16). Researchers (Pisarra & Giblette, 1981) have used categories emphasizing the targets of the behavior (e.g., offenses against persons, offenses against state laws). Teachers often focus on specific behaviors (e.g., use of drugs, striking a teacher) (Camp, 1981), and administrators have used a combination of both (National School Public Relations Association, 1973).

There is little consensus on the number of levels to be used in assigning degrees of severity. Taylor et al. (1979) used levels ranging from 1 (not very severe) to 4 (extremely severe). Camp (1980) used 0 for "not concerned" through 4 for "extremely concerned." In an earlier study, Moses (1974) used three levels, 1 (mild), 2


(moderate), and 3 (severe) in asking mental health and criminal justice professionals to rate a list of problem behaviors. Using too many levels may imply a degree of confidence in discrimination not supported by the subjective nature of such ratings.

Not all rating scale authors and researchers accept the necessity of including a severity rating (Searls, Isett, & Bowders, 1981; Spivack & Swift, 1977). Even when, as in the Behavior Problem Checklist (Quay & Peterson, 1967, 1975, 1979), a severity factor is provided for, the author does not always recommend its use. However, at the practitioners' level the degree of severity of behaviors is a major concern. Algozzine (1979), using items characteristic of several behavior rating scales, developed the Disturbing Behavior Checklist, which asks teachers to rate the degree of disturbance they experience as a result of different student behaviors. This suggests a consequence to the teacher based not on the frequency of the behavior but on its type and severity. After noting irregularities and lower reliabilities, Taylor et al. (1979) had teachers rate 26 items of Part Two of the Adaptive Behavior Scale (ABS) (Nihira, Foster, Shellhaas, & Leland, 1969) for severity. Teachers were able both to categorize behaviors and to rate them in terms of severity, leading Taylor et al. (1979) to conclude that this additional information would


be useful in refining the scale and adding to its clinical efficacy.

Inasmuch as the instrument developed in this study is intended to have locally developed norms, the statistical techniques used in the norming procedure and in comparing individual scores to the derived local norms are not complex. While some more recent studies have focused on problems associated with such common procedures as the calculation of measures of central tendency (Mosteller & Tukey, 1977; Stavig, 1978, 1982), many researchers continue to rely on descriptive statistics utilizing raw scores, arithmetic means, standard deviations, and standard scores. White (1977) compared individual students' scores on classroom behavior to the computed mean score for five classes of "Follow Through" program students in order to identify immature students. In a business setting, Fay and Latham (1982) used means and standard deviations in comparing scores obtained using two different rating methods. A study (Lyness & Cornelius, 1982) comparing judgment strategies and ratings of college instructors supported the use of a rating scale composed of discrete items, with an overall rating calculated by weighting the items and summing the weighted scores. To obtain mean sub-scores for subjects, Algozzine (1980) summed scores across the items defining each of four factors of


57 disturbing behaviors and used means and standard devia tions in analyzing the results. The cited studies seem to support the use of descrip tive statistics in both obtaining individual scores (i.e., sum of weighted ratings) and deriving a local norm (i.e., mean) from ratings of a representative sample of a total population. Salvia and Ysseldyke (1981, chap. 4) offer definitions of common terms for descriptive statistics applied to assessment. Psychometric Properties of Rating Scales Historically, rating techniques have aroused contro versy over estimations of validity and reliability (Ryan, 1958). Validity is the relevance of the scale to the variables being measured. Most sources recognize three types of validity, i.e., content, criterion-related or concurrent, and construct (American Psychological Association, 1966; Cronbach, 1970; Kerlinger, 1972). Reliability is the accuracy or precision of a measuring instrument and has been usually classified as either temporal, inter-rater, or internal (Cronbach, 1970). However, investigations (Epstein, 1980) into the effects of situations on behavior have recently introduced a fourth consideration, situational reliability, or the consistency of behavior across settings. The development of norms against which to compare results obtained from


58 individual administrations of rating scales is another area of active investigation (Mendelsohn & Erdwins, 1978; Messick, 1980). Research on these issues is reviewed in this section. Content validity refers to the relevance and repre sentativeness of the items used in construction of a scale (Epstein, 1980). Often, this is determined by obtaining judgments from experts not otherwise involved in the scale construction (DiStefano, Pryer, & Erffmeyer, 1983; Jones et al., 1975, p. 83; Lawshe, 1975; Thorne, 1978). Kreitler and Kreitler (1981) found that item content determined the rater's perception of the central theme of an instrument. Items not perceived as relevant to the central theme tended to be given neutral responses, thus limiting the information contributed by the rater. Criterion-related validity is studied by comparing scores obtained from an instrument with one or more external criteria of the variable being measured (Kerlinger, 1972, p. 459). Criterion-related validity encompasses both concurrent and predictive qualities (Epstein, 1980). The comparison of scale results with an independent judgment or diagnosis of a subject is an example of an attempt at estimating criterion-related validity. If the judgment or diagnosis confirms the scale indications, the inference may be drawn that the scale is in agreement with the concurrent diagnosis and is


predictive that others given a similar rating would also be diagnosed similarly (Kohn et al., 1979; Mendelsohn & Erdwins, 1978). In one validation study, Harris, Kreil, and Orpet (1977) used the school principal, guidance counselor, and two teachers as judges in selecting both disruptive and prosocial students for rating by the Behavior Coding System (Patterson, Ray, Shaw, & Cobb, 1969). In developing the Pittsburgh Adjustment Survey Scales (Ross et al., 1965), school principals were used to nominate adjusted, withdrawn, and aggressive students for rating by their teachers, and scale results were compared with these nominations.

According to Kerlinger (1972, p. 461) and Cronbach (1970), the significance of construct validity is its concern with the theory behind the variable being measured. Guion (1977) argues that construct validity integrates both content and criterion considerations. Likewise, the usefulness of content and concurrent validity is questioned by Sanson-Fisher and Mulligan (1977), and construct validity is supported. A definition of construct validity as the process of ascribing meaning to scores is offered by Stenner and Smith (1982). Messick (1980) broadens the concept of validity to include both test interpretation and test use. Messick (1980) describes construct validity as


60 "interpretive meaningfulness" (p. 1015) and suggests that it rests on four bases: convergent and discriminant validity, ethical interpretation, relevance and utility for the specific application, and the consequences follow ing use of the instrument. To be interpretable, a rating scale must be reli able. That is, a scale must produce similar results when applied to the same person over several administrations, the instrument must be relatively free of errors of mea surement, and the results must closely approximate the "true" value of the variable for the person being rated (Cronbach, 1970; Kerlinger, 1972). Typically, test-retest data are compiled for varying time periods between administrations. The correlation between the two obtained scores is used to justify esti mations of temporal stability and, in the case of rating scales, intra-rater reliability. Examples of reported test-retest intervals include one week (Duval County School Board, 1979; Quay, 1977), two weeks (Mendelsohn & Erdwins, 1978; Russell, Lankford, & Grinnell, 1981) and two years (Quay, 1977). However, Masterson (1968) pointed out that low test-retest correlation coefficients may reflect the transitory nature of the measured variable and suggested high coefficients of internal consistency may be more indicative of reliability for some instruments.


Internal consistency has often been estimated by inter-item and item-total analysis (Edwards, 1957; Kerlinger, 1972). In these procedures, an individual's rating on one item is compared with the ratings on all other items, or with the total score from the scale or subscale, to estimate the degree to which each item is similar to the other items. Item analysis may be important in reducing errors of measurement attributable to the composition of the instrument (Benson & Clark, 1982). However, internal consistency may not provide good reliability estimation for a rating scale assessing constructs comprised of many discrete behaviors (Kerlinger, 1972).

Some research (Rosenthal & Jacobson, 1968; Sulzbacher, 1973) into observer bias has suggested that beliefs about ratees may affect rater perceptions and, consequently, the reliability of the ratings. In three studies (O'Leary & Kent, 1973; Shuller & McNamara, 1976; Siegel et al., 1976) of disruptive classroom behavior, biasing information introduced experimentally was found to influence global ratings but had no significant effect upon results obtained from behaviorally stated scales. Siegel et al. (1976) suggested that behaviorally specific items reduce bias and improve inter-rater and intra-rater reliability.

The degrees of agreement among different raters on measures of the same subjects at the same time in the same setting have been used to indicate the inter-rater


reliability of an instrument (Cronbach, 1970). Also, the agreement among different raters of subjects in the same settings at different times has been used for the same purpose (Cronbach, 1970). In middle and junior high schools, these conditions do not usually occur naturally. Fortunately, investigations of trait consistency in subjects (Abikoff, Gittelman, & Klein, 1980; Epstein, 1980; Mischel, 1969) have encouraged comparisons of ratings by different raters over the same elapsed time periods but for different settings and situations, conditions which do occur naturally in the secondary school setting.

Epstein (1980) concluded that subjects do manifest trait consistency if aggregation techniques are applied in assessing behaviors. Epstein (1980) suggested aggregation over raters (e.g., teachers), situations (e.g., classrooms), occasions (e.g., class periods), and measures (e.g., disciplinary records). Epstein further suggested that when single ratings are made after extended periods of observation, these ratings are similar to aggregated ratings in that they represent an intuitive averaging of ratings over many observations. Thus, reliability may be improved by combining different teachers' ratings of the same student over the same portion of the school year.

According to Cooper (1981), perhaps the most ubiquitous challenge to inter-rater reliability is halo error


(Thorndike, 1920), or the tendency of a rater to allow overall impressions of an individual to influence judgments of specific areas of behavior (Holzbach, 1978). Attempts (Landy, Vance, Barnes-Farrell, & Steele, 1980; Landy, Vance, & Barnes-Farrell, 1982) to statistically control for halo effects have apparently not succeeded (Harvey, 1982; Hulin, 1982; Mossholder & Giles, 1983; Murphy, 1982). One exploration of ways to reduce halo error resulted in a restatement of classic advice: do not use rating categories that are imprecise and overlapping (Cooper, 1983). In an extensive review of the literature, Cooper (1981) concluded that of nine methods currently employed to reduce halo effect, all leave residual illusory halo.

Studies of variables affecting reliability have identified several other challenges to the accuracy of school behavior ratings. The sex of the teacher was found in two studies (Levine, 1977; Silvern, 1978) to be correlated with ratings of classroom behavior, with male teachers consistently reporting lower levels of disruptive behavior. Teachers' ratings seemed to be influenced by special education labels in one study (Fogel & Nelson, 1983). In two studies (Marwit, 1982; Marwit, Marwit, & Walker, 1978), perceived unattractiveness of students was shown to correlate with higher ratings of disruptive behavior.

While challenges to reliability from a variety of sources have been observed, several studies (Bernardin &


Pence, 1980; Fay & Latham, 1982; Latham, Wexley, & Pursell, 1975; Madle, Neisworth, & Kurtz, 1980; Pursell, Dossett, & Latham, 1980) have suggested that training in the use of rating scales may be effective in reducing errors of measurement. This review of studies of validity and reliability has identified some sources of, and countermeasures for, errors of measurement. Next, studies of the variables affecting the norming of rating scales will be reviewed.

Several writers have shown concern for the relationship between behavior and the context in which it occurs. The social value of a test, according to Messick (1980), is determined by its instrumental value for a particular setting. Willems (1975) stated that few phenomena have meaning independent of the context in which they occur. Likewise, researchers were cautioned by Dickinson (1978) to evaluate behavior only in an environmental context. Epstein (1980) referred to the "extreme situational specificity of behavior" (p. 794) and warned that experiments conducted in a single situation cannot be relied on to generalize across even minor variations in stimulus conditions. Others supporting this psychosocial approach include Sherif (1954); Erickson (1963), quoted in Tinto, Paclilio, and Cullen (1978); Salvia and Ysseldyke (1981, p. 378); and Zammuto, London, and Rowland (1982).
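The paragraphs that follow argue that norms for a behavior rating instrument should be anchored to the particular school in which it is used. As a purely illustrative sketch of such local norming (all values are invented), a school-level mean and standard deviation could be derived from a representative sample of total ratings, and each student's total could then be expressed against that norm:

    # Illustration only: deriving a local (school-specific) norm and expressing
    # an individual student's total rating against it. All values are hypothetical.
    from statistics import mean, stdev

    def local_norm(sample_totals):
        """Mean and standard deviation of total ratings from a representative local sample."""
        return mean(sample_totals), stdev(sample_totals)

    def standard_score(total, norm_mean, norm_sd):
        """Individual total rating expressed as a z-score against the local norm."""
        return (total - norm_mean) / norm_sd

    sample = [12, 7, 15, 9, 22, 5, 11, 18, 8, 13]   # total ratings from one school
    m, sd = local_norm(sample)
    print(round(standard_score(26, m, sd), 2))      # prints 2.68: well above the local mean

The point of the sketch is simply that, once items are behaviorally stated and summed, nothing more elaborate than descriptive statistics is needed to compare an individual student with the norm for his or her own school.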


Schools were described by Garbarino (1980) as "contexts for behavior and development" (p. 19). Some of the characteristics of schools which may influence levels and interpretations of disruptive behavior are size of enrollment (DiPrete, 1981, p. 86; Garbarino, 1980; Kowalski, Adams, & Gundlach, 1983); public or private administration (DiPrete, 1981, p. 81); control orientation (e.g., humanistic vs. custodial) (Deibert & Hoy, 1977; Gaynor & Gaynor, 1976); degree of person-environment fit (Kulka, Klingel, & Mann, 1980); traditional vs. open classrooms (Solomon & Kendall, 1975); length of faculty tenure (DiPrete, 1981, p. 107); socioeconomic level of the host community (Kowalski et al., 1983); and region of the country (DiPrete, 1981, p. xx; Kowalski et al., 1983). Researchers advocating the use of local norms for behavioral measurements include Fremont and Wallbrown (1979); Mendelsohn and Erdwins (1978); Quay and Peterson (1967); Smith (1976); Walker and Hops (1976); and Wallbrown, Wallbrown, and Blaha (1976).

The effects of sex, age, race, and socioeconomic status on ratings of disruptive behavior have been frequently studied. The types of disruptive behavior displayed in both educational and clinical settings have not been found to be significantly different for the variables of sex (Behar & Stewart, 1984; Epstein et al., 1983; Morris & Arrant, 1978; Stott et al., 1975, p. 166),

PAGE 76

66 age (Behar & Stewart, 1984; Ghodsian, Fogelman, Lambert, & Tibbenham, 1980; Stott et al., 1975, p. 83), race (Gajar & Hale, 1982), or socioeconomic status (Behar & Stewart, 1984; Stott et al., 1975, p. 97). Thus, providing for separate norms for these variables seems unnecessary in any scale rating only disruptive behaviors. Uses of Behavior Rating Scales Bailey et al. (1983) supported the use of rating scales in program planning and evaluation. Likewise, the lack of effective measurement devices was seen by Hirshoren and Heller (1979) as limiting the evaluation of program effectiveness. Mesinger ( 1982) called for the use of appropriate measurement devices in providing services for deviant youth within the public school setting. Cooper (1983), Peed and Pinsker (1978), and Beatty et al. (1977) have suggested providing rating scale results to ratees to influence behavior changes. Using rating scales to pro vide a standardized de scr i pti on of behavior al problems has been suggested (Edelbrock & Achenbach 1978). In a study comparing resource room delivery models, Wixson (1980) used a behavior rating scale in developing and evaluating intervention programs for various cate gories of handicapped children. Morton Bortner (Buras, 1978, p. 493), reviewing the AAMD Adaptive Behavior Scale, pointed out its usefulness for evaluating the progress of

PAGE 77

67 individuals and evaluating program goals. The Duval County School Board (1979) used a locally constructed behavior checklist to evaluate their grant-funded program for disruptive students. Several programs which retained students in their regular classrooms have used behavior scales for evalua tion purposes. Walker and Holland (1979) and Linton and Chavez (1979) developed and used rating scales for this purpose in elementary and junior high schools, respec tively. The Hahnemann High School Behavior Rating Scale (Spivack & Swift, 1977) was intended to provide teachers with a practical means of describing disruptive classroom behavior to parents and other school personnel. In a study of junior high school truants, Nielsen and Gerber (1979) used a behavior rating scale to match school inter ventions with student needs. A quantitative measure of disruptive behavior was developed by Mendelsohn and Erdwins (1978) to assist community agencies in devising programs for expelled students. Haskell (1979) developed a method of quantify ing clinical behavior in institutional settings to provide a basis for planning individual programs and evaluating results. Mcsweeney and Trout (1979) used the Jesness Behavior Checklist (Jesness, 1970) to evaluate the social progress of deviant children in a wilderness camp pro gram. Five reasons for obtaining measures of students

PAGE 78

68 are offered by Salvia and Ysseldyke (1981): "Screening, placement, program planning, program evaluation, and assessment of individual programs" (p. 14). Behavior rating scales have been used to obtain measures for each of these needs. Summary This review has identified five approaches in the literature to define disruptive school behavior CDSB). A conceptualization of DSB based on the interactions of students, teachers, and administrators within the school setting was suggested as most relevant for the development of an instrument to quantify DSB. Psychometric challenges to the use of rating scales for identifying behavioral characteristics were consid ered. Research was cited to suggest that teachers using reliable and valid scales could accurately identify DSB. Nineteen instruments available for assessing problem behaviors were reviewed. None appeared to meet the psychometric criteria required for educational placement decisions. Possible sources of error in nonsystematic observations were presented with the suggestion that inaccurate, biased, or subjective judgments may result. Type, frequency, and severity of behaviors were related to item content, item format, and response format. Support was found for the inclusion of these

PAGE 79

69 measurement parameters in assessing DSB. The use of descriptive statistics in current research for obtaining individual behavior ratings and deriving local norms was demonstrated. A review of the sources of error in measurement was conducted and counter-measures for improving validity and reliability estimations were suggested. A number of variables affecting the norming of rating scales were investigated. Research evidence rejected separate norms based on gender, race, or socioeconomic status. The effective use of behavioral instruments in a variety of settings was documented, suggesting the suitability of such a device for describing students who display DSB.

CHAPTER THREE
METHODOLOGY

The purpose of this study was to develop and validate an instrument, the Disruptive Student Behavior Scale (DSBS). The DSBS is intended to be used to assess quantitatively the disruptive school behaviors of students referred for placement in either special education or alternative education programs. This chapter presents the research questions, defines the target population, presents a plan for constructing the scale, describes procedures for a pilot study, details statistical tests and procedures for the data analyses, and discusses possible limitations of the study.

Research Questions

1. Does the content of the DSBS represent behaviors recognized and accepted by educators as occurring in and disruptive to the school environment?

2. In the judgment of experts, does the DSBS contain an equitable distribution of items descriptive of the underlying theoretical constructs that identify disruptive students and discriminate them from non-disruptive students?

3. To what degree does the DSBS demonstrate criterion, convergent, and discriminant validity?

4. To what degree does the DSBS provide ratings which are stable over time?

Construction of the DSBS

The following plan is a modification of a suggested procedure (Benson & Clark, 1982) for rating scale construction. A review of the disruptive school behavior (DSB) literature provided a research base for defining the constructs comprising DSB. A total of 303 descriptive items and 22 categories were found in 36 studies. After eliminating duplications and items not pertaining directly to DSB, 56 items remained. Combining similar categories resulted in a total of 13 potential categories of behaviors associated with DSB. In a project conducted by the Research Committee of the Psychological Services Department of the Duval County, Florida, School District, the 56 items and 13 categories were presented to 16 teachers of middle school students enrolled in a behavior management program for disruptive students. The rating group was composed of 10 females and 6 males, all with at least two years of full-time teaching experience. Group members were instructed to assign each item to one or more of the 13 categories. Instructions and results are reproduced in Appendix A.

The judges' ratings and comments resulted in the retention of 10 categories, which were considered to be one set of constructs that could be used in identifying DSB. A tentative definition of each construct was formulated using the descriptive items assigned by the teachers. A verification of the inclusiveness of these derived constructs was then attempted. A frequency distribution was prepared for all of the conduct code violations reported for chronic violators in a sample of Duval County, Florida, elementary, middle/junior high, high, and alternative schools (Moses, 1981). Of 7717 behavior violations, 7686, or 99.6%, were included within the definitions of the proposed constructs. The items, as taken from the studies and used in developing the constructs, were not considered specific enough for use in a quantitative rating scale. However, a readily available pool of potential items was located in the disciplinary referral records of an inner-city junior high school in a metropolitan Florida school district. Verbatim transcriptions were made of the reasons recorded on the referral forms by teachers when sending students to the deans. All active folders for the 1980-1981 school year were reviewed. A total of 395 items, including duplications, were recorded without regard for gender, age, race, or grade level. Combining obvious duplications and similarities resulted in 66 items (Appendix B) to be considered for inclusion in a scale for rating DSB.
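
As a quick arithmetic check of the coverage figure cited above (a minimal sketch; the counts are those reported for the Duval County sample):

    # Proportion of the 7717 reported conduct-code violations that fell
    # within the definitions of the proposed constructs.
    covered, total = 7686, 7717
    print(f"{100 * covered / total:.1f}%")   # prints 99.6%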

All 66 items were then presented individually to six male and five female volunteers, experienced secondary school regular classroom teachers from suburban Florida middle and junior high schools. Instructions are reproduced in Appendix C. These teachers were asked to verify the specificity of the items and to edit those considered ambiguous. This review yielded 40 items for possible use on an instrument. These items were then stated in the past tense to reflect the intention to measure students' past behavior (Appendix D). This preliminary study indicated the feasibility of using research-based constructs and teacher-generated items as the basis for a rating scale for disruptive school behavior. In order to reduce halo and leniency errors, it has been suggested (Blanz & Ghiselli, 1972) that a scale be arranged so that items from the same construct are not contiguous. Accordingly, items were initially ordered randomly, then inspected and rearranged to meet this criterion (Appendix M). Research studies previously cited suggested that, in addition to specifying the type, a quantitative measure of disruptive behavior must provide for rating both frequency and severity. Frequency rating was provided for by the choice of response format selected for the instrument.
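
One way to implement the ordering constraint just described is sketched below in Python. The item labels and construct names are hypothetical, and the single swap-based pass is only one of several ways to keep items from the same construct from appearing next to each other.

    import random

    def order_items(items, construct_of, seed=None):
        # Shuffle the items, then swap forward whenever two adjacent items
        # share a construct (assuming such an arrangement exists).
        rng = random.Random(seed)
        order = list(items)
        rng.shuffle(order)
        for i in range(1, len(order)):
            if construct_of[order[i]] == construct_of[order[i - 1]]:
                for j in range(i + 1, len(order)):
                    if construct_of[order[j]] != construct_of[order[i - 1]]:
                        order[i], order[j] = order[j], order[i]
                        break
        return order

    # Hypothetical example: six items drawn from three constructs.
    construct_of = {"item 1": "Disobedience", "item 2": "Disobedience",
                    "item 3": "Aggression", "item 4": "Aggression",
                    "item 5": "Attendance violations", "item 6": "Attendance violations"}
    print(order_items(construct_of, construct_of, seed=1))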

The literature review suggested the suitability of a 5-point, equal-interval, summated rating scale (Likert, 1932) using the following anchors (Pohl, 1981, p. 239):

    0  None of the time
    1  Very infrequently
    2  Sometimes
    3  Quite often
    4  Always

The rating scale (Appendix F) utilized this response format. The severity rating for each scale item was established with assistance from the faculty, staff, and administrators of two alternative schools located in two metropolitan Florida school districts. From their experience with disruptive students, these educators were particularly aware of the consequences for students who display DSB. Respondents were selected from volunteers, including the principal and assistant principal, school psychologist, social worker, educational evaluator, and faculty members. This group contained males and females in approximately equal numbers, and all had more than two years' experience working with disruptive students. The school experience may be conceptualized as influencing the social, personal, and academic domains of a student's life. Each of these domains may be subdivided to facilitate closer study of the consequences of the school experience (see Table 1).

One way for educators to assign a severity factor to a disruptive activity is to have them estimate which domains of student life would likely be affected adversely by that particular behavior. Instructions for this procedure are reproduced in Appendix G. The number of adverse consequences assigned by at least 50% of the raters, divided by a constant of three to keep the numbers small, and with fractions rounded up to the next whole number, gave a severity rating of 1, 2, or 3 to each of the items on the rating scale. Results are reported in Chapter Four. A scoring template incorporating the severity factor was prepared for the DSBS (Appendix H). This template has five holes, one corresponding to each possible frequency rating (i.e., 0, 1, 2, 3, 4) for each rating scale item. Through the holes is read the rater's mark (X) indicating the frequency rating assigned. Above each hole is printed a number which is the product of that frequency rating and the previously determined severity factor for the item. Thus, the weighted score for each item may be read by the scorer directly from the scoring template and recorded on the DSBS rating form beside the item. These item scores were then added to give the page score and form score (see Appendix F) and recorded onto a summary sheet (Appendix I).
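
The severity weighting and template scoring just described reduce to a short calculation, sketched below. The function names and the example figures are illustrative only; they are not part of the instrument.

    import math

    def severity_factor(domains_assigned):
        # Number of adverse-consequence domains assigned by at least 50% of
        # the raters, divided by 3, with fractions rounded up
        # (a range of 1-3 for 1-9 domains).
        return math.ceil(domains_assigned / 3)

    def weighted_item_score(frequency_rating, severity):
        # Frequency rating (0-4 response anchor) multiplied by the item's
        # severity factor, as printed above the template holes.
        return frequency_rating * severity

    sev = severity_factor(5)                   # 5 domains -> severity factor 2
    print(sev, weighted_item_score(3, sev))    # "Quite often" (3) -> weighted score 6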

Table 1. Domains of Student Life Influenced by the School Experience

Social
  1. Interpersonal relations
     a. School personnel
     b. Peers
  2. Vocational opportunities

Personal
  1. Cognitive development
  2. Affective development
  3. Physical development

Academic
  1. Learning vs. ignorance
  2. Passing vs. failing
  3. Attending vs. suspension or expulsion

The Summary of Teacher Ratings form (Appendix I) contains, for each student, the DSBS rating; the deviation, in z-scores, from the local norm; a comparison of ratings by each teacher; and the basis for constructing a DSBS profile for prescriptive use (Appendix J). These data are intended to provide local school authorities with criteria for estimating the deviation of any student's rating from the local DSB norm and to assist in determining a student's need for an intervention program. The DSBS is normed locally within each school district. Norms from this study are reported in Chapter Four for information, but they are not to be used as criteria for judgments about students in other settings.

Validation of the DSBS

To assure content validity, the 40 items and the 10 constructs developed from the preliminary study were presented to a group of 24 teachers with instructions to assign each item to a construct category or to no category. The instructions are reproduced in Appendix E. Each judge had at least two years of regular classroom teaching experience in a middle or junior high school; 13 male and 11 female teachers participated. The judges were also asked to verify the specificity of the retained items and to reword those considered ambiguous. Revisions were made as suggested and confirmed by a follow-up study using another group of eight similarly qualified teachers.

As described in the field study section, at a Florida middle school a criterion group of disruptive students was selected by nomination by seven non-teaching school personnel, including two deans, three guidance counselors, and two administrators. Students in the disruptive group were ranked numerically on a continuum from non- to severely disruptive, based on subjective ratings from all the nominating personnel. DSBS ratings from teachers were compared to these subjective ratings to determine how well high DSBS teacher ratings correlated with high levels of disruptiveness as perceived by non-teaching school personnel. To estimate how well the DSBS identified the disruptive group, the mean of the DSBS ratings for the disruptive group students was compared with the mean of the DSBS ratings for a norming group representing a sample, stratified by grade, of the school population. If the DSBS demonstrated agreement with the concurrent judgments of disruptiveness made by non-teaching school officials, there would be a prima facie case for predicting that students in other settings identified by the DSBS as disruptive would also be judged disruptive by non-teaching school officials.

Messick (1980) described construct validity as based on convergent and discriminant validity, ethical interpretation, relevance and utility for the specific application, and the consequences following use of the instrument. Convergent validity requires that the DSBS be able to identify all students who are considered excessively disruptive. To demonstrate satisfactory convergent validity, the DSBS ratings of 100% of the students in the disruptive group would have to be significantly above the local DSB norm. The disruptive group ratings are reported in Chapter Four. Discriminant validity requires that the DSBS be able to reject those students who are not considered excessively disruptive. To demonstrate satisfactory discriminant validity, only the ratings of students in the disruptive group, or eligible for inclusion in it, could be significantly above the local DSB norm. Ratings of the norming group are reported in Chapter Four. Ethical interpretation of DSBS ratings requires an understanding of both the theoretical and practical concepts underlying the development of this instrument; therefore, a manual will be prepared before the DSBS is offered for research use. Relevance was supported by the theoretical basis on which the 10 constructs were chosen to define DSB for this study. Utility was provided by the procedures used to select appropriate items, score the forms, interpret the ratings, and present the results. The consequences of using the DSBS cannot be predicted until it is thoroughly researched; the intent is to improve the validity of the selection process for programs assisting disruptive students.

Reliability of the DSBS

The DSBS rating for each student is an aggregate of scores from at least four teachers. A test-retest measure compared two DSB ratings obtained from individual teachers. Fourteen days after the receipt of teacher ratings, a follow-up rating by the same teachers of approximately 10% of both the norming and disruptive groups was made. These results are reported in Chapter Four. The internal consistency of the DSBS was protected by choosing only items previously used by teachers to describe DSB. Item analysis is not an effective technique for establishing reliability of individual administrations of the DSBS: patterns of disruptive behavior are often narrow and stereotypical, while the DSBS contains items descriptive of a broad range of possible behaviors, so item scores were not likely to correlate with each other. No attempt was made to assess interrater reliability. Classroom settings are conceptualized as discrete environments whose norms for behavior are determined by the personality of the teacher; the behavior of interest is the interaction of students with their teachers in the aggregate, not individually.

Field Study

The purpose of the field study was to identify and correct any problems, actual or potential, with item content, response format, or administration and scoring procedures of the DSBS. Following a successful field study, the instrument may be offered to the profession for further research and development (Benson & Clark, 1980). Accordingly, the operational goal of the present effort was to conduct a field study to determine the readiness of the DSBS for use as a research instrument. The target population consisted of students enrolled in grades six through nine (i.e., middle and junior high school grades) in public schools anywhere in the United States. No restrictions were placed on age, gender, race, or socioeconomic status. The selection criteria for the host school were a heterogeneous ethnic population; an urban or suburban location; public middle (grades 6, 7, 8) or junior high (grades 7, 8, 9) school status; random assignment of students to basic courses; and an average daily attendance of at least 500 students. Special schools, such as alternative schools and special education centers, were not considered. A public middle school meeting these criteria was located in a predominantly urban school district on the west coast of Florida. The student enrollment was approximately 76% white, 22% black, and 2% Asian- and Hispanic-American, with an average daily attendance of 733. Socioeconomic status was said by the principal to be primarily upper-lower class and lower-middle class.

For the norming group, a sample of 90 students was selected using one English and one mathematics class, with randomly assigned enrollments, at each grade level. A total of six classes containing 203 students, and ranging from 32 through 35 students each, was sampled. The numbers 1 through 35 were written on individual slips of paper, and 15 numbers were drawn randomly using the replacement procedure. For each class, the students whose class roll numbers matched the 15 randomly selected numbers were included in the norming group. The disruptive group was selected by nomination by non-teaching school personnel, who were asked to list the names of all of the excessively disruptive students encountered during the current school year. It was thus possible for a student's name to be included in both the norming and the disruptive groups. The nominating process initially produced a group of 64 students. After a conference among the raters, this group was reduced to 36 students.
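
A sketch of the roll-number draw described above. The text does not spell out how repeated draws were handled, so this version simply keeps drawing, returning each slip to the pool, until 15 distinct roll numbers have been selected.

    import random

    def draw_roll_numbers(class_size, n_needed=15, seed=None):
        # Draw roll numbers 1..class_size until n_needed distinct numbers
        # have been obtained.
        rng = random.Random(seed)
        chosen = set()
        while len(chosen) < n_needed:
            chosen.add(rng.randint(1, class_size))
        return sorted(chosen)

    # One class of 35 students yields 15 norming-group members.
    print(draw_roll_numbers(35, seed=7))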

All students finally nominated into the disruptive group were assigned to one of four levels of disruptiveness (none, mild, moderate, or severe) by each nominating person working independently. Nominated students were assigned a numerical rating according to the following scale:

    Level of Disruptiveness    Rating
    None                          0
    Mild                          1
    Moderate                      2
    Severe                        3

Students were ranked according to the average of these ratings. This ranking permitted the correlation, reported in Chapter Four, of levels of disruptiveness between the DSBS results and the qualitative assessments by school personnel for each disruptive group student. Schedules for the sample students were obtained from school records; no contact was made with any student. Training of all participating teachers took place in a meeting at which a DSBS form for each period of a sample student's current schedule was distributed. Appendix K contains these instructions. The purpose of the study was explained, and a date and procedure for returning the forms were agreed upon. Emphasis was placed on the need to respond only to the behaviors actually mentioned on the instrument and to perform the ratings independently of other teachers. Provision was made for a faculty member to either answer or refer questions that might arise during the rating period. Teachers not submitting all their DSBS forms by the agreed-upon date were contacted and reminded of the importance of their participation.
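
The nomination ratings lend themselves to a simple averaging and ranking step, sketched below; the student names are hypothetical.

    LEVELS = {"none": 0, "mild": 1, "moderate": 2, "severe": 3}

    def rank_nominees(ratings):
        # Average each nominee's level-of-disruptiveness ratings across the
        # nominating personnel and rank from most to least disruptive.
        averages = {student: sum(LEVELS[r] for r in rs) / len(rs)
                    for student, rs in ratings.items()}
        return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)

    print(rank_nominees({"Student A": ["severe", "severe", "moderate"],
                         "Student B": ["mild", "moderate", "mild"]}))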

Upon receipt of at least four completed DSBS forms for each student, the DSBS rating for that student was calculated. At least four scorable forms, totaling 622, were received for 108 students: 76 in the norming group and 32 in the disruptive group. The scoring template (Appendix H) provided for calculating item scores weighted for severity. The item scores were totaled to produce a form score, which was entered on the Summary of Teacher Ratings form (Appendix I). This summary form contains spaces for the student's name, grade, age, and sex; school name; evaluator's name and title; individual form scores; each rater's name, subject, and class period; and calculation of the student's DSBS rating and z-score. Each sample student's form scores were summed to give a total score. The total score was divided by the number of raters to yield the average score, which is the student's DSBS rating. After the DSBS ratings for all the norming group students were calculated, the mean DSBS rating and standard deviation for the group were obtained. This mean of the means is the local DSBS rating, or norm, for the target school. The local DSBS norm was subtracted from each student's DSBS rating, giving the deviation from the local norm. Dividing this deviation by the local standard deviation gave the number of standard deviation units, or z-scores, by which the student's DSBS rating differed from the local DSBS norm.
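
The rating and z-score calculations described above can be expressed as follows. This is a sketch under stated assumptions: the numbers are invented, and whether the study used the population or the sample formula for the standard deviation is not reported, so the population form is used here.

    from statistics import mean, pstdev

    def dsbs_rating(form_scores):
        # A student's DSBS rating is the average of the form scores from
        # his or her raters (at least four teachers).
        return mean(form_scores)

    def z_score(student_rating, norming_ratings):
        # Deviation from the local norm, in standard deviation units of the
        # norming-group ratings.
        local_norm = mean(norming_ratings)
        return (student_rating - local_norm) / pstdev(norming_ratings)

    norming = [12.0, 18.5, 21.0, 16.0, 25.5, 19.0]     # hypothetical norming ratings
    student = dsbs_rating([55, 60, 58, 62])            # four teachers' form scores
    print(round(z_score(student, norming), 1))         # flagged if >= 2.0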

The criterion of two standard deviation units above the local DSBS norm translates to a disruptiveness score higher than approximately 98% of the predicted scores for the school population. The distribution of scores obtained from the norming group was inspected to assure the existence of sufficient variance to make the z-scores meaningful. A reliability check was performed: fourteen days after all the rating forms were collected, approximately 10% of the students from both the norming and disruptive groups were selected to be representative of the range of scores. New forms were submitted to the original raters for rerating the same students, and the results were compared. These results are reported in Chapter Four. After completion of the data analyses, all participants were invited to a meeting to discuss the results, offer comments, and receive appreciation for their participation.

Data Analyses

Validity

To establish content validity, 24 expert judges assigned proposed DSBS items to construct categories. Results of the judges' assignments were totaled for each item. An item was dropped if not assigned to at least one category by each judge. If this content validation procedure had resulted either in fewer than 30 items being assigned to at least one construct, or in a construct having fewer than three items assigned by 80% of the respondents, enough additional items would have been constructed and validated to meet these criteria.
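
The "approximately 98%" figure for a two-standard-deviation criterion can be verified directly from the normal distribution (a minimal check using scipy):

    from scipy.stats import norm

    # Proportion of a normal distribution falling below a z-score of 2.0.
    print(norm.cdf(2.0))   # about 0.977, i.e., roughly 98%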

The judges' item assignments are reported in Chapter Four. To ascertain how well the DSBS identified the disruptive group, the t-test was used to estimate the significance of the difference between the means of the norming group and the disruptive group. An obtained probability level of .05 or less was considered evidence of statistical significance. The magnitude of the difference between the means was used to evaluate the practical significance of the instrument and its potential for identifying disruptive students. These results are reported and discussed in Chapter Four. To estimate convergent validity for the DSBS, the DSBS rating for each disruptive group member was compared with the mean DSBS rating of the norming group. For the purposes of this study, a DSBS rating of at least two z-scores above the norming group mean was accepted as evidence that the DSBS had correctly identified a disruptive group member. The standard error of the mean was used to include students when evaluating borderline cases. The criterion for satisfactory convergent validity was the correct identification of 100% of the disruptive group.
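
A sketch of the group comparison described above, using an independent-samples t-test; the ratings listed here are invented for illustration (the study's actual ratings appear in Tables 10 and 11).

    from scipy import stats

    norming = [5.3, 14.2, 17.5, 19.6, 22.1, 26.4]      # hypothetical norming-group ratings
    disruptive = [55.1, 58.6, 63.3, 66.8, 84.3]        # hypothetical disruptive-group ratings

    t, p = stats.ttest_ind(disruptive, norming)
    print(f"t = {t:.1f}, p = {p:.4f}")                 # significant if p <= .05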

Discriminant validity also was estimated by using ratings, means, and z-scores. The DSBS rating for each norming group member was compared with the mean of that group. Any norming group member whose DSBS rating exceeded the mean by at least two z-scores was considered identified by the DSBS as excessively disruptive. Identified cases who were not members of, or eligible for, the disruptive group were considered challenges to the discriminant validity of the DSBS. All cases not meeting the construct validity criteria were investigated. Construct validity results are reported and discussed in Chapter Four.

Reliability

The Pearson product-moment correlation statistic was used to compare the original ratings on approximately 10% of the completed forms with follow-up ratings made after 14 days. Individual coefficients of at least .80 were set, arbitrarily, as the acceptable level of test-retest reliability.
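
The test-retest comparison reduces to a Pearson correlation evaluated against the .80 criterion, as sketched below with invented ratings.

    from scipy import stats

    def retest_reliability(first, second, criterion=0.80):
        # Pearson correlation between original and 14-day follow-up ratings.
        r, p = stats.pearsonr(first, second)
        return r, p, r >= criterion

    print(retest_reliability([0, 2, 3, 1, 4, 2], [0, 2, 2, 1, 4, 3]))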

Limitations

1. The school for the field study was selected based on the willingness of both the school and the faculty to cooperate. This may have mitigated problems that would occur in a less favorable environment.

2. Teacher resistance and/or concerns about this type of research may have biased or limited their participation.

3. The study was limited to exploration, and the results are not intended to generalize beyond the administration and scoring procedures. Specifically, the calculated DSB norm is valid only for this school.

4. No provision was made to assess the possible effects of grade and sex on DSB norms. Studies have indicated the influences are not significant, but at some point this should be investigated.

5. The disruptive sample group was likely composed of students who had been referred to the dean. The same teachers who referred these students to the dean may have rated their behaviors, with bias a possibility.

6. The use of expert judges in the validation procedures may have introduced personal bias into the items used on the instrument.

CHAPTER FOUR
RESULTS AND DISCUSSION

The purpose of this study was to develop and validate an instrument, the Disruptive Student Behavior Scale (DSBS). The study focused on identifying components of disruptive school behavior as perceived by middle and junior high school teachers and on constructing an instrument to quantify these behaviors. To accomplish this, an instrument was constructed using behaviors taken from disciplinary referrals and field tested on a representative sample of students from a Florida middle school. Teacher ratings for a norm group and a disruptive group were collected and analyzed as outlined in Chapter Three. These results are reported in this chapter.

Results

The Severity Factor

Results of the assignment of potential adverse consequences resulting from DSBS behaviors are reported in Table 2. Twenty packets containing 40 DSBS items and an instruction sheet were distributed, and 16 were returned. At least 50%, or 8, of the raters had to assign a DSBS item to a particular domain before that domain was included in the severity factor calculation.


Table 2. Potential Adverse Consequences of DSBS Behaviors Old Domains Severity DSBS Item Social Personal Academic Total Rating Item Number 1 a 1b 2 1 2 3 1 2 3 Domains Factor Number 1. 16 8 5 3 2 0 10 8 5 4 2 1 2. 13 0 3 2 7 0 6 5 3 1 1 13 3. 12 15 7 6 3 0 15 11 8 5 2 3 4. 1 3 5 2 4 8 0 3 2 4 2 1 4 5. 10 13 2 2 5 1 1 2 13 3 1 2 6. 9 12 4 2 8 0 2 1 14 4 2 36 7. 12 1 5 4 8 0 9 14 8 5 2 25 8. 1 5 3 6 1 9 0 0 1 9 3 1 5 9. 9 14 0 10 0 0 0 0 2 3 1 28 1 0. 12 2 4 2 8 0 8 9 8 5 2 16 11. 13 3 0 1 8 0 2 5 1 1 3 1 26 \() 0 1 2. 12 9 0 9 0 0 0 0 2 3 1 19 13. 11 1 1 3 8 9 0 2 1 0 4 2 15 1 4. 8 8 0 8 1 1 2 0 0 1 4 2 37 1 5. 13 14 1 2 8 0 1 1 13 4 2 17 16. 12 3 8 1 1 3 0 13 10 1 5 2 18 17. 12 0 4 2 0 1 1 1 10 2 1 20 1 8. 9 10 5 8 8 0 0 0 1 4 2 29 1 9. 11 1 5 3 1 8 1 2 1 8 4 2 12 20. 13 3 3 2 3 0 12 1 1 9 4 2 21 21. 16 7 3 2 8 0 0 0 2 2 1 27 22. 10 8 2 1 2 0 2 2 8 3 1 23. 10 9 3 8 1 5 0 0 0 8 5 2 22


Old Domains Item Social Number 1 a 1b 2 24. 12 2 2 25. 15 5 4 26. 12 1 1 27. 1 1 8 3 28. 14 8 2 29. 12 1 1 30. 13 10 2 31. 12 2 1 32. 14 15 1 33. 1 1 8 1 34. 12 3 2 35. 15 2 2 36. 10 3 8 37. 12 2 9 38. 14 10 8 39. 12 11 1 1 40. 13 10 8 1 9 1 0 8 2 1 1 1 8 1 8 1 1 9 10 8 8 8 2 8 Table 2. (Continued) Personal 2 3 1 5 0 0 8 0 3 2 0 10 10 0 0 13 0 0 0 0 1 1 8 0 0 8 0 2 8 0 9 1 1 0 1 8 0 2 9 0 2 10 9 9 11 8 8 8 0 0 9 1 2 8 1 2 Severity DSBS Academic Total Rating Item 2 3 Domains Factor Number 0 9 3 1 11 2 12 4 2 23 16 2 4 2 10 3 0 3 1 35 1 1 4 2 7 15 1 3 1 9 0 1 1 5 2 6 2 8 3 1 31 8 9 7 3 34 ' ..... 1 9 5 2 8 2 15 4 2 30 2 16 4 2 32 8 12 8 3 8 14 8 3 14 0 1 1 6 2 24 2 15 5 2 33 2 16 6 2 38

Table 2 includes the DSBS item numbers as presented to the raters, the item numbers as randomly assigned for the final revision of the DSBS, the assignment of the individual items to domains, the number of domains assigned by at least 50% of the raters, and the calculated severity factor for each item. Severity factors of 1, 2, and 3 were obtained for 14, 23, and 3 items, respectively. Items 38, 39, and 40 received ratings producing a severity factor of 2; each of these is a law violation and thus was arbitrarily upgraded to a severity factor of 3 to recognize the seriousness of the behavior. A reliability check was performed by having 8 of the original raters rerate all 40 items. The Pearson product-moment correlation statistic was used to compare the number of domains assigned to each item, resulting in correlations ranging from .89 to 1.00, with an overall correlation coefficient of .92.

The Samples

A public middle school (grades 6, 7, 8) in a predominantly urban school district on the west coast of Florida was selected as the host school for the study. For the norming group, a sample of 90 students was selected using one English and one mathematics class, with randomly assigned enrollments, at each grade level.

At least four scorable rating forms, totaling 424, were received for 76 students, an average of 5.6 ratings per student. Table 3 shows the distribution of the rating forms by grade, age, and gender. The higher grades (7 and 8) and middle ages (13 and 14) accounted for 72% and 61%, respectively, of the cases. The gender distribution was approximately equal, i.e., 36 females and 40 males. Overall, the coverage of grade, age, and gender is typical for a middle school. The disruptive group was selected by nomination by non-teaching school personnel, who were asked to list the names of all of the excessively disruptive students encountered during the current school year. Thirty-six students were nominated for this group. At least four scorable rating forms, totaling 198, were received for 32 students, an average of 6.2 ratings per student. Table 4 shows the distribution of these rating forms by grade, age, and gender. The sample is unevenly distributed, with grade 8, ages 14 and older, and males accounting for 63%, 78%, and 78%, respectively, of the cases.

Research Question One

Does the content of the DSBS represent behaviors recognized and accepted by educators as occurring in and disruptive to the school environment?

Table 3. Rating Form Distribution by Demographic Categories--Norming Group

                     Sample           Forms
                     N(76)     %      N(424)    %
Grade
  6                    21     28        107    25
  7                    26     34        153    36
  8                    29     38        164    39
Age
  12 & below           18     24         92    22
  13                   21     28        120    28
  14                   25     33        152    36
  15 & above           12     16         60    14
Gender
  Female               36     47        207    49
  Male                 40     53        217    51

Table 4. Rating Form Distribution by Demographic Categories--Disruptive Group

                     Sample           Forms
                     N(32)     %      N(198)    %
Grade
  6                     2      6         12     6
  7                    10     31         61    31
  8                    20     63        125    63
Age
  12 & below            1      3          6     3
  13                    6     19         35    18
  14                   11     34         68    34
  15 & above           14     44         89    45
Gender
  Female                7     22         44    22
  Male                 25     78        154    78

All items on the DSBS were developed using behavioral statements taken from the disciplinary records of a junior high school (Appendix B). The item development procedure is described in detail in Chapter Three, and the 40 proposed items are listed in Appendix D. In the current study, these 40 items were presented for a final review to 24 experienced middle and junior high school teachers. These judges were asked specifically to identify any items not immediately recognizable as potentially occurring in and disruptive to the school environment. Seven of the 40 items were so identified. Items 2, 7, 23, and 33 were reworded for clarification; items 19 and 22 were dropped as requiring inferences from the rater; and item 36 was combined with item 16. These revisions resulted in the retention of 37 items. The final items were accepted unanimously by a similarly qualified group of eight teachers. Thus the item content of the DSBS does appear to represent behaviors accepted by educators as descriptive of disruptive school behavior. To investigate the recognition of behaviors described by the DSBS items, a frequency count was made of the responses to individual items by five raters, all of whom had rated the same 10 students, 8 from the norming group and 2 from the disruptive group. Ratings averaged across the five raters ranged from .2 for the lowest to 86.6 for the highest rated student. Table 5 summarizes these responses.

Table 5. Frequency of Observed DSBS Behaviors by Constructs
(Observation frequency shown in parentheses for each DSBS item)

Construct 1:  item 1 (33), item 13 (10), item 26 (12), item 31 (10) -- total 65 (14% of observations)
Construct 2:  item 3 (38), item 19 (15), item 27 (19), item 34 (31) -- total 103 (23%)
Construct 3:  item 8 (3), item 15 (13), item 22 (17) -- total 33 (7%)
Construct 4:  item 4 (6), item 11 (8), item 30 (1), item 37 (6) -- total 21 (5%)
Construct 5:  item 2 (12), item 17 (23), item 23 (1), item 32 (1), item 38 (3) -- total 40 (9%)
Construct 6:  item 10 (24), item 18 (23), item 25 (13) -- total 60 (13%)
Construct 7:  item 5 (18), item 20 (5), item 29 (24), item 35 (7) -- total 54 (12%)
Construct 8:  item 7 (11), item 12 (6), item 28 (15) -- total 32 (7%)
Construct 9:  item 9 (22), item 16 (2), item 21 (3) -- total 27 (6%)
Construct 10: item 6 (1), item 14 (1), item 24 (1), item 33 (2), item 36 (13) -- total 18 (4%)

Response frequencies for individual items ranged from 1 (for 6 items) to 38 (for 1 item). These results indicate that all of the items were recognized at least once, and 21 of the items were recognized 10 or more times. This suggests that the content included in the DSBS items is recognized by educators frequently enough to justify the inclusion of each item on the scale and to provide for the rating of a broad spectrum of disruptive behavior occurring in the school environment.

Research Question Two

In the judgment of experts, does the DSBS contain an equitable distribution of items descriptive of the underlying theoretical constructs that identify disruptive students and discriminate them from non-disruptive students? For the purposes of this study, experts were defined as educators with at least two years' teaching experience in an appropriate setting. The constructs used as criteria for selecting DSBS items were derived from a review of the disruptive school behavior literature; this procedure is described in detail in Chapter Three and Appendix A. Table 6 contains a list of the final 10 constructs, and Appendix D contains the 40 proposed items. In the current study, these constructs were presented individually to 24 experienced middle and junior high school teachers with instructions to match the proposed DSBS items to the constructs.

Table 6. DSBS Constructs by Number

   1. Disobedience
   2. Disruptiveness
   3. Impulsiveness
   4. Destructiveness
   5. Aggression
   6. Academic irresponsibility
   7. Social/personal irresponsibility
   8. Ineffective interpersonal relationships
   9. Attendance violations
  10. Law violations

Table 7 contains these results. Eight of the constructs met the criterion of having at least three items assigned by 80% of the raters. Only two items each met the criterion for the constructs labeled "Ineffective Interpersonal Relationships" and "Impulsiveness." To remedy these deficiencies, one additional item for the "Ineffective Interpersonal Relationships" construct was prepared in consultation with the raters and incorporated into the scale as item 12, and item 33 was rewritten to be more indicative of "Impulsiveness." The final revision of the DSBS thus contains 38 items. The final 38 items and 10 constructs were presented to eight experienced junior high school teachers, and each construct had at least three items assigned by seven, or 88%, of these raters. Table 8 reports these results. Four constructs are represented by three items, four constructs are represented by four items, and two constructs are represented by five items. This distribution of items to constructs appears equitable, and thus this research question is answered in the affirmative.
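
The construct-coverage criterion applied above (at least three items assigned to a construct by 80% of the judges) can be checked mechanically; the tallies below are hypothetical.

    def short_constructs(item_agreement, item_construct, min_items=3, min_agreement=0.80):
        # Count, for each construct, the items assigned to it by at least
        # min_agreement of the judges, and return the constructs still short
        # of min_items.
        counts = {c: 0 for c in set(item_construct.values())}
        for item, agreement in item_agreement.items():
            if agreement >= min_agreement:
                counts[item_construct[item]] += 1
        return {c: n for c, n in counts.items() if n < min_items}

    item_agreement = {"item 8": 0.96, "item 15": 0.88, "item 22": 0.58}
    item_construct = {"item 8": "Impulsiveness", "item 15": "Impulsiveness",
                      "item 22": "Impulsiveness"}
    print(short_constructs(item_agreement, item_construct))   # {'Impulsiveness': 2}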

Research Question Three

To what degree does the DSBS demonstrate criterion, convergent, and discriminant validity? Ninety students were selected for the norming group and 36 students for the disruptive group.

Table 7. Assignment of Proposed Scale Items to Constructs for Content Validation
(24 judges; percentage agreement is the proportion assigning the item to its modal construct)

Item  % Agreement  Action Taken           Item  % Agreement  Action Taken
  1       96                                21       83
  2       75       Revised                 22       58       Dropped
  3       92                                23       96       Revised
  4       96                                24      100
  5      100                                25       88
  6       83                                26      100
  7       96       Revised                 27       92
  8       96                                28      100
  9       96                                29      100
 10       83                                30      100
 11       92                                31       83
 12       83                                32      100
 13       88                                33       58       Revised
 14      100                                34       92
 15       88                                35       83
 16       92                                36       29       Combined with item 16
 17       83                                37       96
 18       96                                38       96
 19       50       Dropped                 39       96
 20       92                                40       96

Totals (items per construct, constructs 1-10): 3, 4, 2, 4, 5, 3, 4, 2, 3, 5

Table 8. Follow-up Study for Assignment of Proposed Scale Items to Constructs
(8 judges)

Item  % Agreement  Action Taken           Item  % Agreement  Action Taken
  1      100                                21       88
  2       88                                22       --      Dropped
  3      100                                23      100
  4      100                                24      100
  5      100                                25       88
  6       88                                26      100
  7      100                                27      100
  8      100                                28      100
  9      100                                29      100
 10       88                                30      100
 11      100                                31       88
 12       88                                32      100
 13       88                                33      100
 14      100                                34      100
 15       88                                35       88
 16      100                                36       --      Combined with item 16
 17       88                                37      100
 18      100                                38      100
 19       --      Dropped                  39      100
 20      100                                40      100

Totals (items per construct, constructs 1-10): 4, 4, 3, 4, 5, 3, 4, 2, 3, 5
New item (incorporated as item 12): assigned to construct 8, Ineffective interpersonal relationships

DSBS forms totaling 882 were distributed to 39 teachers. At least four scorable forms were received for 76, or 84%, of the norming group and for 32, or 89%, of the disruptive group; a total of 622 scorable forms was used in the validity studies of the DSBS. Ratings ranged from .2 to 85.5 for the norming group and from 55.1 to 86.9 for the disruptive group, indicating that the DSBS provides for collecting data representative of a wide-ranging population.

Criterion validity was estimated by comparing subjective ratings by non-teaching personnel with teachers' DSBS ratings of the disruptive group students. Table 9 presents these ratings. A Pearson product-moment correlation significant at the p ≤ .01 level (r = .47, df = 30) was obtained, indicating a positive relationship. Using the t-test to compare the means (Tables 10 and 11) of the disruptive and norming groups produced a difference significant at the p ≤ .01 level (t = 18.4, df = 106). A comparison of the values of the means, 64.9 and 19.9, respectively, for the two groups suggests the difference is meaningful and that the DSBS was able to identify a criterion group of disruptive students.

Convergent validity was estimated by comparing the DSBS ratings, converted to z-scores, of each disruptive group student to the mean DSBS rating for the norming group. Tables 10 and 11 contain the DSBS ratings and z-scores for the disruptive and norming groups, respectively.

Table 9. Comparison of Disruptiveness Ratings by Teachers and Non-teaching Personnel

Student  Teachers'    Subjective       Student  Teachers'    Subjective
Number   DSBS Rating  Rating           Number   DSBS Rating  Rating
   1        55.1         4                17       63.3         4
   2        55.1         3                18       63.8         3
   3        55.2         3                19       64.2         4
   4        55.9         3                20       64.7         4
   5        56.4         3                21       65.1         3
   6        56.9         4                22       65.6         3
   7        57.3         3                23       66.1         4
   8        57.8         4                24       66.5         4
   9        58.2         3                25       66.8         4
  10        58.6         3                26       66.9         4
  11        59.1         3                27       67.0         4
  12        59.6         3                28       81.1         4
  13        60.1         4                29       84.3         4
  14        61.2         4                30       85.5         4
  15        62.3         4                31       86.0         4
  16        62.8         4                32       86.9         4

r = .47**        **p ≤ .01

Table 10. DSBS Ratings and z-scores for the Disruptive Group

Student  DSBS Rating  z-score          Student  DSBS Rating  z-score
  101       55.1        2.3              117       63.3        2.8
  102       55.1        2.3              118       63.8        2.9
  103       55.2        2.3              119       64.2        2.9
  104       55.9        2.3              120       64.7        2.9
  105       56.4        2.4              121       65.1        2.9
  106       56.9        2.4              122       65.6        3.0
  107       57.3        2.4              123       66.1        3.0
  108       57.8        2.5              124       66.5        3.0
  109       58.2        2.5              125       66.8        3.0
  110       58.6        2.5              126       66.9        3.0
  111       59.1        2.6              127       67.0        3.1
  112       59.6        2.6              128       81.1        4.0
  113       60.1        2.6              129       84.3        4.2
  114       61.2        2.7              130       85.5        4.3
  115       62.3        2.8              131       86.0        4.3
  116       62.8        2.8              132       86.9        4.4

Mean = 64.9    Standard Deviation = 9.5    Standard Error = 1.7

Table 11. DSBS Ratings and z-scores for the Norming Group

Student  DSBS Rating  z-score          Student  DSBS Rating  z-score
   1          .2       -1.3               39       17.5       -0.2
   2         1.1       -1.2               40       17.6       -0.2
   3         1.2       -1.2               41       17.6       -0.2
   4         1.6       -1.2               42       17.7       -0.1
   5         2.1       -1.2               43       17.8       -0.1
   6         2.3       -1.1               44       18.0       -0.1
   7         4.2       -1.0               45       18.2       -0.1
   8         5.3       -1.0               46       18.3       -0.1
   9         5.9       -0.9               47       18.4       -0.1
  10         6.4       -0.9               48       18.5       -0.1
  11         6.7       -0.9               49       18.9       -0.1
  12         7.5       -0.8               50       19.1       -0.1
  13         7.8       -0.8               51       19.6        0.0
  14         8.6       -0.7               52       19.6        0.0
  15        10.1       -0.6               53       19.7        0.0
  16        11.5       -0.6               54       20.2        0.0
  17        13.1       -0.4               55       20.3        0.0
  18        14.2       -0.4               56       20.5        0.0
  19        14.8       -0.3               57       21.1        0.1
  20        15.3       -0.3               58       21.6        0.1
  21        15.5       -0.3               59       22.0        0.1
  22        15.9       -0.3               60       22.1        0.1
  23        16.1       -0.2               61       23.6        0.2
  24        16.3       -0.2               62       24.1        0.3
  25        16.4       -0.2               63       24.5        0.3
  26        16.6       -0.2               64       25.3        0.4
  27        16.7       -0.2               65       26.1        0.4
  28        16.8       -0.2               66       26.4        0.4
  29        16.8       -0.2               67       27.7        0.5
  30        16.9       -0.2               68       27.8        0.5
  31        17.0       -0.2               69       28.0        0.5
  32        17.2       -0.2               70       28.3        0.6
  33        17.2       -0.2               71       30.6        0.7
  34        17.3       -0.2               72       60.5        2.6
  35        17.4       -0.2               73       61.2        2.7
  36        17.4       -0.2               74       63.3        2.8
  37        17.5       -0.2               75       81.1        4.0
  38        17.5       -0.2               76       85.5        4.3

Mean = 19.9    Standard Deviation = 15.4    Standard Error = 1.77

Using the criterion of two standard deviations, or two z-scores, above the mean, which excludes approximately 98% of a normally distributed population, a DSBS rating at least 30.8 points above the norming group mean was required for a student to be identified as disruptive. The standard error was used to include borderline cases. The DSBS so identified 32, or 100%, of the students in the disruptive group. Thus, the DSBS demonstrated satisfactory convergent validity.

Discriminant validity was estimated by comparing each norming group student's DSBS rating with the criterion for inclusion in the disruptive group (a z-score of at least 2). The scores of five norming group students met the criterion for identification as disruptive. Two of these students also were in the disruptive group. The remaining three students had only one disciplinary referral each in the current school year and were not viewed by the deans as disruptive. The DSBS correctly excluded 96% of the norming group and appears to have demonstrated satisfactory discriminant validity.

Research Question Four

To what degree does the DSBS provide ratings which are stable over time? Fourteen days after the initial rating period, four students from the disruptive group and eight from the norming group were rated again by their teachers (Table 12).

Twenty-six forms were received for the disruptive group students. Pearson product-moment correlations ranging from .90 to .98 were obtained, with an overall correlation of r = .94; all correlations were significant at p ≤ .05 or better. For the norming group, 47 forms were received. Correlations ranging from .72 to .97 were obtained, with an overall correlation of .92; with the exception of the one correlation of .72, which was not significant, all correlations were significant at p ≤ .05 or better. The DSBS provided ratings that appear consistent over a period of time, and temporal reliability for the DSBS thus appeared satisfactory.

Summary

The data seem to suggest that the content of the items selected for the final version of the DSBS was acceptable to teachers as being descriptive of the disruptive behaviors usually observed in middle and junior high schools. In addition, all of the behaviors described by the DSBS items were actually observed in the field study. The data on actual observations also support the conclusion that the DSBS items equitably represent the underlying theoretical constructs of disruptive behavior as presented in the research literature.

Table 12. Test-Retest Correlations

Disruptive Group                       Norming Group
Student    Correlation                 Student    Correlation
Number     Coefficient                 Number     Coefficient
  107         .98**                        8         .72
  114         .95**                       16         .92**
  121         .90*                        24         .90**
  128         .94**                       32         .92*
                                          40         .97**
                                          48         .94*
                                          56         .91*
                                          64         .97**

* p ≤ .05     ** p ≤ .01

Although the DSBS did not satisfactorily predict the subjective ratings by non-teaching personnel of individual disruptive students, the DSBS did identify the criterion group as being composed of disruptive students. With a high degree of accuracy, the DSBS identified individual disruptive students and excluded non-disruptive students. The reliability data suggest that the obtained DSBS ratings would be consistent over at least a 14-day period. Thus, the DSBS appeared to meet the proposed major criteria for content relevance and representativeness, validity, and reliability.

Discussion

A major task of this study was choosing behaviors that would represent disruptiveness across a wide range of school environments. Using teachers' reports, for an entire year, of behaviors that had resulted in disciplinary referrals created a large number and variety of potential items. This procedure provided evidence of content validity for the items refined from this pool and used on the instrument. Each proposed item had been evaluated several times in previous content validation studies in which the basis for judgment was the raters' past experience. Nevertheless, teachers experienced difficulties in applying seven of these previously approved items to the school environment.

These teachers, with minimal assistance from the researcher, were able to make revisions to the items that resulted in unanimous acceptance of the items by another group of raters. This experience reaffirmed the hypothesis (Smith & Kendall, 1963) that involving experts in the content validation procedure, and providing for testing items in the target environment, would eliminate some problems encountered by instrument developers relying more on clinical descriptions and statistical determinations. No problems with applying any DSBS items to the study samples were reported.

Numerous studies, cited in the review of literature, have explored broad-band classifications of disruptive behavior. These investigative techniques were based primarily on statistical analysis or clinical classification systems, neither of which seemed to produce items descriptive of the most prevalent disruptive school behaviors. A project preliminary to this study utilized experienced teachers in selecting categories of disruptive behavior from those identified in these previous research efforts. Thus, the scope, or underlying construct base, of the DSBS was defined by expert judges using research-based categories of disruptive behavior. Equitable coverage of the 10 constructs so selected was evaluated by having experienced teachers assign each proposed DSBS item to one of the constructs. After item refinements, a final review by expert judges resulted in the assignment of from three to five items to each construct.

This was accepted as demonstrating equitable distribution of items among the constructs defining the scope of the DSBS. Reporting the distribution of DSBS items by constructs (Table 8) is not intended to imply subscale characteristics for the constructs: too few items are included to assure adequate convergent and discriminant validity, and no attempt was made to create mutually exclusive categories. However, by examining the rating forms for a student referred for excessive disruptiveness and extracting key descriptors from the items selected by his or her teachers, a qualitative behavioral profile for prescriptive use can be proposed.

Criterion validity was estimated by comparing teachers' ratings from the DSBS with non-teaching personnel's subjective ratings of the 32 students in the disruptive group. The emergence of a statistically significant correlation between teachers' and non-teaching personnel's ratings of the disruptive group is not surprising, as the group members were being evaluated on a common variable, disruptiveness. The weak nature of the correlation may be a statistical artifact resulting from the restricted range of the ratings by non-teaching personnel. Alternatively, the results may suggest that disruptiveness ratings by non-teaching personnel are simply poor predictors of DSBS ratings and that the weak correlation is a true indication of the relationship between the two rating systems.

Since no objective criteria exist, it is not possible to state conclusively which rating system is superior in ranking students within the disruptive group. Except for borderline cases, this is not a significant limitation. Membership in the disruptive group would be prima facie evidence of the need for alternative education services, as the criterion has been set at two z-scores, or a disruptiveness rating above approximately 98% of the target school population.

The influence of environmental conditions on behavior may limit the period of time over which consistent behavioral measures can be obtained. Nevertheless, an instrument must demonstrate some degree of temporal consistency to be considered reliable enough for use in placement decisions. Since the basis for rating students on the DSBS is the teacher's past experience with a student, consistency over a 14-day period should be obtainable. In the follow-up study to estimate temporal reliability, the test-retest correlations were significant for 11 of the 12 cases. For the remaining student, while the scores from each rater varied significantly, the variances tended to cancel out and the overall effect was negligible.

The absolute values for that case were at the very low end of the range, and placement decisions would not be made for students with similar scores. Teachers reported some difficulty in rating students who displayed disruptive behavior very infrequently, as the behavior tended to be forgotten by the teachers with even a short passage of time. This is not considered a meaningful challenge to reliability or validity. The overall results of this reliability check suggest that the items are specific enough to describe observed disruptive behaviors and inclusive enough not to be affected by minor variations in rating performance.

CHAPTER FIVE
CONCLUSIONS, IMPLICATIONS, SUMMARY, AND RECOMMENDATIONS

Conclusions

Based on the results of this study, the following conclusions were drawn:

1. Disruptive behaviors described by the items on the DSBS are accepted by educators as occurring in and disruptive to the school environment. These behaviors are recognizable by classroom teachers in a school setting.

2. In the opinion of experienced educators, the DSBS items are equitably distributed across the theoretical constructs which define disruptive behavior for the purposes of this study.

3. The DSBS demonstrates satisfactory criterion, convergent, and discriminant validity. The DSBS can correctly identify a criterion group of disruptive students, correctly classify individual students as disruptive, and, with a high degree of accuracy, exclude non-disruptive students from the disruptive group.

4. Except for ratings of students displaying minimal DSB, the DSBS provides consistent ratings over time.

Implications

One implication of this study for existent theory (Sanson-Fisher & Mulligan, 1977; Weinrott, 1979) is support for the use of teachers as raters of classroom behavior. Likewise, the study results appear to confirm Epstein's (1980) suggestion that behaviors can be aggregated over situations and time to produce valid ratings. The study also demonstrated that disruptive school behavior is a definable category of behavior that can be measured empirically (Edelbrock, 1979; Gresham, 1982; O'Leary & Johnson, 1979). The contention (Dickinson & Zellinger, 1980) that Likert scales remain viable because they are easy to understand and use was borne out. Support is offered for Messick's (1980) suggestion that tests of validity include interpretation, relevance, utility, and consequences of use. The research of Smith and Kendall (1963), in taking items from the settings in which ratings will be made and involving raters in scale development, was supported. The need for considering type, severity, and frequency as criteria for disruptive behavior also was supported (Robins, 1966). Specifically, this study emphasized the necessity of using severity factors to determine the difference between normal and pathological levels of disruptiveness. In addition to supporting prior attempts to develop a theory for defining and assessing disruptive behavior, this study developed a set of underlying constructs that may prove useful in future studies of disruptive behavior.

In proposing a method of establishing local norms, this study provides a procedure for evaluating behavior in a relevant context (Dickinson, 1978; Messick, 1980; Willems, 1975). An important implication for future research is the provision in this study of an alternative to approaches based on factor analysis, clinical descriptions, theoretical conceptions, and legal guidelines in describing disruptive behavior. Providing a set of underlying constructs for disruptive school behavior may help to focus future research activities, and emphasizing the importance of perceived severity of disruptive school behavior may influence future researchers to incorporate this factor into their designs. The DSBS might be used as a survey tool in research to identify and describe at-risk groups in the middle and junior high school population; this might also lead to a revision of gender and ethnic stereotypes of disruptive students. Colleges of education could utilize the DSBS items as descriptors of typical disruptive school behavior and offer training in resolution techniques. School psychologists could benefit from being trained in assessing disruptive school behavior and making recommendations for intervention strategies.


intervention strategies. Procedures used in validating the DSBS have suggested that descriptive statistics are adequate in establishing validity for research instruments to be used in school settings. The DSBS provides a means of assessing needs for alternative programs and for in-service training. The DSBS provides a quantitative basis for admittance to and discharge from alternative education programs, allowing accountability, and perhaps attracting additional funding. The DSBS can provide prescriptive information for student program development. The ease of administration and scoring will permit school counselors to assess disruptive school behavior and make recommendations for interventions. The DSBS provides for assessment of disruptive school behavior by teachers, the persons in the best position to observe DSB.

Summary

The purpose of the study was to develop and validate an instrument, the Disruptive Student Behavior Scale (DSBS). The DSBS is to be used to assess quantitatively the disruptive school behaviors of students referred for placement in either special education or alternative education programs. The study investigated previous attempts to define disruptive behavior; identification, assessment, and placement efforts directed toward


disruptive students; rating scale development procedures; research into the psychometric properties of rating scales; and the use by schools of results obtained from rating scales.

In an urban Florida middle school, consisting of grades six through nine, a norming group was chosen from a sample stratified by grade. A criterion, or disruptive, group was selected by nomination by non-teaching personnel. Classroom teachers rated both groups using the DSBS. A local norm for the school was calculated and the DSBS rating of each sample group student was compared to this norm. A rating of two or more standard deviation units, or z-scores, above the norm resulted in the student being classified as disruptive. A follow-up rating was performed after 14 days to test for temporal reliability.

Disruptive behaviors described by the items on the DSBS were accepted by educators as occurring in and disruptive to the school environment. The DSBS items were equitably distributed across the theoretical constructs which define disruptive behavior. The DSBS correctly identified a criterion group of disruptive students, correctly classified individual students as disruptive, and with a high degree of accuracy excluded non-disruptive students from the disruptive group. Except for ratings at the lowest end of the scale, the DSBS provided consistent ratings over time.
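The classification rule just described is simple enough to sketch in code. The following is a minimal illustration only, not software used in the study (none is described); the function name, the sample ratings, and the treatment of a rating falling exactly at the cutoff are assumptions made for the example.

from statistics import mean, stdev

def classify_students(norm_group_ratings, candidate_ratings, cutoff_sd=2.0):
    # Classify a student as disruptive when the DSBS rating falls two or
    # more standard deviation units above the local norm.
    local_norm = mean(norm_group_ratings)      # local DSB norm for the school
    local_sd = stdev(norm_group_ratings)       # local standard deviation
    results = {}
    for student, rating in candidate_ratings.items():
        z = (rating - local_norm) / local_sd   # deviation in SD (z-score) units
        results[student] = (round(z, 2), z >= cutoff_sd)
    return results

# Hypothetical data: a small norming sample and two referred students.
norming_sample = [12, 18, 9, 22, 15, 11, 17, 20, 14, 13]
referred = {"student_a": 55, "student_b": 21}
print(classify_students(norming_sample, referred))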


Recommendations

There are several follow-up studies that need to be conducted. A study could be conducted to investigate the interactive effects on DSBS ratings of such demographic variables as race, gender, and socioeconomic status of both students and teachers. A study is needed to test specifically for the school setting the assumption (Epstein, 1980) of situational reliability, which permits the collapsing of scores across raters to improve reliability. The usefulness of the DSBS in identifying groups of students in need of remediation and prevention programs needs to be studied. An evaluation of DSBS ratings as a source of placement and discharge data for alternative education programs would be a useful addition to the literature.


APPENDIX A
CONSTRUCT DEVELOPMENT STUDY

Instructions for Matching Items with Categories

1. In each envelope there are 64 slips of paper (56 containing items and 8 blank) and a list of categories.

2. Please read each item and assign it to one or more of the categories from the list provided. Write the number of the category(s) in the upper left corner of the slip containing the item. Unless otherwise specified, think of these in terms of academic and conduct behaviors within the classroom or on school property.

3. If any items appear not to fit any category, please write "None" on the slip.

4. Reexamine each slip to verify your choice of category. For those items assigned to more than one category, please try again to determine the one category which you think best represents the intent of the item. Indicate by circling your final choice.

5. Use the blank slips to write down any other categories or items which you think should be considered when determining a student's need for the program.

6. Replace all of the slips in the brown envelopes and return to me.

7. Please retain the list of categories and discuss these with your school contacts during the next two weeks. Indicate additions, deletions, suggestions, etc. on the list. Arrangements for getting this feedback from you will be made later.

8. Thank you all very much for your participation and assistance.

Bill Moses
3/6/81


Categories of Constructs Defining Maladaptive Social Behavior

1. Disobedience (personal confrontation)
2. Aggression (verbal or physical, e.g., pushing, hitting)
3. Disruptiveness (in violation of rules)
4. Peer relationship problems (excluding fighting)
5. Law violations
6. Attendance violations
7. Fighting (initiates)
8. Academic irresponsibility (fails to complete assignments)
9. Destructiveness
10. Impulsiveness
11. Poor social relationships with school personnel
12. Denial of responsibility for actions
13. Inappropriate sexually-oriented behavior


TO: ACT Program Personnel
FROM: Bill Moses
SUBJECT: Analysis of Your Input Into Defining Maladaptive Social Behavior in the Schools
DATE: 3/18/81

Fifty-six items were to be assigned to thirteen categories. This assignment resulted in the following distribution:

1. Disobedience
   defiance
   defiance of adult authority
   disrespect/defiance
   ignoring the teacher
   non-compliance
   rebellion
   resisting authority

2. Aggression
   aggression
   anger-defiance
   hostility
   inappropriate gross motor behaviors
   inappropriate verbalizations
   social aggression
   using an object in interfering with another

3. Disruptiveness
   classroom disturbance
   conduct problems
   distracting behaviors
   general unruliness
   making inappropriate noises
   out of seat behavior

4. Peer relationships
   inappropriate group behavior
   interpersonal alienation
   interpersonal relationships problems
   peer relations difficulties
   socialization difficulties
   unfriendliness


5. Law violations
   none (see note below)

6. Attendance violations
   attendance/truancy

7. Fighting
   none (see Aggression)

8. Academic irresponsibility
   low productivity
   off-task behavior
   task avoidance

9. Destructiveness
   destructiveness

10. Impulsiveness
   impulsivity
   inability to delay gratification

11. Poor social relations with school personnel
   none (see Disobedience)

12. Denial of responsibility for actions
   externalization of blame
   irresponsibility
   unreliability

13. Inappropriate sexually oriented behavior
   none (see Peer relationships, Disobedience, or Disruptiveness, depending on the type and consequences of the behavior)

To be included under a category, an item had to be assigned to that category by at least one-half of the raters, or 8 people. The following items did not meet the criterion:

   Failure to conform to social rules
   Failure to function independently
   Inappropriate physical contacts
   Inattention
   Inconsiderateness
   Irrelevant responses
   Lack of anger control
   Lack of social control
   Proneness to emotional upset
   Seeking attention inappropriately
   Sensation-seeking
   Uncooperative
   Unethical behavior


These items were either too ambiguous or they fit into too many categories, so were dropped. By consensus, six items were combined with other similar items, reducing the number of items to 50.

Note: Inspection of the list of items suggests few law violation items were included. Also, two items, drug-related behavior and stealing, were suggested as additional items. Recognizing the validity of these items requires continuing the category of law violations and ensuring that behaviors in this category are included in the final checklist.

Your ratings resulted in identifying the following ten constructs as defining maladaptive social behavior in the school. In order to operationalize these constructs, I will ask teachers to assign behaviorally stated items to each one. To assist them in making their assignments, I have included suggested definitions for each construct. These definitions also were developed from your input.

Final Constructs

1. Disobedience
   a. defying or challenging legitimate authority
   b. disrespect to teachers and staff

2. Disruptiveness
   a. classroom behavior which interferes with learning by others and/or the teachers' attempts to teach
   b. behavior outside the classroom which interferes with the orderly operation of the school

3. Impulsiveness
   a. reacting immediately without concern for consequences
   b. failure to delay gratification

4. Destructiveness
   a. intentional destroying of school or personal property
   b. careless, inattentive behavior resulting in the destruction of property

5. Aggression
   a. verbal or physical threats or attacks to a person
   b. passive acts that result in harm or hurt feelings

6. Academic irresponsibility
   a. failure to complete assignments
   b. task avoidance


7. Social/personal irresponsibility
   a. denial of responsibility
   b. blaming others
   c. unreliability

8. Ineffective interpersonal relationships
   a. alienation of peers resulting in avoidance by them
   b. lack of regard for others

9. Attendance violations
   a. truancy
   b. excessive tardiness or absences

10. Law violations
   a. any behavior which violates criminal laws, e.g., drug possession, arson

I would appreciate two kinds of additional feedback from you:
(1) Your opinion of the definitions of the constructs on this final list of ten.
(2) The names of teachers who would be willing to give some time to assigning specific behaviors, from a list, to these constructs. I will visit the schools to explain the project to each teacher.

Thank you again for your participation and assistance.


APPENDIX B
BEHAVIORS COLLECTED FROM DISCIPLINARY RECORDS

1. Leaving classroom without permission
2. Lying to the teacher
3. Inappropriate display of affection
4. Dress code violation--backless shoes
5. Denying responsibility for behavior
6. Sleeping in class
7. Smoking
8. Walking around auditorium
9. Possession of cigarettes
10. Abuse of hall pass privilege
11. Leaving school grounds without permission
12. Bringing vulgar materials to class
13. Having extraneous material in class
14. Walking away from other students in line
15. Shooting a bird at girl in class
16. Use of obscene language to other students
17. Kicking classroom door in anger
18. Skipping school
19. Skipping classes
20. Excessive tardiness that is unexcused
21. Possessing a stolen lock from a locker


22. Possession of concealed weapon (razor blade)
23. Stealing ($1.00)
24. Hitting a student in class
25. Throwing things at other students
26. Fighting in class
27. Threatened me "leave me alone or else"
28. Verbal threats to students
29. Failure to return report card
30. Interfering in another student's discipline
31. Refuse to complete sentences
32. Rude and disrespectful to teacher
33. Making obscene gestures to teacher
34. Verbally abusive to teacher
35. Silly and impudent acting
36. Defiance of authority--refused to follow direction
37. Running from teacher
38. Refused to give me the comic book he was looking at when class was reviewing homework
39. Refusal to obey teacher
40. Disturbing class, playing with electronic games
41. Passing wallets back and forth in class
42. Passing notes in class
43. Asking for pass from class excessively
44. Does not follow class standards and procedures
45. Out of seat without permission
46. Talking excessively loud


47. Arrogant (has to have last word) in classroom discussions with teacher
48. Constantly disrupting class by comments that add nothing to class discussion
49. Defiance--warned not to talk anymore or he would receive a referral--said "No, wanted written up"
50. Using vulgar language in class
51. Calling each other names
52. Disrupting testing
53. Inciting other students to misbehavior
54. Horseplay in the halls
55. Misbehavior in lunch period (cafeteria)
56. Taking material off the bulletin board
57. Talking during fire drill
58. Singing (in class or hall)
59. Vandalism--tearing up another's clothes
60. Marking on desk
61. Writing obscenities on wall (malicious mischief)
62. Defiance--does nothing when class is assigned work--just sits
63. Failure to dress out for P.E.
64. Took another student's test paper and put their name on it
65. No books, paper, pencils
66. Writing notes to friends in class instead of doing assignments


APPENDIX C
ORAL INSTRUCTIONS FOR THE EDITING STUDY

"On these cards, one to a card, are written behaviors that are often considered disruptive when they occur in school. Please read each card carefully and answer the question: 'Would I recognize this behavior if I saw it occur?' If the answer is yes, please place the card in the pile to the left beside the 'yes' card. If not, place the card in the pile to the right beside the 'no' card. In a task such as this, your first reaction is often the most accurate, but you may take as much time as you need. If really undecided, place the card in the 'no' pile."

(After finishing) "Thank you. Now please look at the cards in the 'yes' pile one more time, just to be sure. Feel free to make any other comments that occur to you."

(After finishing) "Thank you. Now, please look at each card in the 'no' pile and edit or rewrite the item in specific terms on one of these blank cards." (Clip the two cards together as S completes task.)

(After finishing) "Thank you very much for your cooperation."


APPENDIX D
ITEMS DEVELOPED FROM CONTENT VALIDATION STUDY

1. Has refused to follow my disciplinary instructions (e.g., to sit down, be quiet).
2. Has broken rules primarily to provoke a confrontation with me.
3. Has disrupted class by inappropriate activities (e.g., throwing objects, passing notes, hitting, walking around).
4. Has contributed to the destruction of personal or school property through carelessness, inattention, or neglect (e.g., using equipment improperly, failing to report unsafe conditions).
5. Has threatened to harm other students.
6. Has fought with other students.
7. Has been observed cheating on assignments (e.g., using notes, copying).
8. Has falsely denied being involved in a disruptive activity (e.g., breaking rules, cheating, fighting).
9. Has alienated other students (e.g., using obscene language or gestures, name calling).
10. Has left class without permission.
11. Has refused to accept disciplinary interventions from me (e.g., sentences, detention, referral to dean).
12. Has been seen outside of class being disruptive (e.g., running, shouting, singing, playing in the halls; throwing food in cafeteria).
13. Has displayed impatience (e.g., demanded immediate attention, refused to wait turn).
14. Has intentionally damaged or destroyed personal property (e.g., clothes, book, bicycle).


15. Has hit or thrown objects at other students.
16. Has avoided classroom assignments through passive activities (e.g., sleeping, not bringing materials).
17. Has broken rules primarily for personal convenience (e.g., smoking, eating, dress code violations).
18. Has blamed others for his or her actions.
19. Has been careless or malicious, resulting in harm to another person.
20. Has been present at homeroom but absent from class without permission.
21. Has interfered when I was disciplining another student.
22. Has encouraged other students to break school or classroom rules.
23. Has acted impulsively, without seeming to care about the consequences.
24. Has defaced school property (e.g., writing on walls or sidewalks, carving on desk).
25. Has threatened to harm me.
26. Has failed to complete classroom assignments satisfactorily from lack of interest (e.g., oral or written work, projects, dressing out).
27. Has been unreliable (e.g., lying, failing to return report cards or borrowed articles).
28. Has attempted to manipulate me (e.g., using friendship or flattery, exaggerating illness).
29. Has been late coming to class.
30. Has been observed in possession of stolen property (e.g., clothes, books, money, purse, lock).
31. Has treated me with rudeness and disrespect (e.g., back talk, verbal abuse, obscene gestures).
32. Has disrupted class by inappropriate talking (e.g., inappropriate questions, name calling, loudness, obscenity).


33. Has displayed lack of control (e.g., kicking door, hitting wall).
34. Has damaged or destroyed school property, making it unusable (e.g., breaking windows or equipment).
35. Has hit or thrown objects at me.
36. Has been unable to complete assignments satisfactorily because of the influence of non-prescription drugs or alcohol.
37. Has been observed in possession of illegal drugs or alcohol.
38. Has at school committed theft (e.g., from a locker) or robbery (from a person).
39. Has been observed in possession of a weapon or dangerous object.
40. Has been observed committing other major conduct violations (e.g., arson, battery, bomb threat, extortion, pulling fire alarm, shooting fireworks).


APPENDIX E
INSTRUCTIONS FOR CONTENT VALIDATION STUDY

Field Research: The Disruptive Student Behavior Scale

Dear Colleague:

Thank you for agreeing to participate in this research study. Please read all of the directions before beginning. ___ will answer any questions you may have.

1. Enclosed are two sets of slips. The slips labelled "Constructs" are categories of disruptive behavior. The slips labelled "Items" are statements of possible disruptive school behaviors.

2. Assign each item to one category of behavior if possible.

3. If an item obviously fits more than one category, write all category numbers on the slip but put it under the category it fits best.

4. If an item seems to fit no category, write "none" on the item slip. If the item can be edited to fit a category, please feel free to rewrite it and include it under the appropriate category.

5. Your comments on both the categories and items would be appreciated. Please write directly on the slips.

6. Clip the item slips to the appropriate category slips. Please place all materials into the original envelope and return to ___'s mailbox as soon as possible. ___ will have for you an envelope containing your honorarium.

Again, thanks for assisting me in this project to develop an instrument for quantifying disruptive behavior. If you would like to receive a copy of the completed instrument, write your name below.

Sincerely,

/s/ Bill Moses

Date


APPENDIX F
THE DISRUPTIVE STUDENT BEHAVIOR SCALE (DSBS)

by William Moses

Student's name ___________  Grade __  Age __  Sex __
Rater's name ___________  Subject ______
School ___________  Class Period ______
Date of rating __________

Directions: Please take a few minutes to recall your observations of this student during the current school year. Then respond to the statements on the following pages by placing an (X) through the appropriate number. Please return this form to ________ by _____.

Example: In the current school year, this student:

1. Has been irresponsible in returning borrowed articles.
   0 None of the time   1 Very infrequently   2 Sometimes   3 Quite often   4 Always

0 None of the time = absolutely never
1 Very infrequently = no more than once a month
2 Sometimes = more than once a month
3 Quite often = more than once a week
4 Always = daily

(Blue Stock)


Key: 0 None of the time   1 Very infrequently   2 Sometimes   3 Quite often   4 Always

In this school year, this student:

1. has refused to follow my disciplinary instructions (e.g., to sit down, be quiet).   0 1 2 3 4
2. has threatened to harm other students.   0 1 2 3 4
3. has disrupted class by inappropriate activities (e.g., throwing objects, passing notes, hitting, walking around).   0 1 2 3 4
4. has contributed to the destruction of personal or school property through carelessness, inattention, or neglect (e.g., using equipment improperly, failing to report unsafe conditions).   0 1 2 3 4
5. has falsely denied being involved in a disruptive activity (e.g., breaking rules, cheating, fighting).   0 1 2 3 4
6. has been observed in possession of stolen property (e.g., clothes, books, money, purse, lock).   0 1 2 3 4
7. has attempted to manipulate me (e.g., using friendship or flattery, exaggerating illness).   0 1 2 3 4
8. has displayed lack of impulse control (e.g., kicking door, shouting out).   0 1 2 3 4

Page Score ____


Key: 0 None of the time   1 Very infrequently   2 Sometimes   3 Quite often   4 Always

In this school year, this student:

9. has been late coming to class.   0 1 2 3 4
10. has failed to complete classroom assignments satisfactorily from lack of interest (e.g., oral or written work, projects, dressing out).   0 1 2 3 4
11. has defaced school property (e.g., writing on walls or sidewalks, carving on desk).   0 1 2 3 4
12. has been deliberately rude or impolite to other students.   0 1 2 3 4
13. has broken rules resulting in a confrontation with me.   0 1 2 3 4
14. has been observed in possession of illegal drugs or alcohol.   0 1 2 3 4
15. has displayed impatience (e.g., demanded immediate attention, refused to wait turn).   0 1 2 3 4
16. has left class without permission.   0 1 2 3 4
17. has hit or thrown objects at other students.   0 1 2 3 4

Page Score ____


Key: 0 None of the time   1 Very infrequently   2 Sometimes   3 Quite often   4 Always

In this school year, this student:

18. has avoided classroom assignments through passive activities (e.g., sleeping, not bringing materials, being under the influence of drugs).   0 1 2 3 4
19. has been seen outside of class being disruptive (e.g., running, shouting, singing, playing in the halls; throwing food in cafeteria).   0 1 2 3 4
20. has broken rules primarily for personal convenience (e.g., smoking, eating, dress code violations).   0 1 2 3 4
21. has been present at homeroom but absent from class without permission.   0 1 2 3 4
22. has acted impulsively, without regard for the consequences.   0 1 2 3 4
23. has threatened to harm me.   0 1 2 3 4
24. has at school committed theft (e.g., from a locker) or robbery (from a person).   0 1 2 3 4
25. has cheated on assignments (e.g., using notes, copying).   0 1 2 3 4

Page Score ____


Key: 0 None of the time   1 Very infrequently   2 Sometimes   3 Quite often   4 Always

In this school year, this student:

26. has refused to accept disciplinary interventions from me (e.g., sentences, detention, referral to dean).   0 1 2 3 4
27. has interfered when I was disciplining another student.   0 1 2 3 4
28. has alienated other students (e.g., using obscene language or gestures, name calling).   0 1 2 3 4
29. has blamed others for his or her actions.   0 1 2 3 4
30. has damaged or destroyed school property, making it unusable (e.g., breaking windows or equipment).   0 1 2 3 4
31. has treated me with rudeness and disrespect (e.g., back talk, verbal abuse, obscene gestures).   0 1 2 3 4
32. has hit or thrown objects at me.   0 1 2 3 4
33. has been observed in possession of a weapon or dangerous object.   0 1 2 3 4

Page Score ____


Key: 0 None of the time   1 Very infrequently   2 Sometimes   3 Quite often   4 Always

In this school year, this student:

34. has disrupted class by inappropriate talking (e.g., inappropriate questions, name calling, loudness, obscenity).   0 1 2 3 4
35. has been unreliable (e.g., lying, failing to return report cards or borrowed articles).   0 1 2 3 4
36. has fought with other students.   0 1 2 3 4
37. has intentionally damaged or destroyed personal property (e.g., clothes, book, bicycle).   0 1 2 3 4
38. has been observed committing other major conduct violations (e.g., arson, battery, bomb threat, extortion, pulling fire alarm, shooting fireworks).   0 1 2 3 4

Page Score ____
Form Score ____


APPENDIX G
INSTRUCTIONS FOR SEVERITY FACTOR STUDY

Dear Colleague:

Thank you for your help with this study to develop an instrument for measuring disruptive school behavior. Enclosed are 40 statements of disruptive behaviors which teachers have observed in middle and junior high schools. The purpose of this portion of the project is to determine a severity rating for each of the 40 statements. Severity is defined as "a prediction, stated quantitatively, of the potentially detrimental consequences a disruptive behavior would likely have for a student." One system of identifying these consequences is by completing the following statement:

This behavior would probably be detrimental to the student's
1. relations with school personnel
2. relations with peers
3. future vocational opportunities
4. mental development
5. emotional development
6. physical development
7. learning of course material
8. course grade
9. remaining in school

To assign a severity factor to the 40 behavioral items, please write below each item the number(s), 1-9, of the consequences that are most likely to result from that behavior. Please work independently of the other judges. After completing this task, please return all materials in the envelope provided.

Thank you again for this help.

Sincerely,

/s/ Bill Moses

P.S. An honorarium is enclosed to partially compensate you for your time.
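Because each judge simply lists the consequence numbers that apply to an item, the responses lend themselves to elementary counting. The sketch below illustrates one plausible way to summarize them; the appendix does not state how the judges' lists were actually converted into the severity weights that appear on the scoring template, so the averaging step, the function name, and the sample responses are assumptions made for illustration.

from statistics import mean

def severity_summary(judge_responses):
    # judge_responses maps an item number to a list of sets, one set per
    # judge, each holding the consequence numbers (1-9) that judge endorsed.
    # Returns the average number of consequences endorsed per item.
    return {item: mean(len(endorsed) for endorsed in judges)
            for item, judges in judge_responses.items()}

# Hypothetical responses from three judges for two of the 40 items.
responses = {
    23: [{1, 2, 9}, {1, 9}, {1, 2, 5, 9}],   # e.g., threatening the teacher
    9:  [{8}, {7, 8}, {8}],                  # e.g., late coming to class
}
print(severity_summary(responses))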


APPENDIX H
SCORING TEMPLATE FOR THE DSBS

Directions:
1. Align the arrow on the template (green) with the arrow on the rating scale (blue).
2. Read the score above the rater's mark (X) and record on the rating scale (blue) as indicated.
3. Total the student's scores for each page and for the completed form.
4. Record the form score onto the Summary of Teacher Ratings (yellow).


(Trimmed to Line)

SCORING TEMPLATE FOR THE DSBS

1. 0 2 4 6 8
2. 0 1 2 3 4
3. 0 2 4 6 8
4. 0 1 2 3 4
5. 0 1 2 3 4
6. 0 2 4 6 8
7. 0 2 4 6 8
8. 0 2 4 6 8

(Green Stock)


9. 0 1 2 3 4
10. 0 2 4 6 8
11. 0 1 2 3 4
12. 0 2 4 6 8
13. 0 1 2 3 4
14. 0 3 6 9 12
15. 0 2 4 6 8
16. 0 2 4 6 8
17. 0 2 4 6 8

(Green Stock)


18. 0 2 4 6 8
19. 0 1 2 3 4
20. 0 1 2 3 4
21. 0 2 4 6 8
22. 0 2 4 6 8
23. 0 2 4 6 8
24. 0 3 6 9 12
25. 0 2 4 6 8

(Green Stock)


26. 0 1 2 3 4
27. 0 1 2 3 4
28. 0 1 2 3 4
29. 0 2 4 6 8
30. 0 2 4 6 8
31. 0 1 2 3 4
32. 0 2 4 6 8
33. 0 3 6 9 12

(Green Stock)


34. 0 3 6 9 12
35. 0 1 2 3 4
36. 0 2 4 6 8
37. 0 2 4 6 8
38. 0 3 6 9 12

(Green Stock)
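Read across its columns, the template in effect multiplies each raw rating of 0-4 by an item-specific severity weight: items printed with 0 1 2 3 4 carry a weight of 1, items printed with 0 2 4 6 8 a weight of 2, and items printed with 0 3 6 9 12 a weight of 3. The sketch below applies those weights in code. It is an illustration only; the weight sets are transcribed from the template as printed above, and the function names and sample ratings are assumptions.

# Items whose template column is 0 3 6 9 12 (weight 3) or 0 1 2 3 4 (weight 1);
# every other item follows the 0 2 4 6 8 column (weight 2).
WEIGHT_3 = {14, 24, 33, 34, 38}
WEIGHT_1 = {2, 4, 5, 9, 11, 13, 19, 20, 26, 27, 28, 31, 35}

def item_weight(item):
    if item in WEIGHT_3:
        return 3
    if item in WEIGHT_1:
        return 1
    return 2

def form_score(raw_ratings):
    # raw_ratings maps an item number (1-38) to the rater's raw rating (0-4).
    # Returns the weighted form score a rater would read off the template.
    return sum(rating * item_weight(item) for item, rating in raw_ratings.items())

# Hypothetical ratings: item 3 rated 2, item 14 rated 1, item 9 rated 4.
print(form_score({3: 2, 14: 1, 9: 4}))   # 2*2 + 1*3 + 4*1 = 11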


APPENDIX I
SUMMARY OF TEACHER RATINGS ON THE DSBS

Directions:
1. Enter below the information from each DSBS form (blue).
2. Add the form scores to obtain the student's total score.
3. Divide the total score by the number of raters to obtain the student's disruptive school behavior (DSB) rating.
4. Enter your local DSB norm and SD from the norming study.
5. Complete the calculations to obtain the deviation (in standard deviation units) from your local norm.

Student's name ___________  Grade ___  Age ___  Sex ___
School ___________  Evaluator ___________  Title ______

Rater's name          Subject          Period          Form score

Total score ___ / Nbr of raters ___ = Student's DSBS rating ___
Student's DSBS rating ___ - Local DSBS norm ___ = Deviation from local norm ___
Deviation from local norm ___ / Local SD ___ = SD's from local norm ___

(Yellow Stock)
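The worksheet arithmetic can also be written out directly. This is a minimal sketch of the calculations listed in the directions above; the form scores, norm, and SD in the example are hypothetical.

def dsbs_summary(form_scores, local_norm, local_sd):
    # Average the form scores across raters, then express the result as a
    # deviation from the local norm in standard deviation units.
    dsbs_rating = sum(form_scores) / len(form_scores)
    deviation = dsbs_rating - local_norm
    return dsbs_rating, deviation, deviation / local_sd

# Hypothetical student rated by five teachers against an illustrative norm.
rating, deviation, sd_units = dsbs_summary([34, 41, 29, 38, 36], local_norm=15.0, local_sd=8.0)
print(rating, deviation, round(sd_units, 2))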


APPENDIX J
PRESCRIPTIVE PROFILE WORKSHEET FOR THE DSBS

Student ___________  Grade __  Age __  Sex __
School ________  Evaluator ______  Title ____

Directions:
1. Using the DSBS form (blue) and the scoring template (green), enter below each teacher's rating for each scale item. Do not use raw scores from the DSBS. Note the different sequence of the item numbers below.
2. Add the ratings across by item and down by rater.
3. Add the Item Totals down and the Rater Totals across. The two totals must agree.
4. Add the item totals for each profile category and enter in the appropriate box. Profile definitions are provided for guidance in preparing qualitative behavioral descriptions and suggesting prescriptive interventions. They are not intended for use in placement decisions.

DSBS Item Number    Ratings by rater (a)-(f)    Item Totals
1.
13.
26.
31.
Totals: (a)___ (b)___ (c)___ (d)___ (e)___ (f)___           Disobedience Rating ___

(Beige Stock)


DSBS Item Number    Ratings by rater (a)-(f)    Item Totals
3.
19.
27.
34.
Totals: (a)___ (b)___ (c)___ (d)___ (e)___ (f)___           Disruptiveness Rating ___

DSBS Item Number    Ratings by rater (a)-(f)    Item Totals
8.
15.
22.
Totals: (a)___ (b)___ (c)___ (d)___ (e)___ (f)___           Impulsiveness Rating ___

DSBS Item Number    Ratings by rater (a)-(f)    Item Totals
4.
11.
30.
37.
Totals: (a)___ (b)___ (c)___ (d)___ (e)___ (f)___           Destructiveness Rating ___

DSBS Item Number    Ratings by rater (a)-(f)    Item Totals
2.
17.
23.
32.
38.
Totals: (a)___ (b)___ (c)___ (d)___ (e)___ (f)___           Aggression Rating ___

(Beige Stock)


DSBS Item Number    Ratings by rater (a)-(f)    Item Totals
10.
18.
25.
Totals: (a)___ (b)___ (c)___ (d)___ (e)___ (f)___           Academic Irresponsibility Rating ___

DSBS Item Number    Ratings by rater (a)-(f)    Item Totals
5.
20.
29.
35.
Totals: (a)___ (b)___ (c)___ (d)___ (e)___ (f)___           Personal Irresponsibility Rating ___

DSBS Item Number    Ratings by rater (a)-(f)    Item Totals
7.
12.
28.
Totals: (a)___ (b)___ (c)___ (d)___ (e)___ (f)___           Ineffective Interpersonal Relationships Rating ___

DSBS Item Number    Ratings by rater (a)-(f)    Item Totals
9.
16.
21.
Totals: (a)___ (b)___ (c)___ (d)___ (e)___ (f)___           Attendance Violations Rating ___

(Beige Stock)


DSBS Item Number    Ratings by rater (a)-(f)    Item Totals
6.
14.
24.
33.
36.
Totals: (a)___ (b)___ (c)___ (d)___ (e)___ (f)___           Law Violations Rating ___

Rater Totals: (a)___ (b)___ (c)___ (d)___ (e)___ (f)___     DSBS Rating ___

(Beige Stock)


Definitions of Profile Categories from the DSBS

These profile descriptions are provided for guidance in preparing qualitative behavioral descriptions and suggesting prescriptive interventions. These descriptions are not intended for use in placement decisions. DSBS item numbers are in ( ).

1. Disobedience
   a. defying or challenging legitimate authority (refusing legitimate request) (1, 26)
   b. disrespect to teachers and staff (31)
   c. intentionally breaking rules to show defiance (13)

2. Disruptiveness
   a. classroom behavior which interferes with learning by others and/or the teacher's attempts to teach (3, 27, 34)
   b. behavior outside the classroom which interferes with the orderly operation of the school (19)

3. Impulsiveness
   a. reacting immediately without concern for consequences (8, 22)
   b. failure to delay gratification (15)

4. Destructiveness
   a. intentional destroying of school or personal property (11, 30, 37)
   b. careless, inattentive behavior resulting in the destruction of property (4)

5. Aggression
   a. verbal or physical threats or attacks to a person (2, 17, 23, 32, 38)

6. Academic irresponsibility
   a. failure to complete assignments satisfactorily (10)
   b. task avoidance (18)
   c. cheating (25)

7. Personal irresponsibility
   a. denial of his/her involvement in an activity (5)
   b. blaming others for his/her actions (29)
   c. unreliability (35)
   d. intentionally breaking rules for personal convenience (20)

8. Ineffective interpersonal relationships
   a. alienation of peers resulting in avoidance by them (28)


   b. lack of regard for others (12)
   c. manipulation of others (7)

9. Attendance violations
   a. leaving class (16)
   b. excessive tardiness or absences (9, 21)

10. Law violations
   a. any behavior which violates criminal laws, e.g., drug possession, arson (6, 14, 24, 33, 36)


APPENDIX K
INSTRUCTIONS FOR THE PILOT STUDY

Thank you for agreeing to participate in this pilot study of an instrument to measure disruptive behavior. The latest Gallup education poll, the 16th, shows that the major public concern about education continues to be disruption in the schools. Education association (for example, the NEA) polls show that many teachers feel the same way. This instrument we are field testing here may help identify students who need special assistance in order to develop appropriate school behavior.

A number of students at this school have been randomly chosen as representative of all the students. The behaviors of these students are being measured to determine the usual behavior patterns here. Another group of students has been chosen on the basis of referrals to the deans for disciplinary reasons. Their behaviors are being measured to see how different they are from the other, or average, group.

Each of you has received a rating form, the blue form, for each period you teach one of the students selected for either group. Each form contains (number) items descriptive of disruptive behavior. You are being asked to rate each student only on those behaviors included on the instrument and only those you have actually observed during this school year.

Please follow along as I read the directions on the blue form. If at any time there is a question, please make a note on the form and I will answer all your questions after reading through the directions. (Read the directions and the example for this scale.) Are there questions? (Answer any.)

(Name) is your school coordinator for this study. He/she will answer any questions that may arise later and also collect the completed forms in his/her mailbox. Please replace the forms in the original envelope and return to (Name). Today is (day), the (date). What is a reasonable date for returning the rating forms? (Determine reasonable deadline and get majority concurrence.)


Now, please look at the last page in your packet. I am very interested in your reactions to this instrument and would appreciate it if, after you complete your ratings, you would answer the few questions on this survey form. Remember to rate each student on the basis of only the behaviors included on the instrument.

Thank you all very much for your valuable contributions. Instruments have to be field-tested and you are the people best qualified to do this.


APPENDIX L
RATER'S EVALUATION OF THE DSBS

Please use the following guidelines for your answers:
0 No   1 Somewhat   2 Yes

1. The directions could be followed easily.   0 1 2
2. The length of time to complete the ratings was reasonable.   0 1 2
3. The items seemed related to disruptive behavior.   0 1 2
4. The rating task was interesting.   0 1 2
5. The items were understandable.   0 1 2
6. The information needed to respond to the items was known to you.   0 1 2


APPENDIX M
ASSIGNMENT OF CONSTRUCTS BY ITEM NUMBER

DSBS Item Number    Construct Number
 1                   1
 2                   5
 3                   2
 4                   4
 5                   7
 6                  10
 7                   8
 8                   3
 9                   9
10                   6
11                   4
12                   8
13                   1
14                  10
15                   3
16                   9
17                   5
18                   6
19                   2
20                   7
21                   9
22                   3
23                   5
24                  10
25                   6
26                   1
27                   2
28                   8
29                   7
30                   4
31                   1
32                   5
33                  10
34                   2
35                   7
36                  10
37                   4
38                   5
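The table above can be written as a simple mapping and used, together with the template-scored item ratings, to produce the profile category ratings on the Appendix J worksheet. The sketch below is illustrative only; the construct names repeat the final constructs listed in Appendix A, and the example ratings are invented.

# Item-to-construct assignments transcribed from the table above.
ITEM_TO_CONSTRUCT = {
    1: 1, 2: 5, 3: 2, 4: 4, 5: 7, 6: 10, 7: 8, 8: 3, 9: 9, 10: 6,
    11: 4, 12: 8, 13: 1, 14: 10, 15: 3, 16: 9, 17: 5, 18: 6, 19: 2, 20: 7,
    21: 9, 22: 3, 23: 5, 24: 10, 25: 6, 26: 1, 27: 2, 28: 8, 29: 7, 30: 4,
    31: 1, 32: 5, 33: 10, 34: 2, 35: 7, 36: 10, 37: 4, 38: 5,
}

CONSTRUCT_NAMES = {
    1: "Disobedience", 2: "Disruptiveness", 3: "Impulsiveness",
    4: "Destructiveness", 5: "Aggression", 6: "Academic irresponsibility",
    7: "Personal irresponsibility", 8: "Ineffective interpersonal relationships",
    9: "Attendance violations", 10: "Law violations",
}

def profile_ratings(item_ratings):
    # item_ratings maps an item number to its rating summed across raters.
    # Returns the ten profile category totals used on the Appendix J worksheet.
    totals = {name: 0 for name in CONSTRUCT_NAMES.values()}
    for item, rating in item_ratings.items():
        totals[CONSTRUCT_NAMES[ITEM_TO_CONSTRUCT[item]]] += rating
    return totals

# Hypothetical totals for a few items.
print(profile_ratings({1: 6, 13: 2, 26: 3, 31: 4, 3: 8, 19: 1}))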


REFERENCES Abikoff, H., Gittelman, R., & Klein, D. ( 1980). Classroom observation code for hyperactive children: A replication of validity. Journal of Consulting and Clinical Psychology, .!la<5>, 555-565. Achenbach, T. (1978). The child behavior profile: I boys aged 6-11. Journal of Consulting and Clinical Psychology, ll<3>, 478-488. Achenbach, T., & Edelbrock, C. of child pychopathology. 1275-1301 . ( 1978). The classification Psy chol ogi cal Bulletin, 8-2. Alba um, G., Best, R., & Hawkins, D. ( 1981). Continuous vs. discrete semantic differential rating scales. Psychological Reports, ll, 83-86. Algozzine, B. ( 1977). The emotionally disturbed child: Disturbed or disturbing? Journal of Abnormal Child P sy ch ol o gy , 2, 2 o 5-211 Algozzine, B. (1979). The disturbing child: A validation report. Minneapolis, Minn.: University of Minnesota, Institute for Research on Learning Disabilities. Algozzine, B. ( 1980). The disturbing child: A matter of opinion. Behavioral Disorders, 2.(2), 112-115. Algozzine, B., & Ysseldyke, J. ( 1981). Special education services for normal children: Better safe than sorry? Exceptional Children, 48(3), 238-243. American Psychiatric Association. (1980). Diagnostic and statistical manual of mental disorders (3rd ed.>. Wash in gt on, D. C. : Author American Psychological Association. < 1966). Standards for educational and psychological tests and manuals. Washington, D. C. : Author. Amos, K. ( 1980). Competency testing: Will the LD student be included? Exceptional Children, il(3), 194-197. 162


163 Arnove, R., & Strout, T. (1978). Alternative schools and cultural plural ism: Promise and reality. Educational Research Quarterly, 2(4), 74-95. Bailey, D., Bender, W., &Montgomery, D. (1983). Compari son of teacher, peer, and self-ratings of classroom and social behavior of adolescents. Behavior Disorders, .8.<3), 153-159. Balow, H. (1979). Definitional and prevalence problems in behavior disorders of children. School Psychology Digest, .8., 348-354. Bardo, J., & Yeager, S. (1982). Note on reliability of fixed-response formats. Perceptual and Motor Skills, 2!!_, 1163-1166. Bardo, J., Yeager, S., & Klingsporn, M. ( 1982). A pr el imina ry assessment of format-specific central tendency and leniency error in summated rating scales. Perceptual and Motor Skills, 2-!L, 227-234. Bass, B., Cascio, W., & 0' Conner, E. ( 1974). Magnitude estimation of expressions of frequency and amount. Journal of Applied PsycholoiY• ~. 313-320. Bayh, B. < 1975). Our nation's schools, a report card: "A" in school violence and vandalism. Washington, D.C.: Senate Committee on the Judiciary, Subcommittee to Investigate Juvenile Delinquency. Beatty, R., Schneier, C., & Beatty, J. (1977). An empirical investigation of perceptions of ratee behavior frequency and ratee behavior change using Behavior al Expect a ti on Seales

164 Bel tramini, R. ( 1982). discriminability. Rating-scale variations and Psychological Reports. 2.Q, 299-302. Bennett, C., & Harris, J. ( 1982). Suspensions and expulsions of male and Black students. Urban Education, 1.6.<4>, 399-423. Benson , J , & C 1 a r k , F. ( 1 9 8 2 ) A g u i de for i n st rum en t development and validation. The American Journal of Occupational Therapy, .iQ.(12>, 789-800. Bernardin, H., LaShells, M., Smith, P., & Alvares, K. ( 1976). Behavioral expectation scales. Journal of Applied Psychology, .6..1< 1 >, 75-79. Bernardin, H., & Pence, E. ( 1980). Effects of rater training: Creating new response sets and decreasing accuracy. Journal of Applied Psychology, .6..5., 60-66. Bernardin, H., & Smith, P. ( 1981). A clarification of some issues regarding the development and use of behaviorally anchored rating scales (BARS). Journal of Applied Psycholo~y, ~<4>, 158-463. Blanz, F., & Ghiselli, E. (1972). scale: A new rating system. 2..5., 185-199. The mixed standard Personnel Psychology, Bow er , E. ( 1 9 5 8 ) A pr o c e s s for ea r 1 y i de n ti f i ca ti on of emotionally disturbed children. Bulletin of the California State Department of Edu ca ti on, ll, 1-111 Bower, E. ( 1982). Defining emotional disturbances: Public policy and research. Psychology in the Schools, 1..2., 55-60. Bower, E., & Lambert, N. (1971). In-school screening of children with emotional handicaps. In A. Long, O. Morse, & J. Newman (Eds.), Conflict in the classroom, (pp.142-148). Belmont, Cal.: Wadsworth. Bower, R. < 1969>. Early identification of emotionally handicapped children in schools (2nd. ed.>. Springfield, Ill.: Charles C. Thomas. Broadbent, D., Cooper, P., Fitzgerald, P., & Parkes, K. (1982). The Cognitive Failures Questionnaire (CFQ) and its correlates. British Journal of Clinical P sy ch ol o gy , 2.1, 1 1 6 Brown, L., & Hammill, D. (1978). The behavior rating profile. Austin, Tex.: Pro-ed.


165 Bruvold, W. (1969). Category and successive intervals scales for rating statements and stimulus objects. Journal of Experimental Psychology, ti, 230-234. Bures, o. (Ed.>. < 1978). The eighth mental measurements yearbook. Highland Park, N.J.: The Gryphon Press. California Department of Education. ( 1973). Conflict and violence in California's high schools. Sacramento: Author. Caliste, E. ( 1979). Students' adjustment from open to structured schools. Contemporary Education, 2.Q.( 3), 138-151. Camp, w. (1980). Educator perceptions of student di sci pl ine. Paper presented at the annual meeting of the American Educational Research Association (Boston, Mass., April 7-11) (ERIC Document Reproduction Service No. Ed 187 006). Camp, W. (1981). Identifying problem behaviors: Indiana teachers 1 ook at student di sci pl ine. National Association of Secondary Sch 001 Principals Bul 1 eti n, .Q2 C 4 41 ) , 4 5-4 8. Cattell, R. (1978). The scientific use of factor analysis in behavioral and life sciences. New York: Plenum. Chase, C. (1969). Often is where you find it. American P sy ch ol o g i st , 2 4 , 1 o 4 3 Conners, C. ( 1969). A teacher rating scale for use in drug studies with children. American Journal of Psychiatry, .12.6., 884-888. Cooper, W. < 1981 >. Ubiquitous halo. Psychological Bulletin, .2..Q.(2), 218-244. Cooper, W. (1983). Internal homogeneity, descriptiveness, and halo. Personnel Psychology, .3..6., 489-502. Cowen, E., Dorr, D., Clarfield, S., Kreling, B., McWilliams, S., Pokracki, F., Pratt, D., Terrell, D., & Wilson, A. ( 1973). The AMI..: A quick-screening device for early identification of school maladaptation. American Journal of Community Psych ol o gy , J. < 1 > , 1 23 5. Cronbach, L. < 1970). Essentials of psychological testing ( 3 rd e d . ) . New Yo r k : H a r p e r & R ow .


166 Cross, H., & Kohl, M. (1978). Predelinquent behavior in males: Perspectives and suggestions. Journal of Research and Development in Education, JJ.(2), 34-41. Cullinan, D. (1975). Behavior modification with disruptive, delinquent, and otherwise deviant adolescents. Wilkes-Barre, Pa.: Wilkes College, Edu ca ti onal Dev el opm ent Center. Davis, R., Butler, N., & Goldstein, N. ( 1972). From birth to seven: Second report of the National Child Development Study. London: Longman and National Children's Bureau. Dayton, c. C 1967). Technical manual: Pupil classroom behavior scale. College Park, Md.: University of Mary land. Dean, R. ( 1980). Teachers as raters of aberrant behavior. Journal of School Psychology, 1..(4), 354-359. Deibert, J., & Hoy, W. ( 1977). "Custodial" high schools and self-actualization of students. Educational Research Quarterly, ZC2), 24-31. Deno, s. C 1979). A direct observation approach to measuring classroom behavior: Procedures and application. Minneapolis: University of Minnesota, Institute for Research on Learning Disabilities. Department of Health, Education, and Welfare (August 23, 1977). Implementation of part b., P.L. 94-142. Federal register, Part II, Section 121a.5. Dickinson, D. ( 1978). Direct assessment of behavioral and emotional problems. Psychology in the Schools, 1..5.(4), 472-477. Dickinson, T., & Zellinger, P. ( 1980). A comparison of the behaviorally anchored rating and mixed standard scale formats. Journal of Applied Psychology, .Q.5.(2), 147-154. Di Prete, T. ( 1981). Discipline, order and student behavior in American high schools. Washington, D.C.: National Center for Education Statistics. DiStef ano, M., Pryer, M., & Erffmeyer, R. ( 1983). Application of content validity methods to the


167 development of a job-related performance rating criterion. Personnel Psychology, .3..6., 621-631. Dreger, R. (1980). The initial standardization of the Adolescent Behavioral Classification Project Instrument. Journal of Abnormal Child Psychology, .8.< 3), 297-322. Duke, D. ( 1978). Looking at the school as a rule-governed organization. Journal of Research and Development in Education, 11.(4), 116-126. Dunlap, W., & Dillard, J. (1980). the principal's perception: Disorders, .5.(2), 108-111. Problem behavior from A new view. Behay ioral Dunnett, S., Koun, S., & Barber, P. ( 1981). Social desirability in the Eysenck Personality Inventory. British Journal of Psychology, 12, 19-26. Dunnette, M. ( 1963). A modified model for test validation and selection research. Journal of Applied Psychology, il, 317-323. Duval County Public Schools. (1980). Code of student conduct. Jacksonville, Fla.: Author. Duval County School Board. ( 1979). Project ACT: Accountability in citizenship training manual. Jacksonville, Fla.: Author. Eaves, R. ( 1975). Teacher race, student race, and the Behavior Problem Checklist. Journal of Abnormal Child Psychology , .3. < 1 > , 1-9. Edelbrock, C. (1979). Empirical classification of children's behavior disorders: Progress based on parent and teacher ratings. School Psychology Digest, a(4), 355-369. Edelbrock, C., & Achenbach, T. (1978). Child behavior profile patterns of children referred for clinical services. Bethesda, Md.: National Institutes of Mental Heal th. ( ERIC Document Reproduction Service No. ED 768 259) Edelbrock, C., & Achenbach, T. (1980). A typology of Child Behavior Profile patterns: Distribution and correlates for disturbed children aged 6-16. Journal of Abnormal Child Psychology, a,4>, 441-470. Edwards, A. ( 1957 >. TechniQ ue s of attitude seal e construction. New York: Prentice-Hall.


168 Edwards, A., & Thur stone, L. ( 1952). An internal consistency check for scale values determined by the method of successive intervals. Psychometrika, ll, 169-180. E dw a rd s , D. , Round t re e , G. , Ke n t , S. , & Pa r k e r , J ( 1 9 8 1 ) An investigation of truancy and subsequent antisocial behavior. Corrective and Social Psychiatry and Journal of Behavior Technolo~y Methods and Therapy, 2..8.(1), 5-9. Epstein, M., Cullinan, D., & Rosemier, R. (1983). Behavior problems of behaviorally disordered and normal adolescents. Behavior Disorders, ~(3), 171-175. Epstein, S. (1980). The stability of behavior: II. Implications for psychological research. American P sy ch ol o g i st , .32 < 9 > , 7 9 O8 o 6 Erickson, K. < 1963). The wayward Puritans. New York: Wiley. Erickson, M. (1974). The effects of initial information and observational set upon social perception (Doctoral dissertation, University of Oregon, 1973). Dissertation Abstracts International, J.E., 4659B. Evertsen, C., Anderson, L., &Brophy, J. (1979). Texas junior high school study (R&D Rep. No. 4061). Austin, Tex.: University of Texas, Research and Development Center for Teacher Education. Evertsen, C., & Veldman, D. ( 1981). Changes over time in process measures of classroom behavior. Journal of Educational Psychology, I.3.(2), 156-163. Faretra, G. ( 1981). A profile of aggression from adoles cence to adulthood: An 18-year follow-up of psychiatrically disturbed and violent adolescents. American Journal of Orthopsychiatry, 21<3>, 439-453. Fay, C., & Latham, G. ( 1982). Effects of training and rating scales on rating errors. Personnel Psychology, .32, 105-116. Feldhusen, J. (1978). Behavior problems in secondary schools: Final report. Washington, D. c.: Department of Health, Education, and Welfare; National Institute of Education.


169 Filipczak, J. (1978). Social skills training in the community with behaviorally disruptive youth. Silver Spring, Md.: Institute for Behavioral Research. (ERIC Document Reproduction Service No. ED 170 632) Finley, D., Osburn, H., Dubin, J., & Jeanneret, P. ( 1977). Behaviorally based rating scales: Effects of specific anchors and disguised scale continua. Personnel Psychology, ~, 659-669. Florida Department of Education. ( 1975). District procedures for providing special education for exceptional students. Tal 1 ahassee, Fl a. : Author. Florida Department of Education. ( 1983). Report on public schools. Tallahassee, Fla.: Author. Florida Department of Education. ( 1985). A resource manual for the development and evaluation of special programs for exceptional students (Vol. 1-B>. Tallahassee, Fla.: Author. Fogel, L., & Nelson, R. (1983). The effects of special education labels on teachers' behavioral observations, checklist scores, and grading of academic work. Journal of School Psychology, 2..1, 241-251. Foley, M. ( 1982). Punishment and the disruptive student. Contemporary Education, 2....3.(2), 92-96. Forness, S. (1983). Diagnostic schooling for children or adolescents with behavioral disorders. Behavioral Disorders, .8.<3>, 176-187. Forness, S., & Cantwell, D. ( 19.82). DSM-III psychiatric diagnoses and special education categories. Journal of Special Education, .16.(1), 49-63. Forness, S., Sinclair, E., & Russell, A. (1984). Serving children with emotional or behavior disorders. American Journal of Orthopsychiatry, ~(1), 22-32. Fraser, C. (1980). Measurement in psychology. British Journal of Psychology, 1.1, 23-34. Fremont, T., & Wallbrown, F. (1979). Types of behavior problems that may be encountered in the classroom. Journal of Education, ill.(2), 5-24.


170 Gajar, A., & Hale, R. (1982). Factor analysis of the Quay Peterson Behavior Problem Checklist across racially different exceptional children. The Journal of Psychology, ..1..12, 2 87-293. Gallup, G. (1984). The 16th annual Gallup poll of the public's attitudes toward the public schools . ...P.b.i Delta ~appan, M<1>, 23-38. Garbarino, J. (1980). Some thoughts on school size and its effectson adolescent development. Journal of Youth and Adolescence, .9..(1), 19-29. Garibaldi, A. (Ed.) (1979). In-school alternatives to suspension conference report. Washington, D.C.: Department of Health, Education, and Welfare; National Institute of Education. Gaynor, J., & Gaynor, M. (1976). The Delaware function rater: A method of Quantifying classroom behavior. Wilmington, Delaware: Author. (ERIC Document Reproduction Service No. ED 133-983) Geiger, K., & Turiel, E. (1983). Disruptive school behavior and concepts of social convention in early adolescence. Journal of Educational Psychology, 1.2(5), 677-685. Gerard, R. (1970). Institutional innovations in juvenile corrections. Federal Probation, .3A., 37-44. Ghodsian, M., Fogelman, K., Lambert, L., & Tibbenham, A. ( 1980). Changes in behaviour ratings of a national sample of children. British Journal of Social and Clinical Psychology, ll., 247-256. Goldsmith, A. (1982). Codes of discipline: Developments, dimensions, di rec ti ons. Educ a ti on and Urban Society, llC2>, 185-195. Goldstein, A., Apter, S., & Harootunian, B. (1984). School violence. Englewood Cliffs, N.J.: Pr entice-Hal 1. Goocher, B. (1965). Effects of attitude and experience on the selection of frequency adverbs. Journal of Verbal Learning and Verbal Behavior,~' 193-195. Gorsuch, R. (1974). Factor analysis. Philadelphia: Saunders.


171 Goslin, D. (1969). Guidelines for the collection, maintenance and dissemination of pupil records. Troy, N. Y.: Russell Sage Foundation. Governor's Task Force on Disrupted Youth. ( 1973). Phase I report. Tallahassee, Fla.: State of Florida. Governor's Task Force on Disrupted Youth. (1974). Phase II report. Tallahassee, Fla.: State of Florida. Goyette, C., Conners, C. , . & Ulrich, R. (1978). Normative data on revised Conners' Parent and Teacher Rating Scales. Journal of Abnormal Child Psychology, .6., 221-236. Graham, S. ( 1981). Predictive and concurrent validity of the Jesness Inventory Asocial Index: When does a delinquent become a delinquent? Journal of Consulting and Clinical Psychology, .!!..2.(5), 740-742. Green, R., Bigelow, L., 0' Brien, P., Stahl, D., & Wyatt, R. ( 1977). The inpatient behavioral rating scale. Psychological Reports, !lQ., 543-549 Green, R., & Brydon, J. (1975). Investing in youth: An approachto discipline in urban schools. In National Education Association (Ed.), Discipline and learning, (pp. 107-114). Washington, D. C.: NEA. Greenwood, C., Walker, H., & Hops, H. ( 1977). Issues in social interaction/withdrawal assessment. Exceptional Children, !Li, 490-498. Gresham, F. (1982). A model for the behavioral assessment of behavior disorders in children. Journal of School Psychology, .2..Q.(2), 131-144. Gr i s e , P. ( 1 9 8 0 ) Fl or i d a ' s m i n i mum comp e t e n c y test i n g program for handicapped students. Exceptional Children, llC3), 186-191. Grose k, R. < 197 9 > Pro bl em Behavior Rating Seale. Binghampton, N. Y. : Broome Dev el o pm ental Center. (ERIC Document Reproduction Service No. EC 113 785) Guion, R. (1977). Content validity: Three years of talk --what's the action? Public Personnel Management, .6., 407-414. Guttman, L. ( 1944). A basis for scaling quantitative data. American Sociological Review, .2., 139-150.


172 Guttman, L. < 1945). Questions and answers about scale analysis. Report D-2. Washington, D. C.: Research Branch, Information and Education Division, Army Service Forces. Guttman, L. (1947a). On Festinger's evaluation of scale analysis. Psychological Bulletin, ll, 451-465. Guttman, L. (1947b). The Cornell technique for scale and intensity analysis. Educational and Psychological Measurement, l, 247-280. Hale, R., & Zuckerman, C. (1981). Application of confirmatory factor analysis to verify the construct validity of the Behavior Problem Checklist and the Bristol Social Adjustment Gui des. Edu ca ti onal and Psychological Measurement, li, 843-850. Harris, A., Kreil, D., &Orpet, R. (1977). The modification and validation of the Behavior Coding System for school settings. Educational and Psychological Measurement, n, 1121-1126. Harrop, L. (1979). Unreliability of classroom observation. Educational Research, 21.(3), 207-211. Harvey, R. (1982). The future of partial correlation as a means to reduce halo in performance ratings. Journal of Applied Psychology, fil, 171-176. Haskel 1, S. ( 197 9) A timereferenced Qsort technique for evaluating behavioral change. American Journal of Orthopsychiatry, !!..2.<1>, 109-120. Helson, H. (1969). Adaptation level theory. New York: Harper and Row. Hew et t , F. , & For n e s s , S. ( 1 9 8 2 ) learners < 3rd ed.). Boston: Education of exceptional Al 1 y n & B a con Hinton, J., Webster, S., &O'Neill, M. (1978). Simple behaviour rating scales for maximum security patients. British Journal of Social and Clinical P sy ch ol o gy , ll < SEP > , 2 5 52 5 9 Hirshoren, A., & Heller, G. ( 1979). Programs for adolescents with behavior disorders: The state of the art. The Journal of Special Education, D<3>, 275-281.


173 Holzbach, R. ( 1978). Rater bias in performance ratings: Superior, self-, and peer ratings. Journal of Applied Psychology, .6..3.(5), 579-588. Hom, P., DeNisi, A., Kinicki, A., & Bannister, B. (1982). Effectiveness of performance feedback from behaviorally anchored rating scales. Journal of Applied Psychology, .6..7.(5), 568-576. Horne, M., &Larrivee, B. (1979). Behavior rating scales: Need for refining normative data. Perceptual and Motor Skills, li, 383-388. Horowitz, L., Inouye, D., & Siegelman, E. (1979). On averaging judges' ratings to increase their correlation with an external criterion. Journal of Consulting and Clinical Psychology, il.(3), 453-458. Howell, K. (1978). Evaluation of behavior disorders. In R. Rutherford & A. Prieto (Eds.), Severe behavior disorders of children and youth. Bloomington, Ind.: Indiana University, Department of Special Education. (Monograph) Hulin, C. ( 1982). Some reflections on general performance dimensions and halo rating error. Journal of Applied Psychology, .6..7., 165-170. Hunter, J., Hunter, R., & Lopis, J. ( 1979). A causal analysis of attitudes toward leadership training in a classroom setting. Human Relations, .3.2< 11), 889-907 Ingram, G., Gerard, R., Quay, H. , & Levinson, R. ( 1970). An experimental program for the psychopathic delinquent. Journal of Research in Crime and Delinguency, .J.a.n, 24-30. Iv ancev ich, J. ( 1980) A 1 ongi tudi nal study of behavior al expectation scales: Attitudes and performance. Journal of Applied Psychology, .6..2<2), 139-146. Jacob, T., Grounds, L., &Haley, R. (1982). Correspondence between parents' reports on the Behavior Pro bl em Checklist. Journal of Abnormal Child P sy ch ol o gy , 1..Q. < 4 ) , 5 9 36 o 8 . J a CO b s ' R . ' Ka f ry ' D . ' & z e de C k ' s . ( 1 9 8 0 ) . of behaviorally anchored rating scales. Psych 01 ogy, n, 5 95-6 04. Expectations Personnel


174 Jesness, c. ( 1970). The Jesness behavior checklist manual. Palo Alto, CA: Consulting Psychologists Press. Jesness, c. ( 1972). The Jesness inventory manual. Palo Alto, CA: Consulting Psychologists Press. Jessor, R. ( 1982). Problem behavior and developmental transition in adolescence. The Journal of School Health, 22.(5), 295-300. Jessor, R., & Jessor, S. < 1977). Problem behavior and psychosocial development. New York: Academic Press. Johnson, S., Smith, P., & Tucker, S. ( 1982). Response format of the Job Descriptive Index: Assessment of reliability and validity by the multitrait_ mul timethod matrix. Journal of Applied Psychology, .6..1(4), 500-505. Jones, R., Reid, J., & Patterson, G. (1975). Naturalistic observations in clinical assessment. In P. McReynolds (Ed.>, Advances in psychological assessment (Vol. 3, pp.42-95). San Francisco: Jossey-Bass. Joshi, M. ( 1964). A factorial study of the problems of adjustment. Indian Psychological Review, 1., 42-46. Kahn, P., & Ribner, S. (1982). A brief behavior rating scale for children in a school setting. Psychology in the Schools, .19., 113-116. Kane, J., & Bernardin, H. (1982). Behavioral observation scales and the evaluation of performance appraisal effectiveness. Personnel Psychology, .32, 635-641. Kassin, S., & Wrightsman, L. ( 1983). The construction and validation of a juror bias scale. Journal of Research in Personality, ll, 423-442. Kaufman, A. ( 1975). Factor analysis of the WISC-R at 11 age levels between 6-1/2 and 16-1/2 years. Journal of Consulting and Clinical Psychology, .!..3., 135-147. Kaufman, A., Swan, W., & Wood, M. ( 1979). Dimensions of problem behaviors of emotionally disturbed children as seen by their parents and teachers. Psychology in the Schools, il.(2), 207-216.

PAGE 185

175 Kaz din, A., Esveldt-Dawson, K., & Loar, L. ( 1983). Correspondence of teacher ratings and direct observations of classroom behavior of psychiatric inpatient children. Journal of Abnormal Child P sy ch ol o gy , .ll < 4 ) , 5 4 95 6 4 Keaveny, T., & McGann, A. ( 1975). A comparison of behavioral expectation scales and graphic rating scales. Journal of Applied Psychology, .Q.Q(6), 695-703 . Kelley C. (1981). Reliability of the Behavior Problem Checklist with i nsti tuti onal iz ed male del inq ue nt s. Journal of Abnormal Child Psychology, 2.(2), 243-250. Kenny, D., & Berman, J. ( 1980). Statistical approaches to the correction of correlational bias. Psychological Bulletin, .8.8.<2), 288-295. Kerlinger, F. < 1972). Foundations of behavioral research (2nd ed.). New York: Holt, Rinehart and Winston. King, L., Hunter, J., & Schmidt, F. (1980). Halo in a multidimensional forced-choice performance evaluation scale. Journal of Applied Psychology, .Q.2(5), 507-516. Kingstrom, P., & Bass, A. ( 1981). A critical analysis of studies comparing behaviorally anchored rating scales (BARS) and other rating formats. Personnel Psychology, .l!!., 263-289. Kohn, M., Koretzky, M., & Haft, M. ( 1979). An Adolescent Symptom Checklist for Juvenile Delinquents. Journal of Abnormal Child Psychology, 1.(1), 15-29. Konopasek, D. (1983). Behavior evaluation scale. Behavioral Disorders, a<4), 280-281. Kowalski, G., Adams, M., &Gundlach, J. (1983). Structural determinants of juvenile offenses in school. Urban Education, ~(2), 179-190. Kreitler, S., & Kreitler, H. (1981). Test item content: Does it matter? Educational and Psychological Measurement, il, 635-642. Kulka, R., Klingel, D., & Mann, D. (1980). School crime and disruption as a function of student-school fit: An empirical assessment. Journal of Youth and Adolescence, 2.(4), 353-370.

PAGE 186

176 Lachar, D., & Gdowski, C. ( 1979). Problem-behavior factor correlates of Personality Inventory for Children profile scales. Journal of Consulting and Clinical Psychology, ll< 1 >, 39-48. Lahey, B., Green, K., & Forehand, R. ( 1980). On the independence of ratings of hyperactivity, conduct problems, and attention deficits in children: A multiple regression analysis. Journal of Consulting and Clinical Psychology, .!l.8.<5>, 566-574. Landy, F., Vance, R., & Barnes-Farrell, J. ( 1982). Statistical control of halo: A response. Journal of Applied Psychology, fu, 177-180. Landy, F., Vance, R., Barnes-Farrell, J., &Steel, J. ( 1980). Statistical control of halo error in performance ratings. Journal of Applied Psychology, .Q.2, 501-506. Larrivee, B., & Bourque, M. ( 1980). A comparative study of two rating formats. Measurement and Evaluation in Guidance, .12<4>, 223-228. Latham, G., Fay, C., & Saari, L. ( 1979). The development of behavioral observation scales for appraising the performance of foremen. Personnel Psychology, .32, 299-311. . Latham, G., & Wexley, K. ( 1977). Behavioral observation scales. Personnel Psychology, .3.Q, 255-268. Latham, G., Wexley, K., & Pursell, E. (1975). Training managers to minimize rating errors in the observation of behavior. Journal of Applied Psychology, .Q.Q, 550-555. Law Enforcement Assistance Administration. (October 15, 1979). Grants to schools operating alternative education programs. Federal register. Lawshe, C. ( 1975). A quantitative approach to content validity. Personnel Psychology, .2..a, 563-575. Ledingham, J. Younger, A., Schwartzman, A., & Bergeron, G. ( 1982). Agreement among teacher, peer, and self-ratings of children's aggression, withdrawal, and likability. Journal of Abnormal Child Psychology, 1..Q.( 3), 363-372.

PAGE 187

177 Lee, R., Malone, M., & Greco, S. ( 1981). Multi trait mul timethod-mul tirater analysis of performance ratings for law enforcement personnel. Journal of Applied Psychology, .M.C5), 625-632. Lessing, E., & Clarke, C. ( 1982). Reliability and validity of IJR Behavior Checklist scores: Number versus pathology level of symptoms. Journal of Abnormal Child Psychology, .1.Q(3), 337-362. Lessing, E., Williams, V., &Gil, E. (1982). Acluster analytically derived typology: Feasible alternative to clinical diagnostic classification of children. Journal of Abnormal Child Psychology, .1.Q(4), 451-482. Lessing, E., Williams, V., & Revelle, W. ( 1981). Parallel forms of the IJR Behavior Checklist for parents, teachers, and clinicians. Journal of Consulting and Clinical Psychoiogy, !l..2.< 1 >, 34-50. Lev in, H. < 1972). The costs to the nation of inadeQuate education. Washington, D.C.: Senate Select Committee on Equal Educational Opportunity. ( ERIC Document Reproduction Service No. ED 064 437) Levine, M. (1977). Sex differences in behavior ratings: Male and female teachers rate male and female pupils. American Journal of Community Psychology, 2(3), 347-353. Leyser, Y., & Abrams, P. ( 1982). Teacher attitudes toward normal and exceptional groups. The Journal of P sy ch ol o gy , 1..LQ , 2 27 2 3 8 Likert, R. ( 1932). A technique for the measurement of attitudes. Archives of Psychology, No. 140. Lines, P. ( 1972). The case against short suspensions. In IneQ ual ity in education. Boston: Center for Law and Education. Linton, H., & Chavez, C. (1979). Behavior checklist for junior high students. National Association of Secondary School Principals Bulletin, B<431), 119-121. Lissitz, R., & Green, S. (1975). Effect of the number of scale points on reliability: A Monte Carlo study. Journal of Applied Psychology, .6...Q, 10-13.

PAGE 188

178 Lobitz, G., &Johnson, S. (1975). Normal vs. deviant children: A multimethod comparison. Journal of Abnormal Child Psychology, .3., 353-374. Loeber, R. (1982). The stability of antisocial and delinquent child behavior: A review. Child Development, 23., 1431-1446. Loranger, M., Lacroix, O., & Kaley, . R. (1982). Validity of teachers' evaluations of students' social behavior. Psychological Reports, 2.1, 915-920. Lovitt, T. ( 1967). disabilities. Lufler, H. (1982). di sci pl ine. 169-184. Assessment of children with learning Exceptional Children, .3..!l, 233-239. Past court cases and future school Education and Urban Society, .1!!<2>, Lyness, K., & Cornelius, E. ( 1982). A comparison of holistic and decomposed judgment strategies in a performance rating simulation. Organizational Behavior and Human Performance, 2...2., 21-38. Madle, R., Neisworth, J., & Kurtz, P. (1980). Biasing of hyperkinetic behavior ratings by diagnostic reports: Effects of observer training and assessment method. Journal of Learning Disabilities, .U., 35-38. Marwit, K., Marwit, S., & Walker, E. ( 1978). Effects of student race and physical attractiveness on teachers' judgments of transgression. Journal of Educational P sy ch ol o gy , 1JJ., 9 11 9 1 5 Marwit, S. (1982>. Students' race, physical attractive ness and teachers' judgments of transgressions: Follow-up and clarification. Psychological Reports, .5...Q_, 242. Masterson, S. ( 1968). The adjective checklist technique: A review and critique. In P. McReynolds (Ed.), Advances in psychological assessment (pp. 275-312). Palo Alto, Calif.: Science and Behavior Books. Matusek, P., & Oakland, T. ( 1979). Factors influencing teachers' and psy chol ogi st s' recommendations regarding special class placement. Journal of School P sy ch ol o gy , ll, 1 1 61 2 5 Mayer, G., & Butterworth, T. (1979). A preventive approach to school violence and vandalism: An

PAGE 189

179 experimental study. Personnel and Guidance Journal, .51.(9), 436-441. Mccarney, S., Leigh, J., & Cornbleet, J. ( 1983). Behavior evaluation scale. Columbia, Mo.: Educational Services. McCarthy, J., & Paraskevapoulas, J. (1969). Behavior patterns of learning disabled, emotionally disturbed, and average children. Exceptional Children, .3..6., 69-74. McDermott, P. ( 1980). Principal components analysis of the revised Bristol Social Adjustment Guides. British Journal of Educational Psychology, 5...Q., 223-228. McDermott, P. < 1981 >. A syndromic typology for analyzing schoolchildren's disturbed social behavior. Philadelphia: University of Pennsylvania. McDermott, P., & Hale, R. ( 1982). Validation of a systems-actuarial computer process for multidimensional classification of child psychopathology. Journal of Clinical Psychology, .l8.< 3), 477-485. McKelvie, S. (1978). Graphic rating scales--How many categories? British Journal of Psychology, Q.2, 185-202. McKinney, J., & Forman, S. ( 1982). Classroom behavior patterns of EMH, LD, and EH students. Journal of School Psychology, 2Jl< 4 >, 271-279. Mcsweeney, A., & Trout, B. C 1979). Predicting treatment outcome through profile analysis of the Jesness Behavior Checklist. Paper presented at the annual convention of The Association for Behavior Analysis, Dearborn, Mi ch iga n. (ERIC Document Reproduction Service No. ED 177 436) Mendelsohn, M., & Erdwins, C. (1978). The disruptive behavior scale: An objective assessment of unmanageable social behaviors in adolescents. Journal of Clinical Psychology, ~(2), 126-128. Meredith, G. ( 1975). Toward a systems approach to student-based ratings of instruction. Journal of Psychology, 2.1, 235-246.

PAGE 190

180 Meredith, G. ( 1981). Pref erred 1 ength of seal es for students' evaluation of instruction. Perceptual and Motor Skills, il, 490. Mesinger, J. (1982). Alternative education for behavior disordered and delinquent adolescent youth: What works--Maybe? Behavior Disorders, IC2), 91-100. Messick, S. (1964). Personality measurement and college performance. Proceedings of the 1963 invitational conference on testing problems. Princeton, N.J.: Educational Testing Service. Messick, S. ( 1965). Personality measurement and the ethics of assessment. American Psychologist, .2.Q, 136-142. Messick, S. (1980). Test validity and the ethics of assessment. American Psy chol ogi st, .3.5.< 11 > , 1o12-1027. Miller, L. ( 1967). Louisville behavior check list for males 6-12 years of age. Psychological Reports, , 885-896. Miller, L. (1980). Dimensions of adolescent psychopathol ogy. Journal of Abnormal Child Psychology, .8. < 2 > , 161-173. Mischel, W. (1969). Continuity and change in personal ity. American Psychologist, 24, 1012-1018. Mitchell, S., & Rosa, P. ( 1981). Boyhood behavior problems as precursors of criminality: A fifteen-year follow-up study. Journal of Child Psychology and Psychiatry, 2..2., 19-33. Mooney, R. <.1942). Surveying high school students' problems by means of a problem check list. Educational Research Bulletin. 2.1, 57-69. Morris, J., & Arrant, D. ( 1978). Behavior ratings of emotionally disturbed children by teachers, parents,and school psychologists. Psychology in the Schools, ~<3>, 450-455. Moses, w. (1974). A behavior checklist for teachers' use in selecting students for a behavior counseling program. Unpublished manuscript, University of South Florida.

PAGE 191

181 Moses, w. c 1976). Final report on Meaningful Alternatives to Regular Schooling--MARS (Proj. No. 75-AS-08A104). Dade City, Fla.: Pasco County School District. Moses, w. (1981). Conduct code violations, 1980-81. Jacksonville, Fla.: Duval County School District. Mossholder, K., & Giles, W. ( 1983). The use of partial correlation to control halo in performance ratings. Educational and Psychological Measurement,~. 977-984. Mosteller, F., & Tukey, J. ( 1977). regression. Reading, Mass.: Data analysis and Addison-Wesley. Moyer, T., & Motta, R. ( 1982). Alienation and school adjustment among Black and white adolescents. Journal of Psychology, 1.12, 21-28. Murphy, K. (1982). Difficulties in the statistical control of halo. Journal of Applied Psychology, .1, 161-164. Murphy, K., Martin, C., & Garcia, M. (1982). Do behavioral observation scales measure observation? Journal of Applied Psychology, .6.1<5), 562-567. National Education Association (Ed.) (1975). Discipline and learning. Washington, D. c.: NEA. Nati anal Edu ca ti on Association. ( 1980) The 1 980 teacher-opinion poll. Washington, D.C.: Author. National Institute of Education . C 1978). Violent schools--Safe schools: The safe school study report to Congress. Washington, D.C.: Department of Health, Education, and Welfare. National School Public Relations Association. ( 1973). Discipline crisis in schools: The problem, causes, and search for solutions. Arlington, Va.: Author. Neumann, L., & Neumann, Y. ( 1981). Comparison of six lengths of rating scales: Students' attitudes toward instruction. Psychological Reports, 48, 399-404. New York State United Teachers. ( 1979). Survey of classroom stressors. New York: American Federation of Teachers.

PAGE 192

182 Nielsen, A., & Gerber, D. ( 1979). truancy in early adol e see nee. 313-326. Psychosocial aspects of Adolescence, ll(54), Nihira, K., Foster, R., Shellhaas, M., & Leland, H. < 1969). AAMD adaptive behavior scale. Washington, D.C.: American Association on Mental Deficiency. O'Leary, K., &Johnson, S. (1979). Psychological assessment. In H. 0.uay & J. Werry (Eds.), Psychopathological disorders of childhood (2nd ed., pp. 162-175). New York: Wiley. O'Leary, K., & Kent, R. (1973). Behavior modification for social action. In L. Hamerlynck, L. Bundy, & E. Mash (Eds.), Behavior change: Methodology, concepts, and practice (pp. 69-96). Champaign, Ill.: Research Press. Ostrov, E., Marohn, R., Offer, D., Curtiss, G., & Feczko, M. ( 1980). The adolescent antisocial behavior check list. Journal of clinical psychology, .3..Q.(2), 594-601. Parducci, A. ( 1968). Often is often. American Psy chol ogi st, n, 828. Patterson, G., Ray, R., Shaw, D., & Cobb, J. (1969). A manual for coding of family interactions (Document No. 01234). New York: Microfiche Publications. Peed, S., & Pinsker, M. (1978). Behavior change procedures. Education and Urban Society, 1.Q(4), 501-520. Pepper, S., &. Prytulak, L. ( 1974). Sometimes frequently means seldom: Context effects in the interpretation of quantitative expression. Journal of Research and Personality, a, 95-101. Peterson, D. ( 1961). Behavior problems of middle childhood. Journal of Consulting Psychology, 2..2, 205-209. Pettegrew, L., &Wolf, G. (1982). Validating measures of teacher stress. American Educational Research Journal, 1.2.<3), 373-396. Pinellas County School District. (Rev. 8/82). Alternative school referral form, PCS Form 1709. St. Petersburg, Fl a: Author.

PAGE 193

183 Pinellas County Schools. < 1983 >. Student code of conduct. St. Petersburg, Fla.: Author. Pisarra, J., & Giblette, J. ( 1981). Administrators' perceptions of aggressive behaviors. National Association of Secondary School Principals Bulletin, .6.5_ ( 4 4 1 ) , 4 95 3 Pohl, N. ( 1981). Scale considerations in using vague quantifiers. Journal of Experimental Education, ~(4), 235-240. Poulton, E. (1976). Quantitative subjective assessments are almost always biased, sometimes completely misleading. Bulletin of the British Psychological Society, 2.2, 385-387. Prinz, R., & Kent, R. ( 1978). Recording parent-adolescent interactions without the use of frequency or interval-by-interval coding. Behavior Therapy, .2., 602-604. Pursell, E., Dossett, D., & Latham, G. ( 1980). Obtaining valid predictors by minimizing rating errors in the criteria. Personnel Psychology, .3..3., 91-96. Quay, H. ( 1964). Personality dimensions in delinquent males as inferred from the factor analysis of behavior ratings. Journal of Research into Crime and Delinquency, 1-, 33-37. Quay, H. (1977). Measuring the dimensions of deviant behavior: The behavior pro bl em checklist. Journal of Abnormal Child Psychology, 2<3>, 277-287. Quay, H. ( 1978). Behavior disorders in the classroom. Journal of Research and Development in Education, ll<2>, 8-17. Quay, H., & Peterson, D. ( 1967 >. Manual for the behavior problem checklist. Champaign, Ill. : Children's Research Center, University of Illinois. Quay, H., & Peterson, D. problem checklist. Quay, H., & Peterson, D. problem checklist. State University. (1975). Manual for the behavior Miami, Fla.: Author. < 1979). Manual for the behavior New Brun sw i c k, N. J . : Rut g e rs Reed, M., & Edelbrock, C. (1983). Reliability and validity of the Direct Observation Form of the Child

PAGE 194

184 Behavior Checklist. Journal of Abnormal Child Psychology ll C 4 > 5 21-5 3 o. Reeves. J Perkins. M & Hollon. T. (1978). Issues in the effective use of the Walker Problem Behavior Identification Checklist. Spartanburg. s. c.: Spartanburg School District Six. (ERIC Document Reproduction Service No. ED 169 115) Reisberg. L.. Fudell. I.. & Hudson. F. (1982). Comparison of responses to the Behavior Rating Profile for mild to moderate behaviorally disordered subjects. Psychological Reports. 2..Q.. 136-138. Roberts. A & Jenkins, P. ( 1982). Teachers' perceptions of assertive and aggressive behavior at school: A discriminant analysis. Psychological Reports. 2..Q., 827-832. Roberts, M., Milich, R., Loney, J., & Caputo, J. (1981). A multitrait-multimethod analysis of variance of teachers' ratings of aggression, hyperactivity, and inattention. Journal of Abnormal Child Psychology, .2.(3), 371-380. Robins, L. (1966). Deviant children grown up. Baltimore: Williams and Wilkins. Rosenthal, R., & Jacobson, L. ( 1968). Pygmalion in the classroom. New York: Holt. Rinehart. and Winston. Ross, A., Lacey, H., & Parton. D. ( 1965). The development of a behavior checklist for boys. Child Development, .3.,Q_ , 1 0 1 31 0 27 Ross. M., & Salvia, J. ( 1975). Attractiveness as a biasing factor in teach er judgments. American Journal of Mental Deficiency. 80, 96-98. Rubel, R. (1977). The unruly school: Disorders, disruptions, and crimes. Lexington, Mass.: Lexington Books. Rubel. R. (1980). Crime and violence in public schools: Emerging perspectives of the 1980' s. Contemporary Education, 22.(1), 5. Russell, P., Lankford, M., & Grinnell, R. (1981). A management ev al ua ti on seal e for social workers. Journal of Psychology, .1..Q1, 127-130.

PAGE 195

185 Rutter, M. ( 1967). A children's behaviour questionnaire for completion by teachers: Preliminary findings. Journal of Child Psychology and Psychiatry, .8., 1-11. Ryan, F. ( 1958). Trait ratings of high school students by teachers. Journal of Educational Psychology, .!!..2.(3), 124-128. Safer, D., Heaton, R., & Parker, F. (1981). A behavioral program for disruptive junior high school students: Results and follow-up. Journal of Abnormal Child P sy ch 01 o gy , .2. c 4 > , 4 8 3 -4 9 4 . Sal via, J., & Yssel dyke, J. < 1981 >. Assessment procedures in special and remedial education (2nd ed.>. Boston: Houghton-Mifflin. Sanson-Fisher, R., & Mulligan, B. ( 1977). The validity of a behavioral rating scale. The Journal of Multivariate Behavioral Research, 1.2, 357-372. Schenck, S. ( 1980). The diagnostic/instructional link in Individualized Education Programs. The Journal of Special Education, .1.!L, 3 37-345. Schriesheim, C. (1981a). The effect of grouping or randomizing items on leniency response bias. Educational and Psychological Measurement, ll, 401-411. Schriesheim, C. (1981b). Leniency effects on convergent and discriminant validity for grouped questionnaire items: A further investigation. Educational and Psychological Measurement, ll, 1093-1099. Schriesheim, C., & DeNisi, A. ( 1980). Item presentation as an influence on questionnaire validity. Educational and Psychological Measurement, ~' 175-182. Schriesheim, C., & Hill, K. ( 1981). Controlling acquiescence response bias by item reversals: The effect on q ue sti onnai re validity. Edu ca ti onal and Psychological Measurement, ll, 1101-1114. Schriesheim, C., & Schriesheim, J. ( 1974). Development and empirical verification of new response categories to increase the validity of multiple response al terna ti v e q ue sti onnai res. Edu ca ti on and Psychological Measurement, .3..!!., 877-884.

PAGE 196

186 Schriesheim, C., & Schriesheim, J. ( 1978). The invariance of anchor points obtained by magnitude estimation and pair-comparison treatment of complete ranks scaling procedures. Educational and Psychological Measurement, .3..8., 977-983. Schwab, D., Heneman, H., & DeCotiis, T. (1975). Behaviorally anchored rating scales: A review of the 1 i tera tur e. Personnel Psychology, 2..8., 5 49-56 2. Se a r 1 s , E. , Is e t t , R. , & B ow de rs , T. ( 1 9 8 1 ) Exam in a ti on of item weighting on the Adaptive Behavior Scale Part II. Perceptual and Motor Skills. 23., 154. Seidman, E., Rappaport, J., Kramer, J., Linney, J., Herzberger, S., & Alden, L. ( 1979). Assessment of classroom behavior: A multiattribute, multisource approach to instrument development and validation. Journal of Educational Psychology, 1.1(4), 451-464. Sherif, M. ( 1954). Integrating field work and laboratory in small group research. American Sociological Review. 11., 759-771. Sherry, s. c 1979). Behavioral characteristics of educable mentally retarded, emotionally handicapped, and learning disabled students. Washington, D.C.: Department of Health, Education and Welfare, Bureau of Education for the Handicapped. Shuller, D., & McNamara, J. ( 1976). Expectancy factors in behavioral observation. Behavior Therapy, I, 519-527. Siegel, L., Dragovich, S., & Marholin, D. (1976). The effects of biasing information on behavioral observations and rating scales. Journal of Abnormal Child Psychology, !< 3 >, 221-233. Sikes, M. ( 1975). Law and order and race in the classroom. In National Education Association (Ed.). Discipline and learning Cpp.115-119). Washington, D. C. : N EA. Silvern, L. (1978). Masculinity-feminity in children's self-concepts: The relationship to teacher's judgments of social adjustment and academic ability, classroom behavior, and popularity. Sex Roles, !(6), 929-949. Sinclair, E. ( 1980). Relationship of psychoeducational diagnosis to educational placement. Journal of School Psychology, .1-8., 349-353.

PAGE 197

187 Sinclair, E., & Kheifets, L. ( 1982). Use of clustering techniques in deriving psychoeducational profiles. Contemporary Educational Psychology, 1.., 81-89. Smith, c. (1976). Identification of youngsters with emotional disabilities. Des Moines, Iowa: Iowa State Department of Public Instruction. ( ERIC Document Reproduction Service No. ED 177 750) Smith, P., & Kendall, L. (1963). Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. Journal of Appl i e d P sy ch ol o gy , ll < 2 > , 1 4 91 5 5 Solomon, D., & Kendall, A. (1975). Teachers' perceptions of and reactions to misbehavior in traditional and open classrooms. Journal of Educational Psychology . ..6.1..(4), 528-530. Solomon, D., & Kendall, A. ( 1977). Dimensions of children's classroom behavior, as perceived by teachers. American Educational Research Journal, il(4), 411-421. Spector, P. ( 1976). Choosing response categories for summated rating scales. Journal of Applied P sy ch ol o gy , il, 3 7 43 7 5 Spivack, G., Haimes, P., & Spotts, J. < 1967). Devereux adolescent behavior rating scale manual. Pennsylvania: The Devereux Foundation. Spivack, G., & Swift, M. ( 1966). The Devereux elementary school behavior rating seal es. Journal of Special Education, 1, 71-90. Spivack, G. , & Swift, M. < 1971> Hahnemann High School Behavior rating scale (HHSB). Philadelphia: Hahnemann Medical College and Hospital, Department of Mental Heal th Services. Spivack, G., & Swift, M. ( 1973). The classroom behavior of children: A critical review of teacher administered rating scales. The Journal of Special Education, 1..< 1), 55-89. (Monograph) Spivack, G., & Swift, M. (1977). The Hahnemann High School Behavior (HHSB) rating scale. Journal of Abnormal Child Psychology, 2.(3), 299-307.

PAGE 198

188 Stavig, G. (1978). The median controversy. Perceptual and Motor Skills, ll, 1175-1178. Stav ig, G. C 1982 >. The normalized mean. Perceptual and Motor Skills, 2..!i., 51-54. Stenner, A., & Smith, M. C 1982). Testing construct theories. Perceptual and Motor Skills, 22, 415-426. Stewart, D., & Deiker, T. (1976). An item factor analysis of the Mooney Pro bl em Check Li st. Edu ca ti onal and Psychological Measurement, .3..6., 509-513. Stott, D. (1971). Classification of behavior disturbance among school-age students: Principles, epidemiology, and syndromes. Psychology in the Schools, .8., 232-239. Stott, D. C 1972). Manual for the Bristol social adjustment guides. San Diego: Educational and Industrial Testing Service. Stott, D. ( 1978). Epidemiological indicators of the origins of behavior disturbance as measured by the Bristol Social Adjustment Guides. Genetic Psychology Monographs, il, 125-159. Stott, D., Marston, N., & Neill, s. < 1975 >. Taxonomy of behavior disturbance. London: University of London Press Ltd. Stott, D., & Sykes, E. < 1956). The Bristol social adjustment guides. London: University of London Press. Stott, D., & Wilson, D. (1977). The adult criminal as juvenile. British Journal of Criminology, 1., 45-57. Strahan, R. ( 1980). More on averaging judges' ratings: Determining the most reliable composite. Journal of Consulting and Clinical Psychology, !l.8.<5>, 587-589. Strain, p., Cooke, T., & Apolloni, T. (1976). Teaching exceptional children: Assessing and modifying social behavior. New York: Academic Press. Sulzbacher, S. ( 1973). Psychotropic medication with children: An evaluation of procedural biases in results of reported studies. Pediatrics, 21.(3), 513-517.

PAGE 199

189 Sundberg, N. ( 1961) The practice of psychological testing in clinical services in the United States. American Psychologist, il., 79-83. Tarte, R. , Vernon, C. , Luke, D. , & Cl ark, H. ( 1982) Comparison of responses by normal and deviant populations to Louisville Behavior Checklist. Psychological Reports, 20., 99-106. Taylor, R., Warren, S., & Slocumb, P. (1979). Categorizing behavior in terms of severity. American Journal of Mental Deficiency, .8.3.<4>, 411-414. Thorndike, E. ( 1920). A constant error in psychological ratings. Journal of Applied Psychology, .!., 25-29. Thorne, F. ( 1978). Methodological advances in the validation of inventory items, scales, profiles, and interpretations. Journal of Clinical Psychology, .3..!_(2), 283-301. Thurstone, L., & Chave, E. (1929). The measurement of attitude. Chicago: University of Chicago Press. Tinto, V., Paclilio, E., & Cullen, F. ( 1978). The social patterning of deviant behaviors in school. Washington, D.C.: Department of Health, Education, and Welfare. Torgerson, W. ( 1958). Theory and methods of sealing. New York: Wiley. Toul iatos, J., & Lindholm, B. ( 1981). Congruence of patients' and teachers' ratings of children's behavior problems. Journal of Abnormal Child P sy ch ol o gy 2. < 3 > , 3 4 7 3 5 4 Tygart, C. ( 1980). Student social structures and/or subcultures as factors in school crime: Toward a paradigm. Adolescence, .12(57), 13-22. Tzeng, O. ( 1983). A comparative evaluation of four response formats in per son al i ty ratings. Edu ca ti onal and Psychological Measurement, !!..3., 935-950. United States Department of Justice. (1982). Assessing the relationship of adult criminal careers to juvenile careers: A summary. Washington, D. c.: Author.

PAGE 200

190 Vance , R. , Kuh n er t , K. , & Fa r r , J ( 1 9 7 8 ) Int e r v i ew judgments: Using external criteria to compare behavioral and graphic scale ratings. Organizational Behavior and Human Performance, 2..2, 279-294. Vergon, C., & Williams, J. (Eds.). ( 1978). School suspension practices in Michigan. Student discipline: Policies. programs. and procedures. Ann Arbor, Mich.: University of Michigan, The Governor's Task Force on Disruptive Youth. Vinter, R., Sarri, R., Vorwaller, D., & Schafer, W. C 1966 > Pupil behavior inventory: A manual for administration and scoring. Ann Arbor, Mich.: Campus Publishers. Waksman, S., & Loveland, R. (1980). The Portland problem behavior checklist. Psychology in the Sch cols, 11.(1), 25-29. Walker, H. (1970). The Walker problem behavior identification checklist. Los Angeles: Western Psychological Services. Walker, H. C 1979). The acting-out child: Coping with classroom disruption. Boston: Allyn and Bacon. Walker, H., & Holland, F. ( 1979). Issues, strategies, and perspectives in the management of disruptive child behavior in the classroom. Journal of Education, li...1.(2), 25-36. Walker, H., & Hops, H. ( 1976). Use of normative peer data as a standard for evaluating classroom treatment effects. Journal of Applied Behavior Analysis, 2., 159-168. Wallbrown, J., Wallbrown, F., & Blaha, J. ( 1976). The stability of teacher ratings on the Devereux Elementary School Behavior Rating Seale. Journal of Experimental Education, 44(4), 20-22. Waters, L., Reardon, M., & Edwards, J. (1982). Multitrait multimethod analysis of three rating formats. Perceptual and Motor Skills, 22., 927-933. Weinrott, M. ( 1979). The utility of tracking child behavior. Journal of Abnormal Child Psychology, 1.(3), 275-286.

PAGE 201

191 Werry, J., Methuen, R., Fitzpatrick, J., & Dixon, H. (1983). The interrater reliability of DSM III in children. Journal of Abnormal Child Psychology, 1.1(3), 341-354. White, W. ( 1977). The use of the Behavioral Maturity Scale in formative and summative evaluation of young children. Education, .2...8.< 1), 17-22. Willems, E. (July 13-19, 1975). Relations of models to methods in behavior ecology. Paper presented at the Biennial Conference, International Society for the Study of Behavior Development, Guilford, Surrey, Engl and. Willingham, W., & Jones , M. ( 1958). On the identification of halo through analysis of variance. Educational and Psychological Measurement, .18., 403-407. Wixson, S. ( 1980). Two resource room models for serving learning and behavior disordered pupils. Behavioral Disorders. 2.<2>, 116-125. Wodarksi, J., & Pedi, S. (1978). The empirical evaluation of the effects of different group treatment strategies. Journal of Clinical Psychology, .3J! < 2) , 471-481. Yard, G. < 1977). Behavioral disorders. Des Moines, Iowa: Drake University, Midwest Regional Resource Center. (ERIC Document Reproduction Service No. ED 163 707) Ysseldyke, J., & Algozzine, B. (1982). Bias among professionals who erroneously declare students eligible for special services. Journal of Experimental Education, .5..Q.(4), 223-228. Ysseldyke, J., Algozzine, B., Regan, R., & McGue, M. (1979). The influence of test scores and naturally occurring pupil characteristics on psychoeducational decision making with children , 167-177.

PAGE 202

192 Ysseldyke, J., Algozzine, B., & Richey, L. ( 1982). Judgments under uncertainty: How many children are handicapped? Exceptional Children, .!l.8.<6>, 531-534. Ysseldyke, J., & Marston, D. ( 1982). Gathering decision making information through the use of non-test-based methods. Measurement and Evaluation in Guidance, 12 ( 1 ) , 5 86 9 Zammuto, R., London, M., & Rowland, K. (1982). Organization and rater differences in performance a ppr ai sal s. Personnel Psych ol o~y, 25., 6 43-6 58.

PAGE 203

BIOGRAPHICAL SKETCH

William L. Moses was born on March 20, 1936, in Macon, Georgia. He attended public schools in Georgia. In 1961 he graduated, summa cum laude, from Mercer University with a Bachelor of Arts degree in economics and psychology. From 1961 to 1972 he was employed in various business ventures.

From 1972 to 1974 Mr. Moses was a graduate student in the Departments of Psychology and Education at the University of South Florida, receiving the Master of Arts degree in school psychology. During this period he developed and coordinated the Adolescent Resocialization Residential Project at Memorial Hospital in Tampa, Florida. From 1974 to 1976 Mr. Moses was employed in Pasco County, Florida, as a school psychologist and coordinator of a federally funded alternative schooling project, which was developed from his master's thesis research.

In 1976 Mr. Moses joined Pasco-Hernando Community College as an instructor in psychology and business. In 1979 he received a one-year sabbatical leave to pursue a doctoral degree in counselor education at the University of Florida. After an additional year of leave conducting dissertation research while employed as a school psychologist in Duval


County, Florida, Mr. Moses returned to Pasco-Hernando Community College, where he is currently employed. Mr. Moses maintains an active family counseling practice and is licensed in Florida as a mental health counselor. He is also a National Certified Counselor and a Certified Clinical Mental Health Counselor.


I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Roderick McDavis, Chairman
Professor of Counselor Education

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Larry Loesch, Cochairman
Professor of Counselor Education

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

This dissertation was submitted to the Graduate Faculty of the College of Education and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

May 1986

Dean, College of Education

Dean, Graduate School