Parent-child interaction therapy with behavior problem children


Material Information

Parent-child interaction therapy with behavior problem children : maintenance of treatment effects in the school setting
Alternate title:
Maintenance of treatment effects in the school setting
Physical Description:
vii, 183 leaves : ill. ; 29 cm.
Funderburk, Beverly White, 1959-
Publication Date:


Subjects / Keywords:
Parent-Child Relations (mesh)
Child Behavior Disorders -- therapy (mesh)
Child Behavior Disorders -- psychology (mesh)
Schools (mesh)
Family Therapy (mesh)
Follow-Up Studies (mesh)
bibliography (marcgt)
theses (marcgt)
non-fiction (marcgt)


Thesis (Ph. D.)--University of Florida, 1993.
Includes bibliographical references (leaves 172-182).
Statement of Responsibility:
by Beverly White Funderburk.
General Note:

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
oclc - 50706665
System ID:

Full Text

I would like to thank each of my committee members, Dr.

Sheila Eyberg, Dr. Jane Pendergast, Dr. Russell Bauer, Dr.

Stephen Boggs, and Dr. Suzanne Johnson, for providing

conceptual guidance in the formulation and realization of

this project. I especially would like to thank the chair of

my committee, Dr. Eyberg, for her meticulous professional

guidance, her unfailing enthusiasm, her patience, and her

friendship during this research as throughout my graduate

training. Special thanks go to Dr. Pendergast for

her valuable statistical and editorial guidance and for

giving so freely of her time. I would also like to thank

Dr. James Algina for statistical consultation which greatly

facilitated the completion of this project.

I am grateful to Katharine Newcomb, Cheryl McNeil, and

Toni Eisenstadt for their primary contributions to this

work, and to Janet Bessmer, Gloria Medige, Laura Mee, Beth

Onufrak, Marc Wruble, and Peter Demaras for the long hours

they spent coding classroom data. I would also like to

thank the teachers who provided ratings and allowed us to

enter their classrooms.

Finally, I would like to thank family and friends who

have sustained me throughout this research. My husband,

Chuck, provided untold emotional support and practical

assistance. My friend, Louise Kerslake, gave me peace of

mind by taking such excellent care of my son while I

completed this manuscript. I thank my mother, Ann White,

for instilling in me a love of learning. I thank my father,

James White, for teaching me not to settle for the ordinary.


Correlates of Disruptive Behavior Disorders
Treatment of Disruptive Behavior Disorders
School Generalization
Treatment Maintenance Research Methodology
Statement of Purpose
Hypotheses

2 METHOD

Subjects
Measures
Procedures

3 RESULTS

Treatment Subject Analysis of Variance
Treatment vs. Comparison Children
Comparison of Two vs. Three Days of Classroom Observation

Abstract of Dissertation Presented to the Graduate
School of the University of Florida in Partial
Fulfillment of the Requirements for the
Degree of Doctor of Philosophy



Beverly White Funderburk

August 1993

Chairman: Sheila Eyberg, Ph.D.
Major department: Clinical and Health Psychology

Followup school assessments were conducted 12 months

and 18 months following completion of Parent-Child

Interaction Therapy, a behavioral family therapy for

preschool children with disruptive behavior disorders that

integrates traditional and behavioral methods. The 12

subjects all displayed significant home and school behavior

problems prior to treatment, and all showed clinically

significant improvement in home behavior after completing

the 14-session treatment. Ten subjects had been evaluated

in a previous school generalization study which found

significant improvements on observations and teacher ratings

of classroom conduct problem behaviors, but not on measures

of attention and activity level. In contrast to previous

reports of parent-child therapy, school generalization was

demonstrated without direct classroom intervention.

At the 12-month followup subjects maintained post-

treatment improvements on observational and teacher rating

measures of classroom conduct problems and showed further

improvements in social competency. Subjects fell within the

mid-range of classroom behavior problems relative to

classroom comparison subjects on measures of conduct

problems and social competency. At the 18-month followup,

subjects maintained improvements in compliance, but

demonstrated declines on most measures into the range of

pre-treatment levels. Subjects fell at the high end of the

normal range of classroom conduct problems relative to

classroom comparison subjects at the 18-month followup.

Treatment subjects received a high rate of special

educational services at both followup assessments.



The Disruptive Behavior Disorders listed by DSM-III-R

include Attention-deficit Hyperactivity Disorder,

Oppositional Defiant Disorder, and Conduct Disorder

(American Psychiatric Association, 1987). Attention-deficit

Hyperactivity Disorder (ADHD), defined as having onset

before the age of seven, is characterized by a pattern of

developmentally inappropriate inattention, impulsiveness,

and hyperactivity. Oppositional Defiant Disorder (ODD) is

characterized by a pattern of developmentally inappropriate

negativistic, hostile, and defiant behavior that is usually

most evident with familiar adults. Conduct Disorder

involves a pattern of more serious violations of societal

norms, including physical aggression, truancy, theft, and

vandalism. These categories of disruptive behavior, often

grouped under the term "externalizing behaviors," are an

area of major clinical concern due to their prevalence and


Externalizing behavior problems account for between

one-third and three-quarters of all child referrals to

mental health agencies (Wells & Forehand, 1985). Estimates

of the prevalence of hyperactivity range from 3% to 15% in nonclinical samples

and to 50% and above in clinic-referred youngsters (Whalen &

Henker, 1991), with boys showing a higher incidence than

girls. There is considerable overlap among the categories

of disruptive behavior, with many children carrying more

than one diagnosis (American Psychiatric Association, 1987).

Longterm stability of antisocial behavior is predicted by

the co-occurrence of disorders; other prognostic indicators

include the density of the problem behavior, the occurrence

of problem behaviors in multiple settings, and earlier age

of onset (Loeber, 1982).

Despite the fact that normal preschoolers often engage

in specific disruptive behaviors (Crowther, Bond, & Rolf,

1981), the pervasive patterns of behavior associated with

ODD and ADHD are increasingly recognized as prognostic of

later problems, even when diagnosed in the preschool age

range. Campbell notes that "As a group, young hyperactive

children can be differentiated from controls on measures of

both core features [of ADHD] and symptoms of conduct

disorder" (Campbell, 1985, p. 414). In a prospective study,

67% of children whose externalizing behavior problems

persisted from age three to age six met diagnostic criteria

for an externalizing disorder by age nine. At

age nine, this persistent problem group also received a

higher rate of special education services and mental health

services than comparison groups (Campbell & Ewing, 1990).

In line with these findings, studies following

hyperactive boys into adolescence and early adulthood have

consistently found impaired academic achievement, continuing

attention problems, and higher rates of antisocial behavior

(Barkley, Fischer, Edelbrock, & Smallish, 1991; Fischer,

Barkley, Edelbrock, & Smallish, 1990; Mannuzza, Gittelman

Klein, & Addalli, 1991; Mannuzza, Klein, Bonagura, Malloy,

Giampino, & Addalli, 1991). It is estimated that

approximately 70% of children diagnosed with ADHD continue

to have difficulties as adolescents, regardless of whether

they undergo treatment with stimulant medication

(Hechtman, 1989). Overall, childhood antisocial behavior is

considered the strongest predictor of adult antisocial

behavior (Robins, 1978).

Correlates of Disruptive Behavior Disorders

Several factors have been linked with the development

of disruptive behavior disorders in children. These include

an association between lower socioeconomic status (SES) and

higher rates of problem behavior (Richman et al., 1982;

Weiss & Hechtman, 1986), and transmission of familial

patterns of antisocial behavior across generations (Robins,

1981). Negative parent-child interactions in childhood,

specifically maternal directiveness and parent-child

conflict during clinic observations, have been shown to

predict higher rates of ODD and aggression in adolescence

(Barkley et al., 1991). Maternal negative control has

emerged as an important predictor variable for later

behavior problems in Campbell's prospective studies of

externalizing preschoolers (Campbell & Ewing, 1990;

Campbell, March, Pierce, Ewing, & Szumowski, 1991). Higher

rates of negative maternal control were noted even when

possible maternal selection bias was controlled by selecting

subjects based on teacher ratings rather than maternal

report (Campbell et al., 1991).

Treatment of Disruptive Behavior Disorders

Meta-analyses of treatment outcome studies with

children offer some encouragement regarding the utility of

therapy for childhood psychological disorders. A meta-

analysis of 105 controlled therapy outcome studies found an

average effect size of .79, indicating that the average treated

child functioned better at posttreatment than 79% of control

subjects (Weisz, Weiss, Alicke, & Klotz, 1987). A recent

meta-analysis of 223 studies concurred with these findings,

with a mean effect size of .88 for the subset of 64 studies

that compared treatment with a no-treatment control group

(Kazdin, Bass, Ayers, & Rodgers, 1990). Approximately half

of the studies included in the meta-analyses addressed

children with externalizing behavior disorders.
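
The percentile reading of an effect size can be checked with a short calculation. The sketch below is an illustration only and is not drawn from the cited meta-analyses: it assumes that outcome scores in the treated and control groups are normally distributed with equal variance, so that an effect size d places the average treated child at the standard normal cumulative probability of d within the control distribution. The function name percentile_from_effect_size is hypothetical.

```python
import math


def percentile_from_effect_size(d: float) -> float:
    """Proportion of control subjects scoring below the average treated child.

    Assumption (not from the source): outcomes in both groups are normally
    distributed with equal variance, so the proportion equals the standard
    normal cumulative distribution function evaluated at d.
    """
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))


if __name__ == "__main__":
    # Effect sizes reported in the two meta-analyses discussed above.
    for label, d in [("Weisz et al. (1987)", 0.79), ("Kazdin et al. (1990)", 0.88)]:
        share = 100 * percentile_from_effect_size(d)
        print(f"{label}: effect size {d:.2f} -> better than about {share:.0f}% of controls")
```

Under these assumptions, an effect size of .79 corresponds to roughly the 79th percentile of the control distribution, and an effect size of .88 to roughly the 81st.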


Parent Management Training

There is a wide and steadily increasing body of

literature indicating that parent management training (PMT)

offers great promise among therapies for externalizing

disorders in pre-adolescents (Kazdin, 1987; McMahon & Wells,

1989). Broadly stated, PMT involves teaching parents new

ways to interact with their children. The goal is to

replace coercive patterns of interchange, which are believed

to foster and sustain aggressive, oppositional child

behavior, with more positive, prosocial interactions

(Patterson, 1982). The therapist works primarily with

parents, instructing them in social learning principles,

helping them to identify and observe problem behaviors, and

then to design and implement behavioral interventions in the

home. A variation of PMT that is increasingly used with

preschool to early elementary aged children will be referred

to here as parent-child treatment. This treatment, like all

variants of PMT, is based on social learning principles, but

treatment incorporates direct coaching by the therapist of

the parent interacting with the child in the clinic. The

existing literature on PMT outcome has seldom distinguished

parent-child treatment from PMT approaches that do not

include the child or that target older children (Eyberg,


The effectiveness of PMT, broadly defined, for

disruptive behavior disorders in children is well-documented

(Kazdin, 1987; McMahon & Wells, 1989). Controlled studies

have favored it over no-treatment control groups and

nonbehavioral treatments (Hughes & Wilson, 1989; Pisterman

et al., 1989; Taylor & Hoedt, 1974; Wells & Egan, 1988).

Many studies have documented increases in child compliance,

reduction of antisocial behaviors, and increases in targeted

parent skills following PMT both in the clinic (e.g.,

Eisenstadt et al., in press; Eyberg & Robinson, 1982) and in

the home (Baum & Forehand, 1981; Peed, Roberts, & Forehand,

1977). Behavioral improvements have been noted in untreated

siblings as a result of treatment (Eyberg & Robinson, 1982;

Humphreys, Forehand, McMahon, & Roberts, 1978). In recent

years, PMT has been applied with some success in group

format (Webster-Stratton & Hammond, 1990), through self-

instructional bibliotherapy (Sloane, Endo, Hawkes, & Jenson,

1991), and even in therapist-absent self-administered

videotape therapy (Webster-Stratton, Kolpacoff, &

Hollinsworth, 1988). Studies of maintenance of treatment

effects in the home have generally shown behavior at

followup to be stable or slightly improved relative to the

end of treatment (Forehand, Wells, & Griest, 1980; Webster-

Stratton, Hollinsworth, & Kolpacoff, 1988).

Factors Affecting PMT Outcome

Much recent research effort has focused on determining

factors which enhance or limit the effectiveness of PMT.

As with other forms of psychotherapy, relatively high rates of

drop-out and treatment failure are reported, generally in

the range of 30-50% of those initiating treatment (Webster-

Stratton & Hammond, 1990).

Child factors. There is some evidence that children

who enter treatment at an earlier age may show better

outcome and a lower treatment dropout rate than older

children (Strain, Steele, Ellis, & Timm, 1982). While some

evidence of maintenance of treatment effects has been shown

from four to ten years following PMT (Forehand &

Long, 1988), it remains to be shown whether the longterm

incidence of disruptive behavior disorders is lower in

children who have received treatment than in those who have

not. Suggestive evidence is provided by

Campbell's prospective study of disruptive children

identified between the ages of two and three (Campbell &

Ewing, 1990). Children whose behavior problems resolved by

age six showed a lower incidence of disruptive behavior

disorders (29%) at age nine than children whose problems

persisted from age three through age six (67%).

An emerging issue in treatment outcome research is

whether PMT may be differentially effective for ODD and

ADHD. Following treatment with Parent-Child Interaction

Therapy, stronger effects were shown on observational

measures of compliance than on observational measures of

activity and attention, both in the clinic (Eisenstadt, Eyberg, McNeil,

Newcomb, & Funderburk, in press) and in the classroom

(McNeil, Eyberg, Eisenstadt, Newcomb, & Funderburk, 1991).

Similar results were shown for parent training augmented

with "attention training," in which parents of preschoolers

were trained to increase their children's attention span by

using re-focusing cues and reinforcement of on-task behavior

(Pisterman et al., 1992). The results showed improved child

compliance and parental acquisition of the attention

training target skills, but no significant improvement in

children's attention. The authors suggest "treatment of

ADHD may need to take a developmental perspective.

Behavioral parent training that targets compliance may be

the treatment of choice during the preschool years when

behavioral problems peak . . . [and] attentional problems

may be more appropriately addressed during the school years

if attentional problems begin to have a deleterious effect

on children's academic success and classroom functioning"

(Pisterman et al., 1992, p. 407).

Parent factors. Kazdin (1987) notes that PMT requires

significant parent involvement and places relatively high

demands on parents. Not surprisingly, factors that would be

expected to limit mothers' efforts in therapy have been

identified as negative predictors for therapy completion and

outcome. These factors include maternal depression,

increased negative life events, marital discord, and lack of

social support (Compas, Howell, Phares, Williams, & Giunta,

1989; Dadds & McHugh, 1992; Webster-Stratton & Hammond,

1990). Single mothers have also been identified as at risk

for poor treatment outcome (Webster-Stratton, 1992; Webster-

Stratton & Hammond, 1990). "Insular mothers," those who are

both socially isolated and socioeconomically disadvantaged,

have been identified as at exceptional risk for poor

treatment outcome and maintenance (Dumas & Wahler, 1983;

Wahler, 1980). Strayhorn and Weidman (1989) summarized

negative predictors for PMT as follows: maternal depression,

low SES, low social support, and single parent status/low

father involvement. They note the dilemma that "the same

characteristics predicting poor response to parent training

interventions predict high risk for psychopathology in

children" (Strayhorn & Weidman, 1989, p. 888).

Several modifications of PMT have attempted to

ameliorate the effects of negative predictors. In a PMT

program that attempted to manipulate social support, Dadds

and McHugh (1992) confirmed the importance of social support

as a predictor variable but were unable to increase low SES

mothers' perceived social support. Another study examined

different teaching methods with low SES mothers and found

that the use of modeling and role playing yielded better

acquisition of targeted parenting skills than verbally

mediated training (i.e., reading and discussion) (Knapp &

Deluty, 1989). Single mothers who received problem solving

training dealing with issues other than child-rearing in

addition to parent training showed more positive ratings of

their children's behavior four months posttreatment (Pfiffner,

Jouriles, Brown, Etscheidt, & Kelly, 1990).

Parent-Child Interaction Therapy

Several characteristics of Parent-Child Interaction

Therapy (PCIT) suggest its potential for achieving broad and

lasting improvements in children with disruptive behavior

disorders. First, it directly targets the negative

controlling maternal interactions that have been clearly

linked to the development of childhood externalizing

behavior problems. Second, it provides direct coaching of

the parent and child interaction, which should provide for

greater skills attainment in low-SES or lower functioning

parents as opposed to more verbally mediated forms of parent

training (Knapp & Deluty, 1989). Finally, PCIT targets the

preschool age range that appears most amenable to treatment.

Indeed, several studies cited above attest to the

effectiveness of PCIT for treatment of externalizing

behavior problems in the clinic and home environment (e.g.,

Eisenstadt et al., in press; Eyberg & Robinson, 1982).

Additionally, PCIT has shown promising results in the

area of generalization of treatment effects into the

preschool and school setting (McNeil et al., 1991), which is

the primary focus of the present study. The PMT literature

provides mixed findings on the issue of school

generalization. Before embarking on a review of PMT

school generalization studies, a brief description of

Parent-Child Interaction Therapy is in order.

Parent-Child Interaction Therapy (PCIT) is a behavioral

family therapy approach that integrates traditional and

behavioral methods (Eyberg, 1988). Treatment is conducted

in two phases, Child-Directed Interaction (CDI) and Parent-

Directed Interaction (PDI). Therapy is conducted in the

context of play situations, where the parent interacts with

the child while a therapist coaches from behind a one-way

mirror using a "bug in ear" microphone device. The primary

goal of CDI is to enhance the parent-child relationship, and

the parent is taught to inhibit directive and critical

responses. Parents are trained to praise appropriate

behavior, to follow the child's lead with behavioral

imitations, verbal descriptions and reflective statements,

and to ignore inappropriate behavior. In PDI, emphasis

shifts to establishing effective discipline and reducing the

child's noncompliance. The parent is trained to direct the

child's behavior using clear, developmentally appropriate

commands and to use a time-out procedure in instances of

noncompliance. Generalization to the home is gradually

established as the parents learn to enforce "house rules"

and rules for public behavior. For a more complete

description of PCIT, the interested reader is referred to

Eyberg and Boggs (1989a).

School Generalization

Studies of the generalization of behavioral

improvements from the clinic setting to the school setting

have been less than conclusive in the PMT literature. Most

research in this area is marked by methodological flaws such

as inadequate controls, unidimensional outcome assessment,

and inadequate demonstration of treatment effects.

One of the earliest studies evaluating school

generalization suggested the presence of a "contrast effect"

of worsening school behavior simultaneous with home

behavioral improvements following PMT (Johnson, Bolstad, &

Lobitz, 1976). This study, however, demonstrated

significant behavioral improvements in the home only on

parent rating scale measures. Behavioral observation data

did not indicate significant reductions in deviant behavior

at home from pre- to post-treatment. Teacher rating scale

assessments of eight of the children who underwent treatment

showed no change from pre- to posttreatment. School

behavioral observations, available on only five of the eight

treatment children, revealed a nonsignificant increase in

observed deviant behavior relative to a control sample of

children with school behavior problems, which showed a

nonsignificant decrease in observed deviant behavior. In

separate small samples of children who were treated in a

special educational setting, Johnson et al. (1976) observed

nonsignificant increases in home deviant behavior.

Combining results from the home treatment and the school

treatment samples, the authors interpreted this as evidence

for a behavioral contrast effect (Johnson, Bolstad, &

Lobitz, 1976).

Forehand and colleagues (1979) investigated school

generalization in a sample of eight children who

successfully completed parent-child treatment. This study

reported no significant change in

observational measures or teacher ratings of deviant school

behaviors. It is noteworthy that treatment was described as

successful based on only a 5% increase in compliance in the

home. Furthermore, the treated children did not differ from

normal classroom control children in levels of deviant

behavior prior to treatment, and noncompliance at school was

described as occurring "too infrequently to yield meaningful

results." Nevertheless, the researchers noted that five of

the eight treated children showed nonsignificant increases

in deviant school behavior over the assessment interval,

while only four of the eight control children showed similar

nonsignificant increases. The authors concluded that

further research was required on the contrast effect in

parent-child treatment.

In a subsequent study, Breiner and Forehand (1981)

obtained school observations and teacher ratings of sixteen

children who underwent parent-child treatment for home

behavior problems and sixteen same-sex normal classroom

control subjects. As in the previously described study,

successful completion of treatment was based on relatively

small observed behavioral changes: an increase in compliance

to parental commands from 30% to 35% and a decrease in

oppositional behaviors in the home from 9% to 6%. While

these behavioral improvements were statistically

significant, it is unclear whether the magnitude of these

changes indicates clinically significant behavioral

improvement. Also consistent with the previous study, the

treatment children did not differ from classroom control

children on observational measures or teacher ratings either

prior to or after treatment. The authors reasonably

concluded that "a contrast effect in the school does not

appear to be a concern for parent trainers" (Breiner &

Forehand, 1981, p. 40). Unfortunately, the authors extended

their interpretation of these results to apply to children

exhibiting conduct problem behaviors at school, stating that

"If school problems exist, they will not be reduced by

treatment directed toward home problems. School problems

will have to be directly programmed into an overall

treatment package in order to successfully reduce them"

(Breiner & Forehand, 1981, p. 41).

In contrast, Cox and Matthews (1977) found lower levels

of classroom conduct problems relative to untreated control

children based on behavioral observations at the end of a

parent training program focusing on family relationships and

management skills. These apparent improvements were

maintained at a two-month followup. Unfortunately, this

study failed to obtain pretreatment evaluations, making it

impossible to determine whether the treatment children's

favorable comparisons with untreated controls were due to

the effects of treatment or simply represented systematic

differences between the samples prior to treatment.

Sayger and Horne (1987) found positive improvements in

school behavior that were maintained at a nine-month

followup after parent training using a social learning model

of family therapy. However, this study failed to include a

control group and only one measure, the Daily Behavior

Checklist (Prinz, O'Connor, & Wilson, 1981), was used to

assess school behavior change. In another study employing

only a rating scale measure to assess school generalization,

improvements on the Devereux Elementary School Behavior

Rating Scale (Spivack & Swift, 1967) were greater for

children whose parents underwent nonbehavioral group

counseling addressing their children's school behavior

problems than for children who received direct group

counseling (Taylor & Hoedt, 1974). Followup evaluation was

not included in this report.

Longterm generalization to the school was suggested by

a followup of 40 children whose parents received treatment

based on Wahler's Oppositional Child Treatment parent

training model (Strain et al., 1982). Classroom behavioral

observations conducted three to nine years following

completion of treatment revealed that treatment children did

not differ from randomly selected classroom controls in

compliance to teacher commands, on-task behavior, or

appropriate peer interactions or on teacher ratings on the

Walker Problem Behavior Identification Checklist (Walker,

1970). Although the results of this multimodal assessment

are interesting, they cannot be meaningfully

interpreted because no report was made of pretreatment

classroom evaluation or of treatment outcome.

A recent study compared the effects of parent-child

treatment augmented by positive relationship building and

verbally encoded interchanges (stories, conversation,

dramatic play) to a minimum treatment control group which

received a handout on parenting skills and two videotapes on

PMT (Strayhorn & Weidman, 1989). The treated children,

whose parents attended an average of 12.5 hours of

treatment, failed to show improvements on teacher ratings on

the Behar Preschool Behavior Questionnaire (PBQ) (Behar &

Stringfield, 1974) immediately following treatment, but

significant improvements were documented relative to

controls in teacher ratings of ADHD symptoms at the one-year

followup (Strayhorn & Weidman, 1991). Interestingly,

significant improvements on oppositional and aggressive

behaviors were not found in this PMT study which targeted 89

children recruited from the community (Strayhorn & Weidman,


Webster-Stratton and colleagues (1988) reported

significant reductions in teacher-reported behavior problems

on the PBQ following parent training programs involving

group discussion and combined group discussion with

videotape modeling, but actual school behavior was not

directly assessed. A more recent study replicated

improvements on teacher PBQ ratings at posttreatment, but

the gains were not maintained relative to pretreatment

scores at a one-year followup (Webster-Stratton & Hammond,


Parent training was compared with self-control therapy

and the combination of the two treatments in a sample of

elementary school children with ADHD (Horn, Ialongo,

Greenberg, Packard, & Smith-Winberry, 1990). A control

group of non-ADHD children was included, but they were not

matched for gender or classroom. Improvements were noted at

posttreatment for all treatment modalities on teacher

ratings on the Conners Teacher Rating Scale (Goyette,

Conners, & Ulrich, 1978) and the Teacher Self-Control Rating

Scale (Kendall & Wilcox, 1979), but these were not

maintained at the eight-month followup.

An extensive two-year treatment that combined parent

training with training in social skills, fantasy play and

appropriate television viewing was compared to an attention

control and a no treatment control in a large French

Canadian sample (Tremblay et al., 1991). Subjects were

selected on the basis of teacher ratings of disruptive

behavior above the 70th percentile in an epidemiological

sample. No success was noted on teacher ratings at the end

of treatment or on followups one and two years


The studies reviewed provide mixed results on school

generalization. The Tremblay et al. (1991) report is not

readily comparable to the other PMT treatments reviewed

because of the long duration of treatment, the subject

selection procedures, and the unusual added elements of

treatment (e.g., fantasy play intervention). Of the

remaining studies, several of the earlier reports did not

find school generalization (e.g., Johnson et al., 1976;

Forehand et al., 1979; Breiner & Forehand, 1981). The

likelihood of school generalization was diminished in those

studies by failure to demonstrate clinically significant

behavioral improvement in the home environment and failure

to identify an adequate classroom control group. Several

other studies reported apparent improvements in classroom

behavior following PMT, but failed to document treatment

effects clearly (Cox & Matthews, 1977; Strain et al., 1982;

Strayhorn & Weidman, 1989) or lacked a control group (Sayger

& Horne, 1987; Webster-Stratton & Hammond, 1990). In a

controlled study of a referred sample of elementary-aged

children, improvements on teacher ratings were noted

following treatment, but these improvements were not

maintained relative to pretreatment scores at the eight-

month followup (Horn et al., 1990).

Most of the studies discussed above relied exclusively

on teacher rating scale measures. It is true that teacher

ratings have demonstrated high stability for up to four

years, despite transitions between classes, schools, and

teachers, and that aggressive and externalizing behaviors

show the greatest consistency (Verhulst & Van Der Ende,

1991). Campbell has also commented on the impressive

stability of teachers' reports despite their relatively brief

history with the rated children (Campbell & Ewing, 1990).

It would appear that teacher ratings are capturing

characteristics of students that endure across time and

specific environments.

While teacher ratings are an important component of

multimodal assessment, previous studies have suggested that

treatment effects may be more readily attained on rating

scale measures than with measures such as behavioral

observations (Eyberg & Johnson, 1974; Johnson & Christensen,

1975). In their review of the parent-child treatment

literature, Forehand and Atkeson (1977, p. 575) concluded

that "the more rigorous the method of assessment, the less

positive the results."

PCIT School Generalization

McNeil et al. (1991) attempted to clarify the

conflicting findings on school generalization in a study

which corrected some of the methodological problems of

previous studies of PMT school generalization. A sample of

ten children, referred for significant behavior problems

both at home and at school, was evaluated using behavioral

observations as well as teacher rating scales to assess

school behavior. Pretreatment school assessment revealed

that, compared to normal control children, the treatment

children showed significantly higher rates of noncompliance,

oppositional behavior, and off-task behavior in the

classroom as well as elevations on teacher ratings of

conduct problem behaviors on the Revised Conners Teacher

Rating Scale (Goyette et al., 1978) and the Sutter-Eyberg

Student Behavior Inventory (Sutter & Eyberg, 1984). For

treatment to be considered successful, children were

required to move to within normal limits on a behavioral

observation of compliance to parental commands in the clinic

(Robinson & Eyberg, 1981) or on the Eyberg Child Behavior

Inventory (Eyberg & Ross, 1978; Robinson, Eyberg, & Ross,

1980), a rating scale of home behavior problems. Observed

compliance to parental commands for the ten children

increased from 40.7% at pretreatment to 70.4% at

posttreatment, and Eyberg Child Behavior Inventory

pretreatment elevations (Intensity Score: 180.7; Problem

Score: 23.3) decreased to within normal limits (Intensity

Score: 105.9; Problem Score: 6.1).

Following the completion of treatment, significant

improvements on school behavioral observation measures of

oppositional behavior and compliance were noted relative to

normal control children and nonreferred deviant control

subjects within the treated children's classrooms. At the

end of treatment, the treatment group fell within one-half

standard deviation of the normal control group on the

observational measures of compliance and appropriate

behavior. In addition, teacher ratings for the treated

children improved to within published normal limits on the

Conduct Problem factor of the Revised Conners Teacher Rating

Scale and on both scales of the Sutter-Eyberg Student

Behavior Inventory at posttreatment. In contrast,

behavioral observations of on-task behavior and teacher

ratings on the Hyperactivity Index of the Revised Conners

Teacher Rating Scale, while showing significant improvement

relative to pretreatment scores, did not improve

significantly more than the deviant control group.

Examination of individual children's data indicated that all

10 children improved 30% or more over baseline levels on

between 1 and 7 of the 7 outcome measures. Although 3

children showed only minimal improvement (e.g., only 3 of 7

outcome measures showed 30% improvement), there was no

evidence of a behavioral contrast effect of worsening

classroom behavior following parent-child treatment that

some previous studies had suggested.
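
The individual-level criterion described above (improvement of 30% or more over baseline on each outcome measure) can be illustrated with a brief sketch. The function names, the measure names, the sample values, and the use of relative change from baseline are assumptions made for illustration only; the exact computation used by McNeil et al. (1991) is not given in this section.

```python
def percent_improvement(pre, post, higher_is_better):
    """Relative change from baseline, signed so that positive means improvement."""
    if pre == 0:
        raise ValueError("baseline of zero: relative change is undefined")
    change = (post - pre) if higher_is_better else (pre - post)
    return 100.0 * change / abs(pre)


def count_improved(measures, threshold=30.0):
    """Count measures whose improvement meets or exceeds the threshold (percent)."""
    return sum(
        1
        for pre, post, higher_is_better in measures.values()
        if percent_improvement(pre, post, higher_is_better) >= threshold
    )


# Hypothetical values for one child (not data from the study).
# Each entry is (pretreatment, posttreatment, higher_is_better).
example_child = {
    "compliance_pct": (40.0, 60.0, True),     # 50% increase: counts
    "oppositional_pct": (20.0, 15.0, False),  # 25% reduction: does not count
    "teacher_rating": (150.0, 100.0, False),  # 33.3% reduction: counts
}
print(count_improved(example_child))  # 2
```

The direction flag is needed because some measures improve by rising (e.g., compliance) and others by falling (e.g., problem ratings).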

In discussing why this study demonstrated school

generalization following PCIT when previous parent-child

treatment studies had not, the authors noted several

features that distinguished it from earlier work. These

differences included 1)

demonstration of clinically significant improvements in home

behavior, 2) documentation of significant school behavior

problems prior to intervention, 3) longer treatment duration

(14 sessions) than previous studies (Breiner & Forehand = 8

sessions; Forehand et al. = 9.5 sessions), and 4) use of

younger subjects. The age range of children in the McNeil

et al. study was 2.6 years to 6.6 years (mean age 4.5),

while previous studies included slightly older children.

Treatment Maintenance Research Methodology

Use of Multiple Measures

The importance of using multiple measures at followup,

including dependent measures comparable to those employed

immediately after treatment, has been noted by a number of

investigators (Kendall, Lerner, & Craighead, 1984; Mash &

Terdal, 1977; Walker & Hops, 1976). The majority of

treatment followup studies have relied on single measures

such as a rating scale or an unstructured telephone

interview (Mash & Terdal, 1977). In followup assessment,

just as at posttreatment assessment, more confidence can be

placed in treatment effects that are demonstrated across

measurement methods.

Selection of Appropriate Comparison Group

The comparison group used by Strain and colleagues

(1982) consisted of four randomly selected classmates of the

same age and gender as the treated child. Other

investigators have also suggested that classroom peers

provide an appropriate standard for evaluation of

maintenance of school generalization (Kendall, Lerner, &

Craighead, 1984; Walker & Hops, 1976). Walker and Hops

(1976, pp. 159-160) noted that observational data on the

peers of a treated child "can be used to determine normal

limits and can provide a measure of uncontrolled

situational variables that may affect the target subject's

behavior." Kendall and Grove (1988, p. 150) describe

observation of nontreated children in the treatment child's

classroom as the method of choice for behavioral assessment

of classroom interventions, noting that "it is particularly

important that the normal samples come from situations

identical to those of the treated sample." Campbell and

Ewing (1990) compensated for control subject attrition in

their longitudinal study by recruiting new comparison

children from the current classrooms of study children.

In the McNeil et al. study (1991), rather than randomly

selecting classroom controls, normal and deviant control

children were chosen from the classrooms of the treated

children. A deviant control subject (a nonreferred child

identified by school personnel other than the rating teacher

as displaying school behavior problems) was used to serve

the function of an untreated control subject. A normal

control subject (a child described as showing "average"

school behaviors) provided a normative comparison against

which to evaluate the clinical significance of treatment.

This yielded more precision for the labor-intensive

classroom observations than relying on random selection of

control children. The children nominated as "average"

controls appeared to be somewhat "better than average" on

teacher rating scale measures, with scores more than a

standard deviation below published norms on the Sutter-

Eyberg Student Behavior Inventory. However, on an

observational measure of compliance, the "average" control

children complied with 73% of teachers' commands, which is

quite close to the 77% average compliance with teacher

commands previously reported as typical for nonreferred

preschoolers (Atwater & Morris, 1988).

Statement of Purpose

The purpose of the present study was to evaluate the

maintenance of school generalization demonstrated by McNeil

et al. (1991) by evaluating the original subjects twelve

months and eighteen months after completion of Parent-Child

Interaction Therapy. To validly assess the maintenance of

school improvements noted by McNeil et al., methods of

measurement were applied that were comparable to those used

in the original study, and normative comparisons were drawn

from the current peer groups of the treated children.

Followup assessments were in most cases conducted in

new classrooms, with different teachers and classmates from

the classroom in which generalization was previously

demonstrated. Treatment children's maintenance of

behavioral improvements was measured through comparison with

pre- and posttreatment data. The clinical significance of

treatment effects was evaluated by comparing the behavior of

the treatment children with that of current classmates.

In a modification of previous methodology (McNeil et

al., 1991), in which one normal control and one deviant

control child were selected, three comparison children were

selected from the classroom of each treated child. Students

were ranked by teachers as presenting few, average, or many

classroom behavior problems, and comparison children were

selected to represent this range. Behavioral observations

of these control children were used to gauge the maintenance

of behavioral effects in treated children. Treated children

were also contrasted with comparison children on teacher

rating scale measures of conduct problem behaviors,

hyperactivity, and social competence.

Hypotheses


1. Treated children will maintain increased levels of

appropriate behavior and compliance to teacher's commands

over their pretreatment baselines.

2. Treatment children will fall within current

classroom norms on the same measures that previously

improved to within normal limits. These include percentage

of appropriate behavior and level of compliance during

behavioral observations, and teacher ratings on the Revised

Conners Teacher Rating Scale and the Sutter-Eyberg Student

Behavior Inventory.

3. Percentage of on-task behavior did not show

significant improvements during treatment, and there is no

reason to expect that it would fall within normal limits at

follow-up. On-task behavior is expected to fall within the

pretreatment range and to be comparable to that of the "most

behavior problems" comparison group.

4. It is hypothesized that longterm maintenance of

increased levels of appropriate behavior and compliance seen

at posttreatment will result in social skills ratings that

are within normal limits by the time of followup, despite

the fact that improvements were not significant relative to

controls at posttreatment. Treatment children are expected

to show improvement beyond posttreatment levels of social

competence and to fall within the range of "average"

comparison children at followup.



Treated Subjects

Subjects in the original school generalization study

and the presently reported followup study comprised a subset

of subjects who participated in a larger treatment outcome

study evaluating Parent-Child Interaction Therapy (PCIT).

Subjects in the PCIT treatment outcome study were children

between the ages of 2 and 7 referred by physicians, mental

health professionals, or school personnel for treatment of

conduct problem behaviors at home and/or at school.

Inclusion criteria for a parent-child dyad in the PCIT

treatment study were as follows: 1) one parent's ratings of

the child's behavior above the published cut-off scores

(i.e., Intensity score = 127, Problem score = 11) on the

Eyberg Child Behavior Inventory (ECBI) (Eyberg & Ross,

1978); 2) designation of Oppositional Defiant Disorder,

Attention-deficit Hyperactivity Disorder, or Conduct

Disorder based on a structured interview with the parent(s);

3) ratio of compliance with parental commands lower than the

average for nonreferred children (i.e., below 62%) in clinic

behavioral observations using the Dyadic Parent-Child

Interaction Coding System (DPICS) (Robinson & Eyberg, 1981);

and 4) no evidence of moderate to severe mental retardation

in either the primary caretaker or the child.

A subset of children who met these criteria

additionally qualified for inclusion in the original school

generalization study by receiving teacher ratings at least

one standard deviation above published normal limits on the

Conduct Problem factor and/or Hyperactivity Index of the

Revised Conners Teacher Rating Scale (Goyette, Conners, &

Ulrich, 1978). Twelve children met these criteria for the

school generalization study, although only ten were

ultimately included in that study. One child was excluded

because the school term ended before posttreatment

observations could be collected. Another child, who was on

medication for treatment of hyperactivity throughout the

course of PCIT treatment and followup, was excluded because

the research design of the original school generalization

study required that subjects not be on medication. All 12

children who met the basic inclusion criteria went on to

successfully complete a 14-session course of PCIT with their

parent(s), and all 12 were invited to participate in the

current study.

Pretreatment results of the DSM-III-R structured

interview with parents indicated that, of the original 10

school generalization children, 1 met the criteria for

Oppositional Defiant Disorder (ODD), 1 met the criteria for

Attention-deficit Hyperactivity Disorder (ADHD), 5 met the

criteria for both ODD and ADHD, and 3 children met the

criteria for ODD, ADHD, and Conduct Disorder. Regarding the

two additional children included in the followup study, both

met criteria for ODD and ADHD at pretreatment. McNeil et

al. (1991) noted that DSM-III-R diagnostic information was

provided solely for research sample description because

these diagnostic categories have not been validated for

children under 7 years of age.

Although children were not excluded on the basis of

gender, race, or SES, all of the children who met the

inclusion criteria were male. Eleven of the treatment

subjects, including the 2 additional followup subjects, were

White, and 1 was of Asian descent. The mean pretreatment

age of the original 10 treatment children was 54 months

(range = 31 to 79 months). The 2 additional subjects were

aged 57 months and 76 months at pretreatment assessment,

yielding a mean age of 56 months for the 12 subjects. Six

of the treatment children came from two-parent families, and

in 4 of these families the father participated in treatment.

Four subjects in the original school generalization study,

as well as the 2 additional followup subjects, lived with

their mother only. On the Four Factor Index of Social

Status (Hollingshead, 1975), which ranges from 8 to 66, the

mean score for the original 10 subjects was 33.60 (SD =

13.49; range = 17-61). The mean Hollingshead social

status score based on twelve subjects was 36.42 (SD = 14.17;

range = 17-61).
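
The combined mean age reported above for all 12 subjects can be checked from the figures given in the text. The sketch below is only an arithmetic illustration; the function name is hypothetical, and because the mean of 54 months for the original 10 children is itself a rounded value, the result is approximate.

```python
def pooled_mean(group_mean, group_n, extra_values):
    """Mean of a group combined with additional individual values."""
    total = group_mean * group_n + sum(extra_values)
    return total / (group_n + len(extra_values))


# Original 10 children: mean 54 months; 2 additional children: 57 and 76 months.
mean_age_12 = pooled_mean(54, 10, [57, 76])
print(round(mean_age_12))  # 56
```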

Eleven of the potential 12 subjects (91.67%)

participated in the 12-month followup. Of these subjects,

incomplete data were collected on 2 children. One teacher

provided rating scale measures but refused to allow

classroom observations. Another teacher permitted classroom

observations, but failed to complete rating scales. Because

one subject completed treatment several months later than

the other participants in the original school generalization

study, this subject's 12-month follow-up assessment occurred

concurrently with 18-month follow-up for the other subjects.

No 18-month follow-up assessment was conducted for this

child, which limited the second followup to eleven subjects.

All 11 of these (100%) participated in the full assessment.

The average age at the 12-month followup was 6 years 2

months (range: 3 years 10 months to 7 years 11 months). At

the 12-month followup, 3 children were in preschool

classrooms, 4 were in kindergarten, 2 were in first grade,

and 2 were in second grade. The average age at the 18-month

followup was 6 years 8 months (range: 4 years 6 months to 8

years 7 months). At the second followup, 1 child was in

preschool, 3 were in kindergarten, 4 were in first grade, 1

was in second grade, and 2 were in third grade.

Eleven of the 12 treatment families remained in the

same family constellation at both followups as during

treatment (5 two-parent homes and 6 single mothers). In one

family the parents had separated prior to the 12-month

follow-up, changing the family constellation from a two-

parent home to a household headed by a single mother for

both followup assessments. Another family underwent a

temporary marital separation between the two followup

assessments.


As mentioned above, 1 treatment child was on medication

for hyperactivity throughout treatment and at both followup

intervals. Two children were taking medication for

hyperactivity at both the 12-month and 18-month followups.

Two additional children, 1 who only participated in the 12-

month followup, and 1 who only participated in the 18-month

followup, were also placed on stimulant medication following

the end of PCIT treatment. Teachers were asked if any of

the children showed atypical behavior during the

observations. At the first observation period, one teacher

noted that a treatment child had recently been placed on

medication and appeared calmer than usual. A comparison

child in a different classroom was described as more

difficult than usual due to the recent birth of a sibling.

At the 18-month followup, 1 treatment child left school

early due to a cold following observations on one of the

three days he was observed. Another teacher reported that

she changed her behavior toward a treatment child during the

observations by giving "many more cues than usual" and

extending his task times to longer than she typically

would.


Four treatment children were receiving special

educational services at the time of the 12-month followup.

Two of these were originally referred by a developmental

preschool where they were enrolled throughout the course of

PCIT treatment. One of these children remained in the

developmental preschool at first followup, while the second

had moved to a special education kindergarten classroom. A

third target child was in an ESE (Exceptional Student

Education) placement, and another had been referred for

special services for learning disabilities (LD) in his

regular first grade at the 12-month followup.

Six of the 11 target children were receiving special

educational services at the time of the 18-month followup.

One child had moved from developmental preschool into a

self-contained multi-handicapped primary school classroom.

The other child from developmental preschool had been

mainstreamed into a regular kindergarten with ongoing

special services. The 2 children receiving, respectively,

ESE and LD services at the first observation continued in

these placements. Finally, 2 additional children had been

referred for LD services at the second followup interval.

Comparison Subjects

Three comparison subjects were chosen from the

classroom of each treatment subject at each followup

interval. Passive consent forms (see Appendix A) were sent

home to all children in each treatment child's class a

minimum of one week prior to observations. Parents who did

not want their child to participate responded by signing the

form and returning it to the teacher. Two or fewer refusals

were received in each classroom, with no refusals in most.

For selection of comparison children, the teacher of

the target child's class was asked to list (by initials) all

boys in her classroom, omitting any whose parents had

declined participation. Children differing in age from the

target child by more than one year were also deleted from

the list of potential comparison subjects, as were those who

were nonverbal, blind, or nonambulatory. The teacher then

rated each child on a 1-5 scale where "1" represented "the

fewest problems relative to the behavior of boys in this

class" and "5" represented "the most behavior problems

relative to the behavior of boys in this class" (see

Appendix B). Three comparison subjects were then selected

to be observed along with the treated child, one rated as

"1," one rated as "3," and one rated as "5."

In cases where more than one child received the same

rating (e.g., "3"), the comparison child was chosen randomly

from the available subjects. In cases where no child was

available for observation with a given rating (e.g., child

rated "5" was absent), a child receiving the nearest

numerical rating (e.g., "4") was selected at random as the

comparison subject. If no child with a "3" rating was

eligible, a comparison child was randomly selected from

those receiving ratings of "2" and "4". At the 12-month

followup, the mean rating of comparison children with fewest

behavior problems was 1.09 across the 11 classrooms. The

mean rating for comparison children with "average" behavior

problems was 2.82, and the mean rating for comparison

children with most behavior problems was 4.64. At the 18-

month followup, the mean rating of comparison children with

fewest behavior problems was 1.00 across the 11 classrooms.

The mean rating for comparison children with "average"

behavior problems was 3.00, and the mean rating for

comparison children with most behavior problems was 4.54.
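
The selection rules described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure as actually carried out: the function name and the example initials are hypothetical, and the rule that a child already chosen for one slot is not reused for another is an assumption not stated in the text.

```python
import random


def select_comparison_children(ratings, rng=None):
    """Pick one comparison child for each target rating (1, 3, and 5).

    ratings: dict mapping a child's initials to the teacher's 1-5 rating,
             already restricted to eligible children.
    Returns a dict mapping each target rating to the chosen child.
    """
    if rng is None:
        rng = random.Random()
    chosen = {}
    for target in (1, 3, 5):
        # Assumption: a child chosen for an earlier slot is not reused.
        available = {c: r for c, r in ratings.items() if c not in chosen.values()}
        if not available:
            raise ValueError("not enough eligible children")
        # Distance from the target to the nearest available rating.
        nearest = min(abs(r - target) for r in available.values())
        # All children at that distance; for a target of 3 with no child
        # rated 3, this pools the children rated 2 and 4.
        pool = sorted(c for c, r in available.items() if abs(r - target) == nearest)
        chosen[target] = rng.choice(pool)
    return chosen


# Hypothetical class list with no child rated "3".
example_ratings = {"AB": 1, "CD": 2, "EF": 2, "GH": 4, "IJ": 5, "KL": 5}
selection = select_comparison_children(example_ratings, random.Random(0))
```

In this example the "few problems" slot is always filled by AB, the "average" slot is drawn at random from CD, EF, and GH, and the "most behavior problems" slot is drawn at random from IJ and KL.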

At each followup interval teachers were asked whether

they felt any of the children they rated were in need of

treatment for behavior problems. At the 12-month followup,

3 treatment children, 6 "most behavior problems" comparison

children and 1 "average behavior" comparison child were

judged to need treatment. At the 18-month followup, 4

target children, 7 "most behavior problems" comparison

children and 1 "average behavior" comparison child were

identified as in need of treatment. Seven of the 33

comparison children (21%) were receiving special educational

services at the 12-month followup; these included 2 "few

problems" comparison children, 2 "average problems"

comparison children, and 3 "most behavior problems"

comparison children. Nine of 33 comparison subjects (27%)

were receiving special educational services at the 18-month

followup; 1 "few problems" comparison child, 3 "average

problems" comparison children, and 5 "most behavior

problems" comparison children.

As mentioned above, comparison children with an age

difference of more than one year from the treatment child

were excluded where possible. An exception was made at the

18-month followup, when 1 treatment subject had been placed

in a primary school ESE classroom where he was more than two

years younger than his classmates. In that classroom,

comparison children were matched for mental age within 2

years of the treated child, based on the teacher's report.

Comparison children, matched for gender to treatment

subjects, were all male. Age was not recorded for all

comparison children, but for the 80% (53 out of 66) for whom

age was reported, ages of comparison subjects ranged from 3

years, 5 months to 8 years, 9 months at the first followup

and from 4 years (months not reported) to 11 years, 7 months

at the second followup.


Eyberg Child Behavior Inventory (ECBI)

The ECBI is a brief, psychometrically sound measure of

disruptive behaviors of children aged 2 to 16 (Eyberg,

1992). The ECBI provides a list of 36 conduct problem

behaviors, including items such as "sasses adults,"

"whines," "physically fights with brothers and sisters," and

"is overactive or restless" (see Appendix C). Each behavior

is rated on two dimensions, frequency of occurrence and

identification as a problem. Parents indicate how often the

behavior currently occurs on a scale from "never" (1) to

"always" (7), and the item ratings are summed to yield the

intensity score which has a potential range of 36 to 252.

Parents also indicate whether the behavior is currently a

problem on a yes/no scale, yielding a problem score which is

the sum of "yes" responses and has a potential range of 0 to

36.
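
The scoring rule described above (36 items, each with a 1-7 frequency rating and a yes/no problem identification) can be expressed in a few lines. The function name and input format are assumptions for illustration; this is a sketch of the arithmetic, not an official scoring program.

```python
def score_ecbi(frequency_ratings, problem_flags):
    """Return (intensity_score, problem_score) for one completed inventory.

    frequency_ratings: 36 integers from 1 ("never") to 7 ("always").
    problem_flags: 36 booleans, True where the parent answered "yes".
    """
    if len(frequency_ratings) != 36 or len(problem_flags) != 36:
        raise ValueError("the ECBI has 36 items")
    if any(r not in range(1, 8) for r in frequency_ratings):
        raise ValueError("frequency ratings must be integers from 1 to 7")
    intensity = sum(frequency_ratings)                   # possible range 36-252
    problem = sum(1 for flag in problem_flags if flag)   # possible range 0-36
    return intensity, problem


print(score_ecbi([1] * 36, [False] * 36))  # (36, 0)
print(score_ecbi([7] * 36, [True] * 36))   # (252, 36)
```

The two calls show the minimum and maximum possible scores, which follow directly from the number of items and the rating ranges.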


The ECBI was initially standardized on preschool

samples (age range: 2 to 7 years) of 42 normal children and

43 behavior problem children (Eyberg & Ross, 1978). The

normal preschoolers had a mean intensity score of 102.6

(S.D. = 25.5) and a mean problem score of 4.62 (S.D. = 8.8).

Subsequent studies expanded standardization of the ECBI

throughout the targeted age range of 2 through 16 years and

established the psychometric properties of the instrument

(Boggs, Eyberg, & Reynolds, 1990; Eyberg & Robinson, 1983;

Robinson, Eyberg, & Ross, 1980). Internal consistency

coefficients of .98 have been reported for both scales of

the ECBI, and test-retest stability over a 3-week period has

been found satisfactory for both the Intensity Scale (r =

.86) and the Problem Scale (r = .88) (Robinson et al.,

1980). Subsequent reports have demonstrated that ECBI

scores are stable over repeated administrations (Eyberg &

Boggs, 1989) and over intervals of up to one year (Pearson's

correlation: r(32) = .75 for Intensity and Problem scales)

(Funderburk, Eyberg, & Behar, 1993).

ECBI scores have been shown to relate significantly to

direct observational measures of parent-child interaction,

activity level, and temperament (Robinson & Eyberg, 1981;

Webster-Stratton & Eyberg, 1982) as well as to the Child

Behavior Checklist (CBCL Externalizing total score

correlated .67 with ECBI intensity score and .75 with ECBI

problem score) (Boggs et al., 1990). Several studies have

demonstrated discrimination among nonreferred, conduct

problem, neglected, and other clinic-referred children

(Aragona & Eyberg, 1981; Eyberg & Robinson, 1983; Eyberg &

Ross, 1978; Robinson et al., 1980). Finally, the ECBI has

been shown to be sensitive to treatment effects of parent

training programs (Eisenstadt et al., in press; Eyberg &

Robinson, 1982; Packard, Robinson, & Grove, 1983; Webster-

Stratton, 1982, 1984; Wolfe, Sandler, & Kaufman, 1981).

Revised Conners Teacher Rating Scale (RCTRS)

The RCTRS is a shortened version of the original 39-

item Conners Teacher Rating Scale (Conners, 1969). Both

teacher rating scales were designed to help identify

hyperactive children, although additional symptom patterns

have also been assessed based on factor analysis of the

scales. The RCTRS asks teachers to rate the degree to which

a child exhibits each of 28 listed symptoms on a 4-point

scale ranging from "not at all" (0) to "very much" (3) (see

Appendix D). The revised scale includes three factors:

Conduct Problem, Hyperactivity, and Inattentive-Passive, as

well as a total score which is the average score of the 28

items. Factor scores are obtained by summing the points for

each item on a factor and dividing by the number of items on

that factor.
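
The scoring described above (a total score equal to the mean of the 28 item ratings, and factor scores equal to the mean rating of the items on each factor) can be sketched as follows. The function name is hypothetical, and the item-to-factor assignment shown is a placeholder for illustration only; it is not the published scoring key.

```python
def rctrs_scores(item_ratings, factor_items):
    """Compute the total score and factor scores for one rating form.

    item_ratings: 28 integers from 0 ("not at all") to 3 ("very much").
    factor_items: dict mapping a factor name to a list of 0-based item indices.
    """
    if len(item_ratings) != 28:
        raise ValueError("the RCTRS has 28 items")
    if any(r not in (0, 1, 2, 3) for r in item_ratings):
        raise ValueError("item ratings must be 0, 1, 2, or 3")
    scores = {"Total": sum(item_ratings) / 28}
    for factor, indices in factor_items.items():
        # Sum the points for the factor's items and divide by the item count.
        scores[factor] = sum(item_ratings[i] for i in indices) / len(indices)
    return scores


# Placeholder key: NOT the published item assignment.
placeholder_key = {"Example factor": [0, 1, 2, 3]}
ratings = [3, 3, 2, 2] + [0] * 24
example_scores = rctrs_scores(ratings, placeholder_key)
print(example_scores["Example factor"])  # 2.5
```

Because each score is an item mean, every total and factor score falls on the same 0-3 metric as the individual items, which is why the normative values reported for this scale are decimals below 1.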

The RCTRS was standardized on a sample of 570 children

between the ages of 3 and 17 (Goyette et al., 1978). Mean

factor scores of three- to five-year-olds (n = 24) in the

normative sample were: Conduct Problem = .49 (SD = .74);

Hyperactivity = .74 (SD = .74); and Inattentive-passive =

.83 (SD = .87). Mean factor scores for the six- to eight-

year-old age range (n = 102) were: Conduct Problem = .30 (SD

= .41); Hyperactivity = .46 (SD = .57); and Inattentive-

passive = .64 (SD = .71).

The factor structure of the RCTRS was shown to be

highly congruent with the longer original Conners Teacher

Rating Scale (Goyette et al., 1978). The 39-item version of

the rating scale has demonstrated stability over a one-month

interval (Pearson's correlations of .72-.91 for the total

and factor scores) (Conners, 1969). Stability coefficients

of .94-.98 were found for the total and factor scores of

the RCTRS (Edelbrock & Reed, 1984). The RCTRS has shown

significant correlations with the Conners Parent Rating

Scale, although there are no available reports of

interteacher agreement (Goyette et al., 1978). Research on

the Conners scales has demonstrated differentiation between

normal and hyperactive children and sensitivity to the

effects of medication for hyperactivity (Conners, 1970).

Sutter-Eyberg Student Behavior Inventory (SESBI)

The SESBI (Sutter & Eyberg, 1984) was designed as a

unidimensional measure of classroom conduct problem

behaviors to aid in discriminating normal children from

children in need of treatment for school conduct problem

behaviors (Eyberg, 1992). The SESBI provides a list of 36

conduct problem behaviors observable by teachers (see

Appendix E). It has the same format and scoring as the

ECBI, with behaviors rated on two dimensions, frequency of

occurrence and identification as a problem.

The SESBI was standardized on a sample of 55

preschoolers aged 3 to 5 in Gainesville, Florida (Funderburk

& Eyberg, 1989). The mean intensity score for this sample

was 100.9 with a standard deviation of 47.6 and a range of

scores from 36 to 228. The mean problem score was 6.0 with

a standard deviation of 8.8 and a range of scores from 0 to

33. The SESBI showed acceptable interteacher agreement

(mean r = .85 for the Intensity Scale and mean r = .87 for

the Problem Scale based on seven and six pairs of teachers,

respectively) and high internal consistency coefficients of

.98 for the Intensity Scale and .96 for the Problem Scale

(Funderburk & Eyberg, 1989). Funderburk and Eyberg (1989)

assessed test-retest stability over a 1-week interval,

finding coefficients of .90 for the Intensity Scale and .89

for the Problem Scale. Significant correlations have been

found between the SESBI and several school rating scales

including the Conners Teacher Rating Scale (correlations

ranged from .67 to .92 between the two SESBI scales and the

Conners Conduct Problem Factor, Hyperactivity Factor,

Inattentive/Passive Factor, and Hyperkinesis Factor) (Sosna,

Ladish, Warner, & Burns, 1989), the Child Behavior

Checklist--Teacher Form (correlations of .87 and .71 of

SESBI intensity and problem scores with CBCL externalizing

score) (Schaughency, Hurley, Yano, Seeley, & Talarico,

1989), and the Behar Preschool Behavior Questionnaire

(correlations of .76 and .61 between the Behar total score

and the SESBI intensity and problem scores, respectively)

(Funderburk & Eyberg, 1989). Significant correlations have

also been found in a sample of behavior problem children

between the SESBI and classroom observational measures of

appropriate behavior (correlations of -.67 and -.70 with the

SESBI intensity and problem scores) and on-task behavior

(correlations of -.54 and -.58 with the SESBI intensity and

problem scores) (Newcomb, Eyberg, Bodiford, Eisenstadt, &

Funderburk, 1989). Initial evidence that the SESBI

discriminates between children referred for school behavior

problems and those referred for developmental problems has

been reported (Funderburk & Eyberg, 1989). Finally, the

original school generalization study demonstrated that the

SESBI is sensitive to treatment effects (McNeil et al.,

1991).


Walker-McConnell Scale of Social Competence and School
Adjustment: A Social Skills Rating Scale for Teachers

The Walker-McConnell is a school rating scale designed

to assess adaptive classroom behavior and interpersonal

social competence in elementary-aged school children (Walker

& McConnell, 1987). The scale's 43 items are positively

stated descriptions of social skills which are rated by

teachers on a 1 ("never") to 5 ("frequently") Likert scale

(see Appendix F). The scale yields a total score and scores

on three subscales: 1) Teacher Preferred Social Behavior,

defined as "peer related social behavior that is highly

valued or preferred by teachers;" 2) Peer Preferred Social

Behavior, defined as "peer related social behavior that is

highly valued by peers;" and 3) School Adjustment Behavior,

defined as "social-behavioral competencies that are highly

valued by teachers within classroom instructional contexts."

The Walker-McConnell was standardized on a national

normative sample of 1812 elementary school children. Mean

scores for the normative sample were as follows: Total Scale

= 161.35 (SD = 32.41); Teacher-Preferred Social Behavior =

58.74 (SD = 13.51); Peer-Preferred Social Behavior = 63.72

(SD = 13.05); and School Adjustment Behavior = 38.90 (SD =

9.74). Internal consistency coefficients of .97, .96, .95,

and .96 were found for the Total Scale and three subscales,

respectively. Low to moderate interrater agreement has been

reported for small samples, with higher agreement reported

for nonclinical samples (Total Scale = .77; subscales = .63-

.83) than for a clinical sample (Total Scale = .53;

subscales = .36-.52) (Walker & McConnell, 1987). Hops

(1987) evaluated test-retest stability over a three-week

interval for 323 second and fourth graders. Test-retest

coefficients were .90, .88, and .92 for the three subscales.

Long-term test-retest stability was investigated over the

interval from fall to spring of the academic year for a

"sample of 80 antisocial and normal pupils in the fifth and

sixth grades" (Walker & McConnell, 1987). Test-retest

correlations over this interval were .67, .61, .70, and .65,

respectively, for the total scale and subscales 1-3.

Classroom Observational Coding System

The classroom behavioral coding system used in the

current study represents a modified version of that used in

the original school generalization study (McNeil et al.,

1991). As in the McNeil et al. study, three behavior

categories were coded: 1) appropriate vs. oppositional

behavior vs. unsure; 2) comply vs. noncomply vs. unsure/no

command; and 3) on task vs. off task vs. not applicable.

This coding system combined procedures used by the Forehand

research group (Breiner & Forehand, 1981; Forehand, Sturgis,

McMahon, Aguar, Green, & Breiner, 1979) with procedures used

by Walker and colleagues (Walker, Shinn, O'Neill, & Ramsey,

1987). The categories of appropriate vs. oppositional and

compliant vs. noncompliant behavior were used in previous

studies of generalization of parent-child therapy to the

school setting (Breiner & Forehand, 1981; Forehand et al.,

1979). The Forehand group's studies did not include a

measure of on-task behavior. Walker et al. (1987) defined

the on-task category used here as "academically engaged

time." Definitions of the school behavior coding categories

used in the original school generalization study are

presented in Appendix G. These same definitions were

maintained in the followup study, with some modifications to

promote clarity. The revised school behavioral definitions

are presented in Appendix H.

The time-sampling procedure used by McNeil et al.

(1991) was similar to that employed by the Forehand group.

McNeil et al. used 10-second observation intervals followed

by 10-second intervals for recording, whereas the previous

school generalization studies used 10-second observation

intervals followed by 5-second recording intervals (Breiner

& Forehand, 1981; Forehand et al., 1979). Walker et al.

(1987) originally recorded the duration of academically

engaged time rather than using time-sampling for this

category. The McNeil et al. category of on-task behavior

maintained Walker's definition, but coded on-task behavior

in 10-second intervals. For on-task to be coded,

academically engaged behavior was required throughout the

10-second interval; off-task would be coded for that

interval if any nonengaged time occurred. A similar

modified time-sampling procedure, combining frequency coding

of oppositional behaviors and compliance with modified

duration coding of on-task behavior, has been used in other

well-researched school behavioral coding systems for

hyperactive children (Abikoff, Gittelman, & Klein, 1980;

Abikoff, Gittelman-Klein, & Klein, 1977).

McNeil et al. (1991) found high percentages of

agreement among coders: appropriate = 92.6% (range: 87% -

97%); comply = 93.2% (range: 83% - 99%); and on task = 90.0%

(range: 80% - 96%). These interrater agreement statistics

are similar to those, all above 90%, reported by Forehand

and colleagues and Walker and colleagues for their versions

of the coding system (Forehand et al., 1979; Walker et al., 1987).


In the McNeil et al. (1991) coding system the target

child and two control children were each observed for 10

seconds out of every minute, and 45 minutes were coded

during each observation period, yielding seven and one half

minutes of data per child. Two observation periods were

included in each assessment, yielding approximately 15

minutes of data per child.

Because the present study included an additional

comparison subject, a more efficient sampling procedure was

desired. The alternative procedure consisted of consecutive

10-second intervals for one minute with a 15-second break

after each minute. This type of behavioral coding also has

been described in previously reported school observation

systems (Abikoff et al., 1980; Strain, Steele, Ellis, &

Timm, 1982; Walker & Hops, 1976). The method allows four

children to be coded for two minutes each in a 10-minute

cycle. Five observation cycles were coded in each

observation period, providing 10 minutes of data for each

subject. Three observation periods were included in each

assessment, yielding approximately 30 minutes of data per

child at each followup interval.
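The observation-time arithmetic for the two sampling schedules can be checked with a short sketch. The function and parameter names below are illustrative only and do not come from the original coding manuals; the numbers are those stated in the text above.

```python
# Observation-time arithmetic for the two sampling schedules described above.
# All names are hypothetical; only the durations come from the text.

def mcneil_minutes_per_child(seconds_per_minute=10, minutes_coded=45, periods=2):
    """Rotating schedule: each child is observed 10 s out of every coded minute."""
    return seconds_per_minute * minutes_coded * periods / 60

def followup_cycle_seconds(children=4, minutes_per_child=2, break_seconds=15):
    """Consecutive 10-s intervals for one minute, then a 15-s break after each minute."""
    return children * minutes_per_child * (60 + break_seconds)

def followup_minutes_per_child(minutes_per_cycle=2, cycles=5, periods=3):
    """Minutes of data per child across all observation periods."""
    return minutes_per_cycle * cycles * periods

print(mcneil_minutes_per_child(periods=1))   # minutes per child, one period
print(mcneil_minutes_per_child())            # minutes per child, two periods
print(followup_cycle_seconds() / 60)         # length of one cycle in minutes
print(followup_minutes_per_child())          # minutes per child, three periods
```

Under these figures, four children coded for two minutes each, with a 15-second break after every minute, fill exactly one 10-minute cycle, which is consistent with the description of the revised procedure.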

Observation sessions were conducted on three different

days within ten consecutive school days whenever possible.

This 10-day interval was exceeded in three cases at the 12-

month followup assessment due to one target child unexpectedly

leaving town for several weeks (interval of data collection

= 29 days), and two classrooms having extensive absences due

to chicken pox (intervals of 24 and 47 days for data

collection). It was not possible to complete three days of

observation in two of the target classrooms at the 12-month

followup due to the approaching end of the school year. Two

days of observations were completed in one of these

classrooms and one day of observations in the other. These

observation periods were extended for longer than standard

to approximate the goal of 30 minutes of data per child.

Twenty-nine minutes of coding were achieved for the treated

child and comparison children available on two observation

days, and 20 minutes of coding were achieved for the treated

child and comparison children available for only one day.

Observations were rescheduled if a treated child was absent,

but occasionally comparison children were also absent,

thereby reducing the available data for some comparison

children. Fewer than 30 minutes of data were coded on 5 of

the 66 comparison subjects (3 "fewest problems," 1 "average

problems," and 1 "most problems" comparison child); data

collected ranged from 10 to 24 minutes for these comparison children.


Observations were conducted within a 10-school-day

period in each classroom at the 18-month followup

assessment, and 3 days of observation were attained for 9 of

the 11 treatment children. Only 2 days of observation were

possible in the two remaining classrooms. Fewer than 30

minutes of data were coded for one target child (24

minutes), one "fewest problems" comparison child (24

minutes), two "average problems" comparison children (15 and

20 minutes), and two "most problems" comparison children (12

and 24 minutes).

The school coding system yields a percentage score for

each of the three behavior categories: appropriate behavior,

time on task, and compliance. The percentage of appropriate

behavior was derived from the intervals in which appropriate

behavior was coded divided by the total number of intervals.

Percentage of time on task was calculated as the total

intervals of on-task behavior divided by the total of on-

task plus off-task intervals, omitting intervals in which

the category was not applicable. Percent compliance

represents the total number of commands obeyed divided by

the total number of commands directed toward the child.
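The three percentage scores defined above can be sketched as follows. The code labels ("appropriate", "on", "off", "na") and the example data are hypothetical stand-ins chosen for illustration; they are not the labels or data of the actual coding sheets.

```python
# Minimal sketch of the three percentage scores, assuming hypothetical code labels.

def percent(numerator, denominator):
    """Return a percentage, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return 100.0 * numerator / denominator

def score_child(behavior_codes, task_codes, commands_obeyed, commands_given):
    appropriate = behavior_codes.count("appropriate")
    on_task = task_codes.count("on")
    off_task = task_codes.count("off")  # "na" intervals are omitted from the denominator
    return {
        # appropriate intervals divided by the total number of intervals
        "appropriate": percent(appropriate, len(behavior_codes)),
        # on-task intervals divided by on-task plus off-task intervals
        "on_task": percent(on_task, on_task + off_task),
        # commands obeyed divided by commands directed toward the child
        "compliance": percent(commands_obeyed, commands_given),
    }

# Made-up example: six coded intervals and four teacher commands.
behavior = ["appropriate", "appropriate", "oppositional",
            "appropriate", "unsure", "appropriate"]
task = ["on", "on", "off", "na", "on", "off"]
print(score_child(behavior, task, commands_obeyed=3, commands_given=4))
```

Returning None for an empty denominator is a design choice of this sketch; it keeps a child who received no commands from being scored as 0% compliant.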

Interrater agreement percentages, defined as the number

of agreements divided by the number of agreements plus

disagreements, were calculated in the present study to

conform with the method used in the original school

generalization study. Kappa statistics of interrater

agreement were also calculated in order to control for

chance agreements. Kappa and percent agreement scores were

computed for each of the three behavior categories:

appropriate behavior, compliance, and on-task behavior.
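The two agreement indices can be illustrated with a brief sketch. It assumes the standard two-rater form of Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each coder's marginal frequencies; the interval codes are invented for illustration.

```python
# Percent agreement and Cohen's kappa for two coders rating the same intervals.
# Example codes are hypothetical ("A" = appropriate, "I" = inappropriate).
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Agreements divided by agreements plus disagreements."""
    agreements = sum(x == y for x, y in zip(coder_a, coder_b))
    return agreements / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement; None when chance agreement equals 1."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    categories = set(counts_a) | set(counts_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_chance == 1.0:
        return None  # both coders used a single category; kappa is undefined
    return (p_observed - p_chance) / (1 - p_chance)

primary = ["A"] * 17 + ["I", "I", "A"]
reliability = ["A"] * 17 + ["I", "A", "I"]
print(round(percent_agreement(primary, reliability), 2))
print(round(cohens_kappa(primary, reliability), 2))
```

In this made-up example the coders agree on 18 of 20 intervals, yet kappa is considerably lower than the raw agreement because most intervals fall in a single category. The same mechanism can produce high percent agreement alongside modest kappa values when inappropriate behavior is rare.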


Classroom Evaluation

Parents of children who completed the school

generalization study (McNeil et al., 1991) completed a

followup contact form at the end of treatment in which they

provided permission to be contacted for followup along with

addresses and telephone numbers (see Appendix I). Parents

were contacted with a description of the school

generalization followup by telephone, mail, or in person

during followup assessments at the psychology clinic

scheduled as part of a separate research project. All

agreed to participate and signed an informed consent (see

Appendix J) and a release of information permitting

communication with the child's school (see Appendix K).

Parents who returned for research assessments at the

psychology clinic within one month of scheduled school

observations completed ECBIs during their clinic visit.

Other parents were mailed ECBIs along with return postage.

Parents who failed to return an ECBI after two mailings were

contacted by telephone when possible and read the ECBI over

the telephone.

Once parental consent was obtained, the teacher and/or

principal of each treated child's school were contacted by

telephone and provided with a description of the research.

The participating schools varied in their

requirements for approval of research projects. Procedures

ranged from simply scheduling visits with the teacher to

providing information to a school board for review to

submitting a standardized research application to a county

public school system's research director. Principals were

provided with a copy of the parent's information release for

the school's records. At the time of initial contact the

teacher was questioned as to his/her knowledge of the

identity of the treated child. In cases where the teacher

was unaware of which child had received treatment, he or she

was not informed prior to the assessment in order to elicit

blind ratings. Despite these efforts, most teachers were

aware (generally through parent-teacher conferences or

conversation with previous teachers) of the treated child's

status. Four of 11 teachers were blind to treatment status

at the 12-month followup and 3 of 11 were blind at the 18-

month followup.

Once approval was obtained to collect data, the teacher

was consulted (usually by telephone) to schedule three days

of observation during relatively structured classroom

activity times. These times were selected as those when

teachers were likely to place most demands on students,

because previous school generalization studies (Forehand et

al., 1979) reported difficulty obtaining a high enough rate

of commands to reliably code compliance. Letters containing

passive consent forms for comparison children (see Appendix

A) were given to the teacher to be sent home with every

child in the class a minimum of five school days prior to


On the first scheduled observation day, once the

experimenter selected the comparison children on the basis

of the teacher's 1-5 rankings, the children to be observed

were pointed out, when possible, by someone other than the

teacher (e.g., the classroom aide). In cases where no one

else was present to identify the children, the teacher was

asked to point out several additional children so as to

minimize the likelihood that the teacher would know

specifically which children were being observed. The

teacher was told that, in addition to the treated child, a

cross-section of children were randomly chosen to represent

the range of behavior in the classroom.

Behavioral observations were collected from an

unobtrusive location in the classroom according to the

procedure described in the section on the Classroom Coding

System. On the final day of observations, the teacher was

given rating forms to complete on the treated child and

comparison children. The teacher was asked to complete the

RCTRS, the SESBI, and the Walker-McConnell on the four

children. The teacher also completed a form noting whether

the children's behavior was typical during the observations

and whether any of the children, to the teacher's knowledge,

were receiving or in need of special education services or

treatment for behavior problems (see Appendix L). Upon

completion of the rating scales the teacher was reimbursed

$20 for his or her time.

One treatment child remained in the same classroom and

with the same teacher at first followup as during treatment.

The other ten participants in the 12-month followup were in

different classrooms. All 18-month followup assessments

occurred in different classrooms from 12-month observations.

The same procedure was followed for each follow-up

assessment, including the comparison subject selection

process. No attempt was made to track comparison children

from the 12-month to 18-month followup since they were never

identified by name.

Interrater Reliability Calculations and Coder Training

Primary coders were five graduate students and one

advanced undergraduate research assistant who were blind to

the treatment status of the children. Coders who collected

12-month observations did not code the same treated child at

the 18-month observation. The experimenter served as a

reliability coder and accompanied the primary coder for 67%

of observations (62% at the 12-month followup and 71% at the

18-month followup). When two coders were present, they

coded simultaneously using a dual-jack earphone for the

taped interval signals. The reliability coder's data were

used only in reliability calculations; the data of the

primary coder who was blind to treatment status were always

used in data analysis.

Reliability was calculated by two methods. Percent

agreement was used to replicate reliability results from

McNeil et al. (1991), and the Kappa statistic was used to

control for chance agreement. The Appropriate behavior

variable was calculated first in terms of agreement on

specific misbehaviors (e.g., no agreement coded if primary

coder noted "disruptive" and the reliability coder noted

"out of seat") and then by collapsing categories of

misbehavior into a global agreement on the presence or

absence of inappropriate behaviors. The overall percent

agreement for appropriate behavior was .89 for global

agreement and .85 for specific agreement. Kappa statistics

for appropriate behavior were .56 for global agreement and

.59 for specific agreement. For the compliance variable,

overall percent agreement was .83 and Kappa was .65. For

the on-task variable, overall percent agreement was .88 and

Kappa was .66. Table 1 presents overall reliability

statistics as well as reliabilities calculated per followup,

per coder, and per subject.

Coder training consisted of didactic presentations,

practice coding of videotapes of children in dyadic

interactions with parents and in classroom situations, and

live classroom coding in a laboratory preschool. Coders

were required to attain an 80% reliability criterion in a

90-minute live coding session before collecting data for the

study. Regular training sessions employing didactic

presentations and coding of videotapes were used to reduce

the possibility of observer drift. Training time required

to meet the criterion ranged from 5.5 hours for a graduate

student with extensive prior experience with the Dyadic

Parent-Child Interaction Coding System to 24.5 hours for an

advanced undergraduate student with little prior behavioral

coding experience. The average total training time for

graduate student coders was 9.2 hours with an average of 1.4

hours of didactic training, 1.3 hours of coding from

videotapes, and 6.5 hours of live coding. As mentioned

above, 24.5 total hours of training were required for the

one undergraduate coder; these included 5 hours of didactic

training, 5.5 hours of coding videotapes, and 14 hours of

live coding.

Table 1

Interrater Agreement Results

                 Appropriate     Appropriate
                 (Global)        (Specific)      Comply          On Task
                 %Agree/Kappa    %Agree/Kappa    %Agree/Kappa    %Agree/Kappa

Total            .89/.56         .85/.59         .83/.65         .88/.66
12-Month         .92/.63         .88/.61         .85/.69         .90/.68
18-Month         .87/.52         .83/.58         .80/.59         .86/.64

By Coder
Coder 1          .93/.58         .91/.60         .86/.74         .90/.70
Coder 2          .88/.55         .84/.58         .81/.58         .88/.68
Coder 3          .92/.63         .88/.61         .89/.68         .91/.71
Coder 4          .87/.46         .83/.55         .72/.52         .80/.49
Coder 5          .92/.64         .89/.64         .90/.78         .94/.75
Coder 6          .91/.68         .82/.59         .83/.49         .87/.61

By Subject, 12-Month Followup (a)
Sub. 1           .86/.54         .78/.53         .90/.80         .86/.73
Sub. 2           .94/.68         .92/.70         .89/.78         .96/.75
Sub. 3           .91/.62         .84/.59         .83/.56         .86/.58
Sub. 6           .94/.63         .91/.64         .88/.78         .91/.72
Sub. 7           .92/.69         .86/.61         1.0/---         .95/.82
Sub. 8           .90/.68         .81/.59         .83/.54         .86/.61
Sub. 9           .93/.61         .89/.59         .86/.58         .90/.67
Sub. 12          .91/.38         .90/.56         .63/.36         .89/.61

By Subject, 18-Month Followup (b)
Sub. 1           .83/.49         .80/.57         .74/.51         .82/.62
Sub. 2           .82/.48         .74/.51         .76/.58         .72/.44
Sub. 3           .92/.69         .90/.70         .88/0.0         .89/.74
Sub. 4           .95/.64         .93/.67         .83/.64         .94/.70
Sub. 5           .84/.50         .78/.54         .88/.65         .85/.60
Sub. 6           .89/.67         .81/.62         .86/.55         .91/.79
Sub. 7           .79/.37         .78/.54         .71/.47         .77/.52
Sub. 9           .82/.38         .79/.53         .85/.73         .93/.79
Sub. 10          .89/.36         .86/.52         .75/.57         .94/.75
Sub. 11          .95/.15         .94/.49         .76/.20         .88/.47

Note. Percent agreement is reported on the left and Kappa
is reported on the right of the slash mark.
(a) Subject 4 and subject 5 did not participate in the 12-month
followup classroom observations. Reliability coding was not
done for subject 10 and subject 11.
(b) Subject 12 did not participate in the 18-month followup
classroom observations. Reliability coding was not done
for subject 8.


Treatment Subject Analysis of Variance

Treated children's outcome on dependent variables was

evaluated using a one group, repeated measures ANOVA to

compare scores across the four assessment intervals:

pretreatment, posttreatment, 12-month followup, and 18-month

followup. Polynomial contrasts were then used to evaluate

the nature of the repeated measures effect, e.g., upward or

downward shifts in mean scores across the four time

intervals. For example, all variables could be expected to

show a linear trend from pretreatment to posttreatment. If

this trend continued through both followup intervals, the

polynomial contrast would document a simple linear pattern.

If, however, scores significantly regressed toward

pretreatment levels after the observed posttreatment

improvement, then the polynomial contrast would indicate a

quadratic trend. A more complex pattern, with two

statistically significant "bends" in the data, would be

indicated by a cubic result in the polynomial contrast. To

illustrate the results of the polynomial contrasts, mean

scores for each assessment interval were graphed.
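As an illustration of how polynomial contrasts summarize the shape of a four-point profile, the sketch below applies the standard orthogonal polynomial coefficients for four equally spaced levels to the mean percent compliance values reported in Table 2. Two assumptions should be noted: the coefficients treat the assessments as equally spaced, and the section does not state whether the original analysis adjusted for the unequal intervals between assessments. The significance tests themselves are ordinarily computed on each child's individual contrast score, not on the group means used here.

```python
# Orthogonal polynomial contrasts for four assessment points.
# Coefficients assume equal spacing (an assumption of this sketch).
CONTRASTS = {
    "linear": (-3, -1, 1, 3),
    "quadratic": (1, -1, -1, 1),
    "cubic": (-1, 3, -3, 1),
}

def contrast_scores(means):
    """Weighted sum of the four means for each polynomial contrast."""
    if len(means) != 4:
        raise ValueError("expected one mean per assessment interval (4 values)")
    return {
        name: sum(c * m for c, m in zip(coefficients, means))
        for name, coefficients in CONTRASTS.items()
    }

# Mean percent compliance: pretreatment, posttreatment, 12-month, 18-month (Table 2).
compliance_means = [50.2, 85.6, 78.8, 76.0]
for name, value in contrast_scores(compliance_means).items():
    print(name, round(value, 1))
```

A positive linear score reflects the overall rise from pretreatment, and a negative quadratic score reflects the inverted-U "bend" described in the text.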

Behavioral Observations

Means and standard deviations for the observation

variables across the four assessment intervals are shown in

Table 2. Analysis of variance of behavior observation

variables revealed significant effects for compliance

(F(3,6) = 5.06, p = .04) and appropriate behavior (F(3,6) =

11.72, p = .006), but not for on task behavior (F(3,6) =

2.54, n.s.). It may be recalled that significant results

were not obtained for the on task variable in the original

McNeil et al. school generalization study (1991).

Table 2

Mean Scores for Treated Children on Observational
Measures of Classroom Behavior Across Time

                 Pre-Trt    Post-Trt   Fup 1      Fup 2
                 M          M          M          M          F(3,6)
                 (SD)       (SD)       (SD)       (SD)

% Compliance     50.2       85.6       78.8       76.0       5.06*
                 (27.7)     (18.0)     (15.0)     (17.1)

% Appropriate    61.9       83.8       86.0       75.1       11.72
                 (18.1)     (17.5)     (13.2)     (11.5)

% On Task        68.0       81.1       79.8       67.7       2.54
                 (17.2)     (15.8)     (14.7)     (11.4)

N = 9
*p < .05   **p < .001

Figure 1. Mean percent compliance for treated children
across four assessments, with error bars denoting 1 standard
deviation.

Figure 2. Mean percent appropriate behavior for treated
children across four assessments, with error bars denoting 1
standard deviation.

Figure 3. Mean percent on-task behavior for treated
children across four assessments, with error bars denoting 1
standard deviation.

Figures 1-3 display mean scores for the behavioral

observation measures plotted across the four assessment

intervals. The compliance variable (see Figure 1) showed a

significant linear trend (F(1,8) = 6.51, p = .03) as well as

a significant quadratic trend (F(1,8) = 13.99, p = .006),

documenting the increase in compliance from pretreatment to

posttreatment, followed by a downward "bend" in the data as

scores declined slightly across follow-ups. Compliance

percentages remained within one standard deviation of

posttreatment scores at both followup assessments.

Percentages of appropriate behavior also remained

within one standard deviation of posttreatment percent

appropriate, with even a slight increase between

posttreatment and 12-month followup. However, the percent

appropriate score at second followup is also within one

standard deviation of the pretreatment score, as shown in

Table 2. This parabolic shape is confirmed by the

significant second degree polynomial contrast (F(1,8) =

29.55, p = .0006), or quadratic trend, illustrated in Figure 2.


As mentioned above, the ANOVA results were not

significant for the on-task variable. Scores at

posttreatment and both followups fall within one standard

deviation of pretreatment scores. Percent of on-task

behavior at second follow-up dipped slightly below its

pretreatment level. To summarize, analyses of behavioral

observation measures showed mixed results. On-task behavior

showed no significant improvement at posttreatment or across

followups. For appropriate behavior, significant

posttreatment improvements were maintained at the 12-month

followup but not at the 18-month followup. Compliance

showed the strongest effect and it was maintained across

both followups.

Teacher Ratings

Means and standard deviations across the four

assessment intervals are shown in Table 3 for the SESBI

scales, and the Conners Hyperactivity Index, Conduct Problem

factor, and Inattentive-Passive factor. Analysis of

variance revealed significant effects for both the SESBI

Intensity scale (F(3,6) = 5.26, p = .04) and Problem scale

(F(3,6) = 8.44, p = .02), as well as the Conners Conduct

Problem factor (F(3,6) = 12.06, p = .006). Analysis of

variance revealed near significant effects for the Conners

Hyperactivity Index (F(3,6) = 4.28, p = .06) and Conners

Inattentive-Passive factor (F(3,6) = 3.58, p = .09).

Polynomial contrasts for all of these teacher rating

variables showed significant results for the second degree,

or quadratic contrast. Polynomial contrast values are shown

in Table 4. Figures 4-8 illustrate the parabolic curve of

the plotted means, with pretreatment to posttreatment

decline in rated problems followed by a flat maintenance

curve to the 12-month followup, and finally by degeneration

toward pretreatment levels at the 18-month follow-up. Mean

scores at the 12-month followup were generally equivalent to

posttreatment scores, whereas at the 18-month followup,

scores fell within one standard deviation of the

pretreatment mean for all teacher ratings examined except

the Conners Conduct Problem factor (See Table 3).

Table 3

Mean Scores for Treated Children on Teacher Ratings of
Classroom Behavior Problems Across Time

                  Pre-Trt    Post-Trt   Fup 1      Fup 2
                  M          M          M          M          F(3,6)
                  (SD)       (SD)       (SD)       (SD)

SESBI
  Intensity       153.0      108.0      117.3      142.1      5.26*
                  (17.8)     (29.2)     (38.1)     (45.0)

  Problem         20.8       8.0        9.3        14.9       8.44
                  (6.0)      (5.9)      (8.5)      (9.4)

Revised Conners
  Hyper. Index    2.0        1.3        1.4        1.9        4.28
                  (.63)      (.72)      (.79)      (.89)

  Conduct Probs.  1.61       .84        .99        1.08       12.06
                  (.33)      (.47)      (.59)      (.67)

  Inatt. Passive  1.51       .97        1.30       1.67       3.58
                  (.73)      (.71)      (.94)      (.79)

N = 9
*p < .05







Figure 4. Mean SESBI Intensity Scores for treated children
across four assessments, with error bars denoting 1 standard
deviation.

Figure 5. Mean SESBI Problem Scores for treated children
across four assessments, with error bars denoting 1 standard
deviation.

Figure 6. Mean RCTRS Hyperactivity Index scores for treated
children across four assessments, with error bars denoting 1
standard deviation.

Figure 7. Mean RCTRS Conduct Problem factor scores for
treated children across four assessments, with error bars
denoting 1 standard deviation.

Figure 8. Mean RCTRS Inattentive-Passive factor scores for
treated children across four assessments, with error bars
denoting 1 standard deviation.


Table 4

Polynomial Contrast Values for Mean Teacher Ratings
Across Time

                  1st Degree         2nd Degree         3rd Degree
                  F(1,8)    p        F(1,8)    p        F(1,8)    p

SESBI
  Intensity       0.18      n.s.     16.41     .004     1.31      n.s.
  Problem         3.37      n.s.     19.74     .002     1.56      n.s.

Revised Conners
  Hyper. Index    0.00      n.s.     14.64     .005     0.38      n.s.
  Conduct Probs.  3.68      n.s.     12.74     .007     1.59      n.s.
  Inatt. Passive  0.77      n.s.     5.86      .042     1.64      n.s.

N = 9

Social Skills Ratings

Means and standard deviations are shown in Table 5 for

the Walker-McConnell total and subscale scores across the

four assessment intervals. Higher scores represent greater

social competence on this measure. Analysis of variance

revealed significant effects for the total score (F(3,6) =

17.48, p = .002), and each of the three subscales: Teacher-

Preferred Behavior (F(3,6) = 8.34, p = .01), Peer-Preferred

Behavior (F(3,6) = 7.28, p = .02), and School Adjustment

(F(3,6) = 5.93, p = .03).

Table 5

Mean Scores for Treated Children on Teacher Ratings of
Social Skills Across Time

                  Pre-Trt    Post-Trt   Fup 1      Fup 2
                  M          M          M          M          F(3,6)
                  (SD)       (SD)       (SD)       (SD)

Total             96.8       130.2      135.4      114.0      17.48*
                  (25.8)     (32.5)     (36.0)     (25.5)

Teach. Prefer.    34.1       47.2       47.2       41.9       8.34*
                  (9.7)      (10.2)     (13.1)     (11.2)

Peer Prefer.      37.4       51.4       55.8       47.0       7.28*
                  (15.2)     (14.0)     (15.0)     (9.4)

Sch. Adjust.      21.9       31.6       32.4       25.1       5.93*
                  (7.5)      (9.4)      (10.4)     (6.6)

N = 9
*p < .05   **p < .01

Figure 9. Mean Walker-McConnell social skills total scores
for treated children across four assessments, with error
bars denoting 1 standard deviation.

Figure 10. Mean Walker-McConnell social skills teacher
preferred behavior subscale for treated children across four
assessments, with error bars denoting 1 standard deviation.

Figure 11. Mean Walker-McConnell social skills peer
preferred behavior subscale for treated children across four
assessments, with error bars denoting 1 standard deviation.

Figure 12. Mean Walker-McConnell social skills school
adjustment subscale for treated children across four
assessments, with error bars denoting 1 standard deviation.

The results of polynomial contrasts for this social

skills measure differed somewhat from those reported above

for the disruptive behavior variables. Both the first and

second degree contrasts were significant for the

Walker-McConnell total score, Teacher-Preferred Behavior subscale,

and Peer-Preferred subscale. Plotted mean scores, shown in

Figures 9-11, illustrate that treatment gains evident at

posttreatment were maintained or improved at the 12-month

followup, followed by a relatively mild decline at the 18-

month followup. Scores at both followups remained within

one standard deviation of posttreatment scores, and all

scores at the 12-month followup were more than one standard

deviation above pretreatment levels. At the 18-month

followup, scores remained within one standard deviation of

posttreatment outcome, but also fell within one standard

deviation of the pretreatment level. For the School

Adjustment subscale of the Walker-McConnell, only the second

degree, or quadratic contrast, achieved significance. This

quadratic pattern, illustrated in Figure 12, is consistent

with the other Walker-McConnell results in showing

improvement from posttreatment to 12-month followup.

However, there is a steeper decline in scores from 12-month

to 18-month followup than for the other social skills

variables. Polynomial contrast results for the Walker-

McConnell are shown in Table 6.

Parent Ratings

Mothers' ratings on the ECBI were available for a

subset of children at both followup assessments. Mean ECBI

scores across the four time intervals are shown in Table 7.

Analysis of variance revealed significant effects for both

the Intensity scale (F(3,5) = 6.55, p = .03) and the Problem

scale (F(3,4) = 11.81, p = .02). Polynomial contrasts for

the ECBI Intensity scale were significant for the first

(F(1,7) = 14.67, p = .006), second (F(1,7) = 16.72, p =

.005), and third degree (F(1,7) = 10.20, p = .015). As

shown in Figure 13, ECBI intensity scores showed a sharp

decline from pre- to posttreatment, followed by a gradual

but significant increasing trend across the followup

assessments. Polynomial contrasts for the Problem scale

were significant for the first degree (F(1,6) = 17.14, p =

.006), and second degree (F(1,6) = 5.95, p = .05), and

approached significance for the third degree contrast

(F(1,6) = 5.06, p = .06). The plotted problem score means,

shown in Figure 14, show a steep pre- to posttreatment

decline followed by a significant increase at first followup

and a slight decline at second followup. Scores on the

ECBI Intensity and Problem scales are above published

clinical cutoffs at pretreatment and well within the normal

range at the three other assessment intervals.

To evaluate the effect of outliers or of distinct

subsets of subjects on the overall treatment group results,

plots of individual subjects' scores across assessments were

generated for each variable. No readily discernible

patterns emerged from these data, which are presented in

Appendix M. The number of children who maintained 30% gains

over pretreatment scores was also calculated for each

variable at the 12-month and 18-month followups, adopting

the criteria used by McNeil et al. (1991) in the original

school generalization study to define clinically significant

improvement. The current findings, shown in Table 8, echo

the results of the group ANOVAs, with percent compliance and

the RCTRS Conduct Problem factor showing the strongest

maintenance effects across followups. Teacher ratings

showed greater decrements between the 12-month and 18-month

followups in the number of children maintaining significant

improvements than did behavioral observation variables.
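The 30% criterion can be sketched as a simple proportional-change rule. The exact formula used by McNeil et al. (1991) is not given in this section, so the definition below (change relative to the pretreatment score, with the direction of improvement depending on the measure) is an assumption, and the example scores are invented.

```python
# Hypothetical sketch of a 30% improvement criterion relative to pretreatment.

def maintained_gain(pre, followup, higher_is_better, threshold=0.30):
    """True if the followup score improves on pretreatment by at least `threshold`."""
    if pre == 0:
        raise ValueError("pretreatment score of zero: proportional change is undefined")
    if higher_is_better:
        change = (followup - pre) / pre   # e.g., percent compliance, social skills
    else:
        change = (pre - followup) / pre   # e.g., rated behavior problems
    return change >= threshold

# Invented scores for illustration only.
print(maintained_gain(pre=50.0, followup=70.0, higher_is_better=True))
print(maintained_gain(pre=150.0, followup=120.0, higher_is_better=False))
```

Counting the children for whom this function returns True at each followup would give tallies of the kind reported in Table 8, under the stated assumption about the criterion.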

Table 6

Polynomial Contrast Values for Mean Walker-McConnell Social
Skills Ratings Across Time

                  1st Degree         2nd Degree         3rd Degree
                  F(1,8)    p        F(1,8)    p        F(1,8)    p

Total             7.23                                  0.0       n.s.
Teacher-Prefer    10.23                                 0.23      n.s.
Peer-Prefer       5.24                                  0.04      n.s.
School Adjust.    1.77                                  0.0       n.s.

Note. N = 9

Treatment vs. Comparison Children

The original McNeil et al. (1991) school generalization

study used multivariate analysis of variance to explore

differences from pretreatment to posttreatment among treated

children and matched untreated control subjects. Because

followup assessments included the same treated children at

both assessments but different comparison children for each

assessment, a standard multivariate analysis of variance

with two groups (treatment and control) and time as a

repeated factor was not applicable. Instead, difference

scores were created and analyzed for each of the dependent variables.


Table 7

Mean Scores for Treated Children on the Eyberg Child
Behavior Inventory

                      Pre-Trt    Post-Trt   Fup 1      Fup 2
              N       M          M          M          M          F
                      (SD)       (SD)       (SD)       (SD)       (df)

Intensity     8       169.9      101.9      114.8      119.1      6.55
                      (32.4)     (14.5)     (18.0)     (13.6)     (3,5)

Problem       7       21.1       4.8        10.3       7.1        11.81*
                      (6.2)      (3.2)      (10.8)     (5.8)      (3,4)

*p < .05   **p < .01

Three difference scores were created per dependent

variable at each followup by subtracting the score of each

comparison child from that of the treated child (i.e.,

treated child's score minus "fewest behavior problems"


comparison subject's score, treated child's score minus

"average behavior problems" comparison subject's score, and

treated child's score minus "most behavior problems"

comparison subject's score).
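The construction of the three difference scores can be sketched as follows. This is only an illustration under stated assumptions: the function name, dictionary keys, and example percentages are hypothetical and are not taken from the study data.

```python
from typing import Dict


def difference_scores(treated: float, fewest: float,
                      average: float, most: float) -> Dict[str, float]:
    """Return the three treated-minus-comparison difference scores
    for one dependent variable at one followup."""
    return {
        "treated_minus_fewest": treated - fewest,
        "treated_minus_average": treated - average,
        "treated_minus_most": treated - most,
    }


# Invented percent-compliance scores for one treated child and the three
# comparison children observed in the same classroom.
example = difference_scores(treated=80.0, fewest=90.0, average=82.0, most=75.0)
```

Each treated child contributes one such vector of three differences per dependent variable, and these vectors are the units analyzed in the multivariate tests.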

Table 8

Number of Children Maintaining 30% Improvement Over
Pretreatment Baseline at Followup

                            Followup 1    Followup 2

Behavioral Observations
  % Compliance                  6             7
  % Appropriate                 4             4
  % On Task                     2             4

Conduct Ratings
  SESBI Intensity               5             3
  SESBI Problem                 8             5
  RCTRS Hyp. Index              6             4
  RCTRS Conduct Prob.           8             6

Social Skills Ratings
  Total                         8             5
  Teacher Preferred             7             3
  Peer Preferred                5             3
  School Adjustment             6             5

Note. N = 10 at 12-month Followup; N = 11 at 18-month Followup.





[Figure 13: horizontal axis shows the four assessments (Pre-Trt, Post-Trt, Fup 1, Fup 2).]

Figure 13. Mean ECBI Intensity Scores for treated children

across four assessments, with error bars denoting 1 standard

deviation (N = 8).





Figure 14. Mean ECBI Problem Scores for treated children

across four assessments, with error bars denoting 1 standard

deviation (N = 7).

The hypothesis that the treated group did not differ

significantly from comparison children (i.e., the mean

difference between comparison children and treated children

was equal to zero) was tested for each dependent variable

using Hotelling's T2 separately for the 12-month and the 18-

month followups. Next the magnitude of the difference

across time between the treated group and comparison groups

was examined using Hotelling's T2 for dependent samples, a

multivariate generalization of the paired t-test (Tabachnick

& Fidell, 1989, p. 447). This tested the hypothesis that the

vector of treatment-to-comparison difference scores found at

the 18-month followup would be identical to that of the 12-

month followup results.
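The one-sample form of Hotelling's T2 described above can be sketched in a few lines. This is a minimal illustration, not the analysis code used in the study: the function names are hypothetical, the data are invented, and the conversion to F uses the standard relation F = (n - p) / (p(n - 1)) x T2 with (p, n - p) degrees of freedom. Whether the T2 values reported in this chapter are raw or F-transformed is not stated in the text, so the sketch returns both.

```python
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def hotelling_t2(rows: Sequence[Sequence[float]]
                 ) -> Tuple[float, float, Tuple[int, int]]:
    """Test that the mean vector of `rows` equals zero.

    Each row holds one child's difference scores. Returns
    (T2, F, (df1, df2)). Requires more rows than columns.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    # Sample covariance matrix with n - 1 in the denominator.
    cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    weighted = solve_linear(cov, means)          # S^-1 * mean vector
    t2 = n * sum(m * w for m, w in zip(means, weighted))
    f_stat = (n - p) / (p * (n - 1)) * t2
    return t2, f_stat, (p, n - p)


# Invented two-variable difference scores for four hypothetical children.
t2, f_stat, df = hotelling_t2([[2.0, 2.0], [0.0, 2.0], [1.0, 3.0], [1.0, 1.0]])
```

For the dependent-samples comparison of the two followups, the same function could be applied to each child's 18-month difference vector minus the 12-month difference vector, since a zero mean for that change vector corresponds to identical difference scores at both followups.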

To control for multiple comparisons, Holm's

sequentially rejective procedure was applied (Holm, 1979).

Dependent variables were grouped into three sets: behavior

observation variables, teacher behavior problem ratings, and

social skills ratings. The multivariate hypothesis for each

dependent variable required a significance level of .05

divided by the number of variables in the set (i.e., .05/3 =

.017 for behavioral observations; .05/5 = .01 for teacher

ratings; and .05/4 = .0125 for social skills ratings). To

determine the protected alpha level for pairwise

comparisons, pairwise test values were first sequenced from

largest to smallest. The largest two test values required

an alpha equal to the multivariate test alpha divided by two

(e.g., .01/2 = .005). Subsequent pairwise comparisons could

take the multivariate alpha as their significance level.
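The alpha levels described above can be traced with a short sketch. It follows the sequence as worded in this section (the two largest pairwise test values at half the multivariate alpha, the remainder at the full multivariate alpha) rather than the textbook Holm schedule, and it makes two assumptions that are not stated in the text: pairwise tests are ordered by ascending p-value, which matches ordering by test value when the degrees of freedom are equal, and testing stops at the first nonsignificant comparison. All function names are hypothetical.

```python
from typing import List


def multivariate_alpha(family_alpha: float, n_variables: int) -> float:
    """Alpha for each multivariate test within a set of variables."""
    return family_alpha / n_variables


def pairwise_thresholds(mv_alpha: float, n_comparisons: int = 3) -> List[float]:
    """Thresholds in test order: first two at mv_alpha / 2, rest at mv_alpha."""
    return [mv_alpha / 2 if rank < 2 else mv_alpha
            for rank in range(n_comparisons)]


def sequential_decisions(p_values: List[float], mv_alpha: float) -> List[bool]:
    """Return, in the original order, whether each comparison is significant."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    thresholds = pairwise_thresholds(mv_alpha, len(p_values))
    rejected = [False] * len(p_values)
    for rank, idx in enumerate(order):
        if p_values[idx] <= thresholds[rank]:
            rejected[idx] = True
        else:
            break  # assumed: stop at the first nonsignificant comparison
    return rejected


# Set-level alphas: .05/3, .05/5, and .05/4 for the three variable sets.
alphas = [multivariate_alpha(0.05, k) for k in (3, 5, 4)]

# Pairwise p-values reported later in this chapter for the 18-month on-task
# variable: few problems = .002, average = .0004, most problems = .03.
on_task = sequential_decisions([0.002, 0.0004, 0.03], multivariate_alpha(0.05, 3))
```

With these inputs the first two comparisons meet the (.05/3)/2 threshold and the third does not meet .05/3, which agrees with the pattern reported for the on-task variable at the 18-month followup.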

12-Month Followup

Behavioral observations. Mean scores for the treated

children and comparison children on the behavioral

observation measures at each followup are shown in Table 9.

Within the 12-month followup, only the appropriate behavior

variable showed a significant difference across mean

difference scores (T2 = 4.24, p = .006). This finding was

followed up with pairwise comparisons, of which the

comparison between the treated child and "few problems"

comparison child was sizable, but not statistically

significant (t = -2.02, p = .07). Pairwise comparisons for

the treated child versus the other two comparison groups did

not approach significance for the appropriate behavior

variable. Tests of the compliance variable (T2 = 1.05,

n.s.) and the on-task variable (T2 = 1.52, n.s.) were not

significant, perhaps due to the relatively restricted

variation and large standard deviations across comparison

groups. As Table 9 shows, the treatment group showed

compliance mean scores slightly below the "most problems"

comparison group, while their mean appropriate and on-task

scores were slightly above those of the "most problems"

and "average problems" comparison groups at the 12-month

followup. In summary, the treated group did not differ

significantly from the comparison children on observational

measures of compliance and on-task behavior at the 12-month followup.

The data were noteworthy for a high degree of overlap

between the "average problems" and "most problems"

comparison groups.

Table 9

Mean Percentages on Observational Measures per Followup

                     12-Month Followup      18-Month Followup
                     Mean      S.D.         Mean      S.D.

Compliance
  Treated            77.6      14.6         77.5      17.2
  Low Problems       91.3       9.9         92.5      12.6
  Avg. Problems      81.4      16.4         81.7      17.6
  High Problems      79.6      19.4         71.4      23.7

Appropriate
  Treated            87.1      12.9         77.4      11.5
  Low Problems       96.3       3.5         92.5       6.7
  Avg. Problems      83.1       7.1         92.2       4.8
  High Problems      85.0       9.6         84.5      12.2

On Task
  Treated            81.3      14.6         71.1      12.7
  Low Problems       90.6       6.7         90.1       9.0
  Avg. Problems      79.5      12.1         85.9       8.6
  High Problems      79.4      11.8         79.4      10.6

Note. N = 10 at 12-month Followup; N = 11 at 18-month Followup.

Teacher ratings. Table 10 shows mean scores for the

treatment and comparison children on the SESBI and Conners

for each followup assessment. Analyses of the 12-month data

showed significant differences across the vector of treated-

minus-comparison scores for both scales of the SESBI and all

three of the Conners factors evaluated. Hotelling's T2

values and significance levels are shown in Table 11.

Pairwise comparisons revealed that the treatment-versus-

"average" comparison was not significant for any variable,

meaning that treated children did not differ significantly

from this group. The contrasts between treated and "most

problems" comparison children were sizable for the SESBI

Intensity scale (t = 3.04, p = .01), the Conners

Hyperactivity Index (t = 3.26, p = .01), and the Conners

Conduct Problem factor (t = 3.02, p = .01) (indicating

that treated children showed fewer problems), but these

comparisons were not significant at the alpha levels

required to control for multiple comparisons ((.05/5)/2 =

.005). Treatment-versus-"few problem" comparisons were

significant for the SESBI Intensity scale (t = 3.87, p =

.004), and the Conners Hyperactivity Index (t = 4.15, p =

.002), indicating that treated children showed more problems

than the "few problems" comparison group on these variables.

Comparisons were also sizable for the SESBI Problem scale (t

= 2.48, p = .04), and the Conners Conduct Problem factor (t

= 3.40, p = .008), although these pairwise comparisons were

not significant at protected alpha levels. Taken together,

these results indicate that at the 12-month followup,

treated children fell within the range of "average"

comparison children on all teacher rating measures of

classroom behavior problems. Mean scores indicated lower

levels of behavior problems than the "most behavior

problems" comparison children on some variables, but these

differences were not significant at protected levels.

Scores on the SESBI Intensity scale and the Conners

Hyperactivity Index did indicate significantly more problems

than those of the "fewest behavior problems" comparison

group.

Table 10

Mean Scores on Teacher Ratings per Followup

                               Followup 1          Followup 2
                               Mean     S.D.       Mean     S.D.

SESBI Intensity
  Treated                      113.4    38.0       138.7    43.3
  Low Problems                  59.4    22.8        49.6    16.1
  Avg. Problems                105.1    24.4        98.9    34.0
  High Problems                142.9    20.2       142.4    45.4

SESBI Problem
  Treated                        8.5     8.4        15.3     8.8
  Low Problems                   1.1     2.3         1.2     2.1
  Avg. Problems                  6.9     7.6         9.0     5.1
  High Problems                 14.3     5.7        18.4     7.5

Conners Hyperactivity Index
  Treated                       1.41     .75        1.87     .82
  Low Problems                   .24     .32         .24     .20
  Avg. Problems                 1.06     .62        1.12     .64
  High Problems                 1.92     .53        1.86     .83

Conners Conduct Problem
  Treated                        .97     .56        1.02     .68
  Low Problems                   .22     .24         .19     .20
  Avg. Problems                  .81     .62         .85     .75
  High Problems                 1.45     .50        1.65    1.04

Conners Inattentive Passive
  Treated                       1.27     .89        1.67     .72
  Low Problems                   .64     .55         .43     .56
  Avg. Problems                 1.16     .75        1.10     .81
  High Problems                 1.76     .62        1.38     .79

Note. N = 10 at 12-month followup; N = 11 at 18-month followup.


Social skills ratings. Mean scores for the treated and

comparison children at each followup on the total and

subscale scores of the Walker McConnell are shown in Table

12. Higher scores represent greater social competence on

this measure. At the 12-month followup assessment,

significant variation was found across the mean difference

scores on the Walker-McConnell total score (T2 = 6.38, p =

.002), and the Teacher-Preferred Behavior subscale (T2 =

3.52, p = .01).

Table 11

Hotelling's T2 Values for Teacher Rating Variables at
12-Month Followup

                          Hotelling's T2    p-value

SESBI Intensity                6.40          .002
SESBI Problem                  5.84          .003
Conners Hyp. Index            14.61          .001
Conners Cond. Prob.            3.94          .008
Conners Inatt. Pass.           3.31          .013

Note. N = 10

Pairwise comparisons revealed that treated children

scored significantly higher on social competence than the

"most problems" comparison group on both the total score (t

= 4.79, p = .002), and the Teacher Preferred Behavior

subscale (t = 3.14, p = .01). As Table 12 shows, treated

children showed mean scores that were very similar to the

"average problems" comparison group and uniformly higher

(i.e., more competent) than the "most problems" comparison

group for all subscales of the Walker-McConnell at the 12-

month followup. Hotelling's T2 values were sizable for the

Peer Preferred Behavior subscale (T2 = 2.81, p = .02) and

the School Adjustment subscale (T2 = 2.82, p = .02), but

these did not reach significance at the protected alpha

level (i.e., .05/4 = .0125).

Table 12

Mean Scores on Walker McConnell per Followup

                     12-Month Followup      18-Month Followup
                     Mean      S.D.         Mean      S.D.

Total Score
  Treated            137.9     34.8         115.0     43.3
  Few Problems       156.5     37.5         176.9     16.1
  Avg. Problems      141.8     30.0         129.7     34.0
  Most Problems      107.2     23.1         114.5     45.4

Teacher Preferred
  Treated             48.4     12.9          42.6     11.4
  Few Problems        60.5     12.3          64.7      8.6
  Avg. Problems       51.9     10.3          46.3      9.1
  Most Problems       39.2      6.1          39.1     11.9

Peer Preferred
  Treated             56.6     14.4          46.6      8.6
  Few Problems        55.9     17.6          67.7     11.4
  Avg. Problems       57.0     14.4          53.2     14.6
  Most Problems       42.8     13.0          49.7     10.7

School Adjustment
  Treated             32.9      9.9          25.7      6.1
  Few Problems        40.1      9.8          44.5      6.4
  Avg. Problems       32.9      9.8          30.0      8.7
  Most Problems       25.2      7.0          26.6      7.1

Note. N = 10 at 12-month Followup; N = 11 at 18-month Followup.

18-Month Followup

Behavioral observations. Mean scores for the 18-month

followup behavioral observation measures are shown in Table

9 for the treated and comparison children. Only the on-task

measure (T2 = 3.04, p = .008) showed significant variation

within the vector of treatment-versus-comparison difference

scores. Pairwise comparisons indicated that treated

children showed lower percentages of on-task behavior than

the "few problems" comparison group (t = 4.34, p = .002),

and the "average" comparison group (t = 5.29, p = .0004)

at the 18-month followup. The treatment group's on task

mean scores were also lower than those of the "behavior

problem" comparison group, although this comparison (t =

-2.57, p = .03) was not significant at the protected alpha

level (i.e., (.05/3)/2 = .008). The treatment group's mean

compliance score fell solidly within the middle range of the

comparison groups (see Table 9) at the 18-month followup,

and the Hotelling's T2 test (T2 = .73, n.s.) did not

approach significance. On the appropriate variable, the

treatment group had the lowest mean percentage of

appropriate behavior (see Table 9), but the overall effect

(T2 = 1.73, p = .04) was not significant at the protected

alpha level (i.e., .05/3 = .017). Taken together, these

results indicate that the treated children fell somewhere

within the observed range of classroom behaviors at the 18-

month followup in terms of compliance and percentage of

appropriate behavior, while their percentage of on-task

behavior fell near or below the range of the "behavior

problems" comparison group.

Table 13

Hotelling's T2 Values for Teacher Rating Variables
at 18-Month Followup

                          Hotelling's T2    p-value

SESBI Intensity                5.30          .002
SESBI Problem                  7.52          .001
Conners Hyp. Index             5.78          .001
Conners Cond. Prob.            2.74          .011
Conners Inatt. Pass.           3.17          .007

Note. N = 11

Teacher ratings. Table 10 shows mean scores at the 18-

month followup for the treated and comparison children on

the SESBI and Conners rating scales. Mean scores show that

treatment group scores were similar to those of the "most

problems" comparison group on every measure. Hotelling's T2

tests indicated significant effects for each variable, and

these values are shown in Table 13.

Pairwise comparisons indicated that the treatment group

showed significantly more problems than the "few problems"

comparison group on all measures evaluated: the SESBI

Intensity scale (t = 6.67, p = .0001), the SESBI Problem

scale (t = 5.44, p = .0003), the Conners Hyperactivity

Index (t = 7.15, p = .0001), the Conners Conduct Problem

factor (t = 4.22, p = .002), and the Conners Inattentive-

Passive factor (t = 5.39, p = .0003). Pairwise comparisons

also separated the treatment group from the "average"

comparison group on the SESBI Intensity score (t = 3.62, p =

.005). Pairwise comparisons contrasting the treatment and

"average" groups were sizable, but did not reach protected

significance levels (i.e., (.05/5)/2 = .005) for the SESBI

Problem score (t = 2.57, p = .03), the Conners Hyperactivity

Index (t = 3.24, p = .009), and the Conners Inattentive-

Passive factor (t = 2.24, p = .05). Only for the Conners

Conduct Problem factor did the treatment group mean score

fall nearer to the "average problems" comparison group than

to the "most problems" comparison group at the 18-month

followup. For this variable the pairwise comparison

contrasting the treatment group to the behavior problem

comparison group was sizable (t = 2.76, p = .02),

although not significant at the protected alpha level.

Social skills ratings. Results on the Walker-

McConnell social skills ratings at second followup (shown in


Table 12) closely resemble the other teacher rating measures

reported above. Hotelling's T2 values were significant for

the total score and each of the three subscales, and post

hoc pairwise comparisons indicated that the treatment group

showed lower levels of social competence than the "few

problems" comparison group on each variable. No other

pairwise comparisons approached significance. Table 14

displays the values of the overall test and significant

pairwise comparison values for the Walker-McConnell. It

appears that the treatment group children fall within the

range of the "average" and "most problems" comparison groups

at the 18-month followup in terms of social skills, and

clearly below the social skills level of children selected

as presenting few classroom behavior problems.

Table 14

Hotelling's T2 Values for Social Skills Ratings at
18-Month Followup

                        T2        p         t*         p

Walker McConnell
  Total                 5.74     .001       7.44      .0001
  Teacher-Preferred     4.26     .003       6.08      .0001
  Peer-Preferred        3.77     .004       5.52      .0003
  School Adjustment    11.81     .0001    -10.65      .0001

Note. N = 11
* t-test for treatment group versus "few problems" comparison group