
Citation 
 Permanent Link:
 https://ufdc.ufl.edu/UF00097732/00001
Material Information
 Title:
 The Effects of statistical information on clinical judgement
 Creator:
 Moxley, Ann Weimer, 1946
 Publication Date:
 1970
 Copyright Date:
 1970
 Language:
 English
 Physical Description:
 vii, 63 leaves. : ill. ; 28 cm.
Subjects
 Subjects / Keywords:
 Clinical judgment ( jstor )
 Clinical psychology ( jstor )
 Conditional probabilities ( jstor )
 Discriminants ( jstor )
 Graduate students ( jstor )
 Length of stay ( jstor )
 Propriety ( jstor )
 Psychotherapy ( jstor )
 Rate bases ( jstor )
 Z score ( jstor )
 Clinical psychology ( lcsh )
 Dissertations, Academic -- Psychology -- UF ( lcsh )
 Psychology thesis Ph. D ( lcsh )
 Statistical decision ( lcsh )
 Genre:
 bibliography ( marcgt )
nonfiction ( marcgt )
Notes
 Thesis:
 Thesis--University of Florida, 1970.
 Bibliography:
 Bibliography: leaf 37.
 Additional Physical Form:
 Also available on World Wide Web
 General Note:
 Manuscript copy.
 General Note:
 Vita.
Record Information
 Source Institution:
 University of Florida
 Holding Location:
 University of Florida
 Rights Management:
 Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for nonprofit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
 Resource Identifier:
 022060155 ( AlephBibNum )
 13493553 ( OCLC )
 ACY4899 ( NOTIS )

THE EFFECTS OF STATISTICAL INFORMATION
ON CLINICAL JUDGEMENT
By
ANN WEIMER MOXLEY
A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF
THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1970
ACKNOWLEDGEMENTS
The author wishes to express her gratitude to her committee
chairman, Dr. Paul Satz, for his comments, patience, and inspiration
throughout the initiation, implementation and completion of this research.
The author is also indebted to the other members of her
committee, Dr. Audrey Schumacher, Dr. Ben Barger, Dr. Marvin Shaw, and
Dr. Donald Childers, for their assistance and criticisms. The author
wishes to express her special thanks to her husband, Jim, whose moral
support was invaluable and most appreciated.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
INTRODUCTION
METHOD
RESULTS
DISCUSSION
SUMMARY
APPENDIX A: INSTRUCTIONS
APPENDIX B: SUMMARY OF NEWMAN-KEULS TESTS
REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES
Table
1. Actual and test-predicted therapeutic outcome
2. Design schematic
3. Proportion of correct judgments
4. Summary of analysis of variance of accuracy
5. Mean confidence scores
6. Summary of analysis of variance of confidence
7. Correlation coefficients for appropriateness
8. Summary of analysis of variance of appropriateness correlations
LIST OF FIGURES
Figure
1. Accuracy by information levels
2. Confidence by information levels
3. Appropriateness correlations between accuracy and confidence by information levels
Abstract of Dissertation Presented to the Graduate Council
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
at the University of Florida
THE EFFECTS OF STATISTICAL INFORMATION
ON CLINICAL JUDGEMENT
By
Ann Weimer Moxley
December, 1970
Chairman: Dr. Paul Satz
Major Department: Psychology
The present study was designed to look at the effects of adding
quantitative and qualitative data to a relevant clinical judgment
task. In essence, it compared judges with varying degrees of clinical
experience to actuarial prediction methods. The study also
attempted to train judges to use actuarial information to improve
their prediction accuracy.
Twelve judges representing three levels of clinical experience
made postdictive judgments on the length of stay in psychotherapy
(short or long) from a sample of MMPI profiles of clients seen in a
university mental health service. Judgments were made under four
conditions in which qualitative and quantitative information was
added incrementally at each level. The three levels of judges'
experience were professional clinical psychologists, "sophisticated"
third-year clinical psychology graduate students trained in
statistical decision theory, and "unsophisticated" third-year clinical
psychology graduate students without any training in statistical
decision theory.
Accuracy increased over levels of information but there were no
differences in accuracy for the three levels of experience. A significant
group by information level interaction demonstrated some
group effects due to a lower proportion of correct judgments for the
less experienced judges under conditions involving the least amount
of information.
Judges became more confident in their judgments as they received
more information. Appropriateness, defined as accuracy weighted by
confidence and measured by correlation coefficients between accuracy
and confidence, increased substantially as increments of information
were added. The group trained in statistical decision theory tended
to make the most appropriate judgments and the least experienced
group of graduate students tended to make the least appropriate judgments.
The present study showed that clinicians can use quantitative
data to improve their own judgmental ability and to predict more
accurately than actuarial data alone. Also, since those judges with
the most experience in using actuarial tasks tend to be the most
appropriate in their judgments, this implies that clinicians can also
be trained to be more appropriate and to know when their judgments
are more likely to be accurate.
INTRODUCTION
The objectives of the present study were twofold. The primary
objective was to examine the effects of test and nontest (statistical)
information on the judgmental process. The secondary objective, and
the focus for implementing the primary objective, was to study the
psychological attributes of individuals who stay only a short time in
therapy versus those who remain a long time. That is, the objective
of studying the judgmental process was couched in a real and relevant
situation, length of stay in psychotherapy, which is a meaningful and
pressing problem for psychologists today. However, this secondary
objective was minor in relation to the major issue of examining the
clinical judgment process.
Clinical Versus Actuarial Prediction
Ever since Paul Meehl's book, Clinical Versus Statistical Prediction,
clarified the issue of clinicians' predictions versus
actuarial predictions, there have been numerous studies comparing
these two prediction methods. As Meehl (1954) points out, however,
the two methods need not be mutually exclusive since the clinician can
incorporate actuarial methods and data into his prediction process.
Many studies have focused not only on comparing clinicians to statistical
formulae but also on improving the clinician's ability to
predict by giving him useful statistical information and training him
to use this information.
In general, the studies which compared clinicians to actuarial
methods found that the actuarial methods were either superior to clinicians
or equal in efficiency to clinicians (Meehl, 1965). With the
exception of one study, the clinician has shown no superiority to
purely quantitative actuarial prediction. The one study which did
find clinicians superior (Lindzey, 1965) used one to two clinicians
and its application is somewhat questionable. One reason the clinician
has not been superior to actuarial methods is that he has seldom
been given the opportunity to incorporate the actuarial information in
formulating his final decision. He has been at a disadvantage so that
the demonstrated superiority of the actuarial method may be due to the
experimental design rather than to an actual superiority of statistical
techniques. Also, the information available to the clinician has often
been based on nonquantitative data such as interview material, case
history data, and projective tests.
Holtzman (1960) separates the clinician's diagnostic task into
three phases: (1) collection of information; (2) preparation and translation
of this information for analysis; (3) interpretation of this
information. As he points out, actuarial methods, and specifically
the computer, are superior to the clinician in processing information
once the primary coding has been done. The clinician is still
superior at collecting information and at interpreting it because at
present the computer lacks the appropriate rules and parameters for
interpretation. Thus, studies which emphasize aspects of prediction
suitable for actuarial methods do not use the clinician's talents to
best advantage. It is when skilled clinicians use familiar methods
to predict a criterion they know something about that they have the
most success (Holt, 1958). This includes their having a rich body of
data and systematic actuarial procedures at their disposal in addition
to their own experience, intuition, and knowledge.
Recent studies suggest that as the amount of clinical experience
increases, prediction accuracy decreases (Goldberg, 1959; Oskamp,
1962; Shagoury, 1969; Shagoury & Satz, 1969). These studies compared
trained clinicians with a professional degree to clinical psychology
graduate students and even to nonprofessional groups, such as secretaries,
and have found that the trained clinicians were not superior
to the other groups. An explanation of this finding is that the more
experienced clinician has developed a particular way of looking at
data which interferes with his making unbiased, objective decisions.
Another aspect of research in the area of clinical versus statistical
predictions is the confidence clinicians place in their judgments
and the appropriateness of their predictions. Appropriateness
is a measure of confidence weighted by accuracy which was developed
by Adams (1957). Confidence in judgments also differs between groups
of graduate students and trained clinicians, with the trained psychologists
being less confident in their judgments (Goldberg, 1959; Oskamp,
1962). When the measurement of appropriateness of the judgment is
introduced, however, the trained clinicians are more appropriate in
their confidence levels than are either graduate students or nonprofessionals
(Oskamp, 1962; Shagoury, 1969). That is, clinicians
are more confident of their correct decisions and less confident of
their incorrect decisions. The amount of information available to
the judge does not correlate with his predictive accuracy, but
increased amounts of information substantially increase confidence
levels (Goldberg, 1968).
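As a rough illustration of how appropriateness might be quantified, the sketch below correlates a judge's record of correct and incorrect decisions with the confidence ratings attached to those decisions. It assumes that appropriateness is taken as a Pearson correlation between accuracy (1 = correct, 0 = incorrect) and confidence, as the abstract of the present study describes; the function name, the five-point rating scale, and the data are hypothetical and are not drawn from the studies cited.

```python
import math


def appropriateness(correct, confidence):
    """Pearson correlation between accuracy (1/0) and confidence ratings."""
    n = len(correct)
    mean_c = sum(correct) / n
    mean_f = sum(confidence) / n
    cov = sum((c - mean_c) * (f - mean_f) for c, f in zip(correct, confidence))
    var_c = sum((c - mean_c) ** 2 for c in correct)
    var_f = sum((f - mean_f) ** 2 for f in confidence)
    return cov / math.sqrt(var_c * var_f)


# Hypothetical judge: confident when right, hesitant when wrong.
hits = [1, 1, 0, 1, 0, 0]
ratings = [5, 4, 2, 5, 1, 2]
r = appropriateness(hits, ratings)
print(f"appropriateness r = {r:.2f}")
```

A coefficient near +1 would describe a judge whose confidence tracks his accuracy; a coefficient near zero or below would describe a judge whose confidence is unrelated to, or runs against, his accuracy.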
Goldberg (1968) also discusses the nature of the judgmental
process. He questions whether judges use simple decision-making
models such as linear models, or complex processes such as configural
models. In an analysis of clinicians' judgments he found that a
linear model usually reproduced 90 to 100 per cent of the reliable
judgmental variance on most decision-making tasks even though the
clinicians generally felt that they used more complex, configural
models.
Using Statistical Information to Increase Prediction Accuracy
Training in the use of statistical information has been shown
to improve judgmental accuracy. In a study by Oskamp (1962), clinicians
were able to improve their ability to distinguish psychiatric
and medical patients on the basis of their Minnesota Multiphasic
Personality Inventory (MMPI) profiles when they were provided with
actuarial rules. The statistical formula predicted with 75 per cent accuracy
and the clinicians, after training, were able to reach this 75
per cent accuracy level.
Goldberg (1968) trained judges by giving them a formula and
optimum cutting score for distinguishing neurotic from psychotic MMPI
profiles. The judges were told that the statistical information
predicted with 70 per cent accuracy and they were encouraged to use
this information along with any other information they thought would
improve their prediction accuracy. Goldberg found that after eight
weeks of "value training," the judges, on the average, increased their
accuracy from between 52 per cent to 65 per cent to approximately 70
per cent. This was the only type of training that substantially improved
accuracy. Thus, feedback is necessary if the clinician is to
learn how to improve his decision-making techniques.
Another useful type of statistical information is the incidence,
or base rate, of a given trait in the population available to the
clinician. Goldberg (1959), for example, had judges distinguish brain-damaged
patients from functional patients on the basis of Bender-Gestalt
protocols. The protocols were randomized into different
groups in which the incidence of brain damage varied from high (P=.8)
to low (P=.2). Goldberg found no difference in judgmental accuracy
between these groups. Unfortunately, the base rate information was
not provided to the judges.
The importance of base rates for evaluating predictive tests was
discussed by Meehl and Rosen (1955). They cite as an example an Army
adjustment test for predicting which inductees would adjust to the
service. The test predicted inductee adjustment with an accuracy of
79.7 per cent. However, the overall percentage of inductees who
adjusted was 95 per cent; thus, utilization of the base rates alone
(i.e., predicting adjustment in all cases) would result in a hit rate
of 95 per cent.
Another application of base rates is through Bayesian statistical
theory which combines the base rates with the valid and false
positive rates of a particular test to give a conditional probability
for the likelihood of being correct or incorrect given a certain test
sign in a given base rate population.
Shagoury (1969) and Shagoury and Satz (1969) demonstrated that
clinicians can substantially improve their predictive accuracy when
provided with information on base rates and conditional probabilities.
These studies showed that increments in statistical information, added
to test data, significantly increased the accuracy of judges in a
real-life clinical decision task of predicting brain-damaged and
functional patients on the basis of a block rotation task (Satz, 1966).
Their judges' accuracy approximated that obtained by a discriminant
function predictor score (Z). Composite Z scores were deemphasized
by the judges in favor of using the additional information such as
the base rates, differential error risks, and conditional probabilities.
However, in groups with a high incidence of brain-damaged
individuals (base rate=.8) the judges' overall accuracy decreased,
perhaps due to a reluctance to diagnose pathology.
Meehl and Rosen (1955) point out that test development should be
concentrated on populations with base rates near .50 rather than on
populations with base rates approaching .00 or 1.00, since the use of
a test in the latter cases will yield a lower hit rate than using the base
rates alone.
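The point about extreme base rates can be checked with a short calculation. The sketch below is illustrative only: it assumes a hypothetical test that classifies 75 per cent of each criterion group correctly (a figure chosen for the example, not taken from Meehl and Rosen), and the function names are likewise invented. It compares the overall hit rate of such a test with the hit rate obtained by always predicting the more frequent outcome.

```python
def hit_rate_with_test(base_rate, valid_pos, valid_neg):
    """Overall proportion correct when every case is classified by the test."""
    return base_rate * valid_pos + (1 - base_rate) * valid_neg


def hit_rate_from_base_rate(base_rate):
    """Proportion correct when the more frequent outcome is always predicted."""
    return max(base_rate, 1 - base_rate)


# Hypothetical test: 75 per cent correct in each criterion group.
for base_rate in (0.50, 0.80, 0.95):
    with_test = hit_rate_with_test(base_rate, 0.75, 0.75)
    without_test = hit_rate_from_base_rate(base_rate)
    print(f"base rate {base_rate:.2f}: test {with_test:.2f}, "
          f"base rate alone {without_test:.2f}")
```

Under these assumptions the test improves on the base rate only when the base rate is near .50; at .80 or .95, predicting the more frequent outcome for every case does better.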
A cutting score, or composite Z score, derived from discriminant
function analysis can be manipulated for various purposes in prediction.
It can be used to maximize the number of correct predictions
for all cases or for maximizing only correct predictions for positives.
An excellent application of this technique of discriminant function
analysis to decision theory in a clinical setting was demonstrated
by Satz (1966). Discriminant function analysis is a statistical technique
devised to maximally differentiate discrete criterion groups
when multiple measurements are involved. This is essentially a multiple
regression technique except for a discontinuous distribution on
the criterion variables. The following linear equation expresses this
function:
$$Z = \lambda_1 X_1 + \lambda_2 X_2 + \cdots + \lambda_n X_n$$
where $Z$ is the composite predictor score based on the individual scores
on each of the variables ($X_1, X_2, \ldots, X_n$) and the respective weights,
or lambdas, assigned to each of the variable scores ($\lambda_1, \lambda_2, \ldots, \lambda_n$).
If there are two criterion groups involving multiple measures, the
discriminant function determines optimal weights (lambdas) for these
variables which will maximize the difference between the composite Z
scores on both criterion groups.
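Once the weights have been estimated, applying the discriminant function is a matter of forming the weighted sum and comparing it with a cutting score. The following is a minimal sketch under stated assumptions: the weights, the cutting score, the three-variable profile, and the group labels are all hypothetical, and the estimation of the weights themselves is not shown.

```python
def composite_score(scores, weights):
    """Composite Z: the weighted sum of the variable scores."""
    if len(scores) != len(weights):
        raise ValueError("one weight is needed per variable")
    return sum(w * x for w, x in zip(weights, scores))


def classify(scores, weights, cutting_score):
    """Assign the higher-scoring criterion group when Z reaches the cutting score."""
    z = composite_score(scores, weights)
    return "group 2" if z >= cutting_score else "group 1"


# Hypothetical three-variable example; weights and cutting score are made up.
weights = [0.2, 0.5, 0.1]
profile = [60, 70, 55]
z = composite_score(profile, weights)
print(round(z, 2), classify(profile, weights, cutting_score=50.0))
```

Raising or lowering the cutting score in such a scheme trades one kind of classification error for the other, which is the manipulation described above.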
Length of Stay in Psychotherapy as a Criterion Variable
Why is length of stay in psychotherapy a meaningful problem for
study? First, there is the great demand for psychological services
with a present-day manpower shortage of trained clinicians. Most
clinics that see individuals with psychological problems are understaffed,
have patient waiting lists, or both. There are also differential
risks involved in selecting who will be seen in therapy. It
is far more serious to miss those who are severely disturbed and need
long-term psychotherapy, because of the threat these individuals may
pose to themselves or to society, than it is to wrongly classify
persons who need only a few sessions and are experiencing minor
difficulties in their lives. The first type of error, that of predicting
a short stay in therapy based on a negative test score when
in fact the person stays a long time, is a false negative error. A
false positive error results from the prediction of a long stay in therapy
based on a positive test score when the individual actually stays
only a few sessions.
Meehl and Rosen (1955) point out that often in a clinical setting
external restraints are imposed, perhaps due to a shortage of
staff time, patient waiting lists, or administrative policy. If this
is the case, decisions cannot always be made in accordance with known
base rates. They give the following example to illustrate the use of
an externally imposed selection ratio. If 80 per cent of the patients
referred to a mental health clinic are recoverable with intensive
psychotherapy, then everyone should be treated rather than relying on
a test which predicts only 75 per cent of those who will have a favorable
therapy outcome. However, if staff time is limited and only half
of the referrals can be treated, following the base rates is meaningless
because this would lead to a decision that would be impossible
to implement. In this case, where a selection ratio of .5 is externally
imposed, the use of the test becomes worthwhile. Given the
figures in Table 1 (Meehl & Rosen, 1955), those 50 cases out of the
100 referrals to be treated are selected from those individuals the
test predicts will be "good" therapy risks. If this is done there is
a 92.3 per cent hit rate among those selected for therapy (60/65).
Stated another way, the test will be correct in 46 out of the 50
cases which will be successes (half of the 80 good therapeutic
outcome group).

Table 1
Actual and Test-Predicted Therapeutic Outcome

                     Therapeutic Outcome
Test Prediction      Good    Poor    Total
Good                  60       5      65
Poor                  20      15      35
Total                 80      20     100
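The figures quoted from Table 1 can be reproduced with a few lines of arithmetic. The sketch below uses only the cell frequencies given in the table; the variable names are illustrative, and the comparison with unselected treatment (treating 50 referrals without regard to the test) is added here as an assumption to show what the test contributes.

```python
# Table 1 frequencies (Meehl & Rosen, 1955), per 100 referrals.
# Keys are (test prediction, actual outcome).
table = {
    ("good", "good"): 60,
    ("good", "poor"): 5,
    ("poor", "good"): 20,
    ("poor", "poor"): 15,
}

predicted_good = table[("good", "good")] + table[("good", "poor")]
hit_rate_selected = table[("good", "good")] / predicted_good

treated = 50  # externally imposed selection ratio of .5
expected_successes = treated * hit_rate_selected

# Assumed comparison: 50 referrals treated without using the test.
actual_good = table[("good", "good")] + table[("poor", "good")]
unselected_successes = treated * actual_good / 100

print(f"hit rate among test-selected cases: {hit_rate_selected:.1%}")
print(f"expected successes with the test: {expected_successes:.0f} of {treated}")
print(f"expected successes without it: {unselected_successes:.0f} of {treated}")
```

Selecting the 50 treated cases from those the test labels "good" gives about 46 successes, against about 40 when the same number of referrals is treated without reference to the test.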
A second reason for selecting length of stay in psychotherapy as
the focus for a clinical judgment study is that the problem can be
subjected to multivariate and statistical decision theory analysis in
order to increase the predictive relationship between signs and criteria.
This possibility thus increases its application and potential
usefulness to clinical judges.
One study in this area found that there are differences in behavior
in psychotherapy between individuals which are predictable
from an MMPI profile (Mello & Guthrie, 1958). Mello and Guthrie
studied 219 individuals seen at a college psychological clinic. They
used only those profiles with at least one T score greater than 70.
They found that length of stay in therapy was related to high scores
on various scales of the MMPI. Of those students with high scores
on Scale 2(D), 45 per cent remained only one to three sessions. Persons
high on Scale 3(Hy) tended to stay in therapy longer than the
high 2's and also developed dependency on the therapists more easily.
Scale 4(Pd) individuals seldom stayed past seven counseling sessions
and as a group were quite resistant to therapy although they did not
often cancel their appointments. Persons who stayed the longest in
therapy were high on Scales 7(Pt) or 8(Sc) with some clients continuing
past 60 and 21 sessions respectively for these two scales. Most
of the high 9(Ma) students stayed fewer than 11 sessions and cancelled
therapy sessions frequently. Mello and Guthrie concluded that a
therapist can get some idea of what to expect from a particular client
on the basis of his MMPI profile.
The Mello and Guthrie study is interesting because it suggests
that psychological data (MMPI) may be used by clinicians to more
efficiently select clients for psychotherapy. Unfortunately, the
authors did not examine this problem within the context of a decision-making
task nor did they subject their data to multivariate analysis.
Using length of stay in psychotherapy as the predictor criterion
is valuable for other reasons. For the professional involved, it may
clarify the services offered by his agency and help him to provide
more adequate services to his clients. For example, he may decide
that seeing many clients for a short period of time is of more value
than giving those who need long-term therapy this service and thus
seeing fewer clients. That is, prevention may be emphasized in a
college mental health clinic and such a clinic may be designed to see
as many students as possible to ease their transition from high school
or junior college to a college curriculum. On the other hand, a
clinic may be more treatment oriented and seek to help those who are
more disturbed and require longer therapy. This emphasis would require
more staff time per individual client and would necessitate
seeing fewer clients. Decisions of whom to treat could be more adequately
made with test and nontest information.
To be able to predict length of stay in therapy could affect
therapist expectations which could in turn affect outcome variables.
Just what effect an expectation for a particular length of stay in
therapy will have on the outcome of the therapy is outside the scope
of the present study but is an important research question in itself.
Of course, if the clinician intends to see everyone who enters his
clinic, a screening procedure is worthless or may even be detrimental
if the test predicts that an individual will not stay in therapy or
will not improve in therapy, because this may lead the therapist to
expect just these results to the client's disadvantage (Meehl & Rosen,
1955).
It is often necessary for the clinician to indicate a therapy
prognosis for an individual. If the clinician can predict or learn
to accurately predict whether or not a person will stay in therapy,
he is providing useful information for the person's treatment.
Thus it can be seen that clinicians are constantly involved in
the task of prediction and decision-making. If they can be trained
to make use of relevant data and material, they may improve their
predictions. Although many clinicians look with disfavor on the use
of tests, tests combined with other relevant data can be shown to
have practical and research applications. The clinician may use them
to better his predictions and decision-making processes.
Hypotheses Tested
The present study was addressed to two objectives. The first was to
examine the decision-making process and to determine whether prediction
accuracy is influenced by independent variables such as clinical
experience and varying amounts and kinds of information. Second, the
question of whether clinicians can be trained to improve their clinical
decision processes was also examined. The first and primary
objective was studied in terms of the second objective, a real-life
situation that is meaningful to clinicians today: the problem of
length of stay in psychotherapy. If increments in levels of statistical
information increase prediction accuracy and thereby improve
the clinicians' decision process, this type of information may be
dovetailed into the operation of a clinic and taught to the staff to
identify high-risk individuals. Specific questions, or hypotheses,
were raised. Does judgmental accuracy increase as more information
is added to the prediction task and what types of information are
most useful in increasing judgmental accuracy? Will there be differences
in accuracy dependent on experience level? That is, will graduate
clinical psychology students trained in statistical decision
theory be better clinical judges than experienced PhD clinical
psychologists (without such training) and will less experienced
psychologists be superior to more experienced clinicians? Will confidence
and appropriateness increase with increments in information
and will there be differences between the three experience levels
with regard to their confidence and appropriateness?
METHOD
Subjects. Twelve judges (Js) represented three levels of
experience and sophistication in statistical decision-making. A professional
(P) group of four PhD clinical psychologists represented
the highest level of clinical experience. A group of four clinical
psychology graduate students trained (sophisticated) in statistical
decision-making theory (SGS) represented the highest level of statistical
sophistication. Another group of four unsophisticated (not
trained in statistical decision theory) clinical psychology graduate
students (UGS) represented the same experience level as the SGS group
and the same level of statistical sophistication as the P group.
Sophistication in decision theory was defined as participation in a
graduate course in statistical decision theory for clinical psychology
students at the University of Florida. Sophistication here only implies
special training and by no means implies that the professionals
were clinically unsophisticated.
Materials. Test materials for Js were a random sample of 100
MMPI profiles of clients seen in a university mental health service.
The sample profiles were drawn from 241 profiles of all clients seen
during a three-year period. Each J received 25 of the 100 profiles.
Profiles were divided into two groups based on the client's
length of stay in psychotherapy at the mental health service. A short
stay (S) was defined as four or fewer therapy sessions and a long stay
(L) as five or more therapy sessions. The mean length of stay for
the S group was 2.00 sessions and for the L group 9.27 sessions.
A discriminant function analysis which maximized the difference
between the two length of stay in psychotherapy groups was run on
the 241 MMPI profiles. The mean discriminant composite scores for
the two length of stay in therapy groups on the 13 MMPI scale variables
were $\bar{Z}_1 = 29.74$ for the few-session group (S) and $\bar{Z}_2 = 34.26$ for
the many-session group (L). An analysis of variance of the composite
means showed a significant difference between the two groups (F=4.19,
df=12,224, p<.001). A commonly used rule of
$$Z = \frac{\bar{Z}_1 + \bar{Z}_2}{2}$$
was used to determine the optimal predictive cutting Z score.
With an emphasis on minimizing the false negative rate, the composite
Z score of 32.02 predicted with an overall hit rate of 67 per
cent for the original protocol pool. False negative errors represented
those clients who were predicted as short stays (S), or
negatives, but who remained long in therapy (L). It was felt that
this predictive error was more serious than the false positive error
which included those clients who were predicted as long stays (L)
but who remained a short time in therapy (S). It seemed more important
to identify those clients who really needed long-term therapy
than to identify those who did not. Of course, some of the individuals
with high test scores who stayed only a few sessions may have
been very disturbed but dropped out of therapy prematurely. There
was no way to identify these cases when a very disturbed student may
have panicked or become threatened by therapy and dropped out or
simply missed appointments. The false negative rate for the Z score
of 32.02 was .38, the false positive rate was .31, giving a valid
negative rate of .69 and a valid positive rate of .62.
Another Z score, which minimized false positive errors and
predicted with an overall accuracy of 71 per cent, was not used in
the present study for the reason stated above.
Conditional probabilities were calculated for the Z cutting
score. Conditional probabilities were computed with the following
equations:
$$P(L/+) = \frac{P(L)P(+/L)}{P(L)P(+/L) + P(S)P(+/S)} \quad \text{and} \quad P(S/-) = \frac{P(S)P(-/S)}{P(S)P(-/S) + P(L)P(-/L)}$$
where L = many therapy sessions or a long stay in therapy (base rate=.34)
S = few therapy sessions or a short stay in therapy (base rate=.66)
+ = a positive test score (Z ≥ 32.02)
- = a negative test score (Z < 32.02)
For the Z score of 32.02 the conditional probabilities were:
P(L/+)=.51 and P(S/-)=.78. With this new information it can be seen
that with a positive test score, predictions will be wrong as often
as they are correct. But given a negative test score, predictions
will be right 78 per cent, or most of the time.
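The reported conditional probabilities and the overall hit rate can be checked from the error rates given earlier. The sketch below is a minimal illustration, not the computation actually used in the study: it takes the base rate of long stays as .34 and of short stays as .66, the assignment under which the rates quoted for the cutting score of 32.02 reproduce the reported values, and the function and variable names are hypothetical.

```python
def posterior(prior, likelihood, prior_other, likelihood_other):
    """Bayes' rule for two exhaustive groups: P(group | test sign)."""
    numerator = prior * likelihood
    return numerator / (numerator + prior_other * likelihood_other)


# Base rates assumed for this sketch (long stay, short stay).
p_long, p_short = 0.34, 0.66
# Rates reported for the cutting score of 32.02.
valid_pos, false_neg = 0.62, 0.38   # P(+/L), P(-/L)
valid_neg, false_pos = 0.69, 0.31   # P(-/S), P(+/S)

p_long_given_pos = posterior(p_long, valid_pos, p_short, false_pos)
p_short_given_neg = posterior(p_short, valid_neg, p_long, false_neg)
overall_hit_rate = p_long * valid_pos + p_short * valid_neg

print(f"P(L/+) = {p_long_given_pos:.2f}")
print(f"P(S/-) = {p_short_given_neg:.2f}")
print(f"overall hit rate = {overall_hit_rate:.2f}")
```

With these inputs the sketch gives values that round to .51, .78, and .67, in agreement with the conditional probabilities and the 67 per cent hit rate reported for this cutting score.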
Finally, a random sample of 100 profiles from the total protocol
pool of 241 cases was drawn. This was done so that the Js would have
fewer protocols to judge, making their task more economical with
regard to time.
¹Failure to control this factor undoubtedly lowered the predictive
accuracy of the discriminant function equation (and perhaps
clinical judgment) in that some of the disturbed profiles in the (S)
criterion group may well have remained (L) if they had not dropped out.
A second reason for drawing a random sample was to make the
situation more relevant clinically in terms of the base rates. That
is, the sample had only approximate base rates and the judges did not
know the exact probabilities for their sample of those who remained a
long or short time in therapy. However, for the sample, the Z score
predicted with the same accuracy that it did for the total protocol
pool.
Procedure. Refer to Table 2 for a schematic of the design. Js
were asked to predict a client's length of stay in psychotherapy
from the 25 MMPI profiles. These profiles, the sample of 100 profiles,
and the original profile pool all had approximately the same
base rates: 35 per cent of the clients stayed many sessions (L) and
65 per cent stayed a few sessions (S). The Js predicted length of
stay in therapy (S or L) during four sessions, with additional information
added incrementally at each session. These sessions, or
levels of information, represented one class of independent variables.
Groups, or experience level, represented the other class of independent
variables.
Each J made his predictions on the same 25 protocols that he
received at the first level throughout the training. Level I: Js
were first given MMPI profiles with no other information. Level II:
Js were again presented the same 25 protocols for the same judgment
but with the additional information of biographical data such as age,
sex, marital status, religious preference, parents' marital status,
previous counseling experience, and subsequent counseling experience.
Level III: For the third decision task, Js were given the
profiles and biographical data, with the additional statistical information
of the cutting score based on discriminant function analysis.
Valid positive and false positive percentages were also provided
with the cutoff Z score. Level IV: Conditional probabilities and
the base rates were added to the previous information for the fourth
presentation of profiles for prediction. (For a copy of the instructions
for each information level see Appendix A.) For each judgment
Js also indicated their confidence in the accuracy of their judgment.

[Table 2. Design schematic]
To rule out a practice effect from repeated presentation of the
same profiles, two control judges were used who predicted length of
stay in psychotherapy using profiles only, with no additional
information, on four separate occasions.
Judges were presented the profiles for judgments on four days
in a row with only one information level given each day.
Hypotheses. I. Information Level: It was hypothesized that
increments of information would increase overall judgmental accuracy
and group accuracy. (A) Level I accuracy would be at approximately
the level of chance. (B) At Level II, accuracy would decrease or
remain the same. (C) Level III accuracy would be approximately that
of the actuarial prediction accuracy of the discriminant function
analysis. (D) Level IV accuracy would increase slightly over Level
III accuracy.
II. Experience Level: It was hypothesized that the statistically
sophisticated graduate students would be the most accurate, the
statistically unsophisticated graduate students next most accurate,
and the professionals least accurate.
III. Confidence and Appropriateness: It was hypothesized that
confidence ratings would increase with increments of information and
that appropriateness would also increase with more information.
RESULTS
Accuracy: The Effects of Information and Experience
Accuracy was defined as the proportion of correct judgments per
presentation of 25 MMPI profiles. The two control Js showed no
practice effects. Judge A's accuracy was 52 per cent on the first
presentation and 48 per cent on each of the three subsequent
presentations. Judge B's accuracy was distributed across sessions as
follows: 76 per cent, 48 per cent, 68 per cent, and 68 per cent.
Table 3 presents accuracy by information level and experience
level. Two analyses of variance were conducted to determine the
effects of information level, experience level, judges, profile set,
and profile. The analysis of variance for profile set effects was
nonsignificant (F=.59, df=3,91). An Fmax test for homogeneity of
variances between groups was also nonsignificant (Fmax=3.43, k=3,
df=16). A summary of the analysis of variance for information level
and experience level effects is shown in Table 4.
Information. Mean judgmental accuracy increased consistently
with increments of information from Level I to Level IV (X_I=.55,
X_II=.61, X_III=.67, X_IV=.69). These differences were significant
(F=10.82, df=3,27, p<.01). A graphic presentation of this trend
is shown in Figure 1. Inspection of Figure 1 shows approximately a
linear increase in accuracy for the three groups by information
level. Both the P and UGS groups increased their accuracy at each
level while the SGS group showed increases at Levels II and III
Table 3
Proportion of Correct Judgments

Experience            Information Level
Level       J        I     II    III   IV    Totals
SGS         1       .64   .68   .76   .76     .71
            2       .56   .68   .72   .64     .65
            3       .64   .68   .72   .72     .69
            4       .60   .60   .72   .68     .65
            Total   .61   .66   .73   .70     .68
UGS         5       .72   .64   .64   .76     .69
            6       .40   .64   .72   .68     .61
            7       .36   .52   .48   .56     .48
            8       .36   .48   .68   .64     .54
            Total   .46   .57   .63   .66     .58
P           9       .44   .48   .68   .72     .58
            10      .68   .68   .64   .76     .69
            11      .64   .68   .68   .72     .68
            12      .56   .60   .64   .60     .60
            Total   .58   .61   .66   .70     .64
Totals              .55   .61   .67   .69
Control A           .52   .48   .48   .48     .49
Control B           .76   .48   .68   .68     .65
Table 4
Summary of Analysis of Variance of Accuracy

Source of Variation          df       MS        F
Mean                          1     478.80
Information Level             3       1.21    10.82**
Experience Level              2       0.92     2.31
Information X Experience      6       0.81     7.23**
Judges                        9       0.39     0.69
Information X Judges         27       0.11     0.95
Profile                     288       0.57
Information X Profile       864       0.11
** Significant at the .01 level.

Fig. 1. Accuracy by information levels.
but a slight decrease in accuracy from Level III to Level IV.
Experience. There were no differences in accuracy due to
experience level except for a trend toward group differences (F=2.34,
df=2,9, p<.20). The SGS group was the most accurate and the UGS group
the least accurate (X_SGS=.68, X_P=.64, X_UGS=.58). Only the SGS
group's overall accuracy was at the level of the discriminant
function, which predicted with 67 per cent accuracy.
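The group and level means quoted in this section can be recomputed from the judge-level proportions in Table 3. The following Python sketch does so; the variable and function names are illustrative, and it assumes judge 4's Level I entry is .60, the value implied by the printed SGS column total of .61.

```python
# Judge-level proportions correct at Levels I-IV, transcribed from Table 3.
# Assumption: judge 4's Level I entry is taken as .60, the value implied by
# the printed SGS column total (25 judgments allow only multiples of .04).
TABLE_3 = {
    "SGS": [[.64, .68, .76, .76], [.56, .68, .72, .64],
            [.64, .68, .72, .72], [.60, .60, .72, .68]],
    "UGS": [[.72, .64, .64, .76], [.40, .64, .72, .68],
            [.36, .52, .48, .56], [.36, .48, .68, .64]],
    "P":   [[.44, .48, .68, .72], [.68, .68, .64, .76],
            [.64, .68, .68, .72], [.56, .60, .64, .60]],
}


def group_mean(rows):
    """Mean proportion correct over all judges and levels in one group."""
    values = [v for row in rows for v in row]
    return sum(values) / len(values)


def level_means(table):
    """Mean proportion correct at each information level, all judges pooled."""
    rows = [row for group in table.values() for row in group]
    return [sum(column) / len(column) for column in zip(*rows)]


for name, rows in TABLE_3.items():
    print(f"{name}: {group_mean(rows):.3f}")
print("Levels I-IV:", [f"{m:.3f}" for m in level_means(TABLE_3)])
```

Under that assumption the pooled values should agree, to rounding, with the group means (.68, .58, .64) and level means (.55, .61, .67, .69) reported in the text.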
Information and experience level interaction. The only other
significant source of variance was the group by information level
interaction (F=7.23, df=6,27, p<.01). The Newman-Keuls test of
differences between means was used (Kirk, 1968) and the results of
this analysis are given in Appendix B. The interaction was based
largely on a significantly lower proportion of correct judgments of
the UGS group at Level I. The UGS group not only started with the
lowest proportion of correct judgments, but also showed the most
significant increase in accuracy as information was added. Their
final degree of accuracy, however, was approximately the same as the
SGS accuracy at Level I. The UGS group significantly increased
their level of accuracy at Levels III and IV from Level I when the
composite Z score, conditional probabilities, and base rates were
added (p<.01).
The only significant increase in accuracy for the P group was
between the first level, with the profile only, and the final level
with all information (p<.05). Increases in accuracy for all groups
across information levels were significant except the increase from
Level III to Level IV where conditional probabilities and base rates
were added. Adding conditional probabilities and base rates to the
previous information did not result in a significant increase in
Js' accuracy over Level III, which included the composite Z score.
For the SGS group there were no significant differences in accuracy
across information levels. The only significant group differences
within information levels were between the SGS group and the UGS
group (p<.01) and between the SGS group and the P group (p<.05).
Confidence
Mean confidence scores by information level and experience
level are shown in Table 5. Table 6 shows the summary of the
analysis of variance for confidence scores. The Js' confidence
increased significantly as subsequent items of information were added
to the protocols for all groups (F=5.38, df=2,28, p<.05). Although
there were no differences between confidence scores for groups, the
P group tended to be the most confident and the SGS group tended to
be the least confident (X_P=76.86, X_UGS=72.72, X_SGS=69.96). These
trends are shown in Figure 2.
Appropriateness
Appropriateness (confidence weighted by accuracy)
was measured by Pearson product-moment correlations between
confidence scores and accuracy scores for each J at each level of
information. There were no significant group effects or
interactions, but the SGS group tended to be the most appropriate
(r_SGS=.26, r_P=.20, r_UGS=.17). The higher the correlation, the more
appropriate the judgment. Correlation coefficients for appropriateness are
Table 5
Mean Confidence Scores

Experience            Information Level
Level       J        I      II     III    IV     Totals
SGS         1       61.2   64.0   65.6   65.6     64.1
            2       61.2   60.8   60.8   72.0     63.7
            3       75.6   74.0   73.6   74.4     74.4
            4       72.0   76.2   80.4   82.0     77.7
            Total   67.5   68.8   70.1   73.5     70.0
UGS         5       59.0   58.0   66.0   69.2     63.1
            6       85.4   86.8   84.0   92.2     87.1
            7       78.2   79.8   76.2   82.2     79.1
            8       64.2   61.6   61.6   62.0     62.4
            Total   71.7   71.6   71.5   76.4     72.7
P           9       87.6   87.2   88.4   90.0     88.3
            10      64.8   56.0   65.8   56.0     60.7
            11      85.2   87.8   86.4   86.6     86.5
            12      68.0   70.4   76.8   72.8     72.0
            Total   75.2   75.4   79.4   76.4     76.9
Totals              71.4   71.9   73.8   75.4
Table 6
Summary of Analysis of Variance of Confidence

Source of Variation          df       MS        F
Information Level             2      52.67     5.38*
Experience Level              2     191.84      .39
Information X Experience      6      12.73     1.30
Judges within Groups          9     493.76
Information X Judges         28       9.79
* Significant at the .05 level.
Fig. 2. Confidence by information levels.
given in Table 7, with the analysis of variance summary for
appropriateness in Table 8. The analysis of variance was based on Z
transformations of the correlation coefficients. Mean
appropriateness scores were significantly higher at each level of information
(F=22.03, df=2,28, p<.01).
The SGS group was most appropriate because they were most
accurate and not overconfident, that is, their confidence was
consistent with their accuracy. The P group was overly confident and
the UGS group was less accurate, making these two groups' confidence
inconsistent with their accuracy. These trends can be seen in
Figure 3.
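As a rough illustration of the appropriateness measure, the Python sketch below correlates one judge's confidence ratings with whether each judgment was correct, then applies Fisher's Z transformation of the kind used before the analysis of variance. It assumes accuracy is scored 1 (correct) or 0 (incorrect) per profile; the eight ratings are made-up values, not data from the study, and the function names are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Fisher's Z transformation of a correlation coefficient."""
    return math.atanh(r)


# Hypothetical ratings for one judge at one information level.
confidence = [50, 60, 90, 70, 100, 60, 80, 50]   # per cent, 50 to 100
correct = [0, 1, 1, 0, 1, 0, 1, 1]               # 1 = correct judgment

r = pearson_r(confidence, correct)
print(f"appropriateness r = {r:.3f}, Fisher Z = {fisher_z(r):.3f}")
```

A judge whose high-confidence judgments are mostly the correct ones gets a positive r; a judge whose confidence is unrelated to correctness gets an r near zero.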
Judges versus the discriminant function
The discriminant function correctly classified 67 per cent of
the profiles. This information was given to the Js at Level III.
At Level III only the SGS judges were more accurate than the
discriminant function, with a hit rate of 73 per cent. The P group had a
hit rate of 66 per cent and the UGS group had a hit rate of 63 per
cent at Level III. The accuracy for all judges combined was 67 per
cent. Five judges (two in the SGS group, two in the P group and one
in the UGS group) were more accurate overall than the linear
regression Z score and only one J (in the UGS group) operated below the
chance level overall.
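The 67 per cent figure can be recovered from numbers reported elsewhere in this study. The Python sketch below combines the base rates given in the Procedure (.35 long, .65 short) with the valid positive (.62) and valid negative (.69) rates and the cutoff of 32.02 quoted in the Level III instructions (Appendix A). The function names are illustrative, and the two predictive values it prints are a Bayes-rule computation added for illustration; they are not claimed to be the conditional probabilities actually given to the judges at Level IV.

```python
# Base rates from the Procedure and classification rates from the
# Level III instructions (Appendix A). All names are illustrative.
BASE_RATE_LONG = 0.35   # P(L)
VALID_POSITIVE = 0.62   # P(Z >= cutoff | L)
VALID_NEGATIVE = 0.69   # P(Z <  cutoff | S)
CUTOFF = 32.02


def classify(z, cutoff=CUTOFF):
    """Cutoff rule: Z at or above the cutoff predicts a long stay."""
    return "L" if z >= cutoff else "S"


def overall_hit_rate(p_long, valid_pos, valid_neg):
    """Proportion of all profiles the cutoff classifies correctly."""
    return p_long * valid_pos + (1.0 - p_long) * valid_neg


def predictive_values(p_long, valid_pos, valid_neg):
    """Bayes' rule: P(L | positive sign) and P(S | negative sign)."""
    p_short = 1.0 - p_long
    p_positive = p_long * valid_pos + p_short * (1.0 - valid_neg)
    p_long_given_pos = p_long * valid_pos / p_positive
    p_short_given_neg = p_short * valid_neg / (1.0 - p_positive)
    return p_long_given_pos, p_short_given_neg


hit = overall_hit_rate(BASE_RATE_LONG, VALID_POSITIVE, VALID_NEGATIVE)
ppv, npv = predictive_values(BASE_RATE_LONG, VALID_POSITIVE, VALID_NEGATIVE)
print(f"overall hit rate: {hit:.3f}")
print(f"P(L | +) = {ppv:.3f}, P(S | -) = {npv:.3f}")
```

With these inputs the hit rate comes to roughly two thirds, in line with the 67 per cent reported for the discriminant function.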
At Level I, four Js had accuracy scores below the level of
chance and two others were only slightly above chance. However,
none of the four Js who were below chance was in the SGS group. At
Level II there were two Js below chance and one slightly above; again,
Table 7
Correlation Coefficients for Appropriateness

Experience            Information Level
Level       J        I     II    III   IV    Totals
SGS         1       .35   .35   .34   .28     .33
            2       .25   .20   .18   .65     .32
            3       .34   .31   .41   .28     .34
            4       .11   .06   .01   .02     .05
            Total   .26   .23   .24   .31     .26
UGS         5       .13   .06   .24   .03     .07
            6       .03   .15   .10   .47     .10
            7       .05   .03   .30   .47     .17
            8       .29   .37   .41   .27     .34
            Total   .09   .04   .26   .30     .17
P           9       .27   .20   .17   .33     .24
            10      .06   .23   .33   .31     .23
            11      .11   .19   .29   .28     .22
            12      .02   .07   .08   .41     .10
            Total   .11   .14   .22   .33     .20
Totals              .12   .13   .24   .31
Table 8
Summary of Analysis of Variance of Appropriateness Correlations

Source of Variation          df       MS        F
Information Level             2      .0705    22.03**
Experience Level              2      .0366     2.26
Information X Experience      6      .0032     2.25
Judges within Groups          9      .0162
Information X Judges         28      .0072
** Significant at the .01 level.
Fig. 3. Appropriateness correlations between accuracy
and confidence by information levels.
none of these Js was in the SGS group. With the addition of the
statistical information, only one J (in the UGS group) remained near
the chance level of accuracy and he was the least accurate of all
the Js.
DISCUSSION
The present study demonstrated that judges can substantially
improve their decision accuracy when provided with increments of
information, particularly statistical information. This finding
extends the earlier findings reported for a different clinical
judgment task (Shagoury & Satz, 1969) and contrasts with previous studies
which have used nonquantitative data. These findings also suggest
that if the clinician is able to incorporate quantitative information,
he may improve his own decision-making ability and equal or surpass
the accuracy of actuarial methods.
The findings of the present study also showed that accuracy
increased directly as a function of the amount of information
available to the judges. Two conclusions that can be drawn from this
finding are that the information was relevant to the judgmental task
and that the judges used this information in formulating their
decisions.
Information
A post-testing interview revealed that the type of information
used varied between groups, among judges, and between information
levels. However, the interview was not structured enough to
determine the actual decision rules used by the judges.
At Level I, with the MMPI profile only, most judges used their
own intuition about the relationships of which scales were elevated
and the extent of these elevations to the length of stay in
treatment criterion. There was a great deal of individual variation in
approach since each judge had different training experiences with
the MMPI. The judges of the SGS group had the most similar training
experience in the use of the MMPI since some training in the
rationale and use of this instrument was given in the statistical decision
theory course. The SGS group also showed the least amount of
individual variation in accuracy at Level I. The other group of graduate
students (UGS) had the least amount of exposure to the MMPI. The
UGS group was barely familiar with this test instrument and none of
these judges had had any formal training in its use. It is
interesting to note that the group of unsophisticated graduate students
showed the lowest accuracy throughout and was the only group whose
accuracy was ever below the level of chance. It seems then, that
the more familiar a judge is with a test instrument, the more
accurate he will be in using it for prediction.
The SGS group was not familiar with the specific type of task
used in the present study. That is, they had not been trained in
correlating MMPI data to length of stay in psychotherapy. This
aspect of the study was novel to each of the three groups.
At Level II again each judge approached the data differently
and selected certain measures to use in making his decisions. The
variation within groups decreased and there was less difference in
this variation between groups. Accuracy increased or stayed the
same for all but one judge whose accuracy dropped. Thus, the
hypothesis that accuracy would decrease at Level II was not supported.
It was originally felt that all of the biographical data provided
would make the task more complex and more difficult and would thus
confuse the judges. However, the judges were able to relate some of
the information to the task and thereby improve their judgments.
Most judges used some combination of factors. Whether the profile
subject had previous counseling or subsequent counseling and his age
were the factors used most often. Some of the judges also considered
marital status when the subject was married. This finding (Level II)
contrasts with other studies which indicate lowering of accuracy
when data are combined (Golden, 1964).
In support of the hypothesis, overall accuracy for Level III
was the same as the discriminant function's accuracy of 67 per cent.
With the addition of Z scores at Level III, only one judge, who was
in the P group, used the cutoff score exclusively. In this same
group one judge changed none of his judgments from Level II and the
other two judges used primarily their own subjective inferences. The
UGS group essentially ignored the Z scores and relied on their own
intuition and thus did not reach the level of accuracy of the
cutoff score. All judges in the SGS group combined the cutoff score
data with their own intuition to improve upon the accuracy of the
discriminant function. These findings imply that the clinician can
make use of his intuition and experience but not at the expense of
ignoring available data, particularly when they include quantitative
information. The findings also imply that the most accurate judges
are the ones who are able to utilize statistical data.
The fact that there was an increase in accuracy from Level III
to Level IV, but that this increase was nonsignificant, supported the
hypothesis that Level IV accuracy would increase slightly over Level
III accuracy. For the SGS group there was a slight decrease in
accuracy. One reason for this decrease might have been the
information itself. These judges had been trained to use more powerful
statistical information, that is, data that discriminated groups and
subgroups more than did the data of the present study. The base
rates of .65 and .35 were not sufficiently different from base rates
of .5 to be of much help. Also the conditional probabilities were
not high enough to provide maximum discrimination. All of the
statistical data given were in approximately a 2/3 to 1/3 ratio.
Because all of the statistical data had approximately the same
predictive power, it may have been difficult to know which kind of
information would be most useful. Instead, judges may have tried to
combine two or more kinds of data and as a result were less accurate
than they would have been using one type exclusively. Quantitative
information is most useful when it represents higher ratios, such as
base rates of .2/.8 or .1/.9; conditional probabilities of .85/.15;
and cutoff scores of 75 per cent or higher.
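The point about ratios can be illustrated with a short Python sketch. It computes the accuracy obtained by ignoring the profile entirely and always predicting the more frequent outcome, for the base rates of the present study and for the more extreme ratios mentioned above. The function name is hypothetical, and the comparison value of .67 is the discriminant function's reported hit rate.

```python
def base_rate_accuracy(p_short):
    """Accuracy of always predicting the more frequent outcome."""
    return max(p_short, 1.0 - p_short)


DISCRIMINANT_HIT_RATE = 0.67  # reported accuracy of the cutoff Z score

for p_short in (0.50, 0.65, 0.80, 0.90):
    acc = base_rate_accuracy(p_short)
    gap = acc - DISCRIMINANT_HIT_RATE
    print(f"base rate {p_short:.2f}: accuracy {acc:.2f}, vs. cutoff {gap:+.2f}")
```

With base rates of .65/.35 the base-rate strategy alone lands within a couple of points of the cutoff score, which is consistent with the observation that no single source of statistical data clearly outperformed the others.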
Even though five out of the twelve judges showed decreases in
accuracy at Level IV in comparison with Level III, these decreases
were slight and represented only one more incorrect judgment out of
the 25 judgments for all five judges.
At Level IV, the UGS group ignored the base rates and used the
conditional probabilities. The SGS group and the P group both used
a combination of conditional probabilities and base rates and both
of these groups had the same degree of accuracy at Level IV. Also,
both the SGS and P groups were more accurate than the UGS group.
It seems probable that statistical information is more
important than biographical information about the subjects since there
was a greater increase in accuracy with the addition of statistical
information. Other studies have shown that biographical data are of
minimal value to judges. Golden (1964) found that judges agreed
less in their description of protocols when they were given
identifying data alone than when they were given a single psychological
test or a combination of tests. Kostlan (1954) found that judges
were more accurate in their psychodiagnoses when they received both
social case histories and the more quantitative MMPI protocol than
when they received social case histories alone.
One may ask if the judges would have been more accurate had they
been given some feedback on their accuracy at each level of
information. This is possible but then the task would not have been as
lifelike in the sense that clinicians in actual situations must
usually wait some time before learning the accuracy of their
predictions. However, this does emphasize the point that clinicians should
check the accuracy of their predictions when possible and learn what
helps them to predict most accurately.
Experience and training
The lack of overall differences between groups was not
anticipated. It was assumed that the SGS group would have benefited
from their training in statistical decision theory. However,
artifacts in the design tended to wash out group effects by providing
a guaranteed hit rate if the Z scores were used at Levels III and IV.
The convergence of judgmental accuracy for each group at Level IV
lends some support for this argument.
The hypothesis that the SGS group would be the most accurate
was thus only tentatively supported since there was not a
significant group effect. However, the SGS group tended to be the most
accurate in their judgments. This finding implies that clinicians
can be trained to improve their own subjective inferences with
statistical information. These judges trained in statistical
decision theory were able to add their own intuitive judgments to the
statistical data and thereby predict more accurately than did the
discriminant function alone or than they had done without the
statistical data. This special training taught them not only how to use
statistical information but also how to use their clinical intuition
to its best advantage. The SGS group also tended to be the most
appropriate, that is, to know when their judgments were most accurate
and when they were most inaccurate.
The fact that the P group tended to be more accurate than the
UGS group was also unexpected in light of previous findings
concerning amount of clinical training and accuracy of prediction.
This finding does not support the previous evidence (Goldberg, 1959;
Oskamp, 1962; Shagoury, 1969; Shagoury & Satz, 1969) that as the
amount of clinical experience increases, prediction accuracy
decreases. In the present study, judges in the SGS and P groups
used familiar methods, the MMPI profiles or statistical data; the
UGS group, by contrast, was presented with essentially unfamiliar
prediction tools. One reason data in the present study were at
variance with previous findings is that previous studies required
clinicians to predict an unknown criterion or to use unfamiliar
methods so that any previous "set" of the clinician was not
advantageous. In the present study, the judges' familiarity with either the
MMPI or statistical types of data helped them in their predictions.
Interaction effects
The significant group by information interaction effect showed
at least indirect support for the experience level hypothesis that
the SGS group would be most accurate. This interaction showed that
the unsophisticated graduate students started off predicting below
chance and, finally at Level IV, reached the level of accuracy that
the sophisticated graduate students attained at Level I (MMPI
profiles alone). It was the former group with the least amount of
experience, familiarity with the MMPI, and sophistication with the
statistical decision theory which accounted for most of the group
differences and much of the interaction effect.
The rest of the interaction effect was due to the changes in
accuracy across information levels with the UGS group showing the
greatest change and the SGS group showing the least amount of
change in accuracy. The latter group started out predicting fairly
accurately and had less room for improvement while the former group
started out so poorly that their improvement was marked. The SGS
group predicted almost as well as the discriminant function with
the profiles only. The UGS group improved from below chance to the
level of accuracy achieved by the discriminant function.
Confidence
Previous studies would suggest that the trained clinicians
should have had less confidence in their judgments than the two
preprofessional groups. Although group differences for confidence were
nonsignificant, the professionals in the present study tended to be
the most confident. Again, this may have been because they were
using the MMPI with which they were more familiar than were the other
two groups. Also the professional group was predicting a criterion
about which they knew something, that is, length of stay in
psychotherapy. This again suggests that previous studies have placed the
clinician at a disadvantage so that he is less accurate and less
confident than he would be predicting in a familiar setting.
In general, adding information substantially increased the
judges' confidence. Judges became more confident as well as more
accurate with increments in information. However, the UGS group's
confidence did not increase until they had all the available
information.
One problem with asking judges to assign a confidence rating to
each judgment was that each judge had a different standard or set
for measuring how confident he was. The range of confidence scores
used also varied between judges and within groups so that one judge
used all six possible levels of confidence ranking (50, 60, 70, 80,
90, and 100) while another judge only used two (60, 70) or three
(80, 90, 100) rankings.
Appropriateness
The most meaningful measure to express appropriateness, defined
as the relationship between accuracy and confidence, was the
correlation coefficient. Just as accuracy and confidence increased with
each level of information, so did appropriateness. As judges became
more accurate they also became appropriately more confident.
The increases in appropriateness followed the same pattern as
the increases in accuracy and confidence. That is, appropriateness
increased significantly across levels of information but there was
only a tendency for one group to be more appropriate than the other
groups. As with accuracy, the SGS group tended to be the most
appropriate and the UGS group tended to be the least appropriate. This
contrasts with earlier findings that trained clinicians are more
appropriate in their confidence levels than are graduate students in
psychology (Oskamp, 1962; Shagoury, 1969). The findings of the
present study, however, do not contradict earlier findings since the
present differences between groups on the measure of appropriateness
were nonsignificant.
Applications
It appears that actuarial data and training in their use can be
applied to situations in which clinicians must predict and make
decisions. In the present study, judges were able to postdict length
of stay in psychotherapy fairly well. The next step would be to
apply these techniques to the same setting and predict a client's
length of stay in psychotherapy. This could then be followed up at
the end of treatment as a check of prediction accuracy. This would
enable the clinician to determine which short stays were "no shows"
and which were treated. Thus the accuracy of the discriminant function
Z score and of the judges' predictions could be much higher and more
useful for practical application to the clinician's population of
clients. This type of
procedure is most useful in a clinic situation which must limit the
number of clients seen or must screen those that will be seen.
Statistical methods of prediction can be particularly applicable to
the screening of patients to determine what type of treatment is most
appropriate and would be most useful for each client.
To use actuarial data in a clinic situation, they must first be
collected and analyzed. Too many clinical situations today fail to
make use of the data they have available. They do not even know the
base rates for various classifications of the clients they see.
Collecting and analyzing statistical data is another way to more
fully understand a particular clinical setting by learning what type
of patients are seen, how long they stay in treatment, and hopefully,
which ones are most likely to improve.
If a clinic decides to see everyone who comes in for help, tests
and statistical data are not of benefit in selecting whom to see.
However, these data might be used for prediction and research in a
setting which sees all clients. It is in situations where everyone
cannot be treated that improving tests and collecting base rate
information is most needed. Where decisions and predictions must be
made, actuarial methods are most needed to improve the clinician's
decision-making ability.
A further study which would be a fair and optimal test of
clinical versus statistical prediction would be to give judges an
opportunity to see the relationships of test variables with a criterion
on a standardization sample. Then, the judges would be compared
with a discriminant function on a cross validation sample. However,
this was not the purpose of the present study.
SUMMARY
The present study was designed to look at the effects of adding
quantitative and qualitative data to a relevant clinical judgment
task. In essence, it compared judges with varying degrees of
clinical experience to actuarial prediction methods. The study also
attempted to train judges to use actuarial information to improve
their prediction accuracy.
Twelve judges representing three levels of clinical experience
made postdictive judgments on the length of stay in psychotherapy
(short or long) from a sample of MMPI profiles of clients seen in a
university mental health service. Judgments were made under four
conditions in which qualitative and quantitative information was
added incrementally at each level. The three levels of judges'
experience were professional clinical psychologists, "sophisticated"
third year clinical psychology graduate students trained in
statistical decision theory, and "unsophisticated" third year clinical
psychology graduate students without any training in statistical
decision theory.
Accuracy increased over levels of information but there were no
differences in accuracy for the three levels of experience. A
significant group by information level interaction demonstrated some
group effects due to a lower proportion of correct judgments for the
less experienced judges under conditions involving the least amount
of information.
Judges became more confident in their judgments as they received
more information. Appropriateness, defined as accuracy weighted by
confidence and measured by correlation coefficients between accuracy
and confidence, increased substantially as increments of information
were added. The group trained in statistical decision theory tended
to make the most appropriate judgments and the least experienced
group of graduate students tended to make the least appropriate
judgments.
The present study showed that clinicians can use quantitative
data to improve their own judgmental ability and to predict more
accurately than actuarial data alone. Also, since those judges with
the most experience in using actuarial tasks tend to be the most
appropriate in their judgments, this implies that clinicians can also
be trained to be more appropriate and to know when their judgments
are more likely to be accurate.
APPENDIX A
INSTRUCTIONS
APPENDIX A-I
INSTRUCTIONS PART I
This study is designed to examine the decision process when only
limited information is available. You will be presented with 25
Minnesota Multiphasic Personality Inventory (MMPI) profiles of
students seen at the University of Florida Infirmary Mental Health
Service. Some of these students stayed a long time in therapy (5 or
more sessions, X=9) and some stayed only a short time (4 or less
sessions, X=2). Your task will be to decide which students stayed
a long time (L) and which stayed only a short time (S) on the basis
of the test profile alone.
Your task is to try to make the best estimate of probable length
of stay in psychotherapy given only limited information. It is
possible to correctly classify all the profiles. It is hoped that your
predictions will in some way help us to understand one aspect of the
decision-making process as it is applied by psychologists in clinical
settings.
You will also be asked to rate your confidence for each subject
on a scale from 50 per cent to 100 per cent. If you are positive of
your decision, you should mark 100 per cent; if you are only guessing
you should mark 50 per cent. That is, the more certain of your
decision, the higher percentage you should mark.
APPENDIX A-II
INSTRUCTIONS PART II
Your task on Part II is identical to that on Part I. You will
be presented the same 25 profiles and asked to predict (S) or (L).
However, this time more information will be available to you. That
is, you will also have biographical data. You may use this
information in any way you wish. You may choose to disregard the
information altogether and make your predictions as you did in Part I.
Your task is to try to make the best estimate of probable length
of stay in therapy given only limited information. It is possible to
correctly classify all the profiles. It is hoped that your
predictions will in some way help us to understand one aspect of the
decision-making process as it is applied by psychologists in clinical
settings.
Again, please indicate your confidence in your judgment for each
subject from 50 per cent to 100 per cent.
APPENDIX A-III
INSTRUCTIONS PART III
Your task on Part III is identical to that of Parts I and II.
You will be presented the same 25 profiles and asked to predict as
accurately as possible, on the basis of the information given,
whether the student is (S) or (L). Again, more information will be
made available to you. The following statistical information will
be added.
Discriminant function analysis provided weights for each of the
13 MMPI scale variables in order to obtain maximal differentiation
between long stayers (L) and short stayers (S). A composite score
(Z) was obtained which best estimates the combined relative effects
of all the scale variables.
This Z score is used to make the best prediction as to which
criterion group a particular profile belongs. This can be summarized
as follows:
1. Z≥32.02 is a positive test sign (+) and indicates a probable
long stay in therapy (L).
2. Z<32.02 is a negative test sign (-) and indicates a probable
short stay in therapy (S).
No test, however, classifies without some errors. This derived com
posite cutoff (Z=32.02) yields the following percentages of
classification:
Composite test sign Criterion
S L
2<32.02 69% 38%
Z!32.02 31% 62%
1
In other words:
1. A (+) test sign (Z≥32.02) correctly classified 62 per cent
of the long stayers (L). This is known as the valid positive
rate. Also, a (+) test sign incorrectly classified 31 per
cent of the short stayers (S) and this is the false positive
rate.
2. A (-) test sign (Z<32.02) correctly classified 69 per cent
of (S), the valid negative rate, and incorrectly classified
38 per cent of (L), the false negative rate.
This means that 38 per cent of the (L)'s scored below 32.02 and were
incorrectly classified (S), and 31 per cent of the (S)'s scored above
32.02 and were incorrectly classified as (L). The total percentage
correctly classified was 67 per cent.
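The cutoff rule and the 67 per cent figure above can be checked with a short sketch. It is an illustration only, not part of the original instructions: the function names are hypothetical, the example Z scores are invented, and the group proportions (66 per cent short, 34 per cent long) are taken from the Part IV instructions.

```python
# Cutoff and classification rates as stated in the Part III instructions.
CUTOFF = 32.02
VALID_POSITIVE = 0.62  # share of long stayers (L) with Z >= cutoff
VALID_NEGATIVE = 0.69  # share of short stayers (S) with Z < cutoff

# Group proportions as stated in the Part IV instructions.
BASE_RATE_L = 0.34
BASE_RATE_S = 0.66


def sign_for(z):
    """Return '+' when Z is at or above the cutoff, otherwise '-'."""
    return "+" if z >= CUTOFF else "-"


def predicted_group(z):
    """Map the test sign onto the predicted criterion group."""
    return "L" if sign_for(z) == "+" else "S"


def overall_hit_rate():
    """Weight each group's correct-classification rate by its share of the sample."""
    return VALID_POSITIVE * BASE_RATE_L + VALID_NEGATIVE * BASE_RATE_S


if __name__ == "__main__":
    for z in (29.5, 32.02, 35.1):  # hypothetical Z scores, for illustration
        print(z, sign_for(z), predicted_group(z))
    print(round(overall_hit_rate(), 2))
```

Under these assumptions the weighted hit rate rounds to .67, which agrees with the total percentage reported above.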
You will be required to predict as accurately as possible
whether the student belongs to (S) or (L), short or long stay. It is
possible to score every profile correctly, scoring 100 per cent.
You may predict (S) or (L) by using (1) the composite Z score
cutoff, (2) the biographical data, (3) the profile alone, or (4) any
combination of (1), (2), and (3). The composite cutoff score was
applied to yield the best overall classification rate but no test
is perfect and errors may be made with any procedure. It is quite
possible that the clinician may be able to improve upon the linear
statistical method (Z score) by utilizing combinations of both
"intuitive" and statistical data.
Your task is to try to make the best estimate of probable length
of stay in therapy given additional, but limited, information. It is
possible to correctly classify all the profiles. It is hoped that
your predictions will in some way help us to understand one aspect
of the decision-making process as it is applied by psychologists in
clinical settings.
APPENDIX A-IV
INSTRUCTIONS PART IV
Your task on Part IV is identical to that of Parts I, II, and
III, utilizing the same 25 profiles. You are to predict as
accurately as possible, on the basis of the information given, whether
the student is (S) or (L). Again, more information will be made
available to you. In addition to the composite Z score, biographical
data, and test data, you will also be told the conditional
probabilities and base rates for the groups and test signs.
Conditional probabilities combine test signs, (+) or (-), and
base rates to yield a quantitative index of the probability of
correct classification when Z≥32.02 (+) or when Z<32.02 (-).
For example, some of the subjects will be (L) when Z≥32.02 (+)
and some will be (S) when Z<32.02 (-). The problem is to determine
how confident we can be with each test sign under the base rates of
the population. The base rates for the two groups are: Short (S)=
66 per cent and Long (L)=34 per cent. In other words, 34 per cent
of the subjects stayed a long time in therapy and 66 per cent stayed
only a short time. The majority, therefore, were shorts (S).
Based on this information, the conditional probabilities are:
for a (+) test sign, P(L/+)=.51 and for a (-) test sign, P(S/-)=.78.
This means that the probability of a person staying a long time in
therapy (L), given a positive test sign, is .51, and the probability
of a person staying a short time (S), given a negative test sign, is
.78.
A conditional probability of .51 for a (+) test sign means that
you would be as often wrong as you were correct in predicting (L) for
a (+) sign. A conditional probability of .78 for a (-) test sign
means that you would be correct more often than you would be wrong
in predicting (S) for a (-) test sign.
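The two conditional probabilities quoted above follow from Bayes' rule applied to the classification rates in the Part III instructions and the base rates given here. The sketch below is an illustrative check, not the author's own computation; the function and variable names are hypothetical.

```python
def posterior(rate_in_group, rate_in_other_group, base_rate):
    """Bayes' rule for a two-group problem.

    rate_in_group:       P(sign | group of interest)
    rate_in_other_group: P(sign | the other group)
    base_rate:           P(group of interest)
    Returns P(group of interest | sign).
    """
    true_part = rate_in_group * base_rate
    false_part = rate_in_other_group * (1.0 - base_rate)
    return true_part / (true_part + false_part)


# P(L | +): valid positive rate .62, false positive rate .31, base rate of L .34
p_long_given_plus = posterior(0.62, 0.31, 0.34)

# P(S | -): valid negative rate .69, false negative rate .38, base rate of S .66
p_short_given_minus = posterior(0.69, 0.38, 0.66)

if __name__ == "__main__":
    print(round(p_long_given_plus, 2), round(p_short_given_minus, 2))
```

With these inputs the results round to .51 and .78. The (+) sign is barely better than chance because long stayers are the minority group, so the false positives drawn from the larger short-stay group nearly equal the valid positives.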
Your task is to try to make the best estimate of probable length
of stay in psychotherapy given additional, but limited, information.
It is possible to correctly classify all the profiles. It is hoped
that your predictions will in some way help us to understand one
aspect of the decision-making process as it is applied by
psychologists in clinical settings.
Again, please indicate your confidence in your judgment for each
subject from 50 per cent to 100 per cent.
APPENDIX B
SUMMARY OF NEWMAN-KEULS TESTS
APPENDIX B-I
SUMMARY OF NEWMAN-KEULS TEST FOR GROUP MEAN DIFFERENCES
Differences among Level I means
                 UGS     P       SGS
X̄(UGS) = .46     --      .12*    .15**
X̄(P)   = .58             --      .03
X̄(SGS) = .61                     --
*p<.05
**p<.01
Differences among Level II means
                 UGS     P       SGS
X̄(UGS) = .57     --      .04     .11
X̄(P)   = .61             --      .05
X̄(SGS) = .66                     --
Differences among Level III means
                 UGS     P       SGS
X̄(UGS) = .63     --      .03     .10
X̄(P)   = .66             --      .07
X̄(SGS) = .73                     --
Differences among Level IV means
                 UGS     P       SGS
X̄(UGS) = .66     --      .04     .05
X̄(P)   = .70             --      .01
X̄(SGS) = .71                     --
APPENDIX B-II
SUMMARY OF NEWMAN-KEULS TEST FOR INFORMATION MEAN DIFFERENCES
Differences among SGS means
                 I       II      IV      III
X̄(I)   = .61     --      .05     .09     .12
X̄(II)  = .66             --      .04     .07
X̄(IV)  = .70                     --      .03
X̄(III) = .73                             --
Differences among P means
                 I       II      III     IV
X̄(I)   = .58     --      .03     .08     .13
X̄(II)  = .61             --      .05     .10
X̄(III) = .66                     --      .05
X̄(IV)  = .71                             --
Differences among UGS means
                 I       II      III     IV
X̄(I)   = .46     --      .11*    .17     .20
X̄(II)  = .57             --      .06     .09
X̄(III) = .63                     --      .03
X̄(IV)  = .66                             --
**p<.01
*p<.05
Differences among group means
                 I       II      III     IV
X̄(I)   = .55     --      .06*    .12**   .14**
X̄(II)  = .61             --      .06*    .08*
X̄(III) = .67                     --      .02
X̄(IV)  = .69                             --
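The entries in these summaries are differences between ordered means. As a reading aid, the sketch below recomputes the differences for the overall means at the four information levels (.55, .61, .67, .69). It is an illustration only: the function name is hypothetical, and the significance tests themselves, which need studentized range critical values, are not reproduced.

```python
from itertools import combinations


def ordered_differences(means):
    """Return {(lower_label, higher_label): difference} for every pair of means.

    The means are ranked from smallest to largest first, which is the
    layout a Newman-Keuls summary table uses.
    """
    ranked = sorted(means.items(), key=lambda item: item[1])
    return {
        (low[0], high[0]): round(high[1] - low[1], 2)
        for low, high in combinations(ranked, 2)
    }


# Overall proportion of correct judgments at each information level.
overall_means = {"I": 0.55, "II": 0.61, "III": 0.67, "IV": 0.69}
diffs = ordered_differences(overall_means)

if __name__ == "__main__":
    for pair, diff in diffs.items():
        print(pair, diff)
```

The six values agree with the upper triangle of the last table: .06, .12 and .14 for Level I against Levels II to IV, .06 and .08 for Level II, and .02 for Level III against Level IV.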
REFERENCES
Adams, J. K. A confidence scale defined in terms of expected
percentages. Amer. J. Psychol., 1957, 70, 432-436.
Dahlstrom, W. G., & Welsh, G. S. An MMPI handbook: a guide to use
in clinical practice and research. Minneapolis, Minn.: The
University of Minnesota Press, 1968.
Edwards, A. L. Experimental design in psychological research. New
York: Rinehart & Company, Inc., 1950.
Goldberg, L. R. The effectiveness of clinicians' judgements: the
diagnosis of brain damage from the Bender-Gestalt Test. J.
consult. Psychol., 1959, 23, 25-33.
Goldberg, L. R. Simple models or simple processes? Some research on
clinical judgements. Amer. Psychologist, 1968, 23, 483-496.
Golden, M. Some effects of combining psychological tests on clinical
inferences. J. consult. Psychol., 1964, 28, 440-446.
Holt, R. R. Clinical and statistical prediction: a reformulation and
some new data. J. abnorm. soc. Psychol., 1958, 56, 1-12.
Holtzman, W. H. Can the computer supplant the clinician? J. clin.
Psychol., 1960, 16, 119-122.
Kirk, R. E. Experimental design: procedures for the behavioral
sciences. Belmont, Calif.: Brooks/Cole Publishing Co., 1968.
Kostlan, A. A method for the empirical study of psychodiagnosis. J.
consult. Psychol., 1954, 18, 83-88.
Lindzey, G. Seer versus sign. J. exp. Res. Pers., 1965, 1, 17-26.
Meehl, P. E. Clinical versus statistical prediction. Minneapolis,
Minn.: University of Minnesota Press, 1954.
Meehl, P. E. Seer versus signs: the first good example. J. exp.
Res. Pers., 1965, 1, 27-33.
Meehl, P. E., & Rosen, A. Antecedent probability and the efficiency
of signs, patterns, or cutting scores. Psychol. Bull., 1955,
52, 194-216.
Megargee, E. I. (ed.). Research in clinical assessment. New York:
Harper & Row Publishers, 1966.
Mello, Nancy K., & Guthrie, G. M. MMPI profiles and behavior in
counseling. J. counsel. Psychol., 1958, 5, 125-129.
Oskamp, S. The relationship of clinical experience and training
methods to several criteria of clinical prediction. Psychol.
Monogr., 1962, 76, No. 547.
Satz, P. A block rotation task: the application of multivariate and
decision theory analysis for the prediction of organic brain
disorder. Psychol. Monogr., 1966, 80, No. 629.
Shagoury, P. The influence of statistical information on clinical
decisions. Unpublished master's thesis, University of
Florida, 1969.
Shagoury, P., & Satz, P. The effect of statistical information on
clinical prediction. Proceedings: 77th Annual Convention,
APA, 1969, 310-311.
BIOGRAPHICAL SKETCH
Ann Weimer Moxley was born March 14, 1946, in New York City,
New York. She attended public schools in Gainesville, Florida, and
graduated from Gainesville High School in 1964. She enrolled in the
University of Florida in September, 1964, and received her Bachelor
of Arts degree, magna cum laude, in December, 1967. She received
her Master of Science degree at the University of Florida in December,
1968. She is currently engaged in her clinical psychology internship
at the University of Rochester's Strong Memorial Hospital in Rochester,
New York.
She is married to James Edward Moxley who received his Master's
in Business Administration at the University of Florida and is now
employed by Eastman Kodak in Rochester, New York. Ann is a member of
Phi Beta Kappa, Phi Kappa Phi, Psi Chi, and Mortar Board.
This dissertation was prepared under the direction of the
chairman of the candidate's supervisory committee and has been
approved by all members of that committee. It was submitted to
the Dean of the College of Arts and Sciences and to the Graduate
Council, and was approved as partial fulfillment of the
requirements for the degree of Doctor of Philosophy.
December 1970
Dean, College of Arts and Sciences
Dean, Graduate School
Supervisory Committee:
Chairman

Full Text 
PAGE 2
ACKNOWLEDGEMENTS
The author wishes to express her gratitude to her committee chairman, Dr. Paul Satz, for his comments, patience, and inspiration throughout the initiation, implementation and completion of this research. The author is also indebted to the other members of her committee, Dr. Audrey Schumacher, Dr. Ben Barger, Dr. Marvin Shaw, and Dr. Donald Childers, for their assistance and criticisms. The author wishes to express her special thanks to her husband, Jim, whose moral support was invaluable and most appreciated.
PAGE 3
TABLE OF CONTENTS
Page
ACKNOWLEDGEMENTS ii
LIST OF TABLES iv
LIST OF FIGURES v
ABSTRACT vi
INTRODUCTION 1
METHOD 14
RESULTS 21
DISCUSSION 35
SUMMARY 46
APPENDIX A INSTRUCTIONS 48
APPENDIX B SUMMARY OF NEWMAN-KEULS TESTS 56
REFERENCES 61
BIOGRAPHICAL SKETCH 63
PAGE 4
LIST OF TABLES
Table Page
1. Actual and test-predicted therapeutic outcome 9
2. Design schematic 18
3. Proportion of correct judgments 22
4. Summary of analysis of variance of accuracy 23
5. Mean confidence scores 27
6. Summary of analysis of variances of confidence 25
7. Correlation coefficients for appropriateness 31
8. Summary of analysis of variance of appropriateness correlations 32
PAGE 5
LIST OF FIGURES
Figure Page
1. Accuracy by information levels 24
2. Confidence by information levels 29
3. Appropriateness correlations between accuracy and confidence by information levels 33
PAGE 6
Abstract of Dissertation Presented to the Graduate Council in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy at the University of Florida
THE EFFECTS OF STATISTICAL INFORMATION ON CLINICAL JUDGEMENT
By Ann Weimer Moxley
December, 1970
Chairman: Dr. Paul Satz
Major Department: Psychology
The present study was designed to look at the effects of adding quantitative and qualitative data to a relevant clinical judgment task. In essence, it compared judges with varying degrees of clinical experience to actuarial prediction methods. The study also attempted to train judges to use actuarial information to improve their prediction accuracy. Twelve judges representing three levels of clinical experience made postdictive judgments on the length of stay in psychotherapy (short or long) from a sample of MMPI profiles of clients seen in a university mental health service. Judgments were made under four conditions in which qualitative and quantitative information was added incrementally at each level. The three levels of judges' experience were professional clinical psychologists, "sophisticated" third year clinical psychology graduate students trained in
PAGE 7
statistical decision theory, and "unsophisticated" third year clinical psychology graduate students without any training in statistical decision theory. Accuracy increased over levels of information but there were no differences in accuracy for the three levels of experience. A significant group by information level interaction demonstrated some group effects due to a lower proportion of correct judgments for the less experienced judges under conditions involving the least amount of information. Judges became more confident in their judgments as they received more information. Appropriateness, defined as accuracy weighted by confidence and measured by correlation coefficients between accuracy and confidence, increased substantially as increments of information were added. The group trained in statistical decision theory tended to make the most appropriate judgments and the least experienced group of graduate students tended to make the least appropriate judgments. The present study showed that clinicians can use quantitative data to improve their own judgmental ability and to predict more accurately than actuarial data alone. Also, since those judges with the most experience in using actuarial tasks tend to be the most appropriate in their judgments, this implies that clinicians can also be trained to be more appropriate and to know when their judgments are more likely to be accurate.
PAGE 8
INTRODUCTION
The objectives of the present study were twofold. The primary objective was to examine the effects of test and nontest (statistical) information on the judgmental process. The secondary objective, and the focus for implementing the primary objective, was to study the psychological attributes of individuals who stay only a short time in therapy versus those who remain a long time. That is, the objective of studying the judgmental process was couched in a real and relevant situation, length of stay in psychotherapy, which is a meaningful and pressing problem for psychologists today. However, this secondary objective was minor in relation to the major issue of e;
PAGE 9
methods found that the actuarial methods were either superior to clinicians or equal in efficiency to clinicians (Meehl, 1965). With the exception of one study, the clinician has shown no superiority to purely quantitative actuarial prediction. The one study which did find clinicians superior (Lindzey, 1965) used one to two clinicians and its application is somewhat questionable. One reason the clinician has not been superior to actuarial methods is that he has seldom been given the opportunity to incorporate the actuarial information in formulating his final decision. He has been at a disadvantage so that the demonstrated superiority of the actuarial method may be due to the experimental design rather than to an actual superiority of statistical techniques. Also, the information available to the clinician has often been based on nonquantitative data such as interview material, case history data, and projective tests. Holtzman (1960) separates the clinician's diagnostic task into three phases: (1) collection of information; (2) preparation and translation of this information for analysis; (3) interpretation of this information. As he points out, actuarial methods, and specifically the computer, are superior to the clinician in processing information once the primary coding has been done. The clinician is still superior at collecting information and at interpreting it because at present the computer lacks the appropriate rules and parameters for interpretation. Thus, studies which emphasize aspects of prediction suitable for actuarial methods do not use the clinician's talents to best advantage. It is when skilled clinicians use familiar methods to predict a criterion they know something about that they have the
PAGE 10
most success (Holt, 1958). This includes their having a rich body of data and systematic actuarial procedures at their disposal in addition to their own experience, intuition, and knowledge. Recent studies suggest that as the amount of clinical experience increases, prediction accuracy decreases (Goldberg, 1959; Oskamp, 1962; Shagoury, 1969; Shagoury & Satz, 1969). These studies compared trained clinicians with a professional degree to clinical psychology graduate students and even to nonprofessional groups, such as secretaries, and have found that the trained clinicians were not superior to the other groups. An explanation of this finding is that the more experienced clinician has developed a particular way of looking at data which interferes with his making unbiased, objective decisions. Another aspect of research in the area of clinical versus statistical predictions is the confidence clinicians place in their judgments and the appropriateness of their predictions. Appropriateness is a measure of confidence weighted by accuracy which was developed by Adams (1957). Confidence in judgments also differs between groups of graduate students and trained clinicians, with the trained psychologists being less confident in their judgments (Goldberg, 1959; Oskamp, 1962). When the measurement of appropriateness of the judgment is introduced, however, the trained clinicians are more appropriate in their confidence levels than are either graduate students or nonprofessionals (Oskamp, 1962; Shagoury, 1969). That is, clinicians are more confident of their correct decisions and less confident of their incorrect decisions. The amount of information available to the judge does not correlate with his predictive accuracy but
PAGE 11
increased amounts of information substantially increase confidence levels (Goldberg, 1968). Goldberg (1968) also discusses the nature of the judgmental process. He questions whether judges use simple decision-making models such as linear models, or complex processes such as configural models. In an analysis of clinicians' judgments he found that a linear model usually reproduced 90 to 100 per cent of the reliable judgmental variance on most decision-making tasks even though the clinicians generally felt that they used more complex, configural models.
Using Statistical Information to Increase Prediction Accuracy
Training in the use of statistical information has been shown to improve judgmental accuracy. In a study by Oskamp (1962), clinicians were able to improve their ability to distinguish psychiatric and medical patients on the basis of their Minnesota Multiphasic Personality Inventory (MMPI) profiles when they were provided with actuarial rules. The statistical formula predicted with 75 per cent accuracy and the clinicians, after training, were able to reach this 75 per cent accuracy level. Goldberg (1968) trained judges by giving them a formula and optimum cutting score for distinguishing neurotic from psychotic MMPI profiles. The judges were told that the statistical information predicted with 70 per cent accuracy and they were encouraged to use this information along with any other information they thought would improve their prediction accuracy. Goldberg found that after eight
PAGE 12
weeks of "value training," the judges, on the average, increased their accuracy from between 52 and 65 per cent to approximately 70 per cent. This was the only type of training that substantially improved accuracy. Thus, feedback is necessary if the clinician is to learn how to improve his decision-making techniques. Another useful type of statistical information is the incidence, or base rate, of a given trait in the population available to the clinician. Goldberg (1959), for example, had judges predict brain-damaged patients from functional patients on the basis of Bender-Gestalt protocols. The protocols were randomized into different groups in which the incidence of brain damage varied from high (P=.8) to low (P=.2). Goldberg found no difference in judgmental accuracy between these groups. Unfortunately, the base rate information was not provided to the judges. The importance of base rates for evaluating predictive tests was discussed by Meehl and Rosen (1955). They cite as an example an Army adjustment test for predicting which inductees would adjust to the service. The test predicted inductee adjustment with an accuracy of 79.7 per cent. However, the overall percentage of inductees who adjusted was 95 per cent; thus, utilization of the base rates alone (i.e., predicting adjustment in all cases) would result in a hit rate of 95 per cent. Another application of base rates is through Bayesian statistical theory which combines the base rates with the valid and false positive rates of a particular test to give a conditional probability for the likelihood of being correct or incorrect given a certain test
PAGE 13
sign in a given base rate population. Shagoury (1969) and Shagoury and Satz (1969) demonstrated that clinicians can substantially improve their predictive accuracy when provided with information on base rates and conditional probabilities. These studies showed that increments in statistical information, added to test data, significantly increased the accuracy of judges in a real-life clinical decision task of predicting brain-damaged and functional patients on the basis of a block rotation task (Satz, 1966). Their judges' accuracy approximated that obtained by a discriminant function predictor score (Z). Composite Z scores were deemphasized by the judges in favor of using the additional information such as the base rates, differential error risks, and conditional probabilities. However, in groups with a high incidence of brain-damaged individuals (base rate=.8) the judges' overall accuracy decreased, perhaps due to a reluctance to diagnose pathology. Meehl and Rosen (1955) point out that test development should be concentrated on populations with base rates near .50 rather than on populations with base rates approaching .00 or 1.00, since the use of a test in the latter cases will yield a lower hit rate than using the base rates alone. A cutting score, or composite Z score, derived from discriminant function analysis can be manipulated for various purposes in prediction. It can be used to maximize the number of correct predictions for all cases or to maximize only the correct predictions for positives. An excellent application of this technique of discriminant function analysis to decision theory in a clinical setting was demonstrated
PAGE 14
by Satz (1966). Discriminant function analysis is a statistical technique devised to maximally differentiate discrete criterion groups when multiple measurements are involved. This is essentially a multiple regression technique except for a discontinuous distribution on the criterion variables. The following linear equation expresses this function:
$Z = \lambda_1 X_1 + \lambda_2 X_2 + \cdots + \lambda_p X_p$
where $Z$ is the composite predictor score based on the individual scores on each of the variables ($X_1, X_2, \ldots, X_p$) and the respective weights, or lambdas, assigned to each of the variable scores ($\lambda_1, \lambda_2, \ldots, \lambda_p$). If there are two criterion groups involving multiple measures, the discriminant function determines optimal weights (lambdas) for these variables which will maximize the difference between the composite Z scores on both criterion groups.
Length of Stay in Psychotherapy as a Criterion Variable
Why is length of stay in psychotherapy a meaningful problem for study? First, there is the great demand for psychological services with a present-day manpower shortage of trained clinicians. Most clinics that see individuals with psychological problems are understaffed, have patient waiting-lists, or both. There are also differential risks involved in selecting who will be seen in therapy. It is far more serious to miss those who are severely disturbed and need long-term psychotherapy because of the threat these individuals may pose to themselves or to society, than it is to wrongly classify persons who need only a few sessions and are experiencing minor
PAGE 15
difficulties in their lives. The first type of error, that of predicting a short stay in therapy based on a negative test score when in fact the person stays a long time, is a false negative error. A false positive error results from the prediction of a long stay in therapy based on a positive test score when the individual actually stays only a few sessions. Meehl and Rosen (1955) point out that often in a clinical setting external restraints are imposed, perhaps due to a shortage of staff time, patient waiting-lists, or administrative policy. If this is the case, decisions cannot always be made in accordance with known base rates. They give the following example to illustrate the use of an externally imposed selection ratio. If 80 per cent of the patients referred to a mental health clinic are recoverable with intensive psychotherapy, then everyone should be treated rather than relying on a test which predicts only 75 per cent of those who will have a favorable therapy outcome. However, if staff time is limited and only half of the referrals can be treated, following the base rates is meaningless because this would lead to a decision that would be impossible to implement. In this case, where a selection ratio of .5 is externally imposed, the use of the test becomes worthwhile. Given the figures in Table 1 (Meehl & Rosen, 1955), those 50 cases out of the 100 referrals to be treated are selected from those individuals the test predicts will be "good" therapy risks. If this is done there is a 92.3 per cent hit rate among those selected for therapy (60/65). Stated another way, the test will be correct in 46 out of the 50 cases which will be successes (half of the 80 good therapeutic
PAGE 16
Table 1
Actual and Test-Predicted Therapeutic Outcome
Test Prediction     Therapeutic Outcome
                    Good    Poor    Total
Good                60      5       65
Poor                20      15      35
Total               80      20      100
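The arithmetic behind the selection-ratio example can be traced with a few lines of code. This is a sketch for illustration, using only the counts quoted in the text (60 of the 65 test-predicted "good" cases had a good outcome, 80 of the 100 referrals had a good outcome, and 50 can be treated). The variable names are hypothetical.

```python
# Counts quoted from the Meehl and Rosen (1955) example.
referrals = 100
actual_good_total = 80            # referrals with a good therapeutic outcome
predicted_good_total = 65         # referrals the test labels "good"
predicted_good_actual_good = 60   # of those, the number with a good outcome
treatment_slots = 50              # externally imposed selection ratio of .5

# Hit rate among cases the test predicts will be good risks: 60/65.
hit_rate_among_selected = predicted_good_actual_good / predicted_good_total

# Expected successes when the 50 treated cases are drawn from the
# test-predicted "good" group.
expected_successes_with_test = treatment_slots * hit_rate_among_selected

# Expected successes when 50 cases are treated without regard to the test,
# so that the base rate of .80 applies.
expected_successes_without_test = treatment_slots * actual_good_total / referrals

if __name__ == "__main__":
    print(round(hit_rate_among_selected * 100, 1))
    print(round(expected_successes_with_test))
    print(expected_successes_without_test)
```

On these figures, selecting by the test gives about 46 expected successes among the 50 treated, against 40 when half of the referrals are treated without the test.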
PAGE 17
outcome group). A second reason for selecting length of stay in psychotherapy as the focus for a clinical judgment study is that the problem can be subjected to multivariate and statistical decision theory analysis in order to increase the predictive relationship between signs and criteria. This possibility thus increases its application and potential usefulness to clinical judges. One study in this area found that there are differences in behavior in psychotherapy between individuals which are predictable from an MMPI profile (Mello & Guthrie, 1958). Mello and Guthrie studied 219 individuals seen at a college psychological clinic. They used only those profiles with at least one T score greater than 70. They found that length of stay in therapy was related to high scores on various scales of the MMPI. Of those students with high scores on Scale 2(D), kS per cent remained only one to three sessions. Persons high on Scale 3(Hy) tended to stay in therapy longer than the high 2's and also developed dependency on the therapists more easily. Scale 4(Pd) individuals seldom stayed past seven counseling sessions and as a group were quite resistant to therapy although they did not often cancel their appointments. Persons who stayed the longest in therapy were high on Scales 7(Pt) or 8(Sc) with some clients continuing past 60 and 21 sessions respectively for these two scales. Most of the high 9(Ma) students stayed fewer than 11 sessions and cancelled therapy sessions frequently. Mello and Guthrie concluded that a therapist can get some idea of what to expect from a particular client on the basis of his MMPI profile.
PAGE 18
The Mello and Guthrie study is interesting because it suggests that psychological data (MMPI) may be used by clinicians to more efficiently select clients for psychotherapy. Unfortunately, the authors did not examine this problem within the context of a decision-making task nor did they subject their data to multivariate analysis. Using length of stay in psychotherapy as the predictor criterion is valuable for other reasons. For the professional involved, it may clarify the services offered by his agency and help him to provide more adequate services to his clients. For example, he may decide that seeing many clients for a short period of time is of more value than giving those who need long-term therapy this service and thus seeing fewer clients. That is, prevention may be emphasized in a college mental health clinic and such a clinic may be designed to see as many students as possible to ease their transition from high school or junior college to a college curriculum. On the other hand, a clinic may be more treatment oriented and seek to help those who are more disturbed and require longer therapy. This emphasis would require more staff time per individual client and would necessitate seeing fewer clients. Decisions of whom to treat could be more adequately made with test and nontest information. To be able to predict length of stay in therapy could affect therapist expectations which could in turn affect outcome variables. Just what effect an expectation for a particular length of stay in therapy will have on the outcome of the therapy is outside the scope of the present study but is an important research question in itself. Of course, if the clinician intends to see everyone who enters his
PAGE 19
clinic, a screening procedure is worthless or may even be detrimental if the test predicts that an individual will not stay in therapy or will not improve in therapy, because this may lead the therapist to expect just these results to the client's disadvantage (Meehl & Rosen, 1955). It is often necessary for the clinician to indicate a therapy prognosis for an individual. If the clinician can predict or learn to accurately predict whether or not a person will stay in therapy, he is providing useful information for the person's treatment. Thus it can be seen that clinicians are constantly involved in the task of prediction and decision-making. If they can be trained to make use of relevant data and material, they may improve their predictions. Although many clinicians look with disfavor on the use of tests, tests combined with other relevant data can be shown to have practical and research applications. The clinician may use them to better his predictions and decision-making processes.
Hypotheses Tested
The present study was addressed to two objectives. First, to examine the decision-making process and to determine whether prediction accuracy is influenced by independent variables such as clinical experience and varying amounts and kinds of information. Second, the question of whether clinicians can be trained to improve their clinical decision processes was also examined. The first and primary objective was studied in terms of the second objective, a real-life situation that is meaningful to clinicians today: the problem of
PAGE 20
length of stay in psychotherapy. If increments in levels of statistical information increase prediction accuracy and thereby improve the clinicians' decision process, this type of information may be dovetailed into the operation of a clinic and taught to the staff to identify high-risk individuals. Specific questions, or hypotheses, were raised. Does judgmental accuracy increase as more information is added to the prediction task and what types of information are most useful in increasing judgmental accuracy? Will there be differences in accuracy dependent on experience level? That is, will graduate clinical psychology students trained in statistical decision theory be better clinical judges than experienced PhD clinical psychologists (without such training) and will less experienced psychologists be superior to more experienced clinicians? Will confidence and appropriateness increase with increments in information and will there be differences between the three experience levels with regard to their confidence and appropriateness?
METHOD

Subjects. Twelve judges (Js) represented three levels of experience and sophistication in statistical decision-making. A professional (P) group of four PhD clinical psychologists represented the highest level of clinical experience. A group of four clinical psychology graduate students trained (sophisticated) in statistical decision-making theory (SGS) represented the highest level of statistical sophistication. Another group of four unsophisticated (not trained in statistical decision theory) clinical psychology graduate students (UGS) represented the same experience level as the SGS group and the same level of statistical sophistication as the P group. Sophistication in decision theory was defined as participation in a graduate course in statistical decision theory for clinical psychology students at the University of Florida. Sophistication here implies only special training and by no means implies that the professionals were clinically unsophisticated.

Materials. Test materials for Js were a random sample of 100 MMPI profiles of clients seen in a university mental health service. The sample profiles were drawn from 241 profiles of all clients seen during a three-year period. Each J received 25 of the 100 profiles. Profiles were divided into two groups based on the client's length of stay in psychotherapy at the mental health service. A short
stay (S) was defined as four or fewer therapy sessions and a long stay (L) as five or more therapy sessions. The mean length of stay for the S group was 2.00 sessions and for the L group 9.27 sessions. A discriminant function analysis which maximized the difference between the two length-of-stay groups was run on the 241 MMPI profiles. The mean discriminant composite scores for the two groups on the 13 MMPI scale variables were $\bar{Z}_1$ = 29.7[illegible] for the few-session group (S) and $\bar{Z}_2$ = 34.26 for the many-session group (L). An analysis of variance of the composite means showed a significant difference between the two groups (F = [illegible], df = 12, 22[illegible], p < .001). A commonly used rule of $Z = (\bar{Z}_1 + \bar{Z}_2)/2$ was used to determine the optimal predictive cutting Z score. With an emphasis on minimizing the false negative rate, the composite Z score of 32.02 predicted with an overall hit rate of 67 per cent for the original protocol pool.

False negative errors represented those clients who were predicted as short stays (S), or negatives, but who remained long in therapy (L). It was felt that this predictive error was more serious than the false positive error, which included those clients who were predicted as long stays (L) but who remained a short time in therapy (S). It seemed more important to identify those clients who really needed long-term therapy than to identify those who did not. Of course, some of the individuals with high test scores who stayed only a few sessions may have been very disturbed but dropped out of therapy prematurely. There was no way to identify these cases, when a very disturbed student may have panicked or become threatened by therapy and dropped out or
simply missed appointments. The false negative rate for the Z score of 32.02 was .38 and the false positive rate was .31, giving a valid negative rate of .69 and a valid positive rate of .62. Another Z score, which minimized the false positive error and predicted with an overall accuracy of 71 per cent, was not used in the present study for the reason stated above.

Conditional probabilities were calculated for the Z cutting score with the following equations:

$$P(L/+) = \frac{P(L)\,P(+/L)}{P(L)\,P(+/L) + P(S)\,P(+/S)} \qquad \text{and} \qquad P(S/-) = \frac{P(S)\,P(-/S)}{P(S)\,P(-/S) + P(L)\,P(-/L)}$$

where:
L = many therapy sessions, or a long stay in therapy (base rate = .34)
S = few therapy sessions, or a short stay in therapy (base rate = .66)
+ = a positive test score (Z ≥ 32.02)
- = a negative test score (Z < 32.02)

For the Z score of 32.02 the conditional probabilities were P(L/+) = .51 and P(S/-) = .78. With this new information it can be seen that with a positive test score, predictions will be wrong as often as they are correct. But given a negative test score, predictions will be right 78 per cent, or most, of the time.

Finally, a random sample of 100 profiles from the total protocol pool of 241 cases was drawn. This was done so that the Js would have fewer protocols to judge, making their task more economical with regard to time.

[Footnote] Failure to control this factor undoubtedly lowered the predictive accuracy of the discriminant function equation (and perhaps clinical judgment) in that some of the disturbed profiles in the (S) criterion group may well have remained (L) if they had not dropped out.
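The conditional probabilities reported for the cutting score can be checked with a short calculation. The sketch below is illustrative only and is not part of the original study: the function and variable names are invented here, and it assumes base rates of .34 for long stays and .66 for short stays (the figures given in the Procedure section) together with the valid and false positive and negative rates reported for the Z score of 32.02.

```python
def posterior(prior_a, likelihood_a, prior_b, likelihood_b):
    """Bayes' rule for two mutually exclusive, exhaustive classes.

    Returns P(a / evidence), given each class's prior (base rate) and the
    probability of the evidence within each class.
    """
    numerator = prior_a * likelihood_a
    return numerator / (numerator + prior_b * likelihood_b)


# Assumed base rates (Procedure section): 34% long stays, 66% short stays.
P_L, P_S = 0.34, 0.66

# Rates reported for the cutting score Z = 32.02.
P_POS_GIVEN_L = 0.62  # valid positive rate
P_POS_GIVEN_S = 0.31  # false positive rate
P_NEG_GIVEN_S = 0.69  # valid negative rate
P_NEG_GIVEN_L = 0.38  # false negative rate

p_long_given_pos = posterior(P_L, P_POS_GIVEN_L, P_S, P_POS_GIVEN_S)
p_short_given_neg = posterior(P_S, P_NEG_GIVEN_S, P_L, P_NEG_GIVEN_L)

# Overall hit rate: correct positives plus correct negatives,
# each weighted by the base rate of its class.
hit_rate = P_L * P_POS_GIVEN_L + P_S * P_NEG_GIVEN_S

print(f"P(L/+) = {p_long_given_pos:.2f}")
print(f"P(S/-) = {p_short_given_neg:.2f}")
print(f"overall hit rate = {hit_rate:.2f}")
```

Under these assumptions the calculation gives values that round to the reported .51, .78, and 67 per cent, which suggests the three figures are mutually consistent.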
A second reason for drawing a random sample was to make the situation more relevant clinically in terms of the base rates. That is, the sample had only approximate base rates and the judges did not know the exact probabilities, for their sample, of those who remained a long or short time in therapy. For the sample, however, the Z score predicted with the same accuracy that it did for the total protocol pool.

Procedure. Refer to Table 2 for a schematic of the design. Js were asked to predict a client's length of stay in psychotherapy from the 25 MMPI profiles. These profiles, the sample of 100 profiles, and the original profile pool all had approximately the same base rates: 34 per cent of the clients stayed many sessions (L) and 66 per cent stayed a few sessions (S). The Js predicted length of stay in therapy (S or L) during four sessions, with additional information added incrementally at each session. These sessions, or levels of information, represented one class of independent variables. Groups, or experience level, represented the other class of independent variables. Each J made his predictions on the same 25 protocols that he received at the first level throughout the training.

Level I: Js were first given MMPI profiles with no other information.

Level II: Js were again presented the same 25 protocols for the same judgment but with the additional information of biographical data such as age, sex, marital status, religious preference, parents' marital status, previous counseling experience, and subsequent counseling experience.

Level III: For the third decision task, Js were given the
[Table 2. Schematic of the design.]
profiles and biographical data, with the additional statistical information of the cutting score based on discriminant function analysis. Valid positive and false positive percentages were also provided with the cutoff Z score.

Level IV: Conditional probabilities and the base rates were added to the previous information for the fourth presentation of profiles for prediction. (For a copy of the instructions for each information level see Appendix A.)

For each judgment Js also indicated their confidence in the accuracy of their judgment. To rule out a practice effect from repeated presentation of the same profiles, two control judges were used who predicted length of stay in psychotherapy using profiles only, with no additional information, on four separate occasions. Judges were presented the profiles for judgments on four days in a row with only one information level given each day.

Hypotheses.

I. Information Level: It was hypothesized that increments of information would increase overall judgmental accuracy and group accuracy. (A) Level I accuracy would be at approximately the level of chance. (B) At Level II, accuracy would decrease or remain the same. (C) Level III accuracy would be approximately that of the actuarial prediction accuracy of the discriminant function analysis. (D) Level IV accuracy would increase slightly over Level III accuracy.

II. Experience Level: It was hypothesized that the statistically sophisticated graduate students would be the most accurate, the statistically unsophisticated graduate students next most accurate,
and the professionals least accurate.

III. Confidence and Appropriateness: It was hypothesized that confidence ratings would increase with increments of information and that appropriateness would also increase with more information.
RESULTS

Accuracy: The Effects of Information and Experience

Accuracy was defined as the proportion of correct judgments per presentation of 25 MMPI profiles. The two control Js showed no practice effects. Judge A's accuracy was 52 per cent on the first presentation and 48 per cent on each of the three subsequent presentations. Judge B's accuracy was distributed across sessions as follows: 76 per cent, 48 per cent, 68 per cent, and 68 per cent. Table 3 presents accuracy by information level and experience level.

Two analyses of variance were conducted to determine the effects of information level, experience level, judges, profile set, and profile. The analysis of variance for profile set effects was nonsignificant (F = .5[illegible], df = 3, 9[illegible]). An $F_{max}$ test for homogeneity of variances between groups was also nonsignificant ($F_{max}$ = [illegible], df = 16). A summary of the analysis of variance for information level and experience level effects is shown in Table 4.

Information. Mean judgmental accuracy increased consistently with increments of information from Level I to Level IV ($\bar{X}_{I}$ = .55, $\bar{X}_{II}$ = .61, $\bar{X}_{III}$ = .67, $\bar{X}_{IV}$ = .68). These differences were significant (F = 10.82, df = 3, 27, p < .01). A graphic presentation of this trend is shown in Figure 1. Inspection of Figure 1 shows an approximately linear increase in accuracy for the three groups by information level. Both the P and UGS groups increased their accuracy at each level, while the SGS group showed increases at Levels II and III
[Table 3. Proportion of Correct Judgments, by experience level and information level.]
[Table 4. Summary of Analysis of Variance of Accuracy. Columns: Source of Variation, df, MS. Surviving rows: Mean (478.80); Information Level.]
[Fig. 1. Accuracy by information levels. Groups plotted: SGS, P, UGS. Horizontal axis: Information Level (I to IV).]
but a slight decrease in accuracy from Level III to Level IV.

Experience. There were no differences in accuracy due to experience level except for a trend toward group differences (F = 2.34, df = 2, 9, p < .20). The SGS group was the most accurate and the UGS group the least accurate ($\bar{X}_{SGS}$ = [illegible], $\bar{X}_{P}$ = .64, $\bar{X}_{UGS}$ = .58). Only the SGS group's overall accuracy was at the level of the discriminant function, which predicted with 67 per cent accuracy.

Information and experience level interaction. The only other significant source of variance was the group by information level interaction (F = 7.23, df = 6, 27, p < .01). The Newman-Keuls test of differences between means was used (Kirk, 1968) and the results of this analysis are given in Appendix B. The interaction was based largely on a significantly lower proportion of correct judgments of the UGS group at Level I. The UGS group not only started with the lowest proportion of correct judgments, but also showed the greatest increase in accuracy as information was added. Their final degree of accuracy, however, was approximately the same as the SGS accuracy at Level I. The UGS group significantly increased their level of accuracy at Levels III and IV from Level I when the composite Z score, conditional probabilities, and base rates were added (p < .01). The only significant increase in accuracy for the P group was between the first level, with the profile only, and the final level with all information (p < .05). Increases in accuracy for all groups across information levels were significant except the increase from Level III to Level IV where conditional probabilities and base rates
were added. Adding conditional probabilities and base rates to the previous information did not result in a significant increase in Js' accuracy over Level III, which included the composite Z score. For the SGS group there were no significant differences in accuracy across information levels. The only significant group differences within information levels were between the SGS group and the UGS group (p < .01) and between the SGS group and the P group (p < .05).

Confidence

Mean confidence scores by information level and experience level are shown
[Table 5. Mean Confidence Scores, by experience level and information level (I to IV), with totals. One surviving value: 61.2.]
Table 6
Summary of Analysis of Variance of Confidence

Source of Variation          df       MS        F
Information Level             2     52.67     5.38*
Experience Level              2    191.84      .39
Information X Experience      6     12.73     1.30
Judges within Groups          9    493.76
Information X Judges         28      9.79

* Significant at the .05 level.
[Fig. 2. Confidence by information levels. Horizontal axis: Information Level.]
given in Table 7 with the analysis of variance summary for appropriateness in Table 8. The analysis of variance was based on z transformations of the correlation coefficients. Mean appropriateness scores were significantly higher at each level of information (F = 22.03, df = 2, 28, p < .01). The SGS group was most appropriate because they were most accurate and not overconfident; that is, their confidence was consistent with their accuracy. The P group was overly confident and the UGS group was less accurate, making these two groups' confidence inconsistent with their accuracy. These trends can be seen in Figure 3.

Judges versus the discriminant function

The discriminant function correctly classified 67 per cent of the profiles. This information was given to the Js at Level III. At Level III only the SGS judges were more accurate than the discriminant function, with a hit rate of 73 per cent. The P group had a hit rate of 66 per cent and the UGS group had a hit rate of 63 per cent at Level III. The accuracy for all judges combined was 67 per cent. Five judges (two in the SGS group, two in the P group, and one in the UGS group) were more accurate overall than the linear regression Z score, and only one J (in the UGS group) operated below the chance level overall. At Level I, four Js had accuracy scores below the level of chance and two others were only slightly above chance. However, none of the four Js who were below chance was in the SGS group. At Level II there were two Js below chance and one slightly above; again,
[Table 7. Correlation Coefficients for Appropriateness, by experience level.]
Table 8
Summary of Analysis of Variance of Appropriateness Correlations

Source of Variation          df       MS
Information Level             2
Experience Level              2
Information X Experience      6
Judges within Groups          9
Information X Judges         28     .0705
[Fig. 3. Appropriateness correlations between accuracy and confidence by information levels. Horizontal axis: Information Level (I to IV).]
none of these Js was in the SGS group. With the addition of the statistical information, only one J (in the UGS group) remained near the chance level of accuracy and he was the least accurate of all the Js.
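The accuracy measure used throughout the Results (the proportion of correct judgments over a presentation of 25 profiles) can be written out as a small calculation. The sketch below is an illustration under stated assumptions, not the study's own scoring procedure: the function name is invented, and the predictions and criterion labels are hypothetical data chosen only to show how a judge's hit rate would be compared with the 67 per cent hit rate of the discriminant function.

```python
def accuracy(predictions, actual):
    """Proportion of judgments (S or L) that match the criterion."""
    if len(predictions) != len(actual):
        raise ValueError("predictions and actual must have the same length")
    correct = sum(p == a for p, a in zip(predictions, actual))
    return correct / len(actual)


# Hypothetical criterion for one set of 25 profiles: 16 short, 9 long stays.
actual = ["S"] * 16 + ["L"] * 9

# Hypothetical judge: 12 of the 16 short stays and 5 of the 9 long stays
# are predicted correctly (17 correct judgments in all).
predictions = ["S"] * 12 + ["L"] * 4 + ["L"] * 5 + ["S"] * 4

DISCRIMINANT_HIT_RATE = 0.67  # hit rate reported for the discriminant function

judge_accuracy = accuracy(predictions, actual)
beats_discriminant = judge_accuracy > DISCRIMINANT_HIT_RATE

print(f"judge accuracy = {judge_accuracy:.2f}")
print(f"more accurate than the discriminant function: {beats_discriminant}")
```

With 25 profiles, each judgment is worth four percentage points, so a single changed judgment moves a judge from 64 to 68 per cent.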
DISCUSSION

The present study demonstrated that judges can substantially improve their decision accuracy when provided with increments of information, particularly statistical information. This finding extends the earlier findings reported for a different clinical judgment task (Shagoury & Satz, 1969) and contrasts with previous studies which have used nonquantitative data. These findings also suggest that if the clinician is able to incorporate quantitative information, he may improve his own decision-making ability and equal or surpass the accuracy of actuarial methods.

The findings of the present study also showed that accuracy increased directly as a function of the amount of information available to the judges. Two conclusions that can be drawn from this finding are that the information was relevant to the judgmental task and that the judges used this information in formulating their decisions.

Information

A post-testing interview revealed that the type of information used varied between groups, among judges, and between information levels. However, the interview was not structured enough to determine the actual decision rules used by the judges. At Level I, with the MMPI profile only, most judges used their
own intuition about the relationship of which scales were elevated, and the extent of these elevations, to the length of stay in treatment criterion. There was a great deal of individual variation in approach since each judge had different training experiences with the MMPI. The judges of the SGS group had the most similar training experience in the use of the MMPI, since some training in the rationale and use of this instrument was given in the statistical decision theory course. The SGS group also showed the least amount of individual variation in accuracy at Level I.

The other group of graduate students (UGS) had the least amount of exposure to the MMPI. The UGS group was barely familiar with this test instrument and none of these judges had had any formal training in its use. It is interesting to note that the group of unsophisticated graduate students showed the lowest accuracy throughout and was the only group whose accuracy was ever below the level of chance. It seems, then, that the more familiar a judge is with a test instrument, the more accurate he will be in using it for prediction.

The SGS group was not familiar with the specific type of task used in the present study. That is, they had not been trained in relating MMPI data to length of stay in psychotherapy. This aspect of the study was novel to each of the three groups.

At Level II again each judge approached the data differently and selected certain measures to use in making
hypothesis that accuracy would decrease at Level II was not supported. It was originally felt that all of the biographical data provided would make the task more complex and more difficult and would thus confuse the judges. However, the judges were able to relate some of the information to the task and thereby improve their judgments. Most judges used some combination of factors. Whether the profile subject had previous counseling or subsequent counseling, and his age, were the factors used most often. Some of the judges also considered marital status when the subject was married. This finding (Level II) contrasts with other studies which indicate a lowering of accuracy when data are combined (Golden, 1964).

In support of the hypothesis, overall accuracy for Level III was the same as the discriminant function's accuracy of 67 per cent. With the addition of Z scores at Level III, only one judge, who was in the P group, used the cutoff score exclusively. In this same group one judge changed none of his judgments from Level II and the other two judges used primarily their own subjective inferences. The UGS group essentially ignored the Z scores and relied on their own intuition, and thus did not reach the level of accuracy of the cutoff score. All judges in the SGS group combined the cutoff score data with their own intuition to improve upon the accuracy of the discriminant function. These findings imply that the clinician can make use of his intuition and experience, but not at the expense of ignoring available data, particularly when they include quantitative information. The findings also imply that the most accurate judges are the ones who are able to utilize statistical data.
The fact that there was an increase in accuracy from Level III to Level IV, but that this increase was nonsignificant, supported the hypothesis that Level IV accuracy would increase slightly over Level III accuracy. For the SGS group there was a slight decrease in accuracy. One reason for this decrease might have been the information itself. These judges had been trained to use more powerful statistical information, that is, data that discriminated groups and subgroups more than did the data of the present study. The base rates of .65 and .35 were not sufficiently different from base rates of .5 to be of much help. Also, the conditional probabilities were not high enough to provide maximum discrimination. All of the statistical data given were in approximately a 2/3 to 1/3 ratio. Because all of the statistical data had approximately the same predictive power, it may have been difficult to know which kind of information would be most useful. Instead, judges may have tried to combine two or more kinds of data and as a result were less accurate than they would have been using one type exclusively. Quantitative information is most useful when it represents higher ratios, such as base rates of .2/.8 or .1/.9; conditional probabilities of .85/.15; and cutoff scores with hit rates of 75 per cent or higher.

Even though five out of the twelve judges showed decreases in accuracy at Level IV in comparison with Level III, these decreases were slight and represented only one more incorrect judgment out of the 25 judgments for each of the five judges.

At Level IV, the UGS group ignored the base rates and used the conditional probabilities. The SGS group and the P group both used
a combination of conditional probabilities and base rates, and both of these groups had the same degree of accuracy at Level IV. Also, both the SGS and P groups were more accurate than the UGS group.

It seems probable that statistical information is more important than biographical information about the subjects, since there was a greater increase in accuracy with the addition of statistical information. Other studies have shown that biographical data are of minimal value to judges. Golden (1964) found that judges agreed less in their description of protocols when they were given identifying data alone than when they were given a single psychological test or a combination of tests. Kostlan (1954) found that judges were more accurate in their psychodiagnoses when they received both social case histories and the more quantitative MMPI protocol than when they received social case histories alone.

One may ask if the judges would have been more accurate had they been given some feedback on their accuracy at each level of information. This is possible, but then the task would not have been as lifelike, in the sense that clinicians in actual situations must usually wait some time before learning the accuracy of their predictions. However, this does emphasize the point that clinicians should check the accuracy of their predictions when possible and learn what helps them to predict most accurately.

Experience and training

The lack of overall differences between groups was not anticipated. It was assumed that the SGS group would have benefited from their training in statistical decision theory. However,
artifacts in the design tended to wash out group effects by providing a guaranteed hit rate if the Z scores were used at Levels III and IV. The convergence of judgmental accuracy for each group at Level IV lends some support to this argument.

The hypothesis that the SGS group would be the most accurate was thus only tentatively supported, since there was not a significant group effect. However, the SGS group tended to be the most accurate in their judgments. This finding implies that clinicians can be trained to improve their own subjective inferences with statistical information. These judges trained in statistical decision theory were able to add their own intuitive judgments to the statistical data and thereby predict more accurately than did the discriminant function alone or than they had done without the statistical data. This special training taught them not only how to use statistical information but also how to use their clinical intuition to its best advantage. The SGS group also tended to be the most appropriate, that is, to know when their judgments were most accurate and when they were most inaccurate.

The fact that the P group tended to be more accurate than the UGS group was also unexpected in light of previous findings concerning amount of clinical training and accuracy of prediction. This finding does not support the previous evidence (Goldberg, 1959; Oskamp, 1962; Shagoury, 1969; Shagoury & Satz, 1969) that as the amount of clinical experience increases, prediction accuracy decreases. In the present study, judges in the SGS and P groups used familiar methods, the MMPI profiles or statistical data; the
UGS group, by contrast, was presented with essentially unfamiliar prediction tools. One reason the data in the present study were at variance with previous findings is that previous studies required clinicians to predict an unknown criterion or to use unfamiliar methods, so that any previous "set" of the clinician was not advantageous. In the present study, the judges' familiarity with either the MMPI or statistical types of data helped them in their predictions.

Interaction effects

The significant group by information interaction effect showed at least indirect support for the experience level hypothesis that the SGS group would be most accurate. This interaction showed that the unsophisticated graduate students started off predicting below chance and, finally at Level IV, reached the level of accuracy that the sophisticated graduate students attained at Level I (MMPI profiles alone). It was the former group, with the least amount of experience, familiarity with the MMPI, and sophistication in statistical decision theory, which accounted for most of the group differences and much of the interaction effect. The rest of the interaction effect was due to the changes in accuracy across information levels, with the UGS group showing the greatest change and the SGS group showing the least amount of change in accuracy. The latter group started out predicting fairly accurately and had less room for improvement, while the former group started out so poorly that their improvement was marked. The SGS group predicted almost as well as the discriminant function with
the profiles only. The UGS group improved from below chance to the level of accuracy achieved by the discriminant function.

Confidence

Previous studies would suggest that the trained clinicians should have had less confidence in their judgments than the two preprofessional groups. Although group differences for confidence were nonsignificant, the professionals in the present study tended to be the most confident. Again, this may have been because they were using the MMPI, with which they were more familiar than were the other two groups. Also, the professional group was predicting a criterion about which they knew something, that is, length of stay in psychotherapy. This again suggests that previous studies have placed the clinician at a disadvantage, so that he is less accurate and less confident than he would be predicting in a familiar setting.

In general, adding information substantially increased the judges' confidence. Judges became more confident as well as more accurate with increments in information. However, the UGS group's confidence did not increase until they had all the available information.

One problem with asking judges to assign a confidence rating to each judgment was that each judge had a different standard or set for measuring how confident he was. The range of confidence scores used also varied between judges and within groups, so that one judge used all six possible levels of confidence ranking (50, 60, 70, 80, 90, and 100) while another judge used only two (60, 70) or three (80, 50, 100) rankings.
Appropriateness

The most meaningful measure to express appropriateness, defined as the relationship between accuracy and confidence, was the correlation coefficient. Just as accuracy and confidence increased with each level of information, so did appropriateness. As judges became more accurate they also became appropriately more confident.

The increases in appropriateness followed the same pattern as the increases in accuracy and confidence. That is, appropriateness increased significantly across levels of information, but there was only a tendency for one group to be more appropriate than the other groups. As with accuracy, the SGS group tended to be the most appropriate and the UGS group tended to be the least appropriate. This contrasts with earlier findings that trained clinicians are more appropriate in their confidence levels than are graduate students in psychology (Oskamp, 1962; Shagoury, 1969). The findings of the present study, however, do not contradict earlier findings, since the present differences between groups on the measure of appropriateness were nonsignificant.

Applications

It appears that actuarial data and training in their use can be applied to situations in which clinicians must predict and make decisions. In the present study, judges were able to postdict length of stay in psychotherapy fairly well. The next step would be to apply these techniques to the same setting and predict a client's length of stay in psychotherapy. This could then be followed up at
the end of treatment as a check of prediction accuracy. This would enable the clinician to determine which short stays were "no shows" and which were treated. Thus the accuracy of the discriminant function Z score and of the judges' predictions could be much higher and more useful for practical application to the clinician's population of clients.

This type of procedure is most useful in a clinic situation which must limit the number of clients seen or must screen those that will be seen. Statistical methods of prediction can be particularly applicable to the screening of patients to determine what type of treatment is most appropriate and would be most useful for each client.

To use actuarial data in a clinic situation, they must first be collected and analyzed. Too many clinical settings today fail to make use of the data they have available. They do not even know the base rates for various classifications of the clients they see. Collecting and analyzing statistical data is another way to understand a particular clinical setting more fully, by learning what types of patients are seen, how long they stay in treatment, and, hopefully, which ones are most likely to improve.

If a clinic decides to see everyone who comes in for help, tests and statistical data are not of benefit in selecting whom to see. However, these data might be used for prediction and research in a setting which sees all clients. It is in situations where everyone cannot be treated that improving tests and collecting base rate information are most needed. Where decisions and predictions must be made, actuarial methods are most needed to improve the clinician's decision-making ability.
A further study which would be a fair and optimal test of clinical versus statistical prediction would be to give judges an opportunity to see the relationships of test variables with a criterion on a standardization sample. Then the judges would be compared with a discriminant function on a cross-validation sample. However, this was not the purpose of the present study.
SUMMARY

The present study was designed to look at the effects of adding quantitative and qualitative data to a relevant clinical judgment task. In essence, it compared judges with varying degrees of clinical experience to actuarial prediction methods. The study also attempted to train judges to use actuarial information to improve their prediction accuracy.

Twelve judges representing three levels of clinical experience made postdictive judgments on the length of stay in psychotherapy (short or long) from a sample of MMPI profiles of clients seen in a university mental health service. Judgments were made under four conditions in which qualitative and quantitative information was added incrementally at each level. The three levels of judges' experience were professional clinical psychologists, "sophisticated" third-year clinical psychology graduate students trained in statistical decision theory, and "unsophisticated" third-year clinical psychology graduate students without any training in statistical decision theory.

Accuracy increased over levels of information, but there were no differences in accuracy for the three levels of experience. A significant group by information level interaction demonstrated some group effects due to a lower proportion of correct judgments for the less experienced judges under conditions involving the least amount
PAGE 54
of information. Judges became more confident in their judgments as they received more information. Appropriateness, defined as accuracy weighted by confidence and measured by correlation coefficients between accuracy and confidence, increased substantially as increments of information were added. The group trained in statistical decision theory tended to make the most appropriate judgments and the least experienced group of graduate students tended to make the least appropriate judgments.

The present study showed that clinicians can use quantitative data to improve their own judgmental ability and to predict more accurately than actuarial data alone. Also, since those judges with the most experience in using actuarial tasks tend to be the most appropriate in their judgments, this implies that clinicians can also be trained to be more appropriate and to know when their judgments are more likely to be accurate.
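As an illustration of the appropriateness measure described in the summary, the following minimal sketch correlates a judge's accuracy (scored 1 for a correct judgment, 0 for an incorrect one) with the confidence rating given for each profile. The summary states only that correlation coefficients were used; the choice of the Pearson coefficient here, the function names, and the eight example judgments are all assumptions made for illustration and are not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


def appropriateness(correct, confidence):
    """Correlate accuracy (True/False per profile) with confidence (50-100).

    Assumption: accuracy is coded 1/0, which makes this a point-biserial
    correlation. The thesis does not say which coefficient was computed.
    """
    accuracy = [1.0 if c else 0.0 for c in correct]
    return pearson_r(accuracy, confidence)


# Hypothetical judge: eight judgments with confidence ratings in per cent.
correct = [True, True, False, True, False, False, True, True]
confidence = [90, 80, 55, 100, 60, 50, 70, 85]
r = appropriateness(correct, confidence)
print(f"appropriateness r = {r:.2f}")
```

A positive value means the judge tended to be more confident on the profiles that were in fact classified correctly, which is the sense of "appropriate" used above.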
PAGE 55
APPENDIX A INSTRUCTIONS
PAGE 56
APPENDIX A-I INSTRUCTIONS PART I

This study is designed to examine the decision process when only limited information is available. You will be presented with 25 Minnesota Multiphasic Personality Inventory (MMPI) profiles of students seen at the University of Florida Infirmary Mental Health Service. Some of these students stayed a long time in therapy (5 or more sessions, X̄ = 9) and some stayed only a short time (4 or less sessions, X̄ = 2). Your task will be to decide which students stayed a long time (L) and which stayed only a short time (S) on the basis of the test profile alone.

Your task is to try to make the best estimate of probable length of stay in psychotherapy given only limited information. It is possible to correctly classify all the profiles. It is hoped that your predictions will in some way help us to understand one aspect of the decision-making process as it is applied by psychologists in clinical settings.

You will also be asked to rate your confidence for each subject on a scale from 50 per cent to 100 per cent. If you are positive of your decision, you should mark 100 per cent; if you are only guessing you should mark 50 per cent. That is, the more certain of your decision, the higher percentage you should mark.
PAGE 57
APPENDIX A-II INSTRUCTIONS PART II

Your task on Part II is identical to that on Part I. You will be presented the same 25 profiles and asked to predict (S) or (L). However, this time more information will be available to you. That is, you will also have biographical data. You may use this information in any way you wish. You may choose to disregard the information altogether and make your predictions as you did in Part I.

Your task is to try to make the best estimate of probable length of stay in therapy given only limited information. It is possible to correctly classify all the profiles. It is hoped that your predictions will in some way help us to understand one aspect of the decision-making process as it is applied by psychologists in clinical settings.

Again, please indicate your confidence in your judgment for each subject from 50 per cent to 100 per cent.
PAGE 58
APPENDIX A-III INSTRUCTIONS PART III

Your task on Part III is identical to that of Parts I and II. You will be presented the same 25 profiles and asked to predict as accurately as possible, on the basis of the information given, whether the student is (S) or (L). Again, more information will be made available to you. The following statistical information will be added.

Discriminant function analysis provided weights for each of the 13 MMPI scale variables in order to obtain maximal differentiation between long stayers (L) and short stayers (S). A composite score (Z) was obtained which best estimates the combined relative effects of all the scale variables. This Z score is used to make the best prediction as to which criterion group a particular profile belongs. This can be summarized as follows:

1. Z ≥ 32.02 is a positive test sign (+) and indicates a probable long stay in therapy (L).
2. Z < 32.02 is a negative test sign (-) and indicates a probable short stay in therapy (S).

No test, however, classifies without some errors. This derived composite cutoff (Z = 32.02) yields the following percentages of
PAGE 59
classification:

                   Composite test sign
Criterion     Z < 32.02 (-)     Z ≥ 32.02 (+)
    S              69%               31%
    L              38%               62%

In other words:

1. A (+) test sign (Z ≥ 32.02) correctly classified 62 per cent of the long stayers (L). This is known as the valid positive rate. Also, a (+) test sign incorrectly classified 31 per cent of the short stayers (S) and this is the false positive rate.
2. A (-) test sign (Z < 32.02) correctly classified 69 per cent of (S), the valid negative rate, and incorrectly classified 38 per cent of (L), the false negative rate.

This means that 38 per cent of the (L)'s scored below 32.02 and were incorrectly classified (S), and 31 per cent of the (S)'s scored above 32.02 and were incorrectly classified as (L). The total percentage correctly classified was 67 per cent.

You will be required to predict as accurately as possible whether the student belongs to (S) or (L), short or long stay. It is possible to score every profile correctly, scoring 100 per cent. You may predict (S) or (L) by using (1) the composite Z score cutoff, (2) the biographical data, (3) the profile alone, or (4) any combination of (1), (2), and (3). The composite cutoff score was applied to yield the best overall classification rate but no test
PAGE 60
is perfect and errors may be made with any procedure. It is quite possible that the clinician may be able to improve upon the linear statistical method (Z score) by utilizing combinations of both "intuitive" and statistical data.

Your task is to try to make the best estimate of probable length of stay in therapy given additional, but limited, information. It is possible to correctly classify all the profiles. It is hoped that your predictions will in some way help us to understand one aspect of the decision-making process as it is applied by psychologists in clinical settings.
PAGE 61
APPENDIX A-IV INSTRUCTIONS PART IV

Your task on Part IV is identical to that of Parts I, II, and III, utilizing the same 25 profiles. You are to predict as accurately as possible, on the basis of the information given, whether the student is (S) or (L). Again, more information will be made available to you. In addition to the composite Z score, biographical data, and test data, you will also be told the conditional probabilities and base rates for the groups and test signs.

Conditional probabilities combine test signs, (+) or (-), and base rates to yield a quantitative index of the probability of correct classification when Z ≥ 32.02 (+) or when Z < 32.02 (-). For example, some of the subjects will be (L) when Z ≥ 32.02 (+) and some will be (S) when Z < 32.02 (-). The problem is to determine how confident we can be with each test sign under the base rates of the population.

The base rates for the two groups are: Short (S) = 66 per cent and Long (L) = 34 per cent. In other words, 34 per cent of the subjects stayed a long time in therapy and 66 per cent stayed only a short time. The majority, therefore, were shorts (S). Based on this information, the conditional probabilities are: for a (+) test sign, P(L/+) = .51 and for a (-) test sign, P(S/-) = .78. This means that the probability of a person staying a long time in
PAGE 62
therapy (L), given a positive test sign, is .51, and the probability of a person staying a short time (S), given a negative test sign, is .78. A conditional probability of .51 for a (+) test sign means that you would be as often wrong as you were correct in predicting (L) for a (+) sign. A conditional probability of .78 for a (-) test sign means that you would be correct more often than you would be wrong in predicting (S) for a (-) test sign.

Your task is to try to make the best estimate of probable length of stay in psychotherapy given additional, but limited, information. It is possible to correctly classify all the profiles. It is hoped that your predictions will in some way help us to understand one aspect of the decision-making process as it is applied by psychologists in clinical settings.

Again, please indicate your confidence in your judgment for each subject from 50 per cent to 100 per cent.
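The conditional probabilities quoted in Part IV can be reproduced from the classification rates given in Part III and the base rates given above by applying Bayes' rule. The sketch below is offered only as a check on that arithmetic; the variable and function names are illustrative assumptions, while the six input figures are those stated in the instructions.

```python
# Figures stated in the Part III and Part IV instructions.
BASE_RATE_L = 0.34     # proportion of long stayers
BASE_RATE_S = 0.66     # proportion of short stayers
VALID_POSITIVE = 0.62  # P(+ | L)
FALSE_POSITIVE = 0.31  # P(+ | S)
VALID_NEGATIVE = 0.69  # P(- | S)
FALSE_NEGATIVE = 0.38  # P(- | L)


def bayes(prior_target, p_sign_given_target, prior_other, p_sign_given_other):
    """P(target group | test sign) for a two-group population."""
    joint_target = prior_target * p_sign_given_target
    joint_other = prior_other * p_sign_given_other
    return joint_target / (joint_target + joint_other)


p_long_given_plus = bayes(BASE_RATE_L, VALID_POSITIVE, BASE_RATE_S, FALSE_POSITIVE)
p_short_given_minus = bayes(BASE_RATE_S, VALID_NEGATIVE, BASE_RATE_L, FALSE_NEGATIVE)
total_correct = BASE_RATE_L * VALID_POSITIVE + BASE_RATE_S * VALID_NEGATIVE

print(f"P(L/+) = {p_long_given_plus:.2f}")       # P(L/+) = 0.51
print(f"P(S/-) = {p_short_given_minus:.2f}")     # P(S/-) = 0.78
print(f"total correct = {total_correct:.2f}")    # total correct = 0.67
```

The same inputs also recover the 67 per cent total correct classification reported in Part III. Because 66 per cent of the subjects were short stayers, a (+) sign raises the probability of a long stay only from .34 to about .51, which is why the instructions describe that sign as no better than even odds.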
PAGE 63
APPENDIX B SUMMARY OF NEWMAN-KEULS TESTS
PAGE 64
APPENDIX B-I SUMMARY OF NEWMAN-KEULS TEST FOR GROUP MEAN DIFFERENCES

Differences among Level I means

[Table values are not recoverable from the scanned text.]
PAGE 65
Differences among Level III means

                  X̄(UGS)    X̄(P)    X̄(SGS)
X̄(UGS) = .63        -        .03      .10
X̄(P)   = .66                  -       .07
X̄(SGS) = .73                           -

Differences among Level IV means

[Table values are not recoverable from the scanned text.]
PAGE 66
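APPENDIX B-II SUMMARY OF NEWMAN-KEULS TEST FOR INFORMATION MEAN DIFFERENCES

Differences among SGS means

[Table only partly recoverable from the scanned text: means of .61, .70, and .73 (Level III) are legible; the remaining entries are lost.]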
PAGE 67
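Differences among UGS means

[Table only partly recoverable from the scanned text: column headings for Levels I through IV, a Level III mean of .63, and differences of .11 and .17 are legible; the remaining entries are lost.]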
PAGE 70
BIOGRAPHICAL SKETCH

Ann Weimer Moxley was born in March 1946 in New York City, New York. She attended public schools in Gainesville, Florida, and graduated from Gainesville High School in 1964. She enrolled in the University of Florida in September, 1964, and received her Bachelor of Arts degree, magna cum laude, in December, 1967. She received her Master of Science degree at the University of Florida in December, 1968. She is currently engaged in her clinical psychology internship at the University of Rochester's Strong Memorial Hospital in Rochester, New York.

She is married to James Edward Moxley, who received his Master's in Business Administration at the University of Florida and is now employed by Eastman Kodak in Rochester, New York.

Ann is a member of Phi Beta Kappa, Phi Kappa Phi, Psi Chi, and Mortar Board.
PAGE 71
This dissertation was prepared under the direction of the chairman of the candidate's supervisory committee and has been approved by all members of that committee. It was submitted to the Dean of the College of Arts and Sciences and to the Graduate Council, and was approved as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

December 1970

Dean, College of Arts and Sciences

Dean, Graduate School

Supervisory Committee:

Chairman

