THE EFFECTS OF STATISTICAL INFORMATION
ON CLINICAL JUDGEMENT
By
ANN WEIMER MOXLEY
A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF
THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1970
ACKNOWLEDGEMENTS
The author wishes to express her gratitude to her committee
chairman, Dr. Paul Satz, for his comments, patience, and inspiration
throughout the initiation, implementation and completion of this
research. The author is also indebted to the other members of her
committee, Dr. Audrey Schumacher, Dr. Ben Barger, Dr. Marvin Shaw, and
Dr. Donald Childers, for their assistance and criticisms. The author
wishes to express her special thanks to her husband, Jim, whose moral
support was invaluable and most appreciated.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
INTRODUCTION
METHOD
RESULTS
DISCUSSION
SUMMARY
APPENDIX A INSTRUCTIONS
APPENDIX B SUMMARY OF NEWMAN-KEULS TESTS
REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES
Table
1. Actual and test-predicted therapeutic outcome
2. Design schematic
3. Proportion of correct judgments
4. Summary of analysis of variance of accuracy
5. Mean confidence scores
6. Summary of analysis of variance of confidence
7. Correlation coefficients for appropriateness
8. Summary of analysis of variance of appropriateness correlations
LIST OF FIGURES
Figure
1. Accuracy by information levels
2. Confidence by information levels
3. Appropriateness correlations between accuracy and confidence by information levels
Abstract of Dissertation Presented to the Graduate Council
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
at the University of Florida
THE EFFECTS OF STATISTICAL INFORMATION
ON CLINICAL JUDGEMENT
By
Ann Weimer Moxley
December, 1970
Chairman: Dr. Paul Satz
Major Department: Psychology
The present study was designed to look at the effects of adding
quantitative and qualitative data to a relevant clinical judgment
task. In essence, it compared judges with varying degrees of clinical
experience to actuarial prediction methods. The study also
attempted to train judges to use actuarial information to improve
their prediction accuracy.
Twelve judges representing three levels of clinical experience
made postdictive judgments on the length of stay in psychotherapy
(short or long) from a sample of MMPI profiles of clients seen in a
university mental health service. Judgments were made under four
conditions in which qualitative and quantitative information was
added incrementally at each level. The three levels of judges'
experience were professional clinical psychologists, "sophisticated"
third-year clinical psychology graduate students trained in
statistical decision theory, and "unsophisticated" third-year clinical
psychology graduate students without any training in statistical
decision theory.
Accuracy increased over levels of information but there were no
differences in accuracy for the three levels of experience. A
significant group by information level interaction demonstrated some
group effects due to a lower proportion of correct judgments for the
less experienced judges under conditions involving the least amount
of information.
Judges became more confident in their judgments as they received
more information. Appropriateness, defined as accuracy weighted by
confidence and measured by correlation coefficients between accuracy
and confidence, increased substantially as increments of information
were added. The group trained in statistical decision theory tended
to make the most appropriate judgments and the least experienced
group of graduate students tended to make the least appropriate
judgments.
The present study showed that clinicians can use quantitative
data to improve their own judgmental ability and to predict more
accurately than actuarial data alone. Also, the finding that the judges
with the most experience in actuarial tasks tended to be the most
appropriate in their judgments implies that clinicians can be trained
to be more appropriate and to know when their judgments are more
likely to be accurate.
INTRODUCTION
The objectives of the present study were twofold. The primary
objective was to examine the effects of test and nontest (statistical)
information on the judgmental process. The secondary objective, and
the focus for implementing the primary objective, was to study the
psychological attributes of individuals who stay only a short time in
therapy versus those who remain a long time. That is, the objective
of studying the judgmental process was couched in a real and relevant
situation, length of stay in psychotherapy, which is a meaningful and
pressing problem for psychologists today. However, this secondary
objective was minor in relation to the major issue of examining the
clinical judgment process.
Clinical Versus Actuarial Prediction
Ever since Paul Meehl's book, Clinical Versus Statistical
Prediction, clarified the issue of clinicians' predictions versus
actuarial predictions, there have been numerous studies comparing
these two prediction methods. As Meehl (1954) points out, however,
the two methods need not be mutually exclusive since the clinician can
incorporate actuarial methods and data into his prediction process.
Many studies have focused not only on comparing clinicians to
statistical formulae but also on improving the clinician's ability to
predict by giving him useful statistical information and training him
to use this information.
In general, the studies which compared clinicians to actuarial
methods found that the actuarial methods were either superior to
clinicians or equal in efficiency to clinicians (Meehl, 1965). With the
exception of one study, the clinician has shown no superiority to
purely quantitative actuarial prediction. The one study which did
find clinicians superior (Lindzey, 1965) used one to two clinicians
and its application is somewhat questionable. One reason the
clinician has not been superior to actuarial methods is that he has seldom
been given the opportunity to incorporate the actuarial information in
formulating his final decision. He has been at a disadvantage so that
the demonstrated superiority of the actuarial method may be due to the
experimental design rather than to an actual superiority of statistical
techniques. Also, the information available to the clinician has often
been based on nonquantitative data such as interview material, case
history data, and projective tests.
Holtzman (1960) separates the clinician's diagnostic task into
three phases: (1) collection of information; (2) preparation and
translation of this information for analysis; (3) interpretation of this
information. As he points out, actuarial methods, and specifically
the computer, are superior to the clinician in processing information
once the primary coding has been done. The clinician is still
superior at collecting information and at interpreting it because at
present the computer lacks the appropriate rules and parameters for
interpretation. Thus, studies which emphasize aspects of prediction
suitable for actuarial methods do not use the clinician's talents to
best advantage. It is when skilled clinicians use familiar methods
to predict a criterion they know something about that they have the
most success (Holt, 1958). This includes their having a rich body of
data and systematic actuarial procedures at their disposal in addition
to their own experience, intuition, and knowledge.
Recent studies suggest that as the amount of clinical experience
increases, prediction accuracy decreases (Goldberg, 1959; Oskamp,
1962; Shagoury, 1969; Shagoury & Satz, 1969). These studies compared
trained clinicians with a professional degree to clinical psychology
graduate students and even to nonprofessional groups, such as
secretaries, and have found that the trained clinicians were not superior
to the other groups. An explanation of this finding is that the more
experienced clinician has developed a particular way of looking at
data which interferes with his making unbiased, objective decisions.
Another aspect of research in the area of clinical versus
statistical predictions is the confidence clinicians place in their
judgments and the appropriateness of their predictions. Appropriateness
is a measure of confidence weighted by accuracy which was developed
by Adams (1957). Confidence in judgments also differs between groups
of graduate students and trained clinicians, with the trained
psychologists being less confident in their judgments (Goldberg, 1959; Oskamp,
1962). When the measurement of appropriateness of the judgment is
introduced, however, the trained clinicians are more appropriate in
their confidence levels than are either graduate students or
nonprofessionals (Oskamp, 1962; Shagoury, 1969). That is, clinicians
are more confident of their correct decisions and less confident of
their incorrect decisions. The amount of information available to
the judge does not correlate with his predictive accuracy but
increased amounts of information substantially increase confidence
levels (Goldberg, 1968).
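The notion of appropriateness can be illustrated with a short sketch. It treats appropriateness as the correlation between the correctness of each judgment (scored 0 or 1) and the confidence attached to it, which is how the abstract describes its measurement. The judgments, the 1 to 5 confidence scale, and the variable names are hypothetical and are not taken from the study.

```python
# Appropriateness sketched as the correlation between correctness and
# confidence. All judgments and ratings below are hypothetical.
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation; with 0/1 values in xs it is the point-biserial r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


correct = [1, 1, 0, 1, 0, 1, 1, 0]  # 1 = correct judgment, 0 = incorrect

# Assumed 1-5 confidence ratings for two imaginary judges with equal accuracy.
confidence_judge_a = [5, 4, 2, 5, 1, 4, 3, 2]  # confident mainly when right
confidence_judge_b = [2, 1, 5, 2, 4, 1, 3, 5]  # confident mainly when wrong

accuracy = sum(correct) / len(correct)
r_judge_a = pearson_r(correct, confidence_judge_a)
r_judge_b = pearson_r(correct, confidence_judge_b)

print(f"accuracy = {accuracy:.3f}")
print(f"appropriateness, judge A = {r_judge_a:.2f}")
print(f"appropriateness, judge B = {r_judge_b:.2f}")
```

Both imaginary judges are equally accurate, yet only the first is more confident on correct than on incorrect decisions, so only the first obtains a positive coefficient.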
Goldberg (1968) also discusses the nature of the judgmental
process. He questions whether judges use simple decision-making
models such as linear models, or complex processes such as configural
models. In an analysis of clinicians' judgments he found that a
linear model usually reproduced 90 to 100 per cent of the reliable
judgmental variance on most decision-making tasks even though the
clinicians generally felt that they used more complex, configural
models.
Using Statistical Information to Increase Prediction Accuracy
Training in the use of statistical information has been shown
to improve judgmental accuracy. In a study by Oskamp (1962),
clinicians were able to improve their ability to distinguish psychiatric
and medical patients on the basis of their Minnesota Multiphasic
Personality Inventory (MMPI) profiles when they were provided with
actuarial rules. A statistical formula predicted with 75 per cent
accuracy and the clinicians, after training, were able to reach this 75
per cent accuracy level.
Goldberg (1968) trained judges by giving them a formula and
optimum cutting score for distinguishing neurotic from psychotic MMPI
profiles. The judges were told that the statistical information
predicted with 70 per cent accuracy and they were encouraged to use
this information along with any other information they thought would
improve their prediction accuracy. Goldberg found that after eight
weeks of "value training," the judges, on the average, increased their
accuracy from between 52 and 65 per cent to approximately 70
per cent. This was the only type of training that substantially
improved accuracy. Thus, feedback is necessary if the clinician is to
learn how to improve his decision-making techniques.
Another useful type of statistical information is the incidence,
or base rate, of a given trait in the population available to the
clinician. Goldberg (1959), for example, had judges distinguish
brain-damaged patients from functional patients on the basis of
Bender-Gestalt protocols. The protocols were randomized into different
groups in which the incidence of brain damage varied from high (P=.8)
to low (P=.2). Goldberg found no difference in judgmental accuracy
between these groups. Unfortunately, the base rate information was
not provided to the judges.
The importance of base rates for evaluating predictive tests was
discussed by Meehl and Rosen (1955). They cite as an example an Army
adjustment test for predicting which inductees would adjust to the
service. The test predicted inductee adjustment with an accuracy of
79.7 per cent. However, the overall percentage of inductees who
adjusted was 95 per cent; thus, utilization of the base rates alone
(i.e., predicting adjustment in all cases) would result in a hit rate
of 95 per cent.
Another application of base rates is through Bayesian
statistical theory which combines the base rates with the valid and false
positive rates of a particular test to give a conditional probability
for the likelihood of being correct or incorrect given a certain test
sign in a given base rate population.
Shagoury (1969) and Shagoury and Satz (1969) demonstrated that
clinicians can substantially improve their predictive accuracy when
provided with information on base rates and conditional probabilities.
These studies showed that increments in statistical information, added
to test data, significantly increased the accuracy of judges in a
real-life clinical decision task of predicting brain-damaged and
functional patients on the basis of a block rotation task (Satz, 1966).
Their judges' accuracy approximated that obtained by a discriminant
function predictor score (Z). Composite Z scores were deemphasized
by the judges in favor of using the additional information such as
the base rates, differential error risks, and conditional
probabilities. However, in groups with a high incidence of brain-damaged
individuals (base rate=.8) the judges' overall accuracy decreased,
perhaps due to a reluctance to diagnose pathology.
Meehl and Rosen (1955) point out that test development should be
concentrated on populations with base rates near .50 rather than on
populations with base rates approaching .00 or 1.00, since the use of
a test in the latter cases will yield a lower hit rate than using the
base rates alone.
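The base rate argument can be made concrete with a brief sketch. It compares the hit rate obtained by always predicting the more frequent outcome with the hit rate of a test. The Army figures are those cited above from Meehl and Rosen (1955). The second test, with valid positive and valid negative rates of .80, is hypothetical and serves only to show how the comparison changes with the base rate.

```python
# Hit rate from base rates alone versus hit rate from a test.
# The function names and the .80 test are illustrative assumptions.


def base_rate_hit_rate(p):
    """Hit rate from always predicting the more frequent outcome."""
    return max(p, 1 - p)


def hit_rate_with_test(p, valid_positive, valid_negative):
    """Overall hit rate of a test in a population with base rate p."""
    return p * valid_positive + (1 - p) * valid_negative


# Army adjustment example: 95 per cent adjust, the test is 79.7 per cent accurate.
army_base = base_rate_hit_rate(0.95)
army_test = 0.797

# Hypothetical test with valid positive and valid negative rates of .80.
comparison = {
    p: (hit_rate_with_test(p, 0.80, 0.80), base_rate_hit_rate(p))
    for p in (0.50, 0.80, 0.95)
}

print(f"Army example: base rates {army_base:.3f}, test {army_test:.3f}")
for p, (with_test, base_only) in comparison.items():
    print(f"base rate {p:.2f}: test {with_test:.2f}, base rates alone {base_only:.2f}")
```

With a base rate of .50 the hypothetical test improves on the base rates; with a base rate of .95 it does worse than predicting the majority outcome for every case.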
A cutting score, or composite Z score, derived from discriminant
function analysis can be manipulated for various purposes in
prediction. It can be used to maximize the number of correct predictions
for all cases or to maximize only the correct predictions for positives.
An excellent application of this technique of discriminant
function analysis to decision theory in a clinical setting was demonstrated
by Satz (1966). Discriminant function analysis is a statistical
technique devised to maximally differentiate discrete criterion groups
when multiple measurements are involved. This is essentially a
multiple regression technique except for a discontinuous distribution on
the criterion variables. The following linear equation expresses this
function:
$$Z = \lambda_1 X_1 + \lambda_2 X_2 + \cdots + \lambda_n X_n$$
where Z is the composite predictor score based on the individual scores
on each of the variables ($X_1, X_2, \ldots, X_n$) and the respective weights,
or lambdas, assigned to each of the variable scores ($\lambda_1, \lambda_2, \ldots, \lambda_n$).
If there are two criterion groups involving multiple measures, the
discriminant function determines optimal weights (lambdas) for these
variables which will maximize the difference between the composite Z
scores on both criterion groups.
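A minimal sketch of how such weights might be obtained is given here for two predictor variables. It assumes Fisher's two-group formulation, in which the lambdas are proportional to the inverse pooled within-group covariance matrix multiplied by the difference between the group mean vectors. The source does not give its computational details, and all scores below are invented for illustration.

```python
# Two-group linear discriminant sketch (pure Python, two variables only).
# Assumption: Fisher's formulation; the data are invented, not from the study.


def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def pooled_covariance(group_a, group_b):
    """Pooled within-group covariance matrix of the two groups."""
    p = len(group_a[0])
    s = [[0.0] * p for _ in range(p)]
    for group in (group_a, group_b):
        m = mean_vector(group)
        for r in group:
            for i in range(p):
                for j in range(p):
                    s[i][j] += (r[i] - m[i]) * (r[j] - m[j])
    dof = len(group_a) + len(group_b) - 2
    return [[v / dof for v in row] for row in s]


def discriminant_weights(group_a, group_b):
    """Lambdas proportional to S^-1 (mean_b - mean_a); 2 x 2 case only."""
    s = pooled_covariance(group_a, group_b)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    m_a, m_b = mean_vector(group_a), mean_vector(group_b)
    d = [m_b[0] - m_a[0], m_b[1] - m_a[1]]
    return [inv[0][0] * d[0] + inv[0][1] * d[1],
            inv[1][0] * d[0] + inv[1][1] * d[1]]


def composite_score(weights, scores):
    """Z = lambda_1 * X_1 + ... + lambda_n * X_n."""
    return sum(w * x for w, x in zip(weights, scores))


# Invented scores on two scales for five short-stay and five long-stay clients.
short_stay = [(50, 55), (52, 60), (48, 58), (55, 57), (51, 62)]
long_stay = [(60, 70), (63, 66), (58, 72), (65, 75), (61, 68)]

lambdas = discriminant_weights(short_stay, long_stay)
z_short = [composite_score(lambdas, r) for r in short_stay]
z_long = [composite_score(lambdas, r) for r in long_stay]
mean_z_short = sum(z_short) / len(z_short)
mean_z_long = sum(z_long) / len(z_long)

print("lambdas:", [round(w, 3) for w in lambdas])
print("mean Z, short stay:", round(mean_z_short, 2))
print("mean Z, long stay:", round(mean_z_long, 2))
```

The weights are chosen so that the two mean composite scores are as far apart as possible relative to the within-group spread; a cutting score can then be placed between them.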
Length of Stay in Psychotherapy as a Criterion Variable
Why is length of stay in psychotherapy a meaningful problem for
study? First, there is the great demand for psychological services
with a present-day manpower shortage of trained clinicians. Most
clinics that see individuals with psychological problems are
understaffed, have patient waiting-lists, or both. There are also
differential risks involved in selecting who will be seen in therapy. It
is far more serious to miss those who are severely disturbed and need
long-term psychotherapy because of the threat these individuals may
pose to themselves or to society, than it is to wrongly classify
persons who need only a few sessions and are experiencing minor
difficulties in their lives. The first type of error, that of
predicting a short stay in therapy based on a negative test score when
in fact the person stays a long time, is a false negative error. A
false positive error results from the prediction of many therapy sessions
based on a positive test score when the individual actually stays
only a few sessions.
Meehl and Rosen (1955) point out that often in a clinical
setting external restraints are imposed, perhaps due to a shortage of
staff time, patient waiting-lists, or administrative policy. If this
is the case, decisions cannot always be made in accordance with known
base rates. They give the following example to illustrate the use of
an externally imposed selection ratio. If 80 per cent of the patients
referred to a mental health clinic are recoverable with intensive
psychotherapy, then everyone should be treated rather than relying on
a test which predicts only 75 per cent of those who will have a
favorable therapy outcome. However, if staff time is limited and only half
of the referrals can be treated, following the base rates is
meaningless because this would lead to a decision that would be impossible
to implement. In this case, where a selection ratio of .5 is
externally imposed, the use of the test becomes worthwhile. Given the
figures in Table 1 (Meehl & Rosen, 1955), those 50 cases out of the
100 referrals to be treated are selected from those individuals the
test predicts will be "good" therapy risks. If this is done there is
a 92.3 per cent hit rate among those selected for therapy (60/65).
Stated another way, the test will be correct in 46 out of the 50
cases which will be successes (half of the 80 good therapeutic
outcome group).
Table 1
Actual and Test-Predicted Therapeutic Outcome
                       Therapeutic Outcome
Test Prediction      Good     Poor     Total
Good                  60        5        65
Poor                  20       15        35
Total                 80       20       100
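The figures quoted from Table 1 can be checked with a few lines of arithmetic. The sketch below uses only the four cell counts of the table; the variable names are illustrative.

```python
# Cell counts from Table 1 (Meehl & Rosen, 1955).
# Keys are (test prediction, actual therapeutic outcome).
table_1 = {
    ("good", "good"): 60,
    ("good", "poor"): 5,
    ("poor", "good"): 20,
    ("poor", "poor"): 15,
}

total = sum(table_1.values())
actual_good = table_1[("good", "good")] + table_1[("poor", "good")]
predicted_good = table_1[("good", "good")] + table_1[("good", "poor")]

# Treating everyone relies on the base rate of good outcomes alone.
treat_everyone_hit_rate = actual_good / total

# Overall accuracy of the test across all 100 referrals.
test_overall_hit_rate = (
    table_1[("good", "good")] + table_1[("poor", "poor")]
) / total

# Hit rate among those the test predicts to be good therapy risks.
hit_rate_predicted_good = table_1[("good", "good")] / predicted_good

# With a selection ratio of .5, the 50 treated cases are drawn from the
# 65 predicted-good cases, so about 46 of them are expected to succeed.
selected = 50
expected_successes = round(selected * hit_rate_predicted_good)

print(f"treat everyone: {treat_everyone_hit_rate:.2f}")
print(f"test, all cases: {test_overall_hit_rate:.2f}")
print(f"test, predicted good only: {hit_rate_predicted_good:.3f}")
print(f"expected successes among {selected} selected: {expected_successes}")
```

The comparison shows why the test is of no use when everyone can be treated (.75 against .80) but becomes worthwhile once only half of the referrals can be accepted (.923 among those selected).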
A second reason for selecting length of stay in psychotherapy as
the focus for a clinical judgment study is that the problem can be
subjected to multivariate and statistical decision theory analysis in
order to increase the predictive relationship between signs and
criteria. This possibility thus increases its application and potential
usefulness to clinical judges.
One study in this area found that there are differences in
behavior in psychotherapy between individuals which are predictable
from an MMPI profile (Mello & Guthrie, 1958). Mello and Guthrie
studied 219 individuals seen at a college psychological clinic. They
used only those profiles with at least one T score greater than 70.
They found that length of stay in therapy was related to high scores
on various scales of the MMPI. Of those students with high scores
on Scale 2(D), 45 per cent remained only one to three sessions.
Persons high on Scale 3(Hy) tended to stay in therapy longer than the
high 2's and also developed dependency on the therapists more easily.
Scale 4(Pd) individuals seldom stayed past seven counseling sessions
and as a group were quite resistant to therapy although they did not
often cancel their appointments. Persons who stayed the longest in
therapy were high on Scales 7(Pt) or 8(Sc) with some clients
continuing past 60 and 21 sessions respectively for these two scales. Most
of the high 9(Ma) students stayed fewer than 11 sessions and cancelled
therapy sessions frequently. Mello and Guthrie concluded that a
therapist can get some idea of what to expect from a particular client
on the basis of his MMPI profile.
The Mello and Guthrie study is interesting because it suggests
that psychological data (MMPI) may be used by clinicians to more
efficiently select clients for psychotherapy. Unfortunately, the
authors did not examine this problem within the context of a
decision-making task nor did they subject their data to multivariate analysis.
Using length of stay in psychotherapy as the predictor criterion
is valuable for other reasons. For the professional involved, it may
clarify the services offered by his agency and help him to provide
more adequate services to his clients. For example, he may decide
that seeing many clients for a short period of time is of more value
than giving those who need long-term therapy this service and thus
seeing fewer clients. That is, prevention may be emphasized in a
college mental health clinic and such a clinic may be designed to see
as many students as possible to ease their transition from high school
or junior college to a college curriculum. On the other hand, a
clinic may be more treatment oriented and seek to help those who are
more disturbed and require longer therapy. This emphasis would
require more staff time per individual client and would necessitate
seeing fewer clients. Decisions of whom to treat could be more
adequately made with test and nontest information.
To be able to predict length of stay in therapy could affect
therapist expectations which could in turn affect outcome variables.
Just what effect an expectation for a particular length of stay in
therapy will have on the outcome of the therapy is outside the scope
of the present study but is an important research question in itself.
Of course, if the clinician intends to see everyone who enters his
clinic, a screening procedure is worthless or may even be detrimental
if the test predicts that an individual will not stay in therapy or
will not improve in therapy, because this may lead the therapist to
expect just these results to the client's disadvantage (Meehl & Rosen,
1955).
It is often necessary for the clinician to indicate a therapy
prognosis for an individual. If the clinician can predict or learn
to accurately predict whether or not a person will stay in therapy,
he is providing useful information for the person's treatment.
Thus it can be seen that clinicians are constantly involved in
the task of prediction and decision-making. If they can be trained
to make use of relevant data and material, they may improve their
predictions. Although many clinicians look with disfavor on the use
of tests, tests combined with other relevant data can be shown to
have practical and research applications. The clinician may use them
to better his predictions and decision-making processes.
Hypotheses Tested
The present study was addressed to two objectives. The first was to
examine the decision-making process and to determine whether
prediction accuracy is influenced by independent variables such as clinical
experience and varying amounts and kinds of information. Second, the
question of whether clinicians can be trained to improve their
clinical decision processes was also examined. The first and primary
objective was studied in terms of the second objective, a real-life
situation that is meaningful to clinicians today: the problem of
length of stay in psychotherapy. If increments in levels of
statistical information increase prediction accuracy and thereby improve
the clinicians' decision process, this type of information may be
dovetailed into the operation of a clinic and taught to the staff to
identify high-risk individuals. Specific questions, or hypotheses,
were raised. Does judgmental accuracy increase as more information
is added to the prediction task and what types of information are
most useful in increasing judgmental accuracy? Will there be
differences in accuracy dependent on experience level? That is, will
graduate clinical psychology students trained in statistical decision
theory be better clinical judges than experienced PhD clinical
psychologists (without such training) and will less experienced
psychologists be superior to more experienced clinicians? Will
confidence and appropriateness increase with increments in information
and will there be differences between the three experience levels
with regard to their confidence and appropriateness?
METHOD
Subjects. Twelve judges (Js) represented three levels of
experience and sophistication in statistical decision-making. A
professional (P) group of four PhD clinical psychologists represented
the highest level of clinical experience. A group of four clinical
psychology graduate students trained (sophisticated) in statistical
decision-making theory (SGS) represented the highest level of
statistical sophistication. Another group of four unsophisticated (not
trained in statistical decision theory) clinical psychology graduate
students (UGS) represented the same experience level as the SGS group
and the same level of statistical sophistication as the P group.
Sophistication in decision theory was defined as participation in a
graduate course in statistical decision theory for clinical psychology
students at the University of Florida. Sophistication here only
implies special training and by no means implies that the professionals
were clinically unsophisticated.
Materials. Test materials for Js were a random sample of 100
MMPI profiles of clients seen in a university mental health service.
The sample profiles were drawn from 241 profiles of all clients seen
during a three-year period. Each J received 25 of the 100 profiles.
Profiles were divided into two groups based on the client's
length of stay in psychotherapy at the mental health service. A short
stay (S) was defined as four or fewer therapy sessions and a long stay
(L) as five or more therapy sessions. The mean length of stay for
the S group was 2.00 sessions and for the L group 9.27 sessions.
A discriminant function analysis which maximized the difference
between the two length of stay in psychotherapy groups was run on
the 241 MMPI profiles. The mean discriminant composite scores for
the two length of stay in therapy groups on the 13 MMPI scale
variables were $\bar{Z}_1=29.74$ for the few-session group (S) and $\bar{Z}_2=34.26$ for
the many-session group (L). An analysis of variance of the composite
means showed a significant difference between the two groups (F=4.19,
df=12,224, p<.001). A commonly used rule of
$$Z = \frac{\bar{Z}_1 + \bar{Z}_2}{2}$$
was used to determine the optimal predictive cutting Z score.
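The midpoint rule can be sketched with the two reported group means. The classification function below assumes that a composite score at or above the cutting score is a positive sign, that is, a predicted long stay; the function and variable names are illustrative. The midpoint of the rounded means is 32.00, which differs slightly from the cutting score of 32.02 used in the study; the source does not explain the difference, which may reflect rounding of the reported means.

```python
# Midpoint cutting score from the two reported mean composite scores.
mean_z_short = 29.74  # few-session group (S)
mean_z_long = 34.26   # many-session group (L)

midpoint_cut = (mean_z_short + mean_z_long) / 2


def predict_stay(z, cut):
    """Assumed rule: Z at or above the cut is a positive sign (long stay)."""
    return "L" if z >= cut else "S"


# Cutting score actually reported in the study.
reported_cut = 32.02

print(f"midpoint of rounded means: {midpoint_cut:.2f}")
print("Z = 33.00 ->", predict_stay(33.00, reported_cut))
print("Z = 31.00 ->", predict_stay(31.00, reported_cut))
```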
With an emphasis on minimizing the false negative rate, the
composite Z score of 32.02 predicted with an overall hit rate of 67 per
cent for the original protocol pool. False negative errors
represented those clients who were predicted as short-stays (S), or
negatives, but who remained long in therapy (L). It was felt that
this predictive error was more serious than the false positive error
which included those clients who were predicted as long-stays (L)
but who remained a short time in therapy (S). It seemed more
important to identify those clients who really needed long-term therapy
than to identify those who did not. Of course, some of the
individuals with high test scores who stayed only a few sessions may have
been very disturbed but dropped out of therapy prematurely. There
was no way to identify these cases¹ when a very disturbed student may
have panicked or become threatened by therapy and dropped out or
simply missed appointments. The false negative rate for the Z score
of 32.02 was .38, the false positive rate was .31, giving a valid
negative rate of .69 and a valid positive rate of .62.
Another Z score, which minimized the false positive error
prediction with an overall accuracy of 71 per cent, was not used in
the present study for the reason stated above.
Conditional probabilities were calculated for the Z cutting
score with the following equations:
$$P(L/+) = \frac{P(L)\,P(+/L)}{P(L)\,P(+/L) + P(S)\,P(+/S)} \quad\text{and}\quad P(S/-) = \frac{P(S)\,P(-/S)}{P(S)\,P(-/S) + P(L)\,P(-/L)}$$
where L=many therapy sessions or a long stay in therapy (base rate=.34)
S=few therapy sessions or a short stay in therapy (base rate=.66)
+=a positive test score (Z ≥ 32.02)
-=a negative test score (Z < 32.02)
For the Z score of 32.02 the conditional probabilities were:
P(L/+)=.51 and P(S/-)=.78. With this new information it can be seen
that with a positive test score, predictions will be wrong as often
as they are correct. But given a negative test score, predictions
will be right 78 per cent of the time, that is, most of the time.
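The conditional probabilities can be reproduced from the error rates reported for the cutting score of 32.02. The sketch below assumes base rates of .34 for long stays and .66 for short stays, which is the assignment under which the reported values of .51 and .78, and the overall hit rate of 67 per cent, are recovered. The function and variable names are illustrative.

```python
# Bayesian conditional probabilities for the Z cutting score of 32.02.
# Assumption: base rates P(L) = .34 and P(S) = .66.


def posterior(prior, likelihood, prior_other, likelihood_other):
    """Bayes' rule for two mutually exclusive groups."""
    numerator = prior * likelihood
    return numerator / (numerator + prior_other * likelihood_other)


p_long, p_short = 0.34, 0.66

valid_positive = 0.62  # P(+/L)
false_negative = 0.38  # P(-/L)
valid_negative = 0.69  # P(-/S)
false_positive = 0.31  # P(+/S)

p_long_given_positive = posterior(p_long, valid_positive, p_short, false_positive)
p_short_given_negative = posterior(p_short, valid_negative, p_long, false_negative)

# Overall hit rate: correct long-stay plus correct short-stay predictions.
overall_hit_rate = p_long * valid_positive + p_short * valid_negative

print(f"P(L/+) = {p_long_given_positive:.2f}")
print(f"P(S/-) = {p_short_given_negative:.2f}")
print(f"overall hit rate = {overall_hit_rate:.2f}")
```

The asymmetry between the two conditional probabilities follows from the base rates: because short stays are about twice as common as long stays, a negative sign is far more trustworthy than a positive one.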
Finally, a random sample of 100 profiles from the total protocol
pool of 241 cases was drawn. This was done so that the Js would have
fewer protocols to judge, making their task more economical with
regard to time.
¹ Failure to control this factor undoubtedly lowered the predictive
accuracy of the discriminant function equation (and perhaps
clinical judgment) in that some of the disturbed profiles in the (S)
criterion group may well have remained (L) if they had not dropped out.
A second reason for drawing a random sample was to make the
situation more relevant clinically in terms of the base rates. That
is, the sample had only approximate base rates and the judges did not
know the exact probabilities for their sample of those who remained a
long or short time in therapy. However, for the sample, the Z score
predicted with the same accuracy that it did for the total protocol
pool.
Procedure. Refer to Table 2 for a schematic of the design. Js
were asked to predict a client's length of stay in psychotherapy
from the 25 MMPI profiles. These profiles, the sample of 100
profiles and the original profile pool all had approximately the same
base rates: 35 per cent of the clients stayed many sessions (L) and
66 per cent stayed a few sessions (S). The Js predicted length of
stay in therapy (S or L) during four sessions, with additional
information added incrementally at each session. These sessions, or
levels of information, represented one class of independent variables.
Groups, or experience level, represented the other class of
independent variables.
Each J made his predictions on the same 25 protocols that he
received at the first level throughout the training. Level I: Js
were first given MMPI profiles with no other information. Level II:
Js were again presented the same 25 protocols for the same judgment
but with the additional information of biographical data such as age,
sex, marital status, religious preference, parents' marital status,
previous counseling experience, and subsequent counseling
experience. Level III: For the third decision task, Js were given the
profiles and biographical data, with the additional statistical
information of the cutting score based on discriminant function analysis.
Valid positive and false positive percentages were also provided
with the cutoff Z score. Level IV: Conditional probabilities and
the base rates were added to the previous information for the fourth
presentation of profiles for prediction. (For a copy of the
instructions for each information level see Appendix A.) For each judgment
Js also indicated their confidence in the accuracy of their
judgment.
[Table 2. Design schematic]
To rule out a practice effect from repeated presentation of the
same profiles, two control judges were used who predicted length of
stay in psychotherapy using profiles only, with no additional
information on four separate occasions.
Judges were presented the profiles for judgments on four days
in a row with only one information level given each day.
Hypotheses. I. Information Level: It was hypothesized that
increments of information would increase overall judgmental accuracy
and group accuracy. (A) Level I accuracy would be at approximately
the level of chance. (B) At Level II, accuracy would decrease or
remain the same. (C) Level III accuracy would be approximately that
of the actuarial prediction accuracy of the discriminant function
analysis. (D) Level IV accuracy would increase slightly over Level
II accuracy.
II. Experience Level: It was hypothesized that the statistically
sophisticated graduate students would be the most accurate, the
statistically unsophisticated graduate students next most accurate,
and the professionals least accurate.
III. Confidence and Appropriateness: It was hypothesized that
confidence ratings would increase with increments of information and
that appropriateness would also increase with more information.
RESULTS
Accuracy: The Effects of Infornation and Experience
Accuracy was defined as the proportion of correct judgments per
presentation of 25 MMPI profiles. The two control Js showed no
practice effects. Judge A's accuracy was 52 per cent on the first
presentation and 48 per cent on each of the three subsequent
presentations. Judge B's accuracy was distributed across sessions as
follows: 76 per cent, 48 per cent, 68 per cent, and 68 per cent.
Table 3 presents accuracy by information level and experience
level. Two analyses of variance were conducted to determine the
effects of information level, experience level, judges, profile set,
and profile. The analysis of variance for profile set effects was
nonsignificant (F=.59, df=3,91). An Fmax test for homogeneity of
variances between groups was also nonsignificant (Fmax=3.43, k=3,
df=16). A summary of the analysis of variance for information level
and experience level effects is shown in Table 4.
Information. Mean judgmental accuracy increased consistently
with increments of information from Level I to Level IV (X̄I=.55,
X̄II=.61, X̄III=.67, X̄IV=.69). These differences were significant
(F=10.82, df=3,27, p<.01). A graphic presentation of this trend
is shown in Figure 1. Inspection of Figure 1 shows approximately a
linear increase in accuracy for the three groups by information
level. Both the P and UGS groups increased their accuracy at each
level while the SGS group showed increases at Levels II and III
Table 3
Proportion of Correct Judgments

Experience             Information Level
Level       J        I     II    III    IV    Totals
SGS         1       .64   .68   .76   .76     .71
            2       .56   .68   .72   .64     .65
            3       .64   .68   .72   .72     .69
            4       .63   .60   .72   .68     .65
        Total       .61   .66   .73   .70     .68
UGS         5       .72   .64   .64   .76     .69
            6       .40   .64   .72   .68     .61
            7       .36   .52   .48   .56     .48
            8       .36   .48   .68   .64     .54
        Total       .46   .57   .63   .66     .58
P           9       .44   .48   .68   .72     .58
           10       .68   .68   .64   .76     .69
           11       .64   .68   .68   .72     .68
           12       .56   .60   .64   .60     .60
        Total       .58   .61   .66   .70     .64
Totals              .55   .61   .67   .69
Control A           .52   .48   .48   .48     .49
Control B           .76   .48   .68   .68     .65
Table 4
Summary of Analysis of Variance of Accuracy

Source of Variation          df      MS        F
Mean                          1    478.80
Information Level             3      1.21    10.82**
Experience Level              2      0.92     2.31
Information X Experience      6      0.81     7.23**
Judges                        9      0.39     0.69
Information X Judges         27      0.11     0.95
Profile                     288      0.57
Information X Profile       864      0.11
** Significant at the .01 level.
[Figure 1: accuracy plotted against Information Level (I to IV) for
the SGS, UGS, and P groups.]
Fig. 1. Accuracy by information levels.
but a slight decrease in accuracy from Level III to Level IV.
Experience. There were no differences in accuracy due to
experience level except for a trend toward group differences (F=2.34,
df=2,9, p<.20). The SGS group was the most accurate and the UGS group
the least accurate (X̄SGS=.68, X̄P=.64, X̄UGS=.58). Only the SGS
group's overall accuracy was at the level of the discriminant
function, which predicted with 67 per cent accuracy.
Information and experience level interaction. The only other
significant source of variance was the group by information level
interaction (F=7.23, df=6,27, p<.01). The Newman-Keuls test of
differences between means was used (Kirk, 1968) and the results of
this analysis are given in Appendix B. The interaction was based
largely on a significantly lower proportion of correct judgments of
the UGS group at Level I. The UGS group not only started with the
lowest proportion of correct judgments, but also showed the most
significant increase in accuracy as information was added. Their
final degree of accuracy, however, was approximately the same as the
SGS accuracy at Level I. The UGS group significantly increased
their level of accuracy at Levels III and IV from Level I when the
composite Z score, conditional probabilities, and base rates were
added (p<.01).
The only significant increase in accuracy for the P group was
between the first level, with the profile only, and the final level
with all information (p<.05). Increases in accuracy for all groups
across information levels were significant except the increase from
Level III to Level IV where conditional probabilities and base rates
were added. Adding conditional probabilities and base rates to the
previous information did not result in a significant increase in
Js' accuracy over Level III, which included the composite Z score.
For the SGS group there were no significant differences in accuracy
across information levels. The only significant group differences
within information levels were between the SGS group and the UGS
group (p<.01) and between the SGS group and the P group (p<.05).
Confidence
Mean confidence scores by information level and experience
level are shown in Table 5. Table 6 shows the summary of the
analysis of variance for confidence scores. The Js' confidence
increased significantly as subsequent items of information were added
to the protocols for all groups (F=5.38, df=2,28, p<.05). Although
there were no differences between confidence scores for groups, the
P group tended to be the most confident and the SGS group tended to
be the least confident (X̄P=76.86, X̄UGS=72.72, X̄SGS=69.96). These
trends are shown in Figure 2.
Appropriateness
Appropriateness (confidence weighted by accuracy) was measured
by Pearson product-moment correlations between confidence scores
and accuracy scores for each J at each level of information. There
were no significant group effects or interactions, but the SGS group
tended to be the most appropriate (r̄SGS=.26, r̄P=.20, r̄UGS=.17).
The higher the correlation, the more appropriate
the judgment. Correlation coefficients for appropriateness are
Table 5
Mean Confidence Scores

Experience             Information Level
Level       J        I      II     III     IV    Totals
SGS         1      61.2   64.0   65.6   65.6    64.1
            2      61.2   60.8   60.8   72.0    63.7
            3      75.6   74.0   73.6   74.4    74.9
            4      72.0   76.2   80.4   82.0    77.7
        Total      67.5   68.8   70.1   73.5    70.0
UGS         5      59.0   58.0   66.0   69.2    63.1
            6      85.4   86.8   84.0   92.2    87.1
            7      78.2   79.8   76.2   82.2    79.1
            8      64.2   61.6   61.6   62.0    62.4
        Total      71.7   71.6   71.5   76.4    72.7
P           9      87.6   87.2   88.4   90.0    88.3
           10      64.8   56.0   65.8   56.0    60.7
           11      85.2   87.8   86.4   86.6    86.5
           12      68.0   70.4   76.8   72.8    72.0
        Total      75.2   75.4   79.4   76.4    76.9
Totals             71.4   71.9   73.8   75.4
Table 6
Summary of Analysis of Variance of Confidence

Source of Variation          df      MS        F
Information Level             2     52.67    5.38*
Experience Level              2    191.84     .39
Information X Experience      6     12.73    1.30
Judges within Groups          9    493.76
Information X Judges         28      9.79
* Significant at the .05 level.
[Figure 2: mean confidence plotted against Information Level (I to
IV) for the SGS, P, and UGS groups.]
Fig. 2. Confidence by information levels.
given in Table 7 with the analysis of variance summary for
appropriateness in Table 8. The analysis of variance was based on Z
transformations of the correlation coefficients. Mean
appropriateness scores were significantly higher at each level of
information (F=22.03, df=2,28, p<.01).
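The appropriateness measure described above can be illustrated with a minimal sketch. It assumes that each judgment is scored 1 (correct) or 0 (incorrect) and that the "Z transformation" is Fisher's r-to-z transformation; the text states neither detail explicitly, and the confidence and correctness values below are hypothetical, not data from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # Correlation is undefined when one variable is constant;
        # this sketch simply reports 0.0 in that case (an assumption).
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def fisher_z(r):
    """Fisher r-to-z transformation (assumed to be the 'Z transformation')."""
    return math.atanh(r)

# Hypothetical ratings for one judge at one information level:
# confidence on the 50-100 scale, correctness coded 1 or 0.
confidence = [50, 60, 90, 100, 70, 80, 60, 90]
correct = [0, 0, 1, 1, 1, 1, 0, 0]

appropriateness = pearson_r(confidence, correct)
z_score = fisher_z(appropriateness)
```

A judge whose confidence is high mainly on correct judgments obtains a positive correlation; the transformed values, rather than the raw correlations, would then enter the analysis of variance.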
The SGS group was most appropriate because they were most
accurate and not overconfident, that is, their confidence was
consistent with their accuracy. The P group was overly confident and
the UGS group was less accurate, making these two groups' confidence
inconsistent with their accuracy. These trends can be seen in
Figure 3.
Judges versus the discriminant function
The discriminant function correctly classified 67 per cent of
the profiles. This information was given to the Js at Level III.
At Level III only the SGS judges were more accurate than the
discriminant function, with a hit rate of 73 per cent. The P group
had a hit rate of 66 per cent and the UGS group had a hit rate of 63
per cent at Level III. The accuracy for all judges combined was 67 per
cent. Five judges (two in the SGS group, two in the P group and one
in the UGS group) were more accurate overall than the linear
regression Z score and only one J (in the UGS group) operated below the
chance level overall.
At Level I, four Js had accuracy scores below the level of
chance and two others were only slightly above chance. However,
none of the four Js who were below chance was in the SGS group. At
Level II there were two Js below chance and one slightly above; again,
Table 7
Correlation Coefficients for Appropriateness

Experience             Information Level
Level       J        I     II    III    IV    Totals
SGS         1       .35   .35   .34   .28     .33
            2       .25   .20   .18   .65     .32
            3       .34   .31   .41   .28     .34
            4       .11   .06   .01   .02     .05
        Total       .26   .23   .24   .31     .26
UGS         5       .13   .06   .24   .03     .07
            6       .03   .15   .10   .47     .10
            7       .05   .03   .30   .47     .17
            8       .29   .37   .41   .27     .34
        Total       .09   .04   .26   .30     .17
P           9       .27   .20   .17   .33     .24
           10       .06   .23   .33   .31     .23
           11       .11   .19   .29   .28     .22
           12       .02   .07   .08   .41     .10
        Total       .11   .14   .22   .33     .20
Totals              .12   .13   .24   .31
Table 8
Summary of Analysis of Variance of Appropriateness Correlations

Source of Variation          df      MS        F
Information Level             2     .0705    22.03**
Experience Level              2     .0366     2.26
Information X Experience      6     .0032     2.25
Judges within Groups          9     .0162
Information X Judges         28     .0072
** Significant at the .01 level.
[Figure 3: appropriateness correlations plotted against Information
Level (I to IV) for the SGS, P, and UGS groups.]
Fig. 3. Appropriateness correlations between accuracy
and confidence by information levels.
none of these Js was in the SGS group. With the addition of the
statistical information, only one J (in the UGS group) remained near
the chance level of accuracy and he was the least accurate of all
the Js.
DISCUSSION
The present study demonstrated that judges can substantially
improve their decision accuracy when provided with increments of
information, particularly statistical information. This finding
extends the earlier findings reported for a different clinical
judgment task (Shagoury & Satz, 1969) and contrasts with previous studies
which have used nonquantitative data. These findings also suggest
that if the clinician is able to incorporate quantitative information,
he may improve his own decision-making ability and equal or surpass
the accuracy of actuarial methods.
The findings of the present study also showed that accuracy
increased directly as a function of the amount of information
available to the judges. Two conclusions that can be drawn from this
finding are that the information was relevant to the judgmental task
and that the judges used this information in formulating their
decisions.
Information
A post-testing interview revealed that the type of information
used varied between groups, among judges, and between information
levels. However, the interview was not structured enough to
determine the actual decision rules used by the judges.
At Level I, with the MMPI profile only, most judges used their
own intuition about the relationships of which scales were elevated
and the extent of these elevations to the length of stay in
treatment criterion. There was a great deal of individual variation in
approach since each judge had different training experiences with
the MMPI. The judges of the SGS group had the most similar training
experience in the use of the MMPI since some training in the
rationale and use of this instrument was given in the statistical decision
theory course. The SGS group also showed the least amount of
individual variation in accuracy at Level I. The other group of graduate
students (UGS) had the least amount of exposure to the MMPI. The
UGS group was barely familiar with this test instrument and none of
these judges had had any formal training in its use. It is
interesting to note that the group of unsophisticated graduate students
showed the lowest accuracy throughout and was the only group whose
accuracy was ever below the level of chance. It seems then, that
the more familiar a judge is with a test instrument, the more
accurate he will be in using it for prediction.
The SGS group was not familiar with the specific type of task
used in the present study. That is, they had not been trained in
correlating MMPI data to length of stay in psychotherapy. This
aspect of the study was novel to each of the three groups.
At Level II again each judge approached the data differently
and selected certain measures to use in making his decisions. The
variation within groups decreased and there was less difference in
this variation between groups. Accuracy increased or stayed the
same for all but one judge whose accuracy dropped. Thus, the
hypothesis that accuracy would decrease at Level II was not supported.
It was originally felt that all of the biographical data provided
would make the task more complex and more difficult and would thus
confuse the judges. However, the judges were able to relate some of
the information to the task and thereby improve their judgments.
Most judges used some combination of factors. Whether the profile
subject had previous counseling or subsequent counseling and his age
were the factors used most often. Some of the judges also considered
marital status when the subject was married. This finding (Level II)
contrasts with other studies which indicate lowering of accuracy
when data are combined (Golden, 1964).
In support of the hypothesis, overall accuracy for Level III
was the same as the discriminant function's accuracy of 67 per cent.
With the addition of Z scores at Level III, only one judge, who was
in the P group, used the cutoff score exclusively. In this same
group one judge changed none of his judgments from Level II and the
other two judges used primarily their own subjective inferences. The
UGS group essentially ignored the Z scores and relied on their own
intuition and thus did not reach the level of accuracy of the
cutoff score. All judges in the SGS group combined the cutoff score
data with their own intuition to improve upon the accuracy of the
discriminant function. These findings imply that the clinician can
make use of his intuition and experience but not at the expense of
ignoring available data, particularly when they include quantitative
information. The findings also imply that the most accurate judges
are the ones who are able to utilize statistical data.
The fact that there was an increase in accuracy from Level III
to Level IV, but that this increase was nonsignificant, supported the
hypothesis that Level IV accuracy would increase slightly over Level
III accuracy. For the SGS group there was a slight decrease in
accuracy. One reason for this decrease might have been the
information itself. These judges had been trained to use more powerful
statistical information, that is, data that discriminated groups and
subgroups more than did the data of the present study. The base
rates of .65 and .35 were not sufficiently different from base rates
of .5 to be of much help. Also the conditional probabilities were
not high enough to provide maximum discrimination. All of the
statistical data given were in approximately a 2/3 to 1/3 ratio.
Because all of the statistical data had approximately the same
predictive power, it may have been difficult to know which kind of
information would be most useful. Instead, judges may have tried to
combine two or more kinds of data and as a result were less accurate
than they would have been using one type exclusively. Quantitative
information is most useful when it represents higher ratios, such as
base rates of .2/.8 or .1/.9; conditional probabilities of .85/.15;
and cutoff scores of 75 per cent or higher.
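The point about base rates can be made concrete with a short sketch. It assumes a judge who ignores all other information and always predicts the more frequent outcome; the function name is hypothetical and the calculation is offered only as an illustration, not as an analysis carried out in the study.

```python
def majority_hit_rate(base_rate):
    """Hit rate from always predicting the more frequent of two outcomes.

    base_rate is the proportion of cases in one of the two groups.
    """
    return max(base_rate, 1.0 - base_rate)

# Base rates mentioned in the text: .65/.35 (present study, as cited
# in this paragraph) versus the more extreme .2/.8 and .1/.9 ratios.
for rate in (0.5, 0.65, 0.8, 0.9):
    print(f"base rate {rate:.2f}: hit rate {majority_hit_rate(rate):.2f}")
```

Under this assumption a .65/.35 split yields only a 65 per cent hit rate, close to the 67 per cent of the cutoff score, whereas a .1/.9 split would by itself yield 90 per cent.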
Even though five out of the twelve judges showed decreases in
accuracy at Level IV in comparison with Level III, these decreases
were slight and represented only one more incorrect judgment out of
the 25 judgments for all five judges.
At Level IV, the UGS group ignored the base rates and used the
conditional probabilities. The SGS group and the P group both used
a combination of conditional probabilities and base rates and both
of these groups had the same degree of accuracy at Level IV. Also,
both the SGS and P groups were more accurate than the UGS group.
It seems probable that statistical information is more
important than biographical information about the subjects since there
was a greater increase in accuracy with the addition of statistical
information. Other studies have shown that biographical data are of
minimal value to judges. Golden (1964) found that judges agreed
less in their description of protocols when they were given
identifying data alone than when they were given a single psychological
test or a combination of tests. Kostlan (1954) found that judges
were more accurate in their psychodiagnoses when they received both
social case histories and the more quantitative MMPI protocol than
when they received social case histories alone.
One may ask if the judges would have been more accurate had they
been given some feedback on their accuracy at each level of
information. This is possible but then the task would not have been as
lifelike in the sense that clinicians in actual situations must
usually wait some time before learning the accuracy of their
predictions. However, this does emphasize the point that clinicians should
check the accuracy of their predictions when possible and learn what
helps them to predict most accurately.
Experience and training
The lack of overall differences between groups was not
anticipated. It was assumed that the SGS group would have benefited
from their training in statistical decision theory. However,
artifacts in the design tended to wash out group effects by providing
a guaranteed hit rate if the Z scores were used at Levels III and IV.
The convergence of judgmental accuracy for each group at Level IV
lends some support for this argument.
The hypothesis that the SGS group would be the most accurate
was thus only tentatively supported since there was not a
significant group effect. However, the SGS group tended to be the most
accurate in their judgments. This finding implies that clinicians
can be trained to improve their own subjective inferences with
statistical information. These judges trained in statistical
decision theory were able to add their own intuitive judgments to the
statistical data and thereby predict more accurately than did the
discriminant function alone or than they had done without the
statistical data. This special training taught them not only how to use
statistical information but also how to use their clinical intuition
to its best advantage. The SGS group also tended to be the most
appropriate, that is, to know when their judgments were most accurate
and when they were most inaccurate.
The fact that the P group tended to be more accurate than the
UGS group was also unexpected in light of previous findings
concerning amount of clinical training and accuracy of prediction.
This finding does not support the previous evidence (Goldberg, 1959;
Oskamp, 1962; Shagoury, 1969; Shagoury & Satz, 1969) that as the
amount of clinical experience increases, prediction accuracy
decreases. In the present study, judges in the SGS and P groups
used familiar methods, the MMPI profiles or statistical data; the
UGS group, by contrast, was presented with essentially unfamiliar
prediction tools. One reason data in the present study were at
variance with previous findings is that previous studies required
clinicians to predict an unknown criterion or to use unfamiliar
methods so that any previous "set" of the clinician was not
advantageous. In the present study, the judges' familiarity with either the
MMPI or statistical types of data helped them in their predictions.
Interaction effects
The significant group by information interaction effect showed
at least indirect support for the experience level hypothesis that
the SGS group would be most accurate. This interaction showed that
the unsophisticated graduate students started off predicting below
chance and, finally at Level IV, reached the level of accuracy that
the sophisticated graduate students attained at Level I (MMPI
profiles alone). It was the former group with the least amount of
experience, familiarity with the MMPI, and sophistication with the
statistical decision theory which accounted for most of the group
differences and much of the interaction effect.
The rest of the interaction effect was due to the changes in
accuracy across information levels with the UGS group showing the
greatest change and the SGS group showing the least amount of
change in accuracy. The latter group started out predicting fairly
accurately and had less room for improvement while the former group
started out so poorly that their improvement was marked. The SGS
group predicted almost as well as the discriminant function with
the profiles only. The UGS group improved from below chance to the
level of accuracy achieved by the discriminant function.
Confidence
Previous studies would suggest that the trained clinicians
should have had less confidence in their judgments than the two
pre-professional groups. Although group differences for confidence were
nonsignificant, the professionals in the present study tended to be
the most confident. Again, this may have been because they were
using the MMPI with which they were more familiar than were the other
two groups. Also the professional group was predicting a criterion
about which they knew something, that is, length of stay in
psychotherapy. This again suggests that previous studies have placed the
clinician at a disadvantage so that he is less accurate and less
confident than he would be predicting in a familiar setting.
In general, adding information substantially increased the
judges' confidence. Judges became more confident as well as more
accurate with increments in information. However, the UGS group's
confidence did not increase until they had all the available
information.
One problem with asking judges to assign a confidence rating to
each judgment was that each judge had a different standard or set
for measuring how confident he was. The range of confidence scores
used also varied between judges and within groups so that one judge
used all six possible levels of confidence ranking (50, 60, 70, 80,
90, and 100) while another judge only used two (60, 70) or three
(80, 90, 100) rankings.
Appropriateness
The most meaningful measure to express appropriateness, defined
as the relationship between accuracy and confidence, was the
correlation coefficient. Just as accuracy and confidence increased with
each level of information, so did appropriateness. As judges became
more accurate they also became appropriately more confident.
The increases in appropriateness followed the same pattern as
the increases in accuracy and confidence. That is, appropriateness
increased significantly across levels of information but there was
only a tendency for one group to be more appropriate than the other
groups. As with accuracy, the SGS group tended to be the most
appropriate and the UGS group tended to be the least appropriate. This
contrasts with earlier findings that trained clinicians are more
appropriate in their confidence levels than are graduate students in
psychology (Oskamp, 1962; Shagoury, 1969). The findings of the
present study, however, do not contradict earlier findings since the
present differences between groups on the measure of appropriateness
were nonsignificant.
Applications
It appears that actuarial data and training in their use can be
applied to situations in which clinicians must predict and make
decisions. In the present study, judges were able to postdict length
of stay in psychotherapy fairly well. The next step would be to
apply these techniques to the same setting and predict a client's
length of stay in psychotherapy. This could then be followed up at
the end of treatment as a check of prediction accuracy. This would
enable the clinician to determine which short stays were "no shows"
and which were treated. Thus the discriminant function Z score and
judges' predictions could be much higher and more useful for practical
application to the clinician's population of clients. This type of
procedure is most useful in a clinic situation which must limit the
number of clients seen or must screen those that will be seen.
Statistical methods of prediction can be particularly applicable to
the screening of patients to determine what type of treatment is most
appropriate and would be most useful for each client.
To use actuarial data in a clinic situation, they must first be
collected and analyzed. Too many clinical situations today fail to
make use of the data they have available. They do not even know the
base rates for various classifications of the clients they see.
Collecting and analyzing statistical data is another way to more
fully understand a particular clinical setting by learning what type
of patients are seen, how long they stay in treatment, and hopefully,
which ones are most likely to improve.
If a clinic decides to see everyone who comes in for help, tests
and statistical data are not of benefit in selecting whom to see.
However, these data might be used for prediction and research in a
setting which sees all clients. It is in situations where everyone
cannot be treated that improving tests and collecting base rate
information is most needed. Where decisions and predictions must be
made, actuarial methods are most needed to improve the clinician's
decision-making ability.
A further study which would be a fair and optimal test of
clinical versus statistical prediction would be to give judges an
opportunity to see the relationships of test variables with a criterion
on a standardization sample. Then, the judges would be compared
with a discriminant function on a cross validation sample. However,
this was not the purpose of the present study.
SUMMARY
The present study was designed to look at the effects of adding
quantitative and qualitative data to a relevant clinical judgment
task. In essence, it compared judges with varying degrees of
clinical experience to actuarial prediction methods. The study also
attempted to train judges to use actuarial information to improve
their prediction accuracy.
Twelve judges representing three levels of clinical experience
made postdictive judgments on the length of stay in psychotherapy
(short or long) from a sample of MMPI profiles of clients seen in a
university mental health service. Judgments were made under four
conditions in which qualitative and quantitative information was
added incrementally at each level. The three levels of judges'
experience were professional clinical psychologists, "sophisticated"
third year clinical psychology graduate students trained in
statistical decision theory, and "unsophisticated" third year clinical
psychology graduate students without any training in statistical
decision theory.
Accuracy increased over levels of information but there were no
differences in accuracy for the three levels of experience. A
significant group by information level interaction demonstrated some
group effects due to a lower proportion of correct judgments for the
less experienced judges under conditions involving the least amount
of information.
Judges became more confident in their judgments as they received
more information. Appropriateness, defined as accuracy weighted by
confidence and measured by correlation coefficients between accuracy
and confidence, increased substantially as increments of information
were added. The group trained in statistical decision theory tended
to make the most appropriate judgments and the least experienced
group of graduate students tended to make the least appropriate
judgments.
The present study showed that clinicians can use quantitative
data to improve their own judgmental ability and to predict more
accurately than actuarial data alone. Also, since those judges with
the most experience in using actuarial tasks tend to be the most
appropriate in their judgments, this implies that clinicians can also
be trained to be more appropriate and to know when their judgments
are more likely to be accurate.
APPENDIX A
INSTRUCTIONS
APPENDIX A-I
INSTRUCTIONS PART I
This study is designed to examine the decision process when only
limited information is available. You will be presented with 25
Minnesota Multiphasic Personality Inventory (MMPI) profiles of
students seen at the University of Florida Infirmary Mental Health
Service. Some of these students stayed a long time in therapy (5 or
more sessions, X̄=9) and some stayed only a short time (4 or less
sessions, X̄=2). Your task will be to decide which students stayed
a long time (L) and which stayed only a short time (S) on the basis
of the test profile alone.
Your task is to try to make the best estimate of probable length
of stay in psychotherapy given only limited information. It is
possible to correctly classify all the profiles. It is hoped that your
predictions will in some way help us to understand one aspect of the
decision-making process as it is applied by psychologists in clinical
settings.
You will also be asked to rate your confidence for each subject
on a scale from 50 per cent to 100 per cent. If you are positive of
your decision, you should mark 100 per cent; if you are only guessing
you should mark 50 per cent. That is, the more certain of your
decision, the higher percentage you should mark.
APPENDIX A-II
INSTRUCTIONS PART II
Your task on Part II is identical to that on Part I. You will
be presented the same 25 profiles and asked to predict (S) or (L).
However, this time more information will be available to you. That
is, you will also have biographical data. You may use this
information in any way you wish. You may choose to disregard the
information altogether and make your predictions as you did in Part I.
Your task is to try to make the best estimate of probable length
of stay in therapy given only limited information. It is possible to
correctly classify all the profiles. It is hoped that your
predictions will in some way help us to understand one aspect of the
decision-making process as it is applied by psychologists in clinical
settings.
Again, please indicate your confidence in your judgment for each
subject from 50 per cent to 100 per cent.
APPENDIX A-III
INSTRUCTIONS PART III
Your task on Part III is identical to that of Parts I and II.
You will be presented the same 25 profiles and asked to predict as
accurately as possible, on the basis of the information given,
whether the student is (S) or (L). Again, more information will be
made available to you. The following statistical information will
be added.
Discriminant function analysis provided weights for each of the
13 MMPI scale variables in order to obtain maximal differentiation
between long stayers (L) and short stayers (S). A composite score
(Z) was obtained which best estimates the combined relative effects
of all the scale variables.
This Z score is used to make the best prediction as to which
criterion group a particular profile belongs. This can be summarized
as follows:
1. Z≥32.02 is a positive test sign (+) and indicates a probable
long stay in therapy (L).
2. Z<32.02 is a negative test sign (-) and indicates a probable
short stay in therapy (S).
No test, however, classifies without some errors. This derived
composite cutoff (Z=32.02) yields the following percentages of
classification:

Composite test sign       Criterion
                          S       L
Z<32.02 (-)              69%     38%
Z≥32.02 (+)              31%     62%

In other words:
1. A (+) test sign (Z≥32.02) correctly classified 62 per cent
of the long stayers (L). This is known as the valid positive
rate. Also, a (+) test sign incorrectly classified 31 per
cent of the short stayers (S) and this is the false positive
rate.
2. A (-) test sign (Z<32.02) correctly classified 69 per cent
of (S), the valid negative rate, and incorrectly classified
38 per cent of (L), the false negative rate.
This means that 38 per cent of the (L)'s scored below 32.02 and were
incorrectly classified (S), and 31 per cent of the (S)'s scored above
32.02 and were incorrectly classified as (L). The total percentage
correctly classified was 67 per cent.
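As a check on the figures above, the cutoff rule and the overall classification rate can be sketched as follows. The sketch assumes the base rates stated in Part IV of these instructions (66 per cent short, 34 per cent long); the function and variable names are hypothetical and are not part of the original materials.

```python
CUTOFF = 32.02  # composite Z cutoff given in the instructions

def classify(z_score):
    """Return 'L' (long stay) for a positive test sign, 'S' otherwise."""
    return "L" if z_score >= CUTOFF else "S"

# Classification percentages from the table above.
valid_positive = 0.62   # long stayers with Z >= 32.02
valid_negative = 0.69   # short stayers with Z < 32.02

# Base rates taken from Part IV of the instructions (an assumption
# about which figures produced the reported total).
base_rate_short = 0.66
base_rate_long = 0.34

# Overall proportion correct = each group's share of the sample
# times the proportion of that group classified correctly.
overall_correct = (base_rate_short * valid_negative
                   + base_rate_long * valid_positive)
print(round(overall_correct, 2))
```

The weighted sum comes to about .67, which agrees with the total percentage correctly classified reported above.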
You will be required to predict as accurately as possible
whether the student belongs to (S) or (L), short or long stay. It is
possible to score every profile correctly scoring 100 per cent.
You may predict (S) or (L) by using (1) the composite Z score
cutoff, (2) the biographical data, (3) the profile alone, or (4) any
combination of (1), (2), and (3). The composite cutoff score was
applied to yield the best overall classification rate but no test
is perfect and errors may be made with any procedure. It is quite
possible that the clinician may be able to improve upon the linear
statistical method (Z score) by utilizing combinations of both
"intuitive" and statistical data.
Your task is to try to make the best estimate of probable length
of stay in therapy given additional, but limited, information. It is
possible to correctly classify all the profiles. It is hoped that
your predictions will in some way help us to understand one aspect
of the decision-making process as it is applied by psychologists in
clinical settings.
APPENDIX A-IV
INSTRUCTIONS PART IV
Your task on Part IV is identical to that of Parts I, II, and
III, utilizing the same 25 profiles. You are to predict as accurately
as possible, on the basis of the information given, whether the
student is (S) or (L). Again, more information will be made available
to you. In addition to the composite Z score, biographical
data, and test data, you will also be told the conditional probabilities
and base rates for the groups and test signs.
Conditional probabilities combine test signs, (+) or (-), and
base rates to yield a quantitative index of the probability of
correct classification when Z ≥ 32.02 (+) or when Z < 32.02 (-).
For example, some of the subjects will be (L) when Z ≥ 32.02 (+)
and some will be (S) when Z < 32.02 (-). The problem is to determine
how confident we can be with each test sign under the base rates of
the population. The base rates for the two groups are: Short (S) =
66 per cent and Long (L) = 34 per cent. In other words, 34 per cent
of the subjects stayed a long time in therapy and 66 per cent stayed
only a short time. The majority, therefore, were short stayers (S).
Based on this information, the conditional probabilities are:
for a (+) test sign, P(L/+) = .51 and for a (-) test sign, P(S/-) = .78.
This means that the probability of a person staying a long time in
therapy (L), given a positive test sign, is .51, and the probability
of a person staying a short time (S), given a negative test sign, is
.78.
A conditional probability of .51 for a (+) test sign means that
you would be wrong about as often as you were correct in predicting (L)
for a (+) sign. A conditional probability of .78 for a (-) test sign
means that you would be correct more often than you would be wrong
in predicting (S) for a (-) test sign.
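The two conditional probabilities follow from the classification rates and the base rates by Bayes' rule. The sketch below is an illustration added for the reader, not part of the original instructions; the function name and argument names are hypothetical, and the input values are the percentages quoted in Parts III and IV.

```python
def conditional_probability(hit_rate, error_rate, base_rate):
    """Probability of belonging to a group, given that group's test sign.

    hit_rate   -- proportion of the group that shows the sign
    error_rate -- proportion of the other group that shows the same sign
    base_rate  -- proportion of the population belonging to the group
    """
    correct = hit_rate * base_rate
    incorrect = error_rate * (1.0 - base_rate)
    return correct / (correct + incorrect)


# P(L/+): valid positive rate .62, false positive rate .31, base rate of (L) .34
p_long_given_plus = conditional_probability(0.62, 0.31, 0.34)

# P(S/-): valid negative rate .69, false negative rate .38, base rate of (S) .66
p_short_given_minus = conditional_probability(0.69, 0.38, 0.66)

print(round(p_long_given_plus, 2))    # 0.51
print(round(p_short_given_minus, 2))  # 0.78
```

The (+) sign is much less trustworthy than the (-) sign because only 34 per cent of the population are long stayers: the 31 per cent of short stayers who score above the cutoff are nearly as numerous as the 62 per cent of long stayers who do.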
Your task is to try to make the best estimate of probable length
of stay in psychotherapy given additional, but limited, information.
It is possible to correctly classify all the profiles. It is hoped
that your predictions will in some way help us to understand one
aspect of the decision-making process as it is applied by psychologists
in clinical settings.
Again, please indicate your confidence in your judgment for each
subject from 50 per cent to 100 per cent.
APPENDIX B
SUMMARY OF NEWMAN-KEULS TESTS
APPENDIX B-I
SUMMARY OF NEWMAN-KEULS TEST FOR GROUP MEAN DIFFERENCES
Differences among Level I means
                 UGS      P        SGS
X̄ UGS = .46      -        .12*     .15**
X̄ P   = .58               -        .03
X̄ SGS = .61                        -
** p < .01
Differences among Level II means
                 UGS      P        SGS
X̄ UGS = .57      -        .04      .11
X̄ P   = .61               -        .05
X̄ SGS = .66                        -
Differences among Level III means
                 UGS      P        SGS
X̄ UGS = .63      -        .03      .10
X̄ P   = .66               -        .07
X̄ SGS = .73                        -
Differences among Level IV means
                 UGS      P        SGS
X̄ UGS = .66      -        .04      .05
X̄ P   = .70               -        .01
X̄ SGS = .71                        -
APPENDIX B-II
SUMMARY OF NEWMAN-KEULS TEST FOR INFORMATION MEAN DIFFERENCES
Differences among SGS means
                 I        II       IV       III
X̄ I   = .61      -        .05      .09      .12
X̄ II  = .66               -        .04      .07
X̄ IV  = .70                        -        .03
X̄ III = .73                                 -
Differences among P means
                 I        II       III      IV
X̄ I   = .58      -        .03      .08      .13
X̄ II  = .61               -        .05      .10
X̄ III = .66                        -        .05
X̄ IV  = .71                                 -
Differences among UGS means
                 I        II       III      IV
X̄ I   = .46      -        .11*     .17      .20
X̄ II  = .57               -        .06      .09
X̄ III = .63                        -        .03
X̄ IV  = .66                                 -
** p < .01
* p < .05
Differences among group means
                 I        II       III      IV
X̄ I   = .55      -        .06*     .12**    .14**
X̄ II  = .61               -        .06*     .08*
X̄ III = .67                        -        .02
X̄ IV  = .69                                 -
REFERENCES
Adams, J. K. A confidence scale defined in terms of expected percentages.
Amer. J. Psychol., 1957, 70, 432-436.
Dahlstrom, W. G., & Welsh, G. S. An MMPI handbook: a guide to use
in clinical practice and research. Minneapolis, Minn.: The
University of Minnesota Press, 1968.
Edwards, A. L. Experimental design in psychological research. New
York: Rinehart & Company, Inc., 1950.
Goldberg, L. R. The effectiveness of clinicians' judgements: the
diagnosis of brain damage from the Bender-Gestalt Test. J.
consult. Psychol., 1959, 23, 25-33.
Goldberg, L. R. Simple models or simple processes? Some research on
clinical judgements. Amer. Psychologist, 1968, 23, 483-496.
Golden, M. Some effects of combining psychological tests on clinical
inferences. J. consult. Psychol., 1964, 28, 440-446.
Holt, R. R. Clinical and statistical prediction: a reformulation and
some new data. J. abnorm. soc. Psychol., 1958, 56, 1-12.
Holtzman, W. H. Can the computer supplant the clinician? J. clin.
Psychol., 1960, 16, 119-122.
Kirk, R. E. Experimental design: procedures for the behavioral
sciences. Belmont, Calif.: Brooks/Cole Publishing Co., 1968.
Kostlan, A. A method for the empirical study of psychodiagnosis. J.
consult. Psychol., 1954, 18, 83-88.
Lindzey, G. Seer versus sign. J. exp. Res. Pers., 1965, 1, 17-26.
Meehl, P. E. Clinical versus statistical prediction. Minneapolis,
Minn.: University of Minnesota Press, 1954.
Meehl, P. E. Seer versus signs: the first good example. J. exp.
Res. Pers., 1965, 1, 27-33.
Meehl, P. E., & Rosen, A. Antecedent probability and the efficiency
of signs, patterns, or cutting scores. Psychol. Bull., 1955,
52, 194-216.
Megargee, E. I. (Ed.). Research in clinical assessment. New York:
Harper & Row Publishers, 1966.
Mello, Nancy K., & Guthrie, G. M. MMPI profiles and behavior in
counseling. J. counsel. Psychol., 1958, 5, 125-129.
Oskamp, S. The relationship of clinical experience and training
methods to several criteria of clinical prediction. Psychol.
Monogr., 1962, 76, No. 547.
Satz, P. A block rotation task: the application of multivariate and
decision theory analysis for the prediction of organic brain
disorder. Psychol. Monogr., 1966, 80, No. 629.
Shagoury, P. The influence of statistical information on clinical
decisions. Unpublished master's thesis, University of
Florida, 1969.
Shagoury, P., & Satz, P. The effect of statistical information on
clinical prediction. Proceedings: 77th Annual Convention,
APA, 1969, 310-311.
BIOGRAPHICAL SKETCH
Ann Weimer Moxley was born March 14, 1946, in New York City,
New York. She attended public schools in Gainesville, Florida, and
graduated from Gainesville High School in 1964. She enrolled in the
University of Florida in September, 1964, and received her Bachelor
of Arts degree, magna cum laude, in December, 1967. She received
her Master of Science degree at the University of Florida in December,
1968. She is currently engaged in her clinical psychology internship
at the University of Rochester's Strong Memorial Hospital in Rochester,
New York.
She is married to James Edward Moxley, who received his Master's
in Business Administration at the University of Florida and is now
employed by Eastman Kodak in Rochester, New York. Ann is a member of
Phi Beta Kappa, Phi Kappa Phi, Psi Chi, and Mortar Board.
This dissertation was prepared under the direction of the
chairman of the candidate's supervisory committee and has been
approved by all members of that committee. It was submitted to
the Dean of the College of Arts and Sciences and to the Graduate
Council, and was approved as partial fulfillment of the requirements
for the degree of Doctor of Philosophy.
December 1970
Dean, College of Arts and Sciences
Dean, Graduate School
Supervisory Committee:
Chairman